Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
> I think the best approach will be not to reset dr7 at all. Then there > won't be any need to worry about restoring it. Leaving a userspace > watchpoint enabled while running in the kernel ought not to matter; a > system call shouldn't touch any address in userspace more than once or > twice. Hmm. That sounds reasonable. But I wonder why the old code clears %dr7. It's been that way for a long time (since 2.4 at least). > My idea was to put 4 hwbkpt structures in thread_struct, so they would > always be available for use by ptrace. However it turned out not to be > feasible to replace the debugreg array with something more sophisticated, > because of conflicting declarations and problems with the ordering of > #includes. So instead I have been forced to replace debugreg[] with a > pointer to a structure which can be allocated as needed. I think that's preferable anyway. Most tasks most of the time will never need that storage, so why not make thread_struct a little smaller? (There is also the potential for sharing, which I mentioned earlier.) > This raises the possibility that a PTRACE syscall might fail because the > allocation fails. Hopefully that won't be an issue? It's not a new issue, anyway, after utrace. The utrace-based ptrace can fail for PTRACE_ATTACH because of OOM too, which wasn't possible before. I think it's survivable. Thanks, Roland - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Thu, 22 Feb 2007, Roland McGrath wrote: > > Yes, you are wrong -- although perhaps you shouldn't be. > > > > The current version of do_debug() clears dr7 when a debug interrupt is > > fielded. As a result, if a system call touches two watchpoint addresses > > in userspace only the first access will be noticed. > > Ah, I see. I think it would indeed be nice to fix this. > > > This is probably a bug in do_debug(). It would be better to disable each > > individual userspace watchpoint as it is triggered (or even not to disable > > it at all). dr7 would be restored when the SIGTRAP is delivered. (But > > what if the user is blocking or ignoring SIGTRAP?) > > The user blocking or ignoring it doesn't come up, because it's a > force_sig_info call. However, a debugger will indeed swallow the signal > through ptrace/utrace means. In ptrace, the dr7 is always going to get > reset because there will always be a context switch out and back in that > does it. But with utrace it's now possible to swallow the signal and keep > going without a context switch (e.g. a breakpoint that is just doing > logging but not stopping). So perhaps we should have a TIF_RESTORE_DR7 > that goes into _TIF_WORK_MASK and gets handled in do_notify_resume > (or maybe it's TIF_HWBKPT). I think the best approach will be not to reset dr7 at all. Then there won't be any need to worry about restoring it. Leaving a userspace watchpoint enabled while running in the kernel ought not to matter; a system call shouldn't touch any address in userspace more than once or twice. > > I've worked out a plan for implementing combined user/kernel mode > > breakpoints and watchpoints (call them "debugpoints" as a catch-all > > term). It should work transparently with ptrace and should accomodate > > whatever scheme utrace decides to adopt. There won't need to be a > > separate kwatch facility on top of it; the new combined facility will > > handle debugpoints in both userspace and kernelspace. > > That sounds great. I'm not thrilled with the name "debugpoint", I have to > tell you. The hardware documentation calls all these things "breakpoints", > and I think "data breakpoint" and "instruction breakpoint" are pretty good > terms. How about "hwbkpt" for the facility API? Okay, I'll change the name. > The one caveat I have here is that I don't want ptrace (via utrace) to have > to supply the usual structure. I probably only think this because it would > be a pain for the ptrace/utrace implementation to find a place to stick it. > But I have a rationalization. The old ptrace interface, and the > utrace_regset for debugregs, is not really a "debugpoint user" in the sense > you're defining it. It's an access to the "raw" debugregs as part of the > thread's virtual CPU context. You can use ptrace to set a watchpoint, then > detach ptrace, and the thread will get a SIGTRAP later though there is no > remnant at that point of the debugger interface that made it come about. > For the degenerate case of medium-high priority with no handler callbacks > (that should instead be an error at registration time if no slot is free), > you shouldn't really need any per-caller storage (there can only be one > such caller per slot). My idea was to put 4 hwbkpt structures in thread_struct, so they would always be available for use by ptrace. However it turned out not to be feasible to replace the debugreg array with something more sophisticated, because of conflicting declarations and problems with the ordering of #includes. So instead I have been forced to replace debugreg[] with a pointer to a structure which can be allocated as needed. This raises the possibility that a PTRACE syscall might fail because the allocation fails. Hopefully that won't be an issue? Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Thu, 22 Feb 2007, Roland McGrath wrote: Yes, you are wrong -- although perhaps you shouldn't be. The current version of do_debug() clears dr7 when a debug interrupt is fielded. As a result, if a system call touches two watchpoint addresses in userspace only the first access will be noticed. Ah, I see. I think it would indeed be nice to fix this. This is probably a bug in do_debug(). It would be better to disable each individual userspace watchpoint as it is triggered (or even not to disable it at all). dr7 would be restored when the SIGTRAP is delivered. (But what if the user is blocking or ignoring SIGTRAP?) The user blocking or ignoring it doesn't come up, because it's a force_sig_info call. However, a debugger will indeed swallow the signal through ptrace/utrace means. In ptrace, the dr7 is always going to get reset because there will always be a context switch out and back in that does it. But with utrace it's now possible to swallow the signal and keep going without a context switch (e.g. a breakpoint that is just doing logging but not stopping). So perhaps we should have a TIF_RESTORE_DR7 that goes into _TIF_WORK_MASK and gets handled in do_notify_resume (or maybe it's TIF_HWBKPT). I think the best approach will be not to reset dr7 at all. Then there won't be any need to worry about restoring it. Leaving a userspace watchpoint enabled while running in the kernel ought not to matter; a system call shouldn't touch any address in userspace more than once or twice. I've worked out a plan for implementing combined user/kernel mode breakpoints and watchpoints (call them debugpoints as a catch-all term). It should work transparently with ptrace and should accomodate whatever scheme utrace decides to adopt. There won't need to be a separate kwatch facility on top of it; the new combined facility will handle debugpoints in both userspace and kernelspace. That sounds great. I'm not thrilled with the name debugpoint, I have to tell you. The hardware documentation calls all these things breakpoints, and I think data breakpoint and instruction breakpoint are pretty good terms. How about hwbkpt for the facility API? Okay, I'll change the name. The one caveat I have here is that I don't want ptrace (via utrace) to have to supply the usual structure. I probably only think this because it would be a pain for the ptrace/utrace implementation to find a place to stick it. But I have a rationalization. The old ptrace interface, and the utrace_regset for debugregs, is not really a debugpoint user in the sense you're defining it. It's an access to the raw debugregs as part of the thread's virtual CPU context. You can use ptrace to set a watchpoint, then detach ptrace, and the thread will get a SIGTRAP later though there is no remnant at that point of the debugger interface that made it come about. For the degenerate case of medium-high priority with no handler callbacks (that should instead be an error at registration time if no slot is free), you shouldn't really need any per-caller storage (there can only be one such caller per slot). My idea was to put 4 hwbkpt structures in thread_struct, so they would always be available for use by ptrace. However it turned out not to be feasible to replace the debugreg array with something more sophisticated, because of conflicting declarations and problems with the ordering of #includes. So instead I have been forced to replace debugreg[] with a pointer to a structure which can be allocated as needed. This raises the possibility that a PTRACE syscall might fail because the allocation fails. Hopefully that won't be an issue? Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
I think the best approach will be not to reset dr7 at all. Then there won't be any need to worry about restoring it. Leaving a userspace watchpoint enabled while running in the kernel ought not to matter; a system call shouldn't touch any address in userspace more than once or twice. Hmm. That sounds reasonable. But I wonder why the old code clears %dr7. It's been that way for a long time (since 2.4 at least). My idea was to put 4 hwbkpt structures in thread_struct, so they would always be available for use by ptrace. However it turned out not to be feasible to replace the debugreg array with something more sophisticated, because of conflicting declarations and problems with the ordering of #includes. So instead I have been forced to replace debugreg[] with a pointer to a structure which can be allocated as needed. I think that's preferable anyway. Most tasks most of the time will never need that storage, so why not make thread_struct a little smaller? (There is also the potential for sharing, which I mentioned earlier.) This raises the possibility that a PTRACE syscall might fail because the allocation fails. Hopefully that won't be an issue? It's not a new issue, anyway, after utrace. The utrace-based ptrace can fail for PTRACE_ATTACH because of OOM too, which wasn't possible before. I think it's survivable. Thanks, Roland - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
> Yes, you are wrong -- although perhaps you shouldn't be. > > The current version of do_debug() clears dr7 when a debug interrupt is > fielded. As a result, if a system call touches two watchpoint addresses > in userspace only the first access will be noticed. Ah, I see. I think it would indeed be nice to fix this. > This is probably a bug in do_debug(). It would be better to disable each > individual userspace watchpoint as it is triggered (or even not to disable > it at all). dr7 would be restored when the SIGTRAP is delivered. (But > what if the user is blocking or ignoring SIGTRAP?) The user blocking or ignoring it doesn't come up, because it's a force_sig_info call. However, a debugger will indeed swallow the signal through ptrace/utrace means. In ptrace, the dr7 is always going to get reset because there will always be a context switch out and back in that does it. But with utrace it's now possible to swallow the signal and keep going without a context switch (e.g. a breakpoint that is just doing logging but not stopping). So perhaps we should have a TIF_RESTORE_DR7 that goes into _TIF_WORK_MASK and gets handled in do_notify_resume (or maybe it's TIF_HWBKPT). You should not actually need to disable user watchpoints, because in data watchpoints the exception comes after the instruction completes. Only for instruction watchpoints does the exception come before the instruction executes, and no user watchpoints can be in the address range containing kernel code. SIGTRAP both doesn't queue, and doesn't give %dr6 values in its siginfo_t. All user watchpoints will be handled via the signal; this is the only way ptrace can report them, and is also the utrace way of doing things. do_debug can happen inside kernel code, and tracing of user-level tasks can only safely do anything at the point just before returning to user mode, where signals are handled. So, getting to send_sigtrap in do_debug is enough to say "one or more user debug exceptions happened". The %dr6 value that collects in the thread state to be seen by ptrace, or by utrace-based things using your new facility, needs to collect all the %dr6 bits that were set by the hardware and weren't consumed by kernel-level tracing. An eventual utrace-based thing might in fact have some other way to tie in so that the event details could just be in some call made by do_debug and not recorded in the thread's virtual %dr6 value. But at least for ptrace, they should collect there if it becomes possible for more than one exception to happen while in kernel mode or in a single user instruction. (A single instruction can cause multiple exceptions at the hardware level.) > I've worked out a plan for implementing combined user/kernel mode > breakpoints and watchpoints (call them "debugpoints" as a catch-all > term). It should work transparently with ptrace and should accomodate > whatever scheme utrace decides to adopt. There won't need to be a > separate kwatch facility on top of it; the new combined facility will > handle debugpoints in both userspace and kernelspace. That sounds great. I'm not thrilled with the name "debugpoint", I have to tell you. The hardware documentation calls all these things "breakpoints", and I think "data breakpoint" and "instruction breakpoint" are pretty good terms. How about "hwbkpt" for the facility API? > To work with ptrace, the new scheme will completely virtualize the debug > registers. So for example, a ptrace call might request a debugpoint in > dr0, but it could end up that the physical register used is really dr2 > instead. The various bits in dr6 and dr7 will be mapped in such a way > that the entire procedure is transparent to the user. Debugpoints > registered in kernelspace or by utrace won't care, of course. I think that's a fine idea. The one caveat I have here is that I don't want ptrace (via utrace) to have to supply the usual structure. I probably only think this because it would be a pain for the ptrace/utrace implementation to find a place to stick it. But I have a rationalization. The old ptrace interface, and the utrace_regset for debugregs, is not really a "debugpoint user" in the sense you're defining it. It's an access to the "raw" debugregs as part of the thread's virtual CPU context. You can use ptrace to set a watchpoint, then detach ptrace, and the thread will get a SIGTRAP later though there is no remnant at that point of the debugger interface that made it come about. For the degenerate case of medium-high priority with no handler callbacks (that should instead be an error at registration time if no slot is free), you shouldn't really need any per-caller storage (there can only be one such caller per slot). > There are two things I am uncertain about: vm86 mode and kprobes. I don't > know anything about how either of them works. I know about kprobes. I don't know about vm86, but I can read the code. Thanks, Roland - To unsubscribe from this list:
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Wed, Feb 21, 2007 at 03:35:13PM -0500, Alan Stern wrote: > Going back to something you mentioned earlier... > [...] > On Fri, 9 Feb 2007, Roland McGrath wrote: > There are two things I am uncertain about: vm86 mode and kprobes. I don't > know anything about how either of them works. Judging from the current > code, nothing much should be needed -- debug traps in vm86 mode are > handled by calling handle_vm86_trap(), and kprobes puts itself at the > start of the notify_die() chain so it can handle single-step traps. > Eventually it will be necessary to check with someone who really > understands the issues. Yes, Kprobes needs to get notified first to handle single-step traps. So kwatch getting notified secound should be fine. Thanks Prasanna -- Prasanna S.P. Linux Technology Center India Software Labs, IBM Bangalore Email: [EMAIL PROTECTED] Ph: 91-80-41776329 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Wed, Feb 21, 2007 at 03:35:13PM -0500, Alan Stern wrote: Going back to something you mentioned earlier... [...] On Fri, 9 Feb 2007, Roland McGrath wrote: There are two things I am uncertain about: vm86 mode and kprobes. I don't know anything about how either of them works. Judging from the current code, nothing much should be needed -- debug traps in vm86 mode are handled by calling handle_vm86_trap(), and kprobes puts itself at the start of the notify_die() chain so it can handle single-step traps. Eventually it will be necessary to check with someone who really understands the issues. Yes, Kprobes needs to get notified first to handle single-step traps. So kwatch getting notified secound should be fine. Thanks Prasanna -- Prasanna S.P. Linux Technology Center India Software Labs, IBM Bangalore Email: [EMAIL PROTECTED] Ph: 91-80-41776329 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
Yes, you are wrong -- although perhaps you shouldn't be. The current version of do_debug() clears dr7 when a debug interrupt is fielded. As a result, if a system call touches two watchpoint addresses in userspace only the first access will be noticed. Ah, I see. I think it would indeed be nice to fix this. This is probably a bug in do_debug(). It would be better to disable each individual userspace watchpoint as it is triggered (or even not to disable it at all). dr7 would be restored when the SIGTRAP is delivered. (But what if the user is blocking or ignoring SIGTRAP?) The user blocking or ignoring it doesn't come up, because it's a force_sig_info call. However, a debugger will indeed swallow the signal through ptrace/utrace means. In ptrace, the dr7 is always going to get reset because there will always be a context switch out and back in that does it. But with utrace it's now possible to swallow the signal and keep going without a context switch (e.g. a breakpoint that is just doing logging but not stopping). So perhaps we should have a TIF_RESTORE_DR7 that goes into _TIF_WORK_MASK and gets handled in do_notify_resume (or maybe it's TIF_HWBKPT). You should not actually need to disable user watchpoints, because in data watchpoints the exception comes after the instruction completes. Only for instruction watchpoints does the exception come before the instruction executes, and no user watchpoints can be in the address range containing kernel code. SIGTRAP both doesn't queue, and doesn't give %dr6 values in its siginfo_t. All user watchpoints will be handled via the signal; this is the only way ptrace can report them, and is also the utrace way of doing things. do_debug can happen inside kernel code, and tracing of user-level tasks can only safely do anything at the point just before returning to user mode, where signals are handled. So, getting to send_sigtrap in do_debug is enough to say one or more user debug exceptions happened. The %dr6 value that collects in the thread state to be seen by ptrace, or by utrace-based things using your new facility, needs to collect all the %dr6 bits that were set by the hardware and weren't consumed by kernel-level tracing. An eventual utrace-based thing might in fact have some other way to tie in so that the event details could just be in some call made by do_debug and not recorded in the thread's virtual %dr6 value. But at least for ptrace, they should collect there if it becomes possible for more than one exception to happen while in kernel mode or in a single user instruction. (A single instruction can cause multiple exceptions at the hardware level.) I've worked out a plan for implementing combined user/kernel mode breakpoints and watchpoints (call them debugpoints as a catch-all term). It should work transparently with ptrace and should accomodate whatever scheme utrace decides to adopt. There won't need to be a separate kwatch facility on top of it; the new combined facility will handle debugpoints in both userspace and kernelspace. That sounds great. I'm not thrilled with the name debugpoint, I have to tell you. The hardware documentation calls all these things breakpoints, and I think data breakpoint and instruction breakpoint are pretty good terms. How about hwbkpt for the facility API? To work with ptrace, the new scheme will completely virtualize the debug registers. So for example, a ptrace call might request a debugpoint in dr0, but it could end up that the physical register used is really dr2 instead. The various bits in dr6 and dr7 will be mapped in such a way that the entire procedure is transparent to the user. Debugpoints registered in kernelspace or by utrace won't care, of course. I think that's a fine idea. The one caveat I have here is that I don't want ptrace (via utrace) to have to supply the usual structure. I probably only think this because it would be a pain for the ptrace/utrace implementation to find a place to stick it. But I have a rationalization. The old ptrace interface, and the utrace_regset for debugregs, is not really a debugpoint user in the sense you're defining it. It's an access to the raw debugregs as part of the thread's virtual CPU context. You can use ptrace to set a watchpoint, then detach ptrace, and the thread will get a SIGTRAP later though there is no remnant at that point of the debugger interface that made it come about. For the degenerate case of medium-high priority with no handler callbacks (that should instead be an error at registration time if no slot is free), you shouldn't really need any per-caller storage (there can only be one such caller per slot). There are two things I am uncertain about: vm86 mode and kprobes. I don't know anything about how either of them works. I know about kprobes. I don't know about vm86, but I can read the code. Thanks, Roland - To unsubscribe from this list: send the line unsubscribe linux-kernel in
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
Going back to something you mentioned earlier... On Fri, 9 Feb 2007, Roland McGrath wrote: > I don't think I really object to the ABI change of clearing %dr6 after an > exception so that it does not accumulate multiple results. But first I'll > have to convince myself that we never actually do want to accumulate > multiple results. Hmm, I think we can, so maybe I do object. If you set > two watchpoints inside a user buffer and then do a system call that touches > both those addresses (e.g. read), then you will go through do_debug (to > send_sigtrap) twice before returning to user mode. When the syscall is > done, you'll have a pending SIGTRAP for the debugger to handle. By looking > at your %dr6 the debugger can see that both watchpoints hit. (gdb does not > handle this case, but it should.) Am I wrong? Yes, you are wrong -- although perhaps you shouldn't be. The current version of do_debug() clears dr7 when a debug interrupt is fielded. As a result, if a system call touches two watchpoint addresses in userspace only the first access will be noticed. This is probably a bug in do_debug(). It would be better to disable each individual userspace watchpoint as it is triggered (or even not to disable it at all). dr7 would be restored when the SIGTRAP is delivered. (But what if the user is blocking or ignoring SIGTRAP?) Moving on... I've worked out a plan for implementing combined user/kernel mode breakpoints and watchpoints (call them "debugpoints" as a catch-all term). It should work transparently with ptrace and should accomodate whatever scheme utrace decides to adopt. There won't need to be a separate kwatch facility on top of it; the new combined facility will handle debugpoints in both userspace and kernelspace. The idea is that callers can register and unregister a struct debugpoint, which contains fields for the type, length, address, and priority as well as three callback pointers (for installed, uninstalled, and triggered). The debug core will keep these structures sorted by priority and will allocate the available debug registers based on the priorities of the userspace and kernelspace requests. Each CPU will have its own array of pointers to these structures, indicating which debugpoints are currently enabled. To work with ptrace, the new scheme will completely virtualize the debug registers. So for example, a ptrace call might request a debugpoint in dr0, but it could end up that the physical register used is really dr2 instead. The various bits in dr6 and dr7 will be mapped in such a way that the entire procedure is transparent to the user. Debugpoints registered in kernelspace or by utrace won't care, of course. There are two things I am uncertain about: vm86 mode and kprobes. I don't know anything about how either of them works. Judging from the current code, nothing much should be needed -- debug traps in vm86 mode are handled by calling handle_vm86_trap(), and kprobes puts itself at the start of the notify_die() chain so it can handle single-step traps. Eventually it will be necessary to check with someone who really understands the issues. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
Going back to something you mentioned earlier... On Fri, 9 Feb 2007, Roland McGrath wrote: I don't think I really object to the ABI change of clearing %dr6 after an exception so that it does not accumulate multiple results. But first I'll have to convince myself that we never actually do want to accumulate multiple results. Hmm, I think we can, so maybe I do object. If you set two watchpoints inside a user buffer and then do a system call that touches both those addresses (e.g. read), then you will go through do_debug (to send_sigtrap) twice before returning to user mode. When the syscall is done, you'll have a pending SIGTRAP for the debugger to handle. By looking at your %dr6 the debugger can see that both watchpoints hit. (gdb does not handle this case, but it should.) Am I wrong? Yes, you are wrong -- although perhaps you shouldn't be. The current version of do_debug() clears dr7 when a debug interrupt is fielded. As a result, if a system call touches two watchpoint addresses in userspace only the first access will be noticed. This is probably a bug in do_debug(). It would be better to disable each individual userspace watchpoint as it is triggered (or even not to disable it at all). dr7 would be restored when the SIGTRAP is delivered. (But what if the user is blocking or ignoring SIGTRAP?) Moving on... I've worked out a plan for implementing combined user/kernel mode breakpoints and watchpoints (call them debugpoints as a catch-all term). It should work transparently with ptrace and should accomodate whatever scheme utrace decides to adopt. There won't need to be a separate kwatch facility on top of it; the new combined facility will handle debugpoints in both userspace and kernelspace. The idea is that callers can register and unregister a struct debugpoint, which contains fields for the type, length, address, and priority as well as three callback pointers (for installed, uninstalled, and triggered). The debug core will keep these structures sorted by priority and will allocate the available debug registers based on the priorities of the userspace and kernelspace requests. Each CPU will have its own array of pointers to these structures, indicating which debugpoints are currently enabled. To work with ptrace, the new scheme will completely virtualize the debug registers. So for example, a ptrace call might request a debugpoint in dr0, but it could end up that the physical register used is really dr2 instead. The various bits in dr6 and dr7 will be mapped in such a way that the entire procedure is transparent to the user. Debugpoints registered in kernelspace or by utrace won't care, of course. There are two things I am uncertain about: vm86 mode and kprobes. I don't know anything about how either of them works. Judging from the current code, nothing much should be needed -- debug traps in vm86 mode are handled by calling handle_vm86_trap(), and kprobes puts itself at the start of the notify_die() chain so it can handle single-step traps. Eventually it will be necessary to check with someone who really understands the issues. Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
> How do you suggest this be handled? Maybe we should just keep track of a > maximum user priority level for each slot, allowing it to go up but not > down until all user processes have given up the slot. (I.e., in the > example above the later kwatch requests would still fail because we would > continue to remember the high user priority level so long as the first > process maintained its allocation.) That would be overly pessimistic, but > it would at least be safe. I think that is certainly fine to start with. Like I said before, we can start conservative and then worry about more complexity as the concrete needs arise. I don't think it will really be any trouble to change some of these low-level interfaces later to accomodate more sophisticated callers and implementations. When the issue does arise, scanning all the necessary tasks may not really need to be so costly. That is, if rather than scanning all tasks in the system, it's a list of debugreg allocations. The callers doing fancy allocation can be responsible for passing in storage for a struct containing the list structure. That would naturally embed in struct kwatch. Having the debugreg allocation routines pass in such a structure would also be useful for another kind of flexibility I'd like to have eventually. That is, "local" allocations that are local to a group of tasks rather than just one. For example, a debugger most often actually wants to share watchpoints among all the threads sharing an address space. If identical settings are in fact shared, the storage requirements for using watchpoints in many-threaded processes scale that much better. I think we have a while before we have to actually figure any of that out in detail. The simple starting point covers all our immediate concrete concerns. Thanks, Roland - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
How do you suggest this be handled? Maybe we should just keep track of a maximum user priority level for each slot, allowing it to go up but not down until all user processes have given up the slot. (I.e., in the example above the later kwatch requests would still fail because we would continue to remember the high user priority level so long as the first process maintained its allocation.) That would be overly pessimistic, but it would at least be safe. I think that is certainly fine to start with. Like I said before, we can start conservative and then worry about more complexity as the concrete needs arise. I don't think it will really be any trouble to change some of these low-level interfaces later to accomodate more sophisticated callers and implementations. When the issue does arise, scanning all the necessary tasks may not really need to be so costly. That is, if rather than scanning all tasks in the system, it's a list of debugreg allocations. The callers doing fancy allocation can be responsible for passing in storage for a struct containing the list structure. That would naturally embed in struct kwatch. Having the debugreg allocation routines pass in such a structure would also be useful for another kind of flexibility I'd like to have eventually. That is, local allocations that are local to a group of tasks rather than just one. For example, a debugger most often actually wants to share watchpoints among all the threads sharing an address space. If identical settings are in fact shared, the storage requirements for using watchpoints in many-threaded processes scale that much better. I think we have a while before we have to actually figure any of that out in detail. The simple starting point covers all our immediate concrete concerns. Thanks, Roland - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Fri, 9 Feb 2007, Roland McGrath wrote: > I don't think I really object to the ABI change of clearing %dr6 after an > exception so that it does not accumulate multiple results. But first I'll > have to convince myself that we never actually do want to accumulate > multiple results. Hmm, I think we can, so maybe I do object. If you set > two watchpoints inside a user buffer and then do a system call that touches > both those addresses (e.g. read), then you will go through do_debug (to > send_sigtrap) twice before returning to user mode. When the syscall is > done, you'll have a pending SIGTRAP for the debugger to handle. By looking > at your %dr6 the debugger can see that both watchpoints hit. (gdb does not > handle this case, but it should.) Am I wrong? I think you're right. > So this gets to the more complicated view of %dr6 handling that I had first > had in mind yesterday. Each allocation "owns" one of the low 4 bits in > %dr6 too. Only the dr6 bits owned by the userland "raw" allocation > (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6]. > So when kwatch swallows a debug exception, it should mask off its bit from > %dr6 in the CPU, but not clear %dr6 completely. That way you can have a > sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one > system call (including interrupt handlers), and when it gets to the > userland debugger examining dr6 it sees the low 2 bits both set. Okay; I'll fix this too. Come to think of it, kwatch needs to handle multiple hits as well -- there might be two watchpoints set to the same address. > > It's really quite a tricky matter. Should a register be allocated to > > kwatch only when no user process needs it? Should we really go about > > checking the requirements of every single process whenever a kwatch > > allocation request comes in? What if the processes which need a > > particular register aren't running -- should the register then be given to > > kwatch? What if one of those processes then does start running on one > > CPU? > > To "go about checking the requirements of every single process" is not so > hard as it sounds when they're recorded as a single global use count per > slot, as your original code does. When you mentioned a "your allocation is > available" callback, I was thinking it might come to that being called > inside context switch. It's all rather tricky, indeed. > > The obvious answer is to start simple. If any user process anywhere uses > drN, kwatch has to give it up for all CPUs (watchpoints with less than > "break ptrace" priority do). If anyone really cares about more flexibility > than that, we can change or extend it. Some copious comments in the > interface descriptions can lead them in the right direction if the > situation comes up. Probably with systemtap support in a while, we'll get > a lot more concrete uses of watchpoints and people finding out what really > matters to them. It's still more complicated than you might think. Let's say two user processes each have dr1 allocated, one with low priority and the other with high priority. The kernel has to be aware of the high-priority allocation, so that it can refuse intermediate-priority kwatch allocation attempts. Now suppose the second process exits. dr1 is still allocated to the first user process but only with low priority, so now intermediate-priority kwatch allocation attempts should succeed. In order for this to work, when the second process gives up its allocation I would have to either scan though all tasks to see the first process, or else keep several global use counts for each slot -- in fact, one use count for each priority level. That's doable if there are only a few levels, but not if there are many. How do you suggest this be handled? Maybe we should just keep track of a maximum user priority level for each slot, allowing it to go up but not down until all user processes have given up the slot. (I.e., in the example above the later kwatch requests would still fail because we would continue to remember the high user priority level so long as the first process maintained its allocation.) That would be overly pessimistic, but it would at least be safe. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
> Yes. In fact, the current existing code does not handle dr6 correctly. > It never clears the register, which means you're likely to get into > trouble when multiple breakpoints (or watchpoints) are enabled. This is a subtle change from the existing ABI, in which userland has to clear %dr6 via ptrace itself. But gdb never does that AFAICT. So it's in fact subject to confusion when two watchpoints are set and the second hits after the first. So gdb ought to be fixed to clear dr6 via ptrace, to work with existing and older kernels. I don't think I really object to the ABI change of clearing %dr6 after an exception so that it does not accumulate multiple results. But first I'll have to convince myself that we never actually do want to accumulate multiple results. Hmm, I think we can, so maybe I do object. If you set two watchpoints inside a user buffer and then do a system call that touches both those addresses (e.g. read), then you will go through do_debug (to send_sigtrap) twice before returning to user mode. When the syscall is done, you'll have a pending SIGTRAP for the debugger to handle. By looking at your %dr6 the debugger can see that both watchpoints hit. (gdb does not handle this case, but it should.) Am I wrong? So this gets to the more complicated view of %dr6 handling that I had first had in mind yesterday. Each allocation "owns" one of the low 4 bits in %dr6 too. Only the dr6 bits owned by the userland "raw" allocation (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6]. So when kwatch swallows a debug exception, it should mask off its bit from %dr6 in the CPU, but not clear %dr6 completely. That way you can have a sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one system call (including interrupt handlers), and when it gets to the userland debugger examining dr6 it sees the low 2 bits both set. > It's really quite a tricky matter. Should a register be allocated to > kwatch only when no user process needs it? Should we really go about > checking the requirements of every single process whenever a kwatch > allocation request comes in? What if the processes which need a > particular register aren't running -- should the register then be given to > kwatch? What if one of those processes then does start running on one > CPU? To "go about checking the requirements of every single process" is not so hard as it sounds when they're recorded as a single global use count per slot, as your original code does. When you mentioned a "your allocation is available" callback, I was thinking it might come to that being called inside context switch. It's all rather tricky, indeed. The obvious answer is to start simple. If any user process anywhere uses drN, kwatch has to give it up for all CPUs (watchpoints with less than "break ptrace" priority do). If anyone really cares about more flexibility than that, we can change or extend it. Some copious comments in the interface descriptions can lead them in the right direction if the situation comes up. Probably with systemtap support in a while, we'll get a lot more concrete uses of watchpoints and people finding out what really matters to them. Thanks, Roland - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Fri, 9 Feb 2007, Roland McGrath wrote: > > All right. However this means thread_struct will have to grow in order to > > hold each task's debug-register allocations and priorities. Would that be > > acceptable? (This might be a good reason to keep the number of bits > > down.) > > Well, noone seems to mind the wasted debugreg[5..6] words. ;-) > > I'm inclined to make thread_struct smaller than it is now by making it > indirect (debugreg[8] -> one pointer). It feels like this would be pretty > safe now that we have TIF_DEBUG anyway. Already nothing should be looking > at the field when TIF_DEBUG isn't set. I had the same thought. > > Another question: How can a program using the ptrace API ever give up a > > debug-register allocation? Disabling the register isn't sufficient; a > > user program should be able to store a watchpoint address in dr1, enable > > it in dr7, disable it in dr7, and then re-enable it in the expectation > > that the address stored in dr1 hasn't been overwritten. As far as I can > > see, ptrace-type allocations have to be permanent (until the task exits or > > execs, or uses some other to-be-determined API to do the de-allocation.) > > I hadn't really thought about this before, but it's pretty straightforward. > Each allocation is for one of %dr[0-3] and for its associated bits in %dr7. > %dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc. > %dr7 & (0x000f0003 << N) for %drN, I guess it is. > 1 << DR_CONTROL_SIZE) - 1) << DR_CONTROL_SHIFT) | > ((1 << DR_ENABLE_SIZE) - 1)) << N, I should say. > > Each allocation owns those 38 bits (70 bits on x86_64). When all those > bits are zero, the allocation is not active and might as well not be there > (except for whatever semantics you might want to have about an allocation's > lifetime as distinct from the event of resetting its contents). Okay, that makes sense. > Your question made me think about the %dr6 issue, too, which I also hadn't > thought about before. It looks like your patch handles this correctly, but > it's a subtle point that I think warrants some comments in the code. When > userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug > and eventually looks at dr6 (via ptrace) to see what happened. Kernel > watchpoints that come along after the user signal is generated can clobber > the CPU %dr6 with new hits, but userland should not perceive this. This > works out because what userland sees is thread.debugreg[6], the only place > that sets it is do_debug, and a kwatch hit bails out before changing it. > Any other new users of the debugreg sharing interface need to be cognizant > of the %dr6 issue to avoid stepping on old ptrace uses. Yes. In fact, the current existing code does not handle dr6 correctly. It never clears the register, which means you're likely to get into trouble when multiple breakpoints (or watchpoints) are enabled. Here's another complication -- and this is one I can't figure out any easy solutions for. The type of API we've been discussing will work well enough on UP systems, but what about SMP? I don't see any value in having a kernel watchpoint enabled on some CPUs and not others. But then what should happen when a debug register is in use by kwatch and a ptrace request on one CPU usurps it (leaving no other register in which to put it)? Not to mention the difficulties of keeping track of everything when the same kwatch watchpoint is stored in different debug registers on different CPUs. It's really quite a tricky matter. Should a register be allocated to kwatch only when no user process needs it? Should we really go about checking the requirements of every single process whenever a kwatch allocation request comes in? What if the processes which need a particular register aren't running -- should the register then be given to kwatch? What if one of those processes then does start running on one CPU? Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
> All right. However this means thread_struct will have to grow in order to > hold each task's debug-register allocations and priorities. Would that be > acceptable? (This might be a good reason to keep the number of bits > down.) Well, noone seems to mind the wasted debugreg[5..6] words. ;-) I'm inclined to make thread_struct smaller than it is now by making it indirect (debugreg[8] -> one pointer). It feels like this would be pretty safe now that we have TIF_DEBUG anyway. Already nothing should be looking at the field when TIF_DEBUG isn't set. > Another question: How can a program using the ptrace API ever give up a > debug-register allocation? Disabling the register isn't sufficient; a > user program should be able to store a watchpoint address in dr1, enable > it in dr7, disable it in dr7, and then re-enable it in the expectation > that the address stored in dr1 hasn't been overwritten. As far as I can > see, ptrace-type allocations have to be permanent (until the task exits or > execs, or uses some other to-be-determined API to do the de-allocation.) I hadn't really thought about this before, but it's pretty straightforward. Each allocation is for one of %dr[0-3] and for its associated bits in %dr7. %dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc. %dr7 & (0x000f0003 << N) for %drN, I guess it is. 1 << DR_CONTROL_SIZE) - 1) << DR_CONTROL_SHIFT) | ((1 << DR_ENABLE_SIZE) - 1)) << N, I should say. Each allocation owns those 38 bits (70 bits on x86_64). When all those bits are zero, the allocation is not active and might as well not be there (except for whatever semantics you might want to have about an allocation's lifetime as distinct from the event of resetting its contents). gdb already works this way: when it removes a watchpoint, it clears drN to zero when it zeros all the corresponding bits in dr7. (In fact it's in a separate call after changing dr7, because of the ptrace interface.) Your question made me think about the %dr6 issue, too, which I also hadn't thought about before. It looks like your patch handles this correctly, but it's a subtle point that I think warrants some comments in the code. When userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug and eventually looks at dr6 (via ptrace) to see what happened. Kernel watchpoints that come along after the user signal is generated can clobber the CPU %dr6 with new hits, but userland should not perceive this. This works out because what userland sees is thread.debugreg[6], the only place that sets it is do_debug, and a kwatch hit bails out before changing it. Any other new users of the debugreg sharing interface need to be cognizant of the %dr6 issue to avoid stepping on old ptrace uses. Thanks, Roland - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
All right. However this means thread_struct will have to grow in order to hold each task's debug-register allocations and priorities. Would that be acceptable? (This might be a good reason to keep the number of bits down.) Well, noone seems to mind the wasted debugreg[5..6] words. ;-) I'm inclined to make thread_struct smaller than it is now by making it indirect (debugreg[8] - one pointer). It feels like this would be pretty safe now that we have TIF_DEBUG anyway. Already nothing should be looking at the field when TIF_DEBUG isn't set. Another question: How can a program using the ptrace API ever give up a debug-register allocation? Disabling the register isn't sufficient; a user program should be able to store a watchpoint address in dr1, enable it in dr7, disable it in dr7, and then re-enable it in the expectation that the address stored in dr1 hasn't been overwritten. As far as I can see, ptrace-type allocations have to be permanent (until the task exits or execs, or uses some other to-be-determined API to do the de-allocation.) I hadn't really thought about this before, but it's pretty straightforward. Each allocation is for one of %dr[0-3] and for its associated bits in %dr7. %dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc. %dr7 (0x000f0003 N) for %drN, I guess it is. 1 DR_CONTROL_SIZE) - 1) DR_CONTROL_SHIFT) | ((1 DR_ENABLE_SIZE) - 1)) N, I should say. Each allocation owns those 38 bits (70 bits on x86_64). When all those bits are zero, the allocation is not active and might as well not be there (except for whatever semantics you might want to have about an allocation's lifetime as distinct from the event of resetting its contents). gdb already works this way: when it removes a watchpoint, it clears drN to zero when it zeros all the corresponding bits in dr7. (In fact it's in a separate call after changing dr7, because of the ptrace interface.) Your question made me think about the %dr6 issue, too, which I also hadn't thought about before. It looks like your patch handles this correctly, but it's a subtle point that I think warrants some comments in the code. When userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug and eventually looks at dr6 (via ptrace) to see what happened. Kernel watchpoints that come along after the user signal is generated can clobber the CPU %dr6 with new hits, but userland should not perceive this. This works out because what userland sees is thread.debugreg[6], the only place that sets it is do_debug, and a kwatch hit bails out before changing it. Any other new users of the debugreg sharing interface need to be cognizant of the %dr6 issue to avoid stepping on old ptrace uses. Thanks, Roland - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Fri, 9 Feb 2007, Roland McGrath wrote: All right. However this means thread_struct will have to grow in order to hold each task's debug-register allocations and priorities. Would that be acceptable? (This might be a good reason to keep the number of bits down.) Well, noone seems to mind the wasted debugreg[5..6] words. ;-) I'm inclined to make thread_struct smaller than it is now by making it indirect (debugreg[8] - one pointer). It feels like this would be pretty safe now that we have TIF_DEBUG anyway. Already nothing should be looking at the field when TIF_DEBUG isn't set. I had the same thought. Another question: How can a program using the ptrace API ever give up a debug-register allocation? Disabling the register isn't sufficient; a user program should be able to store a watchpoint address in dr1, enable it in dr7, disable it in dr7, and then re-enable it in the expectation that the address stored in dr1 hasn't been overwritten. As far as I can see, ptrace-type allocations have to be permanent (until the task exits or execs, or uses some other to-be-determined API to do the de-allocation.) I hadn't really thought about this before, but it's pretty straightforward. Each allocation is for one of %dr[0-3] and for its associated bits in %dr7. %dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc. %dr7 (0x000f0003 N) for %drN, I guess it is. 1 DR_CONTROL_SIZE) - 1) DR_CONTROL_SHIFT) | ((1 DR_ENABLE_SIZE) - 1)) N, I should say. Each allocation owns those 38 bits (70 bits on x86_64). When all those bits are zero, the allocation is not active and might as well not be there (except for whatever semantics you might want to have about an allocation's lifetime as distinct from the event of resetting its contents). Okay, that makes sense. Your question made me think about the %dr6 issue, too, which I also hadn't thought about before. It looks like your patch handles this correctly, but it's a subtle point that I think warrants some comments in the code. When userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug and eventually looks at dr6 (via ptrace) to see what happened. Kernel watchpoints that come along after the user signal is generated can clobber the CPU %dr6 with new hits, but userland should not perceive this. This works out because what userland sees is thread.debugreg[6], the only place that sets it is do_debug, and a kwatch hit bails out before changing it. Any other new users of the debugreg sharing interface need to be cognizant of the %dr6 issue to avoid stepping on old ptrace uses. Yes. In fact, the current existing code does not handle dr6 correctly. It never clears the register, which means you're likely to get into trouble when multiple breakpoints (or watchpoints) are enabled. Here's another complication -- and this is one I can't figure out any easy solutions for. The type of API we've been discussing will work well enough on UP systems, but what about SMP? I don't see any value in having a kernel watchpoint enabled on some CPUs and not others. But then what should happen when a debug register is in use by kwatch and a ptrace request on one CPU usurps it (leaving no other register in which to put it)? Not to mention the difficulties of keeping track of everything when the same kwatch watchpoint is stored in different debug registers on different CPUs. It's really quite a tricky matter. Should a register be allocated to kwatch only when no user process needs it? Should we really go about checking the requirements of every single process whenever a kwatch allocation request comes in? What if the processes which need a particular register aren't running -- should the register then be given to kwatch? What if one of those processes then does start running on one CPU? Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
Yes. In fact, the current existing code does not handle dr6 correctly. It never clears the register, which means you're likely to get into trouble when multiple breakpoints (or watchpoints) are enabled. This is a subtle change from the existing ABI, in which userland has to clear %dr6 via ptrace itself. But gdb never does that AFAICT. So it's in fact subject to confusion when two watchpoints are set and the second hits after the first. So gdb ought to be fixed to clear dr6 via ptrace, to work with existing and older kernels. I don't think I really object to the ABI change of clearing %dr6 after an exception so that it does not accumulate multiple results. But first I'll have to convince myself that we never actually do want to accumulate multiple results. Hmm, I think we can, so maybe I do object. If you set two watchpoints inside a user buffer and then do a system call that touches both those addresses (e.g. read), then you will go through do_debug (to send_sigtrap) twice before returning to user mode. When the syscall is done, you'll have a pending SIGTRAP for the debugger to handle. By looking at your %dr6 the debugger can see that both watchpoints hit. (gdb does not handle this case, but it should.) Am I wrong? So this gets to the more complicated view of %dr6 handling that I had first had in mind yesterday. Each allocation owns one of the low 4 bits in %dr6 too. Only the dr6 bits owned by the userland raw allocation (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6]. So when kwatch swallows a debug exception, it should mask off its bit from %dr6 in the CPU, but not clear %dr6 completely. That way you can have a sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one system call (including interrupt handlers), and when it gets to the userland debugger examining dr6 it sees the low 2 bits both set. It's really quite a tricky matter. Should a register be allocated to kwatch only when no user process needs it? Should we really go about checking the requirements of every single process whenever a kwatch allocation request comes in? What if the processes which need a particular register aren't running -- should the register then be given to kwatch? What if one of those processes then does start running on one CPU? To go about checking the requirements of every single process is not so hard as it sounds when they're recorded as a single global use count per slot, as your original code does. When you mentioned a your allocation is available callback, I was thinking it might come to that being called inside context switch. It's all rather tricky, indeed. The obvious answer is to start simple. If any user process anywhere uses drN, kwatch has to give it up for all CPUs (watchpoints with less than break ptrace priority do). If anyone really cares about more flexibility than that, we can change or extend it. Some copious comments in the interface descriptions can lead them in the right direction if the situation comes up. Probably with systemtap support in a while, we'll get a lot more concrete uses of watchpoints and people finding out what really matters to them. Thanks, Roland - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Fri, 9 Feb 2007, Roland McGrath wrote: I don't think I really object to the ABI change of clearing %dr6 after an exception so that it does not accumulate multiple results. But first I'll have to convince myself that we never actually do want to accumulate multiple results. Hmm, I think we can, so maybe I do object. If you set two watchpoints inside a user buffer and then do a system call that touches both those addresses (e.g. read), then you will go through do_debug (to send_sigtrap) twice before returning to user mode. When the syscall is done, you'll have a pending SIGTRAP for the debugger to handle. By looking at your %dr6 the debugger can see that both watchpoints hit. (gdb does not handle this case, but it should.) Am I wrong? I think you're right. So this gets to the more complicated view of %dr6 handling that I had first had in mind yesterday. Each allocation owns one of the low 4 bits in %dr6 too. Only the dr6 bits owned by the userland raw allocation (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6]. So when kwatch swallows a debug exception, it should mask off its bit from %dr6 in the CPU, but not clear %dr6 completely. That way you can have a sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one system call (including interrupt handlers), and when it gets to the userland debugger examining dr6 it sees the low 2 bits both set. Okay; I'll fix this too. Come to think of it, kwatch needs to handle multiple hits as well -- there might be two watchpoints set to the same address. It's really quite a tricky matter. Should a register be allocated to kwatch only when no user process needs it? Should we really go about checking the requirements of every single process whenever a kwatch allocation request comes in? What if the processes which need a particular register aren't running -- should the register then be given to kwatch? What if one of those processes then does start running on one CPU? To go about checking the requirements of every single process is not so hard as it sounds when they're recorded as a single global use count per slot, as your original code does. When you mentioned a your allocation is available callback, I was thinking it might come to that being called inside context switch. It's all rather tricky, indeed. The obvious answer is to start simple. If any user process anywhere uses drN, kwatch has to give it up for all CPUs (watchpoints with less than break ptrace priority do). If anyone really cares about more flexibility than that, we can change or extend it. Some copious comments in the interface descriptions can lead them in the right direction if the situation comes up. Probably with systemtap support in a while, we'll get a lot more concrete uses of watchpoints and people finding out what really matters to them. It's still more complicated than you might think. Let's say two user processes each have dr1 allocated, one with low priority and the other with high priority. The kernel has to be aware of the high-priority allocation, so that it can refuse intermediate-priority kwatch allocation attempts. Now suppose the second process exits. dr1 is still allocated to the first user process but only with low priority, so now intermediate-priority kwatch allocation attempts should succeed. In order for this to work, when the second process gives up its allocation I would have to either scan though all tasks to see the first process, or else keep several global use counts for each slot -- in fact, one use count for each priority level. That's doable if there are only a few levels, but not if there are many. How do you suggest this be handled? Maybe we should just keep track of a maximum user priority level for each slot, allowing it to go up but not down until all user processes have given up the slot. (I.e., in the example above the later kwatch requests would still fail because we would continue to remember the high user priority level so long as the first process maintained its allocation.) That would be overly pessimistic, but it would at least be safe. Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On 2/7/07, Alan Stern <[EMAIL PROTECTED]> wrote: I wonder where this convention of using lower numbers to indicate higher priorities comes from... It seems to be quite prevalent even though it's obviously backwards. I agree but at least in the case of 'nice' it works in the sense that the value increases with increasing niceness. Done the other way, they would have to call it 'mean,' then someone would wonder why 'mean 10 20' prints 'No such file or directory' instead of '15'... Bob - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Tue, 6 Feb 2007, Roland McGrath wrote: > > So for the sake of argument, let's assume that debug registers can be > > assigned with priority values ranging from 0 to 7 (overkill, but who > > cares?). By fiat, ptrace assignments use priority 4. Then kwatch callers > > can request whatever priority they like. The well-behaved cases you've > > been discussing will use priority 0, and the invasive cases can use > > priority 7. (With appropriate symbolic names instead of raw numeric > > values, naturally.) > > Sure. Or make it signed with lower value wins, have ptrace use -1 and the > average bear use 0 or something especially unobtrusive use >0, and > something very obtrusive use -many. I wonder where this convention of using lower numbers to indicate higher priorities comes from... It seems to be quite prevalent even though it's obviously backwards. > Unless you are really going to pack it > into a few bits somewhere, I'd make it an arbitrary int rather than a > special small range; it's just for sort order comparison. Bottom line, I > don't really care about the numerology. Just so "break ptrace", "don't > break ptrace", and "readily get out of the way on demand" can be expressed. > We can always fine-tune it later as there are more concrete users. Okay; I'm not fixated on any particular size. > > Or maybe that's too complicated. Perhaps all userspace assignments should > > always use the same priority level. > > No, I want priorities among user-mode watchpoint users too. ptrace is > rigid, but newer facilities can coexist with ptrace on the same thread and > with kwatch, and do fancy new things to fall back when there is debugreg > allocation pressure. Future user facilities might be able to do VM tricks > that are harder to make workable for kernel mode, for example. All right. However this means thread_struct will have to grow in order to hold each task's debug-register allocations and priorities. Would that be acceptable? (This might be a good reason to keep the number of bits down.) Another question: How can a program using the ptrace API ever give up a debug-register allocation? Disabling the register isn't sufficient; a user program should be able to store a watchpoint address in dr1, enable it in dr7, disable it in dr7, and then re-enable it in the expectation that the address stored in dr1 hasn't been overwritten. As far as I can see, ptrace-type allocations have to be permanent (until the task exits or execs, or uses some other to-be-determined API to do the de-allocation.) Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Tue, 6 Feb 2007, Roland McGrath wrote: So for the sake of argument, let's assume that debug registers can be assigned with priority values ranging from 0 to 7 (overkill, but who cares?). By fiat, ptrace assignments use priority 4. Then kwatch callers can request whatever priority they like. The well-behaved cases you've been discussing will use priority 0, and the invasive cases can use priority 7. (With appropriate symbolic names instead of raw numeric values, naturally.) Sure. Or make it signed with lower value wins, have ptrace use -1 and the average bear use 0 or something especially unobtrusive use 0, and something very obtrusive use -many. I wonder where this convention of using lower numbers to indicate higher priorities comes from... It seems to be quite prevalent even though it's obviously backwards. Unless you are really going to pack it into a few bits somewhere, I'd make it an arbitrary int rather than a special small range; it's just for sort order comparison. Bottom line, I don't really care about the numerology. Just so break ptrace, don't break ptrace, and readily get out of the way on demand can be expressed. We can always fine-tune it later as there are more concrete users. Okay; I'm not fixated on any particular size. Or maybe that's too complicated. Perhaps all userspace assignments should always use the same priority level. No, I want priorities among user-mode watchpoint users too. ptrace is rigid, but newer facilities can coexist with ptrace on the same thread and with kwatch, and do fancy new things to fall back when there is debugreg allocation pressure. Future user facilities might be able to do VM tricks that are harder to make workable for kernel mode, for example. All right. However this means thread_struct will have to grow in order to hold each task's debug-register allocations and priorities. Would that be acceptable? (This might be a good reason to keep the number of bits down.) Another question: How can a program using the ptrace API ever give up a debug-register allocation? Disabling the register isn't sufficient; a user program should be able to store a watchpoint address in dr1, enable it in dr7, disable it in dr7, and then re-enable it in the expectation that the address stored in dr1 hasn't been overwritten. As far as I can see, ptrace-type allocations have to be permanent (until the task exits or execs, or uses some other to-be-determined API to do the de-allocation.) Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On 2/7/07, Alan Stern [EMAIL PROTECTED] wrote: I wonder where this convention of using lower numbers to indicate higher priorities comes from... It seems to be quite prevalent even though it's obviously backwards. I agree but at least in the case of 'nice' it works in the sense that the value increases with increasing niceness. Done the other way, they would have to call it 'mean,' then someone would wonder why 'mean 10 20' prints 'No such file or directory' instead of '15'... Bob - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
> That's good. So I'll assume an updated version of kwatch can be submitted > without regard to the progress of utrace (other than minor conflicts over > the exact location of the ptrace code to change). Indeed. > Right. I had been thinking in terms of a developer using kwatch to track > down some particularly nasty problem, something that would happen rather > infrequently, where one wouldn't care about side effects on user programs. > But of course those side effects might alter an important aspect of the > kernel problem being debugged... This is indeed a way it might reasonably be used. As I said, it's fine for an individual use to be that way. But think also of using it for performance measurement (i.e. "how hot is this counter") in something like systemtap, where you might have long-running instrumentation over arbitrary workloads. > It's also true that the current kwatch version affects the user experience > even when no kernel debugging is going on, as it forcibly prevents ptrace > calls from setting the Global-Enable bits in dr7. That at least can be > fixed quite easily. (On the other hand, userspace should never do > anything other than a Local Enable.) The distinction between local and global here never matters on Linux. We don't use hardware task switching at all, and if we did it would be part of context switch, which already switches in debug register values. The local vs global distinction you have in debugreg allocation (when one Linux task_struct is on the CPU vs always on every CPU) is a machine-independent notion at the level of your debugreg sharing abstraction, and has nothing to do with particular %dr7 bit values (just with the allocation of all the bits in %dr7 that correspond to a particular allocated %drN). > How about a pair of callbacks: One to notify whenever the watchpoint is > enabled and one to notify whenever it is disabled? That sounds fine. You'll want to make sure it's structured so it doesn't get too hairy when a caller wants to just give up and unregister when its slot is unavailable (hopefully shouldn't lead to calling unregister from the callback made inside the register call and such twists). > So for the sake of argument, let's assume that debug registers can be > assigned with priority values ranging from 0 to 7 (overkill, but who > cares?). By fiat, ptrace assignments use priority 4. Then kwatch callers > can request whatever priority they like. The well-behaved cases you've > been discussing will use priority 0, and the invasive cases can use > priority 7. (With appropriate symbolic names instead of raw numeric > values, naturally.) Sure. Or make it signed with lower value wins, have ptrace use -1 and the average bear use 0 or something especially unobtrusive use >0, and something very obtrusive use -many. Unless you are really going to pack it into a few bits somewhere, I'd make it an arbitrary int rather than a special small range; it's just for sort order comparison. Bottom line, I don't really care about the numerology. Just so "break ptrace", "don't break ptrace", and "readily get out of the way on demand" can be expressed. We can always fine-tune it later as there are more concrete users. > Or maybe that's too complicated. Perhaps all userspace assignments should > always use the same priority level. No, I want priorities among user-mode watchpoint users too. ptrace is rigid, but newer facilities can coexist with ptrace on the same thread and with kwatch, and do fancy new things to fall back when there is debugreg allocation pressure. Future user facilities might be able to do VM tricks that are harder to make workable for kernel mode, for example. > For now I would prefer to avoid that. It's true that kwatch is intended > _only_ for kernelspace watchpoints, not userspace. But I'd rather leave > the complications up to someone else. Understood. If you constrain the kwatch interface so it cannot be used with user addresses (checks < TASK_SIZE or whatever), then the problem will be clearly defined as the slightly simpler one whenever someone does come along in need of more complications. > It seems likely that the interfaces added by kwatch will need to be > generalized in various ways in order to handle the requirements of other > architectures. However I don't know what those requirements might be, so > it seems best to start out small with x86 only and leave more refinements > for the future. Agreed, just to keep it in mind. I think the features on other machines are roughly similar except for not offering size choices other than "anywhere in this aligned word". > If I update the patch, adding a priority level and the callback > notifications, do you think it would then be acceptable? I expect so. Thanks, Roland - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Mon, 5 Feb 2007, Roland McGrath wrote: > Sorry I've been slow in giving you feedback on kwatch. No problem (I have plenty of other things to work on!), and thanks for the detailed reply. > > I'll be happy to move this over to the utrace setting, once it is merged. > > Do you think it would be better to include the current version of kwatch > > now or to wait for utrace? > > > > Roland, is there a schedule for when you plan to get utrace into -mm? > > Since you've asked, I'll mention that I've been discussing this with Andrew > lately and we plan to work on merging it into -mm as soon as we can manage. > > The kwatch implementation is pretty much orthogonal to the utrace patch as > it is so far. As you've noted, it doesn't change the nature of the setting > of the debug registers; it only moves around the existing code for setting > them in raw form. Hence it doesn't much matter which order the work is > merged at this stage. There's no reason to withhold kwatch waiting for > utrace. That's good. So I'll assume an updated version of kwatch can be submitted without regard to the progress of utrace (other than minor conflicts over the exact location of the ptrace code to change). > I do have a problem with kwatch, however. The existing user ABI includes > setting all of the debug registers, and this interface has never before > expressed a situation where you can set some but not all of them. Having > ptrace suddenly fail with EBUSY when it never did before is not OK. No > well-behaved kernel-mode tracing/debugging facility should perturb the user > experience in this way. It is certainly understandable that one will > sometimes want to do invasive kernel-mode debugging and on special > occasions choose to be ill-behaved in this way (you might know your > userland work load doesn't include running gdb with watchpoints). > But kwatch as it stands does not even make it possible to write a > well-behaved facility. Right. I had been thinking in terms of a developer using kwatch to track down some particularly nasty problem, something that would happen rather infrequently, where one wouldn't care about side effects on user programs. But of course those side effects might alter an important aspect of the kernel problem being debugged... It's also true that the current kwatch version affects the user experience even when no kernel debugging is going on, as it forcibly prevents ptrace calls from setting the Global-Enable bits in dr7. That at least can be fixed quite easily. (On the other hand, userspace should never do anything other than a Local Enable.) > I am all in favor of a facility to manage shared use of the debug > registers, such as your debugreg.h additions. I just think it needs to be > a little more flexible. An unobtrusive kernel facility has to get out of > the way when user-mode decides to use all its debug registers. It's not > immediately important what it's going to about it when contention arises, > but there has to be a way for the user-mode facilities to say they need to > allocate debugregs with priority and evict other squatters. So, something > like code allocating a debugreg can supply a callback that's made when its > allocation has to taken by something with higher priority. How about a pair of callbacks: One to notify whenever the watchpoint is enabled and one to notify whenever it is disabled? > Even after utrace, there will always be the possibility of a traditional > uncoordinated user of the raw debug registers, if nothing else ptrace > compatibility will always be there for old users. So anything new and > fancy needs to be prepared to back out of the way gracefully. In the case > of kwatch, it can just have a handler function given by the caller to start > with. It's OK if individual callers can specially declare "I am not > well-behaved" and eat debugregs so that well-behaved high-priority users > like ptrace just have to lose (breaking compatibility). But no > well-behaved caller of kwatch will do that. No doubt the future userspace API will include some sort of priority facility. For now, though, ptrace doesn't have anything like it. We just have to assign it an arbitrary intermediate priority. So for the sake of argument, let's assume that debug registers can be assigned with priority values ranging from 0 to 7 (overkill, but who cares?). By fiat, ptrace assignments use priority 4. Then kwatch callers can request whatever priority they like. The well-behaved cases you've been discussing will use priority 0, and the invasive cases can use priority 7. (With appropriate symbolic names instead of raw numeric values, naturally.) Or maybe that's too complicated. Perhaps all userspace assignments should always use the same priority level. After all, it's possible for multiple tasks to allocate the same debug register at the same time -- if they had differing priorities that would make it much more difficult
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Mon, 5 Feb 2007, Roland McGrath wrote: Sorry I've been slow in giving you feedback on kwatch. No problem (I have plenty of other things to work on!), and thanks for the detailed reply. I'll be happy to move this over to the utrace setting, once it is merged. Do you think it would be better to include the current version of kwatch now or to wait for utrace? Roland, is there a schedule for when you plan to get utrace into -mm? Since you've asked, I'll mention that I've been discussing this with Andrew lately and we plan to work on merging it into -mm as soon as we can manage. The kwatch implementation is pretty much orthogonal to the utrace patch as it is so far. As you've noted, it doesn't change the nature of the setting of the debug registers; it only moves around the existing code for setting them in raw form. Hence it doesn't much matter which order the work is merged at this stage. There's no reason to withhold kwatch waiting for utrace. That's good. So I'll assume an updated version of kwatch can be submitted without regard to the progress of utrace (other than minor conflicts over the exact location of the ptrace code to change). I do have a problem with kwatch, however. The existing user ABI includes setting all of the debug registers, and this interface has never before expressed a situation where you can set some but not all of them. Having ptrace suddenly fail with EBUSY when it never did before is not OK. No well-behaved kernel-mode tracing/debugging facility should perturb the user experience in this way. It is certainly understandable that one will sometimes want to do invasive kernel-mode debugging and on special occasions choose to be ill-behaved in this way (you might know your userland work load doesn't include running gdb with watchpoints). But kwatch as it stands does not even make it possible to write a well-behaved facility. Right. I had been thinking in terms of a developer using kwatch to track down some particularly nasty problem, something that would happen rather infrequently, where one wouldn't care about side effects on user programs. But of course those side effects might alter an important aspect of the kernel problem being debugged... It's also true that the current kwatch version affects the user experience even when no kernel debugging is going on, as it forcibly prevents ptrace calls from setting the Global-Enable bits in dr7. That at least can be fixed quite easily. (On the other hand, userspace should never do anything other than a Local Enable.) I am all in favor of a facility to manage shared use of the debug registers, such as your debugreg.h additions. I just think it needs to be a little more flexible. An unobtrusive kernel facility has to get out of the way when user-mode decides to use all its debug registers. It's not immediately important what it's going to about it when contention arises, but there has to be a way for the user-mode facilities to say they need to allocate debugregs with priority and evict other squatters. So, something like code allocating a debugreg can supply a callback that's made when its allocation has to taken by something with higher priority. How about a pair of callbacks: One to notify whenever the watchpoint is enabled and one to notify whenever it is disabled? Even after utrace, there will always be the possibility of a traditional uncoordinated user of the raw debug registers, if nothing else ptrace compatibility will always be there for old users. So anything new and fancy needs to be prepared to back out of the way gracefully. In the case of kwatch, it can just have a handler function given by the caller to start with. It's OK if individual callers can specially declare I am not well-behaved and eat debugregs so that well-behaved high-priority users like ptrace just have to lose (breaking compatibility). But no well-behaved caller of kwatch will do that. No doubt the future userspace API will include some sort of priority facility. For now, though, ptrace doesn't have anything like it. We just have to assign it an arbitrary intermediate priority. So for the sake of argument, let's assume that debug registers can be assigned with priority values ranging from 0 to 7 (overkill, but who cares?). By fiat, ptrace assignments use priority 4. Then kwatch callers can request whatever priority they like. The well-behaved cases you've been discussing will use priority 0, and the invasive cases can use priority 7. (With appropriate symbolic names instead of raw numeric values, naturally.) Or maybe that's too complicated. Perhaps all userspace assignments should always use the same priority level. After all, it's possible for multiple tasks to allocate the same debug register at the same time -- if they had differing priorities that would make it much more difficult to keep things straight. Then there would be only
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
That's good. So I'll assume an updated version of kwatch can be submitted without regard to the progress of utrace (other than minor conflicts over the exact location of the ptrace code to change). Indeed. Right. I had been thinking in terms of a developer using kwatch to track down some particularly nasty problem, something that would happen rather infrequently, where one wouldn't care about side effects on user programs. But of course those side effects might alter an important aspect of the kernel problem being debugged... This is indeed a way it might reasonably be used. As I said, it's fine for an individual use to be that way. But think also of using it for performance measurement (i.e. how hot is this counter) in something like systemtap, where you might have long-running instrumentation over arbitrary workloads. It's also true that the current kwatch version affects the user experience even when no kernel debugging is going on, as it forcibly prevents ptrace calls from setting the Global-Enable bits in dr7. That at least can be fixed quite easily. (On the other hand, userspace should never do anything other than a Local Enable.) The distinction between local and global here never matters on Linux. We don't use hardware task switching at all, and if we did it would be part of context switch, which already switches in debug register values. The local vs global distinction you have in debugreg allocation (when one Linux task_struct is on the CPU vs always on every CPU) is a machine-independent notion at the level of your debugreg sharing abstraction, and has nothing to do with particular %dr7 bit values (just with the allocation of all the bits in %dr7 that correspond to a particular allocated %drN). How about a pair of callbacks: One to notify whenever the watchpoint is enabled and one to notify whenever it is disabled? That sounds fine. You'll want to make sure it's structured so it doesn't get too hairy when a caller wants to just give up and unregister when its slot is unavailable (hopefully shouldn't lead to calling unregister from the callback made inside the register call and such twists). So for the sake of argument, let's assume that debug registers can be assigned with priority values ranging from 0 to 7 (overkill, but who cares?). By fiat, ptrace assignments use priority 4. Then kwatch callers can request whatever priority they like. The well-behaved cases you've been discussing will use priority 0, and the invasive cases can use priority 7. (With appropriate symbolic names instead of raw numeric values, naturally.) Sure. Or make it signed with lower value wins, have ptrace use -1 and the average bear use 0 or something especially unobtrusive use 0, and something very obtrusive use -many. Unless you are really going to pack it into a few bits somewhere, I'd make it an arbitrary int rather than a special small range; it's just for sort order comparison. Bottom line, I don't really care about the numerology. Just so break ptrace, don't break ptrace, and readily get out of the way on demand can be expressed. We can always fine-tune it later as there are more concrete users. Or maybe that's too complicated. Perhaps all userspace assignments should always use the same priority level. No, I want priorities among user-mode watchpoint users too. ptrace is rigid, but newer facilities can coexist with ptrace on the same thread and with kwatch, and do fancy new things to fall back when there is debugreg allocation pressure. Future user facilities might be able to do VM tricks that are harder to make workable for kernel mode, for example. For now I would prefer to avoid that. It's true that kwatch is intended _only_ for kernelspace watchpoints, not userspace. But I'd rather leave the complications up to someone else. Understood. If you constrain the kwatch interface so it cannot be used with user addresses (checks TASK_SIZE or whatever), then the problem will be clearly defined as the slightly simpler one whenever someone does come along in need of more complications. It seems likely that the interfaces added by kwatch will need to be generalized in various ways in order to handle the requirements of other architectures. However I don't know what those requirements might be, so it seems best to start out small with x86 only and leave more refinements for the future. Agreed, just to keep it in mind. I think the features on other machines are roughly similar except for not offering size choices other than anywhere in this aligned word. If I update the patch, adding a priority level and the callback notifications, do you think it would then be acceptable? I expect so. Thanks, Roland - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
Sorry I've been slow in giving you feedback on kwatch. > I'll be happy to move this over to the utrace setting, once it is merged. > Do you think it would be better to include the current version of kwatch > now or to wait for utrace? > > Roland, is there a schedule for when you plan to get utrace into -mm? Since you've asked, I'll mention that I've been discussing this with Andrew lately and we plan to work on merging it into -mm as soon as we can manage. The kwatch implementation is pretty much orthogonal to the utrace patch as it is so far. As you've noted, it doesn't change the nature of the setting of the debug registers; it only moves around the existing code for setting them in raw form. Hence it doesn't much matter which order the work is merged at this stage. There's no reason to withhold kwatch waiting for utrace. I do have a problem with kwatch, however. The existing user ABI includes setting all of the debug registers, and this interface has never before expressed a situation where you can set some but not all of them. Having ptrace suddenly fail with EBUSY when it never did before is not OK. No well-behaved kernel-mode tracing/debugging facility should perturb the user experience in this way. It is certainly understandable that one will sometimes want to do invasive kernel-mode debugging and on special occasions choose to be ill-behaved in this way (you might know your userland work load doesn't include running gdb with watchpoints). But kwatch as it stands does not even make it possible to write a well-behaved facility. I am all in favor of a facility to manage shared use of the debug registers, such as your debugreg.h additions. I just think it needs to be a little more flexible. An unobtrusive kernel facility has to get out of the way when user-mode decides to use all its debug registers. It's not immediately important what it's going to about it when contention arises, but there has to be a way for the user-mode facilities to say they need to allocate debugregs with priority and evict other squatters. So, something like code allocating a debugreg can supply a callback that's made when its allocation has to taken by something with higher priority. Even after utrace, there will always be the possibility of a traditional uncoordinated user of the raw debug registers, if nothing else ptrace compatibility will always be there for old users. So anything new and fancy needs to be prepared to back out of the way gracefully. In the case of kwatch, it can just have a handler function given by the caller to start with. It's OK if individual callers can specially declare "I am not well-behaved" and eat debugregs so that well-behaved high-priority users like ptrace just have to lose (breaking compatibility). But no well-behaved caller of kwatch will do that. As a later improvement, kwatch could try a thing or two to stave off giving up and telling its caller the watchpoint couldn't stay for the current task. For example, if a watchpoint is in kernel memory, you could switch in your debugreg settings on entering the kernel and restore the user watchpoints before returning to user mode. Then you'd need to make get_user et al somehow observe the user-mode watchpoints. But it could be investigated if the need arises. Note that you can already silently do something simple like juggling your kwatch debugreg assignments around if the higher-priority consumer evicting you has left some other debugregs unused. I certainly intend for later features based on utrace to include higher-level treatment of watchpoints so that user debugging facilities can also become responsive to debugreg allocation pressure. (Eventually, the user facilities might have easier ways of falling back to other methods and getting out of the way of kernel debugreg consumers, than can be done for the kernel-mode-tracing facilities.) To that end, I'd like to see a clear and robust interface for debugreg sharing, below the level of kwatch. I'd also like to see a thin layer on that giving a machine-independent kernel source API for talking about watchpoints, which you pretty much have rolled into the kwatch interface now. But these are further refinements, not barriers to including kwatch. Also, an unrelated minor point. I think it's error-prone to have an integer argument to unregister_kwatch. I think it makes most sense to have the caller provide the space and call register/unregister with a pointer, in the style of kprobes. Thanks, Roland - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
Sorry I've been slow in giving you feedback on kwatch. I'll be happy to move this over to the utrace setting, once it is merged. Do you think it would be better to include the current version of kwatch now or to wait for utrace? Roland, is there a schedule for when you plan to get utrace into -mm? Since you've asked, I'll mention that I've been discussing this with Andrew lately and we plan to work on merging it into -mm as soon as we can manage. The kwatch implementation is pretty much orthogonal to the utrace patch as it is so far. As you've noted, it doesn't change the nature of the setting of the debug registers; it only moves around the existing code for setting them in raw form. Hence it doesn't much matter which order the work is merged at this stage. There's no reason to withhold kwatch waiting for utrace. I do have a problem with kwatch, however. The existing user ABI includes setting all of the debug registers, and this interface has never before expressed a situation where you can set some but not all of them. Having ptrace suddenly fail with EBUSY when it never did before is not OK. No well-behaved kernel-mode tracing/debugging facility should perturb the user experience in this way. It is certainly understandable that one will sometimes want to do invasive kernel-mode debugging and on special occasions choose to be ill-behaved in this way (you might know your userland work load doesn't include running gdb with watchpoints). But kwatch as it stands does not even make it possible to write a well-behaved facility. I am all in favor of a facility to manage shared use of the debug registers, such as your debugreg.h additions. I just think it needs to be a little more flexible. An unobtrusive kernel facility has to get out of the way when user-mode decides to use all its debug registers. It's not immediately important what it's going to about it when contention arises, but there has to be a way for the user-mode facilities to say they need to allocate debugregs with priority and evict other squatters. So, something like code allocating a debugreg can supply a callback that's made when its allocation has to taken by something with higher priority. Even after utrace, there will always be the possibility of a traditional uncoordinated user of the raw debug registers, if nothing else ptrace compatibility will always be there for old users. So anything new and fancy needs to be prepared to back out of the way gracefully. In the case of kwatch, it can just have a handler function given by the caller to start with. It's OK if individual callers can specially declare I am not well-behaved and eat debugregs so that well-behaved high-priority users like ptrace just have to lose (breaking compatibility). But no well-behaved caller of kwatch will do that. As a later improvement, kwatch could try a thing or two to stave off giving up and telling its caller the watchpoint couldn't stay for the current task. For example, if a watchpoint is in kernel memory, you could switch in your debugreg settings on entering the kernel and restore the user watchpoints before returning to user mode. Then you'd need to make get_user et al somehow observe the user-mode watchpoints. But it could be investigated if the need arises. Note that you can already silently do something simple like juggling your kwatch debugreg assignments around if the higher-priority consumer evicting you has left some other debugregs unused. I certainly intend for later features based on utrace to include higher-level treatment of watchpoints so that user debugging facilities can also become responsive to debugreg allocation pressure. (Eventually, the user facilities might have easier ways of falling back to other methods and getting out of the way of kernel debugreg consumers, than can be done for the kernel-mode-tracing facilities.) To that end, I'd like to see a clear and robust interface for debugreg sharing, below the level of kwatch. I'd also like to see a thin layer on that giving a machine-independent kernel source API for talking about watchpoints, which you pretty much have rolled into the kwatch interface now. But these are further refinements, not barriers to including kwatch. Also, an unrelated minor point. I think it's error-prone to have an integer argument to unregister_kwatch. I think it makes most sense to have the caller provide the space and call register/unregister with a pointer, in the style of kprobes. Thanks, Roland - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Thu, Jan 18, 2007 at 08:31:59AM +0100, Ingo Molnar wrote: > > * Christoph Hellwig <[EMAIL PROTECTED]> wrote: > > > > I'll be happy to move this over to the utrace setting, once it is > > > merged. Do you think it would be better to include the current > > > version of kwatch now or to wait for utrace? > > > > > > Roland, is there a schedule for when you plan to get utrace into > > > -mm? > > > > Even if it goes into mainline soon we'll need a lot of time for all > > architectures to catch up, so I think kwatch should definitely comes > > first. > > i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the > /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many > years, precisely because it was done in the 'oh, lets get this feature > added first, think about it later' manner. Roland's work is a large > logistical undertaking and we should not make it more complex than it > is. Once it's in we can add debugging features ontop of that. To me work > that cleans up existing mess takes precedence before work that adds to > the mess. Utrace doesn't provide any kind of watchpoint infrastructure now, and utrace will take a lot of time to get ready for inclusion, mostly because it really needs asll the arch maintainers to help out (and various not so easy core fixes aswell). I'm all for merging utrace, and I wish we'd be much further into the merging process already, but blocking mostly unrelated functionality for it is more than dumb. > ps. please fix your mailer to not emit those silly Mail-Followup-To > headers! It collapses To: and Cc: lines into one huge unnecessary To: > line. This header is absolutely intentation as far too many folks seem to randomly drop to or cc lines on mailing lists. and of course it's alsmost esential on lists with braindead reply to list policies (e.g. Debian) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Thu, 18 Jan 2007, Ingo Molnar wrote: > > * Christoph Hellwig <[EMAIL PROTECTED]> wrote: > > > > I'll be happy to move this over to the utrace setting, once it is > > > merged. Do you think it would be better to include the current > > > version of kwatch now or to wait for utrace? > > > > > > Roland, is there a schedule for when you plan to get utrace into > > > -mm? > > > > Even if it goes into mainline soon we'll need a lot of time for all > > architectures to catch up, so I think kwatch should definitely comes > > first. > > i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the > /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many > years, precisely because it was done in the 'oh, lets get this feature > added first, think about it later' manner. Roland's work is a large > logistical undertaking and we should not make it more complex than it > is. Once it's in we can add debugging features ontop of that. To me work > that cleans up existing mess takes precedence before work that adds to > the mess. Interestingly, the current version of utrace makes no special provision for watchpoints, either in kernel or user space. Instead it relies on the legacy ptrace mechanism for setting debug registers in the target process's user area. Perhaps an explicit watchpoint implementation should be added to utrace, but that's beyond the scope of this discussion. Furthermore, utrace is explicitly intended for tracing user programs, not for tracing the kernel. Kwatch, however, is just the opposite: It is intended for setting up watchpoints in kernel space. In that sense it is pretty much orthogonal to utrace. Although it would affect the utrace patches, the changes would be basically transparent (i.e., move the new code from one ptrace handler to another instead of moving the old code). If Kwatch is to be subsumed anywhere, I think it should be under the Kprobes/Systemtap project. Again, that's a separate question -- so far they have avoided data watchpoints. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Thu, 18 Jan 2007, Ingo Molnar wrote: * Christoph Hellwig [EMAIL PROTECTED] wrote: I'll be happy to move this over to the utrace setting, once it is merged. Do you think it would be better to include the current version of kwatch now or to wait for utrace? Roland, is there a schedule for when you plan to get utrace into -mm? Even if it goes into mainline soon we'll need a lot of time for all architectures to catch up, so I think kwatch should definitely comes first. i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many years, precisely because it was done in the 'oh, lets get this feature added first, think about it later' manner. Roland's work is a large logistical undertaking and we should not make it more complex than it is. Once it's in we can add debugging features ontop of that. To me work that cleans up existing mess takes precedence before work that adds to the mess. Interestingly, the current version of utrace makes no special provision for watchpoints, either in kernel or user space. Instead it relies on the legacy ptrace mechanism for setting debug registers in the target process's user area. Perhaps an explicit watchpoint implementation should be added to utrace, but that's beyond the scope of this discussion. Furthermore, utrace is explicitly intended for tracing user programs, not for tracing the kernel. Kwatch, however, is just the opposite: It is intended for setting up watchpoints in kernel space. In that sense it is pretty much orthogonal to utrace. Although it would affect the utrace patches, the changes would be basically transparent (i.e., move the new code from one ptrace handler to another instead of moving the old code). If Kwatch is to be subsumed anywhere, I think it should be under the Kprobes/Systemtap project. Again, that's a separate question -- so far they have avoided data watchpoints. Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Thu, Jan 18, 2007 at 08:31:59AM +0100, Ingo Molnar wrote: * Christoph Hellwig [EMAIL PROTECTED] wrote: I'll be happy to move this over to the utrace setting, once it is merged. Do you think it would be better to include the current version of kwatch now or to wait for utrace? Roland, is there a schedule for when you plan to get utrace into -mm? Even if it goes into mainline soon we'll need a lot of time for all architectures to catch up, so I think kwatch should definitely comes first. i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many years, precisely because it was done in the 'oh, lets get this feature added first, think about it later' manner. Roland's work is a large logistical undertaking and we should not make it more complex than it is. Once it's in we can add debugging features ontop of that. To me work that cleans up existing mess takes precedence before work that adds to the mess. Utrace doesn't provide any kind of watchpoint infrastructure now, and utrace will take a lot of time to get ready for inclusion, mostly because it really needs asll the arch maintainers to help out (and various not so easy core fixes aswell). I'm all for merging utrace, and I wish we'd be much further into the merging process already, but blocking mostly unrelated functionality for it is more than dumb. ps. please fix your mailer to not emit those silly Mail-Followup-To headers! It collapses To: and Cc: lines into one huge unnecessary To: line. This header is absolutely intentation as far too many folks seem to randomly drop to or cc lines on mailing lists. and of course it's alsmost esential on lists with braindead reply to list policies (e.g. Debian) - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
* Christoph Hellwig <[EMAIL PROTECTED]> wrote: > > I'll be happy to move this over to the utrace setting, once it is > > merged. Do you think it would be better to include the current > > version of kwatch now or to wait for utrace? > > > > Roland, is there a schedule for when you plan to get utrace into > > -mm? > > Even if it goes into mainline soon we'll need a lot of time for all > architectures to catch up, so I think kwatch should definitely comes > first. i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many years, precisely because it was done in the 'oh, lets get this feature added first, think about it later' manner. Roland's work is a large logistical undertaking and we should not make it more complex than it is. Once it's in we can add debugging features ontop of that. To me work that cleans up existing mess takes precedence before work that adds to the mess. Ingo ps. please fix your mailer to not emit those silly Mail-Followup-To headers! It collapses To: and Cc: lines into one huge unnecessary To: line. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Wed, Jan 17, 2007 at 11:17:37AM -0500, Alan Stern wrote: > I'll be happy to move this over to the utrace setting, once it is merged. > Do you think it would be better to include the current version of kwatch > now or to wait for utrace? > > Roland, is there a schedule for when you plan to get utrace into -mm? Even if it goes into mainline soon we'll need a lot of time for all architectures to catch up, so I think kwatch should definitely comes first. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Tue, 16 Jan 2007, Christoph Hellwig wrote: > Fir4st I'd say thanks a lot for forward-porting this, it's really useful > feature for all kinds of nasty debugging. > > I think you should split this into two patches, one for the debugreg > infrastructure, and one for the actual kwatch code. > > Also I think you provide one (or even a few) example wathes for > trivial things, say updating i_ino for an inode given through debugfs. > > Some comments on the code below: Many thanks for your detailed comments and suggestions. It probably was obvious that most of the things you picked up on were inherited from the original Kwatch patch. I'll update my patch in accordance with your suggestions. Responses to just a couple of the comments: > I suspect this should be replaced wit ha global and local variant > to fix the above mentioned issue. It's a tiny bit duplicated code, > but seems much cleaner. It would indeed be cleaner. And in fact the local variant would have a large amount of dead code, which could be left out entirely (at least from the initial version). That's because the only current user of local debug register allocations is ptrace. > > +static void write_dr(int debugreg, unsigned long addr) > > +{ > > + switch (debugreg) { > > + case 0: set_debugreg(addr, 0); break; > > + case 1: set_debugreg(addr, 1); break; > > + case 2: set_debugreg(addr, 2); break; > > + case 3: set_debugreg(addr, 3); break; > > + case 6: set_debugreg(addr, 6); break; > > + case 7: set_debugreg(addr, 7); break; > > + } > > +} > > What's the point of this wrapper? It is called from two different places, and it's better than including the "switch" in each place. > I think large parts of this header should go into a new linux/kwatch.h > so that generic code can use kwatches. In the long run that may well be true. For now, I'm a little hesitant to put something which works only on i386 under include/linux. > > +config KWATCH > > + bool "Kwatch points (EXPERIMENTAL)" > > + depends on EXPERIMENTAL > > + help > > + Kwatch enables kernel-space data watchpoints using the processor's > > + debug registers. It can be very useful for kernel debugging. > > + If in doubt, say "N". > > I think we want different options for debugregs and kwatch. The debugreg > one probably doesn't have to be actually user-visible, though. It's easier to start out like this and then change it later when someone comes up with another use for debugregs. Or perhaps by then the whole thing will have been moved over to utrace, making the issue academic. Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Wed, 17 Jan 2007, Ingo Molnar wrote: > * Alan Stern <[EMAIL PROTECTED]> wrote: > > > From: Alan Stern <[EMAIL PROTECTED]> > > > > This patch (as839) implements the Kwatch (kernel-space hardware-based > > watchpoints) API for the i386 architecture. The API is explained in > > the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c. > > i think it would be nice to have this ontop of Roland's utrace > infrastructure, which nicely modularizes all hardware debugging > capabilities and detaches it from ptrace. I'll be happy to move this over to the utrace setting, once it is merged. Do you think it would be better to include the current version of kwatch now or to wait for utrace? Roland, is there a schedule for when you plan to get utrace into -mm? Alan Stern - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
* Alan Stern <[EMAIL PROTECTED]> wrote: > From: Alan Stern <[EMAIL PROTECTED]> > > This patch (as839) implements the Kwatch (kernel-space hardware-based > watchpoints) API for the i386 architecture. The API is explained in > the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c. i think it would be nice to have this ontop of Roland's utrace infrastructure, which nicely modularizes all hardware debugging capabilities and detaches it from ptrace. Ingo - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
* Alan Stern [EMAIL PROTECTED] wrote: From: Alan Stern [EMAIL PROTECTED] This patch (as839) implements the Kwatch (kernel-space hardware-based watchpoints) API for the i386 architecture. The API is explained in the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c. i think it would be nice to have this ontop of Roland's utrace infrastructure, which nicely modularizes all hardware debugging capabilities and detaches it from ptrace. Ingo - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Wed, 17 Jan 2007, Ingo Molnar wrote: * Alan Stern [EMAIL PROTECTED] wrote: From: Alan Stern [EMAIL PROTECTED] This patch (as839) implements the Kwatch (kernel-space hardware-based watchpoints) API for the i386 architecture. The API is explained in the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c. i think it would be nice to have this ontop of Roland's utrace infrastructure, which nicely modularizes all hardware debugging capabilities and detaches it from ptrace. I'll be happy to move this over to the utrace setting, once it is merged. Do you think it would be better to include the current version of kwatch now or to wait for utrace? Roland, is there a schedule for when you plan to get utrace into -mm? Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Tue, 16 Jan 2007, Christoph Hellwig wrote: Fir4st I'd say thanks a lot for forward-porting this, it's really useful feature for all kinds of nasty debugging. I think you should split this into two patches, one for the debugreg infrastructure, and one for the actual kwatch code. Also I think you provide one (or even a few) example wathes for trivial things, say updating i_ino for an inode given through debugfs. Some comments on the code below: Many thanks for your detailed comments and suggestions. It probably was obvious that most of the things you picked up on were inherited from the original Kwatch patch. I'll update my patch in accordance with your suggestions. Responses to just a couple of the comments: I suspect this should be replaced wit ha global and local variant to fix the above mentioned issue. It's a tiny bit duplicated code, but seems much cleaner. It would indeed be cleaner. And in fact the local variant would have a large amount of dead code, which could be left out entirely (at least from the initial version). That's because the only current user of local debug register allocations is ptrace. +static void write_dr(int debugreg, unsigned long addr) +{ + switch (debugreg) { + case 0: set_debugreg(addr, 0); break; + case 1: set_debugreg(addr, 1); break; + case 2: set_debugreg(addr, 2); break; + case 3: set_debugreg(addr, 3); break; + case 6: set_debugreg(addr, 6); break; + case 7: set_debugreg(addr, 7); break; + } +} What's the point of this wrapper? It is called from two different places, and it's better than including the switch in each place. I think large parts of this header should go into a new linux/kwatch.h so that generic code can use kwatches. In the long run that may well be true. For now, I'm a little hesitant to put something which works only on i386 under include/linux. +config KWATCH + bool Kwatch points (EXPERIMENTAL) + depends on EXPERIMENTAL + help + Kwatch enables kernel-space data watchpoints using the processor's + debug registers. It can be very useful for kernel debugging. + If in doubt, say N. I think we want different options for debugregs and kwatch. The debugreg one probably doesn't have to be actually user-visible, though. It's easier to start out like this and then change it later when someone comes up with another use for debugregs. Or perhaps by then the whole thing will have been moved over to utrace, making the issue academic. Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
On Wed, Jan 17, 2007 at 11:17:37AM -0500, Alan Stern wrote: I'll be happy to move this over to the utrace setting, once it is merged. Do you think it would be better to include the current version of kwatch now or to wait for utrace? Roland, is there a schedule for when you plan to get utrace into -mm? Even if it goes into mainline soon we'll need a lot of time for all architectures to catch up, so I think kwatch should definitely comes first. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
* Christoph Hellwig [EMAIL PROTECTED] wrote: I'll be happy to move this over to the utrace setting, once it is merged. Do you think it would be better to include the current version of kwatch now or to wait for utrace? Roland, is there a schedule for when you plan to get utrace into -mm? Even if it goes into mainline soon we'll need a lot of time for all architectures to catch up, so I think kwatch should definitely comes first. i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many years, precisely because it was done in the 'oh, lets get this feature added first, think about it later' manner. Roland's work is a large logistical undertaking and we should not make it more complex than it is. Once it's in we can add debugging features ontop of that. To me work that cleans up existing mess takes precedence before work that adds to the mess. Ingo ps. please fix your mailer to not emit those silly Mail-Followup-To headers! It collapses To: and Cc: lines into one huge unnecessary To: line. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
Fir4st I'd say thanks a lot for forward-porting this, it's really useful feature for all kinds of nasty debugging. I think you should split this into two patches, one for the debugreg infrastructure, and one for the actual kwatch code. Also I think you provide one (or even a few) example wathes for trivial things, say updating i_ino for an inode given through debugfs. Some comments on the code below: > --- /dev/null > +++ usb-2.6/arch/i386/kernel/debugreg.c > @@ -0,0 +1,182 @@ > +/* > + * Debug register > + * arch/i386/kernel/debugreg.c Please don't put in comments that mention the name of the containing file. Also the "Debug register" comments seems rather useless. > + * 2002-Oct Created by Vamsi Krishna S <[EMAIL PROTECTED]> and > + * Bharata Rao <[EMAIL PROTECTED]> to provide debug register > + * allocation mechanism. > + * 2004-Oct Updated by Prasanna S Panchamukhi <[EMAIL PROTECTED]> with > + * idr_allocations mechanism as suggested by Andi Kleen. I think these kinds of comments aren't in fashion anymore either, all changelogs should be in git commit messages and initial credits go into the first commit message. > +struct debugreg dr_list[DR_MAX]; > +static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED; I think you're supposed to use magic DEFINE_SPINLOCK macro these days. > +unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN | > + DR_GLOBAL_ENABLE_MASK; I'd rahter keep this static and make set_process_dr7 a non-inline function. > + > +static unsigned long dr7_global_reg_mask(unsigned int regnum) > +{ > + return (0xf << (16 + regnum * 4)) | (0x1 << (regnum * 2)); > +} > + > +static int get_dr(int regnum, int flag) > +{ > + if (flag == DR_ALLOC_GLOBAL && !dr_list[regnum].flag) { > + dr_list[regnum].flag = flag; > + dr7_global_mask |= dr7_global_reg_mask(regnum); > + return regnum; > + } > + if (flag == DR_ALLOC_LOCAL && > + dr_list[regnum].flag != DR_ALLOC_GLOBAL) { > + dr_list[regnum].flag = flag; > + dr_list[regnum].use_count++; > + return regnum; > + } > + return -1; This looks rather poorly structured, as the function does compltely different things depending on the flags passed in. > +static void free_dr(int regnum) > +{ > + if (dr_list[regnum].flag == DR_ALLOC_LOCAL) { > + if (!--dr_list[regnum].use_count) > + dr_list[regnum].flag = 0; > + } else { > + dr_list[regnum].flag = 0; > + dr_list[regnum].use_count = 0; > + dr7_global_mask &= ~(dr7_global_reg_mask(regnum)); > + } > +} Same here. > +int dr_alloc(int regnum, int flag) > +{ > + int ret = -1; > + > + spin_lock(_lock); > + if (regnum >= 0 && regnum < DR_MAX) > + ret = get_dr(regnum, flag); > + else if (regnum == DR_ANY) { > + > + /* gdb allocates local debug registers starting from 0. > + * To help avoid conflicts, we'll start from the other end. > + */ > + for (regnum = DR_MAX - 1; regnum >= 0; --regnum) { > + ret = get_dr(regnum, flag); > + if (ret >= 0) > + break; > + } > + } else > + printk(KERN_ERR "dr_alloc: " > + "Cannot allocate debug register %d\n", regnum); > + spin_unlock(_lock); > + return ret; I suspect this should be replaced wit ha global and local variant to fix the above mentioned issue. It's a tiny bit duplicated code, but seems much cleaner. > +static int get_dr(int regnum, int flag) > +{ > + if (flag == DR_ALLOC_GLOBAL && !dr_list[regnum].flag) { > + dr_list[regnum].flag = flag; > + dr7_global_mask |= dr7_global_reg_mask(regnum); > + return regnum; > + } > + if (flag == DR_ALLOC_LOCAL && > + dr_list[regnum].flag != DR_ALLOC_GLOBAL) { > + dr_list[regnum].flag = flag; > + dr_list[regnum].use_count++; > + return regnum; > + } > + return -1; Same comments about global vs local here. > + > +EXPORT_SYMBOL(dr_alloc); > +EXPORT_SYMBOL(dr_free); I don't think we want these exported at all, and if a proper modular user shows up they should be _GPL as they're fairly lowlevel. Btw, the naming in the whole debugregs code should be consolidated to be debugreg_ instead of all kinds of different variants. > +#ifdef CONFIG_KWATCH > + > +/* Set the type, len and global flag in dr7 for a debug register */ > +#define SET_DR7(dr, regnum, type, len) do {\ > + dr &= ~(0xf << (16 + (regnum) * 4));\ > + dr |= (len) - 1) << 2) | (type)) << \ > + (16 + (regnum) * 4)) | \ > + (0x2 << ((regnum) * 2));\ > + } while (0) > + > +/* Disable a debug
[PATCH] Kwatch: kernel watchpoints using CPU debug registers
From: Alan Stern <[EMAIL PROTECTED]> This patch (as839) implements the Kwatch (kernel-space hardware-based watchpoints) API for the i386 architecture. The API is explained in the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c. The original version of the patch was written by Vamsi Krishna S and Bharata Rao. It was later updated by Prasanna S Panchamukhi for 2.6.13 and then again by me for 2.6.20. Signed-off-by: Prasanna S Panchamukhi <[EMAIL PROTECTED]> Signed-off-by: Alan Stern <[EMAIL PROTECTED]> --- Hardware-based watchpoints can sometimes be indispensable for finding the source of problems. Although this patch is only for the x86 architecture, it should still be useful. And there's no downside to adopting it, since it has virtually no overhead with CONFIG_KWATCH isn't selected. Index: usb-2.6/arch/i386/kernel/debugreg.c === --- /dev/null +++ usb-2.6/arch/i386/kernel/debugreg.c @@ -0,0 +1,182 @@ +/* + * Debug register + * arch/i386/kernel/debugreg.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2002, 2004 + * + * 2002-OctCreated by Vamsi Krishna S <[EMAIL PROTECTED]> and + * Bharata Rao <[EMAIL PROTECTED]> to provide debug register + * allocation mechanism. + * 2004-OctUpdated by Prasanna S Panchamukhi <[EMAIL PROTECTED]> with + * idr_allocations mechanism as suggested by Andi Kleen. + */ + +/* + * These routines provide a debug register allocation mechanism. + */ + +#include +#include +#include +#include +#include + +struct debugreg { + int flag; + int use_count; +}; + +struct debugreg dr_list[DR_MAX]; +static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED; +unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN | + DR_GLOBAL_ENABLE_MASK; + +static unsigned long dr7_global_reg_mask(unsigned int regnum) +{ + return (0xf << (16 + regnum * 4)) | (0x1 << (regnum * 2)); +} + +static int get_dr(int regnum, int flag) +{ + if (flag == DR_ALLOC_GLOBAL && !dr_list[regnum].flag) { + dr_list[regnum].flag = flag; + dr7_global_mask |= dr7_global_reg_mask(regnum); + return regnum; + } + if (flag == DR_ALLOC_LOCAL && + dr_list[regnum].flag != DR_ALLOC_GLOBAL) { + dr_list[regnum].flag = flag; + dr_list[regnum].use_count++; + return regnum; + } + return -1; +} + +static void free_dr(int regnum) +{ + if (dr_list[regnum].flag == DR_ALLOC_LOCAL) { + if (!--dr_list[regnum].use_count) + dr_list[regnum].flag = 0; + } else { + dr_list[regnum].flag = 0; + dr_list[regnum].use_count = 0; + dr7_global_mask &= ~(dr7_global_reg_mask(regnum)); + } +} + +int dr_alloc(int regnum, int flag) +{ + int ret = -1; + + spin_lock(_lock); + if (regnum >= 0 && regnum < DR_MAX) + ret = get_dr(regnum, flag); + else if (regnum == DR_ANY) { + + /* gdb allocates local debug registers starting from 0. +* To help avoid conflicts, we'll start from the other end. +*/ + for (regnum = DR_MAX - 1; regnum >= 0; --regnum) { + ret = get_dr(regnum, flag); + if (ret >= 0) + break; + } + } else + printk(KERN_ERR "dr_alloc: " + "Cannot allocate debug register %d\n", regnum); + spin_unlock(_lock); + return ret; +} + +void dr_free(int regnum) +{ + spin_lock(_lock); + if (regnum < 0 || regnum >= DR_MAX || !dr_list[regnum].flag) + printk(KERN_ERR "dr_free: " + "Cannot free debug register %d\n", regnum); + else + free_dr(regnum); + spin_unlock(_lock); +} + +void dr_inc_use_count(unsigned long mask) +{ + int i; + int dr_local_enable = 1 << DR_LOCAL_ENABLE_SHIFT; + + spin_lock(_lock); + for (i = 0; i < DR_MAX; (++i, dr_local_enable <<= DR_ENABLE_SIZE)) { + if (mask & dr_local_enable) +
[PATCH] Kwatch: kernel watchpoints using CPU debug registers
From: Alan Stern [EMAIL PROTECTED] This patch (as839) implements the Kwatch (kernel-space hardware-based watchpoints) API for the i386 architecture. The API is explained in the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c. The original version of the patch was written by Vamsi Krishna S and Bharata Rao. It was later updated by Prasanna S Panchamukhi for 2.6.13 and then again by me for 2.6.20. Signed-off-by: Prasanna S Panchamukhi [EMAIL PROTECTED] Signed-off-by: Alan Stern [EMAIL PROTECTED] --- Hardware-based watchpoints can sometimes be indispensable for finding the source of problems. Although this patch is only for the x86 architecture, it should still be useful. And there's no downside to adopting it, since it has virtually no overhead with CONFIG_KWATCH isn't selected. Index: usb-2.6/arch/i386/kernel/debugreg.c === --- /dev/null +++ usb-2.6/arch/i386/kernel/debugreg.c @@ -0,0 +1,182 @@ +/* + * Debug register + * arch/i386/kernel/debugreg.c + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2002, 2004 + * + * 2002-OctCreated by Vamsi Krishna S [EMAIL PROTECTED] and + * Bharata Rao [EMAIL PROTECTED] to provide debug register + * allocation mechanism. + * 2004-OctUpdated by Prasanna S Panchamukhi [EMAIL PROTECTED] with + * idr_allocations mechanism as suggested by Andi Kleen. + */ + +/* + * These routines provide a debug register allocation mechanism. + */ + +#include linux/kernel.h +#include linux/spinlock.h +#include linux/module.h +#include asm/system.h +#include asm/debugreg.h + +struct debugreg { + int flag; + int use_count; +}; + +struct debugreg dr_list[DR_MAX]; +static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED; +unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN | + DR_GLOBAL_ENABLE_MASK; + +static unsigned long dr7_global_reg_mask(unsigned int regnum) +{ + return (0xf (16 + regnum * 4)) | (0x1 (regnum * 2)); +} + +static int get_dr(int regnum, int flag) +{ + if (flag == DR_ALLOC_GLOBAL !dr_list[regnum].flag) { + dr_list[regnum].flag = flag; + dr7_global_mask |= dr7_global_reg_mask(regnum); + return regnum; + } + if (flag == DR_ALLOC_LOCAL + dr_list[regnum].flag != DR_ALLOC_GLOBAL) { + dr_list[regnum].flag = flag; + dr_list[regnum].use_count++; + return regnum; + } + return -1; +} + +static void free_dr(int regnum) +{ + if (dr_list[regnum].flag == DR_ALLOC_LOCAL) { + if (!--dr_list[regnum].use_count) + dr_list[regnum].flag = 0; + } else { + dr_list[regnum].flag = 0; + dr_list[regnum].use_count = 0; + dr7_global_mask = ~(dr7_global_reg_mask(regnum)); + } +} + +int dr_alloc(int regnum, int flag) +{ + int ret = -1; + + spin_lock(dr_lock); + if (regnum = 0 regnum DR_MAX) + ret = get_dr(regnum, flag); + else if (regnum == DR_ANY) { + + /* gdb allocates local debug registers starting from 0. +* To help avoid conflicts, we'll start from the other end. +*/ + for (regnum = DR_MAX - 1; regnum = 0; --regnum) { + ret = get_dr(regnum, flag); + if (ret = 0) + break; + } + } else + printk(KERN_ERR dr_alloc: + Cannot allocate debug register %d\n, regnum); + spin_unlock(dr_lock); + return ret; +} + +void dr_free(int regnum) +{ + spin_lock(dr_lock); + if (regnum 0 || regnum = DR_MAX || !dr_list[regnum].flag) + printk(KERN_ERR dr_free: + Cannot free debug register %d\n, regnum); + else + free_dr(regnum); + spin_unlock(dr_lock); +} + +void dr_inc_use_count(unsigned long mask) +{ + int i; + int dr_local_enable = 1 DR_LOCAL_ENABLE_SHIFT; + + spin_lock(dr_lock); + for (i = 0; i DR_MAX; (++i, dr_local_enable = DR_ENABLE_SIZE)) { +
Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers
Fir4st I'd say thanks a lot for forward-porting this, it's really useful feature for all kinds of nasty debugging. I think you should split this into two patches, one for the debugreg infrastructure, and one for the actual kwatch code. Also I think you provide one (or even a few) example wathes for trivial things, say updating i_ino for an inode given through debugfs. Some comments on the code below: --- /dev/null +++ usb-2.6/arch/i386/kernel/debugreg.c @@ -0,0 +1,182 @@ +/* + * Debug register + * arch/i386/kernel/debugreg.c Please don't put in comments that mention the name of the containing file. Also the Debug register comments seems rather useless. + * 2002-Oct Created by Vamsi Krishna S [EMAIL PROTECTED] and + * Bharata Rao [EMAIL PROTECTED] to provide debug register + * allocation mechanism. + * 2004-Oct Updated by Prasanna S Panchamukhi [EMAIL PROTECTED] with + * idr_allocations mechanism as suggested by Andi Kleen. I think these kinds of comments aren't in fashion anymore either, all changelogs should be in git commit messages and initial credits go into the first commit message. +struct debugreg dr_list[DR_MAX]; +static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED; I think you're supposed to use magic DEFINE_SPINLOCK macro these days. +unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN | + DR_GLOBAL_ENABLE_MASK; I'd rahter keep this static and make set_process_dr7 a non-inline function. + +static unsigned long dr7_global_reg_mask(unsigned int regnum) +{ + return (0xf (16 + regnum * 4)) | (0x1 (regnum * 2)); +} + +static int get_dr(int regnum, int flag) +{ + if (flag == DR_ALLOC_GLOBAL !dr_list[regnum].flag) { + dr_list[regnum].flag = flag; + dr7_global_mask |= dr7_global_reg_mask(regnum); + return regnum; + } + if (flag == DR_ALLOC_LOCAL + dr_list[regnum].flag != DR_ALLOC_GLOBAL) { + dr_list[regnum].flag = flag; + dr_list[regnum].use_count++; + return regnum; + } + return -1; This looks rather poorly structured, as the function does compltely different things depending on the flags passed in. +static void free_dr(int regnum) +{ + if (dr_list[regnum].flag == DR_ALLOC_LOCAL) { + if (!--dr_list[regnum].use_count) + dr_list[regnum].flag = 0; + } else { + dr_list[regnum].flag = 0; + dr_list[regnum].use_count = 0; + dr7_global_mask = ~(dr7_global_reg_mask(regnum)); + } +} Same here. +int dr_alloc(int regnum, int flag) +{ + int ret = -1; + + spin_lock(dr_lock); + if (regnum = 0 regnum DR_MAX) + ret = get_dr(regnum, flag); + else if (regnum == DR_ANY) { + + /* gdb allocates local debug registers starting from 0. + * To help avoid conflicts, we'll start from the other end. + */ + for (regnum = DR_MAX - 1; regnum = 0; --regnum) { + ret = get_dr(regnum, flag); + if (ret = 0) + break; + } + } else + printk(KERN_ERR dr_alloc: + Cannot allocate debug register %d\n, regnum); + spin_unlock(dr_lock); + return ret; I suspect this should be replaced wit ha global and local variant to fix the above mentioned issue. It's a tiny bit duplicated code, but seems much cleaner. +static int get_dr(int regnum, int flag) +{ + if (flag == DR_ALLOC_GLOBAL !dr_list[regnum].flag) { + dr_list[regnum].flag = flag; + dr7_global_mask |= dr7_global_reg_mask(regnum); + return regnum; + } + if (flag == DR_ALLOC_LOCAL + dr_list[regnum].flag != DR_ALLOC_GLOBAL) { + dr_list[regnum].flag = flag; + dr_list[regnum].use_count++; + return regnum; + } + return -1; Same comments about global vs local here. + +EXPORT_SYMBOL(dr_alloc); +EXPORT_SYMBOL(dr_free); I don't think we want these exported at all, and if a proper modular user shows up they should be _GPL as they're fairly lowlevel. Btw, the naming in the whole debugregs code should be consolidated to be debugreg_ instead of all kinds of different variants. +#ifdef CONFIG_KWATCH + +/* Set the type, len and global flag in dr7 for a debug register */ +#define SET_DR7(dr, regnum, type, len) do {\ + dr = ~(0xf (16 + (regnum) * 4));\ + dr |= (len) - 1) 2) | (type)) \ + (16 + (regnum) * 4)) | \ + (0x2 ((regnum) * 2));\ + } while (0) + +/* Disable a debug register by clearing the global/local flag in dr7 */ +#define RESET_DR7(dr, regnum)dr = ~(0x3 ((regnum) * 2)) I don't think