subject:"\[PATCH\] Kwatch\: kernel watchpoints using CPU debug registers"

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-23 Thread Roland McGrath

> I think the best approach will be not to reset dr7 at all.  Then there 
> won't be any need to worry about restoring it.  Leaving a userspace 
> watchpoint enabled while running in the kernel ought not to matter; a 
> system call shouldn't touch any address in userspace more than once or 
> twice.

Hmm.  That sounds reasonable.  But I wonder why the old code clears %dr7.
It's been that way for a long time (since 2.4 at least).

> My idea was to put 4 hwbkpt structures in thread_struct, so they would
> always be available for use by ptrace.  However it turned out not to be
> feasible to replace the debugreg array with something more sophisticated,
> because of conflicting declarations and problems with the ordering of
> #includes.  So instead I have been forced to replace debugreg[] with a
> pointer to a structure which can be allocated as needed.

I think that's preferable anyway.  Most tasks most of the time will never
need that storage, so why not make thread_struct a little smaller?
(There is also the potential for sharing, which I mentioned earlier.)

> This raises the possibility that a PTRACE syscall might fail because the 
> allocation fails.  Hopefully that won't be an issue?

It's not a new issue, anyway, after utrace.  The utrace-based ptrace can
fail for PTRACE_ATTACH because of OOM too, which wasn't possible before.
I think it's survivable.


Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-23 Thread Alan Stern

On Thu, 22 Feb 2007, Roland McGrath wrote:

> > Yes, you are wrong -- although perhaps you shouldn't be.
> > 
> > The current version of do_debug() clears dr7 when a debug interrupt is 
> > fielded.  As a result, if a system call touches two watchpoint addresses 
> > in userspace only the first access will be noticed.
> 
> Ah, I see.  I think it would indeed be nice to fix this.
> 
> > This is probably a bug in do_debug().  It would be better to disable each
> > individual userspace watchpoint as it is triggered (or even not to disable
> > it at all).  dr7 would be restored when the SIGTRAP is delivered.  (But
> > what if the user is blocking or ignoring SIGTRAP?)
> 
> The user blocking or ignoring it doesn't come up, because it's a
> force_sig_info call.  However, a debugger will indeed swallow the signal
> through ptrace/utrace means.  In ptrace, the dr7 is always going to get
> reset because there will always be a context switch out and back in that
> does it.  But with utrace it's now possible to swallow the signal and keep
> going without a context switch (e.g. a breakpoint that is just doing
> logging but not stopping).  So perhaps we should have a TIF_RESTORE_DR7
> that goes into _TIF_WORK_MASK and gets handled in do_notify_resume
> (or maybe it's TIF_HWBKPT).

I think the best approach will be not to reset dr7 at all.  Then there 
won't be any need to worry about restoring it.  Leaving a userspace 
watchpoint enabled while running in the kernel ought not to matter; a 
system call shouldn't touch any address in userspace more than once or 
twice.

> > I've worked out a plan for implementing combined user/kernel mode
> > breakpoints and watchpoints (call them "debugpoints" as a catch-all
> > term).  It should work transparently with ptrace and should accomodate
> > whatever scheme utrace decides to adopt.  There won't need to be a
> > separate kwatch facility on top of it; the new combined facility will
> > handle debugpoints in both userspace and kernelspace.
> 
> That sounds great.  I'm not thrilled with the name "debugpoint", I have to
> tell you.  The hardware documentation calls all these things "breakpoints",
> and I think "data breakpoint" and "instruction breakpoint" are pretty good
> terms.  How about "hwbkpt" for the facility API?

Okay, I'll change the name.

> The one caveat I have here is that I don't want ptrace (via utrace) to have
> to supply the usual structure.  I probably only think this because it would
> be a pain for the ptrace/utrace implementation to find a place to stick it.
> But I have a rationalization.  The old ptrace interface, and the
> utrace_regset for debugregs, is not really a "debugpoint user" in the sense
> you're defining it.  It's an access to the "raw" debugregs as part of the
> thread's virtual CPU context.  You can use ptrace to set a watchpoint, then
> detach ptrace, and the thread will get a SIGTRAP later though there is no
> remnant at that point of the debugger interface that made it come about.
> For the degenerate case of medium-high priority with no handler callbacks
> (that should instead be an error at registration time if no slot is free),
> you shouldn't really need any per-caller storage (there can only be one
> such caller per slot).  

My idea was to put 4 hwbkpt structures in thread_struct, so they would
always be available for use by ptrace.  However it turned out not to be
feasible to replace the debugreg array with something more sophisticated,
because of conflicting declarations and problems with the ordering of
#includes.  So instead I have been forced to replace debugreg[] with a
pointer to a structure which can be allocated as needed.

This raises the possibility that a PTRACE syscall might fail because the 
allocation fails.  Hopefully that won't be an issue?

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-23 Thread Alan Stern

On Thu, 22 Feb 2007, Roland McGrath wrote:

  Yes, you are wrong -- although perhaps you shouldn't be.
  
  The current version of do_debug() clears dr7 when a debug interrupt is 
  fielded.  As a result, if a system call touches two watchpoint addresses 
  in userspace only the first access will be noticed.
 
 Ah, I see.  I think it would indeed be nice to fix this.
 
  This is probably a bug in do_debug().  It would be better to disable each
  individual userspace watchpoint as it is triggered (or even not to disable
  it at all).  dr7 would be restored when the SIGTRAP is delivered.  (But
  what if the user is blocking or ignoring SIGTRAP?)
 
 The user blocking or ignoring it doesn't come up, because it's a
 force_sig_info call.  However, a debugger will indeed swallow the signal
 through ptrace/utrace means.  In ptrace, the dr7 is always going to get
 reset because there will always be a context switch out and back in that
 does it.  But with utrace it's now possible to swallow the signal and keep
 going without a context switch (e.g. a breakpoint that is just doing
 logging but not stopping).  So perhaps we should have a TIF_RESTORE_DR7
 that goes into _TIF_WORK_MASK and gets handled in do_notify_resume
 (or maybe it's TIF_HWBKPT).

I think the best approach will be not to reset dr7 at all.  Then there 
won't be any need to worry about restoring it.  Leaving a userspace 
watchpoint enabled while running in the kernel ought not to matter; a 
system call shouldn't touch any address in userspace more than once or 
twice.

  I've worked out a plan for implementing combined user/kernel mode
  breakpoints and watchpoints (call them debugpoints as a catch-all
  term).  It should work transparently with ptrace and should accomodate
  whatever scheme utrace decides to adopt.  There won't need to be a
  separate kwatch facility on top of it; the new combined facility will
  handle debugpoints in both userspace and kernelspace.
 
 That sounds great.  I'm not thrilled with the name debugpoint, I have to
 tell you.  The hardware documentation calls all these things breakpoints,
 and I think data breakpoint and instruction breakpoint are pretty good
 terms.  How about hwbkpt for the facility API?

Okay, I'll change the name.

 The one caveat I have here is that I don't want ptrace (via utrace) to have
 to supply the usual structure.  I probably only think this because it would
 be a pain for the ptrace/utrace implementation to find a place to stick it.
 But I have a rationalization.  The old ptrace interface, and the
 utrace_regset for debugregs, is not really a debugpoint user in the sense
 you're defining it.  It's an access to the raw debugregs as part of the
 thread's virtual CPU context.  You can use ptrace to set a watchpoint, then
 detach ptrace, and the thread will get a SIGTRAP later though there is no
 remnant at that point of the debugger interface that made it come about.
 For the degenerate case of medium-high priority with no handler callbacks
 (that should instead be an error at registration time if no slot is free),
 you shouldn't really need any per-caller storage (there can only be one
 such caller per slot).  

My idea was to put 4 hwbkpt structures in thread_struct, so they would
always be available for use by ptrace.  However it turned out not to be
feasible to replace the debugreg array with something more sophisticated,
because of conflicting declarations and problems with the ordering of
#includes.  So instead I have been forced to replace debugreg[] with a
pointer to a structure which can be allocated as needed.

This raises the possibility that a PTRACE syscall might fail because the 
allocation fails.  Hopefully that won't be an issue?

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-23 Thread Roland McGrath

 I think the best approach will be not to reset dr7 at all.  Then there 
 won't be any need to worry about restoring it.  Leaving a userspace 
 watchpoint enabled while running in the kernel ought not to matter; a 
 system call shouldn't touch any address in userspace more than once or 
 twice.

Hmm.  That sounds reasonable.  But I wonder why the old code clears %dr7.
It's been that way for a long time (since 2.4 at least).

 My idea was to put 4 hwbkpt structures in thread_struct, so they would
 always be available for use by ptrace.  However it turned out not to be
 feasible to replace the debugreg array with something more sophisticated,
 because of conflicting declarations and problems with the ordering of
 #includes.  So instead I have been forced to replace debugreg[] with a
 pointer to a structure which can be allocated as needed.

I think that's preferable anyway.  Most tasks most of the time will never
need that storage, so why not make thread_struct a little smaller?
(There is also the potential for sharing, which I mentioned earlier.)

 This raises the possibility that a PTRACE syscall might fail because the 
 allocation fails.  Hopefully that won't be an issue?

It's not a new issue, anyway, after utrace.  The utrace-based ptrace can
fail for PTRACE_ATTACH because of OOM too, which wasn't possible before.
I think it's survivable.


Thanks,
Roland
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-22 Thread Roland McGrath

> Yes, you are wrong -- although perhaps you shouldn't be.
> 
> The current version of do_debug() clears dr7 when a debug interrupt is 
> fielded.  As a result, if a system call touches two watchpoint addresses 
> in userspace only the first access will be noticed.

Ah, I see.  I think it would indeed be nice to fix this.

> This is probably a bug in do_debug().  It would be better to disable each
> individual userspace watchpoint as it is triggered (or even not to disable
> it at all).  dr7 would be restored when the SIGTRAP is delivered.  (But
> what if the user is blocking or ignoring SIGTRAP?)

The user blocking or ignoring it doesn't come up, because it's a
force_sig_info call.  However, a debugger will indeed swallow the signal
through ptrace/utrace means.  In ptrace, the dr7 is always going to get
reset because there will always be a context switch out and back in that
does it.  But with utrace it's now possible to swallow the signal and keep
going without a context switch (e.g. a breakpoint that is just doing
logging but not stopping).  So perhaps we should have a TIF_RESTORE_DR7
that goes into _TIF_WORK_MASK and gets handled in do_notify_resume
(or maybe it's TIF_HWBKPT).

You should not actually need to disable user watchpoints, because in data
watchpoints the exception comes after the instruction completes.  Only for
instruction watchpoints does the exception come before the instruction
executes, and no user watchpoints can be in the address range containing
kernel code.  

SIGTRAP both doesn't queue, and doesn't give %dr6 values in its siginfo_t.
All user watchpoints will be handled via the signal; this is the only way
ptrace can report them, and is also the utrace way of doing things.
do_debug can happen inside kernel code, and tracing of user-level tasks can
only safely do anything at the point just before returning to user mode,
where signals are handled.  So, getting to send_sigtrap in do_debug is
enough to say "one or more user debug exceptions happened".  The %dr6 value
that collects in the thread state to be seen by ptrace, or by utrace-based
things using your new facility, needs to collect all the %dr6 bits that
were set by the hardware and weren't consumed by kernel-level tracing.  An
eventual utrace-based thing might in fact have some other way to tie in so
that the event details could just be in some call made by do_debug and not
recorded in the thread's virtual %dr6 value.  But at least for ptrace, they
should collect there if it becomes possible for more than one exception to
happen while in kernel mode or in a single user instruction.  (A single
instruction can cause multiple exceptions at the hardware level.)

> I've worked out a plan for implementing combined user/kernel mode
> breakpoints and watchpoints (call them "debugpoints" as a catch-all
> term).  It should work transparently with ptrace and should accomodate
> whatever scheme utrace decides to adopt.  There won't need to be a
> separate kwatch facility on top of it; the new combined facility will
> handle debugpoints in both userspace and kernelspace.

That sounds great.  I'm not thrilled with the name "debugpoint", I have to
tell you.  The hardware documentation calls all these things "breakpoints",
and I think "data breakpoint" and "instruction breakpoint" are pretty good
terms.  How about "hwbkpt" for the facility API?

> To work with ptrace, the new scheme will completely virtualize the debug
> registers.  So for example, a ptrace call might request a debugpoint in
> dr0, but it could end up that the physical register used is really dr2
> instead.  The various bits in dr6 and dr7 will be mapped in such a way
> that the entire procedure is transparent to the user.  Debugpoints 
> registered in kernelspace or by utrace won't care, of course.

I think that's a fine idea.  

The one caveat I have here is that I don't want ptrace (via utrace) to have
to supply the usual structure.  I probably only think this because it would
be a pain for the ptrace/utrace implementation to find a place to stick it.
But I have a rationalization.  The old ptrace interface, and the
utrace_regset for debugregs, is not really a "debugpoint user" in the sense
you're defining it.  It's an access to the "raw" debugregs as part of the
thread's virtual CPU context.  You can use ptrace to set a watchpoint, then
detach ptrace, and the thread will get a SIGTRAP later though there is no
remnant at that point of the debugger interface that made it come about.
For the degenerate case of medium-high priority with no handler callbacks
(that should instead be an error at registration time if no slot is free),
you shouldn't really need any per-caller storage (there can only be one
such caller per slot).  

> There are two things I am uncertain about: vm86 mode and kprobes.  I don't
> know anything about how either of them works.  

I know about kprobes.  I don't know about vm86, but I can read the code.


Thanks,
Roland
-
To unsubscribe from this list:

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-22 Thread S. P. Prasanna

On Wed, Feb 21, 2007 at 03:35:13PM -0500, Alan Stern wrote:
> Going back to something you mentioned earlier...
> 
[...]
> On Fri, 9 Feb 2007, Roland McGrath wrote:
> There are two things I am uncertain about: vm86 mode and kprobes.  I don't
> know anything about how either of them works.  Judging from the current
> code, nothing much should be needed -- debug traps in vm86 mode are
> handled by calling handle_vm86_trap(), and kprobes puts itself at the
> start of the notify_die() chain so it can handle single-step traps.  
> Eventually it will be necessary to check with someone who really 
> understands the issues.

Yes, Kprobes needs to get notified first to handle single-step traps. So kwatch
getting notified secound should be fine.

Thanks
Prasanna
-- 
Prasanna S.P.
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [EMAIL PROTECTED]
Ph: 91-80-41776329
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-22 Thread S. P. Prasanna

On Wed, Feb 21, 2007 at 03:35:13PM -0500, Alan Stern wrote:
 Going back to something you mentioned earlier...
 
[...]
 On Fri, 9 Feb 2007, Roland McGrath wrote:
 There are two things I am uncertain about: vm86 mode and kprobes.  I don't
 know anything about how either of them works.  Judging from the current
 code, nothing much should be needed -- debug traps in vm86 mode are
 handled by calling handle_vm86_trap(), and kprobes puts itself at the
 start of the notify_die() chain so it can handle single-step traps.  
 Eventually it will be necessary to check with someone who really 
 understands the issues.

Yes, Kprobes needs to get notified first to handle single-step traps. So kwatch
getting notified secound should be fine.

Thanks
Prasanna
-- 
Prasanna S.P.
Linux Technology Center
India Software Labs, IBM Bangalore
Email: [EMAIL PROTECTED]
Ph: 91-80-41776329
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-22 Thread Roland McGrath

 Yes, you are wrong -- although perhaps you shouldn't be.
 
 The current version of do_debug() clears dr7 when a debug interrupt is 
 fielded.  As a result, if a system call touches two watchpoint addresses 
 in userspace only the first access will be noticed.

Ah, I see.  I think it would indeed be nice to fix this.

 This is probably a bug in do_debug().  It would be better to disable each
 individual userspace watchpoint as it is triggered (or even not to disable
 it at all).  dr7 would be restored when the SIGTRAP is delivered.  (But
 what if the user is blocking or ignoring SIGTRAP?)

The user blocking or ignoring it doesn't come up, because it's a
force_sig_info call.  However, a debugger will indeed swallow the signal
through ptrace/utrace means.  In ptrace, the dr7 is always going to get
reset because there will always be a context switch out and back in that
does it.  But with utrace it's now possible to swallow the signal and keep
going without a context switch (e.g. a breakpoint that is just doing
logging but not stopping).  So perhaps we should have a TIF_RESTORE_DR7
that goes into _TIF_WORK_MASK and gets handled in do_notify_resume
(or maybe it's TIF_HWBKPT).

You should not actually need to disable user watchpoints, because in data
watchpoints the exception comes after the instruction completes.  Only for
instruction watchpoints does the exception come before the instruction
executes, and no user watchpoints can be in the address range containing
kernel code.  

SIGTRAP both doesn't queue, and doesn't give %dr6 values in its siginfo_t.
All user watchpoints will be handled via the signal; this is the only way
ptrace can report them, and is also the utrace way of doing things.
do_debug can happen inside kernel code, and tracing of user-level tasks can
only safely do anything at the point just before returning to user mode,
where signals are handled.  So, getting to send_sigtrap in do_debug is
enough to say one or more user debug exceptions happened.  The %dr6 value
that collects in the thread state to be seen by ptrace, or by utrace-based
things using your new facility, needs to collect all the %dr6 bits that
were set by the hardware and weren't consumed by kernel-level tracing.  An
eventual utrace-based thing might in fact have some other way to tie in so
that the event details could just be in some call made by do_debug and not
recorded in the thread's virtual %dr6 value.  But at least for ptrace, they
should collect there if it becomes possible for more than one exception to
happen while in kernel mode or in a single user instruction.  (A single
instruction can cause multiple exceptions at the hardware level.)

 I've worked out a plan for implementing combined user/kernel mode
 breakpoints and watchpoints (call them debugpoints as a catch-all
 term).  It should work transparently with ptrace and should accomodate
 whatever scheme utrace decides to adopt.  There won't need to be a
 separate kwatch facility on top of it; the new combined facility will
 handle debugpoints in both userspace and kernelspace.

That sounds great.  I'm not thrilled with the name debugpoint, I have to
tell you.  The hardware documentation calls all these things breakpoints,
and I think data breakpoint and instruction breakpoint are pretty good
terms.  How about hwbkpt for the facility API?

 To work with ptrace, the new scheme will completely virtualize the debug
 registers.  So for example, a ptrace call might request a debugpoint in
 dr0, but it could end up that the physical register used is really dr2
 instead.  The various bits in dr6 and dr7 will be mapped in such a way
 that the entire procedure is transparent to the user.  Debugpoints 
 registered in kernelspace or by utrace won't care, of course.

I think that's a fine idea.  

The one caveat I have here is that I don't want ptrace (via utrace) to have
to supply the usual structure.  I probably only think this because it would
be a pain for the ptrace/utrace implementation to find a place to stick it.
But I have a rationalization.  The old ptrace interface, and the
utrace_regset for debugregs, is not really a debugpoint user in the sense
you're defining it.  It's an access to the raw debugregs as part of the
thread's virtual CPU context.  You can use ptrace to set a watchpoint, then
detach ptrace, and the thread will get a SIGTRAP later though there is no
remnant at that point of the debugger interface that made it come about.
For the degenerate case of medium-high priority with no handler callbacks
(that should instead be an error at registration time if no slot is free),
you shouldn't really need any per-caller storage (there can only be one
such caller per slot).  

 There are two things I am uncertain about: vm86 mode and kprobes.  I don't
 know anything about how either of them works.  

I know about kprobes.  I don't know about vm86, but I can read the code.


Thanks,
Roland
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-21 Thread Alan Stern

Going back to something you mentioned earlier...

On Fri, 9 Feb 2007, Roland McGrath wrote:

> I don't think I really object to the ABI change of clearing %dr6 after an
> exception so that it does not accumulate multiple results.  But first I'll
> have to convince myself that we never actually do want to accumulate
> multiple results.  Hmm, I think we can, so maybe I do object.  If you set
> two watchpoints inside a user buffer and then do a system call that touches
> both those addresses (e.g. read), then you will go through do_debug (to
> send_sigtrap) twice before returning to user mode.  When the syscall is
> done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
> at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
> handle this case, but it should.)  Am I wrong?

Yes, you are wrong -- although perhaps you shouldn't be.

The current version of do_debug() clears dr7 when a debug interrupt is 
fielded.  As a result, if a system call touches two watchpoint addresses 
in userspace only the first access will be noticed.

This is probably a bug in do_debug().  It would be better to disable each
individual userspace watchpoint as it is triggered (or even not to disable
it at all).  dr7 would be restored when the SIGTRAP is delivered.  (But
what if the user is blocking or ignoring SIGTRAP?)

Moving on...

I've worked out a plan for implementing combined user/kernel mode 
breakpoints and watchpoints (call them "debugpoints" as a catch-all 
term).  It should work transparently with ptrace and should accomodate 
whatever scheme utrace decides to adopt.  There won't need to be a 
separate kwatch facility on top of it; the new combined facility will 
handle debugpoints in both userspace and kernelspace.

The idea is that callers can register and unregister a struct debugpoint, 
which contains fields for the type, length, address, and priority as well 
as three callback pointers (for installed, uninstalled, and triggered).  
The debug core will keep these structures sorted by priority and will 
allocate the available debug registers based on the priorities of the 
userspace and kernelspace requests.  Each CPU will have its own array of 
pointers to these structures, indicating which debugpoints are currently 
enabled.

To work with ptrace, the new scheme will completely virtualize the debug
registers.  So for example, a ptrace call might request a debugpoint in
dr0, but it could end up that the physical register used is really dr2
instead.  The various bits in dr6 and dr7 will be mapped in such a way
that the entire procedure is transparent to the user.  Debugpoints 
registered in kernelspace or by utrace won't care, of course.

There are two things I am uncertain about: vm86 mode and kprobes.  I don't
know anything about how either of them works.  Judging from the current
code, nothing much should be needed -- debug traps in vm86 mode are
handled by calling handle_vm86_trap(), and kprobes puts itself at the
start of the notify_die() chain so it can handle single-step traps.  
Eventually it will be necessary to check with someone who really 
understands the issues.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-21 Thread Alan Stern

Going back to something you mentioned earlier...

On Fri, 9 Feb 2007, Roland McGrath wrote:

 I don't think I really object to the ABI change of clearing %dr6 after an
 exception so that it does not accumulate multiple results.  But first I'll
 have to convince myself that we never actually do want to accumulate
 multiple results.  Hmm, I think we can, so maybe I do object.  If you set
 two watchpoints inside a user buffer and then do a system call that touches
 both those addresses (e.g. read), then you will go through do_debug (to
 send_sigtrap) twice before returning to user mode.  When the syscall is
 done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
 at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
 handle this case, but it should.)  Am I wrong?

Yes, you are wrong -- although perhaps you shouldn't be.

The current version of do_debug() clears dr7 when a debug interrupt is 
fielded.  As a result, if a system call touches two watchpoint addresses 
in userspace only the first access will be noticed.

This is probably a bug in do_debug().  It would be better to disable each
individual userspace watchpoint as it is triggered (or even not to disable
it at all).  dr7 would be restored when the SIGTRAP is delivered.  (But
what if the user is blocking or ignoring SIGTRAP?)


Moving on...

I've worked out a plan for implementing combined user/kernel mode 
breakpoints and watchpoints (call them debugpoints as a catch-all 
term).  It should work transparently with ptrace and should accomodate 
whatever scheme utrace decides to adopt.  There won't need to be a 
separate kwatch facility on top of it; the new combined facility will 
handle debugpoints in both userspace and kernelspace.

The idea is that callers can register and unregister a struct debugpoint, 
which contains fields for the type, length, address, and priority as well 
as three callback pointers (for installed, uninstalled, and triggered).  
The debug core will keep these structures sorted by priority and will 
allocate the available debug registers based on the priorities of the 
userspace and kernelspace requests.  Each CPU will have its own array of 
pointers to these structures, indicating which debugpoints are currently 
enabled.

To work with ptrace, the new scheme will completely virtualize the debug
registers.  So for example, a ptrace call might request a debugpoint in
dr0, but it could end up that the physical register used is really dr2
instead.  The various bits in dr6 and dr7 will be mapped in such a way
that the entire procedure is transparent to the user.  Debugpoints 
registered in kernelspace or by utrace won't care, of course.

There are two things I am uncertain about: vm86 mode and kprobes.  I don't
know anything about how either of them works.  Judging from the current
code, nothing much should be needed -- debug traps in vm86 mode are
handled by calling handle_vm86_trap(), and kprobes puts itself at the
start of the notify_die() chain so it can handle single-step traps.  
Eventually it will be necessary to check with someone who really 
understands the issues.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-17 Thread Roland McGrath

> How do you suggest this be handled?  Maybe we should just keep track of a
> maximum user priority level for each slot, allowing it to go up but not
> down until all user processes have given up the slot.  (I.e., in the
> example above the later kwatch requests would still fail because we would
> continue to remember the high user priority level so long as the first
> process maintained its allocation.)  That would be overly pessimistic, but
> it would at least be safe.

I think that is certainly fine to start with.  Like I said before, we can
start conservative and then worry about more complexity as the concrete
needs arise.  I don't think it will really be any trouble to change some of
these low-level interfaces later to accomodate more sophisticated callers
and implementations.  

When the issue does arise, scanning all the necessary tasks may not really
need to be so costly.  That is, if rather than scanning all tasks in the
system, it's a list of debugreg allocations.  The callers doing fancy
allocation can be responsible for passing in storage for a struct
containing the list structure.  That would naturally embed in struct
kwatch.  Having the debugreg allocation routines pass in such a structure
would also be useful for another kind of flexibility I'd like to have
eventually.  That is, "local" allocations that are local to a group of
tasks rather than just one.  For example, a debugger most often actually
wants to share watchpoints among all the threads sharing an address space.
If identical settings are in fact shared, the storage requirements for
using watchpoints in many-threaded processes scale that much better.

I think we have a while before we have to actually figure any of that out
in detail.  The simple starting point covers all our immediate concrete
concerns.


Thanks,
Roland

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-17 Thread Roland McGrath

 How do you suggest this be handled?  Maybe we should just keep track of a
 maximum user priority level for each slot, allowing it to go up but not
 down until all user processes have given up the slot.  (I.e., in the
 example above the later kwatch requests would still fail because we would
 continue to remember the high user priority level so long as the first
 process maintained its allocation.)  That would be overly pessimistic, but
 it would at least be safe.

I think that is certainly fine to start with.  Like I said before, we can
start conservative and then worry about more complexity as the concrete
needs arise.  I don't think it will really be any trouble to change some of
these low-level interfaces later to accomodate more sophisticated callers
and implementations.  

When the issue does arise, scanning all the necessary tasks may not really
need to be so costly.  That is, if rather than scanning all tasks in the
system, it's a list of debugreg allocations.  The callers doing fancy
allocation can be responsible for passing in storage for a struct
containing the list structure.  That would naturally embed in struct
kwatch.  Having the debugreg allocation routines pass in such a structure
would also be useful for another kind of flexibility I'd like to have
eventually.  That is, local allocations that are local to a group of
tasks rather than just one.  For example, a debugger most often actually
wants to share watchpoints among all the threads sharing an address space.
If identical settings are in fact shared, the storage requirements for
using watchpoints in many-threaded processes scale that much better.

I think we have a while before we have to actually figure any of that out
in detail.  The simple starting point covers all our immediate concrete
concerns.


Thanks,
Roland

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Alan Stern

On Fri, 9 Feb 2007, Roland McGrath wrote:

> I don't think I really object to the ABI change of clearing %dr6 after an
> exception so that it does not accumulate multiple results.  But first I'll
> have to convince myself that we never actually do want to accumulate
> multiple results.  Hmm, I think we can, so maybe I do object.  If you set
> two watchpoints inside a user buffer and then do a system call that touches
> both those addresses (e.g. read), then you will go through do_debug (to
> send_sigtrap) twice before returning to user mode.  When the syscall is
> done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
> at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
> handle this case, but it should.)  Am I wrong?

I think you're right.

> So this gets to the more complicated view of %dr6 handling that I had first
> had in mind yesterday.  Each allocation "owns" one of the low 4 bits in
> %dr6 too.  Only the dr6 bits owned by the userland "raw" allocation
> (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6].
> So when kwatch swallows a debug exception, it should mask off its bit from
> %dr6 in the CPU, but not clear %dr6 completely.  That way you can have a
> sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one
> system call (including interrupt handlers), and when it gets to the
> userland debugger examining dr6 it sees the low 2 bits both set.

Okay; I'll fix this too.  Come to think of it, kwatch needs to handle 
multiple hits as well -- there might be two watchpoints set to the same 
address.

> > It's really quite a tricky matter.  Should a register be allocated to
> > kwatch only when no user process needs it?  Should we really go about
> > checking the requirements of every single process whenever a kwatch
> > allocation request comes in?  What if the processes which need a
> > particular register aren't running -- should the register then be given to
> > kwatch?  What if one of those processes then does start running on one
> > CPU?
> 
> To "go about checking the requirements of every single process" is not so
> hard as it sounds when they're recorded as a single global use count per
> slot, as your original code does.  When you mentioned a "your allocation is
> available" callback, I was thinking it might come to that being called
> inside context switch.  It's all rather tricky, indeed.  
> 
> The obvious answer is to start simple.  If any user process anywhere uses
> drN, kwatch has to give it up for all CPUs (watchpoints with less than
> "break ptrace" priority do).  If anyone really cares about more flexibility
> than that, we can change or extend it.  Some copious comments in the
> interface descriptions can lead them in the right direction if the
> situation comes up.  Probably with systemtap support in a while, we'll get
> a lot more concrete uses of watchpoints and people finding out what really 
> matters to them.

It's still more complicated than you might think.  Let's say two user
processes each have dr1 allocated, one with low priority and the other
with high priority.  The kernel has to be aware of the high-priority
allocation, so that it can refuse intermediate-priority kwatch allocation
attempts.  Now suppose the second process exits.  dr1 is still allocated
to the first user process but only with low priority, so now
intermediate-priority kwatch allocation attempts should succeed.

In order for this to work, when the second process gives up its allocation 
I would have to either scan though all tasks to see the first process, or 
else keep several global use counts for each slot -- in fact, one use 
count for each priority level.  That's doable if there are only a few 
levels, but not if there are many.

How do you suggest this be handled?  Maybe we should just keep track of a
maximum user priority level for each slot, allowing it to go up but not
down until all user processes have given up the slot.  (I.e., in the
example above the later kwatch requests would still fail because we would
continue to remember the high user priority level so long as the first
process maintained its allocation.)  That would be overly pessimistic, but
it would at least be safe.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Roland McGrath

> Yes.  In fact, the current existing code does not handle dr6 correctly.  
> It never clears the register, which means you're likely to get into 
> trouble when multiple breakpoints (or watchpoints) are enabled.

This is a subtle change from the existing ABI, in which userland has to
clear %dr6 via ptrace itself.  But gdb never does that AFAICT.  So it's in
fact subject to confusion when two watchpoints are set and the second hits
after the first.  So gdb ought to be fixed to clear dr6 via ptrace, to work
with existing and older kernels.

I don't think I really object to the ABI change of clearing %dr6 after an
exception so that it does not accumulate multiple results.  But first I'll
have to convince myself that we never actually do want to accumulate
multiple results.  Hmm, I think we can, so maybe I do object.  If you set
two watchpoints inside a user buffer and then do a system call that touches
both those addresses (e.g. read), then you will go through do_debug (to
send_sigtrap) twice before returning to user mode.  When the syscall is
done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
handle this case, but it should.)  Am I wrong?

So this gets to the more complicated view of %dr6 handling that I had first
had in mind yesterday.  Each allocation "owns" one of the low 4 bits in
%dr6 too.  Only the dr6 bits owned by the userland "raw" allocation
(i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6].
So when kwatch swallows a debug exception, it should mask off its bit from
%dr6 in the CPU, but not clear %dr6 completely.  That way you can have a
sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one
system call (including interrupt handlers), and when it gets to the
userland debugger examining dr6 it sees the low 2 bits both set.

> It's really quite a tricky matter.  Should a register be allocated to
> kwatch only when no user process needs it?  Should we really go about
> checking the requirements of every single process whenever a kwatch
> allocation request comes in?  What if the processes which need a
> particular register aren't running -- should the register then be given to
> kwatch?  What if one of those processes then does start running on one
> CPU?

To "go about checking the requirements of every single process" is not so
hard as it sounds when they're recorded as a single global use count per
slot, as your original code does.  When you mentioned a "your allocation is
available" callback, I was thinking it might come to that being called
inside context switch.  It's all rather tricky, indeed.  

The obvious answer is to start simple.  If any user process anywhere uses
drN, kwatch has to give it up for all CPUs (watchpoints with less than
"break ptrace" priority do).  If anyone really cares about more flexibility
than that, we can change or extend it.  Some copious comments in the
interface descriptions can lead them in the right direction if the
situation comes up.  Probably with systemtap support in a while, we'll get
a lot more concrete uses of watchpoints and people finding out what really 
matters to them.


Thanks,
Roland

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Alan Stern

On Fri, 9 Feb 2007, Roland McGrath wrote:

> > All right.  However this means thread_struct will have to grow in order to
> > hold each task's debug-register allocations and priorities.  Would that be
> > acceptable?  (This might be a good reason to keep the number of bits
> > down.)
> 
> Well, noone seems to mind the wasted debugreg[5..6] words. ;-) 
> 
> I'm inclined to make thread_struct smaller than it is now by making it
> indirect (debugreg[8] -> one pointer).  It feels like this would be pretty
> safe now that we have TIF_DEBUG anyway.  Already nothing should be looking
> at the field when TIF_DEBUG isn't set.

I had the same thought.

> > Another question: How can a program using the ptrace API ever give up a
> > debug-register allocation?  Disabling the register isn't sufficient; a
> > user program should be able to store a watchpoint address in dr1, enable
> > it in dr7, disable it in dr7, and then re-enable it in the expectation
> > that the address stored in dr1 hasn't been overwritten.  As far as I can
> > see, ptrace-type allocations have to be permanent (until the task exits or
> > execs, or uses some other to-be-determined API to do the de-allocation.)
> 
> I hadn't really thought about this before, but it's pretty straightforward.
> Each allocation is for one of %dr[0-3] and for its associated bits in %dr7.
> %dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc.
> %dr7 & (0x000f0003 << N) for %drN, I guess it is.
> 1 << DR_CONTROL_SIZE) - 1) << DR_CONTROL_SHIFT) |
>  ((1 << DR_ENABLE_SIZE) - 1)) << N, I should say.
> 
> Each allocation owns those 38 bits (70 bits on x86_64).  When all those
> bits are zero, the allocation is not active and might as well not be there
> (except for whatever semantics you might want to have about an allocation's
> lifetime as distinct from the event of resetting its contents).  

Okay, that makes sense.

> Your question made me think about the %dr6 issue, too, which I also hadn't
> thought about before.  It looks like your patch handles this correctly, but
> it's a subtle point that I think warrants some comments in the code.  When
> userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug
> and eventually looks at dr6 (via ptrace) to see what happened.  Kernel
> watchpoints that come along after the user signal is generated can clobber
> the CPU %dr6 with new hits, but userland should not perceive this.  This
> works out because what userland sees is thread.debugreg[6], the only place
> that sets it is do_debug, and a kwatch hit bails out before changing it.
> Any other new users of the debugreg sharing interface need to be cognizant
> of the %dr6 issue to avoid stepping on old ptrace uses.

Yes.  In fact, the current existing code does not handle dr6 correctly.  
It never clears the register, which means you're likely to get into 
trouble when multiple breakpoints (or watchpoints) are enabled.

Here's another complication -- and this is one I can't figure out any easy
solutions for.  The type of API we've been discussing will work well
enough on UP systems, but what about SMP?  I don't see any value in having
a kernel watchpoint enabled on some CPUs and not others.  But then what
should happen when a debug register is in use by kwatch and a ptrace
request on one CPU usurps it (leaving no other register in which to put
it)?  Not to mention the difficulties of keeping track of everything when
the same kwatch watchpoint is stored in different debug registers on
different CPUs.

It's really quite a tricky matter.  Should a register be allocated to
kwatch only when no user process needs it?  Should we really go about
checking the requirements of every single process whenever a kwatch
allocation request comes in?  What if the processes which need a
particular register aren't running -- should the register then be given to
kwatch?  What if one of those processes then does start running on one
CPU?

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Roland McGrath

> All right.  However this means thread_struct will have to grow in order to
> hold each task's debug-register allocations and priorities.  Would that be
> acceptable?  (This might be a good reason to keep the number of bits
> down.)

Well, noone seems to mind the wasted debugreg[5..6] words. ;-) 

I'm inclined to make thread_struct smaller than it is now by making it
indirect (debugreg[8] -> one pointer).  It feels like this would be pretty
safe now that we have TIF_DEBUG anyway.  Already nothing should be looking
at the field when TIF_DEBUG isn't set.

> Another question: How can a program using the ptrace API ever give up a
> debug-register allocation?  Disabling the register isn't sufficient; a
> user program should be able to store a watchpoint address in dr1, enable
> it in dr7, disable it in dr7, and then re-enable it in the expectation
> that the address stored in dr1 hasn't been overwritten.  As far as I can
> see, ptrace-type allocations have to be permanent (until the task exits or
> execs, or uses some other to-be-determined API to do the de-allocation.)

I hadn't really thought about this before, but it's pretty straightforward.
Each allocation is for one of %dr[0-3] and for its associated bits in %dr7.
%dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc.
%dr7 & (0x000f0003 << N) for %drN, I guess it is.
1 << DR_CONTROL_SIZE) - 1) << DR_CONTROL_SHIFT) |
 ((1 << DR_ENABLE_SIZE) - 1)) << N, I should say.

Each allocation owns those 38 bits (70 bits on x86_64).  When all those
bits are zero, the allocation is not active and might as well not be there
(except for whatever semantics you might want to have about an allocation's
lifetime as distinct from the event of resetting its contents).  

gdb already works this way: when it removes a watchpoint, it clears drN to
zero when it zeros all the corresponding bits in dr7.  (In fact it's in a
separate call after changing dr7, because of the ptrace interface.)

Your question made me think about the %dr6 issue, too, which I also hadn't
thought about before.  It looks like your patch handles this correctly, but
it's a subtle point that I think warrants some comments in the code.  When
userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug
and eventually looks at dr6 (via ptrace) to see what happened.  Kernel
watchpoints that come along after the user signal is generated can clobber
the CPU %dr6 with new hits, but userland should not perceive this.  This
works out because what userland sees is thread.debugreg[6], the only place
that sets it is do_debug, and a kwatch hit bails out before changing it.
Any other new users of the debugreg sharing interface need to be cognizant
of the %dr6 issue to avoid stepping on old ptrace uses.


Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Roland McGrath

 All right.  However this means thread_struct will have to grow in order to
 hold each task's debug-register allocations and priorities.  Would that be
 acceptable?  (This might be a good reason to keep the number of bits
 down.)

Well, noone seems to mind the wasted debugreg[5..6] words. ;-) 

I'm inclined to make thread_struct smaller than it is now by making it
indirect (debugreg[8] - one pointer).  It feels like this would be pretty
safe now that we have TIF_DEBUG anyway.  Already nothing should be looking
at the field when TIF_DEBUG isn't set.

 Another question: How can a program using the ptrace API ever give up a
 debug-register allocation?  Disabling the register isn't sufficient; a
 user program should be able to store a watchpoint address in dr1, enable
 it in dr7, disable it in dr7, and then re-enable it in the expectation
 that the address stored in dr1 hasn't been overwritten.  As far as I can
 see, ptrace-type allocations have to be permanent (until the task exits or
 execs, or uses some other to-be-determined API to do the de-allocation.)

I hadn't really thought about this before, but it's pretty straightforward.
Each allocation is for one of %dr[0-3] and for its associated bits in %dr7.
%dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc.
%dr7  (0x000f0003  N) for %drN, I guess it is.
1  DR_CONTROL_SIZE) - 1)  DR_CONTROL_SHIFT) |
 ((1  DR_ENABLE_SIZE) - 1))  N, I should say.

Each allocation owns those 38 bits (70 bits on x86_64).  When all those
bits are zero, the allocation is not active and might as well not be there
(except for whatever semantics you might want to have about an allocation's
lifetime as distinct from the event of resetting its contents).  

gdb already works this way: when it removes a watchpoint, it clears drN to
zero when it zeros all the corresponding bits in dr7.  (In fact it's in a
separate call after changing dr7, because of the ptrace interface.)

Your question made me think about the %dr6 issue, too, which I also hadn't
thought about before.  It looks like your patch handles this correctly, but
it's a subtle point that I think warrants some comments in the code.  When
userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug
and eventually looks at dr6 (via ptrace) to see what happened.  Kernel
watchpoints that come along after the user signal is generated can clobber
the CPU %dr6 with new hits, but userland should not perceive this.  This
works out because what userland sees is thread.debugreg[6], the only place
that sets it is do_debug, and a kwatch hit bails out before changing it.
Any other new users of the debugreg sharing interface need to be cognizant
of the %dr6 issue to avoid stepping on old ptrace uses.


Thanks,
Roland
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Alan Stern

On Fri, 9 Feb 2007, Roland McGrath wrote:

  All right.  However this means thread_struct will have to grow in order to
  hold each task's debug-register allocations and priorities.  Would that be
  acceptable?  (This might be a good reason to keep the number of bits
  down.)
 
 Well, noone seems to mind the wasted debugreg[5..6] words. ;-) 
 
 I'm inclined to make thread_struct smaller than it is now by making it
 indirect (debugreg[8] - one pointer).  It feels like this would be pretty
 safe now that we have TIF_DEBUG anyway.  Already nothing should be looking
 at the field when TIF_DEBUG isn't set.

I had the same thought.

  Another question: How can a program using the ptrace API ever give up a
  debug-register allocation?  Disabling the register isn't sufficient; a
  user program should be able to store a watchpoint address in dr1, enable
  it in dr7, disable it in dr7, and then re-enable it in the expectation
  that the address stored in dr1 hasn't been overwritten.  As far as I can
  see, ptrace-type allocations have to be permanent (until the task exits or
  execs, or uses some other to-be-determined API to do the de-allocation.)
 
 I hadn't really thought about this before, but it's pretty straightforward.
 Each allocation is for one of %dr[0-3] and for its associated bits in %dr7.
 %dr0 and bits 0,1,16-19; %dr1 and bits 2,3,20-23; etc.
 %dr7  (0x000f0003  N) for %drN, I guess it is.
 1  DR_CONTROL_SIZE) - 1)  DR_CONTROL_SHIFT) |
  ((1  DR_ENABLE_SIZE) - 1))  N, I should say.
 
 Each allocation owns those 38 bits (70 bits on x86_64).  When all those
 bits are zero, the allocation is not active and might as well not be there
 (except for whatever semantics you might want to have about an allocation's
 lifetime as distinct from the event of resetting its contents).  

Okay, that makes sense.

 Your question made me think about the %dr6 issue, too, which I also hadn't
 thought about before.  It looks like your patch handles this correctly, but
 it's a subtle point that I think warrants some comments in the code.  When
 userland inserts a watchpoint and it's hit, it gets a SIGTRAP via do_debug
 and eventually looks at dr6 (via ptrace) to see what happened.  Kernel
 watchpoints that come along after the user signal is generated can clobber
 the CPU %dr6 with new hits, but userland should not perceive this.  This
 works out because what userland sees is thread.debugreg[6], the only place
 that sets it is do_debug, and a kwatch hit bails out before changing it.
 Any other new users of the debugreg sharing interface need to be cognizant
 of the %dr6 issue to avoid stepping on old ptrace uses.

Yes.  In fact, the current existing code does not handle dr6 correctly.  
It never clears the register, which means you're likely to get into 
trouble when multiple breakpoints (or watchpoints) are enabled.


Here's another complication -- and this is one I can't figure out any easy
solutions for.  The type of API we've been discussing will work well
enough on UP systems, but what about SMP?  I don't see any value in having
a kernel watchpoint enabled on some CPUs and not others.  But then what
should happen when a debug register is in use by kwatch and a ptrace
request on one CPU usurps it (leaving no other register in which to put
it)?  Not to mention the difficulties of keeping track of everything when
the same kwatch watchpoint is stored in different debug registers on
different CPUs.

It's really quite a tricky matter.  Should a register be allocated to
kwatch only when no user process needs it?  Should we really go about
checking the requirements of every single process whenever a kwatch
allocation request comes in?  What if the processes which need a
particular register aren't running -- should the register then be given to
kwatch?  What if one of those processes then does start running on one
CPU?

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Roland McGrath

 Yes.  In fact, the current existing code does not handle dr6 correctly.  
 It never clears the register, which means you're likely to get into 
 trouble when multiple breakpoints (or watchpoints) are enabled.

This is a subtle change from the existing ABI, in which userland has to
clear %dr6 via ptrace itself.  But gdb never does that AFAICT.  So it's in
fact subject to confusion when two watchpoints are set and the second hits
after the first.  So gdb ought to be fixed to clear dr6 via ptrace, to work
with existing and older kernels.

I don't think I really object to the ABI change of clearing %dr6 after an
exception so that it does not accumulate multiple results.  But first I'll
have to convince myself that we never actually do want to accumulate
multiple results.  Hmm, I think we can, so maybe I do object.  If you set
two watchpoints inside a user buffer and then do a system call that touches
both those addresses (e.g. read), then you will go through do_debug (to
send_sigtrap) twice before returning to user mode.  When the syscall is
done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
handle this case, but it should.)  Am I wrong?

So this gets to the more complicated view of %dr6 handling that I had first
had in mind yesterday.  Each allocation owns one of the low 4 bits in
%dr6 too.  Only the dr6 bits owned by the userland raw allocation
(i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6].
So when kwatch swallows a debug exception, it should mask off its bit from
%dr6 in the CPU, but not clear %dr6 completely.  That way you can have a
sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one
system call (including interrupt handlers), and when it gets to the
userland debugger examining dr6 it sees the low 2 bits both set.

 It's really quite a tricky matter.  Should a register be allocated to
 kwatch only when no user process needs it?  Should we really go about
 checking the requirements of every single process whenever a kwatch
 allocation request comes in?  What if the processes which need a
 particular register aren't running -- should the register then be given to
 kwatch?  What if one of those processes then does start running on one
 CPU?

To go about checking the requirements of every single process is not so
hard as it sounds when they're recorded as a single global use count per
slot, as your original code does.  When you mentioned a your allocation is
available callback, I was thinking it might come to that being called
inside context switch.  It's all rather tricky, indeed.  

The obvious answer is to start simple.  If any user process anywhere uses
drN, kwatch has to give it up for all CPUs (watchpoints with less than
break ptrace priority do).  If anyone really cares about more flexibility
than that, we can change or extend it.  Some copious comments in the
interface descriptions can lead them in the right direction if the
situation comes up.  Probably with systemtap support in a while, we'll get
a lot more concrete uses of watchpoints and people finding out what really 
matters to them.


Thanks,
Roland

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-09 Thread Alan Stern

On Fri, 9 Feb 2007, Roland McGrath wrote:

 I don't think I really object to the ABI change of clearing %dr6 after an
 exception so that it does not accumulate multiple results.  But first I'll
 have to convince myself that we never actually do want to accumulate
 multiple results.  Hmm, I think we can, so maybe I do object.  If you set
 two watchpoints inside a user buffer and then do a system call that touches
 both those addresses (e.g. read), then you will go through do_debug (to
 send_sigtrap) twice before returning to user mode.  When the syscall is
 done, you'll have a pending SIGTRAP for the debugger to handle.  By looking
 at your %dr6 the debugger can see that both watchpoints hit.  (gdb does not
 handle this case, but it should.)  Am I wrong?

I think you're right.

 So this gets to the more complicated view of %dr6 handling that I had first
 had in mind yesterday.  Each allocation owns one of the low 4 bits in
 %dr6 too.  Only the dr6 bits owned by the userland raw allocation
 (i.e. ptrace/utrace_regset) should appear nonzero in thread.debugreg[6].
 So when kwatch swallows a debug exception, it should mask off its bit from
 %dr6 in the CPU, but not clear %dr6 completely.  That way you can have a
 sequence of user dr0 hit, kwatch dr3 hit, user dr1 hit, all inside one
 system call (including interrupt handlers), and when it gets to the
 userland debugger examining dr6 it sees the low 2 bits both set.

Okay; I'll fix this too.  Come to think of it, kwatch needs to handle 
multiple hits as well -- there might be two watchpoints set to the same 
address.

  It's really quite a tricky matter.  Should a register be allocated to
  kwatch only when no user process needs it?  Should we really go about
  checking the requirements of every single process whenever a kwatch
  allocation request comes in?  What if the processes which need a
  particular register aren't running -- should the register then be given to
  kwatch?  What if one of those processes then does start running on one
  CPU?
 
 To go about checking the requirements of every single process is not so
 hard as it sounds when they're recorded as a single global use count per
 slot, as your original code does.  When you mentioned a your allocation is
 available callback, I was thinking it might come to that being called
 inside context switch.  It's all rather tricky, indeed.  
 
 The obvious answer is to start simple.  If any user process anywhere uses
 drN, kwatch has to give it up for all CPUs (watchpoints with less than
 break ptrace priority do).  If anyone really cares about more flexibility
 than that, we can change or extend it.  Some copious comments in the
 interface descriptions can lead them in the right direction if the
 situation comes up.  Probably with systemtap support in a while, we'll get
 a lot more concrete uses of watchpoints and people finding out what really 
 matters to them.

It's still more complicated than you might think.  Let's say two user
processes each have dr1 allocated, one with low priority and the other
with high priority.  The kernel has to be aware of the high-priority
allocation, so that it can refuse intermediate-priority kwatch allocation
attempts.  Now suppose the second process exits.  dr1 is still allocated
to the first user process but only with low priority, so now
intermediate-priority kwatch allocation attempts should succeed.

In order for this to work, when the second process gives up its allocation 
I would have to either scan though all tasks to see the first process, or 
else keep several global use counts for each slot -- in fact, one use 
count for each priority level.  That's doable if there are only a few 
levels, but not if there are many.

How do you suggest this be handled?  Maybe we should just keep track of a
maximum user priority level for each slot, allowing it to go up but not
down until all user processes have given up the slot.  (I.e., in the
example above the later kwatch requests would still fail because we would
continue to remember the high user priority level so long as the first
process maintained its allocation.)  That would be overly pessimistic, but
it would at least be safe.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-07 Thread Bob Copeland


On 2/7/07, Alan Stern <[EMAIL PROTECTED]> wrote:

I wonder where this convention of using lower numbers to indicate higher
priorities comes from...  It seems to be quite prevalent even though it's
obviously backwards.


I agree but at least in the case of 'nice' it works in the sense that
the value increases with increasing niceness.  Done the other way,
they would have to call it 'mean,' then someone would wonder why 'mean
10 20' prints 'No such file or directory' instead of '15'...

Bob
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-07 Thread Alan Stern

On Tue, 6 Feb 2007, Roland McGrath wrote:

> > So for the sake of argument, let's assume that debug registers can be 
> > assigned with priority values ranging from 0 to 7 (overkill, but who 
> > cares?).  By fiat, ptrace assignments use priority 4.  Then kwatch callers 
> > can request whatever priority they like.  The well-behaved cases you've 
> > been discussing will use priority 0, and the invasive cases can use 
> > priority 7.  (With appropriate symbolic names instead of raw numeric 
> > values, naturally.)
> 
> Sure.  Or make it signed with lower value wins, have ptrace use -1 and the
> average bear use 0 or something especially unobtrusive use >0, and
> something very obtrusive use -many.

I wonder where this convention of using lower numbers to indicate higher 
priorities comes from...  It seems to be quite prevalent even though it's 
obviously backwards.

>  Unless you are really going to pack it
> into a few bits somewhere, I'd make it an arbitrary int rather than a
> special small range; it's just for sort order comparison.  Bottom line, I
> don't really care about the numerology.  Just so "break ptrace", "don't
> break ptrace", and "readily get out of the way on demand" can be expressed.
> We can always fine-tune it later as there are more concrete users.

Okay; I'm not fixated on any particular size.

> > Or maybe that's too complicated.  Perhaps all userspace assignments should 
> > always use the same priority level.  
> 
> No, I want priorities among user-mode watchpoint users too.  ptrace is
> rigid, but newer facilities can coexist with ptrace on the same thread and
> with kwatch, and do fancy new things to fall back when there is debugreg
> allocation pressure.  Future user facilities might be able to do VM tricks
> that are harder to make workable for kernel mode, for example.  

All right.  However this means thread_struct will have to grow in order to
hold each task's debug-register allocations and priorities.  Would that be
acceptable?  (This might be a good reason to keep the number of bits
down.)

Another question: How can a program using the ptrace API ever give up a
debug-register allocation?  Disabling the register isn't sufficient; a
user program should be able to store a watchpoint address in dr1, enable
it in dr7, disable it in dr7, and then re-enable it in the expectation
that the address stored in dr1 hasn't been overwritten.  As far as I can
see, ptrace-type allocations have to be permanent (until the task exits or
execs, or uses some other to-be-determined API to do the de-allocation.)

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-07 Thread Alan Stern

On Tue, 6 Feb 2007, Roland McGrath wrote:

  So for the sake of argument, let's assume that debug registers can be 
  assigned with priority values ranging from 0 to 7 (overkill, but who 
  cares?).  By fiat, ptrace assignments use priority 4.  Then kwatch callers 
  can request whatever priority they like.  The well-behaved cases you've 
  been discussing will use priority 0, and the invasive cases can use 
  priority 7.  (With appropriate symbolic names instead of raw numeric 
  values, naturally.)
 
 Sure.  Or make it signed with lower value wins, have ptrace use -1 and the
 average bear use 0 or something especially unobtrusive use 0, and
 something very obtrusive use -many.

I wonder where this convention of using lower numbers to indicate higher 
priorities comes from...  It seems to be quite prevalent even though it's 
obviously backwards.

  Unless you are really going to pack it
 into a few bits somewhere, I'd make it an arbitrary int rather than a
 special small range; it's just for sort order comparison.  Bottom line, I
 don't really care about the numerology.  Just so break ptrace, don't
 break ptrace, and readily get out of the way on demand can be expressed.
 We can always fine-tune it later as there are more concrete users.

Okay; I'm not fixated on any particular size.

  Or maybe that's too complicated.  Perhaps all userspace assignments should 
  always use the same priority level.  
 
 No, I want priorities among user-mode watchpoint users too.  ptrace is
 rigid, but newer facilities can coexist with ptrace on the same thread and
 with kwatch, and do fancy new things to fall back when there is debugreg
 allocation pressure.  Future user facilities might be able to do VM tricks
 that are harder to make workable for kernel mode, for example.  

All right.  However this means thread_struct will have to grow in order to
hold each task's debug-register allocations and priorities.  Would that be
acceptable?  (This might be a good reason to keep the number of bits
down.)

Another question: How can a program using the ptrace API ever give up a
debug-register allocation?  Disabling the register isn't sufficient; a
user program should be able to store a watchpoint address in dr1, enable
it in dr7, disable it in dr7, and then re-enable it in the expectation
that the address stored in dr1 hasn't been overwritten.  As far as I can
see, ptrace-type allocations have to be permanent (until the task exits or
execs, or uses some other to-be-determined API to do the de-allocation.)

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-07 Thread Bob Copeland


On 2/7/07, Alan Stern [EMAIL PROTECTED] wrote:

I wonder where this convention of using lower numbers to indicate higher
priorities comes from...  It seems to be quite prevalent even though it's
obviously backwards.


I agree but at least in the case of 'nice' it works in the sense that
the value increases with increasing niceness.  Done the other way,
they would have to call it 'mean,' then someone would wonder why 'mean
10 20' prints 'No such file or directory' instead of '15'...

Bob
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-06 Thread Roland McGrath

> That's good.  So I'll assume an updated version of kwatch can be submitted 
> without regard to the progress of utrace (other than minor conflicts over 
> the exact location of the ptrace code to change).

Indeed.

> Right.  I had been thinking in terms of a developer using kwatch to track 
> down some particularly nasty problem, something that would happen rather 
> infrequently, where one wouldn't care about side effects on user programs.  
> But of course those side effects might alter an important aspect of the 
> kernel problem being debugged...

This is indeed a way it might reasonably be used.  As I said, it's fine for
an individual use to be that way.  But think also of using it for
performance measurement (i.e. "how hot is this counter") in something like
systemtap, where you might have long-running instrumentation over arbitrary
workloads.

> It's also true that the current kwatch version affects the user experience
> even when no kernel debugging is going on, as it forcibly prevents ptrace
> calls from setting the Global-Enable bits in dr7.  That at least can be
> fixed quite easily.  (On the other hand, userspace should never do 
> anything other than a Local Enable.)

The distinction between local and global here never matters on Linux.  We
don't use hardware task switching at all, and if we did it would be part of
context switch, which already switches in debug register values.  

The local vs global distinction you have in debugreg allocation (when one
Linux task_struct is on the CPU vs always on every CPU) is a
machine-independent notion at the level of your debugreg sharing
abstraction, and has nothing to do with particular %dr7 bit values
(just with the allocation of all the bits in %dr7 that correspond to a
particular allocated %drN).

> How about a pair of callbacks: One to notify whenever the watchpoint is 
> enabled and one to notify whenever it is disabled?

That sounds fine.  You'll want to make sure it's structured so it doesn't
get too hairy when a caller wants to just give up and unregister when its
slot is unavailable (hopefully shouldn't lead to calling unregister from
the callback made inside the register call and such twists).

> So for the sake of argument, let's assume that debug registers can be 
> assigned with priority values ranging from 0 to 7 (overkill, but who 
> cares?).  By fiat, ptrace assignments use priority 4.  Then kwatch callers 
> can request whatever priority they like.  The well-behaved cases you've 
> been discussing will use priority 0, and the invasive cases can use 
> priority 7.  (With appropriate symbolic names instead of raw numeric 
> values, naturally.)

Sure.  Or make it signed with lower value wins, have ptrace use -1 and the
average bear use 0 or something especially unobtrusive use >0, and
something very obtrusive use -many.  Unless you are really going to pack it
into a few bits somewhere, I'd make it an arbitrary int rather than a
special small range; it's just for sort order comparison.  Bottom line, I
don't really care about the numerology.  Just so "break ptrace", "don't
break ptrace", and "readily get out of the way on demand" can be expressed.
We can always fine-tune it later as there are more concrete users.

> Or maybe that's too complicated.  Perhaps all userspace assignments should 
> always use the same priority level.  

No, I want priorities among user-mode watchpoint users too.  ptrace is
rigid, but newer facilities can coexist with ptrace on the same thread and
with kwatch, and do fancy new things to fall back when there is debugreg
allocation pressure.  Future user facilities might be able to do VM tricks
that are harder to make workable for kernel mode, for example.  

> For now I would prefer to avoid that.  It's true that kwatch is intended
> _only_ for kernelspace watchpoints, not userspace.  But I'd rather leave
> the complications up to someone else.

Understood.  If you constrain the kwatch interface so it cannot be used
with user addresses (checks < TASK_SIZE or whatever), then the problem will
be clearly defined as the slightly simpler one whenever someone does come
along in need of more complications.

> It seems likely that the interfaces added by kwatch will need to be
> generalized in various ways in order to handle the requirements of other
> architectures.  However I don't know what those requirements might be, so
> it seems best to start out small with x86 only and leave more refinements
> for the future.

Agreed, just to keep it in mind.  I think the features on other machines
are roughly similar except for not offering size choices other than
"anywhere in this aligned word".  

> If I update the patch, adding a priority level and the callback 
> notifications, do you think it would then be acceptable?

I expect so.


Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-06 Thread Alan Stern

On Mon, 5 Feb 2007, Roland McGrath wrote:

> Sorry I've been slow in giving you feedback on kwatch.

No problem (I have plenty of other things to work on!), and thanks for the 
detailed reply.

> > I'll be happy to move this over to the utrace setting, once it is merged.  
> > Do you think it would be better to include the current version of kwatch 
> > now or to wait for utrace?
> > 
> > Roland, is there a schedule for when you plan to get utrace into -mm?
> 
> Since you've asked, I'll mention that I've been discussing this with Andrew
> lately and we plan to work on merging it into -mm as soon as we can manage.
> 
> The kwatch implementation is pretty much orthogonal to the utrace patch as
> it is so far.  As you've noted, it doesn't change the nature of the setting
> of the debug registers; it only moves around the existing code for setting
> them in raw form.  Hence it doesn't much matter which order the work is
> merged at this stage.  There's no reason to withhold kwatch waiting for 
> utrace.

That's good.  So I'll assume an updated version of kwatch can be submitted 
without regard to the progress of utrace (other than minor conflicts over 
the exact location of the ptrace code to change).

> I do have a problem with kwatch, however.  The existing user ABI includes
> setting all of the debug registers, and this interface has never before
> expressed a situation where you can set some but not all of them.  Having
> ptrace suddenly fail with EBUSY when it never did before is not OK.  No
> well-behaved kernel-mode tracing/debugging facility should perturb the user
> experience in this way.  It is certainly understandable that one will
> sometimes want to do invasive kernel-mode debugging and on special
> occasions choose to be ill-behaved in this way (you might know your
> userland work load doesn't include running gdb with watchpoints).  
> But kwatch as it stands does not even make it possible to write a
> well-behaved facility.

Right.  I had been thinking in terms of a developer using kwatch to track 
down some particularly nasty problem, something that would happen rather 
infrequently, where one wouldn't care about side effects on user programs.  
But of course those side effects might alter an important aspect of the 
kernel problem being debugged...

It's also true that the current kwatch version affects the user experience
even when no kernel debugging is going on, as it forcibly prevents ptrace
calls from setting the Global-Enable bits in dr7.  That at least can be
fixed quite easily.  (On the other hand, userspace should never do 
anything other than a Local Enable.)

> I am all in favor of a facility to manage shared use of the debug
> registers, such as your debugreg.h additions.  I just think it needs to be
> a little more flexible.  An unobtrusive kernel facility has to get out of
> the way when user-mode decides to use all its debug registers.  It's not
> immediately important what it's going to about it when contention arises,
> but there has to be a way for the user-mode facilities to say they need to
> allocate debugregs with priority and evict other squatters.  So, something
> like code allocating a debugreg can supply a callback that's made when its
> allocation has to taken by something with higher priority.  

How about a pair of callbacks: One to notify whenever the watchpoint is 
enabled and one to notify whenever it is disabled?

> Even after utrace, there will always be the possibility of a traditional
> uncoordinated user of the raw debug registers, if nothing else ptrace
> compatibility will always be there for old users.  So anything new and
> fancy needs to be prepared to back out of the way gracefully.  In the case
> of kwatch, it can just have a handler function given by the caller to start
> with.  It's OK if individual callers can specially declare "I am not
> well-behaved" and eat debugregs so that well-behaved high-priority users
> like ptrace just have to lose (breaking compatibility).  But no
> well-behaved caller of kwatch will do that.  

No doubt the future userspace API will include some sort of priority 
facility.  For now, though, ptrace doesn't have anything like it.  We just 
have to assign it an arbitrary intermediate priority.

So for the sake of argument, let's assume that debug registers can be 
assigned with priority values ranging from 0 to 7 (overkill, but who 
cares?).  By fiat, ptrace assignments use priority 4.  Then kwatch callers 
can request whatever priority they like.  The well-behaved cases you've 
been discussing will use priority 0, and the invasive cases can use 
priority 7.  (With appropriate symbolic names instead of raw numeric 
values, naturally.)

Or maybe that's too complicated.  Perhaps all userspace assignments should 
always use the same priority level.  After all, it's possible for multiple 
tasks to allocate the same debug register at the same time -- if they had 
differing priorities that would make it much more difficult

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-06 Thread Alan Stern

On Mon, 5 Feb 2007, Roland McGrath wrote:

 Sorry I've been slow in giving you feedback on kwatch.

No problem (I have plenty of other things to work on!), and thanks for the 
detailed reply.

  I'll be happy to move this over to the utrace setting, once it is merged.  
  Do you think it would be better to include the current version of kwatch 
  now or to wait for utrace?
  
  Roland, is there a schedule for when you plan to get utrace into -mm?
 
 Since you've asked, I'll mention that I've been discussing this with Andrew
 lately and we plan to work on merging it into -mm as soon as we can manage.
 
 The kwatch implementation is pretty much orthogonal to the utrace patch as
 it is so far.  As you've noted, it doesn't change the nature of the setting
 of the debug registers; it only moves around the existing code for setting
 them in raw form.  Hence it doesn't much matter which order the work is
 merged at this stage.  There's no reason to withhold kwatch waiting for 
 utrace.

That's good.  So I'll assume an updated version of kwatch can be submitted 
without regard to the progress of utrace (other than minor conflicts over 
the exact location of the ptrace code to change).

 I do have a problem with kwatch, however.  The existing user ABI includes
 setting all of the debug registers, and this interface has never before
 expressed a situation where you can set some but not all of them.  Having
 ptrace suddenly fail with EBUSY when it never did before is not OK.  No
 well-behaved kernel-mode tracing/debugging facility should perturb the user
 experience in this way.  It is certainly understandable that one will
 sometimes want to do invasive kernel-mode debugging and on special
 occasions choose to be ill-behaved in this way (you might know your
 userland work load doesn't include running gdb with watchpoints).  
 But kwatch as it stands does not even make it possible to write a
 well-behaved facility.

Right.  I had been thinking in terms of a developer using kwatch to track 
down some particularly nasty problem, something that would happen rather 
infrequently, where one wouldn't care about side effects on user programs.  
But of course those side effects might alter an important aspect of the 
kernel problem being debugged...

It's also true that the current kwatch version affects the user experience
even when no kernel debugging is going on, as it forcibly prevents ptrace
calls from setting the Global-Enable bits in dr7.  That at least can be
fixed quite easily.  (On the other hand, userspace should never do 
anything other than a Local Enable.)

 I am all in favor of a facility to manage shared use of the debug
 registers, such as your debugreg.h additions.  I just think it needs to be
 a little more flexible.  An unobtrusive kernel facility has to get out of
 the way when user-mode decides to use all its debug registers.  It's not
 immediately important what it's going to about it when contention arises,
 but there has to be a way for the user-mode facilities to say they need to
 allocate debugregs with priority and evict other squatters.  So, something
 like code allocating a debugreg can supply a callback that's made when its
 allocation has to taken by something with higher priority.  

How about a pair of callbacks: One to notify whenever the watchpoint is 
enabled and one to notify whenever it is disabled?

 Even after utrace, there will always be the possibility of a traditional
 uncoordinated user of the raw debug registers, if nothing else ptrace
 compatibility will always be there for old users.  So anything new and
 fancy needs to be prepared to back out of the way gracefully.  In the case
 of kwatch, it can just have a handler function given by the caller to start
 with.  It's OK if individual callers can specially declare I am not
 well-behaved and eat debugregs so that well-behaved high-priority users
 like ptrace just have to lose (breaking compatibility).  But no
 well-behaved caller of kwatch will do that.  

No doubt the future userspace API will include some sort of priority 
facility.  For now, though, ptrace doesn't have anything like it.  We just 
have to assign it an arbitrary intermediate priority.

So for the sake of argument, let's assume that debug registers can be 
assigned with priority values ranging from 0 to 7 (overkill, but who 
cares?).  By fiat, ptrace assignments use priority 4.  Then kwatch callers 
can request whatever priority they like.  The well-behaved cases you've 
been discussing will use priority 0, and the invasive cases can use 
priority 7.  (With appropriate symbolic names instead of raw numeric 
values, naturally.)

Or maybe that's too complicated.  Perhaps all userspace assignments should 
always use the same priority level.  After all, it's possible for multiple 
tasks to allocate the same debug register at the same time -- if they had 
differing priorities that would make it much more difficult to keep things 
straight.  Then there would be only

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-06 Thread Roland McGrath

 That's good.  So I'll assume an updated version of kwatch can be submitted 
 without regard to the progress of utrace (other than minor conflicts over 
 the exact location of the ptrace code to change).

Indeed.

 Right.  I had been thinking in terms of a developer using kwatch to track 
 down some particularly nasty problem, something that would happen rather 
 infrequently, where one wouldn't care about side effects on user programs.  
 But of course those side effects might alter an important aspect of the 
 kernel problem being debugged...

This is indeed a way it might reasonably be used.  As I said, it's fine for
an individual use to be that way.  But think also of using it for
performance measurement (i.e. how hot is this counter) in something like
systemtap, where you might have long-running instrumentation over arbitrary
workloads.

 It's also true that the current kwatch version affects the user experience
 even when no kernel debugging is going on, as it forcibly prevents ptrace
 calls from setting the Global-Enable bits in dr7.  That at least can be
 fixed quite easily.  (On the other hand, userspace should never do 
 anything other than a Local Enable.)

The distinction between local and global here never matters on Linux.  We
don't use hardware task switching at all, and if we did it would be part of
context switch, which already switches in debug register values.  

The local vs global distinction you have in debugreg allocation (when one
Linux task_struct is on the CPU vs always on every CPU) is a
machine-independent notion at the level of your debugreg sharing
abstraction, and has nothing to do with particular %dr7 bit values
(just with the allocation of all the bits in %dr7 that correspond to a
particular allocated %drN).

 How about a pair of callbacks: One to notify whenever the watchpoint is 
 enabled and one to notify whenever it is disabled?

That sounds fine.  You'll want to make sure it's structured so it doesn't
get too hairy when a caller wants to just give up and unregister when its
slot is unavailable (hopefully shouldn't lead to calling unregister from
the callback made inside the register call and such twists).

 So for the sake of argument, let's assume that debug registers can be 
 assigned with priority values ranging from 0 to 7 (overkill, but who 
 cares?).  By fiat, ptrace assignments use priority 4.  Then kwatch callers 
 can request whatever priority they like.  The well-behaved cases you've 
 been discussing will use priority 0, and the invasive cases can use 
 priority 7.  (With appropriate symbolic names instead of raw numeric 
 values, naturally.)

Sure.  Or make it signed with lower value wins, have ptrace use -1 and the
average bear use 0 or something especially unobtrusive use 0, and
something very obtrusive use -many.  Unless you are really going to pack it
into a few bits somewhere, I'd make it an arbitrary int rather than a
special small range; it's just for sort order comparison.  Bottom line, I
don't really care about the numerology.  Just so break ptrace, don't
break ptrace, and readily get out of the way on demand can be expressed.
We can always fine-tune it later as there are more concrete users.

 Or maybe that's too complicated.  Perhaps all userspace assignments should 
 always use the same priority level.  

No, I want priorities among user-mode watchpoint users too.  ptrace is
rigid, but newer facilities can coexist with ptrace on the same thread and
with kwatch, and do fancy new things to fall back when there is debugreg
allocation pressure.  Future user facilities might be able to do VM tricks
that are harder to make workable for kernel mode, for example.  

 For now I would prefer to avoid that.  It's true that kwatch is intended
 _only_ for kernelspace watchpoints, not userspace.  But I'd rather leave
 the complications up to someone else.

Understood.  If you constrain the kwatch interface so it cannot be used
with user addresses (checks  TASK_SIZE or whatever), then the problem will
be clearly defined as the slightly simpler one whenever someone does come
along in need of more complications.

 It seems likely that the interfaces added by kwatch will need to be
 generalized in various ways in order to handle the requirements of other
 architectures.  However I don't know what those requirements might be, so
 it seems best to start out small with x86 only and leave more refinements
 for the future.

Agreed, just to keep it in mind.  I think the features on other machines
are roughly similar except for not offering size choices other than
anywhere in this aligned word.  

 If I update the patch, adding a priority level and the callback 
 notifications, do you think it would then be acceptable?

I expect so.


Thanks,
Roland
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-05 Thread Roland McGrath

Sorry I've been slow in giving you feedback on kwatch.

> I'll be happy to move this over to the utrace setting, once it is merged.  
> Do you think it would be better to include the current version of kwatch 
> now or to wait for utrace?
> 
> Roland, is there a schedule for when you plan to get utrace into -mm?

Since you've asked, I'll mention that I've been discussing this with Andrew
lately and we plan to work on merging it into -mm as soon as we can manage.

The kwatch implementation is pretty much orthogonal to the utrace patch as
it is so far.  As you've noted, it doesn't change the nature of the setting
of the debug registers; it only moves around the existing code for setting
them in raw form.  Hence it doesn't much matter which order the work is
merged at this stage.  There's no reason to withhold kwatch waiting for utrace.

I do have a problem with kwatch, however.  The existing user ABI includes
setting all of the debug registers, and this interface has never before
expressed a situation where you can set some but not all of them.  Having
ptrace suddenly fail with EBUSY when it never did before is not OK.  No
well-behaved kernel-mode tracing/debugging facility should perturb the user
experience in this way.  It is certainly understandable that one will
sometimes want to do invasive kernel-mode debugging and on special
occasions choose to be ill-behaved in this way (you might know your
userland work load doesn't include running gdb with watchpoints).  
But kwatch as it stands does not even make it possible to write a
well-behaved facility.

I am all in favor of a facility to manage shared use of the debug
registers, such as your debugreg.h additions.  I just think it needs to be
a little more flexible.  An unobtrusive kernel facility has to get out of
the way when user-mode decides to use all its debug registers.  It's not
immediately important what it's going to about it when contention arises,
but there has to be a way for the user-mode facilities to say they need to
allocate debugregs with priority and evict other squatters.  So, something
like code allocating a debugreg can supply a callback that's made when its
allocation has to taken by something with higher priority.  

Even after utrace, there will always be the possibility of a traditional
uncoordinated user of the raw debug registers, if nothing else ptrace
compatibility will always be there for old users.  So anything new and
fancy needs to be prepared to back out of the way gracefully.  In the case
of kwatch, it can just have a handler function given by the caller to start
with.  It's OK if individual callers can specially declare "I am not
well-behaved" and eat debugregs so that well-behaved high-priority users
like ptrace just have to lose (breaking compatibility).  But no
well-behaved caller of kwatch will do that.  

As a later improvement, kwatch could try a thing or two to stave off giving
up and telling its caller the watchpoint couldn't stay for the current
task.  For example, if a watchpoint is in kernel memory, you could switch
in your debugreg settings on entering the kernel and restore the user
watchpoints before returning to user mode.  Then you'd need to make
get_user et al somehow observe the user-mode watchpoints.  But it could be
investigated if the need arises.  Note that you can already silently do
something simple like juggling your kwatch debugreg assignments around if
the higher-priority consumer evicting you has left some other debugregs unused.

I certainly intend for later features based on utrace to include
higher-level treatment of watchpoints so that user debugging facilities can
also become responsive to debugreg allocation pressure.  (Eventually, the
user facilities might have easier ways of falling back to other methods and
getting out of the way of kernel debugreg consumers, than can be done for
the kernel-mode-tracing facilities.)  To that end, I'd like to see a clear
and robust interface for debugreg sharing, below the level of kwatch.  I'd
also like to see a thin layer on that giving a machine-independent kernel
source API for talking about watchpoints, which you pretty much have rolled
into the kwatch interface now.  But these are further refinements, not
barriers to including kwatch.

Also, an unrelated minor point.  I think it's error-prone to have an
integer argument to unregister_kwatch.  I think it makes most sense to have
the caller provide the space and call register/unregister with a pointer,
in the style of kprobes.


Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-02-05 Thread Roland McGrath

Sorry I've been slow in giving you feedback on kwatch.

 I'll be happy to move this over to the utrace setting, once it is merged.  
 Do you think it would be better to include the current version of kwatch 
 now or to wait for utrace?
 
 Roland, is there a schedule for when you plan to get utrace into -mm?

Since you've asked, I'll mention that I've been discussing this with Andrew
lately and we plan to work on merging it into -mm as soon as we can manage.

The kwatch implementation is pretty much orthogonal to the utrace patch as
it is so far.  As you've noted, it doesn't change the nature of the setting
of the debug registers; it only moves around the existing code for setting
them in raw form.  Hence it doesn't much matter which order the work is
merged at this stage.  There's no reason to withhold kwatch waiting for utrace.

I do have a problem with kwatch, however.  The existing user ABI includes
setting all of the debug registers, and this interface has never before
expressed a situation where you can set some but not all of them.  Having
ptrace suddenly fail with EBUSY when it never did before is not OK.  No
well-behaved kernel-mode tracing/debugging facility should perturb the user
experience in this way.  It is certainly understandable that one will
sometimes want to do invasive kernel-mode debugging and on special
occasions choose to be ill-behaved in this way (you might know your
userland work load doesn't include running gdb with watchpoints).  
But kwatch as it stands does not even make it possible to write a
well-behaved facility.

I am all in favor of a facility to manage shared use of the debug
registers, such as your debugreg.h additions.  I just think it needs to be
a little more flexible.  An unobtrusive kernel facility has to get out of
the way when user-mode decides to use all its debug registers.  It's not
immediately important what it's going to about it when contention arises,
but there has to be a way for the user-mode facilities to say they need to
allocate debugregs with priority and evict other squatters.  So, something
like code allocating a debugreg can supply a callback that's made when its
allocation has to taken by something with higher priority.  

Even after utrace, there will always be the possibility of a traditional
uncoordinated user of the raw debug registers, if nothing else ptrace
compatibility will always be there for old users.  So anything new and
fancy needs to be prepared to back out of the way gracefully.  In the case
of kwatch, it can just have a handler function given by the caller to start
with.  It's OK if individual callers can specially declare I am not
well-behaved and eat debugregs so that well-behaved high-priority users
like ptrace just have to lose (breaking compatibility).  But no
well-behaved caller of kwatch will do that.  

As a later improvement, kwatch could try a thing or two to stave off giving
up and telling its caller the watchpoint couldn't stay for the current
task.  For example, if a watchpoint is in kernel memory, you could switch
in your debugreg settings on entering the kernel and restore the user
watchpoints before returning to user mode.  Then you'd need to make
get_user et al somehow observe the user-mode watchpoints.  But it could be
investigated if the need arises.  Note that you can already silently do
something simple like juggling your kwatch debugreg assignments around if
the higher-priority consumer evicting you has left some other debugregs unused.

I certainly intend for later features based on utrace to include
higher-level treatment of watchpoints so that user debugging facilities can
also become responsive to debugreg allocation pressure.  (Eventually, the
user facilities might have easier ways of falling back to other methods and
getting out of the way of kernel debugreg consumers, than can be done for
the kernel-mode-tracing facilities.)  To that end, I'd like to see a clear
and robust interface for debugreg sharing, below the level of kwatch.  I'd
also like to see a thin layer on that giving a machine-independent kernel
source API for talking about watchpoints, which you pretty much have rolled
into the kwatch interface now.  But these are further refinements, not
barriers to including kwatch.

Also, an unrelated minor point.  I think it's error-prone to have an
integer argument to unregister_kwatch.  I think it makes most sense to have
the caller provide the space and call register/unregister with a pointer,
in the style of kprobes.


Thanks,
Roland
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-18 Thread Christoph Hellwig

On Thu, Jan 18, 2007 at 08:31:59AM +0100, Ingo Molnar wrote:
> 
> * Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> 
> > > I'll be happy to move this over to the utrace setting, once it is 
> > > merged.  Do you think it would be better to include the current 
> > > version of kwatch now or to wait for utrace?
> > > 
> > > Roland, is there a schedule for when you plan to get utrace into 
> > > -mm?
> > 
> > Even if it goes into mainline soon we'll need a lot of time for all 
> > architectures to catch up, so I think kwatch should definitely comes 
> > first.
> 
> i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
> /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
> years, precisely because it was done in the 'oh, lets get this feature 
> added first, think about it later' manner. Roland's work is a large 
> logistical undertaking and we should not make it more complex than it 
> is. Once it's in we can add debugging features ontop of that. To me work 
> that cleans up existing mess takes precedence before work that adds to 
> the mess.

Utrace doesn't provide any kind of watchpoint infrastructure now, and
utrace will take a lot of time to get ready for inclusion, mostly because
it really needs asll the arch maintainers to help out (and various not
so easy core fixes aswell).

I'm all for merging utrace, and I wish we'd be much further into the
merging process already, but blocking mostly unrelated functionality for
it is more than dumb.

> ps. please fix your mailer to not emit those silly Mail-Followup-To 
> headers! It collapses To: and Cc: lines into one huge unnecessary To: 
> line.

This header is absolutely intentation as far too many folks seem to randomly
drop to or cc lines on mailing lists.  and of course it's alsmost esential
on lists with braindead reply to list policies (e.g. Debian)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-18 Thread Alan Stern

On Thu, 18 Jan 2007, Ingo Molnar wrote:

> 
> * Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> 
> > > I'll be happy to move this over to the utrace setting, once it is 
> > > merged.  Do you think it would be better to include the current 
> > > version of kwatch now or to wait for utrace?
> > > 
> > > Roland, is there a schedule for when you plan to get utrace into 
> > > -mm?
> > 
> > Even if it goes into mainline soon we'll need a lot of time for all 
> > architectures to catch up, so I think kwatch should definitely comes 
> > first.
> 
> i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
> /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
> years, precisely because it was done in the 'oh, lets get this feature 
> added first, think about it later' manner. Roland's work is a large 
> logistical undertaking and we should not make it more complex than it 
> is. Once it's in we can add debugging features ontop of that. To me work 
> that cleans up existing mess takes precedence before work that adds to 
> the mess.

Interestingly, the current version of utrace makes no special provision
for watchpoints, either in kernel or user space.  Instead it relies on the
legacy ptrace mechanism for setting debug registers in the target
process's user area.  Perhaps an explicit watchpoint implementation should
be added to utrace, but that's beyond the scope of this discussion.

Furthermore, utrace is explicitly intended for tracing user programs, not
for tracing the kernel.  Kwatch, however, is just the opposite: It is
intended for setting up watchpoints in kernel space.  In that sense it is
pretty much orthogonal to utrace.  Although it would affect the utrace
patches, the changes would be basically transparent (i.e., move the new
code from one ptrace handler to another instead of moving the old code).

If Kwatch is to be subsumed anywhere, I think it should be under the
Kprobes/Systemtap project.  Again, that's a separate question -- so far 
they have avoided data watchpoints.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-18 Thread Alan Stern

On Thu, 18 Jan 2007, Ingo Molnar wrote:

 
 * Christoph Hellwig [EMAIL PROTECTED] wrote:
 
   I'll be happy to move this over to the utrace setting, once it is 
   merged.  Do you think it would be better to include the current 
   version of kwatch now or to wait for utrace?
   
   Roland, is there a schedule for when you plan to get utrace into 
   -mm?
  
  Even if it goes into mainline soon we'll need a lot of time for all 
  architectures to catch up, so I think kwatch should definitely comes 
  first.
 
 i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
 /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
 years, precisely because it was done in the 'oh, lets get this feature 
 added first, think about it later' manner. Roland's work is a large 
 logistical undertaking and we should not make it more complex than it 
 is. Once it's in we can add debugging features ontop of that. To me work 
 that cleans up existing mess takes precedence before work that adds to 
 the mess.

Interestingly, the current version of utrace makes no special provision
for watchpoints, either in kernel or user space.  Instead it relies on the
legacy ptrace mechanism for setting debug registers in the target
process's user area.  Perhaps an explicit watchpoint implementation should
be added to utrace, but that's beyond the scope of this discussion.

Furthermore, utrace is explicitly intended for tracing user programs, not
for tracing the kernel.  Kwatch, however, is just the opposite: It is
intended for setting up watchpoints in kernel space.  In that sense it is
pretty much orthogonal to utrace.  Although it would affect the utrace
patches, the changes would be basically transparent (i.e., move the new
code from one ptrace handler to another instead of moving the old code).

If Kwatch is to be subsumed anywhere, I think it should be under the
Kprobes/Systemtap project.  Again, that's a separate question -- so far 
they have avoided data watchpoints.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-18 Thread Christoph Hellwig

On Thu, Jan 18, 2007 at 08:31:59AM +0100, Ingo Molnar wrote:
 
 * Christoph Hellwig [EMAIL PROTECTED] wrote:
 
   I'll be happy to move this over to the utrace setting, once it is 
   merged.  Do you think it would be better to include the current 
   version of kwatch now or to wait for utrace?
   
   Roland, is there a schedule for when you plan to get utrace into 
   -mm?
  
  Even if it goes into mainline soon we'll need a lot of time for all 
  architectures to catch up, so I think kwatch should definitely comes 
  first.
 
 i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
 /huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
 years, precisely because it was done in the 'oh, lets get this feature 
 added first, think about it later' manner. Roland's work is a large 
 logistical undertaking and we should not make it more complex than it 
 is. Once it's in we can add debugging features ontop of that. To me work 
 that cleans up existing mess takes precedence before work that adds to 
 the mess.

Utrace doesn't provide any kind of watchpoint infrastructure now, and
utrace will take a lot of time to get ready for inclusion, mostly because
it really needs asll the arch maintainers to help out (and various not
so easy core fixes aswell).

I'm all for merging utrace, and I wish we'd be much further into the
merging process already, but blocking mostly unrelated functionality for
it is more than dumb.


 ps. please fix your mailer to not emit those silly Mail-Followup-To 
 headers! It collapses To: and Cc: lines into one huge unnecessary To: 
 line.

This header is absolutely intentation as far too many folks seem to randomly
drop to or cc lines on mailing lists.  and of course it's alsmost esential
on lists with braindead reply to list policies (e.g. Debian)
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Ingo Molnar

* Christoph Hellwig <[EMAIL PROTECTED]> wrote:

> > I'll be happy to move this over to the utrace setting, once it is 
> > merged.  Do you think it would be better to include the current 
> > version of kwatch now or to wait for utrace?
> > 
> > Roland, is there a schedule for when you plan to get utrace into 
> > -mm?
> 
> Even if it goes into mainline soon we'll need a lot of time for all 
> architectures to catch up, so I think kwatch should definitely comes 
> first.

i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
/huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
years, precisely because it was done in the 'oh, lets get this feature 
added first, think about it later' manner. Roland's work is a large 
logistical undertaking and we should not make it more complex than it 
is. Once it's in we can add debugging features ontop of that. To me work 
that cleans up existing mess takes precedence before work that adds to 
the mess.

Ingo

ps. please fix your mailer to not emit those silly Mail-Followup-To 
headers! It collapses To: and Cc: lines into one huge unnecessary To: 
line.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Christoph Hellwig

On Wed, Jan 17, 2007 at 11:17:37AM -0500, Alan Stern wrote:
> I'll be happy to move this over to the utrace setting, once it is merged.  
> Do you think it would be better to include the current version of kwatch 
> now or to wait for utrace?
> 
> Roland, is there a schedule for when you plan to get utrace into -mm?

Even if it goes into mainline soon we'll need a lot of time for all
architectures to catch up, so I think kwatch should definitely comes first.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Alan Stern

On Tue, 16 Jan 2007, Christoph Hellwig wrote:

> Fir4st I'd say thanks a lot for forward-porting this, it's really useful
> feature for all kinds of nasty debugging.
> 
> I think you should split this into two patches, one for the debugreg
> infrastructure, and one for the actual kwatch code.
> 
> Also I think you provide one (or even a few) example wathes for
> trivial things, say updating i_ino for an inode given through debugfs.
> 
> Some comments on the code below:

Many thanks for your detailed comments and suggestions.  It probably was
obvious that most of the things you picked up on were inherited from the
original Kwatch patch.  I'll update my patch in accordance with your
suggestions.

Responses to just a couple of the comments:

> I suspect this should be replaced wit ha global and local variant
> to fix the above mentioned issue.  It's a tiny bit duplicated code,
> but seems much cleaner.

It would indeed be cleaner.  And in fact the local variant would have a
large amount of dead code, which could be left out entirely (at least from
the initial version).  That's because the only current user of local debug 
register allocations is ptrace.

> > +static void write_dr(int debugreg, unsigned long addr)
> > +{
> > +   switch (debugreg) {
> > +   case 0: set_debugreg(addr, 0);  break;
> > +   case 1: set_debugreg(addr, 1);  break;
> > +   case 2: set_debugreg(addr, 2);  break;
> > +   case 3: set_debugreg(addr, 3);  break;
> > +   case 6: set_debugreg(addr, 6);  break;
> > +   case 7: set_debugreg(addr, 7);  break;
> > +   }
> > +}
> 
> What's the point of this wrapper?

It is called from two different places, and it's better than including
the "switch" in each place.

> I think large parts of this header should go into a new linux/kwatch.h
> so that generic code can use kwatches.

In the long run that may well be true.  For now, I'm a little hesitant to
put something which works only on i386 under include/linux.

> > +config KWATCH
> > +   bool "Kwatch points (EXPERIMENTAL)"
> > +   depends on EXPERIMENTAL
> > +   help
> > + Kwatch enables kernel-space data watchpoints using the processor's
> > + debug registers.  It can be very useful for kernel debugging.
> > + If in doubt, say "N".
> 
> I think we want different options for debugregs and kwatch.  The debugreg
> one probably doesn't have to be actually user-visible, though.

It's easier to start out like this and then change it later when someone
comes up with another use for debugregs.  Or perhaps by then the whole
thing will have been moved over to utrace, making the issue academic.

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Alan Stern

On Wed, 17 Jan 2007, Ingo Molnar wrote:

> * Alan Stern <[EMAIL PROTECTED]> wrote:
> 
> > From: Alan Stern <[EMAIL PROTECTED]>
> > 
> > This patch (as839) implements the Kwatch (kernel-space hardware-based 
> > watchpoints) API for the i386 architecture.  The API is explained in 
> > the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c.
> 
> i think it would be nice to have this ontop of Roland's utrace 
> infrastructure, which nicely modularizes all hardware debugging 
> capabilities and detaches it from ptrace.

I'll be happy to move this over to the utrace setting, once it is merged.  
Do you think it would be better to include the current version of kwatch 
now or to wait for utrace?

Roland, is there a schedule for when you plan to get utrace into -mm?

Alan Stern

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Ingo Molnar


* Alan Stern <[EMAIL PROTECTED]> wrote:

> From: Alan Stern <[EMAIL PROTECTED]>
> 
> This patch (as839) implements the Kwatch (kernel-space hardware-based 
> watchpoints) API for the i386 architecture.  The API is explained in 
> the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c.

i think it would be nice to have this ontop of Roland's utrace 
infrastructure, which nicely modularizes all hardware debugging 
capabilities and detaches it from ptrace.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Ingo Molnar


* Alan Stern [EMAIL PROTECTED] wrote:

 From: Alan Stern [EMAIL PROTECTED]
 
 This patch (as839) implements the Kwatch (kernel-space hardware-based 
 watchpoints) API for the i386 architecture.  The API is explained in 
 the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c.

i think it would be nice to have this ontop of Roland's utrace 
infrastructure, which nicely modularizes all hardware debugging 
capabilities and detaches it from ptrace.

Ingo
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Alan Stern

On Wed, 17 Jan 2007, Ingo Molnar wrote:

 * Alan Stern [EMAIL PROTECTED] wrote:
 
  From: Alan Stern [EMAIL PROTECTED]
  
  This patch (as839) implements the Kwatch (kernel-space hardware-based 
  watchpoints) API for the i386 architecture.  The API is explained in 
  the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c.
 
 i think it would be nice to have this ontop of Roland's utrace 
 infrastructure, which nicely modularizes all hardware debugging 
 capabilities and detaches it from ptrace.

I'll be happy to move this over to the utrace setting, once it is merged.  
Do you think it would be better to include the current version of kwatch 
now or to wait for utrace?

Roland, is there a schedule for when you plan to get utrace into -mm?

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Alan Stern

On Tue, 16 Jan 2007, Christoph Hellwig wrote:

 Fir4st I'd say thanks a lot for forward-porting this, it's really useful
 feature for all kinds of nasty debugging.
 
 I think you should split this into two patches, one for the debugreg
 infrastructure, and one for the actual kwatch code.
 
 Also I think you provide one (or even a few) example wathes for
 trivial things, say updating i_ino for an inode given through debugfs.
 
 Some comments on the code below:

Many thanks for your detailed comments and suggestions.  It probably was
obvious that most of the things you picked up on were inherited from the
original Kwatch patch.  I'll update my patch in accordance with your
suggestions.

Responses to just a couple of the comments:

 I suspect this should be replaced wit ha global and local variant
 to fix the above mentioned issue.  It's a tiny bit duplicated code,
 but seems much cleaner.

It would indeed be cleaner.  And in fact the local variant would have a
large amount of dead code, which could be left out entirely (at least from
the initial version).  That's because the only current user of local debug 
register allocations is ptrace.

  +static void write_dr(int debugreg, unsigned long addr)
  +{
  +   switch (debugreg) {
  +   case 0: set_debugreg(addr, 0);  break;
  +   case 1: set_debugreg(addr, 1);  break;
  +   case 2: set_debugreg(addr, 2);  break;
  +   case 3: set_debugreg(addr, 3);  break;
  +   case 6: set_debugreg(addr, 6);  break;
  +   case 7: set_debugreg(addr, 7);  break;
  +   }
  +}
 
 What's the point of this wrapper?

It is called from two different places, and it's better than including
the switch in each place.

 I think large parts of this header should go into a new linux/kwatch.h
 so that generic code can use kwatches.

In the long run that may well be true.  For now, I'm a little hesitant to
put something which works only on i386 under include/linux.

  +config KWATCH
  +   bool Kwatch points (EXPERIMENTAL)
  +   depends on EXPERIMENTAL
  +   help
  + Kwatch enables kernel-space data watchpoints using the processor's
  + debug registers.  It can be very useful for kernel debugging.
  + If in doubt, say N.
 
 I think we want different options for debugregs and kwatch.  The debugreg
 one probably doesn't have to be actually user-visible, though.

It's easier to start out like this and then change it later when someone
comes up with another use for debugregs.  Or perhaps by then the whole
thing will have been moved over to utrace, making the issue academic.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Christoph Hellwig

On Wed, Jan 17, 2007 at 11:17:37AM -0500, Alan Stern wrote:
 I'll be happy to move this over to the utrace setting, once it is merged.  
 Do you think it would be better to include the current version of kwatch 
 now or to wait for utrace?
 
 Roland, is there a schedule for when you plan to get utrace into -mm?

Even if it goes into mainline soon we'll need a lot of time for all
architectures to catch up, so I think kwatch should definitely comes first.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-17 Thread Ingo Molnar


* Christoph Hellwig [EMAIL PROTECTED] wrote:

  I'll be happy to move this over to the utrace setting, once it is 
  merged.  Do you think it would be better to include the current 
  version of kwatch now or to wait for utrace?
  
  Roland, is there a schedule for when you plan to get utrace into 
  -mm?
 
 Even if it goes into mainline soon we'll need a lot of time for all 
 architectures to catch up, so I think kwatch should definitely comes 
 first.

i disagree. Utrace is a once-in-a-lifetime opportunity to clean up the 
/huge/ ptrace mess. Ptrace has been a very large PITA, for many, many 
years, precisely because it was done in the 'oh, lets get this feature 
added first, think about it later' manner. Roland's work is a large 
logistical undertaking and we should not make it more complex than it 
is. Once it's in we can add debugging features ontop of that. To me work 
that cleans up existing mess takes precedence before work that adds to 
the mess.

Ingo

ps. please fix your mailer to not emit those silly Mail-Followup-To 
headers! It collapses To: and Cc: lines into one huge unnecessary To: 
line.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-16 Thread Christoph Hellwig

Fir4st I'd say thanks a lot for forward-porting this, it's really useful
feature for all kinds of nasty debugging.

I think you should split this into two patches, one for the debugreg
infrastructure, and one for the actual kwatch code.

Also I think you provide one (or even a few) example wathes for
trivial things, say updating i_ino for an inode given through debugfs.

Some comments on the code below:

> --- /dev/null
> +++ usb-2.6/arch/i386/kernel/debugreg.c
> @@ -0,0 +1,182 @@
> +/*
> + *  Debug register
> + *  arch/i386/kernel/debugreg.c

Please don't put in comments that mention the name of the containing
file.  Also the "Debug register" comments seems rather useless.

> + * 2002-Oct  Created by Vamsi Krishna S <[EMAIL PROTECTED]> and
> + *   Bharata Rao <[EMAIL PROTECTED]> to provide debug register
> + *   allocation mechanism.
> + * 2004-Oct  Updated by Prasanna S Panchamukhi <[EMAIL PROTECTED]> with
> + *   idr_allocations mechanism as suggested by Andi Kleen.

I think these kinds of comments aren't in fashion anymore either, all
changelogs should be in git commit messages and initial credits go
into the first commit message.

> +struct debugreg dr_list[DR_MAX];
> +static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED;

I think you're supposed to use magic DEFINE_SPINLOCK macro these days.

> +unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN |
> + DR_GLOBAL_ENABLE_MASK;

I'd rahter keep this static and make  set_process_dr7 a non-inline
function.

> +
> +static unsigned long dr7_global_reg_mask(unsigned int regnum)
> +{
> + return (0xf << (16 + regnum * 4)) | (0x1 << (regnum * 2));
> +}
> +
> +static int get_dr(int regnum, int flag)
> +{
> + if (flag == DR_ALLOC_GLOBAL && !dr_list[regnum].flag) {
> + dr_list[regnum].flag = flag;
> + dr7_global_mask |= dr7_global_reg_mask(regnum);
> + return regnum;
> + }
> + if (flag == DR_ALLOC_LOCAL &&
> + dr_list[regnum].flag != DR_ALLOC_GLOBAL) {
> + dr_list[regnum].flag = flag;
> + dr_list[regnum].use_count++;
> + return regnum;
> + }
> + return -1;

This looks rather poorly structured, as the function does compltely
different things depending on the flags passed in.

> +static void free_dr(int regnum)
> +{
> + if (dr_list[regnum].flag == DR_ALLOC_LOCAL) {
> + if (!--dr_list[regnum].use_count)
> + dr_list[regnum].flag = 0;
> + } else {
> + dr_list[regnum].flag = 0;
> + dr_list[regnum].use_count = 0;
> + dr7_global_mask &= ~(dr7_global_reg_mask(regnum));
> + }
> +}

Same here.

> +int dr_alloc(int regnum, int flag)
> +{
> + int ret = -1;
> +
> + spin_lock(_lock);
> + if (regnum >= 0 && regnum < DR_MAX)
> + ret = get_dr(regnum, flag);
> + else if (regnum == DR_ANY) {
> +
> + /* gdb allocates local debug registers starting from 0.
> +  * To help avoid conflicts, we'll start from the other end.
> +  */
> + for (regnum = DR_MAX - 1; regnum >= 0; --regnum) {
> + ret = get_dr(regnum, flag);
> + if (ret >= 0)
> + break;
> + }
> + } else
> + printk(KERN_ERR "dr_alloc: "
> + "Cannot allocate debug register %d\n", regnum);
> + spin_unlock(_lock);
> + return ret;

I suspect this should be replaced wit ha global and local variant
to fix the above mentioned issue.  It's a tiny bit duplicated code,
but seems much cleaner.

> +static int get_dr(int regnum, int flag)
> +{
> + if (flag == DR_ALLOC_GLOBAL && !dr_list[regnum].flag) {
> + dr_list[regnum].flag = flag;
> + dr7_global_mask |= dr7_global_reg_mask(regnum);
> + return regnum;
> + }
> + if (flag == DR_ALLOC_LOCAL &&
> + dr_list[regnum].flag != DR_ALLOC_GLOBAL) {
> + dr_list[regnum].flag = flag;
> + dr_list[regnum].use_count++;
> + return regnum;
> + }
> + return -1;

Same comments about global vs local here.

> +
> +EXPORT_SYMBOL(dr_alloc);
> +EXPORT_SYMBOL(dr_free);

I don't think we want these exported at all, and if a proper modular
user shows up they should be _GPL as they're fairly lowlevel.

Btw, the naming in the whole debugregs code should be consolidated to
be debugreg_ instead of all kinds of different variants.

> +#ifdef CONFIG_KWATCH
> +
> +/* Set the type, len and global flag in dr7 for a debug register */
> +#define SET_DR7(dr, regnum, type, len)   do {\
> + dr &= ~(0xf << (16 + (regnum) * 4));\
> + dr |= (len) - 1) << 2) | (type)) << \
> + (16 + (regnum) * 4)) |  \
> + (0x2 << ((regnum) * 2));\
> + } while (0)
> +
> +/* Disable a debug

[PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-16 Thread Alan Stern

From: Alan Stern <[EMAIL PROTECTED]>

This patch (as839) implements the Kwatch (kernel-space hardware-based
watchpoints) API for the i386 architecture.  The API is explained in
the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c.

The original version of the patch was written by Vamsi Krishna S and
Bharata Rao.  It was later updated by Prasanna S Panchamukhi for 2.6.13
and then again by me for 2.6.20.

Signed-off-by: Prasanna S Panchamukhi <[EMAIL PROTECTED]>
Signed-off-by: Alan Stern <[EMAIL PROTECTED]>

---

Hardware-based watchpoints can sometimes be indispensable for finding the 
source of problems.  Although this patch is only for the x86 architecture, 
it should still be useful.  And there's no downside to adopting it, since 
it has virtually no overhead with CONFIG_KWATCH isn't selected.


Index: usb-2.6/arch/i386/kernel/debugreg.c
===
--- /dev/null
+++ usb-2.6/arch/i386/kernel/debugreg.c
@@ -0,0 +1,182 @@
+/*
+ *  Debug register
+ *  arch/i386/kernel/debugreg.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * 2002-OctCreated by Vamsi Krishna S <[EMAIL PROTECTED]> and
+ * Bharata Rao <[EMAIL PROTECTED]> to provide debug register
+ * allocation mechanism.
+ * 2004-OctUpdated by Prasanna S Panchamukhi <[EMAIL PROTECTED]> with
+ * idr_allocations mechanism as suggested by Andi Kleen.
+ */
+
+/*
+ * These routines provide a debug register allocation mechanism.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct debugreg {
+   int flag;
+   int use_count;
+};
+
+struct debugreg dr_list[DR_MAX];
+static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED;
+unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN |
+   DR_GLOBAL_ENABLE_MASK;
+
+static unsigned long dr7_global_reg_mask(unsigned int regnum)
+{
+   return (0xf << (16 + regnum * 4)) | (0x1 << (regnum * 2));
+}
+
+static int get_dr(int regnum, int flag)
+{
+   if (flag == DR_ALLOC_GLOBAL && !dr_list[regnum].flag) {
+   dr_list[regnum].flag = flag;
+   dr7_global_mask |= dr7_global_reg_mask(regnum);
+   return regnum;
+   }
+   if (flag == DR_ALLOC_LOCAL &&
+   dr_list[regnum].flag != DR_ALLOC_GLOBAL) {
+   dr_list[regnum].flag = flag;
+   dr_list[regnum].use_count++;
+   return regnum;
+   }
+   return -1;
+}
+
+static void free_dr(int regnum)
+{
+   if (dr_list[regnum].flag == DR_ALLOC_LOCAL) {
+   if (!--dr_list[regnum].use_count)
+   dr_list[regnum].flag = 0;
+   } else {
+   dr_list[regnum].flag = 0;
+   dr_list[regnum].use_count = 0;
+   dr7_global_mask &= ~(dr7_global_reg_mask(regnum));
+   }
+}
+
+int dr_alloc(int regnum, int flag)
+{
+   int ret = -1;
+
+   spin_lock(_lock);
+   if (regnum >= 0 && regnum < DR_MAX)
+   ret = get_dr(regnum, flag);
+   else if (regnum == DR_ANY) {
+
+   /* gdb allocates local debug registers starting from 0.
+* To help avoid conflicts, we'll start from the other end.
+*/
+   for (regnum = DR_MAX - 1; regnum >= 0; --regnum) {
+   ret = get_dr(regnum, flag);
+   if (ret >= 0)
+   break;
+   }
+   } else
+   printk(KERN_ERR "dr_alloc: "
+   "Cannot allocate debug register %d\n", regnum);
+   spin_unlock(_lock);
+   return ret;
+}
+
+void dr_free(int regnum)
+{
+   spin_lock(_lock);
+   if (regnum < 0 || regnum >= DR_MAX || !dr_list[regnum].flag)
+   printk(KERN_ERR "dr_free: "
+   "Cannot free debug register %d\n", regnum);
+   else
+   free_dr(regnum);
+   spin_unlock(_lock);
+}
+
+void dr_inc_use_count(unsigned long mask)
+{
+   int i;
+   int dr_local_enable = 1 << DR_LOCAL_ENABLE_SHIFT;
+
+   spin_lock(_lock);
+   for (i = 0; i < DR_MAX; (++i, dr_local_enable <<= DR_ENABLE_SIZE)) {
+   if (mask & dr_local_enable)
+

[PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-16 Thread Alan Stern

From: Alan Stern [EMAIL PROTECTED]

This patch (as839) implements the Kwatch (kernel-space hardware-based
watchpoints) API for the i386 architecture.  The API is explained in
the kerneldoc for register_kwatch() in arch/i386/kernel/kwatch.c.

The original version of the patch was written by Vamsi Krishna S and
Bharata Rao.  It was later updated by Prasanna S Panchamukhi for 2.6.13
and then again by me for 2.6.20.

Signed-off-by: Prasanna S Panchamukhi [EMAIL PROTECTED]
Signed-off-by: Alan Stern [EMAIL PROTECTED]

---

Hardware-based watchpoints can sometimes be indispensable for finding the 
source of problems.  Although this patch is only for the x86 architecture, 
it should still be useful.  And there's no downside to adopting it, since 
it has virtually no overhead with CONFIG_KWATCH isn't selected.


Index: usb-2.6/arch/i386/kernel/debugreg.c
===
--- /dev/null
+++ usb-2.6/arch/i386/kernel/debugreg.c
@@ -0,0 +1,182 @@
+/*
+ *  Debug register
+ *  arch/i386/kernel/debugreg.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004
+ *
+ * 2002-OctCreated by Vamsi Krishna S [EMAIL PROTECTED] and
+ * Bharata Rao [EMAIL PROTECTED] to provide debug register
+ * allocation mechanism.
+ * 2004-OctUpdated by Prasanna S Panchamukhi [EMAIL PROTECTED] with
+ * idr_allocations mechanism as suggested by Andi Kleen.
+ */
+
+/*
+ * These routines provide a debug register allocation mechanism.
+ */
+
+#include linux/kernel.h
+#include linux/spinlock.h
+#include linux/module.h
+#include asm/system.h
+#include asm/debugreg.h
+
+struct debugreg {
+   int flag;
+   int use_count;
+};
+
+struct debugreg dr_list[DR_MAX];
+static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED;
+unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN |
+   DR_GLOBAL_ENABLE_MASK;
+
+static unsigned long dr7_global_reg_mask(unsigned int regnum)
+{
+   return (0xf  (16 + regnum * 4)) | (0x1  (regnum * 2));
+}
+
+static int get_dr(int regnum, int flag)
+{
+   if (flag == DR_ALLOC_GLOBAL  !dr_list[regnum].flag) {
+   dr_list[regnum].flag = flag;
+   dr7_global_mask |= dr7_global_reg_mask(regnum);
+   return regnum;
+   }
+   if (flag == DR_ALLOC_LOCAL 
+   dr_list[regnum].flag != DR_ALLOC_GLOBAL) {
+   dr_list[regnum].flag = flag;
+   dr_list[regnum].use_count++;
+   return regnum;
+   }
+   return -1;
+}
+
+static void free_dr(int regnum)
+{
+   if (dr_list[regnum].flag == DR_ALLOC_LOCAL) {
+   if (!--dr_list[regnum].use_count)
+   dr_list[regnum].flag = 0;
+   } else {
+   dr_list[regnum].flag = 0;
+   dr_list[regnum].use_count = 0;
+   dr7_global_mask = ~(dr7_global_reg_mask(regnum));
+   }
+}
+
+int dr_alloc(int regnum, int flag)
+{
+   int ret = -1;
+
+   spin_lock(dr_lock);
+   if (regnum = 0  regnum  DR_MAX)
+   ret = get_dr(regnum, flag);
+   else if (regnum == DR_ANY) {
+
+   /* gdb allocates local debug registers starting from 0.
+* To help avoid conflicts, we'll start from the other end.
+*/
+   for (regnum = DR_MAX - 1; regnum = 0; --regnum) {
+   ret = get_dr(regnum, flag);
+   if (ret = 0)
+   break;
+   }
+   } else
+   printk(KERN_ERR dr_alloc: 
+   Cannot allocate debug register %d\n, regnum);
+   spin_unlock(dr_lock);
+   return ret;
+}
+
+void dr_free(int regnum)
+{
+   spin_lock(dr_lock);
+   if (regnum  0 || regnum = DR_MAX || !dr_list[regnum].flag)
+   printk(KERN_ERR dr_free: 
+   Cannot free debug register %d\n, regnum);
+   else
+   free_dr(regnum);
+   spin_unlock(dr_lock);
+}
+
+void dr_inc_use_count(unsigned long mask)
+{
+   int i;
+   int dr_local_enable = 1  DR_LOCAL_ENABLE_SHIFT;
+
+   spin_lock(dr_lock);
+   for (i = 0; i  DR_MAX; (++i, dr_local_enable = DR_ENABLE_SIZE)) {
+

Re: [PATCH] Kwatch: kernel watchpoints using CPU debug registers

2007-01-16 Thread Christoph Hellwig

Fir4st I'd say thanks a lot for forward-porting this, it's really useful
feature for all kinds of nasty debugging.

I think you should split this into two patches, one for the debugreg
infrastructure, and one for the actual kwatch code.

Also I think you provide one (or even a few) example wathes for
trivial things, say updating i_ino for an inode given through debugfs.

Some comments on the code below:

 --- /dev/null
 +++ usb-2.6/arch/i386/kernel/debugreg.c
 @@ -0,0 +1,182 @@
 +/*
 + *  Debug register
 + *  arch/i386/kernel/debugreg.c

Please don't put in comments that mention the name of the containing
file.  Also the Debug register comments seems rather useless.

 + * 2002-Oct  Created by Vamsi Krishna S [EMAIL PROTECTED] and
 + *   Bharata Rao [EMAIL PROTECTED] to provide debug register
 + *   allocation mechanism.
 + * 2004-Oct  Updated by Prasanna S Panchamukhi [EMAIL PROTECTED] with
 + *   idr_allocations mechanism as suggested by Andi Kleen.

I think these kinds of comments aren't in fashion anymore either, all
changelogs should be in git commit messages and initial credits go
into the first commit message.

 +struct debugreg dr_list[DR_MAX];
 +static spinlock_t dr_lock = SPIN_LOCK_UNLOCKED;

I think you're supposed to use magic DEFINE_SPINLOCK macro these days.

 +unsigned long dr7_global_mask = DR_CONTROL_RESERVED | DR_GLOBAL_SLOWDOWN |
 + DR_GLOBAL_ENABLE_MASK;

I'd rahter keep this static and make  set_process_dr7 a non-inline
function.

 +
 +static unsigned long dr7_global_reg_mask(unsigned int regnum)
 +{
 + return (0xf  (16 + regnum * 4)) | (0x1  (regnum * 2));
 +}
 +
 +static int get_dr(int regnum, int flag)
 +{
 + if (flag == DR_ALLOC_GLOBAL  !dr_list[regnum].flag) {
 + dr_list[regnum].flag = flag;
 + dr7_global_mask |= dr7_global_reg_mask(regnum);
 + return regnum;
 + }
 + if (flag == DR_ALLOC_LOCAL 
 + dr_list[regnum].flag != DR_ALLOC_GLOBAL) {
 + dr_list[regnum].flag = flag;
 + dr_list[regnum].use_count++;
 + return regnum;
 + }
 + return -1;

This looks rather poorly structured, as the function does compltely
different things depending on the flags passed in.

 +static void free_dr(int regnum)
 +{
 + if (dr_list[regnum].flag == DR_ALLOC_LOCAL) {
 + if (!--dr_list[regnum].use_count)
 + dr_list[regnum].flag = 0;
 + } else {
 + dr_list[regnum].flag = 0;
 + dr_list[regnum].use_count = 0;
 + dr7_global_mask = ~(dr7_global_reg_mask(regnum));
 + }
 +}

Same here.

 +int dr_alloc(int regnum, int flag)
 +{
 + int ret = -1;
 +
 + spin_lock(dr_lock);
 + if (regnum = 0  regnum  DR_MAX)
 + ret = get_dr(regnum, flag);
 + else if (regnum == DR_ANY) {
 +
 + /* gdb allocates local debug registers starting from 0.
 +  * To help avoid conflicts, we'll start from the other end.
 +  */
 + for (regnum = DR_MAX - 1; regnum = 0; --regnum) {
 + ret = get_dr(regnum, flag);
 + if (ret = 0)
 + break;
 + }
 + } else
 + printk(KERN_ERR dr_alloc: 
 + Cannot allocate debug register %d\n, regnum);
 + spin_unlock(dr_lock);
 + return ret;

I suspect this should be replaced wit ha global and local variant
to fix the above mentioned issue.  It's a tiny bit duplicated code,
but seems much cleaner.

 +static int get_dr(int regnum, int flag)
 +{
 + if (flag == DR_ALLOC_GLOBAL  !dr_list[regnum].flag) {
 + dr_list[regnum].flag = flag;
 + dr7_global_mask |= dr7_global_reg_mask(regnum);
 + return regnum;
 + }
 + if (flag == DR_ALLOC_LOCAL 
 + dr_list[regnum].flag != DR_ALLOC_GLOBAL) {
 + dr_list[regnum].flag = flag;
 + dr_list[regnum].use_count++;
 + return regnum;
 + }
 + return -1;

Same comments about global vs local here.

 +
 +EXPORT_SYMBOL(dr_alloc);
 +EXPORT_SYMBOL(dr_free);

I don't think we want these exported at all, and if a proper modular
user shows up they should be _GPL as they're fairly lowlevel.

Btw, the naming in the whole debugregs code should be consolidated to
be debugreg_ instead of all kinds of different variants.

 +#ifdef CONFIG_KWATCH
 +
 +/* Set the type, len and global flag in dr7 for a debug register */
 +#define SET_DR7(dr, regnum, type, len)   do {\
 + dr = ~(0xf  (16 + (regnum) * 4));\
 + dr |= (len) - 1)  2) | (type))  \
 + (16 + (regnum) * 4)) |  \
 + (0x2  ((regnum) * 2));\
 + } while (0)
 +
 +/* Disable a debug register by clearing the global/local flag in dr7 */
 +#define RESET_DR7(dr, regnum)dr = ~(0x3  ((regnum) * 2))

I don't think

48 matches

Mail list logo