Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Valery Ushakov
On Thu, Dec 15, 2016 at 19:51:35 +0100, Kamil Rytarowski wrote:

> On 15.12.2016 16:42, Valery Ushakov wrote:
> > Again, you don't provide any details.  What extra logic?  Also, what
> > are these few dozens of instructions you are talking about?  I.e. what
> > is that extra work you have to do for a process-wide watchpoint that
> > you don't have to do for an lwp-specific watchpoint on each return to
> > userland?
> 
> 1. The complexity is adding an extra case to the ptrace_watchpoint
> structure: a way to specify per-thread or per-process. If someone then
> wants to set per-thread watchpoints inside the process structure, there
> would need to be a list of available watchpoints, which would scale as
> the number of possible watchpoints times the number of threads.
> 
> 2. Complexity on returning to userland: we would need to lock the
> process structure in userret(9) and check, for every watchpoint,
> whether it's process-wide or dedicated to the thread.

Why would you need all this?  Consider the case where debug registers
are part of the mcontext: then the very act of restoring the context
enables the corresponding watchpoints for the lwp.  When the debug
registers are not part of the mcontext, the only difference is that
after restoring the mcontext you also set the debug registers from some
other structure.

E.g. sh3 uses the User Break Controller to implement single-stepping,
so effectively a kind of watchpoint that triggers after an instruction,
not matching any address bits, ASID, etc.  The register in the UBC that
enables the watchpoint is set from a field in the trapframe, just like
any other register.

So at ptrace(2) time, to set a process-wide watchpoint you go over all
existing lwps and set up their trapframes accordingly.  For new lwps
created after the watchpoint is set you need to do that at lwp
creation time.  But when an lwp returns to userland, there's no overhead.
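A minimal sketch of that approach (illustrative only: struct
ptrace_watchpoint is the proposed, MD-specific description, and
lwp_md_set_watchpoint() is a hypothetical MD hook, not existing NetBSD
code):

#include <sys/param.h>
#include <sys/proc.h>
#include <sys/lwp.h>

struct ptrace_watchpoint;               /* proposed, MD-specific layout */

/* Hypothetical MD hook: program one LWP's trapframe/PCB enable bits. */
void lwp_md_set_watchpoint(struct lwp *, const struct ptrace_watchpoint *);

/*
 * Sketch: at ptrace(2) time, a process-wide watchpoint is applied by
 * programming every existing LWP; the same hook would be called from
 * LWP creation for LWPs created later.  No work is left for userret().
 */
static void
proc_set_watchpoint_all_lwps(struct proc *p, const struct ptrace_watchpoint *pw)
{
        struct lwp *l;

        mutex_enter(p->p_lock);
        LIST_FOREACH(l, &p->p_lwps, l_sibling)
                lwp_md_set_watchpoint(l, pw);
        mutex_exit(p->p_lock);
}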


> I originally implemented it per-process, but finally decided to throw
> the per-process vs. per-thread logic out of the kernel and expose
> watchpoints (technically, bitmasks of the available debug registers) to
> userland.
> 
> It's easier to check a per-LWP local structure with at most 4 fields
> than to lock a list and iterate over N elements. Every thread also has
> a dedicated bit in its properties indicating whether it has watchpoints
> attached.
> 
> From a userland point of view the management is equivalent, with the
> difference that the debugger needs to catch thread creation and apply
> the desired watchpoint to the new thread.
> 
> Why bitmasks and not raw registers? At some level the kernel needs to
> check that the composed combination is valid - separating the
> user-settable bits of the registers into a bitmask is needed somewhere
> anyway, and while that could be done entirely in the kernel, why not
> export it to userland?
> 
> I've found it easier to reuse in 3rd party software.

-uwe


Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Kamil Rytarowski
On 15.12.2016 19:30, Andrew Cagney wrote:
> 
> On 13 December 2016 at 12:16, Kamil Rytarowski wrote:
> 
> >> 5. Do not allow mixing PT_STEP and hardware watchpoints; when
> >> single-stepping the code, disable (i.e. don't set) hardware
> >> watchpoints for the threads. Some platforms might implement
> >> single-step with hardware watchpoints, and managing both at the same
> >> time generates extra pointless complexity.
> 
> 
> Is this wise?  I suspect it might be better to just expose all the hairy
> details and let the client decide if the restriction should apply.
> (to turn this round, if the details are not exposed, then clients will
> wonder why their platform is being crippled).

This is subject to change. I'm discussing it with debugger developers on
LLDB. They wish to have as much data available about the
breakpoint/watchpoint as possible. This implies a request for a dedicated
si_code for hardware-assisted traps.
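As a rough, debugger-side illustration of what such a dedicated si_code
would enable (PT_GET_SIGINFO, struct ptrace_siginfo and the TRAP_DBREG
name are assumptions here, not a settled interface):

#include <sys/types.h>
#include <sys/ptrace.h>
#include <signal.h>
#include <stdio.h>

#ifndef TRAP_DBREG
#define TRAP_DBREG 6    /* placeholder value for a debug-register si_code */
#endif

/* Assumes a PT_GET_SIGINFO-style request that fills a ptrace_siginfo record. */
static void
report_stop(pid_t pid)
{
        struct ptrace_siginfo psi;

        if (ptrace(PT_GET_SIGINFO, pid, &psi, sizeof(psi)) == -1)
                return;
        if (psi.psi_siginfo.si_signo == SIGTRAP &&
            psi.psi_siginfo.si_code == TRAP_DBREG)
                printf("hardware watchpoint/breakpoint hit in LWP %d\n",
                    (int)psi.psi_lwpid);
}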





Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Andrew Cagney
On 15 December 2016 at 13:23, Kamil Rytarowski  wrote:

> BTW, I have some DWARF-related questions; may I reach you in a
> private mail?
>

Better is http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
so you don't have me as a bottleneck.

Andrew


Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Andrew Cagney
On 15 December 2016 at 13:22, Eduardo Horvath  wrote:

>
> On Thu, 15 Dec 2016, Andrew Cagney wrote:
>
> > Might a better strategy be to first get the registers exposed, and then,
> > if there's still time, start to look at an abstract interface?
>
> That's one way of looking at it.
>
> Another way is to consider that watchpoints can be implemented through
> careful use of the MMU.
>
>
Yes, HP implemented something like this with wildebeast (gdb fork) on HP-UX.


> > and now lets consider this simple example, try to watch c.a in:
> >
> > struct { char c; char a[3]; int32_t i; int64_t j; } c;
> >
> > Under the proposed model (it looks a lot like gdb's remote protocol's Z
> > packet) it's assumed this will allocate one watch-point:
> >
> > address=, size=3
>
> So when you register the watchpoint, the kernel adds the address and size
> to an internal list and changes the protection of that page.  If it's a
> write watchpoint, the page is made read-only.  If it's a read watchpoint,
> it's made invalid.
>
> The userland program runs happily until it tries to access something on
> the page with the watchpoint.  Then it takes a page fault.
>
>
Threads cause the theory to cough up a few fur balls, but nothing fatal.


> The fault handler checks the fault address against its watchpoint list,
> and if there's a match, send a ptrace event and you're done.
>
> If it doesn't match the address, the kernel can either map the address in
> and single step, set a breakpoint on the next instruction, or emulate the
> instruction, and then protect the page again to wait for the next fault.
>
> It has a bit more overhead than using debug registers, but it scales a lot
> better.
>

Agreed.  There are many things that can be implemented to improve userland
debugging.  Simple watch-points (especially if they work on the kernel) are
a good starting point.

Andrew


Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Andrew Cagney
On 13 December 2016 at 12:16, Kamil Rytarowski  wrote:

> >> 5. Do not allow mixing PT_STEP and hardware watchpoints; when
> >> single-stepping the code, disable (i.e. don't set) hardware
> >> watchpoints for the threads. Some platforms might implement
> >> single-step with hardware watchpoints, and managing both at the same
> >> time generates extra pointless complexity.
>
>
Is this wise?  I suspect it might be better to just expose all the hairy
details and let the client decide if the restriction should apply.
(to turn this round, if the details are not exposed, then clients will
wonder why their platform is being crippled).


> > I don't think I see how "extra pointless complexity" follows.
> >
>
> 1. At least in the MD x86-specific code, watchpoint traps triggered
> while stepping are reported differently from plain steps and also
> differently from plain hardware watchpoint traps. They are a third type
> of trap.
>
> 2. Single-stepping can be implemented with hardware-assisted
> watchpoints (technically breakpoints) on the kernel side in MD code. If
> so, trying to apply watchpoints and single-step together will conflict,
> and this will need additional handling on the kernel side.
> 
> To avoid extra complexity I propose to make stepping and watchpoints
> mutually exclusive: one or the other, but not both.


Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Kamil Rytarowski
On 15.12.2016 18:45, Andrew Cagney wrote:
> [see end]


Please see inline.

> 
> On 13 December 2016 at 12:16, Kamil Rytarowski wrote:
> 
> On 13.12.2016 04:12, Valery Ushakov wrote:
> > On Tue, Dec 13, 2016 at 02:04:36 +0100, Kamil Rytarowski wrote:
> >
> >> The design is as follows:
> >>
> >> 1. Accessors through:
> >>  - PT_WRITE_WATCHPOINT - write a new watchpoint's state (set, unset, ...),
> >>  - PT_READ_WATCHPOINT - read a watchpoint's state,
> >>  - PT_COUNT_WATCHPOINT - receive the number of available watchpoints.
> >
> > Gdb supports hardware assisted watchpoints.  That implies that other
> > OSes have existing designs for them.  Have you studied those existing
> > designs?  Why do you think they are not suitable to be copied?
> >
> 
> They are based on the concept of exporting debug registers to the
> tracee's context (machine context/userdata/etc). FreeBSD exposes
> MD-specific DBREGS to be set/get by a user, similarly to Linux and MacOSX.
> 
> GDB supports hardware and software assisted watchpoints. Software ones
> step the code and check each instruction, hardware ones make use of the
> registers.
> 
> I propose to export an interface that is not limited to one type of
> hardware-assisted action, while still being fully usable for hardware
> watchpoints (if the CPU supports them). This interface will abstract
> the underlying hardware-specific capabilities behind MI ptrace(2) calls
> (but with an MD-specific ptrace_watchpoint structure).
> 
> These interfaces are already platform specific and aren't shared between
> OSes.
> 
> 
> That isn't true (or at least it shouldn't be).
> 
> While access to the registers is OS specific, the contents of the
> registers, and their behaviour is not.  Instead, that is specified by
> the Instruction Set Architecture.
> For instance, FreeBSD's gdb/i386fbsd-nat.c uses the generic
> gdb/x86-nat.c:x86_use_watchpoints() code.
> 
>  

Thank you for your pointer.

It's similar in LLDB; there are some assumptions there that I consider
fragile (like dbregs in mcontext). However, in the end I think the
result is the same.

One of my motivations was to offer a simplified interface at the user
level and make it easier to integrate into existing applications with
built-in debuggers, like radare2. Another motivation was to use
breakpoints without mprotect restrictions: handling a few small
breakpoints is easier with hardware-assisted watchpoints and more tricky
with software ones.

> 
> Some time ago I checked and, IIRC, the only two users of these
> interfaces were GDB and LLDB; I inferred from this that there is no
> danger of heavy patching of 3rd party software.
> 
> 
> I'm not sure how to interpret this.  Is the suggestion that, because
> there are only two consumers, hacking them both will be easy; or
> something else?  I hope it isn't.  Taking on such maintenance has a
> horrendous cost.
> 

So, please help bring a local debugger developer on board to take on
the maintenance cost, sensu largo.

> Anyway, let's look at the problem space.  It might help to understand why
> kernel developers tend to throw up their hands.
> 
> First let's set the scene:
> 
> - if we're lucky we have one hardware watch-point, if we're really lucky
> there's more than one
> - if we're lucky it does something, if we're really lucky it does what
> the documentation says
> 
> which reminds me:
> 
> - if we're lucky we've got documentation, if we're really lucky we've got
> correct and up-to-date errata explaining all the hare-brained
> interactions these features have with other hardware events
> 
> and now let's consider this simple example, try to watch c.a in:
> 
> struct { char c; char a[3]; int32_t i; int64_t j; } c;
> 
> Under the proposed model (it looks a lot like gdb's remote protocol's Z
> packet) it's assumed this will allocate one watch-point:
> 
> address=, size=3
> 
> but wait, the hardware watch-point registers have a few, er, standard
> features:
> 
> - I'll be kind, there are two registers
> - size must be power-of-two (lucky size==4 isn't fixed)
> - address must be size aligned (lucky addr & 3 == 0 isn't fixed)
> - there are separate read/write bits (lucky r+w isn't fixed)
> 
> so what to do?  With this hardware we can:
> 
> - use two watch-point registers (making your count meaningless), so that
> accesses only apply to the address in question
> 
> - use one watch-point register and over-allocate the address/size and
> then try to figure out what happened.
> For writes, a memcmp can help; for reads, well, you might be lucky and
> have a further register with the access address, or unlucky and find
> yourself disassembling instructions to figure out what the address/size
> really was
> 
> Now, let's consider what happens when the user tries to add:
> 
>, size=8
> 
> depending on where all the balls are (above decision, and the 

Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Kamil Rytarowski
Hello,

Please see inline; I tried to address the other questions off-list.

On 15.12.2016 16:42, Valery Ushakov wrote:
> Again, you don't provide any details.  What extra logic?  Also, what
> are these few dozens of instructions you are talking about?  I.e. what
> is that extra work you have to do for a process-wide watchpoint that
> you don't have to do for an lwp-specific watchpoint on each return to
> userland?
> 
> 

1. The complexity is adding an extra case to the ptrace_watchpoint
structure: a way to specify per-thread or per-process. If someone then
wants to set per-thread watchpoints inside the process structure, there
would need to be a list of available watchpoints, which would scale as
the number of possible watchpoints times the number of threads.

2. Complexity on returning to userland: we would need to lock the
process structure in userret(9) and check, for every watchpoint,
whether it's process-wide or dedicated to the thread.

I originally implemented it per-process, but finally decided to throw
the per-process vs. per-thread logic out of the kernel and expose
watchpoints (technically, bitmasks of the available debug registers) to
userland.

It's easier to check a per-LWP local structure with at most 4 fields
than to lock a list and iterate over N elements. Every thread also has
a dedicated bit in its properties indicating whether it has watchpoints
attached.

From a userland point of view the management is equivalent, with the
difference that the debugger needs to catch thread creation and apply
the desired watchpoint to the new thread.

Why bitmasks and not raw registers? At some level the kernel needs to
check that the composed combination is valid - separating the
user-settable bits of the registers into a bitmask is needed somewhere
anyway, and while that could be done entirely in the kernel, why not
export it to userland?

I've found it easier to reuse in 3rd party software.
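For illustration only, a bitmask-style description on x86 could look
roughly like this (the structure layout and names are hypothetical; the
point is the kind of validation of the DR7-style condition/length bits
that the kernel has to do somewhere anyway):

#include <sys/types.h>
#include <stdint.h>

/* Hypothetical condition bits, mirroring the x86 DR7 R/W encoding. */
#define PTRACE_WP_EXEC   0x0    /* break on instruction execution */
#define PTRACE_WP_WRITE  0x1    /* break on data writes */
#define PTRACE_WP_RDWR   0x3    /* break on data reads or writes */

/* Hypothetical per-LWP, bitmask-style watchpoint description. */
struct ptrace_watchpoint_x86 {
        int       pw_index;     /* which debug register slot, 0..3 */
        lwpid_t   pw_lwpid;     /* LWP the watchpoint is attached to */
        uintptr_t pw_address;   /* watched address (goes into DR0..DR3) */
        uint8_t   pw_condition; /* one of the PTRACE_WP_* values */
        uint8_t   pw_length;    /* 1, 2, 4 or 8 bytes (DR7 LEN field) */
};

/* The kernel still has to check that the combination is encodable. */
static int
pw_valid(const struct ptrace_watchpoint_x86 *pw)
{
        if (pw->pw_index < 0 || pw->pw_index > 3)
                return 0;
        if (pw->pw_length != 1 && pw->pw_length != 2 &&
            pw->pw_length != 4 && pw->pw_length != 8)
                return 0;
        /* Debug registers expect the address to be length-aligned. */
        if (pw->pw_address & (pw->pw_length - 1))
                return 0;
        return 1;
}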





Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Kamil Rytarowski
On 15.12.2016 19:45, Andrew Cagney wrote:
> 
> 
> On 15 December 2016 at 13:22, Eduardo Horvath wrote:
> 
> 
> On Thu, 15 Dec 2016, Andrew Cagney wrote:
> 
> > Might a better strategy be to first get the registers exposed, and
> > then, if there's still time, start to look at an abstract interface?
> 
> That's one way of looking at it.
> 
> Another way is to consider that watchpoints can be implemented through
> careful use of the MMU.
> 
> 
> Yes, HP implemented something like this with wildebeast (gdb fork) on HP-UX.

Can it work for short data fields like shorts and integers? I know there
are features like mprotect(2) that could perhaps be used for the same
purpose... to some extent.






Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Eduardo Horvath

On Thu, 15 Dec 2016, Andrew Cagney wrote:

> Might a better strategy be to first get the registers exposed, and then, if
> there's still time, start to look at an abstract interface?

That's one way of looking at it.

Another way is to consider that watchpoints can be implemented through 
careful use of the MMU.


> and now lets consider this simple example, try to watch c.a in:
> 
> struct { char c; char a[3]; int32_t i; int64_t j; } c;
> 
> Under the proposed model (it looks a lot like gdb's remote protocol's Z
> packet) it's assumed this will allocate one watch-point:
> 
> address=, size=3

So when you register the watchpoint, the kernel adds the address and size 
to an internal list and changes the protection of that page.  If it's a 
write watchpoint, the page is made read-only.  If it's a read watchpoint, 
it's made invalid.

The userland program runs happily until it tries to access something on 
the page with the watchpoint.  Then it takes a page fault.  

The fault handler checks the fault address against its watchpoint list, 
and if there's a match, send a ptrace event and you're done.

If it doesn't match the address, the kernel can either map the address in 
and single step, set a breakpoint on the next instruction, or emulate the 
instruction, and then protect the page again to wait for the next fault.
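A rough sketch of the lookup step in that fault handler (names such as
swwatch and wp_list are made up for illustration):

#include <sys/queue.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative software-watchpoint record kept by the kernel. */
struct swwatch {
        uintptr_t wp_addr;              /* start of the watched range */
        size_t    wp_size;              /* length of the watched range */
        int       wp_write;             /* nonzero for write-only watchpoints */
        LIST_ENTRY(swwatch) wp_link;
};

static LIST_HEAD(, swwatch) wp_list = LIST_HEAD_INITIALIZER(wp_list);

/*
 * Called from the page-fault handler for a page we deliberately
 * protected: return the matching watchpoint (report a ptrace event),
 * or NULL (map the page back in, single-step or emulate the faulting
 * instruction, then protect the page again).
 */
static struct swwatch *
wp_lookup(uintptr_t faultva, int is_write)
{
        struct swwatch *wp;

        LIST_FOREACH(wp, &wp_list, wp_link) {
                if (wp->wp_write && !is_write)
                        continue;
                if (faultva >= wp->wp_addr &&
                    faultva <  wp->wp_addr + wp->wp_size)
                        return wp;
        }
        return NULL;
}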

It has a bit more overhead than using debug registers, but it scales a lot 
better.

Eduardo


Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Andrew Cagney
[see end]

On 13 December 2016 at 12:16, Kamil Rytarowski  wrote:

> On 13.12.2016 04:12, Valery Ushakov wrote:
> > On Tue, Dec 13, 2016 at 02:04:36 +0100, Kamil Rytarowski wrote:
> >
> >> The design is as follows:
> >>
> >> 1. Accessors through:
> >>  - PT_WRITE_WATCHPOINT - write a new watchpoint's state (set, unset, ...),
> >>  - PT_READ_WATCHPOINT - read a watchpoint's state,
> >>  - PT_COUNT_WATCHPOINT - receive the number of available watchpoints.
> >
> > Gdb supports hardware assisted watchpoints.  That implies that other
> > OSes have existing designs for them.  Have you studied those existing
> > designs?  Why do you think they are not suitable to be copied?
> >
>
> They are based on the concept of exporting debug registers to the
> tracee's context (machine context/userdata/etc). FreeBSD exposes
> MD-specific DBREGS to be set/get by a user, similarly to Linux and MacOSX.
>
> GDB supports hardware and software assisted watchpoints. Software ones
> step the code and check each instruction, hardware ones make use of the
> registers.
>
> I propose to export an interface that is not limited to one type of
> hardware-assisted action, while still being fully usable for hardware
> watchpoints (if the CPU supports them). This interface will abstract
> the underlying hardware-specific capabilities behind MI ptrace(2) calls
> (but with an MD-specific ptrace_watchpoint structure).
>
> These interfaces are already platform specific and aren't shared between
> OSes.
>
>
That isn't true (or at least it shouldn't be).

While access to the registers is OS specific, the contents of the
registers, and their behaviour is not.  Instead, that is specified by the
Instruction Set Architecture.
For instance, FreeBSD's gdb/i386fbsd-nat.c uses the generic
gdb/x86-nat.c:x86_use_watchpoints() code.



> Some time ago I checked and, IIRC, the only two users of these
> interfaces were GDB and LLDB; I inferred from this that there is no
> danger of heavy patching of 3rd party software.


I'm not sure how to interpret this.  Is the suggestion that, because there
are only two consumers, hacking them both will be easy; or something else?
I hope it isn't.  Taking on such maintenance has a horrendous cost.

Anyway, let's look at the problem space.  It might help to understand why
kernel developers tend to throw up their hands.

First let's set the scene:

- if we're lucky we have one hardware watch-point, if we're really lucky
there's more than one
- if we're lucky it does something, if we're really lucky it does what the
documentation says

which reminds me:

- if we're lucky we've got documentation, if we're really lucky we've got
correct and up-to-date errata explaining all the hare-brained interactions
these features have with other hardware events

and now let's consider this simple example, try to watch c.a in:

struct { char c; char a[3]; int32_t i; int64_t j; } c;

Under the proposed model (it looks a lot like gdb's remote protocol's Z
packet) it's assumed this will allocate one watch-point:

address=, size=3

but wait, the hardware watch-point registers have a few, er, standard
features:

- I'll be kind, there are two registers
- size must be power-of-two (lucky size==4 isn't fixed)
- address must be size aligned (lucky addr & 3 == 0 isn't fixed)
- there are separate read/write bits (lucky r+w isn't fixed)

so what to do?  With this hardware we can:

- use two watch-point registers (making your count meaningless), so that
accesses only apply to the address in question

- use one watch-point register and over-allocate the address/size and then
try to figure out what happened.
For writes, a memcmp can help; for reads, well, you might be lucky and have
a further register with the access address, or unlucky and find yourself
disassembling instructions to figure out what the address/size really was

Now, let's consider what happens when the user tries to add:

   address=, size=8

depending on where all the balls are (above decision, and the hardware),
that may or may not succeed:

  - 32-bit hardware probably limits size<=4, so above would require two
registers
  - even if not, ,size=3 may have already used up the two registers

Eww.
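For the over-allocate option the rounding itself is mechanical; a
hypothetical helper like the one below computes the smallest
power-of-two, size-aligned region covering a request (on 32-bit hardware
the result may still exceed what one register can express), and the
debugger then has to filter out hits that fall outside the watched
object:

#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: grow (addr, size) until the size is a power of two
 * and the start is size-aligned, as the debug registers require.
 */
static void
wp_round(uintptr_t addr, size_t size, uintptr_t *raddr, size_t *rsize)
{
        uintptr_t s = 1;

        while (s < size)
                s <<= 1;
        /* Grow until the s-aligned region containing addr covers the request. */
        while ((addr & ~(s - 1)) + s < addr + size)
                s <<= 1;
        *raddr = addr & ~(s - 1);
        *rsize = (size_t)s;
}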

Might a better strategy be to first get the registers exposed, and then, if
there's still time, start to look at an abstract interface?

Andrew


Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-15 Thread Valery Ushakov
On Tue, Dec 13, 2016 at 18:16:04 +0100, Kamil Rytarowski wrote:

> >> 4. Do not set watchpoints globally per process, limit them to
> >> threads (LWP). [...]  Adding process-wide management in the
> >> ptrace(2) interface calls adds extra complexity that should be
> >> pushed away to user-land code in debuggers.
> > 
> > I have no idea what amd64 debug registers do, but this smells like you
> > are exposing in the MI interface some of those details.  I don't think
> > this can be done in hardware on sh3, e.g.  

Ok, I was confused there for a moment.  The "debug state" is per-lwp
and is restored when lwp is switched to.  What was I thinking...


> > Also, you quite often have no idea which thread stomps on your data,
> > so I'd imagine most of the time you do want a global watchpoint.
> 
> This is true.
> 
> With the proposed per-thread interface a debugger can set the same
> hardware watchpoint for each LWP and achieve the same result. There are
> no performance or synchronization challenges, as watchpoints can be set
> only when a process is stopped.
> 
> In my older code I had per-process logic to access watchpoints, but it
> required extra logic in thread-specific functions to access
> process-specific data. I assumed that saving a few dozen CPU cycles
> before each thread enters user space is precious. (I know it's a small
> optimization, but it comes for free.)

Again, you don't provide any details.  What extra logic?  Also, what
are these few dozens of instructions you are talking about?  I.e. what
is that extra work you have to do for a process-wide watchpoint that
you don't have to do for an lwp-specific watchpoint on each return to
userland?


> >> 5. Do not allow mixing PT_STEP and hardware watchpoints; when
> >> single-stepping the code, disable (i.e. don't set) hardware
> >> watchpoints for the threads. Some platforms might implement
> >> single-step with hardware watchpoints, and managing both at the same
> >> time generates extra pointless complexity.
> > 
> > I don't think I see how "extra pointless complexity" follows.
> 
> 1. At least in the MD x86-specific code, watchpoint traps triggered
> while stepping are reported differently from plain steps and also
> differently from plain hardware watchpoint traps. They are a third type
> of trap.
>
> 2. Single-stepping can be implemented with hardware-assisted
> watchpoints (technically breakpoints) on the kernel side in MD code. If
> so, trying to apply watchpoints and single-step together will conflict,
> and this will need additional handling on the kernel side.
> 
> To avoid extra complexity I propose to make stepping and watchpoints
> mutually exclusive: one or the other, but not both.

And again you allude to MD details and don't provide any.  You cannot
just handwave this away.  You will have to provide enough information
for people to implement this for other arches eventually, including
the MD specifics that affected the design, so that people can see how
their own MD-specific details affect their implementation.  Why not
provide this upfront?  I understand you might be eager to commit this
work and be done with it, but you are doing this fulltime.  Others
don't have this luxury.  So I don't want to come around to
implementing your design in a few months' time, when I have some spare
cycles, and discover that it's ill-suited for the hardware I have to
deal with.

Maybe you are right, and it's hard to mix single-stepping and
watchpoints, but I don't have time to investigate this fully right now
for sh3, and you don't provide any details to back your conclusion for
x86.  Has it occurred to you that you might be missing some approach to
solving this, but people who grok x86 can't tell you unless they know
the details?  And I don't think that committing first, as you seem to
have done already, and then letting people figure it out from RTFS is
an acceptable approach, b/c, again, without a description you force
people to RTFS and they might not have the time.


> > Also, you might want both, single-stepping and waiting for a
> > watchpoint.  Will the debugger have to switch dynamically to software
> > watchpoints when single-stepping?  Can it even do that already?
> 
> My understanding of stepping the code is that we want to go one and
> only one instruction ahead (unless the port restricts it to 1 or more),
> followed by a break.
> 
> What's the use case of waiting for data access and stepping at the same
> time? Is it needed? Does it solve some issues that cannot be solved
> otherwise? Could it be implemented in software (in the case of a watch)?

Isn't it your job to tell us the answers?  So, let's say I set a
watchpoint and then I hit some other breakpoint and do some stepi.  If
one of those instructions I'm stepping does the read/write I'm
watching for, how will it be detected by the debugger if you can't mix
hw-assisted watchpoints and single-stepping?


> My original intention was to make it friendly for ports, without too
> specific 

Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-13 Thread Kamil Rytarowski
On 13.12.2016 04:12, Valery Ushakov wrote:
> On Tue, Dec 13, 2016 at 02:04:36 +0100, Kamil Rytarowski wrote:
> 
>> The design is as follows:
>>
>> 1. Accessors through:
>>  - PT_WRITE_WATCHPOINT - write a new watchpoint's state (set, unset, ...),
>>  - PT_READ_WATCHPOINT - read a watchpoint's state,
>>  - PT_COUNT_WATCHPOINT - receive the number of available watchpoints.
> 
> Gdb supports hardware assisted watchpoints.  That implies that other
> OSes have existing designs for them.  Have you studied those existing
> designs?  Why do you think they are not suitable to be copied?
> 

They are based on the concept of exporting debug registers to the
tracee's context (machine context/userdata/etc). FreeBSD exposes
MD-specific DBREGS to be set/get by a user, similarly to Linux and MacOSX.

GDB supports hardware and software assisted watchpoints. Software ones
step the code and check each instruction, hardware ones make use of the
registers.

I propose to export an interface that is not limited to one type of
hardware-assisted action, while still being fully usable for hardware
watchpoints (if the CPU supports them). This interface will abstract
the underlying hardware-specific capabilities behind MI ptrace(2) calls
(but with an MD-specific ptrace_watchpoint structure).

These interfaces are already platform specific and aren't shared between
OSes.

Some time ago I checked and, IIRC, the only two users of these
interfaces were GDB and LLDB; I inferred from this that there is no
danger of heavy patching of 3rd party software.

> 
>> 4. Do not set watchpoints globally per process, limit them to
>> threads (LWP). [...]  Adding process-wide management in the
>> ptrace(2) interface calls adds extra complexity that should be
>> pushed away to user-land code in debuggers.
> 
> 
> I have no idea what amd64 debug registers do, but this smells like you
> are exposing in the MI interface some of those details.  I don't think
> this can be done in hardware on sh3, e.g.  
> 

No, I'm not exposing anything in MI code - except the number of
available watchpoints, defined by MD code (but this information goes
through a function called from the MD part).

The functions are hidden under __HAVE_PTRACE_WATCHPOINTS ifdefs.

"watchpoint" terminology can be misleading, but since I couldn't get
better, I called this interface with this word.

> Also, you quite often have no idea which thread stomps on your data,
> so I'd imagine most of the time you do want a global watchpoint.

This is true.

With the proposed per-thread interface a debugger can set the same
hardware watchpoint for each LWP and achieve the same result. There are
no performance or synchronization challenges, as watchpoints can be set
only when a process is stopped.
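As a debugger-side sketch of that replication (the struct
ptrace_watchpoint layout and the pw_lwpid member shown are placeholders,
since the real structure is MD-specific, and PT_COUNT_WATCHPOINT is
assumed here to return the count):

#include <sys/types.h>
#include <sys/ptrace.h>

/* Placeholder layout; the real ptrace_watchpoint is MD-specific. */
struct ptrace_watchpoint {
        lwpid_t pw_lwpid;       /* LWP to attach the watchpoint to */
        /* ... MD-specific address/condition/length fields ... */
};

/* Replicate one watchpoint across every known LWP of a stopped tracee. */
static int
set_watchpoint_all_lwps(pid_t pid, struct ptrace_watchpoint *pw,
    const lwpid_t *lwps, size_t nlwps)
{
        size_t i;
        int n;

        /* How many hardware watchpoints does this port expose? */
        n = ptrace(PT_COUNT_WATCHPOINT, pid, NULL, 0);
        if (n <= 0)
                return -1;

        for (i = 0; i < nlwps; i++) {
                pw->pw_lwpid = lwps[i];
                if (ptrace(PT_WRITE_WATCHPOINT, pid, pw, sizeof(*pw)) == -1)
                        return -1;
        }
        return 0;
}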

In my older code I had per-process logic to access watchpoints, but it
required extra logic in thread-specific functions to access
process-specific data. I assumed that saving a few dozen CPU cycles
before each thread enters user space is precious. (I know it's a small
optimization, but it comes for free.)

The user interface of a debugger ("from a user point of view") is agnostic
to both approaches.

> Note, that if you want to restrict your watchpoint to one thread, you
> can probably (I don't know and I haven't checked) do this with gdb
> "command" that "continue"s if it's on the wrong thread.
> 

The proposed approach is just at the level of the ptrace(2)
implementation; any debugger is free to implement it in whatever way is
possible, including offering an option to set watchpoints per thread.

I don't want to appear to be dodging the choice, but I'm trying to
propose an implementation that is easier to apply on the kernel side.
From a userland point of view I think it does not matter.

> 
>> 5. Do not allow mixing PT_STEP and hardware watchpoints; when
>> single-stepping the code, disable (i.e. don't set) hardware
>> watchpoints for the threads. Some platforms might implement
>> single-step with hardware watchpoints, and managing both at the same
>> time generates extra pointless complexity.
> 
> I don't think I see how "extra pointless complexity" follows.
> 

1. At least in the MD x86-specific code, watchpoint traps triggered
while stepping are reported differently from plain steps and also
differently from plain hardware watchpoint traps. They are a third type
of trap.

2. Single-stepping can be implemented with hardware-assisted watchpoints
(technically breakpoints) on the kernel side in MD code. If so, trying
to apply watchpoints and single-step together will conflict, and this
will need additional handling on the kernel side.

To avoid extra complexity I propose to make stepping and watchpoints
mutually exclusive: one or the other, but not both.

> Also, you might want both, single-stepping and waiting for a
> watchpoint.  Will the debugger have to switch dynamically to software
> watchpoints when single-stepping?  Can it even do that already?
> 

My understanding of stepping the code is that we want to go one 

Re: ptrace(2) interface for hardware watchpoints (breakpoints)

2016-12-12 Thread Valery Ushakov
On Tue, Dec 13, 2016 at 02:04:36 +0100, Kamil Rytarowski wrote:

> The design is as follows:
> 
> 1. Accessors through:
>  - PT_WRITE_WATCHPOINT - write a new watchpoint's state (set, unset, ...),
>  - PT_READ_WATCHPOINT - read a watchpoint's state,
>  - PT_COUNT_WATCHPOINT - receive the number of available watchpoints.

Gdb supports hardware assisted watchpoints.  That implies that other
OSes have existing designs for them.  Have you studied those existing
designs?  Why do you think they are not suitable to be copied?


> 4. Do not set watchpoints globally per process, limit them to
> threads (LWP). [...]  Adding process-wide management in the
> ptrace(2) interface calls adds extra complexity that should be
> pushed away to user-land code in debuggers.


I have no idea what amd64 debug registers do, but this smells like you
are exposing in the MI interface some of those details.  I don't think
this can be done in hardware on sh3, e.g.  

Also, you quite often have no idea which thread stomps on your data,
so I'd imagine most of the time you do want a global watchpoint.
Note, that if you want to restrict your watchpoint to one thread, you
can probably (I don't know and I haven't checked) do this with gdb
"command" that "continue"s if it's on the wrong thread.


> 5. Do not allow mixing PT_STEP and hardware watchpoints; when
> single-stepping the code, disable (i.e. don't set) hardware
> watchpoints for the threads. Some platforms might implement
> single-step with hardware watchpoints, and managing both at the same
> time generates extra pointless complexity.

I don't think I see how "extra pointless complexity" follows.

Also, you might want both, single-stepping and waiting for a
watchpoint.  Will the debugger have to switch dynamically to software
watchpoints when single-stepping?  Can it even do that already?


In general I'd appreciate it if handwavy "this is pointless/extra
complexity" arguments were spelled out.  They might be obvious to you,
but most people reading this don't have relevant information swapped
in, or don't know enough details.

-uwe