Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Andi Kleen
On Tue, Nov 13, 2018 at 07:37:27PM +0100, Richard Biener wrote:
> I'd look at doing the instrumentation after var-tracking has run - that is 
> what computes the locations in the end. That means instrumenting on late RTL 
> after register allocation (and eventually with branch range restrictions in 
> place). Basically you'd instrument at the same time as generating debug info.

Ok that would be a full rewrite. I'll check if it's really a problem
first. I would prefer to stay on the GIMPLE level.

-Andi


Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Richard Biener
On November 13, 2018 7:09:15 PM GMT+01:00, Andi Kleen  
wrote:
>On Tue, Nov 13, 2018 at 09:03:52AM +0100, Richard Biener wrote:
>> > I even had an earlier version of this that instrumented
>> > assembler output of the compiler with PTWRITE in a separate script,
>> > and it worked fine too.
>> 
>> Apart from eventually messing up branch range restrictions I guess ;)
>
>You mean for LOOP? For everything else the assembler handles it I
>believe.
>
>> Did you gather any statistics on how many ptwrite instructions
>> that are generated by your patch are not covered by any
>> location range & expr?  
>
>Need to look into that. Any suggestions how to do it in the compiler?

I guess you need to do that in a dwarf decoder somehow. 

>I had some decode failures with the perf dwarf decoder,
>but I was usually blaming them on perf dwarf limitations. 
>
>> I assume ptwrite is writing from register
>> input only so you probably should avoid instrumenting writes
>> of constants (will require an extra register)?
>
>Hmm, I think those are needed unfortunately because someone
>might want to trace every update of something. With branch
>tracing it could be recreated theoretically but it would
>be a lot more work for the decoder.
>
>> How does the .text size behave say for cc1 when you enable
>> the various granularities of instrumentation?  How many
>> ptwrite instructions are there per 100 regular instructions?
>
>With locals tracing (worst case) I see ~23% of all instructions
>in cc1 being PTWRITE. The binary is ~27% bigger.

OK, I suppose it will get better when addressing some of my review comments. 

>> Can we get an updated patch based on my review?
>
>Yes, working on it, also addressing Martin's comments. Hopefully soon.
>> 
>> I still think we should eventually move the pass later
>
>It's after pass_sanopt now.
>
>> avoid instrumenting places where we'll not have any meaningful locations
>> in the debug info - if only to reduce the required trace bandwidth.
>
>Can you suggest how to check that?

I'd look at doing the instrumentation after var-tracking has run - that is what 
computes the locations in the end. That means instrumenting on late RTL after 
register allocation (and eventually with branch range restrictions in place). 
Basically you'd instrument at the same time as generating debug info.

Richard. 

>-Andi



Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Andi Kleen
On Tue, Nov 13, 2018 at 09:03:52AM +0100, Richard Biener wrote:
> > I even had an earlier version of this that instrumented
> > assembler output of the compiler with PTWRITE in a separate script,
> > and it worked fine too.
> 
> Apart from eventually messing up branch range restrictions I guess ;)

You mean for LOOP? For everything else the assembler handles it I
believe.

> Did you gather any statistics on how many ptwrite instructions
> that are generated by your patch are not covered by any
> location range & expr?  

Need to look into that. Any suggestions how to do it in the compiler?

I had some decode failures with the perf dwarf decoder,
but I was usually blaming them on perf dwarf limitations. 

> I assume ptwrite is writing from register
> input only so you probably should avoid instrumenting writes
> of constants (will require an extra register)?

Hmm, I think those are needed unfortunately because someone
might want to trace every update of something. With branch
tracing it could be recreated theoretically but it would
be a lot more work for the decoder.

> How does the .text size behave say for cc1 when you enable
> the various granularities of instrumentation?  How many
> ptwrite instructions are there per 100 regular instructions?

With locals tracing (worst case) I see ~23% of all instructions
in cc1 being PTWRITE. The binary is ~27% bigger.

> Can we get an updated patch based on my review?

Yes, working on it, also addressing Martin's comments. Hopefully soon.
> 
> I still think we should eventually move the pass later

It's after pass_sanopt now.

> avoid instrumenting places where we'll not have any meaningful locations
> in the debug info - if only to reduce the required trace bandwidth.

Can you suggest how to check that?

-Andi


Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-13 Thread Richard Biener
On Mon, Nov 12, 2018 at 4:16 AM Andi Kleen  wrote:
>
> On Sun, Nov 11, 2018 at 10:06:21AM +0100, Richard Biener wrote:
> > That is, usually debuggers look for a location list of a variable
> > and find, say, %rax.  But for ptwrite the debugger needs to
> > examine all active location lists for, say, %rax and figure out
> > that it contains the value for variable 'a'?
>
> In dwarf output you end up with a list of
>
> start-IP...stop-IP ...  variable locations
>
> Both the original load/store and PTWRITE are in the same scope,
> and the debugger just looks the variable up based on the IP,
> so it all works without any extra modifications.

Yes, that's how I thought it would work.

> I even had an earlier version of this that instrumented
> assembler output of the compiler with PTWRITE in a separate script,
> and it worked fine too.

Apart from eventually messing up branch range restrictions I guess ;)

> >
> > When there isn't any such relation between the ptwrite stored
> > value and any variable the ptwrite is useless, right?
>
> A programmer might still be able to make use of it
> based on the context or the order.

OK.

> e.g. if you don't instrument everything, but only specific
> variables, or you only instrument arguments and returns or
> similar, then it could still be useful just based on the IP->symbol
> resolution. If you instrument too many things then yes, it will be
> hard to use without debug info resolution.

Did you gather any statistics on how many ptwrite instructions
that are generated by your patch are not covered by any
location range & expr?  I assume ptwrite is writing from register
input only so you probably should avoid instrumenting writes
of constants (will require an extra register)?

How does the .text size behave say for cc1 when you enable
the various granularities of instrumentation?  How many
ptwrite instructions are there per 100 regular instructions?

> > I hope you don't mind if this eventually slips to GCC 10 given
> > as you say there is no HW available right now.  (still waiting
> > for a CPU with CET ...)
>
> :-/
>
> Actually there is.  Gemini Lake Atom hardware with Goldmont Plus
> has been shipping for some time and you can buy it.

Ah, interesting.

Can we get an updated patch based on my review?

I still think we should eventually move the pass later and somehow
avoid instrumenting places where we'll not have any meaningful locations
in the debug info - if only to reduce the required trace bandwidth.

Thanks,
Richard.

> -Andi


Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-11 Thread Andi Kleen
On Sun, Nov 11, 2018 at 10:06:21AM +0100, Richard Biener wrote:
> That is, usually debuggers look for a location list of a variable
> and find, say, %rax.  But for ptwrite the debugger needs to
> examine all active location lists for, say, %rax and figure out
> that it contains the value for variable 'a'?

In dwarf output you end up with a list of

start-IP...stop-IP ...  variable locations

Both the original load/store and PTWRITE are in the same scope,
and the debugger just looks the variable up based on the IP,
so it all works without any extra modifications.
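
To make that lookup concrete, here is a rough sketch of what the decoder
has to do (the structure and function names below are purely illustrative,
not taken from perf or any DWARF library):

#include <stddef.h>

/* Sketch of the decoder-side lookup described above.  The loc_entry
   structure stands in for the DWARF location-list entries.  */

struct loc_entry {
  unsigned long start_ip, end_ip;  /* PC range the entry covers */
  int dwarf_regno;                 /* register holding the value */
  const char *var_name;            /* variable described by the entry */
};

/* Map the IP of a PTWRITE plus the register it read back to a
   variable name by scanning the location entries active at that IP.  */
const char *
lookup_traced_var (const struct loc_entry *entries, size_t n,
                   unsigned long ptwrite_ip, int src_regno)
{
  for (size_t i = 0; i < n; i++)
    if (ptwrite_ip >= entries[i].start_ip
        && ptwrite_ip < entries[i].end_ip
        && entries[i].dwarf_regno == src_regno)
      return entries[i].var_name;
  return NULL;   /* value not covered by any location range & expr */
}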

I even had an earlier version of this that instrumented
assembler output of the compiler with PTWRITE in a separate script,
and it worked fine too.
> 
> When there isn't any such relation between the ptwrite stored
> value and any variable the ptwrite is useless, right?

A programmer might still be able to make use of it
based on the context or the order.

e.g. if you don't instrument everything, but only specific
variables, or you only instrument arguments and returns or
similar, then it could still be useful just based on the IP->symbol
resolution. If you instrument too many things then yes, it will be
hard to use without debug info resolution.

> I hope you don't mind if this eventually slips to GCC 10 given
> as you say there is no HW available right now.  (still waiting
> for a CPU with CET ...)

:-/

Actually there is.  Gemini Lake Atom hardware with Goldmont Plus
has been shipping for some time and you can buy it.

-Andi


Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-11 Thread Andi Kleen
On Sun, Nov 11, 2018 at 11:37:57AM -0700, Martin Sebor wrote:
> One other high-level comment: a more powerful interface to
> variable tracing than annotating declarations in the source
> would be to provide either the names of the symbols to trace
> on the command line or in an external file.  That way tracing
> could be enabled for objects and types declared in read-only
> files (such as system headers), and would let the user more
> easily experiment with annotations.

For variables/functions, if you add at the end of the source file

typeof(foo) foo __attribute__((vartrace));

it should enable it in theory (haven't tested) for both
variables and functions. Not sure about types, probably not,
but that might not be needed.

But it has to be at the end of the file, so -include doesn't work.
If an -include-after option were added to the preprocessor,
it would work.
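
As a concrete (untested) sketch of that trick, assuming foo is a variable
declared in a header that cannot be edited:

/* some system header (read-only): */
extern long foo;

/* at the end of the user's source file: re-declare foo with its own
   type and the vartrace attribute added (sketch, untested as said above) */
extern typeof(foo) foo __attribute__((vartrace));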

> This could be in addition to the attributes, and would require
> coming up with a way of identifying symbols with internal or
> no linkage, such as local variables, and perhaps also function

Individual local variables are hard, but you could likely
enable tracing for everything in the function with the 
attribute trick above.

-Andi


Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-11 Thread Martin Sebor

One other high-level comment: a more powerful interface to
variable tracing than annotating declarations in the source
would be to provide either the names of the symbols to trace
on the command line or in an external file.  That way tracing
could be enabled for objects and types declared in read-only
files (such as system headers), and would let the user more
easily experiment with annotations.

This could be in addition to the attributes, and would require
coming up with a way of identifying symbols with internal or
no linkage, such as local variables, and perhaps also function
arguments, return values, etc., if this mechanism were to
provide access to those as well (I think it would be fine if
this "external" mechanism provided support to only a subset
of symbols).

Martin

On 11/04/2018 12:32 AM, Andi Kleen wrote:

From: Andi Kleen 

Add a new pass to automatically instrument changes to variables
with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
field into a Processor Trace log, which allows low overhead
logging of information.

This allows reconstructing values later, which can be useful for
debugging or other analysis of the program behavior. With the compiler
support this can be done without having to manually add instrumentation
to the code.

Using dwarf information this can be later mapped back to the variables.

There are new options to enable instrumentation for different types,
and also a new attribute to control instrumentation at a fine-grained
per-function or per-variable level. The attributes can be set on both
the variable and the type level, and also on structure fields.
This allows enabling tracing only for specific code in large
programs.

The pass is generic, but only the x86 backend enables the necessary
hooks. When the backend enables them (with -mptwrite)
there is an additional pass that looks through the code for
functions or variables with the vartrace attribute enabled.

The -fvartrace-locals option is experimental: it works, but it
generates redundant ptwrites because the pass doesn't use
the SSA information to minimize instrumentation. This could be optimized
later.

Currently the code can be tested with SDE, or on an Intel
Gemini Lake system with a new enough Linux kernel (v4.10+)
that supports PTWRITE for PT. Linux perf can be used to
record the values:

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use
dwarf information to symbolize many[1] values back to their variable
names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4

It is currently not able to decode all variable locations to names,
but a large subset.

Longer term hopefully gdb will support this information too.

The CPU can potentially generate very high data bandwidths when
code doing a lot of computation is heavily instrumented.
This can cause some data loss both in the CPU and in perf
logging the data when the disk cannot keep up.

Running some larger workloads, most do not cause
CPU-level overflows, but I've seen them with -fvartrace
with crafty, and with more workloads with -fvartrace-locals.

The recommendation is not to fully instrument programs,
but only areas of interest, either at the file level or using
the attributes.

The other thing is that perf and the disk often cannot keep up
with the data bandwidth for longer computations. In this case
it's possible to use perf snapshot mode (add --snapshot
to the command line above). The data is then only logged to
a memory ring buffer, and the buffers are only dumped on events
of interest by sending SIGUSR2 to the perf binary.

In the future this will be hopefully better supported with
core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also
bootstrapped and tested gcc itself with full -fvartrace
and -fvartrace-locals instrumentation.

gcc/:

2018-11-03  Andi Kleen  

* Makefile.in: Add tree-vartrace.o.
* common.opt: Add -fvartrace, -fvartrace-returns,
-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
-fvartrace-locals
* config/i386/i386.c (ix86_vartrace_func): Add.
(TARGET_VARTRACE_FUNC): Add.
* doc/extend.texi: Document vartrace/no_vartrace
attributes.
* doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
-fvartrace-locals
* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
* passes.def: Add vartrace pass.
* target.def (vartrace_func): Add.
* tree-pass.h (make_pass_vartrace): Add.
* tree-vartrace.c: New file to implement vartrace pass.

gcc/c-family/:

2018-11-03  Andi Kleen  

* c-attribs.c (handle_vartrace_attribute): New function.

config/:

2018-11-03  Andi Kleen  

* bootstrap-vartrace.mk: New.
* bootstrap-vartrace-

Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-11 Thread Richard Biener
On Fri, Nov 9, 2018 at 7:18 PM Andi Kleen  wrote:
>
> Hi Richard,
>
> On Fri, Nov 09, 2018 at 04:27:22PM +0100, Richard Biener wrote:
> > > Passes bootstrap and test suite on x86_64-linux, also
> > > bootstrapped and tested gcc itself with full -fvartrace
> > > and -fvartrace-locals instrumentation.
> >
> > So how is this supposed to be used?  I guess in an
> > edit-debug cycle and not for production code?
>
> It can actually be used for production code.
>
> When processor trace is disabled the PTWRITE
> instructions act as nops. So it's only increasing
> the code footprint. Since the instrumentation
> should only log values which are already computed,
> it normally doesn't generate any other code.
>
> Even when it is enabled the primary overhead is the
> additional memory bandwidth, since the CPU can
> do the logging in parallel to other code. As long
> as the instrumentation is not so excessive that it
> generates too much memory bandwidth, it might actually
> be quite reasonable to keep the logging on even for
> production code, and use it as a "flight recorder"
> which is dumped on failures.

I see.

> This would also be the model in gdb, once we have support
> in it. You would run the program in the debugger
> and it just logs the data to a memory buffer,
> but when stopping the value history can be examined.

Hmm, so the debugger still needs to relate the ptwrite
instruction with the actual variable the data is for.  I suppose
practically this means that var-tracking needs to be able to
compute a location list for a variable that happens to overlap
with the stored value?

That is, usually debuggers look for a location list of a variable
and find, say, %rax.  But for ptwrite the debugger needs to
examine all active location lists for, say, %rax and figure out
that it contains the value for variable 'a'?

When there isn't any such relation between the ptwrite stored
value and any variable the ptwrite is useless, right?

> There's also some ongoing work to add (optional) support
> for PT to Linux crash dumps, so eventually that will
> work without having to always run the debugger.
>
> Today it can be done by running perf in the background
> to record the PT, however there the setup is a bit
> more complicated.
>
> The primary use case I was envisioning was to set
> the attribute on some critical functions/structures/types
> of interest and then have a very low-overhead logging
> option for them (generally cheaper than
> equivalent software instrumentation). And then
> they automatically get logged without the programmer
> needing to add lots of instrumentation code to
> catch every instance. So think of it as a
> "hardware accelerated printf".
>
> >
> > What do you actually write with PTWRITE?  I suppose
> > you need to keep an ID-to-something mapping somewhere
> > so you can make sense of the perf records?
>
> PTWRITE writes 32bit/64bit values. The CPU reports the
> IP of PTWRITE in the log, either explicitly or implicitly if branch
> trace is enabled too. The IP can then be used to look up
> the DWARF scope for that IP. Then the decoder
> decodes the operand of PTWRITE and maps it back using
> the dwarf information. So it all works using
> existing debugger infrastructure, and a quite simple
> instruction decoder.
>
> I'll clarify that in the description.
>
> >
> > Few comments inline below, but I'm not sure if this
> > whole thing is interesting for GCC (as opposed to being
> > an instrumentation plugin)
>
> I'm biased, but I think automatic data tracing is a very exciting
> use case, so hopefully it can be considered for mainstream gcc.
>
> > >   
> > > handle_no_profile_instrument_function_attribute,
> > >   NULL },
> > > @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, 
> > > tree name, tree, int,
> > >return NULL_TREE;
> > >  }
> > >
> > > +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
> > > +   struct attribute_spec.handler.  */
> > > +
> > > +static tree
> > > +handle_vartrace_attribute (tree *node, tree, tree, int flags,
> > > +  bool *)
> > > +{
> > > +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> > > +*node = build_variant_type_copy (*node);
> >
> > I don't think you want the attribute on types.  As far as I understood your
> > descriptions it should only be on variables and functions.
>
> The idea was that it's possible to trace all instances of a type,
> especially structure members. Otherwise it will be harder for
> the programmer to hunt down every instance.
>
> For example if I have a structure that is used throughout a program,
> and one member gets the wrong value.
>
> I can then do this in the header:
>
> struct foo {
> int member __attribute__((vartrace));
> };
>
> and then recompile the program. Every instance of writing to
> member will then be automatically instrumented (assuming
> the program stays type safe)
>
> Makes sense?

OK.  The user

Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-09 Thread Martin Sebor

On 11/04/2018 12:32 AM, Andi Kleen wrote:

From: Andi Kleen 

Add a new pass to automatically instrument changes to variables
with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
field into a Processor Trace log, which allows low overhead
logging of information.

This allows reconstructing values later, which can be useful for
debugging or other analysis of the program behavior. With the compiler
support this can be done without having to manually add instrumentation
to the code.

Using dwarf information this can be later mapped back to the variables.

There are new options to enable instrumentation for different types,
and also a new attribute to control instrumentation at a fine-grained
per-function or per-variable level. The attributes can be set on both
the variable and the type level, and also on structure fields.
This allows enabling tracing only for specific code in large
programs.

The pass is generic, but only the x86 backend enables the necessary
hooks. When the backend enables them (with -mptwrite)
there is an additional pass that looks through the code for
functions or variables with the vartrace attribute enabled.

The -fvartrace-locals option is experimental: it works, but it
generates redundant ptwrites because the pass doesn't use
the SSA information to minimize instrumentation. This could be optimized
later.

Currently the code can be tested with SDE, or on an Intel
Gemini Lake system with a new enough Linux kernel (v4.10+)
that supports PTWRITE for PT. Linux perf can be used to
record the values:

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use
dwarf information to symbolize many[1] values back to their variable
names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4

It is currently not able to decode all variable locations to names,
but a large subset.

Longer term hopefully gdb will support this information too.

The CPU can potentially generate very high data bandwidths when
code doing a lot of computation is heavily instrumented.
This can cause some data loss both in the CPU and in perf
logging the data when the disk cannot keep up.

Running some larger workloads, most do not cause
CPU-level overflows, but I've seen them with -fvartrace
with crafty, and with more workloads with -fvartrace-locals.

The recommendation is not to fully instrument programs,
but only areas of interest, either at the file level or using
the attributes.

The other thing is that perf and the disk often cannot keep up
with the data bandwidth for longer computations. In this case
it's possible to use perf snapshot mode (add --snapshot
to the command line above). The data is then only logged to
a memory ring buffer, and the buffers are only dumped on events
of interest by sending SIGUSR2 to the perf binary.

In the future this will be hopefully better supported with
core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also
bootstrapped and tested gcc itself with full -fvartrace
and -fvartrace-locals instrumentation.


(I initially meant to just suggest detecting and rejecting the two
mutually exclusive attributes but as I read the rest of the patch
to better understand what it's about I noticed a few other issues
I thought would be useful to point out.)

...

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 4416b5042f7..66bbd87921f 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -325,6 +327,12 @@ const struct attribute_spec c_common_attribute_table[] =
   { "no_instrument_function", 0, 0, true,  false, false, false,
  handle_no_instrument_function_attribute,
  NULL },
+  { "vartrace",0, 0, false,  false, false, false,
+ handle_vartrace_attribute,
+ NULL },
+  { "no_vartrace", 0, 0, false,  false, false, false,
+ handle_vartrace_attribute,
+ NULL },
   { "no_profile_instrument_function",  0, 0, true, false, false, false,
  handle_no_profile_instrument_function_attribute,
  NULL },


Unless mixing these attributes on the same declaration makes sense
I would suggest either defining the exclusions that should be
automatically applied to the attributes (see attribute exclusions),
or enforcing them in the handler.  Judging only by the names it
looks to me like vartrace should be mutually exclusive with
no_vartrace.


@@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, tree 
name, tree, int,
   return NULL_TREE;
 }

+/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_vartrace_attribute (tree *node, tree, tre

Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-09 Thread Andi Kleen
Hi Richard,

On Fri, Nov 09, 2018 at 04:27:22PM +0100, Richard Biener wrote:
> > Passes bootstrap and test suite on x86_64-linux, also
> > bootstrapped and tested gcc itself with full -fvartrace
> > and -fvartrace-locals instrumentation.
> 
> So how is this supposed to be used?  I guess in an
> edit-debug cycle and not for production code?

It can actually be used for production code.

When processor trace is disabled the PTWRITE
instructions act as nops. So it's only increasing
the code footprint. Since the instrumentation
should only log values which are already computed,
it normally doesn't generate any other code.

Even when it is enabled the primary overhead is the
additional memory bandwidth, since the CPU can
do the logging in parallel to other code. As long
as the instrumentation is not so excessive that it
generates too much memory bandwidth, it might actually
be quite reasonable to keep the logging on even for
production code, and use it as a "flight recorder"
which is dumped on failures.

This would also be the model in gdb, once we have support
in it. You would run the program in the debugger
and it just logs the data to a memory buffer,
but when stopping the value history can be examined.

There's also some ongoing work to add (optional) support
for PT to Linux crash dumps, so eventually that will
work without having to always run the debugger.

Today it can be done by running perf in the background
to record the PT, however there the setup is a bit
more complicated.

The primary use case I was envisioning was to set
the attribute on some critical functions/structures/types
of interest and then have a very low-overhead logging
option for them (generally cheaper than
equivalent software instrumentation). And then
they automatically get logged without the programmer
needing to add lots of instrumentation code to
catch every instance. So think of it as a
"hardware accelerated printf".

> 
> What do you actually write with PTWRITE?  I suppose
> you need to keep an ID-to-something mapping somewhere
> so you can make sense of the perf records?

PTWRITE writes 32bit/64bit values. The CPU reports the
IP of PTWRITE in the log, either explicitly or implicitly if branch
trace is enabled too. The IP can then be used to look up
the DWARF scope for that IP. Then the decoder
decodes the operand of PTWRITE and maps it back using 
the dwarf information. So it all works using 
existing debugger infrastructure, and a quite simple
instruction decoder.

I'll clarify that in the description.

> 
> Few comments inline below, but I'm not sure if this
> whole thing is interesting for GCC (as opposed to being
> an instrumentation plugin)

I'm biased, but I think automatic data tracing is a very exciting
use case, so hopefully it can be considered for mainstream gcc.

> >   
> > handle_no_profile_instrument_function_attribute,
> >   NULL },
> > @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, 
> > tree name, tree, int,
> >return NULL_TREE;
> >  }
> >
> > +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
> > +   struct attribute_spec.handler.  */
> > +
> > +static tree
> > +handle_vartrace_attribute (tree *node, tree, tree, int flags,
> > +  bool *)
> > +{
> > +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> > +*node = build_variant_type_copy (*node);
> 
> I don't think you want the attribute on types.  As far as I understood your
> descriptions it should only be on variables and functions.

The idea was that it's possible to trace all instances of a type,
especially structure members. Otherwise it will be harder for
the programmer to hunt down every instance.

For example if I have a structure that is used throughout a program,
and one member gets the wrong value.

I can then do this in the header:

struct foo {
int member __attribute__((vartrace));
};

and then recompile the program. Every instance of writing to
member will then be automatically instrumented (assuming
the program stays type safe)
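
Conceptually, for such a field the pass then turns every store into
something like the following (just a sketch; the builtin name is an
assumption, the actual insertion goes through the new target hook):

void
set_member (struct foo *p, int v)
{
  p->member = v;                   /* the original store */
  __builtin_ia32_ptwrite32 (v);    /* logging added by the pass (builtin name assumed) */
}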

Makes sense?

[BTW I considered adding an address trace
too for pointer writes to hunt down the non-type-safe
instances and possibly some other use cases.
That might be possible follow-on work]

> > +
> >  #undef TARGET_GIMPLIFY_VA_ARG_EXPR
> >  #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 1eca009e255..08286aa4591 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -3193,6 +3193,13 @@ the standard C library can be guaranteed not to 
> > throw an exception
> >  with the notable exceptions of @code{qsort} and @code{bsearch} that
> >  take function pointer arguments.
> >
> > +@item no_vartrace
> > +@cindex @code{no_vartrace} function or variable attribute
> > +Disable data tracing for the function or variable or structured field
> > +marked with this attribute. Applies to types. Currentl

Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-09 Thread Richard Biener
On Sun, Nov 4, 2018 at 7:33 AM Andi Kleen  wrote:
>
> From: Andi Kleen 
>
> Add a new pass to automatically instrument changes to variables
> with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
> field into a Processor Trace log, which allows low overhead
> logging of information.
>
> This allows reconstructing values later, which can be useful for
> debugging or other analysis of the program behavior. With the compiler
> support this can be done without having to manually add instrumentation
> to the code.
>
> Using dwarf information this can be later mapped back to the variables.
>
> There are new options to enable instrumentation for different types,
> and also a new attribute to control instrumentation at a fine-grained
> per-function or per-variable level. The attributes can be set on both
> the variable and the type level, and also on structure fields.
> This allows enabling tracing only for specific code in large
> programs.
>
> The pass is generic, but only the x86 backend enables the necessary
> hooks. When the backend enables them (with -mptwrite)
> there is an additional pass that looks through the code for
> functions or variables with the vartrace attribute enabled.
>
> The -fvartrace-locals option is experimental: it works, but it
> generates redundant ptwrites because the pass doesn't use
> the SSA information to minimize instrumentation. This could be optimized
> later.
>
> Currently the code can be tested with SDE, or on an Intel
> Gemini Lake system with a new enough Linux kernel (v4.10+)
> that supports PTWRITE for PT. Linux perf can be used to
> record the values:
>
> perf record -e intel_pt/ptw=1,branch=0/ program
> perf script --itrace=crw -F +synth ...
>
> I have an experimental version of perf that can also use
> dwarf information to symbolize many[1] values back to their variable
> names. So far it is not in standard perf, but available at
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4
>
> It is currently not able to decode all variable locations to names,
> but a large subset.
>
> Longer term hopefully gdb will support this information too.
>
> The CPU can potentially generate very high data bandwidths when
> code doing a lot of computation is heavily instrumented.
> This can cause some data loss both in the CPU and in perf
> logging the data when the disk cannot keep up.
>
> Running some larger workloads, most do not cause
> CPU-level overflows, but I've seen them with -fvartrace
> with crafty, and with more workloads with -fvartrace-locals.
>
> The recommendation is not to fully instrument programs,
> but only areas of interest, either at the file level or using
> the attributes.
>
> The other thing is that perf and the disk often cannot keep up
> with the data bandwidth for longer computations. In this case
> it's possible to use perf snapshot mode (add --snapshot
> to the command line above). The data is then only logged to
> a memory ring buffer, and the buffers are only dumped on events
> of interest by sending SIGUSR2 to the perf binary.
>
> In the future this will be hopefully better supported with
> core files and gdb.
>
> Passes bootstrap and test suite on x86_64-linux, also
> bootstrapped and tested gcc itself with full -fvartrace
> and -fvartrace-locals instrumentation.

So how is this supposed to be used?  I guess in an
edit-debug cycle and not for production code?

What do you actually write with PTWRITE?  I suppose
you need to keep an ID-to-something mapping somewhere
so you can make sense of the perf records?

Few comments inline below, but I'm not sure if this
whole thing is interesting for GCC (as opposed to being
an instrumentation plugin)

> gcc/:
>
> 2018-11-03  Andi Kleen  
>
> * Makefile.in: Add tree-vartrace.o.
> * common.opt: Add -fvartrace, -fvartrace-returns,
> -fvartrace-args, -fvartrace-reads, -fvartrace-writes,
> -fvartrace-locals
> * config/i386/i386.c (ix86_vartrace_func): Add.
> (TARGET_VARTRACE_FUNC): Add.
> * doc/extend.texi: Document vartrace/no_vartrace
> attributes.
> * doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
> -fvartrace-args, -fvartrace-reads, -fvartrace-writes,
> -fvartrace-locals
> * doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
> * passes.def: Add vartrace pass.
> * target.def (vartrace_func): Add.
> * tree-pass.h (make_pass_vartrace): Add.
> * tree-vartrace.c: New file to implement vartrace pass.
>
> gcc/c-family/:
>
> 2018-11-03  Andi Kleen  
>
> * c-attribs.c (handle_vartrace_attribute): New function.
>
> config/:
>
> 2018-11-03  Andi Kleen  
>
> * bootstrap-vartrace.mk: New.
> * bootstrap-vartrace-locals.mk: New.
> ---
>  config/bootstrap-vartrace-locals.mk |   3 +
>  config/bootstrap-vartrace.mk|   3 +
>  gcc/Makefile.in |   1 +
>  gcc/c-family/c-attribs.c|  23 

[PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-11-03 Thread Andi Kleen
From: Andi Kleen 

Add a new pass to automatically instrument changes to variables
with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
field into a Processor Trace log, which allows low overhead
logging of information.

This allows reconstructing values later, which can be useful for
debugging or other analysis of the program behavior. With the compiler
support this can be done without having to manually add instrumentation
to the code.

Using dwarf information this can be later mapped back to the variables.

There are new options to enable instrumentation for different types,
and also a new attribute to control instrumentation at a fine-grained
per-function or per-variable level. The attributes can be set on both
the variable and the type level, and also on structure fields.
This allows enabling tracing only for specific code in large
programs.
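
For illustration, usage at those levels could look like this (a sketch
based on the description above; only the vartrace/no_vartrace attribute
names come from the patch, all other names are made up):

/* type level: every variable of this type is traced */
typedef int traced_int __attribute__((vartrace));

/* structure field level: every write to seq is traced */
struct state {
  int seq __attribute__((vartrace));
  int untraced;
};

/* single global variable */
int counter __attribute__((vartrace));

/* whole function, with one local opted out again */
__attribute__((vartrace)) void
update (struct state *s)
{
  static int scratch __attribute__((no_vartrace));
  s->seq++;      /* instrumented */
  scratch++;     /* not instrumented */
  counter++;     /* instrumented */
}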

The pass is generic, but only the x86 backend enables the necessary
hooks. When the backend enables them (with -mptwrite)
there is an additional pass that looks through the code for
functions or variables with the vartrace attribute enabled.

The -fvartrace-locals option is experimental: it works, but it
generates redundant ptwrites because the pass doesn't use
the SSA information to minimize instrumentation. This could be optimized
later.

Currently the code can be tested with SDE, or on an Intel
Gemini Lake system with a new enough Linux kernel (v4.10+)
that supports PTWRITE for PT. Linux perf can be used to
record the values:

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use
dwarf information to symbolize many[1] values back to their variable
names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4

It is currently not able to decode all variable locations to names,
but a large subset.

Longer term hopefully gdb will support this information too.

The CPU can potentially generate very high data bandwidths when
code doing a lot of computation is heavily instrumented.
This can cause some data loss both in the CPU and in perf
logging the data when the disk cannot keep up.

Running some larger workloads, most do not cause
CPU-level overflows, but I've seen them with -fvartrace
with crafty, and with more workloads with -fvartrace-locals.

The recommendation is not to fully instrument programs,
but only areas of interest, either at the file level or using
the attributes.

The other thing is that perf and the disk often cannot keep up
with the data bandwidth for longer computations. In this case
it's possible to use perf snapshot mode (add --snapshot
to the command line above). The data is then only logged to
a memory ring buffer, and the buffers are only dumped on events
of interest by sending SIGUSR2 to the perf binary.
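
For example, a small helper in the monitored program (or a wrapper)
could trigger the dump when it detects the failure of interest (a sketch;
how the perf pid is obtained, e.g. from a pidfile, is left to the caller):

#include <signal.h>
#include <sys/types.h>

/* Ask a 'perf record --snapshot' process to dump its ring buffer.
   perf_pid must be the pid of that perf process.  */
void
dump_trace_snapshot (pid_t perf_pid)
{
  kill (perf_pid, SIGUSR2);
}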

In the future this will be hopefully better supported with
core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also
bootstrapped and tested gcc itself with full -fvartrace
and -fvartrace-locals instrumentation.

gcc/:

2018-11-03  Andi Kleen  

* Makefile.in: Add tree-vartrace.o.
* common.opt: Add -fvartrace, -fvartrace-returns,
-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
-fvartrace-locals
* config/i386/i386.c (ix86_vartrace_func): Add.
(TARGET_VARTRACE_FUNC): Add.
* doc/extend.texi: Document vartrace/no_vartrace
attributes.
* doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
-fvartrace-locals
* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
* passes.def: Add vartrace pass.
* target.def (vartrace_func): Add.
* tree-pass.h (make_pass_vartrace): Add.
* tree-vartrace.c: New file to implement vartrace pass.

gcc/c-family/:

2018-11-03  Andi Kleen  

* c-attribs.c (handle_vartrace_attribute): New function.

config/:

2018-11-03  Andi Kleen  

* bootstrap-vartrace.mk: New.
* bootstrap-vartrace-locals.mk: New.
---
 config/bootstrap-vartrace-locals.mk |   3 +
 config/bootstrap-vartrace.mk|   3 +
 gcc/Makefile.in |   1 +
 gcc/c-family/c-attribs.c|  23 ++
 gcc/common.opt  |  24 ++
 gcc/config/i386/i386.c  |  16 +
 gcc/doc/extend.texi |  13 +
 gcc/doc/invoke.texi |  29 ++
 gcc/doc/tm.texi |   4 +
 gcc/doc/tm.texi.in  |   2 +
 gcc/passes.def  |   1 +
 gcc/target.def  |   7 +
 gcc/tree-pass.h |   1 +
 gcc/tree-vartrace.c | 463 
 14 files changed, 590 insertions(+)
 create mode 100644 config/bootstrap-vartrace-locals.mk
 create mode 100644 config/bootstrap-vartrace.mk
 create mod

[PATCH 2/3] Add a pass to automatically add ptwrite instrumentation

2018-02-11 Thread Andi Kleen
From: Andi Kleen 

Add a new pass to automatically instrument changes to variables
with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
field into an external Processor Trace log.

This allows reconstructing values later, which can be useful for
debugging or other analysis of the program behavior. With the compiler
support this can be done without having to manually add instrumentation
to the code.

Using dwarf information this can be later mapped back to the variables.

There are new options to enable instrumentation for different types,
and also a new attribute to control instrumentation at a fine-grained
per-function or per-variable level. The attributes can be set on both
the variable and the type level, and also on structure fields.
This allows enabling tracing only for specific code in large
programs.

The pass is generic, but only the x86 backend enables the necessary
hooks. When the backend enables them (with -mptwrite)
there is an additional pass that looks through the code for
functions or variables with the vartrace attribute enabled.

The -fvartrace-locals option is experimental: it works, but it
generates many redundant ptwrites because the pass doesn't use
the SSA information to minimize instrumentation. This could be optimized
later.

Currently the code can be tested with SDE, or on an Intel
Cherry Trail system with a new enough Linux kernel (v4.10+)
that supports PTWRITE for PT. Linux perf can be used to
record the values:

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use
dwarf information to symbolize many[1] values back to their variable
names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-2

Longer term hopefully gdb will support this information too.

Passes bootstrap and test suite on x86_64-linux.

[1] Many: so far it only supports register variables that are not
arguments.

gcc/:

2018-02-10  Andi Kleen  

* Makefile.in: Add tree-vartrace.o.
* common.opt: Add -fvartrace, -fvartrace-returns,
-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
-fvartrace-locals
* config/i386/i386.c (ix86_vartrace_func): Add.
(TARGET_VARTRACE_FUNC): Add.
* doc/extend.texi: Document vartrace/no_vartrace
attributes.
* doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
-fvartrace-locals
* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
* passes.def: Add vartrace pass.
* target.def (vartrace_func): Add.
* tree-pass.h (make_pass_vartrace): Add.
* tree-vartrace.c: New file to implement vartrace pass.

gcc/c-family/:

2018-02-10  Andi Kleen  

* c-attribs.c (handle_vartrace_attribute): New function.
---
 gcc/Makefile.in  |   1 +
 gcc/c-family/c-attribs.c |  23 +++
 gcc/common.opt   |  24 +++
 gcc/config/i386/i386.c   |  16 ++
 gcc/doc/extend.texi  |  13 ++
 gcc/doc/invoke.texi  |  29 +++
 gcc/doc/tm.texi  |   4 +
 gcc/doc/tm.texi.in   |   2 +
 gcc/passes.def   |   1 +
 gcc/target.def   |   7 +
 gcc/tree-pass.h  |   1 +
 gcc/tree-vartrace.c  | 462 +++
 12 files changed, 583 insertions(+)
 create mode 100644 gcc/tree-vartrace.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6c37e46f792..3bce0f21bb4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1580,6 +1580,7 @@ OBJS = \
tree-vectorizer.o \
tree-vector-builder.o \
tree-vrp.o \
+   tree-vartrace.o \
tree.o \
typed-splay-tree.o \
unique-ptr-tests.o \
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 0261a45ec98..0c6488e0912 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -104,6 +104,8 @@ static tree handle_tls_model_attribute (tree *, tree, tree, 
int,
bool *);
 static tree handle_no_instrument_function_attribute (tree *, tree,
 tree, int, bool *);
+static tree handle_vartrace_attribute (tree *, tree,
+tree, int, bool *);
 static tree handle_no_profile_instrument_function_attribute (tree *, tree,
 tree, int, bool *);
 static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
@@ -331,6 +333,12 @@ const struct attribute_spec c_common_attribute_table[] =
   { "no_instrument_function", 0, 0, true,  false, false, false,
  handle_no_instrument_function_attribute,
  NULL },
+  { "vartrace",  0, 0, false,  false, false, false,
+ handle_vartrace_attribute,