Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
On Tue, Nov 13, 2018 at 07:37:27PM +0100, Richard Biener wrote:
> I'd look at doing the instrumentation after var-tracking has run - that is
> what computes the locations in the end. That means instrumenting on late RTL
> after register allocation (and eventually with branch range restrictions in
> place). Basically you'd instrument at the same time as generating debug info.

Ok that would be a full rewrite. I'll check if it's really a problem first. I would prefer to stay on the GIMPLE level.

-Andi
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
On November 13, 2018 7:09:15 PM GMT+01:00, Andi Kleen wrote:
>On Tue, Nov 13, 2018 at 09:03:52AM +0100, Richard Biener wrote:
>> > I even had an earlier version of this that instrumented
>> > assembler output of the compiler with PTWRITE in a separate script,
>> > and it worked fine too.
>>
>> Apart from eventually messing up branch range restrictions I guess ;)
>
>You mean for LOOP? For everything else the assembler handles it I
>believe.
>
>> Did you gather any statistics on how many ptwrite instructions
>> that are generated by your patch are not covered by any
>> location range & expr?
>
>Need to look into that. Any suggestions how to do it in the compiler?

I guess you need to do that in a dwarf decoder somehow.

>I had some decode failures with the perf dwarf decoder,
>but I was usually blaming them on perf dwarf limitations.
>
>> I assume ptwrite is writing from register
>> input only so you probably should avoid instrumenting writes
>> of constants (will require an extra register)?
>
>Hmm, I think those are needed unfortunately because someone
>might want to trace every update of something. With branch
>tracing it could be recreated theoretically but would
>be a lot more work for the decoder.
>
>> How does the .text size behave say for cc1 when you enable
>> the various granularities of instrumentation? How many
>> ptwrite instructions are there per 100 regular instructions?
>
>With locals tracing (worst case) I see ~23% of all instructions
>in cc1 being PTWRITEs. The binary is ~27% bigger.

OK, I suppose it will get better when addressing some of my review comments.

>> Can we get an updated patch based on my review?
>
>Yes, working on it, also addressing Martin's comments. Hopefully soon.
>
>> I still think we should eventually move the pass later
>
>It's after pass_sanopt now.
>
>> avoid instrumenting places we'll not have any meaningful locations
>> in the debug info - if only to reduce required trace bandwidth.
>
>Can you suggest how to check that?

I'd look at doing the instrumentation after var-tracking has run - that is what computes the locations in the end. That means instrumenting on late RTL after register allocation (and eventually with branch range restrictions in place). Basically you'd instrument at the same time as generating debug info.

Richard.

>-Andi
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
On Tue, Nov 13, 2018 at 09:03:52AM +0100, Richard Biener wrote:
> > I even had an earlier version of this that instrumented
> > assembler output of the compiler with PTWRITE in a separate script,
> > and it worked fine too.
>
> Apart from eventually messing up branch range restrictions I guess ;)

You mean for LOOP? For everything else the assembler handles it I believe.

> Did you gather any statistics on how many ptwrite instructions
> that are generated by your patch are not covered by any
> location range & expr?

Need to look into that. Any suggestions how to do it in the compiler?

I had some decode failures with the perf dwarf decoder, but I was usually blaming them on perf dwarf limitations.

> I assume ptwrite is writing from register
> input only so you probably should avoid instrumenting writes
> of constants (will require an extra register)?

Hmm, I think those are needed unfortunately because someone might want to trace every update of something. With branch tracing it could be recreated theoretically, but that would be a lot more work for the decoder.

> How does the .text size behave say for cc1 when you enable
> the various granularities of instrumentation? How many
> ptwrite instructions are there per 100 regular instructions?

With locals tracing (worst case) I see ~23% of all instructions in cc1 being PTWRITEs. The binary is ~27% bigger.

> Can we get an updated patch based on my review?

Yes, working on it, also addressing Martin's comments. Hopefully soon.

> I still think we should eventually move the pass later

It's after pass_sanopt now.

> avoid instrumenting places we'll not have any meaningful locations
> in the debug info - if only to reduce required trace bandwidth.

Can you suggest how to check that?

-Andi
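To make the point about constants concrete: PTWRITE only takes a register or memory operand, never an immediate, so tracing a constant store forces the compiler to materialize the value first. A minimal hand-written sketch of what that amounts to; the __builtin_ia32_ptwrite32 builtin name is assumed from the PTWRITE support added elsewhere in this series and needs -mptwrite:

    volatile int x;

    void
    set_x (void)
    {
      x = 42;
      /* PTWRITE has no immediate form, so the constant has to be
         loaded into a register (or read back from memory) before it
         can be logged - the "extra register" mentioned above.  */
      __builtin_ia32_ptwrite32 (42);
    }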
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
On Mon, Nov 12, 2018 at 4:16 AM Andi Kleen wrote:
>
> On Sun, Nov 11, 2018 at 10:06:21AM +0100, Richard Biener wrote:
> > That is, usually debuggers look for a location list of a variable
> > and find, say, %rax. But for ptwrite the debugger needs to
> > examine all active location lists for, say, %rax and figure out
> > that it contains the value for variable 'a'?
>
> In dwarf output you end up with a list of
>
> start-IP...stop-IP ... variable locations
>
> Both the original load/store and PTWRITE are in the same scope,
> and the debugger just looks it up based on the IP,
> so it all works without any extra modifications.

Yes, that's how I thought it would work.

> I even had an earlier version of this that instrumented
> assembler output of the compiler with PTWRITE in a separate script,
> and it worked fine too.

Apart from eventually messing up branch range restrictions I guess ;)

> > When there isn't any such relation between the ptwrite stored
> > value and any variable the ptwrite is useless, right?
>
> A programmer might still be able to make use of it
> based on the context or the order.

OK.

> e.g. if you don't instrument everything, but only specific
> variables, or you only instrument arguments and returns or
> similar, then it could still be useful just based on the IP->symbol
> resolution. If you instrument too many things, yes, it will be
> hard to use without debug info resolution.

Did you gather any statistics on how many ptwrite instructions that are generated by your patch are not covered by any location range & expr?

I assume ptwrite is writing from register input only so you probably should avoid instrumenting writes of constants (will require an extra register)?

How does the .text size behave say for cc1 when you enable the various granularities of instrumentation? How many ptwrite instructions are there per 100 regular instructions?

> > I hope you don't mind if this eventually slips to GCC 10 given
> > as you say there is no HW available right now. (still waiting
> > for a CPU with CET ...)
>
> :-/
>
> Actually there is. Gemini Lake Atom hardware with Goldmont Plus
> has been shipping for some time and you can buy it.

Ah, interesting.

Can we get an updated patch based on my review?

I still think we should eventually move the pass later and somehow avoid instrumenting places we'll not have any meaningful locations in the debug info - if only to reduce required trace bandwidth.

Thanks,
Richard.

> -Andi
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
On Sun, Nov 11, 2018 at 10:06:21AM +0100, Richard Biener wrote:
> That is, usually debuggers look for a location list of a variable
> and find, say, %rax. But for ptwrite the debugger needs to
> examine all active location lists for, say, %rax and figure out
> that it contains the value for variable 'a'?

In dwarf output you end up with a list of

start-IP...stop-IP ... variable locations

Both the original load/store and PTWRITE are in the same scope, and the debugger just looks it up based on the IP, so it all works without any extra modifications.

I even had an earlier version of this that instrumented assembler output of the compiler with PTWRITE in a separate script, and it worked fine too.

> When there isn't any such relation between the ptwrite stored
> value and any variable the ptwrite is useless, right?

A programmer might still be able to make use of it based on the context or the order.

E.g. if you don't instrument everything, but only specific variables, or you only instrument arguments and returns or similar, then it could still be useful just based on the IP->symbol resolution. If you instrument too many things, yes, it will be hard to use without debug info resolution.

> I hope you don't mind if this eventually slips to GCC 10 given
> as you say there is no HW available right now. (still waiting
> for a CPU with CET ...)

:-/

Actually there is. Gemini Lake Atom hardware with Goldmont Plus has been shipping for some time and you can buy it.

-Andi
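Roughly what the same-scope argument above looks like when written out by hand. This is only a sketch: the __builtin_ia32_ptwrite64 builtin name is assumed from the PTWRITE support elsewhere in this series and requires -mptwrite; the pass emits the equivalent internally rather than at the source level.

    long
    scale (long v)
    {
      long result = v * 3;
      /* The PTWRITE sits right next to the store, so its IP falls
         inside the same start-IP...stop-IP range that the DWARF
         location list already records for 'result'.  A decoder only
         needs the usual IP-based lookup to label the traced value;
         no new debug info format is required.  */
      __builtin_ia32_ptwrite64 (result);
      return result;
    }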
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
On Sun, Nov 11, 2018 at 11:37:57AM -0700, Martin Sebor wrote:
> One other high-level comment: a more powerful interface to
> variable tracing than annotating declarations in the source
> would be to provide either the names of the symbols to trace
> on the command line or in an external file. That way tracing
> could be enabled for objects and types declared in read-only
> files (such as system headers), and would let the user more
> easily experiment with annotations.

For variables/functions, if you add typeof(foo) __attribute__(("vartrace")); at the end of the source file, it should in theory (haven't tested) enable it for both variables and functions. Not sure about types, probably not, but that might not be needed.

But it has to be at the end of the file, so -include doesn't work. If an -include-after option were added to the preprocessor it would work.

> This could be in addition to the attributes, and would require
> coming up with a way of identifying symbols with internal or
> no linkage, such as local variables, and perhaps also function

Individual local variables are hard, but you could likely enable tracing for everything in the function with the attribute trick above.

-Andi
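Spelled out, the redeclaration idea looks roughly like the following. This is a hypothetical, untested sketch (as the mail itself notes): it assumes the vartrace attribute from this series with the usual unquoted spelling, and foo/get_foo stand in for symbols declared in a header you cannot edit.

    /* Declared somewhere in a read-only header: */
    extern int foo;
    int get_foo (void);

    /* ...then, at the end of the translation unit, re-declare the
       symbols with the attribute so the vartrace pass would pick
       them up (hypothetical usage, not taken from the patch): */
    extern typeof (foo) foo __attribute__ ((vartrace));
    typeof (get_foo) get_foo __attribute__ ((vartrace));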
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
One other high-level comment: a more powerful interface to variable tracing than annotating declarations in the source would be to provide either the names of the symbols to trace on the command line or in an external file. That way tracing could be enabled for objects and types declared in read-only files (such as system headers), and would let the user more easily experiment with annotations.

This could be in addition to the attributes, and would require coming up with a way of identifying symbols with internal or no linkage, such as local variables, and perhaps also function arguments, return values, etc., if this mechanism were to provide access to those as well (I think it would be fine if this "external" mechanism provided support for only a subset of symbols).

Martin

On 11/04/2018 12:32 AM, Andi Kleen wrote:

From: Andi Kleen

Add a new pass to automatically instrument changes to variables with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte field into a Processor Trace log, which allows low-overhead logging of information. This allows reconstructing how values changed later, which can be useful for debugging or other analysis of the program behavior. With the compiler support this can be done without having to manually add instrumentation to the code.

Using dwarf information this can later be mapped back to the variables.

There are new options to enable instrumentation for different types, and also a new attribute to control the analysis at a fine-grained per-function or per-variable level. The attributes can be set on both the variable and the type level, and also on structure fields. This allows enabling tracing only for specific code in large programs.

The pass is generic, but only the x86 backend enables the necessary hooks. When the backend enables the necessary hooks (with -mptwrite) there is an additional pass that looks through the code for attribute vartrace enabled functions or variables.

The -fvartrace-locals option is experimental: it works, but it generates redundant ptwrites because the pass doesn't use the SSA information to minimize instrumentation. This could be optimized later.

Currently the code can be tested with SDE, or on an Intel Gemini Lake system with a new enough Linux kernel (v4.10+) that supports PTWRITE for PT. Linux perf can be used to record the values:

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use dwarf information to symbolize many[1] values back to their variable names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4

It is currently not able to decode all variable locations to names, but a large subset.

Longer term hopefully gdb will support this information too.

The CPU can potentially generate very high data bandwidths when code doing a lot of computation is heavily instrumented. This can cause some data loss both in the CPU and in perf logging the data when the disk cannot keep up. Running some larger workloads, most workloads do not cause CPU level overflows, but I've seen it with -fvartrace with crafty, and with more workloads with -fvartrace-locals.

The recommendation is to not fully instrument programs, but only areas of interest, either at the file level or using the attributes.

The other thing is that perf and the disk often cannot keep up with the data bandwidth for longer computations. In this case it's possible to use perf snapshot mode (add --snapshot to the command line above). The data will then only be logged to a memory ring buffer, and the buffers are only dumped on events of interest by sending SIGUSR2 to the perf binary.

In the future this will hopefully be better supported with core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also bootstrapped and tested gcc itself with full -fvartrace and -fvartrace-locals instrumentation.

gcc/:

2018-11-03  Andi Kleen

	* Makefile.in: Add tree-vartrace.o.
	* common.opt: Add -fvartrace, -fvartrace-returns,
	-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
	-fvartrace-locals
	* config/i386/i386.c (ix86_vartrace_func): Add.
	(TARGET_VARTRACE_FUNC): Add.
	* doc/extend.texi: Document vartrace/no_vartrace attributes.
	* doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
	-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
	-fvartrace-locals
	* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
	* passes.def: Add vartrace pass.
	* target.def (vartrace_func): Add.
	* tree-pass.h (make_pass_vartrace): Add.
	* tree-vartrace.c: New file to implement vartrace pass.

gcc/c-family/:

2018-11-03  Andi Kleen

	* c-attribs.c (handle_vartrace_attribute): New function.

config/:

2018-11-03  Andi Kleen

	* bootstrap-vartrace.mk: New.
	* bootstrap-vartrace-
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
On Fri, Nov 9, 2018 at 7:18 PM Andi Kleen wrote:
>
> Hi Richard,
>
> On Fri, Nov 09, 2018 at 04:27:22PM +0100, Richard Biener wrote:
> > > Passes bootstrap and test suite on x86_64-linux, also
> > > bootstrapped and tested gcc itself with full -fvartrace
> > > and -fvartrace-locals instrumentation.
> >
> > So how is this supposed to be used? I guess in an
> > edit-debug cycle and not for production code?
>
> It can actually be used for production code.
>
> When processor trace is disabled the PTWRITE
> instructions act as nops. So it's only increasing
> the code footprint. Since the instrumentation
> should only log values which are already computed,
> it normally doesn't cause any other code to be generated.
>
> Even when it is enabled, the primary overhead is the
> additional memory bandwidth, since the CPU can
> do the logging in parallel to other code. As long
> as the instrumentation is not so excessive that it generates
> too much memory bandwidth, it might actually be
> quite reasonable to even keep the logging on for
> production code, and use it as a "flight recorder",
> which is dumped on failures.

I see.

> This would also be the model in gdb, once we have support
> in it. You would run the program in the debugger
> and it just logs the data to a memory buffer,
> but when stopping the value history can be examined.

Hmm, so the debugger still needs to relate the ptwrite instruction with the actual variable the data is for. I suppose practically this means that var-tracking needs to be able to compute a location list for a variable that happens to overlap with the stored value?

That is, usually debuggers look for a location list of a variable and find, say, %rax. But for ptwrite the debugger needs to examine all active location lists for, say, %rax and figure out that it contains the value for variable 'a'?

When there isn't any such relation between the ptwrite stored value and any variable the ptwrite is useless, right?

> There's also some ongoing work to add (optional) support
> for PT to Linux crash dumps, so eventually that will
> work without having to always run the debugger.
>
> Today it can be done by running perf in the background
> to record the PT, however there the setup is a bit
> more complicated.
>
> The primary use case I was envisioning was to set
> the attribute on some critical functions/structures/types
> of interest and then have a very low-overhead logging
> option for them (generally cheaper than
> equivalent software instrumentation). And then
> they automatically get logged without the programmer
> needing to add lots of instrumentation code to
> catch every instance. So think of it as a
> "hardware accelerated printf".
>
> > What do you actually write with PTWRITE? I suppose
> > you need to keep an ID to something mapping somewhere
> > so you can make sense of the perf records?
>
> PTWRITE writes 32bit/64bit values. The CPU reports the
> IP of PTWRITE in the log, either explicitly, or implicitly if branch
> trace is enabled too. The IP can then be used to look up
> the DWARF scope for that IP. Then the decoder
> decodes the operand of PTWRITE and maps it back using
> the dwarf information. So it all works using
> existing debugger infrastructure, and a quite simple
> instruction decoder.
>
> I'll clarify that in the description.
>
> > Few comments inline below, but I'm not sure if this
> > whole thing is interesting for GCC (as opposed to being
> > an instrumentation plugin)
>
> I'm biased, but I think automatic data tracing is a very exciting
> use case, so hopefully it can be considered for mainstream gcc.
>
> > >   handle_no_profile_instrument_function_attribute,
> > >   NULL },
> > > @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node,
> > > tree name, tree, int,
> > >   return NULL_TREE;
> > > }
> > >
> > > +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
> > > +   struct attribute_spec.handler. */
> > > +
> > > +static tree
> > > +handle_vartrace_attribute (tree *node, tree, tree, int flags,
> > > +                           bool *)
> > > +{
> > > +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> > > +    *node = build_variant_type_copy (*node);
> >
> > I don't think you want the attribute on types. As far as I understood your
> > descriptions it should only be on variables and functions.
>
> The idea was that it's possible to trace all instances of a type,
> especially structure members. Otherwise it will be harder for
> the programmer to hunt down every instance.
>
> For example if I have a structure that is used all over a program,
> and one member gets the wrong value. I can then do in the header:
>
> struct foo {
>     int member __attribute__(("vartrace"));
> };
>
> and then recompile the program. Every instance of writing to
> member will then be automatically instrumented (assuming
> the program stays type safe).
>
> Makes sense?

OK. The user
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
On 11/04/2018 12:32 AM, Andi Kleen wrote:

From: Andi Kleen

Add a new pass to automatically instrument changes to variables with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte field into a Processor Trace log, which allows low-overhead logging of information. This allows reconstructing how values changed later, which can be useful for debugging or other analysis of the program behavior. With the compiler support this can be done without having to manually add instrumentation to the code.

Using dwarf information this can later be mapped back to the variables.

There are new options to enable instrumentation for different types, and also a new attribute to control the analysis at a fine-grained per-function or per-variable level. The attributes can be set on both the variable and the type level, and also on structure fields. This allows enabling tracing only for specific code in large programs.

The pass is generic, but only the x86 backend enables the necessary hooks. When the backend enables the necessary hooks (with -mptwrite) there is an additional pass that looks through the code for attribute vartrace enabled functions or variables.

The -fvartrace-locals option is experimental: it works, but it generates redundant ptwrites because the pass doesn't use the SSA information to minimize instrumentation. This could be optimized later.

Currently the code can be tested with SDE, or on an Intel Gemini Lake system with a new enough Linux kernel (v4.10+) that supports PTWRITE for PT. Linux perf can be used to record the values:

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use dwarf information to symbolize many[1] values back to their variable names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4

It is currently not able to decode all variable locations to names, but a large subset.

Longer term hopefully gdb will support this information too.

The CPU can potentially generate very high data bandwidths when code doing a lot of computation is heavily instrumented. This can cause some data loss both in the CPU and in perf logging the data when the disk cannot keep up. Running some larger workloads, most workloads do not cause CPU level overflows, but I've seen it with -fvartrace with crafty, and with more workloads with -fvartrace-locals.

The recommendation is to not fully instrument programs, but only areas of interest, either at the file level or using the attributes.

The other thing is that perf and the disk often cannot keep up with the data bandwidth for longer computations. In this case it's possible to use perf snapshot mode (add --snapshot to the command line above). The data will then only be logged to a memory ring buffer, and the buffers are only dumped on events of interest by sending SIGUSR2 to the perf binary.

In the future this will hopefully be better supported with core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also bootstrapped and tested gcc itself with full -fvartrace and -fvartrace-locals instrumentation.

(I initially meant to just suggest detecting and rejecting the two mutually exclusive attributes, but as I read the rest of the patch to better understand what it's about I noticed a few other issues I thought would be useful to point out.)

...
diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c index 4416b5042f7..66bbd87921f 100644 --- a/gcc/c-family/c-attribs.c +++ b/gcc/c-family/c-attribs.c @@ -325,6 +327,12 @@ const struct attribute_spec c_common_attribute_table[] = { "no_instrument_function", 0, 0, true, false, false, false, handle_no_instrument_function_attribute, NULL }, + { "vartrace",0, 0, false, false, false, false, + handle_vartrace_attribute, + NULL }, + { "no_vartrace", 0, 0, false, false, false, false, + handle_vartrace_attribute, + NULL }, { "no_profile_instrument_function", 0, 0, true, false, false, false, handle_no_profile_instrument_function_attribute, NULL }, Unless mixing these attributes on the same declaration makes sense I would suggest to either define the exclusions that should be automatically applied to the attributes (see attribute exclusions), or to enforce them in the handler. Judging only by the names it looks to me like vartrace should be mutually exclusive with no_vartrace. @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node, tree name, tree, int, return NULL_TREE; } +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_vartrace_attribute (tree *node, tree, tre
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
Hi Richard,

On Fri, Nov 09, 2018 at 04:27:22PM +0100, Richard Biener wrote:
> > Passes bootstrap and test suite on x86_64-linux, also
> > bootstrapped and tested gcc itself with full -fvartrace
> > and -fvartrace-locals instrumentation.
>
> So how is this supposed to be used? I guess in an
> edit-debug cycle and not for production code?

It can actually be used for production code.

When processor trace is disabled the PTWRITE instructions act as nops. So it's only increasing the code footprint. Since the instrumentation should only log values which are already computed, it normally doesn't cause any other code to be generated.

Even when it is enabled, the primary overhead is the additional memory bandwidth, since the CPU can do the logging in parallel to other code. As long as the instrumentation is not so excessive that it generates too much memory bandwidth, it might actually be quite reasonable to even keep the logging on for production code, and use it as a "flight recorder", which is dumped on failures.

This would also be the model in gdb, once we have support in it. You would run the program in the debugger and it just logs the data to a memory buffer, but when stopping the value history can be examined.

There's also some ongoing work to add (optional) support for PT to Linux crash dumps, so eventually that will work without having to always run the debugger.

Today it can be done by running perf in the background to record the PT, however there the setup is a bit more complicated.

The primary use case I was envisioning was to set the attribute on some critical functions/structures/types of interest and then have a very low-overhead logging option for them (generally cheaper than equivalent software instrumentation). And then they automatically get logged without the programmer needing to add lots of instrumentation code to catch every instance. So think of it as a "hardware accelerated printf".

> What do you actually write with PTWRITE? I suppose
> you need to keep an ID to something mapping somewhere
> so you can make sense of the perf records?

PTWRITE writes 32bit/64bit values. The CPU reports the IP of PTWRITE in the log, either explicitly, or implicitly if branch trace is enabled too. The IP can then be used to look up the DWARF scope for that IP. Then the decoder decodes the operand of PTWRITE and maps it back using the dwarf information. So it all works using existing debugger infrastructure, and a quite simple instruction decoder.

I'll clarify that in the description.

> Few comments inline below, but I'm not sure if this
> whole thing is interesting for GCC (as opposed to being
> an instrumentation plugin)

I'm biased, but I think automatic data tracing is a very exciting use case, so hopefully it can be considered for mainstream gcc.

> >   handle_no_profile_instrument_function_attribute,
> >   NULL },
> > @@ -767,6 +775,21 @@ handle_no_sanitize_undefined_attribute (tree *node,
> > tree name, tree, int,
> >   return NULL_TREE;
> > }
> >
> > +/* Handle "vartrace"/"no_vartrace" attributes; arguments as in
> > +   struct attribute_spec.handler. */
> > +
> > +static tree
> > +handle_vartrace_attribute (tree *node, tree, tree, int flags,
> > +                           bool *)
> > +{
> > +  if (TYPE_P (*node) && !(flags & (int) ATTR_FLAG_TYPE_IN_PLACE))
> > +    *node = build_variant_type_copy (*node);
>
> I don't think you want the attribute on types. As far as I understood your
> descriptions it should only be on variables and functions.

The idea was that it's possible to trace all instances of a type, especially structure members. Otherwise it will be harder for the programmer to hunt down every instance.

For example if I have a structure that is used all over a program, and one member gets the wrong value. I can then do in the header:

struct foo {
    int member __attribute__(("vartrace"));
};

and then recompile the program. Every instance of writing to member will then be automatically instrumented (assuming the program stays type safe).

Makes sense?

[BTW I considered adding an address trace too for pointer writes to hunt down the non type safe instances and possibly some other use cases. That might be possible follow-on work]

> > +
> > #undef TARGET_GIMPLIFY_VA_ARG_EXPR
> > #define TARGET_GIMPLIFY_VA_ARG_EXPR ix86_gimplify_va_arg
> >
> > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> > index 1eca009e255..08286aa4591 100644
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -3193,6 +3193,13 @@ the standard C library can be guaranteed not to
> > throw an exception
> > with the notable exceptions of @code{qsort} and @code{bsearch} that
> > take function pointer arguments.
> >
> > +@item no_vartrace
> > +@cindex @code{no_vartrace} function or variable attribute
> > +Disable data tracing for the function or variable or structure field
> > +marked with this attribute. Applies to types. Currentl
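As a self-contained illustration of the struct-member idea discussed above, here is a hedged sketch. The vartrace attribute and -mptwrite are from this patch series; the unquoted attribute spelling is assumed (the mails write it with quotes), and the type and field names are made up for the example.

    /* Shared header (hypothetical): only 'balance' is traced.  */
    struct record {
      long id;                                    /* not traced */
      long balance __attribute__ ((vartrace));    /* every write traced */
    };

    void
    credit (struct record *r, long amount)
    {
      r->balance += amount;   /* the vartrace pass would add a PTWRITE here */
      r->id++;                /* left alone */
    }

Built with -mptwrite (plus -g), the trace would then be recorded with the perf commands quoted elsewhere in the thread, and the field name recovered through the DWARF information rather than anything embedded in the trace itself.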
Re: [PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
On Sun, Nov 4, 2018 at 7:33 AM Andi Kleen wrote:
>
> From: Andi Kleen
>
> Add a new pass to automatically instrument changes to variables
> with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte
> field into a Processor Trace log, which allows low-overhead
> logging of information.
>
> This allows reconstructing how values changed later, which can be useful for
> debugging or other analysis of the program behavior. With the compiler
> support this can be done without having to manually add instrumentation
> to the code.
>
> Using dwarf information this can later be mapped back to the variables.
>
> There are new options to enable instrumentation for different types,
> and also a new attribute to control the analysis at a fine-grained
> per-function or per-variable level. The attributes can be set on both
> the variable and the type level, and also on structure fields.
> This allows enabling tracing only for specific code in large
> programs.
>
> The pass is generic, but only the x86 backend enables the necessary
> hooks. When the backend enables the necessary hooks (with -mptwrite)
> there is an additional pass that looks through the code for
> attribute vartrace enabled functions or variables.
>
> The -fvartrace-locals option is experimental: it works, but it
> generates redundant ptwrites because the pass doesn't use
> the SSA information to minimize instrumentation. This could be
> optimized later.
>
> Currently the code can be tested with SDE, or on an Intel
> Gemini Lake system with a new enough Linux kernel (v4.10+)
> that supports PTWRITE for PT. Linux perf can be used to
> record the values:
>
> perf record -e intel_pt/ptw=1,branch=0/ program
> perf script --itrace=crw -F +synth ...
>
> I have an experimental version of perf that can also use
> dwarf information to symbolize many[1] values back to their variable
> names. So far it is not in standard perf, but available at
>
> https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4
>
> It is currently not able to decode all variable locations to names,
> but a large subset.
>
> Longer term hopefully gdb will support this information too.
>
> The CPU can potentially generate very high data bandwidths when
> code doing a lot of computation is heavily instrumented.
> This can cause some data loss both in the CPU and in perf
> logging the data when the disk cannot keep up.
>
> Running some larger workloads, most workloads do not cause
> CPU level overflows, but I've seen it with -fvartrace
> with crafty, and with more workloads with -fvartrace-locals.
>
> The recommendation is to not fully instrument programs,
> but only areas of interest, either at the file level or using
> the attributes.
>
> The other thing is that perf and the disk often cannot keep up
> with the data bandwidth for longer computations. In this case
> it's possible to use perf snapshot mode (add --snapshot
> to the command line above). The data will then only be logged to
> a memory ring buffer, and the buffers are only dumped on events
> of interest by sending SIGUSR2 to the perf binary.
>
> In the future this will hopefully be better supported with
> core files and gdb.
>
> Passes bootstrap and test suite on x86_64-linux, also
> bootstrapped and tested gcc itself with full -fvartrace
> and -fvartrace-locals instrumentation.

So how is this supposed to be used? I guess in an edit-debug cycle and not for production code?

What do you actually write with PTWRITE?
I suppose you need to keep an ID to something mapping somewhere so you can make sense of the perf records?

Few comments inline below, but I'm not sure if this whole thing is interesting for GCC (as opposed to being an instrumentation plugin)

> gcc/:
>
> 2018-11-03  Andi Kleen
>
> 	* Makefile.in: Add tree-vartrace.o.
> 	* common.opt: Add -fvartrace, -fvartrace-returns,
> 	-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
> 	-fvartrace-locals
> 	* config/i386/i386.c (ix86_vartrace_func): Add.
> 	(TARGET_VARTRACE_FUNC): Add.
> 	* doc/extend.texi: Document vartrace/no_vartrace attributes.
> 	* doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
> 	-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
> 	-fvartrace-locals
> 	* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
> 	* passes.def: Add vartrace pass.
> 	* target.def (vartrace_func): Add.
> 	* tree-pass.h (make_pass_vartrace): Add.
> 	* tree-vartrace.c: New file to implement vartrace pass.
>
> gcc/c-family/:
>
> 2018-11-03  Andi Kleen
>
> 	* c-attribs.c (handle_vartrace_attribute): New function.
>
> config/:
>
> 2018-11-03  Andi Kleen
>
> 	* bootstrap-vartrace.mk: New.
> 	* bootstrap-vartrace-locals.mk: New.
> ---
>  config/bootstrap-vartrace-locals.mk | 3 +
>  config/bootstrap-vartrace.mk | 3 +
>  gcc/Makefile.in | 1 +
>  gcc/c-family/c-attribs.c | 23 ++
>  gcc/common.opt | 24 ++
>  gcc/config/i386/i386.c | 16 +
>  gcc/doc/extend.texi | 13 +
>  gcc/doc/invoke.texi | 29 ++
>  gcc/doc/tm.texi | 4 +
>  gcc/doc/tm.texi.in | 2 +
>  gcc/passes.def | 1 +
>  gcc/target.def | 7 +
>  gcc/tree-pass.h | 1 +
>  gcc/tree-vartrace.c | 463
>  14 files changed, 590 insertions(+)
>  create mode 100644 config/bootstrap-vartrace-locals.mk
>  create mode 100644 config/bootstrap-vartrace.mk
>  create mod
[PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
From: Andi Kleen

Add a new pass to automatically instrument changes to variables with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte field into a Processor Trace log, which allows low-overhead logging of information. This allows reconstructing how values changed later, which can be useful for debugging or other analysis of the program behavior. With the compiler support this can be done without having to manually add instrumentation to the code.

Using dwarf information this can later be mapped back to the variables.

There are new options to enable instrumentation for different types, and also a new attribute to control the analysis at a fine-grained per-function or per-variable level. The attributes can be set on both the variable and the type level, and also on structure fields. This allows enabling tracing only for specific code in large programs.

The pass is generic, but only the x86 backend enables the necessary hooks. When the backend enables the necessary hooks (with -mptwrite) there is an additional pass that looks through the code for attribute vartrace enabled functions or variables.

The -fvartrace-locals option is experimental: it works, but it generates redundant ptwrites because the pass doesn't use the SSA information to minimize instrumentation. This could be optimized later.

Currently the code can be tested with SDE, or on an Intel Gemini Lake system with a new enough Linux kernel (v4.10+) that supports PTWRITE for PT. Linux perf can be used to record the values:

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use dwarf information to symbolize many[1] values back to their variable names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-4

It is currently not able to decode all variable locations to names, but a large subset.

Longer term hopefully gdb will support this information too.

The CPU can potentially generate very high data bandwidths when code doing a lot of computation is heavily instrumented. This can cause some data loss both in the CPU and in perf logging the data when the disk cannot keep up. Running some larger workloads, most workloads do not cause CPU level overflows, but I've seen it with -fvartrace with crafty, and with more workloads with -fvartrace-locals.

The recommendation is to not fully instrument programs, but only areas of interest, either at the file level or using the attributes.

The other thing is that perf and the disk often cannot keep up with the data bandwidth for longer computations. In this case it's possible to use perf snapshot mode (add --snapshot to the command line above). The data will then only be logged to a memory ring buffer, and the buffers are only dumped on events of interest by sending SIGUSR2 to the perf binary.

In the future this will hopefully be better supported with core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also bootstrapped and tested gcc itself with full -fvartrace and -fvartrace-locals instrumentation.

gcc/:

2018-11-03  Andi Kleen

	* Makefile.in: Add tree-vartrace.o.
	* common.opt: Add -fvartrace, -fvartrace-returns,
	-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
	-fvartrace-locals
	* config/i386/i386.c (ix86_vartrace_func): Add.
	(TARGET_VARTRACE_FUNC): Add.
	* doc/extend.texi: Document vartrace/no_vartrace attributes.
	* doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
	-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
	-fvartrace-locals
	* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
	* passes.def: Add vartrace pass.
	* target.def (vartrace_func): Add.
	* tree-pass.h (make_pass_vartrace): Add.
	* tree-vartrace.c: New file to implement vartrace pass.

gcc/c-family/:

2018-11-03  Andi Kleen

	* c-attribs.c (handle_vartrace_attribute): New function.

config/:

2018-11-03  Andi Kleen

	* bootstrap-vartrace.mk: New.
	* bootstrap-vartrace-locals.mk: New.
---
 config/bootstrap-vartrace-locals.mk | 3 +
 config/bootstrap-vartrace.mk | 3 +
 gcc/Makefile.in | 1 +
 gcc/c-family/c-attribs.c | 23 ++
 gcc/common.opt | 24 ++
 gcc/config/i386/i386.c | 16 +
 gcc/doc/extend.texi | 13 +
 gcc/doc/invoke.texi | 29 ++
 gcc/doc/tm.texi | 4 +
 gcc/doc/tm.texi.in | 2 +
 gcc/passes.def | 1 +
 gcc/target.def | 7 +
 gcc/tree-pass.h | 1 +
 gcc/tree-vartrace.c | 463
 14 files changed, 590 insertions(+)
 create mode 100644 config/bootstrap-vartrace-locals.mk
 create mode 100644 config/bootstrap-vartrace.mk
 create mod
[PATCH 2/3] Add a pass to automatically add ptwrite instrumentation
From: Andi Kleen

Add a new pass to automatically instrument changes to variables with the new PTWRITE instruction on x86. PTWRITE writes a 4 or 8 byte field into an external Processor Trace log. This allows reconstructing how values changed later, which can be useful for debugging or other analysis of the program behavior. With the compiler support this can be done without having to manually add instrumentation to the code.

Using dwarf information this can later be mapped back to the variables.

There are new options to enable instrumentation for different types, and also a new attribute to control the analysis at a fine-grained per-function or per-variable level. The attributes can be set on both the variable and the type level, and also on structure fields. This allows enabling tracing only for specific code in large programs.

The pass is generic, but only the x86 backend enables the necessary hooks. When the backend enables the necessary hooks (with -mptwrite) there is an additional pass that looks through the code for attribute vartrace enabled functions or variables.

The -fvartrace-locals option is experimental: it works, but it generates many redundant ptwrites because the pass doesn't use the SSA information to minimize instrumentation. This could be optimized later.

Currently the code can be tested with SDE, or on an Intel Cherry Trail system with a new enough Linux kernel (v4.10+) that supports PTWRITE for PT. Linux perf can be used to record the values:

perf record -e intel_pt/ptw=1,branch=0/ program
perf script --itrace=crw -F +synth ...

I have an experimental version of perf that can also use dwarf information to symbolize many[1] values back to their variable names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-2

Longer term hopefully gdb will support this information too.

Passes bootstrap and test suite on x86_64-linux.

[1] Many: so far it only supports register variables that are not arguments.

gcc/:

2018-02-10  Andi Kleen

	* Makefile.in: Add tree-vartrace.o.
	* common.opt: Add -fvartrace, -fvartrace-returns,
	-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
	-fvartrace-locals
	* config/i386/i386.c (ix86_vartrace_func): Add.
	(TARGET_VARTRACE_FUNC): Add.
	* doc/extend.texi: Document vartrace/no_vartrace attributes.
	* doc/invoke.texi: Document -fvartrace, -fvartrace-returns,
	-fvartrace-args, -fvartrace-reads, -fvartrace-writes,
	-fvartrace-locals
	* doc/tm.texi (TARGET_VARTRACE_FUNC): Add.
	* passes.def: Add vartrace pass.
	* target.def (vartrace_func): Add.
	* tree-pass.h (make_pass_vartrace): Add.
	* tree-vartrace.c: New file to implement vartrace pass.

gcc/c-family/:

2018-02-10  Andi Kleen

	* c-attribs.c (handle_vartrace_attribute): New function.
--- gcc/Makefile.in | 1 + gcc/c-family/c-attribs.c | 23 +++ gcc/common.opt | 24 +++ gcc/config/i386/i386.c | 16 ++ gcc/doc/extend.texi | 13 ++ gcc/doc/invoke.texi | 29 +++ gcc/doc/tm.texi | 4 + gcc/doc/tm.texi.in | 2 + gcc/passes.def | 1 + gcc/target.def | 7 + gcc/tree-pass.h | 1 + gcc/tree-vartrace.c | 462 +++ 12 files changed, 583 insertions(+) create mode 100644 gcc/tree-vartrace.c diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 6c37e46f792..3bce0f21bb4 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1580,6 +1580,7 @@ OBJS = \ tree-vectorizer.o \ tree-vector-builder.o \ tree-vrp.o \ + tree-vartrace.o \ tree.o \ typed-splay-tree.o \ unique-ptr-tests.o \ diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c index 0261a45ec98..0c6488e0912 100644 --- a/gcc/c-family/c-attribs.c +++ b/gcc/c-family/c-attribs.c @@ -104,6 +104,8 @@ static tree handle_tls_model_attribute (tree *, tree, tree, int, bool *); static tree handle_no_instrument_function_attribute (tree *, tree, tree, int, bool *); +static tree handle_vartrace_attribute (tree *, tree, +tree, int, bool *); static tree handle_no_profile_instrument_function_attribute (tree *, tree, tree, int, bool *); static tree handle_malloc_attribute (tree *, tree, tree, int, bool *); @@ -331,6 +333,12 @@ const struct attribute_spec c_common_attribute_table[] = { "no_instrument_function", 0, 0, true, false, false, false, handle_no_instrument_function_attribute, NULL }, + { "vartrace", 0, 0, false, false, false, false, + handle_vartrace_attribute,