Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
OK for google/main with the nits below. http://codereview.appspot.com/5272048/diff/42003/ChangeLog.google-main File ChangeLog.google-main (right): http://codereview.appspot.com/5272048/diff/42003/ChangeLog.google-main#newcode1 ChangeLog.google-main:1: 2011-11-02 Kostya Serebryany 1 2011-11-02 Kostya Serebryany Need blank line below the datestamp. http://codereview.appspot.com/5272048/diff/42003/ChangeLog.google-main#newcode3 ChangeLog.google-main:3: AddressSanitizer, a fast memory error detector: 2 Introduce a new option -fasan which enables 3 AddressSanitizer, a fast memory error detector: Not strictly accepted, but it is common to add some verbiage of this nature when adding major features. OK. http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
The invoke.texi change looks fine. The ChangeLog entry needs some work. http://codereview.appspot.com/5272048/diff/41001/ChangeLog.google-main File ChangeLog.google-main (right): http://codereview.appspot.com/5272048/diff/41001/ChangeLog.google-main#newcode6 ChangeLog.google-main:6: 1 2011-11-02 Kostya Serebryany 2 Introduce a new option -fasan which enables 3 AddressSanitizer, a fast memory error detector: 4 http://code.google.com/p/address-sanitizer. 5 The current implementation handles only heap accesses. 6 Not quite. A ChangeLog entry needs to have a rigid structure. For every file and function changed, you need to state what changed. The entry below this one is not really a good example. ChangeLog entries never describe how the patch works or what it provides. See examples in gcc/ChangeLog. See http://www.gnu.org/prep/standards/standards.html#Style-of-Change-Logs for details. http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
Ok for google/main when the option is documented in doc/invoke.texi and a Changlog file is provided. David On Tue, Nov 1, 2011 at 5:24 PM, wrote: > So, do you have any other suggestions before the first commit? > > http://codereview.appspot.com/5272048/ >
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Tue, Nov 1, 2011 at 12:41 PM, Diego Novillo wrote: > On 11-11-01 15:34 , Xinliang David Li wrote: > >>> Right before pass_expand? In init_optimization_passes(), look for >>> NEXT_PASS >>> (pass_expand). That's RTL generation. Somewhere before that. >>> >> >> Why? > > The idea was to experiment where to best place ASAN to avoid instrumenting > too much. If we schedule it really late, then we may save ourselves some > unnecessary instrumentation. > It needs to be balanced -- on one hand it needs to be as late as possible so that as few memory references (dynamically executed) as possible are instrumented. On the other hand, early enough so that the instrumented code can be optimized sufficiently. > Though, I still think ASAN should never open code the library calls > directly. Rather, it should emit straight-code gimple that can be better > understood and optimized away. that depends on the library function themselves -- if they are trivial, inline sequence should be generated. > > >>> TARGET_MEM_REFs are converted to RTL mems during RTL expansion. >>> >> >> What? they will still be seen by asan which can not be handled (e.g, >> creating address expression out of it). > > So, it needs to run before TMRs are introduced then. *shrug*. > yes it should be before ivopt as discussed. David
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On 11-11-01 15:34 , Xinliang David Li wrote: Right before pass_expand? In init_optimization_passes(), look for NEXT_PASS (pass_expand). That's RTL generation. Somewhere before that. Why? The idea was to experiment where to best place ASAN to avoid instrumenting too much. If we schedule it really late, then we may save ourselves some unnecessary instrumentation. Though, I still think ASAN should never open code the library calls directly. Rather, it should emit straight-code gimple that can be better understood and optimized away. TARGET_MEM_REFs are converted to RTL mems during RTL expansion. What? they will still be seen by asan which can not be handled (e.g, creating address expression out of it). So, it needs to run before TMRs are introduced then. *shrug*.
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Tue, Nov 1, 2011 at 12:16 PM, Diego Novillo wrote: > On 11-11-01 15:11 , konstantin.s.serebry...@gmail.com wrote: >> >> Diego mentioned that we can move the asan pass somewhere to the very >> end, just before lowering to RTL. >> Where would be this blessed place? >> Does it still have TARGET_MEM_REF? > > Right before pass_expand? In init_optimization_passes(), look for NEXT_PASS > (pass_expand). That's RTL generation. Somewhere before that. > Why? > TARGET_MEM_REFs are converted to RTL mems during RTL expansion. > What? they will still be seen by asan which can not be handled (e.g, creating address expression out of it). David > > Diego. > >
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On 11-11-01 15:11 , konstantin.s.serebry...@gmail.com wrote: Diego mentioned that we can move the asan pass somewhere to the very end, just before lowering to RTL. Where would be this blessed place? Does it still have TARGET_MEM_REF? Right before pass_expand? In init_optimization_passes(), look for NEXT_PASS (pass_expand). That's RTL generation. Somewhere before that. TARGET_MEM_REFs are converted to RTL mems during RTL expansion. Diego.
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
First round of comments. I think we should add this to google/main. It's in sufficiently good shape for it. You can keep improving it in the branch. It is now too late for 4.7's stage 1, so I think a reasonable way to proceed is to keep it in google/main and then present it for trunk inclusion when stage 1 opens for 4.8. http://codereview.appspot.com/5272048/diff/30001/toplev.c File toplev.c (right): http://codereview.appspot.com/5272048/diff/30001/toplev.c#newcode621 toplev.c:621: asan_finish_file(); + /* File-scope initialization for AddressSanitizer. */ Two spaces before '*/' + if (flag_asan) +asan_finish_file(); Space before '()' http://codereview.appspot.com/5272048/diff/30001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode80 tree-asan.c:80: (All I need is to traverse *all* memory accesses and instrument them). + Implementation details: + This is my first code in gcc. I wrote it by copying tree-mudflap.c, + stripping 70% of irrelevant code and modifying the instrumentation routine + build_check_stmt. The code seems to work, but I don't feel I understand it. + In particular, transform_derefs() and transform_statements() seem too complex. + Suggestions are welcome on how to simplify them. + (All I need is to traverse *all* memory accesses and instrument them). No need to have this in the code. These details usually go in the mail message you submit your patch with. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode140 tree-asan.c:140: 'size' is one of 1, 2, 4, 8, 16. */ +/* Instrument the memory access instruction 'base'. + Insert new statements before 'instr_gsi'. + 'location' is source code location. + 'is_store' is either 1 (for a store) or 0 (for a load). + 'size' is one of 1, 2, 4, 8, 16. */ When documenting arguments, you should refer to the arguments in CAPITALS instead of 'quoted'. The comment needs to be formatted so the lines below the first line are indented 3 spaces. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode155 tree-asan.c:155: tree shadow_type = size > 8 ? short_integer_type_node : char_type_node; + tree shadow_type = size > 8 ? short_integer_type_node : char_type_node; s/8/BITS_PER_UNIT/ http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode219 tree-asan.c:219: build1 (VIEW_CONVERT_EXPR, shadow_ptr_type, t)); + t = build2 (RSHIFT_EXPR, uintptr_type, base_addr, + build_int_cst (uintptr_type, asan_scale)); + t = build2 (PLUS_EXPR, uintptr_type, t, + build2 (LSHIFT_EXPR, uintptr_type, + build_int_cst (uintptr_type, 1), + build_int_cst (uintptr_type, asan_offset_log) + )); + t = build1 (INDIRECT_REF, shadow_type, + build1 (VIEW_CONVERT_EXPR, shadow_ptr_type, t)); It helps if this adds documentation on what expressions it's building. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode250 tree-asan.c:250: gimple_seq_add_stmt (&seq, g); [ ... ] + g = gimple_build_assign (cond, t); + gimple_set_location (g, location); + gimple_seq_add_stmt (&seq, g); + g = gimple_build_cond (NE_EXPR, cond, boolean_false_node, NULL_TREE, + NULL_TREE); + gimple_set_location (g, location); + gimple_seq_add_stmt (&seq, g); So, instead of open coding all the access checks, how about we created a new GIMPLE instruction? This instruction would take the same parameters, and have the semantics of the check but it would be optimizable. You could for instance make the instruction produce a pointer SSA name that is then used by the memory reference. With this, you can then allow the optimizers do their thing. These instructions could be moved around, eliminated, coalesced. Resulting in a reduced number of checks. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode251 tree-asan.c:251: /* debug_gimple_seq (seq); */ + /* debug_gimple_seq (seq); */ If you want the pass to dump debugging data, use the dump support. See other passes for examples (look for 'if (dump_file)' snippets)'. When -fdump-tree-asan is passed to the compiler, the pass manager will instantiate a 'dump_file' that you can use to emit debug info. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode253 tree-asan.c:253: /* crash */ + /* crash */ ??? http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode272 tree-asan.c:272: location_t location, int is_store) +static void +transform_derefs (gimple_stmt_iterator *iter, tree *tp, + location_t location, int is_store) Needs documentation. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode326 tree-asan.c:326: bb = ENTRY_BLOCK_PTR ->next_bb; + bb = ENTRY_BLOCK_PTR ->next_bb; No space before '->'. http://codereview.appspot.com/5272048/diff/30001/tree-asan.c#newcode327 tree-asan.c:327: do FOR_EACH_BB (bb) http:
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
what kind of failures? David On Wed, Oct 19, 2011 at 2:04 PM, wrote: > On 2011/10/19 20:38:35, kcc wrote: >> >> Added code to avoid bitfields. > > Is there anything I should know about splitting basic blocks in the > presence of EH? > Currently, asan fails on 483.xalancbmk, 453.povray and 450.soplex, which > are known to have EH. > > http://codereview.appspot.com/5272048/ >
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Wed, Oct 19, 2011 at 12:02 PM, wrote: > Minimized the crash to this: > > struct Foo { > unsigned bf1:1; > unsigned bf2:1; > unsigned bf3:1; > }; > > void foo (struct Foo *ob) { > ob->bf2 = 1; > } > > > > D.2731_4 = &ob_1(D)->bf2; > __asan_base_addr.2 = (long unsigned int) D.2731_4; > D.2732_5 = __asan_base_addr.2 >> 3; > D.2733_6 = 1 << 44; > D.2734_7 = D.2732_5 + D.2733_6; > D.2735_8 = VIEW_CONVERT_EXPR(D.2734_7); > # VUSE <.MEM> > __asan_shadow.3 = *D.2735_8; > D.2737_9 = __asan_shadow.3 != 0; > D.2738_10 = __asan_base_addr.2 & 7; > D.2739_11 = (char) D.2738_10; > D.2740_12 = D.2739_11 + 0; > D.2741_13 = D.2740_12 >= __asan_shadow.3; > __asan_crash_cond.4 = D.2737_9 & D.2741_13; > if (__asan_crash_cond.4 != 0) > ./expand_expr_addr_expr_1_err.c: In function ‘foo’: > ./expand_expr_addr_expr_1_err.c:8:11: internal compiler error: in > expand_expr_addr_expr_1, at expr.c:7381 > > > > How do I avoid instrumenting bitfields? Use get_inner_reference to compute the bitpos, and check if it is multiple of bits_per_unit. David > > http://codereview.appspot.com/5272048/ >
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Tue, Oct 18, 2011 at 19:31, Xinliang David Li wrote: > It will be weird to put the instrumentation pass inside loop opt, > besides memory loads which are loop invariants and redundant stores in > loop should be handled by pre/pde. > > +cc Richard Guenther > > You may want to ask the middle-end maintainer to review your code at > this point if you want it to be in trunk. For trunk inclusion, we need to decide what to do wrt mudflap. Clearly, if ASAN offers the same protections and considerable better performance, then that should be an easy decision. I still have not looked at the implementation in detail, but I like the fact that it is a pure gimple pass. If the instrumentation can be statically optimized, then that is a clear advantage over mudflap, which we have never been able to optimize properly. More comments on the patch itself coming soon. Diego.
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
It will be weird to put the instrumentation pass inside loop opt, besides memory loads which are loop invariants and redundant stores in loop should be handled by pre/pde. +cc Richard Guenther You may want to ask the middle-end maintainer to review your code at this point if you want it to be in trunk. thanks, David On Tue, Oct 18, 2011 at 4:12 PM, wrote: >> yes -- so a good choice would be after PRE and PDE (pre, sink_code) >> which should handle most of the loop invariant memory loads. > > how about pass_lim? I think asan should be after lim. > > http://codereview.appspot.com/5272048/ >
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
On Tue, Oct 18, 2011 at 3:56 PM, wrote: > On 2011/10/18 22:52:33, davidxl wrote: >> >> http://codereview.appspot.com/5272048/diff/18001/tree-asan.c >> File tree-asan.c (right): > > > http://codereview.appspot.com/5272048/diff/18001/tree-asan.c#newcode325 >> >> tree-asan.c:325: base = build_addr (t, current_function_decl); >> There are issues with creating address expressions from TARGET_MEM_REF > > in gcc. >> >> Since you want compiler to do optimization on instrumented code as > > much as >> >> possible, asan instrumentation is better done as early as possible > > after ipa > > Why? > I would actually say that I want the instrumentation to happen as late > as possible because this will instrument fewer memory accesses. > For example, asan certainly needs to happen after loop invariants are > moved out and common subexpressions (including loads) are eliminated. > No? yes -- so a good choice would be after PRE and PDE (pre, sink_code) which should handle most of the loop invariant memory loads. David > >> inlining -- and this will also solve this problem. I tried moving asan > > pass >> >> before loop opt, and it worked fine. > > > > http://codereview.appspot.com/5272048/ >
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
http://codereview.appspot.com/5272048/diff/18001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/18001/tree-asan.c#newcode325 tree-asan.c:325: base = build_addr (t, current_function_decl); There are issues with creating address expressions from TARGET_MEM_REF in gcc. Since you want compiler to do optimization on instrumented code as much as possible, asan instrumentation is better done as early as possible after ipa inlining -- and this will also solve this problem. I tried moving asan pass before loop opt, and it worked fine. http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
http://codereview.appspot.com/5272048/diff/18001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/18001/tree-asan.c#newcode325 tree-asan.c:325: base = build_addr (t, current_function_decl); You need to create a temp var and build as gimple assignment. See init_tmp_var in tree-nested.c. http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
There must be a style lint for gcc -- but I have not used it .. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode79 tree-asan.c:79: (All I need is to traverse *all* memory accesses and instrument them). For refs such as component ref 'ap->b[i].c', you need to create the access address expression out of it -- you may want to try to use build_addr interface. There is another interface get_ref_base_and_extent which can return the access base + offset, but it does not work for asan when there is array ref with runtime index -- this is mainly for alias analysis. On 2011/10/17 23:04:50, kcc wrote: On 2011/10/17 22:33:17, davidxl wrote: > Discard 2) -- it is not correct. What Asan needs is slightly different. Yea. I actually have a test where this instrumentation does something bad. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode128 tree-asan.c:128: /* perform the instrumentation */ On 2011/10/17 23:04:50, kcc wrote: On 2011/10/17 22:26:55, davidxl wrote: > Parameter documentation. s/perform/Perform/ Done, although I am not sure what location is... location is the source location. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode286 tree-asan.c:286: transform_derefs (gimple_stmt_iterator *iter, tree *tp, Look at interface build_addr in tree-nested.c, it may be what you need. On 2011/10/17 23:04:50, kcc wrote: On 2011/10/17 22:26:55, davidxl wrote: > You can use get_base_address utility function defined in gimple.c So, do I ignore this for now? http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode414 tree-asan.c:414: do Ok, in that case, may be you should use the aux field of BB to mark those new BBs and skip them? On 2011/10/17 23:04:50, kcc wrote: On 2011/10/17 22:26:55, davidxl wrote: > Can you use FOR_EACH_BB macro? I don't know. Can I? The instrumentation code creates new BB while it runs and those should not be instrumented. http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
http://codereview.appspot.com/5272048/diff/2001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode79 tree-asan.c:79: (All I need is to traverse *all* memory accesses and instrument them). Discard 2) -- it is not correct. What Asan needs is slightly different. David On 2011/10/17 22:26:55, davidxl wrote: Two suggestions: 1) You only need to deal with GIMPLE_ASSIGN (lhs and rhs) for all memory references) 2) use get_base_address function to compute base address. http://codereview.appspot.com/5272048/
Re: [google] AddressSanitizer for gcc, first attempt. (issue 5272048)
fasan option also needs to be documented in doc/invoke.texi. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c File tree-asan.c (right): http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode54 tree-asan.c:54: ShadowValue = (char*)ShadowAddr; *(char*) ShadowAddr; http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode59 tree-asan.c:59: ShadowValue = (char*)ShadowAddr; *(char*) ShadowAddr; http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode79 tree-asan.c:79: (All I need is to traverse *all* memory accesses and instrument them). Two suggestions: 1) You only need to deal with GIMPLE_ASSIGN (lhs and rhs) for all memory references) 2) use get_base_address function to compute base address. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode89 tree-asan.c:89: We may want to add command line flags to change these values. */ two spaces. Similarly for other comments. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode90 tree-asan.c:90: static int asan_scale = 3; Need an empty line after the comment. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode91 tree-asan.c:91: static int asan_offset_log_32 = 29; const int? http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode97 tree-asan.c:97: static tree New empty line after the comment. Similarly for all other functions in the file. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode98 tree-asan.c:98: report_error_func (int is_store, int size) Document IS_STORE and SIZE in the comment. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode102 tree-asan.c:102: Extra line. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode104 tree-asan.c:104: sprintf(name, "__asan_report_%s%d\n", is_store ? "store" : "load", size); Empty line between decls and statements. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode108 tree-asan.c:108: DECL_ATTRIBUTES (def) = tree_cons (get_identifier ("leaf"), NULL, DECL_ATTRIBUTES (def)); line too long. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode128 tree-asan.c:128: /* perform the instrumentation */ Parameter documentation. s/perform/Perform/ http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode154 tree-asan.c:154: if (! gsi_end_p (gsi)) remove extra space http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode187 tree-asan.c:187: base_addr = make_rename_temp (uintptr_type, "base_addr"); May be better "__asan_base_addr" as the temp var name? http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode199 tree-asan.c:199: build_int_cst(uintptr_type, asan_scale) Missing space after function name. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode200 tree-asan.c:200: ); Do not put closing parenthesis in a separate line. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode203 tree-asan.c:203: build_int_cst(uintptr_type, 1), Missing space. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode204 tree-asan.c:204: build_int_cst(uintptr_type, asan_offset_log) Missing space. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode205 tree-asan.c:205: ) Do not start a new line. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode230 tree-asan.c:230: { { } is not needed. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode237 tree-asan.c:237: cond = make_rename_temp (boolean_type_node, "asan_crash_cond"); -> __asan_crash_cond to be consistent. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode238 tree-asan.c:238: g = gimple_build_assign (cond, t); Extra space here. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode251 tree-asan.c:251: g = gimple_build_call (report_error_func(is_store, size), 1, base_addr); Missing space. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode254 tree-asan.c:254: Extra line. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode286 tree-asan.c:286: transform_derefs (gimple_stmt_iterator *iter, tree *tp, You can use get_base_address utility function defined in gimple.c http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode414 tree-asan.c:414: do Can you use FOR_EACH_BB macro? http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode425 tree-asan.c:425: transform_derefs (&i, gimple_assign_lhs_ptr (s), Use get_base_address and then do transformation. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode436 tree-asan.c:436: if (gimple_return_retval (s) != NULL_TREE) The operand of a gimple_return should be a SSA_NAME, so handling it is not needed. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode447 tree-asan.c:447: if (fndecl && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_ALLOCA)) && DECL_BUILT_IN (fndecl) .. http://codereview.appspot.com/5272048/diff/2001/tree-asan.c#newcode