Re: [PATCH, i386]: Fix MULTILIB_MATCHES option typos in t-rtems
On 11/05/2013 07:54 PM, Uros Bizjak wrote: On Tue, Nov 5, 2013 at 7:43 PM, Ralf Corsepius ralf.corsep...@gmail.com wrote: 2013-11-05 Uros Bizjak ubiz...@gmail.com * config/i386/t-rtems (MULTILIB_MATCHES): Fix option typos. Committed to SVN mainline as obvious, should be backported to other release branches. Yikes! Could you please apply it to all open GCC branches? Done. Thanks, Ralf
Re: [PATCH, SH] Implement builtin_strlen
On 11/05/2013 02:12 PM, Kaz Kojima wrote: Christian Bruel christian.br...@st.com wrote: No regressions for sh-none-elf. OK for trunk ? OK. Regards, kaz thanks, applied together with the cleanup referenced earlier and a slight variable renaming (start_addr-curr_addr, end_addr-start_addr) for readability as obvious, Christian
Commit: MSP430: Make -mmcu option define C preprocessor symbol and linker scripts
Hi Guys, I am checking in the patch below to extend the -mmcu command line option of the MSP430 port so that it defines a C preprocessor symbol based on the name of the MCU part and it also changes the linker command line so that it will search for linker scripts specific to that part. Cheers Nick gcc/ChangeLog 2013-11-06 Nick Clifton ni...@redhat.com * config/msp430/msp430.h (TARGET_CPU_CPP_BUILTINS): Define the name returned by msp430_mcu_name. (LIB_SPEC): If a -T option has not been specified then set a default, mcu-specific, linker script. * config/msp430/t-msp430 (MULTILIB_MATCHES): Add more mcu names. * config/msp430/msp430.c (msp430x_names): Likewise. Alpha sort the names for ease of comparison. (msp430_mcu_name): New function: Returns a string suitable for use as a C preprocessor symbol based upon the name of the MCU being targeted. (msp430_option_override): Accept msp430x and msp430xv2 as generic mcu names. * config/msp430/msp430-protos.h (msp430_mcu_name): Prototype. Index: gcc/config/msp430/msp430-protos.h === --- gcc/config/msp430/msp430-protos.h (revision 20) +++ gcc/config/msp430/msp430-protos.h (working copy) @@ -35,6 +35,7 @@ int msp430_initial_elimination_offset (int, int); boolmsp430_is_interrupt_func (void); const char * msp430x_logical_shift_right (rtx); +const char * msp430_mcu_name (void); bool msp430_modes_tieable_p (enum machine_mode, enum machine_mode); void msp430_output_labelref (FILE *, const char *); void msp430_register_pragmas (void); Index: gcc/config/msp430/msp430.c === --- gcc/config/msp430/msp430.c (revision 20) +++ gcc/config/msp430/msp430.c (working copy) @@ -109,69 +109,87 @@ #undef TARGET_OPTION_OVERRIDE #define TARGET_OPTION_OVERRIDE msp430_option_override +/* This list provides a set of known MCU names that support the MSP430X + ISA. The list has been provided by TI and should be kept in sync with + the ones in: + + gcc/config/msp430/t-msp430 + gas/config/tc-msp430.c + + FIXME: We ought to read the names in from a file at run, rather + than having them built in like this. Also such a file should be + shared with gas. */ + static const char * msp430x_names [] = { - msp430x, /* Generic name. */ - msp430xv2, /* Generic name. */ - - /* These names have been provided by TI and match the names currently - supported by GAS. - - NB/ This list should be kept in sync with the ones in: - gcc/config/msp430/t-msp430 - gas/config/tc-msp430.c - - FIXME: We ought to read the names in from a file at run, rather - than having them built in like this. Also such a file should be - shared with gas. */ - - msp430cg4616, msp430cg4617, msp430cg4618, msp430cg4619, msp430f2416, - msp430f2417, msp430f2418, msp430f2419, msp430f2616, msp430f2617, - msp430f2618, msp430f2619, msp430f47126, msp430f47127, msp430f47163, - msp430f47173, msp430f47183, msp430f47193, msp430f47166, msp430f47176, - msp430f47186, msp430f47196, msp430f47167, msp430f47177, msp430f47187, - msp430f47197, msp430f46161, msp430f46171, msp430f46181, msp430f46191, - msp430f4616, msp430f4617, msp430f4618, msp430f4619, msp430fg4616, - msp430fg4617, msp430fg4618, msp430fg4619, msp430f5418, msp430f5419, - msp430f5435, msp430f5436, msp430f5437, msp430f5438, msp430f5418a, - msp430f5419a, msp430f5435a, msp430f5436a, msp430f5437a, msp430f5438a, - msp430f5212, msp430f5213, msp430f5214, msp430f5217, msp430f5218, - msp430f5219, msp430f5222, msp430f5223, msp430f5224, msp430f5227, - msp430f5228, msp430f5229, msp430f5304, msp430f5308, msp430f5309, - msp430f5310, msp430f5340, msp430f5341, msp430f5342, msp430f5324, - msp430f5325, msp430f5326, msp430f5327, msp430f5328, msp430f5329, + cc430f5123, cc430f5125, cc430f5133, cc430f5135, cc430f5137, + cc430f5143, cc430f5145, cc430f5147, cc430f6125, cc430f6126, + cc430f6127, cc430f6135, cc430f6137, cc430f6143, cc430f6145, + cc430f6147, msp430bt5190, msp430cg4616, msp430cg4617, msp430cg4618, + msp430cg4619, msp430f2416, msp430f2417, msp430f2418, msp430f2419, + msp430f2616, msp430f2617, msp430f2618, msp430f2619, msp430f4616, + msp430f46161, msp430f4617, msp430f46171, msp430f4618, msp430f46181, + msp430f4619, msp430f46191, msp430f47126, msp430f47127, msp430f47163, + msp430f47166, msp430f47167, msp430f47173, msp430f47176, msp430f47177, + msp430f47183, msp430f47186, msp430f47187, msp430f47193, msp430f47196, + msp430f47197, msp430f5131, msp430f5132, msp430f5151, msp430f5152, + msp430f5171, msp430f5172, msp430f5212, msp430f5213, msp430f5214, + msp430f5217, msp430f5218, msp430f5219, msp430f5222, msp430f5223, + msp430f5224, msp430f5227, msp430f5228, msp430f5229,
Re: patch implementing a new pass for register-pressure relief through live range shrinkage
On Tue, Nov 5, 2013 at 4:35 PM, Vladimir Makarov vmaka...@redhat.com wrote: I'd like to add a new experimental optimization to the trunk. This optimization was discussed on RA BOF of this summer GNU Cauldron. It is a register pressure relief through live-range shrinkage. It is implemented on the scheduler base and uses register-pressure insn scheduling infrastructure. By rearranging insns we shorten pseudo live-ranges and increase a chance to them be assigned to a hard register. The code looks pretty simple but there are a lot of works behind this patch. I've tried about ten different versions of this code (different heuristics for two currently existing register-pressure algorithms). I think it is *upto target maintainers* to decide to use or not to use this optimization for their targets. I'd recommend to use this at least for x86/x86-64. I think any OOO processor with small or moderate register file which does not use the 1st insn scheduling might benefit from this too. On SPEC2000 for x86/x86-64 (I use Haswell processor, -O3 with general tuning), the optimization usage results in smaller code size in average (for floating point and integer benchmarks in 32- and 64-bit mode). The improvement better visible for SPECFP2000 (although I have the same improvement on x86-64 SPECInt2000 but it might be attributed mostly mcf benchmark unstability). It is about 0.5% for 32-bit and 64-bit mode. It is understandable, as the optimization has more opportunities to improve the code on longer BBs. Different from other heuristic optimizations, I don't see any significant worse performance. It gives practically the same or better performance (a few benchmarks imporoved by 1% or more upto 3%). The single but significant drawback is additional compilation time (4%-6%) as the 1st insn scheduling pass is quite expensive. So I'd recommend target maintainers to switch it on only for -Ofast. Generally I'd not recomment viewing -Ofast as -O4 but as -O3 plus generally unsafe optimizations. So I'd not enable it for -Ofast but for -O3 - possibly also with -Os if indeed the main motivation is also code-size improvements (-Os is a similar beast as -O3, spend as much time as you can on optimizing size). Btw, thanks for working on this. How does it relate to -fsched-pressure? Does it treat all register classes the same? On x86 mostly the few fixed registers for some of the integer pipeline instructions hurt, x86_64 has enough general and FP registers? Richard. If somebody finds that the optimization works on processors which uses 1st insn scheduling by default (in which I slightly doubt), we could improve the compilation time by reusing data for this optimization and the 1st insn scheduling. Any comments, questions, thoughts are appreciated. 2013-11-05 Vladimir Makarov vmaka...@redhat.com * tree-pass.h (make_pass_live_range_shrinkage): New external. * timevar.def (TV_LIVE_RANGE_SHRINKAGE): New. * sched-rgn.c (gate_handle_live_range_shrinkage): New. (rest_of_handle_live_range_shrinkage): Ditto (class pass_live_range_shrinkage): Ditto. (pass_data_live_range_shrinkage): Ditto. (make_pass_live_range_shrinkage): Ditto. * sched-int.h (sched_relief_p): New external. * sched-deps.c (create_insn_reg_set): Make void return value. * passes.def: Add pass_live_range_shrinkage. * ira.c (update_equiv_regs): Don't move if flag_live_range_shrinkage. * haifa-sched.c (sched_relief_p): New. (rank_for_schedule): Add code for pressure relief through live range shrinkage. (schedule_insn): Print more debug info. (sched_init): Setup SCHED_PRESSURE_WEIGHTED for pressure relief through live range shrinkage. * doc/invoke.texi (-flive-range-shrinkage): New. * common.opt (flive-range-shrinkage): New.
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
On Tue, Nov 5, 2013 at 3:24 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 1:52 PM, Ilya Enkovich enkovich@gmail.com wrote: For input parameter P I need to have BOUNDS = __builtin_arg_bnd (P) to somehow refer to bounds of P in GIMPLE. Optimizations may modify __builtin_arg_bnd (P) replacing P with its copy or some value. It makes call useless because removes information about parameter whose bounds we refer to. I want such optimization to ignore __builtin_arg_bnd calls and always leave default SSA_NAME of PARM_DECL there as arg. How does a compilable testcase look like that shows how the default def is used? And what transforms break that use? I really cannot see how this would happen (and you didn't add a testcase). Here is a test source: extern int bar1 (int *p); extern int bar2 (int); int foo (int *p) { if (!p) return bar1 (p); return bar2 (10); } After instrumentation GIMPLE looks like: foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 6: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: _6 = bar1 (p_3(D), __bound_tmp.0_9); goto bb 5; bb 4: _8 = bar2 (10); bb 5: # _1 = PHI _6(3), _8(4) return _1; } Here is optimized GIMPLE (if I do not apply my changes in tree-ssa-dom.c): foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (0B); [return slot optimization] _6 = bar1 (0B, __bound_tmp.0_9); [tail call] goto bb 5; bb 4: _8 = bar2 (10); [tail call] bb 5: # _1 = PHI _6(3), _8(4) return _1; } Now during expand or inline of foo I cannot determine the value for __bound_tmp.0_9. Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. Also sth weird is going on as your arg_bnd call gets [return slot optimization]. It seems that your POINTER_BOUNDS_TYPE is seen as aggregate somehow. Richard. Ilya Richard.
Re: [patch] Move block_in_transaction out of gimple.h
On Tue, Nov 5, 2013 at 10:08 PM, Jeff Law l...@redhat.com wrote: On 11/05/13 14:06, Andrew MacLeod wrote: Looks like another location of convenience perhaps... Anyway, block_in_transaction (bb) really belongs in basic-block.h... The only oddity is that it also checks flag_tm... Is this really necessary? One would think the flag would never be set if flag_tm wasn't on... In any case, basic-block.h is already picking options.h up through function.h which includes tm.h. And regardless, it does belong here... Bootstraps on x86_64-unknown-linux-gnu and current running regressions. Assuming its clean, OK? OK. I wouldn't lose any sleep of that test were removed. In fact, please remove it :-) I wouldn't want someone to see that code and think hey, BB_IN_TRANSACTION is guarded, let's reuse that bit for something else when not compiling for TM. We've done far too much of that through the years ;( I think it's not computed for !flag_tm, but I agree. Richard. Jeff
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On Tue, Nov 5, 2013 at 10:18 PM, Jeff Law l...@redhat.com wrote: On 10/31/13 10:26, David Malcolm wrote: The gimple statement types are currently implemented using a hand-coded C inheritance scheme, with a union gimple_statement_d holding the various possible structs for a statement. The following series of patches convert it to a C++ hierarchy, using the existing structs, eliminating the union. The gimple typedef changes from being a (union gimple_statement_d *) to being a: (struct gimple_statement_base *) There are no virtual functions in the new code: the sizes of the various structs are unchanged. It makes use of is-a.h, using the as_a T template function to perform downcasts, which are checked (via gcc_checking_assert) in an ENABLE_CHECKING build, and are simple casts in an unchecked build, albeit it in an inlined function rather than a macro. For example, one can write: gimple_statement_phi *phi = as_a gimple_statement_phi (gsi_stmt (gsi)); and then directly access the fields of the phi, as a phi. The existing accessor functions in gimple.h become somewhat redundant in this scheme, but are preserved. The earlier versions of the patches made all of the types GTY((user)) and provided hand-written implementations of the gc and pch marker routines. In this new version we rely on the support for simple inheritance that I recently added to gengtype, by adding a desc to the GTY marking for the base class, and a tag to the marking for all of the concrete subclasses. (I say class, but all the types remain structs since their fields are all publicly accessible). As noted in the earlier patch, I believe this is a superior scheme to the C implementation: * We can get closer to compile-time type-safety, checking the gimple code once and downcasting with an as_a, then directly accessing fields, rather than going through accessor functions that check each time. In some places we may want to replace a gimple with a subclass e.g. phis are always of the phi subclass, to get full compile-time type-safety. * This scheme is likely to be easier for newbies to understand. * Currently in gdb, dereferencing a gimple leads to screenfuls of text, showing all the various union values. With this, you get just the base class, and can cast it to the appropriate subclass. * With this, we're working directly with the language constructs, rather than rolling our own, and thus other tools can better understand the code. (e.g. doxygen). Again, as noted in the earlier patch series, the names of the structs are rather verbose. I would prefer to also rename them all to eliminate the _statement component: gimple_statement_base - gimple_base gimple_statement_phi - gimple_phi gimple_statement_omp - gimple_omp etc, but I didn't do this to mimimize the patch size. But if the core maintainers are up for that, I can redo the patch series with that change also, or do that as a followup. The patch is in 6 parts; all of them are needed together. And that's part of the problem. There's understandable resistance to (for example) the as_a casting. There's a bit of natural tension between the desire to keep patches small and self-contained and the size/scope of the changes necessary to do any serious reorganization work. This set swings too far in the latter direction :-) Is there any way to go forward without the is_a/as_a stuff? ie, is there're a simpler step towards where we're trying to go that allows most of this to go forward now rather than waiting? We decided to move to C++. As part of a later discussion we decided to go with a single general dynamic-casting style, mimicing the real C++ variant which is dynamic_cast ... . Which resulted in is-a.h. So yes, we've decided to go C++ so we have to live with certain uglinesses of that decisions (and maybe over time those uglinesses will fade away and we get used to it and like it). Thus, there isn't another option besides using the is-a.h machinery and enabling and using RTTI. Sticking to C for gimple doesn't seem to be consistent with the decision to move to C++. Oh, I'm not saying I'm a big fan of as_a / is_a or C++ in general as it plays out right now. But well, we've had the discussion and had a decision. Execute. Richard. * Patch 1 of 6: This patch adds inheritance to the various gimple types, eliminating the initial baseclass fields, and eliminating the union gimple_statement_d. All the types remain structs. They become marked with GTY(()), gaining GSS_ tag values. * Patch 2 of 6: This patch ports various accessor functions within gimple.h to the new scheme. * Patch 3 of 6: This patch is autogenerated by refactor_gimple.py from https://github.com/davidmalcolm/gcc-refactoring-scripts There is a test suite test_refactor_gimple.py which may give a
Re: [PATCH] PR58985: testcase error.
On Wed, Nov 6, 2013 at 3:51 AM, Wei Mi w...@google.com wrote: +Release manager. Thanks, committed to trunk as r204438. Ok for 4.8 branch? Ok. On Tue, Nov 5, 2013 at 11:19 AM, Jeff Law l...@redhat.com wrote: On 11/04/13 12:07, Wei Mi wrote: Hi, This is to fix testcase error reported in PR58985. The intention of the testcase was to ensure there was no REG_EQUIV notes generated for a reg which was used in a paradoxical subreg. When target was x86, there was subreg generated so I omitted to add the subreg in the regexp pattern. However there is no subreg generated for target cris-axis-elf, so REG_EQUIV should be allowed. Is it ok for trunk and gcc-4.8 branch? Thanks, Wei Mi. 2013-11-04 Wei Mi w...@google.com PR regression/58985 * testsuite/gcc.dg/pr57518.c: Add subreg in regexp pattern. Fine for the trunk. Release manager's call for the branch. jeff
Re: [C] Fix PR57258: unused variable warning is emitted for volatile variables
On Wed, Nov 6, 2013 at 8:40 AM, Mingjie Xing mingjie.x...@gmail.com wrote: Hi, Changes: * c/c-decl.c (pop_scope): Skip volatile variables while emit warnings for unused variables. Tested on i686-pc-linux-gnu. OK? You miss a testcase. Also why should the warning be omitted for unused automatic volatile variables? They cannot be used in any way. Richard. Mingjie
[ARM, AArch64] Make aarch-common.c files more robust.
Hi, This patch is a respin of the aarch-common improvements to use a generic search to find SETs and the variety of shifts, rather than relying on the hope that we will find something well formed. I've bootstrapped the patch on a chromebook with -mcpu=cortex-a7, -mcpu=cortex-a9 and -mcpu=cortex-a15 with no issues, and run the aarch64-none-elf regression suite with no issues. And I've thrown a bunch of smoke tests at this over all supported -mcpu values with no issues, and only a few minor scheduling differences where the old code was too liberal and did the wrong thing. Is this OK? Thanks, James --- gcc/ 2013-11-06 James Greenhalgh james.greenha...@arm.com * config/arm/aarch-common.c (search_term): New typedef. (shift_rtx_costs): New array. (arm_rtx_shift_left_p): New. (arm_find_sub_rtx_with_search_term): Likewise. (arm_find_sub_rtx_with_code): Likewise. (arm_early_load_addr_dep): Add sanity checking. (arm_no_early_alu_shift_dep): Likewise. (arm_no_early_alu_shift_value_dep): Likewise. (arm_no_early_mul_dep): Likewise. (arm_no_early_store_addr_dep): Likewise. diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c index 3111ae9..201e581 100644 --- a/gcc/config/arm/aarch-common.c +++ b/gcc/config/arm/aarch-common.c @@ -31,30 +31,139 @@ #include c-family/c-common.h #include rtl.h -/* Return nonzero if the CONSUMER instruction (a load) does need - PRODUCER's value to calculate the address. */ +typedef struct +{ + rtx_code search_code; + rtx search_result; + bool find_any_shift; +} search_term; + +/* Return TRUE if X is either an arithmetic shift left, or + is a multiplication by a power of two. */ +static bool +arm_rtx_shift_left_p (rtx x) +{ + enum rtx_code code = GET_CODE (x); -int -arm_early_load_addr_dep (rtx producer, rtx consumer) + if (code == MULT CONST_INT_P (XEXP (x, 1)) + exact_log2 (INTVAL (XEXP (x, 1))) 0) +return true; + + if (code == ASHIFT) +return true; + + return false; +} + +static rtx_code shift_rtx_codes[] = + { ASHIFT, ROTATE, ASHIFTRT, LSHIFTRT, +ROTATERT, ZERO_EXTEND, SIGN_EXTEND }; + +/* Callback function for arm_find_sub_rtx_with_code. + DATA is safe to treat as a SEARCH_TERM, ST. This will + hold a SEARCH_CODE. PATTERN is checked to see if it is an + RTX with that code. If it is, write SEARCH_RESULT in ST + and return 1. Otherwise, or if we have been passed a NULL_RTX + return 0. If ST.FIND_ANY_SHIFT then we are interested in + anything which can reasonably be described as a SHIFT RTX. */ +static int +arm_find_sub_rtx_with_search_term (rtx *pattern, void *data) { - rtx value = PATTERN (producer); - rtx addr = PATTERN (consumer); - - if (GET_CODE (value) == COND_EXEC) -value = COND_EXEC_CODE (value); - if (GET_CODE (value) == PARALLEL) -value = XVECEXP (value, 0, 0); - value = XEXP (value, 0); - if (GET_CODE (addr) == COND_EXEC) -addr = COND_EXEC_CODE (addr); - if (GET_CODE (addr) == PARALLEL) + search_term *st = (search_term *) data; + rtx_code pattern_code; + int found = 0; + + gcc_assert (pattern); + gcc_assert (st); + + /* Poorly formed patterns can really ruin our day. */ + if (*pattern == NULL_RTX) +return 0; + + pattern_code = GET_CODE (*pattern); + + if (st-find_any_shift) { - if (GET_CODE (XVECEXP (addr, 0, 0)) == RETURN) -addr = XVECEXP (addr, 0, 1); + unsigned i = 0; + + /* Left shifts might have been canonicalized to a MULT of some + power of two. Make sure we catch them. */ + if (arm_rtx_shift_left_p (*pattern)) + found = 1; else -addr = XVECEXP (addr, 0, 0); + for (i = 0; i ARRAY_SIZE (shift_rtx_codes); i++) + if (pattern_code == shift_rtx_codes[i]) + found = 1; +} + + if (pattern_code == st-search_code) +found = 1; + + if (found) +st-search_result = *pattern; + + return found; +} + +/* Traverse PATTERN looking for a sub-rtx with RTX_CODE CODE. */ +static rtx +arm_find_sub_rtx_with_code (rtx pattern, rtx_code code, bool find_any_shift) +{ + search_term st; + int result = 0; + + gcc_assert (pattern != NULL_RTX); + st.search_code = code; + st.search_result = NULL_RTX; + st.find_any_shift = find_any_shift; + result = for_each_rtx (pattern, arm_find_sub_rtx_with_search_term, st); + if (result) +return st.search_result; + else +return NULL_RTX; +} + +/* Traverse PATTERN looking for any sub-rtx which looks like a shift. */ +static rtx +arm_find_shift_sub_rtx (rtx pattern) +{ + return arm_find_sub_rtx_with_code (pattern, ASHIFT, true); +} + +/* PRODUCER and CONSUMER are two potentially dependant RTX. PRODUCER + (possibly) contains a SET which will provide a result we can access + using the SET_DEST macro. We will place the RTX which would be + written by PRODUCER in SET_SOURCE. + Similarly, CONSUMER (possibly) contains a SET which has an operand + we can access using
[v3 patch] Fix spelling in regex headers
This simply fixes s/boundry/boundary/, which I wanted to do first before some more regex refactoring I'm planning. 2013-11-06 Jonathan Wakely jwakely@gmail.com * include/bits/regex_automaton.h (_S_opcode_word_boundry): Rename to _S_opcode_word_boundary. * include/bits/regex_automaton.tcc: Likewise. * include/bits/regex_executor.h (__detail::_Executor::_M_word_boundry): Rename to _M_word_boundary. * include/bits/regex_executor.tcc: Likewise. Tested x86_64-linux, committed to trunk. commit 9c72a2889302b33b4e13643c11414e8278a9ba34 Author: Jonathan Wakely jwakely@gmail.com Date: Wed Nov 6 09:13:10 2013 + * include/bits/regex_automaton.h (_S_opcode_word_boundry): Rename to _S_opcode_word_boundary. * include/bits/regex_automaton.tcc: Likewise. * include/bits/regex_executor.h (__detail::_Executor::_M_word_boundry): Rename to _M_word_boundary. * include/bits/regex_executor.tcc: Likewise. diff --git a/libstdc++-v3/include/bits/regex_automaton.h b/libstdc++-v3/include/bits/regex_automaton.h index 4fb5556..e630512 100644 --- a/libstdc++-v3/include/bits/regex_automaton.h +++ b/libstdc++-v3/include/bits/regex_automaton.h @@ -56,7 +56,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _S_opcode_backref, _S_opcode_line_begin_assertion, _S_opcode_line_end_assertion, - _S_opcode_word_boundry, + _S_opcode_word_boundary, _S_opcode_subexpr_lookahead, _S_opcode_subexpr_begin, _S_opcode_subexpr_end, @@ -83,7 +83,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _StateIdT _M_quant_index; // for _S_opcode_alternative or _S_opcode_subexpr_lookahead _StateIdT _M_alt; - // for _S_opcode_word_boundry or _S_opcode_subexpr_lookahead or + // for _S_opcode_word_boundary or _S_opcode_subexpr_lookahead or // quantifiers(ungreedy if set true) bool _M_neg; }; @@ -197,7 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _StateIdT _M_insert_word_bound(bool __neg) { - _StateT __tmp(_S_opcode_word_boundry); + _StateT __tmp(_S_opcode_word_boundary); __tmp._M_neg = __neg; return _M_insert_state(__tmp); } diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc b/libstdc++-v3/include/bits/regex_automaton.tcc index c15e3e9..258d22d 100644 --- a/libstdc++-v3/include/bits/regex_automaton.tcc +++ b/libstdc++-v3/include/bits/regex_automaton.tcc @@ -93,7 +93,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __ostr __id [label=\ __id \\nLINE_END \];\n __id - _M_next [label=\epsilon\];\n; break; - case _S_opcode_word_boundry: + case _S_opcode_word_boundary: __ostr __id [label=\ __id \\nWORD_BOUNDRY _M_neg \];\n __id - _M_next [label=\epsilon\];\n; diff --git a/libstdc++-v3/include/bits/regex_executor.h b/libstdc++-v3/include/bits/regex_executor.h index 12db47b..57b3108 100644 --- a/libstdc++-v3/include/bits/regex_executor.h +++ b/libstdc++-v3/include/bits/regex_executor.h @@ -138,7 +138,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } bool - _M_word_boundry(_State_CharT, _TraitsT __state) const; + _M_word_boundary(_State_CharT, _TraitsT __state) const; bool _M_lookahead(_State_CharT, _TraitsT __state); diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc index 0c42189..b310aba 100644 --- a/libstdc++-v3/include/bits/regex_executor.tcc +++ b/libstdc++-v3/include/bits/regex_executor.tcc @@ -257,8 +257,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION if (_M_at_end()) _M_dfs__match_mode(__state._M_next); break; - case _S_opcode_word_boundry: - if (_M_word_boundry(__state) == !__state._M_neg) + case _S_opcode_word_boundary: + if (_M_word_boundary(__state) == !__state._M_neg) _M_dfs__match_mode(__state._M_next); break; // Here __state._M_alt offers a single start node for a sub-NFA. @@ -344,11 +344,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } - // Return whether now is at some word boundry. + // Return whether now is at some word boundary. templatetypename _BiIter, typename _Alloc, typename _TraitsT, bool __dfs_mode bool _Executor_BiIter, _Alloc, _TraitsT, __dfs_mode:: -_M_word_boundry(_State_CharT, _TraitsT __state) const +_M_word_boundary(_State_CharT, _TraitsT __state) const { // By definition. bool __ans = false;
[PATCH GCC]Improve IVOPT to handle outside and inside loop iv uses differently in GCC
Hi, GCC IVOPT has a problem that it doesn't differentiate between iv uses outside of loop from inside ones. It computes cost for outside iv use just like inside ones, which is wrong because outside iv use should be computed along loop exit edge and the cost should be amortized against loop iteration number. Lastly, the computation of outside iv use is inserted in loop, rather along loop exit edge. This is interesting since usually outside iv use should be handled differently, or it hurts optimization in several ways like: 1) Wrong iv candidate is chosen because of inaccurate cost. 2) Extra computation in loop itself is redundant. 3) Extra code computing outside iv use in loop may increases register pressure because both iv variables before and after stepping could be alive at same time. 4) IVOPT generates code that it expects to stay as is, passes like DOM tends to break this because of the extra computation. This hurts targets with auto-increment support more. This patch fixes the problem. Bootstrap and test on x86/x86_64/arm. Richard, Zdenek, does this look reasonable? Thanks, bin gcc/testsuite/ChangeLog 2013-11-06 Bin Cheng bin.ch...@arm.com * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test. 2013-11-06 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (iv_use_p, iv_cand_p): Move around. (iv_use_location): New. (struct iv): Remove have_use_for and use_id. New fields inside_use, outside_uses_vec and use_loc. (struct iv_use): New fields exit_edge and outside_use_p. (struct edge_info, edge_info_p): New. (struct ivopts_data): New fields alloc_uses_vecs, changed_bbs, edge_map and edge_obstack. (init_edge_info, get_edge_info): New. (dump_use): Dump outside/inside information for iv use. (tree_ssa_iv_optimize_init): Init new fields. (alloc_iv): Init new fields. Remove have_use_for and use_id. (record_use): New parameter. Record information for outside loop iv use. (find_interesting_uses_op): New parameter. Handle inside and outside loop iv uses. (find_interesting_uses_cond, idx_record_use): Pass new argument. (find_interesting_uses_address): Likewise. (find_interesting_uses_stmt, create_new_iv): likewise. (find_interesting_uses_outside): Rename exit to exit_edge. New parameter normal_edge_p. Pass new argument. (find_interesting_uses): Find iv uses in two passes. (get_computation): Compute cost at right position for iv use. (determine_use_iv_cost_generic): Ajust cost for outside loop iv use. (rewrite_use_outside_of_loop): New. (rewrite_use): Call rewrite_use_outside_of_loop. (remove_unused_ivs): Keep computation only for inner iv use. (free_loop_data): Reset outside_uses_vec in various iv structures. Free alloc_uses_vecs and edge_map. (tree_ssa_iv_optimize_finalize): Free and reset. (tree_ssa_iv_optimize_loop): Create edge_map. (tree_ssa_iv_optimize): Call rewrite_into_loop_closed_ssa if necessary.Index: gcc/testsuite/gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c === --- gcc/testsuite/gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c (revision 0) @@ -0,0 +1,25 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fdump-tree-ivopts } */ + +struct header { +unsigned long size; +struct header **ptr; +}; + +extern unsigned long top; + +int foo (struct header *start, int size) +{ + struct header *hd = start; + top = 0; + while (hd != 0 top+1 size) +{ + ++top; + hd = (hd-ptr)[4]; +} + + return 0; +} + +/* { dg-final { scan-tree-dump iv2tmp.* = .* + \[0-9\]* ivopts } } */ +/* { dg-final { cleanup-tree-dump ivopts } } */ Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c (revision 204117) +++ gcc/tree-ssa-loop-ivopts.c (working copy) @@ -56,6 +56,24 @@ along with GCC; see the file COPYING3. If not see 4) The trees are transformed to use the new variables, the dead code is removed. + Induction variable is possibly used outside of loop. The algorithm + handles outside iv use like: + -- Marks iv use outside of loop by recording the corresponding exit +edge along which the iv is used. If an induction variable is +used both inside and outside of loop, outside use will be ignored, +as if it is escalated into inside one. + -- Computes the use cost against location of the end of source block +of the exit edge, and amortize the cost against loop iteration +number. + -- Inserts computation of outside iv use along the exit edge, rather +than in basic block of current loop. + TODO: For degenerable phi node
Re: [v3 patch] Implement C++14 N3655 TransformationTraits Redux
Hi On 11/06/2013 12:01 AM, Jonathan Wakely wrote: I also changed a few tests to use static_assert instead of VERIFY so they only need to be compiled, not executed, saving a few milliseconds when running the testsuite ;-) Good idea, anyway. I may get around to do more of that + those missing testcases ;) Thanks! Paolo.
RE: [PATCH GCC]Simplify address expression in IVOPT
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Tuesday, November 05, 2013 8:18 PM To: Bin Cheng Cc: GCC Patches Subject: Re: [PATCH GCC]Simplify address expression in IVOPT On Tue, Nov 5, 2013 at 11:13 AM, bin.cheng bin.ch...@arm.com wrote: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of bin.cheng Sent: Monday, November 04, 2013 4:35 PM To: 'Richard Biener' Cc: GCC Patches Subject: RE: [PATCH GCC]Simplify address expression in IVOPT -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Wednesday, October 30, 2013 10:46 PM To: Bin Cheng Cc: GCC Patches Subject: Re: [PATCH GCC]Simplify address expression in IVOPT On Tue, Oct 29, 2013 at 10:18 AM, bin.cheng bin.ch...@arm.com wrote: Hmm. I think you want what get_inner_reference_aff does, using the return value of get_inner_reference as starting point for determine_base_object. And you don't want to restrict yourselves so much on what exprs to process, but only exclude DECL_Ps. Did you mean I should pass all address expressions to get_inner_reference_aff except the one with declaration operand? I am a little confused here why DECL_Ps should be handled specially? Seems it can be handled properly by the get_* function. Anything important I missed? Just amend get_inner_reference_aff to return the tree base object. I found it's inappropriate to do this because: functions like get_inner_reference* only accept reference expressions, while base_object has to be computed for other kinds of expressions too. Take gimple statement a_1 = *ptr_1 as an example, the base passed to alloc_iv is ptr_1; the base_object computed by determine_base_object is also ptr_1, which we can't rely on get_inner_reference* . Also It seems the comment before determine_base_object might be inaccurate: Returns a memory object to that EXPR points. which I think is Returns a pointer pointing to the main memory object to that EXPR points. Right? Note that this isn't really simplifying but rather lowering all addresses. The other changes in this patch are unrelated, right? Right, I should have named this message like refine cost computation of expressions in IVOPT just as the patch file. Hi, I updated the patch according to review comments. Now it lowers all address expressions except ones with DECL_P to get_inner_reference_aff, it also computes base_object which is used later to ease determine_base_object. Bootstrap and test on x86/x86_64/arm on going, is it OK if tests pass? static struct iv * alloc_iv (tree base, tree step) { + tree expr = base; + tree base_object = base; struct iv *iv = XCNEW (struct iv); gcc_assert (step != NULL_TREE); + /* Lower all address expressions except ones with DECL_P as oeprand. + By doing this: + 1) More accurate cost can be computed for address expressions; + 2) Duplicate candidates won't be created for bases in different + forms, like a[0] and a. */ STRIP_NOPS (expr); if + (TREE_CODE (expr) == ADDR_EXPR !DECL_P (TREE_OPERAND (expr, 0))) +{ + aff_tree comb; + double_int size; + base_object = get_inner_reference_aff (TREE_OPERAND (expr, 0), + comb, size); + gcc_assert (base_object != NULL_TREE); + base = fold_convert (TREE_TYPE (base), aff_combination_to_tree (comb)); +} + iv-base = base; - iv-base_object = determine_base_object (base); + iv-base_object = determine_base_object (base_object); for less confusion do not introduce 'expr' but just base_object (s/expr/base_object/). Also can you split out this change (and the related tree-affine.c one) from the rest? Ok with that change assuming it passes bootstrap testing. It passes test. I split it into two patches as attached. Are these two OK? Thanks, bin 3-ivopt-expr_cost-a-20131106: gcc/testsuite/ChangeLog 2013-11-06 Bin Cheng bin.ch...@arm.com * gcc.dg/tree-ssa/loop-2.c: Refine check condition. * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto. * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto. 2013-11-06 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (alloc_iv): Lower address expressions. * tree-affine.c (get_inner_reference_aff): Return base. * tree-affine.h (get_inner_reference_aff): Change prototype. 3-ivopt-expr_cost-b-20131106: 2013-11-06 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (get_shiftadd_cost): Check equality using operand_equal_p. (force_expr_to_var_cost): Refactor the code. Handle type conversion. (split_address_cost): Call force_expr_to_var_cost.Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c === --- gcc/testsuite
Re: [PATCH, MPX, 2/X] Pointers Checker [8/25] Languages support
On Tue, Nov 5, 2013 at 3:08 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 2:37 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 2:20 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 2:02 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/4 Richard Biener richard.guent...@gmail.com: On Thu, Oct 31, 2013 at 10:11 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds support Pointer Bounds Checker into c-family and LTO front-ends. The main purpose of changes in front-end is to register all statically initialized objects for checker. We also need to register such objects created by compiler. LTO is quite specific front-end. For LTO Pointer Bounds Checker support macro just means it allows instrumented code as input because all instrumentation is performed before code is streamed out for LTO. But your patch doesn't even make use of the langhook... Use of langhook is in previous patch (http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02408.html). Here is it's part: diff --git a/gcc/toplev.c b/gcc/toplev.c index db269b7..0eaf081 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -1282,6 +1282,15 @@ process_options (void) and -ftree-loop-linear)); #endif + if (flag_check_pointer_bounds) +{ + if (targetm.chkp_bound_mode () == VOIDmode) + error (-fcheck-pointers is not supported for this target); + + if (!lang_hooks.chkp_supported) + flag_check_pointer_bounds = 0; +} + /* One region RA really helps to decrease the code size. */ if (flag_ira_region == IRA_REGION_AUTODETECT) flag_ira_region If we try to use -fcheck-pointers -flto for some unsupported language, code will not be instrumented. What's the reason to have unsupported languages? For some languages (e.g. Java) Pointer Bounds Checker does not make sense at all. Others may require additional support in front-end. The primary target is C family where solved problem is more critical. What does break if you enable it for Java or other unsupported languages? That is, if LTO is able to handle a mixed Java and C binary then why can Java alone not handle it? In such case checker will produce useless overhead in Java code. Resulting code will probably report some bound violations because Java FE may generate code which seems wrong for Pointer Bounds Checker. So it's only an issue that if you use it that it may trip over Java FE bugs? Not a good reason to have a langhook - you can use the existing post_options langhook for disallowing this? Thanks, Richard. Ilya Richard. Ilya Richard. Ilya Richard. Ilya You define CHKP as supported in LTO. That means it has to be supported for all languages and thus you should drop the langhook. Richard. Thanks, Ilya -- gcc/ 2013-10-29 Ilya Enkovich ilya.enkov...@intel.com * c/c-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * cp/cp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * objc/objc-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * objcp/objcp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * lto/lto-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * c/c-parser.c (c_parser_declaration_or_fndef): Register statically initialized decls in Pointer Bounds Checker. * cp/decl.c (cp_finish_decl): Likewise. * gimplify.c (gimplify_init_constructor): Likewise. diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c index 614c46d..a32bc6b 100644 --- a/gcc/c/c-lang.c +++ b/gcc/c/c-lang.c @@ -43,6 +43,8 @@ enum c_language_kind c_language = clk_c; #define LANG_HOOKS_INIT c_objc_common_init #undef LANG_HOOKS_INIT_TS #define LANG_HOOKS_INIT_TS c_common_init_ts +#undef LANG_HOOKS_CHKP_SUPPORTED +#define LANG_HOOKS_CHKP_SUPPORTED true /* Each front end provides its own lang hook initializer. */ struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 9ccae3b..65d83c8 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1682,6 +1682,12 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok, maybe_warn_string_init (TREE_TYPE (d), init); finish_decl (d, init_loc, init.value, init.original_type, asm_name); + + /* Register all decls with initializers in Pointer +Bounds Checker to generate required static bounds +initializers. */ + if (DECL_INITIAL (d) != error_mark_node) + chkp_register_var_initializer (d); } } else diff --git a/gcc/cp/cp-lang.c b/gcc/cp/cp-lang.c index a7fa8e4..6d138bd 100644 ---
Re: [C++ Patch, Java related/RFC] PR 11006
On 11/04/2013 05:21 PM, Jason Merrill wrote: Surely it should be valid to allocate a Java boolean type. Andrew, how should that work? It's not allowed. All objects that are allocated by new must be of class type (i.e. instances of a subclass of java.lang.Object), but boolean is a primitive type. Andrew.
[C++ Patch] PR 50436
Hi, in this bug, filed by Zack, we loop forever after error in constant_value_1. Straightforward thing to do, detect and break out. Tested x86_64-linux. Thanks, Paolo. /cp 2013-11-06 Paolo Carlini paolo.carl...@oracle.com PR c++/50436 * init.c (constant_value_1): Break out of infinite loops during error recovery. /testsuite 2013-11-06 Paolo Carlini paolo.carl...@oracle.com PR c++/50436 * g++.dg/template/pr50436.C: New. Index: cp/init.c === --- cp/init.c (revision 204447) +++ cp/init.c (working copy) @@ -1968,6 +1968,7 @@ constant_value_1 (tree decl, bool integral_p, bool CP_TYPE_CONST_NON_VOLATILE_P (TREE_TYPE (decl) { tree init; + tree save_decl = decl; /* If DECL is a static data member in a template specialization, we must instantiate it here. The initializer for the static data member is not processed @@ -2007,6 +2008,10 @@ constant_value_1 (tree decl, bool integral_p, bool || TREE_CODE (init) == STRING_CST))) break; decl = unshare_expr (init); + if (decl == save_decl) + /* Otherwise, in some cases of excessive recursive template + instantation (c++/50436) we hang after error. */ + return error_mark_node; } return decl; } Index: testsuite/g++.dg/template/pr50436.C === --- testsuite/g++.dg/template/pr50436.C (revision 0) +++ testsuite/g++.dg/template/pr50436.C (working copy) @@ -0,0 +1,17 @@ +// PR c++/50436 + +template bool struct VI {}; +template typename T +struct IP +{ + static const bool r = IPT::r; // { dg-error template|expression } +}; +template typename T struct V +{ + VIIPT::r vi; +}; +struct X; +struct Y +{ + VX v; +};
Re: [PATCH, MPX, 2/X] Pointers Checker [8/25] Languages support
2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:08 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 2:37 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 2:20 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 2:02 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/4 Richard Biener richard.guent...@gmail.com: On Thu, Oct 31, 2013 at 10:11 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds support Pointer Bounds Checker into c-family and LTO front-ends. The main purpose of changes in front-end is to register all statically initialized objects for checker. We also need to register such objects created by compiler. LTO is quite specific front-end. For LTO Pointer Bounds Checker support macro just means it allows instrumented code as input because all instrumentation is performed before code is streamed out for LTO. But your patch doesn't even make use of the langhook... Use of langhook is in previous patch (http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02408.html). Here is it's part: diff --git a/gcc/toplev.c b/gcc/toplev.c index db269b7..0eaf081 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -1282,6 +1282,15 @@ process_options (void) and -ftree-loop-linear)); #endif + if (flag_check_pointer_bounds) +{ + if (targetm.chkp_bound_mode () == VOIDmode) + error (-fcheck-pointers is not supported for this target); + + if (!lang_hooks.chkp_supported) + flag_check_pointer_bounds = 0; +} + /* One region RA really helps to decrease the code size. */ if (flag_ira_region == IRA_REGION_AUTODETECT) flag_ira_region If we try to use -fcheck-pointers -flto for some unsupported language, code will not be instrumented. What's the reason to have unsupported languages? For some languages (e.g. Java) Pointer Bounds Checker does not make sense at all. Others may require additional support in front-end. The primary target is C family where solved problem is more critical. What does break if you enable it for Java or other unsupported languages? That is, if LTO is able to handle a mixed Java and C binary then why can Java alone not handle it? In such case checker will produce useless overhead in Java code. Resulting code will probably report some bound violations because Java FE may generate code which seems wrong for Pointer Bounds Checker. So it's only an issue that if you use it that it may trip over Java FE bugs? Not a good reason to have a langhook - you can use the existing post_options langhook for disallowing this? The issue is that users do not get what expect. I do not want someone having mixed codes get instrumentation for his Java/Fortran/Ada functions which slows them down and does nothing useful. What is the point to allow checks of pointer bounds for language with no pointers? If I use post_options, I need to change hooks of languages that do not care about checker to disable it. Now I have it disabled by default. Using post_options will also require empty redefinition of post_options in C languages with empty body which may be confusing. If you do not like this hook, I can move all Pointer Bounds Checker flags into c.opt and remove the hook. Should it be OK? Thanks, Ilya Thanks, Richard. Ilya Richard. Ilya Richard. Ilya Richard. Ilya You define CHKP as supported in LTO. That means it has to be supported for all languages and thus you should drop the langhook. Richard. Thanks, Ilya -- gcc/ 2013-10-29 Ilya Enkovich ilya.enkov...@intel.com * c/c-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * cp/cp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * objc/objc-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * objcp/objcp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * lto/lto-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * c/c-parser.c (c_parser_declaration_or_fndef): Register statically initialized decls in Pointer Bounds Checker. * cp/decl.c (cp_finish_decl): Likewise. * gimplify.c (gimplify_init_constructor): Likewise. diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c index 614c46d..a32bc6b 100644 --- a/gcc/c/c-lang.c +++ b/gcc/c/c-lang.c @@ -43,6 +43,8 @@ enum c_language_kind c_language = clk_c; #define LANG_HOOKS_INIT c_objc_common_init #undef LANG_HOOKS_INIT_TS #define LANG_HOOKS_INIT_TS c_common_init_ts +#undef LANG_HOOKS_CHKP_SUPPORTED +#define LANG_HOOKS_CHKP_SUPPORTED true /* Each front end provides its own lang hook initializer. */ struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 9ccae3b..65d83c8 100644 --- a/gcc/c/c-parser.c +++
Re: [PATCH, SH] Implement builtin_strlen
thanks, applied together with the cleanup referenced earlier and a slight variable renaming (start_addr-curr_addr, end_addr-start_addr) for readability as obvious, Minor nit: no gcc/ prefix in gcc/ChangeLog. -- Eric Botcazou
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:24 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 1:52 PM, Ilya Enkovich enkovich@gmail.com wrote: For input parameter P I need to have BOUNDS = __builtin_arg_bnd (P) to somehow refer to bounds of P in GIMPLE. Optimizations may modify __builtin_arg_bnd (P) replacing P with its copy or some value. It makes call useless because removes information about parameter whose bounds we refer to. I want such optimization to ignore __builtin_arg_bnd calls and always leave default SSA_NAME of PARM_DECL there as arg. How does a compilable testcase look like that shows how the default def is used? And what transforms break that use? I really cannot see how this would happen (and you didn't add a testcase). Here is a test source: extern int bar1 (int *p); extern int bar2 (int); int foo (int *p) { if (!p) return bar1 (p); return bar2 (10); } After instrumentation GIMPLE looks like: foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 6: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: _6 = bar1 (p_3(D), __bound_tmp.0_9); goto bb 5; bb 4: _8 = bar2 (10); bb 5: # _1 = PHI _6(3), _8(4) return _1; } Here is optimized GIMPLE (if I do not apply my changes in tree-ssa-dom.c): foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (0B); [return slot optimization] _6 = bar1 (0B, __bound_tmp.0_9); [tail call] goto bb 5; bb 4: _8 = bar2 (10); [tail call] bb 5: # _1 = PHI _6(3), _8(4) return _1; } Now during expand or inline of foo I cannot determine the value for __bound_tmp.0_9. Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Also sth weird is going on as your arg_bnd call gets [return slot optimization]. It seems that your POINTER_BOUNDS_TYPE is seen as aggregate somehow. Target is responsible for having return slot optimization here. And target expands all such calls. So, do not see the problem here. Ilya Richard. Ilya Richard.
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/06/2013 10:31 AM, Richard Biener wrote: We decided to move to C++. As part of a later discussion we decided to go with a single general dynamic-casting style, mimicing the real C++ variant which is dynamic_cast ... . Which resulted in is-a.h. So yes, we've decided to go C++ so we have to live with certain uglinesses of that decisions (and maybe over time those uglinesses will fade away and we get used to it and like it). Thus, there isn't another option besides using the is-a.h machinery and enabling and using RTTI. Sticking to C for gimple doesn't seem to be consistent with the decision to move to C++. Oh, I'm not saying I'm a big fan of as_a / is_a or C++ in general as it plays out right now. But well, we've had the discussion and had a decision. Maybe we need to revisit it? As one of those who were not in favour of the C++ move, can I ask you guys to step back for a moment and think about - what do all of these changes buy us, exactly? Imagine the state at the end, where everything is converted and supposedly the temporary ugliness is gone, what have we gained over the code as it is now? I still think all this effort is misdirected and distracts us from making changes that improve gcc for its users. Bernd
[PATCH][1/2] Get TREE_OVERFLOW somewhat under control
The following patch makes initial steps of reducing the amount of TREE_OVERFLOW constants in the GIMPLE IL (there should be zero such constants). Tree folding in the frontends (and also later during optimization) introduces those so the following patches the gimplifier to drop the flag from all constants in GENERIC. The idea is that we can eventually verify no TREE_OVERFLOW is set form the verifier (no, I won't arrive there with this series). TREE_OVERFLOW in the GIMPLE IL seriously confuses VRP which uses the flag for its own purposes (ick) and relies on fold-const setting TREE_OVERFLOW on overflows (double ick). So the patch also replaces as many TREE_OVERFLOW checks in tree-vrp.c with the appropriate is_overflow_infinity check. This also introduces a helper to drop TREE_OVERFLOW. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2013-11-06 Richard Biener rguent...@suse.de * tree.c (drop_tree_overflow): New function. * tree.h (drop_tree_overflow): Declare. * gimplify.c (gimplify_expr): Drop TREE_OVERFLOW. * tree-vrp.c (range_int_cst_singleton_p): Use is_overflow_infinity instead of testing TREE_OVERFLOW. (extract_range_from_assert): Likewise. (zero_nonzero_bits_from_vr): Likewise. (extract_range_basic): Likewise. (register_new_assert_for): Use drop_tree_overflow. (vrp_visit_phi_node): Likewise. Index: gcc/tree.c === *** gcc/tree.c (revision 204449) --- gcc/tree.c (working copy) *** get_tree_code_name (enum tree_code code) *** 12542,12545 --- 12542,12564 return tree_code_name[code]; } + /* Drops the TREE_OVERFLOW flag from T. */ + + tree + drop_tree_overflow (tree t) + { + gcc_checking_assert (TREE_OVERFLOW (t)); + + /* For tree codes with a sharing machinery re-build the result. */ + if (TREE_CODE (t) == INTEGER_CST) + return build_int_cst_wide (TREE_TYPE (t), + TREE_INT_CST_LOW (t), TREE_INT_CST_HIGH (t)); + + /* Otherwise, as all tcc_constants are possibly shared, copy the node + and drop the flag. */ + t = copy_node (t); + TREE_OVERFLOW (t) = 0; + return t; + } + #include gt-tree.h Index: gcc/tree.h === *** gcc/tree.h (revision 204449) --- gcc/tree.h (working copy) *** extern tree get_base_address (tree t); *** 4800,4805 --- 4800,4806 extern void mark_addressable (tree); /* In tree.c. */ + extern tree drop_tree_overflow (tree); extern int tree_map_base_eq (const void *, const void *); extern unsigned int tree_map_base_hash (const void *); extern int tree_map_base_marked_p (const void *); Index: gcc/gimplify.c === *** gcc/gimplify.c (revision 204449) --- gcc/gimplify.c (working copy) *** gimplify_expr (tree *expr_p, gimple_seq *** 7897,7902 --- 7897,7906 case STRING_CST: case COMPLEX_CST: case VECTOR_CST: + /* Drop the overflow flag on constants, we do not want +that in the GIMPLE IL. */ + if (TREE_OVERFLOW_P (*expr_p)) + *expr_p = drop_tree_overflow (*expr_p); ret = GS_ALL_DONE; break; Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c (revision 204449) --- gcc/tree-vrp.c (working copy) *** static inline bool *** 892,899 range_int_cst_singleton_p (value_range_t *vr) { return (range_int_cst_p (vr) ! !TREE_OVERFLOW (vr-min) ! !TREE_OVERFLOW (vr-max) tree_int_cst_equal (vr-min, vr-max)); } --- 892,899 range_int_cst_singleton_p (value_range_t *vr) { return (range_int_cst_p (vr) ! !is_overflow_infinity (vr-min) ! !is_overflow_infinity (vr-max) tree_int_cst_equal (vr-min, vr-max)); } *** extract_range_from_assert (value_range_t *** 1741,1747 all should be optimized away above us. */ if ((cond_code == LT_EXPR compare_values (max, min) == 0) ! || (CONSTANT_CLASS_P (max) TREE_OVERFLOW (max))) set_value_range_to_varying (vr_p); else { --- 1741,1747 all should be optimized away above us. */ if ((cond_code == LT_EXPR compare_values (max, min) == 0) ! || is_overflow_infinity (max)) set_value_range_to_varying (vr_p); else { *** extract_range_from_assert (value_range_t *** 1781,1787 all should be optimized away above us. */ if ((cond_code == GT_EXPR compare_values (min, max) == 0) ! || (CONSTANT_CLASS_P (min) TREE_OVERFLOW (min))) set_value_range_to_varying (vr_p); else { ---
Re: [PATCH GCC]Simplify address expression in IVOPT
ivopts_data *data, *var_present = true; fd_ivopts_data = data; walk_tree (addr, find_depends, depends_on, NULL); - return new_cost (target_spill_cost[data-speed], 0); + return force_expr_to_var_cost (addr, data-speed); } this seems to be an unrelated change - there are other spill-cost users in ivopts as well, not sure why one or the other variant is used. For IVOPTs which is quite fragile with code generation quality it helps to split up patches as much as possible to allow easy bisection when problems arise on targets. So, you can commit the get_shiftadd_cost change, re-post a fixed conversion-handling patch and split the split_address_cost piece, motivating it? Thanks, Richard. Thanks, bin 3-ivopt-expr_cost-a-20131106: gcc/testsuite/ChangeLog 2013-11-06 Bin Cheng bin.ch...@arm.com * gcc.dg/tree-ssa/loop-2.c: Refine check condition. * gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto. * gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto. 2013-11-06 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (alloc_iv): Lower address expressions. * tree-affine.c (get_inner_reference_aff): Return base. * tree-affine.h (get_inner_reference_aff): Change prototype. 3-ivopt-expr_cost-b-20131106: 2013-11-06 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (get_shiftadd_cost): Check equality using operand_equal_p. (force_expr_to_var_cost): Refactor the code. Handle type conversion. (split_address_cost): Call force_expr_to_var_cost.
Re: [PATCH, MPX, 2/X] Pointers Checker [8/25] Languages support
On Wed, Nov 6, 2013 at 11:55 AM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:08 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 2:37 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 2:20 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 2:02 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/4 Richard Biener richard.guent...@gmail.com: On Thu, Oct 31, 2013 at 10:11 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch adds support Pointer Bounds Checker into c-family and LTO front-ends. The main purpose of changes in front-end is to register all statically initialized objects for checker. We also need to register such objects created by compiler. LTO is quite specific front-end. For LTO Pointer Bounds Checker support macro just means it allows instrumented code as input because all instrumentation is performed before code is streamed out for LTO. But your patch doesn't even make use of the langhook... Use of langhook is in previous patch (http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02408.html). Here is it's part: diff --git a/gcc/toplev.c b/gcc/toplev.c index db269b7..0eaf081 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -1282,6 +1282,15 @@ process_options (void) and -ftree-loop-linear)); #endif + if (flag_check_pointer_bounds) +{ + if (targetm.chkp_bound_mode () == VOIDmode) + error (-fcheck-pointers is not supported for this target); + + if (!lang_hooks.chkp_supported) + flag_check_pointer_bounds = 0; +} + /* One region RA really helps to decrease the code size. */ if (flag_ira_region == IRA_REGION_AUTODETECT) flag_ira_region If we try to use -fcheck-pointers -flto for some unsupported language, code will not be instrumented. What's the reason to have unsupported languages? For some languages (e.g. Java) Pointer Bounds Checker does not make sense at all. Others may require additional support in front-end. The primary target is C family where solved problem is more critical. What does break if you enable it for Java or other unsupported languages? That is, if LTO is able to handle a mixed Java and C binary then why can Java alone not handle it? In such case checker will produce useless overhead in Java code. Resulting code will probably report some bound violations because Java FE may generate code which seems wrong for Pointer Bounds Checker. So it's only an issue that if you use it that it may trip over Java FE bugs? Not a good reason to have a langhook - you can use the existing post_options langhook for disallowing this? The issue is that users do not get what expect. I do not want someone having mixed codes get instrumentation for his Java/Fortran/Ada functions which slows them down and does nothing useful. What is the point to allow checks of pointer bounds for language with no pointers? If I use post_options, I need to change hooks of languages that do not care about checker to disable it. Now I have it disabled by default. No, you'd change the hooks to complain about the flag. Using post_options will also require empty redefinition of post_options in C languages with empty body which may be confusing. If you do not like this hook, I can move all Pointer Bounds Checker flags into c.opt and remove the hook. Should it be OK? That works for me if for LTO it is enough to see the instrumented IL to do the right thing, that is, no flag tests appear in middle-end code [You can also simply add LTO to the list of supported languages of course, still no flag checks hint at a proper design]. Richard. Thanks, Ilya Thanks, Richard. Ilya Richard. Ilya Richard. Ilya Richard. Ilya You define CHKP as supported in LTO. That means it has to be supported for all languages and thus you should drop the langhook. Richard. Thanks, Ilya -- gcc/ 2013-10-29 Ilya Enkovich ilya.enkov...@intel.com * c/c-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * cp/cp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * objc/objc-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * objcp/objcp-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * lto/lto-lang.c (LANG_HOOKS_CHKP_SUPPORTED): New. * c/c-parser.c (c_parser_declaration_or_fndef): Register statically initialized decls in Pointer Bounds Checker. * cp/decl.c (cp_finish_decl): Likewise. * gimplify.c (gimplify_init_constructor): Likewise. diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c index 614c46d..a32bc6b 100644 --- a/gcc/c/c-lang.c +++ b/gcc/c/c-lang.c @@ -43,6 +43,8 @@ enum c_language_kind c_language = clk_c; #define
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
On Wed, Nov 6, 2013 at 12:00 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:24 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 1:52 PM, Ilya Enkovich enkovich@gmail.com wrote: For input parameter P I need to have BOUNDS = __builtin_arg_bnd (P) to somehow refer to bounds of P in GIMPLE. Optimizations may modify __builtin_arg_bnd (P) replacing P with its copy or some value. It makes call useless because removes information about parameter whose bounds we refer to. I want such optimization to ignore __builtin_arg_bnd calls and always leave default SSA_NAME of PARM_DECL there as arg. How does a compilable testcase look like that shows how the default def is used? And what transforms break that use? I really cannot see how this would happen (and you didn't add a testcase). Here is a test source: extern int bar1 (int *p); extern int bar2 (int); int foo (int *p) { if (!p) return bar1 (p); return bar2 (10); } After instrumentation GIMPLE looks like: foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 6: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: _6 = bar1 (p_3(D), __bound_tmp.0_9); goto bb 5; bb 4: _8 = bar2 (10); bb 5: # _1 = PHI _6(3), _8(4) return _1; } Here is optimized GIMPLE (if I do not apply my changes in tree-ssa-dom.c): foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (0B); [return slot optimization] _6 = bar1 (0B, __bound_tmp.0_9); [tail call] goto bb 5; bb 4: _8 = bar2 (10); [tail call] bb 5: # _1 = PHI _6(3), _8(4) return _1; } Now during expand or inline of foo I cannot determine the value for __bound_tmp.0_9. Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Some vars address can be used for better bounds deriving though. Well, it clearly shows it's a get me good results issue and not a correctness issue. And for good results you trade in missed general optimizations. That doesn't sound very good to me. Richard. Also sth weird is going on as your arg_bnd call gets [return slot optimization]. It seems that your POINTER_BOUNDS_TYPE is seen as aggregate somehow. Target is responsible for having return slot optimization here. And target expands all such calls. So, do not see the problem here. Ilya Richard. Ilya Richard.
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On Wed, Nov 6, 2013 at 12:02 PM, Bernd Schmidt ber...@codesourcery.com wrote: On 11/06/2013 10:31 AM, Richard Biener wrote: We decided to move to C++. As part of a later discussion we decided to go with a single general dynamic-casting style, mimicing the real C++ variant which is dynamic_cast ... . Which resulted in is-a.h. So yes, we've decided to go C++ so we have to live with certain uglinesses of that decisions (and maybe over time those uglinesses will fade away and we get used to it and like it). Thus, there isn't another option besides using the is-a.h machinery and enabling and using RTTI. Sticking to C for gimple doesn't seem to be consistent with the decision to move to C++. Oh, I'm not saying I'm a big fan of as_a / is_a or C++ in general as it plays out right now. But well, we've had the discussion and had a decision. Maybe we need to revisit it? As one of those who were not in favour of the C++ move, can I ask you guys to step back for a moment and think about - what do all of these changes buy us, exactly? Imagine the state at the end, where everything is converted and supposedly the temporary ugliness is gone, what have we gained over the code as it is now? as_a gains us less runtime checking and more static type checking which is good. I still think all this effort is misdirected and distracts us from making changes that improve gcc for its users. That I agree to. Instead of fixing the less than optimal separation / boundary between frontends and the middle-end, or fixing several other long-standing issues with GCC we spend a lot of time refactoring things to be C++. But that was kind of part of the decision (though I remember that we mainly wanted to convert containters and isolated stuff, not gimple or trees (I bet that'll be next)). Of course I don't see contributors of changes that improve gcc for its users now wasting their time with converting code to C++. That conversion may slow down those people, but only so much. It'll get more interesting with branch maintainance ... Richard. Bernd
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On Wed, Nov 06, 2013 at 12:31:00PM +0100, Richard Biener wrote: Maybe we need to revisit it? As one of those who were not in favour of the C++ move, can I ask you guys to step back for a moment and think about - what do all of these changes buy us, exactly? Imagine the state at the end, where everything is converted and supposedly the temporary ugliness is gone, what have we gained over the code as it is now? as_a gains us less runtime checking and more static type checking which is good. But that really affects --enable-checking=yes builds (and only cases where things aren't inlined). If the price for that is uglier and less readable code, then the price is just too high. I still think all this effort is misdirected and distracts us from making changes that improve gcc for its users. That I agree to. Instead of fixing the less than optimal separation / boundary between frontends and the middle-end, or fixing several other long-standing issues with GCC we spend a lot of time refactoring things to be C++. But that was kind of part of the decision (though I remember that we mainly wanted to convert containters and isolated stuff, not gimple or trees (I bet that'll be next)). Of course I don't see contributors of changes that improve gcc for its users now wasting their time with converting code to C++. That conversion may slow down those people, but only so much. It'll get more interesting with branch maintainance ... If the changes are done at least with some for users useful goal (like perhaps making gcc usable as a library for JITting etc.), then I can see the benefit, but as_s uglification doesn't seem to be beneficial to users at all, and IMNSHO much better would be to instead spend time on what is beneficial to users (restrict, vectorizer cost model, vectorizer improvements, better diagnostics (say range locations), ...). Jakub
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
Maybe we need to revisit it? As one of those who were not in favour of the C++ move, can I ask you guys to step back for a moment and think about - what do all of these changes buy us, exactly? Imagine the state at the end, where everything is converted and supposedly the temporary ugliness is gone, what have we gained over the code as it is now? Without going as far as revisiting the decision, I'd like to ask whether the impact of the recent changes on the speed of the compiler has been evaluated. I didn't conduct real measurements but bootstrap times on mainline, which were slightly increasing as usual over the course of the year, started to increase at a faster pace over the last couple of months, at least for me. -- Eric Botcazou
Re: libsanitizer merge from upstream r191666
On Tue, Nov 05, 2013 at 01:49:43PM -0800, Evgeniy Stepanov wrote: This way we can't test kernel interfaces that are not used in glibc, like linux aio. So you just test what you can test. Why do you need to intercept kernel aio when hardly anything uses it? Also, what is the reason why say stat* interceptors couldn't just use the glibc stat* (using dlsym obtained pointers) and just do some work before and/or after)? Most of our team is travelling, and we won't be able to submit a proper fix until next week. Is this blocking anyone? Are you OK with disabling broken parts with #ifdefs for now? Most of this is used in the interceptors, and they can be disabled on an individual basis. Bootstrap failures are certainly blocking lots of people, so if you could disable known problematic parts with #ifdefs for now and resolve later, it would be great. Jakub
Re: [PATCH, SH] Implement builtin_strlen
ok I'll didn't notice this., do you mind that I cleanup the other entries in the same time ? I already cleaned up some, but go ahead. -- Eric Botcazou
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:00 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:24 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 1:52 PM, Ilya Enkovich enkovich@gmail.com wrote: For input parameter P I need to have BOUNDS = __builtin_arg_bnd (P) to somehow refer to bounds of P in GIMPLE. Optimizations may modify __builtin_arg_bnd (P) replacing P with its copy or some value. It makes call useless because removes information about parameter whose bounds we refer to. I want such optimization to ignore __builtin_arg_bnd calls and always leave default SSA_NAME of PARM_DECL there as arg. How does a compilable testcase look like that shows how the default def is used? And what transforms break that use? I really cannot see how this would happen (and you didn't add a testcase). Here is a test source: extern int bar1 (int *p); extern int bar2 (int); int foo (int *p) { if (!p) return bar1 (p); return bar2 (10); } After instrumentation GIMPLE looks like: foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 6: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: _6 = bar1 (p_3(D), __bound_tmp.0_9); goto bb 5; bb 4: _8 = bar2 (10); bb 5: # _1 = PHI _6(3), _8(4) return _1; } Here is optimized GIMPLE (if I do not apply my changes in tree-ssa-dom.c): foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (0B); [return slot optimization] _6 = bar1 (0B, __bound_tmp.0_9); [tail call] goto bb 5; bb 4: _8 = bar2 (10); [tail call] bb 5: # _1 = PHI _6(3), _8(4) return _1; } Now during expand or inline of foo I cannot determine the value for __bound_tmp.0_9. Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Some vars address can be used for better bounds deriving though. Well, it clearly shows it's a get me good results issue and not a correctness issue. And for good results you trade in missed general optimizations. That doesn't sound very good to me. That is definitely a correctness issue. Compiler cannot assume bounds by pointer value ignoring bounds passed to the function. SSA_NAME_OCCURS_IN_ABNORMAL_PHI usage results in missed optimizations. My original patch does not. Ilya Richard. Also sth weird is going on as your arg_bnd call gets [return slot optimization]. It seems that your POINTER_BOUNDS_TYPE is seen as aggregate somehow. Target is responsible for having return slot optimization here. And target expands all such calls. So, do not see the problem here. Ilya Richard. Ilya Richard.
Re: [RFA][PATCH] Minor fix to aliasing machinery
[Discussion started in http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02472.html ] On Wed, 30 Oct 2013, Marc Glisse wrote: On Wed, 30 Oct 2013, Richard Biener wrote: Btw, get_addr_base_and_unit_offset may also return an offsetted MEM_REF (from MEM [p_3, 17] for example). As we are interested in pointers this could be handled by not requiring a memory reference but extracting the base address and offset, covering more cases. I tried the attached patch, and it almost worked, except for one fortran testcase (widechar_intrinsics_10.f90): Now that ao_ref_init_from_ptr_and_size has been fixed, the following patch passes bootstrap+testsuite (default languages) on x86_64-unknown-linux-gnu. 2013-11-06 Marc Glisse marc.gli...@inria.fr Jeff Law l...@redhat.com gcc/ * tree-ssa-alias.c (stmt_kills_ref_p_1): Use ao_ref_init_from_ptr_and_size for builtins. gcc/testsuite/ * gcc.dg/tree-ssa/alias-27.c: New testcase. -- Marc GlisseIndex: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -O1 -fdump-tree-optimized } */ + +void f (long *p) { + *p = 42; + p[4] = 42; + __builtin_memset (p, 0, 100); +} + +/* { dg-final { scan-tree-dump-not = 42 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 204453) +++ gcc/tree-ssa-alias.c(working copy) @@ -2001,23 +2001,24 @@ stmt_may_clobber_ref_p (gimple stmt, tre return stmt_may_clobber_ref_p_1 (stmt, r); } /* If STMT kills the memory reference REF return true, otherwise return false. */ static bool stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) { /* For a must-alias check we need to be able to constrain - the access properly. */ - ao_ref_base (ref); - if (ref-max_size == -1) + the access properly. + FIXME: except for BUILTIN_FREE. */ + if (!ao_ref_base (ref) + || ref-max_size == -1) return false; if (gimple_has_lhs (stmt) TREE_CODE (gimple_get_lhs (stmt)) != SSA_NAME /* The assignment is not necessarily carried out if it can throw and we can catch it in the current function where we could inspect the previous value. ??? We only need to care about the RHS throwing. For aggregate assignments or similar calls and non-call exceptions the LHS might throw as well. */ @@ -2090,37 +2091,47 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref case BUILT_IN_MEMPCPY: case BUILT_IN_MEMMOVE: case BUILT_IN_MEMSET: case BUILT_IN_MEMCPY_CHK: case BUILT_IN_MEMPCPY_CHK: case BUILT_IN_MEMMOVE_CHK: case BUILT_IN_MEMSET_CHK: { tree dest = gimple_call_arg (stmt, 0); tree len = gimple_call_arg (stmt, 2); - tree base = NULL_TREE; - HOST_WIDE_INT offset = 0; + tree rbase = ref-base; + HOST_WIDE_INT roffset = ref-offset; if (!host_integerp (len, 0)) return false; - if (TREE_CODE (dest) == ADDR_EXPR) - base = get_addr_base_and_unit_offset (TREE_OPERAND (dest, 0), - offset); - else if (TREE_CODE (dest) == SSA_NAME) - base = dest; - if (base - base == ao_ref_base (ref)) + ao_ref dref; + ao_ref_init_from_ptr_and_size (dref, dest, len); + tree base = ao_ref_base (dref); + HOST_WIDE_INT offset = dref.offset; + if (!base || dref.size == -1) + return false; + if (TREE_CODE (base) == MEM_REF) + { + if (TREE_CODE (rbase) != MEM_REF) + return false; + // Compare pointers. + offset += BITS_PER_UNIT + * TREE_INT_CST_LOW (TREE_OPERAND (base, 1)); + roffset += BITS_PER_UNIT +* TREE_INT_CST_LOW (TREE_OPERAND (rbase, 1)); + base = TREE_OPERAND (base, 0); + rbase = TREE_OPERAND (rbase, 0); + } + if (base == rbase) { - HOST_WIDE_INT size = TREE_INT_CST_LOW (len); - if (offset = ref-offset / BITS_PER_UNIT -
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/06/2013 12:31 PM, Richard Biener wrote: That I agree to. Instead of fixing the less than optimal separation / boundary between frontends and the middle-end, or fixing several other long-standing issues with GCC we spend a lot of time refactoring things to be C++. But that was kind of part of the decision (though I remember that we mainly wanted to convert containters and isolated stuff, not gimple or trees (I bet that'll be next)). What I seem to remember is being told that we'd use C++ mostly for new code and not engage in a wholesale rewrite of the compiler. Instead things happened exactly as predicted, we're getting immense amounts of churn for little real gain, and in some cases the C++ changes are not even improvements and look like really strange examples of C++ code. I'll offer as evidence the pass manager with its bizarre has_gate and has_execute variables, member functions for gate and execute wrapping non-member functions, and a separate pass_data outside of its class (not even as a static member). Now that the question of what would happen after a C++ conversion is no longer theoretical but has been answered, can we maybe just stop and think for a while whether this is really the way we should be going? Bernd
Re: mismatched decls w/ both builtin and explicit decl
On Tue, 5 Nov 2013, DJ Delorie wrote: Consider this source: extern char *index(const char *,int); static int index; index is a builtin as well, but because it's a builtin gcc skips the previous declaration was here... despite having *a* previous decl it could complain about. Note that newlib provides decls for many builtins (the decl above is from newlib), so this could be a common case. So I added a check for !C_DECL_DECLARED_BUILTIN (decl) which seems to specifically cover this case. Ok to apply? Please send a patch that adds a testcase to the testsuite to show the diagnostics you get after the patch. -- Joseph S. Myers jos...@codesourcery.com
[SH] arith_reg_operand vs. fp_arith_reg_operand
Hello, The attached patch addresses one thing which I mentioned in PR 22553 comment #27. Some of the FP insns take arith_reg_operand, which seems a bit off. I haven't checked whether it fixes the original problem of PR 22553, but anyway it's probably better to use fp_arith_reg_operand for FP insns. Tested on rev 204263 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m2a-single/-mb,-m4/-ml,-m4/-mb,-m4-single/ -ml,-m4-single/-mb,-m4a-single/-ml,-m4a-single/-mb} and no new failures. OK for trunk? Cheers, Oleg gcc/ChangeLog: * config/sh/sh.md (addsf3, divsf3, divsf3_i, rsqrtsf2, cmpgtdf_t, cmpeqdf_t, *ieee_ccmpeqdf_t, negdf2, sqrtdf2, absdf2): Use fp_arith_reg_operand instead of arith_reg_operand. Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 204263) +++ gcc/config/sh/sh.md (working copy) @@ -12187,9 +12187,9 @@ [(set_attr type fpscr_toggle)]) (define_expand addsf3 - [(set (match_operand:SF 0 arith_reg_operand ) - (plus:SF (match_operand:SF 1 arith_reg_operand ) - (match_operand:SF 2 arith_reg_operand )))] + [(set (match_operand:SF 0 fp_arith_reg_operand) + (plus:SF (match_operand:SF 1 fp_arith_reg_operand) + (match_operand:SF 2 fp_arith_reg_operand)))] TARGET_SH2E || TARGET_SHMEDIA_FPU { if (TARGET_SH2E) @@ -12436,9 +12436,9 @@ [(set_attr type fparith_media)]) (define_expand divsf3 - [(set (match_operand:SF 0 arith_reg_operand ) - (div:SF (match_operand:SF 1 arith_reg_operand ) - (match_operand:SF 2 arith_reg_operand )))] + [(set (match_operand:SF 0 fp_arith_reg_operand) + (div:SF (match_operand:SF 1 fp_arith_reg_operand) + (match_operand:SF 2 fp_arith_reg_operand)))] TARGET_SH2E || TARGET_SHMEDIA_FPU { if (TARGET_SH2E) @@ -12457,9 +12457,9 @@ [(set_attr type fdiv_media)]) (define_insn divsf3_i - [(set (match_operand:SF 0 arith_reg_dest =f) - (div:SF (match_operand:SF 1 arith_reg_operand 0) - (match_operand:SF 2 arith_reg_operand f))) + [(set (match_operand:SF 0 fp_arith_reg_dest =f) + (div:SF (match_operand:SF 1 fp_arith_reg_operand 0) + (match_operand:SF 2 fp_arith_reg_operand f))) (use (match_operand:PSI 3 fpscr_operand c))] TARGET_SH2E fdiv %2,%0 @@ -12743,9 +12743,9 @@ (set_attr fp_mode single)]) (define_insn rsqrtsf2 - [(set (match_operand:SF 0 register_operand =f) + [(set (match_operand:SF 0 fp_arith_reg_operand =f) (div:SF (match_operand:SF 1 immediate_operand i) - (sqrt:SF (match_operand:SF 2 register_operand 0 + (sqrt:SF (match_operand:SF 2 fp_arith_reg_operand 0 (use (match_operand:PSI 3 fpscr_operand c))] TARGET_FPU_ANY TARGET_FSRRA operands[1] == CONST1_RTX (SFmode) @@ -13059,8 +13059,8 @@ (define_insn cmpgtdf_t [(set (reg:SI T_REG) - (gt:SI (match_operand:DF 0 arith_reg_operand f) - (match_operand:DF 1 arith_reg_operand f))) + (gt:SI (match_operand:DF 0 fp_arith_reg_operand f) + (match_operand:DF 1 fp_arith_reg_operand f))) (use (match_operand:PSI 2 fpscr_operand c))] (TARGET_SH4 || TARGET_SH2A_DOUBLE) fcmp/gt %1,%0 @@ -13069,8 +13069,8 @@ (define_insn cmpeqdf_t [(set (reg:SI T_REG) - (eq:SI (match_operand:DF 0 arith_reg_operand f) - (match_operand:DF 1 arith_reg_operand f))) + (eq:SI (match_operand:DF 0 fp_arith_reg_operand f) + (match_operand:DF 1 fp_arith_reg_operand f))) (use (match_operand:PSI 2 fpscr_operand c))] (TARGET_SH4 || TARGET_SH2A_DOUBLE) fcmp/eq %1,%0 @@ -13080,8 +13080,8 @@ (define_insn *ieee_ccmpeqdf_t [(set (reg:SI T_REG) (ior:SI (reg:SI T_REG) - (eq:SI (match_operand:DF 0 arith_reg_operand f) - (match_operand:DF 1 arith_reg_operand f + (eq:SI (match_operand:DF 0 fp_arith_reg_operand f) + (match_operand:DF 1 fp_arith_reg_operand f (use (match_operand:PSI 2 fpscr_operand c))] TARGET_IEEE (TARGET_SH4 || TARGET_SH2A_DOUBLE) { @@ -13139,10 +13139,9 @@ DONE; }) - (define_expand negdf2 - [(set (match_operand:DF 0 arith_reg_operand ) - (neg:DF (match_operand:DF 1 arith_reg_operand )))] + [(set (match_operand:DF 0 fp_arith_reg_operand) + (neg:DF (match_operand:DF 1 fp_arith_reg_operand)))] (TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU { if (TARGET_SH4 || TARGET_SH2A_DOUBLE) @@ -13169,8 +13168,8 @@ (set_attr fp_mode double)]) (define_expand sqrtdf2 - [(set (match_operand:DF 0 arith_reg_operand ) - (sqrt:DF (match_operand:DF 1 arith_reg_operand )))] + [(set (match_operand:DF 0 fp_arith_reg_operand) + (sqrt:DF (match_operand:DF 1 fp_arith_reg_operand)))] (TARGET_SH4 || TARGET_SH2A_DOUBLE) || TARGET_SHMEDIA_FPU { if (TARGET_SH4 || TARGET_SH2A_DOUBLE) @@ -13197,8 +13196,8 @@ (set_attr fp_mode double)]) (define_expand absdf2 - [(set (match_operand:DF 0 arith_reg_operand ) - (abs:DF (match_operand:DF 1 arith_reg_operand )))] + [(set (match_operand:DF 0 fp_arith_reg_operand) + (abs:DF (match_operand:DF
Re: [C] Fix PR57258: unused variable warning is emitted for volatile variables
On Wed, 6 Nov 2013, Mingjie Xing wrote: * c/c-decl.c (pop_scope): Skip volatile variables while emit warnings for unused variables. c/ has its own ChangeLog, so no c/ in the ChangeLog entries. This patch doesn't include a testsuite addition. -- Joseph S. Myers jos...@codesourcery.com
Re: [Patch Ada/build] deal with some cross/native cross issues
I've been trying to improve the building and testing of Darwin for crosses and native crosses. This has thrown up a few small glitches in the Ada build stuff (that would seem to apply to any build-host-nativeX scenario, not just darwin). I would imagine it would be beneficial to resolve these, since Ada requires Ada - a native cross seems like the only realistic way onto a new target. Thanks for working on this. 1. xgnatugn needs to be run on the build system, so needs to be built with the build system's gnatmake. I haven't put a canonical prefix on this since this doesn't appear to be done elsewhere. Defined as GNATMAKE_FOR_BUILD and passed to sub-processes. Why do you need to pass it to ADA_TOOLS_FLAGS_TO_PASS though? Just replace $(GNATMAKE) with gnatmake. 2. Some builds might need to pass LDFLAGS to the gnat* builds. Appended LDFLAGS to GCC_LINK. Passed on in gnattools/Make. OK. 3. In gnattools, the RTS dir must be for the host and not for the build; This actually only showed up when I tried a cross from a 64bit pointer machine to a 32bit pointer one (i.e it is easy for it to go unnoticed). OK, but don't you need to do the same for gnatmake/gnatbind/gnatlink here? See gcc-interface/Make-lang.in, line 171 and below, for similar code. -- Eric Botcazou
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On Wed, Nov 06, 2013 at 02:00:56PM +0100, Richard Biener wrote: Well, what else, besides as_a or keeping the current global accessor functions would you propose? I'd prefer to keep the current accessors. Whether there is a class hierarchy or we keep unions doesn't affect the code too much (unless it makes debugging experience harder, as numerous past C++ification changes have done :( ). Jakub
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
On Wed, Nov 6, 2013 at 12:59 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:00 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:24 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 1:52 PM, Ilya Enkovich enkovich@gmail.com wrote: For input parameter P I need to have BOUNDS = __builtin_arg_bnd (P) to somehow refer to bounds of P in GIMPLE. Optimizations may modify __builtin_arg_bnd (P) replacing P with its copy or some value. It makes call useless because removes information about parameter whose bounds we refer to. I want such optimization to ignore __builtin_arg_bnd calls and always leave default SSA_NAME of PARM_DECL there as arg. How does a compilable testcase look like that shows how the default def is used? And what transforms break that use? I really cannot see how this would happen (and you didn't add a testcase). Here is a test source: extern int bar1 (int *p); extern int bar2 (int); int foo (int *p) { if (!p) return bar1 (p); return bar2 (10); } After instrumentation GIMPLE looks like: foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 6: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: _6 = bar1 (p_3(D), __bound_tmp.0_9); goto bb 5; bb 4: _8 = bar2 (10); bb 5: # _1 = PHI _6(3), _8(4) return _1; } Here is optimized GIMPLE (if I do not apply my changes in tree-ssa-dom.c): foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (0B); [return slot optimization] _6 = bar1 (0B, __bound_tmp.0_9); [tail call] goto bb 5; bb 4: _8 = bar2 (10); [tail call] bb 5: # _1 = PHI _6(3), _8(4) return _1; } Now during expand or inline of foo I cannot determine the value for __bound_tmp.0_9. Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Some vars address can be used for better bounds deriving though. Well, it clearly shows it's a get me good results issue and not a correctness issue. And for good results you trade in missed general optimizations. That doesn't sound very good to me. That is definitely a correctness issue. Then please add a runtime testcase that fails before and passes after your patch. Compiler cannot assume bounds by pointer value ignoring bounds passed to the function. But it can certainly drop bounds to unknown? You certainly cannot rely on no such transform taking place. Maybe this just shows that your refering to the parameter is done wrong and shouldn't have used the SSA name default def. What do you do for parameters that have their address taken btw? You'll get foo (void *p) { p_2 = *p; __builtin_ia32_arg_bnd (p_2); which isn't desired? Hmm, unfortunately on x86_64 I get ./cc1 -quiet t.c -fcheck-pointer-bounds t.c:1:0: error: -fcheck-pointers is not supported for this target extern int bar1 (int *p); ^ trying a simple testcase. Grepping shows me no target supports chkp_bound_mode ..? What's this stuff in our tree that has no way of testing it ... :/ Richard. SSA_NAME_OCCURS_IN_ABNORMAL_PHI usage results in missed optimizations. My original patch does not. Ilya Richard. Also sth weird is going on as your arg_bnd call gets [return slot optimization]. It seems that your POINTER_BOUNDS_TYPE is seen as aggregate somehow. Target is responsible for having return slot optimization here. And target expands all such calls. So, do not see the problem here. Ilya Richard. Ilya Richard.
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On Wed, 6 Nov 2013, Jakub Jelinek wrote: On Wed, Nov 06, 2013 at 12:31:00PM +0100, Richard Biener wrote: Maybe we need to revisit it? As one of those who were not in favour of the C++ move, can I ask you guys to step back for a moment and think about - what do all of these changes buy us, exactly? Imagine the state at the end, where everything is converted and supposedly the temporary ugliness is gone, what have we gained over the code as it is now? as_a gains us less runtime checking and more static type checking which is good. But that really affects --enable-checking=yes builds (and only cases where things aren't inlined). If the price for that is uglier and less readable code, then the price is just too high. I see static type checking as being about preventing certain sorts of bugs getting in the compiler at all, rather than about saving time on runtime checks for --enable-checking=yes. I don't have advice on the gimple case. But certainly trees and RTL could both do with more static typing. There, I'm thinking of different types of objects (types, expressions, ...) logically being completely separate static types, not inheriting from a common tree base class at all. But if inheritance and as_a can be used as intermediate steps to allow an incremental transition to completely separate static types, where otherwise the whole source tree would need converting at once, that seems reasonable to me. (I have no comments about what things should look like in cases where completely separate static types don't make sense but there is still some sort of inheritance structure.) -- Joseph S. Myers jos...@codesourcery.com
[Patch, RTL] Eliminate redundant vec_select moves.
Hi, The attached patch eliminates moves of the form set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0] i.e. eliminates lower lane moves between src and dst where src and dst are the same register and this causes rtl to instead use the destination register in the required mode. Also, if my understanding of Big-Endian is correct, this should be safe for big-endian targets as well. I've bootstrapped this on x64_64, regressed on aarch64-none-elf, aarch64_be-none-elf. OK for trunk? Thanks, Tejas Belagod ARM. 2013-11-06 Tejas Belagod tejas.bela...@arm.com gcc/ * rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select for same src and dst.diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 9769b69..3e434cd 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1180,6 +1180,25 @@ set_noop_p (const_rtx set) dst = SUBREG_REG (dst); } + /* This is big-endian-safe because the elements are kept in target + memory order. So, for eg. PARALLEL element value of 2 is the same in + either endian-ness. */ + if (GET_CODE (src) == VEC_SELECT + REG_P (XEXP (src, 0)) REG_P (dst) + REGNO (XEXP (src, 0)) == REGNO (dst)) +{ + rtx par = XEXP (src, 1); + int i; + + for (i = 0; i XVECLEN (par, 0); i++) + { + rtx tem = XVECEXP (par, 0, i); + if (!CONST_INT_P (tem) || INTVAL (tem) != i) + return 0; + } + return 1; +} + return (REG_P (src) REG_P (dst) REGNO (src) == REGNO (dst)); }
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:59 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:00 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:24 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 1:52 PM, Ilya Enkovich enkovich@gmail.com wrote: For input parameter P I need to have BOUNDS = __builtin_arg_bnd (P) to somehow refer to bounds of P in GIMPLE. Optimizations may modify __builtin_arg_bnd (P) replacing P with its copy or some value. It makes call useless because removes information about parameter whose bounds we refer to. I want such optimization to ignore __builtin_arg_bnd calls and always leave default SSA_NAME of PARM_DECL there as arg. How does a compilable testcase look like that shows how the default def is used? And what transforms break that use? I really cannot see how this would happen (and you didn't add a testcase). Here is a test source: extern int bar1 (int *p); extern int bar2 (int); int foo (int *p) { if (!p) return bar1 (p); return bar2 (10); } After instrumentation GIMPLE looks like: foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 6: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: _6 = bar1 (p_3(D), __bound_tmp.0_9); goto bb 5; bb 4: _8 = bar2 (10); bb 5: # _1 = PHI _6(3), _8(4) return _1; } Here is optimized GIMPLE (if I do not apply my changes in tree-ssa-dom.c): foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (0B); [return slot optimization] _6 = bar1 (0B, __bound_tmp.0_9); [tail call] goto bb 5; bb 4: _8 = bar2 (10); [tail call] bb 5: # _1 = PHI _6(3), _8(4) return _1; } Now during expand or inline of foo I cannot determine the value for __bound_tmp.0_9. Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Some vars address can be used for better bounds deriving though. Well, it clearly shows it's a get me good results issue and not a correctness issue. And for good results you trade in missed general optimizations. That doesn't sound very good to me. That is definitely a correctness issue. Then please add a runtime testcase that fails before and passes after your patch. Compiler cannot assume bounds by pointer value ignoring bounds passed to the function. But it can certainly drop bounds to unknown? You certainly cannot rely on no such transform taking place. Maybe this just shows that your refering to the parameter is done wrong and shouldn't have used the SSA name default def. What do you do for parameters that have their address taken btw? You'll get foo (void *p) { p_2 = *p; __builtin_ia32_arg_bnd (p_2); which isn't desired? In this case you just make a load and bounds for p_2 are obtained using bndldx call. I would be glad to use better way to refer to the argument but I do not see one. Hmm, unfortunately on x86_64 I get ./cc1 -quiet t.c -fcheck-pointer-bounds t.c:1:0: error: -fcheck-pointers is not supported for this target extern int bar1 (int *p); ^ trying a simple testcase. Grepping shows me no target supports chkp_bound_mode ..? What's this stuff in our tree that has no way of testing it ... :/ You should pass -mmpx to enable it on the target. Current version in the mpx tree should build most of tests correctly but it is not updated to the current state. Ilya Richard. SSA_NAME_OCCURS_IN_ABNORMAL_PHI usage results in missed optimizations. My original patch does not. Ilya Richard. Also sth weird is going on as your arg_bnd call gets [return slot optimization]. It seems that your POINTER_BOUNDS_TYPE is seen as aggregate somehow. Target is responsible for having return slot optimization here. And target expands all such calls. So, do not see the problem here. Ilya Richard. Ilya Richard.
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
On Wed, Nov 6, 2013 at 2:31 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:59 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:00 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:24 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 1:52 PM, Ilya Enkovich enkovich@gmail.com wrote: For input parameter P I need to have BOUNDS = __builtin_arg_bnd (P) to somehow refer to bounds of P in GIMPLE. Optimizations may modify __builtin_arg_bnd (P) replacing P with its copy or some value. It makes call useless because removes information about parameter whose bounds we refer to. I want such optimization to ignore __builtin_arg_bnd calls and always leave default SSA_NAME of PARM_DECL there as arg. How does a compilable testcase look like that shows how the default def is used? And what transforms break that use? I really cannot see how this would happen (and you didn't add a testcase). Here is a test source: extern int bar1 (int *p); extern int bar2 (int); int foo (int *p) { if (!p) return bar1 (p); return bar2 (10); } After instrumentation GIMPLE looks like: foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 6: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: _6 = bar1 (p_3(D), __bound_tmp.0_9); goto bb 5; bb 4: _8 = bar2 (10); bb 5: # _1 = PHI _6(3), _8(4) return _1; } Here is optimized GIMPLE (if I do not apply my changes in tree-ssa-dom.c): foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (0B); [return slot optimization] _6 = bar1 (0B, __bound_tmp.0_9); [tail call] goto bb 5; bb 4: _8 = bar2 (10); [tail call] bb 5: # _1 = PHI _6(3), _8(4) return _1; } Now during expand or inline of foo I cannot determine the value for __bound_tmp.0_9. Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Some vars address can be used for better bounds deriving though. Well, it clearly shows it's a get me good results issue and not a correctness issue. And for good results you trade in missed general optimizations. That doesn't sound very good to me. That is definitely a correctness issue. Then please add a runtime testcase that fails before and passes after your patch. Compiler cannot assume bounds by pointer value ignoring bounds passed to the function. But it can certainly drop bounds to unknown? You certainly cannot rely on no such transform taking place. Maybe this just shows that your refering to the parameter is done wrong and shouldn't have used the SSA name default def. What do you do for parameters that have their address taken btw? You'll get foo (void *p) { p_2 = *p; __builtin_ia32_arg_bnd (p_2); which isn't desired? In this case you just make a load and bounds for p_2 are obtained using bndldx call. I would be glad to use better way to refer to the argument but I do not see one. Err, my mistake - I meant foo (void *p) { p_2 = p; __builtin_ia32_arg_bnd (p_2); Hmm, unfortunately on x86_64 I get ./cc1 -quiet t.c -fcheck-pointer-bounds t.c:1:0: error: -fcheck-pointers is not supported for this target extern int bar1 (int *p); ^ trying a simple testcase. Grepping shows me no target supports chkp_bound_mode ..? What's this stuff in our tree that has no way of testing it ... :/ You should pass -mmpx to enable it on the target. Current version in the mpx tree should build most of tests correctly but it is not updated to the current state. ./cc1 -quiet t.c -fcheck-pointer-bounds -mmpx t.c:1:0: error: -fcheck-pointers is not supported for this target extern int bar1 (int *p); ^ hmm, maybe it's configure disabled somehow (this is just my dev tree). But I still see no chkp_bound_mode implementation. Note that POINTER_BOUNDS_TYPE shouldn't have been a new tree code either but should be a RECORD_TYPE. See how va_list doesn't use a new TREE_CODE either. Richard. Ilya Richard. SSA_NAME_OCCURS_IN_ABNORMAL_PHI usage results in missed optimizations. My original patch does not. Ilya Richard. Also sth weird is going on as your arg_bnd call gets [return slot
Re: [RFA][PATCH] Minor fix to aliasing machinery
On Wed, Nov 6, 2013 at 1:19 PM, Marc Glisse marc.gli...@inria.fr wrote: [Discussion started in http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02472.html ] On Wed, 30 Oct 2013, Marc Glisse wrote: On Wed, 30 Oct 2013, Richard Biener wrote: Btw, get_addr_base_and_unit_offset may also return an offsetted MEM_REF (from MEM [p_3, 17] for example). As we are interested in pointers this could be handled by not requiring a memory reference but extracting the base address and offset, covering more cases. I tried the attached patch, and it almost worked, except for one fortran testcase (widechar_intrinsics_10.f90): Now that ao_ref_init_from_ptr_and_size has been fixed, the following patch passes bootstrap+testsuite (default languages) on x86_64-unknown-linux-gnu. 2013-11-06 Marc Glisse marc.gli...@inria.fr Jeff Law l...@redhat.com gcc/ * tree-ssa-alias.c (stmt_kills_ref_p_1): Use ao_ref_init_from_ptr_and_size for builtins. gcc/testsuite/ * gcc.dg/tree-ssa/alias-27.c: New testcase. -- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -O1 -fdump-tree-optimized } */ + +void f (long *p) { + *p = 42; + p[4] = 42; + __builtin_memset (p, 0, 100); +} + +/* { dg-final { scan-tree-dump-not = 42 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 204453) +++ gcc/tree-ssa-alias.c(working copy) @@ -2001,23 +2001,24 @@ stmt_may_clobber_ref_p (gimple stmt, tre return stmt_may_clobber_ref_p_1 (stmt, r); } /* If STMT kills the memory reference REF return true, otherwise return false. */ static bool stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) { /* For a must-alias check we need to be able to constrain - the access properly. */ - ao_ref_base (ref); - if (ref-max_size == -1) + the access properly. + FIXME: except for BUILTIN_FREE. */ + if (!ao_ref_base (ref) + || ref-max_size == -1) return false; if (gimple_has_lhs (stmt) TREE_CODE (gimple_get_lhs (stmt)) != SSA_NAME /* The assignment is not necessarily carried out if it can throw and we can catch it in the current function where we could inspect the previous value. ??? We only need to care about the RHS throwing. For aggregate assignments or similar calls and non-call exceptions the LHS might throw as well. */ @@ -2090,37 +2091,47 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref case BUILT_IN_MEMPCPY: case BUILT_IN_MEMMOVE: case BUILT_IN_MEMSET: case BUILT_IN_MEMCPY_CHK: case BUILT_IN_MEMPCPY_CHK: case BUILT_IN_MEMMOVE_CHK: case BUILT_IN_MEMSET_CHK: { tree dest = gimple_call_arg (stmt, 0); tree len = gimple_call_arg (stmt, 2); - tree base = NULL_TREE; - HOST_WIDE_INT offset = 0; + tree rbase = ref-base; + HOST_WIDE_INT roffset = ref-offset; if (!host_integerp (len, 0)) return false; - if (TREE_CODE (dest) == ADDR_EXPR) - base = get_addr_base_and_unit_offset (TREE_OPERAND (dest, 0), - offset); - else if (TREE_CODE (dest) == SSA_NAME) - base = dest; - if (base - base == ao_ref_base (ref)) + ao_ref dref; + ao_ref_init_from_ptr_and_size (dref, dest, len); What I dislike about this is that it can end up building a new tree node. Not sure if that should block the patch though as it clearly allows to simplify the code ... + tree base = ao_ref_base (dref); + HOST_WIDE_INT offset = dref.offset; + if (!base || dref.size == -1) + return false; + if (TREE_CODE (base) == MEM_REF) + { + if (TREE_CODE (rbase) != MEM_REF) why's that? I think that just processing both bases separately would work as well. + return false; + // Compare pointers. + offset += BITS_PER_UNIT + * TREE_INT_CST_LOW (TREE_OPERAND (base,
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, Nov 6, 2013 at 2:24 PM, Tejas Belagod tbela...@arm.com wrote: Hi, The attached patch eliminates moves of the form set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0] i.e. eliminates lower lane moves between src and dst where src and dst are the same register and this causes rtl to instead use the destination register in the required mode. Also, if my understanding of Big-Endian is correct, this should be safe for big-endian targets as well. I've bootstrapped this on x64_64, regressed on aarch64-none-elf, aarch64_be-none-elf. OK for trunk? It looks good to me (but I'm also wondering about bigendian). Bill? Can you add a testcase where this has an effect on code generation? Thanks, Richard. Thanks, Tejas Belagod ARM. 2013-11-06 Tejas Belagod tejas.bela...@arm.com gcc/ * rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select for same src and dst. diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 9769b69..3e434cd 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1180,6 +1180,25 @@ set_noop_p (const_rtx set) dst = SUBREG_REG (dst); } + /* This is big-endian-safe because the elements are kept in target + memory order. So, for eg. PARALLEL element value of 2 is the same in + either endian-ness. */ + if (GET_CODE (src) == VEC_SELECT + REG_P (XEXP (src, 0)) REG_P (dst) + REGNO (XEXP (src, 0)) == REGNO (dst)) +{ + rtx par = XEXP (src, 1); + int i; + + for (i = 0; i XVECLEN (par, 0); i++) + { + rtx tem = XVECEXP (par, 0, i); + if (!CONST_INT_P (tem) || INTVAL (tem) != i) + return 0; + } + return 1; +} + return (REG_P (src) REG_P (dst) REGNO (src) == REGNO (dst)); }
Re: [SH] arith_reg_operand vs. fp_arith_reg_operand
Oleg Endo oleg.e...@t-online.de wrote: The attached patch addresses one thing which I mentioned in PR 22553 comment #27. Some of the FP insns take arith_reg_operand, which seems a bit off. I haven't checked whether it fixes the original problem of PR 22553, but anyway it's probably better to use fp_arith_reg_operand for FP insns. Tested on rev 204263 with make -k check RUNTESTFLAGS=--target_board=sh-sim \{-m2/-ml,-m2/-mb,-m2a/-mb,-m2a-single/-mb,-m4/-ml,-m4/-mb,-m4-single/ -ml,-m4-single/-mb,-m4a-single/-ml,-m4a-single/-mb} and no new failures. OK for trunk? OK. Regards, kaz
Re: [RFA][PATCH] Minor fix to aliasing machinery
On Wed, 6 Nov 2013, Richard Biener wrote: On Wed, Nov 6, 2013 at 1:19 PM, Marc Glisse marc.gli...@inria.fr wrote: [Discussion started in http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02472.html ] On Wed, 30 Oct 2013, Marc Glisse wrote: On Wed, 30 Oct 2013, Richard Biener wrote: Btw, get_addr_base_and_unit_offset may also return an offsetted MEM_REF (from MEM [p_3, 17] for example). As we are interested in pointers this could be handled by not requiring a memory reference but extracting the base address and offset, covering more cases. I tried the attached patch, and it almost worked, except for one fortran testcase (widechar_intrinsics_10.f90): Now that ao_ref_init_from_ptr_and_size has been fixed, the following patch passes bootstrap+testsuite (default languages) on x86_64-unknown-linux-gnu. 2013-11-06 Marc Glisse marc.gli...@inria.fr Jeff Law l...@redhat.com gcc/ * tree-ssa-alias.c (stmt_kills_ref_p_1): Use ao_ref_init_from_ptr_and_size for builtins. gcc/testsuite/ * gcc.dg/tree-ssa/alias-27.c: New testcase. -- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -O1 -fdump-tree-optimized } */ + +void f (long *p) { + *p = 42; + p[4] = 42; + __builtin_memset (p, 0, 100); +} + +/* { dg-final { scan-tree-dump-not = 42 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 204453) +++ gcc/tree-ssa-alias.c(working copy) @@ -2001,23 +2001,24 @@ stmt_may_clobber_ref_p (gimple stmt, tre return stmt_may_clobber_ref_p_1 (stmt, r); } /* If STMT kills the memory reference REF return true, otherwise return false. */ static bool stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) { /* For a must-alias check we need to be able to constrain - the access properly. */ - ao_ref_base (ref); - if (ref-max_size == -1) + the access properly. + FIXME: except for BUILTIN_FREE. */ + if (!ao_ref_base (ref) + || ref-max_size == -1) return false; if (gimple_has_lhs (stmt) TREE_CODE (gimple_get_lhs (stmt)) != SSA_NAME /* The assignment is not necessarily carried out if it can throw and we can catch it in the current function where we could inspect the previous value. ??? We only need to care about the RHS throwing. For aggregate assignments or similar calls and non-call exceptions the LHS might throw as well. */ @@ -2090,37 +2091,47 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref case BUILT_IN_MEMPCPY: case BUILT_IN_MEMMOVE: case BUILT_IN_MEMSET: case BUILT_IN_MEMCPY_CHK: case BUILT_IN_MEMPCPY_CHK: case BUILT_IN_MEMMOVE_CHK: case BUILT_IN_MEMSET_CHK: { tree dest = gimple_call_arg (stmt, 0); tree len = gimple_call_arg (stmt, 2); - tree base = NULL_TREE; - HOST_WIDE_INT offset = 0; + tree rbase = ref-base; + HOST_WIDE_INT roffset = ref-offset; if (!host_integerp (len, 0)) return false; - if (TREE_CODE (dest) == ADDR_EXPR) - base = get_addr_base_and_unit_offset (TREE_OPERAND (dest, 0), - offset); - else if (TREE_CODE (dest) == SSA_NAME) - base = dest; - if (base - base == ao_ref_base (ref)) + ao_ref dref; + ao_ref_init_from_ptr_and_size (dref, dest, len); What I dislike about this is that it can end up building a new tree node. Not sure if that should block the patch though as it clearly allows to simplify the code ... + tree base = ao_ref_base (dref); + HOST_WIDE_INT offset = dref.offset; + if (!base || dref.size == -1) + return false; + if (TREE_CODE (base) == MEM_REF) + { + if (TREE_CODE (rbase) != MEM_REF) why's that? I think that just processing both bases separately would work as well. If they differ, there is no point going on, we might as well break early. And this way we maintain the property that either base and rbase are both refs, or they are both pointers, not some weird mix. + return false; +
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 2:31 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:59 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:00 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:24 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 1:52 PM, Ilya Enkovich enkovich@gmail.com wrote: For input parameter P I need to have BOUNDS = __builtin_arg_bnd (P) to somehow refer to bounds of P in GIMPLE. Optimizations may modify __builtin_arg_bnd (P) replacing P with its copy or some value. It makes call useless because removes information about parameter whose bounds we refer to. I want such optimization to ignore __builtin_arg_bnd calls and always leave default SSA_NAME of PARM_DECL there as arg. How does a compilable testcase look like that shows how the default def is used? And what transforms break that use? I really cannot see how this would happen (and you didn't add a testcase). Here is a test source: extern int bar1 (int *p); extern int bar2 (int); int foo (int *p) { if (!p) return bar1 (p); return bar2 (10); } After instrumentation GIMPLE looks like: foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 6: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: _6 = bar1 (p_3(D), __bound_tmp.0_9); goto bb 5; bb 4: _8 = bar2 (10); bb 5: # _1 = PHI _6(3), _8(4) return _1; } Here is optimized GIMPLE (if I do not apply my changes in tree-ssa-dom.c): foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (0B); [return slot optimization] _6 = bar1 (0B, __bound_tmp.0_9); [tail call] goto bb 5; bb 4: _8 = bar2 (10); [tail call] bb 5: # _1 = PHI _6(3), _8(4) return _1; } Now during expand or inline of foo I cannot determine the value for __bound_tmp.0_9. Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Some vars address can be used for better bounds deriving though. Well, it clearly shows it's a get me good results issue and not a correctness issue. And for good results you trade in missed general optimizations. That doesn't sound very good to me. That is definitely a correctness issue. Then please add a runtime testcase that fails before and passes after your patch. Compiler cannot assume bounds by pointer value ignoring bounds passed to the function. But it can certainly drop bounds to unknown? You certainly cannot rely on no such transform taking place. Maybe this just shows that your refering to the parameter is done wrong and shouldn't have used the SSA name default def. What do you do for parameters that have their address taken btw? You'll get foo (void *p) { p_2 = *p; __builtin_ia32_arg_bnd (p_2); which isn't desired? In this case you just make a load and bounds for p_2 are obtained using bndldx call. I would be glad to use better way to refer to the argument but I do not see one. Err, my mistake - I meant foo (void *p) { p_2 = p; __builtin_ia32_arg_bnd (p_2); Hmm, unfortunately on x86_64 I get ./cc1 -quiet t.c -fcheck-pointer-bounds t.c:1:0: error: -fcheck-pointers is not supported for this target extern int bar1 (int *p); ^ trying a simple testcase. Grepping shows me no target supports chkp_bound_mode ..? What's this stuff in our tree that has no way of testing it ... :/ You should pass -mmpx to enable it on the target. Current version in the mpx tree should build most of tests correctly but it is not updated to the current state. ./cc1 -quiet t.c -fcheck-pointer-bounds -mmpx t.c:1:0: error: -fcheck-pointers is not supported for this target extern int bar1 (int *p); ^ hmm, maybe it's configure disabled somehow (this is just my dev tree). But I still see no chkp_bound_mode implementation. Sorry, I missed you use your dev tree. Only first 7 patches of ~30 have been committed so far and therefore no support yet. Currently full implementation is available in mpx branch only. Ilya Note that POINTER_BOUNDS_TYPE shouldn't have been a new tree code either but should be a RECORD_TYPE. See how
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 3:04 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 2:31 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:59 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Wed, Nov 6, 2013 at 12:00 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/6 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 3:24 PM, Ilya Enkovich enkovich@gmail.com wrote: 2013/11/5 Richard Biener richard.guent...@gmail.com: On Tue, Nov 5, 2013 at 1:52 PM, Ilya Enkovich enkovich@gmail.com wrote: For input parameter P I need to have BOUNDS = __builtin_arg_bnd (P) to somehow refer to bounds of P in GIMPLE. Optimizations may modify __builtin_arg_bnd (P) replacing P with its copy or some value. It makes call useless because removes information about parameter whose bounds we refer to. I want such optimization to ignore __builtin_arg_bnd calls and always leave default SSA_NAME of PARM_DECL there as arg. How does a compilable testcase look like that shows how the default def is used? And what transforms break that use? I really cannot see how this would happen (and you didn't add a testcase). Here is a test source: extern int bar1 (int *p); extern int bar2 (int); int foo (int *p) { if (!p) return bar1 (p); return bar2 (10); } After instrumentation GIMPLE looks like: foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 6: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (p_3(D)); bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: _6 = bar1 (p_3(D), __bound_tmp.0_9); goto bb 5; bb 4: _8 = bar2 (10); bb 5: # _1 = PHI _6(3), _8(4) return _1; } Here is optimized GIMPLE (if I do not apply my changes in tree-ssa-dom.c): foo (int * p) { unnamed type __bound_tmp.0; int _1; int _6; int _8; bb 2: if (p_3(D) == 0B) goto bb 3; else goto bb 4; bb 3: __bound_tmp.0_9 = __builtin_ia32_arg_bnd (0B); [return slot optimization] _6 = bar1 (0B, __bound_tmp.0_9); [tail call] goto bb 5; bb 4: _8 = bar2 (10); [tail call] bb 5: # _1 = PHI _6(3), _8(4) return _1; } Now during expand or inline of foo I cannot determine the value for __bound_tmp.0_9. Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Some vars address can be used for better bounds deriving though. Well, it clearly shows it's a get me good results issue and not a correctness issue. And for good results you trade in missed general optimizations. That doesn't sound very good to me. That is definitely a correctness issue. Then please add a runtime testcase that fails before and passes after your patch. Compiler cannot assume bounds by pointer value ignoring bounds passed to the function. But it can certainly drop bounds to unknown? You certainly cannot rely on no such transform taking place. Maybe this just shows that your refering to the parameter is done wrong and shouldn't have used the SSA name default def. What do you do for parameters that have their address taken btw? You'll get foo (void *p) { p_2 = *p; __builtin_ia32_arg_bnd (p_2); which isn't desired? In this case you just make a load and bounds for p_2 are obtained using bndldx call. I would be glad to use better way to refer to the argument but I do not see one. Err, my mistake - I meant foo (void *p) { p_2 = p; __builtin_ia32_arg_bnd (p_2); Well, it does not make much difference. Such params are allocated on the stack by expand. So, here p_2 = p is load from memory and bounds should be loaded by bndldx. Expand is responsible to store bounds using bndstx when it allocates p on the stack. The parameter is incoming in a register still. It does not matter. In GIMPLE it is still load. During expand assign_parms allocates slot on the stack for it and stores reg to it. DECL_RTL for PARM_DECL is set to stack slot. I just added code to store corresponding bounds for stored param (I did not send this patch yet). Ilya Hmm, unfortunately on x86_64 I get ./cc1 -quiet t.c -fcheck-pointer-bounds t.c:1:0: error: -fcheck-pointers is not supported for this target extern int bar1 (int *p); ^ trying a simple testcase. Grepping shows me no target supports chkp_bound_mode ..? What's this stuff in our tree that has no way of testing it ... :/ You should pass -mmpx
Re: [RFA][PATCH] Isolate erroneous paths optimization
On Tue, Nov 5, 2013 at 10:50 PM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 5 Nov 2013, Ian Lance Taylor wrote: This patch actually breaks the Go testsuite. In Go dereferencing a nil pointer is well-defined: it causes panic that can be caught. This breaks a test for that functionality by changing the panic to a builtin_trap. That's not a big deal; I'll just disable this optimization in the Go frontend. Shouldn't go use -fno-delete-null-pointer-checks by default then? That should disable this optimization and others that rely on the same idea. No, that is a different optimization with different properties. The -fdelete-null-pointer-checks optimization assumes that if you write x = *p; if (p == NULL) { printf (NULL\n); } that the test p == NULL can not succeed. In Go, that is true. If p is NULL the *p will cause a panic and ordinary code execution will not proceed. The recent -fisolate-erroneous-paths optimization will change code like this: if (p == NULL) { printf (NULL\n); } x = *p; into code like this: if (p == NULL) { printf (NULL\n); __builtin_trap (); } x = *p; That is, it will insert a trap rather than dereferencing the pointer known to be NULL. This doesn't work for Go because we really do want the panic, not the __builtin_trap. This optimization would work fine for Go if there were a way to explicitly call the panic function rather than calling trap, but that would be a frontend-dependent aspect to a middle-end optimization. Ian
Re: [RFA][PATCH] Minor fix to aliasing machinery
On Wed, Nov 6, 2013 at 3:08 PM, Marc Glisse marc.gli...@inria.fr wrote: On Wed, 6 Nov 2013, Richard Biener wrote: On Wed, Nov 6, 2013 at 1:19 PM, Marc Glisse marc.gli...@inria.fr wrote: [Discussion started in http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02472.html ] On Wed, 30 Oct 2013, Marc Glisse wrote: On Wed, 30 Oct 2013, Richard Biener wrote: Btw, get_addr_base_and_unit_offset may also return an offsetted MEM_REF (from MEM [p_3, 17] for example). As we are interested in pointers this could be handled by not requiring a memory reference but extracting the base address and offset, covering more cases. I tried the attached patch, and it almost worked, except for one fortran testcase (widechar_intrinsics_10.f90): Now that ao_ref_init_from_ptr_and_size has been fixed, the following patch passes bootstrap+testsuite (default languages) on x86_64-unknown-linux-gnu. 2013-11-06 Marc Glisse marc.gli...@inria.fr Jeff Law l...@redhat.com gcc/ * tree-ssa-alias.c (stmt_kills_ref_p_1): Use ao_ref_init_from_ptr_and_size for builtins. gcc/testsuite/ * gcc.dg/tree-ssa/alias-27.c: New testcase. -- Marc Glisse Index: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c === --- gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/alias-27.c(working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -O1 -fdump-tree-optimized } */ + +void f (long *p) { + *p = 42; + p[4] = 42; + __builtin_memset (p, 0, 100); +} + +/* { dg-final { scan-tree-dump-not = 42 optimized } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ Property changes on: gcc/testsuite/gcc.dg/tree-ssa/alias-27.c ___ Added: svn:keywords ## -0,0 +1 ## +Author Date Id Revision URL \ No newline at end of property Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 204453) +++ gcc/tree-ssa-alias.c(working copy) @@ -2001,23 +2001,24 @@ stmt_may_clobber_ref_p (gimple stmt, tre return stmt_may_clobber_ref_p_1 (stmt, r); } /* If STMT kills the memory reference REF return true, otherwise return false. */ static bool stmt_kills_ref_p_1 (gimple stmt, ao_ref *ref) { /* For a must-alias check we need to be able to constrain - the access properly. */ - ao_ref_base (ref); - if (ref-max_size == -1) + the access properly. + FIXME: except for BUILTIN_FREE. */ + if (!ao_ref_base (ref) + || ref-max_size == -1) return false; if (gimple_has_lhs (stmt) TREE_CODE (gimple_get_lhs (stmt)) != SSA_NAME /* The assignment is not necessarily carried out if it can throw and we can catch it in the current function where we could inspect the previous value. ??? We only need to care about the RHS throwing. For aggregate assignments or similar calls and non-call exceptions the LHS might throw as well. */ @@ -2090,37 +2091,47 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref case BUILT_IN_MEMPCPY: case BUILT_IN_MEMMOVE: case BUILT_IN_MEMSET: case BUILT_IN_MEMCPY_CHK: case BUILT_IN_MEMPCPY_CHK: case BUILT_IN_MEMMOVE_CHK: case BUILT_IN_MEMSET_CHK: { tree dest = gimple_call_arg (stmt, 0); tree len = gimple_call_arg (stmt, 2); - tree base = NULL_TREE; - HOST_WIDE_INT offset = 0; + tree rbase = ref-base; + HOST_WIDE_INT roffset = ref-offset; if (!host_integerp (len, 0)) return false; - if (TREE_CODE (dest) == ADDR_EXPR) - base = get_addr_base_and_unit_offset (TREE_OPERAND (dest, 0), - offset); - else if (TREE_CODE (dest) == SSA_NAME) - base = dest; - if (base - base == ao_ref_base (ref)) + ao_ref dref; + ao_ref_init_from_ptr_and_size (dref, dest, len); What I dislike about this is that it can end up building a new tree node. Not sure if that should block the patch though as it clearly allows to simplify the code ... + tree base = ao_ref_base (dref); + HOST_WIDE_INT offset = dref.offset; + if (!base || dref.size == -1) + return false; + if (TREE_CODE (base) == MEM_REF) + { + if (TREE_CODE (rbase) != MEM_REF) why's that? I think that just processing both bases separately would work as well. If they differ, there is no point going on, we might as
RE: [PATCH, PR 57748] Check for out of bounds access, Part 3
Hi, On Tue, 17 Sep 2013 01:09:45, Martin Jambor wrote: @@ -4773,6 +4738,8 @@ expand_assignment (tree to, tree from, b if (MEM_P (to_rtx) GET_MODE (to_rtx) == BLKmode GET_MODE (XEXP (to_rtx, 0)) != VOIDmode + bitregion_start == 0 + bitregion_end == 0 bitsize 0 (bitpos % bitsize) == 0 (bitsize % GET_MODE_ALIGNMENT (mode1)) == 0 ... I'm not sure to what extent the hunk adding tests for bitregion_start and bitregion_end being zero is connected to this issue. I do not see any of the testcases exercising that path. If it is indeed another problem, I think it should be submitted (and potentially committed) as a separate patch, preferably with a testcase. Meanwhile I am able to give an example where that code is executed with bitpos = 64, bitsize=32, bitregion_start = 32, bitregion_end = 95. Afterwards bitpos=0, bitsize=32, which is completely outside bitregion_start=32, bitregion_end=95. However this can only be seen in the debugger, as the store_field goes thru a code path that does not look at bitregion_start/end. Well that is at least extremely ugly, and I would not be sure, that I cannot come up with a sample that crashes or creates wrong code. Currently I think that maybe the best way to fix that would be this: --- gcc/expr.c2013-10-21 08:27:09.546035668 +0200 +++ gcc/expr.c2013-10-22 15:19:56.749476525 +0200 @@ -4762,6 +4762,9 @@ expand_assignment (tree to, tree from, b MEM_ALIGN (to_rtx) == GET_MODE_ALIGNMENT (mode1)) { to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT); + bitregion_start = 0; + if (bitregion_end= (unsigned HOST_WIDE_INT) bitpos) +bitregion_end -= bitpos; bitpos = 0; } Any suggestions? Regards Bernd. well, as already discussed, the following example executes the above memory block, and leaves bitregion_start/end in an undefined state: extern void abort (void); struct x{ int a; int :32; volatile int b:32; }; struct s { int a,b,c,d; struct x xx[1]; }; struct s ss; volatile int k; int main() { ss.xx[k].b = 1; // asm volatile(:::memory); if ( ss.xx[k].b != 1) abort (); return 0; } Although this does not cause malfunction at the time, I'd propose to play safe, and update the bitregion_start/bitregion_end. Additionally I'd propose to remove this comment in expand_assignment and expand_expr_real_1: A constant address in TO_RTX can have VOIDmode, we must not try to call force_reg for that case. Avoid that case. This comment is completely out of sync: There is no longer any force_reg in that if-block, and a constant address in TO_RTX has SImode or DImode in GET_MODE (XEXP (to_rtx, 0)) I do not know how to make it a VOIDmode, therefore the comment does not help either. Boot-strapped and regression-tested on x86_64-linux-gnu. OK for trunk? Regards Bernd.2013-11-06 Bernd Edlinger bernd.edlin...@hotmail.de PR middle-end/57748 * expr.c (expand_assignment): Remove bogus comment. Update bitregion_start/bitregion_end. (expand_expr_real_1): Remove bogus comment. patch-pr57748-3.diff Description: Binary data
Go patch committed: Turn off -fisolate-erroneous-paths for Go
As discussed at the thread at http://gcc.gnu.org/ml/gcc-patches/2013-11/msg00505.html , the new -fisolate-erroneous-paths option does not work for Go. The option can change nil pointer deferences into a trap, but for Go we want those to consistently be a panic. This patch disables the option for Go. If the option changes in a way that lets Go work, such as by disabling the option when using -fnon-call-exceptions, then this patch can be reverted. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian 2013-11-06 Ian Lance Taylor i...@google.com * go-lang.c (go_langhook_post_options): If -fisolate-erroneous-paths was turned on by an optimization option, turn it off. Index: go-lang.c === --- go-lang.c (revision 204430) +++ go-lang.c (working copy) @@ -268,6 +268,12 @@ go_langhook_post_options (const char **p if (flag_excess_precision_cmdline == EXCESS_PRECISION_DEFAULT) flag_excess_precision_cmdline = EXCESS_PRECISION_STANDARD; + /* The isolate_erroneous_paths optimization can change a nil + dereference from a panic to a trap, so we have to disable it for + Go, even though it is normally enabled by -O2. */ + if (!global_options_set.x_flag_isolate_erroneous_paths) +global_options.x_flag_isolate_erroneous_paths = 0; + /* Returning false means that the backend should be used. */ return false; }
Re: [RFA][PATCH] Isolate erroneous paths optimization
On Wed, Nov 6, 2013 at 7:15 AM, Richard Biener richard.guent...@gmail.com wrote: But I think that you cannot transform foo () { *0 = 1; } to __builtin_trap as you can catch the trap via an exception handler in a caller of foo, no? That is true. OK, I can see an argument that when using -fnon-call-exceptions that kind of code should not be changed to call __builtin_trap. In that case I think it would be fine to run the isolate paths optimization, but to not omit the actual dereference of the NULL pointer (possibly the dereference could be followed by a trap). Ian
Re: [RFA][PATCH] Isolate erroneous paths optimization
On Wed, Nov 06, 2013 at 04:23:06PM +0100, Richard Biener wrote: In that case I think it would be fine to run the isolate paths optimization, but to not omit the actual dereference of the NULL pointer (possibly the dereference could be followed by a trap). Yeah, we need the trap to properly end the BB (even if that is a waste instruction generated). BTW, why do we generate in this optimization __builtin_trap rather than just __builtin_unreachable ()? The former still generates some code (abort, some aborting instruction, ...), while the former is just an assertion that valid code will not reach it. Jakub
Re: [RFA][PATCH] Isolate erroneous paths optimization
On 11/06/13 08:08, Ian Lance Taylor wrote: The recent -fisolate-erroneous-paths optimization will change code like this: if (p == NULL) { printf (NULL\n); } x = *p; into code like this: if (p == NULL) { printf (NULL\n); __builtin_trap (); } x = *p; That is, it will insert a trap rather than dereferencing the pointer known to be NULL. This doesn't work for Go because we really do want the panic, not the __builtin_trap. This optimization would work fine for Go if there were a way to explicitly call the panic function rather than calling trap, but that would be a frontend-dependent aspect to a middle-end optimization. But then you have -fnon-call-exceptions enabled? Yes (go_langhook_init_options_struct in go/go-lang.c). I wonder if we could have the front-ends define a builtin to use in this kind of situation and use the builtin trap as a fallback if the front-end didn't provide a suitable builtin. Where obviously -fisolate-erroneous-paths also shouldn't apply? I guess that's not entirely obvious to me. I'm not sure that -fnon-call-exceptions means that all trapping instructions must be executed. In effect, yes. Jeff
Re: [RFA][PATCH] Isolate erroneous paths optimization
On 11/06/13 08:20, Ian Lance Taylor wrote: On Wed, Nov 6, 2013 at 7:15 AM, Richard Biener richard.guent...@gmail.com wrote: But I think that you cannot transform foo () { *0 = 1; } to __builtin_trap as you can catch the trap via an exception handler in a caller of foo, no? That is true. OK, I can see an argument that when using -fnon-call-exceptions that kind of code should not be changed to call __builtin_trap. In that case I think it would be fine to run the isolate paths optimization, but to not omit the actual dereference of the NULL pointer (possibly the dereference could be followed by a trap). That would be trivial to implement. I went back and forth on whether to to leave the *0 or issue the trap. A program could catch the *0 and do something in its handler. Changing the *0 to a trap would change hte program's behaviour if it took different action in the handler on the trap. The trap works better in other cases that don't involve a dereference though. For example, if a function has an argument that is marked as must be non-null and we find a path which passes null for that argument trapping seems best. jeff
Re: [RFA][PATCH] Isolate erroneous paths optimization
On 11/06/13 08:23, Richard Biener wrote: On Wed, Nov 6, 2013 at 4:20 PM, Ian Lance Taylor i...@google.com wrote: On Wed, Nov 6, 2013 at 7:15 AM, Richard Biener richard.guent...@gmail.com wrote: But I think that you cannot transform foo () { *0 = 1; } to __builtin_trap as you can catch the trap via an exception handler in a caller of foo, no? That is true. OK, I can see an argument that when using -fnon-call-exceptions that kind of code should not be changed to call __builtin_trap. In that case I think it would be fine to run the isolate paths optimization, but to not omit the actual dereference of the NULL pointer (possibly the dereference could be followed by a trap). Yeah, we need the trap to properly end the BB (even if that is a waste instruction generated). Right. We *really* don't want execution to continue. We've entered an undefined state and continuing execution can lead to all kinds of nasty problems, including security exploits. jeff
Re: [RFA][PATCH] Isolate erroneous paths optimization
On Wed, Nov 06, 2013 at 09:15:57AM -0700, Jeff Law wrote: On 11/06/13 08:27, Jakub Jelinek wrote: On Wed, Nov 06, 2013 at 04:23:06PM +0100, Richard Biener wrote: In that case I think it would be fine to run the isolate paths optimization, but to not omit the actual dereference of the NULL pointer (possibly the dereference could be followed by a trap). Yeah, we need the trap to properly end the BB (even if that is a waste instruction generated). BTW, why do we generate in this optimization __builtin_trap rather than just __builtin_unreachable ()? The former still generates some code (abort, some aborting instruction, ...), while the former is just an assertion that valid code will not reach it. Because if you do reach the site, you really want to halt the program to avoid potential security exploits. I'm actually of the opinion that builtin_unreachable should be trapping as well. __builtin_unreachable () is trapping if -fsanitize=unreachable, but otherwise it is just an optimization hint, intentionally so, that allows optimizing on the fact that it doesn't happen. If from fear of security exploits we'd stop trying to optimize code well, we'd need to punt on using undefined integer overflow and many other things. Jakub
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On Wed, 2013-11-06 at 14:11 +0100, Jakub Jelinek wrote: On Wed, Nov 06, 2013 at 02:00:56PM +0100, Richard Biener wrote: Well, what else, besides as_a or keeping the current global accessor functions would you propose? I'd prefer to keep the current accessors. Whether there is a class hierarchy or we keep unions doesn't affect the code too much (unless it makes debugging experience harder, as numerous past C++ification changes have done :( ). What specific things have become more difficult to debug due to the changes you refer to? I can have a look at improving them by extending gdbhooks.py, if that is desirable. Thanks Dave
[PATCH, PR 10474] Take two on splitting live-ranges of function arguments to help shrink-wrapping
Hi, last Thursday I had to revert a previous version of this patch because it has caused a lot of trouble on various platforms I did not test it on. The patch is still very similar to its previous iteration (http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02183.html) the only major difference is that I moved more after-the-fact re-analyzing from find_moveable_pseudos to a point when my transformation also finished. The problem with it was that REG_N_REFS of the pseudos introduced by my code was not set and thus reload ignored and let it slip through instead of generating spills. This is fixed by moving the re-computation of regstat_n_sets_and_refs in regstat_init_n_sets_and_refs to after my transformation. In order to avoid similar surprises I have moved all other re-computations from find_moveable_pseudos to the caller in ira, namely: fix_reg_equiv_init (); expand_reg_info (); regstat_free_n_sets_and_refs (); regstat_free_ri (); regstat_init_n_sets_and_refs (); regstat_compute_ri (); I have also bootstrapped and tested the patch not only on x86_64-linux but also on i686-linux and ppc64-linux (without Ada though). I have made sure that the reported problem does not occur on cris-elf and sh-none-elf cross compilers. (I could not reproduce it on arm, and do not have access to sparc but it was also reported there.) Another minor change which I erroneously omitted the last time is that the testcases are run on x86_64 only because that is the only platform where I know the transformation currently takes place. The reason why I did not move them to a target-specific directory is that I believe the transformation can be beneficial on other platforms as well. For example, PR 10474 was actually filed against PPC but this patch does not work there because the initial move of the parameter is done in a parallel insn: (parallel [ (set (reg:CC 124) (compare:CC (reg:DI 3 3 [ i ]) (const_int 0 [0]))) (set (reg/v/f:DI 123 [ i ]) (reg:DI 3 3 [ i ])) ]) which fails my single_set test. However, the mechanism can be extended to handle these situations as well and afterwards we could run the test also on ppc64. So, Vlad, Steven, do you think that this time I have re-computed all that is necessary? Do you think the patch is OK? Thanks a lot and sorry for the breakage, Martin 2013-11-04 Martin Jambor mjam...@suse.cz PR rtl-optimization/10474 * ira.c (interesting_dest_for_shprep): New function. (split_live_ranges_for_shrink_wrap): Likewise. (find_moveable_pseudos): Move calculation of dominance info, df_analysios and the final anlyses to... (ira): ...here, call split_live_ranges_for_shrink_wrap. testsuite/ * gcc.dg/pr10474.c: New testcase. * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise. * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise. diff --git a/gcc/ira.c b/gcc/ira.c index 9e94704..70fe13b 100644 --- a/gcc/ira.c +++ b/gcc/ira.c @@ -4514,9 +4514,6 @@ find_moveable_pseudos (void) pseudo_replaced_reg.release (); pseudo_replaced_reg.safe_grow_cleared (max_regs); - df_analyze (); - calculate_dominance_info (CDI_DOMINATORS); - i = 0; bitmap_initialize (live, 0); bitmap_initialize (used, 0); @@ -4829,14 +4826,196 @@ find_moveable_pseudos (void) free (bb_moveable_reg_sets); last_moveable_pseudo = max_reg_num (); +} - fix_reg_equiv_init (); - expand_reg_info (); - regstat_free_n_sets_and_refs (); - regstat_free_ri (); - regstat_init_n_sets_and_refs (); - regstat_compute_ri (); - free_dominance_info (CDI_DOMINATORS); + +/* If insn is interesting for parameter range-splitting shring-wrapping + preparation, i.e. it is a single set from a hard register to a pseudo, which + is live at CALL_DOM, return the destination. Otherwise return NULL. */ + +static rtx +interesting_dest_for_shprep (rtx insn, basic_block call_dom) +{ + rtx set = single_set (insn); + if (!set) +return NULL; + rtx src = SET_SRC (set); + rtx dest = SET_DEST (set); + if (!REG_P (src) || !HARD_REGISTER_P (src) + || !REG_P (dest) || HARD_REGISTER_P (dest) + || (call_dom !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest +return NULL; + return dest; +} + +/* Split live ranges of pseudos that are loaded from hard registers in the + first BB in a BB that dominates all non-sibling call if such a BB can be + found and is not in a loop. Return true if the function has made any + changes. */ + +static bool +split_live_ranges_for_shrink_wrap (void) +{ + basic_block bb, call_dom = NULL; + basic_block first = single_succ (ENTRY_BLOCK_PTR); + rtx insn, last_interesting_insn = NULL; + bitmap_head need_new, reachable; + vecbasic_block queue; + + if (!flag_shrink_wrap) +return false; + + bitmap_initialize (need_new, 0); + bitmap_initialize (reachable, 0); + queue.create
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, 2013-11-06 at 15:01 +0100, Richard Biener wrote: On Wed, Nov 6, 2013 at 2:24 PM, Tejas Belagod tbela...@arm.com wrote: Hi, The attached patch eliminates moves of the form set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0] i.e. eliminates lower lane moves between src and dst where src and dst are the same register and this causes rtl to instead use the destination register in the required mode. Also, if my understanding of Big-Endian is correct, this should be safe for big-endian targets as well. I've bootstrapped this on x64_64, regressed on aarch64-none-elf, aarch64_be-none-elf. OK for trunk? It looks good to me (but I'm also wondering about bigendian). Bill? Yes, that should be ok. I can run a regression test to confirm. Thanks, Bill Can you add a testcase where this has an effect on code generation? Thanks, Richard. Thanks, Tejas Belagod ARM. 2013-11-06 Tejas Belagod tejas.bela...@arm.com gcc/ * rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select for same src and dst. diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 9769b69..3e434cd 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1180,6 +1180,25 @@ set_noop_p (const_rtx set) dst = SUBREG_REG (dst); } + /* This is big-endian-safe because the elements are kept in target + memory order. So, for eg. PARALLEL element value of 2 is the same in + either endian-ness. */ + if (GET_CODE (src) == VEC_SELECT + REG_P (XEXP (src, 0)) REG_P (dst) + REGNO (XEXP (src, 0)) == REGNO (dst)) +{ + rtx par = XEXP (src, 1); + int i; + + for (i = 0; i XVECLEN (par, 0); i++) + { + rtx tem = XVECEXP (par, 0, i); + if (!CONST_INT_P (tem) || INTVAL (tem) != i) + return 0; + } + return 1; +} + return (REG_P (src) REG_P (dst) REGNO (src) == REGNO (dst)); }
Re: [PATCH] Add gimple subclasses for every gimple code (was Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3))
Hi, On Tue, 5 Nov 2013, David Malcolm wrote: Here's a followup patch which ensures that every gimple code has its own subclass, by adding empty subclasses derived from the GSS_-based subclasses as appropriate (I don't bother for gimple codes that already have their own subclass due to having their own GSS layout). I also copied the comments from gimple.def into gimple.h, so that Doxygen picks up on the descriptions and uses them to describe each subclass. I don't like that. The empty classes are just useless, they imply a structure that isn't really there, some of the separate gimple codes are basically selectors of specific subtypes of a generic concept, without additional data or methods; creating a type for those is confusing. Generally I don't like complicating the type system without good reasons (as in actually also making use of the complicated types). The fewer types the better IMO. Ciao, Michael.
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On Wed, Nov 06, 2013 at 11:25:31AM -0500, David Malcolm wrote: On Wed, 2013-11-06 at 14:11 +0100, Jakub Jelinek wrote: On Wed, Nov 06, 2013 at 02:00:56PM +0100, Richard Biener wrote: Well, what else, besides as_a or keeping the current global accessor functions would you propose? I'd prefer to keep the current accessors. Whether there is a class hierarchy or we keep unions doesn't affect the code too much (unless it makes debugging experience harder, as numerous past C++ification changes have done :( ). What specific things have become more difficult to debug due to the changes you refer to? I can have a look at improving them by extending gdbhooks.py, if that is desirable. I'm refering to various things that have been mentioned already multiple times, mostly from the switch to C++, like: 1) p debug_tree (0x718b3690) not working anymore, typically one would use this when just cut'n'pasting addresses seen in backtrace, other debug_tree etc., one now has to add explicitly a cast p debug_tree ((tree)0x718b3690). Perhaps this can be cured just by adding extra overloads for a few most popular debug_* routines that weren't initially overloaded, add them an overload with uintptr_t type or similar that will just cast it to the right type and call the original routine 2) tab completion often not working well for hundreds of functions, say type p gimple_can_mtab When gcc was written in C, this would complete it just fine into p gimple_can_merge_blocks_p but GDB is confused by the forward declarations and offers you: p gimple_can_merge_blocks_p(basic_block and doesn't autocomplete further from there, so, for functions that aren't really overloaded (fortunately so far most of them) one has to manually remove all the cruft until and including the (, or has to complete from memory or similar. The issue is that gdb sees: gimple_can_merge_blocks_p(basic_block, basic_block) gimple_can_merge_blocks_p(basic_block_def*, basic_block_def*) gimple_can_merge_blocks_p(basic_block_def*, basic_block_def*)::__FUNCTION__ and considers the two different thing, but of course: typedef struct basic_block_def *basic_block; and thus you really don't care which one is chosen from these, but preferrably just one, not both. 3) step got significantly worse over time, we have skip_file tree.h, but with all the recent header reorganizations gdbinit.in hasn't been updated on which other headers would greatly benefit from being skipped by default. Certainly gimple.h IMHO, and various new headers that just contain small inline accessors 4) the gdbhooks.py printing is nice, but (talking just from memory), it is tied to just one spelling of various types, so say if something has basic_block type, it is printed nicely, but if it has basic_block_def * type, it isn't 5) the new improved C++ vec.h is harder to use in the debugger This list is still very much incomplete. Jakub
Disable floating-point contraction in ISO C conformance modes
In http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01131.html, I noted that, as I'd previously noted in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01951.html, ISO C conformance modes should set -ffp-contract=on (= -ffp-contract=off) rather than the default -ffp-contract=fast, given the ISO C restriction to contraction taking place within source language expressions. No-one indicated any reason not to do this, so I've now applied this patch to do so (and to make an explicit -ffp-contract=fast in ISO C mode imply __GCC_IEC_559=0). Bootstrapped with no regressions on x86_64-unknown-linux-gnu. c-family: 2013-11-06 Joseph Myers jos...@codesourcery.com * c-opts.c (c_common_post_options): Set -ffp-contract=off in C standards modes. * c-cppbuiltin.c (cpp_iec_559_value): Consider -ffp-contract=fast to mean lack of IEEE 754 support. testsuite: 2013-11-06 Joseph Myers jos...@codesourcery.com * gcc.dg/torture/c99-contract-1.c: New test. Index: gcc/c-family/c-cppbuiltin.c === --- gcc/c-family/c-cppbuiltin.c (revision 204453) +++ gcc/c-family/c-cppbuiltin.c (working copy) @@ -726,16 +726,19 @@ cpp_iec_559_value (void) ret = 0; /* In strict C standards conformance mode, consider unpredictable - excess precision to mean lack of IEEE 754 support. ??? The same - should apply to unpredictable contraction, but at present - standards conformance options do not enable conforming - contraction. For C++, and outside strict conformance mode, do - not consider these options to mean lack of IEEE 754 support. */ + excess precision to mean lack of IEEE 754 support. The same + applies to unpredictable contraction. For C++, and outside + strict conformance mode, do not consider these options to mean + lack of IEEE 754 support. */ if (flag_iso !c_dialect_cxx () TARGET_FLT_EVAL_METHOD != 0 flag_excess_precision_cmdline != EXCESS_PRECISION_STANDARD) ret = 0; + if (flag_iso + !c_dialect_cxx () + flag_fp_contract_mode == FP_CONTRACT_FAST) +ret = 0; /* Various options are contrary to IEEE 754 semantics. */ if (flag_unsafe_math_optimizations Index: gcc/c-family/c-opts.c === --- gcc/c-family/c-opts.c (revision 204453) +++ gcc/c-family/c-opts.c (working copy) @@ -827,6 +827,15 @@ c_common_post_options (const char **pfilename) ? EXCESS_PRECISION_STANDARD : EXCESS_PRECISION_FAST); + /* ISO C restricts floating-point expression contraction to within + source-language expressions (-ffp-contract=on, currently an alias + for -ffp-contract=off). */ + if (flag_iso + !c_dialect_cxx () + (global_options_set.x_flag_fp_contract_mode + == (enum fp_contract_mode) 0)) +flag_fp_contract_mode = FP_CONTRACT_OFF; + /* By default we use C99 inline semantics in GNU99 or C99 mode. C99 inline semantics are not supported in GNU89 or C89 mode. */ if (flag_gnu89_inline == -1) Index: gcc/testsuite/gcc.dg/torture/c99-contract-1.c === --- gcc/testsuite/gcc.dg/torture/c99-contract-1.c (revision 0) +++ gcc/testsuite/gcc.dg/torture/c99-contract-1.c (revision 0) @@ -0,0 +1,21 @@ +/* Test floating-point contraction occurs only within source language + expressions. */ +/* { dg-do run } */ +/* { dg-options -std=c99 -pedantic-errors } */ + +extern void abort (void); +extern void exit (int); + +volatile float a = 1 + 0x1p-23f, b = 1 - 0x1p-23f, c = -1; + +int +main (void) +{ + float av = a, bv = b, cv = c; + float p = av * bv; + float r = p + cv; + if (r == 0) +exit (0); + else +abort (); +} -- Joseph S. Myers jos...@codesourcery.com
Re: [RFA][PATCH] Isolate erroneous paths optimization
On 11/06/13 08:27, Jakub Jelinek wrote: On Wed, Nov 06, 2013 at 04:23:06PM +0100, Richard Biener wrote: In that case I think it would be fine to run the isolate paths optimization, but to not omit the actual dereference of the NULL pointer (possibly the dereference could be followed by a trap). Yeah, we need the trap to properly end the BB (even if that is a waste instruction generated). BTW, why do we generate in this optimization __builtin_trap rather than just __builtin_unreachable ()? The former still generates some code (abort, some aborting instruction, ...), while the former is just an assertion that valid code will not reach it. Because if you do reach the site, you really want to halt the program to avoid potential security exploits. I'm actually of the opinion that builtin_unreachable should be trapping as well. jeff
Re: patch implementing a new pass for register-pressure relief through live range shrinkage
On 11/6/2013, 4:17 AM, Richard Biener wrote: On Tue, Nov 5, 2013 at 4:35 PM, Vladimir Makarov vmaka...@redhat.com wrote: I'd like to add a new experimental optimization to the trunk. This optimization was discussed on RA BOF of this summer GNU Cauldron. It is a register pressure relief through live-range shrinkage. It is implemented on the scheduler base and uses register-pressure insn scheduling infrastructure. By rearranging insns we shorten pseudo live-ranges and increase a chance to them be assigned to a hard register. The code looks pretty simple but there are a lot of works behind this patch. I've tried about ten different versions of this code (different heuristics for two currently existing register-pressure algorithms). I think it is *upto target maintainers* to decide to use or not to use this optimization for their targets. I'd recommend to use this at least for x86/x86-64. I think any OOO processor with small or moderate register file which does not use the 1st insn scheduling might benefit from this too. On SPEC2000 for x86/x86-64 (I use Haswell processor, -O3 with general tuning), the optimization usage results in smaller code size in average (for floating point and integer benchmarks in 32- and 64-bit mode). The improvement better visible for SPECFP2000 (although I have the same improvement on x86-64 SPECInt2000 but it might be attributed mostly mcf benchmark unstability). It is about 0.5% for 32-bit and 64-bit mode. It is understandable, as the optimization has more opportunities to improve the code on longer BBs. Different from other heuristic optimizations, I don't see any significant worse performance. It gives practically the same or better performance (a few benchmarks imporoved by 1% or more upto 3%). The single but significant drawback is additional compilation time (4%-6%) as the 1st insn scheduling pass is quite expensive. So I'd recommend target maintainers to switch it on only for -Ofast. Generally I'd not recomment viewing -Ofast as -O4 but as -O3 plus generally unsafe optimizations. So I'd not enable it for -Ofast but for -O3 - possibly also with -Os if indeed the main motivation is also code-size improvements (-Os is a similar beast as -O3, spend as much time as you can on optimizing size). Ok. Probably my recommendation is wrong. It is actually upto target maintainers to decide when to use the optimization and or use it at all for default (may be they just decide to use it only for SPEC reporting). I guess that in some time we will need to use something like -O4 for greedy algorithms (there are a lot of researches in this area, e.g. I am reading an article about optimal register-pressure sensitive insn scheduling but the optimization can be constrained for time, for example 1ms for each insn, and still to produce better results than the current heuristics). I am sure such algorithms will be coming. Btw, thanks for working on this. How does it relate to -fsched-pressure? It is based on -fsched-pressure infrastructure but has different heuristics and goals. GCC with 1st insn scheduling even with -fsched-pressure still produces worse results on mainstream x86/x86-64 processors that GCC without it. I've also tried -flive-range-shrinkage -fschedule-insns -fsched-pressure, but just -flive-range-shrinkage is better for x86/x86-64. By the way, LLVM uses insn-scheduling for x86/x86-64 before RA, but it goal is only register-pressure decrease (for x86, for x86-64 it is a bit more complicated). So with this optimization we are just catching up with LLVM (which is unusual for us in optimization area). Does it treat all register classes the same? On x86 mostly the few fixed registers for some of the integer pipeline instructions hurt, x86_64 has enough general and FP registers? It treats them the same (although it is different for different classes as they have different number of available regs). It is always some kind of approximation as we use register pressure classes here not the classes which will be actually used for RA. It is even more complicated as IRA actually uses dynamic classes (only sets of regs which are profitable, e.g. it can be different from classes defined in the target file as reg in classes are caller-saved or some specific hard regs are used for arg passing). It makes graph coloring better for irregular register file architectures. In whole as I remember, dynamic classes gave about 1% improvement even for ppc. I should say that presence of hard regs in RTL (e.g. for parameter passing) is still a challenge for live-range shrinkage and register-pressure scheduling. It should be addressed somehow.
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, 2013-11-06 at 10:42 -0600, Bill Schmidt wrote: On Wed, 2013-11-06 at 15:01 +0100, Richard Biener wrote: On Wed, Nov 6, 2013 at 2:24 PM, Tejas Belagod tbela...@arm.com wrote: Hi, The attached patch eliminates moves of the form set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0] i.e. eliminates lower lane moves between src and dst where src and dst are the same register and this causes rtl to instead use the destination register in the required mode. Also, if my understanding of Big-Endian is correct, this should be safe for big-endian targets as well. I've bootstrapped this on x64_64, regressed on aarch64-none-elf, aarch64_be-none-elf. OK for trunk? It looks good to me (but I'm also wondering about bigendian). Bill? Yes, that should be ok. I can run a regression test to confirm. Never mind, I agree with Richard S. The direction of the ordering is fine, but an offset should be needed. Bill Thanks, Bill Can you add a testcase where this has an effect on code generation? Thanks, Richard. Thanks, Tejas Belagod ARM. 2013-11-06 Tejas Belagod tejas.bela...@arm.com gcc/ * rtlanal.c (set_noop_p): Return nonzero in case of redundant vec_select for same src and dst. diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c index 9769b69..3e434cd 100644 --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1180,6 +1180,25 @@ set_noop_p (const_rtx set) dst = SUBREG_REG (dst); } + /* This is big-endian-safe because the elements are kept in target + memory order. So, for eg. PARALLEL element value of 2 is the same in + either endian-ness. */ + if (GET_CODE (src) == VEC_SELECT + REG_P (XEXP (src, 0)) REG_P (dst) + REGNO (XEXP (src, 0)) == REGNO (dst)) +{ + rtx par = XEXP (src, 1); + int i; + + for (i = 0; i XVECLEN (par, 0); i++) + { + rtx tem = XVECEXP (par, 0, i); + if (!CONST_INT_P (tem) || INTVAL (tem) != i) + return 0; + } + return 1; +} + return (REG_P (src) REG_P (dst) REGNO (src) == REGNO (dst)); }
Re: [Patch, RTL] Eliminate redundant vec_select moves.
Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: + /* This is big-endian-safe because the elements are kept in target + memory order. So, for eg. PARALLEL element value of 2 is the same in + either endian-ness. */ + if (GET_CODE (src) == VEC_SELECT + REG_P (XEXP (src, 0)) REG_P (dst) + REGNO (XEXP (src, 0)) == REGNO (dst)) +{ + rtx par = XEXP (src, 1); + int i; + + for (i = 0; i XVECLEN (par, 0); i++) + { + rtx tem = XVECEXP (par, 0, i); + if (!CONST_INT_P (tem) || INTVAL (tem) != i) + return 0; + } + return 1; +} + I think for big endian it needs to be: INTVAL (tem) != i + base where base is something like: int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0); E.g. a big-endian V4HI looks like: msb lsb and shortening it to say V2HI only gives the low 32 bits: msb lsb But, in this case we want msb lsb It depends on whether the result occupies a full register or not. I was thinking of the case where it didn't, but I realise now you were thinking of the case where it did. And yeah, my suggestion doesn't cope with that... I was under the impression that the const vector parallel for vec_select represents the element indexes of the array in memory order. Therefore, in bigendian, msb lsb element a[0] a[1] a[2] a[3] and in littleendian msb lsb element a[3] a[2] a[1] a[0] Right. But if an N-bit value is stored in a register, it's assumed to occupy the lsb of the register and the N-1 bits above that. The other bits in the register are don't-care. E.g., leaving vectors to one side, if you have: (set (reg:HI N) (truncate:SI (reg:SI N))) on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this: msb lsb 01234567 4567 rather than: msb lsb 01234567 0123 for both endiannesses. The same principle applies to vectors. The lsb of the register is always assumed to be significant. So maybe the original patch was correct for partial-register and full-register results on little-endian, but only for full-register results on big-endian. Ah, ok! I think I get it. By eliminating set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0] using the check INTVAL (tem) != i, I'm essentially making subsequent operations use (reg:V2DI n) in DI mode which is a partial register result and this gives me the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial register mode, I have to make sure the correct elements coincide with the lsb in big-endian... Thanks for your input, I'll apply the offset correction for big-endian you suggested. I'll respin the patch. Thanks, Tejas.
Re: [PATCH]Fix computation of offset in ivopt
Hi, bin.cheng bin.ch...@arm.com writes: Index: gcc/tree-ssa-loop-ivopts.c === --- gcc/tree-ssa-loop-ivopts.c(revision 203267) +++ gcc/tree-ssa-loop-ivopts.c(working copy) @@ -2037,12 +2037,12 @@ find_interesting_uses (struct ivopts_data *data) static tree strip_offset_1 (tree expr, bool inside_addr, bool top_compref, - unsigned HOST_WIDE_INT *offset) + HOST_WIDE_INT *offset) { tree op0 = NULL_TREE, op1 = NULL_TREE, tmp, step; enum tree_code code; tree type, orig_type = TREE_TYPE (expr); - unsigned HOST_WIDE_INT off0, off1, st; + HOST_WIDE_INT off0, off1, st; tree orig_expr = expr; STRIP_NOPS (expr); @@ -2133,19 +2133,32 @@ strip_offset_1 (tree expr, bool inside_addr, bool break; case COMPONENT_REF: - if (!inside_addr) - return orig_expr; + { + tree field; - tmp = component_ref_field_offset (expr); - if (top_compref -cst_and_fits_in_hwi (tmp)) - { - /* Strip the component reference completely. */ - op0 = TREE_OPERAND (expr, 0); - op0 = strip_offset_1 (op0, inside_addr, top_compref, off0); - *offset = off0 + int_cst_value (tmp); - return op0; - } + if (!inside_addr) + return orig_expr; + + tmp = component_ref_field_offset (expr); + field = TREE_OPERAND (expr, 1); + if (top_compref + cst_and_fits_in_hwi (tmp) + cst_and_fits_in_hwi (DECL_FIELD_BIT_OFFSET (field))) While comparing output for wide-int and mainline, I noticed that this condition is now always false on x86_64, since DECL_FIELD_BIT_OFFSET is a 128-bit bitsizetype and since cst_and_fits_in_hwi rejects constants with precision greater than HWI: if (TREE_CODE (x) != INTEGER_CST) return false; if (TYPE_PRECISION (TREE_TYPE (x)) HOST_BITS_PER_WIDE_INT) return false; Should this be host_integerp (DECL_FIELD_BIT_OFFSET (field), 0) instead? + { + HOST_WIDE_INT boffset, abs_off; + + /* Strip the component reference completely. */ + op0 = TREE_OPERAND (expr, 0); + op0 = strip_offset_1 (op0, inside_addr, top_compref, off0); + boffset = int_cst_value (DECL_FIELD_BIT_OFFSET (field)); + abs_off = abs_hwi (boffset) / BITS_PER_UNIT; + if (boffset 0) + abs_off = -abs_off; + + *offset = off0 + int_cst_value (tmp) + abs_off; + return op0; + } + } break; Thanks, Richard
Re: Re-factor tree.h - Part 1
On Wed, Nov 6, 2013 at 2:04 AM, Marc Glisse marc.gli...@inria.fr wrote: On Tue, 5 Nov 2013, Diego Novillo wrote: This is the first patch in a series of patches to cleanup tree.h to reduce the exposure it has all over the compiler. In this patch, I'm moving functions that are used once into the files that use them, and make them private to that file. These functions were declared extern in tree.h and called from exactly one place. I am not a big fan of doing it so automatically. For instance widest_int_cst_value should imho remain next to int_cst_value since they mostly share the same implementation. Doing this also doesn't promote code reuse: if I am looking for a function that does some basic operation on trees, I won't only need to look in the file that is semantically relevant, Yeah, that was what I was trying to impose. Functions that semantically make better sense in the place where they're called from. I filtered functions that were used once but would not make much sense to move (for instance, the init functions). How about this. Below is the list of functions that I took out of tree.h. They are either not used or used only once, in which case I moved them (and their private transitive closure) over to the file that uses them. There are more, but these are the ones that I originally considered to be too specific to the user file to be globally exposed. Which ones would you folks keep global? I have marked some that we may decide to keep extern and one, which I would not mind to not move at all. I also marked the functions I removed and the ones that I just made private to their definition file: Moved to builtins.c: more_const_call_expr_args_p expand_stack_restore expand_stack_save Moved to cfgexpand.c: expand_main_function stack_protect_prologue expand_asm_stmt expand_computed_goto expand_goto expand_return Moved to cgraphclones.c: build_function_decl_skip_args Moved to explow.c: tree_expr_size (maybe keep extern?) Moved to expr.c: fields_length Moved to fold-const.c: size_low_cst Moved to gimple-fold.c: truth_type_for (maybe keep extern?) Moved to symtab.c: decl_assembler_name_hash decl_assembler_name_equal Moved to trans-mem.c: is_tm_safe_or_pure Moved to tree-eh.c: in_array_bounds_p range_in_array_bounds_p Moved to tree-ssa-dom.c: iterative_hash_exprs_commutative Moved to tree-ssa-math-opts.c: widest_int_cst_value (maybe keep extern?) Moved to tree-vrp.c: fold_unary_to_constant (?) Moved to cp/call.c ctor_to_vec Moved to cp/decl.c: supports_one_only chain_member Moved to java/class.c: build_method_type Removed (not used anywhere) build_type_no_quals omp_remove_redundant_declare_simd_attrs fold_build3_initializer_loc real_twop print_vec_tree list_equal_p ssa_name_nonnegative_p addr_expr_of_non_mem_decl_p save_vtable_map_decl Made static (only used in its defining file) stabilize_reference_1 tree_expr_nonzero_p tree_invalid_nonnegative_warnv_p tree_expr_nonzero_warnv_p fold_builtin_snprintf_chk validate_arglist simple_cst_list_equal lookup_scoped_attribute_spec get_attribute_namespace fini_object_sizes Thanks. Diego.
Re: Re-factor tree.h - Part 1
On Wed, Nov 6, 2013 at 11:54 AM, Jeff Law l...@redhat.com wrote: On 11/06/13 00:04, Marc Glisse wrote: On Tue, 5 Nov 2013, Diego Novillo wrote: This is the first patch in a series of patches to cleanup tree.h to reduce the exposure it has all over the compiler. In this patch, I'm moving functions that are used once into the files that use them, and make them private to that file. These functions were declared extern in tree.h and called from exactly one place. I am not a big fan of doing it so automatically. For instance widest_int_cst_value should imho remain next to int_cst_value since they mostly share the same implementation. Doing this also doesn't promote code reuse: if I am looking for a function that does some basic operation on trees, I won't only need to look in the file that is semantically relevant, I'll also need to randomly grep through possible users to see if I should revert that part of your patch. On the other hand, most of those functions you move probably are better off in their new location, so you can ignore my post. What I think makes sense is to review what moved and where/why. I suspect (but have not looked) that much of the movement of code makes sense. Where it doesn't, then obviously we shouldn't make that change. Ah, yes. I just sent this list. Perhaps that's easier to discuss than the patch, which looks like line noise. Thanks. Diego.
[PATCH] Improve VRP assert creation for loops
On Tue, Nov 05, 2013 at 02:00:16PM -0800, Cong Hou wrote: I'm also curious -- did this code show up in a particular benchmark, if so, which one? I didn't find this problem from any benchmark, but from another concern about loop upper bound estimation. Look at the following code: int foo(unsigned int n, int r) { int i; if (n 0) if (n 4) { do { --n; r *= 2; } while (n 0); } return r+n; } In order to get the upper bound of the loop in this function, GCC traverses conditions n4 and n0 separately and tries to get any useful information. But as those two conditions cannot be combined into one due to this issue (note that n0 will be transformed into n!=0), when GCC sees n4, it will consider the possibility that n may be equal to 0, in which case the upper bound is UINT_MAX. If those two conditions can be combined into one, which is n-1=2, then we can get the correct upper bound of the loop. I've looked at the above testcase to see why we aren't able to determine the number of iterations upper bound properly here. That doesn't mean your patch is useless, though I must say I'm far from being convinced it is safe ATM and also it looks like quite ugly special case (trying to undo a VRP optimization but only one single specific case). The first problem is VRP, we end up with: bb 5: # n_1 = PHI n_5(D)(4), n_7(6) # r_3 = PHI r_6(D)(4), r_8(6) # RANGE [0, 4294967295] n_7 = n_1 + 4294967295; # RANGE [-2147483648, 2147483647] NONZERO 0x0fffe r_8 = r_3 * 2; if (n_7 != 0) goto bb 6; else goto bb 7; bb 6: goto bb 5; - missing range on n_1, extremely conservative range on n_7. With the attached patch we get instead: bb 5: # RANGE [1, 3] NONZERO 0x3 # n_1 = PHI n_5(D)(4), n_7(6) # r_3 = PHI r_6(D)(4), r_8(6) # RANGE [0, 2] NONZERO 0x3 n_7 = n_1 + 4294967295; # RANGE [-2147483648, 2147483647] NONZERO 0x0fffe r_8 = r_3 * 2; if (n_7 != 0) goto bb 6; else goto bb 7; bb 6: goto bb 5; The issue is that we use live_on_edge to determine if it is desirable to added ASSERT_EXPRs, but as we walk bbs in RPO order, we first enter the loop through the bb with exit edge and thus live of the latch isn't computed (and, generally the propagation for live ignores dfs back edges anyway), and because in the above live_on_edge ((5)-(6), n_7) is false, we don't add ASSERT_EXPR that n_7 is != 0 in the latch bb, so during iteration we start with n_1 being assumed [1, 3] (that is range of the assertion from the preceeding conditions on n_5(D)), but in the next iteration widen it to [0, 3] because we think n_7 can be [0, 2] in the PHI and thus end up with uselessly wide range because we think the subtraction can wrap around. This patch improves live_on_edge for empty latches, by just looking at the PHIs on loop-header from the latch - header edge and noting which SSA_NAMEs are used there. I had to add -fno-tree-vrp to 4 unroll_*.c tests, because they disable various tree optimizations already and want to test unrolling of loops iterating exactly twice, but with this VRP change VRP is smart enough to replace the PHI argument on the i IV from - # i_13 = PHI i_8(3), 0(2) + # i_13 = PHI 1(3), 0(2) (the loop iterates exactly twice) and RTL unrolling doesn't do the tested thing in that case. But, this should affect only loops that roll exactly twice and those realy should be unrolled already far before, so IMHO it isn't something to try to optimize better at the RTL level. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-11-06 Jakub Jelinek ja...@redhat.com * tree-vrp.c (live_on_edge): If e-dest is an empty latch of some loop, but live[e-dest-index] is not computed, look for SSA_NAMEs used in loop header PHIs from the latch edge. * gcc.dg/unroll_1.c: Add -fno-tree-vrp to dg-options. * gcc.dg/unroll_2.c: Likewise. * gcc.dg/unroll_3.c: Likewise. * gcc.dg/unroll_4.c: Likewise. --- gcc/tree-vrp.c.jj 2013-11-06 08:48:01.0 +0100 +++ gcc/tree-vrp.c 2013-11-06 09:32:19.205104029 +0100 @@ -92,6 +92,42 @@ static sbitmap *live; static bool live_on_edge (edge e, tree name) { + if (live[e-dest-index] == NULL) +{ + /* Handle edges to empty latch blocks. If NAME appears +in loop header phis on edges from latch, return true +and remember those SSA_NAMEs. */ + basic_block bb = e-dest; + if (bb-loop_father + bb-loop_father-latch == bb + single_succ_p (bb) + empty_block_p (bb)) + { + gimple_stmt_iterator gsi; + edge e2 = single_succ_edge (bb); + for (gsi = gsi_start_phis (e2-dest); + !gsi_end_p (gsi); + gsi_next (gsi)) + { + gimple phi = gsi_stmt (gsi); + tree res = gimple_phi_result (phi), arg; + +
[PATCH] Use get_range_info during number of iterations analysis
On Wed, Nov 06, 2013 at 06:13:42PM +0100, Jakub Jelinek wrote: I've looked at the above testcase to see why we aren't able to determine the number of iterations upper bound properly here. And here is a patch that uses get_range_info during # of iterations analysis, so that for your testcase we can actually find out that it has at most 2 latch executions and at most 3 header executions. On that testcase, bound_difference is actually called with n_5(D) + 0x and 0 and we don't have range info for n_5(D), because we had better range info only for the SSA_NAMEs temporarily created for the assertions. But with the previously posted patch, we have reasonable range info on the PHI of the IV at loop-header, and as that PHI has n_5(D) as one of its arguments (the one from preheader edge), we can still use that range info when inside of the loop for # of iters purposes, even when it might be wider range than strictly necessary (but not in this case). So, the patch looks at both get_range_info of var itself and of all the results of PHIs on loop-header that have var as PHI arg from the preheader edge, and ands all the ranges together. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-11-06 Jakub Jelinek ja...@redhat.com * tree-ssa-loop-niter.c: Include tree-ssanames.h. (determine_value_range): Add loop argument. Use get_range_info to improve range. (bound_difference): Adjust caller. --- gcc/tree-ssa-loop-niter.c.jj2013-10-24 10:19:21.0 +0200 +++ gcc/tree-ssa-loop-niter.c 2013-11-06 14:37:47.445220588 +0100 @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. #include diagnostic-core.h #include tree-inline.h #include tree-pass.h +#include tree-ssanames.h #define SWAP(X, Y) do { affine_iv *tmp = (X); (X) = (Y); (Y) = tmp; } while (0) @@ -119,9 +120,12 @@ split_to_var_and_offset (tree expr, tree in TYPE to MIN and MAX. */ static void -determine_value_range (tree type, tree var, mpz_t off, +determine_value_range (struct loop *loop, tree type, tree var, mpz_t off, mpz_t min, mpz_t max) { + double_int minv, maxv; + enum value_range_type rtype = VR_VARYING; + /* If the expression is a constant, we know its value exactly. */ if (integer_zerop (var)) { @@ -130,9 +134,73 @@ determine_value_range (tree type, tree v return; } + get_type_static_bounds (type, min, max); + + /* See if we have some range info from VRP. */ + if (TREE_CODE (var) == SSA_NAME INTEGRAL_TYPE_P (type)) +{ + edge e = loop_preheader_edge (loop); + gimple_stmt_iterator gsi; + + /* Either for VAR itself... */ + rtype = get_range_info (var, minv, maxv); + /* Or for PHI results in loop-header where VAR is used as +PHI argument from the loop preheader edge. */ + for (gsi = gsi_start_phis (loop-header); !gsi_end_p (gsi); gsi_next (gsi)) + { + gimple phi = gsi_stmt (gsi); + double_int minc, maxc; + if (PHI_ARG_DEF_FROM_EDGE (phi, e) == var + (get_range_info (gimple_phi_result (phi), minc, maxc) + == VR_RANGE)) + { + if (rtype != VR_RANGE) + { + rtype = VR_RANGE; + minv = minc; + maxv = maxc; + } + else + { + minv = minv.max (minc, TYPE_UNSIGNED (type)); + maxv = maxv.min (maxc, TYPE_UNSIGNED (type)); + gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) = 0); + } + } + } + if (rtype == VR_RANGE) + { + mpz_t minm, maxm; + gcc_assert (minv.cmp (maxv, TYPE_UNSIGNED (type)) = 0); + mpz_init (minm); + mpz_init (maxm); + mpz_set_double_int (minm, minv, TYPE_UNSIGNED (type)); + mpz_set_double_int (maxm, maxv, TYPE_UNSIGNED (type)); + mpz_add (minm, minm, off); + mpz_add (maxm, maxm, off); + /* If the computation may not wrap or off is zero, then this +is always fine. If off is negative and minv + off isn't +smaller than type's minimum, or off is positive and +maxv + off isn't bigger than type's maximum, use the more +precise range too. */ + if (nowrap_type_p (type) + || mpz_sgn (off) == 0 + || (mpz_sgn (off) 0 mpz_cmp (minm, min) = 0) + || (mpz_sgn (off) 0 mpz_cmp (maxm, max) = 0)) + { + mpz_set (min, minm); + mpz_set (max, maxm); + mpz_clear (minm); + mpz_clear (maxm); + return; + } + mpz_clear (minm); + mpz_clear (maxm); + } +} + /* If the computation may wrap, we know nothing about the value, except for the range of the type. */ - get_type_static_bounds (type, min, max); if (!nowrap_type_p (type))
[PATCH] Fix PR58697
This fixes PR58697 by instead of trying to preserve reference structure when generating initial loads for the predictive commoned loop, just builds an approprate MEM_REF. That now can handle the cases that previously could be discovered as non-trapping (all constant offsets). I don't see a better way avoiding the out-of-bounds array access it currently generates (with preserving the transformation and not regressing other testcases). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2013-11-06 Richard Biener rguent...@suse.de PR tree-optimization/58653 * tree-predcom.c (ref_at_iteration): Rewrite to generate a MEM_REF. (prepare_initializers_chain): Adjust. * gcc.dg/tree-ssa/predcom-6.c: New testcase. * gcc.dg/tree-ssa/predcom-7.c: Likewise. Index: gcc/tree-predcom.c === *** gcc/tree-predcom.c (revision 204396) --- gcc/tree-predcom.c (working copy) *** replace_ref_with (gimple stmt, tree new_ *** 1323,1412 gsi_insert_after (bsi, new_stmt, GSI_NEW_STMT); } ! /* Returns the reference to the address of REF in the ITER-th iteration of !LOOP, or NULL if we fail to determine it (ITER may be negative). We !try to preserve the original shape of the reference (not rewrite it !as an indirect ref to the address), to make tree_could_trap_p in !prepare_initializers_chain return false more often. */ ! static tree ! ref_at_iteration (struct loop *loop, tree ref, int iter) { ! tree idx, *idx_p, type, val, op0 = NULL_TREE, ret; ! affine_iv iv; ! bool ok; ! ! if (handled_component_p (ref)) ! { ! op0 = ref_at_iteration (loop, TREE_OPERAND (ref, 0), iter); ! if (!op0) ! return NULL_TREE; ! } ! else if (!INDIRECT_REF_P (ref) ! TREE_CODE (ref) != MEM_REF) ! return unshare_expr (ref); ! ! if (TREE_CODE (ref) == MEM_REF) ! { ! ret = unshare_expr (ref); ! idx = TREE_OPERAND (ref, 0); ! idx_p = TREE_OPERAND (ret, 0); ! } ! else if (TREE_CODE (ref) == COMPONENT_REF) ! { ! /* Check that the offset is loop invariant. */ ! if (TREE_OPERAND (ref, 2) ! !expr_invariant_in_loop_p (loop, TREE_OPERAND (ref, 2))) ! return NULL_TREE; ! ! return build3 (COMPONENT_REF, TREE_TYPE (ref), op0, !unshare_expr (TREE_OPERAND (ref, 1)), !unshare_expr (TREE_OPERAND (ref, 2))); ! } ! else if (TREE_CODE (ref) == ARRAY_REF) ! { ! /* Check that the lower bound and the step are loop invariant. */ ! if (TREE_OPERAND (ref, 2) ! !expr_invariant_in_loop_p (loop, TREE_OPERAND (ref, 2))) ! return NULL_TREE; ! if (TREE_OPERAND (ref, 3) ! !expr_invariant_in_loop_p (loop, TREE_OPERAND (ref, 3))) ! return NULL_TREE; ! ! ret = build4 (ARRAY_REF, TREE_TYPE (ref), op0, NULL_TREE, ! unshare_expr (TREE_OPERAND (ref, 2)), ! unshare_expr (TREE_OPERAND (ref, 3))); ! idx = TREE_OPERAND (ref, 1); ! idx_p = TREE_OPERAND (ret, 1); ! } ! else ! return NULL_TREE; ! ! ok = simple_iv (loop, loop, idx, iv, true); ! if (!ok) ! return NULL_TREE; ! iv.base = expand_simple_operations (iv.base); ! if (integer_zerop (iv.step)) ! *idx_p = unshare_expr (iv.base); else ! { ! type = TREE_TYPE (iv.base); ! if (POINTER_TYPE_P (type)) ! { ! val = fold_build2 (MULT_EXPR, sizetype, iv.step, !size_int (iter)); ! val = fold_build_pointer_plus (iv.base, val); ! } ! else ! { ! val = fold_build2 (MULT_EXPR, type, iv.step, !build_int_cst_type (type, iter)); ! val = fold_build2 (PLUS_EXPR, type, iv.base, val); ! } ! *idx_p = unshare_expr (val); ! } ! ! return ret; } /* Get the initialization expression for the INDEX-th temporary variable --- 1323,1351 gsi_insert_after (bsi, new_stmt, GSI_NEW_STMT); } ! /* Returns a memory reference to DR in the ITER-th iteration of !the loop it was analyzed in. Append init stmts to STMTS. */ ! static tree ! ref_at_iteration (data_reference_p dr, int iter, gimple_seq *stmts) { ! tree off = DR_OFFSET (dr); ! tree coff = DR_INIT (dr); ! if (iter == 0) ! ; ! else if (TREE_CODE (DR_STEP (dr)) == INTEGER_CST) ! coff = size_binop (PLUS_EXPR, coff, ! size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter))); else ! off = size_binop (PLUS_EXPR, off, ! size_binop (MULT_EXPR, DR_STEP (dr), ssize_int (iter))); ! tree addr = fold_build_pointer_plus (DR_BASE_ADDRESS (dr), off); ! addr = force_gimple_operand_1 (addr, stmts, is_gimple_mem_ref_addr, !NULL_TREE); ! return fold_build2 (MEM_REF, TREE_TYPE (DR_REF (dr)), !
Re: [PATCH v2] Fix libgfortran cross compile configury w.r.t newlib
On 24 October 2013 17:47, Steve Ellcey sell...@mips.com wrote: I am not sure how we would fix the build issue to allow us to not hardcode the newlib configure details into the libgfortran configure script. The linker script that needs to be used to get a good link is different depending on what options one is compiling with on a multilib MIPS system and I have no idea on how one could create (or use) a dummy linker script. Ideally, I think we would want a check that does not depend on linking at all. Note that this problem is not just in libgfortran, it affects libstdc++ and libjava too and those configure scripts also have hardcoded the newlib assumptions. The only difference between libgfortran and the other two libraries is that the newlib assumptions for libgfortran are not static like they are for libstdc++ and libjava. They vary (for one function) based on whether or not long double is supported. Exactly, hard wiring the newlib interface into the configury of other libraries is questionable, but the patch applied to libgfortran goes beyond hard wiring details of the interface, it hard wires incorrect details of the interface when sizeof(long double) != sizeof(double). The argument that the patch is OK because libjava and libstdc++ also use this idiom, is also rather questionable, because the libgfortran patch goes beyond the idiom used in those other libraries by hard wiring an assumption that does not hold universally. On that basis, I think the the libgfortran patch should be reverted since it caused a regression. /Marcus
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
Hi, On Wed, 6 Nov 2013, Ilya Enkovich wrote: Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Some vars address can be used for better bounds deriving though. Well, it clearly shows it's a get me good results issue and not a correctness issue. And for good results you trade in missed general optimizations. That doesn't sound very good to me. That is definitely a correctness issue. Compiler cannot assume bounds by pointer value ignoring bounds passed to the function. Of course they can. They assume unbounded for unknown bounds and emit correct code. It's merely more conservative than you'd like (as the function got passed tighter bounds than unbounded), and hence no correctness but a QoI problem. Ciao, Michael.
Re: [Patch, RTL] Eliminate redundant vec_select moves.
Tejas Belagod tbela...@arm.com writes: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: + /* This is big-endian-safe because the elements are kept in target + memory order. So, for eg. PARALLEL element value of 2 is the same in + either endian-ness. */ + if (GET_CODE (src) == VEC_SELECT + REG_P (XEXP (src, 0)) REG_P (dst) + REGNO (XEXP (src, 0)) == REGNO (dst)) +{ + rtx par = XEXP (src, 1); + int i; + + for (i = 0; i XVECLEN (par, 0); i++) + { + rtx tem = XVECEXP (par, 0, i); + if (!CONST_INT_P (tem) || INTVAL (tem) != i) + return 0; + } + return 1; +} + I think for big endian it needs to be: INTVAL (tem) != i + base where base is something like: int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0); E.g. a big-endian V4HI looks like: msb lsb and shortening it to say V2HI only gives the low 32 bits: msb lsb But, in this case we want msb lsb It depends on whether the result occupies a full register or not. I was thinking of the case where it didn't, but I realise now you were thinking of the case where it did. And yeah, my suggestion doesn't cope with that... I was under the impression that the const vector parallel for vec_select represents the element indexes of the array in memory order. Therefore, in bigendian, msb lsb element a[0] a[1] a[2] a[3] and in littleendian msb lsb element a[3] a[2] a[1] a[0] Right. But if an N-bit value is stored in a register, it's assumed to occupy the lsb of the register and the N-1 bits above that. The other bits in the register are don't-care. E.g., leaving vectors to one side, if you have: (set (reg:HI N) (truncate:SI (reg:SI N))) on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this: msb lsb 01234567 4567 rather than: msb lsb 01234567 0123 for both endiannesses. The same principle applies to vectors. The lsb of the register is always assumed to be significant. So maybe the original patch was correct for partial-register and full-register results on little-endian, but only for full-register results on big-endian. Ah, ok! I think I get it. By eliminating set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0] using the check INTVAL (tem) != i, I'm essentially making subsequent operations use (reg:V2DI n) in DI mode which is a partial register result and this gives me the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial register mode, I have to make sure the correct elements coincide with the lsb in big-endian... Thanks for your input, I'll apply the offset correction for big-endian you suggested. I'll respin the patch. Thanks. Just for avoidance of doubt, the result might be a full or partial register, depending on the mode and target. I was trying to correct myself by agreeing that your original was right and mine was wrong for big-endian if the result is a full register. I don't know if there are existing helper functions for this kind of thing. Richard
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
Jakub == Jakub Jelinek ja...@redhat.com writes: Jakub 1) p debug_tree (0x718b3690) Jakubnot working anymore, typically one would use this when just Jakubcut'n'pasting addresses seen in backtrace, other debug_tree Jakubetc., one now has to add explicitly a cast p debug_tree Jakub((tree)0x718b3690). Perhaps this can be cured just by Jakubadding extra overloads for a few most popular debug_* routines Jakubthat weren't initially overloaded, add them an overload with Jakubuintptr_t type or similar that will just cast it to the right Jakubtype and call the original routine Last time this came up, we added the feature to gdb. You have to enable it, though. Use set check type off. Tom
Clean up configure glibc version detection, add --with-glibc-version
When bootstrapping a cross toolchain with GCC and glibc, it's desirable to keep down the number of GCC builds needed: to be able to build an initial static-only C-only GCC, use that to build glibc, and then build the full compiler with the resulting glibc. The aim is that if glibc were then rebuilt with the full compiler, the results would be identical to the glibc built with the initial compiler. (See https://sourceware.org/ml/libc-alpha/2012-03/msg00960.html for more on how ideally this might work; really it should only be target libraries that need rebuilding, not the compiler at all.) In support of this, I put various changes into glibc 2.17, and other changes into GCC 4.8 to make static libgcc sufficiently similar in an initial bootstrap build and a full build so as not to affect the resulting glibc binaries (see http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01219.html and http://gcc.gnu.org/ml/gcc-patches/2012-08/msg01462.html). However, while this solved the problem of stable bootstrapped glibc binaries with just two GCC builds for ARM, it did not do so for all architectures. Specifically, for architectures with architecture-specific stack-protector code in GCC, the first bootstrap GCC gets configured thinking glibc lacks the corresponding glibc support, meaning it is configured in a way incompatible with building glibc (but compatible with installing glibc headers if you do a three-compiler bootstrap, with a second compiler built after glibc was configured with the first compiler to install glibc headers, that second compiler used to build glibc, and a third compiler built after that). That can be worked around by setting gcc_cv_libc_provides_ssp=yes when configuring GCC, but setting an internal configure variable like that shouldn't be needed. Instead, this patch provides a properly documented configure option --with-glibc-version to use to specify the minimum glibc version a given compiler will target, for this case and also for more obscure cases where the autodetection may not work (e.g. if a sysroot headers suffix is used so the headers aren't found in the location expected by the configure tests). Different configure tests detected and used glibc versions in different ways. The stack-protector test checked for a glibc version number in headers, the long-double-128 test checked for target architecture and __LONG_DOUBLE_MATH_OPTIONAL, and the gnu-unique-object test used ldd --version and so only worked at all for native toolchains. After this patch, the configure script sets up glibc_version_major and glibc_version_minor variables (from the configure option if used, or headers if not used), and all those tests use those variables (in addition to whatever tests not of glibc version numbers are still relevant - __LONG_DOUBLE_MATH_OPTIONAL is kept as a fallback in the long-double-128 tests because uClibc defines it but pretends to be glibc 2.2). Thus, a documented configure option can be used to get correct stack-protector support in a bootstrap compiler, while the fixed gnu-unique-object test works for cross compilation. Any future cases where GCC needs to know the target glibc version number can also use the same configuration logic. Note that this patch does *not* address the --enable-initfini-array differences between native and cross compilation. I believe Thomas intends to deal with those (in GCC and elsewhere in the toolchain). I think there the correct default is to enable the feature by default for ELF targets (since it's about using a standard feature of ELF) and then blacklist (if necessary with version number checks) particular targets found to be broken, without doing anything special for native tools. Bootstrapped with no regressions on x86_64-unknown-linux-gnu. OK to commit? 2013-11-06 Joseph Myers jos...@codesourcery.com * acinclude.m4 (GCC_GLIBC_VERSION_GTE_IFELSE): New configure macro. * configure.ac: Determine target_header_dir earlier. (--with-glibc-version): New configure option. Use GCC_GLIBC_VERSION_GTE_IFELSE in enable_gnu_unique_object, gcc_cv_libc_provides_ssp and gcc_cv_target_ldbl128 tests. * configure: Regenerate. * doc/install.texi (--enable-gnu-unique-object): Don't refer to native toolchains for default. (--with-glibc-version): Document. Index: gcc/acinclude.m4 === --- gcc/acinclude.m4(revision 204453) +++ gcc/acinclude.m4(working copy) @@ -561,3 +561,18 @@ dnl Make sure that build_exeext is looked for AC_DEFUN([gcc_AC_BUILD_EXEEXT], [ ac_executable_extensions=$build_exeext]) +dnl GCC_GLIBC_VERSION_GTE_IFELSE(MAJOR, MINOR, IF-TRUE, IF-FALSE) +dnl - +dnl If the target glibc version ($glibc_version_major.$glibc_version_minor) +dnl is at least MAJOR.MINOR, call IF-TRUE, otherwise call IF-FALSE. +AC_DEFUN([GCC_GLIBC_VERSION_GTE_IFELSE],
Re: [PATCH, MPX, 2/X] Pointers Checker [7/25] Suppress BUILT_IN_CHKP_ARG_BND optimizations.
2013/11/6 Michael Matz m...@suse.de: Hi, On Wed, 6 Nov 2013, Ilya Enkovich wrote: Well, but clearly you can see that bounds for p == NULL are useless and thus there is no problem with the transform. This example just demonstrates the issue. I may use some var's address in comparison with the similar result. And value does not always allow to determine bounds, because pointers with the same value may have different bounds. Some vars address can be used for better bounds deriving though. Well, it clearly shows it's a get me good results issue and not a correctness issue. And for good results you trade in missed general optimizations. That doesn't sound very good to me. That is definitely a correctness issue. Compiler cannot assume bounds by pointer value ignoring bounds passed to the function. Of course they can. They assume unbounded for unknown bounds and emit correct code. It's merely more conservative than you'd like (as the function got passed tighter bounds than unbounded), and hence no correctness but a QoI problem. Well, it depends on what we call 'correct' behavior for checker. I suppose that bound violation caught on -O0 and not caught on -O2 is a compiler issue. That is why I call it correctness issue. When you instrument your code you intentionally slow your program down to improve security. Compiler should have strong reason to behave contrary. Of course, there should be some trade off. But in this particular case I do not see why we should make pointer unchecked. If I just drop this patch and use undefined bounds in all cases arg_bnd call is optimized, I wont get any performance gain. Ilya Ciao, Michael.
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/06/13 11:04, Tom Tromey wrote: Jakub == Jakub Jelinek ja...@redhat.com writes: Jakub 1) p debug_tree (0x718b3690) Jakubnot working anymore, typically one would use this when just Jakubcut'n'pasting addresses seen in backtrace, other debug_tree Jakubetc., one now has to add explicitly a cast p debug_tree Jakub((tree)0x718b3690). Perhaps this can be cured just by Jakubadding extra overloads for a few most popular debug_* routines Jakubthat weren't initially overloaded, add them an overload with Jakubuintptr_t type or similar that will just cast it to the right Jakubtype and call the original routine Last time this came up, we added the feature to gdb. You have to enable it, though. Use set check type off. I'd been mildly annoyed by this behaviour, but not so much as to look for a solution and I can certainly see why gdb is doing what it's doing. I certainly don't consider this something that should stand in the way of moving forward. ISTM that one liner belongs in GCC's .gdbinit. Until then, I'm adding it to my own :-) jeff
Re: [Patch, RTL] Eliminate redundant vec_select moves.
On Wed, 2013-11-06 at 17:34 +, Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: Richard Sandiford wrote: Tejas Belagod tbela...@arm.com writes: + /* This is big-endian-safe because the elements are kept in target + memory order. So, for eg. PARALLEL element value of 2 is the same in + either endian-ness. */ + if (GET_CODE (src) == VEC_SELECT + REG_P (XEXP (src, 0)) REG_P (dst) + REGNO (XEXP (src, 0)) == REGNO (dst)) +{ + rtx par = XEXP (src, 1); + int i; + + for (i = 0; i XVECLEN (par, 0); i++) + { + rtx tem = XVECEXP (par, 0, i); + if (!CONST_INT_P (tem) || INTVAL (tem) != i) + return 0; + } + return 1; +} + I think for big endian it needs to be: INTVAL (tem) != i + base where base is something like: int base = GET_MODE_NUNITS (GET_MODE (XEXP (src, 0))) - XVECLEN (par, 0); E.g. a big-endian V4HI looks like: msb lsb and shortening it to say V2HI only gives the low 32 bits: msb lsb But, in this case we want msb lsb It depends on whether the result occupies a full register or not. I was thinking of the case where it didn't, but I realise now you were thinking of the case where it did. And yeah, my suggestion doesn't cope with that... I was under the impression that the const vector parallel for vec_select represents the element indexes of the array in memory order. Therefore, in bigendian, msb lsb element a[0] a[1] a[2] a[3] and in littleendian msb lsb element a[3] a[2] a[1] a[0] Right. But if an N-bit value is stored in a register, it's assumed to occupy the lsb of the register and the N-1 bits above that. The other bits in the register are don't-care. E.g., leaving vectors to one side, if you have: (set (reg:HI N) (truncate:SI (reg:SI N))) on a 32-bit !TRULY_NOOP_TRUNCATION target, it shortens like this: msb lsb 01234567 4567 rather than: msb lsb 01234567 0123 for both endiannesses. The same principle applies to vectors. The lsb of the register is always assumed to be significant. So maybe the original patch was correct for partial-register and full-register results on little-endian, but only for full-register results on big-endian. Ah, ok! I think I get it. By eliminating set( (reg:DI n) vec_select:DI ( (reg:V2DI n) (parallel [const 0] using the check INTVAL (tem) != i, I'm essentially making subsequent operations use (reg:V2DI n) in DI mode which is a partial register result and this gives me the wrong set of lanes in bigendian. So, if I want to use (reg n) in partial register mode, I have to make sure the correct elements coincide with the lsb in big-endian... Thanks for your input, I'll apply the offset correction for big-endian you suggested. I'll respin the patch. Thanks. Just for avoidance of doubt, the result might be a full or partial register, depending on the mode and target. I was trying to correct myself by agreeing that your original was right and mine was wrong for big-endian if the result is a full register. I don't know if there are existing helper functions for this kind of thing. Tejas, can you please include test cases for both possibilities? The existing test suite is not sufficient (your original patch does not demonstrate regressions on powerpc64 big endian, even though we know it's not correct). Thanks! Bill Richard
Re: [C++ Patch, Java related/RFC] PR 11006
On 11/06/2013 05:42 AM, Andrew Haley wrote: On 11/04/2013 05:21 PM, Jason Merrill wrote: Surely it should be valid to allocate a Java boolean type. Andrew, how should that work? It's not allowed. All objects that are allocated by new must be of class type (i.e. instances of a subclass of java.lang.Object), but boolean is a primitive type. In that case, Paolo's first patch is OK. Jason
Initial submission of OpenACC support integrated into OpenMP's lowering and expansion passes
Hi! Here is our take of adding OpenACC support to GCC's front end and middle end. What I post here (that is, as replies to this email, to gcc-patches only) is still very incomplete, but hopefully enough to outline the general approach, and I'd like to commit these patches to gomp-4_0-branch -- OK? Also, may I additionally commit those patches to trunk that don't have anything specific to OpenACC in them (such as the first two preparation patches), which I assume will save some work when merging between trunk and gomp-4_0-branch later on? Due to the conceptual similarity compared to OpenMP, and that (later) it is reasonable to expect to embed OpenACC directives into OpenMP ones, the approach I've chosen is to directly embed OpenACC handling into the existing OpenMP infrastructure/passes. I had first begun implementing separate OpenACC lowering and expansion passes, but then found myself re-implementing/copying more and more of the existing OpenMP passes' code, realized that makes no sense, and we shall instead handle these directly together in one set of passes -- which also happens to facilitate the interoperation requirement. For the same reasons, OpenACC's runtime library shall be implemented in libgomp next to OpenMP's. Jakub's already approved this approach generally. This patch series doesn't contain any substantial rumtime library work yet; the last patch adds GOACC_parallel, which so far simply branches to GOMP_target. There's more to come. This is in contrast to Samsung's work, who are implementing OpenACC separately from the existing OpenMP support. Yet, I hope we'll be able to share/re-use at least front end and some middle end code. Our work builds on what has (briefly) been discussed before, for example at the GNU Tools Cauldron's Accelerator BOF in this summer. For the moment, while still getting the concepts clear, I'm adding new GENERIC/GIMPLE codes and their handling for OpenACC constructs (for the implementation of OpenACC parallel borrowing from OpenMP target) -- the long-term goal still is to evolve all these into general acceleration abstractions, evolve the GCC OpenMP infrastructure and passes into general acceleration passes. As discussed, it makes sense to do this generalization work later on, once we have a better understanding of what we actually need. We directly strive for OpenACC 2.0 support, skipping OpenACC 1. We're focussing on the C front end implementation first, following on with C++ and Fortran later on. The patches I send here only contain a bare minimum of documentation changes. I already have further documentation changes prepared (source code comments as well as Texinfo manual updates); will polish and send these only once the approach we've chosen has generally been approved. Comments on the approach and/or patches? Grüße, Thomas pgpYa4_ZiFyzF.pgp Description: PGP signature
[gomp4 1/9] Add missing include.
From: Thomas Schwinge tho...@codesourcery.com libgomp/ * libgomp_g.h: Include stddef.h for size_t. --- libgomp/libgomp_g.h | 1 + 1 file changed, 1 insertion(+) diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h index 32c4cf6..577956a 100644 --- libgomp/libgomp_g.h +++ libgomp/libgomp_g.h @@ -29,6 +29,7 @@ #define LIBGOMP_G_H 1 #include stdbool.h +#include stddef.h /* barrier.c */ -- 1.8.1.1
[gomp4 2/9] libgomp: Prepare for testcases without -fopenmp.
From: Thomas Schwinge tho...@codesourcery.com libgomp/ * testsuite/lib/libgomp.exp (libgomp_init): Don't add -fopenmp to ALWAYS_CFLAGS. * testsuite/libgomp.c++/c++.exp (ALWAYS_CFLAGS): Add -fopenmp. * testsuite/libgomp.c/c.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.fortran/fortran.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.graphite/graphite.exp (ALWAYS_CFLAGS): Likewise. --- libgomp/testsuite/lib/libgomp.exp | 3 --- libgomp/testsuite/libgomp.c++/c++.exp | 3 +++ libgomp/testsuite/libgomp.c/c.exp | 3 +++ libgomp/testsuite/libgomp.fortran/fortran.exp | 3 +++ libgomp/testsuite/libgomp.graphite/graphite.exp | 3 +++ 5 files changed, 12 insertions(+), 3 deletions(-) diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp index d1d8bc8..c965147 100644 --- libgomp/testsuite/lib/libgomp.exp +++ libgomp/testsuite/lib/libgomp.exp @@ -169,9 +169,6 @@ proc libgomp_init { args } { # Disable color diagnostics lappend ALWAYS_CFLAGS additional_flags=-fdiagnostics-color=never - -# And, gee, turn on OpenMP. -lappend ALWAYS_CFLAGS additional_flags=-fopenmp } # diff --git libgomp/testsuite/libgomp.c++/c++.exp libgomp/testsuite/libgomp.c++/c++.exp index b336306..88e017e 100644 --- libgomp/testsuite/libgomp.c++/c++.exp +++ libgomp/testsuite/libgomp.c++/c++.exp @@ -11,6 +11,9 @@ set lang_library_path ../libstdc++-v3/src/.libs # Initialize dg. dg-init +# Turn on OpenMP. +lappend ALWAYS_CFLAGS additional_flags=-fopenmp + set blddir [lookfor_file [get_multilibs] libgomp] diff --git libgomp/testsuite/libgomp.c/c.exp libgomp/testsuite/libgomp.c/c.exp index 7dfdf8b..8e902d4 100644 --- libgomp/testsuite/libgomp.c/c.exp +++ libgomp/testsuite/libgomp.c/c.exp @@ -17,6 +17,9 @@ if ![info exists DEFAULT_CFLAGS] then { # Initialize dg. dg-init +# Turn on OpenMP. +lappend ALWAYS_CFLAGS additional_flags=-fopenmp + # Gather a list of all tests. set tests [lsort [find $srcdir/$subdir *.c]] diff --git libgomp/testsuite/libgomp.fortran/fortran.exp libgomp/testsuite/libgomp.fortran/fortran.exp index b7fef29..e0bffe3 100644 --- libgomp/testsuite/libgomp.fortran/fortran.exp +++ libgomp/testsuite/libgomp.fortran/fortran.exp @@ -15,6 +15,9 @@ set quadmath_library_path ../libquadmath/.libs # Initialize dg. dg-init +# Turn on OpenMP. +lappend ALWAYS_CFLAGS additional_flags=-fopenmp + if { $blddir != } { lappend ALWAYS_CFLAGS additional_flags=-fintrinsic-modules-path=${blddir} # Look for a static libgfortran first. diff --git libgomp/testsuite/libgomp.graphite/graphite.exp libgomp/testsuite/libgomp.graphite/graphite.exp index 08aa509..9129964 100644 --- libgomp/testsuite/libgomp.graphite/graphite.exp +++ libgomp/testsuite/libgomp.graphite/graphite.exp @@ -42,6 +42,9 @@ set PARALLEL_CFLAGS -ansi -pedantic-errors -O2 \ # Initialize `dg'. dg-init +# Turn on OpenMP. +lappend ALWAYS_CFLAGS additional_flags=-fopenmp + # Gather a list of all tests. set tests [lsort [find $srcdir/$subdir *.c]] -- 1.8.1.1
[gomp4 5/9] OpenACC: preprocessor definition, Fortran integer parameter.
From: Thomas Schwinge tho...@codesourcery.com gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins): Conditionally define _OPENACC. gcc/fortran/ * cpp.c (cpp_define_builtins): Conditionally define _OPENACC. gcc/testsuite/ * c-c++-common/cpp/openacc-define-1.c: Test _OPENACC. * c-c++-common/cpp/openacc-define-2.c: Likewise. * c-c++-common/cpp/openacc-define-3.c: Likewise. * gfortran.dg/openacc-define-1.f90: Likewise. * gfortran.dg/openacc-define-2.f90: Likewise. * gfortran.dg/openacc-define-3.f90: Likewise. libgomp/ * openacc.f90 (openacc_version): New integer parameter. * openacc_lib.h (openacc_version): Likewise. * testsuite/libgomp.oacc-fortran/openacc_version-1.f: New file. * testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Likewise. --- gcc/c-family/c-cppbuiltin.c | 3 +++ gcc/fortran/cpp.c| 3 +++ gcc/testsuite/c-c++-common/cpp/openacc-define-1.c| 4 gcc/testsuite/c-c++-common/cpp/openacc-define-2.c| 4 gcc/testsuite/c-c++-common/cpp/openacc-define-3.c| 8 gcc/testsuite/gfortran.dg/openacc-define-1.f90 | 4 gcc/testsuite/gfortran.dg/openacc-define-2.f90 | 4 gcc/testsuite/gfortran.dg/openacc-define-3.f90 | 8 libgomp/openacc.f90 | 2 ++ libgomp/openacc_lib.h| 3 +++ libgomp/testsuite/libgomp.oacc-fortran/openacc_version-1.f | 9 + libgomp/testsuite/libgomp.oacc-fortran/openacc_version-2.f90 | 9 + 12 files changed, 61 insertions(+) create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/openacc_version-1.f create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/openacc_version-2.f90 diff --git gcc/c-family/c-cppbuiltin.c gcc/c-family/c-cppbuiltin.c index ed4c82c..d48d96f 100644 --- gcc/c-family/c-cppbuiltin.c +++ gcc/c-family/c-cppbuiltin.c @@ -895,6 +895,9 @@ c_cpp_builtins (cpp_reader *pfile) else if (flag_stack_protect == 1) cpp_define (pfile, __SSP__=1); + if (flag_openacc) +cpp_define (pfile, _OPENACC=201306); + if (flag_openmp) cpp_define (pfile, _OPENMP=201307); diff --git gcc/fortran/cpp.c gcc/fortran/cpp.c index ea53681..58f6cc9 100644 --- gcc/fortran/cpp.c +++ gcc/fortran/cpp.c @@ -169,6 +169,9 @@ cpp_define_builtins (cpp_reader *pfile) cpp_define (pfile, __GFORTRAN__=1); cpp_define (pfile, _LANGUAGE_FORTRAN=1); + if (gfc_option.gfc_flag_openacc) +cpp_define (pfile, _OPENACC=201306); + if (gfc_option.gfc_flag_openmp) cpp_define (pfile, _OPENMP=201107); diff --git gcc/testsuite/c-c++-common/cpp/openacc-define-1.c gcc/testsuite/c-c++-common/cpp/openacc-define-1.c index feaf778..cd37548 100644 --- gcc/testsuite/c-c++-common/cpp/openacc-define-1.c +++ gcc/testsuite/c-c++-common/cpp/openacc-define-1.c @@ -1,2 +1,6 @@ /* { dg-do preprocess } */ /* { dg-require-effective-target fopenacc } */ + +#ifdef _OPENACC +# error _OPENACC defined +#endif diff --git gcc/testsuite/c-c++-common/cpp/openacc-define-2.c gcc/testsuite/c-c++-common/cpp/openacc-define-2.c index a2f3e28..b007e32 100644 --- gcc/testsuite/c-c++-common/cpp/openacc-define-2.c +++ gcc/testsuite/c-c++-common/cpp/openacc-define-2.c @@ -1,3 +1,7 @@ /* { dg-options -fno-openacc } */ /* { dg-do preprocess } */ /* { dg-require-effective-target fopenacc } */ + +#ifdef _OPENACC +# error _OPENACC defined +#endif diff --git gcc/testsuite/c-c++-common/cpp/openacc-define-3.c gcc/testsuite/c-c++-common/cpp/openacc-define-3.c index ce270c3..ccedcd9 100644 --- gcc/testsuite/c-c++-common/cpp/openacc-define-3.c +++ gcc/testsuite/c-c++-common/cpp/openacc-define-3.c @@ -1,3 +1,11 @@ /* { dg-options -fopenacc } */ /* { dg-do preprocess } */ /* { dg-require-effective-target fopenacc } */ + +#ifndef _OPENACC +# error _OPENACC not defined +#endif + +#if _OPENACC != 201306 +# error _OPENACC defined to wrong value +#endif diff --git gcc/testsuite/gfortran.dg/openacc-define-1.f90 gcc/testsuite/gfortran.dg/openacc-define-1.f90 index b961468..42f4073 100644 --- gcc/testsuite/gfortran.dg/openacc-define-1.f90 +++ gcc/testsuite/gfortran.dg/openacc-define-1.f90 @@ -1,3 +1,7 @@ ! { dg-options -cpp } ! { dg-do preprocess } ! { dg-require-effective-target fopenacc } + +#ifdef _OPENACC +# error _OPENACC defined +#endif diff --git gcc/testsuite/gfortran.dg/openacc-define-2.f90 gcc/testsuite/gfortran.dg/openacc-define-2.f90 index 49b714d..8ad1bd5 100644 --- gcc/testsuite/gfortran.dg/openacc-define-2.f90 +++ gcc/testsuite/gfortran.dg/openacc-define-2.f90 @@ -1,3 +1,7 @@ ! { dg-options -cpp -fno-openacc } ! { dg-do preprocess } ! { dg-require-effective-target fopenacc } + +#ifdef _OPENACC +# error _OPENACC defined +#endif diff --git gcc/testsuite/gfortran.dg/openacc-define-3.f90
[gomp4 7/9] OpenACC: Use OpenMP's lowering and expansion passes.
From: Thomas Schwinge tho...@codesourcery.com gcc/ * gimplify.c (gimplify_body): Consider flag_openacc additionally to flag_openmp. * omp-low.c (execute_expand_omp, execute_lower_omp) (gate_diagnose_omp_blocks): Likewise. gcc/testsuite/ * gcc.dg/goacc-gomp/goacc-gomp.exp: New file. * gcc.dg/goacc/goacc.exp: Likewise. --- gcc/gimplify.c | 4 +-- gcc/omp-low.c | 10 --- gcc/testsuite/gcc.dg/goacc-gomp/goacc-gomp.exp | 38 ++ gcc/testsuite/gcc.dg/goacc/goacc.exp | 37 + 4 files changed, 83 insertions(+), 6 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/goacc-gomp/goacc-gomp.exp create mode 100644 gcc/testsuite/gcc.dg/goacc/goacc.exp diff --git gcc/gimplify.c gcc/gimplify.c index 1f18466..30c2b45 100644 --- gcc/gimplify.c +++ gcc/gimplify.c @@ -8803,7 +8803,7 @@ gimplify_body (tree fndecl, bool do_parms) gcc_assert (gimplify_ctxp == NULL); push_gimplify_context (gctx); - if (flag_openmp) + if (flag_openacc || flag_openmp) { gcc_assert (gimplify_omp_ctxp == NULL); if (lookup_attribute (omp declare target, DECL_ATTRIBUTES (fndecl))) @@ -8872,7 +8872,7 @@ gimplify_body (tree fndecl, bool do_parms) nonlocal_vlas = NULL; } - if (flag_openmp gimplify_omp_ctxp) + if ((flag_openacc || flag_openmp) gimplify_omp_ctxp) { delete_omp_context (gimplify_omp_ctxp); gimplify_omp_ctxp = NULL; diff --git gcc/omp-low.c gcc/omp-low.c index 94058af..99811d0 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -8234,7 +8234,8 @@ execute_expand_omp (void) static bool gate_expand_omp (void) { - return (flag_openmp != 0 !seen_error ()); + return ((flag_openacc || flag_openmp) + !seen_error ()); } namespace { @@ -10054,8 +10055,9 @@ execute_lower_omp (void) gimple_seq body; /* This pass always runs, to provide PROP_gimple_lomp. - But there is nothing to do unless -fopenmp is given. */ - if (flag_openmp == 0) + But there is nothing to do unless at least one of -fopenacc or -fopenmp is + given. */ + if (!(flag_openacc || flag_openmp)) return 0; all_contexts = splay_tree_new (splay_tree_compare_pointers, 0, @@ -10484,7 +10486,7 @@ diagnose_omp_structured_block_errors (void) static bool gate_diagnose_omp_blocks (void) { - return flag_openmp != 0; + return flag_openacc || flag_openmp; } namespace { diff --git gcc/testsuite/gcc.dg/goacc-gomp/goacc-gomp.exp gcc/testsuite/gcc.dg/goacc-gomp/goacc-gomp.exp new file mode 100644 index 000..29e9a93 --- /dev/null +++ gcc/testsuite/gcc.dg/goacc-gomp/goacc-gomp.exp @@ -0,0 +1,38 @@ +# Copyright (C) 2006-2013 Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. + +# GCC testsuite that uses the `dg.exp' driver. + +# Load support procs. +load_lib gcc-dg.exp + +if { ![check_effective_target_fopenacc] \ + || ![check_effective_target_fopenmp] } { + return +} + +# Initialize `dg'. +dg-init + +# Main loop. +dg-runtest [lsort [concat \ + [find $srcdir/$subdir *.c] \ + [find $srcdir/c-c++-common/goacc-gomp *.c]]] -fopenacc -fopenmp + +# All done. +dg-finish diff --git gcc/testsuite/gcc.dg/goacc/goacc.exp gcc/testsuite/gcc.dg/goacc/goacc.exp new file mode 100644 index 000..1137c99 --- /dev/null +++ gcc/testsuite/gcc.dg/goacc/goacc.exp @@ -0,0 +1,37 @@ +# Copyright (C) 2006-2013 Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3, or (at your option) +# any later version. +# +# GCC is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. + +# GCC testsuite that uses the `dg.exp' driver. + +# Load support procs. +load_lib gcc-dg.exp + +if ![check_effective_target_fopenacc] { + return +} + +# Initialize `dg'. +dg-init +
[gomp4 3/9] OpenACC: Recognize -fopenacc.
From: Thomas Schwinge tho...@codesourcery.com gcc/c-family/ * c.opt (fopenacc): New option. gcc/fortran/ * lang.opt (fopenacc): New option. * invoke.texi (-fopenacc): Document it. * gfortran.h (gfc_option_t): New member. * options.c (gfc_init_options, gfc_handle_option): Handle it. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_fopenacc): New procedure. gcc/ * doc/invoke.texi (-fopenacc): Document it. * doc/sourcebuild.texi (fopenacc): Document it. gcc/testsuite/ * c-c++-common/cpp/openacc-define-1.c: New file. * c-c++-common/cpp/openacc-define-2.c: Likewise. * c-c++-common/cpp/openacc-define-3.c: Likewise. * gfortran.dg/openacc-define-1.f90: Likewise. * gfortran.dg/openacc-define-2.f90: Likewise. * gfortran.dg/openacc-define-3.f90: Likewise. --- gcc/c-family/c.opt| 4 gcc/doc/invoke.texi | 11 ++- gcc/doc/sourcebuild.texi | 3 +++ gcc/fortran/gfortran.h| 1 + gcc/fortran/invoke.texi | 7 ++- gcc/fortran/lang.opt | 4 gcc/fortran/options.c | 5 + gcc/testsuite/c-c++-common/cpp/openacc-define-1.c | 2 ++ gcc/testsuite/c-c++-common/cpp/openacc-define-2.c | 3 +++ gcc/testsuite/c-c++-common/cpp/openacc-define-3.c | 3 +++ gcc/testsuite/gfortran.dg/openacc-define-1.f90| 3 +++ gcc/testsuite/gfortran.dg/openacc-define-2.f90| 3 +++ gcc/testsuite/gfortran.dg/openacc-define-3.f90| 3 +++ gcc/testsuite/lib/target-supports.exp | 9 + 14 files changed, 59 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/cpp/openacc-define-1.c create mode 100644 gcc/testsuite/c-c++-common/cpp/openacc-define-2.c create mode 100644 gcc/testsuite/c-c++-common/cpp/openacc-define-3.c create mode 100644 gcc/testsuite/gfortran.dg/openacc-define-1.f90 create mode 100644 gcc/testsuite/gfortran.dg/openacc-define-2.f90 create mode 100644 gcc/testsuite/gfortran.dg/openacc-define-3.f90 diff --git gcc/c-family/c.opt gcc/c-family/c.opt index b862eb9..d86d79b 100644 --- gcc/c-family/c.opt +++ gcc/c-family/c.opt @@ -1065,6 +1065,10 @@ fobjc-std=objc1 ObjC ObjC++ Var(flag_objc1_only) Conform to the Objective-C 1.0 language as implemented in GCC 4.0 +fopenacc +C ObjC C++ ObjC++ Var(flag_openacc) +Enable OpenACC + fopenmp C ObjC C++ ObjC++ Var(flag_openmp) Enable OpenMP (implies -frecursive in Fortran) diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index e84bca3..e393139 100644 --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -168,7 +168,8 @@ in the following sections. @gccoptlist{-ansi -std=@var{standard} -fgnu89-inline @gol -aux-info @var{filename} -fallow-parameterless-variadic-functions @gol -fno-asm -fno-builtin -fno-builtin-@var{function} @gol --fhosted -ffreestanding -fopenmp -fms-extensions -fplan9-extensions @gol +-fhosted -ffreestanding -fopenacc -fopenmp -fms-extensions @gol +-fplan9-extensions @gol -trigraphs -traditional -traditional-cpp @gol -fallow-single-precision -fcond-mismatch -flax-vector-conversions @gol -fsigned-bitfields -fsigned-char @gol @@ -1831,6 +1832,14 @@ This is equivalent to @option{-fno-hosted}. @xref{Standards,,Language Standards Supported by GCC}, for details of freestanding and hosted environments. +@item -fopenacc +@opindex fopenacc +@cindex OpenACC accelerator programming +Enable handling of OpenACC. +When @option{-fopenacc} is specified, the +compiler generates accelerated code according to the OpenACC Application +Programming Interface v2.0 @w{@uref{http://www.openacc.org/}}. + @item -fopenmp @opindex fopenmp @cindex OpenMP parallel diff --git gcc/doc/sourcebuild.texi gcc/doc/sourcebuild.texi index 1a70916..8b0031c 100644 --- gcc/doc/sourcebuild.texi +++ gcc/doc/sourcebuild.texi @@ -1787,6 +1787,9 @@ Target supports Graphite optimizations. @item fixed_point Target supports fixed-point extension to C. +@item fopenacc +Target supports OpenACC via @option{-fopenacc}. + @item fopenmp Target supports OpenMP via @option{-fopenmp}. diff --git gcc/fortran/gfortran.h gcc/fortran/gfortran.h index b28edd8..5089691 100644 --- gcc/fortran/gfortran.h +++ gcc/fortran/gfortran.h @@ -2285,6 +2285,7 @@ typedef struct int blas_matmul_limit; int flag_cray_pointer; int flag_d_lines; + int gfc_flag_openacc; int gfc_flag_openmp; int flag_sign_zero; int flag_stack_arrays; diff --git gcc/fortran/invoke.texi gcc/fortran/invoke.texi index eb678d1..46fca59 100644 --- gcc/fortran/invoke.texi +++ gcc/fortran/invoke.texi @@ -120,7 +120,7 @@ by type. Explanations are in the following sections. -ffixed-line-length-none -ffree-form -ffree-line-length-@var{n} @gol -ffree-line-length-none -fimplicit-none
[PATCH, i386]: Fix PR 59021, new vzeroupper instructions generated with -mavx
Hello! Attached patch fixes PR 59021, where new vzeroupper instructions are generated for -mavx after Vlad's useless insn removal patch. The problem was, that we depent on (useless) moves to AVX256 function argument registers in front of the function call to switch the mode to DIRTY mode. This is not true anymore, so call_insn has to switch to DIRTY mode by itself. While looking at this area - always set the mode after the call_insn in MODE_AFTER function. 2013-11-06 Uros Bizjak ubiz...@gmail.com PR target/59021 * config/i386/i386.c (ix86_avx_u128_mode_needed): Require AVX_U128_DIRTY mode for call_insn RTXes that use AVX256 registers. (ix86_avx_u128_mode_needed): Return AVX_U128_DIRTY mode for call_insn RTXes that return in AVX256 register. testsuite/ChangeLog: 2013-11-06 Uros Bizjak ubiz...@gmail.com PR target/59021 * gcc.target/i386/pr59021.c: New test. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}, configured with --with-arch=core-avx-i --with-cpu=core-avx-i --with-fpmath=avx. The patch will be backported to 4.7 and 4.8 branch in a couple of days. Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 204459) +++ config/i386/i386.c (working copy) @@ -15708,7 +15708,7 @@ ix86_avx_u128_mode_needed (rtx insn) rtx arg = XEXP (XEXP (link, 0), 0); if (ix86_check_avx256_register (arg, NULL)) - return AVX_U128_ANY; + return AVX_U128_DIRTY; } } @@ -15828,8 +15828,8 @@ ix86_avx_u128_mode_after (int mode, rtx insn) { bool avx_reg256_found = false; note_stores (pat, ix86_check_avx256_stores, avx_reg256_found); - if (!avx_reg256_found) - return AVX_U128_CLEAN; + + return avx_reg256_found ? AVX_U128_DIRTY : AVX_U128_CLEAN; } /* Otherwise, return current mode. Remember that if insn Index: testsuite/gcc.target/i386/pr59021.c === --- testsuite/gcc.target/i386/pr59021.c (revision 0) +++ testsuite/gcc.target/i386/pr59021.c (working copy) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -mavx -mvzeroupper } */ + +extern void abort (void); + +struct S { + int i1; + int i2; + int i3; +}; + +typedef double v4df __attribute__ ((vector_size (32))); + +extern int foo (v4df, int i1, int i2, int i3, int i4, int i5, struct S s); + +void bar (v4df v, struct S s) +{ + int r = foo (v, 1, 2, 3, 4, 5, s); + if (r) +abort (); +} + +/* { dg-final { scan-assembler-not vzeroupper } } */
Committed patch implementing live range shrinkage.
Here is the final version of the patch I committed as rev.204465. It takes Jeff's proposal into account. The patch was successfully bootstrapped with -flive-range-shrinkage on x86 and x86-64. 2013-11-06 Vladimir Makarov vmaka...@redhat.com * tree-pass.h (make_pass_live_range_shrinkage): New external. * timevar.def (TV_LIVE_RANGE_SHRINKAGE): New. * sched-rgn.c (gate_handle_live_range_shrinkage): New. (rest_of_handle_live_range_shrinkage): Ditto (class pass_live_range_shrinkage): Ditto. (pass_data_live_range_shrinkage): Ditto. (make_pass_live_range_shrinkage): Ditto. * sched-int.h (initialize_live_range_shrinkage): New prototype. (finish_live_range_shrinkage): Ditto. * sched-deps.c (create_insn_reg_set): Make void return value. * passes.def: Add pass_live_range_shrinkage. * ira.c (update_equiv_regs): Don't move if flag_live_range_shrinkage. * haifa-sched.c (live_range_shrinkage_p): New. (initialize_live_range_shrinkage, finish_live_range_shrinkage): New functions. (rank_for_schedule): Add code for pressure relief through live range shrinkage. (schedule_insn): Print more debug info. (sched_init): Setup SCHED_PRESSURE_WEIGHTED for pressure relief through live range shrinkage. * doc/invoke.texi (-flive-range-shrinkage): New. * common.opt (flive-range-shrinkage): New. Index: common.opt === --- common.opt (revision 204380) +++ common.opt (working copy) @@ -1738,6 +1738,10 @@ fregmove Common Ignore Does nothing. Preserved for backward compatibility. +flive-range-shrinkage +Common Report Var(flag_live_range_shrinkage) Init(0) Optimization +Relief of register pressure through live range shrinkage + frename-registers Common Report Var(flag_rename_registers) Init(2) Optimization Perform a register renaming optimization pass Index: doc/invoke.texi === --- doc/invoke.texi (revision 204216) +++ doc/invoke.texi (working copy) @@ -378,7 +378,7 @@ Objective-C and Objective-C++ Dialects}. -fira-region=@var{region} -fira-hoist-pressure @gol -fira-loop-pressure -fno-ira-share-save-slots @gol -fno-ira-share-spill-slots -fira-verbose=@var{n} @gol --fivopts -fkeep-inline-functions -fkeep-static-consts @gol +-fivopts -fkeep-inline-functions -fkeep-static-consts -flive-range-shrinkage @gol -floop-block -floop-interchange -floop-strip-mine -floop-nest-optimize @gol -floop-parallelize-all -flto -flto-compression-level @gol -flto-partition=@var{alg} -flto-report -flto-report-wpa -fmerge-all-constants @gol @@ -7257,6 +7257,12 @@ registers after writing to their lower 3 Enabled for x86 at levels @option{-O2}, @option{-O3}. +@item -flive-range-shrinkage +@opindex flive-range-shrinkage +Attempt to decrease register pressure through register live range +shrinkage. This is helpful for fast processors with small or moderate +size register sets. + @item -fira-algorithm=@var{algorithm} Use the specified coloring algorithm for the integrated register allocator. The @var{algorithm} argument can be @samp{priority}, which Index: haifa-sched.c === --- haifa-sched.c (revision 204380) +++ haifa-sched.c (working copy) @@ -150,6 +150,24 @@ along with GCC; see the file COPYING3. #ifdef INSN_SCHEDULING +/* True if we do register pressure relief through live-range + shrinkage. */ +static bool live_range_shrinkage_p; + +/* Switch on live range shrinkage. */ +void +initialize_live_range_shrinkage (void) +{ + live_range_shrinkage_p = true; +} + +/* Switch off live range shrinkage. */ +void +finish_live_range_shrinkage (void) +{ + live_range_shrinkage_p = false; +} + /* issue_rate is the number of insns that can be scheduled in the same machine cycle. It can be defined in the config/mach/mach.h file, otherwise we set it to 1. */ @@ -2519,7 +2537,7 @@ rank_for_schedule (const void *x, const rtx tmp = *(const rtx *) y; rtx tmp2 = *(const rtx *) x; int tmp_class, tmp2_class; - int val, priority_val, info_val; + int val, priority_val, info_val, diff; if (MAY_HAVE_DEBUG_INSNS) { @@ -2532,6 +2550,22 @@ rank_for_schedule (const void *x, const return INSN_LUID (tmp) - INSN_LUID (tmp2); } + if (live_range_shrinkage_p) +{ + /* Don't use SCHED_PRESSURE_MODEL -- it results in much worse +code. */ + gcc_assert (sched_pressure == SCHED_PRESSURE_WEIGHTED); + if ((INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp) 0 + || INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2) 0) + (diff = (INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp) + - INSN_REG_PRESSURE_EXCESS_COST_CHANGE (tmp2))) != 0) + return diff; + /* Sort by INSN_LUID
Re: [gomp4 9/9] OpenACC: Basic support for #pragma acc parallel.
On Wed, Nov 06, 2013 at 08:42:23PM +0100, tho...@codesourcery.com wrote: +#define OACC_PARALLEL_CLAUSE_MASK\ + PRAGMA_OMP_CLAUSE_NONE This is incorrect, either you should define in c-common.h also OMP_CLAUSE_MASK_0 and define this to it, or it needs to be (OMP_CLAUSE_MASK_1 PRAGMA_OMP_CLAUSE_NONE). Otherwise it won't compile on 32-bit HWI targets where omp_clause_mask is a C++ class. Jakub
[gomp4 8/9] OpenACC: Basic support for #pragma acc in the C front end.
From: Thomas Schwinge tho...@codesourcery.com gcc/c-family/ * c-pragma.c (oacc_pragmas): New array. (c_pp_lookup_pragma, init_pragma): Handle it. gcc/ * doc/invoke.texi (-fopenacc): Update. gcc/c/ * c-parser.c (c_parser_omp_all_clauses): Make a parser error message suitable for OpenACC, too. gcc/cp/ * parser.c (cp_parser_omp_all_clauses): Make a parser error message suitable for OpenACC, too. --- gcc/c-family/c-pragma.c | 22 ++ gcc/c/c-parser.c| 2 +- gcc/cp/parser.c | 2 +- gcc/doc/invoke.texi | 2 +- 4 files changed, 25 insertions(+), 3 deletions(-) diff --git gcc/c-family/c-pragma.c gcc/c-family/c-pragma.c index 3ce77a2..98f98d0 100644 --- gcc/c-family/c-pragma.c +++ gcc/c-family/c-pragma.c @@ -1164,6 +1164,8 @@ typedef struct static vecpragma_ns_name registered_pp_pragmas; struct omp_pragma_def { const char *name; unsigned int id; }; +static const struct omp_pragma_def oacc_pragmas[] = { +}; static const struct omp_pragma_def omp_pragmas[] = { { atomic, PRAGMA_OMP_ATOMIC }, { barrier, PRAGMA_OMP_BARRIER }, @@ -1194,9 +1196,18 @@ static const struct omp_pragma_def omp_pragmas[] = { void c_pp_lookup_pragma (unsigned int id, const char **space, const char **name) { + const int n_oacc_pragmas = sizeof (oacc_pragmas) / sizeof (*oacc_pragmas); const int n_omp_pragmas = sizeof (omp_pragmas) / sizeof (*omp_pragmas); int i; + for (i = 0; i n_oacc_pragmas; ++i) +if (oacc_pragmas[i].id == id) + { + *space = acc; + *name = oacc_pragmas[i].name; + return; + } + for (i = 0; i n_omp_pragmas; ++i) if (omp_pragmas[i].id == id) { @@ -1348,6 +1359,17 @@ c_invoke_pragma_handler (unsigned int id) void init_pragma (void) { + if (flag_openacc) +{ + const int n_oacc_pragmas + = sizeof (oacc_pragmas) / sizeof (*oacc_pragmas); + int i; + + for (i = 0; i n_oacc_pragmas; ++i) + cpp_register_deferred_pragma (parse_in, acc, oacc_pragmas[i].name, + oacc_pragmas[i].id, true, true); +} + if (flag_openmp) { const int n_omp_pragmas = sizeof (omp_pragmas) / sizeof (*omp_pragmas); diff --git gcc/c/c-parser.c gcc/c/c-parser.c index a8f4774..8a1e988 100644 --- gcc/c/c-parser.c +++ gcc/c/c-parser.c @@ -10730,7 +10730,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask, c_name = simdlen; break; default: - c_parser_error (parser, expected %#pragma omp% clause); + c_parser_error (parser, expected clause); goto saw_error; } diff --git gcc/cp/parser.c gcc/cp/parser.c index bbc8e75..c3345ee 100644 --- gcc/cp/parser.c +++ gcc/cp/parser.c @@ -27911,7 +27911,7 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask, c_name = simdlen; break; default: - cp_parser_error (parser, expected %#pragma omp% clause); + cp_parser_error (parser, expected clause); goto saw_error; } diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index af8973a..cc4a6da 100644 --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -1835,7 +1835,7 @@ freestanding and hosted environments. @item -fopenacc @opindex fopenacc @cindex OpenACC accelerator programming -Enable handling of OpenACC. +Enable handling of OpenACC directives @code{#pragma acc} in C. When @option{-fopenacc} is specified, the compiler generates accelerated code according to the OpenACC Application Programming Interface v2.0 @w{@uref{http://www.openacc.org/}}. This option -- 1.8.1.1
[gomp4 6/9] OpenACC: Infrastructure for builtins.
From: Thomas Schwinge tho...@codesourcery.com gcc/ * oacc-builtins.def: New file. * Makefile.in (BUILTINS_DEF): Add oacc-builtins.def. * builtins.def (DEF_GOACC_BUILTIN): New macro. Include oacc-builtins.def. gcc/fortran/ * f95-lang.c (DEF_GOACC_BUILTIN): New macro. Include ../oacc-builtins.def. libgomp/ * libgomp.map (GOACC_2.0): New symbol version. --- gcc/Makefile.in| 3 ++- gcc/builtins.def | 10 ++ gcc/fortran/f95-lang.c | 10 ++ gcc/oacc-builtins.def | 28 libgomp/libgomp.map| 3 +++ 5 files changed, 53 insertions(+), 1 deletion(-) create mode 100644 gcc/oacc-builtins.def diff --git gcc/Makefile.in gcc/Makefile.in index cc88fb8..0511097 100644 --- gcc/Makefile.in +++ gcc/Makefile.in @@ -871,7 +871,8 @@ FIXED_VALUE_H = fixed-value.h $(MACHMODE_H) double-int.h RTL_H = $(RTL_BASE_H) $(FLAGS_H) genrtl.h READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h PARAMS_H = params.h params.def -BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \ +BUILTINS_DEF = builtins.def sync-builtins.def \ + oacc-builtins.def omp-builtins.def \ gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def INTERNAL_FN_DEF = internal-fn.def INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF) diff --git gcc/builtins.def gcc/builtins.def index e2d8849..9a9a20a 100644 --- gcc/builtins.def +++ gcc/builtins.def @@ -139,6 +139,13 @@ along with GCC; see the file COPYING3. If not see DEF_BUILTIN (ENUM, NAME, BUILT_IN_NORMAL, BT_LAST, BT_LAST, false, false, \ false, ATTR_LAST, false, false) +/* Builtin used by the implementation of GNU OpenACC. None of these are + actually implemented in the compiler; they're all in libgomp. */ +#undef DEF_GOACC_BUILTIN +#define DEF_GOACC_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ + DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE,\ + false, true, true, ATTRS, false, flag_openacc) + /* Builtin used by the implementation of GNU OpenMP. None of these are actually implemented in the compiler; they're all in libgomp. */ #undef DEF_GOMP_BUILTIN @@ -856,6 +863,9 @@ DEF_GCC_BUILTIN (BUILT_IN_LINE, LINE, BT_FN_INT, ATTR_NOTHROW_LEAF_LIST) /* Synchronization Primitives. */ #include sync-builtins.def +/* OpenACC builtins. */ +#include oacc-builtins.def + /* OpenMP builtins. */ #include omp-builtins.def diff --git gcc/fortran/f95-lang.c gcc/fortran/f95-lang.c index 873c137..69012b6 100644 --- gcc/fortran/f95-lang.c +++ gcc/fortran/f95-lang.c @@ -1035,6 +1035,16 @@ gfc_init_builtin_functions (void) #include ../sync-builtins.def #undef DEF_SYNC_BUILTIN + if (gfc_option.gfc_flag_openacc) +{ +#undef DEF_GOACC_BUILTIN +#define DEF_GOACC_BUILTIN(code, name, type, attr) \ + gfc_define_builtin (__builtin_ name, builtin_types[type], \ + code, name, attr); +#include ../oacc-builtins.def +#undef DEF_GOACC_BUILTIN +} + if (gfc_option.gfc_flag_openmp || flag_tree_parallelize_loops) { #undef DEF_GOMP_BUILTIN diff --git gcc/oacc-builtins.def gcc/oacc-builtins.def new file mode 100644 index 000..fd630e0 --- /dev/null +++ gcc/oacc-builtins.def @@ -0,0 +1,28 @@ +/* This file contains the definitions and documentation for the + OpenACC builtins used in the GNU compiler. + + Copyright (C) 2013 Free Software Foundation, Inc. + + Contributed by Thomas Schwinge tho...@codesourcery.com. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +/* Before including this file, you should define a macro: + + DEF_GOACC_BUILTIN (ENUM, NAME, TYPE, ATTRS) + + See builtins.def for details. */ diff --git libgomp/libgomp.map libgomp/libgomp.map index 4f87d00..f094ed2 100644 --- libgomp/libgomp.map +++ libgomp/libgomp.map @@ -230,3 +230,6 @@ GOMP_4.0 { OACC_2.0 { }; + +GOACC_2.0 { +}; -- 1.8.1.1
[gomp4 4/9] OpenACC: The runtime library will be implemented in libgomp, too.
From: Thomas Schwinge tho...@codesourcery.com gcc/ * gcc.c (LINK_COMMAND_SPEC, GOMP_SELF_SPECS): For -fopenacc, link to libgomp and its dependencies. * config/arc/arc.h (LINK_COMMAND_SPEC): Likewise. * config/darwin.h (LINK_COMMAND_SPEC_A): Likewise. * config/i386/mingw32.h (GOMP_SELF_SPECS): Likewise. * config/ia64/hpux.h (LIB_SPEC): Likewise. * config/pa/pa-hpux11.h (LIB_SPEC): Likewise. * config/pa/pa64-hpux.h (LIB_SPEC): Likewise. * doc/invoke.texi (-fopenacc): Update. libgomp/ * libgomp.map (OACC_2.0): New symbol version. * libgomp.spec.in: Update comment. * configure.ac: Likewise. * configure: Regenerate. * Makefile.am (nodist_libsubinclude_HEADERS): Add openacc.h. (nodist_finclude_HEADERS): Add openacc_lib.h, openacc.f90, openacc.mod, and openacc_kinds.mod. (openacc_kinds.mod): New target. (%.mod): New target, generalized from omp_lib.mod. * Makefile.in: Regenerate. * openacc.f90: New file. * openacc.h: Likewise. * openacc_lib.h: Likewise. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.oacc-c/c.exp: Likewise. * testsuite/libgomp.oacc-c/lib-1.c: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise. * testsuite/libgomp.oacc-fortran/lib-1.f90: Likewise. * testsuite/libgomp.oacc-fortran/lib-2.f: Likewise. * testsuite/libgomp.oacc-fortran/lib-3.f: Likewise. --- gcc/config/arc/arc.h | 2 +- gcc/config/darwin.h| 2 +- gcc/config/i386/mingw32.h | 2 +- gcc/config/ia64/hpux.h | 2 +- gcc/config/pa/pa-hpux11.h | 2 +- gcc/config/pa/pa64-hpux.h | 12 ++-- gcc/doc/invoke.texi| 4 +- gcc/gcc.c | 7 ++- libgomp/Makefile.am| 9 ++- libgomp/Makefile.in| 10 +++- libgomp/configure | 2 +- libgomp/configure.ac | 2 +- libgomp/libgomp.map| 3 + libgomp/libgomp.spec.in| 2 +- libgomp/openacc.f90| 37 libgomp/openacc.h | 45 ++ libgomp/openacc_lib.h | 26 libgomp/testsuite/libgomp.oacc-c++/c++.exp | 66 + libgomp/testsuite/libgomp.oacc-c/c.exp | 36 +++ libgomp/testsuite/libgomp.oacc-c/lib-1.c | 7 +++ libgomp/testsuite/libgomp.oacc-fortran/fortran.exp | 69 ++ libgomp/testsuite/libgomp.oacc-fortran/lib-1.f90 | 3 + libgomp/testsuite/libgomp.oacc-fortran/lib-2.f | 3 + libgomp/testsuite/libgomp.oacc-fortran/lib-3.f | 3 + 24 files changed, 332 insertions(+), 24 deletions(-) create mode 100644 libgomp/openacc.f90 create mode 100644 libgomp/openacc.h create mode 100644 libgomp/openacc_lib.h create mode 100644 libgomp/testsuite/libgomp.oacc-c++/c++.exp create mode 100644 libgomp/testsuite/libgomp.oacc-c/c.exp create mode 100644 libgomp/testsuite/libgomp.oacc-c/lib-1.c create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-1.f90 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-2.f create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/lib-3.f diff --git gcc/config/arc/arc.h gcc/config/arc/arc.h index 637f7b6..14fc717 100644 --- gcc/config/arc/arc.h +++ gcc/config/arc/arc.h @@ -174,7 +174,7 @@ along with GCC; see the file COPYING3. If not see %(linker) %l LINK_PIE_SPEC %X %{o*} %{A} %{d} %{e*} %{m} %{N} %{n} %{r}\ %{s} %{t} %{u*} %{x} %{z} %{Z} %{!A:%{!nostdlib:%{!nostartfiles:%S}}}\ %{static:} %{L*} %(mfwrap) %(link_libgcc) %o\ -%{fopenmp:%:include(libgomp.spec)%(link_gomp)} %(mflib)\ +%{fopenacc|fopenmp:%:include(libgomp.spec)%(link_gomp)} %(mflib)\ %{fprofile-arcs|fprofile-generate|coverage:-lgcov}\ %{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\ %{!A:%{!nostdlib:%{!nostartfiles:%E}}} %{T*} }} diff --git gcc/config/darwin.h gcc/config/darwin.h index 596c9ef..735b6b9 100644 --- gcc/config/darwin.h +++ gcc/config/darwin.h @@ -176,7 +176,7 @@ extern GTY(()) int darwin_ms_struct; %{o*}%{!o:-o a.out} \ %{!nostdlib:%{!nostartfiles:%S}} \ %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \ -%{fopenmp|ftree-parallelize-loops=*: \ +%{fopenacc|fopenmp|ftree-parallelize-loops=*: \ %{static|static-libgcc|static-libstdc++|static-libgfortran: libgomp.a%s; : -lgomp } } \ %{fgnu-tm: \
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
Jeff == Jeff Law l...@redhat.com writes: Jeff ISTM that one liner belongs in GCC's .gdbinit. Until then, I'm adding Jeff it to my own :-) Yeah, I think that would be reasonable. It seems like it isn't appropriate in many cases, so we left it off by default. Tom
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/06/13 13:01, Tom Tromey wrote: Jeff == Jeff Law l...@redhat.com writes: Jeff ISTM that one liner belongs in GCC's .gdbinit. Until then, I'm adding Jeff it to my own :-) Yeah, I think that would be reasonable. It seems like it isn't appropriate in many cases, so we left it off by default. Here's the patch I installed for GCC. Thanks for the suggestion! Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index dfaf4e3..43ebbda 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-11-06 Jeff Law l...@redhat.com + Tom Tromey tro...@redhat.com + + * gdbinit.in: Disable strict type checking. + 2013-11-06 Vladimir Makarov vmaka...@redhat.com * tree-pass.h (make_pass_live_range_shrinkage): New external. diff --git a/gcc/gdbinit.in b/gcc/gdbinit.in index 503ef24..c60cab1 100644 --- a/gcc/gdbinit.in +++ b/gcc/gdbinit.in @@ -205,6 +205,11 @@ set complaints 0 b exit b abort +# Disable strict type checking. This allows developers to (for example) +# make inferior calls without casting absolute address to a suitable +# pointer type. +set check type off + # Skip all inline functions in tree.h. # These are used in accessor macros. # Note that this is added at the end because older gdb versions
PATCHES: [4.9 Regression] libsanitizer doesn't build for x32
These 4 patches fix libsanitizer build for x32. They will be checked into upstream: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59018#c4 In the meantime, OK to install to unblock x32 build? Thanks. H.J. From de59975f158501bc99b345c02687944a49c9d31c Mon Sep 17 00:00:00 2001 From: H.J. Lu hjl.to...@gmail.com Date: Tue, 5 Nov 2013 05:40:48 -0800 Subject: [PATCH 1/4] Cast pointers to uptr for 64-bit syscalls --- libsanitizer/ChangeLog.x32 | 20 + libsanitizer/sanitizer_common/sanitizer_linux.cc | 38 2 files changed, 39 insertions(+), 19 deletions(-) create mode 100644 libsanitizer/ChangeLog.x32 diff --git a/libsanitizer/ChangeLog.x32 b/libsanitizer/ChangeLog.x32 new file mode 100644 index 000..7b54005 --- /dev/null +++ b/libsanitizer/ChangeLog.x32 @@ -0,0 +1,20 @@ +2013-11-05 H.J. Lu hongjiu...@intel.com + + * sanitizer_common/sanitizer_linux.cc (internal_mmap): Cast + pointers to uptr for 64-bit syscalls. + (internal_munmap): Likewise. + (internal_open): Likewise. + (internal_read): Likewise. + (internal_write): Likewise. + (internal_stat): Likewise. + (internal_lstat): Likewise. + (internal_fstat): Likewise. + (internal_readlink): Likewise. + (internal_unlink): Likewise. + (internal_execve): Likewise. + (NanoTime): Likewise. + (BlockingMutex::Lock): Likewise. + (BlockingMutex::Unlock): Likewise. + (internal_ptrace): Likewise. + (internal_getdents): Likewise. + (internal_sigaltstack): Likewise. diff --git a/libsanitizer/sanitizer_common/sanitizer_linux.cc b/libsanitizer/sanitizer_common/sanitizer_linux.cc index 666f15b..e48bee5 100644 --- a/libsanitizer/sanitizer_common/sanitizer_linux.cc +++ b/libsanitizer/sanitizer_common/sanitizer_linux.cc @@ -77,14 +77,14 @@ namespace __sanitizer { uptr internal_mmap(void *addr, uptr length, int prot, int flags, int fd, u64 offset) { #if SANITIZER_LINUX_USES_64BIT_SYSCALLS - return internal_syscall(__NR_mmap, addr, length, prot, flags, fd, offset); + return internal_syscall(__NR_mmap, (uptr)addr, length, prot, flags, fd, offset); #else return internal_syscall(__NR_mmap2, addr, length, prot, flags, fd, offset); #endif } uptr internal_munmap(void *addr, uptr length) { - return internal_syscall(__NR_munmap, addr, length); + return internal_syscall(__NR_munmap, (uptr)addr, length); } uptr internal_close(fd_t fd) { @@ -92,11 +92,11 @@ uptr internal_close(fd_t fd) { } uptr internal_open(const char *filename, int flags) { - return internal_syscall(__NR_open, filename, flags); + return internal_syscall(__NR_open, (uptr)filename, flags); } uptr internal_open(const char *filename, int flags, u32 mode) { - return internal_syscall(__NR_open, filename, flags, mode); + return internal_syscall(__NR_open, (uptr)filename, flags, mode); } uptr OpenFile(const char *filename, bool write) { @@ -106,13 +106,13 @@ uptr OpenFile(const char *filename, bool write) { uptr internal_read(fd_t fd, void *buf, uptr count) { sptr res; - HANDLE_EINTR(res, (sptr)internal_syscall(__NR_read, fd, buf, count)); + HANDLE_EINTR(res, (sptr)internal_syscall(__NR_read, fd, (uptr)buf, count)); return res; } uptr internal_write(fd_t fd, const void *buf, uptr count) { sptr res; - HANDLE_EINTR(res, (sptr)internal_syscall(__NR_write, fd, buf, count)); + HANDLE_EINTR(res, (sptr)internal_syscall(__NR_write, fd, (uptr)buf, count)); return res; } @@ -138,7 +138,7 @@ static void stat64_to_stat(struct stat64 *in, struct stat *out) { uptr internal_stat(const char *path, void *buf) { #if SANITIZER_LINUX_USES_64BIT_SYSCALLS - return internal_syscall(__NR_stat, path, buf); + return internal_syscall(__NR_stat, (uptr)path, (uptr)buf); #else struct stat64 buf64; int res = internal_syscall(__NR_stat64, path, buf64); @@ -149,7 +149,7 @@ uptr internal_stat(const char *path, void *buf) { uptr internal_lstat(const char *path, void *buf) { #if SANITIZER_LINUX_USES_64BIT_SYSCALLS - return internal_syscall(__NR_lstat, path, buf); + return internal_syscall(__NR_lstat, (uptr)path, (uptr)buf); #else struct stat64 buf64; int res = internal_syscall(__NR_lstat64, path, buf64); @@ -160,7 +160,7 @@ uptr internal_lstat(const char *path, void *buf) { uptr internal_fstat(fd_t fd, void *buf) { #if SANITIZER_LINUX_USES_64BIT_SYSCALLS - return internal_syscall(__NR_fstat, fd, buf); + return internal_syscall(__NR_fstat, fd, (uptr)buf); #else struct stat64 buf64; int res = internal_syscall(__NR_fstat64, fd, buf64); @@ -181,11 +181,11 @@ uptr internal_dup2(int oldfd, int newfd) { } uptr internal_readlink(const char *path, char *buf, uptr bufsize) { - return internal_syscall(__NR_readlink, path, buf, bufsize); + return internal_syscall(__NR_readlink, (uptr)path, (uptr)buf, bufsize); } uptr internal_unlink(const char *path) { - return
Re: [PATCH 0/6] Conversion of gimple types to C++ inheritance (v3)
On 11/05/13 16:23, Ian Lance Taylor wrote: On Tue, Nov 5, 2013 at 2:12 PM, Jeff Law l...@redhat.com wrote: I can't speak for Andrew, but my experience with this kind of object type casting in a large C++ project is that it's a red flag for a design problem. I'm going to chime in to say that I think it's a pretty typical way to represent a compiler IR in C++. There is a base type that a lot of code uses, but there is also a real type, and the way to get to that real type is to use a cast. We could do it all with virtual functions, but those carry a different cost. In effect, using virtual functions increases the size of the code field from 16 bits to 64 bits. It adds up. Also this seems to be a pretty direct version of the data structures we already have. Certainly and to some degree these casts in one form or another are a necessity. My concern is we use them as a crutch (as I've seen in other projects) and we end up not getting the level of compile time static type checking that we want. If the bulk are temporary in nature and we as a project agree to look at the refactoring necessary to minimize the is-a/as-a uses to a few low level idioms, then I'll hesitatingly go along. If that's the choice we make, then folks should know I'll probably look at each is-a/as-a in patches I review and ask for a justification for their use. jeff