[PATCH][4.8] S/390: Transaction optimization fixes
Hi,

the attached patch fixes two issues with the TX backend optimization which tries to get rid of FPR save/restore operations in some cases.

The first is that the stack pointer might get clobbered on 64 bit if the backend was able to get rid of all the FPR saves/restores and these were the only things requiring stack space. For more details please see the testcase comment.

The second is that the optimization did not work as expected on 31 bit, since the loop setting the FPR save bits only for the 64 bit call-saved FPRs made use of the call_clobbered information.

Both issues are covered by the new testcase.

Bootstrapped and regtested on s390 and s390x with --with-arch=zEC12. No regressions.

I'm going to apply the patch to 4.8 and mainline after waiting a few days for comments.

Bye,

-Andreas-

2013-10-04  Andreas Krebbel  andreas.kreb...@de.ibm.com

	* config/s390/s390.c (s390_register_info): Make the call-saved FPR
	loop work also for the 31 bit ABI.  Save the stack pointer for
	frame_size > 0.

2013-10-04  Andreas Krebbel  andreas.kreb...@de.ibm.com

	* gcc.target/s390/htm-nofloat-2.c: New testcase.

---
 gcc/config/s390/s390.c                        | 21
 gcc/testsuite/gcc.target/s390/htm-nofloat-2.c | 55 ++
 2 files changed, 57 insertions(+), 8 deletions(-), 11 modifications(!)

Index: gcc/config/s390/s390.c
===
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*************** s390_register_info (int clobbered_regs[]
*** 7509,7516 ****
  {
    cfun_frame_layout.fpr_bitmap = 0;
    cfun_frame_layout.high_fprs = 0;
!   if (TARGET_64BIT)
!     for (i = FPR8_REGNUM; i <= FPR15_REGNUM; i++)
        /* During reload we have to use the df_regs_ever_live infos
	 since reload is marking FPRs used as spill slots there as
	 live before actually making the code changes.  Without
--- 7509,7519 ----
  {
    cfun_frame_layout.fpr_bitmap = 0;
    cfun_frame_layout.high_fprs = 0;
!
!   for (i = FPR0_REGNUM; i <= FPR15_REGNUM; i++)
!     {
!       if (call_really_used_regs[i])
!
	 continue;

        /* During reload we have to use the df_regs_ever_live infos
	   since reload is marking FPRs used as spill slots there as
	   live before actually making the code changes.  Without

*************** s390_register_info (int clobbered_regs[]
*** 7523,7530 ****
	   && !global_regs[i])
	{
	  cfun_set_fpr_save (i);
!	  cfun_frame_layout.high_fprs++;
	}
    }

  for (i = 0; i < 16; i++)
--- 7526,7536 ----
	   && !global_regs[i])
	{
	  cfun_set_fpr_save (i);
!
!	  if (i >= FPR8_REGNUM)
!	    cfun_frame_layout.high_fprs++;
	}
+     }
  }

  for (i = 0; i < 16; i++)

*************** s390_register_info (int clobbered_regs[]
*** 7554,7559 ****
--- 7560,7566 ----
	   || TARGET_TPF_PROFILING
	   || cfun_save_high_fprs_p
	   || get_frame_size () > 0
+	   || (reload_completed && cfun_frame_layout.frame_size > 0)
	   || cfun->calls_alloca
	   || cfun->stdarg);

*************** s390_register_info (int clobbered_regs[]
*** 7652,7665 ****
	  cfun_set_fpr_save (i + FPR0_REGNUM);
	}
    }
-
-   if (!TARGET_64BIT)
-     {
-       if (df_regs_ever_live_p (FPR4_REGNUM) && !global_regs[FPR4_REGNUM])
-	 cfun_set_fpr_save (FPR4_REGNUM);
-       if (df_regs_ever_live_p (FPR6_REGNUM) && !global_regs[FPR6_REGNUM])
-	 cfun_set_fpr_save (FPR6_REGNUM);
-     }
  }

  /* Fill cfun->machine with info about frame of current function.  */
--- 7659,7664 ----

Index: gcc/testsuite/gcc.target/s390/htm-nofloat-2.c
===
*** /dev/null
--- gcc/testsuite/gcc.target/s390/htm-nofloat-2.c
***************
*** 0 ****
--- 1,55 ----
+ /* { dg-do run } */
+ /* { dg-options "-O3 -mhtm -Wa,-march=zEC12 --save-temps" } */
+
+ /* __builtin_tbegin has to emit clobbers for all FPRs since the tbegin
+    instruction does not automatically preserve them.  If the
+    transaction body is fully contained in a function, the backend tries
+    after reload to get rid of the FPR save/restore operations
+    triggered by the clobbers.  This testcase failed since the backend
+    was able to get rid of all FPR saves/restores and, since these were
+    the only stack operations, also of the entire stack space.  So even
+    the save/restore of the stack pointer was omitted in the end.
+    However, since the frame layout has been fixed before, the prologue
+    still generated the stack pointer decrement, making foo return with
+    a modified stack pointer.  */
+
+ void abort(void);
+
+ void __attribute__((noinline))
+ foo (int a)
+ {
+   /* This is just to prevent the tbegin code from actually being
+      executed.  That way the test may even run
[SH] PR 51244 - Fix defects introduced in 4.8
Hello,

Some of the things I've done in 4.8 to improve SH T bit handling turned out to produce wrong code.  The attached patch fixes that by introducing an SH specific RTL pass.

Tested on rev 202876 with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
and no new failures.  Additional test cases will follow.

OK for trunk?

Cheers,
Oleg

gcc/ChangeLog:

	PR target/51244
	* config/sh/ifcvt_sh.cc: New SH specific RTL pass.
	* config.gcc (SH extra_objs): Add ifcvt_sh.o.
	* config/sh/t-sh (ifcvt_sh.o): New entry.
	* config/sh/sh.c (sh_fixed_condition_code_regs): New function that
	implements the target hook TARGET_FIXED_CONDITION_CODE_REGS.
	(register_sh_passes): New function.  Register ifcvt_sh pass.
	(sh_option_override): Invoke it.
	(sh_canonicalize_comparison): Handle op0_preserve_value.
	* sh.md (*cbranch_t): Do not try to optimize missed test and branch
	opportunities.  Canonicalize branch condition.
	(nott): Allow only if pseudos can be created for non-SH2A.
Index: gcc/config.gcc
===
--- gcc/config.gcc	(revision 202876)
+++ gcc/config.gcc	(working copy)
@@ -462,6 +462,7 @@
 	cpu_type=sh
 	need_64bit_hwint=yes
 	extra_options="${extra_options} fused-madd.opt"
+	extra_objs="${extra_objs} ifcvt_sh.o"
 	;;
 v850*-*-*)
 	cpu_type=v850
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 202876)
+++ gcc/config/sh/sh.c	(working copy)
@@ -53,6 +53,9 @@
 #include "alloc-pool.h"
 #include "tm-constrs.h"
 #include "opts.h"
+#include "tree-pass.h"
+#include "pass_manager.h"
+#include "context.h"

 #include <sstream>
 #include <vector>
@@ -311,6 +314,7 @@
 static void sh_canonicalize_comparison (int *, rtx *, rtx *, bool);
 static void sh_canonicalize_comparison (enum rtx_code&, rtx&, rtx&,
					 enum machine_mode, bool);
+static bool sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2);

 static void sh_init_sync_libfuncs (void) ATTRIBUTE_UNUSED;
@@ -587,6 +591,9 @@
 #undef TARGET_CANONICALIZE_COMPARISON
 #define TARGET_CANONICALIZE_COMPARISON sh_canonicalize_comparison

+#undef TARGET_FIXED_CONDITION_CODE_REGS
+#define TARGET_FIXED_CONDITION_CODE_REGS sh_fixed_condition_code_regs
+
 /* Machine-specific symbol_ref flags.  */
 #define SYMBOL_FLAG_FUNCVEC_FUNCTION (SYMBOL_FLAG_MACH_DEP << 0)
@@ -710,6 +717,34 @@
 #undef err_ret
 }

+/* Register SH specific RTL passes.  */
+extern opt_pass* make_pass_ifcvt_sh (gcc::context* ctx, bool split_insns,
+				      const char* name);
+static void
+register_sh_passes (void)
+{
+  if (!TARGET_SH1)
+    return;
+
+/* Running the ifcvt_sh pass after ce1 generates better code when
+   comparisons are combined and reg-reg moves are introduced, because
+   reg-reg moves will be eliminated afterwards.  However, there are quite
+   some cases where combine will be unable to fold comparison related insns,
+   thus for now don't do it.
+  register_pass (make_pass_ifcvt_sh (g, false, "ifcvt1_sh"),
+		 PASS_POS_INSERT_AFTER, "ce1", 1);
+*/
+
+  /* Run ifcvt_sh pass after combine but before register allocation.
     */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt2_sh"),
+		 PASS_POS_INSERT_AFTER, "split1", 1);
+
+  /* Run ifcvt_sh pass after register allocation and basic block reordering
+     as this sometimes creates new opportunities.  */
+  register_pass (make_pass_ifcvt_sh (g, true, "ifcvt3_sh"),
+		 PASS_POS_INSERT_AFTER, "split4", 1);
+}
+
 /* Implement TARGET_OPTION_OVERRIDE macro.  Validate and override
    various options, and do some machine dependent initialization.  */
 static void
@@ -1022,6 +1057,8 @@
      target CPU.  */
   selected_atomic_model_
     = parse_validate_atomic_model_option (sh_atomic_model_str);
+
+  register_sh_passes ();
 }

 /* Print the operand address in x to the stream.  */
@@ -1908,7 +1945,7 @@
 static void
 sh_canonicalize_comparison (enum rtx_code& cmp, rtx& op0, rtx& op1,
			     enum machine_mode mode,
-			     bool op0_preserve_value ATTRIBUTE_UNUSED)
+			     bool op0_preserve_value)
 {
   /* When invoked from within the combine pass the mode is not specified,
      so try to get it from one of the operands.  */
@@ -1928,6 +1965,9 @@
   // Make sure that the constant operand is the second operand.
   if (CONST_INT_P (op0) && !CONST_INT_P (op1))
     {
+      if (op0_preserve_value)
+	return;
+
       std::swap (op0, op1);
       cmp = swap_condition (cmp);
     }
@@ -2016,6 +2056,14 @@
   *code = (int) tmp_code;
 }

+bool
+sh_fixed_condition_code_regs (unsigned int* p1, unsigned int* p2)
+{
+  *p1 = T_REG;
+  *p2 = INVALID_REGNUM;
+  return true;
+}
+
 enum rtx_code
 prepare_cbranch_operands (rtx *operands, enum machine_mode mode,
			   enum rtx_code comparison)
Index: gcc/config/sh/sh.md
===
---
[C++ Patch] PR 58560
Hi,

this error recovery ICE (a low priority regression) can be easily avoided by checking the TREE_TYPE of exp too.  Tested x86_64-linux.

Thanks,
Paolo.

/cp
2013-10-04  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/58560
	* typeck2.c (build_functional_cast): Use error_operand_p on exp.

/testsuite
2013-10-04  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/58560
	* g++.dg/cpp0x/auto39.C: New.

Index: cp/typeck2.c
===
--- cp/typeck2.c	(revision 203200)
+++ cp/typeck2.c	(working copy)
@@ -1757,7 +1757,7 @@ build_functional_cast (tree exp, tree parms, tsubs
   tree type;
   vec<tree, va_gc> *parmvec;

-  if (exp == error_mark_node || parms == error_mark_node)
+  if (error_operand_p (exp) || parms == error_mark_node)
     return error_mark_node;

   if (TREE_CODE (exp) == TYPE_DECL)
Index: testsuite/g++.dg/cpp0x/auto39.C
===
--- testsuite/g++.dg/cpp0x/auto39.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/auto39.C	(working copy)
@@ -0,0 +1,6 @@
+// PR c++/58560
+// { dg-do compile { target c++11 } }
+
+typedef auto T;  // { dg-error "typedef declared 'auto'" }
+
+void foo() { T(); }
Enable SSE math on i386 with -Ofast
Hi,

this patch makes -Ofast also imply -mfpmath=sse.  It is an important win on SPECfp (2000 and 2006).  Even though for example the following

float a(float b)
{
  return b+10;
}

results in the somewhat ridiculous

a:
.LFB0:
	.cfi_startproc
	subl	$4, %esp
	.cfi_def_cfa_offset 8
	movss	.LC0, %xmm0
	addss	8(%esp), %xmm0
	movss	%xmm0, (%esp)
	flds	(%esp)
	addl	$4, %esp
	.cfi_def_cfa_offset 4
	ret

I wonder if we can get rid at least of the redundant stack alignment on ESP...

Bootstrapped/regtested x86_64-linux, will commit it on weekend if there are no complaints.  I wonder if -ffast-math should do the same - it is documented as enabling an explicit set of options, but that can be changed I guess.

	* invoke.texi (Ofast): Update documentation.
	* i386.h (TARGET_FPMATH_DEFAULT): Enable SSE math with -Ofast.

Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 203161)
+++ doc/invoke.texi	(working copy)
@@ -6796,6 +6796,7 @@ Disregard strict standards compliance.
 valid for all standard-compliant programs.
 It turns on @option{-ffast-math} and the Fortran-specific
 @option{-fno-protect-parens} and @option{-fstack-arrays}.
+On the i386 target it also enables @option{-mfpmath=sse}.

 @item -Og
 @opindex Og
Index: config/i386/i386.h
===
--- config/i386/i386.h	(revision 203161)
+++ config/i386/i386.h	(working copy)
@@ -209,7 +209,8 @@ extern const struct processor_costs ix86
 #ifndef TARGET_FPMATH_DEFAULT
 #define TARGET_FPMATH_DEFAULT \
-  (TARGET_64BIT && TARGET_SSE ? FPMATH_SSE : FPMATH_387)
+  ((TARGET_64BIT && TARGET_SSE) \
+   || (TARGET_SSE && optimize_fast) ? FPMATH_SSE : FPMATH_387)
 #endif

 #define TARGET_FLOAT_RETURNS_IN_80387 TARGET_FLOAT_RETURNS
Re: [C++ Patch] PR 58448
... and this is a more straightforward approach.  Also tested x86_64-linux.

Thanks!
Paolo.

/cp
2013-10-04  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/58448
	* pt.c (tsubst): Use error_operand_p on parameter t.

/testsuite
2013-10-04  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/58448
	* g++.dg/template/crash117.C: New.

Index: cp/pt.c
===
--- cp/pt.c	(revision 203200)
+++ cp/pt.c	(working copy)
@@ -11272,7 +11272,7 @@ tsubst (tree t, tree args, tsubst_flags_t complain
   enum tree_code code;
   tree type, r = NULL_TREE;

-  if (t == NULL_TREE || t == error_mark_node
+  if (t == NULL_TREE
       || t == integer_type_node
       || t == void_type_node
       || t == char_type_node
@@ -11281,6 +11281,9 @@ tsubst (tree t, tree args, tsubst_flags_t complain
       || TREE_CODE (t) == TRANSLATION_UNIT_DECL)
     return t;

+  if (error_operand_p (t))
+    return error_mark_node;
+
   if (DECL_P (t))
     return tsubst_decl (t, args, complain);
Index: testsuite/g++.dg/template/crash117.C
===
--- testsuite/g++.dg/template/crash117.C	(revision 0)
+++ testsuite/g++.dg/template/crash117.C	(working copy)
@@ -0,0 +1,6 @@
+// PR c++/58448
+
+class SmallVector; struct Types4;
+template <typename, typename, typename, typename> struct Types {
+  typedef Types4::Constructable // { dg-error "template|typedef|expected" }
+} Types<SmallVector, SmallVector, SmallVector, SmallVector>:: // { dg-error "expected" }
Re: patch to canonicalize tree-csts.
So here is a patch with the change.  As before, bootstrapped and tested on x86-64.

On 10/03/2013 12:16 PM, Richard Sandiford wrote:

Kenneth Zadeck zad...@naturalbridge.com writes:

Changing the representation of unsigned constants is only worthwhile if we can avoid the force_to_size for some unsigned cases.  I think we can avoid it for precision >= xprecision && !small_prec.  Either we should take the hit of doing that comparison (but see below) or the change isn't worthwhile.

i think that it is something closer to precision >= xprecision + HOST_BITS_PER_WIDE_INT ...  The problem is that the tree cst may have one extra block beyond the precision.

Ah, OK.  I was thinking that we should always be able to use the constant as-is for max_wide_int-based and addr_wide_int-based operations.  The small_prec

again, you can get edge cased to death here.  i think it would work for max because that really is bigger than anything else, but it is possible (though unlikely) to have something big converted to an address by truncation.

But I'd have expected that conversion to be represented by an explicit CONVERT_EXPR or NOP_EXPR.  It seems wrong to use addr_wide_int directly on something that isn't bit- or byte-address-sized.  It'd be the C equivalent of int + long -> int rather than the expected int + long -> long.

Same goes for wide_int.  If we're doing arithmetic at a specific precision, it seems odd for one of the inputs to be wider and yet not have an explicit truncation.

Thanks,
Richard

Index: gcc/tree.c
===
--- gcc/tree.c	(revision 203039)
+++ gcc/tree.c	(working copy)
@@ -1187,10 +1187,10 @@ wide_int_to_tree (tree type, const wide_
   tree t;
   int ix = -1;
   int limit = 0;
-  int i;
+  unsigned int i;

   gcc_assert (type);
-  int prec = TYPE_PRECISION (type);
+  unsigned int prec = TYPE_PRECISION (type);
   signop sgn = TYPE_SIGN (type);

   /* Verify that everything is canonical.
 */
@@ -1204,11 +1204,11 @@ wide_int_to_tree (tree type, const wide_
     }

   wide_int cst = wide_int::from (pcst, prec, sgn);
-  int len = int (cst.get_len ());
-  int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
+  unsigned int len = int (cst.get_len ());
+  unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
   bool recanonize = sgn == UNSIGNED
-    && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len
-    && small_prec;
+    && small_prec
+    && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;

   switch (TREE_CODE (type))
     {
@@ -1235,7 +1235,7 @@ wide_int_to_tree (tree type, const wide_

     case INTEGER_TYPE:
     case OFFSET_TYPE:
-      if (TYPE_UNSIGNED (type))
+      if (TYPE_SIGN (type) == UNSIGNED)
	{
	  /* Cache 0..N */
	  limit = INTEGER_SHARE_LIMIT;
@@ -1294,7 +1294,7 @@ wide_int_to_tree (tree type, const wide_
	 must be careful here because tree-csts and wide-ints are
	 not canonicalized in the same way.  */
       gcc_assert (TREE_TYPE (t) == type);
-      gcc_assert (TREE_INT_CST_NUNITS (t) == len);
+      gcc_assert (TREE_INT_CST_NUNITS (t) == (int) len);
       if (recanonize)
	{
	  len--;
@@ -1321,7 +1321,10 @@ wide_int_to_tree (tree type, const wide_
	      TREE_VEC_ELT (TYPE_CACHED_VALUES (type), ix) = t;
	    }
	}
-  else if (cst.get_len () == 1)
+  else if (cst.get_len () == 1
+	    && (TYPE_SIGN (type) == SIGNED
+		|| recanonize
+		|| cst.elt (0) >= 0))
     {
       /* 99.99% of all int csts will fit in a single HWI.  Do that one
	 efficiently.  */
@@ -1351,14 +1354,29 @@ wide_int_to_tree (tree type, const wide_
	 for the gc to take care of.  There will not be enough of them
	 to worry about.
 */
       void **slot;

-      tree nt = make_int_cst (len);
-      TREE_INT_CST_NUNITS (nt) = len;
+      tree nt;
+      if (!recanonize
+	   && TYPE_SIGN (type) == UNSIGNED
+	   && cst.elt (len - 1) < 0)
+	{
+	  unsigned int blocks_needed
+	    = (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT;
+
+	  nt = make_int_cst (blocks_needed + 1);
+	  for (i = len; i < blocks_needed; i++)
+	    TREE_INT_CST_ELT (nt, i) = (HOST_WIDE_INT) -1;
+
+	  TREE_INT_CST_ELT (nt, blocks_needed) = 0;
+	}
+      else
+	nt = make_int_cst (len);
       if (recanonize)
	{
	  len--;
	  TREE_INT_CST_ELT (nt, len) = zext_hwi (cst.elt (len), small_prec);
	}
-      for (int i = 0; i < len; i++)
+
+      for (i = 0; i < len; i++)
	TREE_INT_CST_ELT (nt, i) = cst.elt (i);
       TREE_TYPE (nt) = type;
@@ -10556,7 +10574,8 @@ widest_int_cst_value (const_tree x)

 #if HOST_BITS_PER_WIDEST_INT > HOST_BITS_PER_WIDE_INT
   gcc_assert (HOST_BITS_PER_WIDEST_INT >= HOST_BITS_PER_DOUBLE_INT);
-  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2);
+  gcc_assert (TREE_INT_CST_NUNITS (x) <= 2
+	       || (TREE_INT_CST_NUNITS (x) == 3 && TREE_INT_CST_ELT (x, 2) == 0));
   if (TREE_INT_CST_NUNITS (x) == 1)
     val = ((HOST_WIDEST_INT) val << HOST_BITS_PER_WIDE_INT)
	   >> HOST_BITS_PER_WIDE_INT;
@@ -10565,7 +10584,8 @@
Re: [PATCH] alternative hirate for builtin_expert
On 10/02/13 23:49, Rong Xu wrote:

Here is the new patch.  Honza: could you take a look?

Thanks,
-Rong

On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka hubi...@ucw.cz wrote:

Thanks for the suggestion.  This is much cleaner than to use a binary parameter.  Just want to make sure I understand it correctly about the original hitrate: you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use the one specified in the builtin-expect-probability parameter.

Yes.

Should I use 90% as the default?  It's hard to fit the current value 0.9996 in percent form.

Yes, 90% seems fine.  The original value was set quite arbitrarily and no real performance study was made as far as I know except yours.  I think users that are sure they use expect to guard completely cold edges may just use 100% instead of 0.9996, so I would not worry much about the precision.

Honza

OK with that change.
Honza

This broke arm-linux-gnueabihf building libstdc++-v3.  I haven't dug further yet but am still reducing the testcase.  See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619 for details.

Can you please deal with this appropriately?

regards
Ramana
Re: [c++-concepts] constrained friends redux
+  // Do not permit the declaration of constrained friend
+  // function declarations.  They cannot be instantiated since
+  // the resulting declaration would never match the definition,
+  // which must be a non-template and cannot be constrained.

You're in the template-id code here, so "must be a non-template" is confusing:

template <class T> void f();

struct A {
  friend void f<int>();  // matches a template
};

Perhaps you mean that it must match a fully-instantiated function, so any constraints on the templates were considered during determine_specialization.  This seems like a simple comment fix, but there's a longer explanation of what I want (see below).

Would this be more appropriate?

  // Do not allow constrained friend template specializations.

The intent is stronger than to say it must match something.  I don't want to allow any declarations of the form

template<typename T>
struct X {
  friend void f(T x) requires C<T>;  // Error.
};

This should never even get to determine_specialization since the original declaration is never actually pushed.  We could use those constraints to match the specialization to one of several constrained overloads, as you mentioned earlier, but I'd rather avoid that for now.  Solving that problem in general would require that we allow constrained (explicit) specializations and define a method of matching instantiated constraints to dependent constraints, and that we do so as an alternative to the usual constraint checking during template argument deduction.  Maybe it's a useful feature, but it's really hard to gauge how much use it would actually get.  We can always revisit that in the future.  Somebody else can write that paper :)

Andrew
Re: [PATCH] libgccjit.so: an embeddable JIT-compiler based on GCC
On Thu, 3 Oct 2013, David Malcolm wrote:

> Right now all you get back from the result is a void* which you're meant
> to cast to machine code for the CPU.  I guess we could add an

And right now the library is calling dlopen.  Which means that although what the user gets is a void *, the dynamic linker processing has dealt with registering eh_frame information and the right hooks have been passed through for GDB to see that an object has been loaded and access its debug information.  Is this use of dlopen intended to be part of the interface, or just a temporary hack, with users eventually needing to use other interfaces for eh_frame and debug info handling?  (GDB has some support for JITs, but I haven't looked into the details of how it works.)

> option on the gcc_jit_context for setting which ISA you want code for.

Even apart from completely separate ISAs, there's also the matter of other command-line options such as -march=, which I'd think users should be able to specify.  And the complication that default options to cc1 can be determined by specs, whether in .h files or from configure options such as --with-arch=, so cc1's defaults may not be the same as the defaults when you run the driver (which are likely to be better defaults for the JIT than cc1's).  And if the user wants to use -march=native (which seems sensible for the common use case of a JIT, generating code for the same system as runs the compiler), that's handled through specs.

Maybe the driver should be converted to library code so it's possible to run command-line options through it and generate the command line that would get passed to cc1 (but then use that to call a function in the same process - with a different copy of global options data, diagnostic context etc. to avoid any issues from those being initialized twice - rather than running a subprocess).  That would also allow generating the right options for the assembler to pass to the library version of the assembler.
* There are some grotesque kludges in internal-api.c, especially in how we go from .s assembler files to a DSO (grep for gross hack ;) ) Apart from making the assembler into a shared library itself, it would also be nice to avoid needing those files at all (via an API for writing assembler text that doesn't depend on a FILE * pointer, although on GNU hosts you can always use in-memory files via open_memstream, which would make the internal changes much smaller). But in the absence of such a cleanup, running the assembler via the driver should work, although inefficient. (nods) Note that all of the kludgery (if that's a word) is hidden inside the library, so we can fix it all up without affecting client code: the client-facing API doesn't expose any of this. FWIW I added timevars for measuring the invocation of the driver; that kludge makes up about 50% of the overall time taken. I haven't yet broken that down into assembler vs linker vs fork/exec overhead, but clearly this is something that could use optimization - leading to (very) vague thoughts involving turning parts of binutils into libraries also. First I guess it might simply be a library that receives blocks of text that currently go to asm_out_file - but eventually there might be efficiency to be gained by passing binary data to the assembler in some cases (e.g. for large blocks of debug info) rather than generating and then parsing text. (So some asm_out methods would gain implementations passing such data instead of generating text.) -- Joseph S. Myers jos...@codesourcery.com
[PING] [PATCH] Add a new option -ftree-bitfield-merge (patch / doc inside)
Just to ping this patch.
http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01829.html

Regards,
Zoran Jovanovic

From: Zoran Jovanovic
Sent: Tuesday, September 24, 2013 11:59 PM
To: gcc-patches@gcc.gnu.org
Cc: Petar Jovanovic
Subject: RE: [PATCH] Add a new option -ftree-bitfield-merge (patch / doc inside)

Hello,

This is the new patch version.  Comments from Bernhard Reutner-Fischer's review applied.  Also, test case bitfldmrg2.c modified - it is now an execute test.

Example:

Original code:

  <unnamed-unsigned:3> D.1351;
  <unnamed-unsigned:9> D.1350;
  <unnamed-unsigned:7> D.1349;
  D.1349_2 = p1_1(D)->f1;
  p2_3(D)->f1 = D.1349_2;
  D.1350_4 = p1_1(D)->f2;
  p2_3(D)->f2 = D.1350_4;
  D.1351_5 = p1_1(D)->f3;
  p2_3(D)->f3 = D.1351_5;

Optimized code:

  <unnamed-unsigned:19> D.1358;
  _16 = pr1_2(D)->_field0;
  pr2_4(D)->_field0 = _16;

The algorithm works on the basic block level and consists of the following 3 major steps:

1. Go through the basic block statement list.  If there are statement pairs that implement a copy of bit-field content from one memory location to another, record the statement pointers and other necessary data in a corresponding data structure.
2. Identify records that represent adjacent bit-field accesses and mark them as merged.
3. Modify trees accordingly.

A new command line option -ftree-bitfield-merge is introduced.

Tested - passed gcc regression tests.

Changelog -

gcc/ChangeLog:
2013-09-24  Zoran Jovanovic  (zoran.jovano...@imgtec.com)
	* Makefile.in : Added tree-sra.c to GTFILES.
	* common.opt (ftree-bitfield-merge): New option.
	* doc/invoke.texi: Added reference to -ftree-bitfield-merge.
	* tree-sra.c (ssa_bitfield_merge): New function.
	Entry for (-ftree-bitfield-merge).
	(bitfield_stmt_access_pair_htab_hash): New function.
	(bitfield_stmt_access_pair_htab_eq): New function.
	(cmp_access): New function.
	(create_and_insert_access): New function.
	(get_bit_offset): New function.
	(get_merged_bit_field_size): New function.
	(add_stmt_access_pair): New function.
	* dwarf2out.c (simple_type_size_in_bits): moved to tree.c.
	(field_byte_offset): declaration moved to tree.h, static removed.
	* testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
	* testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
	* tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c.
	* tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h.
	* tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c.
	(simple_type_size_in_bits): moved from dwarf2out.c.
	* tree.h (expressions_equal_p): declaration added.
	(field_byte_offset): declaration added.

Patch -

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a2e3f7a..54aa8e7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/asan.c \
   $(srcdir)/ubsan.c \
   $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \
+  $(srcdir)/tree-sra.c \
   @all_gtfiles@

 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/common.opt b/gcc/common.opt
index 202e169..afac514 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2164,6 +2164,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates

+ftree-bitfield-merge
+Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
+Enable bit-field merge on trees
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa0f4ed..e588cae 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
--ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
+-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
@@ -7679,6 +7679,11 @@ pointer alignment information.
 This pass only operates on local scalar variables and is enabled by default
 at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.

+@item -ftree-bitfield-merge
+@opindex ftree-bitfield-merge
+Combines several adjacent bit-field accesses that copy values
+from one memory location to another into one single bit-field access.
+
 @item -ftree-ccp
 @opindex ftree-ccp
 Perform sparse conditional constant propagation (CCP) on trees.  This
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 95049e4..e74db17 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT, unsigned int);
 static tree field_type (const_tree);
Re: [PATCH] alternative hirate for builtin_expert
Dehao, can you take a look?

David

On Fri, Oct 4, 2013 at 6:05 AM, Ramana Radhakrishnan ramra...@arm.com wrote:

On 10/02/13 23:49, Rong Xu wrote:

Here is the new patch.  Honza: could you take a look?

Thanks,
-Rong

On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka hubi...@ucw.cz wrote:

Thanks for the suggestion.  This is much cleaner than to use a binary parameter.  Just want to make sure I understand it correctly about the original hitrate: you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use the one specified in the builtin-expect-probability parameter.

Yes.

Should I use 90% as the default?  It's hard to fit the current value 0.9996 in percent form.

Yes, 90% seems fine.  The original value was set quite arbitrarily and no real performance study was made as far as I know except yours.  I think users that are sure they use expect to guard completely cold edges may just use 100% instead of 0.9996, so I would not worry much about the precision.

Honza

OK with that change.
Honza

This broke arm-linux-gnueabihf building libstdc++-v3.  I haven't dug further yet but am still reducing the testcase.  See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619 for details.

Can you please deal with this appropriately?

regards
Ramana
Re: [Patch] Handle profile counts truncated to 0 in coldness test
On Thu, Oct 3, 2013 at 10:15 PM, Teresa Johnson tejohn...@google.com wrote:

This patch handles the fact that small profile count values sometimes get truncated down to 0 during updates due to integer arithmetic.  This causes sub-optimal function splitting under -freorder-blocks-and-partition.

The first part fixes the logic in probably_never_executed that looks at the bb frequency when the bb count is zero.  It now correctly compares the count computed from the frequency to the number of profile runs instead of to REG_BR_PROB_BASE.  The minimum ratio of profile counts to training runs required for a block to not be considered cold is now encoded in a parameter, with the default value of 4 to match the existing code.

The second part ensures that frequencies are correctly updated after inlining.  The problem is that after inlining the frequencies were being recomputed directly from the corresponding bb counts in rebuild_frequencies.  If the counts had been truncated to 0, then the recomputed frequencies would become 0 as well (often leading to inconsistencies in the frequencies between blocks).  Then the above change to probably_never_executed would not help identify these blocks as non-cold from the frequencies.  The fix was to use the existing logic used during static profile rebuilding to also recompute/propagate the frequencies from the branch probabilities in the profile feedback case.  I also renamed a number of estimate_* routines to compute_* and updated the comments to reflect the fact that these routines are not doing estimation (from a static profile), but in fact recomputing/propagating frequencies from the existing (either guessed or profile-feedback-based) probabilities.

For the second part, it seems to assume the branch probabilities are better maintained than bb counts?

David

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Ok for trunk?  (One more round of regression testing in progress after making some slight cleanup changes.)
Thanks, Teresa 2013-10-03 Teresa Johnson tejohn...@google.com * params.def (PARAM_MIN_HOT_RUN_RATIO): New parameter. * predict.c (probably_never_executed): Compare frequency-based count to number of training runs. (tree_estimate_probability): Update function call to new name compute_bb_frequencies and pass new parameter. (compute_loops_at_level): Renamed. (compute_loops): Ditto. (compute_bb_frequencies): Renamed, and new parameter to force recomputing frequencies. (rebuild_frequencies): Recompute bb frequencies from the probabilities instead of from counts due to truncation issues. * predict.h (compute_bb_frequencies): Update declaration. Index: params.def === --- params.def (revision 203152) +++ params.def (working copy) @@ -373,6 +373,12 @@ DEFPARAM(HOT_BB_FREQUENCY_FRACTION, "Select fraction of the maximal frequency of executions of basic block in function given basic block needs to have to be considered hot", 1000, 0, 0) +DEFPARAM(PARAM_MIN_HOT_RUN_RATIO, + "min-hot-run-ratio", + "The minimum ratio of profile runs to basic block execution count" + " for the block to be considered hot", + 4, 0, 1) + DEFPARAM (PARAM_ALIGN_THRESHOLD, "align-threshold", "Select fraction of the maximal frequency of executions of basic block in function given basic block get alignment", Index: predict.c === --- predict.c (revision 203152) +++ predict.c (working copy) @@ -237,17 +237,31 @@ probably_never_executed (struct function *fun, gcc_checking_assert (fun); if (profile_status_for_function (fun) == PROFILE_READ) { - if ((count * 4 + profile_info->runs / 2) / profile_info->runs > 0) + int min_run_ratio = PARAM_VALUE (PARAM_MIN_HOT_RUN_RATIO); + if (RDIV (count * min_run_ratio, profile_info->runs) > 0) return false; if (!frequency) return true; if (!ENTRY_BLOCK_PTR->frequency) return false; - if (ENTRY_BLOCK_PTR->count && ENTRY_BLOCK_PTR->count < REG_BR_PROB_BASE) + if (ENTRY_BLOCK_PTR->count) { - return (RDIV (frequency * ENTRY_BLOCK_PTR->count, - ENTRY_BLOCK_PTR->frequency) - < REG_BR_PROB_BASE / 4); + gcov_type
scaled_count + = frequency * ENTRY_BLOCK_PTR->count * min_run_ratio; + gcov_type computed_count; + /* Check for overflow, in which case ENTRY_BLOCK_PTR->count should + be large enough to do the division first without losing much + precision. */ + if (scaled_count / frequency / min_run_ratio != ENTRY_BLOCK_PTR->count) +{ + computed_count = RDIV (ENTRY_BLOCK_PTR->count, +
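The truncation Teresa describes is easy to reproduce in isolation. Below is a minimal sketch of the coldness test, not GCC's actual code: rdiv models the RDIV macro (assumed here to round to nearest) and min_run_ratio models the new min-hot-run-ratio parameter (default 4).

```cpp
#include <cassert>
#include <cstdint>

// Rounded division, modeled on GCC's RDIV macro
// (assumption of this sketch: RDIV (a, b) == (a + b / 2) / b).
static int64_t rdiv(int64_t a, int64_t b) { return (a + b / 2) / b; }

// The coldness test from the patch: a block is "probably never executed"
// when count * min-hot-run-ratio rounds down to zero profile runs.
static bool looks_cold(int64_t count, int64_t runs, int64_t min_run_ratio = 4)
{
  return rdiv(count * min_run_ratio, runs) == 0;
}
```

With 100 training runs, a block executed once per few runs still counts as non-cold only when count * 4 reaches half the run count; a count truncated to 0 is always cold, which is why the frequency-based fallback matters.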
Re: [Patch] Internal functions for testsuite in regex
Hi, On 10/04/2013 04:13 PM, Tim Shen wrote: This is based on the discussions here (http://gcc.gnu.org/ml/libstdc++/2013-10/msg00034.html) And it successfully finds a bug in regex_executor.tcc :) Booted and tested under -m32, -m64 and debug before the change in regex_executor.tcc; -m32 and -m64 only for regex_executor.tcc, but I'll start a bootstrap now. Can you please add short comments inline in the code explaining the semantics of the new functions? A short comment before each one. In particular, for the new *_testsuite functions it isn't immediately clear how they differ from the non-_testsuite variants. In any case we should figure out a better name, maybe even *_internal, if we can't find anything more accurate, but it should be self-explaining in terms of their C++ semantics. Please also document in a comment before the definition of __regex_search_impl the __policy parameter (not just in the body of the function). And I don't see why we can't use an enumeration instead of integers, like enum class _RegexExecutorPolicy : int { _AutoExecutor, _DFSExecutor, _BFSExecutor }; More trivial issues: inline functions (like all the new _testsuite ones) should be in .h, not in .tcc. Some unuglified names, like res and res2. Paolo.
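Paolo's enum suggestion, sketched as self-contained code; the enumerator names come from his message, while the dispatcher and everything else here are illustrative, not libstdc++ API:

```cpp
#include <cassert>
#include <cstring>

// The enumeration suggested in place of raw integer policy values.
enum class _RegexExecutorPolicy : int
{ _AutoExecutor, _DFSExecutor, _BFSExecutor };

// Sketch of a dispatcher: internal entry points would take the policy as a
// typed parameter instead of a magic int, so call sites are self-documenting.
static const char* executor_name(_RegexExecutorPolicy __p)
{
  switch (__p)
    {
    case _RegexExecutorPolicy::_DFSExecutor: return "DFS";
    case _RegexExecutorPolicy::_BFSExecutor: return "BFS";
    default: return "auto";
    }
}
```

Because the enum is scoped, an accidental plain `0` or `1` argument no longer compiles, which is the point of the suggestion.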
Re: [Patch] Internal functions for testsuite in regex
On 10/04/2013 06:04 PM, Paolo Carlini wrote: In particular, for the new *_testsuite functions it isn't immediately clear how they differ from the non-_testsuite variants. In any case we should figure out a better name, maybe even *_internal, if we can't find anything more accurate, but it should be self-explaining in terms of their C++ semantics. Thus, the functions are comparing two different executors. Then say that in the name, like: __regex_search_debug, or __regex_search_compare. And please say what they are for in a comment before at least the primary overload, the one actually carrying out the comparison. All the other comments still stand. Paolo.
[patch] Fix unresolved symbol with -gsplit-dwarf
When building the location list for a variable that has been optimized by SRA, dw_sra_loc_expr sometimes creates a DWARF expression or a piece of an expression, but then abandons it for some reason. When abandoning it, we neglected to release any addr_table entries created for DW_OP_addr_index opcodes, occasionally resulting in a link-time unresolved symbol. This patch releases the addr_table entries before abandoning a location expression. Bootstrapped and regression tested on x86-64. Committed to trunk at r203206. -cary 2013-10-03 Cary Coutant ccout...@google.com gcc/ * dwarf2out.c (dw_sra_loc_expr): Release addr_table entries when discarding a location list expression (or a piece of one). Index: dwarf2out.c === --- dwarf2out.c (revision 203183) +++ dwarf2out.c (working copy) @@ -13492,6 +13492,9 @@ dw_sra_loc_expr (tree decl, rtx loc) if (last != NULL && opsize != bitsize) { padsize += bitsize; + /* Discard the current piece of the descriptor and release any + addr_table entries it uses.
*/ + remove_loc_list_addr_table_entries (cur_descr); continue; } @@ -13500,18 +13503,24 @@ dw_sra_loc_expr (tree decl, rtx loc) if (padsize) { if (padsize > decl_size) - return NULL; + { + remove_loc_list_addr_table_entries (cur_descr); + goto discard_descr; + } decl_size -= padsize; *descr_tail = new_loc_descr_op_bit_piece (padsize, 0); if (*descr_tail == NULL) - return NULL; + { + remove_loc_list_addr_table_entries (cur_descr); + goto discard_descr; + } descr_tail = &(*descr_tail)->dw_loc_next; padsize = 0; } *descr_tail = cur_descr; descr_tail = tail; if (bitsize > decl_size) - return NULL; + goto discard_descr; decl_size -= bitsize; if (last == NULL) { @@ -13547,9 +13556,9 @@ dw_sra_loc_expr (tree decl, rtx loc) { if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN && (memsize > BITS_PER_WORD || bitsize > BITS_PER_WORD)) - return NULL; + goto discard_descr; if (memsize < bitsize) - return NULL; + goto discard_descr; if (BITS_BIG_ENDIAN) offset = memsize - bitsize; } @@ -13557,7 +13566,7 @@ *descr_tail = new_loc_descr_op_bit_piece (bitsize, offset); if (*descr_tail == NULL) - return NULL; + goto discard_descr; descr_tail = &(*descr_tail)->dw_loc_next; } } @@ -13568,9 +13577,14 @@ { *descr_tail = new_loc_descr_op_bit_piece (decl_size, 0); if (*descr_tail == NULL) - return NULL; + goto discard_descr; } return descr; + +discard_descr: + /* Discard the descriptor and release any addr_table entries it uses. */ + remove_loc_list_addr_table_entries (descr); + return NULL; } /* Return the dwarf representation of the location list LOC_LIST of
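The shape of the fix, independent of DWARF specifics: every early `return NULL` becomes a jump to one label that releases side resources before abandoning the partially built result. A toy version, with a counter standing in for the real addr_table and a trivial list node standing in for dw_loc_descr:

```cpp
#include <cassert>

struct loc_piece { loc_piece* next; };

static int addr_table_entries = 0;   // stand-in for the real addr_table

// Builds a three-piece descriptor; each piece "creates" one table entry.
// On a midway failure, jump to the single cleanup label instead of
// returning NULL directly (which would leak the entries).
static loc_piece* build_descr(bool fail_midway)
{
  loc_piece* descr = nullptr;
  loc_piece** tail = &descr;
  for (int i = 0; i < 3; ++i)
    {
      *tail = new loc_piece{nullptr};
      ++addr_table_entries;
      tail = &(*tail)->next;
      if (fail_midway && i == 1)
        goto discard_descr;          // was: return NULL
    }
  return descr;

 discard_descr:
  // Release everything the abandoned descriptor created.
  while (descr)
    {
      loc_piece* next = descr->next;
      --addr_table_entries;
      delete descr;
      descr = next;
    }
  return nullptr;
}
```

The single label keeps the cleanup in one place, which is why the patch converts the scattered `return NULL` statements rather than adding release calls at every exit.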
Re: [Patch] Internal functions for testsuite in regex
... a final one: if you don't like all those *_debug functions around, both in regex.h and regex.tcc, you could move all of them to a new header matching the new naming scheme, like regex_debug.h. For the time being I would recommend putting it in bits/, installing it, including it from regex, exactly like all the other regex bits. A completely different option, which I also like a lot in fact, would be putting the new *_testsuite functions inside the already existing testsuite/util/testsuite/regex.h. There you would use namespace __gnu_test, would not require strict uglification, etc. Paolo.
[Patch, Fortran, committed] GCC 4.8 backporting of defined assignment patches
I have committed the attached patch to the GCC 4.8 branch, backporting some defined assignment patches. Committed as Rev. 203207/203208. GCC 4.8 added defined assignment for components during intrinsic assignment, which had some issues. a) PR 57697/PR58469 Patch: http://gcc.gnu.org/ml/fortran/2013-09/msg00039.html Approval: http://gcc.gnu.org/ml/fortran/2013-09/msg00056.html b) http://gcc.gnu.org/ml/fortran/2013-09/msg00016.html c) http://gcc.gnu.org/ml/fortran/2013-09/msg00026.html d) http://gcc.gnu.org/ml/fortran/2013-09/msg00038.html Tobias Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (Revision 203206) +++ gcc/fortran/ChangeLog (Arbeitskopie) @@ -1,3 +1,13 @@ +2013-10-04 Tobias Burnus bur...@net-b.de + + Backport from mainline + 2013-09-25 Tobias Burnus bur...@net-b.de + + PR fortran/57697 + PR fortran/58469 + * resolve.c (generate_component_assignments): Avoid double free + at runtime and freeing a still-being used expr. + 2013-08-24 Mikael Morin mik...@gcc.gnu.org PR fortran/57798 Index: gcc/fortran/resolve.c === --- gcc/fortran/resolve.c (Revision 203206) +++ gcc/fortran/resolve.c (Arbeitskopie) @@ -9997,6 +9997,26 @@ temp_code = build_assignment (EXEC_ASSIGN, t1, (*code)-expr1, NULL, NULL, (*code)-loc); + + /* For allocatable LHS, check whether it is allocated. Note + that allocatable components with defined assignment are + not yet support. See PR 57696. 
*/ + if ((*code)-expr1-symtree-n.sym-attr.allocatable) + { + gfc_code *block; + gfc_expr *e = + gfc_lval_expr_from_sym ((*code)-expr1-symtree-n.sym); + block = gfc_get_code (); + block-op = EXEC_IF; + block-block = gfc_get_code (); + block-block-op = EXEC_IF; + block-block-expr1 + = gfc_build_intrinsic_call (ns, +GFC_ISYM_ALLOCATED, allocated, +(*code)-loc, 1, e); + block-block-next = temp_code; + temp_code = block; + } add_code_to_chain (temp_code, tmp_head, tmp_tail); } @@ -10005,8 +10025,37 @@ gfc_free_expr (this_code-ext.actual-expr); this_code-ext.actual-expr = gfc_copy_expr (t1); add_comp_ref (this_code-ext.actual-expr, comp1); + + /* If the LHS variable is allocatable and wasn't allocated and + the temporary is allocatable, pointer assign the address of + the freshly allocated LHS to the temporary. */ + if ((*code)-expr1-symtree-n.sym-attr.allocatable + gfc_expr_attr ((*code)-expr1).allocatable) + { + gfc_code *block; + gfc_expr *cond; + + cond = gfc_get_expr (); + cond-ts.type = BT_LOGICAL; + cond-ts.kind = gfc_default_logical_kind; + cond-expr_type = EXPR_OP; + cond-where = (*code)-loc; + cond-value.op.op = INTRINSIC_NOT; + cond-value.op.op1 = gfc_build_intrinsic_call (ns, + GFC_ISYM_ALLOCATED, allocated, + (*code)-loc, 1, gfc_copy_expr (t1)); + block = gfc_get_code (); + block-op = EXEC_IF; + block-block = gfc_get_code (); + block-block-op = EXEC_IF; + block-block-expr1 = cond; + block-block-next = build_assignment (EXEC_POINTER_ASSIGN, + t1, (*code)-expr1, + NULL, NULL, (*code)-loc); + add_code_to_chain (block, head, tail); + } } - } + } else if (this_code-op == EXEC_ASSIGN !this_code-next) { /* Don't add intrinsic assignments since they are already @@ -10028,13 +10077,6 @@ } } - /* This is probably not necessary. */ - if (this_code) -{ - gfc_free_statements (this_code); - this_code = NULL; -} - /* Put the temporary assignments at the top of the generated code. 
*/ if (tmp_head component_assignment_level == 1) { @@ -10043,6 +10085,30 @@ tmp_head = tmp_tail = NULL; } + // If we did a pointer assignment - thus, we need to ensure that the LHS is + // not accidentally deallocated. Hence, nullify t1. + if (t1 (*code)-expr1-symtree-n.sym-attr.allocatable + gfc_expr_attr ((*code)-expr1).allocatable) +{ + gfc_code *block; + gfc_expr *cond; + gfc_expr *e; + + e = gfc_lval_expr_from_sym ((*code)-expr1-symtree-n.sym); + cond = gfc_build_intrinsic_call (ns, GFC_ISYM_ASSOCIATED, associated, + (*code)-loc, 2, gfc_copy_expr (t1), e); + block = gfc_get_code (); + block-op = EXEC_IF; + block-block = gfc_get_code (); + block-block-op = EXEC_IF; + block-block-expr1 = cond; + block-block-next = build_assignment (EXEC_POINTER_ASSIGN, + t1, gfc_get_null_expr ((*code)-loc), + NULL, NULL, (*code)-loc); + gfc_append_code (tail, block); + tail = block; +} + /* Now attach the remaining code chain to the input code. Step on to the end of the new code since resolution is complete. */ gcc_assert ((*code)-op == EXEC_ASSIGN); @@ -10052,7
Re: patch to canonize unsigned tree-csts
I was hoping Richard would weigh in here. In case not... Kenneth Zadeck zad...@naturalbridge.com writes: I was thinking that we should always be able to use the constant as-is for max_wide_int-based and addr_wide_int-based operations. The small_prec again, you can get edge cased to death here. I think it would work for max because that really is bigger than anything else, but it is possible (though unlikely) to have something big converted to an address by truncation. But I'd have expected that conversion to be represented by an explicit CONVERT_EXPR or NOP_EXPR. It seems wrong to use addr_wide_int directly on something that isn't bit- or byte-address-sized. It'd be the C equivalent of int + long -> int rather than the expected int + long -> long. Same goes for wide_int. If we're doing arithmetic at a specific precision, it seems odd for one of the inputs to be wider and yet not have an explicit truncation. you miss the second reason why we needed addr-wide-int. A large amount of the places where the addressing arithmetic is done are not type safe. Only the gimple and rtl that is translated from the source code is really type safe. In passes like the strength reduction where they just grab things from all over, the addr-wide-int or the max-wide-int are safe-haven structures that are guaranteed to be large enough to not matter. So what I fear here is something like a very wide loop counter being grabbed into some address calculation. It still seems really dangerous to be implicitly truncating a wider type to addr_wide_int. It's not something we'd ever do in mainline, because uses of addr_wide_int are replacing uses of double_int, and double_int is our current maximum-width representation. Using addr_wide_int rather than max_wide_int is an optimisation. IMO part of implementing that optimisation should be to introduce explicit truncations whenever we try to use addr_wide_int to operate on inputs that might be wider than addr_wide_int.
So I still think the assert is the way to go. Thanks, Richard
[PATCH][ARM]Use cortex tune_params for armv8-a architecture
Hi all, This patch will change tune_params for the armv8-a architecture to general cortex. The change has been tested for arm-none-eabi on the model. Ok for trunk? Kind regards, Renlin Li gcc/ChangeLog: 2013-10-03 Renlin Li renlin...@arm.com * config/arm/arm-cores.def (cortex-a53): Use cortex tuning. diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 3d59fa6..17c9bf3 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -129,7 +129,7 @@ ARM_CORE(cortex-a7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV ARM_CORE(cortex-a8, cortexa8, 7A, FL_LDSCHED, cortex) ARM_CORE(cortex-a9, cortexa9, 7A, FL_LDSCHED, cortex_a9) ARM_CORE(cortex-a15, cortexa15, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) -ARM_CORE(cortex-a53, cortexa53, 8A, FL_LDSCHED, cortex_a5) +ARM_CORE(cortex-a53, cortexa53, 8A, FL_LDSCHED, cortex) ARM_CORE(cortex-r4, cortexr4, 7R, FL_LDSCHED, cortex) ARM_CORE(cortex-r4f, cortexr4f, 7R, FL_LDSCHED, cortex) ARM_CORE(cortex-r5, cortexr5, 7R, FL_LDSCHED | FL_ARM_DIV, cortex)
Re: [c++-concepts] constrained friends redux
On 10/04/2013 09:20 AM, Andrew Sutton wrote: Perhaps you mean that it must match a fully-instantiated function, so any constraints on the templates were considered during determine_specialization. This seems like a simple comment fix, but there's a longer explanation of what I want (see below). Would this be more appropriate? // Do not allow constrained friend template specializations. The intent is stronger than to say it must match something. By must I meant that whatever it matches could only be a fully-instantiated function. But I guess the main reason for disallowing constraints here is the same as for explicit specializations and non-template functions; non-dependent constraints don't really make any sense. Jason
Re: [c++-concepts] constrained friends redux
Perhaps you mean that it must match a fully-instantiated function, so any constraints on the templates were considered during determine_specialization. This seems like a simple comment fix, but there's a longer explanation of what I want (see below). Would this be more appropriate? // Do not allow constrained friend template specializations. The intent is stronger than to say it must match something. By must I meant that whatever it matches could only be a fully-instantiated function. I see what you mean. I was caught up in the wrong part of the sentence. But yes, that's right. But I guess the main reason for disallowing constraints here is the same as for explicit specializations and non-template functions; non-dependent constraints don't really make any sense. That's the intent. Okay to commit? Andrew friends-3.patch Description: Binary data
Re: [PATCH] alternative hitrate for builtin_expect
My change on the probability of builtin_expect does have an impact on the partial inline (more outlined functions will get inlined back to the original function). I think this triggers an existing issue. Dehao will explain this in his coming email. -Rong On Fri, Oct 4, 2013 at 6:05 AM, Ramana Radhakrishnan ramra...@arm.com wrote: On 10/02/13 23:49, Rong Xu wrote: Here is the new patch. Honza: Could you take a look? Thanks, -Rong On Wed, Oct 2, 2013 at 2:31 PM, Jan Hubicka hubi...@ucw.cz wrote: Thanks for the suggestion. This is much cleaner than using a binary parameter. Just want to make sure I understand it correctly about the original hitrate: you want to retire the hitrate in PRED_BUILTIN_EXPECT and always use the one specified in the builtin-expect-probability parameter. Yes. Should I use 90% as the default? It's hard to fit the current value 0.9996 in percent form. Yes, 90% seems fine. The original value was set quite arbitrarily and no real performance study was made as far as I know except yours. I think users that are sure they use expect to guard completely cold edges may just use 100% instead of 0.9996, so I would not worry much about the precision. Honza -Rong OK with that change. Honza This broke arm-linux-gnueabihf building libstdc++-v3. I haven't dug further yet but am still reducing the testcase. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619 for details. Can you please deal with this appropriately? regards Ramana
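For readers following the thread, a minimal example of the builtin under discussion. The hitrate (the hardcoded 0.9996 being retired, or the new builtin-expect-probability param's 90% default) only biases the predicted branch probability; it never changes the value a program computes:

```cpp
#include <cassert>

static const int sample[] = {1, 0, 2, 0, 3};

// Count elements on a path the programmer asserts is cold.
// __builtin_expect is a GCC/Clang builtin; the second argument is the
// expected truth value, here 0 ("this branch is unlikely").
static long count_rare(const int* v, int n)
{
  long hits = 0;
  for (int i = 0; i < n; ++i)
    if (__builtin_expect(v[i] == 0, 0))
      ++hits;
  return hits;
}
```

If the hint turns out wrong at runtime, only performance suffers (block layout, partial inlining decisions as in this thread), never correctness.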
[google/gcc-4_8] Backport fix for unresolved symbol with -gsplit-dwarf
When building the location list for a variable that has been optimized by SRA, dw_sra_loc_expr sometimes creates a DWARF expression or a piece of an expression, but then abandons it for some reason. When abandoning it, we neglected to release any addr_table entries created for DW_OP_addr_index opcodes, occasionally resulting in a link-time unresolved symbol. This patch releases the addr_table entries before abandoning a location expression. Backported from trunk r203206. Google ref b/10833306. 2013-10-03 Cary Coutant ccout...@google.com gcc/ * dwarf2out.c (dw_sra_loc_expr): Release addr_table entries when discarding a location list expression (or a piece of one).
Go patch committed: Use backend interface for references to temps
This patch from Chris Manghane changes the Go frontend to use the backend interface when building a reference to a temporary variable. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian 2013-10-04 Chris Manghane cm...@google.com * go-gcc.cc (Backend::convert_expression): New function. Index: gcc/go/go-gcc.cc === --- gcc/go/go-gcc.cc (revision 203037) +++ gcc/go/go-gcc.cc (revision 203038) @@ -208,6 +208,16 @@ class Gcc_backend : public Backend Bexpression* zero_expression(Btype*); + Bexpression* + error_expression() + { return this-make_expression(error_mark_node); } + + Bexpression* + var_expression(Bvariable* var, Location); + + Bexpression* + indirect_expression(Bexpression* expr, bool known_valid, Location); + // Statements. Bstatement* @@ -848,6 +858,30 @@ Gcc_backend::zero_expression(Btype* btyp return tree_to_expr(ret); } +// An expression that references a variable. + +Bexpression* +Gcc_backend::var_expression(Bvariable* var, Location) +{ + tree ret = var-get_tree(); + if (ret == error_mark_node) +return this-error_expression(); + return tree_to_expr(ret); +} + +// An expression that indirectly references an expression. + +Bexpression* +Gcc_backend::indirect_expression(Bexpression* expr, bool known_valid, + Location location) +{ + tree ret = build_fold_indirect_ref_loc(location.gcc_location(), + expr-get_tree()); + if (known_valid) +TREE_THIS_NOTRAP(ret) = 1; + return tree_to_expr(ret); +} + // An expression as a statement. Bstatement* Index: gcc/go/gofrontend/backend.h === --- gcc/go/gofrontend/backend.h (revision 203037) +++ gcc/go/gofrontend/backend.h (revision 203038) @@ -231,6 +231,22 @@ class Backend virtual Bexpression* zero_expression(Btype*) = 0; + // Create an error expression. This is used for cases which should + // not occur in a correct program, in order to keep the compilation + // going without crashing. 
+ virtual Bexpression* + error_expression() = 0; + + // Create a reference to a variable. + virtual Bexpression* + var_expression(Bvariable* var, Location) = 0; + + // Create an expression that indirects through the pointer expression EXPR + // (i.e., return the expression for *EXPR). KNOWN_VALID is true if the pointer + // is known to point to a valid memory location. + virtual Bexpression* + indirect_expression(Bexpression* expr, bool known_valid, Location) = 0; + // Statements. // Create an error statement. This is used for cases which should Index: gcc/go/gofrontend/expressions.cc === --- gcc/go/gofrontend/expressions.cc (revision 203037) +++ gcc/go/gofrontend/expressions.cc (revision 203038) @@ -978,22 +978,19 @@ Var_expression::do_get_tree(Translate_co { Bvariable* bvar = this-variable_-get_backend_variable(context-gogo(), context-function()); - tree ret = var_to_tree(bvar); - if (ret == error_mark_node) -return error_mark_node; bool is_in_heap; + Location loc = this-location(); if (this-variable_-is_variable()) is_in_heap = this-variable_-var_value()-is_in_heap(); else if (this-variable_-is_result_variable()) is_in_heap = this-variable_-result_var_value()-is_in_heap(); else go_unreachable(); + + Bexpression* ret = context-backend()-var_expression(bvar, loc); if (is_in_heap) -{ - ret = build_fold_indirect_ref_loc(this-location().gcc_location(), ret); - TREE_THIS_NOTRAP(ret) = 1; -} - return ret; +ret = context-backend()-indirect_expression(ret, true, loc); + return expr_to_tree(ret); } // Ast dump for variable expression.
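The pattern the patch extends can be sketched abstractly; the types below are simplified stand-ins for Backend/Gcc_backend, not the real interfaces. The key idea is that the frontend only handles opaque expression handles, and failures collapse to one distinguished error expression instead of error_mark_node leaking through frontend code:

```cpp
#include <cassert>
#include <cstring>

struct Bexpression { const char* kind; };

static Bexpression error_expr = { "error" };

struct Backend
{
  virtual ~Backend() {}

  // Error fallback, so compilation can keep going after a bad input.
  virtual Bexpression* error_expression() { return &error_expr; }

  // Reference to a variable; an invalid variable yields the error expr.
  virtual Bexpression* var_expression(bool valid)
  { return valid ? new Bexpression{ "var" } : error_expression(); }

  // Indirection (*expr); errors propagate unchanged.
  virtual Bexpression* indirect_expression(Bexpression* e)
  { return e == &error_expr ? e : new Bexpression{ "indirect" }; }
};

static Backend backend;
```

This mirrors how Var_expression::do_get_tree now composes var_expression and indirect_expression instead of building trees directly.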
libgo patch committed: Fix calling Interface on MakeFunc value
This patch to libgo fixes calling the Interface method on a Value created by calling MakeFunc. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch. Ian diff -r 52c01e7b81fe libgo/go/reflect/all_test.go --- a/libgo/go/reflect/all_test.go Fri Oct 04 11:00:40 2013 -0700 +++ b/libgo/go/reflect/all_test.go Fri Oct 04 11:49:27 2013 -0700 @@ -1454,6 +1454,30 @@ } } +func TestMakeFuncInterface(t *testing.T) { + switch runtime.GOARCH { + case "amd64", "386": + default: + t.Skip("MakeFunc not implemented for " + runtime.GOARCH) + } + + fn := func(i int) int { return i } + incr := func(in []Value) []Value { + return []Value{ValueOf(int(in[0].Int() + 1))} + } + fv := MakeFunc(TypeOf(fn), incr) + ValueOf(&fn).Elem().Set(fv) + if r := fn(2); r != 3 { + t.Errorf("Call returned %d, want 3", r) + } + if r := fv.Call([]Value{ValueOf(14)})[0].Int(); r != 15 { + t.Errorf("Call returned %d, want 15", r) + } + if r := fv.Interface().(func(int) int)(26); r != 27 { + t.Errorf("Call returned %d, want 27", r) + } +} + type Point struct { x, y int } diff -r 52c01e7b81fe libgo/go/reflect/makefunc.go --- a/libgo/go/reflect/makefunc.go Fri Oct 04 11:00:40 2013 -0700 +++ b/libgo/go/reflect/makefunc.go Fri Oct 04 11:49:27 2013 -0700 @@ -63,7 +63,7 @@ impl := &makeFuncImpl{code: code, typ: ftyp, fn: fn} - return Value{t, unsafe.Pointer(&impl), flag(Func) << flagKindShift} + return Value{t, unsafe.Pointer(&impl), flag(Func<<flagKindShift) | flagIndir} } // makeFuncStub is an assembly function that is the code half of
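The one-line fix is easier to see with a toy model of the flag word. The bit positions below are hypothetical, not Go's real layout; the point is only the shape of the bug: the old expression encoded the kind but dropped flagIndir, so Interface() later misread how the function value was stored.

```cpp
#include <cassert>

typedef unsigned flag;

const unsigned kFunc = 19;         // reflect.Func's kind number
const unsigned flagKindShift = 4;  // hypothetical position of the kind field
const flag flagIndir = 1u << 1;    // hypothetical "stored indirectly" bit

// Before the fix: kind encoded, indirection bit lost.
const flag before_fix = flag(kFunc) << flagKindShift;
// After the fix: kind field plus the indirection bit.
const flag after_fix = (flag(kFunc) << flagKindShift) | flagIndir;
```

Both words decode to the same kind, but only the fixed one tells the reflect machinery that the word holds a pointer to the makeFuncImpl rather than the value itself.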
Re: [PATCH] alternative hitrate for builtin_expect
I looked at this problem. Bug updated: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619 This is a bug when updating the block during tree-inline. Basically, it is legal for *n to be NULL. E.g. when gimple_block (id->gimple_call) is NULL, remap_blocks_to_null will be called to set *n to NULL. The NULL in gimple_block (gimple_call) comes from the call introduced by ipa-split? I remember that ipa-split used to try to put the call into a block since we were ICEing in similar ways previously, too. Perhaps this has changed with the new BLOCK representation? Honza
Re: patch to canonize unsigned tree-csts
On 10/04/2013 01:00 PM, Richard Sandiford wrote: I was hoping Richard would weigh in here. In case not... Kenneth Zadeck zad...@naturalbridge.com writes: I was thinking that we should always be able to use the constant as-is for max_wide_int-based and addr_wide_int-based operations. The small_prec again, you can get edge cased to death here. I think it would work for max because that really is bigger than anything else, but it is possible (though unlikely) to have something big converted to an address by truncation. But I'd have expected that conversion to be represented by an explicit CONVERT_EXPR or NOP_EXPR. It seems wrong to use addr_wide_int directly on something that isn't bit- or byte-address-sized. It'd be the C equivalent of int + long -> int rather than the expected int + long -> long. Same goes for wide_int. If we're doing arithmetic at a specific precision, it seems odd for one of the inputs to be wider and yet not have an explicit truncation. you miss the second reason why we needed addr-wide-int. A large amount of the places where the addressing arithmetic is done are not type safe. Only the gimple and rtl that is translated from the source code is really type safe. In passes like the strength reduction where they just grab things from all over, the addr-wide-int or the max-wide-int are safe-haven structures that are guaranteed to be large enough to not matter. So what I fear here is something like a very wide loop counter being grabbed into some address calculation. It still seems really dangerous to be implicitly truncating a wider type to addr_wide_int. It's not something we'd ever do in mainline, because uses of addr_wide_int are replacing uses of double_int, and double_int is our current maximum-width representation. Using addr_wide_int rather than max_wide_int is an optimisation.
IMO part of implementing that optimisation should be to introduce explicit truncations whenever we try to use addr_wide_int to operate on inputs that might be wider than addr_wide_int. So I still think the assert is the way to go. addr_wide_int is not as much of an optimization as it is documentation of what you are doing - i.e. this is addressing arithmetic. My justification for putting it in was that we wanted a sort of an abstract type to say that this was not just user math, it was addressing arithmetic and that the ultimate result is going to be slammed into a target pointer. I was only using that as an example to try to indicate that I did not think that it was wrong if someone did truncate. In particular, would you want the assert to be that the value was truncated or that the type of the value would allow numbers that would be truncated? I actually think neither. If a programmer uses a long long on a 32 bit machine for some index variable and slams that into a pointer, he either knows what he is doing or has made a mistake. Do you really think the compiler should ICE? Thanks, Richard
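The "int + long" analogy can be made concrete with fixed-width integers standing in for the wide-int classes; this is purely illustrative, not GCC code. Narrowing a 64-bit loop counter to an address-sized value silently drops the high bits, and the debate above is whether that narrowing point should be explicit (and assertable):

```cpp
#include <cassert>
#include <cstdint>

// Would narrowing this "max-width" value to an address-sized one lose bits?
// This is the condition Richard's proposed assert would check.
static bool would_truncate(uint64_t wide) { return (wide >> 32) != 0; }

// The explicit truncation point -- the analogue of writing the conversion
// out as a CONVERT_EXPR instead of letting it happen implicitly.
static uint32_t to_addr(uint64_t wide)
{
  return static_cast<uint32_t>(wide);
}
```

With an explicit helper like this, every max-width-to-address narrowing is visible at the call site; the disagreement is over whether a check there should be a hard assert or the programmer's responsibility.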
Merge from GCC 4.8 branch to gccgo branch
I merged revision 203214 of the GCC 4.8 branch to the gccgo branch. Ian
Re: [PATCH] alternative hitrate for builtin_expect
On Fri, Oct 4, 2013 at 11:54 AM, Jan Hubicka hubi...@ucw.cz wrote: I looked at this problem. Bug updated http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619 This is a bug when updating block during tree-inline. Basically, it is legal for *n to be NULL. E.g. When gimple_block(id-gimple_call) is NULL, remap_blocks_to_null will be called to set *n to NULL. The NULL in gimple_block (gimple_call) comes from the call introduced by ipa-split? That is correct. I remember that ipa-split used to try to put the call into block since we was ICEing in similar ways previously, too. Perhaps this has changed with new BLOCK representation? The new BLOCK representation does not change this. I think it makes sense to leave the block of newly introduced call_stmt as NULL because when it's inlined back, we don't want to add additional block layers. Dehao Honza
Re: [c++-concepts] constrained friends redux
OK. Jason
[gomp4] Target fallback ICV handling, ICV fixes
Hi! I've committed the following patch to gomp-4.0-branch. The omp-low.c changes are to fix some bugs with the if clause on #pragma omp target{, data, update}. The c-cppbuiltin.c change is to finally announce OpenMP 4.0 support for C/C++. The libgomp changes are: 1) as required by OpenMP 4.0, thread_limit_var is now a per-data-environment ICV, rather than a global var 2) gomp_remaining_threads_count has been removed; instead, as required by the spec, ThreadsBusy is tracked per contention group inside of thread_pool; if there is just one contention group, then the new thr->thread_pool->threads_busy should be the difference between icv->thread_limit_var and the old gomp_remaining_threads_count; so, in gomp_resolve_num_threads now we add nthreads - 1 to it rather than subtracting it 3) apparently the old OMP_THREAD_LIMIT code was buggy, because GOMP_parallel_end was also subtracting from gomp_remaining_threads_count rather than adding to it (with the new code it is correct to subtract; when I get spare time I'll write a small alternative patch for the release branches together with the thread-limit-1.c testcase) 4) as the threads_busy count is now per contention group, if a parallel isn't nested, we actually don't need to atomically update the counter, because there is just one thread in the contention group 5) the gomp_managed_threads counter remains a global var; it is used to decide about spinning length, is desirable to be global, and is not a user-observable thing covered by the standard; I've just renamed the mutex guarding it 6) for the GOMP_target host fallback, the function will create a new initial thread by making a copy of the old TLS *gomp_thread () and clearing it (except for the affinity place, and reinitializing its place var to the whole place list), then restoring back 7) I've noticed that the thr->release semaphore is never used for the master threads, so there is no point initializing it; we were initializing it just for the first initial thread, e.g.
not in subsequent user pthread_create created threads that encounter #pragma omp constructs; and the semaphore wasn't ever destroyed 8) GOMP_teams is now implemented for the host fallback just by adjusting icv-thread_limit_var 9) on the target-7.c testcase I found several issues in the var remapping code (some fields could be uninitialized in certain cases) Tested on x86_64-linux, committed. 2013-10-04 Jakub Jelinek ja...@redhat.com * omp-low.c (expand_omp_target): When handling IF clause on #pragma omp target, split new_bb rather than entry_bb. If not GF_OMP_TARGET_KIND_REGION, split new_bb right before the GOMP_TARGET stmt, rather than after labels. gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins): Predefine _OPENMP to 201307 instead of 201107. libgomp/ * libgomp.h (struct gomp_task_icv): Add thread_limit_var. (gomp_thread_limit_var, gomp_remaining_threads_count, gomp_remaining_threads_lock): Remove. (gomp_managed_threads_lock): New variable. (struct gomp_thread_pool): Add threads_busy field. (gomp_free_thread): New prototype. * parallel.c (gomp_resolve_num_threads): Adjust for thread_limit now being in icv-thread_limit_var. Use UINT_MAX instead of ULONG_MAX as infinity. If not nested, just return minimum of max_num_threads and icv-thread_limit_var and if thr-thread_pool, set threads_busy to the returned value. Otherwise, don't update atomically gomp_remaining_threads_count, but instead thr-thread_pool-threads_busy. (GOMP_parallel_end): Adjust for thread_limit now being in icv-thread_limit_var. Use UINT_MAX instead of ULONG_MAX as infinity. Adjust threads_busy in the pool rather than gomp_remaining_threads_count. Remember team-nthreads and call gomp_team_end before adjusting threads_busy, if not nested afterwards, just set it to 1 non-atomically. * team.c (gomp_thread_start): Clear thr-thread_pool and thr-task before returning. (gomp_free_pool_helper): Clear thr-thread_pool and thr-task before calling pthread_exit. (gomp_free_thread): No longer static. 
Use gomp_managed_threads_lock instead of gomp_remaining_threads_lock. (gomp_team_start): Set thr-thread_pool-threads_busy to nthreads immediately after creating new pool. Use gomp_managed_threads_lock instead of gomp_remaining_threads_lock. (gomp_team_end): Use gomp_managed_threads_lock instead of gomp_remaining_threads_lock. (initialize_team): Don't call gomp_sem_init here. * env.c (gomp_global_icv): Initialize thread_limit_var field. (gomp_thread_limit_var, gomp_remaining_threads_count, gomp_remaining_threads_lock): Remove. (gomp_managed_threads_locks): New variable.
[PATCH] Refactor a bit of jump thread identification code
This pulls out the code to search for a threading opportunity in a normal block into its own subroutine. No functional changes.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Installed on the trunk.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 065ebf1..cf4a45c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2013-10-04  Jeff Law  l...@redhat.com
+
+	* tree-ssa-threadedge.c: Fix some trailing whitespace problems.
+
+	* tree-ssa-threadedge.c (thread_through_normal_block): Broken out of ...
+	(thread_across_edge): Here.  Call it.
+
 2013-10-04  Cary Coutant  ccout...@google.com
 
 	* dwarf2out.c (dw_sra_loc_expr): Release addr_table entries when
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 39e921b..c2dd015 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -771,7 +771,7 @@ thread_around_empty_blocks (edge taken_edge,
   gsi = gsi_start_nondebug_bb (bb);
 
   /* If the block has no statements, but does have a single successor, then
-     it's just a forwarding block and we can thread through it trivially. 
+     it's just a forwarding block and we can thread through it trivially.
 
      However, note that just threading through empty blocks with single
      successors is not inherently profitable.  For the jump thread to
@@ -779,7 +779,7 @@ thread_around_empty_blocks (edge taken_edge,
 
      By taking the return value from the recursive call, we get the
      desired effect of returning TRUE when we found a profitable jump
-     threading opportunity and FALSE otherwise. 
+     threading opportunity and FALSE otherwise.
 
      This is particularly important when this routine is called after
      processing a joiner block.  Returning TRUE too aggressively in
@@ -844,13 +844,16 @@ thread_around_empty_blocks (edge taken_edge,
 					     path);
 	  return true;
 	}
-  
+
   return false;
 }
-  
+
 /* We are exiting E->src, see if E->dest ends with a conditional
    jump which has a known value when reached via E.
 
+   E->dest can have arbitrary side effects which, if threading is
+   successful, will be maintained.
+
    Special care is necessary if E is a back edge in the CFG as we
    may have already recorded equivalences for E->dest into our
    various tables, including the result of the conditional at
@@ -858,11 +861,6 @@ thread_around_empty_blocks (edge taken_edge,
    limited in that case to avoid short-circuiting the loop
    incorrectly.
 
-   Note it is quite common for the first block inside a loop to
-   end with a conditional which is either always true or always
-   false when reached via the loop backedge.  Thus we do not want
-   to blindly disable threading across a loop backedge.
-
    DUMMY_COND is a shared cond_expr used by condition simplification
    as scratch, to avoid allocating memory.
 
@@ -873,17 +871,19 @@ thread_around_empty_blocks (edge taken_edge,
    STACK is used to undo temporary equivalences created during the
    walk of E->dest.
 
-   SIMPLIFY is a pass-specific function used to simplify statements.  */
+   SIMPLIFY is a pass-specific function used to simplify statements.
 
-void
-thread_across_edge (gimple dummy_cond,
-		    edge e,
-		    bool handle_dominating_asserts,
-		    vec<tree> *stack,
-		    tree (*simplify) (gimple, gimple))
-{
-  gimple stmt;
+   Our caller is responsible for restoring the state of the expression
+   and const_and_copies stacks.  */
 
+static bool
+thread_through_normal_block (edge e,
+			     gimple dummy_cond,
+			     bool handle_dominating_asserts,
+			     vec<tree> *stack,
+			     tree (*simplify) (gimple, gimple),
+			     vec<jump_thread_edge *> *path)
+{
   /* If E is a backedge, then we want to verify that the COND_EXPR,
      SWITCH_EXPR or GOTO_EXPR at the end of e->dest is not affected
      by any statements in e->dest.  If it is affected, then it is not
@@ -891,20 +891,19 @@ thread_across_edge (gimple dummy_cond,
   if (e->flags & EDGE_DFS_BACK)
     {
       if (cond_arg_set_in_bb (e, e->dest))
-	goto fail;
+	return false;
     }
 
-  stmt_count = 0;
-
   /* PHIs create temporary equivalences.  */
   if (!record_temporary_equivalences_from_phis (e, stack))
-    goto fail;
+    return false;
 
   /* Now walk each statement recording any context sensitive
      temporary equivalences we can detect.  */
-  stmt = record_temporary_equivalences_from_stmts_at_dest (e, stack, simplify);
+  gimple stmt
+    = record_temporary_equivalences_from_stmts_at_dest (e, stack, simplify);
   if (!stmt)
-    goto fail;
+    return false;
 
   /* If we stopped at a COND_EXPR or SWITCH_EXPR, see if we know which arm
      will be taken.  */
@@ -927,9 +926,8 @@ thread_across_edge (gimple dummy_cond,
 	  /* DEST could be NULL for a computed
Re: [C++ Patch] PR 58448
OK. Jason
Re: [PATCH] alternative hirate for builtin_expect
On Fri, Oct 4, 2013 at 11:54 AM, Jan Hubicka hubi...@ucw.cz wrote:

I looked at this problem. Bug updated: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58619 This is a bug when updating the block during tree-inline. Basically, it is legal for *n to be NULL. E.g. when gimple_block (id->gimple_call) is NULL, remap_blocks_to_null will be called to set *n to NULL.

The NULL in gimple_block (gimple_call) comes from the call introduced by ipa-split?

That is correct.

I remember that ipa-split used to try to put the call into a block since we were ICEing in similar ways previously, too. Perhaps this has changed with the new BLOCK representation?

The new BLOCK representation does not change this. I think it makes sense to leave the block of the newly introduced call_stmt as NULL because when it's inlined back, we don't want to add additional block layers.

You are right, it may be a result of Jakub's changes in the area (to improve debug info after inlining back). I guess the patch makes sense then.

Honza
Re: Enable building of libatomic on AArch64
On Thu, Oct 3, 2013 at 3:43 PM, Michael Hudson-Doyle michael.hud...@linaro.org wrote:

Hi,

As libatomic builds for AArch64 and the tests pass (built on x86_64 but tested on a foundation model; logs and summary: http://people.linaro.org/~mwhudson/libatomic.sum.txt http://people.linaro.org/~mwhudson/runtest-log-v-2.txt ), this patch enables the build.

Cheers, mwh (first time posting to this list, let me know if I'm doing it wrong)

2013-10-04  Michael Hudson-Doyle  michael.hud...@linaro.org

	* configure.tgt: Add AArch64 support.

Replying here also: this is the same patch which we have been using internally, and I would like to see it approved.

Thanks, Andrew Pinski
Re: [C++ Patch] PR 58560
OK. Jason
Re: [C++ Patch] PR 58503
OK. Jason
Re: [SH] PR 51244 - Fix defects introduced in 4.8
Oleg Endo oleg.e...@t-online.de wrote:

> Some of the things I've done in 4.8 to improve SH T bit handling
> turned out to produce wrong code. The attached patch fixes that by
> introducing an SH specific RTL pass. Tested on rev 202876 with
> make -k check RUNTESTFLAGS=--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}
> and no new failures. Additional test cases will follow. OK for trunk?

I've got a new failure with the patch on sh4-unknown-linux-gnu:

New tests that FAIL:
gfortran.fortran-torture/execute/forall_7.f90 execution, -O3 -g

> ifcvt_sh

We usually use sh_/sh- prefix for target specific things, don't we? I'm not sure that there is some rigid naming convention for them, though.

Regards, kaz
Re: [c++-concepts] constrained friends redux
Hi Andrew,

On 10/04/2013 07:36 PM, Andrew Sutton wrote:

+  if (!check_template_constraints (tmpl, args))
+    {
+      location_t loc = DECL_SOURCE_LOCATION (function);
+      error ("%qD is not a viable candidate", function);
+      diagnose_constraints (input_location, tmpl, args);
+      return error_mark_node;
+    }

Nit: loc seems unused.

Thanks, Paolo.