Re: avx runtime check
On May 16, 2014 4:47:11 AM CEST, Mike Stump mikest...@comcast.net wrote:
> This reorders the avx checks and gates on a target triplet check before
> compiling any code.

Can you explain why?

Ok?

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 40b5414..103a28a 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1353,8 +1353,8 @@ proc check_effective_target_sse2_runtime { } {
 # Return 1 if the target supports running AVX executables, 0 otherwise.
 
 proc check_effective_target_avx_runtime { } {
-    if { [check_effective_target_avx]
-	 && [check_avx_hw_available]
+    if { [check_avx_hw_available]
+	 && [check_effective_target_avx]
 	 && [check_avx_os_support_available] } {
 	return 1
     }
Re: [PATCH] Implement -fsanitize=float-cast-overflow
On Thu, May 15, 2014 at 09:29:44PM +, Joseph S. Myers wrote:
> On Thu, 15 May 2014, Jakub Jelinek wrote:
> > But I think we can't use decimal_real_from_string, we'd need a variant
> > of that function that would allow specification of the rounding mode
>
> My point is that you can use %.*RUe or %.*RDe formats (for max and min
> respectively), with an appropriate precision, and let MPFR do the
> rounding to an appropriate number of decimal digits in the right
> direction (to produce a value that's exactly representable in the
> relevant DFP type, as long as it's in range).

You are right, that seems to work.  So here is a new incremental patch.

BTW, for IBM long double and

  __int128_t f3 (long double x) { return x; }

the { -170141183460469231731687303715884105728.0 + -4194304.0 } check is
actually imprecise.  While all correct long double values will be properly
accepted, if the high double is exactly
-170141183460469231731687303715884105728.0 and the low double is in
[-1.0, -4194304.0), then the unspecified conversion will not be diagnosed.
I'm afraid there is nothing we can (easily) do about it, because
{ -170141183460469231731687303715884105728.0 + -1.0 }, while representable
in IBM long double, is not representable in our REAL_VALUE_TYPE: we assume
fixed precision, while IBM long double has variable precision.  I guess
the combination of IBM long double and __int128_t (for long long it is
still fine) is rare enough that we don't need to care about it (an
alternative would be e.g. to compare against the addition of those two
numbers with some optimization barrier/volatile that would keep it from
being optimized into a single long double constant).
--- gcc/convert.c
+++ gcc/convert.c
@@ -851,6 +851,8 @@
       expr = save_expr (expr);
       tree check = ubsan_instrument_float_cast (loc, type, expr);
       expr = build1 (FIX_TRUNC_EXPR, type, expr);
+      if (check == NULL)
+	return expr;
       return fold_build2 (COMPOUND_EXPR, TREE_TYPE (expr), check, expr);
     }
   else
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -47,6 +47,8 @@
 #include "asan.h"
 #include "gimplify-me.h"
 #include "intl.h"
+#include "realmpfr.h"
+#include "dfp.h"
 
 /* Map from a tree to a VAR_DECL tree.  */
@@ -903,17 +905,95 @@
 ubsan_instrument_float_cast (location_t loc, tree type, tree expr)
 {
   tree expr_type = TREE_TYPE (expr);
-  tree t, tt, fn;
+  tree t, tt, fn, min, max;
+  enum machine_mode mode = TYPE_MODE (expr_type);
+  int prec = TYPE_PRECISION (type);
+  bool uns_p = TYPE_UNSIGNED (type);
 
-  tree min = TYPE_MIN_VALUE (type);
-  tree max = TYPE_MAX_VALUE (type);
-  /* Add/subtract 1.0 so we can avoid truncating the value of EXPR.  */
-  min = fold_build2 (MINUS_EXPR, expr_type,
-		     build_real_from_int_cst (expr_type, min),
-		     build_one_cst (expr_type));
-  max = fold_build2 (PLUS_EXPR, expr_type,
-		     build_real_from_int_cst (expr_type, max),
-		     build_one_cst (expr_type));
+  /* Float to integer conversion first truncates toward zero, so
+     even signed char c = 127.875f; is not problematic.
+     Therefore, we should complain only if EXPR is unordered or smaller
+     or equal than TYPE_MIN_VALUE - 1.0 or greater or equal than
+     TYPE_MAX_VALUE + 1.0.  */
+  if (REAL_MODE_FORMAT (mode)->b == 2)
+    {
+      /* For maximum, TYPE_MAX_VALUE might not be representable
+	 in EXPR_TYPE, e.g. if TYPE is 64-bit long long and
+	 EXPR_TYPE is IEEE single float, but TYPE_MAX_VALUE + 1.0 is
+	 either representable or infinity.  */
+      REAL_VALUE_TYPE maxval = dconst1;
+      SET_REAL_EXP (&maxval, REAL_EXP (&maxval) + prec - !uns_p);
+      real_convert (&maxval, mode, &maxval);
+      max = build_real (expr_type, maxval);
+
+      /* For unsigned, assume -1.0 is always representable.  */
+      if (uns_p)
+	min = build_minus_one_cst (expr_type);
+      else
+	{
+	  /* TYPE_MIN_VALUE is generally representable (or -inf),
+	     but TYPE_MIN_VALUE - 1.0 might not be.  */
+	  REAL_VALUE_TYPE minval = dconstm1, minval2;
+	  SET_REAL_EXP (&minval, REAL_EXP (&minval) + prec - 1);
+	  real_convert (&minval, mode, &minval);
+	  real_arithmetic (&minval2, MINUS_EXPR, &minval, &dconst1);
+	  real_convert (&minval2, mode, &minval2);
+	  if (real_compare (EQ_EXPR, &minval, &minval2)
+	      && !real_isinf (&minval))
+	    {
+	      /* If TYPE_MIN_VALUE - 1.0 is not representable and
+		 rounds to TYPE_MIN_VALUE, we need to subtract
+		 more.  As REAL_MODE_FORMAT (mode)->p is the number
+		 of base digits, we want to subtract a number that
+		 will be 1 << (REAL_MODE_FORMAT (mode)->p - 1)
+		 times smaller than minval.  */
+	      minval2 = dconst1;
+	      gcc_assert (prec > REAL_MODE_FORMAT (mode)->p);
+	      SET_REAL_EXP (&minval2,
+			    REAL_EXP (&minval2) + prec - 1
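For the IEEE binary case the bounds logic above can be sanity-checked in plain C.  The sketch below hand-codes the equivalent runtime check for a float to 32-bit signed int conversion; it illustrates the principle (bounds computed as exact powers of two rather than from TYPE_MAX_VALUE) and is my illustration, not GCC's generated code:

```c
#include <assert.h>
#include <math.h>

/* A float -> int32 conversion is valid iff the value is ordered and
   truncates into [INT_MIN, INT_MAX].  2^31 == 2147483648.0f and -2^31
   are exactly representable as floats, while INT_MAX == 2^31 - 1 is
   not; that is why the check compares against powers of two.  Any
   float strictly below -2^31 is at most -2^31 - 256 (the ulp at that
   magnitude), so it truncates out of range.  */
int int_cast_is_valid (float x)
{
  return !isnan (x) && x >= -2147483648.0f && x < 2147483648.0f;
}
```

As in the patch's comment, 127.875f is fine for a narrower type too, because truncation toward zero happens before the range check matters.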
Re: [PATCH] Make SCCVN constant-fold calls
On Thu, 15 May 2014, Richard Biener wrote:
> For some odd reason I didn't implement this earlier.  This is one major
> source of 2nd-stage opportunities that come up when running two
> adjacent FRE passes.
>
> Bootstrap and regtest ongoing on x86_64-unknown-linux-gnu.

The following is what I have applied (with also properly value-numbering
VDEFs if there).  Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2014-05-16  Richard Biener  rguent...@suse.de

	* tree-ssa-sccvn.c (visit_use): Also constant-fold calls.

	* gcc.dg/tree-ssa/ssa-fre-41.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c	(revision 210418)
--- gcc/tree-ssa-sccvn.c	(working copy)
*** visit_use (tree use)
*** 3566,3593 ****
        else if (is_gimple_call (stmt))
  	{
  	  tree lhs = gimple_call_lhs (stmt);
- 
- 	  /* ??? We could try to simplify calls.  */
- 
  	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
  	    {
! 	      if (stmt_has_constants (stmt))
! 		VN_INFO (lhs)->has_constants = true;
! 	      else
  		{
! 		  /* We reset expr and constantness here because we may
! 		     have been value numbering optimistically, and
! 		     iterating.  They may become non-constant in this case,
! 		     even if they were optimistically constant.  */
! 		  VN_INFO (lhs)->has_constants = false;
! 		  VN_INFO (lhs)->expr = NULL_TREE;
  		}
! 
! 	      if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
  		{
! 		  changed = defs_to_varying (stmt);
  		  goto done;
  		}
  	    }
  
        if (!gimple_call_internal_p (stmt)
--- 3566,3635 ----
        else if (is_gimple_call (stmt))
  	{
  	  tree lhs = gimple_call_lhs (stmt);
  	  if (lhs && TREE_CODE (lhs) == SSA_NAME)
  	    {
! 	      /* Try constant folding based on our current lattice.  */
! 	      tree simplified = gimple_fold_stmt_to_constant_1 (stmt,
! 								vn_valueize);
! 	      if (simplified)
  		{
! 		  if (dump_file && (dump_flags & TDF_DETAILS))
! 		    {
! 		      fprintf (dump_file, "call ");
! 		      print_gimple_expr (dump_file, stmt, 0, 0);
! 		      fprintf (dump_file, " simplified to ");
! 		      print_generic_expr (dump_file, simplified, 0);
! 		      if (TREE_CODE (lhs) == SSA_NAME)
! 			fprintf (dump_file, " has constants %d\n",
! 				 expr_has_constants (simplified));
! 		      else
! 			fprintf (dump_file, "\n");
! 		    }
  		}
! 	      /* Setting value numbers to constants will occasionally
! 		 screw up phi congruence because constants are not
! 		 uniquely associated with a single ssa name that can be
! 		 looked up.  */
! 	      if (simplified
! 		  && is_gimple_min_invariant (simplified))
  		{
! 		  VN_INFO (lhs)->expr = simplified;
! 		  VN_INFO (lhs)->has_constants = true;
! 		  changed = set_ssa_val_to (lhs, simplified);
! 		  if (gimple_vdef (stmt))
! 		    changed |= set_ssa_val_to (gimple_vdef (stmt),
! 					       gimple_vuse (stmt));
  		  goto done;
  		}
+ 	      else if (simplified
+ 		       && TREE_CODE (simplified) == SSA_NAME)
+ 		{
+ 		  changed = visit_copy (lhs, simplified);
+ 		  if (gimple_vdef (stmt))
+ 		    changed |= set_ssa_val_to (gimple_vdef (stmt),
+ 					       gimple_vuse (stmt));
+ 		  goto done;
+ 		}
+ 	      else
+ 		{
+ 		  if (stmt_has_constants (stmt))
+ 		    VN_INFO (lhs)->has_constants = true;
+ 		  else
+ 		    {
+ 		      /* We reset expr and constantness here because we may
+ 			 have been value numbering optimistically, and
+ 			 iterating.  They may become non-constant in this
+ 			 case, even if they were optimistically constant.  */
+ 		      VN_INFO (lhs)->has_constants = false;
+ 		      VN_INFO (lhs)->expr = NULL_TREE;
+ 		    }
+ 
+ 		  if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
+ 		    {
+ 		      changed = defs_to_varying (stmt);
+ 		      goto done;
+ 		    }
+ 		}
  	    }
  
        if (!gimple_call_internal_p (stmt)
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-41.c
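A hypothetical example of the kind of opportunity the patch targets (my illustration, not the actual ssa-fre-41.c testcase): once value numbering has established that a call's argument is constant, gimple_fold_stmt_to_constant_1 can fold the call itself, so no call remains at runtime:

```c
#include <assert.h>
#include <string.h>

/* After FRE value-numbers p to the constant string, strlen (a builtin
   GCC knows how to fold) has constant arguments and the whole call
   simplifies to the constant 4 at compile time.  */
unsigned fold_me (void)
{
  const char *p = "fold";
  return strlen (p);
}
```

Running two adjacent FRE passes used to be needed to expose this; with the patch a single pass catches it.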
Re: patch8.diff updated Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
On Wed, 2014-05-07 at 10:18 +0200, Svante Signell wrote:
> On Tue, 2014-05-06 at 15:26 +0200, Samuel Thibault wrote:

Attached is an updated patch8.diff.  Arch-specific code has been added to
src/libgo/mksysinfo.sh, so other systems are no longer affected by the
patch except for the SYS_FCNTL part.  Without that part of the patch, the
build on GNU/Hurd fails.  On the other hand, SYS_FCNTL is not defined for
e.g. GNU/Linux either.  It is used in
gcc-4.9-4.9.0/src/libgo/go/net/fd_unix.go:

  func dupCloseOnExec(fd int) (newfd int, err error) {
	if atomic.LoadInt32(&tryDupCloexec) == 1 && syscall.F_DUPFD_CLOEXEC != 0 {
		r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_DUPFD_CLOEXEC, 0)

It is not yet known how the build succeeds on Linux without SYS_FCNTL
being defined; maybe the conditions above are never met.

--- a/src/libgo/mksysinfo.sh
+++ b/src/libgo/mksysinfo.sh
@@ -210,6 +210,13 @@
   egrep '#define E[A-Z0-9_]+ ' | \
   sed -e 's/^#define \(E[A-Z0-9_]*\) .*$/const \1 = Errno(_\1)/' >> ${OUT}
 
+# Special treatment of EWOULDBLOCK for GNU/Hurd
+# /usr/include/bits/errno.h: #define EWOULDBLOCK EAGAIN
+if egrep 'define EWOULDBLOCK EAGAIN' gen-sysinfo.go > /dev/null 2>&1; then
+  egrep '^const EWOULDBLOCK = Errno(_EWOULDBLOCK)' ${OUT} | \
+    sed -i.bak -e 's/_EWOULDBLOCK/_EAGAIN/' ${OUT}
+fi
+
 # The O_xxx flags.
 egrep '^const _(O|F|FD)_' gen-sysinfo.go | \
   sed -e 's/^\(const \)_\([^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
@@ -225,6 +232,11 @@
   echo "const F_DUPFD_CLOEXEC = 0" >> ${OUT}
 fi
 
+# Special treatment of SYS_FCNTL for GNU/Hurd
+if ! grep '^const SYS_FCNTL' ${OUT} > /dev/null 2>&1; then
+  echo "const SYS_FCNTL = 0" >> ${OUT}
+fi
+
 # These flags can be lost on i386 GNU/Linux when using
 # -D_FILE_OFFSET_BITS=64, because we see #define F_SETLK F_SETLK64
 # before we see the definition of F_SETLK64.
@@ -528,6 +540,12 @@
 # The stat type.
 # Prefer largefile variant if available.
+# Special treatment of st_dev for GNU/Hurd
+# /usr/include/i386-gnu/bits/stat.h: #define st_dev st_fsid
+if grep 'define st_dev st_fsid' gen-sysinfo.go > /dev/null 2>&1; then
+  egrep '^type _stat ' gen-sysinfo.go > /dev/null 2>&1 | \
+    sed -i.bak -e 's/; st_fsid/; st_dev/' gen-sysinfo.go
+fi
 stat=`grep '^type _stat64 ' gen-sysinfo.go || true`
 if test "$stat" != ""; then
   grep '^type _stat64 ' gen-sysinfo.go
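For context, what the Go runtime's dupCloseOnExec does through the raw fcntl syscall is the standard POSIX atomic "dup with close-on-exec" operation.  A minimal C sketch of the same operation (my illustration, not libgo code):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Duplicate FD with the close-on-exec flag set atomically, which is
   what syscall.Syscall(SYS_FCNTL, fd, F_DUPFD_CLOEXEC, 0) performs.
   Returns the new descriptor, or -1 on error.  */
int dup_cloexec (int fd)
{
  return fcntl (fd, F_DUPFD_CLOEXEC, 0);
}
```

F_DUPFD_CLOEXEC is POSIX.1-2008, which is why libgo falls back (the `syscall.F_DUPFD_CLOEXEC != 0` guard) on systems where mksysinfo.sh emitted the placeholder constant 0.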
Re: [PATCH][1/n][RFC] Make FRE/PRE somewhat predicate aware
On Thu, 8 May 2014, Richard Biener wrote:
> Ok, not really predicate aware, but this makes value-numbering
> pessimistically handle non-executable edges.  In the following patch
> groundwork is laid and PHI value-numbering is adjusted to take
> advantage of edges known to be not executable.
>
> SCCVN is not well-suited to be control aware, but we still can see if
> value-numbering allows us to mark edges as not executable by looking
> at control statements.  Value-numbering of PHI nodes is one obvious
> consumer of such information and it also gives a natural order to do
> that (pessimistic) edge executability computation - dominator order.
> Thus the following adds a pass over all control statements, trying to
> simplify them after value-numbering their operands (and all uses
> recursively, as SCCVN does).
>
> With followup patches I will try to use this information to reduce the
> amount of work done (also improving optimization, of course).  One
> other obvious candidate is the alias walker which doesn't have to
> consider unreachable paths when walking into virtual PHIs.
>
> The patch will likely get some more cleanups (due to the hack in
> set_ssa_val_to).
>
> Comments still welcome.

Quiet as usual.  Well, the following is what I have committed after
bootstrapping and regtesting on x86_64-unknown-linux-gnu.  It fixes the
inliner, which is confused by random pass-local flags on the edges to the
exit block, adds one more testcase and adjusts two.

I figured that followups for more optimizations are not necessary as
virtual operand value-numbering already gets us most of the benefit.
Followups trying to do less work may still be possible but they are low
on priority.

Richard.

2014-05-16  Richard Biener  rguent...@suse.de

	* tree-ssa-sccvn.c: Include tree-cfg.h and domwalk.h.
	(set_ssa_val_to): Handle unexpected sets to VN_TOP.
	(visit_phi): Ignore edges marked as not executable.
	(class cond_dom_walker): New.
	(cond_dom_walker::before_dom_children): Value-number control
	statements and mark successor edges as not executable if
	possible.
	(run_scc_vn): First walk all control statements in dominator
	order, marking edges as not executable.
	* tree-inline.c (copy_edges_for_bb): Be not confused about
	random edge flags.

	* gcc.dg/tree-ssa/ssa-fre-39.c: New testcase.
	* gcc.dg/tree-ssa/ssa-fre-40.c: Likewise.
	* gcc.dg/tree-ssa/ssa-pre-8.c: One more elimination.
	* gcc.dg/tree-ssa/struct-aliasing-2.c: Scan cddce1 dump.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c.orig	2014-05-15 12:47:20.762286122 +0200
--- gcc/tree-ssa-sccvn.c	2014-05-15 13:04:57.872213342 +0200
*** along with GCC; see the file COPYING3.
*** 51,56 ****
--- 51,58 ----
  #include "params.h"
  #include "tree-ssa-propagate.h"
  #include "tree-ssa-sccvn.h"
+ #include "tree-cfg.h"
+ #include "domwalk.h"
  
  /* This algorithm is based on the SCC algorithm presented by Keith
     Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
*** set_ssa_val_to (tree from, tree to)
*** 2661,2666 ****
--- 2663,2687 ----
    tree currval = SSA_VAL (from);
    HOST_WIDE_INT toff, coff;
  
+   /* The only thing we allow as value numbers are ssa_names
+      and invariants.  So assert that here.  We don't allow VN_TOP
+      as visiting a stmt should produce a value-number other than
+      that.
+      ??? Still VN_TOP can happen for unreachable code, so force
+      it to varying in that case.  Not all code is prepared to
+      get VN_TOP on valueization.  */
+   if (to == VN_TOP)
+     {
+       if (dump_file && (dump_flags & TDF_DETAILS))
+ 	fprintf (dump_file, "Forcing value number to varying on "
+ 		 "receiving VN_TOP\n");
+       to = from;
+     }
+ 
+   gcc_assert (to != NULL_TREE
+ 	      && (TREE_CODE (to) == SSA_NAME
+ 		  || is_gimple_min_invariant (to)));
+ 
    if (from != to)
      {
        if (currval == from)
*** set_ssa_val_to (tree from, tree to)
*** 2680,2692 ****
  	  to = from;
  	}
  
-       /* The only thing we allow as value numbers are VN_TOP, ssa_names
- 	 and invariants.  So assert that here.  */
-       gcc_assert (to != NULL_TREE
- 		  && (to == VN_TOP
- 		      || TREE_CODE (to) == SSA_NAME
- 		      || is_gimple_min_invariant (to)));
- 
    if (dump_file && (dump_flags & TDF_DETAILS))
      {
        fprintf (dump_file, "Setting value number of ");
--- 2701,2706 ----
*** visit_phi (gimple phi)
*** 3071,3077 ****
    tree result;
    tree sameval = VN_TOP;
    bool allsame = true;
-   unsigned i;
  
    /* TODO: We could check for this in init_sccvn, and replace this
       with a gcc_assert.  */
--- 3085,3090 ----
*** visit_phi (gimple phi)
*** 3080,3106 ****
    /* See if all non-TOP arguments have the same value.  TOP is
       equivalent to
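The effect on PHI value numbering can be seen on a toy function (my illustration, not one of the new testcases): once the controlling condition value-numbers to a constant, one PHI argument arrives over a non-executable edge and can be ignored:

```c
#include <assert.h>

/* c value-numbers to 0, so the true edge out of the condition is
   marked not executable.  The PHI merging y then only has one live
   argument and the whole function simplifies to "return 2".  */
int choose (void)
{
  int c = 0;
  int y;
  if (c)
    y = 1;
  else
    y = 2;
  return y;
}
```

This is exactly the information visit_phi now consumes when it skips arguments on edges marked as not executable.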
Re: [AArch64 costs 0/18] Improve address- and rtx-cost models
This series is OK.

/Marcus

On 27 March 2014 17:33, James Greenhalgh james.greenha...@arm.com wrote:
> Hi,
>
> This patch series improves the costing model in the AArch64 backend to
> match a number of new idioms.  This patch is a combination of a series
> I had been working on, with the cost-model for XGene-1 proposed by
> Philipp Tomsich.
> ( http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01084.html )
>
> Where sensible I have integrated the idiom matching in Philipp's with
> my own work, though there were cases which were redundant, or could be
> folded to reduce code duplication.  There were other cases where the
> code suggested XGene-1 would benefit from special-case handling.
> Without documentation for the XGene-1 I can't cater for these special
> cases, and I have not tried to do so here.
>
> The patch series has been bootstrapped natively on aarch64-none-elf,
> and has been through aarch64-none-elf runs with no issue.
>
> Is this OK for stage-1?
>
> Thanks,
> James
[AArch64 costs] Fixup to costing of FNMUL
Hi,

While waiting for approval of the costs patches, I spotted this bug.
FNMUL should be costed like an FMUL, rather than being costed like an
FMA instruction.

Bootstrapped and tested in series with the costs patches with no issues.

I see Marcus has approved the patch series, so is this OK to go in as a
fixup to that?

Thanks,
James

---
gcc/

2014-05-15  James Greenhalgh  james.greenha...@arm.com

	* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Fix FNMUL case.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6eb2303..0a7f441 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4711,24 +4711,18 @@ aarch64_rtx_mult_cost (rtx x, int code, int outer, bool speed)
     {
       if (speed)
	{
-	  /* Floating-point FMA can also support negations of the
+	  /* Floating-point FMA/FMUL can also support negations of the
	     operands.  */
	  if (GET_CODE (op0) == NEG)
-	    {
-	      maybe_fma = true;
-	      op0 = XEXP (op0, 0);
-	    }
+	    op0 = XEXP (op0, 0);
	  if (GET_CODE (op1) == NEG)
-	    {
-	      maybe_fma = true;
-	      op1 = XEXP (op1, 0);
-	    }
+	    op1 = XEXP (op1, 0);

	  if (maybe_fma)
	    /* FMADD/FNMADD/FNMSUB/FMSUB.  */
	    cost += extra_cost->fp[mode == DFmode].fma;
	  else
-	    /* FMUL.  */
+	    /* FMUL/FNMUL.  */
	    cost += extra_cost->fp[mode == DFmode].mult;
	}
Re: RFA: Fix calculation of size of builtin setjmp buffer
Hi Eric,

OK - here is your version of the patch, extended with a comment which I
think is helpful for other people reading the code, and with the changes
to builtins.c and md.texi removed, since the size of the buffer is not
changing.

Is this version OK to apply?

Cheers,
Nick

gcc/ChangeLog
2014-05-16  Nick Clifton  ni...@redhat.com

	* except.c (init_eh): Correct computation of the size of a
	builtin setjmp buffer for when pointers are bigger than words.

Index: gcc/except.c
===
--- gcc/except.c	(revision 210490)
+++ gcc/except.c	(working copy)
@@ -286,9 +286,22 @@
   tmp = size_int (FIRST_PSEUDO_REGISTER + 2 - 1);
 #endif
 #else
-  /* builtin_setjmp takes a pointer to 5 words.  */
-  tmp = size_int (5 * BITS_PER_WORD / POINTER_SIZE - 1);
+  /* Compute a minimally sized jump buffer.  We need room to store at
+     least 3 pointers - stack pointer, frame pointer and return address.
+     Plus for some targets we need room for an extra pointer - in the
+     case of MIPS this is the global pointer.  This makes a total of four
+     pointers, but to be safe we actually allocate room for 5.
+
+     If pointers are smaller than words then we allocate enough room for
+     5 words, just in case the backend needs this much room.  For more
+     discussion on this issue see:
+     http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00313.html.  */
+  if (POINTER_SIZE > BITS_PER_WORD)
+    tmp = size_int (5 - 1);
+  else
+    tmp = size_int ((5 * BITS_PER_WORD / POINTER_SIZE) - 1);
 #endif
 
   tmp = build_index_type (tmp);
   tmp = build_array_type (ptr_type_node, tmp);
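The sizing rule can be checked with a little arithmetic.  The helper below is my sketch mirroring the patched computation in init_eh; it returns the number of pointer-sized array elements allocated for the buffer:

```c
#include <assert.h>

/* Number of ptr_type_node elements in the builtin setjmp buffer:
   5 pointers when pointers are bigger than words (each pointer then
   covers at least one word), otherwise enough pointer-sized slots to
   cover 5 full words.  */
unsigned setjmp_buf_elts (unsigned pointer_size, unsigned bits_per_word)
{
  if (pointer_size > bits_per_word)
    return 5;
  return 5 * bits_per_word / pointer_size;
}
```

The interesting case is 64-bit pointers with 32-bit words: the old formula `5 * BITS_PER_WORD / POINTER_SIZE` yielded only 2 elements there, which is the undersized buffer the patch fixes.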
Re: [AArch64 costs] Fixup to costing of FNMUL
On 16 May 2014 09:26, James Greenhalgh james.greenha...@arm.com wrote: 2014-05-15 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Fix FNMUL case. OK /Marcus
Avoiding some garbage rtl from instantiate_virtual_regs
It seems that in a typical -O0 compile the amount of rtl that starts out
as needed but becomes garbage is only slightly less than half of the
total amount created.  One of the big offenders is the vregs pass, which
creates new PLUSes when instantiating a virtual register + a constant
and which creates new MEMs when instantiating an address involving a
virtual register.  This happens a lot in -O0 code because all variables
live on the stack.

The instantiation walk is fundamentally in-place: every other part of
the pattern is modified without copying.  And rtl sharing rules
guarantee that we can do the same for PLUSes of registers and MEMs.
The patch does this by adding "inplace" arguments to plus_constant and
replace_equiv_address.

In a -O0 compile of an oldish fold-const.ii (where no GC takes place) it
reduces the amount of used GC memory from 169M to 166M.  The average max
RSS goes down by just over 1%.  Compile time seems to decrease slightly,
but probably in the noise range.

There might be other callers that can use the new interfaces too.

Tested on x86_64-linux-gnu.  Also tested by comparing the asm output for
various parts of the testsuite before and after the patch.  The only
changes were that some "sym+0"s became plain "sym"s (i.e.
(plus X (const_int 0)) became X) because of the plus_constant change.

OK to install?

Thanks,
Richard

gcc/
	* emit-rtl.h (replace_equiv_address, replace_equiv_address_nv):
	Add an inplace argument.  Store the new address in the original
	MEM when true.
	* emit-rtl.c (change_address_1): Likewise.
	(adjust_address_1, adjust_automodify_address_1, offset_address):
	Update accordingly.
	* rtl.h (plus_constant): Add an inplace argument.
	* explow.c (plus_constant): Likewise.  Try to reuse the original
	PLUS when true.  Avoid generating (plus X (const_int 0)).
	* function.c (instantiate_virtual_regs_in_rtx): Adjust the PLUS
	in-place.  Pass true to plus_constant.
	(instantiate_virtual_regs_in_insn): Pass true to
	replace_equiv_address.
Index: gcc/emit-rtl.h
===
--- gcc/emit-rtl.h	2014-05-15 11:27:06.000259353 +0100
+++ gcc/emit-rtl.h	2014-05-16 09:11:42.479556294 +0100
@@ -52,10 +52,10 @@ extern tree get_spill_slot_decl (bool);
    ADDR.  The caller is asserting that the actual piece of memory pointed
    to is the same, just the form of the address is being changed, such as
    by putting something into a register.  */
-extern rtx replace_equiv_address (rtx, rtx);
+extern rtx replace_equiv_address (rtx, rtx, bool = false);
 
 /* Likewise, but the reference is not required to be valid.  */
-extern rtx replace_equiv_address_nv (rtx, rtx);
+extern rtx replace_equiv_address_nv (rtx, rtx, bool = false);
 
 extern rtx gen_blockage (void);
 extern rtvec gen_rtvec (int, ...);
Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c	2014-05-16 09:09:18.446271662 +0100
+++ gcc/emit-rtl.c	2014-05-16 09:40:25.285714457 +0100
@@ -145,7 +145,6 @@
 #define cur_insn_uid (crtl->emit.x_cur_insn_uid)
 #define cur_debug_insn_uid (crtl->emit.x_cur_debug_insn_uid)
 #define first_label_num (crtl->emit.x_first_label_num)
 
-static rtx change_address_1 (rtx, enum machine_mode, rtx, int);
 static void set_used_decls (tree);
 static void mark_label_nuses (rtx);
 static hashval_t const_int_htab_hash (const void *);
@@ -2010,11 +2009,15 @@ clear_mem_size (rtx mem)
 /* Return a memory reference like MEMREF, but with its mode changed to MODE
    and its address changed to ADDR.  (VOIDmode means don't change the mode.
    NULL for ADDR means don't change the address.)  VALIDATE is nonzero if the
-   returned memory location is required to be valid.  The memory
-   attributes are not changed.  */
+   returned memory location is required to be valid.  INPLACE is true if any
+   changes can be made directly to MEMREF or false if MEMREF must be treated
+   as immutable.
+
+   The memory attributes are not changed.  */
 
 static rtx
-change_address_1 (rtx memref, enum machine_mode mode, rtx addr, int validate)
+change_address_1 (rtx memref, enum machine_mode mode, rtx addr, int validate,
+		  bool inplace)
 {
   addr_space_t as;
   rtx new_rtx;
@@ -2042,6 +2045,12 @@ change_address_1 (rtx memref, enum machi
   if (rtx_equal_p (addr, XEXP (memref, 0)) && mode == GET_MODE (memref))
     return memref;
 
+  if (inplace)
+    {
+      XEXP (memref, 0) = addr;
+      return memref;
+    }
+
   new_rtx = gen_rtx_MEM (mode, addr);
   MEM_COPY_ATTRIBUTES (new_rtx, memref);
   return new_rtx;
@@ -2053,7 +2062,7 @@ change_address_1 (rtx memref, enum machi
 rtx
 change_address (rtx memref, enum machine_mode mode, rtx addr)
 {
-  rtx new_rtx = change_address_1 (memref, mode, addr, 1);
+  rtx new_rtx = change_address_1 (memref, mode, addr, 1, false);
   enum machine_mode mmode = GET_MODE (new_rtx);
   struct
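The inplace pattern itself is simple; here is a toy illustration on a hand-rolled node type (my sketch, not GCC's rtx), showing why mutating a caller-owned node avoids creating garbage while the copying path leaves shared nodes untouched:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct node { long offset; };

/* Like plus_constant with its new inplace argument: when nothing
   changes, share the node; when the caller says mutation is safe,
   reuse the node (no garbage); otherwise allocate a fresh copy so
   other users of the original are unaffected.  */
struct node *set_offset (struct node *n, long off, bool inplace)
{
  if (n->offset == off)
    return n;                      /* no change needed: share */
  if (inplace)
    {
      n->offset = off;             /* mutate in place */
      return n;
    }
  struct node *copy = malloc (sizeof *copy);
  copy->offset = off;              /* fresh node: original untouched */
  return copy;
}
```

The vregs pass qualifies for the inplace path because the instantiation walk already rewrites the rest of the pattern destructively, so it owns the nodes it touches.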
[PATCH SH] Don't switch mode around fmov (pr61195)
Hi,

This patch reduces the number of unnecessary PR mode switches for
single-precision moves when FPSCR.SZ is not forced.  A test illustrating
this is included in the patch.

Regtested for sh-none-elf with -m4 (-m2a still running), bootstrapped on
an sh4-linux board.

OK for trunk?

Many thanks,

Christian

2014-05-16  Christian Bruel  christian.br...@st.com

	PR target/61195
	* config/sh/sh.md (movsf_ie): Unset fp_mode for fmov.

2014-05-16  Christian Bruel  christian.br...@st.com

	PR target/61195
	* gcc.target/sh/pr61195.c: New test.

Index: config/sh/sh.md
===
--- config/sh/sh.md	(revision 210475)
+++ config/sh/sh.md	(working copy)
@@ -8357,9 +8357,26 @@ label:
	 (const_int 2)
	 (const_int 2)
	 (const_int 0)])
-   (set (attr "fp_mode") (if_then_else (eq_attr "fmovd" "yes")
-				       (const_string "single")
-				       (const_string "single")))])
+   (set_attr_alternative "fp_mode"
+     [(if_then_else (eq_attr "fmovd" "yes")
+		    (const_string "single") (const_string "none"))
+      (const_string "none")
+      (const_string "single")
+      (const_string "single")
+      (const_string "none")
+      (if_then_else (eq_attr "fmovd" "yes")
+		    (const_string "single") (const_string "none"))
+      (if_then_else (eq_attr "fmovd" "yes")
+		    (const_string "single") (const_string "none"))
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")
+      (const_string "none")])])
 
 (define_split
   [(set (match_operand:SF 0 "register_operand" "")

Index: testsuite/gcc.target/sh/pr61195.c
===
--- testsuite/gcc.target/sh/pr61195.c	(revision 0)
+++ testsuite/gcc.target/sh/pr61195.c	(working copy)
@@ -0,0 +1,19 @@
+/* Verify that we don't switch mode for single moves.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-skip-if "" { "*-*-*" } { "mfmovd" } { "" } } */
+/* { dg-final { scan-assembler-not "fpscr" } } */
+
+float *g;
+
+float
+foo(float f)
+{
+  return f;
+}
+
+float
+foo1(void)
+{
+  return *g;
+}
RE: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
Ping.  This should be relatively simple to review.

Many thanks.

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Ian Bolton
Sent: 08 May 2014 18:36
To: gcc-patches
Subject: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible

Hi,

It currently takes 4 instructions to generate certain immediates on
AArch64 (unless we put them in the constant pool).

For example ...

  long long
  beefcafebabe ()
  {
    return 0xBEEFCAFEBABEll;
  }

leads to ...

  mov	x0, 0x47806
  mov	x0, 0xcafe, lsl 16
  mov	x0, 0xbeef, lsl 32
  orr	x0, x0, -281474976710656

The above case is tackled in this patch by employing MOVN to generate
the top 32-bits in a single instruction ...

  mov	x0, -71536975282177
  movk	x0, 0xcafe, lsl 16
  movk	x0, 0xbabe, lsl 0

(Note that where at least two half-words are 0x, existing code that
does the immediate in two instructions is still used.)

Tested on standard gcc regressions and the attached test case.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Use MOVN when top-most half-word (and only that half-word)
	is 0x.

gcc/testsuite/
	* gcc.target/aarch64/movn_1.c: New test.
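The arithmetic behind the three-instruction sequence can be checked directly.  Below, movn and movk model the A64 semantics (MOVN writes the bitwise NOT of a shifted 16-bit immediate; MOVK overwrites one 16-bit field and keeps the rest); the 0x4110 operand is my reading of how the assembler would encode that mov alias, i.e. `movn x0, #0x4110, lsl #32`:

```c
#include <assert.h>
#include <stdint.h>

/* MOVN Xd, #imm16, LSL #s  =>  Xd = ~(imm16 << s).  */
uint64_t movn (uint64_t imm16, unsigned shift)
{
  return ~(imm16 << shift);
}

/* MOVK Xd, #imm16, LSL #s  =>  replace bits [s+15:s] of Xd.  */
uint64_t movk (uint64_t x, uint64_t imm16, unsigned shift)
{
  return (x & ~(0xffffULL << shift)) | (imm16 << shift);
}
```

Note that ~0xBEEF == 0x4110 in 16 bits, so the single MOVN already produces the sign-extended top half plus the 0xBEEF field, and the two MOVKs fill in the remaining half-words, matching what the old sequence built with its final `orr x0, x0, -281474976710656` (0xffff000000000000).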
RE: [PATCH, AArch64] Fix macro in vdup_lane_2 test case
Ping.  This may well be classed as obvious, but that's not obvious to me,
so I request a review.

Many thanks.

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Ian Bolton
Sent: 08 May 2014 18:42
To: gcc-patches
Subject: [PATCH, AArch64] Fix macro in vdup_lane_2 test case

This patch fixes a defective macro definition, based on the correct
definition in similar testcases.  The test currently passes through luck
rather than correctness.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  ian.bol...@arm.com

gcc/testsuite
	* gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an actual
	instruction to move into the allocated register.
Re: [Patch, avr] Propagate -mrelax gcc driver flag to assembler
On 05/15/2014 09:55 AM, Senthil Kumar Selvaraj wrote:

On Wed, May 14, 2014 at 12:56:54PM +0200, Rainer Orth wrote:

Georg-Johann Lay a...@gjlay.de writes:

Or what about simply that, which works for me:

Index: config/avr/avr.h
===
--- config/avr/avr.h	(revision 210276)
+++ config/avr/avr.h	(working copy)
@@ -512,7 +512,11 @@ extern const char *avr_device_to_sp8 (in
 %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
 %{!fexceptions:-fno-exceptions}
 
+#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) %{mrelax:-mlink-relax}"
+#else
 #define ASM_SPEC "%:device_to_as(%{mmcu=*:%*})"
+#endif
 
 #define LINK_SPEC \
 "%{mrelax:--relax\

Better yet something like

#ifdef HAVE_AS_AVR_LINK_RELAX_OPTION
#define LINK_RELAX_SPEC "%{mrelax:-mlink-relax}"
#else
#define LINK_RELAX_SPEC ""
#endif

#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " LINK_RELAX_SPEC

Does this look ok?  I don't have commit access, so could someone commit
this please?

Hi, looks fine to me.  Thanks.

Usually, changelogs are more descriptive w.r.t. what objects are
touched, like:

	* config/avr/avr.h (LINK_RELAX_SPEC): Pass -mlink-relax to the
	assembler, depending on HAVE_AS_AVR_LINK_RELAX_OPTION.
	(ASM_SPEC): Use it.
	* configure.ac (HAVE_AVR_AS_LINK_RELAX_OPTION) [avr]: New define
	if assembler supports -mlink-relax.
	* config.in: Regenerate.
	* configure: Likewise.

Regards
Senthil

2014-05-15  Senthil Kumar Selvaraj  senthil_kumar.selva...@atmel.com

	* config/avr/avr.h: Pass on mlink-relax to assembler.
	* configure.ac: Test for mlink-relax assembler support.
	* config.in: Regenerate.
	* configure: Likewise.

diff --git gcc/config.in gcc/config.in
index c0ba36e..1738301 100644
--- gcc/config.in
+++ gcc/config.in
@@ -575,6 +575,12 @@
 #endif
 
+/* Define if your assembler supports -mlink-relax option.  */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AVR_AS_LINK_RELAX_OPTION
+#endif
+
+
 /* Define to 1 if you have the `clearerr_unlocked' function.  */
 #ifndef USED_FOR_TARGET
 #undef HAVE_CLEARERR_UNLOCKED
diff --git gcc/config/avr/avr.h gcc/config/avr/avr.h
index 9d34983..c59c54d 100644
--- gcc/config/avr/avr.h
+++ gcc/config/avr/avr.h
@@ -512,8 +512,14 @@ extern const char *avr_device_to_sp8 (int argc, const char **argv);
 %{!fenforce-eh-specs:-fno-enforce-eh-specs} \
 %{!fexceptions:-fno-exceptions}
 
-#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*})"
-
+#ifdef HAVE_AVR_AS_LINK_RELAX_OPTION
+#define ASM_RELAX_SPEC "%{mrelax:-mlink-relax}"
+#else
+#define ASM_RELAX_SPEC ""
+#endif
+
+#define ASM_SPEC "%:device_to_as(%{mmcu=*:%*}) " ASM_RELAX_SPEC
+
 #define LINK_SPEC \
 "%{mrelax:--relax\
     %{mpmem-wrap-around:%{mmcu=at90usb8*:--pmem-wrap-around=8k}\
diff --git gcc/configure gcc/configure
index f4db0a0..2812cdb 100755
--- gcc/configure
+++ gcc/configure
@@ -24014,6 +24014,39 @@
 $as_echo "#define HAVE_AS_JSRDIRECT_RELOCS 1" >> confdefs.h
 
 fi
     ;;
 
+  avr-*-*)
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -mlink-relax option" >&5
+$as_echo_n "checking assembler for -mlink-relax option... " >&6; }
+if test "${gcc_cv_as_avr_relax+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_avr_relax=no
+  if test x$gcc_cv_as != x; then
+    $as_echo '.text' > conftest.s
+    if { ac_try='$gcc_cv_as $gcc_cv_as_flags -mlink-relax -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+  gcc_cv_as_avr_relax=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_avr_relax" >&5
+$as_echo "$gcc_cv_as_avr_relax" >&6; }
+if test $gcc_cv_as_avr_relax = yes; then
+
+$as_echo "#define HAVE_AVR_AS_LINK_RELAX_OPTION 1" >> confdefs.h
+
+fi
+    ;;
+
   cris-*-*)
     { $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for -no-mul-bug-abort option" >&5
 $as_echo_n "checking assembler for -no-mul-bug-abort option... " >&6; }
diff --git gcc/configure.ac gcc/configure.ac
index 8f17dfb..49a1f3d 100644
--- gcc/configure.ac
+++ gcc/configure.ac
@@ -3510,6 +3510,13 @@
       [Define if your assembler supports the lituse_jsrdirect relocation.])])
     ;;
 
+  avr-*-*)
+    gcc_GAS_CHECK_FEATURE([-mlink-relax option], gcc_cv_as_avr_relax,,
+      [-mlink-relax], [.text],,
+      [AC_DEFINE(HAVE_AVR_AS_LINK_RELAX_OPTION, 1,
+		 [Define if your assembler supports -mlink-relax option.])])
+    ;;
+
   cris-*-*)
     gcc_GAS_CHECK_FEATURE([-no-mul-bug-abort option], gcc_cv_as_cris_no_mul_bug, [2,15,91],
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
Ping?

Best regards,

Thomas Preud'homme

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Friday, May 09, 2014 6:26 PM
To: GCC Patches
Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store

Sorry, took longer than expected as I got distracted by some other patch. I merged the whole patchset in a single patch as I was told the current setup is actually more difficult to read. Here are the updated ChangeLogs:

*** gcc/ChangeLog ***

2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

	PR tree-optimization/54733
	* expr.c (get_inner_reference): Add a parameter to control whether
	a MEM_REF should be split into base + offset.
	* tree.h (get_inner_reference): Default new parameter to false.
	* tree-ssa-math-opts.c (nop_stats): New bswap_stats structure.
	(CMPNOP): Define.
	(find_bswap_or_nop_load): New.
	(find_bswap_1): Renamed to ...
	(find_bswap_or_nop_1): This. Also add support for memory source.
	(find_bswap): Renamed to ...
	(find_bswap_or_nop): This. Also add support for memory source and
	detection of bitwise operations equivalent to load in host endianness.
	(execute_optimize_bswap): Likewise. Also move its leading comment back
	in place and split statement transformation into ...
	(bswap_replace): This. Add assert when updating bswap_stats.

*** gcc/testsuite/ChangeLog ***

2014-05-09  Thomas Preud'homme  thomas.preudho...@arm.com

	PR tree-optimization/54733
	* gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
	optimization to support memory sources and bitwise operations
	equivalent to load in host endianness.
	* gcc.dg/optimize-bswaphi-1.c: Likewise.
	* gcc.dg/optimize-bswapsi-2.c: Likewise.
	* gcc.c-torture/execute/bswap-2.c: Likewise.

Ok for trunk?
Best regards, Thomas -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Monday, May 05, 2014 7:30 PM To: GCC Patches Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store I found a way to improve the function find_bswap/find_bswap_or_nop and reduce its size. Please hold for the review, I will post an updated version as soon as I finish testing. Best regards, Thomas Preud'homme
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On Fri, May 16, 2014 at 12:57 AM, Steven Bosscher stevenb@gmail.com wrote:

On Thu, May 15, 2014 at 9:26 AM, bin.cheng wrote:

Hi,

Targets like ARM and AARCH64 support double-word load store instructions, and these instructions are generally faster than the corresponding two load/stores. GCC currently uses peephole2 to merge paired load/store into one single instruction, which has a disadvantage: it can only handle simple cases where the two instructions actually appear sequentially in the instruction stream, and it is too weak to handle cases in which the two load/store are intervened by other irrelevant instructions.

This patch introduces a new GCC pass that looks through each basic block and merges paired load/store instructions even when they are not adjacent to each other. The algorithm is pretty simple:

1) In an initialization pass iterating over the instruction stream it collects relevant memory access information for each instruction.

2) It iterates over each basic block, trying to find a possible paired instruction for each memory access instruction. During this work, it checks dependencies between the two candidate instructions and also records the information indicating how to pair them. To avoid quadratic behavior of the algorithm, it introduces a new parameter max-merge-paired-loadstore-distance with a default value of 4, which is large enough to catch the major part of opportunities on ARM/cortex-a15.

3) For each candidate pair, it calls the back-end's hook to do a target dependent check and merge the two instructions if possible.

Though the parameter is set to 4, for miscellaneous benchmarks this pass can merge numerous opportunities beyond the ones already merged by peephole2 (a comparable number of opportunities to the peepholed ones). GCC bootstrap also confirms this finding.

Yet there is an open issue about when we should run this new pass.
Though register renaming is disabled by default now, I put this pass after it, because renaming can resolve some false dependencies and thus benefit this pass. Another finding is that it can capture a lot more opportunities if it runs after sched2, but I am not sure whether it would mess up the scheduling results in this way. So, any comments about this?

Hi Steven,

Thanks for reviewing this. Here are some answers to the general questions.

First off: Why does this need a target hook? We're getting more and more of them -- too many IMHO. There should be really good reasons for adding even more new ones.

Yes, I think this one does have a good reason. The target independent pass just makes sure that two consecutive memory access instructions are free of data dependencies on each other, then feeds them to the back-end hook. It's the back-end's responsibility to generate a correct instruction. It's not about modifying an existing insn and then recognizing it; it's about creating a new instruction sometimes. For example, we can generate a simple move insn in Arm mode, while we have to generate a parallel instruction in Thumb mode. The target independent part has no idea how to generate the expected insn. Moreover, the back-end may check some special conditions too.

Does this pass have to run after scheduling? The way you describe it,

No, I just meant there are more opportunities after regrename, and even more opportunities after sched2. I haven't investigated the reason for the latter yet, but this pass doesn't depend on sched2 to work.

this sounds like a scheduling problem, where you don't need regrename to resolve false dependencies. Your sched2 pass should be able to prioritize mergeable loads/stores to schedule them adjacent. Or if you must do this before scheduling, then at least do it before sched2. Now you're really ruining the order of the scheduled instructions, which seems bad.

Yes, I agree it should NOT disturb scheduling results; that's why I put it before sched2 and after register renaming right now.
I don't see how regrename will help resolve [base+offset] false dependencies. Can you explain? I'd expect effects from hardreg-copyprop commoning a base register.

It's the register operand's false dependency, rather than the base's one. Consider the simple case below:

    mov r1, #const1
    store r1, [base+offset1]
    mov r1, #const2
    store r1, [base+offset2]

It should be renamed into:

    mov r1, #const1
    store r1, [base+offset1]
    mov r2, #const2
    store r2, [base+offset2]

and is then caught by the patch. I will leave the other comments for a moment pending more discussion here.

Thanks,
bin

ChangeLog is missing the entry for arm.c. Your pass should make those peephole2's redundant, so you should remove the relevant define_peephole2's.

+ generated by spilling during reigster allocation. To catch all

s/reigster/register/

+ whose address expression is in the form of [base_offset]. It

s/[base_offset]/[base+offset]/

+ only guarantee that the two consecutive memory access instructions

s/guarantee/guarantees/

+
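The three steps of the proposed pass can be sketched in plain C — a toy model for illustration only, not the actual RTL pass: the insn record layout, register numbers, word size of 4, and the crude dependence check are all assumptions; only the look-ahead window of 4 comes from the description above.

```c
#include <assert.h>

#define MAX_DISTANCE 4  /* models max-merge-paired-loadstore-distance */

/* Step 1: per-insn memory access info collected up front.  */
struct mem_insn {
  int is_load;   /* 1 = load, 0 = store */
  int reg;       /* register loaded or stored */
  int base_reg;  /* base register of the address */
  int offset;    /* constant offset */
};

/* Step 2: for the access at index i, look ahead at most MAX_DISTANCE
   insns for a same-kind access at base+offset+4 using a different
   register (a stand-in for the real dependence checks).  Returns the
   index of the pairing candidate, or -1.  Step 3 (the target hook
   deciding how to emit the merged insn) is not modeled here.  */
int
find_pair (const struct mem_insn *insns, int n, int i)
{
  for (int j = i + 1; j < n && j <= i + MAX_DISTANCE; j++)
    if (insns[j].is_load == insns[i].is_load
	&& insns[j].base_reg == insns[i].base_reg
	&& insns[j].offset == insns[i].offset + 4
	&& insns[j].reg != insns[i].reg)
      return j;
  return -1;
}
```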
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On Fri, May 16, 2014 at 1:13 AM, Jeff Law l...@redhat.com wrote:

On 05/15/14 10:51, Mike Stump wrote:

On May 15, 2014, at 12:26 AM, bin.cheng bin.ch...@arm.com wrote: Here comes up with a new GCC pass looking through each basic block and merging paired load store even they are not adjacent to each other.

So I have a target that has load and store multiple support that supports a large number of registers (2-n registers), and I added a sched0 pass that is a light copy of the regular scheduling pass that uses a different cost function which arranges all loads first, then all stores, then everything else. Within a group of loads or stores the secondary key is the base register, the next key is the offset. The net result: all loads off the same register are sorted in increasing order.

Glad to see someone else stumble on (ab)using the scheduler to do this.

Emm, if it's (ab)using, should we still do it then?

I've poked at the scheduler several times to do similar stuff, but was never really satisfied with the results and never tried to polish those prototypes into something worth submitting. One example I've poked at was discovery of stores which then feed into a load from the same location, which obviously we'd prefer to turn into a store + copy (subject to a mess of constraints). There's a handful of these kinds of transformations that seem to naturally drop out of this kind of work.

As Mike stated, merging of consecutive memory accesses is all about the base register and the offset. I am thinking of another method: collecting all memory accesses with the same base register, then doing the merge work. In this way, we should be able to merge more than 2 instructions, and it would also be possible to remove redundant load instructions in one pass. My question is: how many of these redundant loads could there be? Is there any rtl pass responsible for this now?

Thanks,
bin

Similarly a post-reload pass could be used to promote single word loads/stores to double-word operations.
If anyone cared about PA 1.1 code generation, it'd be a much cleaner way to support the non-fused fmpyadd fmpysub insns. Anyway, if you want to move forward with the idea, I'd certainly support doing so. jeff -- Best Regards.
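The store-feeding-load case Jeff mentions is, at source level, the classic forwarding rewrite. A hand-written sketch of the before/after shapes (not GCC code; valid only when nothing else can write the location in between):

```c
/* Before: the load reads back the value just stored, which can stall
   on store-to-load forwarding on some micro-architectures.  */
int before (int *p, int x)
{
  *p = x;
  return *p;   /* load from the location just stored */
}

/* After: the transformation replaces the load with a copy of the
   stored register -- store + copy, no memory read.  */
int after (int *p, int x)
{
  *p = x;
  return x;
}
```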
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On Fri, May 16, 2014 at 12:51 AM, Mike Stump mikest...@comcast.net wrote:

On May 15, 2014, at 12:26 AM, bin.cheng bin.ch...@arm.com wrote: Here comes up with a new GCC pass looking through each basic block and merging paired load store even they are not adjacent to each other.

So I have a target that has load and store multiple support that supports a large number of registers (2-n registers), and I added a sched0 pass that is a light copy of the regular scheduling pass that uses a different cost function which arranges all loads first, then all stores, then everything else. Within a group of loads or stores the secondary key is the base register, the next key is the offset. The net result: all loads off the same register are sorted in increasing order.

We then can use some define_insns and some peephole2 patterns to take the seemingly unrelated instructions, which are now adjacent, and knock them down into single instructions, instead of the mass of instructions they were before. And then a peephole pass that runs early allows the patterns to do the heavy lifting. This scheme can handle unlimited complexity in the addressing forms just by ensuring the cost function for the new scheduling pass looks at all relevant bits (target dependent, if the simple machine independent form reg + n is not enough). The sched0 and the peephole pass run early:

+ NEXT_PASS (pass_sched0);
+ NEXT_PASS (pass_peephole1);
  NEXT_PASS (pass_web);
  NEXT_PASS (pass_rtl_cprop);
  NEXT_PASS (pass_cse2);

(before register allocation), so it can arrange to have things in adjacent registers for the load and store multiple instructions. The register allocator can then arrange for all the code to use those registers directly.
So, having done all this work, I think it would be nicer if there were a pass that managed it, so that one doesn't have to write any of the peephole or define_insn patterns (you need like 3*n patterns, and the patterns are O(n), so you need around n*4/2 lines of code, which is annoying for large n). A pass could use the existing load store multiple patterns directly, so, no additional port work.

In my work, since I extend the lifetimes of values in registers pretty much without limit, this could be a bad thing. The code is naturally limited to only extending the lives of things where load and store multiple are used, as if they aren't used, the regular scheduling pass would undo all the sched0 motion. I chose a light copy of sched, as I don't care about compile time, and I wanted a very small patch that was easy to maintain.

If this pass went into trunk, we'd run the new passes _only_ if a port asked for them. 99% of the ports likely don't want either; though, peephole before register allocation might be interesting for others to solve other issues. I wanted this to run before register allocation as my load and store multiple instructions only take consecutive register ranges (n-m), and I need the register allocator to make that true. I do reg to reg motion to move things around as needed, but almost all of it I expect the allocator to get rid of. Very complex cases might wind up with a few extra moves, but I have nice bubbles that I can fit these extra moves into.

In my scheme: no new options, no documentation for new options, no new param options, no silly constants, no hard to write/maintain pass, no new weird target interface, no limit on just pairs, works on stores as well, runs earlier, 430 lines instead of 1086 lines, conceptually much simpler, with the added benefit of peephole before register allocation that can be used for other things by the port...

So, my question is, does my scheme on your target find more or fewer things?
Would your scheme pair pairs (so that 4 registers would go into 1 instruction)?

Hi Mike,

Thanks for bringing up this new method. I have to admit that I was not very much into this method at first glance, especially as it requires another pass in cooperation with the scheduler.

For the first question: unfortunately, I can't apply the patch to svn trunk; could you please update it so that I can do some experiments with it? From my experience, lots of opportunities on ARM are generated by RA, so putting it before RA would miss many cases. Another possible issue is the interaction with other optimizations. Putting it at an early stage of RTL, we may disturb other optimizations like gcse/-fgcse-lm/-fgcse-sm/dse. Of course, it has advantages too; for example, fwprop_addr sometimes corrupts load/store pair opportunities on ARM, which wouldn't be a problem for your patch.

For the second question, the answer is no for the current implementation. For ARM the most important opportunity is the paired one, so I just started with pairs of two instructions, which is much simpler. I do have a draft idea of how to merge more than two instructions, but haven't worked on it yet.

Thanks,
bin
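Mike's sched0 cost function amounts to a three-level sort key: loads before stores before everything else, then base register, then offset. A toy comparator showing that ordering (the insn record layout is made up for illustration; a real pass may only reorder where dependencies allow):

```c
#include <stdlib.h>

enum kind { LOAD, STORE, OTHER };

struct insn {
  enum kind k;
  int base_reg;
  int offset;
};

/* Primary key: LOAD < STORE < OTHER; secondary: base register;
   tertiary: offset -- so same-base accesses end up adjacent and in
   increasing offset order, ready for peephole pairing.  */
int
insn_cmp (const void *a, const void *b)
{
  const struct insn *x = a, *y = b;
  if (x->k != y->k)
    return (int) x->k - (int) y->k;
  if (x->base_reg != y->base_reg)
    return x->base_reg - y->base_reg;
  return x->offset - y->offset;
}
```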
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On Thu, May 15, 2014 at 6:31 PM, Oleg Endo oleg.e...@t-online.de wrote: Hi, On 15 May 2014, at 09:26, bin.cheng bin.ch...@arm.com wrote: Hi, Targets like ARM and AARCH64 support double-word load store instructions, and these instructions are generally faster than the corresponding two load/stores. GCC currently uses peephole2 to merge paired load/store into one single instruction which has a disadvantage. It can only handle simple cases like the two instructions actually appear sequentially in instruction stream, and is too weak to handle cases in which the two load/store are intervened by other irrelevant instructions. Here comes up with a new GCC pass looking through each basic block and merging paired load store even they are not adjacent to each other. The algorithm is pretty simple: 1) In initialization pass iterating over instruction stream it collects relevant memory access information for each instruction. 2) It iterates over each basic block, tries to find possible paired instruction for each memory access instruction. During this work, it checks dependencies between the two possible instructions and also records the information indicating how to pair the two instructions. To avoid quadratic behavior of the algorithm, It introduces new parameter max-merge-paired-loadstore-distance and set the default value to 4, which is large enough to catch major part of opportunities on ARM/cortex-a15. 3) For each candidate pair, it calls back-end's hook to do target dependent check and merge the two instructions if possible. Though the parameter is set to 4, for miscellaneous benchmarks, this pass can merge numerous opportunities except ones already merged by peephole2 (same level numbers of opportunities comparing to peepholed ones). GCC bootstrap can also confirm this finding. This is interesting. E.g. on SH there are insns to load/store SFmode pairs. However, these insns require a mode switch and have some constraints on register usage. 
So in the SH case the load/store pairing would need to be done before reg alloc and before mode switching.

Yet there is an open issue about when we should run this new pass. Though register renaming is disabled by default now, I put this pass after it, because renaming can resolve some false dependencies and thus benefit this pass. Another finding is that it can capture a lot more opportunities if it runs after sched2, but I am not sure whether it would mess up the scheduling results in this way.

How about the following. Instead of adding new hooks and inserting the pass into the general pass list, make the new pass class take the necessary callback functions directly. Then targets can just instantiate the pass, passing their impl of the callbacks, and insert the pass object into the pass list at a place that fits best for the target.

Oh, I don't know whether we can do this in GCC. But yes, a target may want to run it at some place that fits best for the target.

Thanks,
bin

So, any comments about this?

Thanks,
bin

2014-05-15  Bin Cheng  bin.ch...@arm.com

	* common.opt (flag_merge_paired_loadstore): New option.
	* merge-paired-loadstore.c: New file.
	* Makefile.in: Support new file.
	* config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
	(load_latency_expanded_p, arm_merge_paired_loadstore): New function.
	* params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
	* doc/invoke.texi (-fmerge-paired-loadstore): New.
	(max-merge-paired-loadstore-distance): New.
	* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
	* doc/tm.texi: Regenerated.
	* target.def (merge_paired_loadstore): New.
	* tree-pass.h (make_pass_merge_paired_loadstore): New decl.
	* passes.def (pass_merge_paired_loadstore): New pass.
	* timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.

gcc/testsuite/ChangeLog

2014-05-15  Bin Cheng  bin.ch...@arm.com

	* gcc.target/arm/merge-paired-loadstore.c: New test.

merge-paired-loadstore-20140515.txt

--
Best Regards.
Re: [AArch64] Implement ADD in vector registers for 32-bit scalar values.
On Fri, Mar 28, 2014 at 03:39:53PM +, James Greenhalgh wrote: On Fri, Mar 28, 2014 at 03:09:22PM +, pins...@gmail.com wrote: On Mar 28, 2014, at 7:48 AM, James Greenhalgh james.greenha...@arm.com wrote: On Fri, Mar 28, 2014 at 11:11:58AM +, pins...@gmail.com wrote: On Mar 28, 2014, at 2:12 AM, James Greenhalgh james.greenha...@arm.com wrote: There is no way to perform scalar addition in the vector register file, but with the RTX costs in place we start rewriting (x 1) to (x + x) on almost all cores. The code which makes this decision has no idea that we will end up doing this (it happens well before reload) and so we end up with very ugly code generation in the case where addition was selected, but we are operating in vector registers. This patch relies on the same gimmick we are already using to allow shifts on 32-bit scalars in the vector register file - Use a vector 32x2 operation instead, knowing that we can safely ignore the top bits. This restores some normality to scalar_shift_1.c, however the test that we generate a left shift by one is clearly bogus, so remove that. This patch is pretty ugly, but it does generate superficially better looking code for this testcase. Tested on aarch64-none-elf with no issues. OK for stage 1? It seems we should also discourage the neon alternatives as there might be extra movement between the two register sets which we don't want. I see your point, but we've tried to avoid doing that elsewhere in the AArch64 backend. Our argument has been that strictly speaking, it isn't that the alternative is expensive, it is the movement between the register sets. We do model that elsewhere, and the register allocator should already be trying to avoid unneccesary moves between register classes. What about on a specific core where that alternative is expensive; that is the vector instructions are worse than the scalar ones. How are we going to handle this case? Certainly not by discouraging the alternative for all cores. 
We would need a more nuanced approach which could be tuned on a per-core basis. Otherwise we are bluntly and inaccurately pessimizing those cases where we can cheaply perform the operation in the vector register file (e.g. we are cleaning up loose ends after a vector loop, we have spilled to the vector register file, etc.).

The register preference mechanism feels like the wrong place to catch this, as it does not allow for that degree of per-core flexibility; an alternative is simply disparaged slightly (?, * in LRA) or disparaged severely (!). I would think that we don't want to start polluting the machine description trying to hack around this as was done with the ARM backend's neon_for_64_bits/avoid_neon_for_64_bits. How have other targets solved this issue?

Did you have any further thoughts on this? I've pushed the costs patches, so we will start to see gcc.target/aarch64/scalar_shift_1.c failing without this or an equivalent patch. Otherwise, *ping*

Thanks,
James

---
gcc/

2014-05-16  James Greenhalgh  james.greenha...@arm.com

	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
	vector registers.

gcc/testsuite/

2014-05-16  James Greenhalgh  james.greenha...@arm.com

	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.

If those mechanisms are broken, we should fix them - in that case fixing this by discouraging valid alternatives would seem to be gaffer-taping over the real problem.

Thanks,
James

Thanks,
Andrew

Thanks,
James

---
gcc/

2014-03-27  James Greenhalgh  james.greenha...@arm.com

	* config/aarch64/aarch64.md (*addsi3_aarch64): Add alternative in
	vector registers.

gcc/testsuite/

2014-03-27  James Greenhalgh  james.greenha...@arm.com

	* gcc.target/aarch64/scalar_shift_1.c: Fix expected assembler.
0001-AArch64-Implement-ADD-in-vector-registers-for-32-bit.patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 266d7873a5a1b8dbb7f955c3f13cf370920a9c4a..7c5b5a566ebfd907b83b38eed2e214738e7e9bd4 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1068,16 +1068,17 @@ (define_expand "add<mode>3"
 (define_insn "*addsi3_aarch64"
   [(set
-    (match_operand:SI 0 "register_operand" "=rk,rk,rk")
+    (match_operand:SI 0 "register_operand" "=rk,rk,w,rk")
     (plus:SI
-     (match_operand:SI 1 "register_operand" "%rk,rk,rk")
-     (match_operand:SI 2 "aarch64_plus_operand" "I,r,J")))]
+     (match_operand:SI 1 "register_operand"
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Ping? Sorry ... Best regards, Thomas Preud'homme -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Friday, May 09, 2014 6:26 PM To: GCC Patches Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store Sorry, took longer than expected as I got distracted by some other patch. I merged the whole patchset in a single patch as I was told the current setup is actually more difficult to read. Here are the updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * expr.c (get_inner_reference): Add a parameter to control whether a MEM_REF should be split into base + offset. * tree.h (get_inner_reference): Default new parameter to false. * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. Add assert when updating bswap_stats. *** gcc/testsuite/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Ok for trunk? Ok, I now decided otherwise and dislike the new parameter to get_inner_reference. 
Can you please revert that part and just deal with a MEM_REF result in your only caller?

And (of course) I found another possible issue. The way you compute load_type and use it here:

+  /* Perform the load.  */
+  load_offset_ptr = build_int_cst (n->alias_set, 0);
+  val_expr = fold_build2 (MEM_REF, load_type, addr_tmp,
+			  load_offset_ptr);

makes the load always appear aligned according to the mode of load_type. On strict-alignment targets this may cause faults.

So what you have to do is either (simpler)

  unsigned int align = get_pointer_alignment (addr_tmp);
  tree al_load_type = load_type;
  if (align < TYPE_ALIGN (load_type))
    al_load_type = build_aligned_type (load_type, align);
  ...
  val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp,
			  load_offset_ptr);

or keep track of the first actual load and use

  unsigned int align = get_object_alignment (that_first_load);

first in the one that corresponds to addr_tmp. From that there is a much better chance to derive good alignment values.

Of course on STRICT_ALIGNMENT targets a not aligned load will be decomposed again, so eventually doing the transformation may no longer be profitable(?).

Thanks and sorry again for the delay. Otherwise the patch looks good to me.

Richard.

Best regards,

Thomas

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme
Sent: Monday, May 05, 2014 7:30 PM
To: GCC Patches
Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store

I found a way to improve the function find_bswap/find_bswap_or_nop and reduce its size. Please hold for the review, I will post an updated version as soon as I finish testing.

Best regards,

Thomas Preud'homme
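For context, this is the kind of source the PR54733 patch targets — a byte-wise little-endian read that the extended bswap pass can collapse into a single 32-bit load (plus a bswap on big-endian hosts), subject to the alignment caveats Richard raises above:

```c
#include <stdint.h>

/* Reads a 32-bit little-endian value one byte at a time; correct on
   any host endianness.  The OR-of-shifted-loads shape is what the
   pass recognizes and replaces with one 4-byte load.  */
uint32_t
read_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
	 | ((uint32_t) p[1] << 8)
	 | ((uint32_t) p[2] << 16)
	 | ((uint32_t) p[3] << 24);
}
```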
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On Fri, May 16, 2014 at 12:10 PM, Bin.Cheng amker.ch...@gmail.com wrote: On Thu, May 15, 2014 at 6:31 PM, Oleg Endo oleg.e...@t-online.de wrote: Hi, On 15 May 2014, at 09:26, bin.cheng bin.ch...@arm.com wrote: Hi, Targets like ARM and AARCH64 support double-word load store instructions, and these instructions are generally faster than the corresponding two load/stores. GCC currently uses peephole2 to merge paired load/store into one single instruction which has a disadvantage. It can only handle simple cases like the two instructions actually appear sequentially in instruction stream, and is too weak to handle cases in which the two load/store are intervened by other irrelevant instructions. Here comes up with a new GCC pass looking through each basic block and merging paired load store even they are not adjacent to each other. The algorithm is pretty simple: 1) In initialization pass iterating over instruction stream it collects relevant memory access information for each instruction. 2) It iterates over each basic block, tries to find possible paired instruction for each memory access instruction. During this work, it checks dependencies between the two possible instructions and also records the information indicating how to pair the two instructions. To avoid quadratic behavior of the algorithm, It introduces new parameter max-merge-paired-loadstore-distance and set the default value to 4, which is large enough to catch major part of opportunities on ARM/cortex-a15. 3) For each candidate pair, it calls back-end's hook to do target dependent check and merge the two instructions if possible. Though the parameter is set to 4, for miscellaneous benchmarks, this pass can merge numerous opportunities except ones already merged by peephole2 (same level numbers of opportunities comparing to peepholed ones). GCC bootstrap can also confirm this finding. This is interesting. E.g. on SH there are insns to load/store SFmode pairs. 
However, these insns require a mode switch and have some constraints on register usage. So in the SH case the load/store pairing would need to be done before reg alloc and before mode switching. Yet there is an open issue about when we should run this new pass. Though register renaming is disabled by default now, I put this pass after it, because renaming can resolve some false dependencies thus benefit this pass. Another finding is, it can capture a lot more opportunities if it's after sched2, but I am not sure whether it will mess up with scheduling results in this way. How about the following. Instead of adding new hooks and inserting the pass to the general pass list, make the new pass class take the necessary callback functions directly. Then targets can just instantiate the pass, passing their impl of the callbacks, and insert the pass object into the pass list at a place that fits best for the target. Oh, I don't know we can do this in GCC. But yes, a target may want to run it at some place that fits best for the target. Btw, the bswap pass enhancements that are currently in review may also be an opportunity to catch these. They can merge adjacent loads that are used composed (but not yet composed by storing into adjacent memory). The basic-block vectorizer should also handle this (if the composition happens to be by storing into adjacent memory) - of course it needs vector modes available and it has to be enabled. Richard. Thanks, bin So, any comments about this? Thanks, bin 2014-05-15 Bin Cheng bin.ch...@arm.com * common.opt (flag_merge_paired_loadstore): New option. * merge-paired-loadstore.c: New file. * Makefile.in: Support new file. * config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro. (load_latency_expanded_p, arm_merge_paired_loadstore): New function. * params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param. * doc/invoke.texi (-fmerge-paired-loadstore): New. (max-merge-paired-loadstore-distance): New. 
* doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New. * doc/tm.texi: Regenerated. * target.def (merge_paired_loadstore): New. * tree-pass.h (make_pass_merge_paired_loadstore): New decl. * passes.def (pass_merge_paired_loadstore): New pass. * timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var. gcc/testsuite/ChangeLog 2014-05-15 Bin Cheng bin.ch...@arm.com * gcc.target/arm/merge-paired-loadstore.c: New test. merge-paired-loadstore-20140515.txt -- Best Regards.
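An example of the "composed by storing into adjacent memory" case Richard mentions — four byte stores that a store-merging optimization or the basic-block vectorizer could turn into one 32-bit store (a sketch of the source shape, not of either pass):

```c
#include <stdint.h>

/* Decomposes a 32-bit value into four adjacent little-endian byte
   stores; with the right analysis these can be emitted as a single
   word store (bswapped on big-endian hosts).  */
void
write_le32 (unsigned char *p, uint32_t v)
{
  p[0] = v & 0xff;
  p[1] = (v >> 8) & 0xff;
  p[2] = (v >> 16) & 0xff;
  p[3] = (v >> 24) & 0xff;
}
```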
Re: add dbgcnt and opt-info support for devirtualization
On Fri, May 16, 2014 at 1:54 AM, Xinliang David Li davi...@google.com wrote:

Hi, debugging runtime bugs due to devirtualization can be hard for very large C++ programs with a complicated class hierarchy. This patch adds support for reporting this high level transformation via -fopt-info (not hidden inside the dump file) and the ability to do binary search with a cutoff. Ok for trunk after build and test?

+ else if (dump_enabled_p ())
+   {
+     location_t loc = gimple_location (ie->call_stmt);
+     dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+		      Discovered direct call to non-function in %s,

diagnostics start with lower-case. Why not merge this with the dump_file case? The point of all the infrastructure was to _not_ need to distinguish the cases ... (similar for the other cases, and IIRC you miss one case in tree-ssa-pre.c calling ipa_intraprocedural_devirtualization?)

Thanks,
Richard.

thanks,

David
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Fri, May 16, 2014 at 12:56 PM, pins...@gmail.com wrote: On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Ping? Sorry ... Best regards, Thomas Preud'homme -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Friday, May 09, 2014 6:26 PM To: GCC Patches Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store Sorry, took longer than expected as I got distracted by some other patch. I merged the whole patchset in a single patch as I was told the current setup is actually more difficult to read. Here are the updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * expr.c (get_inner_reference): Add a parameter to control whether a MEM_REF should be split into base + offset. * tree.h (get_inner_reference): Default new parameter to false. * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. Add assert when updating bswap_stats. *** gcc/testsuite/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. 
Ok for trunk? Ok, I now decided otherwise and dislike the new parameter to get_inner_reference. Can you please revert that part and just deal with a MEM_REF result in your only caller? And (of course) I found another possible issue. The way you compute load_type and use it here: + /* Perform the load. */ + load_offset_ptr = build_int_cst (n->alias_set, 0); + val_expr = fold_build2 (MEM_REF, load_type, addr_tmp, + load_offset_ptr); makes the load always appear aligned according to the mode of load_type. On strict-alignment targets this may cause faults. So what you have to do is either (simpler) unsigned int align = get_pointer_alignment (addr_tmp); tree al_load_type = load_type; if (align < TYPE_ALIGN (load_type)) al_load_type = build_aligned_type (load_type, align); ... val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp, load_offset_ptr); or keep track of the first actual load and use unsigned int align = get_object_alignment (that_first_load); where that_first_load is the one that corresponds to addr_tmp. From that there is a much better chance to derive good alignment values. Of course on STRICT_ALIGNMENT targets a non-aligned load will be decomposed again, so eventually doing the transformation may no longer be profitable(?). Not always decomposed. On MIPS, it should use the load/store left/right instructions for unaligned load/stores, which is normally better than decomposed load/stores. So having a cost model would be nice. Agreed, but I am happy with doing that as a followup. Btw, a very simple one would be to reject the transform when SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align) is true. [of course that may be true on MIPS even for the cases where a reasonably fast unaligned variant exists - nearly no target defines that macro in a too fancy way] Richard. Thanks, Andrew Thanks and sorry again for the delay. Otherwise the patch looks good to me. Richard. 
Best regards, Thomas -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Monday, May 05, 2014 7:30 PM To: GCC Patches Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store I found a way to improve the function find_bswap/find_bswap_or_nop and reduce its size. Please hold for the review, I will post an updated version as soon as I finish testing. Best regards, Thomas Preud'homme
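For context, the shape of source code the extended pass recognizes looks like this (my own illustration, not from the patch): bytes loaded from memory and recombined with shifts and ORs. When the recombination order matches the host byte order, the whole expression is equivalent to one wider load (a "nop" rather than a bswap), which is what find_bswap_or_nop now detects.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian recombination of four adjacent bytes.  On a
// little-endian host this is just an unaligned 32-bit load; on a
// big-endian host it is a 32-bit load plus a byte swap.
uint32_t load_le32(const unsigned char *p)
{
  return (uint32_t)p[0]
       | (uint32_t)p[1] << 8
       | (uint32_t)p[2] << 16
       | (uint32_t)p[3] << 24;
}
```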
Re: [PATCH, libstdc++/61166] overflow when parse number in std::duration operator
On 15/05/14 22:54 -0400, Ed Smith-Rowland wrote: On 05/15/2014 03:03 PM, Jonathan Wakely wrote: Here's a finished patch to simplify bits/parse_numbers.h Tested x86_64-linux. Ed, any objection to this version? This looks great, thanks! I committed that to trunk, I'll put it on the 4.9 branch too. Having done that should we actually stop using it as suggested in the bug trail? ;-) I was going to do that, then realised that there's a defect in the standard where it requires overflow in duration integer literals to be diagnosed. That's only possible with literal operator templates, so I think we should keep your _Parse_int code, but apply the attached change to detect overflow. As the TODO comment says, it should be sufficient to simply instantiate integral_constant_Rep, _Val::value to give a diagnostic when _Rep{_Value::value} is narrowing, but GCC only gives a warning for it, and that's suppressed in a system header, so I do an explicit static_assert. That could be replaced with ... #pragma GCC diagnostic push #pragma GCC diagnostic error -Woverflow #pragma GCC diagnostic error -Wsystem-headers templatetypename _Dur, char... _Digits constexpr _Dur __check_overflow() { using _Val = __parse_int::_Parse_int_Digits...; using _Rep = typename _Dur::rep; return _Dur{integral_constant_Rep, _Val::value::value}; } #pragma GCC diagnostic pop ... but I have plans to do that sort of thing more widely, which I'll deal with another time as part of https://gcc.gnu.org/PR50871 and/or https://gcc.gnu.org/PR58876 (what do other people think about using diagnostic pragmas to locally re-enable diagnostics in our headers?) Tested x86_64-linux, committed to trunk. commit 361c9b79e0b1c7f2435f5b0b369a139b216dee90 Author: Jonathan Wakely jwak...@redhat.com Date: Fri May 16 10:31:38 2014 +0100 * include/bits/parse_numbers.h (__parse_int::_Number_help): Check for overflow. * include/std/chrono (chrono_literals::__select_type::_Select_type): Remove. 
(chrono_literals::_Checked_integral_constant): Define. Simplify UDL operator templates and check for overflow. * testsuite/20_util/duration/literals/range.cc: New. diff --git a/libstdc++-v3/include/bits/parse_numbers.h b/libstdc++-v3/include/bits/parse_numbers.h index 0a42381a..a29d127 100644 --- a/libstdc++-v3/include/bits/parse_numbers.h +++ b/libstdc++-v3/include/bits/parse_numbers.h @@ -193,6 +193,7 @@ namespace __parse_int _Pow / (_Base * __valid_digit::value), _Digs...; using type = __ull_constant_Pow * __digit::value + __next::type::value; + static_assert((type::value / _Pow) == __digit::value, overflow); }; templateunsigned _Base, unsigned long long _Pow, char _Dig diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono index b114e02..39ad5e3 100644 --- a/libstdc++-v3/include/std/chrono +++ b/libstdc++-v3/include/std/chrono @@ -787,117 +787,79 @@ _GLIBCXX_END_NAMESPACE_VERSION inline namespace chrono_literals { -namespace __select_type -{ - - using namespace __parse_int; - - templateunsigned long long _Val, typename _Dur - struct _Select_type - : conditional - _Val = static_castunsigned long long - (numeric_limitstypename _Dur::rep::max()), - _Dur, void - { - static constexpr typename _Select_type::type - value{static_casttypename _Select_type::type(_Val)}; - }; - - templateunsigned long long _Val, typename _Dur - constexpr typename _Select_type_Val, _Dur::type - _Select_type_Val, _Dur::value; +templatetypename _Rep, unsigned long long _Val + struct _Checked_integral_constant + : integral_constant_Rep, static_cast_Rep(_Val) + { + static_assert(_Checked_integral_constant::value 0 + _Checked_integral_constant::value == _Val, + literal value cannot be represented by duration type); + }; -} // __select_type +templatetypename _Dur, char... 
_Digits + constexpr _Dur __check_overflow() + { + using _Val = __parse_int::_Parse_int_Digits...; + using _Rep = typename _Dur::rep; + // TODO: should be simply integral_constant_Rep, _Val::value + // but GCC doesn't reject narrowing conversions to _Rep. + using _CheckedVal = _Checked_integral_constant_Rep, _Val::value; + return _Dur{_CheckedVal::value}; + } constexpr chrono::durationlong double, ratio3600,1 operatorh(long double __hours) { return chrono::durationlong double, ratio3600,1{__hours}; } template char... _Digits - constexpr typename - __select_type::_Select_type__select_int::_Select_int_Digits...::value, - chrono::hours::type + constexpr chrono::hours operatorh() - { - return __select_type::_Select_type - __select_int::_Select_int_Digits...::value, - chrono::hours::value; - } + { return __check_overflowchrono::hours, _Digits...(); } constexpr chrono::durationlong double, ratio60,1
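Outside the context of the full diff, the overflow check being added can be sketched as follows (illustrative names, not the exact libstdc++ internals): route the parsed value through an integral_constant of the duration's representation type, then static_assert that the round trip was lossless, so an out-of-range duration literal fails to compile instead of silently wrapping.

```cpp
#include <cassert>
#include <type_traits>

// A checked integral_constant in the spirit of
// _Checked_integral_constant: instantiation fails at compile time when
// Val cannot be represented in Rep.
template<typename Rep, unsigned long long Val>
struct checked_integral_constant
  : std::integral_constant<Rep, static_cast<Rep>(Val)>
{
  static_assert(checked_integral_constant::value >= 0
                && static_cast<unsigned long long>(
                     checked_integral_constant::value) == Val,
                "literal value cannot be represented by duration type");
};
```

For example, checked_integral_constant<int, 1ull << 40> would trip the static_assert, while in-range values behave exactly like a plain integral_constant.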
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On Fri, May 16, 2014 at 1:03 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 12:56 PM, pins...@gmail.com wrote: On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Ping? Sorry ... Best regards, Thomas Preud'homme -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Friday, May 09, 2014 6:26 PM To: GCC Patches Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store Sorry, took longer than expected as I got distracted by some other patch. I merged the whole patchset in a single patch as I was told the current setup is actually more difficult to read. Here are the updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * expr.c (get_inner_reference): Add a parameter to control whether a MEM_REF should be split into base + offset. * tree.h (get_inner_reference): Default new parameter to false. * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. Add assert when updating bswap_stats. *** gcc/testsuite/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. 
* gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Ok for trunk? Ok, I now decided otherwise and dislike the new parameter to get_inner_reference. Can you please revert that part and just deal with a MEM_REF result in your only caller? And (of course) I found another possible issue. The way you compute load_type and use it here: + /* Perform the load. */ + load_offset_ptr = build_int_cst (n->alias_set, 0); + val_expr = fold_build2 (MEM_REF, load_type, addr_tmp, + load_offset_ptr); makes the load always appear aligned according to the mode of load_type. On strict-alignment targets this may cause faults. So what you have to do is either (simpler) unsigned int align = get_pointer_alignment (addr_tmp); tree al_load_type = load_type; if (align < TYPE_ALIGN (load_type)) al_load_type = build_aligned_type (load_type, align); ... val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp, load_offset_ptr); or keep track of the first actual load and use unsigned int align = get_object_alignment (that_first_load); where that_first_load is the one that corresponds to addr_tmp. From that there is a much better chance to derive good alignment values. Of course on STRICT_ALIGNMENT targets a non-aligned load will be decomposed again, so eventually doing the transformation may no longer be profitable(?). Not always decomposed. On MIPS, it should use the load/store left/right instructions for unaligned load/stores, which is normally better than decomposed load/stores. So having a cost model would be nice. Agreed, but I am happy with doing that as a followup. Btw, a very simple one would be to reject the transform when SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align) is true. [of course that may be true on MIPS even for the cases where a reasonably fast unaligned variant exists - nearly no target defines that macro in a too fancy way] Oh, and what happens for unsigned foo (unsigned char *x) { return x[0] << 24 | x[2] << 8 | x[3]; } ? 
We could do an unsigned int load from x and zero byte 3 with an AND. Enhancement for a followup, similar to also considering vector types for the load (also I'm not sure that uint64_type_node always has non-BLKmode for all targets). Richard. Richard. Thanks, Andrew Thanks and sorry again for the delay. Otherwise the patch looks good to me. Richard. Best regards, Thomas -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Monday, May 05, 2014 7:30 PM To: GCC Patches Subject: RE: [PATCH][2/3] Fix PR54733 Optimize endian independent load/store I found a way to improve the function
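Richard's load-plus-AND suggestion, written out by hand for the example above (a sketch of the idea, not the compiler transformation itself; __builtin_bswap32 and the __BYTE_ORDER__ macros are GCC extensions):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

uint32_t foo_bytes(const unsigned char *x)  // the original three byte loads
{
  return (uint32_t)x[0] << 24 | (uint32_t)x[2] << 8 | (uint32_t)x[3];
}

uint32_t foo_wide(const unsigned char *x)   // one wide load + bswap + AND
{
  uint32_t v;
  std::memcpy(&v, x, sizeof v);             // single 32-bit load
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  v = __builtin_bswap32(v);                 // bring bytes into big-endian order
#endif
  return v & 0xFF00FFFFu;                   // the AND zeroes the byte for x[1]
}
```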
[PATCH] Fix one testcase from PR61194
This fixes the non-regression testcase from PR61194 by also recognizing COND_EXPRs as sink for bool expressions in vectorizer pattern detection. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-05-16 Richard Biener rguent...@suse.de PR tree-optimization/61194 * tree-vect-patterns.c (adjust_bool_pattern): Also handle bool patterns ending in a COND_EXPR. * gcc.dg/vect/pr61194.c: New testcase. Index: gcc/tree-vect-patterns.c === *** gcc/tree-vect-patterns.c(revision 210492) --- gcc/tree-vect-patterns.c(working copy) *** adjust_bool_pattern (tree var, tree out_ *** 2889,2895 S5 e_b = c_b | d_b; S6 f_T = (TYPE) e_b; !where type 'TYPE' is an integral type. Input: --- 2889,2900 S5 e_b = c_b | d_b; S6 f_T = (TYPE) e_b; !where type 'TYPE' is an integral type. Or a similar pattern !ending in ! ! S6 f_Y = e_b ? r_Y : s_Y; ! !as results from if-conversion of a complex condition. Input: *** vect_recog_bool_pattern (vecgimple *st *** 2966,2971 --- 2971,3015 *type_out = vectype; *type_in = vectype; stmts-safe_push (last_stmt); + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + vect_recog_bool_pattern: detected:\n); + + return pattern_stmt; + } + else if (rhs_code == COND_EXPR + TREE_CODE (var) == SSA_NAME) + { + vectype = get_vectype_for_scalar_type (TREE_TYPE (lhs)); + if (vectype == NULL_TREE) + return NULL; + + /* Build a scalar type for the boolean result that when + vectorized matches the vector type of the result in +size and number of elements. 
*/ + unsigned prec + = wi::udiv_trunc (TYPE_SIZE (vectype), + TYPE_VECTOR_SUBPARTS (vectype)).to_uhwi (); + tree type + = build_nonstandard_integer_type (prec, + TYPE_UNSIGNED (TREE_TYPE (var))); + if (get_vectype_for_scalar_type (type) == NULL_TREE) + return NULL; + + if (!check_bool_pattern (var, loop_vinfo, bb_vinfo)) + return NULL; + + rhs = adjust_bool_pattern (var, type, NULL_TREE, stmts); + lhs = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL); + pattern_stmt + = gimple_build_assign_with_ops (COND_EXPR, lhs, + build2 (NE_EXPR, boolean_type_node, + rhs, build_int_cst (type, 0)), + gimple_assign_rhs2 (last_stmt), + gimple_assign_rhs3 (last_stmt)); + *type_out = vectype; + *type_in = vectype; + stmts-safe_push (last_stmt); if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, vect_recog_bool_pattern: detected:\n); Index: gcc/testsuite/gcc.dg/vect/pr61194.c === *** gcc/testsuite/gcc.dg/vect/pr61194.c (revision 0) --- gcc/testsuite/gcc.dg/vect/pr61194.c (working copy) *** *** 0 --- 1,43 + /* { dg-require-effective-target vect_cond_mixed } */ + /* { dg-additional-options -ftree-loop-if-convert-stores } */ + + #include tree-vect.h + + static float x[1024]; + static float y[1024]; + static float z[1024]; + static float w[1024]; + + void __attribute__((noinline,noclone)) barX() + { + int i; + for (i=0; i1024; ++i) + z[i] = ((x[i]0) (w[i]0)) ? z[i] : y[i]; + } + + int main() + { + int i; + + check_vect (); + + for (i = 0; i 1024; ++i) + { + x[i] = -10 + i; + w[i] = 100 - i; + z[i] = 0.; + y[i] = 1.; + __asm__ volatile ( : : : memory); + } + + barX(); + + for (i = 0; i 1024; ++i) + if (z[i] != ((x[i]0 w[i]0) ? 0. : 1.)) + abort (); + + return 0; + } + + /* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect } } */ + /* { dg-final { cleanup-tree-dump vect } } */
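The scalar picture of what the patch arranges for a COND_EXPR tail (my own illustration; the pass itself works on GIMPLE, not on source like this): the composite boolean is widened to an integer mask of the element width, so the select becomes bitwise arithmetic that maps directly onto vector compare/select operations.

```cpp
#include <cassert>
#include <cstdint>

int32_t select_by_bool(int32_t x, int32_t w, int32_t t, int32_t f)
{
  bool cond = (x > 0) & (w < 0);    // composite boolean, as after if-conversion
  int32_t mask = -(int32_t)cond;    // widen bool to 0 or 0xFFFFFFFF
  return (t & mask) | (f & ~mask);  // branchless COND_EXPR equivalent
}
```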
PR 61084: Use widest_int in sparc_fold_builtin
This is the second part of PR 61084, which it seems I'd forgotten to post. pdist calculates a wide result from narrower inputs, so I should have used widest_int rather than wide_int. Tested by Rainer on SPARC Solaris. OK to install? Thanks, Richard gcc/ PR target/61084 * config/sparc/sparc.c (sparc_fold_builtin): Use widest_int rather than wide_int. Index: gcc/config/sparc/sparc.c === --- gcc/config/sparc/sparc.c2014-05-15 13:49:25.425654068 +0100 +++ gcc/config/sparc/sparc.c2014-05-16 12:56:48.578349758 +0100 @@ -10915,8 +10915,8 @@ sparc_fold_builtin (tree fndecl, int n_a TREE_CODE (arg2) == INTEGER_CST) { bool overflow = false; - wide_int result = arg2; - wide_int tmp; + widest_int result = wi::to_widest (arg2); + widest_int tmp; unsigned i; for (i = 0; i < VECTOR_CST_NELTS (arg0); ++i) @@ -10926,8 +10926,8 @@ sparc_fold_builtin (tree fndecl, int n_a bool neg1_ovf, neg2_ovf, add1_ovf, add2_ovf; - tmp = wi::neg (e1, &neg1_ovf); - tmp = wi::add (e0, tmp, SIGNED, &add1_ovf); + tmp = wi::neg (wi::to_widest (e1), &neg1_ovf); + tmp = wi::add (wi::to_widest (e0), tmp, SIGNED, &add1_ovf); if (wi::neg_p (tmp)) tmp = wi::neg (tmp, &neg2_ovf); else
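The reason the accumulator type matters can be seen from the semantics of the builtin being folded. As I understand VIS pdist, it sums absolute differences of eight byte elements into a 64-bit accumulator, so the fold must do its arithmetic in a type wider than the inputs - hence widest_int. A plain C++ model (illustrative, not the backend code):

```cpp
#include <cassert>
#include <cstdint>

// Model of pdist: sum of absolute byte differences, accumulated in 64 bits.
uint64_t pdist_model(const uint8_t a[8], const uint8_t b[8], uint64_t acc)
{
  for (int i = 0; i < 8; ++i)
    acc += a[i] > b[i] ? (uint64_t)(a[i] - b[i])
                       : (uint64_t)(b[i] - a[i]);
  return acc;   // never truncated to the 8-bit element width
}
```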
Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
On 08/05/14 18:36, Ian Bolton wrote: Hi, It currently takes 4 instructions to generate certain immediates on AArch64 (unless we put them in the constant pool). For example ... long long beefcafebabe () { return 0xBEEFCAFEBABEll; } leads to ... mov x0, 0x47806 mov x0, 0xcafe, lsl 16 mov x0, 0xbeef, lsl 32 orr x0, x0, -281474976710656 The above case is tackled in this patch by employing MOVN to generate the top 32-bits in a single instruction ... mov x0, -71536975282177 movk x0, 0xcafe, lsl 16 movk x0, 0xbabe, lsl 0 Note that where at least two half-words are 0x, existing code that does the immediate in two instructions is still used.) Tested on standard gcc regressions and the attached test case. OK for commit? What about: long long a() { return 0x123456789abcll; } long long b() { return 0x123456789abcll; } long long c() { return 0x123456789abcll; } ? Surely these can also benefit from this sort of optimization, but it looks as though you only handle the top 16 bits being set. R. Cheers, Ian 2014-05-08 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Use MOVN when top-most half-word (and only that half-word) is 0x. gcc/testsuite/ * gcc.target/aarch64/movn_1.c: New test. aarch64-movn-exploitation-patch-v5.txt diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 43a83566..a8e504e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) } } + /* Look for case where upper 16 bits are set, so we can use MOVN. 
*/ + if ((val & 0xffff000000000000ll) == 0xffff000000000000ll) +{ + emit_insn (gen_rtx_SET (VOIDmode, dest, + GEN_INT (~ (~val & (0xffffll << 32))))); + emit_insn (gen_insv_immdi (dest, GEN_INT (16), + GEN_INT ((val >> 16) & 0xffff))); + emit_insn (gen_insv_immdi (dest, GEN_INT (0), + GEN_INT (val & 0xffff))); + return; +} + simple_sequence: first = true; mask = 0xffff; diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c b/gcc/testsuite/gcc.target/aarch64/movn_1.c new file mode 100644 index 000..cc11ade --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c @@ -0,0 +1,27 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -fno-inline --save-temps" } */ + +extern void abort (void); + +long long +foo () +{ + /* { dg-final { scan-assembler "mov\tx\[0-9\]+, -71536975282177" } } */ + return 0xffffbeefcafebabell; +} + +long long +merge4 (int a, int b, int c, int d) +{ + return ((long long) a << 48 | (long long) b << 32 + | (long long) c << 16 | (long long) d); +} + +int main () +{ + if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe)) +abort (); + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
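The arithmetic behind the new three-instruction sequence can be modelled directly (a sketch of the instruction semantics, not the backend code). MOVN materializes ~(imm16 << shift), so when the top 16 bits of val are all ones a single MOVN produces the top 32 bits (with the low 32 bits temporarily all ones), and two MOVKs then patch in the low halfwords:

```cpp
#include <cassert>
#include <cstdint>

uint64_t movn(uint16_t imm, unsigned shift)   // MOVN Xd, #imm, LSL #shift
{
  return ~((uint64_t)imm << shift);
}

uint64_t movk(uint64_t r, uint16_t imm, unsigned shift) // replace one halfword
{
  return (r & ~((uint64_t)0xFFFF << shift)) | ((uint64_t)imm << shift);
}

uint64_t materialize(uint64_t val)   // assumes (val >> 48) == 0xFFFF
{
  uint64_t r = movn((uint16_t)~(val >> 32), 32);  // top 32 bits in one insn
  r = movk(r, (uint16_t)(val >> 16), 16);
  r = movk(r, (uint16_t)val, 0);
  return r;
}
```

For val = 0xffffbeefcafebabe the first step yields 0xffffbeefffffffff, i.e. -71536975282177, matching the constant in the scan-assembler pattern.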
Re: [PATCH (for next stage 1)] Add return type to gimple function dumps
On Tue, Apr 29, 2014 at 5:01 PM, David Malcolm dmalc...@redhat.com wrote: On Tue, 2014-04-29 at 11:16 +0200, Richard Biener wrote: On Tue, Apr 29, 2014 at 2:58 AM, David Malcolm dmalc...@redhat.com wrote: On Thu, 2014-04-24 at 15:46 -0600, Jeff Law wrote: On 03/10/14 13:22, David Malcolm wrote: Gimple function dumps contain the types of parameters, but not of the return type. The attached patch fixes this omission; here's an example of the before/after diff: $ diff -up /tmp/pr23401.c.004t.gimple.old /tmp/pr23401.c.004t.gimple.new --- /tmp/pr23401.c.004t.gimple.old 2014-03-10 13:40:08.972063541 -0400 +++ /tmp/pr23401.c.004t.gimple.new 2014-03-10 13:39:49.346515464 -0400 @@ -1,3 +1,4 @@ +int (int i) { int D.1731; Successfully bootstrapped and regrtested on x86_64 Linux (Fedora 20). A couple of test cases needed tweaking, since they were counting the number of occurrences of int in the gimple dump, which thus changed for functions returning int (like the one above). OK for next stage 1? Conceptually OK. As Richi notes, the work here is in fixing up the testsuite. I didn't see a reply to Richi's question, particularly WRT the Fortran testsuite. I'm attaching a revised version of the patch which adds the use of TDF_SLIM (though it didn't appear to be necessary in the test I did of a function returning a struct). Successfully bootstrapped regrtested on x86_64 Linux (Fedora 20), using: --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto I didn't see any new failures from this in the testsuite, in particular gfortran.sum. 
Here's a comparison of the before/after test results, generated using my jamais-vu tool [1], with comments added by me inline: Comparing 16 common .sum files -- gcc/testsuite/ada/acats/acats.sum : total: 2320 PASS: 2320 gcc/testsuite/g++/g++.sum : total: 90421 FAIL: 3 PASS: 86969 XFAIL: 445 UNSUPPORTED: 3004 gcc/testsuite/gcc/gcc.sum : total: 110458 FAIL: 45 PASS: 108292 XFAIL: 265 XPASS: 33 UNSUPPORTED: 1823 gcc/testsuite/gfortran/gfortran.sum : total: 45717 PASS: 45600 XFAIL: 52 UNSUPPORTED: 65 gcc/testsuite/gnat/gnat.sum : total: 1255 PASS: 1234 XFAIL: 18 UNSUPPORTED: 3 gcc/testsuite/go/go.sum : total: 7266 PASS: 7258 XFAIL: 1 UNTESTED: 6 UNSUPPORTED: 1 gcc/testsuite/obj-c++/obj-c++.sum : total: 1450 PASS: 1354 XFAIL: 10 UNSUPPORTED: 86 gcc/testsuite/objc/objc.sum : total: 2973 PASS: 2893 XFAIL: 6 UNSUPPORTED: 74 x86_64-unknown-linux-gnu/boehm-gc/testsuite/boehm-gc.sum : total: 13 PASS: 12 UNSUPPORTED: 1 x86_64-unknown-linux-gnu/libatomic/testsuite/libatomic.sum : total: 54 PASS: 54 x86_64-unknown-linux-gnu/libffi/testsuite/libffi.sum : total: 1856 PASS: 1801 UNSUPPORTED: 55 x86_64-unknown-linux-gnu/libgo/libgo.sum : total: 122 PASS: 122 x86_64-unknown-linux-gnu/libgomp/testsuite/libgomp.sum : total: 2420 PASS: 2420 x86_64-unknown-linux-gnu/libitm/testsuite/libitm.sum : total: 30 PASS: 26 XFAIL: 3 UNSUPPORTED: 1 x86_64-unknown-linux-gnu/libjava/testsuite/libjava.sum : total: 2586 PASS: 2582 XFAIL: 4 x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum : total: 10265 PASS: 1 XFAIL: 41 UNSUPPORTED: 224 (...i.e. the totals were unchanged between unpatched/patched for all of the .sum files; and yes, Fortran was tested. Should there be a gcj.sum?) 
Tests that went away in gcc/testsuite/gcc/gcc.sum: 2 PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple int 5 PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple int 3 Tests appeared in gcc/testsuite/gcc/gcc.sum: 2 -- PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple int 6 PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple int 4 (...my comparison tool isn't smart enough yet to tie these went away/appeared results together; they reflect the fixups from the patch). Tests that went away in gcc/testsuite/go/go.sum: 2 -- PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) compilation, -O2 -g PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) execution, -O2 -g Tests appeared in gcc/testsuite/go/go.sum: 2 PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) compilation, -O2 -g PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) execution, -O2 -g (...I hand edited the above, this main.go test embeds numerous paths, which change between the two builds; so nothing really changed here). Are the above results sane? Yes. I'm not sure why I didn't see the failures
Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
The test uses SSSE3 because of the following restriction in i386.c: static bool expand_vec_perm_pshufb2 (struct expand_vec_perm_d *d) { rtx rperm[2][16], vperm, l, h, op, m128; unsigned int i, nelt, eltsz; if (!TARGET_SSSE3 || GET_MODE_SIZE (d-vmode) != 16) return false; Does the following fix ok? 2014-05-16 Evgeny Stupachenko evstu...@gmail.com * gcc.dg/vect/pr52252-ld.c: Fix target for the test. diff --git a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c index 6e3cb52..301433b 100644 --- a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c +++ b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ -/* { dg-options -O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-options -O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details } */ +/* { dg-skip-if why { ! { x86_64-*-* i?86-*-* } } } */ #define byte unsigned char On Tue, May 13, 2014 at 12:21 PM, Richard Biener rguent...@suse.de wrote: On Mon, 12 May 2014, Evgeny Stupachenko wrote: The test is on general changes. However I was able to test it on x86 only. I see 2 possible solutions: 1. Set the test for x86 only. 2. Modify it so that it will pass on sparc-sun-solaris2. If 2. is not acceptable I'll create patch for 1. Currently I don't see why in0_9 = *in_27 is not supported. Does the test fail because of unsupported permutation? The test uses /* { dg-options -O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details { target { i?86-*-* x86_64-*-* } } } */ that's bogus. You shouldn't add any dg-options. Instead use proper dg-effective-target checks for the mssse3 feature you are using. Note that the dg-final checking is applied regardless of the options above are applied or not. Why does the test only succeed with -mssse3 btw? The proper way to restrict the test to a single target is to use /* { dg-skip-if why { ! { x86_64-*-* i?86-*-* } } } */ Sorry for not catching this in the review. Richard. 
On Mon, May 12, 2014 at 7:14 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Evgeny Stupachenko evstu...@gmail.com writes: Patch with fixes attached. Currently if-structure is as following: + if (count == 3) ... + else + { + /* If length is not equal to 3 then only power of 2 is supported. */ + gcc_assert (exact_log2 (count) != -1); For stores group I've created another mail thread. [...] 2014-05-06 Evgeny Stupachenko evstu...@gmail.com PR tree-optimization/52252 * gcc.dg/vect/pr52252-ld.c: Test on loads group of size 3. This test FAILs on sparc-sun-solaris2.11, both 32 and 64-bit: FAIL: gcc.dg/vect/pr52252-ld.c scan-tree-dump-times vect vectorized 1 loops 1 FAIL: gcc.dg/vect/pr52252-ld.c -flto -ffat-lto-objects scan-tree-dump-times vect vectorized 1 loops 1 The dumps have /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/pr52252-ld.c:10:3: note: not vectorized: relevant stmt not supported: in0_9 = *in_27; /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/vect/pr52252-ld.c:7:1: note: vectorized 0 loops in function. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University -- Richard Biener rguent...@suse.de SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
Hi Evgeny, Does the following fix ok? 2014-05-16 Evgeny Stupachenko evstu...@gmail.com * gcc.dg/vect/pr52252-ld.c: Fix target for the test. diff --git a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c index 6e3cb52..301433b 100644 --- a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c +++ b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ -/* { dg-options -O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-options -O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details } */ +/* { dg-skip-if why { ! { x86_64-*-* i?86-*-* } } } */ If the test is really x86 specific, move it to gcc.target/i386 and remove the dg-skip-if. Otherwise, add an explanation for skipping the test on other targets to the first arg of dg-skip-if. This is supposed to be a comment stating why the test is skipped, not why literally. Approval or rejection of the testcase is up to the target maintainers. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: patch8.diff updated Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
On Fri, May 16, 2014 at 1:03 AM, Svante Signell svante.sign...@gmail.com wrote: For that part of the patch: without it the build on GNU/Hurd fails. On the other hand, SYS_FCNTL is not defined for e.g. GNU/Linux either. This is used in gcc-4.9-4.9.0/src/libgo/go/net/fd_unix.go: func dupCloseOnExec(fd int) (newfd int, err error) { if atomic.LoadInt32(&tryDupCloexec) == 1 && syscall.F_DUPFD_CLOEXEC != 0 { r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_DUPFD_CLOEXEC, 0) It is yet unknown how the build succeeds on Linux without SYS_FCNTL being defined; maybe the conditions above are not met. On GNU/Linux SYS_FCNTL is defined by the generated file sysinfo.go, because SYS_fcntl is defined by syscall.h. Ian
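On POSIX systems the Go code above boils down to a single fcntl call that duplicates the descriptor and sets close-on-exec atomically; that is the operation Go reaches via syscall.SYS_FCNTL where the constant exists. A minimal C++ equivalent:

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Duplicate fd with FD_CLOEXEC set atomically; returns the new fd,
// or -1 on error (the same contract as fcntl itself).
int dup_cloexec(int fd)
{
  return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}
```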
Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
On Fri, May 16, 2014 at 03:11:05PM +0200, Rainer Orth wrote: Hi Evgeny, Does the following fix ok? 2014-05-16 Evgeny Stupachenko evstu...@gmail.com * gcc.dg/vect/pr52252-ld.c: Fix target for the test. diff --git a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c index 6e3cb52..301433b 100644 --- a/gcc/testsuite/gcc.dg/vect/pr52252-ld.c +++ b/gcc/testsuite/gcc.dg/vect/pr52252-ld.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ -/* { dg-options -O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details { target { i?86-*-* x86_64-*-* } } } */ +/* { dg-options -O2 -g -ftree-vectorize -mssse3 -fdump-tree-vect-details } */ +/* { dg-skip-if why { ! { x86_64-*-* i?86-*-* } } } */ If the test is really x86 specific, move it to gcc.target/i386 and remove the dg-skip-if. Otherwise, add an explanation for skipping the test on other targets to the first arg of dg-skip-if. This is supposed to be a comment stating why the test is skipped, not why literally. Well, I don't see anything i?86/x86_64 specific on the test. What is specific is the -mssse3, which supposedly should be added through /* { dg-additional-options -mssse3 { target { i?86-*-* x86_64-*-* } } } */ and then perhaps the test might not necessarily be vectorized (so the dg-final line may need target guard as well. But, I see no reason not to try to compile this on other targets. Jakub
Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
I'm not sure about other architectures. I can test it on x86. Most likely it will pass on Arm, but again I'm not sure that { arm*-*-* } is ok. The test covers generic changes, so we can wait for others and, if there are no more objections, leave sparc-sun-solaris2 as the target to skip. Or change to the following: /* { dg-skip-if "The test should pass on x86, other architectures are untested" { ! { x86_64-*-* i?86-*-* } } } */ so that others can add their targets if necessary. Thanks, Evgeny
Re: [PATCH, PR52252] Vectorization for load/store groups of size 3.
Evgeny Stupachenko evstu...@gmail.com writes: The test covers generic changes, so we can wait for others and, if there are no more objections, leave sparc-sun-solaris2 as the target to skip. If so, use sparc*-*-* instead. Or change to the following: /* { dg-skip-if "The test should pass on x86, other architectures are untested" { ! { x86_64-*-* i?86-*-* } } } */ so that others can add their targets if necessary. I wouldn't do it this way, because this will never happen. Rather, start with all targets and skip or xfail if necessary, adding an explanation why this is necessary. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
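Jakub's dg-additional-options suggestion would look roughly like this in a test file (the function body and scan pattern here are illustrative placeholders, not the actual pr52252-ld.c contents):

```c
/* { dg-do compile } */
/* { dg-options "-O2 -g -ftree-vectorize -fdump-tree-vect-details" } */
/* Add the ISA flag only where it exists, instead of skipping the test.  */
/* { dg-additional-options "-mssse3" { target { i?86-*-* x86_64-*-* } } } */

void
f (int *restrict a, const int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = b[i] + 1;
}

/* The scan may need a target guard too, since without -mssse3 the loop
   might not be vectorized on other targets.  */
/* { dg-final { scan-tree-dump "vectorized" "vect" { target { i?86-*-* x86_64-*-* } } } } */
```

This keeps the test compiling everywhere while only asserting vectorization where the option is known to enable it.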
Re: [PATCH] [PING^2] Fix for PR libstdc++/60758
On 07.05.2014 13:28, Ramana Radhakrishnan wrote: On 05/07/14 09:19, Yury Gribov wrote: Original Message Subject: [PING] [PATCH] Fix for PR libstdc++/60758 Date: Thu, 17 Apr 2014 17:48:12 +0400 From: Alexey Merzlyakov alexey.merzlya...@samsung.com To: Ramana Radhakrishnan ramra...@arm.com CC: gcc-patches@gcc.gnu.org gcc-patches@gcc.gnu.org, Viacheslav Garbuzov v.garbu...@samsung.com, Yury Gribov y.gri...@samsung.com Hi, This fixes infinite backtrace in __cxa_end_cleanup(). Regtest was finished with no regressions on arm-linux-gnueabi(sf). The patch posted at: http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html This is OK to apply if no regressions. Thanks, Ramana Thanks in advance. Best regards, Merzlyakov Alexey I have re-tested it again on arm-linux-gnueabi(sf) - no regressions. The change was committed to mainline as revision 210515. Best regards, Merzlyakov Alexey
Re: avx runtime check
On May 15, 2014, at 11:52 PM, Richard Biener richard.guent...@gmail.com wrote: On May 16, 2014 4:47:11 AM CEST, Mike Stump mikest...@comcast.net wrote: This reorders the avx checks and gates on a target triplet check before compiling any code. Can you explain why? Sure, because check_avx_hw_available runs [istarget x86_64-*-*] || [istarget i?86-*-*] before doing anything, like compiling: typedef double __m512d __attribute__ ((__vector_size__ (64))); __m512d _mm512_add (__m512d a) { return __builtin_ia32_addpd512_mask (a, a, a, 1, 4); } with -mavx512f, which my target doesn't have, but even running a compilation of that seems wrong. The other possibility would be to add in a: # If this is not the right target then we can skip the test. if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } { expr 0 } else { into the test for check_effective_target_avx512f. proc check_effective_target_avx512f { } { return [check_no_compiler_messages avx512f object { typedef double __m512d __attribute__ ((__vector_size__ (64))); __m512d _mm512_add (__m512d a) { return __builtin_ia32_addpd512_mask (a, a, a, 1, 4); } } "-O2 -mavx512f" ] } proc check_avx_hw_available { } { return [check_cached_effective_target avx_hw_available { # If this is not the right target then we can skip the test. if { !([istarget x86_64-*-*] || [istarget i?86-*-*]) } { expr 0 } else { check_runtime_nocache avx_hw_available { #include <cpuid.h> int main () { unsigned int eax, ebx, ecx, edx; if (__get_cpuid (1, &eax, &ebx, &ecx, &edx)) return ((ecx & (bit_AVX | bit_OSXSAVE)) != (bit_AVX | bit_OSXSAVE)); return 1; } } } }] } diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 40b5414..103a28a 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -1353,8 +1353,8 @@ proc check_effective_target_sse2_runtime { } { # Return 1 if the target supports running AVX executables, 0 otherwise.
proc check_effective_target_avx_runtime { } { -if { [check_effective_target_avx] - && [check_avx_hw_available] +if { [check_avx_hw_available] + && [check_effective_target_avx] && [check_avx_os_support_available] } { return 1 }
[PATCH][ARM] Vectorize bswap[16,32,64] operations
Hi all, This is the aarch32 version of https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00850.html that allows us to (auto-)vectorise the __builtin_bswap[16,32,64] functions using the vrev instructions. For that we create some new NEON builtins and get arm_builtin_vectorized_function to map to them when asked to vectorise the corresponding builtin. The tests for this were added with the aarch64 patch mentioned above but were disabled for arm. This patch enables them now that we support the operations (and of course they now pass on arm). Tested arm-none-eabi and bootstrapped on arm-none-linux-gnueabihf. Ok for trunk? Thanks, Kyrill 2014-05-16 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/neon.md (neon_bswap<mode>): New pattern. * config/arm/arm.c (neon_itype): Add NEON_BSWAP. (arm_init_neon_builtins): Handle NEON_BSWAP. Define required type nodes. (arm_expand_neon_builtin): Handle NEON_BSWAP. (arm_builtin_vectorized_function): Handle BUILTIN_BSWAP builtins. * config/arm/arm_neon_builtins.def (bswap): Define builtins. * config/arm/iterators.md (VDQHSD): New mode iterator.
2014-05-16 Kyrylo Tkachov kyrylo.tkac...@arm.com * lib/target-supports.exp (check_effective_target_vect_bswap): Specify arm*-*-* support.diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 1e44080..6d430a2 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -23064,6 +23064,7 @@ typedef enum { NEON_BINOP, NEON_TERNOP, NEON_UNOP, + NEON_BSWAP, NEON_GETLANE, NEON_SETLANE, NEON_CREATE, @@ -23531,14 +23532,19 @@ arm_init_neon_builtins (void) tree V8QI_type_node; tree V4HI_type_node; + tree V4UHI_type_node; tree V4HF_type_node; tree V2SI_type_node; + tree V2USI_type_node; tree V2SF_type_node; tree V16QI_type_node; tree V8HI_type_node; + tree V8UHI_type_node; tree V4SI_type_node; + tree V4USI_type_node; tree V4SF_type_node; tree V2DI_type_node; + tree V2UDI_type_node; tree intUQI_type_node; tree intUHI_type_node; @@ -23634,16 +23640,26 @@ arm_init_neon_builtins (void) const_intDI_pointer_node = build_pointer_type (const_intDI_node); const_float_pointer_node = build_pointer_type (const_float_node); + /* Unsigned integer types for various mode sizes. */ + intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode)); + intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode)); + intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode)); + intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode)); + neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode)); /* Now create vector types based on our NEON element types. */ /* 64-bit vectors. 
*/ V8QI_type_node = build_vector_type_for_mode (neon_intQI_type_node, V8QImode); V4HI_type_node = build_vector_type_for_mode (neon_intHI_type_node, V4HImode); + V4UHI_type_node = +build_vector_type_for_mode (intUHI_type_node, V4HImode); V4HF_type_node = build_vector_type_for_mode (neon_floatHF_type_node, V4HFmode); V2SI_type_node = build_vector_type_for_mode (neon_intSI_type_node, V2SImode); + V2USI_type_node = +build_vector_type_for_mode (intUSI_type_node, V2SImode); V2SF_type_node = build_vector_type_for_mode (neon_float_type_node, V2SFmode); /* 128-bit vectors. */ @@ -23651,19 +23667,18 @@ arm_init_neon_builtins (void) build_vector_type_for_mode (neon_intQI_type_node, V16QImode); V8HI_type_node = build_vector_type_for_mode (neon_intHI_type_node, V8HImode); + V8UHI_type_node = +build_vector_type_for_mode (intUHI_type_node, V8HImode); V4SI_type_node = build_vector_type_for_mode (neon_intSI_type_node, V4SImode); + V4USI_type_node = +build_vector_type_for_mode (intUSI_type_node, V4SImode); V4SF_type_node = build_vector_type_for_mode (neon_float_type_node, V4SFmode); V2DI_type_node = build_vector_type_for_mode (neon_intDI_type_node, V2DImode); - - /* Unsigned integer types for various mode sizes. 
*/ - intUQI_type_node = make_unsigned_type (GET_MODE_PRECISION (QImode)); - intUHI_type_node = make_unsigned_type (GET_MODE_PRECISION (HImode)); - intUSI_type_node = make_unsigned_type (GET_MODE_PRECISION (SImode)); - intUDI_type_node = make_unsigned_type (GET_MODE_PRECISION (DImode)); - neon_intUTI_type_node = make_unsigned_type (GET_MODE_PRECISION (TImode)); + V2UDI_type_node = +build_vector_type_for_mode (intUDI_type_node, V2DImode); (*lang_hooks.types.register_builtin_type) (intUQI_type_node, @@ -23741,8 +23756,6 @@ arm_init_neon_builtins (void) if (TARGET_CRYPTO TARGET_HARD_FLOAT) { -tree V4USI_type_node = - build_vector_type_for_mode (intUSI_type_node, V4SImode); tree V16UQI_type_node = build_vector_type_for_mode (intUQI_type_node, V16QImode); @@ -24106,6 +24119,31 @@ arm_init_neon_builtins (void) ftype = build_function_type_list (return_type,
Re: [Committed] [PATCH,*/2] shrink wrap a function with a single loop: split live_edge
One of the two commits breaks several fortran tests in 32 bit mode, ... If it is not the default, you need to compile the failing tests with -mtune=core2 or -mtune=corei7. Dominique
[patch] fix recent libstdc++ test failures
Tested x86_64-linux, -m32 and -m64, committed to trunk. commit c5265e76b75cbfa0744c354f6d5aaf98c327ff8e Author: Jonathan Wakely jwak...@redhat.com Date: Fri May 16 13:26:18 2014 +0100 * testsuite/20_util/make_unsigned/requirements/typedefs-1.cc: Fix test for 32-bit target. * testsuite/20_util/make_unsigned/requirements/typedefs-2.cc: Likewise. * testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc: Fix test. diff --git a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc index a893ede..d9f13aa 100644 --- a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc +++ b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-1.cc @@ -49,7 +49,7 @@ void test01() #ifdef _GLIBCXX_USE_WCHAR_T typedef make_unsigned<volatile wchar_t>::type test23_type; - static_assert(is_same<test23_type, volatile unsigned wchar_t>::value, ""); + static_assert(is_unsigned<test23_type>::value, ""); #endif // Chapter 48, chapter 20. Smallest rank such that new unsigned type diff --git a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc index 637b0c7..807e2bf 100644 --- a/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc +++ b/libstdc++-v3/testsuite/20_util/make_unsigned/requirements/typedefs-2.cc @@ -30,6 +30,7 @@ void test01() { using std::make_unsigned; using std::is_same; + using std::is_unsigned; // Positive tests.
typedef make_unsigned<const unsigned int>::type test2_type; @@ -50,7 +51,7 @@ void test01() #ifdef _GLIBCXX_USE_WCHAR_T typedef make_unsigned<volatile wchar_t>::type test23_type; - static_assert(is_same<test23_type, volatile unsigned wchar_t>::value, ""); + static_assert(is_unsigned<test23_type>::value, ""); #endif typedef make_unsigned<test_enum>::type test24_type; diff --git a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc index 26fc3d0..dd52a70 100644 --- a/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc +++ b/libstdc++-v3/testsuite/tr1/2_general_utilities/shared_ptr/modifiers/reset_neg.cc @@ -33,8 +33,7 @@ test01() bool test __attribute__((unused)) = true; const std::tr1::shared_ptr<A> p1(new A); - p1.reset(); // { dg-error "discards qualifiers" } - // { dg-error "no match" { target *-*-* } 36 } + p1.reset(); // { dg-error "no match" } return 0; }
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On Fri, May 16, 2014 at 12:51 PM, Richard Biener wrote: Btw, the bswap pass enhancements that are currently in review may also be an opportunity to catch these. They can merge adjacent loads that are used composed (but not yet composed by storing into adjacent memory). The basic-block vectorizer should also handle this (if the composition happens to be by storing into adjacent memory) - of course it needs vector modes available and it has to be enabled. That won't work for cleaning up spill code load/reloads, which is one of the motivations for the new pass. Ciao! Steven
Re: [GCC RFC]A new and simple pass merging paired load store instructions
Btw, the bswap pass enhancements that are currently in review may also be an opportunity to catch these. They can merge adjacent loads that are used composed (but not yet composed by storing into adjacent memory). The basic-block vectorizer should also handle this (if the composition happens to be by storing into adjacent memory) - of course it needs vector modes available and it has to be enabled. Should we really do it there ? If we start merging multiple loads and stores into the vector register set, on some architectures or microarchitectures the cost of moving to and from the vector register set to the general purpose register set might be too expensive for some operations and then we get into all sorts of issues. I think there is merit in an RTL pass. Ramana Richard. Thanks, bin So, any comments about this? Thanks, bin 2014-05-15 Bin Cheng bin.ch...@arm.com * common.opt (flag_merge_paired_loadstore): New option. * merge-paired-loadstore.c: New file. * Makefile.in: Support new file. * config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro. (load_latency_expanded_p, arm_merge_paired_loadstore): New function. * params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param. * doc/invoke.texi (-fmerge-paired-loadstore): New. (max-merge-paired-loadstore-distance): New. * doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New. * doc/tm.texi: Regenerated. * target.def (merge_paired_loadstore): New. * tree-pass.h (make_pass_merge_paired_loadstore): New decl. * passes.def (pass_merge_paired_loadstore): New pass. * timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var. gcc/testsuite/ChangeLog 2014-05-15 Bin Cheng bin.ch...@arm.com * gcc.target/arm/merge-paired-loadstore.c: New test. merge-paired-loadstore-20140515.txt -- Best Regards.
[PATCH 0/5] let gdb reuse gcc'c C compiler
Hi! This patch series is half of a project to let gdb reuse gcc (which half depends on which list you are seeing this on), so that users can compile small snippets of code and evaluate them in the current context of the inferior. This first series implements this idea for C. A user can compile a code snippet and it will be inserted into the inferior and evaluated. Declarations needed by the snippet are supplied by gdb, and there is a bit of magic so that the snippets can refer to local variables in the current frame. The new command allows for arbitrary code to be inserted -- not just expressions. For example: (gdb) compile code int i; for (i = 0; i < 3; ++i) printf ("#%d\n", i) #0 #1 #2 This series supplies a gcc plugin to do most of the work, so that any gcc crashes -- seen during development due to translation bugs -- do not also crash gdb. The interface between gdb and gcc is defined by a few files added to include/. There is a new shared library which gdb loads in order to communicate with the gcc plugin. This library communicates with the gcc plugin using a simple, ad-hoc RPC mechanism. This shared library exports a single public function which is used to instantiate any needed objects. This makes it simple to version the API and avoid undue synchronization between gcc and gdb. We think the plugin is best suited to be put into the gcc repository because it is coupled more tightly to gcc than to gdb. To try it out, just build gcc and gdb with the patches applied. Then set your PATH and LD_LIBRARY_PATH to point to the right subdirectories of the new gcc install directory. In later series we plan to extend this functionality; either on the gcc side, say by writing a similar plugin for C++; or on the gdb side, say by making it possible to compile breakpoint conditions. However, we haven't yet decided exactly which future projects we will tackle or in what order.
[PATCH 1/5] export finish_bitfield_layout from stor-layout
The gdb plugin handles some aspects of type layout on its own. It does this because it knows the layout of types, but not the path by which the layout was determined -- so normal gcc things like TYPE_PACKED cannot be used. This patch exposes one bit of stor-layout so it can be used by the plugin. 2014-05-16 Phil Muldoon pmuld...@redhat.com Tom Tromey tro...@redhat.com * stor-layout.c (finish_bitfield_layout): Now public. Change argument type to 'tree'. (finish_record_layout): Update. * stor-layout.h (finish_bitfield_layout): Declare. --- gcc/ChangeLog | 8 ++++++++ gcc/stor-layout.c | 12 ++++++------ gcc/stor-layout.h | 1 + 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/gcc/stor-layout.c b/gcc/stor-layout.c index 8fa4dc8..ad057d9 100644 --- a/gcc/stor-layout.c +++ b/gcc/stor-layout.c @@ -1931,10 +1931,10 @@ finish_bitfield_representative (tree repr, tree field) } /* Compute and set FIELD_DECLs for the underlying objects we should - use for bitfield access for the structure laid out with RLI. */ + use for bitfield access for the structure T. */ -static void -finish_bitfield_layout (record_layout_info rli) +void +finish_bitfield_layout (tree t) { tree field, prev; tree repr = NULL_TREE; @@ -1943,10 +1943,10 @@ finish_bitfield_layout (record_layout_info rli) we could use the underlying type as hint for the representative if the bitfield would fit and the representative would not exceed the union in size. */ - if (TREE_CODE (rli->t) != RECORD_TYPE) + if (TREE_CODE (t) != RECORD_TYPE) return; - for (prev = NULL_TREE, field = TYPE_FIELDS (rli->t); + for (prev = NULL_TREE, field = TYPE_FIELDS (t); field; field = DECL_CHAIN (field)) { if (TREE_CODE (field) != FIELD_DECL) @@ -2033,7 +2033,7 @@ finish_record_layout (record_layout_info rli, int free_p) finalize_type_size (rli->t); /* Compute bitfield representatives. */ - finish_bitfield_layout (rli); + finish_bitfield_layout (rli->t); /* Propagate TYPE_PACKED to variants.
With C++ templates, handle_packed_attribute is too early to do this. */ diff --git a/gcc/stor-layout.h b/gcc/stor-layout.h index 0ff98f8..cd58672 100644 --- a/gcc/stor-layout.h +++ b/gcc/stor-layout.h @@ -35,6 +35,7 @@ extern tree rli_size_so_far (record_layout_info); extern void normalize_rli (record_layout_info); extern void place_field (record_layout_info, tree); extern void compute_record_mode (tree); +extern void finish_bitfield_layout (tree); extern void finish_record_layout (record_layout_info, int); extern unsigned int element_precision (const_tree); extern void finalize_size_functions (void); -- 1.9.0
[PATCH 3/5] introduce the binding oracle
gdb wants to supply any declarations that may be referred to by the user's code. Hooking into symbol lookup was an efficient way to accomplish this. This patch introduces a binding oracle that is consulted whenever a symbol binding is looked up for the first time. The oracle is just a global function pointer. If it is NULL, no special work is done. It is called with the identifier to supply and with an enum argument indicating the kind of binding being requested. The oracle can then call back into the C front end (via the new functions c_pushtag and c_bind) to supply a binding; or it can silently do nothing if the request could not be fulfilled. The code caches whether the oracle has been called, to avoid repeated useless queries. There is a little hack in c_print_identifier to avoid calling the binding oracle here. This makes debugging gcc in the presence of the plugin remain relatively sane -- without this, calling debug_tree or the like can confusingly call into the plugin. 2014-05-16 Phil Muldoon pmuld...@redhat.com Tom Tromey tro...@redhat.com * c-tree.h (enum c_oracle_request): New. (c_binding_oracle_function): New typedef. (c_binding_oracle, c_pushtag, c_bind): Declare. * c-decl.c (c_binding_oracle): New global. (I_SYMBOL_CHECKED): New macro. (i_symbol_binding): New function. (I_SYMBOL_BINDING, I_SYMBOL_DECL): Redefine. (I_TAG_CHECKED): New macro. (i_tag_binding): New function. (I_TAG_BINDING, I_TAG_DECL): Redefine. (I_LABEL_CHECKED): New macro. (i_label_binding): New function. (I_LABEL_BINDING, I_LABEL_DECL): Redefine. (c_print_identifier): Save and restore c_binding_oracle. (c_pushtag, c_bind): New functions.
--- gcc/c/ChangeLog | 19 +++ gcc/c/c-decl.c | 161 ++-- gcc/c/c-tree.h | 24 + 3 files changed, 189 insertions(+), 15 deletions(-) diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index d52dcc9..b391add 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -215,21 +215,6 @@ struct GTY((chain_next (%h.prev))) c_binding { #define B_IN_FILE_SCOPE(b) ((b)-depth == 1 /*file_scope-depth*/) #define B_IN_EXTERNAL_SCOPE(b) ((b)-depth == 0 /*external_scope-depth*/) -#define I_SYMBOL_BINDING(node) \ - (((struct lang_identifier *) IDENTIFIER_NODE_CHECK(node))-symbol_binding) -#define I_SYMBOL_DECL(node) \ - (I_SYMBOL_BINDING(node) ? I_SYMBOL_BINDING(node)-decl : 0) - -#define I_TAG_BINDING(node) \ - (((struct lang_identifier *) IDENTIFIER_NODE_CHECK(node))-tag_binding) -#define I_TAG_DECL(node) \ - (I_TAG_BINDING(node) ? I_TAG_BINDING(node)-decl : 0) - -#define I_LABEL_BINDING(node) \ - (((struct lang_identifier *) IDENTIFIER_NODE_CHECK(node))-label_binding) -#define I_LABEL_DECL(node) \ - (I_LABEL_BINDING(node) ? I_LABEL_BINDING(node)-decl : 0) - /* Each C symbol points to three linked lists of c_binding structures. These describe the values of the identifier in the three different namespaces defined by the language. */ @@ -245,6 +230,96 @@ struct GTY(()) lang_identifier { extern char C_SIZEOF_STRUCT_LANG_IDENTIFIER_isnt_accurate [(sizeof(struct lang_identifier) == C_SIZEOF_STRUCT_LANG_IDENTIFIER) ? 1 : -1]; +/* The binding oracle; see c-tree.h. */ +void (*c_binding_oracle) (enum c_oracle_request, tree identifier); + +/* This flag is set on an identifier if we have previously asked the + binding oracle for this identifier's symbol binding. 
*/ +#define I_SYMBOL_CHECKED(node) \ + (TREE_LANG_FLAG_4 (IDENTIFIER_NODE_CHECK (node))) + +static inline struct c_binding ** +i_symbol_binding (tree node) +{ + struct lang_identifier *lid += (struct lang_identifier *) IDENTIFIER_NODE_CHECK (node); + + if (lid->symbol_binding == NULL + && c_binding_oracle != NULL + && !I_SYMBOL_CHECKED (node)) +{ + /* Set the checked flag first, to avoid infinite recursion +when the binding oracle calls back into gcc. */ + I_SYMBOL_CHECKED (node) = 1; + c_binding_oracle (C_ORACLE_SYMBOL, node); +} + + return lid->symbol_binding; +} + +#define I_SYMBOL_BINDING(node) (*i_symbol_binding (node)) + +#define I_SYMBOL_DECL(node) \ + (I_SYMBOL_BINDING(node) ? I_SYMBOL_BINDING(node)->decl : 0) + +/* This flag is set on an identifier if we have previously asked the + binding oracle for this identifier's tag binding. */ +#define I_TAG_CHECKED(node) \ + (TREE_LANG_FLAG_5 (IDENTIFIER_NODE_CHECK (node))) + +static inline struct c_binding ** +i_tag_binding (tree node) +{ + struct lang_identifier *lid += (struct lang_identifier *) IDENTIFIER_NODE_CHECK (node); + + if (lid->tag_binding == NULL + && c_binding_oracle != NULL + && !I_TAG_CHECKED (node)) +{ + /* Set the checked flag first, to avoid infinite recursion +when the binding oracle calls back into gcc. */ + I_TAG_CHECKED (node) = 1; + c_binding_oracle (C_ORACLE_TAG, node); +
[PATCH 4/5] add gcc/gdb interface files
The gcc plugin is split into two parts. One part is an ordinary gcc plugin. The other part is a shared library that is loaded by gdb. This patch adds some files that define the interface exported by this shared library to gdb. These files also define the internal API by which the gdb- and gcc-sides communicate. These files will be kept in sync between gcc and gdb like much of include/. The exported API has been intentionally kept very simple. In particular only a single function is exported from the gdb-side library; symbol visibility is used to hide everything else. This exported symbol is a function which is called to return a structure holding function pointers that gdb then uses. This structure is versioned so that changes can be made without necessarily requiring a simultaneous gdb upgrade. Note that the C compiler API is broken out separately. This lets us extend it to other GCC front ends as desired. We plan to investigate C++ in the future. 2014-05-16 Phil Muldoon pmuld...@redhat.com Jan Kratochvil jan.kratoch...@redhat.com Tom Tromey tro...@redhat.com * gcc-c-fe.def: New file. * gcc-c-interface.h: New file. * gcc-interface.h: New file. --- include/ChangeLog | 8 ++ include/gcc-c-fe.def | 195 include/gcc-c-interface.h | 220 ++ include/gcc-interface.h | 120 + 4 files changed, 543 insertions(+) create mode 100644 include/gcc-c-fe.def create mode 100644 include/gcc-c-interface.h create mode 100644 include/gcc-interface.h diff --git a/include/gcc-c-fe.def b/include/gcc-c-fe.def new file mode 100644 index 000..dc5a41a --- /dev/null +++ b/include/gcc-c-fe.def @@ -0,0 +1,195 @@ +/* Interface between GCC C FE and GDB -*- c -*- + + Copyright (C) 2014 Free Software Foundation, Inc. + + This file is part of GDB. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. 
+ + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see http://www.gnu.org/licenses/. */ + + + +/* Create a new decl in GCC. A decl is a declaration, basically a + kind of symbol. + + NAME is the name of the new symbol. SYM_KIND is the kind of + symbol being requested. SYM_TYPE is the new symbol's C type; + except for labels, where this is not meaningful and should be + zero. If SUBSTITUTION_NAME is not NULL, then a reference to this + decl in the source will later be substituted with a dereference + of a variable of the given name. Otherwise, for symbols having + an address (e.g., functions), ADDRESS is the address. FILENAME + and LINE_NUMBER refer to the symbol's source location. If this + is not known, FILENAME can be NULL and LINE_NUMBER can be 0. + This function returns the new decl. */ + +GCC_METHOD7 (gcc_decl, build_decl, +const char */* name */, +enum gcc_c_symbol_kind /* sym_kind */, +gcc_type /* sym_type */, +const char */* substitution_name */, +gcc_address /* address */, +const char */* filename */, +unsigned int /* line_number */) + +/* Insert a GCC decl into the symbol table. DECL is the decl to + insert. IS_GLOBAL is true if this is an outermost binding, and + false if it is a possibly-shadowing binding. */ + +GCC_METHOD2 (int /* bool */, bind, gcc_decl /* decl */, +int /* bool */ /* is_global */) + +/* Insert a tagged type into the symbol table. NAME is the tag name + of the type and TAGGED_TYPE is the type itself. TAGGED_TYPE must + be either a struct, union, or enum type, as these are the only + types that have tags. FILENAME and LINE_NUMBER refer to the type's + source location. If this is not known, FILENAME can be NULL and + LINE_NUMBER can be 0. 
*/ + +GCC_METHOD4 (int /* bool */, tagbind, +const char */* name */, +gcc_type /* tagged_type */, +const char * /* filename */, +unsigned int /* line_number */) + +/* Return the type of a pointer to a given base type. */ + +GCC_METHOD1 (gcc_type, build_pointer_type, +gcc_type /* base_type */) + +/* Create a new 'struct' type. Initially it has no fields. */ + +GCC_METHOD0 (gcc_type, build_record_type) + +/* Create a new 'union' type. Initially it has no fields. */ + +GCC_METHOD0 (gcc_type, build_union_type) + +/* Add a field to a struct or union type. FIELD_NAME is
Re: add dbgcnt and opt-info support for devirtualization
On Fri, May 16, 2014 at 4:00 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 1:54 AM, Xinliang David Li davi...@google.com wrote: Hi, debugging runtime bugs due to devirtualization can be hard for very large C++ programs with complicated class hierarchy. This patch adds the support to report this high level transformation via -fopt-info (not hidden inside dump file) and the ability the do binary search with cutoff. Ok for trunk after build and test? + else if (dump_enabled_p ()) + { + location_t loc = gimple_location (ie-call_stmt); + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc, + Discovered direct call to non-function in %s, diagnostics start with lower-case. Why not merge this with the dump_file case? The point of all the infrastructure was to _not_ need to distinguish the cases ... Right -- however in this case I don't want the node 'order' to leak into the opt report. Should we drop it? (similar for the other cases, and IIRC you miss one case in tree-ssa-pre.c calling ipa_intraprocedural_devirtualization?) Good catch. Will add it. thanks, David Thanks, Richard. thanks, David
[PATCH 2/5] c_diagnostic_ignored_function hack
In the typical case, when compiling a snippet of user code, gdb wraps the user's text in a dummy function. It's somewhat odd for users if an error in their code is mentioned as coming from this dummy function. This patch makes it possible to suppress the function-name display in a straightforward way: it adds a new global which the plugin can set to declare the name of the dummy function. This patch seems like a bit of a hack, but there didn't seem to be a notably cleaner approach. 2014-05-16 Phil Muldoon pmuld...@redhat.com Tom Tromey tro...@redhat.com * c-lang.c (c_diagnostic_ignored_function): New global. (c_print_error_function): New function. (LANG_HOOKS_PRINT_ERROR_FUNCTION): Define. * c-lang.h (c_diagnostic_ignored_function): Declare. --- gcc/c/ChangeLog | 8 ++++++++ gcc/c/c-lang.c | 22 ++++++++++++++++++++++ gcc/c/c-lang.h | 4 ++++ 3 files changed, 34 insertions(+) diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c index 97c0443..e563813 100644 --- a/gcc/c/c-lang.c +++ b/gcc/c/c-lang.c @@ -35,6 +35,26 @@ along with GCC; see the file COPYING3. If not see enum c_language_kind c_language = clk_c; +// If non-zero, this names a function which should not be reported in +// a diagnostic. This is used by the gdb plugin to avoid showing the +// generated function name to the user. +const char *c_diagnostic_ignored_function; + +// An implementation of the print_error_function langhook that +// respects C_DIAGNOSTIC_IGNORED_FUNCTION. +static void +c_print_error_function (diagnostic_context *context, const char *file, + diagnostic_info *diagnostic) +{ + if (c_diagnostic_ignored_function != NULL + && current_function_decl != NULL_TREE + && DECL_NAME (current_function_decl) != NULL_TREE + && strcmp (IDENTIFIER_POINTER (DECL_NAME (current_function_decl)), +c_diagnostic_ignored_function) == 0) +return; + lhd_print_error_function (context, file, diagnostic); +} + /* Lang hooks common to C and ObjC are declared in c-objc-common.h; consequently, there should be very few hooks below.
*/ @@ -44,6 +64,8 @@ enum c_language_kind c_language = clk_c; #define LANG_HOOKS_INIT c_objc_common_init #undef LANG_HOOKS_INIT_TS #define LANG_HOOKS_INIT_TS c_common_init_ts +#undef LANG_HOOKS_PRINT_ERROR_FUNCTION +#define LANG_HOOKS_PRINT_ERROR_FUNCTION c_print_error_function /* Each front end provides its own lang hook initializer. */ struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER; diff --git a/gcc/c/c-lang.h b/gcc/c/c-lang.h index 7fcf333..022206f 100644 --- a/gcc/c/c-lang.h +++ b/gcc/c/c-lang.h @@ -59,4 +59,8 @@ struct GTY(()) language_function { attribute lists. */ extern GTY(()) int current_omp_declare_target_attribute; +/* If non-zero, the name of a function whose name should not be + reported in a diagnostic. */ +extern const char *c_diagnostic_ignored_function; + #endif /* ! GCC_C_LANG_H */ -- 1.9.0
[C++ Patch] PR 51640
Hi, the main issue in the bug report is luckily already fixed in mainline and 4.9 (had to do with a commit of mine ;( but I think we can do a little better in the diagnostic: avoid the final redundant does not name a type error after the error message about ambiguity. Tested x86_64-linux. Thanks, Paolo. // /cp 2014-05-16 Paolo Carlini paolo.carl...@oracle.com PR c++/51640 * parser.c (cp_parser_diagnose_invalid_type_name): Early return when cp_parser_lookup_name sets ambiguous_decls. /testsuite 2014-05-16 Paolo Carlini paolo.carl...@oracle.com PR c++/51640 * g++.dg/parse/error54.C: New. Index: cp/parser.c === --- cp/parser.c (revision 210512) +++ cp/parser.c (working copy) @@ -2880,13 +2880,21 @@ cp_parser_diagnose_invalid_type_name (cp_parser *p tree scope, tree id, location_t location) { - tree decl, old_scope; + tree decl, old_scope, ambiguous_decls; cp_parser_commit_to_tentative_parse (parser); /* Try to lookup the identifier. */ old_scope = parser-scope; parser-scope = scope; - decl = cp_parser_lookup_name_simple (parser, id, location); + decl = cp_parser_lookup_name (parser, id, none_type, + /*is_template=*/false, + /*is_namespace=*/false, + /*check_dependency=*/true, + ambiguous_decls, location); parser-scope = old_scope; + if (ambiguous_decls) +/* If the lookup was ambiguous, an error will already have + been issued. */ +return; /* If the lookup found a template-name, it means that the user forgot to specify an argument list. Emit a useful error message. */ if (TREE_CODE (decl) == TEMPLATE_DECL) Index: testsuite/g++.dg/parse/error54.C === --- testsuite/g++.dg/parse/error54.C(revision 0) +++ testsuite/g++.dg/parse/error54.C(working copy) @@ -0,0 +1,19 @@ +// PR c++/51640 + +class ex {}; + +namespace t +{ + class ex2 : public ex {}; +} + +class ex2 : public ex {}; + +void bar() +{ + using namespace t; + + try { + } catch (ex2) { // { dg-error reference to 'ex2' is ambiguous } + } +}
Re: [PATCH] Implement -fsanitize=float-cast-overflow
On Fri, 16 May 2014, Jakub Jelinek wrote: has variable. Guess the combination of IBM long double and __int128_t (for long long it is still ok) is rare enough that we don't need to care about it (alternative would be e.g. to compare against addition of There are existing problems with that combination anyway. fp-int-convert-timode.c has tests disabled for it because the compile-time and runtime conversions don't agree (whether for this issue or some other, I don't know). -- Joseph S. Myers jos...@codesourcery.com
Re: add dbgcnt and opt-info support for devirtualization
Hi, debugging runtime bugs due to devirtualization can be hard for very large C++ programs with complicated class hierarchy. This patch adds the support to report this high level transformation via -fopt-info (not hidden inside dump file) and the ability the do binary search with cutoff. Ok for trunk after build and test? Seems resonable to me. thanks, David Index: ChangeLog === --- ChangeLog (revision 210479) +++ ChangeLog (working copy) @@ -1,3 +1,18 @@ +2014-05-15 Xinliang David Li davi...@google.com + + * cgraphunit.c (walk_polymorphic_call_targets): Add + dbgcnt and fopt-info support. + 2014-05-15 Xinliang David Li davi...@google.com + + * cgraphunit.c (walk_polymorphic_call_targets): Add + dbgcnt and fopt-info support. + * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto. + * ipa-devirt.c (ipa_devirt): Ditto. + * ipa.c (walk_polymorphic_call_targets): Ditto. + * gimple-fold.c (fold_gimple_assign): Ditto. + (gimple_fold_call): Ditto. + * dbgcnt.def: New counter. + 2014-05-15 Martin Jambor mjam...@suse.cz PR ipa/61085 Index: ipa-prop.c === --- ipa-prop.c(revision 210479) +++ ipa-prop.c(working copy) @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3. #include ipa-utils.h #include stringpool.h #include tree-ssanames.h +#include dbgcnt.h /* Intermediate information about a parameter that is only useful during the run of ipa_analyze_node and is not kept afterwards. */ @@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c fprintf (dump_file, ipa-prop: Discovered direct call to non-function in %s/%i, making it unreachable.\n, ie-caller-name (), ie-caller-order); + else if (dump_enabled_p ()) + { + location_t loc = gimple_location (ie-call_stmt); + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc, +Discovered direct call to non-function in %s, +making it unreachable\n, ie-caller-name ()); Perhaps turning it to __builtin_unreachable call and similarly in the other cases we introduce __builtin_unreachable? 
I think that could be easier for users to work out. What kind of problems in devirtualization are you seeing? Honza
Re: [PATCH 2/5] c_diagnostic_ignored_function hack
On Fri, 16 May 2014, Tom Tromey wrote: In the typical case, when compiling a snippet of user code, gdb wraps the user's text in a dummy function. It's somewhat odd for users if an error in their code is mentioned as coming from this dummy function. This patch makes it possible to suppress the function-name display in a straightforward way: it adds a new global which the plugin can set to declare the name of the dummy function. This patch seems like a bit of a hack, but there didn't seem to be a notably cleaner approach. I'd say this global actually belongs somewhere in the diagnostic_context (i.e., instead of the diagnostic_context_auxiliary_data (DC) actually being a tree as it is at present, it should point to a structure with whatever extra information clients wish to use to control aspects of diagnostic reporting). -- Joseph S. Myers jos...@codesourcery.com
Re: add dbgcnt and opt-info support for devirtualization
On Fri, May 16, 2014 at 9:03 AM, Jan Hubicka hubi...@ucw.cz wrote: Hi, debugging runtime bugs due to devirtualization can be hard for very large C++ programs with complicated class hierarchy. This patch adds the support to report this high level transformation via -fopt-info (not hidden inside dump file) and the ability the do binary search with cutoff. Ok for trunk after build and test? Seems resonable to me. thanks, David Index: ChangeLog === --- ChangeLog (revision 210479) +++ ChangeLog (working copy) @@ -1,3 +1,18 @@ +2014-05-15 Xinliang David Li davi...@google.com + + * cgraphunit.c (walk_polymorphic_call_targets): Add + dbgcnt and fopt-info support. + 2014-05-15 Xinliang David Li davi...@google.com + + * cgraphunit.c (walk_polymorphic_call_targets): Add + dbgcnt and fopt-info support. + * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto. + * ipa-devirt.c (ipa_devirt): Ditto. + * ipa.c (walk_polymorphic_call_targets): Ditto. + * gimple-fold.c (fold_gimple_assign): Ditto. + (gimple_fold_call): Ditto. + * dbgcnt.def: New counter. + 2014-05-15 Martin Jambor mjam...@suse.cz PR ipa/61085 Index: ipa-prop.c === --- ipa-prop.c(revision 210479) +++ ipa-prop.c(working copy) @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3. #include ipa-utils.h #include stringpool.h #include tree-ssanames.h +#include dbgcnt.h /* Intermediate information about a parameter that is only useful during the run of ipa_analyze_node and is not kept afterwards. 
*/ @@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c fprintf (dump_file, ipa-prop: Discovered direct call to non-function in %s/%i, making it unreachable.\n, ie-caller-name (), ie-caller-order); + else if (dump_enabled_p ()) + { + location_t loc = gimple_location (ie-call_stmt); + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc, +Discovered direct call to non-function in %s, +making it unreachable\n, ie-caller-name ()); Perhaps turning it to __builtin_unreachable call and similarly in the other cases we introduce __builtin_unreachable? I think that could be easier for user to work out. Ok. What king of problems in devirtualizatoin you are seeing? I have been chasing a runtime failure of a very large test built with gcc-4_9. The bad code either calls a pure function or turn a virtual call into __builtin_unreachable (incomplete target set). The indirect info shows the otr type to be !maybe_derived_type, and the outer-type gets cleared during inline update. I isolated a small test case -- but the good news is that gcc-4_9 @head already fixed the problem. I will check in the test case to trunk later. thanks, David Honza
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On 05/16/14 04:07, Bin.Cheng wrote: On Fri, May 16, 2014 at 1:13 AM, Jeff Law l...@redhat.com wrote: On 05/15/14 10:51, Mike Stump wrote: On May 15, 2014, at 12:26 AM, bin.cheng bin.ch...@arm.com wrote: Here comes up with a new GCC pass looking through each basic block and merging paired load store even they are not adjacent to each other. So I have a target that has load and store multiple support that supports large a number of registers (2-n registers), and I added a sched0 pass that is a light copy of the regular scheduling pass that uses a different cost function which arranges all loads first, then all stores then everything else. Within a group of loads or stores the secondary key is the base register, the next key is the offset. The net result, all loads off the same register are sorted in increasing order. Glad to see someone else stumble on (ab)using the scheduler to do this. Emm, If it's (ab)using, should we still do it then? I think it'd still be fine. There's even been a comment about doing this kind of thing in the scheduler that's been around since the early 90s... The scheduler is a bit interesting in that it has a wealth of dependency information and the ability to reorganize the insn stream in relatively arbitrary ways. That seems to make it a natural place to think about transformations of this nature. We just haven't had a good infrastructure for doing that. In theory we're a lot closer now to being able to plug in different costing/sorting models and let the scheduler do its thing. Those models might rewrite for register pressure, or encourage certain independent insns to issue back-to-back to encourage combining, or to build candidate insns for delay slot scheduling, etc. As Mike stated, merging of consecutive memory accesses is all about the base register and the offset. I am thinking another method collecting all memory accesses with same base register then doing the merge work. 
In this way, we should be able to merge more than 2 instructions, also it would be possible to remove redundant load instructions in one pass. My question is how many these redundant loads could be? Is there any rtl pass responsible for this now? I suspect it's a lot less important now than it used to be. But there's probably some cases where it'd be useful. Combining sub-word accesses into full-word accesses come immediately to mind. I'm not aware of any pass which does these kind of changes in a general form. Some passes (caller-save) do a fair amount of work to track when they can generate multi-object loads/stores (and it was a huge win back on the old sparc processors). jeff
Re: [PATCH (for next stage 1)] Add return type to gimple function dumps
On Fri, 2014-05-16 at 14:59 +0200, Richard Biener wrote: On Tue, Apr 29, 2014 at 5:01 PM, David Malcolm dmalc...@redhat.com wrote: On Tue, 2014-04-29 at 11:16 +0200, Richard Biener wrote: On Tue, Apr 29, 2014 at 2:58 AM, David Malcolm dmalc...@redhat.com wrote: On Thu, 2014-04-24 at 15:46 -0600, Jeff Law wrote: On 03/10/14 13:22, David Malcolm wrote: Gimple function dumps contain the types of parameters, but not of the return type. The attached patch fixes this omission; here's an example of the before/after diff: $ diff -up /tmp/pr23401.c.004t.gimple.old /tmp/pr23401.c.004t.gimple.new --- /tmp/pr23401.c.004t.gimple.old 2014-03-10 13:40:08.972063541 -0400 +++ /tmp/pr23401.c.004t.gimple.new 2014-03-10 13:39:49.346515464 -0400 @@ -1,3 +1,4 @@ +int (int i) { int D.1731; Successfully bootstrapped and regrtested on x86_64 Linux (Fedora 20). A couple of test cases needed tweaking, since they were counting the number of occurrences of int in the gimple dump, which thus changed for functions returning int (like the one above). OK for next stage 1? Conceptually OK. As Richi notes, the work here is in fixing up the testsuite. I didn't see a reply to Richi's question, particularly WRT the Fortran testsuite. I'm attaching a revised version of the patch which adds the use of TDF_SLIM (though it didn't appear to be necessary in the test I did of a function returning a struct). Successfully bootstrapped regrtested on x86_64 Linux (Fedora 20), using: --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto I didn't see any new failures from this in the testsuite, in particular gfortran.sum. 
Here's a comparison of the before/after test results, generated using my jamais-vu tool [1], with comments added by me inline: Comparing 16 common .sum files -- gcc/testsuite/ada/acats/acats.sum : total: 2320 PASS: 2320 gcc/testsuite/g++/g++.sum : total: 90421 FAIL: 3 PASS: 86969 XFAIL: 445 UNSUPPORTED: 3004 gcc/testsuite/gcc/gcc.sum : total: 110458 FAIL: 45 PASS: 108292 XFAIL: 265 XPASS: 33 UNSUPPORTED: 1823 gcc/testsuite/gfortran/gfortran.sum : total: 45717 PASS: 45600 XFAIL: 52 UNSUPPORTED: 65 gcc/testsuite/gnat/gnat.sum : total: 1255 PASS: 1234 XFAIL: 18 UNSUPPORTED: 3 gcc/testsuite/go/go.sum : total: 7266 PASS: 7258 XFAIL: 1 UNTESTED: 6 UNSUPPORTED: 1 gcc/testsuite/obj-c++/obj-c++.sum : total: 1450 PASS: 1354 XFAIL: 10 UNSUPPORTED: 86 gcc/testsuite/objc/objc.sum : total: 2973 PASS: 2893 XFAIL: 6 UNSUPPORTED: 74 x86_64-unknown-linux-gnu/boehm-gc/testsuite/boehm-gc.sum : total: 13 PASS: 12 UNSUPPORTED: 1 x86_64-unknown-linux-gnu/libatomic/testsuite/libatomic.sum : total: 54 PASS: 54 x86_64-unknown-linux-gnu/libffi/testsuite/libffi.sum : total: 1856 PASS: 1801 UNSUPPORTED: 55 x86_64-unknown-linux-gnu/libgo/libgo.sum : total: 122 PASS: 122 x86_64-unknown-linux-gnu/libgomp/testsuite/libgomp.sum : total: 2420 PASS: 2420 x86_64-unknown-linux-gnu/libitm/testsuite/libitm.sum : total: 30 PASS: 26 XFAIL: 3 UNSUPPORTED: 1 x86_64-unknown-linux-gnu/libjava/testsuite/libjava.sum : total: 2586 PASS: 2582 XFAIL: 4 x86_64-unknown-linux-gnu/libstdc++-v3/testsuite/libstdc++.sum : total: 10265 PASS: 1 XFAIL: 41 UNSUPPORTED: 224 (...i.e. the totals were unchanged between unpatched/patched for all of the .sum files; and yes, Fortran was tested. Should there be a gcj.sum?) 
Tests that went away in gcc/testsuite/gcc/gcc.sum: 2 PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple int 5 PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple int 3 Tests appeared in gcc/testsuite/gcc/gcc.sum: 2 -- PASS: gcc.dg/tree-ssa/pr23401.c scan-tree-dump-times gimple int 6 PASS: gcc.dg/tree-ssa/pr27810.c scan-tree-dump-times gimple int 4 (...my comparison tool isn't smart enough yet to tie these went away/appeared results together; they reflect the fixups from the patch). Tests that went away in gcc/testsuite/go/go.sum: 2 -- PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) compilation, -O2 -g PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) execution, -O2 -g Tests appeared in gcc/testsuite/go/go.sum: 2 PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) compilation, -O2 -g PASS: go.test/test/dwarf/dwarf.dir/main.go (lots of refs to path of build) execution, -O2 -g (...I hand edited the above, this main.go test
Re: [patch,mips] avoid invalid register for JALR
On Wed, 14 May 2014, Sandra Loosemore wrote: When I was trying to benchmark another patch (which I'll be sending along shortly) with CSiBE for -mabi=64, I ran into an assembler error like this: /tmp/ccJv2faG.s: Assembler messages: /tmp/ccJv2faG.s:1605: Error: a destination register must be supplied `jalr $31' JALR patterns should have an explicit clobber of $31, which I thought was also supposed to stop $31 from being used as the call address. Hm. Yes, that ought to work, in theory Do you have a testcase? I can reproduce the error in a current mipsisa64-elfoabi build, with my patch to delete ADJUST_REG_ALLOC_ORDER applied. It triggers on this file from CSiBE: mipsisa64-elfoabi-gcc -c -mabi=64 -O2 -fno-common -w csibe/src/./ttt-0.10.1.preproc/src/connect4.i I wonder if there's something fishy going on here. I checked output produced with -dP and the offending instruction is emitted like this: #(call_insn 172 124 161 (parallel [ #(call (mem:SI (reg:DI 31 $31) [0 c4_setup S4 A32]) #(const_int 0 [0])) #(clobber (reg:SI 31 $31)) #]) c4_new.i:79 594 {call_internal} # (expr_list:REG_DEAD (reg:DI 31 $31) #(expr_list:REG_DEAD (reg:DI 7 $7) #(expr_list:REG_DEAD (reg:DI 6 $6) #(expr_list:REG_DEAD (reg:DI 5 $5) #(expr_list:REG_DEAD (reg:DI 4 $4) #(nil)) #(expr_list:DI (use (reg:DI 4 $4)) #(expr_list:SI (use (reg:DI 5 $5)) #(expr_list:SI (use (reg:DI 6 $6)) #(expr_list:SI (use (reg:DI 7 $7)) #(nil)) jalr$31 # 172 call_internal/1 [length = 4] so clearly the clobber is ignored, or perhaps rather considered a late clobber instead of an early clobber that's indeed required here. I have reduced your case a bit and attached the result to this e-mail. With this code I can reproduce the problem using the following command: $ mips-sde-elf-gcc -S -dP -mabi=64 -O2 -fno-common -w -o c4_new.s c4_new.i and current trunk with the patch you recently posted as archived at: https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01016.html applied. 
With the patch reverted the issue goes away ($17 is used for the jump), so clearly the register allocation order override made in mips_order_regs_for_local_alloc is currently covering an underlying bug. Maciejtypedef unsigned int size_t; extern void *malloc (size_t __size); typedef char bool; typedef struct _Board { struct _Board *(*copy) (const struct _Board *); void (*setup) (struct _Board *, int, int, int); void (*display) (struct _Board *); int (*eval) (struct _Board *); void (*score) (struct _Board *, int *, int *); bool (*full) (struct _Board *); int (*winner) (struct _Board *); bool (*valid_move) (struct _Board *, int); bool (*move) (struct _Board *, int, int); void (*unmove) (struct _Board *); int (*getmove) (struct _Board *, int, int *, int *); void (*help) (void); char (*symbol) (struct _Board *, int, int); void (*coords) (struct _Board *, int, char *); int rows, cols; int squares; int *moves; int nummoves; int X_player; int *board; int **points; int numpoints; int depth, depth2; int *center; } Board; Board *c4_copy (const Board *); void c4_setup (Board *, int, int, int); void c4_display (Board *); int c4_eval (Board *); inline bool c4_full (Board *); int c4_winner (Board *); bool c4_valid_move (Board *, int); bool c4_move (Board *, int, int); void c4_unmove (Board *); int c4_getmove (Board *, int, int *, int *); void c4_help (void); char c4_symbol (Board *, int, int); void c4_coords (Board *, int, char *); Board *c4_new (int players, int size, int depth) { Board *T = (Board *) malloc (sizeof (Board)); Board *B = (Board *)T; int **points; int *board; int i, j, point = 0; int MyPE = 0; int rows, cols; B-copy = c4_copy; B-setup = c4_setup; B-display = c4_display; B-eval = c4_eval; B-full = c4_full; B-winner = c4_winner; B-valid_move = c4_valid_move; B-move = c4_move; B-unmove = c4_unmove; B-getmove = c4_getmove; B-help = c4_help; B-symbol = c4_symbol; B-coords = c4_coords; B-score = ((void *)0) ; rows = B-rows = 6; cols = B-cols = 7; B-squares = 7; if (MyPE 
== 0) c4_setup ((Board *)T, players, depth, 0); B-board = (int *) malloc (rows*cols * sizeof(int)); B-moves = (int *) malloc (rows*cols * sizeof(int)); B-nummoves = 0; for (i = 0; i rows*cols; i++) B-board[i] = B-moves[i] = 0; B-numpoints = rows*(cols-3) + cols*(rows-3) + 2*(rows-3)*(cols-3); points = (int **) malloc (B-numpoints * 4 * sizeof(int *)); board = B-board; for (j = 0; j rows - 3; j++) { for (i = 0; i cols; i++, point++) { points[point*4+0] = board +
Re: [PATCH 0/5] let gdb reuse gcc'c C compiler
On May 16, 2014, at 8:26 AM, Tom Tromey tro...@redhat.com wrote: This patch series is half of a project to let gdb reuse gcc (which half depends on which list you are seeing this on), so that users can compile small snippets of code and evaluate them in the current context of the inferior. Nice… I looked through it. Nice and lightweight, and it seems like the API can hold up through time.
Re: [PATCH 2/5] c_diagnostic_ignored_function hack
This patch makes it possible to suppress the function-name display in a straightforward way: it adds a new global which the plugin can set to declare the name of the dummy function. This patch seems like a bit of a hack, but there didn't seem to be a notably cleaner approach. Can't you override the diagnostic_starter() in your plugin? This way you can even customize it to print gdb in the prefix (if you wish to provide a custom prefix). If that is possible, that seems way cleaner. If not, I wonder why not. Otherwise, why not override the lang_hook itself? This way the plugin can provide its own lang_hook so that your proposed c_print_error_function lives in the plugin itself. I'm probably missing something obvious here. Cheers, Manuel.
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On 05/16/14 04:07, Bin.Cheng wrote: Yes, I think this one does have a good reason. The target independent pass just makes sure that two consecutive memory access instructions are free of data-dependency with each other, then feeds it to back-end hook. It's back-end's responsibility to generate correct instruction. But given these two memory access insns, there's only a couple ways they're likely to combine into a single insn. We could just as easily have the target independent code construct a new insn then try to recognize it. If it's not recognized, then try the other way. Or is it the case that we're doing something beyond upsizing the mode? It's not about modifying an existing insn then recognize it, it's about creating new instruction sometimes. For example, we can generate a simple move insn in Arm mode, while have to generate a parallel instruction in Thumb mode. Target independent part has no idea how to generate an expected insn. Moreover, back-end may check some special conditions too. But can't you go through movXX to generate either the simple insn on the ARM or the PARALLEL on the thumb? Jeff
Re: [C++ Patch] PR 51640
OK. Jason
Re: add dbgcnt and opt-info support for devirtualization
I have been chasing a runtime failure of a very large test built with gcc-4_9. The bad code either calls a pure function or turn a virtual call into __builtin_unreachable (incomplete target set). The indirect info shows the otr type to be !maybe_derived_type, and the outer-type gets cleared during inline update. I isolated a small test case -- but the good news is that gcc-4_9 @head already fixed the problem. I will check in the test case to trunk later. Good, testcase would be welcome. I guess it was the fix for placement_new bug. It disables some valid devirtualizations (and I thus may revisit the fix for 4.10), so it would be good to know if your testcase differs from the original PR one. Honza thanks, David Honza
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On May 16, 2014, at 3:07 AM, Bin.Cheng amker.ch...@gmail.com wrote: I don't see how regrename will help resolve [base+offset] false dependencies. Can you explain? I'd expect effects from hardreg-copyprop commoning a base register. It's the register operand's false dependency, rather than the base's one. Consider the simple case below: mov r1, #const1 store r1, [base+offset1] mov r1, #const2 store r1, [base+offset2] It should be renamed into: mov r1, #const1 store r1, [base+offset1] mov r2, #const2 store r2, [base+offset2] Ah, but, what did this look like right before pass_web?
Re: we are starting the wide int merge
On Sat, 10 May 2014, Gerald Pfeifer wrote: Since (at least) 16:40 UTC that day my i386-unknown-freebsd10.0 builds fail as follows: Comparing stages 2 and 3 warning: gcc/cc1obj-checksum.o differs warning: gcc/cc1-checksum.o differs warning: gcc/cc1plus-checksum.o differs Bootstrap comparison failure! gcc/fold-const.o differs gcc/simplify-rtx.o differs gcc/tree-ssa-ccp.o differs (FreeBSD/i386 really builds for i486, but retains the original name; I'm traveling with limited access, but would not be surprised for this to also show up for i386-*-linux-gnu or i486-*-linux-gnu.) Is anybody able to reproduce this, for example on a GNU/Linux system? This tester of mine hasn't been able to bootstrap for nearly a week, and timing-wise it would be really a coincidence were this not due to wide-int. Gerald
Re: RFA: Fix calculation of size of builtin setjmp buffer
OK - here is your version of the patch, extended with a comment which I think is helpful for other people reading the code, and with the changes to builtins.c and md.texi removed, since the size of the buffer is not changing. Is this version OK to apply ? Yes, IMO that's fine, thanks. -- Eric Botcazou
Re: RFA: Fix calculation of size of builtin setjmp buffer
But, fails whenever the size of the mode of the save area is bigger than a certain amount… On my port, the size taken up by the save area is large enough to cause this to fail. :-( That's a bit unexpected, why do you need so big a save area exactly? The only architecture for which this doesn't work is the IA-64, which is a very special beast... In this case, the way out is to define DONT_USE_BUILTIN_SETJMP and JMP_BUF_SIZE to the needed size. -- Eric Botcazou
Re: [PATCH] Fix ARM NAN fraction bits
On Thu, 27 Feb 2014, Joey Ye wrote: Current ARM soft-float implementation is violating the RTABI (http://infocenter.arm.com/help/topic/com.arm.doc.ihi0043d/IHI0043D_rtabi.pdf) Section 4.1.1.1: When not otherwise specified by IEEE 754, the result on an invalid operation should be the quiet NaN bit pattern with only the most significant bit of the significand set, and all other significand bits zero. This patch fixes it by setting _FP_NANFRAC_* to zero. Ran make check test with -mfloat-abi=soft. No regression. OK to checkin? 2014-02-27 Joey Ye joey...@arm.com * sysdeps/arm/soft-fp/sfp-machine.h (_FP_NANFRAC_S, _FP_NANFRAC_D, _FP_NANFRAC_Q): Set to zero. diff --git a/sysdeps/arm/soft-fp/sfp-machine.h b/sysdeps/arm/soft-fp/sfp-machine.h index 52a08b5..32697fe 100644 --- a/sysdeps/arm/soft-fp/sfp-machine.h +++ b/sysdeps/arm/soft-fp/sfp-machine.h @@ -21,9 +21,9 @@ #define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_2_udiv(D,R,X,Y) #define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_4_udiv(Q,R,X,Y) -#define _FP_NANFRAC_S ((_FP_QNANBIT_S << 1) - 1) -#define _FP_NANFRAC_D ((_FP_QNANBIT_D << 1) - 1), -1 -#define _FP_NANFRAC_Q ((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1 +#define _FP_NANFRAC_S 0 +#define _FP_NANFRAC_D 0, 0 +#define _FP_NANFRAC_Q 0, 0, 0, 0 #define _FP_NANSIGN_S 0 #define _FP_NANSIGN_D 0 #define _FP_NANSIGN_Q 0 This did regrettably, when propagated to libgcc, regress gcc.dg/torture/builtin-math-7.c on soft-fp arm-eabi targets, currently ARMv6-M (`-march=armv6-m -mthumb') only. This is because these NANFRAC macros now have no bits set, and as a result, when used to construct a NaN in the semi-raw mode, they build an infinity instead. Consequently operations such as (Inf - Inf) now produce Inf rather than NaN. The change worked for the original test case because division is made in the canonical mode, where the quiet bit is set separately, from the FP class. Here's a fix making the code match the commit description quoted above, that is, set the most significant bit of the significand.
This is also what targets similar in this respect do. OK to apply? OK for libgcc (against libgcc/config/arm/sfp-machine.h), in particular for GCC 4.8 and 4.9? 2014-05-16 Maciej W. Rozycki ma...@codesourcery.com PR libgcc/60166 * sysdeps/arm/soft-fp/sfp-machine.h (_FP_NANFRAC_S, _FP_NANFRAC_D) (_FP_NANFRAC_Q): Set the quiet bit. Maciej glibc-soft-fp-arm-nanfrac.diff Index: glibc-fsf-trunk-quilt/sysdeps/arm/soft-fp/sfp-machine.h === --- glibc-fsf-trunk-quilt.orig/sysdeps/arm/soft-fp/sfp-machine.h 2014-05-16 03:25:52.0 +0100 +++ glibc-fsf-trunk-quilt/sysdeps/arm/soft-fp/sfp-machine.h 2014-05-16 03:31:34.451805339 +0100 @@ -21,9 +21,9 @@ #define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_2_udiv(D,R,X,Y) #define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_4_udiv(Q,R,X,Y) -#define _FP_NANFRAC_S 0 -#define _FP_NANFRAC_D 0, 0 -#define _FP_NANFRAC_Q 0, 0, 0, 0 +#define _FP_NANFRAC_S _FP_QNANBIT_S +#define _FP_NANFRAC_D _FP_QNANBIT_D, 0 +#define _FP_NANFRAC_Q _FP_QNANBIT_Q, 0, 0, 0 #define _FP_NANSIGN_S 0 #define _FP_NANSIGN_D 0 #define _FP_NANSIGN_Q 0
Re: [PATCH] Fix PR54733 Optimize endian independent load/store
On May 16, 2014, at 4:13 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 1:03 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 12:56 PM, pins...@gmail.com wrote: On May 16, 2014, at 3:48 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 16, 2014 at 12:07 PM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Ping? Sorry ... Best regards, Thomas Preud'homme -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Thomas Preud'homme Sent: Friday, May 09, 2014 6:26 PM To: GCC Patches Subject: RE: [PATCH] Fix PR54733 Optimize endian independent load/store Sorry, took longer than expected as I got distracted by some other patch. I merged the whole patchset in a single patch as I was told the current setup is actually more difficult to read. Here are the updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * expr.c (get_inner_reference): Add a parameter to control whether a MEM_REF should be split into base + offset. * tree.h (get_inner_reference): Default new parameter to false. * tree-ssa-math-opts.c (nop_stats): New bswap_stats structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. Add assert when updating bswap_stats. 
*** gcc/testsuite/ChangeLog *** 2014-05-09 Thomas Preud'homme thomas.preudho...@arm.com PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Ok for trunk? Ok, I now decided otherwise and dislike the new parameter to get_inner_reference. Can you please revert that part and just deal with a MEM_REF result in your only caller? And (of course) I found another possible issue. The way you compute load_type and use it here: + /* Perform the load. */ + load_offset_ptr = build_int_cst (n->alias_set, 0); + val_expr = fold_build2 (MEM_REF, load_type, addr_tmp, + load_offset_ptr); makes the load always appear aligned according to the mode of load_type. On strict-alignment targets this may cause faults. So what you have to do is either (simpler) unsigned int align = get_pointer_alignment (addr_tmp); tree al_load_type = load_type; if (align < TYPE_ALIGN (load_type)) al_load_type = build_aligned_type (load_type, align); ... val_expr = fold_build2 (MEM_REF, al_load_type, addr_tmp, load_offset_ptr); or keep track of the first actual load and use unsigned int align = get_object_alignment (that_first_load); first in the one that corresponds to addr_tmp. From that there is a much better chance to derive good alignment values. Of course on STRICT_ALIGNMENT targets a not aligned load will be decomposed again, so eventually doing the transformation may no longer be profitable(?). Not always decomposed. On MIPS, it should use the load/store left/right instructions for unaligned load/stores, which is normally better than decomposed load/stores. So having a cost model would be nice. Agreed, but I am happy with doing that as a followup.
Btw, a very simple one would be to reject the unaligned case when SLOW_UNALIGNED_ACCESS (TYPE_MODE (load_type), align). [of course that may be true on MIPS even for the cases where a reasonably fast unaligned variant exists - nearly no target defines that macro in a too fancy way] Oh, and what happens for unsigned foo (unsigned char *x) { return x[0] << 24 | x[2] << 8 | x[3]; } ? We could do an unsigned int load from x and zero byte 3 with an AND. Enhancement for a followup, similar to also considering vector types for the load (also I'm not sure that uint64_type_node always has non-BLKmode for all targets). No we cannot: if x[4] is on a different page which is not mapped in, we get a fault. Not something we want. Thanks, Andrew Richard. Richard. Thanks, Andrew Thanks and sorry again for the delay. Otherwise the patch looks good to me. Richard. Best regards, Thomas -Original Message- From: gcc-patches-ow...@gcc.gnu.org
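To make the byte-access patterns under discussion concrete, here is a small standalone sketch (function names are invented, not from the patch) of the shapes the extended bswap pass is meant to recognize: a full 4-byte little-endian load, which on a little-endian target can legally become a single 32-bit load, and a 3-byte load, which cannot be widened without reading a byte the source never touches — the page-fault concern raised above.

```c
#include <stdint.h>
#include <assert.h>

/* Endian-independent 32-bit little-endian load.  All four bytes are
   read, so on a little-endian target the whole expression can be
   replaced by one aligned-or-unaligned 32-bit load.  */
static uint32_t
load_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
         | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16
         | (uint32_t) p[3] << 24;
}

/* Only three bytes are accessed here.  Widening this to a 4-byte load
   would dereference p[3], which the original code never reads and
   which may sit on an unmapped page -- so the transformation must not
   read beyond the bytes the source accesses.  */
static uint32_t
load3_be (const unsigned char *p)
{
  return (uint32_t) p[0] << 16 | (uint32_t) p[1] << 8 | (uint32_t) p[2];
}
```

Either helper returns the same value on big- and little-endian hosts; the difference is only in what the optimizer may turn the expression into.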
Re: [PATCH] Fix ARM NAN fraction bits
On Fri, 16 May 2014, Maciej W. Rozycki wrote: 2014-05-16 Maciej W. Rozycki ma...@codesourcery.com PR libgcc/60166 * sysdeps/arm/soft-fp/sfp-machine.h (_FP_NANFRAC_S, _FP_NANFRAC_D) (_FP_NANSIGN_Q): Set the quiet bit. OK for glibc. -- Joseph S. Myers jos...@codesourcery.com
Re: PR 61084: Use widest_int in sparc_fold_builtin
This is the second part of PR 61084, which it seems I'd forgotten to post. pdist calculates a wide result from narrower inputs, so I should have used widest_int rather than wide_int. Is that documented? Because, if even you wide-int guys got it wrong... PR target/61084 * config/sparc/sparc.c (sparc_fold_builtin): Use widest_int rather than wide_int. OK, thanks. -- Eric Botcazou
Re: [PATCH][1/n][RFC] Make FRE/PRE somewhat predicate aware
On 05/16/14 02:02, Richard Biener wrote: Quiet as usual. Which strongly suggests folks trust you to do the right thing here :-) I think the FRE/PRE reference in $SUBJECT made me ignore the patch entirely -- my brain hurts when I look at our tree PRE implementation. Jeff
Eliminate write-only variables
Hi, this patch adds code to remove write only static variables. While analyzing the effectiveness of LTO on firefox, I noticed that a surprisingly large part of the binary's data segment is occupied by these. Fixed thus. (this is quite a trivial transformation, I just never considered it important enough to work on it). The patch goes by marking write only variables in ipa.c (at the same time we discover the addressable flag) and also fixes handling of the flags for aliases. References to variables are then removed by fixup_cfg. As a first cut, I only remove stores without side effects, so copies from volatile variables are preserved. I also kill the LHS of function calls. I do not attempt to remove asm statements. This means that some references may be left in the code and therefore the IPA code does not eliminate the references after discovering a write only variable and instead it relies on dead variable elimination to do the job later. Consequently not all write only variables are removed with WHOPR in the case the references end up in different partitions. Something I can address incrementally. Also I think dwarf2out should be updated to mark the value of write only variables as optimized out. Jakub, can you help me with this? (I do not think it is valid to output the optimized out value of the constructor) Bootstrapped/regtested x86_64-linux, will commit it later today. Honza * varpool.c (dump_varpool_node): Dump write-only flag. * lto-cgraph.c (lto_output_varpool_node, input_varpool_node): Stream write-only flag. * tree-cfg.c (execute_fixup_cfg): Remove statements setting write-only variables. * gcc.c-torture/execute/20101011-1.c: Update testcase. * gcc.dg/ira-shrinkwrap-prep-1.c: Update testcase. * gcc.dg/tree-ssa/writeonly.c: New testcase. * gcc.dg/tree-ssa/ssa-dse-6.c: Update testcase. * gcc.dg/tree-ssa/pr21559.c: Update testcase. * gcc.dg/debug/pr35154.c: Update testcase. * gcc.target/i386/vectorize1.c: Update testcase. * ipa.c (process_references): New function. 
(set_readonly_bit): New function. (set_writeonly_bit): New function. (clear_addressable_bit): New function. (ipa_discover_readonly_nonaddressable_var): Mark write only variables; fix handling of aliases. * cgraph.h (struct varpool_node): Add writeonly flag. Index: varpool.c === --- varpool.c (revision 210514) +++ varpool.c (working copy) @@ -211,6 +211,8 @@ dump_varpool_node (FILE *f, varpool_node fprintf (f, " read-only"); if (ctor_for_folding (node->decl) != error_mark_node) fprintf (f, " const-value-known"); + if (node->writeonly) +fprintf (f, " write-only"); fprintf (f, "\n"); } Index: lto-cgraph.c === --- lto-cgraph.c(revision 210514) +++ lto-cgraph.c(working copy) @@ -562,6 +562,7 @@ lto_output_varpool_node (struct lto_simp bp_pack_value (bp, node->forced_by_abi, 1); bp_pack_value (bp, node->unique_name, 1); bp_pack_value (bp, node->body_removed, 1); + bp_pack_value (bp, node->writeonly, 1); bp_pack_value (bp, node->definition, 1); alias_p = node->alias && (!boundary_p || node->weakref); bp_pack_value (bp, alias_p, 1); @@ -1153,6 +1154,7 @@ input_varpool_node (struct lto_file_decl node->forced_by_abi = bp_unpack_value (bp, 1); node->unique_name = bp_unpack_value (bp, 1); node->body_removed = bp_unpack_value (bp, 1); + node->writeonly = bp_unpack_value (bp, 1); node->definition = bp_unpack_value (bp, 1); node->alias = bp_unpack_value (bp, 1); node->weakref = bp_unpack_value (bp, 1); Index: tree-cfg.c === --- tree-cfg.c (revision 210514) +++ tree-cfg.c (working copy) @@ -8431,7 +8431,7 @@ execute_fixup_cfg (void) FOR_EACH_BB_FN (bb, cfun) { bb->count = apply_scale (bb->count, count_scale); - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);) { gimple stmt = gsi_stmt (gsi); tree decl = is_gimple_call (stmt) @@ -8457,9 +8457,45 @@ execute_fixup_cfg (void) todo |= TODO_cleanup_cfg; } + /* Remove stores to variables we marked readonly. +Keep access when store has side effect, i.e. in case when source +is volatile. 
*/ + if (gimple_store_p (stmt) + && !gimple_has_side_effects (stmt)) + { + tree lhs = get_base_address (gimple_get_lhs (stmt)); + + if (TREE_CODE (lhs) == VAR_DECL + && (TREE_STATIC (lhs) || DECL_EXTERNAL (lhs)) + && varpool_get_node (lhs)->writeonly) + { + unlink_stmt_vdef (stmt); + gsi_remove (&gsi, true); + release_defs (stmt); +
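As a reader's aid, here is a hypothetical source-level illustration (all names invented) of what the new analysis classifies: a write-only static whose stores — and the variable itself — can be dropped, versus a store whose right-hand side is a volatile read that must survive.

```c
#include <assert.h>

/* "only_written" is static and never read, so the pass described above
   may delete both the stores to it and the variable itself.  "hw_src"
   is volatile: a store copying from it is still removable, but the
   volatile *read* feeding that store has a side effect and is kept.  */
static int only_written;         /* write-only: candidate for removal */
static volatile int hw_src = 7;  /* volatile source */

int
work (int x)
{
  only_written = x;       /* plain dead store: removable */
  only_written = hw_src;  /* store removable, volatile read preserved */
  return x + 1;
}
```

The semantics of work() are unchanged by the elimination, which is why the pass can run late and rely on dead-variable elimination to finish the job.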
Re: [PATCH] Use direct computation to calculate fallthrough edge count.
This patch uses direct computation (instead of indirect minus computation) to calculate the fallthrough edge's count and frequency. This is less error-prone when the profile is inconsistent (e.g. the sum of all succ edge counts being larger than the bb count). Bootstrapped and regression test on-going. OK for trunk if tests pass? OK, thanks! Honza Thanks, Dehao gcc/ChangeLog: 2014-05-16 Dehao Chen de...@google.com * cfghooks.c (make_forwarder_block): Use direct computation to get fall-through edge's count and frequency. Index: gcc/cfghooks.c === --- gcc/cfghooks.c (revision 210518) +++ gcc/cfghooks.c (working copy) @@ -833,6 +833,9 @@ make_forwarder_block (basic_block bb, bool (*redir fallthru = split_block_after_labels (bb); dummy = fallthru->src; + dummy->count = 0; + dummy->frequency = 0; + fallthru->count = 0; bb = fallthru->dest; /* Redirect back edges we want to keep. */ @@ -842,20 +845,13 @@ make_forwarder_block (basic_block bb, bool (*redir if (redirect_edge_p (e)) { + dummy->frequency += EDGE_FREQUENCY (e); + dummy->count += e->count; + fallthru->count += e->count; ei_next (ei); continue; } - dummy->frequency -= EDGE_FREQUENCY (e); - dummy->count -= e->count; - if (dummy->frequency < 0) - dummy->frequency = 0; - if (dummy->count < 0) - dummy->count = 0; - fallthru->count -= e->count; - if (fallthru->count < 0) - fallthru->count = 0; e_src = e->src; jump = redirect_edge_and_branch_force (e, bb); if (jump != NULL)
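A toy model (not GCC code; names and types are simplified) of why the direct form is more robust: with an inconsistent profile the subtractive computation underflows and must be clamped, losing information, while summing the redirected edges directly always yields the true redirected mass.

```c
#include <assert.h>

typedef long long gcov_t;  /* stand-in for gcov_type */

/* Old style: start from the block's total count and subtract the
   edges that are kept, clamping at zero when the profile is
   inconsistent (edge counts summing to more than the block count).  */
static gcov_t
dummy_count_subtract (gcov_t bb_count, const gcov_t *kept, int n_kept)
{
  gcov_t c = bb_count;
  for (int i = 0; i < n_kept; i++)
    c -= kept[i];
  if (c < 0)  /* clamping needed; the result is then arbitrary */
    c = 0;
  return c;
}

/* New style: sum the redirected edges directly; no clamping needed
   and the result is exact regardless of profile consistency.  */
static gcov_t
dummy_count_direct (const gcov_t *redirected, int n_redirected)
{
  gcov_t c = 0;
  for (int i = 0; i < n_redirected; i++)
    c += redirected[i];
  return c;
}
```

With a consistent profile both forms agree; the assertions below show an inconsistent case (block count 70, edges summing to 110) where only the direct form reports the true redirected count of 50.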
Re: Eliminate write-only variables
Jan Hubicka hubi...@ucw.cz writes: + /* Remove stores to variables we marked readonly. s/read/write/ + /* For calls we can simply remove LHS when it is known to be read only. */ s/read/write/ Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: Eliminate write-only variables
Jan Hubicka hubi...@ucw.cz writes: + /* Remove stores to variables we marked readonly. s/read/write/ + /* For calls we can simply remove LHS when it is known to be read only. */ s/read/write/ Ah, thanks! Honza Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: add dbgcnt and opt-info support for devirtualization
On Fri, May 16, 2014 at 9:51 AM, Jan Hubicka hubi...@ucw.cz wrote: I have been chasing a runtime failure of a very large test built with gcc-4_9. The bad code either calls a pure function or turn a virtual call into __builtin_unreachable (incomplete target set). The indirect info shows the otr type to be !maybe_derived_type, and the outer-type gets cleared during inline update. I isolated a small test case -- but the good news is that gcc-4_9 @head already fixed the problem. I will check in the test case to trunk later. Good, testcase would be welcome. I guess it was the fix for placement_new bug. It disables some valid devirtualizations (and I thus may revisit the fix for 4.10), so it would be good to know if your testcase differs from the original PR one. I thought so too -- but when I backed out this single change from trunk, the test still passes, so something else is also going on. David Honza thanks, David Honza
patch to fix PR60969
The following patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60969 The patch was bootstrapped and tested on x86/x86-64. Committed as rev. 210519 to gcc 4.9 branch and as rev. 210520 to trunk. 2014-05-16 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/60969 * ira-costs.c (record_reg_classes): Allow only memory for pseudo. Calculate costs for this case. 2014-05-16 Vladimir Makarov vmaka...@redhat.com PR rtl-optimization/60969 * g++.dg/pr60969.C: New. Index: ira-costs.c === --- ira-costs.c (revision 210069) +++ ira-costs.c (working copy) @@ -762,10 +762,11 @@ record_reg_classes (int n_alts, int n_op into that class. */ if (REG_P (op) && REGNO (op) >= FIRST_PSEUDO_REGISTER) { - if (classes[i] == NO_REGS) + if (classes[i] == NO_REGS && ! allows_mem[i]) { /* We must always fail if the operand is a REG, but -we did not find a suitable class. +we did not find a suitable class and memory is +not allowed. Otherwise we may perform an uninitialized read from this_op_costs after the `continue' statement @@ -783,50 +784,90 @@ record_reg_classes (int n_alts, int n_op bool out_p = recog_data.operand_type[i] != OP_IN; enum reg_class op_class = classes[i]; move_table *move_in_cost, *move_out_cost; + short (*mem_cost)[2]; ira_init_register_move_cost_if_necessary (mode); if (! in_p) { ira_assert (out_p); - move_out_cost = ira_may_move_out_cost[mode]; - for (k = cost_classes_ptr->num - 1; k >= 0; k--) + if (op_class == NO_REGS) { - rclass = cost_classes[k]; - pp_costs[k] - = move_out_cost[op_class][rclass] * frequency; + mem_cost = ira_memory_move_cost[mode]; + for (k = cost_classes_ptr->num - 1; k >= 0; k--) + { + rclass = cost_classes[k]; + pp_costs[k] = mem_cost[rclass][0] * frequency; + } + } + else + { + move_out_cost = ira_may_move_out_cost[mode]; + for (k = cost_classes_ptr->num - 1; k >= 0; k--) + { + rclass = cost_classes[k]; + pp_costs[k] + = move_out_cost[op_class][rclass] * frequency; + } } } else if (! out_p) { ira_assert (in_p); - move_in_cost = ira_may_move_in_cost[mode]; - for (k = cost_classes_ptr->num - 1; k >= 0; k--) + if (op_class == NO_REGS) { - rclass = cost_classes[k]; - pp_costs[k] - = move_in_cost[rclass][op_class] * frequency; + mem_cost = ira_memory_move_cost[mode]; + for (k = cost_classes_ptr->num - 1; k >= 0; k--) + { + rclass = cost_classes[k]; + pp_costs[k] = mem_cost[rclass][1] * frequency; + } + } + else + { + move_in_cost = ira_may_move_in_cost[mode]; + for (k = cost_classes_ptr->num - 1; k >= 0; k--) + { + rclass = cost_classes[k]; + pp_costs[k] + = move_in_cost[rclass][op_class] * frequency; + } } } else { - move_in_cost = ira_may_move_in_cost[mode]; - move_out_cost = ira_may_move_out_cost[mode]; - for (k = cost_classes_ptr->num - 1; k >= 0; k--) - { - rclass = cost_classes[k]; - pp_costs[k] = ((move_in_cost[rclass][op_class] - + move_out_cost[op_class][rclass]) -* frequency); + if (op_class == NO_REGS) + { + mem_cost = ira_memory_move_cost[mode]; +
[C++ patch] Reduce vtable alignment
Hi, compiling: struct A { virtual void foo(void) {}; virtual void foo2(void) {}; virtual void foo3(void) {}; virtual void foo4(void) {}; virtual void foo5(void) {}; } a; gives 32-byte alignment to the virtual table on i386, because we bump up alignments of arrays to the size of vector operations. This is wasteful, since virtual tables are never really accessed this way. I am testing the following patch, OK if it passes? The patch also removes an apparently 20-year-old hack for SPARC. Honza * class.c (build_vtable): Force alignment of virtual tables to be pointer size only to save space. Index: class.c === --- class.c (revision 210521) +++ class.c (working copy) @@ -768,11 +768,8 @@ build_vtable (tree class_type, tree name TREE_READONLY (decl) = 1; DECL_VIRTUAL_P (decl) = 1; DECL_ALIGN (decl) = TARGET_VTABLE_ENTRY_ALIGN; + DECL_USER_ALIGN (decl) = true; DECL_VTABLE_OR_VTT_P (decl) = 1; - /* At one time the vtable info was grabbed 2 words at a time. This - fails on sparc unless you have 8-byte alignment. (tiemann) */ - DECL_ALIGN (decl) = MAX (TYPE_ALIGN (double_type_node), - DECL_ALIGN (decl)); set_linkage_according_to_type (class_type, decl); /* The vtable has not been defined -- yet. */ DECL_EXTERNAL (decl) = 1;
Re: Avoiding some garbage rtl from instantiate_virtual_regs
On 05/16/14 02:47, Richard Sandiford wrote: It seems that in a typical -O0 compile the amount of rtl that starts out as needed but becomes garbage is only slightly less than half of the total amount created. One of the big offenders is the vregs pass, which creates new PLUSes when instantiating a virtual register + a constant and which creates new MEMs when instantiating an address involving a virtual register. This happens a lot in -O0 code because all variables live on the stack. The instantiation walk is fundamentally in-place: every other part of the pattern is modified without copying. And rtl sharing rules guarantee that we can do the same for PLUSes of registers and MEMs. The patch does this by adding inplace arguments to plus_constant and replace_equiv_address. In a -O0 compile of an oldish fold-const.ii (where no GC takes place) it reduces the amount of used GC memory from 169M to 166M. The average max RSS goes down by just over 1%. Compile time seems to decrease slightly, but probably in the noise range. There might be other callers that can use the new interfaces too. Tested on x86_64-linux-gnu. Also tested by comparing the asm output for various parts of the testsuite before and after the patch. The only changes were that some sym+0s became plain syms (i.e. (plus X (const_int 0)) became X) because of the plus_constant change. OK to install? Thanks, Richard gcc/ * emit-rtl.h (replace_equiv_address, replace_equiv_address_nv): Add an inplace argument. Store the new address in the original MEM when true. * emit-rtl.c (change_address_1): Likewise. (adjust_address_1, adjust_automodify_address_1, offset_address): Update accordingly. * rtl.h (plus_constant): Add an inplace argument. * explow.c (plus_constant): Likewise. Try to reuse the original PLUS when true. Avoid generating (plus X (const_int 0)). * function.c (instantiate_virtual_regs_in_rtx): Adjust the PLUS in-place. Pass true to plus_constant. 
(instantiate_virtual_regs_in_insn): Pass true to replace_equiv_address. Index: gcc/emit-rtl.h === --- gcc/emit-rtl.h 2014-05-15 11:27:06.000259353 +0100 +++ gcc/emit-rtl.h 2014-05-16 09:11:42.479556294 +0100 @@ -52,10 +52,10 @@ extern tree get_spill_slot_decl (bool); ADDR. The caller is asserting that the actual piece of memory pointed to is the same, just the form of the address is being changed, such as by putting something into a register. */ -extern rtx replace_equiv_address (rtx, rtx); +extern rtx replace_equiv_address (rtx, rtx, bool = false); /* Likewise, but the reference is not required to be valid. */ -extern rtx replace_equiv_address_nv (rtx, rtx); +extern rtx replace_equiv_address_nv (rtx, rtx, bool = false); Presumably the default value for the inplace argument is to avoid having to fix up all the call sites. I guess that's OK. Clearly it's a safe default and avoids a fair amount of unnecessary churn. OK for the trunk. Thanks, Jeff
Re: [PATCH][1/n][RFC] Make FRE/PRE somewhat predicate aware
On May 16, 2014 7:07:11 PM CEST, Jeff Law l...@redhat.com wrote: On 05/16/14 02:02, Richard Biener wrote: Quiet as usual. Which strongly suggests folks trust you to do the right thing here :-) I think the FRE/PRE reference in $SUBJECT made me ignore the patch entirely -- my brain hurts when I look at our tree PRE implementation. Heh, it's much easier to understand than it once was! :) Richard. Jeff
Re: [DOC Patch] Incorrect @xref in #pragma visibility
On 05/14/14 20:25, David Wohlferd wrote: 2014-05-14 David Wohlferd d...@limegreensocks.com * doc/extend.texi: (Visibility Pragmas) Fix misplaced @xref Approved and installed on your behalf. Thanks, jeff
Re: [PATCH 1/1][ira-costs] grammar fix of comments
On 05/14/14 18:45, Zhouyi Zhou wrote: Fix grammar error in the comment above process_bb_node_for_hard_reg_moves Signed-off-by: Zhouyi Zhou yizhouz...@ict.ac.cn --- gcc/ira-costs.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/gcc/ira-costs.c b/gcc/ira-costs.c index 648806b..fbfb070 100644 --- a/gcc/ira-costs.c +++ b/gcc/ira-costs.c @@ -1855,7 +1855,7 @@ find_costs_and_classes (FILE *dump_file) /* Process moves involving hard regs to modify allocno hard register costs. We can do this only after determining allocno class. If a - hard register forms a register class, than moves with the hard + hard register forms a register class, then moves with the hard register are already taken into account in class costs for the allocno. */ static void Thanks. Installed. jeff
Re: [PATCH 1/5] export finish_bitfield_layout from stor-layout
On 05/16/14 09:26, Tom Tromey wrote: The gdb plugin handles some aspects of type layout on its own. It does this because it knows the layout of types, but not the path by which the layout was determined -- so normal gcc things like TYPE_PACKED cannot be used. This patch exposes one bit of stor-layout so it can be used by the plugin. 2014-05-16 Phil Muldoon pmuld...@redhat.com Tom Tromey tro...@redhat.com * stor-layout.c (finish_bitfield_layout): Now public. Change argument type to 'tree'. (finish_record_layout): Update. * stor-layout.h (finish_bitfield_layout): Declare. OK. However, please hold off installing until the entire set is approved. Jeff
Re: [PATCH 2/5] c_diagnostic_ignored_function hack
On 05/16/14 09:26, Tom Tromey wrote: In the typical case, when compiling a snippet of user code, gdb wraps the user's text in a dummy function. It's somewhat odd for users if an error in their code is mentioned as coming from this dummy function. This patch makes it possible to suppress the function-name display in a straightforward way: it adds a new global which the plugin can set to declare the name of the dummy function. This patch seems like a bit of a hack, but there didn't seem to be a notably cleaner approach. 2014-05-16 Phil Muldoon pmuld...@redhat.com Tom Tromey tro...@redhat.com * c-lang.c (c_diagnostic_ignored_function): New global. (c_print_error_function): New function. (LANG_HOOKS_PRINT_ERROR_FUNCTION): Define. * c-lang.h (c_diagnostic_ignored_function): Declare. Just a few nits. In c-lang.c, please use the old C-style comments. If for no other reason than it's consistent with all the other nearby code. Consider using non-NULL when referring to pointers rather than non-zero. Otherwise OK. Please wait to install until the entire kit is approved. BTW, didn't see patch #5 of the series. jeff
Re: [PATCH 2/5] c_diagnostic_ignored_function hack
Jeff BTW, didn't see patch #5 of the series. Maybe it was too big. I will try to resend it compressed. Tom
[PATCH 5/5] add libcc1
Tom == Tom Tromey tro...@redhat.com writes: Tom This patch series is half of a project to let gdb reuse gcc (which Tom half depends on which list you are seeing this on), so that users can Tom compile small snippets of code and evaluate them in the current Tom context of the inferior. Jeff noted that patch #5 didn't make it through. I've edited this one down by removing the auto-generated stuff , and then compressed it. If anybody wants to apply this to try it out, it's all on github. https://github.com/tromey/gcc https://github.com/tromey/gdb In each repository the branch named submit/compile holds the rebased series that were submitted. The development branches are both named gdbjit. These are much messier but not ever rebased, so perhaps safer to track. Tom This patch adds the plugin to the gcc tree and updates the top-level configury. It seems better to have this code in the gcc tree than in the gdb tree, because it is bound more tightly to gcc. The gcc plugin makes direct calls into various parts of gcc to do its work; whereas on the gdb side everything is done via a relatively simple API, without any direct connection to gdb internals. This breakdown made the most sense because most calls are from gdb to gcc rather than vice versa. The plugin itself consists of two parts. These parts communicate via a simple ad hoc RPC system implemented in the plugin code. 2014-05-16 Phil Muldoon pmuld...@redhat.com Tom Tromey tro...@redhat.com * Makefile.def: Add libcc1 to host_modules. * configure.ac (host_tools): Add libcc1. * Makefile.in, configure: Rebuild. 2014-05-16 Phil Muldoon pmuld...@redhat.com Jan Kratochvil jan.kratoch...@redhat.com Tom Tromey tro...@redhat.com * aclocal.m4: New file. * callbacks.cc: New file. * callbacks.hh: New file. * cc1plugin-config.h.in: New file. * configure: New file. * configure.ac: New file. * connection.cc: New file. * connection.hh: New file. * libcc1.cc: New file. * libcc1plugin.sym: New file. * libcc1.sym: New file. 
* Makefile.am: New file. * Makefile.in: New file. * marshall.cc: New file. * marshall.hh: New file. * names.cc: New file. * names.hh: New file. * plugin.cc: New file. * rpc.hh: New file. * status.hh: New file. --- ChangeLog| 7 + Makefile.def | 5 + Makefile.in | 995 ++- configure| 2 +- configure.ac | 4 +- libcc1/ChangeLog |24 + libcc1/Makefile.am |42 + libcc1/Makefile.in | 614 ++ libcc1/aclocal.m4| 980 +++ libcc1/callbacks.cc |90 + libcc1/callbacks.hh |64 + libcc1/cc1plugin-config.h.in |92 + libcc1/configure | 16956 + libcc1/configure.ac |65 + libcc1/connection.cc | 153 + libcc1/connection.hh | 114 + libcc1/libcc1.cc | 454 ++ libcc1/libcc1.sym| 1 + libcc1/libcc1plugin.sym | 2 + libcc1/marshall.cc | 166 + libcc1/marshall.hh |93 + libcc1/names.cc |46 + libcc1/names.hh |55 + libcc1/plugin.cc | 895 +++ libcc1/rpc.hh| 486 ++ libcc1/status.hh |33 + 26 files changed, 22430 insertions(+), 8 deletions(-) create mode 100644 libcc1/ChangeLog create mode 100644 libcc1/Makefile.am create mode 100644 libcc1/Makefile.in create mode 100644 libcc1/aclocal.m4 create mode 100644 libcc1/callbacks.cc create mode 100644 libcc1/callbacks.hh create mode 100644 libcc1/cc1plugin-config.h.in create mode 100755 libcc1/configure create mode 100644 libcc1/configure.ac create mode 100644 libcc1/connection.cc create mode 100644 libcc1/connection.hh create mode 100644 libcc1/libcc1.cc create mode 100644 libcc1/libcc1.sym create mode 100644 libcc1/libcc1plugin.sym create mode 100644 libcc1/marshall.cc create mode 100644 libcc1/marshall.hh create mode 100644 libcc1/names.cc create mode 100644 libcc1/names.hh create mode 100644 libcc1/plugin.cc create mode 100644 libcc1/rpc.hh create mode 100644 libcc1/status.hh 0001-add-libcc1.patch.gz Description: the patch
Re: [C++ patch] Reduce vtable alignment
Hi, this patch also makes the RTTI type info for A in the testcase: struct A { virtual void foo(void) {}; virtual void foo2(void) {}; virtual void foo3(void) {}; virtual void foo4(void) {}; virtual void foo5(void) {}; } a; aligned only to the ABI requirement (8) instead of being bumped up to 16 bytes by the following code in i386.c: /* x86-64 ABI requires arrays greater than 16 bytes to be aligned to 16byte boundary. */ if (TARGET_64BIT) { if ((opt ? AGGREGATE_TYPE_P (type) : TREE_CODE (type) == ARRAY_TYPE) && TYPE_SIZE (type) && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST && wi::geu_p (TYPE_SIZE (type), 128) && align < 128) return 128; } Here the variable is first run through align_variable and that decides to add optional alignment. We really want only ABI required alignment here. Does the following patch look reasonable? * rtti.c: Include tm_p.h (emit_tinfo_decl): Align type infos only as required by the target ABI. Index: rtti.c === --- rtti.c (revision 210521) +++ rtti.c (working copy) @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3. #include "coretypes.h" #include "tm.h" #include "tree.h" +#include "tm_p.h" #include "stringpool.h" #include "stor-layout.h" #include "cp-tree.h" @@ -1596,6 +1597,12 @@ emit_tinfo_decl (tree decl) DECL_INITIAL (decl) = init; mark_used (decl); cp_finish_decl (decl, init, false, NULL_TREE, 0); + /* Avoid targets optionally bumping up the alignment to improve +vector instruction accesses, tinfo are never accessed this way. */ +#ifdef DATA_ABI_ALIGNMENT + DECL_ALIGN (decl) = DATA_ABI_ALIGNMENT (decl, TYPE_ALIGN (TREE_TYPE (decl))); + DECL_USER_ALIGN (decl) = true; +#endif return true; } else
Re: [PATCH 3/5] introduce the binding oracle
On 05/16/14 09:26, Tom Tromey wrote: gdb wants to supply any declarations that may be referred to by the user's code. Hooking into symbol lookup was an efficient way to accomplish this. This patch introduces a binding oracle that is consulted whenever a symbol binding is looked up for the first time. The oracle is just a global function pointer. If it is NULL, no special work is done. It is called with the identifier to supply and with an enum argument indicating the kind of binding being requested. The oracle can then call back into the C front end (via the new functions c_pushtag and c_bind) to supply a binding; or it can silently do nothing if the request could not be fulfilled. The code caches whether the oracle has been called to avoid repeated useless queries. There is a little hack in c_print_identifier to avoid calling the binding oracle here. This makes debugging gcc in the presence of the plugin remain relatively sane -- without this, calling debug_tree or the like can confusingly call into the plugin. 2014-05-16 Phil Muldoon pmuld...@redhat.com Tom Tromey tro...@redhat.com * c-tree.h (enum c_oracle_request): New. (c_binding_oracle_function): New typedef. (c_binding_oracle, c_pushtag, c_bind): Declare. * c-decl.c (c_binding_oracle): New global. (I_SYMBOL_CHECKED): New macro. (i_symbol_binding): New function. (I_SYMBOL_BINDING, I_SYMBOL_DECL): Redefine. (I_TAG_CHECKED): New macro. (i_tag_binding): New function. (I_TAG_BINDING, I_TAG_DECL): Redefine. (I_LABEL_CHECKED): New macro. (i_label_binding): New function. (I_LABEL_BINDING, I_LABEL_DECL): Redefine. (c_print_identifier): Save and restore c_binding_oracle. (c_pushtag, c_bind): New functions. --- void c_print_identifier (FILE *file, tree node, int indent) { + void (*save) (enum c_oracle_request, tree identifier); + + // This makes debugging much more sane. + save = c_binding_oracle; + c_binding_oracle = NULL; + Just a nit. C-style comment would be appreciated. 
It might also help to clarify what much more sane really means here. Otherwise, it looks OK to me. jeff
Re: [patch] fix ppc spe bootstrap error in dwf_regno
On 04/30/14 09:15, Cesar Philippidis wrote: I've been working on a patch to switch a few ppc targets to use softfp in libgcc instead of fpbit and I noticed that ppc-none-eabispe fails to bootstrap in both trunk and 4.9. The regression was introduced in http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03588.html. Essentially, the assert for a hard register in dwf_regno () is not valid on ppc spe targets. In rs6000_dwarf_register_span (), there is a note stating: /* The duality of the SPE register size wreaks all kinds of havoc. This is a way of distinguishing r0 in 32-bits from r0 in 64-bits. */ and the function adds 1200 to regno, which makes that register appear to be a pseudo. This causes problems in dwf_regno (), which asserts that reg is a hard register. Since the dwarf2 pass is executed after register allocation it should, in theory, be ok for the rs6000 backend to be using a pseudo register for this application. Is this patch ok for trunk and 4.9? If so, please commit since I don't have an svn account. It seems to me the real problem here is the SPE port and this silly game it plays with adding a large value to regno. While it is currently safe to remove the assert, the assert is there to help catch cases where a pseudo has leaked through to a point where it shouldn't. I'd rather look for another approach. Jeff
Re: [PATCH, i386, Pointer Bounds Checker 2/x] Intel Memory Protection Extensions (MPX) instructions support
On 04/16/14 05:35, Ilya Enkovich wrote: Hi, This patch introduces Intel MPX bound registers and instructions. It was approved earlier for 4.9 and had no significant changes since then. I'll assume patch is OK if no objections arise. Patch was bootstrapped and tested for linux-x86_64. Thanks, Ilya -- gcc/ 2014-04-16 Ilya Enkovich ilya.enkov...@intel.com * mode-classes.def (MODE_BOUND): New. * tree.def (BOUND_TYPE): New. * genmodes.c (complete_mode): Support MODE_BOUND. (BOUND_MODE): New. (make_bound_mode): New. * machmode.h (BOUND_MODE_P): New. * stor-layout.c (int_mode_for_mode): Support MODE_BOUND. (layout_type): Support BOUND_TYPE. * tree-pretty-print.c (dump_generic_node): Support BOUND_TYPE. * tree.c (build_int_cst_wide): Support BOUND_TYPE. (type_contains_placeholder_1): Likewise. * tree.h (BOUND_TYPE_P): New. * varasm.c (output_constant): Support BOUND_TYPE. * config/i386/constraints.md (B): New. (Ti): New. (Tb): New. * config/i386/i386-modes.def (BND32): New. (BND64): New. * config/i386/i386-protos.h (ix86_bnd_prefixed_insn_p): New. * config/i386/i386.c (isa_opts): Add mmpx. (regclass_map): Add bound registers. (dbx_register_map): Likewise. (dbx64_register_map): Likewise. (svr4_dbx_register_map): Likewise. (PTA_MPX): New. (ix86_option_override_internal) Support MPX ISA. (ix86_code_end): Add MPX bnd prefix. (output_set_got): Likewise. (ix86_output_call_insn): Likewise. (get_some_local_dynamic_name): Add '!' (MPX bnd) print prefix support. (ix86_print_operand_punct_valid_p): Likewise. (ix86_print_operand_address): Support UNSPEC_BNDMK_ADDR and UNSPEC_BNDMK_ADDR. (ix86_class_likely_spilled_p): Add bound regs support. (ix86_hard_regno_mode_ok): Likewise. (x86_order_regs_for_local_alloc): Likewise. (ix86_bnd_prefixed_insn_p): New. * config/i386/i386.h (FIRST_PSEUDO_REGISTER): Fix to new value. (FIXED_REGISTERS): Add bound registers. (CALL_USED_REGISTERS): Likewise. (REG_ALLOC_ORDER): Likewise. (HARD_REGNO_NREGS): Likewise. (TARGET_MPX): New. 
(VALID_BND_REG_MODE): New. (FIRST_BND_REG): New. (LAST_BND_REG): New. (reg_class): Add BND_REGS. (REG_CLASS_NAMES): Likewise. (REG_CLASS_CONTENTS): Likewise. (BND_REGNO_P): New. (ANY_BND_REG_P): New. (BNDmode): New. (HI_REGISTER_NAMES): Add bound registers. * config/i386/i386.md (UNSPEC_BNDMK): New. (UNSPEC_BNDMK_ADDR): New. (UNSPEC_BNDSTX): New. (UNSPEC_BNDLDX): New. (UNSPEC_BNDLDX_ADDR): New. (UNSPEC_BNDCL): New. (UNSPEC_BNDCU): New. (UNSPEC_BNDCN): New. (UNSPEC_MPX_FENCE): New. (BND0_REG): New. (BND1_REG): New. (type): Add mpxmov, mpxmk, mpxchk, mpxld, mpxst. (length_immediate): Likewise. (prefix_0f): Likewise. (memory): Likewise. (prefix_rep): Check for bnd prefix. (BND): New. (bnd_ptr): New. (BNDCHECK): New. (bndcheck): New. (*jcc_1): Add MPX bnd prefix and fix length. (*jcc_2): Likewise. (jump): Likewise. (simple_return_internal): Likewise. (simple_return_pop_internal): Likewise. (*indirect_jump): Add MPX bnd prefix. (*tablejump_1): Likewise. (simple_return_internal_long): Likewise. (simple_return_indirect_internal): Likewise. (mode_mk): New. (*mode_mk): New. (movmode): New. (*movmode_internal_mpx): New. (mode_bndcheck): New. (*mode_bndcheck): New. (mode_ldx): New. (*mode_ldx): New. (mode_stx): New. (*mode_stx): New. * config/i386/predicates.md (lea_address_operand) Rename to... (address_no_seg_operand): ... this. (address_mpx_no_base_operand): New. (address_mpx_no_index_operand): New. (bnd_mem_operator): New. * config/i386/i386.opt (mmpx): New. Are parts of this patch missing? The ChangeLog references several changes in the machine independent parts of GCC, but I don't see them. I can recall what they look like, but for sanity's sake, I think this patch needs to be reposted. It's likely OK, but let's do the right thing. jeff
Re: [PATCH] Fix more typos in error messages
On 02/25/14 09:21, Benno Schulenberg wrote:

2014-02-07  Benno Schulenberg  bensb...@justemail.net

	* config/arc/arc.c (arc_init): Fix typo in error message.
	* config/i386/i386.c (ix86_expand_builtin): Likewise.
	(split_stack_prologue_scratch_regno): Likewise.
	* fortran/check.c (gfc_check_fn_rc2008): Remove duplicate word
	from error message.

Thanks. Installed. Sorry about the delay.

jeff
[patch, libstdc++] fix TCL error in abi.exp
As noted in PR 23867, the libstdc++ ABI testsuite doesn't work for installed compiler testing. There is some logic in abi.exp to check for the presence of the required bits and skip these tests if they're not there, but before it gets that far it's hitting a TCL error:

ERROR: can't read baseline_subdir_switch: no such variable
    while executing
eval exec $cxx $baseline_subdir_switch
    invoked from within
set baseline_subdir [eval exec $cxx $baseline_subdir_switch]
...

This patch moves the offending reference to $baseline_subdir_switch after the check for missing bits instead of before. OK to commit?

-Sandra

2014-05-16  Iain Sandoe  i...@codesourcery.com
	    Sandra Loosemore  san...@codesourcery.com

	libstdc++-v3/
	* testsuite/libstdc++-abi/abi.exp: Defer setting of
	baseline_subdir until after checking that the test is eligible
	to be run.

Index: libstdc++-v3/testsuite/libstdc++-abi/abi.exp
===================================================================
--- libstdc++-v3/testsuite/libstdc++-abi/abi.exp	(revision 210372)
+++ libstdc++-v3/testsuite/libstdc++-abi/abi.exp	(working copy)
@@ -24,8 +24,6 @@ if { [string match *-*-darwin* $target
     set lib $blddir/src/.libs/libstdc++.so
 }
 
-set baseline_subdir [eval exec $cxx $baseline_subdir_switch]
-
 # Build the support objects.
 v3-build_support
@@ -35,6 +33,8 @@ if { (${v3-symver} == 0) || ![info exist
     return
 }
 
+set baseline_subdir [eval exec $cxx $baseline_subdir_switch]
+
 set baseline_file \
     [file join $baseline_dir $baseline_subdir baseline_symbols.txt]
 # If there is no ABI-specific reference file use that of the default ABI.
Re: Unreviewed Patch
On 02/22/14 16:07, rbmj wrote: Hi all, Just a ping, I haven't gotten anything back on this patch: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00621.html

The patch needs to be tested by bootstrapping on another platform and performing a regression test. Most folks use an x86_64 linux system for that step. While the odds that this patch breaks things are small, the regression test and bootstrap have proven quite valuable through the years in catching issues.

jeff
Re: [patch, libstdc++] fix TCL error in abi.exp
On 05/16/14 13:53, Sandra Loosemore wrote: As noted in PR 23867, the libstdc++ ABI testsuite doesn't work for installed compiler testing. There is some logic in abi.exp to check for the presence of the required bits and skip these tests if they're not there, but before it gets that far it's hitting a TCL error: ERROR: can't read baseline_subdir_switch: no such variable while executing eval exec $cxx $baseline_subdir_switch invoked from within set baseline_subdir [eval exec $cxx $baseline_subdir_switch] ... This patch moves the offending reference to $baseline_subdir_switch after the check for missing bits instead of before. OK to commit? -Sandra 2014-05-16 Iain Sandoe i...@codesourcery.com Sandra Loosemore san...@codesourcery.com libstdc++-v3/ * testsuite/libstdc++-abi/abi.exp: Defer setting of baseline_subdir until after checking that the test is eligible to be run. OK. Jeff
Re: [PATCH][C-family] Fix PR61184
On 05/14/14 03:06, Richard Biener wrote:

The following fixes pre/post-inc/dec gimplification of promoted integer types. There is the issue with the way TYPE_OVERFLOW_UNDEFINED is related to TYPE_OVERFLOW_WRAPS and the (non-)semantics of -fno-strict-overflow. In this case, with -On -fno-strict-overflow for a variable of type short we have !TYPE_OVERFLOW_WRAPS _and_ !TYPE_OVERFLOW_UNDEFINED (so we're in an undefined area), which means that !TYPE_OVERFLOW_UNDEFINED doesn't imply that overflow wraps. Thus the gimplification has to play it safe and always use an unsigned type unless the user specifies -fwrapv (the flag with a proper semantic meaning).

That is, it seems to be the case that which predicate to use (TYPE_OVERFLOW_WRAPS or TYPE_OVERFLOW_UNDEFINED, independent of whether you invert it) depends on the use-case in a very awkward (and error-prone) way.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu; ok if that succeeds (I expect to have to adjust some testcases)?

Thanks,
Richard.

2014-05-14  Richard Biener  rguent...@suse.de

	c-family/
	* c-gimplify.c (c_gimplify_expr): Gimplify self-modify expressions
	using unsigned arithmetic if overflow does not wrap instead of if
	overflow is undefined.

	* c-c++-common/torture/pr61184.c: New testcase.

Seems reasonable to me.

Jeff
Re: [PATCH 5/5] add libcc1
On Fri, 16 May 2014, Tom Tromey wrote: This patch adds the plugin to the gcc tree and updates the top-level configury.

I don't see anything obvious that would disable the plugin if plugins are unsupported (e.g. on Windows host) or disabled (--disable-plugin). Probably the relevant support from gcc/configure.ac needs to go somewhere it can be used at toplevel.

+  self->args.push_back ("gcc");

seems wrong - at least you should use the appropriate compiler name after transformation for cross compilers / --program-transform-name. Though really the *versioned* driver $(target_noncanonical)-gcc-$(version) is the right one to use, in that the plugin should presumably be closely associated with a particular compiler installation when e.g. a distribution has multiple GCC versions packaged that can be installed simultaneously. Having multiple copies of the plugin installed in the same prefix for different targets should be supported.

I'm not clear on which library does what - does libcc1 depend in any way on the target or GCC version? How are the compiler multilib options (e.g. -m32 / -m64) specified? Is that something GDB passes through, based on examining properties of the binary being debugged? (Unfortunately there may not be a good way in general for GDB to tell what the required options for a particular binary are. It can at least reasonably default to using a copy of the plugin for the same target triplet as GDB was configured for.)

-- Joseph S. Myers jos...@codesourcery.com
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On Fri, 2014-05-16 at 18:10 +0800, Bin.Cheng wrote: On Thu, May 15, 2014 at 6:31 PM, Oleg Endo oleg.e...@t-online.de wrote: How about the following. Instead of adding new hooks and inserting the pass to the general pass list, make the new pass class take the necessary callback functions directly. Then targets can just instantiate the pass, passing their impl of the callbacks, and insert the pass object into the pass list at a place that fits best for the target.

Oh, I don't know we can do this in GCC. But yes, a target may want to run it at some place that fits best for the target.

I think it's better than trying to come up with a scheme that so-so fits all. My idea would look like:

// merge_paired_loadstore.h

class merge_paired_loadstore : public rtl_opt_pass
{
public:
  struct delegate
  {
    virtual bool merge_paired_loadstore (rtx x, rtx y, ...) = 0;
    ...
  };

  merge_paired_loadstore (gcc::context* ctx, const char* name, delegate* d);
  ...
};

// target.cc

#include "merge_paired_loadstore.h"

static struct target_merge_loadstore_delegate : merge_paired_loadstore::delegate
{
  virtual bool merge_paired_loadstore (...)
  {
    // code as if this was a freestanding target hook function
  };
} g_merge_loadstore_delegate;

static void target_register_passes (void)
{
  register_pass (new merge_paired_loadstore (g, "merge_ls",
					     &g_merge_loadstore_delegate),
		 PASS_POS_INSERT_AFTER, "other pass", 1);
}

Then, later, maybe sometime in the future, if there's something like a class target, it'd look like:

class my_target : public target, merge_paired_loadstore::delegate
{
  ...
  virtual bool merge_paired_loadstore (...);
};

Maybe it's a bit far fetched at the moment, but it would be a start.

Cheers,
Oleg
[PATCH] Use optimize_function_for_size_p to assign register frequency
This patch uses optimize_function_for_size_p to replace the old optimize_size check in regs.h and ira-int.h to make them consistent. Bootstrapped and testing on-going. OK for trunk if the tests pass?

Thanks,
Dehao

gcc/ChangeLog:

2014-05-16  Dehao Chen  de...@google.com

	* ira-int.h (REG_FREQ_FROM_EDGE_FREQ): Use
	optimize_function_for_size_p.
	* regs.h (REG_FREQ_FROM_BB): Likewise.

Index: gcc/ira-int.h
===================================================================
--- gcc/ira-int.h	(revision 210527)
+++ gcc/ira-int.h	(working copy)
@@ -42,9 +42,8 @@ along with GCC; see the file COPYING3.  If not see
    profile driven feedback is available and the function is never executed,
    frequency is always equivalent.  Otherwise rescale the edge frequency.  */
-#define REG_FREQ_FROM_EDGE_FREQ(freq)					\
-  (optimize_size || (flag_branch_probabilities				\
-		     && !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)		\
+#define REG_FREQ_FROM_EDGE_FREQ(freq)					\
+  (optimize_function_for_size_p (cfun)					\
   ? REG_FREQ_MAX : (freq * REG_FREQ_MAX / BB_FREQ_MAX)			\
   ? (freq * REG_FREQ_MAX / BB_FREQ_MAX) : 1)
Index: gcc/regs.h
===================================================================
--- gcc/regs.h	(revision 210527)
+++ gcc/regs.h	(working copy)
@@ -135,9 +135,7 @@ extern size_t reg_info_p_size;
    or profile driven feedback is available and the function is never
    executed, frequency is always equivalent.  Otherwise rescale the
    basic block frequency.  */
-#define REG_FREQ_FROM_BB(bb) (optimize_size				      \
-			      || (flag_branch_probabilities		      \
-				  && !ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)   \
+#define REG_FREQ_FROM_BB(bb) (optimize_function_for_size_p (cfun)	      \
			      ? REG_FREQ_MAX				      \
			      : ((bb)->frequency * REG_FREQ_MAX / BB_FREQ_MAX)\
			      ? ((bb)->frequency * REG_FREQ_MAX / BB_FREQ_MAX)\
Re: [PATCH 5/5] add libcc1
Joseph == Joseph S Myers jos...@codesourcery.com writes:

Joseph> +  self->args.push_back ("gcc");
Joseph> seems wrong - at least you should use the appropriate compiler
Joseph> name after transformation for cross compilers /
Joseph> --program-transform-name.

Ok, we'll look into it.

Joseph> I'm not clear on which library does what - does libcc1 depend in
Joseph> any way on the target or GCC version?

It doesn't depend on the target at all. It does rely on the GCC plugin using the same .def file -- in fact we should probably add a handshake at the start so that we don't wind up with a version mismatch there. The plugin is of course exposed to GCC internals changes as all plugins are; but this is why we think it should be part of GCC.

Joseph> How are the compiler multilib options (e.g. -m32 / -m64)
Joseph> specified? Is that something GDB passes through, based on
Joseph> examining properties of the binary being debugged?
Joseph> (Unfortunately there may not be a good way in general for GDB to
Joseph> tell what the required options for a particular binary are. It
Joseph> can at least reasonably default to using a copy of the plugin
Joseph> for the same target triplet as GDB was configured for.)

gdb extracts some information from the debuginfo. Some information is provided by a new gdbarch method; for example this is where -m32 / -m64 / -m31 (hi S390) come from.

I am not sure if we can get the proper $(target_noncanonical) for a given gdbarch. We'll look into it.

Tom