Re: Reload related segfaults
From: David Miller Date: Wed, 26 Oct 2011 23:07:11 -0400 (EDT) > From: Alan Modra > Date: Thu, 27 Oct 2011 13:29:56 +1030 > >> Some recent patch has exposed a reload bug. I'm seeing > > I think this might be a side effect or Eric's recent changes, > CC:'d. Eric, I'm seeing a similar segmentation fault in reload on sparc64. But it's in a slightly different place than Alan's crash. Simply compile gcc.target/ultrasp12.c with "-m64 -O2 -mcpu=ultrasparc -mvis" to see this. The crash is in find_valid_class() called from push_reload(), via this code block around line 1184 of reload.c: enum reg_class in_out_class = find_valid_class (outmode, GET_MODE (SUBREG_REG (out)), subreg_regno_offset (REGNO (SUBREG_REG (out)), GET_MODE (SUBREG_REG (out)), SUBREG_BYTE (out), GET_MODE (out)), REGNO (SUBREG_REG (out))); 'out' is: (subreg:DI (reg/v:V4QI 50 %f18 [orig:314 s2hi4_ ] [314]) 0) so subreg_regno_offset() returns -1, and find_valid_class() isn't too happy about getting "-1" for it's 'n' argument. I suspect the test that you removed in order to fix rtl-optimization/46603 would guard against this happening. And indeed, I confirmed that backing out the rtl-optimization/46603 fix makes the segmentation fault go away.
[PATCH] PR49720, infinite recursion in RTX simplification
Hi, this patch is for a specific case in RTX simplification where infinite recursion ensues, causing the out-of-stack segfault in PR49720. Tracking back the origin of this bug, the exact revision causing this was rev.92429, back in 2004. The clause in simplify-rtx.c:simplify_relational_operation_1(), transforming X + cst1 == cst2 into X == cst2 - cst1 can cause an oscillating recursive cycle like this: A + B == C<===> C - B == A When all A, B, and C are CONSTANT_P, but not simplifiable (in the PR testcase, they are all SYMBOL_REFs). The switching of XEXP(op0,1) to the other side of the equation, plus commutative canonicalizing then completes the cycle. This can be solved by using simplify_binary_relation() and returning NULL_RTX when simplification of the constant PLUS/MINUS fails, but I'm not sure that further working on X == CST2-CST1 will never yield more optimization, so the patch here just tries to plug the recursion hole for the exact condition. Bootstrapped and tested on i686 and x86_64 without regressions. Is this okay for trunk? Thanks, Chung-Lin 2011-10-27 Chung-Lin Tang PR rtl-optimization/49720 * simplify-rtx.c (simplify_relational_operation_1): Detect infinite recursion condition in "(eq/ne (plus x cst1) cst2) simplifies to (eq/ne x (cst2 - cst1))" case. testsuite/ * g++.dg/torture/pr49720.C: New test. Index: trunk/gcc/simplify-rtx.c === --- trunk/gcc/simplify-rtx.c(revision 180421) +++ trunk/gcc/simplify-rtx.c(working copy) @@ -4352,10 +4352,20 @@ { rtx x = XEXP (op0, 0); rtx c = XEXP (op0, 1); + enum rtx_code invcode = op0code == PLUS ? MINUS : PLUS; + rtx tem = simplify_gen_binary (invcode, cmp_mode, op1, c); - c = simplify_gen_binary (op0code == PLUS ? MINUS : PLUS, - cmp_mode, op1, c); - return simplify_gen_relational (code, mode, cmp_mode, x, c); + /* Detect an infinite recursive condition, where we oscillate at this +simplification case between: + A + B == C <---> C - B == A, +where A, B, and C are all constants with non-simplifiable expressions, +usually SYMBOL_REFs. */ + if (GET_CODE (tem) == invcode + && CONSTANT_P (x) + && rtx_equal_p (c, XEXP (tem, 1))) + return NULL_RTX; + + return simplify_gen_relational (code, mode, cmp_mode, x, tem); } /* (ne:SI (zero_extract:SI FOO (const_int 1) BAR) (const_int 0))) is Index: trunk/gcc/testsuite/g++.dg/torture/pr49720.C === --- trunk/gcc/testsuite/g++.dg/torture/pr49720.C(revision 0) +++ trunk/gcc/testsuite/g++.dg/torture/pr49720.C(revision 0) @@ -0,0 +1,8 @@ +/* { dg-do compile } */ + +extern char t_start[], t_end[], t_size[]; +bool foo (void) +{ + long size = reinterpret_cast(t_size); + return (size == t_end - t_start); +}
[PATCH, i386]: Avoid combining registers from different units in a single alternative.
Hello! See comment above inline_secondary_memory_needed function in i386.c. 2011-08-27 Uros Bizjak PR target/37191 * config/i386/sse.md (*vec_extract_v4sf_mem): Avoid combining registers from different units in a single alternative. Tested on x86_64-pc-linux-gnu {,-m32}, committed to mainline SVN. Uros. Index: sse.md === --- sse.md (revision 180559) +++ sse.md (working copy) @@ -3866,6 +3866,62 @@ DONE; }) +(define_insn_and_split "*sse4_1_extractps" + [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,x,x") + (vec_select:SF + (match_operand:V4SF 1 "register_operand" "x,0,x") + (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n")])))] + "TARGET_SSE4_1" + "@ + %vextractps\t{%2, %1, %0|%0, %1, %2} + # + #" + "&& reload_completed && SSE_REG_P (operands[0])" + [(const_int 0)] +{ + rtx dest = gen_rtx_REG (V4SFmode, REGNO (operands[0])); + switch (INTVAL (operands[2])) +{ +case 1: +case 3: + emit_insn (gen_sse_shufps_v4sf (dest, operands[1], operands[1], + operands[2], operands[2], + GEN_INT (INTVAL (operands[2]) + 4), + GEN_INT (INTVAL (operands[2]) + 4))); + break; +case 2: + emit_insn (gen_vec_interleave_highv4sf (dest, operands[1], operands[1])); + break; +default: + /* 0 should be handled by the *vec_extractv4sf_0 pattern above. */ + gcc_unreachable (); +} + DONE; +} + [(set_attr "isa" "*,noavx,avx") + (set_attr "type" "sselog,*,*") + (set_attr "prefix_data16" "1,*,*") + (set_attr "prefix_extra" "1,*,*") + (set_attr "length_immediate" "1,*,*") + (set_attr "prefix" "maybe_vex,*,*") + (set_attr "mode" "V4SF,*,*")]) + +(define_insn_and_split "*vec_extract_v4sf_mem" + [(set (match_operand:SF 0 "register_operand" "=x,*r,f") + (vec_select:SF +(match_operand:V4SF 1 "memory_operand" "o,o,o") +(parallel [(match_operand 2 "const_0_to_3_operand" "n,n,n")])))] + "TARGET_SSE" + "#" + "&& reload_completed" + [(const_int 0)] +{ + int i = INTVAL (operands[2]); + + emit_move_insn (operands[0], adjust_address (operands[1], SFmode, i*4)); + DONE; +}) + (define_expand "avx_vextractf128" [(match_operand: 0 "nonimmediate_operand" "") (match_operand:V_256 1 "register_operand" "") @@ -4044,62 +4100,6 @@ (set_attr "prefix" "vex") (set_attr "mode" "OI")]) -(define_insn_and_split "*sse4_1_extractps" - [(set (match_operand:SF 0 "nonimmediate_operand" "=rm,x,x") - (vec_select:SF - (match_operand:V4SF 1 "register_operand" "x,0,x") - (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n,n")])))] - "TARGET_SSE4_1" - "@ - %vextractps\t{%2, %1, %0|%0, %1, %2} - # - #" - "&& reload_completed && SSE_REG_P (operands[0])" - [(const_int 0)] -{ - rtx dest = gen_rtx_REG (V4SFmode, REGNO (operands[0])); - switch (INTVAL (operands[2])) -{ -case 1: -case 3: - emit_insn (gen_sse_shufps_v4sf (dest, operands[1], operands[1], - operands[2], operands[2], - GEN_INT (INTVAL (operands[2]) + 4), - GEN_INT (INTVAL (operands[2]) + 4))); - break; -case 2: - emit_insn (gen_vec_interleave_highv4sf (dest, operands[1], operands[1])); - break; -default: - /* 0 should be handled by the *vec_extractv4sf_0 pattern above. */ - gcc_unreachable (); -} - DONE; -} - [(set_attr "isa" "*,noavx,avx") - (set_attr "type" "sselog,*,*") - (set_attr "prefix_data16" "1,*,*") - (set_attr "prefix_extra" "1,*,*") - (set_attr "length_immediate" "1,*,*") - (set_attr "prefix" "maybe_vex,*,*") - (set_attr "mode" "V4SF,*,*")]) - -(define_insn_and_split "*vec_extract_v4sf_mem" - [(set (match_operand:SF 0 "register_operand" "=x*rf") - (vec_select:SF -(match_operand:V4SF 1 "memory_operand" "o") -(parallel [(match_operand 2 "const_0_to_3_operand" "n")])))] - "TARGET_SSE" - "#" - "&& reload_completed" - [(const_int 0)] -{ - int i = INTVAL (operands[2]); - - emit_move_insn (operands[0], adjust_address (operands[1], SFmode, i*4)); - DONE; -}) - ;; Modes handled by vec_extract patterns. (define_mode_iterator VEC_EXTRACT_MODE [(V32QI "TARGET_AVX") V16QI
Re: [C++ Patch / RFC] PR 50870
On 10/26/2011 07:00 PM, Paolo Carlini wrote: .. maybe my message wasn't clear, sorry, I'm a bit tired (here it's late): I meant to say that the non_reference tweak fixes the non-template impl class case, but something more is needed for a template impl (thus the new testcase). And, additionally, this issue is a [4.6/4.7 Regression], thus, post 4.6.2, we may be interested in back porting something. Ah. Let me know if you need additional details, or what else. How is the template case failing? Or we want me to look into a different way to attack the template case. Please. Jason
Re: Go patch committed: Implement new syscall package
Rainer Orth writes: > the problem is another one: using /usr/xpg4/bin/awk, I find: > > /usr/xpg4/bin/awk: line 47 (NR=32): wrong number of arguments to function "m" > > nawk(1) only documents match(s,ere) (i.e. two args), and the gawk docs > state: > > `match(STRING, REGEXP [, ARRAY])' > [...] > The ARRAY argument to `match' is a `gawk' extension. In > compatibility mode (*note Options::), using a third argument is a > fatal error. I committed this patch which should fix this problem. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Ian diff -r 5f4f4eae5cd9 libgo/go/syscall/mksyscall.awk --- a/libgo/go/syscall/mksyscall.awk Wed Oct 26 16:19:57 2011 -0700 +++ b/libgo/go/syscall/mksyscall.awk Wed Oct 26 21:53:05 2011 -0700 @@ -44,33 +44,63 @@ blocking = 1 } -if (match($0, "//sys(nb)?[ ]*([a-zA-Z0-9_]+)\\(([^()]*)\\) *(\\(([^()]+)\\))?", gosig) == 0) { +line = $0 + +if (match(line, "//sys(nb)?[ ]*[a-zA-Z0-9_]+\\([^()]*\\) *(\\(([^()]+)\\))?") == 0) { print "unmatched line:", $0 | "cat 1>&2" status = 1 next } -gofnname = gosig[2] -gofnparams = gosig[3] -gofnresults = gosig[5] +# Sets a[1] = //sysnb, a[2] == function name. +split(line, a, "[ (]*") +gofnname = a[2] + +off = match(line, "\\([^()]*\\)") +end = index(substr(line, off, length(line) - off + 1), ")") +gofnparams = substr(line, off + 1, end - 2) + +line = substr(line, off + end, length(line) - (off + end) + 1) +off = match(line, "\\([^()]*\\)") +if (off == 0) { + gofnresults = "" +} else { + end = index(substr(line, off, length(line) - off + 1), ")") + gofnresults = substr(line, off + 1, end - 2) +} getline +line = $0 -if (match($0, "//([a-zA-Z0-9_]+)\\(([^()]*)\\) *(.*)$", csig) == 0) { +if (match(line, "//[a-zA-Z0-9_]+\\([^()]*\\)") == 0) { print "unmatched C line", $0, "after", gofnname | "cat 1>&2" status = 1 next } -cfnname = csig[1] -cfnparams = csig[2] -cfnresult = csig[3] +split(line, a, "[ (]*") +cfnname = substr(a[1], 3, length(a[1]) - 2) + +off = match(line, "\\([^()]*\\)") +end = index(substr(line, off, length(line) - off + 1), ")") +cfnparams = substr(line, off + 1, end - 2) + +line = substr(line, off + end + 1, length(line) - (off + end) + 1) +while (substr(line, 1, 1) == " ") { + line = substr(line, 2, length(line) - 1) +} +end = index(line, " ") +if (end != 0) { + line = substr(line, 1, end) +} +cfnresult = line printf("// Automatically generated wrapper for %s/%s\n", gofnname, cfnname) printf("func c_%s(%s) %s%s__asm__(\"%s\")\n", cfnname, cfnparams, cfnresult, cfnresult == "" ? "" : " ", cfnname) -printf("func %s(%s) %s%s{\n", - gofnname, gofnparams, gosig[4], gosig[4] == "" ? "" : " ") +printf("func %s(%s) %s%s%s%s{\n", + gofnname, gofnparams, gofnresults == "" ? "" : "(", gofnresults, + gofnresults == "" ? "" : ")", gofnresults == "" ? "" : " ") if (blocking) { print "\tentersyscall()" @@ -91,22 +121,22 @@ args = args ", " } - if (match(goargs[goarg], "^([^ ]*) ([^ ]*)$", goparam) == 0) { + if (split(goargs[goarg], a) != 2) { print loc, "bad parameter:", goargs[goarg] | "cat 1>&2" status = 1 next } - goname = goparam[1] - gotype = goparam[2] + goname = a[1] + gotype = a[2] - if (match(cargs[carg], "^([^ ]*) ([^ ]*)$", cparam) == 0) { + if (split(cargs[carg], a) != 2) { print loc, "bad C parameter:", cargs[carg] | "cat 1>&2" status = 1 next } - ctype = cparam[2] + ctype = a[2] if (gotype ~ /^\*/) { if (gotype != ctype) {
[PATCH] Fix thinko in previous sparc setcc changes.
More extensive testing showed that we have to force the usage of v9 scc patterns when comparing DImode values with comparison codes other than EQ and NE. Soon we'll be able to add exceptions this this, because VIS3 has addxc and addxccc instructions which test the 64-bit carry condition. Committed to trunk. gcc/ * config/sparc/sparc.c (emit_scc_insn): Force attempt of v9 sequences if we're comparing DImode and comparison is other than EQ or NE. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180558 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog|3 +++ gcc/config/sparc/sparc.c |5 + 2 files changed, 8 insertions(+), 0 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 3a51510..124b17c 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,8 @@ 2011-10-26 David S. Miller + * config/sparc/sparc.c (emit_scc_insn): Force attempt of v9 sequences + if we're comparing DImode and comparison is other than EQ or NE. + * config/sparc/sparc.c (emit_scc_insn): Do not try v9 sequences until LEU/LTU/GEU/GTU is attempted. * config/sparc/sparc.md (*neg_snesi_sign_extend): New 64-bit insn diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 80e05a6..ea9fdef 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -2541,6 +2541,11 @@ emit_scc_insn (rtx operands[]) } } + if (TARGET_V9 + && GET_MODE (x) == DImode + && gen_v9_scc (operands[0], code, x, y)) +return true; + /* We can do LTU and GEU using the addx/subx instructions too. And for GTU/LEU, if both operands are registers swap them and fall back to the easy case. */ -- 1.7.6.401.g6a319
Re: Reload related segfaults
From: Alan Modra Date: Thu, 27 Oct 2011 13:29:56 +1030 > Some recent patch has exposed a reload bug. I'm seeing I think this might be a side effect or Eric's recent changes, CC:'d.
Reload related segfaults
Some recent patch has exposed a reload bug. I'm seeing libtool: compile: /home/amodra/build/gcc-curr/./gcc/xgcc -B/home/amodra/build/gcc-curr/./gcc/ -B/home/amodra/gnu/powerpc-linux/bin/ -B/home/amodra/gnu/powerpc-linux/lib/ -isystem /home/amodra/gnu/powerpc-linux/include -isystem /home/amodra/gnu/powerpc-linux/sys-include -m64 -fPIC -mstrict-align -DHAVE_CONFIG_H -I. -I/home/amodra/src/gcc-current/libjava/classpath/native/fdlibm -I../../include -fexceptions -fasynchronous-unwind-tables -g -O2 -m64 -fPIC -mstrict-align -MT sf_fabs.lo -MD -MP -MF .deps/sf_fabs.Tpo -c /home/amodra/src/gcc-current/libjava/classpath/native/fdlibm/sf_fabs.c -fPIC -DPIC -o .libs/sf_fabs.o /home/amodra/src/gcc-current/libjava/classpath/native/fdlibm/sf_fabs.c: In function 'fabsf': /home/amodra/src/gcc-current/libjava/classpath/native/fdlibm/sf_fabs.c:33:1: internal compiler error: Segmentation fault (insn 11 9 15 2 (parallel [ (set (subreg:SI (reg:SF 123 [ ]) 0) (and:SI (subreg:SI (reg:SF 33 1 [ x ]) 0) (const_int 2147483647 [0x7fff]))) (clobber (scratch:CC)) ]) /home/amodra/src/gcc-current/libjava/classpath/native/fdlibm/sf_fabs.c:32 146 {andsi3_mc} (expr_list:REG_DEAD (reg:SF 33 1 [ x ]) (nil))) Reloads for insn # 11 Reload 0: GENERAL_REGS, RELOAD_FOR_INPUT_ADDRESS (opnum = 0), can't combine, secondary_reload_p reload_reg_rtx: (reg:SF 8 8) Reload 1: GENERAL_REGS, RELOAD_FOR_OUTPUT_ADDRESS (opnum = 0), can't combine, secondary_reload_p reload_reg_rtx: (reg:SF 8 8) Reload 2: reload_in (SF) = (reg:SF 33 1 [orig:123 ] [123]) reload_out (SF) = (reg:SF 33 1 [orig:123 ] [123]) NON_FLOAT_REGS, RELOAD_OTHER (opnum = 0) reload_in_reg: (reg:SF 33 1 [orig:123 ] [123]) reload_out_reg: (reg:SF 33 1 [orig:123 ] [123]) reload_reg_rtx: (reg:SF 10 10) secondary_in_reload = 0, secondary_out_reload = 1 Reload 3: reload_out (SI) = (subreg:SI (reg:SF 33 1 [orig:123 ] [123]) 0) GENERAL_REGS, RELOAD_FOR_OUTPUT (opnum = 0) reload_out_reg: (subreg:SI (reg:SF 33 1 [orig:123 ] [123]) 0) reload_reg_rtx: (reg:SI 9 9) Reload 4: reload_in (SF) = (reg:SF 33 1 [ x ]) GENERAL_REGS, RELOAD_FOR_INPUT (opnum = 1) reload_in_reg: (reg:SF 33 1 [ x ]) reload_reg_rtx: (reg:SF 8 8) Notice reload 3 has a subreg for reload_out. We wind up in the gen_reload code shown below, and try to use REGNO on a subreg, leading to (gdb) p debug_rtx(out) (reg:DI -166922608) Or at least, that's what you get after fixing print_rtx to not segfault.. Bootstrap and regression test powerpc-linux in progress. OK to apply, assuming no regressions? * reload1.c (gen_reload): Don't use REGNO on SUBREGs. * print-rtl.c (print_rtx): Don't segfault on negative regno. Note that s/REGNO (in_rtx)/value/ in print_rtx is reasonable given that REG in rtl.def has a format of "i00", so XINT (in_rtx, i) is always XINT (in_rtx, 0) for a reg, which is equivalent to REGNO apart from signedness. Index: gcc/reload1.c === --- gcc/reload1.c (revision 180542) +++ gcc/reload1.c (working copy) @@ -8588,28 +8588,28 @@ gen_reload (rtx out, rtx in, int opnum, && reg_or_subregno (in) < FIRST_PSEUDO_REGISTER && (REG_P (out) || (GET_CODE (out) == SUBREG && REG_P (SUBREG_REG (out && reg_or_subregno (out) < FIRST_PSEUDO_REGISTER && SECONDARY_MEMORY_NEEDED (REGNO_REG_CLASS (reg_or_subregno (in)), REGNO_REG_CLASS (reg_or_subregno (out)), GET_MODE (out))) { /* Get the memory to use and rewrite both registers to its mode. */ rtx loc = get_secondary_mem (in, GET_MODE (out), opnum, type); if (GET_MODE (loc) != GET_MODE (out)) - out = gen_rtx_REG (GET_MODE (loc), REGNO (out)); + out = gen_rtx_REG (GET_MODE (loc), reg_or_subregno (out)); if (GET_MODE (loc) != GET_MODE (in)) - in = gen_rtx_REG (GET_MODE (loc), REGNO (in)); + in = gen_rtx_REG (GET_MODE (loc), reg_or_subregno (in)); gen_reload (loc, in, opnum, type); gen_reload (out, loc, opnum, type); } #endif else if (REG_P (out) && UNARY_P (in)) { rtx insn; rtx op1; rtx out_moded; rtx set; Index: gcc/print-rtl.c === --- gcc/print-rtl.c (revision 180542) +++ gcc/print-rtl.c (working copy) @@ -465,13 +465,12 @@ print_rtx (const_rtx in_rtx) int value = XINT (in_rtx, i); const char *name; #ifndef GENERATOR_FILE - if (REG_P (in_rtx) && value < FIRST_PSEUDO_REGISTER) - fprintf (outfile, " %d %s", REGNO (in_rtx), - reg_names[REGNO (in_rtx)]); + if (REG_P (in_rtx)
[pph] Fix some cgraph node merge problems (issue5302068)
This patch fixes some of the problems we had with cgraph nodes for merged functions. When merging a function that had a cgraph node emitted for it, we were ICEing during cgraph allocation because the reader was not checking whether the node already existed for that function. Additionally, the patch removes the computation of hash values for include branches. That was only papering over hashing problems. Instead, I made the hashing key more descriptive by using decl_as_string and the mangled name. We still have several merging problems. The x4tmplfun* and z4tmplfun* failures are different ICEs in template instantiation. Still have not looked what those are. Tested on x86_64. Committed. Diego. * cp-tree.h (get_mangled_id): Declare. * mangle.c (get_mangled_id): Factor out of ... (mangle_decl): ... here. * pph-streamer-in.c (pph_get_include_path_hash): Remove. Update all users. * pph-streamer-out.c (pph_merge_name): Move from pph-streamer.c. Change return type to char *. Call get_mangled_id and decl_as_string. Update users. testsuite/ChangeLog.pph * g++.dg/pph/c4inline.cc: Mark fixed. * g++.dg/pph/x4tmplclass1.cc: Likewise. * g++.dg/pph/x4tmplclass2.cc: Likewise. * g++.dg/pph/z4tmplclass1.cc: Likewise. * g++.dg/pph/z4tmplclass2.cc: Likewise. * g++.dg/pph/x4tmplfuncinln.cc: Change expected failure. * g++.dg/pph/x4tmplfuncninl.cc: Likewise. * g++.dg/pph/z4tmplfuncinln.cc: Likewise. * g++.dg/pph/z4tmplfuncninl.cc: Likewise. diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 705c0a6..297a779 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5782,6 +5782,7 @@ extern tree merge_exception_specifiers(tree, tree, tree); /* in mangle.c */ extern void init_mangle(void); extern void mangle_decl(tree); +extern tree get_mangled_id (tree); extern const char *mangle_type_string (tree); extern tree mangle_typeinfo_for_type (tree); extern tree mangle_typeinfo_string_for_type(tree); diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c index 55851e6..38a1fcb 100644 --- a/gcc/cp/mangle.c +++ b/gcc/cp/mangle.c @@ -3157,13 +3157,21 @@ mangle_decl_string (const tree decl) return result; } +/* Return an identifier for the external mangled name of DECL. */ + +tree +get_mangled_id (tree decl) +{ + tree id = mangle_decl_string (decl); + return targetm.mangle_decl_assembler_name (decl, id); +} + /* Create an identifier for the external mangled name of DECL. */ void mangle_decl (const tree decl) { - tree id = mangle_decl_string (decl); - id = targetm.mangle_decl_assembler_name (decl, id); + tree id = get_mangled_id (decl); SET_DECL_ASSEMBLER_NAME (decl, id); if (G.need_abi_warning) diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index 6c8bec4..6aa301f 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -67,7 +67,7 @@ static VEC(char_p,heap) *string_tables = NULL; static int pph_loc_offset; /* Forward declarations to avoid circularity. */ -static tree pph_in_merge_key_tree (pph_stream *, tree *, hashval_t); +static tree pph_in_merge_key_tree (pph_stream *, tree *); /* stream initialization */ @@ -690,18 +690,17 @@ pph_in_chain (pph_stream *stream) /* Read a chain of AST merge keys from STREAM. Merge each tree - into *CHAIN. IPATH_HASH is the hash value of the include path - from STREAM to the root of the include tree. */ + into *CHAIN. */ static void -pph_in_merge_key_chain (pph_stream *stream, tree *chain, hashval_t ipath_hash) +pph_in_merge_key_chain (pph_stream *stream, tree *chain) { unsigned i; HOST_WIDE_INT count; count = pph_in_hwi (stream); for (i = 0; i < count; i++) -pph_in_merge_key_tree (stream, chain, ipath_hash); +pph_in_merge_key_tree (stream, chain); } @@ -740,13 +739,6 @@ typedef struct { /* Name of the tree (from pph_merge_name). */ const char *name; - - /* Hash value representing the include path starting at the image - where EXPR resides up to the root of the include tree. Objects - found in any of these PPH images do not need to be merged. They - were already emitted as external references and merged when - the PPH images were being generated. */ - hashval_t ipath_hash; } merge_toc_entry; @@ -767,11 +759,6 @@ htab_merge_key_eq (const void *p1, const void *p2) const merge_toc_entry *key1 = (const merge_toc_entry *) p1; const merge_toc_entry *key2 = (const merge_toc_entry *) p2; - /* Matches inside the same include path are not interesting. These - symbols have already been merged. */ - if (key1->ipath_hash == key2->ipath_hash) -return false; - if (key1->context != key2->context) return false; @@ -831,
[PATCH] Add sparc fmaf test.
Committed to trunk. gcc/testsuite/ * gcc.target/sparc/fmaf-1.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180551 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/testsuite/ChangeLog |2 + gcc/testsuite/gcc.target/sparc/fmaf-1.c | 51 +++ 2 files changed, 53 insertions(+), 0 deletions(-) create mode 100644 gcc/testsuite/gcc.target/sparc/fmaf-1.c diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 2879dfe..6078fdd 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,7 @@ 2011-10-26 David S. Miller + * gcc.target/sparc/fmaf-1.c: New test. + * gcc.target/sparc/setcc-1.c: New test. * gcc.target/sparc/setcc-2.c: New test. diff --git a/gcc/testsuite/gcc.target/sparc/fmaf-1.c b/gcc/testsuite/gcc.target/sparc/fmaf-1.c new file mode 100644 index 000..948b926 --- /dev/null +++ b/gcc/testsuite/gcc.target/sparc/fmaf-1.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mfmaf" } */ + +float fmadds (float a, float b, float c) +{ + return a * b + c; +} + +float fmsubs (float a, float b, float c) +{ + return a * b - c; +} + +float fnmadds (float a, float b, float c) +{ + return -(a * b + c); +} + +float fnmsubs (float a, float b, float c) +{ + return -(a * b - c); +} + +double fmaddd (double a, double b, double c) +{ + return a * b + c; +} + +double fmsubd (double a, double b, double c) +{ + return a * b - c; +} + +double fnmaddd (double a, double b, double c) +{ + return -(a * b + c); +} + +double fnmsubd (double a, double b, double c) +{ + return -(a * b - c); +} + +/* { dg-final { scan-assembler "fmadds\t%" } } */ +/* { dg-final { scan-assembler "fmsubs\t%" } } */ +/* { dg-final { scan-assembler "fnmadds\t%" } } */ +/* { dg-final { scan-assembler "fnmsubs\t%" } } */ +/* { dg-final { scan-assembler "fmaddd\t%" } } */ +/* { dg-final { scan-assembler "fmsubd\t%" } } */ +/* { dg-final { scan-assembler "fnmaddd\t%" } } */ +/* { dg-final { scan-assembler "fnmsubd\t%" } } */ -- 1.7.6.401.g6a319
[PATCH] Improve sparc setcc generation and add testcases.
This makes sure the addx/subx sequence is used even when v9 conditional moves are available. It also shows the compiler that the setcc instructions emitting -1, 0, or 1 all extend to DImode. I left all the "*x_{plus,minus}_foo" and related patterns alone for now, I'll get to those and add appropriate test cases as I find time. Committed to trunk. gcc/ * config/sparc/sparc.c (emit_scc_insn): Do not try v9 sequences until LEU/LTU/GEU/GTU is attempted. * config/sparc/sparc.md (*neg_snesi_sign_extend): New 64-bit insn and split. (*neg_seqsi_sign_extend): Likewise. (*sltu_extend_sp64, *neg_sltu_extend_sp64, *sgeu_extend_sp64, *neg_sgeu_extend_sp64): New insns. gcc/testsuite/ * gcc.target/sparc/setcc-1.c: New test. * gcc.target/sparc/setcc-2.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180550 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog|8 gcc/config/sparc/sparc.c | 14 +++ gcc/config/sparc/sparc.md| 59 ++ gcc/testsuite/ChangeLog |5 +++ gcc/testsuite/gcc.target/sparc/setcc-1.c | 39 gcc/testsuite/gcc.target/sparc/setcc-2.c | 39 6 files changed, 156 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/sparc/setcc-1.c create mode 100644 gcc/testsuite/gcc.target/sparc/setcc-2.c diff --git a/gcc/ChangeLog b/gcc/ChangeLog index d6d1382..3a51510 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,13 @@ 2011-10-26 David S. Miller + * config/sparc/sparc.c (emit_scc_insn): Do not try v9 sequences until + LEU/LTU/GEU/GTU is attempted. + * config/sparc/sparc.md (*neg_snesi_sign_extend): New 64-bit insn + and split. + (*neg_seqsi_sign_extend): Likewise. + (*sltu_extend_sp64, *neg_sltu_extend_sp64, *sgeu_extend_sp64, + *neg_sgeu_extend_sp64): New insns. + * config/sparc/sparc-protos.h (sparc_expand_conditional_move): Declare. * config/sparc/sparc.md (movcc, movcc): Call it. (*mov_cc_v9): Normalize to expect operand 0 always in operand 4. diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 964bcaf..80e05a6 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -2541,14 +2541,6 @@ emit_scc_insn (rtx operands[]) } } - /* For the rest, on v9 we can use conditional moves. */ - - if (TARGET_V9) -{ - if (gen_v9_scc (operands[0], code, x, y)) -return true; -} - /* We can do LTU and GEU using the addx/subx instructions too. And for GTU/LEU, if both operands are registers swap them and fall back to the easy case. */ @@ -2573,6 +2565,12 @@ emit_scc_insn (rtx operands[]) return true; } + /* All the posibilities to use addx/subx based sequences has been + exhausted, try for a 3 instruction sequence using v9 conditional + moves. */ + if (TARGET_V9 && gen_v9_scc (operands[0], code, x, y)) +return true; + /* Nope, do branches. */ return false; } diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md index 2bae28e..2b4b2bb 100644 --- a/gcc/config/sparc/sparc.md +++ b/gcc/config/sparc/sparc.md @@ -713,6 +713,22 @@ "" [(set_attr "length" "2")]) +(define_insn_and_split "*neg_snesi_sign_extend" + [(set (match_operand:DI 0 "register_operand" "=r") +(neg:DI (ne:DI (match_operand:SI 1 "register_operand" "r") + (const_int 0 + (clobber (reg:CC CC_REG))] + "TARGET_ARCH64" + "#" + "&& 1" + [(set (reg:CC_NOOV CC_REG) (compare:CC_NOOV (minus:SI (const_int 0) + (match_dup 1)) + (const_int 0))) + (set (match_dup 0) (sign_extend:DI (neg:SI (ltu:SI (reg:CC CC_REG) + (const_int 0)] + "" + [(set_attr "length" "2")]) + (define_insn_and_split "*snedi_zero" [(set (match_operand:DI 0 "register_operand" "=&r") (ne:DI (match_operand:DI 1 "register_operand" "r") @@ -804,6 +820,21 @@ "" [(set_attr "length" "2")]) +(define_insn_and_split "*neg_seqsi_sign_extend" + [(set (match_operand:DI 0 "register_operand" "=r") + (neg:DI (eq:DI (match_operand:SI 1 "register_operand" "r") + (const_int 0 + (clobber (reg:CC CC_REG))] + "TARGET_ARCH64" + "#" + "&& 1" + [(set (reg:CC_NOOV CC_REG) (compare:CC_NOOV (neg:SI (match_dup 1)) + (const_int 0))) + (set (match_dup 0) (sign_extend:DI (neg:SI (geu:SI (reg:CC CC_REG) + (const_int 0)] + "" + [(set_attr "length" "2")]) + (define_insn_and_split "*seqdi_zero" [(set (match_operand:DI 0 "register_operand" "=&r") (eq:DI (match_operand:DI 1 "register_ope
Re: libstdc++/50862 fix deadlock in condition_variable_any
PR libstdc++/50862 * include/std/condition_variable (condition_variable_any::wait): Avoid terminating if relocking user mutex throws during stack-unwinding. * testsuite/30_threads/condition_variable_any/50862.cc: Add dg-require. Fixes two more issues pointed out in the PR comments. Tested x86_64-linux, committed to trunk. Index: include/std/condition_variable === --- include/std/condition_variable (revision 180456) +++ include/std/condition_variable (working copy) @@ -205,7 +205,13 @@ // scoped unlock - unlocks in ctor, re-locks in dtor struct _Unlock { explicit _Unlock(_Lock& __lk) : _M_lock(__lk) { __lk.unlock(); } - ~_Unlock() { _M_lock.lock(); } + ~_Unlock() noexcept(false) + { + if (uncaught_exception()) + __try { _M_lock.lock(); } __catch(...) { } + else + _M_lock.lock(); + } _Lock& _M_lock; }; Index: testsuite/30_threads/condition_variable_any/50862.cc === --- testsuite/30_threads/condition_variable_any/50862.cc(revision 180456) +++ testsuite/30_threads/condition_variable_any/50862.cc(working copy) @@ -4,6 +4,7 @@ // { dg-options " -std=gnu++0x " { target *-*-cygwin *-*-darwin* } } // { dg-require-cstdint "" } // { dg-require-gthreads "" } +// { dg-require-sched-yield "" } // Copyright (C) 2011 Free Software Foundation, Inc. //
[PATCH, devirtualization] Detect the new type in type change detection
Hi, I've been asked by Maxim Kuvyrkov to revive the following patch which has not made it to 4.6. Currently, when type based devirtualization detects a potential type change, it simply gives up on gathering any information on the object in question. This patch adds an attempt to actually detect the new type after the change. Maxim claimed this (and another patch I'll post tomorrow) noticeably improved performance of some real code. I can only offer a rather artificial example in the attachment. When the constructors are inlined but the function multiply_matrices is not, this patch makes the produced executable run for only 7 seconds instead of about 20 on my 4 year old i686 desktop (with -Ofast). Anyway, the patch passes bootstrap and testsuite on x86_64-linux. What do you think, is it a good idea for trunk now? Thanks, Martin 2011-10-21 Martin Jambor * ipa-prop.c (type_change_info): New fields object, known_current_type and multiple_types_encountered. (extr_type_from_vtbl_ptr_store): New function. (check_stmt_for_type_change): Use it, set multiple_types_encountered if the result is different from the previous one. (detect_type_change): Renamed to detect_type_change_1. New parameter comp_type. Set up new fields in tci, build known type jump functions if the new type can be identified. (detect_type_change): New function. * tree.h (DECL_CONTEXT): Comment new use. * testsuite/g++.dg/ipa/devirt-c-1.C: Add dump scans. * testsuite/g++.dg/ipa/devirt-c-2.C: Likewise. * testsuite/g++.dg/ipa/devirt-c-7.C: New test. Index: src/gcc/ipa-prop.c === --- src.orig/gcc/ipa-prop.c +++ src/gcc/ipa-prop.c @@ -271,8 +271,17 @@ ipa_print_all_jump_functions (FILE *f) struct type_change_info { + /* The declaration or SSA_NAME pointer of the base that we are checking for + type change. */ + tree object; + /* If we actually can tell the type that the object has changed to, it is + stored in this field. Otherwise it remains NULL_TREE. */ + tree known_current_type; /* Set to true if dynamic type change has been detected. */ bool type_maybe_changed; + /* Set to true if multiple types have been encountered. known_current_type + must be disregarded in that case. */ + bool multiple_types_encountered; }; /* Return true if STMT can modify a virtual method table pointer. @@ -338,6 +347,49 @@ stmt_may_be_vtbl_ptr_store (gimple stmt) return true; } +/* If STMT can be proved to be an assignment to the virtual method table + pointer of ANALYZED_OBJ and the type associated with the new table + identified, return the type. Otherwise return NULL_TREE. */ + +static tree +extr_type_from_vtbl_ptr_store (gimple stmt, tree analyzed_obj) +{ + tree lhs, t, obj; + + if (!is_gimple_assign (stmt)) +return NULL_TREE; + + lhs = gimple_assign_lhs (stmt); + + if (TREE_CODE (lhs) != COMPONENT_REF) +return NULL_TREE; + obj = lhs; + + if (!DECL_VIRTUAL_P (TREE_OPERAND (lhs, 1))) +return NULL_TREE; + + do +{ + obj = TREE_OPERAND (obj, 0); +} + while (TREE_CODE (obj) == COMPONENT_REF); + if (TREE_CODE (obj) == MEM_REF) +obj = TREE_OPERAND (obj, 0); + if (TREE_CODE (obj) == ADDR_EXPR) +obj = TREE_OPERAND (obj, 0); + if (obj != analyzed_obj) +return NULL_TREE; + + t = gimple_assign_rhs1 (stmt); + if (TREE_CODE (t) != ADDR_EXPR) +return NULL_TREE; + t = get_base_address (TREE_OPERAND (t, 0)); + if (!t || TREE_CODE (t) != VAR_DECL || !DECL_VIRTUAL_P (t)) +return NULL_TREE; + + return DECL_CONTEXT (t); +} + /* Callback of walk_aliased_vdefs and a helper function for detect_type_change to check whether a particular statement may modify the virtual table pointer, and if possible also determine the new type of @@ -352,6 +404,12 @@ check_stmt_for_type_change (ao_ref *ao A if (stmt_may_be_vtbl_ptr_store (stmt)) { + tree type; + type = extr_type_from_vtbl_ptr_store (stmt, tci->object); + if (tci->type_maybe_changed + && type != tci->known_current_type) + tci->multiple_types_encountered = true; + tci->known_current_type = type; tci->type_maybe_changed = true; return true; } @@ -359,19 +417,19 @@ check_stmt_for_type_change (ao_ref *ao A return false; } -/* Detect whether the dynamic type of ARG has changed (before callsite CALL) by - looking for assignments to its virtual table pointer. If it is, return true - and fill in the jump function JFUNC with relevant type information or set it - to unknown. ARG is the object itself (not a pointer to it, unless - dereferenced). BASE is the base of the memory access as returned by - get_ref_base_and_extent, as is the offset. */ + + +/* Like detect_type_change but with extra argument COMP_TYPE which will become + the component type part of new JFUNC of d
Re: [C++ Patch / RFC] PR 50870
.. maybe my message wasn't clear, sorry, I'm a bit tired (here it's late): I meant to say that the non_reference tweak fixes the non-template impl class case, but something more is needed for a template impl (thus the new testcase). And, additionally, this issue is a [4.6/4.7 Regression], thus, post 4.6.2, we may be interested in back porting something. Let me know if you need additional details, or whatelse. Or we want me to look into a different way to attack the template case. Paolo.
Re: [C++ Patch / RFC] PR 50870
On 10/27/2011 12:48 AM, Jason Merrill wrote: On 10/26/2011 06:20 PM, Paolo Carlini wrote: I re-opened this one because: 1- We may want to fix it in 4_6-branch too, it's a regression there too; 2- We are still handling incorrectly the template impl case. For the latter a variant of my old idea still works, fwiw. -object_type = TREE_TYPE (object); +object_type = (TREE_CODE (object) == ARROW_EXPR + ? TREE_OPERAND (object, 0) : TREE_TYPE (object)); This is still wrong. Why not use the same patch for 4.6 as for trunk? I don't understand, sorry: both mainline and 4_6-branch do not handle correctly both testcases, template impl and not. If you want to look into it personally, I can unassign myself, no problem, really. Paolo.
Re: [C++ Patch / RFC] PR 50870
On 10/26/2011 06:20 PM, Paolo Carlini wrote: I re-opened this one because: 1- We may want to fix it in 4_6-branch too, it's a regression there too; 2- We are still handling incorrectly the template impl case. For the latter a variant of my old idea still works, fwiw. - object_type = TREE_TYPE (object); + object_type = (TREE_CODE (object) == ARROW_EXPR + ? TREE_OPERAND (object, 0) : TREE_TYPE (object)); This is still wrong. Why not use the same patch for 4.6 as for trunk? Jason
Re: PR c++/30195
On 10/24/2011 05:53 PM, Fabien Chêne wrote: After looking into it a bit more, I now see things differently. The target_decl is in fact put into the CLASSTYPE_METHOD_VEC at instantiation time, via handle_using_decl, with the correct access. Great. The problem is that while parsing the template (before instantiating), it is not yet there and an error is immediately raised. Ah, I see. We need lookup_member to find the USING_DECL, but if we're skipping function USING_DECLs in lookup_field_1 and they aren't in CLASSTYPE_METHOD_VEC in templates, we don't find it either way. I only see three solutions: 1) perform an additional lookup for using declarations that designate functions in lookup_fnfields_slot to find something even if the CLASSTYPE_METHOD_VEC is empty (what I implemented in the previous patch) 2) inhibate the lookup error at some point. 3) put the decl into the CLASSTYPE_METHOD_VEC before instantiation Honestly, I don't know if 2) is realistic, and how to implement it as well ... 3) seems complicated: in finish_member_declaration, we must put away the decl into TYPE_FIELDS or TYPE_METHODS, but we would like to put the target_decl into TYPE_METHODS (and call add_method), and at the same time put its using decl into TYPE_FIELDS... I don't know if it can theoretically work, but having try it, I think it would need many adjustments. I think #3 is the way to go. Isn't that what happens in non-templates? We don't try to instantiate the CLASSTYPE_METHOD_VEC, we build up a whole new one from the decl list, so having it in both places shouldn't be a problem for instantiation. Jason
Re: [C++ Patch / RFC] PR 50870
... forgot the testcase, sorry. Paolo. // // PR c++/50870 // { dg-options "-std=gnu++0x" } template struct impl { template static T create(); }; template ::template create() -> impl::template create())> struct tester { }; tester*, int, float> ti; template ::template create() -> impl::template create())> int test() { return 0; } int i = test*, int, float>();
Re: Fix gcc.dg/tls/thr-cse-1.c for MinGW target
On Wed, 26 Oct 2011, Kai Tietz wrote: > Hi, > > patch looks fine for mingw 32-bit. With a small nit new test passes > for 64-bit mingw version, too. The cygwin-part looks like the mingw > one. I've committed my original patch as what I can test - you may wish to commit followups after testing them on other Windows targets (and indeed more generally look at whatever Windows-specific test failures there are that I haven't fixed). -- Joseph S. Myers jos...@codesourcery.com
Re: [C++ Patch / RFC] PR 50870
On 10/26/2011 06:27 PM, Jason Merrill wrote: OK. I re-opened this one because: 1- We may want to fix it in 4_6-branch too, it's a regression there too; 2- We are still handling incorrectly the template impl case. For the latter a variant of my old idea still works, fwiw. Thanks, Paolo. / Index: pt.c === --- pt.c(revision 180532) +++ pt.c(working copy) @@ -13706,12 +13706,13 @@ tsubst_copy_and_build (tree t, /* Remember that there was a reference to this entity. */ if (DECL_P (object)) mark_used (object); - object_type = TREE_TYPE (object); + object_type = (TREE_CODE (object) == ARROW_EXPR + ? TREE_OPERAND (object, 0) : TREE_TYPE (object)); member = TREE_OPERAND (t, 1); if (BASELINK_P (member)) member = tsubst_baselink (member, - non_reference (TREE_TYPE (object)), + non_reference (object_type), args, complain, in_decl); else member = tsubst_copy (member, args, complain, in_decl);
Re: Fix gcc.dg/graphite/run-id-1.c for Windows targets
On Oct 26, 2011, at 2:17 PM, Joseph S. Myers wrote: > The test gcc.dg/graphite/run-id-1.c requires more stack space than > Windows targets provide by default. This patch arranges for the > appropriate -Wl,--stack option (as in config/mh-mingw and > config/mh-cygwin) to be used by this test on those targets. Tested > with cross to i686-mingw32. OK to commit? Yeah, I noticed these testcases (there were quite a few that hit me) on my port with a small stack. I fixed it by increasing my stack space for testing, so that I didn't have to go hit up testcase after testcase. I'm wondering if you would get better milage by just having this on the CFLAGS bits in dejagnu someplace, then at least, you'd never have to do this again. I'm fine with either fix you want to put in.
Re: [PATCH][RFC] Re-write LTO option merging
On Oct 26, 2011, at 6:10 AM, Richard Guenther wrote: > This completely rewrites LTO option merging. > Any comments? Wondering if It breaks darwin? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50876 We have: %{Zmultiply_defined*:-multiply_defined %*} in LINK_SPEC. and: ; Various linker options have a -Z added so that they can get to specs ; processing without interference. Note that an option name with a ; prefix that matches another option name, that also takes an ; argument, being mapped to a -Z linker option, needs to be modified ; so the prefix is different, otherwise a '*' after the shorter option ; will match with the longer one. multiply_defined Driver RejectNegative Separate Alias(Zmultiply_defined) in the .opt file. We may well be the only port that does these sorts of tricks. Essentially, we want to allow linker arguments on the command line, and have them never hit any machine independent option processing code, because they match (prefix match). We accomplish this this by hiding them early, doing all the normal processing, and then way late, just before link, turning them back into the normal names.
Re: Fix gcc.dg/graphite/run-id-1.c for Windows targets
2011/10/26 Joseph S. Myers : > The test gcc.dg/graphite/run-id-1.c requires more stack space than > Windows targets provide by default. This patch arranges for the > appropriate -Wl,--stack option (as in config/mh-mingw and > config/mh-cygwin) to be used by this test on those targets. Tested > with cross to i686-mingw32. OK to commit? > > 2011-10-26 Joseph Myers > > * gcc.dg/graphite/run-id-1.c: Pass -Wl,--stack,12582912 option for > Windows targets. > > Index: gcc.dg/graphite/run-id-1.c > === > --- gcc.dg/graphite/run-id-1.c (revision 180200) > +++ gcc.dg/graphite/run-id-1.c (working copy) > @@ -1,3 +1,4 @@ > +/* { dg-options "-Wl,--stack,12582912" { target *-*-mingw* *-*-cygwin* } } */ > /* { dg-require-effective-target size32plus } */ > > void abort (void); > > -- > Joseph S. Myers > jos...@codesourcery.com From my side patch is ok. Thanks, Kai
Re: Fix gcc.dg/tls/thr-cse-1.c for MinGW target
Hi, patch looks fine for mingw 32-bit. With a small nit new test passes for 64-bit mingw version, too. The cygwin-part looks like the mingw one. 2011/10/26 Joseph S. Myers : > gcc.dg/tls/thr-cse-1.c tests that there are not two calls to > __emutls_get_address in the output. Normally this just checks for > emutls_get_address.*emutls_get_address, but on some targets where the > compiler output declares the functions called in some way the testcase > has more specific tests. i?86-*-mingw* produces a .def directive for > ___emutls_get_address so also needs such a special case, which this > patch adds (only for i?86-*-mingw* as I don't know what the output > looks like for other Windows targets). Tested with cross to > i686-mingw32. OK to commit? > > 2011-10-26 Joseph Myers > > * gcc.dg/tls/thr-cse-1.c: For i?86-*-mingw*, check for multiple > calls. > > Index: gcc.dg/tls/thr-cse-1.c > === > --- gcc.dg/tls/thr-cse-1.c (revision 180200) > +++ gcc.dg/tls/thr-cse-1.c (working copy) > @@ -15,9 +15,10 @@ > return a; > } > > -/* { dg-final { scan-assembler-not > "emutls_get_address.*emutls_get_address.*" { target { ! { "*-wrs-vxworks" > "*-*-darwin8" "hppa*-*-hpux*" "spu-*-*" } } } } } */ > +/* { dg-final { scan-assembler-not > "emutls_get_address.*emutls_get_address.*" { target { ! { "*-wrs-vxworks" > "*-*-darwin8" "hppa*-*-hpux*" "spu-*-*" "i?86-*-mingw*" } } } } } */ > /* { dg-final { scan-assembler-not > "call\tL___emutls_get_address.stub.*call\tL___emutls_get_address.stub.*" { > target "*-*-darwin8" } } } */ > /* { dg-final { scan-assembler-not "(b,l|bl) __emutls_get_address.*(b,l|bl) > __emutls_get_address.*" { target "hppa*-*-hpux*" } } } */ > /* { dg-final { scan-assembler-not > "(brsl|brasl)\t__emutls_get_address.*(brsl|brasl)\t__emutls_get_address.*" { > target spu-*-* } } } */ > /* { dg-final { scan-assembler-not "tls_lookup.*tls_lookup.*" { target > *-wrs-vxworks } } } */ +/* { dg-final { scan-assembler-not "call\t___emutls_get_address.*call\t___emutls_get_address" { target { i?86-*-mingw* i?86-*-cygwin* } } } } */ +/* { dg-final { scan-assembler-not "call\t__emutls_get_address.*call\t__emutls_get_address" { target "x86_64-*-mingw*" } } } */ With this change test should pass for all IA mingw-targets and for cygwin, too. Thanks, Kai
[RFC PATCH] Gather vectorization (PR tree-optimization/50789)
Hi! This patch implements gather vectorization with -mavx2, if dr_may_alias (which apparently doesn't use tbaa :(( ) can figure out there is no overlap with stores in the loop (if any). The testcases show what is possible to get vectorized. I chose to add 4 extra (internal only) gather builtins in addition to the 16 ones needed for the intrinsics, because the builtins using different sizes of the index vs. src/mask/ret vectors would complicate the generic code way too much (we don't have a VEC_SELECT_EXPR nor VEC_CONCAT_EXPR and interleaving/extract even/odd is undesirable here). With these 4 extra builtins the generic code always sees same sized src/mask/ret vs. index vectors, either they have same number of units, then just one vgather* insn is needed, or the index has more elements (int index and double/long long load) - then for one loaded index vector there is one vgather* insn using the first half of the index vector and one using the second half of that vector, or long index with float/int load, then two index vectors are processed by two vgather* insns and the result gets concatenated first halves of both results. All this is so far unconditional only, we'd need some tree representation of conditional loads resp. conditional stores (and could already with AVX use vmaskmov* insns for that). Bootstrapped/regtested on x86_64-linux and i686-linux, testcases tested also under sde. Ok for trunk? 2011-10-26 Jakub Jelinek PR tree-optimization/50789 * tree-vect-stmts.c (process_use): Add force argument, avoid exist_non_indexing_operands_for_use_p check if true. (vect_mark_stmts_to_be_vectorized): Adjust callers. Handle STMT_VINFO_GATHER_P. (gen_perm_mask): New function. (perm_mask_for_reverse): Use it. (reverse_vec_element): Rename to... (permute_vec_elements): ... this. Add Y and MASK_VEC arguments, generalize for any permutations. (vectorizable_load): Adjust caller. Handle STMT_VINFO_GATHER_P. * target.def (TARGET_VECTORIZE_BUILTIN_GATHER): New hook. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_GATHER): Document it. * doc/tm.texi: Regenerate. * tree-data-ref.c (initialize_data_dependence_relation, compute_self_dependence): No longer static. * tree-data-ref.h (initialize_data_dependence_relation, compute_self_dependence): New prototypes. * tree-vect-data-refs.c (vect_check_gather): New function. (vect_analyze_data_refs): Detect possible gather load data refs. * tree-vectorizer.h (struct _stmt_vec_info): Add gather_p field. (STMT_VINFO_GATHER_P): Define. (vect_check_gather): New prototype. * config/i386/i386-builtin-types.def: Add types for alternate gather builtins. * config/i386/sse.md (AVXMODE48P_DI): Remove. (VEC_GATHER_MODE): Rename mode_attr to... (VEC_GATHER_IDXSI): ... this. (VEC_GATHER_IDXDI, VEC_GATHER_SRCDI): New mode_attrs. (avx2_gathersi, *avx2_gathersi): Use instead of . (avx2_gatherdi): Use instead of < and instead of VEC_GATHER_MODE on src and mask operands. (*avx2_gatherdi): Likewise. Use VEC_GATHER_MODE iterator instead of AVXMODE48P_DI. (avx2_gatherdi256, *avx2_gatherdi256): Removed. * config/i386/i386.c (enum ix86_builtins): Add IX86_BUILTIN_GATHERALTSIV4DF, IX86_BUILTIN_GATHERALTDIV8SF, IX86_BUILTIN_GATHERALTSIV4DI and IX86_BUILTIN_GATHERALTDIV8SI. (ix86_init_mmx_sse_builtins): Create those builtins. (ix86_expand_builtin): Handle those builtins and adjust expansions of other gather builtins. (ix86_vectorize_builtin_gather): New function. (TARGET_VECTORIZE_BUILTIN_GATHER): Define. * gcc.target/i386/avx2-gather-1.c: New test. * gcc.target/i386/avx2-gather-2.c: New test. * gcc.target/i386/avx2-gather-3.c: New test. --- gcc/tree-vect-stmts.c.jj2011-10-26 14:19:11.0 +0200 +++ gcc/tree-vect-stmts.c 2011-10-26 16:54:23.0 +0200 @@ -332,6 +332,8 @@ exist_non_indexing_operands_for_use_p (t - LIVE_P, RELEVANT - enum values to be set in the STMT_VINFO of the stmt that defined USE. This is done by calling mark_relevant and passing it the WORKLIST (to add DEF_STMT to the WORKLIST in case it is relevant). + - FORCE is true if exist_non_indexing_operands_for_use_p check shouldn't + be performed. Outputs: Generally, LIVE_P and RELEVANT are used to define the liveness and @@ -351,7 +353,8 @@ exist_non_indexing_operands_for_use_p (t static bool process_use (gimple stmt, tree use, loop_vec_info loop_vinfo, bool live_p, -enum vect_relevant relevant, VEC(gimple,heap) **worklist) +enum vect_relevant relevant, VEC(gimple,heap) **worklist, +bool force) { struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); st
Re: [C++ Patch / RFC] PR 50864
On 10/26/2011 10:35 PM, Paolo Carlini wrote: On 10/26/2011 10:30 PM, Paolo Carlini wrote: Hi, At a glance, it looks too early; it's valid to have namespace-qualified names after ->. namespace A { struct B { int i; }; }; A::B* b; int i = b->A::B::i; I was also trying to construct such kind of example myself... but my patch does not regress on the testcase you wrote down. I can tell you exactly why, if you like.. We have that parser->scope is a RECORD_TYPE and postfix_expression is an INDIRECT_REF. In this case, for example (like PR50870): namespace impl { struct inner { template T create(); }; } template () -> impl::inner::create())> struct foo; we are also Ok, code is accepted, because name is a BASELINK and the new check isn't even reached (postfix_expression would be an ARROW_EXPR, but parser->scope again a RECORD_TYPE. More generally, in all the legal tests I tried by hand (outside the testsuite), when we get there parser->scope is always a RECORD_TYPE) But if you feel more comfortable about performing the check elsewhere, I can try that of course. Paolo.
Re: Skip gcc.target/i386/pad-4.c for Windows targets
On Oct 26, 2011, at 2:18 PM, Joseph S. Myers wrote: > gcc.target/i386/pad-4.c expects particular results from PIC code > generation - PIC in the sense of a GOT and a PLT, which is different > from the sense in which Windows code is always PIC. This patch > arranges for it to be skipped on Windows targets, similarly to other > tests such as gcc.target/i386/pic-1.c. Tested with cross to > i686-mingw32. OK to commit? Ok. This one makes my head hurt, just a little.
Re: Fix gcc.target/i386/pr37843-4.c for targets using leading underscores
On Oct 26, 2011, at 2:18 PM, Joseph S. Myers wrote: > gcc.target/i386/pr37843-4.c tests for the form calls to a function > "foo" take in the output. On Windows targets this function has a > leading underscore and so the test fails. > OK to commit? Ok.
[PATCH] Canonicalize sparc movcc patterns such that operand 0 always appears in operand 4.
The background is that I was putting together some test cases for gcc.target/sparc that make sure the most optimal setcc sequences are being generated. When v9, and thus conditional moves, are enabled we sometimes fall back to conditional moves even when the addx/subx sequence is more efficient. For example, we can do "a = x < y" in 2 instructions when x and y are both unsigned: cmp y, x subx%g0, -1, a but with v9 enabled we get: cmp x, y mov 0, a movleu %icc, 1, a The reason is because of the permissive way in which we define the RTL for conditional moves. For the integer moves we mark both operands 3 and 4 using predicate "arith*_operand" which accepts constants as well as registers. The constraints do in fact express how one of the two operands must be equal the operand 0. However combine doesn't look at constraints to determine if a combined instruction is recognized. So combine thinks that insns of the form: (set op0 (if_then_else (CMP X Y) (const_int 0) (const_int 1))) are ok. Reload later fixes things up by reloading one of the two constants into operand 0, and that's how we end up with the above 3 instruction sequence. Initially I tried to fix up the conditional move patterns by using a "match_dup 0" for operand 4 to show the combiner, in the RTL, that operand 4 must be equal to operand 0. But I learned that this is not legal RTL. You can only use match_dup in instructions after the first instruction of an expansion that initially makes use of the operand. Surprisingly this worked most of the time. The failures I got were in a regrename def/use chain assertion check when loop unrolling was enabled. Anyways, instead what I do here is normalize all expansions of conditional moves to be of the form: (set op0 (if_then_else (cmp X Y) op3 op0)) and in the instruction patterns I use "register_operand" and constraint "0" for operand 4. This is enough to keep the combiner from recognizing sequences like the above. Committed to trunk. gcc/ * config/sparc/sparc-protos.h (sparc_expand_conditional_move): Declare. * config/sparc/sparc.md (movcc, movcc): Call it. (*mov_cc_v9): Normalize to expect operand 0 always in operand 4. (*mov_cc_reg_sp64): Likewise. (*movsf_cc_v9): Likewise. (*movsf_cc_reg_sp64): Likewise. (*movdf_cc_v9): Likewise. (*movdf_cc_reg_sp64): Likewise. (*movtf_cc_hq_v9): Likewise. (*movtf_cc_reg_hq_sp64): Likewise. (*movtf_cc_v9): Likewise. (*movtf_cc_reg_sp64): Likewise. * config/sparc/sparc.c (sparc_expand_conditional_move): New function. (sparc_print_operand): Delete 'c' and 'd' handling, no longer used. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@180542 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 17 gcc/config/sparc/sparc-protos.h |1 + gcc/config/sparc/sparc.c| 80 gcc/config/sparc/sparc.md | 197 +-- 4 files changed, 144 insertions(+), 151 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index cd1a236..d6d1382 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,20 @@ +2011-10-26 David S. Miller + + * config/sparc/sparc-protos.h (sparc_expand_conditional_move): Declare. + * config/sparc/sparc.md (movcc, movcc): Call it. + (*mov_cc_v9): Normalize to expect operand 0 always in operand 4. + (*mov_cc_reg_sp64): Likewise. + (*movsf_cc_v9): Likewise. + (*movsf_cc_reg_sp64): Likewise. + (*movdf_cc_v9): Likewise. + (*movdf_cc_reg_sp64): Likewise. + (*movtf_cc_hq_v9): Likewise. + (*movtf_cc_reg_hq_sp64): Likewise. + (*movtf_cc_v9): Likewise. + (*movtf_cc_reg_sp64): Likewise. + * config/sparc/sparc.c (sparc_expand_conditional_move): New function. + (sparc_print_operand): Delete 'c' and 'd' handling, no longer used. + 2011-10-26 Eric Botcazou * reload.c (reload_inner_reg_of_subreg): Change type of return value diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h index bb6fb07..108e105 100644 --- a/gcc/config/sparc/sparc-protos.h +++ b/gcc/config/sparc/sparc-protos.h @@ -107,6 +107,7 @@ extern void sparc_expand_compare_and_swap_12 (rtx, rtx, rtx, rtx); extern const char *output_v8plus_mult (rtx, rtx *, const char *); extern void sparc_expand_vector_init (rtx, rtx); extern void sparc_expand_vec_perm_bmask(enum machine_mode, rtx); +extern bool sparc_expand_conditional_move (enum machine_mode, rtx *); #endif /* RTX_CODE */ #endif /* __SPARC_PROTOS_H__ */ diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index f806c6c..964bcaf 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -81
Fix gcc.target/i386/pr37843-4.c for targets using leading underscores
gcc.target/i386/pr37843-4.c tests for the form calls to a function "foo" take in the output. On Windows targets this function has a leading underscore and so the test fails. This patch allows for that underscore in the test. Tested with cross to i686-mingw32. OK to commit? 2011-10-26 Joseph Myers * gcc.target/i386/pr37843-4.c: Allow for leading underscores on symbol names. Index: gcc.target/i386/pr37843-4.c === --- gcc.target/i386/pr37843-4.c (revision 180200) +++ gcc.target/i386/pr37843-4.c (working copy) @@ -3,8 +3,8 @@ /* { dg-options "-O2 -msse2 -mpreferred-stack-boundary=4 -mstackrealign" } */ /* { dg-require-effective-target sse2 } */ /* { dg-final { scan-assembler-not "andl\[\\t \]*\\$-16,\[\\t \]*%\[re\]?sp" } } */ -/* { dg-final { scan-assembler-not "call\[\\t \]*foo" } } */ -/* { dg-final { scan-assembler "jmp\[\\t \]*foo" } } */ +/* { dg-final { scan-assembler-not "call\[\\t \]*_?foo" } } */ +/* { dg-final { scan-assembler "jmp\[\\t \]*_?foo" } } */ extern int foo (void); -- Joseph S. Myers jos...@codesourcery.com
Skip gcc.target/i386/pad-4.c for Windows targets
gcc.target/i386/pad-4.c expects particular results from PIC code generation - PIC in the sense of a GOT and a PLT, which is different from the sense in which Windows code is always PIC. This patch arranges for it to be skipped on Windows targets, similarly to other tests such as gcc.target/i386/pic-1.c. Tested with cross to i686-mingw32. OK to commit? 2011-10-26 Joseph Myers * gcc.target/i386/pad-4.c: Skip for Windows targets. Index: gcc.target/i386/pad-4.c === --- gcc.target/i386/pad-4.c (revision 180200) +++ gcc.target/i386/pad-4.c (working copy) @@ -2,6 +2,7 @@ /* { dg-require-effective-target ia32 } */ /* { dg-require-effective-target fpic } */ /* { dg-skip-if "" { i?86-*-* x86_64-*-* } { "-march=*" } { "-march=atom" } } */ +/* { dg-skip-if "No Windows PIC" { *-*-mingw* *-*-cygwin } { "*" } { "" } } */ /* { dg-options "-O2 -fomit-frame-pointer -march=atom -fPIC" } */ /* { dg-final { scan-assembler-times "nop" 8 } } */ /* { dg-final { scan-assembler-not "rep" } } */ -- Joseph S. Myers jos...@codesourcery.com
Fix gcc.dg/tls/thr-cse-1.c for MinGW target
gcc.dg/tls/thr-cse-1.c tests that there are not two calls to __emutls_get_address in the output. Normally this just checks for emutls_get_address.*emutls_get_address, but on some targets where the compiler output declares the functions called in some way the testcase has more specific tests. i?86-*-mingw* produces a .def directive for ___emutls_get_address so also needs such a special case, which this patch adds (only for i?86-*-mingw* as I don't know what the output looks like for other Windows targets). Tested with cross to i686-mingw32. OK to commit? 2011-10-26 Joseph Myers * gcc.dg/tls/thr-cse-1.c: For i?86-*-mingw*, check for multiple calls. Index: gcc.dg/tls/thr-cse-1.c === --- gcc.dg/tls/thr-cse-1.c (revision 180200) +++ gcc.dg/tls/thr-cse-1.c (working copy) @@ -15,9 +15,10 @@ return a; } -/* { dg-final { scan-assembler-not "emutls_get_address.*emutls_get_address.*" { target { ! { "*-wrs-vxworks" "*-*-darwin8" "hppa*-*-hpux*" "spu-*-*" } } } } } */ +/* { dg-final { scan-assembler-not "emutls_get_address.*emutls_get_address.*" { target { ! { "*-wrs-vxworks" "*-*-darwin8" "hppa*-*-hpux*" "spu-*-*" "i?86-*-mingw*" } } } } } */ /* { dg-final { scan-assembler-not "call\tL___emutls_get_address.stub.*call\tL___emutls_get_address.stub.*" { target "*-*-darwin8" } } } */ /* { dg-final { scan-assembler-not "(b,l|bl) __emutls_get_address.*(b,l|bl) __emutls_get_address.*" { target "hppa*-*-hpux*" } } } */ /* { dg-final { scan-assembler-not "(brsl|brasl)\t__emutls_get_address.*(brsl|brasl)\t__emutls_get_address.*" { target spu-*-* } } } */ /* { dg-final { scan-assembler-not "tls_lookup.*tls_lookup.*" { target *-wrs-vxworks } } } */ +/* { dg-final { scan-assembler-not "call\t___emutls_get_address.*call\t___emutls_get_address" { target "i?86-*-mingw*" } } } */ -- Joseph S. Myers jos...@codesourcery.com
Fix gcc.dg/graphite/run-id-1.c for Windows targets
The test gcc.dg/graphite/run-id-1.c requires more stack space than Windows targets provide by default. This patch arranges for the appropriate -Wl,--stack option (as in config/mh-mingw and config/mh-cygwin) to be used by this test on those targets. Tested with cross to i686-mingw32. OK to commit? 2011-10-26 Joseph Myers * gcc.dg/graphite/run-id-1.c: Pass -Wl,--stack,12582912 option for Windows targets. Index: gcc.dg/graphite/run-id-1.c === --- gcc.dg/graphite/run-id-1.c (revision 180200) +++ gcc.dg/graphite/run-id-1.c (working copy) @@ -1,3 +1,4 @@ +/* { dg-options "-Wl,--stack,12582912" { target *-*-mingw* *-*-cygwin* } } */ /* { dg-require-effective-target size32plus } */ void abort (void); -- Joseph S. Myers jos...@codesourcery.com
[trans-mem] fix problems with same body aliases
The merge broke base/complete dtor transactional clones that we originally implemented here: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00590.html First, ipa_tm_execute() was putting __comp_dtor's into tm_callees, which wasn't happening pre-merge. Handling this __comp_dtor caused an ICE tree_function_versioning() because ENTRY_BLOCK_PTR_FOR_FUNCTION is invalid, since the CFG field for the original decl's cfun has not been set. The patch below excludes such aliases from tm_callees since we can't handle them in ipa_tm_create_version. Second, ipa_tm_create_version() marks aliases of the original decl as needed in callback_mark_needed(). Unfortunately, this "needed" bit does not persist until the end of compilation because whole program IPA will remove the bit here: /* Frontends and alias code marks nodes as needed before parsing is finished. We may end up marking as node external nodes where this flag is meaningless strip it. */ if (node->needed && (DECL_EXTERNAL (node->decl) || !node->analyzed)) node->needed = 0; Consequently, the alias will not show up in the .tm_clone_table. The patch below sets the "analyzed" bit to keep this from happening. With it, g++.dg/tm/alias.C works again. No regressions elsewhere. OK? * trans-mem.c (ipa_tm_execute): Do not include aliases in tm_callees. (callback_mark_needed): Set analyzed bit. Index: trans-mem.c === --- trans-mem.c (revision 180439) +++ trans-mem.c (working copy) @@ -4137,7 +4137,13 @@ callback_mark_needed (struct cgraph_node record_tm_clone_pair (node->decl, tm_alias); if (info->old_node->needed) - cgraph_mark_needed_node (cgraph_get_node (tm_alias)); + { + struct cgraph_node *alias = cgraph_get_node (tm_alias); + cgraph_mark_needed_node (alias); + /* Needed so function_and_variable_visibility() won't reset +the needed bit. */ + alias->analyzed = 1; + } } return false; } @@ -4592,6 +4598,7 @@ ipa_tm_execute (void) /* For all local functions marked tm_callable, queue them. */ for (node = cgraph_nodes; node; node = node->next) if (is_tm_callable (node->decl) + && !node->alias && cgraph_function_body_availability (node) >= AVAIL_OVERWRITABLE) { d = get_cg_data (node);
Re: [PATCH,fortran] Reap dead code
Steve, > Surely, you have Halloween across the Pond, ie., the Grim Reaper. :-) And what, pray, does the Grim Reaper hold??? The patch is OK. Thanks Paul > >> >> On Wed, Oct 26, 2011 at 7:53 PM, Steve Kargl >> wrote: >> > On Sat, Oct 22, 2011 at 01:16:14PM -0700, Steve Kargl wrote: >> >> The attach patch reaps some code that is now dead >> >> I am sure that you are right but could you confirm that is is because >> of previous patches? > > The tests in simplify.c can never be triggered because these > are already caught in check.c. Consider, > > program a > integer i > i = 12234 > print *, ibclr(i, 123) > end program a > > laptop:kargl[207] gfc4x -o z a.f90 > a.f90:4.20: > > print *, ibclr(i, 123) > 1 > Error: 'pos' at (1) must be less than BIT_SIZE('i') > > This error message comes from check.c(gfc_check_bitfcn), and > it is the error message I get with 4.5.x without my dead.diff > patch. The relevant lines of code in gfc_check_bitfcn are > > if (nonnegative_check ("pos", pos) == FAILURE) > return FAILURE; > > if (less_than_bitsize1 ("i", i, "pos", pos, false) == FAILURE) > return FAILURE; > > This is the chunk of code I removed from simplify.c(gfc_simplify_ibclr) > > - if (gfc_extract_int (y, &pos) != NULL || pos < 0) > - { > - gfc_error ("Invalid second argument of IBCLR at %L", &y->where); > - return &gfc_bad_expr; > - } > + gfc_extract_int (y, &pos); > > The if-statement can never be true. In fact, for the above Fortran > code, gfc_simplify_ibclr is never executed beyonds its first line: > > if (x->expr_type != EXPR_CONSTANT || y->expr_type != EXPR_CONSTANT) > return NULL; > > Here, x is 'i' from the program and y is 'pos'. So, both arguments > to ibclr must be constant for the code that I removed to even have > a chance to execute. > > The other chunks of code I removed are similar in nature. The > checking done by code in check.c makes the checking done in > simplify.c useless bloat. I'm just cleaning up the mess left > behind the person, who introduced gfc_check_bitfcn. > > laptop:kargl[217] svn log -r 160492 check.c |more > > r160492 | kargl | 2010-06-09 09:24:59 -0700 (Wed, 09 Jun 2010) | 23 lines > > 2010-06-09 Steven G. Kargl > > * fortran/intrinsic.c (add_functions): Change gfc_check_btest, > gfc_check_ibclr, and gfc_check_ibset to gfc_check_bitfcn. > * fortran/intrinsic.h: Remove prototypes for gfc_check_btest, > gfc_check_ibclr, and gfc_check_ibset. Add prototype for > gfc_check_bitfcn. > * fortran/check.c (nonnegative_check, less_than_bitsize1, > less_than_bitsize2): New functions. > (gfc_check_btest): Renamed to gfc_check_bitfcn. Use > nonnegative_check and less_than_bitsize1. > (gfc_check_ibclr, gfc_check_ibset): Removed. > (gfc_check_ibits,gfc_check_mvbits): Use nonnegative_check and > less_than_bitsize1. > >> >> due to my recent changes for ishftc in check.c. >> >> Regression tested on i686-*-freebsd. >> >> It looks to be OK, subject to the above, and is, I would guess, >> "obvious" in any case. >> >> Thanks for the patch > > My regression test on i686-*-freebsd. > > === gfortran Summary === > > # of expected passes 39552 > # of unexpected failures 8 > # of unexpected successes 16 > # of expected failures 41 > # of unsupported tests 212 > > The 8 failures are due to entry_4.f90 and select_type_12.f03. > > -- > Steve > -- The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy
[Ada] Internal error on elaboration variable with -gnatct
This is an internal error on the elaboration variable generated for an array of discriminated record type whose discriminant is constraint by a dynamic upper bound in -gnatct mode. In this mode, the compiler doesn't generate code and gigi is invoked only to lay out types and back-annotate type information. Tested on i586-suse-linux, applied on the mainline. 2011-10-26 Eric Botcazou * gcc-interface/decl.c (elaborate_expression_1): Add EXPR_PUBLIC_P local variable. Always create the elaboration variable, if any, as constant. 2011-10-26 Eric Botcazou * gnat.dg/specs/discr1.ads: New test. * gnat.dg/specs/discr1_pkg.ads: New helper. -- Eric Botcazou Index: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 180540) +++ gcc-interface/decl.c (working copy) @@ -6085,7 +6085,8 @@ static tree elaborate_expression_1 (tree gnu_expr, Entity_Id gnat_entity, tree gnu_name, bool definition, bool need_debug) { - const bool expr_global_p = Is_Public (gnat_entity) || global_bindings_p (); + const bool expr_public_p = Is_Public (gnat_entity); + const bool expr_global_p = expr_public_p || global_bindings_p (); bool expr_variable_p, use_variable; /* In most cases, we won't see a naked FIELD_DECL because a discriminant @@ -6153,11 +6154,10 @@ elaborate_expression_1 (tree gnu_expr, E if (use_variable || need_debug) { tree gnu_decl - = create_var_decl (create_concat_name (gnat_entity, - IDENTIFIER_POINTER (gnu_name)), - NULL_TREE, TREE_TYPE (gnu_expr), gnu_expr, - !need_debug, Is_Public (gnat_entity), - !definition, expr_global_p, NULL, gnat_entity); + = create_var_decl_1 + (create_concat_name (gnat_entity, IDENTIFIER_POINTER (gnu_name)), + NULL_TREE, TREE_TYPE (gnu_expr), gnu_expr, true, expr_public_p, + !definition, expr_global_p, !need_debug, NULL, gnat_entity); if (use_variable) return gnu_decl; -- { dg-do compile } -- { dg-options "-gnatct" } with Discr1_Pkg; use Discr1_Pkg; package Discr1 is procedure Proc (V : Variable_String_Array); end Discr1; package Discr1_Pkg is Maximum_Length : Natural := 80 ; subtype String_Length is Natural range 0 .. Maximum_Length; type Variable_String (Length : String_Length := 0) is record S : String (1 .. Length); end record; type Variable_String_Array is array (Natural range <>) of Variable_String; end Discr1_Pkg;
Re: [PATCH,fortran] Reap dead code
On Wed, Oct 26, 2011 at 09:55:09PM +0200, Paul Richard Thomas wrote: > Dear Steve, > > Reaping implies that there is something about it that you want to keep > :-) Surely, weeding or herbicide spraying is better than reaping? Surely, you have Halloween across the Pond, ie., the Grim Reaper. :-) > > On Wed, Oct 26, 2011 at 7:53 PM, Steve Kargl > wrote: > > On Sat, Oct 22, 2011 at 01:16:14PM -0700, Steve Kargl wrote: > >> The attach patch reaps some code that is now dead > > I am sure that you are right but could you confirm that is is because > of previous patches? The tests in simplify.c can never be triggered because these are already caught in check.c. Consider, program a integer i i = 12234 print *, ibclr(i, 123) end program a laptop:kargl[207] gfc4x -o z a.f90 a.f90:4.20: print *, ibclr(i, 123) 1 Error: 'pos' at (1) must be less than BIT_SIZE('i') This error message comes from check.c(gfc_check_bitfcn), and it is the error message I get with 4.5.x without my dead.diff patch. The relevant lines of code in gfc_check_bitfcn are if (nonnegative_check ("pos", pos) == FAILURE) return FAILURE; if (less_than_bitsize1 ("i", i, "pos", pos, false) == FAILURE) return FAILURE; This is the chunk of code I removed from simplify.c(gfc_simplify_ibclr) - if (gfc_extract_int (y, &pos) != NULL || pos < 0) -{ - gfc_error ("Invalid second argument of IBCLR at %L", &y->where); - return &gfc_bad_expr; -} + gfc_extract_int (y, &pos); The if-statement can never be true. In fact, for the above Fortran code, gfc_simplify_ibclr is never executed beyonds its first line: if (x->expr_type != EXPR_CONSTANT || y->expr_type != EXPR_CONSTANT) return NULL; Here, x is 'i' from the program and y is 'pos'. So, both arguments to ibclr must be constant for the code that I removed to even have a chance to execute. The other chunks of code I removed are similar in nature. The checking done by code in check.c makes the checking done in simplify.c useless bloat. I'm just cleaning up the mess left behind the person, who introduced gfc_check_bitfcn. laptop:kargl[217] svn log -r 160492 check.c |more r160492 | kargl | 2010-06-09 09:24:59 -0700 (Wed, 09 Jun 2010) | 23 lines 2010-06-09 Steven G. Kargl * fortran/intrinsic.c (add_functions): Change gfc_check_btest, gfc_check_ibclr, and gfc_check_ibset to gfc_check_bitfcn. * fortran/intrinsic.h: Remove prototypes for gfc_check_btest, gfc_check_ibclr, and gfc_check_ibset. Add prototype for gfc_check_bitfcn. * fortran/check.c (nonnegative_check, less_than_bitsize1, less_than_bitsize2): New functions. (gfc_check_btest): Renamed to gfc_check_bitfcn. Use nonnegative_check and less_than_bitsize1. (gfc_check_ibclr, gfc_check_ibset): Removed. (gfc_check_ibits,gfc_check_mvbits): Use nonnegative_check and less_than_bitsize1. > >> due to my recent changes for ishftc in check.c. > >> Regression tested on i686-*-freebsd. > > It looks to be OK, subject to the above, and is, I would guess, > "obvious" in any case. > > Thanks for the patch My regression test on i686-*-freebsd. === gfortran Summary === # of expected passes39552 # of unexpected failures8 # of unexpected successes 16 # of expected failures 41 # of unsupported tests 212 The 8 failures are due to entry_4.f90 and select_type_12.f03. -- Steve
[Ada] Internal error on unchecked union and representation clause
The compiler aborts on a record type which contains a component of an unchecked union type whose size is smaller than 64 bits and which is subject to a representation clause that causes it not to start on a byte boundary. We already have the machinery to handle this kind of composite component, but it was enabled only for regular record types. Tested on i586-suse-linux, applied on the mainline. 2011-10-26 Eric Botcazou * gcc-interface/decl.c (gnat_to_gnu_entity) : Try to make a packable type for fields of union types as well. : Use RECORD_OR_UNION_TYPE_P predicate. (gnat_to_gnu_component_type): Try to make a packable type for fields of union types as well. (make_packable_type): Use RECORD_OR_UNION_TYPE_P predicate. (maybe_pad_type): Try to make a packable type for fields of union types as well. (gnat_to_gnu_field): Likewise. (is_variable_size): Use RECORD_OR_UNION_TYPE_P predicate. (set_rm_size): Likewise. (rm_size): Likewise. * gcc-interface/misc.c (gnat_type_max_size): Likewise. * gcc-interface/trans.c (add_decl_expr): Likewise. * gcc-interface/utils.c (finish_record_type): Likewise. * gcc-interface/utils2.c (build_simple_component_ref): Likewise. 2011-10-26 Eric Botcazou * gnat.dg/specs/unchecked_union.ads: Rename to... * gnat.dg/specs/unchecked_union1.ads: ...this. * gnat.dg/specs/unchecked_union2.ads: New test. -- Eric Botcazou -- { dg-do compile } package Unchecked_Union2 is type Small_Int is range 0 .. 2**19 - 1; type R1 (B : Boolean := True) is record case B is when True => Data1 : Small_Int; when False => Data2 : Small_Int; end case; end record; for R1 use record Data1 at 0 range 0 .. 18; Data2 at 0 range 0 .. 18; end record; for R1'Size use 24; pragma Unchecked_Union (R1); type R2 is record Data : R1; end record; for R2 use record Data at 0 range 3 .. 26; end record; end Unchecked_Union2; Index: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 180539) +++ gcc-interface/decl.c (working copy) @@ -3302,7 +3302,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit == INTEGER_CST) { gnu_size = DECL_SIZE (gnu_old_field); - if (TREE_CODE (gnu_field_type) == RECORD_TYPE + if (RECORD_OR_UNION_TYPE_P (gnu_field_type) && !TYPE_FAT_POINTER_P (gnu_field_type) && host_integerp (TYPE_SIZE (gnu_field_type), 1)) gnu_field_type @@ -4645,13 +4645,11 @@ gnat_to_gnu_entity (Entity_Id gnat_entit tree size; /* If a size was specified, take it into account. Otherwise - use the RM size for records as the type size has already - been adjusted to the alignment. */ + use the RM size for records or unions as the type size has + already been adjusted to the alignment. */ if (gnu_size) size = gnu_size; - else if ((TREE_CODE (gnu_type) == RECORD_TYPE - || TREE_CODE (gnu_type) == UNION_TYPE - || TREE_CODE (gnu_type) == QUAL_UNION_TYPE) + else if (RECORD_OR_UNION_TYPE_P (gnu_type) && !TYPE_FAT_POINTER_P (gnu_type)) size = rm_size (gnu_type); else @@ -5300,7 +5298,7 @@ gnat_to_gnu_component_type (Entity_Id gn && !Is_Bit_Packed_Array (gnat_array) && !Has_Aliased_Components (gnat_array) && !Strict_Alignment (gnat_type) - && TREE_CODE (gnu_type) == RECORD_TYPE + && RECORD_OR_UNION_TYPE_P (gnu_type) && !TYPE_FAT_POINTER_P (gnu_type) && host_integerp (TYPE_SIZE (gnu_type), 1)) gnu_type = make_packable_type (gnu_type, false); @@ -6357,9 +6355,7 @@ make_packable_type (tree type, bool in_r tree new_field_type = TREE_TYPE (old_field); tree new_field, new_size; - if ((TREE_CODE (new_field_type) == RECORD_TYPE - || TREE_CODE (new_field_type) == UNION_TYPE - || TREE_CODE (new_field_type) == QUAL_UNION_TYPE) + if (RECORD_OR_UNION_TYPE_P (new_field_type) && !TYPE_FAT_POINTER_P (new_field_type) && host_integerp (TYPE_SIZE (new_field_type), 1)) new_field_type = make_packable_type (new_field_type, true); @@ -6369,9 +6365,7 @@ make_packable_type (tree type, bool in_r packable version of the record type, see finish_record_type. */ if (!DECL_CHAIN (old_field) && !TYPE_PACKED (type) - && (TREE_CODE (new_field_type) == RECORD_TYPE - || TREE_CODE (new_field_type) == UNION_TYPE - || TREE_CODE (new_field_type) == QUAL_UNION_TYPE) + && RECORD_OR_UNION_TYPE_P (new_field_type) && !TYPE_FAT_POINTER_P (new_field_type) && !TYPE_CONTAINS_TEMPLATE_P (new_field_type) && TYPE_ADA_SIZE (new_field_type)) @@ -6533,7 +6527,7 @@ maybe_pad_type (tree type, tree size, un between them and it might be hard to overcome afterwards, including at the RTL level when the stand-alone
Re: [C++ Patch / RFC] PR 50864
On 10/26/2011 10:30 PM, Paolo Carlini wrote: Hi, At a glance, it looks too early; it's valid to have namespace-qualified names after ->. namespace A { struct B { int i; }; }; A::B* b; int i = b->A::B::i; I was also trying to construct such kind of example myself... but my patch does not regress on the testcase you wrote down. I can tell you exactly why, if you like.. We have that parser->scope is a RECORD_TYPE and postfix_expression is an INDIRECT_REF. Paolo.
[Ada] Missing error for invalid atomic component
The compiler should issue the "atomic access cannot be guaranteed" error twice on the attached testcase, but it only issues it for the stand-alone variable. Fixed thusly, tested on i586-suse-linux, applied on the mainline. 2011-10-26 Eric Botcazou * gcc-interface/decl.c (gnat_to_gnu_field): Always check components declared as atomic. Move around conditionally executed code. 2011-10-26 Eric Botcazou * gnat.dg/specs/atomic1.ads: New test. -- Eric Botcazou Index: gcc-interface/decl.c === --- gcc-interface/decl.c (revision 180423) +++ gcc-interface/decl.c (working copy) @@ -6853,10 +6853,8 @@ gnat_to_gnu_field (Entity_Id gnat_field, } } - /* If we are packing the record and the field is BLKmode, round the - size up to a byte boundary. */ - if (packed && TYPE_MODE (gnu_field_type) == BLKmode && gnu_size) -gnu_size = round_up (gnu_size, BITS_PER_UNIT); + if (Is_Atomic (gnat_field)) +check_ok_for_atomic (gnu_field_type, gnat_field, false); if (Present (Component_Clause (gnat_field))) { @@ -6946,9 +6944,6 @@ gnat_to_gnu_field (Entity_Id gnat_field, gnu_pos = NULL_TREE; } } - - if (Is_Atomic (gnat_field)) - check_ok_for_atomic (gnu_field_type, gnat_field, false); } /* If the record has rep clauses and this is the tag field, make a rep @@ -6961,7 +6956,14 @@ gnat_to_gnu_field (Entity_Id gnat_field, } else -gnu_pos = NULL_TREE; +{ + gnu_pos = NULL_TREE; + + /* If we are packing the record and the field is BLKmode, round the + size up to a byte boundary. */ + if (packed && TYPE_MODE (gnu_field_type) == BLKmode && gnu_size) + gnu_size = round_up (gnu_size, BITS_PER_UNIT); +} /* We need to make the size the maximum for the type if it is self-referential and an unconstrained type. In that case, we can't -- { dg-do compile } package Atomic1 is type Arr is array (Integer range <>) of Boolean; type UA is access all Arr; U : UA; pragma Atomic (U); -- { dg-error "atomic access" } type R is record U : UA; pragma Atomic (U); -- { dg-error "atomic access" } end record; end Atomic1;
Re: [C++ Patch / RFC] PR 50864
Hi, At a glance, it looks too early; it's valid to have namespace-qualified names after ->. namespace A { struct B { int i; }; }; A::B* b; int i = b->A::B::i; I was also trying to construct such kind of example myself... but my patch does not regress on the testcase you wrote down. I can tell you exactly why, if you like.. Paolo.
Re: [C++ Patch / RFC] PR 50864
On 10/26/2011 04:04 PM, Paolo Carlini wrote: The below tries to catch the problem very early, in cp_parser_postfix_dot_deref_expression and apparently works fine, passes the testsuite, etc. Is it too early? Is the check tight enough? At a glance, it looks too early; it's valid to have namespace-qualified names after ->. namespace A { struct B { int i; }; }; A::B* b; int i = b->A::B::i; Jason
Re: [C++-11] User defined literals
On 10/26/2011 04:11 PM, 3dw...@verizon.net wrote: Thank you Jason and Tom for your help in getting this together an putting up with my slowness. Thanks for all your work on this! Jason
Re: Use of vector instructions in memmov/memset expanding
Any questions on these patches? Are they ok for the trunk? On 20 October 2011 12:37, Michael Zolotukhin wrote: > And, finally, part with the tests. > > On 20 October 2011 12:36, Michael Zolotukhin > wrote: >> Back-end part of the patch is attached here. >> >> On 20 October 2011 12:35, Michael Zolotukhin >> wrote: >>> Middle-end part of the patch is attached. >>> >>> On 20 October 2011 12:34, Michael Zolotukhin >>> wrote: I fixed the tests as well as updated my branch and fixed introduced during this process bugs. Here is fixed complete patch (other parts will be sent in consequent letters). The changes passed bootstrap and make check. On 29 September 2011 15:21, Jakub Jelinek wrote: > Hi! > > On Thu, Sep 29, 2011 at 03:14:40PM +0400, Michael Zolotukhin wrote: > +/* { dg-options "-O2 -march=atom -mtune=atom -m64 -dp" } */ > > The testcases are wrong, -m64 or -m32 should never appear in dg-options, > instead if the testcase is specific to -m64, it should be guarded with > /* { dg-do compile { target lp64 } } */ > resp. ia32 (or ilp32, depending on what exactly should be done for -mx32), > if you have the same testcase for -m32 and -m64, but just want different > scan-assembler for the two cases, then just guard the scan-assembler > with lp64 resp. ia32/ilp32 target and add second one for the other target. > > Jakub -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation. >>> >>> >>> >>> -- >>> --- >>> Best regards, >>> Michael V. Zolotukhin, >>> Software Engineer >>> Intel Corporation. >>> >> >> >> >> -- >> --- >> Best regards, >> Michael V. Zolotukhin, >> Software Engineer >> Intel Corporation. >> > > > > -- > --- > Best regards, > Michael V. Zolotukhin, > Software Engineer > Intel Corporation. > -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: Re: [C++-11] User defined literals
Oct 26, 2011 03:39:09 PM, ja...@redhat.com wrote: On 10/26/2011 02:00 AM, Ed Smith-Rowland wrote: > The patch was bootstrapped and regtested on x86_64-linux-gnu. Really? I ran into a warning about the unused "suffix" parameter to interpret_integer. So I've fixed that error. I also added a couple of comments, and implemented the change to check_literal_operator_args that I wondered about a while back. And checked it all in. But we aren't quite done, I think: I notice that the lookup of operators doesn't match what's in 2.14.8. For instance, I don't think this should be accepted: double operator"" _foo (long long unsigned); double d = 1.2_foo; The lookup described in 2.14.8 involves looking through the overload set for a particular signature before doing normal overload resolution. Also, we don't need to worry about argument-dependent lookup for these operators, since none of the arguments can have associated namespaces. So I think we can use lookup_name rather than lookup_function_nonclass, only look it up once in cp_userdef_numeric_literal, and then only build one call depending on the contents of the overload set. Jason Jason, Thank you Jason and Tom for your help in getting this together an putting up with my slowness. That warning about unused suffix didn't blow up the build. I'm surprised Werror didn't kill it. I'll check out these other issues as soon as I clean out all my work and rebuild on a clean tree. I guess I need to look at what kind of number I get when processing a numeric literal. Thanks again.
[C++ Patch / RFC] PR 50864
Hi, one more / RFC, for the ICE on invalid part of these issues with '->'. The below tries to catch the problem very early, in cp_parser_postfix_dot_deref_expression and apparently works fine, passes the testsuite, etc. Is it too early? Is the check tight enough? Thanks, Paolo. // Index: testsuite/g++.dg/template/crash109.C === --- testsuite/g++.dg/template/crash109.C(revision 0) +++ testsuite/g++.dg/template/crash109.C(revision 0) @@ -0,0 +1,10 @@ +// PR c++/50864 + +namespace impl +{ + template T create(); +} + +template () -> impl::create())> // { dg-error "not a class member" } +struct foo; Index: cp/parser.c === --- cp/parser.c (revision 180532) +++ cp/parser.c (working copy) @@ -5673,6 +5673,15 @@ cp_parser_postfix_dot_deref_expression (cp_parser { if (name != error_mark_node && !BASELINK_P (name) && parser->scope) { + if (TREE_CODE (parser->scope) == NAMESPACE_DECL + && TREE_CODE (postfix_expression) == ARROW_EXPR) + { + error_at (token->location, "%<%D::%D%> is not a class member", + parser->scope, name); + parser->context->object_type = NULL_TREE; + return error_mark_node; + } + name = build_qualified_name (/*type=*/NULL_TREE, parser->scope, name,
Re: [PATCH,fortran] Reap dead code
Dear Steve, Reaping implies that there is something about it that you want to keep :-) Surely, weeding or herbicide spraying is better than reaping? On Wed, Oct 26, 2011 at 7:53 PM, Steve Kargl wrote: > On Sat, Oct 22, 2011 at 01:16:14PM -0700, Steve Kargl wrote: >> The attach patch reaps some code that is now dead I am sure that you are right but could you confirm that is is because of previous patches? >> due to my recent changes for ishftc in check.c. >> Regression tested on i686-*-freebsd. It looks to be OK, subject to the above, and is, I would guess, "obvious" in any case. Thanks for the patch Paul
Re: [PATCH 0/6] More vector permutation work
On 26 Oct 2011, at 17:01, Richard Henderson wrote: On 10/26/2011 07:30 AM, Ulrich Weigand wrote: This fails since for u == 4 and mode == V4SFmode it attempts to expand a V4SFmode shift, which is unsupported. Shouldn't this be using the mode of the selector rather than the mode of the result in any case? Yes, it should use the mode of the selector. And doing just that is enough to fix the bug. But I noticed that the actual results for a constant permutation were much nastier than they ought to be. Try this. Iain, this might solve your case too; yes - bootstrap incl fortran just finished (not the quickest machine ;)) no I can't test myself off cross, because libgfortran needs the whole cross-env -- system headers and everything. Ah yes .. as does Ada .. and ObjC. I recently (when trying to figure out what was broken w.ppc Ada) did some Darwin crosses from Linux, not too too painful to set up (but then I have the stuff to hand), but in essence prob. no worse than d/ l a bunch of pkg or whatever to set up a foreign linux cross-root. cheers Iain
Re: [C++-11] User defined literals
On 10/26/2011 02:00 AM, Ed Smith-Rowland wrote: The patch was bootstrapped and regtested on x86_64-linux-gnu. Really? I ran into a warning about the unused "suffix" parameter to interpret_integer. So I've fixed that error. I also added a couple of comments, and implemented the change to check_literal_operator_args that I wondered about a while back. And checked it all in. But we aren't quite done, I think: I notice that the lookup of operators doesn't match what's in 2.14.8. For instance, I don't think this should be accepted: double operator"" _foo (long long unsigned); double d = 1.2_foo; The lookup described in 2.14.8 involves looking through the overload set for a particular signature before doing normal overload resolution. Also, we don't need to worry about argument-dependent lookup for these operators, since none of the arguments can have associated namespaces. So I think we can use lookup_name rather than lookup_function_nonclass, only look it up once in cp_userdef_numeric_literal, and then only build one call depending on the contents of the overload set. Jason commit e2fe821867bdc55a300f400fe8340a8d7168fb46 Author: Jason Merrill Date: Wed Oct 26 13:38:05 2011 -0400 interpret_int diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c index 46c0340..baee8eb 100644 --- a/gcc/c-family/c-lex.c +++ b/gcc/c-family/c-lex.c @@ -44,7 +44,7 @@ static splay_tree file_info_tree; int pending_lang_change; /* If we need to switch languages - C++ only */ int c_header_level; /* depth in C headers - C++ only */ -static tree interpret_integer (const cpp_token *, unsigned int, const char *); +static tree interpret_integer (const cpp_token *, unsigned int); static tree interpret_float (const cpp_token *, unsigned int, const char *); static tree interpret_fixed (const cpp_token *, unsigned int); static enum integer_type_kind narrowest_unsigned_type @@ -329,7 +329,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags, Set PURE_ZERO to pass this information to the C++ parser. */ if (tok->val.str.len == 1 && *tok->val.str.text == '0') add_flags = PURE_ZERO; - *value = interpret_integer (tok, flags, suffix); + *value = interpret_integer (tok, flags); break; case CPP_N_FLOATING: @@ -584,11 +584,9 @@ narrowest_signed_type (unsigned HOST_WIDE_INT low, return itk_none; } -/* Interpret TOKEN, an integer with FLAGS as classified by cpplib. - For C++0X SUFFIX may contain a user-defined literal suffix. */ +/* Interpret TOKEN, an integer with FLAGS as classified by cpplib. */ static tree -interpret_integer (const cpp_token *token, unsigned int flags, - const char *suffix) +interpret_integer (const cpp_token *token, unsigned int flags) { tree value, type; enum integer_type_kind itk; commit 9c14fb264df67fdc3a28b47c67991345af634d85 Author: Jason Merrill Date: Wed Oct 26 12:36:58 2011 -0400 build_string comments diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c index baee8eb..7b220ab 100644 --- a/gcc/c-family/c-lex.c +++ b/gcc/c-family/c-lex.c @@ -344,6 +344,8 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags, { tree suffix_id = get_identifier (suffix); int len = tok->val.str.len - strlen (suffix); + /* If this is going to be used as a C string to pass to a + raw literal operator, we need to add a trailing NUL. */ tree num_string = build_string (len + 1, (const char *) tok->val.str.text); TREE_TYPE (num_string) = char_array_type_node; diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 5ba5008..860556c 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -8592,7 +8592,7 @@ grokdeclarator (const cp_declarator *declarator, error ("declaration of %qD as non-function", dname); return error_mark_node; } - + if (dname && TREE_CODE (dname) == IDENTIFIER_NODE && UDLIT_OPER_P (dname) diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 840a30d..090482c 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -3667,6 +3667,7 @@ cp_parser_userdef_numeric_literal (cp_parser *parser) /* Parse a user-defined string constant. Returns a call to a user-defined literal operator taking a character pointer and the length of the string as arguments. */ + static tree cp_parser_userdef_string_literal (cp_token *token) { diff --git a/gcc/tree.c b/gcc/tree.c index 64c4968..2cbd68b 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -1525,6 +1525,7 @@ build_real_from_int_cst (tree type, const_tree i) /* Return a newly constructed STRING_CST node whose value is the LEN characters at STR. + Note that for a C string literal, LEN should include the trailing NUL. The TREE_TYPE is not initialized. */ tree commit d083a0d7f94fb0fe3605d499366b1b637e169c17 Author: Jason Merrill Date: Wed Oct 26 12:37:32 2011 -0400 * typeck.c (check_literal_operator_args): Avoid building types. diff --gi
[PATCH] Pass through jump functions for addressable (scalar) parameters
Hi, Honza requested that we do attempt to produce pass-through jump functions even when the actual formal parameter that is being passed on is addressable - provided that we can prove it has not changed value, of course. The following patch does this, hopefully the limitations of our parameter-not-modified mechanism are not too strict for this purpose. The patch is supposed to be applied on top of the one I have just sent that renames all parm_infos to parm_ainfos. I have successfully bootstrapped and tested it on x86_64-linux. OK for trunk? Thanks, Martin 2011-10-26 Martin Jambor * ipa-prop.c (mark_modified): Moved up in the file. (is_parm_modified_before_call): Renamed to is_parm_modified_before_stmt, moved up in the file. (load_from_unmodified_param): New function. (compute_complex_assign_jump_func): Also attempt to create pass through jump functions for values loaded from (addressable) parameters. * testsuite/gcc.dg/ipa/ipcp-4.c: New test. Index: src/gcc/testsuite/gcc.dg/ipa/ipcp-4.c === --- /dev/null +++ src/gcc/testsuite/gcc.dg/ipa/ipcp-4.c @@ -0,0 +1,68 @@ +/* Test that IPA-CP is able to produce a pass-through jump function for the + call of g1 and g2 even though a is addressable. Also test that h is not + cloned. */ + +/* { dg-do compile } */ +/* { dg-options "-O3 -fipa-cp -fipa-cp-clone -fdump-ipa-cp -fno-early-inlining" } */ +/* { dg-add-options bind_pic_locally } */ + +extern void use_stuff (int); +extern void use_pointer (int *); + +static int +h (int a, int b) +{ + int i; + + for (i = 8; i <= b; i++) +use_stuff (a+8); +} + +static int +g1 (int a, int b) +{ + int i; + + for (i = 0; i <= b; i++) +use_pointer (&a); + h (a, b); +} + +static int +g2 (int a, int b) +{ + int i; + + for (i = 4; i <= b; i += 2) +use_stuff (a); +} + + +static void +f (int a, int z) +{ + if (z > 1) +g1 (a, z); + else +g2 (a + 4, z); + use_pointer (&a); +} + +int +main (int argc, char *argv[]) +{ + int i; + for (i = 0; i < 100; i++) +f (7, argc); + return 0; +} + + +/* { dg-final { scan-ipa-dump "Creating a specialized node of g1.*for all known contexts" "cp" } } */ +/* { dg-final { scan-ipa-dump "Creating a specialized node of g2.*for all known contexts" "cp" } } */ +/* { dg-final { scan-ipa-dump-not "Creating a specialized node of h.*for all known contexts" "cp" } } */ +/* { dg-final { scan-ipa-dump-times "replacing param a with const 7" 2 "cp" } } */ +/* { dg-final { scan-ipa-dump "replacing param a with const 11" "cp" } } */ +/* { dg-final { cleanup-ipa-dump "cp" } } */ + + Index: src/gcc/ipa-prop.c === --- src.orig/gcc/ipa-prop.c +++ src/gcc/ipa-prop.c @@ -419,31 +419,105 @@ detect_type_change_ssa (tree arg, gimple return detect_type_change (arg, arg, call, jfunc, 0); } +/* Callback of walk_aliased_vdefs. Flags that it has been invoked to the + boolean variable pointed to by DATA. */ + +static bool +mark_modified (ao_ref *ao ATTRIBUTE_UNUSED, tree vdef ATTRIBUTE_UNUSED, +void *data) +{ + bool *b = (bool *) data; + *b = true; + return true; +} + +/* Return true if the formal parameter PARM might have been modified in this + function before reaching the statement STMT. PARM_AINFO is a pointer to a + structure containing temporary information about PARM. */ + +static bool +is_parm_modified_before_stmt (struct param_analysis_info *parm_ainfo, + gimple stmt, tree parm) +{ + bool modified = false; + ao_ref refd; + + if (parm_ainfo->modified) +return true; + + ao_ref_init (&refd, parm); + walk_aliased_vdefs (&refd, gimple_vuse (stmt), mark_modified, + &modified, &parm_ainfo->visited_statements); + if (modified) +{ + parm_ainfo->modified = true; + return true; +} + return false; +} + +/* If STMT is an assignment that loads a value from an parameter declaration, + return the index of the parameter in ipa_node_params which has not been + modified. Otherwise return -1. */ + +static int +load_from_unmodified_param (struct ipa_node_params *info, + struct param_analysis_info *parms_ainfo, + gimple stmt) +{ + int index; + tree op1; + + if (!gimple_assign_single_p (stmt) + || gimple_assign_cast_p (stmt)) +return -1; + + op1 = gimple_assign_rhs1 (stmt); + index = ipa_get_param_decl_index (info, op1); + if (index < 0 + || is_parm_modified_before_stmt (&parms_ainfo[index], stmt, op1)) +return -1; + + return index; +} /* Given that an actual argument is an SSA_NAME (given in NAME) and is a result of an assignment statement STMT, try to find out whether NAME can be described by a (possibly polynomial) pass-through jump-function or an - ancestor jump function and if so, write
[PATCH] Mechanical renaming in ipa-prop.c
Hi, I've worked with the parameter-has-not-been-modified mechanism in ipa-prop.c after not seeing for quite some time and found the original "infos" and temporary "parm_infos" confusing. Thus I at least renamed all the parm_infos (which are deallocated after the analysis ends and which have type "struct param_analysis_info *") to parm_ainfos. Bootstrapped and tested on x86_64-linux without any issues. OK for trunk? Thanks, Martin 2011-10-25 Martin Jambor * ipa-prop.c (compute_pass_through_member_ptrs): Rename parm_info to parm_ainfo. (ipa_compute_jump_functions_for_edge): Likewise. (ipa_compute_jump_functions): Likewise. (ipa_analyze_indirect_call_uses): Likewise. (ipa_analyze_call_uses): Likewise. (ipa_analyze_params_uses): Likewise. (ipa_analyze_node): Likewise. Index: src/gcc/ipa-prop.c === --- src.orig/gcc/ipa-prop.c +++ src/gcc/ipa-prop.c @@ -794,7 +794,7 @@ is_parm_modified_before_call (struct par static bool compute_pass_through_member_ptrs (struct ipa_node_params *info, - struct param_analysis_info *parms_info, + struct param_analysis_info *parms_ainfo, struct ipa_edge_args *args, gimple call) { @@ -813,7 +813,8 @@ compute_pass_through_member_ptrs (struct int index = ipa_get_param_decl_index (info, arg); gcc_assert (index >=0); - if (!is_parm_modified_before_call (&parms_info[index], call, arg)) + if (!is_parm_modified_before_call (&parms_ainfo[index], call, +arg)) { struct ipa_jump_func *jfunc = ipa_get_ith_jump_func (args, num); @@ -968,7 +969,7 @@ compute_cst_member_ptr_arguments (struct to this callsite. */ static void -ipa_compute_jump_functions_for_edge (struct param_analysis_info *parms_info, +ipa_compute_jump_functions_for_edge (struct param_analysis_info *parms_ainfo, struct cgraph_edge *cs) { struct ipa_node_params *info = IPA_NODE_REF (cs->caller); @@ -985,7 +986,7 @@ ipa_compute_jump_functions_for_edge (str /* Let's check whether there are any potential member pointers and if so, whether we can determine their functions as pass_through. */ - if (!compute_pass_through_member_ptrs (info, parms_info, args, call)) + if (!compute_pass_through_member_ptrs (info, parms_ainfo, args, call)) return; /* Finally, let's check whether we actually pass a new constant member @@ -998,7 +999,7 @@ ipa_compute_jump_functions_for_edge (str static void ipa_compute_jump_functions (struct cgraph_node *node, - struct param_analysis_info *parms_info) + struct param_analysis_info *parms_ainfo) { struct cgraph_edge *cs; @@ -1010,11 +1011,11 @@ ipa_compute_jump_functions (struct cgrap functions unless they may become known during lto/whopr. */ if (!callee->analyzed && !flag_lto) continue; - ipa_compute_jump_functions_for_edge (parms_info, cs); + ipa_compute_jump_functions_for_edge (parms_ainfo, cs); } for (cs = node->indirect_calls; cs; cs = cs->next_callee) -ipa_compute_jump_functions_for_edge (parms_info, cs); +ipa_compute_jump_functions_for_edge (parms_ainfo, cs); } /* If RHS looks like a rhs of a statement loading pfn from a member @@ -,7 +1112,7 @@ ipa_note_param_call (struct cgraph_node } /* Analyze the CALL and examine uses of formal parameters of the caller NODE - (described by INFO). PARMS_INFO is a pointer to a vector containing + (described by INFO). PARMS_AINFO is a pointer to a vector containing intermediate information about each formal parameter. Currently it checks whether the call calls a pointer that is a formal parameter and if so, the parameter is marked with the called flag and an indirect call graph edge @@ -1170,7 +1171,7 @@ ipa_note_param_call (struct cgraph_node static void ipa_analyze_indirect_call_uses (struct cgraph_node *node, struct ipa_node_params *info, - struct param_analysis_info *parms_info, + struct param_analysis_info *parms_ainfo, gimple call, tree target) { gimple def; @@ -1283,7 +1284,7 @@ ipa_analyze_indirect_call_uses (struct c return; index = ipa_get_param_decl_index (info, rec); - if (index >= 0 && !is_parm_modified_before_call (&parms_info[index], + if (index >= 0 && !is_parm_modified_before_call (&parms_ainfo[index], call, rec)) ipa_note_param_call (node, index, call); @@ -1347,20 +1
[PATCH, testsuite] Unxfail g++.dg/tree-ssa/pr45605.C
Hi, I've noticed that a test in g++.dg/tree-ssa/pr45605.C is Xfailed, even though early FRE now does the devirtualization that it tests for. True, the current scan happens over the SSA dump (which implies the front-end was somehow supposed to do it?) but I think that given we do perform the optimization, checking that we continue to do so is useful, hence the following change. Tested by adding to a successful bootstrap and test run on x86_64. Unless anyone objects, I will commit the patch tomorrow. Thanks, Martin 2011-10-25 Martin Jambor * g++.dg/tree-ssa/pr45605.C: Scan fre1 dump and unxfail. Index: src/gcc/testsuite/g++.dg/tree-ssa/pr45605.C === --- src.orig/gcc/testsuite/g++.dg/tree-ssa/pr45605.C +++ src/gcc/testsuite/g++.dg/tree-ssa/pr45605.C @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O1 -fdump-tree-ssa" } */ +/* { dg-options "-O1 -fdump-tree-fre1" } */ extern "C" void abort(); bool destructor_called = false; @@ -33,5 +33,5 @@ int main() { /* We should devirtualize call to D::Run */ -/* { dg-final { scan-tree-dump-times "D::Run \\(" 1 "ssa" { xfail *-*-* } } } */ -/* { dg-final { cleanup-tree-dump "ssa" } } */ +/* { dg-final { scan-tree-dump-times "D::Run \\(" 1 "fre1" } } */ +/* { dg-final { cleanup-tree-dump "fre1" } } */
Re: [PATCH 3/3, RFC] Fixup COND_EXECs before reload
This RFC patch implements conversion of COND_EXEC instructions to control flow for pre-RA selective scheduler. Something like this is needed to employ predication support before reload. Each COND_EXEC is converted separately to a new basic block with the unconditional variant of the instruction, and a conditional jump around it. I'm not sure what would be an acceptable approach here. I'm also not sure about the recommended way to emit JUMPs. 2011-10-26 Sergey Grechanik * sel-sched.c (convert_cond_execs): New. Use it... (sel_global_finish): ... here. diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c index f5c6f8b..b8f2663 100644 --- a/gcc/sel-sched.c +++ b/gcc/sel-sched.c @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see #include "rtlhooks-def.h" #include "output.h" #include "emit-rtl.h" +#include "cfghooks.h" #ifdef INSN_SCHEDULING #include "sel-sched-ir.h" @@ -7978,6 +7979,60 @@ sel_global_init (void) init_hard_regs_data (); } +/* Convert cond_execs to jumps before reload. */ +static void +convert_cond_execs (void) +{ + basic_block bb; + rtx insn; + + if (reload_completed) +return; + + FOR_EACH_BB (bb) + /* We don't need the safe variant because we break immediately after + removing the current instruction. */ +FOR_BB_INSNS (bb, insn) + if (INSN_P (insn) && GET_CODE (PATTERN (insn)) == COND_EXEC) + { + rtx jump; + rtx cond = COND_EXEC_TEST (PATTERN (insn)); + rtx rcond = reversed_comparison (cond, GET_MODE (cond)); + rtx unpredicated = copy_rtx (COND_EXEC_CODE (PATTERN (insn))); + + /* Split bb into BB, NEW_BB, NEXT_BB (in that order). */ + edge e1 = split_block (bb, insn); + basic_block next_bb = e1->dest; + edge e2 = split_block (bb, insn); + basic_block new_bb = e2->dest; + + /* Emit conditional jump at the end of bb. */ + rtx label = block_label (next_bb); + + /* FIXME There should be a better way. */ + rtx jump_pat + = gen_rtx_SET (GET_MODE (pc_rtx), pc_rtx, + gen_rtx_IF_THEN_ELSE (GET_MODE (pc_rtx), + rcond, + gen_rtx_LABEL_REF (VOIDmode, + label), + pc_rtx)); + + make_edge (bb, next_bb, 0); + jump = emit_jump_insn_after (jump_pat, BB_END (bb)); + JUMP_LABEL (jump) = label; + LABEL_NUSES (label)++; + + emit_insn_after_noloc (unpredicated, BB_HEAD (new_bb), new_bb); + + delete_insn (insn); + break; + } +#ifdef ENABLE_CHECKING + verify_flow_info (); +#endif +} + /* Free the global data of the scheduler. */ static void sel_global_finish (void) @@ -7998,6 +8053,8 @@ sel_global_finish (void) free_sched_pools (); free_dominance_info (CDI_DOMINATORS); + + convert_cond_execs (); } /* Return true when we need to skip selective scheduling. Used for debugging. */
Re: [PATCH,fortran] Reap dead code
On Sat, Oct 22, 2011 at 01:16:14PM -0700, Steve Kargl wrote: > The attach patch reaps some code that is now dead > due to my recent changes for ishftc in check.c. > Regression tested on i686-*-freebsd. > > 2011-10-22 Steevn G. Kargl > > * simplify.c (gfc_simplify_ishftc): Reap dead code. Here's a revised patch that does 2 things. First, it adds a check in gfc_check_nearest that the 2nd argument (if it is a constant) is not zero. Second, it reaps a dead code in several of the simplification functions. 2011-10-26 Steven G. Kargl * check.c (gfc_check_atan_2): Typo in comment. (gfc_check_nearest): If 's' is constant, check that it is not 0. * simplify.c (simplify_dshift, gfc_simplify_ibclr, gfc_simplify_ibits, gfc_simplify_ibset, simplify_shift, gfc_simplify_ishftc, gfc_simplify_nearest): Remove dead code. 2011-10-26 Steven G. Kargl * gfortran.dg/nearest_5.f90: New test. -- Steve Index: fortran/check.c === --- fortran/check.c (revision 180529) +++ fortran/check.c (working copy) @@ -934,7 +934,7 @@ null_arg: gfc_try gfc_check_atan_2 (gfc_expr *y, gfc_expr *x) { - /* gfc_notify_std would be a wast of time as the return value + /* gfc_notify_std would be a waste of time as the return value is seemingly used only for the generic resolution. The error will be: Too many arguments. */ if ((gfc_option.allow_std & GFC_STD_F2008) == 0) @@ -2710,6 +2710,16 @@ gfc_check_nearest (gfc_expr *x, gfc_expr if (type_check (s, 1, BT_REAL) == FAILURE) return FAILURE; + if (s->expr_type == EXPR_CONSTANT) +{ + if (mpfr_sgn (s->value.real) == 0) + { + gfc_error ("Argument 'S' of NEAREST at %L shall not be zero", + &s->where); + return FAILURE; + } +} + return SUCCESS; } Index: fortran/simplify.c === --- fortran/simplify.c (revision 180529) +++ fortran/simplify.c (working copy) @@ -1899,13 +1899,7 @@ simplify_dshift (gfc_expr *arg1, gfc_exp k = gfc_validate_kind (BT_INTEGER, arg1->ts.kind, false); size = gfc_integer_kinds[k].bit_size; - if (gfc_extract_int (shiftarg, &shift) != NULL) -{ - gfc_error ("Invalid SHIFT argument of DSHIFTL at %L", &shiftarg->where); - return &gfc_bad_expr; -} - - gcc_assert (shift >= 0 && shift <= size); + gfc_extract_int (shiftarg, &shift); /* DSHIFTR(I,J,SHIFT) = DSHIFTL(I,J,SIZE-SHIFT). */ if (right) @@ -2509,21 +2503,10 @@ gfc_simplify_ibclr (gfc_expr *x, gfc_exp if (x->expr_type != EXPR_CONSTANT || y->expr_type != EXPR_CONSTANT) return NULL; - if (gfc_extract_int (y, &pos) != NULL || pos < 0) -{ - gfc_error ("Invalid second argument of IBCLR at %L", &y->where); - return &gfc_bad_expr; -} + gfc_extract_int (y, &pos); k = gfc_validate_kind (x->ts.type, x->ts.kind, false); - if (pos >= gfc_integer_kinds[k].bit_size) -{ - gfc_error ("Second argument of IBCLR exceeds bit size at %L", - &y->where); - return &gfc_bad_expr; -} - result = gfc_copy_expr (x); convert_mpz_to_unsigned (result->value.integer, @@ -2551,17 +2534,8 @@ gfc_simplify_ibits (gfc_expr *x, gfc_exp || z->expr_type != EXPR_CONSTANT) return NULL; - if (gfc_extract_int (y, &pos) != NULL || pos < 0) -{ - gfc_error ("Invalid second argument of IBITS at %L", &y->where); - return &gfc_bad_expr; -} - - if (gfc_extract_int (z, &len) != NULL || len < 0) -{ - gfc_error ("Invalid third argument of IBITS at %L", &z->where); - return &gfc_bad_expr; -} + gfc_extract_int (y, &pos); + gfc_extract_int (z, &len); k = gfc_validate_kind (BT_INTEGER, x->ts.kind, false); @@ -2614,21 +2588,10 @@ gfc_simplify_ibset (gfc_expr *x, gfc_exp if (x->expr_type != EXPR_CONSTANT || y->expr_type != EXPR_CONSTANT) return NULL; - if (gfc_extract_int (y, &pos) != NULL || pos < 0) -{ - gfc_error ("Invalid second argument of IBSET at %L", &y->where); - return &gfc_bad_expr; -} + gfc_extract_int (y, &pos); k = gfc_validate_kind (x->ts.type, x->ts.kind, false); - if (pos >= gfc_integer_kinds[k].bit_size) -{ - gfc_error ("Second argument of IBSET exceeds bit size at %L", - &y->where); - return &gfc_bad_expr; -} - result = gfc_copy_expr (x); convert_mpz_to_unsigned (result->value.integer, @@ -3004,11 +2967,8 @@ simplify_shift (gfc_expr *e, gfc_expr *s if (e->expr_type != EXPR_CONSTANT || s->expr_type != EXPR_CONSTANT) return NULL; - if (gfc_extract_int (s, &shift) != NULL) -{ - gfc_error ("Invalid second argument of %s at %L", name, &s->where); - return &gfc_bad_expr; -} + + gfc_extract_int (s, &shift); k = gfc_validate_kind (BT_INTEGER, e->ts.kind, false); bitsize = gfc_integer_kinds[k].bit_size; @@ -3146,11 +3106,7 @@ gfc_simplify_ishftc (gfc_expr *e, gfc_ex
Re: [PATCH 2/3] Predication support
This patch contains the implementation of predication support. The selective scheduler is adjusted to recognize COND_EXEC instructions. They can be renamed like normal assignments. Selective scheduler itself will add predicated variants of instructions in availability sets when computing them on CFG splits, using the predicate of the conditional jump. Converting an instruction to a predicated form is treated similar to other transformations (speculation, substitition or renaming) in {undo,redo}_transformation. Predication cache is maintained to quickly retrieve predicated variants of instructions. In particular, predication support allows to pipeline non-doloop loops on ia64 more cleanly, as no recovery code for speculative loads is needed. Also, it allows to pipeline loads on arm. 2011-10-26 Alexander Monakov * common.opt: Add -fsel-sched-predication option. * config/ia64/ia64.c (get_mode_no_for_insn): Support conditional loads. * rtl.h (COND_SET_SRC_PTR, COND_SET_SRC_PTR): New macros. * sched-deps.c (conditions_mutex_with_rev_p): Rename from conditions_mutex_p. Adjust the caller. (conditions_mutex_p, conditions_same_p, conditions_same_or_mutex_p): New functions. (sched_analyze_insn): Extract LHS/RHS for conditional assignments. * sched-int.h (conditions_mutex_p, conditions_same_p, conditions_same_or_mutex_p): Declare. * sel-sched-ir.c (vinsn_equal_p): Support conditional insns. (vinsn_equal_skip_cond_p): New function. (init_expr): Add new argument (was_predicated). Update all callers. (merge_expr_data): Assert equality of merged predicates. (av_set_add_nocopy, join_distinct_sets): Export. (av_set_tail, av_set_concat): New functions. (av_set_union_and_live): Add parameter to_fallthru_p, generate predicated insns. (setup_id_lhs_rhs): Support conditional insns. (setup_id_cond): New function. (init_id_from_df, deps_init_id): Use it. (get_dest_and_mode): Support conditional insns. * sel-sched-ir.h (enum local_trans_type): Add TRANS_PREDICATION. (struct _expr): Add new member (was_predicated). (EXPR_COND, EXPR_WAS_PREDICATED): New accessor macros. (struct idata_def): Add new member (cond). (IDATA_COND, VINSN_COND, INSN_COND): New accessor macros. (vinsn_equal_skip_cond_p, maybe_predicate_expr_into, av_set_add_nocopy, av_set_intersect, join_distinct_sets, av_set_concat): Declare. (av_set_union_and_live): Change prototype. (in_fallthru_bb_p): Add new parameter (skip_nops_p). Update all callers. (can_substitute_through_p): Do not substitute through conditional insns. (substitute_reg_in_expr, replace_src_with_reg_ok_p): Support substitution in conditional moves. (create_insn_rtx_with_rhs, replace_dest_with_reg_ok_p, replace_dest_with_reg_in_expr): Support renaming of conditional moves. (create_insn_rtx_with_lhs): Delete. (mark_unavailable_hard_regs, choose_best_reg_1, choose_best_pseudo_reg, verify_target_availability): Support conditional insns. (undo_transformations): Add support for TRANS_PREDICATION. (redo_transformations): Ditto. (moveup_expr): Add support for predicated exprs. (struct predication_cache, predication_cache_hash, predication_cache_free, predication_cache_eq, find_in_predication_cache, predicate_expr, maybe_predicate_expr_into, populate_av_set_with_predicated, filter_av_set_by_predicate): New. (compute_av_set_at_bb_end): Generate predicated instructions. (sel_rank_for_schedule): Adjust heuristics. (vinsn_vec_has_expr_p): Allow inexact match for predicated insns. (move_op_orig_expr_not_found): Ignore insns with matching LHS if the condition is mutually exclusive. (code_motion_process_successors): Add new argument (cond). Use it to skip predicated-off paths. Update the caller. (av_set_common_cond): New. (sel_region_init): Initialize predication cache. (sel_region_finish): Free predication cache. diff --git a/gcc/common.opt b/gcc/common.opt index d1f286f..3a7114d 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1695,6 +1695,10 @@ fsel-sched-reschedule-pipelined Common Report Var(flag_sel_sched_reschedule_pipelined) Init(0) Optimization Reschedule pipelined regions without pipelining +fsel-sched-predication +Common Report Var(flag_sel_sched_predication) Init(0) Optimization +Perform predication in selective scheduling + ; sched_stalled_insns means that insns can be moved prematurely from the queue ; of stalled insns into the ready list. fsched-stalled-insns diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c index 78d2441..94dbff2 100644 --- a/gcc/config/ia64/ia64.c +++ b/gcc/config/ia64/
PATCH RFC: Fix cross-compiler include path if not using --with-sysroot
In my earlier patch to add support for --with-native-system-header-dir, I messed up the case of a cross-compiler when not using --with-sysroot. In that case the compiler was incorrectly searching /usr/include. This patch restores the logic to what it used to be before my patch. As the patch is straightforward and simply restores the code to what it used to be before my earlier patch, I plan to go ahead and commit this after bootstrap and testing passes. Please let me know if you object. Ian 2011-10-26 Ian Lance Taylor * cppdefault.c: Undef NATIVE_SYSTEM_HEADER_DIR if CROSS_DIRECTORY_STRUCTURE is defined and TARGET_SYSTEM_ROOT is not. (cpp_include_defaults): Only use NATIVE_SYSTEM_HEADER_DIR if it is defined. Index: gcc/cppdefault.c === --- gcc/cppdefault.c (revision 180530) +++ gcc/cppdefault.c (working copy) @@ -32,6 +32,7 @@ #if defined (CROSS_DIRECTORY_STRUCTURE) && !defined (TARGET_SYSTEM_ROOT) # undef LOCAL_INCLUDE_DIR +# undef NATIVE_SYSTEM_HEADER_DIR #else # undef CROSS_INCLUDE_DIR #endif @@ -84,8 +85,10 @@ const struct default_include cpp_include /* Another place the target system's headers might be. */ { TOOL_INCLUDE_DIR, "BINUTILS", 0, 1, 0, 0 }, #endif +#ifdef NATIVE_SYSTEM_HEADER_DIR /* /usr/include comes dead last. */ { NATIVE_SYSTEM_HEADER_DIR, NATIVE_SYSTEM_HEADER_COMPONENT, 0, 0, 1, 0 }, +#endif { 0, 0, 0, 0, 0, 0 } }; #endif /* no INCLUDE_DEFAULTS */
Re: [C++ Patch / RFC] PR 50870
OK. Jason
Re: [PATCH] rs6000: Delete the remaining vec_extract expanders.
On 10/25/2011 04:38 PM, Michael Meissner wrote: > On Tue, Oct 25, 2011 at 03:01:37PM -0700, Richard Henderson wrote: >> Now that expand_binop handles lowering vec_extract_even to vec_perm, >> we can remove the last two unnecessary vec_extract patterns from the >> Altivec backend. >> >> Ok? > > Just to be sure, the lowering doesn't depend on -ftree-vectorize (i.e. in case > people are using the buitlins). I assume it doesn't. No, this lowering happens during expansion to rtl, similar to NEG x -> SUB 0, x and other expansion variants that optabs.c applies. r~
Re: [cxx-mem-model] Change library call for __atomic_op_fetch routines
On 10/25/2011 06:45 PM, Andrew MacLeod wrote: > * builtins.c (expand_builtin_atomic_fetch_op): External calls for > 'op_fetch' builtins need to instead call 'fetch_op' externals and issue > correction code. > (expand_builtin): Provide proper builtin name for external call and > ignored flag to expand_builtin_atomic_fetch_op. Looks ok. > !resolved to an instriction sequence. */ "instruction" > ! gcc_assert (TREE_OPERAND (addr, 0) == fndecl); > ! TREE_OPERAND (addr, 0) = builtin_decl_explicit(ext_call); I do wonder if we ought not put the original back after the expand. r~
Re: [C++ Patch] BASELINK_P clean up
On 10/26/2011 12:41 PM, Paolo Carlini wrote: we have an inconsistent mix: shall we do the below? Sure, why not. Jason
Re: [cxx-mem-model] Change library call for __atomic_op_fetch routines
On 10/26/2011 10:25 AM, Andrew MacLeod wrote: > hum. I suppose that wouldn't hurt. I don't think it should make any > difference since Im not modifying whats in the table, but I can give > it a shot. If it bootstraps, which I presume it will, I'll check it > in with the original decl replaced. The other thing that just occurred to me is that we're lying to cgraph about the functions that wind up being referenced. But since different function references can come in via libcalls, I don't know how different this case is. It'll be interesting to see what happens with LTO enabled here... r~
Re: [PATCH 1/3] Transformation replay mechanism
(Note, this is a prerequisite for predication support but also is an improvement on its own. This patch can be installed separately.) This patch implements transformation replay mechanism (redo_transformations). Currently, after choosing an expression to schedule and finding its original expressions below the scheduling fence in move_op, we need to move the original expressions up (e.g. to correctly generate bookkeeping copies). That moving up would call moveup_expr_cached. The patch slightly optimizes this by performing lookups in the history vector of the chosen expression, which must contain all transformations of all original expressions. Not much changes are needed: the most notable is perhaps that for speculation transformation, both old and new speculative statuses are saved. This patch is needed by the following predication patch, which uses redo_transformations to correctly apply inverted/non-inverted predicate on jumps without needing to remember the branch from which the expression originated. 2011-10-26 Alexander Monakov * sel-sched-ir.c (free_history_vect): Export. (insert_in_history_vect): Rename argument spec_ds to old_spec_ds, add new argument (new_spec_ds). Record it in the history vector. Update all callers. (init_expr): Assert that initial history is empty. Simplify code. (copy_history_of_changes): Split out from copy_expr. * sel-sched-ir.h (struct expr_history_def_1): Rename spec_ds to old_spec_ds. Add new member (new_spec_ds). (copy_history_of_changes, free_history_vect): Declare. (insert_in_history_vect): Adjust prototype. * sel-sched.c (undo_transformations): Adjust. (move_exprs_to_boundary): Discard history of expression to be scheduled. (move_op_orig_expr_found): Populate history vector of c_expr for use in redo_transformations. (redo_transformations): New. (move_op_ascend): Use redo_transformations. diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index 1bbc3ed..1a73308 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -149,7 +149,6 @@ static expr_t set_insn_init (expr_t, vinsn_t, int); static void cfg_preds (basic_block, insn_t **, int *); static void prepare_insn_expr (insn_t, int); -static void free_history_vect (VEC (expr_history_def, heap) **); static void move_bb_info (basic_block, basic_block); static void remove_empty_bb (basic_block, bool); @@ -1503,13 +1502,13 @@ find_in_history_vect (VEC(expr_history_def, heap) *vect, rtx insn, /* Insert new element in a sorted history vector pointed to by PVECT, if it is not there already. The element is searched using - UID/NEW_EXPR_VINSN pair. TYPE, OLD_EXPR_VINSN and SPEC_DS save - the history of a transformation. */ + UID/NEW_EXPR_VINSN pair. TYPE, OLD_EXPR_VINSN, OLD_SPEC_DS and + NEW_SPEC_DS save the history of a transformation. */ void insert_in_history_vect (VEC (expr_history_def, heap) **pvect, unsigned uid, enum local_trans_type type, vinsn_t old_expr_vinsn, vinsn_t new_expr_vinsn, -ds_t spec_ds) +ds_t old_spec_ds, ds_t new_spec_ds) { VEC(expr_history_def, heap) *vect = *pvect; expr_history_def temp; @@ -1525,15 +1524,16 @@ insert_in_history_vect (VEC (expr_history_def, heap) **pvect, /* It is possible that speculation types of expressions that were propagated through different paths will be different here. In this case, merge the status to get the correct check later. */ - if (phist->spec_ds != spec_ds) -phist->spec_ds = ds_max_merge (phist->spec_ds, spec_ds); + if (phist->old_spec_ds != old_spec_ds) +phist->old_spec_ds = ds_max_merge (phist->old_spec_ds, old_spec_ds); return; } temp.uid = uid; temp.old_expr_vinsn = old_expr_vinsn; temp.new_expr_vinsn = new_expr_vinsn; - temp.spec_ds = spec_ds; + temp.old_spec_ds = old_spec_ds; + temp.new_spec_ds = new_spec_ds; temp.type = type; vinsn_attach (old_expr_vinsn); @@ -1543,7 +1543,7 @@ insert_in_history_vect (VEC (expr_history_def, heap) **pvect, } /* Free history vector PVECT. */ -static void +void free_history_vect (VEC (expr_history_def, heap) **pvect) { unsigned i; @@ -1576,7 +1576,7 @@ merge_history_vect (VEC (expr_history_def, heap) **pvect, for (i = 0; VEC_iterate (expr_history_def, from, i, phist); i++) insert_in_history_vect (pvect, phist->uid, phist->type, phist->old_expr_vinsn, phist->new_expr_vinsn, -phist->spec_ds); +phist->old_spec_ds, phist->new_spec_ds); } /* Compare two vinsns as rhses if possible and as vinsns otherwise. */ @@ -1632,10 +1632,8 @@ init_expr (expr_t expr, vinsn_t vi, int spec, int use, int priority, EXPR_SPEC_DONE_DS (expr) = spec_done_ds; EXPR_SPE
Re: [cxx-mem-model] Change library call for __atomic_op_fetch routines
On 10/26/2011 01:19 PM, Richard Henderson wrote: !resolved to an instriction sequence. */ "instruction" no no, its a mashup of "restricted instruction". ok, maybe not. ! gcc_assert (TREE_OPERAND (addr, 0) == fndecl); ! TREE_OPERAND (addr, 0) = builtin_decl_explicit(ext_call); I do wonder if we ought not put the original back after the expand. hum. I suppose that wouldn't hurt. I don't think it should make any difference since Im not modifying whats in the table, but I can give it a shot. If it bootstraps, which I presume it will, I'll check it in with the original decl replaced. Andrew
[PATCH 0/3] Predication support for selective scheduler
Hello, This patch series implements predication support for the selective scheduler. It is implemented as a transformation similar to speculative execution, but simpler in some aspects (e.g. does not require recovery code). On ia64, combining speculation and conditional execution is supported. Combining of predicates is not supported. The patches have been bootstrapped and regtested on x86_64, ia64 and arm with selective scheduling (+ pipelining and predication) enabled at -O2. OK for trunk? Alexander
[C++ Patch] BASELINK_P clean up
Hi, we have an inconsistent mix: shall we do the below? Test in progress on x86_64-linux. Thanks, Paolo. // 2011-10-26 Paolo Carlini * typeck.c (cp_build_addr_expr_1): Use BASELINK_P. * class.c (instantiate_type): Likewise. * pt.c (convert_nontype_argument_function, uses_template_parms, tsubst_copy, resolve_nondeduced_context, type_dependent_expression_p): Likewise. * semantics.c (finish_decltype_type): Likewise. * decl2.c (mark_used): Likewise. * name-lookup.c (arg_assoc): Likewise. Index: typeck.c === --- typeck.c(revision 180528) +++ typeck.c(working copy) @@ -4946,7 +4946,7 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue if (TREE_CODE (arg) == OFFSET_REF) PTRMEM_OK_P (val) = PTRMEM_OK_P (arg); } - else if (TREE_CODE (TREE_OPERAND (arg, 1)) == BASELINK) + else if (BASELINK_P (TREE_OPERAND (arg, 1))) { tree fn = BASELINK_FUNCTIONS (TREE_OPERAND (arg, 1)); Index: class.c === --- class.c (revision 180528) +++ class.c (working copy) @@ -6873,7 +6873,7 @@ instantiate_type (tree lhstype, tree rhs, tsubst_f } } - if (TREE_CODE (rhs) == BASELINK) + if (BASELINK_P (rhs)) { access_path = BASELINK_ACCESS_BINFO (rhs); rhs = BASELINK_FUNCTIONS (rhs); Index: pt.c === --- pt.c(revision 180528) +++ pt.c(working copy) @@ -5277,7 +5277,7 @@ convert_nontype_argument_function (tree type, tree fn_no_ptr = fn; if (TREE_CODE (fn_no_ptr) == ADDR_EXPR) fn_no_ptr = TREE_OPERAND (fn_no_ptr, 0); - if (TREE_CODE (fn_no_ptr) == BASELINK) + if (BASELINK_P (fn_no_ptr)) fn_no_ptr = BASELINK_FUNCTIONS (fn_no_ptr); /* [temp.arg.nontype]/1 @@ -7801,7 +7801,7 @@ uses_template_parms (tree t) || EXPR_P (t) || TREE_CODE (t) == TEMPLATE_PARM_INDEX || TREE_CODE (t) == OVERLOAD - || TREE_CODE (t) == BASELINK + || BASELINK_P (t) || TREE_CODE (t) == IDENTIFIER_NODE || TREE_CODE (t) == TRAIT_EXPR || TREE_CODE (t) == CONSTRUCTOR @@ -11993,7 +11993,7 @@ tsubst_copy (tree t, tree args, tsubst_flags_t com base, name, /*template_p=*/false); } - else if (TREE_CODE (name) == BASELINK) + else if (BASELINK_P (name)) name = tsubst_baselink (name, non_reference (TREE_TYPE (object)), args, complain, @@ -15197,7 +15197,7 @@ resolve_nondeduced_context (tree orig_expr) offset = expr; expr = TREE_OPERAND (expr, 1); } - if (TREE_CODE (expr) == BASELINK) + if (BASELINK_P (expr)) { baselink = expr; expr = BASELINK_FUNCTIONS (expr); @@ -19314,7 +19314,7 @@ type_dependent_expression_p (tree expression) if (TREE_CODE (expression) == SCOPE_REF) return false; - if (TREE_CODE (expression) == BASELINK) + if (BASELINK_P (expression)) expression = BASELINK_FUNCTIONS (expression); if (TREE_CODE (expression) == TEMPLATE_ID_EXPR) Index: semantics.c === --- semantics.c (revision 180528) +++ semantics.c (working copy) @@ -5115,7 +5115,7 @@ finish_decltype_type (tree expr, bool id_expressio step. */ expr = TREE_OPERAND (expr, 1); - if (TREE_CODE (expr) == BASELINK) + if (BASELINK_P (expr)) /* See through BASELINK nodes to the underlying function. */ expr = BASELINK_FUNCTIONS (expr); Index: decl2.c === --- decl2.c (revision 180528) +++ decl2.c (working copy) @@ -4174,7 +4174,7 @@ mark_used (tree decl) like the DECL for the function. Otherwise, if the BASELINK is for an overloaded function, we don't know which function was actually used until after overload resolution. */ - if (TREE_CODE (decl) == BASELINK) + if (BASELINK_P (decl)) { decl = BASELINK_FUNCTIONS (decl); if (really_overloaded_fn (decl)) Index: name-lookup.c === --- name-lookup.c (revision 180528) +++ name-lookup.c (working copy) @@ -5329,7 +5329,7 @@ arg_assoc (struct arg_lookup *k, tree n) n = TREE_OPERAND (n, 1); while (TREE_CODE (n) == TREE_LIST) n = TREE_VALUE (n); - if (TREE_CODE (n) == BASELINK) + if (BASELINK_P (n)) n = BASELINK_FUNCTIONS (n); if (TREE_CODE (n) == FUNCTION_DECL)
Re: new patches using -fopt-info (issue5294043)
I am hoping that too:) Yes, I will try to do it when I find some time. David On Wed, Oct 26, 2011 at 1:37 AM, Richard Guenther wrote: > On Tue, Oct 25, 2011 at 9:30 PM, Xinliang David Li wrote: >> >> >> On Tue, Oct 25, 2011 at 1:02 AM, Richard Guenther >> wrote: >>> >>> On Mon, Oct 24, 2011 at 6:27 PM, Xinliang David Li >>> wrote: >>> >> Well, you seem to keep not reading what I write. I am not opposed >>> >> to adding -fopt-info/report nor to funnel messages to stdout/err. What >>> >> I am opposed is the way you want to introduce them. I want you to >>> >> fix what we dump into dump files, so that both -fopt-report and >>> >> -fopt-info >>> >> can be implemented by outputting selected pieces of the dump file >>> >> to stdout/stderr. We already have -fdump-*-stats which supposedly >>> >> could match -fopt-report, and the default -fdump-* should be what >>> >> goes to -fopt-info (minus the function bodies, of course). >>> > >>> > That sounds good. What you propose seems like >>> > >>> > -fdump-pass-[ir_only|transformation|debug]-stderr >>> > >>> > and -fopt-info is a short cut for >>> > -fdump-tree-all-transformations-stderr >>> > -fdump-ipa-all-tranformations-stderr >>> > -fdump-rtl-all-transformations-stderr >>> >>> Yes. Note that I don't like it the way the vectorizer does (with >>> -fvectorizer-verbose=... the dump files are empty). The dump >>> file content should be unchanged when redirecting (parts) to >>> stderr, so we have to arrange to duplicate messages in two places. >>> >> Vectorizer dump is a good candidate for clean up when the dumping >> infrastructure improvement is done. >> FYI, for now, we will implement the opt-info for some passes in the simple >> way in google branches, and later migrate to trunk when the dumping >> infrastructure is improved. > > I was hoping you would volunteer to improve the dumping infrastructure. > > Richard. >
Re: [PATCH 0/6] More vector permutation work
Richard Henderson wrote: > On 10/26/2011 07:30 AM, Ulrich Weigand wrote: > > This fails since for u == 4 and mode == V4SFmode it attempts to expand > > a V4SFmode shift, which is unsupported. > > > > Shouldn't this be using the mode of the selector rather than the mode > > of the result in any case? > > Yes, it should use the mode of the selector. And doing just that is > enough to fix the bug. But I noticed that the actual results for a > constant permutation were much nastier than they ought to be. > > Try this. Thanks, this has indeed fixed the build problem. Full test suite is still running ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [PATCH 0/6] More vector permutation work
On 10/26/2011 07:30 AM, Ulrich Weigand wrote: > This fails since for u == 4 and mode == V4SFmode it attempts to expand > a V4SFmode shift, which is unsupported. > > Shouldn't this be using the mode of the selector rather than the mode > of the result in any case? Yes, it should use the mode of the selector. And doing just that is enough to fix the bug. But I noticed that the actual results for a constant permutation were much nastier than they ought to be. Try this. Iain, this might solve your case too; no I can't test myself off cross, because libgfortran needs the whole cross-env -- system headers and everything. r~ diff --git a/gcc/optabs.c b/gcc/optabs.c index 9afc911..736d826 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6912,7 +6912,7 @@ expand_vec_perm (enum machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target) enum insn_code icode; enum machine_mode qimode; unsigned int i, w, e, u; - rtx tmp, sel_qi; + rtx tmp, sel_qi = NULL; rtvec vec; if (!target || GET_MODE (target) != mode) @@ -6946,23 +6946,23 @@ expand_vec_perm (enum machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target) /* Fall back to a constant byte-based permutation. */ if (qimode != VOIDmode) { - icode = direct_optab_handler (vec_perm_const_optab, qimode); - if (icode != CODE_FOR_nothing) + vec = rtvec_alloc (w); + for (i = 0; i < e; ++i) { - vec = rtvec_alloc (w); - for (i = 0; i < e; ++i) - { - unsigned int j, this_e; + unsigned int j, this_e; - this_e = INTVAL (XVECEXP (sel, 0, i)); - this_e &= 2 * e - 1; - this_e *= u; + this_e = INTVAL (XVECEXP (sel, 0, i)); + this_e &= 2 * e - 1; + this_e *= u; - for (j = 0; j < u; ++j) - RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j); - } - sel_qi = gen_rtx_CONST_VECTOR (qimode, vec); + for (j = 0; j < u; ++j) + RTVEC_ELT (vec, i * u + j) = GEN_INT (this_e + j); + } + sel_qi = gen_rtx_CONST_VECTOR (qimode, vec); + icode = direct_optab_handler (vec_perm_const_optab, qimode); + if (icode != CODE_FOR_nothing) + { tmp = expand_vec_perm_1 (icode, gen_lowpart (qimode, target), gen_lowpart (qimode, v0), gen_lowpart (qimode, v1), sel_qi); @@ -6989,47 +6989,53 @@ expand_vec_perm (enum machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target) if (icode == CODE_FOR_nothing) return NULL_RTX; - /* Multiply each element by its byte size. */ - if (u == 2) -sel = expand_simple_binop (mode, PLUS, sel, sel, sel, 0, OPTAB_DIRECT); - else -sel = expand_simple_binop (mode, ASHIFT, sel, GEN_INT (exact_log2 (u)), - sel, 0, OPTAB_DIRECT); - gcc_assert (sel != NULL); - - /* Broadcast the low byte each element into each of its bytes. */ - vec = rtvec_alloc (w); - for (i = 0; i < w; ++i) -{ - int this_e = i / u * u; - if (BYTES_BIG_ENDIAN) - this_e += u - 1; - RTVEC_ELT (vec, i) = GEN_INT (this_e); -} - tmp = gen_rtx_CONST_VECTOR (qimode, vec); - sel = gen_lowpart (qimode, sel); - sel = expand_vec_perm (qimode, sel, sel, tmp, NULL); - gcc_assert (sel != NULL); - - /* Add the byte offset to each byte element. */ - /* Note that the definition of the indicies here is memory ordering, - so there should be no difference between big and little endian. */ - vec = rtvec_alloc (w); - for (i = 0; i < w; ++i) -RTVEC_ELT (vec, i) = GEN_INT (i % u); - tmp = gen_rtx_CONST_VECTOR (qimode, vec); - sel = expand_simple_binop (qimode, PLUS, sel, tmp, sel, 0, OPTAB_DIRECT); - gcc_assert (sel != NULL); + if (sel_qi == NULL) +{ + /* Multiply each element by its byte size. */ + enum machine_mode selmode = GET_MODE (sel); + if (u == 2) + sel = expand_simple_binop (selmode, PLUS, sel, sel, + sel, 0, OPTAB_DIRECT); + else + sel = expand_simple_binop (selmode, ASHIFT, sel, + GEN_INT (exact_log2 (u)), + sel, 0, OPTAB_DIRECT); + gcc_assert (sel != NULL); + + /* Broadcast the low byte each element into each of its bytes. */ + vec = rtvec_alloc (w); + for (i = 0; i < w; ++i) + { + int this_e = i / u * u; + if (BYTES_BIG_ENDIAN) + this_e += u - 1; + RTVEC_ELT (vec, i) = GEN_INT (this_e); + } + tmp = gen_rtx_CONST_VECTOR (qimode, vec); + sel = gen_lowpart (qimode, sel); + sel = expand_vec_perm (qimode, sel, sel, tmp, NULL); + gcc_assert (sel != NULL); + + /* Add the byte offset to each byte element. */ + /* Note that the definition of the indicies h
Re: [C++ Patch / RFC] PR 50870
On 10/26/2011 05:27 PM, Jason Merrill wrote: On 10/26/2011 10:39 AM, Paolo Carlini wrote: I'm trying to fix this PR, ice on valid, which Daniel kindly filed while we were triaging PR50864. In short, in tsubst_copy_and_build, for COMPONENT_REF, we call tsubst_baselink with an object which in this case is an ARROW_EXPR, thus its TREE_TYPE is NULL_TREE. That's OK, we can pass a null type to tsubst_baselink. Just fix non_reference to pass NULL_TREE through. Ah, very good (just read the comment preceding tsubst_baselink, should have done it earlier today ;) Thus I'm finishing testing the below, Ok if it passes? Thanks, Paolo. / /cp 2011-10-26 Paolo Carlini PR c++/50870 * typeck.c (non_reference): Pass NULL_TREE through. /testsuite 2011-10-26 Paolo Carlini PR c++/50870 * g++.dg/cpp0x/decltype34.C: New. Index: testsuite/g++.dg/cpp0x/decltype34.C === --- testsuite/g++.dg/cpp0x/decltype34.C (revision 0) +++ testsuite/g++.dg/cpp0x/decltype34.C (revision 0) @@ -0,0 +1,19 @@ +// PR c++/50870 +// { dg-options "-std=gnu++0x" } + +struct impl +{ + template static T create(); +}; + +template()->impl::create())> +struct tester{}; + +tester ti; + +template()->impl::create())> +int test() { return 0; } + +int i = test(); Index: cp/typeck.c === --- cp/typeck.c (revision 180528) +++ cp/typeck.c (working copy) @@ -8322,7 +8322,7 @@ casts_away_constness (tree t1, tree t2) tree non_reference (tree t) { - if (TREE_CODE (t) == REFERENCE_TYPE) + if (t && TREE_CODE (t) == REFERENCE_TYPE) t = TREE_TYPE (t); return t; }
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: > So you would mean that memory dereferencing shouldn't be considered as > side-effect at all? No. I haven't said this at all. Of course it's a side-effect, but we're allowed to remove existing ones (under some circumstances). We're not allowed to introduce new ones, which means that this ... > So we would happily cause by code 'if (i && *i != 0) an crash, as > memory-dereference has for you no side-effect? ... is not allowed. But in the original example the memread was on the left side, hence occured always, therefore we can move it to the right side, even though it might occur less often. > In you special case it might be valid that, if first (and C-fold-const > doesn't know if the side-effect condition is really the first, as it > might be a sub-sequence of a condition) condition might trap or not, to > combine it. But branching has to cover the general cases. If you find > a way to determine that left-hand operand in fold_const's branching code > is really the left-most condition in chain, then we can add such a > special case, but I don't see here an easy way to determine it. Hmm? I don't see why it's necessary to check if it's the left-most condition in a chain. If the left hand of '&&' is a memread it can always be moved to the right side (or the operator transformed into '&' which can have the same effect), of course only if the original rhs is free of side effects, but then independed if the && was part of a larger expression. The memread will possibly be done fewer times than originally, but as said, that's okay. Ciao, Michael.
Re: Go patch committed: Implement new syscall package
Rainer Orth writes: > the problem is another one: using /usr/xpg4/bin/awk, I find: > > /usr/xpg4/bin/awk: line 47 (NR=32): wrong number of arguments to function "m" > > nawk(1) only documents match(s,ere) (i.e. two args), and the gawk docs > state: > > `match(STRING, REGEXP [, ARRAY])' > [...] > The ARRAY argument to `match' is a `gawk' extension. In > compatibility mode (*note Options::), using a third argument is a > fatal error. Ah. I took a look at the nawk man page, and I don't see any evidence that it supports submatches at all. How annoying. Ian
Re: [C++ Patch / RFC] PR 50870
On 10/26/2011 10:39 AM, Paolo Carlini wrote: I'm trying to fix this PR, ice on valid, which Daniel kindly filed while we were triaging PR50864. In short, in tsubst_copy_and_build, for COMPONENT_REF, we call tsubst_baselink with an object which in this case is an ARROW_EXPR, thus its TREE_TYPE is NULL_TREE. That's OK, we can pass a null type to tsubst_baselink. Just fix non_reference to pass NULL_TREE through. Jason
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/26 Michael Matz : > Hi, > > On Wed, 26 Oct 2011, Kai Tietz wrote: > >> well, if such a function is used as inline and we know for it that j has >> value != 2, then we have here a big difference. For your first example, >> we still have to do the memory access to *i, even if we are not >> interested in result. > > Actually we don't have to preserve memory accesses. The interesting case > is if the pointer has an invalid value. The behaviour of the access then > is undefined, and it's okay to not do it at all. In case the pointer does > point to an object the access (if it's value isn't needed) also isn't > necessary. IOW: in "void f(int *p) { int i = *p; }" we can always remove > the pointer read. So, I still maintain that the transformation on the > original example was okay. > > > Ciao, > Michael. So you would mean that memory dereferencing shouldn't be considered as side-effect at all? So we would happily cause by code 'if (i && *i != 0) an crash, as memory-dereference has for you no side-effect? In you special case it might be valid that, if first (and C-fold-const doesn't know if the side-effect condition is really the first, as it might be a sub-sequence of a condition) condition might trap or not, to combine it. But branching has to cover the general cases. If you find a way to determine that left-hand operand in fold_const's branching code is really the left-most condition in chain, then we can add such a special case, but I don't see here an easy way to determine it. Regards, Kai Hmm, not sure
Re: Go patch committed: Implement new syscall package
Ian, > Hmmm, I don't have a copy of nawk. Is it possible that it doesn't like > " *" == *? You could try "[ ]*". the problem is another one: using /usr/xpg4/bin/awk, I find: /usr/xpg4/bin/awk: line 47 (NR=32): wrong number of arguments to function "m" nawk(1) only documents match(s,ere) (i.e. two args), and the gawk docs state: `match(STRING, REGEXP [, ARRAY])' [...] The ARRAY argument to `match' is a `gawk' extension. In compatibility mode (*note Options::), using a third argument is a fatal error. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH][RFC] Re-write LTO option merging
On Wed, 26 Oct 2011, Richard Guenther wrote: > Index: trunk/gcc/opts-global.c > === > *** trunk.orig/gcc/opts-global.c 2011-10-26 13:46:24.0 +0200 > --- trunk/gcc/opts-global.c 2011-10-26 13:48:42.0 +0200 > *** static void > *** 167,176 > post_handling_callback (const struct cl_decoded_option *decoded > ATTRIBUTE_UNUSED, > unsigned int mask ATTRIBUTE_UNUSED) > { > - #ifdef ENABLE_LTO > - lto_register_user_option (decoded->opt_index, decoded->arg, > - decoded->value, mask); > - #endif > } It looks like this leaves both implementations of this callback (driver and compiler proper) empty, so all support for the post_handling_callback can be removed. -- Joseph S. Myers jos...@codesourcery.com
Re: PowerPC shrink-wrap support 3 of 3
On Wed, Oct 26, 2011 at 03:59:36PM +0200, Bernd Schmidt wrote: > On 10/26/11 15:54, Alan Modra wrote: > > I guess the tradeoff between the classic shrink-wrap epilogue scheme > > and my duplicate tail idea is whether duplicating tail blocks adds > > more code than duplicating epilogues. From what I've seen, the > > duplicate tails are generally very small. I guess I should dump out > > some info so we can get a better idea. > > I suppose if one wanted to avoid inserting more than one epilogue for > code-size reasons, one could make a new basic block containing the > epilogue, and redirect edges as appropriate. Suppose you have a function that returns r3=0 in one tail block and r3=1 in another, and these blocks are reached both by paths needing a prologue and by paths not needing a prologue. Which seems a likely common case. I'm fairly certain that would require two copies of the normal epilogue, or duplicating the tail blocks. (But it's late here and I'm ready to nod off so may not be thinking straight.) -- Alan Modra Australia Development Lab, IBM
[C++ Patch / RFC] PR 50870
Hi, I'm trying to fix this PR, ice on valid, which Daniel kindly filed while we were triaging PR50864. In short, in tsubst_copy_and_build, for COMPONENT_REF, we call tsubst_baselink with an object which in this case is an ARROW_EXPR, thus its TREE_TYPE is NULL_TREE. I'm trying to fix this be using the first operand instead, as we normally do for ARROW_EXPRs elsewhere. Note that, assuming this makes sense, we probably want to audit also case COMPONENT_REF in tsubst_copy... What do you think? (Tested already x86_64-linux) Thanks, Paolo. // /cp 2011-10-26 Paolo Carlini PR c++/50870 * pt.c (tsubst_copy_and_build): When object is an ARROW_EXPR pass to tsubst_baselink its first operand. /testsuite 2011-10-26 Paolo Carlini PR c++/50870 * g++.dg/cpp0x/decltype34.C: New. Index: testsuite/g++.dg/cpp0x/decltype34.C === --- testsuite/g++.dg/cpp0x/decltype34.C (revision 0) +++ testsuite/g++.dg/cpp0x/decltype34.C (revision 0) @@ -0,0 +1,19 @@ +// PR c++/50870 +// { dg-options "-std=gnu++0x" } + +struct impl +{ + template static T create(); +}; + +template()->impl::create())> +struct tester{}; + +tester ti; + +template()->impl::create())> +int test() { return 0; } + +int i = test(); Index: cp/pt.c === --- cp/pt.c (revision 180520) +++ cp/pt.c (working copy) @@ -13711,7 +13711,9 @@ tsubst_copy_and_build (tree t, member = TREE_OPERAND (t, 1); if (BASELINK_P (member)) member = tsubst_baselink (member, - non_reference (TREE_TYPE (object)), + TREE_CODE (object) == ARROW_EXPR + ? TREE_OPERAND (object, 0) + : non_reference (TREE_TYPE (object)), args, complain, in_decl); else member = tsubst_copy (member, args, complain, in_decl);
Re: [PATCH 0/6] More vector permutation work
Richard Henderson wrote: > Fix typos in the names of vec_extract & vec_interleave tree codes. > Change vec_perm checking and expansion level. > Implement interleave via permutation. > spu: Remove vec_extract_even/odd and vec_interleave expanders. > rs6000: Remove some vec_extract_even/odd expanders. > i386: Delete the vec_extract_even/odd patterns. This patch set causes a build failure on SPU: /home/uweigand/fsf/gcc-head/libgfortran/generated/matmul_c4.c: In function 'matmul_c4': /home/uweigand/fsf/gcc-head/libgfortran/generated/matmul_c4.c:284:25: internal compiler error: in expand_vec_perm, at optabs.c:6998 /* Multiply each element by its byte size. */ if (u == 2) sel = expand_simple_binop (mode, PLUS, sel, sel, sel, 0, OPTAB_DIRECT); else sel = expand_simple_binop (mode, ASHIFT, sel, GEN_INT (exact_log2 (u)), sel, 0, OPTAB_DIRECT); gcc_assert (sel != NULL); This fails since for u == 4 and mode == V4SFmode it attempts to expand a V4SFmode shift, which is unsupported. Shouldn't this be using the mode of the selector rather than the mode of the result in any case? Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: > well, if such a function is used as inline and we know for it that j has > value != 2, then we have here a big difference. For your first example, > we still have to do the memory access to *i, even if we are not > interested in result. Actually we don't have to preserve memory accesses. The interesting case is if the pointer has an invalid value. The behaviour of the access then is undefined, and it's okay to not do it at all. In case the pointer does point to an object the access (if it's value isn't needed) also isn't necessary. IOW: in "void f(int *p) { int i = *p; }" we can always remove the pointer read. So, I still maintain that the transformation on the original example was okay. Ciao, Michael.
Re: [PATCH][ARM] Big Endian and Generic tuning
On 26/10/11 14:54, Andrew Stubbs wrote: > On 25/10/11 15:16, Richard Earnshaw wrote: >> The difficulty on R profile is that although BE-32 mode is obsolete, >> there is a config bit that can be set to make the CPU byte-swap the >> instruction stream to make it behave largely like it is running in BE-32 >> mode. >> >> I think really we should be pushing the R-profile code towards having >> BE-8 as the default; but to do that we really need a compiler option >> that can override this behaviour (probably -mbe-[8|32]). Once we have >> that option, we can fix the compiler to prefer BE-8 as there is then a >> way to get back the legacy behaviour. > > Here's an updated patch that makes no generalizations. > > OK? > Yep R. > Andrew > > > be.patch > > > 2011-10-26 Andrew Stubbs > > gcc/ > * config/arm/bpabi.h (BE8_LINK_SPEC): Recognize generic-armv7 tuning. > > --- a/gcc/config/arm/bpabi.h > +++ b/gcc/config/arm/bpabi.h > @@ -58,6 +58,7 @@ > #define BE8_LINK_SPEC \ >" %{mbig-endian:%{march=armv7-a|mcpu=cortex-a5 \ > |mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15\ > + |mcpu=generic-armv7-a \ > |march=armv7-m|mcpu=cortex-m3 \ > |march=armv7e-m|mcpu=cortex-m4\ > |march=armv6-m|mcpu=cortex-m0 \
Re: PowerPC shrink-wrap support 3 of 3
On 10/26/11 15:54, Alan Modra wrote: > On Wed, Oct 26, 2011 at 03:01:01PM +0200, Bernd Schmidt wrote: >> On 10/26/11 14:27, Alan Modra wrote: >>> Committed revision 180522. It turns out that shrink-wrapping isn't as >>> effective as it used to be with the 20110915 based sources I was using >>> originally. povray Ray_In_Bound no longer gets the benefit of shrink >>> wrap, likely due to some cfg optimization. We end up with a simple >>> block that just does r3=1 then jumps to last_bb being reached from >>> blocks that need a prologue as well as blocks that don't. That's >>> enough to kill our current shrink wrap implementation. What we need >>> is something to duplicate these tail blocks.. >> >> Would it work to insert the epilogue on some edges to this R3=1 block, >> and not on the others? > > Wouldn't you need to modify all the target epilogue code? Our > epilogues return. Not all of them at once - you could require that if a target has a simple_return pattern, the epilogue does not return. But yes, these kinds of complications are a reason why I went for a simple variant first. > I guess the tradeoff between the classic shrink-wrap epilogue scheme > and my duplicate tail idea is whether duplicating tail blocks adds > more code than duplicating epilogues. From what I've seen, the > duplicate tails are generally very small. I guess I should dump out > some info so we can get a better idea. I suppose if one wanted to avoid inserting more than one epilogue for code-size reasons, one could make a new basic block containing the epilogue, and redirect edges as appropriate. Bernd
Re: [PATCH][ARM] Big Endian and Generic tuning
On 25/10/11 15:16, Richard Earnshaw wrote: The difficulty on R profile is that although BE-32 mode is obsolete, there is a config bit that can be set to make the CPU byte-swap the instruction stream to make it behave largely like it is running in BE-32 mode. I think really we should be pushing the R-profile code towards having BE-8 as the default; but to do that we really need a compiler option that can override this behaviour (probably -mbe-[8|32]). Once we have that option, we can fix the compiler to prefer BE-8 as there is then a way to get back the legacy behaviour. Here's an updated patch that makes no generalizations. OK? Andrew 2011-10-26 Andrew Stubbs gcc/ * config/arm/bpabi.h (BE8_LINK_SPEC): Recognize generic-armv7 tuning. --- a/gcc/config/arm/bpabi.h +++ b/gcc/config/arm/bpabi.h @@ -58,6 +58,7 @@ #define BE8_LINK_SPEC \ " %{mbig-endian:%{march=armv7-a|mcpu=cortex-a5 \ |mcpu=cortex-a8|mcpu=cortex-a9|mcpu=cortex-a15 \ + |mcpu=generic-armv7-a\ |march=armv7-m|mcpu=cortex-m3 \ |march=armv7e-m|mcpu=cortex-m4 \ |march=armv6-m|mcpu=cortex-m0 \
Re: PowerPC shrink-wrap support 3 of 3
On Wed, Oct 26, 2011 at 03:01:01PM +0200, Bernd Schmidt wrote: > On 10/26/11 14:27, Alan Modra wrote: > > Committed revision 180522. It turns out that shrink-wrapping isn't as > > effective as it used to be with the 20110915 based sources I was using > > originally. povray Ray_In_Bound no longer gets the benefit of shrink > > wrap, likely due to some cfg optimization. We end up with a simple > > block that just does r3=1 then jumps to last_bb being reached from > > blocks that need a prologue as well as blocks that don't. That's > > enough to kill our current shrink wrap implementation. What we need > > is something to duplicate these tail blocks.. > > Would it work to insert the epilogue on some edges to this R3=1 block, > and not on the others? Wouldn't you need to modify all the target epilogue code? Our epilogues return. > (How many edges of each kind are there?) In the povray case there was one edge of each kind, but I have seen other cases where there were 4 edges from blocks needing no prologue and 2 edges from blocks needing a prologue. I can't tell you what the testcase was now; It was something I looked at when ironing out bugs in my code. You wouldn't believe how many ways it is possible to write buggy cfg manipulation code.. I guess the tradeoff between the classic shrink-wrap epilogue scheme and my duplicate tail idea is whether duplicating tail blocks adds more code than duplicating epilogues. From what I've seen, the duplicate tails are generally very small. I guess I should dump out some info so we can get a better idea. -- Alan Modra Australia Development Lab, IBM
Re: PR rtl-optimization/46603
> Thanks! You're welcome. I have also installed the attached patch which makes minor tweaks and fixes various issues in comments which have bugged me for years, the most glaring one being: Similar issue for (SUBREG:M1 (REG:M2 ...) ...) for a hard register R where either M1 is not valid for R or M2 is wider than a word but we only need one word to store an M2-sized quantity in R. which seems to be hinting at some relativistic effect on the size of M2... No functional changes, tested on i586-suse-linux, applied on the mainline. 2011-10-26 Eric Botcazou * reload.c (reload_inner_reg_of_subreg): Change type of return value and type of OUTPUT parameter to bool and adjust. Document MODE and OUTPUT parameters. Use HARD_REGISTER_P. Reorder final condition and improve associated comment. (push_reload): Clarify and update comments about reloading of subregs. Adjust calls to reload_inner_reg_of_subreg. Compute the class upfront for the reloading of subregs in the out case as well. -- Eric Botcazou Index: reload.c === --- reload.c (revision 180457) +++ reload.c (working copy) @@ -256,7 +256,6 @@ static int push_secondary_reload (int, r enum insn_code *, secondary_reload_info *); static enum reg_class find_valid_class (enum machine_mode, enum machine_mode, int, unsigned int); -static int reload_inner_reg_of_subreg (rtx, enum machine_mode, int); static void push_replacement (rtx *, int, enum machine_mode); static void dup_replacements (rtx *, rtx *); static void combine_reloads (void); @@ -791,39 +790,39 @@ find_reusable_reload (rtx *p_in, rtx out return n_reloads; } -/* Return nonzero if X is a SUBREG which will require reloading of its - SUBREG_REG expression. */ +/* Return true if X is a SUBREG that will need reloading of its SUBREG_REG + expression. MODE is the mode that X will be used in. OUTPUT is true if + the function is invoked for the output part of an enclosing reload. */ -static int -reload_inner_reg_of_subreg (rtx x, enum machine_mode mode, int output) +static bool +reload_inner_reg_of_subreg (rtx x, enum machine_mode mode, bool output) { rtx inner; /* Only SUBREGs are problematical. */ if (GET_CODE (x) != SUBREG) -return 0; +return false; inner = SUBREG_REG (x); - /* If INNER is a constant or PLUS, then INNER must be reloaded. */ + /* If INNER is a constant or PLUS, then INNER will need reloading. */ if (CONSTANT_P (inner) || GET_CODE (inner) == PLUS) -return 1; +return true; - /* If INNER is not a hard register, then INNER will not need to - be reloaded. */ - if (!REG_P (inner) - || REGNO (inner) >= FIRST_PSEUDO_REGISTER) -return 0; + /* If INNER is not a hard register, then INNER will not need reloading. */ + if (!(REG_P (inner) && HARD_REGISTER_P (inner))) +return false; /* If INNER is not ok for MODE, then INNER will need reloading. */ - if (! HARD_REGNO_MODE_OK (subreg_regno (x), mode)) -return 1; + if (!HARD_REGNO_MODE_OK (subreg_regno (x), mode)) +return true; - /* If the outer part is a word or smaller, INNER larger than a - word and the number of regs for INNER is not the same as the - number of words in INNER, then INNER will need reloading. */ - return (GET_MODE_SIZE (mode) <= UNITS_PER_WORD - && output + /* If this is for an output, and the outer part is a word or smaller, + INNER is larger than a word and the number of registers in INNER is + not the same as the number of words in INNER, then INNER will need + reloading (with an in-out reload). */ + return (output + && GET_MODE_SIZE (mode) <= UNITS_PER_WORD && GET_MODE_SIZE (GET_MODE (inner)) > UNITS_PER_WORD && ((GET_MODE_SIZE (GET_MODE (inner)) / UNITS_PER_WORD) != (int) hard_regno_nregs[REGNO (inner)][GET_MODE (inner)])); @@ -990,9 +989,9 @@ push_reload (rtx in, rtx out, rtx *inloc For machines that extend byte loads, do this for any SUBREG of a pseudo where both M1 and M2 are a word or smaller, M1 is wider than M2, and M2 is an integral mode that gets extended when loaded. - Similar issue for (SUBREG:M1 (REG:M2 ...) ...) for a hard register R where - either M1 is not valid for R or M2 is wider than a word but we only - need one word to store an M2-sized quantity in R. + Similar issue for (SUBREG:M1 (REG:M2 ...) ...) for a hard register R + where either M1 is not valid for R or M2 is wider than a word but we + only need one register to store an M2-sized quantity in R. (However, if OUT is nonzero, we need to reload the reg *and* the subreg, so do nothing here, and let following statement handle it.) @@ -1082,17 +1081,16 @@ push_reload (rtx in, rtx out, rtx *inloc inmode = GET_MODE (in); } - /* Similar issue for (SUBREG:M1 (REG:M2 ...) ...) for a hard regist
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/26 Michael Matz : > Hi, > > On Wed, 26 Oct 2011, Kai Tietz wrote: > >> >> > int f(char *i, int j) >> >> > { >> >> > if (*i && j!=2) >> >> > return *i; >> >> > else >> >> > return j; >> >> > } >> >> >> >> the case can be produced quite easily. >> >> extern int global = 0; >> >> >> if (*a && global) ... > > See? You had to change the program to prove the transformation to be > invalid. But my point was that the function we discuss about was exactly > as above. It didn't have globals, or two loads, or a volatile, or > anything else you can come up with where the transformation would be > detectable (and hence invalid). I claim that you can't construct a > program that can distinguish between this function: > > int f(char *i, int j) > { > if (*i && j!=2) > return *i; > else > return j; > } > > and this one: > > int f(char *i, int j) > { > if (*i & j!=2) > return *i; > else > return j; > } > > And if you can't construct such a program, then the initial transformation > before the fold-const.c change _for this specific situation_ was correct. > > > Ciao, > Michael. well, if such a function is used as inline and we know for it that j has value != 2, then we have here a big difference. For your first example, we still have to do the memory access to *i, even if we are not interested in result. See here point 4 of 5.1.2.3 of C-spec. For your second sample we don't need to do that, as the & itself is no sequence-point and so we can eliminate the *i access without breaking anything. Regards, Kai
Re: Go patch committed: Implement new syscall package
Rainer Orth writes: > There's one problem left: with Solaris nawk, building libcalls.go fails > (from Solaris 8 to 11 inclusive): > > nawk -f /vol/gcc/src/hg/trunk/local/libgo/go/syscall/mksyscall.awk ${files} > > li > bcalls.go.tmp > nawk: syntax error at source line 47 > context is > if (match($0, "//sys(nb)?[ ]*([a-zA-Z0-9_]+)\\(([^()]*)\\) >>> > *(\ > \(([^()]+)\\))?", <<< gosig) == 0) { > nawk: illegal statement at source line 47 > nawk: syntax error at source line 58 > missing } > make[4]: *** [s-libcalls] Error 2 > > I don't yet see what's wrong with the pattern, and gawk does accept it. Hmmm, I don't have a copy of nawk. Is it possible that it doesn't like " *" == *? You could try "[ ]*". Ian
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: > >> > int f(char *i, int j) > >> > { > >> > if (*i && j!=2) > >> > return *i; > >> > else > >> > return j; > >> > } > >> > > the case can be produced quite easily. > > extern int global = 0; > > > if (*a && global) ... See? You had to change the program to prove the transformation to be invalid. But my point was that the function we discuss about was exactly as above. It didn't have globals, or two loads, or a volatile, or anything else you can come up with where the transformation would be detectable (and hence invalid). I claim that you can't construct a program that can distinguish between this function: int f(char *i, int j) { if (*i && j!=2) return *i; else return j; } and this one: int f(char *i, int j) { if (*i & j!=2) return *i; else return j; } And if you can't construct such a program, then the initial transformation before the fold-const.c change _for this specific situation_ was correct. Ciao, Michael.
[PATCH][RFC] Re-write LTO option merging
This completely rewrites LTO option merging. At compile (uselessly now at WPA?) time we now stream a COLLECT_GCC_OPTIONS like string as it comes from argv of the compiler binary. Those options are read in by the LTO driver (lto-wrapper), merged into a single set (very simple merge function right now ;)) and given a place to complain about incompatible arguments. The merged set is then prepended to the arguments from the linker driver line (what we get in COLLECT_GCC_OPTIONS for lto-wrapper), thus the linker command-line may override what the compiler command-line(s) provided. One visible change is that no optimization option on the link line no longer means -O0, unless you explicitly specify -O0 at link time. There are probably more obscure differences, especially due to the very simple merge and complain function ;)) But this is a RFC ... If WPA partitioning at any point wants to do something clever with a set of incompatible functions it can re-parse the options and do that (we then have to arrange for lto-wrapper to let the options slip through). I'm LTO bootstrapping and testing this simple variant right now (I believe we do not excercise funny option combinations right now). I'll still implement a very simple merge/complain function. Suggestions for that welcome (I'll probably simply compute the intersection of options ... in the long run we'd want to annotate our options as to whether they should be unioned/intersected). Any comments? Thanks, Richard. 2011-10-26 Richard Guenther * lto-opts.c: Re-implement. * lto-streamer.h (lto_register_user_option): Remove. (lto_read_file_options): Likewise. (lto_reissue_options): Likewise. (lto_clear_user_options): Likewise. (lto_clear_file_options): Likewise. * opts-global.c (post_handling_callback): Remove LTO specific code. (decode_options): Likewise. * lto-wrapper.c (run_gcc): Read all input file options and prepend a merged set before the linker driver options. lto/ * lto-lang.c (lto_post_options): Do not read file options. * lto.c (lto_read_all_file_options): Remove. (lto_init): Call lto_set_in_hooks here. Index: trunk/gcc/lto-opts.c === *** trunk.orig/gcc/lto-opts.c 2011-10-26 13:46:24.0 +0200 --- trunk/gcc/lto-opts.c2011-10-26 15:06:29.0 +0200 *** along with GCC; see the file COPYING3. *** 33,280 #include "common/common-target.h" #include "diagnostic.h" #include "lto-streamer.h" ! /* When a file is initially compiled, the options used when generating !the IL are not necessarily the same as those used when linking the !objects into the final executable. In general, most build systems !will proceed with something along the lines of: ! ! $ gcc -flto -c f1.c -o f1.o ! $ gcc -flto -c f2.c -o f2.o ! ... ! $ gcc -flto -c fN.c -o fN.o ! !And the final link may or may not include the same used !to generate the initial object files: ! ! $ gcc -flto -o prog f1.o ... fN.o ! !Since we will be generating final code during the link step, some !of the flags used during the compile step need to be re-applied !during the link step. For instance, flags in the -m family. ! !The idea is to save a selected set of in a special !section of the initial object files. This section is then read !during linking and the options re-applied. ! !FIXME lto. Currently the scheme is limited in that only the !options saved on the first object file (f1.o) are read back during !the link step. This means that the options used to compile f1.o !will be applied to ALL the object files in the final link step. !More work needs to be done to implement a merging and validation !mechanism, as this will not be enough for all cases. */ ! ! /* Saved options hold the type of the option (currently CL_TARGET or !CL_COMMON), and the code, argument, and value. */ ! ! typedef struct GTY(()) opt_d ! { ! unsigned int type; ! size_t code; ! char *arg; ! int value; ! } opt_t; ! ! DEF_VEC_O (opt_t); ! DEF_VEC_ALLOC_O (opt_t, heap); ! ! ! /* Options are held in two vectors, one for those registered by !command line handling code, and the other for those read in from !any LTO IL input. */ ! static VEC(opt_t, heap) *user_options = NULL; ! static VEC(opt_t, heap) *file_options = NULL; ! ! /* Iterate FROM in reverse, writing option codes not yet in CODES into *TO. !Mark each new option code encountered in CODES. */ static void ! reverse_iterate_options (VEC(opt_t, heap) *from, VEC(opt_t, heap) **to, !bitmap codes) ! { ! int i; ! ! for (i = VEC_length (opt_t, from); i > 0; i--) ! { ! const opt_t *const o = VEC_index (opt_t, from, i - 1); ! ! if (bitmap_set_bit (codes, o->code)) !
Re: [cxx-mem-model] Generic atomic functions
On 10/26/2011 08:29 AM, Michael Matz wrote: Hi, On Tue, 25 Oct 2011, Andrew MacLeod wrote: and new generic routines are provided as follows: void __atomic_load (T* object, T* return_value, memory_order m) void __atomic_store (T* object, T* new_value, memory_order m) void __atomic_exchange (T* object, T* new_value, T* return_value, memory_order m) void __atomic_compare_exchange (T* object, T* expected_value, T* new_value, bool weak, memory_order success, memory_order fail) When a generic routine is called with an object whose size maps to one of the type specific built-ins, (ie sizeof (T) == 1,2,4,8 or 16) the generic call is translated into the appropriate direct call. Are the generic routines really generic? In particular do they accept objects that aren't naturally aligned and are supposed to still work? In that case you can't rewrite them into the type specific builtins after only checking the size, you also need to check alignment. yes, ive been considering that. At the moment, the claim is it is well defined for properly aligned objects, undefined otherwise. I've been considering what would be involved in defining it. And I not sure if that should wait until the next release, or whether we should try now. - The 5 size specific routines *must* be properly aligned. (ie, and N byte object must have the same alignment as the N byte integral) for the generic routines: - if it can be determined at compile time that the object being pointed to is properly aligned, then it can be mapped into one of the 5 size specific routines. Otherwise it must remain a call into the runtime library. - The runtime library can look at the pointer value and determine if things are properly aligned based on the pointer value. If properly aligned, then it could invoke one of the runtime lock-free routines. - If an improperly aligned pointer is passed in, the fall back would be to resort to what will probably be a locked implementation for the object. - the __atomic_is_lock_free(size) routine will have to have an additional object pointer parameter. If its NULL, it will return a value based on properly aligned objects of SIZE. Otherwise, it will also check the value of the pointer to see if it is properly aligned and then return true or false based on whether one of the lock-free routines will be invoked. So it would mean that an 8 byte object which is properly aligned could be lock free, while and unaligned 8 byte object would likely have a locked implementation. They would all work though, just requiring the library. Thats my thoughts... I'm tempted to implement the alignment checking... It seems more complete, but it is more complex. so I waffle :-) any other opinions?
Re: PowerPC shrink-wrap support 3 of 3
On 10/26/11 14:27, Alan Modra wrote: > Committed revision 180522. It turns out that shrink-wrapping isn't as > effective as it used to be with the 20110915 based sources I was using > originally. povray Ray_In_Bound no longer gets the benefit of shrink > wrap, likely due to some cfg optimization. We end up with a simple > block that just does r3=1 then jumps to last_bb being reached from > blocks that need a prologue as well as blocks that don't. That's > enough to kill our current shrink wrap implementation. What we need > is something to duplicate these tail blocks.. Would it work to insert the epilogue on some edges to this R3=1 block, and not on the others? (How many edges of each kind are there?) Now that we have an initial patch in the tree and it mostly seems to work, we can think about making it a little stronger - the initial implementation is really quite conservative. Bernd
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
I describe the sample more closely here extern int global = 0; extern int *a = NULL; void catchSigSegV( int sig ) { a = &global; } int foo (int j) { signal (SIGSEGV, catchSigSegV); if (*a && global) return 2; return 0; } I admit that in most cases such a scenario is not common. This sample seems to be a valid C program. So the conditions in IF shall be evaluted strict in order of sequence-points, as first argument might trap. It doesn't matter if second argument have side-effects or none. The point is the first and so it has to be separated from other conditions. Regards, Kai
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/26 Michael Matz : > Hi, > > On Wed, 26 Oct 2011, Kai Tietz wrote: > >> > Yes, this part introduced different behavior for this small case, >> > >> > int f(char *i, int j) >> > { >> > if (*i && j!=2) >> > return *i; >> > else >> > return j; >> > } >> >> Well, as far as I understand C specification and sequence points, it >> makes indeed a difference to do branch-cost optimization on instructions >> might cause a signal traps. As signal could be handled in specifc >> handlers. You need to consider here that short-circuit optimization >> assumes associative law on operands. So right-hand side might be >> evaluaded before the left-hand-side. And this indeed breaks IMHO the >> sequences as mentioned in C specification about conditions. > > I'm not sure in this specific case. If it had been: > > if (*a && *b) ... > > the you'd be right. The side-effect of dereferencing a must happen before > *b, and hence transforming this into (*a!=0)&(*b!=0) would be wrong. But > in the case at hand we only have one potentially problematic (aka > detectable) side-effect (*i), so I don't think you could construct a > program that detects the difference. It would entail detecting that > "j!=2" was already evaluated, when (say) the segfault happens, but you > can't as the variable is non-volatile. It also can't happen that the > side-effect *i does not occur (which also would be a problem). So, I > think there wasn't an actual problem before, and it had fewer jumps. > > > Ciao, > Michael. the case can be produced quite easily. Eg: extern int global = 0; if (*a && global) ... ... the issue is that by C-specification see here 5.1.2.2.3 about program-termination.The important point is here:: Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete *** and no side effects of subsequent evaluations shall have taken place *** Annex C describes sequence-points as 1 The following are the sequence points described in 5.1.2.3: — The call to a function, after the arguments have been evaluated (6.5.2.2). — The end of the first operand of the following operators: logical AND && (6.5.13); logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17). ... Regards, Kai
Re: [cxx-mem-model] Generic atomic functions
Hi, On Tue, 25 Oct 2011, Andrew MacLeod wrote: > and new generic routines are provided as follows: > > void __atomic_load (T* object, T* return_value, memory_order m) > void __atomic_store (T* object, T* new_value, memory_order m) > void __atomic_exchange (T* object, T* new_value, T* return_value, memory_order > m) > void __atomic_compare_exchange (T* object, T* expected_value, T* new_value, > bool weak, memory_order success, memory_order fail) > > When a generic routine is called with an object whose size maps to one > of the type specific built-ins, (ie sizeof (T) == 1,2,4,8 or 16) the > generic call is translated into the appropriate direct call. Are the generic routines really generic? In particular do they accept objects that aren't naturally aligned and are supposed to still work? In that case you can't rewrite them into the type specific builtins after only checking the size, you also need to check alignment. Ciao, Michael.
Re: PowerPC shrink-wrap support 3 of 3
On Sun, Oct 16, 2011 at 02:51:01PM -0400, David Edelsohn wrote: > The patch is okay, although I am not thrilled about the need to change > the register allocation order. Committed revision 180522. It turns out that shrink-wrapping isn't as effective as it used to be with the 20110915 based sources I was using originally. povray Ray_In_Bound no longer gets the benefit of shrink wrap, likely due to some cfg optimization. We end up with a simple block that just does r3=1 then jumps to last_bb being reached from blocks that need a prologue as well as blocks that don't. That's enough to kill our current shrink wrap implementation. What we need is something to duplicate these tail blocks.. Patch here for comment. I haven't yet run benchmarks, but this passes bootstrap and regression test (on rev 180286, current virgin mainline fails bootstrap on powerpc-linux). * function.c (thread_prologue_and_epilogue_insns): Delete shadowing variables. Don't do prologue register clobber tests when shrink wrapping already failed. Compute tail block candidates for duplicating exit path. Remove these from antic set. Duplicate tails when reached from both blocks needing a prologue/epilogue and blocks not needing such. Index: gcc/function.c === --- gcc/function.c (revision 180467) +++ gcc/function.c (working copy) @@ -5697,11 +5697,11 @@ thread_prologue_and_epilogue_insns (void HARD_REG_SET prologue_clobbered, prologue_used, live_on_edge; HARD_REG_SET set_up_by_prologue; rtx p_insn; - VEC(basic_block, heap) *vec; basic_block bb; bitmap_head bb_antic_flags; bitmap_head bb_on_list; + bitmap_head bb_tail; if (dump_file) fprintf (dump_file, "Attempting shrink-wrapping optimization.\n"); @@ -5732,6 +5732,7 @@ thread_prologue_and_epilogue_insns (void bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack); bitmap_initialize (&bb_on_list, &bitmap_default_obstack); + bitmap_initialize (&bb_tail, &bitmap_default_obstack); /* Find the set of basic blocks that require a stack frame. */ @@ -5774,19 +5775,22 @@ thread_prologue_and_epilogue_insns (void } } + /* Save a copy of blocks that really need a prologue. */ + bitmap_copy (&bb_antic_flags, &bb_flags); + /* For every basic block that needs a prologue, mark all blocks reachable from it, so as to ensure they are also seen as requiring a prologue. */ while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); - edge e; - edge_iterator ei; + FOR_EACH_EDGE (e, ei, tmp_bb->succs) if (e->dest != EXIT_BLOCK_PTR && bitmap_set_bit (&bb_flags, e->dest->index)) VEC_quick_push (basic_block, vec, e->dest); } + /* If the last basic block contains only a label, we'll be able to convert jumps to it to (potentially conditional) return insns later. This means we don't necessarily need a prologue @@ -5799,14 +5803,29 @@ thread_prologue_and_epilogue_insns (void goto fail_shrinkwrap; } + /* Find the set of basic blocks that need no prologue and only +go to the exit. */ + bitmap_set_bit (&bb_tail, EXIT_BLOCK_PTR->index); + VEC_quick_push (basic_block, vec, EXIT_BLOCK_PTR); + while (!VEC_empty (basic_block, vec)) + { + basic_block tmp_bb = VEC_pop (basic_block, vec); + + FOR_EACH_EDGE (e, ei, tmp_bb->preds) + if (single_succ_p (e->src) + && !bitmap_bit_p (&bb_antic_flags, e->src->index) + && bitmap_set_bit (&bb_tail, e->src->index)) + VEC_quick_push (basic_block, vec, e->src); + } + /* Now walk backwards from every block that is marked as needing -a prologue to compute the bb_antic_flags bitmap. */ - bitmap_copy (&bb_antic_flags, &bb_flags); +a prologue to compute the bb_antic_flags bitmap. Exclude +tail blocks; They can be duplicated to be used on paths not +needing a prologue. */ + bitmap_and_compl (&bb_antic_flags, &bb_flags, &bb_tail); FOR_EACH_BB (bb) { - edge e; - edge_iterator ei; - if (!bitmap_bit_p (&bb_flags, bb->index)) + if (!bitmap_bit_p (&bb_antic_flags, bb->index)) continue; FOR_EACH_EDGE (e, ei, bb->preds) if (!bitmap_bit_p (&bb_antic_flags, e->src->index) @@ -5816,8 +5835,6 @@ thread_prologue_and_epilogue_insns (void while (!VEC_empty (basic_block, vec)) { basic_block tmp_bb = VEC_pop (basic_block, vec); - edge e; - edge_iterator ei; bool all_set = true; bitmap_clear_bit (&bb_on_list, tmp_bb->index); @@ -5862,28 +5879,172
Re: Go patch committed: Implement new syscall package
Ian, > I committed this patch to mainline to try to fix these problems. Thanks > for testing. For this patch I bootstrapped and ran Go testsuite on > x86_64-unknown-linux-gnu. with this patch, go and libgo results on Solaris 10 and 11/x86 are back to normal, and Solaris 10 and 11/SPARC bootstraps are currently running. Thanks. There's one problem left: with Solaris nawk, building libcalls.go fails (from Solaris 8 to 11 inclusive): nawk -f /vol/gcc/src/hg/trunk/local/libgo/go/syscall/mksyscall.awk ${files} > li bcalls.go.tmp nawk: syntax error at source line 47 context is if (match($0, "//sys(nb)?[ ]*([a-zA-Z0-9_]+)\\(([^()]*)\\) >>> *(\ \(([^()]+)\\))?", <<< gosig) == 0) { nawk: illegal statement at source line 47 nawk: syntax error at source line 58 missing } make[4]: *** [s-libcalls] Error 2 I don't yet see what's wrong with the pattern, and gawk does accept it. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Kai Tietz wrote: > > Yes, this part introduced different behavior for this small case, > > > > int f(char *i, int j) > > { > > if (*i && j!=2) > > return *i; > > else > > return j; > > } > > Well, as far as I understand C specification and sequence points, it > makes indeed a difference to do branch-cost optimization on instructions > might cause a signal traps. As signal could be handled in specifc > handlers. You need to consider here that short-circuit optimization > assumes associative law on operands. So right-hand side might be > evaluaded before the left-hand-side. And this indeed breaks IMHO the > sequences as mentioned in C specification about conditions. I'm not sure in this specific case. If it had been: if (*a && *b) ... the you'd be right. The side-effect of dereferencing a must happen before *b, and hence transforming this into (*a!=0)&(*b!=0) would be wrong. But in the case at hand we only have one potentially problematic (aka detectable) side-effect (*i), so I don't think you could construct a program that detects the difference. It would entail detecting that "j!=2" was already evaluated, when (say) the segfault happens, but you can't as the variable is non-volatile. It also can't happen that the side-effect *i does not occur (which also would be a problem). So, I think there wasn't an actual problem before, and it had fewer jumps. Ciao, Michael.
RE: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 26 Oct 2011, Jiangning Liu wrote: > > > >> - > > > >> - if (LOGICAL_OP_NON_SHORT_CIRCUIT) > > > >> - { > > > >> - if (code != orig_code || lhs != orig_lhs || rhs != > > orig_rhs) > > > >> - return build2_loc (loc, code, truth_type, lhs, rhs); > > > >> - return NULL_TREE; > > > >> - } > > > > > > > > Why do you remove this hunk? Shouldn't you instead move the hunk > > you > > > > added to fold_truth_andor() here. I realize this needs some TLC to > > > > fold_truth_andor_1, because right now it early-outs for non- > > comparisons, > > > > but it seems the better place. I.e. somehow move the below code > > into the > > > > above branch, with the associated diddling on fold_truth_andor_1 > > that it > > > > gets called. > > > > > > This hunk is removed, as it is vain to do here. > > > > There is a fallthrough now, that wasn't there before. I don't know if > > it's harmless, I just wanted to mention it. > > > > Yes, this part introduced different behavior for this small case, > > D.4710 = *i; > D.4711 = D.4710 != 0; > D.4712 = j != 2; > D.4713 = D.4711 & D.4712; > if (D.4713 != 0) goto ; else goto ; > > After the fix, we have > > D.4711 = *i; > if (D.4711 != 0) goto ; else goto ; > : > if (j != 2) goto ; else goto ; So, we have one more jump than originally, when the point of the patch was to emit less on targets with high branch costs. So, as speculated, the hunk was not useless. (It's nice that it caused a benchmark to improve significantly, but that should be done via a proper analysis and patch, not as a side effect of a supposed non-change). Ciao, Michael.
[PATCH] Fix lto-wrapper obstack handling
Seems only in some cases we fault freeing an uninitialized obstack. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2011-10-26 Richard Guenther * lto-wrapper.c (run_gcc): Properly init/free obstack. Index: gcc/lto-wrapper.c === *** gcc/lto-wrapper.c (revision 180520) --- gcc/lto-wrapper.c (working copy) *** run_gcc (unsigned argc, char *argv[]) *** 339,345 char *list_option_full = NULL; const char *linker_output = NULL; const char *collect_gcc, *collect_gcc_options; - struct obstack env_obstack; int parallel = 0; int jobserver = 0; bool no_partition = false; --- 339,344 *** run_gcc (unsigned argc, char *argv[]) *** 517,527 --- 516,528 { FILE *stream = fopen (ltrans_output_file, "r"); FILE *mstream = NULL; + struct obstack env_obstack; if (!stream) fatal_perror ("fopen: %s", ltrans_output_file); /* Parse the list of LTRANS inputs from the WPA stage. */ + obstack_init (&env_obstack); nr = 0; for (;;) { *** cont: *** 574,580 /* Replace the .o suffix with a .ltrans.o suffix and write the resulting name to the LTRANS output list. */ - obstack_init (&env_obstack); obstack_grow (&env_obstack, input_name, strlen (input_name) - 2); obstack_grow (&env_obstack, ".ltrans.o", sizeof (".ltrans.o")); output_name = XOBFINISH (&env_obstack, char *); --- 575,580 *** cont: *** 654,662 free (output_names); free (input_names); free (list_option_full); } - - obstack_free (&env_obstack, NULL); } --- 654,661 free (output_names); free (input_names); free (list_option_full); + obstack_free (&env_obstack, NULL); } }
Re: [PATCH 0/6] More vector permutation work
Hi Richard, On 25 Oct 2011, at 01:17, Richard Henderson wrote: The Idea with this patch set is to re-arrange vector permutation so that it can be used to implement other patterns automatically. In particular, Altivec, SPU currently have (and Sparc VIS would need) a large amount of boilerplate code that transforms several higher level tree codes into vector permutation inside the backend. This seems a bit unnecessary. It's arguable that we could simply make VEC_PERM_EXPR the only code that gets generated at the vectorizer level, and drop some of these other patterns entirely. I'm not 100% sure we should go that far, but even if we did, I still think this is a good cleanup. Bootstrapped and tested on x86_64-linux only. I've only cross-tested the ppc/spu patches. unfortunately, the series breaks bootstrap for powerpc-darwin9: http://gcc.gnu.org/ml/gcc-regression/2011-10/msg00539.html the failure is in building libgfortran: libtool: compile: /GCC/gcc-4-7-trunk-build/./gcc/xgcc -B/GCC/gcc-4-7- trunk-build/./gcc/ -B/GCC/gcc-4-7-install/powerpc-apple-darwin9/bin/ - B/GCC/gcc-4-7-install/powerpc-apple-darwin9/lib/ -isystem /GCC/gcc-4-7- install/powerpc-apple-darwin9/include -isystem /GCC/gcc-4-7-install/ powerpc-apple-darwin9/sys-include -DHAVE_CONFIG_H -I. -I/GCC/gcc-live- trunk/libgfortran -iquote/GCC/gcc-live-trunk/libgfortran/io -I/GCC/gcc- live-trunk/libgfortran/../gcc -I/GCC/gcc-live-trunk/libgfortran/../gcc/ config -I../.././gcc -std=gnu99 -Wall -Wstrict-prototypes -Wmissing- prototypes -Wold-style-definition -Wextra -Wwrite-strings -fcx-fortran- rules -ffunction-sections -fdata-sections -ftree-vectorize -funroll- loops -g -O2 -MT matmul_c4.lo -MD -MP -MF .deps/matmul_c4.Tpo -c /GCC/ gcc-live-trunk/libgfortran/generated/matmul_c4.c -fno-common -DPIC - o .libs/matmul_c4.o /GCC/gcc-live-trunk/libgfortran/generated/matmul_c4.c: In function ‘matmul_c4’: /GCC/gcc-live-trunk/libgfortran/generated/matmul_c4.c:284:25: internal compiler error: in expand_vec_perm, at optabs.c:6998 Please submit a full bug report, with preprocessed source if appropriate. it is reproducible with a stage-1 compiler - so presumably off a cross too. cheers Iain
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Tue, Oct 25, 2011 at 2:22 PM, Tom de Vries wrote: > On 09/24/2011 01:42 PM, Richard Guenther wrote: >> On Sat, Sep 24, 2011 at 11:40 AM, Jakub Jelinek wrote: >>> On Sat, Sep 24, 2011 at 11:31:25AM +0200, Richard Guenther wrote: In the end I'd probably say the patch is ok without the option (thus turned on by default), but if LC_GLOBAL_LOCALE is part of the glibc ABI then we clearly can't do this. >>> >>> Yes, LC_GLOBAL_LOCALE is part of glibc ABI. I guess we could only assume >>> the alignment if the pointer is actually dereferenced on the statement >>> that checks the ABI or in some stmt that dominates the spot where you want >>> to check the alignment. It is IMHO quite common to pass arbitrary values >>> in pointer types, then cast them back or just compare. >> >> Yeah (even if technically invoking undefined behavior in C). Checking if >> there is a dereference post-dominating function entry with sth like >> >> FOR_EACH_IMM_USE_STMT (... ptr ...) >> if (stmt_post_dominates_entry && contains derefrence of ptr) >> alignment = TYPE_ALIGN (...); >> >> and otherwise not assuming anything about parameter alignment might work. >> Be careful to check the alignment of the dereference though, >> >> typedef int int_unaligned __attribute__((aligned(1))); >> int foo (int *p) >> { >> int_unaligned *q = p; >> return *q; >> } >> >> will be MEM[p] but with (well, hopefully ;)) TYPE_ALIGN of TREE_TYPE (MEM[p]) >> being 1. And yes, you'd have to look into handled-components as well. I >> guess >> you'll face similar problems as we do with tree-sra.c >> tree_non_mode_aligned_mem_p >> (you need to assume eventually misaligned accesses the same way expansion >> does for the dereference, otherwise you'll run into issues on >> strict-align targets). >> >> As that de-refrence thing doesn't really fit the CCP propagation you >> won't be able >> to handle >> >> int foo (int *p) >> { >> int *q = (char *)p + 3; >> return *q; >> } >> >> and assume q is aligned (and p is misaligned by 1). >> >> That is, if the definition of a pointer is post-dominated by a derefrence >> we could assume proper alignment for that pointer (as opposed to just >> special-casing its default definition). Would be certainly interesting to >> see what kind of fallout we would get from that ;) >> > > I gave this a try in deduce_alignment_from_dereferences. > > The fall-out I got from this were unaligned dereferenced pointers in > gcc.c-torture/unsorted/*{cmp,set}.c. > > Bootstrapped and reg-tested on x86_64. Build and reg-tested on MIPS and ARM. > > Ok for trunk? Can you not do the get_value_from_alignment split (it doesn't look necessary to me) and drop the @@ -541,10 +550,18 @@ get_value_for_expr (tree expr, bool for_ if (TREE_CODE (expr) == SSA_NAME) { val = *get_value (expr); - if (for_bits_p - && val.lattice_val == CONSTANT + if (!for_bits_p) + return val; + + if (val.lattice_val == CONSTANT && TREE_CODE (val.value) == ADDR_EXPR) val = get_value_from_alignment (val.value); + else if (val.lattice_val == VARYING + && SSA_NAME_PTR_INFO (expr) != NULL + && SSA_NAME_PTR_INFO (expr)->align > 1 + && SSA_NAME_PTR_INFO (expr)->misalign == 0) + val = get_align_value (SSA_NAME_PTR_INFO (expr)->align * BITS_PER_UNIT, + TREE_TYPE (expr), 0); } hunk? I'm not sure why it is necessary at all - CCP is the only pass computing alignment, so it should simply re-compute the info? Anyway, it looks unrelated to the purpose of the patch in general. The error reporting in deduce_alignment_from_dereferences is bogus, the programs are undefined only at runtime, so you can at most issue a warning. + /* Needs to be the successor of entry, for CDI_POST_DOMINATORS. */ + entry = single_succ (ENTRY_BLOCK_PTR); + + FOR_EACH_BB (bb) +{ + gimple_stmt_iterator i; + + if (!dominated_by_p (CDI_POST_DOMINATORS, entry, bb)) + continue; if you only consider post-dominators of the entry block then just walk them directly (first_dom_son / next_dom_son). + align = TYPE_ALIGN (TREE_TYPE (memref)) / BITS_PER_UNIT; + if (align == 1) + continue; I think you want to match what expand thinks of the alignment of this memory reference, not just what TYPE_ALIGN says (and yes, that needs to be split out somehow, SRA would need this as well). + while (TREE_CODE (ptr) == SSA_NAME) + { + pi = get_ptr_info (ptr); + if (pi->misalign != 0) + { + error ("misaligned pointer dereferenced"); + break; + } simply looking at pi->misalign is wrong. pi->align may be bigger than the align that you computed above, so pi->misalign % align != 0 would be the right check. + if (pi->align >= align) + break; + pi->align = align;