Re: [PATCH][2/2] Make SCCVN use conditional equivalences
On Wed, 12 Aug 2015, Richard Biener wrote:

> This brings FRE/PRE up to the same level as DOM in being able to
> remove redundant conditionals.  It does so by inserting temporary
> conditional expressions proved to be true on single-predecessor edges.
>
> I've had to do a lot of testcase adjustments, thus the patch is now
> re-bootstrapping / testing on x86_64-unknown-linux-gnu.

I've applied with a slight change, trimming down the number of
equivalences recorded (basically only record anything off conditions
not already optimized to go either way).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-08-12  Richard Biener  rguent...@suse.de

	* tree-ssa-sccvn.c (vn_nary_op_compute_hash): Also canonicalize
	comparison operand order and commutative ternary op operand order.
	(sccvn_dom_walker::cond_stack): New state to track temporary
	expressions.
	(sccvn_dom_walker::after_dom_children): Remove temporary expressions
	no longer valid.
	(sccvn_dom_walker::record_cond): Add a single temporary conditional
	expression.
	(sccvn_dom_walker::record_conds): Add a temporary conditional
	expression and all related expressions also true/false.
	(sccvn_dom_walker::before_dom_children): Record temporary
	expressions based on the controlling condition of a single
	predecessor.  When trying to simplify a conditional statement
	lookup expressions we might have inserted earlier.
	* testsuite/gcc.dg/tree-ssa/ssa-fre-47.c: New testcase.
	* testsuite/gcc.dg/tree-ssa/ssa-fre-48.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/ssa-fre-49.c: Likewise.
	* testsuite/g++.dg/tree-ssa/pr61034.C: Adjust.
	* testsuite/gcc.dg/fold-compare-2.c: Likewise.
	* testsuite/gcc.dg/pr50763.c: Likewise.
	* testsuite/gcc.dg/predict-3.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/20030709-2.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/pr19831-3.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/pr20657.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/pr21001.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/pr37508.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/vrp04.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/vrp07.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/vrp09.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/vrp16.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/vrp20.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/vrp25.c: Likewise.
	* testsuite/gcc.dg/tree-ssa/vrp87.c: Likewise.

Index: gcc/testsuite/g++.dg/tree-ssa/pr61034.C
===================================================================
--- gcc/testsuite/g++.dg/tree-ssa/pr61034.C	(revision 226802)
+++ gcc/testsuite/g++.dg/tree-ssa/pr61034.C	(working copy)
@@ -43,5 +43,5 @@ bool f(I a, I b, I c, I d) {
 
 // This works only if everything is inlined into 'f'.
 // { dg-final { scan-tree-dump-times ";; Function" 1 "fre2" } }
-// { dg-final { scan-tree-dump-times "free" 18 "fre2" } }
+// { dg-final { scan-tree-dump-times "free" 10 "fre2" } }
 // { dg-final { scan-tree-dump-times "unreachable" 11 "fre2" } }
Index: gcc/testsuite/gcc.dg/fold-compare-2.c
===================================================================
--- gcc/testsuite/gcc.dg/fold-compare-2.c	(revision 226802)
+++ gcc/testsuite/gcc.dg/fold-compare-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-tail-merge -fdump-tree-vrp1" } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
 
 extern void abort (void);
 
@@ -15,5 +15,5 @@ main(void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "Removing basic block" 2 "vrp1" } } */
+/* { dg-final { scan-tree-dump-times "Removing basic block" 2 "fre1" } } */
Index: gcc/testsuite/gcc.dg/pr50763.c
===================================================================
--- gcc/testsuite/gcc.dg/pr50763.c	(revision 226802)
+++ gcc/testsuite/gcc.dg/pr50763.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-tail-merge -fno-tree-dominator-opts -fdump-tree-pre" } */
+/* { dg-options "-O2 -ftree-tail-merge -fno-tree-dominator-opts" } */
 
 int bar (int i);
 
@@ -11,5 +11,3 @@ foo (int c, int d)
     d = 33;
   while (c == d);
 }
-
-/* { dg-final { scan-tree-dump-times "== 33" 2 "pre"} } */
Index: gcc/testsuite/gcc.dg/predict-3.c
===================================================================
--- gcc/testsuite/gcc.dg/predict-3.c	(revision 226802)
+++ gcc/testsuite/gcc.dg/predict-3.c	(working copy)
@@ -12,6 +12,10 @@ void foo (int bound)
 {
       if (i < bound - 2)
 	global += bar (i);
+      /* The following test is redundant with the loop bound check in
+	 the for stmt and thus eliminated by FRE which makes the
+	 controlled stmt always executed and thus equivalent to 100%.
+	 Thus the heuristic only applies three times.  */
       if (i <= bound)
 	global += bar (i);
       if (i + 1 < bound)
Re: [PATCH, libjava/classpath]: Fix overriding recipe for target 'gjdoc' build warning
On 08/11/2015 08:47 PM, Tom Tromey wrote:

> Jeff> It's probably time for the occasional discussion WRT dropping
> Jeff> gcj/libjava from the default languages and replace them with
> Jeff> either Ada or Go.
>
> It's long past time to remove it.  It's only had minimal maintenance
> for years now.  No one is writing new features for it or fixing bugs.
> There aren't any significant users.
>
> I've always felt I should be the one to pull the trigger.  If this is
> acceptable I can take a stab at preparing a patch.
>
> I thought maybe this would also enable deleting boehm-gc, zlib, or
> libffi; but I see now they all have other users in the tree now.

Just to be clear, I'm not talking about total removal, just removal
from the default languages.  In the past this has stalled on issues
like how asynch-exceptions will be tested and the like.

My inclination is to replace GCJ with Go, but Ian wasn't comfortable
with that when I suggested it a couple of years ago.  The other obvious
alternative is Ada -- my testing showed that using Ada would further
slow down bootstrap/regression test time, which is undesirable.

I could live with either, but I'd lean towards Go.

jeff
[committed, PATCH] Add Knights Landing support to __builtin_cpu_is
This patch adds Knights Landing support to __builtin_cpu_is.

gcc/testsuite/

	* gcc.target/i386/builtin_target.c (check_intel_cpu_model): Check
	Knights Landing support.

libgcc/

	* config/i386/cpuinfo.c (processor_types): Add INTEL_KNL.
	(get_intel_cpu): Add Knights Landing support.
---
 gcc/testsuite/gcc.target/i386/builtin_target.c | 4 ++++
 libgcc/config/i386/cpuinfo.c                   | 5 +++++
 2 files changed, 9 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c b/gcc/testsuite/gcc.target/i386/builtin_target.c
index 4adea27..068db23 100644
--- a/gcc/testsuite/gcc.target/i386/builtin_target.c
+++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
@@ -38,6 +38,10 @@ check_intel_cpu_model (unsigned int family, unsigned int model,
 	  /* Silvermont.  */
 	  assert (__builtin_cpu_is ("silvermont"));
 	  break;
+	case 0x57:
+	  /* Knights Landing.  */
+	  assert (__builtin_cpu_is ("knl"));
+	  break;
 	case 0x1a:
 	case 0x1e:
 	case 0x1f:
diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
index 25d85e4..b7b11de 100644
--- a/libgcc/config/i386/cpuinfo.c
+++ b/libgcc/config/i386/cpuinfo.c
@@ -56,6 +56,7 @@ enum processor_types
   AMDFAM10H,
   AMDFAM15H,
   INTEL_SILVERMONT,
+  INTEL_KNL,
   AMD_BTVER1,
   AMD_BTVER2,
   CPU_TYPE_MAX
@@ -197,6 +198,10 @@ get_intel_cpu (unsigned int family, unsigned int model, unsigned int brand_id)
       /* Silvermont.  */
       __cpu_model.__cpu_type = INTEL_SILVERMONT;
       break;
+    case 0x57:
+      /* Knights Landing.  */
+      __cpu_model.__cpu_type = INTEL_KNL;
+      break;
    case 0x1a:
    case 0x1e:
    case 0x1f:
-- 
2.4.3
Re: [PATCH, libjava/classpath]: Fix overriding recipe for target 'gjdoc' build warning
On 12/08/15 15:44, Jeff Law wrote: My inclination is to replace GCJ with Go, but Ian wasn't comfortable with that when I suggested it a couple years ago. Because Go wasn't ready for prime time? Andrew.
Re: [PATCH][1/2] Make SCCVN use conditional equivalences
On 2015.08.12 at 09:32 +0200, Richard Biener wrote: This is the first patch in the series to make SCCVN able to remove redundant comparisons by recording conditionals being true on visited edges. This part of the series fully transitions the toplevel walk gathering entries to SCCs we value-number to the DOM walk which at the moment only visits last stmts of BBs. After this series we can safely insert expressions into the SCCVN tables temporarily for uses dominated by the stmt generating them. This patch caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67191 -- Markus
[PATCH][AArch64][committed] Do not ICE after apologising for -mcmodel=large -fPIC
Hi all,

If we try to compile a file with -mcmodel=large -fPIC we will emit a
sorry in initialize_aarch64_code_model because that combination is not
implemented.  However, there is a missing break in that case statement
and we end up falling through to the gcc_unreachable and ICE'ing.

The right thing here is to break.  The sorry () already suppresses
generation of any result, so there's no risk of proceeding with wrong
codegen.

Bootstrapped and tested on aarch64.
Applied as obvious with r226815.

Thanks,
Kyrill

2015-08-12  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/aarch64/aarch64.c (initialize_aarch64_code_model): Break
	after -mcmodel=large -fPIC sorry.

commit 227760c90ec4316193f5a4b8a11378faedb3e039
Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Date:   Mon Aug 10 16:09:14 2015 +0100

    [AArch64] Do not ICE after apologising for -mcmodel=large -fPIC

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e00a069..87ed777 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7950,6 +7950,7 @@ initialize_aarch64_code_model (struct gcc_options *opts)
 	case AARCH64_CMODEL_LARGE:
 	  sorry ("code model %qs with -f%s", "large",
 		 opts->x_flag_pic > 1 ? "PIC" : "pic");
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
Re: [PATCH, PR67092, PR67098 ] Remove --with-host-libstdcxx
On Wed, 12 Aug 2015, Tom de Vries wrote: Hi, this patch removes configure option --with-host-libstdcxx. [ As suggested here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67092#c13 ): ... I think we can no longer reliably support host libstdc++ as includes are not picked up from its location and GCC is C++ now. I suggest to remove that entirely. ... ] The option was originally introduced to support linking with a static version of ppl, but we no longer support using ppl. The behaviour of --with-host-libstdcxx is implemented in terms of other configure options, so we're not losing any functionality. Furthermore, the patch adds the missing documentation of the default behaviour of --with-stage1-ldflags. Bootstrapped and reg-tested on x86_64. OK for trunk? Ok. Thanks, Richard.
Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive
On 11/08/15 18:09, Kyrill Tkachov wrote:
> On 11/08/15 18:05, Jeff Law wrote:
>> On 08/09/2015 03:20 PM, Steven Bosscher wrote:
>>> On Fri, Jul 31, 2015 at 7:26 PM, Jeff Law <l...@redhat.com> wrote:
>>>> So there's a tight relationship between the implementation of
>>>> bbs_ok_for_cmove_arith and insn_valid_noce_process_p.  If there
>>>> wasn't, then we'd probably be looking to use note_stores and
>>>> note_uses.
>>>
>>> Perhaps I'm missing something, but what is wrong with using DF here
>>> instead of note_stores/note_uses?  All the info on refs/mods of
>>> registers is available in the DF caches.
>>
>> Nothing inherently wrong with using DF here.
>
> I have reworked the patch to use FOR_EACH_INSN_DEF and
> FOR_EACH_INSN_USE in bbs_ok_for_cmove_arith to extract the refs/mods
> and it seems to work.  Is that what you mean by DF?  I'm doing some
> more testing and hope to post the updated version soon.

Here it is, I've used the FOR_EACH* macros from dataflow to gather the
uses and sets.  Bootstrapped and tested on x86_64 and aarch64.

How does this look?

Thanks,
Kyrill

2015-08-10  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
	then_cost, else_cost fields.  Change branch_cost field to
	unsigned int.
	(end_ifcvt_sequence): Call set_used_flags on each insn in the
	sequence.
	Include rtl-iter.h.
	(noce_simple_bbs): New function.
	(noce_try_move): Bail if basic blocks are not simple.
	(noce_try_store_flag): Likewise.
	(noce_try_store_flag_constants): Likewise.
	(noce_try_addcc): Likewise.
	(noce_try_store_flag_mask): Likewise.
	(noce_try_cmove): Likewise.
	(noce_try_minmax): Likewise.
	(noce_try_abs): Likewise.
	(noce_try_sign_mask): Likewise.
	(noce_try_bitop): Likewise.
	(bbs_ok_for_cmove_arith): New function.
	(noce_emit_all_but_last): Likewise.
	(noce_emit_insn): Likewise.
	(noce_emit_bb): Likewise.
	(noce_try_cmove_arith): Handle non-simple basic blocks.
	(insn_valid_noce_process_p): New function.
	(contains_mem_rtx_p): Likewise.
	(bb_valid_for_noce_process_p): Likewise.
	(noce_process_if_block): Allow non-simple basic blocks where
	appropriate.

2015-08-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* gcc.dg/ifcvt-1.c: New test.
	* gcc.dg/ifcvt-2.c: Likewise.
	* gcc.dg/ifcvt-3.c: Likewise.

Thanks,
Kyrill

jeff

commit 76813a58e40d067bb3e64151617a975581bc81cb
Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Date:   Wed Jul 8 15:45:04 2015 +0100

    [PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 1f29646..c33fe24 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -53,6 +53,7 @@
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "shrink-wrap.h"
+#include "rtl-iter.h"
 #include "ifcvt.h"
 
 #ifndef HAVE_incscc
@@ -816,8 +817,17 @@ struct noce_if_info
      form as well.  */
   bool then_else_reversed;
 
+  /* True if the contents of then_bb and else_bb are a
+     simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  unsigned int then_cost;
+  unsigned int else_cost;
+
   /* Estimated cost of the particular branch instruction.  */
-  int branch_cost;
+  unsigned int branch_cost;
 };
 
 static rtx noce_emit_store_flag (struct noce_if_info *, rtx, int, int);
@@ -1037,6 +1047,10 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   set_used_flags (if_info->cond);
   set_used_flags (if_info->a);
   set_used_flags (if_info->b);
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
   unshare_all_rtl_in_chain (seq);
   end_sequence ();
 
@@ -1054,6 +1068,21 @@ end_ifcvt_sequence (struct noce_if_info *if_info)
   return seq;
 }
 
+/* Return true iff the then and else basic block (if it exists)
+   consist of a single simple set instruction.  */
+
+static bool
+noce_simple_bbs (struct noce_if_info *if_info)
+{
+  if (!if_info->then_simple)
+    return false;
+
+  if (if_info->else_bb)
+    return if_info->else_simple;
+
+  return true;
+}
+
 /* Convert "if (a != b) x = a; else x = b" into "x = a" and
    "if (a == b) x = a; else x = b" into "x = b".  */
@@ -1068,6 +1097,9 @@ noce_try_move (struct noce_if_info *if_info)
   if (code != NE && code != EQ)
     return FALSE;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   /* This optimization isn't valid if either A or B could be a NaN
      or a signed zero.  */
   if (HONOR_NANS (if_info->x)
@@ -1116,6 +1148,9 @@ noce_try_store_flag (struct noce_if_info *if_info)
   rtx target;
   rtx_insn *seq;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->b)
       && INTVAL (if_info->b) == STORE_FLAG_VALUE
      && if_info->a == const0_rtx)
@@ -1165,6 +1200,9 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   bool can_reverse;
   machine_mode mode;
 
+  if (!noce_simple_bbs (if_info))
+    return FALSE;
+
   if (CONST_INT_P (if_info->a)
Re: [PATCH, PR 66521,part 2] Fix warnings on darwin when bootstrapping with vtable verification enabled
On 08/11/2015 01:44 PM, Caroline Tice wrote:
> I forgot the ChangeLog entry; here it is:
>
> libstdc++-v3/ChangeLog:
>
> 2015-08-11  Caroline Tice  cmt...@google.com
>
> 	PR 66521, Contributed by Eric Gallager
> 	* acinclude.m4 (VTV_CXXLINKFLAGS): Make this variable OS-specific,
> 	and fix the rpath flag to work properly for darwin.
> 	* configure: Regenerated.

OK.
jeff
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
On 08/12/2015 01:31 AM, Segher Boessenkool wrote:
> Is there something that makes the cache not get too big?  Do we care,
> anyway?

No, nothing ever reduces the size of the cache.  I doubt we care, but I
haven't instrumented to see how big it grows.  My intuition says the
most important thing about managing this cache is not to put the most
common trivial constants in, and I already do that.

>> I did attempt to fix it, and got nothing for my troubles except
>> poorer code generation for AND/IOR/XOR with non-trivial constants.
>
> Could you give an example of code that isn't split early enough?

I believe the examples I was seeing were in the libdecnumber code.
I'd have to go back and reproduce it now...  Perhaps there's some other
predication or costing error that's getting in the way, and it simply
wasn't obvious to me.  In any case, nothing in this patch set addresses
this at all.

> The instruction (set (reg) (const_int 0x12345678)) is costed as 4
> (i.e. one insn).  That cannot be good.  This is alternative #5 in
> *movsi_internal1_single (there are many more variants of that
> pattern).

Yes, when I tried to fix it, I did adjust that costing, but still...

>>> * Constants are split *really* late.  In particular, after reload.
>>
>> Yeah that is bad.  But I'm still not seeing it.  Hrm, maybe only
>> DImode ones?  Dunno.

I found it after I did add a recipe to use ADDI, which then triggered
an ICE due to the r0 base register.

> Have you looked at generated code quality?

I've looked at diffs of all of the object files in the target
directory.  There were definite spots of improvement.  I wasn't able to
spot any regressions.

r~
[C/C++ PATCH] Fix -Wshift-overflow with sign bit
This patch fixes a defect in -Wshift-overflow.  We should only warn
about left-shifting 1 into the sign bit when -Wshift-overflow=2.  But
this applies not only to 1 << 31, but also to 2 << 30, etc.  In C++14,
never warn about this.

Neither existing tests nor documentation require updating, I think.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-08-12  Marek Polacek  pola...@redhat.com

	PR c++/55095
	* c-common.c (maybe_warn_shift_overflow): Properly handle
	left-shifting 1 into the sign bit.

	* c-c++-common/Wshift-overflow-6.c: New test.
	* c-c++-common/Wshift-overflow-7.c: New test.
	* g++.dg/cpp1y/left-shift-2.C: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index f6c5ddd..13175d8 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -12442,9 +12442,10 @@ maybe_warn_shift_overflow (location_t loc, tree op0, tree op1)
   if (TYPE_UNSIGNED (type0))
     return false;
 
+  unsigned int min_prec = (wi::min_precision (op0, SIGNED)
+			   + TREE_INT_CST_LOW (op1));
   /* Handle the left-shifting 1 into the sign bit case.  */
-  if (integer_onep (op0)
-      && compare_tree_int (op1, prec0 - 1) == 0)
+  if (min_prec == prec0 + 1)
     {
       /* Never warn for C++14 onwards.  */
       if (cxx_dialect >= cxx14)
@@ -12456,8 +12457,6 @@ maybe_warn_shift_overflow (location_t loc, tree op0, tree op1)
       return true;
     }
 
-  unsigned int min_prec = (wi::min_precision (op0, SIGNED)
-			   + TREE_INT_CST_LOW (op1));
   bool overflowed = min_prec > prec0;
   if (overflowed && c_inhibit_evaluation_warnings == 0)
     warning_at (loc, OPT_Wshift_overflow_,
diff --git gcc/testsuite/c-c++-common/Wshift-overflow-6.c gcc/testsuite/c-c++-common/Wshift-overflow-6.c
index e69de29..fed79f8 100644
--- gcc/testsuite/c-c++-common/Wshift-overflow-6.c
+++ gcc/testsuite/c-c++-common/Wshift-overflow-6.c
@@ -0,0 +1,36 @@
+/* PR c++/55095 */
+/* { dg-do compile { target int32 } } */
+/* { dg-options "-Wshift-overflow=1" } */
+/* { dg-additional-options "-std=c++11" { target c++ } } */
+
+int i00 = 0b1 << 31;
+int i01 = 0b10 << 30;
+int i02 = 0b100 << 29;
+int i03 = 0b1000 << 28;
+int i04 = 0b10000 << 27;
+int i05 = 0b100000 << 26;
+int i06 = 0b1000000 << 25;
+int i07 = 0b10000000 << 24;
+int i08 = 0b100000000 << 23;
+int i09 = 0b1000000000 << 22;
+int i10 = 0b10000000000 << 21;
+int i11 = 0b100000000000 << 20;
+int i12 = 0b1000000000000 << 19;
+int i13 = 0b10000000000000 << 18;
+int i14 = 0b100000000000000 << 17;
+int i15 = 0b1000000000000000 << 16;
+int i16 = 0b10000000000000000 << 15;
+int i17 = 0b100000000000000000 << 14;
+int i18 = 0b1000000000000000000 << 13;
+int i19 = 0b10000000000000000000 << 12;
+int i20 = 0b100000000000000000000 << 11;
+int i21 = 0b1000000000000000000000 << 10;
+int i22 = 0b10000000000000000000000 << 9;
+int i23 = 0b100000000000000000000000 << 8;
+int i24 = 0b1000000000000000000000000 << 7;
+int i25 = 0b10000000000000000000000000 << 6;
+int i26 = 0b100000000000000000000000000 << 5;
+int i27 = 0b1000000000000000000000000000 << 4;
+int i28 = 0b10000000000000000000000000000 << 3;
+int i29 = 0b100000000000000000000000000000 << 2;
+int i30 = 0b1000000000000000000000000000000 << 1;
diff --git gcc/testsuite/c-c++-common/Wshift-overflow-7.c gcc/testsuite/c-c++-common/Wshift-overflow-7.c
index e69de29..0eb1fef 100644
--- gcc/testsuite/c-c++-common/Wshift-overflow-7.c
+++ gcc/testsuite/c-c++-common/Wshift-overflow-7.c
@@ -0,0 +1,36 @@
+/* PR c++/55095 */
+/* { dg-do compile { target int32 } } */
+/* { dg-options "-Wshift-overflow=2" } */
+/* { dg-additional-options "-std=c++11" { target c++ } } */
+
+int i00 = 0b1 << 31; /* { dg-warning "requires 33 bits to represent" } */
+int i01 = 0b10 << 30; /* { dg-warning "requires 33 bits to represent" } */
+int i02 = 0b100 << 29; /* { dg-warning "requires 33 bits to represent" } */
+int i03 = 0b1000 << 28; /* { dg-warning "requires 33 bits to represent" } */
+int i04 = 0b10000 << 27; /* { dg-warning "requires 33 bits to represent" } */
+int i05 = 0b100000 << 26; /* { dg-warning "requires 33 bits to represent" } */
+int i06 = 0b1000000 << 25; /* { dg-warning "requires 33 bits to represent" } */
+int i07 = 0b10000000 << 24; /* { dg-warning "requires 33 bits to represent" } */
+int i08 = 0b100000000 << 23; /* { dg-warning "requires 33 bits to represent" } */
+int i09 = 0b1000000000 << 22; /* { dg-warning "requires 33 bits to represent" } */
+int i10 = 0b10000000000 << 21; /* { dg-warning "requires 33 bits to represent" } */
+int i11 = 0b100000000000 << 20; /* { dg-warning "requires 33 bits to represent" } */
+int i12 = 0b1000000000000 << 19; /* { dg-warning "requires 33 bits to represent" } */
+int i13 = 0b10000000000000 << 18; /* { dg-warning "requires 33 bits to represent" } */
+int i14 = 0b100000000000000 << 17; /* { dg-warning "requires 33 bits to represent" } */
+int i15 = 0b1000000000000000 << 16; /* { dg-warning "requires 33 bits to represent" } */
+int i16 = 0b10000000000000000 << 15; /* { dg-warning "requires 33 bits to represent" } */
+int i17 =
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
On 08/12/2015 01:32 AM, Richard Earnshaw wrote:
> How do we clear the cache, and when?  For example, on ARM, switching
> between ARM and Thumb state means we need to generate potentially
> radically different sequences?  We can do such splitting at function
> boundaries now.

At present I never clear the cache.  Maybe we'll find that's a mistake.

For arm vs thumb I would start with just using two different caches.
The way the code is structured currently, that would mean two different
classes.  Which could just be trivial wrappers around a common base
class containing the generator code.

> Can we generate different sequences for hot/cold code within a single
> function?

Not without using different caches.

> Can we cache sequences with the context (eg use with AND, OR, ADD,
> etc)?

No.  At least not without...

r~
Re: [gomp4] dimension API
On 08/12/15 09:30, Nathan Sidwell wrote:
> I've committed this patch to gomp4.  It reworks the oacc function
> attribute dimension handling.  Rather than pass the TREE_LIST to the
> various hooks, I convert it to a regular C array of ints.  That makes
> life simpler for the consumers.  They return a 'changed' flag to
> indicate whether the attribute should be rewritten.  That rewriting is
> done in a way that doesn't presume the attribute is unshared (Cesar,
> your workaround should no longer be necessary).

I've discovered I'd committed a slightly earlier version of what I'd
tested.  Consequently there is some breakage.  I'll be fixing it later
today, after a retest.

nathan
[PATCH] [graphite] Constrain only on INTEGER_TYPE
Passes bootstrap, no regressions.  With this patch gcc bootstraps with
graphite:

make BOOT_CFLAGS="-g -O2 -fgraphite-identity -floop-interchange -floop-block"

gcc/ChangeLog:

2015-08-12  Aditya Kumar  hiradi...@msn.com

	* graphite-scop-detection.c (stmt_simple_for_scop_p): Constrain
	only on INTEGER_TYPE.
---
 gcc/graphite-scop-detection.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
index fb7247e..05cb5b9 100644
--- a/gcc/graphite-scop-detection.c
+++ b/gcc/graphite-scop-detection.c
@@ -410,7 +410,7 @@ stmt_simple_for_scop_p (basic_block scop_entry, loop_p outermost_loop,
 	tree op = gimple_op (stmt, i);
 	if (!graphite_can_represent_expr (scop_entry, loop, op)
 	    /* We can not handle REAL_TYPE.  Failed for pr39260.  */
-	    || TREE_CODE (TREE_TYPE (op)) == REAL_TYPE)
+	    || (TREE_CODE (TREE_TYPE (op)) != INTEGER_TYPE))
 	  {
 	    if (dump_file && (dump_flags & TDF_DETAILS))
 	      {
-- 
2.1.0.243.g30d45f7
Re: [Aarch64][target/PR 67143] Use correct constraints on operands for atomic operations.
On Tue, 11 Aug 2015, Matthew Wahab wrote: PR target/67143 * gcc.target/aarch64/pr67143.c: New What's architecture-specific about this test? That is, why doesn't it just go in gcc.c-torture/compile (no dg- directives needed, automatically looped over optimization options)? Architecture-specific test directories should only be for tests that are actually architecture-specific (e.g. using architecture-specific options, or asm, or built-in functions, or scan-assembler for particular output), not for tests that can be built generically for all architectures and still test for the original bug. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH 2/2] replace several uses of the anon namespace with GCC_FINAL
On Wed, Aug 12, 2015 at 01:03:36PM -0600, Jeff Law wrote:
> On 08/12/2015 12:57 PM, Richard Sandiford wrote:
>> Jeff Law <l...@redhat.com> writes:
>>> On 08/10/2015 06:05 AM, tbsaunde+...@tbsaunde.org wrote:
>>>> From: Trevor Saunders <tbsaunde+...@tbsaunde.org>
>>>>
>>>> Hi,
>>>>
>>>> In many places gcc puts classes in the anon namespace so the
>>>> compiler can tell they do not get inherited from to enable better
>>>> devirtualization.  However debugging code in the anon namespace can
>>>> be a pain, and the same thing can be accomplished more directly by
>>>> marking the classes as final.  When bootstrapping, stage 3 should
>>>> always be built in C++14 mode now, and of course will always be
>>>> newer than gcc 4.7, so these classes will always be marked as final
>>>> there.  AIUI cross compilers are supposed to be built with recent
>>>> gcc, which I would tend to think implies newer than 4.7, so they
>>>> should also be built with these classes marked as final.  I believe
>>>> that means in all important cases this works just as well as the
>>>> anon namespace.
>>>>
>>>> bootstrapped + regtested on x86_64-linux-gnu, ok?
>>>>
>>>> Trev
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2015-08-10  Trevor Saunders  tbsau...@tbsaunde.org
>>>>
>>>> 	* compare-elim.c, dce.c, dse.c, gimple-ssa-isolate-paths.c,
>>>> 	gimple-ssa-strength-reduction.c, graphite.c, init-regs.c,
>>>> 	ipa-pure-const.c, ipa-visibility.c, ipa.c, mode-switching.c,
>>>> 	omp-low.c, reorg.c, sanopt.c, trans-mem.c, tree-eh.c,
>>>> 	tree-if-conv.c, tree-ssa-copyrename.c, tree-ssa-dce.c,
>>>> 	tree-ssa-dom.c, tree-ssa-dse.c, tree-ssa-forwprop.c,
>>>> 	tree-ssa-sink.c, tree-ssanames.c, tree-stdarg.c, tree-tailcall.c,
>>>> 	tree-vect-generic.c, tree.c, ubsan.c, var-tracking.c,
>>>> 	vtable-verify.c, web.c: Use GCC_FINAL instead of the anonymous
>>>> 	namespace.
>>>
>>> OK.
>>
>> I was hoping someone else was going to speak up since I seem to have
>> been posting a few negative messages recently, but I think this is
>> really a step in the wrong direction.  I think the code was using
>> anonymous namespaces in exactly the way they were

Are they actually all that common?  I think gcc is the only C++ project
with which I'm familiar that uses them much.

>> intended to be used.
>
> No need to worry about seeming to be negative.
>
> The problem is you can't get to stuff in the anonymous namespace
> easily in the debugger.  There was talk of fixing that, but I don't
> think it ever went forward on the gdb side.
>
> If gdb were to get fixed so that debugging this stuff was easier, then
> I'd fully support putting things back into the anonymous namespace.

I have vague memories of other reasons to dislike the anon namespace
being brought up when it was discussed at $day job, but debuggers were
the big reason (and it's not just gdb that's bad).

Trev

> jeff
Re: [PATCH 2/2] replace several uses of the anon namespace with GCC_FINAL
On 2015.08.12 at 13:03 -0600, Jeff Law wrote: On 08/12/2015 12:57 PM, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 08/10/2015 06:05 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, In many places gcc puts classes in the anon namespace so the compiler can tell they do not get inheritted from to enable better devirtualization. However debugging code in the anon namespace can be a pain, and the same thing can be accomplished more directly by marking the classes as final. When bootstrapping stage 3 should always be built in C++14 mode now, and of course will always be newer than gcc 4.7, so these classes will always be marked as final there. AIUI cross compilers are supposed to be built with recent gcc, which I would tend to think implies newer than 4.7, so they should also be built with these classes marked as final. I believe that means in all important cases this works just as well as the anon namespace. bootstrapped + regtested on x86_64-linux-gnu, ok? Trev gcc/ChangeLog: 2015-08-10 Trevor Saunders tbsau...@tbsaunde.org * compare-elim.c, dce.c, dse.c, gimple-ssa-isolate-paths.c, gimple-ssa-strength-reduction.c, graphite.c, init-regs.c, ipa-pure-const.c, ipa-visibility.c, ipa.c, mode-switching.c, omp-low.c, reorg.c, sanopt.c, trans-mem.c, tree-eh.c, tree-if-conv.c, tree-ssa-copyrename.c, tree-ssa-dce.c, tree-ssa-dom.c, tree-ssa-dse.c, tree-ssa-forwprop.c, tree-ssa-sink.c, tree-ssanames.c, tree-stdarg.c, tree-tailcall.c, tree-vect-generic.c, tree.c, ubsan.c, var-tracking.c, vtable-verify.c, web.c: Use GCC_FINAL instead of the anonymous namespace. OK. I was hoping someone else was going to speak up since I seem to have been posting a few negative messages recently, but I think this is really a step in the wrong direction. I think the code was using anonymous namespaces in exactly the way they were intended to be used. No need to worry about seeming to be negative. 
> The problem is you can't get to stuff in the anonymous namespace
> easily in the debugger.  There was talk of fixing that, but I don't
> think it ever went forward on the gdb side.  If gdb were to get fixed
> so that debugging this stuff was easier, then I'd fully support
> putting things back into the anonymous namespace.

For the record, here is the gdb bug in question:
https://sourceware.org/bugzilla/show_bug.cgi?id=16874

It even has a patch attached that improves the issue.

-- 
Markus
Re: [PATCH 2/2] replace several uses of the anon namespace with GCC_FINAL
On Wed, Aug 12, 2015 at 03:31:35PM -0400, David Edelsohn wrote: This patch has caused a bootstrap failure on AIX. reverted, though my bet is this is some sort of generic devirt bug on aix. Trev - David
RE: C++ Concepts available in trunk?
> From: Jonathan Wakely [jwakely@gmail.com]
> Sent: 12 August 2015 17:41
>
> On 12 August 2015 at 16:20, Dijk, J. van wrote:
>> Dear all,
>>
>> Looking into gcc/cp/Changelog, it appears that on 2015-08-06 support
>> for the C++ Concepts TS was added to trunk.  Have Concepts in gcc
>> matured to the point that end-users can start to explore and test
>> them?  If so, the page https://gcc.gnu.org/projects/cxx1y.html needs
>> an update, since it mentions the C++1z Concepts Lite *branch* at the
>> end of the document.
>
> Yes, the branch is not in use any longer, all support is on the trunk.
>
>> There is also no explanation how to enable experimental C++1z
>> features that I can see on that page (like there is for C++14 at the
>> top of the page).  I suppose a new section on 'C++1z language
>> features' is due?
>
> Well it should be a separate page really, since that page is about
> C++14 not C++17.  To enable the features just use the appropriate -std
> option, as documented in the manual:
> https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html

Dear Jonathan,

Thanks for the clarification.  I hope the attached patch against
current wwwdocs will be of use (only the name of the feature test macro
is missing in the new file cxx1z.html; I did not manage to find that).

Regards, Jan.
diff -urN --exclude 'Entries*' wwwdocs-orig/htdocs/gcc-6/changes.html wwwdocs/htdocs/gcc-6/changes.html --- wwwdocs-orig/htdocs/gcc-6/changes.html 2015-08-13 00:25:44.009826451 +0200 +++ wwwdocs/htdocs/gcc-6/changes.html 2015-08-13 00:15:36.507550280 +0200 @@ -68,6 +68,8 @@ h3 id=cxxC++/h3 ul liThe default mode has been changed to code-std=gnu++14/code./li +li C++ concepts a href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4377.pdf; + (N4377)/a are now supported when c++1z code generation is requested./li /ul h3 id=fortranFortran/h3 diff -urN --exclude 'Entries*' wwwdocs-orig/htdocs/projects/cxx1y.html wwwdocs/htdocs/projects/cxx1y.html --- wwwdocs-orig/htdocs/projects/cxx1y.html 2015-08-13 00:25:44.010826466 +0200 +++ wwwdocs/htdocs/projects/cxx1y.html 2015-08-13 00:07:54.726496707 +0200 @@ -174,6 +174,13 @@ a name=concepts/ah3C++1z Concepts Branch/h3 +pstrongUpdate/strong: Support for the +a href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4377.pdf; +C++ Concepts Technical Specification/a +was merged into trunk at 6. August 2015 (revision r226713) and will +be available in gcc-6. Users of the concepts branch are urged to switch +to trunk instead./p + pConcepts was a major feature planned for the C++11 standard, but it was eventually dropped due to concerns about both the description and implementability. Since the publication of C++11, people have been working @@ -183,6 +190,7 @@ Specification in 2015. 
 The initial implementation is available from the link above, and it is
 in the process of being cleaned up and moved into the
 <code>c++-concepts</code> branch.</p>
+
 </body>
 </html>
diff -urN --exclude 'Entries*' wwwdocs-orig/htdocs/projects/cxx1z.html wwwdocs/htdocs/projects/cxx1z.html
--- wwwdocs-orig/htdocs/projects/cxx1z.html	1970-01-01 01:00:00.0 +0100
+++ wwwdocs/htdocs/projects/cxx1z.html	2015-08-13 00:09:21.561823268 +0200
@@ -0,0 +1,66 @@
+<html>
+<head>
+  <title>C++1z Support in GCC</title>
+<style type="text/css">
+  /* <![CDATA[*/
+tr.separator { background: #cc}
+.supported { color: green }
+.unsupported { color: red }
+  /* ]]> */
+</style>
+</head>
+
+<body>
+  <h1>C++1z Support in GCC</h1>
+
+  <p>GCC has experimental support for the upcoming revision of the C++
+  standard, which is expected to be published in 2017.  Until we are
+  sure of that, it will be referred to as C++1z.</p>
+
+  <p>C++1z features are available as part of the mainline GCC
+compiler in the trunk of
+<a href="../svn.html">GCC's Subversion
+  repository</a>. To enable C++1z support, add the command-line
+  parameter <code>-std=c++1z</code>
+  to your <code>g++</code> command line.  Or, to enable GNU
+  extensions in addition to C++1z extensions,
+  add <code>-std=gnu++1z</code> to your <code>g++</code> command
+  line.</p>
+
+  <p><strong>Important</strong>: Because the final ISO C++1z standard is
+  still under development, GCC's support is <strong>experimental</strong>
+  and features may disappear or change in incompatible ways as the standard
+  matures.</p>
+
+<h2>C++1z Language Features</h2>
+
+  <p>The following table lists new language features that are part of
+  the C++1z standard.  The "Proposal" column
+  provides a link to the ISO C++ committee proposal that describes the
+  feature, while the "Available in GCC?"
+  column indicates the first
+  version of GCC that contains an implementation of this feature (if
+  it has been implemented).</p>
+
+  <table border="1">
+<tr class="separator">
+  <th>Language Feature</th>
+  <th>Proposal</th>
+  <th>Available in GCC?</th>
+  <th>SD-6 Feature Test</th>
+</tr>
+<tr>
+  <td>C++ concepts</td>
+  <td><a
[PATCH] Forward to correct function in std::shared_mutex::unlock().
Looks like a typo. :)

Index: libstdc++-v3/include/std/shared_mutex
===================================================================
--- libstdc++-v3/include/std/shared_mutex	(revision 226840)
+++ libstdc++-v3/include/std/shared_mutex	(working copy)
@@ -331,7 +331,7 @@
     void lock() { _M_impl.lock(); }
     bool try_lock() { return _M_impl.try_lock(); }
-    void unlock() { _M_impl.try_lock(); }
+    void unlock() { _M_impl.unlock(); }
 
     // Shared ownership
Re: [PATCH] [graphite] Constrain only on INTEGER_TYPE
On 08/12/2015 10:33 PM, Aditya Kumar wrote: Passes bootstrap, no regressions. With this patch gcc bootstraps with graphite. make BOOT_CFLAGS="-g -O2 -fgraphite-identity -floop-interchange -floop-block" LGTM, but please use a longer sentence to explain what you do. Tobias
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On Aug 12, 2015, at 1:07 PM, Richard Sandiford rdsandif...@googlemail.com wrote: I don't think the right shifts are an issue at all. Well, they're implementation-defined, at least in C. The C11 wording for E1 >> E2 is "If E1 has a signed type and a negative value, the resulting value is implementation-defined". Is C++ different? No, it is the same: "3 The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity 2 raised to the power E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined." (I don't have the standard handy.) Google is your friend.
Re: [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide
On Wed, Aug 12, 2015 at 08:50:48AM -0700, Richard Henderson wrote: On 08/12/2015 06:23 AM, Segher Boessenkool wrote: On Tue, Aug 11, 2015 at 06:11:30PM -0700, Richard Henderson wrote: This allows testing for a mask without having to call GEN_INT. Cc: David Edelsohn dje@gmail.com --- * config/rs6000/rs6000.c (rs6000_is_valid_mask_wide): Split out from... (rs6000_is_valid_mask): ... here. (rs6000_is_valid_and_mask_wide): Split out from... (rs6000_is_valid_and_mask): ... here. I don't like these _wide names much. It follows the existing practice within the backend. One existing function name, yes. And you are replacing that function :-) You could overload the shorter name, if you really think creating some garbage const_int's is too much overhead (it might well be if you use it a lot more in later patches). At one stage in the development (before I became much leaner with the search for rotate), it really really mattered. For the AND patterns I considered such a search too; I didn't do it because as you say it would have to consider a *lot* of possibilities, most useless. AND sequences of more than two insns often prevented other optimisation too, so I settled on two insns maximum, and then you can generate it directly no problem. So yes if you call it way too often it also creates too much garbage. -bool -rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode) +static bool +rs6000_is_valid_mask_wide (unsigned HOST_WIDE_INT val, int *b, int *e, int n) But why change the mode parameter? The code was clearer before. So that we don't have to look up GET_MODE_BITSIZE (mode). Getting rid of a single array lookup matters more than interface clarity? You must have been calling it *very* often! Thankfully you don't anymore. 
+static bool
+rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val, machine_mode mode)
 {
   int nb, ne;
 
-  if (!rs6000_is_valid_mask (mask, &nb, &ne, mode))
-    return false;
+  switch (mode)
+    {
+    case DImode:
+      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 64))
+	return false;
+      /* For DImode, we need a rldicl, rldicr, or a rlwinm with
+	 mask that does not wrap.  */
+      return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
 
-  /* For DImode, we need a rldicl, rldicr, or a rlwinm with mask that
-     does not wrap.  */
-  if (mode == DImode)
-    return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
+    case SImode:
+      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 32))
+	return false;
+      /* For SImode, rlwinm can do everything.  */
+      return (nb < 32 && ne < 32);
 
-  /* For SImode, rlwinm can do everything.  */
-  if (mode == SImode)
-    return (nb < 32 && ne < 32);
+    default:
+      return false;
+    }
+}
 
-  return false;

You don't need any of these changes then, either. True, not *needed* per se, but if you look closer I'm combining conditionals. I think the replacement here is clearer. You're combining a conditional that you add (for mode -> 32,64), and the code doesn't become any clearer at all IMHO.

- unsigned HOST_WIDE_INT val = INTVAL (c);
+ unsigned HOST_WIDE_INT val = UINTVAL (c);

Does it matter? No. Ah okay, you were getting me worried! :-) Segher
Re: [PATCH 06/15] rs6000: Use rldimi in constant construction
On Wed, Aug 12, 2015 at 08:55:51AM -0700, Richard Henderson wrote: On 08/12/2015 07:02 AM, Segher Boessenkool wrote: On Tue, Aug 11, 2015 at 06:11:35PM -0700, Richard Henderson wrote:

@@ -8173,6 +8173,13 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
   if (exam_mask (-1, c, sub_budget))
     return true;
 
+  /* If the two halves are equal, use an insert.  */
+  if (c >> 32 == test && exam_sub (test, sub_budget))
+    {
+      opN (VEC_DUPLICATE, 0xu);	/* RLDIMI */
+      return true;
+    }

Does this work for c with the high bit set? I think you need to cast it to unsigned HOST_WIDE_INT first? Indeed, a sign-extension works better. It means the base constant will use LIS+ORIS without trying to create an unsigned version. Patch 8/15 changes this so that test is assigned the sign-extended low 32 bits right before this code; that should work just fine. If you're talking about ubsan sort of restrictions on shifting signed constants... I choose to totally ignore that. Good plan. We rely on arithmetic shifts rounding towards negative infinity, and so does the rest of the world. Certainly nowhere else in gcc has been audited for that, beginning with hwint.h itself. Yes. And there are much worse problems, like many things not working right if your HOST_WIDE_INT would happen to be more than 64 bits; we cannot really shake those out because there is no actual system to test that on -- but it also doesn't actually matter, because there is no system to run it on :-) Segher
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
On Wed, Aug 12, 2015 at 03:31:48AM -0500, Segher Boessenkool wrote: * This is the only platform for which I bothered collecting any sort of performance data: As best I can tell, there is a 9% improvement in bootstrap speed for ppc64. That is, 10 minutes off the original 109 minute build. That is, wow. Wow :-) Bootstrap + 4 regtests of a virgin trunk du jour took me 127m; with your patches, 130m. So I'm not seeing that improvement (but no regression either). This is gcc110 fwiw. Segher
Re: [PATCH 2/2] replace several uses of the anon namespace with GCC_FINAL
On Thu, 2015-08-13 at 01:58 +0200, Markus Trippelsdorf wrote: On 2015.08.12 at 13:03 -0600, Jeff Law wrote: On 08/12/2015 12:57 PM, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 08/10/2015 06:05 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, In many places gcc puts classes in the anon namespace so the compiler can tell they do not get inherited from to enable better devirtualization. However debugging code in the anon namespace can be a pain, and the same thing can be accomplished more directly by marking the classes as final. When bootstrapping stage 3 should always be built in C++14 mode now, and of course will always be newer than gcc 4.7, so these classes will always be marked as final there. AIUI cross compilers are supposed to be built with recent gcc, which I would tend to think implies newer than 4.7, so they should also be built with these classes marked as final. I believe that means in all important cases this works just as well as the anon namespace. bootstrapped + regtested on x86_64-linux-gnu, ok? Trev gcc/ChangeLog: 2015-08-10 Trevor Saunders tbsau...@tbsaunde.org * compare-elim.c, dce.c, dse.c, gimple-ssa-isolate-paths.c, gimple-ssa-strength-reduction.c, graphite.c, init-regs.c, ipa-pure-const.c, ipa-visibility.c, ipa.c, mode-switching.c, omp-low.c, reorg.c, sanopt.c, trans-mem.c, tree-eh.c, tree-if-conv.c, tree-ssa-copyrename.c, tree-ssa-dce.c, tree-ssa-dom.c, tree-ssa-dse.c, tree-ssa-forwprop.c, tree-ssa-sink.c, tree-ssanames.c, tree-stdarg.c, tree-tailcall.c, tree-vect-generic.c, tree.c, ubsan.c, var-tracking.c, vtable-verify.c, web.c: Use GCC_FINAL instead of the anonymous namespace. OK. I was hoping someone else was going to speak up since I seem to have been posting a few negative messages recently, but I think this is really a step in the wrong direction. I think the code was using anonymous namespaces in exactly the way they were intended to be used.
No need to worry about seeming to be negative. The problem is you can't get to stuff in the anonymous namespace easily in the debugger. There was talk of fixing that, but I don't think it ever went forward on the gdb side. If gdb were to get fixed so that debugging this stuff was easier, then I'd fully support putting things back into the anonymous namespace. For the record here the gdb bug in question: https://sourceware.org/bugzilla/show_bug.cgi?id=16874 It even has a patch attached, that improves the issue. Sadly that seems to have stalled. FWIW, our gcc/gdbhooks.py adds in a break-on-pass gdb command to work around this: https://gcc.gnu.org/ml/gcc-patches/2014-07/msg02011.html (gdb) break-on-pass pass_final Breakpoint 6 at 0x8396ba: file ../../src/gcc/final.c, line 4526. (gdb) cont Continuing. Breakpoint 6, (anonymous namespace)::pass_final::execute (this=0x17fb990) at ../../src/gcc/final.c:4526 4526virtual unsigned int execute (function *) { return rest_of_handle_final (); } (gdb) Though clearly hacking around it this way is suboptimal. Dave
Go patch committed: Ignore erroneous trees in flatten
This patch by Chris Manghane ignores erroneous trees in the flatten pass. This avoids various compiler crashes on invalid input, and fixes https://golang.org/issue/11536 , https://golang.org/issue/11558 , and https://golang.org/issue/11559 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian

Index: gcc/go/gofrontend/MERGE
===================================================================
--- gcc/go/gofrontend/MERGE	(revision 226825)
+++ gcc/go/gofrontend/MERGE	(working copy)
@@ -1,4 +1,4 @@
-5fc38e74d132cd6f4e7b56e6bcf9fe57031ab203
+fc9da313b4f5c13b4ac3b98e699fd1c89613
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===================================================================
--- gcc/go/gofrontend/expressions.cc	(revision 226596)
+++ gcc/go/gofrontend/expressions.cc	(working copy)
@@ -3101,6 +3101,12 @@
 Expression*
 Type_conversion_expression::do_flatten(Gogo*, Named_object*,
                                        Statement_inserter* inserter)
 {
+  if (this->type()->is_error_type() || this->expr_->is_error_expression())
+    {
+      go_assert(saw_errors());
+      return Expression::make_error(this->location());
+    }
+
   if (((this->type()->is_string_type()
         && this->expr_->type()->is_slice_type())
        || this->expr_->type()->interface_type() != NULL)
@@ -3585,8 +3591,13 @@
 Expression*
 Unary_expression::do_flatten(Gogo* gogo, Named_object*,
                              Statement_inserter* inserter)
 {
-  if (this->is_error_expression() || this->expr_->is_error_expression())
-    return Expression::make_error(this->location());
+  if (this->is_error_expression()
+      || this->expr_->is_error_expression()
+      || this->expr_->type()->is_error_type())
+    {
+      go_assert(saw_errors());
+      return Expression::make_error(this->location());
+    }
 
   Location location = this->location();
   if (this->op_ == OPERATOR_MULT
@@ -5062,10 +5073,16 @@
 Expression*
 Binary_expression::do_flatten(Gogo* gogo, Named_object*,
                               Statement_inserter* inserter)
 {
-  if (this->classification() == EXPRESSION_ERROR)
-    return this;
-
   Location loc = this->location();
+  if (this->left_->type()->is_error_type()
+      ||
 this->right_->type()->is_error_type()
+      || this->left_->is_error_expression()
+      || this->right_->is_error_expression())
+    {
+      go_assert(saw_errors());
+      return Expression::make_error(loc);
+    }
+
   Temporary_statement* temp;
   if (this->left_->type()->is_string_type()
       && this->op_ == OPERATOR_PLUS)
@@ -6806,6 +6823,11 @@ Builtin_call_expression::do_flatten(Gogo
                                     Statement_inserter* inserter)
 {
   Location loc = this->location();
+  if (this->is_erroneous_call())
+    {
+      go_assert(saw_errors());
+      return Expression::make_error(loc);
+    }
 
   switch (this->code_)
     {
@@ -8733,8 +8755,11 @@
 Expression*
 Call_expression::do_flatten(Gogo* gogo, Named_object*,
                             Statement_inserter* inserter)
 {
-  if (this->classification() == EXPRESSION_ERROR)
-    return this;
+  if (this->is_erroneous_call())
+    {
+      go_assert(saw_errors());
+      return Expression::make_error(this->location());
+    }
 
   if (this->is_flattened_)
     return this;
@@ -8902,6 +8927,27 @@ Call_expression::issue_error()
 }
 
+// Whether or not this call contains errors, either in the call or the
+// arguments to the call.
+
+bool
+Call_expression::is_erroneous_call()
+{
+  if (this->is_error_expression() || this->fn()->is_error_expression())
+    return true;
+
+  if (this->args() == NULL)
+    return false;
+  for (Expression_list::iterator pa = this->args()->begin();
+       pa != this->args()->end();
+       ++pa)
+    {
+      if ((*pa)->type()->is_error_type() || (*pa)->is_error_expression())
+        return true;
+    }
+  return false;
+}
+
 // Get the type.
 Type*
@@ -9848,30 +9894,47 @@ Array_index_expression::do_flatten(Gogo*
                                    Statement_inserter* inserter)
 {
   Location loc = this->location();
+  Expression* array = this->array_;
+  Expression* start = this->start_;
+  Expression* end = this->end_;
+  Expression* cap = this->cap_;
+  if (array->is_error_expression()
+      || array->type()->is_error_type()
+      || start->is_error_expression()
+      || start->type()->is_error_type()
+      || (end != NULL
+          && (end->is_error_expression() || end->type()->is_error_type()))
+      || (cap != NULL
+          && (cap->is_error_expression() || cap->type()->is_error_type())))
+    {
+      go_assert(saw_errors());
+      return Expression::make_error(loc);
+    }
+
   Temporary_statement* temp;
-  if (this->array_->type()->is_slice_type() && !this->array_->is_variable())
+  if (array->type()->is_slice_type() && !array->is_variable())
     {
-      temp = Statement::make_temporary(NULL, this->array_, loc);
+      temp =
Re: [RFC] COMDAT Safe Module Level Multi versioning
On Tue, Aug 4, 2015 at 11:43 AM, Sriraman Tallam tmsri...@google.com wrote: On Tue, Jun 16, 2015 at 4:22 PM, Sriraman Tallam tmsri...@google.com wrote: On Tue, May 19, 2015 at 9:11 AM, Xinliang David Li davi...@google.com wrote: Hm. But which options are unsafe? Also wouldn't it be better to simply _not_ have unsafe options produce comdats but always make local clones for them (thus emit the comdat with unsafe flags dropped)? Always localize comdat functions may lead to text size increase. It does not work if the comdat function is a virtual function for instance. Based on Richard's suggestion, I have a patch to localize comdat functions which seems like a very effective solution to this problem. The text size increase is limited to the extra comdat copies generated for the specialized modules (modules with unsafe options) which is usually only a few. Since -fweak does something similar for functions, I have called the new option -fweak-comdat-functions. This does not apply to virtual comdat functions as their addresses can always be leaked via the vtable. Using this flag with virtual functions generates a warning. To summarize, this is the intended usage of this option. Modules which use unsafe code options, like -misa for multiversioning, to generate code that is meant to run only on a subset of CPUs can generate comdats with specialized instructions which when picked by the linker can get run unconditionally causing SIGILL on unsupported platforms. This flag hides these comdats to be local to these modules and not make them available publicly, with the caveat that it does not apply to virtual comdats. Could you please review? Ping. This patch uses Richard's suggestion to localize comdat functions with option -fno-weak-comdat-functions. Comments? Ping. * c-family/c.opt (fweak-comdat-functions): New option. * cp/decl2.c (comdat_linkage): Implement new option. Warn when virtual comdat functions are seen. * doc/invoke.texi: Document new option. 
* testsuite/g++.dg/no-weak-comdat-functions-1.C: New test.

Thanks
Sri

Index: c-family/c.opt
===================================================================
--- c-family/c.opt	(revision 224486)
+++ c-family/c.opt	(working copy)
@@ -1236,6 +1236,14 @@
 fweak
 C++ ObjC++ Var(flag_weak) Init(1)
 Emit common-like symbols as weak symbols
 
+fweak-comdat-functions
+C++ Var(flag_weak_comdat_functions) Init(1)
+Specific to comdat functions (-fno-weak-comdat-functions: Localize Comdat Functions).
+With -fno-weak-comdat-functions, virtual comdat functions are still linked as
+weak functions.  With -fno-weak-comdat-functions, the address of the comdat
+functions that are localized will be unique and this can cause unintended
+behavior when addresses of comdat functions are used.
+
 fwide-exec-charset=
 C ObjC C++ ObjC++ Joined RejectNegative
 -fwide-exec-charset=<cset>	Convert all wide strings and character constants to character set <cset>
Index: cp/decl2.c
===================================================================
--- cp/decl2.c	(revision 224486)
+++ cp/decl2.c	(working copy)
@@ -1702,8 +1702,19 @@ mark_vtable_entries (tree decl)
 void
 comdat_linkage (tree decl)
 {
-  if (flag_weak)
-    make_decl_one_only (decl, cxx_comdat_group (decl));
+  if (flag_weak
+      && (flag_weak_comdat_functions
+          || TREE_CODE (decl) != FUNCTION_DECL
+          || DECL_VIRTUAL_P (decl)))
+    {
+      make_decl_one_only (decl, cxx_comdat_group (decl));
+      if (TREE_CODE (decl) == FUNCTION_DECL
+          && DECL_VIRTUAL_P (decl)
+          && !flag_weak_comdat_functions)
+        warning_at (DECL_SOURCE_LOCATION (decl), 0,
+                    "fno-weak-comdat-functions: Comdat linkage of virtual "
+                    "function %q#D preserved.");
+    }
   else if (TREE_CODE (decl) == FUNCTION_DECL
           || (VAR_P (decl) && DECL_ARTIFICIAL (decl)))
     /* We can just emit function and compiler-generated variables
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 224486)
+++ doc/invoke.texi	(working copy)
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
On Wed, Aug 12, 2015 at 08:32:46AM -0700, Richard Henderson wrote: On 08/12/2015 01:31 AM, Segher Boessenkool wrote: Is there something that makes the cache not get too big? Do we care, anyway? No, nothing ever reduces the size of the cache. I doubt we care, but I haven't instrumented to see how big it grows. My intuition says the most important thing about managing this cache is not to put the most common trivial constants in, and I already do that. Right. And it seems to cache negative results too (the five-insn sequence). I did attempt to fix it, and got nothing for my troubles except poorer code generation for AND/IOR/XOR with non-trivial constants. Could you give an example of code that isn't split early enough? I believe the examples I was seeing was in the libdecnumber code. I'd have to go back and reproduce it now... If you could, please do. Perhaps there's some other predication or costing error that's getting in the way, and it simply wasn't obvious to me. In any case, nothing in this patch set addresses this at all. The instruction (set (reg) (const_int 0x12345678)) is costed as 4 (i.e. one insn). That cannot be good. This is alternative #5 in *movsi_internal1_single (there are many more variants of that pattern). Yes, when I tried to fix it, I did adjust that costing, but still... I misread it (it's alt #6, with cost 8). Maybe Alan's patches would fix this one? * Constants are split *really* late. In particular, after reload. Yeah that is bad. But I'm still not seeing it. Hrm, maybe only DImode ones? Dunno. I found it after I did add a recipe to use ADDI, which then triggered an ICE due to the r0 base register. Ah! Reload generates *new* addition insns out of thin air, and that is exactly the case where ADDI won't work. LRA works a bit better there it seems (but it is still not the default for rs6000); if the constraint (b in this case) would not match, it tries other things. Have you looked at generated code quality? 
I've looked at diffs of all of the object files in the target directory. There were definite spots of improvement. I wasn't able to spot any regressions. [ I've looked now. ] Some of the new combos help a bit, yes. Total code size increased a tiny bit; it looks to be all because of unfortunate scheduling and less tail merging. The usual. 32-bit code is identical, as it should be ;-) Segher
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
On Wed, Aug 12, 2015 at 04:59:27PM +0100, Wilco Dijkstra wrote: However it also creates new dependencies that may not be desirable (such as hash table size, algorithm used etc). Some notes about ppc64 in particular: * Constants aren't split until quite late, preventing all hope of CSE'ing portions of the generated code. My gut feeling is that this is in general a mistake, but... Late split is best in general as you want to CSE the original constants, not parts of the expansion (which would be very rarely possible). CSE handles the REG_EQ* notes just fine? For rs6000 many constants are split at expansion already, and I never saw problems from that? Maybe I haven't looked at the right (wrong :-) ) testcases though. Comments? Especially on the shared header? I'm not convinced the amount of code that could be shared is enough to be worthwhile. I agree. Also the way it is written makes the immediate generation more complex and likely consuming a lot of memory (each cached immediate requires at least 64 bytes). It would take a bit less memory if it used pointers (to other cache elts) for sub-expressions, but that makes cache invalidation more interesting, and shared (within one constant) subexpressions non-trivial. It is not obvious to me why it is a good idea to hide the simple/fast cases behind the hashing scheme - it seems better that the backend explicitly decides which cases should be cached. Without going through the abstraction you mean? I'd rather there was no such abstraction at all :-) I looked at the statistics of AArch64 immediate generation a while ago. The interesting thing is ~95% of calls are queries, and the same query is on average repeated 10 times in a row. Huh. So (a) it is not important to cache the expansions, It also stores the expansion from analysis until emission time. This simplifies the code a lot. Maybe that can be done without caching everything (and still not be too expensive), dunno. 
and (b) the high repetition rate means a single-entry cache has a 90% hitrate. And a 10% miss rate... We already have a patch for this and could collect stats comparing the approaches. If a single-entry cache can provide a similar benefit as caching all immediates then my preference would be to keep things simple and just cache the last query. Note the many repeated queries indicate a performance issue at a much higher level (repeated cost queries on the same unchanged RTL), and solving that problem will likely improve buildtime for all targets. You cannot see if RTL has changed from the pointer to it, so we'd have to store that info (the unchanged info, or the cost info) somewhere. Maybe it would be useful to store it in the RTL itself? Segher
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
On 12/08/15 09:32, Richard Earnshaw wrote: On 12/08/15 02:11, Richard Henderson wrote: Something last week had me looking at ppc64 code generation, and some of what I saw was fairly bad. Fixing it wasn't going to be easy, due to the fact that the logic for generating constants wasn't contained within a single function. Better is the way that aarch64 and alpha have done it in the past, sharing a single function with all of the logical that can be used for both cost calculation and the actual emission of the constants. However, the way that aarch64 and alpha have done it hasn't been ideal, in that there's a fairly costly search that must be done every time. I've thought before about changing this so that we would be able to cache results, akin to how we do it in expmed.c for multiplication. I've implemented such a caching scheme for three targets, as a test of how much code could be shared. The answer appears to be about 100 lines of boiler-plate. Minimal, true, but it may still be worth it as a way of encouraging backends to do similar things in a similar way. I've got a short week this week, so won't have time to look at this in detail for a while. So a bunch of questions... but not necessarily objections :-) How do we clear the cache, and when? For example, on ARM, switching between ARM and Thumb state means we need to generate potentially radically different sequences? We can do such splitting at function boundaries now. Can we generate different sequences for hot/cold code within a single function? Can we cache sequences with the context (eg use with AND, OR, ADD, etc)? Some notes about ppc64 in particular: * Constants aren't split until quite late, preventing all hope of CSE'ing portions of the generated code. My gut feeling is that this is in general a mistake, but... I did attempt to fix it, and got nothing for my troubles except poorer code generation for AND/IOR/XOR with non-trivial constants. 
On AArch64 in particular, building complex constants is generally destructive on the source register (if you want to preserve intermediate values you have to make intermediate copies); that's clearly never going to be a win if you don't need at least 3 instructions to form the constant. There might be some cases where you could form a second constant as a difference from an earlier one, but that then creates data-flow dependencies and in OoO machines that might not be worthwhile. Even for in-order machines it can restrict scheduling and result in worse code. I'm somewhat surprised that the operands to the logicals aren't visible at rtl generation time, given all the work done in gimple. And failing that, combine has enough REG_EQUAL notes that it ought to be able to put things back together and see the simpler pattern. We've tried it in the past. Exposing the individual steps prevents the higher-level rtl-based optimizations since they can no longer deal with the complete sub-expression. E.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63724 R.
[RFC] Add check_effective_target_vect_min_max
[ was: Re: [committed, testsuite] Fix vect/trapv-vect-reduc-4.c for sparc ] On 12/08/15 07:57, Tom de Vries wrote: Hi, For sparc, the scan for 'vectorized 2 loops' in vect/trapv-vect-reduc-4.c fails because the vector instructions min and max are not supported. Tested on x86_64. Tested on sparc by Rainer. This follow-up patch introduces a new effective target vect_min_max, similar to how effective target vect_bswap is implemented. Any comments? Thanks, - Tom

Add check_effective_target_vect_min_max

2015-08-12  Tom de Vries  t...@codesourcery.com

	* lib/target-supports.exp (check_effective_target_vect_min_max): New proc.
	* gcc.dg/vect/trapv-vect-reduc-4.c: Use vect_min_max effective target.
---
 gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c |  2 +-
 gcc/testsuite/lib/target-supports.exp          | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
index 8fd353c..1624696 100644
--- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
@@ -47,4 +47,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "Detected reduction\\." 2 "vect" } } */
-/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { ! { sparc*-*-* } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { vect_min_max } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1988301..c585e5f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3568,6 +3568,24 @@ proc check_effective_target_vect_bswap { } {
     return $et_vect_bswap_saved
 }
 
+# Return 1 if the target supports vector min/max operations.
+
+proc check_effective_target_vect_min_max { } {
+    global et_vect_min_max_saved
+
+    if [info exists et_vect_min_max_saved] {
+	verbose "check_effective_target_vect_min_max: using cached result" 2
+    } else {
+	set et_vect_min_max_saved 1
+	if { [istarget sparc*-*-*] } {
+	    set et_vect_min_max_saved 0
+	}
+    }
+
+    verbose "check_effective_target_vect_min_max: returning $et_vect_min_max_saved" 2
+    return $et_vect_min_max_saved
+}
+
 # Return 1 if the target supports hardware vector shift operation for char.
 
 proc check_effective_target_vect_shift_char { } {
-- 
1.9.1
Re: [RFC] Add check_effective_target_vect_min_max
Tom de Vries tom_devr...@mentor.com writes: This follow-up patch introduces a new effective target vect_min_max, similar to how effective target vect_bswap is implemented. Any comments? Thanks, - Tom Add check_effective_target_vect_min_max 2015-08-12 Tom de Vries t...@codesourcery.com * lib/target-supports.exp (check_effective_target_vect_min_max): New proc. * gcc.dg/vect/trapv-vect-reduc-4.c: Use vect_min_max effective target. Looks good to me, but the new effective-target keyword needs documenting in sourcebuild.texi. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [optimize3/3] VRP min/max exprs
On Tue, 11 Aug 2015, Nathan Sidwell wrote: On 08/11/15 07:41, Richard Biener wrote: The patch looks good. Note that with SSA name operands it can be still profitable to do compare_range_with_value, esp. if the other operand has a symbolical range. See vrp_evaluate_conditional_warnv_with_ops_using_ranges which actually implements a nice helper for evaluating comparisons for code-gen transforms. Ok, like this? Thanks for your help! tested on x86_64-linux. Ok. Thanks, Richard.
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
On 12/08/15 02:11, Richard Henderson wrote: Something last week had me looking at ppc64 code generation, and some of what I saw was fairly bad. Fixing it wasn't going to be easy, due to the fact that the logic for generating constants wasn't contained within a single function. Better is the way that aarch64 and alpha have done it in the past, sharing a single function with all of the logical that can be used for both cost calculation and the actual emission of the constants. However, the way that aarch64 and alpha have done it hasn't been ideal, in that there's a fairly costly search that must be done every time. I've thought before about changing this so that we would be able to cache results, akin to how we do it in expmed.c for multiplication. I've implemented such a caching scheme for three targets, as a test of how much code could be shared. The answer appears to be about 100 lines of boiler-plate. Minimal, true, but it may still be worth it as a way of encouraging backends to do similar things in a similar way. I've got a short week this week, so won't have time to look at this in detail for a while. So a bunch of questions... but not necessarily objections :-) How do we clear the cache, and when? For example, on ARM, switching between ARM and Thumb state means we need to generate potentially radically different sequences? We can do such splitting at function boundaries now. Can we generate different sequences for hot/cold code within a single function? Can we cache sequences with the context (eg use with AND, OR, ADD, etc)? Some notes about ppc64 in particular: * Constants aren't split until quite late, preventing all hope of CSE'ing portions of the generated code. My gut feeling is that this is in general a mistake, but... I did attempt to fix it, and got nothing for my troubles except poorer code generation for AND/IOR/XOR with non-trivial constants. 
On AArch64 in particular, building complex constants is generally destructive on the source register (if you want to preserve intermediate values you have to make intermediate copies); that's clearly never going to be a win if you don't need at least 3 instructions to form the constant. There might be some cases where you could form a second constant as a difference from an earlier one, but that then creates data-flow dependencies and in OoO machines that might not be worthwhile. Even for in-order machines it can restrict scheduling and result in worse code. I'm somewhat surprised that the operands to the logicals aren't visible at rtl generation time, given all the work done in gimple. And failing that, combine has enough REG_EQUAL notes that it ought to be able to put things back together and see the simpler pattern. We've tried it in the past. Exposing the individual steps prevents the higher-level rtl-based optimizations since they can no longer deal with the complete sub-expression. Perhaps there's some other predication or costing error that's getting in the way, and it simply wasn't obvious to me. In any case, nothing in this patch set addresses this at all. * I go on to add 4 new methods of generating a constant, each of which typically saves 2 insns over the current algorithm. There are a couple more that might be useful but... * Constants are split *really* late. In particular, after reload. It would be awesome if we could at least have them all split before register allocation so that we arrange to use ADDI and ADDIS when that could save a few instructions. But that does of course mean avoiding r0 for the input. Again, nothing here attempts to change when constants are split. Certainly in the ARM port we try to split immediately before register allocation, that way we can be sure that we have scratch registers available if that helps with generating more efficient sequences. R.
* This is the only platform for which I bothered collecting any sort of performance data: As best I can tell, there is a 9% improvement in bootstrap speed for ppc64. That is, 10 minutes off the original 109 minute build. For aarch64 and alpha, I simply assumed there would be no loss, since the basic search algorithm is unchanged for each. Comments? Especially on the shared header? r~ Cc: David Edelsohn dje@gmail.com Cc: Marcus Shawcroft marcus.shawcr...@arm.com Cc: Richard Earnshaw richard.earns...@arm.com Richard Henderson (15): rs6000: Split out rs6000_is_valid_and_mask_wide rs6000: Make num_insns_constant_wide static rs6000: Tidy num_insns_constant vs CONST_DOUBLE rs6000: Implement set_const_data infrastructure rs6000: Move constant via mask into build_set_const_data rs6000: Use rldiwi in constant construction rs6000: Generalize left shift in constant generation rs6000: Generalize masking in constant generation
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
Hi! This looks really nice. I'll try it out soon :-) Some comments now... On Tue, Aug 11, 2015 at 06:11:29PM -0700, Richard Henderson wrote: However, the way that aarch64 and alpha have done it hasn't been ideal, in that there's a fairly costly search that must be done every time. I've thought before about changing this so that we would be able to cache results, akin to how we do it in expmed.c for multiplication. Is there something that makes the cache not get too big? Do we care, anyway? Some notes about ppc64 in particular: * Constants aren't split until quite late, preventing all hope of CSE'ing portions of the generated code. My gut feeling is that this is in general a mistake, but... Constant arguments to IOR/XOR/AND that can be done with two machine insns are split at expand. Then combine comes along and just loves to recombine them, but then they are split again at split1 (before RA). For AND this was optimal in my experiments; for IOR/XOR it has been this way since the dawn of time. Simple SETs aren't split at expand, maybe they should be. But they are split at split1. I did attempt to fix it, and got nothing for my troubles except poorer code generation for AND/IOR/XOR with non-trivial constants. Could you give an example of code that isn't split early enough? I'm somewhat surprised that the operands to the logicals aren't visible at rtl generation time, given all the work done in gimple. So am I, because that is not what I'm seeing? E.g. int f(int x) { return x | 0x12345678; } is expanded as two IORs already. There must be something in your testcases that prevents this? And failing that, combine has enough REG_EQUAL notes that it ought to be able to put things back together and see the simpler pattern. Perhaps there's some other predication or costing error that's getting in the way, and it simply wasn't obvious to me. In any case, nothing in this patch set addresses this at all. The instruction (set (reg) (const_int 0x12345678)) is costed as 4 (i.e. 
one insn). That cannot be good. This is alternative #5 in *movsi_internal1_single (there are many more variants of that pattern). * I go on to add 4 new methods of generating a constant, each of which typically saves 2 insns over the current algorithm. There are a couple more that might be useful but... New methods look to be really simple to add with your framework, very nice :-) * Constants are split *really* late. In particular, after reload. Yeah that is bad. But I'm still not seeing it. Hrm, maybe only DImode ones? It would be awesome if we could at least have them all split before register allocation And before sched1, yeah. so that we arrange to use ADDI and ADDIS when that could save a few instructions. But that does of course mean avoiding r0 for the input. That is no problem at all before RA. Again, nothing here attempts to change when constants are split. * This is the only platform for which I bothered collecting any sort of performance data: As best I can tell, there is a 9% improvement in bootstrap speed for ppc64. That is, 10 minutes off the original 109 minute build. That is, wow. Wow :-) Have you looked at generated code quality? Segher
[PATCH][1/2] Make SCCVN use conditional equivalences
This is the first patch in the series to make SCCVN able to remove redundant comparisons by recording conditionals being true on visited edges. This part of the series fully transitions the toplevel walk gathering entries to SCCs we value-number to the DOM walk which at the moment only visits last stmts of BBs. After this series we can safely insert expressions into the SCCVN tables temporarily for uses dominated by the stmt generating them. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-08-12 Richard Biener rguent...@suse.de * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children): Eliminate edges marked as not executable by SCCVN. * tree-ssa-sccvn.c: Include gimple-iterator.h. (cond_dom_walker): Rename to sccvn_dom_walker. (sccvn_dom_walker::before_dom_children): Value-number defs of all stmts. (run_scc_vn): Remove loop value-numbering all SSA names. Drop not visited SSA names to varying. * gcc.dg/tree-ssa/ssa-fre-43.c: Adjust. Index: gcc/tree-ssa-pre.c === --- gcc/tree-ssa-pre.c (revision 226777) +++ gcc/tree-ssa-pre.c (working copy) @@ -4291,7 +4291,31 @@ eliminate_dom_walker::before_dom_childre el_to_remove.safe_push (stmt); continue; } -} + } + + /* If this is a control statement value numbering left edges +unexecuted on force the condition in a way consistent with +that. 
*/
+      if (gcond *cond = dyn_cast <gcond *> (stmt))
+	{
+	  if ((EDGE_SUCC (b, 0)->flags & EDGE_EXECUTABLE)
+	      ^ (EDGE_SUCC (b, 1)->flags & EDGE_EXECUTABLE))
+	    {
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Removing unexecutable edge from ");
+		  print_gimple_stmt (dump_file, stmt, 0, 0);
+		}
+	      if (((EDGE_SUCC (b, 0)->flags & EDGE_TRUE_VALUE) != 0)
+		  == ((EDGE_SUCC (b, 0)->flags & EDGE_EXECUTABLE) != 0))
+		gimple_cond_make_true (cond);
+	      else
+		gimple_cond_make_false (cond);
+	      update_stmt (cond);
+	      el_todo |= TODO_cleanup_cfg;
+	      continue;
+	    }
+	}

       bool can_make_abnormal_goto = stmt_can_make_abnormal_goto (stmt);
       bool was_noreturn = (is_gimple_call (stmt)
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	(revision 226777)
+++ gcc/tree-ssa-sccvn.c	(working copy)
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.
 #include "tree-cfg.h"
 #include "domwalk.h"
 #include "cgraph.h"
+#include "gimple-iterator.h"

 /* This algorithm is based on the SCC algorithm presented by Keith
    Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
@@ -4277,18 +4278,20 @@ set_hashtable_value_ids (void)
       set_value_id_for_result (vr->result, vr->value_id);
 }

-class cond_dom_walker : public dom_walker
+class sccvn_dom_walker : public dom_walker
 {
 public:
-  cond_dom_walker () : dom_walker (CDI_DOMINATORS), fail (false) {}
+  sccvn_dom_walker () : dom_walker (CDI_DOMINATORS), fail (false) {}

   virtual void before_dom_children (basic_block);

   bool fail;
 };

+/* Value number all statements in BB.  */
+
 void
-cond_dom_walker::before_dom_children (basic_block bb)
+sccvn_dom_walker::before_dom_children (basic_block bb)
 {
   edge e;
   edge_iterator ei;
@@ -4317,6 +4320,34 @@ cond_dom_walker::before_dom_children (ba
       return;
     }

+  /* Value-number all defs in the basic-block.
*/
+  for (gphi_iterator gsi = gsi_start_phis (bb);
+       !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gphi *phi = gsi.phi ();
+      tree res = PHI_RESULT (phi);
+      if (!VN_INFO (res)->visited
+	  && !DFS (res))
+	{
+	  fail = true;
+	  return;
+	}
+    }
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+       !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      ssa_op_iter i;
+      tree op;
+      FOR_EACH_SSA_TREE_OPERAND (op, gsi_stmt (gsi), i, SSA_OP_ALL_DEFS)
+	if (!VN_INFO (op)->visited
+	    && !DFS (op))
+	  {
+	    fail = true;
+	    return;
+	  }
+    }
+
+  /* Finally look at the last stmt.  */
   gimple stmt = last_stmt (bb);
   if (!stmt)
     return;
@@ -4329,8 +4360,7 @@ cond_dom_walker::before_dom_children (ba

   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "Value-numbering operands of stmt ending BB %d: ",
-	       bb->index);
+      fprintf (dump_file, "Visiting stmt ending BB %d: ", bb->index);
       print_gimple_stmt (dump_file, stmt, 0, 0);
     }

@@ -4338,12 +4368,8 @@ cond_dom_walker::before_dom_children (ba
   ssa_op_iter i;
   tree op;
   FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_USE)
-    if (VN_INFO (op)->visited == false
-
Re: [PATCH][RFC] Re-work GIMPLE checking to be gimple class friendly (even better)
On Mon, 3 Aug 2015, Richard Biener wrote: On Mon, 27 Jul 2015, Richard Biener wrote: On Mon, 27 Jul 2015, Richard Biener wrote: On Mon, 27 Jul 2015, Richard Biener wrote: I noticed that the code we generate for a simple gimple_assign_rhs1 (stmt) is quite bad as we have two checking pieces. The implementation is now static inline tree gimple_assign_rhs1 (const_gimple gs) { GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return gimple_op (gs, 1); } and the hidden checking is due to gimple_op being static inline tree * gimple_ops (gimple gs) { size_t off; /* All the tuples have their operand vector at the very bottom of the structure. Note that those structures that do not have an operand vector have a zero offset. */ off = gimple_ops_offset_[gimple_statement_structure (gs)]; gcc_gimple_checking_assert (off != 0); return (tree *) ((char *) gs + off); (ugly in itself considering that we just verified we have a gassign which has an operand vector as member...). gimple_op incurs two additional memory reference to compute gimple_statement_structure and gimple_ops_offset_ plus the checking bits. The simplest thing would be to use GIMPLE_CHECK (gs, GIMPLE_ASSIGN); return as_a const gassign * (gs)-op[1]; but that's not optimal either because we check we have a gassign again (not optimized in stage1). So I decided to invent a fancy as_a which reports errors similar to GIMPLE_CHECK and makes the code look like const gassign *ass = GIMPLE_CHECK2const gassign * (gs); return ass-op[1]; for some reason I needed to overload is_a_helper for const_gimple (I thought the is_a machinery would transparently support qualified types?). I'm missing a fallback for !ENABLE_GIMPLE_CHECKING as well as an overload for 'gimple' I guess. I just changed gimple_assign_rhs1 for now. So this is a RFC. 
I understand in the end the whole GCC may use gassign everywhere we access gimple_assign_rhs1 but I don't see this happening soon and this simple proof-of-concept patch already reduces unoptimized text size of gimple-match.c from 836080 to 836359 (too bad) and optimized from 585731 to 193190 bytes (yay!) on x86_64 using trunk. Some inlining must go very differently - we just have 544 calls to gimple_assign_rhs1 in gimple-match.c. .optimizes shows all calls remaining as bb 89: _ZL18gimple_assign_rhs1PK21gimple_statement_base.part.32 (def_stmt_47); which is tree_node* gimple_assign_rhs1(const_gimple) (const struct gimple_statement_base * gs) { bb 2: gimple_check_failed (gs_1(D), /space/rguenther/tramp3d/trunk/gcc/gimple.h[0], 2381, gimple_assign_rhs1[0], 6, 0); } so eventually we just do a better job with profile estimation. Ok, numbers are off because of a typo I introduced late. + if (ret) +gimple_check_failed (gs, file, line, fun, T()-code_, ERROR_MARK); should be if (!ret) of course ... Optimized code size after the patch is 588878 (we still avoid a lot of code and the checking fails are still split out). Not sure where the code size increase is from. Unoptimized code size after the patch is 836233. And here is a more complete variant (covering all op accesses of gimple assigns). stage2 cc1plus size drops from textdata bss dec hex filename 24970991 73776 1392952 264377191936857 /abuild/rguenther/obj2/prev-gcc/cc1plus to textdata bss dec hex filename 24809991 69608 1388536 26268135190d1e7 ../obj/prev-gcc/cc1plus So we tried to come up with the reason that there are no gassign * overloads already in place. And remembered sth about C++ and not working overload resolution and such. But experiments show that all cases we tried are handled well (including const gassign * - gimple conversions). So here is a variant of the patch adding those overloads and making the gimple variants forwarders with just the (new and fancy) GIMPLE checking. 
I'm leaving this for discussion again but plan to commit it this time if there is no further response (after Cauldron, that is). Btw, I wonder what happened to the gimple - gimple * conversion. Bootstrap regtest running on x86_64-unknown-linux-gnu. Applied to trunk as r226802. Richard. Richard. 2015-07-27 Richard Biener rguent...@suse.de * gimple.h (remove_pointer): New trait. (GIMPLE_CHECK2): New inline template function. (gassign::code_): New constant static member. (is_a_helperconst gassign *): Add. (gimple_assign_lhs): Use GIMPLE_CHECK2 in the gimple overload and forward to a new gassign
[gomp4] Merge trunk r226769 (2015-08-10) into gomp-4_0-branch
Hi! Committed to gomp-4_0-branch in r226803: commit e2f0fc69bc92b2232573cfafe5f4975551eb05e0 Merge: 66f94e8 b5c93b0 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Aug 12 09:34:27 2015 + svn merge -r 225562:226769 svn+ssh://gcc.gnu.org/svn/gcc/trunk git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226803 138bc75d-0d04-0410-961f-82ee72b054a4 Grüße, Thomas
Re: [RFC] Add check_effective_target_vect_min_max
On Wed, 12 Aug 2015, Rainer Orth wrote: Tom de Vries tom_devr...@mentor.com writes: On 12/08/15 10:51, Rainer Orth wrote: Tom de Vries tom_devr...@mentor.com writes: This follow-up patch introduces a new effective target vect_min_max, similar to how effective target vect_bswap is implemented. Any comments? Thanks, - Tom Add check_effective_target_vect_min_max 2015-08-12 Tom de Vries t...@codesourcery.com * lib/target-supports.exp (check_effective_target_vect_min_max): New proc. * gcc.dg/vect/trapv-vect-reduc-4.c: Use vect_min_max effective target. Looks good to me, but the new effective-target keyword needs documenting in sourcebuild.texi. Hmm, in sourcebuild.texi I found: ... @item vect_no_int_max Target does not support a vector max instruction on @code{int}. ... That looks related. [ I also found a patch introducing vect_no_uint_max here: https://gcc.gnu.org/ml/gcc-patches/2010-01/msg00152.html. ] I'm not sure where to take it from here. Should I introduce vect_no_int_min, and use that in combination with vect_no_int_max? I'd say this is something for the vectorizer maintainers to decide. Richi? I expect the above is already effectively vect_no_int_min as well (which target would support min but not max...?). So after double-checking that you could rename it to vect_no_int_min_max. Richard.
[Aarch64][1/3] Use atomic compare-and-swap instructions when available.
ARMv8.1 adds instructions for atomic compare-and-swap with optional memory ordering specifiers. This patch series adds the instructions to GCC, making them available with -march=armv8.1-a or with -march=armv8-a+lse, and using them to implement the __sync and __atomic builtins. This patch adds an internal TARGET_LSE macro, to check target support for the atomic instructions. Subsequent patches - add and use atomic compare-and-swap instructions; - add tests for the compare-and-swap; Tested the series for aarch64-none-linux-gnu with native bootstrap and make check and for aarch64-none-elf with cross-compiled check-gcc. Also tested aarch64-none-elf with cross-compiled check-gcc on an emulator that supports ARMv8.1. Ok for trunk? Matthew 2015-08-12 Matthew Wahab matthew.wa...@arm.com * config/aarch64/aarch64.h (AARCH64_ISA_LSE): New. (TARGET_LSE): New. From 50a71e7572de7eb47c7913ae3d0984cd61f8d425 Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Mon, 20 Jul 2015 12:42:52 +0100 Subject: [PATCH 1/3] Add LSE flag testing macro. Change-Id: I4bfeb5ec617f13026fb7dcd28a9b5c7efa4c1719 --- gcc/config/aarch64/aarch64.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 721927f..d6c7a74 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -156,6 +156,7 @@ extern unsigned aarch64_architecture_version; #define AARCH64_ISA_CRYPTO (aarch64_isa_flags & AARCH64_FL_CRYPTO) #define AARCH64_ISA_FP (aarch64_isa_flags & AARCH64_FL_FP) #define AARCH64_ISA_SIMD (aarch64_isa_flags & AARCH64_FL_SIMD) +#define AARCH64_ISA_LSE (aarch64_isa_flags & AARCH64_FL_LSE) /* Crypto is an optional extension to AdvSIMD. */ #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO) @@ -163,6 +164,9 @@ extern unsigned aarch64_architecture_version; /* CRC instructions that can be enabled through +crc arch extension.
*/ #define TARGET_CRC32 (AARCH64_ISA_CRC) +/* Atomic instructions that can be enabled through the +lse extension. */ +#define TARGET_LSE (AARCH64_ISA_LSE) + /* Make sure this is always defined so we don't have to check for ifdefs but rather use normal ifs. */ #ifndef TARGET_FIX_ERR_A53_835769_DEFAULT -- 1.9.1
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On Wed, Aug 12, 2015 at 1:07 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote: On 2015.08.12 at 13:01 +0200, Richard Biener wrote: On Tue, Aug 11, 2015 at 9:49 PM, Jeff Law l...@redhat.com wrote: On 08/06/2015 04:25 AM, Mikael Morin wrote: Hello, this avoids an error found with bootstrap-ubsan. Regression tested on x86_64-unknown-linux-gnu. OK for trunk? Mikael noub_sext.CL 2015-08-05 Mikael Morin mik...@gcc.gnu.org * hwint.h (sext_hwi): Rewrite without undefined behaviour on negative SRC. OK. Hopefully most of the time the precision is known at compile-time which would allow for optimization of the resulting code back to the pair-of-shifts form by combine. I think it is not. The code also lacks a comment on why we do this kind of obfuscation. What kind of error does ubsan run into? That is, for which 'prec'? See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042 Ugh. Stupid C. My fear is that with so many stmts sext_hwi isn't any longer considered for inlining, even if prec is constant. What's the generated code for its out-of-line copy with/without the patch? Richard. -- Markus
[PATCH][2/2] Make SCCVN use conditional equivalences
This brings FRE/PRE up to the same level as DOM in being able to remove redundant conditionals. It does so by inserting temporary conditional expressions proved to be true on single predecessor edges. I've had to do a lot of testcase adjustments, thus the patch is now re-bootstrapping / testing on x86_64-unknown-linux-gnu. Richard. 2015-08-12 Richard Biener rguent...@suse.de * tree-ssa-sccvn.c (vn_nary_op_compute_hash): Also canonicalize comparison operand order and commutative ternary op operand order. (sccvn_dom_walker::cond_stack): New state to track temporary expressions. (sccvn_dom_walker::after_dom_children): Remove temporary expressions no longer valid. (sccvn_dom_walker::record_cond): Add a single temporary conditional expression. (sccvn_dom_walker::record_conds): Add a temporary conditional expression and all related expressions also true/false. (sccvn_dom_walker::before_dom_children): Record temporary expressions based on the controlling condition of a single predecessor. When trying to simplify a conditional statement look up expressions we might have inserted earlier.
* testsuite/g++.dg/tree-ssa/pr61034.C * testsuite/gcc.dg/fold-compare-2.c * testsuite/gcc.dg/pr50763.c * testsuite/gcc.dg/predict-3.c * testsuite/gcc.dg/tree-ssa/20030709-2.c * testsuite/gcc.dg/tree-ssa/pr19831-3.c * testsuite/gcc.dg/tree-ssa/pr20657.c * testsuite/gcc.dg/tree-ssa/pr21001.c * testsuite/gcc.dg/tree-ssa/pr37508.c * testsuite/gcc.dg/tree-ssa/ssa-fre-47.c * testsuite/gcc.dg/tree-ssa/ssa-fre-48.c * testsuite/gcc.dg/tree-ssa/ssa-fre-49.c * testsuite/gcc.dg/tree-ssa/vrp04.c * testsuite/gcc.dg/tree-ssa/vrp07.c * testsuite/gcc.dg/tree-ssa/vrp09.c * testsuite/gcc.dg/tree-ssa/vrp16.c * testsuite/gcc.dg/tree-ssa/vrp20.c * testsuite/gcc.dg/tree-ssa/vrp25.c * testsuite/gcc.dg/tree-ssa/vrp87.c Index: gcc/testsuite/g++.dg/tree-ssa/pr61034.C === --- gcc/testsuite/g++.dg/tree-ssa/pr61034.C (revision 226802) +++ gcc/testsuite/g++.dg/tree-ssa/pr61034.C (working copy) @@ -43,5 +43,5 @@ bool f(I a, I b, I c, I d) { // This works only if everything is inlined into 'f'. // { dg-final { scan-tree-dump-times ;; Function 1 fre2 } } -// { dg-final { scan-tree-dump-times free 18 fre2 } } +// { dg-final { scan-tree-dump-times free 10 fre2 } } // { dg-final { scan-tree-dump-times unreachable 11 fre2 } } Index: gcc/testsuite/gcc.dg/fold-compare-2.c === --- gcc/testsuite/gcc.dg/fold-compare-2.c (revision 226802) +++ gcc/testsuite/gcc.dg/fold-compare-2.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O2 -fno-tree-tail-merge -fdump-tree-vrp1 } */ +/* { dg-options -O2 -fdump-tree-fre1 } */ extern void abort (void); @@ -15,5 +15,5 @@ main(void) return 0; } -/* { dg-final { scan-tree-dump-times Removing basic block 2 vrp1 } } */ +/* { dg-final { scan-tree-dump-times Removing basic block 2 fre1 } } */ Index: gcc/testsuite/gcc.dg/pr50763.c === --- gcc/testsuite/gcc.dg/pr50763.c (revision 226802) +++ gcc/testsuite/gcc.dg/pr50763.c (working copy) @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -O2 -ftree-tail-merge -fno-tree-dominator-opts -fdump-tree-pre 
} */ +/* { dg-options -O2 -ftree-tail-merge -fno-tree-dominator-opts } */ int bar (int i); @@ -11,5 +11,3 @@ foo (int c, int d) d = 33; while (c == d); } - -/* { dg-final { scan-tree-dump-times == 33 2 pre} } */ Index: gcc/testsuite/gcc.dg/predict-3.c === --- gcc/testsuite/gcc.dg/predict-3.c(revision 226802) +++ gcc/testsuite/gcc.dg/predict-3.c(working copy) @@ -12,6 +12,10 @@ void foo (int bound) { if (i bound - 2) global += bar (i); + /* The following test is redundant with the loop bound check in the + for stmt and thus eliminated by FRE which makes the controlled +stmt always executed and thus equivalent to 100%. Thus the +heuristic only applies three times. */ if (i = bound) global += bar (i); if (i + 1 bound) @@ -21,4 +25,4 @@ void foo (int bound) } } -/* { dg-final { scan-tree-dump-times loop iv compare heuristics: 100.0% 4 profile_estimate} } */ +/* { dg-final { scan-tree-dump-times loop iv compare heuristics: 100.0% 3 profile_estimate} } */ Index: gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c === --- gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c (revision 226802) +++ gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c (working copy) @@
Re: [RFC] Add check_effective_target_vect_min_max
On 12/08/15 10:51, Rainer Orth wrote: Tom de Vries tom_devr...@mentor.com writes: This follow-up patch introduces a new effective target vect_min_max, similar to how effective target vect_bswap is implemented. Any comments? Thanks, - Tom Add check_effective_target_vect_min_max 2015-08-12 Tom de Vries t...@codesourcery.com * lib/target-supports.exp (check_effective_target_vect_min_max): New proc. * gcc.dg/vect/trapv-vect-reduc-4.c: Use vect_min_max effective target. Looks good to me, but the new effective-target keyword needs documenting in sourcebuild.texi. Hmm, in sourcebuild.texi I found: ... @item vect_no_int_max Target does not support a vector max instruction on @code{int}. ... That looks related. [ I also found a patch introducing vect_no_uint_max here: https://gcc.gnu.org/ml/gcc-patches/2010-01/msg00152.html. ] I'm not sure where to take it from here. Should I introduce vect_no_int_min, and use that in combination with vect_no_int_max? Thanks, - Tom
[gomp4] Work around expand_GOACC_DIM_SIZE/expand_GOACC_DIM_POS ICEs (was: signed nums are better for dimensions)
Hi! On Tue, 11 Aug 2015 13:38:34 -0400, Nathan Sidwell nat...@acm.org wrote: 2) We really should not be getting to the expanders if there's nothing to expand to. That's simply covering up lack of earlier handling. That earlier removal gives optimizers more leeway. --- internal-fn.c (revision 226770) +++ internal-fn.c (working copy) static void -expand_GOACC_FORK (gcall *stmt ATTRIBUTE_UNUSED) +expand_GOACC_FORK (gcall *ARG_UNUSED (stmt)) { #ifdef HAVE_oacc_fork [...] +#else + gcc_unreachable (); #endif } static void -expand_GOACC_JOIN (gcall *stmt ATTRIBUTE_UNUSED) +expand_GOACC_JOIN (gcall *ARG_UNUSED (stmt)) { #ifdef HAVE_oacc_join [...] +#else + gcc_unreachable (); #endif } static void -expand_GOACC_DIM_SIZE (gcall *stmt) +expand_GOACC_DIM_SIZE (gcall *ARG_UNUSED (stmt)) { +#ifdef HAVE_oacc_dim_size [...] #else - emit_move_insn (target, const1_rtx); + gcc_unreachable (); #endif } static void -expand_GOACC_DIM_POS (gcall *stmt) +expand_GOACC_DIM_POS (gcall *ARG_UNUSED (stmt)) { +#ifdef HAVE_oacc_dim_pos [...] 
#else - emit_move_insn (target, const0_rtx); + gcc_unreachable (); #endif } It's not an issue for expand_GOACC_FORK and expand_GOACC_JOIN, but expand_GOACC_DIM_SIZE and expand_GOACC_DIM_POS then blow up when the intelmic offloading compiler works through the OpenACC offloading code, for example: [...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/firstprivate-1.c: In function 'main._omp_fn.0': [...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/firstprivate-1.c:18:11: internal compiler error: in expand_GOACC_DIM_POS, at internal-fn.c:2023 #pragma acc parallel num_gangs (n) firstprivate (a) ^ 0x90bce7 expand_GOACC_DIM_POS [...]/source-gcc/gcc/internal-fn.c:2023 0x62945a expand_call_stmt [...]/source-gcc/gcc/cfgexpand.c:2279 0x62945a expand_gimple_stmt_1 [...]/source-gcc/gcc/cfgexpand.c:3238 0x62945a expand_gimple_stmt [...]/source-gcc/gcc/cfgexpand.c:3399 0x62a82d expand_gimple_basic_block [...]/source-gcc/gcc/cfgexpand.c:5411 0x62fc86 execute [...]/source-gcc/gcc/cfgexpand.c:6023 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See http://gcc.gnu.org/bugs.html for instructions. mkoffload-intelmic: fatal error: [...]/install/offload-x86_64-intelmicemul-linux-gnu/bin//x86_64-unknown-linux-gnu-accel-x86_64-intelmicemul-linux-gnu-gcc returned 1 exit status Admittedly, compiling OpenACC offloading code for Intel MIC doesn't make a lot of sense (currently), but it is what's being done, and has caused a lot of regressions in my testing, so I committed the following workaround to gomp-4_0-branch in r226804. 
A different approach would have been to invent some machinery to not compile OpenACC offloading code on the Intel MIC offloading path, but as the latter eventually is to support OpenACC offloading, too (see http://news.gmane.org/find-root.php?message_id=%3C20141112111207.GW5026%40tucnak.redhat.com%3E, for example), this seemed like a step into the wrong direction. Eventually, this will need to be fixed differently/properly -- have to implement oacc_dim_size and oacc_dim_pos for Intel MIC offloading, or can we have a generic fallback solution not specific to the offloading target? commit 8a3187f17e13bd45e630ff8a587c1fc7086abece Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Aug 12 09:56:59 2015 + Work around expand_GOACC_DIM_SIZE/expand_GOACC_DIM_POS ICEs Followup to r226783. gcc/ * internal-fn.c: Include builtins.h. (expand_GOACC_DIM_SIZE, expand_GOACC_DIM_POS): Instead of gcc_unreachable, expand_builtin_trap. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226804 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |6 ++ gcc/internal-fn.c |5 +++-- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index ea254c5..044fdf7 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,9 @@ +2015-08-12 Thomas Schwinge tho...@codesourcery.com + + * internal-fn.c: Include builtins.h. + (expand_GOACC_DIM_SIZE, expand_GOACC_DIM_POS): Instead of + gcc_unreachable, expand_builtin_trap. + 2015-08-10 Nathan Sidwell nat...@acm.org * tree-ssa-phiopt.c (minmax_replacement): Create new ssa name if diff --git gcc/internal-fn.c gcc/internal-fn.c index 8b8c6e1..70bffd4 100644 --- gcc/internal-fn.c +++ gcc/internal-fn.c @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. If not see #include stringpool.h #include tree-ssanames.h #include diagnostic-core.h +#include builtins.h /* The names of each internal
Re: [Aarch64][1/3] Use atomic compare-and-swap instructions when available.
On Wed, Aug 12, 2015 at 11:12:29AM +0100, Matthew Wahab wrote: ARMv8.1 adds instructions for atomic compare-and-swap with optional memory ordering specifiers. This patch series adds the instructions to GCC, making them available with -march=armv8.1-a or with -march=armv8-a+lse, and using them to implement the __sync and __atomic builtins. This patch adds an internal TARGET_LSE macro, to check target support for the atomic instructions. Subsequent patches - add and use atomic compare-and-swap instructions; - add tests for the compare-and-swap; Tested the series for aarch64-none-linux-gnu with native bootstrap and make check and for aarch64-none-elf with cross-compiled check-gcc. Also tested aarch64-none-elf with cross-compiled check-gcc on an emulator that supports ARMv8.1. Ok for trunk? Matthew OK. Thanks, James 2015-08-12 Matthew Wahab matthew.wa...@arm.com * config/aarch64/aarch64.h (AARCH64_ISA_LSE): New. (TARGET_LSE): New.
Re: [Aarch64][3/3] Add tests for atomic compare-and-swap instructions.
On Wed, Aug 12, 2015 at 6:16 PM, Matthew Wahab matthew.wa...@foss.arm.com wrote: ARMv8.1 adds instructions for atomic compare-and-swap with optional memory ordering specifiers. This patch adds tests for the compare-and-swap instructions as two files. The first is support code to run the test with a range of types and memory models, the second is the test for the CAS instruction. Tested the series for aarch64-none-linux-gnu with native bootstrap and make check and for aarch64-none-elf with cross-compiled check-gcc. Also tested aarch64-none-elf with cross-compiled check-gcc on an emulator that supports ARMv8.1. Ok for trunk? I noticed you did not change the other cas testcases so they don't set lse. This can cause those testcases to fail if a cpu defaults to having LSE. I am going to be modifying a cpu to default to LSE enabled soonish. Thanks, Andrew Matthew gcc/testsuite 2015-08-12 Matthew Wahab matthew.wa...@arm.com * gcc.target/aarch64/atomic-inst-cas.c: New. * gcc.target/aarch64/atomic-inst-ops.inc: New.
Re: [PATCH] Don't create superfluous parm in expand_omp_taskreg
On Tue, 11 Aug 2015, Tom de Vries wrote: [ was: Re: [committed, gomp4] Fix release_dangling_ssa_names ] On 05/08/15 13:13, Richard Biener wrote: On Wed, 5 Aug 2015, Tom de Vries wrote: On 05/08/15 11:30, Richard Biener wrote: On Wed, 5 Aug 2015, Tom de Vries wrote: On 05/08/15 09:29, Richard Biener wrote: This patch fixes that by making sure we reset the def stmt to NULL. This means we can simplify release_dangling_ssa_names to just test for NULL def stmts. Not sure if I understand the problem correctly but why are you not simply releasing the SSA name when you remove its definition? In move_sese_region_to_fn we move a region of blocks from one function to another, bit by bit. When we encounter an ssa_name as def or use in the region, we: - generate a new ssa_name, - set the def stmt of the old name as def stmt of the new name, and - add a mapping from the old to the new name. The next time we encounter the same ssa_name in another statement, we find it in the map. If we release the old ssa name, we effectively create statements with operands in the free-list. The first point where that causes breakage is in walk_gimple_op, which expects the TREE_TYPE of the lhs of an assign to be defined, which is not the case if it's in the free-list: ... case GIMPLE_ASSIGN: /* Walk the RHS operands. If the LHS is of a non-renamable type or is a register variable, we may use a COMPONENT_REF on the RHS. */ if (wi) { tree lhs = gimple_assign_lhs (stmt); wi->val_only = (is_gimple_reg_type (TREE_TYPE (lhs)) && !is_gimple_reg (lhs)) || gimple_assign_rhs_class (stmt) != GIMPLE_SINGLE_RHS; } ... Hmm, ok, probably because the stmt moving doesn't happen in DOM order (move defs before uses). But There seems to be similar code for the rhs, so I don't think changing the order would fix anything. + + if (!SSA_NAME_IS_DEFAULT_DEF (name)) + /* The statement has been moved to the child function. It no longer + defines name in the original function. 
Mark the def stmt NULL, and + let release_dangling_ssa_names deal with it. */ + SSA_NAME_DEF_STMT (name) = NULL; applies also to uses - I don't see why it couldn't happen that you move a use but not its def (the def would be a parameter to the split-out function). You'd wreck the IL of the source function this way. If you first move a use, you create a mapping. When you encounter the def, you use the mapping. Indeed, if the def is a default def, we don't encounter the def. Which is why we create a nop as defining def for those cases. The default def in the source function still has a defining nop, and has no uses anymore. I don't understand what is broken here. If you never encounter the DEF then it's broken. Say, if for foo(int a) { int b = a; if (b) { code using b } } you move code using b to a function. Then the def is still in foo but you create a mapping for its use(s). Clearly the outlining process in this case has to pass b as parameter to the outlined function, something that may not happen currently. Ah, I see. Indeed, this is a situation that is assumed not to happen. It would probably be cleaner to separate the def and use remapping into separate functions and record whether we saw a def or not. Right, or some other means to detect this situation, say when copying the def stmt in replace_ssa_name, check whether it's part of the sese region. I think that the whole dance of actually moving things instead of just copying them isn't worth the extra maintenance (well, if we already have machinery duplicating a SESE region to another function - I suppose gimple_duplicate_sese_region could be trivially changed to support that). I'll mention that as a todo. For now, I think the fastest way to get a working version is to fix move_sese_region_to_fn. Sure. Trunk doesn't have release_dangling_ssa_names it seems Yep, I only ran into this trouble for the kernels region handling. But I don't exclude the possibility it could happen for trunk as well. 
but I think it belongs to move_sese_region_to_fn and not to omp-low.c Makes sense indeed. and it could also just walk the d-vars_map replace_ssa_name fills to iterate over the removal candidates Agreed, I suppose in general that's a win over iterating over all the ssa names. (and if the situation of moving uses but not defs cannot happen you don't need any
[committed] Better location for atomic bit-fields
A trivial patch to use LOC for atomic bit-fields error messages. I haven't bothered with a testcase this time. Bootstrapped/regtested on x86_64-linux, applying to trunk. 2015-08-12 Marek Polacek pola...@redhat.com * c-decl.c (grokdeclarator): Call error_at instead of error and pass LOC to it. diff --git gcc/c/c-decl.c gcc/c/c-decl.c index 0d7aa3f..b83c584 100644 --- gcc/c/c-decl.c +++ gcc/c/c-decl.c @@ -6233,9 +6233,9 @@ grokdeclarator (const struct c_declarator *declarator, if (type_quals & TYPE_QUAL_ATOMIC) { if (name) - error ("bit-field %qE has atomic type", name); + error_at (loc, "bit-field %qE has atomic type", name); else - error ("bit-field has atomic type"); + error_at (loc, "bit-field has atomic type"); type_quals &= ~TYPE_QUAL_ATOMIC; } } Marek
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
On 12/08/15 09:43, Richard Earnshaw wrote: On 12/08/15 09:32, Richard Earnshaw wrote: On 12/08/15 02:11, Richard Henderson wrote: I'm somewhat surprised that the operands to the logicals aren't visible at rtl generation time, given all the work done in gimple. And failing that, combine has enough REG_EQUAL notes that it ought to be able to put things back together and see the simpler pattern. We've tried it in the past. Exposing the individual steps prevents the higher-level rtl-based optimizations since they can no longer deal with the complete sub-expression. E.g. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63724 R. And https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65768 R.
Re: [RFC] Add check_effective_target_vect_min_max
Tom de Vries tom_devr...@mentor.com writes: On 12/08/15 10:51, Rainer Orth wrote: Tom de Vries tom_devr...@mentor.com writes: This follow-up patch introduces a new effective target vect_min_max, similar to how effective target vect_bswap is implemented. Any comments? Thanks, - Tom Add check_effective_target_vect_min_max 2015-08-12 Tom de Vries t...@codesourcery.com * lib/target-supports.exp (check_effective_target_vect_min_max): New proc. * gcc.dg/vect/trapv-vect-reduc-4.c: Use vect_min_max effective target. Looks good to me, but the new effective-target keyword needs documenting in sourcebuild.texi. Hmm, in sourcebuild.texi I found: ... @item vect_no_int_max Target does not support a vector max instruction on @code{int}. ... That looks related. [ I also found a patch introducing vect_no_uint_max here: https://gcc.gnu.org/ml/gcc-patches/2010-01/msg00152.html. ] I'm not sure where to take it from here. Should I introduce vect_no_int_min, and use that in combination with vect_no_int_max? I'd say this is something for the vectorizer maintainers to decide. Richi? Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[Aarch64][2/3] Use the atomic compare-and-swap instructions when available.
ARMv8.1 adds instructions for atomic compare-and-swap with optional memory ordering specifiers. This patch adds the compare-and-swap instructions and changes the atomic_compare_and_swap patterns to use them. The changes to the atomic_compare_and_swap pattern make the new instructions available as an alternative for the existing expander, to be used when the target supports them. The existing expander is reused so that it can generate code needed to meet the requirements of the atomic_compare_and_swap name. Using the atomic CAS instructions, the code generated for a call to __atomic_compare_exchange (ptr, expected, desired, weak, smodel, fmodel) becomes:

	mov	r, r1
	cas<mo><sz>	r, r2, [r0]
	cmp	r, r1
	cset	r0, eq
	cbnz	r0, L
	strb	r, [r1]
L:	ret

where r0 = ptr, r1 = *expected, r2 = *desired, r is some temporary; <mo> is one of {'', 'a', 'l', 'al'}, depending on smodel, and <sz> is one of {'', 'b', 'h'}, depending on the data size. Tested the series for aarch64-none-linux-gnu with native bootstrap and make check and for aarch64-none-elf with cross-compiled check-gcc. Also tested aarch64-none-elf with cross-compiled check-gcc on an emulator that supports ARMv8.1. Ok for trunk? Matthew 2015-08-12 Matthew Wahab matthew.wa...@arm.com * config/aarch64/aarch64-protos.h (aarch64_gen_atomic_cas): Declare. * config/aarch64/aarch64.c (aarch64_expand_compare_and_swap): Choose appropriate instruction pattern for the target. (aarch64_gen_atomic_cas): New. * config/aarch64/atomics.md (UNSPECV_ATOMIC_CAS): New. (atomic_compare_and_swap<mode>_1): Rename to aarch64_compare_and_swap<mode>. Fix some indentation. (aarch64_compare_and_swap<mode>_lse): New. (aarch64_atomic_cas<mode>): New. From a84d8f8202a30fca18ed79e617d5ba3422eb021f Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Mon, 10 Aug 2015 16:59:18 +0100 Subject: [PATCH 2/3] Add and use atomic CAS instruction. 
Change-Id: Ie40f345d414fc9dc6c8cbac0eb9457547aa9ec2d --- gcc/config/aarch64/aarch64-protos.h | 1 + gcc/config/aarch64/aarch64.c| 66 ++-- gcc/config/aarch64/atomics.md | 117 +--- 3 files changed, 172 insertions(+), 12 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 32b5d09..0b09d49 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -362,6 +362,7 @@ rtx aarch64_load_tp (rtx); void aarch64_expand_compare_and_swap (rtx op[]); void aarch64_split_compare_and_swap (rtx op[]); +void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx); void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 6da7245..259e049 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -10749,7 +10749,23 @@ aarch64_expand_compare_and_swap (rtx operands[]) { rtx bval, rval, mem, oldval, newval, is_weak, mod_s, mod_f, x; machine_mode mode, cmp_mode; - rtx (*gen) (rtx, rtx, rtx, rtx, rtx, rtx, rtx); + typedef rtx (*gen_cas_fn) (rtx, rtx, rtx, rtx, rtx, rtx, rtx); + int idx; + gen_cas_fn gen; + const gen_cas_fn split_cas[] = + { +gen_aarch64_compare_and_swapqi, +gen_aarch64_compare_and_swaphi, +gen_aarch64_compare_and_swapsi, +gen_aarch64_compare_and_swapdi + }; + const gen_cas_fn atomic_cas[] = + { +gen_aarch64_compare_and_swapqi_lse, +gen_aarch64_compare_and_swaphi_lse, +gen_aarch64_compare_and_swapsi_lse, +gen_aarch64_compare_and_swapdi_lse + }; bval = operands[0]; rval = operands[1]; @@ -10794,13 +10810,17 @@ aarch64_expand_compare_and_swap (rtx operands[]) switch (mode) { -case QImode: gen = gen_atomic_compare_and_swapqi_1; break; -case HImode: gen = gen_atomic_compare_and_swaphi_1; break; -case SImode: gen = gen_atomic_compare_and_swapsi_1; break; -case DImode: gen = gen_atomic_compare_and_swapdi_1; 
break; +case QImode: idx = 0; break; +case HImode: idx = 1; break; +case SImode: idx = 2; break; +case DImode: idx = 3; break; default: gcc_unreachable (); } + if (TARGET_LSE) +gen = atomic_cas[idx]; + else +gen = split_cas[idx]; emit_insn (gen (rval, mem, oldval, newval, is_weak, mod_s, mod_f)); @@ -10829,6 +10849,42 @@ aarch64_emit_post_barrier (enum memmodel model) } } +/* Emit an atomic compare-and-swap operation. RVAL is the destination register + for the data in memory. EXPECTED is the value expected to be in memory. + DESIRED is the value to store to memory. MEM is the memory location. MODEL + is the memory ordering to use. */ + +void +aarch64_gen_atomic_cas (rtx rval, rtx mem, + rtx expected, rtx desired, + rtx model) +{ + rtx (*gen) (rtx, rtx, rtx, rtx); +
[patch] Update libstdc++ status docs and contribution policies.
Committed to trunk. commit 099d855a8ad7b655662e8f9c36cfb998541f501d Author: Jonathan Wakely jwak...@redhat.com Date: Tue Jul 21 13:26:07 2015 +0100 * doc/xml/manual/appendix_contributing.xml: Update patch email policy. * doc/xml/manual/status_cxx2017.xml: Update status table. * doc/html/manual/*: Regenerate. diff --git a/libstdc++-v3/doc/xml/manual/appendix_contributing.xml b/libstdc++-v3/doc/xml/manual/appendix_contributing.xml index 2c3b9fb..d7df13c 100644 --- a/libstdc++-v3/doc/xml/manual/appendix_contributing.xml +++ b/libstdc++-v3/doc/xml/manual/appendix_contributing.xml @@ -184,7 +184,8 @@ When you have all these pieces, bundle them up in a mail message and send it to libstd...@gcc.gnu.org. All patches and related discussion should be sent to the - libstdc++ mailing list. + libstdc++ mailing list. In common with the rest of GCC, + patches should also be sent to the gcc-patches mailing list. /para /listitem /itemizedlist diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml index c0bca84..b625364 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml @@ -81,14 +81,13 @@ not in any particular release. /row row - ?dbhtml bgcolor=#C8B0B0 ? entry link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4089.pdf; N4089 /link /entry entrySafe conversions in codeunique_ptrlt;T[]gt;/code /entry - entryN/entry + entryY/entry entry/ /row @@ -116,14 +115,13 @@ not in any particular release. /row row - ?dbhtml bgcolor=#C8B0B0 ? entry link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4279.html; N4279 /link /entry entryImproved insertion interface for unique-key maps/entry - entryN/entry + entryY/entry entry/ /row @@ -147,32 +145,30 @@ not in any particular release. 
/entry entryA proposal to add invoke function template/entry entryN/entry - entry/ + entryIn progress/entry /row row - ?dbhtml bgcolor=#C8B0B0 ? entry link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4280.pdf; N4280 /link /entry entryNon-member size() and more/entry - entryN/entry + entryY/entry entry/ /row row - ?dbhtml bgcolor=#C8B0B0 ? entry link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4366.html; - N4387 + N4366 /link /entry entry LWG 2228: Missing SFINAE rule in unique_ptr templated assignment /entry - entryN/entry + entryY/entry entry/ /row @@ -314,14 +310,13 @@ not in any particular release. /row row - ?dbhtml bgcolor=#C8B0B0 ? entry link xmlns:xlink=http://www.w3.org/1999/xlink; xlink:href=http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4391.html; N4391 /link /entry entry codemake_array/code, revision 4 /entry - entryN/entry + entryY (not committed yet)/entry entryLibrary Fundamentals 2 TS/entry /row
RE: [PATCH][MIPS] Scheduler fix for the 74k 24k.
Hi, Simon gcc/ * config/mips/mips.c (mips_store_data_bypass_p): Bring code into line with comments. * config/mips/sb1.md: Update usage of mips_store_data_bypass_p. This patch is OK. Committed on Simon's behalf as r226805. Regards, Robert
Re: [Aarch64][3/3] Add tests for atomic compare-and-swap instructions.
On 12/08/15 11:51, Andrew Pinski wrote: On Wed, Aug 12, 2015 at 6:16 PM, Matthew Wahab matthew.wa...@foss.arm.com wrote: ARMv8.1 adds instructions for atomic compare-and-swap with optional memory ordering specifiers. This patch adds tests for the compare-and-swap instructions as two files. The first is support code to run the test with a range of types and memory models, the second is the test for the CAS instruction. I noticed you did not change the other cas testcases so they don't set lse. This can cause those testcases to fail if a cpu defaults to having LSE. I am going to be modifying a cpu to default to LSE enabled soonish. I'll see what I can do. Matthew
[Aarch64][3/3] Add tests for atomic compare-and-swap instructions.
ARMv8.1 adds instructions for atomic compare-and-swap with optional memory ordering specifiers. This patch adds tests for the compare-and-swap instructions as two files. The first is support code to run the test with a range of types and memory models, the second is the test for the CAS instruction. Tested the series for aarch64-none-linux-gnu with native bootstrap and make check and for aarch64-none-elf with cross-compiled check-gcc. Also tested aarch64-none-elf with cross-compiled check-gcc on an emulator that supports ARMv8.1. Ok for trunk? Matthew gcc/testsuite 2015-08-12 Matthew Wahab matthew.wa...@arm.com * gcc.target/aarch64/atomic-inst-cas.c: New. * gcc.target/aarch64/atomic-inst-ops.inc: New. From c72302f2a0bc4d95a0b933e54332d551295040bf Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Mon, 3 Aug 2015 18:10:37 +0100 Subject: [PATCH 3/3] Add tests for CAS instruction. Change-Id: I42a46c2f81f1200a893620ba96323ce785873e8d --- gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c | 61 ++ .../gcc.target/aarch64/atomic-inst-ops.inc | 53 +++ 2 files changed, 114 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c b/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c new file mode 100644 index 000..f6f2892 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-cas.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv8-a+lse" } */ + +/* Test ARMv8.1-A CAS instruction. */ + +#include "atomic-inst-ops.inc" + +#define TEST TEST_TWO + +#define CAS_ATOMIC(FN, TY, MODEL1, MODEL2)\ + int FNNAME (FN, TY) (TY* val, TY* foo, TY* bar) \ + { \ +int model_s = MODEL1; \ +int model_f = MODEL2; \ +/* The success memory ordering must be at least as strong as \ + the failure memory ordering. */ \ +if (model_s < model_f) \ + return 0;\ +/* Ignore invalid memory orderings. 
*/\ +if (model_f == __ATOMIC_RELEASE || model_f == __ATOMIC_ACQ_REL) \ + return 0;\ +return __atomic_compare_exchange_n (val, foo, bar, 0, model_s, model_f); \ + } + +#define CAS_ATOMIC_NORETURN(FN, TY, MODEL1, MODEL2) \ + void FNNAME (FN, TY) (TY* val, TY* foo, TY* bar) \ + { \ +int model_s = MODEL1; \ +int model_f = MODEL2; \ +/* The success memory ordering must be at least as strong as \ + the failure memory ordering. */ \ +if (model_s < model_f) \ + return;\ +/* Ignore invalid memory orderings. */\ +if (model_f == __ATOMIC_RELEASE || model_f == __ATOMIC_ACQ_REL) \ + return;\ +__atomic_compare_exchange_n (val, foo, bar, 0, model_s, model_f); \ + } + +TEST (cas_atomic, CAS_ATOMIC) +TEST (cas_atomic_noreturn, CAS_ATOMIC_NORETURN) + + +/* { dg-final { scan-assembler-times "casb\t" 4} } */ +/* { dg-final { scan-assembler-times "casab\t" 20} } */ +/* { dg-final { scan-assembler-times "caslb\t" 4} } */ +/* { dg-final { scan-assembler-times "casalb\t" 36} } */ + +/* { dg-final { scan-assembler-times "cash\t" 4} } */ +/* { dg-final { scan-assembler-times "casah\t" 20} } */ +/* { dg-final { scan-assembler-times "caslh\t" 4} } */ +/* { dg-final { scan-assembler-times "casalh\t" 36} } */ + +/* { dg-final { scan-assembler-times "cas\t" 8} } */ +/* { dg-final { scan-assembler-times "casa\t" 40} } */ +/* { dg-final { scan-assembler-times "casl\t" 8} } */ +/* { dg-final { scan-assembler-times "casal\t" 72} } */ + +/* { dg-final { scan-assembler-not "ldaxr\t" } } */ +/* { dg-final { scan-assembler-not "stlxr\t" } } */ +/* { dg-final { scan-assembler-not "dmb" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc new file mode 100644 index 000..72c7e5c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/atomic-inst-ops.inc @@ -0,0 +1,53 @@ +/* Support code for atomic instruction tests. */ + +/* Define type names without spaces. 
*/ +typedef unsigned char uchar; +typedef unsigned short ushort; +typedef unsigned int uint; +typedef long long longlong; +typedef unsigned long long ulonglong; +typedef __int128_t int128; +typedef __uint128_t uint128; + +#define FNNAME(NAME,TY) NAME + +/* Expand one-model functions. */ +#define TEST_M1(NAME, FN, TY, MODEL, DUMMY) \ + FN (test_##NAME##_##TY, TY, MODEL) + +/* Expand two-model functions. */ +#define TEST_M2(NAME, FN, TY, MODEL1, MODEL2) \ + FN (test_##NAME##_##TY, TY, MODEL1, MODEL2) + +/* Types to test. */ +#define TEST_TY(NAME, FN, N, MODEL1, MODEL2) \ + TEST_M##N (NAME, FN, char, MODEL1, MODEL2) \ + TEST_M##N (NAME, FN, uchar, MODEL1, MODEL2) \ + TEST_M##N (NAME, FN, short, MODEL1, MODEL2) \ + TEST_M##N (NAME, FN, ushort, MODEL1, MODEL2) \ + TEST_M##N (NAME, FN, int,
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On Tue, Aug 11, 2015 at 9:49 PM, Jeff Law l...@redhat.com wrote: On 08/06/2015 04:25 AM, Mikael Morin wrote: Hello, this avoids an error found with bootstrap-ubsan. Regression tested on x86_64-unknown-linux-gnu. OK for trunk? Mikael noub_sext.CL 2015-08-05 Mikael Morinmik...@gcc.gnu.org * hwint.h (sext_hwi): Rewrite without undefined behaviour on negative SRC. OK. Hopefully most of the time the precision is known at compile-time which would allow for optimization of the resulting code back to the pair-of-shifts form by combine. I think it is not. The code also lacks a comment on why we do this kind of obfuscation. What kind of error does ubsan run into? That is, for which 'prec'? Richard. jeff
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On 2015.08.12 at 13:01 +0200, Richard Biener wrote: On Tue, Aug 11, 2015 at 9:49 PM, Jeff Law l...@redhat.com wrote: On 08/06/2015 04:25 AM, Mikael Morin wrote: Hello, this avoids an error found with bootstrap-ubsan. Regression tested on x86_64-unknown-linux-gnu. OK for trunk? Mikael noub_sext.CL 2015-08-05 Mikael Morinmik...@gcc.gnu.org * hwint.h (sext_hwi): Rewrite without undefined behaviour on negative SRC. OK. Hopefully most of the time the precision is known at compile-time which would allow for optimization of the resulting code back to the pair-of-shifts form by combine. I think it is not. The code also lacks a comment on why we do this kind of obfuscation. What kind of error does ubsan run into? That is, for which 'prec'? See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042 -- Markus
Re: [Aarch64][2/3] Use the atomic compare-and-swap instructions when available.
On Wed, Aug 12, 2015 at 11:15:25AM +0100, Matthew Wahab wrote: ARMv8.1 adds instructions for atomic compare-and-swap with optional memory ordering specifiers. This patch adds the compare-and-swap instructions and changes the atomic_compare_and_swap patterns to use them. The changes to the atomic_compare_and_swap pattern make the new instructions available as an alternative for the existing expander, to be used when the target supports them. The existing expander is reused so that it can generate code needed to meet the requirements of the atomic_compare_and_swap name. Using the atomic CAS instructions, the code generated for a call to __atomic_compare_exchange (ptr, expected, desired, weak, smodel, fmodel) becomes: mov r, r1 cas<mo><sz> r, r2, [r0] cmp r, r1 cset r0, eq cbnz r0, L strb r, [r1] L: ret where r0 = ptr, r1 = *expected, r2 = *desired, r is some temporary; <mo> is one of {'', 'a', 'l', 'al'}, depending on smodel, and <sz> is one of {'', 'b', 'h'}, depending on the data size. Tested the series for aarch64-none-linux-gnu with native bootstrap and make check and for aarch64-none-elf with cross-compiled check-gcc. Also tested aarch64-none-elf with cross-compiled check-gcc on an emulator that supports ARMv8.1. Ok for trunk? Matthew OK, but please fix up the testcases Andrew mentioned for those configuring their toolchains/test runs with a default -mcpu or -march which targets these instructions. I would just hardwire them (with dg-additional-options) to -march=armv8-a. Thanks, James 2015-08-12 Matthew Wahab matthew.wa...@arm.com * config/aarch64/aarch64-protos.h (aarch64_gen_atomic_cas): Declare. * config/aarch64/aarch64.c (aarch64_expand_compare_and_swap): Choose appropriate instruction pattern for the target. (aarch64_gen_atomic_cas): New. * config/aarch64/atomics.md (UNSPECV_ATOMIC_CAS): New. (atomic_compare_and_swap<mode>_1): Rename to aarch64_compare_and_swap<mode>. Fix some indentation. (aarch64_compare_and_swap<mode>_lse): New. (aarch64_atomic_cas<mode>): New.
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
Richard Henderson wrote: However, the way that aarch64 and alpha have done it hasn't been ideal, in that there's a fairly costly search that must be done every time. I've thought before about changing this so that we would be able to cache results, akin to how we do it in expmed.c for multiplication. I've implemented such a caching scheme for three targets, as a test of how much code could be shared. The answer appears to be about 100 lines of boiler-plate. Minimal, true, but it may still be worth it as a way of encouraging backends to do similar things in a similar way. However it also creates new dependencies that may not be desirable (such as hash table size, algorithm used etc). Some notes about ppc64 in particular: * Constants aren't split until quite late, preventing all hope of CSE'ing portions of the generated code. My gut feeling is that this is in general a mistake, but... Late split is best in general as you want to CSE the original constants, not parts of the expansion (which would be very rarely possible). * This is the only platform for which I bothered collecting any sort of performance data: As best I can tell, there is a 9% improvement in bootstrap speed for ppc64. That is, 10 minutes off the original 109 minute build. For aarch64 and alpha, I simply assumed there would be no loss, since the basic search algorithm is unchanged for each. Comments? Especially on the shared header? I'm not convinced the amount of code that could be shared is enough to be worthwhile. Also the way it is written makes the immediate generation more complex and likely consuming a lot of memory (each cached immediate requires at least 64 bytes). It is not obvious to me why it is a good idea to hide the simple/fast cases behind the hashing scheme - it seems better that the backend explicitly decides which cases should be cached. I looked at the statistics of AArch64 immediate generation a while ago. 
The interesting thing is that ~95% of calls are queries, and the same query is on average repeated 10 times in a row. So (a) it is not important to cache the expansions, and (b) the high repetition rate means a single-entry cache has a 90% hit rate. We already have a patch for this and could collect stats comparing the approaches. If a single-entry cache can provide a similar benefit to caching all immediates then my preference would be to keep things simple and just cache the last query. Note the many repeated queries indicate a performance issue at a much higher level (repeated cost queries on the same unchanged RTL), and solving that problem will likely improve build time for all targets. Wilco
Re: [PATCH, libjava/classpath]: Fix overriding recipe for target 'gjdoc' build warning
Jeff In the past this has stalled on issues like how will asynch-exceptions Jeff be tested and the like. It seems to me that either there is some other language which needs this -- in which case that language ought to have testing for the feature -- or the feature is only used by gcj, in which case it doesn't matter. Of course is!=ought; but relying on gcj and libjava to provide this small amount of testing seems like a bad cost/benefit tradeoff. Tom
Re: [PATCH, libjava/classpath]: Fix overriding recipe for target 'gjdoc' build warning
On 08/12/2015 10:24 AM, Ian Lance Taylor wrote: On Wed, Aug 12, 2015 at 9:21 AM, Tom Tromey t...@tromey.com wrote: Jeff In the past this has stalled on issues like how will asynch-exceptions Jeff be tested and the like. It seems to me that either there is some other language which needs this -- in which case that language ought to have testing for the feature -- or the feature is only used by gcj, in which case it doesn't matter. Of course is!=ought; but relying on gcj and libjava to provide this small amount of testing seems like a bad cost/benefit tradeoff. Go does use asynchronous exceptions, and has test cases that rely on them working. If you're comfortable with Go at this point and we have mechanisms in place to ensure Go only gets built on platforms that support Go, then I think we should go forward with replacing GCJ with Go. Jeff
Re: [PATCH, libjava/classpath]: Fix overriding recipe for target 'gjdoc' build warning
On Wed, Aug 12, 2015 at 7:57 AM, Andrew Haley a...@redhat.com wrote: On 12/08/15 15:44, Jeff Law wrote: My inclination is to replace GCJ with Go, but Ian wasn't comfortable with that when I suggested it a couple years ago. Because Go wasn't ready for prime time? I don't remember why I wasn't comfortable with Go a couple of years ago, but I think it would be OK as a default language today. I doubt it runs on as many platforms as gcj, though I would certainly be happy to see more portability patches. In particular, it doesn't currently run on Windows or Darwin. Ian
Re: [PATCH, libjava/classpath]: Fix overriding recipe for target 'gjdoc' build warning
On Wed, Aug 12, 2015 at 9:21 AM, Tom Tromey t...@tromey.com wrote: Jeff In the past this has stalled on issues like how will asynch-exceptions Jeff be tested and the like. It seems to me that either there is some other language which needs this -- in which case that language ought to have testing for the feature -- or the feature is only used by gcj, in which case it doesn't matter. Of course is!=ought; but relying on gcj and libjava to provide this small amount of testing seems like a bad cost/benefit tradeoff. Go does use asynchronous exceptions, and has test cases that rely on them working. Ian
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On 08/12/2015 07:33 AM, Mikael Morin wrote: Le 12/08/2015 13:09, Richard Biener a écrit : On Wed, Aug 12, 2015 at 1:07 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote: On 2015.08.12 at 13:01 +0200, Richard Biener wrote: What kind of error does ubsan run into? That is, for which 'prec'? See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042 Ugh. Stupid C. My fear is that with so many stmts sext_hwi isn't any longer considered for inlining, even if prec is constant. What's the generated code for its out-of-line copy with/without the patch? The difference on x86_64 is:

	.cfi_startproc
	cmpl	$64, %esi
	je	.L3
-	movl	$64, %ecx
-	subl	%esi, %ecx
-	salq	%cl, %rdi
+	leal	-1(%rsi), %ecx
+	movl	$1, %eax
+	movq	%rax, %rdx
+	salq	%cl, %rdx
+	movl	%esi, %ecx
+	salq	%cl, %rax
+	subq	$1, %rax
+	andq	%rax, %rdi
+	xorq	%rdx, %rdi
	movq	%rdi, %rax
-	sarq	%cl, %rax
+	subq	%rdx, %rax
	ret
	.p2align 4,,10
	.p2align 3

with -O2, tests attached. Yes it's worse. I thought it was better to avoid undefined behaviour at all prices to avoid bad surprises. Well, this isn't the right test. You're testing when PREC is an unknown and I fully expect in that case the code you're going to get will be worse. There's too many insns for combine to save us in that case. What's interesting here is what happens when prec is a known value (since we're hoping that's the common case via inlining). When prec has a known value of 8, 16 or 32 we get a nice movs[bwl]q on x86_64, which is exactly what we want. When prec is another known value, say 13 for the sake of argument, we get a pair of shifts, which is again exactly what we want. So when prec is a constant, we're going to get good code. So the only question left is whether or not prec is usually a constant or not in the contexts in which this routine is used. Jeff
Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
On 08/12/2015 08:59 AM, Wilco Dijkstra wrote: I looked at the statistics of AArch64 immediate generation a while ago. The interesting thing is ~95% of calls are queries, and the same query is on average repeated 10 times in a row. So (a) it is not important to cache the expansions, and (b) the high repetition rate means a single-entry cache has a 90% hitrate. We already have a patch for this and could collect stats comparing the approaches. If a single-entry cache can provide a benefit similar to caching all immediates then my preference would be to keep things simple and just cache the last query. Interesting. That's already more detailed investigation than I'd done. I had no idea the queries were so clustered. I assumed that the queries would be scattered across various passes, and so the various constants across the function would get checked in sequence. I would be very interested in seeing those stats when you're done. r~
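The single-entry cache Wilco describes can be sketched in isolation. Everything below is illustrative: the names and the toy cost model are made up, only the caching pattern (remember exactly the last query, relying on the observed 10x repetition) mirrors the idea under discussion:

```c
#include <stdbool.h>
#include <stdint.h>

/* Counter so the cache behaviour can be observed in a test.  */
static int slow_calls;

/* Hypothetical expensive query: how many insns to build constant C.
   A toy model stands in for the real recursive search.  */
static int
immediate_cost_uncached (int64_t c)
{
  slow_calls++;
  return (c >= -32768 && c < 32768) ? 1 : 2;
}

/* Single-entry cache: the same constant is typically queried many
   times in a row, so remembering only the last result already gives
   a high hit rate.  */
static struct { int64_t key; int cost; bool valid; } last_query;

static int
immediate_cost (int64_t c)
{
  if (last_query.valid && last_query.key == c)
    return last_query.cost;
  last_query.key = c;
  last_query.cost = immediate_cost_uncached (c);
  last_query.valid = true;
  return last_query.cost;
}
```

A run of repeated queries then hits the slow path only once per distinct constant, which is the trade-off being weighed against caching every expansion.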
[PATCH][RTL-ifcvt] Allow PLUS+immediate expression in noce_try_store_flag_constants
Hi all, This patch is a sequel to: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html It allows if-conversion of expressions of the form: if (test) x := y + c1; else x := y + c2 where c1 and c2 are a pair of constants that can be optimised using the existing rules in noce_try_store_flag_constants. The resulting sequence looks something like: x := (test [!=,==] 0) + y; x := x + [c1,c2]; This patch reuses the logic for the combinations of diff and STORE_FLAG signs that is hammered out in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02567.html. That means we just extract the constants c1 and c2 from the PLUS rtx and use their diff as the driver for the logic in that function. Currently, this patch allows only the case where diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE as it is the most beneficial case (from looking at SPEC2006 on aarch64) and also the least tricky to get right. In the future I suppose that restriction can be relaxed if needed, but in any case this should be an improvement over the existing situation. On aarch64 this allows us for code:

int barsi (int x) { return x > 100 ? x + 4 : x + 3; }

to generate:

barsi:
	cmp	w0, 101
	cinc	w0, w0, ge
	add	w0, w0, 3
	ret

instead of the current:

barsi:
	add	w1, w0, 4
	add	w2, w0, 3
	cmp	w0, 101
	csel	w0, w2, w1, lt
	ret

thus saving one instruction and using two fewer registers. Bootstrapped and tested on arm, aarch64, x86_64. Ok for trunk? Thanks, Kyrill 2015-08-12 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (noce_try_store_flag_constants): Handle PLUS-immediate expressions in A and B. 2015-08-12 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/cinc_common_1.c: New test. 
commit ce0de8d30962bf00339dc3a030401a4be2a7a248
Author: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
Date: Thu Jul 30 15:14:27 2015 +0100

    [ifcvt][unfinished] Allow PLUS+immediate expression in noce_try_store_flag_constants

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f815166..7a44e29 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1198,17 +1198,35 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   HOST_WIDE_INT itrue, ifalse, diff, tmp;
   int normalize;
   bool can_reverse;
-  machine_mode mode;
+  machine_mode mode = GET_MODE (if_info->x);
+  rtx common = NULL_RTX;
 
   if (!noce_simple_bbs (if_info))
     return FALSE;
 
-  if (CONST_INT_P (if_info->a) && CONST_INT_P (if_info->b))
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+
+  /* Handle cases like x := test ? y + 3 : y + 4.  */
+  if (GET_CODE (a) == PLUS
+      && GET_CODE (b) == PLUS
+      && CONST_INT_P (XEXP (a, 1))
+      && CONST_INT_P (XEXP (b, 1))
+      && rtx_equal_p (XEXP (a, 0), XEXP (b, 0))
+      && noce_operand_ok (XEXP (a, 0))
+      && if_info->branch_cost >= 2)
     {
-      mode = GET_MODE (if_info->x);
-      ifalse = INTVAL (if_info->a);
-      itrue = INTVAL (if_info->b);
+      common = XEXP (a, 0);
+
+      a = XEXP (a, 1);
+      b = XEXP (b, 1);
+    }
+
+  if (CONST_INT_P (a) && CONST_INT_P (b))
+    {
+      ifalse = INTVAL (a);
+      itrue = INTVAL (b);
       bool subtract_flag_p = false;
 
       diff = (unsigned HOST_WIDE_INT) itrue - ifalse;
@@ -1241,6 +1259,11 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	{
 	  reversep = can_reverse;
 	  subtract_flag_p = !can_reverse;
+	  /* If we need to subtract the flag and we have PLUS-immediate
+	     A and B then it is unlikely to be beneficial to play tricks
+	     here.  */
+	  if (subtract_flag_p && common)
+	    return FALSE;
 	}
       /* test ? 3 : 4 => can_reverse | 3 + (test == 0)
@@ -1249,6 +1272,11 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	{
 	  reversep = can_reverse;
 	  subtract_flag_p = !can_reverse;
+	  /* If we need to subtract the flag and we have PLUS-immediate
+	     A and B then it is unlikely to be beneficial to play tricks
+	     here.  */
+	  if (subtract_flag_p && common)
+	    return FALSE;
 	}
       /* test ? 4 : 3 => 4 + (test != 0).  */
@@ -1290,6 +1318,15 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	}
 
       start_sequence ();
+
+      /* If we have x := test ? x + 3 : x + 4 then move the original
+	 x out of the way while we store flags.  */
+      if (common && rtx_equal_p (common, if_info->x))
+	{
+	  common = gen_reg_rtx (mode);
+	  noce_emit_move_insn (common, if_info->x);
+	}
+
       target = noce_emit_store_flag (if_info, if_info->x, reversep, normalize);
       if (! target)
 	{
@@ -1301,13 +1338,27 @@
 	 => x = 3 + (test == 0);  */
       if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE)
 	{
+	  /* Add the common part now.  This may allow combine to merge this
+	     with the store flag operation earlier into some sort of conditional
+	     increment/decrement if the target allows it.  */
+	  if
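The arithmetic the patch relies on can be checked with a small standalone sketch. The function names are made up; only the identity mirrors the transformation: when the two arms differ by exactly 1, the comparison result itself (the "store flag", 0 or 1) replaces the conditional select:

```c
/* The branchy form the patch targets, taken from the example above.  */
static int
barsi_branchy (int x)
{
  return x > 100 ? x + 4 : x + 3;
}

/* Branchless rewrite: materialize the comparison as 0/1, add it to the
   common operand, then add the smaller constant.  This is the shape
   the cmp/cinc/add sequence implements.  */
static int
barsi_storeflag (int x)
{
  int flag = (x > 100);   /* 0 or 1, the store-flag value */
  return x + flag + 3;    /* x + 3 when false, x + 4 when true */
}
```

The diff between the two constants being STORE_FLAG_VALUE (here 1) is exactly what makes the flag usable as the correction term.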
Re: [C/C++ PATCH] Fix -Wshift-overflow with sign bit
On 08/12/2015 09:39 AM, Marek Polacek wrote: This patch fixes a defect in -Wshift-overflow. We should only warn about left-shifting 1 into the sign bit when -Wshift-overflow=2. But this doesn't apply only for 1 31, but also for 2 30, etc. In C++14, never warn about this. Neither existing tests nor documentation require updating, I think. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-08-12 Marek Polacek pola...@redhat.com PR c++/55095 * c-common.c (maybe_warn_shift_overflow): Properly handle left-shifting 1 into the sign bit. * c-c++-common/Wshift-overflow-6.c: New test. * c-c++-common/Wshift-overflow-7.c: New test. * g++.dg/cpp1y/left-shift-2.C: New test. I didn't realize C++14 fixed this bit of lameness. I'd read that it was under consideration when I was looking at Mikael's sext_hwi patch, but didn't follow up on the current status. OK for the trunk. jeff
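For readers outside the thread, a sketch of the construct at issue. Shifting a signed 1 into the sign bit is what `-Wshift-overflow=2` warns about (undefined in C99/C11, made valid in C++14); doing the shift in unsigned and converting is the conventional portable spelling. This example assumes a 32-bit int and a two's-complement target (guaranteed wherever `int32_t` exists):

```c
#include <limits.h>
#include <stdint.h>

/* (int32_t) 1 << 31 shifts into the sign bit and is the pattern the
   warning flags.  The unsigned shift below is well defined; the final
   conversion to int32_t is implementation-defined in C, but on two's
   complement targets (and with GCC specifically) it yields INT32_MIN.  */
static int32_t
sign_bit (void)
{
  return (int32_t) (UINT32_C (1) << 31);
}
```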
Re: [PATCH, libjava/classpath]: Fix overriding recipe for target 'gjdoc' build warning
On Wed, Aug 12, 2015 at 9:47 AM, Jeff Law l...@redhat.com wrote: If you're comfortable with Go at this point and we have mechanisms in place to ensure Go only gets built on platforms that support Go, then I think we should go forward with replacing GCJ with Go. We have the mechanism for disabling Go on systems where it will not work. However, the list of such systems is undoubtedly incomplete at this time. In the top level configure.ac: # Disable the go frontend on systems where it is known to not work. Please keep # this in sync with contrib/config-list.mk. case ${target} in *-*-darwin* | *-*-cygwin* | *-*-mingw* | *-*-aix*) unsupported_languages=$unsupported_languages go ;; esac # Disable libgo for some systems where it is known to not work. # For testing, you can easily override this with --enable-libgo. if test x$enable_libgo = x; then case ${target} in *-*-darwin*) # PR 46986 noconfigdirs=$noconfigdirs target-libgo ;; *-*-cygwin* | *-*-mingw*) noconfigdirs=$noconfigdirs target-libgo ;; *-*-aix*) noconfigdirs=$noconfigdirs target-libgo ;; esac fi
Re: [PATCH 06/15] rs6000: Use rldiwi in constant construction
On 08/12/2015 07:02 AM, Segher Boessenkool wrote: On Tue, Aug 11, 2015 at 06:11:35PM -0700, Richard Henderson wrote:

@@ -8173,6 +8173,13 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
   if (exam_mask (-1, c, sub_budget))
     return true;
 
+  /* If the two halves are equal, use an insert.  */
+  if (c >> 32 == test && exam_sub (test, sub_budget))
+    {
+      opN (VEC_DUPLICATE, 0xu);	/* RLDIMI */
+      return true;
+    }

Does this work for c with the high bit set? I think you need to cast it to unsigned HOST_WIDE_INT first? Indeed, a sign-extension works better. It means the base constant will use LIS+ORIS without trying to create an unsigned version. If you're talking about ubsan sort of restrictions on shifting signed constants... I choose to totally ignore that. Certainly nowhere else in gcc has been audited for that, beginning with hwint.h itself. r~
Re: [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide
On 08/12/2015 06:23 AM, Segher Boessenkool wrote: On Tue, Aug 11, 2015 at 06:11:30PM -0700, Richard Henderson wrote: This allows testing for a mask without having to call GEN_INT. Cc: David Edelsohn <dje@gmail.com> --- * config/rs6000/rs6000.c (rs6000_is_valid_mask_wide): Split out from... (rs6000_is_valid_mask): ... here. (rs6000_is_valid_and_mask_wide): Split out from... (rs6000_is_valid_and_mask): ... here. I don't like these _wide names much. It follows the existing practice within the backend. You could overload the shorter name, if you really think creating some garbage const_int's is too much overhead (it might well be if you use it a lot more in later patches). At one stage in the development (before I became much leaner with the search for rotate), it really really mattered.

-bool
-rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
+static bool
+rs6000_is_valid_mask_wide (unsigned HOST_WIDE_INT val, int *b, int *e, int n)

But why change the mode parameter? The code was clearer before. So that we don't have to look up GET_MODE_BITSIZE (mode).

+static bool
+rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val, machine_mode mode)
 {
   int nb, ne;
 
-  if (!rs6000_is_valid_mask (mask, &nb, &ne, mode))
-    return false;
+  switch (mode)
+    {
+    case DImode:
+      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 64))
+	return false;
+      /* For DImode, we need a rldicl, rldicr, or a rlwinm with
+	 mask that does not wrap.  */
+      return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
 
-  /* For DImode, we need a rldicl, rldicr, or a rlwinm with mask that
-     does not wrap.  */
-  if (mode == DImode)
-    return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
+    case SImode:
+      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 32))
+	return false;
+      /* For SImode, rlwinm can do everything.  */
+      return (nb < 32 && ne < 32);
 
-  /* For SImode, rlwinm can do everything.  */
-  if (mode == SImode)
-    return (nb < 32 && ne < 32);
+    default:
+      return false;
+    }
+}
 
-  return false;

You don't need any of these changes then, either. True, not *needed* per se, but if you look closer I'm combining conditionals. I think the replacement here is clearer.

   /* Otherwise, fill in the lowest hole; if we can do the result with
      one insn, we can do the whole thing with two.  */
-  unsigned HOST_WIDE_INT val = INTVAL (c);
+  unsigned HOST_WIDE_INT val = UINTVAL (c);

Does it matter? No. r~
Re: [PATCH] xtensa: add -mauto-litpools option
On Tue, Aug 11, 2015 at 6:09 PM, Max Filippov jcmvb...@gmail.com wrote: With support from assembler this option allows compiling huge functions, where single literal pool at the beginning of a function may not be reachable by L32R instructions at its end. Currently assembler --auto-litpools option cannot deal with literals used from multiple locations separated by more than 256 KBytes of code. Don't turn constants into literals, instead use MOVI instruction to load them into registers and let the assembler turn them into literals as necessary. 2015-08-11 Max Filippov jcmvb...@gmail.com gcc/ * config/xtensa/constraints.md (define_constraint Y): New constraint. * config/xtensa/elf.h (ASM_SPEC): Add m(no-)auto-litpools. * config/xtensa/linux.h (ASM_SPEC): Likewise. * config/xtensa/predicates.md (move_operand): Match constants and symbols in the presence of TARGET_AUTO_LITPOOLS. * config/xtensa/xtensa.c (xtensa_valid_move): Don't allow immediate references to TLS data. (xtensa_emit_move_sequence): Don't force constants to memory in the presence of TARGET_AUTO_LITPOOLS. (print_operand): Add 'y' format, same as default, but capable of printing SF mode constants as well. * config/xtensa/xtensa.md (movsi_internal, movhi_internal) (movsf_internal): Add movi pattern that loads literal. (movsf, movdf): Don't force constants to memory in the presence of TARGET_AUTO_LITPOOLS. (movdf_internal): Add 'Y' constraint. * config/xtensa/xtensa.opt (mauto-litpools): New option. * doc/invoke.text (Xtensa options): Document -mauto-litpools. If this is OK with the linux people, it is OK with me. As I recall, they used to have a need to keep literals in page-level groups, but my memory is hazy.
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On August 12, 2015 7:02:04 PM GMT+02:00, Jeff Law l...@redhat.com wrote: On 08/12/2015 07:33 AM, Mikael Morin wrote: Le 12/08/2015 13:09, Richard Biener a écrit : On Wed, Aug 12, 2015 at 1:07 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote: On 2015.08.12 at 13:01 +0200, Richard Biener wrote: What kind of error does ubsan run into? That is, for which 'prec'? See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042 Ugh. Stupid C. My fear is that with so many stmts sext_hwi isn't any longer considered for inlining, even if prec is constant. What's the generated code for its out-of-line copy with/without the patch? The difference on x86_64 is:

	.cfi_startproc
	cmpl	$64, %esi
	je	.L3
-	movl	$64, %ecx
-	subl	%esi, %ecx
-	salq	%cl, %rdi
+	leal	-1(%rsi), %ecx
+	movl	$1, %eax
+	movq	%rax, %rdx
+	salq	%cl, %rdx
+	movl	%esi, %ecx
+	salq	%cl, %rax
+	subq	$1, %rax
+	andq	%rax, %rdi
+	xorq	%rdx, %rdi
	movq	%rdi, %rax
-	sarq	%cl, %rax
+	subq	%rdx, %rax
	ret
	.p2align 4,,10
	.p2align 3

with -O2, tests attached. Yes it's worse. I thought it was better to avoid undefined behaviour at all prices to avoid bad surprises. Well, this isn't the right test. You're testing when PREC is an unknown and I fully expect in that case the code you're going to get will be worse. There's too many insns for combine to save us in that case. What's interesting here is what happens when prec is a known value (since we're hoping that's the common case via inlining). When prec has a known value of 8, 16 or 32 we get a nice movs[bwl]q on x86_64, which is exactly what we want. When prec is another known value, say 13 for the sake of argument, we get a pair of shifts, which is again exactly what we want. So when prec is a constant, we're going to get good code. So the only question left is whether or not prec is usually a constant or not in the contexts in which this routine is used. Prec is almost never a constant and is heavily used from wide-int. We are not exploiting this undefinedness in C so I object to making this so much slower. 
Can we instead do what we do for abs_hwi and add a checking assert so we can move the tests to the callers that misbehave instead? Thanks, Richard. Jeff
Re: [gomp4] Work around expand_GOACC_DIM_SIZE/expand_GOACC_DIM_POS ICEs
On 08/12/15 08:46, Nathan Sidwell wrote: On 08/12/15 06:21, Thomas Schwinge wrote: Hi! OpenACC offloading code for Intel MIC doesn't make a lot of sense (currently), but it is what's being done, and has caused a lot of regressions in my testing, so I committed the following workaround to gomp-4_0-branch in r226804. A different approach would have been to invent some machinery to not compile OpenACC offloading code on the Intel MIC offloading path, but as the latter eventually is to support OpenACC offloading, too (see The right solution is probably for the default validated_dims to set the size to unity, just like the host. Thereby forcing any backend that wants to provide a larger size to override that hook. I.e. remove the #ifndef ACCEL_COMPILER from default_validate_dims in omp-low. For avoidance of doubt, I'll add that to the patch I have in progress. nathan -- Nathan Sidwell - Director, Sourcery Services - Mentor Embedded
Re: [gomp4] Work around expand_GOACC_DIM_SIZE/expand_GOACC_DIM_POS ICEs
On 08/12/15 06:21, Thomas Schwinge wrote: Hi! OpenACC offloading code for Intel MIC doesn't make a lot of sense (currently), but it is what's being done, and has caused a lot of regressions in my testing, so I committed the following workaround to gomp-4_0-branch in r226804. A different approach would have been to invent some machinery to not compile OpenACC offloading code on the Intel MIC offloading path, but as the latter eventually is to support OpenACC offloading, too (see The right solution is probably for the default validated_dims to set the size to unity, just like the host. Thereby forcing any backend that wants to provide a larger size to override that hook. I.e. remove the #ifndef ACCEL_COMPILER from default_validate_dims in omp-low nathan -- Nathan Sidwell
Re: [Aarch64][3/3] Add tests for atomic compare-and-swap instructions.
On Wed, Aug 12, 2015 at 11:16:41AM +0100, Matthew Wahab wrote: ARMv8.1 adds instructions for atomic compare-and-swap with optional memory ordering specifiers. This patch adds tests for the compare-and-swap instructions as two files. The first is support code to run the test with a range of types and memory models, the second is the test for the CAS instruction. Tested the series for aarch64-none-linux-gnu with native bootstrap and make check and for aarch64-none-elf with cross-compiled check-gcc. Also tested aarch64-none-elf with cross-compiled check-gcc on an emulator that supports ARMv8.1. Ok for trunk? Matthew OK, but please write a follow-up to fix the pre-existing tests. Thanks, James gcc/testsuite 2015-08-12 Matthew Wahab matthew.wa...@arm.com * gcc.target/aarch64/atomic-inst-cas.c: New. * gcc.target/aarch64/atomic-inst-ops.inc: New.
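For readers who have not seen the test files, the operation being exercised is GCC's `__atomic_compare_exchange_n` builtin with explicit memory-order arguments. A host-side sketch of its contract (the wrapper and demo names are made up; the builtin and its semantics are real):

```c
#include <stdbool.h>

/* On success the object takes the desired value; on failure `expected'
   is updated to the value actually observed.  The two memory-model
   arguments give the ordering on success and on failure respectively.  */
static bool
cas_int (int *obj, int *expected, int desired)
{
  return __atomic_compare_exchange_n (obj, expected, desired,
				      /*weak=*/false,
				      __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
}

/* Exercise both the success and failure paths.  */
static bool
cas_demo (void)
{
  int v = 0, expected = 0;
  if (!cas_int (&v, &expected, 42))   /* succeeds: v was 0 */
    return false;
  if (v != 42)
    return false;
  expected = 0;                       /* stale expectation */
  if (cas_int (&v, &expected, 7))     /* fails: v is 42, not 0 */
    return false;
  return expected == 42 && v == 42;   /* expected now holds observed value */
}
```

On ARMv8.1 targets the new tests check that this maps onto the CAS-family instructions with the matching acquire/release qualifiers instead of a load-exclusive/store-exclusive loop.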
Re: [PATCH] PR target/67127: [ARM] Avoiding odd-number ldrd/strd in movdi introduced a regression on armeb-linux-gnueabihf
On 11 August 2015 at 12:28, Ramana Radhakrishnan ramana.radhakrish...@foss.arm.com wrote: Yes, in big-endian DI mode values are stored into VFP registers, and here register 16 is the first of them, s0. Just in case you want to do more tests, the issue can be seen with a one-line testcase: __attribute__((__vector_size__(2 * sizeof(int)))) int fn1() {} Yep, we may well have DImode values in the VFP register bank. I observe that FIRST_VIRTUAL_REGISTER is 104, whereas LAST_ARM_REG is 15. So it might be that the pattern should check against the latter instead of the former - as arm_hard_regno_mode_ok does... Yes, when checking that the operand register number is lower than or equal to LAST_ARM_REGNUM the infinite loop is avoided. I haven't run a full validation so far, but it has the same effect as checking that the changing mode is authorized. If you think that this check makes more sense, I can rerun a full validation. It sounds to me like checking against LAST_ARM_REGNUM is the better approach. The additional test in the movdi expander was to prevent illegitimate ldrd / strd instructions from being generated. Thus, OK if there are no regressions - please give it some time to bake on trunk before backporting to FSF 5 and / or do some additional validation (including the regression testsuite) before doing the backport. No regressions on trunk for both armeb-linux-gnueabihf and arm-linux-gnueabihf, thus I've committed it as r226811. I'll validate the backport to FSF 5 before committing it. Thanks, Yvan regards Ramana
Re: [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide
On Tue, Aug 11, 2015 at 06:11:30PM -0700, Richard Henderson wrote: This allows testing for a mask without having to call GEN_INT. Cc: David Edelsohn <dje@gmail.com> --- * config/rs6000/rs6000.c (rs6000_is_valid_mask_wide): Split out from... (rs6000_is_valid_mask): ... here. (rs6000_is_valid_and_mask_wide): Split out from... (rs6000_is_valid_and_mask): ... here. I don't like these _wide names much. You could overload the shorter name, if you really think creating some garbage const_int's is too much overhead (it might well be if you use it a lot more in later patches). The original functions really want rtx's since they are used like predicates (so should look and behave like one); rs6000_is_valid_mask itself is different (and a lousy name; suggestions welcome).

-bool
-rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
+static bool
+rs6000_is_valid_mask_wide (unsigned HOST_WIDE_INT val, int *b, int *e, int n)

But why change the mode parameter? The code was clearer before.

 {
-  unsigned HOST_WIDE_INT val = INTVAL (mask);
   unsigned HOST_WIDE_INT bit;
   int nb, ne;
-  int n = GET_MODE_PRECISION (mode);
 
-  if (mode != DImode && mode != SImode)
-    return false;
-
-  if (INTVAL (mask) >= 0)
+  if ((HOST_WIDE_INT)val >= 0)
       ^ missing space
     {
       bit = val & -val;
       ne = exact_log2 (bit);
@@ -16430,27 +16427,54 @@ rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
   return true;
 }
 
+bool
+rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
+{
+  int n;
+
+  if (mode == DImode)
+    n = 64;
+  else if (mode == SImode)
+    n = 32;
+  else
+    return false;
+
+  unsigned HOST_WIDE_INT val = INTVAL (mask);
+  return rs6000_is_valid_mask_wide (val, b, e, n);
+}
+
 /* Return whether MASK (a CONST_INT) is a valid mask for any rlwinm,
    rldicl, or rldicr instruction, to implement an AND with it in mode
    MODE.  */
-bool
-rs6000_is_valid_and_mask (rtx mask, machine_mode mode)
+static bool
+rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val, machine_mode mode)
 {
   int nb, ne;
 
-  if (!rs6000_is_valid_mask (mask, &nb, &ne, mode))
-    return false;
+  switch (mode)
+    {
+    case DImode:
+      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 64))
+	return false;
+      /* For DImode, we need a rldicl, rldicr, or a rlwinm with
+	 mask that does not wrap.  */
+      return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
 
-  /* For DImode, we need a rldicl, rldicr, or a rlwinm with mask that
-     does not wrap.  */
-  if (mode == DImode)
-    return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
+    case SImode:
+      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 32))
+	return false;
+      /* For SImode, rlwinm can do everything.  */
+      return (nb < 32 && ne < 32);
 
-  /* For SImode, rlwinm can do everything.  */
-  if (mode == SImode)
-    return (nb < 32 && ne < 32);
+    default:
+      return false;
+    }
+}
 
-  return false;

You don't need any of these changes then, either.

+bool
+rs6000_is_valid_and_mask (rtx mask, machine_mode mode)
+{
+  return rs6000_is_valid_and_mask_wide (UINTVAL (mask), mode);
 }
 
 /* Return the instruction template for an AND with mask in mode MODE, with
@@ -16739,12 +16763,12 @@ rs6000_is_valid_2insn_and (rtx c, machine_mode mode)
   /* Otherwise, fill in the lowest hole; if we can do the result with
      one insn, we can do the whole thing with two.  */
-  unsigned HOST_WIDE_INT val = INTVAL (c);
+  unsigned HOST_WIDE_INT val = UINTVAL (c);

Does it matter? Segher
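The core property these predicates test, that a value is a single contiguous run of set bits, usable as a rotate-and-mask operand, can be sketched without any of the rtx machinery. The function name and the 64-bit-only, non-wrapping simplification are mine; the `val & -val` lowest-set-bit idiom is the same one the quoted code uses:

```c
#include <stdbool.h>
#include <stdint.h>

/* Is VAL one contiguous run of ones (not wrapping around bit 63)?
   Adding the run's lowest bit carries all the way through the run,
   leaving a single bit above it (or zero if the run reaches bit 63);
   ANDing that back with VAL is then zero exactly for contiguous runs.  */
static bool
contiguous_mask_p (uint64_t val)
{
  if (val == 0)
    return false;
  uint64_t low = val & -val;      /* isolate lowest set bit */
  return ((val + low) & val) == 0;
}
```

The real rs6000 code additionally computes the begin/end bit positions (`b`, `e`) and accepts masks that wrap around, which is why its conditions are more involved.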
[gomp4] dimension API
I've committed this patch to gomp4. It reworks the oacc function attribute dimension handling. Rather than pass the TREE_LIST to the various hooks, I convert it to a regular C array of ints. That makes life simpler for the consumers. They return a 'changed' flag to indicate whether the attribute should be rewritten. That rewriting is done in a way that doesn't presume the attribute is unshared (Cesar, your workaround should no longer be necessary). Also put in the change I mentioned earlier this morning about the default validate dims hook setting the dimensions to 1 on accelerators too. I'll revert Thomas's patch shortly. nathan 2015-08-12 Nathan Sidwell nat...@codesourcery.com * target.def (validate_dims): Adjust API. * targhooks.h (default_goacc_validate_dims): Adjust. * omp-low.c (replace_oacc_fn_attrib): New function. (set_oacc_fn_attrib): Use it. (oacc_xform_dim): Dims is array of ints. (execute_oacc_transform): Create int array of dims, adjust uses. (default_goacc_validate_dims): Adjust API. Force to 1 everywhere. * doc/tm.texi: Rebuild. * config/nvptx/nvptx.c (nvptx_validate_dims): Adjust API.

Index: gcc/config/nvptx/nvptx.c
===
--- gcc/config/nvptx/nvptx.c	(revision 226808)
+++ gcc/config/nvptx/nvptx.c	(working copy)
@@ -3538,71 +3538,46 @@ nvptx_expand_builtin (tree exp, rtx targ
   return d->expander (exp, target, mode, ignore);
 }
 
-/* Validate compute dimensions, fill in defaults.  */
-
-static tree
-nvptx_validate_dims (tree decl, tree dims)
-{
-  tree adims[GOMP_DIM_MAX];
-  unsigned ix;
-  tree *pos_ptr;
-
-  for (ix = 0, pos_ptr = &dims; ix != GOMP_DIM_MAX;
-       ix++, pos_ptr = &TREE_CHAIN (*pos_ptr))
-    {
-      if (!*pos_ptr)
-	*pos_ptr = tree_cons (NULL_TREE, NULL_TREE, NULL_TREE);
-
-      adims[ix] = TREE_VALUE (*pos_ptr);
-    }
-
-  /* Define vector size for known hardware.  */
+/* Define vector size for known hardware.  */
 #define PTX_VECTOR_LENGTH 32
 #define PTX_WORKER_LENGTH 32
 
+/* Validate compute dimensions, fill in non-unity defaults.  */
+
+static bool
+nvptx_validate_dims (tree decl, int dims[])
+{
+  bool changed = false;
+
   /* If the worker size is not 1, the vector size must be 32.  If
      the vector size is not 1, it must be 32.  */
-  if ((adims[GOMP_DIM_WORKER]
-       && TREE_INT_CST_LOW (adims[GOMP_DIM_WORKER]) != 1)
-      || (adims[GOMP_DIM_VECTOR]
-	  && TREE_INT_CST_LOW (adims[GOMP_DIM_VECTOR]) != 1))
+  if ((dims[GOMP_DIM_WORKER] > 1 || dims[GOMP_DIM_WORKER] == 0)
+      || (dims[GOMP_DIM_VECTOR] > 1 || dims[GOMP_DIM_VECTOR] == 0))
     {
-      if (!adims[GOMP_DIM_VECTOR]
-	  || TREE_INT_CST_LOW (adims[GOMP_DIM_VECTOR]) != PTX_VECTOR_LENGTH)
+      if (dims[GOMP_DIM_VECTOR] != PTX_VECTOR_LENGTH)
 	{
-	  tree use = build_int_cst (integer_type_node, PTX_VECTOR_LENGTH);
-	  if (adims[GOMP_DIM_VECTOR])
+	  if (dims[GOMP_DIM_VECTOR] >= 0)
 	    warning_at (DECL_SOURCE_LOCATION (decl), 0,
-			TREE_INT_CST_LOW (adims[GOMP_DIM_VECTOR])
-			? "using vector_length (%E), ignoring %E"
-			: "using vector_length (%E), ignoring runtime setting",
-			use, adims[GOMP_DIM_VECTOR]);
-	  adims[GOMP_DIM_VECTOR] = use;
+			dims[GOMP_DIM_VECTOR]
+			? "using vector_length (%d), ignoring %d"
+			: "using vector_length (%d), ignoring runtime setting",
+			PTX_VECTOR_LENGTH, dims[GOMP_DIM_VECTOR]);
+	  dims[GOMP_DIM_VECTOR] = PTX_VECTOR_LENGTH;
+	  changed = true;
 	}
     }
 
   /* Check the num workers is not too large.  */
-  if (adims[GOMP_DIM_WORKER]
-      && TREE_INT_CST_LOW (adims[GOMP_DIM_WORKER]) > PTX_WORKER_LENGTH)
+  if (dims[GOMP_DIM_WORKER] > PTX_WORKER_LENGTH)
     {
-      tree use = build_int_cst (integer_type_node, PTX_WORKER_LENGTH);
       warning_at (DECL_SOURCE_LOCATION (decl), 0,
-		  "using num_workers (%E), ignoring %E",
-		  use, adims[GOMP_DIM_WORKER]);
-      adims[GOMP_DIM_WORKER] = use;
+		  "using num_workers (%d), ignoring %d",
+		  PTX_WORKER_LENGTH, dims[GOMP_DIM_WORKER]);
+      dims[GOMP_DIM_WORKER] = PTX_WORKER_LENGTH;
+      changed = true;
     }
 
-  /* Set defaults.  */
-  for (ix = 0; ix != GOMP_DIM_MAX; ix++)
-    if (!adims[ix])
-      adims[ix] = integer_one_node;
-
-  /* Write results.  */
-  tree pos;
-  for (ix = 0, pos = dims; ix != GOMP_DIM_MAX; ix++, pos = TREE_CHAIN (pos))
-    TREE_VALUE (pos) = adims[ix];
-
-  return dims;
+  return changed;
 }
 
 /* Return maximum dimension size, or zero for unbounded.  */

Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi	(revision 226808)
+++ gcc/doc/tm.texi	(working copy)
@@ -5740,11 +5740,12 @@
 usable.  In that case, the smaller the n
 to use it.
 @end deftypefn
 
-@deftypefn {Target Hook} tree TARGET_GOACC_VALIDATE_DIMS (tree, @var{tree})
+@deftypefn {Target Hook} bool TARGET_GOACC_VALIDATE_DIMS (tree, int @var{[]})
 This hook should check the launch dimensions provided.  It should fill
-in default values and verify non-defaults.  The
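The reworked hook contract is easy to mirror in a standalone sketch. Everything here is illustrative (the enum, constants, and function names stand in for the real GOMP_DIM_* values and the nvptx hook); what it demonstrates is the new API shape: dims[] is a plain int array, invalid entries are fixed up in place, and the return value tells the caller whether the attribute needs rewriting:

```c
#include <stdbool.h>

enum { DIM_GANG, DIM_WORKER, DIM_VECTOR, DIM_MAX };
#define VECTOR_LENGTH 32   /* stand-in for PTX_VECTOR_LENGTH */
#define WORKER_LENGTH 32   /* stand-in for PTX_WORKER_LENGTH */

/* Fix up DIMS in place; return whether anything changed.  Negative
   entries would mean a runtime setting in the real hook; this sketch
   only handles plain positive values.  */
static bool
validate_dims (int dims[])
{
  bool changed = false;

  /* If worker or vector parallelism is in use, the vector size must
     be exactly the hardware warp width.  */
  if ((dims[DIM_WORKER] != 1 || dims[DIM_VECTOR] != 1)
      && dims[DIM_VECTOR] != VECTOR_LENGTH)
    {
      dims[DIM_VECTOR] = VECTOR_LENGTH;
      changed = true;
    }

  /* Clamp an oversized worker dimension.  */
  if (dims[DIM_WORKER] > WORKER_LENGTH)
    {
      dims[DIM_WORKER] = WORKER_LENGTH;
      changed = true;
    }
  return changed;
}

/* Exercise both the fix-up and the already-valid paths.  */
static bool
dims_demo (void)
{
  int bad[DIM_MAX] = { 1, 64, 16 };
  if (!validate_dims (bad) || bad[DIM_WORKER] != 32 || bad[DIM_VECTOR] != 32)
    return false;
  int ok[DIM_MAX] = { 1, 1, 1 };
  return !validate_dims (ok);   /* nothing to change */
}
```

The 'changed' flag is what lets the caller rewrite the function attribute only when necessary, without assuming the attribute list is unshared.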
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
Le 12/08/2015 13:09, Richard Biener a écrit : On Wed, Aug 12, 2015 at 1:07 PM, Markus Trippelsdorf mar...@trippelsdorf.de wrote: On 2015.08.12 at 13:01 +0200, Richard Biener wrote: What kind of error does ubsan run into? That is, for which 'prec'? See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67042 Ugh. Stupid C. My fear is that with so many stmts sext_hwi isn't any longer considered for inlining, even if prec is constant. What's the generated code for its out-of-line copy with/without the patch? The difference on x86_64 is:

	.cfi_startproc
	cmpl	$64, %esi
	je	.L3
-	movl	$64, %ecx
-	subl	%esi, %ecx
-	salq	%cl, %rdi
+	leal	-1(%rsi), %ecx
+	movl	$1, %eax
+	movq	%rax, %rdx
+	salq	%cl, %rdx
+	movl	%esi, %ecx
+	salq	%cl, %rax
+	subq	$1, %rax
+	andq	%rax, %rdi
+	xorq	%rdx, %rdi
	movq	%rdi, %rax
-	sarq	%cl, %rax
+	subq	%rdx, %rax
	ret
	.p2align 4,,10
	.p2align 3

with -O2, tests attached. Yes it's worse. I thought it was better to avoid undefined behaviour at all prices to avoid bad surprises. Mikael

typedef signed long long HOST_WIDE_INT;
#define HOST_BITS_PER_WIDE_INT (8 * sizeof (HOST_WIDE_INT))
#define HOST_WIDE_INT_C(X) X ## LL
#define HOST_WIDE_INT_UC(X) HOST_WIDE_INT_C (X ## U)
#define HOST_WIDE_INT_1 HOST_WIDE_INT_C (1)
#define HOST_WIDE_INT_1U HOST_WIDE_INT_UC (1)
#define HOST_WIDE_INT_M1 HOST_WIDE_INT_C (-1)
#define HOST_WIDE_INT_M1U HOST_WIDE_INT_UC (-1)
#define gcc_checking_assert(test)

HOST_WIDE_INT
sext_hwi (HOST_WIDE_INT src, unsigned int prec)
{
  if (prec == HOST_BITS_PER_WIDE_INT)
    return src;
  else
    {
      gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT);
      int shift = HOST_BITS_PER_WIDE_INT - prec;
      return (src << shift) >> shift;
    }
}

typedef signed long long HOST_WIDE_INT;
#define HOST_BITS_PER_WIDE_INT (8 * sizeof (HOST_WIDE_INT))
#define HOST_WIDE_INT_C(X) X ## LL
#define HOST_WIDE_INT_UC(X) HOST_WIDE_INT_C (X ## U)
#define HOST_WIDE_INT_1 HOST_WIDE_INT_C (1)
#define HOST_WIDE_INT_1U HOST_WIDE_INT_UC (1)
#define HOST_WIDE_INT_M1 HOST_WIDE_INT_C (-1)
#define HOST_WIDE_INT_M1U HOST_WIDE_INT_UC (-1)
#define gcc_checking_assert(test)

HOST_WIDE_INT
sext_hwi (HOST_WIDE_INT src, unsigned int prec)
{
  if (prec == HOST_BITS_PER_WIDE_INT)
    return src;
  else
    {
      gcc_checking_assert (prec < HOST_BITS_PER_WIDE_INT);
      HOST_WIDE_INT sign_mask = HOST_WIDE_INT_1 << (prec - 1);
      HOST_WIDE_INT value_mask = (HOST_WIDE_INT_1U << prec) - HOST_WIDE_INT_1U;
      return (((src & value_mask) ^ sign_mask) - sign_mask);
    }
}
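The shift-free variant from the second test file can be exercised standalone. The version below is a minimal sketch using fixed-width stdint types instead of HOST_WIDE_INT; the mask/xor/subtract identity is the same one under discussion (mask SRC to PREC bits, then flip and subtract the sign bit), and every operation in it is well defined:

```c
#include <stdint.h>

/* Sign-extend the low PREC bits of SRC, avoiding left shifts of
   negative values entirely: the shifts are done in uint64_t, and the
   final subtraction wraps (defined for unsigned) before converting
   back to signed.  Assumes 1 <= prec <= 64.  */
static int64_t
sext (int64_t src, unsigned int prec)
{
  if (prec == 64)
    return src;
  uint64_t sign_mask = UINT64_C (1) << (prec - 1);
  uint64_t value_mask = (UINT64_C (1) << prec) - 1;
  return (int64_t) ((((uint64_t) src & value_mask) ^ sign_mask) - sign_mask);
}
```

This is the semantic baseline the `(src << shift) >> shift` version is being compared against in the thread; the generated-code question is whether the extra masking survives optimization as well as the two shifts do.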
Re: [PATCH 06/15] rs6000: Use rldiwi in constant construction
On Tue, Aug 11, 2015 at 06:11:35PM -0700, Richard Henderson wrote:

@@ -8173,6 +8173,13 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
   if (exam_mask (-1, c, sub_budget))
     return true;
 
+  /* If the two halves are equal, use an insert.  */
+  if (c >> 32 == test && exam_sub (test, sub_budget))
+    {
+      opN (VEC_DUPLICATE, 0xu);	/* RLDIMI */
+      return true;
+    }

Does this work for c with the high bit set? I think you need to cast it to unsigned HOST_WIDE_INT first? Segher
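Segher's objection is easy to demonstrate in isolation. In this sketch (names are mine; `test` stands for the low 32 bits, as in the quoted code), the signed right shift sign-extends the high half, so a constant like 0x80000001_80000001 with two equal halves fails the naive comparison, while the unsigned-shift version gets it right:

```c
#include <stdbool.h>
#include <stdint.h>

/* Arithmetic shift of a negative value smears the sign bit over the
   high half (implementation-defined in C, arithmetic on GCC), so this
   rejects equal halves whenever bit 63 is set.  */
static bool
halves_equal_signed_shift (int64_t c)
{
  int64_t test = c & 0xffffffff;
  return c >> 32 == test;
}

/* Casting to unsigned first gives a logical shift; correct for all C.  */
static bool
halves_equal_unsigned_shift (int64_t c)
{
  uint64_t test = (uint64_t) c & 0xffffffff;
  return (uint64_t) c >> 32 == test;
}
```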
[PATCH, PR67092, PR67098 ] Remove --with-host-libstdcxx
Hi, this patch removes configure option --with-host-libstdcxx. [ As suggested here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67092#c13 ): ... I think we can no longer reliably support host libstdc++ as includes are not picked up from its location and GCC is C++ now. I suggest to remove that entirely. ... ] The option was originally introduced to support linking with a static version of ppl, but we no longer support using ppl. The behaviour of --with-host-libstdcxx is implemented in terms of other configure options, so we're not losing any functionality. Furthermore, the patch adds the missing documentation of the default behaviour of --with-stage1-ldflags. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom Remove --with-host-libstdcxx 2015-08-12 Tom de Vries t...@codesourcery.com PR other/67092 PR other/67098 * doc/install.texi: Remove --with_host_libstdcxx item. Update --with-stage1-libs, --with-boot-ldflags and --with-boot-libs items accordingly. Mention default for --with-stage1-ldflags. * configure.ac: Remove --with_host_libstdcxx support. * configure: Regenerate. --- configure| 24 +++- configure.ac | 19 +++ gcc/doc/install.texi | 23 +++ 3 files changed, 13 insertions(+), 53 deletions(-) diff --git a/configure b/configure index 79257fd..eca5e6f 100755 --- a/configure +++ b/configure @@ -772,7 +772,6 @@ with_gmp_dir with_gmp with_gmp_include with_gmp_lib -with_host_libstdcxx with_stage1_libs with_stage1_ldflags with_boot_libs @@ -1540,8 +1539,6 @@ Optional Packages: --with-gmp-lib=PATH/lib --with-gmp-include=PATH specify directory for installed GMP include files --with-gmp-lib=PATH specify directory for the installed GMP library - --with-host-libstdcxx=L use linker arguments L to link with libstdc++ when - linking with PPL --with-stage1-libs=LIBS libraries for stage1 --with-stage1-ldflags=FLAGS linker flags for stage1 @@ -5849,20 +5846,6 @@ fi -# Allow host libstdc++ to be specified for static linking with PPL. 
- -# Check whether --with-host-libstdcxx was given. -if test ${with_host_libstdcxx+set} = set; then : - withval=$with_host_libstdcxx; -fi - - -case $with_host_libstdcxx in - no|yes) -as_fn_error -with-host-libstdcxx needs an argument $LINENO 5 -;; -esac - # Libraries to use for stage1 or when not bootstrapping. # Check whether --with-stage1-libs was given. @@ -5873,7 +5856,7 @@ if test ${with_stage1_libs+set} = set; then : stage1_libs=$withval fi else - stage1_libs=$with_host_libstdcxx + stage1_libs= fi @@ -5899,8 +5882,7 @@ fi -# Libraries to use for stage2 and later builds. This defaults to the -# argument passed to --with-host-libstdcxx. +# Libraries to use for stage2 and later builds. # Check whether --with-boot-libs was given. if test ${with_boot_libs+set} = set; then : @@ -5910,7 +5892,7 @@ if test ${with_boot_libs+set} = set; then : poststage1_libs=$withval fi else - poststage1_libs=$with_host_libstdcxx + poststage1_libs= fi diff --git a/configure.ac b/configure.ac index 452fc05..9241261 100644 --- a/configure.ac +++ b/configure.ac @@ -1697,18 +1697,6 @@ AC_SUBST(extra_mpc_gmp_configure_flags) AC_SUBST(extra_mpc_mpfr_configure_flags) AC_SUBST(extra_isl_gmp_configure_flags) -# Allow host libstdc++ to be specified for static linking with PPL. -AC_ARG_WITH(host-libstdcxx, -[AS_HELP_STRING([--with-host-libstdcxx=L], - [use linker arguments L to link with libstdc++ - when linking with PPL])]) - -case $with_host_libstdcxx in - no|yes) -AC_MSG_ERROR([-with-host-libstdcxx needs an argument]) -;; -esac - # Libraries to use for stage1 or when not bootstrapping. AC_ARG_WITH(stage1-libs, [AS_HELP_STRING([--with-stage1-libs=LIBS], [libraries for stage1])], @@ -1717,7 +1705,7 @@ AC_ARG_WITH(stage1-libs, else stage1_libs=$withval fi], -[stage1_libs=$with_host_libstdcxx]) +[stage1_libs=]) AC_SUBST(stage1_libs) # Linker flags to use for stage1 or when not bootstrapping. 
@@ -1737,8 +1725,7 @@ AC_ARG_WITH(stage1-ldflags, fi]) AC_SUBST(stage1_ldflags) -# Libraries to use for stage2 and later builds. This defaults to the -# argument passed to --with-host-libstdcxx. +# Libraries to use for stage2 and later builds. AC_ARG_WITH(boot-libs, [AS_HELP_STRING([--with-boot-libs=LIBS], [libraries for stage2 and later])], [if test $withval = no -o $withval = yes; then @@ -1746,7 +1733,7 @@ AC_ARG_WITH(boot-libs, else poststage1_libs=$withval fi], -[poststage1_libs=$with_host_libstdcxx]) +[poststage1_libs=]) AC_SUBST(poststage1_libs) # Linker flags to use for stage2 and later builds. diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 0e3093f..360b066 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1853,36 +1853,27 @@ include and lib options
Re: [Aarch64][target/PR 67143] Use correct constraints on operands for atomic operations.
On 11/08/15 15:07, James Greenhalgh wrote: On Tue, Aug 11, 2015 at 02:05:37PM +0100, Matthew Wahab wrote: This patch reworks the atomic operation patterns to select the appropriate constraint for the operation. The logical operations take the constraints specified by the current lconst_atomic mode iterator while the arithmetic operations (plus, sub) take constraint I. I'm worried this still gives us a mismatch between constraints and predicates. The relevant predicates here are:

(define_predicate "aarch64_plus_operand"
  (ior (match_operand 0 "register_operand")
       (match_operand 0 "aarch64_plus_immediate")))

(define_predicate "aarch64_plus_immediate"
  (and (match_code "const_int")
       (ior (match_test "aarch64_uimm12_shift (INTVAL (op))")
	    (match_test "aarch64_uimm12_shift (-INTVAL (op))"))))

But our constraint only permits:

(define_constraint "I"
 "A constant that can be used with an ADD operation."
 (and (match_code "const_int")
      (match_test "aarch64_uimm12_shift (ival)")))

Does this mean we are now loading constants we don't need to in to registers? I don't think we could cause this to ICE - but are we generating less good code than we would like? Updated the patch to make the constraints for the arithmetic operations IJ, which preserves the existing behaviour. Also added two cases to the gcc.target/aarch64/atomic-op-imm.c test to check the behaviour with large negative numbers. Tested for aarch64-none-eabi with cross-compiled check-gcc and for aarch64-none-linux-gnu with native bootstrap and make check. Ok? Matthew

gcc/
2015-08-12  Matthew Wahab  matthew.wa...@arm.com

	PR target/67143
	* config/aarch64/atomic.md (atomic_<optab><mode>): Replace
	'lconst_atomic' with 'const_atomic'.
	(atomic_fetch_<optab><mode>): Likewise.
	(atomic_<optab>_fetch<mode>): Likewise.
	* config/aarch64/iterators.md (lconst_atomic): Move below
	'const_atomic'.
	(const_atomic): New.
gcc/testsuite/
2015-08-12  Matthew Wahab  matthew.wa...@arm.com
	    Matthias Klose  d...@debian.org

	PR target/67143
	* gcc.target/aarch64/atomic-op-imm.c
	(atomic_fetch_add_negative_RELAXED): New.
	(atomic_fetch_sub_negative_ACQUIRE): New.
	* gcc.target/aarch64/pr67143.c: New

diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 1a38ac0..6e6be99 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -119,7 +119,7 @@
   [(set (match_operand:ALLI 0 "aarch64_sync_memory_operand" "+Q")
     (unspec_volatile:ALLI
       [(atomic_op:ALLI (match_dup 0)
-	 (match_operand:ALLI 1 "atomic_op_operand" "r<lconst_atomic>"))
+	 (match_operand:ALLI 1 "atomic_op_operand" "r<const_atomic>"))
        (match_operand:SI 2 "const_int_operand")]	;; model
       UNSPECV_ATOMIC_OP))
    (clobber (reg:CC CC_REGNUM))
@@ -164,7 +164,7 @@
    (set (match_dup 1)
     (unspec_volatile:ALLI
       [(atomic_op:ALLI (match_dup 1)
-	 (match_operand:ALLI 2 "atomic_op_operand" "r<lconst_atomic>"))
+	 (match_operand:ALLI 2 "atomic_op_operand" "r<const_atomic>"))
        (match_operand:SI 3 "const_int_operand")]	;; model
       UNSPECV_ATOMIC_OP))
    (clobber (reg:CC CC_REGNUM))
@@ -209,7 +209,7 @@
   [(set (match_operand:ALLI 0 "register_operand" "=r")
     (atomic_op:ALLI
      (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")
-     (match_operand:ALLI 2 "atomic_op_operand" "r<lconst_atomic>")))
+     (match_operand:ALLI 2 "atomic_op_operand" "r<const_atomic>")))
    (set (match_dup 1)
     (unspec_volatile:ALLI
       [(match_dup 1) (match_dup 2)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 5d7966d..b8a45d1 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -345,9 +345,6 @@
 ;; Attribute to describe constants acceptable in logical operations
 (define_mode_attr lconst [(SI "K") (DI "L")])
 
-;; Attribute to describe constants acceptable in atomic logical operations
-(define_mode_attr lconst_atomic [(QI "K") (HI "K") (SI "K") (DI "L")])
-
 ;; Map a mode to a specific constraint character.
 (define_mode_attr cmode [(QI "q") (HI "h") (SI "s") (DI "d")])
@@ -843,6 +840,16 @@
   (plus "aarch64_plus_operand")
   (minus "aarch64_plus_operand")])
 
+;; Constants acceptable for atomic operations.
+;; This definition must appear in this file before the iterators it refers to.
+(define_code_attr const_atomic
+ [(plus "IJ") (minus "IJ")
+  (xor "<lconst_atomic>") (ior "<lconst_atomic>")
+  (and "<lconst_atomic>")])
+
+;; Attribute to describe constants acceptable in atomic logical operations
+(define_mode_attr lconst_atomic [(QI "K") (HI "K") (SI "K") (DI "L")])
+
 ;; ---
 ;; Int Iterators.
 ;; ---
diff --git a/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c b/gcc/testsuite/gcc.target/aarch64/atomic-op-imm.c
index 6c6f7e1..47d7a96 100644
---
Re: [PATCH 04/15] rs6000: Implement set_const_data infrastructure
Hi Richard, You wanted us to read this file... On Tue, Aug 11, 2015 at 06:11:33PM -0700, Richard Henderson wrote: + -- The fallback generation for the most complex word_mode constants. + The receipe built will be the full MAX_COST insns, as we will ^-- typo. Segher
Re: [PATCH 2/2] replace several uses of the anon namespace with GCC_FINAL
Jeff Law l...@redhat.com writes: On 08/10/2015 06:05 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, In many places gcc puts classes in the anon namespace so the compiler can tell they do not get inheritted from to enable better devirtualization. However debugging code in the anon namespace can be a pain, and the same thing can be accomplished more directly by marking the classes as final. When bootstrapping stage 3 should always be built in C++14 mode now, and of course will always be newer than gcc 4.7, so these classes will always be marked as final there. AIUI cross compilers are supposed to be built with recent gcc, which I would tend to think implies newer than 4.7, so they should also be built with these classes marked as final. I believe that means in all important cases this works just as well as the anon namespace. bootstrapped + regtested on x86_64-linux-gnu, ok? Trev gcc/ChangeLog: 2015-08-10 Trevor Saunders tbsau...@tbsaunde.org * compare-elim.c, dce.c, dse.c, gimple-ssa-isolate-paths.c, gimple-ssa-strength-reduction.c, graphite.c, init-regs.c, ipa-pure-const.c, ipa-visibility.c, ipa.c, mode-switching.c, omp-low.c, reorg.c, sanopt.c, trans-mem.c, tree-eh.c, tree-if-conv.c, tree-ssa-copyrename.c, tree-ssa-dce.c, tree-ssa-dom.c, tree-ssa-dse.c, tree-ssa-forwprop.c, tree-ssa-sink.c, tree-ssanames.c, tree-stdarg.c, tree-tailcall.c, tree-vect-generic.c, tree.c, ubsan.c, var-tracking.c, vtable-verify.c, web.c: Use GCC_FINAL instead of the anonymous namespace. OK. I was hoping someone else was going to speak up since I seem to have been posting a few negative messages recently, but I think this is really a step in the wrong direction. I think the code was using anonymous namespaces in exactly the way they were intended to be used. We don't want to forbid the use of static and instead force every file-local function and variable into the global namespace. Why should classes be any different? 
Anonymous classes are a commonly-used feature, so if there's a specific problem with debugging them, I think it would be better to fix gdb (or whatever) rather than work around the problem like this. Thanks, Richard
Re: [PATCH 2/2] replace several uses of the anon namespace with GCC_FINAL
On 08/12/2015 12:57 PM, Richard Sandiford wrote: Jeff Law l...@redhat.com writes: On 08/10/2015 06:05 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders tbsaunde+...@tbsaunde.org Hi, In many places gcc puts classes in the anon namespace so the compiler can tell they do not get inheritted from to enable better devirtualization. However debugging code in the anon namespace can be a pain, and the same thing can be accomplished more directly by marking the classes as final. When bootstrapping stage 3 should always be built in C++14 mode now, and of course will always be newer than gcc 4.7, so these classes will always be marked as final there. AIUI cross compilers are supposed to be built with recent gcc, which I would tend to think implies newer than 4.7, so they should also be built with these classes marked as final. I believe that means in all important cases this works just as well as the anon namespace. bootstrapped + regtested on x86_64-linux-gnu, ok? Trev gcc/ChangeLog: 2015-08-10 Trevor Saunders tbsau...@tbsaunde.org * compare-elim.c, dce.c, dse.c, gimple-ssa-isolate-paths.c, gimple-ssa-strength-reduction.c, graphite.c, init-regs.c, ipa-pure-const.c, ipa-visibility.c, ipa.c, mode-switching.c, omp-low.c, reorg.c, sanopt.c, trans-mem.c, tree-eh.c, tree-if-conv.c, tree-ssa-copyrename.c, tree-ssa-dce.c, tree-ssa-dom.c, tree-ssa-dse.c, tree-ssa-forwprop.c, tree-ssa-sink.c, tree-ssanames.c, tree-stdarg.c, tree-tailcall.c, tree-vect-generic.c, tree.c, ubsan.c, var-tracking.c, vtable-verify.c, web.c: Use GCC_FINAL instead of the anonymous namespace. OK. I was hoping someone else was going to speak up since I seem to have been posting a few negative messages recently, but I think this is really a step in the wrong direction. I think the code was using anonymous namespaces in exactly the way they were intended to be used. No need to worry about seeming to be negative. The problem is you can't get to stuff in the anonymous namespace easily in the debugger. 
There was talk of fixing that, but I don't think it ever went forward on the gdb side. If gdb were to get fixed so that debugging this stuff was easier, then I'd fully support putting things back into the anonymous namespace. jeff
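The devirtualization property both sides of this thread care about can be sketched in a few lines (class and function names invented for illustration; GCC_FINAL expands to `final` on compilers that support it):

```cpp
// Base class with a virtual interface, as GCC's pass classes have.
struct pass_base
{
  virtual ~pass_base () {}
  virtual unsigned execute () = 0;
};

// Marking the derived class 'final' tells the compiler no further override
// of execute() can exist -- the same fact an anonymous namespace lets it
// deduce from whole-translation-unit visibility.
struct my_pass final : pass_base
{
  unsigned execute () override { return 42; }
};

static unsigned run_pass (my_pass &p)
{
  // Devirtualizable: since my_pass is final, this can become a direct call
  // to my_pass::execute instead of an indirect call through the vtable.
  return p.execute ();
}
```

The trade-off discussed above is that `final` keeps the class name in the global namespace, so it stays easy to reference from a debugger.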
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On Aug 12, 2015, at 11:07 AM, Jeff Law l...@redhat.com wrote: On 08/12/2015 11:12 AM, Richard Biener wrote: Prec is almost never a constant and is heavily used from wide-int. We are not exploiting this undefinedness in C so I object to making this so much slower. Can we instead do what we do for abs_hwi and add a checking assert so we can move the tests to the callers that misbehave instead? Given that ISO C++ is moving away from making shifting 1 into the sign bit undefined behaviour, maybe we should make UBSan less strict in its warning. That may eliminate the need for Mikael's patch. Pedantically, then, we have to move to the later language standard (if this was standardese). If it was DRese, then we’re safe. :-)
Re: Fix reload1.c warning for some targets
On 08/05/2015 08:18 AM, Richard Sandiford wrote: Building some targets results in a warning about orig_dup[i] potentially being used uninitialised. I think the warning is fair, since it isn't obvious that the recog_data-based loop bound remains unchanged between:

for (i = 0; i < recog_data.n_dups; i++)
  orig_dup[i] = *recog_data.dup_loc[i];

and:

for (i = 0; i < recog_data.n_dups; i++)
  *recog_data.dup_loc[i] = orig_dup[i];

Tested on x86_64-linux-gnu. OK to install? Thanks, Richard

gcc/
	* reload1.c (elimination_costs_in_insn): Make it obvious to the
	compiler that the n_dups and n_operands loop bounds are invariant.

So thinking more about this, I think the best way forward is to: 1. Create a new BZ with the false positive extracted from c#4. 2. Install your patch and close 55035. I'll take care of #1, you can handle #2. jeff
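A reduced sketch of the warning pattern (invented names, not the actual reload1.c code): the bound of both loops lives in a global structure, so unless the compiler can prove nothing in between changes it, the second loop might read entries the first loop never wrote.

```cpp
#include <cassert>

struct recog_like { int n_dups; int dup_loc[8]; };
static recog_like rd = { 3, { 10, 20, 30 } };

// Stands in for the code between the two loops; as far as the optimizer
// knows, it could modify rd.n_dups.
static void unrelated_work () {}

static int copy_and_restore ()
{
  int orig_dup[8];
  for (int i = 0; i < rd.n_dups; i++)   // save: fills orig_dup[0..n_dups-1]
    orig_dup[i] = rd.dup_loc[i];
  unrelated_work ();
  // If rd.n_dups "grew" here, the next loop would read uninitialized
  // orig_dup[] entries -- hence the (false positive) warning.  The patch's
  // fix is to cache the bound in a local so the invariance is obvious:
  // int n = rd.n_dups; then use n in both loops.
  int sum = 0;
  for (int i = 0; i < rd.n_dups; i++)   // restore/use
    sum += orig_dup[i];
  return sum;
}
```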
C++ PATCH for c++/67104 (wrong handling of array and constexpr)
cxx_eval_array_reference was assuming that the CONSTRUCTOR for an array has one entry per array element, in order. But cxx_eval_store_expression doesn't try to create such a CONSTRUCTOR, and other places use RANGE_EXPRs, so we need to be prepared to handle these cases. Tested x86_64-pc-linux-gnu, applying to trunk and 5.

commit 1aa0b7f99a8bbea646c9bf4600f32200b7400ace
Author: Jason Merrill ja...@redhat.com
Date: Wed Aug 12 17:56:21 2015 +0100

	PR c++/67104
	* constexpr.c (cxx_eval_array_reference): Handle sparse
	CONSTRUCTORs.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 218faec..b6788c7 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1697,7 +1697,38 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
       VERIFY_CONSTANT (ary);
       gcc_unreachable ();
     }
-  if (compare_tree_int (index, len) >= 0)
+
+  i = tree_to_shwi (index);
+  bool found = true;
+  if (TREE_CODE (ary) == CONSTRUCTOR && len
+      && (TREE_CODE (CONSTRUCTOR_ELT (ary, len-1)->index) == RANGE_EXPR
+	  || compare_tree_int (CONSTRUCTOR_ELT (ary, len-1)->index, len-1)))
+    {
+      /* The last element doesn't match its position in the array; this must be
+	 a sparse array from cxx_eval_store_expression.  So iterate.  */
+      found = false;
+      vec<constructor_elt, va_gc> *v = CONSTRUCTOR_ELTS (ary);
+      constructor_elt *e;
+      for (unsigned ix = 0; vec_safe_iterate (v, ix, &e); ++ix)
+	{
+	  if (TREE_CODE (e->index) == RANGE_EXPR)
+	    {
+	      tree lo = TREE_OPERAND (e->index, 0);
+	      tree hi = TREE_OPERAND (e->index, 1);
+	      if (tree_int_cst_le (lo, index) && tree_int_cst_le (index, hi))
+		found = true;
+	    }
+	  else if (tree_int_cst_equal (e->index, index))
+	    found = true;
+	  if (found)
+	    {
+	      i = ix;
+	      break;
+	    }
+	}
+    }
+
+  if (i >= len || !found)
     {
       if (tree_int_cst_lt (index, array_type_nelts_top (TREE_TYPE (ary))))
 	{
@@ -1714,14 +1745,14 @@ cxx_eval_array_reference (const constexpr_ctx *ctx, tree t,
       *non_constant_p = true;
       return t;
     }
-  else if (tree_int_cst_lt (index, integer_zero_node))
+  else if (i < 0)
     {
       if (!ctx->quiet)
	error ("negative array subscript");
       *non_constant_p = true;
       return t;
     }
-  i = tree_to_shwi (index);
+
   if (TREE_CODE (ary) == CONSTRUCTOR)
     return (*CONSTRUCTOR_ELTS (ary))[i].value;
   else if (elem_nchars == 1)
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-array1.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-array1.C
new file mode 100644
index 000..efe4617
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-array1.C
@@ -0,0 +1,20 @@
+// PR c++/67104
+// { dg-do compile { target c++14 } }
+
+template <typename T, int N> struct array
+{
+  constexpr T &operator[](int index) { return data[index]; }
+  constexpr T operator[](int index) const { return data[index]; }
+  T data[N];
+};
+
+constexpr array<long unsigned, 1001>
+make_bottle_count ()
+{
+  array<long unsigned, 1001> a{};
+  a[65] = 1;
+  return a;
+}
+
+constexpr auto bottle_count = make_bottle_count ();
+static_assert (bottle_count[65], "");
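The lookup the patch adds can be sketched in miniature (invented types, not GCC's tree representation): sparse array CONSTRUCTORs key each entry either by a single index or by an inclusive [lo, hi] range, so finding element i means scanning entries rather than indexing directly.

```cpp
#include <vector>

// lo == hi models a single-index entry; a wider span models a RANGE_EXPR.
struct elt { long lo, hi, value; };

// Returns true and sets *out if some entry covers 'index'; a miss lets the
// caller fall back (e.g. to the zero-initialized value, as constexpr
// evaluation does).
static bool find_elt (const std::vector<elt> &v, long index, long *out)
{
  for (const elt &e : v)
    if (e.lo <= index && index <= e.hi)
      {
        *out = e.value;
        return true;
      }
  return false;
}
```

In the testcase above, a{} with a[65] = 1 behaves like entries {[0,64] -> 0, 65 -> 1, [66,1000] -> 0}, and element 65 is found by the scan.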
[gomp4] declare directive
Hi, The attached patch is a revision of the functionality required to handle the declare directive. The motivation for the change was that the original code was deemed too fragile and difficult to maintain. The new code is now smaller, in terms of line count, and hopefully easier to understand. Given the significant amount of changes, I've included a commentary on entire functions, rather than comments about the specific changes. Hopefully, this will help in the review process.

gimple.c : gimplify_oacc_declare () The first part of this function deals with handling the entry clauses. After scanning the clauses, we spin through the list, adding the appropriate ones to the context variable list. Also, if any 'oacc declare' attributes are around, toss them. The next section deals with the return clauses. Again scan them, but after scanning them toss the context that was created as a result of the scan. We don't need that context, as the context that was created with the entry clauses will suffice. After creation of the gimple statement, save it away so that it can be used at exit time.

gimple.c : gimplify_body () This is the place where the 'exit' clauses are used and inserted via a gimple statement. This is also the place where the omp context is 'unwound'.

fortran/trans-decl.c: finish_oacc_declare () This function is called to handle the declare directives. The information that was contained in the directives during parsing of the variable section is hanging off the namespace (ns). The primary function here is to create the two sets of derived clauses and associate them with a newly created statement, i.e. gfc_code. This new statement is then inserted at the beginning of the statement chain associated with 'ns'.

c/c-parser.c c_parser_oacc_declare () The initial section of the function is doing syntax checking. If no errors are found and the variable is 'global', then an attribute is added to the variable. This is followed by the creation of the derived clauses.
In the final section, when a global is at hand, a constructor is created. The constructor will pass on information to the runtime. On the other hand if the variable is local, then a statement will be inserted and created that will contain both sets of derived clauses. cp/parser.c: cp_parser_oacc_declare () The initial section of the function is doing syntax checking. If no errors are found and the variable is 'global', then an attribute is added to the variable. This is followed by the creation of the derived clauses. In the final section, when a global is at hand, a constructor is created. The constructor will pass on information to the runtime. On the other hand if the variable is local, then a statement will be inserted and created that will contain both sets of derived clauses. I'll commit this to gomp-4_0-branch sometime during the week of 17 August. Thanks! Jim diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index b452235..3f3e8c0 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1452,319 +1452,6 @@ c_parser_external_declaration (c_parser *parser) } } -static tree -check_oacc_vars_1 (tree *tp, int *, void *l) -{ - if (TREE_CODE (*tp) == VAR_DECL TREE_PUBLIC (*tp)) -{ - location_t loc = DECL_SOURCE_LOCATION (*tp); - tree attrs; - attrs = lookup_attribute (oacc declare, DECL_ATTRIBUTES (*tp)); - if (attrs) - { - tree t; - - for (t = TREE_VALUE (attrs); t; t = TREE_CHAIN (t)) - { - loc = EXPR_LOCATION ((tree) l); - - if (OMP_CLAUSE_MAP_KIND (TREE_VALUE (t)) == GOMP_MAP_LINK) - { - error_at (loc, %link% clause cannot be used with %qE, - *tp); - break; - } - } - } - else - error_at (loc, no %#pragma acc declare% for %qE, *tp); -} - return NULL_TREE; -} - -static tree -check_oacc_vars (tree *tp, int *, void *) -{ - if (TREE_CODE (*tp) == STATEMENT_LIST) -{ - tree_stmt_iterator i; - - for (i = tsi_start (*tp); !tsi_end_p (i); tsi_next (i)) - { - tree t = tsi_stmt (i); - walk_tree_without_duplicates (t, check_oacc_vars_1, t); - } -} - - return NULL_TREE; -} - 
-static struct oacc_return -{ - tree_stmt_iterator iter; - tree stmt; - int op; - struct oacc_return *next; -} *oacc_returns; - -static tree -find_oacc_return (tree *tp, int *, void *) -{ - if (TREE_CODE (*tp) == STATEMENT_LIST) -{ - tree_stmt_iterator i; - - for (i = tsi_start (*tp); !tsi_end_p (i); tsi_next (i)) - { - tree t; - struct oacc_return *r; - - t = tsi_stmt (i); - - if (TREE_CODE (t) == RETURN_EXPR) - { - r = XNEW (struct
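For context, a minimal user-side sketch of the directive whose compiler handling is described above (variable names invented; the OpenACC pragma is simply ignored when not compiling with -fopenacc, so this also builds as ordinary C/C++):

```cpp
float coeff[8];                     // file-scope ("global") variable
#pragma acc declare create(coeff)   // device copy for the program's lifetime;
                                    // the C/C++ front ends emit a constructor
                                    // that informs the runtime

float scale (float x)
{
  float factor = 2.0f;
  // A declare inside a function would instead attach the local variable to
  // the device data environment on scope entry (the "entry" clauses) and
  // remove it on scope exit (the derived "exit" clauses).
  return x * factor * coeff[0];
}
```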
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On August 12, 2015 8:07:13 PM GMT+02:00, Jeff Law l...@redhat.com wrote: On 08/12/2015 11:12 AM, Richard Biener wrote: Prec is almost never a constant and is heavily used from wide-int. We are not exploiting this undefinedness in C so I object to making this so much slower. Can we instead do what we do for abs_hwi and add a checking assert so we can move the tests to the callers that misbehave instead? Given that ISO C++ is moving away from making shifting 1 into the sign bit undefined behaviour, maybe we should make UBSan less strict in its warning. That may eliminate the need for Mikael's patch. We can also use a logical left shift followed by an arithmetic right shift. Or is the latter invoking undefined behaviour as well in some cases we hit? Richard. jeff
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On 08/12/2015 12:32 PM, Richard Biener wrote: On August 12, 2015 8:07:13 PM GMT+02:00, Jeff Law l...@redhat.com wrote: On 08/12/2015 11:12 AM, Richard Biener wrote: Prec is almost never a constant and is heavily used from wide-int. We are not exploiting this undefinedness in C so I object to making this so much slower. Can we instead do what we do for abs_hwi and add a checking assert so we can move the tests to the callers that misbehave instead? Given that ISO C++ is moving away from making shifting 1 into the sign bit undefined behaviour, maybe we should make UBSan less strict in its warning. That may eliminate the need for Mikael's patch. We can also use a logical left shift followed by an arithmetic right shift. Or is the latter invoking undefined behaviour as well in some cases we hit? Hmm, why aren't we using logicals for the left shift to begin with? That's the problem area. I don't think the right shifts are an issue at all. It's strange that when I was researching this, consistently I saw folks suggesting the LEFT-SHIFT followed by RIGHT-SHIFT, then it'd get shot down as UB, then folks went to either Mikael's approach or another that is very similar. Nobody suggested twiddling the types to get a left-logical-shift, then doing a right-arithmetic-shift. Jeff
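The type-twiddling variant being discussed can be sketched as follows (a sketch assuming 64-bit types, not a proposed patch): do the left shift in the unsigned type, then shift right in the signed type.

```cpp
#include <cstdint>

static int64_t sext_ushift (int64_t src, unsigned int prec)
{
  if (prec == 64)
    return src;
  int shift = 64 - prec;
  // The unsigned left shift is always well defined.  Converting the
  // out-of-range result back to int64_t is implementation-defined (not
  // undefined), and the arithmetic right shift of a negative value is
  // likewise implementation-defined -- which is exactly the question
  // Richi raises about the right shift.
  return (int64_t) ((uint64_t) src << shift) >> shift;
}
```

On the usual two's-complement targets with arithmetic right shift this matches the original shift-pair semantics, and ubsan does not flag it.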
Go patch committed: Don't make GC symbol for invalid array type
This patch by Chris Manghane changes the Go frontend to not make a GC symbol for an invalid array type. This avoids a compiler crash and fixes https://golang.org/issue/11539 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian

Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 226795)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-55175f7ee0db2c1e68423216d7744be80071ed6c
+5fc38e74d132cd6f4e7b56e6bcf9fe57031ab203
 The first line of this file holds the git revision number of the last merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc (revision 226510)
+++ gcc/go/gofrontend/types.cc (working copy)
@@ -6423,7 +6423,10 @@ Array_type::array_gc_symbol(Gogo* gogo,
   unsigned long bound;
   if (!this->length_->numeric_constant_value(&nc)
       || nc.to_unsigned_long(&bound) == Numeric_constant::NC_UL_NOTINT)
-    go_assert(saw_errors());
+    {
+      go_assert(saw_errors());
+      return;
+    }
   Btype* pbtype = gogo->backend()->pointer_type(gogo->backend()->void_type());
   int64_t pwidth = gogo->backend()->type_size(pbtype);
C++ PATCH for c++/67108 (ICE with constexpr and tree dumps)
We can't do tsubsting once cgraph starts throwing away front end information; use the at_eof flag to communicate that we've reached that point. Tested x86_64-pc-linux-gnu, applying to trunk.

commit 61a9d0354261705979003d15ebd7c97605d6dc2e
Author: Jason Merrill ja...@redhat.com
Date: Wed Aug 12 15:54:06 2015 +0100

	PR c++/67108
	* decl2.c (c_parse_final_cleanups): Set at_eof to 2 at end.
	* error.c (dump_template_bindings): Don't tsubst in that case.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 78fd4af..ab6b3ec 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -4797,7 +4797,8 @@ extern GTY(()) vec<tree, va_gc> *local_classes;
 #endif /* !defined(NO_DOLLAR_IN_LABEL) || !defined(NO_DOT_IN_LABEL) */
 
-/* Nonzero if we're done parsing and into end-of-file activities.  */
+/* Nonzero if we're done parsing and into end-of-file activities.
+   Two if we're done with front-end processing.  */
 
 extern int at_eof;
 
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 068d79c..8e7a453 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -4846,6 +4846,9 @@ c_parse_final_cleanups (void)
 
   timevar_stop (TV_PHASE_DEFERRED);
   timevar_start (TV_PHASE_PARSING);
+
+  /* Indicate that we're done with front end processing.  */
+  at_eof = 2;
 }
 
 /* Perform any post compilation-proper cleanups for the C++ front-end.
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index ae3e092..faf8744 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -339,6 +339,11 @@ dump_template_bindings (cxx_pretty_printer *pp, tree parms, tree args,
       && !DECL_LANG_SPECIFIC (current_function_decl))
     return;
 
+  /* Don't try to do this once cgraph starts throwing away front-end
+     information.  */
+  if (at_eof >= 2)
+    return;
+
   FOR_EACH_VEC_SAFE_ELT (typenames, i, t)
     {
       if (need_semicolon)
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-targ3.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-targ3.C
new file mode 100644
index 000..d1e4482
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-targ3.C
@@ -0,0 +1,40 @@
+// PR c++/67108
+// { dg-do compile { target c++11 } }
+
+template <typename, typename> struct is_same;
+template <typename T> struct is_same<T, T>
+{
+  enum
+  {
+    value = true
+  }
+  ;
+  constexpr bool operator () ()
+  {
+    return value;
+  }
+}
+ ;
+template <bool, typename = void> struct enable_if;
+template <typename T> struct enable_if<true, T>
+{
+  typedef T type;
+}
+ ;
+struct A;
+template <typename, typename = void> struct F;
+template <typename X> struct F<X, typename enable_if<is_same<X, A>
+{
+}
+()>::type>
+{
+  template <typename MakeDependent> F (MakeDependent)
+  {
+  }
+}
+;
+
+int main ()
+{
+  F<A> (1);
+}
Re: [C/C++ PATCH] Fix -Wshift-overflow with sign bit
On 08/12/2015 11:28 AM, Toon Moene wrote: On 08/12/2015 05:39 PM, Marek Polacek wrote: This patch fixes a defect in -Wshift-overflow. We should only warn about left-shifting 1 into the sign bit when -Wshift-overflow=2. But this doesn't apply only for 1 << 31, but also for 2 << 30, etc. In C++14, never warn about this. And then there's this: https://gcc.gnu.org/ml/gcc-testresults/2015-08/msg01036.html [ Yes, that's at run time, not compile time ... ] Hoping some of those are fixed by the sext_hwi changes from Mikael. jeff
Re: [C/C++ PATCH] Fix -Wshift-overflow with sign bit
On Wed, Aug 12, 2015 at 11:33:21AM -0600, Jeff Law wrote: On 08/12/2015 11:28 AM, Toon Moene wrote: https://gcc.gnu.org/ml/gcc-testresults/2015-08/msg01036.html [ Yes, that's at run time, not compile time ... ] Hoping some of those are fixed by the sext_hwi changes from Mikael. I hope too ;). But they have nothing to do with the -Wshift-overflow IMHO. Marek
Re: [C/C++ PATCH] Fix -Wshift-overflow with sign bit
On 08/12/2015 11:40 AM, Marek Polacek wrote: On Wed, Aug 12, 2015 at 11:33:21AM -0600, Jeff Law wrote: On 08/12/2015 11:28 AM, Toon Moene wrote: https://gcc.gnu.org/ml/gcc-testresults/2015-08/msg01036.html [ Yes, that's at run time, not compile time ... ] Hoping some of those are fixed by the sext_hwi changes from Mikael. I hope too ;). But they have nothing to do with the -Wshift-overflow IMHO. Right. Toon was just pointing out that we're getting the left-shift errors from ubsan all over the place. Those are totally independent if your -Wshift-overflow stuff. Jeff
Re: [PATCH] xtensa: add -mauto-litpools option
On Wed, Aug 12, 2015 at 7:35 PM, augustine.sterl...@gmail.com augustine.sterl...@gmail.com wrote: On Tue, Aug 11, 2015 at 6:09 PM, Max Filippov jcmvb...@gmail.com wrote: With support from assembler this option allows compiling huge functions, where single literal pool at the beginning of a function may not be reachable by L32R instructions at its end. Currently assembler --auto-litpools option cannot deal with literals used from multiple locations separated by more than 256 KBytes of code. Don't turn constants into literals, instead use MOVI instruction to load them into registers and let the assembler turn them into literals as necessary. If this is OK with the linux people, it is OK with me. As I recall, they used to have a need to keep literals in page-level groups, but my memory is hazy. Text-section-literals remain available, and without auto-litpools option code generation doesn't change at all. Even with auto-litpools option literals will tend to pool at the beginning of functions, so code generation shouldn't change much. I've applied the patch to trunk. -- Thanks. -- Max
Re: [Patch] sext_hwi: Avoid left shift of negative value undefined behaviour
On 08/12/2015 11:12 AM, Richard Biener wrote:
> Prec is almost never a constant and is heavily used from wide-int. We
> are not exploiting this undefinedness in C, so I object to making this
> so much slower. Can we instead do what we do for abs_hwi and add a
> checking assert, so we can move the tests to the callers that misbehave
> instead?
Given that ISO C++ is moving away from making shifting 1 into the sign bit undefined behaviour, maybe we should make UBSan less strict in its warning.  That may eliminate the need for Mikael's patch.

jeff
C++ PATCH for c++/67161 (ICE with template-id)
When we're dealing with a template-id, all the arguments are non-default, so don't try to GET_NON_DEFAULT_TEMPLATE_ARGS_COUNT. Tested x86_64-pc-linux-gnu, applying to trunk and 5.

commit 15818891060dc29f108ebe5929b1b85250f143a7
Author: Jason Merrill ja...@redhat.com
Date:   Wed Aug 12 15:28:20 2015 +0100

	PR c++/67161
	* error.c (dump_decl) [TEMPLATE_ID_EXPR]: Pass
	TFF_NO_OMIT_DEFAULT_TEMPLATE_ARGUMENTS.

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 4f85751..ae3e092 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1212,7 +1212,8 @@ dump_decl (cxx_pretty_printer *pp, tree t, int flags)
 	if (args == error_mark_node)
 	  pp_string (pp, M_("<template arguments error>"));
 	else if (args)
-	  dump_template_argument_list (pp, args, flags);
+	  dump_template_argument_list
+	    (pp, args, flags|TFF_NO_OMIT_DEFAULT_TEMPLATE_ARGUMENTS);
 	pp_cxx_end_template_argument_list (pp);
       }
       break;
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ44.C b/gcc/testsuite/g++.dg/cpp1y/var-templ44.C
new file mode 100644
index 000..2fc21a5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ44.C
@@ -0,0 +1,29 @@
+// PR c++/67161
+// { dg-do compile { target c++14 } }
+
+template <typename _Tp> struct integral_constant {
+  static constexpr _Tp value = 0;
+};
+template <bool, typename, typename> struct conditional;
+template <typename...> struct __or_;
+template <typename _B1, typename _B2>
+struct __or_<_B1, _B2> : conditional<1, _B1, _B2>::type {};
+template <typename...> struct __and_;
+template <typename> struct __not_ : integral_constant<bool> {};
+template <typename> struct __is_void_helper : integral_constant<bool> {};
+template <typename> struct is_void : __is_void_helper<int> {};
+template <bool, typename _Iftrue, typename> struct conditional {
+  typedef _Iftrue type;
+};
+template <bool _Cond, typename _Iftrue, typename _Iffalse>
+using conditional_t = typename conditional<_Cond, _Iftrue, _Iffalse>::type;
+template <typename...>
+using common_type_t = int;
+template <typename, int> struct array {};
+template <typename _Tp> constexpr int is_void_v = is_void<_Tp>::value;
+template <typename _Dest = void, typename... _Types>
+constexpr auto make_array()
+    -> array<conditional_t<is_void_v<_Dest>, common_type_t<>, _Dest>,
+	     sizeof...(_Types)> {
+  static_assert(__or_<__not_<is_void<_Dest>>, __and_<>>::value, ""); // { dg-error "static assert" }
+}
+auto d = make_array();