Re: [PATCH] libgomp: Add comment to clarify last_team usage
On Fri, Jul 03, 2015 at 03:09:27PM +0200, Sebastian Huber wrote:

libgomp/ChangeLog
2015-07-03 Sebastian Huber sebastian.hu...@embedded-brains.de

* libgomp.h (gomp_thread_pool): Comment last_team field.
---
 libgomp/libgomp.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 5272f01..5ed0f78 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -458,6 +458,9 @@ struct gomp_thread_pool
   struct gomp_thread **threads;
   unsigned threads_size;
   unsigned threads_used;
+  /* The last team is used for non-nested teams to delay their destruction to
+     make sure all the threads in the team move on to the pool's barrier before
+     the team's barrier is destroyed.  */
   struct gomp_team *last_team;
   /* Number of threads running in this contention group.  */
   unsigned long threads_busy;
--
1.8.4.5

Ok for trunk.

Jakub
Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator
On 07/03/2015 03:07 PM, Richard Sandiford wrote:
Martin Jambor mjam...@suse.cz writes:
On Fri, Jul 03, 2015 at 09:55:58AM +0100, Richard Sandiford wrote:
Trevor Saunders tbsau...@tbsaunde.org writes:
On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
Martin Liška mli...@suse.cz writes:

diff --git a/gcc/asan.c b/gcc/asan.c
index e89817e..dabd6f1 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -362,20 +362,20 @@ struct asan_mem_ref
   /* Pool allocation new operator.  */
   inline void *operator new (size_t)
   {
-    return pool.allocate ();
+    return ::new (pool.allocate ()) asan_mem_ref ();
   }

   /* Delete operator utilizing pool allocation.  */
   inline void operator delete (void *ptr)
   {
-    pool.remove ((asan_mem_ref *) ptr);
+    pool.remove (ptr);
   }

   /* Memory allocation pool.  */
-  static pool_allocator<asan_mem_ref> pool;
+  static pool_allocator pool;
 };

I'm probably going over old ground/wounds, sorry, but what's the benefit of having this sort of pattern? Why not simply have object_allocators and make callers use pool.allocate () and pool.remove (x) (with pool.remove calling the destructor) instead of new and delete? It feels wrong to me to tie the data type to a particular allocation object like this.

Well the big question is what does allocate() do about construction? It seems weird for it to not call the ctor, but I'm not sure we can do a good job of forwarding args to allocate() with C++98.

If you need non-default constructors then:

  new (pool) type (aaa, bbb)...;

doesn't seem too bad. I agree object_allocator's allocate () should call the constructor.

But then the pool allocator must not call placement new on the allocated memory itself because that would result in double construction.

But we're talking about two different methods. The normal allocator object_allocator<T>::allocate () would use placement new and return a pointer to the new object while operator new (size_t, object_allocator<T> &) wouldn't call placement new and would just return a pointer to the memory.
And using the pool allocator functions directly has the nice property that you can tell when a delete/remove isn't necessary because the pool itself is being cleared. Well, all these cases involve a pool with static storage lifetime right? so actually if you don't delete things in these pool they are effectively leaked. They might have a static storage lifetime now, but it doesn't seem like a good idea to hard-bake that into the interface Does that mean that operators new and delete are considered evil? Not IMO. Just that static load-time-initialized caches are not necessarily a good thing. That's effectively what the pool allocator is. (by saying that for these types you should use new and delete, but for other pool-allocated types you should use object_allocators). Depending on what kind of pool allocator you use, you will be forced to either call placement new or not, so the inconsistency will be there anyway. But how we handle argument-taking constructors is a problem that needs to be solved for the pool-allocated objects that don't use a single static type-specific pool. And once we solve that, we get consistency across all pools: - if you want a new object and argumentless construction is OK, use pool.allocate () - if you want a new object and need to pass arguments to the constructor, use new (pool) some_type (arg1, arg2, ...) Maybe I just have bad memories from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot of state that was obviously static in the old days, but that needed to become non-static to support vaguely-efficient switching between different subtargets. The same kind of thing is likely to happen again. I assume things like the jit would prefer not to have new global state with load-time construction. I'm not sure I follow this branch of the discussion, the allocators of any kind surely can dynamically allocated themselves? 
Sure, but either (a) you keep the pools as a static part of the class and have some initialisation and finalisation code that has tendrils into all such classes or (b) you move the static pool outside of the class to some new (still global) state. Explicit pool allocation, like in the C days, gives you the option of putting the pool wherever it needs to go without relying on the principle that you can get to it from global state.

Thanks, Richard

Ok Richard. I've finally understood your suggestions and I would suggest the following:

+ I will add a new method to object_allocator<T> that will return allocated memory (void *) without calling any constructor.
+ object_allocator<T>::allocate will call placement new for a parameterless ctor.
+ I will remove all overloaded operators new/delete on e.g. et_forest, ...
+ For these classes, I will add void *operator new (size_t, object_allocator<T> &).
+ Pool
Re: C++ PATCH for c++/66748 (ICE with abi_tag on enum)
OK, thanks. Jason
Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)
On Fri, Jul 03, 2015 at 03:41:29PM +0200, Richard Biener wrote: The fallout (at least on x86_64) is surprisingly small, i.e. none, just gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due to a bug in the vectorizer. Jakub has a patch and knows the details. As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants (that was the motivation of this pass). Here is the fix for that. The problem is that for simd clone calls, if they have void return type, STMT_VINFO_VECTYPE is NULL. If vectorize_simd_clone_call succeeds, that is fine, but if it doesn't, we can fall into all the other vectorizable_* functions, and some of them compute some variables IMHO prematurely. It doesn't make sense to compute nunits/ncopies etc. if stmt isn't even an assignment etc. So, this patch adjusts the few routines that had this problem, so that we check is_gimple_assign and gimple_assign_rhs_code or whatever is the quick GIMPLE test those functions use to find if stmt is of interest to them, and only when it is, compute whatever they need later. As NULL STMT_VINFO_VECTYPE can happen only for calls, all these functions don't ICE anymore. Ok for trunk if it passes bootstrap/regtest? In the pr59984.c testcase, with Marek's patch and this patch, one loop in test is already vectorized (the ICE was on the other one), I'll work on recognizing multiples of GOMP_SIMD_LANE () as linear next, so that we vectorize also the loop with bar. Without Marek's patch we weren't vectorizing any of the two loops. 2015-07-03 Jakub Jelinek ja...@redhat.com PR tree-optimization/66718 * tree-vect-stmts.c (vectorizable_assignment, vectorizable_store, vectorizable_load, vectorizable_condition): Move vectype, nunits, ncopies computation after checking what kind of statement stmt is. 
--- gcc/tree-vect-stmts.c.jj 2015-06-30 14:08:45.0 +0200
+++ gcc/tree-vect-stmts.c 2015-07-03 14:06:28.843573210 +0200
@@ -4043,13 +4043,11 @@ vectorizable_assignment (gimple stmt, gi
   tree scalar_dest;
   tree op;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   tree new_temp;
   tree def;
   gimple def_stmt;
   enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   int i, j;
   vec<tree> vec_oprnds = vNULL;
@@ -4060,16 +4058,6 @@ vectorizable_assignment (gimple stmt, gi
   enum tree_code code;
   tree vectype_in;
 
-  /* Multiple types in SLP are handled by creating the appropriate number of
-     vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
-     case of SLP.  */
-  if (slp_node || PURE_SLP_STMT (stmt_info))
-    ncopies = 1;
-  else
-    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
-
-  gcc_assert (ncopies >= 1);
-
   if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo)
     return false;
 
@@ -4095,6 +4083,19 @@ vectorizable_assignment (gimple stmt, gi
   if (code == VIEW_CONVERT_EXPR)
     op = TREE_OPERAND (op, 0);
 
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  /* Multiple types in SLP are handled by creating the appropriate number of
+     vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
+     case of SLP.  */
+  if (slp_node || PURE_SLP_STMT (stmt_info))
+    ncopies = 1;
+  else
+    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+
+  gcc_assert (ncopies >= 1);
+
   if (!vect_is_simple_use_1 (op, stmt, loop_vinfo, bb_vinfo,
                              &def_stmt, &def, &dt[0], &vectype_in))
     {
@@ -5006,7 +5007,6 @@ vectorizable_store (gimple stmt, gimple_
   tree vec_oprnd = NULL_TREE;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
-  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   tree elem_type;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = NULL;
@@ -5020,7 +5020,6 @@ vectorizable_store (gimple stmt, gimple_
   tree dataref_ptr = NULL_TREE;
   tree dataref_offset = NULL_TREE;
   gimple ptr_incr = NULL;
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   int j;
   gimple next_stmt, first_stmt = NULL;
@@ -5039,28 +5038,6 @@ vectorizable_store (gimple stmt, gimple_
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   tree aggr_type;
 
-  if (loop_vinfo)
-    loop = LOOP_VINFO_LOOP (loop_vinfo);
-
-  /* Multiple types in SLP are handled by creating the appropriate number of
-     vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
-     case of SLP.  */
-  if (slp || PURE_SLP_STMT (stmt_info))
-    ncopies = 1;
-  else
-    ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
-
-  gcc_assert (ncopies >= 1);
-
-  /* FORNOW. This restriction should be
Re: [PATCH 0/3] [ARM] PR63870 improve error messages for NEON vldN_lane/vstN_lane
Charles Baylis wrote:

These patches are a port of the changes that do the same thing for AArch64 (see https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01984.html).

The first patch ports over some infrastructure, and the second converts the vldN_lane and vstN_lane intrinsics. The changes required for vget_lane and vset_lane will be done in a future patch.

The third patch includes the test cases from the AArch64 version, except that the xfails for arm targets have been removed. If this series gets approved before the AArch64 patch, I will commit the tests with xfail for aarch64 targets.

Given the large number of test cases, essentially because of test framework limitations, does it make sense to put these in their own directory? Just a thought.

Cheers, Alan
Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)
On Fri, 3 Jul 2015, Richard Biener wrote:
On Fri, 3 Jul 2015, Marek Polacek wrote:

This patch implements a new pass, called laddress, which deals with lowering ADDR_EXPR assignments. Such lowering ought to help the vectorizer, but it also could expose more CSE opportunities, maybe help reassoc, etc. It's only active when optimize != 0. So e.g.

  _1 = (sizetype) i_9;
  _7 = _1 * 4;
  _4 = &b + _7;

instead of

  _4 = &b[i_9];

This triggered 14105 times during the regtest and 6392 times during the bootstrap. The fallout (at least on x86_64) is surprisingly small, i.e. none, just gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due to a bug in the vectorizer. Jakub has a patch and knows the details. As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants (that was the motivation of this pass).

This doesn't introduce any kind of verification nor PROP_laddress. Don't know if we want that, but hopefully it can be done as a follow-up if we do.

Yes. At the moment nothing requires lowered address form so this is merely an optimization (and not a bug for some later pass to re-introduce un-lowered non-invariant addresses). I can imagine that for example IVOPTs could be simplified if we didn't have this kind of address in the IL.

Do we want to move some optimizations into this new pass, e.g. from fwprop?

I think we might want to re-try forwprop_into_addr_expr before lowering the address. Well, but that's maybe just overcautious.

Thoughts?

Please move the pass before crited, crited and pre are supposed to go together. Otherwise looks ok to me.

Thanks, Richard.

Bootstrapped/regtested on x86_64-linux.

2015-07-03 Marek Polacek pola...@redhat.com

PR tree-optimization/66718
* Makefile.in (OBJS): Add tree-ssa-laddress.o.
* passes.def: Schedule pass_laddress.
* timevar.def (DEFTIMEVAR): Add TV_TREE_LADDRESS.
* tree-pass.h (make_pass_laddress): Declare.
* tree-ssa-laddress.c: New file.
* gcc.dg/vect/vect-126.c: New test.

diff --git gcc/Makefile.in gcc/Makefile.in
index 89eda96..2574b98 100644
--- gcc/Makefile.in
+++ gcc/Makefile.in
@@ -1447,6 +1447,7 @@ OBJS = \
   tree-ssa-dse.o \
   tree-ssa-forwprop.o \
   tree-ssa-ifcombine.o \
+  tree-ssa-laddress.o \

I'd say gimple-laddress.c is a better fit. There is nothing SSA specific in the pass and 'tree' is legacy...

   tree-ssa-live.o \
   tree-ssa-loop-ch.o \
   tree-ssa-loop-im.o \
diff --git gcc/passes.def gcc/passes.def
index 0d8356b..ac16e8a 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -214,6 +214,7 @@ along with GCC; see the file COPYING3. If not see
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);
       NEXT_PASS (pass_split_crit_edges);
+      NEXT_PASS (pass_laddress);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_asan);
diff --git gcc/testsuite/gcc.dg/vect/vect-126.c gcc/testsuite/gcc.dg/vect/vect-126.c
index e69de29..66a5821 100644
--- gcc/testsuite/gcc.dg/vect/vect-126.c
+++ gcc/testsuite/gcc.dg/vect/vect-126.c
@@ -0,0 +1,64 @@
+/* PR tree-optimization/66718 */
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx2" { target avx_runtime } } */
+
+int *a[1024], b[1024];
+struct S { int u, v, w, x; };
+struct S c[1024];
+int d[1024][10];
+
+void
+f0 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &b[0];
+}
+
+void
+f1 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    {
+      int *p = &b[0];
+      a[i] = p + i;
+    }
+}
+
+void
+f2 (int *p)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &p[i];
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &b[i];
+}
+
+void
+f4 (void)
+{
+  int *p = &c[0].v;
+  for (int i = 0; i < 1024; i++)
+    a[i] = &p[4 * i];
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &c[i].v;
+}
+
+void
+f6 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    for (unsigned int j = 0; j < 10; j++)
+      a[i] = &d[i][j];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 7 "vect" { target vect_condition } } } */
diff --git gcc/timevar.def gcc/timevar.def
index efac4b7..fcc2fe0 100644
--- gcc/timevar.def
+++ gcc/timevar.def
@@ -275,6 +275,7 @@ DEFTIMEVAR (TV_GIMPLE_SLSR , "straight-line strength reduction")
 DEFTIMEVAR (TV_VTABLE_VERIFICATION , "vtable verification")
 DEFTIMEVAR (TV_TREE_UBSAN, "tree ubsan")
 DEFTIMEVAR (TV_INITIALIZE_RTL, "initialize rtl")
+DEFTIMEVAR (TV_TREE_LADDRESS , "address lowering")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL , "early local passes")
diff --git gcc/tree-pass.h gcc/tree-pass.h
index
Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)
On Fri, 3 Jul 2015, Marek Polacek wrote:

This patch implements a new pass, called laddress, which deals with lowering ADDR_EXPR assignments. Such lowering ought to help the vectorizer, but it also could expose more CSE opportunities, maybe help reassoc, etc. It's only active when optimize != 0. So e.g.

  _1 = (sizetype) i_9;
  _7 = _1 * 4;
  _4 = &b + _7;

instead of

  _4 = &b[i_9];

This triggered 14105 times during the regtest and 6392 times during the bootstrap. The fallout (at least on x86_64) is surprisingly small, i.e. none, just gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due to a bug in the vectorizer. Jakub has a patch and knows the details. As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants (that was the motivation of this pass).

This doesn't introduce any kind of verification nor PROP_laddress. Don't know if we want that, but hopefully it can be done as a follow-up if we do.

Yes. At the moment nothing requires lowered address form so this is merely an optimization (and not a bug for some later pass to re-introduce un-lowered non-invariant addresses). I can imagine that for example IVOPTs could be simplified if we didn't have this kind of address in the IL.

Do we want to move some optimizations into this new pass, e.g. from fwprop?

I think we might want to re-try forwprop_into_addr_expr before lowering the address. Well, but that's maybe just overcautious.

Thoughts?

Please move the pass before crited, crited and pre are supposed to go together. Otherwise looks ok to me.

Thanks, Richard.

Bootstrapped/regtested on x86_64-linux.

2015-07-03 Marek Polacek pola...@redhat.com

PR tree-optimization/66718
* Makefile.in (OBJS): Add tree-ssa-laddress.o.
* passes.def: Schedule pass_laddress.
* timevar.def (DEFTIMEVAR): Add TV_TREE_LADDRESS.
* tree-pass.h (make_pass_laddress): Declare.
* tree-ssa-laddress.c: New file.
* gcc.dg/vect/vect-126.c: New test.

diff --git gcc/Makefile.in gcc/Makefile.in
index 89eda96..2574b98 100644
--- gcc/Makefile.in
+++ gcc/Makefile.in
@@ -1447,6 +1447,7 @@ OBJS = \
   tree-ssa-dse.o \
   tree-ssa-forwprop.o \
   tree-ssa-ifcombine.o \
+  tree-ssa-laddress.o \
   tree-ssa-live.o \
   tree-ssa-loop-ch.o \
   tree-ssa-loop-im.o \
diff --git gcc/passes.def gcc/passes.def
index 0d8356b..ac16e8a 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -214,6 +214,7 @@ along with GCC; see the file COPYING3. If not see
       NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_optimize_bswap);
       NEXT_PASS (pass_split_crit_edges);
+      NEXT_PASS (pass_laddress);
       NEXT_PASS (pass_pre);
       NEXT_PASS (pass_sink_code);
       NEXT_PASS (pass_asan);
diff --git gcc/testsuite/gcc.dg/vect/vect-126.c gcc/testsuite/gcc.dg/vect/vect-126.c
index e69de29..66a5821 100644
--- gcc/testsuite/gcc.dg/vect/vect-126.c
+++ gcc/testsuite/gcc.dg/vect/vect-126.c
@@ -0,0 +1,64 @@
+/* PR tree-optimization/66718 */
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx2" { target avx_runtime } } */
+
+int *a[1024], b[1024];
+struct S { int u, v, w, x; };
+struct S c[1024];
+int d[1024][10];
+
+void
+f0 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &b[0];
+}
+
+void
+f1 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    {
+      int *p = &b[0];
+      a[i] = p + i;
+    }
+}
+
+void
+f2 (int *p)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &p[i];
+}
+
+void
+f3 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &b[i];
+}
+
+void
+f4 (void)
+{
+  int *p = &c[0].v;
+  for (int i = 0; i < 1024; i++)
+    a[i] = &p[4 * i];
+}
+
+void
+f5 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    a[i] = &c[i].v;
+}
+
+void
+f6 (void)
+{
+  for (int i = 0; i < 1024; i++)
+    for (unsigned int j = 0; j < 10; j++)
+      a[i] = &d[i][j];
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 7 "vect" { target vect_condition } } } */
diff --git gcc/timevar.def gcc/timevar.def
index efac4b7..fcc2fe0 100644
--- gcc/timevar.def
+++ gcc/timevar.def
@@ -275,6 +275,7 @@ DEFTIMEVAR (TV_GIMPLE_SLSR , "straight-line strength reduction")
 DEFTIMEVAR (TV_VTABLE_VERIFICATION , "vtable verification")
 DEFTIMEVAR (TV_TREE_UBSAN, "tree ubsan")
 DEFTIMEVAR (TV_INITIALIZE_RTL, "initialize rtl")
+DEFTIMEVAR (TV_TREE_LADDRESS , "address lowering")
 
 /* Everything else in rest_of_compilation not included above.  */
 DEFTIMEVAR (TV_EARLY_LOCAL, "early local passes")
diff --git gcc/tree-pass.h gcc/tree-pass.h
index 2808dad..c47b22e 100644
--- gcc/tree-pass.h
+++ gcc/tree-pass.h
@@ -393,6 +393,7 @@ extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt);
 extern
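
For readers following the thread, the lowering Marek describes can be mimicked at the source level. The sketch below is only an illustration of the arithmetic, not the pass itself: the function names addr_unlowered and addr_lowered are made up for this example, and the element size is taken from sizeof (int) rather than the literal 4 shown in the GIMPLE above (the two agree on the targets discussed).

```cpp
#include <cstddef>

// The address as a single expression, the way the source writes it
// (the form the pass starts from).
inline int *addr_unlowered (int *b, int i)
{
  return &b[i];
}

// The same address computed in three explicit steps, mirroring the
// lowered statements quoted in the mail.  Hypothetical helper; assumes
// i is non-negative and in bounds.
inline int *addr_lowered (int *b, int i)
{
  std::size_t idx = (std::size_t) i;        // widen the index
  std::size_t off = idx * sizeof (int);     // scale by the element size
  return (int *) ((char *) b + off);        // add the byte offset to the base
}
```

Once the multiplication and the addition are separate statements, each one is visible to passes such as CSE or the vectorizer, which is the benefit the message argues for.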
Re: C++ PATCH to change default dialect to C++14
On 07/02/2015 07:41 PM, Jim Wilson wrote: The code compiles with -std=c++98. It does not compile with -std=c++14. So this testcase should be fixed to work with c++14. Done. Jason
[PATCH] rs6000: Add testcase for shifts
This new test tests that all shifts of int compile to exactly one machine instruction, not two as in the PR (which was a problem in combine).

Tested on powerpc64-linux, with the usual options (-m32,-m32/-mpowerpc64,-m64,-m64/-mlra); okay for trunk?

Segher

2015-07-03 Segher Boessenkool seg...@kernel.crashing.org

gcc/testsuite/
PR rtl-optimization/66706
* gcc.target/powerpc/shift-int.c: New testcase.
---
 gcc/testsuite/gcc.target/powerpc/shift-int.c | 23 +++
 1 file changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/shift-int.c

diff --git a/gcc/testsuite/gcc.target/powerpc/shift-int.c b/gcc/testsuite/gcc.target/powerpc/shift-int.c
new file mode 100644
index 000..fe696ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/shift-int.c
@@ -0,0 +1,23 @@
+/* Check that shifts do not get unnecessary extends.
+   See PR66706 for a case where this failed.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Each function should compile to exactly two instructions.  */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 16 } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+blr} 8 } } */
+
+
+typedef unsigned u;
+typedef signed s;
+
+u rot(u x, u n) { return (x << n) | (x >> (32 - n)); }
+u shl(u x, u n) { return x << n; }
+u shr(u x, u n) { return x >> n; }
+s asr(s x, u n) { return x >> n; }
+
+u roti(u x) { return (x << 23) | (x >> 9); }
+u shli(u x) { return x << 23; }
+u shri(u x) { return x >> 23; }
+s asri(s x) { return x >> 23; }
--
1.8.1.4
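
As a side note on the rotate idiom used by the test: the plain form (x << n) | (x >> (32 - n)) is only well defined for 0 < n < 32, which is fine for a compile-only test. A masked variant that is also defined for n == 0 is sketched below; rotl32 is a name chosen for this example and is not part of the patch, and no claim is made here about the instructions any particular GCC version emits for it.

```cpp
#include <stdint.h>

// Rotate a 32-bit value left by n bits.  Both shift counts are masked
// to the range 0..31, so n == 0 (and n >= 32) never shifts by the full
// width of the type.
inline uint32_t rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return (x << n) | (x >> ((32 - n) & 31));
}
```

For a constant count such as 23 this computes the same value as the roti function in the testcase.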
[PATCH 0/2][trunk+5 backport][ARM] PR/65956 Implement AAPCS updates for alignment attribute
This patch series implements the changes/additions to the ARM ABI proposed at https://gcc.gnu.org/ml/gcc/2015-07/msg00040.html . The first patch is the ABI update. This is an ABI-breaking change for any code using __attribute__((aligned(...))) on a public interface (a case not previously defined by the AAPCS). This causes a regression of gcc.c-torture/execute/20040709-1.c at -O0 (only), and the align_rec2.c fails, both due to a latent bug where we can emit strd/ldrd on an odd-numbered register in ARM state. The second patch prevents such illegal instructions and fixes both tests. On trunk, tested via bootstrap + check-gcc on arm-none-linux-gnueabihf (cortex-a15+neon). Also cross-tested arm-none-eabi with a number of variants. On gcc-5-branch, patches rebase cleanly, tested via profiledbootstrap + check-gcc. (Yes, profiledbootstrap succeeds.)
Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator
Martin Jambor mjam...@suse.cz writes:
On Fri, Jul 03, 2015 at 09:55:58AM +0100, Richard Sandiford wrote:
Trevor Saunders tbsau...@tbsaunde.org writes:
On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
Martin Liška mli...@suse.cz writes:

diff --git a/gcc/asan.c b/gcc/asan.c
index e89817e..dabd6f1 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -362,20 +362,20 @@ struct asan_mem_ref
   /* Pool allocation new operator.  */
   inline void *operator new (size_t)
   {
-    return pool.allocate ();
+    return ::new (pool.allocate ()) asan_mem_ref ();
   }

   /* Delete operator utilizing pool allocation.  */
   inline void operator delete (void *ptr)
   {
-    pool.remove ((asan_mem_ref *) ptr);
+    pool.remove (ptr);
   }

   /* Memory allocation pool.  */
-  static pool_allocator<asan_mem_ref> pool;
+  static pool_allocator pool;
 };

I'm probably going over old ground/wounds, sorry, but what's the benefit of having this sort of pattern? Why not simply have object_allocators and make callers use pool.allocate () and pool.remove (x) (with pool.remove calling the destructor) instead of new and delete? It feels wrong to me to tie the data type to a particular allocation object like this.

Well the big question is what does allocate() do about construction? It seems weird for it to not call the ctor, but I'm not sure we can do a good job of forwarding args to allocate() with C++98.

If you need non-default constructors then:

  new (pool) type (aaa, bbb)...;

doesn't seem too bad. I agree object_allocator's allocate () should call the constructor.

But then the pool allocator must not call placement new on the allocated memory itself because that would result in double construction.

But we're talking about two different methods. The normal allocator object_allocator<T>::allocate () would use placement new and return a pointer to the new object while operator new (size_t, object_allocator<T> &) wouldn't call placement new and would just return a pointer to the memory.
And using the pool allocator functions directly has the nice property that you can tell when a delete/remove isn't necessary because the pool itself is being cleared. Well, all these cases involve a pool with static storage lifetime right? so actually if you don't delete things in these pool they are effectively leaked. They might have a static storage lifetime now, but it doesn't seem like a good idea to hard-bake that into the interface Does that mean that operators new and delete are considered evil? Not IMO. Just that static load-time-initialized caches are not necessarily a good thing. That's effectively what the pool allocator is. (by saying that for these types you should use new and delete, but for other pool-allocated types you should use object_allocators). Depending on what kind of pool allocator you use, you will be forced to either call placement new or not, so the inconsistency will be there anyway. But how we handle argument-taking constructors is a problem that needs to be solved for the pool-allocated objects that don't use a single static type-specific pool. And once we solve that, we get consistency across all pools: - if you want a new object and argumentless construction is OK, use pool.allocate () - if you want a new object and need to pass arguments to the constructor, use new (pool) some_type (arg1, arg2, ...) Maybe I just have bad memories from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot of state that was obviously static in the old days, but that needed to become non-static to support vaguely-efficient switching between different subtargets. The same kind of thing is likely to happen again. I assume things like the jit would prefer not to have new global state with load-time construction. I'm not sure I follow this branch of the discussion, the allocators of any kind surely can dynamically allocated themselves? 
Sure, but either (a) you keep the pools as a static part of the class and have some initialisation and finalisation code that has tendrils into all such classes or (b) you move the static pool outside of the class to some new (still global) state. Explicit pool allocation, like in the C days, gives you the option of putting the pool wherever it needs to go without relying on the principle that you can get to it from global state.

Thanks, Richard
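
To make the two allocation styles from this thread concrete (pool.allocate () for argumentless construction, new (pool) some_type (arg1, arg2, ...) when the constructor takes arguments), here is a minimal sketch. It is an assumption-laden toy, not GCC's alloc-pool.h: the class body, the helper names allocate_raw and release_raw, and the example type point are all invented for illustration, and the storage strategy (one heap block per object plus a free list) is chosen only for brevity.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Toy stand-in for the object_allocator<T> discussed in the thread.
template <typename T>
class object_allocator
{
public:
  object_allocator () {}

  // Frees the storage only; destructors of objects that were never
  // passed to remove () are not run.
  ~object_allocator ()
  {
    for (std::size_t i = 0; i < m_blocks.size (); i++)
      ::operator delete (m_blocks[i]);
  }

  // Raw, unconstructed storage for one T (hypothetical helper).
  void *allocate_raw ()
  {
    if (!m_free.empty ())
      {
        void *p = m_free.back ();
        m_free.pop_back ();
        return p;
      }
    void *p = ::operator new (sizeof (T));
    m_blocks.push_back (p);
    return p;
  }

  // Give raw storage back without running a destructor.
  void release_raw (void *p) { m_free.push_back (p); }

  // Argumentless construction: placement new happens here, once.
  T *allocate () { return ::new (allocate_raw ()) T (); }

  // Run the destructor, then recycle the storage.
  void remove (T *obj)
  {
    obj->~T ();
    release_raw (obj);
  }

private:
  object_allocator (const object_allocator &);            // not copyable
  object_allocator &operator= (const object_allocator &);
  std::vector<void *> m_blocks;  // every block ever obtained
  std::vector<void *> m_free;    // blocks available for reuse
};

// Placement form: hands out memory only, so that the new-expression
// "new (pool) T (args)" constructs the object exactly once.
template <typename T>
inline void *operator new (std::size_t, object_allocator<T> &pool)
{
  return pool.allocate_raw ();
}

// Matching placement delete, used if the constructor throws.
template <typename T>
inline void operator delete (void *p, object_allocator<T> &pool)
{
  pool.release_raw (p);
}

// Example type with both a default and an argument-taking constructor.
struct point
{
  int x, y;
  point () : x (0), y (0) {}
  point (int x_, int y_) : x (x_), y (y_) {}
};
```

With these definitions a caller can write either pool.allocate () or new (pool) point (3, 4) on an object_allocator<point> named pool. Because the placement operator new never constructs anything, the double-construction concern raised above does not arise, and the pool object itself can live wherever the caller puts it rather than in static class state.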
Re: [PATCH] Allow embedded timestamps by C/C++ macros to be set externally
On 06/30/2015 06:23 PM, Manuel López-Ibáñez wrote:
On 30 June 2015 at 17:18, Dhole dh...@openmailbox.org wrote:

In the Debian reproducible builds project we have considered several options to address this issue. We considered redefining the __DATE__ and __TIME__ defines by command line flags passed to gcc, but as you say, that triggers warnings, which could become errors when building with -Werror and thus may require manual intervention on many packages.

Well, it would require adding -Wno-something (-Wno-reproducible? -Wno-unreproducible? or perhaps simply -freproducible?) to some CFLAGS/CXXFLAGS. Is that too much manual intervention? (I'm asking sincerely, perhaps indeed it is).

Our idea with the SOURCE_DATE_EPOCH env var was to find a general solution for all toolchain packages involved in the build process that embed timestamps. We already have a patched version of a package used during Debian builds (debhelper) which sets SOURCE_DATE_EPOCH in the build environment. With the submitted patch to GCC nothing else would be needed, and we believe it would be useful to other projects working on reproducible builds, as they would only need to set the SOURCE_DATE_EPOCH env var during their build process. Modifying the CFLAGS/CXXFLAGS would need more intervention during the build process, and this would be a solution only useful for GCC and not other toolchain packages. It could be done, but we'd prefer the general approach.

As mentioned before, we are trying to create a standard way of modifying timestamp embedding behavior for any package with SOURCE_DATE_EPOCH.

This could be a big hammer option that simply disables any warning that is not relevant for reproducible builds (the default being -Wsomething), for example to avoid emitting -Wbuiltin-macro-redefined warnings in the specific cases of __TIME__ and __DATE__. Just an idea, the maintainers would need to say if they would accept such an option.

Cheers, Manuel.
I'm looking forward to hearing opinions from the maintainers :)

Regards,
-- Dhole
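
The idea behind SOURCE_DATE_EPOCH can be sketched in a few lines: if the variable holds a valid number of seconds since the Unix epoch, use it instead of the current time when producing the strings that __DATE__ and __TIME__ expand to. The code below is only an illustration of that idea and is not taken from the submitted GCC patch; parse_epoch, build_timestamp, date_string and time_string are names invented here, and formatting in UTC is an assumption of this sketch.

```cpp
#include <cerrno>
#include <cstdlib>
#include <ctime>
#include <string>

// Parse a decimal, non-negative epoch value.  Returns false (leaving
// *out untouched) for empty, malformed or out-of-range input.
inline bool parse_epoch (const char *s, std::time_t *out)
{
  if (!s || !*s)
    return false;
  errno = 0;
  char *end = 0;
  long long v = std::strtoll (s, &end, 10);
  if (errno != 0 || *end != '\0' || v < 0)
    return false;
  *out = (std::time_t) v;
  return true;
}

// Timestamp to embed: the environment override if present and valid,
// otherwise the current time.
inline std::time_t build_timestamp ()
{
  std::time_t t;
  if (parse_epoch (std::getenv ("SOURCE_DATE_EPOCH"), &t))
    return t;
  return std::time (0);
}

// Format a timestamp in UTC with the given strftime pattern.
inline std::string format_utc (std::time_t t, const char *fmt)
{
  char buf[32];
  std::tm *tm = std::gmtime (&t);
  if (!tm || std::strftime (buf, sizeof buf, fmt, tm) == 0)
    return std::string ();
  return std::string (buf);
}

// "Mmm dd yyyy" with a space-padded day, the layout of __DATE__.
inline std::string date_string (std::time_t t)
{
  return format_utc (t, "%b %e %Y");
}

// "hh:mm:ss", the layout of __TIME__.
inline std::string time_string (std::time_t t)
{
  return format_utc (t, "%H:%M:%S");
}
```

A build system would export SOURCE_DATE_EPOCH once, and every tool that follows the same convention would then embed the same date regardless of when the build actually runs.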
Re: [v3 PATCH] Implement Fundamentals v2 propagate_const
On 03/07/15 14:40 +0300, Ville Voutilainen wrote:

Tested on Linux-PPC64. Patch gzipped to avoid polluting people's mailboxes with a 45k patch.

Thanks very much, I made a few whitespace changes and committed it (as attached) after testing. I've also updated the status tables in the docs, see the second attachment.

patch.txt.gz Description: application/gzip

commit 6b5c94dc814c2aea9c457d0eb6cb965e9a7011f1
Author: Jonathan Wakely jwak...@redhat.com
Date: Fri Jul 3 14:02:53 2015 +0100

* doc/xml/manual/status_cxx2017.xml: Update status table.
* doc/html/manual/*: Regenerate.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index 07e2dbe..491e024 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -112,7 +112,7 @@ not in any particular release.
       </entry>
       <entry>Cleaning-up noexcept in the Library</entry>
       <entry>Partial</entry>
-      <entry/>
+      <entry>Changes to basic_string not complete.</entry>
     </row>
 
     <row>
@@ -177,14 +177,13 @@ not in any particular release.
     </row>
 
     <row>
-      <?dbhtml bgcolor="#C8B0B0" ?>
       <entry>
         <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4387.html">
           N4387
         </link>
       </entry>
       <entry> Improving pair and tuple, revision 3 </entry>
-      <entry>N</entry>
+      <entry>Y</entry>
       <entry/>
     </row>
 
@@ -304,14 +303,13 @@ not in any particular release.
 
     <row>
-      <?dbhtml bgcolor="#C8B0B0" ?>
       <entry>
         <link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4388.html">
           N4388
         </link>
       </entry>
       <entry>Const-Propagating Wrapper</entry>
-      <entry>N</entry>
+      <entry>Y</entry>
       <entry>Library Fundamentals 2 TS</entry>
     </row>
Re: [PATCH] rs6000: Add testcase for shifts
On Fri, Jul 3, 2015 at 10:16 AM, Segher Boessenkool seg...@kernel.crashing.org wrote: This new test tests that all shifts of int compile to exactly one machine instruction, not two as in the PR (which was a problem in combine). Tested on powerpc64-linux, with the usual options (-m32,-m32/-mpowerpc64,-m64,-m64/-mlra); okay for trunk? Segher 2015-07-03 Segher Boessenkool seg...@kernel.crashing.org gcc/testsuite/ PR rtl-optimization/66706 * gcc.target/powerpc/shift-int.c: New testcase. Okay. Thanks, David
[Patch docs obvious AArch64] Fix position of -moverride documentation
Hi, -moverride is not a feature modifier, so it is currently misplaced in the documentation. Fix that by moving it out to the general AArch64 options section. Checked in the HTML output that it is now in a sensible place, and committed as attached as obvious as revision 225384. Thanks, James --- 2015-07-03 James Greenhalgh james.greenha...@arm.com * doc/invoke.texi (moverride): Move to correct section. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 844d7edaecf2bc6642324ad8513f7c2add0ee486..1dfce1143027cef86d8fbf59580035e6d25f1189 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12496,6 +12496,15 @@ the target processor for which to tune for performance (as if by @option{-mtune}). Where this option is used in conjunction with @option{-march} or @option{-mtune}, those options take precedence over the appropriate part of this option. + +@item -moverride=@var{string} +@opindex moverride +Override tuning decisions made by the back-end in response to a +@option{-mtune=} switch. The syntax, semantics, and accepted values +for @var{string} in this option are not guaranteed to be consistent +across releases. + +This option is only intended to be useful when developing GCC. @end table @subsubsection @option{-march} and @option{-mcpu} Feature Modifiers @@ -12526,14 +12535,6 @@ Enable Limited Ordering Regions support. @item rdma Enable ARMv8.1 Advanced SIMD instructions. -@item -moverride=@var{string} -@opindex master -Override tuning decisions made by the back-end in response to a -@option{-mtune=} switch. The syntax, semantics, and accepted values -for @var{string} in this option are not guaranteed to be consistent -across releases. - -This option is only intended to be useful when developing GCC. @end table That is, @option{crypto} implies @option{simd} implies @option{fp}.
[PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute
These include tests of structs, scalars, and vectors - only general-purpose registers are affected by the ABI rules for alignment, but we can restrict the vector test to use the base AAPCS. Prior to this patch, align2.c, align3.c and align_rec1.c were failing (the latter showing an internal inconsistency, the first two merely that GCC did not obey the new ABI). With this patch, the align_rec2.c fails, and also gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a latent bug where we can emit strd/ldrd on an odd-numbered register in ARM state, fixed by the second patch. gcc/ChangeLog: * config/arm/arm.c (arm_needs_doubleword_align): Drop any outer alignment attribute, exploring one level down for aggregates. gcc/testsuite/ChangeLog: * gcc.target/arm/aapcs/align1.c: New. * gcc.target/arm/aapcs/align_rec1.c: New. * gcc.target/arm/aapcs/align2.c: New. * gcc.target/arm/aapcs/align_rec2.c: New. * gcc.target/arm/aapcs/align3.c: New. * gcc.target/arm/aapcs/align_rec3.c: New. * gcc.target/arm/aapcs/align4.c: New. * gcc.target/arm/aapcs/align_rec4.c: New. * gcc.target/arm/aapcs/align_vararg1.c: New. * gcc.target/arm/aapcs/align_vararg2.c: New. 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 04663999224c8c8eb8e2d10b0ec634db6ce5027e..ee57d30617a2f7e1cd63ca013fe5655a01027581 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -6020,8 +6020,17 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype, static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) > PARM_BOUNDARY - || (type && TYPE_ALIGN (type) > PARM_BOUNDARY)); + if (!type) +return PARM_BOUNDARY < GET_MODE_ALIGNMENT (mode); + + if (!AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) > PARM_BOUNDARY; + + for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) +if (DECL_ALIGN (field) > PARM_BOUNDARY) + return true; + + return false; } diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align1.c b/gcc/testsuite/gcc.target/arm/aapcs/align1.c new file mode 100644 index ..8981d57c3eaf0bd89d224bec79ff8a45627a0a89 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align1.c @@ -0,0 +1,29 @@ +/* Test AAPCS layout (alignment). 
*/ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options -O } */ + +#ifndef IN_FRAMEWORK +#define TESTFILE align1.c + +typedef __attribute__((aligned (8))) int alignedint; + +alignedint a = 11; +alignedint b = 13; +alignedint c = 17; +alignedint d = 19; +alignedint e = 23; +alignedint f = 29; + +#include abitest.h +#else + ARG (alignedint, a, R0) + /* Attribute suggests R2, but we should use only natural alignment: */ + ARG (alignedint, b, R1) + ARG (alignedint, c, R2) + ARG (alignedint, d, R3) + ARG (alignedint, e, STACK) + /* Attribute would suggest STACK + 8 but should be ignored: */ + LAST_ARG (alignedint, f, STACK + 4) +#endif diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align2.c b/gcc/testsuite/gcc.target/arm/aapcs/align2.c new file mode 100644 index ..992da53c606c793f25278152406582bb993719d2 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align2.c @@ -0,0 +1,30 @@ +/* Test AAPCS layout (alignment). */ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options -O } */ + +#ifndef IN_FRAMEWORK +#define TESTFILE align2.c + +/* The underlying struct here has alignment 4. */ +typedef struct __attribute__((aligned (8))) + { +int x; +int y; + } overaligned; + +/* A couple of instances, at 8-byte-aligned memory locations. */ +overaligned a = { 2, 3 }; +overaligned b = { 5, 8 }; + +#include abitest.h +#else + ARG (int, 7, R0) + /* Alignment should be 4. */ + ARG (overaligned, a, R1) + ARG (int, 9, R3) + ARG (int, 10, STACK) + /* Alignment should be 4. */ + LAST_ARG (overaligned, b, STACK + 4) +#endif diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align3.c b/gcc/testsuite/gcc.target/arm/aapcs/align3.c new file mode 100644 index ..81ad3f587a95aae52ec601ce5a60b198e5351edf --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align3.c @@ -0,0 +1,42 @@ +/* Test AAPCS layout (alignment). 
*/ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options -O3 } */ + +#ifndef IN_FRAMEWORK +#define TESTFILE align3.c + +/* Struct will be aligned to 8. */ +struct s + { +int x; +/* 4 bytes padding here. */ +__attribute__((aligned (8))) int y; +/* 4 bytes padding here. */ + }; + +typedef struct s __attribute__((aligned (4))) underaligned; + +#define EXPECTED_STRUCT_SIZE 16 +extern void link_failure (void); +int +foo () +{ + /* Optimization gets rid of this before linking.
[PATCH 2/2][ARM] fix movdi expander to avoid illegal ldrd/strd
The previous patch caused a regression in gcc.c-torture/execute/20040709-1.c at -O0 (only), and the new align_rec2.c test fails, both outputting an illegal assembler instruction (ldrd on an odd-numbered reg) from output_move_double in arm.c. Most routes have checks against such an illegal instruction, but expanding a function call can directly name such impossible register (pairs), bypassing the normal checks. gcc/ChangeLog: * config/arm/arm.md (movdi): Avoid odd-number ldrd/strd in ARM state. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 164ac13a26289bf755c89e78a8a5f751883c6039..c6718282d2555f8cf9a4e9111b1393e1f7704983 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -5415,6 +5415,42 @@ if (!REG_P (operands[0])) operands[1] = force_reg (DImode, operands[1]); } + if (REG_P (operands[0]) && REGNO (operands[0]) < FIRST_VIRTUAL_REGISTER + && !HARD_REGNO_MODE_OK (REGNO (operands[0]), DImode)) +{ + /* Avoid LDRD's into an odd-numbered register pair in ARM state + when expanding function calls. */ + gcc_assert (can_create_pseudo_p ()); + if (MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1])) + { + /* Perform load into legal reg pair first, then move. */ + rtx reg = gen_reg_rtx (DImode); + emit_insn (gen_movdi (reg, operands[1])); + operands[1] = reg; + } + emit_move_insn (gen_lowpart (SImode, operands[0]), + gen_lowpart (SImode, operands[1])); + emit_move_insn (gen_highpart (SImode, operands[0]), + gen_highpart (SImode, operands[1])); + DONE; +} + else if (REG_P (operands[1]) && REGNO (operands[1]) < FIRST_VIRTUAL_REGISTER + && !HARD_REGNO_MODE_OK (REGNO (operands[1]), DImode)) +{ + /* Avoid LDRD's into an odd-numbered register pair in ARM state + when expanding function prologue. */ + gcc_assert (can_create_pseudo_p ()); + rtx split_dest = (MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0])) + ? 
gen_reg_rtx (DImode) + : operands[0]; + emit_move_insn (gen_lowpart (SImode, split_dest), + gen_lowpart (SImode, operands[1])); + emit_move_insn (gen_highpart (SImode, split_dest), + gen_highpart (SImode, operands[1])); + if (split_dest != operands[0]) + emit_insn (gen_movdi (operands[0], split_dest)); + DONE; +} )
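The idea behind the expander change can be sketched outside GCC: when the destination register pair is not a legal LDRD/STRD target, one 64-bit move becomes two 32-bit moves of the low and high halves. The following is only an illustrative model, not GCC code. `RegFile`, `is_legal_ldrd_pair`, `split_move` and `move_di` are hypothetical names, and the "first register must be even" rule is the ARM-state constraint described in the message above rather than the real HARD_REGNO_MODE_OK logic.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical model of the ARM-state constraint discussed above:
// a doubleword load/store needs an even-numbered first register.
bool is_legal_ldrd_pair(unsigned first_reg) { return first_reg % 2 == 0; }

// Split a 64-bit value into its low and high 32-bit halves, in the
// spirit of gen_lowpart / gen_highpart on SImode pieces.
std::pair<std::uint32_t, std::uint32_t> split_move(std::uint64_t value) {
    std::uint32_t low = static_cast<std::uint32_t>(value & 0xffffffffu);
    std::uint32_t high = static_cast<std::uint32_t>(value >> 32);
    return std::make_pair(low, high);
}

// Toy register file with 16 32-bit registers.
struct RegFile {
    std::uint32_t r[16] = {};
};

// Move a doubleword into r[first_reg], r[first_reg + 1].
// Returns how many move instructions the model would "emit":
// one for a legal pair, two word moves otherwise.
int move_di(RegFile& rf, unsigned first_reg, std::uint64_t value) {
    std::pair<std::uint32_t, std::uint32_t> halves = split_move(value);
    rf.r[first_reg] = halves.first;
    rf.r[first_reg + 1] = halves.second;
    return is_legal_ldrd_pair(first_reg) ? 1 : 2;
}
```

In this model a move into the pair starting at r1 costs two word moves, while the pair starting at r2 can use a single doubleword move.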
Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator
Hi, On Fri, Jul 03, 2015 at 09:55:58AM +0100, Richard Sandiford wrote: Trevor Saunders tbsau...@tbsaunde.org writes: On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote: Martin Liška mli...@suse.cz writes: diff --git a/gcc/asan.c b/gcc/asan.c index e89817e..dabd6f1 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -362,20 +362,20 @@ struct asan_mem_ref /* Pool allocation new operator. */ inline void *operator new (size_t) { -return pool.allocate (); +return ::new (pool.allocate ()) asan_mem_ref (); } /* Delete operator utilizing pool allocation. */ inline void operator delete (void *ptr) { -pool.remove ((asan_mem_ref *) ptr); +pool.remove (ptr); } /* Memory allocation pool. */ - static pool_allocator<asan_mem_ref> pool; + static pool_allocator pool; }; I'm probably going over old ground/wounds, sorry, but what's the benefit of having this sort of pattern? Why not simply have object_allocators and make callers use pool.allocate () and pool.remove (x) (with pool.remove calling the destructor) instead of new and delete? It feels wrong to me to tie the data type to a particular allocation object like this. Well the big question is what does allocate() do about construction? It seems weird for it to not call the ctor, but I'm not sure we can do a good job of forwarding args to allocate() with C++98. If you need non-default constructors then: new (pool) type (aaa, bbb)...; doesn't seem too bad. I agree object_allocator's allocate () should call the constructor. But then the pool allocator must not call placement new on the allocated memory itself because that would result in double construction. And calling placement new was explicitly requested in the previous thread about allocators, so we still need two kinds of allocators, typed and untyped. Or just the untyped allocators and requiring that users construct their objects via placement new. In fact, they might have to call placement new even if there is no constructor because of the weird aliasing issue. 
Two kinds of pool-allocators seem the lesser evil to me. And using the pool allocator functions directly has the nice property that you can tell when a delete/remove isn't necessary because the pool itself is being cleared. Well, all these cases involve a pool with static storage lifetime right? so actually if you don't delete things in these pools they are effectively leaked. They might have a static storage lifetime now, but it doesn't seem like a good idea to hard-bake that into the interface Does that mean that operators new and delete are considered evil? (by saying that for these types you should use new and delete, but for other pool-allocated types you should use object_allocators). Depending on what kind of pool allocator you use, you will be forced to either call placement new or not, so the inconsistency will be there anyway. I'm using pool allocators for classes with non-default constructors a lot in the HSA branch so I'd appreciate an early settlement of this issue. I think I slightly prefer overloading new and delete to using placement new (at least in new code) because then users just allocate stuff as usual and there is one central point where things can be changed. But I do not have strong feelings and will comply with whatever we can agree on. Maybe I just have bad memories from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot of state that was obviously static in the old days, but that needed to become non-static to support vaguely-efficient switching between different subtargets. The same kind of thing is likely to happen again. I assume things like the jit would prefer not to have new global state with load-time construction. I'm not sure I follow this branch of the discussion, the allocators of any kind can surely be dynamically allocated themselves? Thanks, Martin
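To make the two styles under discussion concrete, here is a minimal sketch, assuming a very simple block pool. None of these names are the GCC classes: `raw_pool` stands in for an untyped allocator that never constructs anything, `typed_pool` for an object allocator whose allocate () runs the default constructor, and `mem_ref` for a type that ties itself to a pool through operator new and operator delete.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical untyped pool: hands out raw fixed-size blocks and
// never runs constructors or destructors.
class raw_pool {
public:
    explicit raw_pool(std::size_t block_size) : block_size_(block_size) {}
    ~raw_pool() {
        for (std::size_t i = 0; i < all_.size(); ++i)
            ::operator delete(all_[i]);
    }
    void* allocate() {
        if (!free_.empty()) {
            void* p = free_.back();   // reuse the most recently freed block
            free_.pop_back();
            return p;
        }
        void* p = ::operator new(block_size_);
        all_.push_back(p);
        return p;
    }
    void remove(void* p) { free_.push_back(p); }
private:
    raw_pool(const raw_pool&);             // non-copyable
    raw_pool& operator=(const raw_pool&);
    std::size_t block_size_;
    std::vector<void*> free_;
    std::vector<void*> all_;
};

// Style 1: the type owns its allocation policy via operator new/delete.
struct mem_ref {
    int start;
    int size;
    mem_ref() : start(0), size(0) {}
    mem_ref(int s, int n) : start(s), size(n) {}
    static void* operator new(std::size_t) { return pool().allocate(); }
    static void operator delete(void* p) { pool().remove(p); }
    static raw_pool& pool() {
        static raw_pool p(sizeof(mem_ref));  // built on first use
        return p;
    }
};

// Style 2: a typed wrapper constructs and destroys the object itself.
template <typename T>
class typed_pool {
public:
    typed_pool() : pool_(sizeof(T)) {}
    // The leading :: selects global placement new; T may declare its
    // own operator new, which would otherwise hide it.
    T* allocate() { return ::new (pool_.allocate()) T(); }
    void remove(T* obj) {
        obj->~T();
        pool_.remove(obj);
    }
private:
    raw_pool pool_;
};
```

With the first style, `new mem_ref (1, 2)` and `delete` look like ordinary allocation and any constructor can be used. With the second, the caller must go through `allocate ()` and `remove ()`, and only default construction is available unless arguments are forwarded somehow, which is the C++98 limitation raised earlier in the thread.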
RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)
This patch implements a new pass, called laddress, which deals with lowering ADDR_EXPR assignments. Such lowering ought to help the vectorizer, but it also could expose more CSE opportunities, maybe help reassoc, etc. It's only active when optimize != 0. So e.g. _1 = (sizetype) i_9; _7 = _1 * 4; _4 = &b + _7; instead of _4 = &b[i_9]; This triggered 14105 times during the regtest and 6392 times during the bootstrap. The fallout (at least on x86_64) is surprisingly small, i.e. none, just gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due to a bug in the vectorizer. Jakub has a patch and knows the details. As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants (that was the motivation of this pass). This doesn't introduce any kind of verification nor PROP_laddress. Don't know if we want that, but hopefully it can be done as a follow-up if we do. Do we want to move some optimizations into this new pass, e.g. from fwprop? Thoughts? Bootstrapped/regtested on x86_64-linux. 2015-07-03 Marek Polacek pola...@redhat.com PR tree-optimization/66718 * Makefile.in (OBJS): Add tree-ssa-laddress.o. * passes.def: Schedule pass_laddress. * timevar.def (DEFTIMEVAR): Add TV_TREE_LADDRESS. * tree-pass.h (make_pass_laddress): Declare. * tree-ssa-laddress.c: New file. * gcc.dg/vect/vect-126.c: New test. diff --git gcc/Makefile.in gcc/Makefile.in index 89eda96..2574b98 100644 --- gcc/Makefile.in +++ gcc/Makefile.in @@ -1447,6 +1447,7 @@ OBJS = \ tree-ssa-dse.o \ tree-ssa-forwprop.o \ tree-ssa-ifcombine.o \ + tree-ssa-laddress.o \ tree-ssa-live.o \ tree-ssa-loop-ch.o \ tree-ssa-loop-im.o \ diff --git gcc/passes.def gcc/passes.def index 0d8356b..ac16e8a 100644 --- gcc/passes.def +++ gcc/passes.def @@ -214,6 +214,7 @@ along with GCC; see the file COPYING3. 
If not see NEXT_PASS (pass_cse_sincos); NEXT_PASS (pass_optimize_bswap); NEXT_PASS (pass_split_crit_edges); + NEXT_PASS (pass_laddress); NEXT_PASS (pass_pre); NEXT_PASS (pass_sink_code); NEXT_PASS (pass_asan); diff --git gcc/testsuite/gcc.dg/vect/vect-126.c gcc/testsuite/gcc.dg/vect/vect-126.c index e69de29..66a5821 100644 --- gcc/testsuite/gcc.dg/vect/vect-126.c +++ gcc/testsuite/gcc.dg/vect/vect-126.c @@ -0,0 +1,64 @@ +/* PR tree-optimization/66718 */ +/* { dg-do compile } */ +/* { dg-additional-options "-mavx2" { target avx_runtime } } */ + +int *a[1024], b[1024]; +struct S { int u, v, w, x; }; +struct S c[1024]; +int d[1024][10]; + +void +f0 (void) +{ + for (int i = 0; i < 1024; i++) +a[i] = &b[0]; +} + +void +f1 (void) +{ + for (int i = 0; i < 1024; i++) +{ + int *p = &b[0]; + a[i] = p + i; +} +} + +void +f2 (int *p) +{ + for (int i = 0; i < 1024; i++) +a[i] = &p[i]; +} + +void +f3 (void) +{ + for (int i = 0; i < 1024; i++) +a[i] = &b[i]; +} + +void +f4 (void) +{ + int *p = &c[0].v; + for (int i = 0; i < 1024; i++) +a[i] = &p[4 * i]; +} + +void +f5 (void) +{ + for (int i = 0; i < 1024; i++) +a[i] = &c[i].v; +} + +void +f6 (void) +{ + for (int i = 0; i < 1024; i++) +for (unsigned int j = 0; j < 10; j++) + a[i] = &d[i][j]; +} + +/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 7 "vect" { target vect_condition } } } */ diff --git gcc/timevar.def gcc/timevar.def index efac4b7..fcc2fe0 100644 --- gcc/timevar.def +++ gcc/timevar.def @@ -275,6 +275,7 @@ DEFTIMEVAR (TV_GIMPLE_SLSR , "straight-line strength reduction") DEFTIMEVAR (TV_VTABLE_VERIFICATION , "vtable verification") DEFTIMEVAR (TV_TREE_UBSAN, "tree ubsan") DEFTIMEVAR (TV_INITIALIZE_RTL, "initialize rtl") +DEFTIMEVAR (TV_TREE_LADDRESS , "address lowering") /* Everything else in rest_of_compilation not included above. 
*/ DEFTIMEVAR (TV_EARLY_LOCAL , early local passes) diff --git gcc/tree-pass.h gcc/tree-pass.h index 2808dad..c47b22e 100644 --- gcc/tree-pass.h +++ gcc/tree-pass.h @@ -393,6 +393,7 @@ extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt); extern gimple_opt_pass *make_pass_split_crit_edges (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_laddress (gcc::context *ctxt); extern gimple_opt_pass *make_pass_pre (gcc::context *ctxt); extern unsigned int tail_merge_optimize (unsigned int); extern gimple_opt_pass *make_pass_profile (gcc::context *ctxt); diff --git gcc/tree-ssa-laddress.c gcc/tree-ssa-laddress.c index e69de29..3f69d7d 100644 --- gcc/tree-ssa-laddress.c +++ gcc/tree-ssa-laddress.c @@ -0,0 +1,137 @@ +/* Lower and optimize address expressions. + Copyright (C) 2015 Free Software Foundation, Inc. + Contributed by Marek Polacek pola...@redhat.com + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or
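The lowering described in the RFC can be illustrated at source level. This is a sketch rather than anything taken from the patch: `addr_direct` and `addr_lowered` are hypothetical names, and the second function spells out by hand the three statements the pass would produce for the address of an array element (widen the index, scale by the element size, add to the base address).

```cpp
#include <cassert>
#include <cstddef>

int b[1024];

// Address as written in the source: the address of an array element.
int* addr_direct(int i) { return &b[i]; }

// The same address computed in lowered form.
int* addr_lowered(int i) {
    std::size_t idx = static_cast<std::size_t>(i);  // _1 = (sizetype) i
    std::size_t off = idx * sizeof(int);            // _7 = _1 * 4 when int is 4 bytes
    char* base = reinterpret_cast<char*>(b);        // base address of b
    return reinterpret_cast<int*>(base + off);      // _4 = base + _7
}
```

Both functions yield the same pointer for any in-range index. The lowered form exposes the multiplication and addition as separate operations, which is what gives the vectorizer and CSE something to work with.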
[PATCH] libgomp: Add comment to clarify last_team usage
libgomp/ChangeLog 2015-07-03 Sebastian Huber sebastian.hu...@embedded-brains.de * libgomp.h (gomp_thread_pool): Comment last_team field. --- libgomp/libgomp.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h index 5272f01..5ed0f78 100644 --- a/libgomp/libgomp.h +++ b/libgomp/libgomp.h @@ -458,6 +458,9 @@ struct gomp_thread_pool struct gomp_thread **threads; unsigned threads_size; unsigned threads_used; + /* The last team is used for non-nested teams to delay their destruction to + make sure all the threads in the team move on to the pool's barrier before + the team's barrier is destroyed. */ struct gomp_team *last_team; /* Number of threads running in this contention group. */ unsigned long threads_busy; -- 1.8.4.5
Re: RFC: Add ADDR_EXPR lowering (PR tree-optimization/66718)
On July 3, 2015 4:06:26 PM GMT+02:00, Jakub Jelinek ja...@redhat.com wrote: On Fri, Jul 03, 2015 at 03:41:29PM +0200, Richard Biener wrote: The fallout (at least on x86_64) is surprisingly small, i.e. none, just gcc.dg/vect/pr59984.c test (using -fopenmp-simd) ICEs, but that is due to a bug in the vectorizer. Jakub has a patch and knows the details. As the test shows, we're now able to vectorize ADDR_EXPR of non-invariants (that was the motivation of this pass). Here is the fix for that. The problem is that for simd clone calls, if they have void return type, STMT_VINFO_VECTYPE is NULL. If vectorize_simd_clone_call succeeds, that is fine, but if it doesn't, we can fall into all the other vectorizable_* functions, and some of them compute some variables IMHO prematurely. It doesn't make sense to compute nunits/ncopies etc. if stmt isn't even an assignment etc. So, this patch adjusts the few routines that had this problem, so that we check is_gimple_assign and gimple_assign_rhs_code or whatever is the quick GIMPLE test those functions use to find if stmt is of interest to them, and only when it is, compute whatever they need later. As NULL STMT_VINFO_VECTYPE can happen only for calls, all these functions don't ICE anymore. Ok for trunk if it passes bootstrap/regtest? OK. Thanks, Richard. In the pr59984.c testcase, with Marek's patch and this patch, one loop in test is already vectorized (the ICE was on the other one), I'll work on recognizing multiples of GOMP_SIMD_LANE () as linear next, so that we vectorize also the loop with bar. Without Marek's patch we weren't vectorizing any of the two loops. 2015-07-03 Jakub Jelinek ja...@redhat.com PR tree-optimization/66718 * tree-vect-stmts.c (vectorizable_assignment, vectorizable_store, vectorizable_load, vectorizable_condition): Move vectype, nunits, ncopies computation after checking what kind of statement stmt is. 
--- gcc/tree-vect-stmts.c.jj 2015-06-30 14:08:45.0 +0200 +++ gcc/tree-vect-stmts.c 2015-07-03 14:06:28.843573210 +0200 @@ -4043,13 +4043,11 @@ vectorizable_assignment (gimple stmt, gi tree scalar_dest; tree op; stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - tree vectype = STMT_VINFO_VECTYPE (stmt_info); loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); tree new_temp; tree def; gimple def_stmt; enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type}; - unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype); int ncopies; int i, j; vectree vec_oprnds = vNULL; @@ -4060,16 +4058,6 @@ vectorizable_assignment (gimple stmt, gi enum tree_code code; tree vectype_in; - /* Multiple types in SLP are handled by creating the appropriate number of - vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in - case of SLP. */ - if (slp_node || PURE_SLP_STMT (stmt_info)) -ncopies = 1; - else -ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; - - gcc_assert (ncopies = 1); - if (!STMT_VINFO_RELEVANT_P (stmt_info) !bb_vinfo) return false; @@ -4095,6 +4083,19 @@ vectorizable_assignment (gimple stmt, gi if (code == VIEW_CONVERT_EXPR) op = TREE_OPERAND (op, 0); + tree vectype = STMT_VINFO_VECTYPE (stmt_info); + unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype); + + /* Multiple types in SLP are handled by creating the appropriate number of + vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in + case of SLP. 
*/ + if (slp_node || PURE_SLP_STMT (stmt_info)) +ncopies = 1; + else +ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; + + gcc_assert (ncopies = 1); + if (!vect_is_simple_use_1 (op, stmt, loop_vinfo, bb_vinfo, def_stmt, def, dt[0], vectype_in)) { @@ -5006,7 +5007,6 @@ vectorizable_store (gimple stmt, gimple_ tree vec_oprnd = NULL_TREE; stmt_vec_info stmt_info = vinfo_for_stmt (stmt); struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL; - tree vectype = STMT_VINFO_VECTYPE (stmt_info); tree elem_type; loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); struct loop *loop = NULL; @@ -5020,7 +5020,6 @@ vectorizable_store (gimple stmt, gimple_ tree dataref_ptr = NULL_TREE; tree dataref_offset = NULL_TREE; gimple ptr_incr = NULL; - unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype); int ncopies; int j; gimple next_stmt, first_stmt = NULL; @@ -5039,28 +5038,6 @@ vectorizable_store (gimple stmt, gimple_ bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); tree aggr_type; - if (loop_vinfo) -loop = LOOP_VINFO_LOOP (loop_vinfo); - - /* Multiple types in SLP are handled by creating the appropriate number of - vectorized stmts for each SLP node. Hence, NCOPIES is always 1 in - case of SLP. */ - if (slp || PURE_SLP_STMT (stmt_info)) -ncopies = 1; - else -ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo)
Re: [PATCH 0/2][trunk+5 backport][ARM] PR/65956 Implement AAPCS updates for alignment attribute
On July 3, 2015 5:24:24 PM GMT+02:00, Alan Lawrence alan.lawre...@arm.com wrote: This patch series implements the changes/additions to the ARM ABI proposed at https://gcc.gnu.org/ml/gcc/2015-07/msg00040.html . The first patch is the ABI update. This is an ABI-breaking change for any code using __attribute__((aligned(...))) on a public interface (a case not previously defined by the AAPCS). This causes a regression of gcc.c-torture/execute/20040709-1.c at -O0 (only), and the align_rec2.c fails, both due to a latent bug where we can emit strd/ldrd on an odd-numbered register in ARM state. The second patch prevents such illegal instructions and fixes both tests. On trunk, tested via bootstrap + check-gcc on arm-none-linux-gnueabihf (cortex-a15+neon). Also cross-tested arm-none-eabi with a number of variants. On gcc-5-branch, patches rebase cleanly, tested via profiledbootstrap + check-gcc. (Yes, profiledbootstrap succeeds.) Just FYI, the back port is OK to apply once the trunk side is approved. Thanks, Richard.
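As a rough illustration of what the ABI update distinguishes, the sketch below contrasts the alignment requested by an attribute with the natural alignment of the underlying type or fields. It assumes a GCC-compatible compiler for the `__attribute__` syntax, and a 4-byte parameter boundary as on 32-bit ARM. `needs_doubleword_align` here is a simplified stand-in that takes a byte count, not the GCC function of similar name.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: PARM_BOUNDARY is 32 bits on 32-bit ARM, i.e. 4 bytes.
const std::size_t kParmBoundaryBytes = 4;

// Scalar with an outer alignment attribute; the natural alignment is
// still that of plain int.
typedef int alignedint __attribute__((aligned(8)));

// Struct with an outer attribute; its fields are plain ints.
struct __attribute__((aligned(8))) overaligned {
    int x;
    int y;
};

// Struct whose alignment comes from a field, one level down.
struct with_aligned_field {
    int x;
    __attribute__((aligned(8))) int y;
};

// Simplified rule: only the natural alignment (main variant for
// scalars, largest field alignment for aggregates) decides whether an
// argument needs doubleword alignment.
bool needs_doubleword_align(std::size_t natural_align_bytes) {
    return natural_align_bytes > kParmBoundaryBytes;
}
```

Under this model `alignedint` and `overaligned` are passed like their unattributed counterparts, because the attribute sits on the outside, while `with_aligned_field` keeps its 8-byte requirement because it comes from a member.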
[patch] libstdc++/66742 use allocators correctly in list::sort()
In list::sort() we use 65 list objects to use as temporary storage, splicing and swapping elements between lists. However the lists are default constructed, with no allocator argument, which is wrong because the allocator type might not be default constructible, and even more wrong because splicing and merging between lists with non-equal allocators is undefined behaviour. So if this->get_allocator() != allocator_type() then we have undefined behaviour. The attached patch replaces the fixed-size array of 64 default-constructed lists with a new container-like type, _ListSortBuf, which is initially empty (so we don't create lists we don't need) but will grow up to a maximum size (which I kept at 64). As the container grows it initializes new elements with the correct allocator, so that every list used in the sort uses the same allocator. As well as reducing the number of lists we construct when sorting this also allows us to range-check and ensure we don't overflow the fixed-size array (we now get an exception if that happens, although that's probably not possible even on a 64-bit machine). Unfortunately this seems to hurt performance, presumably because the extra indirections to the _ListSortBuf rather than just an array of lists confuse the optimisers. Does anyone see any better solution to this? (other than rewriting the whole sort function, which I think has been proposed) commit ba5b393a09022907f9aee2b539ad14fc1fc8a42d Author: Jonathan Wakely jwak...@redhat.com Date: Fri Jul 3 15:20:36 2015 +0100 PR libstdc++/66742 * include/bits/list.tcc (_ListSortBuf): Define. (list::sort): Use _ListSortBuf. * testsuite/23_containers/list/operations/66742.cc: New. diff --git a/libstdc++-v3/include/bits/list.tcc b/libstdc++-v3/include/bits/list.tcc index 714d9b5..f335af4 100644 --- a/libstdc++-v3/include/bits/list.tcc +++ b/libstdc++-v3/include/bits/list.tcc @@ -440,6 +440,53 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER } } + // A simple array-like container with a fixed maximum size. 
+ templatetypename _List, size_t _Size +class _ListSortBuf +{ + templatetypename _Tp, typename _Alloc + friend class list; + + struct __attribute__((__aligned__(__alignof__(_List _Elem + { + unsigned char _M_buf[sizeof(_List)]; + operator _List*() { return reinterpret_cast_List*(_M_buf); } + }; + _Elem _M_buf[_Size]; + + typedef typename _List::allocator_type allocator_type; + + struct _Impl : allocator_type + { + _Impl(const allocator_type __a) : allocator_type(__a), _M_size(0) { } + size_t _M_size; + } _M_impl; + + _ListSortBuf(const allocator_type __a) : _M_impl(__a) { } + + ~_ListSortBuf() + { + while (_M_impl._M_size) + static_cast_List*(_M_buf[--_M_impl._M_size])-~_List(); + } + + _List* begin() { return _M_buf[0]; } + _List* end() { return begin() + _M_impl._M_size; } + + void _M_grow() + { + if (_M_impl._M_size == _Size) + __throw_bad_alloc(); + ::new(static_castvoid*(_M_buf + _M_impl._M_size)) _List(_M_impl); + ++_M_impl._M_size; + } + + bool empty() const { return _M_impl._M_size == 0; } + + _ListSortBuf(const _ListSortBuf); + _ListSortBuf operator=(const _ListSortBuf); +}; + templatetypename _Tp, typename _Alloc void list_Tp, _Alloc:: @@ -448,33 +495,36 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER // Do nothing if the list has length 0 or 1. 
if (this-_M_impl._M_node._M_next != this-_M_impl._M_node this-_M_impl._M_node._M_next-_M_next != this-_M_impl._M_node) - { -list __carry; -list __tmp[64]; -list * __fill = __tmp; -list * __counter; + { + list __carry(get_allocator()); + _ListSortBuflist, 64 __tmp(get_allocator()); + list* __counter; -do - { - __carry.splice(__carry.begin(), *this, begin()); + do + { + __carry.splice(__carry.begin(), *this, begin()); - for(__counter = __tmp; - __counter != __fill !__counter-empty(); - ++__counter) - { - __counter-merge(__carry); - __carry.swap(*__counter); - } - __carry.swap(*__counter); - if (__counter == __fill) - ++__fill; - } - while ( !empty() ); + for(__counter = __tmp.begin(); + __counter != __tmp.end() !__counter-empty(); + ++__counter) + { + __counter-merge(__carry); + __carry.swap(*__counter); + } + if (__counter == __tmp.end()) + __tmp._M_grow(); + __carry.swap(*__counter); + } + while ( !empty() ); -for (__counter = __tmp + 1; __counter != __fill; ++__counter) - __counter-merge(*(__counter - 1)); -swap( *(__fill - 1) ); - } + if (!__tmp.empty()) + { + for (__counter = __tmp.begin() + 1; __counter != __tmp.end(); + ++__counter) + __counter-merge(*(__counter - 1)); + swap( *(__tmp.end() - 1) ); + } + } } templatetypename _Tp, typename _Alloc @@ -526,31 +576,34 @@
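The shape of the fixed algorithm can be modelled with public std::list operations only. This is a sketch under stated assumptions, not the libstdc++ code: `bin_sort` is a hypothetical free function, and a std::vector that grows on demand stands in for _ListSortBuf (so there is no 64-entry cap and no bad_alloc on overflow). The point it illustrates is that the carry list and every temporary list are constructed from the source list's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Bottom-up merge sort in the style of list::sort: bin i holds either
// nothing or a sorted run, and runs are merged upwards like a binary
// counter carrying.
template <typename T, typename Alloc>
void bin_sort(std::list<T, Alloc>& lst) {
    typedef std::list<T, Alloc> list_type;
    if (lst.size() < 2)
        return;  // nothing to do for length 0 or 1

    list_type carry(lst.get_allocator());
    std::vector<list_type> bins;  // stand-in for the fixed-size buffer

    do {
        // Take one element off the front of the input.
        carry.splice(carry.begin(), lst, lst.begin());

        std::size_t i = 0;
        for (; i != bins.size() && !bins[i].empty(); ++i) {
            bins[i].merge(carry);  // merge keeps both runs sorted
            carry.swap(bins[i]);   // carry now holds the merged run
        }
        if (i == bins.size())
            bins.push_back(list_type(lst.get_allocator()));
        carry.swap(bins[i]);       // park the run in the first empty bin
    } while (!lst.empty());

    // Fold all bins into the last one, then hand the result back.
    for (std::size_t i = 1; i < bins.size(); ++i)
        bins[i].merge(bins[i - 1]);
    lst.swap(bins.back());
}
```

Indices are used instead of pointers into the buffer because a growing vector may relocate its elements, a problem the fixed-size array in the patch does not have.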
Re: [PATCH 2/2][ARM] fix movdi expander to avoid illegal ldrd/strd
On 03/07/15 16:27, Alan Lawrence wrote: The previous patch caused a regression in gcc.c-torture/execute/20040709-1.c at -O0 (only), and the new align_rec2.c test fails, both outputting an illegal assembler instruction (ldrd on an odd-numbered reg) from output_move_double in arm.c. Most routes have checks against such an illegal instruction, but expanding a function call can directly name such impossible register (pairs), bypassing the normal checks. gcc/ChangeLog: * config/arm/arm.md (movdi): Avoid odd-number ldrd/strd in ARM state. OK. R. arm_overalign_2.patch diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 164ac13a26289bf755c89e78a8a5f751883c6039..c6718282d2555f8cf9a4e9111b1393e1f7704983 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -5415,6 +5415,42 @@ if (!REG_P (operands[0])) operands[1] = force_reg (DImode, operands[1]); } + if (REG_P (operands[0]) && REGNO (operands[0]) < FIRST_VIRTUAL_REGISTER + && !HARD_REGNO_MODE_OK (REGNO (operands[0]), DImode)) +{ + /* Avoid LDRD's into an odd-numbered register pair in ARM state + when expanding function calls. */ + gcc_assert (can_create_pseudo_p ()); + if (MEM_P (operands[1]) && MEM_VOLATILE_P (operands[1])) + { + /* Perform load into legal reg pair first, then move. */ + rtx reg = gen_reg_rtx (DImode); + emit_insn (gen_movdi (reg, operands[1])); + operands[1] = reg; + } + emit_move_insn (gen_lowpart (SImode, operands[0]), + gen_lowpart (SImode, operands[1])); + emit_move_insn (gen_highpart (SImode, operands[0]), + gen_highpart (SImode, operands[1])); + DONE; +} + else if (REG_P (operands[1]) && REGNO (operands[1]) < FIRST_VIRTUAL_REGISTER + && !HARD_REGNO_MODE_OK (REGNO (operands[1]), DImode)) +{ + /* Avoid LDRD's into an odd-numbered register pair in ARM state + when expanding function prologue. */ + gcc_assert (can_create_pseudo_p ()); + rtx split_dest = (MEM_P (operands[0]) && MEM_VOLATILE_P (operands[0])) +? 
gen_reg_rtx (DImode) +: operands[0]; + emit_move_insn (gen_lowpart (SImode, split_dest), + gen_lowpart (SImode, operands[1])); + emit_move_insn (gen_highpart (SImode, split_dest), + gen_highpart (SImode, operands[1])); + if (split_dest != operands[0]) + emit_insn (gen_movdi (operands[0], split_dest)); + DONE; +} )
Re: [patch] libstdc++/66742 use allocators correctly in list::sort()
2015-07-03 17:51 GMT+02:00 Jonathan Wakely jwak...@redhat.com: As well as reducing the number of lists we construct when sorting this also allows us to range-check and ensure we don't overflow the fixed-size array (we now get an exception if that happens, although that's probably not possible even on a 64-bit machine). Unfortunately this seems to hurt performance, presumably the extra indirections to the _ListSortBuf rather than just an array of lists confuse the optimisers. Does anyone see any better solution to this? (other than rewriting the whole sort function, which I think has been proposed) I have not yet thought about better solutions, but: - Isn't it necessary to cope with possibly final allocators when unconditionally forming the derived member class struct _Impl : allocator_type ? Maybe you could just define that as a non-deriving aggregate? - Daniel
Re: [patch] libstdc++/66742 use allocators correctly in list::sort()
On 03/07/15 18:56 +0200, Daniel Krügler wrote: - Isn't it necessary to cope with possibly final allocators when unconditionally forming the derived member class struct _Impl : allocator_type If the allocator was final we couldn't even instantiate std::list because of this in _List_base: struct _List_impl : public _Node_alloc_type { This is https://gcc.gnu.org/PR60921 and I'm going to fix it everywhere, so I'm not concerned about this one place yet.
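[Editorial sketch, not libstdc++ code.] The non-deriving alternative Daniel mentions could look roughly like this, assuming C++14 for std::is_final. The names impl_by_member and final_alloc are hypothetical, and storing the allocator as a member gives up the empty-base optimisation that deriving from it normally buys.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical wrapper: holds the allocator as a member instead of deriving
// from it, so a final allocator type is acceptable.
template <typename Alloc>
struct impl_by_member {
    Alloc alloc;      // stored by value; no inheritance involved
    int payload = 0;  // stands in for the rest of the implementation state
};

// A stateless allocator that cannot be used as a base class.
template <typename T>
struct final_alloc final {
    using value_type = T;
    T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

// struct broken : final_alloc<int> {};  // would not compile: base is final

bool member_form_works() {
    impl_by_member<final_alloc<int>> impl;
    int* p = impl.alloc.allocate(1);
    *p = 42;
    bool ok = (*p == 42);
    impl.alloc.deallocate(p, 1);
    return ok && std::is_final<final_alloc<int>>::value;
}
```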
Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute
On July 3, 2015 6:11:13 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote: On 03/07/15 16:26, Alan Lawrence wrote: These include tests of structs, scalars, and vectors - only general-purpose registers are affected by the ABI rules for alignment, but we can restrict the vector test to use the base AAPCS. Prior to this patch, align2.c, align3.c and align_rec1.c were failing (the latter showing an internal inconsistency, the first two merely that GCC did not obey the new ABI). With this patch, the align_rec2.c fails, and also gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a latent bug where we can emit strd/ldrd on an odd-numbered register in ARM state, fixed by the second patch. gcc/ChangeLog: * config/arm/arm.c (arm_needs_doubleword_align): Drop any outer alignment attribute, exploring one level down for aggregates. gcc/testsuite/ChangeLog: * gcc.target/arm/aapcs/align1.c: New. * gcc.target/arm/aapcs/align_rec1.c: New. * gcc.target/arm/aapcs/align2.c: New. * gcc.target/arm/aapcs/align_rec2.c: New. * gcc.target/arm/aapcs/align3.c: New. * gcc.target/arm/aapcs/align_rec3.c: New. * gcc.target/arm/aapcs/align4.c: New. * gcc.target/arm/aapcs/align_rec4.c: New. * gcc.target/arm/aapcs/align_vararg1.c: New. * gcc.target/arm/aapcs/align_vararg2.c: New. 
arm_overalign_1.patch diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 04663999224c8c8eb8e2d10b0ec634db6ce5027e..ee57d30617a2f7e1cd63ca013fe5655a01027581 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -6020,8 +6020,17 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype, static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (!type) +return PARM_BOUNDARY GET_MODE_ALIGNMENT (mode); + + if (!AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; + + for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) +if (DECL_ALIGN (field) PARM_BOUNDARY) + return true; + Is this behavior correct for unions or aggregates with record or union members? Technically this is incorrect since AGGREGATE_TYPE_P includes ARRAY_TYPE and ARRAY_TYPE doesn't have TYPE_FIELDS. I doubt we could reach that case though (unless there's a language that allows passing arrays by value). For array types I think you need to check TYPE_ALIGN (TREE_TYPE (type)). R. + return false; } diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align1.c b/gcc/testsuite/gcc.target/arm/aapcs/align1.c new file mode 100644 index ..8981d57c3eaf0bd89d224bec79ff8a45627a0a89 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align1.c @@ -0,0 +1,29 @@ +/* Test AAPCS layout (alignment). 
*/ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options -O } */ + +#ifndef IN_FRAMEWORK +#define TESTFILE align1.c + +typedef __attribute__((aligned (8))) int alignedint; + +alignedint a = 11; +alignedint b = 13; +alignedint c = 17; +alignedint d = 19; +alignedint e = 23; +alignedint f = 29; + +#include abitest.h +#else + ARG (alignedint, a, R0) + /* Attribute suggests R2, but we should use only natural alignment: */ + ARG (alignedint, b, R1) + ARG (alignedint, c, R2) + ARG (alignedint, d, R3) + ARG (alignedint, e, STACK) + /* Attribute would suggest STACK + 8 but should be ignored: */ + LAST_ARG (alignedint, f, STACK + 4) +#endif diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align2.c b/gcc/testsuite/gcc.target/arm/aapcs/align2.c new file mode 100644 index ..992da53c606c793f25278152406582bb993719d2 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align2.c @@ -0,0 +1,30 @@ +/* Test AAPCS layout (alignment). */ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options -O } */ + +#ifndef IN_FRAMEWORK +#define TESTFILE align2.c + +/* The underlying struct here has alignment 4. */ +typedef struct __attribute__((aligned (8))) + { +int x; +int y; + } overaligned; + +/* A couple of instances, at 8-byte-aligned memory locations. */ +overaligned a = { 2, 3 }; +overaligned b = { 5, 8 }; + +#include abitest.h +#else + ARG (int, 7, R0) + /* Alignment should be 4. */ + ARG (overaligned, a, R1) + ARG (int, 9, R3) + ARG (int, 10, STACK) + /* Alignment should be 4. */ + LAST_ARG (overaligned, b, STACK + 4) +#endif diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align3.c b/gcc/testsuite/gcc.target/arm/aapcs/align3.c new file mode 100644 index ..81ad3f587a95aae52ec601ce5a60b198e5351edf --- /dev/null +++
Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute
On 03/07/15 16:26, Alan Lawrence wrote: These include tests of structs, scalars, and vectors - only general-purpose registers are affected by the ABI rules for alignment, but we can restrict the vector test to use the base AAPCS. Prior to this patch, align2.c, align3.c and align_rec1.c were failing (the latter showing an internal inconsistency, the first two merely that GCC did not obey the new ABI). With this patch, the align_rec2.c fails, and also gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a latent bug where we can emit strd/ldrd on an odd-numbered register in ARM state, fixed by the second patch. gcc/ChangeLog: * config/arm/arm.c (arm_needs_doubleword_align): Drop any outer alignment attribute, exploring one level down for aggregates. gcc/testsuite/ChangeLog: * gcc.target/arm/aapcs/align1.c: New. * gcc.target/arm/aapcs/align_rec1.c: New. * gcc.target/arm/aapcs/align2.c: New. * gcc.target/arm/aapcs/align_rec2.c: New. * gcc.target/arm/aapcs/align3.c: New. * gcc.target/arm/aapcs/align_rec3.c: New. * gcc.target/arm/aapcs/align4.c: New. * gcc.target/arm/aapcs/align_rec4.c: New. * gcc.target/arm/aapcs/align_vararg1.c: New. * gcc.target/arm/aapcs/align_vararg2.c: New. 
arm_overalign_1.patch diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 04663999224c8c8eb8e2d10b0ec634db6ce5027e..ee57d30617a2f7e1cd63ca013fe5655a01027581 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -6020,8 +6020,17 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype, static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (!type) +return PARM_BOUNDARY GET_MODE_ALIGNMENT (mode); + + if (!AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; + + for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) +if (DECL_ALIGN (field) PARM_BOUNDARY) + return true; + Technically this is incorrect since AGGREGATE_TYPE_P includes ARRAY_TYPE and ARRAY_TYPE doesn't have TYPE_FIELDS. I doubt we could reach that case though (unless there's a language that allows passing arrays by value). For array types I think you need to check TYPE_ALIGN (TREE_TYPE (type)). R. + return false; } diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align1.c b/gcc/testsuite/gcc.target/arm/aapcs/align1.c new file mode 100644 index ..8981d57c3eaf0bd89d224bec79ff8a45627a0a89 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align1.c @@ -0,0 +1,29 @@ +/* Test AAPCS layout (alignment). 
*/ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options -O } */ + +#ifndef IN_FRAMEWORK +#define TESTFILE align1.c + +typedef __attribute__((aligned (8))) int alignedint; + +alignedint a = 11; +alignedint b = 13; +alignedint c = 17; +alignedint d = 19; +alignedint e = 23; +alignedint f = 29; + +#include abitest.h +#else + ARG (alignedint, a, R0) + /* Attribute suggests R2, but we should use only natural alignment: */ + ARG (alignedint, b, R1) + ARG (alignedint, c, R2) + ARG (alignedint, d, R3) + ARG (alignedint, e, STACK) + /* Attribute would suggest STACK + 8 but should be ignored: */ + LAST_ARG (alignedint, f, STACK + 4) +#endif diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align2.c b/gcc/testsuite/gcc.target/arm/aapcs/align2.c new file mode 100644 index ..992da53c606c793f25278152406582bb993719d2 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align2.c @@ -0,0 +1,30 @@ +/* Test AAPCS layout (alignment). */ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options -O } */ + +#ifndef IN_FRAMEWORK +#define TESTFILE align2.c + +/* The underlying struct here has alignment 4. */ +typedef struct __attribute__((aligned (8))) + { +int x; +int y; + } overaligned; + +/* A couple of instances, at 8-byte-aligned memory locations. */ +overaligned a = { 2, 3 }; +overaligned b = { 5, 8 }; + +#include abitest.h +#else + ARG (int, 7, R0) + /* Alignment should be 4. */ + ARG (overaligned, a, R1) + ARG (int, 9, R3) + ARG (int, 10, STACK) + /* Alignment should be 4. */ + LAST_ARG (overaligned, b, STACK + 4) +#endif diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align3.c b/gcc/testsuite/gcc.target/arm/aapcs/align3.c new file mode 100644 index ..81ad3f587a95aae52ec601ce5a60b198e5351edf --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align3.c @@ -0,0 +1,42 @@ +/* Test AAPCS layout (alignment). */ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target
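[Editorial sketch, not GCC code.] The rule under review (ignore an alignment attribute on the outer type, look one level down for aggregates, and use the element type for arrays as suggested in the review) can be modelled as below. The type_model structure is a hypothetical stand-in for GCC's tree types, and the 32-bit parameter boundary is an assumption made for the illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified type model; not GCC's tree representation.
enum class kind { scalar, record, array };

struct type_model {
    kind k;
    unsigned natural_align_bits;             // alignment of the main variant, attribute ignored
    std::vector<unsigned> field_align_bits;  // declared alignment of each field (records only)
    unsigned element_align_bits;             // alignment of the element type (arrays only)
};

const unsigned parm_boundary_bits = 32;  // assumed word-sized parameter boundary

bool needs_doubleword_align(const type_model& t) {
    switch (t.k) {
    case kind::scalar:
        return t.natural_align_bits > parm_boundary_bits;
    case kind::array:
        // Arrays have no fields, so the element type decides.
        return t.element_align_bits > parm_boundary_bits;
    case kind::record:
        // One level down only: any over-aligned field forces doubleword alignment.
        for (unsigned a : t.field_align_bits)
            if (a > parm_boundary_bits)
                return true;
        return false;
    }
    return false;
}
```

Under this model the alignedint typedef from align1.c and the overaligned struct from align2.c both stay word-aligned, matching the register assignments the tests expect.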
[Patch, obvious] Guard inform with warning return value
Hi, I noticed this nit in a conditional for C++11 attributes. I'm going to commit the below as obvious. Thanks, Paolo. 2015-07-03 Paolo Carlini paolo.carl...@oracle.com * attribs.c (decl_attributes): Guard inform with the return value of the preceding warning. Index: attribs.c === --- attribs.c (revision 225384) +++ attribs.c (working copy) @@ -469,10 +469,10 @@ decl_attributes (tree *node, tree attributes, int /* This is a c++11 attribute that appertains to a type-specifier, outside of the definition of, a class type. Ignore it. */ - warning (OPT_Wattributes, "attribute ignored"); - inform (input_location, - "an attribute that appertains to a type-specifier " - "is ignored"); + if (warning (OPT_Wattributes, "attribute ignored")) + inform (input_location, + "an attribute that appertains to a type-specifier " + "is ignored"); continue; }
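[Editorial sketch, not GCC's diagnostic machinery.] The point of the guard is that a follow-up note should only appear when the warning it explains was actually emitted. A minimal model, with a hypothetical diagnostics type standing in for the real API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins: warning() reports whether the warning was actually
// emitted, inform() appends a follow-up note unconditionally.
struct diagnostics {
    bool warnings_enabled;
    std::vector<std::string> out;

    bool warning(const std::string& msg) {
        if (!warnings_enabled) return false;
        out.push_back("warning: " + msg);
        return true;
    }
    void inform(const std::string& msg) { out.push_back("note: " + msg); }
};

void report_ignored_attribute(diagnostics& d) {
    // Without the guard, a suppressed warning would still leave an orphan note.
    if (d.warning("attribute ignored"))
        d.inform("an attribute that appertains to a type-specifier is ignored");
}
```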
Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute
On Fri, Jul 03, 2015 at 04:26:02PM +0100, Alan Lawrence wrote: These include tests of structs, scalars, and vectors - only general-purpose registers are affected by the ABI rules for alignment, but we can restrict the vector test to use the base AAPCS. Prior to this patch, align2.c, align3.c and align_rec1.c were failing (the latter showing an internal inconsistency, the first two merely that GCC did not obey the new ABI). With this patch, the align_rec2.c fails, and also gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a latent bug where we can emit strd/ldrd on an odd-numbered register in ARM state, fixed by the second patch. gcc/ChangeLog: * config/arm/arm.c (arm_needs_doubleword_align): Drop any outer alignment attribute, exploring one level down for aggregates. Can you please also add the testcase from https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00278.html to your patch set? Or I can commit it separately after it is approved (if it is). Jakub
[PATCH] Fixes accidental renaming of gdb.py file (i.e. libstdc++.so.6.0.22-gdb.py)
The addition of libstdc++fs broke an inexact and fragile method in the libstdc++-v3/python makefile, so it mis-names a python script after libstdc++fs rather than libstdc++. With DESTDIR /usr/lib, toolexeclibdir ../lib, and the .so version of 6.0.21, this makefile used to install the python script to /usr/lib/libstdc++.so.6.0.21-gdb.py. Once libstdc++fs was added, this makefile installs the python script to /usr/lib/libstdc++fs.a-gdb.py. This makefile examines files named libstdc++* in DESTDIR/toolexeclibdir, excluding: symlinks; *.la files; and previous *-gdb.py files. Its comments report it is done this way because libtool hides the real names from us. This patch changes the makefile so it examines files named libstdc++.* (notice the addition of the dot.) Although this is still not an optimum method, it at least puts the makefile on the right track again. Adding the dot is more future-proof than excluding files starting with libstdc++fs, because of the possibility of future additions of similarly named libraries. The patch below is also an attachment to this email. Index: libstdc++-v3/ChangeLog === --- libstdc++-v3/ChangeLog(revision 225409) +++ libstdc++-v3/ChangeLog(working copy) @@ -1,3 +1,9 @@ +2015-07-03 Michael Darling darli...@gmail.com + +* python/Makefile.am: python script name based off libstdc++.* rather +than libstdc++*, to avoid being mis-named after libstdc++fs. +* python/Makefile.in: Regenerate. + 2015-07-03 Jonathan Wakely jwak...@redhat.com * doc/xml/manual/status_cxx2017.xml: Update status table. Index: libstdc++-v3/python/Makefile.am === --- libstdc++-v3/python/Makefile.am(revision 225409) +++ libstdc++-v3/python/Makefile.am(working copy) @@ -45,11 +45,11 @@ @$(mkdir_p) $(DESTDIR)$(toolexeclibdir) ## We want to install gdb.py as SOMETHING-gdb.py. SOMETHING is the ## full name of the final library. We want to ignore symlinks, the -## .la file, and any previous -gdb.py file. 
This is inherently -## fragile, but there does not seem to be a better option, because -## libtool hides the real names from us. +## .la file, any previous -gdb.py file, and libstdc++fs*. This is +## inherently fragile, but there does not seem to be a better option, +## because libtool hides the real names from us. @here=`pwd`; cd $(DESTDIR)$(toolexeclibdir); \ - for file in libstdc++*; do \ + for file in libstdc++.*; do \ case $$file in \ *-gdb.py) ;; \ *.la) ;; \ Index: libstdc++-v3/python/Makefile.in === --- libstdc++-v3/python/Makefile.in(revision 225409) +++ libstdc++-v3/python/Makefile.in(working copy) @@ -547,7 +547,7 @@ install-data-local: gdb.py @$(mkdir_p) $(DESTDIR)$(toolexeclibdir) @here=`pwd`; cd $(DESTDIR)$(toolexeclibdir); \ - for file in libstdc++*; do \ + for file in libstdc++.*; do \ case $$file in \ *-gdb.py) ;; \ *.la) ;; \ gcc.libstdc++-v3.python.dot.fix.patch Description: Binary data
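[Editorial sketch.] The effect of adding the dot can be checked with POSIX fnmatch, which applies shell-style glob matching comparable to the makefile's for loop. The helper name glob_matches is hypothetical; the file names are the ones quoted in the message.

```cpp
#include <cassert>
#include <fnmatch.h>

// True when a shell glob pattern matches a file name (POSIX fnmatch).
bool glob_matches(const char* pattern, const char* name) {
    return fnmatch(pattern, name, 0) == 0;
}
```

With the old pattern both libstdc++.so.6.0.21 and libstdc++fs.a match; with the new one only names that have a dot directly after "libstdc++" do, so libstdc++fs.a drops out.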
Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator
Hi Martin, Martin Liška mli...@suse.cz writes: On 07/03/2015 03:07 PM, Richard Sandiford wrote: Martin Jambor mjam...@suse.cz writes: On Fri, Jul 03, 2015 at 09:55:58AM +0100, Richard Sandiford wrote: Trevor Saunders tbsau...@tbsaunde.org writes: On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote: Martin Liška mli...@suse.cz writes: diff --git a/gcc/asan.c b/gcc/asan.c index e89817e..dabd6f1 100644 --- a/gcc/asan.c +++ b/gcc/asan.c @@ -362,20 +362,20 @@ struct asan_mem_ref /* Pool allocation new operator. */ inline void *operator new (size_t) { -return pool.allocate (); +return ::new (pool.allocate ()) asan_mem_ref (); } /* Delete operator utilizing pool allocation. */ inline void operator delete (void *ptr) { -pool.remove ((asan_mem_ref *) ptr); +pool.remove (ptr); } /* Memory allocation pool. */ - static pool_allocatorasan_mem_ref pool; + static pool_allocator pool; }; I'm probably going over old ground/wounds, sorry, but what's the benefit of having this sort of pattern? Why not simply have object_allocators and make callers use pool.allocate () and pool.remove (x) (with pool.remove calling the destructor) instead of new and delete? It feels wrong to me to tie the data type to a particular allocation object like this. Well the big question is what does allocate() do about construction? if it seems wierd for it to not call the ctor, but I'm not sure we can do a good job of forwarding args to allocate() with C++98. If you need non-default constructors then: new (pool) type (aaa, bbb)...; doesn't seem too bad. I agree object_allocator's allocate () should call the constructor. but then the pool allocator must not call placement new on the allocated memory itself because that would result in double construction. But we're talking about two different methods. 
The normal allocator object_allocator T::allocate () would use placement new and return a pointer to the new object while operator new (size_t, object_allocator T ) wouldn't call placement new and would just return a pointer to the memory. And using the pool allocator functions directly has the nice property that you can tell when a delete/remove isn't necessary because the pool itself is being cleared. Well, all these cases involve a pool with static storage lifetime right? so actually if you don't delete things in these pool they are effectively leaked. They might have a static storage lifetime now, but it doesn't seem like a good idea to hard-bake that into the interface Does that mean that operators new and delete are considered evil? Not IMO. Just that static load-time-initialized caches are not necessarily a good thing. That's effectively what the pool allocator is. (by saying that for these types you should use new and delete, but for other pool-allocated types you should use object_allocators). Depending on what kind of pool allocator you use, you will be forced to either call placement new or not, so the inconsistency will be there anyway. But how we handle argument-taking constructors is a problem that needs to be solved for the pool-allocated objects that don't use a single static type-specific pool. And once we solve that, we get consistency across all pools: - if you want a new object and argumentless construction is OK, use pool.allocate () - if you want a new object and need to pass arguments to the constructor, use new (pool) some_type (arg1, arg2, ...) Maybe I just have bad memories from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot of state that was obviously static in the old days, but that needed to become non-static to support vaguely-efficient switching between different subtargets. The same kind of thing is likely to happen again. 
I assume things like the jit would prefer not to have new global state with load-time construction. I'm not sure I follow this branch of the discussion, the allocators of any kind surely can be dynamically allocated themselves? Sure, but either (a) you keep the pools as a static part of the class and add some initialisation and finalisation code that has tendrils into all such classes or (b) you move the static pool outside of the class to some new (still global) state. Explicit pool allocation, like in the C days, gives you the option of putting the pool wherever it needs to go without relying on the principle that you can get to it from global state. Thanks, Richard Ok Richard. I've just finally understood your suggestions and I would suggest the following: + I will add a new method to object_allocator<T> that will return allocated memory (void*) (w/o calling any constructor) + object_allocator<T>::allocate will call placement new for a parameterless ctor + I will remove all overwritten operators new/delete on e.g. et_forest, ... + For these classes, I will add void*
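[Editorial sketch, not GCC's alloc-pool.h.] The two entry points discussed in this thread could be arranged roughly as below: allocate () default-constructs, while a placement operator new taking the pool hands back raw memory so the new-expression runs the constructor exactly once. Everything here is hypothetical, including the allocate_raw name and the point type, and this toy pool never recycles blocks the way a real pool allocator would.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical typed pool with two entry points.
template <typename T>
class object_allocator {
public:
    ~object_allocator() {
        for (void* p : blocks_) ::operator delete(p);
    }

    // Raw memory only; no constructor runs.
    void* allocate_raw() {
        void* p = ::operator new(sizeof(T));
        blocks_.push_back(p);
        return p;
    }

    // Memory plus default construction via placement new.
    T* allocate() { return ::new (allocate_raw()) T(); }

    // Runs the destructor; memory is reclaimed when the pool is destroyed.
    void remove(T* obj) { obj->~T(); }

private:
    std::vector<void*> blocks_;
};

// Placement form for constructors that take arguments: returns raw memory,
// and the new-expression itself runs the constructor.
template <typename T>
void* operator new(std::size_t, object_allocator<T>& pool) {
    return pool.allocate_raw();
}

// Matching placement delete, used only if the constructor throws.
// The pool keeps ownership of the block, so there is nothing to free here.
template <typename T>
void operator delete(void*, object_allocator<T>&) {}

struct point {
    int x, y;
    point() : x(0), y(0) {}
    point(int x_, int y_) : x(x_), y(y_) {}
};
```

Usage would then be pool.allocate () for argumentless construction and new (pool) point (3, 4) otherwise, with pool.remove (p) in both cases.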
Re: Fix PR52482, libitm compilation in OSX ppc with old cctools
On Jul 3, 2015, at 4:16 AM, Carlos Sánchez de La Lama csanchez...@gmail.com wrote: PR52482 seems to be caused by old gas not supporting named parameters in macros. Xcode-2.5 (last available for OSX PPC) gas version is 1.38. Patch is against gcc-4.8.4, but affected lines have not changed in SVN HEAD. Ok. I dropped this into all active release branches as well. If anyone spots any problems with the change, let me know. If you do a test suite run, feel free to email it to the test results list.
[PR66726] Factor conversion out of COND_EXPR
Please find a patch that attempts to fix PR66726 by factoring conversion out of COND_EXPR as explained in the PR. Bootstrapped and regression tested on x86-64-none-linux-gnu with no new regressions. Is this OK for trunk? Thanks, Kugan gcc/testsuite/ChangeLog: 2015-07-03 Kugan Vivekanandarajah kug...@linaro.org Jeff Law l...@redhat.com PR middle-end/66726 * gcc.dg/tree-ssa/pr66726.c: New test. gcc/ChangeLog: 2015-07-03 Kugan Vivekanandarajah kug...@linaro.org PR middle-end/66726 * tree-ssa-phiopt.c (factor_out_conditional_conversion): New function. (tree_ssa_phiopt_worker): Call factor_out_conditional_conversion. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr66726.c b/gcc/testsuite/gcc.dg/tree-ssa/pr66726.c index e69de29..b636c8f 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/pr66726.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr66726.c @@ -0,0 +1,13 @@ + +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-phiopt2" } */ + +extern unsigned short mode_size[]; +int +oof (int mode) +{ + return (64 < mode_size[mode] ? 64 : mode_size[mode]); +} + +/* { dg-final { scan-tree-dump-times "MIN_EXPR" 1 "phiopt2" } } */ + diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c index d2a5cee..e8af086 100644 --- a/gcc/tree-ssa-phiopt.c +++ b/gcc/tree-ssa-phiopt.c @@ -73,6 +73,7 @@ along with GCC; see the file COPYING3. 
If not see static unsigned int tree_ssa_phiopt_worker (bool, bool); static bool conditional_replacement (basic_block, basic_block, edge, edge, gphi *, tree, tree); +static bool factor_out_conditional_conversion (edge, edge, gphi *, tree, tree); static int value_replacement (basic_block, basic_block, edge, edge, gimple, tree, tree); static bool minmax_replacement (basic_block, basic_block, @@ -342,6 +343,8 @@ tree_ssa_phiopt_worker (bool do_store_elim, bool do_hoist_loads) cfgchanged = true; else if (minmax_replacement (bb, bb1, e1, e2, phi, arg0, arg1)) cfgchanged = true; + else if (factor_out_conditional_conversion (e1, e2, phi, arg0, arg1)) + cfgchanged = true; } } @@ -410,6 +413,108 @@ replace_phi_edge_with_variable (basic_block cond_block, bb-index); } +/* PR66726: Factor conversion out of COND_EXPR. If the argument of the PHI + stmt are CONVERT_STMT, factor out the conversion and perform the conversion + to the result of PHI stmt. */ + +static bool +factor_out_conditional_conversion (edge e0, edge e1, gphi *phi, + tree arg0, tree arg1) +{ + gimple def0 = NULL, def1 = NULL, new_stmt; + tree new_arg0 = NULL_TREE, new_arg1 = NULL_TREE; + tree temp, result; + gimple_stmt_iterator gsi; + + /* One of the argument has to be SSA_NAME and other argument can + be an SSA_NAME of INTEGER_CST. */ + if ((TREE_CODE (arg0) != SSA_NAME +TREE_CODE (arg0) != INTEGER_CST) + || (TREE_CODE (arg1) != SSA_NAME + TREE_CODE (arg1) != INTEGER_CST) + || (TREE_CODE (arg0) == INTEGER_CST + TREE_CODE (arg1) == INTEGER_CST)) +return false; + + /* Handle only PHI statements with two arguments. TODO: If all + other arguments to PHI are INTEGER_CST, we can handle more + than two arguments too. */ + if (gimple_phi_num_args (phi) != 2) +return false; + + /* If arg0 is an SSA_NAME and the stmt which defines arg0 is + ai CONVERT_STMT, use the LHS as new_arg0. 
*/ + if (TREE_CODE (arg0) == SSA_NAME) +{ + def0 = SSA_NAME_DEF_STMT (arg0); + if (!is_gimple_assign (def0) + || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def0))) + return false; + new_arg0 = gimple_assign_rhs1 (def0); +} + + /* If arg1 is an SSA_NAME and the stmt which defines arg0 is + ai CONVERT_STMT, use the LHS as new_arg1. */ + if (TREE_CODE (arg1) == SSA_NAME) +{ + def1 = SSA_NAME_DEF_STMT (arg1); + if (!is_gimple_assign (def1) + || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def1))) + return false; + new_arg1 = gimple_assign_rhs1 (def1); +} + + /* If arg0 is an INTEGER_CST, fold it to new type. */ + if (TREE_CODE (arg0) != SSA_NAME) +{ + if (!POINTER_TYPE_P (TREE_TYPE (new_arg1)) + int_fits_type_p (arg0, TREE_TYPE (new_arg1))) + new_arg0 = fold_convert (TREE_TYPE (new_arg1), arg0); + else + return false; +} + + /* If arg1 is an INTEGER_CST, fold it to new type. */ + if (TREE_CODE (arg1) != SSA_NAME) +{ + if (!POINTER_TYPE_P (TREE_TYPE (new_arg0)) + int_fits_type_p (arg1, TREE_TYPE (new_arg0))) + new_arg1 = fold_convert (TREE_TYPE (new_arg0), arg1); + else + return false; +} + + /* If types of new_arg0 and new_arg1 are different bailout. */ + if (TREE_TYPE (new_arg0) != TREE_TYPE (new_arg1)) +return false; + + /* Replace the PHI stmt with the new_arg0 and new_arg1. Also insert +
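[Editorial sketch at source level, not GIMPLE.] The transformation rests on the observation that, when the constant fits in the narrow type, selecting first and converting once gives the same value as converting on each arm. The function names below are hypothetical; the shape follows the oof test case from the patch.

```cpp
#include <cassert>

// Before: the widening to int is attached to the variable arm of the select.
int convert_each_arm(unsigned short v) {
    return 64 < v ? 64 : static_cast<int>(v);
}

// After: select in the narrow type, then convert the single result once.
// This is only valid because 64 fits in unsigned short, which corresponds
// to the int_fits_type_p check in the patch.
int convert_result(unsigned short v) {
    unsigned short narrow = 64 < v ? static_cast<unsigned short>(64) : v;
    return static_cast<int>(narrow);
}
```

Once the select operates on a single narrow type, it has the plain minimum shape that the test expects to see reported as MIN_EXPR.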
[PATCH] PR fortran/66725 -- Fix multiple ICEs
It seems that the matching of various specifiers in OPEN, CLOSE, and WRITE was written with much confidence that the user would not do something stupid. The attached patch fixes multiple ICEs. Regression tested on i386-*-freebsd. OK to commit? PS: There are other ICEs caused by ill-formed specifiers. This patch does not address those. 2015-07-03 Steven G. Kargl ka...@gcc.gnu.org * io.c (is_char_type): New function to test for BT_CHARACTER. (gfc_match_open, gfc_match_close, match_dt_element): Use it. 2015-07-03 Steven G. Kargl ka...@gcc.gnu.org * gfortran.dg/pr66725.f90: New test. -- Steve Index: gcc/fortran/io.c === --- gcc/fortran/io.c (revision 225367) +++ gcc/fortran/io.c (working copy) @@ -1242,6 +1242,19 @@ gfc_match_format (void) } +static bool +is_char_type (const char *name, gfc_expr *e) +{ + if (e->ts.type != BT_CHARACTER) +{ + gfc_error ("%s requires a scalar-default-char-expr at %L", + name, &e->where); + return false; +} + return true; +} + + /* Match an expression I/O tag of some sort. 
*/ static match @@ -1870,6 +1883,9 @@ gfc_match_open (void) static const char *access_f2003[] = { STREAM, NULL }; static const char *access_gnu[] = { APPEND, NULL }; + if (!is_char_type (ACCESS, open-access)) + goto cleanup; + if (!compare_to_allowed_values (ACCESS, access_f95, access_f2003, access_gnu, open-access-value.character.string, @@ -1882,6 +1898,9 @@ gfc_match_open (void) { static const char *action[] = { READ, WRITE, READWRITE, NULL }; + if (!is_char_type (ACTION, open-action)) + goto cleanup; + if (!compare_to_allowed_values (ACTION, action, NULL, NULL, open-action-value.character.string, OPEN, warn)) @@ -1895,6 +1914,9 @@ gfc_match_open (void) not allowed in Fortran 95)) goto cleanup; + if (!is_char_type (ASYNCHRONOUS, open-asynchronous)) + goto cleanup; + if (open-asynchronous-expr_type == EXPR_CONSTANT) { static const char * asynchronous[] = { YES, NO, NULL }; @@ -1913,6 +1935,9 @@ gfc_match_open (void) not allowed in Fortran 95)) goto cleanup; + if (!is_char_type (BLANK, open-blank)) + goto cleanup; + if (open-blank-expr_type == EXPR_CONSTANT) { static const char *blank[] = { ZERO, NULL, NULL }; @@ -1931,6 +1956,9 @@ gfc_match_open (void) not allowed in Fortran 95)) goto cleanup; + if (!is_char_type (DECIMAL, open-decimal)) + goto cleanup; + if (open-decimal-expr_type == EXPR_CONSTANT) { static const char * decimal[] = { COMMA, POINT, NULL }; @@ -1949,6 +1977,9 @@ gfc_match_open (void) { static const char *delim[] = { APOSTROPHE, QUOTE, NONE, NULL }; + if (!is_char_type (DELIM, open-delim)) + goto cleanup; + if (!compare_to_allowed_values (DELIM, delim, NULL, NULL, open-delim-value.character.string, OPEN, warn)) @@ -1962,7 +1993,10 @@ gfc_match_open (void) if (!gfc_notify_std (GFC_STD_F2003, ENCODING= at %C not allowed in Fortran 95)) goto cleanup; - + + if (!is_char_type (ENCODING, open-encoding)) + goto cleanup; + if (open-encoding-expr_type == EXPR_CONSTANT) { static const char * encoding[] = { DEFAULT, UTF-8, NULL }; @@ -1979,6 +2013,9 @@ 
gfc_match_open (void) { static const char *form[] = { FORMATTED, UNFORMATTED, NULL }; + if (!is_char_type (FORM, open-form)) + goto cleanup; + if (!compare_to_allowed_values (FORM, form, NULL, NULL, open-form-value.character.string, OPEN, warn)) @@ -1990,6 +2027,9 @@ gfc_match_open (void) { static const char *pad[] = { YES, NO, NULL }; + if (!is_char_type (PAD, open-pad)) + goto cleanup; + if (!compare_to_allowed_values (PAD, pad, NULL, NULL, open-pad-value.character.string, OPEN, warn)) @@ -2001,6 +2041,9 @@ gfc_match_open (void) { static const char *position[] = { ASIS, REWIND, APPEND, NULL }; + if (!is_char_type (POSITION, open-position)) + goto cleanup; + if (!compare_to_allowed_values (POSITION, position, NULL, NULL, open-position-value.character.string, OPEN, warn)) @@ -2014,6 +2057,9 @@ gfc_match_open (void) not allowed in Fortran 95)) goto cleanup; + if (!is_char_type (ROUND, open-round)) + goto cleanup; + if (open-round-expr_type == EXPR_CONSTANT) { static const char * round[] = { UP, DOWN, ZERO, NEAREST, @@ -2034,6 +2080,9 @@ gfc_match_open (void) not allowed in Fortran 95)) goto cleanup; + if (!is_char_type (SIGN, open-sign)) + goto cleanup; + if (open-sign-expr_type == EXPR_CONSTANT) { static const char * sign[] = { PLUS, SUPPRESS, PROCESSOR_DEFINED, @@ -2071,6 +2120,9 @@ gfc_match_open (void) static const char *status[] = { OLD, NEW, SCRATCH, REPLACE,
[PATCH] MIPS: fix failing branch range checks for micromips
Hi, The current branch range tests assume that the MIPS branch instructions have a 16-bit branch offset which is shifted by 2. Unfortunately for microMIPS this offset is shifted by 1, which reduces the branch range and is causing the branch-[2,4,6,10,12].c tests to fail. The following patch fixes this issue by firstly adding a new macro to branch-helper.h which outputs the correct number of nops to describe the maximum positive range of a 16-bit microMIPS branch offset (assuming the branch instruction has a delay slot). Secondly it breaks up the branch-[2,4,6,10,12].c files into MIPS tests (which have -mno-micromips added to them) and microMIPS tests (which use the new macro). I have tested this on the mips-mti-elf target using mips32r2/{-mno-micromips/-mmicromips} test options and there are no new regressions. There is a follow-up patch that I will be working on that will update the other branch tests to correctly test out-of-range branch behaviour for microMIPS. Currently these are passing because the MIPS branch range offset is large enough. These offsets will need to be reduced for microMIPS to verify the compiler is calculating branch ranges correctly. The ChangeLog and patch are below. Ok to commit? Many thanks, Andrew testsuite/ * gcc.target/mips/branch-2.c: Add -mno-micromips to dg-options. * gcc.target/mips/branch-4.c: Ditto. * gcc.target/mips/branch-6.c: Ditto. * gcc.target/mips/branch-8.c: Ditto. * gcc.target/mips/branch-10.c: Ditto. * gcc.target/mips/branch-12.c: Ditto. * gcc.target/mips/branch-umips-2.c: New file. * gcc.target/mips/branch-umips-4.c: New file. * gcc.target/mips/branch-umips-6.c: New file. * gcc.target/mips/branch-umips-8.c: New file. * gcc.target/mips/branch-umips-10.c: New file. * gcc.target/mips/branch-umips-12.c: New file. * gcc.target/mips/branch-helper.h (OCCUPY_0xfffc): New define. 
diff --git a/gcc/testsuite/gcc.target/mips/branch-10.c b/gcc/testsuite/gcc.target/mips/branch-10.c index e2b1b5f..00569b0 100644 --- a/gcc/testsuite/gcc.target/mips/branch-10.c +++ b/gcc/testsuite/gcc.target/mips/branch-10.c @@ -1,4 +1,4 @@ -/* { dg-options -mshared -mabi=n32 } */ +/* { dg-options -mshared -mabi=n32 -mno-micromips } */ /* { dg-final { scan-assembler-not (\\\$28|%gp_rel|%got) } } */ /* { dg-final { scan-assembler-not \tjr\t\\\$1\n } } */ diff --git a/gcc/testsuite/gcc.target/mips/branch-12.c b/gcc/testsuite/gcc.target/mips/branch-12.c index 4aef160..7d7580b 100644 --- a/gcc/testsuite/gcc.target/mips/branch-12.c +++ b/gcc/testsuite/gcc.target/mips/branch-12.c @@ -1,4 +1,4 @@ -/* { dg-options -mshared -mabi=64 } */ +/* { dg-options -mshared -mabi=64 -mno-micromips } */ /* { dg-final { scan-assembler-not (\\\$28|%gp_rel|%got) } } */ /* { dg-final { scan-assembler-not \tjr\t\\\$1\n } } */ diff --git a/gcc/testsuite/gcc.target/mips/branch-2.c b/gcc/testsuite/gcc.target/mips/branch-2.c index 6409c4c..241e885 100644 --- a/gcc/testsuite/gcc.target/mips/branch-2.c +++ b/gcc/testsuite/gcc.target/mips/branch-2.c @@ -1,4 +1,4 @@ -/* { dg-options -mshared -mabi=32 } */ +/* { dg-options -mshared -mabi=32 -mno-micromips } */ /* { dg-final { scan-assembler-not (\\\$25|\\\$28|cpload) } } */ /* { dg-final { scan-assembler-not \tjr\t\\\$1\n } } */ /* { dg-final { scan-assembler-not \\.cprestore } } */ diff --git a/gcc/testsuite/gcc.target/mips/branch-4.c b/gcc/testsuite/gcc.target/mips/branch-4.c index 31e4909..923e6d4 100644 --- a/gcc/testsuite/gcc.target/mips/branch-4.c +++ b/gcc/testsuite/gcc.target/mips/branch-4.c @@ -1,4 +1,4 @@ -/* { dg-options -mshared -mabi=n32 } */ +/* { dg-options -mshared -mabi=n32 -mno-micromips } */ /* { dg-final { scan-assembler-not (\\\$25|\\\$28|%gp_rel|%got) } } */ /* { dg-final { scan-assembler-not \tjr\t\\\$1\n } } */ diff --git a/gcc/testsuite/gcc.target/mips/branch-6.c b/gcc/testsuite/gcc.target/mips/branch-6.c index 
77e0340..2c75ab1 100644 --- a/gcc/testsuite/gcc.target/mips/branch-6.c +++ b/gcc/testsuite/gcc.target/mips/branch-6.c @@ -1,4 +1,4 @@ -/* { dg-options -mshared -mabi=64 } */ +/* { dg-options -mshared -mabi=64 -mno-micromips } */ /* { dg-final { scan-assembler-not (\\\$25|\\\$28|%gp_rel|%got) } } */ /* { dg-final { scan-assembler-not \tjr\t\\\$1\n } } */ diff --git a/gcc/testsuite/gcc.target/mips/branch-8.c b/gcc/testsuite/gcc.target/mips/branch-8.c index ba5f954..85df6b8 100644 --- a/gcc/testsuite/gcc.target/mips/branch-8.c +++ b/gcc/testsuite/gcc.target/mips/branch-8.c @@ -1,4 +1,4 @@ -/* { dg-options -mshared -mabi=32 } */ +/* { dg-options -mshared -mabi=32 -mno-micromips } */ /* { dg-final { scan-assembler-not (\\\$28|cpload|cprestore) } } */ /* { dg-final { scan-assembler-not \tjr\t\\\$1\n } } */ diff --git a/gcc/testsuite/gcc.target/mips/branch-helper.h b/gcc/testsuite/gcc.target/mips/branch-helper.h index 85399be..bc4a31f 100644 ---
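A minimal sketch of the range arithmetic behind this patch, assuming only what the message states: a signed 16-bit offset field, scaled by 4 on classic MIPS and by 2 on microMIPS. The function and constant names are invented for illustration and do not appear in the patch; the exact delay-slot accounting used by OCCUPY_0xfffc is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Largest positive byte displacement encodable in a signed OFFSET_BITS-bit
// field that the hardware shifts left by SHIFT bits.
constexpr std::int64_t max_forward_bytes (int offset_bits, int shift)
{
  return ((std::int64_t {1} << (offset_bits - 1)) - 1) << shift;
}

// Most negative byte displacement for the same encoding.
constexpr std::int64_t max_backward_bytes (int offset_bits, int shift)
{
  return -(std::int64_t {1} << (offset_bits - 1)) * (std::int64_t {1} << shift);
}

// Classic MIPS: 16-bit field, shifted by 2 (0x1fffc bytes forward).
constexpr std::int64_t mips_forward = max_forward_bytes (16, 2);

// microMIPS: 16-bit field, shifted by 1 (0xfffe bytes forward).
constexpr std::int64_t micromips_forward = max_forward_bytes (16, 1);
```

The forward reach halves when the shift drops from 2 to 1, which would explain why filler sized for the classic encoding pushes a microMIPS branch out of range.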
[gomp] Move openacc vector worker single handling to RTL
This patch reorganizes the handling of vector and worker single modes and their transitions to/from partitioned mode out of omp-low and into mach-dep-reorg. That allows the regular middle end optimizers to behave normally -- with two exceptions, see below. There are no libgomp regressions, and a number of progressions -- mainly private variables now 'just work'.

The approach taken is to have expand_omp_for_static_(no)chunk emit OpenACC builtins at the start and end of the loop -- the points where execution should transition into a partitioned mode and back to single mode. I've actually used a single builtin with a constant argument to say whether it is the head or tail of the loop. You could consider these to be like 'fork' and 'join' primitives, if that helps. We cope with multi-mode loops (say, over worker and vector dimensions) by emitting two loop heads and tails in nested sequence, i.e. 'head-worker, head-vector, loop, tail-vector, tail-worker'. Thus at a transition we only have to consider one particular axis.

These builtins are made known to the duplication and merging optimizations as not-to-be duplicated or merged (see builtin_unique_p). For instance, the jump threading optimizer already has to check operations on the potentially threaded path as suitable for duplication, and this is an additional test there. The tail-merging optimizer similarly has to determine that tails are identical, and that is never true for this particular builtin. The intent is that the loops are then maintained as single-entry-single-exit all the way through to RTL expansion.

Where and when these builtins are expanded to target specific code is not fixed. In the case of PTX they go all the way to RTL expansion.

At RTL expansion the builtins are expanded to volatile unspecs. We insert 'pre' markers too, as some code needs to know the last instruction before the transition. These are uncopyable, and AFAICT RTL doesn't do tail merging (or at least I've not encountered it), so again these cause the SESE nature of the loop to be preserved all the way to mach dep reorg.

That's where the fun starts. We scan the CFG looking for the loop markers. First we break basic blocks so the head and tail markers are the first insns of their block. That prevents us needing a mode transition mid block. We then rescan the graph, discovering loops and adding each block to the loop in which it resides. The entire function is modeled as a NULL loop.

Once that is done we walk the loop structure and insert state propagation code at the loop head points. For vector propagation that'll be a sequence of PTX shuffle instructions. For worker propagation it is a bit more complicated. At the pre-head marker, we insert a spill of state to .shared memory (executed by the single active worker) and at the head marker we insert a fill (executed by all workers). We also insert a sync barrier before the fill. More on where that memory comes from later.

Finally we walk the loop structure again, inserting block or loop neutering code. Where possible we try to skip entire blocks[*], but the basic approach is the same. We insert a branch-around at the start of the initial block and, if needed, insert propagation code at the end of the final block (which might be the same block). The vector-propagation case is again a simple shuffle, but the worker case is a spill/sync/fill sequence, with the spill done by the single active worker. The subsequent unified branch is marked with an unspec operand, rather than relying on detecting the data flow.

Note, the branch-around is inserted using hidden branches that appear to the rest of the compiler as volatile unspecs referring to a later label. I don't think the expense of creating new blocks is necessary or worthwhile -- this is flow control the compiler doesn't need to know about (if it did, I argue that we're inserting this too early).

The worker spill/fill storage is a file-scope array variable, sized during compilation and emitted directly at the end of the compilation process. Again, this is not registered with the rest of the compiler: (a) I wasn't sure how to, and (b) I considered this an internal bit of the backend. It is shared by all functions in this TU. Unfortunately PTX doesn't appear to support COMMON, so making it shared across all TUs appears difficult -- one can always use LTO optimization anyway.

IMHO this is a step towards putting target-dependent handling in the target compiler and out of the more generic host-side compiler.

The changelog is separated into 3 parts - a) general infrastructure - b) additions - c) deletions.

comments?

nathan

[*] a possible optimization is to do superblock discovery, and skip those in a similar manner to loop skipping.

2015-07-02  Nathan Sidwell  nat...@codesourcery.com

	Infrastructure:
	* builtins.h (builtin_unique_p): Declare.
	* builtins.c
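The nested head/tail ordering described in this message ('head-worker, head-vector, loop, tail-vector, tail-worker') can be modelled with a small stack-based check. This is only a sketch: the enum and function names are hypothetical and are not GCC identifiers, and the real pass works on RTL insns rather than a flat list of markers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the loop markers; names invented for illustration.
enum class axis { worker, vector };
enum class kind { head, tail };

struct marker
{
  kind k;
  axis a;
};

// Return true if every head marker is closed by a tail marker for the same
// axis, in strictly nested (last-opened, first-closed) order.
bool markers_well_nested (const std::vector<marker> &seq)
{
  std::vector<axis> open;
  for (const marker &m : seq)
    {
      if (m.k == kind::head)
        open.push_back (m.a);
      else
        {
          if (open.empty () || open.back () != m.a)
            return false;
          open.pop_back ();
        }
    }
  return open.empty ();
}
```

Because the markers nest, each transition only has to deal with the innermost open axis, which matches the remark that only one particular axis needs considering at a time.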
Re: [gomp] Move openacc vector worker single handling to RTL
On Fri, Jul 03, 2015 at 06:51:57PM -0400, Nathan Sidwell wrote:
> IMHO this is a step towards putting target-dependent handling in the target compiler and out of the more generic host-side compiler.
>
> The changelog is separated into 3 parts - a) general infrastructure - b) additions - c) deletions.
>
> comments?

Thanks for working on it.

If the builtins are not meant to be used by users directly (I assume they aren't) nor have a 1-1 correspondence to a library routine, it is much better to emit them as internal calls (see internal-fn.{c,def}) instead of BUILT_IN_NORMAL functions.

	Jakub
Re: [PATCH 1/2][ARM] PR/65956 AAPCS update for alignment attribute
On 03/07/15 19:24, Richard Biener wrote: On July 3, 2015 6:11:13 PM GMT+02:00, Richard Earnshaw richard.earns...@foss.arm.com wrote: On 03/07/15 16:26, Alan Lawrence wrote: These include tests of structs, scalars, and vectors - only general-purpose registers are affected by the ABI rules for alignment, but we can restrict the vector test to use the base AAPCS. Prior to this patch, align2.c, align3.c and align_rec1.c were failing (the latter showing an internal inconsistency, the first two merely that GCC did not obey the new ABI). With this patch, the align_rec2.c fails, and also gcc.c-torture/execute/20040709-1.c at -O0 only, both because of a latent bug where we can emit strd/ldrd on an odd-numbered register in ARM state, fixed by the second patch. gcc/ChangeLog: * config/arm/arm.c (arm_needs_doubleword_align): Drop any outer alignment attribute, exploring one level down for aggregates. gcc/testsuite/ChangeLog: * gcc.target/arm/aapcs/align1.c: New. * gcc.target/arm/aapcs/align_rec1.c: New. * gcc.target/arm/aapcs/align2.c: New. * gcc.target/arm/aapcs/align_rec2.c: New. * gcc.target/arm/aapcs/align3.c: New. * gcc.target/arm/aapcs/align_rec3.c: New. * gcc.target/arm/aapcs/align4.c: New. * gcc.target/arm/aapcs/align_rec4.c: New. * gcc.target/arm/aapcs/align_vararg1.c: New. * gcc.target/arm/aapcs/align_vararg2.c: New. 
arm_overalign_1.patch diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 04663999224c8c8eb8e2d10b0ec634db6ce5027e..ee57d30617a2f7e1cd63ca013fe5655a01027581 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -6020,8 +6020,17 @@ arm_init_cumulative_args (CUMULATIVE_ARGS *pcum, tree fntype, static bool arm_needs_doubleword_align (machine_mode mode, const_tree type) { - return (GET_MODE_ALIGNMENT (mode) PARM_BOUNDARY - || (type TYPE_ALIGN (type) PARM_BOUNDARY)); + if (!type) +return PARM_BOUNDARY GET_MODE_ALIGNMENT (mode); + + if (!AGGREGATE_TYPE_P (type)) +return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)) PARM_BOUNDARY; + + for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) +if (DECL_ALIGN (field) PARM_BOUNDARY) + return true; + Is this behavior correct for unions or aggregates with record or union members? Yes, at least that was my intention. It's an error in the wording of the proposed change, which I think should say composite types not aggregate types. R. Technically this is incorrect since AGGREGATE_TYPE_P includes ARRAY_TYPE and ARRAY_TYPE doesn't have TYPE_FIELDS. I doubt we could reach that case though (unless there's a language that allows passing arrays by value). For array types I think you need to check TYPE_ALIGN (TREE_TYPE (type)). R. + return false; } diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align1.c b/gcc/testsuite/gcc.target/arm/aapcs/align1.c new file mode 100644 index ..8981d57c3eaf0bd89d224bec79ff8a45627a0a89 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align1.c @@ -0,0 +1,29 @@ +/* Test AAPCS layout (alignment). 
*/ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options -O } */ + +#ifndef IN_FRAMEWORK +#define TESTFILE align1.c + +typedef __attribute__((aligned (8))) int alignedint; + +alignedint a = 11; +alignedint b = 13; +alignedint c = 17; +alignedint d = 19; +alignedint e = 23; +alignedint f = 29; + +#include abitest.h +#else + ARG (alignedint, a, R0) + /* Attribute suggests R2, but we should use only natural alignment: */ + ARG (alignedint, b, R1) + ARG (alignedint, c, R2) + ARG (alignedint, d, R3) + ARG (alignedint, e, STACK) + /* Attribute would suggest STACK + 8 but should be ignored: */ + LAST_ARG (alignedint, f, STACK + 4) +#endif diff --git a/gcc/testsuite/gcc.target/arm/aapcs/align2.c b/gcc/testsuite/gcc.target/arm/aapcs/align2.c new file mode 100644 index ..992da53c606c793f25278152406582bb993719d2 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/aapcs/align2.c @@ -0,0 +1,30 @@ +/* Test AAPCS layout (alignment). */ + +/* { dg-do run { target arm_eabi } } */ +/* { dg-require-effective-target arm32 } */ +/* { dg-options -O } */ + +#ifndef IN_FRAMEWORK +#define TESTFILE align2.c + +/* The underlying struct here has alignment 4. */ +typedef struct __attribute__((aligned (8))) + { +int x; +int y; + } overaligned; + +/* A couple of instances, at 8-byte-aligned memory locations. */ +overaligned a = { 2, 3 }; +overaligned b = { 5, 8 }; + +#include abitest.h +#else + ARG (int, 7, R0) + /* Alignment should be 4. */ + ARG (overaligned, a, R1) + ARG (int, 9, R3) + ARG (int, 10, STACK) + /* Alignment should be 4. */ + LAST_ARG (overaligned, b, STACK + 4) +#endif diff --git
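A toy model of the decision logic in the arm_needs_doubleword_align hunk, assuming the comparisons are "greater than PARM_BOUNDARY" and that PARM_BOUNDARY is 32 bits on this target. The struct and function below are hypothetical stand-ins for GCC's tree accessors, not the real API, and the no-type (mode-only) case and the array case raised in review are left out.

```cpp
#include <cassert>
#include <vector>

// Assumed value of PARM_BOUNDARY, in bits.
const unsigned parm_boundary = 32;

// Hypothetical stand-in for a GCC type node; alignments are in bits.
struct toy_type
{
  bool is_composite;                   // struct/union rather than scalar
  unsigned natural_align;              // alignment with outer attributes dropped
  std::vector<unsigned> field_aligns;  // per-field alignment, composites only
};

// An alignment attribute on the outermost type is ignored, while an
// over-aligned member still forces doubleword alignment.
bool needs_doubleword_align (const toy_type &t)
{
  if (!t.is_composite)
    return t.natural_align > parm_boundary;

  for (unsigned a : t.field_aligns)
    if (a > parm_boundary)
      return true;
  return false;
}
```

Under this model the `alignedint` typedef from align1.c (natural alignment 32) and the `overaligned` struct from align2.c (two 32-bit members) both answer false, which is consistent with the register assignments those tests expect.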
[PATCH PR66702]Skip pr48052 on targets not support vect_int_mult
Hi,

The test failed on sparc because sparc doesn't support vect_int_mult. This patch adds the prerequisite condition, thus skipping the test on such platforms. An obvious change, will apply it in 24h.

Thanks,
bin

gcc/testsuite/ChangeLog
2015-07-02  Bin Cheng  bin.ch...@arm.com

	PR tree-optimization/66720
	* gcc.dg/vect/pr48052.c: Use dg-require-effective-target vect_int_mult.

Index: gcc/testsuite/gcc.dg/vect/pr48052.c
===
--- gcc/testsuite/gcc.dg/vect/pr48052.c (revision 225094)
+++ gcc/testsuite/gcc.dg/vect/pr48052.c (working copy)
@@ -1,9 +1,9 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O3" } */
+/* { dg-require-effective-target vect_int_mult } */
 
 int foo(int* A, int* B, unsigned start, unsigned BS)
 {
-  int s;
+  int s = 0;
   for (unsigned k = start; k < start + BS; k++)
     {
       s += A[k] * B[k];
@@ -14,7 +14,7 @@ int foo(int* A, int* B, unsigned start, unsigned
 
 int bar(int* A, int* B, unsigned BS)
 {
-  int s;
+  int s = 0;
   for (unsigned k = 0; k < BS; k++)
     {
       s += A[k] * B[k];
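For reference, a self-contained sketch of foo() as it would read after this patch. The `return s;` is an assumption, since the hunk does not show the end of the function; the loop bound `k < start + BS` is inferred from the loop shape.

```cpp
#include <cassert>

// Dot product over A[start .. start + BS), following the shape of foo() in
// pr48052.c.  The accumulator starts at zero: accumulating into an
// uninitialized `s` would read an indeterminate value.
int foo (int *A, int *B, unsigned start, unsigned BS)
{
  int s = 0;
  for (unsigned k = start; k < start + BS; k++)
    s += A[k] * B[k];
  return s;   // assumed; not visible in the diff
}
```

The multiply inside the reduction is what makes the test depend on integer vector multiplication support, hence the vect_int_mult requirement.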
Re: C++ PATCH to change default dialect to C++14
On Fri, Jul 3, 2015 at 1:41 AM, Jim Wilson jim.wil...@linaro.org wrote:
> On 07/01/2015 11:17 PM, Jim Wilson wrote:
>> On Wed, Jul 1, 2015 at 10:21 PM, Jason Merrill ja...@redhat.com wrote:
>>> This document also says that "A workaround until libraries get updated is to include cstddef or stddef.h before any headers from that library." Can you try modifying the graphite* files accordingly?
>>
>> Right. I forgot to try that. Trying it now, I see that my build gets past the point that it failed, so this does appear to work. I won't be able to finish a proper test until tomorrow, but for now this patch seems to work.
>
> Since the patch to include system.h before the isl header did not work, I went ahead and tested this patch to add stddef.h includes before the isl headers. I tested it with an x86_64 bootstrap and make check. There were no problems caused by my patch.

Ok then. I presume it might still cause issues on some hosts in the end. At some point we talked about doing something like

#define WANT_ISL_HEADERS
#include "system.h"

and including the isl headers from system.h at the appropriate location if WANT_ISL_HEADERS is defined.

Richard.

> Though as a side effect of doing this, I discovered another minor problem with the C++ version change. This caused one additional testsuite failure. It also caused a bunch of tests to start working, which is nice, but the new failure needs to be addressed.
>
> /home/wilson/FOSS/GCC/gcc-svn/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c: In function 'void test_double_int_round_udiv()':
> /home/wilson/FOSS/GCC/gcc-svn/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c:13:45: error: narrowing conversion of '-1' from 'int' to 'long unsigned int' inside { } [-Wnarrowing]
>    double_int dmax = { -1, HOST_WIDE_INT_MAX };
>                                              ^
> /home/wilson/FOSS/GCC/gcc-svn/gcc/testsuite/gcc.dg/plugin/wide-int_plugin.c:14:33: error: narrowing conversion of '-1' from 'int' to 'long unsigned int' inside { } [-Wnarrowing]
>    double_int dnegone = { -1, -1 };
>                                  ^
> ...
> FAIL: gcc.dg/plugin/wide-int_plugin.c compilation
>
> The code compiles with -std=c++98. It does not compile with -std=c++14. So this testcase should be fixed to work with C++14. Or the C++14 support should be fixed if it is broken.
>
> Jim
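A small sketch of the narrowing rule behind the reported errors, and one possible way to write such an initializer so that it is valid in both C++98 and C++14. The struct below is a hypothetical stand-in, not GCC's double_int, and the thread does not say how the testcase was eventually fixed.

```cpp
#include <cassert>
#include <climits>

// Hypothetical two-word integer; not GCC's double_int.
struct two_words
{
  unsigned long low;
  long high;
};

// C++98 accepted `{ -1, LONG_MAX }` here.  From C++11 on, converting the
// constant -1 to unsigned long inside braces is a narrowing conversion and
// the program is ill-formed.  Spelling the conversion out keeps the same
// value and is not narrowing, because the initializer already has the
// member's type.
two_words all_ones_low ()
{
  two_words w = { static_cast<unsigned long> (-1), LONG_MAX };
  return w;
}
```

The explicit cast documents that the wrap-around to the all-ones value is intended rather than accidental.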
Re: [Patch ARM-AArch64/testsuite Neon intrinsics: vget_lane
On 2 July 2015 at 14:44, Christophe Lyon christophe.l...@linaro.org wrote:
> Hi,
>
> Here is the missing test for ARM/AArch64 AdvSIMD intrinsic: vget_lane.
>
> Tested on arm, armeb, aarch64 and aarch64_be targets (using QEMU).
>
> The tests all pass, except on armeb where vgetq_lane_s64 and vgetq_lane_u64 fail. I haven't investigated in detail yet.
>
> OK for trunk?
>
> 2015-07-02  Christophe Lyon  christophe.l...@linaro.org
>
> 	* gcc.target/aarch64/advsimd-intrinsics/vget_lane.c: New testcase.

OK
/Marcus
Re: [Fortran f951, C++14] Fix trans-common.c compilation failure on AIX
On Thu, Jul 2, 2015 at 10:49 PM, Jakub Jelinek ja...@redhat.com wrote:
> On Thu, Jul 02, 2015 at 04:47:13PM -0400, David Edelsohn wrote:
>> I can change the patch to include it after system.h, if that is preferred. That order also works on AIX.
>
> If including it right after system.h works, it is preapproved.

Note that after config.h is generally better (considering all the #poison stuff in system.h).

Not using std::map but GCC's own hash_map would be preferred though. (Otherwise at some point we'll end up including all of libstdc++ from system.h given host compiler weirdness and workarounds for include stuff - which is what system.h is for.)

Richard.

> 	Jakub
Re: [Fortran f951, C++14] Fix trans-common.c compilation failure on AIX
On Fri, Jul 03, 2015 at 10:32:38AM +0200, Richard Biener wrote:
> On Thu, Jul 2, 2015 at 10:49 PM, Jakub Jelinek ja...@redhat.com wrote:
>> On Thu, Jul 02, 2015 at 04:47:13PM -0400, David Edelsohn wrote:
>>> I can change the patch to include it after system.h, if that is preferred. That order also works on AIX.
>>
>> If including it right after system.h works, it is preapproved.
>
> Note that after config.h is generally better (considering all the #poison stuff in system.h).
>
> Not using std::map but GCC's own hash_map would be preferred though. (Otherwise at some point we'll end up including all of libstdc++ from system.h given host compiler weirdness and workarounds for include stuff - which is what system.h is for.)

Can we poison std::map and other templates we want to avoid in GCC sources, so that people wouldn't be tempted to use them?

	Jakub
Re: [C/C++ PATCH] Implement -Wshift-overflow (PR c++/55095) (take 3)
Ping^4. On Fri, Jun 26, 2015 at 10:08:51AM +0200, Marek Polacek wrote: I'm pinging the C++ parts. On Fri, Jun 19, 2015 at 12:44:36PM +0200, Marek Polacek wrote: Ping. On Fri, Jun 12, 2015 at 11:07:29AM +0200, Marek Polacek wrote: Ping. On Fri, Jun 05, 2015 at 10:55:08AM +0200, Marek Polacek wrote: On Thu, Jun 04, 2015 at 09:04:19PM +, Joseph Myers wrote: The C changes are OK. Jason, do you want to approve the C++ parts? Marek
Re: [Fortran f951, C++14] Fix trans-common.c compilation failure on AIX
On Fri, Jul 3, 2015 at 10:37 AM, Jakub Jelinek ja...@redhat.com wrote:
> On Fri, Jul 03, 2015 at 10:32:38AM +0200, Richard Biener wrote:
>> On Thu, Jul 2, 2015 at 10:49 PM, Jakub Jelinek ja...@redhat.com wrote:
>>> On Thu, Jul 02, 2015 at 04:47:13PM -0400, David Edelsohn wrote:
>>>> I can change the patch to include it after system.h, if that is preferred. That order also works on AIX.
>>>
>>> If including it right after system.h works, it is preapproved.
>>
>> Note that after config.h is generally better (considering all the #poison stuff in system.h).
>>
>> Not using std::map but GCC's own hash_map would be preferred though. (Otherwise at some point we'll end up including all of libstdc++ from system.h given host compiler weirdness and workarounds for include stuff - which is what system.h is for.)
>
> Can we poison std::map and other templates we want to avoid in GCC sources, so that people wouldn't be tempted to use them?

Won't they just include map before system.h then? ;) This needs to be caught by patch review, I fear.

Richard.

> 	Jakub
Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator
Trevor Saunders tbsau...@tbsaunde.org writes:
> On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
>> Martin Liška mli...@suse.cz writes:
>>> diff --git a/gcc/asan.c b/gcc/asan.c
>>> index e89817e..dabd6f1 100644
>>> --- a/gcc/asan.c
>>> +++ b/gcc/asan.c
>>> @@ -362,20 +362,20 @@ struct asan_mem_ref
>>>    /* Pool allocation new operator.  */
>>>    inline void *operator new (size_t)
>>>    {
>>> -    return pool.allocate ();
>>> +    return ::new (pool.allocate ()) asan_mem_ref ();
>>>    }
>>>
>>>    /* Delete operator utilizing pool allocation.  */
>>>    inline void operator delete (void *ptr)
>>>    {
>>> -    pool.remove ((asan_mem_ref *) ptr);
>>> +    pool.remove (ptr);
>>>    }
>>>
>>>    /* Memory allocation pool.  */
>>> -  static pool_allocator<asan_mem_ref> pool;
>>> +  static pool_allocator pool;
>>>  };
>>
>> I'm probably going over old ground/wounds, sorry, but what's the benefit of having this sort of pattern? Why not simply have object_allocators and make callers use pool.allocate () and pool.remove (x) (with pool.remove calling the destructor) instead of new and delete? It feels wrong to me to tie the data type to a particular allocation object like this.
>
> Well the big question is what does allocate() do about construction? It seems weird for it to not call the ctor, but I'm not sure we can do a good job of forwarding args to allocate() with C++98.

If you need non-default constructors then:

  new (pool) type (aaa, bbb)...;

doesn't seem too bad. I agree object_allocator's allocate () should call the constructor.

> However it seems kind of weird the operator new here is calling the placement new on the object it allocates.

Yeah.

>> And using the pool allocator functions directly has the nice property that you can tell when a delete/remove isn't necessary because the pool itself is being cleared.
>
> Well, all these cases involve a pool with static storage lifetime, right? So actually if you don't delete things in these pools they are effectively leaked.

They might have a static storage lifetime now, but it doesn't seem like a good idea to hard-bake that into the interface (by saying that for these types you should use new and delete, but for other pool-allocated types you should use object_allocators). Maybe I just have bad memories from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot of state that was obviously static in the old days, but that needed to become non-static to support vaguely-efficient switching between different subtargets. The same kind of thing is likely to happen again. I assume things like the jit would prefer not to have new global state with load-time construction.

Thanks,
Richard
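A toy sketch of the two entry points discussed in this thread: an allocate () that default-constructs via placement new, and a placement `operator new (size_t, pool)` that hands back raw storage so that `new (pool) type (args)` constructs exactly once. Everything here is invented for illustration; it is not GCC's object_allocator, it has no free list or block pooling, and it simply wraps malloc/free.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Toy allocator; hypothetical, not the class proposed in the patch.
template <typename T>
class toy_object_allocator
{
public:
  // Raw storage only: the caller is expected to construct into it.
  void *allocate_raw ()
  {
    void *p = std::malloc (sizeof (T));
    if (!p)
      throw std::bad_alloc ();
    return p;
  }

  // Storage plus default construction via placement new.
  T *allocate ()
  {
    return ::new (allocate_raw ()) T ();
  }

  // Run the destructor, then release the storage.
  void remove (T *obj)
  {
    obj->~T ();
    std::free (obj);
  }
};

// Placement form used as `new (pool) T (args)`: it returns raw storage and
// leaves construction to the new-expression, so nothing is built twice.
template <typename T>
void *operator new (std::size_t, toy_object_allocator<T> &pool)
{
  return pool.allocate_raw ();
}

// Small type that counts live instances so the behaviour can be observed.
struct counted
{
  static int live;
  int value;
  counted () : value (0) { ++live; }
  explicit counted (int v) : value (v) { ++live; }
  ~counted () { --live; }
};
int counted::live = 0;
```

With this split, callers that need a non-default constructor use the placement form, and the data type itself carries no static pool, so the pool's lifetime is not baked into the type.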
Re: [PATCH] PR target/66746: Failure to compile #include x86intrin.h with -miamcu
On Fri, Jul 3, 2015 at 5:53 AM, H.J. Lu hjl.to...@gmail.com wrote: x86intrin.h has useful intrinsics for instructions for IA MCU. This patch adds __iamcu__ check to x86intrin.h and ia32intrin.h. OK for trunk? H.J. --- gcc/ PR target/66746 * config/i386/ia32intrin.h (__crc32b): Don't define if __iamcu__ is defined. (__crc32w): Likewise. (__crc32d): Likewise. (__rdpmc): Likewise. (__rdtscp): Likewise. (_rdpmc): Likewise. (_rdtscp): Likewise. * config/i386/x86intrin.h: Only include ia32intrin.h if __iamcu__ is defined. gcc/testsuite/ PR target/66746 * gcc.target/i386/pr66746.c: New file. OK. Thanks, Uros. gcc/config/i386/ia32intrin.h| 16 +++- gcc/config/i386/x86intrin.h | 5 + gcc/testsuite/gcc.target/i386/pr66746.c | 10 ++ 3 files changed, 30 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/i386/pr66746.c diff --git a/gcc/config/i386/ia32intrin.h b/gcc/config/i386/ia32intrin.h index 1f728c8..b8d1c31 100644 --- a/gcc/config/i386/ia32intrin.h +++ b/gcc/config/i386/ia32intrin.h @@ -49,6 +49,8 @@ __bswapd (int __X) return __builtin_bswap32 (__X); } +#ifndef __iamcu__ + #ifndef __SSE4_2__ #pragma GCC push_options #pragma GCC target(sse4.2) @@ -82,6 +84,8 @@ __crc32d (unsigned int __C, unsigned int __V) #pragma GCC pop_options #endif /* __DISABLE_SSE4_2__ */ +#endif /* __iamcu__ */ + /* 32bit popcnt */ extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) @@ -90,6 +94,8 @@ __popcntd (unsigned int __X) return __builtin_popcount (__X); } +#ifndef __iamcu__ + /* rdpmc */ extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) @@ -98,6 +104,8 @@ __rdpmc (int __S) return __builtin_ia32_rdpmc (__S); } +#endif /* __iamcu__ */ + /* rdtsc */ extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) @@ -106,6 +114,8 @@ __rdtsc (void) return __builtin_ia32_rdtsc (); } +#ifndef __iamcu__ + /* rdtscp */ extern __inline unsigned long long 
__attribute__((__gnu_inline__, __always_inline__, __artificial__)) @@ -114,6 +124,8 @@ __rdtscp (unsigned int *__A) return __builtin_ia32_rdtscp (__A); } +#endif /* __iamcu__ */ + /* 8bit rol */ extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__)) @@ -290,9 +302,11 @@ __writeeflags (unsigned int X) #define _bit_scan_reverse(a) __bsrd(a) #define _bswap(a) __bswapd(a) #define _popcnt32(a) __popcntd(a) +#ifndef __iamcu__ #define _rdpmc(a) __rdpmc(a) -#define _rdtsc() __rdtsc() #define _rdtscp(a) __rdtscp(a) +#endif /* __iamcu__ */ +#define _rdtsc() __rdtsc() #define _rotwl(a,b)__rolw((a), (b)) #define _rotwr(a,b)__rorw((a), (b)) #define _rotl(a,b) __rold((a), (b)) diff --git a/gcc/config/i386/x86intrin.h b/gcc/config/i386/x86intrin.h index 6f7b1f6..be0a1a1 100644 --- a/gcc/config/i386/x86intrin.h +++ b/gcc/config/i386/x86intrin.h @@ -26,6 +26,8 @@ #include ia32intrin.h +#ifndef __iamcu__ + #include mmintrin.h #include xmmintrin.h @@ -86,4 +88,7 @@ #include xsavecintrin.h #include mwaitxintrin.h + +#endif /* __iamcu__ */ + #endif /* _X86INTRIN_H_INCLUDED */ diff --git a/gcc/testsuite/gcc.target/i386/pr66746.c b/gcc/testsuite/gcc.target/i386/pr66746.c new file mode 100644 index 000..3ef77bf --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr66746.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target ia32 } } */ +/* { dg-options -O2 -miamcu } */ + +/* Defining away extern and __inline results in all of them being + compiled as proper functions. */ + +#define extern +#define __inline + +#include x86intrin.h -- 2.4.3
Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator
On 07/03/2015 10:55 AM, Richard Sandiford wrote:
> Trevor Saunders tbsau...@tbsaunde.org writes:
>> On Thu, Jul 02, 2015 at 09:09:31PM +0100, Richard Sandiford wrote:
>>> Martin Liška mli...@suse.cz writes:
>>>> diff --git a/gcc/asan.c b/gcc/asan.c
>>>> index e89817e..dabd6f1 100644
>>>> --- a/gcc/asan.c
>>>> +++ b/gcc/asan.c
>>>> @@ -362,20 +362,20 @@ struct asan_mem_ref
>>>>    /* Pool allocation new operator.  */
>>>>    inline void *operator new (size_t)
>>>>    {
>>>> -    return pool.allocate ();
>>>> +    return ::new (pool.allocate ()) asan_mem_ref ();
>>>>    }
>>>>
>>>>    /* Delete operator utilizing pool allocation.  */
>>>>    inline void operator delete (void *ptr)
>>>>    {
>>>> -    pool.remove ((asan_mem_ref *) ptr);
>>>> +    pool.remove (ptr);
>>>>    }
>>>>
>>>>    /* Memory allocation pool.  */
>>>> -  static pool_allocator<asan_mem_ref> pool;
>>>> +  static pool_allocator pool;
>>>>  };
>>>
>>> I'm probably going over old ground/wounds, sorry, but what's the benefit of having this sort of pattern? Why not simply have object_allocators and make callers use pool.allocate () and pool.remove (x) (with pool.remove calling the destructor) instead of new and delete? It feels wrong to me to tie the data type to a particular allocation object like this.
>>
>> Well the big question is what does allocate() do about construction? It seems weird for it to not call the ctor, but I'm not sure we can do a good job of forwarding args to allocate() with C++98.
>
> If you need non-default constructors then:
>
>   new (pool) type (aaa, bbb)...;
>
> doesn't seem too bad. I agree object_allocator's allocate () should call the constructor.

Hello.

I do not insist on having a new/delete operator for the aforementioned class. However, I don't know of a different approach that would do object construction in the allocate method without using placement new.

>> However it seems kind of weird the operator new here is calling the placement new on the object it allocates.
>
> Yeah.
>
>>> And using the pool allocator functions directly has the nice property that you can tell when a delete/remove isn't necessary because the pool itself is being cleared.
>>
>> Well, all these cases involve a pool with static storage lifetime, right? So actually if you don't delete things in these pools they are effectively leaked.
>
> They might have a static storage lifetime now, but it doesn't seem like a good idea to hard-bake that into the interface (by saying that for these types you should use new and delete, but for other pool-allocated types you should use object_allocators). Maybe I just have bad memories from doing the SWITCHABLE_TARGET stuff, but there I was changing a lot of state that was obviously static in the old days, but that needed to become non-static to support vaguely-efficient switching between different subtargets. The same kind of thing is likely to happen again. I assume things like the jit would prefer not to have new global state with load-time construction.

I agree that it's global state. But even before my transformation the code used static variables, which pose a similar problem from e.g. the JIT perspective. The best approach would be to encapsulate these static allocators in a class (a pass?). It's quite a lot of work.

Thanks,
Martin

> Thanks,
> Richard
Re: [Patch SRA] Fix PR66119 by calling get_move_ratio in SRA
On Tue, 30 Jun 2015, James Greenhalgh wrote: On Fri, Jun 26, 2015 at 06:10:00PM +0100, Jakub Jelinek wrote: On Fri, Jun 26, 2015 at 06:03:34PM +0100, James Greenhalgh wrote: --- /dev/null +++ b/gcc/testsuite/g++.dg/pr66119.C I think generally testcases shouldn't be added into g++.dg/ directly, but subdirectories. So g++.dg/opt/ ? @@ -0,0 +1,69 @@ +/* PR66119 - MOVE_RATIO is not constant in a compiler run, so Scalar + Reduction of Aggregates must ask the back-end more than once what + the value of MOVE_RATIO now is. */ + +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ In g++.dg/, dejagnu cycles through all 3 major -std=c* versions, thus using -std=c++11 is inappropriate. If the test requires c++11, instead you do // { dg-do compile { target { { i?86-*-* x86_64-*-* } c++11 } } } +/* { dg-options -std=c++11 -O3 -mavx -fdump-tree-sra -march=slm { target avx_runtime } } */ and remove -std=c++11 here. I don't see any point in guarding it with avx_runtime, after all, if not avx_runtime, the test will be compiled with -O0 and thus very likely fail the scan-tree-dump test. As it is dg-do compile test only, you have no dependency on assembler nor linker nor runtime. But I'd add -mtune=slm too. Thanks, I'm used to the dance we try to do to get Neon enabled/disabled correctly when testing multilib environments on ARM so tried to overengineer things! I've updated the testcase as you suggested, and moved it to g++.dg/opt. OK? Looks good to me now. Thanks, Richard. Thanks, James --- gcc/ 2015-06-30 James Greenhalgh james.greenha...@arm.com PR tree-optimization/66119 * toplev.c (process_options): Don't set up default values for the sra_max_scalarization_size_{speed,size} parameters. * tree-sra (analyze_all_variable_accesses): If no values have been set for the sra_max_scalarization_size_{speed,size} parameters, call get_move_ratio to get target defaults. gcc/testsuite/ 2015-06-30 James Greenhalgh james.greenha...@arm.com * g++.dg/opt/pr66119.C: New. 
-- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [Ping, Patch, Fortran, PR58586, v5] ICE with derived type with allocatable component passed by value
Ping! Version increment only to reflect rebasing on current trunk. Bootstraps and regtests fine on x86_64-linux-gnu/f21. I am tempted to follow Paul's method of setting a deadline for objections. Else I will commit the patch next Friday (just kidding). I am more interested in a review. The patch now lives in my code base for several months and is used to compile a rather sophisticated fortran code without issues. So I expect no big trouble in trunk given that the patch addresses a rather seldomly (;-)) used construct. Ok for trunk? Regards, Andre On Tue, 19 May 2015 16:01:37 +0200 Andre Vehreschild ve...@gmx.de wrote: Hi, attached is the most recent version of the patch for 58586. It adapts to recent trunk and addresses the caveats so far, i.e. the testcases in the comments now compile and run again w/o errors. Bootstraps and regtests fine on x86_64-linux-gnu/f21. Comments? - Andre -- Andre Vehreschild * Email: vehre ad gmx dot de pr58586_5.clog Description: Binary data diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c index efafabc..d16bf13 100644 --- a/gcc/fortran/resolve.c +++ b/gcc/fortran/resolve.c @@ -14083,10 +14083,15 @@ resolve_symbol (gfc_symbol *sym) if ((!a-save !a-dummy !a-pointer !a-in_common !a-use_assoc - (a-referenced || a-result) - !(a-function sym != sym-result)) + !a-result !a-function) || (a-dummy a-intent == INTENT_OUT !a-pointer)) apply_default_init (sym); + else if (a-function sym-result a-access != ACCESS_PRIVATE + (sym-ts.u.derived-attr.alloc_comp + || sym-ts.u.derived-attr.pointer_comp)) + /* Mark the result symbol to be referenced, when it has allocatable + components. 
*/ + sym-result-attr.referenced = 1; } if (sym-ts.type == BT_CLASS sym-ns == gfc_current_ns diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c index b4f75ba..aec2018 100644 --- a/gcc/fortran/trans-decl.c +++ b/gcc/fortran/trans-decl.c @@ -5885,9 +5885,33 @@ gfc_generate_function_code (gfc_namespace * ns) tmp = gfc_trans_code (ns-code); gfc_add_expr_to_block (body, tmp); - if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node) + if (TREE_TYPE (DECL_RESULT (fndecl)) != void_type_node + || (sym-result sym-result != sym + sym-result-ts.type == BT_DERIVED + sym-result-ts.u.derived-attr.alloc_comp)) { + bool artificial_result_decl = false; tree result = get_proc_result (sym); + gfc_symbol *rsym = sym == sym-result ? sym : sym-result; + + /* Make sure that a function returning an object with + alloc/pointer_components always has a result, where at least + the allocatable/pointer components are set to zero. */ + if (result == NULL_TREE sym-attr.function + ((sym-result-ts.type == BT_DERIVED + (sym-attr.allocatable + || sym-attr.pointer + || sym-result-ts.u.derived-attr.alloc_comp + || sym-result-ts.u.derived-attr.pointer_comp)) + || (sym-result-ts.type == BT_CLASS + (CLASS_DATA (sym)-attr.allocatable + || CLASS_DATA (sym)-attr.class_pointer + || CLASS_DATA (sym-result)-attr.alloc_comp + || CLASS_DATA (sym-result)-attr.pointer_comp + { + artificial_result_decl = true; + result = gfc_get_fake_result_decl (sym, 0); + } if (result != NULL_TREE sym-attr.function !sym-attr.pointer) { @@ -5907,16 +5931,30 @@ gfc_generate_function_code (gfc_namespace * ns) null_pointer_node)); } else if (sym-ts.type == BT_DERIVED - sym-ts.u.derived-attr.alloc_comp !sym-attr.allocatable) { - rank = sym-as ? sym-as-rank : 0; - tmp = gfc_nullify_alloc_comp (sym-ts.u.derived, result, rank); - gfc_add_expr_to_block (init, tmp); + gfc_expr *init_exp; + /* Arrays are not initialized using the default initializer of + their elements. 
Therefore only check if a default + initializer is available when the result is scalar. */ + init_exp = rsym-as ? NULL : gfc_default_initializer (rsym-ts); + if (init_exp) + { + tmp = gfc_trans_structure_assign (result, init_exp, 0); + gfc_free_expr (init_exp); + gfc_add_expr_to_block (init, tmp); + } + else if (rsym-ts.u.derived-attr.alloc_comp) + { + rank = rsym-as ? rsym-as-rank : 0; + tmp = gfc_nullify_alloc_comp (rsym-ts.u.derived, result, + rank); + gfc_prepend_expr_to_block (body, tmp); + } } } - if (result == NULL_TREE) + if (result == NULL_TREE || artificial_result_decl) { /* TODO: move to the appropriate place in resolve.c. */ if (warn_return_type sym == sym-result) @@ -5926,7 +5964,7 @@ gfc_generate_function_code (gfc_namespace * ns) if (warn_return_type) TREE_NO_WARNING(sym-backend_decl) = 1; } - else + if (result != NULL_TREE) gfc_add_expr_to_block (body, gfc_generate_return ()); } diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 7747a67..195f7a4 100644
[PATCH][match-and-simplify] Properly canonicalize operand order for sub-expressions
I observed that we fail to match patterns because, when valueizing sub-expression operands, we fail to canonicalize operand order and thus try matching (1 + a) - 1 instead of the canonical (a + 1) - 1.

The following fixes this at least for commutative tree codes. For comparisons, which we also canonicalize in the plumbing (by means of changing the comparison code via swap_tree_comparison), this isn't that easily done. I'm thinking of a proper solution here.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-07-03  Richard Biener  rguent...@suse.de

	* genmatch.c (commutative_tree_code, commutative_ternary_tree_code):
	Copy from tree.c.
	(dt_operand::gen_gimple_expr): After valueizing operands
	re-canonicalize operand order for commutative tree codes.

Index: gcc/genmatch.c
===================================================================
--- gcc/genmatch.c (revision 225368)
+++ gcc/genmatch.c (working copy)
@@ -175,6 +175,62 @@ END_BUILTINS
 };
 #undef DEF_BUILTIN
 
+/* Return true if CODE represents a commutative tree code.  Otherwise
+   return false.  */
+bool
+commutative_tree_code (enum tree_code code)
+{
+  switch (code)
+    {
+    case PLUS_EXPR:
+    case MULT_EXPR:
+    case MULT_HIGHPART_EXPR:
+    case MIN_EXPR:
+    case MAX_EXPR:
+    case BIT_IOR_EXPR:
+    case BIT_XOR_EXPR:
+    case BIT_AND_EXPR:
+    case NE_EXPR:
+    case EQ_EXPR:
+    case UNORDERED_EXPR:
+    case ORDERED_EXPR:
+    case UNEQ_EXPR:
+    case LTGT_EXPR:
+    case TRUTH_AND_EXPR:
+    case TRUTH_XOR_EXPR:
+    case TRUTH_OR_EXPR:
+    case WIDEN_MULT_EXPR:
+    case VEC_WIDEN_MULT_HI_EXPR:
+    case VEC_WIDEN_MULT_LO_EXPR:
+    case VEC_WIDEN_MULT_EVEN_EXPR:
+    case VEC_WIDEN_MULT_ODD_EXPR:
+      return true;
+
+    default:
+      break;
+    }
+  return false;
+}
+
+/* Return true if CODE represents a ternary tree code for which the
+   first two operands are commutative.  Otherwise return false.  */
+bool
+commutative_ternary_tree_code (enum tree_code code)
+{
+  switch (code)
+    {
+    case WIDEN_MULT_PLUS_EXPR:
+    case WIDEN_MULT_MINUS_EXPR:
+    case DOT_PROD_EXPR:
+    case FMA_EXPR:
+      return true;
+
+    default:
+      break;
+    }
+  return false;
+}
+
 
 /* Base class for all identifiers the parser knows.  */
 
@@ -1996,6 +2052,25 @@ dt_operand::gen_gimple_expr (FILE *f)
                       child_opname, child_opname);
       fprintf (f, "{\n");
     }
+  /* While the toplevel operands are canonicalized by the caller
+     after valueizing operands of sub-expressions we have to
+     re-canonicalize operand order.  */
+  if (operator_id *code = dyn_cast <operator_id *> (id))
+    {
+      /* ???  We can't canonicalize tcc_comparison operands here
+         because that requires changing the comparison code which
+         we already matched...  */
+      if (commutative_tree_code (code->code)
+          || commutative_ternary_tree_code (code->code))
+        {
+          char child_opname0[20], child_opname1[20];
+          gen_opname (child_opname0, 0);
+          gen_opname (child_opname1, 1);
+          fprintf (f, "if (tree_swap_operands_p (%s, %s, false))\n"
+                   "  std::swap (%s, %s);\n", child_opname0, child_opname1,
+                   child_opname0, child_opname1);
+        }
+    }
 
   return n_ops;
 }
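
For readers unfamiliar with why operand order matters to a pattern matcher, here is a small standalone sketch of the idea. It is not genmatch code: the Operand type, should_swap and canonicalize are hypothetical names, and the single ordering rule (constants go second) is only an assumption standing in for the richer rules in tree_swap_operands_p.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy operand: either a constant or a variable.  Hypothetical, not GCC's tree.
enum class Kind { Constant, Variable };

struct Operand {
  Kind kind;
  std::string text;
};

// Assumed ordering rule for this sketch: a constant belongs in the second
// position.  GCC's tree_swap_operands_p applies more rules than this.
inline bool should_swap(const Operand &lhs, const Operand &rhs) {
  return lhs.kind == Kind::Constant && rhs.kind != Kind::Constant;
}

// Only a handful of commutative operators are modelled here.
inline bool is_commutative(char op) {
  return op == '+' || op == '*' || op == '&' || op == '|' || op == '^';
}

// Put the operands of a commutative operation into canonical order in place.
// Non-commutative operations are left untouched.
inline void canonicalize(char op, Operand &lhs, Operand &rhs) {
  if (is_commutative(op) && should_swap(lhs, rhs))
    std::swap(lhs, rhs);
}

// Render a binary expression so the effect of canonicalization is visible.
inline std::string render(char op, const Operand &lhs, const Operand &rhs) {
  return "(" + lhs.text + " " + op + " " + rhs.text + ")";
}
```

With this, canonicalizing the operands of "1 + a" yields "(a + 1)", so a pattern written against the canonical form matches, while "1 - a" keeps its order because subtraction is not commutative.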
Re: [PATCH 2/2] Add leon3r0 and leon3r0v7 CPU targets
> Thank you for the patch in your other mail that changes this!

You're welcome.

> We were also thinking of the instruction timing information found in the
> leon_costs and leon3_costs. We took a look at the values in leon_costs and
> they seem to fit well with the UT699, except for division. We got a bit
> unsure as to what leon system they are based on, as the division cost was
> wrong also for the AT697F, which is the most common leon2 system. Would it
> be ok to update the division cost values of leon_costs so that they match
> UT699 and AT697F?

Sure.

> In general, depending on how one instantiates a leon system and which FPU
> is selected, you will get different timing. Is there a recommended way of
> adding support for this without adding additional CPU targets? We are
> considering adding support for GRFPU-lite, which only differs in the timing.

One could add a -mtune-fpu switch. Did you look at other architectures in the GCC tree that would have similar requirements?

-- 
Eric Botcazou
Fix PR52482, libitm compilation in OSX ppc with old cctools
Hi all,

PR52482 seems to be caused by old gas not supporting named parameters in macros. Xcode-2.5 (last available for OSX PPC) gas version is 1.38.

Patch is against gcc-4.8.4, but affected lines have not changed in SVN HEAD.

BR
Carlos

diff -ur gcc-4.8.4.old/libitm/config/powerpc/sjlj.S gcc-4.8.4/libitm/config/powerpc/sjlj.S
--- gcc-4.8.4.old/libitm/config/powerpc/sjlj.S 2014-04-04 16:17:55.0 +0200
+++ gcc-4.8.4/libitm/config/powerpc/sjlj.S 2015-07-03 11:34:23.0 +0200
@@ -83,16 +83,16 @@
 	bl \name
 .endm
 #elif defined(_CALL_DARWIN)
-.macro FUNC name
+.macro FUNC
 	.globl _$0
 _$0:
 .endmacro
-.macro END name
+.macro END
 .endmacro
-.macro HIDDEN name
+.macro HIDDEN
 	.private_extern _$0
 .endmacro
-.macro CALL name
+.macro CALL
 	bl _$0
 .endmacro
 # ifdef __ppc64__

-- 
'Whoever has the power in society determines what can be studied, determines what can be observed, determines what can be thought.' Michael Crichton, Micro (2011)
C++ PATCH for c++/66748 (ICE with abi_tag on enum)
The following testcase was breaking because we were trying to access TYPE_LANG_SPECIFIC via CLASSTYPE_TEMPLATE_* macros without first checking that we indeed have a CLASS_TYPE.

Bootstrapped/regtested on x86_64-linux, ok for trunk/5/4.9?

2015-07-03  Marek Polacek  pola...@redhat.com

	PR c++/66748
	* tree.c (handle_abi_tag_attribute): Check for CLASS_TYPE_P before
	accessing TYPE_LANG_SPECIFIC node.

	* g++.dg/abi/abi-tag15.C: New test.

diff --git gcc/cp/tree.c gcc/cp/tree.c
index 0d1112c..22d5b3a 100644
--- gcc/cp/tree.c
+++ gcc/cp/tree.c
@@ -3654,13 +3654,15 @@ handle_abi_tag_attribute (tree* node, tree name, tree args,
                  name, *node);
           goto fail;
         }
-      else if (CLASSTYPE_TEMPLATE_INSTANTIATION (*node))
+      else if (CLASS_TYPE_P (*node)
+               && CLASSTYPE_TEMPLATE_INSTANTIATION (*node))
         {
           warning (OPT_Wattributes, "ignoring %qE attribute applied to "
                    "template instantiation %qT", name, *node);
           goto fail;
         }
-      else if (CLASSTYPE_TEMPLATE_SPECIALIZATION (*node))
+      else if (CLASS_TYPE_P (*node)
+               && CLASSTYPE_TEMPLATE_SPECIALIZATION (*node))
         {
           warning (OPT_Wattributes, "ignoring %qE attribute applied to "
                    "template specialization %qT", name, *node);
diff --git gcc/testsuite/g++.dg/abi/abi-tag15.C gcc/testsuite/g++.dg/abi/abi-tag15.C
index e69de29..bfda3a2 100644
--- gcc/testsuite/g++.dg/abi/abi-tag15.C
+++ gcc/testsuite/g++.dg/abi/abi-tag15.C
@@ -0,0 +1,3 @@
+// PR c++/66748
+
+enum __attribute__((abi_tag("foo"))) E {}; // { dg-error "redeclaration of" }

	Marek
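
The shape of the fix, testing the kind of a node before using an accessor that is only valid for that kind, can be illustrated outside of GCC. The sketch below is only an analogy: TypeNode, ClassInfo and the helper functions are hypothetical and do not correspond to the real tree data structures or to the CLASSTYPE_* macros.

```cpp
#include <cassert>

// Hypothetical stand-ins for type nodes; not GCC's tree representation.
enum class TypeKind { Class, Enum };

struct ClassInfo {
  bool is_template_instantiation;
};

struct TypeNode {
  TypeKind kind;
  const ClassInfo *class_info;  // only meaningful when kind == TypeKind::Class
};

inline bool is_class_type(const TypeNode &t) {
  return t.kind == TypeKind::Class;
}

// Unchecked accessor: the caller must already know that T is a class type,
// much like an accessor macro that assumes class-specific data exists.
inline bool template_instantiation_unchecked(const TypeNode &t) {
  assert(is_class_type(t) && "class-only accessor used on a non-class type");
  return t.class_info->is_template_instantiation;
}

// Guarded query: the kind test short-circuits, so the unchecked accessor is
// never reached for an enum.
inline bool is_template_instantiation(const TypeNode &t) {
  return is_class_type(t) && template_instantiation_unchecked(t);
}
```

Calling is_template_instantiation on an enum node simply returns false, whereas calling the unchecked accessor directly would trip the assertion; that is the same ordering the patch establishes with CLASS_TYPE_P.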
[v3 PATCH] Implement Fundamentals v2 propagate_const
Tested on Linux-PPC64. Patch gzipped to avoid polluting people's mailboxes with a 45k patch.

2015-07-03  Ville Voutilainen  ville.voutilai...@gmail.com

	Implement std::experimental::fundamentals_v2::propagate_const.
	* include/Makefile.am: Add propagate_const.
	* include/Makefile.in: Add propagate_const.
	* include/experimental/propagate_const: New.
	* testsuite/experimental/propagate_const/assignment/copy.cc: Likewise.
	* testsuite/experimental/propagate_const/assignment/move.cc: Likewise.
	* testsuite/experimental/propagate_const/assignment/move_neg.cc: Likewise.
	* testsuite/experimental/propagate_const/cons/copy.cc: Likewise.
	* testsuite/experimental/propagate_const/cons/default.cc: Likewise.
	* testsuite/experimental/propagate_const/cons/move.cc: Likewise.
	* testsuite/experimental/propagate_const/cons/move_neg.cc: Likewise.
	* testsuite/experimental/propagate_const/hash/1.cc: Likewise.
	* testsuite/experimental/propagate_const/observers/1.cc: Likewise.
	* testsuite/experimental/propagate_const/relops/1.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements1.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements2.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements3.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements4.cc: Likewise.
	* testsuite/experimental/propagate_const/requirements5.cc: Likewise.
	* testsuite/experimental/propagate_const/swap/1.cc: Likewise.
	* testsuite/experimental/propagate_const/typedefs.cc: Likewise.

propagate_const.diff.gz Description: GNU Zip compressed data
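
Since the patch itself is attached gzipped, a rough illustration of what a const-propagating pointer wrapper does may help. The sketch below is a deliberately tiny hand-written wrapper, not the libstdc++ implementation in the attachment: tiny_propagate_const, Counter and Owner are all hypothetical names, and only raw pointers and two operators are covered.

```cpp
#include <cassert>

// Minimal hypothetical wrapper around a raw pointer.  Access through a const
// wrapper yields a pointer/reference to const, which a plain T* member would
// not do.
template <typename T>
class tiny_propagate_const {
 public:
  explicit tiny_propagate_const(T *p) : ptr_(p) {}

  T *operator->() { return ptr_; }
  const T *operator->() const { return ptr_; }
  T &operator*() { return *ptr_; }
  const T &operator*() const { return *ptr_; }

 private:
  T *ptr_;
};

struct Counter {
  int value = 0;
  int bump() { return ++value; }      // mutating member
  int peek() const { return value; }  // non-mutating member
};

struct Owner {
  explicit Owner(Counter *c) : counter(c) {}

  int touch() { return counter->bump(); }

  // In a const member function the wrapper only exposes const Counter, so
  // calling counter->bump() here would be rejected at compile time.
  int read() const { return counter->peek(); }

  tiny_propagate_const<Counter> counter;
};
```

The design point is that constness of the owning object now reaches through to the pointee; with a plain Counter* member, a const Owner could still call bump() on the object it points to.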
Re: [PATCH 2/2] Add leon3r0 and leon3r0v7 CPU targets
> One could add a -mtune-fpu switch. Did you look at other architectures in
> the GCC tree that would have similar requirements?

Thank you for the suggestion about adding a -mtune-fpu switch. I have not yet looked at the other architectures, but will do so before proceeding.

-- 
Daniel Cederman
[PATCH] Do not use floating point registers when compiling with -msoft-float for SPARC
__builtin_apply* and __builtin_return access the floating point registers on SPARC even when compiling with -msoft-float.

gcc/ChangeLog:

2015-06-26  Daniel Cederman  ceder...@gaisler.com

	* config/sparc/sparc.c (sparc_function_value_regno_p): Floating
	point registers cannot be used when compiling for a target
	without FPU.
	* config/sparc/sparc.md: A function cannot return a value in a
	floating point register when compiled without floating point
	support.
---
 gcc/config/sparc/sparc.c  |  2 +-
 gcc/config/sparc/sparc.md | 26 ++++++++++++++++----------
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 2556eec..e0d40a5 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -7403,7 +7403,7 @@ sparc_libcall_value (machine_mode mode,
 static bool
 sparc_function_value_regno_p (const unsigned int regno)
 {
-  return (regno == 8 || regno == 32);
+  return (regno == 8 || (TARGET_FPU && regno == 32));
 }
 
 /* Do what is necessary for `va_start'.  We look at the current function
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index a561877..c296913 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -6398,7 +6398,6 @@
   ""
 {
   rtx valreg1 = gen_rtx_REG (DImode, 8);
-  rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
   rtx result = operands[1];
 
   /* Pass constm1 to indicate that it may expect a structure value, but
@@ -6407,8 +6406,12 @@
 
   /* Save the function value registers.  */
   emit_move_insn (adjust_address (result, DImode, 0), valreg1);
-  emit_move_insn (adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8),
-                  valreg2);
+  if (TARGET_FPU)
+    {
+      rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
+      emit_move_insn (adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8),
+                      valreg2);
+    }
 
   /* The optimizer does not know that the call sets the function value
      registers we stored in the result block.  We avoid problems by
@@ -6620,7 +6623,6 @@
   ""
 {
   rtx valreg1 = gen_rtx_REG (DImode, 24);
-  rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
   rtx result = operands[0];
 
   if (! TARGET_ARCH64)
@@ -6637,14 +6639,18 @@
       emit_insn (gen_update_return (rtnreg, value));
     }
 
-  /* Reload the function value registers.  */
+  /* Reload the function value registers.
+     Put USE insns before the return.  */
   emit_move_insn (valreg1, adjust_address (result, DImode, 0));
-  emit_move_insn (valreg2,
-                  adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8));
-
-  /* Put USE insns before the return.  */
   emit_use (valreg1);
-  emit_use (valreg2);
+
+  if ( TARGET_FPU )
+    {
+      rtx valreg2 = gen_rtx_REG (TARGET_ARCH64 ? TFmode : DFmode, 32);
+      emit_move_insn (valreg2,
+                      adjust_address (result, TARGET_ARCH64 ? TFmode : DFmode, 8));
+      emit_use (valreg2);
+    }
 
   /* Construct the return.  */
   expand_naked_return ();
-- 
2.4.3
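
The predicate change in sparc.c can be restated as a small standalone function. This is only a sketch: TargetOptions and has_fpu are hypothetical stand-ins for the TARGET_FPU flag, while the register numbers 8 and 32 are the ones used in the patch (in GCC's SPARC numbering these are %o0 and %f0).

```cpp
#include <cassert>

// Register numbers as used in the patch: 8 is %o0, 32 is %f0.
constexpr unsigned kIntReturnReg = 8;
constexpr unsigned kFpReturnReg = 32;

// Hypothetical stand-in for the target flags; has_fpu plays the role of
// TARGET_FPU and would be false under -msoft-float.
struct TargetOptions {
  bool has_fpu;
};

// A register can hold a function return value if it is the integer return
// register, or the floating point one on a target that actually has an FPU.
constexpr bool function_value_regno_p(TargetOptions opts, unsigned regno) {
  return regno == kIntReturnReg || (opts.has_fpu && regno == kFpReturnReg);
}
```

Under this predicate a soft-float configuration never reports register 32 as a value register, which is what keeps the __builtin_apply and __builtin_return expanders from saving and restoring it.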
[PATCH] Update instruction cost for LEON
gcc/ChangeLog:

2015-07-03  Daniel Cederman  ceder...@gaisler.com

	* config/sparc/sparc.c (struct processor_costs): Set div cost
	for leon to match UT699 and AT697F. Set mul cost for leon3 to
	match standard leon3.
---
 gcc/config/sparc/sparc.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index e0d40a5..54341c5 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -251,8 +251,8 @@ struct processor_costs leon_costs = {
   COSTS_N_INSNS (5), /* imul */
   COSTS_N_INSNS (5), /* imulX */
   0, /* imul bit factor */
-  COSTS_N_INSNS (5), /* idiv */
-  COSTS_N_INSNS (5), /* idivX */
+  COSTS_N_INSNS (35), /* idiv */
+  COSTS_N_INSNS (35), /* idivX */
   COSTS_N_INSNS (1), /* movcc/movr */
   0, /* shift penalty */
 };
@@ -272,8 +272,8 @@ struct processor_costs leon3_costs = {
   COSTS_N_INSNS (15), /* fdivd */
   COSTS_N_INSNS (22), /* fsqrts */
   COSTS_N_INSNS (23), /* fsqrtd */
-  COSTS_N_INSNS (5), /* imul */
-  COSTS_N_INSNS (5), /* imulX */
+  COSTS_N_INSNS (1), /* imul */
+  COSTS_N_INSNS (1), /* imulX */
   0, /* imul bit factor */
   COSTS_N_INSNS (35), /* idiv */
   COSTS_N_INSNS (35), /* idivX */
-- 
2.4.3
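
To see why the division cost matters, here is a toy model of how such a cost table can steer a choice between a divide and a multiply-based replacement. The scaling in costs_n_insns mirrors GCC's COSTS_N_INSNS macro (N * 4); everything else, including IntCosts and prefer_multiply_sequence, is a hypothetical simplification and not how the SPARC back end or the middle end actually decides.

```cpp
#include <cassert>

// Same scaling as GCC's COSTS_N_INSNS macro: one fast instruction costs 4.
constexpr int costs_n_insns(int n) { return n * 4; }

// Hypothetical, reduced cost table with only the two entries of interest.
struct IntCosts {
  int imul;
  int idiv;
};

// leon_costs before and after the patch (imul unchanged, idiv 5 -> 35).
constexpr IntCosts kLeonBefore = {costs_n_insns(5), costs_n_insns(5)};
constexpr IntCosts kLeonAfter = {costs_n_insns(5), costs_n_insns(35)};

// Toy decision: replacing a division by a constant with one multiply plus
// EXTRA_INSNS cheap fix-up instructions pays off only if it is cheaper than
// the divide itself.
constexpr bool prefer_multiply_sequence(IntCosts costs, int extra_insns) {
  return costs.imul + costs_n_insns(extra_insns) < costs.idiv;
}
```

With the old table a divide looks as cheap as a multiply, so the model never prefers the replacement; with idiv at 35 instructions the multiply sequence wins comfortably.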
[PATCH] save takes a single integer (register or 13-bit signed immediate)
This removes a warning about operand 0 missing mode.

gcc/ChangeLog:

2015-06-26  Daniel Cederman  ceder...@gaisler.com

	* config/sparc/sparc.md: Window save takes a single integer
---
 gcc/config/sparc/sparc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index c296913..66f7306 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -6490,7 +6490,7 @@
 
 (define_insn "window_save"
   [(unspec_volatile
-	[(match_operand 0 "arith_operand" "rI")]
+	[(match_operand:SI 0 "arith_operand" "rI")]
 	UNSPECV_SAVEW)]
   "!TARGET_FLAT"
   "save\t%%sp, %0, %%sp"
-- 
2.4.3
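
As a side note on the "register or 13-bit signed immediate" wording in the subject, the range check behind such an operand can be sketched as follows. The names fits_simm13, SaveOperand and classify_save_operand are hypothetical; the sketch only illustrates the numeric range of a 13-bit signed field and is not the implementation of the "rI" constraint.

```cpp
#include <cassert>
#include <cstdint>

// A 13-bit two's complement field covers -4096 .. 4095.
constexpr std::int64_t kSimm13Min = -4096;
constexpr std::int64_t kSimm13Max = 4095;

constexpr bool fits_simm13(std::int64_t value) {
  return value >= kSimm13Min && value <= kSimm13Max;
}

// Hypothetical classification of the stack adjustment passed to save: small
// values can be encoded directly, larger ones must first be loaded into a
// register.
enum class SaveOperand { Immediate, Register };

constexpr SaveOperand classify_save_operand(std::int64_t frame_adjust) {
  return fits_simm13(frame_adjust) ? SaveOperand::Immediate
                                   : SaveOperand::Register;
}
```

For example, a typical small frame adjustment such as -96 is classified as an immediate, while -8192 falls outside the range and needs a register.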