Re: fix PR46029: reimplement if conversion of loads and stores
Abe Skolnik wrote: Hi everybody! In the current implementation of if conversion, loads and stores are if-converted in a thread-unsafe way: * loads were always executed, even when they should have not been. Some source code could be rendered invalid due to null pointers that were OK in the original program because they were never dereferenced. * writes were if-converted via load/maybe-modify/store, which renders some code multithreading-unsafe. This patch reimplements if-conversion of loads and stores in a safe way using a scratchpad allocated by the compiler on the stack: * loads are done through an indirection, reading either the correct data from the correct source [if the condition is true] or reading from the scratchpad and later ignoring this read result [if the condition is false]. * writes are also done through an indirection, writing either to the correct destination [if the condition is true] or to the scratchpad [if the condition is false]. Vectorization of if-cvt-stores-vect-ifcvt-18.c disabled because the old if-conversion resulted in unsafe code that could fail under multithreading even though the as-written code _was_ thread-safe. Passed regression testing and bootstrap on amd64-linux. Is this OK to commit to trunk? Regards, Abe Thanks for getting back to this! My main thought concerns the direction we are travelling here. A major reason why we do if-conversion is to enable vectorization. Is this is targetted at gathering/scattering loads? Following vectorization, different elements of the vector being loaded/stored may have to go to/from the scratchpad or to/from main memory. Or, are we aiming at the case where the predicate or address are invariant? That seems unlikely - loop unswitching would be better for the predicate; loading from an address, we'd just peel and hoist; storing, this'd result in the address holding the last value written, at exit from the loop, a curious idiom. Where the predicate/address is invariant across the vector? (!) Or, at we aiming at non-vectorized code? Beyond that question... Does the description for -ftree-loop-if-convert-stores in doc/invoke.texi describe what the flag now does? (It doesn't mention loads; the code doesn't look like we use scratchpads at all without -ftree-loop-if-convert-stores, or am I missing something?) In tree-if-conv.c: @@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt, if (flag_tree_loop_if_convert_stores) { - if (ifcvt_could_trap_p (stmt, refs)) + if (ifcvt_could_trap_p (stmt)) { if (ifcvt_can_use_mask_load_store (stmt)) { and + + if (has_non_addressable_refs (stmt)) + { + if (dump_file (dump_flags TDF_DETAILS)) + fprintf (dump_file, has non-addressable memory references\n); + return false; + } + if it doesn't trap, but has_non_addressable_refs, can't we use ifcvt_can_use_mask_load_store there too? And/or, I think I may be misunderstanding here, but if an access could trap, but is addressable, can't we use the scratchpad technique to get round the trapping problem? (Look at it another way - this patch makes strictly more things return true from ifcvt_could_trap_p, which always exits immediately from if_convertible_gimple_assign_stmt_p...?) Re. creation of scratchpads: (1) Should the '64' byte size be the result of scanning the function, for the largest data size to which we store? (ideally, conditionally store!) (2) Allocating only once per function: if we had one scratchpad per loop, it could/would live inside the test of gimple_build_call_internal (IFN_LOOP_VECTORIZED, Otherwise, if we if-convert one or more loops in the function, but then fail to vectorize them, we'll leave the scratchpad around for later phases to clean up. Is that OK? Also some style nits: @@ -1342,7 +1190,7 @@ if_convertible_loop_p_1 (struct loop *loop, /* Check the if-convertibility of statements in predicated BBs. */ if (!dominated_by_p (CDI_DOMINATORS, loop-latch, bb)) for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (itr)) - if (!if_convertible_stmt_p (gsi_stmt (itr), *refs, + if (!if_convertible_stmt_p (gsi_stmt (itr), any_mask_load_store)) return false; } bet that fits on one line now. + * Returns a memory reference to the pointer defined by the +conditional expression: pointer = cond ? A[i] : scratch_pad; and + inserts this code at GSI. */ + +static tree +create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad, + gimple_stmt_iterator *gsi, bool swap) in comment, should A[i] just be AI, as I see nothing in create_indirect_cond_expr that requires ai to be an array dereference? @@ -2063,12 +1998,14 @@ mask_exists (int size, vecint vec) | end_bb_1 | | bb_2 + | cond =
[AArch64][TLSGD Desc][2/3] Sort case label alphabetically
Obivious coding style fix. 2015-06-22 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_expand_move_immediate): Sort case label alphabetically. -- Regards, Jiong diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 16c8dba..dddf401 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1548,12 +1548,12 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) emit_insn (gen_rtx_SET (dest, mem)); return; -case SYMBOL_TLSGD: -case SYMBOL_SMALL_GOTTPREL: + case SYMBOL_SMALL_GOTTPREL: case SYMBOL_SMALL_GOT: case SYMBOL_TINY_GOT: -case SYMBOL_TINY_TLSIE: + case SYMBOL_TINY_TLSIE: case SYMBOL_TLSDESC: + case SYMBOL_TLSGD: if (offset != const0_rtx) { gcc_assert(can_create_pseudo_p ());
[AArch64][TLSGD Desc][1/3] Generalize TLS Descriptor for Global Dynamic
Currently, there is only small model support for TLS Global Dynamic (Desciptor) on AArch64. While TLS Global Dynamic (Descriptor) is actually the same for all memory mode. We always generate below code sequences: R0 = GOT entry address of tls descriptor for var. Rx = speialize_func .tlsdesccall var blr Rx Instruction sequences for different memory model differs only for how to addressing the GOT descriptor of that TLS variable, and they should always be packed together for later linker relaxation. Tiny: ldr xr, :tlsdesc:var adr x0, :tlsdesc:var .tlsdesccall var blr xr Small: adrp x0, :tlsdesc:var ldr xr, [x0, #:tlsdesc_lo12:var] add x0, x0, #:tlsdesc_lo12:var .tlsdesccall var blr xr Large: movz x0, #:tlsdesc_off_g1:var movk x0, #:tlsdesc_off_g0_nc:var .tlsdescldr var ldr xr, [gp, x0] .tlsdescadd var add x0, gp, x0 .tlsdesccall var blr xr This patch generalize TLS Global Dynamic Descriptor code for all memory model. Another seperate patch will add descriptor support for Tiny model. OK for trunk? 2015-06-22 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64-protos.h (aarch64_symbol_context): Rename SYMBOL_SMALL_TLSDESC to SYMBOL_TLSDESC. (aarch64_symbol_context): Ditto. * config/aarch64/aarch64.md (tlsdesc_small_mode): Renamed into tlsdesc_mode. * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Rename SYMBOL_SMALL_TLSDESC to SYMBOL_TLSDESC. Rename gen_tlsdesc_small_* to gen_tlsdesc_*. (aarch64_expand_mov_immediate): Ditto. (aarch64_print_operand): Ditto. (aarch64_classify_tls_symbol): Ditto. -- Regards, Jiong diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 7fad48b..576acc0 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -61,9 +61,9 @@ enum aarch64_symbol_context This corresponds to the small PIC model of the compiler. - SYMBOL_SMALL_TLSDESC SYMBOL_SMALL_GOTTPREL SYMBOL_TINY_TLSIE + SYMBOL_TLSDESC SYMBOL_TLSGD SYMBOL_TLSLE Each of of these represents a thread-local symbol, and corresponds to the @@ -96,11 +96,11 @@ enum aarch64_symbol_type { SYMBOL_SMALL_ABSOLUTE, SYMBOL_SMALL_GOT, - SYMBOL_SMALL_TLSDESC, SYMBOL_SMALL_GOTTPREL, SYMBOL_TINY_ABSOLUTE, SYMBOL_TINY_GOT, SYMBOL_TINY_TLSIE, + SYMBOL_TLSDESC, SYMBOL_TLSGD, SYMBOL_TLSLE, SYMBOL_FORCE_TO_MEM diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e724bd4..16c8dba 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -921,7 +921,7 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, return; } -case SYMBOL_SMALL_TLSDESC: +case SYMBOL_TLSDESC: { machine_mode mode = GET_MODE (dest); rtx x0 = gen_rtx_REG (mode, R0_REGNUM); @@ -932,9 +932,9 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, /* In ILP32, the got entry is always of SImode size. Unlike small GOT, the dest is fixed at reg 0. */ if (TARGET_ILP32) - emit_insn (gen_tlsdesc_small_si (imm)); + emit_insn (gen_tlsdesc_si (imm)); else - emit_insn (gen_tlsdesc_small_di (imm)); + emit_insn (gen_tlsdesc_di (imm)); tp = aarch64_load_tp (NULL); if (mode != Pmode) @@ -1549,11 +1549,11 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) return; case SYMBOL_TLSGD: -case SYMBOL_SMALL_TLSDESC: case SYMBOL_SMALL_GOTTPREL: case SYMBOL_SMALL_GOT: case SYMBOL_TINY_GOT: case SYMBOL_TINY_TLSIE: + case SYMBOL_TLSDESC: if (offset != const0_rtx) { gcc_assert(can_create_pseudo_p ()); @@ -4435,7 +4435,7 @@ aarch64_print_operand (FILE *f, rtx x, char code) asm_fprintf (asm_out_file, :tlsgd:); break; - case SYMBOL_SMALL_TLSDESC: + case SYMBOL_TLSDESC: asm_fprintf (asm_out_file, :tlsdesc:); break; @@ -4468,7 +4468,7 @@ aarch64_print_operand (FILE *f, rtx x, char code) asm_fprintf (asm_out_file, :tlsgd_lo12:); break; - case SYMBOL_SMALL_TLSDESC: + case SYMBOL_TLSDESC: asm_fprintf (asm_out_file, :tlsdesc_lo12:); break; @@ -7273,7 +7273,7 @@ aarch64_classify_tls_symbol (rtx x) { case TLS_MODEL_GLOBAL_DYNAMIC: case TLS_MODEL_LOCAL_DYNAMIC: - return TARGET_TLS_DESC ? SYMBOL_SMALL_TLSDESC : SYMBOL_TLSGD; + return TARGET_TLS_DESC ? SYMBOL_TLSDESC : SYMBOL_TLSGD; case TLS_MODEL_INITIAL_EXEC: switch (aarch64_cmodel) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 9f1b26e..f3d9082 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4378,7 +4378,7 @@ (set_attr length 8, 12)] ) -(define_insn tlsdesc_small_mode +(define_insn tlsdesc_mode [(set (reg:PTR R0_REGNUM) (unspec:PTR [(match_operand 0 aarch64_valid_symref S)] UNSPEC_TLSDESC))
[AArch64][TLSGD Desc][3/3] Implement TLS Global Dynamic Descriptor for tiny model
As we have generalized GD Descriptor support for all memory model in the first patch. Support for tiny model is quite straightforward. We just need to output different instruction sequences according on memory model. OK for trunk? 2015-06-22 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.md (tlsdesc_mode): Support tiny model constraint. gcc/testsuite/ * gcc.target/aarch64/tlsdesc_small.c: New. * gcc.target/aarch64/tlsdesc_tiny.c: Ditto. -- Regards, Jiong diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 827ae8e..1b4e387 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -4394,9 +4394,19 @@ (clobber (reg:CC CC_REGNUM)) (clobber (match_scratch:DI 1 =r))] TARGET_TLS_DESC - adrp\\tx0, %A0\;ldr\\t%w1, [x0, #%L0]\;add\\tw0, w0, %L0\;.tlsdesccall\\t%0\;blr\\t%1 + { +if (aarch64_cmodel_var == AARCH64_CMODEL_TINY) + return ldr\t%w1, #%A0;adr\tw0, %A0;.tlsdesccall\t%0;blr\t%1; +else if (aarch64_cmodel_var == AARCH64_CMODEL_SMALL) + return adrp\tx0, %A0;ldr\t%w1, [x0, #%L0];add\tw0, w0, %L0;.tlsdesccall\t%0;blr\t%1; +else + /* TBD: Large model to be supported. */ + gcc_unreachable (); + } [(set_attr type call) - (set_attr length 16)]) + (set (attr length) + (if_then_else (match_test aarch64_cmodel_var == AARCH64_CMODEL_TINY) + (const_int 12) (const_int 16)))]) (define_insn stack_tie [(set (mem:BLK (scratch)) diff --git a/gcc/testsuite/gcc.target/aarch64/tlsdesc_small.c b/gcc/testsuite/gcc.target/aarch64/tlsdesc_small.c new file mode 100644 index 000..f1429b9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/tlsdesc_small.c @@ -0,0 +1,9 @@ +/* { dg-do run } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-options -O2 -ftls-model=global-dynamic -fPIC --save-temps } */ + +#include tls.c + +/* { dg-final { scan-assembler-times adrp\tx0, :tlsdesc: 2 } } */ +/* { dg-final { scan-assembler-times tlsdesccall 2 } } */ +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/tlsdesc_tiny.c b/gcc/testsuite/gcc.target/aarch64/tlsdesc_tiny.c new file mode 100644 index 000..a107650 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/tlsdesc_tiny.c @@ -0,0 +1,9 @@ +/* { dg-do run } */ +/* { dg-require-effective-target tls_native } */ +/* { dg-options -O2 -ftls-model=global-dynamic -fPIC -mcmodel=tiny --save-temps } */ + +#include tls.c + +/* { dg-final { scan-assembler-times adr\tx0, :tlsdesc: 2 } } */ +/* { dg-final { scan-assembler-times tlsdesccall 2 } } */ +/* { dg-final { cleanup-saved-temps } } */
Re: C++ PATCH for c++/66515 (ICE with initializer_list)
On 06/17/2015 04:44 PM, Jason Merrill wrote: Now that reshape_init can return a non-CONSTRUCTOR, we need to call it earlier in implicit_conversion. I haven't noticed any problems with the original patch, but just to be safe this patch limits the new reshape to the same conditions as the old one: only classes. Tested x86_64-pc-linux-gnu, commit 98a362ded24db54963524761c3b0613ff844de51 Author: Jason Merrill ja...@redhat.com Date: Fri Jun 19 15:29:58 2015 -0400 PR c++/66515 * call.c (implicit_conversion): Only reshape for classes. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index ba5da4c..a6c313a 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -1759,8 +1759,9 @@ implicit_conversion (tree to, tree from, tree expr, bool c_cast_p, /* Call reshape_init early to remove redundant braces. */ if (expr BRACE_ENCLOSED_INITIALIZER_P (expr) + CLASS_TYPE_P (to) COMPLETE_TYPE_P (complete_type (to)) - CP_AGGREGATE_TYPE_P (to)) + !CLASSTYPE_NON_AGGREGATE (to)) { expr = reshape_init (to, expr, complain); if (expr == error_mark_node)
[patch] Delete temporary response file
Hi, when you pass a response file at link time and you use the GNU linker, then collect2 creates another, temporary response file and passes it to the linker. But it fails to delete the file after it is done. This can easily be seen with the following manipulation: eric@polaris:~/build/gcc/native cat t.c int main (void) { return 0; } eric@polaris:~/build/gcc/native cat t.resp -L/usr/lib64 eric@polaris:~/build/gcc/native gcc -c t.c eric@polaris:~/build/gcc/native export TMPDIR=$PWD eric@polaris:~/build/gcc/native gcc -o t t.o @t.resp eric@polaris:~/build/gcc/native ls cc* ccVSQ6W5 The problem is that do_wait is not invoked by tlink_execute, only collect_wait is, so the cleanup code present therein is never invoked. Tested on x86_64-suse-linux, OK for the mainline? 2015-06-22 Tristan Gingold ging...@adacore.com * collect2.c (collect_wait): Unlink the response file here instead of... (do_wait): ...here. (utils_cleanup): ...and here. -- Eric BotcazouIndex: collect-utils.c === --- collect-utils.c (revision 224708) +++ collect-utils.c (working copy) @@ -68,6 +68,12 @@ collect_wait (const char *prog, struct p fatal_error (input_location, can't get program status: %m); pex_free (pex); + if (response_file !save_temps) +{ + unlink (response_file); + response_file = NULL; +} + if (status) { if (WIFSIGNALED (status)) @@ -90,12 +96,6 @@ do_wait (const char *prog, struct pex_ob int ret = collect_wait (prog, pex); if (ret != 0) fatal_error (input_location, %s returned %d exit status, prog, ret); - - if (response_file !save_temps) -{ - unlink (response_file); - response_file = NULL; -} } @@ -224,7 +224,5 @@ utils_cleanup (bool from_signal) calls to maybe_unlink fails. */ cleanup_done = true; - if (response_file) -maybe_unlink (response_file); tool_cleanup (from_signal); }
Re: [PATCH 3/3] Improve -Wmissing-indentation heuristics
On 06/09/2015 11:31 AM, Patrick Palka wrote: This patch improves the heuristics of the warning in a number of ways. The improvements are hopefully adequately documented in the code comments. The additions to the test case also highlight the improvements. I tested an earlier version of this patch on more than a dozen C code bases. I only found one class of bogus warnings yet emitted, in the libpng and bdwgc projects. These projects have a coding style which indents code inside #ifdefs as if this code was guarded by an if(), e.g. if (foo != 0) x = 10; else // GUARD y = 100; // BODY #ifdef BAR blah (); // NEXT #endif These bogus warnings are pre-existing, however (i.e. not caused by this patch). gcc/c-family/ChangeLog: * c-indentation.c (should_warn_for_misleading_indentation): Improve heuristics. gcc/testsuite/ChangeLog: * c-c++-common/Wmisleading-indentation.c: Add more tests. OK after confirming a successful bootstrap regression test. jeff
[PATCH 1/2][ARM] Record FPU features as a bit-set
Hello, The ARM backend records FPU features as booleans, one for each feature. This means that adding support for a new feature involves updating every entry in the list of FPU descriptions in arm-fpus.def. This patch series changes the representation of FPU features to use a simple bit-set and flags, as is done elsewhere. This patch adds the new FPU feature representation, with feature sets represented as unsigned longs. Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com * config/arm/arm.h (arm_fpu_fset): New. (ARM_FPU_FSET_HAS): New. (FPU_FL_NONE): New. (FPU_FL_NEON): New. (FPU_FL_FP16): New. (FPU_FL_CRYPTO): New. From 0ae697751afd9420ece15432e4892a60574b1d56 Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Wed, 10 Jun 2015 09:57:55 +0100 Subject: [PATCH 1/2] Add fpu feature set definitions. Change-Id: I9614d12b19f068ae2e0cebc1a6c3903972c73d6a --- gcc/config/arm/arm.h | 13 + 1 file changed, 13 insertions(+) diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index 373dc85..eadbcec 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -318,6 +318,19 @@ extern void (*arm_lang_output_object_attributes_hook)(void); {mode, %{!marm:%{!mthumb:-m%(VALUE)}}}, \ {tls, %{!mtls-dialect=*:-mtls-dialect=%(VALUE)}}, +/* FPU feature sets. */ + +typedef unsigned long arm_fpu_fset; + +/* Test for an FPU feature. */ +#define ARM_FPU_FSET_HAS(S,F) (((S) (F)) == F) + +/* FPU Features. */ +#define FPU_FL_NONE (0) +#define FPU_FL_NEON (1 0) /* NEON instructions. */ +#define FPU_FL_FP16 (1 1) /* Half-precision. */ +#define FPU_FL_CRYPTO (1 2) /* Crypto extensions. */ + /* Which floating point model to use. */ enum arm_fp_model { -- 1.9.1
Re: [Ping, Patch, fortran, 64674, v3] [OOP] ICE in ASSOCIATE with class array
Dear Andre, It was indeed the associate(pam = im(:, c)) that I had in mind. If you have that working and in the tescase, that's good enough for me. Cheers Paul On 22 June 2015 at 17:15, Andre Vehreschild ve...@gmx.de wrote: Hi Paul, On Mon, 22 Jun 2015 16:04:09 +0200 Paul Richard Thomas paul.richard.tho...@gmail.com wrote: Hi Andre, Some questions: The first and second chunks look a bit awkward in parse.c. Do they have to be there in order that primary.c does the right thing? I tried at first to do this rank resolution in primary.c, but that was too late. parse.c needs to propagate the rank correctly. When I remember correctly, then doing so later prevents parse.c to correctly recognize the vector (from the example in the initial description) as such, i.e., the indexing in vector(2) was not allowed. gfortran assumed vector to be scalar. So, IMHO yes. Could the whole lot be transferred to resolve.c or would that make it horribly messy? Again, IMO is it not easily transferable to primary.c or even resolve.c. Therefore no. I couldn't apply the patch right now - does it work with variable expressions for the target array indices? I am not quite sure, what you mean. Something like this: associate(pam = im(2:3, 2:3)) pam = 9 pam(1,2) = 10 do c = 1, 2 pam(2, c) = 0 end do end associate ? I have added that to the testcase and it works. Or do you want the variable expressions in the target, like this: integer :: expect(20)= 23 integer :: im(4,5) = 23 integer :: c expect(2:3) = 9 do c = 1, 5 im = 23 associate(pam = im(:, c)) pam(2:3) = 9 end associate if (any (reshape(im, [20]) /= expect)) call abort() ! Shift expect expect = [expect(17:), expect(:16)] end do Will this do, or did you have something more elaborate in mind? This is also working and in the testcase now. Thanks for the review so far. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de -- Outside of a dog, a book is a man's best friend. Inside of a dog it's too dark to read. Groucho Marx
[PATCH 1/4][ARM] Make room for more CPU feature flags.
Hello, The ARM backend uses an unsigned long to record CPU feature flags and there are currently 30 bits in use. To be able to support new architecture features, the current representation will need to be replaced so that more flags can be recorded. This series of patches replaces the single unsigned long with a representation based on an array of unsigned longs. Constructors and operations are explicitly defined for the new representation and the backend is updated to use the new operations. The individual patches: - Make architecture flags explicit in arm-cores.def, to prepare for the changes. - Add definitions for the new representation as type arm_feature_set and macros with prefix ARM_FSET. - Replace uses of the old representation with the arm_feature_set type and operations. - Rework arm-cores.def and arm-arches.def to make the feature set constructions explicit. The series tested for arm-none-linux-gnueabihf with check-gcc. This patch moves the derived FL_FOR_ARCH##ARCH flags from the expansion of macro arm.c/ARM_CORE and makes them explicit in the entries in arm-cores.def. This patch tested for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew 2015-06-22 Matthew Wahab matthew.wa...@arm.com * gcc/config/arm/arm-cores.def: Add FL_FOR_ARCH flag for each ARM_CORE entry. Fix some white-space. * gcc/config/arm/arm.c: Remove FL_FOR_ARCH derivation from ARM_CORE definition. From b8d4b4ef938d64996d0d20aaa9974757057aaad2 Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Fri, 5 Jun 2015 12:33:34 +0100 Subject: [PATCH 1/4] [ARM] Make ARCH flags explicit in arm-cores.def Change-Id: I13a79c89bebaf82aa921f0502b721ff5d9b92dbe --- gcc/config/arm/arm-cores.def | 200 +-- gcc/config/arm/arm.c | 2 +- 2 files changed, 101 insertions(+), 101 deletions(-) diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 103c314..f362c27 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -43,134 +43,134 @@ Some tools assume no whitespace up to the first , in each entry. */ /* V2/V2A Architecture Processors */ -ARM_CORE(arm2, arm2, arm2, 2, FL_CO_PROC | FL_MODE26, slowmul) -ARM_CORE(arm250, arm250, arm250, 2, FL_CO_PROC | FL_MODE26, slowmul) -ARM_CORE(arm3, arm3, arm3, 2, FL_CO_PROC | FL_MODE26, slowmul) +ARM_CORE(arm2, arm2, arm2, 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul) +ARM_CORE(arm250, arm250, arm250, 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul) +ARM_CORE(arm3, arm3, arm3, 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2, slowmul) /* V3 Architecture Processors */ -ARM_CORE(arm6, arm6, arm6, 3, FL_CO_PROC | FL_MODE26, slowmul) -ARM_CORE(arm60, arm60, arm60, 3, FL_CO_PROC | FL_MODE26, slowmul) -ARM_CORE(arm600, arm600, arm600, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul) -ARM_CORE(arm610, arm610, arm610, 3, FL_MODE26 | FL_WBUF, slowmul) -ARM_CORE(arm620, arm620, arm620, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul) -ARM_CORE(arm7, arm7, arm7, 3, FL_CO_PROC | FL_MODE26, slowmul) -ARM_CORE(arm7d, arm7d, arm7d, 3, FL_CO_PROC | FL_MODE26, slowmul) -ARM_CORE(arm7di, arm7di, arm7di, 3, FL_CO_PROC | FL_MODE26, slowmul) -ARM_CORE(arm70, arm70, arm70, 3, FL_CO_PROC | FL_MODE26, slowmul) -ARM_CORE(arm700, arm700, arm700, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul) -ARM_CORE(arm700i, arm700i, arm700i, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF, slowmul) -ARM_CORE(arm710, arm710, arm710, 3, FL_MODE26 | FL_WBUF, slowmul) -ARM_CORE(arm720, arm720, arm720, 3, FL_MODE26 | FL_WBUF, slowmul) -ARM_CORE(arm710c, arm710c, arm710c, 3, FL_MODE26 | FL_WBUF, slowmul) -ARM_CORE(arm7100, arm7100, arm7100, 3, FL_MODE26 | FL_WBUF, slowmul) -ARM_CORE(arm7500, arm7500, arm7500, 3, FL_MODE26 | FL_WBUF, slowmul) +ARM_CORE(arm6, arm6, arm6, 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm60, arm60, arm60, 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm600, arm600, arm600, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm610, arm610, arm610, 3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm620, arm620, arm620, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm7, arm7, arm7, 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm7d, arm7d, arm7d, 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm7di, arm7di, arm7di, 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm70, arm70, arm70, 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm700, arm700, arm700, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm700i, arm700i, arm700i, 3, FL_CO_PROC | FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm710, arm710, arm710, 3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm720, arm720, arm720, 3, FL_MODE26 | FL_WBUF | FL_FOR_ARCH3, slowmul) +ARM_CORE(arm710c,
Re: RFA: Fix isl-ast-gen-if-1.c test
On 06/22/2015 09:38 AM, Nick Clifton wrote: Hi Guys, The test file gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c file was generating an unexpected failure for the RX. When I investigated I found that a return address on the stack was being corrupted, and I tracked it down to the foo() function: foo (int a[], int n) { int i; for (i = 0; i n; i++) { if (i 25) a[i] = i; a[n - i] = 1; } } The problem is that when i is 0, the line a[n - i] writes to a[50] which is beyond the end of the a array. (In the RX case it writes over the return address on the stack). The patch below fixes the problem, although it could also be solved by increasing the size of the a array when it is declared in main(). OK to apply ? Cheers Nick gcc/testsuite/ChangeLog 2015-06-22 Nick Clifton ni...@redhat.com * gcc.dg/graphite/isl-ast-gen-if-1.c (foo): Prevent writing after the end of the array. I'd tend to prefer to change the size of the array -- adding another conditional in the loop may have unintended consequences that possibly scramble things just enough to compromise the test. jeff
Re: [PATCH] Expand PIC calls without PLT with -fno-plt
On 04/05/15 17:37, Alexander Monakov wrote: This patch introduces option -fno-plt that allows to expand calls that would go via PLT to load the address of the function immediately at call site (which introduces a GOT load). Cover letter explains the motivation for this patch. New option documentation for invoke.texi is missing from the patch; if this is accepted I'll be happy to send a v2 with documentation added. * calls.c (prepare_call_address): Transform PLT call to GOT lookup and indirect call by forcing address into a pseudo with -fno-plt. * common.opt (flag_plt): New option. Have done a quick experiment, -fno-plt doesn't work on AArch64. it's because although this patch force the function address into register, but the combine pass runs later combine it back as AArch64 have defined such insn pattern. For X86, it's not combined back. From the rtl dump, it's because the rtl pre pass has moved the address load instruction into another basic block and combine pass don't combine across basic blocks. Also, x86 backend has done some check on flag_plt in the new added ix86_nopic_noplt_attribute_p which could help generate correct insns. What I can think of the fix on AArch64 is by restricting the call symbol under flag_plt == true only, so that call via register can't be combined into call symbol direct, Or better to prohibit combine pass for such combining? as the generic fix on combine may fix other broken targets. Thoughts? Regards, Jiong
Re: [gomp4] Preserve NVPTX reconvergence points
On 06/22/15 11:18, Bernd Schmidt wrote: You can have a hint that it is desirable, but not a hint that it is correct (because passes in between may invalidate that). The OpenACC directives guarantee to the compiler that the program can be transformed into a parallel form. If we lose them early we must then rely on our analysis which may not be strong enough to prove that the loop can be parallelized. If we make these transformations early enough, while we still have the OpenACC directives, we can guarantee that we do exactly what the programmer specified. How does this differ from openmp's needs to preserve parallelism on a parallel loop? Is it more than the reconvergence issue? nathan -- Nathan Sidwell
Re: [PATCH] top-level for libvtv: use normal (not raw_cxx) target exports
On 06/19/2015 11:36 AM, Paolo Bonzini wrote: On 09/06/2015 16:22, Michael Haubenwallner wrote: Hi build machinery maintainers, since we always build the C++ compiler now, I fail to see the need to still use RAW_CXX_TARGET_EXPORTS for libvtv. The situation to expose the problem is: * Use a multilib-enabled x86_64-linux box. * Use a 64-bit (multilib-disabled) bootstrap compiler (binary image). $ configure --enable-multilib --with-system-zlib $ make bootstrap When it comes to build the 32-bit libvtv, it breaks because of using CC=/build/prev-gcc/xgcc -m32 CXX=g++ -m32, while it should use CC=/build/prev-gcc/xgcc -m32 CXX=/build/prev-gcc/xg++ -m32 instead. Unfortunately, I've been unable to reproduce this problem for a while now, and it turns out that I also used --enable-maintainer-mode. And it happens only when some generated autotool-file is updated while it is in use - not really sure which one though, probably some configure script. But still: However, I'm not sure about the general question behind: Should it work to bootstrap the multilib-compiler using a non-multilib one? This also needs above configure flags to work around two more but minor issues, which I'm unsure about whether I can/should fix at all: * --enable-multilib: Without this, the user friendly check is breaking, since https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=205975 Why is it breaking? The OS I'm running on is a multilib-enabled x86_64 (Gentoo) Linux, and 32-bit development libraries (libc and headers) are available. The compiler I want to bootstrap the multilib compiler with is an x86_64-only binary image (c,c++,ada), built _without_ multilib. Now this user friendly check tries to create a 32-bit executable using that 64-bit-only bootstrap gcc, which lacks 32-bit libgcc* - while it can create 32-bit object files though. * --with-system-zlib: Without this, --enable-multilib tries to build a 32-bit zlib with CC=/build/32/./prev-gcc/xgcc Ouch, that's a separate bug... Arguably --with-system-zlib should be the default these days (and should have been for 10 years or so). This one I'll leave untouched. The patch is ok. Even if the problem raises only because maintainer-mode isn't multilib-save? Thanks! /haubi/
Re: Re: [PATCH] [PATCH][ARM] Fix split-live-ranges-for-shrink-wrap.c testcase.
On 20/05/15 21:14, Joseph Myers wrote: Again, the condition you propose to add doesn't make sense. arm_arch_X_ok is only appropriate for tests using an explicit -march=X. Testing with -march=armv7* should automatically skip this test anyway because it would cause arm_thumb1_ok to fail. Hi, I adjusted the patch to skip execution split-live-ranges-for-shrink-wrap.c with explicitly specified -march=armv4t and provide -march=armv5t flag = for arm_arch_v5t_ok targets. Is patch ok? Alex gcc/testsuite 2015-06-22 Alex Velenko alex.vele...@arm.com * gcc.target/arm/split-live-ranges-for-shrink-wrap.c (dg-skip-if): Skip -march=armv4t. (dg-additional-options): Set armv5t flag. diff --git a/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c b/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c index e36000b..3cb93dc 100644 --- a/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c +++ b/gcc/testsuite/gcc.target/arm/split-live-ranges-for-shrink-wrap.c @@ -1,6 +1,8 @@ /* { dg-do assemble } */ /* { dg-options -mthumb -Os -fdump-rtl-ira } */ /* { dg-require-effective-target arm_thumb1_ok } */ +/* { dg-skip-if do not test on armv4t { *-*-* } { -march=armv4t } } = */ +/* { dg-additional-options -march=armv5t {target arm_arch_v5t_ok} } */ int foo (char *, char *, int); int test (int d, char * out, char *in, int len) --1.8.1.2--
[committed] Add missing update_stmt in transform_to_exit_first_loop_alt
Hi, I realized that transform_to_exit_first_loop_alt is missing an update_stmt for the gimple_cond_set_rhs (transform_to_exit_first_loop has an update_stmt after a similar gimple_cond_set_lhs). Bootstrapped and reg-tested on x86_64. Committed as trivial. Thanks, - Tom Add missing update_stmt in transform_to_exit_first_loop_alt 2015-06-22 Tom de Vries t...@codesourcery.com * tree-parloops.c (transform_to_exit_first_loop_alt): Add update_stmt for cond_stmt. --- gcc/tree-parloops.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 28112b2..7123c27 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1679,6 +1679,7 @@ transform_to_exit_first_loop_alt (struct loop *loop, /* Set the new loop bound. */ gimple_cond_set_rhs (cond_stmt, bound); + update_stmt (cond_stmt); /* Repair the ssa. */ vecedge_var_map *v = redirect_edge_var_map_vector (post_inc_edge); -- 1.9.1
Re: [i386, PATCH, 2/3] IA MCU psABI support: changes to libraries.
Hello, Patch in the bottom adds support of IA MCU psABI to libgcc (enables soft-fp) and libdecnumber (enables it for IA MCU). Bootstrapped and regtested on top of [1/3] patch. config/ * dfp.m4 (enable_decimal_float): Also set to yes for i?86*-*-elfiamcu target. gcc/ * configure: Regenerated. libdecnumber/ * configure: Regenerated. libgcc/ * config.host: Support i[34567]86-*-elfiamcu target. * config/i386/32/t-iamcu: New file. * configure: Regenerated. Is it OK for trunk? -- Thanks, K diff --git a/config/dfp.m4 b/config/dfp.m4 index 48683f0..5b29089 100644 --- a/config/dfp.m4 +++ b/config/dfp.m4 @@ -21,7 +21,7 @@ Valid choices are 'yes', 'bid', 'dpd', and 'no'.]) ;; [ case $1 in powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \ -i?86*-*-gnu* | \ +i?86*-*-elfiamcu | i?86*-*-gnu* | \ i?86*-*-mingw* | x86_64*-*-mingw* | \ i?86*-*-cygwin* | x86_64*-*-cygwin*) enable_decimal_float=yes diff --git a/gcc/configure b/gcc/configure index b26a86f..64eeac6 100755 --- a/gcc/configure +++ b/gcc/configure @@ -7317,7 +7317,7 @@ else case $target in powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \ -i?86*-*-gnu* | \ +i?86*-*-elfiamcu | i?86*-*-gnu* | \ i?86*-*-mingw* | x86_64*-*-mingw* | \ i?86*-*-cygwin* | x86_64*-*-cygwin*) enable_decimal_float=yes diff --git a/libdecnumber/configure b/libdecnumber/configure index 2720f46..964837d 100755 --- a/libdecnumber/configure +++ b/libdecnumber/configure @@ -4614,7 +4614,7 @@ else case $target in powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \ -i?86*-*-gnu* | \ +i?86*-*-elfiamcu | i?86*-*-gnu* | \ i?86*-*-mingw* | x86_64*-*-mingw* | \ i?86*-*-cygwin* | x86_64*-*-cygwin*) enable_decimal_float=yes diff --git a/libgcc/config.host b/libgcc/config.host index 4df..dd8e356 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -562,6 +562,9 @@ x86_64-*-darwin*) tm_file=$tm_file i386/darwin-lib.h extra_parts=$extra_parts crtprec32.o crtprec64.o crtprec80.o crtfastmath.o ;; +i[34567]86-*-elfiamcu) + tmake_file=$tmake_file i386/t-crtstuff t-softfp-sfdf i386/32/t-softfp i386/32/t-iamcu i386/t-softfp t-softfp t-dfprules + ;; i[34567]86-*-elf*) tmake_file=$tmake_file i386/t-crtstuff t-crtstuff-pic t-libgcc-pic ;; diff --git a/libgcc/config/i386/32/t-iamcu b/libgcc/config/i386/32/t-iamcu new file mode 100644 index 000..0752bff --- /dev/null +++ b/libgcc/config/i386/32/t-iamcu @@ -0,0 +1,6 @@ +softfp_float_modes += tf +softfp_extensions += sftf dftf xftf +softfp_truncations += tfsf tfdf tfxf +softfp_exclude_libgcc2 := n + +HOST_LIBGCC2_CFLAGS += -mlong-double-80 diff --git a/libgcc/configure b/libgcc/configure index ce66d1d..e22cbcb 100644 --- a/libgcc/configure +++ b/libgcc/configure @@ -4436,7 +4436,7 @@ else case $host in powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \ -i?86*-*-gnu* | \ +i?86*-*-elfiamcu | i?86*-*-gnu* | \ i?86*-*-mingw* | x86_64*-*-mingw* | \ i?86*-*-cygwin* | x86_64*-*-cygwin*) enable_decimal_float=yes
[PATCH 3/4][ARM] Use new feature set representation.
Hello, The ARM backend uses an unsigned long to record CPU feature flags and there are currently 30 bits in use. This series of patches replaces the single unsigned long with a representation based on an array of values. This patch replaces the existing representation of CPU feature sets with the type arm_feature_set and ARM_FSET macros added in an earlier patch in this series. Tested arm-none-linux-gnueabihf with check-gcc. Also tested as part of the series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com * config/arm/arm-builtins.c (def_mbuiltin): Use ARM_FSET macro. * config/arm/arm-protos.h (insn_flags): Declare as type arm_feature_set. (tune_flags): Likewise. * config/arm/arm.c (feature_count): New. (insn_flags): Define as type arm_feature_set. (tune_flags): Likewise. (struct processors): Define field flags as type arm_feature_set. (all_cores): Update for change to struct processors. (all_architectures): Likewise. (arm_option_check_internal): Use arm_feature_set and ARM_FSET macros. (arm_option_override_internal): Likewise. (arm_option_override): Likewise. From 8b5e132868da066eb8a8673286b796656b9ed127 Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Mon, 8 Jun 2015 14:11:13 +0100 Subject: [PATCH 3/4] Use feature sets. Change-Id: I5a1b162102dd19b6376637218dc548502112cf4b --- gcc/config/arm/arm-builtins.c | 4 +- gcc/config/arm/arm-protos.h | 4 +- gcc/config/arm/arm.c | 131 -- 3 files changed, 80 insertions(+), 59 deletions(-) diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c index f960e0a..31203d4 100644 --- a/gcc/config/arm/arm-builtins.c +++ b/gcc/config/arm/arm-builtins.c @@ -1074,10 +1074,10 @@ arm_init_neon_builtins (void) #undef NUM_DREG_TYPES #undef NUM_QREG_TYPES -#define def_mbuiltin(MASK, NAME, TYPE, CODE)\ +#define def_mbuiltin(FLAG, NAME, TYPE, CODE)\ do \ { \ - if ((MASK) insn_flags) \ + if (ARM_FSET_HAS_CPU1 (insn_flags, (FLAG))) \ {\ tree bdecl; \ bdecl = add_builtin_function ((NAME), (TYPE), (CODE), \ diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index a19d54d..859b5d2 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -515,11 +515,11 @@ typedef struct /* The bits in this mask specify which instructions we are allowed to generate. */ -extern unsigned long insn_flags; +extern arm_feature_set insn_flags; /* The bits in this mask specify which instruction scheduling options should be used. */ -extern unsigned long tune_flags; +extern arm_feature_set tune_flags; /* Nonzero if this chip supports the ARM Architecture 3M extensions. */ extern int arm_arch3m; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index b21f433..dd892a7 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -105,6 +105,7 @@ static void arm_add_gc_roots (void); static int arm_gen_constant (enum rtx_code, machine_mode, rtx, HOST_WIDE_INT, rtx, rtx, int, int); static unsigned bit_count (unsigned long); +static unsigned feature_count (const arm_feature_set*); static int arm_address_register_rtx_p (rtx, int); static int arm_legitimate_index_p (machine_mode, rtx, RTX_CODE, int); static bool is_called_in_ARM_mode (tree); @@ -771,11 +772,11 @@ static int thumb_call_reg_needed; /* The bits in this mask specify which instructions we are allowed to generate. */ -unsigned long insn_flags = 0; +arm_feature_set insn_flags = ARM_FSET_EMPTY; /* The bits in this mask specify which instruction scheduling options should be used. */ -unsigned long tune_flags = 0; +arm_feature_set tune_flags = ARM_FSET_EMPTY; /* The highest ARM architecture version supported by the target. */ @@ -928,7 +929,7 @@ struct processors enum processor_type core; const char *arch; enum base_architecture base_arch; - const unsigned long flags; + const arm_feature_set flags; const struct tune_params *const tune; }; @@ -2197,10 +2198,10 @@ static const struct processors all_cores[] = /* ARM Cores */ #define ARM_CORE(NAME, X, IDENT, ARCH, FLAGS, COSTS) \ {NAME, IDENT, #ARCH, BASE_ARCH_##ARCH, \ -FLAGS, arm_##COSTS##_tune}, + ARM_FSET_MAKE_CPU1 (FLAGS), arm_##COSTS##_tune}, #include arm-cores.def #undef ARM_CORE - {NULL, arm_none, NULL, BASE_ARCH_0, 0, NULL} + {NULL, arm_none, NULL, BASE_ARCH_0, ARM_FSET_EMPTY, NULL} }; static const struct processors all_architectures[] = @@ -2210,10 +2211,10 @@ static const struct processors all_architectures[] = from the core. */ #define ARM_ARCH(NAME, CORE, ARCH, FLAGS) \ - {NAME, CORE, #ARCH, BASE_ARCH_##ARCH, FLAGS, NULL}, + {NAME, CORE, #ARCH, BASE_ARCH_##ARCH, ARM_FSET_MAKE_CPU1
Re: Fix more of C/fortran canonical type issues
Hi, I would like to ping this. There are still few things to fix to make our merging compliant at least for C/C++/Fortran rules (the array bounds for Fortran and union ordering for C I believe) and I would like to progress on this. I don't like the changes to useless_type_conversion_p much. Why do you preserve qualifiers for the integer kind compares? All the testcases have the integral types in aggregates as members. I already said that I'm happy globbing them together in aggregates. I originally made the testcase this way because I wanted to test the way aggregates are built and because it needs less of fortran code to realize it. It is possible to consutrct same testcase with scalar variables. See the other patch fixing the surprious warning. You need also variant size_t a to be compatible with fortran equivalent of signed size_t a, so it is not only about variables. I'm still not convinced that we need a 1:1 correspondence between canonical types and alias sets. In particular canonical types are used for type compatibility in lhs = rhs assignments (useless_type_conversion_p) which is a transitive relation. Mixing both too much will cause serious confusion. We have alias-sets for a reason. OK, I am not sure if canonical types needs to actually mean the type compatibility in the middle-end sense. It is a language specific thing: The canonical type for this type node, which is used by frontends to compare the type for equality with another type. If two types are equal (based on the semantics of the language), then they will have equivalent TYPE_CANONICAL entries. In a way TYPE_CANONICAL seems bit schizofrenic about if it means language level compatibility, representation compatibility or middle end semantic compatibility. It seems bit odd to define something like useless_type_conversion_p by language specific manner despite the fact its definition is now sound in language independent way as we now have all semantics represented in IL (and flags, well) but I would be happy to update the patch to assign different canonical types to signed/unsigned integers and avoid recursion on those for aggregates/arrays and all other derived types. (After all I plan to do that for pointers incrementally) Here we are lucky that alias.c already contains the globbing for signed/unsigned. Do we want to do the same scheme in other cases? For example next on my list is the fact that array with bounds 3...5 is interoperable with array[3] in C. Here again we can not consider these useless_type_conversion_p because index operation is different, but they are representation compatible. This will need a special case in get_alias_set. I would not like to make get_alias_set, or (with less loss of code quality on non-C languages) in lto's get_alias_set langhook. Honza Richard.
Re: [gomp4] Preserve NVPTX reconvergence points
On 06/22/15 12:20, Jakub Jelinek wrote: OpenMP worksharing loop is just coordination between the threads in the team, which thread takes which subset of the loop's iterations, and optionally followed by a barrier. OpenMP simd loop is a loop that has certain properties guaranteed by the user and can be vectorized. In contrast to this, OpenACC spawns all the threads/CTAs upfront, and then idles on some of them until there is work for them. correct. I expressed my question poorly. What I mean is that in openmp, a loop that is parallelizeable (by user decree, I guess[*]), should not be transformed such that it is not parallelizeable. This seems to me to be a common requirement of both languages. How one gets parallel threads of execution to the body of the loop is a different question. nathan [*] For ones where the compiler needs to detect parallizeablilty, it's preferable that it doesn't do something earlier to force serializeablility. -- Nathan Sidwell
Re: [C++ Patch] Use declspecs-locations[ds_virtual]
On 06/22/2015 09:54 AM, Paolo Carlini wrote: I think this also qualifies as obvious given the past work / discussion: use in one more place declspecs-locations to improve the location of the error message. Agreed, thanks. Jason
Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation
On 06/09/2015 11:31 AM, Patrick Palka wrote: This patch refactors the entry point of -Wmisleading-indentation from: void warn_for_misleading_indentation (location_t guard_loc, location_t body_loc, location_t next_stmt_loc, enum cpp_ttype next_tok_type, const char *guard_kind); to struct token_indent_info { location_t location; cpp_ttype type; rid keyword; }; void warn_for_misleading_indentation (const token_indent_info guard_tinfo, const token_indent_info body_tinfo, const token_indent_info next_tinfo); The purpose of this refactoring is to expose more information to the -Wmisleading-indentation implementation to allow for more advanced heuristics and for better coverage. (I decided to keep the usage of const references because nobody seems to mind. Also I added a new header file, c-indentation.h.) gcc/c-family/ChangeLog: * c-indentation.h (struct token_indent_info): Define. (get_token_indent_info): Define. (warn_for_misleading_information): Declare. * c-common.h (warn_for_misleading_information): Remove. * c-identation.c (warn_for_misleading_indentation): Change declaration to take three token_indent_infos. Adjust accordingly. * c-identation.c (should_warn_for_misleading_indentation): Likewise. Bail out early if the body is a compound statement. (guard_tinfo_to_string): Define. gcc/c/ChangeLog: * c-parser.c (c_parser_if_body): Take token_indent_info argument. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. (c_parser_else_body): Likewise. (c_parser_if_statement): Likewise. (c_parser_while_statement): Likewise. (c_parser_for_statement): Likewise. gcc/cp/ChangeLog: * parser.c (cp_parser_selection_statement): Move handling of semicolon body to ... (cp_parser_implicitly_scoped_statement): .. here. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. Take token_indent_info argument. (cp_parser_already_scoped_statement): Likewise. (cp_parser_selection_statement, cp_parser_iteration_statement): Extract a token_indent_info corresponding to the guard token. The only question in my mind is bootstrap regression testing. From reading the thread for the earlier version of this patch I got the impression you had bootstrapped and regression tested earlier versions. If you could confirm that you've bootstrapped and regression tested this version it'd be appreciated. You can do it on the individual patches or the set as a whole. Jeff
Re: [Patch, C++, PR65882] Check tf_warning flag in build_new_op_1
On 06/19/2015 08:23 PM, Mikhail Maltsev wrote: I see that version 5.2 is set as target milestone for this bug. Should I backport the patch? Please. Jason
Re: [gomp4] Remove some ptxness from middle end
On Mon, Jun 22, 2015 at 01:00:51PM -0400, Nathan Sidwell wrote: + if (GET_CODE (arg) != CONST_INT + || (unsigned HOST_WIDE_INT)INTVAL (arg) = OACC_HWM) Don't we have UINTVAL for this? So UINTVAL (arg). Marek
Re: [PATCH] Check dominator info in compute_dominance_frontiers
On 22/06/15 13:47, Richard Biener wrote: On Mon, Jun 22, 2015 at 1:33 PM, Tom de Vries tom_devr...@mentor.com wrote: On 22/06/15 12:14, Richard Biener wrote: On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, during development of a patch I ran into a case where compute_dominance_frontiers was called with incorrect dominance info. The result was a segmentation violation somewhere in the bitmap code while executing this bitmap_set_bit in compute_dominance_frontiers_1: ... if (!bitmap_set_bit (frontiers[runner-index], b-index)) break; ... The segmentation violation happens because runner-index is 0, and frontiers[0] is uninitialized. [ The initialization in update_ssa looks like this: ... dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun)); FOR_EACH_BB_FN (bb, cfun) bitmap_initialize (dfs[bb-index], bitmap_default_obstack); compute_dominance_frontiers (dfs); ... FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0] (frontiers[0] in compute_dominance_frontiers_1) is not initialized. We could add initialization by making the entry/exit-block bitmap_heads empty and setting the obstack to a reserved obstack bitmap_no_obstack for which allocation results in an assert. ] AFAIU, the immediate problem is not that frontiers[0] is uninitialized, but that the loop reaches the state of runner-index == 0, due to the incorrect dominance info. The patch adds an assert to the loop in compute_dominance_frontiers_1, to make the failure mode cleaner and easier to understand. I think we wouldn't catch all errors in dominance info with this assert. So the patch also contains an ENABLE_CHECKING-enabled verify_dominators call at the start of compute_dominance_frontiers. I'm not sure if: - adding the verify_dominators call is too costly in runtime. - the verify_dominators call should be inside or outside the TV_DOM_FRONTIERS measurement. - there is a level of ENABLE_CHECKING that is more appropriate for the verify_dominators call. Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds? I don't think these kind of asserts are good. A segfault is good by itself (so you can just add the comment if you like). The segfault is not guaranteed to trigger, because it works on uninitialized data. Instead, we may end up modifying valid memory and silently generating wrong code or causing sigsegvs (which will be difficult to track back this error). So I don't think doing nothing is an option here. If we're not going to add this assert, we should initialize the uninitialized data in such a way that we are guaranteed to detect the error. The scheme I proposed above would take care of that. Should I implement that instead? No, instead the check below should catch the error much earlier. Likewise the verify_dominators call is too expensive and misplaced. If then the call belongs in the dom_computed[] == DOM_OK early-out in calculate_dominance_info OK, like this: ... diff --git a/gcc/dominance.c b/gcc/dominance.c index a9e042e..1827eda9 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir) bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false; if (dom_computed[dir_index] == DOM_OK) -return; +{ +#if ENABLE_CHECKING + verify_dominators (CDI_DOMINATORS); +#endif + return; +} timevar_push (TV_DOMINANCE); if (!dom_info_available_p (dir)) ... Yes. I didn't fully understand your comment, do you want me to test this? Sure, it should catch the error. Bootstrapped and reg-tested on x86_64. Committed as attached. Thanks, - Tom Verify dominators in early-out calculate_dominance_info 2015-06-22 Tom de Vries t...@codesourcery.com * dominance.c (calculate_dominance_info): Verify dominators if early-out. --- gcc/dominance.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/dominance.c b/gcc/dominance.c index a9e042e..9c66ca2 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir) bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false; if (dom_computed[dir_index] == DOM_OK) -return; +{ +#if ENABLE_CHECKING + verify_dominators (CDI_DOMINATORS); +#endif + return; +} timevar_push (TV_DOMINANCE); if (!dom_info_available_p (dir)) -- 1.9.1
[committed] Test for flag_parallelize_loops 1
On 19/06/15 11:26, Tom de Vries wrote: Hi, DEF_GOMP_BUILTIN tests for 'flag_parallelize_loops'. But if flag_parallelize_loops is one (which is also the default), then pass_parloops doesn't do anything, and won't generate any OMP constructs. This patch makes DEF_GOMP_BUILTIN tests 'flag_parallelize_loops 1', just like all the other tests of flag_parallelize_loops in the compiler. Build on x86_64 and reg-tested libgomp's c.exp. During bootstrap and reg-test, I found regressions for -fcilkplus. There's a dependency of fcilkplus on the gomp builtins, which is exposed by this patch. This updated patch also enables the gomp builtins for fcilkplus. Bootstrapped and reg-tested on x86_64 on top of trunk. Committed to trunk as obvious. Thanks, - Tom Test for flag_parallelize_loops 1 2015-06-19 Tom de Vries t...@codesourcery.com * builtins.def (DEF_GOMP_BUILTIN): Test 'flag_tree_parallelize_loops 1' instead of 'flag_tree_parallelize_loops'. Test flag_cilkplus. --- gcc/builtins.def | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/gcc/builtins.def b/gcc/builtins.def index 55ce9f6..80e4a9c 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -182,7 +182,9 @@ along with GCC; see the file COPYING3. If not see #define DEF_GOMP_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ DEF_BUILTIN (ENUM, __builtin_ NAME, BUILT_IN_NORMAL, TYPE, TYPE,\ false, true, true, ATTRS, false, \ - (flag_openmp || flag_tree_parallelize_loops \ + (flag_openmp \ + || flag_tree_parallelize_loops 1 \ + || flag_cilkplus \ || flag_offload_abi != OFFLOAD_ABI_UNSET)) /* Builtin used by implementation of Cilk Plus. Most of these are decomposed -- 1.9.1
Re: [PATCH 2/3] Remove is_first_nonwhitespace_on_line(), instead improve get_visual_column()
On 06/09/2015 11:31 AM, Patrick Palka wrote: This patch removes the function is_first_nonwhitespace_on_line() in favor of augmenting the function get_visual_column() to optionally return the visual column corresponding to the first non-whitespace character on the line. Existing usage of is_first_nonwhitespace_on_line() can be trivially replaced by calling get_visual_column() and comparing *out with *first_nws. The rationale for this change is that in many cases it is better to use the visual column of the first non-whitespace character rather than the visual column of the token. Consider: if (p) { foo (1); } else // GUARD if (q) // BODY foo (2); foo (3); // NEXT Here, with current heuristics, we do not emit a warning because we notice that the visual columns of each token line up (suggesting autogenerated code). Yet it is obvious that we should warn here because it misleadingly looks like the foo (3); statement is guarded by the else. If we instead consider the visual column of the first non-whitespace character on the guard line, the columns will not line up thus we will emit the warning. This will be done in the next patch. gcc/c-family/ChangeLog: * c-indentation.c (get_visual_column): Add parameter first_nws, use it. Update comment documenting the function. (is_first_nonwhitespace_on_line): Remove. (should_warn_for_misleading_indentation): Replace usage of of is_first_nonwhitespace_on_line with get_visual_column. Same comment/question WRT testing as the prior patch. OK once you've confirmed bootstrap regression testing was completed successfully. jeff
Re: RFA: Fix isl-ast-gen-if-1.c test
Hi Jeff, I'd tend to prefer to change the size of the array -- adding another conditional in the loop may have unintended consequences that possibly scramble things just enough to compromise the test. Okey dokey, here is a revised version. Is this one OK ? Cheers Nick gcc/ChangeLog Index: 2015-06-22 Nick Clifton ni...@redhat.com * gcc.dg/graphite/isl-ast-gen-if.c (main): Increase size of a array to allow a[50] to be a valid location. gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c === --- gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(revision 224722) +++ gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(working copy) @@ -28,7 +28,7 @@ int main (void) { - int a[50]; + int a[51]; /* NB This size allows foo's first iteration to write to a[50]. */ foo (a, 50); int res = array_sum (a); if (res != 49)
Re: [C++/58583] ICE instantiating NSDMIs
On 06/22/15 03:37, Andreas Schwab wrote: Nathan Sidwell nat...@acm.org writes: On 06/20/15 02:09, Andreas Schwab wrote: This also fails on powerpc. what is the build compiler? It is a bootstrapped build, so the build compiler should not matter. ok, thanks. I've just built me a powerpc-linux targeting compiler from an x86_64-linux host. That is showing the expected diagnostic, so I'm still unable to reproduce the failure. nsidwell@build6-lucid-cs:19install/bin/powerpc-linux-gnu-g++ -std=c++11 -c nsdmi-template14.C nsdmi-template14.C:6:20: error: constructor required before non-static data member for 'A0::i' has been parsed int i = (A0(), 0); // { dg-error has been parsed } ^ nsdmi-template14.C: In constructor 'constexpr A0::A()': nsdmi-template14.C:4:22: error: constructor required before non-static data member for 'A0::i' has been parsed templateint struct A // { dg-error has been parsed } ^ nsdmi-template14.C: At global scope: nsdmi-template14.C:6:20: note: synthesized method 'constexpr A0::A()' first required here int i = (A0(), 0); // { dg-error has been parsed } ^ nsdmi-template14.C:14:6: error: recursive instantiation of non-static data member initializer for 'B1::p' B1 x; // { dg-error recursive instantiation of non-static data } ^ nathan
Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation
On 06/18/2015 10:39 AM, David Malcolm wrote: On Thu, 2015-06-18 at 11:41 -0400, Patrick Palka wrote: On Tue, Jun 9, 2015 at 1:31 PM, Patrick Palka patr...@parcs.ath.cx wrote: This patch refactors the entry point of -Wmisleading-indentation from: void warn_for_misleading_indentation (location_t guard_loc, location_t body_loc, location_t next_stmt_loc, enum cpp_ttype next_tok_type, const char *guard_kind); to struct token_indent_info { location_t location; cpp_ttype type; rid keyword; }; void warn_for_misleading_indentation (const token_indent_info guard_tinfo, const token_indent_info body_tinfo, const token_indent_info next_tinfo); The purpose of this refactoring is to expose more information to the -Wmisleading-indentation implementation to allow for more advanced heuristics and for better coverage. (I decided to keep the usage of const references because nobody seems to mind. Also I added a new header file, c-indentation.h.) gcc/c-family/ChangeLog: * c-indentation.h (struct token_indent_info): Define. (get_token_indent_info): Define. (warn_for_misleading_information): Declare. * c-common.h (warn_for_misleading_information): Remove. * c-identation.c (warn_for_misleading_indentation): Change declaration to take three token_indent_infos. Adjust accordingly. * c-identation.c (should_warn_for_misleading_indentation): Likewise. Bail out early if the body is a compound statement. (guard_tinfo_to_string): Define. gcc/c/ChangeLog: * c-parser.c (c_parser_if_body): Take token_indent_info argument. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. (c_parser_else_body): Likewise. (c_parser_if_statement): Likewise. (c_parser_while_statement): Likewise. (c_parser_for_statement): Likewise. gcc/cp/ChangeLog: * parser.c (cp_parser_selection_statement): Move handling of semicolon body to ... (cp_parser_implicitly_scoped_statement): .. here. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. Take token_indent_info argument. (cp_parser_already_scoped_statement): Likewise. (cp_parser_selection_statement, cp_parser_iteration_statement): Extract a token_indent_info corresponding to the guard token. Pinging this series. FWIW, they look reasonable to me; I'm not a reviewer. But as the implementer of the warning, your comments/thoughts are definitely helpful in the review process. We've never worked too hard to find a way to formalize this into a set of policies and procedures, which is probably a mistake. jeff
Re: Re: [PATCH] [PATCH][ARM] Fix split-live-ranges-for-shrink-wrap.c testcase.
I have no more comments on this patch. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Check dominator info in compute_dominance_frontiers
On 22/06/15 13:47, Richard Biener wrote: (eventually also for the case where we end up only computing the fast-query stuff). Like this? ... diff --git a/gcc/dominance.c b/gcc/dominance.c index 9c66ca2..58fc6fd 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -679,6 +679,12 @@ calculate_dominance_info (enum cdi_direction dir) free_dom_info (di); dom_computed[dir_index] = DOM_NO_FAST_QUERY; } + else +{ +#if ENABLE_CHECKING + verify_dominators (CDI_DOMINATORS); +#endif +} compute_dom_fast_query (dir); ... Thanks, - Tom
Re: [Ping, Patch, fortran, 64674, v3] [OOP] ICE in ASSOCIATE with class array
Hi Paul, On Mon, 22 Jun 2015 16:04:09 +0200 Paul Richard Thomas paul.richard.tho...@gmail.com wrote: Hi Andre, Some questions: The first and second chunks look a bit awkward in parse.c. Do they have to be there in order that primary.c does the right thing? I tried at first to do this rank resolution in primary.c, but that was too late. parse.c needs to propagate the rank correctly. When I remember correctly, then doing so later prevents parse.c to correctly recognize the vector (from the example in the initial description) as such, i.e., the indexing in vector(2) was not allowed. gfortran assumed vector to be scalar. So, IMHO yes. Could the whole lot be transferred to resolve.c or would that make it horribly messy? Again, IMO is it not easily transferable to primary.c or even resolve.c. Therefore no. I couldn't apply the patch right now - does it work with variable expressions for the target array indices? I am not quite sure, what you mean. Something like this: associate(pam = im(2:3, 2:3)) pam = 9 pam(1,2) = 10 do c = 1, 2 pam(2, c) = 0 end do end associate ? I have added that to the testcase and it works. Or do you want the variable expressions in the target, like this: integer :: expect(20)= 23 integer :: im(4,5) = 23 integer :: c expect(2:3) = 9 do c = 1, 5 im = 23 associate(pam = im(:, c)) pam(2:3) = 9 end associate if (any (reshape(im, [20]) /= expect)) call abort() ! Shift expect expect = [expect(17:), expect(:16)] end do Will this do, or did you have something more elaborate in mind? This is also working and in the testcase now. Thanks for the review so far. Regards, Andre -- Andre Vehreschild * Email: vehre ad gmx dot de
[ping] Couple of patches for -fdump-ada-spec
Add query for template-dependent arguments to -fdump-ada-spec: http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00403.html Get rid of assembly file with -fdump-ada-spec: http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00420.html Thanks in advance. -- Eric Botcazou
[Aarch64] Expand +rdma documentation, small changes to march and mcpu text.
Hello, The documentation for the ARMv8.1 +rdma option doesn't mention that enabling it also implies enabling Adv.SIMD. This patch fixes that. The documentation for the -march and -mcpu options are also a little messy, this patch tries to make the text clearer and adds a (texinfo) link to the subsection documenting the feature modifiers. Tested by checking the html output. Ok for trunk? Matthew 2015-06-22 Matthew Wahab matthew.wa...@arm.com * doc/invoke.texi (Aarch64 Options, -march): Split out arch and feature description, split out the native option, add a link to the feature documentation, rearrange and slightly rewrite text. (Aarch64 options, -mcpu): Likewise. (Aarch64 options, Feature Modifiers): Add an anchor. Mention +rdma implies Adv. SIMD. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index b99ab1c..599dbf0 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12426,24 +12426,26 @@ corresponding flag to the linker. @opindex march Specify the name of the target architecture, optionally suffixed by one or more feature modifiers. This option has the form -@option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the -permissible values for @var{arch} are @samp{armv8-a} or @samp{armv8.1-a}. -The permissible values for @var{feature} are documented in the sub-section -below. Additionally on native AArch64 GNU/Linux systems the value +@option{-march=@var{arch}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}. + +The permissible values for @var{arch} are @samp{armv8-a} or +@samp{armv8.1-a}. + +For the permissible values for @var{feature}, see the sub-section on +@ref{aarch64-feature-modifiers,,@option{-march} and @option{-mcpu} +Feature Modifiers}. Where conflicting feature modifiers are +specified, the right-most feature is used. + +Additionally on native AArch64 GNU/Linux systems the value @samp{native} is available. This option causes the compiler to pick the architecture of the host system. If the compiler is unable to recognize the architecture of the host system this option has no effect. -Where conflicting feature modifiers are specified, the right-most feature is -used. - -GCC uses this name to determine what kind of instructions it can emit when -generating assembly code. - -Where @option{-march} is specified without either of @option{-mtune} -or @option{-mcpu} also being specified, the code is tuned to perform -well across a range of target processors implementing the target -architecture. +GCC uses @var{name} to determine what kind of instructions it can emit +when generating assembly code. If @option{-march} is specified +without either of @option{-mtune} or @option{-mcpu} also being +specified, the code is tuned to perform well across a range of target +processors implementing the target architecture. @item -mtune=@var{name} @opindex mtune @@ -12456,12 +12458,11 @@ Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. Permissible values for this option are: @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53}. -Additionally on native AArch64 GNU/Linux systems the value @samp{native} -is available. -This option causes the compiler to pick the architecture of and tune the -performance of the code for the processor of the host system. -If the compiler is unable to recognize the processor of the host system -this option has no effect. +Additionally on native AArch64 GNU/Linux systems the value +@samp{native} is available. This option causes the compiler to pick +the architecture of and tune the performance of the code for the +processor of the host system. If the compiler is unable to recognize +the processor of the host system this option has no effect. Where none of @option{-mtune=}, @option{-mcpu=} or @option{-march=} are specified, the code is tuned to perform well across a range @@ -12471,23 +12472,23 @@ This option cannot be suffixed by feature modifiers. @item -mcpu=@var{name} @opindex mcpu -Specify the name of the target processor, optionally suffixed by one or more -feature modifiers. This option has the form -@option{-mcpu=@var{cpu}@r{@{}+@r{[}no@r{]}@var{feature}@r{@}*}}, where the -permissible values for @var{cpu} are the same as those available for -@option{-mtune}. Additionally on native AArch64 GNU/Linux systems the -value @samp{native} is available. -This option causes the compiler to tune the performance of the code for the -processor of the host system. If the compiler is unable to recognize the -processor of the host system this option has no effect. - -The permissible values for @var{feature} are documented in the sub-section -below. - -Where conflicting feature modifiers are specified, the right-most feature is -used. +Specify the name of the target processor, optionally suffixed by one +or more feature modifiers. This option has the form
Re: RFA: Fix isl-ast-gen-if-1.c test
On 06/22/2015 10:07 AM, Nicholas Clifton wrote: Hi Jeff, I'd tend to prefer to change the size of the array -- adding another conditional in the loop may have unintended consequences that possibly scramble things just enough to compromise the test. Okey dokey, here is a revised version. Is this one OK ? Cheers Nick gcc/ChangeLog Index: 2015-06-22 Nick Clifton ni...@redhat.com * gcc.dg/graphite/isl-ast-gen-if.c (main): Increase size of a array to allow a[50] to be a valid location. OK. jeff
Re: [C++ Patch] Remove pointless code in grokdeclarator
I think we should keep a comment to clarify why we don't care about type_quals here. Jason
Re: [C++ Patch] Remove pointless code in grokdeclarator
Hi, On 06/22/2015 06:56 PM, Jason Merrill wrote: I think we should keep a comment to clarify why we don't care about type_quals here. Ok, I will commit with a comment added. Thanks, Paolo.
Re: [gomp4] Preserve NVPTX reconvergence points
On Mon, 22 Jun 2015 16:24:56 +0200 Jakub Jelinek ja...@redhat.com wrote: On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote: One problem is that (at least on the GPU hardware we've considered so far) we're somewhat constrained in how much control we have over how the underlying hardware executes code: it's possible to draw up a scheme where OpenACC source-level control-flow semantics are reflected directly in the PTX assembly output (e.g. to say all threads in a CTA/warp will be coherent after such-and-such a loop), and lowering OpenACC directives quite early seems to make that relatively tractable. (Even if the resulting code is relatively un-optimisable due to the abnormal edges inserted to make sure that the CFG doesn't become ill-formed.) If arbitrary optimisations are done between OMP-lowering time and somewhere around vectorisation (say), it's less clear if that correspondence can be maintained. Say if the code executed by half the threads in a warp becomes physically separated from the code executed by the other half of the threads in a warp due to some loop optimisation, we can no longer easily determine where that warp will reconverge, and certain other operations (relying on coherent warps -- e.g. CTA synchronisation) become impossible. A similar issue exists for warps within a CTA. So, essentially -- I don't know how late loop lowering would interact with: (a) Maintaining a CFG that will work with PTX. (b) Predication for worker-single and/or vector-single modes (actually all currently-proposed schemes have problems with proper representation of data-dependencies for variables and compiler-generated temporaries between predicated regions.) I don't understand why lowering the way you suggest helps here at all. In the proposed scheme, you essentially have whole function in e.g. worker-single or vector-single mode, which you need to be able to handle properly in any case, because users can write such routines themselves. And then you can have a loop in such a function that has some special attribute, a hint that it is desirable to vectorize it (for PTX the PTX way) or use vector-single mode for it in a worker-single function. So, the special pass then of course needs to handle all the needed broadcasting and reduction required to change the mode from e.g. worker-single to vector-single, but the convergence points still would be either on the boundary of such loops to be vectorized or parallelized, or wherever else they appear in normal vector-single or worker-single functions (around the calls to certainly calls?). I think most of my concerns are centred around loops (with the markings you suggest) that might be split into parts: if that cannot happen for loops that are annotated as you describe, maybe things will work out OK. (Apologies for my ignorance here, this isn't a part of the compiler that I know anything about.) Julian
[PATCH 2/2][ARM] Use new FPU features representation
Hello, This patch series changes the representation of FPU features to use a simple bit-set and flags, as is done elsewhere. This patch uses the new representation of FPU feature sets. Tested the series for arm-none-linux-gnueabihf with check-gcc Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com * config/arm/arm-fpus.def: Replace neon, fp16 and crypto boolean fields with feature flags. Update comment. * config/arm/arm.c (ARM_FPU): Update macro. * config/arm/arm.h (TARGET_NEON_FP16): Update feature test. (TARGET_FP16): Likewise. (TARGET_CRYPTO): Likewise. (TARGET_NEON): Likewise. (struct arm_fpu_desc): Remove fields neon, fp16 and crypto. Add field features. From 6f9cd1b41d7597d95bd80aa21344f8e6e011e168 Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Wed, 10 Jun 2015 10:11:56 +0100 Subject: [PATCH 2/2] Use new FPU feature definitions. Change-Id: I0c45e52b08b31433ec2b30fcb666584cabcb826b --- gcc/config/arm/arm-fpus.def | 40 gcc/config/arm/arm.c| 4 ++-- gcc/config/arm/arm.h| 22 +- 3 files changed, 35 insertions(+), 31 deletions(-) diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def index 2dfefd6..efd5896 100644 --- a/gcc/config/arm/arm-fpus.def +++ b/gcc/config/arm/arm-fpus.def @@ -19,30 +19,30 @@ /* Before using #include to read this file, define a macro: - ARM_FPU(NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO) + ARM_FPU(NAME, MODEL, REV, VFP_REGS, FEATURES) The arguments are the fields of struct arm_fpu_desc. genopt.sh assumes no whitespace up to the first , in each entry. */ -ARM_FPU(vfp, ARM_FP_MODEL_VFP, 2, VFP_REG_D16, false, false, false) -ARM_FPU(vfpv3, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false) -ARM_FPU(vfpv3-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, true, false) -ARM_FPU(vfpv3-d16, ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, false, false) -ARM_FPU(vfpv3-d16-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_D16, false, true, false) -ARM_FPU(vfpv3xd, ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, false, false) -ARM_FPU(vfpv3xd-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, false, true, false) -ARM_FPU(neon, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true , false, false) -ARM_FPU(neon-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, true, true, false) -ARM_FPU(vfpv4, ARM_FP_MODEL_VFP, 4, VFP_REG_D32, false, true, false) -ARM_FPU(vfpv4-d16, ARM_FP_MODEL_VFP, 4, VFP_REG_D16, false, true, false) -ARM_FPU(fpv4-sp-d16, ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, false, true, false) -ARM_FPU(fpv5-sp-d16, ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, false, true, false) -ARM_FPU(fpv5-d16, ARM_FP_MODEL_VFP, 5, VFP_REG_D16, false, true, false) -ARM_FPU(neon-vfpv4, ARM_FP_MODEL_VFP, 4, VFP_REG_D32, true, true, false) -ARM_FPU(fp-armv8, ARM_FP_MODEL_VFP, 8, VFP_REG_D32, false, true, false) -ARM_FPU(neon-fp-armv8,ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, false) +ARM_FPU(vfp, ARM_FP_MODEL_VFP, 2, VFP_REG_D16, FPU_FL_NONE) +ARM_FPU(vfpv3, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE) +ARM_FPU(vfpv3-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_FP16) +ARM_FPU(vfpv3-d16, ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_NONE) +ARM_FPU(vfpv3-d16-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_D16, FPU_FL_FP16) +ARM_FPU(vfpv3xd, ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_NONE) +ARM_FPU(vfpv3xd-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_SINGLE, FPU_FL_FP16) +ARM_FPU(neon, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON) +ARM_FPU(neon-fp16, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16) +ARM_FPU(vfpv4, ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_FP16) +ARM_FPU(vfpv4-d16, ARM_FP_MODEL_VFP, 4, VFP_REG_D16, FPU_FL_FP16) +ARM_FPU(fpv4-sp-d16, ARM_FP_MODEL_VFP, 4, VFP_REG_SINGLE, FPU_FL_FP16) +ARM_FPU(fpv5-sp-d16, ARM_FP_MODEL_VFP, 5, VFP_REG_SINGLE, FPU_FL_FP16) +ARM_FPU(fpv5-d16, ARM_FP_MODEL_VFP, 5, VFP_REG_D16, FPU_FL_FP16) +ARM_FPU(neon-vfpv4, ARM_FP_MODEL_VFP, 4, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16) +ARM_FPU(fp-armv8, ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_FP16) +ARM_FPU(neon-fp-armv8,ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16) ARM_FPU(crypto-neon-fp-armv8, - ARM_FP_MODEL_VFP, 8, VFP_REG_D32, true, true, true) + ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO) /* Compatibility aliases. */ -ARM_FPU(vfp3, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, false, false, false) +ARM_FPU(vfp3, ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e79a369..e104d2f 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2231,8 +2231,8 @@ char arm_arch_name[] = __ARM_ARCH_0UNK__; static const struct arm_fpu_desc all_fpus[] = { -#define ARM_FPU(NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO) \ - { NAME, MODEL, REV, VFP_REGS, NEON, FP16, CRYPTO }, +#define ARM_FPU(NAME, MODEL, REV, VFP_REGS, FEATURES) \ + {
Re: [gomp4] Preserve NVPTX reconvergence points
On 06/22/2015 04:24 PM, Jakub Jelinek wrote: I don't understand why lowering the way you suggest helps here at all. In the proposed scheme, you essentially have whole function in e.g. worker-single or vector-single mode, which you need to be able to handle properly in any case, because users can write such routines themselves. And then you can have a loop in such a function that has some special attribute, a hint that it is desirable to vectorize it (for PTX the PTX way) or use vector-single mode for it in a worker-single function. You can have a hint that it is desirable, but not a hint that it is correct (because passes in between may invalidate that). The OpenACC directives guarantee to the compiler that the program can be transformed into a parallel form. If we lose them early we must then rely on our analysis which may not be strong enough to prove that the loop can be parallelized. If we make these transformations early enough, while we still have the OpenACC directives, we can guarantee that we do exactly what the programmer specified. Bernd
[PATCH 2/4][ARM] Add feature set definitions.
Hello, The ARM backend uses an unsigned long to record CPU feature flags and there are currently 30 bits in use. This series of patches replaces the single unsigned long with a representation based on an array of values. This patch adds, but doesn't use, type arm_feature_set and macros prefixed with ARM_FSET to represent and operate on feature sets. Tested by building with no errors. Also tested as part of the series, for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com * config/arm/arm-protos.h (FL_NONE): New. (FL_ANY): New. (arm_feature_set): New. (ARM_FSET_MAKE): New. (ARM_FSET_MAKE_CPU1): New. (ARM_FSET_MAKE_CPU2): New. (ARM_FSET_CPU1): New. (ARM_FSET_CPU2): New. (ARM_FSET_EMPTY): New. (ARM_FSET_ANY): New. (ARM_FSET_HAS_CPU1): New. (ARM_FSET_HAS_CPU2): New. (ARM_FSET_ADD_CPU1): New. (ARM_FSET_ADD_CPU2): New. (ARM_FSET_DEL_CPU1): New. (ARM_FSET_DEL_CPU2): New. (ARM_FSET_UNION): New. (ARM_FSET_INTER): New. (ARM_FSET_XOR): New. (ARM_FSET_EXCLUDE): New. (AFM_FSET_IS_EMPTY): New. (ARM_FSET_CPU_SUBSET): New. From 1a98a80b64427f7bb97212ae9ecff515e980ddb7 Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Thu, 4 Jun 2015 15:35:25 +0100 Subject: [PATCH 2/4] Add feature set definitions. Change-Id: I5f89b46ea57e35f477ec4751fea3cb6ee8fce251 --- gcc/config/arm/arm-protos.h | 101 1 file changed, 101 insertions(+) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 62f91ef..a19d54d 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -346,6 +346,8 @@ extern bool arm_is_constant_pool_ref (rtx); /* Flags used to identify the presence of processor capabilities. */ /* Bit values used to identify processor capabilities. */ +#define FL_NONE (0) /* No flags. */ +#define FL_ANY (0x)/* All flags. */ #define FL_CO_PROC(1 0)/* Has external co-processor bus */ #define FL_ARCH3M (1 1)/* Extended multiply */ #define FL_MODE26 (1 2)/* 26-bit mode support */ @@ -412,6 +414,105 @@ extern bool arm_is_constant_pool_ref (rtx); #define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM) #define FL_FOR_ARCH8A (FL_FOR_ARCH7VE | FL_ARCH8) +/* There are too many feature bits to fit in a single word so the set of cpu and + fpu capabilities is a structure. A feature set is created and manipulated + with the ARM_FSET macros. */ + +typedef struct +{ + unsigned long cpu[2]; +} arm_feature_set; + + +/* Initialize a feature set. */ + +#define ARM_FSET_MAKE(CPU1,CPU2) { { (CPU1), (CPU2) } } + +#define ARM_FSET_MAKE_CPU1(CPU1) ARM_FSET_MAKE ((CPU1), (FL_NONE)) +#define ARM_FSET_MAKE_CPU2(CPU2) ARM_FSET_MAKE ((FL_NONE), (CPU2)) + +/* Accessors. */ + +#define ARM_FSET_CPU1(S) ((S).cpu[0]) +#define ARM_FSET_CPU2(S) ((S).cpu[1]) + +/* Useful combinations. */ + +#define ARM_FSET_EMPTY ARM_FSET_MAKE (FL_NONE, FL_NONE) +#define ARM_FSET_ANY ARM_FSET_MAKE (FL_ANY, FL_ANY) + +/* Tests for a specific CPU feature. */ + +#define ARM_FSET_HAS_CPU1(A, F) (((A).cpu[0] (F)) == F) +#define ARM_FSET_HAS_CPU2(A, F) (((A).cpu[1] (F)) == F) + +/* Add a feature to a feature set. */ + +#define ARM_FSET_ADD_CPU1(DST, F) \ + do { \ +(DST).cpu[0] |= (F); \ + } while (0) + +#define ARM_FSET_ADD_CPU2(DST, F) \ + do { \ +(DST).cpu[1] |= (F); \ + } while (0) + +/* Remove a feature from a feature set. */ + +#define ARM_FSET_DEL_CPU1(DST, F) \ + do { \ +(DST).cpu[0] = ~(F); \ + } while (0) + +#define ARM_FSET_DEL_CPU2(DST, F) \ + do { \ +(DST).cpu[1] = ~(F); \ + } while (0) + +/* Union of feature sets. */ + +#define ARM_FSET_UNION(DST,F1,F2) \ + do { \ +(DST).cpu[0] = (F1).cpu[0] | (F2).cpu[0]; \ +(DST).cpu[1] = (F1).cpu[1] | (F2).cpu[1]; \ + } while (0) + +/* Intersection of feature sets. */ + +#define ARM_FSET_INTER(DST,F1,F2) \ + do { \ +(DST).cpu[0] = (F1).cpu[0] (F2).cpu[0]; \ +(DST).cpu[1] = (F1).cpu[1] (F2).cpu[1]; \ + } while (0) + +/* Exclusive disjunction. */ + +#define ARM_FSET_XOR(DST,F1,F2)\ + do { \ +(DST).cpu[0] = (F1).cpu[0] ^ (F2).cpu[0]; \ +(DST).cpu[1] = (F1).cpu[1] ^ (F2).cpu[1]; \ + } while (0) + +/* Difference of feature sets: F1 excluding the elements of F2. */ + +#define ARM_FSET_EXCLUDE(DST,F1,F2) \ + do { \ +(DST).cpu[0] = (F1).cpu[0] ~(F2).cpu[0]; \ +(DST).cpu[1] = (F1).cpu[1] ~(F2).cpu[1]; \ + } while (0) + +/* Test for an empty feature set. */ + +#define ARM_FSET_IS_EMPTY(A) \ + (!((A).cpu[0]) !((A).cpu[1])) + +/* Tests whether the cpu features of A are a subset of B. */ + +#define ARM_FSET_CPU_SUBSET(A,B) \ + A).cpu[0] (B).cpu[0]) == (A).cpu[0])
Re: genmatch: guess the type of a?b:c as b instead of a
On 06/06/2015 05:34 AM, Marc Glisse wrote: Hello, as discussed around https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00041.html we are currently guessing the type of a?b:c incorrectly. This does not affect current simplifications, because the only 'cond' in output patterns are at the outermost level, so their type is forced to 'type' and never guessed. Indeed, the patch does not change the generated *-match.c. It would allow removing an explicit cond:itype in a patch posted by Jeff. I tested it on a dummy .pd file containing: (simplify (plus @0 (plus @1 @2)) (negate (cond @0 @1 @2))) and the generated files differ by: - res = fold_build3_loc (loc, COND_EXPR, TREE_TYPE (ops1[0]), ops1[0], ops1[1], ops1[2]); + res = fold_build3_loc (loc, COND_EXPR, TREE_TYPE (ops1[1]), ops1[0], ops1[1], ops1[2]); (and something similar for gimple) I wondered about using something like VOID_TYPE_P (TREE_TYPE (ops1[1])) ? TREE_TYPE (ops1[2]) : TREE_TYPE (ops1[1]) but I don't think that will be necessary. Bootstrap is currently broken on many platforms with comparison failures, but since it went that far and generated the same *-match.c files, that seems sufficient testing. 2015-06-08 Marc Glisse marc.gli...@inria.fr * genmatch.c (expr::gen_transform): For conditions, guess the type from the second operand. Thanks for taking care of this. I'd gone back and verified the type was needed, but didn't have to time reduce a testcase for it prior to going on PTO. Jeff
[PATCH 4/4][ARM] Move initializer into arm-cores.def and arm-arches.def
Hello, The ARM backend uses an unsigned long to record CPU feature flags and there are currently 30 bits in use. This series of patches replaces the single unsigned long with a representation based on an array of values. This patch updates the entries in the arm-core.def and arm-arches.def files for the new arm_feature_set representation, moving the initializers from a macro expansion and making them explicit in the file entries. Tested for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/ 2015-08-22 Matthew Wahab matthew.wa...@arm.com * config/arm/arm-arches.def: Replace single value flags with initializer built from ARM_FSET_MAKE_CPU1. * config/arm/arm-cores.def: Likewise. * config/arm/arm.c: (all_cores): Remove ARM_FSET_MAKE_CPU1 derivation from the ARM_CORE macro definition, use the given value instead. (all_architectures): Remove ARM_FSET_MAKE_CPU1 derivation from the ARM_ARCH macro definition, use the given value instead. From 389cfb0e1046b1d84dd3d8920aa5bed50dc19164 Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Mon, 8 Jun 2015 16:15:52 +0100 Subject: [PATCH 4/4] Move feature sets into core and arch def files. Change-Id: Ica484c7d9f46413c196b26a630ff49413b10289b --- gcc/config/arm/arm-arches.def | 56 ++-- gcc/config/arm/arm-cores.def | 200 +- gcc/config/arm/arm.c | 4 +- 3 files changed, 130 insertions(+), 130 deletions(-) diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def index 840c1ff..6d0374a 100644 --- a/gcc/config/arm/arm-arches.def +++ b/gcc/config/arm/arm-arches.def @@ -28,33 +28,33 @@ genopt.sh assumes no whitespace up to the first , in each entry. */ -ARM_ARCH(armv2, arm2, 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2) -ARM_ARCH(armv2a, arm2, 2, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2) -ARM_ARCH(armv3, arm6, 3, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3) -ARM_ARCH(armv3m, arm7m, 3M, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M) -ARM_ARCH(armv4, arm7tdmi, 4, FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4) +ARM_ARCH(armv2, arm2, 2, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)) +ARM_ARCH(armv2a, arm2, 2, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH2)) +ARM_ARCH(armv3, arm6, 3, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3)) +ARM_ARCH(armv3m, arm7m, 3M, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH3M)) +ARM_ARCH(armv4, arm7tdmi, 4, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_MODE26 | FL_FOR_ARCH4)) /* Strictly, FL_MODE26 is a permitted option for v4t, but there are no implementations that support it, so we will leave it out for now. */ -ARM_ARCH(armv4t, arm7tdmi, 4T, FL_CO_PROC | FL_FOR_ARCH4T) -ARM_ARCH(armv5, arm10tdmi, 5, FL_CO_PROC | FL_FOR_ARCH5) -ARM_ARCH(armv5t, arm10tdmi, 5T, FL_CO_PROC | FL_FOR_ARCH5T) -ARM_ARCH(armv5e, arm1026ejs, 5E, FL_CO_PROC | FL_FOR_ARCH5E) -ARM_ARCH(armv5te, arm1026ejs, 5TE, FL_CO_PROC | FL_FOR_ARCH5TE) -ARM_ARCH(armv6, arm1136js, 6, FL_CO_PROC | FL_FOR_ARCH6) -ARM_ARCH(armv6j, arm1136js, 6J, FL_CO_PROC | FL_FOR_ARCH6J) -ARM_ARCH(armv6k, mpcore, 6K, FL_CO_PROC | FL_FOR_ARCH6K) -ARM_ARCH(armv6z, arm1176jzs, 6Z, FL_CO_PROC | FL_FOR_ARCH6Z) -ARM_ARCH(armv6zk, arm1176jzs, 6ZK, FL_CO_PROC | FL_FOR_ARCH6ZK) -ARM_ARCH(armv6t2, arm1156t2s, 6T2, FL_CO_PROC | FL_FOR_ARCH6T2) -ARM_ARCH(armv6-m, cortexm1, 6M, FL_FOR_ARCH6M) -ARM_ARCH(armv6s-m, cortexm1, 6M, FL_FOR_ARCH6M) -ARM_ARCH(armv7, cortexa8, 7, FL_CO_PROC | FL_FOR_ARCH7) -ARM_ARCH(armv7-a, cortexa8, 7A, FL_CO_PROC | FL_FOR_ARCH7A) -ARM_ARCH(armv7ve, cortexa8, 7A, FL_CO_PROC | FL_FOR_ARCH7VE) -ARM_ARCH(armv7-r, cortexr4, 7R, FL_CO_PROC | FL_FOR_ARCH7R) -ARM_ARCH(armv7-m, cortexm3, 7M, FL_CO_PROC | FL_FOR_ARCH7M) -ARM_ARCH(armv7e-m, cortexm4, 7EM, FL_CO_PROC | FL_FOR_ARCH7EM) -ARM_ARCH(armv8-a, cortexa53, 8A, FL_CO_PROC | FL_FOR_ARCH8A) -ARM_ARCH(armv8-a+crc,cortexa53, 8A,FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A) -ARM_ARCH(iwmmxt, iwmmxt, 5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT) -ARM_ARCH(iwmmxt2, iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2) +ARM_ARCH(armv4t, arm7tdmi, 4T, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH4T)) +ARM_ARCH(armv5, arm10tdmi, 5, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5)) +ARM_ARCH(armv5t, arm10tdmi, 5T, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5T)) +ARM_ARCH(armv5e, arm1026ejs, 5E, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH5E)) +ARM_ARCH(armv5te, arm1026ejs, 5TE, ARM_FSET_MAKE_CPU1 (FL_CO_PROC |
Re: [gomp4] Preserve NVPTX reconvergence points
On Mon, Jun 22, 2015 at 12:08:36PM -0400, Nathan Sidwell wrote: On 06/22/15 11:18, Bernd Schmidt wrote: You can have a hint that it is desirable, but not a hint that it is correct (because passes in between may invalidate that). The OpenACC directives guarantee to the compiler that the program can be transformed into a parallel form. If we lose them early we must then rely on our analysis which may not be strong enough to prove that the loop can be parallelized. If we make these transformations early enough, while we still have the OpenACC directives, we can guarantee that we do exactly what the programmer specified. How does this differ from openmp's needs to preserve parallelism on a parallel loop? Is it more than the reconvergence issue? OpenMP has significantly different execution model, a parallel block in OpenMP is run by certain number of threads (the initial thread (the one encountering that region) and then dpeending on clauses and library decisions perhaps others), with a barrier at the end of the region, and afterwards only the initial thread continues again. So, an OpenMP parallel is implemented as a library call, taking outlined function from the parallel's body as one of its arguments and the body is executed by the initial thread and perhaps others. OpenMP worksharing loop is just coordination between the threads in the team, which thread takes which subset of the loop's iterations, and optionally followed by a barrier. OpenMP simd loop is a loop that has certain properties guaranteed by the user and can be vectorized. In contrast to this, OpenACC spawns all the threads/CTAs upfront, and then idles on some of them until there is work for them. Jakub
Re: [gomp4] Remove some ptxness from middle end
On 06/22/15 13:04, Marek Polacek wrote: On Mon, Jun 22, 2015 at 01:00:51PM -0400, Nathan Sidwell wrote: + if (GET_CODE (arg) != CONST_INT + || (unsigned HOST_WIDE_INT)INTVAL (arg) = OACC_HWM) Don't we have UINTVAL for this? So UINTVAL (arg). Oh, thanks! will fix nathan -- Nathan Sidwell
RFA: Fix isl-ast-gen-if-1.c test
Hi Guys, The test file gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c file was generating an unexpected failure for the RX. When I investigated I found that a return address on the stack was being corrupted, and I tracked it down to the foo() function: foo (int a[], int n) { int i; for (i = 0; i n; i++) { if (i 25) a[i] = i; a[n - i] = 1; } } The problem is that when i is 0, the line a[n - i] writes to a[50] which is beyond the end of the a array. (In the RX case it writes over the return address on the stack). The patch below fixes the problem, although it could also be solved by increasing the size of the a array when it is declared in main(). OK to apply ? Cheers Nick gcc/testsuite/ChangeLog 2015-06-22 Nick Clifton ni...@redhat.com * gcc.dg/graphite/isl-ast-gen-if-1.c (foo): Prevent writing after the end of the array. Index: gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c === --- gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(revision 224722) +++ gcc/testsuite/gcc.dg/graphite/isl-ast-gen-if-1.c(working copy) @@ -10,7 +10,8 @@ { if (i 25) a[i] = i; - a[n - i] = 1; + if (i 0) + a[n - i] = 1; } }
Re: [PATCH 3/3] Improve -Wmissing-indentation heuristics
On 06/18/2015 10:57 AM, Patrick Palka wrote: [ big snip ] These bogus warnings are pre-existing, however (i.e. not caused by this patch). (nods) Fixing the false positives from libpng/bdwgc sounds like a separate issue and thus a separate patch then. Agreed. jeff
[gomp4] Remove some ptxness from middle end
I've committed this patch to the gomp4 branch, after testing. It does a number of cleanups 1) removes the ptx-specific TID, NTID, CTAID NCTAID builtins, replacing them with openacc-specific GOACC_id and GOACC_nid builtins, using gang/worker vector level enumeration. These are mapped by the PTX backend to PTX-specifc instructions. 2) Created a oacc_loop_levels enumeration, and generate the loop nest masks from that. 3) Removed a bunch of duplicate calculations in omp-low related to determining number of threads and thread index. With #2 it becomes easier to use a loop. nathan -- Nathan Sidwell 2015-06-20 Nathan Sidwell nat...@codesourcery.com gcc/ * omp-builtins.def (BUILT_IN_GOACC_NTID, BUILTIN_NCTAID): Replace with ... (BUILT_IN_GOACC_NID): ... this. (BUILT_IN_GOACC_TID, BUILTIN_CTAID): Replace with ... (BUILT_IN_GOACC_ID): ... this. * builtins.c: Include omp-low.h. (expand_oacc_buoltin): Replace with ... (expand_oacc_id): ... this. (expand_builtin, is_simple_builtin): Adjust.oo * omp-low.h (enum oacc_loop_levels): New. * omp-low.c (MASK_GANG, MASK_WORKER, MASK_VECTOR): Replace with ... (OACC_LOOP_MASK): ... this. (scan_omp_for, scan_omp_target): Adjust. (expand_oacc_get_num_threads): Adjust and use a loop. (expand_oacc_get_thread_num): Likewise. (oacc_loop_needs_thread_barrier_p, find_omp_for_region_gwv, find_omp_taarget_region_data, required_predication_mask, generate_vector_broadcast, generate_oacc_broadcast): Adjust. (make_predication_test): Adjust and use a loop. (predicate_bb, oacc_broadcast, oacc_init_count_vars): Adjust. * config/nvptx/nvptx.md (UNSPEC_NTID, UNSPEC_TID, UNSPEC_NCTAID, UNSPEC_CTAID): Replace with ... (UNSPEC_NID, UNSPEC_ID): ... these. (*oacc_ntid_insn, oacc_ntid, *oacc_tid_insn, oacc_tid, *oacc_nctaid_insn, oacc_nctaid, *oacc_ctaid_insn, oacc_ctaid): Replace with ... (oacc_nid, oacc_id): ... these. * config/nvptx/nvptx.c (nvptx_print_operand [CASE 'd']): Remove. libgomp/ * testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Replace GOACC_ctaid builtin with GOACC_id. Index: libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c === --- libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c (revision 224671) +++ libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c (working copy) @@ -35,38 +35,38 @@ main () #pragma acc parallel loop gang (static:*) num_gangs (10) for (i = 0; i 100; i++) -a[i] = __builtin_GOACC_ctaid (0); +a[i] = __builtin_GOACC_id (0); test_nonstatic (a, 10); #pragma acc parallel loop gang (static:1) num_gangs (10) for (i = 0; i 100; i++) -a[i] = __builtin_GOACC_ctaid (0); +a[i] = __builtin_GOACC_id (0); test_static (a, 10, 1); #pragma acc parallel loop gang (static:2) num_gangs (10) for (i = 0; i 100; i++) -a[i] = __builtin_GOACC_ctaid (0); +a[i] = __builtin_GOACC_id (0); test_static (a, 10, 2); #pragma acc parallel loop gang (static:5) num_gangs (10) for (i = 0; i 100; i++) -a[i] = __builtin_GOACC_ctaid (0); +a[i] = __builtin_GOACC_id (0); test_static (a, 10, 5); #pragma acc parallel loop gang (static:20) num_gangs (10) for (i = 0; i 100; i++) -a[i] = __builtin_GOACC_ctaid (0); +a[i] = __builtin_GOACC_id (0); test_static (a, 10, 20); /* Non-static gang. */ #pragma acc parallel loop gang num_gangs (10) for (i = 0; i 100; i++) -a[i] = __builtin_GOACC_ctaid (0); +a[i] = __builtin_GOACC_id (0); test_nonstatic (a, 10); Index: gcc/omp-builtins.def === --- gcc/omp-builtins.def (revision 224671) +++ gcc/omp-builtins.def (working copy) @@ -61,13 +61,9 @@ DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC DEF_GOACC_BUILTIN (BUILT_IN_GOACC_WAIT, GOACC_wait, BT_FN_VOID_INT_INT_VAR, ATTR_NOTHROW_LIST) -DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NTID, GOACC_ntid, +DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ID, GOACC_id, BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) -DEF_GOACC_BUILTIN (BUILT_IN_GOACC_TID, GOACC_tid, - BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) -DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NCTAID, GOACC_nctaid, - BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) -DEF_GOACC_BUILTIN (BUILT_IN_GOACC_CTAID, GOACC_ctaid, +DEF_GOACC_BUILTIN (BUILT_IN_GOACC_NID, GOACC_nid, BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GOACC_BUILTIN (BUILT_IN_GOACC_GET_GANGLOCAL_PTR, GOACC_get_ganglocal_ptr, BT_FN_PTR, ATTR_NOTHROW_LEAF_LIST) Index: gcc/config/nvptx/nvptx.md === --- gcc/config/nvptx/nvptx.md (revision 224671) +++ gcc/config/nvptx/nvptx.md (working copy) @@ -49,10 +49,8 @@ UNSPEC_ALLOCA - UNSPEC_NTID - UNSPEC_TID - UNSPEC_NCTAID - UNSPEC_CTAID + UNSPEC_NID + UNSPEC_ID UNSPEC_SHARED_DATA ]) @@ -1263,65 +1261,32 @@ DONE; }) -(define_insn
[patch] Fix std::polar() test FAIL
I recently added a debug mode assertion that std::polar is not called with a negative rho argument, which this test does. Tested powerpc64le-linux, committed to trunk. commit 3592c4a31ba7f3af4eb8111565888651652ad7b1 Author: Jonathan Wakely jwak...@redhat.com Date: Mon Jun 22 15:08:55 2015 +0100 * testsuite/26_numerics/complex/value_operations/1.cc: Use non-negative rho argument. diff --git a/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc b/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc index 1caf9f1..a1e0a6b 100644 --- a/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc +++ b/libstdc++-v3/testsuite/26_numerics/complex/value_operations/1.cc @@ -53,7 +53,7 @@ void test01() complex_type e __attribute__((unused)) = conj(c); - complex_type f = polar(c.imag(), 0.0); + complex_type f = polar(std::abs(c.imag()), 0.0); VERIFY( f.real() != 0 ); }
Re: [patch] libstdc++/55409 C++11 allocator support for std::list
On 17/06/15 21:36 +0100, Jonathan Wakely wrote: I didn't get time to finish this for 5.1, but this adds missing C++11 allocator support to std::list. François pointed out that this change means we can update __gnu_debug::list to derive from an allocator-aware _Safe_sequence. We can do the same for __cxx11::basic_string by evaluating bool(_GLIBCXX_USE_CXX11_ABI). Tested powerpc64-linux, committed to trunk. commit e5fb5e8dc5d1b1ffeaa48cd1d05f76ee93bc377d Author: Jonathan Wakely jwak...@redhat.com Date: Thu Jun 18 10:02:15 2015 +0100 * include/debug/list (__gnu_debug::list): Use allocator-aware _Safe_container base. * include/debug/string (__gnu_debug::basic_string): Use allocator-aware _Safe_container base for cxx11 ABI. diff --git a/libstdc++-v3/include/debug/list b/libstdc++-v3/include/debug/list index 1562946..12ac53c 100644 --- a/libstdc++-v3/include/debug/list +++ b/libstdc++-v3/include/debug/list @@ -43,12 +43,12 @@ namespace __debug class list : public __gnu_debug::_Safe_container list_Tp, _Allocator, _Allocator, - __gnu_debug::_Safe_node_sequence, false, + __gnu_debug::_Safe_node_sequence, public _GLIBCXX_STD_C::list_Tp, _Allocator { - typedef _GLIBCXX_STD_C::list_Tp, _Allocator _Base; + typedef _GLIBCXX_STD_C::list_Tp, _Allocator _Base; typedef __gnu_debug::_Safe_container - list, _Allocator, __gnu_debug::_Safe_node_sequence, false _Safe; + list, _Allocator, __gnu_debug::_Safe_node_sequence _Safe; typedef typename _Base::iterator _Base_iterator; typedef typename _Base::const_iterator _Base_const_iterator; diff --git a/libstdc++-v3/include/debug/string b/libstdc++-v3/include/debug/string index 3793a35..f068ef0 100644 --- a/libstdc++-v3/include/debug/string +++ b/libstdc++-v3/include/debug/string @@ -42,12 +42,13 @@ namespace __gnu_debug class basic_string : public __gnu_debug::_Safe_container basic_string_CharT, _Traits, _Allocator, - _Allocator, _Safe_sequence, false, + _Allocator, _Safe_sequence, bool(_GLIBCXX_USE_CXX11_ABI), public std::basic_string_CharT, _Traits, _Allocator { typedef std::basic_string_CharT, _Traits, _Allocator _Base; typedef __gnu_debug::_Safe_container - basic_string, _Allocator, _Safe_sequence, false _Safe; + basic_string, _Allocator, _Safe_sequence, bool(_GLIBCXX_USE_CXX11_ABI) + _Safe; public: // types:
Re: Do not take address of empty string front
On 20/06/15 12:59 +0100, Jonathan Wakely wrote: On 20/06/15 12:03 +0200, François Dumont wrote: Hi 2 experimental tests are failing in debug mode because __do_str_codecvt is sometimes taking address of string front() and back() even if empty. It wasn't use so not a big issue but it still seems better to avoid. I propose to rather use string begin() to get buffer address. But derefencing begin() is still undefined for an empty string. Shouldn't that fail for debug mode too? Why change one form of undefined behaviour that we diagnose to another form that we don't diagnose? It would be better if that function didn't do any work when the input range is empty: --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _OutStr __outstr, const _Codecvt __cvt, _State __state, size_t __count, _Fn __fn) { + if (__first == __last) + { + __outstr.clear(); + return true; + } + size_t __outchars = 0; auto __next = __first; const auto __maxlen = __cvt.max_length() + 1; This makes that change, and also moves wstring_convert into the ABI-tagged __cxx11 namespace, and fixes a copypaste error in the exception thrown from wbuffer_convert. Tested powerpc64le-linux, committed to trunk. François, your changes to add extra checks in std::string are still useful separately. commit 4ab3f0a76f7e18074c91c4644cbfdf23084e93ba Author: Jonathan Wakely jwak...@redhat.com Date: Mon Jun 22 13:47:24 2015 +0100 * include/bits/locale_conv.h (__do_str_codecvt): Handle empty range. (wstring_convert): Move into __cxx11 namespace. (wbuffer_convert(streambuf*, _Codecvt*, state_type)): Fix exception message. diff --git a/libstdc++-v3/include/bits/locale_conv.h b/libstdc++-v3/include/bits/locale_conv.h index 61b535c..fd99499 100644 --- a/libstdc++-v3/include/bits/locale_conv.h +++ b/libstdc++-v3/include/bits/locale_conv.h @@ -58,6 +58,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _OutStr __outstr, const _Codecvt __cvt, _State __state, size_t __count, _Fn __fn) { + if (__first == __last) + { + __outstr.clear(); + return true; + } + size_t __outchars = 0; auto __next = __first; const auto __maxlen = __cvt.max_length() + 1; @@ -150,6 +156,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return __str_codecvt_out(__first, __last, __outstr, __cvt, __state, __n); } +_GLIBCXX_BEGIN_NAMESPACE_CXX11 + /// String conversions templatetypename _Codecvt, typename _Elem = wchar_t, typename _Wide_alloc = allocator_Elem, @@ -301,6 +309,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool _M_with_strings = false; }; +_GLIBCXX_END_NAMESPACE_CXX11 + /// Buffer conversions templatetypename _Codecvt, typename _Elem = wchar_t, typename _Tr = char_traits_Elem @@ -325,7 +335,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : _M_buf(__bytebuf), _M_cvt(__pcvt), _M_state(__state) { if (!_M_cvt) - __throw_logic_error(wstring_convert); + __throw_logic_error(wbuffer_convert); _M_always_noconv = _M_cvt-always_noconv();
Re: Fix more of C/fortran canonical type issues
On Mon, 22 Jun 2015, Jan Hubicka wrote: On Mon, 8 Jun 2015, Jan Hubicka wrote: I think we should instead work towards eliminating the get_alias_set langhook first. The LTO langhook variant contains the same handling, btw, so just inline that into get_alias_set and see what remains? I see, i completely missed existence of gimple_get_alias_set. It makes more sense now. Is moving everyting to alias.c realy a desirable thing? If non-C languages do not have this rule, why we want to reduce the code quality when compiling those? Well, for consistency and for getting rid of one langhook ;) :) In a way this particular langhook makes sense to me - TBAA rules are language specific. We also may with explicit streaming of the TBAA dag, like LLVM does. Anyway, this is the updated patch fixing the Fortran's interoperability with size_t and signed char. I will send separate patch for the extra lto-symtab warnings shortly. I will be happy looking into the TYPE_CANONICAL (int) to be different from TYPE_CANONICAL (unsigned int) if that seems desirable. There are two things that needs to be solved - hash_canonical_type/gimple_canonical_types_compatible_p can't use TYPE_CNAONICAL of subtypes in all cases (that is easy) and we will need some way to recognize the conflict in lto-symtab other thanjust comparing TYPE_CANONICAL to not warn when a variable is declared signed in Fortran unit and unsigned in C. Bootstrapped/regtested ppc64le-linux. * lto/lto.c (hash_canonical_type): Do not hash TYPE_UNSIGNED of INTEGER_TYPE. * tree.c (gimple_canonical_types_compatible_p): Do not compare TYPE_UNSIGNED of INTEGER_TYPE. * gimple-expr.c (useless_type_conversion_p): Move INTEGER type handling ahead the canonical type lookup. * gfortran.dg/lto/bind_c-2_0.f90: New testcase * gfortran.dg/lto/bind_c-2_1.c: New testcase * gfortran.dg/lto/bind_c-3_0.f90: New testcase * gfortran.dg/lto/bind_c-3_1.c: New testcase * gfortran.dg/lto/bind_c-4_0.f90: New testcase * gfortran.dg/lto/bind_c-4_1.c: New testcase Hi, I would like to ping this. There are still few things to fix to make our merging compliant at least for C/C++/Fortran rules (the array bounds for Fortran and union ordering for C I believe) and I would like to progress on this. I don't like the changes to useless_type_conversion_p much. Why do you preserve qualifiers for the integer kind compares? All the testcases have the integral types in aggregates as members. I already said that I'm happy globbing them together in aggregates. I'm still not convinced that we need a 1:1 correspondence between canonical types and alias sets. In particular canonical types are used for type compatibility in lhs = rhs assignments (useless_type_conversion_p) which is a transitive relation. Mixing both too much will cause serious confusion. We have alias-sets for a reason. Richard.
Re: [VRP] Improve value ranges for unsigned division
On Sat, Jun 20, 2015 at 9:12 AM, Kugan kugan.vivekanandara...@linaro.org wrote: As discussed in PR64130, this patch improves the VRP value ranges for unsigned division. Bootstrapped and regression tested on x86_64-linux-gnu and regression tested on arm-none-linux-gnu with no new regression. Is this OK for trunk? Hum, the patch is at least incomplete not covering the cmp == -1 case in the max value computation, no? Also I wonder if we have two VR_RANGEs as you require the code using extract_range_from_multiplicative_op_1 isn't better suited and already handles the case properly? Richard. Thanks, Kugan gcc/ChangeLog: 2015-06-20 Kugan Vivekanandarajah kug...@linaro.org PR middle-end/64130 * tree-vrp.c (extract_range_from_binary_expr_1): For unsigned division, compute minimum when value ranges for dividend and divisor are available. gcc/testsuite/ChangeLog: 2015-06-20 Kugan Vivekanandarajah kug...@linaro.org PR middle-end/64130 * gcc.dg/tree-ssa/pr64130.c: New test.
Re: [patch] libstdc++/64657 support iterators with overloaded comma operator
On 29/04/15 16:22 +0100, Jonathan Wakely wrote: I think this covers all the places in the library where we do: ++i, ++j I missed one. Tested powerpc64-linux, committed to trunk. commit 28542877cbeae9e1da3bdc8e3a5a3053b2f0ee23 Author: Jonathan Wakely jwak...@redhat.com Date: Mon Jun 22 13:44:06 2015 +0100 PR libstdc++/64657 * include/bits/stl_uninitialized.h (__uninitialized_copy::__uninit_copy): Cast expression to void. diff --git a/libstdc++-v3/include/bits/stl_uninitialized.h b/libstdc++-v3/include/bits/stl_uninitialized.h index 715cb58..045bdd7 100644 --- a/libstdc++-v3/include/bits/stl_uninitialized.h +++ b/libstdc++-v3/include/bits/stl_uninitialized.h @@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _ForwardIterator __cur = __result; __try { - for (; __first != __last; ++__first, ++__cur) + for (; __first != __last; ++__first, (void)++__cur) std::_Construct(std::__addressof(*__cur), *__first); return __cur; }
Re: [gomp4] Remove some ptxness from middle end
On 06/22/15 13:04, Marek Polacek wrote: On Mon, Jun 22, 2015 at 01:00:51PM -0400, Nathan Sidwell wrote: + if (GET_CODE (arg) != CONST_INT + || (unsigned HOST_WIDE_INT)INTVAL (arg) = OACC_HWM) Don't we have UINTVAL for this? So UINTVAL (arg). Applied the attached, after testing. Also realized I'd missed some places I should have used the new loop level enumeration. nathan -- Nathan Sidwell 2015-06-22 Nathan Sidwell nat...@codesourcery.com * omp-low.c (expand_oacc_get_num_threads): Use OACC enum. (expand_oacc_get_thread_num, make_predication_test): Likewise. * builtins.c (expand_oacc_id): Use UINTVAL. Index: omp-low.c === --- omp-low.c (revision 224747) +++ omp-low.c (working copy) @@ -4994,8 +4994,8 @@ expand_oacc_get_num_threads (gimple_seq tree decl = builtin_decl_explicit (BUILT_IN_GOACC_NID); unsigned ix; - for (ix = 0; (1 ix) = gwv_bits; ix++) -if ((1 ix) gwv_bits) + for (ix = OACC_gang; ix != OACC_HWM; ix++) +if (OACC_LOOP_MASK(ix) gwv_bits) { tree arg = build_int_cst (unsigned_type_node, ix); tree count = create_tmp_var (unsigned_type_node); @@ -5022,8 +5022,8 @@ expand_oacc_get_thread_num (gimple_seq * unsigned ix; /* Start at gang level, and examine relevant dimension indices. */ - for (ix = 0; (1 ix) = gwv_bits; ix++) -if ((1 ix) gwv_bits) + for (ix = OACC_gang; ix != OACC_HWM; ix++) +if (OACC_LOOP_MASK (ix) gwv_bits) { tree arg = build_int_cst (unsigned_type_node, ix); @@ -10671,7 +10671,7 @@ make_predication_test (edge true_edge, b unsigned ix; for (ix = OACC_worker; ix = OACC_vector; ix++) -if (mask (1 ix)) +if (OACC_LOOP_MASK (ix) mask) { gimple call = gimple_build_call (decl, 1, build_int_cst (unsigned_type_node, ix)); Index: builtins.c === --- builtins.c (revision 224747) +++ builtins.c (working copy) @@ -5971,8 +5971,7 @@ expand_oacc_id (enum built_in_function f rtx arg; arg = expand_normal (arg0); - if (GET_CODE (arg) != CONST_INT - || (unsigned HOST_WIDE_INT)INTVAL (arg) = OACC_HWM) + if (GET_CODE (arg) != CONST_INT || UINTVAL (arg) = OACC_HWM) { error (argument to %D must be constant in range 0 to %d, get_callee_fndecl (exp), OACC_HWM - 1);
patch to fix PR63740 on gcc5 branch
I've committed the following patch to gcc 5 branch as rev.224761. The patch was bootstrapped on x86-64. 2015-06-22 Vladimir Makarov vmaka...@redhat.com PR bootstrap/63740 * lra-lives.c (process_bb_lives): Check insn copying the same reload pseudo and don't create a copy for it. Index: lra-lives.c === --- lra-lives.c (revision 224739) +++ lra-lives.c (working copy) @@ -565,7 +565,15 @@ process_bb_lives (basic_block bb, int c dst_regno = REGNO (SET_DEST (set)); if (dst_regno = lra_constraint_new_regno_start src_regno = lra_constraint_new_regno_start) - lra_create_copy (dst_regno, src_regno, freq); + { + /* It might be still an original (non-reload) insn with +one unused output and a constraint requiring to use +the same reg for input/output operands. In this case +dst_regno and src_regno have the same value, we don't +need a misleading copy for this case. */ + if (dst_regno != src_regno) + lra_create_copy (dst_regno, src_regno, freq); + } else if (dst_regno = lra_constraint_new_regno_start) { if ((hard_regno = src_regno) = FIRST_PSEUDO_REGISTER)
Re: [PATCH] Combine related fail of gcc.target/powerpc/ti_math1.c
On Mon, Jun 22, 2015 at 09:24:07AM +0200, Eric Botcazou wrote: * rtlanal.c (commutative_operand_precedence): Correct comments. * simplify-rtx.c (simplify_plus_minus_op_data_cmp): Delete forward declaration. Return an int. Distinguish REG,REG return from others. (struct simplify_plus_minus_op_data): Make local to function. (simplify_plus_minus): Rename canonicalized to not_canonical. Don't set not_canonical if merely sorting registers. Avoid packing ops if nothing changes. White space fixes. OK in principle, but... Thanks for reviewing! Some notes: Renaming canonicalized to not_canonical better reflects its usage. At the time the var is set, the expression hasn't been canonicalized. I'm quite skeptical, in particular given: I'm a little surprised, but committed without the renaming. -- Alan Modra Australia Development Lab, IBM
Fix ao initialization in ipa-polymorphic-call
Hi, this patch fixes thinko when initializing ao oracle in ipa_polymorphic_call_context::get_dynamic_type. It took get_deref_alias_set of vptr type instead of get_alias_set that now makes difference because pointer types are different. Bootstrapped/regtested x86_64-linux, comitted. Honza PR ipa/66351 * ipa-polymorphic-call.c (ipa_polymorphic_call_context::get_dynamic_type): Fix thinko when initializing alias oracle; fix formating; set base_alias_set if it is known. Index: ipa-polymorphic-call.c === --- ipa-polymorphic-call.c (revision 224713) +++ ipa-polymorphic-call.c (working copy) @@ -1574,13 +1574,15 @@ ipa_polymorphic_call_context::get_dynami tree base_ref = get_ref_base_and_extent (ref_exp, offset2, size, max_size); - /* Finally verify that what we found looks like read from OTR_OBJECT -or from INSTANCE with offset OFFSET. */ + /* Finally verify that what we found looks like read from +OTR_OBJECT or from INSTANCE with offset OFFSET. */ if (base_ref ((TREE_CODE (base_ref) == MEM_REF ((offset2 == instance_offset TREE_OPERAND (base_ref, 0) == instance) - || (!offset2 TREE_OPERAND (base_ref, 0) == otr_object))) + || (!offset2 + TREE_OPERAND (base_ref, 0) + == otr_object))) || (DECL_P (instance) base_ref == instance offset2 == instance_offset))) { @@ -1608,9 +1610,17 @@ ipa_polymorphic_call_context::get_dynami /* We look for vtbl pointer read. */ ao.size = POINTER_SIZE; ao.max_size = ao.size; + /* We are looking for stores to vptr pointer within the instance of + outer type. + TODO: The vptr pointer type is globally known, we probably should + keep it and do that even when otr_type is unknown. */ if (otr_type) -ao.ref_alias_set - = get_deref_alias_set (TREE_TYPE (BINFO_VTABLE (TYPE_BINFO (otr_type; +{ + ao.base_alias_set + = get_alias_set (outer_type ? outer_type : otr_type); + ao.ref_alias_set += get_alias_set (TREE_TYPE (BINFO_VTABLE (TYPE_BINFO (otr_type; +} if (dump_file) {
Re: Fix more of C/fortran canonical type issues
On Mon, 8 Jun 2015, Jan Hubicka wrote: I think we should instead work towards eliminating the get_alias_set langhook first. The LTO langhook variant contains the same handling, btw, so just inline that into get_alias_set and see what remains? I see, i completely missed existence of gimple_get_alias_set. It makes more sense now. Is moving everyting to alias.c realy a desirable thing? If non-C languages do not have this rule, why we want to reduce the code quality when compiling those? Well, for consistency and for getting rid of one langhook ;) :) In a way this particular langhook makes sense to me - TBAA rules are language specific. We also may with explicit streaming of the TBAA dag, like LLVM does. Anyway, this is the updated patch fixing the Fortran's interoperability with size_t and signed char. I will send separate patch for the extra lto-symtab warnings shortly. I will be happy looking into the TYPE_CANONICAL (int) to be different from TYPE_CANONICAL (unsigned int) if that seems desirable. There are two things that needs to be solved - hash_canonical_type/gimple_canonical_types_compatible_p can't use TYPE_CNAONICAL of subtypes in all cases (that is easy) and we will need some way to recognize the conflict in lto-symtab other thanjust comparing TYPE_CANONICAL to not warn when a variable is declared signed in Fortran unit and unsigned in C. Bootstrapped/regtested ppc64le-linux. * lto/lto.c (hash_canonical_type): Do not hash TYPE_UNSIGNED of INTEGER_TYPE. * tree.c (gimple_canonical_types_compatible_p): Do not compare TYPE_UNSIGNED of INTEGER_TYPE. * gimple-expr.c (useless_type_conversion_p): Move INTEGER type handling ahead the canonical type lookup. * gfortran.dg/lto/bind_c-2_0.f90: New testcase * gfortran.dg/lto/bind_c-2_1.c: New testcase * gfortran.dg/lto/bind_c-3_0.f90: New testcase * gfortran.dg/lto/bind_c-3_1.c: New testcase * gfortran.dg/lto/bind_c-4_0.f90: New testcase * gfortran.dg/lto/bind_c-4_1.c: New testcase Hi, I would like to ping this. There are still few things to fix to make our merging compliant at least for C/C++/Fortran rules (the array bounds for Fortran and union ordering for C I believe) and I would like to progress on this. Honza
Re: [PATCH] Fix PR ipa/65908.
Hi, this patch extends earlier Martin's patch. I completely removed parse_tree_args because it assumed that every function has only one set of parm types and one return value type. We really want to compare two sets of parameter types (DECL_ARGUMENTS and TYPE_PARM_TYPES) as they do not need to 100% match and similarly for the return value. I also added check that skips unnecesary matching (in addition to the signature) for parameters that are not used. The patch bootstrapped/regtested x86_64-linux, and I checked it behaves sane with Firefox. I will commit it to mainline and will follow with branch in couple days. Honza PR ipa/65908 * ipa-icf.c (sem_item::target_supports_symbol_aliases): Remove construction of arg_types. (sem_function::sem_function): Likewise. (sem_function::~sem_function): Remove destruction of arg_types. (sem_function::compatible_parm_types_p): New function. (sem_function::equals_wpa): Reorg matching of return values and parameter types. (sem_function::equals_private): Reorg mathcing of argument types. (sem_function::parse_tree_args): Remove. * ipa-icf.h (init_wpa): Do not call it. (parse_tree_args): Remove. (compatible_parm_types_p): Declare. (result_type): Remove. (arg_types): Remove. * testsuite/g++.dg/ipa/pr65908.C: New testcase. Index: ipa-icf.h === --- ipa-icf.h (revision 224713) +++ ipa-icf.h (working copy) @@ -292,7 +292,6 @@ public: inline virtual void init_wpa (void) { -parse_tree_args (); } virtual void init (void); @@ -310,9 +309,6 @@ public: dump_function_to_file (decl, file, TDF_DETAILS); } - /* Parses function arguments and result type. */ - void parse_tree_args (void); - /* Returns cgraph_node. */ inline cgraph_node *get_node (void) { @@ -329,15 +325,13 @@ public: semantic function item. */ static sem_function *parse (cgraph_node *node, bitmap_obstack *stack); + /* Perform additional checks needed to match types of used function + paramters. */ + bool compatible_parm_types_p (tree, tree); + /* Exception handling region tree. */ eh_region region_tree; - /* Result type tree node. */ - tree result_type; - - /* Array of argument tree types. */ - vec tree arg_types; - /* Number of function arguments. */ unsigned int arg_count; Index: testsuite/g++.dg/ipa/pr65908.C === --- testsuite/g++.dg/ipa/pr65908.C (revision 0) +++ testsuite/g++.dg/ipa/pr65908.C (revision 0) @@ -0,0 +1,27 @@ +// PR ipa/65908 +// { dg-do compile } +// { dg-options -O2 } +// { dg-additional-options -fPIC { target fpic } } + +class A +{ + A (A ); +}; +class B +{ + const A m_fn1 () const; +}; +class C +{ + A m_fn2 () const; +}; +A +C::m_fn2 () const +{ + throw 0; +} +const A +B::m_fn1 () const +{ + throw 0; +} Index: ipa-icf.c === --- ipa-icf.c (revision 224713) +++ ipa-icf.c (working copy) @@ -259,7 +259,6 @@ sem_item::target_supports_symbol_aliases sem_function::sem_function (bitmap_obstack *stack): sem_item (FUNC, stack), m_checker (NULL), m_compared_func (NULL) { - arg_types.create (0); bb_sizes.create (0); bb_sorted.create (0); } @@ -271,7 +270,6 @@ sem_function::sem_function (cgraph_node sem_item (FUNC, node, hash, stack), m_checker (NULL), m_compared_func (NULL) { - arg_types.create (0); bb_sizes.create (0); bb_sorted.create (0); } @@ -281,7 +279,6 @@ sem_function::~sem_function () for (unsigned i = 0; i bb_sorted.length (); i++) delete (bb_sorted[i]); - arg_types.release (); bb_sizes.release (); bb_sorted.release (); } @@ -581,6 +578,30 @@ sem_function::param_used_p (unsigned int return ipa_is_param_used (IPA_NODE_REF (get_node ()), i); } +/* Perform additional check needed to match types function parameters that are + used. Unlike for normal decls it matters if type is TYPE_RESTRICT and we + make an assumption that REFERENCE_TYPE parameters are always non-NULL. */ + +bool +sem_function::compatible_parm_types_p (tree parm1, tree parm2) +{ + /* Be sure that parameters are TBAA compatible. */ + if (!func_checker::compatible_types_p (parm1, parm2)) +return return_false_with_msg (parameter type is not compatible); + + if (POINTER_TYPE_P (parm1) + (TYPE_RESTRICT (parm1) != TYPE_RESTRICT (parm2))) +return return_false_with_msg (argument restrict flag mismatch); + + /* nonnull_arg_p implies non-zero range to REFERENCE types. */ + if (POINTER_TYPE_P (parm1) + TREE_CODE (parm1) != TREE_CODE (parm2) + opt_for_fn (decl, flag_delete_null_pointer_checks)) +return return_false_with_msg (pointer wrt reference mismatch); + + return true; +} + /* Fast equality function based on knowledge known
[PATCH] Avoid computing scalar iteration cost multiple times
This avoids doing $subject in the vectorizer. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-06-22 Richard Biener rguent...@suse.de * tree-vectorizer.h (_loop_vec_info): Add scalar_cost_vec and single_scalar_iteration_cost members. (LOOP_VINFO_SCALAR_ITERATION_COST): New. (LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST): Likewise. (vect_get_single_scalar_iteration_cost): Remove. * tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost): Use LOOP_VINFO_SCALAR_ITERATION_COST. * tree-vect-loop.c (destroy_loop_vec_info): Free scalar_cost_vec. (vect_get_single_scalar_iteration_cost): Compute result into LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST and LOOP_VINFO_SCALAR_ITERATION_COST. Make static. (vect_analyze_loop_2): Call vect_get_single_scalar_iteration_cost. (vect_estimate_min_profitable_iters): Use them. Index: gcc/tree-vect-data-refs.c === --- gcc/tree-vect-data-refs.c (revision 224603) +++ gcc/tree-vect-data-refs.c (working copy) @@ -1165,11 +1165,10 @@ vect_peeling_hash_get_lowest_cost (_vect SET_DR_MISALIGNMENT (dr, save_misalignment); } - auto_vecstmt_info_for_cost scalar_cost_vec; - vect_get_single_scalar_iteration_cost (loop_vinfo, scalar_cost_vec); outside_cost += vect_get_known_peeling_cost (loop_vinfo, elem-npeel, dummy, - scalar_cost_vec, prologue_cost_vec, epilogue_cost_vec); + LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), + prologue_cost_vec, epilogue_cost_vec); /* Prologue and epilogue costs are added to the target model later. These costs depend only on the scalar iteration cost, the Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 224603) +++ gcc/tree-vect-loop.c(working copy) @@ -1095,12 +1095,82 @@ destroy_loop_vec_info (loop_vec_info loo LOOP_VINFO_PEELING_HTAB (loop_vinfo) = NULL; destroy_cost_data (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo)); + loop_vinfo-scalar_cost_vec.release (); free (loop_vinfo); loop-aux = NULL; } +/* Calculate the cost of one scalar iteration of the loop. */ +static void +vect_get_single_scalar_iteration_cost (loop_vec_info loop_vinfo) +{ + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); + basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo); + int nbbs = loop-num_nodes, factor, scalar_single_iter_cost = 0; + int innerloop_iters, i; + + /* Count statements in scalar loop. Using this as scalar cost for a single + iteration for now. + + TODO: Add outer loop support. + + TODO: Consider assigning different costs to different scalar + statements. */ + + /* FORNOW. */ + innerloop_iters = 1; + if (loop-inner) +innerloop_iters = 50; /* FIXME */ + + for (i = 0; i nbbs; i++) +{ + gimple_stmt_iterator si; + basic_block bb = bbs[i]; + + if (bb-loop_father == loop-inner) +factor = innerloop_iters; + else +factor = 1; + + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (si)) +{ + gimple stmt = gsi_stmt (si); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + + if (!is_gimple_assign (stmt) !is_gimple_call (stmt)) +continue; + + /* Skip stmts that are not vectorized inside the loop. */ + if (stmt_info + !STMT_VINFO_RELEVANT_P (stmt_info) + (!STMT_VINFO_LIVE_P (stmt_info) + || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info))) + !STMT_VINFO_IN_PATTERN_P (stmt_info)) +continue; + + vect_cost_for_stmt kind; + if (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt))) +{ + if (DR_IS_READ (STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt + kind = scalar_load; + else + kind = scalar_store; +} + else +kind = scalar_stmt; + + scalar_single_iter_cost + += record_stmt_cost (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo), +factor, kind, NULL, 0, vect_prologue); +} +} + LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo) += scalar_single_iter_cost; +} + + /* Function vect_analyze_loop_1. Apply a set of analyses on LOOP, and create a loop_vec_info struct @@ -1834,6 +1904,9 @@ vect_analyze_loop_2 (loop_vec_info loop_ return false; } + /* Compute the scalar iteration cost. */ + vect_get_single_scalar_iteration_cost (loop_vinfo); + /* This pass will decide on using loop versioning and/or loop peeling in order to enhance the alignment of data references in the loop. */ @@ -2706,74 +2779,6 @@ vect_force_simple_reduction (loop_vec_in double_reduc, true); } -/*
Re: [PATCH] Check dominator info in compute_dominance_frontiers
On 22/06/15 12:14, Richard Biener wrote: On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, during development of a patch I ran into a case where compute_dominance_frontiers was called with incorrect dominance info. The result was a segmentation violation somewhere in the bitmap code while executing this bitmap_set_bit in compute_dominance_frontiers_1: ... if (!bitmap_set_bit (frontiers[runner-index], b-index)) break; ... The segmentation violation happens because runner-index is 0, and frontiers[0] is uninitialized. [ The initialization in update_ssa looks like this: ... dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun)); FOR_EACH_BB_FN (bb, cfun) bitmap_initialize (dfs[bb-index], bitmap_default_obstack); compute_dominance_frontiers (dfs); ... FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0] (frontiers[0] in compute_dominance_frontiers_1) is not initialized. We could add initialization by making the entry/exit-block bitmap_heads empty and setting the obstack to a reserved obstack bitmap_no_obstack for which allocation results in an assert. ] AFAIU, the immediate problem is not that frontiers[0] is uninitialized, but that the loop reaches the state of runner-index == 0, due to the incorrect dominance info. The patch adds an assert to the loop in compute_dominance_frontiers_1, to make the failure mode cleaner and easier to understand. I think we wouldn't catch all errors in dominance info with this assert. So the patch also contains an ENABLE_CHECKING-enabled verify_dominators call at the start of compute_dominance_frontiers. I'm not sure if: - adding the verify_dominators call is too costly in runtime. - the verify_dominators call should be inside or outside the TV_DOM_FRONTIERS measurement. - there is a level of ENABLE_CHECKING that is more appropriate for the verify_dominators call. Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds? I don't think these kind of asserts are good. A segfault is good by itself (so you can just add the comment if you like). The segfault is not guaranteed to trigger, because it works on uninitialized data. Instead, we may end up modifying valid memory and silently generating wrong code or causing sigsegvs (which will be difficult to track back this error). So I don't think doing nothing is an option here. If we're not going to add this assert, we should initialize the uninitialized data in such a way that we are guaranteed to detect the error. The scheme I proposed above would take care of that. Should I implement that instead? Likewise the verify_dominators call is too expensive and misplaced. If then the call belongs in the dom_computed[] == DOM_OK early-out in calculate_dominance_info OK, like this: ... diff --git a/gcc/dominance.c b/gcc/dominance.c index a9e042e..1827eda9 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir) bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false; if (dom_computed[dir_index] == DOM_OK) -return; +{ +#if ENABLE_CHECKING + verify_dominators (CDI_DOMINATORS); +#endif + return; +} timevar_push (TV_DOMINANCE); if (!dom_info_available_p (dir)) ... I didn't fully understand your comment, do you want me to test this? Thanks, - Tom (eventually also for the case where we end up only computing the fast-query stuff).
Re: [PATCH] Check dominator info in compute_dominance_frontiers
On Mon, Jun 22, 2015 at 1:33 PM, Tom de Vries tom_devr...@mentor.com wrote: On 22/06/15 12:14, Richard Biener wrote: On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, during development of a patch I ran into a case where compute_dominance_frontiers was called with incorrect dominance info. The result was a segmentation violation somewhere in the bitmap code while executing this bitmap_set_bit in compute_dominance_frontiers_1: ... if (!bitmap_set_bit (frontiers[runner-index], b-index)) break; ... The segmentation violation happens because runner-index is 0, and frontiers[0] is uninitialized. [ The initialization in update_ssa looks like this: ... dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun)); FOR_EACH_BB_FN (bb, cfun) bitmap_initialize (dfs[bb-index], bitmap_default_obstack); compute_dominance_frontiers (dfs); ... FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0] (frontiers[0] in compute_dominance_frontiers_1) is not initialized. We could add initialization by making the entry/exit-block bitmap_heads empty and setting the obstack to a reserved obstack bitmap_no_obstack for which allocation results in an assert. ] AFAIU, the immediate problem is not that frontiers[0] is uninitialized, but that the loop reaches the state of runner-index == 0, due to the incorrect dominance info. The patch adds an assert to the loop in compute_dominance_frontiers_1, to make the failure mode cleaner and easier to understand. I think we wouldn't catch all errors in dominance info with this assert. So the patch also contains an ENABLE_CHECKING-enabled verify_dominators call at the start of compute_dominance_frontiers. I'm not sure if: - adding the verify_dominators call is too costly in runtime. - the verify_dominators call should be inside or outside the TV_DOM_FRONTIERS measurement. - there is a level of ENABLE_CHECKING that is more appropriate for the verify_dominators call. Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds? I don't think these kind of asserts are good. A segfault is good by itself (so you can just add the comment if you like). The segfault is not guaranteed to trigger, because it works on uninitialized data. Instead, we may end up modifying valid memory and silently generating wrong code or causing sigsegvs (which will be difficult to track back this error). So I don't think doing nothing is an option here. If we're not going to add this assert, we should initialize the uninitialized data in such a way that we are guaranteed to detect the error. The scheme I proposed above would take care of that. Should I implement that instead? No, instead the check below should catch the error much earlier. Likewise the verify_dominators call is too expensive and misplaced. If then the call belongs in the dom_computed[] == DOM_OK early-out in calculate_dominance_info OK, like this: ... diff --git a/gcc/dominance.c b/gcc/dominance.c index a9e042e..1827eda9 100644 --- a/gcc/dominance.c +++ b/gcc/dominance.c @@ -646,7 +646,12 @@ calculate_dominance_info (enum cdi_direction dir) bool reverse = (dir == CDI_POST_DOMINATORS) ? true : false; if (dom_computed[dir_index] == DOM_OK) -return; +{ +#if ENABLE_CHECKING + verify_dominators (CDI_DOMINATORS); +#endif + return; +} timevar_push (TV_DOMINANCE); if (!dom_info_available_p (dir)) ... Yes. I didn't fully understand your comment, do you want me to test this? Sure, it should catch the error. Richard. Thanks, - Tom (eventually also for the case where we end up only computing the fast-query stuff).
[PATCH 3/3][ARM][PR target/65697] Add tests for __sync builtins.
This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html). This patch adds tests for the code generated by the ARM backend for the __sync builtins. Tested the series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/testsuite 2015-06-22 Matthew Wahab matthew.wa...@arm.com PR Target/65697 * gcc.target/arm/armv8-sync-comp-swap.c: New. * gcc.target/arm/armv8-sync-op-acquire.c: New. * gcc.target/arm/armv8-sync-op-full.c: New. * gcc.target/arm/armv8-sync-op-release.c: New. From 8157c7480a9d6d559013d02e24519d1b7ba1ed5b Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Wed, 3 Jun 2015 16:27:55 +0100 Subject: [PATCH 3/3] [ARM] Add test cases. Change-Id: I0f2257ce5b5e7f9d0f75e57e6be22fd9733ed3ca --- gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c | 10 ++ gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c | 10 ++ gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c| 10 ++ gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c | 8 4 files changed, 38 insertions(+) create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c create mode 100644 gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c new file mode 100644 index 000..f96c81a --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options -O2 } */ +/* { dg-add-options arm_arch_v8a } */ + +#include ../aarch64/sync-comp-swap.x + +/* { dg-final { scan-assembler-times ldrex 2 } } */ +/* { dg-final { scan-assembler-times stlex 2 } } */ +/* { dg-final { scan-assembler-times dmb 2 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c new file mode 100644 index 000..8d6659b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options -O2 } */ +/* { dg-add-options arm_arch_v8a } */ + +#include ../aarch64/sync-op-acquire.x + +/* { dg-final { scan-assembler-times ldrex 1 } } */ +/* { dg-final { scan-assembler-times stlex 1 } } */ +/* { dg-final { scan-assembler-times dmb 1 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c new file mode 100644 index 000..a5ad3bd --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options -O2 } */ +/* { dg-add-options arm_arch_v8a } */ + +#include ../aarch64/sync-op-full.x + +/* { dg-final { scan-assembler-times ldrex 12 } } */ +/* { dg-final { scan-assembler-times stlex 12 } } */ +/* { dg-final { scan-assembler-times dmb 12 } } */ diff --git a/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c new file mode 100644 index 000..0d3be7b --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c @@ -0,0 +1,8 @@ +/* { dg-do compile } */ +/* { do-require-effective-target arm_arch_v8a_ok } */ +/* { dg-options -O2 } */ +/* { dg-add-options arm_arch_v8a } */ + +#include ../aarch64/sync-op-release.x + +/* { dg-final { scan-assembler-times stl 1 } } */ -- 1.9.1
Re: [PATCH] Check dominator info in compute_dominance_frontiers
On Mon, Jun 22, 2015 at 10:04 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, during development of a patch I ran into a case where compute_dominance_frontiers was called with incorrect dominance info. The result was a segmentation violation somewhere in the bitmap code while executing this bitmap_set_bit in compute_dominance_frontiers_1: ... if (!bitmap_set_bit (frontiers[runner-index], b-index)) break; ... The segmentation violation happens because runner-index is 0, and frontiers[0] is uninitialized. [ The initialization in update_ssa looks like this: ... dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun)); FOR_EACH_BB_FN (bb, cfun) bitmap_initialize (dfs[bb-index], bitmap_default_obstack); compute_dominance_frontiers (dfs); ... FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0] (frontiers[0] in compute_dominance_frontiers_1) is not initialized. We could add initialization by making the entry/exit-block bitmap_heads empty and setting the obstack to a reserved obstack bitmap_no_obstack for which allocation results in an assert. ] AFAIU, the immediate problem is not that frontiers[0] is uninitialized, but that the loop reaches the state of runner-index == 0, due to the incorrect dominance info. The patch adds an assert to the loop in compute_dominance_frontiers_1, to make the failure mode cleaner and easier to understand. I think we wouldn't catch all errors in dominance info with this assert. So the patch also contains an ENABLE_CHECKING-enabled verify_dominators call at the start of compute_dominance_frontiers. I'm not sure if: - adding the verify_dominators call is too costly in runtime. - the verify_dominators call should be inside or outside the TV_DOM_FRONTIERS measurement. - there is a level of ENABLE_CHECKING that is more appropriate for the verify_dominators call. Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds? I don't think these kind of asserts are good. A segfault is good by itself (so you can just add the comment if you like). Likewise the verify_dominators call is too expensive and misplaced. If then the call belongs in the dom_computed[] == DOM_OK early-out in calculate_dominance_info (eventually also for the case where we end up only computing the fast-query stuff). Richard. Thanks, - Tom
Re: [C++/58583] ICE instantiating NSDMIs
Nathan Sidwell nat...@acm.org writes: On 06/20/15 02:09, Andreas Schwab wrote: This also fails on powerpc. what is the build compiler? It is a bootstrapped build, so the build compiler should not matter. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
[PATCH] Check dominator info in compute_dominance_frontiers
Hi, during development of a patch I ran into a case where compute_dominance_frontiers was called with incorrect dominance info. The result was a segmentation violation somewhere in the bitmap code while executing this bitmap_set_bit in compute_dominance_frontiers_1: ... if (!bitmap_set_bit (frontiers[runner-index], b-index)) break; ... The segmentation violation happens because runner-index is 0, and frontiers[0] is uninitialized. [ The initialization in update_ssa looks like this: ... dfs = XNEWVEC (bitmap_head, last_basic_block_for_fn (cfun)); FOR_EACH_BB_FN (bb, cfun) bitmap_initialize (dfs[bb-index], bitmap_default_obstack); compute_dominance_frontiers (dfs); ... FOR_EACH_BB_FN skips over the entry-block and the exit-block, so dfs[0] (frontiers[0] in compute_dominance_frontiers_1) is not initialized. We could add initialization by making the entry/exit-block bitmap_heads empty and setting the obstack to a reserved obstack bitmap_no_obstack for which allocation results in an assert. ] AFAIU, the immediate problem is not that frontiers[0] is uninitialized, but that the loop reaches the state of runner-index == 0, due to the incorrect dominance info. The patch adds an assert to the loop in compute_dominance_frontiers_1, to make the failure mode cleaner and easier to understand. I think we wouldn't catch all errors in dominance info with this assert. So the patch also contains an ENABLE_CHECKING-enabled verify_dominators call at the start of compute_dominance_frontiers. I'm not sure if: - adding the verify_dominators call is too costly in runtime. - the verify_dominators call should be inside or outside the TV_DOM_FRONTIERS measurement. - there is a level of ENABLE_CHECKING that is more appropriate for the verify_dominators call. Is this ok for trunk if bootstrap and reg-test on x86_64 succeeds? Thanks, - Tom Check dominator info in compute_dominance_frontiers 2015-06-22 Tom de Vries t...@codesourcery.com * cfganal.c (compute_dominance_frontiers_1): Add assert. (compute_dominance_frontiers): Verify dominators if ENABLE_CHECKING. --- gcc/cfganal.c | 9 + 1 file changed, 9 insertions(+) diff --git a/gcc/cfganal.c b/gcc/cfganal.c index b8d67bc..0e0e2bb 100644 --- a/gcc/cfganal.c +++ b/gcc/cfganal.c @@ -1261,6 +1261,11 @@ compute_dominance_frontiers_1 (bitmap_head *frontiers) domsb = get_immediate_dominator (CDI_DOMINATORS, b); while (runner != domsb) { + /* If you're running into this assert, the dominator info is + incorrect. Try enabling the verify_dominators call at the + start of compute_dominance_frontiers. */ + gcc_assert (runner != ENTRY_BLOCK_PTR_FOR_FN (cfun)); + if (!bitmap_set_bit (frontiers[runner-index], b-index)) break; @@ -1276,6 +1281,10 @@ compute_dominance_frontiers_1 (bitmap_head *frontiers) void compute_dominance_frontiers (bitmap_head *frontiers) { +#if ENABLE_CHECKING + verify_dominators (CDI_DOMINATORS); +#endif + timevar_push (TV_DOM_FRONTIERS); compute_dominance_frontiers_1 (frontiers); -- 1.9.1
[gomp4] dominance info after predicate_omp_regions
On 21/05/15 13:42, ber...@gcc.gnu.org wrote: Author: bernds Date: Thu May 21 11:42:14 2015 New Revision: 223478 URL: https://gcc.gnu.org/viewcvs?rev=223478root=gccview=rev Log: * omp-low.c (struct omp_region): Add a gwv_this field. (bb_region_map): New variable. (find_omp_for_region_data, find_omp_target_region_data): New static functions. (build_omp_regions_1): Call them. Build the bb_region_map. (enclosing_target_region, requires_vector_predicate, generate_vector_broadcast, predicate_bb, find_predicatable_bbs, predicate_omp_regions): New static functions. (execute_expand_omp): Allocate and free bb_region_map. Modified: branches/gomp-4_0-branch/gcc/ChangeLog.gomp branches/gomp-4_0-branch/gcc/omp-low.c Hi Bernd, I ran into trouble with invalid dominance info, AFAIU because predicate_omp_regions invalidates the dominance info. For now I'm using this workaround: ... diff --git a/gcc/omp-low.c b/gcc/omp-low.c index f7e13d3..5601cff 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -11268,6 +11268,8 @@ execute_expand_omp (void) } predicate_omp_regions (ENTRY_BLOCK_PTR_FOR_FN (cfun)); + free_dominance_info (CDI_DOMINATORS); + calculate_dominance_info (CDI_DOMINATORS); remove_exit_barriers (root_omp_region); ... Thanks, - Tom
[PATCH, i386]: Ignore the cost of embedded comparison (PR 65871)
Hello! As shown in the PR [1], RTX costs can reject combination of the operation and its embedded comparison. Attached patch fixes this by ignoring the cost of embedded comparison. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65871#c10 2015-06-22 Uros Bizjak ubiz...@gmail.com PR target/65871 * config/i386/i386.c (ix86_rtx_costs) case COMPARE: Ignore the cost of embedded comparison. Bootstrapped on x86_64-linux-gnu, regtest in progress. Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 224718) +++ config/i386/i386.c (working copy) @@ -42531,6 +42531,12 @@ ix86_rtx_costs (rtx x, int code_i, int outer_code_ + rtx_cost (const1_rtx, outer_code, opno, speed)); return true; } + + /* The embedded comparison operand is completely free. */ + if (!general_operand (XEXP (x, 0), GET_MODE (XEXP (x, 0))) + XEXP (x, 1) == const0_rtx) + *total = 0; + return false; case FLOAT_EXTEND:
[PATCH 2/3][ARM][PR target/65697] Strengthen barriers for compare-and-swap builtin.
This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html). This patch changes the code generated for __sync_type_compare_and_swap to remove the acquire-barrier from the load and end the operation with a fence. This also strengthens the acquire barrier generated for __sync_lock_test_and_set which, like compare-and-swap, is implemented as a form of atomic exchange. Tested as part of a series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com PR Target/65697 * config/armc/arm.c (arm_split_compare_and_swap): For ARMv8, replace an initial acquire barrier with a final full barrier. From ddb9a45acda7bb64d91c446bc40afe4b78fcc1e1 Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Fri, 22 May 2015 13:36:39 +0100 Subject: [PATCH 2/3] [ARM] Strengthen barriers for compare-and-swap builtin. Change-Id: I43381b2ea88492f807d85a73d233369334c99881 --- gcc/config/arm/arm.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 94118f4..4610ff6 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -27603,6 +27603,8 @@ arm_split_compare_and_swap (rtx operands[]) scratch = operands[7]; mode = GET_MODE (mem); + bool is_armv8_sync = arm_arch8 is_mm_sync (mod_s); + bool use_acquire = TARGET_HAVE_LDACQ !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s) || is_mm_release (mod_s)); @@ -27611,6 +27613,11 @@ arm_split_compare_and_swap (rtx operands[]) !(is_mm_relaxed (mod_s) || is_mm_consume (mod_s) || is_mm_acquire (mod_s)); + /* For ARMv8, the load-acquire is too weak for __sync memory orders. Instead, + a full barrier is emitted after the store-release. */ + if (is_armv8_sync) +use_acquire = false; + /* Checks whether a barrier is needed and emits one accordingly. */ if (!(use_acquire || use_release)) arm_pre_atomic_barrier (mod_s); @@ -27651,7 +27658,8 @@ arm_split_compare_and_swap (rtx operands[]) emit_label (label2); /* Checks whether a barrier is needed and emits one accordingly. */ - if (!(use_acquire || use_release)) + if (is_armv8_sync + || !(use_acquire || use_release)) arm_post_atomic_barrier (mod_s); if (is_mm_relaxed (mod_f)) -- 1.9.1
Re: [PATCH] Combine related fail of gcc.target/powerpc/ti_math1.c
This patch fixes FAIL: gcc.target/powerpc/ti_math1.c scan-assembler-times adde 1 a failure caused by combine simplifying this i2src (plus:DI (plus:DI (reg:DI 165 [ val+8 ]) (reg:DI 169 [+8 ])) (reg:DI 76 ca)) to this (plus:DI (plus:DI (reg:DI 76 ca) (reg:DI 165 [ val+8 ])) (reg:DI 169 [+8 ])) which no longer matches rs6000.md adddi3_carry_in_internal. See https://gcc.gnu.org/ml/gcc/2015-05/msg00206.html for related discussion. Bootstrapped and regression tested powerpc64le-linux, powerpc64-linux and x86_64-linux. OK to apply mainline? * rtlanal.c (commutative_operand_precedence): Correct comments. * simplify-rtx.c (simplify_plus_minus_op_data_cmp): Delete forward declaration. Return an int. Distinguish REG,REG return from others. (struct simplify_plus_minus_op_data): Make local to function. (simplify_plus_minus): Rename canonicalized to not_canonical. Don't set not_canonical if merely sorting registers. Avoid packing ops if nothing changes. White space fixes. OK in principle, but... Some notes: Renaming canonicalized to not_canonical better reflects its usage. At the time the var is set, the expression hasn't been canonicalized. I'm quite skeptical, in particular given: + /* Just swapping registers doesn't count as canonicalization. */ + if (cmp != 1) + not_canonical = 1; and + /* If nothing changed, fail. */ + if (!not_canonical) +return NULL_RTX; Both are rather confusing now so the renaming isn't really a progress IMO. -- Eric Botcazou
[PATCH 1/3][ARM][PR target/65697] Strengthen memory barriers for __sync builtins
This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html). The problem is that the barriers generated for the __sync builtins for ARMv8 targets are too weak. This affects the full and the acquire barriers in the __sync fetch-and-op, compare-and-swap functions and __sync_lock_test_and_set. This patch series changes the code to strengthen the barriers by replacing initial load-acquires with a simple load and adding a final memory barrier to prevent code hoisting. - Full barriers: __sync_fetch_and_op, __sync_op_and_fetch __sync_*_compare_and_swap [load-acquire; code; store-release] becomes [load; code ; store-release; barrier]. - Acquire barriers: __sync_lock_test_and_set [load-acquire; code; store] becomes [load; code; store; barrier] This patch changes the code generated for __sync_fetch_and_op and __sync_op_and_fetch builtins. Tested as part of a series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com PR Target/65697 * config/armc/arm.c (arm_split_atomic_op): For ARMv8, replace an initial acquire barrier with a final full barrier. From 3e9f71c04dba20ba66b5c9bae284fcac5fdd91ec Mon Sep 17 00:00:00 2001 From: Matthew Wahab matthew.wa...@arm.com Date: Fri, 22 May 2015 13:31:58 +0100 Subject: [PATCH 1/3] [ARM] Strengthen barriers for sync-fetch-op builtin. Change-Id: I18f5af5ba4b2e74b5866009d3a090e251eff4a45 --- gcc/config/arm/arm.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e79a369..94118f4 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -27668,6 +27668,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, rtx_code_label *label; rtx x; + bool is_armv8_sync = arm_arch8 is_mm_sync (model); + bool use_acquire = TARGET_HAVE_LDACQ !(is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release (model)); @@ -27676,6 +27678,11 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, !(is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire (model)); + /* For ARMv8, a load-acquire is too weak for __sync memory orders. Instead, + a full barrier is emitted after the store-release. */ + if (is_armv8_sync) +use_acquire = false; + /* Checks whether a barrier is needed and emits one accordingly. */ if (!(use_acquire || use_release)) arm_pre_atomic_barrier (model); @@ -27746,7 +27753,8 @@ arm_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem, emit_unlikely_jump (gen_cbranchsi4 (x, cond, const0_rtx, label)); /* Checks whether a barrier is needed and emits one accordingly. */ - if (!(use_acquire || use_release)) + if (is_armv8_sync + || !(use_acquire || use_release)) arm_post_atomic_barrier (model); } -- 1.9.1
[gomp4, committed] Handle reduction in oacc kernels region
Hi, attached patch handles reductions in oacc kernels region. The approach uses the normal parloops reduction handling code, with these modifications: 1. For each reduction, we look for this pattern in the oacc-lowered code, and store 'addr' in the corresponding struct reduction_info: ... bb preheader .omp_data_i = .omp_data_arr; addr = .omp_data_i-sum; sum_a = *addr; bb header: sum_b = PHI sum_a (preheader), sum_c (latch) ... 2. We replaces the non-atomic store to 'addr' at the end of the kernels region with an atomic one. Bootstrapped and reg-tested on x86_64 on top of gomp-4_0-branch. Committed to gomp-4_0-branch. Thanks, - Tom Handle reduction in oacc kernels region 2015-06-18 Tom de Vries t...@codesourcery.com * tree-parloops.c (struct reduction_info): Add reduc_addr field. (create_call_for_reduction_1): Handle case that reduc_addr is non-NULL. (gen_parallel_loop): Init clsn_data for oacc_kernels_p case. (try_create_reduction_list): Add and handle oacc_kernels_p parameter. (parallelize_loops): Add argument to call to try_create_reduction_list. * testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c: New test. * c-c++-common/goacc/kernels-reduction.c: New test. --- .../c-c++-common/goacc/kernels-reduction.c | 38 + gcc/tree-parloops.c| 92 -- .../libgomp.oacc-c-c++-common/kernels-reduction.c | 37 + 3 files changed, 162 insertions(+), 5 deletions(-) create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-reduction.c create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c new file mode 100644 index 000..bfbcdbd --- /dev/null +++ b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c @@ -0,0 +1,38 @@ +/* { dg-additional-options -O2 } */ +/* { dg-additional-options -ftree-parallelize-loops=32 } */ +/* { dg-additional-options -fdump-tree-parloops_oacc_kernels-all } */ +/* { dg-additional-options -fdump-tree-optimized } */ + +#include stdlib.h + +#define n 1 + +unsigned int a[n]; + +void __attribute__((noinline,noclone)) +foo (void) +{ + int i; + unsigned int sum = 1; + +#pragma acc kernels copyin (a[0:n]) copy (sum) + { +for (i = 0; i n; ++i) + sum += a[i]; + } + + if (sum != 5001) +abort (); +} + +/* Check that only one loop is analyzed, and that it can be parallelized. */ +/* { dg-final { scan-tree-dump-times SUCCESS: may be parallelized 1 parloops_oacc_kernels } } */ +/* { dg-final { scan-tree-dump-not FAILED: parloops_oacc_kernels } } */ + +/* Check that the loop has been split off into a function. */ +/* { dg-final { scan-tree-dump-times (?n);; Function .*foo.*._omp_fn.0 1 optimized } } */ + +/* { dg-final { scan-tree-dump-times (?n)pragma omp target oacc_parallel.*num_gangs\\(32\\) 1 parloops_oacc_kernels } } */ + +/* { dg-final { cleanup-tree-dump parloops_oacc_kernels } } */ +/* { dg-final { cleanup-tree-dump optimized } } */ diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 0661b78..c5f4d9a 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -218,6 +218,8 @@ struct reduction_info of the reduction variable when existing the loop. */ tree initial_value; /* The initial value of the reduction var before entering the loop. */ tree field; /* the name of the field in the parloop data structure intended for reduction. */ + tree reduc_addr; /* The address of the reduction variable for + openacc reductions. */ tree init; /* reduction initialization value. */ gphi *new_phi; /* (helper field) Newly created phi node whose result will be passed to the atomic operation. Represents @@ -1107,10 +1109,30 @@ create_call_for_reduction_1 (reduction_info **slot, struct clsn_data *clsn_data) tree tmp_load, name; gimple load; - load_struct = build_simple_mem_ref (clsn_data-load); - t = build3 (COMPONENT_REF, type, load_struct, reduc-field, NULL_TREE); + if (reduc-reduc_addr == NULL_TREE) +{ + load_struct = build_simple_mem_ref (clsn_data-load); + t = build3 (COMPONENT_REF, type, load_struct, reduc-field, NULL_TREE); + + addr = build_addr (t, current_function_decl); +} + else +{ + /* Set the address for the atomic store. */ + addr = reduc-reduc_addr; + + /* Remove the non-atomic store '*addr = sum'. */ + tree res = PHI_RESULT (reduc-keep_res); + use_operand_p use_p; + gimple stmt; + bool single_use_p = single_imm_use (res, use_p, stmt); + gcc_assert (single_use_p); + replace_uses_by (gimple_vdef (stmt), + gimple_vuse (stmt)); + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + gsi_remove (gsi, true); +} - addr = build_addr (t, current_function_decl); /* Create phi node. */ bb = clsn_data-load_bb; @@ -2441,6 +2463,10 @@
Re: [gomp4] Preserve NVPTX reconvergence points
On Mon, 22 Jun 2015 16:24:56 +0200 Jakub Jelinek ja...@redhat.com wrote: On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote: One problem is that (at least on the GPU hardware we've considered so far) we're somewhat constrained in how much control we have over how the underlying hardware executes code: it's possible to draw up a scheme where OpenACC source-level control-flow semantics are reflected directly in the PTX assembly output (e.g. to say all threads in a CTA/warp will be coherent after such-and-such a loop), and lowering OpenACC directives quite early seems to make that relatively tractable. (Even if the resulting code is relatively un-optimisable due to the abnormal edges inserted to make sure that the CFG doesn't become ill-formed.) If arbitrary optimisations are done between OMP-lowering time and somewhere around vectorisation (say), it's less clear if that correspondence can be maintained. Say if the code executed by half the threads in a warp becomes physically separated from the code executed by the other half of the threads in a warp due to some loop optimisation, we can no longer easily determine where that warp will reconverge, and certain other operations (relying on coherent warps -- e.g. CTA synchronisation) become impossible. A similar issue exists for warps within a CTA. So, essentially -- I don't know how late loop lowering would interact with: (a) Maintaining a CFG that will work with PTX. (b) Predication for worker-single and/or vector-single modes (actually all currently-proposed schemes have problems with proper representation of data-dependencies for variables and compiler-generated temporaries between predicated regions.) I don't understand why lowering the way you suggest helps here at all. In the proposed scheme, you essentially have whole function in e.g. worker-single or vector-single mode, which you need to be able to handle properly in any case, because users can write such routines themselves. In vector-single or worker-single mode, divergence of threads within a warp or a CTA is controlled by broadcasting the controlling expression of conditional branches to the set of inactive threads, so each of those follows along with the active thread. So you only get potentially-problematic thread divergence when workers or vectors are operating in partitioned mode. So, for instance, a made-up example: #pragma acc parallel { #pragma acc loop gang for (i = 0; i N; i++)) { #pragma acc loop worker for (j = 0; j M; j++) { if (j M / 2) /* stmt 1 */ else /* stmt 2 */ } /* reconvergence point: thread barrier */ [...] } } Here stmt 1 and stmt 2 execute in worker-partitioned, vector-single mode. With early lowering, the reconvergence point can be inserted at the end of the loop, and abnormal edges (etc.) can be used to ensure that the CFG does not get changed in such a way that there is no longer a unique point at which the loop threads reconverge. With late lowering, it's no longer obvious to me if that can still be done. Julian
[PATCH] parloops exit phi fixes
Hi, the gomp-4_0-branch contains the kernels oacc pass group. I've run into trouble before with this pass group due to the fact that it uses passes in an unusual location or order (pass_lim before pass_stdarg, https://gcc.gnu.org/ml/gcc/2015-01/msg00282.html ). In an attempt to find this sort of issue pro-actively, I've modified the pass list in the following way (similar to the oacc kernels pass group, but always functional, not just for functions with kernel regions or loops in kernels regions), and bootstrapped and reg-tested on x86_64 on top of gomp-4_-0-branch: ... NEXT_PASS (pass_build_ealias); NEXT_PASS (pass_fre); + NEXT_PASS (pass_ch); + NEXT_PASS (pass_tree_loop_init); + NEXT_PASS (pass_lim); + NEXT_PASS (pass_tree_loop_done); + NEXT_PASS (pass_fre); + NEXT_PASS (pass_tree_loop_init); + NEXT_PASS (pass_scev_cprop); + NEXT_PASS (pass_parallelize_loops); + NEXT_PASS (pass_expand_omp_ssa); + NEXT_PASS (pass_tree_loop_done); NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_dse); ... Apart from running into PR66616, I found two issues with the parloops pass: 1. handling of loop header phi, when there's no corresponding loop exit phi (unused reduction result) 2. handling of loop exit phi, when there's no corresponding loop header phi (value not modified in loop) The two attached patches fix these problems. Bootstrapped and reg-tested on x864_64 on top of gomp-4_0-branch in combination with the patch series that triggered the problem. Bootstrapped and reg-tested on x864_64 on top of trunk. OK for trunk? Thanks, - Tom Handle unused reduction in create_loads_for_reductions 2015-06-22 Tom de Vries t...@codesourcery.com * tree-parloops.c (create_loads_for_reductions): Handle case that reduction is unused. --- gcc/tree-parloops.c | 4 1 file changed, 4 insertions(+) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 48c143d..28112b2 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1162,6 +1162,10 @@ create_loads_for_reductions (reduction_info **slot, struct clsn_data *clsn_data) tree name; tree x; + /* If there's no exit phi, the result of the reduction is unused. */ + if (red-keep_res == NULL) +return 1; + gsi = gsi_after_labels (clsn_data-load_bb); load_struct = build_simple_mem_ref (clsn_data-load); load_struct = build3 (COMPONENT_REF, type, load_struct, red-field, -- 1.9.1 Handle exit phi without header phi in create_parallel_loop 2015-06-22 Tom de Vries t...@codesourcery.com * tree-parloops.c (create_parallel_loop): Handle case that exit phi does not have a corresponding loop header phi. --- gcc/tree-parloops.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 7123c27..0693b9e 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2061,13 +2061,17 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, !gsi_end_p (gpi); gsi_next (gpi)) { source_location locus; - tree def; gphi *phi = gpi.phi (); - gphi *stmt; + tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit); + gimple def_stmt = SSA_NAME_DEF_STMT (def); - stmt = as_a gphi * ( - SSA_NAME_DEF_STMT (PHI_ARG_DEF_FROM_EDGE (phi, exit))); + /* If the exit phi is not connected to a header phi in the same loop, this + value is not modified in the loop, and we're done with this phi. */ + if (!(gimple_code (def_stmt) == GIMPLE_PHI + gimple_bb (def_stmt) == loop-header)) + continue; + gphi *stmt = as_a gphi * (def_stmt); def = PHI_ARG_DEF_FROM_EDGE (stmt, loop_preheader_edge (loop)); locus = gimple_phi_arg_location_from_edge (stmt, loop_preheader_edge (loop)); -- 1.9.1
Re: [gomp4] Preserve NVPTX reconvergence points
On Mon, Jun 22, 2015 at 06:48:10PM +0100, Julian Brown wrote: In vector-single or worker-single mode, divergence of threads within a warp or a CTA is controlled by broadcasting the controlling expression of conditional branches to the set of inactive threads, so each of those follows along with the active thread. So you only get potentially-problematic thread divergence when workers or vectors are operating in partitioned mode. So, for instance, a made-up example: #pragma acc parallel { #pragma acc loop gang for (i = 0; i N; i++)) { #pragma acc loop worker for (j = 0; j M; j++) { if (j M / 2) /* stmt 1 */ else /* stmt 2 */ } /* reconvergence point: thread barrier */ [...] } } Here stmt 1 and stmt 2 execute in worker-partitioned, vector-single mode. With early lowering, the reconvergence point can be inserted at the end of the loop, and abnormal edges (etc.) can be used to ensure that the CFG does not get changed in such a way that there is no longer a unique point at which the loop threads reconverge. With late lowering, it's no longer obvious to me if that can still be done. Why? The loop still has an exit edge (if there is no break/return/throw out of the loop which I bet is not allowed), so you just insert the reconvergence point at the exit edge from the loop. For the late lowering, I said it is up for benchmarking/investigation where it would be best placed, it doesn't have to be after the loop passes, there are plenty of optimization passes even before those. But once you turn many of the SSA_NAMEs in a function into (ab) ssa vars, many optimizations just give up. And, if you really want to avoid certain loop optimizations, you have always the possibility to e.g. wrap certain statement in the loop in internal function (e.g. the loop condition) or something similar to make the passes more careful about those loops and make it easier to lower it later. Jakub
patch to fix PR63740 on trunk
I've committed patch for PR63740 to the trunk as rev. 224753. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63740 The patch was bootstrapped on x86-64. 2015-06-22 Vladimir Makarov vmaka...@redhat.com PR bootstrap/63740 * lra-lives.c (process_bb_lives): Check insn copying the same reload pseudo and don't create a copy for it. Index: lra-lives.c === --- lra-lives.c (revision 224739) +++ lra-lives.c (working copy) @@ -565,7 +565,15 @@ process_bb_lives (basic_block bb, int c dst_regno = REGNO (SET_DEST (set)); if (dst_regno = lra_constraint_new_regno_start src_regno = lra_constraint_new_regno_start) - lra_create_copy (dst_regno, src_regno, freq); + { + /* It might be still an original (non-reload) insn with +one unused output and a constraint requiring to use +the same reg for input/output operands. In this case +dst_regno and src_regno have the same value, we don't +need a misleading copy for this case. */ + if (dst_regno != src_regno) + lra_create_copy (dst_regno, src_regno, freq); + } else if (dst_regno = lra_constraint_new_regno_start) { if ((hard_regno = src_regno) = FIRST_PSEUDO_REGISTER)
Re: match.pd: Three new patterns
On Fri, Jun 19, 2015 at 05:51:53PM +0200, Marc Glisse wrote: On Fri, 19 Jun 2015, Marek Polacek wrote: +/* x + y - (x | y) - x y */ +(simplify + (minus (plus @0 @1) (bit_ior @0 @1)) + (if (!TYPE_OVERFLOW_SANITIZED (type) !TYPE_SATURATING (type)) + (bit_and @0 @1))) + +/* (x + y) - (x y) - x | y */ +(simplify + (minus (plus @0 @1) (bit_and @0 @1)) + (if (!TYPE_OVERFLOW_SANITIZED (type) !TYPE_SATURATING (type)) + (bit_ior @0 @1))) It could be macroized so they are handled by the same piece of code, but that's not important for a couple lines. Yeah, that could be done, but I didn't see much value in doing that. As far as I can tell, TYPE_SATURATING is for fixed point numbers only, are we allowed to use bit_ior/bit_and on those? I never know what kind of integers are supposed to be supported, so I would have checked TYPE_OVERFLOW_UNDEFINED (type) || TYPE_OVERFLOW_WRAPS (type) since those are the 2 cases where we know it is safe (for TYPE_OVERFLOW_TRAPS it is never clear if we are supposed to preserve traps or just avoid introducing new ones). Well, the reviewer will know, I'll shut up :-) I think you're right about TYPE_SATURATING so I've dropped that and instead replaced it with TYPE_OVERFLOW_TRAPS. That should do the right thing together with TYPE_OVERFLOW_SANITIZED. (I still believe that the necessity for TYPE_OVERFLOW_SANITIZED here points to a design issue in ubsan, but it is way too late to discuss that) I think delayed folding would help here a bit. Also, we've been talking about doing the signed overflow sanitization earlier, but so far I didn't implement that. And -ftrapv should be merged into the ubsan infrastructure some day. It is probably not worth the trouble adding the variant: x+(y-(xy)) - x|y since it decomposes as y-(xy) - y~x x+(y~x) - x|y x+(y-(x|y)) - x-(x~y) - xy is less likely to happen because the first transform y-(x|y) - -(x~y) increases the number of insns. Bah, we can't handle everything... That sounds about right ;). Thanks! So, Richi, is this variant ok as well? I also added one ubsan test. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-06-22 Marek Polacek pola...@redhat.com * match.pd ((x + y) - (x | y) - x y, (x + y) - (x y) - x | y): New patterns. * gcc.dg/fold-minus-4.c: New test. * gcc.dg/fold-minus-5.c: New test. * c-c++-common/ubsan/overflow-add-5.c: New test. diff --git gcc/match.pd gcc/match.pd index badb80a..6d520ef 100644 --- gcc/match.pd +++ gcc/match.pd @@ -343,6 +343,18 @@ along with GCC; see the file COPYING3. If not see (plus:c (bit_and @0 @1) (bit_ior @0 @1)) (plus @0 @1)) +/* (x + y) - (x | y) - x y */ +(simplify + (minus (plus @0 @1) (bit_ior @0 @1)) + (if (!TYPE_OVERFLOW_SANITIZED (type) !TYPE_OVERFLOW_TRAPS (type)) + (bit_and @0 @1))) + +/* (x + y) - (x y) - x | y */ +(simplify + (minus (plus @0 @1) (bit_and @0 @1)) + (if (!TYPE_OVERFLOW_SANITIZED (type) !TYPE_OVERFLOW_TRAPS (type)) + (bit_ior @0 @1))) + /* (x | y) - (x ^ y) - x y */ (simplify (minus (bit_ior @0 @1) (bit_xor @0 @1)) diff --git gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c index e69de29..905a60a 100644 --- gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c +++ gcc/testsuite/c-c++-common/ubsan/overflow-add-5.c @@ -0,0 +1,30 @@ +/* { dg-do run } */ +/* { dg-options -fsanitize=signed-integer-overflow } */ + +int __attribute__ ((noinline)) +foo (int i, int j) +{ + return (i + j) - (i | j); +} + +/* { dg-output signed integer overflow: 2147483647 \\+ 1 cannot be represented in type 'int'\[^\n\r]*(\n|\r\n|\r) } */ +/* { dg-output \[^\n\r]*signed integer overflow: -2147483648 - 2147483647 cannot be represented in type 'int'\[^\n\r]*(\n|\r\n|\r) } */ + +int __attribute__ ((noinline)) +bar (int i, int j) +{ + return (i + j) - (i j); +} + +/* { dg-output \[^\n\r]*signed integer overflow: 2147483647 \\+ 1 cannot be represented in type 'int'\[^\n\r]*(\n|\r\n|\r) } */ +/* { dg-output \[^\n\r]*signed integer overflow: -2147483648 - 1 cannot be represented in type 'int' } */ + +int +main () +{ + int r = foo (__INT_MAX__, 1); + asm volatile ( : +g (r)); + r = bar (__INT_MAX__, 1); + asm volatile ( : +g (r)); + return 0; +} diff --git gcc/testsuite/gcc.dg/fold-minus-4.c gcc/testsuite/gcc.dg/fold-minus-4.c index e69de29..2d76b4f 100644 --- gcc/testsuite/gcc.dg/fold-minus-4.c +++ gcc/testsuite/gcc.dg/fold-minus-4.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options -O -fdump-tree-cddce1 } */ + +int +fn1 (int a, int b) +{ + int tem1 = a + b; + int tem2 = a b; + return tem1 - tem2; +} + +int +fn2 (int a, int b) +{ + int tem1 = b + a; + int tem2 = a b; + return tem1 - tem2; +} + +int +fn3 (int a, int b) +{ + int tem1 = a + b; + int tem2 = b a; + return tem1 - tem2; +} + +int +fn4 (int a, int b) +{ + int tem1 = b + a; + int tem2 = b a; + return tem1 - tem2; +} + +/* { dg-final {
Re: [ping] Couple of patches for -fdump-ada-spec
On 06/22/2015 09:33 AM, Eric Botcazou wrote: Add query for template-dependent arguments to -fdump-ada-spec: http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00403.html Get rid of assembly file with -fdump-ada-spec: http://gcc.gnu.org/ml/gcc-patches/2015-06/msg00420.html OK for both. jeff
Re: [PATCH] Expand PIC calls without PLT with -fno-plt
On Mon, 22 Jun 2015, Jiong Wang wrote: Have done a quick experiment, -fno-plt doesn't work on AArch64. it's because although this patch force the function address into register, but the combine pass runs later combine it back as AArch64 have defined such insn pattern. For X86, it's not combined back. From the rtl dump, it's because the rtl pre pass has moved the address load instruction into another basic block and combine pass don't combine across basic blocks. Also, x86 backend has done some check on flag_plt in the new added ix86_nopic_noplt_attribute_p which could help generate correct insns. What I can think of the fix on AArch64 is by restricting the call symbol under flag_plt == true only, so that call via register can't be combined into call symbol direct, Or better to prohibit combine pass for such combining? as the generic fix on combine may fix other broken targets. My colleagues at ISP RAS (CC'ed) have been looking on arm (and aarch64) no-plt codegen. We also saw the problem with the combine pass you describe. I think your description of why it's not observed on x86 is incorrect; the newly added ix86_nopic_noplt_attribute_p should not have anything to do with that. It's just that the GOT load insn has a REG_EQUAL note, and the combine pass can use it to replace the register in the indirect branch, producing a direct branch to a symbol (i.e. a PLT jump). Actually we are not hitting the same problem on x86 by pure luck. Early RTL passes manage to lose the REG_EQUAL note, so by the time combine runs, the register annotation is lost. It's possible to reproduce the arm/aarch64 problem on x86 with -fno-gcse and the following hack: diff --git a/gcc/cse.c b/gcc/cse.c index 2a33827..88cff96 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -6634,6 +6634,9 @@ cse_main (rtx_insn *f ATTRIBUTE_UNUSED, int nregs) int *rc_order = XNEWVEC (int, last_basic_block_for_fn (cfun)); int i, n_blocks; + if (!flag_gcse) +return 0; + df_set_flags (DF_LR_RUN_DCE); df_note_add_problem (); df_analyze (); Regarding fixing the issue, I also think that combine pass might be a better place (than the backends). I'd appreciate comments from maintainers. If you try disabling the REG_EQUAL note generation [*], you'll probably find a performance regression on arm32 (and probably on aarch64 as well? we only tried arm32 so far). The main reason for that is that GCC emits pretty bad code for a GOT load. Instead of using two add instructions and one ldr for the GOT slot access, like the PLT stubs do, it uses three(!) ldr instructions and one add. The first ldr is for loading the GOT address, and the second is for the offset of the GOT slot. As I understand, to fix that, GCC has to learn using the GOT_PREL relocation type. [*] To do that, we hacked arm legitimize_pic_address not to emit REG_EQUAL note under !flag_plt. Alexander
Re: [PATCH 1/3] Refactor entry point to -Wmisleading-indentation
On Mon, Jun 22, 2015 at 1:29 PM, Jeff Law l...@redhat.com wrote: On 06/09/2015 11:31 AM, Patrick Palka wrote: This patch refactors the entry point of -Wmisleading-indentation from: void warn_for_misleading_indentation (location_t guard_loc, location_t body_loc, location_t next_stmt_loc, enum cpp_ttype next_tok_type, const char *guard_kind); to struct token_indent_info { location_t location; cpp_ttype type; rid keyword; }; void warn_for_misleading_indentation (const token_indent_info guard_tinfo, const token_indent_info body_tinfo, const token_indent_info next_tinfo); The purpose of this refactoring is to expose more information to the -Wmisleading-indentation implementation to allow for more advanced heuristics and for better coverage. (I decided to keep the usage of const references because nobody seems to mind. Also I added a new header file, c-indentation.h.) gcc/c-family/ChangeLog: * c-indentation.h (struct token_indent_info): Define. (get_token_indent_info): Define. (warn_for_misleading_information): Declare. * c-common.h (warn_for_misleading_information): Remove. * c-identation.c (warn_for_misleading_indentation): Change declaration to take three token_indent_infos. Adjust accordingly. * c-identation.c (should_warn_for_misleading_indentation): Likewise. Bail out early if the body is a compound statement. (guard_tinfo_to_string): Define. gcc/c/ChangeLog: * c-parser.c (c_parser_if_body): Take token_indent_info argument. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. (c_parser_else_body): Likewise. (c_parser_if_statement): Likewise. (c_parser_while_statement): Likewise. (c_parser_for_statement): Likewise. gcc/cp/ChangeLog: * parser.c (cp_parser_selection_statement): Move handling of semicolon body to ... (cp_parser_implicitly_scoped_statement): .. here. Call warn_for_misleading_indentation even when the body is a semicolon. Extract token_indent_infos corresponding to the guard, body and next tokens. Adjust call to warn_for_misleading_indentation accordingly. Take token_indent_info argument. (cp_parser_already_scoped_statement): Likewise. (cp_parser_selection_statement, cp_parser_iteration_statement): Extract a token_indent_info corresponding to the guard token. The only question in my mind is bootstrap regression testing. From reading the thread for the earlier version of this patch I got the impression you had bootstrapped and regression tested earlier versions. If you could confirm that you've bootstrapped and regression tested this version it'd be appreciated. You can do it on the individual patches or the set as a whole. I think I successfully bootstrapped + regtested this exact version but I'm not sure. I was going to do so again before committing anyway. I will fire off a build tonight and confirm the results tomorrow. Jeff
patch to fix PR63740
The following patch fixes PR63740 which is describedin details on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63740 Committed as rev. 224752. I'll commit later the same patch for the trunk. 2015-06-22 Vladimir Makarov vmaka...@redhat.com PR bootstrap/63740 * lra-lives.c (process_bb_lives): Check insn copying the same reload pseudo and don't create a copy for it. Index: lra-lives.c === --- lra-lives.c (revision 224739) +++ lra-lives.c (working copy) @@ -565,7 +565,15 @@ process_bb_lives (basic_block bb, int c dst_regno = REGNO (SET_DEST (set)); if (dst_regno = lra_constraint_new_regno_start src_regno = lra_constraint_new_regno_start) - lra_create_copy (dst_regno, src_regno, freq); + { + /* It might be still an original (non-reload) insn with +one unused output and a constraint requiring to use +the same reg for input/output operands. In this case +dst_regno and src_regno have the same value, we don't +need a misleading copy for this case. */ + if (dst_regno != src_regno) + lra_create_copy (dst_regno, src_regno, freq); + } else if (dst_regno = lra_constraint_new_regno_start) { if ((hard_regno = src_regno) = FIRST_PSEUDO_REGISTER)
Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function
On Sun, Jun 21, 2015 at 05:05:14PM -0600, Martin Sebor wrote: --- /dev/null +++ b/gcc/testsuite/gcc.dg/addr_builtin-pr66516.c @@ -0,0 +1,59 @@ +/* { dg-do compile } */ One more nit: I think I'd prefer naming the test addr-builtin-1.c and then putting /* PR c/66516 */ on the first line of the test. Marek
Re: [PATCH] Fix PR c++/30044
On 06/15/2015 02:32 PM, Patrick Palka wrote: On Mon, Jun 15, 2015 at 2:05 PM, Jason Merrill ja...@redhat.com wrote: Any reason not to use grow_tree_vec? Doing so causes a lot of ICEs in the testsuite. I think it's because grow_tree_vec invalidates the older parameter_vec which some trees may still be holding a reference to in their DECL_TEMPLATE_PARMS field. Hmm, that's unfortunate, as doing it this way means we get a bunch of garbage TREE_VECs in the process. But I guess the patch is OK as is. Jason
Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function
It seems like this patch regresess pr59630.c testcase; I don't see the testcase being addressed in this patch. Thanks for the review and for pointing out this regression! I missed it among all the C test suite failures (I see 157 of them in 24 distinct tests on x86_64.) pr59630 is marked ice-on-valid-code even though the call via the converted pointer is clearly invalid (UB). What's more relevant, though, is that the test case is one of those that (while they both compile and link with the unpatched GCC) are not intended to compile with the patch (and don't compile with Clang). In this simple case, the call to __builtin_abs(0) is folded into the constant 0, but in more involved cases GCC emits a call to abs. It's not clear to me from the manual or from the builtin tests I've seen whether this is by design or an accident of the implementation Is it intended that programs be able to take the address of the builtins that correspond to libc functions and make calls to the underlying libc functions via such pointers? (If so, the patch will need some tweaking.) Please no c/ and cp/ prefixes. Sure, let me fix that in the next patch once the question above has been settled. +#include stdlib.h As Joseph already pointed out, this is redundant. Yes, that was an accidental vestige of some debugging code I had added. I'll take it out. @@ -3384,7 +3392,14 @@ parser_build_unary_op (location_t loc, enum tree_code code, struct c_expr arg) result.original_code = code; result.original_type = NULL; - if (TREE_OVERFLOW_P (result.value) !TREE_OVERFLOW_P (arg.value)) + if (code == ADDR_EXPR + TREE_CODE (TREE_TYPE (arg.value)) == FUNCTION_TYPE + DECL_IS_BUILTIN (arg.value)) +{ + error_at (loc, taking address of a builtin function); + result.value = error_mark_node; +} + else if (TREE_OVERFLOW_P (result.value) !TREE_OVERFLOW_P (arg.value)) overflow_warning (loc, result.value); It seems like you can move the new hunk a bit above so that we don't call build_unary_op in a case when taking the address of a built-in function. Yes, that should work. Unfortunately, it doesn't seem possible to do this error in build_unary_op or in function_to_pointer_conversion :(. Right. I couldn't find a way to do it because it gets called for function calls too. One more nit: I think I'd prefer naming the test addr-builtin-1.c and then putting /* PR c/66516 */ on the first line of the test. Will do. Martin
[i386, PATCH] Support new psABI for IA MCU.
Hello, I am starting (hopefully small) serie of patches to support new ABI dedicated for Intel's MicroController Units [1]. Support for new arch was introduced into Binutils in a few threads, e.g. [2]. This patchset includes: - Support in GCC: new switch (-miamcu), macro etc. - Changes to libraries. - Testsuite. Whole patch is in the bottom. [1] - https://groups.google.com/forum/#!topic/ia32-abi/cn7TM6J_TIg [2] - https://sourceware.org/ml/binutils/2015-05/msg00063.html -- Thanks, K diff --git a/config/dfp.m4 b/config/dfp.m4 index 48683f0..5b29089 100644 --- a/config/dfp.m4 +++ b/config/dfp.m4 @@ -21,7 +21,7 @@ Valid choices are 'yes', 'bid', 'dpd', and 'no'.]) ;; [ case $1 in powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux* | s390*-*-linux* | \ -i?86*-*-gnu* | \ +i?86*-*-elfiamcu | i?86*-*-gnu* | \ i?86*-*-mingw* | x86_64*-*-mingw* | \ i?86*-*-cygwin* | x86_64*-*-cygwin*) enable_decimal_float=yes diff --git a/configure b/configure index bced9de..82e45f3 100755 --- a/configure +++ b/configure @@ -6914,7 +6914,7 @@ case ${enable_target_optspace}:${target} in :d30v-*) ospace_frag=config/mt-d30v ;; - :m32r-* | :d10v-* | :fr30-*) + :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu) ospace_frag=config/mt-ospace ;; no:* | :*) diff --git a/configure.ac b/configure.ac index 7c06e6b..dc77a1b 100644 --- a/configure.ac +++ b/configure.ac @@ -2560,7 +2560,7 @@ case ${enable_target_optspace}:${target} in :d30v-*) ospace_frag=config/mt-d30v ;; - :m32r-* | :d10v-* | :fr30-*) + :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu) ospace_frag=config/mt-ospace ;; no:* | :*) diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index 0f8c3e1..79b2472 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -223,7 +223,7 @@ along with GCC; see the file COPYING3. If not see bool ix86_handle_option (struct gcc_options *opts, - struct gcc_options *opts_set ATTRIBUTE_UNUSED, + struct gcc_options *opts_set, const struct cl_decoded_option *decoded, location_t loc) { @@ -232,6 +232,20 @@ ix86_handle_option (struct gcc_options *opts, switch (code) { +case OPT_miamcu: + if (value) + { + /* Turn off x87/MMX/SSE/AVX codegen for -miamcu. */ + opts-x_target_flags = ~MASK_80387; + opts_set-x_target_flags |= MASK_80387; + opts-x_ix86_isa_flags = ~(OPTION_MASK_ISA_MMX_UNSET + | OPTION_MASK_ISA_SSE_UNSET); + opts-x_ix86_isa_flags_explicit |= (OPTION_MASK_ISA_MMX_UNSET + | OPTION_MASK_ISA_SSE_UNSET); + + } + return true; + case OPT_mmmx: if (value) { diff --git a/gcc/config.gcc b/gcc/config.gcc index 805638d..2b3af82 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -1389,6 +1389,9 @@ x86_64-*-darwin*) tmake_file=${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc tm_file=${tm_file} ${cpu_type}/darwin64.h ;; +i[34567]86-*-elfiamcu) + tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/iamcu.h + ;; i[34567]86-*-elf*) tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h ;; diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c index 0228f4b..66f7e37 100644 --- a/gcc/config/i386/i386-c.c +++ b/gcc/config/i386/i386-c.c @@ -426,6 +426,11 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, __CLWB__); if (isa_flag OPTION_MASK_ISA_MWAITX) def_or_undef (parse_in, __MWAITX__); + if (TARGET_IAMCU) +{ + def_or_undef (parse_in, __iamcu); + def_or_undef (parse_in, __iamcu__); +} } diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 24fccfc..26ffa67 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -3433,6 +3433,10 @@ ix86_option_override_internal (bool main_args_p, || TARGET_16BIT_P (opts-x_ix86_isa_flags)) opts-x_ix86_isa_flags = ~OPTION_MASK_ABI_X32; #endif + if (TARGET_64BIT_P (opts-x_ix86_isa_flags) + TARGET_IAMCU_P (opts-x_target_flags)) + sorry (Intel MCU psABI isn%'t supported in %s mode, + TARGET_X32_P (opts-x_ix86_isa_flags) ? x32 : 64-bit); } #endif @@ -3817,6 +3821,20 @@ ix86_option_override_internal (bool main_args_p, if (TARGET_X32 (ix86_isa_flags OPTION_MASK_ISA_MPX)) error (Intel MPX does not support x32); + if (TARGET_IAMCU_P (opts-x_target_flags)) +{ + /* Verify that x87/MMX/SSE/AVX is off for -miamcu. */ + if (TARGET_80387_P (opts-x_target_flags)) + sorry (X87 FPU isn%'t supported in Intel MCU psABI); + else if ((opts-x_ix86_isa_flags (OPTION_MASK_ISA_MMX +
[Patch, fortran] PR52846 - [F2008] Support submodules
Dear All, This patch enables submodule support in gfortran. Submodules are a feature of F2008 but are fully described in ISO/IEC TR 19767:2004(E). The patch has one significant non-conformance (that I know about, anyway!); whilst private derived type components are correctly dealt with, symbols whose access is private within the parent module are not. They should effectively be host associated in descendant submodules. At present gfortran handles private access at the module write stage. This means that when a submodule reads the module file, there is no information present about symbols whose access was private. Since this modification might cause significant fall-out to existing code, I propose to submit a separate patch later on to sort out the non-conformance. However, as required private and public statements are not allowed in submodules. The patch makes maximum possible leverage of existing code to handle modules. Once the submodule is matched, the ancestor module and submodules are first used and then all the symbols are set host associated and private derived type components set public. Most of the work involved matching module procedures, with both the traditional form of declaration and the abbreviated one. I have chosen to treat MODULE as a prefix like PURE or ELEMENTAL. This is logical both because of the form of the declaration and because the identification of module procedures is most easily done with an attribute bit. With traditional procedure declarations, the procedure, result and dummy characteristics are compared with those of the interface declaration. The comparison of the dummy characteristics is a bit cobbled together and might be better done by copying the formal_namespace and it's contents to the new symbol and retaining the old for the interface symbol. This patch leaves the old dummy symbols in the formal namespace in the new ones in the formal arglist. I have checked that cleanup occurs for all objects. Note the comment in submodule_1.f90 about the possibility of undetected recursion between procedures in different submodules. I am not at all sure that I know how to deal with this and am open to suggestions. In addition, it should be noted that collisions between the names of entities and procedures, other than module procedures are detected by the linker at present. Apart from this, all is very straightforward and follows the the ChangeLogs. Thanks for testing of an early version of the patch by Damian Rouson, Salvatore Filippone and Tobias Burnus. Bootstrapped and regtested on FC21/x86_64 - OK for trunk? Cheers Paul 2015-06-22 Paul Thomas pa...@gcc.gnu.org PR fortran/52846 * decl.c (get_proc_name): Make a partially populated interface symbol to carry the characteristics of a module procedure and its result. (match_attr_spec): Submodule variables have implicit save attribute for F2008 onwards. (gfc_match_prefix): Add 'module' as the a prefix and set the module_procedure attribute. (gfc_match_formal_arglist): For a module procedure keep the interface formal_arglist from the interface, match new the formal arguments and then compare the number and names of each. (gfc_match_procedure): Add case COMP_SUBMODULE. (gfc_match_function_decl, gfc_match_subroutine_decl): Set the module_procedure attribute. (gfc_match_entry, gfc_match_end): Add case COMP_SUBMODULE. (gfc_match_submod_proc): New function to match the abbreviated style of submodule declaration. * gfortran.h : Add ST_SUBMODULE and ST_END_SUBMODULE. Add the attribute bits 'used_in_submodule' and 'module_procedure'. Add prototypes for the functions 'gfc_check_dummy_characteristics' and 'gfc_check_result_characteristics'. * interface.c : Add the prefix 'gfc_' to the names of functions 'check_dummy(result)_characteristics' and all their references. * match.h : Add prototype for 'gfc_match_submod_proc' and 'gfc_match_submodule'. * module.c (gfc_match_submodule): New function. Add handling for the 'module_procedure' attribute bit. * parse.c (decode_statement): Handle a match occurring in 'gfc_match_submod_proc' and a match for 'submodule'. (gfc_enclosing_unit): Include the state COMP_SUBMODULE. (gfc_ascii_statement): Add END SUBMODULE. (accept_statement): Add ST_SUBMODULE. (parse_spec): Disallow statement functions in a submodule specification part. (parse_contained): Add ST_END_SUBMODULE and COMP_SUBMODULE twice each. (set_syms_host_assoc): Make symbols from the ancestor module and submodules use associated, as required by the standard and set all private components public. Module procedures 'external' attribute bit is reset and the 'used_in_submodule' bit is set. (parse_module): If this is a submodule, use the ancestor module and submodules. Traverse the namespace, calling 'set_syms_host_assoc'. Add ST_END_SUBMODULE and COMP_SUBMODULE. *
[C++ Patch] Remove pointless code in grokdeclarator
Hi, I think this qualifies as obvious: we reset type_quals to TYPE_UNQUALIFIED and then we only use it in a 'if (type_quals != TYPE_UNQUALIFIED)' test before returning. Thanks, Paolo. /// 2015-06-22 Paolo Carlini paolo.carl...@oracle.com * decl.c (grokdeclarator): Remove pointless code. Index: cp/decl.c === --- cp/decl.c (revision 224724) +++ cp/decl.c (working copy) @@ -10476,19 +10477,9 @@ grokdeclarator (const cp_declarator *declarator, if (decl_context == TYPENAME) { - /* Note that the grammar rejects storage classes -in typenames, fields or parameters. */ - if (type_quals != TYPE_UNQUALIFIED) - type_quals = TYPE_UNQUALIFIED; - /* Special case: friend class foo looks like a TYPENAME context. */ if (friendp) { - if (type_quals != TYPE_UNQUALIFIED) - { - error (type qualifiers specified for friend class declaration); - type_quals = TYPE_UNQUALIFIED; - } if (inlinep) { error (%inline% specified for friend class declaration);
Re: [gomp4] Preserve NVPTX reconvergence points
On 06/19/2015 03:45 PM, Jakub Jelinek wrote: I actually believe having some optimization passes in between the ompexp and the lowering of the IR into the form PTX wants is highly desirable, the form with the worker-single or vector-single mode lowered will contain too complex CFG for many optimizations to be really effective, especially if it uses abnormal edges. E.g. inlining supposedly would have harder job etc. What exact unpredictable effects do you fear? Mostly the ones I can't predict. But let's take one example, LICM: let's say you pull some assignment out of a loop, then you find yourself in one of two possible situations: either it's become not actually available inside the loop (because the data and control flow is not described correctly and the compiler doesn't know what's going on), or, to avoid that, you introduce additional broadcasting operations when entering the loop, which might be quite expensive. Bernd
Re: [gomp4] Preserve NVPTX reconvergence points
On Mon, Jun 22, 2015 at 03:59:57PM +0200, Bernd Schmidt wrote: On 06/19/2015 03:45 PM, Jakub Jelinek wrote: I actually believe having some optimization passes in between the ompexp and the lowering of the IR into the form PTX wants is highly desirable, the form with the worker-single or vector-single mode lowered will contain too complex CFG for many optimizations to be really effective, especially if it uses abnormal edges. E.g. inlining supposedly would have harder job etc. What exact unpredictable effects do you fear? Mostly the ones I can't predict. But let's take one example, LICM: let's say you pull some assignment out of a loop, then you find yourself in one of two possible situations: either it's become not actually available inside the loop (because the data and control flow is not described correctly and the compiler doesn't know what's going on), or, to avoid that, you introduce Why do you think that would happen? E.g. for non-addressable gimple types you'd most likely just have a PHI for it on the loop. additional broadcasting operations when entering the loop, which might be quite expensive. If the PHI has cheap initialization, there is not a problem to emit it as initialization in the loop instead of a broadcast (kind like RA rematerialization). And by actually adding such an optimization, you help even code that has computation in a vector-single code and uses it in vector acc loop. Jakub
Re: [gomp4][PATCH] Handle casts in bound in try_transform_to_exit_first_loop_alt
On Thu, 18 Jun 2015, Tom de Vries wrote: On 13/06/15 16:24, Tom de Vries wrote: Hi, this patch allows try_transform_to_exit_first_loop_alt to succeed when handling cases where the expression representing the number of iterations contains a cast. Currently, transform_to_exit_first_loop_alt testcase gfortran/parloops-exit-first-loop-alt.f95 will fail. The nit is _19, which is defined as follows: ... _20 = _6 + -1; _19 = (unsigned int) _20; ... And transform_to_exit_first_loop_alt currently only handles nits with defining stmt 'nit = x - 1', for which it finds alt_bound 'x'. The patch: - uses try_get_loop_niter to get nit as a nested tree expression '(unsigned int) (_6 + -1)' - strips the outer nops (assuming no change in value) - uses '(unsigned int)_6' as the alt_bound, and - gimplifies the expression. Bootstrapped and reg-tested on x86_64. Cleaned up whitespace in testcases. Committed to gomp-4_0-branch as atttached. OK for trunk? I assume the above also handles the reverse, (int) (_6 + -1)? In this case what happens if _6 == INT_MAX + 1? nit is INT_MAX but (int) _6 is INT_MIN. Likewise what happens if _6 + -1 under-/overflows? Richard. Thanks, - Tom -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [i386, PATCH, 1/3] IA MCU psABI support: GCC changes.
Hello, This patch introduces basic support into GCC. Bootstrapped and regtested. / * configure.ac (ospace_frag): Enable for i?86*-*-elfiamcu target. * configure: Regenerate. gcc/ * config.gcc: Support i[34567]86-*-elfiamcu target. * config/i386/iamcu.h: New. * config/i386/i386.opt: Add -miamcu. * doc/invoke.texi: Document -miamcu. * common/config/i386/i386-common.c (ix86_handle_option): Turn off x87/MMX/SSE/AVX codegen for -miamcu. * config/i386/i386-c.c (ix86_target_macros_internal): Define __iamcu/__iamcu__ for -miamcu. * config/i386/i386.h (PREFERRED_STACK_BOUNDARY_DEFAULT): Set to MIN_STACK_BOUNDARY if TARGET_IAMCU is true. (BIGGEST_ALIGNMENT): Set to 32 if TARGET_IAMCU is true. * config/i386/i386.c (ix86_option_override_internal): - Ignore and warn -mregparm for Intel MCU. Turn on -mregparm=3 for Intel MCU by default. - Default long double to 64-bit for Intel MCU. - Turn on -freg-struct-return for Intel MCU. - Issue an error when -miamcu is used in 64-bit or x32 mode, or if x87, MMX, SSE or AVX is turned on. (function_arg_advance_32): Pass value whose size is no larger than 8 bytes in registers for Intel MCU. (function_arg_32): Likewise. (ix86_return_in_memory): Return value whose size is no larger than 8 bytes in registers for Intel MCU. (iamcu_alignment): New function. (ix86_data_alignment): Call iamcu_alignment if TARGET_IAMCU is true. (ix86_local_alignment): Don't increase alignment for Intel MCU. (x86_field_alignment): Return iamcu_alignment if TARGET_IAMCU is true. Is it OK for trunk? -- Thanks, K diff --git a/configure b/configure index bced9de..82e45f3 100755 --- a/configure +++ b/configure @@ -6914,7 +6914,7 @@ case ${enable_target_optspace}:${target} in :d30v-*) ospace_frag=config/mt-d30v ;; - :m32r-* | :d10v-* | :fr30-*) + :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu) ospace_frag=config/mt-ospace ;; no:* | :*) diff --git a/configure.ac b/configure.ac index 7c06e6b..dc77a1b 100644 --- a/configure.ac +++ b/configure.ac @@ -2560,7 +2560,7 @@ case ${enable_target_optspace}:${target} in :d30v-*) ospace_frag=config/mt-d30v ;; - :m32r-* | :d10v-* | :fr30-*) + :m32r-* | :d10v-* | :fr30-* | :i?86*-*-elfiamcu) ospace_frag=config/mt-ospace ;; no:* | :*) diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/i386-common.c index 0f8c3e1..79b2472 100644 --- a/gcc/common/config/i386/i386-common.c +++ b/gcc/common/config/i386/i386-common.c @@ -223,7 +223,7 @@ along with GCC; see the file COPYING3. If not see bool ix86_handle_option (struct gcc_options *opts, - struct gcc_options *opts_set ATTRIBUTE_UNUSED, + struct gcc_options *opts_set, const struct cl_decoded_option *decoded, location_t loc) { @@ -232,6 +232,20 @@ ix86_handle_option (struct gcc_options *opts, switch (code) { +case OPT_miamcu: + if (value) + { + /* Turn off x87/MMX/SSE/AVX codegen for -miamcu. */ + opts-x_target_flags = ~MASK_80387; + opts_set-x_target_flags |= MASK_80387; + opts-x_ix86_isa_flags = ~(OPTION_MASK_ISA_MMX_UNSET + | OPTION_MASK_ISA_SSE_UNSET); + opts-x_ix86_isa_flags_explicit |= (OPTION_MASK_ISA_MMX_UNSET + | OPTION_MASK_ISA_SSE_UNSET); + + } + return true; + case OPT_mmmx: if (value) { diff --git a/gcc/config.gcc b/gcc/config.gcc index 805638d..2b3af82 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -1389,6 +1389,9 @@ x86_64-*-darwin*) tmake_file=${tmake_file} ${cpu_type}/t-darwin64 t-slibgcc tm_file=${tm_file} ${cpu_type}/darwin64.h ;; +i[34567]86-*-elfiamcu) + tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/iamcu.h + ;; i[34567]86-*-elf*) tm_file=${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h newlib-stdint.h i386/i386elf.h ;; diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c index 0228f4b..66f7e37 100644 --- a/gcc/config/i386/i386-c.c +++ b/gcc/config/i386/i386-c.c @@ -426,6 +426,11 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, __CLWB__); if (isa_flag OPTION_MASK_ISA_MWAITX) def_or_undef (parse_in, __MWAITX__); + if (TARGET_IAMCU) +{ + def_or_undef (parse_in, __iamcu); + def_or_undef (parse_in, __iamcu__); +} } diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 24fccfc..26ffa67 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -3433,6 +3433,10 @@ ix86_option_override_internal (bool main_args_p,
Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function
On Sun, Jun 21, 2015 at 05:05:14PM -0600, Martin Sebor wrote: Attached is a patch to reject C and C++ constructs that result in obtaining a pointer (or a reference in C++) to a builtin function. These constructs are currently silently accepted by GCC and, in most cases(*), result in a linker error. The patch brings GCC on par with Clang by rejecting all such constructs. Bootstrapped and tested on x86_64-unknown-linux-gnu. It seems like this patch regresess pr59630.c testcase; I don't see the testcase being addressed in this patch. 2015-06-21 Martin Sebor mse...@redhat.com PR c/66516 * c/c-typeck.c (default_function_array_conversion): Reject converting a builtin function to a pointer. (parser_build_unary_op): Reject taking the address of a builtin function. * cp/call.c (convert_like_real): Reject converting a builtin function to a pointer. (initialize_reference): Reject initializing a reference with a builtin function. * cp/typeck.c (cp_build_addr_expr_strict): Reject taking the address of a builtin function. (build_reinterpret_cast_1): Reject casting a builtin function to a pointer. (convert_for_initialization): Reject initializing a pointer with the a builtin function. Please no c/ and cp/ prefixes. +#include stdlib.h As Joseph already pointed out, this is redundant. @@ -3384,7 +3392,14 @@ parser_build_unary_op (location_t loc, enum tree_code code, struct c_expr arg) result.original_code = code; result.original_type = NULL; - if (TREE_OVERFLOW_P (result.value) !TREE_OVERFLOW_P (arg.value)) + if (code == ADDR_EXPR + TREE_CODE (TREE_TYPE (arg.value)) == FUNCTION_TYPE + DECL_IS_BUILTIN (arg.value)) +{ + error_at (loc, taking address of a builtin function); + result.value = error_mark_node; +} + else if (TREE_OVERFLOW_P (result.value) !TREE_OVERFLOW_P (arg.value)) overflow_warning (loc, result.value); It seems like you can move the new hunk a bit above so that we don't call build_unary_op in a case when taking the address of a built-in function. Unfortunately, it doesn't seem possible to do this error in build_unary_op or in function_to_pointer_conversion :(. Marek
Re: match.pd: Three new patterns
On Fri, 19 Jun 2015, Marek Polacek wrote: On Thu, Jun 18, 2015 at 05:41:18PM +0200, Marek Polacek wrote: Again for symmetry, it seems like this comes with x + y - (x | y) - x y x + y - (x y) - x | y which seem fine when overflow is undefined or wraps, but not if for instance it saturates. I'll leave this as a follow-up. ...here. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-06-19 Marek Polacek pola...@redhat.com * match.pd (x + y - (x | y) - x y, ) missing (x + y) - (x y) - x | y): New patterns. * gcc.dg/fold-minus-4.c: New test. * gcc.dg/fold-minus-5.c: New test. diff --git gcc/match.pd gcc/match.pd index badb80a..61ff710 100644 --- gcc/match.pd +++ gcc/match.pd @@ -343,6 +343,18 @@ along with GCC; see the file COPYING3. If not see (plus:c (bit_and @0 @1) (bit_ior @0 @1)) (plus @0 @1)) +/* x + y - (x | y) - x y */ Please wrap x + y in () here as well. Ok with that changes. Thanks, Richard. +(simplify + (minus (plus @0 @1) (bit_ior @0 @1)) + (if (!TYPE_OVERFLOW_SANITIZED (type) !TYPE_SATURATING (type)) + (bit_and @0 @1))) + +/* (x + y) - (x y) - x | y */ +(simplify + (minus (plus @0 @1) (bit_and @0 @1)) + (if (!TYPE_OVERFLOW_SANITIZED (type) !TYPE_SATURATING (type)) + (bit_ior @0 @1))) + /* (x | y) - (x ^ y) - x y */ (simplify (minus (bit_ior @0 @1) (bit_xor @0 @1)) diff --git gcc/testsuite/gcc.dg/fold-minus-4.c gcc/testsuite/gcc.dg/fold-minus-4.c index e69de29..2d76b4f 100644 --- gcc/testsuite/gcc.dg/fold-minus-4.c +++ gcc/testsuite/gcc.dg/fold-minus-4.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options -O -fdump-tree-cddce1 } */ + +int +fn1 (int a, int b) +{ + int tem1 = a + b; + int tem2 = a b; + return tem1 - tem2; +} + +int +fn2 (int a, int b) +{ + int tem1 = b + a; + int tem2 = a b; + return tem1 - tem2; +} + +int +fn3 (int a, int b) +{ + int tem1 = a + b; + int tem2 = b a; + return tem1 - tem2; +} + +int +fn4 (int a, int b) +{ + int tem1 = b + a; + int tem2 = b a; + return tem1 - tem2; +} + +/* { dg-final { scan-tree-dump-notcddce1 } } */ +/* { dg-final { scan-tree-dump-not \\+ cddce1 } } */ diff --git gcc/testsuite/gcc.dg/fold-minus-5.c gcc/testsuite/gcc.dg/fold-minus-5.c index e69de29..a31e1cc 100644 --- gcc/testsuite/gcc.dg/fold-minus-5.c +++ gcc/testsuite/gcc.dg/fold-minus-5.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options -O -fdump-tree-cddce1 } */ + +int +fn1 (int a, int b) +{ + int tem1 = a + b; + int tem2 = a | b; + return tem1 - tem2; +} + +int +fn2 (int a, int b) +{ + int tem1 = b + a; + int tem2 = a | b; + return tem1 - tem2; +} + +int +fn3 (int a, int b) +{ + int tem1 = a + b; + int tem2 = b | a; + return tem1 - tem2; +} + +int +fn4 (int a, int b) +{ + int tem1 = b + a; + int tem2 = b | a; + return tem1 - tem2; +} + +/* { dg-final { scan-tree-dump-not \\+ cddce1 } } */ +/* { dg-final { scan-tree-dump-not \\| cddce1 } } */ Marek -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
[C++ Patch] Use declspecs-locations[ds_virtual]
Hi, I think this also qualifies as obvious given the past work / discussion: use in one more place declspecs-locations to improve the location of the error message. Thanks, Paolo. /cp 2015-06-22 Paolo Carlini paolo.carl...@oracle.com * decl.c (grokdeclarator): Use declspecs-locations[ds_virtual]. /testsuite 2015-06-22 Paolo Carlini paolo.carl...@oracle.com * g++.dg/inherit/pure1.C: Test location too. Index: cp/decl.c === --- cp/decl.c (revision 224724) +++ cp/decl.c (working copy) @@ -9529,7 +9529,8 @@ grokdeclarator (const cp_declarator *declarator, if (virtualp (current_class_name == NULL_TREE || decl_context != FIELD)) { - error (%virtual% outside class declaration); + error_at (declspecs-locations[ds_virtual], + %virtual% outside class declaration); virtualp = 0; } Index: testsuite/g++.dg/inherit/pure1.C === --- testsuite/g++.dg/inherit/pure1.C(revision 224724) +++ testsuite/g++.dg/inherit/pure1.C(working copy) @@ -3,8 +3,8 @@ // { dg-do compile } void foo0() = 0; // { dg-error like a variable } -virtual void foo1() = 0; // { dg-error outside class|variable } - +virtual void foo1() = 0; // { dg-error 1:'virtual' outside class } +// { dg-error like a variable { target *-*-* } 6 } struct A { void foo2() = 0; // { dg-error non-virtual }
Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function
On Sun, 21 Jun 2015, Martin Sebor wrote: diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index 636e0bb..637a292 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -58,6 +58,8 @@ along with GCC; see the file COPYING3. If not see #include cilk.h #include gomp-constants.h +#include stdlib.h Included from system.h, don't include it explicitly in source files. + if (DECL_IS_BUILTIN (exp.value)) + { + error_at (loc, converting builtin function to a pointer); Say built-in (see codingconventions.html). -- Joseph S. Myers jos...@codesourcery.com
Re: [gomp4] Preserve NVPTX reconvergence points
On Mon, Jun 22, 2015 at 02:55:49PM +0100, Julian Brown wrote: One problem is that (at least on the GPU hardware we've considered so far) we're somewhat constrained in how much control we have over how the underlying hardware executes code: it's possible to draw up a scheme where OpenACC source-level control-flow semantics are reflected directly in the PTX assembly output (e.g. to say all threads in a CTA/warp will be coherent after such-and-such a loop), and lowering OpenACC directives quite early seems to make that relatively tractable. (Even if the resulting code is relatively un-optimisable due to the abnormal edges inserted to make sure that the CFG doesn't become ill-formed.) If arbitrary optimisations are done between OMP-lowering time and somewhere around vectorisation (say), it's less clear if that correspondence can be maintained. Say if the code executed by half the threads in a warp becomes physically separated from the code executed by the other half of the threads in a warp due to some loop optimisation, we can no longer easily determine where that warp will reconverge, and certain other operations (relying on coherent warps -- e.g. CTA synchronisation) become impossible. A similar issue exists for warps within a CTA. So, essentially -- I don't know how late loop lowering would interact with: (a) Maintaining a CFG that will work with PTX. (b) Predication for worker-single and/or vector-single modes (actually all currently-proposed schemes have problems with proper representation of data-dependencies for variables and compiler-generated temporaries between predicated regions.) I don't understand why lowering the way you suggest helps here at all. In the proposed scheme, you essentially have whole function in e.g. worker-single or vector-single mode, which you need to be able to handle properly in any case, because users can write such routines themselves. And then you can have a loop in such a function that has some special attribute, a hint that it is desirable to vectorize it (for PTX the PTX way) or use vector-single mode for it in a worker-single function. So, the special pass then of course needs to handle all the needed broadcasting and reduction required to change the mode from e.g. worker-single to vector-single, but the convergence points still would be either on the boundary of such loops to be vectorized or parallelized, or wherever else they appear in normal vector-single or worker-single functions (around the calls to certainly calls?). Jakub
Re: match.pd: Three new patterns (and some more)
On Thu, 18 Jun 2015, Marek Polacek wrote: On Tue, Jun 16, 2015 at 03:35:15PM +0200, Richard Biener wrote: We already have /* (x y) ^ (x | y) - x ^ y */ (simplify (bit_xor:c (bit_and @0 @1) (bit_ior @0 @1)) (bit_xor @0 @1)) but of course with minus it doesn't commutate so it's hard to merge. Yeah :(. +/* (x y) + (x | y) - x + y */ Again for symmetry, it seems like this comes with x + y - (x | y) - x y x + y - (x y) - x | y which seem fine when overflow is undefined or wraps, but not if for instance it saturates. Can you adjust according to Marcs comment and re-submit? If you like you can do it as followup as well and thus the original patch is ok as well. Sure. This is a new version with some more patters. Thanks. Bootstrapped/regtested on x86_64-linux, ok for trunk? Ok. Thanks, Richard. 2015-06-18 Marek Polacek pola...@redhat.com * match.pd ((x ^ y) ^ (x | y) - x y, (x y) + (x ^ y) - x | y, (x y) | (x ^ y) - x | y, (x y) ^ (x ^ y) - x | y, (x y) + (x | y) - x + y, (x | y) - (x ^ y) - x y, (x | y) - (x y) - x ^ y): New patterns. * gcc.dg/fold-ior-1.c: New test. * gcc.dg/fold-minus-2.c: New test. * gcc.dg/fold-minus-3.c: New test. * gcc.dg/fold-plus-1.c: New test. * gcc.dg/fold-plus-2.c: New test. * gcc.dg/fold-xor-4.c: New test. * gcc.dg/fold-xor-5.c: New test. diff --git gcc/match.pd gcc/match.pd index 1ab2b1c..badb80a 100644 --- gcc/match.pd +++ gcc/match.pd @@ -325,6 +325,34 @@ along with GCC; see the file COPYING3. If not see (bit_xor:c (bit_and @0 @1) (bit_ior @0 @1)) (bit_xor @0 @1)) +/* (x ^ y) ^ (x | y) - x y */ +(simplify + (bit_xor:c (bit_xor @0 @1) (bit_ior @0 @1)) + (bit_and @0 @1)) + +/* (x y) + (x ^ y) - x | y */ +/* (x y) | (x ^ y) - x | y */ +/* (x y) ^ (x ^ y) - x | y */ +(for op (plus bit_ior bit_xor) + (simplify + (op:c (bit_and @0 @1) (bit_xor @0 @1)) + (bit_ior @0 @1))) + +/* (x y) + (x | y) - x + y */ +(simplify + (plus:c (bit_and @0 @1) (bit_ior @0 @1)) + (plus @0 @1)) + +/* (x | y) - (x ^ y) - x y */ +(simplify + (minus (bit_ior @0 @1) (bit_xor @0 @1)) + (bit_and @0 @1)) + +/* (x | y) - (x y) - x ^ y */ +(simplify + (minus (bit_ior @0 @1) (bit_and @0 @1)) + (bit_xor @0 @1)) + (simplify (abs (negate @0)) (abs @0)) diff --git gcc/testsuite/gcc.dg/fold-ior-1.c gcc/testsuite/gcc.dg/fold-ior-1.c index e69de29..0358eb5 100644 --- gcc/testsuite/gcc.dg/fold-ior-1.c +++ gcc/testsuite/gcc.dg/fold-ior-1.c @@ -0,0 +1,69 @@ +/* { dg-do compile } */ +/* { dg-options -O -fdump-tree-cddce1 } */ + +int +fn1 (int a, int b) +{ + int tem1 = a b; + int tem2 = a ^ b; + return tem1 | tem2; +} + +int +fn2 (int a, int b) +{ + int tem1 = b a; + int tem2 = a ^ b; + return tem1 | tem2; +} + +int +fn3 (int a, int b) +{ + int tem1 = a b; + int tem2 = b ^ a; + return tem1 | tem2; +} + +int +fn4 (int a, int b) +{ + int tem1 = b a; + int tem2 = b ^ a; + return tem1 | tem2; +} + +int +fn5 (int a, int b) +{ + int tem1 = a ^ b; + int tem2 = a b; + return tem1 | tem2; +} + +int +fn6 (int a, int b) +{ + int tem1 = b ^ a; + int tem2 = a b; + return tem1 | tem2; +} + +int +fn7 (int a, int b) +{ + int tem1 = a ^ b; + int tem2 = b a; + return tem1 | tem2; +} + +int +fn8 (int a, int b) +{ + int tem1 = b ^ a; + int tem2 = b a; + return tem1 | tem2; +} + +/* { dg-final { scan-tree-dump-notcddce1 } } */ +/* { dg-final { scan-tree-dump-not \\^ cddce1 } } */ diff --git gcc/testsuite/gcc.dg/fold-minus-2.c gcc/testsuite/gcc.dg/fold-minus-2.c index e69de29..6501f2f 100644 --- gcc/testsuite/gcc.dg/fold-minus-2.c +++ gcc/testsuite/gcc.dg/fold-minus-2.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options -O -fdump-tree-cddce1 } */ + +int +fn1 (int a, int b) +{ + int tem1 = a | b; + int tem2 = a ^ b; + return tem1 - tem2; +} + +int +fn2 (int a, int b) +{ + int tem1 = b | a; + int tem2 = a ^ b; + return tem1 - tem2; +} + +int +fn3 (int a, int b) +{ + int tem1 = a | b; + int tem2 = b ^ a; + return tem1 - tem2; +} + +int +fn4 (int a, int b) +{ + int tem1 = b | a; + int tem2 = b ^ a; + return tem1 - tem2; +} + +/* { dg-final { scan-tree-dump-not \\^ cddce1 } } */ +/* { dg-final { scan-tree-dump-not \\| cddce1 } } */ diff --git gcc/testsuite/gcc.dg/fold-minus-3.c gcc/testsuite/gcc.dg/fold-minus-3.c index e69de29..e7adce6 100644 --- gcc/testsuite/gcc.dg/fold-minus-3.c +++ gcc/testsuite/gcc.dg/fold-minus-3.c @@ -0,0 +1,37 @@ +/* { dg-do compile } */ +/* { dg-options -O -fdump-tree-cddce1 } */ + +int +fn1 (int a, int b) +{ + int tem1 = a | b; + int tem2 = a b; + return tem1 - tem2; +} + +int +fn2 (int a, int b) +{ + int tem1 = b | a; + int tem2 = a b; + return tem1 - tem2; +} + +int +fn3 (int a, int b)
Re: [gomp4] Preserve NVPTX reconvergence points
On Fri, 19 Jun 2015 14:25:57 +0200 Jakub Jelinek ja...@redhat.com wrote: On Fri, Jun 19, 2015 at 11:53:14AM +0200, Bernd Schmidt wrote: On 05/28/2015 05:08 PM, Jakub Jelinek wrote: I understand it is more work, I'd just like to ask that when designing stuff for the OpenACC offloading you (plural) try to take the other offloading devices and host fallback into account. The problem is that many of the transformations we need to do are really GPU specific, and with the current structure of omplow/ompexp they are being done in the host compiler. The offloading scheme we decided on does not give us the means to write out multiple versions of an offloaded function where each target gets a different one. For that reason I think we should postpone these lowering decisions until we're in the accel compiler, where they could be controlled by target hooks, and over the last two weeks I've been doing some experiments to see how that could be achieved. I wonder why struct loop flags and other info together with function attributes and/or cgraph flags and other info aren't sufficient for the OpenACC needs. Have you or Thomas looked what we're doing for OpenMP simd / Cilk+ simd? Why can't the execution model (normal, vector-single and worker-single) be simply attributes on functions or cgraph node flags and the kind of #acc loop simply be flags on struct loop, like already OpenMP simd / Cilk+ simd is? One problem is that (at least on the GPU hardware we've considered so far) we're somewhat constrained in how much control we have over how the underlying hardware executes code: it's possible to draw up a scheme where OpenACC source-level control-flow semantics are reflected directly in the PTX assembly output (e.g. to say all threads in a CTA/warp will be coherent after such-and-such a loop), and lowering OpenACC directives quite early seems to make that relatively tractable. (Even if the resulting code is relatively un-optimisable due to the abnormal edges inserted to make sure that the CFG doesn't become ill-formed.) If arbitrary optimisations are done between OMP-lowering time and somewhere around vectorisation (say), it's less clear if that correspondence can be maintained. Say if the code executed by half the threads in a warp becomes physically separated from the code executed by the other half of the threads in a warp due to some loop optimisation, we can no longer easily determine where that warp will reconverge, and certain other operations (relying on coherent warps -- e.g. CTA synchronisation) become impossible. A similar issue exists for warps within a CTA. So, essentially -- I don't know how late loop lowering would interact with: (a) Maintaining a CFG that will work with PTX. (b) Predication for worker-single and/or vector-single modes (actually all currently-proposed schemes have problems with proper representation of data-dependencies for variables and compiler-generated temporaries between predicated regions.) Julian