Re: Ping^3 : [PATCH] [gcc, combine] PR46164: Don't combine the insns if a volatile register is contained.
On Wed, Apr 22, 2015 at 10:21:43AM +0800, Terry Guo wrote:
> gcc/ChangeLog:
> 2015-04-22  Hale Wang
>             Terry Guo
>
>     PR rtl-optimization/64818
>     * combine.c (can_combine_p): Don't combine user-specified register if
>     it is in an asm input.
>
> gcc/testsuite/ChangeLog:
> 2015-04-22  Hale Wang
>             Terry Guo
>
>     PR rtl-optimization/64818
>     * gcc.target/arm/pr64818.c: New.

This is okay for trunk, if it has been bootstrapped and regression tested.

Thanks,


Segher
Re: Ping^3 : [PATCH] [gcc, combine] PR46164: Don't combine the insns if a volatile register is contained.
On Wed, Apr 22, 2015 at 9:44 AM, Segher Boessenkool wrote:
> On Tue, Apr 21, 2015 at 03:13:38PM +0800, Terry Guo wrote:
>> > Did you fix the comment?  REG_USERVAR_P and HARD_REGISTER_P can be
>> > set for more than just register asm.
>>
>> Sorry for missing the patch. I believe that I addressed your patch.
>> Please review it again to make sure my understanding is correct.
>
>> +  /* Use REG_USERVAR_P and HARD_REGISTER_P to check whether DEST is a user
>> +     specified register, and do not eliminate such register if it is in an
>> +     asm input.  Otherwise if allow such elimination, we may break the
>> +     register asm usage defined in GCC manual.  */
>> +  if (REG_P (dest) && REG_USERVAR_P (dest) && HARD_REGISTER_P (dest)
>> +      && extract_asm_operands (PATTERN (i3)))
>> +    return 0;
>
> The "to check whether DEST is a user-specified register" part is not
> correct; this check can for example also match for function arguments
> (which are hard regs) that were combined into any "normal" user var.
> I don't see how we would do a better check, and disallowing combination
> in this case is harmless (or even good); but the comment is misleading.
>
>
> Segher

Thanks for reviewing. Patch is updated per your suggestion. The
ChangeLog is also updated as below:

gcc/ChangeLog:
2015-04-22  Hale Wang
            Terry Guo

    PR rtl-optimization/64818
    * combine.c (can_combine_p): Don't combine user-specified register if
    it is in an asm input.

gcc/testsuite/ChangeLog:
2015-04-22  Hale Wang
            Terry Guo

    PR rtl-optimization/64818
    * gcc.target/arm/pr64818.c: New.

diff --git a/gcc/combine.c b/gcc/combine.c
index 6f0007a..6cd55dd 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -1910,6 +1910,15 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED,
   set = expand_field_assignment (set);
   src = SET_SRC (set), dest = SET_DEST (set);
 
+  /* Do not eliminate user-specified register if it is in an
+     asm input because we may break the register asm usage defined
+     in GCC manual if allow to do so.
+     Be aware that this may cover more cases than we expect but this
+     should be harmless.  */
+  if (REG_P (dest) && REG_USERVAR_P (dest) && HARD_REGISTER_P (dest)
+      && extract_asm_operands (PATTERN (i3)))
+    return 0;
+
   /* Don't eliminate a store in the stack pointer.  */
   if (dest == stack_pointer_rtx
       /* Don't combine with an insn that sets a register to itself if it has
diff --git a/gcc/testsuite/gcc.target/arm/pr64818.c b/gcc/testsuite/gcc.target/arm/pr64818.c
new file mode 100644
index 000..bddd846
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr64818.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+char temp[16];
+extern int foo1 (void);
+
+void foo (void)
+{
+  int i;
+  int len;
+
+  while (1)
+  {
+    len = foo1 ();
+    register int a asm ("r0") = 5;
+    register char *b asm ("r1") = temp;
+    register int c asm ("r2") = len;
+    asm volatile ("mov %[r0], %[r0]\n  mov %[r1], %[r1]\n  mov %[r2], %[r2]\n"
+                  : "+m"(*b)
+                  : [r0]"r"(a), [r1]"r"(b), [r2]"r"(c));
+
+    for (i = 0; i < len; i++)
+    {
+      if (temp[i] == 10)
+      return;
+    }
+  }
+}
+
+/* { dg-final { scan-assembler "\[\\t \]+mov\ r1,\ r1" } } */
Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state
On Tue, Apr 21, 2015 at 03:56:18PM -0500, Peter Bergner wrote:
> On Fri, 2015-03-20 at 17:41 -0500, Peter Bergner wrote:
> > On Fri, 2015-03-20 at 15:52 -0500, Segher Boessenkool wrote:
> > > Maybe it would be nicer if the builtin-expansion code handled the copy
> > > from cc, instead of stacking on RTL expanders.
> >
> > That would allow getting rid of the expanders completely, which
> > would be nice.  I'd have to somehow add some type of RS6000_BTC_*
> > flag onto the builtin though, so I can tell during builtin expansion
> > whether I need to emit the cr copy code or not.
>
> Ok, the patch below implements your suggestion.

It looks good, thanks.  Some minor comments...

> This patch also fixes some issues I hit with the tabortdc[i] and
> htm_m[ft]spr_ patterns when used with -m32 -mpowerpc64.

Running the testsuite, or did you actually try to _use_ -m32 -mpowerpc64? :-)

> +(define_insn "tabortdc"
>    [(set (match_operand:CC 3 "cc_reg_operand" "=x")
>          (unspec_volatile:CC [(match_operand 0 "u5bit_cint_operand" "n")
> -                             (match_operand:SI 1 "gpc_reg_operand" "r")
> -                             (match_operand:SI 2 "gpc_reg_operand" "r")]
> +                             (match_operand:DI 1 "gpc_reg_operand" "r")
> +                             (match_operand:DI 2 "gpc_reg_operand" "r")]
>                              UNSPECV_HTM_TABORTDC))]
> -  "TARGET_HTM"
> +  "TARGET_POWERPC64 && TARGET_HTM"
>    "tabortdc. %0,%1,%2"
>    [(set_attr "type" "htm")
>     (set_attr "length" "4")])

Maybe you can fold tabortdc with tabortwc now?  Use one UNSPEC name for
both, :GPR and ?

> +    case HTM_BUILTIN_TTEST: /* Alias for: tabortwci. 0,r0,0  */
> +      op[nopnds++] = GEN_INT (0);
> +      op[nopnds++] = gen_rtx_REG (SImode, 0);
> +      op[nopnds++] = GEN_INT (0);

Is that really r0, isn't that (0|rA)?  [Too lazy to read the docs myself
right now, sorry.]

> +  if (attr & RS6000_BTC_CR)
> +    {
> +      if (fcode == HTM_BUILTIN_TBEGIN)
> +        {
> +          /* Emit code to set TARGET to true or false depending on
> +             whether the tbegin. instruction successfully or failed
> +             to start a transaction.  We do this by placing the 1's
> +             complement of CR's EQ bit into TARGET.  */
> +          rtx scratch = gen_reg_rtx (SImode);
> +          emit_insn (gen_rtx_SET (VOIDmode, scratch,
> +                                  gen_rtx_EQ (SImode, cr,
> +                                              const0_rtx)));
> +          emit_insn (gen_rtx_SET (VOIDmode, target,
> +                                  gen_rtx_XOR (SImode, scratch,
> +                                               GEN_INT (1))));
> +        }
> +      else
> +        {
> +          /* Emit code to copy the 4-bit condition register field
> +             CR into the least significant end of register TARGET.  */
> +          rtx scratch1 = gen_reg_rtx (SImode);
> +          rtx scratch2 = gen_reg_rtx (SImode);
> +          rtx subreg = simplify_gen_subreg (CCmode, scratch1, SImode, 0);
> +          emit_insn (gen_movcc (subreg, cr));
> +          emit_insn (gen_lshrsi3 (scratch2, scratch1, GEN_INT (28)));
> +          emit_insn (gen_andsi3 (target, scratch2, GEN_INT (0xf)));
> +        }
> +    }

Don't we have helper functions/expanders to do these moves?  Yuck.

> -/* { dg-final { scan-assembler-times "tabortdc\\." 1 } } */
> -/* { dg-final { scan-assembler-times "tabortdci\\." 1 } } */
> +/* { dg-final { scan-assembler-times "tabortdc\\." 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "tabortdci\\." 1 { target lp64 } } } */

This skips this test on -m32 -mpowerpc64, is that on purpose?


Segher
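The two code paths in the quoted hunk can be read as plain integer arithmetic. The following is a minimal C sketch, assuming the 4-bit CR field ends up in the top four bits of a 32-bit word after the move (which is what the shift by 28 in the patch suggests). The names cr_field_to_int, tbegin_result and check_cr_helpers are hypothetical and exist only for illustration; they are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mirrors the lshrsi3/andsi3 pair in the patch.
   Assumes the CR field occupies bits 31..28 of WORD.  */
unsigned
cr_field_to_int (uint32_t word)
{
  return (word >> 28) & 0xf;
}

/* Hypothetical helper: mirrors the EQ/XOR pair used for tbegin.
   EQ_BIT is 1 if CR's EQ bit is set and 0 otherwise; the result is
   the one-bit complement of that value.  */
unsigned
tbegin_result (unsigned eq_bit)
{
  return (eq_bit & 1) ^ 1;
}

/* Small self-check of the two helpers.  */
void
check_cr_helpers (void)
{
  assert (cr_field_to_int (0x20000000u) == 2);
  assert (tbegin_result (1) == 0);
}
```

The mask after the shift is redundant for an unsigned 32-bit value, but it matches the instruction pair in the patch.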
Re: Ping^3 : [PATCH] [gcc, combine] PR46164: Don't combine the insns if a volatile register is contained.
On Tue, Apr 21, 2015 at 03:13:38PM +0800, Terry Guo wrote:
> > Did you fix the comment?  REG_USERVAR_P and HARD_REGISTER_P can be
> > set for more than just register asm.
>
> Sorry for missing the patch. I believe that I addressed your patch.
> Please review it again to make sure my understanding is correct.

> +  /* Use REG_USERVAR_P and HARD_REGISTER_P to check whether DEST is a user
> +     specified register, and do not eliminate such register if it is in an
> +     asm input.  Otherwise if allow such elimination, we may break the
> +     register asm usage defined in GCC manual.  */
> +  if (REG_P (dest) && REG_USERVAR_P (dest) && HARD_REGISTER_P (dest)
> +      && extract_asm_operands (PATTERN (i3)))
> +    return 0;

The "to check whether DEST is a user-specified register" part is not
correct; this check can for example also match for function arguments
(which are hard regs) that were combined into any "normal" user var.
I don't see how we would do a better check, and disallowing combination
in this case is harmless (or even good); but the comment is misleading.


Segher
Re: Re: [RFC stage 1] Proposed new warning: -Wmisleading-indentation
On Tue, 2015-04-21 at 20:14 +0200, Manuel López-Ibáñez wrote:
> On 21/04/15 18:07, David Malcolm wrote:
> >
> > I have the patch working now for the C++ frontend.  Am attaching the
> > work-in-progress (sans ChangeLog).  This one (v2) bootstrapped and
> > regrtested on x86_64-unknown-linux-gnu (Fedora 20), with:
> >    63 new "PASS" results in gcc.sum
> >    189 new "PASS" results in g++.sum
> > for the new test cases (relative to a control build of r48).
> >
>
> I still do not understand why you need so much complexity as I explained
> here:
> https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00830.html
>
> The attached patch passes all your tests except Wmisleading-indentation-3.c,
> which warns only once instead of two times (it doesn't seem a big loss to me),
> and Wmisleading-indentation-7.c which I did not bother to implement but it is
> straightforward application of the if-case to the else-case.

Aha!  Thanks.  Your approach is much simpler, and likely much faster.

> Perhaps I'm missing something that is not reflected in your tests?

No, mostly just my lack of expertise on the frontend :)

> BTW, the start-up cost of GCC is not negligible, thus grouping similar
> testcases in a single file may pay off in the long term.  Many small files
> also tend to slow down VC tools.  It also makes harder to see what is
> tested and what is missing.

OK.  I'll finish up your version of the patch, and consolidate the
testcases.

Thanks
Dave
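For readers who have not followed the thread, the kind of code the proposed warning targets looks roughly like the following. This is a minimal sketch only: misleading and check_misleading are made-up names, and whether a given compiler actually warns on this exact layout depends on the final form of the patch.

```c
#include <assert.h>

/* Hypothetical example of misleading indentation.  */
int
misleading (int flag)
{
  int x = 0;
  if (flag)
    x = 1;
    x = 2;   /* Indented as if guarded by the `if', but always executed.  */
  return x;
}

/* The second assignment runs regardless of FLAG.  */
void
check_misleading (void)
{
  assert (misleading (0) == 2);
  assert (misleading (1) == 2);
}
```

The function returns 2 for every input, which is probably not what the indentation was meant to express.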
Re: [patch, fortran] PR 37131
Hello Mikael and Dominique, thanks for your helpful comments! > To sum um, tests missing for the following: > array(4,:,:) > array(3:5,:) > array(3:10:2,:) > array(:,:)%comp > with both lbound == 1 and lbound != 1. > One test with lhs-rhs dependency would be good as well. I have included those (and fixed the bugs that appeared). This is done in inline_matmul_1.f90 and in inline_matmul_5.f90. > >> Index: fortran/array.c >> === >> --- fortran/array.c (Revision 18) >> +++ fortran/array.c (Arbeitskopie) >> @@ -338,6 +338,9 @@ gfc_resolve_array_spec (gfc_array_spec *as, int ch >>if (as == NULL) >> return true; >> >> + if (as->resolved) >> +return true; >> + > Why this? Because you get regressions otherwise. Not resolving an array spec twice should do no harm, and resolving it twice does so - I hit the error message in check_restricted. I'm not sure what is wrong, maybe PR 23466 was not fully fixed, but this works. > >> -static gfc_expr *create_var (gfc_expr *); >> +static gfc_expr *create_var (gfc_expr *, const char *vname=NULL); >> +static int optimize_matmul_assign (gfc_code **, int *, void *); > The function doesn't really "optimize", so name it inline_matmul_assign > instead. > Same for the comments about "optimizing MATMUL". Done. > >> @@ -524,29 +542,11 @@ constant_string_length (gfc_expr *e) >> >> } >> >> -/* Returns a new expression (a variable) to be used in place of the old one, >> - with an assignment statement before the current statement to set >> - the value of the variable. Creates a new BLOCK for the statement if >> - that hasn't already been done and puts the statement, plus the >> - newly created variables, in that block. Special cases: If the >> - expression is constant or a temporary which has already >> - been created, just copy it. */ >> - >> -static gfc_expr* >> -create_var (gfc_expr * e) > Keep a comment here. Still exists, further down. 
>> +static gfc_namespace* >> +insert_block () >> { >> - char name[GFC_MAX_SYMBOL_LEN +1]; >> - static int num = 1; >> - gfc_symtree *symtree; >> - gfc_symbol *symbol; >> - gfc_expr *result; >> - gfc_code *n; >>gfc_namespace *ns; >> - int i; >> >> - if (e->expr_type == EXPR_CONSTANT || is_fe_temp (e)) >> -return gfc_copy_expr (e); >> - >>/* If the block hasn't already been created, do so. */ >>if (inserted_block == NULL) >> { > >> @@ -1939,7 +1977,1049 @@ doloop_warn (gfc_namespace *ns) >>gfc_code_walker (&ns->code, doloop_code, do_function, NULL); >> } >> >> +/* This selction deals with inlining calls to MATMUL. */ > section >> >> +/* Auxiliary function to build and simplify an array inquiry function. >> + dim is zero-based. */ >> + >> +static gfc_expr * >> +get_array_inq_function (gfc_expr *e, int dim, gfc_isym_id id) > It's better if the id is the first argument, so that the function id and > its arguments come in their natural order. Changed. > [...] > >> +/* Builds a logical expression. */ >> + >> +static gfc_expr* >> +build_logical_expr (gfc_expr *e1, gfc_expr *e2, gfc_intrinsic_op op) > Same here, op first. Also changed. > [...] > >> + >> +/* Return an operation of one two gfc_expr (one if e2 is NULL). This assumes >> + compatible typespecs. */ >> + >> +static gfc_expr * >> +get_operand (gfc_intrinsic_op op, gfc_expr *e1, gfc_expr *e2) > Here it's good already. :-) :-) > [...] > >> +/* Insert code to issue a runtime error if the expressions are not equal. >> */ >> + >> +static gfc_code * >> +runtime_error_ne (gfc_expr *e1, gfc_expr *e2, const char *msg) >> +{ >> + gfc_expr *cond; >> + gfc_code *if_1, *if_2; >> + gfc_code *c; >> + // const char *name; > Any reason... > >> + gfc_actual_arglist *a1, *a2, *a3; >> + >> + gcc_assert (e1->where.lb); >> + /* Build the call to runtime_error. 
*/ >> + c = XCNEW (gfc_code); >> + c->op = EXEC_CALL; >> + c->loc = e1->where; >> + // name = gfc_get_string (PREFIX ("runtime_error")); >> + // c->resolved_sym = gfc_get_intrinsic_sub_symbol (name); > ... to keep these? Removed. >> + while (ref) >> +{ >> + if (ref->type == REF_ARRAY && ref->u.ar.type != AR_ELEMENT) >> +break; >> + >> + ref = ref->next; >> + >> +} >> + ar = &ref->u.ar; > You can probably use gfc_find_array_ref here. Changed. There are a few other places that could also benefit from gfc_find_array_ref (now I know it exists :-) > [...] > > >> + >> +/* Function to return a scalarized expression. It is assumed that indices >> are >> + zero based to make generation of DO loops easier. A zero as index will >> + access the first element along a dimension. Single element references will >> + be skipped. A NULL as an expression will be replaced by a full reference. >> + This assumes that the index loops have gfc_index_integer_kind, and that all >> + references have been frozen. */ >> + >> +static gfc_expr* >
[C PATCH] Make -Wno-shift-count-negative -Wno-shift-count-overflow work for const ints (PR c/65830)
A trivial patch to use OPT_* where they belong.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-04-21  Marek Polacek

    PR c/65830
    * c-common.c (c_fully_fold_internal): Use OPT_Wshift_count_negative
    and OPT_Wshift_count_overflow.

    * c-c++-common/pr65830.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 7fe7fa6..64fc95f 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -1370,15 +1370,17 @@ c_fully_fold_internal (tree expr, bool in_init, bool *maybe_const_operands,
           && c_inhibit_evaluation_warnings == 0)
         {
           if (tree_int_cst_sgn (op1) < 0)
-            warning_at (loc, 0, (code == LSHIFT_EXPR
-                                 ? G_("left shift count is negative")
-                                 : G_("right shift count is negative")));
+            warning_at (loc, OPT_Wshift_count_negative,
+                        (code == LSHIFT_EXPR
+                         ? G_("left shift count is negative")
+                         : G_("right shift count is negative")));
           else if (compare_tree_int (op1,
                                      TYPE_PRECISION (TREE_TYPE (orig_op0)))
                    >= 0)
-            warning_at (loc, 0, (code == LSHIFT_EXPR
-                                 ? G_("left shift count >= width of type")
-                                 : G_("right shift count >= width of type")));
+            warning_at (loc, OPT_Wshift_count_overflow,
+                        (code == LSHIFT_EXPR
+                         ? G_("left shift count >= width of type")
+                         : G_("right shift count >= width of type")));
         }
       if ((code == TRUNC_DIV_EXPR
            || code == CEIL_DIV_EXPR
diff --git gcc/testsuite/c-c++-common/pr65830.c gcc/testsuite/c-c++-common/pr65830.c
index e69de29..e115f18 100644
--- gcc/testsuite/c-c++-common/pr65830.c
+++ gcc/testsuite/c-c++-common/pr65830.c
@@ -0,0 +1,16 @@
+/* PR c/65830 */
+/* { dg-do compile } */
+/* { dg-options "-O -Wno-shift-count-negative -Wno-shift-count-overflow" } */
+
+int
+foo (int x)
+{
+  const int a = sizeof (int) * __CHAR_BIT__;
+  const int b = -7;
+  int c = 0;
+  c += x << a; /* { dg-bogus "left shift count >= width of type" } */
+  c += x << b; /* { dg-bogus "left shift count is negative" } */
+  c += x >> a; /* { dg-bogus "right shift count >= width of type" } */
+  c += x >> b; /* { dg-bogus "right shift count is negative" } */
+  return c;
+}

    Marek
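The two conditions the patched code distinguishes can be restated outside the compiler. The sketch below is standalone C, not GCC internals: classify_shift_count, the shift_diag enum and check_shift_classifier are hypothetical names, and precision stands in for what the patch obtains from TYPE_PRECISION.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical classification of a constant shift count.  */
enum shift_diag
{
  SHIFT_OK,
  SHIFT_COUNT_NEGATIVE,   /* would be reported under -Wshift-count-negative */
  SHIFT_COUNT_OVERFLOW    /* would be reported under -Wshift-count-overflow */
};

/* COUNT is the shift count, PRECISION the width in bits of the
   shifted operand's type.  The negative test comes first, as in the
   patched code.  */
enum shift_diag
classify_shift_count (long count, unsigned precision)
{
  if (count < 0)
    return SHIFT_COUNT_NEGATIVE;
  if ((unsigned long) count >= precision)
    return SHIFT_COUNT_OVERFLOW;
  return SHIFT_OK;
}

/* Mirrors the constants used in the new test case.  */
void
check_shift_classifier (void)
{
  unsigned width = sizeof (int) * CHAR_BIT;
  assert (classify_shift_count (-7, width) == SHIFT_COUNT_NEGATIVE);
  assert (classify_shift_count ((long) width, width) == SHIFT_COUNT_OVERFLOW);
}
```

Each outcome corresponds to its own option, which is what lets -Wno-shift-count-negative and -Wno-shift-count-overflow silence the two diagnostics independently.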
[PATCH 5/5] libcc1: 'set debug compile': Display absolute GCC driver filename
Hi,

with the patches so far after
    (gdb) set debug compile 1
one would get:
    searching for compiler matching regex ^(x86_64|i.86)(-[^-]*)?-linux(-gnu)?-gcc$
    found compiler x86_64-unknown-linux-gnu-gcc
But I believe it is more readable to see:
    searching for compiler matching regex ^(x86_64|i.86)(-[^-]*)?-linux(-gnu)?-gcc$
    found compiler /usr/bin/x86_64-unknown-linux-gnu-gcc

I do not think the change will have functionality impact, although the
filename gets used even for executing the command.


Jan


libcc1/ChangeLog
2015-04-21  Jan Kratochvil

    * findcomp.cc: Include system.h.
    (search_dir): Return absolute filename.
---
 libcc1/findcomp.cc |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libcc1/findcomp.cc b/libcc1/findcomp.cc
index f02b1df..5d49e29 100644
--- a/libcc1/findcomp.cc
+++ b/libcc1/findcomp.cc
@@ -25,6 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "libiberty.h"
 #include "xregex.h"
 #include "findcomp.hh"
+#include "system.h"
 
 class scanner
 {
@@ -68,7 +69,7 @@ search_dir (const regex_t &regexp, const std::string &dir, std::string *result)
     {
       if (regexec (&regexp, filename, 0, NULL, 0) == 0)
         {
-          *result = filename;
+          *result = dir + DIR_SEPARATOR + filename;
           return true;
         }
     }
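The one-line change amounts to joining the searched directory and the matched file name. Below is a minimal C sketch of that join. It assumes a POSIX-style '/' separator, and DEMO_DIR_SEPARATOR, join_dir_filename and check_join are hypothetical names rather than anything from libcc1, which uses std::string concatenation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_DIR_SEPARATOR '/'   /* Assumption: POSIX-style separator.  */

/* Writes DIR, the separator and FILENAME into BUF.  Returns 0 on
   success and -1 if the result does not fit in SIZE bytes.  */
int
join_dir_filename (char *buf, size_t size, const char *dir,
                   const char *filename)
{
  int n = snprintf (buf, size, "%s%c%s", dir, DEMO_DIR_SEPARATOR, filename);
  return (n >= 0 && (size_t) n < size) ? 0 : -1;
}

/* Reproduces the file name shown in the message above.  */
void
check_join (void)
{
  char buf[64];
  assert (join_dir_filename (buf, sizeof buf, "/usr/bin",
                             "x86_64-unknown-linux-gnu-gcc") == 0);
  assert (strcmp (buf, "/usr/bin/x86_64-unknown-linux-gnu-gcc") == 0);
}
```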
[PATCH 4/5] libcc1: Add 'set compile-gcc'
as discussed in How to use compile & execute function in GDB https://sourceware.org/ml/gdb/2015-04/msg00026.html GDB currently searches for /usr/bin/ARCH-OS-gcc and chooses one but one cannot override which one. GDB would provide new option 'set compile-gcc'. This patch does not change the libcc1 API as it overloads the triplet_regexp parameter of GCC's set_arguments according to: + if (access (triplet_regexp, X_OK) == 0) GDB counterpart: [PATCH 4/4] compile: Add 'set compile-gcc' https://sourceware.org/ml/gdb-patches/2015-04/msg00808.html Message-ID: <20150421213657.14147.60506.st...@host1.jankratochvil.net> Jan include/ChangeLog 2015-04-21 Jan Kratochvil * gcc-interface.h (enum gcc_base_api_version): Add comment to GCC_FE_VERSION_1. (struct gcc_base_vtable): Describe triplet_regexp parameter overload for set_arguments. libcc1/ChangeLog 2015-04-21 Jan Kratochvil * libcc1.cc (libcc1_set_arguments): Implement filenames for triplet_regexp. --- include/gcc-interface.h |7 - libcc1/libcc1.cc| 62 +++ 2 files changed, 41 insertions(+), 28 deletions(-) diff --git a/include/gcc-interface.h b/include/gcc-interface.h index dd9fd50..a15edf7 100644 --- a/include/gcc-interface.h +++ b/include/gcc-interface.h @@ -46,7 +46,9 @@ enum gcc_base_api_version { GCC_FE_VERSION_0 = 0, - /* Parameter verbose has been moved from compile to set_arguments. */ + /* Parameter verbose has been moved from compile to set_arguments. + Parameter triplet_regexp of set_arguments can be also gcc driver + executable. */ GCC_FE_VERSION_1 = 1, }; @@ -69,7 +71,8 @@ struct gcc_base_vtable /* Set the compiler's command-line options for the next compilation. TRIPLET_REGEXP is a regular expression that is used to match the - configury triplet prefix to the compiler. + configury triplet prefix to the compiler; TRIPLET_REGEXP can be + also absolute filename to the computer. The arguments are copied by GCC. ARGV need not be NULL-terminated. 
The arguments must be set separately for each compilation; that is, after a compile is requested, the diff --git a/libcc1/libcc1.cc b/libcc1/libcc1.cc index d36073d..e2718b0 100644 --- a/libcc1/libcc1.cc +++ b/libcc1/libcc1.cc @@ -322,38 +322,48 @@ libcc1_set_arguments (struct gcc_base_context *s, self->verbose = verbose != 0; - std::string rx = make_regexp (triplet_regexp, COMPILER_NAME); - // Simulate fnotice by fprintf. - if (self->verbose) -fprintf (stderr, _("searching for compiler matching regex %s\n"), -rx.c_str()); - code = regcomp (&triplet, rx.c_str (), REG_EXTENDED | REG_NOSUB); - if (code != 0) + std::string compiler; + if (access (triplet_regexp, X_OK) == 0) { - size_t len = regerror (code, &triplet, NULL, 0); - char err[len]; + compiler = triplet_regexp; + // Simulate fnotice by fprintf. + if (self->verbose) + fprintf (stderr, _("using explicit compiler filename %s\n"), +compiler.c_str()); +} + else +{ + std::string rx = make_regexp (triplet_regexp, COMPILER_NAME); + if (self->verbose) + fprintf (stderr, _("searching for compiler matching regex %s\n"), +rx.c_str()); + code = regcomp (&triplet, rx.c_str (), REG_EXTENDED | REG_NOSUB); + if (code != 0) + { + size_t len = regerror (code, &triplet, NULL, 0); + char err[len]; - regerror (code, &triplet, err, len); + regerror (code, &triplet, err, len); - return concat ("Could not compile regexp \"", -rx.c_str (), -"\": ", -err, -(char *) NULL); -} + return concat ("Could not compile regexp \"", +rx.c_str (), +"\": ", +err, +(char *) NULL); + } - std::string compiler; - if (!find_compiler (triplet, &compiler)) -{ + if (!find_compiler (triplet, &compiler)) + { + regfree (&triplet); + return concat ("Could not find a compiler matching \"", +rx.c_str (), +"\"", +(char *) NULL); + } regfree (&triplet); - return concat ("Could not find a compiler matching \"", -rx.c_str (), -"\"", -(char *) NULL); + if (self->verbose) + fprintf (stderr, _("found compiler %s\n"), compiler.c_str()); } - regfree (&triplet); - if 
(self->verbose) -fprintf (stderr, _("found compiler %s\n"), compiler.c_str()); self->args.push_back (compiler);
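The overload described in this message hinges on a single test: if the string names something executable, it is taken as the compiler, otherwise it is treated as a triplet regexp. A minimal sketch of that decision in C follows, using only the POSIX access() call that the patch itself uses. The names is_explicit_compiler and check_explicit_compiler are hypothetical, and the check assumes /bin/sh exists and is executable.

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if SPEC can be executed by the calling process, 0
   otherwise.  Hypothetical helper; in the patch the test is written
   inline in libcc1_set_arguments.  */
int
is_explicit_compiler (const char *spec)
{
  return access (spec, X_OK) == 0;
}

/* Assumption: /bin/sh is present, as on typical POSIX systems, and
   no file in the current directory is named like the regexp.  */
void
check_explicit_compiler (void)
{
  assert (is_explicit_compiler ("/bin/sh"));
  assert (!is_explicit_compiler ("^(x86_64|i.86)(-[^-]*)?-linux(-gnu)?$"));
}
```

One property worth keeping in mind with this approach: access() with X_OK also succeeds for a directory the caller may search, and a relative name is resolved against the current working directory.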
[PATCH 2/5] libcc1: Use libcc1.so.0->libcc1.so.1
Hi, see [patch 1/5], particularly: (3) Currently there is no backward or forward compatibility although there could be one implemented. Personally I think the 'compile' feature is still in experimental stage so that it is OK to require last releases. At least in Fedora we can keep GDB<->GCC in sync. GDB counterpart: [PATCH 2/4] compile: Use libcc1.so.0->libcc1.so.1 https://sourceware.org/ml/gdb-patches/2015-04/msg00806.html Message-ID: <20150421213642.14147.93210.st...@host1.jankratochvil.net> Jan include/ChangeLog 2015-04-21 Jan Kratochvil * gcc-c-interface.h (GCC_C_FE_LIBCC): Update it to GCC_FE_VERSION_1. * gcc-interface.h (enum gcc_base_api_version): Add GCC_FE_VERSION_1. libcc1/ChangeLog 2015-04-21 Jan Kratochvil * Makefile.am (libcc1_la_LDFLAGS): Add version-info 1. * Makefile.in: Regenerate. * libcc1.cc (vtable, gcc_c_fe_context): Update it to GCC_FE_VERSION_1. --- include/gcc-c-interface.h |2 +- include/gcc-interface.h |3 ++- libcc1/Makefile.am|3 ++- libcc1/Makefile.in|4 +++- libcc1/libcc1.cc |4 ++-- 5 files changed, 10 insertions(+), 6 deletions(-) diff --git a/include/gcc-c-interface.h b/include/gcc-c-interface.h index 1b73e32..285c9c7 100644 --- a/include/gcc-c-interface.h +++ b/include/gcc-c-interface.h @@ -197,7 +197,7 @@ struct gcc_c_context /* The name of the .so that the compiler builds. We dlopen this later. */ -#define GCC_C_FE_LIBCC "libcc1.so." STRINGIFY (GCC_FE_VERSION_0) +#define GCC_C_FE_LIBCC "libcc1.so." STRINGIFY (GCC_FE_VERSION_1) /* The compiler exports a single initialization function. This macro holds its name as a symbol. */ diff --git a/include/gcc-interface.h b/include/gcc-interface.h index 34010f2..dcfa6ce 100644 --- a/include/gcc-interface.h +++ b/include/gcc-interface.h @@ -44,7 +44,8 @@ struct gcc_base_context; enum gcc_base_api_version { - GCC_FE_VERSION_0 = 0 + GCC_FE_VERSION_0 = 0, + GCC_FE_VERSION_1 = 1, }; /* The operations defined by the GCC base API. 
This is the vtable for diff --git a/libcc1/Makefile.am b/libcc1/Makefile.am index 7a274b3..e6a94e2 100644 --- a/libcc1/Makefile.am +++ b/libcc1/Makefile.am @@ -63,7 +63,8 @@ libcc1plugin_la_LINK = $(LIBTOOL) --tag=CXX $(AM_LIBTOOLFLAGS) \ $(CXXFLAGS) $(libcc1plugin_la_LDFLAGS) $(LTLDFLAGS) -o $@ LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS)) -libcc1_la_LDFLAGS = -module -export-symbols $(srcdir)/libcc1.sym +libcc1_la_LDFLAGS = -module -export-symbols $(srcdir)/libcc1.sym \ + -version-info 1:0:0 libcc1_la_SOURCES = findcomp.cc libcc1.cc names.cc names.hh $(shared_source) libcc1_la_LIBADD = $(libiberty) libcc1_la_DEPENDENCIES = $(libiberty_dep) diff --git a/libcc1/Makefile.in b/libcc1/Makefile.in index 1916134..ebec54c 100644 --- a/libcc1/Makefile.in +++ b/libcc1/Makefile.in @@ -279,7 +279,9 @@ libcc1plugin_la_LINK = $(LIBTOOL) --tag=CXX $(AM_LIBTOOLFLAGS) \ $(CXXFLAGS) $(libcc1plugin_la_LDFLAGS) $(LTLDFLAGS) -o $@ LTLDFLAGS = $(shell $(SHELL) $(top_srcdir)/../libtool-ldflags $(LDFLAGS)) -libcc1_la_LDFLAGS = -module -export-symbols $(srcdir)/libcc1.sym +libcc1_la_LDFLAGS = -module -export-symbols $(srcdir)/libcc1.sym \ + -version-info 1:0:0 + libcc1_la_SOURCES = findcomp.cc libcc1.cc names.cc names.hh $(shared_source) libcc1_la_LIBADD = $(libiberty) libcc1_la_DEPENDENCIES = $(libiberty_dep) diff --git a/libcc1/libcc1.cc b/libcc1/libcc1.cc index 7d7d2c1..afda023 100644 --- a/libcc1/libcc1.cc +++ b/libcc1/libcc1.cc @@ -504,7 +504,7 @@ libcc1_destroy (struct gcc_base_context *s) static const struct gcc_base_vtable vtable = { - GCC_FE_VERSION_0, + GCC_FE_VERSION_1, libcc1_set_arguments, libcc1_set_source_file, libcc1_set_print_callback, @@ -523,7 +523,7 @@ struct gcc_c_context * gcc_c_fe_context (enum gcc_base_api_version base_version, enum gcc_c_api_version c_version) { - if (base_version != GCC_FE_VERSION_0 || c_version != GCC_C_FE_VERSION_0) + if (base_version != GCC_FE_VERSION_1 || c_version != GCC_C_FE_VERSION_0) return NULL; return new 
libcc1 (&vtable, &c_vtable);
[PATCH 3/5] libcc1: set debug compile: Display GCC driver filename
Hi, as discussed in How to use compile & execute function in GDB https://sourceware.org/ml/gdb/2015-04/msg00026.html GDB currently searches for /usr/bin/ARCH-OS-gcc and chooses one but it does not display which one. It cannot, GCC method set_arguments() does not yet know whether 'set debug compile' is enabled or not. Unfortunately this changes libcc1 API in an incompatible way. There is a possibility of a hack to keep the API the same - one could pass "-v" option explicitly to set_arguments(), set_arguments() could compare the "-v" string and print the GCC filename accordingly. Then the 'verbose' parameter of compile() would lose its meaning. What do you think? GDB counterpart: [PATCH 3/4] compile: set debug compile: Display GCC driver filename https://sourceware.org/ml/gdb-patches/2015-04/msg00807.html Message-ID: <20150421213649.14147.79719.st...@host1.jankratochvil.net> Jan include/ChangeLog 2015-04-21 Jan Kratochvil * gcc-interface.h (enum gcc_base_api_version): Add comment to GCC_FE_VERSION_1. (struct gcc_base_vtable): Move parameter verbose from compile to set_arguments. libcc1/ChangeLog 2015-04-21 Jan Kratochvil * libcc1.cc: Include intl.h. (struct libcc1): Add field verbose. (libcc1::libcc1): Initialize it. (libcc1_set_arguments): Add parameter verbose, implement it. (libcc1_compile): Remove parameter verbose, use self's field instead. --- include/gcc-interface.h | 14 +++--- libcc1/libcc1.cc| 22 +- 2 files changed, 24 insertions(+), 12 deletions(-) diff --git a/include/gcc-interface.h b/include/gcc-interface.h index dcfa6ce..dd9fd50 100644 --- a/include/gcc-interface.h +++ b/include/gcc-interface.h @@ -45,6 +45,8 @@ struct gcc_base_context; enum gcc_base_api_version { GCC_FE_VERSION_0 = 0, + + /* Parameter verbose has been moved from compile to set_arguments. */ GCC_FE_VERSION_1 = 1, }; @@ -71,14 +73,15 @@ struct gcc_base_vtable The arguments are copied by GCC. ARGV need not be NULL-terminated. 
The arguments must be set separately for each compilation; that is, after a compile is requested, the - previously-set arguments cannot be reused. + previously-set arguments cannot be reused. VERBOSE can be set + to cause GCC to print some information as it works. This returns NULL on success. On failure, returns a malloc()d error message. The caller is responsible for freeing it. */ char *(*set_arguments) (struct gcc_base_context *self, const char *triplet_regexp, - int argc, char **argv); + int argc, char **argv, int /* bool */ verbose); /* Set the file name of the program to compile. The string is copied by the method implementation, but the caller must @@ -95,13 +98,10 @@ struct gcc_base_vtable void *datum); /* Perform the compilation. FILENAME is the name of the resulting - object file. VERBOSE can be set to cause GCC to print some - information as it works. Returns true on success, false on - error. */ + object file. Returns true on success, false on error. */ int /* bool */ (*compile) (struct gcc_base_context *self, -const char *filename, -int /* bool */ verbose); +const char *filename); /* Destroy this object. */ diff --git a/libcc1/libcc1.cc b/libcc1/libcc1.cc index afda023..d36073d 100644 --- a/libcc1/libcc1.cc +++ b/libcc1/libcc1.cc @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see #include "xregex.h" #include "findcomp.hh" #include "compiler-name.h" +#include "intl.h" struct libcc1; @@ -66,6 +67,9 @@ struct libcc1 : public gcc_c_context std::vector args; std::string source_file; + + /* Non-zero as an equivalent to gcc driver option "-v". 
*/ + bool verbose; }; // A local subclass of connection that holds a back-pointer to the @@ -97,7 +101,8 @@ libcc1::libcc1 (const gcc_base_vtable *v, print_function (NULL), print_datum (NULL), args (), -source_file () +source_file (), +verbose (false) { base.ops = v; c_ops = cv; @@ -309,13 +314,19 @@ make_regexp (const char *triplet_regexp, const char *compiler) static char * libcc1_set_arguments (struct gcc_base_context *s, const char *triplet_regexp, - int argc, char **argv) + int argc, char **argv, int verbose) { libcc1 *self = (libcc1 *) s; regex_t triplet; int code; + self->verbose = verbose != 0; + std::string rx = make_regexp (triplet_regexp, COMPILER_NAME); + // Simulate fnotice by fprintf. + if (self->verbose) +fprintf (stderr, _("searching for compiler matching regex %s\n"), +rx.c_str()); code = regco
[PATCH 1/5] libcc1: Make libcc1.so->libcc1.so.0
Hi,

the next [patch 3/5] will change the libcc1.so API.  I am not sure if the API
change gets approved that way but for such case:

(1) We really need to change GCC_FE_VERSION_0 -> GCC_FE_VERSION_1, this feature
    is there for this purpose.  That is [patch 2/5].

(2) Currently GDB does only dlopen("libcc1.so") and then depending on which
    libcc1.so version it would find first it would succeed/fail.  I guess it is
    more convenient to do dlopen("libcc1.so.1") instead (where ".1"=".x"
    corresponds to GCC_FE_VERSION_x).  That is this patch (with x=0).
    GCC_C_FE_LIBCC is used only by GDB.

(3) Currently there is no backward or forward compatibility although there could
    be one implemented.  Personally I think the 'compile' feature is still in
    experimental stage so that it is OK to require last releases.
    At least in Fedora we can keep GDB<->GCC in sync.

GDB counterpart:
    [PATCH 1/4] compile: Use libcc1.so->libcc1.so.0
    https://sourceware.org/ml/gdb-patches/2015-04/msg00805.html
    Message-ID: <20150421213635.14147.15653.st...@host1.jankratochvil.net>


Jan


include/ChangeLog
2015-04-21  Jan Kratochvil

    * gcc-c-interface.h (GCC_C_FE_LIBCC): Quote it.  Append
    GCC_FE_VERSION_0.
---
 include/gcc-c-interface.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/gcc-c-interface.h b/include/gcc-c-interface.h
index 25ef62f..1b73e32 100644
--- a/include/gcc-c-interface.h
+++ b/include/gcc-c-interface.h
@@ -197,7 +197,7 @@ struct gcc_c_context
 /* The name of the .so that the compiler builds.  We dlopen this
    later.  */
 
-#define GCC_C_FE_LIBCC libcc1.so
+#define GCC_C_FE_LIBCC "libcc1.so." STRINGIFY (GCC_FE_VERSION_0)
 
 /* The compiler exports a single initialization function.  This macro
    holds its name as a symbol.  */
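The naming scheme in point (2) relies on turning a version number into a string suffix at preprocessing time. The sketch below shows the usual two-level stringification idiom in C. All DEMO_* names and check_demo_libcc are hypothetical, and it assumes the version is available as a preprocessor macro.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification: the outer macro expands its argument
   before the inner one applies the # operator.  */
#define DEMO_STRINGIFY_1(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_1 (x)

/* Assumption: the version is a macro, not an enum constant.  */
#define DEMO_FE_VERSION 0

/* Adjacent string literals are concatenated by the compiler.  */
#define DEMO_LIBCC "libcc1.so." DEMO_STRINGIFY (DEMO_FE_VERSION)

/* The resulting name is what a client would pass to dlopen.  */
void
check_demo_libcc (void)
{
  assert (strcmp (DEMO_LIBCC, "libcc1.so.0") == 0);
}
```

The preprocessor only expands macros, so the same construct applied to an enum constant would produce the constant's name rather than its numeric value; the sketch sidesteps this by defining the version as a macro.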
Re: [PATCH, rs6000, testsuite] Fix PR target/64579, __TM_end __builtin_tend failed to return transactional state
On Fri, 2015-03-20 at 17:41 -0500, Peter Bergner wrote: > On Fri, 2015-03-20 at 15:52 -0500, Segher Boessenkool wrote: > > Maybe it would be nicer if the builtin-expansion code handled the copy > > from cc, instead of stacking on RTL expanders. > > That would allow getting rid of the expanders completely, which > would be nice. I'd have to somehow add some type of RS6000_BTC_* > flag onto the builtin though, so I can tell during builtin expansion > whether I need to emit the cr copy code or not. Ok, the patch below implements your suggestion. > > Expanders have no constraints (you can leave out the field completely). > > Doesn't gen* warn on non-empty constraints? > > Correct, and David mentioned this when I first submitted the original > HTM patch, but I replied they were added to allow better error > messages when people used out of range integers for builtin args: This is a moot point now that the expanders are gone. > > > --- gcc/testsuite/gcc.target/powerpc/htm-1.c (revision 0) > > > +++ gcc/testsuite/gcc.target/powerpc/htm-1.c (working copy) > > > @@ -0,0 +1,53 @@ > > > +/* { dg-do run { target { powerpc*-*-* && htm_hw } } } */ > > > +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ > > > > htm_hw already disallows Darwin? [ And {"*"} {""} is default. ] Fixed. This patch also fixes some issues I hit with the tabortdc[i] and htm_m[ft]spr_ patterns when used with -m32 -mpowerpc64. This passed bootstrap and regtesting with no regressions, so how does this look for stage1? I'd also like to backport this to the open release branches so everything matches (obviously waiting until after 5.1). Is that ok once I've verified bootstrap and regtesting on each release branch? Peter gcc/ PR target/64579 * config/rs6000/htm.md: Remove all define_expands. 
(tabort_internal, tabortdc_internal, tabortdci_internal, tabortwc_internal, tabortwci_internal, tbegin_internal, tcheck_internal, tend_internal, trechkpt_internal, treclaim_internal, tsr_internal): Rename define_insns from this... (tabort, tabortdc, tabortdci, tabortwc, tabortwci, tbegin, tcheck, tend, trechkpt, treclaim, tsr): ...to this. (tabort): Use gpc_reg_operand. (tabortdc, tabortdci): Match DImode registers. Add TARGET_POWERPC64 constraint. (tcheck_internal): Remove operand. (htm_mfspr_, htm_mtspr_): Use GPR mode macro. * config/rs6000/htmxlintrin.h (__TM_end): Use _HTM_TRANSACTIONAL as expected value. * config/rs6000/rs6000-builtin.def (BU_HTM_SPR0): Remove. (BU_HTM_SPR1): Rename to BU_HTM_V1. Remove use of RS6000_BTC_SPR. (tabort, tabortdc, tabortdci, tabortwc, tabortwci, tbegin, tcheck, tend, tendall, trechkpt, treclaim, tresume, tsuspend, tsr, ttest): Pass in the RS6000_BTC_CR attribute. (get_tfhar, set_tfhar, get_tfiar, set_tfiar, get_texasr, set_texasr, get_texasru, set_texasru): Pass in the RS6000_BTC_SPR attribute. (tcheck): Remove builtin argument. (ttest): Update pattern name. * config/rs6000/rs6000.c (rs6000_htm_spr_icode): Use TARGET_POWERPC64 not TARGET_64BIT. (htm_expand_builtin): Fix usage of expandedp. Disallow usage of the tabortdc and tabortdci builtins when not in 64-bit mode. Modify code to handle the loss of the HTM define_expands. Emit code to copy the CR register to TARGET. (htm_init_builtins): Modify code to handle the loss of the HTM define_expands. * config/rs6000/rs6000.h (RS6000_BTC_32BIT): Delete. (RS6000_BTC_64BIT): Likewise. (RS6000_BTC_CR): New macro. * doc/extend.texi: Update documentation for htm builtins. gcc/testsuite/ PR target/64579 * gcc.target/powerpc/htm-1.c: New test. * gcc.target/powerpc/htm-builtin-1.c (__builtin_tabortdc): Only test on 64-bit compiles. (__builtin_tabortdci): Likewise. (__builtin_tcheck): Remove operand. * lib/target-supports.exp (check_htm_hw_available): New function. 
Index: gcc/config/rs6000/htm.md === --- gcc/config/rs6000/htm.md(revision 222127) +++ gcc/config/rs6000/htm.md(working copy) @@ -47,108 +47,38 @@ (define_c_enum "unspecv" ]) -(define_expand "tabort" - [(set (match_dup 2) - (unspec_volatile:CC [(match_operand:SI 1 "int_reg_operand" "")] - UNSPECV_HTM_TABORT)) - (set (match_dup 3) - (eq:SI (match_dup 2) - (const_int 0))) - (set (match_operand:SI 0 "int_reg_operand" "") - (xor:SI (match_dup 3) - (const_int 1)))] - "TARGET_HTM" -{ - operands[2] = gen_rtx_REG (CCmode, CR0_REGNO); - operands[3] = gen_reg_rtx (SImode); -}) - -(define_insn "*tabort_internal" +(define_insn "tabort" [(set (match_operand:CC 1 "cc_reg_operand" "=x") - (unspec_vola
Handle oacc kernels with other directives (was: openacc kernels directive -- initial support)
Hi! On Sat, 15 Nov 2014 13:14:52 +0100, Tom de Vries wrote: > I'm submitting a patch series with initial support for the oacc kernels > directive. Committed to gomp-4_0-branch in r88: commit 7109b39defb87bc839983339c9fb4cdcb3891238 Author: tschwinge Date: Tue Apr 21 20:32:01 2015 + Handle oacc kernels with other directives Mark directives with fn spec attributes to prevent them from acting as optimization barrier. gcc/ * builtin-attrs.def (DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING. (ATTR_FNSPEC_DOT_DOT_r_r_r_NOTHROW_LIST): Add DEF_ATTR_TREE_LIST. * omp-builtins.def (BUILT_IN_GOACC_DATA_START) (BUILT_IN_GOACC_ENTER_EXIT_DATA, BUILT_IN_GOACC_UPDATE): Use DEF_GOACC_BUILTIN_FNSPEC instead of DEF_GOACC_BUILTIN. gcc/testsuite/ * c-c++-common/goacc/kernels-loop-data-2.c: New test. * c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: New test. * c-c++-common/goacc/kernels-loop-data-enter-exit.c: New test. * c-c++-common/goacc/kernels-loop-data-update.c: New test. * c-c++-common/goacc/kernels-loop-data.c: New test. * c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: New test. * gfortran.dg/goacc/kernels-loop-data-2.f95: New test. * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: New test. * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: New test. * gfortran.dg/goacc/kernels-loop-data-update.f95: New test. * gfortran.dg/goacc/kernels-loop-data.f95: New test. * gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: New test. libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c: New test. 
* testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95: New test. * testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95: New test. * testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95: New test. * testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95: New test. * testsuite/libgomp.oacc-fortran/kernels-loop-data.f95: New test. * testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@88 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |6 ++ gcc/builtin-attrs.def |3 + gcc/omp-builtins.def | 21 +++--- gcc/testsuite/ChangeLog.gomp | 15 + .../c-c++-common/goacc/kernels-loop-data-2.c | 71 .../goacc/kernels-loop-data-enter-exit-2.c | 69 +++ .../goacc/kernels-loop-data-enter-exit.c | 66 ++ .../c-c++-common/goacc/kernels-loop-data-update.c | 66 ++ .../c-c++-common/goacc/kernels-loop-data.c | 65 ++ .../goacc/kernels-parallel-loop-data-enter-exit.c | 67 ++ .../gfortran.dg/goacc/kernels-loop-data-2.f95 | 52 ++ .../goacc/kernels-loop-data-enter-exit-2.f95 | 52 ++ .../goacc/kernels-loop-data-enter-exit.f95 | 50 ++ .../gfortran.dg/goacc/kernels-loop-data-update.f95 | 49 ++ .../gfortran.dg/goacc/kernels-loop-data.f95| 50 ++ .../kernels-parallel-loop-data-enter-exit.f95 | 51 ++ libgomp/ChangeLog.gomp | 24 +++ .../kernels-loop-data-2.c | 56 +++ .../kernels-loop-data-enter-exit-2.c | 54 +++ .../kernels-loop-data-enter-exit.c | 51 ++ .../kernels-loop-data-update.c | 53 +++ .../libgomp.oacc-c-c++-common/kernels-loop-data.c | 50 ++ .../kernels-parallel-loop-data-enter-exit.c| 52 ++ .../libgomp.oacc-fortran/kernels-loop-data-2.f95 | 38 +++ .../kernels-loop-data-enter-exit-2.f95 | 38 +++ .../kernels-loop-data-enter-exit.f95 | 36 ++ .../kernels-loop-data-update.f95 | 36 ++ .../libgomp.oacc-fortran/kernels-loop-data.f95 | 36 ++ .../kernels-parallel-loop-data-enter-
Handle global loop counters in c/c++ oacc kernels (was: openacc kernels directive -- initial support)
Hi! On Sat, 15 Nov 2014 13:14:52 +0100, Tom de Vries wrote: > I'm submitting a patch series with initial support for the oacc kernels > directive. Committed to gomp-4_0-branch in r87: commit abaf92b2db3c0799edac63cfb846af2dbde47423 Author: tschwinge Date: Tue Apr 21 20:27:40 2015 + Handle global loop counters in c/c++ oacc kernels gcc/ * passes.def: Add pass_fre after pass_ch_oacc_kernels. gcc/testsuite/ * c-c++-common/goacc/kernels-counter-vars-function-scope.c: New test. * c-c++-common/goacc/kernels-one-counter-var.c: New test. * g++.dg/ipa/devirt-37.C: Update for new pass_fre. * g++.dg/ipa/devirt-40.C: Likewise. * g++.dg/tree-ssa/pr61034.C: Likewise. * gcc.dg/ipa/ipa-pta-13.c: Likewise. * gcc.dg/ipa/ipa-pta-3.c: Likewise. * gcc.dg/ipa/ipa-pta-4.c: Likewise. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@87 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |2 + gcc/passes.def |1 + gcc/testsuite/ChangeLog.gomp |9 .../goacc/kernels-counter-vars-function-scope.c| 55 .../c-c++-common/goacc/kernels-one-counter-var.c | 54 +++ gcc/testsuite/g++.dg/ipa/devirt-37.C | 12 ++--- gcc/testsuite/g++.dg/ipa/devirt-40.C |6 +-- gcc/testsuite/g++.dg/tree-ssa/pr61034.C| 10 ++-- gcc/testsuite/gcc.dg/ipa/ipa-pta-13.c |6 +-- gcc/testsuite/gcc.dg/ipa/ipa-pta-3.c |6 +-- gcc/testsuite/gcc.dg/ipa/ipa-pta-4.c |6 +-- 11 files changed, 144 insertions(+), 23 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index f14c3718..b1933ba 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,7 @@ 2015-04-21 Tom de Vries + * passes.def: Add pass_fre after pass_ch_oacc_kernels. + * passes.def: Add pass_scev_cprop to pass_oacc_kernels. * tree-ssa-loop.c (pass_scev_cprop::clone): New function. diff --git gcc/passes.def gcc/passes.def index 3e85808..04cbba0 100644 --- gcc/passes.def +++ gcc/passes.def @@ -91,6 +91,7 @@ along with GCC; see the file COPYING3. 
If not see NEXT_PASS (pass_oacc_kernels); PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) NEXT_PASS (pass_ch_oacc_kernels); + NEXT_PASS (pass_fre); NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp index eed22e2..ed80f5b 100644 --- gcc/testsuite/ChangeLog.gomp +++ gcc/testsuite/ChangeLog.gomp @@ -1,6 +1,15 @@ 2015-04-21 Tom de Vries Thomas Schwinge + * c-c++-common/goacc/kernels-counter-vars-function-scope.c: New test. + * c-c++-common/goacc/kernels-one-counter-var.c: New test. + * g++.dg/ipa/devirt-37.C: Update for new pass_fre. + * g++.dg/ipa/devirt-40.C: Likewise. + * g++.dg/tree-ssa/pr61034.C: Likewise. + * gcc.dg/ipa/ipa-pta-13.c: Likewise. + * gcc.dg/ipa/ipa-pta-3.c: Likewise. + * gcc.dg/ipa/ipa-pta-4.c: Likewise. + * gcc.dg/pr41488.c: Update for new pass_scev_cprop. * gcc.dg/tree-ssa/loop-17.c: Likewise. * gcc.dg/tree-ssa/loop-39.c: Likewise. diff --git gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c new file mode 100644 index 000..06cdb29 --- /dev/null +++ gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c @@ -0,0 +1,55 @@ +/* { dg-additional-options "-O2" } */ +/* { dg-additional-options "-ftree-parallelize-loops=32" } */ +/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */ +/* { dg-additional-options "-fdump-tree-optimized" } */ + +#include + +#define N (1024 * 512) +#define COUNTERTYPE unsigned int + +int +main (void) +{ + unsigned int *__restrict a; + unsigned int *__restrict b; + unsigned int *__restrict c; + COUNTERTYPE i; + COUNTERTYPE ii; + + a = (unsigned int *)malloc (N * sizeof (unsigned int)); + b = (unsigned int *)malloc (N * sizeof (unsigned int)); + c = (unsigned int *)malloc (N * sizeof (unsigned int)); + + for (i = 0; i < N; i++) +a[i] = i * 2; + + for (i = 0; i < N; i++) +b[i] = i * 4; + +#pragma acc kernels 
copyin (a[0:N], b[0:N]) copyout (c[0:N]) + { +for (ii = 0; ii < N; ii++) + c[ii] = a[ii] + b[ii]; + } + + for (i = 0; i < N; i++) +if (c[i] != a[i] + b[i]) + abort (); + + free (a); + free (b); + free (c); + + return 0; +} + +/* Check that only one loop is analyzed, and that it can be parallelized. */ +/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */ +/* { dg-final { scan-tree-dump-
Handle global loop counters in fortran oacc kernels (was: openacc kernels directive -- initial support)
Hi! On Sat, 15 Nov 2014 13:14:52 +0100, Tom de Vries wrote: > I'm submitting a patch series with initial support for the oacc kernels > directive. Committed to gomp-4_0-branch in r86: commit 0c33234340aa17536c2c86e0982c42070c89226b Author: tschwinge Date: Tue Apr 21 20:22:54 2015 + Handle global loop counters in fortran oacc kernels We are unable to have loop counters with a scope limited to the kernels region, and function scope inhibits parallelization. At the technical level, it seems possible to do DCE and get rid of the dead code that is inhibiting parallelization (in other words, the code copying the loop iterator value out of the region), but some effort would probably be involved. Another possibility is to add an assignment of the final value of the loop iteration variable after the loop to cut the dependency, though this will only work for loops where that value is known at compile time, which is exactly what pass_scev_cprop does. gcc/ * passes.def: Add pass_scev_cprop to pass_oacc_kernels. * tree-ssa-loop.c (pass_scev_cprop::clone): New function. gcc/testsuite/ * gcc.dg/pr41488.c: Update for new pass_scev_cprop. * gcc.dg/tree-ssa/loop-17.c: Likewise. * gcc.dg/tree-ssa/loop-39.c: Likewise. * gcc.dg/tree-ssa/scev-7.c: Likewise. * gfortran.dg/goacc/kernels-loop-2.f95: New test. * gfortran.dg/goacc/kernels-loop.f95: New test. libgomp/ * testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: New test. * testsuite/libgomp.oacc-fortran/kernels-loop.f95: New test. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@86 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |3 ++ gcc/passes.def |1 + gcc/testsuite/ChangeLog.gomp |7 +++ gcc/testsuite/gcc.dg/pr41488.c |6 +-- gcc/testsuite/gcc.dg/tree-ssa/loop-17.c|6 +-- gcc/testsuite/gcc.dg/tree-ssa/loop-39.c|6 +-- gcc/testsuite/gcc.dg/tree-ssa/scev-7.c |6 +-- gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 | 46 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 | 40 + gcc/tree-ssa-loop.c|1 + libgomp/ChangeLog.gomp |3 ++ .../libgomp.oacc-fortran/kernels-loop-2.f95| 32 ++ .../libgomp.oacc-fortran/kernels-loop.f95 | 28 13 files changed, 173 insertions(+), 12 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index bf0ee52..f14c3718 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,8 @@ 2015-04-21 Tom de Vries + * passes.def: Add pass_scev_cprop to pass_oacc_kernels. + * tree-ssa-loop.c (pass_scev_cprop::clone): New function. + * passes.def: Add pass_parallelize_loops_oacc_kernels in pass group pass_oacc_kernels. * tree-parloops.c (create_parallel_loop, gen_parallel_loop): Add diff --git gcc/passes.def gcc/passes.def index 2d2e286..3e85808 100644 --- gcc/passes.def +++ gcc/passes.def @@ -94,6 +94,7 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); + NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_parallelize_loops_oacc_kernels); NEXT_PASS (pass_expand_omp_ssa); NEXT_PASS (pass_tree_loop_done); diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp index 2c6abff..eed22e2 100644 --- gcc/testsuite/ChangeLog.gomp +++ gcc/testsuite/ChangeLog.gomp @@ -1,6 +1,13 @@ 2015-04-21 Tom de Vries Thomas Schwinge + * gcc.dg/pr41488.c: Update for new pass_scev_cprop. + * gcc.dg/tree-ssa/loop-17.c: Likewise. + * gcc.dg/tree-ssa/loop-39.c: Likewise. + * gcc.dg/tree-ssa/scev-7.c: Likewise. + * gfortran.dg/goacc/kernels-loop-2.f95: New test. 
+ * gfortran.dg/goacc/kernels-loop.f95: New test. + * c-c++-common/goacc/kernels-loop-2.c: New test. * c-c++-common/goacc/kernels-loop.c: New test. * c-c++-common/goacc/kernels-loop-n.c: New test. diff --git gcc/testsuite/gcc.dg/pr41488.c gcc/testsuite/gcc.dg/pr41488.c index c4bc428..1f306b4 100644 --- gcc/testsuite/gcc.dg/pr41488.c +++ gcc/testsuite/gcc.dg/pr41488.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -fdump-tree-sccp-scev" } */ +/* { dg-options "-O2 -fdump-tree-sccp2-scev" } */ struct struct_t { @@ -14,5 +14,5 @@ void foo (struct struct_t* sp, int start, int end) sp->data[i+start] = 0; } -/* { dg-final { scan-tree-dump-times "Simplify PEELED_CHREC into POLYNOMIAL_CHREC" 1 "sccp" } } */ -/* { dg-final { cleanup-
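The commit message for this change describes cutting the dependency on a function-scope loop counter by assigning its final value after the loop. A hand-written sketch of that idea in plain C might look like this. It is not GCC code and not one of the test cases; both function names are hypothetical, and the second function only shows by hand the kind of rewrite the message attributes to pass_scev_cprop.

```c
/* The counter II lives at function scope and is read after the loop,
   so its value has to be carried out of the loop.  */
unsigned int
add_counter_live_out (unsigned int *c, const unsigned int *a,
                      const unsigned int *b, unsigned int n)
{
  unsigned int ii;

  for (ii = 0; ii < n; ii++)
    c[ii] = a[ii] + b[ii];

  return ii;    /* use of the loop counter after the loop */
}

/* The same computation with the use after the loop replaced by the
   counter's final value, which follows from the trip count.  The loop
   no longer has to hand the counter to the code that follows it.  */
unsigned int
add_counter_replaced (unsigned int *c, const unsigned int *a,
                      const unsigned int *b, unsigned int n)
{
  unsigned int ii;

  for (ii = 0; ii < n; ii++)
    c[ii] = a[ii] + b[ii];

  return n;     /* final value of II, computed without the loop */
}
```

Both functions return the same value and fill c identically; the difference is only whether code after the loop depends on the counter.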
RE: [PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only /proc/cpuinfo
Kyrill, Here's what I get on an Exynos M1: $ cat /proc/cpuinfo < Processor : AArch64 Processor rev 0 (aarch64) ... Features: fp asimd aes pmull sha1 sha2 crc32 CPU implementer : 0x53 CPU architecture: AArch64 CPU variant : 0x0 CPU part: 0x001 CPU revision: 0 Please, let me know if you need any help. Thank you, -- Evandro Menezes Austin, TX > -Original Message- > From: Kyrill Tkachov [mailto:kyrylo.tkac...@arm.com] > Sent: Monday, April 20, 2015 10:48 > To: GCC Patches > Cc: Marcus Shawcroft; Richard Earnshaw; James Greenhalgh; Evandro Menezes; > Andrew Pinski; James Greenhalgh > Subject: [PATCH][AArch64] Implement -m{cpu,tune,arch}=native using only > /proc/cpuinfo > > Hi all, > > This is an attempt to add native CPU detection to AArch64 GNU/Linux targets. > Similar to other ports we use SPEC rewriting to rewrite - > m{cpu,tune,arch}=native options into the appropriate CPU/architecture and the > architecture extension options when appropriate (i.e. +crypto/+crc etc). > > For CPU/architecture detection it gets a bit involved, especially when > running on a big.LITTLE system. My proposed approach is to look at > /proc/cpuinfo/ and search for the implementer id and part number fields that > uniquely identify each core (appropriate identifying information is added to > aarch64-cores.def). If we find two types of core we have a big.LITTLE system, > so search through the core definitions extracted from aarch64-cores.def to > find if we support such a combination (currently only cortex-a57.cortex-a53 > and cortex-a72.cortex-a53) and make sure that the implementer id field > matches up. > > I tested this on a 4xCortex-A53 + 2xCortex-A57 big.LITTLE Ubuntu GNU/Linux > system. > There are two formats for /proc/cpuinfo/ that I'm aware of. 
The first (old) > one has the format: > -- > processor: 0 > processor: 1 > processor: 2 > processor: 3 > processor: 4 > processor: 5 > Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 > CPU implementer: 0x41 > CPU architecture: AArch64 > CPU variant: 0x0 > CPU part: 0xd03 > -- > > In this format it lists the 6 cores but the CPU part it reports is only the > one for the core from which /proc/cpuinfo was read from (!), in this case one > of the Cortex-A53 cores. > This means we detect a different CPU depending on which core GCC was invoked > on. Not ideal really, but there's no more information that we can extract. > Given the /proc/cpuinfo above, this patch will rewrite -mcpu=native into - > mcpu=cortex-a53+fp+simd+crypto+crc > > The newer /proc/cpuinfo format proposed at > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=44 > b82b7700d05a52cd983799d3ecde1a976b3bed > looks like this: > > -- > processor : 0 > Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 > CPU implementer : 0x41 > CPU architecture: 8 > CPU variant : 0x0 > CPU part: 0xd03 > CPU revision: 0 > > processor : 1 > Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 > CPU implementer : 0x41 > CPU architecture: 8 > CPU variant : 0x0 > CPU part: 0xd03 > CPU revision: 0 > > processor : 2 > Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 > CPU implementer : 0x41 > CPU architecture: 8 > CPU variant : 0x0 > CPU part: 0xd03 > CPU revision: 0 > > processor : 3 > Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 > CPU implementer : 0x41 > CPU architecture: 8 > CPU variant : 0x0 > CPU part: 0xd03 > CPU revision: 0 > > processor : 4 > Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 > CPU implementer : 0x41 > CPU architecture: 8 > CPU variant : 0x0 > CPU part: 0xd07 > CPU revision: 0 > > processor : 5 > Features: fp asimd evtstrm aes pmull sha1 sha2 crc32 > CPU implementer : 0x41 > CPU architecture: 8 > CPU variant : 0x0 > CPU part: 0xd07 > CPU revision: 0 > -- > > The 
Features field is used to detect the architectural features that we map > to GCC option extensions i.e. +fp,+crypto,+simd,+crc etc. > > Similarly, -march=native would be rewritten into -march=armv8- > a+fp+simd+crypto+crc while -mtune=native into -march=cortex-a57.cortex-a53 > (the arch extension options are not valid for -mtune). > > If it detects more than one implementer ID or the implementer IDs not > matching up somewhere or some other weirdness /proc/cpuinfo or fails to > recognise the CPU it will bail out and ignore the option entirely (similarly > to other ports). > > The patch works fine with both /proc/cpuinfo formats although, as mentioned > above, it will not be able to detect the
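The detection step described above (collect the implementer ID and the distinct part numbers, and bail out on anything unexpected) can be sketched as a small standalone C routine. This is a minimal illustration under stated assumptions, not the code from the patch: struct cpu_ids, field_value, scan_cpuinfo and MAX_PARTS are hypothetical names, and the routine works on an in-memory copy of the /proc/cpuinfo text rather than reading the file.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PARTS 4   /* hypothetical limit on distinct core types */

struct cpu_ids
{
  long implementer;        /* common "CPU implementer" value, or -1 */
  long parts[MAX_PARTS];   /* distinct "CPU part" values, in order seen */
  int n_parts;             /* number of entries used in PARTS */
};

/* Return the number after the ':' on LINE, or -1 if there is no ':'.
   Base 0 lets strtol accept the "0x41" style used by the kernel.  */
static long
field_value (const char *line)
{
  const char *colon = strchr (line, ':');
  if (colon == NULL)
    return -1;
  return strtol (colon + 1, NULL, 0);
}

/* Scan cpuinfo-style TEXT.  Return 0 and fill IDS on success, or -1
   if the implementer IDs disagree, too many core types are seen, or
   no implementer/part pair is found.  */
int
scan_cpuinfo (const char *text, struct cpu_ids *ids)
{
  char line[128];

  ids->implementer = -1;
  ids->n_parts = 0;

  while (*text != '\0')
    {
      /* Copy one line (truncated if overlong) into LINE.  */
      size_t len = strcspn (text, "\n");
      size_t n = len < sizeof line - 1 ? len : sizeof line - 1;
      memcpy (line, text, n);
      line[n] = '\0';
      text += len;
      if (*text == '\n')
        text++;

      if (strncmp (line, "CPU implementer", 15) == 0)
        {
          long v = field_value (line);
          if (v < 0 || (ids->implementer >= 0 && ids->implementer != v))
            return -1;
          ids->implementer = v;
        }
      else if (strncmp (line, "CPU part", 8) == 0)
        {
          long v = field_value (line);
          int i;
          if (v < 0)
            return -1;
          for (i = 0; i < ids->n_parts; i++)
            if (ids->parts[i] == v)
              break;
          if (i == ids->n_parts)
            {
              if (ids->n_parts == MAX_PARTS)
                return -1;
              ids->parts[ids->n_parts++] = v;
            }
        }
    }

  return ids->implementer >= 0 && ids->n_parts > 0 ? 0 : -1;
}
```

With the newer per-core format, a 4xCortex-A53 + 2xCortex-A57 listing would yield two distinct parts (0xd03 and 0xd07), which a caller could then look up as a big.LITTLE pair. With the old format only one part is reported, so the same routine would see a single core type.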
Re: [PATCH, 7/8] Add pass_parallelize_loops_oacc_kernels to pass_oacc_kernels
Hi! On Tue, 25 Nov 2014 12:42:28 +0100, Tom de Vries wrote: > On 15-11-14 18:23, Tom de Vries wrote: > > On 15-11-14 13:14, Tom de Vries wrote: > >> I'm submitting a patch series with initial support for the oacc kernels > >> directive. > >> > >> The patch series uses pass_parallelize_loops to implement parallelization > >> of > >> loops in the oacc kernels region. > >> > >> The patch series consists of these 8 patches: > >> ... > >> 1 Expand oacc kernels after pass_build_ealias > >> 2 Add pass_oacc_kernels > >> 3 Add pass_ch_oacc_kernels to pass_oacc_kernels > >> 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels > >> 5 Add pass_loop_im to pass_oacc_kernels > >> 6 Add pass_ccp to pass_oacc_kernels > >> 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels > >> 8 Do simple omp lowering for no address taken var > >> ... > > > > This patch adds: > > - a specialized version of pass_parallelize_loops called > > pass_parloops_oacc_kernels to pass group pass_oacc_kernels, and > > - relevant test-cases. > > > > The pass only handles loops that are in a kernels region, and skips over > > bits of > > pass_parallelize_loops that are already done for oacc kernels. > > > > The pass reintroduces the use of omp_expand_local, I haven't managed to > > make it > > work yet using the external pass pass_expand_omp_ssa. > > > > An obvious limitation of the patch is the fact that we copy over the clauses > > from the kernels directive to the generated parallel directive. We'll need > > to do > > something more intelligent here, f.i. setting vector_length based on the > > parallelization factor. > > > > Another limitation is that the pass still needs -ftree-parallelize-loops to > > trigger. > > > > Updated for using pass_copyprop instead of pass_ccp in pass_oacc_kernels. > > Bootstrapped and reg-tested as before. > > OK for trunk? 
Committed to gomp-4_0-branch in r85: commit 74e09b9dbbe43321fb20b0174f926893bf2111bc Author: tschwinge Date: Tue Apr 21 20:06:16 2015 + Add pass_parallelize_loops_oacc_kernels to pass_oacc_kernels gcc/ * passes.def: Add pass_parallelize_loops_oacc_kernels in pass group pass_oacc_kernels. * tree-parloops.c (create_parallel_loop, gen_parallel_loop): Add function parameters region_entry and bool oacc_kernels_p. Handle oacc_kernels_p. Call create_parallel_loop with additional args. (parallelize_loops): Add function parameter oacc_kernels_p. Calculate dominance info. Skip loops that are not in a kernels region. Call gen_parallel_loop with additional args. (pass_parallelize_loops::execute): Call parallelize_loops with false argument. (pass_data_parallelize_loops_oacc_kernels): New pass_data. (class pass_parallelize_loops_oacc_kernels): New pass. (pass_parallelize_loops_oacc_kernels::execute) (make_pass_parallelize_loops_oacc_kernels): New function. * tree-pass.h (make_pass_parallelize_loops_oacc_kernels): Declare. gcc/testsuite/ * c-c++-common/goacc/kernels-loop-2.c: New test. * c-c++-common/goacc/kernels-loop.c: New test. * c-c++-common/goacc/kernels-loop-n.c: New test. * c-c++-common/goacc/kernels-loop-mod-not-zero.c: New test. libgomp/ * testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: New test. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c: New test. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@85 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 17 ++ gcc/passes.def |1 + gcc/testsuite/ChangeLog.gomp |5 + gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c | 62 + .../c-c++-common/goacc/kernels-loop-mod-not-zero.c | 53 gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c | 48 gcc/testsuite/c-c++-common/goacc/kernels-loop.c| 53 gcc/tree-parloops.c| 282 gcc/tree-pass.h|2 + libgomp/ChangeLog.gomp |9 + .../libgomp.oacc-c-c++-common/kernels-loop-2.c | 47 .../kernels-loop-mod-not-zero.c| 41 +++ .../libgomp.oacc-c-c++-common/kernels-loop-n.c | 47 .../libgomp.oacc-c-c++-common/kernels-loop.c | 41 +++ 14 files changed, 650 insertions(+), 58 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 0be9191..bf0ee52 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,22 @@ 2015-04-21 Tom de Vries + * passes.def: Add pass_parallelize_loops_oacc_kern
Re: [PATCH, 6/8] Add pass_copy_prop in pass_oacc_kernels
Hi! On Tue, 25 Nov 2014 12:38:55 +0100, Tom de Vries wrote: > On 15-11-14 18:22, Tom de Vries wrote: > > On 15-11-14 13:14, Tom de Vries wrote: > >> I'm submitting a patch series with initial support for the oacc kernels > >> directive. > >> > >> The patch series uses pass_parallelize_loops to implement parallelization > >> of > >> loops in the oacc kernels region. > >> > >> The patch series consists of these 8 patches: > >> ... > >> 1 Expand oacc kernels after pass_build_ealias > >> 2 Add pass_oacc_kernels > >> 3 Add pass_ch_oacc_kernels to pass_oacc_kernels > >> 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels > >> 5 Add pass_loop_im to pass_oacc_kernels > >> 6 Add pass_ccp to pass_oacc_kernels > >> 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels > >> 8 Do simple omp lowering for no address taken var > >> ... > > > > This patch adds pass_loop_ccp to pass group pass_oacc_kernels. > > > > We need this pass to simplify the loop body, and allow pass_parloops to > > detect > > that loop iterations are independent. > > > > As suggested here ( https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02993.html > ) > I've replaced the pass_ccp with pass_copyprop, which performs trivial > constant > propagation in addition to copy propagation. > > Bootstrapped and reg-tested as before. > > OK for trunk? Committed to gomp-4_0-branch in r84: commit 1c2529b64620811cbff4a50374af797ee52ef5f8 Author: tschwinge Date: Tue Apr 21 19:58:54 2015 + Add pass_copy_prop in pass_oacc_kernels gcc/ * passes.def: Add pass_copy_prop to pass group pass_oacc_kernels. * tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init conservatively. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@84 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp  |    4 ++++
 gcc/passes.def      |    1 +
 gcc/tree-ssa-copy.c |    4 ++++
 3 files changed, 9 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 98e33ad..0be9191 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,9 @@
 2015-04-21 Tom de Vries
 
+	* passes.def: Add pass_copy_prop to pass group pass_oacc_kernels.
+	* tree-ssa-copy.c (stmt_may_generate_copy): Handle .omp_data_i init
+	conservatively.
+
 	* passes.def: Add pass_lim in pass group pass_ch_oacc_kernels.
 
 	* passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass
diff --git gcc/passes.def gcc/passes.def
index e6c9287..e6f1c33 100644
--- gcc/passes.def
+++ gcc/passes.def
@@ -93,6 +93,7 @@ along with GCC; see the file COPYING3. If not see
 	  NEXT_PASS (pass_ch_oacc_kernels);
 	  NEXT_PASS (pass_tree_loop_init);
 	  NEXT_PASS (pass_lim);
+	  NEXT_PASS (pass_copy_prop);
 	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_tree_loop_done);
       POP_INSERT_PASSES ()
diff --git gcc/tree-ssa-copy.c gcc/tree-ssa-copy.c
index 5ae8e6c..6f35f99 100644
--- gcc/tree-ssa-copy.c
+++ gcc/tree-ssa-copy.c
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3. If not see
 #include "tree-scalar-evolution.h"
 #include "tree-ssa-dom.h"
 #include "tree-ssa-loop-niter.h"
+#include "omp-low.h"
 
 
 /* This file implements the copy propagation pass and provides a
@@ -116,6 +117,9 @@ stmt_may_generate_copy (gimple stmt)
   if (gimple_has_volatile_ops (stmt))
     return false;
 
+  if (gimple_stmt_omp_data_i_init_p (stmt))
+    return false;
+
   /* Statements with loads and/or stores will never generate a useful copy. */
   if (gimple_vuse (stmt))
     return false;

Grüße,
 Thomas
Re: [PATCH, 5/8] Add pass_lim to pass_oacc_kernels
Hi! On Tue, 25 Nov 2014 12:30:52 +0100, Tom de Vries wrote: > On 15-11-14 18:22, Tom de Vries wrote: > > On 15-11-14 13:14, Tom de Vries wrote: > >> I'm submitting a patch series with initial support for the oacc kernels > >> directive. > >> > >> The patch series uses pass_parallelize_loops to implement parallelization > >> of > >> loops in the oacc kernels region. > >> > >> The patch series consists of these 8 patches: > >> ... > >> 1 Expand oacc kernels after pass_build_ealias > >> 2 Add pass_oacc_kernels > >> 3 Add pass_ch_oacc_kernels to pass_oacc_kernels > >> 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels > >> 5 Add pass_loop_im to pass_oacc_kernels > >> 6 Add pass_ccp to pass_oacc_kernels > >> 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels > >> 8 Do simple omp lowering for no address taken var > >> ... > > > > This patch adds pass_loop_im to pass group pass_oacc_kernels. > > > > We need this pass to simplify the loop body, and allow pass_parloops to > > detect > > that loop iterations are independent. > > > > Updated for moving pass_oacc_kernels down past pass_fre in the pass list. > > Bootstrapped and reg-tested as before. > > OK for trunk? Committed to gomp-4_0-branch in r83: commit 79112043cabc81c3a283585c9a28b6a1ab3826df Author: tschwinge Date: Tue Apr 21 19:55:42 2015 + Add pass_lim to pass_oacc_kernels gcc/ * passes.def: Add pass_lim in pass group pass_ch_oacc_kernels. gcc/testsuite/ * c-c++-common/restrict-2.c: Update for new pass_lim. * c-c++-common/restrict-4.c: Same. * g++.dg/tree-ssa/pr33615.C: Same. * g++.dg/tree-ssa/restrict1.C: Same. * gcc.dg/tm/pub-safety-1.c: Same. * gcc.dg/tm/reg-promotion.c: Same. * gcc.dg/tree-ssa/20050314-1.c: Same. * gcc.dg/tree-ssa/loop-32.c: Same. * gcc.dg/tree-ssa/loop-33.c: Same. * gcc.dg/tree-ssa/loop-34.c: Same. * gcc.dg/tree-ssa/loop-35.c: Same. * gcc.dg/tree-ssa/loop-7.c: Same. * gcc.dg/tree-ssa/pr23109.c: Same. * gcc.dg/tree-ssa/restrict-3.c: Same. * gcc.dg/tree-ssa/restrict-5.c: Same. 
* gcc.dg/tree-ssa/ssa-lim-1.c: Same. * gcc.dg/tree-ssa/ssa-lim-10.c: Same. * gcc.dg/tree-ssa/ssa-lim-11.c: Same. * gcc.dg/tree-ssa/ssa-lim-12.c: Same. * gcc.dg/tree-ssa/ssa-lim-2.c: Same. * gcc.dg/tree-ssa/ssa-lim-3.c: Same. * gcc.dg/tree-ssa/ssa-lim-6.c: Same. * gcc.dg/tree-ssa/ssa-lim-7.c: Same. * gcc.dg/tree-ssa/ssa-lim-8.c: Same. * gcc.dg/tree-ssa/ssa-lim-9.c: Same. * gcc.dg/tree-ssa/structopt-1.c: Same. * gfortran.dg/pr32921.f: Same. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@83 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |2 ++ gcc/passes.def |1 + gcc/testsuite/ChangeLog.gomp| 31 +++ gcc/testsuite/c-c++-common/restrict-2.c |6 +++--- gcc/testsuite/c-c++-common/restrict-4.c |6 +++--- gcc/testsuite/g++.dg/tree-ssa/pr33615.C |6 +++--- gcc/testsuite/g++.dg/tree-ssa/restrict1.C |6 +++--- gcc/testsuite/gcc.dg/tm/pub-safety-1.c |6 +++--- gcc/testsuite/gcc.dg/tm/reg-promotion.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-32.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-33.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-34.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-35.c |8 +++ gcc/testsuite/gcc.dg/tree-ssa/loop-7.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/pr23109.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/restrict-5.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c |8 +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c |6 +++--- 
gcc/testsuite/gfortran.dg/pr32921.f |6 +++--- 30 files changed, 117 insertions(+), 83 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 1fb060f..98e33ad 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,7 @@ 2015-04-21 Tom de Vries + * passes.def: Add pass_lim in pass group pass_ch_oacc_kernels. + * passes.def: Run pass_tree_loo
Re: [PATCH, 4/8] Add pass_tree_loop_{init,done} to pass_oacc_kernels
Hi! On Tue, 25 Nov 2014 12:29:28 +0100, Tom de Vries wrote: > On 15-11-14 18:21, Tom de Vries wrote: > > On 15-11-14 13:14, Tom de Vries wrote: > >> I'm submitting a patch series with initial support for the oacc kernels > >> directive. > >> > >> The patch series uses pass_parallelize_loops to implement parallelization > >> of > >> loops in the oacc kernels region. > >> > >> The patch series consists of these 8 patches: > >> ... > >> 1 Expand oacc kernels after pass_build_ealias > >> 2 Add pass_oacc_kernels > >> 3 Add pass_ch_oacc_kernels to pass_oacc_kernels > >> 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels > >> 5 Add pass_loop_im to pass_oacc_kernels > >> 6 Add pass_ccp to pass_oacc_kernels > >> 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels > >> 8 Do simple omp lowering for no address taken var > >> ... > > > > This patch adds pass_tree_loop_init and pass_tree_loop_init_done to > > pass_oacc_kernels. > > > > Pass_parallelize_loops is run between these passes in the pass group > > pass_tree_loop, since it requires loop information. We do the same for > > pass_oacc_kernels. > > > > Updated for moving pass_oacc_kernels down past pass_fre in the pass list. > > Bootstrapped and reg-tested as before. > > OK for trunk? Committed to gomp-4_0-branch in r82: commit cb95b4a1efcdb96c58cda986d53b20c3537c1ab7 Author: tschwinge Date: Tue Apr 21 19:51:33 2015 + Add pass_tree_loop_{init,done} to pass_oacc_kernels gcc/ * passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass group pass_oacc_kernels. * tree-ssa-loop.c (pass_tree_loop_init::clone) (pass_tree_loop_done::clone): New function. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@82 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |5 + gcc/passes.def |2 ++ gcc/tree-ssa-loop.c |2 ++ 3 files changed, 9 insertions(+) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index d00c5e0..1fb060f 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,10 @@ 2015-04-21 Tom de Vries + * passes.def: Run pass_tree_loop_init and pass_tree_loop_done in pass + group pass_oacc_kernels. + * tree-ssa-loop.c (pass_tree_loop_init::clone) + (pass_tree_loop_done::clone): New function. + * omp-low.c (loop_in_oacc_kernels_region_p): New function. * omp-low.h (loop_in_oacc_kernels_region_p): Declare. * passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels. diff --git gcc/passes.def gcc/passes.def index 5cdbc87..83ae04e 100644 --- gcc/passes.def +++ gcc/passes.def @@ -91,7 +91,9 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_oacc_kernels); PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) NEXT_PASS (pass_ch_oacc_kernels); + NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_expand_omp_ssa); + NEXT_PASS (pass_tree_loop_done); POP_INSERT_PASSES () NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_cd_dce); diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c index a041858..2a96a39 100644 --- gcc/tree-ssa-loop.c +++ gcc/tree-ssa-loop.c @@ -272,6 +272,7 @@ public: /* opt_pass methods: */ virtual unsigned int execute (function *); + opt_pass * clone () { return new pass_tree_loop_init (m_ctxt); } }; // class pass_tree_loop_init @@ -566,6 +567,7 @@ public: /* opt_pass methods: */ virtual unsigned int execute (function *) { return tree_ssa_loop_done (); } + opt_pass * clone () { return new pass_tree_loop_done (m_ctxt); } }; // class pass_tree_loop_done Grüße, Thomas signature.asc Description: PGP signature
Re: [PATCH, 3/8] Add pass_ch_oacc_kernels to pass_oacc_kernels
Hi! On Tue, 25 Nov 2014 12:27:34 +0100, Tom de Vries wrote: > On 15-11-14 18:21, Tom de Vries wrote: > > On 15-11-14 13:14, Tom de Vries wrote: > >> Hi, > >> > >> I'm submitting a patch series with initial support for the oacc kernels > >> directive. > >> > >> The patch series uses pass_parallelize_loops to implement parallelization > >> of > >> loops in the oacc kernels region. > >> > >> The patch series consists of these 8 patches: > >> ... > >> 1 Expand oacc kernels after pass_build_ealias > >> 2 Add pass_oacc_kernels > >> 3 Add pass_ch_oacc_kernels to pass_oacc_kernels > >> 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels > >> 5 Add pass_loop_im to pass_oacc_kernels > >> 6 Add pass_ccp to pass_oacc_kernels > >> 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels > >> 8 Do simple omp lowering for no address taken var > >> ... > > > > This patch adds a pass_ch_oacc_kernels to the pass group pass_oacc_kernels. > > > > The idea is that pass_parallelize_loops only deals with loops for which the > > header has been copied, so the easiest way to meet that requirement when > > running > > pass_parallelize_loops in group pass_oacc_kernels, is to run pass_ch as a > > part > > of pass_oacc_kernels. > > > > We define a seperate pass pass_ch_oacc_kernels, to leave all loops that > > aren't > > part of a kernels region alone. > > > > Updated for moving pass_oacc_kernels down past pass_fre in the pass list. > > Bootstrapped and reg-tested as before. > > OK for trunk? Committed to gomp-4_0-branch in r81: commit 58c33a7965c379b55b549d50e3b79b2252bcc876 Author: tschwinge Date: Tue Apr 21 19:48:16 2015 + Add pass_ch_oacc_kernels to pass_oacc_kernels gcc/ * omp-low.c (loop_in_oacc_kernels_region_p): New function. * omp-low.h (loop_in_oacc_kernels_region_p): Declare. * passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels. * tree-pass.h (make_pass_ch_oacc_kernels): Declare * tree-ssa-loop-ch.c: Include omp-low.h. (pass_ch_execute): Declare. 
(pass_ch::execute): Factor out ... (pass_ch_execute): ... this new function. If handling oacc kernels, skip loops that are not in oacc kernels region. (pass_ch_oacc_kernels::execute): (pass_data_ch_oacc_kernels): New pass_data. (class pass_ch_oacc_kernels): New pass. (pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New function. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@81 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp | 15 gcc/omp-low.c | 91 gcc/omp-low.h |2 ++ gcc/passes.def |1 + gcc/tree-pass.h|1 + gcc/tree-ssa-loop-ch.c | 59 +-- 6 files changed, 167 insertions(+), 2 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 8a53ad8..d00c5e0 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,20 @@ 2015-04-21 Tom de Vries + * omp-low.c (loop_in_oacc_kernels_region_p): New function. + * omp-low.h (loop_in_oacc_kernels_region_p): Declare. + * passes.def: Add pass_ch_oacc_kernels to pass group pass_oacc_kernels. + * tree-pass.h (make_pass_ch_oacc_kernels): Declare + * tree-ssa-loop-ch.c: Include omp-low.h. + (pass_ch_execute): Declare. + (pass_ch::execute): Factor out ... + (pass_ch_execute): ... this new function. If handling oacc kernels, + skip loops that are not in oacc kernels region. + (pass_ch_oacc_kernels::execute): + (pass_data_ch_oacc_kernels): New pass_data. + (class pass_ch_oacc_kernels): New pass. + (pass_ch_oacc_kernels::execute, make_pass_ch_oacc_kernels): New + function. + * passes.def: Add pass group pass_oacc_kernels. * tree-pass.h (make_pass_oacc_kernels): Declare. * tree-ssa-loop.c (gate_oacc_kernels): New static function. diff --git gcc/omp-low.c gcc/omp-low.c index 16d9a5e..1b03ae6 100644 --- gcc/omp-low.c +++ gcc/omp-low.c @@ -13920,4 +13920,95 @@ gimple_stmt_omp_data_i_init_p (gimple stmt) SSA_OP_DEF); } +/* Return true if LOOP is inside a kernels region. 
*/ + +bool +loop_in_oacc_kernels_region_p (struct loop *loop, basic_block *region_entry, + basic_block *region_exit) +{ + bitmap excludes_bitmap = BITMAP_GGC_ALLOC (); + bitmap region_bitmap = BITMAP_GGC_ALLOC (); + bitmap_clear (region_bitmap); + + if (region_entry != NULL) +*region_entry = NULL; + if (region_exit != NULL) +*region_exit = NULL; + + basic_block bb; + gimple last; + FOR_EACH_BB_FN (bb, cfun) +{ + if (bitmap_bit_p (region_bitmap, bb->index)) + continue; + + last = last_stmt (bb); + if (!last) + continue; + +
Re: [PATCH, 2/8] Add pass_oacc_kernels
Hi! On Tue, 25 Nov 2014 12:25:35 +0100, Tom de Vries wrote: > On 15-11-14 18:20, Tom de Vries wrote: > > On 15-11-14 13:14, Tom de Vries wrote: > >> I'm submitting a patch series with initial support for the oacc kernels > >> directive. > >> > >> The patch series uses pass_parallelize_loops to implement parallelization > >> of > >> loops in the oacc kernels region. > >> > >> The patch series consists of these 8 patches: > >> ... > >> 1 Expand oacc kernels after pass_build_ealias > >> 2 Add pass_oacc_kernels > >> 3 Add pass_ch_oacc_kernels to pass_oacc_kernels > >> 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels > >> 5 Add pass_loop_im to pass_oacc_kernels > >> 6 Add pass_ccp to pass_oacc_kernels > >> 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels > >> 8 Do simple omp lowering for no address taken var > >> ... > > > > This patch adds a pass group pass_oacc_kernels. > > > > The rationale is that we want a pass group to run oacc kernels region > > related > > (optimization) passes in. > > > > Updated for moving pass_oacc_kernels down past pass_fre in the pass list. > > Bootstrapped and reg-tested as before. > > OK for trunk? Committed to gomp-4_0-branch in r80: commit 0ac5f6ae679a0cd70b197f0962d7d365e7dfbd21 Author: tschwinge Date: Tue Apr 21 19:45:23 2015 + Add pass_oacc_kernels gcc/ * passes.def: Add pass group pass_oacc_kernels. * tree-pass.h (make_pass_oacc_kernels): Declare. * tree-ssa-loop.c (gate_oacc_kernels): New static function. (pass_data_oacc_kernels): New pass_data. (class pass_oacc_kernels): New pass. (make_pass_oacc_kernels): New function. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@80 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |7 +++ gcc/passes.def |7 ++- gcc/tree-pass.h |1 + gcc/tree-ssa-loop.c | 45 + 4 files changed, 59 insertions(+), 1 deletion(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 1f86160..8a53ad8 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,12 @@ 2015-04-21 Tom de Vries + * passes.def: Add pass group pass_oacc_kernels. + * tree-pass.h (make_pass_oacc_kernels): Declare. + * tree-ssa-loop.c (gate_oacc_kernels): New static function. + (pass_data_oacc_kernels): New pass_data. + (class pass_oacc_kernels): New pass. + (make_pass_oacc_kernels): New function. + * omp-low.c: Include gimple-pretty-print.h. (release_first_vuse_in_edge_dest): New function. (expand_omp_target): When not in ssa, don't split off oacc kernels diff --git gcc/passes.def gcc/passes.def index db0dd18..854c5b8 100644 --- gcc/passes.def +++ gcc/passes.def @@ -86,7 +86,12 @@ along with GCC; see the file COPYING3. If not see execute TODO_rebuild_alias at this point. */ NEXT_PASS (pass_build_ealias); NEXT_PASS (pass_fre); - NEXT_PASS (pass_expand_omp_ssa); + /* Pass group that runs when there are oacc kernels in the +function. 
*/ + NEXT_PASS (pass_oacc_kernels); + PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) + NEXT_PASS (pass_expand_omp_ssa); + POP_INSERT_PASSES () NEXT_PASS (pass_merge_phi); NEXT_PASS (pass_cd_dce); NEXT_PASS (pass_early_ipa_sra); diff --git gcc/tree-pass.h gcc/tree-pass.h index b59ae7a..35778f2 100644 --- gcc/tree-pass.h +++ gcc/tree-pass.h @@ -450,6 +450,7 @@ extern gimple_opt_pass *make_pass_strength_reduction (gcc::context *ctxt); extern gimple_opt_pass *make_pass_vtable_verify (gcc::context *ctxt); extern gimple_opt_pass *make_pass_ubsan (gcc::context *ctxt); extern gimple_opt_pass *make_pass_sanopt (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_oacc_kernels (gcc::context *ctxt); /* IPA Passes */ extern simple_ipa_opt_pass *make_pass_ipa_lower_emutls (gcc::context *ctxt); diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c index ccb8f97..a041858 100644 --- gcc/tree-ssa-loop.c +++ gcc/tree-ssa-loop.c @@ -163,6 +163,51 @@ make_pass_tree_loop (gcc::context *ctxt) return new pass_tree_loop (ctxt); } +/* Gate for oacc kernels pass group. */ + +static bool +gate_oacc_kernels (function *fn) +{ + return (fn->curr_properties & PROP_gimple_eomp) == 0; +} + +/* The oacc kernels superpass. */ + +namespace { + +const pass_data pass_data_oacc_kernels = +{ + GIMPLE_PASS, /* type */ + "oacc_kernels", /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_TREE_LOOP, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ +}; + +class pass_oacc_kernels : public gimple_opt_pass +{ +public: + pass_oacc_kernels (gcc::context *ctxt) +: gimple_opt_pass (pass_data_oacc_ker
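The gate in the (truncated) patch above is a plain bit test on the function's current properties: the pass group runs only while OMP expansion is still pending, that is, while PROP_gimple_eomp is not yet set. A minimal sketch of that check follows. The struct and the bit values are hypothetical stand-ins, not GCC's real definitions from tree-pass.h.

```c
#include <stdbool.h>

/* Hypothetical property bits; the real values live in GCC's tree-pass.h
   and are not reproduced here.  */
#define PROP_cfg_sketch          (1u << 0)
#define PROP_gimple_eomp_sketch  (1u << 1)

/* Stand-in for GCC's struct function, reduced to the one field the
   gate looks at.  */
struct function_sketch
{
  unsigned int curr_properties;
};

/* Same shape as gate_oacc_kernels in the patch: return true only while
   the "OMP expanded" property has not been set on the function.  */
bool
gate_oacc_kernels_sketch (const struct function_sketch *fn)
{
  return (fn->curr_properties & PROP_gimple_eomp_sketch) == 0;
}
```

With this shape, a function whose kernels region was deliberately left unexpanded passes the gate, while a function that is already fully expanded skips the whole group.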
Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias)
Hi! On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries wrote: > On 24-11-14 11:56, Tom de Vries wrote: > > On 15-11-14 18:19, Tom de Vries wrote: > >> On 15-11-14 13:14, Tom de Vries wrote: > >>> I'm submitting a patch series with initial support for the oacc kernels > >>> directive. > >>> > >>> The patch series uses pass_parallelize_loops to implement parallelization > >>> of > >>> loops in the oacc kernels region. > >>> > >>> The patch series consists of these 8 patches: > >>> ... > >>> 1 Expand oacc kernels after pass_build_ealias > >>> 2 Add pass_oacc_kernels > >>> 3 Add pass_ch_oacc_kernels to pass_oacc_kernels > >>> 4 Add pass_tree_loop_{init,done} to pass_oacc_kernels > >>> 5 Add pass_loop_im to pass_oacc_kernels > >>> 6 Add pass_ccp to pass_oacc_kernels > >>> 7 Add pass_parloops_oacc_kernels to pass_oacc_kernels > >>> 8 Do simple omp lowering for no address taken var > >>> ... > >> > >> This patch moves omp expansion of the oacc kernels directive to after > >> pass_build_ealias. > >> > >> The rationale is that in order to use pass_parallelize_loops for analysis > >> and > >> transformation of an oacc kernels region, we postpone omp expansion of that > >> region until the earliest point in the pass list where enough information > >> is > >> availabe to run pass_parallelize_loops, in other words, after > >> pass_build_ealias. > >> > >> The patch postpones expansion in expand_omp, and ensures expansion by > >> adding > >> pass_expand_omp_ssa: > >> - after pass_build_ealias, and > >> - after pass_all_early_optimizations for the case we're not optimizing. > >> > >> In order to make sure the oacc kernels region arrives at > >> pass_expand_omp_ssa, > >> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop > >> aware of > >> lowered omp code, to handle it conservatively. > >> > >> The patch contains changes in expand_omp_target to deal with ssa-code, > >> similar > >> to what is already present in expand_omp_taskreg. 
> >> > >> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to > >> not be > >> static for oacc kernels. It does this to get some references to > >> .omp_data_sizes > >> and .omp_data_kinds in the ssa code. Without these references, the > >> definitions > >> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is > >> not > >> enough to have them not removed. [ In vries/oacc-kernels, I used a > >> BUILT_IN_USE > >> kludge for this purpose ]. > >> > >> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the > >> original function of which the definition has been removed (as in moved to > >> the > >> split off function). TODO_remove_unused_locals takes care of some of them, > >> but > >> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find > >> these > >> dangling SSA_NAMEs and releases them. > >> > > > > Reposting with small update: I've replaced the use of the rather generic > > gimple_stmt_omp_lowering_p with the more specific > > gimple_stmt_omp_data_i_init_p. > > > > Bootstrapped and reg-tested in the same way as before. > > > > I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre. > > This allows fre to unify references to the same omp variable before entering > pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels. > > F.i. this reduction fragment: > ... ># VUSE <.MEM_8> ># PT = { D.2282 } >_67 = .omp_data_i_59->sumD.2270; ># VUSE <.MEM_8> >_68 = *_67; > >_70 = _66 + _68; > ># VUSE <.MEM_8> ># PT = { D.2282 } >_69 = .omp_data_i_59->sumD.2270; ># .MEM_71 = VDEF <.MEM_8> >*_69 = _70; > ... > > is transformed by fre into: > ... ># VUSE <.MEM_8> ># PT = { D.2282 } >_67 = .omp_data_i_59->sumD.2270; ># VUSE <.MEM_8> >_68 = *_67; > >_70 = _66 + _68; > ># .MEM_71 = VDEF <.MEM_8> >*_67 = _70; > ... > > In order for pass_fre to respect the kernels region boundaries, I've added a > change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init > conservatively. 
> > Bootstrapped and reg-tested as before. > > OK for trunk? Committed to gomp-4_0-branch in r79: commit 93557ac5e30c26ee1a3d1255e31265b287171a0d Author: tschwinge Date: Tue Apr 21 19:37:19 2015 + Expand oacc kernels after pass_fre gcc/ * omp-low.c: Include gimple-pretty-print.h. (release_first_vuse_in_edge_dest): New function. (expand_omp_target): When not in ssa, don't split off oacc kernels region, clear PROP_gimple_eomp in cfun->curr_properties to force later expanssion, and add GOACC_kernels_internal call. When in ssa, split off oacc kernels and convert GOACC_kernels_internal into GOACC_kernels call. Handle ssa-code. (pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in properties_provided field. (pass_expand_omp::execute): Set PROP_g
Add BUILT_IN_GOACC_KERNELS_INTERNAL (was: openacc kernels directive -- initial support)
Hi! On Sat, 15 Nov 2014 13:14:52 +0100, Tom de Vries wrote: > I'm submitting a patch series with initial support for the oacc kernels > directive. > > The patch series uses pass_parallelize_loops to implement parallelization of > loops in the oacc kernels region. Committed to gomp-4_0-branch in r78: commit fd3add90d38d5f1b38c9cb557404542b6383b2b0 Author: tschwinge Date: Tue Apr 21 19:24:57 2015 + Add BUILT_IN_GOACC_KERNELS_INTERNAL ..., a variant of the GOACC_kernels builtin. This variant does not call the function passed as function pointer, and therefore is less of an optimization barrier than the original variant. The purpose of this variant is to allow the introduction of the GOACC_kernels call before splitting off the region body into a function (something that is currently done simultaneously). gcc/ * builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING. (ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add DEF_ATTR_TREE_LIST. * omp-builtins.def (BUILT_IN_GOACC_KERNELS_INTERNAL): Add DEF_GOACC_BUILTIN_FNSPEC. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@78 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp|6 ++ gcc/builtin-attrs.def |4 gcc/omp-builtins.def |5 + 3 files changed, 15 insertions(+) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index b091dd5..7885189 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,11 @@ 2015-04-21 Tom de Vries + * builtin-attrs.def (DOT_DOT_DOT_r_r_r): Add DEF_ATTR_FOR_STRING. + (ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST): Add + DEF_ATTR_TREE_LIST. + * omp-builtins.def (BUILT_IN_GOACC_KERNELS_INTERNAL): Add + DEF_GOACC_BUILTIN_FNSPEC. + * builtins.def (DEF_GOACC_BUILTIN_FNSPEC): Define. 
2015-03-21 Tom de Vries diff --git gcc/builtin-attrs.def gcc/builtin-attrs.def index 1338644..8eca053 100644 --- gcc/builtin-attrs.def +++ gcc/builtin-attrs.def @@ -64,6 +64,7 @@ DEF_ATTR_FOR_INT (6) DEF_ATTR_TREE_LIST (ATTR_LIST_##ENUM, ATTR_NULL, \ ATTR_##ENUM, ATTR_NULL) DEF_ATTR_FOR_STRING (STR1, "1") +DEF_ATTR_FOR_STRING (DOT_DOT_DOT_r_r_r, "...rrr") #undef DEF_ATTR_FOR_STRING /* Construct a tree for a list of two integers. */ @@ -127,6 +128,9 @@ DEF_ATTR_TREE_LIST (ATTR_PURE_NOTHROW_LIST, ATTR_PURE, \ ATTR_NULL, ATTR_NOTHROW_LIST) DEF_ATTR_TREE_LIST (ATTR_PURE_NOTHROW_LEAF_LIST, ATTR_PURE,\ ATTR_NULL, ATTR_NOTHROW_LEAF_LIST) +DEF_ATTR_TREE_LIST (ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST, \ + ATTR_FNSPEC, ATTR_LIST_DOT_DOT_DOT_r_r_r, \ + ATTR_NOTHROW_LIST) DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LIST, ATTR_NORETURN, \ ATTR_NULL, ATTR_NOTHROW_LIST) DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\ diff --git gcc/omp-builtins.def gcc/omp-builtins.def index 03955c4..cd273f2 100644 --- gcc/omp-builtins.def +++ gcc/omp-builtins.def @@ -39,6 +39,11 @@ DEF_GOACC_BUILTIN (BUILT_IN_GOACC_DATA_END, "GOACC_data_end", DEF_GOACC_BUILTIN (BUILT_IN_GOACC_ENTER_EXIT_DATA, "GOACC_enter_exit_data", BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_INT_INT_VAR, ATTR_NOTHROW_LIST) +DEF_GOACC_BUILTIN_FNSPEC (BUILT_IN_GOACC_KERNELS_INTERNAL, + "GOACC_kernels_internal", + BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR, + ATTR_FNSPEC_DOT_DOT_DOT_r_r_r_NOTHROW_LIST, + ATTR_NOTHROW_LIST, "...rrr") DEF_GOACC_BUILTIN (BUILT_IN_GOACC_KERNELS, "GOACC_kernels", BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR, ATTR_NOTHROW_LIST) Grüße, Thomas signature.asc Description: PGP signature
Re: [PATCH v3][MIPS] fix CRT_CALL_STATIC_FUNCTION macro
On Tue, 21 Apr 2015, Petar Jovanovic wrote:

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/call-from-init.c
> @@ -0,0 +1,10 @@
> +/* Check that __do_global_ctors_aux can be reached from .init section that
> + is in a different (256MB) region. */
> +/* { dg-do run } */
> +/* { dg-options "-Wl,--section-start=.init=0x0FF0" } */
> +/* { dg-options "-Wl,--section-start=.text=0x1000" } */

Hmm, the addresses should work for any virtual-memory targets, however if this is going to be a run-time test, then not for bare-iron ones, as they won't normally support mapped addresses. And we may not be able to come up with better ones, because a typical bare-iron target will often not have enough memory to span a 256MB boundary. I think this will best be reduced to a link-only test on bare iron, hoping for a link failure.

Speaking of which, have you been able to make a linker test case out of this example for a bug report against binutils? Not necessarily a proper LD test suite addition, I wouldn't be asking for *that*! Just a small case will do, e.g. a pair of .s files generated out of this source and the generated crtbegin.s file, preferably with unrelated clutter removed, together with a recipe for how to assemble and link them to show the missing error message. That will certainly help cover this issue once and for all!

Thanks,
Maciej
Re: [PATCH, fortran] Add gfc_define_builtin_with_spec
Hi! On Fri, 9 Jan 2015 16:37:00 +0100, Tom de Vries wrote: > For the oacc kernels patch series I need a fortran builtin with fn spec > attribute (as mentioned here: > https://gcc.gnu.org/ml/gcc/2014-12/msg1.html ). > > Attached patch adds a function gfc_define_builtin_with_spec that allows me to > define such a builtin. > > At this point there's no user yet in trunk, so I've declared it unused. > > Bootstrapped and reg-tested on x86_64. > > OK for stage 3 trunk? Committed to gomp-4_0-branch in r75: commit ee0a44a49648f1addce78a6765bcbf6a14f237c2 Author: tschwinge Date: Tue Apr 21 18:24:48 2015 + Add gfc_define_builtin_with_spec gcc/fortran/ * f95-lang.c (gfc_define_builtin_with_spec): New function. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@75 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/fortran/ChangeLog.gomp |4 gcc/fortran/f95-lang.c | 21 + 2 files changed, 25 insertions(+) diff --git gcc/fortran/ChangeLog.gomp gcc/fortran/ChangeLog.gomp index 02a1aeb..8c23900 100644 --- gcc/fortran/ChangeLog.gomp +++ gcc/fortran/ChangeLog.gomp @@ -1,3 +1,7 @@ +2015-04-21 Tom de Vries + + * f95-lang.c (gfc_define_builtin_with_spec): New function. + 2015-01-13 Thomas Schwinge * trans-openmp.c (gfc_omp_finish_clause, gfc_trans_omp_clauses): diff --git gcc/fortran/f95-lang.c gcc/fortran/f95-lang.c index de9c813..1a14860 100644 --- gcc/fortran/f95-lang.c +++ gcc/fortran/f95-lang.c @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see #include "wide-int.h" #include "inchash.h" #include "tree.h" +#include "stringpool.h" #include "flags.h" #include "langhooks.h" #include "langhooks-def.h" @@ -600,6 +601,26 @@ gfc_define_builtin (const char *name, tree type, enum built_in_function code, set_builtin_decl (code, decl, true); } +/* Like gfc_define_builtin, but with fn spec attribute FNSPEC. 
*/ + +static void ATTRIBUTE_UNUSED +gfc_define_builtin_with_spec (const char *name, tree fntype, + enum built_in_function code, + const char *library_name, int attr, + const char *fnspec) +{ + if (fnspec) +{ + /* Code copied from build_library_function_decl_1. */ + tree attr_args = build_tree_list (NULL_TREE, + build_string (strlen (fnspec), fnspec)); + tree attrs = tree_cons (get_identifier ("fn spec"), + attr_args, TYPE_ATTRIBUTES (fntype)); + fntype = build_type_attribute_variant (fntype, attrs); +} + + gfc_define_builtin (name, fntype, code, library_name, attr); +} #define DO_DEFINE_MATH_BUILTIN(code, name, argtype, tbase) \ gfc_define_builtin ("__builtin_" name "l", tbase##longdouble[argtype], \ Grüße, Thomas signature.asc Description: PGP signature
Re: Re: [RFC stage 1] Proposed new warning: -Wmisleading-indentation
On 21/04/15 18:07, David Malcolm wrote:

> I have the patch working now for the C++ frontend. Am attaching the work-in-progress (sans ChangeLog). This one (v2) bootstrapped and regrtested on x86_64-unknown-linux-gnu (Fedora 20), with:
> 63 new "PASS" results in gcc.sum
> 189 new "PASS" results in g++.sum
> for the new test cases (relative to a control build of r48).

I still do not understand why you need so much complexity, as I explained here: https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00830.html

The attached patch passes all your tests except Wmisleading-indentation-3.c, which warns only once instead of two times (it doesn't seem a big loss to me), and Wmisleading-indentation-7.c, which I did not bother to implement, but it is a straightforward application of the if-case to the else-case. Perhaps I'm missing something that is not reflected in your tests?

BTW, the start-up cost of GCC is not negligible, thus grouping similar testcases in a single file may pay off in the long term. Many small files also tend to slow down VC tools. It also makes it harder to see what is tested and what is missing.

Cheers,
Manuel.
Index: c/c-parser.c === --- c/c-parser.c (revision 222087) +++ c/c-parser.c (working copy) @@ -5174,20 +5174,34 @@ c_parser_c99_block_statement (c_parser * location_t loc = c_parser_peek_token (parser)->location; c_parser_statement (parser); return c_end_compound_stmt (loc, block, flag_isoc99); } +static void +warn_for_misleading_indentation (location_t guard_loc, location_t body_loc, location_t next_stmt_loc, + const char *s) +{ + if (LOCATION_FILE (next_stmt_loc) == LOCATION_FILE (body_loc) + && (LOCATION_LINE (next_stmt_loc) == LOCATION_LINE (body_loc) + || (LOCATION_LINE (next_stmt_loc) > LOCATION_LINE (body_loc) + && LOCATION_COLUMN (next_stmt_loc) == LOCATION_COLUMN (body_loc)))) +if (warning_at (next_stmt_loc, OPT_Wmisleading_indentation, + "statement is indented as if it were guarded by...")) + inform (guard_loc, + "...this %qs clause, but it is not", s); +} + /* Parse the body of an if statement. This is just parsing a statement but (a) it is a block in C99, (b) we track whether the body is an if statement for the sake of -Wparentheses warnings, (c) we handle an empty body specially for the sake of -Wempty-body warnings, and (d) we call parser_compound_statement directly because c_parser_statement_after_labels resets parser->in_if_block.
*/ static tree -c_parser_if_body (c_parser *parser, bool *if_p) +c_parser_if_body (c_parser *parser, bool *if_p, location_t if_loc) { tree block = c_begin_compound_stmt (flag_isoc99); location_t body_loc = c_parser_peek_token (parser)->location; c_parser_all_labels (parser); *if_p = c_parser_next_token_is_keyword (parser, RID_IF); @@ -5201,11 +5215,16 @@ c_parser_if_body (c_parser *parser, bool "suggest braces around empty body in an % statement"); } else if (c_parser_next_token_is (parser, CPP_OPEN_BRACE)) add_stmt (c_parser_compound_statement (parser)); else -c_parser_statement_after_labels (parser); +{ + c_parser_statement_after_labels (parser); + if (!c_parser_next_token_is_keyword (parser, RID_ELSE)) + warn_for_misleading_indentation (if_loc, body_loc, c_parser_peek_token (parser)->location, "if"); +} + return c_end_compound_stmt (body_loc, block, flag_isoc99); } /* Parse the else body of an if statement. This is just parsing a statement but (a) it is a block in C99, (b) we handle an empty body @@ -5248,10 +5267,11 @@ c_parser_if_statement (c_parser *parser) tree first_body, second_body; bool in_if_block; tree if_stmt; gcc_assert (c_parser_next_token_is_keyword (parser, RID_IF)); + location_t if_loc = c_parser_peek_token (parser)->location; c_parser_consume_token (parser); block = c_begin_compound_stmt (flag_isoc99); loc = c_parser_peek_token (parser)->location; cond = c_parser_paren_condition (parser); if (flag_cilkplus && contains_cilk_spawn_stmt (cond)) @@ -5259,11 +5279,11 @@ c_parser_if_statement (c_parser *parser) error_at (loc, "if statement cannot contain %"); cond = error_mark_node; } in_if_block = parser->in_if_block; parser->in_if_block = true; - first_body = c_parser_if_body (parser, &first_if); + first_body = c_parser_if_body (parser, &first_if, if_loc); parser->in_if_block = in_if_block; if (c_parser_next_token_is_keyword (parser, RID_ELSE)) { c_parser_consume_token (parser); second_body = c_parser_else_body (parser); @@ -5344,10 +5364,11 @@ 
static void c_parser_while_statement (c_parser *parser, bool ivdep) { tree block, cond, body, save_break, save_cont; location_t loc; gcc_assert (c_parser_next_token_is_keyword (parser, RID_WHILE)); + location_t while_loc = c_parser_peek_token (parser)->location; c_parser_co
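The condition in warn_for_misleading_indentation above boils down to a comparison of source locations: the statement after an unbraced body is suspicious if it sits on the same line as the body, or on a later line at the same column. A minimal standalone sketch of that test follows. The struct loc_sketch type and the function name are hypothetical, and the sketch compares file names with strcmp, whereas the patch compares the LOCATION_FILE pointers directly.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified, hypothetical stand-in for an expanded location_t.  */
struct loc_sketch
{
  const char *file;
  int line;
  int column;
};

/* Return true when NEXT_STMT looks as if it were guarded together with
   BODY: same file, and either on the same line as the body or on a
   later line that starts at the same column.  */
bool
looks_misleading (struct loc_sketch body, struct loc_sketch next_stmt)
{
  if (strcmp (body.file, next_stmt.file) != 0)
    return false;
  if (next_stmt.line == body.line)
    return true;
  return next_stmt.line > body.line && next_stmt.column == body.column;
}
```

A statement that is dedented relative to the body, or that comes from another file (for example through macro expansion or an include), does not trigger the check in this sketch.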
RE: [PATCH v2][MIPS] fix CRT_CALL_STATIC_FUNCTION macro
-Original Message- From: Moore, Catherine [mailto:catherine_mo...@mentor.com] Sent: Friday, April 17, 2015 8:36 PM To: Petar Jovanovic Cc: Maciej W. Rozycki; Matthew Fortune; gcc-patches@gcc.gnu.org Subject: RE: [PATCH v2][MIPS] fix CRT_CALL_STATIC_FUNCTION macro > > Hi Petar, > Running the executable is fine, but didn't you say that your test case takes > hours to execute? > If so, that's not acceptable. Is it possible to construct a test case that > will run more quickly? > Thanks, > Catherine Hi Catherine, I was just raising concerns about the original example (that did run long). Anyway, I have just uploaded a different example that triggers the issue. Regards, Petar
[patch, libgfortran] PR65234 Output descriptor (*(1E15.7)) not accepted
I have had this simple patch in my trunk for quite some time and it has tested OK. I plan to commit with a test case based on the one in the PR today. Regards, Jerry 2015-04-21 Jerry DeLisle PR libgfortran/65234 * io/format.c (parse_format_list): Set the seen_dd flag in all cases where a data descriptor has been seen. Index: format.c === --- format.c (revision 222194) +++ format.c (working copy) @@ -624,6 +624,7 @@ parse_format_list (st_parameter_dt *dtp, bool *see get_fnode (fmt, &head, &tail, FMT_LPAREN); tail->repeat = -2; /* Signifies unlimited format. */ tail->u.child = parse_format_list (dtp, &seen_data_desc); + *seen_dd = seen_data_desc; if (fmt->error != NULL) goto finished; if (!seen_data_desc) @@ -851,6 +852,7 @@ parse_format_list (st_parameter_dt *dtp, bool *see switch (t) { case FMT_L: + *seen_dd = true; t = format_lex (fmt); if (t != FMT_POSINT) { @@ -873,6 +875,7 @@ parse_format_list (st_parameter_dt *dtp, bool *see break; case FMT_A: + *seen_dd = true; t = format_lex (fmt); if (t == FMT_ZERO) { @@ -897,6 +900,7 @@ parse_format_list (st_parameter_dt *dtp, bool *see case FMT_G: case FMT_EN: case FMT_ES: + *seen_dd = true; get_fnode (fmt, &head, &tail, t); tail->repeat = repeat; @@ -903,6 +907,7 @@ parse_format_list (st_parameter_dt *dtp, bool *see u = format_lex (fmt); if (t == FMT_G && u == FMT_ZERO) { + *seen_dd = true; if (notification_std (GFC_STD_F2008) == NOTIFICATION_ERROR || dtp->u.p.mode == READING) { @@ -928,6 +933,7 @@ parse_format_list (st_parameter_dt *dtp, bool *see } if (t == FMT_F && dtp->u.p.mode == WRITING) { + *seen_dd = true; if (u != FMT_POSINT && u != FMT_ZERO) { fmt->error = nonneg_required; @@ -969,9 +975,11 @@ parse_format_list (st_parameter_dt *dtp, bool *see tail->u.real.e = -1; if (t2 == FMT_D || t2 == FMT_F) - break; + { + *seen_dd = true; + break; + } - /* Look for optional exponent */ t = format_lex (fmt); if (t != FMT_E) @@ -1011,6 +1019,7 @@ parse_format_list (st_parameter_dt *dtp, bool *see case FMT_B: case FMT_O: case 
FMT_Z: + *seen_dd = true; get_fnode (fmt, &head, &tail, t); tail->repeat = repeat;
Re: [RFC] Dynamically aligning the stack
On Tue, Apr 21, 2015 at 9:52 AM, Steve Ellcey wrote:
> On Tue, 2015-04-14 at 10:08 -0700, H.J. Lu wrote:
>
>> We have done just that in GCC 4.4 to implement dynamic stack
>> alignment on x86 :-). Some of x86 backend changes for dynamic
>> stack alignment are x86 psABI specific. Others are historical,
>> like -mstackrealign, which was the old attempt for dynamic stack
>> alignment.
>
> I am a bit confused about the history of stack alignment on x86. So I
> guess -mpreferred-stack-boundary=X came first and is not
> obsolete/deprecated. But I thought -mstackrealign=X was the current
> method of aligning the stack, but based on this comment and the patches
> you pointed me at I guess this is also obsolete (or at least deprecated)
> and that -mincoming-stack-boundary=X is the current option that should
> be used. But I am not sure how this option works.

-mpreferred-stack-boundary=X and -mincoming-stack-boundary=X set stack
alignment. -mstackrealign=X:

'-mstackrealign'
     Realign the stack at entry. On the Intel x86, the '-mstackrealign'
     option generates an alternate prologue and epilogue that realigns
     the run-time stack if necessary. This supports mixing legacy codes
     that keep 4-byte stack alignment with modern codes that keep
     16-byte stack alignment for SSE compatibility. See also the
     attribute 'force_align_arg_pointer', applicable to individual
     functions.

assumes 4-byte incoming stack alignment in 32-bit. It isn't needed in
most cases since GCC has been generating 16-byte outgoing stack
alignment for ages.

> Obviously it tells GCC what assumption to make about stack alignment at
> the start of a function but how do you tell GCC what alignment you want
> for the function? Or does GCC figure that out for itself based on the
> instructions and data types it sees in the function?
>

Please do

# git grep "stack_alignment_needed = "

to see how the middle-end and backend track the stack alignment requirement.

--
H.J.
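On the question of how GCC learns what alignment a function wants: as far as I understand it, one thing that raises a function's requirement is an over-aligned local variable. A minimal sketch, assuming GCC's `aligned` attribute and a target where the dynamic realignment discussed above is available; the function name is made up for illustration and is not from the thread:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 when its over-aligned local really sits
   on a 32-byte boundary.  The aligned attribute is what asks for more
   than the default stack alignment; no command-line option is involved.  */
int
local_is_32_byte_aligned (void)
{
  volatile char buf[32] __attribute__ ((aligned (32)));

  buf[0] = 0;  /* Touch the buffer so it stays in the frame.  */
  return ((uintptr_t) buf % 32) == 0;
}
```

If the compiler honours the attribute, the function returns 1 regardless of the alignment the caller provided, which is the per-function behaviour the question is about.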
[PATCH v3][MIPS] fix CRT_CALL_STATIC_FUNCTION macro
New patch, v3. PTAL. Regards, Petar gcc/ChangeLog: 2015-04-21 Petar Jovanovic * config/mips/mips.h (CRT_CALL_STATIC_FUNCTION): Fix the macro to use la/jalr instead of jal. gcc/testsuite/ChangeLog: 2015-04-21 Petar Jovanovic * gcc.target/mips/call-from-init.c: New test. * gcc.target/mips/mips.exp: Add section_start to mips_option_groups. commit 566564bd6d80fd6b5ebd6b8eccf09e3716246930 Author: Petar Jovanovic Date: Thu Jan 22 02:15:22 2015 +0100 [mips] fix CRT_CALL_STATIC_FUNCTION macro jal can not reach a target in different region, so replace it with la/jalr variant. diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h index ec69ed5..4bd83f5 100644 --- a/gcc/config/mips/mips.h +++ b/gcc/config/mips/mips.h @@ -3034,11 +3034,11 @@ while (0) nop\n\ 1: .cpload $31\n\ .set reorder\n\ - jal " USER_LABEL_PREFIX #FUNC "\n\ + la $25, " USER_LABEL_PREFIX #FUNC "\n\ + jalr $25\n\ .set pop\n\ " TEXT_SECTION_ASM_OP); -#elif ((defined _ABIN32 && _MIPS_SIM == _ABIN32) \ - || (defined _ABI64 && _MIPS_SIM == _ABI64)) +#elif (defined _ABIN32 && _MIPS_SIM == _ABIN32) #define CRT_CALL_STATIC_FUNCTION(SECTION_OP, FUNC) \ asm (SECTION_OP "\n\ .set push\n\ @@ -3048,7 +3048,22 @@ while (0) nop\n\ 1: .set reorder\n\ .cpsetup $31, $2, 1b\n\ - jal " USER_LABEL_PREFIX #FUNC "\n\ + la $25, " USER_LABEL_PREFIX #FUNC "\n\ + jalr $25\n\ + .set pop\n\ + " TEXT_SECTION_ASM_OP); +#elif (defined _ABI64 && _MIPS_SIM == _ABI64) +#define CRT_CALL_STATIC_FUNCTION(SECTION_OP, FUNC) \ + asm (SECTION_OP "\n\ + .set push\n\ + .set nomips16\n\ + .set noreorder\n\ + bal 1f\n\ + nop\n\ +1: .set reorder\n\ + .cpsetup $31, $2, 1b\n\ + dla $25, " USER_LABEL_PREFIX #FUNC "\n\ + jalr $25\n\ .set pop\n\ " TEXT_SECTION_ASM_OP); #endif diff --git a/gcc/testsuite/gcc.target/mips/call-from-init.c b/gcc/testsuite/gcc.target/mips/call-from-init.c new file mode 100644 index 000..ee00a17 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/call-from-init.c @@ -0,0 +1,10 @@ +/* Check that __do_global_ctors_aux can be 
reached from .init section that + is in a different (256MB) region. */ +/* { dg-do run } */ +/* { dg-options "-Wl,--section-start=.init=0x0FF0" } */ +/* { dg-options "-Wl,--section-start=.text=0x1000" } */ + +int +main (void) { + return 0; +} diff --git a/gcc/testsuite/gcc.target/mips/mips.exp b/gcc/testsuite/gcc.target/mips/mips.exp index a0980a9..1dd4173 100644 --- a/gcc/testsuite/gcc.target/mips/mips.exp +++ b/gcc/testsuite/gcc.target/mips/mips.exp @@ -254,6 +254,7 @@ set mips_option_groups { madd "HAS_MADD" maddps "HAS_MADDPS" lsa "(|!)HAS_LSA" +section_start "-Wl,--section-start=.*" } for { set option 0 } { $option < 32 } { incr option } {
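To make the "different region" point concrete: on 32-bit MIPS, as far as I know, `jal` keeps the top four bits of the address of its delay slot and only replaces the low 28 bits, so it can reach targets in the same 256MB-aligned region and nothing else. A small sketch of that check; the function name and the addresses in the note below are illustrative assumptions, not taken from the patch:

```c
#include <stdint.h>

/* Hypothetical model of jal reachability on 32-bit MIPS: the target must
   share the upper 4 address bits with the instruction in the delay slot
   (pc + 4).  */
int
jal_can_reach (uint32_t pc, uint32_t target)
{
  uint32_t region_mask = 0xF0000000u;

  return ((pc + 4) & region_mask) == (target & region_mask);
}
```

With these assumed addresses, `jal_can_reach (0x0FFF0000, 0x10000000)` is 0 while `jal_can_reach (0x10000000, 0x10001000)` is 1. The `la`/`jalr` sequence loads the full address into $25 first, so it has no such limit.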
Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
On 21/04/15 15:09, Jeff Law wrote:
> On 04/21/2015 02:30 AM, Kyrill Tkachov wrote:
>> From reading config/stormy16/stormy-abi it seems to me that we don't
>> pass arguments partially in stormy16, so this code would never be called
>> there. That leaves pa as the potential problematic target.
>> I don't suppose there's an easy way to test on pa? My checkout of binutils
>> doesn't seem to include a sim target for it.
>
> No simulator, no machines in the testfarm, the box I had access to via
> parisc-linux.org seems dead and my ancient PA overheats well before a
> bootstrap could complete. I often regret knowing about the backwards way
> many things were done on the PA because it makes me think about cases that
> only matter on dead architectures.

So what should be the action plan here? I can't add an assert on
positive result as a negative result is valid.

We want to catch the case where this would cause trouble on
pa, or change the patch until we're confident that it's fine
for pa.

That being said, reading the documentation of STACK_GROWS_UPWARD
and ARGS_GROW_DOWNWARD I'm having a hard time visualising a case
where this would cause trouble on pa.

Is the problem that in the function:

+/* Add SIZE to X and check whether it's greater than Y.
+   If it is, return the constant amount by which it's greater or smaller.
+   If the two are not statically comparable (for example, X and Y contain
+   different registers) return -1.  This is used in expand_push_insn to
+   figure out if reading SIZE bytes from location X will end up reading from
+   location Y.  */
+static int
+memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
+{
+  rtx tmp = plus_constant (Pmode, x, size);
+  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);
+
+  if (!CONST_INT_P (sub))
+    return -1;
+
+  return INTVAL (sub);
+}

for ARGS_GROW_DOWNWARD we would be reading 'backwards' from x,
so the function should be something like the following?

static int
memory_load_overlap (rtx x, rtx y, HOST_WIDE_INT size)
{
#ifdef ARGS_GROW_DOWNWARD
  rtx tmp = plus_constant (Pmode, x, -size);
#else
  rtx tmp = plus_constant (Pmode, x, size);
#endif
  rtx sub = simplify_gen_binary (MINUS, Pmode, tmp, y);

  if (!CONST_INT_P (sub))
    return -1;

#ifdef ARGS_GROW_DOWNWARD
  return INTVAL (-sub);
#else
  return INTVAL (sub);
#endif
}

now, say for x == sp + 4, y == sp + 8, size == 16:
This would be a problematic case for arm, so this code on arm
(where ARGS_GROW_DOWNWARD is *not* defined) would return
12, which is the number of bytes that overlap.

On a target where ARGS_GROW_DOWNWARD is defined this would return
-20, meaning that no overlap occurs (because we read in the descending
direction from x, IIUC).

Thanks,
Kyrill

> Jeff
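The two figures in the example above can be checked with plain integer arithmetic. The sketch below is only a model, under the assumption that X and Y are offsets from the same base register; `overlap_model` and its `reads_downward` flag are made-up names standing in for the rtx code and for ARGS_GROW_DOWNWARD. It covers neither the "not statically comparable" case nor the final negation in the proposed variant.

```c
/* Integer model of the comparison, with X and Y given as byte offsets
   from the same base register.  reads_downward is a stand-in for
   ARGS_GROW_DOWNWARD being defined.  */
long
overlap_model (long x_off, long y_off, long size, int reads_downward)
{
  long end = reads_downward ? x_off - size : x_off + size;

  return end - y_off;
}
```

For x == sp + 4, y == sp + 8, size == 16, `overlap_model (4, 8, 16, 0)` gives 12 and `overlap_model (4, 8, 16, 1)` gives -20, which are the values quoted in the mail.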
Re: [patch] [java] bump libgcj soname
- Original Message - > On Tue, Apr 21, 2015 at 01:04:04PM -0400, Andrew Hughes wrote: > > - Original Message - > > > On Tue, Apr 21, 2015 at 04:07:13PM +0200, Matthias Klose wrote: > > > > bump the libgcj soname on the trunk, as done for every release cycle, > > > > > > Is that really needed though these days? > > > Weren't there basically zero changes to libjava (both libjava and > > > libjava/classpath) in the last 2 or more years? > > > The few ones were mostly updating Copyright notices, minor configure > > > changes, but I really haven't seen anything ABI changing for quite a > > > while. > > > > > > > On the Classpath side, there's a bunch of stuff to merge in that would > > change the ABI. It's a matter of finding a good point at which to do it > > and time to do so. I keep missing the right point in the gcc lifecycle. > > Now might be a good time (any time next 6.5 months or so), and if that is > done, surely I have no issue with bumping the soname. > Ok, that should be possible. > Jakub > -- Andrew :) Free Java Software Engineer Red Hat, Inc. (http://www.redhat.com) PGP Key: ed25519/35964222 (hkp://keys.gnupg.net) Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222 PGP Key: rsa4096/248BDC07 (hkp://keys.gnupg.net) Fingerprint = EC5A 1F5E C0AD 1D15 8F1F 8F91 3B96 A578 248B DC07
Re: [patch] [java] bump libgcj soname
On Tue, Apr 21, 2015 at 01:04:04PM -0400, Andrew Hughes wrote: > - Original Message - > > On Tue, Apr 21, 2015 at 04:07:13PM +0200, Matthias Klose wrote: > > > bump the libgcj soname on the trunk, as done for every release cycle, > > > > Is that really needed though these days? > > Weren't there basically zero changes to libjava (both libjava and > > libjava/classpath) in the last 2 or more years? > > The few ones were mostly updating Copyright notices, minor configure > > changes, but I really haven't seen anything ABI changing for quite a while. > > > > On the Classpath side, there's a bunch of stuff to merge in that would > change the ABI. It's a matter of finding a good point at which to do it > and time to do so. I keep missing the right point in the gcc lifecycle. Now might be a good time (any time next 6.5 months or so), and if that is done, surely I have no issue with bumping the soname. Jakub
Re: [RFC stage 1] Proposed new warning: -Wmisleading-indentation
On Apr 21, 2015, at 9:07 AM, David Malcolm wrote:
> I think I want to make a distinction between
>
> (A) classic C "gotchas", like the one in my mail and the:
>
>   if (cond);
>     stmt;
>
> one you mentioned above
>
> vs
>
> (B) wrong/inconsistent indentation.
>
> I think (A) is high-value, since it detects subtly wrong code, likely to
> have misled the reader, whereas I don't find (B) as interesting.

Ok. I don’t have any problem with that. Going for the high-value cases only makes the problem space smaller, makes it more likely that you can implement it and do a good job, and avoids false positives and all sorts of what-ifs that the other class would expose you to. I like your work and your plan.
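For readers skimming the thread, a class (A) case might look like the sketch below. The function is made up for illustration, and whether the work-in-progress patch flags exactly this shape is an assumption on my part:

```c
/* Made-up example of a class (A) gotcha: the second statement is
   indented as if the 'if' guarded it, but it always executes.  */
int
count_hits (int cond)
{
  int hits = 0;

  if (cond)
    hits += 1;
    hits += 10;  /* Not guarded, despite the indentation.  */

  return hits;
}
```

Here `count_hits (0)` returns 10 rather than the 0 the layout suggests, which is the kind of subtly wrong code the (A) category is meant to catch.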
Re: [patch] [java] bump libgcj soname
- Original Message - > On Tue, Apr 21, 2015 at 04:07:13PM +0200, Matthias Klose wrote: > > bump the libgcj soname on the trunk, as done for every release cycle, > > Is that really needed though these days? > Weren't there basically zero changes to libjava (both libjava and > libjava/classpath) in the last 2 or more years? > The few ones were mostly updating Copyright notices, minor configure > changes, but I really haven't seen anything ABI changing for quite a while. > On the Classpath side, there's a bunch of stuff to merge in that would change the ABI. It's a matter of finding a good point at which to do it and time to do so. I keep missing the right point in the gcc lifecycle. > Jakub > -- Andrew :) Free Java Software Engineer Red Hat, Inc. (http://www.redhat.com) PGP Key: ed25519/35964222 (hkp://keys.gnupg.net) Fingerprint = 5132 579D D154 0ED2 3E04 C5A0 CFDA 0F9B 3596 4222 PGP Key: rsa4096/248BDC07 (hkp://keys.gnupg.net) Fingerprint = EC5A 1F5E C0AD 1D15 8F1F 8F91 3B96 A578 248B DC07
Re: [RFC] Dynamically aligning the stack
On Tue, 2015-04-14 at 10:08 -0700, H.J. Lu wrote:

> We have done just that in GCC 4.4 to implement dynamic stack
> alignment on x86 :-). Some of x86 backend changes for dynamic
> stack alignment are x86 psABI specific. Others are historical,
> like -mstackrealign, which was the old attempt for dynamic stack
> alignment.

I am a bit confused about the history of stack alignment on x86. So I
guess -mpreferred-stack-boundary=X came first and is not
obsolete/deprecated. But I thought -mstackrealign=X was the current
method of aligning the stack, but based on this comment and the patches
you pointed me at I guess this is also obsolete (or at least deprecated)
and that -mincoming-stack-boundary=X is the current option that should
be used. But I am not sure how this option works.

Obviously it tells GCC what assumption to make about stack alignment at
the start of a function, but how do you tell GCC what alignment you want
for the function? Or does GCC figure that out for itself based on the
instructions and data types it sees in the function?

Steve Ellcey
sell...@imgtec.com
Re: [RFC stage 1] Proposed new warning: -Wmisleading-indentation
On 21/04/15 18:07, David Malcolm wrote:
> On Thu, 2015-04-16 at 10:26 -0700, Mike Stump wrote:
>> Does it also handle:
>>
>>   if (cond);
>>     stmt;
>>
>> ? Would be good to add that to the test suite, as that is another hard to
>> spot common error that should be caught.
>
> Not yet, but I agree that it would be a good thing to issue a warning
> for.

GCC already warns for the above:

test.c:3:9: warning: suggest braces around empty body in an ‘if’ statement [-Wempty-body]
   if (a);
         ^

Cheers,
Manuel.
Re: [RFC stage 1] Proposed new warning: -Wmisleading-indentation
On Tue, Apr 21, 2015 at 12:07:00PM -0400, David Malcolm wrote: > On Thu, 2015-04-16 at 10:26 -0700, Mike Stump wrote: > > On Apr 16, 2015, at 8:01 AM, David Malcolm wrote: > > > Attached is a work-in-progress patch for a new > > > -Wmisleading-indentation > > > warning I've been experimenting with, for GCC 6. > > > > Seems like a nice idea in general. > > > > Does it also handle: > > > > if (cone); > > stmt; > > > > ? Would be good to add that to the test suite, as that is another hard to > > spot common error that should be caught. > > Not yet, but I agree that it would be a good thing to issue a warning > for. > > > I do think that it is reasonable to warn for things like: > > > > stmt; > > stmt; > > > > one of those two lines is likely misindented, though, maybe you want to > > start with the high payback things first. > > > > An issue here is how to determine (i), or if it's OK to default to 8 > > > > Yes, 8 is the proper value to default it to. > > > > > and have a command-line option (param?) to override it? (though what > > > about, > > > say, each header file?) > > > > I’ll abstain from this. The purist in me says no option for other > > than 8, life goes on. 20 years ago, someone was confused over hard v > > soft tabbing and what exactly the editor key TAB does. That confusion > > is over, the 8 people have won. Catering to other than 8 gives the > > impression that the people that lost still have a chance at > > winning. :-) > > > > > Thoughts on this, and on the patch? > > > > Would be nice to have a stricter version that warns about all wildly > > inconsistently or wrongly indented lines. 
> > > > { > > stmt; > > stmt; // must be same as above > > } > > > > { > > stmt; // must be indented at least 1 > > } > > > > if (cond) > > stmt; // must be indented at least 1 > > I think I want to make a distinction between > > (A) classic C "gotchas", like the one in my mail and the: > > if (cond); > stmt; > > one you mentioned above > > vs > > (B) wrong/inconsistent indentation. > > I think (A) is high-value, since it detects subtly wrong code, likely to > have misled the reader, whereas I don't find (B) as interesting. I > think (A) is "misleading", whereas (B) is "wrong"; the ugliness of the > (B) cases tends to give me a "this code is ugly; beware, danger Will > Robinson!" reaction, whereas (A) is less ugly and thus more dangerous. So, while I was working on ifdef stuff in gcc I found the following pattern #ifdef FOO if (FOO) #endif bar (); which you may want to handle somehow. In that sort of case one side of the ifdef will necessarily have the B type of miss indentation. Trev > > (if that makes sense; this may just be my own visceral reaction to the > erroneous code). > > Or to put it another way, I hope to make (A) good enough to go into > -Wall, whereas I think (B) would meet more resistance. > Also, I think autogenerated code is more likely to run into (B) than > (A). > > I have the patch working now for the C++ frontend. Am attaching the > work-in-progress (sans ChangeLog). This one (v2) bootstrapped and > regrtested on x86_64-unknown-linux-gnu (Fedora 20), with: > 63 new "PASS" results in gcc.sum > 189 new "PASS" results in g++.sum > for the new test cases (relative to a control build of r48). > > I also moved the visual-parser.c/h to c-family, to make use of the > -ftabstop option Tom mentioned in another mail. > > I also made it identify the kind of clause, so error messages say things > like: > > ./Wmisleading-indentation-1.c:10:7: warning: statement is indented as if > it were guarded by... 
[-Wmisleading-indentation] > ./Wmisleading-indentation-1.c:8:3: note: ...this 'if' clause, but it is > not > > which makes it easier to read, especially when dealing with nesting. > > This hasn't yet had any performance/leak fixes so it isn't ready as is. > I plan to look at making it warn about the: > > if (cond); > stmt; > > gotcha next, before trying to optimize it. > > (and no ChangeLog yet) > > Dave > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index 80c91f0..8154469 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -1143,7 +1143,8 @@ C_COMMON_OBJS = c-family/c-common.o > c-family/c-cppbuiltin.o c-family/c-dump.o \ >c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o \ >c-family/c-semantics.o c-family/c-ada-spec.o \ >c-family/c-cilkplus.o \ > - c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o > + c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o \ > + c-family/visual-parser.o > > # Language-independent object files. > # We put the insn-*.o files first so that a parallel make will build > diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt > index 983f4a8..88f1f94 100644 > --- a/gcc/c-family/c.opt > +++ b/gcc/c-family/c.opt > @@ -554,6 +554,10 @@ Wmemset-transposed-args > C ObjC C++ ObjC++ Var(warn_memset_transp
Re: [RFC stage 1] Proposed new warning: -Wmisleading-indentation
On Thu, 2015-04-16 at 10:26 -0700, Mike Stump wrote: > On Apr 16, 2015, at 8:01 AM, David Malcolm wrote: > > Attached is a work-in-progress patch for a new > > -Wmisleading-indentation > > warning I've been experimenting with, for GCC 6. > > Seems like a nice idea in general. > > Does it also handle: > > if (cone); > stmt; > > ? Would be good to add that to the test suite, as that is another hard to > spot common error that should be caught. Not yet, but I agree that it would be a good thing to issue a warning for. > I do think that it is reasonable to warn for things like: > > stmt; > stmt; > > one of those two lines is likely misindented, though, maybe you want to start > with the high payback things first. > > An issue here is how to determine (i), or if it's OK to default to 8 > > Yes, 8 is the proper value to default it to. > > > and have a command-line option (param?) to override it? (though what about, > > say, each header file?) > > I’ll abstain from this. The purist in me says no option for other > than 8, life goes on. 20 years ago, someone was confused over hard v > soft tabbing and what exactly the editor key TAB does. That confusion > is over, the 8 people have won. Catering to other than 8 gives the > impression that the people that lost still have a chance at > winning. :-) > > > Thoughts on this, and on the patch? > > Would be nice to have a stricter version that warns about all wildly > inconsistently or wrongly indented lines. > > { > stmt; > stmt; // must be same as above > } > > { > stmt; // must be indented at least 1 > } > > if (cond) > stmt; // must be indented at least 1 I think I want to make a distinction between (A) classic C "gotchas", like the one in my mail and the: if (cond); stmt; one you mentioned above vs (B) wrong/inconsistent indentation. I think (A) is high-value, since it detects subtly wrong code, likely to have misled the reader, whereas I don't find (B) as interesting. 
I think (A) is "misleading", whereas (B) is "wrong"; the ugliness of the (B) cases tends to give me a "this code is ugly; beware, danger Will Robinson!" reaction, whereas (A) is less ugly and thus more dangerous. (if that makes sense; this may just be my own visceral reaction to the erroneous code). Or to put it another way, I hope to make (A) good enough to go into -Wall, whereas I think (B) would meet more resistance. Also, I think autogenerated code is more likely to run into (B) than (A). I have the patch working now for the C++ frontend. Am attaching the work-in-progress (sans ChangeLog). This one (v2) bootstrapped and regrtested on x86_64-unknown-linux-gnu (Fedora 20), with: 63 new "PASS" results in gcc.sum 189 new "PASS" results in g++.sum for the new test cases (relative to a control build of r48). I also moved the visual-parser.c/h to c-family, to make use of the -ftabstop option Tom mentioned in another mail. I also made it identify the kind of clause, so error messages say things like: ./Wmisleading-indentation-1.c:10:7: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation] ./Wmisleading-indentation-1.c:8:3: note: ...this 'if' clause, but it is not which makes it easier to read, especially when dealing with nesting. This hasn't yet had any performance/leak fixes so it isn't ready as is. I plan to look at making it warn about the: if (cond); stmt; gotcha next, before trying to optimize it. 
(and no ChangeLog yet) Dave diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 80c91f0..8154469 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1143,7 +1143,8 @@ C_COMMON_OBJS = c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o \ c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o \ c-family/c-semantics.o c-family/c-ada-spec.o \ c-family/c-cilkplus.o \ - c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o + c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o \ + c-family/visual-parser.o # Language-independent object files. # We put the insn-*.o files first so that a parallel make will build diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 983f4a8..88f1f94 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -554,6 +554,10 @@ Wmemset-transposed-args C ObjC C++ ObjC++ Var(warn_memset_transposed_args) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) Warn about suspicious calls to memset where the third argument is constant literal zero and the second is not +Wmisleading-indentation +C C++ Common Var(warn_misleading_indentation) Warning +Warn when the indentation of the code does not reflect the block structure + Wmissing-braces C ObjC C++ ObjC++ Var(warn_missing_braces) Warning LangEnabledBy(C ObjC,Wall) Warn about possibly missing braces around initializers diff --git a/gcc/c-family/visual-parser.c b/gcc/c-family/visual-parser.c new file mode 100644 index 000..b1fcb8b --- /dev/null
[PATCH][AARCH64]Use mov for add with large immediate.
Hi all,

This is a simple patch to generate a move instruction to temporarily hold
the large immediate for an add instruction.

The GCC regression tests have been run using an aarch64-none-elf toolchain,
with no new issues.

Okay for trunk?

Regards,
Renlin Li

gcc/ChangeLog:

2015-04-21  Renlin Li

    * config/aarch64/aarch64.md (add3): Use mov when allowed.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1f4169e..9ea1939 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1414,18 +1414,28 @@
   "
   if (! aarch64_plus_operand (operands[2], VOIDmode))
     {
-      rtx subtarget = ((optimize && can_create_pseudo_p ())
-                       ? gen_reg_rtx (mode) : operands[0]);
       HOST_WIDE_INT imm = INTVAL (operands[2]);
-
-      if (imm < 0)
-        imm = -(-imm & ~0xfff);
+      if (aarch64_move_imm (imm, mode)
+          && can_create_pseudo_p ())
+      {
+        rtx tmp = gen_reg_rtx (mode);
+        emit_move_insn (tmp, operands[2]);
+        operands[2] = tmp;
+      }
       else
-        imm &= ~0xfff;
+      {
+        rtx subtarget = ((optimize && can_create_pseudo_p ())
+                         ? gen_reg_rtx (mode) : operands[0]);
+
+        if (imm < 0)
+          imm = -(-imm & ~0xfff);
+        else
+          imm &= ~0xfff;
 
-      emit_insn (gen_add3 (subtarget, operands[1], GEN_INT (imm)));
-      operands[1] = subtarget;
-      operands[2] = GEN_INT (INTVAL (operands[2]) - imm);
+        emit_insn (gen_add3 (subtarget, operands[1], GEN_INT (imm)));
+        operands[1] = subtarget;
+        operands[2] = GEN_INT (INTVAL (operands[2]) - imm);
+      }
     }
   "
 )
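The fallback path in the patch, the one used when the immediate cannot be materialised with a single move, splits the constant into two addends. The sketch below mirrors that arithmetic on plain integers; `split_add_immediate` is a made-up name, and `long long` stands in for HOST_WIDE_INT as an assumption:

```c
/* Sketch of the split used in the fallback path: HIGH is IMM with the low
   12 bits of its magnitude cleared, LOW is what remains, so that
   HIGH + LOW == IMM.  */
void
split_add_immediate (long long imm, long long *high, long long *low)
{
  if (imm < 0)
    *high = -(-imm & ~0xfffLL);
  else
    *high = imm & ~0xfffLL;

  *low = imm - *high;
}
```

For example, 0x12345 splits into 0x12000 and 0x345, and -0x12345 into -0x12000 and -0x345; the expander emits the first add with the high part and leaves the low part as the new operand.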
Re: [PATCH][doc] Improve pipeline description docs a bit
On 04/20/2015 04:31 AM, Kyrill Tkachov wrote: Hi all, This patch attempts to improve the pipeline description documentation. It fixes some grammar errors,typos and clarifies some concepts. The sections on the syntactic constructs are formatted to have a small description, and example, description of syntax elements and some elaboration. Is this ok for trunk? Thanks, Kyrill 2014-04-20 Kyrylo Tkachov * doc/md.texi (Specifying processor pipeline description): Improve wording. Clarify some constructs. H. I guess overall this is an improvement, but I still see quite a few things that need tweaking (and I wasn't even looking very hard). +latency time}. Instructions may not complete execution until all inputs +to the instruction have been evaluated and are available for use. +Taking data dependence delays into account is simple. I don't think the above sentence adds anything and could be deleted. +The data dependence (true, output, and anti-dependence) delay between two +instructions is modelled as being constant. In most cases this approach is +adequate. The second kind of interlock delays is a reservation delay. +The reservation delay means that two or more executing instructions will require s/will require/require/ + +The define_automaton construct declares the names of automata. +It takes the following form: @smallexample (define_automaton @var{automata-names}) @end smallexample @var{automata-names} is a string giving names of the automata. The -names are separated by commas. All the automata should have unique names. -The automaton name is used in the constructions @code{define_cpu_unit} and -@code{define_query_cpu_unit}. +names are separated by commas. All the automata must have unique names. +The automaton name is used to bind @code{define_cpu_unit} and +@code{define_query_cpu_unit} constructs to specific automata. + +This construct declares the names of automata. You already said that a few sentences above; delete this one. 
+The define_query_cpu_unit construct can be used to define units Add @code{} markup here. -@var{default_latency} is a number giving latency time of the +@var{default_latency} is a number giving the latency of the instruction. There is an important difference between the old description and the automaton based pipeline description. The latency -time is used for all dependencies when we use the old description. In -the automaton based pipeline description, the given latency time is only -used for true dependencies. The cost of anti-dependencies is always -zero and the cost of output dependencies is the difference between -latency times of the producing and consuming insns (if the difference -is negative, the cost is considered to be zero). You can always -change the default costs for any description by using the target hook +is used for all types of dependencies when we used the old description. In +the automaton based pipeline description, the latency is only taken into +account when analysing true dependencies (i.e. not output or +anti-dependencies). The cost of anti-dependencies is always zero and the +cost of output dependencies is the difference between the latencies +of the producing and consuming insns (if the difference is negative, the +cost is considered to be zero). You can always change the default cost +between any pair of insns by using the target hook @code{TARGET_SCHED_ADJUST_COST} (@pxref{Scheduling}). Here I am confused. What is the "old description"? If this is a leftover of some obsolete way of doing things, the references to it should be deleted. +construct. You must avoid having more than one +@code{define_insn_reservation} matching any one RTL insn, as the behaviour is s/behaviour/behavior/ +The following construct is used to describe a bypass i.e. an exception +in the execution latency between a pair of instructions: @dfn{bypass} ?? 
@var{guard} is an optional string giving the name of a C function which -defines an additional guard for the bypass. The function will get the +defines an additional guard for the bypass. The function will take the two insns as parameters. If the function returns zero the bypass will be ignored for this case. The additional guard is necessary to s/will take/takes/ s/will be ignored/is ignored/ +If there is more one bypass with the same output and input insns, the +chosen bypass is the first bypass with a guard function in its definition that +returns nonzero. If there is no such bypass, then a bypass without a guard +function is chosen. These constructs can be used to describe, for example, +forwarding paths in a processor pipeline. I don't understand what the last sentence has to do with the rest of this paragraph. If this is part of the general discussion of what define_bypass does, it should be moved up to the paragraph where the concept of a bypass is introduced. -@var{unit-names} is a string giving names o
[WIP] OpenMP 4 NVPTX support
Hi! Attached is a minimal patch to get at least a trivial OpenMP 4.0 testcase offloading to NVPTX (the first patch). The second patch is WIP, just first few needed changes to make libgomp to build for NVPTX (several weeks of work at least). The following seems to work and the output suggests that it was offloaded to a non-SHM arch: int main () { int v = 0; int *w = 0; int x = 0; #pragma omp target { v = 6; w = &v; x = 1; // omp_is_initial_device (); } __builtin_printf ("%d %p %p %d\n", v, &v, w, x); return 0; } but already tiny bit more complicated testcase: extern void *malloc (__SIZE_TYPE__); extern void free (void *); int main () { int v = 0; int *w = 0; int x = 0; #pragma omp target { v = 6; w = &v; char *p = malloc (64); x = 1; // omp_is_initial_device (); free (p); } __builtin_printf ("%d %p %p %d\n", v, &v, w, x); return 0; } suggests that while it is nice that when building nvptx accel compiler we build libgcc.a, libc.a, libm.a, libgfortran.a (and in the future hopefully libgomp.a), nothing attempts to link those in :(. Is the plan to link those in at mkoffload time (haven't seen any attempt of mkoffload to invoke the nvptx-none-ld linker though), or link those in somehow at link_ptx time in the plugin? In either case, it isn't clear to me how things will work (if at all) in the case where multiple shared libraries (or executable and at least one shared library) have their own offloading bits, and if you try to e.g. call an offloaded function defined in the shared library from an offloaded kernel in the executable, because if any library needs some global singleton case, if it is linked multiple times, no idea what the PTX JIT will do. Once that is resolved, another thing will be to figure out how to efficiently implement the TLS libgomp needs for its ICVs and other state - right now it uses either __thread, or pthread_getspecific, neither of these is usable of course. 
I've been thinking about an array of those structures in .shared memory indexed by %tid.x, but I guess that runs into the issue that the array would need to be declared fixed size and there is a very small size limitation on .shared memory size. So perhaps a file scope .shared pointer to global memory, where whomever launches an OpenMP 4.0 kernel (either the libgomp-plugin-nvptx.so.1 doing GOMP_run, or later on dynamic parallelism from GOMP_target in the nvptx libgomp.a) allocates the memory and some wrapper sets the .shared variable to that allocated memory, then calls the kernel? Jakub --- libgomp/plugin/plugin-nvptx.c.jj2015-04-21 08:38:00.0 +0200 +++ libgomp/plugin/plugin-nvptx.c 2015-04-21 16:55:25.247470080 +0200 @@ -978,8 +978,8 @@ event_add (enum ptx_event_type type, CUe void nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs, - size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers, - int vector_length, int async, void *targ_mem_desc) + size_t *sizes, unsigned short *kinds, int num_gangs, + int num_workers, int vector_length, int async, void *targ_mem_desc) { struct targ_fn_descriptor *targ_fn = (struct targ_fn_descriptor *) fn; CUfunction function; @@ -1137,7 +1137,6 @@ nvptx_host2dev (void *d, const void *h, CUresult r; CUdeviceptr pb; size_t ps; - struct nvptx_thread *nvthd = nvptx_thread (); if (!s) return 0; @@ -1162,7 +1161,8 @@ nvptx_host2dev (void *d, const void *h, GOMP_PLUGIN_fatal ("invalid size"); #ifndef DISABLE_ASYNC - if (nvthd->current_stream != nvthd->ptx_dev->null_stream) + struct nvptx_thread *nvthd = nvptx_thread (); + if (nvthd && nvthd->current_stream != nvthd->ptx_dev->null_stream) { CUevent *e; @@ -1202,7 +1202,6 @@ nvptx_dev2host (void *h, const void *d, CUresult r; CUdeviceptr pb; size_t ps; - struct nvptx_thread *nvthd = nvptx_thread (); if (!s) return 0; @@ -1227,7 +1226,8 @@ nvptx_dev2host (void *h, const void *d, GOMP_PLUGIN_fatal ("invalid size"); #ifndef DISABLE_ASYNC - if 
(nvthd->current_stream != nvthd->ptx_dev->null_stream) + struct nvptx_thread *nvthd = nvptx_thread (); + if (nvthd && nvthd->current_stream != nvthd->ptx_dev->null_stream) { CUevent *e; @@ -1559,7 +1559,8 @@ GOMP_OFFLOAD_get_name (void) unsigned int GOMP_OFFLOAD_get_caps (void) { - return GOMP_OFFLOAD_CAP_OPENACC_200; + return GOMP_OFFLOAD_CAP_OPENACC_200 +| GOMP_OFFLOAD_CAP_OPENMP_400; } int @@ -1759,7 +1760,7 @@ GOMP_OFFLOAD_openacc_parallel (void (*fn void *targ_mem_desc) { nvptx_exec (fn, mapnum, hostaddrs, devaddrs, sizes, kinds, num_gangs, - num_workers, vector_length, async, targ_mem_desc); + num_workers, vector_length, async, targ_mem_desc); } void @@ -1889,3 +1890,27 @@ GOMP_OFFLOAD_openacc_set_cuda_stream (in { return nvptx_set_cuda_stream (async, stream); } + +void
Re: [PATCH 00/12] Reduce conditional compilation
On Tue, Apr 21, 2015 at 07:57:19AM -0600, Jeff Law wrote:
> On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> >From: Trevor Saunders
> >
> >Hi,
> >
> >This is a first round of patches to reduce the amount of code within #if /
> >#ifdef. This makes it incrementally easier to not break configs other than
> >the one being built, and moves things slightly closer to using target hooks
> >for everything.
> >
> >each commit bootstrapped and regtested on x86_64-linux-gnu without
> >regression, and whole patch set run through config-list.mk without issue, ok?
> So I think after looking at this patchset, any changes of a similar nature
> you want to make should be considered pre-approved. Just post them for
> archival purposes, but no need for you to wait for review as long as they
> have the same purpose and overall structure as was seen in these patches.

Thanks! It's also always nice to have someone double check your logic :-)

Trev

>
> jeff
>
Re: [PATCH 02/12] remove some ifdef HAVE_cc0
On Tue, Apr 21, 2015 at 04:14:01PM +0200, Richard Biener wrote: > On Tue, Apr 21, 2015 at 3:24 PM, wrote: > > From: Trevor Saunders > > > > gcc/ChangeLog: > > > > 2015-04-21 Trevor Saunders > > > > * conditions.h: Define macros even if HAVE_cc0 is undefined. > > * emit-rtl.c: Define functions even if HAVE_cc0 is undefined. > > * final.c: Likewise. > > * jump.c: Likewise. > > * recog.c: Likewise. > > * recog.h: Declare functions even when HAVE_cc0 is undefined. > > * sched-deps.c (sched_analyze_2): Always compile case for cc0. > > --- > > gcc/conditions.h | 6 -- > > gcc/emit-rtl.c | 2 -- > > gcc/final.c | 2 -- > > gcc/jump.c | 3 --- > > gcc/recog.c | 2 -- > > gcc/recog.h | 2 -- > > gcc/sched-deps.c | 5 +++-- > > 7 files changed, 3 insertions(+), 19 deletions(-) > > > > diff --git a/gcc/conditions.h b/gcc/conditions.h > > index 2308bfc..7cd1e1c 100644 > > --- a/gcc/conditions.h > > +++ b/gcc/conditions.h > > @@ -20,10 +20,6 @@ along with GCC; see the file COPYING3. If not see > > #ifndef GCC_CONDITIONS_H > > #define GCC_CONDITIONS_H > > > > -/* None of the things in the files exist if we don't use CC0. */ > > - > > -#ifdef HAVE_cc0 > > - > > /* The variable cc_status says how to interpret the condition code. > > It is set by output routines for an instruction that sets the cc's > > and examined by output routines for jump instructions. > > @@ -117,6 +113,4 @@ extern CC_STATUS cc_status; > > (cc_status.flags = 0, cc_status.value1 = 0, cc_status.value2 = 0, \ > >CC_STATUS_MDEP_INIT) > > > > -#endif > > - > > #endif /* GCC_CONDITIONS_H */ > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c > > index 483eacb..c1974bb 100644 > > --- a/gcc/emit-rtl.c > > +++ b/gcc/emit-rtl.c > > @@ -3541,7 +3541,6 @@ prev_active_insn (rtx uncast_insn) > >return insn; > > } > > > > -#ifdef HAVE_cc0 > > /* Return the next insn that uses CC0 after INSN, which is assumed to > > set it. 
This is the inverse of prev_cc0_setter (i.e., prev_cc0_setter > > applied to the result of this function should yield INSN). > > @@ -3589,7 +3588,6 @@ prev_cc0_setter (rtx uncast_insn) > > > >return insn; > > } > > -#endif > > > > #ifdef AUTO_INC_DEC > > /* Find a RTX_AUTOINC class rtx which matches DATA. */ > > diff --git a/gcc/final.c b/gcc/final.c > > index 1fa93d9..41f6bd9 100644 > > --- a/gcc/final.c > > +++ b/gcc/final.c > > @@ -191,7 +191,6 @@ static rtx last_ignored_compare = 0; > > > > static int insn_counter = 0; > > > > -#ifdef HAVE_cc0 > > /* This variable contains machine-dependent flags (defined in tm.h) > > set and examined by output routines > > that describe how to interpret the condition codes properly. */ > > @@ -202,7 +201,6 @@ CC_STATUS cc_status; > > from before the insn. */ > > > > CC_STATUS cc_prev_status; > > -#endif > > > > /* Number of unmatched NOTE_INSN_BLOCK_BEG notes we have seen. */ > > > > diff --git a/gcc/jump.c b/gcc/jump.c > > index 34b3b7b..bc91550 100644 > > --- a/gcc/jump.c > > +++ b/gcc/jump.c > > @@ -1044,8 +1044,6 @@ jump_to_label_p (const rtx_insn *insn) > > && JUMP_LABEL (insn) != NULL && !ANY_RETURN_P (JUMP_LABEL > > (insn))); > > } > > > > -#ifdef HAVE_cc0 > > - > > /* Return nonzero if X is an RTX that only sets the condition codes > > and has no side effects. */ > > > > @@ -1094,7 +1092,6 @@ sets_cc0_p (const_rtx x) > > } > >return 0; > > } > > -#endif > > > > /* Find all CODE_LABELs referred to in X, and increment their use > > counts. If INSN is a JUMP_INSN and there is at least one > > diff --git a/gcc/recog.c b/gcc/recog.c > > index a9d3b1f..c3ad86f 100644 > > --- a/gcc/recog.c > > +++ b/gcc/recog.c > > @@ -971,7 +971,6 @@ validate_simplify_insn (rtx insn) > >return ((num_changes_pending () > 0) && (apply_change_group () > 0)); > > } > > > > -#ifdef HAVE_cc0 > > /* Return 1 if the insn using CC0 set by INSN does not contain > > any ordered tests applied to the condition codes. > > EQ and NE tests do not count. 
*/ > > @@ -988,7 +987,6 @@ next_insn_tests_no_inequality (rtx insn) > >return (INSN_P (next) > > && ! inequality_comparisons_p (PATTERN (next))); > > } > > -#endif > > > > /* Return 1 if OP is a valid general operand for machine mode MODE. > > This is either a register reference, a memory reference, > > diff --git a/gcc/recog.h b/gcc/recog.h > > index 45ea671..8a38b26 100644 > > --- a/gcc/recog.h > > +++ b/gcc/recog.h > > @@ -112,9 +112,7 @@ extern void validate_replace_rtx_group (rtx, rtx, rtx); > > extern void validate_replace_src_group (rtx, rtx, rtx); > > extern bool validate_simplify_insn (rtx insn); > > extern int num_changes_pending (void); > > -#ifdef HAVE_cc0 > > extern int next_insn_tests_no_inequality (rtx); > > -#endif > > extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode); > > > > extern int offsettable_memref_p (rtx); > > diff --git
Re: [PATCH 03/12] more removal of ifdef HAVE_cc0
On Tue, Apr 21, 2015 at 07:51:14AM -0600, Jeff Law wrote:
> On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> >From: Trevor Saunders
> >
> >gcc/ChangeLog:
> >
> >2015-04-21 Trevor Saunders
> >
> >    * combine.c (find_single_use): Remove HAVE_cc0 ifdef for code
> >    that is trivially dead on non cc0 targets.
> >    (simplify_set): Likewise.
> >    (mark_used_regs_combine): Likewise.
> >    * cse.c (new_basic_block): Likewise.
> >    (fold_rtx): Likewise.
> >    (cse_insn): Likewise.
> >    (cse_extended_basic_block): Likewise.
> >    (set_live_p): Likewise.
> >    * rtlanal.c (canonicalize_condition): Likewise.
> >    * simplify-rtx.c (simplify_binary_operation_1): Likewise.
> OK. I find myself wondering if the conditionals should look like
> if (HAVE_cc0
>     && (whatever))
>
> But I doubt it makes any measurable difference. It's something we can
> always add in the future if we feel the need to avoid the runtime checks for
> things that aren't ever going to happen on most modern targets.

Yeah, it seems reasonably likely the branch predictor can deal with this for us (I tried to ensure things handled this way didn't do much other than a compare). If not, well, that's what profiling is for :-)

Trev

>
> jeff
>
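For readers following along, here is a minimal sketch of the conditional style Jeff describes. It assumes HAVE_cc0 is always defined to 0 or 1, which is what patch 04/12 of this series sets out to arrange; the fallback define, the function name and its argument are hypothetical stand-ins, not code from combine.c or cse.c.

```c
/* Assumption: the build always provides HAVE_cc0 as 0 or 1.
   The fallback below only exists to keep this sketch standalone.  */
#ifndef HAVE_cc0
#define HAVE_cc0 0
#endif

/* Hypothetical stand-in for a pass helper that has a cc0-only path.  */
int
takes_cc0_path (int insn_sets_cc0)
{
  /* The cc0 branch is parsed and type-checked on every target, so it
     cannot silently rot.  With HAVE_cc0 == 0 the condition is a
     compile-time constant and the compiler is free to drop it.  */
  if (HAVE_cc0 && insn_sets_cc0)
    return 1;

  return 0;
}
```

With the fallback value of 0, `takes_cc0_path` returns 0 for any argument, which is the "never happens on most modern targets" case discussed above.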
Re: [PATCH 04/12] always define HAVE_cc0
On Tue, Apr 21, 2015 at 07:53:05AM -0600, Jeff Law wrote:
> On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> >From: Trevor Saunders
> >
> >gcc/ChangeLog:
> >
> >2015-04-21 Trevor Saunders
> >
> >    * genconfig.c (main): Always define HAVE_cc0.
> >    * caller-save.c (insert_one_insn): Change ifdef HAVE_cc0 to #if
> >    HAVE_cc0.
> >    * cfgcleanup.c (flow_find_cross_jump): Likewise.
> >    (flow_find_head_matching_sequence): Likewise.
> >    (try_head_merge_bb): Likewise.
> >    * cfgrtl.c (rtl_merge_blocks): Likewise.
> >    (try_redirect_by_replacing_jump): Likewise.
> >    (rtl_tidy_fallthru_edge): Likewise.
> >    * combine.c (do_SUBST_MODE): Likewise.
> >    (insn_a_feeds_b): Likewise.
> >    (combine_instructions): Likewise.
> >    (can_combine_p): Likewise.
> >    (try_combine): Likewise.
> >    (find_split_point): Likewise.
> >    (subst): Likewise.
> >    (simplify_set): Likewise.
> >    (distribute_notes): Likewise.
> >    * cprop.c (cprop_jump): Likewise.
> >    * cse.c (cse_extended_basic_block): Likewise.
> >    * df-problems.c (can_move_insns_across): Likewise.
> >    * final.c (final): Likewise.
> >    (final_scan_insn): Likewise.
> >    * function.c (emit_use_return_register_into_block): Likewise.
> >    * gcse.c (insert_insn_end_basic_block): Likewise.
> >    * haifa-sched.c (sched_init): Likewise.
> >    * ira.c (find_moveable_pseudos): Likewise.
> >    * loop-invariant.c (find_invariant_insn): Likewise.
> >    * lra-constraints.c (curr_insn_transform): Likewise.
> >    * optabs.c (prepare_cmp_insn): Likewise.
> >    * postreload.c (reload_combine_recognize_const_pattern):
> >    Likewise.
> >    * reload.c (find_reloads): Likewise.
> >    (find_reloads_address_1): Likewise.
> >    * reorg.c (delete_scheduled_jump): Likewise.
> >    (steal_delay_list_from_target): Likewise.
> >    (steal_delay_list_from_fallthrough): Likewise.
> >    (try_merge_delay_insns): Likewise.
> >    (redundant_insn): Likewise.
> >    (fill_simple_delay_slots): Likewise.
> >    (fill_slots_from_thread): Likewise.
> >    (delete_computation): Likewise.
> >    (relax_delay_slots): Likewise.
> >    * sched-deps.c (sched_analyze_2): Likewise.
> >    * sched-rgn.c (add_branch_dependences): Likewise.
> Doesn't go as far as I'd like, but it's still an improvement.

Yeah, this one really just enables other nice things. I really dislike big patches, since there's invariably something wrong somewhere, and if you don't really know the code in question it can be next to impossible to figure out where the problem is.

Trev

>
> OK.
>
> jeff
>
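A short illustration of why every `#ifdef HAVE_cc0` has to become `#if HAVE_cc0` once the macro is always defined: `#ifdef` only asks whether the name exists, so it would start succeeding on non-cc0 targets. The sketch assumes (this is not taken from the patch) that the generated config header defines the macro to 0 on such targets; the `*_TAKEN` macros and the two functions are made up for the demonstration.

```c
/* Assumption: what a generated config header might contain on a
   target without cc0.  */
#define HAVE_cc0 0

/* Old style: true whenever the macro exists, even with value 0.  */
#ifdef HAVE_cc0
#define OLD_STYLE_TAKEN 1
#else
#define OLD_STYLE_TAKEN 0
#endif

/* New style: tests the value of the macro.  */
#if HAVE_cc0
#define NEW_STYLE_TAKEN 1
#else
#define NEW_STYLE_TAKEN 0
#endif

int old_style_taken (void) { return OLD_STYLE_TAKEN; }
int new_style_taken (void) { return NEW_STYLE_TAKEN; }
```

Here the `#ifdef` arm is selected even though the target has no cc0, while the `#if` arm is not, which is why the two changes have to land together.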
RE: [PATCH 6/13] mips musl support
Rich Felker writes:
> On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
> > Szabolcs Nagy writes:
> > > Set up dynamic linker name for mips.
> > >
> > > gcc/Changelog:
> > >
> > > 2015-04-16 Gregor Richards
> > >
> > > * config/mips/linux.h (MUSL_DYNAMIC_LINKER): Define.
> >
> > I understand that mips musl is o32 only currently is that correct?
>
> This is correct. Other ABIs if/when we support them will have different
> names.
>
> > There does however appear to be both soft and hard float variants
> > listed in the musl docs. Do you plan on using the same dynamic linker
> > name for both float variants? No problem if so but someone must have
> > decided to have unique names for big and little endian so I thought it
> > worth checking.
>
> No, it's supposed to be different (-sf suffix for soft float; see
> arch/mips/reloc.h in musl source). If this didn't make it into the
> patches it's an omission, probably because we didn't officially support
> the sf ABI at all for a long time.
>
> > Also, are you aware of the two nan encoding formats that MIPS has and
> > the support present in glibc's dynamic linker to deal with it?
>
> I am aware but somewhat skeptical of treating it as yet another
> dimension to ABI and the resulting ABI combinatorics. The vast majority
> of programs couldn't care less which is which and whether a NAN is
> quiet or signaling. Officially we just use the classic mips ABI (with
> qnan/snan swapped vs other archs) but there's no harm in somebody doing
> the opposite if they really know what they're doing.

Couldn't agree more here but I know some people have been concerned about it so the strict rules were put in place. I will attempt to remember and copy the musl list when putting out a plan for formally relaxing the nan encoding rules. The proposal is probably less than 2 weeks away from being ready to review, it does of course make certain assumptions originating from glibc as reference but is an independent ABI proposal.

> > I wonder if it would be wise to refuse to target musl unless the ABI
> > is known to be supported so that we can avoid compatibility issues
> > when different ABI variants are added in musl.
>
> Possibly, though this might make bootstrapping new ABIs harder.

Indeed. The other alternative would be to set the dynamic linker name to something slightly silly for unsupported ABIs like /lib/fixme.so which would make it possible to bootstrap via the addition of a symlink but it is clearly not the approved name.

thanks,
Matthew
Re: [PATCH 3/13] aarch64 musl support
On 21/04/15 15:16, pins...@gmail.com wrote:
>
> I don't think you need to check if defaulting to little or big-endian here
> as the specs always have one or the other passing through.
>

I was not aware of this. Maybe the ifdef is not necessary for other archs either; I will check.

> Also if musl does not support ilp32, you might want to error out. Or even
> define the dynamic linker name even before support goes into musl.
>

OK, I guess adding %{mabi=ilp32:_ilp32} won't hurt us.
[patch, avr] extend part-clobbered check to AVR_TINY architecture
Hi,

When I tried backporting AVR_TINY architecture support to 4.9, the build failed in libgcc for AVR_TINY. The failure was due to the same ICE as in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53065

The fix provided for that bug checks whether the mode crosses the callee-saved register boundary. The patch below updates that check, as AVR_TINY has a different set of callee-saved registers (r18 and r19).

This patch is against trunk. NOTE: the ICE is reproducible only with 4.9 + tiny patch and --with-dwarf2 enabled.

Is this OK for trunk?

diff --git a/gcc/config/avr/avr.c b/gcc/config/avr/avr.c
index 68d5ddc..2f441e5 100644
--- a/gcc/config/avr/avr.c
+++ b/gcc/config/avr/avr.c
@@ -11333,9 +11333,10 @@ avr_hard_regno_call_part_clobbered (unsigned regno, machine_mode mode)
     return 0;

   /* Return true if any of the following boundaries is crossed:
-     17/18, 27/28 and 29/30.  */
+     17/18 or 19/20 (if AVR_TINY), 27/28 and 29/30.  */

-  return ((regno < 18 && regno + GET_MODE_SIZE (mode) > 18)
+  return ((regno <= LAST_CALLEE_SAVED_REG &&
+           regno + GET_MODE_SIZE (mode) > (LAST_CALLEE_SAVED_REG + 1))
           || (regno < REG_Y && regno + GET_MODE_SIZE (mode) > REG_Y)
           || (regno < REG_Z && regno + GET_MODE_SIZE (mode) > REG_Z));
 }

Regards,
Pitchumani
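To make the boundary test concrete, here is a standalone sketch of the first clause of the check. It is not the avr.c code: `crosses_callee_saved_boundary` and its plain-integer parameters are hypothetical stand-ins for `regno`, `GET_MODE_SIZE (mode)` and `LAST_CALLEE_SAVED_REG`, and the value 19 for AVR_TINY is only inferred from the 19/20 boundary named in the patch comment.

```c
/* Return nonzero if a value of SIZE bytes that starts in register
   REGNO straddles the boundary just above LAST_SAVED.
   Hypothetical helper mirroring the first clause of the patch.  */
int
crosses_callee_saved_boundary (unsigned regno, unsigned size,
                               unsigned last_saved)
{
  return regno <= last_saved && regno + size > last_saved + 1;
}
```

With `last_saved` set to 17 this is the same test as the original `regno < 18 && regno + size > 18`; with 19 it moves the boundary to 19/20, so for example a 4-byte value starting in r18 counts as part-clobbered.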
Re: [PATCH] 65479 - sanitizer stack trace missing frames past #0 on powerpc64
On 04/21/2015 06:39 AM, Peter Bergner wrote:
> On Tue, 2015-04-21 at 08:22 +0200, Jakub Jelinek wrote:
>>> -#if defined(__powerpc__) || defined(__powerpc64__)
>>> -  // PCs are always 4 byte aligned.
>>> -  return pc - 4;
>>> -#elif defined(__sparc__) || defined(__mips__)
>>> -  return pc - 8;
>>
>> The SPARC/MIPS case is of course needed, because on these architectures
>> the call is followed by a delay slot. But I wonder why you need anything
>> special on any other architecture, why pc - 1 isn't good enough for those.
>> The point isn't to find a PC of the call instruction, on some targets that
>> is very hard and you need to disassemble, but to just find some byte in the
>> call instruction.
>
> I wrote the "pc - 4" code for powerpc* and I guess I was just being
> pedantic on returning the first address of the instruction. If using
> "pc - 1" works, then I'm fine with that.

It works fine with the patch and produces sensible output because the decremented address is only used to look up the debug info and restored before it's output. Otherwise (with the unpatched code) we'd end up with odd PC addresses in the stack trace.

Martin

>
> Peter
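A hedged sketch of the approach discussed in this exchange: step back from the return address just far enough to land on some byte of the call instruction, and use that address only for the debug-info lookup. The enum and the function are hypothetical, not libsanitizer's API; the offsets are the ones quoted in the thread (8 for SPARC/MIPS because of the delay slot, 1 elsewhere).

```c
#include <stdint.h>

/* Hypothetical architecture selector for this sketch.  */
enum arch_kind { ARCH_GENERIC, ARCH_SPARC_OR_MIPS };

/* Map a return address to an address used only for symbolization.
   The caller keeps RET_PC itself for printing in the stack trace.  */
uintptr_t
lookup_pc_for_return_address (uintptr_t ret_pc, enum arch_kind arch)
{
  if (arch == ARCH_SPARC_OR_MIPS)
    return ret_pc - 8;   /* skip back over the delay slot and the call */

  return ret_pc - 1;     /* any byte inside the call insn is enough */
}
```

Keeping the adjusted value separate from the printed one is what avoids the odd-looking PC addresses Martin mentions.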
Re: [PATCH] PR 62173, re-shuffle insns for RTL loop invariant hoisting
Jiong Wang writes:

> 2015-04-14 18:24 GMT+01:00 Jeff Law :
>> On 04/14/2015 10:48 AM, Steven Bosscher wrote:
>>>> So I think this stage2/3 binary difference is acceptable?
>>>
>>>
>>> No, they should be identical. If there's a difference, then there's a
>>> bug - which, it seems, you've already found, too.
>>
>> Right. And so the natural question is how to fix.
>>
>> At first glance it would seem like having this new code ignore dependencies
>> rising from debug insns would work.
>>
>> Which then begs the question, what happens to the debug insn -- it's
>> certainly not going to be correct anymore if the transformation is made.
>
> Exactly.
>
> The debug_insn 2776 in my example is to record the base address of a
> local array. the new code is doing correctly here by not shuffling the
> operands of insn 2556 and 2557 as there is additional reference of
> reg:1473 from debug insn, although the code will still execute correctly
> if we do the transformation.
>
> my understanding to fix this:
>
> * delete the out-of-date mismatch debug_insn? as there is no guarantee
> to generate accurate debug info under -O2.
>
> IMO, this debug_insn may affect "DW_AT_location" field for variable
> description of "classes" in .debug_info section, but it's omitted in
> the final output already.
>
> <3><38a4d>: Abbrev Number: 137 (DW_TAG_variable)
> <38a4f> DW_AT_name : (indirect string, offset: 0x18db): classes
> <38a53> DW_AT_decl_file : 1
> <38a54> DW_AT_decl_line : 548
> <38a56> DW_AT_type: <0x38cb4>
>
> * update the debug_insn? if the following change is OK with dwarf standard
>
> from
>
> insn0: reg0 = fp + reg1
> debug_insn: var_loc = reg0 + const_off
> insn1: reg2 = reg0 + const_off
>
> to
>
> insn0: reg0 = fp + const_off
> debug_insn: var_loc = reg0 + reg1
> insn1: reg2 = reg0 + reg1
>
> Thanks,
>

Attached is the new patch, which updates the debug_insn as described in the second solution above. The stage2/3 binary differences on AArch64 are now gone. Bootstrap OK.
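As a side note for readers, the reshuffle in the second option relies on address arithmetic being reassociable: (fp + reg1) + const_off equals (fp + const_off) + reg1, and the second form exposes fp + const_off as a value that does not change inside the loop. The sketch below is plain C with hypothetical names, written under the assumption that reg1 is the part that varies per iteration; the real pass works on RTL, not on C source.

```c
#include <stdint.h>

/* Before: the first add depends on the varying index, so nothing
   computed here can be hoisted out of a loop over IDX.  */
uintptr_t
addr_before (uintptr_t fp, uintptr_t idx, uintptr_t const_off)
{
  uintptr_t reg0 = fp + idx;         /* insn0: reg0 = fp + reg1 */
  return reg0 + const_off;           /* insn1: reg2 = reg0 + const_off */
}

/* After: fp + const_off no longer depends on IDX, so a loop over IDX
   could compute it once up front.  */
uintptr_t
addr_after (uintptr_t fp, uintptr_t idx, uintptr_t const_off)
{
  uintptr_t reg0 = fp + const_off;   /* insn0: reg0 = fp + const_off */
  return reg0 + idx;                 /* insn1: reg2 = reg0 + reg1 */
}
```

Both functions return the same address for the same inputs; the debug_insn has to be rewritten in step because reg0 now holds a different intermediate value.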
On AArch64, this patch gives 600+ new RTL loop invariants found across spec2k6 float, and a +4.5% perf improvement on 436.cactusADM because four new invariants are found in the critical function "regex_compile". Similar improvements may be achievable on other RISC backends like powerpc/mips, I guess.

One thing to mention: for AArch64, one minor glitch in aarch64_legitimize_address needs to be fixed to let this patch take effect. I will send out that patch later as it's a separate issue.

Powerpc/Mips don't have this glitch in the LEGITIMIZE_ADDRESS hook, so should be OK, and I verified that the base address of the local array in the testcase given by Seb on pr62173 is hoisted on ppc64 now. I think pr62173 is fixed on those 64-bit archs by this patch.

Thoughts?

Thanks.

2015-04-21 Jiong Wang

gcc/
  * loop-invariant.c (find_defs): Enable DF_DU_CHAIN build.
  (vfp_const_iv): New hash table.
  (expensive_addr_check_p): New boolean.
  (init_inv_motion_data): Initialize new variables.
  (free_inv_motion_data): Release hash table.
  (create_new_invariant): Set cheap_address to false for iv in
  vfp_const_iv table.
  (find_invariant_insn): Skip dependencies check for iv in
  vfp_const_iv table.
  (use_for_single_du): New function.
  (reshuffle_insn_with_vfp): Likewise.
  (find_invariants_bb): Call reshuffle_insn_with_vfp.

gcc/testsuite/
  * gcc.dg/pr62173.c: New testcase.

--
Regards,
Jiong

diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c
index f79b497..f70dfb0 100644
--- a/gcc/loop-invariant.c
+++ b/gcc/loop-invariant.c
@@ -203,6 +203,8 @@ typedef struct invariant *invariant_p;
 /* The invariants. */
 static vec invariants;
+static hash_table > *vfp_const_iv;
+static bool need_expensive_addr_check_p;
 /* Check the size of the invariant table and realloc if necessary.
*/ @@ -695,7 +697,7 @@ find_defs (struct loop *loop) df_remove_problem (df_chain); df_process_deferred_rescans (); - df_chain_add_problem (DF_UD_CHAIN); + df_chain_add_problem (DF_UD_CHAIN + DF_DU_CHAIN); df_set_flags (DF_RD_PRUNE_DEAD_DEFS); df_analyze_loop (loop); check_invariant_table_size (); @@ -742,6 +744,9 @@ create_new_invariant (struct def *def, rtx_insn *insn, bitmap depends_on, See http://gcc.gnu.org/ml/gcc-patches/2009-10/msg01210.html . */ inv->cheap_address = address_cost (SET_SRC (set), word_mode, ADDR_SPACE_GENERIC, speed) < 3; + + if (need_expensive_addr_check_p && vfp_const_iv->find (insn)) + inv->cheap_address = false; } else { @@ -952,7 +957,8 @@ find_invariant_insn (rtx_insn *insn, bool always_reached, bool always_executed) return; depends_on = BITMAP_ALLOC (NULL); - if (!check_dependencies (insn, depends_on)) + if (!vfp_const_iv->find (insn) + && !check_dependencies (insn, depends_on)) { BITMAP_FREE (depends_on); return; @@ -1007,6 +1013,180 @@ find_invariants_insn (rtx_insn *in
Re: [PATCH 6/13] mips musl support
On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
> Szabolcs Nagy writes:
> > Set up dynamic linker name for mips.
> >
> > gcc/Changelog:
> >
> > 2015-04-16 Gregor Richards
> >
> > * config/mips/linux.h (MUSL_DYNAMIC_LINKER): Define.
>
> I understand that mips musl is o32 only currently is that correct?

This is correct. Other ABIs if/when we support them will have different names.

> There does however appear to be both soft and hard float variants
> listed in the musl docs. Do you plan on using the same dynamic linker
> name for both float variants? No problem if so but someone must have
> decided to have unique names for big and little endian so I thought
> it worth checking.

No, it's supposed to be different (-sf suffix for soft float; see arch/mips/reloc.h in musl source). If this didn't make it into the patches it's an omission, probably because we didn't officially support the sf ABI at all for a long time.

> Also, are you aware of the two nan encoding formats that MIPS has
> and the support present in glibc's dynamic linker to deal with it?

I am aware but somewhat skeptical of treating it as yet another dimension to ABI and the resulting ABI combinatorics. The vast majority of programs couldn't care less which is which and whether a NAN is quiet or signaling. Officially we just use the classic mips ABI (with qnan/snan swapped vs other archs) but there's no harm in somebody doing the opposite if they really know what they're doing.

> I wonder if it would be wise to refuse to target musl unless the
> ABI is known to be supported so that we can avoid compatibility
> issues when different ABI variants are added in musl.

Possibly, though this might make bootstrapping new ABIs harder.

Rich
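For illustration only, the naming scheme under discussion can be sketched as a small helper that assembles the o32 dynamic linker path from the endianness and float ABI. The function is hypothetical and is not how GCC does it (the patch uses spec strings in config/mips/linux.h); the "el" and "-sf" suffixes and the /lib/ld-musl-ARCH.so.1 pattern are assumed from this thread and from musl's arch/mips/reloc.h.

```c
#include <stdio.h>

/* Write the assumed musl dynamic linker path for 32-bit MIPS (o32)
   into BUF and return what snprintf returns.  Hypothetical helper:
   "el" marks little endian, "-sf" marks soft float.  */
int
musl_mips_ldso_name (char *buf, size_t len, int little_endian,
                     int soft_float)
{
  return snprintf (buf, len, "/lib/ld-musl-mips%s%s.so.1",
                   little_endian ? "el" : "",
                   soft_float ? "-sf" : "");
}
```

Each ABI dimension that gets its own suffix doubles the number of distinct names, which is the combinatorics concern raised above about adding the NaN encoding as yet another dimension.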
Re: [PATCH] 65479 - sanitizer stack trace missing frames past #0 on powerpc64
>> --- a/libsanitizer/ChangeLog
>> +++ b/libsanitizer/ChangeLog
>> @@ -1,3 +1,15 @@
>> +2015-04-19 Martin Sebor
>> +
>> + PR sanitizer/65479
>> + * libsanitizer/sanitizer_common/sanitizer_stacktrace.h
>> + (StackTrace::signaled, StackTrace::min_insn_bytes): New data members.
>> + (StackTrace::StackTrace): Initialize signaled.
>> + * libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
>> + (StackTrace::GetPreviousInstructionPc): Rewrite.
>> + * libsanitizer/sanitizer_common/sanitizer_stacktrace_libcdep.cc
>> + (StackTrace::Print): Use min_insn_bytes to adjust PC value.
>> + (BufferedStackTrace::Unwind): Set signaled.
>
> libsanitizer/ should not show up in the ChangeLog entry.
> But as somebody said earlier, the libsanitizer changes really should go to
> LLVM compiler-rt repo first and then be just backported, either
> cherry-picked (probably the case for the 5 branch backport later on) or go
> in full merge from compiler-rt.

Okay, let me submit the sanitizer changes there. Since the tests will continue to fail without it, the libbacktrace change can go in later if that's preferable.

>> --- a/libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
>> +++ b/libsanitizer/sanitizer_common/sanitizer_stacktrace.cc
>> @@ -15,19 +15,33 @@ namespace __sanitizer {
>> -uptr StackTrace::GetPreviousInstructionPc(uptr pc) {
>> -#if defined(__arm__)
>> - // Cancel Thumb bit.
>> - pc = pc & (~1);
>> -#endif
>
> Your code loses this, which is undesirable.

The original function fails to return the pc value on ARM so I just took it out. I didn't look into what the intent was but all the tests pass with the patch on aarch64 (after applying the Fedora gcc 5 patch you mentioned yesterday).

>> -#if defined(__powerpc__) || defined(__powerpc64__)
>> - // PCs are always 4 byte aligned.
>> - return pc - 4;
>> -#elif defined(__sparc__) || defined(__mips__)
>> - return pc - 8;
>
> The SPARC/MIPS case is of course needed, because on these architectures
> the call is followed by a delay slot. But I wonder why you need anything
> special on any other architecture, why pc - 1 isn't good enough for those.
> The point isn't to find a PC of the call instruction, on some targets that
> is very hard and you need to disassemble, but to just find some byte in the
> call instruction.

I forgot about the delay slot. Thanks for the reminder.

>> +const unsigned StackTrace::min_insn_bytes =
>> +#if defined __ia64__
>> +// Intel Itanium has 5 byte instructions.
>> +5
>
> E.g. this is wrong, ia64 doesn't have 5 byte instructions, but has VLIW
> bundles, where in the 16 byte bundle there are up to 3 41-bit instructions
> plus template. But, ia64 isn't supported by libsanitizer and I doubt there
> are enough users that would be interested in writing support for a dead
> architecture.

I suppose with the sanitizer output referencing the unmodified PC values on the stack the computation can be simplified to just subtract (and later add) 1 on all targets. Let me change that.

Martin
Re: [patch] [java] bump libgcj soname
On Tue, Apr 21, 2015 at 04:29:52PM +0200, Matthias Klose wrote:
> On 04/21/2015 04:19 PM, Jakub Jelinek wrote:
> > On Tue, Apr 21, 2015 at 04:16:18PM +0200, Matthias Klose wrote:
> >> On 04/21/2015 04:11 PM, Jakub Jelinek wrote:
> >>> On Tue, Apr 21, 2015 at 04:07:13PM +0200, Matthias Klose wrote:
> >>>> bump the libgcj soname on the trunk, as done for every release cycle,
> >>>
> >>> Is that really needed though these days?
> >>> Weren't there basically zero changes to libjava (both libjava and
> >>> libjava/classpath) in the last 2 or more years?
> >>> The few ones were mostly updating Copyright notices, minor configure
> >>> changes, but I really haven't seen anything ABI changing for quite a
> >>> while.
> >>
> >> yes, the GCC version is embedded in the GCJ_VERSIONED_LIBDIR
> >>
> >> which is defined as
> >>
> >> gcjsubdir=gcj-$gcjversion-$libgcj_soversion
> >> dbexecdir='$(toolexeclibdir)/'$gcjsubdir
> >
> > But why is that an argument for bumping it? If both GCC 5 and GCC 6 will
> > (likely) provide the same ABI in the library, there is no reason not to use
> > the same directory for those.
>
> but currently there are different directories used (gcjversion already changed
> on the trunk) and compiled into the library. Do you mean that gcjsubdir
> should be just defined as gcj?

Whatever depends on BASE-VER, sure, that is bumped automatically and should track the gcc version. But the soname is an unrelated number, and there is no point in bumping it. If you have a packaging issue, just solve it on the packaging side, but really there is no point in yearly bumping the soname of something that doesn't change at all (and has really been a dead project for many years).

Jakub
[PATCH][libstc++v3]Add new dg-require-thread-fence directive.
Hi all,

This patch defines a new dg-require-thread-fence directive, and three test cases are updated to use it.

The new directive is used to check whether the target supports a thread fence, either through the target back-end or through an external library function call. A thread fence is required to expand atomic load/store. In some cases a call to an external __sync_synchronize is emitted even though that function is not implemented, and you get linking errors like this: undefined reference to `__sync_synchronize`.

Test cases which are gated by this directive will be skipped if no thread fence is available. For example, the three test cases updated here fail on the arm-none-eabi target, where __sync_synchronize () isn't implemented and the target cpu has no memory_barrier.

__sync_synchronize () is used to check whether a thread fence is available. In GCC, sync_synchronize is expanded as expand_mem_thread_fence (MEMMODEL_SEQ_CST).

Okay to commit?

libstdc++-v3/ChangeLog:

2015-04-21 Renlin Li

  * testsuite/lib/dg-options.exp (dg-require-thread-fence): New.
  * testsuite/lib/libstdc++.exp (check_v3_target_thread_fence): New.
  * testsuite/29_atomics/atomic_flag/clear/1.cc: Use it.
  * testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc: Likewise.
  * testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc b/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
index 0a4219c..a6e2299 100644
--- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
@@ -1,4 +1,5 @@
 // { dg-options "-std=gnu++11" }
+// { dg-require-thread-fence "" }
 // Copyright (C) 2009-2015 Free Software Foundation, Inc.
// diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc b/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc index 2ff740b..0655be4 100644 --- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc +++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc @@ -1,4 +1,5 @@ // { dg-options "-std=gnu++11" } +// { dg-require-thread-fence "" } // Copyright (C) 2008-2015 Free Software Foundation, Inc. // diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc b/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc index 6ac20c0..a867da2 100644 --- a/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc +++ b/libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc @@ -1,4 +1,5 @@ // { dg-options "-std=gnu++11" } +// { dg-require-thread-fence "" } // Copyright (C) 2008-2015 Free Software Foundation, Inc. // diff --git a/libstdc++-v3/testsuite/lib/dg-options.exp b/libstdc++-v3/testsuite/lib/dg-options.exp index 38c8206..56ca896 100644 --- a/libstdc++-v3/testsuite/lib/dg-options.exp +++ b/libstdc++-v3/testsuite/lib/dg-options.exp @@ -115,6 +115,15 @@ proc dg-require-cmath { args } { return } +proc dg-require-thread-fence { args } { +if { ![ check_v3_target_thread_fence ] } { + upvar dg-do-what dg-do-what + set dg-do-what [list [lindex ${dg-do-what} 0] "N" "P"] + return +} +return +} + proc dg-require-atomic-builtins { args } { if { ![ check_v3_target_atomic_builtins ] } { upvar dg-do-what dg-do-what diff --git a/libstdc++-v3/testsuite/lib/libstdc++.exp b/libstdc++-v3/testsuite/lib/libstdc++.exp index b2f7d00..9e395e2 100644 --- a/libstdc++-v3/testsuite/lib/libstdc++.exp +++ b/libstdc++-v3/testsuite/lib/libstdc++.exp @@ -1221,6 +1221,62 @@ proc check_v3_target_cmath { } { return $et_c99_math } +proc check_v3_target_thread_fence { } { +global cxxflags +global DEFAULT_CXXFLAGS +global et_thread_fence + +global tool + 
+if { ![info exists et_thread_fence_target_name] } { + set et_thread_fence_target_name "" +} + +# If the target has changed since we set the cached value, clear it. +set current_target [current_target_name] +if { $current_target != $et_thread_fence_target_name } { + verbose "check_v3_target_thread_fence: `$et_thread_fence_target_name'" 2 + set et_thread_fence_target_name $current_target + if [info exists et_thread_fence] { + verbose "check_v3_target_thread_fence: removing cached result" 2 + unset et_thread_fence + } +} + +if [info exists et_thread_fence] { + verbose "check_v3_target_thread_fence: using cached result" 2 +} else { + set et_thread_fence 0 + + # Set up and preprocess a C++11 test program that depends + # on the thread fence to be available. + set src thread_fence[pid].cc + + set f [open $src "w"] + puts $f "int main() {" + puts $f "__sync_synchronize ();" + puts $f "return 0;" + puts $f "}" + close $f + + set cxxflags_saved $cxxflags + set cxxflags "$cxxflags $DEFAULT_CXXFLAGS -Werror -std=gnu++11" + + set lines [v3_target_compile $src /dev/null executable ""] + set cxxflags $cxxflag
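The probe that check_v3_target_thread_fence writes out boils down to "does a program that calls __sync_synchronize () link?". A minimal C rendering of that idea is sketched below; the function names are made up for the example, and the second function uses the newer __atomic builtin, which GCC documents as a sequentially consistent fence and which is shown here only for comparison, not because the patch uses it.

```c
/* Issue a full barrier through the legacy builtin that the probe
   program calls.  On a target with neither a barrier instruction nor
   a library fallback, linking this is what fails.  */
int
probe_sync_synchronize (void)
{
  __sync_synchronize ();
  return 0;
}

/* A comparable fence written with the __atomic builtin family.  */
int
probe_atomic_thread_fence (void)
{
  __atomic_thread_fence (__ATOMIC_SEQ_CST);
  return 0;
}
```

On a hosted target both functions compile, link and return 0; the directive exists for targets such as the arm-none-eabi configuration described in this message, where the link step is the part that breaks.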
Re: [patch] [java] bump libgcj soname
On 04/21/2015 04:19 PM, Jakub Jelinek wrote: > On Tue, Apr 21, 2015 at 04:16:18PM +0200, Matthias Klose wrote: >> On 04/21/2015 04:11 PM, Jakub Jelinek wrote: >>> On Tue, Apr 21, 2015 at 04:07:13PM +0200, Matthias Klose wrote: bump the libgcj soname on the trunk, as done for every release cycle, >>> >>> Is that really needed though these days? >>> Weren't there basically zero changes to libjava (both libjava and >>> libjava/classpath) in the last 2 or more years? >>> The few ones were mostly updating Copyright notices, minor configure >>> changes, but I really haven't seen anything ABI changing for quite a while. >> >> yes, the GCC version is embedded in the GCJ_VERSIONED_LIBDIR >> >> which is defined as >> >> gcjsubdir=gcj-$gcjversion-$libgcj_soversion >> dbexecdir='$(toolexeclibdir)/'$gcjsubdir > > But why is that an argument for bumping it? If both GCC 5 and GCC 6 will > (likely) provide the same ABI in the library, there is no reason not to use > the same directory for those. but currently there are different directories used (gcjversion already changed on the trunk) and compiled into the library. Do you mean that gcjsubdir should be just defined as gcj? Matthias
Re: [PATCH 02/12] remove some ifdef HAVE_cc0
On Tue, Apr 21, 2015 at 3:24 PM, wrote: > From: Trevor Saunders > > gcc/ChangeLog: > > 2015-04-21 Trevor Saunders > > * conditions.h: Define macros even if HAVE_cc0 is undefined. > * emit-rtl.c: Define functions even if HAVE_cc0 is undefined. > * final.c: Likewise. > * jump.c: Likewise. > * recog.c: Likewise. > * recog.h: Declare functions even when HAVE_cc0 is undefined. > * sched-deps.c (sched_analyze_2): Always compile case for cc0. > --- > gcc/conditions.h | 6 -- > gcc/emit-rtl.c | 2 -- > gcc/final.c | 2 -- > gcc/jump.c | 3 --- > gcc/recog.c | 2 -- > gcc/recog.h | 2 -- > gcc/sched-deps.c | 5 +++-- > 7 files changed, 3 insertions(+), 19 deletions(-) > > diff --git a/gcc/conditions.h b/gcc/conditions.h > index 2308bfc..7cd1e1c 100644 > --- a/gcc/conditions.h > +++ b/gcc/conditions.h > @@ -20,10 +20,6 @@ along with GCC; see the file COPYING3. If not see > #ifndef GCC_CONDITIONS_H > #define GCC_CONDITIONS_H > > -/* None of the things in the files exist if we don't use CC0. */ > - > -#ifdef HAVE_cc0 > - > /* The variable cc_status says how to interpret the condition code. > It is set by output routines for an instruction that sets the cc's > and examined by output routines for jump instructions. > @@ -117,6 +113,4 @@ extern CC_STATUS cc_status; > (cc_status.flags = 0, cc_status.value1 = 0, cc_status.value2 = 0, \ >CC_STATUS_MDEP_INIT) > > -#endif > - > #endif /* GCC_CONDITIONS_H */ > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c > index 483eacb..c1974bb 100644 > --- a/gcc/emit-rtl.c > +++ b/gcc/emit-rtl.c > @@ -3541,7 +3541,6 @@ prev_active_insn (rtx uncast_insn) >return insn; > } > > -#ifdef HAVE_cc0 > /* Return the next insn that uses CC0 after INSN, which is assumed to > set it. This is the inverse of prev_cc0_setter (i.e., prev_cc0_setter > applied to the result of this function should yield INSN). > @@ -3589,7 +3588,6 @@ prev_cc0_setter (rtx uncast_insn) > >return insn; > } > -#endif > > #ifdef AUTO_INC_DEC > /* Find a RTX_AUTOINC class rtx which matches DATA. 
*/ > diff --git a/gcc/final.c b/gcc/final.c > index 1fa93d9..41f6bd9 100644 > --- a/gcc/final.c > +++ b/gcc/final.c > @@ -191,7 +191,6 @@ static rtx last_ignored_compare = 0; > > static int insn_counter = 0; > > -#ifdef HAVE_cc0 > /* This variable contains machine-dependent flags (defined in tm.h) > set and examined by output routines > that describe how to interpret the condition codes properly. */ > @@ -202,7 +201,6 @@ CC_STATUS cc_status; > from before the insn. */ > > CC_STATUS cc_prev_status; > -#endif > > /* Number of unmatched NOTE_INSN_BLOCK_BEG notes we have seen. */ > > diff --git a/gcc/jump.c b/gcc/jump.c > index 34b3b7b..bc91550 100644 > --- a/gcc/jump.c > +++ b/gcc/jump.c > @@ -1044,8 +1044,6 @@ jump_to_label_p (const rtx_insn *insn) > && JUMP_LABEL (insn) != NULL && !ANY_RETURN_P (JUMP_LABEL (insn))); > } > > -#ifdef HAVE_cc0 > - > /* Return nonzero if X is an RTX that only sets the condition codes > and has no side effects. */ > > @@ -1094,7 +1092,6 @@ sets_cc0_p (const_rtx x) > } >return 0; > } > -#endif > > /* Find all CODE_LABELs referred to in X, and increment their use > counts. If INSN is a JUMP_INSN and there is at least one > diff --git a/gcc/recog.c b/gcc/recog.c > index a9d3b1f..c3ad86f 100644 > --- a/gcc/recog.c > +++ b/gcc/recog.c > @@ -971,7 +971,6 @@ validate_simplify_insn (rtx insn) >return ((num_changes_pending () > 0) && (apply_change_group () > 0)); > } > > -#ifdef HAVE_cc0 > /* Return 1 if the insn using CC0 set by INSN does not contain > any ordered tests applied to the condition codes. > EQ and NE tests do not count. */ > @@ -988,7 +987,6 @@ next_insn_tests_no_inequality (rtx insn) >return (INSN_P (next) > && ! inequality_comparisons_p (PATTERN (next))); > } > -#endif > > /* Return 1 if OP is a valid general operand for machine mode MODE. 
> This is either a register reference, a memory reference, > diff --git a/gcc/recog.h b/gcc/recog.h > index 45ea671..8a38b26 100644 > --- a/gcc/recog.h > +++ b/gcc/recog.h > @@ -112,9 +112,7 @@ extern void validate_replace_rtx_group (rtx, rtx, rtx); > extern void validate_replace_src_group (rtx, rtx, rtx); > extern bool validate_simplify_insn (rtx insn); > extern int num_changes_pending (void); > -#ifdef HAVE_cc0 > extern int next_insn_tests_no_inequality (rtx); > -#endif > extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode); > > extern int offsettable_memref_p (rtx); > diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c > index 5434831..31de6be 100644 > --- a/gcc/sched-deps.c > +++ b/gcc/sched-deps.c > @@ -2608,8 +2608,10 @@ sched_analyze_2 (struct deps_desc *deps, rtx x, > rtx_insn *insn) > >return; > > -#ifdef HAVE_cc0 > case CC0: > +#ifdef HAVE_cc0 #ifndef ? > + gcc_unreachable (); > +#endif >/* Us
Re: [patch] [java] bump libgcj soname
On Tue, Apr 21, 2015 at 04:16:18PM +0200, Matthias Klose wrote: > On 04/21/2015 04:11 PM, Jakub Jelinek wrote: > > On Tue, Apr 21, 2015 at 04:07:13PM +0200, Matthias Klose wrote: > >> bump the libgcj soname on the trunk, as done for every release cycle, > > > > Is that really needed though these days? > > Weren't there basically zero changes to libjava (both libjava and > > libjava/classpath) in the last 2 or more years? > > The few ones were mostly updating Copyright notices, minor configure > > changes, but I really haven't seen anything ABI changing for quite a while. > > yes, the GCC version is embedded in the GCJ_VERSIONED_LIBDIR > > which is defined as > > gcjsubdir=gcj-$gcjversion-$libgcj_soversion > dbexecdir='$(toolexeclibdir)/'$gcjsubdir But why is that an argument for bumping it? If both GCC 5 and GCC 6 will (likely) provide the same ABI in the library, there is no reason not to use the same directory for those. Jakub
Re: [C/C++ PATCH] Improve -Wlogical-op (PR c/63357)
On 21/04/15 13:16, Marek Polacek wrote:
> (-Wlogical-op still isn't enabled by either -Wall or -Wextra.)

The reason is https://gcc.gnu.org/PR61534, which means we don't want to warn for:

extern int xxx;
#define XXX xxx
int test (void)
{
  if (!XXX && xxx)
    return 4;
  else
    return 0;
}

(gcc/testsuite/gcc.dg/pr40172-3.c, although it should be moved to c-c++-common)

As noted in the PR: The problem is that !XXX becomes XXX == 0, but it has the location of "!", which is not virtual. If we look at the argument of the expression, then XXX is actually a var_decl, whose location corresponds to the declaration and not the use, and it is not virtual either. This is PR43486.

> Bootstrapped/regtested on x86_64-linux, ok for trunk?

Does it pass bootstrap if you enable it? That is, is GCC itself -Wlogical-op clean?

Cheers,

Manuel.
Re: [PATCH 3/13] aarch64 musl support
> On Apr 20, 2015, at 11:52 AM, Szabolcs Nagy wrote:
>
> Set up dynamic linker name for aarch64.
>
> gcc/Changelog:
>
> 2015-04-16 Gregor Richards
>            Szabolcs Nagy
>
>     * config/aarch64/aarch64-linux.h (MUSL_DYNAMIC_LINKER): Define.

I don't think you need to check if defaulting to little or big-endian here, as the specs always have one or the other passing through. Also if musl does not support ilp32, you might want to error out. Or even define the dynamic linker name even before support goes into musl.

Thanks,
Andrew

> <03-aarch64.patch>
Re: [patch] [java] bump libgcj soname
On 04/21/2015 04:11 PM, Jakub Jelinek wrote: > On Tue, Apr 21, 2015 at 04:07:13PM +0200, Matthias Klose wrote: >> bump the libgcj soname on the trunk, as done for every release cycle, > > Is that really needed though these days? > Weren't there basically zero changes to libjava (both libjava and > libjava/classpath) in the last 2 or more years? > The few ones were mostly updating Copyright notices, minor configure > changes, but I really haven't seen anything ABI changing for quite a while. yes, the GCC version is embedded in the GCJ_VERSIONED_LIBDIR which is defined as gcjsubdir=gcj-$gcjversion-$libgcj_soversion dbexecdir='$(toolexeclibdir)/'$gcjsubdir
Re: [patch] [java] bump libgcj soname
On Tue, Apr 21, 2015 at 04:07:13PM +0200, Matthias Klose wrote: > bump the libgcj soname on the trunk, as done for every release cycle, Is that really needed though these days? Weren't there basically zero changes to libjava (both libjava and libjava/classpath) in the last 2 or more years? The few ones were mostly updating Copyright notices, minor configure changes, but I really haven't seen anything ABI changing for quite a while. Jakub
Re: [PATCH][combine] Do not call rtx costs on potentially unrecognisable rtxes in combine
On 21/04/15 15:06, Jeff Law wrote:
> On 04/21/2015 03:18 AM, Kyrill Tkachov wrote:
>>> Though I do wonder if, in practice, we can identify those cases that do simplify more directly a priori and just punt everything else rather than this rather convoluted approach.
>>
>> You mean like calling simplify_binary_operation that returns NULL if no simplification is possible?
>
> Not entirely sure, just a general sense that we're doing far more work here than is justified by the potential gains. The cases we care about are very limited (negated or duplicated arguments) and I'd be surprised if they're still showing up in combine.c these days. I didn't look at the history of that code, but I suspect it is *very very* old.

I had a look when I was writing that patch and it was from 2005 (r96681).

> I'm not asking you to tackle this problem, it was more meant as an observation. But if you want to dig deeper, go for it. If it were me, the first thing I'd do is try to construct a testcase that would get me into that code -- I'd bet it's hard, particularly with the tree and rtl reassociations we do these days.

Yeah, the comment does mention that it's supposed to trigger rarely. I'm looking at it from the perspective of cleaning up rtx cost usages though.

Thanks,
Kyrill

> Jeff
Re: [PATCH][expr.c] PR 65358 Avoid clobbering partial argument during sibcall
On 04/21/2015 02:30 AM, Kyrill Tkachov wrote: From reading config/stormy16/stormy-abi it seems to me that we don't pass arguments partially in stormy16, so this code would never be called there. That leaves pa as the potential problematic target. I don't suppose there's an easy way to test on pa? My checkout of binutils doesn't seem to include a sim target for it. No simulator, no machines in the testfarm, the box I had access to via parisc-linux.org seems dead and my ancient PA overheats well before a bootstrap could complete. I often regret knowing about the backwards way many things were done on the PA because it makes me think about cases that only matter on dead architectures. Jeff
[patch] [java] bump libgcj soname
bump the libgcj soname on the trunk, as done for every release cycle, and update the cygwin/mingw32 files. ok for the trunk?

Matthias

gcc/

2015-04-21 Matthias Klose

    * config/i386/cygwin.h (LIBGCJ_SONAME): Set libgcj version to -17.
    * config/i386/mingw32.h (LIBGCJ_SONAME): Set libgcj version to -17.

libjava/

2015-04-21 Matthias Klose

    * libtool-version: Bump soversion.

Index: gcc/config/i386/cygwin.h
===
--- gcc/config/i386/cygwin.h (revision 68)
+++ gcc/config/i386/cygwin.h (working copy)
@@ -154,5 +154,5 @@
 #define LIBGCC_SONAME "cyggcc_s" LIBGCC_EH_EXTN "-1.dll"
 
 /* We should find a way to not have to update this manually. */
-#define LIBGCJ_SONAME "cyggcj" /*LIBGCC_EH_EXTN*/ "-16.dll"
+#define LIBGCJ_SONAME "cyggcj" /*LIBGCC_EH_EXTN*/ "-17.dll"
 
Index: gcc/config/i386/mingw32.h
===
--- gcc/config/i386/mingw32.h (revision 68)
+++ gcc/config/i386/mingw32.h (working copy)
@@ -254,4 +254,4 @@
 #define LIBGCC_SONAME "libgcc_s" LIBGCC_EH_EXTN "-1.dll"
 
 /* We should find a way to not have to update this manually. */
-#define LIBGCJ_SONAME "libgcj" /*LIBGCC_EH_EXTN*/ "-16.dll"
+#define LIBGCJ_SONAME "libgcj" /*LIBGCC_EH_EXTN*/ "-17.dll"
Index: libjava/libtool-version
===
--- libjava/libtool-version (revision 68)
+++ libjava/libtool-version (working copy)
@@ -5,4 +5,4 @@
 # Note: When changing the version here, please do also update LIBGCJ_SONAME
 # in gcc/config/i386/cygwin.h and gcc/config/i386/mingw32.h.
 # CURRENT:REVISION:AGE
-16:0:0
+17:0:0
Re: [PATCH][combine] Do not call rtx costs on potentially unrecognisable rtxes in combine
On 04/21/2015 03:18 AM, Kyrill Tkachov wrote:
>> Though I do wonder if, in practice, we can identify those cases that do simplify more directly a priori and just punt everything else rather than this rather convoluted approach.
>
> You mean like calling simplify_binary_operation that returns NULL if no simplification is possible?

Not entirely sure, just a general sense that we're doing far more work here than is justified by the potential gains. The cases we care about are very limited (negated or duplicated arguments) and I'd be surprised if they're still showing up in combine.c these days. I didn't look at the history of that code, but I suspect it is *very very* old.

I'm not asking you to tackle this problem, it was more meant as an observation. But if you want to dig deeper, go for it. If it were me, the first thing I'd do is try to construct a testcase that would get me into that code -- I'd bet it's hard, particularly with the tree and rtl reassociations we do these days.

Jeff
Re: [PATCH 01/12] add default definition of EH_RETURN_DATA_REGNO
On 04/21/2015 08:00 AM, Jakub Jelinek wrote: On Tue, Apr 21, 2015 at 07:40:37AM -0600, Jeff Law wrote: On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * defaults.h: New definition of EH_RETURN_DATA_REGNO. * except.c: Remove definition of EH_RETURN_DATA_REGNO. * builtins.c (expand_builtin): Remove check if EH_RETURN_DATA_REGNO is defined. * df-scan.c (df_bb_refs_collect): Likewise. (df_get_exit_block_use_set): Likewise. * haifa-sched.c (initiate_bb_reg_pressure_info): Likewise. * ira-lives.c (process_bb_node_lives): Likewise. * lra-lives.c (process_bb_lives): Likewise. This one wasn't as obvious as the others, but is clearly OK once the full loops being guarded by EH_RETURN_DATA_REGNO are examined. Except that the bb_has_eh_pred predicate might burn CPU time for basic blocks with many predecessors. Though, the question is if there are any important targets that don't define EH_RETURN_DATA_REGNO already. Probably not since they'll blow up elsewhere (I was recently helping someone with a private port that didn't define EH_RETURN_DATA_REGNO) :-) jeff
Re: [PATCH 01/12] add default definition of EH_RETURN_DATA_REGNO
On Tue, Apr 21, 2015 at 07:40:37AM -0600, Jeff Law wrote: > On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: > >From: Trevor Saunders > > > >gcc/ChangeLog: > > > >2015-04-21 Trevor Saunders > > > > * defaults.h: New definition of EH_RETURN_DATA_REGNO. > > * except.c: Remove definition of EH_RETURN_DATA_REGNO. > > * builtins.c (expand_builtin): Remove check if > > EH_RETURN_DATA_REGNO is defined. > > * df-scan.c (df_bb_refs_collect): Likewise. > > (df_get_exit_block_use_set): Likewise. > > * haifa-sched.c (initiate_bb_reg_pressure_info): Likewise. > > * ira-lives.c (process_bb_node_lives): Likewise. > > * lra-lives.c (process_bb_lives): Likewise. > This one wasn't as obvious as the others, but is clearly OK once the full > loops being guarded by EH_RETURN_DATA_REGNO are examined. Except that the bb_has_eh_pred predicate might burn CPU time for basic blocks with many predecessors. Though, the question is if there are any important targets that don't define EH_RETURN_DATA_REGNO already. Jakub
[PATCH][AArch64] Add branch-cost to cpu tuning information.
The AArch64 backend sets BRANCH_COST to be the constant value 2 for all cpus, meaning that the compiler thinks that branches cost the same across all cpus. This patch reworks the handling of branch costs to allow per-cpu values to be set. The actual value of the branch costs is unchanged, as the correct values will need to be decided for each core.

Tested aarch64-none-linux-gnu with gcc-check.

Ok for trunk?
Matthew

2015-05-21 Matthew Wahab

    * gcc/config/aarch64-protos.h (struct cpu_branch_cost): New.
    (tune_params): Add field branch_costs.
    (aarch64_branch_cost): Declare.
    * gcc/config/aarch64.c (generic_branch_cost): New.
    (generic_tunings): Set field cpu_branch_cost to generic_branch_cost.
    (cortexa53_tunings): Likewise.
    (cortexa57_tunings): Likewise.
    (thunderx_tunings): Likewise.
    (xgene1_tunings): Likewise.
    (aarch64_branch_cost): Define.
    * gcc/config/aarch64/aarch64.h (BRANCH_COST): Redefine.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8676c5c..77b01fa 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -162,12 +162,20 @@ struct cpu_vector_cost
   const int cond_not_taken_branch_cost; /* Cost of not taken branch. */
 };
 
+/* Branch costs. */
+struct cpu_branch_cost
+{
+  const int predictable;    /* Predictable branch or optimizing for size. */
+  const int unpredictable;  /* Unpredictable branch or optimizing for speed.
*/ +}; + struct tune_params { const struct cpu_cost_table *const insn_extra_cost; const struct cpu_addrcost_table *const addr_cost; const struct cpu_regmove_cost *const regmove_cost; const struct cpu_vector_cost *const vec_costs; + const struct cpu_branch_cost *const branch_costs; const int memmov_cost; const int issue_rate; const unsigned int fuseable_ops; @@ -259,6 +267,8 @@ void aarch64_print_operand (FILE *, rtx, char); void aarch64_print_operand_address (FILE *, rtx); void aarch64_emit_call_insn (rtx); +int aarch64_branch_cost (bool, bool); + /* Initialize builtins for SIMD intrinsics. */ void init_aarch64_simd_builtins (void); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 77a641e..a020316 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -339,12 +339,20 @@ static const struct cpu_vector_cost xgene1_vector_cost = #define AARCH64_FUSE_ADRP_LDR (1 << 3) #define AARCH64_FUSE_CMP_BRANCH (1 << 4) +/* Generic costs for branch instructions. */ +static const struct cpu_branch_cost generic_branch_cost = +{ + 2, /* Predictable. */ + 2 /* Unpredictable. 
*/ +}; + static const struct tune_params generic_tunings = { &cortexa57_extra_costs, &generic_addrcost_table, &generic_regmove_cost, &generic_vector_cost, + &generic_branch_cost, 4, /* memmov_cost */ 2, /* issue_rate */ AARCH64_FUSE_NOTHING, /* fuseable_ops */ @@ -362,6 +370,7 @@ static const struct tune_params cortexa53_tunings = &generic_addrcost_table, &cortexa53_regmove_cost, &generic_vector_cost, + &generic_branch_cost, 4, /* memmov_cost */ 2, /* issue_rate */ (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD @@ -380,6 +389,7 @@ static const struct tune_params cortexa57_tunings = &cortexa57_addrcost_table, &cortexa57_regmove_cost, &cortexa57_vector_cost, + &generic_branch_cost, 4, /* memmov_cost */ 3, /* issue_rate */ (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD @@ -398,6 +408,7 @@ static const struct tune_params thunderx_tunings = &generic_addrcost_table, &thunderx_regmove_cost, &generic_vector_cost, + &generic_branch_cost, 6, /* memmov_cost */ 2, /* issue_rate */ AARCH64_FUSE_CMP_BRANCH, /* fuseable_ops */ @@ -415,6 +426,7 @@ static const struct tune_params xgene1_tunings = &xgene1_addrcost_table, &xgene1_regmove_cost, &xgene1_vector_cost, + &generic_branch_cost, 6, /* memmov_cost */ 4, /* issue_rate */ AARCH64_FUSE_NOTHING, /* fuseable_ops */ @@ -5361,6 +5373,19 @@ aarch64_address_cost (rtx x, return cost; } +int +aarch64_branch_cost (bool speed_p, bool predictable_p) +{ + /* When optimizing for speed, use the cost of unpredictable branches. */ + const struct cpu_branch_cost *branch_costs = +aarch64_tune_params->branch_costs; + + if (!speed_p || predictable_p) +return branch_costs->predictable; + else +return branch_costs->unpredictable; +} + /* Return true if the RTX X in mode MODE is a zero or sign extract usable in an ADD or SUB (extended register) instruction. 
*/ static bool diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index bf59e40..93a32f5 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -823,7 +823,8 @@ do { \ #define TRAMPOLINE_SECTION text_section /* To start with. */ -#define BRANCH_COST(SPEED_P, PREDICTABLE_P) 2 +#d
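To make the intent of the patch concrete, the lookup it introduces can be sketched outside of GCC as below. This is a minimal stand-alone sketch, not the patch itself: `branch_cost_for` takes the cost table as an explicit parameter instead of reading `aarch64_tune_params`, and `example_core_branch_cost` with its values 1 and 3 is purely hypothetical (the patch keeps 2 for both fields on every core).

```c
#include <stdbool.h>

/* Mirrors the struct added by the patch: one cost per branch kind.  */
struct cpu_branch_cost
{
  int predictable;    /* Predictable branch or optimizing for size.  */
  int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
};

/* Values from the patch: every core still uses 2 for both cases.  */
const struct cpu_branch_cost generic_branch_cost = { 2, 2 };

/* Hypothetical per-core table, only here to show how a tuned core
   could differ; these numbers do not come from the patch.  */
const struct cpu_branch_cost example_core_branch_cost = { 1, 3 };

/* Same selection logic as aarch64_branch_cost in the patch, but with
   the table passed in explicitly (an assumption of this sketch).  */
int
branch_cost_for (const struct cpu_branch_cost *costs,
                 bool speed_p, bool predictable_p)
{
  if (!speed_p || predictable_p)
    return costs->predictable;
  return costs->unpredictable;
}
```

With the generic table every query returns 2, so behaviour is unchanged; only a core that supplies its own table would see different costs for unpredictable branches when optimizing for speed.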
RE: [PATCH 6/13] mips musl support
Szabolcs Nagy writes:
> Set up dynamic linker name for mips.
>
> gcc/Changelog:
>
> 2015-04-16 Gregor Richards
>
>     * config/mips/linux.h (MUSL_DYNAMIC_LINKER): Define.

I understand that mips musl is currently o32 only; is that correct? There do, however, appear to be both soft and hard float variants listed in the musl docs. Do you plan on using the same dynamic linker name for both float variants? No problem if so, but someone must have decided to have unique names for big and little endian, so I thought it worth checking.

Also, are you aware of the two nan encoding formats that MIPS has and the support present in glibc's dynamic linker to deal with it? I wonder if it would be wise to refuse to target musl unless the ABI is known to be supported, so that we can avoid compatibility issues when different ABI variants are added in musl.

Thanks,
Matthew
Re: [PATCH 00/12] Reduce conditional compilation
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders
>
> Hi,
>
> This is a first round of patches to reduce the amount of code within #if / #ifdef. This makes it incrementally easier to not break configs other than the one being built, and moves things slightly closer to using target hooks for everything.
>
> each commit bootstrapped and regtested on x86_64-linux-gnu without regression, and whole patch set run through config-list.mk without issue, ok?

So I think after looking at this patchset, any changes of a similar nature you want to make should be considered pre-approved. Just post them for archival purposes, but no need for you to wait for review as long as they have the same purpose and overall structure as was seen in these patches.

jeff
Re: [PATCH 10/12] remove more ifdefs for HAVE_cc0
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * caller-save.c (insert_one_insn): Remove ifdef HAVE_cc0. * cfgcleanup.c (flow_find_cross_jump): Likewise. (flow_find_head_matching_sequence): Likewise. (try_head_merge_bb): Likewise. * combine.c (can_combine_p): Likewise. (try_combine): Likewise. (distribute_notes): Likewise. * df-problems.c (can_move_insns_across): Likewise. * final.c (final): Likewise. * gcse.c (insert_insn_end_basic_block): Likewise. * ira.c (find_moveable_pseudos): Likewise. * reorg.c (try_merge_delay_insns): Likewise. (fill_simple_delay_slots): Likewise. (fill_slots_from_thread): Likewise. * sched-deps.c (sched_analyze_2): Likewise. OK. Jeff
Re: [PATCH 05/12] make some HAVE_cc0 code always compiled
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * cfgrtl.c (rtl_merge_blocks): Change #if HAVE_cc0 to if (HAVE_cc0) (try_redirect_by_replacing_jump): Likewise. (rtl_tidy_fallthru_edge): Likewise. * combine.c (insn_a_feeds_b): Likewise. (find_split_point): Likewise. (simplify_set): Likewise. * cprop.c (cprop_jump): Likewise. * cse.c (cse_extended_basic_block): Likewise. * df-problems.c (can_move_insns_across): Likewise. * function.c (emit_use_return_register_into_block): Likewise. * haifa-sched.c (sched_init): Likewise. * ira.c (find_moveable_pseudos): Likewise. * loop-invariant.c (find_invariant_insn): Likewise. * lra-constraints.c (curr_insn_transform): Likewise. * postreload.c (reload_combine_recognize_const_pattern): * Likewise. * reload.c (find_reloads): Likewise. * reorg.c (delete_scheduled_jump): Likewise. (steal_delay_list_from_target): Likewise. (steal_delay_list_from_fallthrough): Likewise. (redundant_insn): Likewise. (fill_simple_delay_slots): Likewise. (fill_slots_from_thread): Likewise. (delete_computation): Likewise. * sched-rgn.c (add_branch_dependences): Likewise. OK. This is what I expected to see a lot of :-0 jeff
Re: [PATCH 04/12] always define HAVE_cc0
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * genconfig.c (main): Always define HAVE_cc0. * caller-save.c (insert_one_insn): Change ifdef HAVE_cc0 to #if HAVE_cc0. * cfgcleanup.c (flow_find_cross_jump): Likewise. (flow_find_head_matching_sequence): Likewise. (try_head_merge_bb): Likewise. * cfgrtl.c (rtl_merge_blocks): Likewise. (try_redirect_by_replacing_jump): Likewise. (rtl_tidy_fallthru_edge): Likewise. * combine.c (do_SUBST_MODE): Likewise. (insn_a_feeds_b): Likewise. (combine_instructions): Likewise. (can_combine_p): Likewise. (try_combine): Likewise. (find_split_point): Likewise. (subst): Likewise. (simplify_set): Likewise. (distribute_notes): Likewise. * cprop.c (cprop_jump): Likewise. * cse.c (cse_extended_basic_block): Likewise. * df-problems.c (can_move_insns_across): Likewise. * final.c (final): Likewise. (final_scan_insn): Likewise. * function.c (emit_use_return_register_into_block): Likewise. * gcse.c (insert_insn_end_basic_block): Likewise. * haifa-sched.c (sched_init): Likewise. * ira.c (find_moveable_pseudos): Likewise. * loop-invariant.c (find_invariant_insn): Likewise. * lra-constraints.c (curr_insn_transform): Likewise. * optabs.c (prepare_cmp_insn): Likewise. * postreload.c (reload_combine_recognize_const_pattern): * Likewise. * reload.c (find_reloads): Likewise. (find_reloads_address_1): Likewise. * reorg.c (delete_scheduled_jump): Likewise. (steal_delay_list_from_target): Likewise. (steal_delay_list_from_fallthrough): Likewise. (try_merge_delay_insns): Likewise. (redundant_insn): Likewise. (fill_simple_delay_slots): Likewise. (fill_slots_from_thread): Likewise. (delete_computation): Likewise. (relax_delay_slots): Likewise. * sched-deps.c (sched_analyze_2): Likewise. * sched-rgn.c (add_branch_dependences): Likewise. Doesn't go as far as I'd like, but it's still an improvement. OK. jeff
Re: [PATCH 03/12] more removal of ifdef HAVE_cc0
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders
>
> gcc/ChangeLog:
>
> 2015-04-21 Trevor Saunders
>
>     * combine.c (find_single_use): Remove HAVE_cc0 ifdef for code that
>     is trivially dead on non cc0 targets.
>     (simplify_set): Likewise.
>     (mark_used_regs_combine): Likewise.
>     * cse.c (new_basic_block): Likewise.
>     (fold_rtx): Likewise.
>     (cse_insn): Likewise.
>     (cse_extended_basic_block): Likewise.
>     (set_live_p): Likewise.
>     * rtlanal.c (canonicalize_condition): Likewise.
>     * simplify-rtx.c (simplify_binary_operation_1): Likewise.

OK. I find myself wondering if the conditionals should look like

if (HAVE_cc0 && (whatever))

But I doubt it makes any measurable difference. It's something we can always add in the future if we feel the need to avoid the runtime checks for things that aren't ever going to happen on most modern targets.

jeff
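The `if (HAVE_cc0 && (whatever))` form mentioned above can be illustrated with a small stand-alone sketch. It assumes `HAVE_cc0` is always defined to 0 or 1 (which is what patch 04/12 arranges through genconfig); here it is defined by hand to 0 to mimic a non-cc0 target, and `cc0_work` and `process_insn` are hypothetical names, not GCC functions.

```c
/* Sketch only: in GCC the value would come from the generated
   configuration; it is hard-coded here to mimic a non-cc0 target.  */
#ifndef HAVE_cc0
#define HAVE_cc0 0
#endif

/* Hypothetical stand-in for work that only matters on cc0 targets.  */
static int
cc0_work (int x)
{
  return x * 2;
}

/* Unlike an #ifdef block, the guarded code is still parsed and
   type-checked on every target.  Because HAVE_cc0 is a compile-time
   constant, the compiler can drop the dead branch, so the explicit
   guard costs nothing at run time.  */
int
process_insn (int x)
{
  if (HAVE_cc0 && x > 0)
    return cc0_work (x);
  return x;
}
```

The design point is that the cc0 path keeps compiling on every configuration, so a change that breaks it is noticed even when building for a target that never uses cc0.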
Re: [PATCH 02/12] remove some ifdef HAVE_cc0
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * conditions.h: Define macros even if HAVE_cc0 is undefined. * emit-rtl.c: Define functions even if HAVE_cc0 is undefined. * final.c: Likewise. * jump.c: Likewise. * recog.c: Likewise. * recog.h: Declare functions even when HAVE_cc0 is undefined. * sched-deps.c (sched_analyze_2): Always compile case for cc0. OK. Note for anyone else reading at home, some of the functions being unconditionally compiled now already had unconditional prototypes in the header files. So not everything needed a .h file change. jeff
Re: [PATCH 00/12] Reduce conditional compilation
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders
>
> Hi,
>
> This is a first round of patches to reduce the amount of code within #if / #ifdef. This makes it incrementally easier to not break configs other than the one being built, and moves things slightly closer to using target hooks for everything.
>
> each commit bootstrapped and regtested on x86_64-linux-gnu without regression, and whole patch set run through config-list.mk without issue, ok?

Thanks for tackling this. It's not particularly deep work, but I do think it'll help reduce the long term maintenance costs and make developers' lives easier.

Onward to the HAVE_cc0 patches :-)

Jeff

ps. You hit a good window, my daughter was up late last night and is sleeping in a bit, so I've got unexpected time this morning before my meetings.
Re: [PATCH 01/12] add default definition of EH_RETURN_DATA_REGNO
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * defaults.h: New definition of EH_RETURN_DATA_REGNO. * except.c: Remove definition of EH_RETURN_DATA_REGNO. * builtins.c (expand_builtin): Remove check if EH_RETURN_DATA_REGNO is defined. * df-scan.c (df_bb_refs_collect): Likewise. (df_get_exit_block_use_set): Likewise. * haifa-sched.c (initiate_bb_reg_pressure_info): Likewise. * ira-lives.c (process_bb_node_lives): Likewise. * lra-lives.c (process_bb_lives): Likewise. This one wasn't as obvious as the others, but is clearly OK once the full loops being guarded by EH_RETURN_DATA_REGNO are examined. Jeff
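Why a default definition lets the `#ifdef EH_RETURN_DATA_REGNO` guards go away can be shown with a short sketch. It assumes the default expands to `INVALID_REGNUM` for every index, so that the usual "iterate until `INVALID_REGNUM`" loops simply terminate at once on targets without EH data registers; the exact text added to defaults.h is not visible in this thread, and `count_eh_return_data_regs` is a hypothetical helper, not a GCC function.

```c
/* Assumed sentinel, defined locally so the sketch is self-contained.  */
#define INVALID_REGNUM (~(unsigned int) 0)

/* Assumed shape of the new default: a target that defines nothing
   reports no EH return data registers at all.  */
#ifndef EH_RETURN_DATA_REGNO
#define EH_RETURN_DATA_REGNO(N) INVALID_REGNUM
#endif

/* Typical loop shape: with the default in place it no longer needs an
   #ifdef around it, because it exits on the first iteration.  */
unsigned int
count_eh_return_data_regs (void)
{
  unsigned int n = 0;

  for (unsigned int i = 0; ; ++i)
    {
      unsigned int regno = EH_RETURN_DATA_REGNO (i);
      if (regno == INVALID_REGNUM)
        break;
      ++n;
    }
  return n;
}
```

On a target that does define the macro, the same loop would walk the real registers; the cost noted elsewhere in the thread is only that surrounding checks such as bb_has_eh_pred now run unconditionally.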
Re: [PATCH 08/12] reduce conditional compilation for HARD_FRAME_POINTER_IS_FRAME_POINTER
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * alias.c (init_alias_target): Remove ifdef * HARD_FRAME_POINTER_IS_FRAME_POINTER. * df-scan.c (df_insn_refs_collect): Likewise. (df_get_regular_block_artificial_uses): Likewise. (df_get_eh_block_artificial_uses): Likewise. (df_get_entry_block_def_set): Likewise. (df_get_exit_block_use_set): Likewise. * emit-rtl.c (gen_rtx_REG): Likewise. * ira.c (ira_setup_eliminable_regset): Likewise. * reginfo.c (init_reg_sets_1): Likewise. * regrename.c (rename_chains): Likewise. * reload1.c (reload): Likewise. (eliminate_regs_in_insn): Likewise. * resource.c (mark_referenced_resources): Likewise. (init_resource_info): Likewise. OK. jeff
Re: [PATCH 09/12] remove #if for PIC_OFFSET_TABLE_REGNUM
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote: From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * df-scan.c (df_get_entry_block_def_set): Remove #ifdef PIC_OFFSET_TABLE_REGNUM. OK. jeff
[PATCH, i386]: Some spring cleaning in i386.h
Hello!

This patch redefines various hard register numbers with ones from i386.md. Also, the patch reshuffles some defines to group them together in a better way. No functional changes.

2015-04-21 Uros Bizjak

    * config/i386/i386.md (ARGP_REG, FRAME_REG, BND2_REG, BND3_REG,
    FIRST_PSEUDO_REG): New.
    * config/i386/i386.h (STACK_POINTER_REGNUM): Define to SP_REG.
    (ARG_POINTER_REGNUM): Define to ARGP_REG.
    (FRAME_POINTER_REGNUM): Define to FRAME_REG.
    (HARD_FRAME_POINTER_REGNUM): Define to BP_REG.
    (FIRST_PSEUDO_REGISTER): Define to FIRST_PSEUDO_REG.
    (FIRST_INT_REG): New.
    (LAST_INT_REG): New.
    (FIRST_*_REG): Define using *_REG.
    (LAST_*_REG): Ditto.
    (QI_REGNO_P): Define using FIRST_QI_REG and LAST_QI_REG.
    (LEGACY_INT_REGNO_P): Define using FIRST_INT_REG and LAST_INT_REG.
    (FIRST_FLOAT_REG): Define to FIRST_STACK_REG.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}, committed to mainline SVN.

Uros.

Index: config/i386/i386.h
===
--- config/i386/i386.h (revision 57)
+++ config/i386/i386.h (working copy)
@@ -957,7 +957,7 @@ extern const char *host_detect_local_cpu (int argc
    eliminated during reloading in favor of either the stack or frame
    pointer. */
 
-#define FIRST_PSEUDO_REGISTER 81
+#define FIRST_PSEUDO_REGISTER FIRST_PSEUDO_REG
 
 /* Number of hardware registers that go into the DWARF-2 unwind info.
    If not defined, equals FIRST_PSEUDO_REGISTER.
*/ @@ -1100,7 +1100,7 @@ extern const char *host_detect_local_cpu (int argc || (MODE) == V16SImode || (MODE) == V16SFmode || (MODE) == V32HImode \ || (MODE) == V4TImode) -#define VALID_AVX512VL_128_REG_MODE(MODE) \ +#define VALID_AVX512VL_128_REG_MODE(MODE) \ ((MODE) == V2DImode || (MODE) == V2DFmode || (MODE) == V16QImode \ || (MODE) == V4SImode || (MODE) == V4SFmode || (MODE) == V8HImode) @@ -1121,6 +1121,10 @@ extern const char *host_detect_local_cpu (int argc || (MODE) == V2SImode || (MODE) == SImode \ || (MODE) == V4HImode || (MODE) == V8QImode) +#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode) + +#define VALID_MASK_AVX512BW_MODE(MODE) ((MODE) == SImode || (MODE) == DImode) + #define VALID_BND_REG_MODE(MODE) \ (TARGET_64BIT ? (MODE) == BND64mode : (MODE) == BND32mode) @@ -1150,10 +1154,16 @@ extern const char *host_detect_local_cpu (int argc || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode \ || (MODE) == V16SFmode) -#define VALID_MASK_REG_MODE(MODE) ((MODE) == HImode || (MODE) == QImode) +#define X87_FLOAT_MODE_P(MODE) \ + (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode)) -#define VALID_MASK_AVX512BW_MODE(MODE) ((MODE) == SImode || (MODE) == DImode) +#define SSE_FLOAT_MODE_P(MODE) \ + ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode)) +#define FMA4_VEC_FLOAT_MODE_P(MODE) \ + (TARGET_FMA4 && ((MODE) == V4SFmode || (MODE) == V2DFmode \ + || (MODE) == V8SFmode || (MODE) == V4DFmode)) + /* Value is 1 if hard register REGNO can hold a value of machine-mode MODE. */ #define HARD_REGNO_MODE_OK(REGNO, MODE)\ @@ -1198,42 +1208,46 @@ extern const char *host_detect_local_cpu (int argc register. The ordinary mov instructions won't work */ /* #define PC_REGNUM */ +/* Base register for access to arguments of the function. */ +#define ARG_POINTER_REGNUM ARGP_REG + /* Register to use for pushing function arguments. 
*/ -#define STACK_POINTER_REGNUM 7 +#define STACK_POINTER_REGNUM SP_REG /* Base register for access to local variables of the function. */ -#define HARD_FRAME_POINTER_REGNUM 6 +#define FRAME_POINTER_REGNUM FRAME_REG +#define HARD_FRAME_POINTER_REGNUM BP_REG -/* Base register for access to local variables of the function. */ -#define FRAME_POINTER_REGNUM 20 +#define FIRST_INT_REG AX_REG +#define LAST_INT_REG SP_REG -/* First floating point reg */ -#define FIRST_FLOAT_REG 8 +#define FIRST_QI_REG AX_REG +#define LAST_QI_REG BX_REG /* First & last stack-like regs */ -#define FIRST_STACK_REG FIRST_FLOAT_REG -#define LAST_STACK_REG (FIRST_FLOAT_REG + 7) +#define FIRST_STACK_REG ST0_REG +#define LAST_STACK_REG ST7_REG -#define FIRST_SSE_REG (FRAME_POINTER_REGNUM + 1) -#define LAST_SSE_REG (FIRST_SSE_REG + 7) +#define FIRST_SSE_REG XMM0_REG +#define LAST_SSE_REG XMM7_REG -#define FIRST_MMX_REG (LAST_SSE_REG + 1) /*29*/ -#define LAST_MMX_REG (FIRST_MMX_REG + 7) +#define FIRST_MMX_REG MM0_REG +#define LAST_MMX_REG MM7_REG -#define FIRST_REX_INT_REG (LAST_MMX_REG + 1) /*37*/ -#define LAST_REX_INT_REG (FIRST_REX_INT_REG + 7) +#define FIRST_REX_INT_REG R8_REG +#define LAST_REX_INT_REG R15_REG -#define FIRST_REX_SSE_REG (LAST_REX_INT_REG + 1) /*45*/ -#define LAST_REX_SSE_REG (FIRST_REX_SSE_REG + 7) +#define FIRST_R
Re: [PATCH 07/12] provide default for MASK_RETURN_ADDR
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders
>
> gcc/ChangeLog:
>
> 2015-04-21  Trevor Saunders
>
>         * defaults.h (MASK_RETURN_ADDR): New definition.
>         * except.c (expand_builtin_extract_return_addr): Remove ifdef
>         MASK_RETURN_ADDR.

OK.
jeff
Re: [PATCH 06/12] provide default for RETURN_ADDR_OFFSET
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders
>
> gcc/ChangeLog:
>
> 2015-04-21  Trevor Saunders
>
>         * defaults.h (RETURN_ADDR_OFFSET): New definition.
>         * except.c (expand_builtin_extract_return_addr): Remove ifdef
>         RETURN_ADDR_OFFSET.
>         (expand_builtin_frob_return_addr): Likewise.

OK.
jeff
Re: [PATCH 12/12] add default for INSN_REFERENCES_ARE_DELAYED
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders
>
> gcc/ChangeLog:
>
> 2015-04-21  Trevor Saunders
>
>         * defaults.h (INSN_REFERENCES_ARE_DELAYED): New definition.
>         * reorg.c (redundant_insn): Remove ifdef
>         INSN_REFERENCES_ARE_DELAYED.
>         * resource.c (mark_referenced_resources): Likewise.

OK.
Jeff
Re: [PATCH 11/12] provide default for INSN_SETS_ARE_DELAYED
On 04/21/2015 07:24 AM, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders
>
> gcc/ChangeLog:
>
> 2015-04-21  Trevor Saunders
>
>         * defaults.h (INSN_SETS_ARE_DELAYED): New definition.
>         * reorg.c (redundant_insn): Remove ifdef INSN_SETS_ARE_DELAYED.
>         * resource.c (mark_set_resources): Likewise.

OK.
Jeff
[PATCH 11/12] provide default for INSN_SETS_ARE_DELAYED
From: Trevor Saunders

gcc/ChangeLog:

2015-04-21  Trevor Saunders

        * defaults.h (INSN_SETS_ARE_DELAYED): New definition.
        * reorg.c (redundant_insn): Remove ifdef INSN_SETS_ARE_DELAYED.
        * resource.c (mark_set_resources): Likewise.
---
 gcc/defaults.h | 4 ++++
 gcc/reorg.c    | 4 ----
 gcc/resource.c | 2 --
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index 843d7e2..79cb599 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1201,6 +1201,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 #define DEFAULT_PCC_STRUCT_RETURN 1
 #endif

+#ifndef INSN_SETS_ARE_DELAYED
+#define INSN_SETS_ARE_DELAYED(INSN) false
+#endif
+
 #ifdef GCC_INSN_FLAGS_H
 /* Dependent default target macro definitions

diff --git a/gcc/reorg.c b/gcc/reorg.c
index b7228f2..ae77f0a 100644
--- a/gcc/reorg.c
+++ b/gcc/reorg.c
@@ -1555,10 +1555,8 @@ redundant_insn (rtx insn, rtx_insn *target, rtx delay_list)
          slots because it is difficult to track its resource needs
          correctly.  */

-#ifdef INSN_SETS_ARE_DELAYED
       if (INSN_SETS_ARE_DELAYED (seq->insn (0)))
         return 0;
-#endif

 #ifdef INSN_REFERENCES_ARE_DELAYED
       if (INSN_REFERENCES_ARE_DELAYED (seq->insn (0)))
@@ -1657,10 +1655,8 @@ redundant_insn (rtx insn, rtx_insn *target, rtx delay_list)
       /* If this is an INSN or JUMP_INSN with delayed effects, it
          is hard to track the resource needs properly, so give up.  */

-#ifdef INSN_SETS_ARE_DELAYED
       if (INSN_SETS_ARE_DELAYED (control))
         return 0;
-#endif

 #ifdef INSN_REFERENCES_ARE_DELAYED
       if (INSN_REFERENCES_ARE_DELAYED (control))
diff --git a/gcc/resource.c b/gcc/resource.c
index 9a013b3..5af9376 100644
--- a/gcc/resource.c
+++ b/gcc/resource.c
@@ -696,11 +696,9 @@ mark_set_resources (rtx x, struct resources *res, int in_dest,
       /* An insn consisting of just a CLOBBER (or USE) is just for flow
          and doesn't actually do anything, so we ignore it.  */

-#ifdef INSN_SETS_ARE_DELAYED
       if (mark_type != MARK_SRC_DEST_CALL
           && INSN_SETS_ARE_DELAYED (as_a (x)))
         return;
-#endif

       x = PATTERN (x);
       if (GET_CODE (x) != USE && GET_CODE (x) != CLOBBER)
--
2.3.0.80.g18d0fec.dirty
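The idea behind this patch can be shown in isolation. What follows is a minimal standalone sketch, not GCC code: `struct insn` and `can_track_resources` are hypothetical stand-ins, and only the `#ifndef` default mirrors the defaults.h hunk. Because the macro always has a definition, the caller needs no `#ifdef` around its use.

```c
#include <stdbool.h>
#include <stddef.h>

/* A target header could define this macro before this point.  If it
   does not, fall back to a default meaning "sets are never delayed".  */
#ifndef INSN_SETS_ARE_DELAYED
#define INSN_SETS_ARE_DELAYED(INSN) false
#endif

/* Hypothetical stand-in for rtx_insn; illustration only.  */
struct insn
{
  int uid;
};

/* Return true when INSN's resource needs can be tracked.  The test is
   ordinary C, so it is parsed and type-checked on every target; with
   the default definition the compiler folds it away.  */
bool
can_track_resources (const struct insn *insn)
{
  if (insn == NULL)
    return false;
  if (INSN_SETS_ARE_DELAYED (insn))
    return false;
  return true;
}
```

With the default in place, `can_track_resources` accepts any non-null insn. A target that overrides the macro changes the result without touching the caller.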
Re: [AArch64][PR65139] use clobber with match_scratch for aarch64_lshr_sisd_or_int_3
On 18/04/15 19:17, Maxim Kuvyrkov wrote:
>> On Apr 18, 2015, at 8:21 PM, Richard Earnshaw wrote:
>>
>> On 18/04/15 16:13, Jakub Jelinek wrote:
>>> On Sat, Apr 18, 2015 at 03:07:16PM +0100, Richard Earnshaw wrote:
>>>> You need to ensure that your scratch register cannot overlap op1,
>>>> since the scratch is written before op1 is read.
>>>
>>> - (clobber (match_scratch:QI 3 "=X,w,X"))]
>>> + (clobber (match_scratch:QI 3 "=X,&w,X"))]
>>>
>>> incremental diff should ensure that, right?
>>>
>>> Jakub
>>>
>>
>>
>> Sorry, where in the patch is that hunk?
>>
>> I see just:
>>
>> + (clobber (match_scratch:QI 3 "=X,w,X"))]
>
> Jakub's suggestion is an incremental patch on top of Kugan's.
>

Ah, sorry, I thought he was implying it was already in the patch
somewhere.

>>
>> And why would early clobbering the scratch be notably better than the
>> original?
>>
>
> It will still be better. With this patch we want to allow RA freedom to
> optimally handle both of the following cases:
>
> 1. operand[1] dies after the instruction. In this case we want operand[0]
> and operand[1] to be assigned to the same reg, and operand[3] to be assigned
> to a different register to provide a temporary. In this case we don't care
> whether operand[3] is early-clobber or not. This case is not optimally
> handled with current insn patterns.
>
> 2. operand[1] lives on after the instruction. In this case we want
> operand[0] and operand[3] to be assigned to the same reg, and not clobber
> operand[1]. By marking operand[3] early-clobber we ensure that operand[1] is
> in a different register from what operand[0] and operand[3] were assigned to.
> This case should be handled equally well before and after the patch.
>
> My understanding is that Kugan's patch with Jakub's fix on top satisfy both
> of these cases.
>

I still don't think it handles all cases efficiently. If we really want
the result in a different register from both of the inputs, then now we
need two registers for the results, one for the result and another for
the temporary. In that case we could have used the result register as
the scratch, but now we can't.

Maybe we can provide two alternatives, one that early-clobbers the
result register but doesn't need a scratch and one that doesn't
early-clobber the result, but does need a scratch. So something like

(define_insn "aarch64_lshr_sisd_or_int_3"
  [(set (match_operand:GPI 0 "register_operand" "=w,&w,w,r")
        (lshiftrt:GPI
          (match_operand:GPI 1 "register_operand" "w,w,w,r")
          (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "Us,w,w,rUs")))
   (clobber (match_scratch:QI 3 "=X,X,w,X"))]
...

but I haven't tested any of that.

I would also note the conversation in
https://gcc.gnu.org/ml/gcc/2015-04/msg00240.html. That seems to suggest
we should be wary of using scratch sequences since the register
allocator doesn't account for them properly.

R.

> --
> Maxim Kuvyrkov
> www.linaro.org
>
[PATCH 12/12] add default for INSN_REFERENCES_ARE_DELAYED
From: Trevor Saunders

gcc/ChangeLog:

2015-04-21  Trevor Saunders

        * defaults.h (INSN_REFERENCES_ARE_DELAYED): New definition.
        * reorg.c (redundant_insn): Remove ifdef
        INSN_REFERENCES_ARE_DELAYED.
        * resource.c (mark_referenced_resources): Likewise.
---
 gcc/defaults.h | 4 ++++
 gcc/reorg.c    | 4 ----
 gcc/resource.c | 2 --
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index 79cb599..cafcb1e 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1205,6 +1205,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 #define INSN_SETS_ARE_DELAYED(INSN) false
 #endif

+#ifndef INSN_REFERENCES_ARE_DELAYED
+#define INSN_REFERENCES_ARE_DELAYED(INSN) false
+#endif
+
 #ifdef GCC_INSN_FLAGS_H
 /* Dependent default target macro definitions

diff --git a/gcc/reorg.c b/gcc/reorg.c
index ae77f0a..d8d8ab69 100644
--- a/gcc/reorg.c
+++ b/gcc/reorg.c
@@ -1558,10 +1558,8 @@ redundant_insn (rtx insn, rtx_insn *target, rtx delay_list)
       if (INSN_SETS_ARE_DELAYED (seq->insn (0)))
         return 0;

-#ifdef INSN_REFERENCES_ARE_DELAYED
       if (INSN_REFERENCES_ARE_DELAYED (seq->insn (0)))
         return 0;
-#endif

       /* See if any of the insns in the delay slot match, updating
          resource requirements as we go.  */
@@ -1658,10 +1656,8 @@ redundant_insn (rtx insn, rtx_insn *target, rtx delay_list)
       if (INSN_SETS_ARE_DELAYED (control))
         return 0;

-#ifdef INSN_REFERENCES_ARE_DELAYED
       if (INSN_REFERENCES_ARE_DELAYED (control))
         return 0;
-#endif

       if (JUMP_P (control))
         annul_p = INSN_ANNULLED_BRANCH_P (control);
diff --git a/gcc/resource.c b/gcc/resource.c
index 5af9376..26d9fca 100644
--- a/gcc/resource.c
+++ b/gcc/resource.c
@@ -392,11 +392,9 @@ mark_referenced_resources (rtx x, struct resources *res,
                                   include_delayed_effects
                                   ? MARK_SRC_DEST_CALL : MARK_SRC_DEST);

-#ifdef INSN_REFERENCES_ARE_DELAYED
       if (! include_delayed_effects
           && INSN_REFERENCES_ARE_DELAYED (as_a (x)))
         return;
-#endif

       /* No special processing, just speed up.  */
       mark_referenced_resources (PATTERN (x), res, include_delayed_effects);
--
2.3.0.80.g18d0fec.dirty
[PATCH 09/12] remove #if for PIC_OFFSET_TABLE_REGNUM
From: Trevor Saunders

gcc/ChangeLog:

2015-04-21  Trevor Saunders

        * df-scan.c (df_get_entry_block_def_set): Remove #ifdef
        PIC_OFFSET_TABLE_REGNUM.
---
 gcc/df-scan.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 69332a8..4232ec8 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3589,10 +3589,6 @@ df_get_entry_block_def_set (bitmap entry_block_defs)
   /* These registers are live everywhere.  */
   if (!reload_completed)
     {
-#ifdef PIC_OFFSET_TABLE_REGNUM
-      unsigned int picreg = PIC_OFFSET_TABLE_REGNUM;
-#endif
-
 #if FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM
       /* Pseudos with argument area equivalences may require
          reloading via the argument pointer.  */
@@ -3600,13 +3596,12 @@ df_get_entry_block_def_set (bitmap entry_block_defs)
         bitmap_set_bit (entry_block_defs, ARG_POINTER_REGNUM);
 #endif

-#ifdef PIC_OFFSET_TABLE_REGNUM
       /* Any constant, or pseudo with constant equivalences, may
          require reloading from memory using the pic register.  */
+      unsigned int picreg = PIC_OFFSET_TABLE_REGNUM;
       if (picreg != INVALID_REGNUM
           && fixed_regs[picreg])
         bitmap_set_bit (entry_block_defs, picreg);
-#endif
     }

 #ifdef INCOMING_RETURN_ADDR_RTX
--
2.3.0.80.g18d0fec.dirty
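Dropping the `#ifdef` here only works if `PIC_OFFSET_TABLE_REGNUM` is always defined, with a sentinel value on targets that have no PIC register. The hunk does not show where that default comes from, so the `#ifndef` fallback in the sketch below is an assumption, as are the stand-in definition of `INVALID_REGNUM`, the `NUM_REGS` constant, and the helper name `pic_reg_live_on_entry`. The sketch shows why the order of the two tests matters: the sentinel comparison must come first so the array is never indexed with an invalid register number.

```c
#include <stdbool.h>

/* Sentinel meaning "no such register"; assumed for this sketch.  */
#define INVALID_REGNUM (~0u)

/* Assumed default for targets that do not name a PIC register.  */
#ifndef PIC_OFFSET_TABLE_REGNUM
#define PIC_OFFSET_TABLE_REGNUM INVALID_REGNUM
#endif

/* Hypothetical register-file size for the illustration.  */
#define NUM_REGS 4

/* Return true when PICREG is a real register that is also fixed.
   The && short-circuits, so FIXED_REGS is only indexed once PICREG
   is known to be valid.  */
bool
pic_reg_live_on_entry (unsigned int picreg, const char fixed_regs[NUM_REGS])
{
  return picreg != INVALID_REGNUM && fixed_regs[picreg];
}
```

On a target without a PIC register the first comparison is a compile-time constant, so an optimizing compiler can drop the whole test, which matches the old `#ifdef` behaviour.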
[PATCH 10/12] remove more ifdefs for HAVE_cc0
From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * caller-save.c (insert_one_insn): Remove ifdef HAVE_cc0. * cfgcleanup.c (flow_find_cross_jump): Likewise. (flow_find_head_matching_sequence): Likewise. (try_head_merge_bb): Likewise. * combine.c (can_combine_p): Likewise. (try_combine): Likewise. (distribute_notes): Likewise. * df-problems.c (can_move_insns_across): Likewise. * final.c (final): Likewise. * gcse.c (insert_insn_end_basic_block): Likewise. * ira.c (find_moveable_pseudos): Likewise. * reorg.c (try_merge_delay_insns): Likewise. (fill_simple_delay_slots): Likewise. (fill_slots_from_thread): Likewise. * sched-deps.c (sched_analyze_2): Likewise. --- gcc/caller-save.c | 4 +--- gcc/cfgcleanup.c | 26 -- gcc/combine.c | 54 +- gcc/df-problems.c | 5 + gcc/final.c | 29 ++--- gcc/gcse.c| 24 +--- gcc/ira.c | 5 + gcc/reorg.c | 26 +++--- gcc/sched-deps.c | 6 +++--- 9 files changed, 69 insertions(+), 110 deletions(-) diff --git a/gcc/caller-save.c b/gcc/caller-save.c index fc575eb..76c3a7e 100644 --- a/gcc/caller-save.c +++ b/gcc/caller-save.c @@ -1400,18 +1400,16 @@ insert_one_insn (struct insn_chain *chain, int before_p, int code, rtx pat) rtx_insn *insn = chain->insn; struct insn_chain *new_chain; -#if HAVE_cc0 /* If INSN references CC0, put our insns in front of the insn that sets CC0. This is always safe, since the only way we could be passed an insn that references CC0 is for a restore, and doing a restore earlier isn't a problem. We do, however, assume here that CALL_INSNs don't reference CC0. Guard against non-INSN's like CODE_LABEL. 
*/ - if ((NONJUMP_INSN_P (insn) || JUMP_P (insn)) + if (HAVE_cc0 && (NONJUMP_INSN_P (insn) || JUMP_P (insn)) && before_p && reg_referenced_p (cc0_rtx, PATTERN (insn))) chain = chain->prev, insn = chain->insn; -#endif new_chain = new_insn_chain (); if (before_p) diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c index 58d235e..e5c4747 100644 --- a/gcc/cfgcleanup.c +++ b/gcc/cfgcleanup.c @@ -1416,12 +1416,11 @@ flow_find_cross_jump (basic_block bb1, basic_block bb2, rtx_insn **f1, i2 = PREV_INSN (i2); } -#if HAVE_cc0 /* Don't allow the insn after a compare to be shared by cross-jumping unless the compare is also shared. */ - if (ninsns && reg_mentioned_p (cc0_rtx, last1) && ! sets_cc0_p (last1)) + if (HAVE_cc0 && ninsns && reg_mentioned_p (cc0_rtx, last1) + && ! sets_cc0_p (last1)) last1 = afterlast1, last2 = afterlast2, last_dir = afterlast_dir, ninsns--; -#endif /* Include preceding notes and labels in the cross-jump. One, this may bring us to the head of the blocks as requested above. @@ -1539,12 +1538,11 @@ flow_find_head_matching_sequence (basic_block bb1, basic_block bb2, rtx_insn **f i2 = NEXT_INSN (i2); } -#if HAVE_cc0 /* Don't allow a compare to be shared by cross-jumping unless the insn after the compare is also shared. 
*/ - if (ninsns && reg_mentioned_p (cc0_rtx, last1) && sets_cc0_p (last1)) + if (HAVE_cc0 && ninsns && reg_mentioned_p (cc0_rtx, last1) + && sets_cc0_p (last1)) last1 = beforelast1, last2 = beforelast2, ninsns--; -#endif if (ninsns) { @@ -2330,11 +2328,9 @@ try_head_merge_bb (basic_block bb) cond = get_condition (jump, &move_before, true, false); if (cond == NULL_RTX) { -#if HAVE_cc0 - if (reg_mentioned_p (cc0_rtx, jump)) + if (HAVE_cc0 && reg_mentioned_p (cc0_rtx, jump)) move_before = prev_nonnote_nondebug_insn (jump); else -#endif move_before = jump; } @@ -2499,11 +2495,9 @@ try_head_merge_bb (basic_block bb) cond = get_condition (jump, &move_before, true, false); if (cond == NULL_RTX) { -#if HAVE_cc0 - if (reg_mentioned_p (cc0_rtx, jump)) + if (HAVE_cc0 && reg_mentioned_p (cc0_rtx, jump)) move_before = prev_nonnote_nondebug_insn (jump); else -#endif move_before = jump; } } @@ -2522,12 +2516,10 @@ try_head_merge_bb (basic_block bb) /* Try again, using a different insertion point. */ move_before = jump; -#if HAVE_cc0 /* Don't try moving before a cc0 user, as that may invalidate the cc0. */ - if (reg_mentioned_p (cc0_rtx, jump)) + if (HAVE_cc0 && reg_mentioned_p (cc0_rtx, jump)) break; -#endif continue; } @@ -2582,12 +2574,10 @@ try_head_merge_bb (basic_block bb) /* For the unmerged insns, try a different insertion point. */ move_before = jump; -#if HAVE_cc0 /* Don't try moving before a cc0 user,
[PATCH 08/12] reduce conditional compilation for HARD_FRAME_POINTER_IS_FRAME_POINTER
From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * alias.c (init_alias_target): Remove ifdef * HARD_FRAME_POINTER_IS_FRAME_POINTER. * df-scan.c (df_insn_refs_collect): Likewise. (df_get_regular_block_artificial_uses): Likewise. (df_get_eh_block_artificial_uses): Likewise. (df_get_entry_block_def_set): Likewise. (df_get_exit_block_use_set): Likewise. * emit-rtl.c (gen_rtx_REG): Likewise. * ira.c (ira_setup_eliminable_regset): Likewise. * reginfo.c (init_reg_sets_1): Likewise. * regrename.c (rename_chains): Likewise. * reload1.c (reload): Likewise. (eliminate_regs_in_insn): Likewise. * resource.c (mark_referenced_resources): Likewise. (init_resource_info): Likewise. --- gcc/alias.c | 7 +++ gcc/df-scan.c | 35 +-- gcc/emit-rtl.c | 6 +++--- gcc/ira.c | 23 --- gcc/reginfo.c | 5 ++--- gcc/regrename.c | 5 ++--- gcc/reload1.c | 10 -- gcc/resource.c | 11 +-- 8 files changed, 48 insertions(+), 54 deletions(-) diff --git a/gcc/alias.c b/gcc/alias.c index a7160f3..8f48660 100644 --- a/gcc/alias.c +++ b/gcc/alias.c @@ -2765,10 +2765,9 @@ init_alias_target (void) = unique_base_value (UNIQUE_BASE_VALUE_ARGP); static_reg_base_value[FRAME_POINTER_REGNUM] = unique_base_value (UNIQUE_BASE_VALUE_FP); -#if !HARD_FRAME_POINTER_IS_FRAME_POINTER - static_reg_base_value[HARD_FRAME_POINTER_REGNUM] -= unique_base_value (UNIQUE_BASE_VALUE_HFP); -#endif + if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) +static_reg_base_value[HARD_FRAME_POINTER_REGNUM] + = unique_base_value (UNIQUE_BASE_VALUE_HFP); } /* Set MEMORY_MODIFIED when X modifies DATA (that is assumed diff --git a/gcc/df-scan.c b/gcc/df-scan.c index b2e2e5d..69332a8 100644 --- a/gcc/df-scan.c +++ b/gcc/df-scan.c @@ -3247,12 +3247,11 @@ df_insn_refs_collect (struct df_collection_rec *collection_rec, regno_reg_rtx[FRAME_POINTER_REGNUM], NULL, bb, insn_info, DF_REF_REG_USE, 0); -#if !HARD_FRAME_POINTER_IS_FRAME_POINTER - df_ref_record (DF_REF_BASE, collection_rec, - regno_reg_rtx[HARD_FRAME_POINTER_REGNUM], - NULL, bb, insn_info, 
- DF_REF_REG_USE, 0); -#endif + if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) + df_ref_record (DF_REF_BASE, collection_rec, + regno_reg_rtx[HARD_FRAME_POINTER_REGNUM], + NULL, bb, insn_info, + DF_REF_REG_USE, 0); break; default: break; @@ -3442,9 +3441,9 @@ df_get_regular_block_artificial_uses (bitmap regular_block_artificial_uses) reference of the frame pointer. */ bitmap_set_bit (regular_block_artificial_uses, FRAME_POINTER_REGNUM); -#if !HARD_FRAME_POINTER_IS_FRAME_POINTER - bitmap_set_bit (regular_block_artificial_uses, HARD_FRAME_POINTER_REGNUM); -#endif + if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) + bitmap_set_bit (regular_block_artificial_uses, + HARD_FRAME_POINTER_REGNUM); #if FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM /* Pseudos with argument area equivalences may require @@ -3494,9 +3493,9 @@ df_get_eh_block_artificial_uses (bitmap eh_block_artificial_uses) if (frame_pointer_needed) { bitmap_set_bit (eh_block_artificial_uses, FRAME_POINTER_REGNUM); -#if !HARD_FRAME_POINTER_IS_FRAME_POINTER - bitmap_set_bit (eh_block_artificial_uses, HARD_FRAME_POINTER_REGNUM); -#endif + if (!HARD_FRAME_POINTER_IS_FRAME_POINTER) + bitmap_set_bit (eh_block_artificial_uses, + HARD_FRAME_POINTER_REGNUM); } #if FRAME_POINTER_REGNUM != ARG_POINTER_REGNUM if (fixed_regs[ARG_POINTER_REGNUM]) @@ -3580,11 +3579,11 @@ df_get_entry_block_def_set (bitmap entry_block_defs) /* Any reference to any pseudo before reload is a potential reference of the frame pointer. */ bitmap_set_bit (entry_block_defs, FRAME_POINTER_REGNUM); -#if !HARD_FRAME_POINTER_IS_FRAME_POINTER + /* If they are different, also mark the hard frame pointer as live. */ - if (!LOCAL_REGNO (HARD_FRAME_POINTER_REGNUM)) + if (!HARD_FRAME_POINTER_IS_FRAME_POINTER + && !LOCAL_REGNO (HARD_FRAME_POINTER_REGNUM)) bitmap_set_bit (entry_block_defs, HARD_FRAME_POINTER_REGNUM); -#endif } /* These registers are live everywhere. 
*/ @@ -3718,11 +3717,11 @@ df_get_exit_block_use_set (bitmap exit_block_uses) if ((!reload_completed) || frame_pointer_needed) { bitmap_set_bit (exit_block_uses, FRAME_POINTER_REGNUM); -#if !HARD_FRAME_POINTER_IS_FRAME_POINTER + /* If they are different, also mark the hard frame pointer as live. */ - if (
[PATCH 07/12] provide default for MASK_RETURN_ADDR
From: Trevor Saunders

gcc/ChangeLog:

2015-04-21  Trevor Saunders

        * defaults.h (MASK_RETURN_ADDR): New definition.
        * except.c (expand_builtin_extract_return_addr): Remove ifdef
        MASK_RETURN_ADDR.
---
 gcc/defaults.h | 4 ++++
 gcc/except.c   | 6 +++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index 767901a..843d7e2 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -388,6 +388,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 #define RETURN_ADDR_OFFSET 0
 #endif

+#ifndef MASK_RETURN_ADDR
+#define MASK_RETURN_ADDR NULL_RTX
+#endif
+
 /* If we have named section and we support weak symbols, then use the
    .jcr section for recording java classes which need to be registered
    at program start-up time.  */
diff --git a/gcc/except.c b/gcc/except.c
index c98163d..5b24006 100644
--- a/gcc/except.c
+++ b/gcc/except.c
@@ -2184,9 +2184,9 @@ expand_builtin_extract_return_addr (tree addr_tree)
     }

   /* First mask out any unwanted bits.  */
-#ifdef MASK_RETURN_ADDR
-  expand_and (Pmode, addr, MASK_RETURN_ADDR, addr);
-#endif
+  rtx mask = MASK_RETURN_ADDR;
+  if (mask)
+    expand_and (Pmode, addr, mask, addr);

   /* Then adjust to find the real return address.  */
   if (RETURN_ADDR_OFFSET)
--
2.3.0.80.g18d0fec.dirty
[PATCH 06/12] provide default for RETURN_ADDR_OFFSET
From: Trevor Saunders

gcc/ChangeLog:

2015-04-21  Trevor Saunders

        * defaults.h (RETURN_ADDR_OFFSET): New definition.
        * except.c (expand_builtin_extract_return_addr): Remove ifdef
        RETURN_ADDR_OFFSET.
        (expand_builtin_frob_return_addr): Likewise.
---
 gcc/defaults.h |  5 +++++
 gcc/except.c   | 14 +++++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/gcc/defaults.h b/gcc/defaults.h
index 911c2f8..767901a 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -383,6 +383,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
 #define EH_RETURN_DATA_REGNO(N) INVALID_REGNUM
 #endif

+/* Offset between the eh handler address and entry in eh tables.  */
+#ifndef RETURN_ADDR_OFFSET
+#define RETURN_ADDR_OFFSET 0
+#endif
+
 /* If we have named section and we support weak symbols, then use the
    .jcr section for recording java classes which need to be registered
    at program start-up time.  */
diff --git a/gcc/except.c b/gcc/except.c
index 7573c88..c98163d 100644
--- a/gcc/except.c
+++ b/gcc/except.c
@@ -2189,9 +2189,8 @@ expand_builtin_extract_return_addr (tree addr_tree)
 #endif

   /* Then adjust to find the real return address.  */
-#if defined (RETURN_ADDR_OFFSET)
-  addr = plus_constant (Pmode, addr, RETURN_ADDR_OFFSET);
-#endif
+  if (RETURN_ADDR_OFFSET)
+    addr = plus_constant (Pmode, addr, RETURN_ADDR_OFFSET);

   return addr;
 }
@@ -2207,10 +2206,11 @@ expand_builtin_frob_return_addr (tree addr_tree)

   addr = convert_memory_address (Pmode, addr);

-#ifdef RETURN_ADDR_OFFSET
-  addr = force_reg (Pmode, addr);
-  addr = plus_constant (Pmode, addr, -RETURN_ADDR_OFFSET);
-#endif
+  if (RETURN_ADDR_OFFSET)
+    {
+      addr = force_reg (Pmode, addr);
+      addr = plus_constant (Pmode, addr, -RETURN_ADDR_OFFSET);
+    }

   return addr;
 }
--
2.3.0.80.g18d0fec.dirty
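One detail is easy to miss when reading the diff: the new `if (RETURN_ADDR_OFFSET)` tests the macro's value, whereas the old `#ifdef` tested only whether it was defined. A rough standalone model of the two except.c functions is sketched below. The use of `long` in place of RTL addresses and the function names `extract_return_addr` and `frob_return_addr` are illustrative assumptions, not GCC's API.

```c
/* Default offset when the target supplies none, as in the
   defaults.h hunk.  */
#ifndef RETURN_ADDR_OFFSET
#define RETURN_ADDR_OFFSET 0
#endif

/* Model of expand_builtin_extract_return_addr: add the offset only
   when it is nonzero.  Both arms are compiled on every target; the
   constant condition lets the compiler discard the dead one.  */
long
extract_return_addr (long addr)
{
  if (RETURN_ADDR_OFFSET)
    addr += RETURN_ADDR_OFFSET;
  return addr;
}

/* Model of expand_builtin_frob_return_addr: the inverse adjustment.  */
long
frob_return_addr (long addr)
{
  if (RETURN_ADDR_OFFSET)
    addr -= RETURN_ADDR_OFFSET;
  return addr;
}
```

With the default of 0 both functions are the identity, and for any offset `frob_return_addr (extract_return_addr (a))` returns `a`.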
[PATCH 04/12] always define HAVE_cc0
From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * genconfig.c (main): Always define HAVE_cc0. * caller-save.c (insert_one_insn): Change ifdef HAVE_cc0 to #if HAVE_cc0. * cfgcleanup.c (flow_find_cross_jump): Likewise. (flow_find_head_matching_sequence): Likewise. (try_head_merge_bb): Likewise. * cfgrtl.c (rtl_merge_blocks): Likewise. (try_redirect_by_replacing_jump): Likewise. (rtl_tidy_fallthru_edge): Likewise. * combine.c (do_SUBST_MODE): Likewise. (insn_a_feeds_b): Likewise. (combine_instructions): Likewise. (can_combine_p): Likewise. (try_combine): Likewise. (find_split_point): Likewise. (subst): Likewise. (simplify_set): Likewise. (distribute_notes): Likewise. * cprop.c (cprop_jump): Likewise. * cse.c (cse_extended_basic_block): Likewise. * df-problems.c (can_move_insns_across): Likewise. * final.c (final): Likewise. (final_scan_insn): Likewise. * function.c (emit_use_return_register_into_block): Likewise. * gcse.c (insert_insn_end_basic_block): Likewise. * haifa-sched.c (sched_init): Likewise. * ira.c (find_moveable_pseudos): Likewise. * loop-invariant.c (find_invariant_insn): Likewise. * lra-constraints.c (curr_insn_transform): Likewise. * optabs.c (prepare_cmp_insn): Likewise. * postreload.c (reload_combine_recognize_const_pattern): * Likewise. * reload.c (find_reloads): Likewise. (find_reloads_address_1): Likewise. * reorg.c (delete_scheduled_jump): Likewise. (steal_delay_list_from_target): Likewise. (steal_delay_list_from_fallthrough): Likewise. (try_merge_delay_insns): Likewise. (redundant_insn): Likewise. (fill_simple_delay_slots): Likewise. (fill_slots_from_thread): Likewise. (delete_computation): Likewise. (relax_delay_slots): Likewise. * sched-deps.c (sched_analyze_2): Likewise. * sched-rgn.c (add_branch_dependences): Likewise. 
--- gcc/caller-save.c | 2 +- gcc/cfgcleanup.c | 12 ++-- gcc/cfgrtl.c | 6 +++--- gcc/combine.c | 36 ++-- gcc/cprop.c | 2 +- gcc/cse.c | 2 +- gcc/df-problems.c | 4 ++-- gcc/final.c | 14 +++--- gcc/function.c| 2 +- gcc/gcse.c| 2 +- gcc/genconfig.c | 1 + gcc/haifa-sched.c | 2 +- gcc/ira.c | 4 ++-- gcc/loop-invariant.c | 2 +- gcc/lra-constraints.c | 2 +- gcc/optabs.c | 2 +- gcc/postreload.c | 2 +- gcc/reload.c | 6 +++--- gcc/reorg.c | 30 +++--- gcc/sched-deps.c | 2 +- gcc/sched-rgn.c | 2 +- 21 files changed, 69 insertions(+), 68 deletions(-) diff --git a/gcc/caller-save.c b/gcc/caller-save.c index 3b01941..fc575eb 100644 --- a/gcc/caller-save.c +++ b/gcc/caller-save.c @@ -1400,7 +1400,7 @@ insert_one_insn (struct insn_chain *chain, int before_p, int code, rtx pat) rtx_insn *insn = chain->insn; struct insn_chain *new_chain; -#ifdef HAVE_cc0 +#if HAVE_cc0 /* If INSN references CC0, put our insns in front of the insn that sets CC0. This is always safe, since the only way we could be passed an insn that references CC0 is for a restore, and doing a restore earlier diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c index cee152e..58d235e 100644 --- a/gcc/cfgcleanup.c +++ b/gcc/cfgcleanup.c @@ -1416,7 +1416,7 @@ flow_find_cross_jump (basic_block bb1, basic_block bb2, rtx_insn **f1, i2 = PREV_INSN (i2); } -#ifdef HAVE_cc0 +#if HAVE_cc0 /* Don't allow the insn after a compare to be shared by cross-jumping unless the compare is also shared. */ if (ninsns && reg_mentioned_p (cc0_rtx, last1) && ! sets_cc0_p (last1)) @@ -1539,7 +1539,7 @@ flow_find_head_matching_sequence (basic_block bb1, basic_block bb2, rtx_insn **f i2 = NEXT_INSN (i2); } -#ifdef HAVE_cc0 +#if HAVE_cc0 /* Don't allow a compare to be shared by cross-jumping unless the insn after the compare is also shared. 
*/ if (ninsns && reg_mentioned_p (cc0_rtx, last1) && sets_cc0_p (last1)) @@ -2330,7 +2330,7 @@ try_head_merge_bb (basic_block bb) cond = get_condition (jump, &move_before, true, false); if (cond == NULL_RTX) { -#ifdef HAVE_cc0 +#if HAVE_cc0 if (reg_mentioned_p (cc0_rtx, jump)) move_before = prev_nonnote_nondebug_insn (jump); else @@ -2499,7 +2499,7 @@ try_head_merge_bb (basic_block bb) cond = get_condition (jump, &move_before, true, false); if (cond == NULL_RTX) { -#ifdef HAVE_cc0 +#if HAVE_cc0 if (reg_mentioned_p (cc0_rtx, jump)) move_before = prev_nonnote_nondebug_insn (jump); else @@ -2522,7 +2522,7 @@ try_head_merge_bb
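Once genconfig always defines `HAVE_cc0`, every `#ifdef HAVE_cc0` has to become `#if HAVE_cc0`, which is why this patch touches so many files. A macro defined to 0 still counts as defined. The sketch below uses a hypothetical `HAVE_FEATURE` macro, fixed at 0 to model a target without the feature, and shows the two preprocessor tests disagreeing.

```c
/* Hypothetical stand-in for a generated config macro that is always
   defined, here with the value 0 (feature absent).  */
#define HAVE_FEATURE 0

/* #ifdef asks only "is the name defined?", so this returns 1 even
   though the feature is absent.  */
int
seen_by_ifdef (void)
{
#ifdef HAVE_FEATURE
  return 1;
#else
  return 0;
#endif
}

/* #if evaluates the value, so this returns 0 as intended.  */
int
seen_by_if (void)
{
#if HAVE_FEATURE
  return 1;
#else
  return 0;
#endif
}
```

Any `#ifdef` left unconverted would silently enable the guarded code on every target, so the conversion has to be done everywhere in one step.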
[PATCH 03/12] more removal of ifdef HAVE_cc0
From: Trevor Saunders

gcc/ChangeLog:

2015-04-21  Trevor Saunders

        * combine.c (find_single_use): Remove HAVE_cc0 ifdef for code
        that is trivially dead on non cc0 targets.
        (simplify_set): Likewise.
        (mark_used_regs_combine): Likewise.
        * cse.c (new_basic_block): Likewise.
        (fold_rtx): Likewise.
        (cse_insn): Likewise.
        (cse_extended_basic_block): Likewise.
        (set_live_p): Likewise.
        * rtlanal.c (canonicalize_condition): Likewise.
        * simplify-rtx.c (simplify_binary_operation_1): Likewise.
---
 gcc/combine.c      |  6 ------
 gcc/cse.c          | 18 ------------------
 gcc/rtlanal.c      |  2 --
 gcc/simplify-rtx.c |  5 ++---
 4 files changed, 2 insertions(+), 29 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 46cd6db..0a35b8f 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -686,7 +686,6 @@ find_single_use (rtx dest, rtx_insn *insn, rtx_insn **ploc)
   rtx *result;
   struct insn_link *link;

-#ifdef HAVE_cc0
   if (dest == cc0_rtx)
     {
       next = NEXT_INSN (insn);
@@ -699,7 +698,6 @@ find_single_use (rtx dest, rtx_insn *insn, rtx_insn **ploc)
         *ploc = next;
       return result;
     }
-#endif

   if (!REG_P (dest))
     return 0;
@@ -6724,7 +6722,6 @@ simplify_set (rtx x)
       src = SET_SRC (x), dest = SET_DEST (x);
     }

-#ifdef HAVE_cc0
   /* If we have (set (cc0) (subreg ...)), we try to remove the subreg
      in SRC.  */
   if (dest == cc0_rtx
@@ -6744,7 +6741,6 @@ simplify_set (rtx x)
           src = SET_SRC (x);
         }
     }
-#endif

 #ifdef LOAD_EXTEND_OP
   /* If we have (set FOO (subreg:M (mem:N BAR) 0)) with M wider than N, this
@@ -13193,11 +13189,9 @@ mark_used_regs_combine (rtx x)
     case ADDR_VEC:
     case ADDR_DIFF_VEC:
     case ASM_INPUT:
-#ifdef HAVE_cc0
     /* CC0 must die in the insn after it is set, so we don't need to take
        special note of it here.  */
     case CC0:
-#endif
       return;

     case CLOBBER:
diff --git a/gcc/cse.c b/gcc/cse.c
index 2a33827..d184d27 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -281,7 +281,6 @@ struct qty_table_elem
 /* The table of all qtys, indexed by qty number.
*/ static struct qty_table_elem *qty_table; -#ifdef HAVE_cc0 /* For machines that have a CC0, we do not record its value in the hash table since its use is guaranteed to be the insn immediately following its definition and any other insn is presumed to invalidate it. @@ -293,7 +292,6 @@ static struct qty_table_elem *qty_table; static rtx this_insn_cc0, prev_insn_cc0; static machine_mode this_insn_cc0_mode, prev_insn_cc0_mode; -#endif /* Insn being scanned. */ @@ -884,9 +882,7 @@ new_basic_block (void) } } -#ifdef HAVE_cc0 prev_insn_cc0 = 0; -#endif } /* Say that register REG contains a quantity in mode MODE not in any @@ -3166,10 +3162,8 @@ fold_rtx (rtx x, rtx_insn *insn) case EXPR_LIST: return x; -#ifdef HAVE_cc0 case CC0: return prev_insn_cc0; -#endif case ASM_OPERANDS: if (insn) @@ -3223,7 +3217,6 @@ fold_rtx (rtx x, rtx_insn *insn) const_arg = folded_arg; break; -#ifdef HAVE_cc0 case CC0: /* The cc0-user and cc0-setter may be in different blocks if the cc0-setter potentially traps. In that case PREV_INSN_CC0 @@ -3247,7 +3240,6 @@ fold_rtx (rtx x, rtx_insn *insn) const_arg = equiv_constant (folded_arg); } break; -#endif default: folded_arg = fold_rtx (folded_arg, insn); @@ -4522,11 +4514,9 @@ cse_insn (rtx_insn *insn) sets = XALLOCAVEC (struct set, XVECLEN (x, 0)); this_insn = insn; -#ifdef HAVE_cc0 /* Records what this insn does to set CC0. */ this_insn_cc0 = 0; this_insn_cc0_mode = VOIDmode; -#endif /* Find all regs explicitly clobbered in this insn, to ensure they are not replaced with any other regs @@ -5541,7 +5531,6 @@ cse_insn (rtx_insn *insn) } } -#ifdef HAVE_cc0 /* If setting CC0, record what it was set to, or a constant, if it is equivalent to a constant. If it is being set to a floating-point value, make a COMPARE with the appropriate constant of 0. 
If we @@ -5556,7 +5545,6 @@ cse_insn (rtx_insn *insn) this_insn_cc0 = gen_rtx_COMPARE (VOIDmode, this_insn_cc0, CONST0_RTX (mode)); } -#endif } /* Now enter all non-volatile source expressions in the hash table @@ -6604,11 +6592,9 @@ cse_extended_basic_block (struct cse_basic_block_data *ebb_data) record_jump_equiv (insn, taken); } -#ifdef HAVE_cc0 /* Clear the CC0-tracking related insns, they can't provide useful information across basic block boundaries. */ prev_insn_cc0 = 0; -#endif } gcc_assert (next_qty <= max_qty); @@ -6859,21 +6845,17 @@ static bool set_live_p (rtx set, rtx_in
[PATCH 05/12] make some HAVE_cc0 code always compiled
From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * cfgrtl.c (rtl_merge_blocks): Change #if HAVE_cc0 to if (HAVE_cc0) (try_redirect_by_replacing_jump): Likewise. (rtl_tidy_fallthru_edge): Likewise. * combine.c (insn_a_feeds_b): Likewise. (find_split_point): Likewise. (simplify_set): Likewise. * cprop.c (cprop_jump): Likewise. * cse.c (cse_extended_basic_block): Likewise. * df-problems.c (can_move_insns_across): Likewise. * function.c (emit_use_return_register_into_block): Likewise. * haifa-sched.c (sched_init): Likewise. * ira.c (find_moveable_pseudos): Likewise. * loop-invariant.c (find_invariant_insn): Likewise. * lra-constraints.c (curr_insn_transform): Likewise. * postreload.c (reload_combine_recognize_const_pattern): Likewise. * reload.c (find_reloads): Likewise. * reorg.c (delete_scheduled_jump): Likewise. (steal_delay_list_from_target): Likewise. (steal_delay_list_from_fallthrough): Likewise. (redundant_insn): Likewise. (fill_simple_delay_slots): Likewise. (fill_slots_from_thread): Likewise. (delete_computation): Likewise. * sched-rgn.c (add_branch_dependences): Likewise. --- gcc/cfgrtl.c | 12 +++- gcc/combine.c | 10 ++ gcc/cprop.c | 4 +--- gcc/cse.c | 4 +--- gcc/df-problems.c | 4 +--- gcc/function.c| 5 ++--- gcc/haifa-sched.c | 3 +-- gcc/ira.c | 5 ++--- gcc/loop-invariant.c | 4 +--- gcc/lra-constraints.c | 6 ++ gcc/postreload.c | 4 +--- gcc/reload.c | 10 +++--- gcc/reorg.c | 32 gcc/sched-rgn.c | 4 +--- 14 files changed, 29 insertions(+), 78 deletions(-) diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c index 4c1708f..d93a49e 100644 --- a/gcc/cfgrtl.c +++ b/gcc/cfgrtl.c @@ -893,10 +893,9 @@ rtl_merge_blocks (basic_block a, basic_block b) del_first = a_end; -#if HAVE_cc0 /* If this was a conditional jump, we need to also delete the insn that set cc0.
*/ - if (only_sets_cc0_p (prev)) + if (HAVE_cc0 && only_sets_cc0_p (prev)) { rtx_insn *tmp = prev; @@ -905,7 +904,6 @@ rtl_merge_blocks (basic_block a, basic_block b) prev = BB_HEAD (a); del_first = tmp; } -#endif a_end = PREV_INSN (del_first); } @@ -1064,11 +1062,9 @@ try_redirect_by_replacing_jump (edge e, basic_block target, bool in_cfglayout) /* In case we zap a conditional jump, we'll need to kill the cc0 setter too. */ kill_from = insn; -#if HAVE_cc0 - if (reg_mentioned_p (cc0_rtx, PATTERN (insn)) + if (HAVE_cc0 && reg_mentioned_p (cc0_rtx, PATTERN (insn)) && only_sets_cc0_p (PREV_INSN (insn))) kill_from = PREV_INSN (insn); -#endif /* See if we can create the fallthru edge. */ if (in_cfglayout || can_fallthru (src, target)) @@ -1825,12 +1821,10 @@ rtl_tidy_fallthru_edge (edge e) delete_insn (table); } -#if HAVE_cc0 /* If this was a conditional jump, we need to also delete the insn that set cc0. */ - if (any_condjump_p (q) && only_sets_cc0_p (PREV_INSN (q))) + if (HAVE_cc0 && any_condjump_p (q) && only_sets_cc0_p (PREV_INSN (q))) q = PREV_INSN (q); -#endif q = PREV_INSN (q); } diff --git a/gcc/combine.c b/gcc/combine.c index 430084e..d71f863 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -1141,10 +1141,8 @@ insn_a_feeds_b (rtx_insn *a, rtx_insn *b) FOR_EACH_LOG_LINK (links, b) if (links->insn == a) return true; -#if HAVE_cc0 - if (sets_cc0_p (a)) + if (HAVE_cc0 && sets_cc0_p (a)) return true; -#endif return false; } @@ -4816,7 +4814,6 @@ find_split_point (rtx *loc, rtx_insn *insn, bool set_src) break; case SET: -#if HAVE_cc0 /* If SET_DEST is CC0 and SET_SRC is not an operand, a COMPARE, or a ZERO_EXTRACT, the most likely reason why this doesn't match is that we need to put the operand into a register. So split at that @@ -4829,7 +4826,6 @@ find_split_point (rtx *loc, rtx_insn *insn, bool set_src) && ! (GET_CODE (SET_SRC (x)) == SUBREG && OBJECT_P (SUBREG_REG (SET_SRC (x) return &SET_SRC (x); -#endif /* See if we can split SET_SRC as it stands. 
*/ split = find_split_point (&SET_SRC (x), insn, true); @@ -6582,13 +6578,12 @@ simplify_set (rtx x) else compare_mode = SELECT_CC_MODE (new_code, op0, op1); -#if !HAVE_cc0 /* If the mode changed, we have to change SET_DEST, the mode in the compare, and the mode in the place SET_DEST is used. If SET_DEST is a hard register, just build new versions with the proper mode. If it is a pseudo, we lose unless it is only time we set the pseudo, in
[PATCH 02/12] remove some ifdef HAVE_cc0
From: Trevor Saunders gcc/ChangeLog: 2015-04-21 Trevor Saunders * conditions.h: Define macros even if HAVE_cc0 is undefined. * emit-rtl.c: Define functions even if HAVE_cc0 is undefined. * final.c: Likewise. * jump.c: Likewise. * recog.c: Likewise. * recog.h: Declare functions even when HAVE_cc0 is undefined. * sched-deps.c (sched_analyze_2): Always compile case for cc0. --- gcc/conditions.h | 6 -- gcc/emit-rtl.c | 2 -- gcc/final.c | 2 -- gcc/jump.c | 3 --- gcc/recog.c | 2 -- gcc/recog.h | 2 -- gcc/sched-deps.c | 5 +++-- 7 files changed, 3 insertions(+), 19 deletions(-) diff --git a/gcc/conditions.h b/gcc/conditions.h index 2308bfc..7cd1e1c 100644 --- a/gcc/conditions.h +++ b/gcc/conditions.h @@ -20,10 +20,6 @@ along with GCC; see the file COPYING3. If not see #ifndef GCC_CONDITIONS_H #define GCC_CONDITIONS_H -/* None of the things in the files exist if we don't use CC0. */ - -#ifdef HAVE_cc0 - /* The variable cc_status says how to interpret the condition code. It is set by output routines for an instruction that sets the cc's and examined by output routines for jump instructions. @@ -117,6 +113,4 @@ extern CC_STATUS cc_status; (cc_status.flags = 0, cc_status.value1 = 0, cc_status.value2 = 0, \ CC_STATUS_MDEP_INIT) -#endif - #endif /* GCC_CONDITIONS_H */ diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index 483eacb..c1974bb 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -3541,7 +3541,6 @@ prev_active_insn (rtx uncast_insn) return insn; } -#ifdef HAVE_cc0 /* Return the next insn that uses CC0 after INSN, which is assumed to set it. This is the inverse of prev_cc0_setter (i.e., prev_cc0_setter applied to the result of this function should yield INSN). @@ -3589,7 +3588,6 @@ prev_cc0_setter (rtx uncast_insn) return insn; } -#endif #ifdef AUTO_INC_DEC /* Find a RTX_AUTOINC class rtx which matches DATA. 
*/ diff --git a/gcc/final.c b/gcc/final.c index 1fa93d9..41f6bd9 100644 --- a/gcc/final.c +++ b/gcc/final.c @@ -191,7 +191,6 @@ static rtx last_ignored_compare = 0; static int insn_counter = 0; -#ifdef HAVE_cc0 /* This variable contains machine-dependent flags (defined in tm.h) set and examined by output routines that describe how to interpret the condition codes properly. */ @@ -202,7 +201,6 @@ CC_STATUS cc_status; from before the insn. */ CC_STATUS cc_prev_status; -#endif /* Number of unmatched NOTE_INSN_BLOCK_BEG notes we have seen. */ diff --git a/gcc/jump.c b/gcc/jump.c index 34b3b7b..bc91550 100644 --- a/gcc/jump.c +++ b/gcc/jump.c @@ -1044,8 +1044,6 @@ jump_to_label_p (const rtx_insn *insn) && JUMP_LABEL (insn) != NULL && !ANY_RETURN_P (JUMP_LABEL (insn))); } -#ifdef HAVE_cc0 - /* Return nonzero if X is an RTX that only sets the condition codes and has no side effects. */ @@ -1094,7 +1092,6 @@ sets_cc0_p (const_rtx x) } return 0; } -#endif /* Find all CODE_LABELs referred to in X, and increment their use counts. If INSN is a JUMP_INSN and there is at least one diff --git a/gcc/recog.c b/gcc/recog.c index a9d3b1f..c3ad86f 100644 --- a/gcc/recog.c +++ b/gcc/recog.c @@ -971,7 +971,6 @@ validate_simplify_insn (rtx insn) return ((num_changes_pending () > 0) && (apply_change_group () > 0)); } -#ifdef HAVE_cc0 /* Return 1 if the insn using CC0 set by INSN does not contain any ordered tests applied to the condition codes. EQ and NE tests do not count. */ @@ -988,7 +987,6 @@ next_insn_tests_no_inequality (rtx insn) return (INSN_P (next) && ! inequality_comparisons_p (PATTERN (next))); } -#endif /* Return 1 if OP is a valid general operand for machine mode MODE. 
This is either a register reference, a memory reference, diff --git a/gcc/recog.h b/gcc/recog.h index 45ea671..8a38b26 100644 --- a/gcc/recog.h +++ b/gcc/recog.h @@ -112,9 +112,7 @@ extern void validate_replace_rtx_group (rtx, rtx, rtx); extern void validate_replace_src_group (rtx, rtx, rtx); extern bool validate_simplify_insn (rtx insn); extern int num_changes_pending (void); -#ifdef HAVE_cc0 extern int next_insn_tests_no_inequality (rtx); -#endif extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode); extern int offsettable_memref_p (rtx); diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c index 5434831..31de6be 100644 --- a/gcc/sched-deps.c +++ b/gcc/sched-deps.c @@ -2608,8 +2608,10 @@ sched_analyze_2 (struct deps_desc *deps, rtx x, rtx_insn *insn) return; -#ifdef HAVE_cc0 case CC0: +#ifdef HAVE_cc0 + gcc_unreachable (); +#endif /* User of CC0 depends on immediately preceding insn. */ SCHED_GROUP_P (insn) = 1; /* Don't move CC0 setter to another block (it can set up the @@ -2620,7 +2622,6 @@ sched_analyze_2 (struct deps_desc *deps, rtx x, rtx_insn *insn) sched_deps_info->finish_rhs (); return; -#endif case R
[PATCH 00/12] Reduce conditional compilation
From: Trevor Saunders

Hi,

This is a first round of patches to reduce the amount of code within #if / #ifdef. This makes it incrementally easier to avoid breaking configs other than the one being built, and moves things slightly closer to using target hooks for everything.

Each commit was bootstrapped and regtested on x86_64-linux-gnu without regression, and the whole patch set was run through config-list.mk without issue. OK?

Trevor Saunders (12):
  add default definition of EH_RETURN_DATA_REGNO
  remove some ifdef HAVE_cc0
  more HAVE_cc0
  always define HAVE_cc0
  make some HAVE_cc0 code always compiled
  provide default for RETURN_ADDR_OFFSET
  provide default for MASK_RETURN_ADDR
  reduce conditional compilation for HARD_FRAME_POINTER_IS_FRAME_POINTER
  remove #if for PIC_OFFSET_TABLE_REGNUM
  remove more ifdefs for HAVE_cc0
  provide default for INSN_SETS_ARE_DELAYED
  add default for INSN_REFERENCES_ARE_DELAYED

 gcc/alias.c           | 7 ++---
 gcc/builtins.c        | 2 --
 gcc/caller-save.c     | 4 +--
 gcc/cfgcleanup.c      | 26 +---
 gcc/cfgrtl.c          | 12 ++--
 gcc/combine.c         | 84 ++-
 gcc/conditions.h      | 6
 gcc/cprop.c           | 4 +--
 gcc/cse.c             | 22 +-
 gcc/defaults.h        | 23 ++
 gcc/df-problems.c     | 9 ++
 gcc/df-scan.c         | 46 +++-
 gcc/emit-rtl.c        | 8 ++---
 gcc/except.c          | 26 ++--
 gcc/final.c           | 43 --
 gcc/function.c        | 5 ++-
 gcc/gcse.c            | 24 ---
 gcc/genconfig.c       | 1 +
 gcc/haifa-sched.c     | 5 +--
 gcc/ira-lives.c       | 2 --
 gcc/ira.c             | 33 +---
 gcc/jump.c            | 3 --
 gcc/loop-invariant.c  | 4 +--
 gcc/lra-constraints.c | 6 ++--
 gcc/lra-lives.c       | 2 --
 gcc/optabs.c          | 2 +-
 gcc/postreload.c      | 4 +--
 gcc/recog.c           | 2 --
 gcc/recog.h           | 2 --
 gcc/reginfo.c         | 5 ++-
 gcc/regrename.c       | 5 ++-
 gcc/reload.c          | 12 +++-
 gcc/reload1.c         | 10 +++---
 gcc/reorg.c           | 68 ++---
 gcc/resource.c        | 15 +++--
 gcc/rtlanal.c         | 2 --
 gcc/sched-deps.c      | 5 +--
 gcc/sched-rgn.c       | 4 +--
 gcc/simplify-rtx.c    | 5 ++-
 39 files changed, 199 insertions(+), 349 deletions(-)

--
2.3.0.80.g18d0fec.dirty
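
For readers skimming the series, here is a rough sketch of the two patterns it applies. It is an illustration only, not code from the patches. First, a macro gets a default so that it is always defined. Second, a feature flag is tested with an ordinary `if` rather than `#if`, so the guarded code is parsed and type-checked on every configuration and then folded away when the flag is 0. HAVE_FEATURE, DATA_REGNO, only_sets_flag_p, data_regno and the two count_deleted_* functions are hypothetical stand-ins. INVALID_REGNUM borrows the GCC spelling but is defined locally here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature flag standing in for HAVE_cc0.  A target header
   may define it to 1; otherwise it defaults to 0, so it is always
   defined and can be used in ordinary C expressions.  */
#ifndef HAVE_FEATURE
#define HAVE_FEATURE 0
#endif

/* Local definition for this sketch only.  */
#define INVALID_REGNUM (~(unsigned int) 0)

/* Hypothetical target macro with a default, in the style of a
   defaults.h entry: callers no longer need #ifdef around each use.  */
#ifndef DATA_REGNO
#define DATA_REGNO(n) ((void) (n), INVALID_REGNUM)
#endif

/* Hypothetical predicate: negative ids model "flag-setting" insns.  */
static bool
only_sets_flag_p (int insn)
{
  return insn < 0;
}

/* Before: the guarded statement is invisible to the compiler unless
   HAVE_FEATURE is nonzero, so it can bit-rot unnoticed.  */
int
count_deleted_old (int prev)
{
  int deleted = 1;
#if HAVE_FEATURE
  if (only_sets_flag_p (prev))
    deleted++;
#endif
  (void) prev;
  return deleted;
}

/* After: the same logic is always compiled; when HAVE_FEATURE is the
   constant 0 the condition is false and the branch is never taken.  */
int
count_deleted_new (int prev)
{
  int deleted = 1;
  if (HAVE_FEATURE && only_sets_flag_p (prev))
    deleted++;
  return deleted;
}

/* Use of the defaulted macro without any #ifdef at the call site.  */
unsigned int
data_regno (unsigned int n)
{
  assert (n < 1024u);	/* Arbitrary sanity bound for the sketch.  */
  return DATA_REGNO (n);
}
```

Both versions behave identically for a given HAVE_FEATURE value. The difference is that the second form keeps the guarded code visible to the compiler on every configuration, which is the maintenance benefit the cover letter describes.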