Re: [patch, mips] Fix for PR target/56942
Steve Ellcey writes:
> OK, here is patch to next_real_insn to keep the ordering property intact
> and fix the bug.  OK for checkin?

Thanks, looks good to me, but an rtl/middle-end/global maintainer would
need to approve it.

Richard
Re: [testsuite] Disabling gcc.dg/cpp/trad/include.c for Android
2013/4/29 Mike Stump:
> On Jan 9, 2013, at 7:14 AM, Alexander Ivchenko wrote:
>> We have test fail for gcc.dg/cpp/trad/include.c on Android. The
>> reason for that is that
>> -ftraditional-cpp is not expected to work on Android due to variadic
>> macro (like #define __builtin_warning(x, y...))
>> in standard headers and traditional preprocessor cannot handle them.
>> The attached patch disables that test.
>
> Be sure to ask, Ok? in your patch submittals.
>
> Ok.

thank you! I thought I did ask..
> ...
> in standard headers and traditional preprocessor cannot handle them."
>
> is it ok for trunk?

could someone commit that patch please? I don't have commit access.

thanks,
Alexander
RE: [patch] cilkplus: Array notation for C patch
Here's a review of the changes to the compiler proper in this patch. I don't think much more will come up from reviews of the compiler changes - but I still need to review the testsuite changes against the language specification to make sure that everything is properly covered in the testsuite (which might in turn show up further things needing to be addressed in the compiler).

> + error_at (location, "__sec_implicit_index parameter must be a "
> + "integer constant expression");

"an", not "a".

> diff --git a/gcc/c/ChangeLog.cilkplus b/gcc/c/ChangeLog.cilkplus

I believe the actual trunk commit, when this is ready to go in, should simply add the ChangeLog entries for the committed changes to the top of the existing ChangeLog files, rather than creating such a new ChangeLog file.

> diff --git a/gcc/c/c-array-notation.c b/gcc/c/c-array-notation.c
> +#include "gcc.h"

That header is for the compiler driver. Including it in anything built into cc1 is suspicious.

> +/* Given an FNDECL or an ADDR_EXPR, return the corresponding

I think you mean something like "Given FNDECL, a FUNCTION_DECL or an ADDR_EXPR", rather than "Given an FNDECL or an ADDR_EXPR".

> +/* Set *RANK of expression ARRAY, ignoring array notation specific built-in
> + functions if IGNORE_BUILTIN_FN is true. The ORIG_EXPR is printed out if an
> + error occured in the rank calculation. The functions returns false if it
> + encounters an error in rank calculation.
> +
> + For example, an array notation of A[:][:] or B[0:10][0:5:2] or C[5][:][1:0]
> + all have a rank of 2. */

This still doesn't seem to say anything about the semantics of the value *RANK on entry to the function. (I think it's something like *RANK being either 0, or the rank of another subexpression that must have the same rank as this one, but you need to say that.)

> +/* Extracts all array notations in NODE and stores them in ARRAY_LIST. If
> + IGNORE_BUILTIN_FN is set, then array notations inside array notation
> + specific built-in functions are ignored. The NODE can be anything from a
> + full function to a single variable. */

"can be anything"? That seems rather ad hoc. I'd think there should be defined classes of trees - probably expressions and things that can appear in them, but not tcc_exceptional or tcc_type - that can appear here, and that you should check (in an assertion) for EXPR_P or one of the other cases allowed.

In particular, you allow TREE_LIST in this function. How can TREE_LISTs get here and can they readily be avoided? It's generally a bad idea (and rare) to have places where something with the static type "tree" can be either a TREE_LIST or some other kind of tree. I note that in the function replace_array_notations, which is presumably intended to match this one, you *don't* handle TREE_LIST.

These functions recurse down into operands of trees. But what about into types? If a type contains an expression that needs to be evaluated as part of evaluating VLA sizes, that gets stored specially by grokdeclarator, and in the end that expression gets put in a statement somewhere to ensure that it does get evaluated. But that's for expressions with side effects involved in types. Array notation expressions may not necessarily have side effects. And as I understand it, even if an expression is extracted OK by extract_array_notation_exprs because it appears somewhere that function looks at, replace_array_notations will need to substitute it everywhere - substituting a copy appearing directly in a statement / expression, while missing a copy embedded in a type, won't suffice. So maybe you need to recurse down into types in some way? (Then I'm not entirely sure when it's safe to modify an existing type and when you'd need to build up a new, similar type with the expression modified appropriately.)

Maybe an example would help.
I see nothing in the Cilk Plus specification to rule out expressions of the form

  a[:] = ((int (*)[b[:]][c[:]]) d[:])[1][2];

meaning that each element of the array d should be cast to a pointer-to-VLA type, with the dimensions of the VLA coming from corresponding elements of arrays b and c, and then element[1][2] of that VLA extracted. But the rules for determining rank don't really seem to consider subexpressions that appear within types, so maybe adjustments are needed there as well. (Of course such type names can appear within expressions in sizeof, or compound literals, or several other cases in the syntax, not just in casts.)

It's possible that the above case does work despite types not being adjusted, because the logic to multiply by array sizes when doing pointer addition / array dereference may already have taken effect while the expressions were constructed. But leaving types unadjusted still seems rather risky, and would seem likely to cause problems with debug info (consider the case where a variable is actually being declared with the type involving arra
[Fortran-Dev] Some ubounds -> extent changes
This patch changes some ubounds to extent. The patch is relative to my type patch - but it also applies without. It also fixes a bunch of testsuite failures.

Built and regtested on x86-64-gnu-linux. I intend to commit the patch soon. Comments and suggestions are welcome.

Tobias

2013-04-29  Tobias Burnus

        * trans-array.c (gfc_trans_dummy_array_bias, get_std_lbound,
        gfc_alloc_allocatable_for_assignment): Change ubound to extent.
        * trans-expr.c (gfc_trans_alloc_subarray_assign): Ditto.
        * trans-intrinsic.c (gfc_conv_intrinsic_bound): Ditto.

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 49eaaae..34421df 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -8110,7 +8110,7 @@ static tree
 get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size)
 {
   tree lbound;
-  tree ubound;
+  tree extent;
   tree stride;
   tree cond, cond1, cond3, cond4;
   tree tmp;
@@ -8120,10 +8120,10 @@ get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size)
     {
       tmp = gfc_rank_cst[dim];
       lbound = gfc_conv_descriptor_lbound_get (desc, tmp);
-      ubound = gfc_conv_descriptor_ubound_get (desc, tmp);
+      extent = gfc_conv_descriptor_extent_get (desc, tmp);
       stride = gfc_conv_descriptor_stride_get (desc, tmp);
-      cond1 = fold_build2_loc (input_location, GE_EXPR, boolean_type_node,
-                               ubound, lbound);
+      cond1 = fold_build2_loc (input_location, GT_EXPR, boolean_type_node,
+                               extent, gfc_index_zero_node);
       cond3 = fold_build2_loc (input_location, GE_EXPR, boolean_type_node,
                                stride, gfc_index_zero_node);
       cond3 = fold_build2_loc (input_location, TRUTH_AND_EXPR,
@@ -8240,7 +8240,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   tree tmp;
   tree tmp2;
   tree lbound;
-  tree ubound;
+  tree extent;
   tree desc;
   tree old_desc;
   tree desc2;
@@ -8248,7 +8248,6 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
   tree jump_label1;
   tree jump_label2;
   tree neq_size;
-  tree lbd;
   int n;
   int dim;
   gfc_array_spec * as;
@@ -8411,37 +8410,24 @@
gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop, for (n = 0; n < expr2->rank; n++) { + lbound = gfc_index_one_node; tmp = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type, loop->to[n], loop->from[n]); - tmp = fold_build2_loc (input_location, PLUS_EXPR, + extent = fold_build2_loc (input_location, PLUS_EXPR, gfc_array_index_type, tmp, gfc_index_one_node); - lbound = gfc_index_one_node; - ubound = tmp; - if (as) - { - lbd = get_std_lbound (expr2, desc2, n, -as->type == AS_ASSUMED_SIZE); - ubound = fold_build2_loc (input_location, -MINUS_EXPR, -gfc_array_index_type, -ubound, lbound); - ubound = fold_build2_loc (input_location, -PLUS_EXPR, -gfc_array_index_type, -ubound, lbd); - lbound = lbd; - } + lbound = get_std_lbound (expr2, desc2, n, + as->type == AS_ASSUMED_SIZE); gfc_conv_descriptor_lbound_set (&fblock, desc, gfc_rank_cst[n], lbound); - gfc_conv_descriptor_ubound_set (&fblock, desc, + gfc_conv_descriptor_extent_set (&fblock, desc, gfc_rank_cst[n], - ubound); + extent); gfc_conv_descriptor_stride_set (&fblock, desc, gfc_rank_cst[n], size1); @@ -8455,7 +8441,7 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop, offset, tmp2); size1 = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, - tmp, size1); + extent, size1); } /* Set the lhs descriptor and scalarizer offsets. For rank > 1, diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index e21c3d2..2370f44 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -5830,7 +5830,6 @@ gfc_trans_alloc_subarray_assign (tree dest, gfc_component * cm, for (n = 0; n < expr->rank; n++) { - tree span; tree lbound; /* Obtain the correct lbound - ISO/IEC TR 15581:2001 page 9. @@ -5860,14 +5859,7 @@ gfc_trans_alloc_subarray_assign (tree dest, gfc_component * cm, lbound = fold_convert (gfc_array_index_type, lbound); - /* Shift the bounds and set the offset accordingly. 
*/ - tmp = gfc_conv_descriptor_ubound_get (dest, gfc_rank_cst[n]); - span = fold_build2_loc (input_location, MINUS_EXPR, gfc_array_index_type, - tmp, gfc_conv_descriptor_lbound_get (dest, gfc_rank_cst[n])); - tmp = fold_build2_loc (input_location, PLUS_EXPR, gfc_array_index_type, - span, lbound); - gfc_conv_descriptor_ubound_set (&block, dest, - gfc_rank_cst[n], tmp); + /* Shift the lower_bound and set the offset accordingly. */ gfc_conv_descriptor_lbound_set (&block, dest, gfc_rank_cst[n], lbound); diff --git a/gcc/fortran/trans
MEM_REF representation problem, and folding fix
Currently, MEM_REF contains two pointer arguments, one which is supposed to be a base object and another which is supposed to be a constant offset. This representation is somewhat problematic, as not all machines treat pointer values as essentially integers. On machines where size_t is smaller than a pointer, for example m32c where it's due to limitations in the compiler, or the port I've been working on recently where pointers contain a segment selector that does not participate in additions, this is not an accurate representation, and it does cause real issues. It would be better to use a representation more like POINTER_PLUS with a pointer and a real sizetype integer.

Can someone explain the comment in tree.def which states that the type of the constant offset is used for TBAA purposes? It states "MEM_REF is equivalent to ((typeof(c))p)->x [...]", so why not represent it as MEM_REF <(desired type)p, (size_t)c>?

The following patch works around one instance of the problem. When we fold an offset addition, the addition must be performed in sizetype, otherwise we may get unwanted overflow. This bug triggers on m32c for example, where an offset of 65528 (representing -8) and an offset of 8 are added, yielding an offset of 65536 instead of zero. Solved by performing the intermediate computation in sizetype.

Bootstrapped and tested on x86_64-linux (all languages except Ada) with no changes in the tests, and tested on m32c-elf where it fixes 22 failures. Ok?

Bernd

        * fold-const.c (fold_binary_loc): When folding an addition in the
        offset of a memref, use size_type to perform the arithmetic.
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 59dbc03..6f092ab 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -10025,15 +10025,17 @@ fold_binary_loc (location_t loc,
           && handled_component_p (TREE_OPERAND (arg0, 0)))
         {
           tree base;
+          tree type1 = TREE_TYPE (arg1);
           HOST_WIDE_INT coffset;
           base = get_addr_base_and_unit_offset (TREE_OPERAND (arg0, 0),
                                                 &coffset);
           if (!base)
             return NULL_TREE;
-          return fold_build2 (MEM_REF, type,
-                              build_fold_addr_expr (base),
-                              int_const_binop (PLUS_EXPR, arg1,
-                                               size_int (coffset)));
+          arg1 = fold_convert (size_type_node, arg1);
+          arg1 = int_const_binop (PLUS_EXPR, arg1, size_int (coffset));
+          base = build_fold_addr_expr (base);
+          arg1 = fold_convert (type1, arg1);
+          return fold_build2 (MEM_REF, type, base, arg1);
         }
 
       return NULL_TREE;
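
For readers who want to see the m32c arithmetic from the message in isolation, here is a minimal sketch in plain C. It is not GCC code: it assumes a 16-bit offset type (modelled with uint16_t) and uses a 32-bit type to stand in for whatever wider type the fold was effectively adding in. The helper names add_offsets_wide, add_offsets_16bit and check_m32c_offsets are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Add two offsets in a type wider than the 16-bit offset type.
   Here 65528 is just a large positive number, so the sum does not wrap.  */
static uint32_t
add_offsets_wide (uint16_t a, uint16_t b)
{
  return (uint32_t) a + (uint32_t) b;
}

/* Add two offsets and wrap the result to 16 bits, which is what doing
   the arithmetic in a 16-bit size type amounts to.  65528 then behaves
   like -8.  */
static uint16_t
add_offsets_16bit (uint16_t a, uint16_t b)
{
  return (uint16_t) ((uint32_t) a + (uint32_t) b);
}

/* Self-check using the numbers quoted in the message.  */
void
check_m32c_offsets (void)
{
  assert (add_offsets_wide (65528, 8) == 65536u);  /* the unwanted result */
  assert (add_offsets_16bit (65528, 8) == 0);      /* the intended result */
}
```

The sketch only shows why the width of the intermediate type matters; which tree type GCC should use for the addition is the question the patch itself addresses.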
Fwd: [PATCH] Fix PR56915
This patch is for the ICE of PR56915 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56915), specific to gcc 4.9. Because this patch only touches the C++ frontend, I only ran the g++ and libstdc++ testsuites with the newly added testcase. The test results on Ubuntu x86_64 indicate no regressions in those testsuites.

The problem causing the ICE is that GCC sets DECL_INTERFACE_KNOWN too early, before 'start_preparsed_function', which generates the body of the thunk and depends on the value of DECL_INTERFACE_KNOWN. Moving the setting after the point of 'start_preparsed_function' or 'symtab_add_to_same_comdat_group' fixes the problem.

Shixiong

commit 1bc89baae77d7a6d4f98b70e6e603454e2837919
Author: Shixiong Xu
Date:   Sun Apr 28 22:11:05 2013 +1200

    PR c++/56915
    * gcc/cp/semantics.c: Move down the setting to DECL_INTERFACE_KNOWN(...)
    after symtab_add_to_same_comdat_group(...).
    * gcc/testsuite/g++.dg/torture/pr56915.C: New.

pr56915.patch
Description: Binary data
[PATCH, i386]: Fix PR44578, GCC generates MMX instructions but fails to generate "emms"
Hello!

Attached patch fixes PR44578, where MMX register was allocated for zero_extendsidi2 RTX. The patch adds "!" to the interfering alternative, so RA won't choose alternative involving MMX register unless absolutely necessary.

2013-04-29  Uros Bizjak

        PR target/44578
        * config/i386/i386.md (*zero_extendsidi2): Add "!" to m->?*y
        alternative.

testsuite/ChangeLog:

2013-04-29  Uros Bizjak

        PR target/44578
        * gcc.target/i386/pr44578.c: New test.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN. The patch will be backported to 4.7 and 4.8 branches.

Uros.

Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 198401)
+++ config/i386/i386.md (working copy)
@@ -3049,10 +3049,10 @@
 
 (define_insn "*zero_extendsidi2"
   [(set (match_operand:DI 0 "nonimmediate_operand"
-         "=r,?r,?o,r ,o,?*Ym,?*y,?*Yi,?*x")
+         "=r,?r,?o,r ,o,?*Ym,?!*y,?*Yi,?*x")
         (zero_extend:DI
          (match_operand:SI 1 "x86_64_zext_operand"
-         "0 ,rm,r ,rmWz,0,r ,m ,r ,m")))]
+         "0 ,rm,r ,rmWz,0,r ,m ,r ,m")))]
   ""
 {
   switch (get_attr_type (insn))
Index: testsuite/gcc.target/i386/pr44578.c
===
--- testsuite/gcc.target/i386/pr44578.c (revision 0)
+++ testsuite/gcc.target/i386/pr44578.c (working copy)
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -mtune=athlon64" } */
+
+extern void abort (void);
+
+long double
+__attribute__((noinline, noclone))
+test (float num)
+{
+  unsigned int i;
+
+  if (num < 0.0)
+    num = 0.0;
+
+  __builtin_memcpy (&i, &num, sizeof(unsigned int));
+
+  return (long double)(unsigned long long) i;
+}
+
+int
+main ()
+{
+  long double x;
+
+  x = test (0.0);
+
+  if (x != 0.0)
+    abort ();
+
+  return 0;
+}
Re: [patch, mips] Fix for PR target/56942
On Sat, 2013-04-27 at 08:56 +0100, Richard Sandiford wrote:
> >> But using next_real_insn was at least as correct (IMO, more correct)
> >> as next_active_insn before r197266.  It seems counterintuitive that
> >> something can be "active" but not "real".
> >>
> >> Richard
> >
> > So should we put the active_insn_p hack/FIXME into real_next_insn?  That
> > doesn't seem like much of a win but it would probably fix the problem.
>
> Yeah, I think so.  If "=>" mean "accepts more than", then there used
> to be a nice total order:
>
>    next_insn
>      => next_nonnote_insn
>      => next_real_insn
>      => next_active_insn
>
> I think we should keep that if possible, even during the transition period.
>
> Thanks,
> Richard

OK, here is patch to next_real_insn to keep the ordering property intact and fix the bug. OK for checkin?

Steve Ellcey
sell...@imgtec.com

2013-04-29  Andrew Bennett
            Steve Ellcey

        PR target/56942
        * emit-rtl.c (next_real_insn): Accept jump table data as 'real'
        (like next_active_insn does).

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 538b1ec..9de3f1e 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -3248,7 +3248,8 @@ next_real_insn (rtx insn)
   while (insn)
     {
       insn = NEXT_INSN (insn);
-      if (insn == 0 || INSN_P (insn))
+      if (insn == 0 || INSN_P (insn)
+          || JUMP_TABLE_DATA_P (insn)) /* FIXME */
         break;
     }
 
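
The "accepts more than" ordering quoted above can be checked mechanically on a toy model. The sketch below is not GCC code and simplifies heavily: the four instruction kinds and every toy_* name are invented for illustration, and the predicates only approximate what the real next_*_insn walkers accept. It assumes jump table data is a kind of its own that the old next_real_insn skipped and next_active_insn accepted, which is the situation the thread describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified instruction kinds; not GCC's rtx codes.  */
enum toy_kind
{
  TOY_NOTE,
  TOY_INSN,
  TOY_USE,
  TOY_JUMP_TABLE_DATA,
  TOY_KIND_COUNT
};

typedef bool (*toy_pred) (enum toy_kind);

static bool toy_any (enum toy_kind k) { (void) k; return true; }
static bool toy_nonnote (enum toy_kind k) { return k != TOY_NOTE; }

/* Model of next_real_insn before the patch: jump table data is skipped.  */
static bool toy_real_old (enum toy_kind k)
{
  return k == TOY_INSN || k == TOY_USE;
}

/* Model of next_real_insn after the patch: jump table data is accepted.  */
static bool toy_real_new (enum toy_kind k)
{
  return toy_real_old (k) || k == TOY_JUMP_TABLE_DATA;
}

static bool toy_active (enum toy_kind k)
{
  return k == TOY_INSN || k == TOY_JUMP_TABLE_DATA;
}

/* True when everything NARROW accepts is also accepted by WIDE.  */
static bool
toy_accepts_at_least (toy_pred wide, toy_pred narrow)
{
  for (int k = 0; k < TOY_KIND_COUNT; k++)
    if (narrow ((enum toy_kind) k) && !wide ((enum toy_kind) k))
      return false;
  return true;
}

/* Check the chain any => nonnote => REAL => active for a given model
   of next_real_insn.  */
bool
toy_chain_is_ordered (toy_pred real)
{
  return toy_accepts_at_least (toy_any, toy_nonnote)
         && toy_accepts_at_least (toy_nonnote, real)
         && toy_accepts_at_least (real, toy_active);
}
```

In this model toy_chain_is_ordered (toy_real_old) is false, because the "active" predicate accepts jump table data that the "real" one rejects, while toy_chain_is_ordered (toy_real_new) is true.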
patch to fix PR57097
The following patch fixes: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57097 The patch was successfully bootstrapped and tested on x86/x86-64. Committed as rev. 198432. 2013-04-29 Vladimir Makarov PR target/57097 * lra-constraints.c (process_alt_operands): Discourage a bit more using memory for pseudos. Print cost dump for alternatives. Modify cost values for conflicts with early clobbers. (curr_insn_transform): Spill pseudos reassigned to NO_REGS. 2013-04-29 Vladimir Makarov PR target/57097 * gcc.target/i386/pr57097.c: New test. Index: lra-constraints.c === --- lra-constraints.c (revision 198422) +++ lra-constraints.c (working copy) @@ -2013,7 +2013,7 @@ process_alt_operands (int only_alternati although it might takes the same number of reloads. */ if (no_regs_p && REG_P (op)) - reject++; + reject += 2; #ifdef SECONDARY_MEMORY_NEEDED /* If reload requires moving value through secondary @@ -2044,7 +2044,13 @@ process_alt_operands (int only_alternati or non-important thing to be worth to do it. */ overall = losers * LRA_LOSER_COST_FACTOR + reject; if ((best_losers == 0 || losers != 0) && best_overall < overall) - goto fail; +{ + if (lra_dump_file != NULL) + fprintf (lra_dump_file, +" alt=%d,overall=%d,losers=%d -- reject\n", +nalt, overall, losers); + goto fail; +} curr_alt[nop] = this_alternative; COPY_HARD_REG_SET (curr_alt_set[nop], this_alternative_set); @@ -2139,7 +2145,10 @@ process_alt_operands (int only_alternati curr_alt_dont_inherit_ops[curr_alt_dont_inherit_ops_num++] = last_conflict_j; losers++; - overall += LRA_LOSER_COST_FACTOR; + /* Early clobber was already reflected in REJECT. */ + lra_assert (reject > 0); + reject--; + overall += LRA_LOSER_COST_FACTOR - 1; } else { @@ -2163,7 +2172,10 @@ process_alt_operands (int only_alternati } curr_alt_win[i] = curr_alt_match_win[i] = false; losers++; - overall += LRA_LOSER_COST_FACTOR; + /* Early clobber was already reflected in REJECT. 
*/ + lra_assert (reject > 0); + reject--; + overall += LRA_LOSER_COST_FACTOR - 1; } } small_class_operands_num = 0; @@ -2171,6 +2183,11 @@ process_alt_operands (int only_alternati small_class_operands_num += SMALL_REGISTER_CLASS_P (curr_alt[nop]) ? 1 : 0; + if (lra_dump_file != NULL) + fprintf (lra_dump_file, " alt=%d,overall=%d,losers=%d," +"small_class_ops=%d,rld_nregs=%d\n", +nalt, overall, losers, small_class_operands_num, reload_nregs); + /* If this alternative can be made to work by reloading, and it needs less reloading than the others checked so far, record it as the chosen goal for reloading. */ @@ -3136,7 +3153,15 @@ curr_insn_transform (void) spilled. Spilled scratch pseudos are transformed back to scratches at the LRA end. */ && lra_former_scratch_operand_p (curr_insn, i)) - change_class (REGNO (op), NO_REGS, " Change", true); + { + int regno = REGNO (op); + change_class (regno, NO_REGS, " Change", true); + if (lra_get_regno_hard_regno (regno) >= 0) + /* We don't have to mark all insn affected by the + spilled pseudo as there is only one such insn, the + current one. */ + reg_renumber[regno] = -1; + } continue; } Index: testsuite/gcc.target/i386/pr57097.c === --- testsuite/gcc.target/i386/pr57097.c (revision 0) +++ testsuite/gcc.target/i386/pr57097.c (working copy) @@ -0,0 +1,29 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fPIC" } */ +extern double ad[], bd[], cd[], dd[]; +extern long long all[], bll[], cll[], dll[]; + +int +main (int i, char **a) +{ + bd[i] = i + 64; + if (i % 3 == 0) +{ + cd[i] = i; +} + dd[i] = i / 2; + ad[i] = i * 2; + if (i % 3 == 1) +{ + dll[i] = 127; +} + dll[i] = i; + cll[i] = i * 2; + switch (i % 3) +{ +case 0: + bll[i] = i + 64; +} + all[i] = i / 2; + return 0; +}
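
As a rough illustration of the cost bookkeeping this patch touches, the sketch below models the relation overall = losers * LRA_LOSER_COST_FACTOR + reject and the patched early-clobber adjustment. It is a toy, not LRA code: the factor value 6 is an assumption chosen for the example, and the struct and toy_* function names are hypothetical.

```c
#include <assert.h>

/* Assumed value, for illustration only; GCC defines the real
   LRA_LOSER_COST_FACTOR.  */
#define TOY_LOSER_COST_FACTOR 6

/* Hypothetical container for the three counters the diff updates.  */
struct toy_alt_cost
{
  int losers;   /* operands that need a reload */
  int reject;   /* accumulated penalty points */
  int overall;  /* combined cost of the alternative */
};

/* Combined cost, as computed in process_alt_operands.  */
static int
toy_overall (int losers, int reject)
{
  return losers * TOY_LOSER_COST_FACTOR + reject;
}

/* Model of the patched early-clobber conflict handling: one more loser,
   but one reject point is taken back because the early clobber was
   already counted there.  */
void
toy_early_clobber_conflict (struct toy_alt_cost *c)
{
  assert (c->reject > 0);
  c->losers++;
  c->reject--;
  c->overall += TOY_LOSER_COST_FACTOR - 1;
}
```

With these definitions an alternative with 1 loser and 3 reject points has overall cost 9; after one early-clobber conflict it has 2 losers, 2 reject points and overall cost 14, which still equals toy_overall (2, 2). The "- 1" in the patch is what keeps the three counters consistent with each other.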
Re: [patch] Fix node weight updates during ipa-cp (issue7812053)
On Mon, Apr 29, 2013 at 10:31 AM, Teresa Johnson wrote:
> FYI, Fixed in r198416.
>
> Thanks,
> Teresa
>

I noticed that sometimes GCC generates:

  _8 = memcpy (ret_6, s_2(D), len_4);
  _8 = memcpy (ret_6, s_2(D), len_4);
  memcpy (_17, buffer_12(D), add_16);
  memcpy (_17, buffer_12(D), add_16);
  memcpy (_25, _28, _27);
  memcpy (_25, _28, _27);
  memcpy (_39, buffer_2, len_4);
  memcpy (_39, buffer_2, len_4);
  memcpy (_16, &fillbuf, pad_1);
  memcpy (_16, &fillbuf, pad_1);
  ...

--
H.J.
[PATCH] Don't instrument with -fsanitize=thread accesses to DECL_HARD_REGISTER vars (PR tree-optimization/57104)
Hi!

DECL_HARD_REGISTER vars don't live in memory, thus they can't be addressable. The following patch fixes the ICE, ok for trunk/4.8?

2013-04-29  Jakub Jelinek

        PR tree-optimization/57104
        * tsan.c (instrument_expr): Don't instrument accesses to
        DECL_HARD_REGISTER VAR_DECLs.

        * gcc.dg/pr57104.c: New test.

--- gcc/tsan.c.jj 2013-04-24 12:07:12.0 +0200
+++ gcc/tsan.c 2013-04-29 21:06:48.975888478 +0200
@@ -128,7 +128,9 @@ instrument_expr (gimple_stmt_iterator gs
       return false;
     }
 
-  if (TREE_READONLY (base))
+  if (TREE_READONLY (base)
+      || (TREE_CODE (base) == VAR_DECL
+          && DECL_HARD_REGISTER (base)))
     return false;
 
   if (size == 0
--- gcc/testsuite/gcc.dg/pr57104.c.jj 2013-04-29 21:09:46.812948131 +0200
+++ gcc/testsuite/gcc.dg/pr57104.c 2013-04-29 21:09:39.0 +0200
@@ -0,0 +1,12 @@
+/* PR tree-optimization/57104 */
+/* { dg-do compile { target { x86_64-*-linux* && lp64 } } } */
+/* { dg-options "-fsanitize=thread" } */
+
+register int r asm ("r14");
+int v;
+
+int
+foo (void)
+{
+  return r + v;
+}

Jakub
Re: Make m32c build, fix PSImode truncation
> Sorry for missing the truncation patterns, I should have grepped
> more than m32c.md.  They look a lot like normal moves though.  Is
> truncation really not a noop, or are the patterns there to work
> around something (probably this :-))?

Not sure which pattern you're talking about, but in general, the m32c's registers are either 16-bit or 24-bit. You can move a pair of 16-bit registers into a 24-bit register and it truncates as part of the move, likewise from 32-bit memory to 24-bit reg. Note that moves to other 32-bit destinations do *not* truncate, nor can 24-bit registers hold 32-bit values (duh).

The 24-bit registers may also hold a 16-bit value. If you move a 16-bit value into a 24-bit register, it zero_extends.
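
The two kinds of move described above can be modelled in portable C. This is only a sketch of the value semantics, not of the m32c instructions themselves; the 24-bit register is represented as the low 24 bits of a uint32_t, and the names TOY_REG24_MASK, move32_to_reg24 and move16_to_reg24 are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask selecting the 24 bits a 24-bit register can hold.  */
#define TOY_REG24_MASK 0xFFFFFFu

/* Moving a 32-bit value (a register pair or 32-bit memory) into a
   24-bit register drops the top 8 bits as part of the move.  */
static uint32_t
move32_to_reg24 (uint32_t value)
{
  return value & TOY_REG24_MASK;
}

/* Moving a 16-bit value into a 24-bit register zero-extends it.  */
static uint32_t
move16_to_reg24 (uint16_t value)
{
  return (uint32_t) value;
}

/* Self-check with sample values.  */
void
check_reg24_moves (void)
{
  assert (move32_to_reg24 (0x12345678u) == 0x345678u);
  assert (move16_to_reg24 (0xFFFFu) == 0x00FFFFu);
}
```

Under this model a 32-to-24-bit move is not a no-op whenever the top byte is non-zero, which is the point being made about the truncation patterns.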
[WIP RFH] #pragma omp declare simd (aka OpenMP elemental functions) parsing
Hi!

The following patch is a set of WIP steps towards #pragma omp declare simd parsing. The spec is a little bit vague; it says only that (a sequence of) #pragma omp declare simd pragmas have to immediately precede a function declaration or definition and that the arguments referred to in its clauses are the argument names of that function declaration or definition.

ATM the patch just throws that info away completely in cp_finish_omp_declare_simd after calling finish_omp_clauses on it. The plan is to create, for each clause list, some artificial attribute (say "omp declare simd" with the spaces) and put the clauses as its argument.

Now, my current problem is that in the declare-simd-1.C testcase unfortunately on 2 lines I get 3 errors each; the problem is that this is an explicit specialization and the original decl has no param names (or could have different parameter names), and before start_decl returns grokdeclarator -> grokfndecl -> check_explicit_specialization calls duplicate_decls and throws away the new parameter names (if the new explicit specialization isn't a definition). Any suggestions what to do? The problem is that grokdeclarator, grokfndecl, check_explicit_specialization are in decl.c, and have no access to the cp_parser structure which contains the vector. Should I copy the parser->omp_declare_simd_clauses vector pointer say into the cp_declarator structure so that grokfndecl could grab it from there?

Also, for the attributes I wonder if it wouldn't be better to finally replace the PARM_DECLs in the clauses say with parameter indexes, because otherwise it might be difficult to adjust those during instantiation etc.

Other comments?

2013-04-29  Jakub Jelinek

        * parser.h (struct cp_parser): Add omp_declare_simd_clauses field.
        * parser.c (cp_ensure_no_omp_declare_simd): New function.
        (enum pragma_context): Add pragma_member and pragma_objc_icode.
(cp_parser_linkage_specification, cp_parser_namespace_definition, cp_parser_class_specifier_1): Call cp_ensure_no_omp_declare_simd. (cp_parser_init_declarator, cp_parser_member_declaration, cp_parser_function_definition_from_specifiers_and_declarator, cp_parser_save_member_function_body): Call cp_finish_omp_declare_simd. (cp_parser_member_specification_opt): Pass pragma_member instead of pragma_external to cp_parser_pragma. (cp_parser_objc_interstitial_code): Pass pragma_objc_icode instead of pragma_external to cp_parser_pragma. (cp_parser_omp_var_list_no_open): If parser->omp_declare_simd_clauses, just cp_parser_identifier the argument names. (cp_parser_omp_all_clauses): Don't call finish_omp_clauses for parser->omp_declare_simd_clauses. (OMP_DECLARE_SIMD_CLAUSE_MASK): Define. (cp_parser_omp_declare_simd, cp_finish_omp_declare_simd, cp_parser_omp_declare): New functions. (cp_parser_pragma): Call cp_ensure_no_omp_declare_simd. Handle PRAGMA_OMP_DECLARE_REDUCTION. Replace == pragma_external with != pragma_stmt and != pragma_compound. * g++.dg/gomp/declare-simd-1.C: New test. * g++.dg/gomp/declare-simd-2.C: New test. --- gcc/cp/parser.h.jj 2013-03-20 10:07:19.0 +0100 +++ gcc/cp/parser.h 2013-04-29 12:17:55.445392454 +0200 @@ -340,6 +340,10 @@ typedef struct GTY(()) cp_parser { /* The number of template parameter lists that apply directly to the current declaration. */ unsigned num_template_parameter_lists; + + /* When parsing #pragma omp declare simd, this is a vector of + the clauses. */ + vec *omp_declare_simd_clauses; } cp_parser; /* In parser.c */ --- gcc/cp/parser.c.jj 2013-04-24 15:24:45.0 +0200 +++ gcc/cp/parser.c 2013-04-29 19:55:05.987702600 +0200 @@ -1169,6 +1169,19 @@ cp_token_cache_new (cp_token *first, cp_ return cache; } +/* Diagnose if #pragma omp declare simd isn't followed immediately + by function declaration or definition. 
*/ + +static inline void +cp_ensure_no_omp_declare_simd (cp_parser *parser) +{ + if (parser->omp_declare_simd_clauses) +{ + error ("%<#pragma omp declare simd%> not immediately followed by " +"function declaration or definition"); + parser->omp_declare_simd_clauses = NULL; +} +} /* Decl-specifiers. */ @@ -2149,7 +2162,13 @@ static bool cp_parser_function_transacti static tree cp_parser_transaction_cancel (cp_parser *); -enum pragma_context { pragma_external, pragma_stmt, pragma_compound }; +enum pragma_context { + pragma_external, + pragma_member, + pragma_objc_icode, + pragma_stmt, + pragma_compound +}; static bool cp_parser_pragma (cp_parser *, enum pragma_context); @@ -11154,6 +11173,8 @@ cp_parser_linkage_specification (cp_pars production. */ if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE)) { + cp_ensure_no_omp_declare_simd (parser); + /* Consume the `{' token. */
[PATCH, i386]: Fix PR57098, ICE with -mcmodel=large -msse4 and __builtin_shuffle()
Hello!

2013-04-29  Uros Bizjak

        PR target/57098
        * config/i386/i386.c (ix86_expand_vec_perm): Validize constant memory.

2013-04-29  Uros Bizjak

        PR target/57098
        * gcc.target/i386/pr57098.c: New test.

Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN. The patch will be backported to 4.7 and 4.8 branches.

Uros.

Index: config/i386/i386.c
===
--- config/i386/i386.c (revision 198401)
+++ config/i386/i386.c (working copy)
@@ -20559,7 +20559,7 @@ ix86_expand_vec_perm (rtx operands[])
               vec[i * 2 + 1] = const1_rtx;
             }
           vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
-          vt = force_const_mem (maskmode, vt);
+          vt = validize_mem (force_const_mem (maskmode, vt));
           t1 = expand_simple_binop (maskmode, PLUS, t1, vt, t1, 1,
                                     OPTAB_DIRECT);
 
@@ -20756,7 +20756,7 @@ ix86_expand_vec_perm (rtx operands[])
       for (i = 0; i < 16; ++i)
         vec[i] = GEN_INT (i/e * e);
       vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
-      vt = force_const_mem (V16QImode, vt);
+      vt = validize_mem (force_const_mem (V16QImode, vt));
       if (TARGET_XOP)
         emit_insn (gen_xop_pperm (mask, mask, mask, vt));
       else
@@ -20767,7 +20767,7 @@ ix86_expand_vec_perm (rtx operands[])
       for (i = 0; i < 16; ++i)
         vec[i] = GEN_INT (i % e);
       vt = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, vec));
-      vt = force_const_mem (V16QImode, vt);
+      vt = validize_mem (force_const_mem (V16QImode, vt));
       emit_insn (gen_addv16qi3 (mask, mask, vt));
     }
 
Index: testsuite/gcc.target/i386/pr57098.c
===
--- testsuite/gcc.target/i386/pr57098.c (revision 0)
+++ testsuite/gcc.target/i386/pr57098.c (working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-msse4 -mcmodel=large" } */
+
+typedef int V __attribute__((vector_size(16)));
+
+void foo (V *p, V *mask)
+{
+  *p = __builtin_shuffle (*p, *mask);
+}
[Patch, Fortran, committed] PR57114 correct intrinsic.texi (RANK) bug.
Committed as obvious as Rev. 198429

Tobias

Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog (revision 198428)
+++ gcc/fortran/ChangeLog (working copy)
@@ -1,5 +1,11 @@
 2013-04-28  Tobias Burnus
 
+        PR fortran/57114
+        * intrinsic.texi (RANK): Correct syntax description and
+        expected result.
+
+2013-04-28  Tobias Burnus
+
         PR fortran/57093
         * trans-types.c (gfc_get_element_type): Fix handling of scalar
         coarrays of type character.
Index: gcc/fortran/intrinsic.texi
===
--- gcc/fortran/intrinsic.texi (revision 198428)
+++ gcc/fortran/intrinsic.texi (working copy)
@@ -10279,7 +10279,7 @@
 Inquiry function
 
 @item @emph{Syntax}:
-@code{RESULT = RANGE(A)}
+@code{RESULT = RANK(A)}
 
 @item @emph{Arguments}:
 @multitable @columnfractions .15 .70
@@ -10296,7 +10296,7 @@
   integer :: a
   real, allocatable :: b(:,:)
 
-  print *, rank(a), rank(b) ! Prints: 0 3
+  print *, rank(a), rank(b) ! Prints: 0 2
 end program test_rank
 @end smallexample
 
Re: GCC does not support *mmintrin.h with function specific opts
On Thu, Apr 25, 2013 at 12:41 PM, Joseph S. Myers wrote: > On Tue, 16 Apr 2013, Sriraman Tallam wrote: > >> Ok, it is on by default now. There is a way to turn it off, with >> -mno-generate-builtins. > > Any new option needs documenting in invoke.texi. Added and new patch attached. Thanks Sri > > -- > Joseph S. Myers > jos...@codesourcery.com * config/i386/i386.c (construct_container): Do not issue SSE return error for extern gnu_inline functions. (def_builtin): Do not generate builtins when -mno-generate-builtins is used. * doc/invoke.texi: Document option -mgenerate-builtins. * config/i386/i386.opt (mgenerate-builtins): New target option. * config/i386/i386-c.c (ix86_target_macros_internal): Define macro __ALL_ISA__ when generate_target_builtins is true. * testsuite/gcc.target/i386/intrinsics_1.c: New test. * testsuite/gcc.target/i386/intrinsics_2.c: Ditto. * testsuite/gcc.target/i386/intrinsics_3.c: Ditto. * testsuite/gcc.target/i386/intrinsics_4.c: Ditto. * testsuite/gcc.target/i386/intrinsics_5.c: Ditto. * config/i386/lzcntintrin.h: Expose header when __ALL_ISA__ is defined. * config/i386/lwpintrin.h: Ditto. * config/i386/xopintrin.h: Ditto. * config/i386/fmaintrin.h: Ditto. * config/i386/bmiintrin.h: Ditto. * config/i386/fma4intrin.h: Ditto. * config/i386/nmmintrin.h: Ditto. * config/i386/tbmintrin.h: Ditto. * config/i386/smmintrin.h: Ditto. * config/i386/wmmintrin.h: Ditto. * config/i386/popcntintrin.h: Ditto. * config/i386/f16cintrin.h: Ditto. * config/i386/pmmintrin.h: Ditto. * config/i386/bmi2intrin.h: Ditto. * config/i386/tmmintrin.h: Ditto. * config/i386/xmmintrin.h: Ditto. * config/i386/mmintrin.h: Ditto. * config/i386/ammintrin.h: Ditto. * config/i386/emmintrin.h: Ditto. 
Index: config/i386/smmintrin.h === --- config/i386/smmintrin.h (revision 198212) +++ config/i386/smmintrin.h (working copy) @@ -27,7 +27,7 @@ #ifndef _SMMINTRIN_H_INCLUDED #define _SMMINTRIN_H_INCLUDED -#ifndef __SSE4_1__ +#if !defined (__SSE4_1__) && !defined (__ALL_ISA__) # error "SSE4.1 instruction set not enabled" #else Index: config/i386/f16cintrin.h === --- config/i386/f16cintrin.h(revision 198212) +++ config/i386/f16cintrin.h(working copy) @@ -25,7 +25,7 @@ # error "Never use directly; include or instead." #endif -#ifndef __F16C__ +#if !defined (__F16C__) && !defined (__ALL_ISA__) # error "F16C instruction set not enabled" #else Index: config/i386/wmmintrin.h === --- config/i386/wmmintrin.h (revision 198212) +++ config/i386/wmmintrin.h (working copy) @@ -30,7 +30,7 @@ /* We need definitions from the SSE2 header file. */ #include -#if !defined (__AES__) && !defined (__PCLMUL__) +#if !defined (__AES__) && !defined (__PCLMUL__) && !defined (__ALL_ISA__) # error "AES/PCLMUL instructions not enabled" #else Index: config/i386/bmi2intrin.h === --- config/i386/bmi2intrin.h(revision 198212) +++ config/i386/bmi2intrin.h(working copy) @@ -25,7 +25,7 @@ # error "Never use directly; include instead." #endif -#ifndef __BMI2__ +#if !defined (__BMI2__) && !defined (__ALL_ISA__) # error "BMI2 instruction set not enabled" #endif /* __BMI2__ */ Index: config/i386/pmmintrin.h === --- config/i386/pmmintrin.h (revision 198212) +++ config/i386/pmmintrin.h (working copy) @@ -27,7 +27,7 @@ #ifndef _PMMINTRIN_H_INCLUDED #define _PMMINTRIN_H_INCLUDED -#ifndef __SSE3__ +#if !defined (__SSE3__) && !defined (__ALL_ISA__) # error "SSE3 instruction set not enabled" #else Index: config/i386/lzcntintrin.h === --- config/i386/lzcntintrin.h (revision 198212) +++ config/i386/lzcntintrin.h (working copy) @@ -25,7 +25,7 @@ # error "Never use directly; include instead." 
#endif -#ifndef __LZCNT__ +#if !defined (__LZCNT__) && !defined (__ALL_ISA__) # error "LZCNT instruction is not enabled" #endif /* __LZCNT__ */ Index: config/i386/tmmintrin.h === --- config/i386/tmmintrin.h (revision 198212) +++ config/i386/tmmintrin.h (working copy) @@ -27,7 +27,7 @@ #ifndef _TMMINTRIN_H_INCLUDED #define _TMMINTRIN_H_INCLUDED -#ifndef __SSSE3__ +#if !defined (__SSSE3__) && !defined (__ALL_ISA__) # error "SSSE3 instruction set not enabled" #else Index: config/i386/xmmintrin.h === --- config/i386/xmmin
Re: [patch] Fix node weight updates during ipa-cp (issue7812053)
FYI, Fixed in r198416. Thanks, Teresa On Thu, Apr 25, 2013 at 10:19 PM, Teresa Johnson wrote: > Reproduced. This looks like another instance of a case I found testing > my follow-on patch: the helper routines have some assertion checking > that is too strict for the broader usage where we may be scaling > counts up and not just down. I am verifying and will send a patch in > the morning that suppresses this assert, which is the approach I am > taking in the follow-on patch also coming tomorrow. > > Teresa > > On Thu, Apr 25, 2013 at 3:29 PM, H.J. Lu wrote: >> On Fri, Apr 5, 2013 at 7:18 AM, Teresa Johnson wrote: >>> On Thu, Mar 28, 2013 at 2:27 AM, Richard Biener >>> wrote: On Wed, Mar 27, 2013 at 6:22 PM, Teresa Johnson wrote: > I found that the node weight updates on cloned nodes during ipa-cp were > leading to incorrect/insane weights. Both the original and new node weight > computations used truncating divides, leading to a loss of total node > weight. > I have fixed this by making both rounding integer divides. > > Bootstrapped and tested on x86-64-unknown-linux-gnu. Ok for trunk? I'm sure we can outline a rounding integer divide inline function on gcov_type. To gcov-io.h, I suppose. Otherwise this looks ok to me. >>> >>> Thanks. I went ahead and worked on outlining this functionality. In >>> the process of doing so, I discovered that there was already a method >>> in basic-block.h to do part of this: apply_probability(), which does >>> the rounding divide by REG_BR_PROB_BASE. There is a related function >>> combine_probabilities() that takes 2 int probabilities instead of a >>> gcov_type and an int probability. I decided to use apply_probability() >>> in ipa-cp, and add a new macro GCOV_COMPUTE_SCALE to basic-block.h to >>> compute the scale factor/probability via a rounding divide. So the >>> ipa-cp changes I made use both GCOV_COMPUTE_SCALE and >>> apply_probability. 
>>> >>> I then went through all the code to look for instances where we were >>> computing scale factors/probabilities and performing scaling. I found >>> a mix of existing uses of apply/combine_probabilities, uses of RDIV, >>> inlined rounding divides, and truncating divides. I think it would be >>> good to unify all of this. As a first step, I replaced all inline code >>> sequences that were already doing rounding divides to compute scale >>> factors/probabilities or do the scaling, to instead use the >>> appropriate helper function/macro described above. For these >>> locations, there should be no change to behavior. >>> >>> There are a number of places where there are truncating divides right >>> now. Since changing those may impact the resulting behavior, for this >>> patch I simply added a comment as to which helper they should use. As >>> soon as this patch goes in I am planning to change those to use the >>> appropriate helper and test performance, and then will send that patch >>> for review. So for this patch, the only place where behavior is >>> changed is in ipa-cp which was my original change. >>> >>> New patch is attached. Bootstrapped (both bootstrap and >>> profiledbootstrap) and tested on x86-64-unknown-linux-gnu. Ok for >>> trunk? >>> >> >> This caused: >> >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57077 >> >> >> H.J. > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
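To make the truncating-versus-rounding point in this thread concrete, here is a minimal sketch in C. It is not the code from the patch: `count_t`, `PROB_BASE`, `rdiv`, `compute_scale`, `apply_scale` and `apply_scale_trunc` are illustrative stand-ins for the roles played by `gcov_type`, `REG_BR_PROB_BASE`, `RDIV`, `GCOV_COMPUTE_SCALE` and `apply_probability`, and the sketch assumes non-negative counts.

```c
#include <stdint.h>

/* Hypothetical stand-ins, not GCC's definitions.  */
typedef int64_t count_t;
#define PROB_BASE 10000

/* Rounding divide; assumes X >= 0 and Y > 0.  */
static count_t rdiv (count_t x, count_t y)
{
  return (x + y / 2) / y;
}

/* Scale factor NUM/DEN in PROB_BASE units, computed with a rounding
   divide.  A zero denominator is treated as "no scaling" here, which
   is an assumption of this sketch.  */
static count_t compute_scale (count_t num, count_t den)
{
  return den ? rdiv (num * PROB_BASE, den) : PROB_BASE;
}

/* Apply a scale factor with a rounding divide.  */
static count_t apply_scale (count_t count, count_t scale)
{
  return rdiv (count * scale, PROB_BASE);
}

/* Same, but with a truncating divide, for comparison.  */
static count_t apply_scale_trunc (count_t count, count_t scale)
{
  return count * scale / PROB_BASE;
}
```

Splitting a count of 10 between a clone that takes 1/3 of the calls and the original that keeps 2/3 gives scales 3333 and 6667. With the rounding helper the two parts are 3 and 7, which still sum to 10; with the truncating helper they are 3 and 6, so one unit of weight is lost, which is the kind of loss the original message describes.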
Re: Trivial testsuite fix
Jeff Law writes:
> commit 07373396d21b65f975c2354e7c6ab454200b40af
> Author: Jeff Law

You should set the author accordingly.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: [PATCH, i386]: Enable SSE -> GPR moves for generic x86 targets (PR target/54349)
Hi Uros, I was just updating an old bug (GCC generates MMX instructions but fails to generate "emms" http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44578) with a case where I am now hitting this same issue. But as I was including reproducer info I found that the problem went away with a compiler I just updated and rebuilt this morning. Turns out that your patch makes this problem disappear. I think this is just a side-effect of your change though. Specifically, enabling TARGET_INTER_UNIT_MOVES_FROM_VEC for Generic and using that in inline_secondary_memory_needed() is changing instruction selection in a lucky way for my test case (confirmed by hand-modifying the code to use TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES_FROM_VEC). The code is the same going into reload, but in reload I get the following difference for a *zero_extendsidi2: < Choosing alt 6 in insn 11: (0) ?*y (1) m < Creating newreg=74 from oldreg=67, assigning class MMX_REGS to r74 --- > Choosing alt 8 in insn 11: (0) ?*x (1) m > Creating newreg=74 from oldreg=67, assigning class SSE_REGS to r74 Would you agree that this is just a lucky side-effect and that there is still a bug here? I think I will go ahead and update the bug (I can still reproduce it with -mtune=athlon64). Here is the test case: --- I have another instance of this issue. Trunk is generating move instructions to implement an inlined memcpy. The move instructions use the MMX registers, but no EMMS instruction is generated. My testcase then calls a libm function that uses the FPU, which returns incorrect results. This worked with an older gcc 4.7 based compiler, which didn't use MMX registers. The compiler was configured for x86_64-unknown-linux-gnu. The testcase was compiled with -O2. 
$ cat test.cc #include #include #include #include namespace { volatile double dd = 0.080553657784353652; double dds, ddc; } unsigned long long test(float num) { if (num < 0) { num = 0; } unsigned int i; memcpy(&i, &num, sizeof(unsigned int)); unsigned long long a = i; sincos(dd, &dds, &ddc); if (isnan(dds) || isnan(ddc)) { printf ("Failed\n"); exit (1); } return a; } $ cat test_main.cc #include extern unsigned long long test(float num); int main() { unsigned long long h = test(1); printf ("Passed\n"); } $ g++ -O2 test*.cc -mtune=athlon64 $ a.out Failed --- Thanks, Teresa On Mon, Apr 29, 2013 at 4:08 AM, Uros Bizjak wrote: > Hello! > > Attached patch enables SSE -> general register moves for generic x86 > targets. The patch splits TARGET_INTER_UNIT_MOVES to > TARGET_INTER_UNIT_MOVES_TO_VEC and TARGET_INTER_UNIT_MOVES_FROM_VEC > tuning flags and updates gcc sources accordingly. > > According to AMD optimization manuals, direct moves *FROM* SSE (and > MMX) registers *TO* general registers should be used for AMD K10 > family and later families. Since Intel targets are unaffected by this > change, I have also changed generic setting to enable these moves for > a generic target tuning. > > 2013-04-29 Uros Bizjak > > PR target/54349 > * config/i386/i386.h (enum ix86_tune_indices) > : > New, split from X86_TUNE_INTER_UNIT_MOVES. > : Remove. > (TARGET_INTER_UNIT_MOVES_TO_VEC): New define. > (TARGET_INTER_UNIT_MOVES_FROM_VEC): Ditto. > (TARGET_INTER_UNIT_MOVES): Remove. > * config/i386/i386.c (initial_ix86_tune_features): Update. > Disable X86_TUNE_INTER_UNIT_MOVES_FROM_VEC for m_ATHLON_K8 only. > (ix86_expand_convert_uns_didf_sse): Use > TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES. > (ix86_expand_vector_init_one_nonzero): Ditto. > (ix86_expand_vector_init_interleave): Ditto. 
> (inline_secondary_memory_needed): Return true for moves from SSE class > registers for !TARGET_INTER_UNIT_MOVES_FROM_VEC targets and for moves > to SSE class registers for !TARGET_INTER_UNIT_MOVES_TO_VEC targets. > * config/i386/constraints.md (Yi, Ym): Depend on > TARGET_INTER_UNIT_MOVES_TO_VEC. > (Yj, Yn): New constraints. > * config/i386/i386.md (*movdi_internal): Change constraints of > operand 1 from Yi to Yj and from Ym to Yn. > (*movsi_internal): Ditto. > (*movdf_internal): Ditto. > (*movsf_internal): Ditto. > (*float2_1): Use > TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES. > (*float2_1 splitters): Ditto. > (floatdi2_i387_with_xmm): Ditto. > (floatdi2_i387_with_xmm splitters): Ditto. > * config/i386/sse.md (movdi_to_sse): Ditto. > (sse2_stored): Change constraint of operand 1 from Yi to Yj. > Use TARGET_INTER_UNIT_MOVES_FROM_VEC instead of > TARGET_INTER_UNIT_MOVES. > (sse_storeq_rex64): Change constraint of operand 1 from Yi to Yj. > (sse_storeq_rex64 splitter): Use TARGET_INTER_UNIT_MOVES_FROM_VEC > instead of TARGET_INTER_UNIT_MOVES. > * config/i386/mmx.md (*mov_internal): Change constraint
Trivial testsuite fix
A private message from Kai to me included this patch to fix an out-of-bounds array access in the testsuite. Installed on the trunk for Kai.

commit 07373396d21b65f975c2354e7c6ab454200b40af
Author: Jeff Law
Date: Mon Apr 29 10:22:11 2013 -0600

    * gcc.c-torture/execute/pr55875.c

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 3364efc..73eeaf2 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2013-04-29 Kai Tietz
+
+ * gcc.c-torture/execute/pr55875.c
+
 2013-04-29 Richard Biener

 PR middle-end/57075
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr55875.c b/gcc/testsuite/gcc.c-torture/execute/pr55875.c
index 4a0ce1b..4e56f7c 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr55875.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr55875.c
@@ -1,4 +1,4 @@
-int a[250];
+int a[251];
 __attribute__ ((noinline))
 t(int i)
 {
Re: [testsuite] Disabling gcc.dg/cpp/trad/include.c for Android
On Jan 9, 2013, at 7:14 AM, Alexander Ivchenko wrote:
> We have test fail for gcc.dg/cpp/trad/include.c on Android. The
> reason for that is that
> -ftraditional-cpp is not expected to work on Android due to variadic
> macro (like #define __builtin_warning(x, y...))
> in standard headers and traditional preprocessor cannot handle them.
> The attached patch disables that test.

Be sure to ask, Ok? in your patch submittals.

Ok.
Re: [C++ Patch/RFC] PR 57092
On 04/29/2013 05:05 AM, Paolo Carlini wrote:
> in this 4.8/4.9 Regression, finish_decltype_type doesn't handle ADDR_EXPR.

Hmm...we're seeing the regression because previously finish_decltype_type would have just returned the type of the template parameter so it wouldn't ever see the ADDR_EXPR at instantiation time. But we want to form a DECLTYPE_TYPE so that the mangling is correct.

Perhaps the right solution is to handle this case specially in tsubst/DECLTYPE_TYPE: If id is true and the original expr is a TEMPLATE_PARM_INDEX, just instantiate the type of the template parm rather than its value.

Jason
[PING] SLSR for conditional candidates
Half-hearted ping for http://gcc.gnu.org/ml/gcc-patches/2013-03/msg01291.html ... I promise this is the last major code dump for SLSR. ;) Thanks, Bill
Re: [PATCH] Redesign pthread in LIB_SPEC for systems without libpthread
*ping* thank you, Alexander 2013/4/15 Pavel Chupin : > On Tue, Apr 2, 2013 at 1:59 PM, Pavel Chupin wrote: >> On Mon, Apr 1, 2013 at 7:07 PM, Pavel Chupin >> wrote: >>> On Android pthread is integrated into libc. >>> Attached patch fixes configures for this case by trying to build test >>> without -pthread -lpthread. >>> >>> 2013-04-01 Pavel Chupin >>> >>> Fix libatomic and libgomp configure for systems without libpthread >>> * libatomic/configure.ac: Add test without -pthread -lpthread. >>> * libgomp/configure.ac: Ditto. >>> * libatomic/configure: Regenerate. >>> * libgomp/configure: Regenerate. >>> >>> OK for trunk? >>> >> >> I think I made a better fix: >> >> 2013-04-02 Pavel Chupin >> >> Redesign pthread in LIB_SPEC for systems without libpthread >> * gcc/config/gnu-user.h: Remove pthread from GNU_USER_TARGET_LIB_SPEC >> but keep in default LIB_SPEC >> * gcc/config/linux-android.h: Add pthread to ANDROID_LIB_SPEC >> >> Is it OK for trunk? > > Ping > > -- > Pavel Chupin > Intel Corporation
[Patch, testsuite] Add -gdwarf to debug/dwarf2 testcases (Take 2)
This patch adds -gdwarf to the flags passed to all tests run by dwarf2.exp. In the first attempt, I'd added the flag to dg-options in individual testcases. Jakub then suggested adding it to the exp file instead. Does this look ok? If yes, could someone commit please, I don't have commit access. Regards Senthil gcc/testsuite/ChangeLog 2013-04-29 Senthil Kumar Selvaraj * gcc.dg/debug/dwarf2/dwarf2.exp: Replace -gdwarf-2 with -gdwarf and force -gdwarf when invoking dg-runtest. diff --git gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp index 829840c..f161787 100644 --- gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp +++ gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp @@ -22,7 +22,7 @@ load_lib gcc-dg.exp # If a testcase doesn't have special options, use these. global DEFAULT_CFLAGS if ![info exists DEFAULT_CFLAGS] then { -set DEFAULT_CFLAGS " -ansi -pedantic-errors -gdwarf-2" +set DEFAULT_CFLAGS " -ansi -pedantic-errors" } # Initialize `dg'. @@ -31,12 +31,12 @@ dg-init # Main loop. set comp_output [gcc_target_compile \ "$srcdir/$subdir/../trivial.c" "trivial.S" assembly \ -"additional_flags=-gdwarf-2"] +"additional_flags=-gdwarf"] if { ! [string match "*: target system does not support the * debug format*" \ $comp_output] } { remove-build-file "trivial.S" dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cS\] $srcdir/c-c++-common/dwarf2/*.c]] \ - "" $DEFAULT_CFLAGS + " -gdwarf " $DEFAULT_CFLAGS } # All done.
[PATCH] Fix PR57075
When the fix for PR57036 made the inliner stop adding abnormal edges from calls to non-local labels or setjmps, I also made it not split the blocks after the possible source of abnormal control flow. That turns out to upset the CFG verifier, so the following reinstates splitting of blocks.

Bootstrap & regtest ongoing on x86_64-unknown-linux-gnu.

Richard.

2013-04-29 Richard Biener

	PR middle-end/57075
	* tree-inline.c (copy_edges_for_bb): Still split the bbs, even if
	not adding abnormal edges for calls that can make abnormal gotos.

	* gcc.dg/torture/pr57075.c: New testcase.

Index: gcc/tree-inline.c === *** gcc/tree-inline.c (revision 198409) --- gcc/tree-inline.c (working copy) *** copy_edges_for_bb (basic_block bb, gcov_ *** 1923,1933 into a COMPONENT_REF which doesn't. If the copy can throw, the original could also throw. */ can_throw = stmt_can_throw_internal (copy_stmt); ! /* If the call we inline cannot make abnormal goto do not add ! additional abnormal edges but only retain those already present !in the original function body. */ ! nonlocal_goto ! = can_make_abnormal_goto && stmt_can_make_abnormal_goto (copy_stmt); if (can_throw || nonlocal_goto) { --- 1927,1933 into a COMPONENT_REF which doesn't. If the copy can throw, the original could also throw. */ can_throw = stmt_can_throw_internal (copy_stmt); ! nonlocal_goto = stmt_can_make_abnormal_goto (copy_stmt); if (can_throw || nonlocal_goto) { *** copy_edges_for_bb (basic_block bb, gcov_ *** 1955,1960 --- 1955,1964 else if (can_throw) make_eh_edges (copy_stmt); + /* If the call we inline cannot make abnormal goto do not add + additional abnormal edges but only retain those already present +in the original function body.
*/ + nonlocal_goto &= can_make_abnormal_goto; if (nonlocal_goto) make_abnormal_goto_edges (gimple_bb (copy_stmt), true); Index: gcc/testsuite/gcc.dg/torture/pr57075.c === *** gcc/testsuite/gcc.dg/torture/pr57075.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr57075.c (working copy) *** *** 0 --- 1,15 + /* { dg-do compile } */ + + extern int baz (void) __attribute__ ((returns_twice)); + int __attribute__ ((__leaf__)) + foo (void) + { + return __builtin_printf ("$"); + } + + void + bar () + { + foo (); + baz (); + }
Re: [Patch] Emit error for negative _Alignas alignment values
On Thu, 25 Apr 2013, Senthil Kumar Selvaraj wrote: > On Wed, Apr 24, 2013 at 03:18:51PM +, Joseph S. Myers wrote: > > On Wed, 3 Apr 2013, Senthil Kumar Selvaraj wrote: > > > > > 2013-04-03Senthil Kumar Selvaraj > > > > > > > > > * c-common.c (check_user_alignment): Emit error for negative values > > > > > > * gcc.dg/c1x-align-3.c: Add test for negative power of 2 > > > > OK (but note there should be a "." at the end of each ChangeLog entry). > > > > Fixed now. I also moved the test case change into its own Changelog. Could > someone commit it for me please, as I don't have commit access? Thanks, committed. -- Joseph S. Myers jos...@codesourcery.com
Re: Make m32c build, fix PSImode truncation
Richard Sandiford writes: > Bernd Schmidt writes: >> On 04/27/2013 10:39 AM, Richard Sandiford wrote: >>> Argh, that's unfortunate. The point of that change was to make >>> simplify_gen_unary (TRUNCATE, ...) no worse than using a subreg. >>> Would the equivalent lowpart simplify_gen_subreg call succeed >>> (return nonnull)? If so, I think we want truncate to do the same. >>> >>> What simplification is this blocking, and why does it lead to >>> reload failures? >> >> There's an explicit (set (reg:PSI) (truncate:PSI (reg:SI)) insn which >> currently gets changed to (set (reg:PSI) (subreg:PSI (reg:SI)) during >> cse1. Reload fails because the subreg gets propagated into a memory >> address, which requires a class of A_REGS, but A_REGS can only hold >> PSImode values, not SImode. This shows that the truncation is not >> always a no-op: in this case it involves a register move, but there's no >> way to describe this using TRULY_NOOP_TRUNCATION. > > Hmm, but isn't this a reload bug? We have: > > (insn 53 51 54 10 (set (reg:HI 0 r0 [orig:26 D.2817 ] [26]) > (zero_extend:HI (mem/u/j:QI (plus:PSI (subreg:PSI (reg:SI 44 [ D.2818 > ]) 0) > (symbol_ref:PSI ("__clz_tab") [flags 0x40] 0x7f2c253d42f8 __clz_tab>)) [0 __clz_tab S1 A8]))) > /home/richards/gcc/HEAD/gcc/libgcc/libgcc2.c:520 115 {zero_extendqihi2} > (expr_list:REG_DEAD (reg:SI 44 [ D.2818 ]) > (nil))) > > Reloads for insn # 53 > Reload 0: reload_in (SI) = (reg:SI 44 [ D.2818 ]) > A_REGS, RELOAD_FOR_OTHER_ADDRESS (opnum = 0) > reload_in_reg: (reg:SI 44 [ D.2818 ]) > > find_reloads_address_1 is reloading the SUBREG_REG rather than the > SUBREG itself, even though SImode is not valid for BASE_REGS == A_REGS: > > if (GET_CODE (op0) == SUBREG) > { > op0 = SUBREG_REG (op0); > code0 = GET_CODE (op0); > if (code0 == REG && REGNO (op0) < FIRST_PSEUDO_REGISTER) > op0 = gen_rtx_REG (word_mode, >(REGNO (op0) + > subreg_regno_offset (REGNO (SUBREG_REG > (orig_op0)), > GET_MODE (SUBREG_REG > (orig_op0)), > SUBREG_BYTE (orig_op0), > 
GET_MODE (orig_op0; > } > > push_reloads would specifically not convert a SUBREG reload to a > REG reload in this case. In principle, I think address subregs > should be handled in the same way. > > So is the problem really that (subreg:PSI (reg:SI ...)) isn't a valid > truncation on m32c? Without TRULY_NOOP_TRUNCATION, I don't see what > forces most code to use (truncate:PSI (reg:SI ...)) instead. Many places > would call gen_lowpart directly. > > Sorry for missing the truncation patterns, I should have grepped more > than m32c.md. They look a lot like normal moves though. Is truncation > really not a noop, or are the patterns there to work around something > (probably this :-))? Even if that's true, I suppose it isn't worth trying to fix such a sensitive part of reload at this stage. I think LRA already handles it correctly. In the meantime, we could work around the problem by disallowing subregs in m32c addresses. I think all non-paradoxical subregs[*] are going to need a reload anyway, so it should also produce better code. [*] Paradoxical subregs imply an address has don't-care bits, so should be rare. FWIW, the proof-of-concept patch below restores the build for me. I realise it might fail muster on style grounds though. Richard gcc/ * config/m32c/m32c.c (address_pattern_p): New variable. (encode_pattern_1): Include subregs address_pattern_p. (encode_pattern): Add address_p parameter. (m32c_legitimate_address_p): Update accordingly. Index: gcc/config/m32c/m32c.c === --- gcc/config/m32c/m32c.c 2013-04-29 14:07:50.0 +0100 +++ gcc/config/m32c/m32c.c 2013-04-29 14:07:51.207987093 +0100 @@ -113,6 +113,7 @@ static int class_contents[LIM_REG_CLASSE /* These are all to support encode_pattern(). */ static char pattern[30], *patternp; static GTY(()) rtx patternr[30]; +static bool address_pattern_p; #define RTX_IS(x) (streq (pattern, x)) /* Some macros to simplify the logic throughout this file. 
*/ @@ -166,8 +167,9 @@ encode_pattern_1 (rtx x) *patternp++ = 'r'; break; case SUBREG: - if (GET_MODE_SIZE (GET_MODE (x)) != - GET_MODE_SIZE (GET_MODE (XEXP (x, 0 + if (address_pattern_p + || (GET_MODE_SIZE (GET_MODE (x)) + != GET_MODE_SIZE (GET_MODE (XEXP (x, 0) *patternp++ = 'S'; encode_pattern_1 (XEXP (x, 0)); break; @@ -254,9 +256,10 @@ encode_pattern_1 (rtx x) } static void -encode_pattern (rtx x) +encode_pattern (rtx x, bool address_p = false) { patternp = pattern; +
Re: [Patch, Ping] Emit error for negative _Alignas alignment values
Ping - could you commit it for me please, I don't have commit access. Regards Senthil On Thu, Apr 25, 2013 at 12:10:06PM +0530, Senthil Kumar Selvaraj wrote: > On Wed, Apr 24, 2013 at 03:18:51PM +, Joseph S. Myers wrote: > > On Wed, 3 Apr 2013, Senthil Kumar Selvaraj wrote: > > > > > 2013-04-03Senthil Kumar Selvaraj > > > > > > > > > * c-common.c (check_user_alignment): Emit error for negative values > > > > > > * gcc.dg/c1x-align-3.c: Add test for negative power of 2 > > > > OK (but note there should be a "." at the end of each ChangeLog entry). > > > > Fixed now. I also moved the test case change into its own Changelog. Could > someone commit it for me please, as I don't have commit access? > > Regards > Senthil > > gcc/c-family/ChangeLog > > 2013-04-03 Senthil Kumar Selvaraj > > * c-common.c (check_user_alignment): Emit error for negative values. > > gcc/testsuite/ChangeLog > > 2013-04-03 Senthil Kumar Selvaraj > > * gcc.dg/c1x-align-3.c: Add test for negative power of 2. > > > diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c > index c7cdd0f..dfdfbb6 100644 > --- gcc/c-family/c-common.c > +++ gcc/c-family/c-common.c > @@ -7308,9 +7308,10 @@ check_user_alignment (const_tree align, bool > allow_zero) > } >else if (allow_zero && integer_zerop (align)) > return -1; > - else if ((i = tree_log2 (align)) == -1) > + else if (tree_int_cst_sgn (align) == -1 > + || (i = tree_log2 (align)) == -1) > { > - error ("requested alignment is not a power of 2"); > + error ("requested alignment is not a positive power of 2"); >return -1; > } >else if (i >= HOST_BITS_PER_INT - BITS_PER_UNIT_LOG) > diff --git gcc/testsuite/gcc.dg/c1x-align-3.c > gcc/testsuite/gcc.dg/c1x-align-3.c > index 0b2a77f..b97351c 100644 > --- gcc/testsuite/gcc.dg/c1x-align-3.c > +++ gcc/testsuite/gcc.dg/c1x-align-3.c > @@ -23,6 +23,7 @@ _Alignas (-(__LONG_LONG_MAX__-1)/4) char i3; /* { dg-error > "too large|power of 2 > _Alignas (-(__LONG_LONG_MAX__-1)/8) char i4; /* { dg-error "too large|power > of 
2" } */ > _Alignas (-(__LONG_LONG_MAX__-1)/16) char i5; /* { dg-error "too large|power > of 2" } */ > _Alignas (-1) char j; /* { dg-error "power of 2" } */ > +_Alignas (-2) char j; /* { dg-error "positive power of 2" } */ > _Alignas (3) char k; /* { dg-error "power of 2" } */ > > _Alignas ((void *) 1) char k; /* { dg-error "integer constant" } */
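The check this patch tightens can be sketched outside the compiler as follows. This is only an illustration under stated assumptions: `positive_pow2_log2` is a hypothetical helper working on a plain `long long`, not GCC's `check_user_alignment`, and it does not use the tree API (`tree_int_cst_sgn`, `tree_log2`) that the real patch calls. It mirrors the order of the patched condition: reject non-positive values first, then reject anything that is not a power of 2.

```c
/* Hypothetical helper: return log2 (ALIGN) when ALIGN is a positive
   power of 2, and -1 otherwise.  */
static int positive_pow2_log2 (long long align)
{
  int log = 0;

  /* Sign check first, as in the patched condition.  */
  if (align <= 0)
    return -1;

  /* A power of 2 has exactly one bit set.  */
  if ((align & (align - 1)) != 0)
    return -1;

  while (align > 1)
    {
      align >>= 1;
      log++;
    }
  return log;
}
```

With this helper, the values used in the testcase behave as the new diagnostic wording suggests: -1, -2 and 3 are all rejected, while 8 yields 3.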
Re: Make m32c build, fix PSImode truncation
Bernd Schmidt writes: > On 04/27/2013 10:39 AM, Richard Sandiford wrote: >> Argh, that's unfortunate. The point of that change was to make >> simplify_gen_unary (TRUNCATE, ...) no worse than using a subreg. >> Would the equivalent lowpart simplify_gen_subreg call succeed >> (return nonnull)? If so, I think we want truncate to do the same. >> >> What simplification is this blocking, and why does it lead to >> reload failures? > > There's an explicit (set (reg:PSI) (truncate:PSI (reg:SI)) insn which > currently gets changed to (set (reg:PSI) (subreg:PSI (reg:SI)) during > cse1. Reload fails because the subreg gets propagated into a memory > address, which requires a class of A_REGS, but A_REGS can only hold > PSImode values, not SImode. This shows that the truncation is not > always a no-op: in this case it involves a register move, but there's no > way to describe this using TRULY_NOOP_TRUNCATION. Hmm, but isn't this a reload bug? We have: (insn 53 51 54 10 (set (reg:HI 0 r0 [orig:26 D.2817 ] [26]) (zero_extend:HI (mem/u/j:QI (plus:PSI (subreg:PSI (reg:SI 44 [ D.2818 ]) 0) (symbol_ref:PSI ("__clz_tab") [flags 0x40] )) [0 __clz_tab S1 A8]))) /home/richards/gcc/HEAD/gcc/libgcc/libgcc2.c:520 115 {zero_extendqihi2} (expr_list:REG_DEAD (reg:SI 44 [ D.2818 ]) (nil))) Reloads for insn # 53 Reload 0: reload_in (SI) = (reg:SI 44 [ D.2818 ]) A_REGS, RELOAD_FOR_OTHER_ADDRESS (opnum = 0) reload_in_reg: (reg:SI 44 [ D.2818 ]) find_reloads_address_1 is reloading the SUBREG_REG rather than the SUBREG itself, even though SImode is not valid for BASE_REGS == A_REGS: if (GET_CODE (op0) == SUBREG) { op0 = SUBREG_REG (op0); code0 = GET_CODE (op0); if (code0 == REG && REGNO (op0) < FIRST_PSEUDO_REGISTER) op0 = gen_rtx_REG (word_mode, (REGNO (op0) + subreg_regno_offset (REGNO (SUBREG_REG (orig_op0)), GET_MODE (SUBREG_REG (orig_op0)), SUBREG_BYTE (orig_op0), GET_MODE (orig_op0; } push_reloads would specifically not convert a SUBREG reload to a REG reload in this case. 
In principle, I think address subregs should be handled in the same way. So is the problem really that (subreg:PSI (reg:SI ...)) isn't a valid truncation on m32c? Without TRULY_NOOP_TRUNCATION, I don't see what forces most code to use (truncate:PSI (reg:SI ...)) instead. Many places would call gen_lowpart directly. Sorry for missing the truncation patterns, I should have grepped more than m32c.md. They look a lot like normal moves though. Is truncation really not a noop, or are the patterns there to work around something (probably this :-))? Richard
Re: [PATCH][ARM] Restrict store_minmaxsi
On 29/04/13 12:33, Kyrylo Tkachov wrote:
> Hi all,
>
> With this patch, we now only use the store_minmaxsi pattern when we're not
> in a hot path. We have found that this pattern can cause memory access
> bottlenecks in some cases (one benchmark was 45% slower when this pattern
> was enabled).
>
> Tested arm-none-eabi on qemu.
>
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2013-04-29 Kyrylo Tkachov
>
> * config/arm/arm.md (store_minmaxsi): Use only when
> optimize_insn_for_size_p.

OK.

R.
[PATCH] Update tail-merge header comment.
Steven, I answered your question in tree-ssa-tail-merge.c about why tail-merge is not a stand-alone gimple pass. Committed to trunk. Thanks, - Tom 2013-04-29 Tom de Vries * tree-ssa-tail-merge.c: Update header comment.
[PATCH] Fix PR57103
The following fixes a thinko in move_stmt_op regarding block updates. It also makes the two copies look the same and removes redundant checking.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Richard.

2013-04-29 Richard Biener

	PR middle-end/57103
	* tree-cfg.c (move_stmt_op): Fix condition under which to update
	TREE_BLOCK.
	(move_stmt_r): Remove redundant checking.

	* gcc.dg/autopar/pr57103.c: New testcase.

Index: gcc/tree-cfg.c === *** gcc/tree-cfg.c (revision 198409) --- gcc/tree-cfg.c (working copy) *** move_stmt_op (tree *tp, int *walk_subtre *** 6099,6108 if (EXPR_P (t)) { ! if (TREE_BLOCK (t) == p->orig_block || (p->orig_block == NULL_TREE ! && TREE_BLOCK (t) == NULL_TREE)) TREE_SET_BLOCK (t, p->new_block); } else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME) { --- 6099,6117 if (EXPR_P (t)) { ! tree block = TREE_BLOCK (t); ! if (block == p->orig_block || (p->orig_block == NULL_TREE ! && block != NULL_TREE)) TREE_SET_BLOCK (t, p->new_block); + #ifdef ENABLE_CHECKING + else if (block != NULL_TREE) + { + while (block && TREE_CODE (block) == BLOCK && block != p->orig_block) + block = BLOCK_SUPERCONTEXT (block); + gcc_assert (block == p->orig_block); + } + #endif } else if (DECL_P (t) || TREE_CODE (t) == SSA_NAME) { *** move_stmt_r (gimple_stmt_iterator *gsi_p *** 6187,6204 gimple stmt = gsi_stmt (*gsi_p); tree block = gimple_block (stmt); ! if (p->orig_block == NULL_TREE ! || block == p->orig_block ! || block == NULL_TREE) gimple_set_block (stmt, p->new_block); - #ifdef ENABLE_CHECKING - else if (block != p->new_block) - { - while (block && block != p->orig_block) - block = BLOCK_SUPERCONTEXT (block); - gcc_assert (block); - } - #endif switch (gimple_code (stmt)) { --- 6196,6205 gimple stmt = gsi_stmt (*gsi_p); tree block = gimple_block (stmt); ! if (block == p->orig_block ! || (p->orig_block == NULL_TREE !
&& block != NULL_TREE)) gimple_set_block (stmt, p->new_block); switch (gimple_code (stmt)) { *** move_block_to_fn (struct function *dest_ *** 6426,6439 e->goto_locus = d->new_block ? COMBINE_LOCATION_DATA (line_table, e->goto_locus, d->new_block) : LOCATION_LOCUS (e->goto_locus); - #ifdef ENABLE_CHECKING - else if (block != d->new_block) - { - while (block && block != d->orig_block) - block = BLOCK_SUPERCONTEXT (block); - gcc_assert (block); - } - #endif } } --- 6427,6432 Index: gcc/testsuite/gcc.dg/autopar/pr57103.c === *** gcc/testsuite/gcc.dg/autopar/pr57103.c (revision 0) --- gcc/testsuite/gcc.dg/autopar/pr57103.c (working copy) *** *** 0 --- 1,19 + /* { dg-do compile } */ + /* { dg-options "-O -ftree-parallelize-loops=4" } */ + + int d[1024]; + + static inline int foo (void) + { + int s = 0; + int i = 0; + for (; i < 1024; i++) + s += d[i]; + return s; + } + + void bar (void) + { + if (foo ()) + __builtin_abort (); + }
Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output
On Mon, Apr 29, 2013 at 03:39:58PM +0400, Maksim Kuznetsov wrote: > 2013/4/29 Jakub Jelinek : > > Also, why are you handling just %{ and %}, and > > not also %| ? I mean, if you want to print say {|} into assembly for both > > dialects, don't you need: > > asm ("{dialect1%{%|%}|%{%|%}dialect2}"); > > or similar? If you use just | instead of %|, it would be handled as > > separator of the dialects. > > Sure. %| was removed due to concerns over some target architectures > already use it, but now %| is under ASSEMBLER_DIALECT and doesn't seem > to affect them. > > ChangeLog: > > 2013-04-29 Maxim Kuznetsov > * final.c (do_assembler_dialects): Don't handle curly braces and > vertical bar escaped by % as dialect delimiters. > (output_asm_insn): Print curly braces and vertical bar if escaped > by % and ASSEMBLER_DIALECT defined. > * doc/tm.texi (ASSEMBLER_DIALECT): Document new standard escapes. > > testsuite/ChangeLog: > > 2013-04-29 Maxim Kuznetsov > > * gcc.target/i386/asm-dialect-2.c: New testcase. Ok, thanks. Jakub
Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output
2013/4/29 Jakub Jelinek : > Also, why are you handling just %{ and %}, and > not also %| ? I mean, if you want to print say {|} into assembly for both > dialects, don't you need: > asm ("{dialect1%{%|%}|%{%|%}dialect2}"); > or similar? If you use just | instead of %|, it would be handled as > separator of the dialects. Sure. %| was removed due to concerns that some target architectures already use it, but now %| is under ASSEMBLER_DIALECT and doesn't seem to affect them. ChangeLog: 2013-04-29 Maxim Kuznetsov * final.c (do_assembler_dialects): Don't handle curly braces and vertical bar escaped by % as dialect delimiters. (output_asm_insn): Print curly braces and vertical bar if escaped by % and ASSEMBLER_DIALECT is defined. * doc/tm.texi (ASSEMBLER_DIALECT): Document new standard escapes. testsuite/ChangeLog: 2013-04-29 Maxim Kuznetsov * gcc.target/i386/asm-dialect-2.c: New testcase. -- Maxim Kuznetsov curly_braces_20130429-2.patch Description: Binary data
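For readers following the thread, the intended behaviour of the escapes can be illustrated with a small standalone sketch. This is not the final.c code from the patch: expand_dialect is a hypothetical helper, it assumes 0-based dialect numbers and non-nested {a|b} groups, and it ignores every other % sequence. It only shows that %{, %| and %} come out as literal characters while unescaped {, | and } select the alternative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not GCC's do_assembler_dialects): expand TMPL for
   dialect number DIALECT (0-based) into OUT, which holds OUT_SIZE bytes.
   Returns the number of characters written, excluding the NUL.  */
size_t
expand_dialect (const char *tmpl, int dialect, char *out, size_t out_size)
{
  size_t n = 0;
  int in_braces = 0;   /* nonzero while inside a {...|...} group */
  int alt = 0;         /* index of the alternative being scanned */

  assert (out_size > 0);
  for (const char *p = tmpl; *p; p++)
    {
      char c = *p;

      if (c == '%' && (p[1] == '{' || p[1] == '|' || p[1] == '}'))
        c = *++p;               /* escaped: treat as an ordinary character */
      else if (c == '{' && !in_braces)
        {
          in_braces = 1;        /* start of a dialect group */
          alt = 0;
          continue;
        }
      else if (c == '|' && in_braces)
        {
          alt++;                /* separator between alternatives */
          continue;
        }
      else if (c == '}' && in_braces)
        {
          in_braces = 0;        /* end of the dialect group */
          continue;
        }

      /* Emit text outside groups, and text of the selected alternative.  */
      if ((!in_braces || alt == dialect) && n + 1 < out_size)
        out[n++] = c;
    }
  out[n] = '\0';
  return n;
}
```

With the template from Jakub's mail, "{dialect1%{%|%}|%{%|%}dialect2}", this sketch yields "dialect1{|}" for dialect 0 and "{|}dialect2" for dialect 1, which is the output the escapes are meant to make possible.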
[PATCH][ARM] Restrict store_minmaxsi
Hi all, With this patch, we now only use the store_minmaxsi pattern when we're not in a hot path. We have found that this pattern can cause memory access bottlenecks in some cases (one benchmark was 45% slower when this pattern was enabled). Tested arm-none-eabi on qemu. Ok for trunk? Thanks, Kyrill 2013-04-29 Kyrylo Tkachov * config/arm/arm.md (store_minmaxsi): Use only when optimize_insn_for_size_p. disable_store_minmaxsi.patch Description: Binary data
[PATCH][committed]: Fix typo in predict.c
Hi all, I've committed this typo fix in predict.c as r198408. Thanks, Kyrill 2013-04-29 Kyrylo Tkachov * predict.c: Fix typo in comment above #define PROB_VERY_UNLIKELY. predict-spelling.patch Description: Binary data
Re: [testsuite] Disabling gcc.dg/cpp/trad/include.c for Android
*ping* thanks, Alexander 2013/3/26 Alexander Ivchenko : > Hi, > > Could you please take a look at the attached fixinclude patch > that addresses the problem: > > " We have test fail for gcc.dg/cpp/trad/include.c on Android. The > reason for that is that > -ftraditional-cpp is not expected to work on Android due to variadic > macro (like #define __builtin_warning(x, y...)) > in standard headers and traditional preprocessor cannot handle them." > > is it ok for trunk? > > thanks, > Alexander > > 2013/1/9 Andrew Pinski : >> On Wed, Jan 9, 2013 at 7:14 AM, Alexander Ivchenko >> wrote: >>> Hi, >>> >>> We have test fail for gcc.dg/cpp/trad/include.c on Android. The >>> reason for that is that >>> -ftraditional-cpp is not expected to work on Android due to variadic >>> macro (like #define __builtin_warning(x, y...)) >>> in standard headers and traditional preprocessor cannot handle them. >>> The attached patch disables that test. >> >> It sounds like it is better to fix the system headers instead. Via a >> fixincludes for older headers and have the android folks fix them for >> newer releases. >> >> Thanks, >> Andrew Pinski
[PATCH, i386]: Enable SSE -> GPR moves for generic x86 targets (PR target/54349)
Hello! Attached patch enables SSE -> general register moves for generic x86 targets. The patch splits TARGET_INTER_UNIT_MOVES to TARGET_INTER_UNIT_MOVES_TO_VEC and TARGET_INTER_UNIT_MOVES_FROM_VEC tuning flags and updates gcc sources accordingly. According to AMD optimization manuals, direct moves *FROM* SSE (and MMX) registers *TO* general registers should be used for AMD K10 family and later families. Since Intel targets are unaffected by this change, I have also changed generic setting to enable these moves for a generic target tuning. 2013-04-29 Uros Bizjak PR target/54349 * config/i386/i386.h (enum ix86_tune_indices) : New, split from X86_TUNE_INTER_UNIT_MOVES. : Remove. (TARGET_INTER_UNIT_MOVES_TO_VEC): New define. (TARGET_INTER_UNIT_MOVES_FROM_VEC): Ditto. (TARGET_INTER_UNIT_MOVES): Remove. * config/i386/i386.c (initial_ix86_tune_features): Update. Disable X86_TUNE_INTER_UNIT_MOVES_FROM_VEC for m_ATHLON_K8 only. (ix86_expand_convert_uns_didf_sse): Use TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES. (ix86_expand_vector_init_one_nonzero): Ditto. (ix86_expand_vector_init_interleave): Ditto. (inline_secondary_memory_needed): Return true for moves from SSE class registers for !TARGET_INTER_UNIT_MOVES_FROM_VEC targets and for moves to SSE class registers for !TARGET_INTER_UNIT_MOVES_TO_VEC targets. * config/i386/constraints.md (Yi, Ym): Depend on TARGET_INTER_UNIT_MOVES_TO_VEC. (Yj, Yn): New constraints. * config/i386/i386.md (*movdi_internal): Change constraints of operand 1 from Yi to Yj and from Ym to Yn. (*movsi_internal): Ditto. (*movdf_internal): Ditto. (*movsf_internal): Ditto. (*float2_1): Use TARGET_INTER_UNIT_MOVES_TO_VEC instead of TARGET_INTER_UNIT_MOVES. (*float2_1 splitters): Ditto. (floatdi2_i387_with_xmm): Ditto. (floatdi2_i387_with_xmm splitters): Ditto. * config/i386/sse.md (movdi_to_sse): Ditto. (sse2_stored): Change constraint of operand 1 from Yi to Yj. Use TARGET_INTER_UNIT_MOVES_FROM_VEC instead of TARGET_INTER_UNIT_MOVES. 
(sse_storeq_rex64): Change constraint of operand 1 from Yi to Yj. (sse_storeq_rex64 splitter): Use TARGET_INTER_UNIT_MOVES_FROM_VEC instead of TARGET_INTER_UNIT_MOVES. * config/i386/mmx.md (*mov_internal): Change constraint of operand 1 from Yi to Yj and from Ym to Yn. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: constraints.md === --- constraints.md (revision 198390) +++ constraints.md (working copy) @@ -87,8 +87,10 @@ ;; We use the Y prefix to denote any number of conditional register sets: ;; z First SSE register. -;; i SSE2 inter-unit moves enabled -;; m MMX inter-unit moves enabled +;; i SSE2 inter-unit moves to SSE register enabled +;; j SSE2 inter-unit moves from SSE register enabled +;; m MMX inter-unit moves to MMX register enabled +;; n MMX inter-unit moves from MMX register enabled ;; a Integer register when zero extensions with AND are disabled ;; p Integer register when TARGET_PARTIAL_REG_STALL is disabled ;; d Integer register when integer DFmode moves are enabled @@ -99,13 +101,21 @@ "First SSE register (@code{%xmm0}).") (define_register_constraint "Yi" - "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES ? SSE_REGS : NO_REGS" - "@internal Any SSE register, when SSE2 and inter-unit moves are enabled.") + "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_TO_VEC ? SSE_REGS : NO_REGS" + "@internal Any SSE register, when SSE2 and inter-unit moves to vector registers are enabled.") +(define_register_constraint "Yj" + "TARGET_SSE2 && TARGET_INTER_UNIT_MOVES_FROM_VEC ? SSE_REGS : NO_REGS" + "@internal Any SSE register, when SSE2 and inter-unit moves from vector registers are enabled.") + (define_register_constraint "Ym" - "TARGET_MMX && TARGET_INTER_UNIT_MOVES ? MMX_REGS : NO_REGS" - "@internal Any MMX register, when inter-unit moves are enabled.") + "TARGET_MMX && TARGET_INTER_UNIT_MOVES_TO_VEC ? 
MMX_REGS : NO_REGS" + "@internal Any MMX register, when inter-unit moves to vector registers are enabled.") +(define_register_constraint "Yn" + "TARGET_MMX && TARGET_INTER_UNIT_MOVES_FROM_VEC ? MMX_REGS : NO_REGS" + "@internal Any MMX register, when inter-unit moves from vector registers are enabled.") + (define_register_constraint "Yp" "TARGET_PARTIAL_REG_STALL ? NO_REGS : GENERAL_REGS" "@internal Any integer register when TARGET_PARTIAL_REG_STALL is disabled.") Index: i386.c === --- i386.c (revision 198390) +++ i386.c (working copy) @@ -1931,9 +1931,12 @@ static unsigned int initial_ix86_tune_features[X86 /* X86_TUNE_USE_FFREEP */ m_AMD_MULTIPLE, - /* X86_TUNE_INTER_UNIT_MOVES */ + /* X86_TUNE_INTER_UNIT_MOVES_TO_VEC */ ~(m_
Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output
On Mon, Apr 29, 2013 at 02:31:40PM +0400, Maksim Kuznetsov wrote: > Jakub, Richard, thank you for your feedback! > > > I wonder if it %{ and %} shouldn't be better handled in final.c > > for all #ifdef ASSEMBLER_DIALECT targets, rather than just for one specific. > > I moved %{ and %} cases to output_asm_insn in final.c > > > Also: > > *(p + 1) > > should be better written as p[1] (more readable). > > Fixed. > > I also documented new escapes. > Could you please have a look? ChangeLog entry is missing. Also, why are you handling just %{ and %}, and not also %| ? I mean, if you want to print say {|} into assembly for both dialects, don't you need: asm ("{dialect1%{%|%}|%{%|%}dialect2}"); or similar? If you use just | instead of %|, it would be handled as separator of the dialects. Otherwise it looks good to me. Jakub
Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output
Jakub, Richard, thank you for your feedback! > I wonder if it %{ and %} shouldn't be better handled in final.c > for all #ifdef ASSEMBLER_DIALECT targets, rather than just for one specific. I moved %{ and %} cases to output_asm_insn in final.c > Also: > *(p + 1) > should be better written as p[1] (more readable). Fixed. I also documented new escapes. Could you please have a look? -- Maxim Kuznetsov curly_braces_20130429.patch Description: Binary data
Re: [PATCH SH] Fix PR57108
Christian Bruel wrote: > This patch sets the correct operand mode for tstsi_t_zero_extract_eq, > to avoid reload generating a move between a constant and a void register. > > Reg tested for sh-elf. No performance impact. > > OK for 4.7, 4.8 and trunk ? OK. Regards, kaz
Re: [PATCH v2] gcc: arm: linux-eabi: fix handling of armv4 bx fixups when linking
On 28/04/13 04:52, Mike Frysinger wrote: The bpabi.h header already sets up defines to automatically use the --fix-v4bx flag with the assembler & linker as needed, and creates a default assembly & linker spec which uses those. Unfortunately, the linux-eabi.h header clobbers the LINK_SPEC define and doesn't include the v4bx define when setting up its own. So while the assembler spec is retained and works fine to generate the right relocs, building for armv4 targets doesn't invoke the linker correctly so all the relocs get processed as if we had an armv4t target. You can see this with -dumpspecs when configuring gcc for an armv4 target and using --with-arch=armv4: $ armv4l-unknown-linux-gnueabi-gcc -dumpspecs |& grep -B 1 fix-v4bx *subtarget_extra_asm_spec: %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx} ... With this fix in place, we also get the link spec: $ armv4l-unknown-linux-gnueabi-gcc -dumpspecs |& grep -B 1 fix-v4bx *link: ... %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx} ... And all my hello world tests / glibc builds automatically turn the bx insn into the 'mov pc, lr' insn and all is right in the world. Signed-off-by: Mike Frysinger 2013-04-27 Mike Frysinger * config/arm/bpabi.h (EABI_LINK_SPEC): Define. (BPABI_LINK_SPEC): Use new EABI_LINK_SPEC. * config/arm/linux-eabi.h (LINK_SPEC): Replace BE8_LINK_SPEC with EABI_LINK_SPEC. OK. R.
Re: [PATCH] Fix PR57089
On Mon, 29 Apr 2013, Richard Biener wrote: > > I've tried to follow where the scalar loop appears in > expand_omp_for_static_nochunk but got lost quickly. So the following > papers over the lack of OMP expansion populating the loop tree > as I've done in the original patch introducing loops to it. > > If the OMP expansion code knows at some point "here is a new loop > and this is the header block and this is the latch block" I can > write a helper that properly updates the loop tree with that > information (call alloc_loop, init ->header and ->latch and > call add_loop). But at the moment I have no idea where to call > that function ... After discussion on IRC I am now testing the following (only the degenerate case still uses fixup, I'm not sure how to reliably get at loops here - eventually we want to revisit that loops-with-abnormal-entries issue again). Richard. 2013-04-29 Richard Biener PR middle-end/57089 * omp-low.c (expand_omp_taskreg): If the parent function had a broken loop tree make sure to schedule a fixup for the child as well. (expand_omp_for_generic): Properly add loops. (expand_omp_for_static_nochunk): Likewise. (expand_omp_for_static_chunk): Likewise. (expand_omp_for): For the degenerate case fixup loops. (expand_omp_sections): Fix default bb placement in loops. (expand_omp_atomic_pipeline): Properly add loops. * gfortran.dg/gomp/pr57089.f90: New testcase. Index: gcc/omp-low.c === *** gcc/omp-low.c (revision 198389) --- gcc/omp-low.c (working copy) *** expand_omp_taskreg (struct omp_region *r *** 3571,3581 new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block); if (exit_bb) single_succ_edge (new_bb)->flags = EDGE_FALLTHRU; ! /* ??? As the OMP expansion process does not update the loop ! tree of the original function before outlining the region to !the new child function we need to discover loops in the child. !Arrange for that. */ ! 
child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP; /* Remove non-local VAR_DECLs from child_cfun->local_decls list. */ num = vec_safe_length (child_cfun->local_decls); --- 3571,3580 new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block); if (exit_bb) single_succ_edge (new_bb)->flags = EDGE_FALLTHRU; ! /* When the OMP expansion process cannot guarantee an up-to-date ! loop tree arrange for the child function to fixup loops. */ ! if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) ! child_cfun->x_current_loops->state |= LOOPS_NEED_FIXUP; /* Remove non-local VAR_DECLs from child_cfun->local_decls list. */ num = vec_safe_length (child_cfun->local_decls); *** expand_omp_for_generic (struct omp_regio *** 4148,4153 --- 4147,4162 recompute_dominator (CDI_DOMINATORS, l0_bb)); set_immediate_dominator (CDI_DOMINATORS, l1_bb, recompute_dominator (CDI_DOMINATORS, l1_bb)); + + struct loop *outer_loop = alloc_loop (); + outer_loop->header = l0_bb; + outer_loop->latch = l2_bb; + add_loop (outer_loop, l0_bb->loop_father); + + struct loop *loop = alloc_loop (); + loop->header = l1_bb; + /* The loop may have multiple latches. 
*/ + add_loop (loop, outer_loop); } } *** expand_omp_for_static_nochunk (struct om *** 4370,4375 --- 4379,4389 recompute_dominator (CDI_DOMINATORS, body_bb)); set_immediate_dominator (CDI_DOMINATORS, fin_bb, recompute_dominator (CDI_DOMINATORS, fin_bb)); + + struct loop *loop = alloc_loop (); + loop->header = body_bb; + loop->latch = cont_bb; + add_loop (loop, body_bb->loop_father); } *** expand_omp_for_static_chunk (struct omp_ *** 4671,4676 --- 4685,4700 recompute_dominator (CDI_DOMINATORS, seq_start_bb)); set_immediate_dominator (CDI_DOMINATORS, body_bb, recompute_dominator (CDI_DOMINATORS, body_bb)); + + struct loop *trip_loop = alloc_loop (); + trip_loop->header = iter_part_bb; + trip_loop->latch = trip_update_bb; + add_loop (trip_loop, iter_part_bb->loop_father); + + struct loop *loop = alloc_loop (); + loop->header = body_bb; + loop->latch = cont_bb; + add_loop (loop, trip_loop); } *** expand_omp_for (struct omp_region *regio *** 4698,4703 --- 4722,4732 BRANCH_EDGE (region->cont)->flags &= ~EDGE_ABNORMAL; FALLTHRU_EDGE (region->cont)->flags &= ~EDGE_ABNORMAL; } + else + /* If there isnt a continue then this is a degerate case where +the introduction of abnormal e
Re: [PATCH, AArch64] Support LDR/STR to/from S and D registers
On 26 April 2013 14:38, Ian Bolton wrote: > This patch allows us to load to and store from the S and D registers, > which helps with doing scalar operations in those registers. > > This has been regression tested on bare-metal and linux. > > OK for trunk? > > Cheers, > Ian > > > 2013-04-26 Ian Bolton > > * config/aarch64/aarch64.md (movsi_aarch64): Support LDR/STR > from/to S register. > (movdi_aarch64): Support LDR/STR from/to D register. OK /Marcus
Re: [AArch64] Vectorize over more math.h functions.
On 26 April 2013 14:28, James Greenhalgh wrote: > > Hi, > > This patch adds float -> int builtins to the set > of builtins we can try to vectorize in aarch64_builtin_vectorized_function. > > In particular, we add BUILT_IN_IFLOORF, BUILT_IN_ICEILF, BUILT_IN_LROUND, > BUILT_IN_IROUNDF. > > The BUILT_IN_LROUND cases won't be triggered unless -ffast-math > or something else which turns off inexact errors is enabled. > > Regression tested for aarch64-none-elf with no regressions. > > Thanks, > James > > --- > gcc/ > > 2013-04-26 James Greenhalgh > > * config/aarch64/aarch64-builtins.c > (aarch64_builtin_vectorized_function): Vectorize over ifloorf, > iceilf, lround, iroundf. OK /Marcus
Re: [AArch64] Implement vector float->double widening and double->float narrowing.
On 26 April 2013 14:25, James Greenhalgh wrote: > > Hi, > > gcc.dg/vect/vect-float-truncate-1.c and > gcc.dg/vect/vect-float-extend-1.c > > Were failing because widening and narrowing of floats to doubles was > not wired up. > > This patch fixes that by implementing the standard names: > > vec_pack_trunc_v2df > Taking two vectors of V2DFmode and returning one vector of V4SF mode. > > `vec_unpacks_float_hi_v4sf', `vec_unpacks_float_lo_v4sf' > Taking one vector of V4SF mode and splitting it to two vectors of V2DF mode. > > Patch regression tested on aarch64-none-elf with no regressions, > and shown to fix the bug. > > Thanks, > James > --- > gcc/ > > 2013-04-26 James Greenhalgh > > * config/aarch64/aarch64-simd-builtins.def (vec_unpacks_hi_): New. > (float_truncate_hi_): Likewise. > (float_extend_lo_): Likewise. > (float_truncate_lo_): Likewise. > * config/aarch64/aarch64-simd.md (vec_unpacks_lo_v4sf): New. > (aarch64_float_extend_lo_v2df): Likewise. > (vec_unpacks_hi_v4sf): Likewise. > (aarch64_float_truncate_lo_v2sf): Likewise. > (aarch64_float_truncate_hi_v4sf): Likewise. > (vec_pack_trunc_v2df): Likewise. > (vec_pack_trunc_df): Likewise. OK /Marcus
Re: [AArch64] Add vector int to float conversions.
On 26 April 2013 14:22, James Greenhalgh wrote: > > Hi, > > This patch wires up builtins for int to float conversions in > Tree, and uint to float conversions in RTL. > > Regression tested for aarch64-none-elf with no regressions. > > Thanks, > James > > --- > gcc/ > > 2013-04-26 James Greenhalgh > > * config/aarch64/aarch64-builtins.c > (aarch64_fold_builtin): Fold float conversions. > * config/aarch64/aarch64-simd-builtins.def > (floatv2si, floatv4si, floatv2di): New. > (floatunsv2si, floatunsv4si, floatunsv2di): Likewise. > * config/aarch64/aarch64-simd.md > (2): New, expands to float and > floatuns. > * config/aarch64/iterators.md (FLOATUORS): New. > (optab): Add float, floatuns. > (su_optab): Likewise. OK /Marcus
Re: Make m32c build, fix PSImode truncation
On 04/27/2013 10:39 AM, Richard Sandiford wrote: > Argh, that's unfortunate. The point of that change was to make > simplify_gen_unary (TRUNCATE, ...) no worse than using a subreg. > Would the equivalent lowpart simplify_gen_subreg call succeed > (return nonnull)? If so, I think we want truncate to do the same. > > What simplification is this blocking, and why does it lead to > reload failures? There's an explicit (set (reg:PSI) (truncate:PSI (reg:SI)) insn which currently gets changed to (set (reg:PSI) (subreg:PSI (reg:SI)) during cse1. Reload fails because the subreg gets propagated into a memory address, which requires a class of A_REGS, but A_REGS can only hold PSImode values, not SImode. This shows that the truncation is not always a no-op: in this case it involves a register move, but there's no way to describe this using TRULY_NOOP_TRUNCATION. Bernd
Re: [AArch64] Map fcvt intrinsics to builtin name directly.
On 26 April 2013 14:12, James Greenhalgh wrote: > > Hi, > > This patch uses the new builtin-mapping infrastructure > to map the fcvt family of builtins directly to their > GCC standard pattern name. > > Regression tested on aarch64-none-elf with no regressions. > > Thanks, > James > > --- > gcc/ > > 2013-04-26 James Greenhalgh > > * config/aarch64/aarch64-builtins.c > (aarch64_builtin_vectorized_function): Use new names for > fcvt builtins. > * config/aarch64/aarch64-simd-builtins.def (fcvtzs): Split as... > (lbtruncv2sf, lbtruncv4sf, lbtruncv2df): ...This. > (fcvtzu): Split as... > (lbtruncuv2sf, lbtruncuv4sf, lbtruncuv2df): ...This. > (fcvtas): Split as... > (lroundv2sf, lroundv4sf, lroundv2df, lroundsf, lrounddf): ...This. > (fcvtau): Split as... > (lrounduv2sf, lrounduv4sf, lrounduv2df, lroundusf, lroundudf): > ...This. > (fcvtps): Split as... > (lceilv2sf, lceilv4sf, lceilv2df): ...This. > (fcvtpu): Split as... > (lceiluv2sf, lceiluv4sf, lceiluv2df, lceilusf, lceiludf): ...This. > (fcvtms): Split as... > (lfloorv2sf, lfloorv4sf, lfloorv2df): ...This. > (fcvtmu): Split as... > (lflooruv2sf, lflooruv4sf, lflooruv2df, lfloorusf, lfloorudf): > ...This. > (lfrintnv2sf, lfrintnv4sf, lfrintnv2df, lfrintnsf, lfrintndf): New. > (lfrintnuv2sf, lfrintnuv4sf, lfrintnuv2df): Likewise. > (lfrintnusf, lfrintnudf): Likewise. > * config/aarch64/aarch64-simd.md > (l2): Convert to > define_insn. > (aarch64_fcvt): Remove. > * config/aarch64/iterators.md (FCVT): Include UNSPEC_FRINTN. > (fcvt_pattern): Likewise. OK /Marcus
[PATCH SH] Fix PR57108
Hello, This patch sets the correct operand mode for tstsi_t_zero_extract_eq, to avoid reload generating a move between a constant and a void register. Reg tested for sh-elf. No performance impact. OK for 4.7, 4.8 and trunk ? Thanks 2013-04-26 Christian Bruel PR target/57108 * sh.md (tstsi_t_zero_extract_eq): Set mode for operand 0. 2013-04-26 Christian Bruel PR target/57108 * gcc.target/sh/pr57108.c: New test. Index: gcc/testsuite/gcc.target/sh/pr57108.c === --- gcc/testsuite/gcc.target/sh/pr57108.c (revision 0) +++ gcc/testsuite/gcc.target/sh/pr57108.c (revision 0) @@ -0,0 +1,19 @@ +/* { dg-do compile { target "sh*-*-*" } } */ +/* { dg-options "-O1" } */ + +void __assert_func (void) __attribute__ ((__noreturn__)) ; + +void ATATransfer (int num, int buffer) +{ + int wordCount; + + while (num > 0) + { +wordCount = num * 512 / sizeof (int); + +((0 == (buffer & 63)) ? (void)0 : __assert_func () ); +((0 == (wordCount & 31)) ? (void)0 : __assert_func ()); + } + + + } Index: gcc/config/sh/sh.md === --- gcc/config/sh/sh.md (revision 198287) +++ gcc/config/sh/sh.md (working copy) @@ -689,7 +689,7 @@ ;; Extract contiguous bits and compare them against zero. (define_insn "tstsi_t_zero_extract_eq" [(set (reg:SI T_REG) - (eq:SI (zero_extract:SI (match_operand 0 "logical_operand" "z") + (eq:SI (zero_extract:SI (match_operand:SI 0 "logical_operand" "z") (match_operand:SI 1 "const_int_operand") (match_operand:SI 2 "const_int_operand")) (const_int 0)))]
Re: [AArch64][Testsuite] Enable vect_uintfloat_cvt for AArch64.
On 26 April 2013 14:36, James Greenhalgh wrote: > > Hi, > > While modifying all the vcvt builtins we've fixed enough bugs > that we can now enable vect_uintfloat_cvt for AArch64. Do that. > > Patch tested to ensure all newly enabled tests succeed. > > James > --- > gcc/testsuite/ > > 2013-04-26 James Greenhalgh > > * lib/target-supports.exp (vect_uintfloat_cvt): Enable for AArch64. OK /Marcus
Re: [AArch64] fcvt instructions - arm_neon.h changes.
On 26 April 2013 14:34, James Greenhalgh wrote: > > This patch updates the implementation in arm_neon.h of the vcvt > intrinsics. Where appropriate we use C statements, and where not > possible we fall back to builtins. > > There were a number of errors with names and types in the current > revision of the file. These have been corrected. > > Regression tested with no regressions. > > Thanks, > James > > --- > gcc/ > > 2013-04-26 James Greenhalgh > > * config/aarch64/arm_neon.h > (vcvt_f<32,64>_s<32,64>): Rewrite in C. > (vcvt_f<32,64>_s<32,64>): Rewrite using builtins. > (vcvt__f<32,64>_f<32,64>): Likewise. > (vcvt_<32,64>_f<32,64>): Likewise. > (vcvta_<32,64>_f<32,64>): Likewise. > (vcvtm_<32,64>_f<32,64>): Likewise. > (vcvtn_<32,64>_f<32,64>): Likewise. > (vcvtp_<32,64>_f<32,64>): Likewise. > > gcc/testsuite/ > > 2013-04-26 James Greenhalgh > > * gcc.target/aarch64/vect-vcvt.c: New. OK /Marcus
Re: [AArch64] Add vector fix, fixuns, fix_trunc, fixuns_trunc standard patterns
OK /Marcus On 26 April 2013 14:30, James Greenhalgh wrote: > > Hi, > > This patch enables vectorization over conversions by implementing the > fix, fixuns, fix_trunc, fixuns_trunc, and ftrunc standard pattern names. > > Each of these is implemented by the frintz instruction. > (Round towards 0) > > The expanders for these are blank as they are already > implemented by the lrint standard patterns. We are > just connecting the dots for another set of standard names. > > Regression tested for aarch64-none-elf with no regressions. > > Thanks, > James > > --- > gcc/ > > 2013-04-26 James Greenhalgh > > * config/aarch64/aarch64-simd.md > (2): New, maps to fix, fixuns. > (2): New, maps to > fix_trunc, fixuns_trunc. > (ftrunc2): New. > * config/aarch64/iterators.md (optab): Add fix, fixuns. > (fix_trunc_optab): New.
Re: [Patch, fortran] PR 56981 Improve unbuffered unformatted performance
On Mon, Apr 29, 2013 at 1:46 AM, Jerry DeLisle wrote: > OK Janne and thanks for the patch. Thanks for the review, committed (as well as the system_clock patch). > What are your thoughts about special casing nul devices/ Hmm, I'm not that eager. It starts to smell of "benchmarketing".. One thing I have been thinking of which could help would be to implement a "start of the current record" marker in the buffering implementation, and when flushing then only flush up to that marker. Currently when writing small sequential unformatted records what often happens when looking at strace output is something like write(3, "\4\0\0\0`\0263I\4\0\0\0\4\0\0\0p\0263I\4\0\0\0\4\0\0\0\200\0263I"..., 8192) = 8192 write(3, "\4\0\0\0", 4) = 4 lseek(3, 8810688, SEEK_SET) = 8810688 write(3, "\4\0\0\0", 4) = 4 lseek(3, 8810700, SEEK_SET) = 8810700 write(3, "\4\0\0\0\20A3I\4\0\0\0\4\0\0\0 A3I\4\0\0\0\4\0\0\A3I"..., 8192) = 8192 i.e. the buffer fills up in the middle of a record, we flush it (the 8192 byte writes), but then we have to seek back and forth to fix the record markers. This also means that we cannot fully buffer non-seekable files (such as /dev/null) because the records around the buffer boundaries are corrupted (which of course doesn't matter for the particular case of /dev/null, but otherwise..). So if this issue is fixed we could buffer those as well and get essentially the same performance as for regular files. E.g. something like /* Reserve space in the buffer (flush existing data if necessary), up to some reasonable max size (e.g. 4 KB) unless the size is small and known upfront (e.g. direct access), and set the current_record_start marker at the current position. Should be called when preparing a new record (st_write()). */ void sreserve(int size); Write the record data as usual... /* Finish the record, i.e. set current_record_start marker to -1 to mark that there is no current record. size should be <= reserved size. 
Should be called when finishing a write (st_write_done()). */ void scommit(int size); Well, that's a rough idea. I don't know when or if I'll have time and motivation to implement it, though.. -- Janne Blomqvist
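The rough idea above can be sketched in standalone C. Everything here is an assumption made for illustration and not libgfortran code: struct rbuf, the tiny BUF_SIZE, the in-memory sink that stands in for a non-seekable file, and the extra swrite helper are hypothetical, and the signatures differ from the sreserve(int) / scommit(int) prototypes in the mail (the record length is derived from what was written). The point it tries to show is that a flush only ever happens between records, so both record markers can be filled in inside the buffer without any lseek.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 32                 /* tiny on purpose, so the example flushes */
#define MARK (sizeof (uint32_t))    /* size of one record marker */

struct rbuf
{
  char data[BUF_SIZE];
  size_t used;          /* bytes currently buffered */
  long record_start;    /* offset of the open record, or -1 if none */
  char sink[256];       /* stands in for a non-seekable file */
  size_t sink_len;
};

void
rbuf_init (struct rbuf *b)
{
  b->used = 0;
  b->record_start = -1;
  b->sink_len = 0;
}

/* Write out the buffer.  Only called while no record is open, so the
   sink never receives a partial record.  */
static void
flush_all (struct rbuf *b)
{
  assert (b->record_start < 0);
  assert (b->sink_len + b->used <= sizeof b->sink);
  memcpy (b->sink + b->sink_len, b->data, b->used);
  b->sink_len += b->used;
  b->used = 0;
}

/* Begin a record whose payload is at most MAX_PAYLOAD bytes: make room
   (flushing complete records if needed) and remember where it starts.  */
void
sreserve (struct rbuf *b, size_t max_payload)
{
  size_t need = max_payload + 2 * MARK;
  assert (b->record_start < 0 && need <= BUF_SIZE);
  if (b->used + need > BUF_SIZE)
    flush_all (b);
  b->record_start = (long) b->used;
  b->used += MARK;              /* leave room for the leading marker */
}

/* Append payload bytes to the open record.  */
void
swrite (struct rbuf *b, const void *p, size_t len)
{
  assert (b->record_start >= 0 && b->used + len + MARK <= BUF_SIZE);
  memcpy (b->data + b->used, p, len);
  b->used += len;
}

/* Finish the record: fill in both markers in place, then mark that
   there is no current record.  */
void
scommit (struct rbuf *b)
{
  assert (b->record_start >= 0);
  uint32_t len = (uint32_t) (b->used - (size_t) b->record_start - MARK);
  memcpy (b->data + b->record_start, &len, MARK);   /* leading marker */
  memcpy (b->data + b->used, &len, MARK);           /* trailing marker */
  b->used += MARK;
  b->record_start = -1;
}
```

Writing three 4-byte records through this sketch triggers one flush, and that flush contains exactly the first two complete records. A real implementation would also have to handle records larger than the buffer, which this sketch simply rejects with an assertion.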
[C++ Patch/RFC] PR 57092
Hi, in this 4.8/4.9 Regression, finish_decltype_type doesn't handle ADDR_EXPR. In 4.7, finish_decltype_type deals with a TEMPLATE_PARM_INDEX and the testcase compiles fine, but it's quite easy - see c++/52282 - to trigger the same ICE there too (it would be nice to make progress on the latter too). The patchlet below passes testing, not sure whether there is something deeper about this issue. Thanks, Paolo. / Index: cp/semantics.c === --- cp/semantics.c (revision 198381) +++ cp/semantics.c (working copy) @@ -5389,6 +5389,7 @@ finish_decltype_type (tree expr, bool id_expressio case PARM_DECL: case RESULT_DECL: case TEMPLATE_PARM_INDEX: + case ADDR_EXPR: expr = mark_type_use (expr); type = TREE_TYPE (expr); break; Index: testsuite/g++.dg/cpp0x/decltype53.C === --- testsuite/g++.dg/cpp0x/decltype53.C (revision 0) +++ testsuite/g++.dg/cpp0x/decltype53.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/57092 +// { dg-do compile { target c++11 } } + +template +class B { + decltype(F) v; +}; + +void foo(int) {} + +B o;
[DWARF] Fix multiple register spanning location.
Hello, We noticed a few failures with the gdb testsuite due to incorrect mapping of floating-point registers, noticed on SH, which defines both TARGET_DWARF_REGISTER_SPAN and DBX_REGISTER_NUMBER. The problem was that the converted pseudo reg was never converted to the dbx format when fed from 'multiple_reg_loc_descriptor'. Regression tested for sh-elf (including gdb). Bootstrap OK for arm-none-eabi, sh64-elf and x86_64-unknown-linux-gnu. Note that this could apply to the ARM, C6X, RS6000 and MIPS targets, which also define the same macro combination. Although I am asking for approval from the DWARF maintainers, feedback from the respective arch maintainers would be appreciated as I don't run the gdb testsuite on those targets. Many thanks, Christian 2013-04-26 Christian Bruel * dwarf2out.c (multiple_reg_loc_descriptor): Use DBX_REGISTER_NUMBER for spanning registers. 2013-04-26 Christian Bruel * gcc.dg/debug/dwarf2/dwarf_span.c: New test case. Index: dwarf2out.c === --- dwarf2out.c (revision 198287) +++ dwarf2out.c (working copy) @@ -10656,7 +10656,8 @@ multiple_reg_loc_descriptor (rtx rtl, rtx regs, { dw_loc_descr_ref t; - t = one_reg_loc_descriptor (REGNO (XVECEXP (regs, 0, i)), + reg = REGNO (XVECEXP (regs, 0, i)); + t = one_reg_loc_descriptor (DBX_REGISTER_NUMBER (reg), VAR_INIT_STATUS_INITIALIZED); add_loc_descr (&loc_result, t); size = GET_MODE_SIZE (GET_MODE (XVECEXP (regs, 0, 0))); Index: testsuite/gcc.dg/debug/dwarf2/dwarf_span.c === --- testsuite/gcc.dg/debug/dwarf2/dwarf_span.c (revision 0) +++ testsuite/gcc.dg/debug/dwarf2/dwarf_span.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile { target "sh*-*-*" } } */ +/* { dg-require-effective-target hard_float } */ +/* { dg-options "-g -dA" } */ +/* { dg-final { scan-assembler-times "DW_OP_regx" 4 } } */ + +double +add_double (register double u, register double v) +{ + return u + v; +} + +double +wack_double (register double u, register double v) +{ + register double l = u, r = v; + l = add_double (l, r); + return l + r; +}
[PATCH] Fix PR57089
I've tried to follow where the scalar loop appears in expand_omp_for_static_nochunk but got lost quickly. So the following papers over the lack of OMP expansion populating the loop tree as I've done in the original patch introducing loops to it. If the OMP expansion code knows at some point "here is a new loop and this is the header block and this is the latch block" I can write a helper that properly updates the loop tree with that information (call alloc_loop, init ->header and ->latch and call add_loop). But at the moment I have no idea where to call that function ... Bootstrap and regtest pending on x86_64-unknown-linux-gnu. Richard. 2013-04-29 Richard Biener PR middle-end/57089 * omp-low.c (expand_omp_for_static_nochunk): Mark loops for fixup. * gfortran.dg/gomp/pr57089.f90: New testcase. Index: gcc/omp-low.c === *** gcc/omp-low.c (revision 198389) --- gcc/omp-low.c (working copy) *** expand_omp_for_static_nochunk (struct om *** 4370,4375 --- 4370,4380 recompute_dominator (CDI_DOMINATORS, body_bb)); set_immediate_dominator (CDI_DOMINATORS, fin_bb, recompute_dominator (CDI_DOMINATORS, fin_bb)); + + /* ??? The scalar loop that remains in the body is not registered + with the loop tree. Mark that for fixup. */ + if (current_loops) + loops_state_set (LOOPS_NEED_FIXUP); } Index: gcc/testsuite/gfortran.dg/gomp/pr57089.f90 === *** gcc/testsuite/gfortran.dg/gomp/pr57089.f90 (revision 0) --- gcc/testsuite/gfortran.dg/gomp/pr57089.f90 (working copy) *** *** 0 --- 1,12 + ! PR middle-end/57089 + ! { dg-do compile } + ! { dg-options "-O -fopenmp" } + SUBROUTINE T() + INTEGER:: npoints, grad_deriv + SELECT CASE(grad_deriv) + CASE (0) +!$omp do +DO ii=1,npoints +END DO + END SELECT + END SUBROUTINE
Re: mips SNaN/QNaN is swapped
Hi! Ping. On Mon, 22 Apr 2013 11:52:23 +0200, I wrote: > On Fri, 5 Apr 2013 23:55:37 +0100, "Maciej W. Rozycki" > wrote: > > On Fri, 5 Apr 2013, Thomas Schwinge wrote: > > > > Index: gcc/config/fp-bit.c > > > > === > > > > RCS file: /cvs/uberbaum/gcc/config/fp-bit.c,v > > > > retrieving revision 1.39 > > > > diff -u -p -r1.39 fp-bit.c > > > > --- gcc/config/fp-bit.c 26 Jan 2003 10:06:57 - 1.39 > > > > +++ gcc/config/fp-bit.c 1 Apr 2003 21:35:00 - > > > > @@ -210,7 +210,11 @@ pack_d ( fp_number_type * src) > > > >exp = EXPMAX; > > > >if (src->class == CLASS_QNAN || 1) > > > > { > > > > +#ifdef QUIET_NAN_NEGATED > > > > + fraction |= QUIET_NAN - 1; > > > > +#else > > > > fraction |= QUIET_NAN; > > > > +#endif > > > I think the intent of this code is to preserve a NaN's payload (it > > certainly does for non-QUIET_NAN_NEGATED targets) > > I agree. For preserving the payload, both the unpack/pack code also has > to shift by NGARDS. > > > Complementing the change above I think it will also make > > sense to clear the qNaN bit when extracting a payload from fraction in > > unpack_d as the class of a NaN being handled is stored separately. > > I agree. > > > Also I find the "|| 1" clause in the condition immediately above the > > pack_d piece concerned suspicious -- why is a qNaN returned for sNaN > > input? Likewise why are __thenan_sf, etc. encoded as sNaNs rather than > > qNaNs? Does anybody know? > > I also stumbled over that, but for all these, I suppose the idea is that > when a sNaN is "arithmetically processed" (which includes datatype > conversion), an INVALID exception is to be raised (though, »[fp-bit] > implements IEEE 754 format arithmetic, but does not provide a mechanism > [...] for generating or handling exceptions«), and then converted into a > qNaN. > > Also, I found that the bit to look at for distinguishing qNaN/sNaN is > defined wrongly for float. Giving me some "interesting" test results... > ;-) > > Manual testing looks good. 
Automated testing is still running; in case > nothing turns up, is this OK to check in? > > libgcc/ > * fp-bit.c (unpack_d, pack_d): Properly preserve and restore a > NaN's payload. > * fp-bit.h [FLOAT] (QUIET_NAN): Correct value. > > Index: libgcc/fp-bit.c > === > --- libgcc/fp-bit.c (revision 402061) > +++ libgcc/fp-bit.c (working copy) > @@ -214,11 +214,18 @@ pack_d (const fp_number_type *src) >else if (isnan (src)) > { >exp = EXPMAX; > + /* Restore the NaN's payload. */ > + fraction >>= NGARDS; > + fraction &= QUIET_NAN - 1; >if (src->class == CLASS_QNAN || 1) > { > #ifdef QUIET_NAN_NEGATED > - fraction |= QUIET_NAN - 1; > + /* The quiet/signaling bit remains unset. */ > + /* Make sure the fraction has a non-zero value. */ > + if (fraction == 0) > + fraction |= QUIET_NAN - 1; > #else > + /* Set the quiet/signaling bit. */ > fraction |= QUIET_NAN; > #endif > } > @@ -574,8 +581,10 @@ unpack_d (FLO_union_type * src, fp_number_type * d > { > dst->class = CLASS_SNAN; > } > - /* Keep the fraction part as the nan number */ > - dst->fraction.ll = fraction; > + /* Now that we know which kind of NaN we got, discard the > + quiet/signaling bit, but do preserve the NaN payload. */ > + fraction &= ~QUIET_NAN; > + dst->fraction.ll = fraction << NGARDS; > } > } >else > Index: libgcc/fp-bit.h > === > --- libgcc/fp-bit.h (revision 402061) > +++ libgcc/fp-bit.h (working copy) > @@ -190,7 +190,7 @@ typedef unsigned int UTItype __attribute__ ((mode > #define EXPBIAS 127 > #define FRACBITS 23 > #define EXPMAX (0xff) > -#define QUIET_NAN 0x10L > +#define QUIET_NAN 0x40L > #define FRAC_NBITS 32 > #define FRACHIGH 0x8000L > #define FRACHIGH2 0xc000L > @@ -298,7 +298,7 @@ typedef unsigned int UTItype __attribute__ ((mode > /* numeric parameters */ > /* F_D_BITOFF is the number of bits offset between the MSB of the mantissa > of a float and of a double. Assumes there are only two float types. 
> - (double::FRAC_BITS+double::NGARDS-(float::FRAC_BITS-float::NGARDS)) > + (double::FRAC_BITS+double::NGARDS-(float::FRAC_BITS+float::NGARDS)) > */ > #define F_D_BITOFF (52+8-(23+7)) > Regards, Thomas
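For readers following the quiet/signaling discussion, here is a minimal standalone sketch for IEEE 754 single precision on targets where a set top fraction bit means "quiet" (the non-QUIET_NAN_NEGATED case). It is not the fp-bit code: the F32_* macros and f32_* helpers are names made up for this illustration, and the NGARDS shifting done by unpack_d/pack_d is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper names; bit layout is IEEE 754 binary32. */
#define F32_EXP_MASK     UINT32_C(0x7f800000)   /* 8 exponent bits, all set for NaN/Inf */
#define F32_FRAC_MASK    UINT32_C(0x007fffff)   /* 23 fraction bits */
#define F32_QUIET_BIT    UINT32_C(0x00400000)   /* top fraction bit (bit 22) */
#define F32_PAYLOAD_MASK (F32_QUIET_BIT - 1)    /* the 22 bits below the quiet bit */

/* NaN: exponent all ones and a non-zero fraction. */
static bool f32_is_nan (uint32_t bits)
{
  return (bits & F32_EXP_MASK) == F32_EXP_MASK && (bits & F32_FRAC_MASK) != 0;
}

/* Quiet NaN under the "bit set means quiet" convention. */
static bool f32_is_quiet_nan (uint32_t bits)
{
  return f32_is_nan (bits) && (bits & F32_QUIET_BIT) != 0;
}

/* Payload: the fraction with the quiet/signaling bit discarded. */
static uint32_t f32_nan_payload (uint32_t bits)
{
  return bits & F32_PAYLOAD_MASK;
}

/* Rebuild a quiet NaN that carries the given payload. */
static uint32_t f32_make_qnan (uint32_t payload)
{
  return F32_EXP_MASK | F32_QUIET_BIT | (payload & F32_PAYLOAD_MASK);
}

/* Quiet a signaling NaN while keeping its payload. */
static uint32_t f32_quiet (uint32_t bits)
{
  return bits | F32_QUIET_BIT;
}
```

With these definitions a round trip through f32_nan_payload and f32_make_qnan keeps the low 22 fraction bits, which is the property the patch is after.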
Re: [RFA][PATCH] Eliminate more unnecessary type conversions
On Fri, Apr 26, 2013 at 8:53 PM, Jeff Law wrote: > > So looking at more dumps made it pretty obvious that my previous patch to > tree-vrp.c to eliminate useless casts to boolean types which fed into > comparisons could and should be generalized. > > Given: > > x1 = (T1) x0; > if (x1 COND CONST) > > If the known value range for x0 fits into T1, then we can rewrite as > > x1 = (T1) x0; > if (x0 COND (T)CONST) > > Which typically makes the first statement dead and may allow further > simplifications. > > Bootstrapped and regression tested on x86_64-unknown-linux-gnu. OK for the > trunk? Ok. Thanks, Richard. > > commit ad290c7270201042bfc3cde1d84c12e639e4bff7 > Author: Jeff Law > Date: Fri Apr 26 12:52:06 2013 -0600 > > * tree-vrp.c (range_fits_type_p): Move to earlier point in file. > (simplify_cond_using_ranges): Generalize code to simplify > COND_EXPRs where one argument is a constant and the other > is an SSA_NAME created by an integral type conversion. > > * gcc.dg/tree-ssa/vrp88.c: New test. > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index d06eee6..f9b207c 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,10 @@ > +2013-04-26 Jeff Law > + > + * tree-vrp.c (range_fits_type_p): Move to earlier point in file. > + (simplify_cond_using_ranges): Generalize code to simplify > + COND_EXPRs where one argument is a constant and the other > + is an SSA_NAME created by an integral type conversion. > + > 2013-04-26 Vladimir Makarov > > * rtl.h (struct rtx_def): Add comment for field jump. > diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog > index bbea9fa..6d7839f 100644 > --- a/gcc/testsuite/ChangeLog > +++ b/gcc/testsuite/ChangeLog > @@ -1,3 +1,7 @@ > +2013-04-26 Jeff Law > + > + * gcc.dg/tree-ssa/vrp88.c: New test. 
> + > 2013-04-26 Jakub Jelinek > > PR go/57045 > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp88.c > b/gcc/testsuite/gcc.dg/tree-ssa/vrp88.c > new file mode 100644 > index 000..e43bdff > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp88.c > @@ -0,0 +1,39 @@ > +/* { dg-do compile } */ > + > +/* { dg-options "-O2 -fdump-tree-vrp1-details" } */ > + > + > +typedef const struct bitmap_head_def *const_bitmap; > +typedef unsigned long BITMAP_WORD; > +typedef struct bitmap_element_def { > + struct bitmap_element_def *next; > + BITMAP_WORD bits[((128 + (8 * 8 * 1u) - 1) / (8 * 8 * 1u))]; > +} bitmap_element; > +typedef struct bitmap_head_def { > + bitmap_element *first; > +} bitmap_head; > +unsigned char > +bitmap_single_bit_set_p (const_bitmap a) > +{ > + unsigned long count = 0; > + const bitmap_element *elt; > + unsigned ix; > + if ((!(a)->first)) > +return 0; > + elt = a->first; > + if (elt->next != ((void *)0)) > +return 0; > + for (ix = 0; ix != ((128 + (8 * 8 * 1u) - 1) / (8 * 8 * 1u)); ix++) > +{ > + count += __builtin_popcountl (elt->bits[ix]); > + if (count > 1) > + return 0; > +} > + return count == 1; > +} > + > +/* Verify that VRP simplified an "if" statement. */ > +/* { dg-final { scan-tree-dump "Folded into: if.*" "vrp1"} } */ > +/* { dg-final { cleanup-tree-dump "vrp1" } } */ > + > + > diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c > index cb4a09a..07e3e01 100644 > --- a/gcc/tree-vrp.c > +++ b/gcc/tree-vrp.c > @@ -8509,6 +8509,57 @@ test_for_singularity (enum tree_code cond_code, tree > op0, >return NULL; > } > > +/* Return whether the value range *VR fits in an integer type specified > + by PRECISION and UNSIGNED_P. */ > + > +static bool > +range_fits_type_p (value_range_t *vr, unsigned precision, bool unsigned_p) > +{ > + tree src_type; > + unsigned src_precision; > + double_int tem; > + > + /* We can only handle integral and pointer types. 
*/ > + src_type = TREE_TYPE (vr->min); > + if (!INTEGRAL_TYPE_P (src_type) > + && !POINTER_TYPE_P (src_type)) > +return false; > + > + /* An extension is fine unless VR is signed and unsigned_p, > + and so is an identity transform. */ > + src_precision = TYPE_PRECISION (TREE_TYPE (vr->min)); > + if ((src_precision < precision > + && !(unsigned_p && !TYPE_UNSIGNED (src_type))) > + || (src_precision == precision > + && TYPE_UNSIGNED (src_type) == unsigned_p)) > +return true; > + > + /* Now we can only handle ranges with constant bounds. */ > + if (vr->type != VR_RANGE > + || TREE_CODE (vr->min) != INTEGER_CST > + || TREE_CODE (vr->max) != INTEGER_CST) > +return false; > + > + /* For sign changes, the MSB of the double_int has to be clear. > + An unsigned value with its MSB set cannot be represented by > + a signed double_int, while a negative value cannot be represented > + by an unsigned double_int. */ > + if (TYPE_UNSIGNED (src_type) != unsigned_p > + && (TREE_INT_CST_HIGH (vr->min) | TREE_INT_CST_HIGH (vr->max)) < 0) > +return false; > + > + /* Then we can
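As a rough standalone illustration of the question range_fits_type_p answers, the sketch below checks whether every value in a known range is representable in a target integer type, and shows why the comparison rewrite is only safe when it is. This is not the tree-vrp.c code: range_fits, narrowed_ne_zero and direct_ne_zero are hypothetical names, ranges are plain int64_t pairs rather than value_range_t, and precision is limited to 1..62 bits to keep the arithmetic free of overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if every value in [lo, hi] is representable in
   an integer type with `precision` bits and the given signedness.  */
static bool range_fits (int64_t lo, int64_t hi, unsigned precision,
                        bool is_unsigned)
{
  int64_t type_min, type_max;

  if (precision == 0 || precision > 62 || lo > hi)
    return false;

  if (is_unsigned)
    {
      type_min = 0;
      type_max = (INT64_C (1) << precision) - 1;
    }
  else
    {
      type_min = -(INT64_C (1) << (precision - 1));
      type_max = (INT64_C (1) << (precision - 1)) - 1;
    }
  return lo >= type_min && hi <= type_max;
}

/* x1 = (T1) x0; if (x1 != 0) with T1 = unsigned char.  */
static bool narrowed_ne_zero (int x0)
{
  unsigned char x1 = (unsigned char) x0;
  return x1 != 0;
}

/* The rewritten form: compare x0 directly.  */
static bool direct_ne_zero (int x0)
{
  return x0 != 0;
}
```

For x0 in [0, 255] the two comparisons agree, so the cast can be dropped; for x0 = 256 (assuming an 8-bit unsigned char) they differ, which is the case the range check has to rule out.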
Re: [PATCH] Fix PR57077 (issue8840045)
On Fri, Apr 26, 2013 at 8:52 PM, Teresa Johnson wrote: > This patch fixes PR57077. Certain new uses of apply_probability > are actually scaling the counts up, and the scale factor should not > be treated as a probability as the value may exceed REG_BR_PROB_BASE. > One example (from the PR) is when scaling counts up in LTO when merging > profiles. Another example I found when preparing the patch to use > the rounding divide in more places is when inlining COMDAT functions. > > Add new helper function apply_scale that does the scaling without > the probability range check. I audited the new uses of apply_probability > and changed the calls as appropriate. > > Profilebootstrapped and tested on x86_64-unknown-linux-gnu. Verified that this > fixes the lto-bootstrap issue. Ok for trunk? Ok. Thanks, Richard. > 2013-04-26 Teresa Johnson > > * basic-block.h (apply_scale): New function. > (apply_probability): Use apply_scale. > * gimple-streamer-in.c (input_bb): Ditto. > * lto-streamer-in.c (input_cfg): Ditto. > * lto-cgraph.c (merge_profile_summaries): Ditto. > * tree-optimize.c (execute_fixup_cfg): Ditto. > * tree-inline.c (copy_bb): Update comment to use > apply_scale. > (copy_edges_for_bb): Ditto. > (copy_cfg_body): Ditto. 
> > Index: gimple-streamer-in.c > === > --- gimple-streamer-in.c(revision 198344) > +++ gimple-streamer-in.c(working copy) > @@ -329,8 +329,8 @@ input_bb (struct lto_input_block *ib, enum LTO_tag >index = streamer_read_uhwi (ib); >bb = BASIC_BLOCK_FOR_FUNCTION (fn, index); > > - bb->count = apply_probability (streamer_read_gcov_count (ib), > - count_materialization_scale); > + bb->count = apply_scale (streamer_read_gcov_count (ib), > + count_materialization_scale); >bb->frequency = streamer_read_hwi (ib); >bb->flags = streamer_read_hwi (ib); > > Index: lto-streamer-in.c > === > --- lto-streamer-in.c (revision 198344) > +++ lto-streamer-in.c (working copy) > @@ -635,8 +635,8 @@ input_cfg (struct lto_input_block *ib, struct func > > dest_index = streamer_read_uhwi (ib); > probability = (int) streamer_read_hwi (ib); > - count = apply_probability ((gcov_type) streamer_read_gcov_count > (ib), > - count_materialization_scale); > + count = apply_scale ((gcov_type) streamer_read_gcov_count (ib), > + count_materialization_scale); > edge_flags = streamer_read_uhwi (ib); > > dest = BASIC_BLOCK_FOR_FUNCTION (fn, dest_index); > Index: tree-inline.c > === > --- tree-inline.c (revision 198344) > +++ tree-inline.c (working copy) > @@ -1519,7 +1519,7 @@ copy_bb (copy_body_data *id, basic_block bb, int f > basic_block_info automatically. */ >copy_basic_block = create_basic_block (NULL, (void *) 0, > (basic_block) prev->aux); > - /* Update to use apply_probability(). */ > + /* Update to use apply_scale(). */ >copy_basic_block->count = bb->count * count_scale / REG_BR_PROB_BASE; > >/* We are going to rebuild frequencies from scratch. These values > @@ -1891,7 +1891,7 @@ copy_edges_for_bb (basic_block bb, gcov_type count > && old_edge->dest->aux != EXIT_BLOCK_PTR) > flags |= EDGE_FALLTHRU; > new_edge = make_edge (new_bb, (basic_block) old_edge->dest->aux, > flags); > -/* Update to use apply_probability(). */ > +/* Update to use apply_scale(). 
*/ > new_edge->count = old_edge->count * count_scale / REG_BR_PROB_BASE; > new_edge->probability = old_edge->probability; >} > @@ -2278,7 +2278,7 @@ copy_cfg_body (copy_body_data * id, gcov_type coun > incoming_frequency += EDGE_FREQUENCY (e); > incoming_count += e->count; > } > - /* Update to use apply_probability(). */ > + /* Update to use apply_scale(). */ >incoming_count = incoming_count * count_scale / REG_BR_PROB_BASE; >/* Update to use EDGE_FREQUENCY. */ >incoming_frequency > Index: tree-optimize.c > === > --- tree-optimize.c (revision 198344) > +++ tree-optimize.c (working copy) > @@ -131,15 +131,15 @@ execute_fixup_cfg (void) > ENTRY_BLOCK_PTR->count); > >ENTRY_BLOCK_PTR->count = cgraph_get_node (current_function_decl)->count; > - EXIT_BLOCK_PTR->count = apply_probability (EXIT_BLOCK_PTR->count, > - count_scale); > + EXIT_BLOCK_PTR->count = apply_scale (EXIT_BLOCK_PTR->count, > +
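To make the distinction between the two helpers concrete, here is a small standalone sketch modeled on the description in the mail: a scaling helper with a rounding divide and no range check, and a probability helper that adds the range check on top. The names apply_scale_sketch and apply_probability_sketch, the PROB_BASE value of 10000 and the RDIV macro are assumptions of this sketch, not a copy of basic-block.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point base: a scale of PROB_BASE means "multiply by 1".  */
#define PROB_BASE 10000

/* Rounding divide for non-negative operands.  */
#define RDIV(x, y) (((x) + (y) / 2) / (y))

/* Scale a count by scale/PROB_BASE.  The scale may exceed PROB_BASE,
   i.e. the count may grow.  */
static int64_t apply_scale_sketch (int64_t count, int scale)
{
  return RDIV (count * scale, PROB_BASE);
}

/* Same arithmetic, but the factor must be a valid probability.  */
static int64_t apply_probability_sketch (int64_t count, int prob)
{
  assert (prob >= 0 && prob <= PROB_BASE);
  return apply_scale_sketch (count, prob);
}
```

A scale of 20000 doubles a count through apply_scale_sketch, while the same value passed to apply_probability_sketch would trip the assertion, which mirrors the failure mode the PR describes.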
Re: [PATCH] Preserve loops from CFG build until after RTL loop opts
On Sun, 28 Apr 2013, Tom de Vries wrote: > On 26/04/13 16:27, Tom de Vries wrote: > > On 25/04/13 16:19, Richard Biener wrote: > > > >> and compared to the previous patch changed the tree-ssa-tailmerge.c > >> part to deal with merging of loop latch and loop preheader (even > >> if that's a really bad idea) to not regress gcc.dg/pr50763.c. > >> Any suggestion on how to improve that part welcome. > > > So I think this is really a cornercase, and we should disregard it if that > > makes > > things simpler. > > > > Rather than fixing up the loop structure, we could prevent tail-merge in > > these > > cases. > > > > The current fix tests for current_loops == NULL, and I'm not sure that can > > still > > happen there, given that we have PROP_loops. > > Richard, > > I've found that it happens in these g++ test-cases: > g++.dg/ext/mv1.C > g++.dg/ext/mv12.C > g++.dg/ext/mv2.C > g++.dg/ext/mv5.C > g++.dg/torture/covariant-1.C > g++.dg/torture/pr43068.C > g++.dg/torture/pr47714.C > This seems rare enough to just bail out of tail-merge in those cases. > > > It's not evident to me that the test bb2->loop_father->latch == bb2 is > > sufficient. Before calling tail_merge_optimize, we call > > loop_optimizer_finalize > > in which we assert that LOOPS_MAY_HAVE_MULTIPLE_LATCHES from there on, so in > > theory we might miss some latches. > > > > But I guess that pre (having started out with simple latches) maintains > > simple > > latches throughout, and that tail-merge does the same. > > I've added a comment related to this in the patch. > > Bootstrapped and reg-tested (ada inclusive) on x86_64. > > OK for trunk? + if (bb == NULL + /* Be conservative with loop structure. It's not evident that this test +is sufficient. Before tail-merge, we've just called +loop_optimizer_finalize, and LOOPS_MAY_HAVE_MULTIPLE_LATCHES is now +set, so there's no guarantee that the loop->latch value is still valid. 
+But we assume that, since we've forced LOOPS_HAVE_SIMPLE_LATCHES at the +start of pre, we've kept that property intact throughout pre, and are +keeping it throughout tail-merge using this test. */ + || bb->loop_father->latch == bb) return; A more complete test would be to do what the bb_loop_header_p predicate does: skip latch _edges_. I'm not sure that's easily possible in the loop looking at succs: FOR_EACH_EDGE (e, ei, bb->succs) { int index = e->dest->index; bitmap_set_bit (same->succs, index); same_succ_edge_flags[index] = e->flags; } There we'd skip all edges for which dominated_by_p (CDI_DOMINATORS, e->src, e->dest) holds. Of course, that's equal to skipping the whole basic block if the above is true. I suppose the patch is OK as-is for now, but let's keep the above in mind (I want to audit the whole bootstrap process for loops that vanish and eventually reappear; I just didn't get around to thinking about a proper way to efficiently instrument for that). Thanks, Richard.
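The latch-edge criterion mentioned in the reply (an edge whose destination dominates its source) can be sketched outside of GCC as follows. This is only an illustration under stated assumptions: the CFG is a made-up four-block example, dominance is looked up through a hand-written immediate-dominator array instead of GCC's dominance info, and dominated_by and is_latch_edge are hypothetical names.

```c
#include <stdbool.h>

/* Example CFG (an assumption of this sketch):
     0 -> 1, 1 -> 2, 2 -> 1 (back edge), 1 -> 3
   example_idom[b] is the immediate dominator of block b; the entry block
   is its own immediate dominator.  */
static const int example_idom[4] = { 0, 0, 1, 1 };

/* True if `dominator` dominates `block`, found by walking up the
   immediate-dominator chain until the entry block is reached.  */
static bool dominated_by (const int *idom, int block, int dominator)
{
  for (;;)
    {
      if (block == dominator)
        return true;
      if (idom[block] == block)   /* reached the entry block */
        return false;
      block = idom[block];
    }
}

/* An edge src -> dest is a latch (back) edge if dest dominates src.  */
static bool is_latch_edge (const int *idom, int src, int dest)
{
  return dominated_by (idom, src, dest);
}
```

In the example only the edge 2 -> 1 qualifies, so a successor walk that skipped latch edges would ignore just that edge rather than the whole block.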
Re: [PATCH] Fix VRP LSHIFT_EXPR non-singleton shift count handling (PR tree-optimization/57083)
On Sat, 27 Apr 2013, Jakub Jelinek wrote: > Hi! > > If shift count range is [0, 1], then for unsigned LSHIFT_EXPR > bound is the topmost bit, but as llshift method always sign-extends > the result into double_int, the test doesn't properly find out that > deriving the value range is unsafe. In this case > vr0 is [0x7fff8001, 0x8001], thus when shifting up by 0 or one bit > we might shift out either zero or 1. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? Ok. Thanks, Richard. > 2013-04-26 Jakub Jelinek > > PR tree-optimization/57083 > * tree-vrp.c (extract_range_from_binary_expr_1): For LSHIFT_EXPR with > non-singleton shift count range, zero extend low_bound for uns case. > > * gcc.dg/torture/pr57083.c: New test. > > --- gcc/tree-vrp.c.jj 2013-04-24 12:07:07.0 +0200 > +++ gcc/tree-vrp.c2013-04-26 17:59:41.077938198 +0200 > @@ -2837,7 +2837,7 @@ extract_range_from_binary_expr_1 (value_ > > if (uns) > { > - low_bound = bound; > + low_bound = bound.zext (prec); > high_bound = complement.zext (prec); > if (tree_to_double_int (vr0.max).ult (low_bound)) > { > --- gcc/testsuite/gcc.dg/torture/pr57083.c.jj 2013-04-26 18:09:05.396031875 > +0200 > +++ gcc/testsuite/gcc.dg/torture/pr57083.c2013-04-26 18:08:51.0 > +0200 > @@ -0,0 +1,15 @@ > +/* PR tree-optimization/57083 */ > +/* { dg-do run { target int32plus } } */ > + > +extern void abort (void); > +short x = 1; > +int y = 0; > + > +int > +main () > +{ > + unsigned t = (0x7fff8001U - x) << (y == 0); > + if (t != 0xffff0000U) > +abort (); > + return 0; > +} > > Jakub > > -- Richard Biener SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend
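The effect of sign-extending versus zero-extending the bound can be reproduced in a simplified model. The sketch below is not the double_int code: it uses a 64-bit container in place of double_int, a 32-bit precision, and hypothetical helpers zext, sext and max_below_bound; the value 0x80000001 for the range maximum is chosen for illustration. The last helper recomputes the test's expression under 32-bit unsigned arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Zero-extend the low `prec` bits of v (prec in 1..63).  */
static uint64_t zext (uint64_t v, unsigned prec)
{
  return v & ((UINT64_C (1) << prec) - 1);
}

/* Sign-extend the low `prec` bits of v into the 64-bit container.  */
static uint64_t sext (uint64_t v, unsigned prec)
{
  uint64_t sign = UINT64_C (1) << (prec - 1);
  uint64_t low = zext (v, prec);
  return (low ^ sign) - sign;   /* wraps modulo 2^64 when the sign bit is set */
}

/* Unsigned "is the range maximum below the bound?" check, standing in
   for the .ult () comparison in the patch context.  */
static bool max_below_bound (uint64_t vr_max, uint64_t bound)
{
  return vr_max < bound;
}

/* (0x7fff8001U - x) << (y == 0) with x = 1 and y = 0, in 32 bits.  */
static uint32_t testcase_value (void)
{
  uint32_t x = 1;
  uint32_t shift = 1;           /* y == 0 evaluates to 1 */
  return (UINT32_C (0x7fff8001) - x) << shift;
}
```

With a bound of 0x80000000 (the topmost bit for 32-bit precision), the sign-extended form becomes 0xffffffff80000000 in the container, so a maximum of 0x80000001 wrongly compares as below the bound; after zext the comparison correctly reports that it is not.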