[PATCH] Pattern recognizer rotate improvement
On Wed, May 15, 2013 at 03:24:37PM +0200, Richard Biener wrote: We have the same issue in some other places where we insert invariant code into the loop body - one reason there is another LIM pass after vectorization. Well, in this case it causes the shift amount to be loaded into a vector instead of scalar, therefore even when LIM moves it before the loop, it will only work with vector/vector shifts and be more expensive that way (need to broadcast the value in a vector). The following patch improves it slightly at least for loops, by just emitting the shift amount stmts to loop preheader, rotate-4.c used to be only vectorizable with -mavx2 (which has vector/vector shifts), now also -mavx (which doesn't) vectorizes it. Unfortunately this trick doesn't work for SLP vectorization, emitting the stmts at the start of the current bb doesn't help, because every stmt emits its own and thus it is vectorized with vector/vector shifts only anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-05-17 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_recog_rotate_pattern): For vect_external_def oprnd1 with loop_vinfo, try to emit optional cast, negation and and stmts on the loop preheader edge instead of into the pattern def seq. * gcc.target/i386/rotate-4.c: Compile only with -mavx instead of -mavx2, require only avx instead of avx2. * gcc.target/i386/rotate-4a.c: Include avx-check.h instead of avx2-check.h and turn into an avx runtime test instead of avx2 runtime test. 
--- gcc/tree-vect-patterns.c.jj	2013-05-16 13:56:08.0 +0200
+++ gcc/tree-vect-patterns.c	2013-05-16 15:27:00.565143478 +0200
@@ -1494,6 +1494,7 @@ vect_recog_rotate_pattern (vec<gimple> *
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
   enum vect_def_type dt;
   optab optab1, optab2;
+  edge ext_def = NULL;
 
   if (!is_gimple_assign (last_stmt))
     return NULL;
@@ -1574,6 +1575,21 @@ vect_recog_rotate_pattern (vec<gimple> *
   if (*type_in == NULL_TREE)
     return NULL;
 
+  if (dt == vect_external_def
+      && TREE_CODE (oprnd1) == SSA_NAME
+      && loop_vinfo)
+    {
+      struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+      ext_def = loop_preheader_edge (loop);
+      if (!SSA_NAME_IS_DEFAULT_DEF (oprnd1))
+	{
+	  basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (oprnd1));
+	  if (bb == NULL
+	      || !dominated_by_p (CDI_DOMINATORS, ext_def->dest, bb))
+	    ext_def = NULL;
+	}
+    }
+
   def = NULL_TREE;
   if (TREE_CODE (oprnd1) == INTEGER_CST
       || TYPE_MODE (TREE_TYPE (oprnd1)) == TYPE_MODE (type))
@@ -1593,7 +1609,14 @@ vect_recog_rotate_pattern (vec<gimple> *
       def = vect_recog_temp_ssa_var (type, NULL);
       def_stmt = gimple_build_assign_with_ops (NOP_EXPR, def, oprnd1,
					       NULL_TREE);
-      append_pattern_def_seq (stmt_vinfo, def_stmt);
+      if (ext_def)
+	{
+	  basic_block new_bb
+	    = gsi_insert_on_edge_immediate (ext_def, def_stmt);
+	  gcc_assert (!new_bb);
+	}
+      else
+	append_pattern_def_seq (stmt_vinfo, def_stmt);
     }
   stype = TREE_TYPE (def);
@@ -1618,11 +1641,19 @@ vect_recog_rotate_pattern (vec<gimple> *
       def2 = vect_recog_temp_ssa_var (stype, NULL);
       def_stmt = gimple_build_assign_with_ops (NEGATE_EXPR, def2, def,
					       NULL_TREE);
-      def_stmt_vinfo
-	= new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
-      set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
-      STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
-      append_pattern_def_seq (stmt_vinfo, def_stmt);
+      if (ext_def)
+	{
+	  basic_block new_bb
+	    = gsi_insert_on_edge_immediate (ext_def, def_stmt);
+	  gcc_assert (!new_bb);
+	}
+      else
+	{
+	  def_stmt_vinfo = new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
+	  set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
+	  STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
+	  append_pattern_def_seq (stmt_vinfo, def_stmt);
+	}
 
       def2 = vect_recog_temp_ssa_var (stype, NULL);
       tree mask
@@ -1630,11 +1661,19 @@ vect_recog_rotate_pattern (vec<gimple> *
       def_stmt = gimple_build_assign_with_ops (BIT_AND_EXPR, def2,
					       gimple_assign_lhs (def_stmt),
					       mask);
-      def_stmt_vinfo
-	= new_stmt_vec_info (def_stmt, loop_vinfo, bb_vinfo);
-      set_vinfo_for_stmt (def_stmt, def_stmt_vinfo);
-      STMT_VINFO_VECTYPE (def_stmt_vinfo) = vecstype;
-      append_pattern_def_seq (stmt_vinfo, def_stmt);
+      if (ext_def)
+	{
+	  basic_block new_bb
+	    = gsi_insert_on_edge_immediate
[Patch, Fortran] PR48858/55465 - permit multiple bind(C) declarations (but not definitions) for the same proc
Followup to (and depending on) the C binding patches for * COMMON: http://gcc.gnu.org/ml/fortran/2013-05/msg00048.html * Procedures: http://gcc.gnu.org/ml/fortran/2013-05/msg00051.html which honour Fortran 2008, where the Fortran name is no longer a global identifier if a binding name has been specified. The main reason for this patch is a build failure of Open MPI (requires !gcc$ attributes no_arg_check, i.e. it only affects GCC 4.9). Open MPI uses something like:

  interface
    subroutine pmpi_something() bind(C,name="MPI_something")
  ...

and in a different module:

  interface
    subroutine mpi_something() bind(C,name="MPI_something")
  ...

Currently, gfortran rejects this because it only permits one definition/declaration per translation unit. However, there is no reason why multiple INTERFACE blocks shouldn't be permitted. Remarks: a) Better argument checks if definition and declaration are in the same file. (see INTENT patch in a test case) b) Currently, no check is done regarding the characteristics of procedure declarations. Of course, the declaration has to be compatible with the C procedure. However, there seems to be the wish* to permit compatible input - even if the Fortran characteristic is different. For instance int * takes both a scalar integer (int i; f(&i)) and arrays (int i[5]; f(i)). Or also popular according to the PRs: taking a C_LOC or an integer(c_intptr_t). (* Seemingly, also J3 and/or WG5 discussed this (plenum? subgroups?) and they decided to permit it. However, finding some official document is difficult.) I was wondering for a while what should be permitted and what shouldn't, but I have now decided to put that completely into the hands of the user. Built and regtested on x86-64-gnu-linux. OK for the trunk? Tobias 2013-05-17 Tobias Burnus bur...@net-b.de PR fortran/48858 PR fortran/55465 * decl.c (add_global_entry): Add sym_name. * parse.c (add_global_procedure): Ditto. * resolve.c (resolve_bind_c_derived_types): Handle multiple decl for a procedure.
(resolve_global_procedure): Handle gsym->ns pointing to a module. * trans-decl.c (gfc_get_extern_function_decl): Ditto. 2013-05-17 Tobias Burnus bur...@net-b.de PR fortran/48858 PR fortran/55465 * gfortran.dg/binding_label_tests_10_main.f03: Update dg-error. * gfortran.dg/binding_label_tests_11_main.f03: Ditto. * gfortran.dg/binding_label_tests_13_main.f03: Ditto. * gfortran.dg/binding_label_tests_3.f03: Ditto. * gfortran.dg/binding_label_tests_4.f03: Ditto. * gfortran.dg/binding_label_tests_5.f03: Ditto. * gfortran.dg/binding_label_tests_6.f03: Ditto. * gfortran.dg/binding_label_tests_7.f03: Ditto. * gfortran.dg/binding_label_tests_8.f03: Ditto. * gfortran.dg/c_loc_tests_12.f03: Fix test case. * gfortran.dg/binding_label_tests_24.f90: New. * gfortran.dg/binding_label_tests_25.f90: New.

diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c
index cb449a2..6ab9cc7 100644
--- a/gcc/fortran/decl.c
+++ b/gcc/fortran/decl.c
@@ -5375,6 +5375,7 @@ add_global_entry (const char *name, const char *binding_label, bool sub)
   else
     {
       s->type = type;
+      s->sym_name = name;
       s->where = gfc_current_locus;
       s->defined = 1;
       s->ns = gfc_current_ns;
@@ -5396,6 +5397,7 @@ add_global_entry (const char *name, const char *binding_label, bool sub)
   else
     {
       s->type = type;
+      s->sym_name = name;
       s->binding_label = binding_label;
       s->where = gfc_current_locus;
       s->defined = 1;
diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index ba1730a..a223a2c 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -4359,10 +4359,15 @@ add_global_procedure (bool sub)
   if (s->defined
       || (s->type != GSYM_UNKNOWN
	   && s->type != (sub ? GSYM_SUBROUTINE : GSYM_FUNCTION)))
-    gfc_global_used(s, NULL);
+    {
+      gfc_global_used (s, NULL);
+      /* Silence follow-up errors.  */
+      gfc_new_block->binding_label = NULL;
+    }
   else
     {
       s->type = sub ? GSYM_SUBROUTINE : GSYM_FUNCTION;
+      s->sym_name = gfc_new_block->name;
       s->where = gfc_current_locus;
       s->defined = 1;
       s->ns = gfc_current_ns;
@@ -4379,10 +4384,15 @@ add_global_procedure (bool sub)
   if (s->defined
       || (s->type != GSYM_UNKNOWN
	   && s->type != (sub ? GSYM_SUBROUTINE : GSYM_FUNCTION)))
-    gfc_global_used(s, NULL);
+    {
+      gfc_global_used (s, NULL);
+      /* Silence follow-up errors.  */
+      gfc_new_block->binding_label = NULL;
+    }
   else
     {
       s->type = sub ? GSYM_SUBROUTINE : GSYM_FUNCTION;
+      s->sym_name = gfc_new_block->name;
       s->binding_label = gfc_new_block->binding_label;
       s->where = gfc_current_locus;
       s->defined = 1;
diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index f3607b4..74e0aa4 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -2389,6 +2389,11 @@ resolve_global_procedure (gfc_symbol *sym, locus *where,
     }
 
   def_sym = gsym->ns->proc_name;
+
+  /* This can happen if a binding name has been specified.  */
+  if (gsym->binding_label
[PATCH] Fix VEC_[LR]SHIFT_EXPR folding for big-endian (PR tree-optimization/57051)
On Thu, May 16, 2013 at 07:59:00PM +0200, Mikael Pettersson wrote: Jakub Jelinek writes: On Thu, Apr 25, 2013 at 11:47:02PM +0200, Jakub Jelinek wrote: This patch adds folding of VEC_LSHIFT_EXPR and VEC_RSHIFT_EXPR with constant arguments, which helps to optimize the testcase from the PR back into a constant store after the vectorized loop is unrolled. As this fixes a regression on the 4.8 branch, I've backported it (and a minimal prerequisite for that) to the 4.8 branch too. Unfortunately this patch makes gcc.dg/vect/no-scevccp-outer-{7,13}.c fail on powerpc64-linux: +FAIL: gcc.dg/vect/no-scevccp-outer-13.c execution test +FAIL: gcc.dg/vect/no-scevccp-outer-7.c execution test which is a regression from 4.8-20130502. Reverting r198580 fixes it. The same FAILs also occur on trunk. Ah right, I was confused by the fact that VEC_RSHIFT_EXPR is used not just on little endian targets, but on big endian as well (VEC_LSHIFT_EXPR is never emitted), but the important spot is when extracting the scalar result from the vector:

  if (BYTES_BIG_ENDIAN)
    bitpos = size_binop (MULT_EXPR,
			 bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1),
			 TYPE_SIZE (scalar_type));
  else
    bitpos = bitsize_zero_node;

Fixed thusly, ok for trunk/4.8? 2013-05-17 Jakub Jelinek ja...@redhat.com PR tree-optimization/57051 * fold-const.c (const_binop) case VEC_LSHIFT_EXPR, case VEC_RSHIFT_EXPR: Fix BYTES_BIG_ENDIAN handling.

--- gcc/fold-const.c.jj	2013-05-16 12:36:28.0 +0200
+++ gcc/fold-const.c	2013-05-17 08:38:12.575117676 +0200
@@ -1393,7 +1393,7 @@ const_binop (enum tree_code code, tree a
	  if (shiftc >= outerc || (shiftc % innerc) != 0)
	    return NULL_TREE;
	  int offset = shiftc / innerc;
-	  if (code == VEC_LSHIFT_EXPR)
+	  if ((code == VEC_RSHIFT_EXPR) ^ (!BYTES_BIG_ENDIAN))
	    offset = -offset;
	  tree zero = build_zero_cst (TREE_TYPE (type));
	  for (i = 0; i < count; i++)

Jakub
Patch ping
Hi! http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00282.html Reject -fsanitize=address -fsanitize=thread linking that won't ever work at runtime. Jakub
Re: section anchors and weak hidden symbols
On 05/16/13 15:32, Rainer Orth wrote: The new gcc.dg/visibility-21.c testcase fails on i386-pc-solaris2.11 and x86_64-unknown-linux-gnu: FAIL: gcc.dg/visibility-21.c (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/visibility-21.c:1:0: warning: this target does not support '-fsection-anchors' [-fsection-anchors] Fixed as follows, tested with the appropriate runtest invocation on both targets where the test becomes UNSUPPORTED, installed on mainline. thanks!
Re: [patch,fortran] PR50405 - Statement function with itself as argument SEGV's
Am 17.05.2013 05:22, schrieb Bud Davis: Not too much to add beyond the title and the patch. The test file fails before (eventually, when you run out of stack) and passes after the patch is applied. No new testsuite failures. --bud

Index: gcc/gcc/fortran/resolve.c
===================================================================
--- gcc/gcc/fortran/resolve.c	(revision 198955)
+++ gcc/gcc/fortran/resolve.c	(working copy)
@@ -306,6 +306,14 @@
       && !resolve_procedure_interface (sym))
     return;
 
+  if (strcmp (proc->name,sym->name) == 0)

Missing blank after the comma.

+    {
+      gfc_error ("Self referential argument "
+		 "'%s' at %L is not allowed", sym->name,
+		 &proc->declared_at);
+      return;

Indentation is wrong. (As a friend of hyphens, I would add one between self and referential, but it is also fine without.)

!{ dg-do compile }
! submitted by zec...@gmail.com
!{ dg-prune-output Obsolescent feature: Statement function at }

Please add '! PR fortran/50405' as comment. Instead of dg-prune-output, you could also use: '! { dg-options "" }'. That will override the default setting, i.e. it removes the -pedantic.

f(f) = 0 ! { dg-error "Self referential argument" }
end

2013-05-17 Bud Davis jmda...@link.com PR fortran/50405 * resolve.c (resolve_formal_arglist): Detect error when an argument has the same name as the function.

OK and thanks for the patch! Tobias

PS: Nice that you are back to (casual) gfortran development.
[ping] Re: [patch] install host specific {bits,ext}/opt_random.h headers in host specific c++ incdir
ping? Am 12.05.2013 11:58, schrieb Matthias Klose: {bits,ext}/opt_random.h became host specific in 4.8, but are still installed into the host independent c++ include dir. install them into the host specific include dir instead. Ok for 4.8 and trunk? Matthias
Re: [PATCH] Pattern recognizer rotate improvement
On Fri, 17 May 2013, Jakub Jelinek wrote: On Wed, May 15, 2013 at 03:24:37PM +0200, Richard Biener wrote: We have the same issue in some other places where we insert invariant code into the loop body - one reason there is another LIM pass after vectorization. Well, in this case it causes the shift amount to be loaded into a vector instead of scalar, therefore even when LIM moves it before the loop, it will only work with vector/vector shifts and be more expensive that way (need to broadcast the value in a vector). The following patch improves it slightly at least for loops, by just emitting the shift amount stmts to loop preheader, rotate-4.c used to be only vectorizable with -mavx2 (which has vector/vector shifts), now also -mavx (which doesn't) vectorizes it. Unfortunately this trick doesn't work for SLP vectorization, emitting the stmts at the start of the current bb doesn't help, because every stmt emits its own and thus it is vectorized with vector/vector shifts only anyway. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2013-05-17 Jakub Jelinek ja...@redhat.com * tree-vect-patterns.c (vect_recog_rotate_pattern): For vect_external_def oprnd1 with loop_vinfo, try to emit optional cast, negation and and stmts on the loop preheader edge instead of into the pattern def seq. * gcc.target/i386/rotate-4.c: Compile only with -mavx instead of -mavx2, require only avx instead of avx2. * gcc.target/i386/rotate-4a.c: Include avx-check.h instead of avx2-check.h and turn into an avx runtime test instead of avx2 runtime test. 
Re: [PATCH] Fix VEC_[LR]SHIFT_EXPR folding for big-endian (PR tree-optimization/57051)
On Fri, 17 May 2013, Jakub Jelinek wrote: On Thu, May 16, 2013 at 07:59:00PM +0200, Mikael Pettersson wrote: Jakub Jelinek writes: On Thu, Apr 25, 2013 at 11:47:02PM +0200, Jakub Jelinek wrote: This patch adds folding of constant arguments v and v, which helps to optimize the testcase from the PR back into constant store after vectorized loop is unrolled. As this fixes a regression on the 4.8 branch, I've backported it (and minimal prerequisite for that) to 4.8 branch too. Unfortunately this patch makes gcc.dg/vect/no-scevccp-outer-{7,13}.c fail on powerpc64-linux: +FAIL: gcc.dg/vect/no-scevccp-outer-13.c execution test +FAIL: gcc.dg/vect/no-scevccp-outer-7.c execution test which is a regression from 4.8-20130502. Reverting r198580 fixes it. The same FAILs also occur on trunk. Ah right, I was confused by the fact that VEC_RSHIFT_EXPR is used not just on little endian targets, but on big endian as well (VEC_LSHIFT_EXPR is never emitted), but the important spot is when extracting the scalar result from the vector: if (BYTES_BIG_ENDIAN) bitpos = size_binop (MULT_EXPR, bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) - 1), TYPE_SIZE (scalar_type)); else bitpos = bitsize_zero_node; Fixed thusly, ok for trunk/4.8? Ok with a comment in front of (code == VEC_RSHIFT_EXPR) ^ (!BYTES_BIG_ENDIAN) Thanks, Richard. 2013-05-17 Jakub Jelinek ja...@redhat.com PR tree-optimization/57051 * fold-const.c (const_binop) case VEC_LSHIFT_EXPR, case VEC_RSHIFT_EXPR: Fix BYTES_BIG_ENDIAN handling. 
-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PATCH] Fix incorrect discriminator assignment.
On Wed, May 15, 2013 at 6:50 PM, Cary Coutant ccout...@google.com wrote: gcc/ChangeLog: * tree-cfg.c (locus_descrim_hasher::hash): Only hash lineno. (locus_descrim_hasher::equal): Likewise. (next_discriminator_for_locus): Likewise. (assign_discriminator): Add return value. (make_edges): Assign more discriminator if necessary. (make_cond_expr_edges): Likewise. (make_goto_expr_edges): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/debug/dwarf2/discriminator.c: New Test. Looks OK to me, but I can't approve patches for tree-cfg.c. Two comments: (1) In the C++ conversion, it looks like someone misspelled discrim in locus_descrim_hasher. (2) Is this worth fixing in trunk when we'll eventually switch to a per-instruction discriminator? This patch fixes a common case where one line is distributed to 3 BBs, but only 1 discriminator is assigned. As far as I can see the patch makes discriminators coarser (by hashing and comparing on LOCATION_LINE and not on the location). It also has the changes like - assign_discriminator (entry_locus, then_bb); + if (assign_discriminator (entry_locus, then_bb)) +assign_discriminator (entry_locus, bb); which is what the comment refers to? I think those changes are short-sighted because what happens if the 2nd assign_discriminator call has exactly the same issue? Especially with the make_goto_expr_edges change there may be multiple predecessors where I cannot see how the change is correct. Yes, the change changes something and thus may fix the testcase but it doesn't change things in a predictable way as far as I can see. So - the change to compare/hash LOCATION_LINE looks good to me, but the assign_discriminator change needs more explaining. Richard. -cary
Re: debuggability of recog_data
On Thu, May 16, 2013 at 11:04 PM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, May 16, 2013 at 2:02 PM, Richard Sandiford rdsandif...@googlemail.com wrote: Steven Bosscher stevenb@gmail.com writes: On Wed, May 15, 2013 at 12:14 AM, Mike Stump wrote: I don't want to bike shed. So, I'm happy if the next poor soul that touches it just does so. If people like recog_data_info, I'd be happy to change it to that. Let's give the peanut gallery a day to vote on it. :-) Usually we append _d or _def to structure definitions, so recog_data_def? Gah, I wrote the patch from memory and forgot about the bit after the comma. I'm not trying to be contrary really. :-) Bootstrapped and regression-tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * recog.h (Recog_data): Rename to... (recog_data_t): ...this. ^^^ It should be recog_data_d. Ok with that change. Thanks, Richard. -- H.J.
[PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)
On Thu, May 16, 2013 at 06:22:10PM +0200, Jakub Jelinek wrote: As discussed in the PR, there seem to be only 3 define_split patterns that use dead_or_set_p, one in i386.md and two in s390.md, but unfortunately insn splitting is done in many passes (combine, split{1,2,3,4,5}, dbr, pro_and_epilogue, final, sometimes mach) and only in combine is the note problem computed. Computing the note problem in split{1,2,3,4,5} just because of the single pattern on i?86 -m32 and one on s390x -m64 might be too expensive, and while neither of these targets does dbr scheduling, e.g. during final without cfg one can't df_analyze. So, the following patch fixes it by doing the transformation instead in the peephole2 pass, which computes the notes problem and has REG_DEAD notes up to date (and peep2_reg_dead_p is used there heavily and works). An alternative, so far untested, patch is to let the "register is not dead" splitter do its job always during split2 and just fix it up during peephole2, if the register was dead. For the non-cltd case the peephole2 is always desirable, we get rid of a register move and free one hard register for potential other uses after peephole2 (cprop_hardreg? anything else that could benefit from that?). For the cltd case, it is questionable, while we gain a free hard register at that spot, it isn't guaranteed any pass will benefit from that, and cltd is 2 bytes smaller than sarl $31, %eax. Though, not sure about the performance. So, the patch below doesn't use the second peephole2 for -Os. 2013-05-17 Jakub Jelinek ja...@redhat.com PR rtl-optimization/57281 PR rtl-optimization/57300 * config/i386/i386.md (extendsidi2_1 dead reg splitter): Remove. (extendsidi2_1 peephole2s): Add instead 2 new peephole2s, that undo what the other splitter did if the registers are dead. * gcc.dg/pr57300.c: New test. * gcc.c-torture/execute/pr57281.c: New test.
--- gcc/config/i386/i386.md.jj	2013-05-16 18:22:59.0 +0200
+++ gcc/config/i386/i386.md	2013-05-17 10:11:20.365455394 +0200
@@ -3332,22 +3332,8 @@ (define_insn "extendsidi2_1"
   "!TARGET_64BIT"
   "#")
 
-;; Extend to memory case when source register does die.
-(define_split
-  [(set (match_operand:DI 0 "memory_operand")
-	(sign_extend:DI (match_operand:SI 1 "register_operand")))
-   (clobber (reg:CC FLAGS_REG))
-   (clobber (match_operand:SI 2 "register_operand"))]
-  "(reload_completed
-    && dead_or_set_p (insn, operands[1])
-    && !reg_mentioned_p (operands[1], operands[0]))"
-  [(set (match_dup 3) (match_dup 1))
-   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
-	      (clobber (reg:CC FLAGS_REG))])
-   (set (match_dup 4) (match_dup 1))]
-  "split_double_mode (DImode, &operands[0], 1, &operands[3], &operands[4]);")
-
-;; Extend to memory case when source register does not die.
+;; Split the memory case.  If the source register doesn't die, it will stay
+;; this way, if it does die, following peephole2s take care of it.
 (define_split
   [(set (match_operand:DI 0 "memory_operand")
	(sign_extend:DI (match_operand:SI 1 "register_operand")))
@@ -3376,6 +3362,48 @@ (define_split
   DONE;
 })
 
+;; Peepholes for the case where the source register does die, after
+;; being split with the above splitter.
+(define_peephole2
+  [(set (match_operand:SI 0 "memory_operand")
+	(match_operand:SI 1 "register_operand"))
+   (set (match_operand:SI 2 "register_operand") (match_dup 1))
+   (parallel [(set (match_dup 2)
+		   (ashiftrt:SI (match_dup 2)
+				(match_operand:QI 3 "const_int_operand")))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
+  "INTVAL (operands[3]) == 31
+   && REGNO (operands[1]) != REGNO (operands[2])
+   && peep2_reg_dead_p (2, operands[1])
+   && peep2_reg_dead_p (4, operands[2])
+   && !reg_mentioned_p (operands[2], operands[4])"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_dup 4) (match_dup 1))])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "memory_operand")
+	(match_operand:SI 1 "register_operand"))
+   (parallel [(set (match_operand:SI 2 "register_operand")
+		   (ashiftrt:SI (match_dup 1)
+				(match_operand:QI 3 "const_int_operand")))
+	      (clobber (reg:CC FLAGS_REG))])
+   (set (match_operand:SI 4 "memory_operand") (match_dup 2))]
+  "INTVAL (operands[3]) == 31
+   /* cltd is shorter than sarl $31, %eax */
+   && !optimize_function_for_size_p (cfun)
+   && true_regnum (operands[1]) == AX_REG
+   && true_regnum (operands[2]) == DX_REG
+   && peep2_reg_dead_p (2, operands[1])
+   && peep2_reg_dead_p (3, operands[2])
+   && !reg_mentioned_p (operands[2], operands[4])"
+  [(set (match_dup 0) (match_dup 1))
+   (parallel [(set (match_dup 1) (ashiftrt:SI (match_dup 1) (const_int 31)))
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? You say this is because glibc guarantees such alignment - shouldn't this be guarded by a proper check then? Probably AIX guarantees similar alignment but there are also embedded elf/newlib targets, no? Btw, the #define should possibly move to config/linux.h guarded by OPTION_GLIBC? Richard. Thanks, Bill 2013-05-16 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/rs6000.h (MALLOC_ABI_ALIGNMENT): New #define. Index: gcc/config/rs6000/rs6000.h === --- gcc/config/rs6000/rs6000.h (revision 198998) +++ gcc/config/rs6000/rs6000.h (working copy) @@ -2297,6 +2297,13 @@ extern char rs6000_reg_names[][8]; /* register nam /* How to align the given loop. */ #define LOOP_ALIGN(LABEL) rs6000_loop_align(LABEL) +/* Alignment guaranteed by __builtin_malloc. */ +/* FIXME: 128-bit alignment is guaranteed by glibc for TARGET_64BIT. + However, specifying the stronger guarantee currently leads to + a regression in SPEC CPU2006 437.leslie3d. The stronger + guarantee should be implemented here once that's fixed. */ +#define MALLOC_ABI_ALIGNMENT (64) + /* Pick up the return address upon entry to a procedure. 
Used for dwarf2 unwind information. This also enables the table driven mechanism. */
Re: [fixincludes] solaris_pow_int_overload should use __cplusplus
Bruce Korb bk...@gnu.org writes: On 05/16/13 06:41, Rainer Orth wrote: Work is going on to incorporate all applicable fixincludes fixes into the Solaris headers proper. One fix is currently problematic since it uses a G++-internal macro (__GXX_EXPERIMENTAL_CXX0X__) where libstdc++ cmath already switched to testing __cplusplus. The following patch updates the fix to match cmath. Tested by mainline bootstraps on i386-pc-solaris2.11, sparc-sun-solaris2.11 and a 4.8 bootstrap on i386-pc-solaris2.10. Ok for mainline and the 4.8 branch if they pass? Looks good to me. Thanks. Testing completed successfully now on both the 4.8 branch and mainline, thus I've installed the patch on mainline. Jakub, may I install on the 4.8 branch now or better wait until 4.8.1 is released? Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote: On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? You say this is because glibc guarantees such alignment - shouldn't this be guarded by a proper check then? Probably AIX guarantees similar alignment but there are also embedded elf/newlib targets, no? Btw, the #define should possibly move to config/linux.h guarded by OPTION_GLIBC? For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what glibc really guarantees on most architectures (I think x32 provides more than that, but it can override). But changing it requires some analysis on commonly used malloc alternatives on Linux, what exactly do they guarantee. Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan. Note that on most targets there is some standard type that requires such alignment, thus at least 2 * POINTER_SIZE alignment is also a C conformance matter. Jakub
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 4:47 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote: On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? You say this is because glibc guarantees such alignment - shouldn't this be guarded by a proper check then? Probably AIX guarantees similar alignment but there are also embedded elf/newlib targets, no? Btw, the #define should possibly move to config/linux.h guarded by OPTION_GLIBC? For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what glibc really guarantees on most architectures (I think x32 provides more than that, but it can override). But changing it requires some analysis on commonly used malloc alternatives on Linux, what exactly do they guarantee. Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan. Note that on most targets there is some standard type that requires such alignment, thus at least 2 * POINTER_SIZE alignment is also a C conformance matter. Jakub, As Bill wrote earlier, 2 * POINTER_SIZE causes a different performance regression for 16 byte alignment on PPC64. 8 bytes is a work-around for now. - David
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 04:55:20AM -0400, David Edelsohn wrote: As Bill wrote earlier, 2 * POINTER_SIZE causes a different performance regression for 16 byte alignment on PPC64. 8 bytes is a work-around for now. I was talking about the config/linux.h definition, PPC64 can of course override it to work around something temporarily, but because there is some problem on PPC64 we shouldn't penalize code say on x86_64, s390x or aarch64. Jakub
Re: [PATCH] Don't invalidate string length cache when not needed
On Thu, May 16, 2013 at 10:33:27PM +0200, Jakub Jelinek wrote: On Thu, May 16, 2013 at 06:44:03PM +0200, Marek Polacek wrote: --- gcc/tree-ssa-strlen.c.mp 2013-05-15 14:11:20.079707492 +0200 +++ gcc/tree-ssa-strlen.c 2013-05-16 17:57:33.963150006 +0200 @@ -1693,8 +1693,10 @@ handle_char_store (gimple_stmt_iterator } else { - si->writable = true; - si->dont_invalidate = true; + /* The string might be e.g. in the .rodata section. */ + si->writable = false; No, this really should be si->writable = true; (and comment not needed). Ok with that change. Done. The thing is, while the string might not be known to be writable before, i.e. we can't optimize away this store, because supposedly it should trap, if we notice e.g. another write to the same location (writing zero there again), we can optimize that other write already, because we know that this store stored there something. +#include "strlenopt.h" + +__attribute__((noinline, noclone)) size_t +fn1 (char *p, const char *r) +{ + size_t len1 = __builtin_strlen (r); + char *q = __builtin_strchr (p, '\0'); + *q = '\0'; + return len1 - __builtin_strlen (r); // This strlen should be optimized into len1. With the strlenopt.h include you can avoid using __builtin_ prefixes, all the builtins needed are prototyped in that header. Ugh, sure, that's only a remnant from my testing. Thanks for the reviews. Will commit this one finally. 2013-05-17 Marek Polacek pola...@redhat.com * tree-ssa-strlen.c (handle_char_store): Don't invalidate cached length when doing a non-zero store or storing '\0' over '\0'. * gcc.dg/strlenopt-25.c: New test. * gcc.dg/strlenopt-26.c: Likewise.
--- gcc/tree-ssa-strlen.c.mp	2013-05-15 14:11:20.079707492 +0200
+++ gcc/tree-ssa-strlen.c	2013-05-17 10:59:27.461231167 +0200
@@ -1694,7 +1694,8 @@ handle_char_store (gimple_stmt_iterator
       else
 	{
 	  si->writable = true;
-	  si->dont_invalidate = true;
+	  gsi_next (gsi);
+	  return false;
 	}
     }
   else
@@ -1717,6 +1718,33 @@ handle_char_store (gimple_stmt_iterator
 	  si->endptr = ssaname;
 	  si->dont_invalidate = true;
 	}
+      /* If si->length is non-zero constant, we aren't overwriting '\0',
+	 and if we aren't storing '\0', we know that the length of the
+	 string and any other zero terminated string in memory remains
+	 the same.  In that case we move to the next gimple statement and
+	 return to signal the caller that it shouldn't invalidate anything.
+
+	 This is beneficial for cases like:
+
+	 char p[20];
+	 void foo (char *q)
+	 {
+	   strcpy (p, "foobar");
+	   size_t len = strlen (p);	// This can be optimized into 6
+	   size_t len2 = strlen (q);	// This has to be computed
+	   p[0] = 'X';
+	   size_t len3 = strlen (p);	// This can be optimized into 6
+	   size_t len4 = strlen (q);	// This can be optimized into len2
+	   bar (len, len2, len3, len4);
+	 } */
+      else if (si != NULL && si->length != NULL_TREE
+	       && TREE_CODE (si->length) == INTEGER_CST
+	       && integer_nonzerop (gimple_assign_rhs1 (stmt)))
+	{
+	  gsi_next (gsi);
+	  return false;
+	}
     }
   else if (idx == 0 && initializer_zerop (gimple_assign_rhs1 (stmt)))
     {
--- gcc/testsuite/gcc.dg/strlenopt-25.c.mp	2013-05-15 17:15:18.702118637 +0200
+++ gcc/testsuite/gcc.dg/strlenopt-25.c	2013-05-15 18:26:27.881030317 +0200
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+int
+main ()
+{
+  char p[] = "foobar";
+  int len, len2;
+  len = strlen (p);
+  p[0] = 'O';
+  len2 = strlen (p);
+  return len - len2;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */
+/* { dg-final { cleanup-tree-dump "strlen" } } */
--- gcc/testsuite/gcc.dg/strlenopt-26.c.mp	2013-05-16 17:33:00.302060413 +0200
+++ gcc/testsuite/gcc.dg/strlenopt-26.c	2013-05-17 10:58:17.605015608 +0200
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+__attribute__((noinline, noclone)) size_t
+fn1 (char *p, const char *r)
+{
+  size_t len1 = strlen (r);
+  char *q = strchr (p, '\0');
+  *q = '\0';
+  return len1 - strlen (r); // This strlen should be optimized into len1.
+}
+
+int
+main (void)
+{
+  char p[] = "foobar";
+  const char *volatile q = "xyzzy";
+  fn1 (p, q);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 1 "strlen" } } */
+/* { dg-final { cleanup-tree-dump "strlen" } } */
Marek
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 10:47 AM, Jakub Jelinek ja...@redhat.com wrote: On Fri, May 17, 2013 at 10:32:11AM +0200, Richard Biener wrote: On Fri, May 17, 2013 at 4:40 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? You say this is because glibc guarantees such alignment - shouldn't this be guarded by a proper check then? Probably AIX guarantees similar alignment but there are also embedded elf/newlib targets, no? Btw, the #define should possibly move to config/linux.h guarded by OPTION_GLIBC? For that location it shouldn't be 64, but 2 * POINTER_SIZE, which is what glibc really guarantees on most architectures (I think x32 provides more than that, but it can override). But changing it requires some analysis on commonly used malloc alternatives on Linux, what exactly do they guarantee. Say valgrind, ElectricFence, jemalloc, tcmalloc, libasan. Note that on most targets there is some standard type that requires such alignment, thus at least 2 * POINTER_SIZE alignment is also a C conformance matter. Yes - note that it's called MALLOC_*ABI*_ALIGNMENT for a reason - it is supposed to be the alignment that is required for conforming to the C ABI on the target. 
For different allocators I'd rather have a new function attribute that tells us the alignment of the allocator's allocation function (of course replacing the allocator at runtime via ELF interposition makes that possibly fragile as well). We shouldn't assume just because glibc may some day guarantee 256-bit alignment that other allocators on linux do so, too. So - I'd rather play safe (and the default to align to BITS_PER_WORD probably catches the C ABI guarantee on most targets I suppose). Richard. Jakub
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, May 17, 2013 at 11:17:26AM +0200, Richard Biener wrote: Yes - note that it's called MALLOC_*ABI*_ALIGNMENT for a reason - it is supposed to be the alignment that is required for conforming to the C ABI on the target. For different allocators I'd rather have a new function attribute that tells us the alignment of the allocators allocation function (of course replacing the allocator at runtime via ELF interposition makes that possibly fragile as well). We shouldn't assume just because glibc Most of the allocators I've mentioned, if not all, are providing just malloc/calloc etc. entrypoints and are being in search scope before -lc, so it isn't anything you can control with attributes. I've looked at libasan so far (seems to guarantee either 16, or 64 or 128 bytes depending on flags), valgrind (seems to guarantee 8 bytes on 32-bit arches except ppc32, and 16 bytes elsewhere), ElectricFence (no guarantees at all, the default is 4 bytes, but can be overridden with EF_ALIGNMENT env option to smaller/larger value, I guess let's ignore efence). Jakub
Re: More vector folding
From: Marc Glisse marc.gli...@inria.fr Date: Tue, 14 May 2013 13:47:23 +0200 Here is what I tested during the night, I'll just rename the function. I took the chance to remove an unnecessary alternative in TRUTH_XOR_EXPR. Passes bootstrap+testsuite on x86_64-linux-gnu. 2013-05-14 Marc Glisse marc.gli...@inria.fr gcc/ * fold-const.c (fold_negate_expr): Handle vectors. (fold_truth_not_expr): Make it static. (fold_invert_truth): New static function. (invert_truthvalue_loc): Handle vectors. Do not call fold_truth_not_expr directly. (fold_unary_loc) <BIT_NOT_EXPR>: Handle comparisons. <TRUTH_NOT_EXPR>: Do not cast to boolean. (fold_comparison): Handle vector constants. (fold_binary_loc) <TRUTH_XOR_EXPR>: Remove redundant code. (fold_ternary_loc) <VEC_COND_EXPR>: Adapt more COND_EXPR optimizations. * tree.h (fold_truth_not_expr): Remove declaration. gcc/testsuite/ * g++.dg/ext/vector22.C: New testcase. * gcc.dg/binop-xor3.c: Remove xfail. Looks like the removed xfail caused regressions for about half of all targets; there's now PR57313 opened for this. Maybe an rtl test like BRANCH_COST or rtx_cost now matters in the decision? brgds, H-P
Re: rtl expansion without zero/sign extension based on VRP
On 13/05/13 17:47, Richard Biener wrote: On Mon, May 13, 2013 at 5:45 AM, Kugan kugan.vivekanandara...@linaro.org wrote: Hi, This patch removes some of the redundant sign/zero extensions using value range information generated by VRP. When a GIMPLE_ASSIGN stmt with LHS type smaller than word is expanded to rtl, if we can prove that the RHS expression value can always fit in the LHS type and there is no sign conversion, truncation and extension to fit the type is redundant; subreg and zero/sign extensions are therefore redundant. For example, when an expression is evaluated and its value is assigned to a variable of type short, the generated rtl would look something like the following. (set (reg:SI 110) (zero_extend:SI (subreg:HI (reg:SI 117) 0))) However, if during value range propagation we can say for certain that the value of the expression which is present in register 117 is within the limits of short and there is no sign conversion, we don't need to perform the subreg and zero_extend; instead we can generate the following rtl. (set (reg:SI 110) (reg:SI 117)) Attached patch adds value range information to gimple stmts during the value range propagation pass and expands rtl without subreg and zero/sign extension if the value range is within the limits of its type. This change improves the geomean of the spec2k int benchmark with ref by about ~3.5% on an arm chromebook. Tested on X86_64 and ARM. I would like review comments on this. A few comments on the way you preserve this information. Thanks Richard for reviewing it. --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -191,6 +191,11 @@ struct GTY((chain_next ("%h.next"))) gimple_statement_base { in there. */ unsigned int subcode : 16; + /* If an assignment gimple statement has an RHS expression that can fit + the LHS type, zero/sign extension to truncate is redundant. + Set this if we detect extension as redundant during VRP.
*/ + unsigned sign_zero_ext_redundant : 1; + this enlarges all gimple statements by 8 bytes, so it's out of the question. My original plan to preserve value-range information was to re-use SSA_NAME_PTR_INFO which is where we already store value-range information for pointers: struct GTY(()) ptr_info_def { /* The points-to solution. */ struct pt_solution pt; /* Alignment and misalignment of the pointer in bytes. Together align and misalign specify low known bits of the pointer. ptr & (align - 1) == misalign. */ /* When known, this is the power-of-two byte alignment of the object this pointer points into. This is usually DECL_ALIGN_UNIT for decls and MALLOC_ABI_ALIGNMENT for allocated storage. When the alignment is not known, it is zero. Do not access directly but use functions get_ptr_info_alignment, set_ptr_info_alignment, mark_ptr_info_alignment_unknown and similar. */ unsigned int align; /* When alignment is known, the byte offset this pointer differs from the above alignment. Access only through the same helper functions as align above. */ unsigned int misalign; }; what you'd do is to make the ptr_info_def reference from tree_ssa_name a reference to a union of the above (for pointer SSA names) and an alternate info like struct range_info_def { double_int min; double_int max; }; I have now changed the way I am preserving this information as you have suggested. or a more compressed format (a reference to a mode or type it fits for example). The very specific case in question also points at the fact that we have two conversion tree codes, NOP_EXPR and CONVERT_EXPR, and we've tried to unify them (but didn't finish up that task)... We can also remove sign/zero extension in cases other than NOP_EXPR and CONVERT_EXPR, if the value range of the RHS expressions fits LHS type without sign change.
For example, please look at the following gimple stmt and the resulting rtl: ;; c_3 = c_2(D) & 15; ;; Without the patch (insn 6 5 7 (set (reg:SI 116) (and:SI (reg/v:SI 115 [ c ]) (const_int 15 [0xf]))) c5.c:4 -1 (nil)) (insn 7 6 0 (set (reg/v:SI 111 [ c ]) (zero_extend:SI (subreg:QI (reg:SI 116) 0))) c5.c:4 -1 (nil)) ;; with the patch (insn 6 5 7 (set (reg:SI 116) (and:SI (reg/v:SI 115 [ c ]) (const_int 15 [0xf]))) c5.c:4 -1 (nil)) (insn 7 6 0 (set (reg/v:SI 111 [ c ]) (reg:SI 116)) c5.c:4 -1 (nil)) +static void +process_stmt_for_ext_elimination () +{ we already do all the analysis when looking for opportunities to eliminate the intermediate cast in (T2)(T1)x in simplify_conversion_using_ranges, that's the place that should handle this case. I have changed this. I have attached patch with the above changes. Thanks, Kugan Richard. Thanks, Kugan 2013-05-09 Kugan Vivekanandarajah kug...@linaro.org * gcc/gimple.h (gimple_is_exp_fit_lhs, gimple_set_exp_fit_lhs):
Re: [C++ Patch] PR 18126
OK. Jason
Re: [PATCH] Error out on -fsanitize=thread -fsanitize=address
Jakub Jelinek ja...@redhat.com writes: 2013-05-06 Jakub Jelinek ja...@redhat.com * gcc.c (SANITIZER_SPEC): Reject -fsanitize=address -fsanitize=thread linking. This looks OK to me, thanks. -- Dodji
Re: [PATCH, rs6000] Increase MALLOC_ABI_ALIGNMENT for 32-bit PowerPC
On Fri, 2013-05-17 at 01:11 -0400, David Edelsohn wrote: On Thu, May 16, 2013 at 10:40 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: This removes two degradations in CPU2006 for 32-bit PowerPC due to lost vectorization opportunities. Previously, GCC treated malloc'd arrays as only guaranteeing 4-byte alignment, even though the glibc implementation guarantees 8-byte alignment. This raises the guarantee to 8 bytes, which is sufficient to permit the missed vectorization opportunities. The guarantee for 64-bit PowerPC should be raised to 16-byte alignment, but doing so currently exposes a latent bug that degrades a 64-bit benchmark. I have therefore not included that change at this time, but added a FIXME recording the information. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Verified that SPEC CPU2006 degradations are fixed with no new degradations. Ok for trunk? Also, do you want any backports? Okay. If you think that this is appropriate for 4.8 branch, that is okay as well. Yes, the degradation appears to have started last summer, so updating 4.8 would be good. Is there an FSF GCC Bugzilla open for the 437.leslie3d problem? The FIXME doesn't need to contain the complete explanation, but it would be nice to reference a longer explanation and not pretend to be Fermat's theorem that is too large to write in the margin. There is now. I opened 57309 after submitting the patch, so I can reference that. Thanks, Bill Thanks, David
Re: More vector folding
On Fri, 17 May 2013, Hans-Peter Nilsson wrote: From: Marc Glisse marc.gli...@inria.fr Date: Tue, 14 May 2013 13:47:23 +0200 Here is what I tested during the night, I'll just rename the function. I took the chance to remove an unnecessary alternative in TRUTH_XOR_EXPR. Passes bootstrap+testsuite on x86_64-linux-gnu. 2013-05-14 Marc Glisse marc.gli...@inria.fr gcc/ * fold-const.c (fold_negate_expr): Handle vectors. (fold_truth_not_expr): Make it static. (fold_invert_truth): New static function. (invert_truthvalue_loc): Handle vectors. Do not call fold_truth_not_expr directly. (fold_unary_loc) BIT_NOT_EXPR: Handle comparisons. TRUTH_NOT_EXPR: Do not cast to boolean. (fold_comparison): Handle vector constants. (fold_binary_loc) TRUTH_XOR_EXPR: Remove redundant code. (fold_ternary_loc) VEC_COND_EXPR: Adapt more COND_EXPR optimizations. * tree.h (fold_truth_not_expr): Remove declaration. gcc/testsuite/ * g++.dg/ext/vector22.C: New testcase. * gcc.dg/binop-xor3.c: Remove xfail. Looks like the removed xfail caused regressions for about half of all targets; there's now PR57313 opened for this. Maybe an rtl test like BRANCH_COSTS or rtx_cost now matters in the decision? Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it. What is the proper thing to do here? If I add the generic xfail back, we'll get xpass on some platforms, now we have fails on some platforms, and listing the platforms where we want the transformation to happen is just a pain. Shall I remove the testcase? -- Marc Glisse
Re: [PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)
On Fri, May 17, 2013 at 10:25:01AM +0200, Jakub Jelinek wrote: Alternative, so far untested, patch is to let the "register is not dead" splitter do its job always during split2 and just fix it up during peephole2, if the register was dead. Now fully bootstrapped/regtested on x86_64-linux and i686-linux. 2013-05-17 Jakub Jelinek ja...@redhat.com PR rtl-optimization/57281 PR rtl-optimization/57300 * config/i386/i386.md (extendsidi2_1 dead reg splitter): Remove. (extendsidi2_1 peephole2s): Add instead 2 new peephole2s, that undo what the other splitter did if the registers are dead. * gcc.dg/pr57300.c: New test. * gcc.c-torture/execute/pr57281.c: New test. Jakub
[fortran, doc] Improve random_seed example
Hi, the example we provide for the usage of the random_seed intrinsic could be better. At least one user has already been tripped up by the fact that on some targets the first call to system_clock returns 0, resulting in a poor seed. Below is an improved(?!) example program. Ok for trunk/4.8/4.7? (Changelog + patch is of course trivial once there is agreement on the code itself)

! Seed the random generator with some random input
module rseed_rand
contains
  subroutine rseed()
    implicit none
    integer, allocatable :: seed(:)
    integer :: i, n, un, istat, dt(8), pid, t(2), s
    integer(8) :: count, tms

    call random_seed(size = n)
    allocate(seed(n))
    ! First try if the OS provides a random number generator
    open(newunit=un, file="/dev/urandom", access="stream", &
         form="unformatted", iostat=istat)
    if (istat == 0) then
       read(un) seed
       close(un)
    else
       ! Fallback to XOR:ing the current time and pid. The PID is
       ! useful in case one launches multiple instances of the same
       ! program in parallel.
       call system_clock(count)
       if (count /= 0) then
          t = transfer(count, t)
       else
          call date_and_time(values=dt)
          tms = (dt(1) - 1970) * 365_8 * 24 * 60 * 60 * 1000 &
               + dt(2) * 31_8 * 24 * 60 * 60 * 1000 &
               + dt(3) * 24 * 60 * 60 * 1000 &
               + dt(5) * 60 * 60 * 1000 &
               + dt(6) * 60 * 1000 &
               + dt(7) * 1000 &
               + dt(8)
          t = transfer(tms, t)
       end if
       s = ieor(t(1), t(2))
       pid = getpid()
       s = ieor(s, pid)
       if (n >= 3) then
          seed(1) = t(1)
          seed(2) = t(2)
          seed(3) = pid
          if (n > 3) then
             seed(4:) = s + 37 * (/ (i, i = 0, n - 4) /)
          end if
       else
          seed = s + 37 * (/ (i, i = 0, n - 1) /)
       end if
    end if
    call random_seed(put=seed)
  end subroutine rseed
end module rseed_rand

-- Janne Blomqvist
[PATCH] Fix typo in stmt_kills_ref_p_1
stmt_kills_ref_p_1 wants to compare MEM_REF offset but compares the pointers instead (which are previously compared as equal). This breaks my fix for PR57303. Fixed thus, applied as obvious. Richard.

2013-05-17 Richard Biener rguent...@suse.de

* tree-ssa-alias.c (stmt_kills_ref_p_1): Properly compare MEM_REF offsets.

Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	(revision 199004)
+++ gcc/tree-ssa-alias.c	(working copy)
@@ -2002,8 +2002,8 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref
       if (TREE_CODE (base) == MEM_REF
	  && TREE_CODE (ref->base) == MEM_REF
	  && TREE_OPERAND (base, 0) == TREE_OPERAND (ref->base, 0))
	{
-	  if (!tree_int_cst_equal (TREE_OPERAND (base, 0),
-				   TREE_OPERAND (ref->base, 0)))
+	  if (!tree_int_cst_equal (TREE_OPERAND (base, 1),
+				   TREE_OPERAND (ref->base, 1)))
	    {
	      double_int off1 = mem_ref_offset (base);
	      off1 = off1.lshift (BITS_PER_UNIT == 8
Re: [PATCH] Fix typo in stmt_kills_ref_p_1
On Fri, May 17, 2013 at 02:46:56PM +0200, Richard Biener wrote: stmt_kills_ref_p_1 wants to compare MEM_REF offset but compares the pointers instead (which are previously compared as equal). This breaks my fix for PR57303. Fixed thus, applied as obvious. Richard. 2013-05-17 Richard Biener rguent...@suse.de * tree-ssa-alias.c (stmt_kills_ref_p_1): Properly compare MEM_REF offsets. Ugh, thanks. As the 0th operand was equal, tree_int_cst_equal used to return always 1 and thus the if condition was never true. --- gcc/tree-ssa-alias.c (revision 199004) +++ gcc/tree-ssa-alias.c (working copy) @@ -2002,8 +2002,8 @@ stmt_kills_ref_p_1 (gimple stmt, ao_ref if (TREE_CODE (base) == MEM_REF && TREE_CODE (ref->base) == MEM_REF && TREE_OPERAND (base, 0) == TREE_OPERAND (ref->base, 0)) { - if (!tree_int_cst_equal (TREE_OPERAND (base, 0), - TREE_OPERAND (ref->base, 0))) + if (!tree_int_cst_equal (TREE_OPERAND (base, 1), + TREE_OPERAND (ref->base, 1))) { double_int off1 = mem_ref_offset (base); off1 = off1.lshift (BITS_PER_UNIT == 8 Jakub
Re: More vector folding
2013/5/17 Marc Glisse marc.gli...@inria.fr: Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it. What is the proper thing to do here? If I add the generic xfail back, we'll get xpass on some platforms, now we have fails on some platforms, and listing the platforms where we want the transformation to happen is just a pain. Shall I remove the testcase? If we cannot redesign the testcase, we should remove it. Because it seems the testcase cannot properly demonstrate its purpose after r198983. Otherwise we better have some comment in the testcase to inform target maintainers listing their targets in the xfail field. Best regards, jasonwucj
Re: More vector folding
On Fri, May 17, 2013 at 09:54:14PM +0800, Chung-Ju Wu wrote: 2013/5/17 Marc Glisse marc.gli...@inria.fr: Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it. What is the proper thing to do here? If I add the generic xfail back, we'll get xpass on some platforms, now we have fails on some platforms, and listing the platforms where we want the transformation to happen is just a pain. Shall I remove the testcase? If we cannot redesign the testcase, we should remove it. Because it seems the testcase cannot properly demonstrate its purpose after r198983. Otherwise we better have some comment in the testcase to inform target maintainers listing their targets in the xfail field. You can just limit the testcase to a few selected targets, if it includes a few widely used architectures, it will be enough for the testing and will be better than removing the testcase altogether. Jakub
Re: [PATCH, ARM][1 of 2] Add epilogue dwarf info for shrink-wrap
On 05/16/13 07:27, Zhenqiang Chen wrote: On 15 May 2013 06:31, Ramana Radhakrishnan ramana@googlemail.com wrote: Sorry this had dropped off my list of patches to review somehow but anyway here's a first cut review. On Thu, Mar 21, 2013 at 6:58 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote: Hi, When shrink-wrap is enabled, the returns from the simple-return path and the normal return path can be merged. The code is like: tst ... / \ | push ... | ... | pop ... \ / bx lr If the dwarf info after pop ... is incorrect, the dwarf checks will fail at dwarf2cfi.c: function maybe_record_trace_start. /* We ought to have the same state incoming to a given trace no matter how we arrive at the trace. Anything else means we've got some kind of optimization error. */ gcc_checking_assert (cfi_row_equal_p (cur_row, ti->beg_row)); The patch is to add epilogue dwarf info to make sure: Before bx lr, * All registers saved in stack had been restored. * .cfi_def_cfa_offset 0 * .cfi_def_cfa_register is sp even if frame_pointer_needed Bootstrapped and no make check regression. in what configuration - thumb2 / arm / cortex-a15 / cortex-a9 ? what languages and on what platform ? Bootstrapped both arm and thumb2 on Pandaboard ES and Chromebook. Pandaboard ES (cortex-a9): --enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb Chromebook (cortex-a15): --enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a --with-tune=cortex-a15 --with-float=hard --with-fpu=vfpv3-d16 --enable-languages=c,c++,fortran,objc,obj-c++ --with-arch=armv7-a --with-tune=cortex-a15 --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb Regression tests on Pandaboard ES for both arm and thumb2 with the same configure as bootstrap. Is it OK?
@@ -25447,9 +25519,15 @@ arm_unwind_emit (FILE * asm_out_file, rtx insn) handled_one = true; break; +/* The INSN is generated in epilogue. It is set as RTX_FRAME_RELATED_P + to get correct dwarf info for shrink-wrap. But We should not emit + unwind info for it. */ s/dwarf/debug frame. Replace "But We should not emit unwind info for it" with "We should not emit unwind info for it because these are used either for pretend arguments" or put the other case here. I think you are correct that there is no need for unwind information as we only have unwind information valid at call points and in this case I wouldn't expect unwind information to need to be exact for pretend arguments. Thanks for the comments. The patch is updated and rebased to trunk. Ok if no regressions. regards Ramana -Zhenqiang
Re: More vector folding
On Fri, 17 May 2013, Jakub Jelinek wrote: On Fri, May 17, 2013 at 09:54:14PM +0800, Chung-Ju Wu wrote: 2013/5/17 Marc Glisse marc.gli...@inria.fr: Yes, LOGICAL_OP_NON_SHORT_CIRCUIT seems to be it. What is the proper thing to do here? If I add the generic xfail back, we'll get xpass on some platforms, now we have fails on some platforms, and listing the platforms where we want the transformation to happen is just a pain. Shall I remove the testcase? If we cannot redesign the testcase, we should remove it. Because it seems the testcase cannot properly demonstrate its purpose after r198983. Otherwise we better have some comment in the testcase to inform target maintainers listing their targets in the xfail field. You can just limit the testcase to a few selected targets, if it includes a few widely used architectures, it will be enough for the testing and will be better than removing the testcase altogether. Like this? (according to the PR, ia64, sh4 and spu would work too) I tested with: make check-gcc RUNTESTFLAGS="dg.exp=binop-xor3.c" and it still passes on x86_64 and appears as unsupported on powerpc. 2013-05-17 Marc Glisse marc.gli...@inria.fr PR regression/57313 * gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86). -- Marc Glisse

Index: gcc.dg/binop-xor3.c
===================================================================
--- gcc.dg/binop-xor3.c (revision 199006)
+++ gcc.dg/binop-xor3.c (working copy)
@@ -1,11 +1,11 @@
-/* { dg-do compile } */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
 /* { dg-options "-O2 -fdump-tree-optimized" } */

 int
 foo (int a, int b)
 {
   return ((a && !b) || (!a && b));
 }

 /* { dg-final { scan-tree-dump-times "\\^" 1 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
Re: [PATCH, ARM][2 of 2] Enable shrink-wrap for ARM
On 05/16/13 07:46, Zhenqiang Chen wrote: On 15 May 2013 06:36, Ramana Radhakrishnan ramana@googlemail.com wrote: What happens to the *arm_simple_return pattern that already exists in the backend ? Can you rebase to trunk ? The *arm_simple_return is the same as simple_return used by shrink-wrap. *arm_return and *arm_simple_return are not merged since *arm_return is for "Often the return insn will be the same as loading from memory", while *arm_simple_return is for branch. For thumb2, there is only a simple_return pattern -- *thumb2_return. And this is the reason why the normal return generated by epilogue and the simple return generated by shrink-wrap can be optimized as one, which leads to the dwarf info issue. The rebased patch is attached. Ok. Ramana Thanks! -Zhenqiang On Wed, Apr 3, 2013 at 7:50 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote: On 2 April 2013 17:55, Ramana Radhakrishnan ramana@googlemail.com wrote: On Thu, Mar 21, 2013 at 7:03 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote: Hi, The patch is to enable shrink-wrap for TARGET_ARM and TARGET_THUMB2. Bootstrapped and no make check regression. All previous Linaro shrink-wrap bugs (http://goo.gl/6fGg5) are verified. Is it OK? The tests should be part of the patch attached and not just added as separate files in your patch submission. Thank you for t -Zhenqiang Thanks! -Zhenqiang ChangeLog: 2013-03-21 Bernd Schmidt ber...@codesourcery.com Zhenqiang Chen zhenqiang.c...@linaro.org * config/arm/arm-protos.h: Add and update function protos. * config/arm/arm.c (use_simple_return_p): New added. (thumb2_expand_return): Check simple_return flag. * config/arm/arm.md: Add simple_return and conditional simple_return. * config/arm/iterators.md: Add iterator for return and simple_return. testsuite/ChangeLog: 2013-03-21 Zhenqiang Chen zhenqiang.c...@linaro.org * gcc.dg/shrink-wrap-alloca.c: New added. * gcc.dg/shrink-wrap-pretend.c: New added. * gcc.dg/shrink-wrap-sibcall.c: New added.
Re: More vector folding
On Fri, May 17, 2013 at 04:23:08PM +0200, Marc Glisse wrote: 2013-05-17 Marc Glisse marc.gli...@inria.fr PR regression/57313 * gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86). I'd say it should be PR testsuite/57313 (and the PR changed to that). Ok with that change, if some other target maintainer sees this test passing on his platform, it can be added to the target selector later. --- gcc.dg/binop-xor3.c (revision 199006) +++ gcc.dg/binop-xor3.c (working copy) @@ -1,11 +1,11 @@ -/* { dg-do compile } */ +/* { dg-do compile { target i?86-*-* x86_64-*-* } } */ /* { dg-options "-O2 -fdump-tree-optimized" } */ int foo (int a, int b) { return ((a && !b) || (!a && b)); } /* { dg-final { scan-tree-dump-times "\\^" 1 "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ Jakub
Re: More vector folding
On Fri, 17 May 2013, Jakub Jelinek wrote: On Fri, May 17, 2013 at 04:23:08PM +0200, Marc Glisse wrote: 2013-05-17 Marc Glisse marc.gli...@inria.fr PR regression/57313 * gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86). I'd say it should be PR testsuite/57313 (and the PR changed to that). Ok, thanks. I was actually thinking that it might be better to put the target selector on dg-final, as in the attached. Which do you prefer? -- Marc Glisse Index: gcc.dg/binop-xor3.c === --- gcc.dg/binop-xor3.c (revision 199006) +++ gcc.dg/binop-xor3.c (working copy) @@ -1,11 +1,11 @@ /* { dg-do compile } */ /* { dg-options "-O2 -fdump-tree-optimized" } */ int foo (int a, int b) { return ((a && !b) || (!a && b)); } -/* { dg-final { scan-tree-dump-times "\\^" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\^" 1 "optimized" { target i?86-*-* x86_64-*-* } } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */
Re: More vector folding
On Fri, May 17, 2013 at 04:37:45PM +0200, Marc Glisse wrote: On Fri, 17 May 2013, Jakub Jelinek wrote: On Fri, May 17, 2013 at 04:23:08PM +0200, Marc Glisse wrote: 2013-05-17 Marc Glisse marc.gli...@inria.fr PR regression/57313 * gcc.dg/binop-xor3.c: Restrict to platforms known to work (x86). I'd say it should be PR testsuite/57313 (and the PR changed to that). Ok, thanks. I was actually thinking that it might be better to put the target selector on dg-final, as in the attached. Which do you prefer? Doesn't really matter, both are fine. Jakub
Re: [PATCH] Fix extendsidi2_1 splitting (PR rtl-optimization/57281, PR rtl-optimization/57300 wrong-code, alternative)
On 05/17/2013 01:25 AM, Jakub Jelinek wrote: +(define_peephole2 + [(set (match_operand:SI 0 "memory_operand") + (match_operand:SI 1 "register_operand")) + (set (match_operand:SI 2 "register_operand") (match_dup 1)) + (parallel [(set (match_dup 2) +(ashiftrt:SI (match_dup 2) + (match_operand:QI 3 "const_int_operand"))) +(clobber (reg:CC FLAGS_REG))]) + (set (match_operand:SI 4 "memory_operand") (match_dup 2))] + "INTVAL (operands[3]) == 31" No sense in using match_operand in the pattern and INTVAL == 31 in the condition when you can just use (const_int 31) in the pattern. Modulo those two cases, the patch is ok. r~
Re: [PATCH, i386]: Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags
On Fri, May 17, 2013 at 7:38 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: Thank you Uros for the patch. Could you backport this to the 4.8.0? -Original Message- From: Uros Bizjak [mailto:ubiz...@gmail.com] Sent: Wednesday, May 15, 2013 11:16 PM To: gcc-patches@gcc.gnu.org Cc: Gopalasubramanian, Ganesh Subject: [PATCH, i386]: Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags Hello! Attached patch adds missing PTA_PRFCHW and PTA_FXSR flags to x86 processor alias table. PRFCHW CPUID flag is shared with 3dnow prefetch flag, so some additional logic is needed to avoid generating SSE prefetches for non-SSE 3dNow! targets, while still generating full set of 3dnow prefetches on 3dNow! targets. 2013-05-15 Uros Bizjak ubiz...@gmail.com * config/i386/i386.c (ix86_option_override_internal): Update processor_alias_table for missing PTA_PRFCHW and PTA_FXSR flags. Add PTA_POPCNT to corei7 entry and remove PTA_SSE from athlon-4 entry. Do not enable SSE prefetch on non-SSE 3dNow! targets. Enable TARGET_PRFCHW for TARGET_3DNOW targets. * config/i386/i386.md (prefetch): Enable for TARGET_PRFCHW instead of TARGET_3DNOW. (*prefetch_3dnow): Enable for TARGET_PRFCHW only. Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN. The patch will be backported to 4.8 branch in a couple of days. Done. I have also reverted athlon-4 change everywhere. It does have SSE after all. Uros.
Fix pr49146 - crash in unwinder
The rationale is captured in the comment in the first hunk. Committed. r~ PR target/49146 * unwind-dw2.c (UNWIND_COLUMN_IN_RANGE): New macro. (execute_cfa_program): Use it when storing to fs->regs. diff --git a/libgcc/unwind-dw2.c b/libgcc/unwind-dw2.c index 80de5ab..041f9d5 100644 --- a/libgcc/unwind-dw2.c +++ b/libgcc/unwind-dw2.c @@ -59,6 +59,35 @@ #define DWARF_REG_TO_UNWIND_COLUMN(REGNO) (REGNO) #endif +/* ??? For the public function interfaces, we tend to gcc_assert that the + column numbers are in range. For the dwarf2 unwind info this does happen, + although so far in a case that doesn't actually matter. + + See PR49146, in which a call from x86_64 ms abi to x86_64 unix abi stores + the call-saved xmm registers and annotates them. We haven't bothered + providing support for the xmm registers for the x86_64 port primarily + because the 64-bit windows targets don't use dwarf2 unwind, using sjlj or + SEH instead. Adding the support for unix targets would generally be a + waste. However, some runtime libraries supplied with ICC do contain such + an unorthodox transition, as well as the unwind info to match. This loss + of register restoration doesn't matter in practice, because the exception + is caught in the native unix abi, where all of the xmm registers are + call clobbered. + + Ideally, we'd record some bit to notice when we're failing to restore some + register recorded in the unwind info, but to do that we need annotation on + the unix->ms abi edge, so that we know when the register data may be + discarded. And since this edge is also within the ICC library, we're + unlikely to be able to get the new annotation. + + Barring a magic solution to restore the ms abi defined 128-bit xmm registers + (as distinctly opposed to the full runtime width) without causing extra + overhead for normal unix abis, the best solution seems to be to simply + ignore unwind data for unknown columns. 
*/ + +#define UNWIND_COLUMN_IN_RANGE(x) \ +__builtin_expect((x) < DWARF_FRAME_REGISTERS, 1) + #ifdef REG_VALUE_IN_UNWIND_CONTEXT typedef _Unwind_Word _Unwind_Context_Reg_Val; @@ -939,14 +968,19 @@ execute_cfa_program (const unsigned char *insn_ptr, reg = insn & 0x3f; insn_ptr = read_uleb128 (insn_ptr, &utmp); offset = (_Unwind_Sword) utmp * fs->data_align; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].how - = REG_SAVED_OFFSET; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].loc.offset = offset; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + { + fs->regs.reg[reg].how = REG_SAVED_OFFSET; + fs->regs.reg[reg].loc.offset = offset; + } } else if ((insn & 0xc0) == DW_CFA_restore) { reg = insn & 0x3f; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].how = REG_UNSAVED; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + fs->regs.reg[reg].how = REG_UNSAVED; } else switch (insn) { @@ -977,26 +1011,35 @@ execute_cfa_program (const unsigned char *insn_ptr, insn_ptr = read_uleb128 (insn_ptr, &reg); insn_ptr = read_uleb128 (insn_ptr, &utmp); offset = (_Unwind_Sword) utmp * fs->data_align; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].how - = REG_SAVED_OFFSET; - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN (reg)].loc.offset = offset; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + { + fs->regs.reg[reg].how = REG_SAVED_OFFSET; + fs->regs.reg[reg].loc.offset = offset; + } break; case DW_CFA_restore_extended: insn_ptr = read_uleb128 (insn_ptr, &reg); /* FIXME, this is wrong; the CIE might have said that the register was saved somewhere. 
*/ - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN(reg)].how = REG_UNSAVED; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + fs->regs.reg[reg].how = REG_UNSAVED; break; case DW_CFA_same_value: insn_ptr = read_uleb128 (insn_ptr, &reg); - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN(reg)].how = REG_UNSAVED; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + fs->regs.reg[reg].how = REG_UNSAVED; break; case DW_CFA_undefined: insn_ptr = read_uleb128 (insn_ptr, &reg); - fs->regs.reg[DWARF_REG_TO_UNWIND_COLUMN(reg)].how = REG_UNDEFINED; + reg = DWARF_REG_TO_UNWIND_COLUMN (reg); + if (UNWIND_COLUMN_IN_RANGE (reg)) + fs->regs.reg[reg].how = REG_UNDEFINED; break; case DW_CFA_nop: @@ -1007,9 +1050,12 @@ execute_cfa_program (const unsigned char *insn_ptr,
Re: Patch ping
On 05/17/2013 12:49 AM, Jakub Jelinek wrote: Hi! http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00282.html Reject -fsanitize=address -fsanitize=thread linking that won't ever work at runtime. I thought Dodji already OK'd this. If there's any concern about the scope going outside Dodji's area, then I'll approve it :-) jeff
Re: [PATCH] Fix incorrect discriminator assignment.
On Fri, May 17, 2013 at 1:22 AM, Richard Biener richard.guent...@gmail.com wrote: On Wed, May 15, 2013 at 6:50 PM, Cary Coutant ccout...@google.com wrote: gcc/ChangeLog: * tree-cfg.c (locus_descrim_hasher::hash): Only hash lineno. (locus_descrim_hasher::equal): Likewise. (next_discriminator_for_locus): Likewise. (assign_discriminator): Add return value. (make_edges): Assign more discriminator if necessary. (make_cond_expr_edges): Likewise. (make_goto_expr_edges): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/debug/dwarf2/discriminator.c: New Test. Looks OK to me, but I can't approve patches for tree-cfg.c. Two comments: (1) In the C++ conversion, it looks like someone misspelled discrim in locus_descrim_hasher. (2) Is this worth fixing in trunk when we'll eventually switch to a per-instruction discriminator? This patch fixes a common case where one line is distributed to 3 BBs, but only 1 discriminator is assigned. As far as I can see the patch makes discriminators coarser (by hashing and comparing on LOCATION_LINE and not on the location). It also has the changes like - assign_discriminator (entry_locus, then_bb); + if (assign_discriminator (entry_locus, then_bb)) +assign_discriminator (entry_locus, bb); which is what the comment refers to? I think those changes are short-sighted because what happens if the 2nd assign_discriminator call has exactly the same issue? Especially with the make_goto_expr_edges change there may be multiple predecessors where I cannot see how the change is correct. Yes, the change changes something and thus may fix the testcase but it doesn't change things in a predictable way as far as I can see. So - the change to compare/hash LOCATION_LINE looks good to me, but the assign_discriminator change needs more explaining. 
The new discriminator assignment algorithm is: 1 FOR_EACH_BB: 2 FOR_EACH_SUCC_EDGE: 3 if (same_line(bb, succ_bb)) 4 if (succ_bb has no discriminator) 5 succ_bb->discriminator = new_discriminator 6 else if (bb has no discriminator) 7 bb->discriminator = new_discriminator Because there are so many places that handle SUCC_EDGE, the logic of lines #3 and #4 is embedded within the assign_discriminator function, and the logic of line #6 is encoded in the return value of assign_discriminator. Thanks, Dehao Richard. -cary
[PATCH] [tree-optimization/57124] Updated fix for 254.gap problems
As I believe I pointed out in a follow-up message, 254.gap is depending on signed overflow semantics. This patch avoids eliminating a cast feeding a conditional when the SSA_NAME's range has overflowed unless -fstrict-overflow is in effect. Thus 254.gap should be building with -fno-strict-overflow. This patch also introduces a warning when the optimization is applied and the ranges have overflowed. Bootstrapped and regression tested on x86-unknown-linux-gnu. OK for the trunk? commit 62bbaa8de0e8d929eb3c63331b47950e9b09d801 Author: Jeff Law l...@redhat.com Date: Wed May 1 12:33:20 2013 -0600 PR tree-optimization/57124 * tree-vrp.c (simplify_cond_using_ranges): Only simplify a conversion feeding a condition if the range has an overflow if -fstrict-overflow. Add warnings for when we do make the transformation. PR tree-optimization/57124 * gcc.c-torture/execute/pr57124.c: New test. * gcc.c-torture/execute/pr57124.x: Set -fno-strict-overflow. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8e92c44..9320f21 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2013-05-17 Jeff Law l...@redhat.com + + PR tree-optimization/57124 + * tree-vrp.c (simplify_cond_using_ranges): Only simplify a + conversion feeding a condition if the range has an overflow + if -fstrict-overflow. Add warnings for when we do make the + transformation. + 2013-05-16 Rainer Orth r...@cebitec.uni-bielefeld.de * reorg.c (link_cc0_insns): Wrap in #ifdef HAVE_cc0. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 879b9bc..482151c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2013-05-17 Jeff Law l...@redhat.com + + PR tree-optimization/57124 + * gcc.c-torture/execute/pr57124.c: New test. + * gcc.c-torture/execute/pr57124.x: Set -fno-strict-overflow. + 2013-05-16 Greta Yorsh greta.yo...@arm.com * gcc.target/arm/unaligned-memcpy-2.c: Adjust expected output. 
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr57124.c b/gcc/testsuite/gcc.c-torture/execute/pr57124.c new file mode 100644 index 000..835d249 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr57124.c @@ -0,0 +1,27 @@ +__attribute__ ((noinline)) +foo(short unsigned int *p1, short unsigned int *p2) +{ + short unsigned int x1, x4; + int x2, x3, x5, x6; + unsigned int x7; + + x1 = *p1; + x2 = (int) x1; + x3 = x2 * 65536; + x4 = *p2; + x5 = (int) x4; + x6 = x3 + x4; + x7 = (unsigned int) x6; + if (x7 <= 268435455U) +abort (); + exit (0); +} + +main() +{ + short unsigned int x, y; + x = -5; + y = -10; + foo (&x, &y); +} + diff --git a/gcc/testsuite/gcc.c-torture/execute/pr57124.x b/gcc/testsuite/gcc.c-torture/execute/pr57124.x new file mode 100644 index 000..d8cacbe --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr57124.x @@ -0,0 +1,2 @@ +set additional_flags "-fno-strict-overflow" +return 0 diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index b5de683..b3eccf0 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -8669,8 +8669,32 @@ simplify_cond_using_ranges (gimple stmt) range_fits_type_p (vr, TYPE_PRECISION (TREE_TYPE (op0)), TYPE_UNSIGNED (TREE_TYPE (op0))) - && int_fits_type_p (op1, TREE_TYPE (innerop))) + && int_fits_type_p (op1, TREE_TYPE (innerop)) + /* The range must not have overflowed, or if it did overflow +we must not be wrapping/trapping overflow and optimizing +with strict overflow semantics. */ + && ((!is_negative_overflow_infinity (vr->min) + && !is_positive_overflow_infinity (vr->max)) + || (!flag_wrapv && !flag_trapv && flag_strict_overflow))) { + /* If the range overflowed and the user has asked for warnings +when strict overflow semantics were used to optimize code, +issue an appropriate warning. 
*/ + if ((is_negative_overflow_infinity (vr->min) + || is_positive_overflow_infinity (vr->max)) + && issue_strict_overflow_warning (WARN_STRICT_OVERFLOW_CONDITIONAL)) + { + location_t location; + + if (!gimple_has_location (stmt)) + location = input_location; + else + location = gimple_location (stmt); + warning_at (location, OPT_Wstrict_overflow, + "assuming signed overflow does not occur when " + "simplifying conditional."); + } + tree newconst = fold_convert (TREE_TYPE (innerop), op1); gimple_cond_set_lhs (stmt, innerop); gimple_cond_set_rhs (stmt, newconst);
Add myself to MAINTAINERS as Write After Approval
Adding myself to the MAINTAINERS in the Write After Approval category Index: ChangeLog === --- ChangeLog (revision 199021) +++ ChangeLog (revision 199022) @@ -1,3 +1,7 @@ +2013-05-17 David Malcolm dmalc...@redhat.com + + * MAINTAINERS (Write After Approval): Add myself. + 2013-04-30 Brooks Moses bmo...@google.com * MAINTAINERS: Update my email; move myself from Fortran Index: MAINTAINERS === --- MAINTAINERS (revision 199021) +++ MAINTAINERS (revision 199022) @@ -452,6 +452,7 @@ Christophe Lyon christophe.l...@st.com Luis Machado luis...@br.ibm.com Ziga Mahkovec ziga.mahko...@klika.si +David Malcolm dmalc...@redhat.com Simon Martin simar...@users.sourceforge.net Ranjit Mathew rmat...@hotmail.com Michael Matz m...@suse.de
[wwwdocs] gcc-4.8/changes.html: mention IRA and transactional memory
Errr, how did we miss this? Ok, I'm partly to blame for the lack of transactional memory in changes.html, but something as big as getting rid of reload?! Would it be preferable to mention IRA in the target specific x86 section instead? I figured it was an important enough change to warrant front-page coverage. Is this OK? Index: htdocs/gcc-4.8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.116 diff -u -r1.116 changes.html --- htdocs/gcc-4.8/changes.html 24 Apr 2013 15:14:26 - 1.116 +++ htdocs/gcc-4.8/changes.html 17 May 2013 17:49:19 - @@ -143,6 +143,11 @@ <code>-fsanitize=thread</code>. Instructions will be instrumented to detect data races. The ThreadSanitizer is available on x86-64 GNU/Linux.</li> +<li>A new local register allocator has been implemented, which +replaces the 26 year old reload pass and improves generated code +quality on ia32 and x86-64 targets.</li> +<li>Support for transactional memory has been implemented on +selected architectures.</li> </ul>
[PATCH, AArch64] Allow insv_imm to handle bigger immediates via masking to 16-bits
The MOVK instruction is currently not used when operand 2 is more than 16 bits, which leads to sub-optimal code. This patch improves those situations by removing the check and instead masking down to 16 bits within the new X format specifier I added recently. OK for trunk? Cheers, Ian 2013-05-17 Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (aarch64_print_operand): Change the X format specifier to only display bottom 16 bits. * config/aarch64/aarch64.md (insv_imm<mode>): Allow any-sized immediate to match for operand 2, since it will be masked. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index b57416c..1bdfd85 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3424,13 +3424,13 @@ aarch64_print_operand (FILE *f, rtx x, char code) break; case 'X': - /* Print integer constant in hex. */ + /* Print bottom 16 bits of integer constant in hex. */ if (GET_CODE (x) != CONST_INT) { output_operand_lossage ("invalid operand for '%%%c'", code); return; } - asm_fprintf (f, "0x%wx", UINTVAL (x)); + asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff); break; case 'w': diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index b27bcda..403d717 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -858,9 +858,8 @@ (const_int 16) (match_operand:GPI 1 "const_int_operand" "n")) (match_operand:GPI 2 "const_int_operand" "n"))] - "INTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode) - && INTVAL (operands[1]) % 16 == 0 - && UINTVAL (operands[2]) <= 0xffff" + "UINTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode) + && UINTVAL (operands[1]) % 16 == 0" "movk\\t%w0, %X2, lsl %1" [(set_attr "v8type" "movk") (set_attr "mode" "<MODE>")]
Re: [patch] Fix sched-deps DEP_POSTPONED, ds_t documentation
Ping On Sun, May 12, 2013 at 5:45 PM, Steven Bosscher wrote: Hello, While working on a sched-deps based delay slot scheduler, I've come to the conclusion that the dependencies themselves must indicate whether the dependent ref is delayed. So I started hacking sched-deps and ran into trouble... It turns out there is a problem introduced along with DEP_POSTPONED last year, but the real problem is the complicated ds_t representation and the unclear documentation. The first *6* bits on a ds_t were reserved for the dependence type, and two more bits were reserved for HARD_DEP and DEP_CANCELLED: -/* First '6' stands for 4 dep type bits and the HARD_DEP and DEP_CANCELLED - bits. - Second '4' stands for BEGIN_{DATA, CONTROL}, BE_IN_{DATA, CONTROL} - dep weakness. */ -#define BITS_PER_DEP_WEAK ((BITS_PER_DEP_STATUS - 6) / 4) But DEP_POSTPONED adds another bit: /* Instruction has non-speculative dependence. This bit represents the property of an instruction - not the one of a dependence. Therefore, it can appear only in the TODO_SPEC field of an instruction. */ #define HARD_DEP (DEP_CONTROL << 1) /* Set in the TODO_SPEC field of an instruction for which new_ready has decided not to schedule it speculatively. */ #define DEP_POSTPONED (HARD_DEP << 1) /* Set if a dependency is cancelled via speculation. */ #define DEP_CANCELLED (DEP_POSTPONED << 1) I wanted to add another flag, DEP_DELAYED, and optimistically just added another bit, the compiler started warning, etc. So far we seem to've gotten away with this because the sign bit on a ds_t was unused: /* We exclude sign bit. */ #define BITS_PER_DEP_STATUS (HOST_BITS_PER_INT - 1) The attached patch extends the ds_t documentation to clarify in a comment how all the bits are used. I've made ds_t and dw_t unsigned int, because ds_t is just a collection of bits, and dw_t is unsigned. The target hooks I had to update are all only used by ia64. 
I opportunistically reserved the one left-over bit for my own purposes ;-) Bootstrapped and tested on ia64-unknown-linux-gnu and on powerpc64-unknown-linux-gnu unix{,-m32}. OK for trunk? Ciao! Steven
Re: [PATCH] Use indentation in gtype.state to show nested structure
On Wed, 2013-05-15 at 11:51 -0600, Jeff Law wrote: On 05/14/2013 01:07 PM, David Malcolm wrote: On Tue, 2013-05-14 at 11:44 -0600, Jeff Law wrote: On 05/06/2013 03:05 PM, David Malcolm wrote: [...snip review and comments about auto-checking of formatting...] Anyway, the patch is good to go. Please install. Thanks for reviewing. Probably a dumb question, but by Please install, do you mean, please commit to SVN? Yes. Thanks. Committed to svn trunk as r199032 (using the version with the fixed ChangeLog).
Re: web ICEs on subreg
On May 16, 2013, at 5:26 PM, David Edelsohn dje@gmail.com wrote: This patch is creating new segfaults for 32 bit POWER AIX. Thanks for the heads up. Fixed in r199030. 2013-05-17 Mike Stump mikest...@comcast.net PR rtl-optimization/57304 * web.c (union_match_dups): Ensure that DF_REF_LOC exists before accessing DF_REF_REAL_LOC. Index: web.c === --- web.c (revision 199016) +++ web.c (working copy) @@ -133,9 +133,10 @@ union_match_dups (rtx insn, struct web_e entry = type == OP_IN ? use_entry : def_entry; for (; *ref; ref++) { - if (DF_REF_LOC (*ref) == recog_data.operand_loc[op]) + rtx *l = DF_REF_LOC (*ref); + if (l == recog_data.operand_loc[op]) break; - if (DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op]) + if (l && DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op]) break; } @@ -143,9 +144,10 @@ union_match_dups (rtx insn, struct web_e { for (ref = use_link, entry = use_entry; *ref; ref++) { - if (DF_REF_LOC (*ref) == recog_data.operand_loc[op]) + rtx *l = DF_REF_LOC (*ref); + if (l == recog_data.operand_loc[op]) break; - if (DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op]) + if (l && DF_REF_REAL_LOC (*ref) == recog_data.operand_loc[op]) break; } } --
[patch] Cleanup the CFG after pro_and_epilogue pass
Hello, Trying to dump the CFG as a graph fails after the pro_and_epilogue pass because it leaves unreachable basic blocks. This patch fixes that issue. Will commit sometime next week if no-one objects. Ciao! Steven * function.c (rest_of_handle_thread_prologue_and_epilogue): Cleanup the CFG after thread_prologue_and_epilogue_insns. Index: function.c === --- function.c (revision 199028) +++ function.c (working copy) @@ -6976,6 +6976,9 @@ rest_of_handle_thread_prologue_and_epilo scheduling to operate in the epilogue. */ thread_prologue_and_epilogue_insns (); + /* The prologue and epilogue may have made some blocks unreachable. */ + cleanup_cfg (0); + /* The stack usage info is finalized during prologue expansion. */ if (flag_stack_usage_info) output_stack_usage ();
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On 05/17/2013 01:49 PM, Steven Bosscher wrote: Hello, Trying to dump the CFG as a graph fails after the pro_and_epilogue pass because it leaves unreachable basic blocks. This patch fixes that issue. Will commit sometime next week if no-one objects. Ciao! Steven * function.c (rest_of_handle_thread_prologue_and_epilogue): Cleanup the CFG after thread_prologue_and_epilogue_insns. ?!? Under what conditions does threading the prologue/epilogue make code unreachable? Also note that "commit after X days if no-one objects" is not the project's policy. Please don't take such liberties. Jeff
Re: [Patch ARM] Fix PR target/19599
On 05/15/2013 04:50 AM, Ramana Radhakrishnan wrote: /* Cannot tail-call to long calls, since these are out of range of a branch instruction. */ - if (arm_is_long_call_p (decl)) + if (decl && arm_is_long_call_p (decl)) return false; You can tail call long calls via indirection now. I.e. load the address first. (define_insn "*sibcall_insn" - [(call (mem:SI (match_operand:SI 0 "" "X")) + [(call (mem:SI (match_operand:SI 0 "call_insn_operand" "Cs,Ss")) FWIW, i or s would work just as well, given you've already constrained by call_insn_operand; Ss isn't really needed. + if (arm_arch5 || arm_arch4t) + return "bx\\t%0\\t%@ indirect register sibling call"; Missing %?. + if (arm_arch5 || arm_arch4t) + return "bx\\t%1"; Likewise. diff --git a/gcc/testsuite/gcc.target/arm/pr40887.c b/gcc/testsuite/gcc.target/arm/pr40887.c index 0b5e873..5cabe3a 100644 --- a/gcc/testsuite/gcc.target/arm/pr40887.c +++ b/gcc/testsuite/gcc.target/arm/pr40887.c @@ -2,9 +2,9 @@ /* { dg-options "-O2 -march=armv5te" } */ /* { dg-final { scan-assembler "blx" } } */ -int (*indirect_func)(); +int (*indirect_func)(int x); int indirect_call() { -return indirect_func(); + return indirect_func(20) + indirect_func (40); } You've made this test case 100% useless. Of course there will always be a blx insn, since you've two calls there, and one of them will never be tail called. It should have been either removed or become your new test case. r~
[PATCH, i386]: Also pass mmx, 3dnow and sseX options with -march=native
Hello! For some reason we didn't pass mmx, 3dnow and sseX options to cc1 with -march=native. Passing these options will help older processors that don't get detected through architecture recognition functionality to get all their ISAs enabled. 2013-05-17 Uros Bizjak ubiz...@gmail.com * config/i386/driver-i386.c (host_detect_local_cpu): Pass mmx, 3dnow, sse, sse2, sse3, ssse3 and sse4a flags to options. Tested on x86_64-pc-linux-gnu {,-m32} and committed to mainline. Uros. Index: config/i386/driver-i386.c === --- config/i386/driver-i386.c (revision 198989) +++ config/i386/driver-i386.c (working copy) @@ -786,10 +786,17 @@ const char *host_detect_local_cpu (int argc, const if (arch) { + const char *mmx = has_mmx ? "-mmmx" : "-mno-mmx"; + const char *mmx3dnow = has_3dnow ? "-m3dnow" : "-mno-3dnow"; + const char *sse = has_sse ? "-msse" : "-mno-sse"; + const char *sse2 = has_sse2 ? "-msse2" : "-mno-sse2"; + const char *sse3 = has_sse3 ? "-msse3" : "-mno-sse3"; + const char *ssse3 = has_ssse3 ? "-mssse3" : "-mno-ssse3"; + const char *sse4a = has_sse4a ? "-msse4a" : "-mno-sse4a"; const char *cx16 = has_cmpxchg16b ? "-mcx16" : "-mno-cx16"; const char *sahf = has_lahf_lm ? "-msahf" : "-mno-sahf"; const char *movbe = has_movbe ? "-mmovbe" : "-mno-movbe"; - const char *ase = has_aes ? "-maes" : "-mno-aes"; + const char *aes = has_aes ? "-maes" : "-mno-aes"; const char *pclmul = has_pclmul ? "-mpclmul" : "-mno-pclmul"; const char *popcnt = has_popcnt ? "-mpopcnt" : "-mno-popcnt"; const char *abm = has_abm ? "-mabm" : "-mno-abm"; @@ -817,7 +824,8 @@ const char *host_detect_local_cpu (int argc, const const char *xsave = has_xsave ? "-mxsave" : "-mno-xsave"; const char *xsaveopt = has_xsaveopt ? "-mxsaveopt" : "-mno-xsaveopt"; - options = concat (options, cx16, sahf, movbe, aes, pclmul, + options = concat (options, mmx, mmx3dnow, sse, sse2, sse3, ssse3, + sse4a, cx16, sahf, movbe, aes, pclmul, popcnt, abm, lwp, fma, fma4, xop, bmi, bmi2, tbm, avx, avx2, sse4_2, sse4_1, lzcnt, rtm, hle, rdrnd, f16c, fsgsbase, rdseed, prfchw, adx,
Re: [PATCH:RL78] Add new insn for mulqi3 and mulhi3
On 05/14/2013 09:43 PM, Kaushik Phatak wrote: Hi, The below patch adds expanders and insns for QI and HI mode for the RL78 target. The QI mode uses a generic 'mulu' instruction supported by all variants, while the HI mode creates insn for G13 and G14 target variants using hardware multiply instructions. Tested on hardware and simulator with no additional regressions. Kindly review the same. Thanks, Kaushik 2013-05-15 Kaushik Phatak kaushik.pha...@kpitcummins.com * config/rl78/rl78.md (mulqi3,mulhi3): New define_expands. (mulqi3_rl78,mulhi3_rl78,mulhi3_g13): New define_insns. Index: gcc/config/rl78/rl78.md === --- gcc/config/rl78/rl78.md (revision 198915) +++ gcc/config/rl78/rl78.md (working copy) @@ -235,6 +235,24 @@ [(set_attr "valloc" "macax")] ) +(define_expand "mulqi3" + [(set (match_operand:QI 0 "register_operand" "=v") + (mult:QI (match_operand:QI 1 "general_operand" +"vim") + (match_operand:QI 2 "nonmemory_operand" "vi"))) + ] + "" ; mulu supported by all targets + +) No constraints on define_expand, only predicates. +(define_insn "mulqi3_rl78" +(define_insn "mulhi3_rl78" +(define_insn "mulhi3_g13" These names are not used. They should be prefixed with * to indicate the name is just for documentation. r~
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On Fri, May 17, 2013 at 9:52 PM, Jeff Law l...@redhat.com wrote: On 05/17/2013 01:49 PM, Steven Bosscher wrote: Hello, Trying to dump the CFG as a graph fails after the pro_and_epilogue pass because it leaves unreachable basic blocks. This patch fixes that issue. Will commit sometime next week if no-one objects. Ciao! Steven * function.c (rest_of_handle_thread_prologue_and_epilogue): Cleanup the CFG after thread_prologue_and_epilogue_insns. ?!? Under what conditions does threading the prologue/epilogue make code unreachable? Compiling testsuite/gcc.target/i386/pad-10.c with -fdump-rtl-all-graph. Just before rest_of_handle_thread_prologue_and_epilogue we have: Breakpoint 2, rest_of_handle_thread_prologue_and_epilogue () at ../../trunk/gcc/function.c:6969 6969{ (gdb) p brief_dump_cfg(stderr,-1) ;; basic block 2, loop depth 0, count 0, freq 1, maybe hot ;; prev block 0, next block 3, flags: (REACHABLE, RTL, MODIFIED) ;; pred: ENTRY [100.0%] (FALLTHRU) ;; succ: 3 [9.2%] (FALLTHRU) ;; 4 [90.8%] ;; basic block 3, loop depth 0, count 0, freq 922, maybe hot ;; prev block 2, next block 4, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 2 [9.2%] (FALLTHRU) ;; succ: 4 [100.0%] (FALLTHRU) ;; basic block 4, loop depth 0, count 0, freq 1, maybe hot ;; prev block 3, next block 1, flags: (REACHABLE, RTL, MODIFIED) ;; pred: 3 [100.0%] (FALLTHRU) ;; 2 [90.8%] ;; succ: EXIT [100.0%] (FALLTHRU) $1 = void (gdb) p debug_bb_n(2) (note 6 1 4 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (note 4 6 31 2 NOTE_INSN_FUNCTION_BEG) (insn 31 4 17 2 (set (reg:SI 0 ax [orig:59 D.1775 ] [59]) (reg/v:SI 4 si [orig:62 x ] [62])) testsuite/gcc.target/i386/pad-10.c:18 85 {*movsi_internal} (nil)) (insn 17 31 8 2 (parallel [ (set (reg:SI 0 ax [orig:59 D.1775 ] [59]) (minus:SI (reg:SI 0 ax [orig:59 D.1775 ] [59]) (reg/v:SI 5 di [orig:61 z ] [61]))) (clobber (reg:CC 17 flags)) ]) testsuite/gcc.target/i386/pad-10.c:18 295 {*subsi_1} (nil)) (insn 8 17 9 2 (set (reg:CCZ 17 flags) (compare:CCZ (reg/v:SI 4 si [orig:62 x ] [62]) 
(const_int 1 [0x1]))) testsuite/gcc.target/i386/pad-10.c:12 7 {*cmpsi_1} (expr_list:REG_DEAD (reg/v:SI 4 si [orig:62 x ] [62]) (nil))) (jump_insn 9 8 10 2 (set (pc) (if_then_else (ne (reg:CCZ 17 flags) (const_int 0 [0])) (label_ref:DI 18) (pc))) testsuite/gcc.target/i386/pad-10.c:12 602 {*jcc_1} (expr_list:REG_BR_PROB (const_int 9078 [0x2376]) (nil)) - 18) $2 = (basic_block_def *) 0x76224410 (gdb) p debug_bb_n(3) (note 10 9 33 3 [bb 3] NOTE_INSN_BASIC_BLOCK) (insn 33 10 11 3 (set (mem/c:SI (plus:DI (reg/f:DI 7 sp) (const_int 12 [0xc])) [2 %sfp+-4 S4 A32]) (reg/v:SI 5 di [orig:61 z ] [61])) 85 {*movsi_internal} (expr_list:REG_DEAD (reg/v:SI 5 di [orig:61 z ] [61]) (nil))) (insn 11 33 12 3 (set (reg:QI 0 ax) (const_int 0 [0])) testsuite/gcc.target/i386/pad-10.c:14 87 {*movqi_internal} (nil)) (call_insn 12 11 34 3 (call (mem:QI (symbol_ref:DI (bar) [flags 0x41] function_decl 0x76225800 bar) [0 bar S1 A8]) (const_int 0 [0])) testsuite/gcc.target/i386/pad-10.c:14 646 {*call} (expr_list:REG_DEAD (reg:QI 0 ax) (nil)) (expr_list:REG_DEP_TRUE (use (reg:QI 0 ax)) (nil))) (insn 34 12 5 3 (set (reg/v:SI 5 di [orig:61 z ] [61]) (mem/c:SI (plus:DI (reg/f:DI 7 sp) (const_int 12 [0xc])) [2 %sfp+-4 S4 A32])) testsuite/gcc.target/i386/pad-10.c:15 85 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 65) (nil))) (insn 5 34 18 3 (set (reg:SI 0 ax [orig:59 D.1775 ] [59]) (reg/v:SI 5 di [orig:61 z ] [61])) testsuite/gcc.target/i386/pad-10.c:15 85 {*movsi_internal} (expr_list:REG_DEAD (reg/v:SI 5 di [orig:61 z ] [61]) (nil))) $3 = (basic_block_def *) 0x76224208 (gdb) p debug_bb_n(4) (code_label 18 5 19 4 3 [1 uses]) (note 19 18 27 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (insn 27 19 0 4 (use (reg/i:SI 0 ax)) testsuite/gcc.target/i386/pad-10.c:19 -1 (nil)) $4 = (basic_block_def *) 0x762242d8 Continuing from here leads to an ice in the graph dumper: (gdb) cont Continuing. 
Breakpoint 1, internal_error (gmsgid=gmsgid@entry=0x119725e in %s, at %s:%d) at ../../trunk/gcc/diagnostic.c:1091 1091bool (gdb) bt 10 #0 internal_error (gmsgid=gmsgid@entry=0x119725e in %s, at %s:%d) at ../../trunk/gcc/diagnostic.c:1091 #1 0x00dfda24 in fancy_abort (file=optimized out, line=869, function=0xee3e60 pre_and_rev_post_order_compute(int*, int*, bool)::__FUNCTION__ pre_and_rev_post_order_compute) at ../../trunk/gcc/diagnostic.c:1151 #2 0x0061dbab in pre_and_rev_post_order_compute (pre_order=0x0, rev_post_order=0x1627850, include_entry_exit=optimized out) at ../../trunk/gcc/cfganal.c:869 #3 0x00d28d22
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On 05/17/2013 02:53 PM, Steven Bosscher wrote: On Fri, May 17, 2013 at 9:52 PM, Jeff Law l...@redhat.com wrote: On 05/17/2013 01:49 PM, Steven Bosscher wrote: Hello, Trying to dump the CFG as a graph fails after the pro_and_epilogue pass because it leaves unreachable basic blocks. This patch fixes that issue. Will commit sometime next week if no-one objects. Ciao! Steven * function.c (rest_of_handle_thread_prologue_and_epilogue): Cleanup the CFG after thread_prologue_and_epilogue_insns.

?!? Under what conditions does threading the prologue/epilogue make code unreachable?

Compiling testsuite/gcc.target/i386/pad-10.c with -fdump-rtl-all-graph. Just before rest_of_handle_thread_prologue_and_epilogue we have:

Thanks. What's happened is that emitting the epilogue at the end of basic block 4 (with a barrier at the end) has made the use insn 43 unreachable. But from the description you've given, it appears that the epilogue itself has unreachable code, and that shouldn't be happening. If you think it can happen by way of shrink-wrapping, I'd like to see the analysis. What does the epilogue itself look like before threading it into the insn stream?

This reaction seems to me like an unnecessary and disappointing slap on the wrist. Project policy does allow committing obvious patches without review. Perhaps I should just use that policy and not give people the chance to comment? I would think you would also know by now that I know all CFG related code as well as anyone, and, frankly, probably better than even you.

You do not make project policy. It's that simple. Yes, we have an obvious patch policy, but this certainly doesn't fall under that policy. I would frown upon you using the obvious patch path to bypass existing policy for patch review. If you want commit without review privileges, ask for someone to sponsor your request and the steering committee will vote on it. jeff
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On Fri, May 17, 2013 at 11:16 PM, Jeff Law wrote: What's happened, is that emitting the epilogue at the end of basic block 4 (with a barrier at the end) has made the use insn 43 unreachable. But from the description you've given, it appears that the epilogue itself has unreachable code, and that shouldn't be happening. If you think it can happen by way of shrink-wrapping, I'd like to see the analysis. It is not the epilogue itself but the way shrink-wrapping emits it. The block that is unreachable has its last predecessor edge removed in function.c:6607: 6607 redirect_edge_and_branch_force (e, *pdest_bb); I haven't looked at how the shrink-wrapping code works exactly. It's Bernd's code, so perhaps he can have a look. This is now PR57320. Ciao! Steven
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On Fri, May 17, 2013 at 9:49 PM, Steven Bosscher wrote: Will commit sometime next week if no-one objects. Obviously there are objections. Patch withdrawn. Ciao! Steven
Re: [patch] Cleanup the CFG after pro_and_epilogue pass
On 05/17/2013 03:53 PM, Steven Bosscher wrote: On Fri, May 17, 2013 at 11:16 PM, Jeff Law wrote: What's happened, is that emitting the epilogue at the end of basic block 4 (with a barrier at the end) has made the use insn 43 unreachable. But from the description you've given, it appears that the epilogue itself has unreachable code, and that shouldn't be happening. If you think it can happen by way of shrink-wrapping, I'd like to see the analysis. It is not the epilogue itself but the way shrink-wrapping emits it. The block that is unreachable has its last predecessor edge removed in function.c:6607: 6607 redirect_edge_and_branch_force (e, *pdest_bb); I haven't looked at how the shrink-wrapping code works exactly. It's Bernd's code, so perhaps he can have a look. This is now PR57320. OK. Let's go with your patch then. Approved with a comment that shrink-wrapping can result in unreachable edges in the epilogue. jeff
Fix infinite loop in lto-partition.c
Hi, privatize_symbol_name skips privatizing of symbols where we know for some reason that it is not needed. The loop in rename_statics however expects privatize_symbol_name to always rename the var, and it can get into an infinite loop. Bootstrapped/regtested x86_64-linux, committed. Honza

* lto-partition.c (privatize_symbol_name): Return true when privatizing happened. (rename_statics): Do not go into infinite loop when privatizing is not needed.

Index: lto-partition.c
===
--- lto-partition.c (revision 198936)
+++ lto-partition.c (working copy)
@@ -766,7 +766,7 @@ lto_balanced_map (void)
    with symbols defined out of the LTO world.  */

-static void
+static bool
 privatize_symbol_name (symtab_node node)
 {
   tree decl = node->symbol.decl;
@@ -781,7 +781,7 @@ privatize_symbol_name (symtab_node node)
	fprintf (cgraph_dump_file,
		 "Not privatizing symbol name: %s. It privatized already.\n",
		 name);
-      return;
+      return false;
     }
   /* Avoid mangling of already mangled clones.
      ??? should have a flag whether a symbol has a 'private' name already,
@@ -793,7 +793,7 @@ privatize_symbol_name (symtab_node node)
	fprintf (cgraph_dump_file,
		 "Not privatizing symbol name: %s. Has unique name.\n",
		 name);
-      return;
+      return false;
     }
   change_decl_assembler_name (decl, clone_function_name (decl, "lto_priv"));
   if (node->symbol.lto_file_data)
@@ -804,6 +804,7 @@ privatize_symbol_name (symtab_node node)
     fprintf (cgraph_dump_file,
	     "Privatizing symbol name: %s -> %s\n",
	     name, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl)));
+  return true;
 }

 /* Promote variable VNODE to be static.  */
@@ -906,11 +907,12 @@ rename_statics (lto_symtab_encoder_t enc
	  && (!encoder
	      || lto_symtab_encoder_lookup (encoder, s) != LCC_NOT_FOUND))
	{
-	  privatize_symbol_name (s);
-	  /* Re-start from beginning since we do not know how many
-	     symbols changed a name.  */
-	  s = symtab_node_for_asm (name);
+	  if (privatize_symbol_name (s))
+	    /* Re-start from beginning since we do not know how many
+	       symbols changed a name.  */
+	    s = symtab_node_for_asm (name);
+	  else
+	    s = s->symbol.next_sharing_asm_name;
	}
-      else s = s->symbol.next_sharing_asm_name;
+      else
+	s = s->symbol.next_sharing_asm_name;
     }

 /* Find out all static decls that need to be promoted to global because
Re: [Patch ARM] Fix PR target/19599
On Fri, May 17, 2013 at 9:12 PM, Richard Henderson r...@redhat.com wrote: On 05/15/2013 04:50 AM, Ramana Radhakrishnan wrote: /* Cannot tail-call to long calls, since these are out of range of a branch instruction. */ - if (arm_is_long_call_p (decl)) + if (decl arm_is_long_call_p (decl)) return false; You can tail call long calls via indirection now. I.e. load the address first. (define_insn *sibcall_insn - [(call (mem:SI (match_operand:SI 0 X)) + [(call (mem:SI (match_operand:SI 0 call_insn_operand Cs,Ss)) FWIW, i or s would work just as well, given you've already constrained by call_insn_operand; Ss isn't really needed. True - don't know what I was thinking there. + if (arm_arch5 || arm_arch4t) + return \ bx\\t%0\\t%@ indirect register sibling call\; Missing %?. It's not marked predicable - if anything these %? markers ought to be removed as well as I don't expect these to be conditionalized ever. + if (arm_arch5 || arm_arch4t) + return \bx\\t%1\; Likewise. diff --git a/gcc/testsuite/gcc.target/arm/pr40887.c b/gcc/testsuite/gcc.target/arm/pr40887.c index 0b5e873..5cabe3a 100644 --- a/gcc/testsuite/gcc.target/arm/pr40887.c +++ b/gcc/testsuite/gcc.target/arm/pr40887.c @@ -2,9 +2,9 @@ /* { dg-options -O2 -march=armv5te } */ /* { dg-final { scan-assembler blx } } */ -int (*indirect_func)(); +int (*indirect_func)(int x); int indirect_call() { -return indirect_func(); + return indirect_func(20) + indirect_func (40); } You've made this test case 100% useless. Of course there will always be a blx insn, since you've two calls there, and one of them will never be tail called. It should have been either removed or become your new test case. If you haven't noticed it has effectively moved to pr19599.c as my testcase . This is not 100% useless - this test was to make sure we actually put out a blx for an indirect call as per PR40887 . So it *actually* continues to test what we want as per PR40887 . Thanks, Ramana r~
Re: [GOOGLE] Back port discriminator patches to gcc-4_8
The warning was attributed to the wrong lineno. Looks like discriminators do not work well when they coexist with macros. I think the problem is with same_line_p. It's using expand_location to test whether two locations refer to the same line, but expand_location always unwinds the macro stack so that it's looking at the line number of the macro expansion point. That means that every token in the macro expansion will appear to be at the same line, so the loop over the basic_block will replace all of the locations with the new_locus that we just allocated for the new discriminator. Thus, it's clobbering the locus for the instructions at line 27 in the macro definition with the new locus we create to put a discriminator at line 26, and the warning that should appear for line 27 now says line 26.

Other comments on the patch...

In expand_location_1:

+  if (min_discriminator_location != UNKNOWN_LOCATION
+      && loc >= min_discriminator_location
+      && loc < min_discriminator_location + next_discriminator_location)

The last comparison should just be loc < next_discriminator_location. Likewise in has_discriminator.

In assign_discriminator:

+  location_t new_locus = location_with_discriminator (locus, discriminator);
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple stmt = gsi_stmt (gsi);
+      if (same_line_p (locus, gimple_location (stmt)))
+	gimple_set_location (stmt, block ?
+			     COMBINE_LOCATION_DATA (line_table, new_locus, block) :
+			     LOCATION_LOCUS (new_locus));
+    }

I'm not convinced the COMBINE_LOCATION_DATA is needed here, since location_with_discriminator has already done that. And when block == NULL, calling LOCATION_LOCUS is effectively a no-op, isn't it? It seems this could simply be:

  gimple_set_location (stmt, new_locus);

Also, you need to add vec.o to OBJS-libcommon in Makefile.in, since input.o now depends on it. Other binaries, like gcov, won't build otherwise.
I'm still not happy with a straightforward port of the old patch, though, since discriminator assignment might allocate locations that overlap those used for macro definitions -- there's no checking to see if we've run out of locations in that case. I guess this could be OK for the google branch for now, but not for trunk. I'm looking at alternatives using the adhoc mapping -- do you think it would work if we extend the adhoc mapping to track both a data (i.e., the block pointer) and a discriminator? -cary
Re: [GOOGLE] Back port discriminator patches to gcc-4_8
On Fri, May 17, 2013 at 4:35 PM, Cary Coutant ccout...@google.com wrote: The warning was attributed to the wrong lineno. Looks like discriminator does not work well when it coexists with macros. I think the problem is with same_line_p. It's using expand_location to test whether two locations refer to the same line, but expand_location always unwinds the macro stack so that it's looking at the line number of the macro expansion point. That means that every token in the macro expansion will appear to be at the same line, so the loop over the basic_block will replace all of the locations with the new_locus that we just allocated for the new discriminator. Thus, it's clobbering the locus for the instructions at line 27 in the macro definition with the new locus we create to put a discriminator at line 26, and the warning that should appear for line 27 now says line 26. Sounds like instead of checking the macro-expansion location, same_line_p should check the expanded macro location instead, so the problem will be resolved? Other comments on the patch... In expand_location_1: + if (min_discriminator_location != UNKNOWN_LOCATION + loc = min_discriminator_location + loc min_discriminator_location + next_discriminator_location) The last comparison should just be loc next_discriminator_location. Likewise in has_discriminator. Acked, will update the patch. In assign_discriminator: + location_t new_locus = location_with_discriminator (locus, discriminator); + gimple_stmt_iterator gsi; + + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) +{ + gimple stmt = gsi_stmt (gsi); + if (same_line_p (locus, gimple_location (stmt))) +gimple_set_location (stmt, block ? + COMBINE_LOCATION_DATA (line_table, new_locus, block) : + LOCATION_LOCUS (new_locus)); +} I'm not convinced the COMBINE_LOCATION_DATA is needed here, since location_with_discriminator has already done that. And when block == NULL, calling LOCATION_LOCUS is effectively a no-op, isn't it? 
It seems this could simply be: gimple_set_location (stmt, new_locus); But what if block != NULL? Also, you need to add vec.o to OBJS-libcommon in Makefile.in, since input.o now depends on it. Other binaries, like gcov, won't build otherwise. Acked, will update the patch I'm still not happy with a straightforward port of the old patch, though, since discriminator assignment might allocate locations that overlap those used for macro definitions -- there's no checking to see if we've run out of locations in that case. I guess this could be OK for the google branch for now, but not for trunk. I'm looking at alternatives using the adhoc mapping -- do you think it would work if we extend the adhoc mapping to track both a data (i.e., the block pointer) and a discriminator? Yeah, I think that works. The only concern is that it would increase the memory because each locus_adhoc map entry need to have another bit to store the extra info? Thanks, Dehao -cary
Re: Minimize downward code motion during reassociation
On Thu, Apr 25, 2013 at 4:42 AM, Richard Biener richard.guent...@gmail.com wrote: On Fri, Dec 7, 2012 at 9:01 PM, Easwaran Raman era...@google.com wrote: It seems I need to reset the debug uses of a statement before moving the statement itself. The attached patch starts from the leaf to root of the tree to be reassociated and places them at the point where their dependences will be met after reassociation. This bootstraps and I am running the tests. Ok if there are no test failures? + if ((dep_bb != insert_bb +!dominated_by_p (CDI_DOMINATORS, insert_bb, dep_bb)) + || (dep_bb == insert_bb + gimple_uid (insert_stmt) gimple_uid (dep_stmt))) +{ + insert_stmt = dep_stmt; +} superfluous {} Removed. + /* If there are any debug uses of LHS, reset them. */ + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) +{ + if (is_gimple_debug (use_stmt)) +{ + gimple_debug_bind_reset_value (use_stmt); that's only needed for debug stmts that are not dominated by insert_point, no? You are right. I have added the check. + /* Statements marked for throw can not be in the middle of a basic block. So + we can not insert a statement (not marked for throw) immediately after. */ + else if (stmt_can_throw_internal (insert_point)) +{ use stmt_ends_bb_p (insert_point) instead + edge succ_edge = find_fallthru_edge (insert_bb-succs); + insert_bb = succ_edge-dest; + /* Insert STMT at the beginning of the successor basic block. */ + gsi_insert = gsi_after_labels (insert_bb); + gsi_move_before (gsistmt, gsi_insert); are you sure this is a proper insert location? How would you even arrive in this situation given find_insert_point dominance checks? Let's say the last statement in a BB which can throw feeds into a statement that is to be reassociated. Then the insert point would be this statement. Instead, you could always insert at the beginning of its (lone) successor. This is independent of the the find_insert_point checks. Otherwise the patch looks ok - can you add one or two testcases please? 
I have added a test case and submitted at r199048. Thanks, Easwaran Thanks, Richard. Thanks, Easwaran 2012-12-07 Easwaran Raman era...@google.com * tree-ssa-reassoc.c(find_insert_point): New function. (insert_stmt_after): Likewise. (get_def_stmt): Likewise. (ensure_ops_are_available): Likewise. (rewrite_expr_tree): Do not move statements beyond what is necessary. Remove call to swap_ops_for_binary_stmt... (reassociate_bb): ... and move it here. (build_and_add_sum): Assign UIDs for new statements. (linearize_expr): Likewise. (do_reassoc): Renumber gimple statement UIDs. On Thu, Dec 6, 2012 at 1:10 AM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Nov 6, 2012 at 1:54 AM, Easwaran Raman era...@google.com wrote: I am unable to figure out the right way to handle the debug statements. What I tried was to find debug statements that use the SSA name defined by the statement I moved (using SSA_NAME_IMM_USE_NODE) and then moved them as well at the right place. Thus, if I have to move t1 = a + b down (after the definition of 'd'), I also moved all debug statements that use t1 after the new position of t1. That still caused use-before-def problems in ssa_verify. I noticed that the debug statements got modified behind the scenes causing these issues. Any hints on what is the right way to handle the debug statements would be very helpful. I think you cannot (and should not) move debug statements. Instead you have to invalidate them. Otherwise you'll introduce confusion as debug info cannot handle overlapping live ranges. But maybe Alex can clarify. Richard.
[C++ Patch] PR 10207
Hi, in this even older ;) and related issue, we reject empty initializer lists in compound-literals. The fix seems simple: just use cp_parser_braced_list instead of cp_parser_initializer_list, which allows for the special case of empty list. Tested x86_64-linux. There is a nit which I don't want to hide: cp_parser_initializer_list + build_constructor does a little more than cp_parser_braced_list: in build_constructor there is a loop setting TREE_SIDE_EFFECTS and TREE_CONSTANT to the right value for the CONSTRUCTOR overall, which doesn't exist in cp_parser_braced_list. In case it matters - I don't think it does, and we have testcases with side effects in the testsuite - we can't simply add it to cp_parser_braced_list, because its many existing uses are perfectly fine without. A little more code would be needed, not a big issue. Thanks, Paolo. /// /cp 2013-05-18 Paolo Carlini paolo.carl...@oracle.com PR c++/10207 * parser.c (cp_parser_postfix_expression): Use cp_parser_braced_list instead of cp_parser_initializer_list for compound-literals. /testsuite 2013-05-18 Paolo Carlini paolo.carl...@oracle.com PR c++/10207 * g++.dg/ext/complit13.C: New. Index: cp/parser.c === --- cp/parser.c (revision 199043) +++ cp/parser.c (working copy) @@ -5719,7 +5719,7 @@ cp_parser_postfix_expression (cp_parser *parser, b if (cp_parser_allow_gnu_extensions_p (parser) cp_lexer_next_token_is (parser-lexer, CPP_OPEN_PAREN)) { - vecconstructor_elt, va_gc *initializer_list = NULL; + tree initializer = NULL_TREE; bool saved_in_type_id_in_expr_p; cp_parser_parse_tentatively (parser); @@ -5732,21 +5732,19 @@ cp_parser_postfix_expression (cp_parser *parser, b parser-in_type_id_in_expr_p = saved_in_type_id_in_expr_p; /* Look for the `)'. */ cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN); - /* Look for the `{'. */ - cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE); /* If things aren't going well, there's no need to keep going. 
*/ if (!cp_parser_error_occurred (parser)) { - bool non_constant_p; - /* Parse the initializer-list. */ - initializer_list - = cp_parser_initializer_list (parser, non_constant_p); - /* Allow a trailing `,'. */ - if (cp_lexer_next_token_is (parser-lexer, CPP_COMMA)) - cp_lexer_consume_token (parser-lexer); - /* Look for the final `}'. */ - cp_parser_require (parser, CPP_CLOSE_BRACE, RT_CLOSE_BRACE); + if (cp_lexer_next_token_is (parser-lexer, CPP_OPEN_BRACE)) + { + bool non_constant_p; + /* Parse the brace-enclosed initializer list. */ + initializer = cp_parser_braced_list (parser, +non_constant_p); + } + else + cp_parser_simulate_error (parser); } /* If that worked, we're definitely looking at a compound-literal expression. */ @@ -5754,7 +5752,8 @@ cp_parser_postfix_expression (cp_parser *parser, b { /* Warn the user that a compound literal is not allowed in standard C++. */ - pedwarn (input_location, OPT_Wpedantic, ISO C++ forbids compound-literals); + pedwarn (input_location, OPT_Wpedantic, +ISO C++ forbids compound-literals); /* For simplicity, we disallow compound literals in constant-expressions. We could allow compound literals of integer type, whose @@ -5772,10 +5771,8 @@ cp_parser_postfix_expression (cp_parser *parser, b } /* Form the representation of the compound-literal. */ postfix_expression - = (finish_compound_literal -(type, build_constructor (init_list_type_node, - initializer_list), - tf_warning_or_error)); + = (finish_compound_literal (type, initializer, + tf_warning_or_error)); break; } } Index: testsuite/g++.dg/ext/complit13.C === --- testsuite/g++.dg/ext/complit13.C(revision 0) +++ testsuite/g++.dg/ext/complit13.C(working copy) @@ -0,0 +1,11 @@ +// PR c++/10207 +// { dg-options } + +typedef struct { } EmptyStruct; +typedef struct { EmptyStruct Empty; } DemoStruct; + +void Func() +{ + DemoStruct Demo; +
Re: [Patch][gcc-4_7-branch] Backport trunk revision 187838 into gcc-4_7-branch
Ping: http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00429.html The patch fixes a dependency issue in libgcc's Makefile.in by adding 'rm libgcc_tm.stamp' to the clean rule. That was fixed on main trunk (r187838) but not on gcc-4_7-branch. Since gcc-4_7-branch is still open, is it OK to backport r187838 into gcc-4_7-branch? Best regards, jasonwucj
[Patch] Extend script ./contrib/download_prerequisites usage for isl and cloog
Hi all, With the current trunk repository it is now possible to build the compiler with in-tree isl and cloog. This patch extends ./contrib/download_prerequisites to download isl and cloog conditionally, in case people would like to build gcc with the Graphite loop optimizations. OK for the trunk?

contrib/ChangeLog 2013-05-18 Chung-Ju Wu jasonw...@gmail.com * download_prerequisites: Download isl and cloog conditionally.

Index: contrib/download_prerequisites
===
--- contrib/download_prerequisites (revision 199006)
+++ contrib/download_prerequisites (working copy)
@@ -19,6 +19,12 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see http://www.gnu.org/licenses/.

+# If you want to build GCC with the Graphite loop optimizations,
+# set GRAPHITE_LOOP_OPT=yes to download the optional prerequisites
+# ISL Library and CLooG.
+GRAPHITE_LOOP_OPT=no
+
+# Necessary to build GCC.
 MPFR=mpfr-2.4.2
 GMP=gmp-4.3.2
 MPC=mpc-0.8.1
@@ -36,3 +42,19 @@
 ln -sf $MPC mpc || exit 1

 rm $MPFR.tar.bz2 $GMP.tar.bz2 $MPC.tar.gz || exit 1
+
+# Necessary to build GCC with the Graphite loop optimizations.
+if [ "$GRAPHITE_LOOP_OPT" == "yes" ] ; then
+  ISL=isl-0.11.1
+  CLOOG=cloog-0.18.0
+
+  wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$ISL.tar.bz2 || exit 1
+  tar xjf $ISL.tar.bz2 || exit 1
+  ln -sf $ISL isl || exit 1
+
+  wget ftp://gcc.gnu.org/pub/gcc/infrastructure/$CLOOG.tar.gz || exit 1
+  tar xzf $CLOOG.tar.gz || exit 1
+  ln -sf $CLOOG cloog || exit 1
+
+  rm $ISL.tar.bz2 $CLOOG.tar.gz || exit 1
+fi

Best regards, jasonwucj
Re: [PATCH] Fix latent bug in RTL GCSE/PRE (PR57159)
On 05/07/2013 09:05 AM, Julian Brown wrote: On Mon, 6 May 2013 11:53:20 -0600 I don't know. My assumption was that a simple mem would be one that the PRE pass could handle -- that clause was intended to handle simple mems in REG_EQUAL notes the same as simple mems as the source of a set. Maybe that is not safe though, and it would be better to unconditionally invalidate buried mems in REG_EQUAL notes? I was slightly wary of inhibiting genuine optimisation opportunities. For a simple mem, we'll do the right thing, at least that's my reading of the code. I went ahead and put the two code fragments together and committed the change after a bootstrap and regression test on x86_64-unknown-linux-gnu. For reference, attached is the patch that ultimately went into the tree. Thanks, Jeff commit f473eb72a23bc82db0ee23e51fdd40b20417fb15 Author: law law@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Sat May 18 03:48:18 2013 + * gcse.c (compute_ld_motion_mems): If a non-simple MEM is found in a REG_EQUAL note, invalidate it. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@199049 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 4536e62..d6eab5f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-05-17 Julian Brown jul...@codesourcery.com + + * gcse.c (compute_ld_motion_mems): If a non-simple MEM is + found in a REG_EQUAL note, invalidate it. + 2013-05-17 Easwaran Raman era...@google.com * tree-ssa-reassoc.c (find_insert_point): New function. diff --git a/gcc/gcse.c b/gcc/gcse.c index 07043f7..e485985 100644 --- a/gcc/gcse.c +++ b/gcc/gcse.c @@ -3894,6 +3894,8 @@ compute_ld_motion_mems (void) { rtx src = SET_SRC (PATTERN (insn)); rtx dest = SET_DEST (PATTERN (insn)); + rtx note = find_reg_equal_equiv_note (insn); + rtx src_eq; /* Check for a simple LOAD... 
*/ if (MEM_P (src) simple_mem (src)) @@ -3910,6 +3912,15 @@ compute_ld_motion_mems (void) invalidate_any_buried_refs (src); } + if (note != 0 REG_NOTE_KIND (note) == REG_EQUAL) + src_eq = XEXP (note, 0); + else + src_eq = NULL_RTX; + + if (src_eq != NULL_RTX + !(MEM_P (src_eq) simple_mem (src_eq))) + invalidate_any_buried_refs (src_eq); + /* Check for stores. Don't worry about aliased ones, they will block any movement we might do later. We only care about this exact pattern since those are the only