Re: [PATCH,GRAPHITE] Fix for P1 bug 58028
On Wed, Feb 26, 2014 at 10:13 PM, Tobias Grosser tob...@grosser.es wrote: On 02/26/2014 10:09 PM, Mircea Namolaru wrote: This patch fixes the libgomp problems: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58028 2014-02-26 Tobias Grosser tob...@grosser.es Mircea Namolaru mnamo...@inria.fr Fix for bug 58028 * graphite-clast-to-gimple.c (set_cloog_options): Don't remove scalar dimensions Index: gcc/graphite-clast-to-gimple.c === --- gcc/graphite-clast-to-gimple.c (revision 207298) +++ gcc/graphite-clast-to-gimple.c (working copy) @@ -1522,6 +1522,13 @@ variables. */ options-save_domains = 1; + /* Do not remove scalar dimensions. Cloog be default removes scalar + dimensions very early from the input schedule. However, they are + necessary to correctly derive from the saved domains + (options-save_domains) the relationship between the generated loops + and the schedule dimensions they are generated from. */ + options-noscalars = 1; + /* Disable optimizations and make cloog generate source code closer to the input. This is useful for debugging, but later we want the optimized code. Tested on x86-64 Linux, no regressions for c/c++/fortran. Got a notification failure from gcc-patches for the initial submission of this patch. However, Tobias (on cc) received it and already sent his comments, (see http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01540.html), that are addressed in this re-submission. Yes, very nice. Thanks Mircea for reacting so quickly. This patch look good to me. @Release-managers: Is this patch OK for trunk? It fixes a P1 regression, so yes. Richard. Cheers, Tobias
Re: PR60147: fix ICE with DECL_NAMELIST and tree-pretty-printer.c
On Wed, Feb 26, 2014 at 11:19 PM, Tobias Burnus bur...@net-b.de wrote: Dear all, as suggested by Richard, it now only prints the namelist name and no longer the variables of the namelist. Bootstrapped on x86-64-gnu-linux and currently regtesting. OK for the trunk when it succeeds? Works for me, but doesn't the simpler Index: tree-pretty-print.c === --- tree-pretty-print.c (revision 208066) +++ tree-pretty-print.c (working copy) @@ -1390,6 +1390,7 @@ case FIELD_DECL: case DEBUG_EXPR_DECL: case NAMESPACE_DECL: +case NAMELIST_DECL: dump_decl_name (buffer, node, flags); break; also work? It's odd that we need to do sth special just for namelist-decls. Richard. Tobias On February 22, 2014 10:00, Tobias Burnus wrote: Since GCC 4.9, gfortran generates a DECL_NAMELIST (for DWARF's DW_TAG_namelist) - they are stored in the BIND_EXPR. Namelists are a bit boring: They only group variable names and the namelist name is only used in I/O statements (READ, WRITE) to permit a fancy data input [and output] - and for the debugger. Due to DW_TAG_namelist, namelists are now exposed to the middle end - and I forgot to handle them also in the tree pretty printer - hence -fdump-tree-original now ICEs. For the pretty printer one has two options: Ignoring (or NYI;) the decl or dumping it. The attached patch does the latter. Bootstrapped (C/C++/Fortran) and regtested on x86-64-gnu-linux. OK for the trunk? Tobias
Re: [PR debug/57232] don't record non-fixed reg as CFA base value
On Thu, Feb 27, 2014 at 6:28 AM, Alexandre Oliva aol...@redhat.com wrote: Some ports were failing an assertion check that was supposed to make sure some RTX created a new VALUE, rather than reuse an existing one in the cselib tables. The reason the value was already there was that we'd recorded the register in the permanent table as the CFA, but the register was correctly used for other purposes by the compiler itself, we'd just failed to refrain from take note of that in this particular code path. Other paths that register the CFA base value are guarded by this condition, and guarding this one fixed the problem on the affected ports. Regstrapped on x86_64-linux-gnu and i686-linux-gnu; in the PR, the problem is confirmed fixed by this patch on the rx and lm32 ports too. Ok? Ok. Thanks, Richard. for gcc/ChangeLog PR debug/57232 * var-tracking.c (vt_initialize): Apply the same condition to preserve the CFA base value. --- gcc/var-tracking.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c index c5ce1dd..65d8285 100644 --- a/gcc/var-tracking.c +++ b/gcc/var-tracking.c @@ -9924,7 +9924,8 @@ vt_initialize (void) val = cselib_lookup_from_insn (reg, GET_MODE (reg), 1, VOIDmode, get_insns ()); preserve_value (val); - cselib_preserve_cfa_base_value (val, REGNO (reg)); + if (reg != hard_frame_pointer_rtx fixed_regs[REGNO (reg)]) + cselib_preserve_cfa_base_value (val, REGNO (reg)); expr = plus_constant (GET_MODE (stack_pointer_rtx), stack_pointer_rtx, -ofst); cselib_add_permanent_equiv (val, expr, get_insns ()); -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Toolchain Engineer
Re: [PR debug/59992 #1/2] avoid quadratic behavior for the removal of useless values
On Thu, Feb 27, 2014 at 6:54 AM, Alexandre Oliva aol...@redhat.com wrote: We indirectly call remove_useless_values quite often during vt_initialize; at least once per extended basic block. On functions with thousands of small basic blocks, each adding permanent and temporary entries to the table, that turns out to be quite expensive: the permanent entries pile up and trigger the quadratic behavior mentioned in teh guard of one of the two calls of remove_useless_values. This patch moves the guard so that it applies to the other as well. I wasn't entirely sure this wouldn't invalidate assumptions made in callers of cselib_preserve_only_values (the function called at the end of each extended basic block), but some analysis, plus comparing before and after assembly output ;-), made sure it didn't ;-) This patch alone cut down the vt_initialize time of insn-recog on generic i686 from more than 1700 seconds (~84% of the total compile time) to about 500 seconds. Regstrapped on x86_64-linux-gnu and i686-linux-gnu, along with the other patch for PR debug/59992 that I'm about to post. Ok. Thanks, Richard. for gcc/ChangeLog PR debug/59992 * cselib.c (remove_useless_values): Skip to avoid quadratic behavior if the condition moved from... (cselib_process_insn): ... here holds. --- gcc/cselib.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/gcc/cselib.c b/gcc/cselib.c index 525e717..dabd2d3 100644 --- a/gcc/cselib.c +++ b/gcc/cselib.c @@ -662,6 +662,14 @@ remove_useless_values (void) { cselib_val **p, *v; + if (n_useless_values = MAX_USELESS_VALUES + /* remove_useless_values is linear in the hash table size. Avoid + quadratic behavior for very large hashtables with very few +useless elements. */ + || ((unsigned int)n_useless_values + = (cselib_hash_table.elements () - n_debug_values) / 4)) +return; + /* First pass: eliminate locations that reference the value. That in turn can make more values useless. */ do @@ -2693,13 +2701,7 @@ cselib_process_insn (rtx insn) cselib_current_insn = NULL_RTX; - if (n_useless_values MAX_USELESS_VALUES - /* remove_useless_values is linear in the hash table size. Avoid - quadratic behavior for very large hashtables with very few -useless elements. */ - ((unsigned int)n_useless_values - (cselib_hash_table.elements () - n_debug_values) / 4)) -remove_useless_values (); + remove_useless_values (); } /* Initialize cselib for one pass. The caller must also call -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Toolchain Engineer
Re: [PATCH, GOMP4] Fix OpenACC async clause
Hi! On Wed, 26 Feb 2014 19:32:35 +0400, Ilmir Usmanov i.usma...@samsung.com wrote: This patch is pretty obvious. Currently ASYNC clause cannot have integer-expression-list. Patch fixes this. OK for gomp4 branch? From df76a29ebf869687209d7a606e243624cc136dbc Mon Sep 17 00:00:00 2001 From: Ilmir Usmanov i.usma...@samsung.com Date: Wed, 26 Feb 2014 19:04:47 +0400 Subject: [PATCH 4/5] Fix ASYNC --- gcc/gimplify.c | 4 ++-- gcc/omp-low.c | 4 ++-- gcc/tree-core.h | 6 +++--- gcc/tree.c | 6 +++--- 4 files changed, 10 insertions(+), 10 deletions(-) Sure, with ChangeLog added, and the following changes folded in, I think: --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -6545,11 +6545,11 @@ gimplify_adjust_omp_clauses (tree *list_p) case OMP_CLAUSE_OACC_DEVICE: case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_USE_DEVICE: + case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_GANG: case OMP_CLAUSE_WAIT: GANG/ASYNC/WAIT, for consistency? --- a/gcc/tree.c +++ b/gcc/tree.c @@ -263,6 +263,7 @@ unsigned const char omp_clause_num_ops[] = 1, /* OMP_CLAUSE_OACC_DEVICE */ 1, /* OMP_CLAUSE_DEVICE_RESIDENT */ 1, /* OMP_CLAUSE_USE_DEVICE */ + 1, /* OMP_CLAUSE_ASYNC */ 1, /* OMP_CLAUSE_GANG */ 1, /* OMP_CLAUSE_WAIT */ Likewise. For consistency, also move OMP_CLAUSE_ASYNC_EXPR in gcc/tree.h, and also adapt OMP_CLAUSE_ASYNC in gcc/tree-pretty-print.c:dump_omp_clause? Grüße, Thomas pgpQJ0AeLLvJc.pgp Description: PGP signature
Re: [PATCH][AArch64] Use Cortex-A57 rtx costs for the generic CPU
On 26/02/14 14:48, Kyrill Tkachov wrote: Hi all, The generic rtx cost table was written with AArch32 cores in mind. It would be a better idea to use the Cortex-A57 costs for the generic CPU in aarch64. That way we schedule for the Cortex-A53 and do instruction selection for the Cortex-A57. Since generic is the default CPU, this patch can change default code generation but is highly unlikely to affect correctness (unless it exposes latent bugs) since it only affects instruction selection. Tested aarch64-none-elf on a model with no regressions. Is it ok for trunk now or shall we wait for stage1? Thanks, Kyrill 2014-02-26 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (generic_tunings): Use cortexa57_extra_costs. I think this is the right direction to go, and as such I'd prefer that we did it for 4.9. Please give the RMs 24 hours to comment. But otherwise OK. R.
Re: PR60147: fix ICE with DECL_NAMELIST and tree-pretty-printer.c
Hi Richard, hi all, I wrote: @@ -1713,21 +1733,24 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags, case BIND_EXPR: ... for (op0 = BIND_EXPR_VARS (node); op0; op0 = DECL_CHAIN (op0)) { -print_declaration (buffer, op0, spc+2, flags); +if (TREE_CODE(op0) == NAMELIST_DECL) ... Richard Biener wrote: Works for me, but doesn't the simpler --- tree-pretty-print.c (revision 208066) +++ tree-pretty-print.c (working copy) @@ -1390,6 +1390,7 @@ case FIELD_DECL: case DEBUG_EXPR_DECL: case NAMESPACE_DECL: +case NAMELIST_DECL: dump_decl_name (buffer, node, flags); break; also work? It's odd that we need to do sth special just for namelist-decls. It won't do - at least not like that. NAMELIST_DECL are stored in BIND_EXPR, hence, one ends up in the code I quote above. That code calls print_declaration() NAMELIST_DECL are created as decl = make_node (NAMELIST_DECL); TREE_TYPE (decl) = void_type_node; NAMELIST_DECL_ASSOCIATED_DECL (decl) = build_constructor (NULL_TREE, nml_decls); Hence, TREE_TYPE() is void_type_node and dereferencing that one leads to an ICE. However, the following should work - do you like it more? Tobias --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -2686,6 +2686,13 @@ print_declaration (pretty_printer *buffer, tree t, int spc, int flags) { INDENT (spc); + if (TREE_CODE(t) == NAMELIST_DECL) +{ + pp_string(buffer, namelist ); + dump_decl_name (buffer, t, flags); + goto done; +} + if (TREE_CODE (t) == TYPE_DECL) pp_string (buffer, typedef ); @@ -2767,6 +2774,7 @@ print_declaration (pretty_printer *buffer, tree t, int spc, int flags) pp_right_bracket (buffer); } +done: pp_semicolon (buffer); }
Re: [PING^3][PATCH] Add a couple of dialect and warning options regarding Objective-C instance variable scope
Ping! On 02/20/2014 12:11 PM, Dimitris Papavasiliou wrote: Hello all, Pinging this patch review request again. See previous messages quoted below for details. Regards, Dimitris On 02/13/2014 04:22 PM, Dimitris Papavasiliou wrote: Hello, Pinging this patch review request. Can someone involved in the Objective-C language frontend have a quick look at the description of the proposed features and tell me if it'd be ok to have them in the trunk so I can go ahead and create proper patches? Thanks, Dimitris On 02/06/2014 11:25 AM, Dimitris Papavasiliou wrote: Hello, This is a patch regarding a couple of Objective-C related dialect options and warning switches. I have already submitted it a while ago but gave up after pinging a couple of times. I am now informed that should have kept pinging until I got someone's attention so I'm resending it. The patch is now against an old revision and as I stated originally it's probably not in a state that can be adopted as is. I'm sending it as is so that the implemented features can be assesed in terms of their usefulness and if they're welcome I'd be happy to make any necessary changes to bring it up-to-date, split it into smaller patches, add test-cases and anything else that is deemed necessary. Here's the relevant text from my initial message: Two of these switches are related to a feature request I submitted a while ago, Bug 56044 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56044). I won't reproduce the entire argument here since it is available in the feature request. The relevant functionality in the patch comes in the form of two switches: -Wshadow-ivars which controls the local declaration of ‘somevar’ hides instance variable warning which curiously is enabled by default instead of being controlled at least by -Wshadow. The patch changes it so that this warning can be enabled and disabled specifically through -Wshadow-ivars as well as with all other shadowing-related warnings through -Wshadow. The reason for the extra switch is that, while searching through the Internet for a solution to this problem I have found out that other people are inconvenienced by this particular warning as well so it might be useful to be able to turn it off while keeping all the other shadowing-related warnings enabled. -flocal-ivars which when true, as it is by default, treats instance variables as having local scope. If false (-fno-local-ivars) instance variables must always be referred to as self-ivarname and references of ivarname resolve to the local or global scope as usual. I've also taken the opportunity of adding another switch unrelated to the above but related to instance variables: -fivar-visibility which can be set to either private, protected (the default), public and package. This sets the default instance variable visibility which normally is implicitly protected. My use-case for it is basically to be able to set it to public and thus effectively disable this visibility mechanism altogether which I find no use for and therefore have to circumvent. I'm not sure if anyone else feels the same way towards this but I figured it was worth a try. I'm attaching a preliminary patch against the current revision in case anyone wants to have a look. The changes are very small and any blatant mistakes should be immediately obvious. I have to admit to having virtually no knowledge of the internals of GCC but I have tried to keep in line with formatting guidelines and general style as well as looking up the particulars of the way options are handled in the available documentation to avoid blind copy-pasting. I have also tried to test the functionality both in my own (relatively large, or at least not too small) project and with small test programs and everything works as expected. Finallly, I tried running the tests too but these fail to complete both in the patched and unpatched version, possibly due to the way I've configured GCC. Dimitris
Re: [PATCH,GRAPHITE] Fix for P1 bug 58028
Hi Tobias and Mircea, On 02/26/2014 10:09 PM, Mircea Namolaru wrote: + /* Do not remove scalar dimensions. Cloog be default removes scalar + dimensions very early from the input schedule. However, they are Small nit: Instead of 'be default' it should be 'by default' Tobias
Re: PR60147: fix ICE with DECL_NAMELIST and tree-pretty-printer.c
On Thu, Feb 27, 2014 at 10:37 AM, Tobias Burnus tobias.bur...@physik.fu-berlin.de wrote: Hi Richard, hi all, I wrote: @@ -1713,21 +1733,24 @@ dump_generic_node (pretty_printer *buffer, tree node, int spc, int flags, case BIND_EXPR: ... for (op0 = BIND_EXPR_VARS (node); op0; op0 = DECL_CHAIN (op0)) { -print_declaration (buffer, op0, spc+2, flags); +if (TREE_CODE(op0) == NAMELIST_DECL) ... Richard Biener wrote: Works for me, but doesn't the simpler --- tree-pretty-print.c (revision 208066) +++ tree-pretty-print.c (working copy) @@ -1390,6 +1390,7 @@ case FIELD_DECL: case DEBUG_EXPR_DECL: case NAMESPACE_DECL: +case NAMELIST_DECL: dump_decl_name (buffer, node, flags); break; also work? It's odd that we need to do sth special just for namelist-decls. It won't do - at least not like that. NAMELIST_DECL are stored in BIND_EXPR, hence, one ends up in the code I quote above. That code calls print_declaration() NAMELIST_DECL are created as decl = make_node (NAMELIST_DECL); TREE_TYPE (decl) = void_type_node; NAMELIST_DECL_ASSOCIATED_DECL (decl) = build_constructor (NULL_TREE, nml_decls); Hence, TREE_TYPE() is void_type_node and dereferencing that one leads to an ICE. Where is it dereferenced? void_type_node is a valid type. However, the following should work - do you like it more? Tobias --- a/gcc/tree-pretty-print.c +++ b/gcc/tree-pretty-print.c @@ -2686,6 +2686,13 @@ print_declaration (pretty_printer *buffer, tree t, int spc, int flags) { INDENT (spc); + if (TREE_CODE(t) == NAMELIST_DECL) +{ + pp_string(buffer, namelist ); + dump_decl_name (buffer, t, flags); + goto done; +} + if (TREE_CODE (t) == TYPE_DECL) pp_string (buffer, typedef ); @@ -2767,6 +2774,7 @@ print_declaration (pretty_printer *buffer, tree t, int spc, int flags) pp_right_bracket (buffer); } +done: pp_semicolon (buffer); Instead of the goto please duplicate the pp_semicolon and return. Ok with that change. You probably still want to add the NAMELIST_DECL case to dump_generic_node (to avoid the Unknown tree: ...). Thanks, Richard. }
Re: [PATCH, GOMP4] Fix OpenACC async clause
Hi Thomas! Fixed patch and ChangeLog in attachment. On 27.02.2014 12:59, Thomas Schwinge wrote: GANG/ASYNC/WAIT, for consistency? Fixed. For consistency, also move OMP_CLAUSE_ASYNC_EXPR in gcc/tree.h, and also adapt OMP_CLAUSE_ASYNC in gcc/tree-pretty-print.c:dump_omp_clause? It is already adapted: case OMP_CLAUSE_ASYNC: pp_string (buffer, async); if (OMP_CLAUSE_DECL (clause)) { pp_character(buffer, '('); dump_generic_node (buffer, OMP_CLAUSE_DECL (clause), spc, flags, false); pp_character(buffer, ')'); } break; -- Ilmir. From bff6eb6dd930316bab91f222c1673e699fa97b96 Mon Sep 17 00:00:00 2001 From: Ilmir Usmanov i.usma...@samsung.com Date: Thu, 27 Feb 2014 13:46:24 +0400 Subject: [PATCH] Fix ASYNC --- gcc/gimplify.c | 4 ++-- gcc/omp-low.c | 4 ++-- gcc/tree-core.h | 6 +++--- gcc/tree.c | 6 +++--- gcc/tree.h | 6 +++--- 5 files changed, 13 insertions(+), 13 deletions(-) diff --git a/gcc/gimplify.c b/gcc/gimplify.c index fd4305c..d0a4779 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -6186,10 +6186,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_GANG: + case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WAIT: case OMP_NO_CLAUSE_CACHE: case OMP_CLAUSE_INDEPENDENT: - case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WORKER: case OMP_CLAUSE_VECTOR: case OMP_CLAUSE_NUM_GANGS: @@ -6546,10 +6546,10 @@ gimplify_adjust_omp_clauses (tree *list_p) case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_GANG: + case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WAIT: case OMP_NO_CLAUSE_CACHE: case OMP_CLAUSE_INDEPENDENT: - case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WORKER: case OMP_CLAUSE_VECTOR: case OMP_CLAUSE_NUM_GANGS: diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 6dec687..eec862e 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -1776,10 +1776,10 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_GANG: + case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WAIT: case OMP_NO_CLAUSE_CACHE: case OMP_CLAUSE_INDEPENDENT: - case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WORKER: case OMP_CLAUSE_VECTOR: case OMP_CLAUSE_NUM_GANGS: @@ -1916,10 +1916,10 @@ scan_sharing_clauses (tree clauses, omp_context *ctx) case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_GANG: + case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WAIT: case OMP_NO_CLAUSE_CACHE: case OMP_CLAUSE_INDEPENDENT: - case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WORKER: case OMP_CLAUSE_VECTOR: case OMP_CLAUSE_NUM_GANGS: diff --git a/gcc/tree-core.h b/gcc/tree-core.h index fcdeb44..e9aeb65 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -286,6 +286,9 @@ enum omp_clause_code { size-expression: * | integer-expression. */ OMP_CLAUSE_GANG, + /* OpenACC clause: async [(integer-expression)]. */ + OMP_CLAUSE_ASYNC, + /* OpenACC clause/directive: wait [(integer-expression-list)]. */ OMP_CLAUSE_WAIT, @@ -372,9 +375,6 @@ enum omp_clause_code { /* OpenACC clause: independent. */ OMP_CLAUSE_INDEPENDENT, - /* OpenACC clause: async [(integer-expression)]. */ - OMP_CLAUSE_ASYNC, - /* OpenACC clause: worker [( [num:] integer-expression)]. */ OMP_CLAUSE_WORKER, diff --git a/gcc/tree.c b/gcc/tree.c index 0655db0..d9a577b 100644 --- a/gcc/tree.c +++ b/gcc/tree.c @@ -264,6 +264,7 @@ unsigned const char omp_clause_num_ops[] = 1, /* OMP_CLAUSE_DEVICE_RESIDENT */ 1, /* OMP_CLAUSE_USE_DEVICE */ 1, /* OMP_CLAUSE_GANG */ + 1, /* OMP_CLAUSE_ASYNC */ 1, /* OMP_CLAUSE_WAIT */ 1, /* OMP_NO_CLAUSE_CACHE */ 1, /* OMP_CLAUSE__LOOPTEMP_ */ @@ -292,7 +293,6 @@ unsigned const char omp_clause_num_ops[] = 0, /* OMP_CLAUSE_TASKGROUP */ 1, /* OMP_CLAUSE__SIMDUID_ */ 0, /* OMP_CLAUSE_INDEPENDENT */ - 1, /* OMP_CLAUSE_ASYNC */ 1, /* OMP_CLAUSE_WORKER */ 1, /* OMP_CLAUSE_VECTOR */ 1, /* OMP_CLAUSE_NUM_GANGS */ @@ -322,6 +322,7 @@ const char * const omp_clause_code_name[] = device_resident, use_device, gang, + async, wait, _cache_, _looptemp_, @@ -350,7 +351,6 @@ const char * const omp_clause_code_name[] = taskgroup, _simduid_, independent, - async, worker, vector, num_gangs, @@ -11062,9 +11062,9 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data, case OMP_CLAUSE_DEVICE_RESIDENT: case OMP_CLAUSE_USE_DEVICE: case OMP_CLAUSE_GANG: + case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WAIT: case OMP_NO_CLAUSE_CACHE: - case OMP_CLAUSE_ASYNC: case OMP_CLAUSE_WORKER: case OMP_CLAUSE_VECTOR: case OMP_CLAUSE_NUM_GANGS: diff --git a/gcc/tree.h b/gcc/tree.h index 4d598d1..84801ff 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1315,15 +1315,15 @@ extern void protected_set_expr_location (tree, location_t); #define
Re: [PATCH,GRAPHITE] Fix for P1 bug 58028
Tobias Burnus tobias.bur...@physik.fu-berlin.de writes: On 02/26/2014 10:09 PM, Mircea Namolaru wrote: + /* Do not remove scalar dimensions. Cloog be default removes scalar + dimensions very early from the input schedule. However, they are Small nit: Instead of 'be default' it should be 'by default' ... and two spaces after the full stops. While you're at it, please fix the CLooG capitalization ;-) Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, GOMP4] Fix OpenACC async clause
Hi Ilmir! On Thu, 27 Feb 2014 13:55:36 +0400, Ilmir Usmanov i.usma...@samsung.com wrote: Fixed patch and ChangeLog in attachment. By the way, what I usually do is to put a change's ChangeLog snippet into that Git commit's commit message, and then once I get to the point where I push something upstream, I can just take it from there, and put it into the respective ChangeLog file(s). Just a suggestions -- no idea whether that fits your workflow. On 27.02.2014 12:59, Thomas Schwinge wrote: For consistency, also move OMP_CLAUSE_ASYNC_EXPR in gcc/tree.h, and also adapt OMP_CLAUSE_ASYNC in gcc/tree-pretty-print.c:dump_omp_clause? It is already adapted: case OMP_CLAUSE_ASYNC: Move between handling of OMP_CLAUSE_GANG and OMP_CLAUSE_ASYNC? 2014-02-27 Ilmir Usmanov i.usma...@samsung.com Fix OpenACC ASYNC clause: it can contain integer-expression-list. s%can%cannot, I think? * gcc/ tree-core.h (enum omp_clause_code): Likewise. tree.h (OMP_CLAUSE_ASYNC_EXPR): Reorder. tree.c (omp_clause_num_ops, omp_clause_code_name): Reorder clauses. omp-low.c (scan_sharing_clauses): Likewise. gimplify.c (gimplify_scan_omp_clauses): Likewise. Can't start that with »Likewise« ;-), so change that or reorder the entries. With that changed, it's OK to commit (don't need to repost), thanks! Grüße, Thomas pgpmxlR9xEbFU.pgp Description: PGP signature
Re: PR60147: fix ICE with DECL_NAMELIST and tree-pretty-printer.c
On Thu, Feb 27, 2014 at 10:43:33AM +0100, Richard Biener wrote: hence, one ends up in the code I quote above. That code calls print_declaration() ... Hence, TREE_TYPE() is void_type_node and dereferencing that one leads to an ICE. Sorry, I misremembered. In any case, the following leads to an ICE. print_declaration has the code: if (TREE_CODE (t) != FUNCTION_DECL) if (DECL_INITIAL (t)) ... dump_generic_node (buffer, DECL_INITIAL (t), spc, flags, false); which handles the CONSTRUCTOR - which fails with: else if (TREE_CODE (TREE_TYPE (node)) == RECORD_TYPE as TREE_TYPE (node) == NULL. Instead of the goto please duplicate the pp_semicolon and return. Ok with that change. You probably still want to add the NAMELIST_DECL case to dump_generic_node (to avoid the Unknown tree: ...). Fine with me. I assume that that is only relevant for calls to debug_tree() in the debugger? Because otherwise, it shouldn't be reachable. Thanks for the review! Tobias
Re: PR60147: fix ICE with DECL_NAMELIST and tree-pretty-printer.c
On Thu, Feb 27, 2014 at 11:39 AM, Tobias Burnus tobias.bur...@physik.fu-berlin.de wrote: On Thu, Feb 27, 2014 at 10:43:33AM +0100, Richard Biener wrote: hence, one ends up in the code I quote above. That code calls print_declaration() ... Hence, TREE_TYPE() is void_type_node and dereferencing that one leads to an ICE. Sorry, I misremembered. In any case, the following leads to an ICE. print_declaration has the code: if (TREE_CODE (t) != FUNCTION_DECL) if (DECL_INITIAL (t)) ... dump_generic_node (buffer, DECL_INITIAL (t), spc, flags, false); which handles the CONSTRUCTOR - which fails with: else if (TREE_CODE (TREE_TYPE (node)) == RECORD_TYPE as TREE_TYPE (node) == NULL. Instead of the goto please duplicate the pp_semicolon and return. Ok with that change. You probably still want to add the NAMELIST_DECL case to dump_generic_node (to avoid the Unknown tree: ...). Fine with me. I assume that that is only relevant for calls to debug_tree() in the debugger? Because otherwise, it shouldn't be reachable. Yes. Thanks for the review! Tobias
Re: C++ PATCH for lto/53808 (devirtualization of defaulted virtual dtor)
Jason Merrill ja...@redhat.com writes: diff --git a/gcc/testsuite/g++.dg/opt/devirt4.C b/gcc/testsuite/g++.dg/opt/devirt4.C new file mode 100644 index 000..26e8ee6 --- /dev/null +++ b/gcc/testsuite/g++.dg/opt/devirt4.C @@ -0,0 +1,16 @@ +// PR c++/53808 +// Devirtualization + inlining should produce a non-virtual +// call to ~foo. +// { dg-options -O -fdevirtualize } +// { dg-final { scan-assembler _ZN3fooD2Ev } } + +struct foo { + virtual ~foo(); +}; +struct bar : public foo { + virtual void zed(); +}; +void f() { + foo *x(new bar); + delete x; +} $ gcc/xg++ -Bgcc/ ../gcc/testsuite/g++.dg/opt/devirt4.C -nostdinc++ -Iia64-suse-linux/libstdc++-v3/include/ia64-suse-linux -Iia64-suse-linux/libstdc++-v3/include -Ilibstdc++-v3/libsupc++ -I../libstdc++-v3/include/backward -I../libstdc++-v3/testsuite/util -std=gnu++11 -O -fdevirtualize -ffat-lto-objects -S -o devirt4.s $ cat devirt4.s .file devirt4.C .pred.safe_across_calls p1-p5,p16-p63 .sdata .align 8 .LC0: data8 _ZTV3bar#+16 .align 8 .LC1: data8 _ZTV3bar#+32 .text .align 16 .global _Z1fv# .type _Z1fv#, @function .proc _Z1fv# _Z1fv: [.LFB0:] .prologue 12, 32 .save ar.pfs, r33 alloc r33 = ar.pfs, 0, 3, 1, 0 .save rp, r32 mov r32 = b0 mov r34 = r1 .body addl r35 = 8, r0 ;; br.call.sptk.many b0 = _Znwm# mov r1 = r34 ;; addl r14 = @gprel(.LC0), gp ;; ld8 r14 = [r14] ;; st8 [r8] = r14 cmp.eq p6, p7 = 0, r8 (p6) br.cond.dpnt .L1 mov r35 = r8 addl r14 = @gprel(.LC1), gp ;; ld8 r14 = [r14] ;; ld8 r15 = [r14], 8 ;; mov b6 = r15 ld8 r1 = [r14] br.call.sptk.many b0 = b6 ;; mov r1 = r34 .L1: mov ar.pfs = r33 mov b0 = r32 br.ret.sptk.many b0 ;; .endp _Z1fv# .ident GCC: (GNU) 4.9.0 20140227 (experimental) [trunk revision 208194] Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [PATCH, GOMP4] Fix OpenACC async clause
Committed as r208197. -- Ilmir.
Re: [PATCH] Fix PR 60268
Thanks, this is what I have installed. Andreas. * config/m68k/m68k.c (m68k_option_override): Disable -flive-range-shrinkage for classic m68k. (m68k_override_options_after_change): Likewise. diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c index f20d071..7f7d668 100644 --- a/gcc/config/m68k/m68k.c +++ b/gcc/config/m68k/m68k.c @@ -642,6 +642,7 @@ m68k_option_override (void) flag_schedule_insns = 0; flag_schedule_insns_after_reload = 0; flag_modulo_sched = 0; + flag_live_range_shrinkage = 0; } if (m68k_sched_cpu != CPU_UNKNOWN) @@ -665,6 +666,7 @@ m68k_override_options_after_change (void) flag_schedule_insns = 0; flag_schedule_insns_after_reload = 0; flag_modulo_sched = 0; + flag_live_range_shrinkage = 0; } } -- 1.9.0 -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [PATCH][AARCH64]PR60034
On 21 February 2014 04:24, Kugan kugan.vivekanandara...@linaro.org wrote: Compiling inline asm results in ICE (PR60034). Alignment calculation in aarch64_classify_address for (symbol_ref:DI (*.LANCHOR4) [flags 0x182])) seems wrong here. Hi Kugan, + else if (SYMBOL_REF_FLAGS (sym)) + align = GET_MODE_ALIGNMENT (GET_MODE (sym)); This is inserted into the LO_SUM handling in the function aarch64_classify_address(), the code in question is checking the alignment of the object to ensure that a scaled address instruction would be valid. The proposed code is testing if any of a bunch of unrelated predicate flags have been set on the symbol and using that to gate whether GET_MODE_ALIGNMENT would give accurate alignment information on the symbol. I'm not convinced that the presence of SYMBOL_REF_FLAGS states anything definitive about the relevance of GET_MODE_ALIGNMENT. The test looks like it fails because a section anchor has been introduced and we fail to determine anything sensible about the alignment of a section anchor. How about this instead? if (SYMBOL_REF_BLOCK (sym)) align = SYMBOL_REF_BLOCK (sym)-alignment; Fixing this also caused a regression for pr38151.c, which is due to complex type being allocated with wrong alignment. Attached patch fixes these issues. It ~might~ be beneficial to increase data_alignment here as suggest for performance reasons, but the existing alignment should not cause breakage... this issue suggest to me that the SYMBOL_REF_FLAGS approach is at fault. Cheers /Marcus
[Patch ARM] Fix failing gcc.dg/atomics/atomic-exec-2.c - movmisalignmode expanders.
Hi, This is a case where we end up with an ICE in movmisalignmode where mode is DImode because the address so generated is aligned to 32 bits as a result of V_C_E on an _Atomic Complex float temporary to a DImode temporary. The problem also shows up only when you have -mfpu=neon turned on because these expanders are only valid on hardware where you have neon and not elsewhere. Now, the problem we have is that the address in this case is unspec:DI [ (mem/c:DI (plus:SI (reg/f:SI 1073) (const_int -8 [0xfff8])) [0 S8 A32]) ] UNSPEC_MISALIGNED_ACCESS)) ../gcc/besttry.c:38 -1 (nil)) Now the problem here is that we don't allow anything but the plus (virtual*-regs) (offset) type addresses in our predicates but the expander allows the generation in that form. So, forcing the memory addresses to be the right form is the right thing to be doing here with movmisalignmode unfortunately. Given we have no other way of controlling illegitimate misaligned addresses, this is the only place to fix this at this point of time. Longer term maybe we should look at relaxing the predicates and allowing LRA / reload to deal with this but that's not suitable for stage4. Tested by bootstrap on arm-linux-gnueabihf with neon, no regressions and the testcase passes. Will apply if RMs don't object in 24 hours. regards Ramana * config/arm/neon.md (movmisalignmode): Legitimize addresses not allowed by recognizers. P.S. I had my share of the fun with the C11 atomics and realized that the standard allows for the size and alignment of atomic types to be different from the equivalent base types (n1570 - 6.2.5 = 27). Keeping the 2 out of sync is a bit unfortunate but required for atomic access (in this case we can only have atomic access to 8byte objects on 8 byte aligned addresses). -- Ramana Radhakrishnan Principal Engineer ARM Ltd.diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index 2f06e42..aad420c 100644 --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -245,12 +245,23 @@ UNSPEC_MISALIGNED_ACCESS))] TARGET_NEON !BYTES_BIG_ENDIAN unaligned_access { + rtx adjust_mem; /* This pattern is not permitted to fail during expansion: if both arguments are non-registers (e.g. memory := constant, which can be created by the auto-vectorizer), force operand 1 into a register. */ if (!s_register_operand (operands[0], MODEmode) !s_register_operand (operands[1], MODEmode)) operands[1] = force_reg (MODEmode, operands[1]); + + if (s_register_operand (operands[0], MODEmode)) +adjust_mem = operands[1]; + else +adjust_mem = operands[0]; + + /* Legitimize address. */ + if (!neon_vector_mem_operand (adjust_mem, 2, true)) +XEXP (adjust_mem, 0) = force_reg (Pmode, XEXP (adjust_mem, 0)); + }) (define_insn *movmisalignmode_neon_store
Re: [PATCH, AArch64] Define __ARM_NEON by default
On 24 February 2014 10:01, Ian Bolton ian.bol...@arm.com wrote: Hi, This is needed for when people are porting their aarch32 code to aarch64. They will have #ifdef __ARM_NEON (as specified in ACLE) and their intrinsics currently won't get used on aarch64 because it's not defined there by default. This patch defines __ARM_NEON so long as we are not using general regs only. Tested on simple testcase to ensure __ARM_NEON was defined. OK for trunk? OK provide no objection in next 24 hours. /Marcus
Re: [PING][PATCH][AARCH64]Resolves testsuite/gcc.target/aarch64/aapcs64/ret-func-1.c regression
On 24 February 2014 09:49, Renlin Li renlin...@arm.com wrote: gcc/testsuite/ChangeLog: 2014-02-03 Renlin Li renlin...@arm.com * gcc.target/aarch64/aapcs64/validate_memory.h: Move f32in64 and i32in128 cases outside special big-endian processing block. This is is a fix for a broken test case, this is OK. /Marcus
Re: [PATCH][AARCH64]Adjust address with offset assembler format
On 12 February 2014 16:10, Renlin Li renlin...@arm.com wrote: gcc/ChangeLog: 2014-02-12 Renlin Li renlin...@arm.com * config/aarch64/aarch64.c (aarch64_print_operand_address): Adjust the output asm format by adding a space between base register and offset. OK for stage-1. /Marcus
Re: copyright dates in binutils (and includes/)
Hi Alan, My two cents... --- a/bfd/elf32-sparc.c +++ b/bfd/elf32-sparc.c @@ -1,7 +1,5 @@ /* SPARC-specific support for 32-bit ELF - Copyright 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, - 2003, 2004, 2005, 2006, 2007, 2010, 2011 - Free Software Foundation, Inc. + Copyright (C) 1993-2014 Free Software Foundation, Inc. This file is part of BFD, the Binary File Descriptor library. Does anyone have a violent objection to committing updates in bfd, binutils, elfcpp, gas, gold, gprof, ld, and opcodes? FWIW, I find that the new ouptut is a nice improvement. I should mention, however, that for us to use ranges like this, the FSF asked us to add a note explaining that the copyright years could be abbreviated into a range. See gdb/README (at the end). I suspect that you'll need the same note for binutils. How about includes/ too? The choices there are a) apply to just binutils owned files, b) apply to binutils+gdb files, c) apply to the lot, and update gcc/include/ too. I see Joel already updated include/gdb, but the script makes a further small change. So choice (b) in include/gdb consists of patches like the following: IIRC, the FSF does not require the (C), but I think they would prefer it. Either way, it doesn't get in the way for future updates, so I'd go with at least (b). As for (c), I'll let the GCC folks answer. -- Joel
Re: [AArch64] Fix possible wrong code generation when comparing DImode values.
On 24 February 2014 18:17, James Greenhalgh james.greenha...@arm.com wrote: gcc/ 2014-02-24 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64-simd.md (aarch64_cmoptabdi): Always split. (*aarch64_cmoptabdi): New. (aarch64_cmtstdi): Always split. (*aarch64_cmtstdi): New. OK for stage-1 /Marcus
Re: C++ PATCH for lto/53808 (devirtualization of defaulted virtual dtor)
Hmm, I wonder why we aren't devirtualizing that call on ia64. Does it work with -O3? Jason
Re: C++ PATCH for lto/53808 (devirtualization of defaulted virtual dtor)
Jason Merrill ja...@redhat.com writes: Hmm, I wonder why we aren't devirtualizing that call on ia64. Does it work with -O3? That doesn't change anything fundamentally. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
[Patch ARM] Allow any register for DImode values in Thumb2.
Hi I noticed that for T32 we don't allow any old register for DImode values. The restriction of an even register is true only for ARM state because the ISA doesn't allow any old register in this place. In a few large .i files that I had knocking about, noticed a nice drop in stack usage and a generally improved register allocation strategy. Queued for stage1 after suitable testing including a bootstrap and regression test in Thumb2 found no issues. regards Ramana DATE Ramana Radhakrishnan ramana.radhakrish...@arm.com * config/arm/arm.c (arm_hard_regno_mode_ok): Loosen restrictions on core registers for DImode values in Thumb2. -- Ramana Radhakrishnan Principal Engineer ARM Ltd.diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index b49f43e..73dc04a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -22593,12 +22593,19 @@ arm_hard_regno_mode_ok (unsigned int regno, enum machine_mode mode) } /* We allow almost any value to be stored in the general registers. - Restrict doubleword quantities to even register pairs so that we can - use ldrd. Do not allow very large Neon structure opaque modes in - general registers; they would use too many. */ + Restrict doubleword quantities to even register pairs in ARM state + so that we can use ldrd. Do not allow very large Neon structure + opaque modes in general registers; they would use too many. */ if (regno = LAST_ARM_REGNUM) -return !(TARGET_LDRD GET_MODE_SIZE (mode) 4 (regno 1) != 0) - ARM_NUM_REGS (mode) = 4; +{ + if (ARM_NUM_REGS (mode) 4) + return FALSE; + + if (TARGET_THUMB2) + return TRUE; + + return !(TARGET_LDRD GET_MODE_SIZE (mode) 4 (regno 1) != 0); +} if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM)
Re: C++ PATCH for lto/53808 (devirtualization of defaulted virtual dtor)
On Thu, Feb 27, 2014 at 2:53 PM, Andreas Schwab sch...@suse.de wrote: Jason Merrill ja...@redhat.com writes: Hmm, I wonder why we aren't devirtualizing that call on ia64. Does it work with -O3? That doesn't change anything fundamentally. I think the vtable lookup sequence is different and nobody cared to adjust the gimple matcher to also match the ia64 sequence. Richard. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: C++ PATCH for lto/53808 (devirtualization of defaulted virtual dtor)
On 02/27/2014 09:03 AM, Richard Biener wrote: Jason Merrill ja...@redhat.com writes: Hmm, I wonder why we aren't devirtualizing that call on ia64. I think the vtable lookup sequence is different and nobody cared to adjust the gimple matcher to also match the ia64 sequence. Ah. So xfail on ia64? Jason
Re: C++ PATCH for lto/53808 (devirtualization of defaulted virtual dtor)
On Thu, Feb 27, 2014 at 3:51 PM, Jason Merrill ja...@redhat.com wrote: On 02/27/2014 09:03 AM, Richard Biener wrote: Jason Merrill ja...@redhat.com writes: Hmm, I wonder why we aren't devirtualizing that call on ia64. I think the vtable lookup sequence is different and nobody cared to adjust the gimple matcher to also match the ia64 sequence. Ah. So xfail on ia64? OTOH the g++dg/ipa/devirt-* testcases seem to run fine everywhere? Richard. Jason
Re: [PATCHv2/AARCH64 2/3] Fix TLS for ILP32.
Hi, On 02/26/14 02:25, Andrew Pinski wrote: Hi, With ILP32, some simple usage of TLS variables causes an unrecognizable instruction due to needing to use SImode for loading pointers from memory. This fixes the three (tlsie_small, tlsle_small, tlsdesc_small) patterns to support SImode for pointers. I modified them to be like what was done for the GOT patterns. The patch looks good to me (but I cannot approve). There are some minor indentation issues though; see below. OK? Build and tested on aarch64-elf with no regressions. Thanks, Andrew Pinski * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Handle TLS for ILP32. * config/aarch64/aarch64.md (tlsie_small): Rename to ... (tlsie_small_mode): this and handle PTR. (tlsie_small_sidi): New pattern. (tlsle_small): Change to an expand to handle ILP32. (tlsle_small_mode): New pattern. (tlsdesc_small): Rename to ... (tlsdesc_small_mode): this and handle PTR. --- gcc/ChangeLog | 12 + gcc/config/aarch64/aarch64.c | 48 +++ gcc/config/aarch64/aarch64.md | 54 +++- 3 files changed, 96 insertions(+), 18 deletions(-) [snip] diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 99a6ac8..7d8a645 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3581,35 +3581,65 @@ [(set_attr type call) (set_attr length 16)]) -(define_insn tlsie_small - [(set (match_operand:DI 0 register_operand =r) -(unspec:DI [(match_operand:DI 1 aarch64_tls_ie_symref S)] +(define_insn tlsie_small_mode + [(set (match_operand:PTR 0 register_operand =r) +(unspec:PTR [(match_operand 1 aarch64_tls_ie_symref S)] UNSPEC_GOTSMALLTLS))] - adrp\\t%0, %A1\;ldr\\t%0, [%0, #%L1] + adrp\\t%0, %A1\;ldr\\t%w0, [%0, #%L1] [(set_attr type load1) (set_attr length 8)] ) -(define_insn tlsle_small + +(define_insn tlsie_small_sidi [(set (match_operand:DI 0 register_operand =r) -(unspec:DI [(match_operand:DI 1 register_operand r) - (match_operand:DI 2 aarch64_tls_le_symref S)] + (zero_extend:DI + (unspec:SI [(match_operand 1 aarch64_tls_ie_symref S)] + UNSPEC_GOTSMALLTLS)))] + + adrp\\t%0, %A1\;ldr\\t%w0, [%0, #%L1] + [(set_attr type load1) + (set_attr length 8)] +) + + +(define_expand tlsle_small + [(set (match_operand 0 register_operand =r) +(unspec [(match_operand 1 register_operand r) + (match_operand 2 aarch64_tls_le_symref S)] + UNSPEC_GOTSMALLTLS))] The last two lines were not indented well. + +{ + enum machine_mode mode = GET_MODE (operands[0]); + emit_insn ((mode == DImode + ? gen_tlsle_small_di + : gen_tlsle_small_si) (operands[0], + operands[1], + operands[2])); Likewise here. Thanks, Yufeng + DONE; + +}) + +(define_insn tlsle_small_mode + [(set (match_operand:P 0 register_operand =r) +(unspec:P [(match_operand:P 1 register_operand r) + (match_operand 2 aarch64_tls_le_symref S)] UNSPEC_GOTSMALLTLS))] - add\\t%0, %1, #%G2\;add\\t%0, %0, #%L2 + add\\t%w0, %w1, #%G2\;add\\t%w0, %w0, #%L2 [(set_attr type alu_reg) (set_attr length 8)] ) -(define_insn tlsdesc_small - [(set (reg:DI R0_REGNUM) -(unspec:DI [(match_operand:DI 0 aarch64_valid_symref S)] +(define_insn tlsdesc_small_mode + [(set (reg:PTR R0_REGNUM) +(unspec:PTR [(match_operand 0 aarch64_valid_symref S)] UNSPEC_TLSDESC)) (clobber (reg:DI LR_REGNUM)) (clobber (match_scratch:DI 1 =r))] TARGET_TLS_DESC - adrp\\tx0, %A0\;ldr\\t%1, [x0, #%L0]\;add\\tx0, x0, %L0\;.tlsdesccall\\t%0\;blr\\t%1 + adrp\\tx0, %A0\;ldr\\t%w1, [x0, #%L0]\;add\\tw0,w0, %L0\;.tlsdesccall\\t%0\;blr\\t%1 [(set_attr type call) (set_attr length 16)])
Re: [PATCHv2/AARCH64 3/3] Support ILP32 multi-lib
On 02/26/14 02:25, Andrew Pinski wrote: Hi, This is the final patch which adds support for the dynamic linker and multi-lib directories for ILP32. I did not change multi-arch support as I did not know what it should be changed to and internally here at Cavium, we don't use multi-arch. Updated for the new names that were decided on. OK? Build and tested for aarch64-linux-gnu with and without --with-multilib-list=lp64,ilp32. Looks good to me, but I cannot approve. I have a couple of minor comments on the changelog entry. Thanks, Andrew Pinski * config/aarch64/aarch64-linux.h (GLIBC_DYNAMIC_LINKER): /lib/ld-linux-aarch64_ilp32.so.1 is used for ILP32. (LINUX_TARGET_LINK_SPEC): Update linker script for ILP32. file whose name depends on -mabi= and -mbig-endian. * config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle LP64 better and handle ilp32 too. (MULTILIB_OPTIONS): Delete. (MULTILIB_DIRNAMES): Delete. --- gcc/ChangeLog | 11 +++ gcc/config/aarch64/aarch64-linux.h |4 ++-- gcc/config/aarch64/t-aarch64-linux |7 ++- 3 files changed, 15 insertions(+), 7 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 155ce45..a0cdc58 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,5 +1,16 @@ 2014-02-25 Andrew Pinskiapin...@cavium.com + * config/aarch64/aarch64-linux.h (GLIBC_DYNAMIC_LINKER): /lib/ld-linux32-aarch64_ilp32.so.1 s/linux32/linux + is used for ILP32. + (LINUX_TARGET_LINK_SPEC): Update linker script for ILP32. + file whose name depends on -mabi= and -mbig-endian. + * config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle LP64 better + and handle ilp32 too. s/ilp32/ILP32 Thanks, Yufeng + (MULTILIB_OPTIONS): Delete. + (MULTILIB_DIRNAMES): Delete. + +2014-02-25 Andrew Pinskiapin...@cavium.com + * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Handle TLS for ILP32. * config/aarch64/aarch64.md (tlsie_small): Rename to ... diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h index a8f0771..48beafb 100644 --- a/gcc/config/aarch64/aarch64-linux.h +++ b/gcc/config/aarch64/aarch64-linux.h @@ -21,7 +21,7 @@ #ifndef GCC_AARCH64_LINUX_H #define GCC_AARCH64_LINUX_H -#define GLIBC_DYNAMIC_LINKER /lib/ld-linux-aarch64%{mbig-endian:_be}.so.1 +#define GLIBC_DYNAMIC_LINKER /lib/ld-linux-aarch64%{mbig-endian:_be}%{mabi=ilp32:_ilp32}.so.1 #define CPP_SPEC %{pthread:-D_REENTRANT} @@ -33,7 +33,7 @@ -dynamic-linker GNU_USER_DYNAMIC_LINKER \ -X \ %{mbig-endian:-EB} %{mlittle-endian:-EL} \ - -maarch64linux%{mbig-endian:b} + -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b} #define LINK_SPEC LINUX_TARGET_LINK_SPEC diff --git a/gcc/config/aarch64/t-aarch64-linux b/gcc/config/aarch64/t-aarch64-linux index 147452b..d6a678e 100644 --- a/gcc/config/aarch64/t-aarch64-linux +++ b/gcc/config/aarch64/t-aarch64-linux @@ -22,10 +22,7 @@ LIB1ASMSRC = aarch64/lib1funcs.asm LIB1ASMFUNCS = _aarch64_sync_cache_range AARCH_BE = $(if $(findstring TARGET_BIG_ENDIAN_DEFAULT=1, $(tm_defines)),_be) -MULTILIB_OSDIRNAMES = .=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu) +MULTILIB_OSDIRNAMES = mabi.lp64=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu) MULTIARCH_DIRNAME = $(call if_multiarch,aarch64$(AARCH_BE)-linux-gnu) -# Disable the multilib for linux-gnu targets for the time being; focus -# on the baremetal targets. -MULTILIB_OPTIONS= -MULTILIB_DIRNAMES = +MULTILIB_OSDIRNAMES += mabi.ilp32=../libilp32
[Ping] Re: [C++ Patch] PR 60253
On 02/21/2014 12:14 PM, Paolo Carlini wrote: Hi, unless we have reasons to believe that the diagnostic quality could regress in some circumstances, we can easily resolve this ICE on invalid regression by always returning error_mark_node after error (thus outside SFINAE too). Pinging this patchlet... Thanks! Paolo.
Fwd: [PATCH, ARM] Improve 64 bit division performance
[resending as text/plain] Hi These patches optimise 64 bit division by removing the use of the __gnu_[u]ldivmod_helper functions and hence avoiding the redundant calculation of the remainder in those functions. Bootstrapped, tested and checked for arm-unknown-linux-gnueabihf. Benchmarked on Chromebook and Raspberry Pi using attached divbench3.c. Loop1 varies the divisor and loop2 varies the dividend. Chromebook: before: loop1 unsigned: 3.474419 loop2 unsigned: 6.564871 loop1 signed: 4.127967 loop2 signed: 6.071490 after: loop1 unsigned: 2.781364 loop2 unsigned: 6.166478 loop1 signed: 2.800974 loop2 signed: 6.129588 Raspberry pi: before loop1 unsigned:28.881753 loop2 unsigned:19.876385 loop1 signed: 32.074941 loop2 signed: 20.594860 after: loop1 unsigned:24.893846 loop2 unsigned:19.537562 loop1 signed: 25.334509 loop2 signed: 19.615088 Any comments? OK for stage 1? Patch 1: 2014-02-27 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call to __udivmoddi4. Patch 2: 2014-02-27 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_ldivmod): Perform signed division via call to __udivmoddi4 and fixing up for negative operands. From 35254b813303e7fb40eb8aa0bb749216fd8f96fc Mon Sep 17 00:00:00 2001 From: Charles Baylis charles.bay...@linaro.org Date: Tue, 25 Feb 2014 18:34:38 + Subject: [PATCH 1/2] Optimise __aeabi_uldivmod 2014-02-25 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call to __udivmoddi4. * config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer manipulation. --- libgcc/config/arm/bpabi.S | 25 - 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index 7772301..e020af5 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -120,6 +120,16 @@ ARM_FUNC_START aeabi_ulcmp #endif .endm +/* we can use STRD/LDRD on v5TE and later, and any Thumb-2 architecture. */ +#if (defined(__ARM_EABI__)\ + (defined(__thumb2__) \ + || (__ARM_ARCH = 5 defined(__TARGET_FEATURE_DSP +#define CAN_USE_LDRD 1 +#else +#define CAN_USE_LDRD 0 +#endif + + #ifdef L_aeabi_ldivmod ARM_FUNC_START aeabi_ldivmod @@ -149,18 +159,23 @@ ARM_FUNC_START aeabi_uldivmod cfi_start __aeabi_uldivmod, LSYM(Lend_aeabi_uldivmod) test_div_by_zero unsigned - sub sp, sp, #8 -#if defined(__thumb2__) - mov ip, sp - push {ip, lr} +#if defined(__thumb2__) CAN_USE_LDRD + sub ip, sp, #8 + strd ip,lr, [sp, #-16]! #else + sub sp, sp, #8 do_push {sp, lr} #endif 98: cfi_push 98b - __aeabi_uldivmod, 0xe, -0xc, 0x10 - bl SYM(__gnu_uldivmod_helper) __PLT__ + bl SYM(__udivmoddi4) __PLT__ ldr lr, [sp, #4] +#if CAN_USE_LDRD + ldrd r2, r3, [sp, #8] + add sp, sp, #16 +#else add sp, sp, #8 do_pop {r2, r3} +#endif RET cfi_end LSYM(Lend_aeabi_uldivmod) -- 1.8.3.2 From 975d9c624e77ee00476e6866250b0e2e31461fca Mon Sep 17 00:00:00 2001 From: Charles Baylis charles.bay...@linaro.org Date: Tue, 25 Feb 2014 16:27:59 + Subject: [PATCH 2/2] Optimise __aeabi_ldivmod 2014-02-25 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_ldivmod): Perform signed division using unsigned division via call to __udivmoddi4 and additional logic. --- libgcc/config/arm/bpabi.S | 74 +++ 1 file changed, 69 insertions(+), 5 deletions(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index e020af5..8b75a28 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -136,20 +136,84 @@ ARM_FUNC_START aeabi_ldivmod cfi_start __aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod) test_div_by_zero signed - sub sp, sp, #8 -#if defined(__thumb2__) - mov ip, sp - push {ip, lr} +#if defined(__thumb2__) CAN_USE_LDRD + sub ip, sp, #8 + strd ip,lr, [sp, #-16]! #else + sub sp, sp, #8 do_push {sp, lr} #endif + cmp xxh, #0 + blt 1f + cmp yyh, #0 + blt 2f + +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 + bl SYM(__udivmoddi4) __PLT__ + ldr lr, [sp, #4] +#if CAN_USE_LDRD + ldrd r2, r3, [sp, #8] + add sp, sp, #16 +#else + add sp, sp, #8 + do_pop {r2, r3} +#endif + RET +1: /* xxh:xxl is negative */ + rsbs xxl, xxl, #0 + sbc xxh, xxh, xxh, lsl #1 + cmp yyh, #0 + blt 3f +98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 + bl SYM(__udivmoddi4) __PLT__ + ldr lr, [sp, #4] +#if CAN_USE_LDRD + ldrd r2, r3, [sp, #8] + add sp, sp, #16 +#else + add sp, sp, #8 + do_pop {r2, r3} +#endif + rsbs xxl, xxl, #0 + sbc xxh, xxh, xxh, lsl #1 + rsbs yyl, yyl, #0 + sbc yyh, yyh, yyh, lsl #1 + RET + +2: /* only yyh:yyl is negative */ + rsbs yyl,
Re: C++ PATCH for lto/53808 (devirtualization of defaulted virtual dtor)
On Thu, Feb 27, 2014 at 04:00:22PM +0100, Richard Biener wrote: On Thu, Feb 27, 2014 at 3:51 PM, Jason Merrill ja...@redhat.com wrote: On 02/27/2014 09:03 AM, Richard Biener wrote: Jason Merrill ja...@redhat.com writes: Hmm, I wonder why we aren't devirtualizing that call on ia64. I think the vtable lookup sequence is different and nobody cared to adjust the gimple matcher to also match the ia64 sequence. Ah. So xfail on ia64? OTOH the g++dg/ipa/devirt-* testcases seem to run fine everywhere? This is probably the same issue as the one described in http://gcc.gnu.org/ml/gcc/2012-08/msg00055.html and the subsequent thread. The problem however turned out to be slightly more difficult and I was not interested enough in ia64 to care again. Martin
C++ PATCH for c++/60353 (bogus error with unnamed class)
The code I factored out of expand_or_defer_fn_1 into tentative_decl_linkage is intended for use with functions that are defined; using it on functions that have not been defined causes problems. Tested x86_64-pc-linux-gnu, applying to trunk. commit 2e63bb7138e59e7e04e2649fb3d8086fffc238a7 Author: Jason Merrill ja...@redhat.com Date: Thu Feb 27 09:15:02 2014 -0500 PR c++/60353 PR c++/55877 * decl2.c (tentative_decl_linkage): Don't mess with functions that are not yet defined. diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index 1afe16e..dfc532d 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -2564,7 +2564,8 @@ tentative_decl_linkage (tree decl) be handled. */; else if (vague_linkage_p (decl)) { - if (TREE_CODE (decl) == FUNCTION_DECL) + if (TREE_CODE (decl) == FUNCTION_DECL + decl_defined_p (decl)) { DECL_EXTERNAL (decl) = 1; DECL_NOT_REALLY_EXTERN (decl) = 1; @@ -2586,11 +2587,8 @@ tentative_decl_linkage (tree decl) DECL_INTERFACE_KNOWN (decl) = 1; } } - else - { - gcc_assert (TREE_CODE (decl) == VAR_DECL); - maybe_commonize_var (decl); - } + else if (TREE_CODE (decl) == VAR_DECL) + maybe_commonize_var (decl); } } diff --git a/gcc/testsuite/g++.dg/other/anon6.C b/gcc/testsuite/g++.dg/other/anon6.C new file mode 100644 index 000..2fd0942 --- /dev/null +++ b/gcc/testsuite/g++.dg/other/anon6.C @@ -0,0 +1,8 @@ +// PR c++/60353 + +struct A { + A(int); +}; +typedef struct { + A format; +} B;
Re: [PATCHv2/AARCH64 2/3] Fix TLS for ILP32.
On 26 February 2014 02:25, Andrew Pinski apin...@cavium.com wrote: Hi, With ILP32, some simple usage of TLS variables causes an unrecognizable instruction due to needing to use SImode for loading pointers from memory. This fixes the three (tlsie_small, tlsle_small, tlsdesc_small) patterns to support SImode for pointers. I modified them to be like what was done for the GOT patterns. OK? Build and tested on aarch64-elf with no regressions. Thanks, Andrew Pinski * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Handle TLS for ILP32. * config/aarch64/aarch64.md (tlsie_small): Rename to ... (tlsie_small_mode): this and handle PTR. (tlsie_small_sidi): New pattern. (tlsle_small): Change to an expand to handle ILP32. (tlsle_small_mode): New pattern. (tlsdesc_small): Rename to ... (tlsdesc_small_mode): this and handle PTR. Thanks Andrew, this is OK for stage-1, but please fix Yufeng's layout nit. /Marcus
patch to fix PR59222
The following patch fixes http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59222 The patch was successfully bootstrapped and tested on x86/x86-64. Committed as rev. 208201. 2014-02-27 Vladimir Makarov vmaka...@redhat.com PR target/59222 * lra.c (lra_emit_add): Check SUBREG too. Index: lra.c === --- lra.c (revision 208174) +++ lra.c (working copy) @@ -340,8 +340,9 @@ lra_emit_add (rtx x, rtx y, rtx z) base = a1; index = a2; } - if (! REG_P (base) - || (index != NULL_RTX ! REG_P (index)) + if (! (REG_P (base) || GET_CODE (base) == SUBREG) + || (index != NULL_RTX + ! (REG_P (index) || GET_CODE (index) == SUBREG)) || (disp != NULL_RTX ! CONSTANT_P (disp)) || (scale != NULL_RTX ! CONSTANT_P (scale))) {
Re: [C++ Patch] PR 60253
OK. Jason
Re: [C++ patch] for C++/52369
On 02/23/2014 02:36 PM, Fabien Chêne wrote: * cp/method.c (walk_field_subobs): improve the diagnostic locations for both REFERENCE_TYPEs and non-static const members. It's important to have the error location be the place where the actual problem is, namely the constructor definition. Instead of changing it, please add an inform pointing out the location of the member in question. Jason
Re: copyright dates in binutils (and includes/)
On Thu, 27 Feb 2014, Joel Brobecker wrote: I should mention, however, that for us to use ranges like this, the FSF asked us to add a note explaining that the copyright years could be abbreviated into a range. See gdb/README (at the end). I suspect that you'll need the same note for binutils. And, where a gap in the years is being implicitly filled in by conversion to a range, make sure that either (a) there was a public version control repository for binutils during that year, or (b) there was a release (including beta releases, Cygnus releases etc., not just official releases) during that year. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH/AARCH64 1/3] Add AARCH64 ILP32 PCH support
On Tue, Feb 25, 2014 at 06:25:11PM -0800, Andrew Pinski wrote: -#elif defined(__aarch64__) +#elif defined(__aarch64__) defined(__LP64__) # define TRY_EMPTY_VM_SPACE 0x10 +#elif defined(__aarch64__) +# define TRY_EMPTY_VM_SPACE 0x6000 awesome. thanks andrew! --kyle
[PATCH] [rtl-optimization/49847] Fix cse to handle cc0 setter and cc0 user in different blocks
As discussed in 49847, a few years ago GCC was changed to add EH edges for exceptions that might arise from a floating point comparison. That change made it possible for a cc0-setter and cc0-user to end up in different blocks, separated by a NOTE. After discussing reverting the change, duplicating the cc0-setter and a couple other ideas, eventually Richi and myself settled on relaxing the long standing restrictions on the relative locations of the cc0 setter and cc0 user. We agreed to update the documentation and fault in fixes due to low priority of cc0 targets (and doubly so since this really only affects cc0 targets with used with non-call-exceptions). This patch updates the documentation and fixes the only known failure due to this change in cse.c with a patch from Mikael. Tested with a cross compiler. I'll probably fire off a m68k bootstrap for good measure, but it'll be several days before that completes. Installed on the trunk. Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a78716..a20cee3 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2014-02-27 Mikael Pettersson mi...@it.uu.se + Jeff Law l...@redhat.com + + PR rtl-optimization/49847 + * cse.c (fold_rtx) Handle case where cc0 setter and cc0 user + are in different blocks. + * doc/tm.texi (Condition Code Status): Update documention for + relative locations of cc0-setter and cc0-user. + 2014-02-27 Vladimir Makarov vmaka...@redhat.com PR target/59222 diff --git a/gcc/cse.c b/gcc/cse.c index cffa553..dba85f1 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -3199,9 +3199,27 @@ fold_rtx (rtx x, rtx insn) #ifdef HAVE_cc0 case CC0: - folded_arg = prev_insn_cc0; - mode_arg = prev_insn_cc0_mode; - const_arg = equiv_constant (folded_arg); + /* The cc0-user and cc0-setter may be in different blocks if + the cc0-setter potentially traps. In that case PREV_INSN_CC0 + will have been cleared as we exited the block with the + setter. + + While we could potentially track cc0 in this case, it just + doesn't seem to be worth it given that cc0 targets are not + terribly common or important these days and trapping math + is rarely used. The combination of those two conditions + necessary to trip this situation is exceedingly rare in the + real world. */ + if (!prev_insn_cc0) + { + const_arg = NULL_RTX; + } + else + { + folded_arg = prev_insn_cc0; + mode_arg = prev_insn_cc0_mode; + const_arg = equiv_constant (folded_arg); + } break; #endif diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index f204936..f7024a7 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5900,8 +5900,13 @@ most instructions do not affect it. The latter category includes most RISC machines. The implicit clobbering poses a strong restriction on the placement of -the definition and use of the condition code, which need to be in adjacent -insns for machines using @code{(cc0)}. This can prevent important +the definition and use of the condition code. In the past the definition +and use were always adjacent. However, recent changes to support trapping +arithmatic may result in the definition and user being in different blocks. +Thus, there may be a @code{NOTE_INSN_BASIC_BLOCK} between them. Additionally, +the definition may be the source of exception handling edges. + +These restrictions can prevent important optimizations on some machines. For example, on the IBM RS/6000, there is a delay for taken branches unless the condition code register is set three instructions earlier than the conditional branch. The instruction diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 128a5f7..be4cb12 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2014-02-27 Mikael Pettersson mi...@it.uu.se +Jeff Law l...@redhat.com + +PR rtl-optimization/49847 +* g++.dg/pr49847.C: New test. + 2014-02-27 Marek Polacek pola...@redhat.com PR middle-end/59223 diff --git a/gcc/testsuite/g++.dg/pr49847.C b/gcc/testsuite/g++.dg/pr49847.C new file mode 100644 index 000..b047713 --- /dev/null +++ b/gcc/testsuite/g++.dg/pr49847.C @@ -0,0 +1,7 @@ +/* { dg-do compile } */ +/* { dg-options -O -fnon-call-exceptions } */ +int f (float g) +{ + try { return g = 0; } + catch (...) {} +}
Re: C++ PATCH for lto/53808 (devirtualization of defaulted virtual dtor)
On Thu, Feb 27, 2014 at 2:53 PM, Andreas Schwab sch...@suse.de wrote: Jason Merrill ja...@redhat.com writes: Hmm, I wonder why we aren't devirtualizing that call on ia64. Does it work with -O3? That doesn't change anything fundamentally. I think the vtable lookup sequence is different and nobody cared to adjust the gimple matcher to also match the ia64 sequence. We sort of handle the function descriptors thorough the devirtualization code. I will check why we fail here. Honza Richard. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [C++ Patch] PR 60314 (ICE with decltype(auto))
On 02/25/2014 05:03 AM, Paolo Carlini wrote: here we ICE exactly as we did in c++/53756: the only difference is the use of decltype(auto) instead of auto. Now, if we compare is_cxx_auto to is_auto (the front-end helper), evidently there is an inconsistency about the handling of decltype(auto) and the below fixes the ICE. However, also clearly the patchlet needs a review, because an out of class decltype(auto) is already fine. Also, I'm not 100% sure we don't need a decltype_auto_die, etc. I think we do need a decltype_auto_die. Jason
Re: [RFA][rtl-optimization/52714] Do not allow combine to create invalid RTL
On 02/17/14 02:28, Eric Botcazou wrote: Although it's probably time to concede defeat in this particular case, I don't think we should use the NONJUMP_INSN_P big hammer because we want to eliminate branches if possible. So I'd just add the minimal test to the two existing conditions so as to prevent the invalid RTL from being created, e.g. (!JUMP_P (i3) || SET_DEST (set[01]) == pc_rtx) Done. Attached is the final iteration that got installed. Verified with a cross compiler. Bootstrap on m68k will start at some point and finish in the next week or so. jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index a20cee3..16c499b 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2014-02-27 Jeff Law l...@redhat.com + + PR rtl-optimization/52714 + * combine.c (try_combine): When splitting an unrecognized PARALLEL + into two independent simple sets, if I3 is a jump, ensure the + pattern we place into I3 is a (set (pc) ...) + 2014-02-27 Mikael Pettersson mi...@it.uu.se Jeff Law l...@redhat.com diff --git a/gcc/combine.c b/gcc/combine.c index 1b598b4..fc473b6 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -3712,6 +3712,9 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, #ifdef HAVE_cc0 !reg_referenced_p (cc0_rtx, set0) #endif + /* If I3 is a jump, ensure that set0 is a jump so that +we do not create invalid RTL. */ + (!JUMP_P (i3) || SET_DEST (set0) == pc_rtx) ) { newi2pat = set1; @@ -3726,6 +3729,9 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i0, int *new_direct_jump_p, #ifdef HAVE_cc0 !reg_referenced_p (cc0_rtx, set1) #endif + /* If I3 is a jump, ensure that set1 is a jump so that + we do not create invalid RTL. */ + (!JUMP_P (i3) || SET_DEST (set1) == pc_rtx) ) { newi2pat = set0; diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index be4cb12..b95bcfd 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-02-27 Jeff Law l...@redhat.com + + PR rtl-optimization/52714 + * gcc.c-torture/compile/pr52714.c: New test. + 2014-02-27 Mikael Pettersson mi...@it.uu.se Jeff Law l...@redhat.com diff --git a/gcc/testsuite/gcc.c-torture/compile/pr52714.c b/gcc/testsuite/gcc.c-torture/compile/pr52714.c new file mode 100644 index 000..03d4990 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr52714.c @@ -0,0 +1,25 @@ + +int __re_compile_fastmap(unsigned char *p) +{ +unsigned char **stack; +unsigned size; +unsigned avail; + +stack = __builtin_alloca(5 * sizeof(unsigned char*)); +if (stack == 0) + return -2; +size = 5; +avail = 0; + +for (;;) { + switch (*p++) { + case 0: + if (avail == size) + return -2; + stack[avail++] = p; + } +} + +return 0; +} +
Re: [PATCH] [rtl-optimization/49847] Fix cse to handle cc0 setter and cc0 user in different blocks
On Thu, Feb 27, 2014 at 11:02 AM, Jeff Law l...@redhat.com wrote: As discussed in 49847, a few years ago GCC was changed to add EH edges for exceptions that might arise from a floating point comparison. That change made it possible for a cc0-setter and cc0-user to end up in different blocks, separated by a NOTE. After discussing reverting the change, duplicating the cc0-setter and a couple other ideas, eventually Richi and myself settled on relaxing the long standing restrictions on the relative locations of the cc0 setter and cc0 user. We agreed to update the documentation and fault in fixes due to low priority of cc0 targets (and doubly so since this really only affects cc0 targets with used with non-call-exceptions). This patch updates the documentation and fixes the only known failure due to this change in cse.c with a patch from Mikael. Tested with a cross compiler. I'll probably fire off a m68k bootstrap for good measure, but it'll be several days before that completes. Installed on the trunk. Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a78716..a20cee3 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2014-02-27 Mikael Pettersson mi...@it.uu.se + Jeff Law l...@redhat.com + + PR rtl-optimization/49847 + * cse.c (fold_rtx) Handle case where cc0 setter and cc0 user + are in different blocks. + * doc/tm.texi (Condition Code Status): Update documention for + relative locations of cc0-setter and cc0-user. + This breaks bootstrap: You should edit ../../src-trunk/gcc/doc/tm.texi.in rather than ../../src-trunk/gcc/doc/tm.texi . make[6]: *** [s-tm-texi] Error 1 make[6]: *** Waiting for unfinished jobs -- H.J.
Re: [PATCH] [rtl-optimization/49847] Fix cse to handle cc0 setter and cc0 user in different blocks
On Thu, Feb 27, 2014 at 11:50 AM, H.J. Lu hjl.to...@gmail.com wrote: On Thu, Feb 27, 2014 at 11:02 AM, Jeff Law l...@redhat.com wrote: As discussed in 49847, a few years ago GCC was changed to add EH edges for exceptions that might arise from a floating point comparison. That change made it possible for a cc0-setter and cc0-user to end up in different blocks, separated by a NOTE. After discussing reverting the change, duplicating the cc0-setter and a couple other ideas, eventually Richi and myself settled on relaxing the long standing restrictions on the relative locations of the cc0 setter and cc0 user. We agreed to update the documentation and fault in fixes due to low priority of cc0 targets (and doubly so since this really only affects cc0 targets with used with non-call-exceptions). This patch updates the documentation and fixes the only known failure due to this change in cse.c with a patch from Mikael. Tested with a cross compiler. I'll probably fire off a m68k bootstrap for good measure, but it'll be several days before that completes. Installed on the trunk. Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a78716..a20cee3 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2014-02-27 Mikael Pettersson mi...@it.uu.se + Jeff Law l...@redhat.com + + PR rtl-optimization/49847 + * cse.c (fold_rtx) Handle case where cc0 setter and cc0 user + are in different blocks. + * doc/tm.texi (Condition Code Status): Update documention for + relative locations of cc0-setter and cc0-user. + This breaks bootstrap: You should edit ../../src-trunk/gcc/doc/tm.texi.in rather than ../../src-trunk/gcc/doc/tm.texi . make[6]: *** [s-tm-texi] Error 1 make[6]: *** Waiting for unfinished jobs I checked in this as an obvious fix. -- H.J. --- Index: ChangeLog === --- ChangeLog (revision 208204) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-02-27 H.J. Lu hongjiu...@intel.com + + * doc/tm.texi.in (Condition Code Status): Update documention for + relative locations of cc0-setter and cc0-user. + 2014-02-27 Jeff Law l...@redhat.com PR rtl-optimization/52714 Index: doc/tm.texi.in === --- doc/tm.texi.in (revision 208204) +++ doc/tm.texi.in (working copy) @@ -4484,8 +4484,13 @@ most instructions do not affect it. The most RISC machines. The implicit clobbering poses a strong restriction on the placement of -the definition and use of the condition code, which need to be in adjacent -insns for machines using @code{(cc0)}. This can prevent important +the definition and use of the condition code. In the past the definition +and use were always adjacent. However, recent changes to support trapping +arithmatic may result in the definition and user being in different blocks. +Thus, there may be a @code{NOTE_INSN_BASIC_BLOCK} between them. Additionally, +the definition may be the source of exception handling edges. + +These restrictions can prevent important optimizations on some machines. For example, on the IBM RS/6000, there is a delay for taken branches unless the condition code register is set three instructions earlier than the conditional branch. The instruction
[gomp4] Gimplification: Merge gimplify_oacc_parallel into gimplify_omp_workshare. (was: [gomp4 9/9] OpenACC: Basic support for #pragma acc parallel.)
Hi! On Wed, 6 Nov 2013 20:53:00 +0100, I wrote: gcc/ * gimplify.c [...] (gimplify_oacc_parallel): New function. To get rid of code duplication, I have applied the following in r208206: commit 9ffb216dd43bda84f56ce7fe68ae15cc08110924 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Thu Feb 27 19:56:28 2014 + Gimplification: Merge gimplify_oacc_parallel into gimplify_omp_workshare. gcc/ * gimplify.c (gimplify_oacc_parallel): Merge into gimplify_omp_workshare. Update all callers. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@208206 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 9f5941f..3d9b06d 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,8 @@ +2014-02-27 Thomas Schwinge tho...@codesourcery.com + + * gimplify.c (gimplify_oacc_parallel): Merge into + gimplify_omp_workshare. Update all callers. + 2014-02-27 Ilmir Usmanov i.usma...@samsung.com Fix OpenACC ASYNC clause: it cannot contain integer-expression-list. diff --git gcc/gimplify.c gcc/gimplify.c index d0a4779..6dbabfa 100644 --- gcc/gimplify.c +++ gcc/gimplify.c @@ -6572,38 +6572,6 @@ gimplify_adjust_omp_clauses (tree *list_p) delete_omp_context (ctx); } -/* Gimplify the contents of an OACC_PARALLEL statement. This involves - gimplification of the body, as well as scanning the body for used - variables. We need to do this scan now, because variable-sized - decls will be decomposed during gimplification. */ - -static void -gimplify_oacc_parallel (tree *expr_p, gimple_seq *pre_p) -{ - tree expr = *expr_p; - gimple g; - gimple_seq body = NULL; - enum omp_region_type ort = (enum omp_region_type) (ORT_TARGET -| ORT_TARGET_OFFLOAD -| ORT_TARGET_MAP_FORCE); - - gimplify_scan_omp_clauses (OACC_PARALLEL_CLAUSES (expr), pre_p, ort); - - push_gimplify_context (); - - g = gimplify_and_return_first (OACC_PARALLEL_BODY (expr), body); - if (gimple_code (g) == GIMPLE_BIND) -pop_gimplify_context (g); - else -pop_gimplify_context (NULL); - - gimplify_adjust_omp_clauses (OACC_PARALLEL_CLAUSES (expr)); - - g = gimple_build_oacc_parallel (body, OACC_PARALLEL_CLAUSES (expr)); - gimplify_seq_add_stmt (pre_p, g); - *expr_p = NULL_TREE; -} - /* Gimplify the contents of an OMP_PARALLEL statement. This involves gimplification of the body, as well as scanning the body for used variables. We need to do this scan now, because variable-sized @@ -7039,6 +7007,11 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p) ort = (enum omp_region_type) (ORT_TARGET | ORT_TARGET_MAP_FORCE); break; +case OACC_PARALLEL: + ort = (enum omp_region_type) (ORT_TARGET + | ORT_TARGET_OFFLOAD + | ORT_TARGET_MAP_FORCE); + break; case OMP_SECTIONS: case OMP_SINGLE: ort = ORT_WORKSHARE; @@ -7097,6 +7070,9 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p) stmt = gimple_build_omp_target (body, GF_OMP_TARGET_KIND_OACC_DATA, OACC_DATA_CLAUSES (expr)); break; +case OACC_PARALLEL: + stmt = gimple_build_oacc_parallel (body, OACC_PARALLEL_CLAUSES (expr)); + break; case OMP_SECTIONS: stmt = gimple_build_omp_sections (body, OMP_CLAUSES (expr)); break; @@ -8060,11 +8036,6 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, ret = GS_ALL_DONE; break; - case OACC_PARALLEL: - gimplify_oacc_parallel (expr_p, pre_p); - ret = GS_ALL_DONE; - break; - case OACC_KERNELS: case OACC_HOST_DATA: case OACC_DECLARE: @@ -8095,6 +8066,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, break; case OACC_DATA: + case OACC_PARALLEL: case OMP_SECTIONS: case OMP_SINGLE: case OMP_TARGET: Grüße, Thomas pgpoBzrHjF4uI.pgp Description: PGP signature
Re: [PR debug/59992 #2/2] preserve permanent values in a separate table
On 02/26/2014 10:09 PM, Alexandre Oliva wrote: Regstrapped on x86_64-linux-gnu and i686-linux-gnu, along with the other patch for PR debug/59992. Ok to install? for gcc/ChangeLog PR debug/59992 * cselib.c (cselib_hasher::equal): Special-case VALUE lookup. (cselib_preserved_hash_table): New. (preserve_constants_and_equivs): Move preserved vals to it. (cselib_find_slot): Look it up first. (cselib_init): Initialize it. (cselib_finish): Release it. (dump_cselib_table): Dump it. Ok. r~
[C++ Patch] PR 58610
Hi, in this regression, we ICE in explain_non_literal_class in: if (DECL_DELETED_FN (fn)) maybe_explain_implicit_delete (fn); because fn is a TEMPLATE_DECL. Now, if I read the comment which provides rationale for LANG_DECL_FN_CHECK: /* We want to be able to check DECL_CONSTRUCTOR_P and such on a function template, not just on a FUNCTION_DECL. So when looking for things in lang_decl_fn, look down through a TEMPLATE_DECL into its result. */ #define LANG_DECL_FN_CHECK(NODE) __extension__\ I think it could make sense to use the macro for DECL_DELETED_FN too, not just for DECL_DEFAULTED_FN. Or do we want something more localized? Tested x86_64-linux. Thanks, Paolo. / /cp 2014-02-27 Paolo Carlini paolo.carl...@oracle.com PR c++/58610 * cp-tree.h (DECL_DELETED_FN): Use LANG_DECL_FN_CHECK. * call.c (print_z_candidate): Remove STRIP_TEMPLATE use. /testsuite 2014-02-27 Paolo Carlini paolo.carl...@oracle.com PR c++/58610 * g++.dg/cpp0x/constexpr-ice11.C: New. Index: cp/call.c === --- cp/call.c (revision 208203) +++ cp/call.c (working copy) @@ -3237,7 +3237,7 @@ print_z_candidate (location_t loc, const char *msg inform (cloc, %s%T conversion, msg, candidate-fn); else if (candidate-viable == -1) inform (cloc, %s%#D near match, msg, candidate-fn); - else if (DECL_DELETED_FN (STRIP_TEMPLATE (candidate-fn))) + else if (DECL_DELETED_FN (candidate-fn)) inform (cloc, %s%#D deleted, msg, candidate-fn); else inform (cloc, %s%#D, msg, candidate-fn); Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 208203) +++ cp/cp-tree.h(working copy) @@ -3222,7 +3222,7 @@ more_aggr_init_expr_args_p (const aggr_init_expr_a /* Nonzero if DECL was declared with '= delete'. */ #define DECL_DELETED_FN(DECL) \ - (DECL_LANG_SPECIFIC (FUNCTION_DECL_CHECK (DECL))-u.base.threadprivate_or_deleted_p) + (LANG_DECL_FN_CHECK (DECL)-min.base.threadprivate_or_deleted_p) /* Nonzero if DECL was declared with '= default' (maybe implicitly). */ #define DECL_DEFAULTED_FN(DECL) \ Index: testsuite/g++.dg/cpp0x/constexpr-ice11.C === --- testsuite/g++.dg/cpp0x/constexpr-ice11.C(revision 0) +++ testsuite/g++.dg/cpp0x/constexpr-ice11.C(working copy) @@ -0,0 +1,9 @@ +// PR c++/58610 +// { dg-do compile { target c++11 } } + +struct A +{ + templatetypename A(); +}; + +constexpr A a; // { dg-error literal|matching }
Re: [PATCH,GRAPHITE] Fix for P1 bug 58028
Thanks for comments - updated the patch (fixed my e-mail address too :-)). 2014-02-26 Tobias Grosser tob...@grosser.es Mircea Namolaru mircea.namol...@inria.fr Fix for bug 58028 * graphite-clast-to-gimple.c (set_cloog_options): Don't remove scalar dimensions. Index: gcc/graphite-clast-to-gimple.c === --- gcc/graphite-clast-to-gimple.c (revision 207298) +++ gcc/graphite-clast-to-gimple.c (working copy) @@ -1522,6 +1522,13 @@ variables. */ options-save_domains = 1; + /* Do not remove scalar dimensions. CLooG by default removes scalar + dimensions very early from the input schedule. However, they are + necessary to correctly derive from the saved domains + (options-save_domains) the relationship between the generated loops + and the schedule dimensions they are generated from. */ + options-noscalars = 1; + /* Disable optimizations and make cloog generate source code closer to the input. This is useful for debugging, but later we want the optimized code. - Mail original - De: Rainer Orth r...@cebitec.uni-bielefeld.de À: Tobias Burnus tobias.bur...@physik.fu-berlin.de Cc: Tobias Grosser tob...@grosser.es, Mircea Namolaru mircea.namol...@inria.fr, gcc-patches@gcc.gnu.org Envoyé: Jeudi 27 Février 2014 11:14:30 Objet: Re: [PATCH,GRAPHITE] Fix for P1 bug 58028 Tobias Burnus tobias.bur...@physik.fu-berlin.de writes: On 02/26/2014 10:09 PM, Mircea Namolaru wrote: + /* Do not remove scalar dimensions. Cloog be default removes scalar + dimensions very early from the input schedule. However, they are Small nit: Instead of 'be default' it should be 'by default' ... and two spaces after the full stops. While you're at it, please fix the CLooG capitalization ;-) Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] [rtl-optimization/49847] Fix cse to handle cc0 setter and cc0 user in different blocks
On 02/27/14 12:50, H.J. Lu wrote: On Thu, Feb 27, 2014 at 11:02 AM, Jeff Law l...@redhat.com wrote: As discussed in 49847, a few years ago GCC was changed to add EH edges for exceptions that might arise from a floating point comparison. That change made it possible for a cc0-setter and cc0-user to end up in different blocks, separated by a NOTE. After discussing reverting the change, duplicating the cc0-setter and a couple other ideas, eventually Richi and myself settled on relaxing the long standing restrictions on the relative locations of the cc0 setter and cc0 user. We agreed to update the documentation and fault in fixes due to low priority of cc0 targets (and doubly so since this really only affects cc0 targets with used with non-call-exceptions). This patch updates the documentation and fixes the only known failure due to this change in cse.c with a patch from Mikael. Tested with a cross compiler. I'll probably fire off a m68k bootstrap for good measure, but it'll be several days before that completes. Installed on the trunk. Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a78716..a20cee3 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2014-02-27 Mikael Pettersson mi...@it.uu.se + Jeff Law l...@redhat.com + + PR rtl-optimization/49847 + * cse.c (fold_rtx) Handle case where cc0 setter and cc0 user + are in different blocks. + * doc/tm.texi (Condition Code Status): Update documention for + relative locations of cc0-setter and cc0-user. + This breaks bootstrap: You should edit ../../src-trunk/gcc/doc/tm.texi.in rather than ../../src-trunk/gcc/doc/tm.texi . make[6]: *** [s-tm-texi] Error 1 make[6]: *** Waiting for unfinished jobs Nuts. Sorry. Thanks for fixing it. jeff
Re: [PATCH,GRAPHITE] Fix for P1 bug 58028
Hi Mircea, two last nits: 2014-02-26 Tobias Grosser tob...@grosser.es Mircea Namolaru mircea.namol...@inria.fr Fix for bug 58028 Please write this as PR tree-optimization/58028 instead. This way, the commit triggers an update to the bugzilla bug. * graphite-clast-to-gimple.c (set_cloog_options): Don't remove scalar dimensions. Don't insert a line break after the colon, but continue on the same line, letting it wrap at pos. 75 (Emacs' fill-column). Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [jit] Major API change: blocks rather than labels
On Thu, 2014-02-27 at 17:11 -0500, David Malcolm wrote: [...] With this commit, the API changes to using basic blocks instead: blocks are created within functions, and statements are added to blocks, rather than to functions. [...] I've also ported the jittest example to the new API, as of this commit: https://github.com/davidmalcolm/jittest/commit/af66efe0386e52a9292b7527174ae402c0af5e43 (though currently it falls foul of type-checking, due to int vs bool issues in conditionals; upon hacking out the type-checking from libgccjit it compiles and runs OK). I haven't yet ported the Python bindings. Dave
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote: Hi, This patch is to fix regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests has already been fixed by recent patches, I went through and fixed the others. One case needs to be clarified is gcc.dg/tree-prof/update-loopch.c. When GCC removing a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a removing basic block in stage 1 if it sounds reasonable. For now I just twisted the case itself. Bootstrap and tested on x86_64 and arm_a15. Is it OK? 2014-02-25 Bin Cheng bin.ch...@arm.com PR target/60280 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed. * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS. This change: if (dest-loop_father-header == dest) - return false; + { +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + bb-loop_father-header != dest) + return false; + +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) + bb-loop_father-header == dest) + return false; + } } miscompiled 435.gromacs in SPEC CPU 2006 on x32 with -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver -fuse-linker-plugin This patch changes loops without LOOPS_HAVE_PREHEADERS nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning true. I don't have a small testcase. But this patch: diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c index b5c384b..2ba673c 100644 --- a/gcc/tree-cfgcleanup.c +++ b/gcc/tree-cfgcleanup.c @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted) if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) bb-loop_father-header == dest) return false; + +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)) + return false; } } fixes the regression. Does it make any senses? -- H.J.
Re: [Patch, testsuite]: Allow MicroBlaze .weakext pattern in regex match
On 02/26/14 16:29, David Holsgrove wrote: Hi Mike S., Michael E., -Original Message- From: Mike Stump [mailto:mikest...@comcast.net] Sent: Friday, 21 February 2014 6:17 am To: David Holsgrove Cc: gcc-patches@gcc.gnu.org; Michael Eager (ea...@eagerm.com); Vidhumouli Hunsigida; Nagaraju Mekala; John Williams; Edgar Iglesias Subject: Re: [Patch, testsuite]: Allow MicroBlaze .weakext pattern in regex match On Feb 16, 2014, at 4:21 PM, David Holsgrove david.holsgr...@xilinx.com wrote: I've attached an updated patch Ok, thanks. If no one has any objections on this updated patch, would either of you be able to approve and apply? What targets have you tested this patch with? It isn't sufficient to test it only with MicroBlaze, since this code is used by all other targets. -- Michael Eagerea...@eagercon.com 1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
On Tue, 25 Feb 2014, bin.cheng wrote: Hi, This patch is to fix regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests has already been fixed by recent patches, I went through and fixed the others. One case needs to be clarified is gcc.dg/tree-prof/update-loopch.c. When GCC removing a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a removing basic block in stage 1 if it sounds reasonable. For now I just twisted the case itself. Bootstrap and tested on x86_64 and arm_a15. Is it OK? 2014-02-25 Bin Cheng bin.ch...@arm.com PR target/60280 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed. * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS. gcc/testsuite/ChangeLog 2014-02-25 Bin Cheng bin.ch...@arm.com PR target/60280 * gnat.dg/renaming5.adb: Change to two expected gotos. * gcc.dg/tree-ssa/pr21559.c: Change back to three expected jump threads. * gcc.dg/tree-prof/update-loopch.c: Check two Invalid sum messages for removed basic block. * gcc.dg/tree-ssa/ivopt_1.c: Fix unreliable scanning string. * gcc.dg/tree-ssa/ivopt_2.c: Ditto. * gcc.dg/tree-ssa/ivopt_3.c: Ditto. * gcc.dg/tree-ssa/ivopt_4.c: Ditto. Do you need to also update gcc.dg/tree-ssa/ssa-dom-thread-4.c, at least for logical_op_short_circuit targets? (There's a nice analysis comment there by Richard S which may also have to be updated.) This caused a regression for logical_op_short_circuit targets, I entered PR60363 for convenience. brgds, H-P
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
Thanks for reporting this, I will look into it. Thanks, bin On Fri, Feb 28, 2014 at 8:52 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Feb 24, 2014 at 9:12 PM, bin.cheng bin.ch...@arm.com wrote: Hi, This patch is to fix regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests has already been fixed by recent patches, I went through and fixed the others. One case needs to be clarified is gcc.dg/tree-prof/update-loopch.c. When GCC removing a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a removing basic block in stage 1 if it sounds reasonable. For now I just twisted the case itself. Bootstrap and tested on x86_64 and arm_a15. Is it OK? 2014-02-25 Bin Cheng bin.ch...@arm.com PR target/60280 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed. * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS. This change: if (dest-loop_father-header == dest) - return false; + { +if (loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + bb-loop_father-header != dest) + return false; + +if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) + bb-loop_father-header == dest) + return false; + } } miscompiled 435.gromacs in SPEC CPU 2006 on x32 with -O3 -funroll-loops -ffast-math -fwhole-program -flto=jobserver -fuse-linker-plugin This patch changes loops without LOOPS_HAVE_PREHEADERS nor LOOPS_HAVE_SIMPLE_LATCHES from returning false to returning true. I don't have a small testcase. But this patch: diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c index b5c384b..2ba673c 100644 --- a/gcc/tree-cfgcleanup.c +++ b/gcc/tree-cfgcleanup.c @@ -323,6 +323,10 @@ tree_forwarder_block_p (basic_block bb, bool phi_wanted) if (loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES) bb-loop_father-header == dest) return false; + +if (!loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS) + !loops_state_satisfies_p (LOOPS_HAVE_SIMPLE_LATCHES)) + return false; } } fixes the regression. Does it make any senses? -- H.J. -- Best Regards.
Re: [PATCH GCC]Allow cfgcleanup to remove forwarder loop preheaders and latches
Sorry, I didn't test it against logical_op_short_circuit target. I will look into this PR. Thanks, bin On Fri, Feb 28, 2014 at 9:34 AM, Hans-Peter Nilsson h...@bitrange.com wrote: On Tue, 25 Feb 2014, bin.cheng wrote: Hi, This patch is to fix regression reported in PR60280 by removing forward loop headers/latches in cfg cleanup if possible. Several tests are broken by this change since cfg cleanup is shared by all optimizers. Some tests has already been fixed by recent patches, I went through and fixed the others. One case needs to be clarified is gcc.dg/tree-prof/update-loopch.c. When GCC removing a basic block, it checks profile information by calling check_bb_profile after redirecting incoming edges of the bb. This certainly results in warnings about invalid profile information and causes the case to fail. I will send a patch to skip checking profile information for a removing basic block in stage 1 if it sounds reasonable. For now I just twisted the case itself. Bootstrap and tested on x86_64 and arm_a15. Is it OK? 2014-02-25 Bin Cheng bin.ch...@arm.com PR target/60280 * tree-cfgcleanup.c (tree_forwarder_block_p): Protect loop preheaders and latches only if requested. Fix latch if it is removed. * tree-ssa-dom.c (tree_ssa_dominator_optimize): Set LOOPS_HAVE_PREHEADERS. gcc/testsuite/ChangeLog 2014-02-25 Bin Cheng bin.ch...@arm.com PR target/60280 * gnat.dg/renaming5.adb: Change to two expected gotos. * gcc.dg/tree-ssa/pr21559.c: Change back to three expected jump threads. * gcc.dg/tree-prof/update-loopch.c: Check two Invalid sum messages for removed basic block. * gcc.dg/tree-ssa/ivopt_1.c: Fix unreliable scanning string. * gcc.dg/tree-ssa/ivopt_2.c: Ditto. * gcc.dg/tree-ssa/ivopt_3.c: Ditto. * gcc.dg/tree-ssa/ivopt_4.c: Ditto. Do you need to also update gcc.dg/tree-ssa/ssa-dom-thread-4.c, at least for logical_op_short_circuit targets? (There's a nice analysis comment there by Richard S which may also have to be updated.) This caused a regression for logical_op_short_circuit targets, I entered PR60363 for convenience. brgds, H-P -- Best Regards.
RE: [patch] [arm] Fix PR60169 - thumb1 far jump
Ping. OK for trunk and 4.8? -Original Message- From: Joey Ye [mailto:joey...@arm.com] Sent: 21 February 2014 19:32 To: gcc-patches@gcc.gnu.org Subject: [patch] [arm] Fix PR60169 - thumb1 far jump Patch http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01229.html introduced this ICE: 1. thumb1 estimate if far_jump is used based on function insn size 2. During reload, after stack layout finalized, it does reload_as_needed. It however increases insn size that changes estimation result of far_jump, which in return need to save lr and change stack layout again. While there is not chance to change, GCC crashes. Solution: Do not change estimation result of far_jump if reload_in_progress or reload_completed is true. Not likely need to fix lra according to Vlad: http://gcc.gnu.org/ml/gcc/2014-02/msg00355.html ChangeLog: * config/arm/arm.c (thumb_far_jump_used_p): Don't change if reload in progress or completed. * gcc.target/arm/thumb1-far-jump-3.c: New case. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index b562986..2cf362c 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -26255,6 +26255,11 @@ thumb_far_jump_used_p (void) return 0; } + /* We should not change far_jump_used during or after reload, as there is + no chance to change stack frame layout. */ + if (reload_in_progress || reload_completed) +return 0; + /* Check to see if the function contains a branch insn with the far jump attribute set. */ for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) diff --git a/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c new file mode 100644 index 000..90559ba --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c @@ -0,0 +1,108 @@ +/* Catch reload ICE on target thumb1 with far jump optimization. + * It is also a valid case for non-thumb1 target. */ + +/* Add -mno-lra option as it is only reproducable with reload. It will + be removed after reload is completely removed. */ +/* { dg-options -mno-lra -fomit-frame-pointer } */ +/* { dg-do compile } */ + +#define C 2 +#define A 4 +#define RGB (C | A) +#define GRAY (A) + +typedef unsigned long uint_32; +typedef unsigned char byte; +typedef byte* bytep; + +typedef struct ss +{ + uint_32 w; + uint_32 r; + byte c; + byte b; + byte p; +} info; + +typedef info * infop; + +void +foo(infop info, bytep row) +{ + uint_32 iw = info-w; + if (info-c == RGB) + { + if (info-b == 8) + { + bytep sp = row + info-r; + bytep dp = sp; + byte save; + uint_32 i; + + for (i = 0; i iw; i++) + { +save = *(--sp); +*(--dp) = *(--sp); +*(--dp) = *(--sp); +*(--dp) = *(--sp); +*(--dp) = save; + } + } + + else + { + bytep sp = row + info-r; + bytep dp = sp; + byte save[2]; + uint_32 i; + + for (i = 0; i iw; i++) + { +save[0] = *(--sp); +save[1] = *(--sp); +*(--dp) = *(--sp); +*(--dp) = *(--sp); +*(--dp) = *(--sp); +*(--dp) = *(--sp); +*(--dp) = *(--sp); +*(--dp) = *(--sp); +*(--dp) = save[0]; +*(--dp) = save[1]; + } + } + } + else if (info-c == GRAY) + { + if (info-b == 8) + { + bytep sp = row + info-r; + bytep dp = sp; + byte save; + uint_32 i; + + for (i = 0; i iw; i++) + { +save = *(--sp); +*(--dp) = *(--sp); +*(--dp) = save; + } + } + else + { + bytep sp = row + info-r; + bytep dp = sp; + byte save[2]; + uint_32 i; + + for (i = 0; i iw; i++) + { +save[0] = *(--sp); +save[1] = *(--sp); +*(--dp) = *(--sp); +*(--dp) = *(--sp); +*(--dp) = save[0]; +*(--dp) = save[1]; + } + } + } +}
[PATCH] [libgcc,arm] Fix PR 60166 - NAN fraction bits
This patch is a mirror copy from approved patch in glibc: http://sourceware.org/ml/libc-alpha/2014-02/msg00741.html OK to trunk, 4.8 and 4.7? ChangeLog.libgcc: * config/arm/sfp-machine.h (_FP_NANFRAC_H, _FP_NANFRAC_S, _FP_NANFRAC_D, _FP_NANFRAC_Q): Set to zero. diff --git a/libgcc/config/arm/sfp-machine.h b/libgcc/config/arm/sfp-machine.h index bb34895..8d45320 100644 --- a/libgcc/config/arm/sfp-machine.h +++ b/libgcc/config/arm/sfp-machine.h @@ -19,10 +19,12 @@ typedef int __gcc_CMPtype __attribute__ ((mode (__libgcc_cmp_return__))); #define _FP_DIV_MEAT_D(R,X,Y) _FP_DIV_MEAT_2_udiv(D,R,X,Y) #define _FP_DIV_MEAT_Q(R,X,Y) _FP_DIV_MEAT_4_udiv(Q,R,X,Y) -#define _FP_NANFRAC_H ((_FP_QNANBIT_H 1) - 1) -#define _FP_NANFRAC_S ((_FP_QNANBIT_S 1) - 1) -#define _FP_NANFRAC_D ((_FP_QNANBIT_D 1) - 1), -1 -#define _FP_NANFRAC_Q ((_FP_QNANBIT_Q 1) - 1), -1, -1, -1 +/* According to RTABI, QNAN is only with the most significant bit of the + significand set, and all other significand bits zero. */ +#define _FP_NANFRAC_H 0 +#define _FP_NANFRAC_S 0 +#define _FP_NANFRAC_D 0, 0 +#define _FP_NANFRAC_Q 0, 0, 0, 0 #define _FP_NANSIGN_H 0 #define _FP_NANSIGN_S 0 #define _FP_NANSIGN_D 0
RE: [PATCH v4] PR middle-end/60281
Hi, I see the problem too. But I think it is not necessary to change the stack alignment to solve the problem. It appears to me that the code in asan_emit_stack_protection is just wrong. It uses SImode when the memory is not aligned enough for that mode. This would not happen if that code is rewritten to use get_best_mode, and by the way, even on x86_64 the emitted code is not optimal, because that target could work with DImode more efficiently. So, to fix that, it would be better to concentrate on that function, and use word_mode instead of SImode, and let get_best_mode choose the required mode. Regards Bernd Edlinger.
Re: [PATCH v4] PR middle-end/60281
Hi Bernd, I agree you with the mode problem. And I have not change the stack alignment.What I change is the virtual register base's alignment. Realignment must be make in !STRICT_ALIGNMENT machine,or emitting the efficient code is impossible. For example 4 set mem:QI X,REG:QI Y will not combine into one set mem:SI X1,REG:SI Y1,if X is not mentioned as SI mode aligned. To make sure X is SI mode algined,virtual register base must be realigned. For this patch,I only intent to make it right.Making it best is next task. -- Regards lin zuojian. 于 2014年02月28日 15:47, Bernd Edlinger 写道: Hi, I see the problem too. But I think it is not necessary to change the stack alignment to solve the problem. It appears to me that the code in asan_emit_stack_protection is just wrong. It uses SImode when the memory is not aligned enough for that mode. This would not happen if that code is rewritten to use get_best_mode, and by the way, even on x86_64 the emitted code is not optimal, because that target could work with DImode more efficiently. So, to fix that, it would be better to concentrate on that function, and use word_mode instead of SImode, and let get_best_mode choose the required mode. Regards Bernd Edlinger.