Re: [PATCH 01/13] recog: Increased max number of alternatives - v2
On Mon, May 18, 2015 at 3:41 PM, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote: The new version also changes the type for the alternative_mask to unsigned HOST_WIDE_INT. Bootstrapped without regressions on x86-64. Ok to apply? Please use uint64_t instead. Richard. Bye, -Andreas- gcc/ * recog.h: Increase MAX_RECOG_ALTERNATIVES. --- gcc/recog.h |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/recog.h b/gcc/recog.h index 463c748..07700bd 100644 --- a/gcc/recog.h +++ b/gcc/recog.h @@ -23,8 +23,8 @@ along with GCC; see the file COPYING3. If not see /* Random number that should be large enough for all purposes. Also define a type that has at least MAX_RECOG_ALTERNATIVES + 1 bits, with the extra bit giving an invalid value that can be used to mean uninitialized. */ -#define MAX_RECOG_ALTERNATIVES 30 -typedef unsigned int alternative_mask; +#define MAX_RECOG_ALTERNATIVES 35 +typedef unsigned HOST_WIDE_INT alternative_mask; /* A mask of all alternatives. */ #define ALL_ALTERNATIVES ((alternative_mask) -1)
[PATCH][AArch64][PR 66136] rewrite geniterators.sh in awk
Rewrote the generator script in awk, to avoid dealing with sed portability issues. gcc/Changelog: 2015-05-18 Szabolcs Nagy szabolcs.n...@arm.com PR target/66136 * config/aarch64/geniterators.sh: Rewrite in awk. diff --git a/gcc/config/aarch64/geniterators.sh b/gcc/config/aarch64/geniterators.sh index f908e89..5a51d29 100644 --- a/gcc/config/aarch64/geniterators.sh +++ b/gcc/config/aarch64/geniterators.sh @@ -22,25 +22,52 @@ # Generate aarch64-builtin-iterators.h, a file containing a series of # BUILTIN_ITERATOR macros, which expand to VARN Macros covering the # same set of modes as the iterator in iterators.md - -echo /* -*- buffer-read-only: t -*- */ -echo /* Generated automatically by geniterators.sh from iterators.md. */ -echo #ifndef GCC_AARCH64_ITERATORS_H -echo #define GCC_AARCH64_ITERATORS_H - -# Strip newlines, create records marked ITERATOR, and strip junk (anything -# which does not have a matching brace because it contains characters we +# +# Find the ITERATOR definitions (may span several lines), skip the ones +# which does not have a simple format because it contains characters we # don't want to or can't handle (e.g P, PTR iterators change depending on # Pmode and ptr_mode). -export LC_ALL=C -cat $1 | tr \n \ - | sed 's/(define_mode_iterator \([A-Za-z0-9_]*\) \([]\[A-Z0-9 \t]*\)/\n#define BUILTIN_\1(T, N, MAP) \\ \2\n/g' \ - | grep '#define [A-Z0-9_(), \\]* \[[A-Z0-9[:space:]]*]' \ - | sed 's/\t//g' \ - | sed 's/ */ /g' \ - | sed 's/ \[\([A-Z0-9 ]*\)]/\n\1/' \ - | awk ' BEGIN { FS = ; OFS = , } \ - /#/ { print } \ - ! /#/ { $1 = $1 ; printf VAR%d (T, N, MAP, %s)\n, NF, tolower($0) }' - -echo #endif /* GCC_AARCH64_ITERATORS_H */ +LC_ALL=C awk ' +BEGIN { + print /* -*- buffer-read-only: t -*- */ + print /* Generated automatically by geniterators.sh from iterators.md. */ + print #ifndef GCC_AARCH64_ITERATORS_H + print #define GCC_AARCH64_ITERATORS_H +} + +{ + sub(/;.*/, ) +} + +iterdef { + s = s $0 +} + +!iterdef /\(define_mode_iterator/ { + iterdef = 1 + s = $0 + sub(/.*\(define_mode_iterator/, , s) +} + +iterdef s ~ /\)/ { + iterdef = 0 + + gsub(/[ \t]+/, , s) + sub(/ *\).*/, , s) + sub(/^ /, , s) + if (s !~ /^[A-Za-z0-9_]+ \[[A-Z0-9 ]*\]$/) + next + sub(/\[ */, , s) + sub(/ *\]/, , s) + + n = split(s, a) + printf #define BUILTIN_ a[1] (T, N, MAP) \\\n + printf VAR (n-1) (T, N, MAP + for (i = 2; i = n; i++) + printf , tolower(a[i]) + printf )\n +} + +END { + print #endif /* GCC_AARCH64_ITERATORS_H */ +}' $1
[patch,gomp4] remove gwv test case
This patch removes the libgomp.oacc-c/gwv.c. test case. The original idea behind this test was to allow us to create our own parallel loops using by using the gang, worker and vector threads directly by using the built-in GOACC_ thread functions. This test was supposed to ensure that those functions work. Now that we're working on a proper execution model, exposing those functions to the end user doesn't make much sense. That's at least true for workers and vectors because they start off in worker-single and vector-single mode, respectively. Then there's the other issue of variable broadcasting. As a result, this test case isn't necessary going forward. Those thread functions are already being exercised by omp-low for each acc loop anyway. Cesar 2015-05-18 Cesar Philippidis ce...@codesourcery.com libgomp/ * testsuite/libgomp.oacc-c/gwv.c: Delete. diff --git a/libgomp/testsuite/libgomp.oacc-c/gwv.c b/libgomp/testsuite/libgomp.oacc-c/gwv.c deleted file mode 100644 index f8f1169..000 --- a/libgomp/testsuite/libgomp.oacc-c/gwv.c +++ /dev/null @@ -1,34 +0,0 @@ -#include stdlib.h -#include openacc.h - -extern int __builtin_GOACC_ntid (int); -extern int __builtin_GOACC_tid (int); -extern int __builtin_GOACC_nctaid (int); -extern int __builtin_GOACC_ctaid (int); - -int main () -{ - int gangs = -1, vectors = -1; - int GANGS = 100, VECTORS = 64; - -#pragma acc parallel num_gangs (GANGS) vector_length (VECTORS) copyout (gangs, vectors) - { -gangs = __builtin_GOACC_nctaid (0); -vectors = __builtin_GOACC_ntid (0); - } - - if (acc_get_device_type () == acc_device_nvidia) -{ - if (gangs != GANGS) - abort (); - if (vectors != VECTORS) - abort (); -} - else -{ - if (gangs != 0) - abort (); - if (vectors != 0) - abort (); -} -}
RE: [PATCH, MIPS]: Fix internal compiler error: in check_bool_attrs, at recog.c:2218 for micromips attribute
On Mon, 18 May 2015, Robert Suchanek wrote: Just to check, the reason we don't see this in the current testsuite is that there is no test with both MIPS and microMIPS functions? Correct. The testsuite either tests microMIPS or MIPS but not both in the same TU. We could add -mflip-micromips complementing -mflip-mips16 and use that for testing too. Chances are it'd reveal further issues. Looking at how -mflip-mips16 has been implemented it does not appear to me adding -mflip-micromips would be a lot of effort. Maciej
Re: [patch,gomp4] remove gwv test case
On Mon, May 18, 2015 at 07:57:22AM -0700, Cesar Philippidis wrote: This patch removes the libgomp.oacc-c/gwv.c. test case. The original idea behind this test was to allow us to create our own parallel loops using by using the gang, worker and vector threads directly by using the built-in GOACC_ thread functions. This test was supposed to ensure that those functions work. Now that we're working on a proper execution model, exposing those functions to the end user doesn't make much sense. That's at least true for workers and vectors because they start off in worker-single and vector-single mode, respectively. Then there's the other issue of variable broadcasting. As a result, this test case isn't necessary going forward. Those thread functions are already being exercised by omp-low for each acc loop anyway. Cesar 2015-05-18 Cesar Philippidis ce...@codesourcery.com libgomp/ * testsuite/libgomp.oacc-c/gwv.c: Delete. Ok with me. Jakub
Re: Check canonical types in verify_type
+ if (!comp_type_attributes (t1, t2)) + return false; Because I think the TBAA does not care about attribute lists. I suppose this is kind-of harmless because of: I think that was about attributes like 'packed', but those should have their effects applied at stor-layout time. So yes, I think this can be dropped. It does test the attribute list for functions only. I think packed should be detected from memory layout, so I will test patch dropping this. + /* For canonical type comparisons we do not want to build SCCs +so we cannot compare pointed-to types. But we can, for now, +require the same pointed-to type kind and match what +useless_type_conversion_p would do. */ + if (POINTER_TYPE_P (t1)) + { + if (TYPE_ADDR_SPACE (TREE_TYPE (t1)) + != TYPE_ADDR_SPACE (TREE_TYPE (t2))) + return false; + + if (TREE_CODE (TREE_TYPE (t1)) != TREE_CODE (TREE_TYPE (t2))) + return false; + } Is the reason for not making differences between differnt pointer types really just lazyness to not deal with SCCs? For that it is easy to add set of visited type pairs, just like odr_types_equivalent_p does. Well - TBAA doesn't care about pointed-to types. See get_alias_set else if (POINTER_TYPE_P (t) t != ptr_type_node) set = get_alias_set (ptr_type_node); Hmm, I see, interesting hack. For the first part of comment, I see that qualifiers needs to be ignored, but I do not see why we put short * and int * pointers to same class. The complete/incomplete type fun with LTO can be solved for C++ but indeed I see why pointer to incomplete type needs to be considered compatible with every other pointer to structure. Can't this be dealt with by adding correct edges into the aliasing dag? so the above would at most matter for pointer-type fields of aggregates. And no, we don't want to deal with SCCs - I doubt it would bring any benefit and the issue here is that I am worried about SCCs being present dependent on the order of streaming (well, the code pre-dates SCC streaming). Because we now stream in SCC order anyway, most of time we won't even need to populate it. Shall I implement this? You can certainly try - but I think for cross-language interoperability we want to treat struct X { void *p; } struct Y { void *p; } and struct X { Y *p; } struct Y { X *q; } as compatible so I don't think we can use SCCs to partition alias sets further (that is, we _do_ have to treat pointers conservatively). The current code is as far as we can get IMHO (well, even the current code is too strict in making struct { void *p } and struct { int *p } not have the same alias-set. I will give it a try. Do you have some examples where the above matters for cross-language TBAA? Other issue I run into is that for Ada bootstrap we have variadic type whose canonical types are having different temporary set as size. I think this is valid and perhaps gimple_canonical_types_compatible_p should consider variadic arrays to be compatible with any array of compatible type? Those are all local types and thus the strict equality compare should be ok? Not sure if we can do in C sth like void foo (int n) { struct { int i; int a[n]; } a; struct { int i; int a[n]; } b; } and if we'd have to assign the same alias-sets to 'a' and 'b'. No idea here either. I wonder if the types are intedned to be TBAA compatible across two calls of the function. In that case we may introduce multiple copies of body by early inlining already that may be a problem. I am not quite convinced we get variadic types right at LTO time, because they bypass canonical type calculation anyway and their canonical type is set by TYPE_MAIN_VARIANT in lto_read_body_or_constructor which I think is not safe. I will look for a testcase. That is because if they are streamed locally they do not enter type merging, but they still go via gimple_register_canonical_type, so I'm not sure where you see they always get their main variant as canonical type. I tought these are handled by: /* And fixup types we streamed locally. */ { struct streamer_tree_cache_d *cache = data_in-reader_cache; unsigned len = cache-nodes.length (); unsigned i; for (i = len; i-- from;) { tree t = streamer_tree_cache_get_tree (cache, i); if (t == NULL_TREE) continue; if (TYPE_P (t)) { gcc_assert (TYPE_CANONICAL (t) == NULL_TREE); TYPE_CANONICAL (t) = TYPE_MAIN_VARIANT (t); if (TYPE_MAIN_VARIANT (t) != t) { gcc_assert (TYPE_NEXT_VARIANT (t) == NULL_TREE); TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (TYPE_MAIN_VARIANT (t));
Re: [RFA] More type narrowing in match.pd V2
On Mon, 18 May 2015, Richard Biener wrote: On Thu, May 14, 2015 at 4:28 PM, Jeff Law l...@redhat.com wrote: On 05/14/2015 08:04 AM, Marc Glisse wrote: On Fri, 1 May 2015, Jeff Law wrote: Slight refactoring of the condition by using types_match as suggested by Richi. I also applied the new types_match to 2 other patterns in match.pd where it seemed clearly appropriate. I would like to propose this small tweak (regtested ok). If we had a different type for trees and types, this would be overloading the function. We already do this in a few places, and I find the resulting shorter code more readable. 2015-05-14 Marc Glisse marc.gli...@inria.fr * generic-match-head.c (types_match): Handle non-types. * gimple-match-head.c (types_match): Likewise. * match.pd: Remove unnecessary TREE_TYPE for types_match. Update the comment for types_match and this is fine. No - it breaks genmatch autodetection of what trees escape. It special-cases TREE_TYPE (). See capture_info::walk_c_expr: Oh, I didn't know that. /* Give up for C exprs mentioning captures not inside TREE_TYPE (). */ unsigned p_depth = 0; so the patch will pessimize GENERIC folding. Please do not apply. I think it was already applied. Feel free to revert it, or tell me and I'll do it next week-end (from your next message I don't really know if it should be reverted or not). -- Marc Glisse
Re: [RFA] More type narrowing in match.pd V2
On Mon, May 18, 2015 at 2:34 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 18 May 2015, Richard Biener wrote: On Thu, May 14, 2015 at 4:28 PM, Jeff Law l...@redhat.com wrote: On 05/14/2015 08:04 AM, Marc Glisse wrote: On Fri, 1 May 2015, Jeff Law wrote: Slight refactoring of the condition by using types_match as suggested by Richi. I also applied the new types_match to 2 other patterns in match.pd where it seemed clearly appropriate. I would like to propose this small tweak (regtested ok). If we had a different type for trees and types, this would be overloading the function. We already do this in a few places, and I find the resulting shorter code more readable. 2015-05-14 Marc Glisse marc.gli...@inria.fr * generic-match-head.c (types_match): Handle non-types. * gimple-match-head.c (types_match): Likewise. * match.pd: Remove unnecessary TREE_TYPE for types_match. Update the comment for types_match and this is fine. No - it breaks genmatch autodetection of what trees escape. It special-cases TREE_TYPE (). See capture_info::walk_c_expr: Oh, I didn't know that. /* Give up for C exprs mentioning captures not inside TREE_TYPE (). */ unsigned p_depth = 0; so the patch will pessimize GENERIC folding. Please do not apply. I think it was already applied. Feel free to revert it, or tell me and I'll do it next week-end (from your next message I don't really know if it should be reverted or not). It's always good to compare generated code (gimple/generic-match.c in the build directory) for such changes. Only the obvious differences should appear. If they do you can leave the patch in. Richard. -- Marc Glisse
Re: [patch 10/10] debug-early merge: compiler proper
On Fri, May 8, 2015 at 2:40 AM, Aldy Hernandez al...@redhat.com wrote: As seen on TV. +/* FIRST_TIME is set to TRUE for the first time we are called for a + translation unit from finalize_compilation_unit() or false + otherwise. */ + static void -analyze_functions (void) +analyze_functions (bool first_time) { ... + if (first_time) +{ + symtab_node *snode; + FOR_EACH_SYMBOL (snode) + check_global_declaration (snode-decl); +} + I think it is better to split analyze_functions (why does it have it's own copy of unreachable node removal?) into analysis and unused-symbol removal and have the check_global_declaration call be in finalize_compilation_unit directly. Honza? It is trying to avoid analyzing functions that are never used. Analyzing include gimplification and other stuff that is not for free. remove_unreachable_nodes works only on fully built cgraph. Main reason why analyze_functions is intended to be called multiple times was --combine support. After each end of C source file you called symbol_table::finalize_compilation_unit which did unreachable code removal and saved some memory. I am not sure symbol table is useful for debug info this way: we insert only function definitions into symbol table and external declarations are inserted lazilly. That is why we still have the (convoluted) check_global_declarations that walks the vectors provided by frontends. @@ -1113,6 +1131,19 @@ analyze_functions (void) { if (symtab-dump_file) fprintf (symtab-dump_file, %s, node-name ()); + + /* See if the debugger can use anything before the DECL +passes away. Perhaps it can notice a DECL that is now a +constant and can tag the early DIE with an appropriate +attribute. + +Otherwise, this is the last chance the debug_hooks have +at looking at optimized away DECLs, since +late_global_decl will subsequently be called from the +contents of the now pruned symbol table. */ + if (!decl_function_context (node-decl)) + (*debug_hooks-late_global_decl) (node-decl); + node-remove (); so this applies to VAR_DECLs only - shouldn't this be in the varpool_node::remove function then? You can even register/deregister a hook for this in finalize_compilation_unit. That would IMHO be better. There is varpool_node::remove_initializer that is intended to throw away the constructor. I suppose late_global_decl can be called from there? @@ -1334,6 +1334,9 @@ struct GTY(()) tree_block { tree abstract_origin; tree fragment_origin; tree fragment_chain; + + /* Pointer to the DWARF lexical block. */ + struct die_struct *die; }; struct GTY(()) tree_type_common { Ick - do we need this? dwarf2out.c has a hashtable to map blocks to DIEs (which you don't remove in turn). Note that types also have tree_type_symtab that in turn contains die pointer. Honza Jason - did you intentionally not yet approve the dwarf2out.c changes (like, are you expecting somebody else to do a review)? Thanks, Richard.
Re: [match-and-simplify] fix incorrect code-gen in 'for' pattern
On 18 May 2015 at 14:12, Richard Biener rguent...@suse.de wrote: On Sat, 16 May 2015, Prathamesh Kulkarni wrote: Hi, genmatch generates incorrect code for following (artificial) pattern: (for op (plus) op2 (op) (simplify (op @x @y) (op2 @x @y) generated gimple code: http://pastebin.com/h1uau9qB 'op' is not replaced in the generated code on line 33: *res_code = op; I think it would be a better idea to make op2 iterate over same set of operators (op2-substitutes = op-substitutes). I have attached patch for the same. Bootstrap + testing in progress on x86_64-unknown-linux-gnu. OK for trunk after bootstrap+testing completes ? Hmm, but then the example could as well just use 'op'. I think we should instead reject this. Consider (for op (plus minus) (for op2 (op) (simplify ... where it is not clear what would be desired. Simple replacement of 'op's value would again just mean you could have used 'op' in the first place. Doing what you propose would get you (for op (plus minus) (for op2 (plus minus) (simplify ... thus a different iteration. I wonder if we really need is_oper_list flag in user_id ? We can determine if user_id is an operator list if user_id::substitutes is not empty ? After your change yes. That will lose the ability to distinguish between user-defined operator list and list-iterator in for like op/op2, but I suppose we (so far) don't need to distinguish between them ? Well, your change would simply make each list-iterator a (temporary) user-defined operator list as well as the current iterator element (dependent on context - see the nested for example). I think that adds to confusion. So - can you instead reject this use? Well my intention was to have support for walking operator list in reverse. For eg: (for bitop (bit_and bit_ior) rbitop (bit_ior bit_and) ...) Could be replaced by sth like: (for bitop (bit_and bit_ior) rbitop (~bitop)) ...) where ~bitop would indicate walking (bit_and bit_ior) in reverse. Would that be a good idea ? For symmetry, I thought (for op (list) op2 (op)) should be supported too. Thanks, Prathamesh Thanks, Richard.
[PATCH][Testsuite] Disable tests with dg-require-fork for simulated targets
Simulators such as qemu report the presence of fork (it's in glibc) but generally do not support synchronization primitives between threads, so any tests using fork are unreliable. This patch disables the subset of such tests that identify themselves using dg-require-fork. At present, such tests are limited to (a) gcc.dg/torture/ftrapv-1.c. and (b) some tests in the 27_io section of the libstdc++ testsuite, listed below. Further patches can add dg-require-fork to the many other tests that call fork(). Cross-tested on aarch64-none-linux-gnu using qemu, with these tests becoming UNSUPPORTED: (gcc) gcc.dg/torture/ftrapv-1.c (libstdc++) 27_io/basic_filebuf/close/char/4879.cc 27_io/basic_filebuf/close/char/9964.cc 27_io/basic_filebuf/seekoff/char/26777.cc 27_io/basic_filebuf/showmanyc/char/9533-1.cc 27_io/basic_filebuf/underflow/char/10097.cc 27_io/objects/char/7.cc 27_io/objects/char/9661-1.cc 27_io/objects/wchar_t/7.cc 27_io/objects/wchar_t/9661-1.cc Thoughts? Is this patch OK for trunk? Cheers, Alan gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_fork_available): Return 0 if running on simulator. diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index f632d00..0ffd35c 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -1781,6 +1781,10 @@ proc check_function_available { function } { # Returns true iff fork is available on the target system. proc check_fork_available {} { +if { [check_effective_target_simulator] } { + # Most simulators do not properly support multithreading + return 0 +} return [check_function_available fork] }
Re: [PATCH][calls.c] Remove #ifdef checks on STACK_GROWS_DOWNWARD
On 05/18/2015 03:22 AM, Kyrill Tkachov wrote: Hi all, As per https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01166.html I saw that STACK_GROWS_DOWNWARD is being checked for ifdef in a number of places unrelated to the patch in the PR, so decided to remove the ifdef checks for STACK_GROWS_DOWNWARD in a separate patch. It's fairly mechanical. Bootstrapped on x86_64 and aarch64 and tested on arm. I believe these patches are considered obvious, so I'll commit this in a day, unless someone objects, before proceeding with the changes in https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01166.html 2015-05-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * calls.c: Always define STACK_GROWS_DOWNWARD as 0 or 1. (mem_overlaps_already_clobbered_arg_p): Rewrite ifdef STACK_GROWS_DOWNWARD as normal if. (expand_call): Likewise. Thanks for taking care of this... jeff
RE: [PATCH, MIPS]: Fix internal compiler error: in check_bool_attrs, at recog.c:2218 for micromips attribute
Hi Matthew, This patch fixes an internal compiler error when micromips/nomicromips attributes are used. The problem here was that the cached boolean attributes for the current target did not agree with the uncached attributes throwing an assertion error. It appears that saving and restoring the state for micromips was missing, just like there is for mips16. OK to apply? Just to check, the reason we don't see this in the current testsuite is that there is no test with both MIPS and microMIPS functions? Correct. The testsuite either tests microMIPS or MIPS but not both in the same TU. gcc/ * config/mips/mips.c (micromips_globals): New variable. (mips_set_compression_mode): Save and reinitialize target-dependent state for microMIPS. gcc/testsuite * gcc.target/mips/umips-attr.c: New test. OK. Committed as r223294. Regards, Robert
Re: [PATCH] [AArch32] Additional bics patterns.
Committed r223295. Alex.
Re: RFC: Add a new relocation to x86-64/i386 psABIs
On Mon, May 18, 2015 at 6:13 AM, Michael Matz m...@suse.de wrote: Hi, On Mon, 18 May 2015, H.J. Lu wrote: To avoid indirect branch to internal functions, I am proposing to add a new relocation, R_X86_64_RELAX_GOTPCREL, to x86-64 psABI: 1. When branching to an external function, foo, compiler may generate call/jmp *foo@GOTRELAX(%rip) which generates R_X86_64_RELAX_GOTPCREL relocation, instead of call/jmp foo[@PLT] 2. When function foo is locally defined, linker converts call/jmp *foo@GOTRELAX(%rip) to nop call/jmp foo For the jmp case the nop can also be added after it, to not even disturb Yes, we should convert it to nop call foo/jmp foo nop I implemented it on users/hjl/relax branch in binutils git repo. the insn decoder. For calls as well of course, but there it might be better to have it before the call. I think a nop prefix is better on call. We won't mandate nop call foo in psABI and linker is free to use either a nop prefix or a nop suffix. Should we move forward with it? Thanks. -- H.J.
Re: RFC: Add a new relocation to x86-64/i386 psABIs
Hi, On Mon, 18 May 2015, H.J. Lu wrote: To avoid indirect branch to internal functions, I am proposing to add a new relocation, R_X86_64_RELAX_GOTPCREL, to x86-64 psABI: 1. When branching to an external function, foo, compiler may generate call/jmp *foo@GOTRELAX(%rip) which generates R_X86_64_RELAX_GOTPCREL relocation, instead of call/jmp foo[@PLT] 2. When function foo is locally defined, linker converts call/jmp *foo@GOTRELAX(%rip) to nop call/jmp foo For the jmp case the nop can also be added after it, to not even disturb the insn decoder. For calls as well of course, but there it might be better to have it before the call. Ciao, Michael.
Re: [PING][PR65637] Fix ssa-handling code in expand_omp_for_static_chunk
On 18-05-15 14:53, Tom de Vries wrote: On 15-04-15 15:10, Tom de Vries wrote: Hi, This patch series fixes PR65637. Currently, ssa-handling code in expand_omp_for_static_chunk is dead and not exercised by testing. Ssa-handling code in omp-low.c is only triggered by pass_parallelize_loops, and that pass doesn't specify a chunk size on the GIMPLE_OMP_FOR it constructs, so that only exercises the expand_omp_for_static_nochunk path. Using the attached trigger patch, we excercise the ssa-handling code in expand_omp_for_static_chunk. The following patch series fixes the problems in the ssa-handling code that we encounter. 1. Fix gcc_assert in expand_omp_for_static_chunk 2. Fix inner loop phi in expand_omp_for_static_chunk 3. Handle 2 preds for fin_bb in expand_omp_for_static_chunk The patch series has been bootstrapped and reg-tested on x86_64 together with attached trigger patch. I'll post the patches from the patch series individually, in response to this email. Ping for the three patches. Committed to gomp-4_0-branch. Thanks, - Tom 0001-Set-chunk_size-to-one-for-parloops-parallel.patch Set chunk_size to one for parloops parallel --- gcc/tree-parloops.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 62a6444..862c420 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1719,6 +1719,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, type = TREE_TYPE (cvar); t = build_omp_clause (loc, OMP_CLAUSE_SCHEDULE); OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC; + OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (t) = integer_one_node; for_stmt = gimple_build_omp_for (NULL, GF_OMP_FOR_KIND_FOR, t, 1, NULL); gimple_set_location (for_stmt, loc); -- 1.9.1
RFC: Add a new relocation to x86-64/i386 psABIs
To avoid indirect branch to internal functions, I am proposing to add a new relocation, R_X86_64_RELAX_GOTPCREL, to x86-64 psABI: 1. When branching to an external function, foo, compiler may generate call/jmp *foo@GOTRELAX(%rip) which generates R_X86_64_RELAX_GOTPCREL relocation, instead of call/jmp foo[@PLT] 2. When function foo is locally defined, linker converts call/jmp *foo@GOTRELAX(%rip) to nop call/jmp foo 3. Otherwise, linker treats R_X86_64_RELAX_GOTPCREL the same way as R_X86_64_GOTPCREL. For i386 psABI, we add R_386_RELAX_GOT32: 1. When branching to an external function, foo, in non-PIC mode, compiler may generate call/jmp *foo@GOTRELAX which generates R_386_RELAX_GOT32 relocation, instead of call/jmp foo and in PIC mode call/jmp *foo@GOTRELAX(%reg) which generates R_386_RELAX_GOT32 relocation and REG holds the address of GOT, instead of call/jmp foo@PLT 2. When function foo is locally defined, linker converts call/jmp *foo@GOTRELAX[(%reg)] to nop call/jmp foo 3. Otherwise, a. In PIC mode, linker treats R_386_RELAX_GOT32 the same way as R_386_GOT32 and call/jmp *foo@GOTRELAX is unsupported. b. In no-PIC mode, linker computes its relocation value as relocation value of R_386_GOT32 plus the address of GOT and converts call/jmp *foo@GOTRELAX(%reg) to call/jmp *foo@GOTRELAX if needed. This new relocation effectively turns off lazy binding on function, foo. For i386, compiler is free to choose any register to hold the address of GOT and there is no need to make EBX a fixed register when branching to an external function in PIC mode. With this new relocation, only a one-byte NOP prefix overhead is added when function, foo, which compiler determines is external, turns out to be local at link-time, because of -Bsymbolic or a definition in another input object file which compiler has no knowledge of. The new -fno-plt GCC option can use R_X86_64_RELAX_GOTPCREL and R_386_RELAX_GOT32 relocations if linker supports them to avoid indirect branch to internal functions. H.J.
[PING][PR65637] Fix ssa-handling code in expand_omp_for_static_chunk
On 15-04-15 15:10, Tom de Vries wrote: Hi, This patch series fixes PR65637. Currently, ssa-handling code in expand_omp_for_static_chunk is dead and not exercised by testing. Ssa-handling code in omp-low.c is only triggered by pass_parallelize_loops, and that pass doesn't specify a chunk size on the GIMPLE_OMP_FOR it constructs, so that only exercises the expand_omp_for_static_nochunk path. Using the attached trigger patch, we excercise the ssa-handling code in expand_omp_for_static_chunk. The following patch series fixes the problems in the ssa-handling code that we encounter. 1. Fix gcc_assert in expand_omp_for_static_chunk 2. Fix inner loop phi in expand_omp_for_static_chunk 3. Handle 2 preds for fin_bb in expand_omp_for_static_chunk The patch series has been bootstrapped and reg-tested on x86_64 together with attached trigger patch. I'll post the patches from the patch series individually, in response to this email. Ping for the three patches. Thanks, - Tom 0001-Set-chunk_size-to-one-for-parloops-parallel.patch Set chunk_size to one for parloops parallel --- gcc/tree-parloops.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 62a6444..862c420 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1719,6 +1719,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data, type = TREE_TYPE (cvar); t = build_omp_clause (loc, OMP_CLAUSE_SCHEDULE); OMP_CLAUSE_SCHEDULE_KIND (t) = OMP_CLAUSE_SCHEDULE_STATIC; + OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (t) = integer_one_node; for_stmt = gimple_build_omp_for (NULL, GF_OMP_FOR_KIND_FOR, t, 1, NULL); gimple_set_location (for_stmt, loc); -- 1.9.1
[PATCH 01/13] recog: Increased max number of alternatives - v2
The new version also changes the type for the alternative_mask to unsigned HOST_WIDE_INT. Bootstrapped without regressions on x86-64. Ok to apply? Bye, -Andreas- gcc/ * recog.h: Increase MAX_RECOG_ALTERNATIVES. --- gcc/recog.h |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/recog.h b/gcc/recog.h index 463c748..07700bd 100644 --- a/gcc/recog.h +++ b/gcc/recog.h @@ -23,8 +23,8 @@ along with GCC; see the file COPYING3. If not see /* Random number that should be large enough for all purposes. Also define a type that has at least MAX_RECOG_ALTERNATIVES + 1 bits, with the extra bit giving an invalid value that can be used to mean uninitialized. */ -#define MAX_RECOG_ALTERNATIVES 30 -typedef unsigned int alternative_mask; +#define MAX_RECOG_ALTERNATIVES 35 +typedef unsigned HOST_WIDE_INT alternative_mask; /* A mask of all alternatives. */ #define ALL_ALTERNATIVES ((alternative_mask) -1)
Re: [PATCH][ARM] PR/65711: Don't pass '-dynamic-linker' when '-shared' is used
On Thu, Apr 23, 2015 at 9:29 AM, Ludovic Courtès l...@gnu.org wrote: As discussed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65711. Patch is for both 4.8 and 4.9 (possibly 5.1 too, I haven’t checked.) OK for trunk. This is also ok for all release branches if no objections in 24 hours. Sorry about the delayed review. regards Ramana Tested on armhf-linux-gnu (armv7.) gcc/ 2015-04-23 Ludovic Courtès l...@gnu.org PR 65711 * config/arm/linux-elf.h (LINUX_TARGET_LINK_SPEC): Move '-dynamic-linker' within %{!shared: ...}.
Re: [PATCH diagnostics/fortran] Handle two locations for the same diagnostic. Convert all gfc_warning_1 and gfc_notify_std_1 calls
Manuel López-Ibáñez lopeziba...@gmail.com writes: On 15 May 2015 at 10:39, Dodji Seketeli do...@redhat.com wrote: Manuel López-Ibáñez lopeziba...@gmail.com writes: -/* Expand the location of this diagnostic. Use this function for consistency. */ +/* Return the location associated to this diagnostic. WHICH specifies Here, I think only the 'W' (in WHICH) should be uppercase. I'm following the convention that parameter names are uppercase in the description of functions. Oh, okay then. My bad. Sorry. /* The type of a text to be formatted according a format specification along with a list of things. */ struct text_info { +public: As this is a struct, the 'public' here is not necessary, as the members are public by default. I have a very poor memory for such details ;), since we are using 'private:' already, does it really hurt to be explicit and use 'public:' here? It doesn't hurt, per se. But I think it's a very common style to avoid specifying the 'public' in this case, so I'd rather just remove it, yes. [...] OK to commit with this change then. Cheers, -- Dodji
Re: [PATCH] [AArch32] Additional bics patterns.
Hi Alex, On 15/05/15 18:53, Alex Velenko wrote: On 01/05/15 10:28, Kyrill Tkachov wrote: Can you please confirm that bootstraps with both arm and thumb pass? That is, configured with --with-mode=arm and --with-mode=thumb Hi Kyrill, Bootstrapped on arm-none-gnueabihf with arm and thumb mode. Following patch requires bics shift operand on Thumb2 to be const int, as bics shifted by register is not supported by Thumb2. Is patch ok? gcc 2015-05-15 Alex Velenko alex.vele...@arm.com * config/arm/arm.md (andsi_not_shiftsi_si_scc): New pattern. * (andsi_not_shiftsi_si_scc_no_reuse): New pattern. No need for the '*', just: * config/arm/arm.md (andsi_not_shiftsi_si_scc): New pattern. (andsi_not_shiftsi_si_scc_no_reuse): Likewise. will do. Ok for trunk with that ChangeLog nit. Thanks, Kyrill gcc/testsuite 2015-05-15 Alex Velenko alex.vele...@arm.com * gcc.target/arm/bics_1.c : New testcase. * gcc.target/arm/bics_2.c : New testcase. * gcc.target/arm/bics_3.c : New testcase. * gcc.target/arm/bics_4.c : New testcase. --- gcc/config/arm/arm.md | 49 ++ gcc/testsuite/gcc.target/arm/bics_1.c | 54 + gcc/testsuite/gcc.target/arm/bics_2.c | 57 +++ gcc/testsuite/gcc.target/arm/bics_3.c | 41 + gcc/testsuite/gcc.target/arm/bics_4.c | 49 ++ 5 files changed, 250 insertions(+) create mode 100644 gcc/testsuite/gcc.target/arm/bics_1.c create mode 100644 gcc/testsuite/gcc.target/arm/bics_2.c create mode 100644 gcc/testsuite/gcc.target/arm/bics_3.c create mode 100644 gcc/testsuite/gcc.target/arm/bics_4.c diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 164ac13..26d3ad2 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -2768,6 +2768,55 @@ (const_string logic_shift_reg)))] ) +;; Shifted bics pattern used to set up CC status register and not reusing +;; bics output. Pattern restricts Thumb2 shift operand as bics for Thumb2 +;; does not support shift by register. +(define_insn andsi_not_shiftsi_si_scc_no_reuse + [(set (reg:CC_NOOV CC_REGNUM) + (compare:CC_NOOV + (and:SI (not:SI (match_operator:SI 0 shift_operator + [(match_operand:SI 1 s_register_operand r) +(match_operand:SI 2 arm_rhs_operand rM)])) + (match_operand:SI 3 s_register_operand r)) + (const_int 0))) + (clobber (match_scratch:SI 4 =3Dr))] + TARGET_ARM || (TARGET_THUMB2 CONST_INT_P (operands[2])) + bic%.%?\\t%4, %3, %1%S0 + [(set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr conds set) + (set_attr shift 1) + (set (attr type) (if_then_else (match_operand 2 const_int_operand ) + (const_string logic_shift_imm) + (const_string logic_shift_reg)))] +) + +;; Same as andsi_not_shiftsi_si_scc_no_reuse, but the bics result is also +;; getting reused later. +(define_insn andsi_not_shiftsi_si_scc + [(parallel [(set (reg:CC_NOOV CC_REGNUM) + (compare:CC_NOOV + (and:SI (not:SI (match_operator:SI 0 shift_operator + [(match_operand:SI 1 s_register_operand r) +(match_operand:SI 2 arm_rhs_operand rM)])) + (match_operand:SI 3 s_register_operand r)) + (const_int 0))) + (set (match_operand:SI 4 s_register_operand =3Dr) +(and:SI (not:SI (match_op_dup 0 +[(match_dup 1) + (match_dup 2)])) +(match_dup 3)))])] + TARGET_ARM || (TARGET_THUMB2 CONST_INT_P (operands[2])) + bic%.%?\\t%4, %3, %1%S0 + [(set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr conds set) + (set_attr shift 1) + (set (attr type) (if_then_else (match_operand 2 const_int_operand ) + (const_string logic_shift_imm) + (const_string logic_shift_reg)))] +) + (define_insn *andsi_notsi_si_compare0 [(set (reg:CC_NOOV CC_REGNUM) (compare:CC_NOOV diff --git a/gcc/testsuite/gcc.target/arm/bics_1.c b/gcc/testsuite/gcc.target/arm/bics_1.c new file mode 100644 index 000..173eb89 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/bics_1.c @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ +/* { dg-require-effective-target arm32 } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d =3D a ~b; + + /* { dg-final { scan-assembler-times bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+ 2 } } */ + if (d =3D=3D 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d =3D a ~(b 3); + + /* { dg-final { scan-assembler-times bics\tr\[0-9\]+, r\[0-9\]+, r\[0-9\]+, .sl \#3 1 } }
RE: [PATCH] [gcc, combine] Backport to GCC 5.0 branch PR46164: Don't combine the insns if a volatile register is contained.
-Original Message- From: Segher Boessenkool [mailto:seg...@kernel.crashing.org] Sent: Thursday, May 14, 2015 9:29 PM To: Hale Wang Cc: l...@redhat.com; GCC Patches; Richard Sandiford; 'Terry Guo' Subject: Re: [PATCH] [gcc, combine] Backport to GCC 5.0 branch PR46164: Don't combine the insns if a volatile register is contained. On Thu, May 14, 2015 at 01:56:54PM +0800, Hale Wang wrote: gcc/ChangeLog: 2015-04-22 Hale Wang hale.w...@arm.com Terry Guo terry@arm.com PR rtl-optimization/64818 * combine.c (can_combine_p): Don't combine user-specified register if it is in an asm input. gcc/testsuite/ChangeLog: 2015-04-22 Hale Wang hale.w...@arm.com Terry Guo terry@arm.com PR rtl-optimization/64818 * gcc.target/arm/pr64818.c: New. This patch applies cleanly on GCC 5.0 branch. Bootstrap and regression test are OK for X86_64. Can we backport this patch to GCC 5.0 branch? It should be perfectly safe, and it's a pretty nasty bug. But it is technically not a regression (or is it?), so I'll defer to the release managers. Yes, I agree it is not a regression. Segher
Re: [RFC: Patch, PR 60158] gcc/varasm.c : Pass actual alignment value to output_constant_pool_2
On Fri, 15 May 2015, Jeff Law wrote: On 05/15/2015 04:37 AM, Dharmakan Rohit Arul Raj wrote: -Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: Friday, May 15, 2015 10:30 AM Just to summarize: By default in GCC v4.7.x, all the constants are put into '.rodata.str1.4' section. In GCC v4.8.x from r192719 onwards, one of the move instruction of the string constant .LC0 is getting spilled. The reload pass, for any constants that aren't allowed and can't be reloaded in to registers tries to change them into memory references. Then while emitting that string constant to asm code (A:varasm.c: output_constant_pool_1), it explicitly passes the alignment as 1 which prevents the generation of fix-up table entries in 'B: rs6000.c:rs6000_assemble_integer' because the data is considered unaligned now. The bug seems to have gone latent with an unrelated trunk commit r204695 [* tree-ssa-loop-ivopts.c (force_expr_to_var_cost): Refactor the code. Handle type conversion.]. This commit chooses different spill candidates hence all the string constants are being put in to '.rodata.str1.4´section. The check I had in the test case is that if there is a '.data.rel.ro.local', then there should be '.fixup' section generated. Please let me know if you need any other details. Thanks. Even though I wasn't able to trigger the bug with the testcase from 65018, I went ahead and committed this patch to the trunk. It can't hurt and it's the right thing to do. Thanks for your patience, Jeff, Thanks for checking-in the changes. Can we apply the patch to GCC v4.8 and GCC v4.9 branch as well? That's up to the branch maintainers. I'd think it's safe, but it's ultimately their call. If it's a regression or a wrong-code issue then it's ok to backport given the patch is safe, of course. Richard. jeff -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
Re: Check canonical types in verify_type
On Mon, 18 May 2015, Jan Hubicka wrote: Hi, this patch adds basic checking of TYPE_CANOINCAL. It checkes TYPE_CANONICAL forms a tree and it moves lto's gimple_canonical_types_compatible_p back to middle-end and uses it to check that TYPE_CANONICAL is compatible and thus none of the FEs produce types sharing TYPE_CANONICAL that would be considered different by LTO's TBAA. I added trust_type_canonical argument and changed: - /* If the types have been previously registered and found equal - they still are. */ - if (TYPE_CANONICAL (t1) - TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2)) -return true; to: + /* If the types have been previously registered and found equal + they still are. */ + if (TYPE_CANONICAL (t1) TYPE_CANONICAL (t2) + trust_type_canonical) +return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2); (notice that I blocked the recursion when TYPE_CANONICAL is known to be different). The patch originally triggered an ICE in testsuite becuase C++ FE builds FUNCTION_TYPE type variant with different attributes but same TYPE_CANONICAL. I think C++ FE is correct and we may want to drop: + if (!comp_type_attributes (t1, t2)) + return false; Because I think the TBAA does not care about attribute lists. I suppose this is kind-of harmless because of: I think that was about attributes like 'packed', but those should have their effects applied at stor-layout time. So yes, I think this can be dropped. + /* For canonical type comparisons we do not want to build SCCs + so we cannot compare pointed-to types. But we can, for now, + require the same pointed-to type kind and match what + useless_type_conversion_p would do. */ + if (POINTER_TYPE_P (t1)) + { + if (TYPE_ADDR_SPACE (TREE_TYPE (t1)) + != TYPE_ADDR_SPACE (TREE_TYPE (t2))) + return false; + + if (TREE_CODE (TREE_TYPE (t1)) != TREE_CODE (TREE_TYPE (t2))) + return false; + } Is the reason for not making differences between differnt pointer types really just lazyness to not deal with SCCs? For that it is easy to add set of visited type pairs, just like odr_types_equivalent_p does. Well - TBAA doesn't care about pointed-to types. See get_alias_set else if (POINTER_TYPE_P (t) t != ptr_type_node) set = get_alias_set (ptr_type_node); so the above would at most matter for pointer-type fields of aggregates. And no, we don't want to deal with SCCs - I doubt it would bring any benefit and the issue here is that I am worried about SCCs being present dependent on the order of streaming (well, the code pre-dates SCC streaming). Because we now stream in SCC order anyway, most of time we won't even need to populate it. Shall I implement this? You can certainly try - but I think for cross-language interoperability we want to treat struct X { void *p; } struct Y { void *p; } and struct X { Y *p; } struct Y { X *q; } as compatible so I don't think we can use SCCs to partition alias sets further (that is, we _do_ have to treat pointers conservatively). The current code is as far as we can get IMHO (well, even the current code is too strict in making struct { void *p } and struct { int *p } not have the same alias-set. Other issue I run into is that for Ada bootstrap we have variadic type whose canonical types are having different temporary set as size. I think this is valid and perhaps gimple_canonical_types_compatible_p should consider variadic arrays to be compatible with any array of compatible type? Those are all local types and thus the strict equality compare should be ok? Not sure if we can do in C sth like void foo (int n) { struct { int i; int a[n]; } a; struct { int i; int a[n]; } b; } and if we'd have to assign the same alias-sets to 'a' and 'b'. I am not quite convinced we get variadic types right at LTO time, because they bypass canonical type calculation anyway and their canonical type is set by TYPE_MAIN_VARIANT in lto_read_body_or_constructor which I think is not safe. I will look for a testcase. That is because if they are streamed locally they do not enter type merging, but they still go via gimple_register_canonical_type, so I'm not sure where you see they always get their main variant as canonical type. I also added check that TYPE_CANONICAL agrees for type variants. I think it should because few times in the middle-end uses TYPE_MAIN_VARIANT and seem to expect this to happen: Yes. Otherwise we'd run in circles for code that does type = TYPE_CANONICAL (TYPE_MAIN_VARIANT (type)); because the result might not be a main variant. That's something to verify (both TYPE_CANONICAL and TYPE_MAIN_VARIANT should form a tree). static inline int same_type_for_tbaa (tree type1, tree type2) {
Re: [PATCH, 5/5] check_GNU_style.sh: Fix tab size in 80 characters check
On 12-05-15 17:16, Jeff Law wrote: [PATCH 5/5] check_GNU_style.sh: Fix tab size in 80 characters check 2015-05-11 Tom de Vriest...@codesourcery.com * check_GNU_style.sh (col): Fix tab size. OK. Hi Jeff, I. I noticed a performance degradation due to this patch: ... $ cat gcc/tree-ssa-tail-merge.c | awk '{printf +%s\n, $0}' | time -p ./contrib/check_GNU_style.sh - ... real 4.10 user 0.71 sys 6.77 ... Before this patch, the 'real' time was roughly a factor 80 smaller: ... real 0.05 user 0.02 sys 0.03 ... This degradation is due to the fact that the patch does the 80 chars check line-by-line, and invokes processes for each new line. II. Attached follow-up patch rewrites the 80 chars check to handle a file at a time rather than a line at a time, and gets performance back to normal: ... real 0.07 user 0.03 sys 0.05 ... As a bonus, the bit longer than 80 chars is now printed in red, similar to how the other checks show the output. OK for trunk? Thanks, - Tom check_GNU_style.sh: Don't do 80 char check line by line 2015-05-18 Tom de Vries t...@codesourcery.com * check_GNU_style.sh: Add temp files tmp2 and tmp3. (cat_with_prefix): New function, using global variable prefix. (col): Make prefix a global variable. Rewrite to process file at a time rather than line at a time. Print part longer than 80 chars in red. --- contrib/check_GNU_style.sh | 70 ++ 1 file changed, 52 insertions(+), 18 deletions(-) diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh index ab59b1e..033a2c9 100755 --- a/contrib/check_GNU_style.sh +++ b/contrib/check_GNU_style.sh @@ -65,10 +65,12 @@ fi inp=check_GNU_style.inp tmp=check_GNU_style.tmp +tmp2=check_GNU_style.2.tmp +tmp3=check_GNU_style.3.tmp # Remove $tmp on exit and various signals. -trap rm -f $inp $tmp $stdin_tmp 0 -trap rm -f $inp $tmp $stdin_tmp; exit 1 1 2 3 5 9 13 15 +trap rm -f $inp $tmp $tmp2 $tmp3 $stdin_tmp 0 +trap rm -f $inp $tmp $tmp2 $tmp3 $stdin_tmp; exit 1 1 2 3 5 9 13 15 if [ $nfiles -eq 1 ]; then # There's no need for the file prefix if we're dealing only with one file. @@ -80,6 +82,17 @@ grep $format '^+' $files \ | grep -v ':+++' \ $inp +cat_with_prefix () +{ +local f=$1 + +if [ $prefix = ]; then + cat $f +else + awk {printf %s%s\n, $prefix, \$0} $f +fi +} + # Grep g (){ local msg=$1 @@ -134,10 +147,11 @@ vg (){ col (){ local msg=$1 + local first=true local f for f in $files; do - local prefix= + prefix= if [ $nfiles -ne 1 ]; then prefix=$f: fi @@ -148,22 +162,42 @@ col (){ | grep -v ':+++' \ $tmp - cat $tmp | while IFS= read -r line; do - local longline - # Filter out the line number prefix and the patch line modifier '+' - # to obtain the bare line, before we use expand. - longline=$(echo $line \ - | sed 's/^[0-9]*:+//' \ - | expand \ - | awk '{ if (length($0) 80) print $0}') - if [ $longline != ]; then - if $first; then - printf \n$msg\n - first=false - fi - echo $prefix$line + # Keep only line number prefix and patch modifier '+'. + cat $tmp \ + | sed 's/\(^[0-9][0-9]*:+\).*/\1/' \ + $tmp2 + + # Remove line number prefix and patch modifier '+'. + # Expand tabs to spaces according to tab positions. + # Keep long lines, make short lines empty. Print the part past 80 chars + # in red. + cat $tmp \ + | sed 's/^[0-9]*:+//' \ + | expand \ + | awk '{ \ + if (length($0) 80) \ + printf %s\033[1;31m%s\033[0m\n, \ + substr($0,1,80), \ + substr($0,81); \ + else \ + print \ + }' \ + $tmp3 + + # Combine prefix back with long lines. + # Filter out empty lines. + local found=false + paste -d '' $tmp2 $tmp3 \ + | grep -v '^[0-9][0-9]*:+$' \ + $tmp found=true + + if $found; then + if $first; then + printf \n$msg\n + first=false fi - done + cat_with_prefix $tmp + fi done } -- 1.9.1
RE: [Patch][loop-invariant.c] Fix a couple of bugs regarding loop invariant motion discovered by spec2k6 on aarch64
Hi Jeff, Thanks for the suggestion. I did a bootstrap x86_64 build before and after my patch and looked for differences in the last stage object files and there were plenty of them. I chose a nice simple function (check_callers) from ipa-inline-analysis.c and reduced it to a small test case. Hope this is ok. Testing done: * aarch64 built, make check no regressions * aarch64_be built, make check no regressions * x86_64 built, make check no regressions ChangeLog: 2015-05-15 David Sherwood david.sherw...@arm.com * loop-invariant.c (create_new_invariant): Don't calculate address cost if mode is not a scalar integer. (get_inv_cost): Increase computational cost for unused invariants. * testsuite/gcc.dg/loop-invariant.c: New testcase. Regards, David. -Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: 11 May 2015 20:27 To: David Sherwood; gcc-patches@gcc.gnu.org Subject: Re: [Patch][loop-invariant.c] Fix a couple of bugs regarding loop invariant motion discovered by spec2k6 on aarch64 On 05/11/2015 06:02 AM, David Sherwood wrote: Hi, This patch fixes a couple of issues I saw during the compilation of shell_lam.f for 410.bwaves test in spec2006: * create_new_invariant: We shouldn't bother attempting to calculate the address cost for something that clearly isn't an address. Use a SCALAR_INT_MODE_P check to eliminate the most obvious cases, such as vector modes and so on. * get_inv_cost: Change the code so that we don't treat something as an address if it is never actually used as an address, i.e. we didn't use it at all. This occurs due to the way we process the innermost loop first - invariants get pulled out of inner most loop and then get processed during next innermost loop. Here is more detail from shell_lam.f.210r.loop2_invariant dump file that shows what currently happens: 4427 *ending processing of loop 27 ** 4442 (const_vector:V2DF [ 4443 (const_double:DF 4.0e+0 [0x0.8p+3]) (const_double:DF 4.0e+0 [0x0.8p+3]) 4445 ]) 4446 4447 Hot cost: 8 (final) 4448 (set (reg:V2DF 3179) 4449 (const_vector:V2DF [ 4450 (const_double:DF 4.0e+0 [0x0.8p+3]) 4451 (const_double:DF 4.0e+0 [0x0.8p+3]) 4452 ])) 4453 4454 Hot cost: 8 (final) 4455 Set in insn 2764 is invariant (1), cost 8, depends on 4471 Decided to move invariant 1 -- gain 8 move_invariant_reg moves invariant (the const_vector above) into 3490 and replaces use of 3179 with 3490, but conservatively keeps the definition of 3179, which gets processed in outer loop, but is never used. Debug for outer loop now follows: 4497 *starting processing of loop 26 ** 6379 (insn 2764 2748 2766 110 (set (reg:V2DF 3490) 6380 (const_vector:V2DF [ 6381 (const_double:DF 4.0e+0 [0x0.8p+3]) 6382 (const_double:DF 4.0e+0 [0x0.8p+3]) 6383 ])) shell_lam.f:287 819 {*aarch64_simd_movv2df} 6384 (nil)) 6385 ;; UD chains for insn luid 88 uid 2766 6604 ;; UD chains for insn luid 21 uid 3836 6605 ;; reg 3490 { d419(bb 110 insn 2764) } 6606 (insn 3836 2763 2765 111 (set (reg:V2DF 3179) 6607 (reg:V2DF 3490)) shell_lam.f:287 -1 6608 (nil)) ^^^ additional insn generated by move_invariant_reg from loop 27 6836 ;; UD chains for insn luid 40 uid 2790 6837 ;; reg 3161 { d272(bb 111 insn 2746) } 6838 ;; reg 3201 { d305(bb 111 insn 2788) } 6839 ;; reg 3490 { d419(bb 110 insn 2764) } 6840 ;; eq_note reg 3161 { d272(bb 111 insn 2746) } 6841 ;; eq_note reg 3201 { d305(bb 111 insn 2788) } 6842 (insn 2790 2788 2791 111 (set (reg:V2DF 3203 [ vect__930.427 ]) 6843 (fma:V2DF (reg:V2DF 3161 [ MEM[base: vectp_q.371_2137, index: ivtmp.679_3214, offset: 0B] ]) 6844 (neg:V2DF (reg:V2DF 3490)) 6845 (reg:V2DF 3201 [ vect__928.424 ]))) shell_lam.f:287 1219 {fnmav2df4} 6846 (expr_list:REG_DEAD (reg:V2DF 3201 [ vect__928.424 ]) 6847 (expr_list:REG_DEAD (reg:V2DF 3179) ... ^^^ 3179 marked as DEAD 8366 *ending processing of loop 26 ** 8472 (const_vector:V2DF [ 8473 (const_double:DF 4.0e+0 [0x0.8p+3]) 8474 (const_double:DF 4.0e+0 [0x0.8p+3]) 8475 ]) 8476 8477 Hot cost: 8 (final) 8478 (set (reg:V2DF 3490) 8479 (const_vector:V2DF [ 8480 (const_double:DF 4.0e+0 [0x0.8p+3]) 8481 (const_double:DF 4.0e+0 [0x0.8p+3]) 8482 ])) 8483 8484 Hot cost: 8 (final) 8485 Set in insn 2764 is invariant (12), cost 8, depends on 8505 (set (reg:V2DF 3179)
Re: check_GNU_style.sh: Don't cat empty file
Tom de Vries tom_devr...@mentor.com writes: diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh index 728c11a..ab59b1e 100755 --- a/contrib/check_GNU_style.sh +++ b/contrib/check_GNU_style.sh @@ -84,10 +84,16 @@ grep $format '^+' $files \ g (){ local msg=$1 local arg=$2 + +local found=false cat $inp \ | egrep --color=always -- $arg \ - $tmp printf \n$msg\n -cat $tmp + $tmp found=true + +if $found; then cat $inp \ | egrep --color=always -- $arg \ $tmp { printf \n$msg\n cat $tmp } Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [match-and-simplify] fix incorrect code-gen in 'for' pattern
On Sat, 16 May 2015, Prathamesh Kulkarni wrote: Hi, genmatch generates incorrect code for following (artificial) pattern: (for op (plus) op2 (op) (simplify (op @x @y) (op2 @x @y) generated gimple code: http://pastebin.com/h1uau9qB 'op' is not replaced in the generated code on line 33: *res_code = op; I think it would be a better idea to make op2 iterate over same set of operators (op2-substitutes = op-substitutes). I have attached patch for the same. Bootstrap + testing in progress on x86_64-unknown-linux-gnu. OK for trunk after bootstrap+testing completes ? Hmm, but then the example could as well just use 'op'. I think we should instead reject this. Consider (for op (plus minus) (for op2 (op) (simplify ... where it is not clear what would be desired. Simple replacement of 'op's value would again just mean you could have used 'op' in the first place. Doing what you propose would get you (for op (plus minus) (for op2 (plus minus) (simplify ... thus a different iteration. I wonder if we really need is_oper_list flag in user_id ? We can determine if user_id is an operator list if user_id::substitutes is not empty ? After your change yes. That will lose the ability to distinguish between user-defined operator list and list-iterator in for like op/op2, but I suppose we (so far) don't need to distinguish between them ? Well, your change would simply make each list-iterator a (temporary) user-defined operator list as well as the current iterator element (dependent on context - see the nested for example). I think that adds to confusion. So - can you instead reject this use? Thanks, Richard.
[patch, gcc 5 regression] re-enable biarch for powerpc-linux-gnu
We've found that configuring a powerpc-linux-gnu cross toolchain with --enable-targets=all no longer enables -m64 support in GCC 5, due to the patch for PR target/65286. (We want to build with a -m64 multilib, in particular.) The attached patch seems to fix the breakage, but I'm not sure that it might not break some other configuration. If this isn't the right fix, can one of the target experts suggest a better one? Here's a link to the discussion of the patch that caused the breakage: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00321.html -Sandra 2015-05-18 Sandra Loosemore san...@codesourcery.com gcc/ * config.gcc [powerpc*-*-linux*]: Allow --enable-targets=all to build a biarch toolchain again. Index: gcc/config.gcc === --- gcc/config.gcc (revision 449971) +++ gcc/config.gcc (working copy) @@ -2376,6 +2379,7 @@ powerpc*-*-linux*) maybe_biarch=${cpu_is_64bit} case ${enable_targets} in *powerpc64*) maybe_biarch=yes ;; + all) maybe_biarch=yes ;; esac case ${target} in powerpc64*-*-linux*spe* | powerpc64*-*-linux*paired*)
C++ PATCH to print_tree TRAIT_EXPR
I noticed that debug_tree wasn't printing the types or code of a TRAIT_EXPR, so this fixes that. I'm not bothering to turn the kind into an enum name at the moment. Tested x86_64-pc-linux-gnu, applying to trunk. commit 6baa86e88580ee3e4f5a322fc530427169da2227 Author: Jason Merrill ja...@redhat.com Date: Fri May 15 15:27:51 2015 -0400 * ptree.c (cxx_print_xnode): Handle TRAIT_EXPR. diff --git a/gcc/cp/ptree.c b/gcc/cp/ptree.c index 2d0b584..fd71bb4 100644 --- a/gcc/cp/ptree.c +++ b/gcc/cp/ptree.c @@ -271,6 +271,13 @@ cxx_print_xnode (FILE *file, tree node, int indent) print_node (file, pattern, DEFERRED_NOEXCEPT_PATTERN (node), indent+4); print_node (file, args, DEFERRED_NOEXCEPT_ARGS (node), indent+4); break; +case TRAIT_EXPR: + indent_to (file, indent+4); + fprintf (file, kind %d, TRAIT_EXPR_KIND (node)); + print_node (file, type 1, TRAIT_EXPR_TYPE1 (node), indent+4); + if (TRAIT_EXPR_TYPE2 (node)) + print_node (file, type 2, TRAIT_EXPR_TYPE2 (node), indent+4); + break; case LAMBDA_EXPR: cxx_print_lambda_node (file, node, indent); break;
[PATCH] Refactoring: use std::swap instead of manual swaps
Hi all. The attached patch replaces the following typical sequence of assignments, matching the pattern: [type] temp_var = value1; value1 = value2; value2 = temp_var; with std::swap (value1, value2); The goal is to improve code readability (at least a little). Pattern was searched in GCC tree using a script, but all modifications have been done manually to make sure that removing assignment to temp_var does not modify the behavior of the code. Where possible, temporary variables were removed or moved to blocks with narrower scope. I also removed a couple of pre-C++ implementations of swap function and one implementation of std::reverse. Bootstrapped/regtested on x86_64 linux (multilib). Test for broken builds currently running. I will also run test on a couple of simulators (though currently there is a problem with build: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66181). OK for trunk after complete test? -- Regards, Mikhail Maltsev gcc/c/ChangeLog: 2015-05-18 Mikhail Maltsev malts...@gmail.com * c-typeck.c (build_array_ref): Use std::swap instead of explicit swaps gcc/ChangeLog: 2015-05-18 Mikhail Maltsev malts...@gmail.com * bb-reorder.c (fix_up_fall_thru_edges): Use std::swap instead of explicit swaps. * dojump.c (do_compare_rtx_and_jump): Likewise. * expmed.c (emit_store_flag_1): Likewise. * fibonacci_heap.h (fibonacci_heap::union_with): Likewise. * final.c (sprint_ul): Use std::reverse for reversing a string. * fold-const.c (extract_muldiv_1): Use std::swap. * genmodes.c (emit_mode_int_n): Likewise. * ifcvt.c (dead_or_predicable): Likewise. * ira-build.c (ira_merge_live_ranges): Likewise. (swap_allocno_copy_ends_if_necessary): Likewise. * ira.c (ira_setup_alts): Likewise. * loop-iv.c (iv_analyze_expr): Likewise. (implies_p): Likewise. (canon_condition): Likewise. * lra-constraints.c (swap_operands): Likewise. * lra-lives.c (lra_merge_live_ranges): Likewise. * omega.c (swap): Remove. (bswap): Remove. (omega_unprotect_1): Use std::swap. (omega_solve_geq): Likewise. * optabs.c (expand_binop_directly): Likewise. (expand_binop): Likewise. (emit_conditional_move): Likewise. (emit_conditional_add): Likewise. * postreload.c (reload_cse_simplify_operands): Likewise. * reg-stack.c (emit_swap_insn): Likewise. (swap_to_top): Likewise. (compare_for_stack_reg): Likewise. (subst_asm_stack_regs): Likewise. * reload.c (find_reloads): Likewise. * reload1.c (gen_reload_chain_without_interm_reg_p): Likewise. * sel-sched.c (invoke_reorder_hooks): Likewise. (create_block_for_bookkeeping): Likewise. * tree-data-ref.c (lambda_matrix_row_exchange): Remove. (lambda_matrix_right_hermite): Use std::swap. * tree-ssa-coalesce.c (sort_coalesce_list): Likewise. * tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise. * tree-ssa-loop-ivopts.c (iv_ca_delta_reverse): Likewise. * tree-ssa-math-opts.c (is_widening_mult_p): Likewise. * tree-ssa-phiopt.c (hoist_adjacent_loads): Likewise. * tree-ssa-reassoc.c (linearize_expr_tree): Likewise. * tree-ssa-threadedge.c (simplify_control_stmt_condition): Likewise. * tree-vrp.c (compare_ranges): Likewise. * var-tracking.c (add_with_sets): Likewise. (vt_find_locations): Likewise. gcc/cp/ChangeLog: 2015-05-18 Mikhail Maltsev malts...@gmail.com * typeck.c (composite_pointer_type): Use std::swap instead of explicit swaps. gcc/fortran/ChangeLog: 2015-05-18 Mikhail Maltsev malts...@gmail.com * interface.c (compare_actual_formal): Use std::swap instead of explicit swaps. * trans-array.c (gfc_trans_scalarized_loop_end): Likewise. * trans-intrinsic.c (walk_inline_intrinsic_transpose): Likewise. gcc/c-family/ChangeLog: 2015-05-18 Mikhail Maltsev malts...@gmail.com * c-common.c (shorten_compare): Use std::swap instead of explicit swaps. diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c index c134712..a97c2cd 100644 --- a/gcc/bb-reorder.c +++ b/gcc/bb-reorder.c @@ -1812,7 +1812,6 @@ fix_up_fall_thru_edges (void) edge succ2; edge fall_thru; edge cond_jump = NULL; - edge e; bool cond_jump_crosses; int invert_worked; rtx_insn *old_jump; @@ -1901,9 +1900,7 @@ fix_up_fall_thru_edges (void) fall_thru-flags = ~EDGE_FALLTHRU; cond_jump-flags |= EDGE_FALLTHRU; update_br_prob_note (cur_bb); - e = fall_thru; - fall_thru = cond_jump; - cond_jump = e; + std::swap (fall_thru, cond_jump); cond_jump-flags |= EDGE_CROSSING; fall_thru-flags = ~EDGE_CROSSING; } diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 8c7fdd2..ae5564e 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -4372,20 +4372,12 @@ shorten_compare (location_t loc, tree *op0_ptr, tree *op1_ptr, !integer_zerop (primop1) !real_zerop (primop1) !fixed_zerop (primop1)) { - tree tem = primop0; - int temi = unsignedp0; - primop0 = primop1; - primop1 = tem; - tem = op0; - op0 = op1; - op1 = tem; + std::swap
Re: [PATCH 02/13] optabs: Fix vec_perm - V16QI middle end lowering.
On 05/11/2015 06:23 AM, Andreas Krebbel wrote: @@ -6784,14 +6784,18 @@ expand_vec_perm (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target) { /* Multiply each element by its byte size. */ machine_mode selmode = GET_MODE (sel); + /* We cannot re-use SEL as a temp operand since it might by in + read-only storage. */ + rtx sel_reg = gen_reg_rtx (selmode); + if (u == 2) - sel = expand_simple_binop (selmode, PLUS, sel, sel, -sel, 0, OPTAB_DIRECT); + sel_reg = expand_simple_binop (selmode, PLUS, sel, sel, +sel_reg, 0, OPTAB_DIRECT); else You needn't allocate sel_reg explicitly; expand_simple_binop will do that for you if the TARGET parameter is NULL. Thus this patch should be an 8 character change on those two calls. r~
C++ PATCH for core DR 1391
The proposed resolution of core issue 1391 is problematic, as I discovered when trying to implement it, but this variant seems like a strict improvement: rather than consider non-dependent conversions during deduction, wait until we've tried to deduce all arguments first. Tested x86_64-pc-linux-gnu, applying to trunk. commit cd4859011b50c864c2d027ce443f241301243ef5 Author: Jason Merrill ja...@redhat.com Date: Wed May 13 19:32:15 2015 -0400 DR 1391 * pt.c (type_unification_real): Check convertibility here. (unify_one_argument): Not here. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 7871474..2cd36c9 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -16678,7 +16678,7 @@ uses_deducible_template_parms (tree type) static int unify_one_argument (tree tparms, tree targs, tree parm, tree arg, - int subr, unification_kind_t strict, int flags, + int subr, unification_kind_t strict, bool explain_p) { tree arg_expr = NULL_TREE; @@ -16695,16 +16695,10 @@ unify_one_argument (tree tparms, tree targs, tree parm, tree arg, argument to convert it to the type of the corresponding function parameter if the parameter type contains no template-parameters that participate in template argument deduction. */ - if (TYPE_P (parm) !uses_template_parms (parm)) -/* For function parameters that contain no template-parameters at all, - we have historically checked for convertibility in order to shortcut - consideration of this candidate. */ -return check_non_deducible_conversion (parm, arg, strict, flags, - explain_p); - else if (strict == DEDUCE_CALL - TYPE_P (parm) !uses_deducible_template_parms (parm)) -/* For function parameters with only non-deducible template parameters, - just return. */ + if (strict != DEDUCE_EXACT + TYPE_P (parm) !uses_deducible_template_parms (parm)) +/* For function parameters with no deducible template parameters, + just return. We'll check non-dependent conversions later. */ return unify_success (explain_p); switch (strict) @@ -16843,7 +16837,7 @@ type_unification_real (tree tparms, ++ia; if (unify_one_argument (tparms, targs, parm, arg, subr, strict, - flags, explain_p)) + explain_p)) return 1; } @@ -16925,8 +16919,11 @@ type_unification_real (tree tparms, this parameter can be deduced. */ if (TREE_CODE (tparm) == PARM_DECL uses_template_parms (TREE_TYPE (tparm)) - !saw_undeduced++) - goto again; + saw_undeduced 2) + { + saw_undeduced = 1; + continue; + } /* Core issue #226 (C++0x) [temp.deduct]: @@ -16937,32 +16934,9 @@ type_unification_real (tree tparms, be NULL_TREE or ERROR_MARK_NODE, so we do not need to explicitly check cxx_dialect here. */ if (TREE_PURPOSE (TREE_VEC_ELT (tparms, i))) - { - tree parm = TREE_VALUE (TREE_VEC_ELT (tparms, i)); - tree arg = TREE_PURPOSE (TREE_VEC_ELT (tparms, i)); - reopen_deferring_access_checks (*checks); - location_t save_loc = input_location; - if (DECL_P (parm)) - input_location = DECL_SOURCE_LOCATION (parm); - arg = tsubst_template_arg (arg, targs, complain, NULL_TREE); - arg = convert_template_argument (parm, arg, targs, complain, - i, NULL_TREE); - input_location = save_loc; - *checks = get_deferred_access_checks (); - pop_deferring_access_checks (); - if (arg == error_mark_node) - return 1; - else - { - TREE_VEC_ELT (targs, i) = arg; - /* The position of the first default template argument, - is also the number of non-defaulted arguments in TARGS. - Record that. */ - if (!NON_DEFAULT_TEMPLATE_ARGS_COUNT (targs)) - SET_NON_DEFAULT_TEMPLATE_ARGS_COUNT (targs, i); - continue; - } - } + /* OK, there is a default argument. Wait until after the + conversion check to do substitution. */ + continue; /* If the type parameter is a parameter pack, then it will be deduced to an empty parameter pack. */ @@ -16987,6 +16961,84 @@ type_unification_real (tree tparms, return unify_parameter_deduction_failure (explain_p, tparm); } + + /* DR 1391: All parameters have args, now check non-dependent parms for + convertibility. */ + if (saw_undeduced 2) + for (ia = 0, parms = xparms, args = xargs, nargs = xnargs; + parms parms != void_list_node ia nargs; ) + { + parm = TREE_VALUE (parms); + + if (TREE_CODE (parm) == TYPE_PACK_EXPANSION + (!TREE_CHAIN (parms) + || TREE_CHAIN (parms) == void_list_node)) + /* For a function parameter pack that occurs at the end of the + parameter-declaration-list, the type A of each remaining + argument of the call is compared with the type P of the + declarator-id of the function parameter pack. */ + break; + + parms = TREE_CHAIN (parms); + + if (TREE_CODE (parm) ==
Re: [PATCH] FreeBSD add functionality to build PIE executables.
On 05/17/2015 02:42 PM, Andreas Tobler wrote: Ping?! Thanks, Andreas On 11.05.15 22:34, Andreas Tobler wrote: All, this patch adds the ability to build PIE executables for FreeBSD. The core is since a longer time in the code base of FreeBSD itself and is working fine. This patch makes it available for all FreeBSD targets. Tested on x86_64-*-freebsd11.0 and armv6/hf-*-freebsd11.0, i386-*-freebsd11.0 is progress. In the same turn I removed the STARTFILE/ENDFILE_SPEC from config/i386/freebsd.h and use the ones from config/freebsd-spec.h. Here the results before the patch: https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg01267.html and with the patch: https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg01324.html Is this ok for trunk and for 5.1X? Thanks, Andreas 2015-05-11 Andreas Tobler andre...@gcc.gnu.org * config/freebsd-spec.h (FBSD_STARTFILE_SPEC): Add the bits to build pie executables. (FBSD_ENDFILE_SPEC): Likewise. * config/i386/freebsd.h (STARTFILE_SPEC): Remove and use the one from config/freebsd-spec.h. (ENDFILE_SPEC): Likewise. 2015-05-11 Andreas Tobler andre...@gcc.gnu.org * lib/target-supports.exp (check_effective_target_pie): Add *-*-freebsd* to the familiy of pie capable targets. OK for the trunk, please install. Release managers have final say for the release branches. jeff
Re: RFC: Add a new relocation to x86-64/i386 psABIs
Hi, On Mon, 18 May 2015, H.J. Lu wrote: Yes, we should convert it to nop call foo/jmp foo nop I implemented it on users/hjl/relax branch in binutils git repo. the insn decoder. For calls as well of course, but there it might be better to have it before the call. I think a nop prefix is better on call. We won't mandate nop call foo in psABI and linker is free to use either a nop prefix or a nop suffix. Should we move forward with it? I think I have only nit-picking left: why call the whole thing relax? To me relax implies some length-changing transformation (like jump target relaxing, emitting shorter jumps when in range), but perhaps that's just me. OTOH I can't think of a better name right now. Then one remark: there's a small interaction between this scheme and taking the address of a function. I _think_ it's all taken care of, but just want to make sure it is: the relax scheme must only apply to the .got.plt slot, not to the normal .got slot (which must continue to hold the final function address), and with the recent sharing you have implemented (when both are needed) it must be ensured that also an existing RELAX_GOTPCREL reloc doesn't overwrite that .got slot with the .plt entry address. Ciao, Michael.
Re: RFC: Add a new relocation to x86-64/i386 psABIs
On Mon, May 18, 2015 at 8:52 AM, Michael Matz m...@suse.de wrote: Hi, On Mon, 18 May 2015, H.J. Lu wrote: Yes, we should convert it to nop call foo/jmp foo nop I implemented it on users/hjl/relax branch in binutils git repo. the insn decoder. For calls as well of course, but there it might be better to have it before the call. I think a nop prefix is better on call. We won't mandate nop call foo in psABI and linker is free to use either a nop prefix or a nop suffix. Should we move forward with it? I think I have only nit-picking left: why call the whole thing relax? To me relax implies some length-changing transformation (like jump target relaxing, emitting shorter jumps when in range), but perhaps that's just me. OTOH I can't think of a better name right now. Me neither. Linker manual has '--relax' '--no-relax' An option with machine dependent effects. This option is only supported on a few targets. *Note 'ld' and the H8/300: H8/300. *Note 'ld' and the Intel 960 family: i960. *Note 'ld' and Xtensa Processors: Xtensa. *Note 'ld' and the 68HC11 and 68HC12: M68HC11/68HC12. *Note 'ld' and the Altera Nios II: Nios II. *Note 'ld' and PowerPC 32-bit ELF Support: PowerPC ELF32. On some platforms the '--relax' option performs target specific, global optimizations that become possible when the linker resolves addressing in the program, such as relaxing address modes, synthesizing new instructions, selecting shorter version of current instructions, and combining constant values. On some platforms these link time global optimizations may make symbolic debugging of the resulting executable impossible. This is known to be the case for the Matsushita MN10200 and MN10300 family of processors. In this sense, calling this scheme relax is not entirely inaccurate. Then one remark: there's a small interaction between this scheme and taking the address of a function. I _think_ it's all taken care of, but I believe so. just want to make sure it is: the relax scheme must only apply to the .got.plt slot, not to the normal .got slot (which must continue to hold the final function address), and with the recent sharing you have implemented (when both are needed) it must be ensured that also an existing RELAX_GOTPCREL reloc doesn't overwrite that .got slot with the .plt entry address. Since .got and .got.plt slots serve the same purpose, ld already combines them into one single .got slot with .plt entry pointing to the .got slot. That is if you take the address of the function and branch to it, linker will arrange .plt entry to do an indirect branch via its got slot. -- H.J.
Re: [PATCH] Fix memory orders description in atomic ops built-ins docs.
Hello, On 15/05/15 17:22, Torvald Riegel wrote: This patch improves the documentation of the built-ins for atomic operations. The memory model to memory order change does improve things but I think that the patch has some problems. As it is now, it makes some of the descriptions quite difficult to understand and seems to assume more familiarity with details of the C++11 specification then might be expected. Generally, the memory order descriptions seem to be targeted towards language designers but don't provide for anybody trying to understand how to implement or to use the built-ins. Adding a less formal, programmers view to some of the descriptions would help. That implies the descriptions would be more than just illustrative, but I'd suggest that would be appropriate for the GCC manual. I'm also not sure that the use of C++11 terms in the some of the descriptions. In particular, using happens-before seems wrong because happens-before isn't described anywhere in the GCC manual and because it has a specific meaning in the C++11 specification that doesn't apply to the GCC built-ins (which C++11 doesn't know about). Some more comments below. Regards, Matthew diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 6004681..5b2ded8 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -8853,19 +8853,19 @@ are not prevented from being speculated to before the barrier. [...] If the data type size maps to one -of the integral sizes that may have lock free support, the generic -version uses the lock free built-in function. Otherwise an +of the integral sizes that may support lock-freedom, the generic +version uses the lock-free built-in function. Otherwise an external call is left to be resolved at run time. = This is a slightly awkward sentence. Maybe it could be replaced with something on the lines of The generic function uses the lock-free built-in function when the data-type size makes that possible, otherwise an external call is left to be resolved at run-time. = -The memory models integrate both barriers to code motion as well as -synchronization requirements with other threads. They are listed here -in approximately ascending order of strength. +An atomic operation can both constrain code motion by the compiler and +be mapped to a hardware instruction for synchronization between threads +(e.g., a fence). [...] = This is a little unclear (and inaccurate, aarch64 can use two instructions for fences). I also thought that atomic operations constrain code motion by the hardware. Maybe break the link with the compiler and hardware: An atomic operation can both constrain code motion and act as a synchronization point between threads. = @table @code @item __ATOMIC_RELAXED -No barriers or synchronization. +Implies no inter-thread ordering constraints. It may be useful to be explicit that there are no restrctions on code motion. @item __ATOMIC_CONSUME -Data dependency only for both barrier and synchronization with another -thread. +This is currently implemented using the stronger @code{__ATOMIC_ACQUIRE} +memory order because of a deficiency in C++11's semantics for +@code{memory_order_consume}. = It would be useful to have a description of what the __ATOMIC_CONSUME was meant to do, as well as the fact that it currently just maps to __ATOMIC_ACQUIRE. (Or maybe just drop it from the documentation until it's fixed.) = @item __ATOMIC_ACQUIRE -Barrier to hoisting of code and synchronizes with release (or stronger) -semantic stores from another thread. +Creates an inter-thread happens-before constraint from the release (or +stronger) semantic store to this acquire load. Can prevent hoisting +of code to before the operation. = As noted before, it's not clear what the inter-thread happens-before means in this context. Here and elsewhere: Can prevent motion of code is ambiguous: it doesn't say under what conditions code would or wouldn't be prevented from moving. = -Note that the scope of a C++11 memory model depends on whether or not -the function being called is a @emph{fence} (such as -@samp{__atomic_thread_fence}). In a fence, all memory accesses are -subject to the restrictions of the memory model. When the function is -an operation on a location, the restrictions apply only to those -memory accesses that could affect or that could depend on the -location. +Note that in the C++11 memory model, @emph{fences} (e.g., +@samp{__atomic_thread_fence}) take effect in combination with other +atomic operations on specific memory locations (e.g., atomic loads); +operations on specific memory locations do not necessarily affect other +operations in the same way. Its very unclear what this paragraph is saying. It seems to suggest that fences only work in combination with other operations. But that doesn't seem right since a __atomic_thread_fence (with appropriate memory order) can be dropped into any piece of code and will act in the
C++ PATCH for bugs in strip_typedefs*
While looking at a bug report caused by failure to strip all typedefs, I came across a few latent bugs in the strip_typedefs functions. First, if we're going to return a TREE_LIST unchanged we need to copy the pointer before we change the input variable. Second, if we have a decltype with no typedefs in the argument, we still need to handle a typedef to that decltype. Third, when we strip the operands of a TRAIT_EXPR we want to update the operands of the copy, not the original. Tested x86_64-pc-linux-gnu, applying to trunk and 5. commit a5bfb34605c5ea629db6ff6db76cdaed0f675670 Author: Jason Merrill ja...@redhat.com Date: Fri May 15 15:28:54 2015 -0400 * tree.c (strip_typedefs_expr) [TRAIT_EXPR]: Fix typo. (strip_typedefs) [DECLTYPE_TYPE]: Fix typedef of decltype. [TREE_LIST]: Fix no-change case. diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c index ec9be8c..eebb415 100644 --- a/gcc/cp/tree.c +++ b/gcc/cp/tree.c @@ -1265,6 +1265,7 @@ strip_typedefs (tree t, bool *remove_attributes) { bool changed = false; vectree,va_gc *vec = make_tree_vector (); + tree r = t; for (; t; t = TREE_CHAIN (t)) { gcc_assert (!TREE_PURPOSE (t)); @@ -1273,7 +1274,6 @@ strip_typedefs (tree t, bool *remove_attributes) changed = true; vec_safe_push (vec, elt); } - tree r = t; if (changed) r = build_tree_list_vec (vec); release_tree_vector (vec); @@ -1411,7 +1411,7 @@ strip_typedefs (tree t, bool *remove_attributes) result = strip_typedefs_expr (DECLTYPE_TYPE_EXPR (t), remove_attributes); if (result == DECLTYPE_TYPE_EXPR (t)) - return t; + result = NULL_TREE; else result = (finish_decltype_type (result, @@ -1496,8 +1496,8 @@ strip_typedefs_expr (tree t, bool *remove_attributes) type2 == TRAIT_EXPR_TYPE2 (t)) return t; r = copy_node (t); - TRAIT_EXPR_TYPE1 (t) = type1; - TRAIT_EXPR_TYPE2 (t) = type2; + TRAIT_EXPR_TYPE1 (r) = type1; + TRAIT_EXPR_TYPE2 (r) = type2; return r; }
Re: [PATCH, 5/5] check_GNU_style.sh: Fix tab size in 80 characters check
On 05/18/2015 02:07 AM, Tom de Vries wrote: On 12-05-15 17:16, Jeff Law wrote: [PATCH 5/5] check_GNU_style.sh: Fix tab size in 80 characters check 2015-05-11 Tom de Vriest...@codesourcery.com * check_GNU_style.sh (col): Fix tab size. OK. Hi Jeff, I. I noticed a performance degradation due to this patch: ... $ cat gcc/tree-ssa-tail-merge.c | awk '{printf +%s\n, $0}' | time -p ./contrib/check_GNU_style.sh - ... real 4.10 user 0.71 sys 6.77 ... Before this patch, the 'real' time was roughly a factor 80 smaller: ... real 0.05 user 0.02 sys 0.03 ... This degradation is due to the fact that the patch does the 80 chars check line-by-line, and invokes processes for each new line. II. Attached follow-up patch rewrites the 80 chars check to handle a file at a time rather than a line at a time, and gets performance back to normal: ... real 0.07 user 0.03 sys 0.05 ... As a bonus, the bit longer than 80 chars is now printed in red, similar to how the other checks show the output. OK for trunk? Yes, this is fine. jeff
[PATCH, alpha]: Fix PR57032, ICE with TARGET_LRA_P
Hello! Attached patch is the first small step towards LRA enablement on alpha. As suggester by Richard, the patch rewrites Q constraint as define_memory_constraint, so reload can push AND address to a register. I will close the referred PR, as it is not a regression and open a new one to track LRA enablement progress. 2015-05-18 Uros Bizjak ubiz...@gmail.com Richard Henderson r...@redhat.com PR target/57032 * config/alpha/constraints.md (Q): Rewrite as define_memory_constraint. Check for a memory location that is not a reference (using an AND) to an unaligned location here. * config/alpha/predicates.md (normal_memory_operand): Remove. Tested on alpha-linux-gnu and committed to mainline SVN. Uros. Index: config/alpha/constraints.md === --- config/alpha/constraints.md (revision 223268) +++ config/alpha/constraints.md (working copy) @@ -91,9 +91,13 @@ (match_test op == CONST0_RTX (mode ;; Extra constraints. -(define_constraint Q + +;; A memory location that is not a reference +;; (using an AND) to an unaligned location. +(define_memory_constraint Q @internal A normal_memory_operand - (match_operand 0 normal_memory_operand)) + (and (match_code mem) + (not (match_test GET_CODE (XEXP (op, 0)) == AND (define_constraint R @internal A direct_call_operand Index: config/alpha/predicates.md === --- config/alpha/predicates.md (revision 223268) +++ config/alpha/predicates.md (working copy) @@ -525,14 +525,6 @@ return false; }) -;; Return 1 is OP is a memory location that is not a reference -;; (using an AND) to an unaligned location. Take into account -;; what reload will do. -(define_special_predicate normal_memory_operand - (ior (match_test op = resolve_reload_operand (op), 0) - (and (match_code mem) - (match_test GET_CODE (XEXP (op, 0)) != AND - ;; Returns 1 if OP is not an eliminable register. ;; ;; This exists to cure a pathological failure in the s8addq (et al) patterns,
[C++ Patch[ PR 66130
Hi, Manuel did most of the work for this rather simple issue filed by Tom: essentially, invalid_nonstatic_memfn_p gets a location_t parameter which is used to pass the location of the place where the use of the nonstatic member function is indeed invalid. Besides that, while working on the bug we noticed that we must be careful with exprs which aren't DECLs. Tested x86_64-linux. Thanks, Paolo. 2015-05-18 Manuel López-Ibáñez m...@gcc.gnu.org Paolo Carlini paolo.carl...@oracle.com PR c++/66130 * typeck.c (invalid_nonstatic_memfn_p): Add location_t parameter and use it in the diagnostic. (decay_conversion): Adjust call. * semantics.c (finish_decltype_type): Likewise. * call.c (resolve_args, build_new_op_1, perform_implicit_conversion_flags): Adjust calls. * cvt.c (ocp_convert, convert_to_void): Likewise. * cp-tree.h (invalid_nonstatic_memfn_p): Update declaration. 2015-05-18 Manuel López-Ibáñez m...@gcc.gnu.org Paolo Carlini paolo.carl...@oracle.com PR c++/66130 * g++.dg/other/pr66130.C: New. * g++.dg/cpp0x/pr66130.C: Likewise. Index: cp/call.c === --- cp/call.c (revision 223295) +++ cp/call.c (working copy) @@ -3941,7 +3941,7 @@ resolve_args (vectree, va_gc *args, tsubst_flags error (invalid use of void expression); return NULL; } - else if (invalid_nonstatic_memfn_p (arg, complain)) + else if (invalid_nonstatic_memfn_p (input_location, arg, complain)) return NULL; } return args; @@ -5542,9 +5542,9 @@ build_new_op_1 (location_t loc, enum tree_code cod /* If one of the arguments of the operator represents an invalid use of member function pointer, try to report a meaningful error ... */ - if (invalid_nonstatic_memfn_p (arg1, tf_error) - || invalid_nonstatic_memfn_p (arg2, tf_error) - || invalid_nonstatic_memfn_p (arg3, tf_error)) + if (invalid_nonstatic_memfn_p (loc, arg1, tf_error) + || invalid_nonstatic_memfn_p (loc, arg2, tf_error) + || invalid_nonstatic_memfn_p (loc, arg3, tf_error)) /* We displayed the error message. */; else { @@ -9409,7 +9409,7 @@ perform_implicit_conversion_flags (tree type, tree Call instantiate_type to get good error messages. */ if (TREE_TYPE (expr) == unknown_type_node) instantiate_type (type, expr, complain); - else if (invalid_nonstatic_memfn_p (expr, complain)) + else if (invalid_nonstatic_memfn_p (loc, expr, complain)) /* We gave an error. */; else error_at (loc, could not convert %qE from %qT to %qT, expr, Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 223295) +++ cp/cp-tree.h(working copy) @@ -6282,7 +6282,8 @@ extern tree build_address (tree); extern tree build_nop (tree, tree); extern tree non_reference (tree); extern tree lookup_anon_field (tree, tree); -extern bool invalid_nonstatic_memfn_p (tree, tsubst_flags_t); +extern bool invalid_nonstatic_memfn_p (location_t, tree, +tsubst_flags_t); extern tree convert_member_func_to_ptr (tree, tree, tsubst_flags_t); extern tree convert_ptrmem (tree, tree, bool, bool, tsubst_flags_t); Index: cp/cvt.c === --- cp/cvt.c(revision 223295) +++ cp/cvt.c(working copy) @@ -902,7 +902,7 @@ ocp_convert (tree type, tree expr, int convtype, i { /* If the conversion failed and expr was an invalid use of pointer to member function, try to report a meaningful error. */ - if (invalid_nonstatic_memfn_p (expr, complain)) + if (invalid_nonstatic_memfn_p (loc, expr, complain)) /* We displayed the error message. */; else error_at (loc, conversion from %qT to non-scalar type %qT requested, @@ -960,7 +960,7 @@ convert_to_void (tree expr, impl_conv_void implici if (!TREE_TYPE (expr)) return expr; - if (invalid_nonstatic_memfn_p (expr, complain)) + if (invalid_nonstatic_memfn_p (loc, expr, complain)) return error_mark_node; if (TREE_CODE (expr) == PSEUDO_DTOR_EXPR) { Index: cp/semantics.c === --- cp/semantics.c (revision 223295) +++ cp/semantics.c (working copy) @@ -7242,7 +7242,7 @@ finish_decltype_type (tree expr, bool id_expressio expr =
Re: [PATCH 1/4] rs6000_stack_info changes for -fsplit-stack
On Sun, May 17, 2015 at 10:54 PM, Alan Modra amo...@gmail.com wrote: This patch changes rs6000_stack_info to keep save areas offsets even when not used. I need lr_save_offset valid for split-stack, and it seemed reasonable to treat the other offsets the same. Not zeroing the offsets requires just one change in code that uses them, the use_backchain_to_restore_sp expression in rs6000_emit_epilogue, not counting the debug_stack_info changes. * config/rs6000/rs6000.c (rs6000_stack_info): Don't zero offsets when not saving registers. (debug_stack_info): Adjust to omit printing unused offsets, as before. (rs6000_emit_epilogue): Adjust use_backchain_to_restore_sp expression. I think that the vrsave_save_offset change may break saving of callee-saved VRs. See PR 55276. Additional points for converting the testcase into one that can be included in the GCC testsuite. Thanks, David
[0/9] Record number of hard registers in a REG
While looking at a profile of gcc, I noticed one thing fairly high up the list was a loop iterating over all the registers in a REG, apparently due to the delay in computing the index for hard_regno_nregs and then loading the value (which would often be an L1 cache miss). When we were adding CONST_WIDE_INT, the general opinion seemed to be that we should lay out rtxes for LP64 hosts rather than try to have two alternative layouts, one optimised for ILP32 and one for LP64. We therefore unconditionally filled the 32-bit hole (on LP64) between the rtx header and the main union with extra data. That area is already used by REGs to store ORIGINAL_REGNO, but on LP64 hosts there's another hole in the REGNO field itself. This series takes that idea a step further and uses the hole to store the number of registers in a REG. This still leaves 24 redundant bits that could be used for other things in future. That's actually enough to store a SUBREG of a REG (8 bits for the inner mode, 16 for the offset), but having a single rtx for that would probably cause too many problems. The series sped up an --enable-checking=release gcc by just over 0.5% for various tests on my box. Not a big saving, but hopefully the patches also count as a clean-up. As a follow-on, I'd like to add a FOR_EACH_* macro that iterates over all the registers in a REG. These loops always execute at least once, and rarely more than once, and it would be good to model that in the iterator so that all use sites benefit. Each patch in the series was individually bootstrapped regression-tested on x86_64-linux-gnu. Thanks, Richard
[1/9] Make more use of END_REGNO
This patch just handles some obvious cases where END_REGNO could be used. gcc/ * cfgcleanup.c (mentions_nonequal_regs): Use END_REGNO. * dse.c (note_add_store): Likewise. * ira-lives.c (mark_hard_reg_dead): Likewise. * loop-invariant.c (mark_reg_store): Likewise. (mark_reg_death): Likewise. * postreload.c (reload_combine): Likewise. (reload_combine_note_store): Likewise. (reload_combine_note_use): Likewise. * recog.c (peep2_reg_dead_p): Likewise. Index: gcc/cfgcleanup.c === --- gcc/cfgcleanup.c2015-05-18 08:19:38.719489341 +0100 +++ gcc/cfgcleanup.c2015-05-18 08:19:52.0 +0100 @@ -267,26 +267,20 @@ mark_effect (rtx exp, regset nonequal) mentions_nonequal_regs (const_rtx x, regset nonequal) { subrtx_iterator::array_type array; FOR_EACH_SUBRTX (iter, array, x, NONCONST) { const_rtx x = *iter; if (REG_P (x)) { - unsigned int regno = REGNO (x); - if (REGNO_REG_SET_P (nonequal, regno)) - return true; - if (regno FIRST_PSEUDO_REGISTER) - { - int n = hard_regno_nregs[regno][GET_MODE (x)]; - while (--n 0) - if (REGNO_REG_SET_P (nonequal, regno + n)) - return true; - } + unsigned int end_regno = END_REGNO (x); + for (unsigned int regno = REGNO (x); regno end_regno; ++regno) + if (REGNO_REG_SET_P (nonequal, regno)) + return true; } } return false; } /* Attempt to prove that the basic block B will have no side effects and always continues in the same edge if reached via E. Return the edge if exist, NULL otherwise. */ Index: gcc/dse.c === --- gcc/dse.c 2015-05-18 08:19:38.719489341 +0100 +++ gcc/dse.c 2015-05-18 08:19:52.0 +0100 @@ -850,17 +850,16 @@ typedef struct /* Callback for emit_inc_dec_insn_before via note_stores. Check if a register is clobbered which is live afterwards. */ static void note_add_store (rtx loc, const_rtx expr ATTRIBUTE_UNUSED, void *data) { rtx_insn *insn; note_add_store_info *info = (note_add_store_info *) data; - int r, n; if (!REG_P (loc)) return; /* If this register is referenced by the current or an earlier insn, that's OK. E.g. this applies to the register that is being incremented with this addition. */ for (insn = info-first; @@ -869,25 +868,24 @@ note_add_store (rtx loc, const_rtx expr if (reg_referenced_p (loc, PATTERN (insn))) return; /* If we come here, we have a clobber of a register that's only OK if that register is not live. If we don't have liveness information available, fail now. */ if (!info-fixed_regs_live) { - info-failure = true; + info-failure = true; return; } /* Now check if this is a live fixed register. */ - r = REGNO (loc); - n = hard_regno_nregs[r][GET_MODE (loc)]; - while (--n = 0) -if (REGNO_REG_SET_P (info-fixed_regs_live, r+n)) - info-failure = true; + unsigned int end_regno = END_REGNO (loc); + for (unsigned int regno = REGNO (loc); regno end_regno; ++regno) +if (REGNO_REG_SET_P (info-fixed_regs_live, regno)) + info-failure = true; } /* Callback for for_each_inc_dec that emits an INSN that sets DEST to SRC + SRCOFF before insn ARG. */ static int emit_inc_dec_insn_before (rtx mem ATTRIBUTE_UNUSED, rtx op ATTRIBUTE_UNUSED, Index: gcc/ira-lives.c === --- gcc/ira-lives.c 2015-05-18 08:19:38.719489341 +0100 +++ gcc/ira-lives.c 2015-05-18 08:19:52.0 +0100 @@ -473,17 +473,17 @@ mark_pseudo_regno_subword_dead (int regn register. */ static void mark_hard_reg_dead (rtx reg) { int regno = REGNO (reg); if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)) { - int last = regno + hard_regno_nregs[regno][GET_MODE (reg)]; + int last = END_REGNO (reg); enum reg_class aclass, pclass; while (regno last) { if (TEST_HARD_REG_BIT (hard_regs_live, regno)) { aclass = ira_hard_regno_allocno_class[regno]; pclass = ira_pressure_class_translate[aclass]; Index: gcc/loop-invariant.c === --- gcc/loop-invariant.c2015-05-18 08:19:38.719489341 +0100 +++ gcc/loop-invariant.c2015-05-18 08:19:38.719489341 +0100 @@ -1802,68 +1802,44 @@ mark_regno_death (int regno) change_pressure (regno, false); } /* Mark setting register REG. */ static void mark_reg_store (rtx reg, const_rtx setter ATTRIBUTE_UNUSED, void *data ATTRIBUTE_UNUSED) { - int regno; - if (GET_CODE (reg) == SUBREG) reg =
Re: [PATCH i386] Allow sibcalls in no-PLT PIC
On Fri, 15 May 2015, Jan Hubicka wrote: With -fno-plt, we don't have to reject even direct calls as sibcall candidates. This patch depends on '-fplt' flag that is introduced in another patch. This patch requires that with -fno-plt all sibcall candidates go through prepare_call_address that transforms the call to a GOT lookup. OK? * config/i386/i386.c (ix86_function_ok_for_sibcall): Check flag_plt. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index f29e053..b734350 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -5448,12 +5448,13 @@ ix86_function_ok_for_sibcall (tree decl, tree exp) /* If we are generating position-independent code, we cannot sibcall optimize any indirect call, or a direct call to a global function, as the PLT requires %ebx be live. (Darwin does not have a PLT.) */ if (!TARGET_MACHO !TARGET_64BIT flag_pic + flag_plt (decl !targetm.binds_local_p (decl))) return false; /* If we need to align the outgoing stack, then sibcalling would unalign the stack, which may break the called function. */ if (ix86_minimum_incoming_stack_boundary (true) I think it should be done via psABI change similar to https://groups.google.com/forum/#!topic/x86-64-abi/n8GYMpqvBxI which I have implemented on users/hjl/relax branch in binutils. OK, I am trying to understand how relax branch works and what difference it makes. As I underestand it, the main purpose is to be able to make relaxed call of call function that will, in 64bit mode, either result to RIP relative call with extra NOP just before the instruction if FUNCTION binds within the DSO or to indirect call through GOT bypassing the PLT. This saves overhead of PLT and increase every such call by extra NOP for no-LTO builds and even in LTO when the symbol is defined but interposable. This is actually really nice trick. Now this is about 32bit mode where explicit GOT pointer register is needed (how this work with large code model on x86-64?). It is needed by PLT, but I suppose to implement the same relaxation for 32bit it would need to use EBX to lookup the GOT pointer, too, so the check above would still be valid. The patches makes sense to be given that we support -fno-plt now. After this message the discussion diverged in the direction of H.J.Lu's proposed relaxation scheme involving new type of relocations. I'm not clear if my patch is actually approved. I'd like to point out that it doesn't clash with H.J.Lu's work. It improves codegen by allowing sibcalls in more circumstances. Alexander
C++ PATCH for typedef stripping with variable templates
At the Lenexa C++ meeting Walter pointed me at a variable template testcase that was breaking, which turned out to be because we were failing to strip aliases consistently in variable template arguments, leading to mismatches between, in the first testcase, Vfloat and Vfl. Tested x86_64-pc-linux-gnu, applying to trunk and concepts branch. commit 87ca1bec448edb74cd531d5b2691a9d58ba17326 Author: Jason Merrill ja...@redhat.com Date: Sat May 9 15:50:40 2015 -0500 * pt.c (tsubst_decl) [VAR_DECL]: Call coerce_innermost_template_parms. (determine_specialization): Call coerce_template_parms. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 2cd36c9..2166f5f 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -1919,7 +1919,13 @@ determine_specialization (tree template_id, ++header_count; if (variable_template_p (fns)) -templates = tree_cons (explicit_targs, fns, templates); +{ + tree parms = INNERMOST_TEMPLATE_PARMS (DECL_TEMPLATE_PARMS (fns)); + targs = coerce_template_parms (parms, explicit_targs, fns, + tf_warning_or_error, + /*req_all*/true, /*use_defarg*/true); + templates = tree_cons (targs, fns, templates); +} else for (; fns; fns = OVL_NEXT (fns)) { tree fn = OVL_CURRENT (fns); @@ -11265,6 +11271,11 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain) tmpl = DECL_TI_TEMPLATE (t); gen_tmpl = most_general_template (tmpl); argvec = tsubst (DECL_TI_ARGS (t), args, complain, in_decl); + if (argvec != error_mark_node) + argvec = (coerce_innermost_template_parms + (DECL_TEMPLATE_PARMS (gen_tmpl), + argvec, t, complain, + /*all*/true, /*defarg*/true)); if (argvec == error_mark_node) RETURN (error_mark_node); hash = hash_tmpl_and_args (gen_tmpl, argvec); diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ25.C b/gcc/testsuite/g++.dg/cpp1y/var-templ25.C new file mode 100644 index 000..8253eac --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ25.C @@ -0,0 +1,8 @@ +// { dg-do compile { target c++14 } } + +using fl = float; + +templateclass T const int V = 0; +template const int Vfl = 42; + +static_assert(Vfloat == 42, ); diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ26.C b/gcc/testsuite/g++.dg/cpp1y/var-templ26.C new file mode 100644 index 000..9ac0777 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/var-templ26.C @@ -0,0 +1,19 @@ +// { dg-do compile { target c++14 } } + +template class T const int V = 0; +template const int Vchar = 42; + +template class T +struct A +{ + using N = T; +}; + +#define SA(X) static_assert((X),#X) +template class T +struct B +{ + SA(Vtypename AT::N == 42); +}; + +Bchar b; commit c1e703a6f795ea63609098607b4484e4b516af41 Author: Jason Merrill ja...@redhat.com Date: Wed May 13 15:57:00 2015 -0400 * pt.c (retrieve_specialization): Make sure our arguments have gone through strip_typedefs. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 2166f5f..60f3958 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -1058,6 +1058,14 @@ retrieve_specialization (tree tmpl, tree args, hashval_t hash) ? TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (tmpl)) : template_class_depth (DECL_CONTEXT (tmpl; +#ifdef ENABLE_CHECKING + /* We should have gone through coerce_template_parms by now. */ + ++processing_template_decl; + if (!any_dependent_template_arguments_p (args)) +gcc_assert (strip_typedefs_expr (args, NULL) == args); + --processing_template_decl; +#endif + if (optimize_specialization_lookup_p (tmpl)) { tree class_template;
[7/9] Record the number of registers in a REG
This is the main patch, to record REG_NREGS in the REG itself. The END_REGNO/END_HARD_REGNO distinction goes away in the next patch. gcc/ * rtl.h (reg_info): Add an nregs field. (REG_NREGS): Use it. (SET_REGNO_RAW): Delete. (set_regno_raw): New function. * regs.h (END_HARD_REGNO): Make equivalent to END_REGNO. (END_REGNO): Redefine in terms of REG_NREGS. * read-rtl.c (read_rtx_code): Call set_regno_raw instead of SET_REGNO_RAW. * emit-rtl.c (set_mode_and_regno): Likewise. * df-scan.c (df_ref_change_reg_with_loc): Use set_mode_and_regno instead of SET_REGNO_RAW. Index: gcc/rtl.h === --- gcc/rtl.h 2015-05-18 08:34:38.532523211 +0100 +++ gcc/rtl.h 2015-05-18 08:36:23.407245048 +0100 @@ -210,7 +210,9 @@ struct GTY(()) reg_info { /* The value of REGNO. */ unsigned int regno; - unsigned int unused : 32; + /* The value of REG_NREGS. */ + unsigned int nregs : 8; + unsigned int unused : 24; /* The value of REG_ATTRS. */ reg_attrs *attrs; @@ -1712,15 +1714,11 @@ #define LABEL_REF_LABEL(LABREF) XCEXP (L be used on RHS. Use SET_REGNO to change the value. */ #define REGNO(RTX) (rhs_regno(RTX)) #define SET_REGNO(RTX, N) (df_ref_change_reg_with_loc (RTX, N)) -#define SET_REGNO_RAW(RTX, N) (REG_CHECK (RTX)-regno = N) /* Return the number of consecutive registers in a REG. This is always 1 for pseudo registers and is determined by HARD_REGNO_NREGS for hard registers. */ -#define REG_NREGS(RTX) \ - (REGNO (RTX) FIRST_PSEUDO_REGISTER \ - ? (unsigned int) hard_regno_nregs[REGNO (RTX)][GET_MODE (RTX)] \ - : 1) +#define REG_NREGS(RTX) (REG_CHECK (RTX)-nregs) /* ORIGINAL_REGNO holds the number the register originally had; for a pseudo register turned into a hard reg this will hold the old pseudo @@ -1735,6 +1733,15 @@ rhs_regno (const_rtx x) return REG_CHECK (x)-regno; } +/* Change the REGNO and REG_NREGS of REG X to the specified values, + bypassing the df machinery. */ +static inline void +set_regno_raw (rtx x, unsigned int regno, unsigned int nregs) +{ + reg_info *reg = REG_CHECK (x); + reg-regno = regno; + reg-nregs = nregs; +} /* 1 if RTX is a reg or parallel that is the current function's return value. */ Index: gcc/regs.h === --- gcc/regs.h 2015-05-18 08:34:38.532523211 +0100 +++ gcc/regs.h 2015-05-18 08:36:11.0 +0100 @@ -288,11 +288,11 @@ end_hard_regno (machine_mode mode, unsig /* Likewise for hard register X. */ -#define END_HARD_REGNO(X) end_hard_regno (GET_MODE (X), REGNO (X)) +#define END_HARD_REGNO(X) END_REGNO (X) /* Likewise for hard or pseudo register X. */ -#define END_REGNO(X) (HARD_REGISTER_P (X) ? END_HARD_REGNO (X) : REGNO (X) + 1) +#define END_REGNO(X) (REGNO (X) + REG_NREGS (X)) /* Add to REGS all the registers required to store a value of mode MODE in register REGNO. */ Index: gcc/read-rtl.c === --- gcc/read-rtl.c 2015-05-18 08:34:38.532523211 +0100 +++ gcc/read-rtl.c 2015-05-18 08:34:38.532523211 +0100 @@ -1349,7 +1349,7 @@ read_rtx_code (const char *code_name) case 'r': read_name (name); validate_const_int (name.string); - SET_REGNO_RAW (return_rtx, atoi (name.string)); + set_regno_raw (return_rtx, atoi (name.string), 1); REG_ATTRS (return_rtx) = NULL; break; Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c 2015-05-18 08:34:38.532523211 +0100 +++ gcc/emit-rtl.c 2015-05-18 08:34:38.532523211 +0100 @@ -435,8 +435,11 @@ gen_blockage (void) void set_mode_and_regno (rtx x, machine_mode mode, unsigned int regno) { + unsigned int nregs = (HARD_REGISTER_NUM_P (regno) + ? hard_regno_nregs[regno][mode] + : 1); PUT_MODE_RAW (x, mode); - SET_REGNO_RAW (x, regno); + set_regno_raw (x, regno, nregs); } /* Generate a new REG rtx. Make sure ORIGINAL_REGNO is set properly, and Index: gcc/df-scan.c === --- gcc/df-scan.c 2015-05-18 08:34:38.532523211 +0100 +++ gcc/df-scan.c 2015-05-18 08:36:11.0 +0100 @@ -1930,7 +1930,7 @@ df_ref_change_reg_with_loc (rtx loc, uns DF_REG_EQ_USE_GET (new_regno), new_regno, loc); } - SET_REGNO_RAW (loc, new_regno); + set_mode_and_regno (loc, GET_MODE (loc), new_regno); }
[6/9] Pass REG changes through a new function
This patch adds a new function to set both the mode and regno of a REG. It makes sure that all REG PUT_MODE and SET_REGNO changes go through this function. There's a new PUT_MODE_RAW (analogous to SET_REGNO_RAW) for the cases where the caller doesn't want that. There's a small consistency fix: gen_rtx_REG was declared with unsigned and defined with unsigned int. The latter is usual GCC style. gcc/ * rtl.h (PUT_MODE_RAW): New macro. (PUT_REG_NOTE_KIND): Use it. (set_mode_and_regno): Declare. (gen_raw_REG): Change regno to unsigned int. (gen_rtx_REG): Change unsigned to unsigned int. (PUT_MODE): Forward to PUT_MODE_RAW for generators, otherwise use set_mode_and_regno to change the mode of registers. * gengenrtl.c (gendef): Use PUT_MODE_RAW. * emit-rtl.c (set_mode_and_regno): New function. (gen_raw_REG): Change regno to unsigned int. Use set_mode_and_regno. * caller-save.c (reg_save_code): Use set_mode_and_regno. * expr.c (init_expr_target): Likewise. * ira.c (setup_prohibited_mode_move_regs): Likewise. * postreload.c (reload_cse_simplify_operands): Likewise. Index: gcc/rtl.h === --- gcc/rtl.h 2015-05-17 21:26:44.375213705 +0100 +++ gcc/rtl.h 2015-05-17 21:29:52.068986873 +0100 @@ -668,8 +668,8 @@ #define RTX_PREV(X) ((INSN_P (X) #define GET_CODE(RTX) ((enum rtx_code) (RTX)-code) #define PUT_CODE(RTX, CODE) ((RTX)-code = (CODE)) -#define GET_MODE(RTX) ((machine_mode) (RTX)-mode) -#define PUT_MODE(RTX, MODE) ((RTX)-mode = (MODE)) +#define GET_MODE(RTX) ((machine_mode) (RTX)-mode) +#define PUT_MODE_RAW(RTX, MODE)((RTX)-mode = (MODE)) /* RTL vector. These appear inside RTX's when there is a need for a variable number of things. The principle use is inside @@ -1509,7 +1509,7 @@ #define DEF_REG_NOTE(NAME) NAME, /* Define macros to extract and insert the reg-note kind in an EXPR_LIST. */ #define REG_NOTE_KIND(LINK) ((enum reg_note) GET_MODE (LINK)) #define PUT_REG_NOTE_KIND(LINK, KIND) \ - PUT_MODE (LINK, (machine_mode) (KIND)) + PUT_MODE_RAW (LINK, (machine_mode) (KIND)) /* Names for REG_NOTE's in EXPR_LIST insn's. */ @@ -3216,13 +3216,27 @@ gen_rtx_INSN (machine_mode mode, rtx_ins rtx reg_notes); extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT); extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec); -extern rtx gen_raw_REG (machine_mode, int); -extern rtx gen_rtx_REG (machine_mode, unsigned); +extern void set_mode_and_regno (rtx, machine_mode, unsigned int); +extern rtx gen_raw_REG (machine_mode, unsigned int); +extern rtx gen_rtx_REG (machine_mode, unsigned int); extern rtx gen_rtx_SUBREG (machine_mode, rtx, int); extern rtx gen_rtx_MEM (machine_mode, rtx); extern rtx gen_rtx_VAR_LOCATION (machine_mode, tree, rtx, enum var_init_status); +#ifdef GENERATOR_FILE +#define PUT_MODE(RTX, MODE) PUT_MODE_RAW (RTX, MODE) +#else +static inline void +PUT_MODE (rtx x, machine_mode mode) +{ + if (REG_P (x)) +set_mode_and_regno (x, mode, REGNO (x)); + else +PUT_MODE_RAW (x, mode); +} +#endif + #define GEN_INT(N) gen_rtx_CONST_INT (VOIDmode, (N)) /* Virtual registers are used during RTL generation to refer to locations into Index: gcc/gengenrtl.c === --- gcc/gengenrtl.c 2015-05-15 21:12:56.172234508 +0100 +++ gcc/gengenrtl.c 2015-05-17 21:31:16.143988041 +0100 @@ -252,7 +252,7 @@ gendef (const char *format) puts ( rtx rt;); puts ( rt = rtx_alloc_stat (code PASS_MEM_STAT);\n); - puts ( PUT_MODE (rt, mode);); + puts ( PUT_MODE_RAW (rt, mode);); for (p = format, i = j = 0; *p ; ++p, ++i) if (*p != '0') Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c 2015-05-17 21:26:44.375213705 +0100 +++ gcc/emit-rtl.c 2015-05-17 21:30:18.484672892 +0100 @@ -430,16 +430,24 @@ gen_blockage (void) #endif +/* Set the mode and register number of X to MODE and REGNO. */ + +void +set_mode_and_regno (rtx x, machine_mode mode, unsigned int regno) +{ + PUT_MODE_RAW (x, mode); + SET_REGNO_RAW (x, regno); +} + /* Generate a new REG rtx. Make sure ORIGINAL_REGNO is set properly, and don't attempt to share with the various global pieces of rtl (such as frame_pointer_rtx). */ rtx -gen_raw_REG (machine_mode mode, int regno) +gen_raw_REG (machine_mode mode, unsigned int regno) { rtx x = rtx_alloc_stat (REG PASS_MEM_STAT); - PUT_MODE (x, mode); - SET_REGNO_RAW (x, regno); + set_mode_and_regno (x, mode, regno); REG_ATTRS (x) = NULL; ORIGINAL_REGNO (x) = regno; return x; Index: gcc/caller-save.c === --- gcc/caller-save.c 2015-05-17 21:26:44.375213705 +0100 +++ gcc/caller-save.c
Re: [PATCH][Testsuite] Disable tests with dg-require-fork for simulated targets
On May 18, 2015, at 8:01 AM, Alan Lawrence alan.lawre...@arm.com wrote: Simulators such as qemu report the presence of fork (it's in glibc) but generally do not support synchronization primitives between threads, so any tests using fork are unreliable. Hum, I have a simulator (binutils/sim) that has fork. All those tests pass for me. They seem to be reliable for me. I didn’t do anything special as I recall. ? I did add enough libc (aka newlib) to bootstrap gcc, which maybe is slightly more than some do, but, existence of additional libraries shouldn’t change it much. To the extent it does, it should be easy to notice any extra required libraries directly. If a qmu bug or design deficiency, do you have a pointer to the reported bug or the design where they talk about tit. Remember, the point of the test suite is to find bugs to be fixed. Papering over bugs by turning it off, is fine, but, we should have named bug reports that when fixed, cause us to go back and turn back on those that were turned off. This patch disables the subset of such tests that identify themselves using dg-require-fork. At present, such tests are limited to (a) gcc.dg/torture/ftrapv-1.c. and (b) some tests in the 27_io section of the libstdc++ testsuite, listed below. Further patches can add dg-require-fork to the many other tests that call fork(). Cross-tested on aarch64-none-linux-gnu using qemu, with these tests becoming UNSUPPORTED: (gcc) gcc.dg/torture/ftrapv-1.c So, I reviewed this test case. What about it doesn’t work? Kinda simple and small, easy to understand. Is this patch OK for trunk? No. Let’s talk about it before turning off a to of test cases.
[8/9] Remove END_HARD_REGNO
This patch replaces calls to END_HARD_REGNO with calls to END_REGNO and moves END_REGNO to rtl.h (since it's now just a REG access). gcc/ * regs.h (END_HARD_REGNO): Delete. (END_REGNO): Move to... * rtl.h: ...here. * bt-load.c (note_btr_set): Use END_REGNO instead of END_HARD_REGNO. * caller-save.c (mark_set_regs): Likewise. * combine.c (move_deaths, distribute_notes): Likewise. * cse.c (invalidate, invalidate_for_call): Likewise. * df-scan.c (df_ref_record): Likewise. * postreload-gcse.c (reg_changed_after_insn_p): Likewise. (record_last_reg_set_info): Likewise. * reg-stack.c (convert_regs_exit): Likewise. * reload.c (reg_overlap_mentioned_for_reload_p): Likewise. * resource.c (update_live_status): Likewise. * rtlanal.c (find_reg_fusage, find_regno_fusage): Likewise. Index: gcc/regs.h === --- gcc/regs.h 2015-05-18 08:36:41.951019042 +0100 +++ gcc/regs.h 2015-05-18 08:36:41.947019063 +0100 @@ -286,14 +286,6 @@ end_hard_regno (machine_mode mode, unsig return regno + hard_regno_nregs[regno][(int) mode]; } -/* Likewise for hard register X. */ - -#define END_HARD_REGNO(X) END_REGNO (X) - -/* Likewise for hard or pseudo register X. */ - -#define END_REGNO(X) (REGNO (X) + REG_NREGS (X)) - /* Add to REGS all the registers required to store a value of mode MODE in register REGNO. */ Index: gcc/rtl.h === --- gcc/rtl.h 2015-05-18 08:36:41.951019042 +0100 +++ gcc/rtl.h 2015-05-18 08:37:14.494621384 +0100 @@ -1733,6 +1733,13 @@ rhs_regno (const_rtx x) return REG_CHECK (x)-regno; } +/* Return the final register in REG X plus one. */ +static inline unsigned int +END_REGNO (const_rtx x) +{ + return REGNO (x) + REG_NREGS (x); +} + /* Change the REGNO and REG_NREGS of REG X to the specified values, bypassing the df machinery. */ static inline void Index: gcc/bt-load.c === --- gcc/bt-load.c 2015-05-18 08:36:41.951019042 +0100 +++ gcc/bt-load.c 2015-05-18 08:36:41.943019110 +0100 @@ -443,7 +443,7 @@ note_btr_set (rtx dest, const_rtx set AT if (!REG_P (dest)) return; regno = REGNO (dest); - end_regno = END_HARD_REGNO (dest); + end_regno = END_REGNO (dest); for (; regno end_regno; regno++) if (TEST_HARD_REG_BIT (all_btrs, regno)) { Index: gcc/caller-save.c === --- gcc/caller-save.c 2015-05-18 08:36:41.951019042 +0100 +++ gcc/caller-save.c 2015-05-18 08:36:41.947019063 +0100 @@ -992,7 +992,7 @@ mark_set_regs (rtx reg, const_rtx setter REGNO (reg) FIRST_PSEUDO_REGISTER) { regno = REGNO (reg); - endregno = END_HARD_REGNO (reg); + endregno = END_REGNO (reg); } else return; Index: gcc/combine.c === --- gcc/combine.c 2015-05-18 08:36:41.951019042 +0100 +++ gcc/combine.c 2015-05-18 08:36:41.947019063 +0100 @@ -13316,8 +13316,8 @@ move_deaths (rtx x, rtx maybe_kill_insn, GET_MODE_SIZE (GET_MODE (x { unsigned int deadregno = REGNO (XEXP (note, 0)); - unsigned int deadend = END_HARD_REGNO (XEXP (note, 0)); - unsigned int ourend = END_HARD_REGNO (x); + unsigned int deadend = END_REGNO (XEXP (note, 0)); + unsigned int ourend = END_REGNO (x); unsigned int i; for (i = deadregno; i deadend; i++) @@ -13337,7 +13337,7 @@ move_deaths (rtx x, rtx maybe_kill_insn, regno FIRST_PSEUDO_REGISTER REG_NREGS (x) 1) { - unsigned int ourend = END_HARD_REGNO (x); + unsigned int ourend = END_REGNO (x); unsigned int i, offset; rtx oldnotes = 0; @@ -13932,7 +13932,7 @@ distribute_notes (rtx notes, rtx_insn *f if (place REG_NREGS (XEXP (note, 0)) 1) { - unsigned int endregno = END_HARD_REGNO (XEXP (note, 0)); + unsigned int endregno = END_REGNO (XEXP (note, 0)); bool all_used = true; unsigned int i; Index: gcc/cse.c === --- gcc/cse.c 2015-05-18 08:36:41.951019042 +0100 +++ gcc/cse.c 2015-05-18 08:36:41.947019063 +0100 @@ -1894,7 +1894,7 @@ invalidate (rtx x, machine_mode full_mod { HOST_WIDE_INT in_table = TEST_HARD_REG_BIT (hard_regs_in_table, regno); - unsigned int endregno = END_HARD_REGNO (x); + unsigned int endregno = END_REGNO (x); unsigned int tregno, tendregno, rn; struct table_elt *p, *next; @@ -1920,7
[2/9] Add a REG_NREGS macro
This patch adds a REG_NREGS macro that by the end of the series will simply be an rtx field access. The definition in this patch is just a placeholder (and so I haven't tried to avoid the double evaluation of the parameter). The diff has extra context because in most cases it makes it obvious that we really are looking at hard_regno_nregs[REGNO (x)][GET_MODE (x)] for some rtx x. There are a couple of sites where it's more indirect though. gcc/ * rtl.h (REG_NREGS): New macro * alias.c (record_set): Use it. * cfgcleanup.c (mark_effect): Likewise. * combine.c (likely_spilled_retval_1): Likewise. (likely_spilled_retval_p, can_change_dest_mode): Likewise. (move_deaths, distribute_notes): Likewise. * cselib.c (cselib_record_set): Likewise. * df-problems.c (df_simulate_one_insn_forwards): Likewise. * df-scan.c (df_mark_reg): Likewise. * dse.c (look_for_hardregs): Likewise. * dwarf2out.c (reg_loc_descriptor): Likewise. (multiple_reg_loc_descriptor): Likewise. * expr.c (write_complex_part, read_complex_part): Likewise. (emit_move_complex): Likewise. * haifa-sched.c (setup_ref_regs): Likewise. * ira-lives.c (mark_hard_reg_live): Likewise. * lra.c (lra_set_insn_recog_data): Likewise. * mode-switching.c (create_pre_exit): Likewise. * postreload.c (reload_combine_recognize_const_pattern): Likewise. (reload_combine_recognize_pattern): Likewise. (reload_combine_note_use, move2add_record_mode): Likewise. (reload_cse_move2add): Likewise. * reg-stack.c (subst_stack_regs_pat): Likewise. * regcprop.c (kill_value, copy_value): Likewise. (copyprop_hardreg_forward_1): Likewise. * regrename.c (verify_reg_in_set, scan_rtx_reg): Likewise. (build_def_use): Likewise. * sched-deps.c (mark_insn_reg_birth, mark_reg_death): Likewise. (deps_analyze_insn): Likewise. * sched-rgn.c (check_live_1, update_live_1): Likewise. * sel-sched.c (count_occurrences_equiv): Likewise. * valtrack.c (dead_debug_insert_temp): Likewise. Index: gcc/rtl.h === --- gcc/rtl.h 2015-05-18 08:19:43.475431381 +0100 +++ gcc/rtl.h 2015-05-18 08:19:43.459431568 +0100 @@ -1692,16 +1692,24 @@ #define LABEL_REF_LABEL(LABREF) XCEXP (L /* For a REG rtx, REGNO extracts the register number. REGNO can only be used on RHS. Use SET_REGNO to change the value. */ #define REGNO(RTX) (rhs_regno(RTX)) #define SET_REGNO(RTX,N) \ (df_ref_change_reg_with_loc (REGNO (RTX), N, RTX), XCUINT (RTX, 0, REG) = N) #define SET_REGNO_RAW(RTX,N) (XCUINT (RTX, 0, REG) = N) +/* Return the number of consecutive registers in a REG. This is always + 1 for pseudo registers and is determined by HARD_REGNO_NREGS for + hard registers. */ +#define REG_NREGS(RTX) \ + (REGNO (RTX) FIRST_PSEUDO_REGISTER \ + ? (unsigned int) hard_regno_nregs[REGNO (RTX)][GET_MODE (RTX)] \ + : 1) + /* ORIGINAL_REGNO holds the number the register originally had; for a pseudo register turned into a hard reg this will hold the old pseudo register number. */ #define ORIGINAL_REGNO(RTX) \ (RTL_FLAG_CHECK1 (ORIGINAL_REGNO, (RTX), REG)-u2.original_regno) /* Force the REGNO macro to only be used on the lhs. */ static inline unsigned int Index: gcc/alias.c === --- gcc/alias.c 2015-05-18 08:19:43.475431381 +0100 +++ gcc/alias.c 2015-05-18 08:19:43.455431618 +0100 @@ -1295,22 +1295,17 @@ record_set (rtx dest, const_rtx set, voi if (!REG_P (dest)) return; regno = REGNO (dest); gcc_checking_assert (regno reg_base_value-length ()); - /* If this spans multiple hard registers, then we must indicate that every - register has an unusable value. */ - if (regno FIRST_PSEUDO_REGISTER) -n = hard_regno_nregs[regno][GET_MODE (dest)]; - else -n = 1; + n = REG_NREGS (dest); if (n != 1) { while (--n = 0) { bitmap_set_bit (reg_seen, regno + n); new_reg_base_value[regno + n] = 0; } return; Index: gcc/cfgcleanup.c === --- gcc/cfgcleanup.c2015-05-18 08:19:43.475431381 +0100 +++ gcc/cfgcleanup.c2015-05-18 08:19:43.455431618 +0100 @@ -229,35 +229,33 @@ mark_effect (rtx exp, regset nonequal) /* In case we do clobber the register, mark it as equal, as we know the value is dead so it don't have to match. */ case CLOBBER: if (REG_P (XEXP (exp, 0))) { dest = XEXP (exp, 0); regno = REGNO (dest); if (HARD_REGISTER_NUM_P (regno)) - bitmap_clear_range (nonequal, regno, - hard_regno_nregs[regno][GET_MODE (dest)]); + bitmap_clear_range (nonequal, regno,
[3/9] Clean up df_ref_change_reg_with_loc
This patch cleans up the interface to df_ref_change_reg_with_loc. The function is used only by SET_REGNO_RAW, so the old regno is always REGNO (x) and the caller always goes on to set REGNO. (And the fuction doesn't make much sense otherwise.) The patch therefore gets df_ref_change_reg_with_loc to work out the old regno itself and to install the new register number once it's done. The check for the old register being -1 was redundant. Only expr.c and postreload.c create -1 registers (fixed in a later patch). Both sites are just creating temporary registers in order to query backend hooks and neither site needs the check. expr.c does use SET_REGNO and so does go through this function, but only at a time when there's no df information. (And it wouldn't work if the df machinery were set up, since any change after the first would look like a normal change.) postreload.c uses SET_REGNO_RAW and so bypasses the code altogether. expr.c bypasses the code too by the end of the series. gcc/ * df.h (df_ref_change_reg_with_loc): Remove old_regno parameter. Change type of new_regno to unsigned int. * df-scan.c (df_ref_change_reg_with_loc_1): Change type of new_regno to unsigned int. (df_ref_change_reg_with_loc): Remove old_regno parameter. Change type of new_regno to unsigned int. Use SET_REGNO_RAW. * rtl.h (SET_REGNO): Update call to df_ref_change_reg_with_loc. (SET_REGNO_RAW): Add space after ,. Index: gcc/df.h === --- gcc/df.h2015-05-18 07:53:09.890871253 +0100 +++ gcc/df.h2015-05-18 07:53:09.890871253 +0100 @@ -1049,7 +1049,7 @@ extern void df_recompute_luids (basic_bl extern void df_insn_change_bb (rtx_insn *, basic_block); extern void df_maybe_reorganize_use_refs (enum df_ref_order); extern void df_maybe_reorganize_def_refs (enum df_ref_order); -extern void df_ref_change_reg_with_loc (int, int, rtx); +extern void df_ref_change_reg_with_loc (rtx, unsigned int); extern void df_notes_rescan (rtx_insn *); extern void df_hard_reg_init (void); extern void df_update_entry_block_defs (void); Index: gcc/df-scan.c === --- gcc/df-scan.c 2015-05-18 07:53:09.890871253 +0100 +++ gcc/df-scan.c 2015-05-18 07:53:09.890871253 +0100 @@ -1819,7 +1819,7 @@ df_insn_change_bb (rtx_insn *insn, basic static void df_ref_change_reg_with_loc_1 (struct df_reg_info *old_df, struct df_reg_info *new_df, - int new_regno, rtx loc) + unsigned int new_regno, rtx loc) { df_ref the_ref = old_df-reg_chain; @@ -1904,25 +1904,33 @@ df_ref_change_reg_with_loc_1 (struct df_ } -/* Change the regno of all refs that contained LOC from OLD_REGNO to - NEW_REGNO. Refs that do not match LOC are not changed which means - that artificial refs are not changed since they have no loc. This - call is to support the SET_REGNO macro. */ +/* Change the regno of register LOC to NEW_REGNO and update the df + information accordingly. Refs that do not match LOC are not changed + which means that artificial refs are not changed since they have no loc. + This call is to support the SET_REGNO macro. */ void -df_ref_change_reg_with_loc (int old_regno, int new_regno, rtx loc) +df_ref_change_reg_with_loc (rtx loc, unsigned int new_regno) { - if ((!df) || (old_regno == -1) || (old_regno == new_regno)) + unsigned int old_regno = REGNO (loc); + if (old_regno == new_regno) return; - df_grow_reg_info (); + if (df) +{ + df_grow_reg_info (); - df_ref_change_reg_with_loc_1 (DF_REG_DEF_GET (old_regno), - DF_REG_DEF_GET (new_regno), new_regno, loc); - df_ref_change_reg_with_loc_1 (DF_REG_USE_GET (old_regno), - DF_REG_USE_GET (new_regno), new_regno, loc); - df_ref_change_reg_with_loc_1 (DF_REG_EQ_USE_GET (old_regno), - DF_REG_EQ_USE_GET (new_regno), new_regno, loc); + df_ref_change_reg_with_loc_1 (DF_REG_DEF_GET (old_regno), + DF_REG_DEF_GET (new_regno), + new_regno, loc); + df_ref_change_reg_with_loc_1 (DF_REG_USE_GET (old_regno), + DF_REG_USE_GET (new_regno), + new_regno, loc); + df_ref_change_reg_with_loc_1 (DF_REG_EQ_USE_GET (old_regno), + DF_REG_EQ_USE_GET (new_regno), + new_regno, loc); +} + SET_REGNO_RAW (loc, new_regno); } Index: gcc/rtl.h === --- gcc/rtl.h 2015-05-18 07:53:09.890871253 +0100 +++ gcc/rtl.h 2015-05-18 07:53:09.886871176 +0100 @@ -1693,9 +1693,8 @@ #define LABEL_REF_LABEL(LABREF) XCEXP (L /* For a REG rtx, REGNO extracts
[4/9] Add a dedicated rtx union member for REGs
This patch replaces the current REG i0 format with a dedicated structure, so that we can make use of the extra 32 bits in the i field. Of the places that iterate on formats and do something for 'i's, most already handled REGs specially before the format walk and so don't need to check for 'r'. Otherwise it's mostly just a case of adding dummy 'r' cases in order avoid the default gcc_unreachable () (in cases where we do the same for 'i'). The main exceptions are the cselib.c and lra-constraints.c changes. final.c:leaf_renumber_regs_insn handled REGs specially but then went on to do a no-op walk of the format. I just added an early exit instead of an empty 'r' case. gcc/ * rtl.def (REG): Change format to r. * rtl.h (rtunion): Remove rt_reg. (reg_info): New structure. (rtx_def): Add reg field to main union. (X0REGATTR): Delete. (REG_CHECK): New macro. (SET_REGNO_RAW, rhs_regno, REG_ATTRS): Use it. * rtl.c (rtx_format): Document r. (rtx_code_size): Handle REG specially. * gengenrtl.c (special_format): Return true for formats that include 'r'. * gengtype.c (adjust_field_rtx_def): Handle 'r' fields. Deal with REG_ATTRS after the field loop. * emit-rtl.c (gen_raw_REG): Call rtx_alloc_stat directly. * expmed.c (init_expmed): Call gen_raw_REG instead of gen_rtx_raw_REG. * expr.c (init_expr_target): Likewise. * regcprop.c (maybe_mode_change): Likewise. * varasm.c (make_decl_rtl): Likewise. * final.c (leaf_renumber_regs_insn): Return early after handling REGs. * genemit.c (gen_exp): Handle 'r' fields. * genpeep.c (match_rtx): Likewise. * gensupport.c (subst_pattern_match): Likewise. (get_alternatives_number, collect_insn_data, alter_predicate_for_insn) (alter_constraints, subst_dup): Likewise. * read-rtl.c (read_rtx_code): Likewise. * print-rtl.c (print_rtx): Likewise. * genrecog.c (find_operand, find_matching_operand): Likewise. (validate_pattern, match_pattern_2): Likewise. (parameter::UINT, rtx_test::REGNO_FIELD): New enum values. (rtx_test::regno_field): New function. (operator ==, safe_to_hoist_p, transition_parameter_type) (parameter_type_string, print_parameter_value) (print_nonbool_test, print_test): Handle new enum values. * cselib.c (rtx_equal_for_cselib_1): Handle REG specially. * lra-constraints.c (operands_match_p): Likewise. Index: gcc/rtl.def === --- gcc/rtl.def 2015-05-18 07:53:15.014808321 +0100 +++ gcc/rtl.def 2015-05-18 07:53:15.010808427 +0100 @@ -381,7 +381,7 @@ DEF_RTL_EXPR(PC, pc, , RTX_OBJ) points to a reg_attrs structure. This rtx needs to have as many (or more) fields as a MEM, since we can change REG rtx's into MEMs during reload. */ -DEF_RTL_EXPR(REG, reg, i0, RTX_OBJ) +DEF_RTL_EXPR(REG, reg, r, RTX_OBJ) /* A scratch register. This represents a register used only within a single insn. It will be replaced by a REG during register allocation Index: gcc/rtl.h === --- gcc/rtl.h 2015-05-18 07:53:15.014808321 +0100 +++ gcc/rtl.h 2015-05-18 07:53:15.014808321 +0100 @@ -201,11 +201,21 @@ struct GTY((for_user)) reg_attrs { tree rt_tree; basic_block rt_bb; mem_attrs *rt_mem; - reg_attrs *rt_reg; struct constant_descriptor_rtx *rt_constant; struct dw_cfi_node *rt_cfi; }; +/* Describes the properties of a REG. */ +struct GTY(()) reg_info { + /* The value of REGNO. */ + unsigned int regno; + + unsigned int unused : 32; + + /* The value of REG_ATTRS. */ + reg_attrs *attrs; +}; + /* This structure remembers the position of a SYMBOL_REF within an object_block structure. A SYMBOL_REF only provides this information if SYMBOL_REF_HAS_BLOCK_INFO_P is true. */ @@ -395,6 +405,7 @@ struct GTY((desc(0), tag(0), union u { rtunion fld[1]; HOST_WIDE_INT hwint[1]; +struct reg_info reg; struct block_symbol block_sym; struct real_value rv; struct fixed_value fv; @@ -1070,6 +1081,13 @@ #define XCNMPFV(RTX, C, M) __extension__ __LINE__, __FUNCTION__); \ _rtx-u.fv; }) +#define REG_CHECK(RTX) __extension__ \ +({ __typeof (RTX) const _rtx = (RTX); \ + if (GET_CODE (_rtx) != REG) \ + rtl_check_failed_code1 (_rtx, REG, __FILE__, __LINE__, \ +__FUNCTION__); \ + _rtx-u.reg; }) + #define BLOCK_SYMBOL_CHECK(RTX) __extension__ \ ({ __typeof (RTX) const _symbol = (RTX); \ const unsigned int flags = SYMBOL_REF_FLAGS
[5/9] Create sensible dummy registers
Some pieces of code create a temporary REG or MEM and only fill it in later when they're testing the cost of a particular rtx. This patch makes sure that even the dummy REG or MEM is valid, rather than force the gen_* code to handle garbage values. gcc/ * caller-save.c (init_caller_save): Use word_mode and FIRST_PSEUDO_REGISTER when creating temporary rtxes. * expr.c (init_expr_target): Likewise. * ira.c (setup_prohibited_mode_move_regs): Likewise. * postreload.c (reload_cse_regs_1): Likewise. Index: gcc/caller-save.c === --- gcc/caller-save.c 2015-05-16 15:50:58.063567104 +0100 +++ gcc/caller-save.c 2015-05-16 15:50:58.055567198 +0100 @@ -287,8 +287,8 @@ init_caller_save (void) To avoid lots of unnecessary RTL allocation, we construct all the RTL once, then modify the memory and register operands in-place. */ - test_reg = gen_rtx_REG (VOIDmode, 0); - test_mem = gen_rtx_MEM (VOIDmode, address); + test_reg = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER); + test_mem = gen_rtx_MEM (word_mode, address); savepat = gen_rtx_SET (test_mem, test_reg); restpat = gen_rtx_SET (test_reg, test_mem); Index: gcc/expr.c === --- gcc/expr.c 2015-05-16 15:50:58.063567104 +0100 +++ gcc/expr.c 2015-05-16 15:50:58.059567152 +0100 @@ -202,12 +202,12 @@ init_expr_target (void) /* Try indexing by frame ptr and try by stack ptr. It is known that on the Convex the stack ptr isn't a valid index. With luck, one or the other is valid on any machine. */ - mem = gen_rtx_MEM (VOIDmode, stack_pointer_rtx); - mem1 = gen_rtx_MEM (VOIDmode, frame_pointer_rtx); + mem = gen_rtx_MEM (word_mode, stack_pointer_rtx); + mem1 = gen_rtx_MEM (word_mode, frame_pointer_rtx); /* A scratch register we can modify in-place below to avoid useless RTL allocations. */ - reg = gen_rtx_REG (VOIDmode, -1); + reg = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER); insn = rtx_alloc (INSN); pat = gen_rtx_SET (NULL_RTX, NULL_RTX); Index: gcc/ira.c === --- gcc/ira.c 2015-05-16 15:50:58.063567104 +0100 +++ gcc/ira.c 2015-05-16 15:50:58.055567198 +0100 @@ -1767,8 +1767,8 @@ setup_prohibited_mode_move_regs (void) if (ira_prohibited_mode_move_regs_initialized_p) return; ira_prohibited_mode_move_regs_initialized_p = true; - test_reg1 = gen_rtx_REG (VOIDmode, 0); - test_reg2 = gen_rtx_REG (VOIDmode, 0); + test_reg1 = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER); + test_reg2 = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER); move_pat = gen_rtx_SET (test_reg1, test_reg2); move_insn = gen_rtx_INSN (VOIDmode, 0, 0, 0, move_pat, 0, -1, 0); for (i = 0; i NUM_MACHINE_MODES; i++) Index: gcc/postreload.c === --- gcc/postreload.c2015-05-16 15:50:58.063567104 +0100 +++ gcc/postreload.c2015-05-16 15:50:58.055567198 +0100 @@ -234,7 +234,7 @@ reload_cse_regs_1 (void) bool cfg_changed = false; basic_block bb; rtx_insn *insn; - rtx testreg = gen_rtx_REG (VOIDmode, -1); + rtx testreg = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER); cselib_init (CSELIB_RECORD_MEMORY); init_alias_analysis ();
[9/9] Simplify register bitmap operations
After the previous changes, several callers are left with code like: if (HARD_REGISTER_P (x)) bitmap_foo_range (x, REGNO (x), REG_NREGS (x)); else bitmap_foo (x, REGNO (x)); These might as well now be: if (REG_NREGS (x) 1) bitmap_foo_range (x, REGNO (x), REG_NREGS (x)); else bitmap_foo (x, REGNO (x)); since REG_NREGS is as cheap to test, and since single-register hard REGs are the common case. But if separating out the cases is a win -- and it seems to be, very slightly -- then it would be better to add the shortcut to the range functions themselves. gcc/ * bitmap.c (bitmap_set_range): Handle count==1 specially. (bitmap_clear_range): Likewise. * cfgcleanup.c (mark_effect): Use bitmap_clear_range and bitmap_set_range unconditionally. * df-problems.c (df_simulate_one_insn_forwards): Likewise. * df-scan.c (df_mark_reg): Likewise. * haifa-sched.c (setup_ref_regs): Likewise. * sched-rgn.c (update_live_1): Likewise. Index: gcc/bitmap.c === --- gcc/bitmap.c2015-05-18 08:38:25.845752816 +0100 +++ gcc/bitmap.c2015-05-18 08:38:25.841752865 +0100 @@ -1212,6 +1212,12 @@ bitmap_set_range (bitmap head, unsigned if (!count) return; + if (count == 1) +{ + bitmap_set_bit (head, start); + return; +} + first_index = start / BITMAP_ELEMENT_ALL_BITS; end_bit_plus1 = start + count; last_index = (end_bit_plus1 - 1) / BITMAP_ELEMENT_ALL_BITS; @@ -1311,6 +1317,12 @@ bitmap_clear_range (bitmap head, unsigne if (!count) return; + if (count == 1) +{ + bitmap_clear_bit (head, start); + return; +} + first_index = start / BITMAP_ELEMENT_ALL_BITS; end_bit_plus1 = start + count; last_index = (end_bit_plus1 - 1) / BITMAP_ELEMENT_ALL_BITS; Index: gcc/cfgcleanup.c === --- gcc/cfgcleanup.c2015-05-18 08:38:25.845752816 +0100 +++ gcc/cfgcleanup.c2015-05-18 08:38:25.841752865 +0100 @@ -222,22 +222,15 @@ try_simplify_condjump (basic_block cbran static bool mark_effect (rtx exp, regset nonequal) { - int regno; rtx dest; switch (GET_CODE (exp)) { /* In case we do clobber the register, mark it as equal, as we know the value is dead so it don't have to match. */ case CLOBBER: - if (REG_P (XEXP (exp, 0))) - { - dest = XEXP (exp, 0); - regno = REGNO (dest); - if (HARD_REGISTER_NUM_P (regno)) - bitmap_clear_range (nonequal, regno, REG_NREGS (dest)); - else - bitmap_clear_bit (nonequal, regno); - } + dest = XEXP (exp, 0); + if (REG_P (dest)) + bitmap_clear_range (nonequal, REGNO (dest), REG_NREGS (dest)); return false; case SET: @@ -248,11 +241,7 @@ mark_effect (rtx exp, regset nonequal) return false; if (!REG_P (dest)) return true; - regno = REGNO (dest); - if (HARD_REGISTER_NUM_P (regno)) - bitmap_set_range (nonequal, regno, REG_NREGS (dest)); - else - bitmap_set_bit (nonequal, regno); + bitmap_set_range (nonequal, REGNO (dest), REG_NREGS (dest)); return false; default: Index: gcc/df-problems.c === --- gcc/df-problems.c 2015-05-18 08:38:25.845752816 +0100 +++ gcc/df-problems.c 2015-05-18 08:38:25.845752816 +0100 @@ -3574,11 +3574,7 @@ df_simulate_one_insn_forwards (basic_blo case REG_UNUSED: { rtx reg = XEXP (link, 0); - int regno = REGNO (reg); - if (HARD_REGISTER_NUM_P (regno)) - bitmap_clear_range (live, regno, REG_NREGS (reg)); - else - bitmap_clear_bit (live, regno); + bitmap_clear_range (live, REGNO (reg), REG_NREGS (reg)); } break; default: Index: gcc/df-scan.c === --- gcc/df-scan.c 2015-05-18 08:38:25.845752816 +0100 +++ gcc/df-scan.c 2015-05-18 08:38:25.841752865 +0100 @@ -3518,15 +3518,7 @@ df_get_eh_block_artificial_uses (bitmap static void df_mark_reg (rtx reg, void *vset) { - bitmap set = (bitmap) vset; - int regno = REGNO (reg); - - gcc_assert (GET_MODE (reg) != BLKmode); - - if (regno FIRST_PSEUDO_REGISTER) -bitmap_set_range (set, regno, REG_NREGS (reg)); - else -bitmap_set_bit (set, regno); + bitmap_set_range ((bitmap) vset, REGNO (reg), REG_NREGS (reg)); } Index: gcc/haifa-sched.c === --- gcc/haifa-sched.c 2015-05-18 08:38:25.845752816 +0100 +++ gcc/haifa-sched.c 2015-05-18 08:38:25.845752816 +0100 @@ -1032,17 +1032,13 @@ initiate_reg_pressure_info (bitmap live) static void setup_ref_regs (rtx x) { - int i, j, regno; + int i, j; const RTX_CODE code =
Re: [PATCH i386] Allow sibcalls in no-PLT PIC
After this message the discussion diverged in the direction of H.J.Lu's proposed relaxation scheme involving new type of relocations. I'm not clear if my patch is actually approved. I'd like to point out that it doesn't clash with H.J.Lu's work. It improves codegen by allowing sibcalls in more circumstances. Yes, the original patch is OK. Honza Alexander
Re: [PATCH][ARM] PR/65711: Don't pass '-dynamic-linker' when '-shared' is used
Ramana Radhakrishnan ramana@googlemail.com skribis: On Thu, Apr 23, 2015 at 9:29 AM, Ludovic Courtès l...@gnu.org wrote: As discussed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65711. Patch is for both 4.8 and 4.9 (possibly 5.1 too, I haven’t checked.) OK for trunk. This is also ok for all release branches if no objections in 24 hours. OK, thank you. I haven’t applied for write-after-approval so perhaps you should commit it yourself? Ludo’.
Re: [1/9] Make more use of END_REGNO
On 05/18/2015 12:11 PM, Richard Sandiford wrote: This patch just handles some obvious cases where END_REGNO could be used. gcc/ * cfgcleanup.c (mentions_nonequal_regs): Use END_REGNO. * dse.c (note_add_store): Likewise. * ira-lives.c (mark_hard_reg_dead): Likewise. * loop-invariant.c (mark_reg_store): Likewise. (mark_reg_death): Likewise. * postreload.c (reload_combine): Likewise. (reload_combine_note_store): Likewise. (reload_combine_note_use): Likewise. * recog.c (peep2_reg_dead_p): Likewise. OK. jeff
Fix r223258 fallout in gfortran.dg
I've committed the attached patch to fix testsuite regressions casued by r223258. The interface statements needed to be within the porgram unit. -- Steve Index: ChangeLog === --- ChangeLog (revision 223308) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2015-05-18 Steven G. Kargl ka...@gcc.gnu.org + + * gfortran.dg/lto/pr41521_0.f90: Move INTERFACE statement in program + unit. + * gfortran.dg/lto/pr41576_1.f90: Ditto. + 2015-05-12 Andreas Tobler andre...@gcc.gnu.org * lib/target-supports.exp (check_effective_target_pie): Add *-*-freebsd* Index: gfortran.dg/lto/pr41521_0.f90 === --- gfortran.dg/lto/pr41521_0.f90 (revision 223308) +++ gfortran.dg/lto/pr41521_0.f90 (working copy) @@ -3,9 +3,6 @@ program species integer spk(2) real eval(2) -spk = 2 -call atom(1.1,spk,eval) -end program interface subroutine atom(sol,k,eval) real, intent(in) :: sol @@ -13,4 +10,7 @@ interface real, intent(out) :: eval(2) end subroutine end interface +spk = 2 +call atom(1.1,spk,eval) +end program Index: gfortran.dg/lto/pr41576_1.f90 === --- gfortran.dg/lto/pr41576_1.f90 (revision 223308) +++ gfortran.dg/lto/pr41576_1.f90 (working copy) @@ -1,12 +1,11 @@ program test common /bar/ c, d integer(4) :: c, d - call foo - if (c/=1 .or. d/=2) call abort -end program test - interface subroutine foo() end subroutine end interface + call foo + if (c/=1 .or. d/=2) call abort +end program test
[PATCH] Fix match.pd narrowing opt (PR tree-optimization/66187)
Hi! As the testcases show, for signed types we really should use SIGNED rather than UNSIGNED as tree_int_cst_min_precision argument, that function doesn't really do the desired thing with UNSIGNED for negative values and with -fwrapv we just want the narrowing cast to not lose anything from the number. Additionally, as for TYPE_OVERFLOW_WRAPS case we perform the +/-/ in unsigned type, we have to avoid all negative masks, because those surely have the higher bits set. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? For plus and unsigned type, we could improve it even further, as in that case we could just check if the bit immediately above precision is clear in the mask, higher bits than that are uninteresting, because the addition will have always zeros there. Perhaps as a follow-up? I mean say uchar foo (uchar a, uchar b) { return (x + y) 0x72fe; } can be done as uchar addition 0xfe, but not e.g. 0x71fe. 2015-05-18 Jakub Jelinek ja...@redhat.com PR tree-optimization/66187 * match.pd ((bit_and (plus/minus (convert @0) (convert @1)) mask)): Pass TYPE_SIGN to tree_int_cst_min_precision. If !TYPE_OVERFLOW_WRAPS, ensure @4 is non-negative. * gcc.c-torture/execute/pr66187.c: New test. * gcc.dg/pr66187-1.c: New test. * gcc.dg/pr66187-2.c: New test. --- gcc/match.pd.jj 2015-05-18 09:46:39.0 +0200 +++ gcc/match.pd2015-05-18 10:20:24.944475737 +0200 @@ -1115,8 +1115,10 @@ (define_operator_list CBRT BUILT_IN_CBRT /* The inner conversion must be a widening conversion. */ TYPE_PRECISION (TREE_TYPE (@2)) TYPE_PRECISION (TREE_TYPE (@0)) types_match (@0, @1) - (tree_int_cst_min_precision (@4, UNSIGNED) + (tree_int_cst_min_precision (@4, TYPE_SIGN (TREE_TYPE (@0))) = TYPE_PRECISION (TREE_TYPE (@0))) + (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)) +|| tree_int_cst_sgn (@4) = 0) single_use (@5)) (if (TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))) (with { tree ntype = TREE_TYPE (@0); } --- gcc/testsuite/gcc.c-torture/execute/pr66187.c.jj2015-05-18 10:53:43.953654893 +0200 +++ gcc/testsuite/gcc.c-torture/execute/pr66187.c 2015-05-18 10:53:28.0 +0200 @@ -0,0 +1,16 @@ +/* PR tree-optimization/66187 */ + +int a = 1, e = -1; +short b, f; + +int +main () +{ + f = e; + int g = b 0 ? 0 : f + b; + if ((g -4) 0) +a = 0; + if (a) +__builtin_abort (); + return 0; +} --- gcc/testsuite/gcc.dg/pr66187-1.c.jj 2015-05-18 10:54:30.677906029 +0200 +++ gcc/testsuite/gcc.dg/pr66187-1.c2015-05-18 12:30:17.0 +0200 @@ -0,0 +1,97 @@ +/* PR tree-optimization/66187 */ +/* { dg-do run } */ +/* { dg-options -O2 -fno-wrapv } */ + +__attribute__((noinline, noclone)) int +f0 (unsigned char x, unsigned char y) +{ + return (x + y) 0x2ff; +} + +__attribute__((noinline, noclone)) int +f1 (unsigned char x, unsigned char y) +{ + return (x - y) 0x2ff; +} + +__attribute__((noinline, noclone)) int +f2 (signed char x, signed char y) +{ + return (x + y) -4; +} + +__attribute__((noinline, noclone)) int +f3 (signed char x, signed char y) +{ + return (x + y) 0xf8; +} + +__attribute__((noinline, noclone)) int +f4 (signed char x, signed char y) +{ + return (x + y) 0x78; +} + +__attribute__((noinline, noclone)) int +f5 (unsigned char x, unsigned char y) +{ + int a = x; + int b = y; + int c = a + b; + return c 0x2ff; +} + +__attribute__((noinline, noclone)) int +f6 (unsigned char x, unsigned char y) +{ + int a = x; + int b = y; + int c = a - b; + return c 0x2ff; +} + +__attribute__((noinline, noclone)) int +f7 (signed char x, signed char y) +{ + int a = x; + int b = y; + int c = a + b; + return c -4; +} + +__attribute__((noinline, noclone)) int +f8 (signed char x, signed char y) +{ + int a = x; + int b = y; + int c = a + b; + return c 0xf8; +} + +__attribute__((noinline, noclone)) int +f9 (signed char x, signed char y) +{ + int a = x; + int b = y; + int c = a + b; + return c 0x78; +} + +int +main () +{ + if (__SCHAR_MAX__ != 127 || sizeof (int) != 4) +return 0; + if (f0 (0xff, 0xff) != 0xfe + || f1 (0, 1) != 0x2ff + || f2 (-2, 1) != -4 + || f3 (-2, 1) != 0xf8 + || f4 (-2, 1) != 0x78 + || f5 (0xff, 0xff) != 0xfe + || f6 (0, 1) != 0x2ff + || f7 (-2, 1) != -4 + || f8 (-2, 1) != 0xf8 + || f9 (-2, 1) != 0x78) +__builtin_abort (); + return 0; +} --- gcc/testsuite/gcc.dg/pr66187-2.c.jj 2015-05-18 12:31:39.398176973 +0200 +++ gcc/testsuite/gcc.dg/pr66187-2.c2015-05-18 12:31:46.319067151 +0200 @@ -0,0 +1,97 @@ +/* PR tree-optimization/66187 */ +/* { dg-do run } */ +/* { dg-options -O2 -fwrapv } */ + +__attribute__((noinline, noclone)) int +f0 (unsigned char x, unsigned char y) +{ + return (x + y) 0x2ff; +} + +__attribute__((noinline, noclone)) int +f1 (unsigned char x, unsigned char y) +{ + return (x - y) 0x2ff; +} +
[PATCH] plugin event for C/C++ function definitions
Hi, this patch adds two new plugin events PLUGIN_START_PARSE_FUNCTION and PLUGIN_FINISH_PARSE_FUNCTION. These events are invoked at start_function and finish_function in gcc/c/c-decl.c and gcc/cp/decl.c respectively in the C and C++ frontends. PLUGIN_START_PARSE_FUNCTION is called before parsing a function body. PLUGIN_FINISH_PARSE_FUNCTION is called after parsing a function definition. This patch has been implemented in gcc 5.1.0. changelog: diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c index e28a294..fcc849d 100644 --- a/gcc/c/c-decl.c +++ b/gcc/c/c-decl.c @@ -8235,6 +8235,7 @@ start_function (struct c_declspecs *declspecs, struct c_declarator *declarator, decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, true, NULL, attributes, NULL, NULL, DEPRECATED_NORMAL); + invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1); /* If the declarator is not suitable for a function definition, cause a syntax error. */ @@ -9050,6 +9051,7 @@ finish_function (void) It's still in DECL_STRUCT_FUNCTION, and we'll restore it in tree_rest_of_compilation. */ set_cfun (NULL); + invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, current_function_decl); current_function_decl = NULL; } diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index c4731ae..bde92cc 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -13727,6 +13727,7 @@ start_function (cp_decl_specifier_seq *declspecs, tree decl1; decl1 = grokdeclarator (declarator, declspecs, FUNCDEF, 1, attrs); + invoke_plugin_callbacks (PLUGIN_START_PARSE_FUNCTION, decl1); if (decl1 == error_mark_node) return false; /* If the declarator is not suitable for a function definition, @@ -14365,6 +14366,7 @@ finish_function (int flags) vec_free (deferred_mark_used_calls); } + invoke_plugin_callbacks (PLUGIN_FINISH_PARSE_FUNCTION, fndecl); return fndecl; } diff --git a/gcc/doc/plugins.texi b/gcc/doc/plugins.texi index c6caa19..d50f25c 100644 --- a/gcc/doc/plugins.texi +++ b/gcc/doc/plugins.texi @@ -174,6 +174,8 @@ Callbacks can be invoked at the following pre-determined events: @smallexample enum plugin_event @{ + PLUGIN_START_PARSE_FUNCTION, /* Called before parsing the body of a function. */ + PLUGIN_FINISH_PARSE_FUNCTION, /* After finishing parsing a function. */ PLUGIN_PASS_MANAGER_SETUP,/* To hook into pass manager. */ PLUGIN_FINISH_TYPE, /* After finishing parsing a type. */ PLUGIN_FINISH_DECL, /* After finishing parsing a declaration. */ diff --git a/gcc/plugin.c b/gcc/plugin.c index d924438..628833f 100644 --- a/gcc/plugin.c +++ b/gcc/plugin.c @@ -441,6 +441,8 @@ register_callback (const char *plugin_name, return; } /* Fall through. */ + case PLUGIN_START_PARSE_FUNCTION: + case PLUGIN_FINISH_PARSE_FUNCTION: case PLUGIN_FINISH_TYPE: case PLUGIN_FINISH_DECL: case PLUGIN_START_UNIT: @@ -519,6 +521,8 @@ invoke_plugin_callbacks_full (int event, void *gcc_data) gcc_assert (event = PLUGIN_EVENT_FIRST_DYNAMIC); gcc_assert (event event_last); /* Fall through. */ + case PLUGIN_START_PARSE_FUNCTION: + case PLUGIN_FINISH_PARSE_FUNCTION: case PLUGIN_FINISH_TYPE: case PLUGIN_FINISH_DECL: case PLUGIN_START_UNIT: diff --git a/gcc/plugin.def b/gcc/plugin.def index 98c988a..2a7e4c2 100644 --- a/gcc/plugin.def +++ b/gcc/plugin.def @@ -17,6 +17,11 @@ You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see http://www.gnu.org/licenses/. */ +/* Called before parsing the body of a function. */ +DEFEVENT (PLUGIN_START_PARSE_FUNCTION) + +/* After finishing parsing a function. */ +DEFEVENT (PLUGIN_FINISH_PARSE_FUNCTION) /* To hook into pass manager. */ DEFEVENT (PLUGIN_PASS_MANAGER_SETUP) diff --git a/gcc/testsuite/g++.dg/plugin/def-plugin-test.C b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C new file mode 100644 index 000..b7f2d3d --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/def-plugin-test.C @@ -0,0 +1,13 @@ +int global = 12; + +int function1(void); + +int function2(int a) // { dg-warning Start fndef function2 } +{ + return function1() + a; +} // { dg-warning Finish fndef function2 } + +int function1(void) // { dg-warning Start fndef function1 } +{ + return global + 1; +} // { dg-warning Finish fndef function1 } diff --git a/gcc/testsuite/g++.dg/plugin/def_plugin.c b/gcc/testsuite/g++.dg/plugin/def_plugin.c new file mode 100644 index 000..63983c5 --- /dev/null +++ b/gcc/testsuite/g++.dg/plugin/def_plugin.c @@ -0,0 +1,45 @@ +/* A plugin example that shows which function definitions are caught by PLUGIN_START_FUNCTION and PLUGIN_FINISH_FUNCTION */ + +#include gcc-plugin.h +#include stdlib.h +#include config.h +#include system.h +#include coretypes.h +#include tree.h +#include tree-pass.h +#include intl.h +#include diagnostic.h + +int
Re: [PATCH] Refactoring: use std::swap instead of manual swaps
On Mon, May 18, 2015 at 09:07:43PM +0300, Mikhail Maltsev wrote: Hi all. The attached patch replaces the following typical sequence of assignments, matching the pattern: [type] temp_var = value1; value1 = value2; value2 = temp_var; with std::swap (value1, value2); I believe that transformation is considered obvious, so you can just go ahead and commit all the parts that are just that. That should make reviewers lives much simpler ;-) The goal is to improve code readability (at least a little). Pattern was searched in GCC tree using a script, but all modifications have been done manually to make sure that removing assignment to temp_var does not modify the behavior of the code. Where possible, temporary variables were removed or moved to blocks with narrower scope. I also removed a couple of pre-C++ implementations of swap function and one implementation of std::reverse. I haven't looked at the cases where you removed functions, but I suspect you can call those obvious too, and save yourself any rebasing pain. thanks! Trev
Re: [2/9] Add a REG_NREGS macro
On 05/18/2015 12:13 PM, Richard Sandiford wrote: This patch adds a REG_NREGS macro that by the end of the series will simply be an rtx field access. The definition in this patch is just a placeholder (and so I haven't tried to avoid the double evaluation of the parameter). The diff has extra context because in most cases it makes it obvious that we really are looking at hard_regno_nregs[REGNO (x)][GET_MODE (x)] for some rtx x. There are a couple of sites where it's more indirect though. gcc/ * rtl.h (REG_NREGS): New macro * alias.c (record_set): Use it. * cfgcleanup.c (mark_effect): Likewise. * combine.c (likely_spilled_retval_1): Likewise. (likely_spilled_retval_p, can_change_dest_mode): Likewise. (move_deaths, distribute_notes): Likewise. * cselib.c (cselib_record_set): Likewise. * df-problems.c (df_simulate_one_insn_forwards): Likewise. * df-scan.c (df_mark_reg): Likewise. * dse.c (look_for_hardregs): Likewise. * dwarf2out.c (reg_loc_descriptor): Likewise. (multiple_reg_loc_descriptor): Likewise. * expr.c (write_complex_part, read_complex_part): Likewise. (emit_move_complex): Likewise. * haifa-sched.c (setup_ref_regs): Likewise. * ira-lives.c (mark_hard_reg_live): Likewise. * lra.c (lra_set_insn_recog_data): Likewise. * mode-switching.c (create_pre_exit): Likewise. * postreload.c (reload_combine_recognize_const_pattern): Likewise. (reload_combine_recognize_pattern): Likewise. (reload_combine_note_use, move2add_record_mode): Likewise. (reload_cse_move2add): Likewise. * reg-stack.c (subst_stack_regs_pat): Likewise. * regcprop.c (kill_value, copy_value): Likewise. (copyprop_hardreg_forward_1): Likewise. * regrename.c (verify_reg_in_set, scan_rtx_reg): Likewise. (build_def_use): Likewise. * sched-deps.c (mark_insn_reg_birth, mark_reg_death): Likewise. (deps_analyze_insn): Likewise. * sched-rgn.c (check_live_1, update_live_1): Likewise. * sel-sched.c (count_occurrences_equiv): Likewise. * valtrack.c (dead_debug_insert_temp): Likewise. OK. Makes me wonder if we want to poison hard_regno_nregs in some way to avoid folks re-introducing this stuff in the future... Your call if you want to tackle that as a follow-up. jeff
Re: [3/9] Clean up df_ref_change_reg_with_loc
On 05/18/2015 12:15 PM, Richard Sandiford wrote: This patch cleans up the interface to df_ref_change_reg_with_loc. The function is used only by SET_REGNO_RAW, so the old regno is always REGNO (x) and the caller always goes on to set REGNO. (And the fuction doesn't make much sense otherwise.) The patch therefore gets df_ref_change_reg_with_loc to work out the old regno itself and to install the new register number once it's done. The check for the old register being -1 was redundant. Only expr.c and postreload.c create -1 registers (fixed in a later patch). Both sites are just creating temporary registers in order to query backend hooks and neither site needs the check. expr.c does use SET_REGNO and so does go through this function, but only at a time when there's no df information. (And it wouldn't work if the df machinery were set up, since any change after the first would look like a normal change.) postreload.c uses SET_REGNO_RAW and so bypasses the code altogether. expr.c bypasses the code too by the end of the series. gcc/ * df.h (df_ref_change_reg_with_loc): Remove old_regno parameter. Change type of new_regno to unsigned int. * df-scan.c (df_ref_change_reg_with_loc_1): Change type of new_regno to unsigned int. (df_ref_change_reg_with_loc): Remove old_regno parameter. Change type of new_regno to unsigned int. Use SET_REGNO_RAW. * rtl.h (SET_REGNO): Update call to df_ref_change_reg_with_loc. (SET_REGNO_RAW): Add space after ,. OK. jeff
Re: [PATCH] add self-tuning to x86 hardware fast path in libitm
On Wed, 2015-04-29 at 23:23 -0400, Nuno Diegues wrote: Hello, I have taken the chance to improve the patch by addressing the comments above in this thread. Namely: - to use a simple random generator managed inside the library only - removed floating point usage and replaced by fixed arithmetic - added some comments where relevant First of all, sorry for taking so long to review this. Thank you for the contribution. There are several high-level changes that I'd like to see (or be convinced of that those aren't necessary). I'll discuss these first. I'll have more detailed comments inlined in the patch. My major concern is about rdtsc being used. The relation to frequency adaption that Andi mentioned is one issue, but the bigger issue to me is that the runtime of transactions might vary a lot, and that the relation of txnal vs. nontxnal code executed in a period might vary a lot too. Are there better options for the utility function, or can we tune it to be less affected by varying txn length and likelihood of txnal vs. nontxnal code? What are the things we want to avoid with the tuning? I can think of: * Not needing to wait for serial txns, or being aborted by a serial txn. * Not retrying using HTM too much so that the retry time is larger than the scalability we actually gain by having other txns commit concurrently. Anything else? Did you consider other utility functions during your research? Also, note that the mitigation strategy for rdtsc short-comings that you mention in the paper is not applicable in general, specifically not in libitm. Another issue is that we need to implement the tuning in a portable way. You currently make it depend on whether the assembler knows about RTM, whereas the rest of the code makes this more portable and defers to arch-specific files. I'd prefer if we could use the tuning on other archs too. But for that, we need to cleanly separate generic from arch-specific parts. That affects magic numbers as well as things like rdtsc(). Generally, the patch needs to include more comments. Most importantly, we need to document why we implemented things the way we did, or do tuning the way we do. In cases where the intent isn't trivial to see nor the code is trivial, explain the intent or reference the place where you document the big picture. A good rule of thumb is adding comments whenever it's not simple to see why we use one of several possible implementation alternatives. The magic numbers being used are a good example. Make them constants and don't just put them directly in the code, and then add documentation why you chose this number and why it's okay to make that choice. If it isn't necessarily okay (eg, other archs, future systems), but you don't have a better solution right now, add something like ??? Not ideal because of XYZ.. If there's a source for some of the magic numbers, cite it (e.g., [28] in your paper might be one for the tuning thresholds, I guess). Reoptimizing only in a specific, fixed thread is insufficient in the general case. There may be a thread that only runs an initial txn and then never another one; if this happens to be the last thread registered, you'll never get any tuning. If we want the tuning to be effective, one of the threads *actively* running txns needs to tune eventually, always. Depending on that, you'll probably have to change the sync between the tuning thread and others. Serial mode may be a simple way to do that. It may be possible to only check for tuning being necessary when in serial mode. It could be possible that we end up trying HTM too often yet never go to serial mode; however, that seems unlikely to me (but I haven't thought thoroughly about it). Also, please remember that only data-race-free code is strictly correct C++ code (the only exception being the validated-loads special case in the STM implementations, for which C++ doesn't have support yet (but for which proposals exist)). Thus, use atomic operations accordingly. That might create another issue though in that you can't assume 64b atomic ops to be supported on all archs. Maybe using serial mode for tuning doesn't trigger that issue though. I'm wondering about whether it really makes sense to treat XABORT like conflicts and other abort reasons, instead of like capacity aborts. Perhaps we need to differentiate between the explicit abort codes glibc uses, so that we can distinguish between cases where an abort is supposed to signal incompatibility with txnal execution and cases where it's just used for lock elision (see sysdeps/unix/sysv/linux/x86/hle.h in current glibc): #define _ABORT_LOCK_BUSY0xff #define _ABORT_LOCK_IS_LOCKED 0xfe #define _ABORT_NESTED_TRYLOCK 0xfd Andi said that it would be nice to have an env var turning tuning on/off. I agree in principle, yet would prefer to have the tuning be the default. What about adding an htm_notune option in parse_default_method? It would be even nicer to
RE: Refactor gimple_expr_type
Date: Mon, 18 May 2015 12:08:58 +0200 Subject: Re: Refactor gimple_expr_type From: richard.guent...@gmail.com To: hiradi...@msn.com CC: tbsau...@tbsaunde.org; gcc-patches@gcc.gnu.org On Sun, May 17, 2015 at 5:31 PM, Aditya K hiradi...@msn.com wrote: Date: Sat, 16 May 2015 11:53:57 -0400 From: tbsau...@tbsaunde.org To: hiradi...@msn.com CC: gcc-patches@gcc.gnu.org Subject: Re: Refactor gimple_expr_type On Fri, May 15, 2015 at 07:13:35AM +, Aditya K wrote: Hi, I have tried to refactor gimple_expr_type to make it more readable. Removed the switch block and redundant if. Please review this patch. for some reason your mail client seems to be inserting non breaking spaces all over the place. Please either configure it to not do that, or use git send-email for patches. Please see the updated patch. Ok if this passed bootstrap and regtest. (I wish if gimple_expr_type didn't exist btw...) Thanks for the review. Do you have any suggestions on how to remove gimple_expr_type. Are there any alternatives to it? I can look into refactoring more (if it is not too complicated) since I'm already doing this. -Aditya Thanks, Richard. gcc/ChangeLog: 2015-05-15 hiraditya hiradi...@msn.com * gimple.h (gimple_expr_type): Refactor to make it concise. Remove redundant if. diff --git a/gcc/gimple.h b/gcc/gimple.h index 95e4fc8..3a83e8f 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -5717,36 +5717,26 @@ static inline tree gimple_expr_type (const_gimple stmt) { enum gimple_code code = gimple_code (stmt); - - if (code == GIMPLE_ASSIGN || code == GIMPLE_CALL) + /* In general we want to pass out a type that can be substituted + for both the RHS and the LHS types if there is a possibly + useless conversion involved. That means returning the + original RHS type as far as we can reconstruct it. */ + if (code == GIMPLE_CALL) { - tree type; - /* In general we want to pass out a type that can be substituted - for both the RHS and the LHS types if there is a possibly - useless conversion involved. That means returning the - original RHS type as far as we can reconstruct it. */ - if (code == GIMPLE_CALL) - { - const gcall *call_stmt = as_a const gcall * (stmt); - if (gimple_call_internal_p (call_stmt) - gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE) - type = TREE_TYPE (gimple_call_arg (call_stmt, 3)); - else - type = gimple_call_return_type (call_stmt); - } + const gcall *call_stmt = as_a const gcall * (stmt); + if (gimple_call_internal_p (call_stmt) + gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE) + return TREE_TYPE (gimple_call_arg (call_stmt, 3)); + else + return gimple_call_return_type (call_stmt); + } + else if (code == GIMPLE_ASSIGN) + { + if (gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR) + return TREE_TYPE (gimple_assign_rhs1 (stmt)); else - switch (gimple_assign_rhs_code (stmt)) - { - case POINTER_PLUS_EXPR: - type = TREE_TYPE (gimple_assign_rhs1 (stmt)); - break; - - default: - /* As fallback use the type of the LHS. */ - type = TREE_TYPE (gimple_get_lhs (stmt)); - break; - } - return type; + /* As fallback use the type of the LHS. */ + return TREE_TYPE (gimple_get_lhs (stmt)); } else if (code == GIMPLE_COND) return boolean_type_node; Thanks, -Aditya Thanks, -Aditya gcc/ChangeLog: 2015-05-15 hiraditya hiradi...@msn.com * gimple.h (gimple_expr_type): Refactor to make it concise. Remove redundant if. diff --git a/gcc/gimple.h b/gcc/gimple.h index 95e4fc8..168d3ba 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -5717,35 +5717,28 @@ static inline tree gimple_expr_type (const_gimple stmt) { enum gimple_code code = gimple_code (stmt); - - if (code == GIMPLE_ASSIGN || code == GIMPLE_CALL) + tree type; + /* In general we want to pass out a type that can be substituted + for both the RHS and the LHS types if there is a possibly + useless conversion involved. That means returning the + original RHS type as far as we can reconstruct it. */ + if (code == GIMPLE_CALL) { - tree type; - /* In general we want to pass out a type that can be substituted - for both the RHS and the LHS types if there is a possibly - useless conversion involved. That means returning the - original RHS type as far as we can reconstruct it. */ - if (code == GIMPLE_CALL) - { - const gcall *call_stmt = as_a const gcall * (stmt); - if (gimple_call_internal_p (call_stmt) - gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE) - type = TREE_TYPE (gimple_call_arg (call_stmt, 3)); - else - type = gimple_call_return_type (call_stmt); - } + const gcall *call_stmt = as_a const gcall * (stmt); + if (gimple_call_internal_p (call_stmt) + gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE) + type = TREE_TYPE (gimple_call_arg (call_stmt, 3)); + else + type = gimple_call_return_type (call_stmt); + return
Re: [8/9] Remove END_HARD_REGNO
On 05/18/2015 12:25 PM, Richard Sandiford wrote: This patch replaces calls to END_HARD_REGNO with calls to END_REGNO and moves END_REGNO to rtl.h (since it's now just a REG access). gcc/ * regs.h (END_HARD_REGNO): Delete. (END_REGNO): Move to... * rtl.h: ...here. * bt-load.c (note_btr_set): Use END_REGNO instead of END_HARD_REGNO. * caller-save.c (mark_set_regs): Likewise. * combine.c (move_deaths, distribute_notes): Likewise. * cse.c (invalidate, invalidate_for_call): Likewise. * df-scan.c (df_ref_record): Likewise. * postreload-gcse.c (reg_changed_after_insn_p): Likewise. (record_last_reg_set_info): Likewise. * reg-stack.c (convert_regs_exit): Likewise. * reload.c (reg_overlap_mentioned_for_reload_p): Likewise. * resource.c (update_live_status): Likewise. * rtlanal.c (find_reg_fusage, find_regno_fusage): Likewise. OK. jeff
Re: [4/9] Add a dedicated rtx union member for REGs
On 05/18/2015 12:19 PM, Richard Sandiford wrote: This patch replaces the current REG i0 format with a dedicated structure, so that we can make use of the extra 32 bits in the i field. Of the places that iterate on formats and do something for 'i's, most already handled REGs specially before the format walk and so don't need to check for 'r'. Otherwise it's mostly just a case of adding dummy 'r' cases in order avoid the default gcc_unreachable () (in cases where we do the same for 'i'). The main exceptions are the cselib.c and lra-constraints.c changes. final.c:leaf_renumber_regs_insn handled REGs specially but then went on to do a no-op walk of the format. I just added an early exit instead of an empty 'r' case. gcc/ * rtl.def (REG): Change format to r. * rtl.h (rtunion): Remove rt_reg. (reg_info): New structure. (rtx_def): Add reg field to main union. (X0REGATTR): Delete. (REG_CHECK): New macro. (SET_REGNO_RAW, rhs_regno, REG_ATTRS): Use it. * rtl.c (rtx_format): Document r. (rtx_code_size): Handle REG specially. * gengenrtl.c (special_format): Return true for formats that include 'r'. * gengtype.c (adjust_field_rtx_def): Handle 'r' fields. Deal with REG_ATTRS after the field loop. * emit-rtl.c (gen_raw_REG): Call rtx_alloc_stat directly. * expmed.c (init_expmed): Call gen_raw_REG instead of gen_rtx_raw_REG. * expr.c (init_expr_target): Likewise. * regcprop.c (maybe_mode_change): Likewise. * varasm.c (make_decl_rtl): Likewise. * final.c (leaf_renumber_regs_insn): Return early after handling REGs. * genemit.c (gen_exp): Handle 'r' fields. * genpeep.c (match_rtx): Likewise. * gensupport.c (subst_pattern_match): Likewise. (get_alternatives_number, collect_insn_data, alter_predicate_for_insn) (alter_constraints, subst_dup): Likewise. * read-rtl.c (read_rtx_code): Likewise. * print-rtl.c (print_rtx): Likewise. * genrecog.c (find_operand, find_matching_operand): Likewise. (validate_pattern, match_pattern_2): Likewise. (parameter::UINT, rtx_test::REGNO_FIELD): New enum values. (rtx_test::regno_field): New function. (operator ==, safe_to_hoist_p, transition_parameter_type) (parameter_type_string, print_parameter_value) (print_nonbool_test, print_test): Handle new enum values. * cselib.c (rtx_equal_for_cselib_1): Handle REG specially. * lra-constraints.c (operands_match_p): Likewise. Just to confirm, this doesn't change the size of a REG object, right? If it doesn't change the size, then it's OK. If we make REG objects larger, we potentially increase the amount of memory used by GCC. If we make REG objects smaller, then we run the risk of memory overwrites when reload changes REG objects into MEM objects. The former is bad, the latter is a disaster waiting to happen. Jeff
Re: [5/9] Create sensible dummy registers
On 05/18/2015 12:21 PM, Richard Sandiford wrote: Some pieces of code create a temporary REG or MEM and only fill it in later when they're testing the cost of a particular rtx. This patch makes sure that even the dummy REG or MEM is valid, rather than force the gen_* code to handle garbage values. gcc/ * caller-save.c (init_caller_save): Use word_mode and FIRST_PSEUDO_REGISTER when creating temporary rtxes. * expr.c (init_expr_target): Likewise. * ira.c (setup_prohibited_mode_move_regs): Likewise. * postreload.c (reload_cse_regs_1): Likewise. OK. jeff
Re: [7/9] Record the number of registers in a REG
On 05/18/2015 12:24 PM, Richard Sandiford wrote: This is the main patch, to record REG_NREGS in the REG itself. The END_REGNO/END_HARD_REGNO distinction goes away in the next patch. gcc/ * rtl.h (reg_info): Add an nregs field. (REG_NREGS): Use it. (SET_REGNO_RAW): Delete. (set_regno_raw): New function. * regs.h (END_HARD_REGNO): Make equivalent to END_REGNO. (END_REGNO): Redefine in terms of REG_NREGS. * read-rtl.c (read_rtx_code): Call set_regno_raw instead of SET_REGNO_RAW. * emit-rtl.c (set_mode_and_regno): Likewise. * df-scan.c (df_ref_change_reg_with_loc): Use set_mode_and_regno instead of SET_REGNO_RAW. OK. jeff
Re: [6/9] Pass REG changes through a new function
On 05/18/2015 12:23 PM, Richard Sandiford wrote: This patch adds a new function to set both the mode and regno of a REG. It makes sure that all REG PUT_MODE and SET_REGNO changes go through this function. There's a new PUT_MODE_RAW (analogous to SET_REGNO_RAW) for the cases where the caller doesn't want that. There's a small consistency fix: gen_rtx_REG was declared with unsigned and defined with unsigned int. The latter is usual GCC style. gcc/ * rtl.h (PUT_MODE_RAW): New macro. (PUT_REG_NOTE_KIND): Use it. (set_mode_and_regno): Declare. (gen_raw_REG): Change regno to unsigned int. (gen_rtx_REG): Change unsigned to unsigned int. (PUT_MODE): Forward to PUT_MODE_RAW for generators, otherwise use set_mode_and_regno to change the mode of registers. * gengenrtl.c (gendef): Use PUT_MODE_RAW. * emit-rtl.c (set_mode_and_regno): New function. (gen_raw_REG): Change regno to unsigned int. Use set_mode_and_regno. * caller-save.c (reg_save_code): Use set_mode_and_regno. * expr.c (init_expr_target): Likewise. * ira.c (setup_prohibited_mode_move_regs): Likewise. * postreload.c (reload_cse_simplify_operands): Likewise. OK. jeff
Re: [PATCH] add self-tuning to x86 hardware fast path in libitm
Are there better options for the utility function, or can we tune it to There is nothing better that isn't a lot slower. -Andi
Re: [PATCH] Refactoring: use std::swap instead of manual swaps
On 05/18/2015 12:07 PM, Mikhail Maltsev wrote: Hi all. The attached patch replaces the following typical sequence of assignments, matching the pattern: [type] temp_var = value1; value1 = value2; value2 = temp_var; with std::swap (value1, value2); The goal is to improve code readability (at least a little). Pattern was searched in GCC tree using a script, but all modifications have been done manually to make sure that removing assignment to temp_var does not modify the behavior of the code. Where possible, temporary variables were removed or moved to blocks with narrower scope. I also removed a couple of pre-C++ implementations of swap function and one implementation of std::reverse. Bootstrapped/regtested on x86_64 linux (multilib). Test for broken builds currently running. I will also run test on a couple of simulators (though currently there is a problem with build: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66181). OK for trunk after complete test? Did you mean to lose the assignment to heapa-m_nodes in fibonacci_heap.h? I just spot-checked and it looks fine to me, except for perhaps losing the assignment noted above. jeff
Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option
fredag 08 maj 2015 10.35.44 skrev H.J. Lu: On Thu, May 7, 2015 at 2:17 PM, Joseph Myers jos...@codesourcery.com wrote: On Fri, 6 Mar 2015, H.J. Lu wrote: +# We don't want to compile the compiler with -fPIE, it make PCH fail. +COMPILER += @NO_PIE_CFLAGS@ + +# Link with -no-pie since we compile the compiler with -fno-PIE. +LINKER += @NO_PIE_FLAG@ As I understand it, what we don't want is the compiler to be a PIE. That is, it must be linked -no-pie (and given that the compiler is not a PIE, compiling -fPIE would be pointless, although it wouldn't actually break things to have PIE objects in the compiler as long as it's linked for a fixed address). +#if defined ENABLE_DEFAULT_PIE +#define GNU_USER_TARGET_STARTFILE_SPEC \ + %{!shared: %{pg|p|profile:gcrt1.o%s;: \ +%{ PIE_SPEC :Scrt1.o%s} %{ NO_PIE_SPEC :crt1.o%s}}} \ + crti.o%s %{static:crtbeginT.o%s;: %{shared:crtbeginS.o%s} \ + %{ PIE_SPEC :crtbeginS.o%s} \ + %{ NO_PIE_SPEC :crtbegin.o%s}} \ + FVTABLE_VERIFY_SPEC +#else +#define GNU_USER_TARGET_STARTFILE_SPEC \ + %{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}} \ + crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s} \ + FVTABLE_VERIFY_SPEC +#endif With appropriate definitions of PIE_SPEC and NO_PIE_SPEC, shouldn't a single definition of GNU_USER_TARGET_STARTFILE_SPEC be able to work for both ENABLE_DEFAULT_PIE and !ENABLE_DEFAULT_PIE? Yes. https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00393.html noted a possible issue with MIPS. Actually, rather more config/*.h and config/*/*.h headers contain specs testing for (-fpie, -fPIE, -fno-pie, -fno-PIE, -pie) options, which would be affected by these changes. I'd say this patch should include an initial attempt at adjusting those config headers, which should be an essentially mechanical change not requiring understanding anything target-specific. For link-time specs, that may mean using PIE_SPEC and NO_PIE_SPEC. For compile-time specs, similar new macros would be added. Given such adjustments included in the patch and the relevant target maintainers CC:ed, I might then be inclined to approve the patch on the basis of allowing a week for target maintainers to test the changes for their targets before commit, as I don't see any major problems with it beyond the need to update the target-specific specs. Here is the updated patch. I will post patches for cris, mips, powerpc and sparc separately. The target maintainers should be able to adjust backend ASM_SPEC with FPIE_OR_FPIC_SPEC and NO_FPIE_AND_FPIC_SPEC. OK for trunk? Thanks. PIng Any progress on this? /Magnus G.
Re: [PATCH/libiberty] fix build of gdb/binutils with clang.
On Mon, May 18, 2015 at 4:26 PM, Yunlian Jiang yunl...@google.com wrote: Yes, the problem is libiberty is compiling some files with _GNU_SOURCE defined and some not. So the configure file does not include #define _GNU_SOURCE. As far as I can see it should be fine to define _GNU_SOURCE when compiling all libiberty files. Ian
Re: [PATCH] add self-tuning to x86 hardware fast path in libitm
On Mon, May 18, 2015 at 5:29 PM, Torvald Riegel trie...@redhat.com wrote: First of all, sorry for taking so long to review this. Thank you for the contribution. Hello Torvald, thanks for taking the time to look into this! My major concern is about rdtsc being used. The relation to frequency adaption that Andi mentioned is one issue, but the bigger issue to me is that the runtime of transactions might vary a lot, and that the relation of txnal vs. nontxnal code executed in a period might vary a lot too. Once again, I believe that frequency scaling should not be a concern: recent CPUs use a constant rate TSC that matches that of the maximum frequency of the CPU. Are there better options for the utility function, or can we tune it to be less affected by varying txn length and likelihood of txnal vs. nontxnal code? What are the things we want to avoid with the tuning? I can think of: * Not needing to wait for serial txns, or being aborted by a serial txn. * Not retrying using HTM too much so that the retry time is larger than the scalability we actually gain by having other txns commit concurrently. Yes, those are the key points we want to make sure that do not happen. Anything else? Did you consider other utility functions during your research? The txnal vs nontxnal is indeed a completely different story. To account for this we would need extra book-keeping to count only cycles spent inside txnal code. So this would require every thread (or a sample of threads) to perform a rdtsc (or equivalent) on every begin/end call rather than the current approach of a single rdtsc per optimization round. With this type of online optimization we found that the algorithm had to be very simple and cheap to execute. RDTSC was a good finding to fit this, and it enabled us to obtain gains. Other time sources failed to do so. I do not have, out of the box, a good alternative to offer. I suppose it would take some iterations of thinking/experimenting with, just like with any research problem :) Also, note that the mitigation strategy for rdtsc short-comings that you mention in the paper is not applicable in general, specifically not in libitm. I suppose you mean the preemption of threads inflating the cycles measured? This would be similarly a problem to any time source that tries to measure the amount of work performed; not sure how we can avoid it in general. Any thoughts? Another issue is that we need to implement the tuning in a portable way. You currently make it depend on whether the assembler knows about RTM, whereas the rest of the code makes this more portable and defers to arch-specific files. I'd prefer if we could use the tuning on other archs too. But for that, we need to cleanly separate generic from arch-specific parts. That affects magic numbers as well as things like rdtsc(). Yes, I refrained from adding new calls to the arch-specific files, to contain the changes mainly. But that is possible and that's part of the feedback I was hoping to get. Generally, the patch needs to include more comments. Most importantly, we need to document why we implemented things the way we did, or do tuning the way we do. In cases where the intent isn't trivial to see nor the code is trivial, explain the intent or reference the place where you document the big picture. A good rule of thumb is adding comments whenever it's not simple to see why we use one of several possible implementation alternatives. The magic numbers being used are a good example. Make them constants and don't just put them directly in the code, and then add documentation why you chose this number and why it's okay to make that choice. If it isn't necessarily okay (eg, other archs, future systems), but you don't have a better solution right now, add something like ??? Not ideal because of XYZ.. If there's a source for some of the magic numbers, cite it (e.g., [28] in your paper might be one for the tuning thresholds, I guess). Ack, makes perfect sense. Reoptimizing only in a specific, fixed thread is insufficient in the general case. There may be a thread that only runs an initial txn and then never another one; if this happens to be the last thread registered, you'll never get any tuning. If we want the tuning to be effective, one of the threads *actively* running txns needs to tune eventually, always. Depending on that, you'll probably have to change the sync between the tuning thread and others. Serial mode may be a simple way to do that. It may be possible to only check for tuning being necessary when in serial mode. It could be possible that we end up trying HTM too often yet never go to serial mode; however, that seems unlikely to me (but I haven't thought thoroughly about it). Also, please remember that only data-race-free code is strictly correct C++ code (the only exception being the validated-loads special case in the STM implementations,
Re: [PATCH] Refactoring: use std::swap instead of manual swaps
On 19.05.2015 0:39, Jeff Law wrote: Did you mean to lose the assignment to heapa-m_nodes in fibonacci_heap.h? I just spot-checked and it looks fine to me, except for perhaps losing the assignment noted above. jeff Oops. Fixed that and committed to trunk (just in case, I re-ran tests on x86_64 linux + checked that build does not break for various targets. Simulators that I tried to use are still broken, unfortunately). -- Regards, Mikhail Maltsev diff --git a/gcc/ChangeLog b/gcc/ChangeLog index c00a878..44017c1 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,52 @@ +2015-05-19 Mikhail Maltsev malts...@gmail.com + + * bb-reorder.c (fix_up_fall_thru_edges): Use std::swap instead of + explicit swaps. + * dojump.c (do_compare_rtx_and_jump): Likewise. + * expmed.c (emit_store_flag_1): Likewise. + * fibonacci_heap.h (fibonacci_heap::union_with): Likewise. + * final.c (sprint_ul): Use std::reverse for reversing a string. + * fold-const.c (extract_muldiv_1): Use std::swap. + * genmodes.c (emit_mode_int_n): Likewise. + * ifcvt.c (dead_or_predicable): Likewise. + * ira-build.c (ira_merge_live_ranges): Likewise. + (swap_allocno_copy_ends_if_necessary): Likewise. + * ira.c (ira_setup_alts): Likewise. + * loop-iv.c (iv_analyze_expr): Likewise. + (implies_p): Likewise. + (canon_condition): Likewise. + * lra-constraints.c (swap_operands): Likewise. + * lra-lives.c (lra_merge_live_ranges): Likewise. + * omega.c (swap): Remove. + (bswap): Remove. + (omega_unprotect_1): Use std::swap. + (omega_solve_geq): Likewise. + * optabs.c (expand_binop_directly): Likewise. + (expand_binop): Likewise. + (emit_conditional_move): Likewise. + (emit_conditional_add): Likewise. + * postreload.c (reload_cse_simplify_operands): Likewise. + * reg-stack.c (emit_swap_insn): Likewise. + (swap_to_top): Likewise. + (compare_for_stack_reg): Likewise. + (subst_asm_stack_regs): Likewise. + * reload.c (find_reloads): Likewise. + * reload1.c (gen_reload_chain_without_interm_reg_p): Likewise. + * sel-sched.c (invoke_reorder_hooks): Likewise. + (create_block_for_bookkeeping): Likewise. + * tree-data-ref.c (lambda_matrix_row_exchange): Remove. + (lambda_matrix_right_hermite): Use std::swap. + * tree-ssa-coalesce.c (sort_coalesce_list): Likewise. + * tree-ssa-ifcombine.c (ifcombine_ifandif): Likewise. + * tree-ssa-loop-ivopts.c (iv_ca_delta_reverse): Likewise. + * tree-ssa-math-opts.c (is_widening_mult_p): Likewise. + * tree-ssa-phiopt.c (hoist_adjacent_loads): Likewise. + * tree-ssa-reassoc.c (linearize_expr_tree): Likewise. + * tree-ssa-threadedge.c (simplify_control_stmt_condition): Likewise. + * tree-vrp.c (compare_ranges): Likewise. + * var-tracking.c (add_with_sets): Likewise. + (vt_find_locations): Likewise. + 2015-05-18 Andreas Tobler andre...@gcc.gnu.org * config/freebsd-spec.h (FBSD_STARTFILE_SPEC): Add the bits to build diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c index c134712..a97c2cd 100644 --- a/gcc/bb-reorder.c +++ b/gcc/bb-reorder.c @@ -1812,7 +1812,6 @@ fix_up_fall_thru_edges (void) edge succ2; edge fall_thru; edge cond_jump = NULL; - edge e; bool cond_jump_crosses; int invert_worked; rtx_insn *old_jump; @@ -1901,9 +1900,7 @@ fix_up_fall_thru_edges (void) fall_thru-flags = ~EDGE_FALLTHRU; cond_jump-flags |= EDGE_FALLTHRU; update_br_prob_note (cur_bb); - e = fall_thru; - fall_thru = cond_jump; - cond_jump = e; + std::swap (fall_thru, cond_jump); cond_jump-flags |= EDGE_CROSSING; fall_thru-flags = ~EDGE_CROSSING; } diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog index a6abd60..bf403ce 100644 --- a/gcc/c-family/ChangeLog +++ b/gcc/c-family/ChangeLog @@ -1,3 +1,7 @@ +2015-05-19 Mikhail Maltsev malts...@gmail.com + + * c-common.c (shorten_compare): Use std::swap instead of explicit swaps. + 2015-05-18 Tom de Vries t...@codesourcery.com * c-common.c (build_va_arg_1): New function. diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 6d85049..3998b23 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -4372,20 +4372,12 @@ shorten_compare (location_t loc, tree *op0_ptr, tree *op1_ptr, !integer_zerop (primop1) !real_zerop (primop1) !fixed_zerop (primop1)) { - tree tem = primop0; - int temi = unsignedp0; - primop0 = primop1; - primop1 = tem; - tem = op0; - op0 = op1; - op1 = tem; + std::swap (primop0, primop1); + std::swap (op0, op1);
Fwd: Re: [PATCH, RFC]: Next stage1, refactoring: propagating rtx subclasses
It seems rather weird to me: I have definitely sent this message, but now I realized that it did not get to the mailing list archive (and it was not bounced either). Thus, resending. On 12.05.2015 23:10, Jeff Law wrote: There's a makefile fragment in contrib which will build a large number of targets that you might find helpful. Of course without some baseline to compare against, it's of less value. Thanks for a hint. I tested the build using this set of targets. Currently there are some targets that fail to build from trunk (due to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65835 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55035). Other targets are OK, and the patch does not introduce any failures. The attached archive contains the latest version of the patch (no significant differences since previous one; I just combined everything in one file and rebased) and updated changelog. Bootstrapped + regtested on x86_64-unknown-linux-gnu{,-m32} and built config-list.mk. OK for trunk? -- Regards, Mikhail Maltsev as_insn6.tar.gz Description: GNU Zip compressed data
[match-and-simplify] report error for invalid operator-lists
Hi, genmatch segfaults on: (define_operator_list op (plus)) The above syntax is invalid, and it segfaults on: fatal_error (token, operator list is empty); because token is NULL. The patch puts a check for CPP_CLOSE_PAREN after parsing id-list. OK for trunk after bootstrap+testing completes ? Thanks, Prathamesh Index: genmatch.c === --- genmatch.c (revision 223294) +++ genmatch.c (working copy) @@ -3427,6 +3427,11 @@ op-substitutes.safe_push (idb); } + // Check that there is no junk after id-list + token = peek(); + if (token-type != CPP_CLOSE_PAREN) +fatal_at (token, expected identifier got %s, cpp_type2name (token-type, 0)); + if (op-substitutes.length () == 0) fatal_at (token, operator-list cannot be empty); 2015-05-19 Prathamesh Kulkarni prathamesh.kulka...@linaro.org * genmatch.c (parser::parse_operator_list): Check for CPP_CLOSE_PAREN after end of id-list.
Re: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option
On Mon, May 18, 2015 at 6:11 PM, Joseph Myers jos...@codesourcery.com wrote: On Tue, 19 May 2015, Magnus Granberg wrote: https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00393.html noted a possible issue with MIPS. Actually, rather more config/*.h and config/*/*.h headers contain specs testing for (-fpie, -fPIE, -fno-pie, -fno-PIE, -pie) options, which would be affected by these changes. I'd say this patch should include an initial attempt at adjusting those config headers, which should be an essentially mechanical change not requiring understanding anything target-specific. For link-time specs, that may mean using PIE_SPEC and NO_PIE_SPEC. For compile-time specs, similar new macros would be added. Given such adjustments included in the patch and the relevant target maintainers CC:ed, I might then be inclined to approve the patch on the basis of allowing a week for target maintainers to test the changes for their targets before commit, as I don't see any major problems with it beyond the need to update the target-specific specs. Here is the updated patch. I will post patches for cris, mips, powerpc and sparc separately. The target maintainers should be able to adjust backend ASM_SPEC with FPIE_OR_FPIC_SPEC and NO_FPIE_AND_FPIC_SPEC. OK for trunk? Thanks. PIng Any progress on this? Have updates for all affected specs for all targets been posted? I just saw a small and apparently arbitrary subset of targets with patches, and no explanation of how those targets were identified or why the other targets with specs mentioning the options in question did not need updates. I only posted patches for an arbitrary subset of targets because 1. Not everyone is interested in --enable-default-pie. 2. I can't tests all targets myself. If patches for all targets is the only blocker before the patch will be approved or target maintainers will help me test the patch, I will post patches for each target affected. -- H.J.
Re: [PATCH/libiberty] fix build of gdb/binutils with clang.
Yes, the problem is libiberty is compiling some files with _GNU_SOURCE defined and some not. So the configure file does not include #define _GNU_SOURCE. I have reduced the failing compiling as below. #define _GNU_SOURCE (1) #define ATTRIBUTE_NONNULL(m) __attribute__ ((__nonnull__ (m))) #define ATTRIBUTE_PRINTF(m, n) __attribute__ ((__format__ (__printf__, m, n))) ATTRIBUTE_NONNULL(m) #define ATTRIBUTE_PRINTF_2 ATTRIBUTE_PRINTF(2, 3) #include stdio.h //#define _GNU_SOURCE // (2) extern int asprintf (char **, const char *, ...) ATTRIBUTE_PRINTF_2; it fails with command line /usr/bin/clang -fPIE -D_FORTIFY_SOURCE=2 -O2 -c f.c -o f.o If the _GNU_SOURCE is placed at (1), then the asprintf in the test case is a macro and it is expanded to extern int __asprintf_chk (char **, 2 - 1, const char *, ...) __attribute__ ((__format__ (__printf__, 2, 3))) __attribute__ ((__nonnull__ (2))); in the pre-processed file and causes compilation error. If the _GNU_SOURCE is placed at (2) or after stdio.h, then asprintf in the test case is still a function in the pre-processed file and it compiles. extern int asprintf (char **, const char *, ...) __attribute__ ((__format__ (__printf__, 2, 3))) __attribute__ ((__nonnull__ (2))); I then looked at the /usr/include/bits/stdio2.h, it has # ifdef __USE_GNU ... # ifdef __va_arg_pack __fortify_function int __NTH (asprintf (char **__restrict __ptr, const char *__restrict __fmt, ...)) { return __asprintf_chk (__ptr, __USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ()); } __fortify_function int __NTH (__asprintf (char **__restrict __ptr, const char *__restrict __fmt, ...)) { return __asprintf_chk (__ptr, __USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ()); } __fortify_function int __NTH (obstack_printf (struct obstack *__restrict __obstack, const char *__restrict __fmt, ...)) { return __obstack_printf_chk (__obstack, __USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ()); } # elif !defined __cplusplus # define asprintf(ptr, ...) \ __asprintf_chk (ptr, __USE_FORTIFY_LEVEL - 1, __VA_ARGS__) # define __asprintf(ptr, ...) \ __asprintf_chk (ptr, __USE_FORTIFY_LEVEL - 1, __VA_ARGS__) # define obstack_printf(obstack, ...) \ __obstack_printf_chk (obstack, __USE_FORTIFY_LEVEL - 1, __VA_ARGS__) # endif # endif It requires gcc to be at least 4.3 to have __va_arg_pack defined. Clang only has gcc 4.2. So clang chooses the asprintf macro and causes the problem. On Mon, May 4, 2015 at 5:30 PM, Ian Lance Taylor i...@google.com wrote: On Mon, May 4, 2015 at 3:49 PM, Yunlian Jiang yunl...@google.com wrote: There was a similar disscussion here https://gcc.gnu.org/ml/gcc/2005-11/msg01190.html That was a discussion about libiberty. Your subject says you have trouble building gdb. Can you describe the exact problem that you are having? What precisely are you doing? What precisely happens? The problem is in the configure stage, the __GNU_SOURCE is not defined, and it could not find the declaration of asprintf. so it make a declaration of asprintf in libiberty.h. And for the file floatformat.c, the __GNU_SOURCE is defined, so it could find another asprintf in /usr/include/bits/stdio2.h, it also includes libiberty.h. So these two asprintf conflicts when __USE_FORTIFY_LEVEL is set. I think the basic guideline should be that HAVE_DECL_ASPRINTF should be correct. If libiberty compiled with _GNU_SOURCE defined, then it should test HAVE_DECL_ASPRINTF with _GNU_SOURCE defined. If not, then not. So perhaps the problem is that libiberty is compiling some files with _GNU_SOURCE defined and some not. Ian
Re: [match-and-simplify] fix incorrect code-gen in 'for' pattern
On 18 May 2015 at 20:17, Prathamesh Kulkarni prathamesh.kulka...@linaro.org wrote: On 18 May 2015 at 14:12, Richard Biener rguent...@suse.de wrote: On Sat, 16 May 2015, Prathamesh Kulkarni wrote: Hi, genmatch generates incorrect code for following (artificial) pattern: (for op (plus) op2 (op) (simplify (op @x @y) (op2 @x @y) generated gimple code: http://pastebin.com/h1uau9qB 'op' is not replaced in the generated code on line 33: *res_code = op; I think it would be a better idea to make op2 iterate over same set of operators (op2-substitutes = op-substitutes). I have attached patch for the same. Bootstrap + testing in progress on x86_64-unknown-linux-gnu. OK for trunk after bootstrap+testing completes ? Hmm, but then the example could as well just use 'op'. I think we should instead reject this. Consider (for op (plus minus) (for op2 (op) (simplify ... where it is not clear what would be desired. Simple replacement of 'op's value would again just mean you could have used 'op' in the first place. Doing what you propose would get you (for op (plus minus) (for op2 (plus minus) (simplify ... thus a different iteration. I wonder if we really need is_oper_list flag in user_id ? We can determine if user_id is an operator list if user_id::substitutes is not empty ? After your change yes. That will lose the ability to distinguish between user-defined operator list and list-iterator in for like op/op2, but I suppose we (so far) don't need to distinguish between them ? Well, your change would simply make each list-iterator a (temporary) user-defined operator list as well as the current iterator element (dependent on context - see the nested for example). I think that adds to confusion. AFAIU, the way it's implemented in lower_for, the iterator is handled the same as a user-defined operator list. I was wondering if we should get rid of 'for' altogether and have it replaced by operator-list ? IMHO having two different things - iterator and operator-list is unnecessary and we could brand iterator as a local operator-list. We could extend syntax of 'simplify' to accommodate local operator-lists. So we can say, using an operator-list within 'match' replaces it by corresponding operators in that list. Operator-lists can be global (visible to all patterns), or local to a particular pattern. eg: a) single for (for op (...) (simplify (op ...))) can be written as: (simplify op (...) // define local operator-list op. (op ...)) // proceed here the same way as for lowering global operator list. b) multiple iterators: (for op1 (...) op2 (...) (simplify (op1 (op2 ... can be written as: (simplify op1 (...) op2 (...) (op1 (op2 ...))) c) nested for (for op1 (...) (for op2 (...) (simplify (op1 (op2 ... can be written as: (simplify op1 (...) (simplify op2 (...) (op1 (op2 ... My rationale behind removing 'for' is we don't need to distinguish between an operator-list and iterator, and only have an operator-list -;) Also we can reuse parser::parse_operator_list (in parser::parse_for parsing oper-list is duplicated) and get rid of 'parser::parse_for'. We don't need to change lowering, since operator-lists are handled the same way as 'for' (we can keep lowering of simplify::for_vec as it is). Does it sound reasonable ? Thanks, Prathamesh So - can you instead reject this use? Well my intention was to have support for walking operator list in reverse. For eg: (for bitop (bit_and bit_ior) rbitop (bit_ior bit_and) ...) Could be replaced by sth like: (for bitop (bit_and bit_ior) rbitop (~bitop)) ...) where ~bitop would indicate walking (bit_and bit_ior) in reverse. Would that be a good idea ? For symmetry, I thought (for op (list) op2 (op)) should be supported too. Thanks, Prathamesh Thanks, Richard.
Re: [PATCH, RS6000] gpc_reg_operand
On Mon, May 18, 2015 at 9:09 PM, Alan Modra amo...@gmail.com wrote: gpc_reg_operand contains a test that dates back to the early 1990s, when FIRST_PSEUDO_REGISTER was 76. At that point, testing REGNO (op) = ARG_POINTER_REGNUM allowed ap, xer/ca, cr0..cr7 and pseudos. A later patch corrected this to remove xer. Nowadays we have a lot more hard registers, altivec, vsx, htm, spe hi. Some of these are definitely special registers so not appropriate for gpc_reg_operand to match. This patch corrects the situation, to just allow pseudos, int, fp, altivec and vsx. ap and sfp are allowed via INT_REGNO_P. Note that I've removed cr0..cr7. I believe this is OK along with the extzvdi_internal2 change. I checked all uses of gpc_reg_operand and predicates derived from gpc_reg_operand. The patch also removes some operand predicates I discovered were unused, and tightens the vr save/restore patterns I added some time ago. Bootstrapped and regression tested powerpc64le-linux and powerpc64-linux, the latter with -mcpu=power7. OK for mainline? * config/rs6000/predicates.md (gpc_reg_operand): Don't allow all hard registers numbered greater or equal to ARG_POINTER_REGNUM. (reg_or_neg_short_operand, fix_trunc_dest_operand): Delete unused predicates. * config/rs6000/altivec.md (save_vregs_*, restore_vregs_*): Use altivec_register_operand. Make insn predicate TARGET_ALTIVEC. * config/rs6000/rs6000.md (extzvdi_internal2): Use cc_reg_operand. * config/rs6000/vsx.md (vsx_floatVSimode2): Expand comment. Okay. Thanks, David
Re: [PATCH] FreeBSD add functionality to build PIE executables.
On 18.05.15 19:36, Jeff Law wrote: On 05/17/2015 02:42 PM, Andreas Tobler wrote: Ping?! Thanks, Andreas On 11.05.15 22:34, Andreas Tobler wrote: All, this patch adds the ability to build PIE executables for FreeBSD. The core is since a longer time in the code base of FreeBSD itself and is working fine. This patch makes it available for all FreeBSD targets. Tested on x86_64-*-freebsd11.0 and armv6/hf-*-freebsd11.0, i386-*-freebsd11.0 is progress. In the same turn I removed the STARTFILE/ENDFILE_SPEC from config/i386/freebsd.h and use the ones from config/freebsd-spec.h. Here the results before the patch: https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg01267.html and with the patch: https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg01324.html Is this ok for trunk and for 5.1X? Thanks, Andreas 2015-05-11 Andreas Tobler andre...@gcc.gnu.org * config/freebsd-spec.h (FBSD_STARTFILE_SPEC): Add the bits to build pie executables. (FBSD_ENDFILE_SPEC): Likewise. * config/i386/freebsd.h (STARTFILE_SPEC): Remove and use the one from config/freebsd-spec.h. (ENDFILE_SPEC): Likewise. 2015-05-11 Andreas Tobler andre...@gcc.gnu.org * lib/target-supports.exp (check_effective_target_pie): Add *-*-freebsd* to the familiy of pie capable targets. OK for the trunk, please install. Release managers have final say for the release branches. Done. Thanks for the review. Andreas
Re: [9/9] Simplify register bitmap operations
On 05/18/2015 12:27 PM, Richard Sandiford wrote: After the previous changes, several callers are left with code like: if (HARD_REGISTER_P (x)) bitmap_foo_range (x, REGNO (x), REG_NREGS (x)); else bitmap_foo (x, REGNO (x)); These might as well now be: if (REG_NREGS (x) 1) bitmap_foo_range (x, REGNO (x), REG_NREGS (x)); else bitmap_foo (x, REGNO (x)); since REG_NREGS is as cheap to test, and since single-register hard REGs are the common case. But if separating out the cases is a win -- and it seems to be, very slightly -- then it would be better to add the shortcut to the range functions themselves. gcc/ * bitmap.c (bitmap_set_range): Handle count==1 specially. (bitmap_clear_range): Likewise. * cfgcleanup.c (mark_effect): Use bitmap_clear_range and bitmap_set_range unconditionally. * df-problems.c (df_simulate_one_insn_forwards): Likewise. * df-scan.c (df_mark_reg): Likewise. * haifa-sched.c (setup_ref_regs): Likewise. * sched-rgn.c (update_live_1): Likewise. OK. jeff
Re: [PATCH] add self-tuning to x86 hardware fast path in libitm
On Mon, 2015-05-18 at 23:39 +0200, Andi Kleen wrote: Are there better options for the utility function, or can we tune it to There is nothing better that isn't a lot slower. Do you care to elaborate why? As-is, I find this statement to not be convincing; at the very least we need to document why we think that something time-based is the best option. Other tuning attempts have looked at rates of aborted, attempted, and committed txns, for example. Why do we measure nb. of transactions in a whole period, and can't get the same info through measuring smaller but more specific time intervals (e.g., how long we wait for a serial txn)?
Re: [PATCH] fortran/66052 -- Prevent dereference of NULL pointer
On Sat, May 16, 2015 at 01:58:30PM +0200, Mikael Morin wrote: Index: gcc/fortran/decl.c === --- gcc/fortran/decl.c (revision 223094) +++ gcc/fortran/decl.c (working copy) @@ -6968,7 +6968,8 @@ gfc_match_protected (void) gfc_symbol *sym; match m; - if (gfc_current_ns-proc_name-attr.flavor != FL_MODULE) + if (gfc_current_ns-proc_name + gfc_current_ns-proc_name-attr.flavor != FL_MODULE) { gfc_error (PROTECTED at %C only allowed in specification part of a module); Wouldn't one get a slightly better error message if using !gfc_current_ns-proc_name || gfc_current_ns-proc_name-attr.flavor != FL_MODULE as condition ? OK with that change. Thanks. Committed ar r223324 with your suggested change. Indeed, a better error messaage emitted. -- Steve
Re: [PATCH] fortran/66057 -- detect malformed GENERIC statement
On Fri, May 15, 2015 at 10:37:24PM +0200, FX wrote: Does it also catch the other cases shown in the PR? Things like: generic :: 2 generic :: 2 = generic :: 1 = x generic :: ? Yes, the patch appears to catch those as well. -- Steve
[Committed] check_GNU_style.sh: Improve readability function calls
Hi, committed as obvious. Thanks, - Tom check_GNU_style.sh: Improve readability function calls 2015-05-18 Tom de Vries t...@codesourcery.com * check_GNU_style.sh: Improve readability function calls. --- contrib/check_GNU_style.sh | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh index 90c612f..b86c474 100755 --- a/contrib/check_GNU_style.sh +++ b/contrib/check_GNU_style.sh @@ -170,11 +170,12 @@ g 'Sentences should end with a dot. Dot, space, space, end of the comment.' \ '[[:alnum:]][[:blank:]]*\*/' vg 'There should be exactly one space between function name and parentheses.' \ -'\#define' '[[:alnum:]]([[:blank:]]{2,})?\(' +'\#define' \ +'[[:alnum:]]([[:blank:]]{2,})?\(' g 'There should be no space before closing parentheses.' \ '[[:graph:]][[:blank:]]+\)' ag 'Braces should be on a separate line.' \ -'\{' 'if[[:blank:]]\(|while[[:blank:]]\(|switch[[:blank:]]\(' - +'\{' \ +'if[[:blank:]]\(|while[[:blank:]]\(|switch[[:blank:]]\(' -- 1.9.1
check_GNU_style.sh: Don't cat empty file
Hi, committed as obvious. Thanks, - Tom check_GNU_style.sh: Don't cat empty file 2015-05-18 Tom de Vries t...@codesourcery.com * check_GNU_style.sh (g, ag, vg): Don't cat empty file. --- contrib/check_GNU_style.sh | 30 -- 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh index 728c11a..ab59b1e 100755 --- a/contrib/check_GNU_style.sh +++ b/contrib/check_GNU_style.sh @@ -84,10 +84,16 @@ grep $format '^+' $files \ g (){ local msg=$1 local arg=$2 + +local found=false cat $inp \ | egrep --color=always -- $arg \ - $tmp printf \n$msg\n -cat $tmp + $tmp found=true + +if $found; then + printf \n$msg\n + cat $tmp +fi } # And Grep @@ -95,11 +101,17 @@ ag (){ local msg=$1 local arg1=$2 local arg2=$3 + +local found=false cat $inp \ | egrep --color=always -- $arg1 \ | egrep --color=always -- $arg2 \ - $tmp printf \n$msg\n -cat $tmp + $tmp found=true + +if $found; then + printf \n$msg\n + cat $tmp +fi } # reVerse Grep @@ -107,11 +119,17 @@ vg (){ local msg=$1 local varg=$2 local arg=$3 + +local found=false cat $inp \ | egrep -v -- $varg \ | egrep --color=always -- $arg \ - $tmp printf \n$msg\n -cat $tmp + $tmp found=true + +if $found; then + printf \n$msg\n + cat $tmp +fi } col (){ -- 1.9.1
[Committed] check_GNU_style.sh: Declare local vars with local
Hi, committed as obvious. Thanks, - Tom check_GNU_style.sh: Declare local vars with local 2015-05-18 Tom de Vries t...@codesourcery.com * check_GNU_style.sh (g, ag, vg, col): Declare local vars with local. --- contrib/check_GNU_style.sh | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/contrib/check_GNU_style.sh b/contrib/check_GNU_style.sh index b86c474..728c11a 100755 --- a/contrib/check_GNU_style.sh +++ b/contrib/check_GNU_style.sh @@ -82,8 +82,8 @@ grep $format '^+' $files \ # Grep g (){ -msg=$1 -arg=$2 +local msg=$1 +local arg=$2 cat $inp \ | egrep --color=always -- $arg \ $tmp printf \n$msg\n @@ -92,9 +92,9 @@ g (){ # And Grep ag (){ -msg=$1 -arg1=$2 -arg2=$3 +local msg=$1 +local arg1=$2 +local arg2=$3 cat $inp \ | egrep --color=always -- $arg1 \ | egrep --color=always -- $arg2 \ @@ -104,9 +104,9 @@ ag (){ # reVerse Grep vg (){ -msg=$1 -varg=$2 -arg=$3 +local msg=$1 +local varg=$2 +local arg=$3 cat $inp \ | egrep -v -- $varg \ | egrep --color=always -- $arg \ @@ -115,7 +115,7 @@ vg (){ } col (){ -msg=$1 +local msg=$1 local first=true local f for f in $files; do -- 1.9.1
Re: [PATCH 3/4] split-stack for powerpc64
On Mon, May 18, 2015 at 12:24:51PM +0930, Alan Modra wrote: +error (%-fsplit-stack% currently only supported on PowerPC64 GNU/Linux with glibc-2.18 or later); I forgot to comment on this. 2.19 is actually when __private_ss appeared in the ppc tcbhead_t, but I misread the commit date and thought it was 2.18. I was going to correct that, but then wondered if glibc allocates any spare bytes, and it looks like it does. We have a struct pthread before tcbhead_t, and struct pthread is aligned according to TCB_ALIGNMENT which is 16 for powerpc. So 2.18's 56 byte tcbhead_t, which is laid out so that the end coincides with tp-0x7000, must be preceded by 8 bytes of padding. Enough for __private_ss, and we don't care if it isn't initially zero. -- Alan Modra Australia Development Lab, IBM
Re: [PATCH][PR66010] Don't take address of ap unless necessary
On Thu, 14 May 2015, Tom de Vries wrote: On 12-05-15 12:04, Richard Biener wrote: The fact that we have to handle this specially in both build_va_arg and gimplify_va_arg makes me wonder whether we should be dealing with all this in build_va_arg already. That is, determine whether we take the address, and add the address operator if so in build_va_arg already. Likewise, determine do_deref and pass that as extra operand. That would certainly be a nice cleanup. I've implemented this cleanup. Furthermore, I realized that in expand_ifn_va_arg_1, 'do_deref == 0' disables code: ... if (do_deref == integer_one_node) ap = build_fold_indirect_ref (ap); ... that has a copy in the target hook: ... gpr = build3 (COMPONENT_REF, TREE_TYPE (f_gpr), build_va_arg_indirect_ref (valist), f_gpr, NULL_TREE); valist = build_va_arg_indirect_ref (valist); ... It's easier to remove that code from the few target hooks that do that, execute the 'do_deref == 1' code unconditionally in expand_ifn_va_arg_1 and simply remove all do_deref propagation (that also means we don't have to add an operand to VA_ARG_EXPR). Bootstrapped and reg-tested on x86_64, with and without -m32. Did partial build and compiled a few va_arg test-cases on the other targets. OK for trunk? Ok. Thanks, Richard. Thanks, - Tom -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH][calls.c] Remove #ifdef checks on STACK_GROWS_DOWNWARD
Hi all, As per https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01166.html I saw that STACK_GROWS_DOWNWARD is being checked for ifdef in a number of places unrelated to the patch in the PR, so decided to remove the ifdef checks for STACK_GROWS_DOWNWARD in a separate patch. It's fairly mechanical. Bootstrapped on x86_64 and aarch64 and tested on arm. I believe these patches are considered obvious, so I'll commit this in a day, unless someone objects, before proceeding with the changes in https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01166.html 2015-05-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * calls.c: Always define STACK_GROWS_DOWNWARD as 0 or 1. (mem_overlaps_already_clobbered_arg_p): Rewrite ifdef STACK_GROWS_DOWNWARD as normal if. (expand_call): Likewise. commit 13a36809b55631dbc2c7e7dfb564857f6fb74339 Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Wed May 13 10:20:59 2015 +0100 [calls.c] Remove #ifdef checks on STACK_GROWS_DOWNWARD diff --git a/gcc/calls.c b/gcc/calls.c index caa7d60..e11b1b9 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -81,6 +81,15 @@ along with GCC; see the file COPYING3. If not see #include tree-chkp.h #include rtl-chkp.h + +/* Redefine STACK_GROWS_DOWNWARD in terms of 0 or 1. */ +#ifdef STACK_GROWS_DOWNWARD +# undef STACK_GROWS_DOWNWARD +# define STACK_GROWS_DOWNWARD 1 +#else +# define STACK_GROWS_DOWNWARD 0 +#endif + /* Like PREFERRED_STACK_BOUNDARY but in units of bytes, not bits. */ #define STACK_BYTES (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT) @@ -1974,11 +1983,12 @@ mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size) return true; else i = INTVAL (val); -#ifdef STACK_GROWS_DOWNWARD - i -= crtl-args.pretend_args_size; -#else - i += crtl-args.pretend_args_size; -#endif + + if (STACK_GROWS_DOWNWARD) +i -= crtl-args.pretend_args_size; + else +i += crtl-args.pretend_args_size; + if (ARGS_GROW_DOWNWARD) i = -i - size; @@ -2908,12 +2918,13 @@ expand_call (tree exp, rtx target, int ignore) if (pass == 0) { argblock = crtl-args.internal_arg_pointer; - argblock -#ifdef STACK_GROWS_DOWNWARD - = plus_constant (Pmode, argblock, crtl-args.pretend_args_size); -#else - = plus_constant (Pmode, argblock, -crtl-args.pretend_args_size); -#endif + if (STACK_GROWS_DOWNWARD) + argblock + = plus_constant (Pmode, argblock, crtl-args.pretend_args_size); + else + argblock + = plus_constant (Pmode, argblock, -crtl-args.pretend_args_size); + stored_args_map = sbitmap_alloc (args_size.constant); bitmap_clear (stored_args_map); }
Re: [RFA] More type narrowing in match.pd V2
On Mon, May 18, 2015 at 11:17 AM, Richard Biener richard.guent...@gmail.com wrote: On Thu, May 14, 2015 at 4:28 PM, Jeff Law l...@redhat.com wrote: On 05/14/2015 08:04 AM, Marc Glisse wrote: On Fri, 1 May 2015, Jeff Law wrote: Slight refactoring of the condition by using types_match as suggested by Richi. I also applied the new types_match to 2 other patterns in match.pd where it seemed clearly appropriate. I would like to propose this small tweak (regtested ok). If we had a different type for trees and types, this would be overloading the function. We already do this in a few places, and I find the resulting shorter code more readable. 2015-05-14 Marc Glisse marc.gli...@inria.fr * generic-match-head.c (types_match): Handle non-types. * gimple-match-head.c (types_match): Likewise. * match.pd: Remove unnecessary TREE_TYPE for types_match. Update the comment for types_match and this is fine. No - it breaks genmatch autodetection of what trees escape. It special-cases TREE_TYPE (). See capture_info::walk_c_expr: /* Give up for C exprs mentioning captures not inside TREE_TYPE (). */ unsigned p_depth = 0; so the patch will pessimize GENERIC folding. Please do not apply. Hmm, ok - if the functions are only ever used in if()s then it's ok. You just have to remember to never use them in (with ...) expressions or transform results. Like for example (with { bool overflow_p; wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), overflow_p); } is bad because genmatch thinks that @1 and @2 might be used multiple times in the simplification because they escape. It doesn't have sophisticated enough analysis to see that mul is only used in a following 'if'. I suppose we might want to emit warnings from genmatch.c whenever it has to give up for multi-use (thus need to emit save_expr ()) or non-use (need to use omit_one_operand if TREE_SIDE_EFFECTS, but as it doesn't know whether it is used in the result we guard the whole pattern with !TREE_SIDE_EFFECTS). Richard. Richard. jeff
Re: [PATCH, PR middle-end/66134] Fix coalescing for bounds used in abnormal phi
On Fri, May 15, 2015 at 11:43 AM, Ilya Enkovich enkovich@gmail.com wrote: Hi, This patch fixes misuse of abnormal bounds copy to avoid coalescing issue. Bootstrapped and regtested for x86_64-unknown-linux-gnu. Applied to trunk. Is it OK for gcc-5? Hum. I'm not sure how you can assert the value is defined by a gimple-assign. (what about PHIs?) also see below Thanks, Ilya -- gcc/ 2015-05-15 Ilya Enkovich enkovich@gmail.com PR middle-end/66134 * tree-chkp.c (chkp_get_orginal_bounds_for_abnormal_copy): New. (chkp_maybe_copy_and_register_bounds): Don't copy abnormal copy. gcc/testsuite/ 2015-05-15 Ilya Enkovich enkovich@gmail.com PR middle-end/66134 * gcc.target/i386/mpx/pr66134.c: New test. diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr66134.c b/gcc/testsuite/gcc.target/i386/mpx/pr66134.c new file mode 100755 index 000..3889674 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/mpx/pr66134.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -fcheck-pointer-bounds -mmpx -fno-tree-ccp } */ + +extern int vfork (void) __attribute__ ((__nothrow__ , __leaf__)); +void test1 (void); +void test2 (void); +void test3 (int *); + +void test (int *p) +{ + test1 (); + p++; + test2 (); + p++; + vfork (); + test3 (p); +} diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c index 288470b..17a52bc 100644 --- a/gcc/tree-chkp.c +++ b/gcc/tree-chkp.c @@ -1097,7 +1097,20 @@ chkp_get_bounds_var (tree ptr_var) return bnd_var; } +/* If BND is an abnormal bounds copy, return a copied value. + Otherwise return BND. */ +static tree +chkp_get_orginal_bounds_for_abnormal_copy (tree bnd) +{ + if (bitmap_bit_p (chkp_abnormal_copies, SSA_NAME_VERSION (bnd))) +{ + gimple bnd_def = SSA_NAME_DEF_STMT (bnd); + gcc_checking_assert (gimple_code (bnd_def) == GIMPLE_ASSIGN); + bnd = gimple_assign_rhs1 (bnd_def); +} + return bnd; +} /* Register bounds BND for object PTR in global bounds table. A copy of bounds may be created for abnormal ssa names. @@ -1141,11 +1154,7 @@ chkp_maybe_copy_and_register_bounds (tree ptr, tree bnd) /* For abnormal copies we may just find original bounds and use them. */ if (!abnormal_ptr !SSA_NAME_IS_DEFAULT_DEF (bnd)) - { - gimple bnd_def = SSA_NAME_DEF_STMT (bnd); - gcc_checking_assert (gimple_code (bnd_def) == GIMPLE_ASSIGN); - bnd = gimple_assign_rhs1 (bnd_def); - } + bnd = chkp_get_orginal_bounds_for_abnormal_copy (bnd); /* For undefined values we usually use none bounds value but in case of abnormal edge it may cause coalescing failures. Use default definition of @@ -1177,6 +1186,7 @@ chkp_maybe_copy_and_register_bounds (tree ptr, tree bnd) copy = make_temp_ssa_name (pointer_bounds_type_node, gimple_build_nop (), Just pass NULL here to avoid creating a garbage NOP stmt. CHKP_BOUND_TMP_NAME); + bnd = chkp_get_orginal_bounds_for_abnormal_copy (bnd); assign = gimple_build_assign (copy, bnd); if (dump_file (dump_flags TDF_DETAILS))
Re: Small fold-const.c / match.pd tweaks
On Fri, May 15, 2015 at 12:52 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, these are just a few tweaks / clean-ups. The 'ord' pattern is the dual of the 'unord' one I added last month. The tree_unary_nonnegative_warnv_p change should have no effect (it is only used from VRP which does not handle vectors (yet?)) but it seems right to handle vectors like scalars here. Regtested on ppc64le-redhat-linux. 2015-05-15 Marc Glisse marc.gli...@inria.fr PR tree-optimization/63387 gcc/ * match.pd ((X /[ex] A) * A - X): Remove unnecessary condition. ((x ord x) (y ord y) - (x ord y), (x ord x) (x ord y) - (x ord y)): New simplifications. * fold-const.c (tree_unary_nonnegative_warnv_p) ABS_EXPR: Handle vectors like scalars. gcc/testsuite/ * gcc.dg/pr63387-2.c: New testcase. -- Marc Glisse Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 223214) +++ gcc/fold-const.c(working copy) @@ -14685,21 +14685,21 @@ tree_unary_nonnegative_warnv_p (enum tre bool *strict_overflow_p) { if (TYPE_UNSIGNED (type)) return true; switch (code) { case ABS_EXPR: /* We can't return 1 if flag_wrapv is set because ABS_EXPRINT_MIN = INT_MIN. */ - if (!INTEGRAL_TYPE_P (type)) + if (!ANY_INTEGRAL_TYPE_P (type)) return true; if (TYPE_OVERFLOW_UNDEFINED (type)) { *strict_overflow_p = true; return true; } break; case NON_LVALUE_EXPR: case FLOAT_EXPR: Index: gcc/match.pd === --- gcc/match.pd(revision 223214) +++ gcc/match.pd(working copy) @@ -791,22 +791,21 @@ along with GCC; see the file COPYING3. TYPE_PRECISION (type) = TYPE_PRECISION (TREE_TYPE (@0)) operand_equal_p (@1, build_low_bits_mask (TREE_TYPE (@1), TYPE_PRECISION (type)), 0)) (convert @0))) /* (X /[ex] A) * A - X. */ (simplify (mult (convert? (exact_div @0 @1)) @1) /* Look through a sign-changing conversion. */ - (if (TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (type)) - (convert @0))) + (convert @0)) Hmm, indeed. Looks like we were overly cautionous (in forwprop where I moved this from) /* Canonicalization of binary operations. */ /* Convert X + -C into X - C. */ (simplify (plus @0 REAL_CST@1) (if (REAL_VALUE_NEGATIVE (TREE_REAL_CST (@1))) (with { tree tem = fold_unary (NEGATE_EXPR, type, @1); } (if (!TREE_OVERFLOW (tem) || !flag_trapping_math) (minus @0 { tem; }) @@ -944,22 +943,29 @@ along with GCC; see the file COPYING3. (icmp @0 @1)) (if (ic == ncmp) (ncmp @0 @1) /* Unordered tests if either argument is a NaN. */ (simplify (bit_ior (unordered @0 @0) (unordered @1 @1)) (if (types_match (@0, @1)) (unordered @0 @1))) (simplify + (bit_and (ordered @0 @0) (ordered @1 @1)) + (if (types_match (@0, @1)) + (ordered @0 @1))) (for op (unordered ordered) op2 (bit_ior bit_and) (simplify (op2 (op @0 @0) (op @1 @1)) (if ... (op @0 @1))) would be a merged variant (nor sure if better - if the list increases maybe). I suppose your version is easier to read and not very much bigger. Patch is ok as-is. Thanks, Richard. +(simplify (bit_ior:c (unordered @0 @0) (unordered:c@2 @0 @1)) @2) +(simplify + (bit_and:c (ordered @0 @0) (ordered:c@2 @0 @1)) + @2) /* Simplification of math builtins. */ (define_operator_list LOG BUILT_IN_LOGF BUILT_IN_LOG BUILT_IN_LOGL) (define_operator_list EXP BUILT_IN_EXPF BUILT_IN_EXP BUILT_IN_EXPL) (define_operator_list LOG2 BUILT_IN_LOG2F BUILT_IN_LOG2 BUILT_IN_LOG2L) (define_operator_list EXP2 BUILT_IN_EXP2F BUILT_IN_EXP2 BUILT_IN_EXP2L) (define_operator_list LOG10 BUILT_IN_LOG10F BUILT_IN_LOG10 BUILT_IN_LOG10L) (define_operator_list EXP10 BUILT_IN_EXP10F BUILT_IN_EXP10 BUILT_IN_EXP10L) (define_operator_list POW BUILT_IN_POWF BUILT_IN_POW BUILT_IN_POWL) Index: gcc/testsuite/gcc.dg/pr63387-2.c === --- gcc/testsuite/gcc.dg/pr63387-2.c(revision 0) +++ gcc/testsuite/gcc.dg/pr63387-2.c(working copy) @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options -O -fdump-tree-optimized } */ + +int f(double aaa, double bbb){ + int xa = !__builtin_isunordered(aaa, aaa); + int xb = !__builtin_isunordered(bbb, bbb); + return xa xb; +} + +int g(double aaa, double bbb){ + int xa = !__builtin_isunordered(aaa, bbb); + int xb = !__builtin_isunordered(bbb, bbb); + return xa xb; +} + +int h(double ccc, float ddd){ + int xc = !__builtin_isunordered(ccc, ccc); + int xd = !__builtin_isunordered(ddd, ddd); + return xc xd; +} +
Re: match.pd: (x | y) ~x - y ~x
On Fri, May 15, 2015 at 7:22 PM, Marc Glisse marc.gli...@inria.fr wrote: Hello, we already have the more complicated: x ~(x y) - x ~y (which I am reindenting by the way) and the simpler: (~x | y) x - x y, so I am proposing this one for completeness. Regtested on ppc64le-redhat-linux. Ok (doesn't seem to be in fold-const.c). Btw, there are quite some (simple) ones only in fold-const.c which are eligible for moving to match.pd (thus remove them from fold-const.c and implement in match.pd). Mostly canonicalization ones though. Richard. 2015-05-15 Marc Glisse marc.gli...@inria.fr gcc/ * match.pd ((x | y) ~x - y ~x, (x y) | ~x - y | ~x): New simplifications. gcc/testsuite/ * gcc.dg/nand.c: New testcase. -- Marc Glisse Index: match.pd === --- match.pd(revision 223217) +++ match.pd(working copy) @@ -257,24 +257,32 @@ along with GCC; see the file COPYING3. /* x + (x 1) - (x + 1) ~1 */ (simplify (plus:c @0 (bit_and@2 @0 integer_onep@1)) (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2)) (bit_and (plus @0 @1) (bit_not @1 /* x ~(x y) - x ~y */ /* x | ~(x | y) - x | ~y */ (for bitop (bit_and bit_ior) - (simplify -(bitop:c @0 (bit_not (bitop:c@2 @0 @1))) - (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2)) - (bitop @0 (bit_not @1) + (simplify + (bitop:c @0 (bit_not (bitop:c@2 @0 @1))) + (if (TREE_CODE (@2) != SSA_NAME || has_single_use (@2)) +(bitop @0 (bit_not @1) + +/* (x | y) ~x - y ~x */ +/* (x y) | ~x - y | ~x */ +(for bitop (bit_and bit_ior) + rbitop (bit_ior bit_and) + (simplify + (bitop:c (rbitop:c @0 @1) (bit_not@2 @0)) + (bitop @1 @2))) (simplify (abs (negate @0)) (abs @0)) (simplify (abs tree_expr_nonnegative_p@0) @0) /* Try to fold (type) X op CST - (type) (X op ((type-x) CST)) Index: testsuite/gcc.dg/nand.c === --- testsuite/gcc.dg/nand.c (revision 0) +++ testsuite/gcc.dg/nand.c (working copy) @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options -O -fdump-tree-original } */ + +unsigned f(unsigned x, unsigned y){ + return (x | y) ~x; +} +unsigned g(unsigned x, unsigned y){ + return ~x (y | x); +} + +/* { dg-final { scan-tree-dump-times return ~x y; 2 original } } */ +/* { dg-final { cleanup-tree-dump original } } */