Re: [PATCH] Fix part of PR58712
LTO bootstrapped and tested on x86_64-unknown-linux-gnu. I would be grateful if one of you guys could apply it. Thanks. 2013-10-21 Markus Trippelsdorf mar...@trippelsdorf.de PR ipa/58712 * cgraph.c (cgraph_create_edge_1): Add indirect_unknown_callee as argument. (cgraph_create_edge): Use the new argument. (cgraph_create_indirect_edge): Likewise. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 6ebd0c71e026..7f83a016dcf3 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -814,7 +814,8 @@ cgraph_set_call_stmt (struct cgraph_edge *e, gimple new_stmt, static struct cgraph_edge * cgraph_create_edge_1 (struct cgraph_node *caller, struct cgraph_node *callee, - gimple call_stmt, gcov_type count, int freq) + gimple call_stmt, gcov_type count, int freq, + bool indir_unknown_callee) { struct cgraph_edge *edge; @@ -874,6 +875,7 @@ cgraph_create_edge_1 (struct cgraph_node *caller, struct cgraph_node *callee, edge-indirect_info = NULL; edge-indirect_inlining_edge = 0; edge-speculative = false; + edge-indirect_unknown_callee = indir_unknown_callee; if (call_stmt caller-call_site_hash) cgraph_add_edge_to_call_site_hash (edge); @@ -887,9 +889,8 @@ cgraph_create_edge (struct cgraph_node *caller, struct cgraph_node *callee, gimple call_stmt, gcov_type count, int freq) { struct cgraph_edge *edge = cgraph_create_edge_1 (caller, callee, call_stmt, - count, freq); + count, freq, false); - edge-indirect_unknown_callee = 0; initialize_inline_failed (edge); edge-next_caller = callee-callers; @@ -926,10 +927,9 @@ cgraph_create_indirect_edge (struct cgraph_node *caller, gimple call_stmt, gcov_type count, int freq) { struct cgraph_edge *edge = cgraph_create_edge_1 (caller, NULL, call_stmt, - count, freq); + count, freq, true); tree target; - edge-indirect_unknown_callee = 1; initialize_inline_failed (edge); edge-indirect_info = cgraph_allocate_init_indirect_info (); -- Markus
Re: [PATCH] Fix PR58775
On Mon, Oct 21, 2013 at 01:20:06PM +0800, Zhenqiang Chen wrote: Thanks for the comments. Patch is updated to set uid in update_range_test after force_gimple_operand_gsi. I am not sure the patch can cover all cases. If not, I think force_gimple_operand_gsi_1 maybe the only place to set uid for all new generated statements. No, force_gimple_operand_gsi can add more than one stmt, thus I think it is appropriate to do this instead (or force_gimple_operand_1 into a gimple_seq, set uid on all stmts in that seq and then insert them). OT, the other force_gimple_operand_gsi in negate_value lead me to a thought that we could avoid the extra walk of all the IL in do_reassoc - renumber_gimple_stmt_uids by setting the uids in break_up_subtract_bb, where we already reset the visited flags to avoid the extra IL walk. 2013-10-21 Jakub Jelinek ja...@redhat.com PR tree-optimization/58775 * tree-ssa-reassoc.c (update_range_test): Set uid on stmts added by force_gimple_operand_gsi. * gcc.c-torture/compile/pr58775.c: New test. --- gcc/tree-ssa-reassoc.c.jj 2013-10-21 09:00:25.0 +0200 +++ gcc/tree-ssa-reassoc.c 2013-10-21 09:47:24.431082049 +0200 @@ -2011,6 +2011,11 @@ update_range_test (struct range_entry *r gsi = gsi_for_stmt (stmt); tem = force_gimple_operand_gsi (gsi, tem, true, NULL_TREE, true, GSI_SAME_STMT); + for (gsi_prev (gsi); !gsi_end_p (gsi); gsi_prev (gsi)) +if (gimple_uid (gsi_stmt (gsi))) + break; +else + gimple_set_uid (gsi_stmt (gsi), gimple_uid (stmt)); /* If doing inter-bb range test optimization, update the stmts immediately. Start with changing the first range test --- gcc/testsuite/gcc.c-torture/compile/pr58775.c.jj2013-10-21 09:48:34.034726581 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr58775.c 2013-10-21 09:48:47.029662347 +0200 @@ -0,0 +1,26 @@ +/* PR tree-optimization/58775 */ + +void bar (void); + +void +foo (char *x) +{ + char a; + _Bool b, c, d, e, f, g, h, i, j, k, l, m; + + a = *x; + b = a == 100; + c = a == 105; + d = b | c; + e = a != 111; + f = !d; + g = e f; + h = a != 117; + i = g h; + j = a != 120; + k = i j; + l = a != 88; + m = k l; + if (m == 0) +bar (); +} Jakub
Re: [PR libstdc++/58804][PR libstdc++/58729] tr2/dynamic_bitset issues.
On 10/21/2013 02:02 AM, Ed Smith-Rowland wrote: Greetings. Here is a patch to correct tr2/dynamic_bitset to use __builtin_xxxll for long long instead of the long versions. Relevant bugs: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58804 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58729 Builds and *really* tests clean on x86_64-linux. OK? Ok, thanks. Remember to close the Bugs. Paolo.
[PATCH][AArch64] Get %c output template tests to pass for -fPIC
Hi all, When I added the tests for the %c output template for aarch64 I did not take into account that compiling them with -fPIC would fail. This patch fixes them to use the 'S' constraint to get them to work. Ok for trunk? Tested on aarch64-none-elf/-fPIC Thanks, Kyrill [gcc/testsuite] 2013-10-21 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/c-output-mod-2.c: Fix for -fPIC. * gcc.target/aarch64/c-output-mod-3.c: Likewise.diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c b/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c index 16ff58d..ced96d0 100644 --- a/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c +++ b/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c @@ -1,15 +1,9 @@ /* { dg-do compile } */ -struct tracepoint { -int dummy; -int state; -}; -static struct tracepoint tp; - void test (void) { -__asm__ (@ %c0 : : i (tp)); +__asm__ (@ %c0 : : S (test)); } -/* { dg-final { scan-assembler @ tp } } */ +/* { dg-final { scan-assembler @ test } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c b/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c index e332fe1..c28837c 100644 --- a/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c +++ b/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c @@ -1,15 +1,10 @@ /* { dg-do compile } */ - -struct tracepoint { -int dummy; -int state; -}; -static struct tracepoint tp; +/* { dg-options -Wno-pointer-arith } */ void test (void) { -__asm__ (@ %c0 : : i (tp.state)); +__asm__ (@ %c0 : : S (test + 4)); } -/* { dg-final { scan-assembler @ tp\\+4 } } */ +/* { dg-final { scan-assembler @ test\\+4 } } */
Re: libstdc++ Testsuite extension for sort, partial_sort, partial_sort_copy, nth_element
Hi, On 10/19/2013 11:10 PM, Christopher Jefferson wrote: The following patch (related to my 58800 patch) adds many more tests for several important functions. While 58800 is my fault, the reason it was not caught earlier is that many functions in libstdc++ have almost no testing. This works toward fixing that. These tests are kept to an acceptable amount of time (~2 seconds/ function) at the moment (and automatic reduction for simulators), although I would like to take much more time and do much more testing, but I realise that might annoy some people so it might not be on by default. I plan to write my tests if these are accepted (although writing this reminded me why gcc is the worst open source project for submitting code to I have ever worked with). 2013-10-16 Edward Smith-Rowland 3dw...@verizon.net * testsuite/util/testsuite_iterators.h : Add 'val' method to read testsuite container. * testsuite/25_algorithms/partial_sort/nth_element.cc : New * testsuite/25_algorithms/partial_sort/random_test.cc : New * testsuite/25_algorithms/partial_sort_copy/random_test.cc : New * testsuite/25_algorithms/sort/random_test.cc : New * testsuite/util/testsuite_containergen.h : New header for automatically testing functions on a large set of autogenerated headers a bunch of nits and a more substantive issue. The nits are: the name in the ChangeLog ;); these days prefer -std=gnu++11 to 0x, only 2013 as Copyright date. Also, stylistically, we don't use curly braces in single-line for loop bodies. Watch out overlong lines, as usual (I spotted a couple) Slightly more interesting: while we are at it, we are including the whole random, why not using uniform_int_distribution instead of fiddling with % in lots of places? It's also preferable from a statistical point of view (probably doesn't matter much in this specific application, but using % isn't something we want to recommend to people studying free software to learn about these topics, eh!) Thanks, Paolo.
Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov
The patch is OK. Thanks, the patch was committed. That's a good point. I added a check for this case - so if CONST_INT is passed we assume that mode is QI. But usually promoted value resides in a register, so we don't go over-conservative here. Hmm, so if we use broadcast constant, then we won't end up having CONST_INT here? It is OK then. Currently we don't end up with CONST_INT. But maybe we'd need to revisit this place later. Michael Honza
Re: [Patch, fortran] PR58793 - Wrong value for _vtab for intrinsic types with CLASS(*): storage_size of class(*) gives wrong result
Paul Richard Thomas wrote: This patch is fairly obvious and follows a suggestion from Tobias to use gfc_element_size. He even wrote the testcase! Bootstrapped and regested on FC17/x86_64 - OK for trunk? Looks good to me. Thanks for the patch. PS In writing this, I have just noted that I need to trap ts-type == BT_HOLLERITH. It would be a rather bizarre thing to try to do but I just know that somebody, somewhere will eventually try it :-) Consider that it will be done in the committed version. I think that should be done as follow-up patch.In particular, it is not obvious how it should be handled. The program below ICEs in gfc_typenode_for_spec;with both Intel's and Cray's compiler, it gives Character with len 3 Something else I somehow had expected that it maps to a character. program test call up(abc) call up(3habc) contains subroutine up(x) class(*) :: x select type(x) type is (character(*)) print *, 'Character with len', len(x) class default print *, 'Something else' end select end subroutine end program test The following - related to TYPE(*) - is also accepted, but I think it shouldn't. (call up(x) is presumably invalid). It fails here at assembler time. program test use iso_c_binding integer,target ::aa call up(c_loc(aa)) ! Probably okay but not matchable? call bar(aa) !OK contains subroutine up(x) class(*) :: x select type(x) ! type is (c_ptr) ! invalid rejected by gfortran ! print *, 'C_ptr' class default print *, 'Something else' end select end subroutine subroutine bar(x) type(*) :: x call up(x) ! Invalid end subroutine bar end program test Tobias 2013-10-20 Paul Thomas pa...@gcc.gnu.org PR fortran 57893 * class.c : Include target-memory.h. (gfc_find_intrinsic_vtab) Build a minimal expression so that gfc_element_size can be used to obtain the storage size, rather that the kind value. 2013-10-20 Paul Thomas pa...@gcc.gnu.org PR fortran 57893 * gfortran.dg/unlimited_polymorphic_12.f90 : New test.
[4.8 branch] Handle svn 1.8.1
I'd like to backport this patch from trunk to handle subversion 1.8.1 output. Ok to commit? Andreas 2013-10-21 Mike Stump mikest...@comcast.net * gcc_update (configure): Update to handle svn 1.8.1. Index: contrib/gcc_update === --- contrib/gcc_update (revision 203886) +++ contrib/gcc_update (working copy) @@ -382,7 +382,7 @@ fi revision=`$GCC_SVN info | awk '/Revision:/ { print $2 }'` - branch=`$GCC_SVN info | sed -ne /URL:/ { + branch=`$GCC_SVN info | sed -ne /^URL:/ { s,.*/trunk,trunk, s,.*/branches/,, s,.*/tags/,, -- Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126
Re: [4.8 branch] Handle svn 1.8.1
On Mon, Oct 21, 2013 at 12:01:28PM +0200, Andreas Jaeger wrote: Ok to commit? Andreas 2013-10-21 Mike Stump mikest...@comcast.net * gcc_update (configure): Update to handle svn 1.8.1. Ok. --- contrib/gcc_update(revision 203886) +++ contrib/gcc_update(working copy) @@ -382,7 +382,7 @@ fi revision=`$GCC_SVN info | awk '/Revision:/ { print $2 }'` - branch=`$GCC_SVN info | sed -ne /URL:/ { + branch=`$GCC_SVN info | sed -ne /^URL:/ { s,.*/trunk,trunk, s,.*/branches/,, s,.*/tags/,, -- Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg) GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126 Jakub
[Patch] Fix gcc.dg/20050922-*.c
Hello, Tests gcc.dg/20050922-1.c and gcc.dg/20050922-2.c includes stdlib.h. This can be a issue especially since they define uint32_t. Testcase writing guidelines discourages such inclusion as well. This patch replaces these #includes with manual declarations. Tested for aarch64-none-elf, arm-none-eabi and x86_64-linux-gnu OK for trunk, 4.7, 4.8? VP. --- gcc/testsuite/ChangeLog: 2013-10-21 Vidya Praveen vidyaprav...@arm.com * gcc.dg/20050922-1.c: Remove stdlib.h and declare abort(). * gcc.dg/20050922-1.c: Remove stdlib.h and declare abort() and exit(). diff --git a/gcc/testsuite/gcc.dg/20050922-1.c b/gcc/testsuite/gcc.dg/20050922-1.c index ed5a3c6..982f820 100644 --- a/gcc/testsuite/gcc.dg/20050922-1.c +++ b/gcc/testsuite/gcc.dg/20050922-1.c @@ -4,7 +4,7 @@ /* { dg-do run } */ /* { dg-options -O1 -std=c99 } */ -#include stdlib.h +extern void abort (void); #if __INT_MAX__ == 2147483647 typedef unsigned int uint32_t; diff --git a/gcc/testsuite/gcc.dg/20050922-2.c b/gcc/testsuite/gcc.dg/20050922-2.c index c2974d0..2e8db82 100644 --- a/gcc/testsuite/gcc.dg/20050922-2.c +++ b/gcc/testsuite/gcc.dg/20050922-2.c @@ -4,7 +4,8 @@ /* { dg-do run } */ /* { dg-options -O1 -std=c99 } */ -#include stdlib.h +extern void abort (void); +extern void exit (int); #if __INT_MAX__ == 2147483647 typedef unsigned int uint32_t;
Add FLAGS_REG clobber to kxnormode
Hello! We are splitting it to normal XOR that includes clobber, so better start with the clobber from the beginning. 2013-10-20 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (kxnormode): Add FLAGS_REG clobber. Tested on x86_64-pc-linux-gnu and committed to mainline SVN. Uros. Index: i386.md === --- i386.md (revision 203885) +++ i386.md (working copy) @@ -8254,7 +8253,8 @@ (not:SWI12 (xor:SWI12 (match_operand:SWI12 1 register_operand 0,Yk) - (match_operand:SWI12 2 register_operand r,Yk] + (match_operand:SWI12 2 register_operand r,Yk + (clobber (reg:CC FLAGS_REG))] TARGET_AVX512F @ # @@ -8268,15 +8268,15 @@ (not:SWI12 (xor:SWI12 (match_dup 0) - (match_operand:SWI12 1 general_reg_operand] + (match_operand:SWI12 1 general_reg_operand + (clobber (reg:CC FLAGS_REG))] TARGET_AVX512F reload_completed [(parallel [(set (match_dup 0) (xor:HI (match_dup 0)
Re: [Patch, fortran] PR58793 - Wrong value for _vtab for intrinsic types with CLASS(*): storage_size of class(*) gives wrong result
Dear Tobias, Looks good to me. Thanks for the patch. I think that should be done as follow-up patch.In particular, it is not obvious how it should be handled. OK - I'll commit tonight, as is. The program below ICEs in gfc_typenode_for_spec;with both Intel's and Cray's compiler, it gives Character with len 3 Something else I somehow had expected that it maps to a character. I guess that it would do no harm to do that mapping anyway, I'll have a think about it tonight. Cheers and thanks for the review. Paul program test call up(abc) call up(3habc) contains subroutine up(x) class(*) :: x select type(x) type is (character(*)) print *, 'Character with len', len(x) class default print *, 'Something else' end select end subroutine end program test The following - related to TYPE(*) - is also accepted, but I think it shouldn't. (call up(x) is presumably invalid). It fails here at assembler time. program test use iso_c_binding integer,target ::aa call up(c_loc(aa)) ! Probably okay but not matchable? call bar(aa) !OK contains subroutine up(x) class(*) :: x select type(x) ! type is (c_ptr) ! invalid rejected by gfortran ! print *, 'C_ptr' class default print *, 'Something else' end select end subroutine subroutine bar(x) type(*) :: x call up(x) ! Invalid end subroutine bar end program test Tobias 2013-10-20 Paul Thomas pa...@gcc.gnu.org PR fortran 57893 * class.c : Include target-memory.h. (gfc_find_intrinsic_vtab) Build a minimal expression so that gfc_element_size can be used to obtain the storage size, rather that the kind value. 2013-10-20 Paul Thomas pa...@gcc.gnu.org PR fortran 57893 * gfortran.dg/unlimited_polymorphic_12.f90 : New test. -- The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy
Re: [wide-int] Add is_sign_extended optimisation
On Sun, 20 Oct 2013, Richard Sandiford wrote: Another follow-up to yesterday's patch. This one implements Richard's suggestion of having an is_sign_extended trait to optimise cases where excess upper bits are known to be signs rather than undefined. The uses so far are: * make to_shwi () equivalent to slow () * turn eq_p into a simple loop * avoid extensions in sign_mask () * avoid extensions in set_len if the input was already sign-extended The first two are new (compared to wide-int svn) while the second two partially undo the negative effects of yesterday's patch on is_sign_extended values. E.g. bool f (wide_int x, HOST_WIDE_INT y) { return x == y; } now gives: xorl%eax, %eax cmpl$1, 264(%rsp) je .L27 ret .p2align 4,,10 .p2align 3 .L27: cmpq8(%rsp), %rdi sete%al ret And: wide_int f (wide_int x, wide_int y) { return x == y; } gives: movl264(%rsp), %ecx xorl%eax, %eax cmpl528(%rsp), %ecx je .L42 rep ret .p2align 4,,10 .p2align 3 .L42: xorl%eax, %eax jmp .L38 .p2align 4,,10 .p2align 3 .L44: addl$1, %eax cmpl%eax, %ecx je .L43 .L38: movl%eax, %edx movq272(%rsp,%rdx,8), %rsi cmpq%rsi, 8(%rsp,%rdx,8) je .L44 xorl%eax, %eax ret .p2align 4,,10 .p2align 3 .L43: movl$1, %eax ret (which is a bit poor -- je .L42 trivially threads to je. L38, although that's probably only true after RA). The code for: bool f (wide_int x, unsigned HOST_WIDE_INT y) { return x == y; } still needs some work though... The lts_p sequences of wide_int are similar to the ones Mike posted. Tested on x86_64-linux-gnu. OK for wide-int? Ok. Thanks, Richard. Thanks, Richard Index: gcc/rtl.h === --- gcc/rtl.h 2013-10-20 09:38:40.254493991 +0100 +++ gcc/rtl.h 2013-10-20 09:39:28.169894855 +0100 @@ -1410,6 +1410,7 @@ typedef std::pair rtx, enum machine_mod { static const enum precision_type precision_type = VAR_PRECISION; static const bool host_dependent_precision = false; +static const bool is_sign_extended = true; static unsigned int get_precision (const rtx_mode_t ); static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int, const rtx_mode_t ); Index: gcc/tree.h === --- gcc/tree.h2013-10-20 09:38:40.254493991 +0100 +++ gcc/tree.h2013-10-20 09:39:28.170894863 +0100 @@ -5158,6 +5158,7 @@ #define ANON_AGGRNAME_FORMAT __anon_%d { static const enum precision_type precision_type = FLEXIBLE_PRECISION; static const bool host_dependent_precision = false; +static const bool is_sign_extended = false; static unsigned int get_precision (const_tree); static wi::storage_ref decompose (HOST_WIDE_INT *, unsigned int, const_tree); Index: gcc/wide-int.h === --- gcc/wide-int.h2013-10-20 09:39:11.527755622 +0100 +++ gcc/wide-int.h2013-10-20 09:45:17.725820291 +0100 @@ -335,8 +335,21 @@ #define WI_UNARY_RESULT_VAR(RESULT, VAL, struct wide_int_storage; typedef generic_wide_int wide_int_storage wide_int; +template bool SE struct wide_int_ref_storage; -typedef generic_wide_int wide_int_ref_storage wide_int_ref; + +typedef generic_wide_int wide_int_ref_storage false wide_int_ref; + +/* This can be used instead of wide_int_ref if the referenced value is + known to have type T. It carries across properties of T's representation, + such as whether excess upper bits in a HWI are defined, and can therefore + help avoid redundant work. + + The macro could be replaced with a template typedef, once we're able + to use those. */ +#define WIDE_INT_REF_FOR(T) \ + generic_wide_int \ +wide_int_ref_storage wi::int_traits T::is_sign_extended /* Public functions for querying and operating on integers. */ namespace wi @@ -520,18 +533,6 @@ wi::storage_ref::get_val () const return val; } -namespace wi -{ - template - struct int_traits wi::storage_ref - { -static const enum precision_type precision_type = VAR_PRECISION; -/* wi::storage_ref can be a reference to a primitive type, - so this is the conservatively-correct setting. */ -static const bool host_dependent_precision = true; - }; -} - /* This class defines an integer type using the storage provided by the template argument. The storage class must provide the
[PATCH] Fix regression parts of PR58742
This fixes a regression introduced in GCC 4.2 after which we no longer fold (unsigned) (X /[ex] 4) * 4 to (unsigned) X. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2013-10-21 Richard Biener rguent...@suse.de PR middle-end/58742 * fold-const.c (fold_binary_loc): Fold ((T) (X /[ex] C)) * C to (T) X for sign-changing conversions (or no conversion). * c-c++-common/fold-divmul-1.c: New testcase. Index: gcc/fold-const.c === *** gcc/fold-const.c(revision 203885) --- gcc/fold-const.c(working copy) *** fold_binary_loc (location_t loc, *** 11002,11007 --- 11002,11014 fold_build2_loc (loc, MULT_EXPR, type, build_int_cst (type, 2) , arg1)); + /* ((T) (X /[ex] C)) * C cancels out if the conversion is +sign-changing only. */ + if (TREE_CODE (arg1) == INTEGER_CST + TREE_CODE (arg0) == EXACT_DIV_EXPR + operand_equal_p (arg1, TREE_OPERAND (arg0, 1), 0)) + return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0)); + strict_overflow_p = false; if (TREE_CODE (arg1) == INTEGER_CST 0 != (tem = extract_muldiv (op0, arg1, code, NULL_TREE, Index: gcc/testsuite/c-c++-common/fold-divmul-1.c === *** gcc/testsuite/c-c++-common/fold-divmul-1.c (revision 0) --- gcc/testsuite/c-c++-common/fold-divmul-1.c (working copy) *** *** 0 --- 1,11 + /* { dg-do compile } */ + /* { dg-options -fdump-tree-original } */ + + int * + fx (int *b, int *e) + { + return b + (e - b); + } + + /* { dg-final { scan-tree-dump-not /\\\[ex\\\] original } } */ + /* { dg-final { cleanup-tree-dump original } } */
[PATCH, MPX, 2/X] Pointers Checker [1/25] Hooks
Hi, This patch starts the series which introduces Pointers Checker and its support in i386 via Intel MPX. Pointers Checker is described on Wiki page - http://gcc.gnu.org/wiki/Intel%20MPX%20support%20in%20the%20GCC%20compiler. This series actually replaces previously sent patch (http://gcc.gnu.org/ml/gcc-patches/2013-08/msg01167.html) which seems too inconvenient for review. The first patch in a series introduces new target and language hooks used by Pointers Checker. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2013-10-21 Ilya Enkovich ilya.enkov...@intel.com * target.def (builtin_chkp_function): New. (chkp_bound_type): New. (chkp_bound_mode): New. (fn_abi_va_list_bounds_size): New. (load_bounds_for_arg): New. (store_bounds_for_arg): New. * targhooks.h (default_load_bounds_for_arg): New. (default_store_bounds_for_arg): New. (default_fn_abi_va_list_bounds_size): New. (default_chkp_bound_type): New. (default_chkp_bound_mode): New. (default_builtin_chkp_function): New. * targhooks.c (default_load_bounds_for_arg): New. (default_store_bounds_for_arg): New. (default_fn_abi_va_list_bounds_size): New. (default_chkp_bound_type): New. (default_chkp_bound_mode); New. (default_builtin_chkp_function): New. * doc/tm.texi.in (TARGET_FN_ABI_VA_LIST_BOUNDS_SIZE): New. (TARGET_LOAD_BOUNDS_FOR_ARG): New. (TARGET_STORE_BOUNDS_FOR_ARG): New. (TARGET_BUILTIN_CHKP_FUNCTION): New. (TARGET_CHKP_BOUND_TYPE): New. (TARGET_CHKP_BOUND_MODE): New. * doc/tm.texi: Regenerated. * langhooks.h (lang_hooks): Add chkp_supported field. * langhooks-def.h (LANG_HOOKS_CHKP_SUPPORTED): New. (LANG_HOOKS_INITIALIZER); Add LANG_HOOKS_CHKP_SUPPORTED. diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 8d220f3..01462a2 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -4334,6 +4334,11 @@ This hook returns the va_list type of the calling convention specified by The default version of this hook returns @code{va_list_type_node}. @end deftypefn +@deftypefn {Target Hook} tree TARGET_FN_ABI_VA_LIST_BOUNDS_SIZE (tree @var{fndecl}) +This hook returns size for va_list object or integer_zero_node if +it does not have any (e.g. is scalar pointer to the stack). +@end deftypefn + @deftypefn {Target Hook} tree TARGET_CANONICAL_VA_LIST_TYPE (tree @var{type}) This hook returns the va_list type of the calling convention specified by the type of @var{type}. If @var{type} is not a valid va_list type, it returns @@ -5151,6 +5156,16 @@ defined, then define this hook to return @code{true} if Otherwise, you should not define this hook. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_LOAD_BOUNDS_FOR_ARG (rtx, @var{rtx}, @var{rtx}) +This hook is used to emit insn to load arg's bounds +in case bounds are not passed on register. Return loaded bounds +@end deftypefn + +@deftypefn {Target Hook} void TARGET_STORE_BOUNDS_FOR_ARG (rtx, @var{rtx}, @var{rtx}, @var{rtx}) +This hook is used to emit insn to store arg's bounds +in case bounds are not passed on register. +@end deftypefn + @node Trampolines @section Trampolines for Nested Functions @cindex trampolines for nested functions @@ -10907,6 +10922,18 @@ ignored. This function should return the result of the call to the built-in function. @end deftypefn +@deftypefn {Target Hook} tree TARGET_BUILTIN_CHKP_FUNCTION (unsigned @var{fcode}) +Pointers checker instrumentation pass uses this hook to obtain +target-specific functions which implement specified generic checker +builtins. +@end deftypefn +@deftypefn {Target Hook} tree TARGET_CHKP_BOUND_TYPE (void) +Return type to be used for bounds +@end deftypefn +@deftypefn {Target Hook} {enum machine_mode} TARGET_CHKP_BOUND_MODE (void) +Return mode to be used for bounds. +@end deftypefn + @deftypefn {Target Hook} tree TARGET_RESOLVE_OVERLOADED_BUILTIN (unsigned int @var{loc}, tree @var{fndecl}, void *@var{arglist}) Select a replacement for a machine specific built-in function that was set up by @samp{TARGET_INIT_BUILTINS}. This is done diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 863e843a..2828361 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3694,6 +3694,8 @@ stack. @hook TARGET_FN_ABI_VA_LIST +@hook TARGET_FN_ABI_VA_LIST_BOUNDS_SIZE + @hook TARGET_CANONICAL_VA_LIST_TYPE @hook TARGET_GIMPLIFY_VA_ARG_EXPR @@ -4064,6 +4066,10 @@ These machine description macros help implement varargs: @hook TARGET_PRETEND_OUTGOING_VARARGS_NAMED +@hook TARGET_LOAD_BOUNDS_FOR_ARG + +@hook TARGET_STORE_BOUNDS_FOR_ARG + @node Trampolines @section Trampolines for Nested Functions @cindex trampolines for nested functions @@ -8184,6 +8190,10 @@ to by @var{ce_info}. @hook TARGET_EXPAND_BUILTIN +@hook TARGET_BUILTIN_CHKP_FUNCTION +@hook TARGET_CHKP_BOUND_TYPE +@hook
Re: [PATCH, MPX, 2/X] Pointers Checker [1/25] Hooks
On Mon, 21 Oct 2013, Ilya Enkovich wrote: +DEFHOOK +(builtin_chkp_function, + Pointers checker instrumentation pass uses this hook to obtain\n\ +target-specific functions which implement specified generic checker\n\ +builtins., + tree, (unsigned fcode), + default_builtin_chkp_function) I don't think that's enough detail. The audience for this hook description is back-end maintainers wanting to implement such hooks for their back ends, and the hook description should give sufficient information to do so. This description says nothing at all about the semantics of the hook argument or return value. If it seems difficult to describe things sufficiently in the context of individual hook descriptions, maybe an overview of the feature and implementation approach is needed as a new section in the internals manual, with hook descriptions then referring to that section, or going in appropriate places within that section (if the section is in tm.texi.in). +DEFHOOK +(fn_abi_va_list_bounds_size, + This hook returns size for va_list object or integer_zero_node if\n\ +it does not have any (e.g. is scalar pointer to the stack)., + tree, (tree fndecl), + default_fn_abi_va_list_bounds_size) @code{va_list}, @code{integer_zero_node}, specify semantics of fndecl argument. DEFHOOK +(load_bounds_for_arg, + This hook is used to emit insn to load arg's bounds\n\ +in case bounds are not passed on register. Return loaded bounds, + rtx, (rtx, rtx, rtx), + default_load_bounds_for_arg) You need to name all the arguments and explain their semantics by name in the documentation (which should end with .). +DEFHOOK +(store_bounds_for_arg, + This hook is used to emit insn to store arg's bounds\n\ +in case bounds are not passed on register., + void, (rtx, rtx, rtx, rtx), + default_store_bounds_for_arg) Likewise. -- Joseph S. Myers jos...@codesourcery.com
[PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
Hi, This patch introduces built-in functions used by Pointers Checker and flag to enable Pointers Checker. Builtins available for user are expanded in expand_builtin. All other builtins are not allowed in expand until generic version of Pointers Cheker is implemented. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2013-10-04 Ilya Enkovich ilya.enkov...@intel.com * builtin-types.def (BT_FN_VOID_CONST_PTR): New. (BT_FN_PTR_CONST_PTR): New. (BT_FN_CONST_PTR_CONST_PTR): New. (BT_FN_PTR_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR): New. (BT_FN_VOID_PTRPTR_CONST_PTR): New. (BT_FN_VOID_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New. * chkp-builtins.def: New. * builtins.def: include chkp-builtins.def. (DEF_CHKP_BUILTIN): New. * builtins.c (expand_builtin): Support BUILT_IN_CHKP_INIT_PTR_BOUNDS, BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS, BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS, BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS, BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS, BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND, BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL, BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET, BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND, BUILT_IN_CHKP_NARROW, BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER. * common.opt (fcheck-pointers): New. * toplev.c (process_options): Check Pointers Checker is supported. * doc/extend.texi: Document Pointers Checker built-in functions. diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index 2634ecc..c6c5e5c 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -224,12 +224,15 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT64_DFLOAT64, BT_DFLOAT64, BT_DFLOAT64) DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR) +DEF_FUNCTION_TYPE_1 (BT_FN_VOID_CONST_PTR, BT_VOID, BT_CONST_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT) DEF_FUNCTION_TYPE_1 (BT_FN_ULONG_ULONG, BT_ULONG, BT_ULONG) DEF_FUNCTION_TYPE_1 (BT_FN_ULONGLONG_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG) DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT16, BT_UINT16, BT_UINT16) DEF_FUNCTION_TYPE_1 (BT_FN_UINT32_UINT32, BT_UINT32, BT_UINT32) DEF_FUNCTION_TYPE_1 (BT_FN_UINT64_UINT64, BT_UINT64, BT_UINT64) +DEF_FUNCTION_TYPE_1 (BT_FN_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR) +DEF_FUNCTION_TYPE_1 (BT_FN_CONST_PTR_CONST_PTR, BT_CONST_PTR, BT_CONST_PTR) DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR, BT_FN_VOID_PTR) @@ -341,6 +344,10 @@ DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VPTR_INT, BT_VOID, BT_VOLATILE_PTR, BT_INT) DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_VPTR_INT, BT_BOOL, BT_VOLATILE_PTR, BT_INT) DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_SIZE_CONST_VPTR, BT_BOOL, BT_SIZE, BT_CONST_VOLATILE_PTR) +DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_SIZE, BT_PTR, BT_CONST_PTR, BT_SIZE) +DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR, BT_CONST_PTR) +DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTRPTR_CONST_PTR, BT_VOID, BT_PTR_PTR, BT_CONST_PTR) +DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, BT_SIZE) DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR) @@ -425,6 +432,7 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, BT_VOLATILE_PTR, BT_I2, BT DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, BT_I4, BT_INT) DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I8_INT, BT_VOID, BT_VOLATILE_PTR, BT_I8, BT_INT) DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I16_INT, BT_VOID, BT_VOLATILE_PTR, BT_I16, BT_INT) +DEF_FUNCTION_TYPE_3 (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE, BT_PTR, BT_CONST_PTR, BT_CONST_PTR, BT_SIZE) DEF_FUNCTION_TYPE_4 (BT_FN_SIZE_CONST_PTR_SIZE_SIZE_FILEPTR, BT_SIZE, BT_CONST_PTR, BT_SIZE, BT_SIZE, BT_FILEPTR) diff --git a/gcc/builtins.c b/gcc/builtins.c index 3b16d59..b8dec3f 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -5861,7 +5861,18 @@ expand_builtin (tree exp, rtx target, rtx subtarget, enum machine_mode mode, fcode != BUILT_IN_EXECVE fcode != BUILT_IN_ALLOCA fcode != BUILT_IN_ALLOCA_WITH_ALIGN - fcode != BUILT_IN_FREE) + fcode != BUILT_IN_FREE + fcode != BUILT_IN_CHKP_SET_PTR_BOUNDS + fcode != BUILT_IN_CHKP_INIT_PTR_BOUNDS + fcode != BUILT_IN_CHKP_NULL_PTR_BOUNDS + fcode != BUILT_IN_CHKP_COPY_PTR_BOUNDS + fcode != BUILT_IN_CHKP_NARROW_PTR_BOUNDS + fcode != BUILT_IN_CHKP_STORE_PTR_BOUNDS + fcode != BUILT_IN_CHKP_CHECK_PTR_LBOUNDS + fcode != BUILT_IN_CHKP_CHECK_PTR_UBOUNDS + fcode !=
RE: [PATCH] Fix PR58682
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: 18 October 2013 13:12 To: Paulo Matos Cc: Mike Stump; Kyrill Tkachov; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix PR58682 Please re-post the latest version of the patch and CC Honza. Richard. Patch attached. OK to submit? Bootstrapped x86_64-unknown-linux-gnu. 2013-10-21 Paulo Matos pma...@broadcom.com * ipa-inline.c (inline_small_functions): Update max_count if new edges are found after inlining. -- Paulo Matos pr58682.patch Description: pr58682.patch
Re: [PATCH] Fix regression parts of PR58742
On Mon, Oct 21, 2013 at 01:30:37PM +0200, Richard Biener wrote: This fixes a regression introduced in GCC 4.2 after which we no longer fold (unsigned) (X /[ex] 4) * 4 to (unsigned) X. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Don't we want to do the same optimization on GIMPLE? I mean, if the testcase is int * fx (int *b, int *e) { ptrdiff_t p = e - b; return b + p; } (or s/ptrdiff_t/long long/, etc.)? Jakub
Re: [PATCH] Fix regression parts of PR58742
On Mon, 21 Oct 2013, Jakub Jelinek wrote: On Mon, Oct 21, 2013 at 01:30:37PM +0200, Richard Biener wrote: This fixes a regression introduced in GCC 4.2 after which we no longer fold (unsigned) (X /[ex] 4) * 4 to (unsigned) X. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Don't we want to do the same optimization on GIMPLE? I mean, if the testcase is int * fx (int *b, int *e) { ptrdiff_t p = e - b; return b + p; } (or s/ptrdiff_t/long long/, etc.)? Sure, but that's not the regression part either ;) Richard.
[PATCH, MPX, 2/X] Pointers Checker [3/25] Attributes
Hi, This patch adds attributes 'bnd_variable_size' and 'bnd_legacy' used by Pointers Checker. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2013-10-21 Ilya Enkovich ilya.enkov...@intel.com * c-family/c-common.c (handle_bnd_variable_size_attribute): New. (handle_bnd_legacy): New. (c_common_attribute_table): Add bnd_variable_size and bnd_legacy. * doc/extend.texi: Document bnd_variable_size and bnd_legacy attributes. diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 8ecb70c..3902909 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -371,6 +371,8 @@ static tree ignore_attribute (tree *, tree, tree, int, bool *); static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *); static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *); static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *); +static tree handle_bnd_variable_size_attribute (tree *, tree, tree, int, bool *); +static tree handle_bnd_legacy (tree *, tree, tree, int, bool *); static void check_function_nonnull (tree, int, tree *); static void check_nonnull_arg (void *, tree, unsigned HOST_WIDE_INT); @@ -745,8 +747,12 @@ const struct attribute_spec c_common_attribute_table[] = The name contains space to prevent its usage in source code. */ { fn spec, 1, 1, false, true, true, handle_fnspec_attribute, false }, - { warn_unused,0, 0, false, false, false, + { warn_unused, 0, 0, false, false, false, handle_warn_unused_attribute, false }, + { bnd_variable_size, 0, 0, true, false, false, + handle_bnd_variable_size_attribute, false }, + { bnd_legacy, 0, 0, true, false, false, + handle_bnd_legacy, false }, { NULL, 0, 0, false, false, false, NULL, false } }; @@ -8000,6 +8006,38 @@ handle_fnspec_attribute (tree *node ATTRIBUTE_UNUSED, tree ARG_UNUSED (name), return NULL_TREE; } +/* Handle a bnd_variable_size attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_bnd_variable_size_attribute (tree *node, tree name, tree ARG_UNUSED (args), + int ARG_UNUSED (flags), bool *no_add_attrs) +{ + if (TREE_CODE (*node) != FIELD_DECL) +{ + warning (OPT_Wattributes, %qE attribute ignored, name); + *no_add_attrs = true; +} + + return NULL_TREE; +} + +/* Handle a bnd_legacy attribute; arguments as in + struct attribute_spec.handler. */ + +static tree +handle_bnd_legacy (tree *node, tree name, tree ARG_UNUSED (args), + int ARG_UNUSED (flags), bool *no_add_attrs) +{ + if (TREE_CODE (*node) != FUNCTION_DECL) +{ + warning (OPT_Wattributes, %qE attribute ignored, name); + *no_add_attrs = true; +} + + return NULL_TREE; +} + /* Handle a warn_unused attribute; arguments as in struct attribute_spec.handler. */ diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index ef422ad..7701d60 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -2137,7 +2137,7 @@ attributes are currently defined for functions on all targets: @code{warn_unused_result}, @code{nonnull}, @code{gnu_inline}, @code{externally_visible}, @code{hot}, @code{cold}, @code{artificial}, @code{no_sanitize_address}, @code{no_address_safety_analysis}, -@code{no_sanitize_undefined}, +@code{no_sanitize_undefined}, @code{bnd_legacy}, @code{error} and @code{warning}. Several other attributes are defined for functions on particular target systems. Other attributes, including @code{section} are @@ -3508,6 +3508,12 @@ The @code{no_sanitize_undefined} attribute on functions is used to inform the compiler that it should not check for undefined behavior in the function when compiling with the @option{-fsanitize=undefined} option. +@item bnd_legacy +@cindex @code{bnd_legacy} function attribute +The @code{bnd_legacy} attribute on functions is used to inform +compiler that function should not be instrumented when compiled +with @option{-fcheck-pointers} option. + @item regparm (@var{number}) @cindex @code{regparm} attribute @cindex functions that are passed arguments in registers on the 386 @@ -5280,12 +5286,12 @@ placed in either the @code{.bss_below100} section or the The keyword @code{__attribute__} allows you to specify special attributes of @code{struct} and @code{union} types when you define such types. This keyword is followed by an attribute specification -inside double parentheses. Seven attributes are currently defined for +inside double parentheses. Eight attributes are currently defined for types: @code{aligned}, @code{packed}, @code{transparent_union}, -@code{unused}, @code{deprecated}, @code{visibility}, and -@code{may_alias}. Other attributes are defined for functions -(@pxref{Function
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On Mon, 21 Oct 2013, Ilya Enkovich wrote: + if (flag_check_pointers) +{ + if (flag_lto) + sorry (Pointers checker is not yet fully supported for link-time optimization); That sounds wrong. It suggests some bug somewhere in your patch series failing to allow for LTO, which should be fixed. At least give a more detailed explanation in a comment of what would need to change to remove this call to sorry (). + if (targetm.chkp_bound_mode () == VOIDmode) + error (-fcheck-pointers is not supported for this target.); Also note GNU Coding Standards for diagnostics. They should not start with uppercase letters or end with .. -- Joseph S. Myers jos...@codesourcery.com
[PATCH, MPX, 2/X] Pointers Checker [4/25] Constructors
Hi, This patch introduces two new contructor types supported by cgraph_build_static_cdtor. 'B' type is used to initialize static objects (bounds) created by Pointers Checker. The difference of this type from the regular constructor is that 'B' constructor is never instrumented by Pointers Checker. 'P' type is used by Pointers Checker to generate constructors to initialize bounds of statically initialized pointers. Pointers Checker remove all stores from such constructors after instrumentation. Since 'P' type constructors are created for statically initialized objects, we need to avoid creation of such objects during its gimplification. New restriction was added to gimplify_init_constructor. Bootstrapped and checked on linux-x86_64. Thanks, Ilya -- gcc/ 2013-10-21 Ilya Enkovich ilya.enkov...@intel.com * ipa.c (cgraph_build_static_cdtor_1): Support contructors with chkp ctor and bnd_legacy attributes. * gimplify.c (gimplify_init_constructor): Avoid infinite loop during gimplification of bounds initializer. diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 86bda77..7c350fd 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -4033,10 +4033,19 @@ gimplify_init_constructor (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p, individual element initialization. Also don't do this for small all-zero initializers (which aren't big enough to merit clearing), and don't try to make bitwise copies of - TREE_ADDRESSABLE types. */ + TREE_ADDRESSABLE types. + + We cannot apply such transformation when compiling chkp static + initializer because creation of initializer image in the memory + will require static initialization of bounds for it. It should + result in another gimplification of similar initializer and we + may fall into infinite loop. */ if (valid_const_initializer !(cleared || num_nonzero_elements == 0) -!TREE_ADDRESSABLE (type)) +!TREE_ADDRESSABLE (type) +(!current_function_decl + || !lookup_attribute (chkp ctor, + DECL_ATTRIBUTES (current_function_decl { HOST_WIDE_INT size = int_size_in_bytes (type); unsigned int align; diff --git a/gcc/ipa.c b/gcc/ipa.c index 92343fb2..36cc621 100644 --- a/gcc/ipa.c +++ b/gcc/ipa.c @@ -1259,9 +1259,11 @@ make_pass_ipa_whole_program_visibility (gcc::context *ctxt) } /* Generate and emit a static constructor or destructor. WHICH must - be one of 'I' (for a constructor) or 'D' (for a destructor). BODY - is a STATEMENT_LIST containing GENERIC statements. PRIORITY is the - initialization priority for this constructor or destructor. + be one of 'I' (for a constructor), 'D' (for a destructor), 'P' + (for chp static vars constructor) or 'B' (for chkp static bounds + constructor). BODY is a STATEMENT_LIST containing GENERIC + statements. PRIORITY is the initialization priority for this + constructor or destructor. FINAL specify whether the externally visible name for collect2 should be produced. */ @@ -1320,6 +1322,20 @@ cgraph_build_static_cdtor_1 (char which, tree body, int priority, bool final) DECL_STATIC_CONSTRUCTOR (decl) = 1; decl_init_priority_insert (decl, priority); break; +case 'P': + DECL_STATIC_CONSTRUCTOR (decl) = 1; + DECL_ATTRIBUTES (decl) = tree_cons (get_identifier (chkp ctor), + NULL, + NULL_TREE); + decl_init_priority_insert (decl, priority); + break; +case 'B': + DECL_STATIC_CONSTRUCTOR (decl) = 1; + DECL_ATTRIBUTES (decl) = tree_cons (get_identifier (bnd_legacy), + NULL, + NULL_TREE); + decl_init_priority_insert (decl, priority); + break; case 'D': DECL_STATIC_DESTRUCTOR (decl) = 1; decl_fini_priority_insert (decl, priority); @@ -1337,9 +1353,11 @@ cgraph_build_static_cdtor_1 (char which, tree body, int priority, bool final) } /* Generate and emit a static constructor or destructor. WHICH must - be one of 'I' (for a constructor) or 'D' (for a destructor). BODY - is a STATEMENT_LIST containing GENERIC statements. PRIORITY is the - initialization priority for this constructor or destructor. */ + be one of 'I' (for a constructor), 'D' (for a destructor), 'P' + (for chkp static vars constructor) or 'B' (for chkp static bounds + constructor). BODY is a STATEMENT_LIST containing GENERIC + statements. PRIORITY is the initialization priority for this + constructor or destructor. */ void cgraph_build_static_cdtor (char which, tree body, int priority)
Re: [patch 1/8] Remove gimple-low.h from the tree-ssa.h include list.
On Fri, Oct 18, 2013 at 9:01 PM, Andrew MacLeod amacl...@redhat.com wrote: On 10/18/2013 12:10 PM, Jeff Law wrote: On 10/18/13 07:37, Andrew MacLeod wrote: gimple_check_call_matching_types() was being called from 3 or 4 different files,and seemed more appropriate as a cgraph routine (which called it 3 times). So I moved that and its helper to cgraph.c. After that, I only needed to update 4 .c files to directly include gimple-low.h bootstraps on x86_64-unknown-linux-gnu with no new regressions. OK? I'm less sure about this one. I don't see that it clearly belongs in either location. I could easily see it moving out of cgraph.c at some point. If it furthers your cleanup efforts at this time, that's fine. Just be aware that, at least IMHO, this routine doesn't fit clearly into either location and I wouldn't be surprised if we have to come back to it at some point. possibly, there didn't seem to be a tree call file for call helpers, so I went with locality :-) I will happily move it to a more appropriate place as a follow up. cgraph.h is already a collector for the exports for a half dozen .c file. It needs the same treatment tree-flow got. I think that we should remove all indirect includes again at some point to get a clearer picture of what is used where. After all prototypes are in the appropriate header files, of course... So the less indirect includes the better (for now). Richard. Andrew
Re: [RFC] Isolate simplify paths with undefined behaviour
On Fri, Oct 18, 2013 at 7:12 PM, Jeff Law l...@redhat.com wrote: Back in 2011 I wrote code to detect cases when traversing a particular path could be proven to trigger undefined behaviour (such as a null pointer dereference). That original patch would find the control dependent edges leading to the dereference and eliminate those edges. The result being those paths with undefined behaviour simply vanished. The problem with that implementation is there could have been observable side effects on those paths prior to triggering the undefined behaviour. At that time I speculated we could isolate the path (via block copying) with undefined behaviour. On the duplicate path we'd replace the undefined behaviour with a trap and remove only the outgoing edges. That would still preserve any visible side effects on the undefined path up to the undefined behaviour and we still often get to simplify the main path, though not as much as the original version from 2011. That's precisely what this patch does. When we have a PHI which feeds a memory dereference in the same block as the PHI and one (or more) of the PHI args is NULL we duplicate the block, redirect incoming edges with the NULL PHI args to the duplicate and change the statement with the memory dereference to a trap. You can certainly ask is this actually helpful. In my experience, yes. I'm regularly seeing stuff like x2 = PHI (x1, x1, 0, 0); *x2 When we isolate the paths with NULL incoming values, we're left with x2 = PHI (x1, x1) And we can propagate x1 into uses of x2. The block now has just one pred and the current block can sometimes be merged into that single pred. I've seen the combination of those two then lead to identification additional common subexpressions. You might also ask how often such paths show up. In a bootstrap about a week ago, this triggered ~3500 times. The code tries to be reasonably smart and re-use an existing duplicate when there's multiple NULL values in a PHI node. That triggers often enough to be worth the marginal amount of book keeping. The code doesn't yet handle the case where there's multiple dereferences in the block. There's no guarantee the first dereference will be the first one we find on the immediate use list. It's on my TODO list. There's no reason this same framework couldn't be used to do the same thing for an out of bounds array access. Similarly, it wouldn't be hard to issue a warning when these transformations are applied. Posting at this time to get some feedback. Planning a formal RFA prior to close of 4.9 stage1. And, of course, this bootstrapps and regression tests. 20030711-3 is clearly optimized further with this patch, but I'll cobble together some more testcases next week. I'm not familiar with the options stuff, so if someone could look at that more closely, I'd appreciate it. Similarly for the bits to create a new pass structure. Thoughts, comments, flames? I wonder why this isn't part of the regular jump-threading code - after all the opportunity is to thead to __builtin_unreachable () ;) Otherwise the question is always where you'd place this pass and whether it enables jump threading or CSE opportunities (that at least it does). Richard. * Makefile.in (OBJS): Add tree-ssa-isolate-paths. * common.opt (-ftree-isolate-paths): Add option and documentation. * opts.c (default_options_table): Add OPT_ftree_isolate_paths. * passes.def: Add path isolation pass. * timevar.def (TV_TREE_SSA_ISOLATE_PATHS): New timevar. * tree-pass.h (make_isolate_paths): Declare. * tree-ssa-isolate-paths.c: New file. * gcc.dg/pr38984.c: Add -fno-tree-isolate-paths. * gcc.dg/tree-ssa/20030711-3.c: Update expected output. * testsuite/libmudflap.c/fail20-frag.c: Add -fno-tree-isolate-paths. diff --git a/gcc/Makefile.in b/gcc/Makefile.in index ba39ac9..e7c18bc 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1412,6 +1412,7 @@ OBJS = \ tree-ssa-dse.o \ tree-ssa-forwprop.o \ tree-ssa-ifcombine.o \ + tree-ssa-isolate-paths.o \ tree-ssa-live.o \ tree-ssa-loop-ch.o \ tree-ssa-loop-im.o \ diff --git a/gcc/common.opt b/gcc/common.opt index 1f11fcd..1fd1bcc 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2104,6 +2104,12 @@ foptimize-strlen Common Report Var(flag_optimize_strlen) Optimization Enable string length optimizations on trees +ftree-isolate-paths +Common Report Var(flag_tree_isolate_paths) Init(1) Optimization +Detect paths which trigger undefined behaviour. Isolate those paths from +the main control flow and turn the statement with undefined behaviour into +a trap. + ftree-loop-distribution Common Report Var(flag_tree_loop_distribution) Optimization Enable loop distribution on trees diff --git a/gcc/opts.c b/gcc/opts.c index 728d36d..b6ed0c2
Re: [PATCH] Fix PR58682
-Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: 18 October 2013 13:12 To: Paulo Matos Cc: Mike Stump; Kyrill Tkachov; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix PR58682 Please re-post the latest version of the patch and CC Honza. Richard. Patch attached. OK to submit? Bootstrapped x86_64-unknown-linux-gnu. 2013-10-21 Paulo Matos pma...@broadcom.com * ipa-inline.c (inline_small_functions): Update max_count if new edges are found after inlining. This won't work - the max_count is there to get things correctly scaled by badness calculation. Changing max_count would mean the need to recompute all badnesses in the fibheap keys. I would just cap the edge_count in badness calculation. Honza -- Paulo Matos
Re: [PATCH]Fix computation of offset in ivopt
On Fri, Oct 18, 2013 at 5:48 PM, Bin.Cheng amker.ch...@gmail.com wrote: Hi Richard, Is the first patch OK? Since there is a patch depending on it. http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01931.html Yes. Richard. Thanks. bin On Fri, Oct 18, 2013 at 7:18 PM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Oct 17, 2013 at 7:52 AM, bin.cheng bin.ch...@arm.com wrote: Hi, As noted in previous messages, GCC forces offset to unsigned in middle end. It also gets offset value and stores it in HOST_WIDE_INT then uses it in various computation. In this scenario, It should use int_cst_value to do additional sign extension against the type of tree node, otherwise we might lose sign information. Take function fold_plusminus_mult_expr as an example, the sign information of arg01/arg11 might be lost because it uses TREE_INT_CST_LOW directly. I think this is the offender of the problem in this thread. I think we should fix the offender, rather the hacking strip_offset as in the original patch, so I split original patch into two as attached, with one fixing the offset of COMPONENT_REF in strip_offset_1, the other fixing the mentioned sign extension problem. Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 203267) +++ gcc/fold-const.c (working copy) @@ -7270,8 +7270,8 @@ fold_plusminus_mult_expr (location_t loc, enum tre HOST_WIDE_INT int01, int11, tmp; bool swap = false; tree maybe_same; - int01 = TREE_INT_CST_LOW (arg01); - int11 = TREE_INT_CST_LOW (arg11); + int01 = int_cst_value (arg01); + int11 = int_cst_value (arg11); this is not correct - it will mishandle all large unsigned numbers. The real issue is that we rely on twos-complement arithmetic to work when operating on pointer offsets (because we convert them all to unsigned sizetype). That makes interpreting the offsets or expressions that compute offsets hard (or impossible in some cases), as you can see from the issues in IVOPTs. The above means that strip_offset_1 cannot reliably look through MULT_EXPRs as that can make an offset X * CST that is effectively signed surely unsigned in the process. Think of this looking into X * CST as performing a division - whose result is dependent on the sign of X which we lost with our unsigned casting. Now, removing the MULT_EXPR look-through might be too pessimizing ... but it may be worth trying to see if it fixes your issue? In this context I also remember a new bug filed recently of us not folding x /[ex] 4 * 4 to x. In the past I was working multiple times on stopping doing that (make all offsets unsigned), but I never managed to finish ... Richard. Bootstrap and test on x86/x86_64/arm. Is it OK? Thanks. bin Patch a: 2013-10-17 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (strip_offset_1): Change parameter type. Count DECL_FIELD_BIT_OFFSET when computing offset for COMPONENT_REF. Patch b: 2013-10-17 Bin Cheng bin.ch...@arm.com * fold-const.c (fold_plusminus_mult_expr): Use int_cst_value instead of TREE_INT_CST_LOW. gcc/testsuite/ChangeLog 2013-10-17 Bin Cheng bin.ch...@arm.com * gcc.dg/tree-ssa/ivopts-outside-loop-use-1.c: New test. -Original Message- From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Wednesday, October 02, 2013 4:32 PM To: Bin.Cheng Cc: Bin Cheng; GCC Patches Subject: Re: [PATCH]Fix computation of offset in ivopt On Tue, Oct 1, 2013 at 6:13 PM, Bin.Cheng amker.ch...@gmail.com wrote: On Tue, Oct 1, 2013 at 6:50 PM, Richard Biener richard.guent...@gmail.com wrote: On Mon, Sep 30, 2013 at 7:39 AM, bin.cheng bin.ch...@arm.com wrote: I don't think you need + /* Sign extend off if expr is in type which has lower precision + than HOST_WIDE_INT. */ + if (TYPE_PRECISION (TREE_TYPE (expr)) = HOST_BITS_PER_WIDE_INT) +off = sext_hwi (off, TYPE_PRECISION (TREE_TYPE (expr))); at least it would be suspicious if you did ... There is a problem for example of the first message. The iv base if like: pretmp_184 + ((sizetype) KeyIndex_180 + 1073741823) * 4 I am not sure why but it seems (-4/0xFFFC) is represented by (1073741823*4). For each operand strip_offset_1 returns exactly the positive number and result of multiplication never get its chance of sign extend. That's why the sign extend is necessary to fix the problem. Or it should be fixed elsewhere by representing iv base with: pretmp_184 + ((sizetype) KeyIndex_180 + 4294967295) * 4 in the first place. Yeah, that's why I said the whole issue with forcing all offsets to be unsigned is a mess ... There is really no good answer besides not doing that I fear. Yes, in the above case we could fold the whole thing differently (interpret the offset of a POINTER_PLUS_EXPR as signed). You can try tracking down
[PATCH, MPX, 2/X] Pointers Checker [5/25] Tree and gimple ifaces
Hi, This patch adds bounds related iface macros and functions for tree and gimple. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2013-10-21 Ilya Enkovich ilya.enkov...@intel.com * tree-core.h (tree_index): Add TI_BOUND_TYPE. * tree.h (BOUND_P): New. (BOUNDED_TYPE_P): New. (BOUNDED_P): New. (bound_type_node): New. * tree.c (build_common_tree_nodes): Initialize bound_type_node. * gimple.h (gimple_call_get_nobnd_arg_index): New. (gimple_call_num_nobnd_args): New. (gimple_call_nobnd_arg): New. (gimple_return_retbnd): New. (gimple_return_set_retbnd): New * gimple.c (gimple_build_return): Increase number of ops for return statement. (gimple_call_get_nobnd_arg_index): New. * gimple-pretty-print.c (dump_gimple_return): Print second op. diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c index 1e985e0..fddcee0 100644 --- a/gcc/gimple-pretty-print.c +++ b/gcc/gimple-pretty-print.c @@ -535,11 +535,12 @@ dump_gimple_assign (pretty_printer *buffer, gimple gs, int spc, int flags) static void dump_gimple_return (pretty_printer *buffer, gimple gs, int spc, int flags) { - tree t; + tree t, t2; t = gimple_return_retval (gs); + t2 = gimple_return_retbnd (gs); if (flags TDF_RAW) -dump_gimple_fmt (buffer, spc, flags, %G %T, gs, t); +dump_gimple_fmt (buffer, spc, flags, %G %T %T, gs, t, t2); else { pp_string (buffer, return); @@ -548,6 +549,11 @@ dump_gimple_return (pretty_printer *buffer, gimple gs, int spc, int flags) pp_space (buffer); dump_generic_node (buffer, t, spc, flags, false); } + if (t2) + { + pp_string (buffer, , ); + dump_generic_node (buffer, t2, spc, flags, false); + } pp_semicolon (buffer); } } diff --git a/gcc/gimple.c b/gcc/gimple.c index e12f7d9..eb2d365 100644 --- a/gcc/gimple.c +++ b/gcc/gimple.c @@ -179,7 +179,7 @@ gimple_build_with_ops_stat (enum gimple_code code, unsigned subcode, gimple gimple_build_return (tree retval) { - gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 1); + gimple s = gimple_build_with_ops (GIMPLE_RETURN, ERROR_MARK, 2); if (retval) gimple_return_set_retval (s, retval); return s; @@ -371,6 +371,26 @@ gimple_build_call_from_tree (tree t) } +/* Return index of INDEX's non bound argument of the call. */ + +unsigned +gimple_call_get_nobnd_arg_index (const_gimple gs, unsigned index) +{ + unsigned num_args = gimple_call_num_args (gs); + for (unsigned n = 0; n num_args; n++) +{ + if (BOUND_P (gimple_call_arg (gs, n))) + continue; + else if (index) + index--; + else + return n; +} + + gcc_unreachable (); +} + + /* Extract the operands and code for expression EXPR into *SUBCODE_P, *OP1_P, *OP2_P and *OP3_P respectively. */ diff --git a/gcc/gimple.h b/gcc/gimple.h index 376fda2..4376408 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -910,6 +910,7 @@ extern tree get_initialized_tmp_var (tree, gimple_seq *, gimple_seq *); extern tree get_formal_tmp_var (tree, gimple_seq *); extern void declare_vars (tree, gimple, bool); extern void annotate_all_with_location (gimple_seq, location_t); +extern unsigned gimple_call_get_nobnd_arg_index (const_gimple, unsigned); /* Validation of GIMPLE expressions. Note that these predicates only check the basic form of the expression, they don't recurse to make sure that @@ -2379,6 +2380,31 @@ gimple_call_arg (const_gimple gs, unsigned index) } +/* Return the number of arguments used by call statement GS. */ + +static inline unsigned +gimple_call_num_nobnd_args (const_gimple gs) +{ + unsigned num_args = gimple_call_num_args (gs); + unsigned res = num_args; + for (unsigned n = 0; n num_args; n++) +if (BOUND_P (gimple_call_arg (gs, n))) + res--; + return res; +} + + +/* Return INDEX's call argument ignoring bound ones. */ +static inline tree +gimple_call_nobnd_arg (const_gimple gs, unsigned index) +{ + /* No bound args may exist if pointers checker is off. */ + if (!flag_check_pointers) +return gimple_call_arg (gs, index); + return gimple_call_arg (gs, gimple_call_get_nobnd_arg_index (gs, index)); +} + + /* Return a pointer to the argument at position INDEX for call statement GS. */ @@ -4886,6 +4912,26 @@ gimple_return_set_retval (gimple gs, tree retval) } +/* Return the return bounds for GIMPLE_RETURN GS. */ + +static inline tree +gimple_return_retbnd (const_gimple gs) +{ + GIMPLE_CHECK (gs, GIMPLE_RETURN); + return gimple_op (gs, 1); +} + + +/* Set RETVAL to be the return bounds for GIMPLE_RETURN GS. */ + +static inline void +gimple_return_set_retbnd (gimple gs, tree retval) +{ + GIMPLE_CHECK (gs, GIMPLE_RETURN); + gimple_set_op (gs, 1, retval); +} + + /* Returns true when the gimple statement STMT is any of the OpenMP types. */ #define
[PATCH] Fix DECL_BIT_FIELD dependent on flag_strict_volatile_bitfields
Hello, this patch removes the dependency of DECL_BIT_FIELD on the flag_strict_volatile_bitfields, and makes get_inner_reference not returning a different mode for non-volatile members if flag_strict-volatile_bitfields is used. This fixes the regression on the SH-target, due to the patch Fix asymmetric volatile handling in optimize_bit_field_compare: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01629.html Manually verified the mentioned test cases on a SH-cross-compiler. Boot-strapped and regression-tested on x86_64-linux-gnu. Ok for trunk? Bernd.
RE: [PATCH] Fix DECL_BIT_FIELD dependent on flag_strict_volatile_bitfields
Oops... Hello, this patch removes the dependency of DECL_BIT_FIELD on the flag_strict_volatile_bitfields, and makes get_inner_reference not returning a different mode for non-volatile members if flag_strict-volatile_bitfields is used. This fixes the regression on the SH-target, due to the patch Fix asymmetric volatile handling in optimize_bit_field_compare: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01629.html Manually verified the mentioned test cases on a SH-cross-compiler. Boot-strapped and regression-tested on x86_64-linux-gnu. Ok for trunk? Bernd. 2013-10-21 Bernd Edlinger bernd.edlin...@hotmail.de Fix DECL_BIT_FIELD depencency on flag_strict_volatile_bitfields and get_inner_reference returning different pmode for non-volatile bit-field members dependent on flag_strict_volatile_bitfields. * stor-layout.c (layout_decl): Remove special handling of flag_strict_volatile_bitfields. * expr.c (get_inner_reference): Don't use DECL_BIT_FIELD if flag_strict_volatile_bitfields 0 and TREE_THIS_VOLATILE. patch-stor-layout.diff Description: Binary data
Re: [PATCH] Fix DECL_BIT_FIELD dependent on flag_strict_volatile_bitfields
On Mon, Oct 21, 2013 at 2:33 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: Oops... Hello, this patch removes the dependency of DECL_BIT_FIELD on the flag_strict_volatile_bitfields, and makes get_inner_reference not returning a different mode for non-volatile members if flag_strict-volatile_bitfields is used. This fixes the regression on the SH-target, due to the patch Fix asymmetric volatile handling in optimize_bit_field_compare: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01629.html Manually verified the mentioned test cases on a SH-cross-compiler. Boot-strapped and regression-tested on x86_64-linux-gnu. Ok for trunk? Ok. Thanks, Richard. Bernd.
[PATCH, rs6000] Be careful with special permute masks for little endian
Hi, In altivec_expand_vec_perm_const, we look for special masks that match the behavior of specific instructions, so we can use those instructions rather than load a constant control vector and perform a permute. Some of the masks must be treated differently for little endian mode. The masks that represent merge-high and merge-low operations have reversed meanings in little-endian, because of the reversed ordering of the vector elements. The masks that represent vector-pack operations remain correct when the mode of the input operands matches the natural mode of the instruction, but not otherwise. This is because the pack instructions always select the rightmost, low-order bits of the vector element. There are cases where we use this, for example, with a V8SI vector matching a vpkuwum mask in order to select the odd-numbered elements of the vector. In little endian mode, this instruction will get us the even-numbered elements instead. There is no alternative instruction with the desired behavior, so I've just disabled use of those masks for little endian when the mode isn't natural. These changes fix 32 failures in the test suite for little endian mode. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new failures. Is this ok for trunk? Thanks, Bill 2013-10-21 Bill Schmidt wschm...@vnet.ibm.com * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse meaning of merge-high and merge-low masks for little endian; avoid use of vector-pack masks for little endian for mismatched modes. Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 203792) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -28837,17 +28838,23 @@ altivec_expand_vec_perm_const (rtx operands[4]) { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw, { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb, { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh, { 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw, { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }, { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew, { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } }, @@ -28980,6 +28987,22 @@ altivec_expand_vec_perm_const (rtx operands[4]) enum machine_mode omode = insn_data[icode].operand[0].mode; enum machine_mode imode = insn_data[icode].operand[1].mode; + /* For little-endian, don't use vpkuwum and vpkuhum if the +underlying vector type is not V4SI and V8HI, respectively. +For example, using vpkuwum with a V8HI picks up the even +halfwords (BE numbering) when the even halfwords (LE +numbering) are what we need. */ + if (!BYTES_BIG_ENDIAN + icode == CODE_FOR_altivec_vpkuwum + GET_CODE (op0) == SUBREG + GET_MODE (XEXP (op0, 0)) != V4SImode) + continue; + if (!BYTES_BIG_ENDIAN + icode == CODE_FOR_altivec_vpkuhum + GET_CODE (op0) == SUBREG + GET_MODE (XEXP (op0, 0)) != V8HImode) + continue; + /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering from 0 to 2N-1. */
[PATCH] Fix PR58794
This fixes another case of unequal addresses of volatile fields. Boostrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2013-10-21 Richard Biener rguent...@suse.de PR tree-optimization/58794 * fold-const.c (operand_equal_p): Compare FIELD_DECL operand of COMPONENT_REFs with OEP_CONSTANT_ADDRESS_OF left in place. * c-c++-common/torture/pr58794-1.c: New testcase. * c-c++-common/torture/pr58794-2.c: Likewise. Index: gcc/fold-const.c === *** gcc/fold-const.c(revision 203886) --- gcc/fold-const.c(working copy) *** operand_equal_p (const_tree arg0, const_ *** 2715,2724 case COMPONENT_REF: /* Handle operand 2 the same as for ARRAY_REF. Operand 0 may be NULL when we're called to compare MEM_EXPRs. */ ! if (!OP_SAME_WITH_NULL (0)) return 0; flags = ~OEP_CONSTANT_ADDRESS_OF; ! return OP_SAME (1) OP_SAME_WITH_NULL (2); case BIT_FIELD_REF: if (!OP_SAME (0)) --- 2715,2725 case COMPONENT_REF: /* Handle operand 2 the same as for ARRAY_REF. Operand 0 may be NULL when we're called to compare MEM_EXPRs. */ ! if (!OP_SAME_WITH_NULL (0) ! || !OP_SAME (1)) return 0; flags = ~OEP_CONSTANT_ADDRESS_OF; ! return OP_SAME_WITH_NULL (2); case BIT_FIELD_REF: if (!OP_SAME (0)) Index: gcc/testsuite/c-c++-common/torture/pr58794-1.c === *** gcc/testsuite/c-c++-common/torture/pr58794-1.c (revision 0) --- gcc/testsuite/c-c++-common/torture/pr58794-1.c (working copy) *** *** 0 --- 1,29 + /* { dg-do compile } */ + + struct S0 + { + int f; + }; + + struct S1 + { + struct S0 f1; + volatile int f2; + }; + + struct S2 + { + struct S1 g; + } a, b; + + static int *c[1][2] = {{0, (int *)a.g.f2}}; + static int d; + + int + main () + { + for (d = 0; d 1; d++) + for (b.g.f1.f = 0; b.g.f1.f 1; b.g.f1.f++) + *c[b.g.f1.f][d + 1] = 0; + return 0; + } Index: gcc/testsuite/c-c++-common/torture/pr58794-2.c === *** gcc/testsuite/c-c++-common/torture/pr58794-2.c (revision 0) --- gcc/testsuite/c-c++-common/torture/pr58794-2.c (working copy) *** *** 0 --- 1,21 + /* { dg-do compile } */ + + struct S + { + volatile int f; + } a; + + unsigned int b; + + static int *c[1][2] = {{0, (int *)a.f}}; + static unsigned int d; + + int + main () + { + for (; d 1; d++) + for (; b 1; b++) + *c[b][d + 1] = 0; + + return 0; + }
Re: [patch] Cleanup tree-ssa-ter.c exports
On 10/20/2013 09:53 AM, Eric Botcazou wrote: Unfortunately, tree-ssa-ter.c also has 2 functions (find_replaceable_exprs() and dump_replaceable_exprs()) which are exported and utilized by tree-outof-ssa.c (the file is a part of the out-of-ssa module). So I moved the prototypes from tree-ssa-live.h into a newly created tree-ssa-ter.h file, and included it directly from tree-outof-ssa.c, the only consumer. Apparently something went wrong when tree-ssa-ter.h was created. what went wrong? I see it in my checkouts...? Andrew
Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov
The patch is OK. Thanks, the patch was committed. That's a good point. I added a check for this case - so if CONST_INT is passed we assume that mode is QI. But usually promoted value resides in a register, so we don't go over-conservative here. Hmm, so if we use broadcast constant, then we won't end up having CONST_INT here? It is OK then. Currently we don't end up with CONST_INT. But maybe we'd need to revisit this place later. OK, I merged in my misaligned prologues changes and will post patch after full testing. It seemed to go seamlessly. I spotted there are still few places for cleanup, so i will try to handle there incrementally. Do you have plans on enabling the SSE memcpy/memset on Atom now? What about the move_by_pieces/set_by_pieces via SSE now? Thanks! Honza Michael Honza
[PATCH][ARM] Update Cortex A9 DImode mult extend rtx costs
Hi all, There's a slight error in the Cortex A9 costs I committed recently. The extend and extend_add costs refer to 32x32-64 bit operations and thus have valid costs in aarch32. This patches updates those costs for the Cortex A9. The other existing cost tables are correct in this regard and don't need updating. This is just a thinko in the A9 costs. Tested arm-none-eabi on qemu. Ok for trunk? Thanks, Kyrill [gcc/] 2013-10-21 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.c (cortexa9_extra_costs): Update mult costs for extend and extend_add.diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index e2baa9b..f85aa5f 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1078,18 +1078,18 @@ const struct cpu_cost_table cortexa9_extra_costs = { COSTS_N_INSNS (3), /* Simple. */ COSTS_N_INSNS (3), /* Flag_setting. */ - COSTS_N_INSNS (4), /* Extend. */ + COSTS_N_INSNS (2), /* Extend. */ COSTS_N_INSNS (3), /* Add. */ - COSTS_N_INSNS (4), /* Extend_add. */ + COSTS_N_INSNS (2), /* Extend_add. */ COSTS_N_INSNS (30) /* Idiv. No HW div on Cortex A9. */ }, /* MULT DImode */ { 0, /* Simple (N/A). */ 0, /* Flag_setting (N/A). */ - 0, /* Extend (N/A). */ + COSTS_N_INSNS (4), /* Extend. */ 0, /* Add (N/A). */ - 0, /* Extend_add (N/A). */ + COSTS_N_INSNS (4), /* Extend_add. */ 0/* Idiv (N/A). */ } },
Re: [PATCH][ARM] Update Cortex A9 DImode mult extend rtx costs
On 21/10/13 14:37, Kyrill Tkachov wrote: Hi all, There's a slight error in the Cortex A9 costs I committed recently. The extend and extend_add costs refer to 32x32-64 bit operations and thus have valid costs in aarch32. This patches updates those costs for the Cortex A9. The other existing cost tables are correct in this regard and don't need updating. This is just a thinko in the A9 costs. Tested arm-none-eabi on qemu. Ok for trunk? Thanks, Kyrill [gcc/] 2013-10-21 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.c (cortexa9_extra_costs): Update mult costs for extend and extend_add. OK. R.
Re: [PATCH][i386] Enable vector_loop in memset expanding and merge expanders for memset and memmov
OK, I merged in my misaligned prologues changes and will post patch after full testing. It seemed to go seamlessly. I spotted there are still few places for cleanup, so i will try to handle there incrementally. Great! Hope these recent memcpy/memset changes will lead to some gains soon:) Do you have plans on enabling the SSE memcpy/memset on Atom now? What about the move_by_pieces/set_by_pieces via SSE now? Actually, I'm a bit busy with other stuff right now, so I'm not sure that I'll find time for this before stage1 finish. But maybe somebody will carry on this and do necessary experiments and tuning. And the same for the move_by_pieces. Michael Honza
Re: [PATCH, rs6000] Be careful with special permute masks for little endian
On Mon, Oct 21, 2013 at 8:49 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, In altivec_expand_vec_perm_const, we look for special masks that match the behavior of specific instructions, so we can use those instructions rather than load a constant control vector and perform a permute. Some of the masks must be treated differently for little endian mode. The masks that represent merge-high and merge-low operations have reversed meanings in little-endian, because of the reversed ordering of the vector elements. The masks that represent vector-pack operations remain correct when the mode of the input operands matches the natural mode of the instruction, but not otherwise. This is because the pack instructions always select the rightmost, low-order bits of the vector element. There are cases where we use this, for example, with a V8SI vector matching a vpkuwum mask in order to select the odd-numbered elements of the vector. In little endian mode, this instruction will get us the even-numbered elements instead. There is no alternative instruction with the desired behavior, so I've just disabled use of those masks for little endian when the mode isn't natural. These changes fix 32 failures in the test suite for little endian mode. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new failures. Is this ok for trunk? Thanks, Bill 2013-10-21 Bill Schmidt wschm...@vnet.ibm.com * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse meaning of merge-high and merge-low masks for little endian; avoid use of vector-pack masks for little endian for mismatched modes. Okay. Thanks, David
Re: [C++ PATCH] Fix PR58705
Ping. On Mon, Oct 14, 2013 at 02:23:59PM +0200, Marek Polacek wrote: We were ICEing on the attached testcase, because in check_narrowing, for = {{}}, we wanted to check recursively the CONSTRUCTOR_ELTs, even though init in this case has 0 CONSTRUCTOR_NELTS. So I added the check for CONSTRUCTOR_NELTS 0. Moreover, since empty scalar initializers are forbidden in C FE, I think we should error out here too. (Complex type is considered as an arithmetic type as a GNU extension and arithmetic types are scalar types.) This isn't C++11-specific as it may look from the PR. The bug exhibits when -Wnarrowing (that is implicitly enabled with C++11) is on, since in check_narrowing we have if (!warn_narrowing || !ARITHMETIC_TYPE_P (type)) return; and with -Wno-narrowing we just return early. Regtested/bootstrapped on x86_64-linux, ok for trunk? 2013-10-14 Marek Polacek pola...@redhat.com PR c++/58705 cp/ * typeck2.c (check_narrowing): Give an error when the scalar initializer is empty. testsuite/ * g++.dg/parse/pr58705.C: New test. --- gcc/cp/typeck2.c.mp 2013-10-14 11:11:36.971293089 +0200 +++ gcc/cp/typeck2.c 2013-10-14 11:52:14.582061052 +0200 @@ -833,7 +833,10 @@ check_narrowing (tree type, tree init) TREE_CODE (type) == COMPLEX_TYPE) { tree elttype = TREE_TYPE (type); - check_narrowing (elttype, CONSTRUCTOR_ELT (init, 0)-value); + if (CONSTRUCTOR_NELTS (init) 0) + check_narrowing (elttype, CONSTRUCTOR_ELT (init, 0)-value); + else + error (empty scalar initializer); if (CONSTRUCTOR_NELTS (init) 1) check_narrowing (elttype, CONSTRUCTOR_ELT (init, 1)-value); return; --- gcc/testsuite/g++.dg/parse/pr58705.C.mp 2013-10-14 11:14:46.460955343 +0200 +++ gcc/testsuite/g++.dg/parse/pr58705.C 2013-10-14 11:51:02.698810118 +0200 @@ -0,0 +1,5 @@ +// PR c++/58705 +// { dg-do compile } +// { dg-options -Wnarrowing } + +_Complex float f = {{}}; // { dg-error empty scalar initializer } Marek Marek
Re: [PATCH, MPX, 2/X] Pointers Checker [1/25] Hooks
esOn 21 Oct 11:44, Joseph S. Myers wrote: On Mon, 21 Oct 2013, Ilya Enkovich wrote: +DEFHOOK +(builtin_chkp_function, + Pointers checker instrumentation pass uses this hook to obtain\n\ +target-specific functions which implement specified generic checker\n\ +builtins., + tree, (unsigned fcode), + default_builtin_chkp_function) I don't think that's enough detail. The audience for this hook description is back-end maintainers wanting to implement such hooks for their back ends, and the hook description should give sufficient information to do so. This description says nothing at all about the semantics of the hook argument or return value. If it seems difficult to describe things sufficiently in the context of individual hook descriptions, maybe an overview of the feature and implementation approach is needed as a new section in the internals manual, with hook descriptions then referring to that section, or going in appropriate places within that section (if the section is in tm.texi.in). +DEFHOOK +(fn_abi_va_list_bounds_size, + This hook returns size for va_list object or integer_zero_node if\n\ +it does not have any (e.g. is scalar pointer to the stack)., + tree, (tree fndecl), + default_fn_abi_va_list_bounds_size) @code{va_list}, @code{integer_zero_node}, specify semantics of fndecl argument. DEFHOOK +(load_bounds_for_arg, + This hook is used to emit insn to load arg's bounds\n\ +in case bounds are not passed on register. Return loaded bounds, + rtx, (rtx, rtx, rtx), + default_load_bounds_for_arg) You need to name all the arguments and explain their semantics by name in the documentation (which should end with .). +DEFHOOK +(store_bounds_for_arg, + This hook is used to emit insn to store arg's bounds\n\ +in case bounds are not passed on register., + void, (rtx, rtx, rtx, rtx), + default_store_bounds_for_arg) Likewise. -- Joseph S. Myers jos...@codesourcery.com Hello Joseph, Thanks for your comments! I attach a new patch version with changed hooks documentation. Hope it is more informative now. Thanks, Ilya -- diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 8d220f3..79bd0f9 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -4334,6 +4334,13 @@ This hook returns the va_list type of the calling convention specified by The default version of this hook returns @code{va_list_type_node}. @end deftypefn +@deftypefn {Target Hook} tree TARGET_FN_ABI_VA_LIST_BOUNDS_SIZE (tree @var{fndecl}) +This hook returns size for @code{va_list} object in function specified +by @var{fndecl}. This hook is used by Pointers Checker to build bounds for +@code{va_list} object. Return @code{integer_zero_node} if no bounds should +be used (e.g. @code{va_list} is a scalar pointer to the stack). +@end deftypefn + @deftypefn {Target Hook} tree TARGET_CANONICAL_VA_LIST_TYPE (tree @var{type}) This hook returns the va_list type of the calling convention specified by the type of @var{type}. If @var{type} is not a valid va_list type, it returns @@ -5151,6 +5158,19 @@ defined, then define this hook to return @code{true} if Otherwise, you should not define this hook. @end deftypefn +@deftypefn {Target Hook} rtx TARGET_LOAD_BOUNDS_FOR_ARG (rtx @var{slot}, rtx @var{arg}, rtx @var{slot_no}) +This hook is used to emit insn to load bounds of @var{arg} passed +in @var{slot}. In case @var{slot} is not a memory, @var{slot_no} is RTX +constant holding number of the special slot we should get bounds from. +Return loaded bounds. +@end deftypefn + +@deftypefn {Target Hook} void TARGET_STORE_BOUNDS_FOR_ARG (rtx @var{arg}, rtx @var{slot}, rtx @var{bounds}, rtx @var{slot_no}) +This hook is used to emit insn to store bounds of @var{arg} passed +in @var{slot}. In case @var{slot} is not a memory, @var{slot_no} is RTX +constant holding number of the special slot we should store bounds to. +@end deftypefn + @node Trampolines @section Trampolines for Nested Functions @cindex trampolines for nested functions @@ -10907,6 +10927,27 @@ ignored. This function should return the result of the call to the built-in function. @end deftypefn +@deftypefn {Target Hook} tree TARGET_BUILTIN_CHKP_FUNCTION (unsigned @var{fcode}) +This hook allows target to redefine built-in functions used by +Pointers Checker for code instrumentation. Hook should return +fndecl of function implementing generic builtin whose code is +passed in @var{fcode}. Currently following built-in functions are +obtained using this hook: +@code{BUILT_IN_CHKP_BNDMK}, @code{BUILT_IN_CHKP_BNDSTX}, +@code{BUILT_IN_CHKP_BNDLDX}, @code{BUILT_IN_CHKP_BNDCL}, +@code{BUILT_IN_CHKP_BNDCU}, @code{BUILT_IN_CHKP_BNDRET}, +@code{BUILT_IN_CHKP_INTERSECT}, @code{BUILT_IN_CHKP_SET_PTR_BOUNDS}, +@code{BUILT_IN_CHKP_NARROW}, @code{BUILT_IN_CHKP_ARG_BND}, +@code{BUILT_IN_CHKP_SIZEOF}, @code{BUILT_IN_CHKP_EXTRACT_LOWER}, +@code{BUILT_IN_CHKP_EXTRACT_UPPER}. +@end deftypefn +@deftypefn
RE: [PATCH] Fix PR58682
-Original Message- From: Jan Hubicka [mailto:hubi...@ucw.cz] Sent: 21 October 2013 13:21 To: Paulo Matos Cc: Richard Biener; Mike Stump; Kyrill Tkachov; gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz) Subject: Re: [PATCH] Fix PR58682 This won't work - the max_count is there to get things correctly scaled by badness calculation. Changing max_count would mean the need to recompute all badnesses in the fibheap keys. I would just cap the edge_count in badness calculation. Honza Makes sense. Thanks for the comment. I have created a new patch. Now attached. 2013-10-21 Paulo Matos pma...@broadcom.com * ipa-inline.c (edge_badness): Cap edge-count at max_count for badness calculations. -- PMatos pr58682-v2.patch Description: pr58682-v2.patch
Doc patch committed: Improve attributes doc in -fno-toplevel-reorder
This patch adds the words when possible to make clear that it is not always possible to use attributes to replace the effects of -fno-toplevel-reorder. I also fixed a typo s/includes files/include files/ I noticed a while ago. Bootstrapped on x86_64-unknown-linux-gnu. Committed to mainline. Ian 2013-10-21 Ian Lance Taylor i...@google.com * doc/invoke.texi (Optimize Options): For -fno-toplevel-reorder, don't imply that attributes can solve all problems. (Directory Options): Fix typo. Index: doc/invoke.texi === --- doc/invoke.texi (revision 203893) +++ doc/invoke.texi (working copy) @@ -8364,7 +8364,7 @@ statements. Output them in the same ord input file. When this option is used, unreferenced static variables are not removed. This option is intended to support existing code that relies on a particular ordering. For new code, it is better to -use attributes. +use attributes when possible. Enabled at level @option{-O0}. When disabled explicitly, it also implies @option{-fno-section-anchors}, which is otherwise enabled at @option{-O0} on some @@ -10367,7 +10367,7 @@ separator character at the end of the pa @option{-B} prefixes that effectively specify directory names also apply to libraries in the linker, because the compiler translates these options into @option{-L} options for the linker. They also apply to -includes files in the preprocessor, because the compiler translates these +include files in the preprocessor, because the compiler translates these options into @option{-isystem} options for the preprocessor. In this case, the compiler appends @samp{include} to the prefix.
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On 21 Oct 12:08, Joseph S. Myers wrote: On Mon, 21 Oct 2013, Ilya Enkovich wrote: + if (flag_check_pointers) +{ + if (flag_lto) + sorry (Pointers checker is not yet fully supported for link-time optimization); That sounds wrong. It suggests some bug somewhere in your patch series failing to allow for LTO, which should be fixed. At least give a more detailed explanation in a comment of what would need to change to remove this call to sorry (). It has been a while since I saw problem with LTO (some functions appeared to miss instrumentation; it seemed compilation flags for checker were not handled correctly for some reason). When I made this branch public I put this guard to prevent users from falling into this problem. Actually I do not know what is the current status of that issue. A lot of things has changed including checker flags definition. I'll try mpx testsuite with LTO and be back with the result. + if (targetm.chkp_bound_mode () == VOIDmode) + error (-fcheck-pointers is not supported for this target.); Also note GNU Coding Standards for diagnostics. They should not start with uppercase letters or end with .. Thanks for input! I'll fix it. Thanks, Ilya -- Joseph S. Myers jos...@codesourcery.com
[PATCH] Make fold_range_test work only on integral types (PR middle-end/58809)
In this PR (well, in the first testcase in it), the problem was that fold_range_test got op0 and op1 of a COMPLEX_TYPE, subsequent call of make_range then failed because it wants to create an integer constant using build_int_cst - and that can't handle COMPLEX_TYPE. Also, we can't just compare complex numbers, so I think it's sane to bail out if fold_range_test isn't dealing with integer types... I didn't want to put the check between the declarations, so the make_range calls were moved slightly below. Regtested/bootstrapped on x86_64-linux, ok for trunk? 2013-10-21 Marek Polacek pola...@redhat.com PR middle-end/58809 * fold-const.c (fold_range_test): Return 0 if the type is not an integral type. testsuite/ * gcc.dg/gomp/pr58809.c: New test. --- gcc/fold-const.c.mp 2013-10-21 16:12:31.494179417 +0200 +++ gcc/fold-const.c2013-10-21 16:19:08.673580775 +0200 @@ -4984,12 +4984,16 @@ fold_range_test (location_t loc, enum tr int in0_p, in1_p, in_p; tree low0, low1, low, high0, high1, high; bool strict_overflow_p = false; - tree lhs = make_range (op0, in0_p, low0, high0, strict_overflow_p); - tree rhs = make_range (op1, in1_p, low1, high1, strict_overflow_p); - tree tem; + tree tem, lhs, rhs; const char * const warnmsg = G_(assuming signed overflow does not occur when simplifying range test); + if (!INTEGRAL_TYPE_P (type)) +return 0; + + lhs = make_range (op0, in0_p, low0, high0, strict_overflow_p); + rhs = make_range (op1, in1_p, low1, high1, strict_overflow_p); + /* If this is an OR operation, invert both sides; we will invert again at the end. */ if (or_op) --- gcc/testsuite/gcc.dg/gomp/pr58809.c.mp 2013-10-21 16:12:13.333114856 +0200 +++ gcc/testsuite/gcc.dg/gomp/pr58809.c 2013-10-21 16:11:49.296022857 +0200 @@ -0,0 +1,13 @@ +/* PR middle-end/58809 */ +/* { dg-do compile } */ +/* { dg-options -fopenmp -O } */ + +int i; +#pragma omp threadprivate (i) + +void foo() +{ + _Complex int j; +#pragma omp parallel copyin (i) reduction (:j) + ; +} Marek
Re: [PATCH] Fix PR58682
-Original Message- From: Jan Hubicka [mailto:hubi...@ucw.cz] Sent: 21 October 2013 13:21 To: Paulo Matos Cc: Richard Biener; Mike Stump; Kyrill Tkachov; gcc-patches@gcc.gnu.org; Jan Hubicka (hubi...@ucw.cz) Subject: Re: [PATCH] Fix PR58682 This won't work - the max_count is there to get things correctly scaled by badness calculation. Changing max_count would mean the need to recompute all badnesses in the fibheap keys. I would just cap the edge_count in badness calculation. Honza Makes sense. Thanks for the comment. I have created a new patch. Now attached. 2013-10-21 Paulo Matos pma...@broadcom.com * ipa-inline.c (edge_badness): Cap edge-count at max_count for badness calculations. OK, thanks! Honza
[PATCH] More jump threading cleanups
Two cleanups I noticed late last week while working on the general case FSA optimization. First, we were not cancelling jump threads through joiner blocks when there is any edge on the jump threading path with a recorded jump thread opportunity. We were not nearly as aggressive at pruning as we should have been resulting in useless block duplication. Second, in the general form of the FSA optimization, we can have jump threading path which starts with a join point and traverses through a normal jump threading block (ie, has side effects and a compile-time determinable out-edge). To facilitate these opportunities, we want to call thread_through_normal_block after threading through a joiner block. Thus we need thread_through_normal_block to accept the bitmap of blocks we have already visited and check it appropriately. Bootstrapped regression tested on x86_64-unknown-linux-gnu. Installed on trunk. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index c5b794e..54298e4 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,15 @@ +2013-10-21 Jeff Law l...@redhat.com + + * tree-ssa-threadedge.c (thread_through_normal_block): New argument VISITED. + Remove VISISTED as a local variable. When we have a threadable jump, verify + the destination of the jump has not been visised. + (thread_across_edge): Allocate VISITED bitmap once at function scope and + use it throughout. Make sure to set appropriate bits in VISITED for E (start + of jump thread path). + + * tree-ssa-threadupdate.c (mark_threaded_blocks): Reject threading through + a joiner if any edge on the path has a recorded jump thread. + 2013-10-21 Ian Lance Taylor i...@google.com * doc/invoke.texi (Optimize Options): For -fno-toplevel-reorder, diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index f567557..ebd93cb 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -883,7 +883,8 @@ thread_through_normal_block (edge e, bool handle_dominating_asserts, vectree *stack, tree (*simplify) (gimple, gimple), -vecjump_thread_edge * *path) +vecjump_thread_edge * *path, +bitmap visited) { /* If E is a backedge, then we want to verify that the COND_EXPR, SWITCH_EXPR or GOTO_EXPR at the end of e-dest is not affected @@ -922,11 +923,10 @@ thread_through_normal_block (edge e, { edge taken_edge = find_taken_edge (e-dest, cond); basic_block dest = (taken_edge ? taken_edge-dest : NULL); - bitmap visited; /* DEST could be NULL for a computed jump to an absolute address. */ - if (dest == NULL || dest == e-dest) + if (dest == NULL || dest == e-dest || bitmap_bit_p (visited, dest-index)) return false; jump_thread_edge *x @@ -944,7 +944,6 @@ thread_through_normal_block (edge e, { /* We don't want to thread back to a block we have already visited. This may be overly conservative. */ - visited = BITMAP_ALLOC (NULL); bitmap_set_bit (visited, dest-index); bitmap_set_bit (visited, e-dest-index); thread_around_empty_blocks (taken_edge, @@ -953,7 +952,6 @@ thread_through_normal_block (edge e, simplify, visited, path); - BITMAP_FREE (visited); } return true; } @@ -995,15 +993,21 @@ thread_across_edge (gimple dummy_cond, vectree *stack, tree (*simplify) (gimple, gimple)) { + bitmap visited = BITMAP_ALLOC (NULL); + stmt_count = 0; vecjump_thread_edge * *path = new vecjump_thread_edge * (); + bitmap_clear (visited); + bitmap_set_bit (visited, e-src-index); + bitmap_set_bit (visited, e-dest-index); if (thread_through_normal_block (e, dummy_cond, handle_dominating_asserts, - stack, simplify, path)) + stack, simplify, path, visited)) { propagate_threaded_block_debug_into (path-last ()-e-dest, e-dest); remove_temporary_equivalences (stack); + BITMAP_FREE (visited); register_jump_thread (path); return; } @@ -1030,7 +1034,6 @@ thread_across_edge (gimple dummy_cond, edge taken_edge; edge_iterator ei; bool found; -bitmap visited = BITMAP_ALLOC (NULL); /* Look at each successor of E-dest to see if we can thread through it. */ FOR_EACH_EDGE (taken_edge, ei, e-dest-succs) diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index e791269..737a6a2 100644 ---
Doc patch RFA: Remove docs for -fno-default-inline
According to c.opt, the -fdefault-inline patch is ignored, and only exists for backward compatibility. The only use of flag_default_inline was removed by Honza here: http://gcc.gnu.org/ml/gcc-patches/2008-07/msg02104.html This patch removes the docs for -fdefault-inline. Bootstrapped on x86_64-unknown-linux-gnu. OK for mainline? Ian 2013-10-21 Ian Lance Taylor i...@google.com * doc/invoke.texi (Option Summary): Remove -fno-default-inline. (C++ Dialect Options): Likewise. (Optimize Options): Likewise. Index: doc/invoke.texi === --- doc/invoke.texi (revision 203894) +++ doc/invoke.texi (working copy) @@ -190,7 +190,7 @@ in the following sections. -frepo -fno-rtti -fstats -ftemplate-backtrace-limit=@var{n} @gol -ftemplate-depth=@var{n} @gol -fno-threadsafe-statics -fuse-cxa-atexit -fno-weak -nostdinc++ @gol --fno-default-inline -fvisibility-inlines-hidden @gol +-fvisibility-inlines-hidden @gol -fvtable-verify=@var{std|preinit|none} @gol -fvtv-counts -fvtv-debug @gol -fvisibility-ms-compat @gol @@ -385,7 +385,6 @@ Objective-C and Objective-C++ Dialects}. -flto-partition=@var{alg} -flto-report -flto-report-wpa -fmerge-all-constants @gol -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol --fno-default-inline @gol -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol -fno-inline -fno-math-errno -fno-peephole -fno-peephole2 @gol -fno-sched-interblock -fno-sched-spec -fno-signed-zeros @gol @@ -2403,13 +2402,6 @@ In addition, these optimization, warning have meanings only for C++ programs: @table @gcctabopt -@item -fno-default-inline -@opindex fno-default-inline -Do not assume @samp{inline} for functions defined inside a class scope. -@xref{Optimize Options,,Options That Control Optimization}. Note that these -functions have linkage like inline functions; they just aren't -inlined by default. - @item -Wabi @r{(C, Objective-C, C++ and Objective-C++ only)} @opindex Wabi @opindex Wno-abi @@ -6822,14 +6814,6 @@ can use the following flags in the rare optimizations to be performed is desired. @table @gcctabopt -@item -fno-default-inline -@opindex fno-default-inline -Do not make member functions inline by default merely because they are -defined inside the class scope (C++ only). Otherwise, when you specify -@w{@option{-O}}, member functions defined inside class scope are compiled -inline by default; i.e., you don't need to add @samp{inline} in front of -the member function name. - @item -fno-defer-pop @opindex fno-defer-pop Always pop the arguments to each function call as soon as that function
Re: patch to enable LRA for ppc
On Sun, Oct 20, 2013 at 10:48:08PM -0400, Vladimir Makarov wrote: On 13-10-18 11:26 AM, David Edelsohn wrote: On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov vmaka...@redhat.com wrote: The following patch permits today trunk to use LRA for ppc by default. To switch it off -mno-lra can be used. The patch was bootstrapped on ppc64. GCC testsuite does not have regressions too (in comparison with reload). The change in rs6000.md is for fix LRA failure on a recently added ppc test. Vlad, I have not forgotten this patch. We are trying to figure out the right timeframe to make this change. The patch does affect performance -- both positively and negatively; most are in the noise but not all. And there still are some SPEC benchmarks that fail to build with the patch, at least in Mike's tests. And Mike is implementing some patches to utilize reload to improve use of VSX registers, which would need to be mirrored in LRA for the equivalent functionality. Thanks for informing me, David. I am ready to work on any LRA ppc issues when it will be in the trunk. It would be easier for me to work on LRA ppc if the patch is committed to the trunk and of course LRA is used as non-default local RA. I don't know what Mike is doing on reload to use VSX registers. I guess it is usage of VSX regs as spilled locations for GENERAL regs instead of memory. If it is so, it is 2 day work to add this functionality in LRA (as it already has analogous functionality for Intel processors and that gave a nice SPECFP2000 improvement for them) and probably more work on resolving issues especially as I have no power8. I would say lets add -mlra, but make the default OFF for the time being. We can always switch the default later. Vladimir, I thought I included you in the list when I gave status. The big thing is several of the Spec 2006 benchmarks don't work in 32-bit mode, and I get a lot of Fortran errors, again in 32-bit. I also saw some decimal floating point problems. What I'm doing is adding secondary reload support so that up until reload time, we can represent VSX addresses as reg+offset, and in secondary reload, create the addition instructions to put the offset in a base register. I haven't made any changes to the machine independent portions of the compiler. As long as IRA uses the secondary reload interface, it should be ok. However, right now, I need to focus most of my attention on getting the secondary reload support to work. One thing that I've asked for before, but to remind you, is I really, really wish secondary reload could allocate two scratch registers if it is given an insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since that gives 2 registers, but future changes will want TFmode to go into a single vector register, and I will need to create another type, like V4DI that does take 2 registers. The case that this is needed for is moving an item from GPRs to VSX registers that takes 2 GPR registers, such as moving 128-bit items in 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the move into, and then I will do the combine operation. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping
Hi, in spring I have suggested to shedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue in order to create many more shrink-wrapping opportunities. The problem is that formal arguments of a functions which need to be saved across calls on slow paths often get assigned callee saved registers and the very first BB thus needs prologue, which makes any attempt at shrink-wrapping difficult. Steven suggested that splitting live-ranges of these registers might be a better approach (and Vlad suggested that I should look at http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01137.html, a big thanks to both) and this is what the patch below does. The basic idea is to split live ranges of problematic pseudos in the common dominator of all non-sibling calls and all uses of those pseudos in BBs which are reachable from calls, if such dominator is not in a loop and if it does not post-dominate the entry BB. As far as the number of performed shrink-wrappings is concerned, it has even bigger effect to scheduling pass_cprop_hardreg earlier that I have proposed in spring. Apart from looking at bootstrap and SPEC 2006 (at -Ofast) I have tried compiling Python and Ruby (with whatever their default flags were, I believe mostly -O3) because David suggested at his Cauldron talk last year that shrink-wrapping may be important: Number of performed shrink-wrappings: | Source | Trunk | With patch | % | |-+---++-| | SPEC 2006 Int | 734 | 1670 | +127.52 | | SPEC 2006 FP (*)| 575 | 1022 | +77.74 | | C/CPP bootstrap stage 2 | 1748 | 3015 | +72.48 | | Python 2.7.5| 231 |401 | +73.59 | | Python 3.3.2| 234 |416 | +77.78 | | Ruby 2.0.0-p247 | 331 |464 | +40.18 | (*) I forgot to increase the stack limit and I believe that is the reason why 410.bwaves, 416.gamess, 447.dealII and 481.wrf SPEC 2006 FP benchmarks failed, regardless of my patch, but I have not investigated what exactly happened, so there might have even been compile-time issues. As far as run-times are concerned, I have tried running the benchmarks from hg.python.org/benchmarks but I do not have anything to report. David claimed that shrink-wrapping in function lookdict_string is likely to help but this patch alone does not make that happen, I am investigating what else needs to be done for this to happen. I have not yet tried benchmarking ruby. Spec 2006 (*) run times are a bit encouraging. Run-times (seconds, median of three runs): | Benchmark | Trunk | Live range splitting | % | |+---+--+---| | 400.perlbench | 294 | 290 | -1.36 | | 401.bzip2 | 384 | 387 | +0.78 | | 403.gcc| 239 | 237 | -0.84 | | 429.mcf| 227 | 228 | +0.44 | | 445.gobmk | 392 | 390 | -0.51 | | 456.hmmer | 343 | 342 | -0.29 | | 458.sjeng | 418 | 411 | -1.67 | | 462.libquantum | 287 | 288 | +0.35 | | 464.h264ref| 417 | 418 | +0.24 | | 471.omnetpp| 275 | 280 | +1.82 | | 473.astar | 312 | 295 | -5.45 | | 483.xalancbmk | 183 | 183 | 0.00 | |+---+--+---| | 433.milc | 336 | 335 | -0.30 | | 434.zeusmp | 247 | 246 | -0.40 | | 435.gromacs| 259 | 259 | 0.00 | | 436.cactusADM | 192 | 186 | -3.12 | | 437.leslie3d | 227 | 226 | -0.44 | | 444.namd | 315 | 315 | 0.00 | | 450.soplex | 202 | 200 | -0.99 | | 453.povray | 136 | 134 | -1.47 | | 454.calculix | 300 | 300 | 0.00 | | 459.GemsFDTD | 273 | 267 | -2.20 | | 465.tonto | 685 | 685 | 0.00 | | 470.lbm| 212 | 213 | +0.47 | | 482.sphinx3| 363 | 360 | -0.83 | The patch is speculative, quite a few modifications do not lead to shrink wrappings, as outlined in the following table which shows how many of the changed functions still needed their prologue in the first BB. Number of functions touched: | | Modified | | | | Source | functions | In vain | % | |-+---+-+---| | SPEC 2006 Int | 2004 | 745 | 37.18 | | SPEC 2006 FP (*)| 1491 | 881 | 59.09 | | C/C++ bootstrap stage 2 | 2864 | 935 | 32.65 |
Re: [PATCH, rs6000] Be careful with special permute masks for little endian
Whoops, looks like I missed some simpler cases (REG with the wrong mode instead of SUBREG with the wrong mode). Is this revised version ok, assuming it passes testing? It should fix a few more test cases. The changed code from the previous version is in the last hunk. Thanks, Bill Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 203792) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -28837,17 +28838,23 @@ altivec_expand_vec_perm_const (rtx operands[4]) { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw, { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb, { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh, { 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw, { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }, { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew, { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } }, @@ -28980,6 +28987,26 @@ altivec_expand_vec_perm_const (rtx operands[4]) enum machine_mode omode = insn_data[icode].operand[0].mode; enum machine_mode imode = insn_data[icode].operand[1].mode; + /* For little-endian, don't use vpkuwum and vpkuhum if the +underlying vector type is not V4SI and V8HI, respectively. +For example, using vpkuwum with a V8HI picks up the even +halfwords (BE numbering) when the even halfwords (LE +numbering) are what we need. */ + if (!BYTES_BIG_ENDIAN + icode == CODE_FOR_altivec_vpkuwum + ((GET_CODE (op0) == REG + GET_MODE (op0) != V4SImode) + || (GET_CODE (op0) == SUBREG + GET_MODE (XEXP (op0, 0)) != V4SImode))) + continue; + if (!BYTES_BIG_ENDIAN + icode == CODE_FOR_altivec_vpkuhum + ((GET_CODE (op0) == REG + GET_MODE (op0) != V8HImode) + || (GET_CODE (op0) == SUBREG + GET_MODE (XEXP (op0, 0)) != V8HImode))) + continue; + /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering from 0 to 2N-1. */ On Mon, 2013-10-21 at 10:02 -0400, David Edelsohn wrote: On Mon, Oct 21, 2013 at 8:49 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, In altivec_expand_vec_perm_const, we look for special masks that match the behavior of specific instructions, so we can use those instructions rather than load a constant control vector and perform a permute. Some of the masks must be treated differently for little endian mode. The masks that represent merge-high and merge-low operations have reversed meanings in little-endian, because of the reversed ordering of the vector elements. The masks that represent vector-pack operations remain correct when the mode of the input operands matches the natural mode of the instruction, but not otherwise. This is because the pack instructions always select the rightmost, low-order bits of the vector element. There are cases where we use this, for example, with a V8SI vector matching a vpkuwum mask in order to select the odd-numbered elements of the vector. In little endian mode, this instruction will get us the even-numbered elements instead. There is no alternative instruction with the desired behavior, so I've just disabled use of those masks for little endian when the mode isn't natural. These changes fix 32 failures in the test suite for little endian mode.
Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.
Jeff, thank you for installing this patch. Actually I already have the write privileges. I just came back from a trip. Thank you again! thanks, Cong On Fri, Oct 18, 2013 at 10:22 PM, Jeff Law l...@redhat.com wrote: On 10/18/13 03:56, Richard Biener wrote: On Thu, 17 Oct 2013, Cong Hou wrote: I tested this case with -fno-tree-loop-im and -fno-tree-pre, and it seems that GCC could hoist j+1 outside of the i loop: t3.c:5:5: note: hoisting out of the vectorized loop: _10 = (sizetype) j_25; t3.c:5:5: note: hoisting out of the vectorized loop: _11 = _10 + 1; t3.c:5:5: note: hoisting out of the vectorized loop: _12 = _11 * 4; t3.c:5:5: note: hoisting out of the vectorized loop: _14 = b_13(D) + _12; t3.c:5:5: note: hoisting out of the vectorized loop: _15 = *_14; t3.c:5:5: note: hoisting out of the vectorized loop: _16 = _15 + 1; But your suggestion is still nice as it can remove a branch and make the code more brief. I have updated the patch and also included the nested loop example into the test case. Ok if it passes bootstrap regtest. Bootstrapped regression tested on x86_64-unknown-linux-gnu. Installed on Cong's behalf. Cong -- if you plan on contributing regularly to GCC, please start the process for write privileges. This form should have everything you need: https://sourceware.org/cgi-bin/pdw/ps_form.cgi Jeff
Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.
On 10/21/13 10:45, Cong Hou wrote: Jeff, thank you for installing this patch. Actually I already have the write privileges. I just came back from a trip. Ah. I didn't see you in the MAINTAINERS file. Can you update that file please. Thanks, jeff
Re: [Patch] Fix gcc.dg/20050922-*.c
On 10/21/13 04:28, Vidya Praveen wrote: Hello, Tests gcc.dg/20050922-1.c and gcc.dg/20050922-2.c includes stdlib.h. This can be a issue especially since they define uint32_t. Testcase writing guidelines discourages such inclusion as well. This patch replaces these #includes with manual declarations. Tested for aarch64-none-elf, arm-none-eabi and x86_64-linux-gnu OK for trunk, 4.7, 4.8? VP. --- gcc/testsuite/ChangeLog: 2013-10-21 Vidya Praveen vidyaprav...@arm.com * gcc.dg/20050922-1.c: Remove stdlib.h and declare abort(). * gcc.dg/20050922-1.c: Remove stdlib.h and declare abort() and exit(). OK installed on trunk. Release managers would need to make a decision about whether or not to include this for the 4.7/4.8 branches. Thanks, Jeff
Re-factor inclusion of tree.h
This moves tree.h out of every header. This exposes dependencies of tree.h in files that should probably not need it after tree and gimple are separated. After this change, no header should include tree.h directly. It should only be included by a .c file. Unfortunately, I did not find an automatic way of forcing this. Tested on x86_64 with all languages enabled and using contrib/config-list.mk. Andrew, I think that some of the inclusions of tree-core.h can actually go away. Last time we chatted about this, I think we decided that including tree-core.h from a header file is OK. Not sure if that's still the case. I plan to commit this by tomorrow, unless there are objections. Diego. * asan.c: Include tree.h * bb-reorder.c: Likewise. * cfgcleanup.c: Likewise. * cfgloopmanip.c: Likewise. * data-streamer-in.c: Likewise. * data-streamer-out.c: Likewise. * data-streamer.c: Likewise. * dwarf2cfi.c: Likewise. * graphite-blocking.c: Likewise. * graphite-clast-to-gimple.c: Likewise. * graphite-dependences.c: Likewise. * graphite-interchange.c: Likewise. * graphite-optimize-isl.c: Likewise. * graphite-poly.c: Likewise. * graphite-scop-detection.c: Likewise. * graphite-sese-to-poly.c: Likewise. * graphite.c: Likewise. * ipa-devirt.c: Likewise. * ipa-profile.c: Likewise. * ipa.c: Likewise. * ira.c: Likewise. * loop-init.c: Likewise. * loop-unroll.c: Likewise. * lower-subreg.c: Likewise. * lto/lto-object.c: Likewise. * recog.c: Likewise. * reginfo.c: Likewise. * tree-loop-distribution.c: Likewise. * tree-parloops.c: Likewise. * tree-ssa-strlen.c: Likewise. * tree-streamer.c: Likewise. * value-prof.c: Likewise. * expr.h: Include tree-core.h instead of tree.h. * gimple.h: Likewise. * ipa-prop.h: Likewise. * ipa-utils.h: Likewise. * lto-streamer.h: Likewise. * streamer-hooks.h: Likewise. * ipa-reference.h: Include cgraph.h instead of tree.h. * cgraph.h: Include basic-block.h instead of tree.h. * tree-streamer.h: Do not include tree.h. testsuite/ChangeLog * g++.dg/plugin/selfassign.c: Include tree.h. * gcc.dg/plugin/finish_unit_plugin.c: Likewise. * gcc.dg/plugin/ggcplug.c: Likewise. * gcc.dg/plugin/one_time_plugin.c: Likewise. * gcc.dg/plugin/selfassign.c: Likewise. * gcc.dg/plugin/start_unit_plugin.c: Likewise. --- gcc/ChangeLog| 44 gcc/asan.c | 1 + gcc/bb-reorder.c | 1 + gcc/cfgcleanup.c | 1 + gcc/cfgloopmanip.c | 1 + gcc/cgraph.h | 2 +- gcc/data-streamer-in.c | 1 + gcc/data-streamer-out.c | 1 + gcc/data-streamer.c | 1 + gcc/dwarf2cfi.c | 1 + gcc/expr.h | 2 +- gcc/gimple.h | 2 +- gcc/graphite-blocking.c | 1 + gcc/graphite-clast-to-gimple.c | 1 + gcc/graphite-dependences.c | 1 + gcc/graphite-interchange.c | 1 + gcc/graphite-optimize-isl.c | 1 + gcc/graphite-poly.c | 1 + gcc/graphite-scop-detection.c| 1 + gcc/graphite-sese-to-poly.c | 1 + gcc/graphite.c | 1 + gcc/ipa-devirt.c | 1 + gcc/ipa-profile.c| 1 + gcc/ipa-prop.h | 2 +- gcc/ipa-reference.h | 2 +- gcc/ipa-utils.h | 2 +- gcc/ipa.c| 1 + gcc/ira.c| 1 + gcc/loop-init.c | 1 + gcc/loop-unroll.c| 1 + gcc/lower-subreg.c | 1 + gcc/lto-streamer.h | 2 +- gcc/lto/lto-object.c | 1 + gcc/recog.c | 1 + gcc/reginfo.c| 1 + gcc/streamer-hooks.h | 2 +- gcc/testsuite/ChangeLog | 9 + gcc/testsuite/g++.dg/plugin/selfassign.c | 1 + gcc/testsuite/gcc.dg/plugin/finish_unit_plugin.c | 1 + gcc/testsuite/gcc.dg/plugin/ggcplug.c| 1 + gcc/testsuite/gcc.dg/plugin/one_time_plugin.c| 1 +
Re: [PATCH] Make fold_range_test work only on integral types (PR middle-end/58809)
On 10/21/13 09:13, Marek Polacek wrote: In this PR (well, in the first testcase in it), the problem was that fold_range_test got op0 and op1 of a COMPLEX_TYPE, subsequent call of make_range then failed because it wants to create an integer constant using build_int_cst - and that can't handle COMPLEX_TYPE. Also, we can't just compare complex numbers, so I think it's sane to bail out if fold_range_test isn't dealing with integer types... I didn't want to put the check between the declarations, so the make_range calls were moved slightly below. Regtested/bootstrapped on x86_64-linux, ok for trunk? 2013-10-21 Marek Polacek pola...@redhat.com PR middle-end/58809 * fold-const.c (fold_range_test): Return 0 if the type is not an integral type. testsuite/ * gcc.dg/gomp/pr58809.c: New test. For exceedingly annoying reasons BOOLEAN_TYPE is not considered an integral type. You might consider letting fold_range_test do it's thing on BOOLEAN_TYPEs, though I'd be surprised if that matters in practice. Your call -- fine as-is or you can let BOOLEAN_TYPEs through with a pre-approved patch. jeff
Re: Re-factor inclusion of tree.h
On 10/21/13 10:52, Diego Novillo wrote: This moves tree.h out of every header. This exposes dependencies of tree.h in files that should probably not need it after tree and gimple are separated. After this change, no header should include tree.h directly. It should only be included by a .c file. Unfortunately, I did not find an automatic way of forcing this. Tested on x86_64 with all languages enabled and using contrib/config-list.mk. Andrew, I think that some of the inclusions of tree-core.h can actually go away. Last time we chatted about this, I think we decided that including tree-core.h from a header file is OK. Not sure if that's still the case. I plan to commit this by tomorrow, unless there are objections. I can't think of a good reason to even bother waiting :-) jeff
Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.
OK. Have done that. And this is also a patch, right? ;) thanks, Cong diff --git a/MAINTAINERS b/MAINTAINERS index 15b6cc7..a6954da 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -406,6 +406,7 @@ Fergus Hendersonf...@cs.mu.oz.au Stuart Henderson shend...@gcc.gnu.org Matthew Hiller hil...@redhat.com Manfred Hollstein m...@suse.com +Cong Hou co...@google.com Falk Hueffner f...@debian.org Andrew John Hughes gnu_and...@member.fsf.org Andy Hutchinsonhutchinsona...@aim.com On Mon, Oct 21, 2013 at 9:46 AM, Jeff Law l...@redhat.com wrote: On 10/21/13 10:45, Cong Hou wrote: Jeff, thank you for installing this patch. Actually I already have the write privileges. I just came back from a trip. Ah. I didn't see you in the MAINTAINERS file. Can you update that file please. Thanks, jeff
Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C (and C++)
On 10/18/13 15:06, Iyer, Balaji V wrote: Hi Jeff, Please see my comments below. Also, I am adding all these changes to the files as you requested in my local directory. Should I send you an updated patch along the way? I'll let you know when I've worked my way through everything. ISTM an updated patch after that would be best. I'm hoping to get through the remaining changes today, then review your replies in detail. jeff
Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.
On 10/21/13 11:00, Cong Hou wrote: OK. Have done that. And this is also a patch, right? ;) Right :-) Approved. jeff
[Patch, Fortran] PR58803 - Prevent a double-free with proc-pointer components
As written in the PR: The problem is that in free_components one frees: 2054free_components (gfc_component *p) ... 2058 for (; p; p = q) 2059{ 2060 q = p-next; 2061 2062 gfc_free_array_spec (p-as); 2063 gfc_free_expr (p-initializer); 2064 free (p-tb); 2065 2066 free (p); 2067} Here: p-name == f1 p-tb == (gfc_typebound_proc *) 0x17e0070 when one then cycles to p = q (i.e. to p-next), one has: p-name == f2 p-tb == (gfc_typebound_proc *) 0x17e0070 The patch is rather straight forward. Build and regtested on x86-64-gnu-linux. OK for the trunk? Tobias 2013-10-21 Tobias Burnus bur...@net-b.de PR fortran/58803 * decl.c (match_ppc_decl): Prevent later double free. 2013-10-21 Tobias Burnus bur...@net-b.de PR fortran/58803 * gfortran.dg/proc_ptr_comp_38.f90: New. diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 3a8175f..9c9fd4f 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -5055,7 +5055,14 @@ match_ppc_decl (void) if (!gfc_add_proc (c-attr, name, NULL)) return MATCH_ERROR; - c-tb = tb; + if (num == 1) + c-tb = tb; + else + { + c-tb = XCNEW (gfc_typebound_proc); + c-tb-where = gfc_current_locus; + *c-tb = *tb; + } /* Set interface. */ if (proc_if != NULL) diff --git a/gcc/testsuite/gfortran.dg/proc_ptr_comp_38.f90 b/gcc/testsuite/gfortran.dg/proc_ptr_comp_38.f90 new file mode 100644 index 000..2a71ca0 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/proc_ptr_comp_38.f90 @@ -0,0 +1,12 @@ +! { dg-do compile } +! +! PR fortran/58803 +! +! Contributed by Vittorio Zecca +! +! Was before ICEing due to a double free +! + type t + procedure(real), pointer, nopass :: f1, f2 + end type + end
Re: [PATCH] Make fold_range_test work only on integral types (PR middle-end/58809)
On Mon, Oct 21, 2013 at 10:56:52AM -0600, Jeff Law wrote: On 10/21/13 09:13, Marek Polacek wrote: In this PR (well, in the first testcase in it), the problem was that fold_range_test got op0 and op1 of a COMPLEX_TYPE, subsequent call of make_range then failed because it wants to create an integer constant using build_int_cst - and that can't handle COMPLEX_TYPE. Also, we can't just compare complex numbers, so I think it's sane to bail out if fold_range_test isn't dealing with integer types... I didn't want to put the check between the declarations, so the make_range calls were moved slightly below. Regtested/bootstrapped on x86_64-linux, ok for trunk? 2013-10-21 Marek Polacek pola...@redhat.com PR middle-end/58809 * fold-const.c (fold_range_test): Return 0 if the type is not an integral type. testsuite/ * gcc.dg/gomp/pr58809.c: New test. For exceedingly annoying reasons BOOLEAN_TYPE is not considered an integral type. You might consider letting fold_range_test do it's thing on BOOLEAN_TYPEs, though I'd be surprised if that matters in practice. Really? In tree.h I can see #define INTEGRAL_TYPE_P(TYPE) \ (TREE_CODE (TYPE) == ENUMERAL_TYPE \ || TREE_CODE (TYPE) == BOOLEAN_TYPE \ || TREE_CODE (TYPE) == INTEGER_TYPE) Your call -- fine as-is or you can let BOOLEAN_TYPEs through with a pre-approved patch. ...so I'm keeping the patch as-is. Thanks, Marek
[jit] Implement gcc_jit_lvalue_get_address
Committed to dmalcolm/jit: gcc/jit/ * TODO.rst (the C unary prefix operator): Remove completed item. * internal-api.c (gcc::jit::lvalue::get_address): New. * internal-api.h (gcc::jit::lvalue::get_address): New. * libgccjit.c (gcc_jit_lvalue_get_address): New. * libgccjit.h (gcc_jit_lvalue_get_address): New. * libgccjit.map (gcc_jit_lvalue_get_address): New. gcc/testsuite/ * jit.dg/test-expressions.c (test_global): New. (make_test_of_get_address): New. (verify_get_address): New. (code_making_callback): Add call to make_test_of_get_address. (verify_code): Add call to verify_get_address. --- gcc/jit/ChangeLog.jit | 9 ++ gcc/jit/TODO.rst| 5 --- gcc/jit/internal-api.c | 13 gcc/jit/internal-api.h | 3 ++ gcc/jit/libgccjit.c | 9 ++ gcc/jit/libgccjit.h | 7 + gcc/jit/libgccjit.map | 1 + gcc/testsuite/ChangeLog.jit | 8 + gcc/testsuite/jit.dg/test-expressions.c | 54 + 9 files changed, 104 insertions(+), 5 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index 7c574e8..e16902d 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,12 @@ +2013-10-21 David Malcolm dmalc...@redhat.com + + * TODO.rst (the C unary prefix operator): Remove completed item. + * internal-api.c (gcc::jit::lvalue::get_address): New. + * internal-api.h (gcc::jit::lvalue::get_address): New. + * libgccjit.c (gcc_jit_lvalue_get_address): New. + * libgccjit.h (gcc_jit_lvalue_get_address): New. + * libgccjit.map (gcc_jit_lvalue_get_address): New. + 2013-10-18 David Malcolm dmalc...@redhat.com * internal-api.c (gcc::jit::context::new_param): Add context diff --git a/gcc/jit/TODO.rst b/gcc/jit/TODO.rst index 41cc4ad..d0daeb4 100644 --- a/gcc/jit/TODO.rst +++ b/gcc/jit/TODO.rst @@ -55,11 +55,6 @@ Initial Release and, indeed, clarify all of the other operations. -* the C unary prefix operator:: - -extern gcc_jit_rvalue * -gcc_jit_lvalue_get_address (gcc_jit_lvalue *lvalue); - * array types, in case they're needed for structs:: extern gcc_jit_type * diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index 81178e1..8f84e14 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -757,6 +757,19 @@ dereference (gcc::jit::location *loc) return new lvalue (get_context (), datum); } +gcc::jit::rvalue * +gcc::jit::lvalue:: +get_address (location *loc) +{ + tree t_lvalue = as_tree (); + tree t_thistype = TREE_TYPE (t_lvalue); + tree t_ptrtype = build_pointer_type (t_thistype); + tree ptr = build1 (ADDR_EXPR, t_ptrtype, t_lvalue); + if (loc) +get_context ()-set_tree_location (ptr, loc); + return new rvalue (get_context (), ptr); +} + void * gcc::jit::wrapper:: operator new (size_t sz) diff --git a/gcc/jit/internal-api.h b/gcc/jit/internal-api.h index e31dbc8..1a8499f 100644 --- a/gcc/jit/internal-api.h +++ b/gcc/jit/internal-api.h @@ -422,6 +422,9 @@ public: access_field (location *loc, const char *fieldname); + rvalue * + get_address (location *loc); + }; class param : public lvalue diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c index 023c213..8d77761 100644 --- a/gcc/jit/libgccjit.c +++ b/gcc/jit/libgccjit.c @@ -529,6 +529,15 @@ gcc_jit_rvalue_dereference (gcc_jit_rvalue *rvalue, return (gcc_jit_lvalue *)rvalue-dereference (loc); } +gcc_jit_rvalue * +gcc_jit_lvalue_get_address (gcc_jit_lvalue *lvalue, + gcc_jit_location *loc) +{ + RETURN_NULL_IF_FAIL (lvalue, NULL, NULL lvalue); + + return (gcc_jit_rvalue *)lvalue-get_address (loc); +} + gcc_jit_lvalue * gcc_jit_function_new_local (gcc_jit_function *func, gcc_jit_location *loc, diff --git a/gcc/jit/libgccjit.h b/gcc/jit/libgccjit.h index 2187ede..12a715d 100644 --- a/gcc/jit/libgccjit.h +++ b/gcc/jit/libgccjit.h @@ -601,6 +601,13 @@ extern gcc_jit_lvalue * gcc_jit_rvalue_dereference (gcc_jit_rvalue *rvalue, gcc_jit_location *loc); +/* Taking the address of an lvalue; analogous to: + (EXPR) + in C. */ +extern gcc_jit_rvalue * +gcc_jit_lvalue_get_address (gcc_jit_lvalue *lvalue, + gcc_jit_location *loc); + extern gcc_jit_lvalue * gcc_jit_function_new_local (gcc_jit_function *func, gcc_jit_location *loc, diff --git a/gcc/jit/libgccjit.map b/gcc/jit/libgccjit.map index 7700788..e2503e4 100644 --- a/gcc/jit/libgccjit.map +++ b/gcc/jit/libgccjit.map @@ -41,6 +41,7 @@ gcc_jit_loop_end; gcc_jit_lvalue_as_rvalue; gcc_jit_lvalue_access_field; +gcc_jit_lvalue_get_address; gcc_jit_param_as_lvalue; gcc_jit_param_as_rvalue;
Re: [RFC] Isolate simplify paths with undefined behaviour
On 10/21/13 06:19, Richard Biener wrote: I wonder why this isn't part of the regular jump-threading code - after all the opportunity is to thead to __builtin_unreachable () ;) Otherwise the question is always where you'd place this pass and whether it enables jump threading or CSE opportunities (that at least it does). In theory one could do this as part of jump threading. Doing it in jump threading would have the advantage that we could use a NULL generated in a different block than the dereference. Of course, we'd have a full path to duplicate at that point. The downside is complexity. This pass is amazingly simple and effective. The jump threading code is considerably more complex and bolting this on the side would just make it worse. From a pipeline location, if kept as a separate pass, it probably needs to be between DOM1 phi-only cprop. DOM1 does a reasonable job at propagating the NULL values into the PHI nodes to expose the optimization. And the pass tends to expose precisely the kinds of PHI nodes that phi-only cprop can optimize. VRP2/PRE/DOM2 do a good job at finding the newly exposed CSEs and jump threading opportunities. I haven't looked to see if there's anything left to optimize after DOM2, but I can certainly speculate that there would be cases exposed by the various passes that currently exist between DOM1 DOM2. That's certainly worth exploring. As you know, phase ordering will always be a concern. We already have acknowledged that we're punting some of those issues once we stopped iterating DOM. Jeff
[Patch, Fortran] PR58793 - reject passing TYPE(*) to CLASS(*)
The issue came up while reviewing the patch for PR58793. Build and regtested on x86-64-gnu-linux OK for the trunk? Tobias 2013-10-21 Tobias Burnus bur...@net-b.de PR fortran/58793 * interface.c (compare_parameter): Reject passing TYPE(*) to CLASS(*). 2013-10-21 Tobias Burnus bur...@net-b.de PR fortran/58793 * gfortran.dg/assumed_type_8.f90: New. diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c index b3ddf5f..0504c90 100644 --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -1972,6 +1972,15 @@ compare_parameter (gfc_symbol *formal, gfc_expr *actual, return 0; } + if (actual-ts.type == BT_ASSUMED formal-ts.type != BT_ASSUMED) +{ + if (where) + gfc_error (Assumed-type actual argument at %L requires that dummy + argument '%s' is of assumed type, actual-where, + formal-name); + return 0; +} + /* F2008, 12.5.2.5; IR F08/0073. */ if (formal-ts.type == BT_CLASS formal-attr.class_ok actual-expr_type != EXPR_NULL diff --git a/gcc/testsuite/gfortran.dg/assumed_type_8.f90 b/gcc/testsuite/gfortran.dg/assumed_type_8.f90 new file mode 100644 index 000..543e693 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/assumed_type_8.f90 @@ -0,0 +1,19 @@ +! { dg-do compile } +! +! Issue came up during the review of PR fortran/58793 +! +! Test for TS29113:2012's C407b. +! +program test + use iso_c_binding + integer,target ::aa + call up(c_loc(aa)) +contains + subroutine up(x) +class(*) :: x + end subroutine + subroutine bar(x) + type(*) :: x + call up(x) ! { dg-error Assumed-type actual argument at .1. requires that dummy argument 'x' is of assumed type } + end subroutine bar +end program test
Re: [Patch] Fix gcc.dg/20050922-*.c
On Oct 21, 2013, at 3:28 AM, Vidya Praveen vidyaprav...@arm.com wrote: Tests gcc.dg/20050922-1.c and gcc.dg/20050922-2.c includes stdlib.h. This can be a issue especially since they define uint32_t. OK for 4.7, 4.8? For release branches, you'd need to transition from the theoretical to the practical. On which systems (software) does it fail? If none, then no, a back port isn't necessary. If it fails on a system (or software) on which real users use, then I'll approve it once you name the system (software) and let it bake on trunk for a week and see if anyone objects…
Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C (and C++)
On 10/18/13 15:06, Iyer, Balaji V wrote: The main reason why I made it volatile (as expressed by the volatil bool variable) is that I want to make sure these values aren't optimized by the compiler and the value is fetched from memory on every access. I have added an explanation to the header comment. I figured that was the case; it's also what origianlly got me thinking about memory models. volatile can help with getting the right number of accesses (though there are cases where it gets in the way). But volatile is not a fence/barrier :-) Presumably users never concern themselves with the cilkrts_pedigree structure, it's entirely hidden, right? So there's no way for a user to ask for an array of these things, right? Ok, so this relates to the question about keeping all the structures in sync I note you set DECL_ALIGN to BIGGEST_ALIGNMENT, which seems excessive. Why not compute the natural alignment for that structure and use that? Similarly for the cilkrts_stack_frame, except that it seems to use PREFERRED_STACK_BOUNDARY. Is there some reason why each shouldn't be aligned on whatever boundary the target would normally align those structures? Fixed, I now let the compiler set these values Actually, thinking more about this, we've got to make sure we're compatible with ICC on this stuff. So if you need to weak the alignments to be ICC compatible, then, well, we have to do it. If (for example) the compilers differ on their notion of the structure's alignment, then you couldn't make an array of them and use it reliably across the ICC GCC implementations. You might consider some sort of interoperatibity test. Obviously the test wouldn't run if ICC wasn't installed, but having a test to catch this stuff early (if it ever happens) would be wise. These structures don't change often and whenever they do (i.e. fields get added into it) it is kind of a big deal. So far, when we have changed - once in the past 3 yrs. or so - we have allowed backward compatibility. Good to hear it's a big deal. Once there are multiple implementations in the wild relying upon them, it'll be an even bigger deal.. We've given it some thought, but have neither the manpower nor charter to port Cilk to non-Intel architectures. We've done a trial port to ARM to prove it could be done, but we ran fib (a small example) once and declared victory. We're sure that more work needs to be done here and would welcome changes to the runtime to support other architectures. In particular, most or all uses of assembly-language instructions should be replaced by compiler intrinsics and memory barriers probably need to be added for architectures that have a more relaxed memory model than the x86. Our hope is that, once Cilk Plus was added to gcc, that members of the community would help us port the runtime to other architectures. Such ports could probably start with full barriers, it'd be painful from a performance standpoint, but as a starting point, sufficient. jeff
[C++ Patch] PR 58810
Hi, this is just an ICE on invalid, but I think it hints to a small clean-up. Essentially the issue is that we diagnose qualified free functions in two different places: in the grokdeclarator itself and in grokfndecl. The former handles non-friend declarations and does type = TYPE_MAIN_VARIANT (type), the latter handles friends and then doesn't even try to do what grokdeclarator does to type. Then in situations like 58810, when we have *both* at the same time: typedef int F() const; F f; struct A { friend F f; }; the two types become inconsistent, one has the const and the other doesn't, and an assert in typeck.c:merge_types triggers. Thus the idea of resolving the inconsistency by removing completely one of the two checks: that in grokfndecl is more precise (separate messages for cv-qualifiers and ref-qualifiers) and seems the best candidate. The patch below passes testing modulo trivial adjustments to a couple of existing testcases. Admittedly, the change is a bit risky, but it's still Stage 1, maybe we can take the chance and try to shave some lines off the mammuth grokdeclarator ;) What do you think? Thanks, Paolo. /// /cp 2013-10-21 Paolo Carlini paolo.carl...@oracle.com PR c++/58810 * decl.c (grokdeclarator): Don't handle qualified free functions here, leave the diagnostic to grokfndecl. /testsuite 2013-10-21 Paolo Carlini paolo.carl...@oracle.com PR c++/58810 * g++.dg/other/cv_func3.C: New. * g++.dg/other/cv_func.C: Adjust. * g++.dg/parse/fn-typedef2.C: Likewise. Index: cp/decl.c === --- cp/decl.c (revision 203893) +++ cp/decl.c (working copy) @@ -10242,21 +10242,6 @@ grokdeclarator (const cp_declarator *declarator, if (decl_context != TYPENAME) { - /* A cv-qualifier-seq shall only be part of the function type -for a non-static member function. A ref-qualifier shall only - /same as above/ [dcl.fct] */ - if ((type_memfn_quals (type) != TYPE_UNQUALIFIED - || type_memfn_rqual (type) != REF_QUAL_NONE) - (current_class_type == NULL_TREE || staticp) ) - { - error (staticp - ? G_(qualified function types cannot be used to - declare static member functions) - : G_(qualified function types cannot be used to - declare free functions)); - type = TYPE_MAIN_VARIANT (type); - } - /* The qualifiers on the function type become the qualifiers on the non-static member function. */ memfn_quals |= type_memfn_quals (type); Index: testsuite/g++.dg/other/cv_func.C === --- testsuite/g++.dg/other/cv_func.C(revision 203893) +++ testsuite/g++.dg/other/cv_func.C(working copy) @@ -3,7 +3,7 @@ typedef int FIC(int) const; typedef int FI(int); -FIC f; // { dg-error qualified } +FIC f; // { dg-error cv-qualifier } struct S { FIC f; // OK Index: testsuite/g++.dg/other/cv_func3.C === --- testsuite/g++.dg/other/cv_func3.C (revision 0) +++ testsuite/g++.dg/other/cv_func3.C (working copy) @@ -0,0 +1,10 @@ +// PR c++/58810 + +typedef int F() const; + +F f; // { dg-error cv-qualifier } + +struct A +{ + friend F f; // { dg-error cv-qualifier } +}; Index: testsuite/g++.dg/parse/fn-typedef2.C === --- testsuite/g++.dg/parse/fn-typedef2.C(revision 203893) +++ testsuite/g++.dg/parse/fn-typedef2.C(working copy) @@ -4,4 +4,4 @@ typedef void ft() const; typedef void V; typedef V ft() const; -ft f; // { dg-error qualified } +ft f; // { dg-error cv-qualifier }
Re: [PATCH, i386, MPX, 1/X] Support of Intel MPX ISA. 1/2 Bound type and modes
On 10/15/13 07:31, Ilya Enkovich wrote: Hey guys, could please someone look at this small patch? It blocks approved MPX ISA support on i386 target. diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi index 1d62223..02b1214 100644 --- a/gcc/doc/rtl.texi +++ b/gcc/doc/rtl.texi @@ -1382,6 +1382,10 @@ any @code{CC_MODE} modes listed in the @file{@var{machine}-modes.def}. @xref{Jump Patterns}, also see @ref{Condition Code}. +@findex MODE_BOUND +@item MODE_BOUND +Bound modes class. Used to represent values of pointer bounds. + @findex MODE_RANDOM @item MODE_RANDOM This is a catchall mode class for modes which don't fit into the above So why are bounds distinct modes?Is there some inherent reason why bounds are something other than an integer mode (MODE_INT)? Similarly what's the rationale behind having new types for bounds? Is there some reason why they couldn't be implemented with one of the existing types? ISTM the entire patch is gated on being able to answer those two questions. jeff
Re: patch to enable LRA for ppc
On Mon, Oct 21, 2013 at 5:51 PM, Michael Meissner wrote: What I'm doing is adding secondary reload support so that up until reload time, we can represent VSX addresses as reg+offset, and in secondary reload, create the addition instructions to put the offset in a base register. I haven't made any changes to the machine independent portions of the compiler. As long as IRA uses the secondary reload interface, it should be ok. However, right now, I need to focus most of my attention on getting the secondary reload support to work. One thing that I've asked for before, but to remind you, is I really, really wish secondary reload could allocate two scratch registers if it is given an insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since that gives 2 registers, but future changes will want TFmode to go into a single vector register, and I will need to create another type, like V4DI that does take 2 registers. The case that this is needed for is moving an item from GPRs to VSX registers that takes 2 GPR registers, such as moving 128-bit items in 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the move into, and then I will do the combine operation. Eh, perhaps I'm missing something, but... Isn't one of the great advantages of LRA over reload, that LRA allows you to create new pseudos so that you shouldn't ever need secondary reloads?? Ciao! Steven
Re: [C PATCH] Warn for _Alignas in an array declarator (PR c/58267)
On Sat, Oct 19, 2013 at 07:23:20PM +, Joseph S. Myers wrote: On Wed, 16 Oct 2013, Marek Polacek wrote: @@ -2946,7 +2957,8 @@ c_parser_declarator (c_parser *parser, b struct c_declspecs *quals_attrs = build_null_declspecs (); struct c_declarator *inner; c_parser_consume_token (parser); - c_parser_declspecs (parser, quals_attrs, false, false, true, cla_prefer_id); + c_parser_declspecs (parser, quals_attrs, false, false, true, + true, cla_prefer_id); inner = c_parser_declarator (parser, type_seen_p, kind, seen_id); if (inner == NULL) return NULL; Looking at this again, shouldn't the new argument be false (with associated testcase)? This is parsing pointer declarators, and _Alignas isn't allowed there any more than it is in array declarators Ah, yeah, I think so, fixed testcase added. Thanks for catching it. @@ -3715,7 +3730,8 @@ c_parser_type_name (c_parser *parser) struct c_declarator *declarator; struct c_type_name *ret; bool dummy = false; - c_parser_declspecs (parser, specs, false, true, true, cla_prefer_type); + c_parser_declspecs (parser, specs, false, true, true, false, + cla_prefer_type); if (!specs-declspecs_seen_p) { c_parser_error (parser, expected specifier-qualifier-list); And this should get a testcase added, that _Alignas is correctly rejected in type names where previously it would have been wrongly accepted. I added the tests. (Strictly by the standard it should be false in c_parser_struct_declaration as well - the syntax there doesn't allow _Alignas - but it appears to have been intended to allow it there, so probably best not to change anything there until WG14 reaches a conclusion on the issues I raised in N1731.) I added a comment there to that effect. Tested via make check -C gcc RUNTESTFLAGS=dg.exp=c1x-*.c, ok for trunk? 2013-10-21 Marek Polacek pola...@redhat.com c/ * c-parser.c (c_parser_struct_declaration): Add a comment. (c_parser_declarator): Don't allow _Alignas here. testsuite/ * gcc.dg/c1x-align-5.c: Add more testing. --- gcc/c/c-parser.c.mp 2013-10-21 19:34:17.877539421 +0200 +++ gcc/c/c-parser.c2013-10-21 20:22:45.980220027 +0200 @@ -2645,6 +2645,11 @@ c_parser_struct_declaration (c_parser *p } specs = build_null_declspecs (); decl_loc = c_parser_peek_token (parser)-location; + /* Strictly by the standard, we shouldn't allow _Alignas here, + but it appears to have been intended to allow it there, so + we're keeping it as it is until WG14 reaches a conclusion + of N1731. + http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1731.pdf */ c_parser_declspecs (parser, specs, false, true, true, true, cla_nonabstract_decl); if (parser-error) @@ -2959,7 +2964,7 @@ c_parser_declarator (c_parser *parser, b struct c_declarator *inner; c_parser_consume_token (parser); c_parser_declspecs (parser, quals_attrs, false, false, true, - true, cla_prefer_id); + false, cla_prefer_id); inner = c_parser_declarator (parser, type_seen_p, kind, seen_id); if (inner == NULL) return NULL; --- gcc/testsuite/gcc.dg/c1x-align-5.c.mp 2013-10-21 19:41:20.088041095 +0200 +++ gcc/testsuite/gcc.dg/c1x-align-5.c 2013-10-21 20:16:55.096984545 +0200 @@ -14,6 +14,14 @@ void foo (int a[_Alignas (0) 10]) { } /* void test (void) { + int *_Alignas (long) p; /* { dg-error expected } */ + int *const _Alignas (long) *q; /* { dg-error expected } */ + struct s { int n; }; + __builtin_offsetof (struct s _Alignas (int), n); /* { dg-error expected } */ + __typeof (long double _Alignas (0)) e; /* { dg-error expected } */ + sizeof (int _Alignas (int)); /* { dg-error expected } */ + _Alignas (int _Alignas (float)) int t; /* { dg-error expected } */ + __builtin_types_compatible_p (signed _Alignas (0), unsigned); /* { dg-error expected } */ int a[_Alignas (int) 10]; /* { dg-error expected expression before } */ int b[10]; foo (b); Marek
[jit] Improve error message in gcc_jit_function_new_local
Committed to dmalcolm/jit: gcc/jit/ * libgccjit.c (gcc_jit_function_new_local): Use a more clear error message for the case where someone tries to add a local to a function imported from elsewhere. --- gcc/jit/ChangeLog.jit | 6 ++ gcc/jit/libgccjit.c | 5 - 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index e16902d..df08834 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,11 @@ 2013-10-21 David Malcolm dmalc...@redhat.com + * libgccjit.c (gcc_jit_function_new_local): Use a more clear + error message for the case where someone tries to add a local + to a function imported from elsewhere. + +2013-10-21 David Malcolm dmalc...@redhat.com + * TODO.rst (the C unary prefix operator): Remove completed item. * internal-api.c (gcc::jit::lvalue::get_address): New. * internal-api.h (gcc::jit::lvalue::get_address): New. diff --git a/gcc/jit/libgccjit.c b/gcc/jit/libgccjit.c index 8d77761..40ac98d 100644 --- a/gcc/jit/libgccjit.c +++ b/gcc/jit/libgccjit.c @@ -544,7 +544,10 @@ gcc_jit_function_new_local (gcc_jit_function *func, gcc_jit_type *type, const char *name) { - RETURN_NULL_IF_NOT_FUNC_DEFINITION (func); + RETURN_NULL_IF_FAIL (func, NULL, NULL function); + RETURN_NULL_IF_FAIL (func-get_kind () != GCC_JIT_FUNCTION_IMPORTED, + NULL, + Cannot add locals to an imported function); RETURN_NULL_IF_FAIL (type, NULL, NULL type); RETURN_NULL_IF_FAIL (name, NULL, NULL name); -- 1.7.11.7
Re: [Patch, Fortran] PR58803 - Prevent a double-free with proc-pointer components
On Mon, Oct 21, 2013 at 07:15:07PM +0200, Tobias Burnus wrote: - c-tb = tb; + if (num == 1) + c-tb = tb; + else + { + c-tb = XCNEW (gfc_typebound_proc); + c-tb-where = gfc_current_locus; + *c-tb = *tb; I haven't convinced myself yet, but is this leaking memory? Lines 5006-7 are tb = XCNEW (gfc_typebound_proc); tb-where = gfc_current_locus; In the num = 1 case, c-tb = tb and I presume during teardown c-tb is freed and by extension tb. But, in the else case, are you copying the contents from *tb into *c-tb and the tb is never freed? -- Steve
[jit] Show source locations (if any) in gimple dump
Committed to dmalcolm/jit: gcc/jit/ * internal-api.c (gcc::jit::context::postprocess): Show source line numbers (if any) in gimple dump. --- gcc/jit/ChangeLog.jit | 5 + gcc/jit/internal-api.c | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index df08834..adf273b 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,10 @@ 2013-10-21 David Malcolm dmalc...@redhat.com + * internal-api.c (gcc::jit::context::postprocess): Show source + line numbers (if any) in gimple dump. + +2013-10-21 David Malcolm dmalc...@redhat.com + * libgccjit.c (gcc_jit_function_new_local): Use a more clear error message for the case where someone tries to add a local to a function imported from elsewhere. diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index 8f84e14..8bb6741 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -864,7 +864,7 @@ postprocess () current_function_decl = m_inner_fndecl; if (m_ctxt-get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_INITIAL_GIMPLE)) - dump_function_to_file (m_inner_fndecl, stderr, TDF_VOPS|TDF_MEMSYMS); + dump_function_to_file (m_inner_fndecl, stderr, TDF_VOPS|TDF_MEMSYMS|TDF_LINENO); //debug_tree (m_inner_fndecl); //printf(about to add to cgraph\n); -- 1.7.11.7
[jit] Fix error-handling within gcc::jit::context::compile (timevar issues)
Committed to dmalcolm/jit: gcc/jit/ * internal-api.c (gcc::jit::context::compile): Correctly cleanup timevars in error-handling, preventing an issue where an error on a context left timevar.c in an unstopped state, leading to an assertion failure when restarting timevars in the next compile. Found via fuzz-testing. --- gcc/jit/ChangeLog.jit | 8 gcc/jit/internal-api.c | 13 +++-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index adf273b..2cf5c8d 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,13 @@ 2013-10-21 David Malcolm dmalc...@redhat.com + * internal-api.c (gcc::jit::context::compile): Correctly cleanup + timevars in error-handling, preventing an issue where an error + on a context left timevar.c in an unstopped state, leading to an + assertion failure when restarting timevars in the next compile. + Found via fuzz-testing. + +2013-10-21 David Malcolm dmalc...@redhat.com + * internal-api.c (gcc::jit::context::postprocess): Show source line numbers (if any) in gimple dump. diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index 8bb6741..d90f001 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -1261,7 +1261,11 @@ compile () active_jit_ctxt = NULL; if (errors_occurred ()) -goto error; +{ + timevar_stop (TV_TOTAL); + timevar_print (stderr); + goto error; +} timevar_push (TV_ASSEMBLE); @@ -1279,7 +1283,12 @@ compile () printf (cmd: %s\n, cmd); int ret = system (cmd); if (ret) - goto error; + { + timevar_pop (TV_ASSEMBLE); + timevar_stop (TV_TOTAL); + timevar_print (stderr); + goto error; + } } timevar_pop (TV_ASSEMBLE); -- 1.7.11.7
Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C (and C++)
On 09/11/13 12:18, Iyer, Balaji V wrote: Hello Everyone, Couple weeks back, I had submitted a patch for review that will implement Cilk keywords (_Cilk_spawn and _Cilk_sync) into the C compiler. I recently finished C++ implementation also. In this email, I am attaching 2 patches: 1 for C (and the common parts for C and C++) and 1 for C++. The C++ Changelog is labelled cp-ChangeLog.cilkplus and the other one is just ChangeLog.cilkplus. There isn't much changes in the C patch. Only noticeable changes would be moving functions to the common parts so that C++ can use them. It passes all the tests and does not affect (by affect I mean fail a passing test or pass a failing one) any of the other tests in the testsuite directory. Is this Ok for trunk? Here are my final notes from this pass over the changes. If you could send an updated patch to the list, it would be appreciated. I may have more comments now that I've looked over everything and have a slightly better understanding of the overall structure. I'm not a C++ guy, but are you going to have to do anything special with lambdas? Such as in cilk_set_spawn_marker? I'm going to assume you don't need to do anything for the C++ specific decls in copy_decl_for_cilk. Note I didn't look at any of the C++ specific files, so if you're handling it there, that's fine with me. I didn't look at the tests closely. How thorough are those tests, particularly for front-end issues? The reason I ask is those tests will help ensure that folks don't break the cilk+ support as often as they might otherwise. Basically they cover for the lack of knowledge about the cilk+ implementation by providing developers a heads-up that they broke something. Are there any internal testsuites Intel could contribute to help beef up testing? Can you #undef the helper macros SUBTREE, MODIFIED, INITIALIZED that are defined in extract_free_variables? A nit, but those kind of defines can certainly surprised folks. Actually, unless there's a compelling reason, why not just go ahead and make the calls to extract_free_variables explicit and drop the macros completely? In all, I didn't see anything that made me say wait, this is a huge issue we need to address. Presumably you and Aldy worked through those before I got involved ;-) jeff
Re: patch to enable LRA for ppc
On Mon, Oct 21, 2013 at 08:18:22PM +0200, Steven Bosscher wrote: On Mon, Oct 21, 2013 at 5:51 PM, Michael Meissner wrote: What I'm doing is adding secondary reload support so that up until reload time, we can represent VSX addresses as reg+offset, and in secondary reload, create the addition instructions to put the offset in a base register. I haven't made any changes to the machine independent portions of the compiler. As long as IRA uses the secondary reload interface, it should be ok. However, right now, I need to focus most of my attention on getting the secondary reload support to work. One thing that I've asked for before, but to remind you, is I really, really wish secondary reload could allocate two scratch registers if it is given an insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since that gives 2 registers, but future changes will want TFmode to go into a single vector register, and I will need to create another type, like V4DI that does take 2 registers. The case that this is needed for is moving an item from GPRs to VSX registers that takes 2 GPR registers, such as moving 128-bit items in 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the move into, and then I will do the combine operation. Eh, perhaps I'm missing something, but... Isn't one of the great advantages of LRA over reload, that LRA allows you to create new pseudos so that you shouldn't ever need secondary reloads?? You still need secondary reload. For example, on the powerpc, you have 5 addressing modes for GPRs and FPRS: 1) base register 2) base register + index register 3) base register + offset 4) auto-update for base register + index register 5) auto-update for base register + offset register Now for VSX registers you only have: 1) base register 2) base register + index register So, in the work I'm doing right now, I want to allow reg + offset addressing, but if the register being loaded is an Altivec register (high part of the VSX registers), I need to create a secondary reload to load the offset into and then convert the address to indirect or indexed addressing. You don't want to always disallow offset based addressing, but you want to create the secondary reload when you need to. Similarly, vector types can do indexed addressing (reg+reg) but if you are loading or storing the value into GPR registers, you can't do reg+reg addressing on multi-word items. Finally, one of the features of ISA 2.07 is the notion of load fusion, where the hardware will fuse together a load immediate with an adjacent load (for GPRs, you need the load immediate shifted to be the register that is being loaded, for VSX registers, you just need the instructions adjacent). In this case, before reload we will want to pretend that the machine has addressing to include the fusion forms, and in secondary reload, you will generate the combined insn that will become the fusion instruction. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [Patch, Fortran] PR58803 - Prevent a double-free with proc-pointer components
Steve Kargl wrote: On Mon, Oct 21, 2013 at 07:15:07PM +0200, Tobias Burnus wrote: - c-tb = tb; + if (num == 1) + c-tb = tb; + else + { + c-tb = XCNEW (gfc_typebound_proc); + c-tb-where = gfc_current_locus; + *c-tb = *tb; I haven't convinced myself yet, but is this leaking memory? Lines 5006-7 are tb = XCNEW (gfc_typebound_proc); tb-where = gfc_current_locus; In the num = 1 case, c-tb = tb and I presume during teardown c-tb is freed and by extension tb. But, in the else case, are you copying the contents from *tb into *c-tb and the tb is never freed? Well, as the double-free PR shows, free_components frees the memory: free (p-tb);. And as it walks all components … Tobias
Re: [C++ Patch] PR 58810
... this is an alternate, more conservative approach: a check remains in grokdeclarator and it handles friends too. At variance with the check in grokfndecl it's about function *types*. Thanks, Paolo. /// Index: cp/decl.c === --- cp/decl.c (revision 203907) +++ cp/decl.c (working copy) @@ -10247,7 +10247,7 @@ grokdeclarator (const cp_declarator *declarator, /same as above/ [dcl.fct] */ if ((type_memfn_quals (type) != TYPE_UNQUALIFIED || type_memfn_rqual (type) != REF_QUAL_NONE) - (current_class_type == NULL_TREE || staticp) ) + (current_class_type == NULL_TREE || staticp || friendp) ) { error (staticp ? G_(qualified function types cannot be used to Index: testsuite/g++.dg/other/cv_func3.C === --- testsuite/g++.dg/other/cv_func3.C (revision 0) +++ testsuite/g++.dg/other/cv_func3.C (working copy) @@ -0,0 +1,10 @@ +// PR c++/58810 + +typedef int F() const; + +F f; // { dg-error qualified } + +struct A +{ + friend F f; // { dg-error qualified } +};
Re: [Patch, Fortran] PR58803 - Prevent a double-free with proc-pointer components
On Mon, Oct 21, 2013 at 09:04:22PM +0200, Tobias Burnus wrote: Steve Kargl wrote: On Mon, Oct 21, 2013 at 07:15:07PM +0200, Tobias Burnus wrote: - c-tb = tb; + if (num == 1) + c-tb = tb; + else + { +c-tb = XCNEW (gfc_typebound_proc); (see below) +c-tb-where = gfc_current_locus; +*c-tb = *tb; I haven't convinced myself yet, but is this leaking memory? Lines 5006-7 are tb = XCNEW (gfc_typebound_proc); tb-where = gfc_current_locus; In the num = 1 case, c-tb = tb and I presume during teardown c-tb is freed and by extension tb. But, in the else case, are you copying the contents from *tb into *c-tb and the tb is never freed? Well, as the double-free PR shows, free_components frees the memory: free (p-tb);. And as it walks all components ? Argh. I misread the code. Instead of 'c-tb = XCNEW(...' above, I saw 'tb = XCNEW(...'. So, I thought you might be leaking a local copy of tb. With that misunderstanding cleared up, the patch looks fine. -- Steve
Re: Re-factor inclusion of tree.h
On Mon, Oct 21, 2013 at 12:57 PM, Jeff Law l...@redhat.com wrote: On 10/21/13 10:52, Diego Novillo wrote: I plan to commit this by tomorrow, unless there are objections. I can't think of a good reason to even bother waiting :-) Heh, OK, thanks. After analyzing all the build failures in config-list.mk, I found that we also need explicit inclusion of tree.h in the gen* binaries. Some of these generate all their includes in header files, but I figured we can revise this later on. This was breaking cr16-elf and mips-netbsd. Can anyone think of some way that we can use to automatically block inclusions of tree.h from header files? Code review is the only way that comes to mind. Committed both patches to trunk. Diego. diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c index 973cade..f79380d 100644 --- a/gcc/genattrtab.c +++ b/gcc/genattrtab.c @@ -5100,6 +5100,7 @@ write_header (FILE *outf) fprintf (outf, #include \system.h\\n); fprintf (outf, #include \coretypes.h\\n); fprintf (outf, #include \tm.h\\n); + fprintf (outf, #include \tree.h\\n); fprintf (outf, #include \rtl.h\\n); fprintf (outf, #include \insn-attr.h\\n); fprintf (outf, #include \tm_p.h\\n); diff --git a/gcc/genautomata.c b/gcc/genautomata.c index a0bf076..f6c4b91c4 100644 --- a/gcc/genautomata.c +++ b/gcc/genautomata.c @@ -9665,6 +9665,7 @@ main (int argc, char **argv) #include \system.h\\n #include \coretypes.h\\n #include \tm.h\\n + #include \tree.h\\n #include \rtl.h\\n #include \tm_p.h\\n #include \insn-config.h\\n diff --git a/gcc/genemit.c b/gcc/genemit.c index d4bb301..724a114 100644 --- a/gcc/genemit.c +++ b/gcc/genemit.c @@ -790,6 +790,7 @@ from the machine description file `md'. */\n\n); printf (#include \system.h\\n); printf (#include \coretypes.h\\n); printf (#include \tm.h\\n); + printf (#include \tree.h\\n); printf (#include \rtl.h\\n); printf (#include \tm_p.h\\n); printf (#include \function.h\\n); diff --git a/gcc/genopinit.c b/gcc/genopinit.c index 9c7cf2c..3efb71e 100644 --- a/gcc/genopinit.c +++ b/gcc/genopinit.c @@ -404,6 +404,7 @@ main (int argc, char **argv) #include \system.h\\n #include \coretypes.h\\n #include \tm.h\\n + #include \tree.h\\n #include \rtl.h\\n #include \tm_p.h\\n #include \flags.h\\n diff --git a/gcc/genoutput.c b/gcc/genoutput.c index c3a0936..2a7ee23 100644 --- a/gcc/genoutput.c +++ b/gcc/genoutput.c @@ -238,6 +238,7 @@ output_prologue (void) printf (#include \tm.h\\n); printf (#include \flags.h\\n); printf (#include \ggc.h\\n); + printf (#include \tree.h\\n); printf (#include \rtl.h\\n); printf (#include \expr.h\\n); printf (#include \insn-codes.h\\n); diff --git a/gcc/genpeep.c b/gcc/genpeep.c index a14d061..877fde3 100644 --- a/gcc/genpeep.c +++ b/gcc/genpeep.c @@ -359,6 +359,7 @@ from the machine description file `md'. */\n\n); printf (#include \coretypes.h\\n); printf (#include \tm.h\\n); printf (#include \insn-config.h\\n); + printf (#include \tree.h\\n); printf (#include \rtl.h\\n); printf (#include \tm_p.h\\n); printf (#include \regs.h\\n); diff --git a/gcc/target-globals.c b/gcc/target-globals.c index 65ccb8a..9d223fc 100644 --- a/gcc/target-globals.c +++ b/gcc/target-globals.c @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3. If not see #include tm.h #include insn-config.h #include machmode.h +#include tree.h #include ggc.h #include toplev.h #include target-globals.h
pr37868
Concerning: 2008-11-20 Richard Guenther rguent...@suse.de PR tree-optimization/37868 * gcc.dg/torture/pr37868.c: New testcase. * gcc.c-torture/execute/pr38048-1.c: Likewise. * gcc.c-torture/execute/pr38048-2.c: Likewise. So, is there any reason why we can't make the test case portable as in the below instead of skipping on all the targets that blow on misaligned data? Ok? diff --git a/gcc/testsuite/gcc.dg/torture/pr37868.c b/gcc/testsuite/gcc.dg/torture/pr37868.c index 380b089..b7ff43f 100644 --- a/gcc/testsuite/gcc.dg/torture/pr37868.c +++ b/gcc/testsuite/gcc.dg/torture/pr37868.c @@ -24,15 +24,18 @@ struct X { int main (void) { - struct X x; + union { +struct X x; +unsigned int i; + } s; unsigned int bad_bits; - x.pad = -1; - x.a = -1; - x.b = -1; - x.c = -1; + s.x.pad = -1; + s.x.a = -1; + s.x.b = -1; + s.x.c = -1; - bad_bits = ((unsigned int)-1) ^ *(1+(unsigned int *) x); + bad_bits = ((unsigned int)-1) ^ *(1+(unsigned int *) s.x); if (bad_bits != 0) abort (); return 0;
Re: Patch: Add #pragma ivdep support to the ME and C FE
Dear all, attached is a new version of the patch. Changes: * #pragma GCC ivdep instead of #pragma ivdep * Corrections to the error message in c-parser.c and a test case for it * New wording in the .texi and examples I am still not completely happy with the wording – and I am open for suggestions. In the example, I played safe and mention k -m and k =m; even if k = 0 probably works. I also didn't know how to best state the reason for requiring a condition. (Internal reason: The annotation is attached to the condition - thus, it has to be present. External reason: For vectorization, there shouldn't be a branching in the body of the loop and without a condition in either the for header or in its body, one has an endless loop.) Do you have suggestions for a better wording? If not, is the patch okay for the trunk? Built and regtested (C only). [An all-language bootstrap + regtesting is underway.] Crossrefs: C review comments: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00803.html OpenMPv4's omp simd wording, see bottom of the email at http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01194.html Tobias 2013-08-21 Tobias Burnus bur...@net-b.de PR other/33426 * c-pragma.c (init_pragma) Add #pragma ivdep handling. * c-pragma.h (pragma_kind): Add PRAGMA_IVDEP. PR other/33426 * c-parser.c (c_parser_pragma, c_parser_for_statement): Handle PRAGMA_IVDEP. (c_parser_statement_after_labels): Update call. PR other/33426 * cfgloop.c (flow_loops_find): Search for IFN_ANNOTATE and set safelen. * gimplify.c (gimple_boolify, gimplify_expr): Handle ANNOTATE_EXPR. * internal-fn.c (expand_ANNOTATE): New function. * internal-fn.def (ANNOTATE): Define as new internal function. * tree-core.h (tree_node_kind): Add annot_expr_ivdep_kind. (tree_base) Update a comment. * tree-pretty-print.c (dump_generic_node): Handle ANNOTATE_EXPR. * tree.def (ANNOTATE_EXPR): New DEFTREECODE. * tree.h (ANNOTATE_EXPR_ID, SET_ANNOTATE_EXPR_ID): New macros. * doc/extend.texi (Pragmas): Document #pragma ivdep. PR other/33426 * testsuite/gcc.dg/ivdep.c: New. * testsuite/gcc.dg/vect/vect-ivdep-1.c: New. diff --git a/gcc/c-family/c-pragma.c b/gcc/c-family/c-pragma.c index be5748b..1656000 100644 --- a/gcc/c-family/c-pragma.c +++ b/gcc/c-family/c-pragma.c @@ -1362,6 +1362,8 @@ init_pragma (void) cpp_register_deferred_pragma (parse_in, GCC, pch_preprocess, PRAGMA_GCC_PCH_PREPROCESS, false, false); + cpp_register_deferred_pragma (parse_in, GCC, ivdep, PRAGMA_IVDEP, false, +false); #ifdef HANDLE_PRAGMA_PACK_WITH_EXPANSION c_register_pragma_with_expansion (0, pack, handle_pragma_pack); #else diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h index c421284..705bcb4 100644 --- a/gcc/c-family/c-pragma.h +++ b/gcc/c-family/c-pragma.h @@ -53,6 +53,7 @@ typedef enum pragma_kind { PRAGMA_OMP_TEAMS, PRAGMA_GCC_PCH_PREPROCESS, + PRAGMA_IVDEP, PRAGMA_FIRST_EXTERNAL } pragma_kind; diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 9b6abe0..c721f8a 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -1159,7 +1159,7 @@ static void c_parser_if_statement (c_parser *); static void c_parser_switch_statement (c_parser *); static void c_parser_while_statement (c_parser *); static void c_parser_do_statement (c_parser *); -static void c_parser_for_statement (c_parser *); +static void c_parser_for_statement (c_parser *, bool); static tree c_parser_asm_statement (c_parser *); static tree c_parser_asm_operands (c_parser *); static tree c_parser_asm_goto_operands (c_parser *); @@ -4580,7 +4580,7 @@ c_parser_statement_after_labels (c_parser *parser) c_parser_do_statement (parser); break; case RID_FOR: - c_parser_for_statement (parser); + c_parser_for_statement (parser, false); break; case RID_GOTO: c_parser_consume_token (parser); @@ -5033,7 +5033,7 @@ c_parser_do_statement (c_parser *parser) */ static void -c_parser_for_statement (c_parser *parser) +c_parser_for_statement (c_parser *parser, bool ivdep) { tree block, cond, incr, save_break, save_cont, body; /* The following are only used when parsing an ObjC foreach statement. */ @@ -5139,8 +5139,17 @@ c_parser_for_statement (c_parser *parser) { if (c_parser_next_token_is (parser, CPP_SEMICOLON)) { - c_parser_consume_token (parser); - cond = NULL_TREE; + if (ivdep) + { + c_parser_error (parser, missing loop condition in loop with + %GCC ivdep% pragma); + cond = error_mark_node; + } + else + { + c_parser_consume_token (parser); + cond = NULL_TREE; + } } else { @@ -5154,6 +5163,12 @@ c_parser_for_statement (c_parser *parser) c_parser_skip_until_found (parser, CPP_SEMICOLON, expected %;%); } + if (ivdep cond != error_mark_node) + { + cond = build1 (ANNOTATE_EXPR, TREE_TYPE (cond), cond);
Re: [PATCH, rs6000] Be careful with special permute masks for little endian
Please hold off reviewing this. I see at least one testcase that will have to be modified (expected code generation pattern will be different for LE vs. BE). I'll resubmit the whole thing later today. Thanks, Bill On Mon, 2013-10-21 at 11:39 -0500, Bill Schmidt wrote: Whoops, looks like I missed some simpler cases (REG with the wrong mode instead of SUBREG with the wrong mode). Is this revised version ok, assuming it passes testing? It should fix a few more test cases. The changed code from the previous version is in the last hunk. Thanks, Bill Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c(revision 203792) +++ gcc/config/rs6000/rs6000.c(working copy) @@ -28837,17 +28838,23 @@ altivec_expand_vec_perm_const (rtx operands[4]) { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, { 0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw : CODE_FOR_altivec_vmrglw, { 0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb : CODE_FOR_altivec_vmrghb, { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh : CODE_FOR_altivec_vmrghh, { 8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw : CODE_FOR_altivec_vmrghw, { 8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31 } }, { OPTION_MASK_P8_VECTOR, CODE_FOR_p8_vmrgew, { 0, 1, 2, 3, 16, 17, 18, 19, 8, 9, 10, 11, 24, 25, 26, 27 } }, @@ -28980,6 +28987,26 @@ altivec_expand_vec_perm_const (rtx operands[4]) enum machine_mode omode = insn_data[icode].operand[0].mode; enum machine_mode imode = insn_data[icode].operand[1].mode; + /* For little-endian, don't use vpkuwum and vpkuhum if the + underlying vector type is not V4SI and V8HI, respectively. + For example, using vpkuwum with a V8HI picks up the even + halfwords (BE numbering) when the even halfwords (LE + numbering) are what we need. */ + if (!BYTES_BIG_ENDIAN +icode == CODE_FOR_altivec_vpkuwum +((GET_CODE (op0) == REG + GET_MODE (op0) != V4SImode) + || (GET_CODE (op0) == SUBREG +GET_MODE (XEXP (op0, 0)) != V4SImode))) + continue; + if (!BYTES_BIG_ENDIAN +icode == CODE_FOR_altivec_vpkuhum +((GET_CODE (op0) == REG + GET_MODE (op0) != V8HImode) + || (GET_CODE (op0) == SUBREG +GET_MODE (XEXP (op0, 0)) != V8HImode))) + continue; + /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering from 0 to 2N-1. */ On Mon, 2013-10-21 at 10:02 -0400, David Edelsohn wrote: On Mon, Oct 21, 2013 at 8:49 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, In altivec_expand_vec_perm_const, we look for special masks that match the behavior of specific instructions, so we can use those instructions rather than load a constant control vector and perform a permute. Some of the masks must be treated differently for little endian mode. The masks that represent merge-high and merge-low operations have reversed meanings in little-endian, because of the reversed ordering of the vector elements. The masks that represent vector-pack operations remain correct when the mode of the input operands matches the natural mode of the instruction, but not otherwise. This is because the pack instructions always select the rightmost, low-order bits of the vector element. There are cases where we use this, for example, with a V8SI vector matching a vpkuwum mask in order to select the odd-numbered elements of the
Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2
have an option for true AAPCS compliance, which will be allowed to break the C++11 memory model and And an option that addresses your requirements, which will _not_ break the C++11 memory model So the problem isn't that what *I* need conflicts with C++11, it's that what AAPCS needs conflicts?
[PATCH] Generate fused widening multiply-and-accumulate operations only when the widening multiply has single use
Hi, This patch changes the widening_mul pass to fuse the widening multiply with accumulate only when the multiply has single use. The widening_mul pass currently does the conversion regardless of the number of the uses, which can cause poor code-gen in cases like the following: typedef int ArrT [10][10]; void foo (ArrT Arr, int Idx) { Arr[Idx][Idx] = 1; Arr[Idx + 10][Idx] = 2; } On AArch64, after widening_mul, the IR is like _2 = (long unsigned int) Idx_1(D); _3 = Idx_1(D) w* 40; _5 = Arr_4(D) + _3; *_5[Idx_1(D)] = 1; _8 = WIDEN_MULT_PLUS_EXPR Idx_1(D), 40, 400; _9 = Arr_4(D) + _8; *_9[Idx_1(D)] = 2; Where the arrows point, there are redundant widening multiplies. Bootstrap successfully on x86_64. The patch passes the regtest on aarch64, arm and x86_64. OK for the trunk? Thanks, Yufeng p.s. Note that x86_64 doesn't suffer from this issue as the corresponding widening multiply accumulate op is not available on the target. gcc/ * tree-ssa-math-opts.c (convert_plusminus_to_widen): Call has_single_use () and not do the conversion if has_single_use () returns false for the multiplication result.diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index f7f8ec9..d316990 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2425,12 +2425,16 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, It might also appear that it would be sufficient to use the existing operands of the widening multiply, but that would limit the choice of - multiply-and-accumulate instructions. */ + multiply-and-accumulate instructions. + + If the widened-multiplication result has more than one uses, it is + probably wiser not to do the conversion. */ if (code == PLUS_EXPR (rhs1_code == MULT_EXPR || rhs1_code == WIDEN_MULT_EXPR)) { if (!is_widening_mult_p (rhs1_stmt, type1, mult_rhs1, - type2, mult_rhs2)) + type2, mult_rhs2) + || !has_single_use (rhs1)) return false; add_rhs = rhs2; conv_stmt = conv1_stmt; @@ -2438,7 +2442,8 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR) { if (!is_widening_mult_p (rhs2_stmt, type1, mult_rhs1, - type2, mult_rhs2)) + type2, mult_rhs2) + || !has_single_use (rhs2)) return false; add_rhs = rhs1; conv_stmt = conv2_stmt;
Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping
On Mon, Oct 21, 2013 at 6:32 PM, Martin Jambor wrote: --- a/gcc/ira.c +++ b/gcc/ira.c @@ -4314,6 +4314,197 @@ find_moveable_pseudos (void) free_dominance_info (CDI_DOMINATORS); } + +/* If insn is interesting for parameter range-splitting shring-wrapping + preparation, i.e. it is a single set from a hard register to a pseudo, which + is live at CALL_DOM, return the destination. Otherwise return NULL. */ + +static rtx +interesting_dest_for_shprep (rtx insn, basic_block call_dom) +{ + rtx set = single_set (insn); + if (!set) +return NULL; + rtx src = SET_SRC (set); + rtx dest = SET_DEST (set); + if (!REG_P (src) || !HARD_REGISTER_P (src) + || !REG_P (dest) || HARD_REGISTER_P (dest) + || (call_dom !bitmap_bit_p (df_get_live_in (call_dom), REGNO (dest See below... +return NULL; + return dest; +} + +/* Split live ranges of pseudos that are loaded from hard registers in the + first BB in a BB that dominates all non-sibling call if such a BB can be + found and is not in a loop. */ + +static void +split_live_ranges_for_shrink_wrap (void) +{ + basic_block bb, call_dom = NULL; + basic_block first = single_succ (ENTRY_BLOCK_PTR); + rtx insn, last_interesting_insn = NULL; + bitmap_head need_new, reachable; + vecbasic_block queue; + + if (!flag_shrink_wrap) +return; + + bitmap_initialize (need_new, 0); + bitmap_initialize (reachable, 0); + queue.create (n_basic_blocks); + + FOR_EACH_BB (bb) +FOR_BB_INSNS (bb, insn) + if (CALL_P (insn) !SIBLING_CALL_P (insn)) + { + if (bb == first) + { + bitmap_clear (need_new); + bitmap_clear (reachable); + queue.release (); + return; + } + + bitmap_set_bit (need_new, bb-index); + bitmap_set_bit (reachable, bb-index); + queue.quick_push (bb); + break; + } + + if (queue.is_empty ()) +{ + bitmap_clear (need_new); + bitmap_clear (reachable); + queue.release (); + return; +} + + while (!queue.is_empty ()) +{ + edge e; + edge_iterator ei; + + bb = queue.pop (); + FOR_EACH_EDGE (e, ei, bb-succs) + if (e-dest != EXIT_BLOCK_PTR +bitmap_set_bit (reachable, e-dest-index)) + queue.quick_push (e-dest); +} + queue.release (); + + FOR_BB_INSNS (first, insn) +{ + rtx dest = interesting_dest_for_shprep (insn, NULL); + if (!dest) + continue; + + if (DF_REG_DEF_COUNT (REGNO (dest)) 1) See below... + { + bitmap_clear (need_new); + bitmap_clear (reachable); + return; + } + + for (df_ref use = DF_REG_USE_CHAIN (REGNO(dest)); + use; + use = DF_REF_NEXT_REG (use)) You're using DF in these places. But IRA and LRA don't work with DF. After update_equiv_regs DF caches and liveness may be incorrect. You'd have to add a df_analyze call but I'm not sure how that will interact with IRA/LRA's own dataflow frameworks (e.g. w.r.t. REG_DEAD/REG_UNUSED notes). + rtx uin = DF_REF_INSN (use); + int ubbi = BLOCK_FOR_INSN (uin)-index; int ubbi = DF_REF_BB (use)? Ciao! Steven
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
On 10/17/2013 07:15 AM, Kirill Yukhin wrote: +(define_mode_attr ssescalarsize + [(V8DI 64) (V4DI 64) (V2DI 64) + (V32HI 16) (V16HI 16) (V8HI 16) + (V16SI 32) (V8SI 32) (V4SI 32) + (V16SF 16) (V8DF 64)]) Error on V16SF. Probably better to fill this out. +(define_insn avx512f_loadmode_mask + [(set (match_operand:VI48F_512 0 register_operand =v,v) + (vec_merge:VI48F_512 + (match_operand:VI48F_512 1 nonimmediate_operand v,m) + (match_operand:VI48F_512 2 vector_move_operand 0C,0C) + (match_operand:avx512fmaskmode 3 register_operand k,k)))] + TARGET_AVX512F +{ + switch (get_attr_mode (insn)) Better to just use sseinsnmode here, as it's a compile-time constant. +{ +case MODE_V8DF: +case MODE_V16SF: + return vmovassemodesuffix\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}; +default: + return vmovdqassescalarsize\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}; +} +} + [(set_attr type ssemov) + (set_attr prefix evex) + (set_attr memory none,load) + (set_attr mode sseinsnmode)]) + +(define_insn avx512f_storemode_mask Likewise. +(define_insn avx512f_movesmode_mask + [(set (match_operand:VF_128 0 register_operand =v) + (vec_merge:VF_128 + (vec_merge:VF_128 + (match_operand:VF_128 2 register_operand v) + (match_operand:VF_128 3 vector_move_operand 0C) + (match_operand:avx512fmaskmode 4 register_operand k)) + (match_operand:VF_128 1 register_operand v) + (const_int 1)))] + TARGET_AVX512F + vmovssescalarmodesuffix\t{%2, %1, %0%{%4%}%N3|%0%{%4%}%N3, %1, %2} + [(set_attr type ssemov) + (set_attr prefix evex) + (set_attr mode sseinsnmode)]) Nested vec_merge? That seems... odd to say the least. How in the world does this get matched? +(define_insn *avx512f_loadsmode_mask Likewise. +(define_insn avx512f_storesmode_mask + [(set (match_operand:ssescalarmode 0 memory_operand =m) + (vec_select:ssescalarmode + (vec_merge:VF_128 + (match_operand:VF_128 1 register_operand v) + (vec_duplicate:VF_128 + (match_dup 0)) + (match_operand:avx512fmaskmode 2 register_operand k)) + (parallel [(const_int 0)])))] This seems similar, though of course it's an extract. I still can't imagine how it could be used. -(define_insn rcp14mode +(define_insn mask_codeforrcp14modemask_name What, this name isn't used for non-masked anymore? -(define_insn srcp14mode +(define_insn *srcp14mode Likewise. These changes don't belong in this patch. -(define_insn rsqrt14mode +(define_insn *rsqrt14mode Likewise. @@ -2565,9 +2751,9 @@ (define_insn *fma_fmadd_mode [(set (match_operand:FMAMODE 0 register_operand =v,v,v,x,x) (fma:FMAMODE - (match_operand:FMAMODE 1 nonimmediate_operand %0, 0, v, x,x) - (match_operand:FMAMODE 2 nonimmediate_operand vm, v,vm, x,m) - (match_operand:FMAMODE 3 nonimmediate_operand v,vm, 0,xm,x)))] + (match_operand:FMAMODE 1 nonimmediate_operand %0,0,v,x,x) + (match_operand:FMAMODE 2 nonimmediate_operand vm,v,vm,x,m) + (match_operand:FMAMODE 3 nonimmediate_operand v,vm,0,xm,x)))] Unrelated changes. Repeated throughout the fma patterns. +(define_insn *fmai_fmadd_mode_maskz + [(set (match_operand:VF_128 0 register_operand =v,v) + (vec_merge:VF_128 + (vec_merge:VF_128 + (fma:VF_128 + (match_operand:VF_128 1 nonimmediate_operand 0,0) + (match_operand:VF_128 2 nonimmediate_operand vm,v) + (match_operand:VF_128 3 nonimmediate_operand v,vm)) + (match_operand:VF_128 4 const0_operand) + (match_operand:QI 5 register_operand k,k)) + (match_dup 1) + (const_int 1)))] + TARGET_AVX512F + @ + vfmadd132ssescalarmodesuffix\t{%2, %3, %0%{%5%}%N4|%0%{%5%}%N4, %3, %2} + vfmadd213ssescalarmodesuffix\t{%3, %2, %0%{%5%}%N4|%0%{%5%}%N4, %2, %3} + [(set_attr type ssemuladd) + (set_attr mode MODE)]) These seem like useless patterns. If they're for builtins, then they seem like useless builtins. See above. @@ -3686,8 +4328,8 @@ (set_attr athlon_decode vector,double,*) (set_attr amdfam10_decode vector,double,*) (set_attr bdver1_decode direct,direct,*) - (set_attr btver2_decode double,double,double) (set_attr prefix orig,orig,vex) + (set_attr btver2_decode double,double,double) (set_attr mode SF)]) Unrelated changes. +(define_expand vec_unpacku_float_hi_v16si + [(match_operand:V8DF 0 register_operand) + (match_operand:V16SI 1 register_operand)] + TARGET_AVX512F +{ + REAL_VALUE_TYPE TWO32r; + rtx k, x, tmp[4]; + + real_ldexp (TWO32r, dconst1, 32); + x = const_double_from_real_value (TWO32r, DFmode); + + tmp[0] = force_reg (V8DFmode, CONST0_RTX (V8DFmode)); + tmp[1] = force_reg (V8DFmode, ix86_build_const_vector (V8DFmode, 1, x)); + tmp[2] = gen_reg_rtx (V8DFmode); + tmp[3] = gen_reg_rtx (V8SImode); + k = gen_reg_rtx
Re: [patch] Cleanup tree-ssa-ter.c exports
what went wrong? I see it in my checkouts...? Just open the file in your preferred editor. :-) -- Eric Botcazou
[PATCH, rs6000] Be careful with special permute masks for little endian, take 2
Hi, This is a revision of my earlier patch on the subject, expanded to catch a few more cases and with some attendant test-case adjustments: In altivec_expand_vec_perm_const, we look for special masks that match the behavior of specific instructions, so we can use those instructions rather than load a constant control vector and perform a permute. Some of the masks must be treated differently for little endian mode. The masks that represent merge-high and merge-low operations have reversed meanings in little-endian, because of the reversed ordering of the vector elements. The masks that represent vector-pack operations remain correct when the mode of the input operands matches the natural mode of the instruction, but not otherwise. This is because the pack instructions always select the rightmost, low-order bits of the vector element. There are cases where we use this, for example, with a V8SI vector matching a vpkuwum mask in order to select the odd-numbered elements of the vector. In little endian mode, this instruction will get us the even-numbered elements instead. There is no alternative instruction with the desired behavior, so I've just disabled use of those masks for little endian when the mode isn't natural. This requires adjusting the altivec-perm-1.c test case. The vector pack tests are moved to a new altivec-perm-3.c test, which is restricted to big-endian targets. These changes fix 49 failures in the test suite for little endian mode (9 vector failures left to go!). Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no new failures. Is this ok for trunk? Thanks, Bill gcc: 2013-10-21 Bill Schmidt wschm...@vnet.ibm.com * config/rs6000/rs6000.c (altivec_expand_vec_perm_const): Reverse meaning of merge-high and merge-low masks for little endian; avoid use of vector-pack masks for little endian for mismatched modes. gcc/testsuite: 2013-10-21 Bill Schmidt wschm...@vnet.ibm.com * gcc.target/powerpc/altivec-perm-1.c: Move the two vector pack tests into... * gcc.target/powerpc/altivec-perm-3.c: ...this new test, which is restricted to big-endian targets. Index: gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c === --- gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c (revision 0) +++ gcc/testsuite/gcc.target/powerpc/altivec-perm-3.c (revision 0) @@ -0,0 +1,23 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_altivec_ok } */ +/* { dg-skip-if { powerpc*le-*-* } { * } { } } */ +/* { dg-options -O -maltivec -mno-vsx } */ + +typedef unsigned char V __attribute__((vector_size(16))); + +V p2(V x, V y) +{ + return __builtin_shuffle(x, y, + (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); + +} + +V p4(V x, V y) +{ + return __builtin_shuffle(x, y, + (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); +} + +/* { dg-final { scan-assembler-not vperm } } */ +/* { dg-final { scan-assembler vpkuhum } } */ +/* { dg-final { scan-assembler vpkuwum } } */ Index: gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c === --- gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c (revision 203792) +++ gcc/testsuite/gcc.target/powerpc/altivec-perm-1.c (working copy) @@ -19,19 +19,6 @@ V b4(V x) return __builtin_shuffle(x, (V){ 4,5,6,7, 4,5,6,7, 4,5,6,7, 4,5,6,7, }); } -V p2(V x, V y) -{ - return __builtin_shuffle(x, y, - (V){ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 }); - -} - -V p4(V x, V y) -{ - return __builtin_shuffle(x, y, - (V){ 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 }); -} - V h1(V x, V y) { return __builtin_shuffle(x, y, @@ -72,5 +59,3 @@ V l4(V x, V y) /* { dg-final { scan-assembler vspltb } } */ /* { dg-final { scan-assembler vsplth } } */ /* { dg-final { scan-assembler vspltw } } */ -/* { dg-final { scan-assembler vpkuhum } } */ -/* { dg-final { scan-assembler vpkuwum } } */ Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 203792) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -28837,17 +28838,23 @@ altivec_expand_vec_perm_const (rtx operands[4]) { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31 } }, { OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum, { 2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb : CODE_FOR_altivec_vmrglb, { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, -{ OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh, +{ OPTION_MASK_ALTIVEC, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh : CODE_FOR_altivec_vmrglh, { 0,
RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
Hi, On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote: Hr. After some further testing, I'm afraid this patch is still buggy. :-( I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new pr56997-1.c testcase, it got stuck in infinite recursion between store_split_bit_field/store_fixed_bit_field and/or extract_split_bit_field/extract_fixed_bit_field. This didn't show up in my previous mainline testing. The difference between 4.8 and mainline head is the alignment of the incoming str_rtx passed to store_bit_field/extract_bit_field, due to the changes in r199898. The alignment is 8 bits on mainline, and 32 on 4.8. It seems to me that the bitfield code ought to handle store/extract from a more-aligned object and it's probably possible to construct an example that fails in this way on mainline as well. It looks like there are conflicting assumptions between get_best_mode, the places that call it in store_fixed_bit_field and extract_fixed_bit_field, and the code that actually does the splitting -- which uses a unit based on the MEM_ALIGN of the incoming rtx, rather than its mode. In the case where it's failing, get_best_mode is deciding it can't do a HImode access without splitting, but then the split code is assuming SImode units because of the 32-bit alignment, but not actually changing the mode of the rtx to match that. On top of that, this is one of the cases that strict_volatile_bitfield_p checks for and returns false, but the callers of store_bit_field and extract_bit_field in expr.c have already fiddled with the mode of the incoming rtx assuming that -fstrict-volatile-bitfields does apply. It doesn't get into that infinite recursion if it's compiled with -fno-strict-volatile-bitfields instead; in that case the incoming rtx has BLKmode, get_best_mode chooses SImode, and it's able to do the access without splitting at all. Anyway I tried a couple different bandaids that solved the infinite recursion problem but caused regressions elsewhere, and now I'm not sure of the right place to fix this. Given that there is also still ongoing discussion about making this a 3-way switch (etc) I am going to hold off on committing this patch for now. -Sandra You are right, if I modify pr56997-1 the patch crashes on trunk: typedef struct s{ char Prefix; short Type; }__attribute__((packed,aligned(4))) ss; please add a test case for this. This way we have op0 aligned 32 and HI mode selected bitoffset=8 numbits=16. crashes only when reading this value, because the access tries to use SImode. For some reason the alignment seems to be wrong in 4.8. Thanks Bernd.
RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2
have an option for true AAPCS compliance, which will be allowed to break the C++11 memory model and And an option that addresses your requirements, which will _not_ break the C++11 memory model So the problem isn't that what *I* need conflicts with C++11, it's that what AAPCS needs conflicts? Yes, there are two written specifications which are in conflict AAPCS and C++11. We cannot follow both at the same time. But from this discussion I've learned, that your target's requirements can easily co-exist with the C++ memory model. Because if you only use well-formed bit-fields, the C++ memory model just allows everything, and we can choose what to do. Regards Bernd.
Re: patch to enable LRA for ppc
On 13-10-21 11:51 AM, Michael Meissner wrote: On Sun, Oct 20, 2013 at 10:48:08PM -0400, Vladimir Makarov wrote: On 13-10-18 11:26 AM, David Edelsohn wrote: On Thu, Oct 3, 2013 at 5:02 PM, Vladimir Makarov vmaka...@redhat.com wrote: The following patch permits today trunk to use LRA for ppc by default. To switch it off -mno-lra can be used. The patch was bootstrapped on ppc64. GCC testsuite does not have regressions too (in comparison with reload). The change in rs6000.md is for fix LRA failure on a recently added ppc test. Vlad, I have not forgotten this patch. We are trying to figure out the right timeframe to make this change. The patch does affect performance -- both positively and negatively; most are in the noise but not all. And there still are some SPEC benchmarks that fail to build with the patch, at least in Mike's tests. And Mike is implementing some patches to utilize reload to improve use of VSX registers, which would need to be mirrored in LRA for the equivalent functionality. Thanks for informing me, David. I am ready to work on any LRA ppc issues when it will be in the trunk. It would be easier for me to work on LRA ppc if the patch is committed to the trunk and of course LRA is used as non-default local RA. I don't know what Mike is doing on reload to use VSX registers. I guess it is usage of VSX regs as spilled locations for GENERAL regs instead of memory. If it is so, it is 2 day work to add this functionality in LRA (as it already has analogous functionality for Intel processors and that gave a nice SPECFP2000 improvement for them) and probably more work on resolving issues especially as I have no power8. I would say lets add -mlra, but make the default OFF for the time being. We can always switch the default later. Sure, if you know some LRA problems it should not be on default. Moreover, if we still have the problems when releasing gcc4.9, I think we should exclude any possibility for a user to use LRA for ppc. I don't want to have GGC-4.9 users blaming LRA. But adding LRA to PPC on the trunk (switched OFF by default) earlier could help me a lot to work on the issues. Vladimir, I thought I included you in the list when I gave status. The big thing is several of the Spec 2006 benchmarks don't work in 32-bit mode, and I get a lot of Fortran errors, again in 32-bit. I also saw some decimal floating point problems. No, I did not see the message (or may be missed). I need to check. What I'm doing is adding secondary reload support so that up until reload time, we can represent VSX addresses as reg+offset, and in secondary reload, create the addition instructions to put the offset in a base register. I haven't made any changes to the machine independent portions of the compiler. As long as IRA uses the secondary reload interface, it should be ok. However, right now, I need to focus most of my attention on getting the secondary reload support to work. I completely understand. You are quite busy this time as me rushing some stuff into gcc-4.9. One thing that I've asked for before, but to remind you, is I really, really wish secondary reload could allocate two scratch registers if it is given an insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since that gives 2 registers, but future changes will want TFmode to go into a single vector register, and I will need to create another type, like V4DI that does take 2 registers. The case that this is needed for is moving an item from GPRs to VSX registers that takes 2 GPR registers, such as moving 128-bit items in 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the move into, and then I will do the combine operation. Ok. I guess LRA can be adapted to some new secondary_reload hook returning two scratch registers.
Re: patch to enable LRA for ppc
On 13-10-21 2:55 PM, Michael Meissner wrote: On Mon, Oct 21, 2013 at 08:18:22PM +0200, Steven Bosscher wrote: On Mon, Oct 21, 2013 at 5:51 PM, Michael Meissner wrote: What I'm doing is adding secondary reload support so that up until reload time, we can represent VSX addresses as reg+offset, and in secondary reload, create the addition instructions to put the offset in a base register. I haven't made any changes to the machine independent portions of the compiler. As long as IRA uses the secondary reload interface, it should be ok. However, right now, I need to focus most of my attention on getting the secondary reload support to work. One thing that I've asked for before, but to remind you, is I really, really wish secondary reload could allocate two scratch registers if it is given an insn that takes 4 arguments. Right now, I'm allocating a TFmode scratch, since that gives 2 registers, but future changes will want TFmode to go into a single vector register, and I will need to create another type, like V4DI that does take 2 registers. The case that this is needed for is moving an item from GPRs to VSX registers that takes 2 GPR registers, such as moving 128-bit items in 64-bit mode, or 64-bit items in 32-bit mode. I need two registers to do the move into, and then I will do the combine operation. Eh, perhaps I'm missing something, but... Isn't one of the great advantages of LRA over reload, that LRA allows you to create new pseudos so that you shouldn't ever need secondary reloads?? You still need secondary reload. As I understand, Mike is telling about secondary_reload hook. LRA can generate chain of reloads as long as it is needed. It is achieved by subsequent processing generated reload insns on one or more lra-constraints subpasses. Porting LRA frequently consists of removing secondary reload hook as in many cases it is smart enough to find necessary reloads just from insns constraints. But there are still really complicated situations when LRA can not do this and it still needs directions from secondary_reload hook. I am sure PPC has real needs to use this hook even for LRA. For example, on the powerpc, you have 5 addressing modes for GPRs and FPRS: 1) base register 2) base register + index register 3) base register + offset 4) auto-update for base register + index register 5) auto-update for base register + offset register Now for VSX registers you only have: 1) base register 2) base register + index register So, in the work I'm doing right now, I want to allow reg + offset addressing, but if the register being loaded is an Altivec register (high part of the VSX registers), I need to create a secondary reload to load the offset into and then convert the address to indirect or indexed addressing. You don't want to always disallow offset based addressing, but you want to create the secondary reload when you need to. Similarly, vector types can do indexed addressing (reg+reg) but if you are loading or storing the value into GPR registers, you can't do reg+reg addressing on multi-word items. Finally, one of the features of ISA 2.07 is the notion of load fusion, where the hardware will fuse together a load immediate with an adjacent load (for GPRs, you need the load immediate shifted to be the register that is being loaded, for VSX registers, you just need the instructions adjacent). In this case, before reload we will want to pretend that the machine has addressing to include the fusion forms, and in secondary reload, you will generate the combined insn that will become the fusion instruction.
Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping
On 13-10-21 6:56 PM, Steven Bosscher wrote: + { + bitmap_clear (need_new); + bitmap_clear (reachable); + return; + } + + for (df_ref use = DF_REG_USE_CHAIN (REGNO(dest)); + use; + use = DF_REF_NEXT_REG (use)) You're using DF in these places. But IRA and LRA don't work with DF. After update_equiv_regs DF caches and liveness may be incorrect. You'd have to add a df_analyze call but I'm not sure how that will interact with IRA/LRA's own dataflow frameworks (e.g. w.r.t. REG_DEAD/REG_UNUSED notes). Sorry, Martin. I think Steven is right. IRA/LRA (and reload pass) creates so many changes in RTL that DF infrastructure would slow down the compiler a lot and therefore df info is not updated during RA. Your patch mostly uses a correct DF-info because there are few changes since updating is off. You could move your optimization a bit up before df_clear_flags (DF_NO_INSN_RESCAN); or move this call right after your optimizations (possibly some minor df calls are needed too to restore live info for RA after your RTL changes).
[patch] fix tree-ssa-ter.h dup
On 10/21/2013 07:56 PM, Eric Botcazou wrote: what went wrong? I see it in my checkouts...? Just open the file in your preferred editor. :-) ah. open it in an editor with more than 25 lines on the screen you mean :-) Thanks fixed... revision 203914 Andrew * tree-ssa-ter.h: Remove duplicate copy of file contents. Index: tree-ssa-ter.h === *** tree-ssa-ter.h (revision 203836) --- tree-ssa-ter.h (working copy) *** extern bitmap find_replaceable_exprs (va *** 24,52 extern void dump_replaceable_exprs (FILE *, bitmap); #endif /* GCC_TREE_SSA_TER_H */ - /* Header file for tree-ssa-ter.c exports. -Copyright (C) 2013 Free Software Foundation, Inc. - - This file is part of GCC. - - GCC is free software; you can redistribute it and/or modify it under - the terms of the GNU General Public License as published by the Free - Software Foundation; either version 3, or (at your option) any later - version. - - GCC is distributed in the hope that it will be useful, but WITHOUT ANY - WARRANTY; without even the implied warranty of MERCHANTABILITY or - FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License - for more details. - - You should have received a copy of the GNU General Public License - along with GCC; see the file COPYING3. If not see - http://www.gnu.org/licenses/. */ - - #ifndef GCC_TREE_SSA_TER_H - #define GCC_TREE_SSA_TER_H - - extern bitmap find_replaceable_exprs (var_map); - extern void dump_replaceable_exprs (FILE *, bitmap); - - #endif /* GCC_TREE_SSA_TER_H */ --- 24,26
Re: [Patch, fortran] PR58793 - Wrong value for _vtab for intrinsic types with CLASS(*): storage_size of class(*) gives wrong result
Committed as r203915. I have a fix for the hollerith nonsense that yields the same behaviour as other brands. I'll submit tonight. Cheers Paul On 21 October 2013 11:20, Tobias Burnus bur...@net-b.de wrote: Paul Richard Thomas wrote: This patch is fairly obvious and follows a suggestion from Tobias to use gfc_element_size. He even wrote the testcase! Bootstrapped and regested on FC17/x86_64 - OK for trunk? Looks good to me. Thanks for the patch. PS In writing this, I have just noted that I need to trap ts-type == BT_HOLLERITH. It would be a rather bizarre thing to try to do but I just know that somebody, somewhere will eventually try it :-) Consider that it will be done in the committed version. I think that should be done as follow-up patch.In particular, it is not obvious how it should be handled. The program below ICEs in gfc_typenode_for_spec;with both Intel's and Cray's compiler, it gives Character with len 3 Something else I somehow had expected that it maps to a character. program test call up(abc) call up(3habc) contains subroutine up(x) class(*) :: x select type(x) type is (character(*)) print *, 'Character with len', len(x) class default print *, 'Something else' end select end subroutine end program test The following - related to TYPE(*) - is also accepted, but I think it shouldn't. (call up(x) is presumably invalid). It fails here at assembler time. program test use iso_c_binding integer,target ::aa call up(c_loc(aa)) ! Probably okay but not matchable? call bar(aa) !OK contains subroutine up(x) class(*) :: x select type(x) ! type is (c_ptr) ! invalid rejected by gfortran ! print *, 'C_ptr' class default print *, 'Something else' end select end subroutine subroutine bar(x) type(*) :: x call up(x) ! Invalid end subroutine bar end program test Tobias 2013-10-20 Paul Thomas pa...@gcc.gnu.org PR fortran 57893 * class.c : Include target-memory.h. (gfc_find_intrinsic_vtab) Build a minimal expression so that gfc_element_size can be used to obtain the storage size, rather that the kind value. 2013-10-20 Paul Thomas pa...@gcc.gnu.org PR fortran 57893 * gfortran.dg/unlimited_polymorphic_12.f90 : New test. -- The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy