Re: [PATCH/RFC] C++ FE: expression ranges (v2)
On 11/19/2015 03:46 PM, Jason Merrill wrote: On 11/15/2015 12:01 AM, David Malcolm wrote: As with the C frontend, there's an issue with tree nodes that don't have locations: VAR_DECL, INTEGER_CST, etc: int test (int foo) { return foo * 100; ^^^ ^^^ } where we'd like to access the source spelling ranges of the expressions during parsing, so that we can use them when reporting parser errors. Hmm, I had been thinking to address this in the C++ front end by wrapping uses in another tree: NOP_EXPR for rvalues, VIEW_CONVERT_EXPR for lvalues. On the other hand, my direction seems likely to cause more issues, especially with code that doesn't yet know how to handle VIEW_CONVERT_EXPR, and could create ambiguity with explicit conversions. So I guess your approach seems reasonable. What is the memory consumption impact of this change? Also, in cp_parser_new_expression I attempted to generate meaningful ranges e.g.: int *foo = new int[100]; ^~~~ but it seems to be hard to do this, due to the trailing optional components; I found myself wanting to ask the lexer for the last token it consumed (in particular, I'm interested in the location_t of that token). Is this a reasonable thing to add to the lexer? cp_lexer_previous_token seems like what you want. - return cp_build_unary_op (code, arg1, candidates != 0, complain); + { + tree result = cp_build_unary_op (code, arg1, candidates != 0, complain); + protected_set_expr_location (result, loc); I'd much rather pass loc into cp_build_unary_op. At least add a FIXME to that effect. Likewise in the other places you call *build* and then set the loc. +#if 0 +/* FIXME: various assertions can be put in here when debugging, + for tracking down where location information gets thrown + away (during a trip through a purely "tree" value). */ +if (m_value && m_value != error_mark_node) + { + if (TREE_CODE (m_value) == FUNCTION_DECL) + return; // for now + gcc_assert (CAN_HAVE_LOCATION_P (m_value)); + //gcc_assert (m_loc != UNKNOWN_LOCATION); Yeah, I don't think you want to assert on UNKNOWN_LOCATION; some code does not and should not have an associated location. In particular, cleanups. Build the assignment expression. Its default -location is the location of the '=' token. */ +location: + LHS = RHS + ^ +is the location of the '=' token as the +caret, ranging from the start of the lhs to the +end of the rhs. */ saved_input_location = input_location; + loc = make_location (loc, + expr.get_start (), + rhs.get_finish ()); input_location = loc; expr = build_x_modify_expr (loc, expr, assignment_operator, rhs, complain_flags (decltype_p)); + protected_set_expr_location (expr, loc); input_location = saved_input_location; Do we still need to mess with input_location here? If so, please add a FIXME explaining what still needs to be fixed. Jason
Re: [PATCH] Fix debug info related ICE when inlining back fnsplit function (PR debug/66432)
On November 20, 2015 9:11:01 PM GMT+01:00, Jakub Jelinek wrote: >Hi! > >This patch fixes ICE where a parameter mentioned in a debug source bind >has its VLA type mistakenly remapped and that leads to inconsistent >type >between the PARM_DECL and SSA_NAMEs derived from it. > >The patch Tom posted for this can't work, because we assume that the >s=> value as well as debug_decl_args decl in that case is DECL_ORIGIN >of the PARM_DECL. But, in this case we are replacing the DECL_ORIGIN >PARM_DECL immediately with a DEBUG_EXPR_DECL anyway, so there is no >point remapping the var we don't own (it could be in a different >function >etc.). > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for >trunk/5.3? OK. Thanks, Richard. >2015-11-20 Jakub Jelinek > > PR debug/66432 > * tree-inline.c (copy_debug_stmt): If > gimple_debug_source_bind_get_value is DECL_ORIGIN of a PARM_DECL > in decl_debug_args, don't call remap_gimple_op_r on it. > > * gcc.dg/debug/pr66432.c: New test. > >--- gcc/tree-inline.c.jj 2015-11-14 19:36:03.0 +0100 >+++ gcc/tree-inline.c 2015-11-20 17:17:00.632082622 +0100 >@@ -2864,8 +2864,6 @@ copy_debug_stmt (gdebug *stmt, copy_body > else if (gimple_debug_source_bind_p (stmt)) > { > gimple_debug_source_bind_set_var (stmt, t); >- walk_tree (gimple_debug_source_bind_get_value_ptr (stmt), >- remap_gimple_op_r, &wi, NULL); > /* When inlining and source bind refers to one of the optimized >away parameters, change the source bind into normal debug bind >referring to the corresponding DEBUG_EXPR_DECL that should have >@@ -2889,7 +2887,10 @@ copy_debug_stmt (gdebug *stmt, copy_body > break; > } > } >- } >+ } >+ if (gimple_debug_source_bind_p (stmt)) >+ walk_tree (gimple_debug_source_bind_get_value_ptr (stmt), >+ remap_gimple_op_r, &wi, NULL); > } > > processing_debug_stmt = 0; >--- gcc/testsuite/gcc.dg/debug/pr66432.c.jj2015-11-20 >17:40:44.589171083 +0100 >+++ gcc/testsuite/gcc.dg/debug/pr66432.c 2015-11-20 17:38:48.0 >+0100 >@@ -0,0 +1,19 @@ >+/* PR debug/66432 */ >+/* { dg-do compile } */ >+/* { dg-options "-O2 -g" } */ >+ >+extern void baz (const char *, const char *) __attribute__ >((__noreturn__)); >+ >+void >+foo (int x, int y[x][x]) >+{ >+ if (x < 2) >+baz ("", ""); >+} >+ >+void >+bar (void) >+{ >+ int z[2][2] = { { 1, 2 }, { 3, 4 } }; >+ foo (2, z); >+} > > Jakub
Re: RFA: PATCH to match.pd for c++/68385
On November 20, 2015 8:58:15 PM GMT+01:00, Jason Merrill wrote: >In this bug, we hit the (A & sign-bit) != 0 -> A < 0 transformation. >Because of delayed folding, the operands aren't fully folded yet, so we > >have NOP_EXPRs around INTEGER_CSTs, and so calling wi::only_sign_bit_p >ICEs. We've been seeing several similar bugs, where code calls >integer_zerop and therefore assumes that they have an INTEGER_CST, but >in fact integer_zerop does STRIP_NOPS. > >This patch changes the pattern to only match if the operand is actually > >an INTEGER_CST. Alternatively we could call tree_strip_nop_conversions > >on the operand, but I would expect that to have issues when the >conversion changes the signedness of the type. > >OK if testing passes? What happens if we remove the nops stripping from integer_zerop? Do other integer predicates strip nops? Richard.
[PATCH] 23_containers/vector/profile/vector.cc
The testcase allocates so much memory that it requires an additional option to enabled higher memory limit on AIX. Bootstrapped on powerpc-ibm-aix7.1.0.0 Thanks, David * testsuite/23_containers/vector/profile/vector.cc: Add maxdata option on AIX. Index: 23_containers/vector/profile/vector.cc === --- 23_containers/vector/profile/vector.cc (revision 230674) +++ 23_containers/vector/profile/vector.cc (working copy) @@ -2,6 +2,8 @@ // Advice: set tmp as 1 // { dg-options "-DITERATIONS=20" { target simulator } } +// AIX requires higher memory limit +// { dg-additional-options "-Wl,-bmaxdata:0x2000" { target { powerpc-ibm-aix* } } } #ifndef ITERATIONS #define ITERATIONS 2000
libgo patch committed: Fix offset handling in syscall.Sendfile
PR 66378 points out that syscall.Sendfile does not correct handle an offset argument: it ignored the initial value. This patch fixes the problem. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline and GCC 5 branch. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230695) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -81dcb1ba4de82a6c9325cb322d5a832a6b1f168d +97ec885c715b3922b0866c081554899b8d50933a The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/syscall/libcall_bsd.go === --- libgo/go/syscall/libcall_bsd.go (revision 230463) +++ libgo/go/syscall/libcall_bsd.go (working copy) @@ -17,6 +17,7 @@ func Sendfile(outfd int, infd int, offse var soff Offset_t var psoff *Offset_t if offset != nil { + soff = Offset_t(*offset) psoff = &soff } written, err = sendfile(outfd, infd, psoff, count) Index: libgo/go/syscall/libcall_linux.go === --- libgo/go/syscall/libcall_linux.go (revision 230463) +++ libgo/go/syscall/libcall_linux.go (working copy) @@ -327,6 +327,7 @@ func Sendfile(outfd int, infd int, offse var soff Offset_t var psoff *Offset_t if offset != nil { + soff = Offset_t(*offset) psoff = &soff } written, err = sendfile(outfd, infd, psoff, count)
Re: [PATCH,RFC] Introduce RUN_UNDER_VALGRIND in test-suite
On Thu, 19 Nov 2015, Martin Li?ka wrote: > Hello. > > In last two weeks I've removed couple of memory leaks, mainly tight to > middle-end. > Currently, a user of the GCC compiler can pass '--enable-checking=valgrind' > configure option > that will run all commands within valgrind environment, but as the valgrind > runs just with '-q' option, > the result is not very helpful. > > I would like to start with another approach, where we can run all tests in > test-suite > within the valgrind sandbox and return an exit code if there's an error seen > by the tool. > That unfortunately leads to many latent (maybe false positives, FE issues, > ...) that can > be efficiently ignored by valgrind suppressions file (the file is part of > suggested patch). > > The first version of the valgrind.supp can survive running compilation of > tramp3d with -O2 > and majority of tests in test-suite can successfully finish. Most of memory > leaks > mentioned in the file can be eventually fixed. I didn't quite understand the need for the suppression files. Is it like Markus said, only because valgrind annotations are not on by default? Then let's change it so that's the default during DEV-PHASE = experimental (the development phase) or prerelease. I really thought that was the case by now. (The suppression files are IMHO a useful addition to contrib/ either way.) > As I noticed in results log files, most of remaining issues are connected to > gcc.c and > lto-wrapper.c files. gcc.c heavily manipulates with strings and it would > probably require > usage of a string pool, that can easily eventually removed (just in case of > --enable-valgrind-annotations). > The second source file tends to produce memory leaks because of fork/exec > constructs. However both > can be improved during next stage1. > > Apart from aforementioned issues, the compiler does not contain so many > issues and I think it's > doable to prune them and rely on reported valgrind errors. > > Patch touches many .exp files, but basically does just couple of > modifications: > > 1) gcc-defs.exp introduces new global variable run_under_valgrind > 2) new procedure dg-run-valgrind distinguishes between just passing options > to 'gd-test', >or runs 'dg-test' with additional flags that enable valgrind (using > -wrapper) > 3) new procedure dg-runtest-valgrind does the similar > 4) many changes in corresponding *.exp files that utilize these procedures > > The patch should be definitely part of next stage1, but I would appreciate > any thoughts > about the described approach? IIRC you can replace the actual dg-runtest proc with your own (implementing a wrapper). Grep aroung, I think we do that already. That's certainly preferable instead of touching all callers. > > Thank you, > Martin brgds, H-P
ICF fixes
Hi, this patchs fixes few issues in ipa-icf. First I drop the use of TYPE_CANONICAL because that is no longer part of type compatibility machinery. Second I also hash TYPE_MODE for aggregates, becuase we now always require a match and I check that we don't match types that are incomplete where this would give false negatives. Second change is in func_checker::compatible_types_p where I disabled comparing of alias sets of types that do not have alias sets (and thus can never be read). I hoped to switch ipa-icf-gimple.c to use of operand_equal_p and drop all such wrong type mathcing, but I suppose it will need to wait for next stage1. Bootstrapped/regtested x86_64-linux, comitted. Honza * ipa-icf.c (sem_item::add_type): Do not look for TYPE_CANONICAL; do not check AGGREGATE_TYPE_P when adding TYPE_MODE; Check that all record types are complete. * ipa-icf-gimple.c (func_checker::compatible_types_p): Do not compare alias sets for types w/o alias sets. Index: ipa-icf.c === --- ipa-icf.c (revision 230683) +++ ipa-icf.c (working copy) @@ -1543,11 +1543,8 @@ sem_item::add_type (const_tree type, inc } type = TYPE_MAIN_VARIANT (type); - if (TYPE_CANONICAL (type)) -type = TYPE_CANONICAL (type); - if (!AGGREGATE_TYPE_P (type)) -hstate.add_int (TYPE_MODE (type)); + hstate.add_int (TYPE_MODE (type)); if (TREE_CODE (type) == COMPLEX_TYPE) { @@ -1574,6 +1571,7 @@ sem_item::add_type (const_tree type, inc } else if (RECORD_OR_UNION_TYPE_P (type)) { + gcc_checking_assert (COMPLETE_TYPE_P (type)); hashval_t *val = optimizer->m_type_hash_cache.get (type); if (!val) Index: ipa-icf-gimple.c === --- ipa-icf-gimple.c(revision 230683) +++ ipa-icf-gimple.c(working copy) @@ -233,7 +233,15 @@ func_checker::compatible_types_p (tree t if (!types_compatible_p (t1, t2)) return return_false_with_msg ("types are not compatible"); - if (get_alias_set (t1) != get_alias_set (t2)) + /* We do a lot of unnecesary matching of types that are not being + accessed and thus do not need to be compatible. In longer term we should + remove these checks on all types which are not accessed as memory + locations. + + For time being just avoid calling get_alias_set on types that are not + having alias sets defined at all. */ + if (type_with_alias_set_p (t1) && type_with_alias_set_p (t2) + && get_alias_set (t1) != get_alias_set (t2)) return return_false_with_msg ("alias sets are different"); return true;
Go testsuite patch committed: Skip nilptr.go if PIE
This patch to the Go testsuite skips the nilptr.go test when the default is to produce PIE. This should fix GCC PR 66406. Ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian 2015-11-20 Ian Lance Taylor PR go/66406 * go.test/go-test.exp (go-gc-tests): Skip nilptr.go if PIE. Index: go.test/go-test.exp === --- go.test/go-test.exp (revision 229630) +++ go.test/go-test.exp (working copy) @@ -398,6 +398,11 @@ proc go-gc-tests { } { } } + if [check_effective_target_pie_enabled] { + untested $test + continue + } + if { [file tail $test] == "init1.go" } { # This tests whether GC runs during init, which for gccgo # it currently does not.
libgo patch committed: Don't run multicast tests on nil interface if -test.short
This libgo patch is a backport of https://golang.org/cl/17154. It fixes the tests for the net package to not run the multicast tests on a nil interface when using -test.short, which is the default when running make check. This should fix GCC PR 65785. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline and GCC 5 branch. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230694) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -b839c8c35af49bd6d86306ad34449654a4657cb1 +81dcb1ba4de82a6c9325cb322d5a832a6b1f168d The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/net/listen_test.go === --- libgo/go/net/listen_test.go (revision 230463) +++ libgo/go/net/listen_test.go (working copy) @@ -542,7 +542,7 @@ func TestIPv4MulticastListener(t *testin // routing stuff for finding out an appropriate // nexthop containing both network and link layer // adjacencies. - if ifi == nil && !*testExternal { + if ifi == nil && (testing.Short() || !*testExternal) { continue } for _, tt := range ipv4MulticastListenerTests { @@ -618,7 +618,7 @@ func TestIPv6MulticastListener(t *testin // routing stuff for finding out an appropriate // nexthop containing both network and link layer // adjacencies. - if ifi == nil && (!*testExternal || !*testIPv6) { + if ifi == nil && (testing.Short() || !*testExternal || !*testIPv6) { continue } for _, tt := range ipv6MulticastListenerTests {
libgo: use clock_gettime to get current time
PR 66574 aka https://golang.org/issue/11222 points out that libgo is only retrieving the current time in microseconds, when it should be using nanoseconds. This patch fixes the problem. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230689) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -f79db38cf3484b63f7807abef05eecb23e9d0806 +b839c8c35af49bd6d86306ad34449654a4657cb1 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/configure.ac === --- libgo/configure.ac (revision 230463) +++ libgo/configure.ac (working copy) @@ -501,9 +501,10 @@ PTHREAD_LIBS= AC_CHECK_LIB([pthread], [pthread_create], PTHREAD_LIBS=-lpthread) AC_SUBST(PTHREAD_LIBS) -dnl Test if -lrt is required for sched_yield and/or nanosleep. +dnl Test if -lrt is required for sched_yield or nanosleep or clock_gettime. AC_SEARCH_LIBS([sched_yield], [rt]) AC_SEARCH_LIBS([nanosleep], [rt]) +AC_SEARCH_LIBS([clock_gettime], [rt]) AC_C_BIGENDIAN Index: libgo/runtime/go-now.c === --- libgo/runtime/go-now.c (revision 230463) +++ libgo/runtime/go-now.c (working copy) @@ -13,11 +13,11 @@ struct time_now_ret now() { - struct timeval tv; + struct timespec ts; struct time_now_ret ret; - gettimeofday (&tv, NULL); - ret.sec = tv.tv_sec; - ret.nsec = tv.tv_usec * 1000; + clock_gettime (CLOCK_REALTIME, &ts); + ret.sec = ts.tv_sec; + ret.nsec = ts.tv_nsec; return ret; }
[PATCH/RFC v2] PR68212: Improve Accounting of Block Frequencies During Loop Unrolling
The problem addressed by this patch is that the intermediate code produced during loop unrolling has incorrect block frequencies. The erroneous block frequencies result because block frequencies are not adjusted to account for the execution contexts into which they are copied. These incorrect frequencies disable and/or confuse subsequent compiler optimizations. The general idea of how we address this problem is two fold: 1. Before a loop body is unpeeled into a pre-header location, we temporarily adjust the loop body frequencies to represent the values appropriate for the context into which the loop body is to be copied. 2. After unrolling the loop body (by replicating the loop body (N-1) times within the loop), we recompute all frequencies associated with blocks contained within the loop. This revision of the patch distributed earlier (See https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00814.html) addresses various concerns raised as feedback in response to the original post. This new patch includes the following improvements: 1. Various formatting and style improvements: a) Comments describing functions use all-caps naming of arguments, do not start continuation lines with an asterisk, and end with a comment termination sequence ("*/") on the same line as the comment text. b) Compound statements (enclosed within braces) consisting of only one simple statement are now represented as single statements (without the braces). c) Gratuitous changes in white space between old and new code revisions have been removed. d) Several obvious and/or redundant comments have been removed. e) Comments previously tagged to the end of a C statement have been moved to the line that precedes the statement. f) The ChangeLog has been rewritten and reformatted according to guidelines provided in the "GNU Coding Standards" 2. Certain functions have been rewritten and/or removed: a) in_loop_p() was written to use bb->loop_father and flow_loop_nested_p() instead of iteration over all blocks in the loop. b) same_edge_p() was removed. All invocations of this function are replaced with pointer equality tests. c) in_loop_of_header_p() was removed as it was deemed unnecessary. d) in_block_set_p() was removed as it was deemed unnecessary. e) get_exit_edges_from_loop_blocks() was removed as it was deemed unnecessary. f) internal() was removed as it was deemed redundant with fatal_error(). 3. Improvements include: a) A set of 15 test programs has been added into the testsuite/gcc.dg/ subdirectory. Some of these test programs still fail and work is ongoing to identify and address the reasons for the failures . In several cases, it is known that the failures result from block frequency inaccuracies that originate outside of the loop unrolling code. b) Where concerns were raised about the potential for unbounded recursion in some of the functions, comments now clarify the bound on the recursion depth. c) Code that had been conditionally compiled under the ENABLE_CHECKING attribute is now unconditionally compiled and executed only if the flag_checking variable is non-zero. gcc/ChangeLog: 2015-11-20 Kelvin Nilsen * loop-unroll.c (unroll_loop_constant_iterations): After replicating the loop body within a loop, recompute the frequencies for all blocks contained within the loop. (unroll_loop_runtime_iterations): Before copying loop body to preheader location, temporarily adjust the loop body frequencies to represent the context into which the loop body will be copied. After replicating the loop body within a loop, recompute the frequencies for all blocks contained within the loop. * cfgloopmanip.c (in_loop_p): New helper function. (zero_loop_frequencies): New helper function. (in_edge_set_p): New helper function. (in_call_chain_p): New helper function. (recursively_zero_frequency): New helper function. (recursion_detected_p): New helper function. (zero_partial_loop_frequencies): New helper function. (recursively_increment_frequency): New helper function. (increment_loop_frequencies): New helper function. (check_loop_frequency_integrity): New helper function. (can_duplicate_loop_p): New helper function. (duplicate_loop_to_header_edge): Recompute loop frequencies after blocks are replicated (unrolled) into the loop body. * cfgloopmanip.h: New extern declarations. gcc/testsuite/ChangeLog: 2015-11-20 Kelvin Nilsen * gcc.dg/pr68212-1.c: New test. * gcc.dg/pr68212-10.c: New test. * gcc.dg/pr68212-11.c: New test. * gcc.dg/pr68212-12.c: New test. * gcc.dg/pr68212-13.c: New test. * gcc.dg/pr68212-14.c: New test. * gcc.dg/pr68212-15.c: New test. * gcc.dg/pr68212-2.c: New test. * gcc.dg/pr68212-3.c: New test. * gcc.dg/pr68212-4.c: New test. * gcc.d
Re: [PATCH, rs6000, gcc 5 backport] Fix PR target/67808, LRA ICE on double to long double conversion
On Fri, Oct 02, 2015 at 02:04:48PM -0500, Peter Bergner wrote: > PR67808 exposes a problem with the constraints in the *extenddftf2_internal > pattern, in that it allows TFmode operands to occupy Altivec registers > which they are not allowed to do. Reload was able to work around the > problem, but LRA is more pedantic and it caused it to go into an infinite > spill loop until it ICEd. The following patch from Mike changes the TFmode > output operand to use the "d" constraint instead of "ws". It also allows > using the "ws" constraint for the two input operands, since that is allowed > for DFmode operands. > > This passed bootstraps (with reload on by default and lra on by default) > and shows no testsuite regressions. Is this ok for trunk? > > The bug is also present in the FSF 5 branch (4.9 is ok), is this ok for > that too, assuming my bootstrap/regtesting there are clean? The following patch backports the fix to GCC 5.x. There were no regressions in doing the bootstrap/make check on both a big endian power7 system and a little endian power8 system. Is it ok to apply the patch to the gcc-5 branch? 2015-10-20 Michael Meissner Back port from trunk: 2015-10-05 Michael Meissner Peter Bergner PR target/67808 * config/rs6000/rs6000.md (extenddftf2): In the expander, only allow registers, but provide insns for the combiner to create for loads from memory. Separate VSX code from non-VSX code. For non-VSX code, combine extenddftf2_fprs into extenddftf2 and rename externaldftf2_internal to externaldftf2_fprs. Reorder constraints so that registers come before memory operations. Drop support from converting DFmode to TFmode, if the DFmode value is in a GPR register. (extenddftf2_fprs): Likewise. (extenddftf2_internal): Likewise. (extenddftf2_vsx): Likewise. (extendsftf2): In the expander, only allow registers, but provide insns for the combiner to create for stores and loads. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] Add LANG_HOOKS_EMPTY_RECORD_P for C++ empty class
On Fri, Nov 20, 2015 at 2:17 PM, Jason Merrill wrote: > On 11/20/2015 01:52 PM, H.J. Lu wrote: >> >> On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener >> wrote: >>> >>> On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu wrote: Empty record should be returned and passed the same way in C and C++. This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which defaults to return false. For C++, LANG_HOOKS_EMPTY_RECORD_P is defined to is_really_empty_class, which returns true for C++ empty classes. For LTO, we stream out a bit to indicate if a record is empty and we store it in TYPE_LANG_FLAG_0 when streaming in. get_ref_base_and_extent is changed to set bitsize to 0 for empty records. Middle-end and x86 backend are updated to ignore empty records for parameter passing and function value return. Other targets may need similar changes. >>> >>> >>> Please avoid a new langhook for this and instead claim a bit in >>> tree_type_common >>> like for example restrict_flag (double-check it is unused for >>> non-pointers). >> >> >> There is no bit in tree_type_common I can overload. restrict_flag is >> checked for non-pointers to issue an error when it is used on >> non-pointers: >> >> >> /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38: >> error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’ >> typedef typename T::L __restrict__ r;// { dg-error "'__restrict__' >> qualifiers cannot" "" } > > > The C++ front end only needs to check TYPE_RESTRICT for this purpose on > front-end-specific type codes like TEMPLATE_TYPE_PARM; cp_type_quals could > handle that specifically if you change TYPE_RESTRICT to only apply to > pointers. > restrict_flag is also checked in this case: [hjl@gnu-6 gcc]$ cat x.i struct dummy { }; struct dummy foo (struct dummy __restrict__ i) { return i; } [hjl@gnu-6 gcc]$ gcc -S x.i -Wall x.i:4:13: error: invalid use of ‘restrict’ foo (struct dummy __restrict__ i) ^ x.i:4:13: error: invalid use of ‘restrict’ [hjl@gnu-6 gcc]$ restrict_flag can't also be used to indicate `i' is an empty record. H.J.
Go patch committed: Fix minor performance problem in import-archive
GCC PR 68141 points out a minor performance problem in the code in gcc/go/gofrontend/import-archive.cc: in the Archive_iterator comparison methods, the type is not passed by reference. This patch fixes it. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230685) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -128d5b14b8ab967cb61c01a9b2c596bda7d04c63 +f79db38cf3484b63f7807abef05eecb23e9d0806 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/import-archive.cc === --- gcc/go/gofrontend/import-archive.cc (revision 230463) +++ gcc/go/gofrontend/import-archive.cc (working copy) @@ -468,11 +468,11 @@ class Archive_iterator } bool - operator==(const Archive_iterator p) const + operator==(const Archive_iterator& p) const { return this->off_ == p->off; } bool - operator!=(const Archive_iterator p) const + operator!=(const Archive_iterator& p) const { return this->off_ != p->off; } private:
libgo patch committed: Fix PR 67976 on GCC 5 branch
This patch to libgo fixes PR 67976 on the GCC 5 branch. The bug was already fixed on mainline when libgo was upgraded to Go 1.5. Ian Index: libgo/go/cmd/cgo/out.go === --- libgo/go/cmd/cgo/out.go (revision 229639) +++ libgo/go/cmd/cgo/out.go (working copy) @@ -102,10 +102,9 @@ func (p *Package) writeDefs() { } if !cVars[n.C] { - fmt.Fprintf(fm, "extern char %s[];\n", n.C) - fmt.Fprintf(fm, "void *_cgohack_%s = %s;\n\n", n.C, n.C) - if !*gccgo { + fmt.Fprintf(fm, "extern char %s[];\n", n.C) + fmt.Fprintf(fm, "void *_cgohack_%s = %s;\n\n", n.C, n.C) fmt.Fprintf(fc, "#pragma cgo_import_static %s\n", n.C) }
Re: [PATCH] Fix PR objc/68438 (uninitialized source ranges)
On Fri, 20 Nov 2015, David Malcolm wrote: > The source ranges are verified using the same unit-testing plugin used > for C expressions. This leads to a wart, which is that it means having > a .m test file lurking below gcc.dg/plugin. A workaround would be to > create an objc.dg/plugin subdirectory, with a new plugin.exp for testing > Objective C plugins, and having it reference the existing plugin below > gcc.dg/plugin. That seemed like excessive complication, so I went for > the simpler approach of simply putting the .m file below gcc.dg/plugin. Have you made sure that this test is quietly not run if the --enable-languages configuration does not build the ObjC compiler? -- Joseph S. Myers jos...@codesourcery.com
libgo patch committed: Handle DW_AT_specification in cgo
The earlydebug work has caused https://gcc.gnu.org/PR68072 when using the cgo tool. The patch to fix this in the master sources is https://golang.org/cl/17151 . This patch fixes the problem in the gccgo sources. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline and GCC 5 branch. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230677) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -d52835c9376985f92f35c32af5f1808239981536 +128d5b14b8ab967cb61c01a9b2c596bda7d04c63 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/cmd/cgo/gcc.go === --- libgo/go/cmd/cgo/gcc.go (revision 230463) +++ libgo/go/cmd/cgo/gcc.go (working copy) @@ -490,6 +490,11 @@ func (p *Package) loadDWARF(f *File, nam name, _ := e.Val(dwarf.AttrName).(string) typOff, _ := e.Val(dwarf.AttrType).(dwarf.Offset) if name == "" || typOff == 0 { + if e.Val(dwarf.AttrSpecification) != nil { + // Since we are reading all the DWARF, + // assume we will see the variable elsewhere. + break + } fatalf("malformed DWARF TagVariable entry") } if !strings.HasPrefix(name, "__cgo__") {
Re: PATCH to shorten_compare -Wtype-limits handling
On 11/20/2015 11:04 AM, Manuel López-Ibáñez wrote: On 20 November 2015 at 17:42, Jeff Law wrote: So we have to detangle the operand shortening from warning detection. Kai's idea was to first make the shortening code "pure" in the sense that it would have no side effects other than to generate the warnings. Canonicalization and other transformations would still occur internally, but not be reflected in the IL. That was the overall plan and he posted a patch for that. But that patch didn't do the due diligence to verify that once the shortening code was made "pure" that we didn't regress on the quality of the code we generated. I thought that the original plan was to make the warning code also use match.pd. That is, that all folding, including FE folding, will be match.pd-based. Has this changed? I don't think that's changed. Detangling the two is the first step. Once detangled we can then look to move the warning to a more suitable location -- right now it's in the C/C++ front-ends, firing way too early. Instead it ought to be checked in gimple form, after match.pd canonicalization and simplifications. Jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/18/2015 07:15 AM, Nick Clifton wrote: Hi Guys, I recently discovered a bug in the current Redundant Extension Elimination optimization. If the candidate extension instruction increases the number of hard registers used, the pass does not check to see if these extra registers are live between the definition and the extension. For example: (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) (insn 45 (set (reg:QI r10) (mem:QI (reg:HI r18))) [...] (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) [...] (insn 88 (set (reg:HI r10) (zero_extend:HI (reg:QI r10))) (This is on the RL78 target where HImode values occupy two hard registers and QImode values only one. The bug however is generic, not RL78 specific). The REE pass transforms this into: (insn 44 (set (reg:QI r11) (mem:QI (reg:HI r20))) (insn 45 (set (reg:HI r10) (zero_extend:HI (mem:QI (reg:HI r18 [...] (insn 71 (set (reg:HI r14) (zero_extend:HI (reg:QI r11))) [...] (insn 88 deleted) Note how the new set at insn 45 clobbers the value loaded by insn 44 into r11. The patch below is my attempt to fix this. It is not correct however. I could not work out how to determine if a given hard register is live at any point between two insns. So instead I used the liveness information associated with the basic block containing the definition. If one of the extended registers is live or becomes live in this block, and this block is not the same block as the one containing the extension insn, then I stop the optimization. This works for the RL78 for all of the cases that I could find. So ... comments please ? Will gcc ever generate a single basic block that contains both the definition and the extension instructions, but also where the extra hard registers are used for another purpose as well ? Tested with no regressions (or fixes) on an x86-pc-linux-gnu target. Also tested with no regression and 7 fixes on an rl78-elf target. Cheers Nick gcc/ChangeLog 2015-11-18 Nick Clifton * ree.c (regs_live_between): New function. (add_removable_extension): If the extension uses more hard registers than the definition then check that the extra hard registers are not live between the definition and the extension. @@ -1080,6 +1123,30 @@ } } + /* Fourth, if the extended version occupies more registers than +the original version then make sure that these extra registers +are not live between the definition and the extension. */ + if (HARD_REGNO_NREGS (REGNO (dest), mode) + > HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg))) So this seems like a reasonable place to catch this. I certainly prefer to avoid work later by filtering earlier like you've done. The problem is with your implementation of regs_live_between. You're assuming we've got straight-line code between the original set and the extension. That's not guaranteed. Essentially we've got use-def chains via the DF framework. So, in theory, they can be in different blocks and arbitrary flow control between them. In fact, the extension could be before the original set in the insn chain (though obviously not for execution order). I think it's reasonable to just punt when we see that the source/dest register of the extension are the same and the number of hard regs is different. That's what I'm testing now. jeff
Re: [PATCH] Add LANG_HOOKS_EMPTY_RECORD_P for C++ empty class
On 11/20/2015 01:52 PM, H.J. Lu wrote: On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener wrote: On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu wrote: Empty record should be returned and passed the same way in C and C++. This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which defaults to return false. For C++, LANG_HOOKS_EMPTY_RECORD_P is defined to is_really_empty_class, which returns true for C++ empty classes. For LTO, we stream out a bit to indicate if a record is empty and we store it in TYPE_LANG_FLAG_0 when streaming in. get_ref_base_and_extent is changed to set bitsize to 0 for empty records. Middle-end and x86 backend are updated to ignore empty records for parameter passing and function value return. Other targets may need similar changes. Please avoid a new langhook for this and instead claim a bit in tree_type_common like for example restrict_flag (double-check it is unused for non-pointers). There is no bit in tree_type_common I can overload. restrict_flag is checked for non-pointers to issue an error when it is used on non-pointers: /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38: error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’ typedef typename T::L __restrict__ r;// { dg-error "'__restrict__' qualifiers cannot" "" } The C++ front end only needs to check TYPE_RESTRICT for this purpose on front-end-specific type codes like TEMPLATE_TYPE_PARM; cp_type_quals could handle that specifically if you change TYPE_RESTRICT to only apply to pointers. Jason
Re: [PATCH] g++.dg/init/vbase1.C and g++.dg/cpp/ucn-1.C
On Nov 16, 2015, at 6:02 AM, Renlin Li wrote: > On 14/11/15 00:33, David Edelsohn wrote: >> No RISC architecture can store directly to MEM, so the expected RTL in >> g++.dg/init/vbase1.C is wrong. I am adding XFAIL for PowerPC. This >> probably should be disabled for ARM and other RISC architectures. > > I observed the same problem in arm. > > This passes for aarch64 and mips as they have zero register to do that. > However, other RISC might not have that feature, for example arm and RS6000 > in this case. I fixed this with the below patch. Tested on x86_64 linux, x86_64 darwin and my port. If you want to list aarch64/arn and mips, please do. * g++.dg/init/vbase1.C: Only run on x86_64-*-* as this testcase isn't portable. Index: g++.dg/init/vbase1.C === --- g++.dg/init/vbase1.C(revision 230675) +++ g++.dg/init/vbase1.C(working copy) @@ -42,4 +42,4 @@ int main(int, char**) // Verify that the SubB() mem-initializer is storing 0 directly into // this->D.whatever rather than into a stack temp that is then copied into the // base field. -// { dg-final { scan-rtl-dump "set \[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" { xfail { powerpc*-*-* } } } } +// { dg-final { scan-rtl-dump "set \[^\n\]*\n\[^\n\]*this\[^\n\]*\n\[^\n\]*const_int 0" "expand" { target { x86_64-*-* } } } }
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
> I would hazard a guess that the authors simply didn't consider the > multi-hard reg case. Essentially if the original set reached an > extension, then obviously the original set got there unharmed and the > extended destination should reach as well -- except that doesn't apply > to multi-word hard regs. This pass started as a Zero-Extension Elimination pass for x86-64 and was only considering implicit SI->DI extensions initially. The algorithm was tailored to this specific pattern and I agree that the pass should be reimplemented. -- Eric Botcazou
Re: [PATCH 3b/4][AArch64] Add scheduling model for Exynos M1
On 11/20/2015 11:17 AM, James Greenhalgh wrote: On Tue, Nov 10, 2015 at 11:54:00AM -0600, Evandro Menezes wrote: 2015-11-10 Evandro Menezes gcc/ * config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model. * config/aarch64/aarch64.md: Include "exynos-m1.md". * config/arm/arm-cores.def: Use the Exynos M1 sched model. * config/arm/arm.md: Include "exynos-m1.md". * config/arm/arm-tune.md: Regenerated. * config/arm/exynos-m1.md: New file. This patch adds the scheduling model for Exynos M1. It depends on https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01257.html Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu. Please, commit if it's alright. From 0b7b6d597e5877c78c4d88e0d4491858555a5364 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 9 Nov 2015 17:18:52 -0600 Subject: [PATCH 2/2] [AArch64] Add scheduling model for Exynos M1 gcc/ * config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model. * config/aarch64/aarch64.md: Include "exynos-m1.md". These changes are fine. * config/arm/arm-cores.def: Use the Exynos M1 sched model. * config/arm/arm.md: Include "exynos-m1.md". * config/arm/arm-tune.md: Regenerated. These changes need an ack from an ARM reviewer. * config/arm/exynos-m1.md: New file. I have a few comments on this model. +;; The Exynos M1 core is modeled as a triple issue pipeline that has +;; the following functional units. + +(define_automaton "exynos_m1_gp") +(define_automaton "exynos_m1_ls") +(define_automaton "exynos_m1_fp") + +;; 1. Two pipelines for simple integer operations: A, B +;; 2. One pipeline for simple or complex integer operations: C + +(define_cpu_unit "em1_xa, em1_xb, em1_xc" "exynos_m1_gp") + +(define_reservation "em1_alu" "(em1_xa | em1_xb | em1_xc)") +(define_reservation "em1_c" "em1_xc") Is this extra reservation useful, can we not just use em1_xc directly? +;; 3. Two asymmetric pipelines for Neon and FP operations: F0, F1 + +(define_cpu_unit "em1_f0, em1_f1" "exynos_m1_fp") + +(define_reservation "em1_fmac" "em1_f0") +(define_reservation "em1_fcvt" "em1_f0") +(define_reservation "em1_nalu" "(em1_f0 | em1_f1)") +(define_reservation "em1_nalu0" "em1_f0") +(define_reservation "em1_nalu1" "em1_f1") +(define_reservation "em1_nmisc" "em1_f0") +(define_reservation "em1_ncrypt" "em1_f0") +(define_reservation "em1_fadd" "em1_f1") +(define_reservation "em1_fvar" "em1_f1") +(define_reservation "em1_fst" "em1_f1") Same comment here, does this not just obfuscate the interaction between instruction classes in the description. I'm not against doing it this way if you prefer, but it would seem to reduce readability to me. I think there is also an argument that this increases readability, so it is your choice. + +;; 4. One pipeline for branch operations: BX + +(define_cpu_unit "em1_bx" "exynos_m1_gp") + +(define_reservation "em1_br" "em1_bx") + And again? +;; 5. One AGU for loads: L +;; One AGU for stores and one pipeline for stores: S, SD + +(define_cpu_unit "em1_lx" "exynos_m1_ls") +(define_cpu_unit "em1_sx, em1_sd" "exynos_m1_ls") + +(define_reservation "em1_ld" "em1_lx") +(define_reservation "em1_st" "(em1_sx + em1_sd)") + +;; Common occurrences +(define_reservation "em1_sfst" "(em1_fst + em1_st)") +(define_reservation "em1_lfst" "(em1_fst + em1_ld)") + +;; Branches +;; +;; No latency as there is no result +;; TODO: Unconditional branches use no units; +;; conditional branches add the BX unit; +;; indirect branches add the C unit. +(define_insn_reservation "exynos_m1_branch" 0 + (and (eq_attr "tune" "exynosm1") + (eq_attr "type" "branch")) + "em1_br") + +(define_insn_reservation "exynos_m1_call" 1 + (and (eq_attr "tune" "exynosm1") + (eq_attr "type" "call")) + "em1_alu") + +;; Basic ALU +;; +;; Simple ALU without shift, non-predicated +(define_insn_reservation "exynos_m1_alu" 1 + (and (eq_attr "tune" "exynosm1") + (and (not (eq_attr "predicated" "yes")) (and (eq_attr "predicated" "no")) ? Likewise throughout the file? Again this is your choice. This is OK from the AArch64 side, let me know if you plan to change any of the above, otherwise I'll commit it (or someone else can commit it) after I see an OK from an ARM reviewer. The naming choices were made more for uniformity's sake and to conform with the processor documentation. The bit about "not... yes" <=> "no" is a goof up. :-s I'm waiting for Kyrill's patch adding FCCMP to be approved to amend this patch. However, I'd appreciate if it'd be approved and committed before then, as soon the ARM stuff is okayed. Thank you, -- Evandro Menezes
[SPARC] Minor cleanup
This just moves the VIS3 mulx patterns to a more appropriate place. Tested on SPARC/Solaris, applied on the mainline. 2015-11-20 Eric Botcazou * config/sparc/sparc.md (umulxhi_vis): Move around. (*umulxhi_sp64): Likewise. (umulxhi_v8plus): Likewise. (xmulx_vis): Likewise. (*xmulx_sp64): Likewise. (xmulx_v8plus): Likewise. (xmulxhi_vis): Likewise. (*xmulxhi_sp64): Likewise. (xmulxhi_v8plus): Likewise. -- Eric BotcazouIndex: config/sparc/sparc.md === --- config/sparc/sparc.md (revision 230589) +++ config/sparc/sparc.md (working copy) @@ -609,7 +609,7 @@ (define_insn "*cmptf_fp" return "fcmpq\t%1, %2"; } [(set_attr "type" "fpcmp")]) - + ;; Next come the scc insns. ;; Note that the boolean result (operand 0) takes on DImode @@ -647,8 +647,6 @@ (define_expand "cstore4" "TARGET_FPU" { if (emit_scc_insn (operands)) DONE; else FAIL; }) - - ;; Seq_special[_xxx] and sne_special[_xxx] clobber the CC reg, because they ;; generate addcc/subcc instructions. @@ -1137,7 +1135,7 @@ (define_split (match_dup 0)))] "") - + ;; These control RTL generation for conditional jump insns (define_expand "cbranchcc4" @@ -1318,7 +1316,6 @@ (define_insn "*cbcond_sp64" ;; There are no 32 bit brreg insns. -;; XXX (define_insn "*normal_int_branch_sp64" [(set (pc) (if_then_else (match_operator 0 "v9_register_compare_operator" @@ -1335,7 +1332,6 @@ (define_insn "*normal_int_branch_sp64" [(set_attr "type" "branch") (set_attr "branch_type" "reg")]) -;; XXX (define_insn "*inverted_int_branch_sp64" [(set (pc) (if_then_else (match_operator 0 "v9_register_compare_operator" @@ -2730,8 +2726,6 @@ (define_expand "movcc" DONE; }) -;; Conditional move define_insns - (define_insn "*mov_cc_v9" [(set (match_operand:I 0 "register_operand" "=r") (if_then_else:I (match_operator 1 "comparison_operator" @@ -2896,7 +2890,7 @@ (define_insn_and_split "*movtf_cc_reg_sp } [(set_attr "length" "2")]) - + ;; Zero-extension instructions ;; These patterns originally accepted general_operands, however, slightly @@ -4043,7 +4037,6 @@ (define_insn "*muldi3_sp64" [(set_attr "type" "imul")]) ;; V8plus wide multiply. -;; XXX (define_insn "muldi3_v8plus" [(set (match_operand:DI 0 "register_operand" "=r,h") (mult:DI (match_operand:DI 1 "arith_operand" "%r,0") @@ -4094,7 +4087,6 @@ (define_expand "mulsidi3" ;; V9 puts the 64-bit product in a 64-bit register. Only out or global ;; registers can hold 64-bit values in the V8plus environment. -;; XXX (define_insn "mulsidi3_v8plus" [(set (match_operand:DI 0 "register_operand" "=h,r") (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r,r")) @@ -4107,7 +4099,6 @@ (define_insn "mulsidi3_v8plus" [(set_attr "type" "multi") (set_attr "length" "2,3")]) -;; XXX (define_insn "const_mulsidi3_v8plus" [(set (match_operand:DI 0 "register_operand" "=h,r") (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r,r")) @@ -4120,7 +4111,6 @@ (define_insn "const_mulsidi3_v8plus" [(set_attr "type" "multi") (set_attr "length" "2,3")]) -;; XXX (define_insn "*mulsidi3_sp32" [(set (match_operand:DI 0 "register_operand" "=r") (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r")) @@ -4148,7 +4138,6 @@ (define_insn "*mulsidi3_sp64" ;; Extra pattern, because sign_extend of a constant isn't valid. -;; XXX (define_insn "const_mulsidi3_sp32" [(set (match_operand:DI 0 "register_operand" "=r") (mult:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r")) @@ -4203,7 +4192,6 @@ (define_expand "smulsi3_highpart" } }) -;; XXX (define_insn "smulsi3_highpart_v8plus" [(set (match_operand:SI 0 "register_operand" "=h,r") (truncate:SI @@ -4219,7 +4207,6 @@ (define_insn "smulsi3_highpart_v8plus" (set_attr "length" "2")]) ;; The combiner changes TRUNCATE in the previous pattern to SUBREG. -;; XXX (define_insn "" [(set (match_operand:SI 0 "register_operand" "=h,r") (subreg:SI @@ -4236,7 +4223,6 @@ (define_insn "" [(set_attr "type" "multi") (set_attr "length" "2")]) -;; XXX (define_insn "const_smulsi3_highpart_v8plus" [(set (match_operand:SI 0 "register_operand" "=h,r") (truncate:SI @@ -4251,7 +4237,6 @@ (define_insn "const_smulsi3_highpart_v8p [(set_attr "type" "multi") (set_attr "length" "2")]) -;; XXX (define_insn "*smulsi3_highpart_sp32" [(set (match_operand:SI 0 "register_operand" "=r") (truncate:SI @@ -4263,7 +4248,6 @@ (define_insn "*smulsi3_highpart_sp32" [(set_attr "type" "multi") (set_attr "length" "2")]) -;; XXX (define_insn "const_smulsi3_highpart" [(set (match_operand:SI 0 "register_operand" "=r") (truncate:SI @@ -4301,7 +4285,6 @@ (define_expand "umulsidi3" } }) -;; XXX (define_insn "umulsidi3_v8plus" [(set (match_operand:DI 0 "register_operan
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/20/2015 02:19 AM, Nick Clifton wrote: Hi Jeff, The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Sure - this is for the RL78 toolchain. In theory the problem is generic, but I have not tested other toolchains. Compile the gcc.c-torture/execute/pr42833.c or gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free also specified on the command line. Without -free these tests pass. With -free they fail. So this looks like a pretty fundamental failing in ree.c and AFAICT has been there since day 1. Thankfully, I don't think it's likely to trigger all that often. It's useful to ponder two cases. One where we're going to add a copy and one where we don't. I added the copy case during the 4.9 cycle as an extension of existing code. It only fires in easy to analyze circumstances -- when the original set and the extension are in the same basic block, but have different destination registers. In the copy case. The code checks that the destination of the *extension* is neither used nor set between the original set and the extension. So in the case of multi-word hard reg, we'll verify that the entire multi-word hard reg survives from the original insn to the extension. That avoids this problem in cases where a copy is needed. However, in the case where no copy is needed, no such checks are made. In fact the checks could be fairly painful as the original set and the extension can be in different blocks with arbitrary flow between them. I would hazard a guess that the authors simply didn't consider the multi-hard reg case. Essentially if the original set reached an extension, then obviously the original set got there unharmed and the extended destination should reach as well -- except that doesn't apply to multi-word hard regs. Given the expense in checking all the potential paths between the original set and the extension, I think just disallowing this case ought to be is the best way to go. Ideally we'll filter out that case before we get to combine_set_extension. But if we can't, it's easy enough to test there. Jeff
libgo patch committed: use correct tool dir with gccgo
This patch from Lynn Boger fixes the go tool shipped with gccgo to use the correct tool directory. It also fixes the 'go tool' output to only list known tools. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline and GCC 5 branch. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230657) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -dfa74d975884f363c74d6a66a37b1703093fdba6 +d52835c9376985f92f35c32af5f1808239981536 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/cmd/go/pkg.go === --- libgo/go/cmd/go/pkg.go (revision 230463) +++ libgo/go/cmd/go/pkg.go (working copy) @@ -785,7 +785,11 @@ func (p *Package) load(stk *importStack, if goTools[p.ImportPath] == toTool { // This is for 'go tool'. // Override all the usual logic and force it into the tool directory. - p.target = filepath.Join(gorootPkg, "tool", full) + if buildContext.Compiler == "gccgo" { + p.target = filepath.Join(runtime.GCCGOTOOLDIR, elem) + } else { + p.target = filepath.Join(gorootPkg, "tool", full) + } } if p.target != "" && buildContext.GOOS == "windows" { p.target += ".exe" Index: libgo/go/cmd/go/tool.go === --- libgo/go/cmd/go/tool.go (revision 230463) +++ libgo/go/cmd/go/tool.go (working copy) @@ -39,6 +39,12 @@ var ( toolN bool ) +// List of go tools found in the gccgo tool directory. +// Other binaries could be in the same directory so don't +// show those with the 'go tool' command. + +var gccgoTools = []string{"cgo", "fix", "cover", "godoc", "vet"} + func init() { cmdTool.Flag.BoolVar(&toolN, "n", false, "") } @@ -146,6 +152,21 @@ func listTools() { if toolIsWindows && strings.HasSuffix(name, toolWindowsExtension) { name = name[:len(name)-len(toolWindowsExtension)] } - fmt.Println(name) + + // The tool directory used by gccgo will have other binaries + // in additions to go tools. Only display go tools for this list. + + if buildContext.Compiler == "gccgo" { + for _, tool := range gccgoTools { + if tool == name { + fmt.Println(name) + } + } + } else { + + // Not gccgo, list all the tools found in this dir + + fmt.Println(name) + } } }
Re: RFA: PATCH to match.pd for c++/68385
On 11/20/2015 02:58 PM, Jason Merrill wrote: OK if testing passes? Which it did. Jason
[PATCH] Fix PR objc/68438 (uninitialized source ranges)
The attached patch fixes some more places in c/c-parser.c where the src_range field of a c_expr was being left uninitialized, this time for various Objective C constructs. The source ranges are verified using the same unit-testing plugin used for C expressions. This leads to a wart, which is that it means having a .m test file lurking below gcc.dg/plugin. A workaround would be to create an objc.dg/plugin subdirectory, with a new plugin.exp for testing Objective C plugins, and having it reference the existing plugin below gcc.dg/plugin. That seemed like excessive complication, so I went for the simpler approach of simply putting the .m file below gcc.dg/plugin. I manually verified that this fixes the specific valgrind warnings reported in the PR, and visually inspected the code for objective-c-specific instances of uninitialized c_expr src_range values. Bootstrapped®rtested on x86_64-pc-linux-gnu; adds 15 PASS results to gcc.sum. OK for trunk? >From afdae8b15f71164d0d05e790078519b38bd674a4 Mon Sep 17 00:00:00 2001 From: David Malcolm Date: Fri, 20 Nov 2015 11:12:47 -0500 Subject: [PATCH] Fix PR objc/68438 (uninitialized source ranges) gcc/c/ChangeLog: PR objc/68438 * c-parser.c (c_parser_postfix_expression): Set up source ranges for various Objective-C constructs: Class.name syntax, @selector(), @protocol, @encode(), and [] message syntax. gcc/testsuite/ChangeLog: PR objc/68438 * gcc.dg/plugin/diagnostic-test-expressions-2.m: New test file. * gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above file. --- gcc/c/c-parser.c | 17 +++- .../gcc.dg/plugin/diagnostic-test-expressions-2.m | 94 ++ gcc/testsuite/gcc.dg/plugin/plugin.exp | 3 +- 3 files changed, 110 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c index 7b10764..18e9957 100644 --- a/gcc/c/c-parser.c +++ b/gcc/c/c-parser.c @@ -7338,10 +7338,13 @@ c_parser_postfix_expression (c_parser *parser) expr.value = error_mark_node; break; } - component = c_parser_peek_token (parser)->value; + c_token *component_tok = c_parser_peek_token (parser); + component = component_tok->value; + location_t end_loc = component_tok->get_finish (); c_parser_consume_token (parser); expr.value = objc_build_class_component_ref (class_name, component); + set_c_expr_source_range (&expr, loc, end_loc); break; } default: @@ -7816,9 +7819,11 @@ c_parser_postfix_expression (c_parser *parser) } { tree sel = c_parser_objc_selector_arg (parser); + location_t close_loc = c_parser_peek_token (parser)->location; c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>"); expr.value = objc_build_selector_expr (loc, sel); + set_c_expr_source_range (&expr, loc, close_loc); } break; case RID_AT_PROTOCOL: @@ -7839,9 +7844,11 @@ c_parser_postfix_expression (c_parser *parser) { tree id = c_parser_peek_token (parser)->value; c_parser_consume_token (parser); + location_t close_loc = c_parser_peek_token (parser)->location; c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>"); expr.value = objc_build_protocol_expr (id); + set_c_expr_source_range (&expr, loc, close_loc); } break; case RID_AT_ENCODE: @@ -7860,11 +7867,13 @@ c_parser_postfix_expression (c_parser *parser) c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL); break; } - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, - "expected %<)%>"); { + location_t close_loc = c_parser_peek_token (parser)->location; + c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, + "expected %<)%>"); tree type = groktypename (t1, NULL, NULL); expr.value = objc_build_encode_expr (type); + set_c_expr_source_range (&expr, loc, close_loc); } break; case RID_GENERIC: @@ -7907,9 +7916,11 @@ c_parser_postfix_expression (c_parser *parser) c_parser_consume_token (parser); receiver = c_parser_objc_receiver (parser); args = c_parser_objc_message_args (parser); + location_t close_loc = c_parser_peek_token (parser)->location; c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE, "expected %<]%>"); expr.value = objc_build_message_expr (receiver, args); + set_c_expr_source_range (&expr, loc, close_loc); break; } /* Else fall through to report error. */ diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m new file mode 100644 index 000..ed7aca3 --- /dev/null +++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-2.m @@ -0,0 +1,94 @@ +/* { dg-do compile } */ +/* { dg-options "-O -fdiagnostics-show-caret" } */ + +/* This file is similar to diagnostic
[PATCH] Fix debug info related ICE when inlining back fnsplit function (PR debug/66432)
Hi! This patch fixes ICE where a parameter mentioned in a debug source bind has its VLA type mistakenly remapped and that leads to inconsistent type between the PARM_DECL and SSA_NAMEs derived from it. The patch Tom posted for this can't work, because we assume that the s=> value as well as debug_decl_args decl in that case is DECL_ORIGIN of the PARM_DECL. But, in this case we are replacing the DECL_ORIGIN PARM_DECL immediately with a DEBUG_EXPR_DECL anyway, so there is no point remapping the var we don't own (it could be in a different function etc.). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5.3? 2015-11-20 Jakub Jelinek PR debug/66432 * tree-inline.c (copy_debug_stmt): If gimple_debug_source_bind_get_value is DECL_ORIGIN of a PARM_DECL in decl_debug_args, don't call remap_gimple_op_r on it. * gcc.dg/debug/pr66432.c: New test. --- gcc/tree-inline.c.jj2015-11-14 19:36:03.0 +0100 +++ gcc/tree-inline.c 2015-11-20 17:17:00.632082622 +0100 @@ -2864,8 +2864,6 @@ copy_debug_stmt (gdebug *stmt, copy_body else if (gimple_debug_source_bind_p (stmt)) { gimple_debug_source_bind_set_var (stmt, t); - walk_tree (gimple_debug_source_bind_get_value_ptr (stmt), -remap_gimple_op_r, &wi, NULL); /* When inlining and source bind refers to one of the optimized away parameters, change the source bind into normal debug bind referring to the corresponding DEBUG_EXPR_DECL that should have @@ -2889,7 +2887,10 @@ copy_debug_stmt (gdebug *stmt, copy_body break; } } - } + } + if (gimple_debug_source_bind_p (stmt)) + walk_tree (gimple_debug_source_bind_get_value_ptr (stmt), + remap_gimple_op_r, &wi, NULL); } processing_debug_stmt = 0; --- gcc/testsuite/gcc.dg/debug/pr66432.c.jj 2015-11-20 17:40:44.589171083 +0100 +++ gcc/testsuite/gcc.dg/debug/pr66432.c2015-11-20 17:38:48.0 +0100 @@ -0,0 +1,19 @@ +/* PR debug/66432 */ +/* { dg-do compile } */ +/* { dg-options "-O2 -g" } */ + +extern void baz (const char *, const char *) __attribute__ ((__noreturn__)); + +void +foo (int x, int y[x][x]) +{ + if (x < 2) +baz ("", ""); +} + +void +bar (void) +{ + int z[2][2] = { { 1, 2 }, { 3, 4 } }; + foo (2, z); +} Jakub
Re: [PATCH] (Partial) Implementation of simplificaiton of CSHIFT
On Thu, Nov 19, 2015 at 04:58:36PM -0800, Steve Kargl wrote: > > 2015-11-19 Steven G. Kargl > > * intrinsic.h: Prototype for gfc_simplify_cshift > * intrinsic.c (add_functions): Use gfc_simplify_cshift. > * simplify.c (gfc_simplify_cshift): Implement simplification of CSHIFT. > (gfc_simplify_spread): Remove a FIXME and add error condition. > > 2015-11-19 Steven G. Kargl > > * gfortran.dg/simplify_cshift_1.f90: New test. > I've attached an updated patch. The changes consists of 1) better/more comments 2) re-organize code to reduce copying of the array. 3) add optimization for a left/right shift that returns the original array. 4) Don't leak memory. -- Steve Index: gcc/fortran/intrinsic.c === --- gcc/fortran/intrinsic.c (revision 230585) +++ gcc/fortran/intrinsic.c (working copy) @@ -1659,9 +1659,11 @@ add_functions (void) make_generic ("count", GFC_ISYM_COUNT, GFC_STD_F95); - add_sym_3 ("cshift", GFC_ISYM_CSHIFT, CLASS_TRANSFORMATIONAL, ACTUAL_NO, BT_REAL, dr, GFC_STD_F95, - gfc_check_cshift, NULL, gfc_resolve_cshift, - ar, BT_REAL, dr, REQUIRED, sh, BT_INTEGER, di, REQUIRED, + add_sym_3 ("cshift", GFC_ISYM_CSHIFT, CLASS_TRANSFORMATIONAL, ACTUAL_NO, + BT_REAL, dr, GFC_STD_F95, + gfc_check_cshift, gfc_simplify_cshift, gfc_resolve_cshift, + ar, BT_REAL, dr, REQUIRED, + sh, BT_INTEGER, di, REQUIRED, dm, BT_INTEGER, ii, OPTIONAL); make_generic ("cshift", GFC_ISYM_CSHIFT, GFC_STD_F95); Index: gcc/fortran/intrinsic.h === --- gcc/fortran/intrinsic.h (revision 230585) +++ gcc/fortran/intrinsic.h (working copy) @@ -271,6 +271,7 @@ gfc_expr *gfc_simplify_conjg (gfc_expr * gfc_expr *gfc_simplify_cos (gfc_expr *); gfc_expr *gfc_simplify_cosh (gfc_expr *); gfc_expr *gfc_simplify_count (gfc_expr *, gfc_expr *, gfc_expr *); +gfc_expr *gfc_simplify_cshift (gfc_expr *, gfc_expr *, gfc_expr *); gfc_expr *gfc_simplify_dcmplx (gfc_expr *, gfc_expr *); gfc_expr *gfc_simplify_dble (gfc_expr *); gfc_expr *gfc_simplify_digits (gfc_expr *); Index: gcc/fortran/simplify.c === --- gcc/fortran/simplify.c (revision 230585) +++ gcc/fortran/simplify.c (working copy) @@ -1789,6 +1789,99 @@ gfc_simplify_count (gfc_expr *mask, gfc_ gfc_expr * +gfc_simplify_cshift (gfc_expr *array, gfc_expr *shift, gfc_expr *dim) +{ + int dm; + gfc_expr *a; + + /* DIM is only useful for rank > 1, but deal with it here as one can + set DIM = 1 for rank = 1. */ + if (dim) +{ + if (!gfc_is_constant_expr (dim)) + return NULL; + dm = mpz_get_si (dim->value.integer); +} + else +dm = 1; + + /* Copy array into 'a', simplify it, and then test for a constant array. + Unexpected expr_type cause an ICE. */ + switch (array->expr_type) +{ + case EXPR_VARIABLE: + case EXPR_ARRAY: + a = gfc_copy_expr (array); + gfc_simplify_expr (a, 0); + if (!is_constant_array_expr (a)) + { + gfc_free_expr (a); + return NULL; + } + break; + default: + gcc_unreachable (); +} + + if (a->rank == 1) +{ + gfc_constructor *ca, *cr; + gfc_expr *result; + mpz_t size; + int i, j, shft, sz; + + if (!gfc_is_constant_expr (shift)) + { + gfc_free_expr (a); + return NULL; + } + + shft = mpz_get_si (shift->value.integer); + + /* Case (i): If ARRAY has rank one, element i of the result is + ARRAY (1 + MODULO (i + SHIFT 1, SIZE (ARRAY))). */ + + mpz_init (size); + gfc_array_size (a, &size); + sz = mpz_get_si (size); + mpz_clear (size); + + /* Special case: rank 1 array with no shift or a complete shift to + the original order! */ + if (shft == 0 || shft == sz || shft == 1 - sz) + return a; + + /* Adjust shft to deal with right or left shifts. */ + shft = shft < 0 ? 1 - shft : shft; + + result = gfc_copy_expr (a); + cr = gfc_constructor_first (result->value.constructor); + for (i = 0; i < sz; i++, cr = gfc_constructor_next (cr)) + { + j = (i + shft) % sz; + ca = gfc_constructor_first (a->value.constructor); + while (j-- > 0) + ca = gfc_constructor_next (ca); + cr->expr = gfc_copy_expr (ca->expr); + } + + gfc_free_expr (a); + return result; +} + else +{ + /* FIXME: Deal with rank > 1 arrays. For now, don't leak memory + and exit with an error message. */ + gfc_free_expr (a); + gfc_error ("Simplification of CSHIFT with an array with rank > 1 " + "no yet support"); +} + + return NULL; +} + + +gfc_expr * gfc_simplify_dcmplx (gfc_expr *x, gfc_expr *y) { return simplify_cmplx ("DCMPLX", x, y, gfc_default_double_kind); @@ -6089,10 +6182,11 @@ gfc_simplify_spread (gfc_expr *source, g } } else -/* FIXME: Returning here avoids a regression in array_simplify_
[PATCH] Fix up reduction-1{1,2} testcases (PR middle-end/68221)
Hi! If C/C++ array section reductions have non-zero (positive) bias, it is implemented by declaring a smaller private array and subtracting the bias from the start of the private array (because valid code may only dereference elements from bias onwards). But, this isn't something that is kosher in C/C++ pointer arithmetics and the alias oracle seems to get upset on that. So, the following patch fixes that by performing the subtraction on integral type instead of p+ -bias. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk. 2015-11-20 Jakub Jelinek PR middle-end/68221 * omp-low.c (lower_rec_input_clauses): If C/C++ array reduction has non-zero bias, subtract it in integer type instead of pointer plus of negated bias. * testsuite/libgomp.c/reduction-11.c: Remove xfail. * testsuite/libgomp.c/reduction-12.c: Likewise. * testsuite/libgomp.c++/reduction-11.C: Likewise. * testsuite/libgomp.c++/reduction-12.C: Likewise. --- gcc/omp-low.c.jj2015-11-20 12:56:17.0 +0100 +++ gcc/omp-low.c 2015-11-20 13:44:29.080374051 +0100 @@ -,11 +,13 @@ lower_rec_input_clauses (tree clauses, g if (!integer_zerop (bias)) { - bias = fold_convert_loc (clause_loc, sizetype, bias); - bias = fold_build1_loc (clause_loc, NEGATE_EXPR, - sizetype, bias); - x = fold_build2_loc (clause_loc, POINTER_PLUS_EXPR, - TREE_TYPE (x), x, bias); + bias = fold_convert_loc (clause_loc, pointer_sized_int_node, + bias); + yb = fold_convert_loc (clause_loc, pointer_sized_int_node, +x); + yb = fold_build2_loc (clause_loc, MINUS_EXPR, + pointer_sized_int_node, yb, bias); + x = fold_convert_loc (clause_loc, TREE_TYPE (x), yb); yb = create_tmp_var (ptype, name); gimplify_assign (yb, x, ilist); x = yb; --- libgomp/testsuite/libgomp.c/reduction-11.c.jj 2015-11-05 16:03:53.0 +0100 +++ libgomp/testsuite/libgomp.c/reduction-11.c 2015-11-20 13:38:24.448520879 +0100 @@ -1,4 +1,4 @@ -/* { dg-do run { xfail *-*-* } } */ +/* { dg-do run } */ char z[10] = { 0 }; --- libgomp/testsuite/libgomp.c/reduction-12.c.jj 2015-11-05 16:03:53.0 +0100 +++ libgomp/testsuite/libgomp.c/reduction-12.c 2015-11-20 13:38:34.565378078 +0100 @@ -1,4 +1,4 @@ -/* { dg-do run { xfail *-*-* } } */ +/* { dg-do run } */ struct A { int t; }; struct B { char t; }; --- libgomp/testsuite/libgomp.c++/reduction-11.C.jj 2015-11-05 16:03:53.0 +0100 +++ libgomp/testsuite/libgomp.c++/reduction-11.C2015-11-20 13:37:53.921951766 +0100 @@ -1,4 +1,4 @@ -// { dg-do run { xfail *-*-* } } +// { dg-do run } char z[10] = { 0 }; --- libgomp/testsuite/libgomp.c++/reduction-12.C.jj 2015-11-05 16:03:53.0 +0100 +++ libgomp/testsuite/libgomp.c++/reduction-12.C2015-11-20 13:38:03.983809741 +0100 @@ -1,4 +1,4 @@ -// { dg-do run { xfail *-*-* } } +// { dg-do run } template struct A Jakub
[PATCH] Fix GC ICE during simd clone creation (PR middle-end/68339)
Hi! node->get_body () can run various IPA passes and ggc_collect in them, so it is undesirable to hold pointers to GC memory in automatic vars over it. While I could store those vars (clone_info, clone and id) into special GTY vars just to avoid collecting them, it seems easier to call node->get_body () earlier. Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk and 5 branch. 2015-11-20 Jakub Jelinek PR middle-end/68339 * omp-low.c (expand_simd_clones): Call node->get_body () before allocating stuff in GC. * gcc.dg/vect/pr68339.c: New test. --- gcc/omp-low.c.jj2015-11-18 11:19:19.0 +0100 +++ gcc/omp-low.c 2015-11-20 12:56:17.075193601 +0100 @@ -18319,6 +18319,10 @@ expand_simd_clones (struct cgraph_node * && TYPE_ARG_TYPES (TREE_TYPE (node->decl)) == NULL_TREE) return; + /* Call this before creating clone_info, as it might ggc_collect. */ + if (node->definition && node->has_gimple_body_p ()) +node->get_body (); + do { /* Start with parsing the "omp declare simd" attribute(s). */ --- gcc/testsuite/gcc.dg/vect/pr68339.c.jj 2015-11-20 13:10:47.756905395 +0100 +++ gcc/testsuite/gcc.dg/vect/pr68339.c 2015-11-20 13:08:13.0 +0100 @@ -0,0 +1,17 @@ +/* PR middle-end/68339 */ +/* { dg-do compile } */ +/* { dg-options "--param ggc-min-heapsize=0 --param ggc-min-expand=0 -fopenmp-simd" } */ + +#pragma omp declare simd notinbranch +int +f1 (int x) +{ + return x; +} + +#pragma omp declare simd notinbranch +int +f2 (int x) +{ + return x; +} Jakub
RFA: PATCH to match.pd for c++/68385
In this bug, we hit the (A & sign-bit) != 0 -> A < 0 transformation. Because of delayed folding, the operands aren't fully folded yet, so we have NOP_EXPRs around INTEGER_CSTs, and so calling wi::only_sign_bit_p ICEs. We've been seeing several similar bugs, where code calls integer_zerop and therefore assumes that they have an INTEGER_CST, but in fact integer_zerop does STRIP_NOPS. This patch changes the pattern to only match if the operand is actually an INTEGER_CST. Alternatively we could call tree_strip_nop_conversions on the operand, but I would expect that to have issues when the conversion changes the signedness of the type. OK if testing passes? commit e7b45ed6775c88c6d48c5863738ba0db2e38fc5e Author: Jason Merrill Date: Fri Nov 20 14:40:35 2015 -0500 PR c++/68385 * match.pd: Don't assume that integer_pow2p implies INTEGER_CST. diff --git a/gcc/match.pd b/gcc/match.pd index e86cc8b..1981ae7 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -2232,6 +2232,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && (TYPE_PRECISION (TREE_TYPE (@0)) == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0 && element_precision (@2) >= element_precision (@0) + && TREE_CODE (@1) == INTEGER_CST && wi::only_sign_bit_p (@1, element_precision (@0))) (with { tree stype = signed_type_for (TREE_TYPE (@0)); } (ncmp (convert:stype @0) { build_zero_cst (stype); })
Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.
On 11/20/2015 05:15 AM, Kyrill Tkachov wrote: Hi Kirill, On 18/11/15 14:11, Kirill Yukhin wrote: Hello Andreas, Devid. On 18 Nov 10:45, Andreas Schwab wrote: Kirill Yukhin writes: diff --git a/gcc/testsuite/c-c++-common/attr-simd.c b/gcc/testsuite/c-c++-common/attr-simd.c new file mode 100644 index 000..b4eda34 --- /dev/null +++ b/gcc/testsuite/c-c++-common/attr-simd.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-optimized" } */ + +__attribute__((__simd__)) +extern +int simd_attr (void) +{ + return 0; +} + +/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" } } */ On ia64: FAIL: c-c++-common/attr-simd.c -Wc++-compat scan-tree-dump optimized "simd_attr[ \\t]simdclone|vector" FAIL: c-c++-common/attr-simd.c -Wc++-compat scan-tree-dump optimized "simd_attr2[ \\t]simdclone|vector" $ grep simd_attr attr-simd.c.194t.optimized ;; Function simd_attr (simd_attr, funcdef_no=0, decl_uid=1389, cgraph_uid=0, symbol_order=0) simd_attr () ;; Function simd_attr2 (simd_attr2, funcdef_no=1, decl_uid=1392, cgraph_uid=1, symbol_order=1) simd_attr2 () As far as vABI is supported on x86_64/i?86 only, I am going to enable mentioned `scan-tree-dump' only for these targets. This should cure both IA64 and Power. Concerning attr-simd-3.c. It is known issue: PR68158. And I believe this test should work everywhere as far as PR is resolved. I'll put xfail into the test. Which will lead to (in g++.log): gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes]^M output is: gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes]^M XFAIL: c-c++-common/attr-simd-3.c -std=gnu++98 PR68158 (test for errors, line 5) FAIL: c-c++-common/attr-simd-3.c -std=gnu++98 (test for excess errors) Excess errors: gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes] Patch in the bottom. gcc/tessuite/ * c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error. This test fails on bare-metal targets that don't support -fcilkplus or -pthread. Would you consider moving them to the cilkplus testing directory or adding an appropriate effective target check? I think an effective-target-check is the most appropriate. The whole point here is to make that attribute independent of Cilk support. jeff
Re: [PATCH 6/n] OpenMP 4.0 offloading infrastructure: option handling
On Wed, Dec 10, 2014 at 01:48:21 +0300, Ilya Verbin wrote: > On 09 Dec 14:59, Richard Biener wrote: > > On Mon, 8 Dec 2014, Ilya Verbin wrote: > > > Unfortunately, this fix was not general enough. > > > There might be cases when mixed object files get into lto-wrapper, ie > > > some of > > > them contain only LTO sections, some contain only offload sections, and > > > some > > > contain both. But when lto-wrapper will pass all these files to > > > recompilation, > > > the compiler might crash (it depends on the order of input files), since > > > in > > > read_cgraph_and_symbols it expects that *all* input files contain IR > > > section of > > > given type. > > > This patch splits input objects from argv into lto_argv and offload_argv, > > > so > > > that all files in arrays contain corresponding IR. > > > Similarly, in lto-plugin, it was bad idea to add objects, which contain > > > offload > > > IR without LTO, to claimed_files, since this may corrupt a resolution > > > file. > > > > > > Tested on various combinations of files with/without -flto and > > > with/without > > > offload, using trunk ld and gold, also tested on ld without plugin > > > support. > > > Bootstrap and make check passed on x86_64-linux and i686-linux. Ok for > > > trunk? > > > > Did you check that bootstrap-lto still works? Ok if so. > > Yes, bootstrap-lto passed. > Committed revision 218543. I don't know how I missed this a year ago, but mixing of LTO objects with offloading-without-LTO objects still doesn't work :( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68463 filed about that. Any thoughts how to fix this? Thanks, -- Ilya
[commit] [patch] Python Pretty Printers get disabled on libstdc++ reload by GDB (PR libstdc++/68448)
On Fri, 20 Nov 2015 18:40:46 +0100, Jonathan Wakely wrote: > The patch is OK for trunk and gcc-5-branch. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68448 trunk: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=230669 5.x: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=230670 Jan
Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt
On 11/20/2015 07:08 AM, Bernd Schmidt wrote: BZ27313 is marked as fixed by the introduction of the tree cselim pass, thus the problem won't even be seen at RTL level. Cool. I'm undecided on whether cs-elim is safe wrt the store speculation vs locks concerns raised in the thread discussing Ian's noce_can_store_speculate_p, but that's not something we have to consider to solve the problem at hand. I don't think cs-elim is safe WRT locks and such in multi-threaded code. In particular it replaces this: bb0: if (cond) goto bb2; else goto bb1; bb1: *p = RHS; bb2: with bb0: if (cond) goto bb1; else goto bb2; bb1: condtmp' = *p; bb2: condtmp = PHI *p = condtmp; If *p is a shared memory location, then there may be another writer. If that writer happens to store something in that location after the load of *p, but before the store to *p, then that store will get lost in the transformed pseudo code. That seems to introduce a data race. Presumably one would call the original code ill-formed WRT the C11/C++11 memory model since the shared location is not marked as such. I'm willing to consider this an independent problem. As far as I can tell hmmer and the 27313 testcase are unaffected at -O2 (if anything, hmmer was very slightly faster afterwards). The run wasn't super-scientific, but I wouldn't have expected anything else given the existence of cs-elim. Cool. It doesn't have to be super-scientific. Good to see it didn't harm anything -- thanks for going the extra mile on this one. Ok to do the above, removing all the bits made unnecessary (including memory_must_be_modified_in_insn_p in alias.c)? Yup. Zap it. jeff
Re: [PATCH] Add LANG_HOOKS_EMPTY_RECORD_P for C++ empty class
On Tue, Nov 17, 2015 at 4:22 AM, Richard Biener wrote: > On Tue, Nov 17, 2015 at 12:01 PM, H.J. Lu wrote: >> Empty record should be returned and passed the same way in C and C++. >> This patch adds LANG_HOOKS_EMPTY_RECORD_P for C++ empty class, which >> defaults to return false. For C++, LANG_HOOKS_EMPTY_RECORD_P is defined >> to is_really_empty_class, which returns true for C++ empty classes. For >> LTO, we stream out a bit to indicate if a record is empty and we store >> it in TYPE_LANG_FLAG_0 when streaming in. get_ref_base_and_extent is >> changed to set bitsize to 0 for empty records. Middle-end and x86 >> backend are updated to ignore empty records for parameter passing and >> function value return. Other targets may need similar changes. > > Please avoid a new langhook for this and instead claim a bit in > tree_type_common > like for example restrict_flag (double-check it is unused for non-pointers). There is no bit in tree_type_common I can overload. restrict_flag is checked for non-pointers to issue an error when it is used on non-pointers: /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/template/qualttp20.C:19:38: error: ‘__restrict__’ qualifiers cannot be applied to ‘AS::L’ typedef typename T::L __restrict__ r;// { dg-error "'__restrict__' qualifiers cannot" "" } Should I add a bit to tree_type_common? It may crease the size of tree_type_common by 4 bytes since there is unused bits in tree_type_common. > I don't like that you need to modify targets - those checks should be done > in the caller (which may just use a new wrapper with the logic and then > dispatching to the actual hook). I can do that. > Why do you need do adjust get_ref_base_and_extent? get_ref_base_and_extent is changed to set bitsize to 0 for empty records so that when ref_maybe_used_by_call_p_1 calls get_ref_base_and_extent to get 0 as the maximum size on empty record. Otherwise, find_tail_calls won't perform tail call optimization for functions with empty record parameters, as in struct dummy { }; struct true_type { struct dummy i; }; extern true_type y; extern void xxx (true_type c); void yyy (void) { xxx (y); } -- H.J.
Merge from trunk to gccgo branch
I merged trunk revision 230657 to the gccgo branch. Ian
Re: [PATCH] Add clang-format config to contrib folder
On 11/20/2015 04:33 AM, Martin Liška wrote: On 11/19/2015 06:43 PM, Pedro Alves wrote: On 11/19/2015 12:54 PM, Martin Liška wrote: ContinuationIndentWidth: 2 -ForEachMacros: ['_FOR_EACH','_FOR_EACH_1','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','FOR_EACH_IMM_USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','F! O! R! _EACH_OBJE CT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','_FOR_EACH_X','_FOR_EACH_X_1','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR'] +ForEachMacros: ['FOR_ALL_BB_FN','FOR_ALL_EH_REGION','FOR_ALL_EH_REGION_AT','FOR_ALL_EH_REGION_FN','FOR_ALL_INHERITED_FIELDS','FOR_ALL_PREDICATES','FOR_BB_BETWEEN','FOR_BB_INSNS','FOR_BB_INSNS_REVERSE','FOR_BB_INSNS_REVERSE_SAFE','FOR_BB_INSNS_SAFE','FOR_BODY','FOR_COND','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','! F! O! R_EACH_IMM _USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','FOR_EACH_OBJECT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','FOR_EXPR','FOR_INIT_STMT','FOR_SCOPE'] IndentCaseLabels: false I don't know the tool's syntax here, but it's usually much better to keep entries like these one per line (or at least a few only). Then changes to the list are _much_ easier to review and maintain, like: ... ForEachMacros: [ 'FOR_ALL_BB_FN', 'FOR_ALL_EH_REGION', -'FOR_ALL_EH_REGION_AT', -'FOR_ALL_EH_REGION', +'FOR_ALL_EH_WHATNOT, +'FOR_ALL_EH_WHATNOT_AT, 'FOR_ALL_INHERITED_FIELDS', 'FOR_ALL_PREDICATES', 'FOR_BB_BETWEEN', 'FOR_BB_INSNS', ... vs a single-line hunk that's basically unintelligible. Thanks, Pedro Alves Hi Pedro. Fully agree with you, there's suggested patch. Hope I can install the patch for trunk? Yes. In fact, I think that as long as you're moving towards coercing clang-format to handle GNU style that you should have a free reign to make changes to this config file. jeff
Re: PATCH to shorten_compare -Wtype-limits handling
On 20 November 2015 at 17:42, Jeff Law wrote: > So we have to detangle the operand shortening from warning detection. Kai's > idea was to first make the shortening code "pure" in the sense that it would > have no side effects other than to generate the warnings. Canonicalization > and other transformations would still occur internally, but not be reflected > in the IL. > > That was the overall plan and he posted a patch for that. But that patch > didn't do the due diligence to verify that once the shortening code was made > "pure" that we didn't regress on the quality of the code we generated. I thought that the original plan was to make the warning code also use match.pd. That is, that all folding, including FE folding, will be match.pd-based. Has this changed? Cheers, Manuel.
Re: PATCH to shorten_compare -Wtype-limits handling
On 11/19/2015 12:02 PM, Manuel López-Ibáñez wrote: On 19 November 2015 at 17:54, Jeff Law wrote: But there were a couple of patches from you some time ago, for example: http://permalink.gmane.org/gmane.comp.gcc.patches/343476 What happened with those? On hold pending fixing the type-limits warning placement. Essentially that has to be untangled first. Could you elaborate on this? Or point me to some previous email thread? (I don't have enough free time to follow the mailing list anymore, sorry). I don't have the thread handy, but essentially the operand shortening code has warning bits inside it. As a result pulling out functionality (such as operand canonicalization) and putting into match.pd results in missing a warning -- ie a regression. So we have to detangle the operand shortening from warning detection. Kai's idea was to first make the shortening code "pure" in the sense that it would have no side effects other than to generate the warnings. Canonicalization and other transformations would still occur internally, but not be reflected in the IL. That was the overall plan and he posted a patch for that. But that patch didn't do the due diligence to verify that once the shortening code was made "pure" that we didn't regress on the quality of the code we generated. jeff
Re: [patch] Python Pretty Printers get disabled on libstdc++ reload by GDB (PR libstdc++/68448)
On 20/11/15 18:28 +0100, Jan Kratochvil wrote: On Thu, 19 Nov 2015 21:21:51 +0100, Jan Kratochvil wrote: * python/hook.in: Call register_libstdcxx_printers. * python/libstdcxx/v6/__init__.py: Wrap it to register_libstdcxx_printers. [...] -import libstdcxx.v6 +# Call a function as a plain import would not execute body of the included file +# on repeated reloads of this object file. +from libstdcxx.v6 import register_libstdcxx_printers +register_libstdcxx_printers(gdb.current_objfile()) Jonathan Wakely mentioned: [libstdc++] Refactor python/hook.in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02518.html From: Siva Chandra Message-ID: https://gcc.gnu.org/r215726 That change introduced this regression. This patch does not undo it, goal of the patch by Siva Chandra - minimizing hook.in - stays achieved. OK, thanks for checking, Jan. The patch is OK for trunk and gcc-5-branch.
Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address
On 20/11/15 08:31, Bin.Cheng wrote: > On Thu, Nov 19, 2015 at 10:32 AM, Bin.Cheng wrote: >> On Tue, Nov 17, 2015 at 6:08 PM, James Greenhalgh >> wrote: >>> On Tue, Nov 17, 2015 at 05:21:01PM +0800, Bin Cheng wrote: Hi, GIMPLE IVO needs to call backend interface to calculate costs for addr expressions like below: FORM1: "r73 + r74 + 16380" FORM2: "r73 << 2 + r74 + 16380" They are invalid address expression on AArch64, so will be legitimized by aarch64_legitimize_address. Below are what we got from that function: For FORM1, the address expression is legitimized into below insn sequence and rtx: r84:DI=r73:DI+r74:DI r85:DI=r84:DI+0x3000 r83:DI=r85:DI "r83 + 4092" For FORM2, the address expression is legitimized into below insn sequence and rtx: r108:DI=r73:DI<<0x2 r109:DI=r108:DI+r74:DI r110:DI=r109:DI+0x3000 r107:DI=r110:DI "r107 + 4092" So the costs computed are 12/16 respectively. The high cost prevents IVO from choosing right candidates. Besides cost computation, I also think the legitmization is bad in terms of code generation. The root cause in aarch64_legitimize_address can be described by it's comment: /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask), where mask is selected by alignment and size of the offset. We try to pick as large a range for the offset as possible to maximize the chance of a CSE. However, for aligned addresses we limit the range to 4k so that structures with different sized elements are likely to use the same base. */ I think the split of CONST is intended for REG+CONST where the const offset is not in the range of AArch64's addressing modes. Unfortunately, it doesn't explicitly handle/reject "REG+REG+CONST" and "REG+REG<>>> when the CONST are in the range of addressing modes. As a result, these two cases fallthrough this logic, resulting in sub-optimal results. It's obvious we can do below legitimization: FORM1: r83:DI=r73:DI+r74:DI "r83 + 16380" FORM2: r107:DI=0x3ffc r106:DI=r74:DI+r107:DI REG_EQUAL r74:DI+0x3ffc "r106 + r73 << 2" This patch handles these two cases as described. >>> >>> Thanks for the description, it made the patch very easy to review. I only >>> have a style comment. >>> Bootstrap & test on AArch64 along with other patch. Is it OK? 2015-11-04 Bin Cheng Jiong Wang * config/aarch64/aarch64.c (aarch64_legitimize_address): Handle address expressions like REG+REG+CONST and REG+NON_REG+CONST. >>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5c8604f..47875ac 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4710,6 +4710,51 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode) { HOST_WIDE_INT offset = INTVAL (XEXP (x, 1)); HOST_WIDE_INT base_offset; + rtx op0 = XEXP (x,0); + + if (GET_CODE (op0) == PLUS) + { + rtx op0_ = XEXP (op0, 0); + rtx op1_ = XEXP (op0, 1); >>> >>> I don't see this trailing _ on a variable name in many places in the source >>> tree (mostly in the Go frontend), and certainly not in the aarch64 backend. >>> Can we pick a different name for op0_ and op1_? >>> + + /* RTX pattern in the form of (PLUS (PLUS REG, REG), CONST) will + reach here, the 'CONST' may be valid in which case we should + not split. */ + if (REG_P (op0_) && REG_P (op1_)) + { + machine_mode addr_mode = GET_MODE (op0); + rtx addr = gen_reg_rtx (addr_mode); + + rtx ret = plus_constant (addr_mode, addr, offset); + if (aarch64_legitimate_address_hook_p (mode, ret, false)) + { + emit_insn (gen_adddi3 (addr, op0_, op1_)); + return ret; + } + } + /* RTX pattern in the form of (PLUS (PLUS REG, NON_REG), CONST) + will reach here. If (PLUS REG, NON_REG) is valid addr expr, + we split it into Y=REG+CONST, Y+NON_REG. */ + else if (REG_P (op0_) || REG_P (op1_)) + { + machine_mode addr_mode = GET_MODE (op0); + rtx addr = gen_reg_rtx (addr_mode); + + /* Switch to make sure that register is in op0_. */ + if (REG_P (op1_)) + std::swap (op0_, op1_); + + rtx ret = gen_rtx_fmt_ee (PLUS, addr_mode, addr, op1_); + if (aarch64_legitimate_address_hook_p (mode, ret, false)) +
Re: [Patch, vrp] Allow VRP type conversion folding only for widenings upto word mode
On 11/20/2015 10:04 AM, Senthil Kumar Selvaraj wrote: On Thu, Nov 19, 2015 at 10:31:41AM -0700, Jeff Law wrote: On 11/18/2015 11:20 PM, Senthil Kumar Selvaraj wrote: On Wed, Nov 18, 2015 at 09:36:21AM +0100, Richard Biener wrote: Otherwise ok. See modified patch below. If you think vrp98.c is unnecessary, feel free to dump it :). If ok, could you commit it for me please? I don't have commit access. Regards Senthil gcc/ChangeLog 2015-11-19 Senthil Kumar Selvaraj * tree.h (desired_pro_or_demotion_p): New function. * tree-vrp.c (simplify_cond_using_ranges): Call it. gcc/testsuite/ChangeLog 2015-11-19 Senthil Kumar Selvaraj * gcc.dg/tree-ssa/vrp98.c: New testcase. * gcc.target/avr/uint8-single-reg.c: New testcase. I went ahead and committed this as-is. I do think the vrp98 testcase is useful as it verifies that VRP is doing what we want in a target independent way. It's a good complement to the AVR specific testcase. I see the same problem on gcc-5-branch as well. Would it be ok to backport the fix to that branch as well? That's a call for the release managers. I typically don't backport anything expect ICE or incorrect code generation fixes as I tend to be very conservative on what goes onto a release branch. Jakub, Richi or Joseph would need to ack into a release branch. jeff
Re: RFC/RFA: Fix bug with REE optimization corrupting extended registers
On 11/20/2015 02:19 AM, Nick Clifton wrote: Hi Jeff, The code there would solve this problem, but the approach is is overly cautious, since it disables the optimization for all extensions that increase the number of hard registers used. Some of these will be viable candidates, provided that the extra hard registers are no used. (This is certainly true for the RL78, where the (patched) optimization does improve code, even though the widening does use extra registers). Nick -- can you pass along your testcode? Sure - this is for the RL78 toolchain. In theory the problem is generic, but I have not tested other toolchains. Right. It just really helps me to have something I can poke at -- it's just the type of learner I am. I tried to trip the test in that #if several times last year and never came up with anything. So naturally if we can trip it with the rl78 I want to dig into it. Compile the gcc.c-torture/execute/pr42833.c or gcc.c-torture/execute/strct-stdarg-1.c tests at -O1 or higher with -free also specified on the command line. Without -free these tests pass. With -free they fail. Perfect. Building rl78-elf cross bits as I type... jeff
Re: [patch] Python Pretty Printers get disabled on libstdc++ reload by GDB (PR libstdc++/68448)
On Thu, 19 Nov 2015 21:21:51 +0100, Jan Kratochvil wrote: > * python/hook.in: Call register_libstdcxx_printers. > * python/libstdcxx/v6/__init__.py: Wrap it to > register_libstdcxx_printers. [...] > -import libstdcxx.v6 > +# Call a function as a plain import would not execute body of the included > file > +# on repeated reloads of this object file. > +from libstdcxx.v6 import register_libstdcxx_printers > +register_libstdcxx_printers(gdb.current_objfile()) Jonathan Wakely mentioned: [libstdc++] Refactor python/hook.in https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02518.html From: Siva Chandra Message-ID: https://gcc.gnu.org/r215726 That change introduced this regression. This patch does not undo it, goal of the patch by Siva Chandra - minimizing hook.in - stays achieved. Jan
Re: [PATCH] Fix PR68067
On 6 November 2015 at 10:39, Richard Biener wrote: >> ../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location >> references block not in block tree >> l1_279 = PHI <1(28), l1_299(33)> > > ^^^ > > this is the error to look at! It means that the GC heap will be corrupted > quite easily. > This looked very similar to PR68117 - the invalid phi arg, and block not in block-tree, even if not the invalid tree code - and as the posters there were having success with valgrind, whereas I wasn't, I watched and waited. First observation is that it triggers the asserts you suggested in comment 27 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D68117#c27). Indeed, it fails those asserts, even after the patch in comment 25 (committed as r230594) to tree-ssa.c (delete_tree_ssa), and the patch in comment#35 to function.c (set_cfun), and the patch in comment#30 (committed as r230424) to cfgexpand.c (pass_expand::execute). The patch in comment#29 (which replaces the asserts in comment#27 with empties), however, fixes the problem - although I can't rule out, that that's just by changing the memory allocation pattern. Moreover, if I take those patches and rebase onto a recent trunk (onto which the delete_tree_ssa and pass_expand::execute patches have already been committed), i.e. just adding the assertions from comment#27 and the call in function.c (set_cfun) - the assertions are still failing on my testcase, whereas the original (assertionless) failure was very erratic, and had since disappeared/been hidden on trunk. Indeed those same assertions break in a few other places (even in a --disable-bootstrap build after gcc/xgcc is built), so I feel I have a good chance of producing a reasonable assertion-breaking testcase. So I have to ask, how sure are you that those assertions are(/should be!) "correct"? :) --Alan
Re: PATCH to shorten_compare -Wtype-limits handling
On 20 November 2015 at 16:10, Martin Sebor wrote: >>> Hmm, it looks like using expansion_point_if_in_system_header might avoid >>> the first issue you mention. >> >> >> Thus. > > > Great, thanks! (I'll have to remember the trick for my own use!) I added this to https://gcc.gnu.org/wiki/DiagnosticsGuidelines under "Locations" for future reference. I hope others would do the same in the future, so the info is kept up-to-date. Cheers, Manuel.
Re: [PATCH 4/4][AArch64] Add cost model for Exynos M1
On Thu, Nov 19, 2015 at 04:06:17PM -0600, Evandro Menezes wrote: > On 11/05/2015 06:09 PM, Evandro Menezes wrote: > >2015-10-25 Evandro Menezes > > > > gcc/ > > > > * config/aarch64/aarch64-cores.def: Use the Exynos M1 cost model. > > * config/aarch64/aarch64.c (exynosm1_addrcost_table): New > >variable. > > (exynosm1_regmove_cost): Likewise. > > (exynosm1_vector_cost): Likewise. > > (exynosm1_tunings): Likewise. > > * config/arm/aarch-cost-tables.h (exynosm1_extra_costs): Likewise. > > * config/arm/arm.c (arm_exynos_m1_tune): Likewise. > > > >This patch adds the cost model for Exynos M1. This patch depends > >on a couple of previous patches though, > >https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00505.html and > >https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00538.html > > > >Please, commit if it's alright. > > Ping. This is OK from an AArch64 perspective. Please wait for an OK from an ARM reviewer. Thanks, James
Re: [PATCH 3b/4][AArch64] Add scheduling model for Exynos M1
On Tue, Nov 10, 2015 at 11:54:00AM -0600, Evandro Menezes wrote: >2015-11-10 Evandro Menezes > >gcc/ > >* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model. >* config/aarch64/aarch64.md: Include "exynos-m1.md". >* config/arm/arm-cores.def: Use the Exynos M1 sched model. >* config/arm/arm.md: Include "exynos-m1.md". >* config/arm/arm-tune.md: Regenerated. >* config/arm/exynos-m1.md: New file. > > This patch adds the scheduling model for Exynos M1. It depends on > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01257.html > > Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu. > > Please, commit if it's alright. > From 0b7b6d597e5877c78c4d88e0d4491858555a5364 Mon Sep 17 00:00:00 2001 > From: Evandro Menezes > Date: Mon, 9 Nov 2015 17:18:52 -0600 > Subject: [PATCH 2/2] [AArch64] Add scheduling model for Exynos M1 > > gcc/ > * config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model. > * config/aarch64/aarch64.md: Include "exynos-m1.md". These changes are fine. > * config/arm/arm-cores.def: Use the Exynos M1 sched model. > * config/arm/arm.md: Include "exynos-m1.md". > * config/arm/arm-tune.md: Regenerated. These changes need an ack from an ARM reviewer. > * config/arm/exynos-m1.md: New file. I have a few comments on this model. > +;; The Exynos M1 core is modeled as a triple issue pipeline that has > +;; the following functional units. > + > +(define_automaton "exynos_m1_gp") > +(define_automaton "exynos_m1_ls") > +(define_automaton "exynos_m1_fp") > + > +;; 1. Two pipelines for simple integer operations: A, B > +;; 2. One pipeline for simple or complex integer operations: C > + > +(define_cpu_unit "em1_xa, em1_xb, em1_xc" "exynos_m1_gp") > + > +(define_reservation "em1_alu" "(em1_xa | em1_xb | em1_xc)") > +(define_reservation "em1_c" "em1_xc") Is this extra reservation useful, can we not just use em1_xc directly? > +;; 3. Two asymmetric pipelines for Neon and FP operations: F0, F1 > + > +(define_cpu_unit "em1_f0, em1_f1" "exynos_m1_fp") > + > +(define_reservation "em1_fmac" "em1_f0") > +(define_reservation "em1_fcvt" "em1_f0") > +(define_reservation "em1_nalu" "(em1_f0 | em1_f1)") > +(define_reservation "em1_nalu0" "em1_f0") > +(define_reservation "em1_nalu1" "em1_f1") > +(define_reservation "em1_nmisc" "em1_f0") > +(define_reservation "em1_ncrypt" "em1_f0") > +(define_reservation "em1_fadd" "em1_f1") > +(define_reservation "em1_fvar" "em1_f1") > +(define_reservation "em1_fst" "em1_f1") Same comment here, does this not just obfuscate the interaction between instruction classes in the description. I'm not against doing it this way if you prefer, but it would seem to reduce readability to me. I think there is also an argument that this increases readability, so it is your choice. > + > +;; 4. One pipeline for branch operations: BX > + > +(define_cpu_unit "em1_bx" "exynos_m1_gp") > + > +(define_reservation "em1_br" "em1_bx") > + And again? > +;; 5. One AGU for loads: L > +;; One AGU for stores and one pipeline for stores: S, SD > + > +(define_cpu_unit "em1_lx" "exynos_m1_ls") > +(define_cpu_unit "em1_sx, em1_sd" "exynos_m1_ls") > + > +(define_reservation "em1_ld" "em1_lx") > +(define_reservation "em1_st" "(em1_sx + em1_sd)") > + > +;; Common occurrences > +(define_reservation "em1_sfst" "(em1_fst + em1_st)") > +(define_reservation "em1_lfst" "(em1_fst + em1_ld)") > + > +;; Branches > +;; > +;; No latency as there is no result > +;; TODO: Unconditional branches use no units; > +;; conditional branches add the BX unit; > +;; indirect branches add the C unit. > +(define_insn_reservation "exynos_m1_branch" 0 > + (and (eq_attr "tune" "exynosm1") > + (eq_attr "type" "branch")) > + "em1_br") > + > +(define_insn_reservation "exynos_m1_call" 1 > + (and (eq_attr "tune" "exynosm1") > + (eq_attr "type" "call")) > + "em1_alu") > + > +;; Basic ALU > +;; > +;; Simple ALU without shift, non-predicated > +(define_insn_reservation "exynos_m1_alu" 1 > + (and (eq_attr "tune" "exynosm1") > + (and (not (eq_attr "predicated" "yes")) (and (eq_attr "predicated" "no")) ? Likewise throughout the file? Again this is your choice. This is OK from the AArch64 side, let me know if you plan to change any of the above, otherwise I'll commit it (or someone else can commit it) after I see an OK from an ARM reviewer. Thanks, James
Re: [Patch, vrp] Allow VRP type conversion folding only for widenings upto word mode
On Thu, Nov 19, 2015 at 10:31:41AM -0700, Jeff Law wrote: > On 11/18/2015 11:20 PM, Senthil Kumar Selvaraj wrote: > >On Wed, Nov 18, 2015 at 09:36:21AM +0100, Richard Biener wrote: > >> > >>Otherwise ok. > > > >See modified patch below. If you think vrp98.c is unnecessary, feel free > >to dump it :). > > > >If ok, could you commit it for me please? I don't have commit access. > > > >Regards > >Senthil > > > >gcc/ChangeLog > >2015-11-19 Senthil Kumar Selvaraj > > > > * tree.h (desired_pro_or_demotion_p): New function. > > * tree-vrp.c (simplify_cond_using_ranges): Call it. > > > >gcc/testsuite/ChangeLog > >2015-11-19 Senthil Kumar Selvaraj > > > > * gcc.dg/tree-ssa/vrp98.c: New testcase. > > * gcc.target/avr/uint8-single-reg.c: New testcase. > I went ahead and committed this as-is. > > I do think the vrp98 testcase is useful as it verifies that VRP is doing > what we want in a target independent way. It's a good complement to the AVR > specific testcase. I see the same problem on gcc-5-branch as well. Would it be ok to backport the fix to that branch as well? Regards Senthil > > Thanks, > Jeff >
Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m
Hi Kyrill On 20/11/15 11:51, Kyrill Tkachov wrote: Hi Andre, On 18/11/15 09:44, Andre Vieira wrote: On 17/11/15 10:10, James Greenhalgh wrote: On Mon, Nov 16, 2015 at 01:15:32PM +, Andre Vieira wrote: On 16/11/15 12:07, James Greenhalgh wrote: On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote: Hi, This patch changes the target support mechanism to make it recognize any ARM 'M' profile as a non-neon supporting target. The current check only tests for armv6 architectures and earlier, and does not account for armv7-m. This is correct because there is no 'M' profile that supports neon and the current test is not sufficient to exclude armv7-m. Tested by running regressions for this testcase for various ARM targets. Is this OK to commit? Thanks, Andre Vieira gcc/testsuite/ChangeLog: 2015-11-06 Andre Vieira * gcc/testsuite/lib/target-supports.exp (check_effective_target_arm_neon_ok_nocache): Added check for M profile. From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Fri, 13 Nov 2015 11:16:34 + Subject: [PATCH] Disable neon testing for armv7-m --- gcc/testsuite/lib/target-supports.exp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } { int dummy; /* Avoid the case where a test adds -mfpu=neon, but the toolchain is configured for -mcpu=arm926ej-s, for example. */ -#if __ARM_ARCH < 7 +#if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M' #error Architecture too old for NEON. Could you fix this #error message while you're here? Why we can't change this test to look for the __ARM_NEON macro from ACLE: #if __ARM_NEON < 1 #error NEON is not enabled #endif Thanks, James There is a check for this already: 'check_effective_target_arm_neon'. I think the idea behind arm_neon_ok is to check whether the hardware would support neon, whereas arm_neon is to check whether neon was enabled, i.e. -mfpu=neon was used or a mcpu was passed that has neon enabled by default. The comments for 'check_effective_target_arm_neon_ok_nocache' highlight this, though maybe the comments for check_effective_target_arm_neon could be better. # Return 1 if this is an ARM target supporting -mfpu=neon # -mfloat-abi=softfp or equivalent options. Some multilibs may be # incompatible with these options. Also set et_arm_neon_flags to the # best options to add. proc check_effective_target_arm_neon_ok_nocache ... /* Avoid the case where a test adds -mfpu=neon, but the toolchain is configured for -mcpu=arm926ej-s, for example. */ ... and # Return 1 if this is a ARM target with NEON enabled. proc check_effective_target_arm_neon OK, got it - sorry for my mistake, I had the two procs confused. I'd still like to see the error message fixed "Architecture too old for NEON." is not an accurate description of the problem. Thanks, James This OK? This is ok, I've committed for you with the slightly tweaked ChangeLog entry: 2015-11-20 Andre Vieira * lib/target-supports.exp (check_effective_target_arm_neon_ok_nocache): Add check for M profile. as r230653. Thanks, Kyrill Cheers, Andre Thank you. Would there be any objections to backporting this to gcc-5-branch? I checked, it applies cleanly and its a simple enough way of preventing a lot of FAILS for armv7-m. Best Regards, Andre
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
On 20/11/15 14:29, Richard Biener wrote: I agree it's somewhat of an odd behavior but all passes should either be placed in a sub-pipeline with an outer loop_optimizer_init()/finalize () call or call both themselves. Hmm, but adding loop_optimizer_finalize at the end of pass_lim breaks the loop pipeline. We could use the style used in pass_slp_vectorize::execute: ... pass_slp_vectorize::execute (function *fun) { basic_block bb; bool in_loop_pipeline = scev_initialized_p (); if (!in_loop_pipeline) { loop_optimizer_init (LOOPS_NORMAL); scev_initialize (); } ... if (!in_loop_pipeline) { scev_finalize (); loop_optimizer_finalize (); } ... Although that doesn't strike me as particularly clean. Thanks, - Tom
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On Fri, Nov 20, 2015 at 09:55:29AM -0600, Evandro Menezes wrote: > On 11/20/2015 06:27 AM, James Greenhalgh wrote: > >On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: > >>On 11/12/2015 09:39 AM, Evandro Menezes wrote: > >>2015-11-12 Evandro Menezes > >> > >>[AArch64] Add attribute for compatibility with ARM pipeline models > >> > >>gcc/ > >> > >>* config/aarch64/aarch64.md (predicated): Copy attribute from > >>"arm.md". > >>* config/arm/arm.md (predicated): Added description. > >> > >>Please, commit if it's alright. > >The AArch64 part of this is OK. > > > >> From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 > >>From: Evandro Menezes > >>Date: Mon, 9 Nov 2015 17:11:16 -0600 > >>Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM > >> pipeline models > >> > >>gcc/ > >>* config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". > >>* config/arm/arm.md (predicated): Added description. > >>--- > >> gcc/config/aarch64/aarch64.md | 4 > >> gcc/config/arm/arm.md | 3 +++ > >> 2 files changed, 7 insertions(+) > >> > >>diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > >>index 1586256..d46f837 100644 > >>--- a/gcc/config/aarch64/aarch64.md > >>+++ b/gcc/config/aarch64/aarch64.md > >>@@ -195,6 +195,10 @@ > >> ;; 1 :=: yes > >> (define_attr "far_branch" "" (const_int 0)) > >>+;; Strictly for compatibility with AArch32 in pipeline models, since > >>AArch64 has > >>+;; no predicated insns. > >>+(define_attr "predicated" "yes,no" (const_string "no")) > >>+ > >> ;; --- > >> ;; Pipeline descriptions and scheduling > >> ;; --- > >>diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > >>index 73c3088..6bda491 100644 > >>--- a/gcc/config/arm/arm.md > >>+++ b/gcc/config/arm/arm.md > >>@@ -105,6 +105,9 @@ > >> (define_attr "fpu" "none,vfp" > >>(const (symbol_ref "arm_fpu_attr"))) > >>+; Predicated means that the insn form is conditionally executed based on a > >>+; predicate. We default to 'no' because no Thumb patterns match this rule > >>+; and not all ARM insns do. > >s/is conditionally executed/can be conditionally executed/ in the first > >sentence. Otherwise, this looks OK to me but I can't approve the ARM part, > >so you'll need to wait for a review from someone who can. > > > >> (define_attr "predicated" "yes,no" (const_string "no")) > >> ; LENGTH of an instruction (in bytes) > >>-- > >>2.1.0.243.g30d45f7 > > Actually, that would then be the existing description for the > "predicable" attribute. I think that this proposed description is > correct for the "predicated" attribute. Right, understood, that will teach me to pattern match up to pred and stop reading the word. This is fine, I've committed it on your behalf as r230666 Thanks, James
Re: PATCH to shorten_compare -Wtype-limits handling
Hmm, it looks like using expansion_point_if_in_system_header might avoid the first issue you mention. Thus. Great, thanks! (I'll have to remember the trick for my own use!) Martin
[PATCH, PR68460] Always call free_stmt_vec_info_vec in gather_scalar_reductions
[ was: Re: [PATCH] Fix parloops gimple_uid usage ] On 09/10/15 23:09, Tom de Vries wrote: @@ -2392,6 +2397,9 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list loop_vec_info simple_inner_loop_info = NULL; bool allow_double_reduc = true; + if (!stmt_vec_info_vec.exists ()) +init_stmt_vec_info_vec (); + simple_loop_info = vect_analyze_loop_form (loop); if (simple_loop_info == NULL) return; @@ -2453,9 +2461,16 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list destroy_loop_vec_info (simple_loop_info, true); destroy_loop_vec_info (simple_inner_loop_info, true); + /* Release the claim on gimple_uid. */ + free_stmt_vec_info_vec (); + With the src/libgomp/testsuite/libgomp.c/pr46886.c testcase, compiled in addition with -ftree-vectorize, I ran into an ICE: ... src/libgomp/testsuite/libgomp.c/pr46886.c:8:5: internal compiler error: in init_stmt_vec_info_vec, at tree-vect-stmts.c:8250 int foo (void) ^~~ 0x1196082 init_stmt_vec_info_vec() src/gcc/tree-vect-stmts.c:8250 0x11c3ed4 vectorize_loops() src/gcc/tree-vectorizer.c:510 0x10a7ea5 execute src/gcc/tree-ssa-loop.c:276 ... The ICE is caused by the fact that init_stmt_vec_info_vec is called at the start of vectorize_loops, while stmt_vec_info_vec is not empty. I traced this back to gather_scalar_reduction, where we call init_stmt_vec_info_vec, but we skip free_stmt_vec_info_vec if we take the early-out for simple_loop_info == NULL. This patch fixes the ICE by making sure we always call free_stmt_vec_info_vec in gather_scalar_reduction. Passes cc1/f951 rebuild and autopar testing. OK for stage3 trunk if bootstrap and regtest succeeds? Thanks, - Tom Always call free_stmt_vec_info_vec in gather_scalar_reductions 2015-11-20 Tom de Vries PR tree-optimization/68460 * tree-parloops.c (gather_scalar_reductions): Also call free_stmt_vec_info_vec if simple_loop_info == NULL. * gcc.dg/autopar/pr68460.c: New test. --- gcc/testsuite/gcc.dg/autopar/pr68460.c | 35 ++ gcc/tree-parloops.c| 6 +- 2 files changed, 40 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/autopar/pr68460.c b/gcc/testsuite/gcc.dg/autopar/pr68460.c new file mode 100644 index 000..0c00065 --- /dev/null +++ b/gcc/testsuite/gcc.dg/autopar/pr68460.c @@ -0,0 +1,35 @@ +/* { dg-do "compile" } */ +/* { dg-options "-O -ftree-parallelize-loops=2 -ftree-vectorize -fno-tree-ch -fno-tree-dominator-opts" } */ + +void abort (void); + +int d[1024], e[1024]; + +int +foo (void) +{ + int s = 0; + int i; + + for (i = 0; i < 1024; i++) +s += d[i] - e[i]; + + return s; +} + +int +main () +{ + int i; + + for (i = 0; i < 1024; i++) +{ + d[i] = i * 2; + e[i] = i; +} + + if (foo () != 1023 * 1024 / 2) +abort (); + + return 0; +} diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 8d7912d..85cc78d 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2433,7 +2433,7 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list simple_loop_info = vect_analyze_loop_form (loop); if (simple_loop_info == NULL) -return; +goto gather_done; for (gsi = gsi_start_phis (loop->header); !gsi_end_p (gsi); gsi_next (&gsi)) { @@ -2492,9 +2492,13 @@ gather_scalar_reductions (loop_p loop, reduction_info_table_type *reduction_list destroy_loop_vec_info (simple_loop_info, true); destroy_loop_vec_info (simple_inner_loop_info, true); + gather_done: /* Release the claim on gimple_uid. */ free_stmt_vec_info_vec (); + if (reduction_list->elements () == 0) +return; + /* As gimple_uid is used by the vectorizer in between vect_analyze_loop_form and free_stmt_vec_info_vec, we can set gimple_uid of reduc_phi stmts only now. */
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On 11/20/2015 08:34 AM, Kyrill Tkachov wrote: On 20/11/15 12:27, James Greenhalgh wrote: On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: On 11/12/2015 09:39 AM, Evandro Menezes wrote: 2015-11-12 Evandro Menezes [AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. The arm part is ok too. It's just a comment. In the ChangeLog entry for arm.md I'd say "Add description." Thanks, Kyrill Please, commit if it's alright. The AArch64 part of this is OK. From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 9 Nov 2015 17:11:16 -0600 Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. --- gcc/config/aarch64/aarch64.md | 4 gcc/config/arm/arm.md | 3 +++ 2 files changed, 7 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 1586256..d46f837 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -195,6 +195,10 @@ ;; 1 :=: yes (define_attr "far_branch" "" (const_int 0)) +;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has +;; no predicated insns. +(define_attr "predicated" "yes,no" (const_string "no")) + ;; --- ;; Pipeline descriptions and scheduling ;; --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 73c3088..6bda491 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -105,6 +105,9 @@ (define_attr "fpu" "none,vfp" (const (symbol_ref "arm_fpu_attr"))) +; Predicated means that the insn form is conditionally executed based on a +; predicate. We default to 'no' because no Thumb patterns match this rule +; and not all ARM insns do. s/is conditionally executed/can be conditionally executed/ in the first sentence. Otherwise, this looks OK to me but I can't approve the ARM part, so you'll need to wait for a review from someone who can. Thanks, James (define_attr "predicated" "yes,no" (const_string "no")) ; LENGTH of an instruction (in bytes) -- 2.1.0.243.g30d45f7 Can you please commit it with this change in the Changelog? Thank you, -- Evandro Menezes
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On 11/20/2015 06:27 AM, James Greenhalgh wrote: On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: On 11/12/2015 09:39 AM, Evandro Menezes wrote: 2015-11-12 Evandro Menezes [AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. Please, commit if it's alright. The AArch64 part of this is OK. From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 9 Nov 2015 17:11:16 -0600 Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. --- gcc/config/aarch64/aarch64.md | 4 gcc/config/arm/arm.md | 3 +++ 2 files changed, 7 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 1586256..d46f837 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -195,6 +195,10 @@ ;; 1 :=: yes (define_attr "far_branch" "" (const_int 0)) +;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has +;; no predicated insns. +(define_attr "predicated" "yes,no" (const_string "no")) + ;; --- ;; Pipeline descriptions and scheduling ;; --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 73c3088..6bda491 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -105,6 +105,9 @@ (define_attr "fpu" "none,vfp" (const (symbol_ref "arm_fpu_attr"))) +; Predicated means that the insn form is conditionally executed based on a +; predicate. We default to 'no' because no Thumb patterns match this rule +; and not all ARM insns do. s/is conditionally executed/can be conditionally executed/ in the first sentence. Otherwise, this looks OK to me but I can't approve the ARM part, so you'll need to wait for a review from someone who can. (define_attr "predicated" "yes,no" (const_string "no")) ; LENGTH of an instruction (in bytes) -- 2.1.0.243.g30d45f7 Actually, that would then be the existing description for the "predicable" attribute. I think that this proposed description is correct for the "predicated" attribute. Thank you, -- Evandro Menezes
Re: PATCH to shorten_compare -Wtype-limits handling
On 11/19/2015 05:16 PM, Jason Merrill wrote: On 11/19/2015 02:44 PM, Martin Sebor wrote: On 11/18/2015 09:26 PM, Jason Merrill wrote: The rs6000 target was hitting a bootstrap failure due to -Werror=type-limits. Since warn_tautological_cmp and other warnings avoid warning if one of the operands comes from a macro, I thought it would make sense to do that here as well. The also disables the warning for functions that are shadowed by macros such as C atomic_load et al. For example, in the program below. Is that an acceptable compromise or is there a way to avoid this? I think it's an acceptable compromise, but see below. At the same time, the change doesn't suppress the warning in other cases where I would have expected it to suppress it based on your description. For instance here: unsigned short bar (unsigned short x) { #define X x if (x > 0x) Yes, this is missed because the front end doesn't remember the location of the use of x; that's one of many location tracking issues. David Malcolm is working on this stuff. I noticed there is code elsewhere in c-common.c that avoids issuing the same warning for system headers (that's the code that responsible for issuing the warning for the second test case above). Hmm, it looks like using expansion_point_if_in_system_header might avoid the first issue you mention. Thus. commit fb71bf4de520cc4bd11eefb57e50748b4679996f Author: Jason Merrill Date: Thu Nov 19 15:21:47 2015 -0500 * c-common.c (shorten_compare): But look through macros from system headers. diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 068a0bc..fe0a235 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -4651,8 +4651,10 @@ shorten_compare (location_t loc, tree *op0_ptr, tree *op1_ptr, } if (TREE_CODE (primop0) != INTEGER_CST - /* Don't warn if it's from a macro. */ - && !from_macro_expansion_at (EXPR_LOCATION (primop0))) + /* Don't warn if it's from a (non-system) macro. */ + && !(from_macro_expansion_at + (expansion_point_location_if_in_system_header + (EXPR_LOCATION (primop0) { if (val == truthvalue_false_node) warning_at (loc, OPT_Wtype_limits, diff --git a/gcc/testsuite/gcc.dg/Wtype-limits2.c b/gcc/testsuite/gcc.dg/Wtype-limits2.c new file mode 100644 index 000..92151aa --- /dev/null +++ b/gcc/testsuite/gcc.dg/Wtype-limits2.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-options "-Wtype-limits" } */ +/* { dg-require-effective-target sync_char_short } */ + +#include + +unsigned foo (unsigned char *x) +{ + if (atomic_load (x) > 1000) /* { dg-warning "comparison is always false due to limited range of data type" } */ +return 0; + return 1; +}
Re: [PATCH][ARM] Do not expand movmisalign pattern if not in 32-bit mode
On 11/11/15 16:10, Kyrill Tkachov wrote: > Hi all, > > The attached testcase ICEs when compiled with -march=armv6k -mthumb -Os or > any march > for which -mthumb gives Thumb1: > error: unrecognizable insn: > } > ^ > (insn 13 12 14 5 (set (reg:SI 116 [ x ]) > (unspec:SI [ > (mem:SI (reg/v/f:SI 112 [ s ]) [0 MEM[(unsigned char > *)s_1(D)]+0 S4 A8]) > ] UNSPEC_UNALIGNED_LOAD)) besttry.c:9 -1 > (nil)) > > The problem is that the expands a movmisalign pattern but the resulting > unaligned loads don't > match any define_insn because they are gated on unaligned_access && > TARGET_32BIT. > The unaligned_access expander is gated only on unaligned_access. > > This small patch fixes the issue by turning off unaligned_access if > TARGET_32BIT is not true. > We can then remove TARGET_32BIT from the unaligned load/store patterns > conditions as a cleanup. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Ok for trunk? > > Thanks, > Kyrill > > 2015-11-11 Kyrylo Tkachov > > * config/arm/arm.c (arm_option_override): Require TARGET_32BIT > for unaligned_access. > * config/arm/arm.md (unaligned_loadsi): Remove redundant TARGET_32BIT > from matching condition. > (unaligned_loadhis): Likewise. > (unaligned_loadhiu): Likewise. > (unaligned_storesi): Likewise. > (unaligned_storehi): Likewise. > > 2015-11-11 Kyrylo Tkachov > > * gcc.target/arm/armv6-unaligned-load-ice.c: New test. This means we don't have unaligned access for some cores in Thumb1 for armv6. I'd rather not have the ICE instead of trying to find testing coverage on such cores now. OK. regards Ramana
Re: [PATCH][ARM] PR 68149 Fix ICE in unaligned_loaddi split
On 10/11/15 17:32, Kyrill Tkachov wrote: > Hi all, > > This ICE in this PR occurs when we're trying to split unaligned_loaddi into > two SImode unaligned loads. > The problem is in the addressing mode. When reload was picking the > addressing mode we accepted an offset of > -256 because the mode in the pattern is advertised as DImode and that was > accepted by the legitimate address > hooks because they thought it was a NEON load (DImode is in > VALID_NEON_DREG_MODE). However, the splitter wants > to generate two normal SImode unaligned loads using that address, for which > -256 is not valid, so we ICE > in gen_lowpart. > > The only way unaligned_loaddi could be generated was through the > gen_movmem_ldrd_strd expansion that implements > a memmove using LDRD and STRD sequences. If the memmove source is not aligned > we can't use LDRDs so the code > generates unaligned_loaddi patterns and expects them to be split into two > normal loads after reload. Similarly > for unaligned store destinations. > > This patch just explicitly generates the two unaligned SImode loads or stores > when appropriate inside > gen_movmem_ldrd_strd. This makes the unaligned_loaddi and unaligned_storedi > patterns unused, so we can remove them. > > This patch fixes the ICe in gcc.target/aarch64/advsimd-intrinsics/vldX.c seen > with > -mthumb -mcpu=cortex-a15 -mfpu=neon-vfpv4 -mfloat-abi=hard -mfp16-format=ieee > so no new testcase is added. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Ok for trunk? > > Thanks, > Kyrill > > 2015-11-10 Kyrylo Tkachov > > PR target/68149 > * config/arm/arm.md (unaligned_loaddi): Delete. > (unaligned_storedi): Likewise. > * config/arm/arm.c (gen_movmem_ldrd_strd): Don't generate > unaligned DImode memory ops. Instead perform two back-to-back > unalgined SImode ops. s/unalgined/unaligned. Ok. Ramana
Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF
On 20 Nov 14:31, Ilya Enkovich wrote: > 2015-11-20 14:28 GMT+03:00 Richard Biener : > > On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich > > wrote: > >> 2015-11-18 16:44 GMT+03:00 Richard Biener : > >>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich > >>> wrote: > Hi, > > When we compute vectypes we skip non-relevant phi nodes. But we process > non-relevant alive statements and thus may need vectype of non-relevant > live phi node to compute mask vectype. This patch enables vectype > computation for live phi nodes. Botostrapped and regtested on > x86_64-unknown-linux-gnu. OK for trunk? > >>> > >>> Hmm. What breaks if you instead skip all !relevant stmts and not > >>> compute vectype for life but not relevant ones? We won't ever > >>> "vectorize" !relevant ones, that is, we don't need their vector type. > >> > >> I tried it and got regression in SLP. It expected non-null vectype > >> for non-releveant but live statement. Regression was in > >> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90 > > > > Because somebody put a vector type check before > > > > if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > > return false; > > > > @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g > >tree mask_type; > >tree mask; > > > > + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > > +return false; > > + > >if (!VECTOR_BOOLEAN_TYPE_P (vectype)) > > return false; > > > > @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g > > ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; > > > >gcc_assert (ncopies >= 1); > > - if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > > -return false; > > > >if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def > >&& !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle > > > > fixes this particular fallout for me. > > I'll try it. With this fix it works fine, thanks! Bootstrapped and regtested on x86_64-unknown-linux-gnu. OK for trunk? Ilya -- gcc/ 2015-11-20 Ilya Enkovich Richard Biener * tree-vect-loop.c (vect_determine_vectorization_factor): Don't compute vectype for non-relevant mask producers. * gcc/tree-vect-stmts.c (vectorizable_comparison): Check stmt relevance earlier. gcc/testsuite/ 2015-11-20 Ilya Enkovich * gcc.dg/pr68327.c: New test. diff --git a/gcc/testsuite/gcc.dg/pr68327.c b/gcc/testsuite/gcc.dg/pr68327.c new file mode 100644 index 000..c3e6a94 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr68327.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +int a, d; +char b, c; + +void +fn1 () +{ + int i = 0; + for (; i < 1; i++) +d = 1; + for (; b; b++) +a = 1 && (d & b); +} diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 80937ec..592372d 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -439,7 +439,8 @@ vect_determine_vectorization_factor (loop_vec_info loop_vinfo) compute a factor. */ if (TREE_CODE (scalar_type) == BOOLEAN_TYPE) { - mask_producers.safe_push (stmt_info); + if (STMT_VINFO_RELEVANT_P (stmt_info)) + mask_producers.safe_push (stmt_info); bool_result = true; if (gimple_code (stmt) == GIMPLE_ASSIGN diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 0f64aaf..3723b26 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -7546,6 +7546,9 @@ vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi, tree mask_type; tree mask; + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) +return false; + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) return false; @@ -7558,9 +7561,6 @@ vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi, ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; gcc_assert (ncopies >= 1); - if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) -return false; - if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle && reduc_def))
[Patch, fortran] PRs 68237 & 66762 - submodule problems
Dear All, I have committed as 'obvious' revision 230661 to fix 2/3 submodule problems. In the case of the third, PR68243, I believe gfortran is behaving correctly and I am awaiting confirmation from the reporter. Thanks to Dominique for regtesting the part of the patch that fixes PR66762. I have bootstrapped and regtested both part on FC21/x86_64. Cheers Paul PS I have just noticed that I attributed the wrong part of the patch to Steve. I will correct the ChangeLog in just a moment. 2015-11-20 Paul Thomas PR fortran/68237 * decl.c (gfc_match_submod_proc): Test the interface symbol before accessing its attributes. 2015-11-20 Steven G. Kargl PR fortran/66762 (gfc_get_symbol_decl): Test for attr.used_in_submodule as well as attr.use_assoc (twice). (gfc_create_module_variable): Ditto. 2015-11-20 Paul Thomas PR fortran/68237 * gfortran.dg/submodule_12.f90: New test PR fortran/66762 * gfortran.dg/submodule_6.f90: Add compile option -flto.
Re: [Committed] S/390: Add bswaphi2 pattern
On 11/20/2015 01:23 PM, Richard Henderson wrote: > On 11/20/2015 12:52 PM, Andreas Krebbel wrote: >> +(define_insn "bswaphi2" >> + [(set (match_operand:HI 0 "register_operand" "=d") >> +(bswap:HI (match_operand:HI 1 "memory_operand" "RT")))] >> + "TARGET_CPU_ZARCH" >> + "lrvh\t%0,%1" >> + [(set_attr "type" "load") >> + (set_attr "op_type" "RXY") >> + (set_attr "z10prop" "z10_super")]) > > Surely it's better to arrange so that you can use STRVH as well. > And providing a fallback for the reg-reg case (e.g. LRVR+SRL). > > Although I suppose I don't see support for STRV in bswap32/64 either... Right, I totally forgot about the stores. I'll have a look. We even have a mem-mem variant (mvcin). But I found it rather difficult to use since the pattern would have to make sure that source and destination do not overlap. -Andreas-
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On 20/11/15 12:27, James Greenhalgh wrote: On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: On 11/12/2015 09:39 AM, Evandro Menezes wrote: 2015-11-12 Evandro Menezes [AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. The arm part is ok too. It's just a comment. In the ChangeLog entry for arm.md I'd say "Add description." Thanks, Kyrill Please, commit if it's alright. The AArch64 part of this is OK. From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 From: Evandro Menezes Date: Mon, 9 Nov 2015 17:11:16 -0600 Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM pipeline models gcc/ * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". * config/arm/arm.md (predicated): Added description. --- gcc/config/aarch64/aarch64.md | 4 gcc/config/arm/arm.md | 3 +++ 2 files changed, 7 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 1586256..d46f837 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -195,6 +195,10 @@ ;; 1 :=: yes (define_attr "far_branch" "" (const_int 0)) +;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has +;; no predicated insns. +(define_attr "predicated" "yes,no" (const_string "no")) + ;; --- ;; Pipeline descriptions and scheduling ;; --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 73c3088..6bda491 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -105,6 +105,9 @@ (define_attr "fpu" "none,vfp" (const (symbol_ref "arm_fpu_attr"))) +; Predicated means that the insn form is conditionally executed based on a +; predicate. We default to 'no' because no Thumb patterns match this rule +; and not all ARM insns do. s/is conditionally executed/can be conditionally executed/ in the first sentence. Otherwise, this looks OK to me but I can't approve the ARM part, so you'll need to wait for a review from someone who can. Thanks, James (define_attr "predicated" "yes,no" (const_string "no")) ; LENGTH of an instruction (in bytes) -- 2.1.0.243.g30d45f7
[ptx] overrride anchor hook
Jim discovered that he needed to override the anchoring hook when using a PPC host-side compiler, but didn't figure out why this was needed. Digging into it, I discovered that flag_section_anchors is cleared in toplev.c by the command line option machinery, if there are no anchor target hooks. However, that's done in the host compiler and the LTO machinery simply copies the value over into the LTO compiler. Usually that's fine, as the LTO compiler's for the same target architecture. Except when it's an offload compiler. The flags are not resanitized (perhaps that should be done?) Anyway, that led to the PTX accelerator compiler trying to do section anchory things. Fixed by overriding TARGET_USE_ANCHORS_FOR_SYMBOL_P to say 'no'. I guess at the next instance of an offload compiler seeing a 'surprising' combination of optimization flags, we'll need a resanitize hook? Jakub? nathan 2015-11-20 Nathan Sidwell James Norris * config/nvptx/nvptx.c (nvptx_use_anchors_for_symbol_p): New. (TARGET_USE_ANCHORS_FOR_SYMBOL_P): Override. Index: config/nvptx/nvptx.c === --- config/nvptx/nvptx.c (revision 230657) +++ config/nvptx/nvptx.c (working copy) @@ -3895,6 +3895,19 @@ nvptx_cannot_copy_insn_p (rtx_insn *insn return false; } } + +/* Section anchors do not work. Initialization for flag_section_anchor + probes the existence of the anchoring target hooks and prevents + anchoring if they don't exist. However, we may be being used with + a host-side compiler that does support anchoring, and hence see + the anchor flag set (as it's not recalculated). So provide an + implementation denying anchoring. */ + +static bool +nvptx_use_anchors_for_symbol_p (const_rtx ARG_UNUSED (a)) +{ + return false; +} /* Record a symbol for mkoffload to enter into the mapping table. */ @@ -4914,6 +4927,9 @@ nvptx_goacc_reduction (gcall *call) #undef TARGET_CANNOT_COPY_INSN_P #define TARGET_CANNOT_COPY_INSN_P nvptx_cannot_copy_insn_p +#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P +#define TARGET_USE_ANCHORS_FOR_SYMBOL_P nvptx_use_anchors_for_symbol_p + #undef TARGET_INIT_BUILTINS #define TARGET_INIT_BUILTINS nvptx_init_builtins #undef TARGET_EXPAND_BUILTIN
Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument
On 20 Nov 14:54, Richard Biener wrote: > On Fri, Nov 20, 2015 at 2:08 PM, Ilya Enkovich wrote: > > On 19 Nov 18:19, Richard Biener wrote: > >> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt > >> wrote: > >> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote: > >> >> Currently we fold all memcpy/memmove calls with a known data size. > >> >> It causes two problems when used with Pointer Bounds Checker. > >> >> The first problem is that we may copy pointers as integer data > >> >> and thus loose bounds. The second problem is that if we inline > >> >> memcpy, we also have to inline bounds copy and this may result > >> >> in a huge amount of code and significant compilation time growth. > >> >> This patch disables folding for functions we want to instrument. > >> >> > >> >> Does it look reasonable for trunk and GCC5 branch? Bootstrapped > >> >> and regtested on x86_64-unknown-linux-gnu. > >> > > >> >Can't see anything wrong with it. Ok. > >> > >> But for small sizes this can have a huge impact on optimization. Which is > >> why we have the code in the first place. I'd make the check less broad, > >> for example inlining copies of size less than a pointer shouldn't be > >> affected. > > > > Right. We also may inline in case we know no pointers are copied. Below > > is a version with extended condition and a couple more tests. Bootstrapped > > and regtested on x86_64-unknown-linux-gnu. Does it OK for trunk and > > gcc-5-branch? > > > >> > >> Richard. > >> > >> > > >> >Bernd > >> > >> > > > > Thanks, > > Ilya > > -- > > gcc/ > > > > 2015-11-20 Ilya Enkovich > > > > * gimple-fold.c (gimple_fold_builtin_memory_op): Don't > > fold call if we are going to instrument it and it may > > copy pointers. > > > > gcc/testsuite/ > > > > 2015-11-20 Ilya Enkovich > > > > * gcc.target/i386/mpx/pr68337-1.c: New test. > > * gcc.target/i386/mpx/pr68337-2.c: New test. > > * gcc.target/i386/mpx/pr68337-3.c: New test. > > > > > > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > > index 1ab20d1..dd9f80b 100644 > > --- a/gcc/gimple-fold.c > > +++ b/gcc/gimple-fold.c > > @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3. If not see > > #include "gomp-constants.h" > > #include "optabs-query.h" > > #include "omp-low.h" > > +#include "tree-chkp.h" > > +#include "ipa-chkp.h" > > > > > > /* Return true when DECL can be referenced from current unit. > > @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator > > *gsi, > >unsigned int src_align, dest_align; > >tree off0; > > > > + /* Inlining of memcpy/memmove may cause bounds lost (if we copy > > +pointers as wide integer) and also may result in huge function > > +size because of inlined bounds copy. Thus don't inline for > > +functions we want to instrument in case pointers are copied. */ > > + if (flag_check_pointer_bounds > > + && chkp_instrumentable_p (cfun->decl) > > + /* Even if data may contain pointers we can inline if copy > > +less than a pointer size. */ > > + && (!tree_fits_uhwi_p (len) > > + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0) > > || tree_to_uhwi (len) >= POINTER_SIZE_UNITS > > > + /* Check data type for pointers. */ > > + && (!TREE_TYPE (src) > > + || !TREE_TYPE (TREE_TYPE (src)) > > + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src))) > > + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src) > > I don't think you can in any way rely on the pointer type of the src argument > as all pointer conversions are useless and memcpy and friends take void * > anyway. This check is looking for cases when we have type information indicating no pointers are copied. In case of 'void *' we have to assume pointers are copied and inlining is undesired. Test pr68337-2.c checks pointer type allows to enable inlining. Looks like this check misses || !COMPLETE_TYPE_P(TREE_TYPE (TREE_TYPE (src)))? > > Note that you also disable memmove to memcpy simplification with this > early check. Doesn't matter for MPX which uses the same implementation for both cases. > > Where is pointer transfer handled for MPX? I suppose it's not done > transparently > for all memory move instructions but explicitely by instrumented block copy > routines in libmpx? In which case how does that identify pointers vs. > non-pointers? It is handled by instrumentation pass. Compiler checks type of stored data to find pointer stores. Each pointer store is instrumented with bndstx call. MPX versions of memcpy, memmove etc. don't make any assumptions about type of copied data and just copy whole chunk of bounds metadata corresponding to copied block. Thanks, Ilya > > Richard. >
[Patch] sync top level configure with binutils-gdb
This patch was pushed on binutils-gdb repo, so I also commit it on gcc. Tristan. 2015-11-20 Tristan Gingold Sync with binutils-gdb: 2015-11-20 Tristan Gingold * configure.ac: Add aarch64-*-darwin* and arm-*-darwin*. * configure: Regenerate. Index: configure === --- configure (revision 230657) +++ configure (working copy) @@ -3686,6 +3686,14 @@ case "${target}" in *-*-chorusos) ;; + aarch64-*-darwin*) +noconfigdirs="$noconfigdirs ld gas gdb gprof" +noconfigdirs="$noconfigdirs sim target-rda" +;; + arm-*-darwin*) +noconfigdirs="$noconfigdirs ld gas gdb gprof" +noconfigdirs="$noconfigdirs sim target-rda" +;; powerpc-*-darwin*) noconfigdirs="$noconfigdirs ld gas gdb gprof" noconfigdirs="$noconfigdirs sim target-rda" Index: configure.ac === --- configure.ac(revision 230657) +++ configure.ac(working copy) @@ -1023,6 +1023,14 @@ case "${target}" in *-*-chorusos) ;; + aarch64-*-darwin*) +noconfigdirs="$noconfigdirs ld gas gdb gprof" +noconfigdirs="$noconfigdirs sim target-rda" +;; + arm-*-darwin*) +noconfigdirs="$noconfigdirs ld gas gdb gprof" +noconfigdirs="$noconfigdirs sim target-rda" +;; powerpc-*-darwin*) noconfigdirs="$noconfigdirs ld gas gdb gprof" noconfigdirs="$noconfigdirs sim target-rda"
Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage
On 20/11/2015 13:47, "Richard Biener" wrote: >On Fri, Nov 20, 2015 at 1:33 PM, Alan Hayward >wrote: >> >> >>On 20/11/2015 11:00, "Richard Biener" wrote: >> >>>On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward >>>wrote: When vectorising a integer induction condition reduction, is_nonwrapping_integer_induction ends up with different values for base during the analysis and build phases. In the first it is an INTEGER_CST, in the second the loop has been vectorised out and the base is now a variable. This results in the analysis and build stage detecting the STMT_VINFO_VEC_REDUCTION_TYPE as different types. The easiest way to fix this is to only check for integer induction conditions on the analysis stage. >>> >>>I don't like this. For the evolution part we have added >>>STMT_VINFO_LOOP_PHI_EVOLUTION_PART. If you now need >>>the original initial value as well then just save it. >>> >>>Or if you really want to go with the hack then please do not call >>>is_nonwrapping_integer_induction with vec_stmt != NULL but >>>initialize cond_expr_is_nonwrapping_integer_induction from >>>STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) >>> >>>The hack also lacks a comment. >>> >> >>Ok. I've gone for a combination of both: >> >>I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE. >> >>I've removed the vec_stmt != NULL checks. >> >>I've moved the call to is_nonwrapping_integer_induction until after the >>vect_is_simple_reduction check. I never liked that I had >>is_nonwrapping_integer_induction early in the function, and think this >>looks better. > >It looks better but the comment for loop_phi_evolution_base is wrong. >The value is _not_ "correct" after prologue peeling (unless you >update it there to a non-constant expr). It is conservatively "correct" >for is_nonwrapping_integer_induction though. Which is why I'd >probably rename it to STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED >to reflect this (and similar to STMT_VINFO_NITERS_UNCHANGED). > >Ok with that change. Updated as requested and submitted. Alan. analysisonlycondcheck3.patch Description: Binary data
Re: Remove noce_mem_write_may_trap_or_fault_p in ifcvt
On 11/19/2015 12:49 AM, Jeff Law wrote: On 11/18/2015 12:16 PM, Bernd Schmidt wrote: I don't think so, actually. One safe option would be to rip it out and just stop transforming this case, but let's start by looking at the code just a bit further down, calling noce_can_store_speculate. This was added later than the code we're discussing, and it tries to verify that the same memory location will unconditionally be written to at a point later than the one we're trying to convert And if we dig into that thread, Ian suggests this isn't terribly important anyway. However, if we go even further upthread, we find an assertion from Michael that this is critical for 456.hmmer and references BZ 27313. Cc'd in case he has additional input. https://gcc.gnu.org/ml/gcc-patches/2007-08/msg01978.html Sadly, no testcase was included. BZ27313 is marked as fixed by the introduction of the tree cselim pass, thus the problem won't even be seen at RTL level. I'm undecided on whether cs-elim is safe wrt the store speculation vs locks concerns raised in the thread discussing Ian's noce_can_store_speculate_p, but that's not something we have to consider to solve the problem at hand. So if it weren't for the assertion that it's critical for hmmr, I'd be convinced that just ripping out was the right thing to do. Can you look at 27313 and hmmr and see if there's an impact. Just maybe the critical stuff for those is handled by the tree if converter and we can just rip out the clearly incorrect RTL bits without regressing anything performance-wise. If there is an impact, then I think we have to look at either improving the tree bits (so we can remove the rtl bits) or we have to do real dataflow analysis in the rtl bits. So I made this change: if (!set_b && MEM_P (orig_x)) -{ - /* Disallow the "if (...) x = a;" form (implicit "else x = x;") -for optimizations if writing to x may trap or fault, -i.e. it's a memory other than a static var or a stack slot, -is misaligned on strict aligned machines or is read-only. If -x is a read-only memory, then the program is valid only if we -avoid the store into it. If there are stores on both the -THEN and ELSE arms, then we can go ahead with the conversion; -either the program is broken, or the condition is always -false such that the other memory is selected. */ - if (noce_mem_write_may_trap_or_fault_p (orig_x)) - return FALSE; - - /* Avoid store speculation: given "if (...) x = a" where x is a -MEM, we only want to do the store if x is always set -somewhere in the function. This avoids cases like - if (pthread_mutex_trylock(mutex)) -++global_variable; -where we only want global_variable to be changed if the mutex -is held. FIXME: This should ideally be expressed directly in -RTL somehow. */ - if (!noce_can_store_speculate_p (test_bb, orig_x)) - return FALSE; -} +return FALSE; As far as I can tell hmmer and the 27313 testcase are unaffected at -O2 (if anything, hmmer was very slightly faster afterwards). The run wasn't super-scientific, but I wouldn't have expected anything else given the existence of cs-elim. Ok to do the above, removing all the bits made unnecessary (including memory_must_be_modified_in_insn_p in alias.c)? Bernd
Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument
On Fri, Nov 20, 2015 at 2:08 PM, Ilya Enkovich wrote: > On 19 Nov 18:19, Richard Biener wrote: >> On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt >> wrote: >> >On 11/19/2015 05:31 PM, Ilya Enkovich wrote: >> >> Currently we fold all memcpy/memmove calls with a known data size. >> >> It causes two problems when used with Pointer Bounds Checker. >> >> The first problem is that we may copy pointers as integer data >> >> and thus loose bounds. The second problem is that if we inline >> >> memcpy, we also have to inline bounds copy and this may result >> >> in a huge amount of code and significant compilation time growth. >> >> This patch disables folding for functions we want to instrument. >> >> >> >> Does it look reasonable for trunk and GCC5 branch? Bootstrapped >> >> and regtested on x86_64-unknown-linux-gnu. >> > >> >Can't see anything wrong with it. Ok. >> >> But for small sizes this can have a huge impact on optimization. Which is >> why we have the code in the first place. I'd make the check less broad, for >> example inlining copies of size less than a pointer shouldn't be affected. > > Right. We also may inline in case we know no pointers are copied. Below is > a version with extended condition and a couple more tests. Bootstrapped and > regtested on x86_64-unknown-linux-gnu. Does it OK for trunk and gcc-5-branch? > >> >> Richard. >> >> > >> >Bernd >> >> > > Thanks, > Ilya > -- > gcc/ > > 2015-11-20 Ilya Enkovich > > * gimple-fold.c (gimple_fold_builtin_memory_op): Don't > fold call if we are going to instrument it and it may > copy pointers. > > gcc/testsuite/ > > 2015-11-20 Ilya Enkovich > > * gcc.target/i386/mpx/pr68337-1.c: New test. > * gcc.target/i386/mpx/pr68337-2.c: New test. > * gcc.target/i386/mpx/pr68337-3.c: New test. > > > diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c > index 1ab20d1..dd9f80b 100644 > --- a/gcc/gimple-fold.c > +++ b/gcc/gimple-fold.c > @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3. If not see > #include "gomp-constants.h" > #include "optabs-query.h" > #include "omp-low.h" > +#include "tree-chkp.h" > +#include "ipa-chkp.h" > > > /* Return true when DECL can be referenced from current unit. > @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi, >unsigned int src_align, dest_align; >tree off0; > > + /* Inlining of memcpy/memmove may cause bounds lost (if we copy > +pointers as wide integer) and also may result in huge function > +size because of inlined bounds copy. Thus don't inline for > +functions we want to instrument in case pointers are copied. */ > + if (flag_check_pointer_bounds > + && chkp_instrumentable_p (cfun->decl) > + /* Even if data may contain pointers we can inline if copy > +less than a pointer size. */ > + && (!tree_fits_uhwi_p (len) > + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0) || tree_to_uhwi (len) >= POINTER_SIZE_UNITS > + /* Check data type for pointers. */ > + && (!TREE_TYPE (src) > + || !TREE_TYPE (TREE_TYPE (src)) > + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src))) > + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src) I don't think you can in any way rely on the pointer type of the src argument as all pointer conversions are useless and memcpy and friends take void * anyway. Note that you also disable memmove to memcpy simplification with this early check. Where is pointer transfer handled for MPX? I suppose it's not done transparently for all memory move instructions but explicitely by instrumented block copy routines in libmpx? In which case how does that identify pointers vs. non-pointers? Richard. > + return false; > + >/* Build accesses at offset zero with a ref-all character type. */ >off0 = build_int_cst (build_pointer_type_for_mode (char_type_node, > ptr_mode, true), 0); > diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c > b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c > new file mode 100644 > index 000..3f8d79d > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c > @@ -0,0 +1,32 @@ > +/* { dg-do run } */ > +/* { dg-options "-fcheck-pointer-bounds -mmpx" } */ > + > +#include "mpx-check.h" > + > +#define N 2 > + > +extern void abort (); > + > +static int > +mpx_test (int argc, const char **argv) > +{ > + char ** src = (char **)malloc (sizeof (char *) * N); > + char ** dst = (char **)malloc (sizeof (char *) * N); > + int i; > + > + for (i = 0; i < N; i++) > +src[i] = __bnd_set_ptr_bounds (argv[0] + i, i + 1); > + > + __builtin_memcpy(dst, src, sizeof (char *) * N); > + > + for (i = 0; i < N; i++) > +{ > + char *p = dst[i]; > + if (p != argv[0] + i > + || __bnd_get_ptr_lbound (p) != p > + || __bnd_
Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage
On Fri, Nov 20, 2015 at 1:33 PM, Alan Hayward wrote: > > > On 20/11/2015 11:00, "Richard Biener" wrote: > >>On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward >>wrote: >>> When vectorising a integer induction condition reduction, >>> is_nonwrapping_integer_induction ends up with different values for base >>> during the analysis and build phases. In the first it is an INTEGER_CST, >>> in the second the loop has been vectorised out and the base is now a >>> variable. >>> >>> This results in the analysis and build stage detecting the >>> STMT_VINFO_VEC_REDUCTION_TYPE as different types. >>> >>> The easiest way to fix this is to only check for integer induction >>> conditions on the analysis stage. >> >>I don't like this. For the evolution part we have added >>STMT_VINFO_LOOP_PHI_EVOLUTION_PART. If you now need >>the original initial value as well then just save it. >> >>Or if you really want to go with the hack then please do not call >>is_nonwrapping_integer_induction with vec_stmt != NULL but >>initialize cond_expr_is_nonwrapping_integer_induction from >>STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) >> >>The hack also lacks a comment. >> > > Ok. I've gone for a combination of both: > > I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE. > > I've removed the vec_stmt != NULL checks. > > I've moved the call to is_nonwrapping_integer_induction until after the > vect_is_simple_reduction check. I never liked that I had > is_nonwrapping_integer_induction early in the function, and think this > looks better. It looks better but the comment for loop_phi_evolution_base is wrong. The value is _not_ "correct" after prologue peeling (unless you update it there to a non-constant expr). It is conservatively "correct" for is_nonwrapping_integer_induction though. Which is why I'd probably rename it to STMT_VINFO_LOOP_PHI_EVOLUTION_BASE_UNCHANGED to reflect this (and similar to STMT_VINFO_NITERS_UNCHANGED). Ok with that change. Thanks, Richard. > > Alan. >
Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.
Hello Kyrill, On 20 Nov 12:15, Kyrill Tkachov wrote: > >gcc/tessuite/ > > * c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error. > > This test fails on bare-metal targets that don't support -fcilkplus or > -pthread. > Would you consider moving them to the cilkplus testing directory or adding an > appropriate > effective target check? I think so. I'll commit change in the bottom. gcc/testsuite/ * c-c++-common/attr-simd-3.c: Require Cilk Plus in effective target. > > Thanks, > Kyrill -- Thanks, K diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c b/gcc/testsuite/c-c++-common/attr-simd-3.c index 35dd4c0..c7533f0 100644 --- a/gcc/testsuite/c-c++-common/attr-simd-3.c +++ b/gcc/testsuite/c-c++-common/attr-simd-3.c @@ -1,3 +1,4 @@ +/* { dg-require-effective-target cilkplus } */ /* { dg-do compile } */ /* { dg-options "-fcilkplus" } */ /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not declared in this scope" } */
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
On Fri, 20 Nov 2015, Tom de Vries wrote: > On 20/11/15 11:37, Richard Biener wrote: > >I'd rather make loop_optimizer_init do nothing > > if requested flags are already set and no fixup is needed > > > Thus sth like > > > > Index: gcc/loop-init.c > > === > > --- gcc/loop-init.c (revision 230649) > > +++ gcc/loop-init.c (working copy) > > @@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags) > > calculate_dominance_info (CDI_DOMINATORS); > > > > if (!needs_fixup) > > - checking_verify_loop_structure (); > > + { > > + checking_verify_loop_structure (); > > + if (loops_state_satisfies_p (flags)) > > + goto out; > > What about flags that are present in the loops state, but not requested in > flags? Should we try to clear those flags? No, I don't think so, that would break in-loop-pipeline LIM, dropping loop-closed SSA for example. I agree it's somewhat of an odd behavior but all passes should either be placed in a sub-pipeline with an outer loop_optimizer_init()/finalize () call or call both themselves. Richard. > Thanks, > - Tom > > > + } > > > > /* Clear all flags. */ > > if (recorded_exits) > > @@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags) > > /* Apply flags to loops. */ > > apply_loop_flags (flags); > > > > + checking_verify_loop_structure (); > > + > > +out: > > /* Dump loops. */ > > flow_loops_dump (dump_file, NULL, 1); > > > > - checking_verify_loop_structure (); > > - > > timevar_pop (TV_LOOP_INIT); > > } > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def
On 20/11/15 11:37, Richard Biener wrote: I'd rather make loop_optimizer_init do nothing if requested flags are already set and no fixup is needed Thus sth like Index: gcc/loop-init.c === --- gcc/loop-init.c (revision 230649) +++ gcc/loop-init.c (working copy) @@ -103,7 +103,11 @@ loop_optimizer_init (unsigned flags) calculate_dominance_info (CDI_DOMINATORS); if (!needs_fixup) - checking_verify_loop_structure (); + { + checking_verify_loop_structure (); + if (loops_state_satisfies_p (flags)) + goto out; What about flags that are present in the loops state, but not requested in flags? Should we try to clear those flags? Thanks, - Tom + } /* Clear all flags. */ if (recorded_exits) @@ -122,11 +126,12 @@ loop_optimizer_init (unsigned flags) /* Apply flags to loops. */ apply_loop_flags (flags); + checking_verify_loop_structure (); + +out: /* Dump loops. */ flow_loops_dump (dump_file, NULL, 1); - checking_verify_loop_structure (); - timevar_pop (TV_LOOP_INIT); }
Go patch committed: add receiver type to specific type function name
This patch to the Go frontend fixes the case where two different methods on different types with the same method name both define a type internally with the same name where the type requires a specific type hash or equality function. Before this patch those functions would get the same, causing a compilation error. This patch gives them different names using the same approach as is done for the type descriptor. I sent out a test case to the master Go testsuite in https://golang.org/cl/17081. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline and GCC 5 branch. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 230463) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -e3aef41ce0c5be81e2589e60d9cb0db1516e9e2d +dfa74d975884f363c74d6a66a37b1703093fdba6 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/types.cc === --- gcc/go/gofrontend/types.cc (revision 230463) +++ gcc/go/gofrontend/types.cc (working copy) @@ -1769,7 +1769,16 @@ Type::specific_type_functions(Gogo* gogo const Named_object* in_function = name->in_function(&index); if (in_function != NULL) { - base_name += '$' + Gogo::unpack_hidden_name(in_function->name()); + base_name.append(1, '$'); + const Typed_identifier* rcvr = + in_function->func_value()->type()->receiver(); + if (rcvr != NULL) + { + Named_type* rcvr_type = rcvr->type()->deref()->named_type(); + base_name.append(Gogo::unpack_hidden_name(rcvr_type->name())); + base_name.append(1, '$'); + } + base_name.append(Gogo::unpack_hidden_name(in_function->name())); if (index > 0) { char buf[30];
Re: [PATCH, PR68337] Don't fold memcpy/memmove we want to instrument
On 19 Nov 18:19, Richard Biener wrote: > On November 19, 2015 6:12:30 PM GMT+01:00, Bernd Schmidt > wrote: > >On 11/19/2015 05:31 PM, Ilya Enkovich wrote: > >> Currently we fold all memcpy/memmove calls with a known data size. > >> It causes two problems when used with Pointer Bounds Checker. > >> The first problem is that we may copy pointers as integer data > >> and thus loose bounds. The second problem is that if we inline > >> memcpy, we also have to inline bounds copy and this may result > >> in a huge amount of code and significant compilation time growth. > >> This patch disables folding for functions we want to instrument. > >> > >> Does it look reasonable for trunk and GCC5 branch? Bootstrapped > >> and regtested on x86_64-unknown-linux-gnu. > > > >Can't see anything wrong with it. Ok. > > But for small sizes this can have a huge impact on optimization. Which is > why we have the code in the first place. I'd make the check less broad, for > example inlining copies of size less than a pointer shouldn't be affected. Right. We also may inline in case we know no pointers are copied. Below is a version with extended condition and a couple more tests. Bootstrapped and regtested on x86_64-unknown-linux-gnu. Does it OK for trunk and gcc-5-branch? > > Richard. > > > > >Bernd > > Thanks, Ilya -- gcc/ 2015-11-20 Ilya Enkovich * gimple-fold.c (gimple_fold_builtin_memory_op): Don't fold call if we are going to instrument it and it may copy pointers. gcc/testsuite/ 2015-11-20 Ilya Enkovich * gcc.target/i386/mpx/pr68337-1.c: New test. * gcc.target/i386/mpx/pr68337-2.c: New test. * gcc.target/i386/mpx/pr68337-3.c: New test. diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index 1ab20d1..dd9f80b 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -53,6 +53,8 @@ along with GCC; see the file COPYING3. If not see #include "gomp-constants.h" #include "optabs-query.h" #include "omp-low.h" +#include "tree-chkp.h" +#include "ipa-chkp.h" /* Return true when DECL can be referenced from current unit. @@ -664,6 +666,23 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi, unsigned int src_align, dest_align; tree off0; + /* Inlining of memcpy/memmove may cause bounds lost (if we copy +pointers as wide integer) and also may result in huge function +size because of inlined bounds copy. Thus don't inline for +functions we want to instrument in case pointers are copied. */ + if (flag_check_pointer_bounds + && chkp_instrumentable_p (cfun->decl) + /* Even if data may contain pointers we can inline if copy +less than a pointer size. */ + && (!tree_fits_uhwi_p (len) + || compare_tree_int (len, POINTER_SIZE_UNITS) >= 0) + /* Check data type for pointers. */ + && (!TREE_TYPE (src) + || !TREE_TYPE (TREE_TYPE (src)) + || VOID_TYPE_P (TREE_TYPE (TREE_TYPE (src))) + || chkp_type_has_pointer (TREE_TYPE (TREE_TYPE (src) + return false; + /* Build accesses at offset zero with a ref-all character type. */ off0 = build_int_cst (build_pointer_type_for_mode (char_type_node, ptr_mode, true), 0); diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c new file mode 100644 index 000..3f8d79d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-1.c @@ -0,0 +1,32 @@ +/* { dg-do run } */ +/* { dg-options "-fcheck-pointer-bounds -mmpx" } */ + +#include "mpx-check.h" + +#define N 2 + +extern void abort (); + +static int +mpx_test (int argc, const char **argv) +{ + char ** src = (char **)malloc (sizeof (char *) * N); + char ** dst = (char **)malloc (sizeof (char *) * N); + int i; + + for (i = 0; i < N; i++) +src[i] = __bnd_set_ptr_bounds (argv[0] + i, i + 1); + + __builtin_memcpy(dst, src, sizeof (char *) * N); + + for (i = 0; i < N; i++) +{ + char *p = dst[i]; + if (p != argv[0] + i + || __bnd_get_ptr_lbound (p) != p + || __bnd_get_ptr_ubound (p) != p + i) + abort (); +} + + return 0; +} diff --git a/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c b/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c new file mode 100644 index 000..16736b4 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/mpx/pr68337-2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-fcheck-pointer-bounds -mmpx" } */ +/* { dg-final { scan-assembler-not "memcpy" } } */ + +void +test1 (char *dst, char *src) +{ + __builtin_memcpy (dst, src, sizeof (char *) * 2); +} + +void +test2 (void *dst, void *src) +{ + __builtin_memcpy (dst, src, sizeof (char *) / 2); +} + +struct s +{ + int a; + int b; +}; + +void +test3 (struct s *dst, struct s *src) +{ + __builtin_memcpy (dst, src, sizeof (struct s)); +} diff --git a/gcc/testsuite/gc
Re: GCC 5.3 Status Report (2015-11-20)
On Fri, Nov 20, 2015 at 7:53 AM, Richard Biener wrote: > > Status > == > > We plan to do a GCC 5.3 release candidate at the end of next week > followed by the actual release a week after that. > > So now is the time to look at your regression bugs in bugzilla and > do some backporting for things already fixed on trunk. I'm still waiting for approval of the libtool change to support AIX TLS symbols (PR 68192). There has been no response on Libtool patches mailing list. I again request permission to apply the patches to GCC trunk and 5-branch. Thanks, David
[committed, trivial] Fix typo and trailing whitespace in dump-file strings in parloops
[ was: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def ] On 18/11/15 17:22, Bernhard Reutner-Fischer wrote: Bonus points for fixing the dump_file to parse in: >Parloops will fail because: >... >phi is n_2 = PHI >arg of phi to exit: value n_4(D) used outside loop >checking if it a part of reduction pattern: s/it a/it is/ This patch fixes a typo and trailing whitespace in dump-file strings in parloops. Build for c and fortran, tested -fdump-tree-parloops testcases. Committed to trunk as trivial. Thanks, - Tom Fix typo and trailing whitespace in dump-file strings in parloops 2015-11-19 Tom de Vries * tree-parloops.c (build_new_reduction): Fix trailing whitespace in dump-file string. (try_create_reduction_list): Same. Fix typo in dump-file string. --- gcc/tree-parloops.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 8d7912d..aca2370 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2383,7 +2383,7 @@ build_new_reduction (reduction_info_table_type *reduction_list, if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, - "Detected reduction. reduction stmt is: \n"); + "Detected reduction. reduction stmt is:\n"); print_gimple_stmt (dump_file, reduc_stmt, 0, 0); fprintf (dump_file, "\n"); } @@ -2564,7 +2564,7 @@ try_create_reduction_list (loop_p loop, print_generic_expr (dump_file, val, 0); fprintf (dump_file, " used outside loop\n"); fprintf (dump_file, - " checking if it a part of reduction pattern: \n"); + " checking if it is part of reduction pattern:\n"); } if (reduction_list->elements () == 0) {
GCC 5.3 Status Report (2015-11-20)
Status == We plan to do a GCC 5.3 release candidate at the end of next week followed by the actual release a week after that. So now is the time to look at your regression bugs in bugzilla and do some backporting for things already fixed on trunk. Quality Data Priority # Change from last report --- --- P10 P2 121+ 30 P3 20- 8 P4 87+ 2 P5 32+ 2 --- --- Total P1-P3 141+ 22 Total 260+ 24 Previous Report === https://gcc.gnu.org/ml/gcc/2015-07/msg00197.html
Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage
On 20/11/2015 11:00, "Richard Biener" wrote: >On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward >wrote: >> When vectorising a integer induction condition reduction, >> is_nonwrapping_integer_induction ends up with different values for base >> during the analysis and build phases. In the first it is an INTEGER_CST, >> in the second the loop has been vectorised out and the base is now a >> variable. >> >> This results in the analysis and build stage detecting the >> STMT_VINFO_VEC_REDUCTION_TYPE as different types. >> >> The easiest way to fix this is to only check for integer induction >> conditions on the analysis stage. > >I don't like this. For the evolution part we have added >STMT_VINFO_LOOP_PHI_EVOLUTION_PART. If you now need >the original initial value as well then just save it. > >Or if you really want to go with the hack then please do not call >is_nonwrapping_integer_induction with vec_stmt != NULL but >initialize cond_expr_is_nonwrapping_integer_induction from >STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) > >The hack also lacks a comment. > Ok. I've gone for a combination of both: I now cache the base in STMT_VINFO_LOOP_PHI_EVOLUTION_BASE. I've removed the vec_stmt != NULL checks. I've moved the call to is_nonwrapping_integer_induction until after the vect_is_simple_reduction check. I never liked that I had is_nonwrapping_integer_induction early in the function, and think this looks better. Alan. analysisonlycondcheck2.patch Description: Binary data
Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models
On Thu, Nov 12, 2015 at 11:32:36AM -0600, Evandro Menezes wrote: > On 11/12/2015 09:39 AM, Evandro Menezes wrote: >2015-11-12 Evandro Menezes > >[AArch64] Add attribute for compatibility with ARM pipeline models > >gcc/ > >* config/aarch64/aarch64.md (predicated): Copy attribute from >"arm.md". >* config/arm/arm.md (predicated): Added description. > > Please, commit if it's alright. The AArch64 part of this is OK. > From 3fa6a2bca8f3d2992b4607cff0afcc2d9caa96f4 Mon Sep 17 00:00:00 2001 > From: Evandro Menezes > Date: Mon, 9 Nov 2015 17:11:16 -0600 > Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM > pipeline models > > gcc/ > * config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md". > * config/arm/arm.md (predicated): Added description. > --- > gcc/config/aarch64/aarch64.md | 4 > gcc/config/arm/arm.md | 3 +++ > 2 files changed, 7 insertions(+) > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index 1586256..d46f837 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -195,6 +195,10 @@ > ;; 1 :=: yes > (define_attr "far_branch" "" (const_int 0)) > > +;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 > has > +;; no predicated insns. > +(define_attr "predicated" "yes,no" (const_string "no")) > + > ;; --- > ;; Pipeline descriptions and scheduling > ;; --- > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md > index 73c3088..6bda491 100644 > --- a/gcc/config/arm/arm.md > +++ b/gcc/config/arm/arm.md > @@ -105,6 +105,9 @@ > (define_attr "fpu" "none,vfp" >(const (symbol_ref "arm_fpu_attr"))) > > +; Predicated means that the insn form is conditionally executed based on a > +; predicate. We default to 'no' because no Thumb patterns match this rule > +; and not all ARM insns do. s/is conditionally executed/can be conditionally executed/ in the first sentence. Otherwise, this looks OK to me but I can't approve the ARM part, so you'll need to wait for a review from someone who can. Thanks, James > (define_attr "predicated" "yes,no" (const_string "no")) > > ; LENGTH of an instruction (in bytes) > -- > 2.1.0.243.g30d45f7 >
Re: [RFC] [Patch] PR67326 - relax trap assumption by looking at similar DRS
On Fri, Nov 20, 2015 at 12:02:07PM +, Kumar, Venkataramanan wrote: > Also. > (1) I guard these checks for -ftree-loop-if-convert-stores and -fno-common. > Sometimes vectorization flags also triggers if conversion. > (2) Also hashing base DRs for writes only. Let me comment just on formatting, will leave actual review to Richard. > > gcc/ChangeLog > 2015-11-19 Venkataramanan Surname missing. > > PR tree-optimization/67326 > * tree-if-conv.c (offset_DR_map): Define. Extraneous space. > (struct ifc_dr): Add new tree base_predicate field. All ChangeLog lines should be indented by a single tab. > (hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash offsets, > DR pairs Too long line. > and hash base ref, DR pairs for write type DRs. Extraneous spaces. > (ifcvt_memrefs_wont_trap): Guard checks with > -ftree-loop-if-convert-stores flag. Too long line and extraneous spaces. >Check for similar DR that are accessed unconditionally. >(if_convertible_loop_p_1): Initialize and delete offset hash maps Extraneous space, missing full stop at the end. > > gcc/testsuite/ChangeLog > 2015-11-19 Venkataramanan > * gcc.dg/tree-ssa/ifc-pr67326.c: Add new. Extraneous space. > + if (DR_IS_WRITE (a)) > { > - IFC_DR (a)->predicate = ca; > - *base_master_dr = a; > + base_master_dr = &baseref_DR_map->get_or_insert (base_ref,&exist2); Missing space before &exist2); > + offset_master_dr = &offset_DR_map->get_or_insert (offset,&exist3); > + if (!exist3) > + *offset_master_dr = a; > + > + if (DR_RW_UNCONDITIONALLY (*offset_master_dr) != 1) > + DR_RW_UNCONDITIONALLY (*offset_master_dr) > + = DR_RW_UNCONDITIONALLY (*master_dr); Wrong indentation, the = should be below the first underscore in DR_RW_UNCONDITIONALLY. > - if (DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) == 1) > + if ((base_master_dr > +&& DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) == 1)) Extraneous () pair. > + else if (DR_OFFSET (a)) > +{ > + offset_dr = offset_DR_map->get (DR_OFFSET (a)); > + if ((DR_RW_UNCONDITIONALLY (*offset_dr) == 1) Likewise. Jakub
Re: [Committed] S/390: Add bswaphi2 pattern
On 11/20/2015 12:52 PM, Andreas Krebbel wrote: +(define_insn "bswaphi2" + [(set (match_operand:HI 0 "register_operand" "=d") + (bswap:HI (match_operand:HI 1 "memory_operand" "RT")))] + "TARGET_CPU_ZARCH" + "lrvh\t%0,%1" + [(set_attr "type" "load") + (set_attr "op_type" "RXY") + (set_attr "z10prop" "z10_super")]) Surely it's better to arrange so that you can use STRVH as well. And providing a fallback for the reg-reg case (e.g. LRVR+SRL). Although I suppose I don't see support for STRV in bswap32/64 either... r~
Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.
Hi Kirill, On 18/11/15 14:11, Kirill Yukhin wrote: Hello Andreas, Devid. On 18 Nov 10:45, Andreas Schwab wrote: Kirill Yukhin writes: diff --git a/gcc/testsuite/c-c++-common/attr-simd.c b/gcc/testsuite/c-c++-common/attr-simd.c new file mode 100644 index 000..b4eda34 --- /dev/null +++ b/gcc/testsuite/c-c++-common/attr-simd.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-fdump-tree-optimized" } */ + +__attribute__((__simd__)) +extern +int simd_attr (void) +{ + return 0; +} + +/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" } } */ On ia64: FAIL: c-c++-common/attr-simd.c -Wc++-compat scan-tree-dump optimized "simd_attr[ \\t]simdclone|vector" FAIL: c-c++-common/attr-simd.c -Wc++-compat scan-tree-dump optimized "simd_attr2[ \\t]simdclone|vector" $ grep simd_attr attr-simd.c.194t.optimized ;; Function simd_attr (simd_attr, funcdef_no=0, decl_uid=1389, cgraph_uid=0, symbol_order=0) simd_attr () ;; Function simd_attr2 (simd_attr2, funcdef_no=1, decl_uid=1392, cgraph_uid=1, symbol_order=1) simd_attr2 () As far as vABI is supported on x86_64/i?86 only, I am going to enable mentioned `scan-tree-dump' only for these targets. This should cure both IA64 and Power. Concerning attr-simd-3.c. It is known issue: PR68158. And I believe this test should work everywhere as far as PR is resolved. I'll put xfail into the test. Which will lead to (in g++.log): gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes]^M output is: gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes]^M XFAIL: c-c++-common/attr-simd-3.c -std=gnu++98 PR68158 (test for errors, line 5) FAIL: c-c++-common/attr-simd-3.c -std=gnu++98 (test for excess errors) Excess errors: gcc/testsuite/c-c++-common/attr-simd-3.c:5:48: warning: '__simd__' attribute does not apply to types\ [-Wattributes] Patch in the bottom. gcc/tessuite/ * c-c++-common/attr-simd-3.c: Put xfail (PR68158) on dg-error. This test fails on bare-metal targets that don't support -fcilkplus or -pthread. Would you consider moving them to the cilkplus testing directory or adding an appropriate effective target check? Thanks, Kyrill * c-c++-common/attr-simd.c: Limit scan of dump to x86_64/i?86. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different." diff --git a/gcc/testsuite/c-c++-common/attr-simd-3.c b/gcc/testsuite/c-c++-common/attr-simd-3.c index 2bbdf04..35dd4c0 100644 --- a/gcc/testsuite/c-c++-common/attr-simd-3.c +++ b/gcc/testsuite/c-c++-common/attr-simd-3.c @@ -2,4 +2,4 @@ /* { dg-options "-fcilkplus" } */ /* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not declared in this scope" } */ -void f () __attribute__((__simd__, __vector__)); /* { dg-error "in the same function marked as a Cilk Plus" } */ +void f () __attribute__((__simd__, __vector__)); /* { dg-error "in the same function marked as a Cilk Plus" "PR68158" { xfail c++ } } */ diff --git a/gcc/testsuite/c-c++-common/attr-simd.c b/gcc/testsuite/c-c++-common/attr-simd.c index 61974e3..7674588 100644 --- a/gcc/testsuite/c-c++-common/attr-simd.c +++ b/gcc/testsuite/c-c++-common/attr-simd.c @@ -11,7 +11,7 @@ int simd_attr (void) return 0; } -/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" } } */ +/* { dg-final { scan-tree-dump "simd_attr\[ \\t\]simdclone|vector" "optimized" { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVbN4_simd_attr:" 1 { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVbM4_simd_attr:" 1 { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVcN4_simd_attr:" 1 { target { i?86-*-* x86_64-*-* } } } } */ @@ -29,7 +29,7 @@ int simd_attr2 (void) return 0; } -/* { dg-final { scan-tree-dump "simd_attr2\[ \\t\]simdclone|vector" "optimized" } } */ +/* { dg-final { scan-tree-dump "simd_attr2\[ \\t\]simdclone|vector" "optimized" { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVbN4_simd_attr2:" 1 { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVbM4_simd_attr2:" 1 { target { i?86-*-* x86_64-*-* } } } } */ /* { dg-final { scan-assembler-times "_ZGVcN4_simd_attr2:" 1 { target { i?86-*-* x86_64-*-* } } } } */
[RFC] [Patch] PR67326 - relax trap assumption by looking at similar DRS
Hi Richard, As per Jakub suggestion in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67326, the below patch fixes the regression in tree if conversion. Basically allowing if conversion to happen for a candidate DR, if we find similar DR with same dimensions and that DR will not trap. To find similar DRs using hash table to hashing the offset and DR pairs. Also reusing read/written information that was stored for reference tree. Also. (1) I guard these checks for -ftree-loop-if-convert-stores and -fno-common. Sometimes vectorization flags also triggers if conversion. (2) Also hashing base DRs for writes only. gcc/ChangeLog 2015-11-19 Venkataramanan PR tree-optimization/67326 * tree-if-conv.c (offset_DR_map): Define. (struct ifc_dr): Add new tree base_predicate field. (hash_memrefs_baserefs_and_store_DRs_read_written_info): Hash offsets, DR pairs and hash base ref, DR pairs for write type DRs. (ifcvt_memrefs_wont_trap): Guard checks with -ftree-loop-if-convert-stores flag. Check for similar DR that are accessed unconditionally. (if_convertible_loop_p_1): Initialize and delete offset hash maps gcc/testsuite/ChangeLog 2015-11-19 Venkataramanan * gcc.dg/tree-ssa/ifc-pr67326.c: Add new. Regstrapped on x86_64, Ok for trunk? Regards, Venkat. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr67326.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr67326.c new file mode 100644 index 000..5ca11cd --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-pr67326.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -fdump-tree-ifcvt-details -fno-common -ftree-loop-if-convert-stores" } */ + +float v0[1024]; +float v1[1024]; +float v2[1024]; +float v3[1024]; + +void condAssign1 () { + for (int i=0; i<1024; ++i) +v0[i] = (v2[i]>v1[i]) ? v2[i]*v3[i] : v0[i]; +} + +void condAssign2 () { + for (int i=0; i<1024; ++i) +v0[i] = (v2[i]>v1[i]) ? v2[i]*v1[i] : v0[i]; +} + +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 2 "ifcvt" } } */ diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 01065cb..1b4d220 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -124,6 +124,9 @@ static hash_map *ref_DR_map; /* Hash table to store base reference, DR pairs. */ static hash_map *baseref_DR_map; +/* Hash table to store DR offset, DR pairs. */ +static hash_map *offset_DR_map; + /* Structure used to predicate basic blocks. This is attached to the ->aux field of the BBs in the loop to be if-converted. */ struct bb_predicate { @@ -589,6 +592,8 @@ struct ifc_dr { int rw_unconditionally; tree predicate; + + tree base_predicate; }; #define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux) @@ -609,11 +614,12 @@ static void hash_memrefs_baserefs_and_store_DRs_read_written_info (data_reference_p a) { - data_reference_p *master_dr, *base_master_dr; + data_reference_p *master_dr, *base_master_dr, *offset_master_dr; tree ref = DR_REF (a); tree base_ref = DR_BASE_OBJECT (a); + tree offset = DR_OFFSET (a); tree ca = bb_predicate (gimple_bb (DR_STMT (a))); - bool exist1, exist2; + bool exist1, exist2, exist3; while (TREE_CODE (ref) == COMPONENT_REF || TREE_CODE (ref) == IMAGPART_EXPR @@ -636,22 +642,34 @@ hash_memrefs_baserefs_and_store_DRs_read_written_info (data_reference_p a) if (is_true_predicate (IFC_DR (*master_dr)->predicate)) DR_RW_UNCONDITIONALLY (*master_dr) = 1; - base_master_dr = &baseref_DR_map->get_or_insert (base_ref,&exist2); - - if (!exist2) + if (DR_IS_WRITE (a)) { - IFC_DR (a)->predicate = ca; - *base_master_dr = a; + base_master_dr = &baseref_DR_map->get_or_insert (base_ref,&exist2); + if (!exist2) + { + IFC_DR (a)->base_predicate = ca; + *base_master_dr = a; + } + else + IFC_DR (*base_master_dr)->base_predicate + = fold_or_predicates + (EXPR_LOCATION (IFC_DR (*base_master_dr)->base_predicate), +ca, IFC_DR (*base_master_dr)->base_predicate); + + if (is_true_predicate (IFC_DR (*base_master_dr)->base_predicate)) + DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) = 1; } - else -IFC_DR (*base_master_dr)->predicate - = fold_or_predicates - (EXPR_LOCATION (IFC_DR (*base_master_dr)->predicate), -ca, IFC_DR (*base_master_dr)->predicate); - if (DR_IS_WRITE (a) - && (is_true_predicate (IFC_DR (*base_master_dr)->predicate))) -DR_WRITTEN_AT_LEAST_ONCE (*base_master_dr) = 1; + if (offset) +{ + offset_master_dr = &offset_DR_map->get_or_insert (offset,&exist3); + if (!exist3) + *offset_master_dr = a; + + if (DR_RW_UNCONDITIONALLY (*offset_master_dr) != 1) + DR_RW_UNCONDITIONALLY (*offset_master_dr) + = DR_RW_UNCONDITIONALLY (*master_dr); +} } /* Return true when the memory references of STMT won't trap in the @@ -682,7
Re: [PATCH 2/4][AArch64] Increase the loop peeling limit
On Thu, Nov 19, 2015 at 04:04:41PM -0600, Evandro Menezes wrote: > On 11/05/2015 02:51 PM, Evandro Menezes wrote: > >2015-11-05 Evandro Menezes > > > > gcc/ > > > > * config/aarch64/aarch64.c (aarch64_override_options_internal): > > Increase loop peeling limit. > > > >This patch increases the limit for the number of peeled insns. > >With this change, I noticed no major regression in either > >Geekbench v3 or SPEC CPU2000 while some benchmarks, typically FP > >ones, improved significantly. > > > >I tested this tuning on Exynos M1 and on A57. ThunderX seems to > >benefit from this tuning too. However, I'd appreciate comments > >from other stakeholders. > > Ping. I'd like to leave this for a call from the port maintainers. I can see why this leads to more opportunities for vectorization, but I'm concerned about the wider impact on code size. Certainly I wouldn't expect this to be our default at -O2 and below. My gut feeling is that this doesn't really belong in the back-end (there are presumably good reasons why the default for this parameter across GCC has fluctuated from 400 to 100 to 200 over recent years), but as I say, I'd like Marcus or Richard to make the call as to whether or not we take this patch. For now, I'd drop it from the series (it stands alone anyway). Thanks, James
[Committed] S/390: Add bswaphi2 pattern
Hi, the 16 bit bswap instruction support was missing in the back-end so far. Added with the attached patch. Bootstrapped on s390 and s390x. No regressions. Bye, -Andreas- gcc/testsuite/ChangeLog: 2015-11-20 Andreas Krebbel * gcc.target/s390/bswap-1.c: New test. gcc/ChangeLog: 2015-11-20 Andreas Krebbel * config/s390/s390.md ("bswaphi2"): New pattern. diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index ea65c74..280a772 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -10439,6 +10439,8 @@ ; Byte swap instructions ; +; FIXME: There is also mvcin but we cannot use it since src and target +; may overlap. (define_insn "bswap2" [(set (match_operand:GPR 0"register_operand" "=d, d") (bswap:GPR (match_operand:GPR 1 "nonimmediate_operand" " d,RT")))] @@ -10450,6 +10452,14 @@ (set_attr "op_type" "RRE,RXY") (set_attr "z10prop" "z10_super")]) +(define_insn "bswaphi2" + [(set (match_operand:HI 0 "register_operand" "=d") + (bswap:HI (match_operand:HI 1 "memory_operand" "RT")))] + "TARGET_CPU_ZARCH" + "lrvh\t%0,%1" + [(set_attr "type" "load") + (set_attr "op_type" "RXY") + (set_attr "z10prop" "z10_super")]) ; ; Population count instruction diff --git a/gcc/testsuite/gcc.target/s390/bswap-1.c b/gcc/testsuite/gcc.target/s390/bswap-1.c new file mode 100644 index 000..e1f113a --- /dev/null +++ b/gcc/testsuite/gcc.target/s390/bswap-1.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=z900 -mzarch" } */ + +#include + +uint64_t u64; +uint32_t u32; +uint16_t u16; + +uint64_t +foo64a (uint64_t a) +{ + return __builtin_bswap64 (a); +} +/* { dg-final { scan-assembler-times "lrvgr\t%r2,%r2" 1 { target lp64 } } } */ + +uint64_t +foo64b () +{ + return __builtin_bswap64 (u64); +} +/* { dg-final { scan-assembler-times "lrvg\t%r2,0\\(%r\[0-9\]*\\)" 1 { target lp64 } } } */ + +uint32_t +foo32 () +{ + return __builtin_bswap32 (u32); +} +/* { dg-final { scan-assembler-times "lrv\t%r2,0\\(%r\[0-9\]*\\)" 1 } } */ + +uint16_t +foo16 () +{ + return __builtin_bswap16 (u16); +} +/* { dg-final { scan-assembler-times "lrvh\t%r2,0\\(%r\[0-9\]*\\)" 1 } } */
Re: [PATCH][GCC][ARM] Disable neon testing for armv7-m
Hi Andre, On 18/11/15 09:44, Andre Vieira wrote: On 17/11/15 10:10, James Greenhalgh wrote: On Mon, Nov 16, 2015 at 01:15:32PM +, Andre Vieira wrote: On 16/11/15 12:07, James Greenhalgh wrote: On Mon, Nov 16, 2015 at 10:49:11AM +, Andre Vieira wrote: Hi, This patch changes the target support mechanism to make it recognize any ARM 'M' profile as a non-neon supporting target. The current check only tests for armv6 architectures and earlier, and does not account for armv7-m. This is correct because there is no 'M' profile that supports neon and the current test is not sufficient to exclude armv7-m. Tested by running regressions for this testcase for various ARM targets. Is this OK to commit? Thanks, Andre Vieira gcc/testsuite/ChangeLog: 2015-11-06 Andre Vieira * gcc/testsuite/lib/target-supports.exp (check_effective_target_arm_neon_ok_nocache): Added check for M profile. From 2c53bb9ba3236919ecf137a4887abf26d4f7fda2 Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Fri, 13 Nov 2015 11:16:34 + Subject: [PATCH] Disable neon testing for armv7-m --- gcc/testsuite/lib/target-supports.exp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 75d506829221e3d02d454631c4bd2acd1a8cedf2..8097a4621b088a93d58d09571cf7aa27b8d5fba6 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2854,7 +2854,7 @@ proc check_effective_target_arm_neon_ok_nocache { } { int dummy; /* Avoid the case where a test adds -mfpu=neon, but the toolchain is configured for -mcpu=arm926ej-s, for example. */ -#if __ARM_ARCH < 7 +#if __ARM_ARCH < 7 || __ARM_ARCH_PROFILE == 'M' #error Architecture too old for NEON. Could you fix this #error message while you're here? Why we can't change this test to look for the __ARM_NEON macro from ACLE: #if __ARM_NEON < 1 #error NEON is not enabled #endif Thanks, James There is a check for this already: 'check_effective_target_arm_neon'. I think the idea behind arm_neon_ok is to check whether the hardware would support neon, whereas arm_neon is to check whether neon was enabled, i.e. -mfpu=neon was used or a mcpu was passed that has neon enabled by default. The comments for 'check_effective_target_arm_neon_ok_nocache' highlight this, though maybe the comments for check_effective_target_arm_neon could be better. # Return 1 if this is an ARM target supporting -mfpu=neon # -mfloat-abi=softfp or equivalent options. Some multilibs may be # incompatible with these options. Also set et_arm_neon_flags to the # best options to add. proc check_effective_target_arm_neon_ok_nocache ... /* Avoid the case where a test adds -mfpu=neon, but the toolchain is configured for -mcpu=arm926ej-s, for example. */ ... and # Return 1 if this is a ARM target with NEON enabled. proc check_effective_target_arm_neon OK, got it - sorry for my mistake, I had the two procs confused. I'd still like to see the error message fixed "Architecture too old for NEON." is not an accurate description of the problem. Thanks, James This OK? This is ok, I've committed for you with the slightly tweaked ChangeLog entry: 2015-11-20 Andre Vieira * lib/target-supports.exp (check_effective_target_arm_neon_ok_nocache): Add check for M profile. as r230653. Thanks, Kyrill Cheers, Andre
Re: [PATCH] Add clang-format config to contrib folder
On 11/20/2015 11:33 AM, Martin Liška wrote: > Hi Pedro. Hi Martin. > Fully agree with you, there's suggested patch. > Hope I can install the patch for trunk? I'd call it obvious. :-) Thanks, Pedro Alves
Re: [PATCH] Add clang-format config to contrib folder
On 11/19/2015 06:43 PM, Pedro Alves wrote: > On 11/19/2015 12:54 PM, Martin Liška wrote: >> ContinuationIndentWidth: 2 >> -ForEachMacros: >> ['_FOR_EACH','_FOR_EACH_1','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','FOR_EACH_IMM_USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','FOR! > _EACH_OBJE > CT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','_FOR_EACH_X','_FOR_EACH_X_1','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR'] >> +ForEachMacros: >> ['FOR_ALL_BB_FN','FOR_ALL_EH_REGION','FOR_ALL_EH_REGION_AT','FOR_ALL_EH_REGION_FN','FOR_ALL_INHERITED_FIELDS','FOR_ALL_PREDICATES','FOR_BB_BETWEEN','FOR_BB_INSNS','FOR_BB_INSNS_REVERSE','FOR_BB_INSNS_REVERSE_SAFE','FOR_BB_INSNS_SAFE','FOR_BODY','FOR_COND','FOR_EACH_AGGR_INIT_EXPR_ARG','FOR_EACH_ALIAS','FOR_EACH_ALLOCNO','FOR_EACH_ALLOCNO_OBJECT','FOR_EACH_ARTIFICIAL_DEF','FOR_EACH_ARTIFICIAL_USE','FOR_EACH_BB_FN','FOR_EACH_BB_REVERSE_FN','FOR_EACH_BIT_IN_MINMAX_SET','FOR_EACH_CALL_EXPR_ARG','FOR_EACH_CLONE','FOR_EACH_CONST_CALL_EXPR_ARG','FOR_EACH_CONSTRUCTOR_ELT','FOR_EACH_CONSTRUCTOR_VALUE','FOR_EACH_COPY','FOR_EACH_DEF','FOR_EACH_DEFINED_FUNCTION','FOR_EACH_DEFINED_SYMBOL','FOR_EACH_DEFINED_VARIABLE','FOR_EACH_DEP','FOR_EACH_EDGE','FOR_EACH_EXPR','FOR_EACH_EXPR_1','FOR_EACH_FUNCTION','FOREACH_FUNCTION_ARGS','FOREACH_FUNCTION_ARGS_PTR','FOR_EACH_FUNCTION_WITH_GIMPLE_BODY','FOR_EACH_HASH_TABLE_ELEMENT','FOR_EACH_IMM_USE_FAST','FOR_EACH_IMM_USE_ON_STMT','FO! > R_EACH_IMM > _USE_STMT','FOR_EACH_INSN','FOR_EACH_INSN_1','FOR_EACH_INSN_DEF','FOR_EACH_INSN_EQ_USE','FOR_EACH_INSN_INFO_DEF','FOR_EACH_INSN_INFO_EQ_USE','FOR_EACH_INSN_INFO_MW','FOR_EACH_INSN_INFO_USE','FOR_EACH_INSN_USE','FOR_EACH_LOCAL_DECL','FOR_EACH_LOOP','FOR_EACH_LOOP_FN','FOR_EACH_OBJECT','FOR_EACH_OBJECT_CONFLICT','FOR_EACH_PHI_ARG','FOR_EACH_PHI_OR_STMT_DEF','FOR_EACH_PHI_OR_STMT_USE','FOR_EACH_PREF','FOR_EACH_SCALAR','FOR_EACH_SSA_DEF_OPERAND','FOR_EACH_SSA_TREE_OPERAND','FOR_EACH_SSA_USE_OPERAND','FOR_EACH_STATIC_INITIALIZER','FOR_EACH_SUBRTX','FOR_EACH_SUBRTX_PTR','FOR_EACH_SUBRTX_VAR','FOR_EACH_SUCC','FOR_EACH_SUCC_1','FOR_EACH_SYMBOL','FOR_EACH_VARIABLE','FOR_EACH_VEC_ELT','FOR_EACH_VEC_ELT_FROM','FOR_EACH_VEC_ELT_REVERSE','FOR_EACH_VEC_SAFE_ELT','FOR_EACH_VEC_SAFE_ELT_REVERSE','FOR_EXPR','FOR_INIT_STMT','FOR_SCOPE'] >> IndentCaseLabels: false > > I don't know the tool's syntax here, but it's usually much better to > keep entries like these one per line (or at least a few only). Then > changes to the list are _much_ easier to review and maintain, like: > > ... > ForEachMacros: [ > 'FOR_ALL_BB_FN', > 'FOR_ALL_EH_REGION', > -'FOR_ALL_EH_REGION_AT', > -'FOR_ALL_EH_REGION', > +'FOR_ALL_EH_WHATNOT, > +'FOR_ALL_EH_WHATNOT_AT, > 'FOR_ALL_INHERITED_FIELDS', > 'FOR_ALL_PREDICATES', > 'FOR_BB_BETWEEN', > 'FOR_BB_INSNS', > ... > > vs a single-line hunk that's basically unintelligible. > > Thanks, > Pedro Alves > Hi Pedro. Fully agree with you, there's suggested patch. Hope I can install the patch for trunk? Thanks, Martin From eac730138b51db897f579a961aabaee805338bdd Mon Sep 17 00:00:00 2001 From: marxin Date: Fri, 20 Nov 2015 12:29:49 +0100 Subject: [PATCH] clang-format: split content of a list to multiple lines contrib/ChangeLog: 2015-11-20 Martin Liska * clang-format: Split content of a list to multiple lines. --- contrib/clang-format | 85 +++- 1 file changed, 84 insertions(+), 1 deletion(-) diff --git a/contrib/clang-format b/contrib/clang-format index 76a9d87..d734001 100644 --- a/contrib/clang-format +++ b/contrib/clang-format @@ -43,7 +43,90 @@ BreakBeforeTernaryOperators: true ColumnLimit: 80 ConstructorInitializerIndentWidth: 2 ContinuationInde
Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF
2015-11-20 14:28 GMT+03:00 Richard Biener : > On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich wrote: >> 2015-11-18 16:44 GMT+03:00 Richard Biener : >>> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich >>> wrote: Hi, When we compute vectypes we skip non-relevant phi nodes. But we process non-relevant alive statements and thus may need vectype of non-relevant live phi node to compute mask vectype. This patch enables vectype computation for live phi nodes. Botostrapped and regtested on x86_64-unknown-linux-gnu. OK for trunk? >>> >>> Hmm. What breaks if you instead skip all !relevant stmts and not >>> compute vectype for life but not relevant ones? We won't ever >>> "vectorize" !relevant ones, that is, we don't need their vector type. >> >> I tried it and got regression in SLP. It expected non-null vectype >> for non-releveant but live statement. Regression was in >> gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90 > > Because somebody put a vector type check before > > if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > return false; > > @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g >tree mask_type; >tree mask; > > + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > +return false; > + >if (!VECTOR_BOOLEAN_TYPE_P (vectype)) > return false; > > @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g > ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; > >gcc_assert (ncopies >= 1); > - if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) > -return false; > >if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def >&& !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle > > fixes this particular fallout for me. I'll try it. Thanks, Ilya > > Richard. > >> Ilya >> >>> >>> Richard. >>> Thanks, Ilya
Re: [PATCH, PR tree-optimization/68327] Compute vectype for live phi nodes when copmputing VF
On Wed, Nov 18, 2015 at 2:53 PM, Ilya Enkovich wrote: > 2015-11-18 16:44 GMT+03:00 Richard Biener : >> On Wed, Nov 18, 2015 at 12:34 PM, Ilya Enkovich >> wrote: >>> Hi, >>> >>> When we compute vectypes we skip non-relevant phi nodes. But we process >>> non-relevant alive statements and thus may need vectype of non-relevant >>> live phi node to compute mask vectype. This patch enables vectype >>> computation for live phi nodes. Botostrapped and regtested on >>> x86_64-unknown-linux-gnu. OK for trunk? >> >> Hmm. What breaks if you instead skip all !relevant stmts and not >> compute vectype for life but not relevant ones? We won't ever >> "vectorize" !relevant ones, that is, we don't need their vector type. > > I tried it and got regression in SLP. It expected non-null vectype > for non-releveant but live statement. Regression was in > gcc/gcc/testsuite/gfortran.fortran-torture/execute/pr43390.f90 Because somebody put a vector type check before if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) return false; @@ -7590,6 +7651,9 @@ vectorizable_comparison (gimple *stmt, g tree mask_type; tree mask; + if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) +return false; + if (!VECTOR_BOOLEAN_TYPE_P (vectype)) return false; @@ -7602,8 +7666,6 @@ vectorizable_comparison (gimple *stmt, g ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits; gcc_assert (ncopies >= 1); - if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) -return false; if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def && !(STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle fixes this particular fallout for me. Richard. > Ilya > >> >> Richard. >> >>> Thanks, >>> Ilya
Re: [PATCH] Fix memory leaks in tree-ssa-uninit.c
On 11/20/2015 03:14 AM, Bernd Schmidt wrote: > BTW, I'm with whoever said absolutely no way to the idea of making automatic > changes like this as part of a commit hook. > > I think the whitespace change can go in if it hasn't already, but I think the > other one still has enough problems that I'll say - leave it for the next > stage 1. > >> @@ -178,8 +173,9 @@ warn_uninitialized_vars (bool >> warn_possibly_uninitialized) >> >>FOR_EACH_BB_FN (bb, cfun) >> { >> - bool always_executed = dominated_by_p (CDI_POST_DOMINATORS, >> - single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)), bb); >> + bool always_executed >> += dominated_by_p (CDI_POST_DOMINATORS, >> + single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)), bb); >>for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> { > > Better to pull the single_succ into its own variable perhaps? > >> @@ -1057,7 +1039,8 @@ prune_uninit_phi_opnds_in_unrealizable_paths (gphi >> *phi, >> *visited_flag_phis = BITMAP_ALLOC (NULL); >> >> if (bitmap_bit_p (*visited_flag_phis, >> -SSA_NAME_VERSION (gimple_phi_result (flag_arg_def >> +SSA_NAME_VERSION ( >> + gimple_phi_result (flag_arg_def >> return false; >> >> bitmap_set_bit (*visited_flag_phis, > > Pull the gimple_phi_result into a separate variable, or just leave it > unchanged for now, the modified version is too ugly to live. > >> bitmap_clear_bit (*visited_flag_phis, >> -SSA_NAME_VERSION (gimple_phi_result (flag_arg_def))); >> +SSA_NAME_VERSION ( >> + gimple_phi_result (flag_arg_def))); >> continue; >> } > > Here too. > >> - all_pruned = prune_uninit_phi_opnds_in_unrealizable_paths (phi, >> - uninit_opnds, >> - as_a (flag_def), >> - boundary_cst, >> - cmp_code, >> - visited_phis, >> - &visited_flag_phis); >> + all_pruned = prune_uninit_phi_opnds_in_unrealizable_paths >> +(phi, uninit_opnds, as_a (flag_def), boundary_cst, cmp_code, >> + visited_phis, &visited_flag_phis); > > I'd rather shorten the name of the function, even if it goes against the > spirit of the novel writing month. > >> - if (gphi *use_phi = dyn_cast (use_stmt)) >> -use_bb = gimple_phi_arg_edge (use_phi, >> - PHI_ARG_INDEX_FROM_USE (use_p))->src; >> + if (gphi *use_phi = dyn_cast (use_stmt)) >> +use_bb >> + = gimple_phi_arg_edge (use_phi, PHI_ARG_INDEX_FROM_USE (use_p))->src; >> else >> use_bb = gimple_bb (use_stmt); > > There are some changes of this nature and I'm not sure it's an improvement. > Leave them out for now? > >> @@ -2345,8 +2309,8 @@ warn_uninitialized_phi (gphi *phi, vec >> *worklist, >> } >> >> /* Now check if we have any use of the value without proper guard. */ >> - uninit_use_stmt = find_uninit_use (phi, uninit_opnds, >> - worklist, added_to_worklist); >> + uninit_use_stmt >> += find_uninit_use (phi, uninit_opnds, worklist, added_to_worklist); >> >> /* All uses are properly guarded. */ >> if (!uninit_use_stmt) > > Here too. > >> @@ -2396,12 +2359,24 @@ public: >> {} >> >> /* opt_pass methods: */ >> - opt_pass * clone () { return new pass_late_warn_uninitialized (m_ctxt); } >> - virtual bool gate (function *) { return gate_warn_uninitialized (); } >> + opt_pass *clone (); >> + virtual bool gate (function *); > > This may technically violate our coding standards, but it's consistent with a > lot of similar cases. Since coding standards are about enforcing consistency, > I'd drop this change. > >> namespace { >> >> const pass_data pass_data_early_warn_uninitialized = >> { >> - GIMPLE_PASS, /* type */ >> + GIMPLE_PASS, /* type */ >> "*early_warn_uninitialized", /* name */ >> - OPTGROUP_NONE, /* optinfo_flags */ >> - TV_TREE_UNINIT, /* tv_id */ >> - PROP_ssa, /* properties_required */ >> - 0, /* properties_provided */ >> - 0, /* properties_destroyed */ >> - 0, /* todo_flags_start */ >> - 0, /* todo_flags_finish */ >> + OPTGROUP_NONE, /* optinfo_flags */ >> + TV_TREE_UNINIT, /* tv_id */ >> + PROP_ssa, /* properties_required */ >> + 0, /* properties_provided */ >> + 0, /* properties_destroyed */ >> + 0, /* todo_flags_start */ >> + 0, /* todo_flags_finish */ >> }; > > Likewise. Seems to be done practically nowhere. > >> class pass_early_warn_uninitialized : public gimple_opt_pass >> @@ -2519,14 +2491,23 @@ public: >> {} >> >> /* opt_pass methods: */ >> - virtual bool gate (function *) { return gate_warn_uninitializ
Re: Add uaddv4_optab, usubv4_optab
On 11/20/2015 11:56 AM, Eric Botcazou wrote: Eric has just submitted a documentation path that documented the {add,sub,mul,umul}v4 and negv3 patterns, so this should be applied on top of that. OK, I'm going to apply it, thanks. Thanks. Note that the comment at the beginning of expand_addsub_overflow describing the overall strategy ought to be adjusted if new patterns for the jump on carry are added. Ok, I'll do that. r~
Re: Add uaddv4_optab, usubv4_optab
On 11/20/2015 11:43 AM, Jakub Jelinek wrote: +(define_expand "uaddv4" + [(parallel [(set (reg:CCC FLAGS_REG) + (compare:CCC +(plus:SWI (match_dup 1) (match_dup 2)) +(match_dup 1))) + (set (match_dup 0) + (plus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (ne (reg:CCC FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] + "" +{ + ix86_fixup_binary_operands_no_copy (PLUS, mode, operands); +}) Do we need this one on i?86? I'm not against adding it to optabs, so that other targets have a way to improve that, but doesn't combine handle this case on i?86 already well? Perhaps combine can do the job, but in my option it's better to have the optab than not. Especially when it's so easy to add in this case. +(define_expand "usubv4" + [(parallel [(set (reg:CC FLAGS_REG) + (compare:CC (match_dup 1) (match_dup 2))) + (set (match_dup 0) + (minus:SWI (match_dup 1) (match_dup 2)))]) + (set (pc) (if_then_else + (ltu (reg:CC FLAGS_REG) (const_int 0)) + (label_ref (match_operand 3)) + (pc)))] If this works, it will be nice, I thought we'll need a new CC*mode. No, we just need to re-use the existing insn that performs the low half of a double-word subtraction operation. Eric has just submitted a documentation path that documented the {add,sub,mul,umul}v4 and negv3 patterns, so this should be applied on top of that. Ok, I'll look out for that. r~
Re: [PATCH 2/5] [AARCH64] Change IMP and PART over to integers from strings.
Hi Andrew, On 17/11/15 22:10, Andrew Pinski wrote: Because the imp and parts are really integer rather than strings, this patch moves the comparisons to be integer. Also allows saving around integers are easier than doing string comparisons. This allows for the next change. The way I store BIG.little is (big<<12)|little as each part num is only 12bits long. So it would be nice if someone could test -mpu=native on a big.little system to make sure it works still. OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. Thanks, Andrew Pinski * config/aarch64/aarch64-cores.def: Rewrite so IMP and PART are integer constants. * config/aarch64/driver-aarch64.c (struct aarch64_core_data): Change implementer_id to unsigned char. Change part_no to unsigned int. (AARCH64_BIG_LITTLE): New define. (INVALID_IMP): New define. (INVALID_CORE): New define. (cpu_data): Change the last element's implementer_id and part_no to integers. (valid_bL_string_p): Rewrite to .. (valid_bL_core_p): this for integers instead of strings. (parse_field): New function. (contains_string_p): Rewrite to ... (contains_core_p): this for integers and only for the part_no. (host_detect_local_cpu): Rewrite handling of implementation and part num to be integers; simplifying the code. --- gcc/config/aarch64/aarch64-cores.def | 25 +- gcc/config/aarch64/driver-aarch64.c | 90 2 files changed, 62 insertions(+), 53 deletions(-) diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 0b456f7..798f3e3 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -33,25 +33,26 @@ This need not include flags implied by the architecture. COSTS is the name of the rtx_costs routine to use. IMP is the implementer ID of the CPU vendor. On a GNU/Linux system it can - be found in /proc/cpuinfo. + be found in /proc/cpuinfo. There is a list in the ARM ARM. PART is the part number of the CPU. On a GNU/Linux system it can be found - in /proc/cpuinfo. For big.LITTLE systems this should have the form at of - ".". */ + in /proc/cpuinfo. For big.LITTLE systems this should use the macro AARCH64_BIG_LITTLE + where the big part number comes as the first arugment to the macro and little is the + second. */ /* V8 Architecture Processors. */ -AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03") -AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07") -AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08") -AARCH64_CORE("exynos-m1", exynosm1, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, "0x53", "0x001") -AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800") -AARCH64_CORE("thunderx",thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, "0x43", "0x0a1") -AARCH64_CORE("xgene1", xgene1,xgene1,8A, AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000") +AARCH64_CORE("cortex-a53", cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, 0x41, 0xd03) +AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, 0xd07) +AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, 0xd08) +AARCH64_CORE("exynos-m1", exynosm1, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, 0x53, 0x001) +AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, 0x51, 0x800) +AARCH64_CORE("thunderx",thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, 0x43, 0x0a1) +AARCH64_CORE("xgene1", xgene1,xgene1,8A, AARCH64_FL_FOR_ARCH8, xgene1, 0x50, 0x000) /* V8 big.LITTLE implementations. */ -AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03") -AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08.0xd03") +AARCH64_CORE("cortex-a57.cortex-a53", cortexa57cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE(0xd07, 0xd03)) +AARCH64_CORE("cortex-a72.cortex-a53", cortexa72cortexa53, cortexa53, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, 0x41, AARCH64_BIG_LITTLE(0xd08, 0xd03)) #undef AARCH64_CORE diff --git a/gcc/config/aarch64/driver-aarch64.c b/gcc/config/aarch64/driver-aarch64.c
Re: Add uaddv4_optab, usubv4_optab
> Eric has just submitted a documentation path that documented the > {add,sub,mul,umul}v4 and negv3 patterns, so this should be > applied on top of that. OK, I'm going to apply it, thanks. Note that the comment at the beginning of expand_addsub_overflow describing the overall strategy ought to be adjusted if new patterns for the jump on carry are added. -- Eric Botcazou
Re: [PATCH] PR tree-optimization/68413 : Only check for integer cond reduction on analysis stage
On Fri, Nov 20, 2015 at 10:24 AM, Alan Hayward wrote: > When vectorising a integer induction condition reduction, > is_nonwrapping_integer_induction ends up with different values for base > during the analysis and build phases. In the first it is an INTEGER_CST, > in the second the loop has been vectorised out and the base is now a > variable. > > This results in the analysis and build stage detecting the > STMT_VINFO_VEC_REDUCTION_TYPE as different types. > > The easiest way to fix this is to only check for integer induction > conditions on the analysis stage. I don't like this. For the evolution part we have added STMT_VINFO_LOOP_PHI_EVOLUTION_PART. If you now need the original initial value as well then just save it. Or if you really want to go with the hack then please do not call is_nonwrapping_integer_induction with vec_stmt != NULL but initialize cond_expr_is_nonwrapping_integer_induction from STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) The hack also lacks a comment. Richard. > gcc/ > PR tree-optimization/68413 > * tree-vect-loop.c (vectorizable_reduction): Only check for > integer cond reduction on analysis stage. > > > > > Thanks, > Alan. >