Re: Turn DECL_SECTION_NAME into string
Hi!

On Thu, 12 Jun 2014 06:33:25 +0200, Jan Hubicka hubi...@ucw.cz wrote:
> this lengthy patch makes the legwork to put section names out of tree
> representation. Originally they were STRING_CST. I ended up implementing
> on-side reference counted string vocabulary that is done in bit baroque
> way to be GGC and PCH safe (uff).

As reported in https://gcc.gnu.org/PR61508, this causes a build failure with --enable-checking=fold:

/home/dimhen/src/gcc_current/gcc/fold-const.c: In function 'void fold_checksum_tree(const_tree, md5_ctx*, hash_table<pointer_hash<tree_node> >)':
/home/dimhen/src/gcc_current/gcc/fold-const.c:14863:55: error: cannot convert 'const char*' to 'const_tree {aka const tree_node*}' for argument '1' to 'void fold_checksum_tree(const_tree, md5_ctx*, hash_table<pointer_hash<tree_node> >)'
     fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);

From light testing the following seems to get around this -- is it the appropriate fix?

diff --git gcc/fold-const.c gcc/fold-const.c
index 24daaa3..978b854 100644
--- gcc/fold-const.c
+++ gcc/fold-const.c
@@ -14859,8 +14859,6 @@ fold_checksum_tree (const_tree expr, struct md5_ctx *ctx,
       fold_checksum_tree (DECL_ABSTRACT_ORIGIN (expr), ctx, ht);
       fold_checksum_tree (DECL_ATTRIBUTES (expr), ctx, ht);
     }
-  if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_WITH_VIS))
-    fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);
   if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_NON_COMMON))
     {

Grüße,
 Thomas
[DOC Patch] Attribute 'naked'
I don't have permissions to commit this patch, but I do have a release on file with the FSF.

Problem description:
The docs for the function attribute 'naked' are confusing and self-contradictory. Also, discussion on this thread https://gcc.gnu.org/ml/gcc/2014-05/msg00100.html has led to changing the text from the vague "avoid using" to the very clear "not supported" regarding the usage of Extended asm with 'naked.' Lastly, this attribute should be mentioned when describing the differences between Basic and Extended asm.

ChangeLog:
2014-06-17  David Wohlferd  d...@limegreensocks.com

    * doc/extend.texi (Function Attributes): Update 'naked' attribute doc.

dw

Index: extend.texi
===
--- extend.texi (revision 210624)
+++ extend.texi (working copy)
@@ -3332,16 +3332,15 @@
 
 @item naked
 @cindex function without a prologue/epilogue code
-Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX and SPU
-ports to indicate that the specified function does not need prologue/epilogue
-sequences generated by the compiler.
-It is up to the programmer to provide these sequences. The
-only statements that can be safely included in naked functions are
-@code{asm} statements that do not have operands. All other statements,
-including declarations of local variables, @code{if} statements, and so
-forth, should be avoided. Naked functions should be used to implement the
-body of an assembly function, while allowing the compiler to construct
-the requisite function declaration for the assembler.
+This attribute is available on the ARM, AVR, MCORE, MSP430, NDS32,
+RL78, RX and SPU ports. It allows the compiler to construct the
+requisite function declaration, while allowing the body of the
+function to be assembly code. The specified function will not have
+prologue/epilogue sequences generated by the compiler. Only Basic
+@code{asm} statements can safely be included in naked functions
+(@pxref{Basic Asm}). While using Extended @code{asm} or a mixture of
+Basic @code{asm} and ``C'' code may appear to work, they cannot be
+depended upon to work reliably and are not supported.
 
 @item near
 @cindex functions that do not handle memory bank switching on 68HC11/68HC12
@@ -6269,6 +6268,8 @@
 efficient code, and in most cases it is a better solution. When writing
 inline assembly language outside of C functions, however, you must use Basic
 @code{asm}. Extended @code{asm} statements have to be inside a C function.
+Functions declared with the @code{naked} attribute also require Basic
+@code{asm} (@pxref{Function Attributes}).
 
 Under certain circumstances, GCC may duplicate (or remove duplicates of) your
 assembly code when optimizing. This can lead to unexpected duplicate
@@ -6388,6 +6389,8 @@
 
 Note that Extended @code{asm} statements must be inside a function. Only
 Basic @code{asm} may be outside functions (@pxref{Basic Asm}).
+Functions declared with the @code{naked} attribute also require Basic
+@code{asm} (@pxref{Function Attributes}).
 
 While the uses of @code{asm} are many and varied, it may help to think of an
 @code{asm} statement as a series of low-level instructions that convert input
Re: [PATCH] Fix PR61335
On Fri, Jun 6, 2014 at 10:07 AM, Uros Bizjak ubiz...@gmail.com wrote:
> On Fri, Jun 6, 2014 at 9:47 AM, Uros Bizjak ubiz...@gmail.com wrote:
>
>>> 2014-05-28  Richard Biener  rguent...@suse.de
>>>
>>>     PR tree-optimization/61335
>>>     * tree-vrp.c (vrp_visit_phi_node): If the compare of old and
>>>     new range fails, drop to varying.
>>>
>>>     * gfortran.dg/pr61335.f90: New testcase.
>>
>> This testcase triggers SIGFPE on alpha due to the use of denormal
>> operand. Maybe uninitialized value is used in line 48?
>
> SIGFPE also triggers at the same place on x86_64 with unmasked FPE
> exceptions (compile with -O0).

Attached patch initializes problematic array to zero instead of uninitialized value.

2014-06-17  Uros Bizjak  ubiz...@gmail.com

    * gfortran.dg/pr61335.f90 (cp_unit_create): Initialize
    unit_id and kind_id to zero.

Tested on alphaev68-linux-gnu and x86_64-linux-gnu.

OK for mainline?

Uros.

Index: gfortran.dg/pr61335.f90
===
--- gfortran.dg/pr61335.f90 (revision 211723)
+++ gfortran.dg/pr61335.f90 (working copy)
@@ -45,8 +45,8 @@
     LOGICAL :: failure
 
     failure=.FALSE.
-    unit_id=cp_units_none
-    kind_id=cp_ukind_none
+    unit_id=0
+    kind_id=0
     power=0
     i_low=1
     i_high=1
[PATCH] PR54555: Use strict_low_part for loading a constant only if it is cheaper
Postreload may transform (set (REGX) (CONST_INT A)) ... (set (REGX) (CONST_INT B)) to (set (REGX) (CONST_INT A)) ... (set (STRICT_LOW_PART (REGX)) (CONST_INT B)), but it should do that only if the latter is cheaper. On m68k, a full word load of a small constant with moveq is cheaper than doing a byte load with move.b.

Tested on m68k-suse-linux and x86_64-suse-linux. In both cases the size of cc1* becomes smaller with this change.

Andreas.

    PR rtl-optimization/54555
    * postreload.c (move2add_use_add2_insn): Only substitute
    STRICT_LOW_PART if it is cheaper.
---
 gcc/postreload.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/postreload.c b/gcc/postreload.c
index 9d71649..89f0c84 100644
--- a/gcc/postreload.c
+++ b/gcc/postreload.c
@@ -1805,10 +1805,14 @@ move2add_use_add2_insn (rtx reg, rtx sym, rtx off, rtx insn)
                                    gen_rtx_STRICT_LOW_PART (VOIDmode,
                                                             narrow_reg),
                                    narrow_src);
-                  changed = validate_change (insn, &PATTERN (insn),
-                                             new_set, 0);
-                  if (changed)
-                    break;
+                  get_full_set_rtx_cost (new_set, &newcst);
+                  if (costs_lt_p (&newcst, &oldcst, speed))
+                    {
+                      changed = validate_change (insn, &PATTERN (insn),
+                                                 new_set, 0);
+                      if (changed)
+                        break;
+                    }
                 }
             }
         }
-- 
2.0.0

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
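The cost gate in the patch above can be pictured with a small standalone sketch. Everything here is an assumption made for illustration: `struct cost_pair`, `cheaper_p` and `should_use_narrow_store` are hypothetical stand-ins, not GCC's `full_rtx_cost` or `costs_lt_p`, and the numbers a caller passes in would come from the target's cost model.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a speed/size cost pair.  */
struct cost_pair
{
  int speed;
  int size;
};

/* Return true if A is strictly cheaper than B under the chosen metric,
   using the other metric only to break ties.  */
static bool
cheaper_p (const struct cost_pair *a, const struct cost_pair *b, bool speed)
{
  if (speed)
    return a->speed < b->speed
           || (a->speed == b->speed && a->size < b->size);
  return a->size < b->size
         || (a->size == b->size && a->speed < b->speed);
}

/* Decide whether the narrow STRICT_LOW_PART store (NEWCST) should
   replace the full-width constant load (OLDCST).  Equal cost keeps
   the original insn.  */
static bool
should_use_narrow_store (struct cost_pair oldcst, struct cost_pair newcst,
                         bool speed)
{
  return cheaper_p (&newcst, &oldcst, speed);
}
```

On a target like m68k, where the full-word moveq is the cheaper form, the new cost would not be lower than the old one, so a gate of this shape would leave the insn alone.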
Re: Regimplification enhancements 1/3
On Mon, Jun 16, 2014 at 11:52 PM, Mike Stump mikest...@comcast.net wrote:
> On Jun 16, 2014, at 10:49 AM, Bernd Schmidt ber...@codesourcery.com wrote:
>> There are two reasons why I can't do this in the frontends - one,
>> Joseph has already rejected a C frontend patch,
>
> I'd like to think there is an acceptable way to get the right memory
> space on things...
>
>> and two, this needs to work with OpenACC offloading - i.e. code is
>> initially compiled by an x86 host compiler, then a ptx lto1 reads it
>> in and needs to make it valid for that target.
>
> Ah yes, that would do it, thanks. I can see my port as an offload
> target... I'll have to keep an eye on OpenACC and gcc.

But then IMHO using the gimplifier to do this fixup is wrong. Please add those required ADDR_SPACE_CONVERT_EXPRs in your pass manually. After all you also have to adjust types of MEM_REFs and possibly types of pointer variables (and pointer sizes?).

Richard.
Re: fix math wrt volatile-bitfields vs C++ model
On Tue, Jun 17, 2014 at 4:08 AM, DJ Delorie d...@redhat.com wrote:
>> Looks ok to me, but can you add a testcase please?
>
> I have a testcase, but if -flto the testcase doesn't include *any*
> definition of the test function, just all the LTO data. Is this
> normal?

Without -ffat-lto-objects yes, this is normal. If you are trying to do a scan-assembler or so then this will be difficult with LTO. If LTO is not necessary to trigger the bug and you just want to use the torture I suggest to dg-skip-if -flto.

>> Also check if 4.9 is affected.
>
> It is... same fix works, though.

Thanks,
Richard.
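A testcase header along the lines Richard suggests might look like the sketch below. The `dg-skip-if` form (comment, target selector, options that trigger the skip, options that cancel it) is the one used throughout the GCC testsuite; the skip comment, the struct and the function are placeholders invented here and are not the testcase from this thread. A real test would also carry a `dg-final` scan-assembler directive whose pattern depends on the target.

```c
/* { dg-do compile } */
/* { dg-skip-if "needs real assembly, not LTO bytecode" { *-*-* } { "-flto" } { "" } } */

/* Placeholder body (an assumption, not the actual testcase).  */
struct flags
{
  volatile unsigned int ready : 1;
};

unsigned int
read_ready (struct flags *f)
{
  return f->ready;
}
```

With this header the torture variants that pass -flto are skipped, so the remaining runs always see a real function body in the assembly output.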
Re: [PATCH][genattrtab] Fix memory corruption, allocate enough memory for all bypassed reservations
On 16/06/14 17:39, Jeff Law wrote:
> On 06/16/14 04:12, Kyrill Tkachov wrote:
>> Doh, you're right. I did consider it but for some reason thought we
>> might want to iterate over all of the bypasses anyway. Breaking out
>> seems good. How about this?
>>
>> Tested on arm and aarch64 and confirmed with valgrind that no out of
>> bounds accesses occur. I kicked off an x86_64 bootstrap but don't
>> expect any problems.
>>
>> Thanks,
>> Kyrill
>>
>> genattrtab-bypasses.patch
>>
>> commit 676b85f7a7cc1446482334dcaad457ac328875a8
>> Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
>> Date:   Fri Jun 13 11:09:57 2014 +0100
>>
>>     [genattrtab] Fix memory corruption with bypasses
>
> I'm an idiot. n_bypassed is used to size the vector, so you do have
> to walk the entire list.

AFAICS in the loop in process_bypasses we want to count all the reservations which have a bypass matching them. Once a reservation is matched with a bypass it should be safe to break out of the inner loop (over the bypasses), even if two bypasses match a reservation we only want to count the reservation once. So I think the 2nd version of the patch is good.

Thanks,
Kyrill

> Jeff
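The counting rule Kyrill describes (walk every reservation, stop the inner loop at the first matching bypass so a reservation is counted once) can be sketched on its own. This is not the genattrtab code: `count_bypassed` and its plain string-equality match are assumptions made for illustration, and the real matching in process_bypasses may differ.

```c
#include <stddef.h>
#include <string.h>

/* Count how many reservation names are matched by at least one bypass.
   The outer loop always runs to the end, because the result is used to
   size an allocation; the inner loop stops at the first match so that
   a reservation matched by two bypasses is still counted once.  */
static size_t
count_bypassed (const char *const *reservs, size_t n_reservs,
                const char *const *bypasses, size_t n_bypasses)
{
  size_t n_bypassed = 0;

  for (size_t i = 0; i < n_reservs; i++)
    for (size_t j = 0; j < n_bypasses; j++)
      if (strcmp (reservs[i], bypasses[j]) == 0)
        {
          n_bypassed++;
          break;  /* Do not count this reservation twice.  */
        }

  return n_bypassed;
}
```

Undercounting here is what would lead to an undersized vector and out-of-bounds writes, which is why only the inner loop may exit early.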
Re: [PATCH, cprop] Check rtx_cost when propagating constant
On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote:
> Hi,
>
> For some large constant, ports like ARM, need one more instructions to
> operate it. e.g
>
> #define MASK 0xfe00ff
> void maskdata (int * data, int len)
> {
>   int i = len;
>   for (; i > 0; i -= 2)
>     {
>       data[i] &= MASK;
>       data[i + 1] &= MASK;
>     }
> }
>
> Need two instructions for each AND operation:
>
>     and    r3, r3, #16711935
>     bic    r3, r3, #65536
>
> If we keep the MASK in a register, loop2_invariant pass can hoist it
> out the loop. And it can be shared by different references.
>
> So the patch skips constant propagation if it makes INSN's cost higher.

So cprop undoes invariant motion's work here? Should we make sure we add a REG_EQUAL note when not propagating?

> Bootstrap and no make check regression on X86-64 and ARM Chrome book.
>
> OK for trunk?
>
> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2014-06-17  Zhenqiang Chen  zhenqiang.c...@linaro.org
>
>     * cprop.c (try_replace_reg): Check cost for constants.
>
> diff --git a/gcc/cprop.c b/gcc/cprop.c
> index aef3ee8..c9cf02a 100644
> --- a/gcc/cprop.c
> +++ b/gcc/cprop.c
> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>    rtx src = 0;
>    int success = 0;
>    rtx set = single_set (insn);
> +  int old_cost = 0;
> +  bool copy_p = false;
> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
> +
> +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
> +    copy_p = true;
> +  else
> +    old_cost = set_rtx_cost (set, speed);

Looks bogus for set == NULL?

Also what about register pressure?

I think this kind of change needs wider testing as RTX costs are usually not fully implemented and you introduce a new use kind (or is it already used elsewhere in this way to compute cost difference of a set with s/reg/const?).

What kind of performance difference do you see?

Thanks,
Richard.

>    /* Usually we substitute easy stuff, so we won't copy everything.
>       We however need to take care to not duplicate non-trivial CONST
> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>    to = copy_rtx (to);
>
>    validate_replace_src_group (from, to, insn);
> +
> +  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
> +     And it can be shared by different references. So skip propagation if
> +     it makes INSN's rtx cost higher.  */
> +  if (set && !copy_p && CONSTANT_P (to))
> +    {
> +      int new_cost = set_rtx_cost (set, speed);
> +      if (new_cost > old_cost)
> +        {
> +          cancel_changes (0);
> +          return false;
> +        }
> +    }
> +
>    if (num_changes_pending () && apply_change_group ())
>      success = 1;
Re: Turn DECL_SECTION_NAME into string
On Tue, Jun 17, 2014 at 8:40 AM, Thomas Schwinge tho...@codesourcery.com wrote:
> Hi!
>
> On Thu, 12 Jun 2014 06:33:25 +0200, Jan Hubicka hubi...@ucw.cz wrote:
>> this lengthy patch makes the legwork to put section names out of tree
>> representation. Originally they were STRING_CST. I ended up implementing
>> on-side reference counted string vocabulary that is done in bit baroque
>> way to be GGC and PCH safe (uff).
>
> As reported in https://gcc.gnu.org/PR61508, this causes a build failure
> with --enable-checking=fold:
>
> /home/dimhen/src/gcc_current/gcc/fold-const.c: In function 'void fold_checksum_tree(const_tree, md5_ctx*, hash_table<pointer_hash<tree_node> >)':
> /home/dimhen/src/gcc_current/gcc/fold-const.c:14863:55: error: cannot convert 'const char*' to 'const_tree {aka const tree_node*}' for argument '1' to 'void fold_checksum_tree(const_tree, md5_ctx*, hash_table<pointer_hash<tree_node> >)'
>      fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);
>
> From light testing the following seems to get around this -- is it the
> appropriate fix?

Yes. This is ok.

Thanks,
Richard.

> diff --git gcc/fold-const.c gcc/fold-const.c
> index 24daaa3..978b854 100644
> --- gcc/fold-const.c
> +++ gcc/fold-const.c
> @@ -14859,8 +14859,6 @@ fold_checksum_tree (const_tree expr, struct md5_ctx *ctx,
>        fold_checksum_tree (DECL_ABSTRACT_ORIGIN (expr), ctx, ht);
>        fold_checksum_tree (DECL_ATTRIBUTES (expr), ctx, ht);
>      }
> -  if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_WITH_VIS))
> -    fold_checksum_tree (DECL_SECTION_NAME (expr), ctx, ht);
>    if (CODE_CONTAINS_STRUCT (TREE_CODE (expr), TS_DECL_NON_COMMON))
>      {
>
> Grüße,
>  Thomas
[gomp4] Merge trunk r211693 (2014-06-16) into gomp-4_0-branch
Hi! In r211726, I have committed a merge from trunk r211693 (2014-06-16) into gomp-4_0-branch. The LTO regression that appeared with an earlier merge, http://news.gmane.org/find-root.php?message_id=%3C87wqf483pl.fsf%40schwinge.name%3E, remains to be resolved: PASS: gcc.dg/lto/save-temps c_lto_save-temps_0.o assemble, -O -flto -save-temps -PASS: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o link, -O -flto -save-temps +FAIL: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o link, -O -flto -save-temps +UNRESOLVED: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o execute -O -flto -save-temps Executing on host: [...]/build/gcc/xgcc -B[...]/build/gcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O -flto -save-temps -c -o c_lto_save-temps_0.o [...]/source/gcc/testsuite/gcc.dg/lto/save-temps_0.c(timeout = 300) spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O -flto -save-temps -c -o c_lto_save-temps_0.o [...]/source/gcc/testsuite/gcc.dg/lto/save-temps_0.c PASS: gcc.dg/lto/save-temps c_lto_save-temps_0.o assemble, -O -flto -save-temps Executing on host: [...]/build/gcc/xgcc -B[...]/build/gcc/ c_lto_save-temps_0.o -fno-diagnostics-show-caret -fdiagnostics-color=never -O -flto -save-temps -o gcc-dg-lto-save-temps-01.exe(timeout = 300) spawn [...]/build/gcc/xgcc -B[...]/build/gcc/ c_lto_save-temps_0.o -fno-diagnostics-show-caret -fdiagnostics-color=never -O -flto -save-temps -o gcc-dg-lto-save-temps-01.exe [...]/build/gcc/xgcc @/tmp/ccjomvFW [...]/build/gcc/xgcc @/tmp/ccAM0t6j output is: [...]/build/gcc/xgcc @/tmp/ccjomvFW [...]/build/gcc/xgcc @/tmp/ccAM0t6j FAIL: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o link, -O -flto -save-temps UNRESOLVED: gcc.dg/lto/save-temps c_lto_save-temps_0.o-c_lto_save-temps_0.o execute -O -flto -save-temps Owing to the Fortran front end changes for OpenMP 4 User-Defined Reductions, I have adapted the expected error 
messages for OpenACC as follows. While this is not critical, perhaps someone may want to improve this later on; so noting this here for later reference. --- gcc/testsuite/gfortran.dg/goacc/reduction.f95 +++ gcc/testsuite/gfortran.dg/goacc/reduction.f95 @@ -66,73 +66,73 @@ common /blk/ i1 !$acc end parallel !$acc parallel reduction (*:ia1) ! { dg-error Assumed size } !$acc end parallel -!$acc parallel reduction (+:l1)! { dg-error must be of numeric type, got LOGICAL } +!$acc parallel reduction (+:l1)! { dg-error OMP DECLARE REDUCTION \\+ not found for type LOGICAL } !$acc end parallel -!$acc parallel reduction (*:la1) ! { dg-error must be of numeric type, got LOGICAL } +!$acc parallel reduction (*:la1) ! { dg-error OMP DECLARE REDUCTION \\* not found for type LOGICAL } !$acc end parallel -!$acc parallel reduction (-:a1)! { dg-error must be of numeric type, got CHARACTER } +!$acc parallel reduction (-:a1)! { dg-error OMP DECLARE REDUCTION - not found for type CHARACTER } !$acc end parallel -!$acc parallel reduction (+:t1)! { dg-error must be of numeric type, got TYPE } +!$acc parallel reduction (+:t1)! { dg-error OMP DECLARE REDUCTION \\+ not found for type TYPE } !$acc end parallel -!$acc parallel reduction (*:ta1) ! { dg-error must be of numeric type, got TYPE } +!$acc parallel reduction (*:ta1) ! { dg-error OMP DECLARE REDUCTION \\* not found for type TYPE } !$acc end parallel -!$acc parallel reduction (.and.:i3)! { dg-error must be LOGICAL } +!$acc parallel reduction (.and.:i3)! { dg-error OMP DECLARE REDUCTION \\.and\\. not found for type INTEGER } !$acc end parallel -!$acc parallel reduction (.or.:ia2)! { dg-error must be LOGICAL } +!$acc parallel reduction (.or.:ia2)! { dg-error OMP DECLARE REDUCTION \\.or\\. not found for type INTEGER } !$acc end parallel -!$acc parallel reduction (.eqv.:r1)! { dg-error must be LOGICAL } +!$acc parallel reduction (.eqv.:r1)! { dg-error OMP DECLARE REDUCTION \\.eqv\\. 
not found for type REAL } !$acc end parallel -!$acc parallel reduction (.neqv.:ra1) ! { dg-error must be LOGICAL } +!$acc parallel reduction (.neqv.:ra1) ! { dg-error OMP DECLARE REDUCTION \\.neqv\\. not found for type REAL } !$acc end parallel -!$acc parallel reduction (.and.:d1)! { dg-error must be LOGICAL } +!$acc parallel reduction (.and.:d1)! { dg-error OMP DECLARE REDUCTION \\.and\\. not found for type REAL } !$acc end parallel -!$acc parallel reduction (.or.:da1)! { dg-error must be LOGICAL } +!$acc parallel reduction (.or.:da1)! { dg-error OMP DECLARE REDUCTION \\.or\\. not found for type REAL } !$acc end parallel -!$acc parallel reduction (.eqv.:c1)! { dg-error must be LOGICAL } +!$acc parallel reduction (.eqv.:c1)! {
[c++-concepts] Fix assertion failure with cp_maybe_constrained_type_specifier
cp_maybe_constrained_type_specifier asserted that the decl passed in would be an OVERLOAD. A clean build of the compiler was broken because the decl can also be a BASELINK. I'm not entirely sure when this is the case, except that it seems to happen with class member templates, as it also caused a test case in my next patch to fail.

The solution is to check for a BASELINK and extract the functions from it. The possibility of decl being a BASELINK is asserted near the call in cp_parser_template_id (cp_maybe_partial_concept_id just calls the function in question at this time).

2014-06-17  Braden Obrzut  ad...@maniacsvault.net

    * gcc/cp/parser.c (cp_maybe_constrained_type_specifier): Fix assertion
    failure if baselink was passed in as decl.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 1eaf863..40d1d63 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -15175,6 +15175,9 @@ cp_parser_allows_constrained_type_specifier (cp_parser *parser)
 static tree
 cp_maybe_constrained_type_specifier (cp_parser *parser, tree decl, tree args)
 {
+  if (BASELINK_P (decl))
+    decl = BASELINK_FUNCTIONS (decl);
+
   gcc_assert (TREE_CODE (decl) == OVERLOAD);
   gcc_assert (args ? TREE_CODE (args) == TREE_VEC : true);
[PATCH] Testcase for PR61012
From the new dup. Committed to trunk and branch.

Richard.

2014-06-17  Richard Biener  rguent...@suse.de

    PR lto/61012
    * gcc.dg/lto/pr61526_0.c: New testcase.
    * gcc.dg/lto/pr61526_1.c: Likewise.

Index: gcc/testsuite/gcc.dg/lto/pr61526_0.c
===
--- gcc/testsuite/gcc.dg/lto/pr61526_0.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/pr61526_0.c (working copy)
@@ -0,0 +1,6 @@
+/* { dg-lto-do link } */
+/* { dg-lto-options { { -fPIC -flto -flto-partition=1to1 } } } */
+/* { dg-extra-ld-options { -shared } } */
+
+static void *master;
+void *foo () { return master; }
Index: gcc/testsuite/gcc.dg/lto/pr61526_1.c
===
--- gcc/testsuite/gcc.dg/lto/pr61526_1.c (revision 0)
+++ gcc/testsuite/gcc.dg/lto/pr61526_1.c (working copy)
@@ -0,0 +1,2 @@
+extern void *master;
+void *bar () { return master; }
[c++-concepts] Allow function parameters to be referenced in trailing requires clauses
This patch allows function parameters to be referenced by trailing requires clauses. Typically this is used to refer to the type of an implicitly generated template. For example, the following should now be valid (where C is some previously defined concept): auto f1 (auto x) requires Cdecltype(x) (); Note that the test case trailing-requires-overload.C will fail to compile unless the previously submitted patch is applied first. 2014-06-17 Braden Obrzut ad...@maniacsvault.net * gcc/cp/parser.c (cp_parser_trailing_requirements): Handle requires keyword manually so that we can push function parameters back into scope. * gcc/cp/decl.c (push_function_parms): New. Recovers and reopens function parameter scope from declarator. * gcc/testsuite/g++.dg/concepts/trailing-requires.C: New tests. * gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C: New tests. diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 5d23bfa..aca3ce5 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5409,6 +5409,7 @@ extern bool defer_mark_used_calls; extern GTY(()) vectree, va_gc *deferred_mark_used_calls; extern tree finish_case_label (location_t, tree, tree); extern tree cxx_maybe_build_cleanup (tree, tsubst_flags_t); +extern void push_function_parms (cp_declarator *); /* in decl2.c */ extern bool check_java_method (tree); diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 9791dba..5daccf8 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -13791,6 +13791,22 @@ store_parm_decls (tree current_function_parms) current_eh_spec_block = begin_eh_spec_block (); } +/* Bring the parameters of a function declaration back into scope without + entering the function body. Declarator must be a function declarator. + Caller is responsible for calling finish_scope. 
*/ + +void +push_function_parms (cp_declarator *declarator) +{ + begin_scope (sk_function_parms, NULL_TREE); + + for (tree parms = declarator-u.function.parameters; parms != NULL_TREE +!VOID_TYPE_P (TREE_VALUE (parms)); parms = TREE_CHAIN (parms)) +{ + pushdecl (TREE_VALUE (parms)); +} +} + /* We have finished doing semantic analysis on DECL, but have not yet generated RTL for its body. Save away our current state, so that diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 1eaf863..2d5862f 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -16929,7 +16929,20 @@ cp_parser_trailing_requirements (cp_parser *parser, cp_declarator *decl) terse_reqs = get_shorthand_requirements (current_template_parms); // An optional requires clause can yield an additional constraint. - tree explicit_reqs = cp_parser_requires_clause_opt (parser); + tree explicit_reqs = NULL_TREE; + if (cp_lexer_next_token_is_keyword (parser-lexer, RID_REQUIRES)) +{ + cp_lexer_consume_token (parser-lexer); + + // Bring parms back into scope so requires clause can reference them. 
+ ++cp_unevaluated_operand; + push_function_parms (decl); + + explicit_reqs = cp_parser_requires_clause (parser); + + finish_scope(); + --cp_unevaluated_operand; +} // If requirements were specified in either the implicit // template parameter list or an explicit requires clause, diff --git a/gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C b/gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C new file mode 100644 index 000..2fc6cdb --- /dev/null +++ b/gcc/testsuite/g++.dg/concepts/trailing-requires-overload.C @@ -0,0 +1,115 @@ +// { dg-do run } +// { dg-options -std=c++1y } + +#include cassert + +templatetypename T + concept bool C () + { +return requires (T a, T b) { { a + b } - T }; + } + +templatetypename T + concept bool D () + { +return requires (T a, T b) { { a - b } - T }; + } + +templatetypename T + concept bool M () + { +return requires (T a, T b) { { a * b } - T }; + } + +templatetypename T + requires CT () + struct Adds + { +Adds(T a) { v = a; } +T v; + }; + +templatetypename T + AddsT operator+ (const AddsT a, const AddsT b) + { +return a.v + b.v; + } + +templatetypename T + requires DT () + struct Subs + { +Subs(T a) { v = a; } +T v; + }; + +templatetypename T + SubsT operator- (const SubsT a, const SubsT b) + { +return a.v - b.v; + } + +templatetypename T + requires MT () + struct Mults + { +Mults(T a) { v = a; } +T v; + }; + +templatetypename T + MultsT operator- (const MultsT a, const MultsT b) + { +return a.v * b.v; + } + +auto f1 (auto a, decltype(a) b) - decltype(a) requires Mdecltype(a) (); +auto f1 (auto a, decltype(a) b) - decltype(a) requires Ddecltype(a) (); +auto f1 (auto a, decltype(a) b) - decltype(a) requires Cdecltype(a) (); + +struct S1 +{ + auto f2 (auto a) - decltype(a) requires Cdecltype(a) (); + auto f2 (auto a) - decltype(a) requires Ddecltype(a) (); + auto f2 (auto a) -
Commit: MSP430: Add NOP after DINT in hardware multiply patterns
Hi Guys, I am checking in the patch below to update the hardware multiply patterns for the MSP430 so that there is a NOP instruction after disabling interrupts with the DINT instruction. Timing issues mean that it is possible for the instruction following the DINT to be interrupted, so it has to be a NOP. The change is going in to the mainline sources and the 4.9 branch. Cheers Nick gcc/ChangeLog 2014-06-17 Nick Clifton ni...@redhat.com * config/msp430/msp430.md (mulhisi3): Add a NOP after the DINT. (umulhi3, mulsidi3, umulsidi3): Likewise. Index: gcc/config/msp430/msp430.md === --- gcc/config/msp430/msp430.md (revision 211726) +++ gcc/config/msp430/msp430.md (working copy) @@ -1423,9 +1423,9 @@ optimize 2 msp430_hwmult_type != NONE * if (msp430_use_f5_series_hwmult ()) - return \PUSH.W sr { DINT { MOV.W %1, 0x04C2 { MOV.W %2, 0x04C8 { MOV.W 0x04CA, %L0 { MOV.W 0x04CC, %H0 { POP.W sr\; + return \PUSH.W sr { DINT { NOP { MOV.W %1, 0x04C2 { MOV.W %2, 0x04C8 { MOV.W 0x04CA, %L0 { MOV.W 0x04CC, %H0 { POP.W sr\; else - return \PUSH.W sr { DINT { MOV.W %1, 0x0132 { MOV.W %2, 0x0138 { MOV.W 0x013A, %L0 { MOV.W 0x013C, %H0 { POP.W sr\; + return \PUSH.W sr { DINT { NOP { MOV.W %1, 0x0132 { MOV.W %2, 0x0138 { MOV.W 0x013A, %L0 { MOV.W 0x013C, %H0 { POP.W sr\; ) @@ -1436,9 +1436,9 @@ optimize 2 msp430_hwmult_type != NONE * if (msp430_use_f5_series_hwmult ()) - return \PUSH.W sr { DINT { MOV.W %1, 0x04C0 { MOV.W %2, 0x04C8 { MOV.W 0x04CA, %L0 { MOV.W 0x04CC, %H0 { POP.W sr\; + return \PUSH.W sr { DINT { NOP { MOV.W %1, 0x04C0 { MOV.W %2, 0x04C8 { MOV.W 0x04CA, %L0 { MOV.W 0x04CC, %H0 { POP.W sr\; else - return \PUSH.W sr { DINT { MOV.W %1, 0x0130 { MOV.W %2, 0x0138 { MOV.W 0x013A, %L0 { MOV.W 0x013C, %H0 { POP.W sr\; + return \PUSH.W sr { DINT { NOP { MOV.W %1, 0x0130 { MOV.W %2, 0x0138 { MOV.W 0x013A, %L0 { MOV.W 0x013C, %H0 { POP.W sr\; ) @@ -1449,9 +1449,9 @@ optimize 2 msp430_hwmult_type != NONE * if (msp430_use_f5_series_hwmult ()) - return \PUSH.W sr { DINT { MOV.W %L1, 
0x04D4 { MOV.W %H1, 0x04D6 { MOV.W %L2, 0x04E0 { MOV.W %H2, 0x04E2 { MOV.W 0x04E4, %A0 { MOV.W 0x04E6, %B0 { MOV.W 0x04E8, %C0 { MOV.W 0x04EA, %D0 { POP.W sr\; + return \PUSH.W sr { DINT { NOP { MOV.W %L1, 0x04D4 { MOV.W %H1, 0x04D6 { MOV.W %L2, 0x04E0 { MOV.W %H2, 0x04E2 { MOV.W 0x04E4, %A0 { MOV.W 0x04E6, %B0 { MOV.W 0x04E8, %C0 { MOV.W 0x04EA, %D0 { POP.W sr\; else - return \PUSH.W sr { DINT { MOV.W %L1, 0x0144 { MOV.W %H1, 0x0146 { MOV.W %L2, 0x0150 { MOV.W %H2, 0x0152 { MOV.W 0x0154, %A0 { MOV.W 0x0156, %B0 { MOV.W 0x0158, %C0 { MOV.W 0x015A, %D0 { POP.W sr\; + return \PUSH.W sr { DINT { NOP { MOV.W %L1, 0x0144 { MOV.W %H1, 0x0146 { MOV.W %L2, 0x0150 { MOV.W %H2, 0x0152 { MOV.W 0x0154, %A0 { MOV.W 0x0156, %B0 { MOV.W 0x0158, %C0 { MOV.W 0x015A, %D0 { POP.W sr\; ) @@ -1462,8 +1462,8 @@ optimize 2 msp430_hwmult_type != NONE * if (msp430_use_f5_series_hwmult ()) - return \PUSH.W sr { DINT { MOV.W %L1, 0x04D0 { MOV.W %H1, 0x04D2 { MOV.W %L2, 0x04E0 { MOV.W %H2, 0x04E2 { MOV.W 0x04E4, %A0 { MOV.W 0x04E6, %B0 { MOV.W 0x04E8, %C0 { MOV.W 0x04EA, %D0 { POP.W sr\; + return \PUSH.W sr { DINT { NOP { MOV.W %L1, 0x04D0 { MOV.W %H1, 0x04D2 { MOV.W %L2, 0x04E0 { MOV.W %H2, 0x04E2 { MOV.W 0x04E4, %A0 { MOV.W 0x04E6, %B0 { MOV.W 0x04E8, %C0 { MOV.W 0x04EA, %D0 { POP.W sr\; else - return \PUSH.W sr { DINT { MOV.W %L1, 0x0140 { MOV.W %H1, 0x0142 { MOV.W %L2, 0x0150 { MOV.W %H2, 0x0152 { MOV.W 0x0154, %A0 { MOV.W 0x0156, %B0 { MOV.W 0x0158, %C0 { MOV.W 0x015A, %D0 { POP.W sr\; + return \PUSH.W sr { DINT { NOP { MOV.W %L1, 0x0140 { MOV.W %H1, 0x0142 { MOV.W %L2, 0x0150 { MOV.W %H2, 0x0152 { MOV.W 0x0154, %A0 { MOV.W 0x0156, %B0 { MOV.W 0x0158, %C0 { MOV.W 0x015A, %D0 { POP.W sr\; )
[PATCH][match-and-simplify] Make gimple_fold_stmt_to_constant_1 dumping more useful
Committed.

Richard.

2014-06-17  Richard Biener  rguent...@suse.de

    * gimple-fold.c (gimple_fold_stmt_to_constant_1): Dump simplified
    expression.

Index: gcc/gimple-fold.c
===
--- gcc/gimple-fold.c (revision 211452)
+++ gcc/gimple-fold.c (working copy)
@@ -2810,8 +2810,8 @@ gimple_fold_stmt_to_constant_1 (gimple s
             {
               if (dump_file && dump_flags & TDF_DETAILS)
                 {
-                  fprintf (dump_file, "Match-and-simplified definition of ");
-                  print_generic_expr (dump_file, lhs, 0);
+                  fprintf (dump_file, "Match-and-simplified ");
+                  print_gimple_expr (dump_file, stmt, 0, TDF_SLIM);
                   fprintf (dump_file, " to ");
                   print_generic_expr (dump_file, res, 0);
                   fprintf (dump_file, "\n");
Re: [PATCH, cprop] Check rtx_cost when propagating constant
On 17 June 2014 16:15, Richard Biener richard.guent...@gmail.com wrote:
> On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote:
>> Hi,
>>
>> For some large constants, ports like ARM need one more instruction to
>> operate on them, e.g.
>>
>> #define MASK 0xfe00ff
>> void maskdata (int * data, int len)
>> {
>>   int i = len;
>>   for (; i > 0; i -= 2)
>>     {
>>       data[i] &= MASK;
>>       data[i + 1] &= MASK;
>>     }
>> }
>>
>> Need two instructions for each AND operation:
>>
>>     and    r3, r3, #16711935
>>     bic    r3, r3, #65536
>>
>> If we keep the MASK in a register, loop2_invariant pass can hoist it
>> out of the loop. And it can be shared by different references.
>>
>> So the patch skips constant propagation if it makes INSN's cost higher.
>
> So cprop undoes invariant motion's work here?

Yes. GLOBAL CONST-PROP will undo invariant motion.

> Should we make sure we add a REG_EQUAL note when not propagating?

Logs show there is already a REG_EQUAL note.

>> Bootstrap and no make check regression on X86-64 and ARM Chrome book.
>>
>> OK for trunk?
>>
>> Thanks!
>> -Zhenqiang
>>
>> ChangeLog:
>> 2014-06-17  Zhenqiang Chen  zhenqiang.c...@linaro.org
>>
>>         * cprop.c (try_replace_reg): Check cost for constants.
>>
>> diff --git a/gcc/cprop.c b/gcc/cprop.c
>> index aef3ee8..c9cf02a 100644
>> --- a/gcc/cprop.c
>> +++ b/gcc/cprop.c
>> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>>    rtx src = 0;
>>    int success = 0;
>>    rtx set = single_set (insn);
>> +  int old_cost = 0;
>> +  bool copy_p = false;
>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
>> +
>> +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
>> +    copy_p = true;
>> +  else
>> +    old_cost = set_rtx_cost (set, speed);
>
> Looks bogus for set == NULL?

set_rtx_cost has checked it. If it is NULL, the function will return 0.

> Also what about register pressure?

Do you think it has a big register pressure impact? I think it does not
increase register pressure.

> I think this kind of change needs wider testing as RTX costs are
> usually not fully implemented and you introduce a new use kind
> (or is it already used elsewhere in this way to compute cost
> difference of a set with s/reg/const?).

Passes like fwprop, cse and auto_inc_dec use RTX costs to make the
decision. E.g. function attempt_change of auto-inc-dec.c has code
segments like:

  old_cost = (set_src_cost (mem, speed)
              + set_rtx_cost (PATTERN (inc_insn.insn), speed));
  new_cost = set_src_cost (mem_tmp, speed);
  ...
  if (old_cost < new_cost)
    {
      ...
      return false;
    }

The usage of RTX costs in this patch is similar.

I ran X86-64 bootstrap and regression tests with
--enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java

And ARM bootstrap and regression tests with
--enable-languages=c,c++,fortran,lto,objc,obj-c++

I will run tests on i686. What other tests do you think I have to run?

> What kind of performance difference do you see?

I ran coremark, dhrystone and eembc on ARM Cortex-M4 (with some arm
backend changes). Coremark with some options shows a 10% performance
improvement. dhrystone is a little better. Some wave in eembc, but the
overall result is better.

I will run spec2000 on X86-64 and ARM, and get back to you about the
performance changes.

Thanks!
-Zhenqiang

> Thanks,
> Richard.
>
>>
>>    /* Usually we substitute easy stuff, so we won't copy everything.
>>       We however need to take care to not duplicate non-trivial CONST
>> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>>      to = copy_rtx (to);
>>
>>    validate_replace_src_group (from, to, insn);
>> +
>> +  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
>> +     And it can be shared by different references.  So skip propagation if
>> +     it makes INSN's rtx cost higher.  */
>> +  if (set && !copy_p && CONSTANT_P (to))
>> +    {
>> +      int new_cost = set_rtx_cost (set, speed);
>> +      if (new_cost > old_cost)
>> +        {
>> +          cancel_changes (0);
>> +          return false;
>> +        }
>> +    }
>> +
>>    if (num_changes_pending () && apply_change_group ())
>>      success = 1;
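For readers who want to see why MASK 0xfe00ff costs two instructions in the example above, here is a minimal sketch. It assumes the Thumb-2 "modified immediate" encoding rules (the thread mentions Cortex-M4); the function names are hypothetical and this is not how GCC's ARM backend or set_rtx_cost computes costs, only an illustration of which constants fit in a single AND or BIC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a GCC function.  Returns true if V fits the
// Thumb-2 modified-immediate forms: a replicated byte pattern, or an
// 8-bit value shifted left without wrapping around.
static bool thumb2_modified_imm_p (uint32_t v)
{
  uint32_t b = v & 0xff;
  if (v == b)                                        // 0x000000XY
    return true;
  if (v == (b | (b << 16)))                          // 0x00XY00XY
    return true;
  if (v == (b | (b << 8) | (b << 16) | (b << 24)))   // 0xXYXYXYXY
    return true;
  uint32_t hb = (v >> 8) & 0xff;
  if (v == ((hb << 8) | (hb << 24)))                 // 0xXY00XY00
    return true;

  // Shifted 8-bit form.  V is nonzero here (zero matched the first
  // test), so stripping trailing zero bits terminates.
  uint32_t t = v;
  while ((t & 1) == 0)
    t >>= 1;
  return t <= 0xff;
}

// Hypothetical helper: an AND with MASK needs more than one instruction
// when neither MASK (for AND) nor ~MASK (for BIC) is encodable.
static bool and_needs_split_p (uint32_t mask)
{
  assert (mask != 0);  // a zero mask is not an interesting AND
  return !thumb2_modified_imm_p (mask) && !thumb2_modified_imm_p (~mask);
}
```

Under these assumptions 0xff00ff (16711935) and 0x10000 (65536) are each encodable, while 0xfe00ff is not, which matches the and/bic pair quoted in the message. That is the case where keeping the constant in a hoisted register can be cheaper than propagating it.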
Re: [PATCH] Fix PR61335
Uros Bizjak wrote:
> Attached patch initializes problematic array to zero instead of
> uninitialized value.
>
> 2014-06-17  Uros Bizjak  ubiz...@gmail.com
>
>         * gfortran.dg/pr61335.f90 (cp_unit_create): Initialize unit_id
>         and kind_id to zero.
>
> Tested on alphaev68-linux-gnu and x86_64-linux-gnu.
>
> OK for mainline?

Looks good to me, is obvious and shouldn't affect the test case. In
particular, the variables in question aren't used in the code after
their initialization with an undefined, implicitly declared variable,
which is also otherwise unused.

Tobias
RE: [PATCH,MIPS] Remove unused code relating to reloading fcc
Richard Sandiford rdsandif...@googlemail.com writes: Matthew Fortune matthew.fort...@imgtec.com writes: This is a small clean-up patch to remove code relating to reloading or moving mips fcc registers. At some point in the past these registers were allocated as part of register allocation but they are now statically allocated in the backend in a round robin fashion. The code for reloading them is therefore not necessary any more. The move costs are also irrelevant so are replaced with a comment instead (but the cases can just be deleted if that is preferred). I think removing the cases would be better. OK with that change. Thanks for cleaning this up. Re-posting as I missed removing the ST_REGS handling code from mips_secondary_reload_class. Is this still OK? Testsuite run on mips-unknown-linux-gnu shows no change in pass/fail. Regards, Matthew gcc/ * config/mips/mips-protos.h (mips_expand_fcc_reload): Remove. * config/mips/mips.c (mips_expand_fcc_reload): Remove. (mips_move_to_gpr_cost): Remove ST_REGS case. (mips_move_from_gpr_cost): Likewise. (mips_register_move_cost): Likewise. (mips_secondary_reload_class): Likewise. 
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h index 0b8125a..0b32a70 100644 --- a/gcc/config/mips/mips-protos.h +++ b/gcc/config/mips/mips-protos.h @@ -232,7 +232,6 @@ extern bool mips_use_pic_fn_addr_reg_p (const_rtx); extern rtx mips_expand_call (enum mips_call_type, rtx, rtx, rtx, rtx, bool); extern void mips_split_call (rtx, rtx); extern bool mips_get_pic_call_symbol (rtx *, int); -extern void mips_expand_fcc_reload (rtx, rtx, rtx); extern void mips_set_return_address (rtx, rtx); extern bool mips_move_by_pieces_p (unsigned HOST_WIDE_INT, unsigned int); extern bool mips_store_by_pieces_p (unsigned HOST_WIDE_INT, unsigned int); diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 585b755..cff1d38 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -7195,35 +7195,6 @@ mips_function_ok_for_sibcall (tree decl, tree exp ATTRIBUTE_UNUSED) return true; } -/* Emit code to move general operand SRC into condition-code - register DEST given that SCRATCH is a scratch TFmode FPR. - The sequence is: - - FP1 = SRC - FP2 = 0.0f - DEST = FP2 FP1 - - where FP1 and FP2 are single-precision FPRs taken from SCRATCH. */ - -void -mips_expand_fcc_reload (rtx dest, rtx src, rtx scratch) -{ - rtx fp1, fp2; - - /* Change the source to SFmode. */ - if (MEM_P (src)) -src = adjust_address (src, SFmode, 0); - else if (REG_P (src) || GET_CODE (src) == SUBREG) -src = gen_rtx_REG (SFmode, true_regnum (src)); - - fp1 = gen_rtx_REG (SFmode, REGNO (scratch)); - fp2 = gen_rtx_REG (SFmode, REGNO (scratch) + MAX_FPRS_PER_FMT); - - mips_emit_move (copy_rtx (fp1), src); - mips_emit_move (copy_rtx (fp2), CONST0_RTX (SFmode)); - emit_insn (gen_slt_sf (dest, fp2, fp1)); -} - /* Implement MOVE_BY_PIECES_P. */ bool @@ -12044,10 +12015,6 @@ mips_move_to_gpr_cost (enum machine_mode mode ATTRIBUTE_UNUSED, /* MFC1, etc. */ return 4; -case ST_REGS: - /* LUI followed by MOVF. 
*/ - return 4; - case COP0_REGS: case COP2_REGS: case COP3_REGS: @@ -12081,11 +12048,6 @@ mips_move_from_gpr_cost (enum machine_mode mode, reg_class_t to) /* MTC1, etc. */ return 4; -case ST_REGS: - /* A secondary reload through an FPR scratch. */ - return (mips_register_move_cost (mode, GENERAL_REGS, FP_REGS) - + mips_register_move_cost (mode, FP_REGS, ST_REGS)); - case COP0_REGS: case COP2_REGS: case COP3_REGS: @@ -12117,9 +12079,6 @@ mips_register_move_cost (enum machine_mode mode, if (to == FP_REGS mips_mode_ok_for_mov_fmt_p (mode)) /* MOV.FMT. */ return 4; - if (to == ST_REGS) - /* The sequence generated by mips_expand_fcc_reload. */ - return 8; } /* Handle cases in which only one class deviates from the ideal. */ @@ -12184,23 +12143,6 @@ mips_secondary_reload_class (enum reg_class rclass, if (ACC_REG_P (regno)) return reg_class_subset_p (rclass, GR_REGS) ? NO_REGS : GR_REGS; - /* We can only copy a value to a condition code register from a - floating-point register, and even then we require a scratch - floating-point register. We can only copy a value out of a - condition-code register into a general register. */ - if (reg_class_subset_p (rclass, ST_REGS)) -{ - if (in_p) - return FP_REGS; - return GP_REG_P (regno) ? NO_REGS : GR_REGS; -} - if (ST_REG_P (regno)) -{ - if (!in_p) - return FP_REGS; - return reg_class_subset_p (rclass, GR_REGS) ? NO_REGS : GR_REGS; -} - if (reg_class_subset_p (rclass, FP_REGS)) { if (MEM_P (x)
Bug 61407 - Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3
Hello. This patch fixes gcc build problems on the latest OS X 10.10 SDK beta (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61407) fixincludes/ChangeLog * inclhack.def (darwin14_has_feature): New fix * fixincl.x: Regenerate * tests/base/Availability.h: Added gcc/ChangeLog * config/darwin-c.c (version_as_macro): Added compatibility with OS X 10.10 macro version macro and triplet * config/darwin-driver.c (darwin_find_version_from_kernel): Bumped max kernel version libsanitizer/ChangeLog * sanitizer_common/sanitizer_platform_limits_posix.cc: Fixed 32-bit compatible dirent struct for OS X * sanitizer_common/sanitizer_platform_limits_posix.h: Likewise With regards, Ilya Mikhaltsou diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def index 6a1136c..b536080 100644 --- a/fixincludes/inclhack.def +++ b/fixincludes/inclhack.def @@ -4751,4 +4751,33 @@ fix = { test_text = extern char *\tsprintf();; }; + +/* + * Fix stdio.h using C++ __has_feature built-in on OS X 10.10 + */ +fix = { +hackname = darwin14_has_feature; +files = Availability.h; +mach = *-*-darwin14.0*; + +c_fix = wrap; +c_fix_arg = - _HasFeature_ + +/* + * GCC doesn't support __has_feature built-in in C mode and + * using defined(__has_feature) __has_feature in the same + * macro expression is not valid. So, easiest way is to define + * for this header __has_feature as a macro, returning 0, in case + * it is not defined internally + */ +#ifndef __has_feature +#define __has_feature(x) 0 +#endif + + +_HasFeature_; + +test_text = ''; +}; + /*EOF*/ diff --git a/fixincludes/tests/base/Availability.h b/fixincludes/tests/base/Availability.h new file mode 100644 index 000..807c40d --- /dev/null +++ b/fixincludes/tests/base/Availability.h @@ -0,0 +1,29 @@ +/* DO NOT EDIT THIS FILE. + +It has been auto-edited by fixincludes from: + + fixinc/tests/inc/Availability.h + +This had to be done to correct non-standard usages in the +original, manufacturer supplied header file. 
*/ + +#ifndef FIXINC_WRAP_AVAILABILITY_H_DARWIN14_HAS_FEATURE +#define FIXINC_WRAP_AVAILABILITY_H_DARWIN14_HAS_FEATURE 1 + + +/* GCC doesn't support __has_feature built-in in C mode and + * using defined(__has_feature) __has_feature in the same + * macro expression is not valid. So, easiest way is to define + * for this header __has_feature as a macro, returning 0, in case + * it is not defined internally + */ +#ifndef __has_feature +#define __has_feature(x) 0 +#endif + + +#if defined( DARWIN14_HAS_FEATURE_CHECK ) + +#endif /* DARWIN14_HAS_FEATURE_CHECK */ + +#endif /* FIXINC_WRAP_AVAILABILITY_H_DARWIN14_HAS_FEATURE */ diff --git a/gcc/config/darwin-c.c b/gcc/config/darwin-c.c index 892ba35..39f795f 100644 --- a/gcc/config/darwin-c.c +++ b/gcc/config/darwin-c.c @@ -572,20 +572,31 @@ find_subframework_header (cpp_reader *pfile, const char *header, cpp_dir **dirp) /* Return the value of darwin_macosx_version_min suitable for the __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ macro, - so '10.4.2' becomes 1040. The lowest digit is always zero. - Print a warning if the version number can't be understood. */ + so '10.4.2' becomes 1040 and '10.10.0' becomes 101000. The lowest + digit is always zero. Print a warning if the version number + can't be understood. */ static const char * version_as_macro (void) { - static char result[] = 1000; + static char result[7] = 1000; + int minorDigitIdx; if (strncmp (darwin_macosx_version_min, 10., 3) != 0) goto fail; if (! 
ISDIGIT (darwin_macosx_version_min[3])) goto fail; - result[2] = darwin_macosx_version_min[3]; - if (darwin_macosx_version_min[4] != '\0' - darwin_macosx_version_min[4] != '.') + + minorDigitIdx = 3; + result[2] = darwin_macosx_version_min[minorDigitIdx++]; + if (ISDIGIT(darwin_macosx_version_min[minorDigitIdx])) { +/* Starting with 10.10 numeration for mactro changed */ +result[3] = darwin_macosx_version_min[minorDigitIdx++]; +result[4] = '0'; +result[5] = '0'; +result[6] = '\0'; + } + if (darwin_macosx_version_min[minorDigitIdx] != '\0' + darwin_macosx_version_min[minorDigitIdx] != '.') goto fail; return result; diff --git a/gcc/config/darwin-driver.c b/gcc/config/darwin-driver.c index 8b6ae93..a115616 100644 --- a/gcc/config/darwin-driver.c +++ b/gcc/config/darwin-driver.c @@ -57,7 +57,7 @@ darwin_find_version_from_kernel (char *new_flag) version_p = osversion + 1; if (ISDIGIT (*version_p)) major_vers = major_vers * 10 + (*version_p++ - '0'); - if (major_vers 4 + 9) + if (major_vers 4 + 10) goto parse_failed; if (*version_p++ != '.') goto parse_failed; diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc b/libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc index a93d38d..6783108 100644 ---
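The version_as_macro change in the darwin-c.c hunk above can be modelled in isolation. The following is a hedged, standalone sketch of the mapping described in the patch comment ('10.4.2' becomes 1040, '10.10.0' becomes 101000); the function name and the use of std::string are assumptions made for illustration, and an empty string stands in for the patch's goto fail path.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical model of the patched version_as_macro logic.
// Returns "" when the version string cannot be understood.
static std::string macosx_version_as_macro (const char *version_min)
{
  assert (version_min != nullptr);
  if (std::strncmp (version_min, "10.", 3) != 0)
    return "";
  if (!std::isdigit ((unsigned char) version_min[3]))
    return "";

  std::string result = "10";
  std::size_t idx = 3;
  result += version_min[idx++];          // first minor digit
  if (std::isdigit ((unsigned char) version_min[idx]))
    {
      // Two-digit minor (10.10 and later): six-digit macro value.
      result += version_min[idx++];
      result += "00";
    }
  else
    result += '0';                       // lowest digit is always zero

  // Only end-of-string or a '.' may follow the minor version.
  if (version_min[idx] != '\0' && version_min[idx] != '.')
    return "";
  return result;
}
```

Like the patch, this sketch ignores the patch-level component entirely, so "10.4.2" and "10.4" give the same result.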
Re: [PATCH,MIPS] Remove unused code relating to reloading fcc
Matthew Fortune matthew.fort...@imgtec.com writes: Richard Sandiford rdsandif...@googlemail.com writes: Matthew Fortune matthew.fort...@imgtec.com writes: This is a small clean-up patch to remove code relating to reloading or moving mips fcc registers. At some point in the past these registers were allocated as part of register allocation but they are now statically allocated in the backend in a round robin fashion. The code for reloading them is therefore not necessary any more. The move costs are also irrelevant so are replaced with a comment instead (but the cases can just be deleted if that is preferred). I think removing the cases would be better. OK with that change. Thanks for cleaning this up. Re-posting as I missed removing the ST_REGS handling code from mips_secondary_reload_class. Is this still OK? Testsuite run on mips-unknown-linux-gnu shows no change in pass/fail. Yeah, looks good, thanks. Richard
Re: fix math wrt volatile-bitfields vs C++ model
Hi, On Tue, 17 Jun 2014 10:08:33, Richard Biener wrote: On Tue, Jun 17, 2014 at 4:08 AM, DJ Delorie d...@redhat.com wrote: Looks ok to me, but can you add a testcase please? I have a testcase, but if -flto the testcase doesn't include *any* definition of the test function, just all the LTO data. Is this normal? Without -ffat-lto-objects yes, this is normal. If you are trying to do a scan-assembler or so then this will be difficult with LTO. If LTO is not necessary to trigger the bug and you just want to use the torture I suggest to dg-skip-if -flto. Also check if 4.9 is affected. It is... same fix works, though. Thanks, Richard. If you have a test case where the generated code is actually different with and without your patch, that would be interesting. Please see gcc.dg/pr23623.c and gcc.dg/pr56997-4.c for examples how to automatically scan the intermediate code which is generated by -fdump-rtl-final to check the expected access mode. That should work for all targets, even if they have different assembler syntax. Thanks Bernd.
[PATCH] Simplify collect_switch_conv_info
This simplifies (and for me robustifies) finding of the final_bb. The
current code is somewhat odd in that it requires at least one
non-forwarder successor of a switch to transform. The following patch
makes us simply pick the candidate from a random edge (I chose the
default edge) using either the successor or its successor if the
successor is a forwarder.

That fixes fallout of gcc.dg/tree-ssa/pr36881.c when removing the early
copyprop pass which happened to unconditionally run a cfgcleanup.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-06-17  Richard Biener  rguent...@suse.de

        * tree-switch-conversion.c (collect_switch_conv_info): Simplify
        and allow all blocks to be forwarders.

Index: gcc/tree-switch-conversion.c
===================================================================
*** gcc/tree-switch-conversion.c (revision 211727)
--- gcc/tree-switch-conversion.c (working copy)
*************** collect_switch_conv_info (gimple swtch,
*** 640,654 ****
      info->other_count += e->count;

    /* See if there is one common successor block for all branch
!      targets.  If it exists, record it in FINAL_BB.  */
!   FOR_EACH_EDGE (e, ei, info->switch_bb->succs)
!     {
!       if (! single_pred_p (e->dest))
!         {
!           info->final_bb = e->dest;
!           break;
!         }
!     }
    if (info->final_bb)
      FOR_EACH_EDGE (e, ei, info->switch_bb->succs)
        {
--- 640,655 ----
      info->other_count += e->count;

    /* See if there is one common successor block for all branch
!      targets.  If it exists, record it in FINAL_BB.
!      Start with the destination of the default case as guess
!      or its destination in case it is a forwarder block.  */
!   if (! single_pred_p (e_default->dest))
!     info->final_bb = e_default->dest;
!   else if (single_succ_p (e_default->dest)
!            && ! single_pred_p (single_succ (e_default->dest)))
!     info->final_bb = single_succ (e_default->dest);
!   /* Require that all switch destinations are either that common
!      FINAL_BB or a forwarder to it.  */
    if (info->final_bb)
      FOR_EACH_EDGE (e, ei, info->switch_bb->succs)
        {
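The new guess for FINAL_BB can be illustrated on a toy CFG. This is a sketch only: the block struct and guess_final_bb are hypothetical stand-ins for GCC's basic_block, single_pred_p, single_succ_p and single_succ, and the follow-up check that every switch destination is FINAL_BB or a forwarder to it is not modelled.

```cpp
#include <cassert>
#include <cstddef>

// Toy basic block: just enough state to model the patch's tests.
struct block
{
  int n_preds;          // number of predecessor edges
  block *single_succ;   // non-null only if there is exactly one successor
};

// Model of the patched logic: start from the default case's destination,
// or from its successor when that destination looks like a forwarder.
static block *guess_final_bb (block *default_dest)
{
  assert (default_dest != nullptr);
  if (default_dest->n_preds != 1)
    return default_dest;                 // shared block: use it directly
  block *next = default_dest->single_succ;
  if (next != nullptr && next->n_preds != 1)
    return next;                         // forwarder: use its successor
  return nullptr;                        // no candidate found
}
```

The point of the design is that the guess no longer depends on finding a non-forwarder successor of the switch: when every case goes through its own forwarder block, the default edge's forwarder still leads to the common join block.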
[PATCH] Use vec::qsort where possible
Just spotted these.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2014-06-17  Richard Biener  rguent...@suse.de

        * genopinit.c (main): Use vec::qsort method.
        * tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise.
        * tree-vect-data-refs.c (vect_analyze_data_ref_accesses): Likewise.

Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c (revision 211698)
+++ gcc/genopinit.c (working copy)
@@ -357,8 +357,7 @@ main (int argc, char **argv)
     }

   /* Sort the collected patterns.  */
-  qsort (patterns.address (), patterns.length (),
-         sizeof (pattern), pattern_cmp);
+  patterns.qsort (pattern_cmp);

   /* Now that we've handled the extra patterns, eliminate them from the
      optabs array.  That way they don't get in the way below.  */
Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c (revision 211698)
+++ gcc/tree-ssa-loop-niter.c (working copy)
@@ -3144,8 +3144,7 @@ discover_iteration_bound_by_body_walk (s
     fprintf (dump_file, "Trying to walk loop body to reduce the bound.\n");

   /* Sort the bounds in decreasing order.  */
-  qsort (bounds.address (), bounds.length (),
-         sizeof (widest_int), wide_int_cmp);
+  bounds.qsort (wide_int_cmp);

   /* For every basic block record the lowest bound that is guaranteed to
      terminate the loop.  */
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c (revision 211698)
+++ gcc/tree-vect-data-refs.c (working copy)
@@ -2508,8 +2530,7 @@ vect_analyze_data_ref_accesses (loop_vec
      linear.  Don't modify the original vector's order, it is needed for
      determining what dependencies are reversed.  */
   vec<data_reference_p> datarefs_copy = datarefs.copy ();
-  qsort (datarefs_copy.address (), datarefs_copy.length (),
-         sizeof (data_reference_p), dr_group_sort_cmp);
+  datarefs_copy.qsort (dr_group_sort_cmp);

   /* Build the interleaving chains.  */
   for (i = 0; i < datarefs_copy.length () - 1;)
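To show why the method form is preferable to the three-argument call it replaces, here is a small sketch of a container that wraps the C library qsort. simple_vec is a hypothetical stand-in, not GCC's vec template, and it is backed by std::vector purely for brevity; it only mirrors the address/length/qsort names used in the hunks above.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a vec-like container.  T must be trivially
// copyable, since the C qsort moves elements as raw bytes.
template <typename T>
struct simple_vec
{
  std::vector<T> elems;

  T *address () { return elems.data (); }
  unsigned length () const { return static_cast<unsigned> (elems.size ()); }

  // The element pointer, count and size are supplied in one place, so
  // callers cannot pass a mismatched sizeof.
  void qsort (int (*cmp) (const void *, const void *))
  {
    assert (cmp != nullptr);
    if (length () > 1)
      std::qsort (address (), length (), sizeof (T), cmp);
  }
};

// Comparison callback in the usual qsort style.
static int int_cmp (const void *a, const void *b)
{
  int x = *static_cast<const int *> (a);
  int y = *static_cast<const int *> (b);
  return (x > y) - (x < y);
}
```

A caller then writes v.qsort (int_cmp) instead of spelling out v.address (), v.length () and sizeof at every call site.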
Re: Make ipa-ref somewhat less stupid
On 06/16/2014 10:01 AM, Jan Hubicka wrote:
>> On 06/10/2014 08:34 AM, Jan Hubicka wrote:
>>> Hi,
>>> ipa-reference is somewhat stupid and builds its data sets for all
>>> variables including addressable and public ones just to prune them
>>> out after all bitmaps are constructed. This used to make sense when
>>> the profile generation happened at compile time, but since the
>>> ipa_ref datastructure was introduced this is nonsense.
>>>
>>> Martin: It may be interesting to check if this solves the memory use
>>> issues with chrome. We also may be able to re-enable ipa-ref with
>>> profile-generate as I think all the datastructures are considered to
>>> have address taken.
>> Hi,
>> there is a link to chromium stats:
>> https://drive.google.com/file/d/0B0pisUJ80pO1VmNHeklCRWVkOUU/edit?usp=sharing
>>
>> Both compilations were run with '-flto=6', where the upper graph adds
>> '-fprofile-generate'. Memory footprint is IMHO acceptable, but the
>> compilation process takes twice as long with profile generation.
>> Yeah, chromium contains a really big code base :)
> Yep, I wonder why WPA takes so much longer. Do you think you can build
> lto1 with --enable-gather-detailed-mem-stats and relink with
> -fpre-ipa-mem-report -fpost-ipa-mem-report -fmem-report -Q and send me
> the output?
>
> It would be nice to push Chromium under 4GB of WPA :)

Here is the report you requested:
https://drive.google.com/file/d/0B0pisUJ80pO1RlRRTVBxUG5vSlE/edit?usp=sharing ,
produced with -fno-profile-generate. With -fprofile-generate enabled,
the WPA stage does not fit in 24GB of memory when memory stats are
enabled.

Martin

> Thanks a lot!
> Honza
Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.
Are i386 changes ok? Patches with corresponding changes and new tests are attached. Thanks, Evgeny On Thu, Jun 12, 2014 at 12:14 PM, Richard Biener richard.guent...@gmail.com wrote: On Thu, Jun 12, 2014 at 6:04 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Testing finished. No new regressions. Is the following patch ok? + if (targetm.sched.reassociation_width (VEC_PERM_EXPR, mode) 1 || + !vect_shift_permute_load_chain (dr_chain, size, stmt, gsi, result_chain)) ||s and s go to the next line. I miss testcases that make sure the vectorizer/backend code-paths are both exercised. Put them in gcc.target/i386 and provide an appropriate -march. The vectorizer changes are ok with the above fixed, I defer to backend maintainers for the i386 changes. Richard. 2014-06-11 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (ix86_reassociation_width): Add alternative for vector case. * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New. * config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New. * tree-vect-data-refs.c (vect_shift_permute_load_chain): New. Introduces alternative way of loads group permutaions. (vect_transform_grouped_load): Try alternative way of permutations. Thanks, Evgeny On Tue, Jun 10, 2014 at 4:43 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ix86_reassociation_width checks INTEGRAL_MODE_P and FLOAT_MODE_P which include vector mode. I'll try to separate this into scalar and vector part, but it will require more testing (under the testing now). What about the rest of the patch? Thanks, Evgeny On Thu, Jun 5, 2014 at 3:54 PM, Ramana Radhakrishnan ramana.radhakrish...@arm.com wrote: On 06/05/14 12:43, Evgeny Stupachenko wrote: New hook is related to vector instructions only. Vector instructions could be sequential in pipeline, but scalar - parallel. For x86 architectures TARGET_SCHED_REASSOC_WIDTH does not give required differentiation. General hooks could be potentially reused in other algorithms/by other architectures. 
It already takes a mode argument. Couldn't you use a vector mode to work this out ? If it is not enough then please be more specific about the documentation of this hook about where it is useful so that it's easy for people reading the documentation to understand at a glance what purpose it serves. Ramana Thanks, Evgeny On Thu, Jun 5, 2014 at 2:04 PM, Ramana Radhakrishnan ramana@googlemail.com wrote: On Wed, May 28, 2014 at 2:09 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Hi, The patch introduces alternative way of permutations for load groups of size 2 and 3 which should be faster on architectures with low parallelism. The patch gives 2 times gain on Silvermont to the test from PR52252 (in addition to already committed 3 times gain). Patch passes bootstrap on x86. Make check is in progress. Why do we need a new hook ? Can't you derive this information from something which is equally badly named TARGET_SCHED_REASSOC_WIDTH though used in the reassociation logic but also serves a similar purpose ? Also the documentation of this hook is incomplete at best and wrong at worst as this is not applied everywhere in the vectorizer but just for this special case for load store permuting. Implying this is useful everywhere in the vectorizer does not appear to be correct. regards Ramana ChangeLog: 2014-05-28 Evgeny Stupachenko evstu...@gmail.com * config/i386/i386.c (ix86_have_vector_parallel_execution): New. (TARGET_VECTORIZE_HAVE_VECTOR_PARALLEL_EXECUTION): New. * config/i386/i386.h (TARGET_VECTOR_PARALLEL_EXECUTION): New. * config/i386/x86-tune.def (X86_TUNE_VECTOR_PARALLEL_EXECUTION): New. * target.def (have_vector_parallel_execution): New. * doc/tm.texi.in (have_vector_parallel_execution)): New. * doc/tm.texi: Regenerate. * targhooks.c (default_have_vector_parallel_execution): New. * tree-vect-data-refs.c (vect_shift_permute_load_chain): New. Introduces alternative way of loads group permutaions. (vect_transform_grouped_load): Try alternative way of permutaions. 
Evgeny vect_groups2.patch Description: Binary data i386tests.patch Description: Binary data
Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.
On Tue, Jun 17, 2014 at 2:33 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Are i386 changes ok? Patches with corresponding changes and new tests are attached. Please remove all target selectors from dg-options and dg-final testcase directives, they are not needed inside gcc.dg/i386 directory. The patch is OK with this change. Thanks, Uros.
[PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)
First this adds a controlling option to the phiopt pass (-fssa-phiopt).
Second, this moves the first phiopt pass from the main optimization
pipeline into early opts (before merge-phi which confuses phiopt
but after dce which will help it).

ISTR that adding an early phiopt pass was wanted to perform CFG
cleanups on the weird CFG that the gimplifier produces from C++
code (but I fail to recollect the details nor remember a bug number).

Generally doing a phiopt before merge-phi gets the chance to screw
things up is good. Also phiopt is a kind of cleanup that is always
beneficial as it decreases code-size.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I felt that -ftree-XXX is bad naming so I went for -fssa-XXX
even if that is now inconsistent. Any opinion here? For
RTL we simply have unsuffixed names so shall we instead go
for -fphiopt? PHI implies SSA anyway and 'SSA' or 'RTL' is
an implementation detail that the user should not be interested
in (applies to tree- as well, of course). Now, 'phiopt' is a
bad name when thinking of users (but they shouldn't play with
those options anyway).

So - comments on the pass move? Comments on the flag naming?

Thanks,
Richard.

2014-06-17  Richard Biener  rguent...@suse.de

        * passes.def (pass_all_early_optimizations): Add phi-opt
        after dce and before merge-phi.
        (pass_all_optimizations): Remove first phi-opt pass.
        * common.opt (fssa-phiopt): New option.
        * opts.c (default_options_table): Enable -fssa-phiopt with -O1+
        but not with -Og.
        * tree-ssa-phiopt.c (pass_phiopt): Add gate method.
        * doc/invoke.texi (-fssa-phiopt): Document.

Index: gcc/passes.def
===================================================================
--- gcc/passes.def (revision 211736)
+++ gcc/passes.def (working copy)
@@ -73,8 +73,12 @@ along with GCC; see the file COPYING3.
              execute TODO_rebuild_alias at this point.  */
           NEXT_PASS (pass_build_ealias);
           NEXT_PASS (pass_fre);
-          NEXT_PASS (pass_merge_phi);
           NEXT_PASS (pass_cd_dce);
+          NEXT_PASS (pass_phiopt);
+          /* Do this after phiopt runs as phiopt is confused by
+             PHIs with more than two arguments.  Switch conversion
+             looks for a single PHI block though.  */
+          NEXT_PASS (pass_merge_phi);
           NEXT_PASS (pass_early_ipa_sra);
           NEXT_PASS (pass_tail_recursion);
           NEXT_PASS (pass_convert_switch);
@@ -155,7 +159,6 @@ along with GCC; see the file COPYING3.
       NEXT_PASS (pass_cselim);
       NEXT_PASS (pass_copy_prop);
       NEXT_PASS (pass_tree_ifcombine);
-      NEXT_PASS (pass_phiopt);
       NEXT_PASS (pass_tail_recursion);
       NEXT_PASS (pass_ch);
       NEXT_PASS (pass_stdarg);
Index: gcc/common.opt
===================================================================
--- gcc/common.opt (revision 211736)
+++ gcc/common.opt (working copy)
@@ -1950,6 +1950,10 @@ fsplit-wide-types
 Common Report Var(flag_split_wide_types) Optimization
 Split wide types into independent registers

+fssa-phiopt
+Common Report Var(flag_ssa_phiopt) Optimization
+Optimize conditional patterns using SSA PHI nodes
+
 fvariable-expansion-in-unroller
 Common Report Var(flag_variable_expansion_in_unroller) Optimization
 Apply variable expansion when loops are unrolled
Index: gcc/opts.c
===================================================================
--- gcc/opts.c (revision 211736)
+++ gcc/opts.c (working copy)
@@ -457,6 +457,7 @@ static const struct default_options defa
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
     { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
+    { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },

     /* -O2 optimizations.  */
     { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
Index: gcc/tree-ssa-phiopt.c
===================================================================
--- gcc/tree-ssa-phiopt.c (revision 211736)
+++ gcc/tree-ssa-phiopt.c (working copy)
@@ -2332,6 +2332,7 @@ public:

   /* opt_pass methods: */
   opt_pass * clone () { return new pass_phiopt (m_ctxt); }
+  virtual bool gate (function *) { return flag_ssa_phiopt; }
   virtual unsigned int execute (function *)
     {
       return tree_ssa_phiopt_worker (false, gate_hoist_loads ());
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi (revision 211736)
+++ gcc/doc/invoke.texi (working copy)
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fselective-scheduling -fselective-scheduling2 @gol
 -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol
 -fshrink-wrap -fsignaling-nans -fsingle-precision-constant @gol
--fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
+-fsplit-ivs-in-unroller -fsplit-wide-types -fssa-phiopt
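The gate method added in the tree-ssa-phiopt.c hunk follows a simple pattern: the pass manager asks the pass whether it should run, and the pass answers from its option variable. The sketch below models that pattern with hypothetical types (pass_model, phiopt_model, run_pass and flag_ssa_phiopt_model are all invented for illustration); GCC's real opt_pass interface takes a function * argument and is driven by the pass manager, none of which is reproduced here.

```cpp
#include <cassert>

// Stand-in for the option variable the patch introduces; hypothetical.
static int flag_ssa_phiopt_model = 1;

// Toy pass interface: a pass runs only when its gate allows it.
struct pass_model
{
  virtual ~pass_model () {}
  virtual bool gate () { return true; }   // default: always run
  virtual unsigned execute () = 0;
};

// Toy phiopt pass whose gate simply reflects the option flag.
struct phiopt_model : pass_model
{
  int runs = 0;
  bool gate () override { return flag_ssa_phiopt_model != 0; }
  unsigned execute () override { ++runs; return 0; }
};

// Toy driver: consult the gate, then execute.  Returns whether it ran.
static bool run_pass (pass_model &p)
{
  if (!p.gate ())
    return false;
  p.execute ();
  return true;
}
```

With the flag set the pass executes; clearing it (the equivalent of passing the negated option) makes the driver skip the pass without touching the pipeline definition.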
Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)
On Jun 17, 2014, at 6:07 AM, Richard Biener rguent...@suse.de wrote:

> First this adds a controlling option to the phiopt pass (-fssa-phiopt). Second, this moves the first phiopt pass from the main optimization pipeline into early opts (before merge-phi which confuses phiopt but after dce which will help it).
>
> ISTR that adding an early phiopt pass was wanted to perform CFG cleanups on the weird CFG that the gimplifier produces from C++ code (but I fail to recollect the details nor remember a bug number).
>
> Generally doing a phiopt before merge-phi gets the chance to screw things up is good. Also phiopt is a kind of cleanup that is always beneficial as it decreases code-size.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> I felt that -ftree-XXX is bad naming so I went for -fssa-XXX even if that is now inconsistent. Any opinion here? For RTL we simply have unsuffixed names so shall we instead go for -fphiopt? PHI implies SSA anyway and 'SSA' or 'RTL' is an implementation detail that the user should not be interested in (applies to tree- as well, of course). Now, 'phiopt' is a bad name when thinking of users (but they shouldn't play with those options anyway).
>
> So - comments on the pass move? Comments on the flag naming?
>
> Thanks,
> Richard.
>
> 2014-06-17  Richard Biener  rguent...@suse.de
>
>     * passes.def (pass_all_early_optimizations): Add phi-opt after dce and before merge-phi.
>     (pass_all_optimizations): Remove first phi-opt pass.
>     * common.opt (fssa-phiopt): New option.
>     * opts.c (default_options_table): Enable -fssa-phiopt with -O1+ but not with -Og.
>     * tree-ssa-phiopt.c (pass_phiopt): Add gate method.
>     * doc/invoke.texi (-fssa-phiopt): Document.
>
> Index: gcc/passes.def
> ===
> --- gcc/passes.def (revision 211736)
> +++ gcc/passes.def (working copy)
> @@ -73,8 +73,12 @@ along with GCC; see the file COPYING3.
>       execute TODO_rebuild_alias at this point.  */
>        NEXT_PASS (pass_build_ealias);
>        NEXT_PASS (pass_fre);
> -      NEXT_PASS (pass_merge_phi);
>        NEXT_PASS (pass_cd_dce);
> +      NEXT_PASS (pass_phiopt);
> +      /* Do this after phiopt runs as phiopt is confused by
> +         PHIs with more than two arguments.  Switch conversion
> +         looks for a single PHI block though.  */
> +      NEXT_PASS (pass_merge_phi);

I had made phiopt not be confused by more than two arguments. What has changed? I think we should make phiopt better again with more than two arguments.

Thanks,
Andrew

>        NEXT_PASS (pass_early_ipa_sra);
>        NEXT_PASS (pass_tail_recursion);
>        NEXT_PASS (pass_convert_switch);
> @@ -155,7 +159,6 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_cselim);
>        NEXT_PASS (pass_copy_prop);
>        NEXT_PASS (pass_tree_ifcombine);
> -      NEXT_PASS (pass_phiopt);
>        NEXT_PASS (pass_tail_recursion);
>        NEXT_PASS (pass_ch);
>        NEXT_PASS (pass_stdarg);
> Index: gcc/common.opt
> ===
> --- gcc/common.opt (revision 211736)
> +++ gcc/common.opt (working copy)
> @@ -1950,6 +1950,10 @@ fsplit-wide-types
>  Common Report Var(flag_split_wide_types) Optimization
>  Split wide types into independent registers
>
> +fssa-phiopt
> +Common Report Var(flag_ssa_phiopt) Optimization
> +Optimize conditional patterns using SSA PHI nodes
> +
>  fvariable-expansion-in-unroller
>  Common Report Var(flag_variable_expansion_in_unroller) Optimization
>  Apply variable expansion when loops are unrolled
> Index: gcc/opts.c
> ===
> --- gcc/opts.c (revision 211736)
> +++ gcc/opts.c (working copy)
> @@ -457,6 +457,7 @@ static const struct default_options defa
>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
> +    { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
>
>      /* -O2 optimizations.  */
>      { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
> Index: gcc/tree-ssa-phiopt.c
> ===
> --- gcc/tree-ssa-phiopt.c (revision 211736)
> +++ gcc/tree-ssa-phiopt.c (working copy)
> @@ -2332,6 +2332,7 @@ public:
>
>    /* opt_pass methods: */
>    opt_pass * clone () { return new pass_phiopt (m_ctxt); }
> +  virtual bool gate (function *) { return flag_ssa_phiopt; }
>    virtual unsigned int execute (function *)
>      {
>        return tree_ssa_phiopt_worker (false, gate_hoist_loads ());
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi (revision 211736)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fselective-scheduling -fselective-scheduling2 @gol
Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)
On Tue, 17 Jun 2014, pins...@gmail.com wrote:

> On Jun 17, 2014, at 6:07 AM, Richard Biener rguent...@suse.de wrote:
>
>> First this adds a controlling option to the phiopt pass (-fssa-phiopt). Second, this moves the first phiopt pass from the main optimization pipeline into early opts (before merge-phi which confuses phiopt but after dce which will help it).
>>
>> ISTR that adding an early phiopt pass was wanted to perform CFG cleanups on the weird CFG that the gimplifier produces from C++ code (but I fail to recollect the details nor remember a bug number).
>>
>> Generally doing a phiopt before merge-phi gets the chance to screw things up is good. Also phiopt is a kind of cleanup that is always beneficial as it decreases code-size.
>>
>> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>>
>> I felt that -ftree-XXX is bad naming so I went for -fssa-XXX even if that is now inconsistent. Any opinion here? For RTL we simply have unsuffixed names so shall we instead go for -fphiopt? PHI implies SSA anyway and 'SSA' or 'RTL' is an implementation detail that the user should not be interested in (applies to tree- as well, of course). Now, 'phiopt' is a bad name when thinking of users (but they shouldn't play with those options anyway).
>>
>> So - comments on the pass move? Comments on the flag naming?
>>
>> Thanks,
>> Richard.
>>
>> 2014-06-17  Richard Biener  rguent...@suse.de
>>
>>     * passes.def (pass_all_early_optimizations): Add phi-opt after dce and before merge-phi.
>>     (pass_all_optimizations): Remove first phi-opt pass.
>>     * common.opt (fssa-phiopt): New option.
>>     * opts.c (default_options_table): Enable -fssa-phiopt with -O1+ but not with -Og.
>>     * tree-ssa-phiopt.c (pass_phiopt): Add gate method.
>>     * doc/invoke.texi (-fssa-phiopt): Document.
>>
>> Index: gcc/passes.def
>> ===
>> --- gcc/passes.def (revision 211736)
>> +++ gcc/passes.def (working copy)
>> @@ -73,8 +73,12 @@ along with GCC; see the file COPYING3.
>>       execute TODO_rebuild_alias at this point.  */
>>        NEXT_PASS (pass_build_ealias);
>>        NEXT_PASS (pass_fre);
>> -      NEXT_PASS (pass_merge_phi);
>>        NEXT_PASS (pass_cd_dce);
>> +      NEXT_PASS (pass_phiopt);
>> +      /* Do this after phiopt runs as phiopt is confused by
>> +         PHIs with more than two arguments.  Switch conversion
>> +         looks for a single PHI block though.  */
>> +      NEXT_PASS (pass_merge_phi);
>
> I had made phiopt not be confused by more than two arguments. What has changed? I think we should make phiopt better again with more than two arguments.

I'm not sure - the above is just what I remember seeing, not currently failing testcases. I can certainly remove the comment - or do you say phiopt now eventually benefits from merge_phi? Then I can as well keep merge_phi where it is right now.

Richard.

> Thanks,
> Andrew
>
>>        NEXT_PASS (pass_early_ipa_sra);
>>        NEXT_PASS (pass_tail_recursion);
>>        NEXT_PASS (pass_convert_switch);
>> @@ -155,7 +159,6 @@ along with GCC; see the file COPYING3.
>>        NEXT_PASS (pass_cselim);
>>        NEXT_PASS (pass_copy_prop);
>>        NEXT_PASS (pass_tree_ifcombine);
>> -      NEXT_PASS (pass_phiopt);
>>        NEXT_PASS (pass_tail_recursion);
>>        NEXT_PASS (pass_ch);
>>        NEXT_PASS (pass_stdarg);
>> Index: gcc/common.opt
>> ===
>> --- gcc/common.opt (revision 211736)
>> +++ gcc/common.opt (working copy)
>> @@ -1950,6 +1950,10 @@ fsplit-wide-types
>>  Common Report Var(flag_split_wide_types) Optimization
>>  Split wide types into independent registers
>>
>> +fssa-phiopt
>> +Common Report Var(flag_ssa_phiopt) Optimization
>> +Optimize conditional patterns using SSA PHI nodes
>> +
>>  fvariable-expansion-in-unroller
>>  Common Report Var(flag_variable_expansion_in_unroller) Optimization
>>  Apply variable expansion when loops are unrolled
>> Index: gcc/opts.c
>> ===
>> --- gcc/opts.c (revision 211736)
>> +++ gcc/opts.c (working copy)
>> @@ -457,6 +457,7 @@ static const struct default_options defa
>>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fbranch_count_reg, NULL, 1 },
>>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fmove_loop_invariants, NULL, 1 },
>>      { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_ftree_pta, NULL, 1 },
>> +    { OPT_LEVELS_1_PLUS_NOT_DEBUG, OPT_fssa_phiopt, NULL, 1 },
>>
>>      /* -O2 optimizations.  */
>>      { OPT_LEVELS_2_PLUS, OPT_finline_small_functions, NULL, 1 },
>> Index: gcc/tree-ssa-phiopt.c
>> ===
>> --- gcc/tree-ssa-phiopt.c (revision 211736)
>> +++ gcc/tree-ssa-phiopt.c (working copy)
>> @@ -2332,6 +2332,7 @@ public:
>>
>>    /* opt_pass methods: */
>>    opt_pass * clone () { return new pass_phiopt (m_ctxt); }
>> +  virtual bool gate (function *) { return flag_ssa_phiopt; }
>>    virtual unsigned
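To make the effect of the proposed gate and the pass move easier to picture, here is a minimal sketch. It is not GCC's pass manager: `pass_sketch`, `run_pipeline` and `early_opts` are hypothetical stand-ins, and only the pass order and the `flag_ssa_phiopt` name are taken from the patch above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a pass entry (not GCC's opt_pass).
struct pass_sketch {
  std::string name;
  bool (*gate)();  // the pass is skipped when this returns false
};

// Models the variable created by Var(flag_ssa_phiopt) in common.opt.
static bool flag_ssa_phiopt = true;

static bool gate_always() { return true; }
static bool gate_phiopt() { return flag_ssa_phiopt; }

// Returns the names of the passes that would run, in pipeline order.
static std::vector<std::string> run_pipeline(const std::vector<pass_sketch> &passes) {
  std::vector<std::string> ran;
  for (const pass_sketch &p : passes)
    if (p.gate())
      ran.push_back(p.name);
  return ran;
}

// Early-opts order after the patch: phiopt sits after cd_dce, before merge_phi.
static std::vector<pass_sketch> early_opts() {
  return {{"fre", gate_always},
          {"cd_dce", gate_always},
          {"phiopt", gate_phiopt},
          {"merge_phi", gate_always}};
}
```

With the flag cleared (as -fno-ssa-phiopt would do under the proposal), the sketch simply drops phiopt and leaves the neighbouring passes in place.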
Re: [PATCH, Pointer Bounds Checker 28/x] IPA CP
Hi,

On Wed, Jun 11, 2014 at 05:47:36PM +0400, Ilya Enkovich wrote:
> Here is fixed version.

I'm fine with the ipa-cp hunks but I cannot approve them; Honza is the right person to ask.

Thanks,

Martin

> Thanks,
> Ilya
> --
> gcc/
>
> 2014-06-11  Ilya Enkovich  ilya.enkov...@intel.com
>
>     * cgraph.h (cgraph_local_p): New.
>     * ipa-cp.c (initialize_node_lattices): Use cgraph_local_p to handle instrumentation clones properly.
>     (propagate_constants_accross_call): Do not propagate through instrumentation thunks.
>
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 5e702a7..b225ebe 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -1556,4 +1556,17 @@ symtab_in_same_comdat_p (symtab_node *one, symtab_node *two)
>  {
>    return DECL_COMDAT_GROUP (one->decl) == DECL_COMDAT_GROUP (two->decl);
>  }
> +
> +/* Return true if NODE is local.  Instrumentation clones are counted as local
> +   only when original function is local.  */
> +
> +static inline bool
> +cgraph_local_p (cgraph_node *node)
> +{
> +  if (!node->instrumentation_clone || !node->instrumented_version)
> +    return node->local.local;
> +
> +  return node->local.local && node->instrumented_version->local.local;
> +}
> +
>  #endif  /* GCC_CGRAPH_H  */
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 689378a..4318789 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -699,7 +699,7 @@ initialize_node_lattices (struct cgraph_node *node)
>    int i;
>
>    gcc_checking_assert (cgraph_function_with_gimple_body_p (node));
> -  if (!node->local.local)
> +  if (!cgraph_local_p (node))
>      {
>        /* When cloning is allowed, we can assume that externally visible
>           functions are not called.  We will compensate this by cloning
> @@ -1434,6 +1434,24 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
>    if (parms_count == 0)
>      return false;
>
> +  /* No propagation through instrumentation thunks is available yet.
> +     It should be possible with proper mapping of call args and
> +     instrumented callee params in the propagation loop below.  But
> +     this case mostly occurs when legacy code calls instrumented code
> +     and it is not a primary target for optimizations.
> +
> +     We detect instrumentation thunks in aliases and thunks chain by
> +     checking instrumentation_clone flag for chain source and target.
> +     Going through instrumentation thunks we always have it changed
> +     from 0 to 1 and all other nodes do not change it.  */
> +  if (!cs->callee->instrumentation_clone
> +      && callee->instrumentation_clone)
> +    {
> +      for (i = 0; i < parms_count; i++)
> +        ret |= set_all_contains_variable (ipa_get_parm_lattices (callee_info,
> +                                                                 i));
> +      return ret;
> +    }
> +
>    /* If this call goes through a thunk we must not propagate to the first (0th)
>       parameter.  However, we might need to uncover a thunk from below a series
>       of aliases first.  */
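The locality rule that the patch adds in cgraph_local_p can be tried outside GCC with a stripped-down model. The sketch below is only an illustration: `node_sketch` and `local_p_sketch` are hypothetical names, and the struct keeps just the three fields the predicate reads.

```cpp
#include <cassert>

// Hypothetical, stripped-down model of a call-graph node; the real
// cgraph_node has many more fields.
struct node_sketch {
  bool local;                         // models node->local.local
  bool instrumentation_clone;         // set on instrumentation clones
  node_sketch *instrumented_version;  // the paired node, may be null
};

// Same shape as the patch's predicate: an instrumentation clone counts as
// local only if both it and its paired original are local.
static bool local_p_sketch(const node_sketch *n) {
  if (!n->instrumentation_clone || !n->instrumented_version)
    return n->local;
  return n->local && n->instrumented_version->local;
}
```

A clone whose original function is externally visible is therefore treated as non-local, even when the clone itself is marked local.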
Compile gcc.target/i386/fuse-caller-save.c with -fomit-frame-pointer (PR target/61533)
gcc.target/i386/fuse-caller-save.c currently FAILs on Solaris/x86 with gas and -m64:

FAIL: gcc.target/i386/fuse-caller-save.c scan-assembler-not .cfi_def_cfa_offset
FAIL: gcc.target/i386/fuse-caller-save.c scan-assembler-not .cfi_offset

Fixed as follows, as suggested and pre-approved by Uros in the PR. Tested with the appropriate runtest invocations on i386-pc-solaris2.11 and x86_64-unknown-linux-gnu, installed on mainline.

    Rainer

2014-06-17  Rainer Orth  r...@cebitec.uni-bielefeld.de

    PR target/61533
    * gcc.target/i386/fuse-caller-save.c: Add -fomit-frame-pointer to dg-options.

diff --git a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
--- a/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
+++ b/gcc/testsuite/gcc.target/i386/fuse-caller-save.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fuse-caller-save" } */
+/* { dg-options "-O2 -fuse-caller-save -fomit-frame-pointer" } */
 /* { dg-additional-options "-mregparm=1" { target ia32 } } */

 /* Testing -fuse-caller-save optimization option.  */

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Bug 61407 - Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3
On 17 June 2014 13:10:07 Илья Михальцов morph...@gmail.com wrote:

> index 892ba35..39f795f 100644
> --- a/gcc/config/darwin-c.c
> +++ b/gcc/config/darwin-c.c
> @@ -572,20 +572,31 @@ find_subframework_header (cpp_reader *pfile, const char *header, cpp_dir **dirp)
>
>  /* Return the value of darwin_macosx_version_min suitable for the
>     __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ macro,
> -   so '10.4.2' becomes 1040.  The lowest digit is always zero.
> -   Print a warning if the version number can't be understood.  */
> +   so '10.4.2' becomes 1040 and '10.10.0' becomes 101000.  The lowest
> +   digit is always zero.  Print a warning if the version number
> +   can't be understood.  */
>  static const char *
>  version_as_macro (void)
>  {
> -  static char result[] = "1000";
> +  static char result[7] = "1000";
> +  int minorDigitIdx;
>
>    if (strncmp (darwin_macosx_version_min, "10.", 3) != 0)
>      goto fail;
>    if (! ISDIGIT (darwin_macosx_version_min[3]))
>      goto fail;
> -  result[2] = darwin_macosx_version_min[3];
> -  if (darwin_macosx_version_min[4] != '\0'
> -      && darwin_macosx_version_min[4] != '.')
> +
> +  minorDigitIdx = 3;
> +  result[2] = darwin_macosx_version_min[minorDigitIdx++];
> +  if (ISDIGIT(darwin_macosx_version_min[minorDigitIdx])) {
> +/* Starting with 10.10 numeration for mactro changed */

What does mactro mean? macro?

Thanks,

Sent with AquaMail for Android http://www.aqua-mail.com
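The mapping described in the patch comment ('10.4.2' becomes 1040, '10.10.0' becomes 101000) can be sketched independently of the patch. The helper below is hypothetical: it is not the patch's version_as_macro, it parses with sscanf instead of walking digits, and it returns an empty string where GCC would print a warning.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not the code from the patch: turn a "10.x[.y]"
// version string into the macro digits described in the comment above.
// Returns "" when the version cannot be understood.
static std::string version_macro_sketch(const char *ver) {
  unsigned major = 0, minor = 0;
  if (std::sscanf(ver, "%u.%u", &major, &minor) != 2 || major != 10 || minor > 99)
    return "";
  char buf[16];
  if (minor < 10)
    // Old scheme: "10", one minor digit, trailing zero -> four digits.
    std::snprintf(buf, sizeof buf, "%u%u0", major, minor);
  else
    // From 10.10 on: "10", two minor digits, "00" -> six digits.
    std::snprintf(buf, sizeof buf, "%u%02u00", major, minor);
  return buf;
}
```

The six-digit form is why the patch grows the static buffer to `result[7]`: six digits plus the terminating NUL.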
Re: Another AIX Bootstrap failure
On Mon, Jun 16, 2014 at 11:44 PM, Jan Hubicka hubi...@ucw.cz wrote:

The linker is not seeing the local definition of ._ZN14__gnu_parallel9_SettingsC1Ev. libstdc++ is built with Linux-like semantics, so it allows symbols to be overridden. AIX calls everything through the PLT. But the real definition of the function is

Even static functions?

not being seen. I'm not exactly sure why inlining is changing this and what these extra levels of indirection are trying to accomplish. The visibility of the

To avoid using PLT and GOT when the unit refers to the symbol and we know that interposition does not matter.

I am not certain if the linker is creating the PLT stub code because it wants to allow interposition or because it cannot see a definition of the function and wants to allow for some other shared library to provide the definition at runtime.

Why branch to a non-global (static) symbol
b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
leads to PLT stub here and why branching to such symbols seems to work otherwise?

Branching to non-global (static) symbol, even an alias, is working here. The weak function seems to be the problem.

The failing branch is
b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
so the call to static construction seems to have happened correctly but we can not get right the call from the constructor to static function (that is an alias of a global symbol)

The linker appears to not want to resolve the weak function. If I change ._ZN14__gnu_parallel9_SettingsC1Ev to lglobl, it works. If I change the static constructor to call the weak function directly, avoiding the alias, it shows the same failure mode.

I don't know what code generation looked like before. Was GCC generating calls to weak functions within the same file?

Thanks, David
Re: Regimplification enhancements 3/3
On Mon, Jun 16, 2014 at 01:38:49PM +0200, Richard Biener wrote:
> On Mon, Jun 16, 2014 at 12:57 PM, Bernd Schmidt ber...@codesourcery.com wrote:
>> There's code in regimplification that makes us use an extra temporary when we encounter a call returning a non-BLKmode structure. This seems somewhat inefficient and unnecessary, and when used from the lower-addr-spaces pass I'm working on it leads to problems further down that look like tree-ssa bugs that I wasn't able to clearly disentangle.
>>
>> Here's what happens on compile/pr51761.c. Regimplification has the following effect, creating an extra temporary _6:
>>
>> - D.1378 = fooD.1373 (aD.1377);
>> + _6 = fooD.1373 (aD.1377);
>> + # .MEMD.1382 = VDEF <.MEMD.1382>
>> + D.1378 = _6;
>>
>> SRA turns this into:
>>
>>   _6 = fooD.1373 (aD.1377);
>>   # VUSE <.MEM_3>
>>   SR$2_7 = MEM[(struct S *)_6];
>
> clearly bogus - _6 is a register, you can't use a MEM on it.

Weird... does the following (untested) patch help?

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 0afa197..747b1b6 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -3277,6 +3277,8 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)

   if (modify_this_stmt
       || gimple_has_volatile_ops (*stmt)
+      || is_gimple_reg (lhs)
+      || is_gimple_reg (rhs)
       || contains_vce_or_bfcref_p (rhs)
       || contains_vce_or_bfcref_p (lhs)
       || stmt_ends_bb_p (*stmt))

It is just a quick thought though. If it does not, could you post the access trees dumped by -fdump-tree-esra-details or -fdump-tree-sra-details (depending on whether this is early or late SRA)? Or is it simple to set it up locally?

Thanks,

Martin

>> Somehow, the address of _6 doesn't count as a use, and the DCE pass decides it is unused:
>>
>> Eliminating unnecessary statements:
>> Deleting LHS of call: _6 = foo (a);
>>
>> However, the statement
>>   SR$2_7 = MEM[(struct S *)_6];
>> is still present, and we have an SSA name without a definition, leading to a crash.
>>
>> Rather than figure all this out, I decided to try making the regimplification not generate the extra copy in the first place. The testsuite seems to agree with me that it's unnecessary. Bootstrapped and tested on x86_64-linux, ok?
>
> Ok. The code looks bogus anyway in that it generates a SSA name for sth not is_gimple_reg_type ().
>
> Thanks,
> Richard.
>
>> Bernd
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On 5 June 2014 07:27, Ramana Radhakrishnan ramana@googlemail.com wrote:
> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis charles.bay...@linaro.org wrote:
>> This patch adds support for post-indexed addressing for NEON structure memory accesses.
>>
>> For example VLD1.8 {d0}, [r0], r1
>>
>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>
>> Ok for trunk?
>
> This looks like a reasonable start but this work doesn't look complete to me yet.
>
> Can you also look at the impact on performance of a range of benchmarks, especially a popular embedded one, to see how this behaves unless you have already done so?

I ran a popular suite of embedded benchmarks, and there is no impact at all on Chromebook (including with the additional attached patch).

The patch was developed to address a performance issue with a new version of libvpx which uses intrinsics instead of NEON assembler. The patch results in a 3% improvement for VP8 decode.

> POST_INC, POST_MODIFY usually have a funny way of biting you with either ivopts or the way in which address costs work. I think there may be further tweaks needed but for a first step I'd like to know what the performance impact is.
>
> I would also suggest running this through clyon's neon intrinsics testsuite to see if that catches any issues especially with the large vector modes.

No issues found in clyon's tests.

Your mention of larger vector modes prompted me to check that the patch has the desired result with them. In fact, the costs are estimated incorrectly which means the post_modify pattern is not used. The attached patch fixes that. (used in combination with my original patch)

2014-06-15  Charles Baylis  charles.ba...@linaro.org

    * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with embedded side effects.

0002-Adjust-costs-for-mem-with-post_modify.patch
Description: application/download
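For readers unfamiliar with the access pattern behind `VLD1.8 {d0}, [r0], r1`, the following plain C++ sketch shows its shape: read a fixed-size chunk at a pointer, then advance the pointer by a stride. The function `sum_rows` is hypothetical and uses no NEON intrinsics; whether a given compiler turns such a loop into post-indexed loads depends on the target and on the cost model discussed above, and nothing here asserts that it does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: sum the first 8 bytes of each row of a strided buffer.
static unsigned sum_rows(const std::uint8_t *p, std::ptrdiff_t stride, int rows) {
  unsigned total = 0;
  for (int r = 0; r < rows; r++) {
    // The 8-byte read at the current address (the "load" half of the pattern).
    for (int i = 0; i < 8; i++)
      total += p[i];
    // The address update that post-indexed addressing can fold into the load.
    p += stride;
  }
  return total;
}
```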
Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.
On 06/17/2014 05:33 AM, Evgeny Stupachenko wrote:
> +   1st vec:   0  1  2  3  4  5  6  7
> +   2nd vec:   8  9 10 11 12 13 14 15
> +   3rd vec:  16 17 18 19 20 21 22 23
> +
> +   The output sequence should be:
> +
> +   1st vec:   0  3  6  9 12 15 18 21
> +   2nd vec:   1  4  7 10 13 16 19 22
> +   3rd vec:   2  5  8 11 14 17 20 23
> +
> +   We use 3 shuffle instructions and 3 * 3 - 1 shifts to create such output.

Why not 3 * 2 blend followed by 3 shuffle? When length is prime, as here, we know that no blend will ever overlap elements. So:

1st step
  A1 = blend V1 V2 =  0  9  2  3 12  5  6 15
  A2 = blend V1 V2 =  8  1 10 11  4 13 14  7
  A3 = blend V1 V3 = 16 17  2 19 20  5 22 23

2nd step
  B1 = blend A1 V3 =  0  9 18  3 12 21  6 15
  B2 = blend A2 V3 = 16  1 10 19  4 13 22  7
  B3 = blend A3 V2 =  8 17  2 11 20  5 14 23

3rd step
  C1 = perm B1 =  0  3  6  9 12 15 18 21
  C2 = perm B2 =  1  4  7 10 13 16 19 22
  C3 = perm B3 =  2  5  8 11 14 17 20 23

The final permute here isn't trivial, crossing lanes for avx2 and all, but the initial permute you use is similar.

r~
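The blend-then-permute idea above can be checked with a small scalar simulation. This is a sketch under assumptions: `blend`, `perm` and `deinterleave3` are hypothetical scalar models, not vector instructions, and for simplicity every result starts from V1 and blends in V2 and then V3, which pairs the inputs slightly differently from the A2/A3/B3 rows in the message while producing the same final vectors.

```cpp
#include <cassert>

const int N = 8;  // lanes per vector, as in the example above

// Lane-wise select: out[i] = take_y[i] ? y[i] : x[i].
static void blend(const int *x, const int *y, const bool *take_y, int *out) {
  for (int i = 0; i < N; i++)
    out[i] = take_y[i] ? y[i] : x[i];
}

// Single-source permute: out[j] = in[idx[j]].
static void perm(const int *in, const int *idx, int *out) {
  for (int j = 0; j < N; j++)
    out[j] = in[idx[j]];
}

// De-interleave three N-lane vectors holding elements 0..3N-1 in memory order:
// result r receives elements r, r+3, r+6, ...
static void deinterleave3(int v[3][N], int c[3][N]) {
  for (int r = 0; r < 3; r++) {
    bool from_v1[N], from_v2[N];
    int idx[N], a[N], b[N];
    for (int p = 0; p < N; p++) {
      // Lane p of input k holds memory element k*N + p.  Since N and 3 are
      // coprime, exactly one k per lane satisfies (k*N + p) % 3 == r, so the
      // two blends never need the same lane from two different inputs.
      from_v1[p] = (1 * N + p) % 3 == r;
      from_v2[p] = (2 * N + p) % 3 == r;
    }
    blend(v[0], v[1], from_v1, a);  // 1st step
    blend(a, v[2], from_v2, b);     // 2nd step
    // 3rd step: element 3*j + r sits in lane (3*j + r) % N of b.
    for (int j = 0; j < N; j++)
      idx[j] = (3 * j + r) % N;
    perm(b, idx, c[r]);
  }
}
```

For r = 0 the two masks pick lanes 1, 4, 7 from V2 and lanes 2, 5 from V3, which matches the A1 and B1 rows quoted above.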
[PATCH GCC 2/2]Add 'force-dwarf-lexical-blocks' command line option - extend to C++
Hi,

This is the third (and final) patch which extends the original change proposal, submitted on June 1, and titled Add 'force-dwarf-lexical-blocks' command line option. This patch extends the proposed functionality to C++.

Attached are the proposed ChangeLog additions (for this patch only), named according to the directory each one belongs to.

All check-c and check-c++ tests have been run for unix target. The testsuites showed identical results, with and without setting the proposed -fforce-dwarf-lexical-blocks command line option.

Please let me know if the proposed additions will be accepted.

Best regards,
Andrei Herman
Mentor Graphics Corporation
Israel branch

From 824e75eb563e82c04fe1621c64430d87cdb0f348 Mon Sep 17 00:00:00 2001
From: Andrei Herman andrei_her...@codesourcery.com
Date: Tue, 17 Jun 2014 17:59:07 +0300
Subject: [PATCH 3/3] Support flag_force_dwarf_blocks in C++.

    * c-semantics.c (push_block_info): Allow BIND_EXPR for STATEMENT_LIST.
    * cp-objcp-common.c (cxx_block_may_fallthru): Return false for break or continue, when flag_force_dwarf_blocks.
    * cp-tree.h (pop_scope_for_labels): New.
    * name-lookup.c (keep_current_level): New.
    (kept_level_p): When flag_force_dwarf_blocks, avoid creating duplicate blocks.
    * name-lookup.h (keep_current_level): New.
    * parser.c (cp_parser_statement): Add last_label and pass it when calling cp_parser_label_for_labeled_statement, to create a label scope for the first label of a statement. Close forced scopes at current level, after labeled compound statements that don't fall through.
    (cp_parser_force_block_for_label): New.
    (pop_scope_for_labels): New.
    (cp_parser_label_for_labeled_statement): Add parameter. Create a label scope for the first label of a statement.
    (cp_parser_compound_statement): Force a block for compound statement.
    (cp_parser_implicitly_scoped_statement): Likewise for if-then, if-else, switch and do statements.
    (cp_parser_already_scoped_statement): Likewise for for/while bodies.
    * semantics.c (do_poplevel): Close any forced scopes in given level.
    (build_data_member_initialization): Allow BIND_EXPR.

Signed-off-by: Andrei Herman andrei_her...@codesourcery.com
---
 gcc/c-family/c-semantics.c |  11 -
 gcc/cp/cp-objcp-common.c   |   5 ++
 gcc/cp/cp-tree.h           |   1 +
 gcc/cp/name-lookup.c       |  12 +-
 gcc/cp/name-lookup.h       |   1 +
 gcc/cp/parser.c            | 104
 gcc/cp/semantics.c         |   5 ++
 7 files changed, 127 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-semantics.c b/gcc/c-family/c-semantics.c
index ec3045f..8c8497f 100644
--- a/gcc/c-family/c-semantics.c
+++ b/gcc/c-family/c-semantics.c
@@ -35,8 +35,15 @@ along with GCC; see the file COPYING3.  If not see
 void
 push_block_info (tree block, location_t loc, bool is_label)
 {
-  if (TREE_CODE(block) != STATEMENT_LIST)
+  switch (TREE_CODE (block)) {
+  case BIND_EXPR:
+    block = BIND_EXPR_BODY (block);
+    /* Fall through.  */
+  case STATEMENT_LIST:
+    break;
+  default:
     return;
+  }

   block_loc tl;
   tl = (block_loc) ggc_internal_cleared_alloc (sizeof(struct block_loc_s));
@@ -70,7 +77,7 @@ check_pop_block_info(tree block, location_t loc)
       if (block == cur_block_info->block && loc == cur_block_info->loc &&
           !cur_block_info->is_label)
         {
-          block_list_stack->pop();
+          block_list_stack->pop ();
         }
     }
 }
diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index 78dddef..fcfd959 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -238,6 +238,11 @@ cxx_block_may_fallthru (const_tree stmt)
       return false;

     default:
+      if (flag_force_dwarf_blocks) {
+        if (TREE_CODE (stmt) == BREAK_STMT ||
+            TREE_CODE (stmt) == CONTINUE_STMT)
+          return false;
+      }
       return true;
     }
 }
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 7d29c2c..4953ad9 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5501,6 +5501,7 @@ extern bool maybe_clone_body (tree);
 extern tree cp_convert_range_for (tree, tree, tree, bool);
 extern bool parsing_nsdmi (void);
 extern void inject_this_parameter (tree, cp_cv_quals);
+extern void pop_scope_for_labels (tree);

 /* in pt.c */
 extern bool check_template_shadow (tree);
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 2baeeb7..5538c63 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1745,7 +1745,8 @@ local_bindings_p (void)
 bool
 kept_level_p (void)
 {
-  return (current_binding_level->blocks != NULL_TREE
+  return ((!flag_force_dwarf_blocks
+           && current_binding_level->blocks != NULL_TREE)
           || current_binding_level->keep
           || current_binding_level->kind
Re: Another AIX Bootstrap failure
To avoid using PLT and GOT when the unit refers to the symbol and we know that interposition does not matter.

I am not certain if the linker is creating the PLT stub code because it wants to allow interposition or because it cannot see a definition of the function and wants to allow for some other shared library to provide the definition at runtime.

OK, but the definition appears in the same file..

Why branch to a non-global (static) symbol
b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
leads to PLT stub here and why branching to such symbols seems to work otherwise?

Branching to non-global (static) symbol, even an alias, is working here. The weak function seems to be the problem.

The failing branch is
b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
so the call to static construction seems to have happened correctly but we can not get right the call from the constructor to static function (that is an alias of a global symbol)

The linker appears to not want to resolve the weak function. If I change ._ZN14__gnu_parallel9_SettingsC1Ev to lglobl, it works. If I change the static constructor to call the weak function directly, avoiding the alias, it shows the same failure mode.

I don't know what code generation looked like before. Was GCC generating calls to weak functions within the same file?

Yes, this is how you implement COMDAT functions, right? I looked at rs6000 call expansion and it does not seem to care about visibility properties (just about direct wrt indirect call). One problem I can think of is a scenario where the linker unifies calls to comdat functions between units, somehow forgetting about the aliases, but this function seems to not be shared.

Index: symtab.c
===
--- symtab.c (revision 211693)
+++ symtab.c (working copy)
@@ -1327,10 +1327,8 @@
                               (void *)&new_node, true);
   if (new_node)
     return new_node;
-#ifndef ASM_OUTPUT_DEF
   /* If aliases aren't supported by the assembler, fail.  */
   return NULL;
-#endif

   /* Otherwise create a new one.  */
   new_decl = copy_node (node->decl);

disables generation of the local aliases completely. I do not see much of a difference in the actual codegen with this... I will check older GCC.

Honza

Thanks, David
Re: [Patch] PR55189 enable -Wreturn-type by default
On 05/06/2014 20:01, Joseph S. Myers wrote:
>> Initially, I implemented -Wmissing-return to manage this case ( https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00820.html ) but Jason suggested to remove that: https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01033.html (I don't have a strong opinion on the subject).
>
> I think splitting the option like that makes sense. Compatibility indicates that -Wreturn-type and -Wall should still enable -Wmissing-return, but only the other pieces of -Wreturn-type should be enabled by default, at least for C. (Enabling -Wimplicit-int by default might be a good starting point.)

OK. As attachment, you will find a potential implementation. Is that what you expect?

> Also, at least one testsuite change in your patch is wrong.

OK. Thanks. I've probably made others (I updated 1300+ of them).

Thanks
Sylvestre

From 1b936c618c58dc0e899fa9f56013de48f7e4dcd6 Mon Sep 17 00:00:00 2001
From: Sylvestre Ledru sylves...@debian.org
Date: Tue, 17 Jun 2014 18:48:29 +0200
Subject: [PATCH 2/2] Enable Wimplicit by default

---
 gcc/c-family/c.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 050d400..9b9ede7 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -460,7 +460,7 @@ C ObjC Var(warn_implicit_function_declaration) Init(-1) Warning LangEnabledBy(C
 Warn about implicit function declarations

 Wimplicit-int
-C ObjC Var(warn_implicit_int) Warning LangEnabledBy(C ObjC,Wimplicit)
+C ObjC Var(warn_implicit_int) Warning
 Warn when a declaration does not specify a type

 Wimport
--
2.0.0

From 80cd3dff34f74058ab66b69e0e01a05eaf686338 Mon Sep 17 00:00:00 2001
From: Sylvestre Ledru sylves...@debian.org
Date: Tue, 17 Jun 2014 18:48:12 +0200
Subject: [PATCH 1/2] Introduce -Wmissing-return (Was part of -Wreturn-type which is now enabled by default)

---
 gcc/c-family/c.opt    |  4
 gcc/doc/invoke.texi   | 10 +-
 gcc/fortran/options.c |  4
 gcc/tree-cfg.c        |  4 ++--
 4 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 91f8275..050d400 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -697,6 +697,10 @@ Wreturn-type
 C ObjC C++ ObjC++ Var(warn_return_type) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
 Warn whenever a function's return type defaults to \"int\" (C), or about inconsistent return types (C++)

+Wmissing-return
+C ObjC C++ ObjC++ Var(warn_missing_return) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
+Warn whenever control may reach end of non-void function
+
 Wselector
 ObjC ObjC++ Var(warn_selector) Warning
 Warn if a selector has multiple methods
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9a34f1c..9911e86 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -258,7 +258,7 @@ Objective-C and Objective-C++ Dialects}.
 -Winvalid-pch -Wlarger-than=@var{len} -Wunsafe-loop-optimizations @gol
 -Wlogical-op -Wlogical-not-parentheses -Wlong-long @gol
 -Wmain -Wmaybe-uninitialized -Wmissing-braces -Wmissing-field-initializers @gol
--Wmissing-include-dirs @gol
+-Wmissing-include-dirs -Wmissing-return @gol
 -Wno-multichar -Wnonnull -Wno-overflow -Wopenmp-simd @gol
 -Woverlength-strings -Wpacked -Wpacked-bitfield-compat -Wpadded @gol
 -Wparentheses -Wpedantic-ms-format -Wno-pedantic-ms-format @gol
@@ -3327,6 +3327,7 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}.
 -Wmain @r{(only for C/ObjC and unless} @option{-ffreestanding}@r{)} @gol
 -Wmaybe-uninitialized @gol
 -Wmissing-braces @r{(only for C/ObjC)} @gol
+-Wmissing-return @gol
 -Wnonnull @gol
 -Wopenmp-simd @gol
 -Wparentheses @gol
@@ -3657,6 +3658,13 @@ the following example, the initializer for @samp{a} is not fully
 bracketed, but that for @samp{b} is fully bracketed.  This warning is
 enabled by @option{-Wall} in C.

+@item -Wmissing-return
+@opindex Wmissing-return
+@opindex Wno-missing-return
+Warn whenever falling off the end of the function body (i.e. without
+any return).
+This warning is enabled by @option{-Wall} for C and C++.
+
 @smallexample
 int a[2][2] = @{ 0, 1, 2, 3 @};
 int b[2][2] = @{ @{ 0, 1 @}, @{ 2, 3 @} @};
diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c
index a2b91ca..fe71230 100644
--- a/gcc/fortran/options.c
+++ b/gcc/fortran/options.c
@@ -698,6 +698,10 @@ gfc_handle_option (size_t scode, const char *arg, int value,
       gfc_option.warn_line_truncation = value;
       break;

+    case OPT_Wmissing_return:
+      warn_missing_return = value;
+      break;
+
     case OPT_Wrealloc_lhs:
       gfc_option.warn_realloc_lhs = value;
       break;
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index e824619..2fd342e 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -8265,7 +8265,7 @@ pass_warn_function_return::execute (function *fun)

   /* If we see "return;" in some basic block, then we do reach the end
      without returning a value.  */
-  else if (warn_return_type
+  else if
Re: Another AIX Bootstrap failure
On Tue, Jun 17, 2014 at 12:50 PM, Jan Hubicka hubi...@ucw.cz wrote:

To avoid using PLT and GOT when the unit refers to the symbol and we know that interposition does not matter.

I am not certain if the linker is creating the PLT stub code because it wants to allow interposition or because it cannot see a definition of the function and wants to allow for some other shared library to provide the definition at runtime.

OK, but the definition appears in the same file..

Why branch to a non-global (static) symbol
b ._ZN14__gnu_parallel9_SettingsC1Ev.localalias.0
leads to PLT stub here and why branching to such symbols seems to work otherwise?

Branching to non-global (static) symbol, even an alias, is working here. The weak function seems to be the problem.

The weak function is the problem, but I don't know why. And I don't understand how this is different than past uses of weak functions. Or is that new?

This is very confusing because the library, libstdc++, is being linked statically. It provides a weak definition of the function. There should be no glink code (PLT stub). If the function is declared .lglobl, it is called directly and no PLT stub is created.

I need to call in the help of the AIX linker expert to figure out why it is inserting PLT stub code, especially when linking statically.

Thanks, David
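As a point of reference for the terms used in this thread, the sketch below shows what "a static alias of a weak global function" looks like at source level. It relies on the GCC/Clang `weak` and `alias` attributes and assumes a target with alias support, such as ELF Linux; the names are hypothetical, and it does not reproduce the AIX linker behaviour under discussion.

```cpp
#include <cassert>

// Weak global definition: another object file may supply a strong definition
// that takes precedence at link time.  extern "C" keeps the symbol name
// unmangled so the alias below can name it.
extern "C" __attribute__((weak)) int settings_init_sketch(void) { return 42; }

// File-local alias bound to the definition in this translation unit.  This
// mirrors, by hand, the kind of ".localalias" symbol mentioned in the thread.
static int settings_init_localalias(void)
    __attribute__((alias("settings_init_sketch")));

// A caller inside the unit can branch to the local alias rather than to the
// overridable global name.
static int call_via_alias() { return settings_init_localalias(); }
```

Some compilers warn that such an alias keeps pointing at this definition even if the weak symbol is overridden elsewhere, which is the interposition trade-off the messages above describe.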
Re: [Patch] PR55189 enable -Wreturn-type by default
On Tue, 17 Jun 2014, Sylvestre Ledru wrote:
> On 05/06/2014 20:01, Joseph S. Myers wrote:
>>> Initially, I implemented -Wmissing-return to manage this case ( https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00820.html ) but Jason suggested to remove that: https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01033.html (I don't have a strong opinion on the subject).
>>
>> I think splitting the option like that makes sense. Compatibility indicates that -Wreturn-type and -Wall should still enable -Wmissing-return, but only the other pieces of -Wreturn-type should be enabled by default, at least for C. (Enabling -Wimplicit-int by default might be a good starting point.)
>
> OK. As attachment, you will find a potential implementation. Is that what you expect?

It would help a lot if it included testcases for what various options / option combinations do / do not enable.

I expect that each option continues to enable the warnings it does at present (so if a user explicitly does -Wreturn-type it also enables the -Wmissing-return warnings, for example) - but some warnings would start to be enabled by default. If someone does e.g. -Wno-implicit that would disable the default -Wimplicit-int; if they do -Wno-implicit -Wimplicit that would have the same effect as just -Wimplicit (so keeping the default warnings enabled, and possibly enabling others).

--
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH, PR52252] Alternative way of vectorization for load groups of size 2 and 3.
While developing I've tried the following scheme:

First step is 3 shuffles (as initially):
  A1 = (0 3 6) (1 4 7) (2 5)
  A2 = (8 11 14) (9 12 15) (10 13)
  A3 = (16 19 22) (17 20 23) (18 21)

  R1 = blend [ blend [A1 A2], A3] = (0 3 6) (9 12 15) (18 21)

  B2 = blend [A1, A2] = (0 3 6) (1 4 7) (10 13)
  R2 = shift 3, B2 ... (1 4 7) (10 13) + A3 (16 19 22) ... = (1 4 7) (10 13) (16 19 22)

  B3 = blend [ A2, A3] = (8 11 14) (17 20 23) (18 21)
  R3 = shift 6, A1 ... (2 5) + B3 (8 11 14) (17 20 23) ... = (2 5) (8 11 14) (17 20 23)

But it was slower than the scheme in the patch, as blend costs more than shift (palign). For AVX2 the scheme is not OK, as it has many more dependencies than the current one (in vect_permute_load_chain).

Evgeny

On Tue, Jun 17, 2014 at 7:41 PM, Richard Henderson r...@redhat.com wrote:
> On 06/17/2014 05:33 AM, Evgeny Stupachenko wrote:
>> +   1st vec:   0  1  2  3  4  5  6  7
>> +   2nd vec:   8  9 10 11 12 13 14 15
>> +   3rd vec:  16 17 18 19 20 21 22 23
>> +
>> +   The output sequence should be:
>> +
>> +   1st vec:   0  3  6  9 12 15 18 21
>> +   2nd vec:   1  4  7 10 13 16 19 22
>> +   3rd vec:   2  5  8 11 14 17 20 23
>> +
>> +   We use 3 shuffle instructions and 3 * 3 - 1 shifts to create such output.
>
> Why not 3 * 2 blend followed by 3 shuffle? When length is prime, as here, we know that no blend will ever overlap elements. So:
>
> 1st step
>   A1 = blend V1 V2 =  0  9  2  3 12  5  6 15
>   A2 = blend V1 V2 =  8  1 10 11  4 13 14  7
>   A3 = blend V1 V3 = 16 17  2 19 20  5 22 23
>
> 2nd step
>   B1 = blend A1 V3 =  0  9 18  3 12 21  6 15
>   B2 = blend A2 V3 = 16  1 10 19  4 13 22  7
>   B3 = blend A3 V2 =  8 17  2 11 20  5 14 23
>
> 3rd step
>   C1 = perm B1 =  0  3  6  9 12 15 18 21
>   C2 = perm B2 =  1  4  7 10 13 16 19 22
>   C3 = perm B3 =  2  5  8 11 14 17 20 23
>
> The final permute here isn't trivial, crossing lanes for avx2 and all, but the initial permute you use is similar.
>
> r~
Re: [Patch] PR55189 enable -Wreturn-type by default
On 17/06/2014 19:15, Joseph S. Myers wrote: On Tue, 17 Jun 2014, Sylvestre Ledru wrote: On 05/06/2014 20:01, Joseph S. Myers wrote: Initially, I implemented -Wmissing-return to manage this case ( https://gcc.gnu.org/ml/gcc-patches/2014-01/msg00820.html ) but Jason suggested to remove that: https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01033.html (I don't have a strong opinion on the subject). I think splitting the option like that makes sense. Compatibility indicates that -Wreturn-type and -Wall should still enable -Wmissing-return, but only the other pieces of -Wreturn-type should be enabled by default, at least for C. (Enabling -Wimplicit-int by default might be a good starting point.) OK. As attachment, you will find a potential implementation. Is that what you expect? It would help a lot if it included testcases for what various options / option combinations do / do not enable. OK. I will do that. We should test the following: * default = run just -Wreturn-type * -Wreturn-type = Run both * -Wreturn-type + -Wmissing-return = Run both * -Wno-return-type + -Wmissing-return = Run just the second one * -Wno-return-type + -Wno-missing-return = Run none Do you see any other? I expect that each option continues to enable the warnings it does at present (so if a user explicitly does -Wreturn-type it also enables the -Wmissing-return warnings, for example) - but some warnings would start to be enabled by default. If someone does e.g. -Wno-implicit that would disable the default -Wimplicit-int; if they do -Wno-implicit -Wimplicit that would have the same effect as just -Wimplicit (so keeping the default warnings enabled, and possibly enabling others). OK. I will try to implement that later (I don't think -Wimplicit-int is necessary to enable -Wreturn-type by default). Besides that, are you OK with my changes? (with the tests updated) Thanks, Sylvestre
Re: [Patch] PR55189 enable -Wreturn-type by default
On Tue, 17 Jun 2014, Sylvestre Ledru wrote:
OK. I will do that. We should test the following:
* default = run just -Wreturn-type
* -Wreturn-type = Run both
* -Wreturn-type + -Wmissing-return = Run both
* -Wno-return-type + -Wmissing-return = Run just the second one
* -Wno-return-type + -Wno-missing-return = Run none
Do you see any other?

That looks like the right things to test, if there are no changes for anything other than those options.

Besides that, are you OK with my changes? (with the tests updated)

The tests are key to reviewing whether the code changes actually do the right thing.

-- Joseph S. Myers jos...@codesourcery.com
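The option matrix discussed in this thread can be written down as a small truth-table model. This is a sketch of the proposed semantics as described in the messages, not code from the patch: the enum, the struct and resolve_return_warnings are hypothetical names, only the final state of each option is modelled, and -Wmissing-return itself is a proposed option rather than an existing one.

```c
#include <assert.h>
#include <stdbool.h>

/* Final command-line state of a -Wfoo / -Wno-foo pair (hypothetical model). */
enum wflag { W_UNSET, W_ON, W_OFF };

struct return_warnings {
    bool return_type;     /* the pieces of -Wreturn-type other than missing return */
    bool missing_return;  /* the proposed -Wmissing-return piece */
};

struct return_warnings
resolve_return_warnings(enum wflag wall, enum wflag wreturn_type,
                        enum wflag wmissing_return)
{
    struct return_warnings w;

    /* Proposed default: enabled unless -Wno-return-type is given.  */
    w.return_type = (wreturn_type != W_OFF);

    /* An explicit -W[no-]missing-return wins; otherwise follow an explicit
       -W[no-]return-type; otherwise only -Wall turns it on.  */
    if (wmissing_return != W_UNSET)
        w.missing_return = (wmissing_return == W_ON);
    else if (wreturn_type != W_UNSET)
        w.missing_return = (wreturn_type == W_ON);
    else
        w.missing_return = (wall == W_ON);

    return w;
}

/* The five combinations listed in the thread, as assertions.  */
void check_option_matrix(void)
{
    struct return_warnings w;

    w = resolve_return_warnings(W_UNSET, W_UNSET, W_UNSET);
    assert(w.return_type && !w.missing_return);

    w = resolve_return_warnings(W_UNSET, W_ON, W_UNSET);
    assert(w.return_type && w.missing_return);

    w = resolve_return_warnings(W_UNSET, W_ON, W_ON);
    assert(w.return_type && w.missing_return);

    w = resolve_return_warnings(W_UNSET, W_OFF, W_ON);
    assert(!w.return_type && w.missing_return);

    w = resolve_return_warnings(W_UNSET, W_OFF, W_OFF);
    assert(!w.return_type && !w.missing_return);
}
```

The -Wall row is not in the list above but follows from the compatibility requirement stated earlier in the thread: with only -Wall given, both groups would be enabled.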
Re: [PATCH] C++ thunk section names
Ping. On Mon, Jun 9, 2014 at 3:54 PM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Mon, May 19, 2014 at 11:25 AM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Thu, Apr 17, 2014 at 10:41 AM, Sriraman Tallam tmsri...@google.com wrote: Ping. On Wed, Feb 5, 2014 at 4:31 PM, Sriraman Tallam tmsri...@google.com wrote: Hi, I would like this patch reviewed and considered for commit when Stage 1 is active again. Patch Description: A C++ thunk's section name is set to be the same as the section name of the original function for which the thunk was created, in order to place the two together. This is done in cp/method.c in function use_thunk. However, with function reordering turned on, the original function's section name can change to something like .text.hot.original or .text.unlikely.original in function default_function_section in varasm.c, based on the node count of that function. The thunk function's section name is not updated to match the original there, and it is also not always correct to do so, as the original function can be hotter than the thunk. I have created a patch to not name the thunk function's section the same as the original function's when function reordering is enabled. Thanks Sri
Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)
On 06/17/14 07:07, Richard Biener wrote: I felt that -ftree-XXX is bad naming so I went for -fssa-XXX even if that is now inconsistent. Any opinion here? For RTL we simply have unsuffixed names so shall we instead go for -fphiopt? PHI implies SSA anyway and 'SSA' or 'RTL' is an implementation detail that the user should not be interested in (applies to tree- as well, of course). Now, 'phiopt' is a bad name when thinking of users (but they shouldn't play with those options anyway). Our flags are a mess. If I put my user hat on, then I'd have to ask the question, why would I care about tree, ssa, or even phis. The pass converts branchy code into straightline code. So, arguably, the right name would reflect that it changes branchy code to straight line code. But I believe most of our flag names are poor in this regard (and I'm as much to blame as anyone). So go with your best judgement IMHO. It'd be nice to have some testcases here to show why we want this moved earlier so that a few years from now when someone else wants to move it back, we can say umm, see test frobit.c, make that work and you can move it back :-) jeff
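As an illustration of "branchy code into straightline code", here are two source-level pairs of the general shape that a PHI-based optimization targets. This is a sketch only: the function names are made up, it is not the testcase requested above, and whether a given compiler version rewrites either branchy form depends on the pass pipeline and options.

```c
#include <assert.h>
#include <stdlib.h>

/* Branchy form: the result is selected by a two-way branch.  */
int is_nonzero_branchy(int a)
{
    int r;
    if (a != 0)
        r = 1;
    else
        r = 0;
    return r;
}

/* Straight-line equivalent: the comparison result is used directly.  */
int is_nonzero_straight(int a)
{
    return a != 0;
}

/* Branchy absolute value.  */
int abs_branchy(int a)
{
    int r;
    if (a < 0)
        r = -a;
    else
        r = a;
    return r;
}

/* Straight-line equivalent, written with the standard abs().  */
int abs_straight(int a)
{
    return abs(a);
}

/* Both forms agree on a range of inputs (INT_MIN is avoided on purpose,
   since negating it overflows).  */
void check_equivalent(void)
{
    for (int a = -1000; a <= 1000; a++) {
        assert(is_nonzero_branchy(a) == is_nonzero_straight(a));
        assert(abs_branchy(a) == abs_straight(a));
    }
}
```

A regression test along these lines would normally scan a dump file rather than compare results at run time; the run-time check here only shows that each pair is semantically equivalent.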
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
Hello Bernd, On 28 Feb 17:21, Bernd Schmidt wrote: For your use case, I'd imagine the offload compiler would be built relatively normally as a full build with --enable-as-accelerator-for=x86_64-linux, which would install it into locations where the host will eventually be able to find it. Then the host compiler would be built with another new configure option (as yet unimplemented in my patch set) --enable-offload-targets=mic,... which would tell the host compiler about the pre-built offload target compilers. On the ptx I don't get this part of the plan. Where a host compiler will look for mkoffloads? E.g., first I configure/make/install the target gcc and corresponding mkoffload with the following options: --enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-unknown-linux --prefix=/install_gcc/accel_intelmic Next I configure/make/install the host gcc with: --enable-accelerator=intelmic --prefix=/install_gcc/host Now if I manually copy mkoffload from target's install dir into one of the dirs in host's $COMPILER_PATH, then lto-wrapper finds it and everything works fine. E.g.: mkdir -p /install_gcc/host/libexec/gcc/x86_64-unknown-linux-gnu/accel/intelmic/ cp /install_gcc/accel_intelmic/libexec/gcc/x86_64-unknown-linux/4.10.0/accel/x86_64-unknown-linux-gnu/mkoffload /install_gcc/host/libexec/gcc/x86_64-unknown-linux-gnu/accel/intelmic/ But what was your idea of how to tell host gcc about the path to mkoffload? Thanks, -- Ilya
Re: [PATCH] PR54555: Use strict_low_part for loading a constant only if it is cheaper
On 06/17/14 01:47, Andreas Schwab wrote: Postreload may transform (set (REGX) (CONST_INT A)) ... (set (REGX) (CONST_INT B)) to (set (REGX) (CONST_INT A)) ... (set (STRICT_LOW_PART (REGX)) (CONST_INT B)), but it should do that only if the latter is cheaper. On m68k, a full word load of a small constant with moveq is cheaper than doing a byte load with move.b. Tested on m68k-suse-linux and x86_64-suse-linux. In both cases the size of cc1* becomes smaller with this change. Andreas. PR rtl-optimization/54555 * postreload.c (move2add_use_add2_insn): Only substitute STRICT_LOW_PART if it is cheaper. Sadly, Kazu didn't add a testcase for the H8/300 cases which inspired his change, so we don't know if your patch hurts the H8/300 port or not. Let's do better this time ;-) Add a testcase for the m68k port which verifies we're getting the desired code. I don't care if you test the assembly code or test the RTL dumps, just that we have a test for the case where STRICT_LOW_PART is not a win. With a testcase, this is approved. Thanks, jeff
Re: [PATCH, Cilk+, PR57541] Additional fix for issues with array notations
On 06/16/14 14:13, Zamyatin, Igor wrote:
Hi All! The patch fixes ICE in array notation for the cases of incorrect arguments of Cilk+ builtins and undeclared initial index. Is it ok for trunk and 4.9?

Thanks, Igor

diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
index 54d0de7..56e1b0b 100644
--- a/gcc/c/ChangeLog
+++ b/gcc/c/ChangeLog
@@ -1,3 +1,12 @@
+2014-06-16 Igor Zamyatin igor.zamya...@intel.com
+
+ PR middle-end/57541
+ * c-array-notation.c (fix_builtin_array_notation_fn):
+ Check for 0 arguments in builtin call. Check that builtin argument is
+ correct.
+ * c-parser.c (c_parser_array_notation): Check for incorrect initial
+ index.

Shouldn't this have been caught earlier? ISTM we should be catching any argument mix-ups during parsing?!? Is there some reason we don't do that?

jeff
Re: [PATCH, PR 61211] Fix a bug in clone_of_p verification
Ping.

Thanks, Martin

On Sat, May 31, 2014 at 12:46:03AM +0200, Martin Jambor wrote:
Hi, after a clone is materialized, its clone_of field is cleared which in PR 61211 leads to a failure in the skipped_thunk path in clone_of_p in cgraph.c, which then leads to a false positive verification failure. Fixed by the following patch. Bootstrapped and tested on x86_64-linux on both the trunk and the 4.9 branch. OK for both?

Thanks, Martin

2014-05-30 Martin Jambor mjam...@suse.cz

PR ipa/61211
* cgraph.c (clone_of_p): Allow skipped_branch to deal with expanded clones.

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index ff65b86..f18f977 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2566,11 +2566,16 @@ clone_of_p (struct cgraph_node *node, struct cgraph_node *node2)
       skipped_thunk = true;
     }
 
-  if (skipped_thunk
-      && (!node2->clone_of
-          || !node2->clone.args_to_skip
-          || !bitmap_bit_p (node2->clone.args_to_skip, 0)))
-    return false;
+  if (skipped_thunk)
+    {
+      if (!node2->clone.args_to_skip
+          || !bitmap_bit_p (node2->clone.args_to_skip, 0))
+        return false;
+      if (node2->former_clone_of == node->decl)
+        return true;
+      else if (!node2->clone_of)
+        return false;
+    }
 
   while (node != node2 && node2)
     node2 = node2->clone_of;
Re: [patch libatomic]: Add basic support for mingw targets
On 06/16/14 07:20, Kai Tietz wrote: Hello, this patch adds basic support for libatomic for mingw targets using win32 and for mingw targets using posix threading model. The win32 implementation might need a critical section for the initialization of mutexes. If an issue occurs, we can still add that. For now all testcases are passing for native and posix-threading model mingw (32-bit and 64-bit). ChangeLog 2014-06-16 Kai Tietz kti...@redhat.com * Makefile.am (libatomic_la_LDFLAGS): Add lt_host_flags. Isn't this all target stuff, in which case lt_host_flags seems inappropriate? Or is this just poorly named? The rest seems reasonable. So we just need to settle that nit and we can go forward. jeff
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 06/17/2014 08:20 PM, Ilya Verbin wrote: Hello Bernd, On 28 Feb 17:21, Bernd Schmidt wrote: For your use case, I'd imagine the offload compiler would be built relatively normally as a full build with --enable-as-accelerator-for=x86_64-linux, which would install it into locations where the host will eventually be able to find it. Then the host compiler would be built with another new configure option (as yet unimplemented in my patch set) --enable-offload-targets=mic,... which would tell the host compiler about the pre-built offload target compilers. On the ptx I don't get this part of the plan. Where a host compiler will look for mkoffloads? E.g., first I configure/make/install the target gcc and corresponding mkoffload with the following options: --enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-unknown-linux --prefix=/install_gcc/accel_intelmic Next I configure/make/install the host gcc with: --enable-accelerator=intelmic --prefix=/install_gcc/host Try using the same prefix for both. Bernd
Re: [patch i386]: Combine memory and indirect jump
On 06/13/14 10:59, Kai Tietz wrote: 2014-06-13 17:58 GMT+02:00 Jeff Law l...@redhat.com: On 06/13/14 09:56, Richard Henderson wrote: On 06/13/2014 08:36 AM, Jeff Law wrote: So you may have answered this already, but why can't this be a combiner pattern? Until pass_duplicate_computed_gotos, we (intentionally) have a single indirect branch in the entire function. This vastly reduces the size of the CFG. Ah, the factoring bits. Should have known. Peep2 is currently running before d_c_g, so currently Kai can't solve this problem in peep2. I don't think peep2 should run after sched2, but I'll bet we can reorder things a bit so that d_c_g runs before peep2. Yea, seems worth a try. jeff Well, I tested to put the second sched2 pass before the sched2 pass. That works in general. There are just some opportunities which weren't caught then. I attached a sample, which demonstrates that pretty well. I noticed that I had to put that pass behind reload blocks was necessary for better hit-rate of the peephole optimization. So can you tell us why this sample code misses opportunities? Otherwise we have to dig into it ourselves to tease out that information. I think we're zeroing in on a path to move d_c_g before peep2, but I'd like to have a clearer understanding of why we'd still be missing opportunities. If we can avoid running peep2 twice, that'd be good. jeff
Re: [Patch, Fortran] Add coarray communication support to the trunk (coindex variables)
Dear Tobias and Alessandro, Well what can I say? The patch is something of a tour de force! Sandro, this is absolutely wonderful. Many thanks from all of us. I have done nothing to check the functionality of the patch. However, I have checked the conformance with coding standards and that it is well and truly insulated from the rest of gfortran by the coarray option. OK for trunk. Once again many thanks for the patch. Paul On 17 June 2014 08:28, Tobias Burnus bur...@net-b.de wrote: This patch adds the first coarray communication support to the trunk (ignoring the co_sum/co_min/co_max support, which was recently merged). [Note: In terms of the library this is still libcaf_single, but see below.] The patch is based on my work on the fortran-caf branch, but has a slightly modified ABI. The patch should support most communications, but it is not complete. I intend to submit soon a patch which irons out some wrinkles. In particular, this patch adds three library calls to handle coindexed communication: Assignment to a coindex variable (caf_send), a coindexed expression (caf_expression) and assigning a coindexed variable to a coindexed variable (caf_sendget). The coarray is identified by a token (opaque object provided by the coarray library), an offset to that base address, an image index and an array descriptor for the coarray, which is also used for scalars – and which has the value of the whole array for vector subscripts. Additionally, one passes a kind variable as extra argument as the current array descriptor cannot distinguish a len=1 kind=4 from a len=4 kind=1 character string. And for vector subscripts, the subscripts are passed as additional argument. For assignments, the library is supposed to handle padding/trimming of strings and type conversion (e.g. cmplx_caf(:)[i] = int_array) as well as array = scalar assignments.
The following is left to be done as follow up:
* Support of vector subscripts with assumed-size variables: To be tested; might need the new array descriptor or some similar work around – or just a test case.
* The library libcaf_single supports padding/trimming of strings but still lacks the support for type conversion and vector subscripts.
* Adding an ABI documentation
* There are still some issues with regards to polymorphic coarrays, in particular with passing them as dummy arguments and in ASSOCIATE/SELECT TYPE, but presumably also with using them in coindexed expressions.

And as bigger item: Allocatable components of coarrays are not supported – nor is the access to pointer or allocatable components (part refs); currently, there is no compile time diagnostic for it.

Additionally, I have removed the vector subscript preparations from the co_sum/min/max as it does not make much sense for those. And I added a collective test case, which I found on my hard disk.

Build and regtested. OK for the trunk?

Tobias

PS: Additional missing bits, not listed above: Locking and CRITICAL and atomics for Fortran 2008. And for TS18508 co_broadcast and co_reduce, the atomics extensions, teams, events and error recovery.

-- The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy
Re: [patch libatomic]: Add basic support for mingw targets
2014-06-17 21:16 GMT+02:00 Jeff Law l...@redhat.com: On 06/16/14 07:20, Kai Tietz wrote: Hello, this patch adds basic support for libatomic for mingw targets using win32 and for mingw targets using posix threading model. The win32 implementation might need a critical section for the initialization of mutexes. If an issue occurs, we can still add that. For now all testcases are passing for native and posix-threading model mingw (32-bit and 64-bit). ChangeLog 2014-06-16 Kai Tietz kti...@redhat.com * Makefile.am (libatomic_la_LDFLAGS): Add lt_host_flags. Isn't this all target stuff, in which case lt_host_flags seems inappropriate? Or is this just poorly named? Hmm, libatomic is built here for a new host (meaning it is a gcc-target library). So it might be named poorly. Nevertheless, see ACX_LT_HOST_FLAGS in config/lthostflags.m4 for details and for why it is required to set -no-undefined and the proper bindir for cygwin/mingw. The rest seems reasonable. So we just need to settle that nit and we can go forward. jeff Kai
[PATCH, rs6000] Remove XFAIL from default_format_denormal_2.f90 for PowerPC on Linux
Hi,

The testcase gfortran.dg/default_format_denormal_2.f90 has been reporting XPASS since 4.8 on the powerpc*-unknown-linux-gnu platforms. This patch removes the XFAIL for powerpc*-*-linux-* from the test. I believe this pattern doesn't match any other platforms, but please let me know if I should replace it with a more specific pattern instead.

Verified on powerpc64-unknown-linux-gnu (-m32 and -m64) and powerpc64le-unknown-linux-gnu (-m64). Is this ok for trunk, 4.9, and 4.8?

Thanks, Bill

2014-06-17 Bill Schmidt wschm...@linux.vnet.ibm.com

* gfortran.dg/default_format_denormal_2.f90: Remove xfail for powerpc*-*-linux*.

Index: gcc/testsuite/gfortran.dg/default_format_denormal_2.f90
===================================================================
--- gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (revision 211741)
+++ gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (working copy)
@@ -1,5 +1,5 @@
 ! { dg-require-effective-target fortran_large_real }
-! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } }
+! { dg-do run { xfail powerpc*-apple-darwin* } }
 ! Test XFAILed on these platforms because the system's printf() lacks
 ! proper support for denormalized long doubles. See PR24685
 !
[Fortran-dev] Merge from the trunk
Dear all, I have now updated the fortran-dev branch to trunk version Rev. 211744. Committed as Rev. 211745. Tobias
Re: [PATCH, PR 61160] Artificial thunks need combined_args_to_skip
Hi, Ping.

Thanks, Martin

On Sat, May 31, 2014 at 01:08:31AM +0200, Martin Jambor wrote:
Hi, the second issue in PR 61160 is that because artificial thunks (produced by duplicate_thunk_for_node) do not have combined_args_to_skip, calls to them do not get actual arguments removed, while the actual functions do lose their formal parameters, leading to mismatches.

Currently, the combined_args_to_skip is computed in cgraph_create_virtual_clone only after all the edge redirection and thunk duplication is done so it had to be moved to a spot before that. Since we already pass args_to_skip to cgraph_clone_node, I moved the computation there (otherwise it would have to duplicate the old value and also pass the new one to the redirection routine).

I have also noticed that the code producing combined_args_to_skip from an old value and new args_to_skip cannot work in LTO because we do not have DECL_ARGUMENTS available at WPA in LTO. The wrong code is however never executed and so I replaced it with a simple bitmap_ior. This changes the semantics of args_to_skip for any user of cgraph_create_virtual_clone that would like to remove some parameters from something which is already a clone. However, currently there are no such users and the new semantics is saner because WPA code will be happier using the old indices rather than remapping everything the whole time.

I am still in the process of bootstrapping and testing this patch on trunk, I will test it on the 4.9 branch too. OK if it passes everywhere?

Thanks, Martin

2014-05-29 Martin Jambor mjam...@suse.cz

PR ipa/61160
* cgraphclones.c (duplicate_thunk_for_node): Removed parameter args_to_skip, use those from node instead. Copy args_to_skip and combined_args_to_skip from node to the new thunk.
(redirect_edge_duplicating_thunks): Removed parameter args_to_skip.
(cgraph_create_virtual_clone): Moved computation of combined_args_to_skip...
(cgraph_clone_node): ...here, simplify it to bitmap_ior.

testsuite/
* g++.dg/ipa/pr61160-2.C: New test.
* g++.dg/ipa/pr61160-3.C: Likewise.

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index 4387b99..91cc13c 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -301,14 +301,13 @@ set_new_clone_decl_and_node_flags (cgraph_node *new_node)
    thunk is this_adjusting but we are removing this parameter.  */
 
 static cgraph_node *
-duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
-                          bitmap args_to_skip)
+duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node)
 {
   cgraph_node *new_thunk, *thunk_of;
   thunk_of = cgraph_function_or_thunk_node (thunk->callees->callee);
 
   if (thunk_of->thunk.thunk_p)
-    node = duplicate_thunk_for_node (thunk_of, node, args_to_skip);
+    node = duplicate_thunk_for_node (thunk_of, node);
 
   struct cgraph_edge *cs;
   for (cs = node->callers; cs; cs = cs->next_caller)
@@ -320,17 +319,18 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
       return cs->caller;
 
   tree new_decl;
-  if (!args_to_skip)
+  if (!node->clone.args_to_skip)
     new_decl = copy_node (thunk->decl);
   else
     {
       /* We do not need to duplicate this_adjusting thunks if we have removed
          this.  */
       if (thunk->thunk.this_adjusting
-          && bitmap_bit_p (args_to_skip, 0))
+          && bitmap_bit_p (node->clone.args_to_skip, 0))
         return node;
 
-      new_decl = build_function_decl_skip_args (thunk->decl, args_to_skip,
+      new_decl = build_function_decl_skip_args (thunk->decl,
+                                                node->clone.args_to_skip,
                                                 false);
     }
   gcc_checking_assert (!DECL_STRUCT_FUNCTION (new_decl));
@@ -348,6 +348,8 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
   new_thunk->thunk = thunk->thunk;
   new_thunk->unique_name = in_lto_p;
   new_thunk->former_clone_of = thunk->decl;
+  new_thunk->clone.args_to_skip = node->clone.args_to_skip;
+  new_thunk->clone.combined_args_to_skip = node->clone.combined_args_to_skip;
 
   struct cgraph_edge *e = cgraph_create_edge (new_thunk, node, NULL, 0,
                                               CGRAPH_FREQ_BASE);
@@ -364,12 +366,11 @@ duplicate_thunk_for_node (cgraph_node *thunk, cgraph_node *node,
    chain.  */
 
 void
-redirect_edge_duplicating_thunks (struct cgraph_edge *e, struct cgraph_node *n,
-                                  bitmap args_to_skip)
+redirect_edge_duplicating_thunks (struct cgraph_edge *e, struct cgraph_node *n)
 {
   cgraph_node *orig_to = cgraph_function_or_thunk_node (e->callee);
   if (orig_to->thunk.thunk_p)
-    n = duplicate_thunk_for_node (orig_to, n, args_to_skip);
+    n = duplicate_thunk_for_node (orig_to, n);
 
   cgraph_redirect_edge_callee (e, n);
 }
@@ -422,9 +423,21 @@
Re: [PATCH 1/5] New Identical Code Folding IPA pass
On 06/13/14 04:24, mliska wrote: You may ask, why the GNU GCC does need such a new optimization. The compiler, having simply better knowledge of a compiled source file, is capable of reaching better results, especially if Link-Time optimization is enabled. Apart from that, GCC implementation adds support for read-only variables like construction vtables (mentioned in: http://hubicka.blogspot.cz/2014/02/devirtualization-in-c-part-3-building.html). Can you outline at a high level cases where GCC's knowledge allows it to reach a better result? Is it because you're not requiring bit for bit identical code, but that the code merely be semantically equivalent? The GCC driven ICF seems to pick up 2X more opportunities than the gold driven ICF. But if I'm reading everything correctly, that includes ICF of both functions and variables. Do you have any sense of how those improvements break down? I.e., is it mostly more functions you're finding as identical, and if so, what is it about the GCC implementation that allows us to find more ICF opportunities? If it's mostly variables, that's fine too. I'm just trying to understand where the improvements are coming from. Jeff
Re: [PATCH 4/5] Existing tests fix
On 06/13/14 04:48, mliska wrote: Hi, many tests rely on a precise number of scanned functions in a dump file. If IPA ICF decides to merge some function and(or) read-only variables, counts do not match. Martin Changelog: 2014-06-13 Martin Liska mli...@suse.cz Honza Hubicka hubi...@ucw.cz * c-c++-common/rotate-1.c: Text * c-c++-common/rotate-2.c: New test. * c-c++-common/rotate-3.c: Likewise. * c-c++-common/rotate-4.c: Likewise. * g++.dg/cpp0x/rv-return.C: Likewise. * g++.dg/cpp0x/rv1n.C: Likewise. * g++.dg/cpp0x/rv1p.C: Likewise. * g++.dg/cpp0x/rv2n.C: Likewise. * g++.dg/cpp0x/rv3n.C: Likewise. * g++.dg/cpp0x/rv4n.C: Likewise. * g++.dg/cpp0x/rv5n.C: Likewise. * g++.dg/cpp0x/rv6n.C: Likewise. * g++.dg/cpp0x/rv7n.C: Likewise. * gcc.dg/ipa/ipacost-1.c: Likewise. * gcc.dg/ipa/ipacost-2.c: Likewise. * gcc.dg/ipa/ipcp-agg-6.c: Likewise. * gcc.dg/ipa/remref-2a.c: Likewise. * gcc.dg/ipa/remref-2b.c: Likewise. * gcc.dg/pr46309-2.c: Likewise. * gcc.dg/torture/ipa-pta-1.c: Likewise. * gcc.dg/tree-ssa/andor-3.c: Likewise. * gcc.dg/tree-ssa/andor-4.c: Likewise. * gcc.dg/tree-ssa/andor-5.c: Likewise. * gcc.dg/vect/no-vfa-pr29145.c: Likewise. * gcc.dg/vect/vect-cond-10.c: Likewise. * gcc.dg/vect/vect-cond-9.c: Likewise. * gcc.dg/vect/vect-widen-mult-const-s16.c: Likewise. * gcc.dg/vect/vect-widen-mult-const-u16.c: Likewise. * gcc.dg/vect/vect-widen-mult-half-u8.c: Likewise. * gcc.target/i386/bmi-1.c: Likewise. * gcc.target/i386/bmi-2.c: Likewise. * gcc.target/i386/pr56564-2.c: Likewise. * g++.dg/opt/pr30965.C: Likewise. * g++.dg/tree-ssa/pr19637.C: Likewise. * gcc.dg/guality/csttest.c: Likewise. * gcc.dg/ipa/iinline-4.c: Likewise. * gcc.dg/ipa/iinline-7.c: Likewise. * gcc.dg/ipa/ipa-pta-13.c: Likewise. I know this is the least interesting part of your changes, but it's also simple and mechanical and thus trivial to review. Approved, but obviously don't install until the rest of your patch has been approved. 
Similar changes for recently added tests or cases where you might improve ICF requiring similar tweaks to existing tests are pre-approved as well. jeff
Re: [PATCH 5/5] New tests introduction
On 06/13/14 05:16, mliska wrote:
Hi, this is a new collection of tests for IPA ICF pass.

Martin

Changelog:

2014-06-13 Martin Liska mli...@suse.cz
Honza Hubicka hubi...@ucw.cz

* gcc/testsuite/g++.dg/ipa/ipa-se-1.C: New test.
* gcc/testsuite/g++.dg/ipa/ipa-se-2.C: Likewise.
* gcc/testsuite/g++.dg/ipa/ipa-se-3.C: Likewise.
* gcc/testsuite/g++.dg/ipa/ipa-se-4.C: Likewise.
* gcc/testsuite/g++.dg/ipa/ipa-se-5.C: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-1.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-10.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-11.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-12.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-13.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-14.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-15.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-16.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-17.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-18.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-19.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-2.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-20.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-21.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-22.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-23.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-24.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-25.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-26.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-27.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-28.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-3.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-4.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-5.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-6.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-7.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-8.c: Likewise.
* gcc/testsuite/gcc.dg/ipa/ipa-se-9.c: Likewise.

Also approved, but please don't install until the entire kit is approved. I'd like to applaud you and Jan for including a nice baseline of tests.

jeff
Re: [PATCH 2/5] Existing call graph infrastructure enhancement
On 06/13/14 04:26, mliska wrote: Hi, this small patch prepares remaining needed infrastructure for the new pass. Changelog: 2014-06-13 Martin Liska mli...@suse.cz Honza Hubicka hubi...@ucw.cz * ipa-utils.h (polymorphic_type_binfo_p): Function marked external instead of static. * ipa-devirt.c (polymorphic_type_binfo_p): Likewise. * ipa-prop.h (count_formal_params): Likewise. * ipa-prop.c (count_formal_params): Likewise. * ipa-utils.c (ipa_merge_profiles): Be more tolerant if we merge profiles for semantically equivalent functions. * passes.c (do_per_function): If we load body of a function during WPA, this condition should behave same. * varpool.c (ctor_for_folding): More tolerant assert for variable aliases created during WPA. Presumably we don't have any useful way to merge the cases where we have profiles for both SRC and DST in ipa_merge_profiles or even to guess which is more useful when presented with both? Does it make sense to log this into a debugging file when we drop one? I think this patch is fine. If adding logging makes sense, then feel free to do so and consider that trivial change pre-approved. Jeff
Re: [PATCH 1/5] New Identical Code Folding IPA pass
Hi, On 13/06/14 12:24, mliska wrote: The optimization is inspired by Microsoft /OPT:ICF optimization (http://msdn.microsoft.com/en-us/library/bxwfs976.aspx) that merges COMDAT sections with each function residing in a separate section. In terms of C++ testcases, I'm wondering if you already double checked that the new pass already does well on the typical examples on which, I was told, the Microsoft optimization is known to do well, e.g., code instantiating std::vector for different pointer types, or even long and long long on x86_64-linux, things like that. Thanks, Paolo.
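The kind of candidate mentioned above can be illustrated in plain C. This is a sketch with made-up function names: the two pairs differ only in types that share a representation on an LP64 target such as x86_64-linux, so they would be expected to compile to the same machine code there. Whether any particular ICF implementation actually merges them is not claimed here.

```c
#include <assert.h>
#include <stddef.h>

/* On LP64 targets long and long long are both 64 bits wide, so these two
   functions differ only in their declared types.  */
long sum_long(const long *v, size_t n)
{
    long total = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

long long sum_long_long(const long long *v, size_t n)
{
    long long total = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The same search over arrays of two different pointer types, in the
   spirit of a container instantiated for several pointer types.
   Both return n when the key is not found.  */
size_t index_of_int_ptr(int *const *v, size_t n, const int *key)
{
    size_t i;
    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (v[i] == key)
            break;
    return i;
}

size_t index_of_char_ptr(char *const *v, size_t n, const char *key)
{
    size_t i;
    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (v[i] == key)
            break;
    return i;
}
```

A source-level comparison would reject both pairs because the types differ; a comparison after lowering has a chance of treating them as equivalent.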
Re: [patch libatomic]: Add basic support for mingw targets
On 06/17/14 13:31, Kai Tietz wrote: 2014-06-17 21:16 GMT+02:00 Jeff Law l...@redhat.com: On 06/16/14 07:20, Kai Tietz wrote: Hello, this patch adds basic support for libatomic for mingw targets using win32 and for mingw targets using posix threading model. The win32 implementation might need a critical section for the initialization of mutexes. If an issue occurs, we can still add that. For now all testcases are passing for native and posix-threading model mingw (32-bit and 64-bit). ChangeLog 2014-06-16 Kai Tietz kti...@redhat.com * Makefile.am (libatomic_la_LDFLAGS): Add lt_host_flags. Isn't this all target stuff, in which case lt_host_flags seems inappropriate? Or is this just poorly named? Hmm, libatomic is built here for a new host (meaning it is a gcc-target library). So it might be named poorly. Nevertheless, see ACX_LT_HOST_FLAGS in config/lthostflags.m4 for details and for why it is required to set -no-undefined and the proper bindir for cygwin/mingw. Right, I'm aware that libatomic is a target library. What I'm worried about is confusion due to using ACX_LT_HOST_FLAGS and possible pollution from flags originally meant for the host being used for the target library build. Given that several other libraries use similar constraints to get lt_host_flags into LDFLAGS, I guess pollution isn't (or, better stated, hasn't been) an issue. Approved. Jeff
Re: [PATCH 1/5] New Identical Code Folding IPA pass
On Fri, 2014-06-13 at 12:24 +0200, mliska wrote:
[...snip...]
Statistics about the pass:
Inkscape: 11.95 MB -> 11.44 MB (-4.27%)
Firefox: 70.12 MB -> 70.12 MB (-3.07%)

FWIW, you wrote 70.12 MB here for both before and after for Firefox, but give a -3.07% change, which seems like a typo. A 3.07% reduction from 70.12 MB would be 67.97 MB; was this what the pass achieved?

[...snip...]

Thanks (nice patch, btw)
Dave
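The consistency check in the message above is simple arithmetic and can be reproduced in a few lines of C. The helper names are made up for illustration; the figures are the ones quoted in the thread.

```c
#include <assert.h>

/* Relative size change in percent; negative means the binary shrank.  */
double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Size obtained by applying a percentage change to a starting size.  */
double apply_percent_change(double before, double percent)
{
    return before * (1.0 + percent / 100.0);
}
```

With these helpers, percent_change(11.95, 11.44) comes out at roughly -4.27, matching the Inkscape line, while apply_percent_change(70.12, -3.07) gives roughly 67.97, the size the Firefox line would need for its percentage to hold.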
Re: [PATCH, Pointer Bounds Checker 28/x] IPA CP
On 06/17/14 07:41, Martin Jambor wrote: Hi, On Wed, Jun 11, 2014 at 05:47:36PM +0400, Ilya Enkovich wrote: Here is fixed version. I'm fine with the ipa-cp hunks but I cannot approve them, Honza is the right person to ask. I'll step in and say these bits are fine :-) Thanks for the reviews Martin. Ilya, please hold off installing until all the patches are approved. We're obviously trying to keep up with them as they come in. jeff
Re: [PATCH][genattrtab] Fix memory corruption, allocate enough memory for all bypassed reservations
On 06/17/14 02:12, Kyrill Tkachov wrote: On 16/06/14 17:39, Jeff Law wrote: On 06/16/14 04:12, Kyrill Tkachov wrote: Doh, you're right. I did consider it but for some reason thought we might want to iterate over all of the bypasses anyway. Breaking out seems good. How about this? Tested on arm and aarch64 and confirmed with valgrind that no out of bounds accesses occur. I kicked off an x86_64 bootstrap but don't expect any problems. Thanks, Kyrill genattrtab-bypasses.patch commit 676b85f7a7cc1446482334dcaad457ac328875a8 Author: Kyrylo Tkachovkyrylo.tkac...@arm.com Date: Fri Jun 13 11:09:57 2014 +0100 [genattrtab] Fix memory corruption with bypasses I'm an idiot. n_bypassed is used to size the vector, so you do have to walk the entire list. AFAICS in the loop in process_bypasses we want to count all the reservations which have a bypass matching them. Once a reservation is matched with a bypass it should be safe to break out of the inner loop (over the bypasses), even if two bypasses match a reservation we only want to count the reservation once. So I think the 2nd version of the patch is good OK. Approved. jeff
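The counting scheme Kyrill describes can be sketched as follows. This is not the genattrtab code: the function name and the plain string arrays are hypothetical, and exact name comparison with strcmp stands in for whatever matching process_bypasses really performs. It only illustrates walking every reservation while leaving the inner loop at the first matching bypass, so that each reservation is counted once.

```c
#include <stddef.h>
#include <string.h>

/* Count the reservations that are matched by at least one bypass.
   The outer loop must visit every reservation, because the result is
   used to size a vector.  The inner loop may stop at the first match.  */
size_t
count_bypassed_reservations (const char *const *reservations, size_t n_res,
                             const char *const *bypasses, size_t n_byp)
{
  size_t n_bypassed = 0;

  for (size_t r = 0; r < n_res; r++)
    for (size_t b = 0; b < n_byp; b++)
      if (strcmp (reservations[r], bypasses[b]) == 0)
        {
          /* Count this reservation once, even if more bypasses match.  */
          n_bypassed++;
          break;
        }

  return n_bypassed;
}
```

Without the break, a reservation matched by two bypasses would be counted twice and the vector would merely be oversized; the memory corruption discussed in the thread came from the opposite situation, a count that was too small.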
Re: [PING][PATCH, trunk, 4.9, 4.8] Fix PR57653, filename information discarded when using -imacros
On 06/11/14 15:15, Peter Bergner wrote: I'd like to ping the following patch that fixes PR57653. This did bootstrap and regtest with no regressions on powerpc64-linux. https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01571.html Is this ok for trunk, 4.9 and 4.8? Whee, fun. So this led me to an interesting exchange between Per & DJ on some of Per's changes in this space. Sadly, it doesn't look like Per checked in any tests for the problems DJ was running into. I hate to ask, Peter, but can you add some testcases? These messages have the originals which led to the unsightly code we have now. https://gcc.gnu.org/ml/gcc-patches/2003-10/msg02694.html https://gcc.gnu.org/ml/gcc-patches/2003-11/msg00163.html I know 57653's problem is specific to the stdc-predef that's included in glibc-2.17 and later, but that's becoming relatively common at this point. I think c#2 has the testcase. Approved with the tests added. Thanks and sorry for the delay. Jeff
Re: [patch i386]: Combine memory and indirect jump
2014-06-17 21:26 GMT+02:00 Jeff Law l...@redhat.com: On 06/13/14 10:59, Kai Tietz wrote: 2014-06-13 17:58 GMT+02:00 Jeff Law l...@redhat.com: On 06/13/14 09:56, Richard Henderson wrote: On 06/13/2014 08:36 AM, Jeff Law wrote: So you may have answered this already, but why can't this be a combiner pattern? Until pass_duplicate_computed_gotos, we (intentionally) have a single indirect branch in the entire function. This vastly reduces the size of the CFG. Ah, the factoring bits. Should have known. Peep2 is currently running before d_c_g, so currently Kai can't solve this problem in peep2. I don't think peep2 should run after sched2, but I'll bet we can reorder things a bit so that d_c_g runs before peep2. Yea, seems worth a try. jeff Well, I tested putting the second sched2 pass before the sched2 pass. That works in general. There are just some opportunities which weren't caught then. I attached a sample, which demonstrates that pretty well. I noticed that putting that pass behind the reload blocks was necessary for a better hit-rate of the peephole optimization. So can you tell us why this sample code misses opportunities? Otherwise we have to dig into it ourselves to tease out that information. I think we're zeroing in on a path to move d_c_g before peep2, but I'd like to have a clearer understanding of why we'd still be missing opportunities. If we can avoid running peep2 twice, that'd be good. jeff Hi Jeff, I just retested my testcase with recent source. I can't reproduce this missed optimization before sched2 pass anymore. I moved the second peephole2 pass just before split_before_sched2 and everything got caught. Removing the first peephole2 pass seems to cause weaker code for impossible pushes, etc. Nevertheless it might be a point to make this new peephole a define_split instead? I admit that this operation isn't a split, nevertheless we would avoid a second peephole pass. Kai
Re: [patch] improve sloc assignment on bind_expr entry/exit code
On 06/11/14 09:02, Olivier Hainque wrote: Hello, For blocks requiring it, the gimplifier generates stack pointer save/restore operations on entry/exit, per:

  gimplify_bind_expr (...)
    if (gimplify_ctxp->save_stack)
      {
        gimple stack_restore;

        /* Save stack on entry and restore it on exit.  Add a try_finally
           block to achieve this.  */
        build_stack_save_restore (&stack_save, &stack_restore);

        gimplify_seq_add_stmt (&cleanup, stack_restore);
      }

    /* Add clobbers for all variables that go out of scope.  */
    ...

There is no specific location assigned to these entry/exit statements so they eventually inherit slocs coming from preceding statements. This is problematic for tools relying on debug info to infer which statements were executed out of execution traces (allowing coverage analysis without code instrumentation). An example of problematic scenario is provided below. The attached patch is a proposal to improve this by propagating start and end of block locations from the block structure to the few gimple statements we generate. It adds an end_locus to the block structure for this purpose, which the Ada front-end knows how to fill already. I verified that it does insert proper .loc directives before the entry/exit code on the example. The patch also bootstraps and regtests fine for languages=all,ada on x86_64-pc-linux-gnu. OK to commit ? Thanks in advance for your feedback, With Kind Regards, Olivier -- 2014-06-11 Olivier Hainque hain...@adacore.com * tree-core.h (tree_block): Add an end_locus field, allowing memorization of the end of block source location. * tree.h (BLOCK_SOURCE_END_LOCATION): New accessor. * gimplify.c (gimplify_bind_expr): Propagate the block start and end source location info we have on the block entry/exit code we generate. OK. I assume y'all will add a suitable test to the Ada testsuite and propagate it into the GCC testsuite in due course? jeff
Re: [PATCH, rs6000] Remove XFAIL from default_format_denormal_2.f90 for PowerPC on Linux
William J. Schmidt wschm...@linux.vnet.ibm.com writes:

Index: gcc/testsuite/gfortran.dg/default_format_denormal_2.f90
===================================================================
--- gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (revision 211741)
+++ gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (working copy)
@@ -1,5 +1,5 @@
 ! { dg-require-effective-target fortran_large_real }
-! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } }
+! { dg-do run { xfail powerpc*-apple-darwin* } }
 ! Test XFAILed on these platforms because the system's printf() lacks
 ! proper support for denormalized long doubles. See PR24685

You should also update the comment: `these platforms' no longer applies.

Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 4/5] Existing tests fix
Jeff Law l...@redhat.com writes: On 06/13/14 04:48, mliska wrote: Hi, many tests rely on a precise number of scanned functions in a dump file. If IPA ICF decides to merge some function and(or) read-only variables, counts do not match. Martin Changelog: 2014-06-13 Martin Liska mli...@suse.cz Honza Hubicka hubi...@ucw.cz * c-c++-common/rotate-1.c: Text ^ Huh? * c-c++-common/rotate-2.c: New test. * c-c++-common/rotate-3.c: Likewise. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH, Pointer Bounds Checker 29/x] Debug info
On 06/11/14 02:50, Ilya Enkovich wrote: Hi, This patch skips all bounds during debug info generation. Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-06-11 Ilya Enkovich ilya.enkov...@intel.com * dbxout.c (dbxout_type): Ignore POINTER_BOUNDS_TYPE. * dwarf2out.c (gen_subprogram_die): Ignore bound args. (gen_type_die_with_usage): Skip pointer bounds. (dwarf2out_global_decl): Likewise. (is_base_type): Support POINTER_BOUNDS_TYPE. (gen_formal_types_die): Skip pointer bounds. (gen_decl_die): Likewise. * var-tracking.c (vt_add_function_parameters): Skip bounds parameters. OK. Note that sdbout might need updating as well. It's used even less than dbxout, but if you can see how to skip bounds in there too, it'd be appreciated. It looks like mingw/cygwin still use sdbout (?!?), so if you need something tested, you can ping Kai Tietz. jeff
Re: [PING*2][PATCH] Extend mode-switching to support toggle (1/2)
On 06/12/14 08:34, Christian Bruel wrote: On 06/11/2014 02:00 PM, Christian Bruel wrote: On 06/11/2014 06:17 AM, Joern Rennecke wrote: Joern, is this new target macro interface OK with you ? Yes, this interface should allow me to do switches between rounding and truncating floating-point modes with an add/subtract immediate. However, the implementation, as posted, doesn't work - it causes memory corruption. It appears to work with the attached amendment patch. Indeed, thanks for pointing out the bad reusing of the aux field between multiple entities. In fact rereading this part of the implementation, I find the allocation of aux*n_entities awkward. A simpler setting in the entity loop to carry the mode directly into eg->aux is possible without array allocation (which also fixes a memory leak by the way). Here is the revised version fixing the aforementioned issue found by Joern on Epiphany. It also simplifies the allocation of the aux edges field to carry the modes. Now that everyone agrees on the interface, is this OK for trunk ? bootstrapped/regtested for X86 and SH4a. thanks, Christian toggle.patch 2014-06-12 Christian Bruelchristian.br...@st.com * mode-switching.c (struct bb_info): Add mode_out, mode_in caches. (make_preds_opaque): Delete. (clear_mode_bit, mode_bit_p, set_mode_bit): New macros. (commit_mode_sets): New function. (optimize_mode_switching): Handle current_mode to mode_switching_emit. Process all modes at once. * basic-block.h (pre_edge_lcm_avs): Declare. * lcm.c (pre_edge_lcm_avs): Renamed from pre_edge_lcm. Call clear_aux_for_edges. Fix comments. (pre_edge_lcm): New wrapper function to call pre_edge_lcm_avs. (pre_edge_rev_lcm): Idem. * config/epiphany/epiphany.c (emit_set_fp_mode): Add prev_mode parameter. * config/epiphany/epiphany-protos.h (emit_set_fp_mode): Idem. * config/epiphany/resolve-sw-modes.c (pass_resolve_sw_modes::execute): Idem. * config/i386/i386.c (ix86_emit_mode_set): Idem. * config/sh/sh.c (sh_emit_mode_set): Likewise. Handle PR toggle. * config/sh/sh.md (toggle_pr): Defined if TARGET_FPU_SINGLE. (fpscr_toggle): Disallow from delay slot. * target.def (emit_mode_set): Add prev_mode parameter. * doc/tm.texi: Regenerate. 2014-06-12 Christian Bruelchristian.br...@st.com * gcc.target/sh/fpchg.c: New test. This is fine for the trunk. Thanks for your patience, Jeff
Re: Bug 61407 - Build errors on latest OS X 10.10 Yosemite with Xcode 6 on GCC 4.8.3
On Jun 17, 2014, at 4:09 AM, Илья Михальцов morph...@gmail.com wrote: This patch fixes gcc build problems on the latest OS X 10.10 SDK beta (see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61407) fix include hack to add:

+#ifndef __has_feature
+#define __has_feature(x) 0
+#endif

So, I’d like to bring this up in the larger context of autoconf, portable code, and what style we’d like for people to write code in. From a darwin .h file in /usr/include:

    #if defined(__has_feature) && defined(__has_attribute)
        #if __has_attribute(deprecated)
            #define DEPRECATED_ATTRIBUTE        __attribute__((deprecated))
            #if __has_feature(attribute_deprecated_with_message)
                #define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated(s)))
            #else
                #define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated))
            #endif
        #else
            #define DEPRECATED_ATTRIBUTE
            #define DEPRECATED_MSG_ATTRIBUTE(s)
        #endif
    #elif defined(__GNUC__) && ((__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 1)))
        #define DEPRECATED_ATTRIBUTE        __attribute__((deprecated))
        #if (__GNUC__ >= 5) || ((__GNUC__ == 4) && (__GNUC_MINOR__ >= 5))
            #define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated(s)))
        #else
            #define DEPRECATED_MSG_ATTRIBUTE(s) __attribute__((deprecated))
        #endif
    #else

I think this serves as a great introduction to the feature and what it is, why it exists and what it attempts to do. In short, give code writers an ability to smell a port via #if and/or if (), and write portable code without using autoconf. Yes, for some truly hard problems, this scheme breaks down, but if gcc and other vendor compilers follow the scheme and define these as appropriate, then users can make use of this scheme instead of autoconf. It was code like #if defined(__GNUC__) that causes clang to lie and say it is gnuc. It does this because the code doesn’t use a fine-grained check for the feature, but rather a coarse-grained check on __GNUC__, which is wrong, as other compilers that are not gcc implement __attribute__ and __attribute__((deprecated)).
http://clang.llvm.org/docs/LanguageExtensions.html has the names for the things that clang defines. In gcc, we could elect to use the same names and define them as appropriate for gcc. I think if gcc did this, then the quoted fix isn’t necessary. Also, if gcc doesn’t want to do this, it is reasonable for the darwin port to so define features, they tend to be large scale and slow moving and monotonic in nature, so the maintenance of them should be low in general. What do people think?
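As a small illustration of the fine-grained style discussed above, the fallback from the quoted fix can be combined with a per-attribute check. This is only a sketch: every EXAMPLE_ name is made up for this note, and it assumes a compiler that either provides __has_attribute as a preprocessor operator or leaves it undefined.

```c
/* Fallbacks for compilers that do not provide the feature-test
   operators, following the pattern of the quoted include fix.  */
#ifndef __has_feature
#define __has_feature(x) 0
#endif
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Ask about the attribute itself rather than about __GNUC__.
   The EXAMPLE_* names are hypothetical.  */
#if __has_attribute(deprecated)
#define EXAMPLE_DEPRECATED __attribute__((deprecated))
#define EXAMPLE_HAS_DEPRECATED 1
#else
#define EXAMPLE_DEPRECATED
#define EXAMPLE_HAS_DEPRECATED 0
#endif

/* A declaration carrying the attribute where it is available.  */
EXAMPLE_DEPRECATED int example_old_entry_point (void);

/* Report at run time which branch the preprocessor took.  */
int
example_has_deprecated (void)
{
  return EXAMPLE_HAS_DEPRECATED;
}
```

On a compiler without the operators the fallbacks make every query answer 0, so the code degrades to plain declarations instead of failing to preprocess, which is the situation the quoted fix addresses.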
Re: [PATCH, loop2_invariant] Pre-check invariants
On 06/11/14 03:35, Zhenqiang Chen wrote: Thanks for the comments. df_live seems redundant. With flag_ira_loop_pressure, the pass will call df_analyze () at the beginning, which can make sure all the DF info is correct. Can we guarantee all DF_... correct without df_analyze ()? They should be fine in this context. +/* Pre-check candidate DEST to skip the one which can not make a valid insn + during move_invariant_reg. SIMPlE is to skip HARD_REGISTER. */ s/SIMPlE/SIMPLE/ + { + /* Multi definitions at this stage, most likely are due to + instruction constrain, which requires both read and write s/constrain/constraints/ Though that doesn't make sense. Constraints don't come into play until much later in the pipeline. Certainly there's been code in the expanders and elsewhere to try and make the code we generate more acceptable to 2-address targets and that's probably what you're really running into. I think the code is fine, but that you need to improve the comment. ISTM that if your primary focus is to filter out read/write operands, then just say that and ignore the constraints or other mechanisms by which we got a read/write pseudo. So I think with those two small comment changes, this patch is OK for the trunk. Please post the final version for archival purposes before checking it in. Thanks, Jeff
Re: [PATCH, i386, Pointer Bounds Checker 17/x] Pointer bounds constants support
On 06/06/14 03:11, Ilya Enkovich wrote: 2014-06-04 10:58 GMT+04:00 Jeff Law l...@redhat.com: On 06/02/14 04:25, Ilya Enkovich wrote: Hi, This patch adds support for pointer bounds constants to be used as DECL_INITIAL for constant bounds (like zero bounds). Bootstrapped and tested on linux-x86_64. Thanks, Ilya -- gcc/ 2014-05-30 Ilya Enkovich ilya.enkov...@intel.com * emit-rtl.c (immed_double_const): Support MODE_POINTER_BOUNDS. (init_emit_once): Build pointer bounds zero constants. * explow.c (trunc_int_for_mode): Likewise. * varpool.c (ctor_for_folding): Do not fold constant bounds vars. * varasm.c (output_constant_pool_2): Support MODE_POINTER_BOUNDS. * config/i386/i386.c (ix86_legitimate_constant_p): Mark bounds constant as not valid. [ ... ] @@ -5875,6 +5876,11 @@ init_emit_once (void) if (STORE_FLAG_VALUE == 1) const_tiny_rtx[1][(int) BImode] = const1_rtx; + for (mode = GET_CLASS_NARROWEST_MODE (MODE_POINTER_BOUNDS); + mode != VOIDmode; + mode = GET_MODE_WIDER_MODE (mode)) +const_tiny_rtx[0][mode] = immed_double_const (0, 0, mode); I'm pretty sure GET_CLASS_NARROWEST_MODE should be taking a class, not a mode as its argument. So something is clearly wrong here... MODE_POINTER_BOUNDS is a class. Modes in this class are BND32mode and BND64mode. Bah. You're right. Approved. jeff
Re: [PATCH, loop2_invariant, 1/2] Check only one register class
On 06/11/14 04:05, Zhenqiang Chen wrote: On 10 June 2014 19:06, Steven Bosscher stevenb@gmail.com wrote: On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote: Hi, For loop2-invariant pass, when flag_ira_loop_pressure is enabled, function gain_for_invariant checks the pressures of all register classes. This does not make sense since one invariant might impact only one register class. The patch enhances functions get_inv_cost and gain_for_invariant to check only the register pressure of the invariant if possible. This patch may work for targets with more-or-less orthogonal reg classes, but not if there is a lot of overlap between reg classes. Yes. I need check the overlap between reg classes. Patch is updated to check all overlap reg classes by reg_classes_intersect_p: Just so I'm sure I know what you're trying to do. You want to map the pseudo back to its likely class(es) then look at how those classes (and only those classes) would be impacted from a register pressure standpoint if the pseudo was hoisted as an invariant? This is primarily achieved by returning the class of the invariant, then filtering out any non-intersecting classes in gain_for_invariant, right? jeff
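The filtering Jeff describes can be pictured with a toy model. Nothing below is taken from loop-invariant.c: the struct, its fields and the cost formula are assumptions made for illustration, with each register class modelled as a bit mask of hard registers and "intersecting" meaning that two masks share a bit (a stand-in for what reg_classes_intersect_p answers in GCC).

```c
#include <stddef.h>
#include <stdint.h>

/* Toy description of a register class (hypothetical layout).  */
struct example_reg_class
{
  uint32_t hard_regs;   /* bit i set: hard register i is in the class */
  int pressure;         /* registers of this class currently live */
  int available;        /* registers of this class usable in the loop */
};

/* Nonzero if the two classes share at least one hard register.  */
static int
example_classes_intersect (const struct example_reg_class *a,
                           const struct example_reg_class *b)
{
  return (a->hard_regs & b->hard_regs) != 0;
}

/* Sum the excess pressure caused by keeping NEW_REGS more registers of
   class INV_CLASS live, looking only at classes that intersect it.  */
int
example_excess_pressure (const struct example_reg_class *classes, size_t n,
                         size_t inv_class, int new_regs)
{
  int excess = 0;

  for (size_t i = 0; i < n; i++)
    {
      int over;

      /* Classes disjoint from the invariant's class are not affected.  */
      if (!example_classes_intersect (&classes[i], &classes[inv_class]))
        continue;

      over = classes[i].pressure + new_regs - classes[i].available;
      if (over > 0)
        excess += over;
    }

  return excess;
}
```

In this model a class that is already over budget but disjoint from the invariant's class no longer counts against hoisting, which is the effect the patch description is after.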
Re: [PATCH] Fortran OpenMP 4.0 target support
Jakub Jelinek wrote: This patch adds the target directives. Tested both normally plus with target.c/splay-tree.c from gomp-4_0-branch@203409 plus the attached patch against target.c to implement the new to_pset map kind (5) and allow handling of NULL. That patch will need to be forward ported to whatever gomp-4_0-branch now has after this is merged from trunk to that branch. Does this look reasonable to Fortran maintainers? Thanks for the patch! I browsed through the patch, and it looked good to me. (However, given that the patch has 48 files changed, 3342 insertions(+), 330 deletions(-), I didn't check every line.) If I did the book keeping correctly, a patch for an alignment test case is still missing. As are the changes for some corner cases for which the OpenMP ARB has to provide some feedback. Any news from that side? Otherwise and aside of 4.9.1 backporting, it now looks pretty complete. Tobias
Re: [PATCH] Fortran OpenMP 4.0 target support
On Tue, Jun 17, 2014 at 11:59:22PM +0200, Tobias Burnus wrote: This patch adds the target directives. Tested both normally plus with target.c/splay-tree.c from gomp-4_0-branch@203409 plus the attached patch against target.c to implement the new to_pset map kind (5) and allow handling of NULL. That patch will need to be forward ported to whatever gomp-4_0-branch now has after this is merged from trunk to that branch. Does this look reasonable to Fortran maintainers? Thanks for the patch! I browsed through the patch, and it looked good to me. (However, given that the patch has 48 files changed, 3342 insertions(+), 330 deletions(-), I didn't check every line.) If I did the book keeping correctly, a patch for an alignment test case is still missing. As are the changes for some corner cases for which the OpenMP ARB has to provide some feedback. Any news from that side? Otherwise and aside of 4.9.1 backporting, it now looks pretty complete. I think some work is needed in tree-nested.c, ideally write a testcase that tests all the new OpenMP 4.0 clauses in contained functions with and without non-local decls (and with local decls used by contained functions). One of the omp-lang answers shows some work is needed on the UDRs too, in particular that the combiner/initializer should not be resolved as part of the UDR directive, but only when used in a reduction clause where not only the typespec, but also rank/shape, pointer/allocatable etc. are known. Some further restriction checking is probably needed + backing that with testcases. And wait for further omp-lang/omp-f2003 feedback. Jakub
Re: [Patch, microblaze]: Added load and store reverse patterns
On 02/10/14 17:55, Michael Eager wrote: On 11/25/13 23:54, David Holsgrove wrote: Added the lwr/swr instructions pattern. lwr and swr instructions will load/store the data with opposite endianness. Changelog 2013-11-26 Nagaraju Mekala nagaraju.mek...@xilinx.com * gcc/config/microblaze/microblaze.md: Add movsi4_rev insn pattern. * gcc/config/microblaze/predicates.md: Add reg_or_mem_operand predicate. GCC-head: Committed revision 207683. GCC-4.8-branch: Committed revision 207684. Reverted GCC-4.8-branch commit. Committed revision 211750. -- Michael Eagerea...@eagercon.com 1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
Re: [PATCH, ARM] MI-thunk fix for TARGET_THUMB1_ONLY
On Sun, Jun 8, 2014 at 12:27 PM, Chung-Lin Tang clt...@codesourcery.com wrote: Hi Richard, Ramana, Attached is a small fix for resolving a g++.old-deja/g++.jason/thunk2.C regression we found under a TARGET_THUMB1_ONLY multilib (-mthumb -march=armv6-m to be exact). Basically under those conditions, the thunk is in Thumb mode, so the subtraction should be 4 rather than 8. Yep, this is OK with a minor change to the comment to make it more explicit. + /* Output .word .LTHUNKn-[37]-.LTHUNKPCn. */ s/37/3,7/ Ok with that change and if no regressions. OK for release branches unless the RM's object in 24 hours. It would be nice to see if we could rewrite the mi thunk code like other backends but that's the matter of a separate patch. Ramana Original patch was by Julian, with trivial adaptations for trunk by me. We've been carrying this fix for a while by now. Okay for trunk? (and stable branches?) Thanks, Chung-Lin 2014-06-08 Julian Brown jul...@codesourcery.com Chung-Lin Tang clt...@codesourcery.com * config/arm/arm.c (arm_output_mi_thunk): Fix offset for TARGET_THUMB1_ONLY. Add comments.
Re: [PATCH, PR61219]: Fix sNaN handling in ARM float to double conversion
On Sun, May 18, 2014 at 10:23 PM, Aurelien Jarno aurel...@aurel32.net wrote: On ARM soft-float, the float to double conversion doesn't convert a sNaN to qNaN as the IEEE Std 754 standard mandates: Under default exception handling, any operation signaling an invalid operation exception and for which a floating-point result is to be delivered shall deliver a quiet NaN. Given the soft float ARM code ignores exceptions and always provides a result, a float to double conversion of a signaling NaN should return a quiet NaN. Fix this in extendsfdf2. 2014-05-18 Aurelien Jarno aurel...@aurel32.net PR target/61219 * config/arm/ieee754-df.S (extendsfdf2): Convert sNaN to qNaN. Ok if no regressions along with a testcase to catch this case please and fixing the PR number Sorry about the slow review. Ramana Index: libgcc/config/arm/ieee754-df.S === --- libgcc/config/arm/ieee754-df.S (revision 210588) +++ libgcc/config/arm/ieee754-df.S (working copy) @@ -473,11 +473,15 @@ eorne xh, xh, #0x3800 @ fixup exponent otherwise. RETc(ne)@ and return it. - teq r2, #0 @ if actually 0 - do_it ne, e - teqne r3, #0xff00 @ or INF or NAN + bicsr2, r2, #0xff00 @ isolate mantissa + do_it eq @ if 0, that is ZERO or INF, RETc(eq)@ we are done already. + teq r3, #0xff00 @ check for NAN + do_it eq, t + orreq xh, xh, #0x0008 @ change to quiet NAN + RETc(eq)@ and return it. + @ value was denormalized. We can normalize it now. do_push {r4, r5, lr} mov r4, #0x380 @ setup corresponding exponent -- Aurelien Jarno GPG: 4096R/1DDD8C9B aurel...@aurel32.net http://www.aurel32.net
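The behaviour the patch is after can be modelled at the bit level in C. This is a sketch and not the libgcc routine: the function name is made up, and only the NaN path is written out by hand; every other input goes through the compiler's own float to double conversion. A single-precision NaN has all exponent bits set and a nonzero mantissa, and it is quieted by setting the top mantissa bit, which is bit 51 of the double.

```c
#include <stdint.h>
#include <string.h>

/* Convert the bit pattern of a float into the bit pattern of the
   corresponding double, turning any NaN into a quiet NaN.  */
uint64_t
example_extend_float_bits (uint32_t in)
{
  uint32_t exponent = (in >> 23) & 0xffu;
  uint32_t mantissa = in & 0x7fffffu;

  if (exponent == 0xffu && mantissa != 0)
    {
      /* NaN: keep sign and payload, force the quiet bit.  */
      uint64_t sign = (uint64_t) (in >> 31) << 63;
      uint64_t payload = (uint64_t) mantissa << 29;
      return sign | (UINT64_C (0x7ff) << 52) | payload | (UINT64_C (1) << 51);
    }

  /* Zero, denormals, normal numbers and infinities: let the compiler
     perform the conversion.  */
  float f;
  double d;
  uint64_t out;
  memcpy (&f, &in, sizeof f);
  d = (double) f;
  memcpy (&out, &d, sizeof out);
  return out;
}
```

For instance the signaling NaN 0x7fa00000 maps to 0x7ffc000000000000: the payload bit moves from bit 21 to bit 50 and the quiet bit 51 is set, the same bit the orreq in the patch sets in the high word.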
Re: [RFC ARM] Error if overriding --with-tune by --with-cpu
On Fri, May 30, 2014 at 5:34 PM, James Greenhalgh james.greenha...@arm.com wrote: Hi, We error in the case where both --with-tune and --with-cpu are specified at configure time. In this case, we cannot distinguish this situation from the situation where --with-tune was specified at configure time and -mcpu was passed on the command line, so we give -mcpu precedence. This might be surprising if you expect the precedence rules we give to the command line options, but we can't change this precedence without breaking our definition of -mcpu. We also promote the warning which used to be thrown in the case of --with-arch and --with-cpu to an error. Ok by me - Especially as Bin has just run into it as part of his testing. Obviously no one watches these warnings and they don't realize what's happening under their feet. I've marked this as an RFC as it isn't clear that configure should be catching something like this. Other blatant errors in configuration options like passing --with-languages=c,c++ pass without event. Well yeah that looks ok. Tested with a few combinations of configure options with no issues and the expected behaviour. Any opinions, and if not, OK for trunk? I am going to give this a week for anyone else to pitch in and object - otherwise please apply it and document this change in behaviour in the caveats section for the next release (changes.html). Ramana Thanks, James --- gcc/ 2014-05-30 James Greenhalgh james.greenha...@arm.com * config.gcc (supported_defaults): Error when passing either --with-tune or --with-arch in conjunction with --with-cpu for ARM.
[PATCH, rs6000] Fix PR61542 - V4SF vector extract for little endian
Hi, As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61542, a new test case (gcc.dg/vect/vect-nop-move.c) was added in 4.9. This exposes a bug on PowerPC little endian for extracting an element from a V4SF value that goes back to 4.8. The following patch fixes the problem. Tested on powerpc64le-unknown-linux-gnu with no regressions. Ok to commit to trunk? I would also like to commit to 4.8 and 4.9 as soon as possible to be picked up by the distros. I would also like to backport gcc.dg/vect/vect-nop-move.c to 4.8 to provide regression coverage. Thanks, Bill 2014-06-17 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/vsx.md (vsx_extract_v4sf): Fix bug with element extraction other than index 3. Index: gcc/config/rs6000/vsx.md === --- gcc/config/rs6000/vsx.md(revision 211741) +++ gcc/config/rs6000/vsx.md(working copy) @@ -1667,7 +1667,7 @@ { if (GET_CODE (op3) == SCRATCH) op3 = gen_reg_rtx (V4SFmode); - emit_insn (gen_vsx_xxsldwi_v4sf (op3, op1, op1, op2)); + emit_insn (gen_vsx_xxsldwi_v4sf (op3, op1, op1, GEN_INT (ele))); tmp = op3; } emit_insn (gen_vsx_xscvspdp_scalar2 (op0, tmp));
Re: [PATCH][ARM] FAIL: gcc.target/arm/pr58041.c scan-assembler ldrb
On Fri, May 30, 2014 at 12:19 AM, Maciej W. Rozycki ma...@codesourcery.com wrote: On Wed, 28 May 2014, Richard Earnshaw wrote: Ah, light dawns (maybe). I guess the problems stem from the attempts to combine Neon with ARMv5. Neon shouldn't be used with anything prior to ARMv7, since that's the earliest version of the architecture that can support it. Good to know, thanks for the hint. Anyway it's the test case doing something silly or maybe just odd. After all IIUC ARMv5 code will run just fine on ARMv7/NEON hardware so mixing up ARMv5 scalar code with NEON vector code is nothing wrong per se. I guess that what is happening is that we see we have Neon, so start to generate a Neon-based copy sequence, but then notice that we don't have misaligned access (something that must exist if we have Neon) and generate VLDR instructions in a mistaken attempt to work around the first inconsistency. Maybe we should tie -mfpu=neon to having at least ARMv7 (though ARMv6 also has misaligned access support). So to move away from the odd mixture of instruction selection options just as a quick test I rebuilt the same file with `-march=armv7-a -mno-unaligned-access' and the result is the same, a pair of VLDR instructions accessing unaligned memory, i.e. the same problem. So based on observations made so far I think there are two sensible ways to move forward: 1. Fix GCC so that a manual byte-wise copy is made whenever `-mno-unaligned-access' is in effect. #1 is the preferrable option. 2. Revert the change being discussed here as its lone purpose was to disable the use of VLD1.8, etc. where `-mno-unaligned-access' is in effect, and it does no good. Reverting this means pr58041 will fail on armv7-a / neon configurations which is what this patch was designed to fix ? So it's not an option is it ? Ramana Maciej
Re: [PATCH] [ARM] [RFC] Fix longstanding push_minipool_fix ICE (PR49423, lp1296601)
On Wed, Apr 2, 2014 at 2:29 PM, Charles Baylis charles.bay...@linaro.org wrote: Hi This patch fixes the push_minipool_fix ICE, which occurs when the ARM backend encounters a zero/sign extending load from a constant pool. I don't have a current test case for trunk, lp1296601 has a test case which affects the linaro-4.8 branch. As far as I know, there has been no fix for this on trunk. The approach taken in this patch is to extend each pattern where this can occur, so that it triggers a define_split to synthesise a constant move instead. Some but not all extend patterns have previously added pool_range attributes to work-around this problem, this patch removes those, and also fixes the remaining patterns. Some patterns have slightly more complex workarounds, which I have not yet analysed, but it seems worth posting the patch at this stage to get feedback on the general approach. Tested on arm-unknown-linux-gnueabihf (qemu), bootstrap in progress. If this looks good, I'll clean it up for a more detailed review. Interesting workaround but can we investigate further how to fix this at the source rather than working around in the backend in this form? It's still a kludge that we carry in the backend rather than fix the problem at its source. I'd rather try to fix the problem at the source rather than working around this in the backend. Ramana Thanks Charles
C++ PATCH for c++/60605 (local function and default template arg)
The exception for local declarations in check_default_tmpl_args needs to handle DECL_LOCAL_FUNCTION_P, too.

Tested x86_64-pc-linux-gnu, applying to 4.8, 4.9, trunk.

commit 424c657e1213126dc5d2a7231abac05e16713286
Author: Jason Merrill ja...@redhat.com
Date: Tue Jun 17 18:43:57 2014 +0200

    PR c++/60605
    * pt.c (check_default_tmpl_args): Check DECL_LOCAL_FUNCTION_P.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 85b46fe..a4e1a59 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -4308,7 +4308,8 @@ check_default_tmpl_args (tree decl, tree parms, bool is_primary,
      in the template-parameter-list of the definition of a member of a
      class template.  */
 
-  if (TREE_CODE (CP_DECL_CONTEXT (decl)) == FUNCTION_DECL)
+  if (TREE_CODE (CP_DECL_CONTEXT (decl)) == FUNCTION_DECL
+      || (TREE_CODE (decl) == FUNCTION_DECL && DECL_LOCAL_FUNCTION_P (decl)))
     /* You can't have a function template declaration in a local
        scope, nor you can you define a member of a class template in a
        local scope.  */
diff --git a/gcc/testsuite/g++.dg/template/local-fn1.C b/gcc/testsuite/g++.dg/template/local-fn1.C
new file mode 100644
index 000..88acd17
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/local-fn1.C
@@ -0,0 +1,8 @@
+// PR c++/60605
+
+template <typename T = int>
+struct Foo {
+    void bar() {
+        void bug();
+    }
+};
[PATCH] PR61517: fix stmt replacement in bswap pass
Hi everybody, Thanks to a comment from Richard Biener, the bswap pass takes care not to perform its optimization if memory is modified between the loads of the original expression. However, when it replaces these statements by a single load, it does so in the gimple statement that computes the final bitwise OR of the original expression. However, memory could be modified between the last load statement and this bitwise OR statement. Therefore the result is to read memory *after* it was changed instead of before. This patch takes care to move the statement to be replaced close to one of the original loads, thus avoiding this problem. ChangeLog entries for this fix are: *** gcc/ChangeLog *** 2014-06-16 Thomas Preud'homme thomas.preudho...@arm.com * tree-ssa-math-opts.c (find_bswap_or_nop_1): Adapt to return a stmt whose rhs's first tree is the source expression instead of the expression itself. (find_bswap_or_nop): Likewise. (bswap_replace): Rename stmt in cur_stmt. Pass gsi by value and src as a gimple stmt whose rhs's first tree is the source. In the memory source case, move the stmt to be replaced close to one of the original loads to avoid the problem of a store between the load and the stmt's original location. (pass_optimize_bswap::execute): Adapt to change in bswap_replace's signature. *** gcc/testsuite/ChangeLog *** 2014-06-16 Thomas Preud'homme thomas.preudho...@arm.com * gcc.c-torture/execute/bswap-2.c (incorrect_read_le32): New. (incorrect_read_be32): Likewise. (main): Call incorrect_read_* to test stmt replacement is made by bswap at the right place. * gcc.c-torture/execute/pr61517.c: New test. Patch also attached for convenience. Is it ok for trunk?
diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
index a47e01a..88132fe 100644
--- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
+++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c
@@ -66,6 +66,32 @@ fake_read_be32 (char *x, char *y)
   return c3 | c2 << 8 | c1 << 16 | c0 << 24;
 }

+__attribute__ ((noinline, noclone)) uint32_t
+incorrect_read_le32 (char *x, char *y)
+{
+  unsigned char c0, c1, c2, c3;
+
+  c0 = x[0];
+  c1 = x[1];
+  c2 = x[2];
+  c3 = x[3];
+  *y = 1;
+  return c0 | c1 << 8 | c2 << 16 | c3 << 24;
+}
+
+__attribute__ ((noinline, noclone)) uint32_t
+incorrect_read_be32 (char *x, char *y)
+{
+  unsigned char c0, c1, c2, c3;
+
+  c0 = x[0];
+  c1 = x[1];
+  c2 = x[2];
+  c3 = x[3];
+  *y = 1;
+  return c3 | c2 << 8 | c1 << 16 | c0 << 24;
+}
+
 int
 main ()
 {
@@ -92,8 +118,17 @@ main ()
   out = fake_read_le32 (cin, &cin[2]);
   if (out != 0x89018583)
     __builtin_abort ();
+  cin[2] = 0x87;
   out = fake_read_be32 (cin, &cin[2]);
   if (out != 0x83850189)
     __builtin_abort ();
+  cin[2] = 0x87;
+  out = incorrect_read_le32 (cin, &cin[2]);
+  if (out != 0x89878583)
+    __builtin_abort ();
+  cin[2] = 0x87;
+  out = incorrect_read_be32 (cin, &cin[2]);
+  if (out != 0x83858789)
+    __builtin_abort ();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61517.c b/gcc/testsuite/gcc.c-torture/execute/pr61517.c
new file mode 100644
index 000..fc9bbe8
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61517.c
@@ -0,0 +1,19 @@
+int a, b, *c = &a;
+unsigned short d;
+
+int
+main ()
+{
+  unsigned int e = a;
+  *c = 1;
+  if (!b)
+    {
+      d = e;
+      *c = d | e;
+    }
+
+  if (a != 0)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index c868e92..1ee2ba8 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -1804,28 +1804,28 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n)

 /* find_bswap_or_nop_1 invokes itself recursively with N and tries to perform
    the operation given by the rhs of STMT on the result.  If the operation
-   could successfully be executed the function returns the tree expression of
-   the source operand and NULL otherwise.  */
+   could successfully be executed the function returns a gimple stmt whose
+   rhs's first tree is the expression of the source operand and NULL
+   otherwise.  */

-static tree
+static gimple
 find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit)
 {
   enum tree_code code;
   tree rhs1, rhs2 = NULL;
-  gimple rhs1_stmt, rhs2_stmt;
-  tree source_expr1;
+  gimple rhs1_stmt, rhs2_stmt, source_stmt1;
   enum gimple_rhs_class rhs_class;

   if (!limit || !is_gimple_assign (stmt))
-    return NULL_TREE;
+    return NULL;

   rhs1 = gimple_assign_rhs1 (stmt);

   if (find_bswap_or_nop_load (stmt, rhs1, n))
-    return rhs1;
+    return stmt;

   if (TREE_CODE (rhs1) != SSA_NAME)
-    return NULL_TREE;
+    return NULL;

   code = gimple_assign_rhs_code (stmt);
   rhs_class = gimple_assign_rhs_class (stmt);
@@ -1848,18 +1848,18 @@ find_bswap_or_nop_1
Re: [PATCH, loop2_invariant, 1/2] Check only one register class
On 18 June 2014 05:49, Jeff Law l...@redhat.com wrote:
> On 06/11/14 04:05, Zhenqiang Chen wrote:
>> On 10 June 2014 19:06, Steven Bosscher stevenb@gmail.com wrote:
>>> On Tue, Jun 10, 2014 at 11:22 AM, Zhenqiang Chen wrote:
>>>> Hi,
>>>>
>>>> For loop2-invariant pass, when flag_ira_loop_pressure is enabled,
>>>> function gain_for_invariant checks the pressures of all register
>>>> classes. This does not make sense since one invariant might impact
>>>> only one register class. The patch enhances functions get_inv_cost
>>>> and gain_for_invariant to check only the register pressure of the
>>>> invariant if possible.
>>>
>>> This patch may work for targets with more-or-less orthogonal reg
>>> classes, but not if there is a lot of overlap between reg classes.
>>
>> Yes. I need to check the overlap between reg classes. Patch is updated
>> to check all overlapping reg classes by reg_classes_intersect_p:
>
> Just so I'm sure I know what you're trying to do. You want to map the
> pseudo back to its likely class(es) then look at how those classes (and
> only those classes) would be impacted from a register pressure
> standpoint if the pseudo was hoisted as an invariant?

Yes.

> This is primarily achieved by returning the class of the invariant, then
> filtering out any non-intersecting classes in gain_for_invariant, right?

Yes. This is what I want to do, since I found that some invariants whose register class is NO_REGS (memory write) or SSE_REGS are blocked by GENERAL_REGS' register pressure.

Thanks!
-Zhenqiang
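The filtering discussed in this thread can be sketched with a toy model. Everything below is hypothetical and is not GCC code: the toy_* names, the four classes, and the bitmask representation are assumptions made for illustration, and toy_classes_intersect_p only imitates the role that reg_classes_intersect_p plays in the updated patch.

```c
#include <stdbool.h>

/* Hypothetical register classes, each modelled as a bitmask of the
   hard registers it contains.  */
enum toy_class
{
  TOY_NO_REGS,
  TOY_GENERAL_REGS,
  TOY_SSE_REGS,
  TOY_ALL_REGS,
  TOY_N_CLASSES
};

static const unsigned toy_class_regs[TOY_N_CLASSES] = {
  0x00, /* TOY_NO_REGS */
  0x0f, /* TOY_GENERAL_REGS */
  0xf0, /* TOY_SSE_REGS */
  0xff  /* TOY_ALL_REGS */
};

/* Two classes intersect if they share at least one hard register.  */
bool
toy_classes_intersect_p (enum toy_class a, enum toy_class b)
{
  return (toy_class_regs[a] & toy_class_regs[b]) != 0;
}

/* Return true if hoisting an invariant of class CL, which needs
   NEW_REGS more registers, stays within AVAILABLE in every class that
   intersects CL.  Classes that do not intersect CL are ignored.  */
bool
toy_hoist_fits_p (enum toy_class cl, const int pressure[],
                  const int available[], int new_regs)
{
  for (int i = 0; i < TOY_N_CLASSES; i++)
    {
      if (!toy_classes_intersect_p (cl, (enum toy_class) i))
        continue;
      if (pressure[i] + new_regs > available[i])
        return false;
    }
  return true;
}
```

In this model, with the general class already at its limit, an invariant in TOY_SSE_REGS or TOY_NO_REGS is still accepted, whereas a check over all classes would have rejected it because of the general-register pressure.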
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
> From: Richard Biener [mailto:richard.guent...@gmail.com]
> Sent: Wednesday, June 11, 2014 4:32 PM
>
> > Is this OK for trunk? Does this bug qualify for a backport patch to
> > 4.8 and 4.9 branches?
>
> This is ok for trunk and also for backporting (after a short while to
> see if there is any fallout).

Below is the backported patch for 4.8/4.9. Is this ok for both 4.8 and 4.9? If yes, how much longer should I wait before committing?

Tested on both 4.8 and 4.9 without regression in the testsuite after a bootstrap.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1e35bbe..0559b7f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,16 @@
+2014-06-12  Thomas Preud'homme  thomas.preudho...@arm.com
+
+        PR tree-optimization/61306
+        * tree-ssa-math-opts.c (struct symbolic_number): Store type of
+        expression instead of its size.
+        (do_shift_rotate): Adapt to change in struct symbolic_number. Return
+        false to prevent optimization when the result is unpredictable due to
+        arithmetic right shift of signed type with highest byte is set.
+        (verify_symbolic_number_p): Adapt to change in struct symbolic_number.
+        (find_bswap_1): Likewise. Return NULL to prevent optimization when the
+        result is unpredictable due to sign extension.
+        (find_bswap): Adapt to change in struct symbolic_number.
+
 2014-06-12  Alan Modra  amo...@gmail.com

         PR target/61300
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 757cb74..139f23c 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,9 @@
+2014-06-12  Thomas Preud'homme  thomas.preudho...@arm.com
+
+        * gcc.c-torture/execute/pr61306-1.c: New test.
+        * gcc.c-torture/execute/pr61306-2.c: Likewise.
+        * gcc.c-torture/execute/pr61306-3.c: Likewise.
+
 2014-06-11  Richard Biener  rguent...@suse.de

         PR tree-optimization/61452
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
new file mode 100644
index 000..ebc90a3
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c
@@ -0,0 +1,39 @@
+#ifdef __INT32_TYPE__
+typedef __INT32_TYPE__ int32_t;
+#else
+typedef int int32_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(                   \
+        (((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |    \
+        (((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |    \
+        (((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |    \
+        (( (int32_t)(x) &  (int32_t)0xff000000UL) >> 24)))
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (int32_t) * __CHAR_BIT__ != 32)
+    return 0;
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+    return 0;
+  if (fake_bswap32 (0x87654321) != 0xffffff87)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
new file mode 100644
index 000..886ecfd
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c
@@ -0,0 +1,40 @@
+#ifdef __INT16_TYPE__
+typedef __INT16_TYPE__ int16_t;
+#else
+typedef short int16_t;
+#endif
+
+#ifdef __UINT32_TYPE__
+typedef __UINT32_TYPE__ uint32_t;
+#else
+typedef unsigned uint32_t;
+#endif
+
+#define __fake_const_swab32(x) ((uint32_t)(                           \
+        (((uint32_t)         (x) & (uint32_t)0x000000ffUL) << 24) |   \
+        (((uint32_t)(int16_t)(x) & (uint32_t)0x00ffff00UL) <<  8) |   \
+        (((uint32_t)         (x) & (uint32_t)0x00ff0000UL) >>  8) |   \
+        (((uint32_t)         (x) & (uint32_t)0xff000000UL) >> 24)))
+
+
+/* Previous version of bswap optimization failed to consider sign extension
+   and as a result would replace an expression *not* doing a bswap by a
+   bswap.  */
+
+__attribute__ ((noinline, noclone)) uint32_t
+fake_bswap32 (uint32_t in)
+{
+  return __fake_const_swab32 (in);
+}
+
+int
+main(void)
+{
+  if (sizeof (uint32_t) * __CHAR_BIT__ != 32)
+    return 0;
+  if (sizeof (int16_t) * __CHAR_BIT__ != 16)
+    return 0;
+  if (fake_bswap32 (0x81828384) != 0xff838281)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
new file mode 100644
index 000..6086e27
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c
@@ -0,0 +1,13 @@
+short a = -1;
+int b;
+char c;
+
+int
+main ()
+{
+  c = a;
+  b = a | c;
+  if (b != -1)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 9ff857c..2b656ae 100644
--- a/gcc/tree-ssa-math-opts.c
+++
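To make the sign-extension issue behind pr61306-1.c concrete, here is a small standalone sketch. It is not part of the patch: the function names are invented for this example, and the helper sar24 is an assumption that emulates an arithmetic right shift in unsigned arithmetic so that the result does not depend on how a given compiler shifts negative values.

```c
#include <stdint.h>

/* A genuine 32-bit byte swap.  */
uint32_t
real_bswap32 (uint32_t x)
{
  return (x & 0x000000ffu) << 24 | (x & 0x0000ff00u) << 8
         | (x & 0x00ff0000u) >> 8 | (x & 0xff000000u) >> 24;
}

/* Emulates an arithmetic right shift by 24 of a 32-bit signed value:
   if the top bit was set, the vacated upper bits are filled with
   ones.  */
static uint32_t
sar24 (uint32_t x)
{
  uint32_t r = x >> 24;
  return (r & 0x80u) ? (r | 0xffffff00u) : r;
}

/* Looks like a byte swap, but the highest byte is moved down with a
   signed shift, so it is sign-extended over the other three bytes.  */
uint32_t
lookalike_bswap32 (uint32_t x)
{
  return (x & 0x000000ffu) << 24 | (x & 0x0000ff00u) << 8
         | (x & 0x00ff0000u) >> 8 | sar24 (x & 0xff000000u);
}
```

For 0x87654321 the genuine swap gives 0x21436587 while the lookalike gives 0xffffff87, so the two only agree when the highest byte has its top bit clear. That is why replacing such an expression with a bswap is not valid in general.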
[PATCH, aarch64] Fix 61545
Trivial fix for missing clobber of the flags over the tlsdesc call.

Ok for all branches?

r~

        * config/aarch64/aarch64.md (tlsdesc_small_PTR): Clobber CC_REGNUM.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a4d8887..1ee2cae 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3855,6 +3855,7 @@
         (unspec:PTR [(match_operand 0 "aarch64_valid_symref" "S")]
                    UNSPEC_TLSDESC))
    (clobber (reg:DI LR_REGNUM))
+   (clobber (reg:CC CC_REGNUM))
    (clobber (match_scratch:DI 1 "=r"))]
   "TARGET_TLS_DESC"
   "adrp\\tx0, %A0\;ldr\\t%w1, [x0, #%L0]\;add\\tw0, w0, %L0\;.tlsdesccall\\t%0\;blr\\t%1"
Re: [PATCH][PING] Fix for PR 61422
This has already been done in r211699. Does it work for you? Adding a test would still be useful.

-Y