Re: varpool alias reorg
On Mon, 27 Jun 2011, Jan Hubicka wrote: On Fri, 24 Jun 2011, Jan Hubicka wrote: Hi, this is yet another variant of the fix. This time we stream builtin decls as usual, but at fixup time we copy the assembler names (if set) into the builtin decls used by folders. Not sure if it is any better than breaking memops-asm, but I can imagine that things like glibc actually rename string functions into their internal variants (and thus with this version of the patch we would be able to LTO such a library, but we still won't be able to LTO such a library into something else, because that something else would end up referencing the internal versions of the builtins). I doubt we could do any better, however. Not stream builtins with adjusted assembler names (I guess we'd need a flag for this, DECL_USER_ASSEMBLER_NAME_SET_P? Or just check for Most of the code just checks for '*' at the beginning of the assembler name. I suppose it is safe. attributes?) as builtins but as new decls. Let LTO symbol merging then register those as aliases. But which way around? Probably similar to how we should handle re-defined extern inlines, the extern inline being the GCC builtin and the re-definition being the aliased one. I don't quite get your answer here. What we do now is: 1) stream in a builtin as a special kind of reference with a decl assembler name associated to it; 2) at stream-in time always resolve the builtin to the official builtin decl (no matter what types and other stuff the builtin had at stream-out time), overwriting the official builtin's assembler name with the one specified. What I suggest is: 1) Stream out builtins as usual decls, just with the extra function code. 2) Stream in builtins as usual. 3) Optionally set the assembler name of the official decl. I see there are problems with e.g. the one-decl rule, but we do have the same problems with normal front ends that also use a different decl for explicit builtin calls than for implicit ones, sadly.
I am not quite sure what the proper fix for this problem is - it is very handy to have a builtin decl in the middle end where I know it is sane (i.e. it has the right types etc.). Since C allows the builtins to be declared arbitrarily, it gets a bit tricky to preserve the one-decl rule here. Hm. I would suggest to do as now: stream in the builtin specially if it does not have an assembler name attribute. If it does have one, stream it as usual and let the LTO symtab do its job (I suppose we need to register builtin functions with the symtab as well). __attribute__ ((used)) is still needed in memops-asm-lib.c because the LTO symtab of course doesn't see the future references to builtins that we will emit later via folding. I think it is a reasonable requirement, as discussed at the time of enabling the plugin. Yes, I think the testcase fix sounds reasonable. I suppose you can come up with a simpler testcase for this feature for gcc.dg/lto highlighting the different issues? I'm not sure if we are talking about my_memcpy () alias(memcpy) or memcpy () alias(my_memcpy). I still like to stream unmodified builtins as builtins, as that is similar to pre-loading the streamer caches with things like void_type_node or sizetype. Doing so will need us to solve the other one-decl rules, probably. I didn't really get what the preloading is useful for, after all? Saving memory mostly, apart from the special singletons we have (as Micha already hinted). Richard.
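The glibc-style renaming the thread discusses can be sketched with GCC's asm-label extension. All names here are hypothetical; the point is only that once a function is declared under a different assembler name, every reference — including ones the compiler itself emits by folding builtins — must resolve to that internal symbol:

```c
#include <stddef.h>

/* Sketch: declare a string function under an internal assembler name
   (names are hypothetical, not from any real library). */
extern void *my_memcpy (void *d, const void *s, size_t n)
  __asm__ ("internal_memcpy");

/* The definition inherits the asm label from the declaration above,
   so it provides the symbol "internal_memcpy". */
void *my_memcpy (void *d, const void *s, size_t n)
{
  char *dp = d;
  const char *sp = s;
  while (n--)
    *dp++ = *sp++;
  return d;
}
```

This is the situation where LTO must keep the renamed decl and the official builtin decl in agreement.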
Re: [RFC] Fix full memory barrier on SPARC-V8
Let's clarify something, did you run your testcase that triggered this bug on a v8 or a v9 machine? Sun UltraSPARC, so V9 of course. The point is that Solaris is TSO (TSO as defined for the V9 architecture, i.e. backward compatible with V8) so you have a V8-compatible TSO implementation, in particular not a Strong Consistency V8. It is perfectly valid to compile with -mcpu=v8 on Solaris and expect to get a working program. Now if you start to play seriously with __sync_synchronize, you conclude that it doesn't implement a full memory barrier with -mcpu=v8. The V8 architecture manual is quite clear about it: TSO allows stores to be reordered after subsequent loads (it's the only difference in TSO with Strong Consistency) so you need to do something to have a full memory barrier. As there is no specific instruction to that effect in V8, you need to do what is done for pre-SSE2 x86, i.e. use an atomic instruction. -- Eric Botcazou
[ARM] fix PR target/48637
For a long time now the compiler has permitted printing a symbol with the %c operator, but for some reason we've never permitted symbol+offset. This patch fixes this omission and also makes the compiler slightly more friendly to users of ASM statements by not generating an ICE when it can't handle an expression. Tested on arm-eabi and installed on trunk. This is not a regression, so I don't propose to back-port it to older compilers (though doing so would most-likely be trivial). R. 2011-06-27 Richard Earnshaw rearn...@arm.com PR target/48637 * arm.c (arm_print_operand): Allow sym+offset. Don't abort on invalid asm operands. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index efffcf8..8b9cb25 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16242,8 +16242,17 @@ arm_print_operand (FILE *stream, rtx x, int code) output_addr_const (stream, x); break; + case CONST: + if (GET_CODE (XEXP (x, 0)) == PLUS + && GET_CODE (XEXP (XEXP (x, 0), 0)) == SYMBOL_REF) + { + output_addr_const (stream, x); + break; + } + /* Fall through. */ + default: - gcc_unreachable (); + output_operand_lossage ("Unsupported operand for code '%c'", code); } return;
Re: [PATCH] __builtin_assume_aligned
On Mon, Jun 27, 2011 at 6:54 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Jun 27, 2011 at 12:17:40PM +0200, Richard Guenther wrote: Ok if you remove the builtins.c folding and instead verify arguments from check_builtin_function_arguments. Thanks, here is what I've committed after bootstrapping/regtesting again on x86_64-linux and i686-linux. Thanks Jakub. Probably worth an entry in changes.html. Richard. 2011-06-27 Jakub Jelinek ja...@redhat.com * builtin-types.def (BT_FN_PTR_CONST_PTR_SIZE_VAR): New. * builtins.def (BUILT_IN_ASSUME_ALIGNED): New builtin. * tree-ssa-structalias.c (find_func_aliases_for_builtin_call, find_func_clobbers): Handle BUILT_IN_ASSUME_ALIGNED. * tree-ssa-ccp.c (bit_value_assume_aligned): New function. (evaluate_stmt, execute_fold_all_builtins): Handle BUILT_IN_ASSUME_ALIGNED. * tree-ssa-dce.c (propagate_necessity): Likewise. * tree-ssa-alias.c (ref_maybe_used_by_call_p_1, call_may_clobber_ref_p_1): Likewise. * builtins.c (is_simple_builtin, expand_builtin): Likewise. (expand_builtin_assume_aligned): New function. * doc/extend.texi (__builtin_assume_aligned): Document. * c-common.c (check_builtin_function_arguments): Handle BUILT_IN_ASSUME_ALIGNED. * gcc.dg/builtin-assume-aligned-1.c: New test. * gcc.dg/builtin-assume-aligned-2.c: New test. * gcc.target/i386/builtin-assume-aligned-1.c: New test. 
--- gcc/builtin-types.def.jj 2011-06-26 09:55:16.0 +0200 +++ gcc/builtin-types.def 2011-06-27 15:08:12.0 +0200 @@ -454,6 +454,8 @@ DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_CONST BT_INT, BT_CONST_STRING, BT_CONST_STRING) DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_INT_CONST_STRING_VAR, BT_INT, BT_INT, BT_CONST_STRING) +DEF_FUNCTION_TYPE_VAR_2 (BT_FN_PTR_CONST_PTR_SIZE_VAR, BT_PTR, + BT_CONST_PTR, BT_SIZE) DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_STRING_SIZE_CONST_STRING_VAR, BT_INT, BT_STRING, BT_SIZE, BT_CONST_STRING) --- gcc/builtins.def.jj 2011-06-26 09:55:16.0 +0200 +++ gcc/builtins.def 2011-06-27 15:08:12.0 +0200 @@ -1,7 +1,7 @@ /* This file contains the definitions and documentation for the builtins used in the GNU compiler. Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, - 2010 Free Software Foundation, Inc. + 2010, 2011 Free Software Foundation, Inc. This file is part of GCC. @@ -638,6 +638,7 @@ DEF_EXT_LIB_BUILTIN (BUILT_IN_EXE DEF_EXT_LIB_BUILTIN (BUILT_IN_EXECVE, "execve", BT_FN_INT_CONST_STRING_PTR_CONST_STRING_PTR_CONST_STRING, ATTR_NOTHROW_LIST) DEF_LIB_BUILTIN (BUILT_IN_EXIT, "exit", BT_FN_VOID_INT, ATTR_NORETURN_NOTHROW_LIST) DEF_GCC_BUILTIN (BUILT_IN_EXPECT, "expect", BT_FN_LONG_LONG_LONG, ATTR_CONST_NOTHROW_LEAF_LIST) +DEF_GCC_BUILTIN (BUILT_IN_ASSUME_ALIGNED, "assume_aligned", BT_FN_PTR_CONST_PTR_SIZE_VAR, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN (BUILT_IN_EXTEND_POINTER, "extend_pointer", BT_FN_UNWINDWORD_PTR, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN (BUILT_IN_EXTRACT_RETURN_ADDR, "extract_return_addr", BT_FN_PTR_PTR, ATTR_LEAF_LIST) DEF_EXT_LIB_BUILTIN (BUILT_IN_FFS, "ffs", BT_FN_INT_INT, ATTR_CONST_NOTHROW_LEAF_LIST) --- gcc/tree-ssa-structalias.c.jj 2011-06-26 09:55:16.0 +0200 +++ gcc/tree-ssa-structalias.c 2011-06-27 15:08:12.0 +0200 @@ -4002,6 +4002,7 @@ find_func_aliases_for_builtin_call (gimp case BUILT_IN_STPCPY_CHK: case BUILT_IN_STRCAT_CHK: case BUILT_IN_STRNCAT_CHK: + case BUILT_IN_ASSUME_ALIGNED: { tree res = gimple_call_lhs (t);
tree dest = gimple_call_arg (t, (DECL_FUNCTION_CODE (fndecl) @@ -4726,6 +4727,7 @@ find_func_clobbers (gimple origt) return; } /* The following functions neither read nor clobber memory. */ + case BUILT_IN_ASSUME_ALIGNED: case BUILT_IN_FREE: return; /* Trampolines are of no interest to us. */ --- gcc/tree-ssa-ccp.c.jj 2011-06-26 09:55:16.0 +0200 +++ gcc/tree-ssa-ccp.c 2011-06-27 15:08:12.0 +0200 @@ -1476,6 +1476,64 @@ bit_value_binop (enum tree_code code, tr return val; } +/* Return the propagation value when applying __builtin_assume_aligned to + its arguments. */ + +static prop_value_t +bit_value_assume_aligned (gimple stmt) +{ + tree ptr = gimple_call_arg (stmt, 0), align, misalign = NULL_TREE; + tree type = TREE_TYPE (ptr); + unsigned HOST_WIDE_INT aligni, misaligni = 0; + prop_value_t ptrval = get_value_for_expr (ptr, true); + prop_value_t alignval; + double_int value, mask; + prop_value_t val; + if (ptrval.lattice_val == UNDEFINED) + return ptrval; + gcc_assert ((ptrval.lattice_val == CONSTANT
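For reference, a minimal use of the new builtin along the lines of the extend.texi hunk. Note that the alignment guarantee is carried by the *returned* pointer, so the returned pointer is the one that must be used:

```c
/* Minimal use of __builtin_assume_aligned: the returned pointer 'ap'
   carries a 16-byte alignment guarantee, so the optimizer may emit
   aligned (vector) accesses through it.  Passing a pointer that is
   not actually 16-byte aligned would be undefined behavior. */
double sum4 (const double *p)
{
  const double *ap = (const double *) __builtin_assume_aligned (p, 16);
  return ap[0] + ap[1] + ap[2] + ap[3];
}
```

The gcc.target/i386 testcase mentioned in the ChangeLog presumably checks that such code uses aligned vector loads.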
Re: [patch tree-optimization]: Try to sink type-casts for binary and/or/xor operations
On Mon, Jun 27, 2011 at 7:17 PM, Kai Tietz kti...@redhat.com wrote: Oops, I missed updating the patch. You still modify the /* If the first argument is an SSA name that is itself a result of a typecast of an ADDR_EXPR to an integer, feed the ADDR_EXPR to the folder rather than the ssa name. */ block. Please merge the constant handling with the CONVERT_EXPR_CODE_P path instead. The above block is purely legacy and should probably be entirely dropped. Richard. Kai - Original Message - From: Kai Tietz kti...@redhat.com To: Richard Guenther richard.guent...@gmail.com Cc: gcc-patches@gcc.gnu.org Sent: Monday, June 27, 2011 7:04:04 PM Subject: Re: [patch tree-optimization]: Try to sink type-casts for binary and/or/xor operations Hi, so I modified the patch to use int_fits_type_p() for integer CST checking. Well, this approach is - as discussed on IRC - suboptimal, as my initial approach was for and-operations with precision of type wider than type-x and unsigned type-x, for constant values bigger than (type-x)~0. But well, those we now miss with the int_fits_type_p() approach, too. And we also now miss the cases where type is signed and type-x is unsigned with the same precision. Anyway ... here is the updated patch Regards, Kai - Original Message - From: Richard Guenther richard.guent...@gmail.com To: Kai Tietz kti...@redhat.com Cc: gcc-patches@gcc.gnu.org Sent: Monday, June 27, 2011 4:08:41 PM Subject: Re: [patch tree-optimization]: Try to sink type-casts for binary and/or/xor operations On Mon, Jun 27, 2011 at 3:46 PM, Kai Tietz kti...@redhat.com wrote: Hello, this patch sinks type conversions in forward-propagate for the following patterns: - ((type) X) op ((type) Y): If X and Y have compatible types. - ((type) X) op CST: If the conversion of (type) ((type-x) CST) == CST and X has integral type. - CST op ((type) X): If the conversion of (type) ((type-x) CST) == CST and X has integral type. See IRC comments. Additionally it fixes another issue shown by this type-sinking in bswap detection. 
The bswap pattern matching algorithm goes for the first hit and does not try to find the best hit. So we search here twice: first for the DI case (if present) and then for the SI mode case. Please split this piece out. I suppose either walking over stmts backwards or simply handling __builtin_bswap in find_bswap_1 would be a better fix than yours. Richard. ChangeLog 2011-06-27 Kai Tietz kti...@redhat.com * tree-ssa-forwprop.c (simplify_bitwise_binary): Improve type sinking. * tree-ssa-math-opts.c (execute_optimize_bswap): Separate search for di/si mode patterns for finding widest match. Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai
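As a concrete illustration of the kind of expression tree the bswap pass matches — a 32-bit byte swap spelled out as shifts and masks, which the pass collapses into a single bswap operation at -O2:

```c
#include <stdint.h>

/* A byte swap written out with shifts and masks: the shape of
   expression the bswap pass recognizes and replaces with one
   bswap instruction on targets that have it. */
uint32_t swap32 (uint32_t x)
{
  return ((x & 0x000000ffu) << 24)
       | ((x & 0x0000ff00u) <<  8)
       | ((x & 0x00ff0000u) >>  8)
       | ((x & 0xff000000u) >> 24);
}
```

The widest-match issue under discussion arises when a 64-bit variant of this pattern contains a 32-bit sub-pattern: matching the narrow one first blocks the wide one.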
Re: {patch tree-ssa-math-opts]: Change searching direction for bswap
On Mon, Jun 27, 2011 at 7:33 PM, Kai Tietz kti...@redhat.com wrote: Hello, this is the separate patch for the issues noticed while doing type-sinking on bitwise operations. The tests exposed that bswap pattern-matching searches from top to bottom within each BB. Because it replaces the found match with a built-in bswap in the tree, but doesn't handle that inserted builtin when pattern-matching for a wider-mode bswap, the search failed. By reversing the search order within a BB from last to first, this issue can be fixed. ChangeLog 2011-06-27 Kai Tietz kti...@redhat.com * tree-ssa-math-opts.c (execute_optimize_bswap): Search within BB from last to first. Bootstrapped and regression-tested for x86_64-pc-linux-gnu. Ok for apply? Regards, Kai Index: gcc-head/gcc/tree-ssa-math-opts.c === --- gcc-head.orig/gcc/tree-ssa-math-opts.c +++ gcc-head/gcc/tree-ssa-math-opts.c @@ -1820,8 +1820,10 @@ execute_optimize_bswap (void) FOR_EACH_BB (bb) { gimple_stmt_iterator gsi; - +/* for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (gsi)) + */ + for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi)) { Err ... please 1) don't comment out stuff this way, 2) add a comment why we loop backwards Richard. gimple stmt = gsi_stmt (gsi); tree bswap_src, bswap_type;
Re: [RFC] Fix full memory barrier on SPARC-V8
From: Eric Botcazou ebotca...@adacore.com Date: Tue, 28 Jun 2011 10:11:03 +0200 The V8 architecture manual is quite clear about it: TSO allows stores to be reordered after subsequent loads (it's the only difference in TSO with Strong Consistency) so you need to do something to have a full memory barrier. As there is no specific instruction to that effect in V8, you need to do what is done for pre-SSE2 x86, i.e. use an atomic instruction. Fair enough, you can add this code if you want.
[RFC] Add middle end hook for stack red zone size
This patch fixes PR38644, a long-standing bug concerning stack red zone access; PR30282 is related. Originally the red zone concept was not exposed to the middle end, and each back end used special logic to add extra memory-barrier RTL to enforce the correct dependences in the middle end. This way, every back end had to handle the red zone problem by itself. For example, the x86 target introduced the function ix86_using_red_zone() to detect red zone accesses, while POWER introduced offset_below_red_zone_p() for the same purpose. Note that they have different semantics, but the logic at their caller sites in the back ends uses them to decide whether to add memory-barrier RTL or not. If a back end handles this incorrectly, a bug is introduced. Therefore, the correct approach is for the middle end to handle red-zone-related matters, avoiding the burden on individual back ends. To be specific for PR38644, this middle-end problem causes incorrect behavior for the ARM target. This patch exposes the red zone concept to the middle end by introducing a middle-end/back-end hook TARGET_STACK_RED_ZONE_SIZE, defined in target.def, whose default value is 0. A back end may redefine this function to provide the concrete red zone size according to its specific ABI requirements. In the middle end, the scheduling dependences are modified by using this hook, plus checking for stack-pointer adjustment instructions, to decide whether all memory references need to be flushed out or not. In theory, if TARGET_STACK_RED_ZONE_SIZE is defined correctly, a back end would no longer be required to handle this scheduling dependence issue specially by introducing extra memory-barrier RTL. In the back ends, the following changes are made to define the hook: 1) For x86, TARGET_STACK_RED_ZONE_SIZE is redefined to ix86_stack_red_zone_size() in i386.c, which is a newly introduced function. 2) For POWER, TARGET_STACK_RED_ZONE_SIZE is redefined to rs6000_stack_red_zone_size() in rs6000.c, which is also a newly defined function. 
3) For ARM and others, TARGET_STACK_RED_ZONE_SIZE is defined to default_stack_red_zone_size in targhooks.c, and this function returns 0, which means ARM EABI and the others don't support red zone access at all. In summary, the relationship between ABI and red zone access is as below:

  ARCH          |  ARM  |      X86      |     POWER     | others
  ABI           | EABI  | MS_64 | other |  AIX  |  V4   |
  RED ZONE      |  No   |  YES  |  No   |  YES  |  No   |   No
  RED ZONE SIZE |   0   |  128  |   0   |220/288|   0   |    0

Thanks, -Jiangning

stack-red-zone-patch-38644-3.patch
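The shape of the proposed hook can be sketched as follows. The default mirrors the patch description (return 0 = no red zone); the x86-style override is simplified and hypothetical — the real ix86_stack_red_zone_size() would consult the target flags rather than take a parameter:

```c
/* Default hook per the description: no red zone. */
int default_stack_red_zone_size (void)
{
  return 0;
}

/* Simplified, hypothetical sketch of an x86-style override: the
   64-bit SysV ABI reserves 128 bytes below the stack pointer; other
   x86 ABIs (e.g. MS ABI per the table above) have none.  The
   parameter stands in for the real target-flag checks. */
int ix86_stack_red_zone_size_sketch (int sysv_64bit_abi)
{
  return sysv_64bit_abi ? 128 : 0;
}
```

With such a hook, the scheduler can conservatively treat stores within the red zone below a stack adjustment as live, instead of each back end emitting barrier RTL.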
Re: [patch tree-optimization]: Try to do type sinking on comparisons
On Mon, Jun 27, 2011 at 8:52 PM, Kai Tietz kti...@redhat.com wrote: Hello, this patch tries to sink conversions for comparison patterns: a) (type) X cmp (type) Y -> x cmp y. b) (type) X cmp CST -> x cmp ((type-x) CST). c) CST cmp (type) X -> ((type-x) CST) cmp x. This patch just allows type sinking for the case that the type-precision of type is wider than or equal to the type-precision of type-x, or if type and type-x have the same signedness and CST fits into type-x. When the cmp operation is == or !=, we also allow type and type-x to have different signedness, as long as CST fits into type-x without truncation. ChangeLog 2011-06-27 Kai Tietz kti...@redhat.com * tree-ssa-forwprop.c (forward_propagate_into_comparison): Sink types within comparison operands, if suitable. Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? Hmm, why do fold_widened_comparison and fold_sign_changed_comparison not handle these cases? We already dispatch to fold in this function, so this is a case where we'd want fold to be improved. You didn't add testcases - do you have some that are not handled by fold already? Thanks, Richard. Regards, Kai
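The equivalence behind pattern b) can be spot-checked directly in C: sinking the conversion is valid precisely because the constant fits the narrower type, so both forms decide the same way for every input:

```c
/* Pattern b) from the mail, as source-level functions:
   ((int) c) == 100 may be rewritten as c == (unsigned char) 100,
   since 100 fits in unsigned char without truncation. */
int cmp_widened (unsigned char c) { return (int) c == 100; }
int cmp_sunk    (unsigned char c) { return c == (unsigned char) 100; }
```

When the constant does not fit the narrow type (e.g. 300 for unsigned char), the rewrite would change the result, which is why the patch guards on the constant fitting.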
Re: [RFC, ARM] Convert thumb1 prologue completely to rtl
On 27/06/11 19:31, Richard Henderson wrote: On 06/24/2011 02:59 AM, Richard Earnshaw wrote: On 18/06/11 20:02, Richard Henderson wrote: I couldn't find anything terribly tricky about the conversion. The existing push_mult pattern would service thumb1 with just a tweak or two to the memory predicate and the length. The existing emit_multi_reg_push wasn't set up to handle a complete switch of registers for unwind info. I thought about trying to merge them, but chickened out. I haven't cleaned out the code that is now dead in thumb_pushpop. I'd been thinking about maybe converting epilogues completely to rtl as well, which would allow the function to be deleted completely, rather than incrementally. I'm unsure what testing should be applied. I'm currently doing arm-elf, which does at least have a thumb1 multilib, and uses newlib so I don't have to fiddle with setting up a full native cross environment. What else should be done? arm-eabi? Testing this on arm-eabi is essential since this may affect C++ unwind table generation (I can't see any obvious problems, but you never know). I've now tested the patch with both arm-elf and arm-eabi with RUNTESTFLAGS='--target_board=arm-sim{-mthumb}' with no regressions. Ok to install? Yep, thanks. R. r~
Re: Backport AVX256 load/store split patches to gcc 4.6 for performance boost on latest AMD/Intel hardware.
On Tue, Jun 28, 2011 at 12:33 AM, Fang, Changpeng changpeng.f...@amd.com wrote: Hi, Attached are the patches we propose to backport to the gcc 4.6 branch which are related to avx256 unaligned load/store splitting. As we mentioned before, the combined effect of these patches is positive on both AMD and Intel CPUs on cpu2006 and polyhedron 2005. 0001-Split-32-byte-AVX-unaligned-load-store.patch Initial patch that implements unaligned load/store splitting 0001-Don-t-assert-unaligned-256bit-load-store.patch Remove the assert. 0001-Fix-a-typo-in-mavx256-split-unaligned-store.patch Fix a typo. 0002-pr49089-enable-avx256-splitting-unaligned-load-store.patch Disable unaligned load splitting for bdver1. All these patches are in 4.7 trunk. Bootstrap and tests are on-going in the gcc 4.6 branch. Is it OK to commit to the 4.6 branch as long as the tests pass? Yes, if they have been approved and checked in for trunk. Thanks, Richard. Thanks, Changpeng From: Jagasia, Harsha Sent: Monday, June 20, 2011 12:03 PM To: 'H.J. Lu' Cc: 'gcc-patches@gcc.gnu.org'; 'hubi...@ucw.cz'; 'ubiz...@gmail.com'; 'hongjiu...@intel.com'; Fang, Changpeng Subject: RE: Backport AVX256 load/store split patches to gcc 4.6 for performance boost on latest AMD/Intel hardware. On Mon, Jun 20, 2011 at 9:58 AM, harsha.jaga...@amd.com wrote: Is it ok to backport patches, with Changelogs below, already in trunk to gcc 4.6? These patches are for AVX-256bit load store splitting. These patches make a significant performance difference (>=3%) to several CPU2006 and Polyhedron benchmarks on latest AMD and Intel hardware. If ok, I will post backported patches for commit approval. AMD plans to submit additional patches on AVX-256 load/store splitting to trunk. We will send additional backport requests for those later once they are accepted/committed to trunk. Since we will make some changes on trunk, I would prefer to do the backport after the trunk change is finished. Ok, thanks. Adding Changpeng who is working on the trunk changes. 
Harsha
Commit: Add support for V850 variants to libgcc
Hi Guys, I am checking in the patch below to add support for V850 variant architectures to the libgcc/config.host file. Cheers Nick libgcc/ChangeLog 2011-06-28 Nick Clifton ni...@redhat.com * config.host: Recognize all V850 variants. Index: libgcc/config.host === --- libgcc/config.host (revision 175575) +++ libgcc/config.host (working copy) @@ -143,6 +143,9 @@ sh[123456789lbe]*-*-*) cpu_type=sh ;; +v850*-*-*) + cpu_type=v850 + ;; esac # Common parts for widely ported systems. @@ -645,12 +648,8 @@ ;; spu-*-elf*) ;; -v850e1-*-*) +v850*-*-*) ;; -v850e-*-*) - ;; -v850-*-*) - ;; vax-*-linux*) ;; vax-*-netbsdelf*)
Re: [patch, darwin, committed] fix PR47997
On 26 Jun 2011, at 17:28, Iain Sandoe wrote: It should also be applied to 4.6.x at some stage. applied to 4.6 branch. gcc/ PR target/47997 * config/darwin.c (darwin_mergeable_string_section): Place string constants in '.cstring' rather than '.const' when CF/NSStrings are active. Index: gcc/config/darwin.c === --- gcc/config/darwin.c (revision 175409) +++ gcc/config/darwin.c (working copy) @@ -1195,7 +1195,11 @@ static section * darwin_mergeable_string_section (tree exp, unsigned HOST_WIDE_INT align) { - if (flag_merge_constants + /* Darwin's ld expects to see non-writable string literals in the .cstring + section. Later versions of ld check and complain when CFStrings are + enabled. Therefore we shall force the strings into .cstring since we + don't support writable ones anyway. */ + if ((darwin_constant_cfstrings || flag_merge_constants) && TREE_CODE (exp) == STRING_CST && TREE_CODE (TREE_TYPE (exp)) == ARRAY_TYPE && align <= 256
Re: {patch tree-ssa-math-opts]: Change searching direction for bswap
Oh, I missed filling in the comment. Thanks, Kai Index: gcc-head/gcc/tree-ssa-math-opts.c === --- gcc-head.orig/gcc/tree-ssa-math-opts.c +++ gcc-head/gcc/tree-ssa-math-opts.c @@ -1821,7 +1821,11 @@ execute_optimize_bswap (void) { gimple_stmt_iterator gsi; - for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (gsi)) + /* We scan for bswap patterns reverse for making sure we get +widest match. As bswap pattern matching doesn't handle +previously inserted smaller bswap replacements as sub- +patterns, the wider variant wouldn't be detected. */ + for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi)) { gimple stmt = gsi_stmt (gsi); tree bswap_src, bswap_type;
Re: {patch tree-ssa-math-opts]: Change searching direction for bswap
On Tue, Jun 28, 2011 at 11:29 AM, Kai Tietz kti...@redhat.com wrote: Oh, missed to fill comment. Thanks, Kai Index: gcc-head/gcc/tree-ssa-math-opts.c === --- gcc-head.orig/gcc/tree-ssa-math-opts.c +++ gcc-head/gcc/tree-ssa-math-opts.c @@ -1821,7 +1821,11 @@ execute_optimize_bswap (void) { gimple_stmt_iterator gsi; - for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (gsi)) + /* We scan for bswap patterns reverse for making sure we get We do a reverse scan for bswap patterns to make sure we get the widest match. Ok with that change. Richard. + widest match. As bswap pattern matching doesn't handle + previously inserted smaller bswap replacements as sub- + patterns, the wider variant wouldn't be detected. */ + for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi)) { gimple stmt = gsi_stmt (gsi); tree bswap_src, bswap_type;
Re: [testsuite] ARM tests vfp-ldm*.c and vfp-stm*.c
On 24/06/11 15:49, Janis Johnson wrote: On 06/24/2011 03:29 AM, Joseph S. Myers wrote: On Thu, 23 Jun 2011, Janis Johnson wrote: Tests target/arm/vfp-ldm*.c and vfp-stm*.c add -mfloat-abi=softfp but fail if multilib flags override that option. This patch skips the test for multilibs that specify a different value for -mfloat-abi. While they need to be skipped for -mfloat-abi=soft, I'd think they ought to pass for -mfloat-abi=hard - why do they fail there? They don't, this would be better: /* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */ Janis OK with that change. R.
Re: [Patch ARM] Add predefine for availability of DSP multiplication functions.
On 24/06/11 09:09, James Greenhalgh wrote: Hi, This patch adds a builtin macro __ARM_FEATURE_DSP which is defined when the ARMv5E DSP multiplication extensions are available for use. Thanks, James Greenhalgh 2011-06-22 James Greenhalgh james.greenha...@arm.com * TARGET_CPU_CPP_BUILTINS: Add __ARM_FEATURE_DSP. 0001-Patch-ARM-Add-predefine-for-availability-of-DSP-mult.patch diff --git gcc/config/arm/arm.h gcc/config/arm/arm.h index c32ef1a..892065b 100644 --- gcc/config/arm/arm.h +++ gcc/config/arm/arm.h @@ -45,6 +45,8 @@ extern char arm_arch_name[]; #define TARGET_CPU_CPP_BUILTINS()\ do \ {\ + if (TARGET_DSP_MULTIPLY)\ + builtin_define ("__ARM_FEATURE_DSP"); \ /* Define __arm__ even when in thumb mode, for \ consistency with armcc. */ \ builtin_define ("__arm__"); \ OK. R.
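Typical use of such a predefine is to select a DSP intrinsic when available and fall back to portable C otherwise. The __qadd call below is the ACLE saturating-add intrinsic (an assumption here — on real ARM toolchains it comes via arm_acle.h); on non-ARM hosts only the fallback branch is compiled:

```c
#include <stdint.h>

/* Saturating 32-bit add: use the ARMv5E QADD intrinsic when the DSP
   extension predefine is present, otherwise a portable fallback.
   The intrinsic branch is an assumption about the ARM toolchain and
   never compiles on non-ARM hosts. */
static int32_t sat_add (int32_t a, int32_t b)
{
#ifdef __ARM_FEATURE_DSP
  return __qadd (a, b);                 /* ACLE saturating add (QADD) */
#else
  int64_t s = (int64_t) a + b;          /* widen, then clamp */
  if (s > INT32_MAX) return INT32_MAX;
  if (s < INT32_MIN) return INT32_MIN;
  return (int32_t) s;
#endif
}
```

This is exactly the dispatch pattern the new macro enables: feature detection by the preprocessor instead of by target triplet.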
Re: [patch tree-optimization]: Try to sink type-casts for binary and/or/xor operations
Ok, moved the code out of the special case for addresses. Bootstrapped for x86_64-pc-linux-gnu. Patch ok for apply? Regards, Kai Index: gcc-head/gcc/tree-ssa-forwprop.c === --- gcc-head.orig/gcc/tree-ssa-forwprop.c +++ gcc-head/gcc/tree-ssa-forwprop.c @@ -1676,16 +1676,61 @@ simplify_bitwise_binary (gimple_stmt_ite } } + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)). */ + if (TREE_CODE (arg2) == INTEGER_CST + && CONVERT_EXPR_CODE_P (def1_code) + && INTEGRAL_TYPE_P (def1_arg1) + && int_fits_type_p (arg2, TREE_TYPE (def1_arg1))) +{ + gimple newop; + tree tem = create_tmp_reg (TREE_TYPE (def1_arg1), NULL); + newop = +gimple_build_assign_with_ops (code, tem, def1_arg1, + fold_convert_loc (gimple_location (stmt), + TREE_TYPE (def1_arg1), + arg2)); + tem = make_ssa_name (tem, newop); + gimple_assign_set_lhs (newop, tem); + gsi_insert_before (gsi, newop, GSI_SAME_STMT); + gimple_assign_set_rhs_with_ops_1 (gsi, NOP_EXPR, + tem, NULL_TREE, NULL_TREE); + update_stmt (gsi_stmt (*gsi)); + return true; +} + + /* Try to fold CST op (type) X -> (type) (((type-x) CST) op X). */ + if (TREE_CODE (arg1) == INTEGER_CST + && CONVERT_EXPR_CODE_P (def2_code) + && INTEGRAL_TYPE_P (def2_arg1) + && int_fits_type_p (arg1, TREE_TYPE (def2_arg1))) +{ + gimple newop; + tree tem = create_tmp_reg (TREE_TYPE (def2_arg1), NULL); + newop = +gimple_build_assign_with_ops (code, tem, def2_arg1, + fold_convert_loc (gimple_location (stmt), + TREE_TYPE (def2_arg1), + arg1)); + tem = make_ssa_name (tem, newop); + gimple_assign_set_lhs (newop, tem); + gsi_insert_before (gsi, newop, GSI_SAME_STMT); + gimple_assign_set_rhs_with_ops_1 (gsi, NOP_EXPR, + tem, NULL_TREE, NULL_TREE); + update_stmt (gsi_stmt (*gsi)); + return true; +} + /* For bitwise binary operations apply operand conversions to the binary operation result instead of to the operands. This allows to combine successive conversions and bitwise binary operations. 
*/ if (CONVERT_EXPR_CODE_P (def1_code) && CONVERT_EXPR_CODE_P (def2_code) && types_compatible_p (TREE_TYPE (def1_arg1), TREE_TYPE (def2_arg1)) - /* Make sure that the conversion widens the operands or that it -changes the operation to a bitfield precision. */ + /* Make sure that the conversion widens the operands, or has same +precision, or that it changes the operation to a bitfield +precision. */ && ((TYPE_PRECISION (TREE_TYPE (def1_arg1)) - < TYPE_PRECISION (TREE_TYPE (arg1))) + <= TYPE_PRECISION (TREE_TYPE (arg1))) || (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (arg1))) != MODE_INT) || (TYPE_PRECISION (TREE_TYPE (arg1))
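The identity the new code exploits can be spot-checked at the source level: for a bitwise operation with a constant that fits the narrower type, performing the operation before or after the widening conversion gives the same result.

```c
/* The transform's underlying identity for bitwise AND on a widened
   unsigned char: ((int) c) & 0x0f  ==  (int) (unsigned char) (c & 0x0f)
   whenever the constant fits the narrower type. */
int and_widened (unsigned char c) { return (int) c & 0x0f; }
int and_sunk    (unsigned char c) { return (int) (unsigned char) (c & 0x0f); }
```

Sinking the conversion past the operation is what lets successive conversions and bitwise ops combine, as the comment in the patch says.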
Re: [patch tree-optimization]: Try to do type sinking on comparisons
- Original Message - From: Richard Guenther richard.guent...@gmail.com To: Kai Tietz kti...@redhat.com Cc: gcc-patches@gcc.gnu.org Sent: Tuesday, June 28, 2011 10:45:20 AM Subject: Re: [patch tree-optimization]: Try to do type sinking on comparisons On Mon, Jun 27, 2011 at 8:52 PM, Kai Tietz kti...@redhat.com wrote: Hello, this patch tries to sink conversions for comparison patterns: a) (type) X cmp (type) Y -> x cmp y. b) (type) X cmp CST -> x cmp ((type-x) CST). c) CST cmp (type) X -> ((type-x) CST) cmp x. This patch just allows type sinking for the case that the type-precision of type is wider than or equal to the type-precision of type-x, or if type and type-x have the same signedness and CST fits into type-x. When the cmp operation is == or !=, we also allow type and type-x to have different signedness, as long as CST fits into type-x without truncation. ChangeLog 2011-06-27 Kai Tietz kti...@redhat.com * tree-ssa-forwprop.c (forward_propagate_into_comparison): Sink types within comparison operands, if suitable. Bootstrapped and regression tested for x86_64-pc-linux-gnu. Ok for apply? Hmm, why do fold_widened_comparison and fold_sign_changed_comparison not handle these cases? We already dispatch to fold in this function, so this is a case where we'd want fold to be improved. You didn't add testcases - do you have some that are not handled by fold already? Thanks, Richard. Regards, Kai Well, I noticed this kind of pattern in the case of boolification of comparisons. They seem to appear if one of the comparison operands itself has type-promotion and a non-trivial tree. Nevertheless I am about to rework this patch a bit, as it has some issues with type-truncation if the outer type has smaller precision than the inner type. I think for now it would be ok to do the operation only for the case that the inner type has smaller or equal precision than the outer type, and both inner and outer types are of integer kind. 
In the other case we might want to transform such integer comparisons from ((char) a:int) cmp CST to (a:int & (char) ~0) cmp (char) CST, so that the truncation is properly represented for the comparison. Nevertheless the more interesting part is when the inner type has smaller or equal precision than the outer type. Kai
Ping Re: Clean up TARGET_ASM_NAMED_SECTION defaults
Ping. This patch http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01642.html is pending review. -- Joseph S. Myers jos...@codesourcery.com
[Patch, AVR]: Better 32=16*16 widening multiplication
This implements mulhisi3 and umulhisi3 widening multiplication insns if AVR_HAVE_MUL. I chose the interface as r25:r22 = r19:r18 * r21:r20, which is ok because only the avr-gcc back end will call the respective __* support functions in libgcc. Tested without regression and hand-tested assembler code. Johann * config/avr/t-avr (LIB1ASMFUNCS): Add _mulhisi3, _umulhisi3, _xmulhisi3_exit. * config/avr/libgcc.S (_xmulhisi3_exit): New function. (__mulhisi3): Optimize if have MUL*. Use XJMP instead of rjmp. (__umulhisi3): Ditto. * config/avr/avr.md (mulhisi3): New insn expander. (umulhisi3): New insn expander. (*mulhisi3_call): New insn. (*umulhisi3_call): New insn. Index: config/avr/libgcc.S === --- config/avr/libgcc.S (revision 175574) +++ config/avr/libgcc.S (working copy) @@ -178,10 +178,57 @@ __mulhi3_exit: #endif /* defined (L_mulhi3) */ #endif /* !defined (__AVR_HAVE_MUL__) */ +/*** + Widening Multiplication 32 = 16 x 16 +***/ + #if defined (L_mulhisi3) - .global __mulhisi3 - .func __mulhisi3 -__mulhisi3: +DEFUN __mulhisi3 +#if defined (__AVR_HAVE_MUL__) + +;; r25:r22 = r19:r18 * r21:r20 + +#define A0 18 +#define B0 20 +#define C0 22 + +#define A1 A0+1 +#define B1 B0+1 +#define C1 C0+1 +#define C2 C0+2 +#define C3 C0+3 + +; C = (signed)A1 * (signed)B1 +muls A1, B1 +movw C2, R0 + +; C += A0 * B0 +mul A0, B0 +movw C0, R0 + +; C += (signed)A1 * B0 +mulsu A1, B0 +sbci C3, 0 +add C1, R0 +adc C2, R1 +clr __zero_reg__ +adc C3, __zero_reg__ + +; C += (signed)B1 * A0 +mulsu B1, A0 +sbci C3, 0 +XJMP __xmulhisi3_exit + +#undef A0 +#undef A1 +#undef B0 +#undef B1 +#undef C0 +#undef C1 +#undef C2 +#undef C3 + +#else /* !__AVR_HAVE_MUL__ */ mov_l r18, r24 mov_h r19, r25 clr r24 @@ -192,24 +239,91 @@ __mulhisi3: sbrc r19, 7 dec r20 mov r21, r20 - rjmp __mulsi3 - .endfunc + XJMP __mulsi3 +#endif /* __AVR_HAVE_MUL__ */ +ENDF __mulhisi3 #endif /* defined (L_mulhisi3) */ #if defined (L_umulhisi3) - .global __umulhisi3 - .func __umulhisi3 -__umulhisi3: +DEFUN __umulhisi3 +#if defined 
(__AVR_HAVE_MUL__) + +;; r25:r22 = r19:r18 * r21:r20 + +#define A0 18 +#define B0 20 +#define C0 22 + +#define A1 A0+1 +#define B1 B0+1 +#define C1 C0+1 +#define C2 C0+2 +#define C3 C0+3 + +; C = A1 * B1 +mul A1, B1 +movw C2, R0 + +; C += A0 * B0 +mul A0, B0 +movw C0, R0 + +; C += A1 * B0 +mul A1, B0 +add C1, R0 +adc C2, R1 +clr __zero_reg__ +adc C3, __zero_reg__ + +; C += B1 * A0 +mul B1, A0 +XJMP __xmulhisi3_exit + +#undef A0 +#undef A1 +#undef B0 +#undef B1 +#undef C0 +#undef C1 +#undef C2 +#undef C3 + +#else /* !__AVR_HAVE_MUL__ */ mov_l r18, r24 mov_h r19, r25 clr r24 clr r25 clr r20 clr r21 - rjmp __mulsi3 - .endfunc + XJMP __mulsi3 +#endif /* __AVR_HAVE_MUL__ */ +ENDF __umulhisi3 #endif /* defined (L_umulhisi3) */ +#if defined (L_xmulhisi3_exit) + +;;; Helper for __mulhisi3 resp. __umulhisi3. + +#define C0 22 +#define C1 C0+1 +#define C2 C0+2 +#define C3 C0+3 + +DEFUN __xmulhisi3_exit +add C1, R0 +adc C2, R1 +clr __zero_reg__ +adc C3, __zero_reg__ +ret +ENDF __xmulhisi3_exit + +#undef C0 +#undef C1 +#undef C2 +#undef C3 + +#endif /* defined (L_xmulhisi3_exit) */ + #if defined (L_mulsi3) /*** Multiplication 32 x 32 Index: config/avr/avr.md === --- config/avr/avr.md (revision 175574) +++ config/avr/avr.md (working copy) @@ -1056,6 +1056,50 @@ (define_insn *mulsi3_call [(set_attr type xcall) (set_attr cc clobber)]) +(define_expand mulhisi3 + [(set (reg:HI 18) +(match_operand:HI 1 register_operand )) + (set (reg:HI 20) +(match_operand:HI 2 register_operand )) + (set (reg:SI 22) +(mult:SI (sign_extend:SI (reg:HI 18)) + (sign_extend:SI (reg:HI 20 + (set (match_operand:SI 0 register_operand ) +(reg:SI 22))] + AVR_HAVE_MUL + ) + +(define_expand umulhisi3 + [(set (reg:HI 18) +(match_operand:HI 1 register_operand )) + (set (reg:HI 20) +(match_operand:HI 2 register_operand )) + (set (reg:SI 22) +(mult:SI (zero_extend:SI (reg:HI 18)) + (zero_extend:SI (reg:HI 20 + (set (match_operand:SI 0 register_operand ) +(reg:SI 22))] + AVR_HAVE_MUL + ) + +(define_insn 
*mulhisi3_call + [(set (reg:SI 22) +(mult:SI (sign_extend:SI (reg:HI 18)) + (sign_extend:SI (reg:HI 20] + AVR_HAVE_MUL + %~call __mulhisi3 + [(set_attr type xcall) + (set_attr cc clobber)]) + +(define_insn
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On 24/06/11 16:47, Richard Guenther wrote: I can certainly add checks to make sure that the skipped operations actually don't make any important changes to the value, but do I need to? Yes.

OK, how about this patch?

I've added checks to make sure the value is not truncated at any point. I've also changed the test cases to address Janis' comments.

Andrew

2011-06-28  Andrew Stubbs  a...@codesourcery.com

	gcc/
	* gimple.h (tree_ssa_harmless_type_conversion): New prototype.
	(tree_ssa_strip_harmless_type_conversions): New prototype.
	(harmless_type_conversion_p): New prototype.
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Look for
	multiply statement beyond no-op conversion statements.
	* tree-ssa.c (harmless_type_conversion_p): New function.
	(tree_ssa_harmless_type_conversion): New function.
	(tree_ssa_strip_harmless_type_conversions): New function.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1090,8 +1090,11 @@ extern bool validate_gimple_arglist (const_gimple, ...);
 
 /* In tree-ssa.c  */
 extern bool tree_ssa_useless_type_conversion (tree);
+extern bool tree_ssa_harmless_type_conversion (tree);
 extern tree tree_ssa_strip_useless_type_conversions (tree);
+extern tree tree_ssa_strip_harmless_type_conversions (tree);
 extern bool useless_type_conversion_p (tree, tree);
+extern bool harmless_type_conversion_p (tree, tree);
 extern bool types_compatible_p (tree, tree);
 
 /* Return the code for GIMPLE statement G.  */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+  int bc = b * c;
+  return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2117,23 +2117,19 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   rhs1 = gimple_assign_rhs1 (stmt);
   rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (rhs1) == SSA_NAME)
-    {
-      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-      if (is_gimple_assign (rhs1_stmt))
-	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
-    }
-  else
+  if (TREE_CODE (rhs1) != SSA_NAME
+      || TREE_CODE (rhs2) != SSA_NAME)
     return false;
 
-  if (TREE_CODE (rhs2) == SSA_NAME)
-    {
-      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-      if (is_gimple_assign (rhs2_stmt))
-	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
-    }
-  else
-    return false;
+  rhs1 = tree_ssa_strip_harmless_type_conversions (rhs1);
+  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+  if (is_gimple_assign (rhs1_stmt))
+    rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+
+  rhs2 = tree_ssa_strip_harmless_type_conversions (rhs2);
+  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+  if (is_gimple_assign (rhs2_stmt))
+    rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
     {
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1484,6 +1484,33 @@ useless_type_conversion_p (tree outer_type, tree inner_type)
   return false;
 }
 
+/* Return true if the conversion from INNER_TYPE to OUTER_TYPE will
+   not alter the arithmetic meaning of a type, otherwise return false.
+
+   For example, widening an integer type leaves the value unchanged,
+   but narrowing an integer type can cause truncation.
+
+   Note that switching between signed and unsigned modes doesn't change
+   the underlying representation, and so is harmless.
+
+   This function is not yet a complete definition of what is harmless
+   but should reject everything that is not.  */
+
+bool
+harmless_type_conversion_p (tree outer_type, tree inner_type)
+{
+  /* If it's useless, it's also harmless.  */
+  if (useless_type_conversion_p (outer_type, inner_type))
+    return true;
+
+  if (INTEGRAL_TYPE_P (inner_type)
+      && INTEGRAL_TYPE_P (outer_type)
+      && TYPE_PRECISION (inner_type) <= TYPE_PRECISION (outer_type))
+    return true;
+
+  return false;
+}
+
 /* Return true if a conversion from either type of TYPE1 and TYPE2
    to the other is not required.  Otherwise return false.  */
 
@@ -1515,6 +1542,29 @@ tree_ssa_useless_type_conversion (tree expr)
   return false;
 }
 
+/* Return true if EXPR is a harmless type conversion, otherwise return
+   false.  */
+
+bool
+tree_ssa_harmless_type_conversion (tree expr)
+{
+  gimple stmt;
+
+  if (TREE_CODE (expr) != SSA_NAME)
+    return false;
+
+  stmt = SSA_NAME_DEF_STMT (expr);
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
+    return false;
+
+  return harmless_type_conversion_p
Ping #1: [Patch, AVR]: Fix PR34734
http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01462.html

Georg-Johann Lay wrote: PR34734 produces annoying, false warnings if __attribute__((progmem)) is used in conjunction with C++. DECL_INITIAL is not yet set up in avr_handle_progmem_attribute.

Johann

	PR target/34734
	* config/avr/avr.c (avr_handle_progmem_attribute): Move warning
	about uninitialized data attributed 'progmem' from here...
	(avr_encode_section_info): ...to this new function.
	(TARGET_ENCODE_SECTION_INFO): New define.
	(avr_section_type_flags): For data in .progmem.data, remove
	section flag SECTION_WRITE.

avr_encode_section_info is a good place to emit the warning: DECL_INITIAL has stabilized for C++, the warning will appear even for unused variables that will eventually be thrown away, and the warning appears only once (new_decl_p).

Johann
Re: [pph] Fix var order when streaming in. (issue4635074)
On 2011/06/28 00:27:04, Gabriel Charette wrote: The names and namespaces chains are built by adding each new element to the front of the list. When streaming it in we traverse the list of names and re-add them to the current chains, thus reversing the order in which they were defined in the header file. Since this is a singly linked list we cannot start from the tail; thus we reverse the chain in place and then traverse it, now adding the bindings in the same order they were found in the header file.

I introduced a new failing test to test this. The test showed the reverse behaviour prior to the patch. The test still fails, however; there is another inversion problem between the global variables and the .LFB0, .LCFI0, ... labels. This patch only fixes the inversion of the global variable declarations in the assembly, not the second issue it is exposing.

This second issue is potentially already exposed by another test?? Do we need this new test?

It can't hurt.

This fixes all of the assembly mismatches in c1limits-externalid.cc however! Nice!

2011-06-27  Gabriel Charette  gch...@google.com

	* pph-streamer-in.c (pph_add_bindings_to_namespace): Reverse names
	and namespaces chains.
	* g++.dg/pph/c1limits-externalid.cc: Remove pph asm xdiff.
	* g++.dg/pph/c1varorder.cc: New.
	* g++.dg/pph/c1varorder.h: New.
	* g++.dg/pph/pph.map: Add c1varorder.h.

OK with a minor comment nit.

http://codereview.appspot.com/4635074/
Re: [pph] Fix var order when streaming in. (issue4635074)
http://codereview.appspot.com/4635074/diff/1/gcc/cp/pph-streamer-in.c
File gcc/cp/pph-streamer-in.c (right):

http://codereview.appspot.com/4635074/diff/1/gcc/cp/pph-streamer-in.c#newcode1144
gcc/cp/pph-streamer-in.c:1144: /* The chains are built backwards (ref: add_decl_to_level@name-lookup.c),
s/add_decl_to_level@name-lookup.c/add_decl_to_level/

http://codereview.appspot.com/4635074/
Re: [patch tree-optimization]: Try to sink type-casts for binary and/or/xor operations
On Tue, Jun 28, 2011 at 12:04 PM, Kai Tietz kti...@redhat.com wrote: Ok, moved code out of special case for addresses. Bootstrapped for x86_64-pc-linux-gnu. Patch ok for apply? There is no need to check for CST op (T) arg, the constant is always the 2nd operand for commutative operations. Ok with that variant removed. Thanks, Richard. Regards, Kai
Re: [PATCH, PR 49094] Refrain from creating misaligned accesses in SRA
Hi,

On Mon, Jun 27, 2011 at 03:18:01PM +0200, Richard Guenther wrote: On Sun, 26 Jun 2011, Martin Jambor wrote: Hi, under some circumstances involving user specified alignment and/or packed attributes, SRA can create a misaligned MEM_REF. As the testcase demonstrates, it is not enough to not consider variables with these type attributes, mainly because we might attempt to load/store the scalar replacements from/to right/left sides of original aggregate assignments which might be misaligned. ...

I think you want something like

static bool
tree_non_mode_aligned_mem_p (tree exp)
{
  enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
  unsigned int align;

  if (mode == BLKmode
      || !STRICT_ALIGNMENT)
    return false;

  align = get_object_alignment (exp, BIGGEST_ALIGNMENT);
  if (GET_MODE_ALIGNMENT (mode) > align)
    return true;

  return false;
}

as for STRICT_ALIGNMENT targets we assume that the loads/stores SRA inserts have the alignment of the mode.

I admit to be surprised this works, I did not know aggregates could have non-BLK modes. Anyway, it does, and so I intend to commit the following this evening, after a testsuite run on sparc64. Please stop me if the previous message was not a pre-approval of sorts.

Thanks a lot,

Martin

2011-06-28  Martin Jambor  mjam...@suse.cz

	PR tree-optimization/49094
	* tree-sra.c (tree_non_mode_aligned_mem_p): New function.
	(build_accesses_from_assign): Use it.
	* testsuite/gcc.dg/tree-ssa/pr49094.c: New test.

Index: src/gcc/tree-sra.c
===================================================================
--- src.orig/gcc/tree-sra.c
+++ src/gcc/tree-sra.c
@@ -1050,6 +1050,25 @@ disqualify_ops_if_throwing_stmt (gimple
   return false;
 }
 
+/* Return true iff type of EXP is not sufficiently aligned.  */
+
+static bool
+tree_non_mode_aligned_mem_p (tree exp)
+{
+  enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+  unsigned int align;
+
+  if (mode == BLKmode
+      || !STRICT_ALIGNMENT)
+    return false;
+
+  align = get_object_alignment (exp, BIGGEST_ALIGNMENT);
+  if (GET_MODE_ALIGNMENT (mode) > align)
+    return true;
+
+  return false;
+}
+
 /* Scan expressions occuring in STMT, create access structures for all accesses
    to candidates for scalarization and remove those candidates which occur in
    statements or expressions that prevent them from being split apart.  Return
@@ -1074,7 +1093,10 @@ build_accesses_from_assign (gimple stmt)
 
   lacc = build_access_from_expr_1 (lhs, stmt, true);
   if (lacc)
-    lacc->grp_assignment_write = 1;
+    {
+      lacc->grp_assignment_write = 1;
+      lacc->grp_unscalarizable_region |= tree_non_mode_aligned_mem_p (rhs);
+    }
 
   if (racc)
     {
@@ -1082,6 +1104,7 @@ build_accesses_from_assign (gimple stmt)
       if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt)
	   && !is_gimple_reg_type (racc->type))
	 bitmap_set_bit (should_scalarize_away_bitmap, DECL_UID (racc->base));
+      racc->grp_unscalarizable_region |= tree_non_mode_aligned_mem_p (lhs);
     }
 
   if (lacc && racc
Index: src/gcc/testsuite/gcc.dg/tree-ssa/pr49094.c
===================================================================
--- /dev/null
+++ src/gcc/testsuite/gcc.dg/tree-ssa/pr49094.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+/* { dg-options "-O" } */
+
+struct in_addr {
+  unsigned int s_addr;
+};
+
+struct ip {
+  unsigned char ip_p;
+  unsigned short ip_sum;
+  struct in_addr ip_src,ip_dst;
+} __attribute__ ((aligned(1), packed));
+
+struct ip ip_fw_fwd_addr;
+
+int test_alignment( char *m )
+{
+  struct ip *ip = (struct ip *) m;
+  struct in_addr pkt_dst;
+  pkt_dst = ip->ip_dst ;
+  if( pkt_dst.s_addr == 0 )
+    return 1;
+  else
+    return 0;
+}
+
+int __attribute__ ((noinline, noclone))
+intermediary (char *p)
+{
+  return test_alignment (p);
+}
+
+int
+main (int argc, char *argv[])
+{
+  ip_fw_fwd_addr.ip_dst.s_addr = 1;
+  return intermediary ((void *) &ip_fw_fwd_addr);
+}
Re: [google] Enable both ld and gold in gcc (issue4664051)
On 11-06-27 19:09, Doug Kwan wrote: This patch enables both ld and gold in gcc using the -fuse-ld switch. The original patch was written by Nick Clifton and subsequently updated by Matthias Klose. The patch currently does not work with LTO, but that is okay for now and it is no worse than its counterpart in an older gcc version. We need this functionality for now. It is mostly used as a safety net in the Android toolchain if gold does not work. We can disable LTO in that case. Hopefully we will fix this and resubmit it for trunk later. This is tested by running ./buildit and building the Android toolchain. I would like to apply this to google/main only.

2011-06-27  Doug Kwan  dougk...@google.com

	Google ref 41164-p2
	Backport upstream patch under review.

	2011-01-19  Nick Clifton  ni...@redhat.com
		    Matthias Klose  d...@debian.org

	* configure.ac (gcc_cv_gold_srcdir): New cached variable -
	contains the location of the gold sources.
	(ORIGINAL_GOLD_FOR_TARGET): New substituted variable - contains
	the name of the locally built gold executable.
	* configure: Regenerate.
	* collect2.c (main): Detect the -use-gold and -use-ld switches
	and select the appropriate linker, if found.  If a linker cannot
	be found and collect2 is executing in verbose mode then report
	the search paths examined.
	* exec-tool.in: Detect the -use-gold and -use-ld switches and
	select the appropriate linker, if found.  Add support for -v
	switch.  Report problems locating linker executable.
	* gcc.c (LINK_COMMAND_SPEC): Translate -fuse-ld=gold into
	-use-gold and -fuse-ld=bfd into -use-ld.
	* common.opt: Add fuse-ld=gold and fuse-ld=bfd.
	* opts.c (common_handle_option): Ignore -fuse-ld=gold and
	-fuse-ld=bfd.
	* doc/invoke.texi: Document the new options.

OK for google/main.

Nick/Matthias, anything in particular blocking this patch in trunk? (other than the LTO issue)

Diego.

--
This patch is available for review at http://codereview.appspot.com/4664051
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
On Tue, Jun 28, 2011 at 12:47 PM, Andrew Stubbs andrew.stu...@gmail.com wrote: On 24/06/11 16:47, Richard Guenther wrote: I can certainly add checks to make sure that the skipped operations actually don't make any important changes to the value, but do I need to? Yes. OK, how about this patch?

I'd name the predicate value_preserving_conversion_p, which I think is what you mean; "harmless" isn't really descriptive. Note that you include non-value-preserving conversions, namely int -> unsigned int.

Don't dispatch to useless_type_conversion_p; it's easy to enumerate which conversions are value-preserving. Don't try to match the tree_ssa_useless_* set of functions; instead put the value_preserving_conversion_p predicate in tree.[ch] and a suitable function using it in tree-ssa-math-opts.c.

Thanks,
Richard.

I've added checks to make sure the value is not truncated at any point. I've also changed the test cases to address Janis' comments. Andrew
Re: Simplify Solaris configuration
Eric,

At least I can build the 64-bit libgcc now, but the 32-bit one fails for unrelated reasons:

configure:3247: checking for suffix of object files
configure:3269: /var/gcc/gcc-4.7.0-20110622/11-gcc/./gcc/xgcc -B/var/gcc/gcc-4.7.0-20110622/11-gcc/./gcc/ -B/usr/local/sparcv9-sun-solaris2.11/bin/ -B/usr/local/sparcv9-sun-solaris2.11/lib/ -isystem /usr/local/sparcv9-sun-solaris2.11/include -isystem /usr/local/sparcv9-sun-solaris2.11/sys-include -m32 -c -g -O2 conftest.c >&5
conftest.c:16:1: internal compiler error: in simplify_subreg, at simplify-rtx.c:5362

It's very likely the same problem, the options -mptr32 -mno-stack-bias aren't passed to cc1 anymore.

Right: sparc/sol2-64.h was included too late. The following patch fixes this. Other approaches to reordering the headers ran into various issues since TARGET_DEFAULT is defined and redefined in several places.

The patch allowed a sparcv9-sun-solaris2.11 bootstrap to run well into building the target libraries (failed configuring libgfortran since I'd mis-merged the 32-bit and 64-bit gmp.h); a sparc-sun-solaris2.10 bootstrap is still running. I'll probably fix the gmp.h issue, rebuild the sparcv9-sun-solaris2.11 configuration and commit unless I find problems or you disapprove of the approach.

Rainer

2011-06-28  Rainer Orth  r...@cebitec.uni-bielefeld.de

	* config/sparc/sol2-64.h (TARGET_DEFAULT): Remove.
	(TARGET_64BIT_DEFAULT): Define.
	* config.gcc (sparc*-*-solaris2*): Move sparc/sol2-64.h to front
	of tm_file.
	* config/sparc/sol2.h [TARGET_64BIT_DEFAULT] (TARGET_DEFAULT):
	Define.
diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2482,7 +2482,7 @@ sparc*-*-solaris2*)
 	tm_file="sparc/biarch64.h ${tm_file} ${sol2_tm_file} sol2-bi.h"
 	case ${target} in
 	    sparc64-*-* | sparcv9-*-*)
-		tm_file="${tm_file} sparc/sol2-64.h"
+		tm_file="sparc/sol2-64.h ${tm_file}"
 		;;
 	    *)
 		test x$with_cpu != x || with_cpu=v9
diff --git a/gcc/config/sparc/sol2-64.h b/gcc/config/sparc/sol2-64.h
--- a/gcc/config/sparc/sol2-64.h
+++ b/gcc/config/sparc/sol2-64.h
@@ -1,7 +1,7 @@
 /* Definitions of target machine for GCC, for bi-arch SPARC
    running Solaris 2, defaulting to 64-bit code generation.
 
-   Copyright (C) 1999, 2010 Free Software Foundation, Inc.
+   Copyright (C) 1999, 2010, 2011 Free Software Foundation, Inc.
 
 This file is part of GCC.
 
@@ -19,7 +19,4 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
-#undef TARGET_DEFAULT
-#define TARGET_DEFAULT \
-  (MASK_V9 + MASK_PTR64 + MASK_64BIT /* + MASK_HARD_QUAD */ + \
-   MASK_STACK_BIAS + MASK_APP_REGS + MASK_FPU + MASK_LONG_DOUBLE_128)
+#define TARGET_64BIT_DEFAULT 1
diff --git a/gcc/config/sparc/sol2.h b/gcc/config/sparc/sol2.h
--- a/gcc/config/sparc/sol2.h
+++ b/gcc/config/sparc/sol2.h
@@ -20,11 +20,17 @@ You should have received a copy of the G
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
+#undef TARGET_DEFAULT
+#ifdef TARGET_64BIT_DEFAULT
+#define TARGET_DEFAULT \
+  (MASK_V9 + MASK_PTR64 + MASK_64BIT /* + MASK_HARD_QUAD */ + \
+   MASK_STACK_BIAS + MASK_APP_REGS + MASK_FPU + MASK_LONG_DOUBLE_128)
+#else
 /* Solaris allows 64 bit out and global registers in 32 bit mode.
    sparc_override_options will disable V8+ if not generating V9 code.  */
-#undef TARGET_DEFAULT
 #define TARGET_DEFAULT (MASK_V8PLUS + MASK_APP_REGS + MASK_FPU \
			 + MASK_LONG_DOUBLE_128)
+#endif
 
 /* The default code model used to be CM_MEDANY on Solaris
    but even Sun eventually found it to be quite wasteful

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] [annotalysis] Support IPA-SRA cloned functions (issue 4591066)
On Wed, Jun 22, 2011 at 16:10, Delesley Hutchins deles...@google.com wrote: Hi, This patch is merely a port of an earlier patch, made by Le-Chun Wu, from google/main to annotalysis. It extends Annotalysis to support cloned functions/methods (especially created by IPA-SRA). Bootstrapped and passed GCC regression testsuite on x86_64-unknown-linux-gnu. Okay for branches/annotalysis? -DeLesley 2011-06-22 Le-Chun Wu l...@google.com, DeLesley Hutchins deles...@google.com Minor nit. Align names vertically: 2011-06-22 Le-Chun Wu l...@google.com DeLesley Hutchins deles...@google.com * tree-threadsafe-analyze.c (build_fully_qualified_lock): Handle IPA-SRA cloned methods. (get_canonical_lock_expr): Fold expressions that are INDIRECT_REF on top of ADDR_EXPR. (check_lock_required): Handle IPA-SRA cloned methods. (check_func_lock_excluded): Likewise. (process_function_attrs): Likewise. OK. Incidentally, I think it would make sense to have you added to the list of maintainers for the annotalysis branch. Le-Chun, what do you think? Diego.
Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
On 23/06/11 15:41, Andrew Stubbs wrote: If one or both of the inputs to a widening multiply are of unsigned type then the compiler will attempt to use usmul_widen_optab or umul_widen_optab, respectively. That works fine, but only if the target supports those operations directly. Otherwise, it just bombs out and reverts to the normal inefficient non-widening multiply. This patch attempts to catch these cases and use an alternative signed widening multiply instruction, if one of those is available. I believe this should be legal as long as the top bit of both inputs is guaranteed to be zero. The code achieves this guarantee by zero-extending the inputs to a wider mode (which must still be narrower than the output mode). OK? This update fixes the testsuite issue Janis pointed out. Andrew 2011-06-28 Andrew Stubbs a...@codesourcery.com gcc/ * Makefile.in (tree-ssa-math-opts.o): Add langhooks.h dependency. * optabs.c (find_widening_optab_handler): Rename to ... (find_widening_optab_handler_and_mode): ... this, and add new argument 'found_mode'. * optabs.h (find_widening_optab_handler): Rename to ... (find_widening_optab_handler_and_mode): ... this. (find_widening_optab_handler): New macro. * tree-ssa-math-opts.c: Include langhooks.h (build_and_insert_cast): New function. (convert_mult_to_widen): Add new argument 'gsi'. Convert unsupported unsigned multiplies to signed. (convert_plusminus_to_widen): Likewise. (execute_optimize_widening_mul): Pass gsi to convert_mult_to_widen. gcc/testsuite/ * gcc.target/arm/wmul-6.c: New file. 
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2672,7 +2672,8 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLOW_H) $(CONFIG_H) \
 tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \
    $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \
-   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h
+   $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \
+   langhooks.h
 tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \
    $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \
    $(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -232,9 +232,10 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1)
    non-widening optabs also.  */
 
 enum insn_code
-find_widening_optab_handler (optab op, enum machine_mode to_mode,
-			     enum machine_mode from_mode,
-			     int permit_non_widening)
+find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode,
+				      enum machine_mode from_mode,
+				      int permit_non_widening,
+				      enum machine_mode *found_mode)
 {
   for (; (permit_non_widening || from_mode != to_mode)
	  && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
@@ -245,7 +246,11 @@ find_widening_optab_handler (optab op, enum machine_mode to_mode,
				      from_mode);
 
       if (handler != CODE_FOR_nothing)
-	return handler;
+	{
+	  if (found_mode)
+	    *found_mode = from_mode;
+	  return handler;
+	}
     }
 
   return CODE_FOR_nothing;
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -808,8 +808,13 @@ extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code);
 
 /* Find a widening optab even if it doesn't widen as much as we want.  */
-extern enum insn_code find_widening_optab_handler (optab, enum machine_mode,
-						   enum machine_mode, int);
+#define find_widening_optab_handler(A,B,C,D) \
+  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+extern enum insn_code find_widening_optab_handler_and_mode (optab,
+						   enum machine_mode,
+						   enum machine_mode,
+						   int,
+						   enum machine_mode *);
 
 /* An extra flag to control optab_for_tree_code's behavior.  This is needed to
    distinguish between machines with a vector shift that takes a scalar for the
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-6.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, unsigned char *b, signed char *c)
+{
+  return a + (long long)*b * (long long)*c;
+}
+
+/* { dg-final { scan-assembler "smlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -98,6 +98,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "basic-block.h"
 #include "target.h"
 #include "gimple-pretty-print.h"
+#include "langhooks.h"
 
 /* FIXME: RTL headers have to be included here for optabs.  */
 #include "rtl.h"		/* Because optabs.h wants enum rtx_code.  */
@@ -1086,6 +1087,21 @@ build_and_insert_ref (gimple_stmt_iterator *gsi, location_t loc, tree type,
   return result;
 }
 
+/* Build a gimple assignment to cast VAL to TYPE, and put the result in
+   TARGET.  Insert the statement prior to GSI's current
Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
On 06/23/2011 04:41 PM, Andrew Stubbs wrote: I believe this should be legal as long as the top bit of both inputs is guaranteed to be zero. The code achieves this guarantee by zero-extending the inputs to a wider mode (which must still be narrower than the output mode). Yes, that's correct. Paolo
[PATCH, SRA] Total scalarization and padding
Hi,

at the moment SRA can get confused by alignment padding and think that it actually contains some data for which there is no planned replacement and thus might leave some loads and stores in place instead of removing them. This is perhaps the biggest problem when we attempt total scalarization of simple structures exactly in order to get rid of these and of the variables altogether.

I've pondered for quite a while how to best deal with them. One option was to make just the total scalarization stronger. I have also contemplated creating phantom accesses for padding I could detect (i.e. in simple structures) which would be more general, but this would complicate the parts of SRA which are already quite convoluted and I was not really sure it was worth it. Eventually I decided for the total scalarization option.

This patch changes it such that the flag is propagated down the access tree but also, if it does not work out, is reset on the way up. If the flag survives, the access tree is considered covered by scalar replacements and thus it is known not to contain unscalarized data.

While changing function analyze_access_subtree I have simplified the way we compute the hole flag and also fixed one comparison which we currently have the wrong way round, but it fortunately does not matter because if there is a hole, the covered_to will never add up to the total size. I'll probably post a separate patch against 4.6 just in case someone attempts to read the source.

Bootstrapped and tested on x86_64-linux, OK for trunk?

Thanks,

Martin

2011-06-24  Martin Jambor  mjam...@suse.cz

	* tree-sra.c (struct access): Rename total_scalarization to
	grp_total_scalarization.
	(completely_scalarize_var): New function.
	(sort_and_splice_var_accesses): Set total_scalarization in the
	representative access.
	(analyze_access_subtree): Propagate total scalarization across the
	tree, no holes in totally scalarized trees, simplify coverage
	computation.
	(analyze_all_variable_accesses): Call completely_scalarize_var
	instead of completely_scalarize_record.
	* testsuite/gcc.dg/tree-ssa/sra-12.c: New test.

Index: src/gcc/tree-sra.c
===================================================================
*** src.orig/gcc/tree-sra.c
--- src/gcc/tree-sra.c
*************** struct access
*** 170,179 ****
    /* Is this particular access write access? */
    unsigned write : 1;
  
-   /* Is this access an artificial one created to scalarize some record
-      entirely? */
-   unsigned total_scalarization : 1;
- 
    /* Is this access an access to a non-addressable field? */
    unsigned non_addressable : 1;
--- 170,175 ----
*************** struct access
*** 204,209 ****
--- 200,209 ----
       is not propagated in the access tree in any direction.  */
    unsigned grp_scalar_write : 1;
  
+   /* Is this access an artificial one created to scalarize some record
+      entirely? */
+   unsigned grp_total_scalarization : 1;
+ 
    /* Other passes of the analysis use this bit to make function
       analyze_access_subtree create scalar replacements for this group if
       possible. */
*************** dump_access (FILE *f, struct access *acc
*** 377,402 ****
    fprintf (f, ", type = ");
    print_generic_expr (f, access->type, 0);
    if (grp)
!     fprintf (f, ", total_scalarization = %d, grp_read = %d, grp_write = %d, "
!	     "grp_assignment_read = %d, grp_assignment_write = %d, "
!	     "grp_scalar_read = %d, grp_scalar_write = %d, "
	     "grp_hint = %d, grp_covered = %d, "
	     "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, "
	     "grp_partial_lhs = %d, grp_to_be_replaced = %d, "
	     "grp_maybe_modified = %d, "
	     "grp_not_necessarilly_dereferenced = %d\n",
!	     access->total_scalarization, access->grp_read, access->grp_write,
!	     access->grp_assignment_read, access->grp_assignment_write,
!	     access->grp_scalar_read, access->grp_scalar_write,
	     access->grp_hint, access->grp_covered,
	     access->grp_unscalarizable_region, access->grp_unscalarized_data,
	     access->grp_partial_lhs, access->grp_to_be_replaced,
	     access->grp_maybe_modified,
	     access->grp_not_necessarilly_dereferenced);
    else
!     fprintf (f, ", write = %d, total_scalarization = %d, "
!	     "grp_partial_lhs = %d\n",
!	     access->write, access->total_scalarization,
	     access->grp_partial_lhs);
  }
--- 377,402 ----
    fprintf (f, ", type = ");
    print_generic_expr (f, access->type, 0);
    if (grp)
!     fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read = %d, "
!	     "grp_assignment_write = %d, grp_scalar_read = %d, "
!	     "grp_scalar_write = %d, grp_total_scalarization = %d, "
	     "grp_hint = %d, grp_covered = %d, "
	     "grp_unscalarizable_region = %d, grp_unscalarized_data = %d, "
	     "grp_partial_lhs =
Re: [PATCH, SRA] Total scalarization and padding
On Tue, Jun 28, 2011 at 2:50 PM, Martin Jambor mjam...@suse.cz wrote: Hi, at the moment SRA can get confused by alignment padding [...] Bootstrapped and tested on x86_64-linux, OK for trunk?

So, what will it do for the testcase? The following is what I _think_ it should do:

bb 2:
  l = *p_1(D);
  l$i_6 = p_1(D)->i;
  D.2700_2 = l$i_6;
  D.2701_3 = D.2700_2 + 1;
  l$i_12 = D.2701_3;
  *p_1(D) = l;
  p_1(D)->i = l$i_12;

and let FRE/DSE do their job (which they don't do, unfortunately).
So does your patch then remove the load/store from/to l but keep the elementwise loads/stores (which are probably cleaned up by FRE)? Richard. Thanks, Martin 2011-06-24 Martin Jambor mjam...@suse.cz * tree-sra.c (struct access): Rename total_scalarization to grp_total_scalarization. (completely_scalarize_var): New function. (sort_and_splice_var_accesses): Set total_scalarization in the representative access. (analyze_access_subtree): Propagate total scalarization across the tree, no holes in totally scalarized trees, simplify coverage computation. (analyze_all_variable_accesses): Call completely_scalarize_var instead of completely_scalarize_record. * testsuite/gcc.dg/tree-ssa/sra-12.c: New test. Index: src/gcc/tree-sra.c === *** src.orig/gcc/tree-sra.c --- src/gcc/tree-sra.c *** struct access *** 170,179 /* Is this particular access write access? */ unsigned write : 1; - /* Is this access an artificial one created to scalarize some record - entirely? */ - unsigned total_scalarization : 1; - /* Is this access an access to a non-addressable field? */ unsigned non_addressable : 1; --- 170,175 *** struct access *** 204,209 --- 200,209 is not propagated in the access tree in any direction. */ unsigned grp_scalar_write : 1; + /* Is this access an artificial one created to scalarize some record + entirely? */ + unsigned grp_total_scalarization : 1; + /* Other passes of the analysis use this bit to make function analyze_access_subtree create scalar replacements for this group if possible. */ *** dump_access (FILE *f, struct access *acc *** 377,402 fprintf (f, ", type = "); print_generic_expr (f, access->type, 0); if (grp) ! fprintf (f, ", total_scalarization = %d, grp_read = %d, grp_write = %d, " ! "grp_assignment_read = %d, grp_assignment_write = %d, " !
"grp_scalar_read = %d, grp_scalar_write = %d, grp_hint = %d, " "grp_covered = %d, grp_unscalarizable_region = %d, grp_unscalarized_data = %d, grp_partial_lhs = %d, grp_to_be_replaced = %d, grp_maybe_modified = %d, grp_not_necessarilly_dereferenced = %d\n", ! access->total_scalarization, access->grp_read, access->grp_write, ! access->grp_assignment_read, access->grp_assignment_write, ! access->grp_scalar_read, access->grp_scalar_write, access->grp_hint, access->grp_covered, access->grp_unscalarizable_region, access->grp_unscalarized_data, access->grp_partial_lhs, access->grp_to_be_replaced, access->grp_maybe_modified, access->grp_not_necessarilly_dereferenced); else ! fprintf (f, ", write = %d, total_scalarization = %d, grp_partial_lhs = %d\n",
Re: [Patch, AVR]: Better 32=16*16 widening multiplication
2011/6/28 Georg-Johann Lay a...@gjlay.de: This implements mulhisi3 and umulhisi3 widening multiplication insns if AVR_HAVE_MUL. I chose the interface as r25:r22 = r19:r18 * r21:r20 which is ok because only avr-gcc BE will call respective __* support functions in libgcc. Tested without regression and hand-tested assembler code. Johann * config/avr/t-avr (LIB1ASMFUNCS): Add _mulhisi3, _umulhisi3, _xmulhisi3_exit. * config/avr/libgcc.S (_xmulhisi3_exit): New function. (__mulhisi3): Optimize if have MUL*. Use XJMP instead of rjmp. (__umulhisi3): Ditto. * config/avr/avr.md (mulhisi3): New insn expander. (umulhisi3): New insn expander. (*mulhisi3_call): New insn. (*umulhisi3_call): New insn. Approved. Denis.
Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const
On Mon, Jun 27, 2011 at 3:25 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Jun 27, 2011 at 3:19 PM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Jun 27, 2011 at 3:08 PM, Ulrich Weigand uweig...@de.ibm.com wrote: H.J. Lu wrote: reload generates: (insn 914 912 0 (set (reg:SI 0 ax) (plus:SI (subreg:SI (reg/v/f:DI 182 [ b ]) 0) (const_int 8 [0x8]))) 248 {*lea_1_x32} (nil)) from insn = emit_insn_if_valid_for_reload (gen_rtx_SET (VOIDmode, out, in)); Interesting. The pseudo should have been replaced by the hard register (reg:DI 1) during the preceding call to op0 = find_replacement (XEXP (in, 0)); (since reload 0 should have pushed a replacement record.) Interestingly enough, in the final output that replacement *is* performed in the REG_EQUIV note: (insn 1023 1022 1024 34 (set (reg:SI 1 dx) (plus:SI (reg:SI 1 dx) (const_int 8 [0x8]))) spooles.c:291 248 {*lea_1_x32} (expr_list:REG_EQUIV (plus:SI (subreg:SI (reg:DI 1 dx) 0) (const_int 8 [0x8])) (nil))) which is why I hadn't expected this to be a problem here. Can you try to find out why the find_replacement doesn't work with your test case? I will investigate. Could (reg:SI 1 dx) vs (subreg:SI (reg:DI 1 dx) 0) be a problem? find_replacement never checks subreg: Breakpoint 3, find_replacement (loc=0x7068ab00) at /export/gnu/import/git/gcc-x32/gcc/reload.c:6411 6411 if (reloadreg && r->where == loc) (reg:DI 0 ax) (reg/v/f:DI 182 [ b ]) (gdb) call debug_rtx (*loc) (subreg:SI (reg/v/f:DI 182 [ b ]) 0) (gdb) This patch checks SUBREG pointers if Pmode != ptr_mode. OK for trunk? Thanks. -- H.J. --- 2011-06-28 H.J. Lu hongjiu...@intel.com PR rtl-optimization/49114 * reload.c (find_replacement): Properly handle SUBREG pointers.
diff --git a/gcc/reload.c b/gcc/reload.c index 3ad46b9..829e45b 100644 --- a/gcc/reload.c +++ b/gcc/reload.c @@ -6415,6 +6415,36 @@ find_replacement (rtx *loc) return reloadreg; } + else if (Pmode != ptr_mode + && !r->subreg_loc + && reloadreg + && (r->mode == Pmode || GET_MODE (reloadreg) == Pmode) + && REG_P (reloadreg) + && GET_CODE (*loc) == SUBREG + && REG_P (SUBREG_REG (*loc)) + && REG_POINTER (SUBREG_REG (*loc)) + && GET_MODE (*loc) == ptr_mode + && r->where == SUBREG_REG (*loc)) + { + int offset; + + if (r->mode != VOIDmode && GET_MODE (reloadreg) != r->mode) + reloadreg = gen_rtx_REG (r->mode, REGNO (reloadreg)); + + if ((WORDS_BIG_ENDIAN || BYTES_BIG_ENDIAN) + && GET_MODE_SIZE (Pmode) > GET_MODE_SIZE (ptr_mode)) + { + offset = GET_MODE_SIZE (Pmode) - GET_MODE_SIZE (ptr_mode); + if (! BYTES_BIG_ENDIAN) + offset = (offset / UNITS_PER_WORD) * UNITS_PER_WORD; + else if (! WORDS_BIG_ENDIAN) + offset %= UNITS_PER_WORD; + } + else + offset = 0; + + return gen_rtx_SUBREG (ptr_mode, reloadreg, offset); + } else if (reloadreg && r->subreg_loc == loc) { /* RELOADREG must be either a REG or a SUBREG.
MN10300: Do not use linker relaxation and incremental linking together
Hi Guys, With the MN10300, enabling linker relaxation when performing an incremental link does not work: % mn10300-elf-gcc hello.c -mrelax -r collect-ld: --relax and -r may not be used together collect2: error: ld returned 1 exit status Hence I am applying the patch below as an obvious fix for the problem. Tested without regressions on an mn10300-elf toolchain. Cheers Nick gcc/ChangeLog 2011-06-28 Nick Clifton ni...@redhat.com * config/mn10300/mn10300.h (LINK_SPEC): Do not use linker relaxation when performing an incremental link. Index: gcc/config/mn10300/mn10300.h === --- gcc/config/mn10300/mn10300.h (revision 175576) +++ gcc/config/mn10300/mn10300.h (working copy) @@ -24,7 +24,7 @@ #undef LIB_SPEC #undef ENDFILE_SPEC #undef LINK_SPEC -#define LINK_SPEC "%{mrelax:--relax}" +#define LINK_SPEC "%{mrelax:%{!r:--relax}}" #undef STARTFILE_SPEC #define STARTFILE_SPEC "%{!mno-crt0:%{!shared:%{pg:gcrt0%O%s}%{!pg:%{p:mcrt0%O%s}%{!p:crt0%O%s
Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
On 28/06/11 13:33, Andrew Stubbs wrote: On 23/06/11 15:41, Andrew Stubbs wrote: If one or both of the inputs to a widening multiply are of unsigned type then the compiler will attempt to use usmul_widen_optab or umul_widen_optab, respectively. That works fine, but only if the target supports those operations directly. Otherwise, it just bombs out and reverts to the normal inefficient non-widening multiply. This patch attempts to catch these cases and use an alternative signed widening multiply instruction, if one of those is available. I believe this should be legal as long as the top bit of both inputs is guaranteed to be zero. The code achieves this guarantee by zero-extending the inputs to a wider mode (which must still be narrower than the output mode). OK? This update fixes the testsuite issue Janis pointed out. And this one fixes up the wmul-5.c testcase also. The patch has changed the correct result. Andrew 2011-06-28 Andrew Stubbs a...@codesourcery.com gcc/ * Makefile.in (tree-ssa-math-opts.o): Add langhooks.h dependency. * optabs.c (find_widening_optab_handler): Rename to ... (find_widening_optab_handler_and_mode): ... this, and add new argument 'found_mode'. * optabs.h (find_widening_optab_handler): Rename to ... (find_widening_optab_handler_and_mode): ... this. (find_widening_optab_handler): New macro. * tree-ssa-math-opts.c: Include langhooks.h (build_and_insert_cast): New function. (convert_mult_to_widen): Add new argument 'gsi'. Convert unsupported unsigned multiplies to signed. (convert_plusminus_to_widen): Likewise. (execute_optimize_widening_mul): Pass gsi to convert_mult_to_widen. gcc/testsuite/ * gcc.target/arm/wmul-5.c: Update expected result. * gcc.target/arm/wmul-6.c: New file. 
--- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -2672,7 +2672,8 @@ tree-ssa-loop-im.o : tree-ssa-loop-im.c $(TREE_FLOW_H) $(CONFIG_H) \ tree-ssa-math-opts.o : tree-ssa-math-opts.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ $(TM_H) $(FLAGS_H) $(TREE_H) $(TREE_FLOW_H) $(TIMEVAR_H) \ $(TREE_PASS_H) alloc-pool.h $(BASIC_BLOCK_H) $(TARGET_H) \ - $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h + $(DIAGNOSTIC_H) $(RTL_H) $(EXPR_H) $(OPTABS_H) gimple-pretty-print.h \ + langhooks.h tree-ssa-alias.o : tree-ssa-alias.c $(TREE_FLOW_H) $(CONFIG_H) $(SYSTEM_H) \ $(TREE_H) $(TM_P_H) $(EXPR_H) $(GGC_H) $(TREE_INLINE_H) $(FLAGS_H) \ $(FUNCTION_H) $(TIMEVAR_H) convert.h $(TM_H) coretypes.h langhooks.h \ --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -232,9 +232,10 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1) non-widening optabs also. */ enum insn_code -find_widening_optab_handler (optab op, enum machine_mode to_mode, - enum machine_mode from_mode, - int permit_non_widening) +find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode, + enum machine_mode from_mode, + int permit_non_widening, + enum machine_mode *found_mode) { for (; (permit_non_widening || from_mode != to_mode) && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode) @@ -245,7 +246,11 @@ find_widening_optab_handler (optab op, enum machine_mode to_mode, from_mode); if (handler != CODE_FOR_nothing) - return handler; + { + if (found_mode) + *found_mode = from_mode; + return handler; + } } return CODE_FOR_nothing; --- a/gcc/optabs.h +++ b/gcc/optabs.h @@ -808,8 +808,13 @@ extern void emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code); extern bool maybe_emit_unop_insn (enum insn_code, rtx, rtx, enum rtx_code); /* Find a widening optab even if it doesn't widen as much as we want.
*/ -extern enum insn_code find_widening_optab_handler (optab, enum machine_mode, - enum machine_mode, int); +#define find_widening_optab_handler(A,B,C,D) \ + find_widening_optab_handler_and_mode (A, B, C, D, NULL) +extern enum insn_code find_widening_optab_handler_and_mode (optab, + enum machine_mode, + enum machine_mode, + int, + enum machine_mode *); /* An extra flag to control optab_for_tree_code's behavior. This is needed to distinguish between machines with a vector shift that takes a scalar for the --- a/gcc/testsuite/gcc.target/arm/wmul-5.c +++ b/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -7,4 +7,4 @@ foo (long long a, char *b, char *c) return a + *b * *c; } -/* { dg-final { scan-assembler "umlal" } } */ +/* { dg-final { scan-assembler "smlalbb" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-6.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, unsigned char *b, signed char *c) +{ + return a + (long long)*b * (long long)*c; +} + +/* { dg-final { scan-assembler "smlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -98,6 +98,7 @@ along with GCC; see the file COPYING3. If not see #include "basic-block.h"
[PATCH] [ARM] Fix constraint modifiers for VFP patterns.
Hi, Sometime back Chung-Lin noticed that a few of the VFP patterns as below had the '+' constraint modifiers rather than the '=' constraint modifiers. I've now corrected this as follows and tested this on trunk with arm-linux-gnueabi and qemu for a v7-a neon test run. Committed. cheers Ramana 2011-06-28 Ramana Radhakrishnan ramana.radhakrish...@linaro.org * config/arm/vfp.md (*divsf3_vfp): Replace '+' constraint modifier with '=' constraint modifier. (*divdf3_vfp): Likewise. (*mulsf3_vfp): Likewise. (*muldf3_vfp): Likewise. (*mulsf3negsf_vfp): Likewise. (*muldf3negdf_vfp): Likewise. --- gcc/config/arm/arm.h | 2 +- gcc/config/arm/vfp.md | 13 ++--- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index b0d2625..edd6afd 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -1597,7 +1597,7 @@ typedef struct frame. */ #define EXIT_IGNORE_STACK 1 -#define EPILOGUE_USES(REGNO) ((REGNO) == LR_REGNUM) +#define EPILOGUE_USES(REGNO) (epilogue_completed && (REGNO) == LR_REGNUM) /* Determine if the epilogue should be output as RTL. You should override this if you define FUNCTION_EXTRA_EPILOGUE.
*/ diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md index 42be2ff..e2165a8 100644 --- a/gcc/config/arm/vfp.md +++ b/gcc/config/arm/vfp.md @@ -719,7 +719,7 @@ ;; Division insns (define_insn "*divsf3_vfp" - [(set (match_operand:SF 0 "s_register_operand" "+t") + [(set (match_operand:SF 0 "s_register_operand" "=t") (div:SF (match_operand:SF 1 "s_register_operand" "t") (match_operand:SF 2 "s_register_operand" "t")))] "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP" @@ -729,7 +729,7 @@ ) (define_insn "*divdf3_vfp" - [(set (match_operand:DF 0 "s_register_operand" "+w") + [(set (match_operand:DF 0 "s_register_operand" "=w") (div:DF (match_operand:DF 1 "s_register_operand" "w") (match_operand:DF 2 "s_register_operand" "w")))] "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE" @@ -742,7 +742,7 @@ ;; Multiplication insns (define_insn "*mulsf3_vfp" - [(set (match_operand:SF 0 "s_register_operand" "+t") + [(set (match_operand:SF 0 "s_register_operand" "=t") (mult:SF (match_operand:SF 1 "s_register_operand" "t") (match_operand:SF 2 "s_register_operand" "t")))] "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP" @@ -752,7 +752,7 @@ ) (define_insn "*muldf3_vfp" - [(set (match_operand:DF 0 "s_register_operand" "+w") + [(set (match_operand:DF 0 "s_register_operand" "=w") (mult:DF (match_operand:DF 1 "s_register_operand" "w") (match_operand:DF 2 "s_register_operand" "w")))] "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE" @@ -761,9 +761,8 @@ (set_attr "type" "fmuld")] ) - (define_insn "*mulsf3negsf_vfp" - [(set (match_operand:SF 0 "s_register_operand" "+t") + [(set (match_operand:SF 0 "s_register_operand" "=t") (mult:SF (neg:SF (match_operand:SF 1 "s_register_operand" "t")) (match_operand:SF 2 "s_register_operand" "t")))] "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP" @@ -773,7 +772,7 @@ ) (define_insn "*muldf3negdf_vfp" - [(set (match_operand:DF 0 "s_register_operand" "+w") + [(set (match_operand:DF 0 "s_register_operand" "=w") (mult:DF (neg:DF (match_operand:DF 1 "s_register_operand" "w")) (match_operand:DF 2 "s_register_operand" "w")))] "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP_DOUBLE" -- 1.7.4.1
Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const
H.J. Lu wrote: find_replacement never checks subreg: Breakpoint 3, find_replacement (loc=0x7068ab00) at /export/gnu/import/git/gcc-x32/gcc/reload.c:6411 6411 if (reloadreg && r->where == loc) (reg:DI 0 ax) (reg/v/f:DI 182 [ b ]) (gdb) call debug_rtx (*loc) (subreg:SI (reg/v/f:DI 182 [ b ]) 0) (gdb) This seems to work. Does it make any sense? Ah, I see. This was supposed to be handled via the SUBREG_LOC member of the replacement struct. Unfortunately, it turns out that this is no longer reliably set these days ... At first I was concerned that this might also cause problems at the other location where replacements are processed, subst_reloads. However, it turns out that code in subst_reloads is dead these days anyway, as the reloadreg is *always* a REG, and never a SUBREG. Once that code (and similar code in find_replacement that tries to handle SUBREG reloadregs) is removed, the only remaining user of the SUBREG_LOC field is actually find_replacement. But here we're doing a recursive descent through an RTL anyway, so we always know we're replacing inside a SUBREG. This makes the whole SUBREG_LOC field obsolete. The patch below implements those changes (untested so far). Can you verify that this works for you as well? Thanks, Ulrich ChangeLog: * reload.c (struct replacement): Remove SUBREG_LOC member. (push_reload): Do not set it. (push_replacement): Likewise. (subst_reloads): Remove dead code. (copy_replacements): Remove assertion. (copy_replacements_1): Do not handle SUBREG_LOC. (move_replacements): Likewise. (find_replacement): Remove dead code. Detect subregs via recursive descent instead of via SUBREG_LOC. Index: gcc/reload.c === *** gcc/reload.c (revision 175580) --- gcc/reload.c (working copy) *** static int replace_reloads; *** 158,165 struct replacement { rtx *where; /* Location to store in */ - rtx *subreg_loc; /* Location of SUBREG if WHERE is inside - a SUBREG; 0 otherwise.
*/ int what; /* which reload this is for */ enum machine_mode mode; /* mode it must have */ }; --- 158,163 *** push_reload (rtx in, rtx out, rtx *inloc *** 1496,1502 { struct replacement *r = replacements[n_replacements++]; r->what = i; - r->subreg_loc = in_subreg_loc; r->where = inloc; r->mode = inmode; } --- 1494,1499 *** push_reload (rtx in, rtx out, rtx *inloc *** 1505,1511 struct replacement *r = replacements[n_replacements++]; r->what = i; r->where = outloc; - r->subreg_loc = out_subreg_loc; r->mode = outmode; } } --- 1502,1507 *** push_replacement (rtx *loc, int reloadnu *** 1634,1640 struct replacement *r = replacements[n_replacements++]; r->what = reloadnum; r->where = loc; - r->subreg_loc = 0; r->mode = mode; } } --- 1630,1635 *** subst_reloads (rtx insn) *** 6287,6319 if (GET_MODE (reloadreg) != r->mode && r->mode != VOIDmode) reloadreg = reload_adjust_reg_for_mode (reloadreg, r->mode); ! /* If we are putting this into a SUBREG and RELOADREG is a ! SUBREG, we would be making nested SUBREGs, so we have to fix ! this up. Note that r->where == &SUBREG_REG (*r->subreg_loc). */ ! ! if (r->subreg_loc != 0 && GET_CODE (reloadreg) == SUBREG) ! { ! if (GET_MODE (*r->subreg_loc) ! == GET_MODE (SUBREG_REG (reloadreg))) ! *r->subreg_loc = SUBREG_REG (reloadreg); ! else ! { ! int final_offset = ! SUBREG_BYTE (*r->subreg_loc) + SUBREG_BYTE (reloadreg); ! ! /* When working with SUBREGs the rule is that the byte ! offset must be a multiple of the SUBREG's mode. */ ! final_offset = (final_offset / ! GET_MODE_SIZE (GET_MODE (*r->subreg_loc))); ! final_offset = (final_offset * ! GET_MODE_SIZE (GET_MODE (*r->subreg_loc))); ! ! *r->where = SUBREG_REG (reloadreg); ! SUBREG_BYTE (*r->subreg_loc) = final_offset; ! } ! } ! else ! *r->where = reloadreg; } /* If reload got no reg and isn't optional, something's wrong. */ else --- 6282,6288 if (GET_MODE (reloadreg) != r->mode && r->mode != VOIDmode) reloadreg = reload_adjust_reg_for_mode (reloadreg, r->mode); !
*/ *r->where = reloadreg; } /* If reload got no reg and isn't optional,
Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const
On Tue, Jun 28, 2011 at 7:24 AM, Ulrich Weigand uweig...@de.ibm.com wrote: [full quote of the message and reload.c patch above snipped]
Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const
On Tue, Jun 28, 2011 at 7:47 AM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Jun 28, 2011 at 7:24 AM, Ulrich Weigand uweig...@de.ibm.com wrote: [full quote of the message and reload.c patch above snipped]
Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
On 23/06/11 15:41, Andrew Stubbs wrote: This patch removes the restriction that the inputs to a widening multiply must be of the same mode. It does this by extending the smaller of the two inputs to match the larger; therefore, it remains the case that subsequent code (in the expand pass, for example) can rely on the type of rhs1 being the input type of the operation, and the gimple verification code is still valid. OK? This update fixes the testcase issue Janis highlighted. Andrew 2011-06-28 Andrew Stubbs a...@codesourcery.com gcc/ * tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME. Ensure that the larger type is the first operand. (convert_mult_to_widen): Insert cast if type2 is smaller than type1. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-7.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-7.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +unsigned long long +foo (unsigned long long a, unsigned char *b, unsigned short *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2051,9 +2051,17 @@ is_widening_mult_p (gimple stmt, *type2_out = *type1_out; } - /* FIXME: remove this restriction. */ - if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out)) - return false; + /* Ensure that the larger of the two operands comes first.
*/ + if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out)) +{ + tree tmp; + tmp = *type1_out; + *type1_out = *type2_out; + *type2_out = tmp; + tmp = *rhs1_out; + *rhs1_out = *rhs2_out; + *rhs2_out = tmp; +} return true; } @@ -2069,6 +2077,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) enum insn_code handler; enum machine_mode to_mode, from_mode; optab op; + int cast1 = false, cast2 = false; lhs = gimple_assign_lhs (stmt); type = TREE_TYPE (lhs); @@ -2107,16 +2116,26 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) return false; type1 = type2 = lang_hooks.types.type_for_mode (from_mode, 0); - - rhs1 = build_and_insert_cast (gsi, gimple_location (stmt), - create_tmp_var (type1, NULL), rhs1, type1); - rhs2 = build_and_insert_cast (gsi, gimple_location (stmt), - create_tmp_var (type2, NULL), rhs2, type2); + cast1 = cast2 = true; } else return false; } + if (TYPE_MODE (type2) != from_mode) +{ + type2 = lang_hooks.types.type_for_mode (from_mode, + TYPE_UNSIGNED (type2)); + cast2 = true; +} + + if (cast1) +rhs1 = build_and_insert_cast (gsi, gimple_location (stmt), + create_tmp_var (type1, NULL), rhs1, type1); + if (cast2) +rhs2 = build_and_insert_cast (gsi, gimple_location (stmt), + create_tmp_var (type2, NULL), rhs2, type2); + gimple_assign_set_rhs1 (stmt, fold_convert (type1, rhs1)); gimple_assign_set_rhs2 (stmt, fold_convert (type2, rhs2)); gimple_assign_set_rhs_code (stmt, WIDEN_MULT_EXPR); @@ -2142,6 +2161,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, optab this_optab; enum tree_code wmult_code; enum insn_code handler; + int cast1 = false, cast2 = false; lhs = gimple_assign_lhs (stmt); type = TREE_TYPE (lhs); @@ -2211,17 +2231,28 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (TYPE_MODE (type))) { type1 = type2 = lang_hooks.types.type_for_mode (mode, 0); - mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt), -
create_tmp_var (type1, NULL), - mult_rhs1, type1); - mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt), - create_tmp_var (type2, NULL), - mult_rhs2, type2); + cast1 = cast2 = true; } else return false; } + if (TYPE_MODE (type2) != TYPE_MODE (type1)) +{ + type2 = lang_hooks.types.type_for_mode (TYPE_MODE (type1), + TYPE_UNSIGNED (type2)); + cast2 = true; +} + + if (cast1) +mult_rhs1 = build_and_insert_cast (gsi, gimple_location (stmt), + create_tmp_var (type1, NULL), + mult_rhs1, type1); + if (cast2) +mult_rhs2 = build_and_insert_cast (gsi, gimple_location (stmt), + create_tmp_var (type2, NULL), + mult_rhs2, type2); + /* Verify that the machine can perform a widening multiply accumulate in this mode/signedness combination, otherwise this transformation is likely to pessimize code. */
Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const
H.J. Lu wrote: it doesn't work; allocation.f: In function 'allocation': allocation.f:1048:0: internal compiler error: in subreg_get_info, at rtlanal.c:3235 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. since subreg_regno_offset only works on hard registers. Hmm, OK. That looks like another latent bug in the original code ... + if (r->mode != VOIDmode && GET_MODE (reloadreg) != r->mode) + reloadreg = gen_rtx_REG (r->mode, REGNO (reloadreg)); (As an aside, this is wrong; it's already wrong in the place where you copied it from. This should now use reload_adjust_reg_for_mode just like subst_reloads does.) + + if ((WORDS_BIG_ENDIAN || BYTES_BIG_ENDIAN) + && GET_MODE_SIZE (Pmode) > GET_MODE_SIZE (ptr_mode)) + { + offset = GET_MODE_SIZE (Pmode) - GET_MODE_SIZE (ptr_mode); + if (! BYTES_BIG_ENDIAN) + offset = (offset / UNITS_PER_WORD) * UNITS_PER_WORD; + else if (! WORDS_BIG_ENDIAN) + offset %= UNITS_PER_WORD; + } + else + offset = 0; + + return gen_rtx_SUBREG (ptr_mode, reloadreg, offset); works for me. This doesn't seem correct either, it completely ignores the SUBREG_BYTE of the original SUBREG ... Also, I don't quite see why this should have anything special for Pmode / ptr_mode. It seems simplest to just use simplify_gen_subreg here. Can you try the following version? Thanks, Ulrich ChangeLog: * reload.c (struct replacement): Remove SUBREG_LOC member. (push_reload): Do not set it. (push_replacement): Likewise. (subst_reloads): Remove dead code. (copy_replacements): Remove assertion. (copy_replacements_1): Do not handle SUBREG_LOC. (move_replacements): Likewise. (find_replacement): Remove dead code. Use reload_adjust_reg_for_mode. Detect subregs via recursive descent instead of via SUBREG_LOC.
Index: gcc/reload.c === *** gcc/reload.c(revision 175580) --- gcc/reload.c(working copy) *** static int replace_reloads; *** 158,165 struct replacement { rtx *where; /* Location to store in */ - rtx *subreg_loc;/* Location of SUBREG if WHERE is inside - a SUBREG; 0 otherwise. */ int what; /* which reload this is for */ enum machine_mode mode; /* mode it must have */ }; --- 158,163 *** push_reload (rtx in, rtx out, rtx *inloc *** 1496,1502 { struct replacement *r = replacements[n_replacements++]; r->what = i; - r->subreg_loc = in_subreg_loc; r->where = inloc; r->mode = inmode; } --- 1494,1499 *** push_reload (rtx in, rtx out, rtx *inloc *** 1505,1511 struct replacement *r = replacements[n_replacements++]; r->what = i; r->where = outloc; - r->subreg_loc = out_subreg_loc; r->mode = outmode; } } --- 1502,1507 *** push_replacement (rtx *loc, int reloadnu *** 1634,1640 struct replacement *r = replacements[n_replacements++]; r->what = reloadnum; r->where = loc; - r->subreg_loc = 0; r->mode = mode; } } --- 1630,1635 *** subst_reloads (rtx insn) *** 6287,6319 if (GET_MODE (reloadreg) != r->mode && r->mode != VOIDmode) reloadreg = reload_adjust_reg_for_mode (reloadreg, r->mode); ! /* If we are putting this into a SUBREG and RELOADREG is a ! SUBREG, we would be making nested SUBREGs, so we have to fix ! this up. Note that r->where == SUBREG_REG (*r->subreg_loc). */ ! ! if (r->subreg_loc != 0 && GET_CODE (reloadreg) == SUBREG) ! { ! if (GET_MODE (*r->subreg_loc) ! == GET_MODE (SUBREG_REG (reloadreg))) ! *r->subreg_loc = SUBREG_REG (reloadreg); ! else ! { ! int final_offset = ! SUBREG_BYTE (*r->subreg_loc) + SUBREG_BYTE (reloadreg); ! ! /* When working with SUBREGs the rule is that the byte ! offset must be a multiple of the SUBREG's mode. */ ! final_offset = (final_offset / ! GET_MODE_SIZE (GET_MODE (*r->subreg_loc))); ! final_offset = (final_offset * ! GET_MODE_SIZE (GET_MODE (*r->subreg_loc))); ! ! *r->where = SUBREG_REG (reloadreg); ! SUBREG_BYTE (*r->subreg_loc) = final_offset; ! } ! } ! else ! 
*r->where = reloadreg; } /* If reload got no reg and isn't optional,
Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
On 23/06/11 15:42, Andrew Stubbs wrote: This patch fixes the case where widening multiply-and-accumulate were not recognised because the multiplication itself is not actually widening. This can happen when you have DI + SI * SI - the multiplication will be done in SImode as a non-widening multiply, and it's only the final accumulate step that is widening. This was not recognised for two reasons: 1. is_widening_mult_p inferred the output type from the multiply statement, which is not useful in this case. 2. The inputs to the multiply instruction may not have been converted at all (because they're not being widened), so the pattern match failed. The patch fixes these issues by making the output type explicit, and by permitting unconverted inputs (the types are still checked, so this is safe). OK? This update fixes Janis' testsuite issue. Andrew 2011-06-28 Andrew Stubbs a...@codesourcery.com gcc/ * tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument 'type'. Use 'type' from caller, not inferred from 'rhs'. Don't reject non-conversion statements. Do return lhs in this case. (is_widening_mult_p): Add new argument 'type'. Use 'type' from caller, not inferred from 'stmt'. Pass type to is_widening_mult_rhs_p. (convert_mult_to_widen): Pass type to is_widening_mult_p. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-8.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-8.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, int *b, int *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "smlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1963,7 +1963,8 @@ struct gimple_opt_pass pass_optimize_bswap = } }; -/* Return true if RHS is a suitable operand for a widening multiplication. +/* Return true if RHS is a suitable operand for a widening multiplication, + assuming a target type of TYPE. 
There are two cases: - RHS makes some value at least twice as wide. Store that value @@ -1973,32 +1974,32 @@ struct gimple_opt_pass pass_optimize_bswap = but leave *TYPE_OUT untouched. */ static bool -is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out) +is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out, + tree *new_rhs_out) { gimple stmt; - tree type, type1, rhs1; + tree type1, rhs1; enum tree_code rhs_code; if (TREE_CODE (rhs) == SSA_NAME) { - type = TREE_TYPE (rhs); stmt = SSA_NAME_DEF_STMT (rhs); if (!is_gimple_assign (stmt)) return false; - rhs_code = gimple_assign_rhs_code (stmt); - if (TREE_CODE (type) == INTEGER_TYPE - ? !CONVERT_EXPR_CODE_P (rhs_code) - : rhs_code != FIXED_CONVERT_EXPR) - return false; - rhs1 = gimple_assign_rhs1 (stmt); type1 = TREE_TYPE (rhs1); if (TREE_CODE (type1) != TREE_CODE (type) || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type)) return false; - *new_rhs_out = rhs1; + rhs_code = gimple_assign_rhs_code (stmt); + if (TREE_CODE (type) == INTEGER_TYPE + ? !CONVERT_EXPR_CODE_P (rhs_code) + : rhs_code != FIXED_CONVERT_EXPR) + *new_rhs_out = gimple_assign_lhs (stmt); + else + *new_rhs_out = rhs1; *type_out = type1; return true; } @@ -2013,28 +2014,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out) return false; } -/* Return true if STMT performs a widening multiplication. If so, - store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT - respectively. Also fill *RHS1_OUT and *RHS2_OUT such that converting - those operands to types *TYPE1_OUT and *TYPE2_OUT would give the - operands of the multiplication. */ +/* Return true if STMT performs a widening multiplication, assuming the + output type is TYPE. If so, store the unwidened types of the operands + in *TYPE1_OUT and *TYPE2_OUT respectively. Also fill *RHS1_OUT and + *RHS2_OUT such that converting those operands to types *TYPE1_OUT + and *TYPE2_OUT would give the operands of the multiplication. 
*/ static bool -is_widening_mult_p (gimple stmt, +is_widening_mult_p (tree type, gimple stmt, tree *type1_out, tree *rhs1_out, tree *type2_out, tree *rhs2_out) { - tree type; - - type = TREE_TYPE (gimple_assign_lhs (stmt)); if (TREE_CODE (type) != INTEGER_TYPE && TREE_CODE (type) != FIXED_POINT_TYPE) return false; - if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out)) + if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out, + rhs1_out)) return false; - if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out)) + if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out, + rhs2_out)) return false; if (*type1_out == NULL) @@ -2084,7 +2084,7 @@ convert_mult_to_widen (gimple stmt,
[Patch, Fortran, F08] PR 49562: [4.6/4.7 Regression] [OOP] assigning value to type-bound function
Hi all, here is a patch for a problem which was originally reported as an ICE-on-invalid regression (assigning to a type-bound function). In the course of fixing it, I noticed that it becomes valid according to F08 if the function is pointer-valued, and modified the patch such that it will accept this variant. I also adapted the original test case to be a run-time test of this F08 feature (in fact it is just a very complicated way of performing an increment from 0 to 1, and would still segfault without the patch). The patch was regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.6.2? Cheers, Janus 2011-06-28 Janus Weil ja...@gcc.gnu.org PR fortran/49562 * expr.c (gfc_check_vardef_context): Handle type-bound procedures. 2011-06-28 Janus Weil ja...@gcc.gnu.org PR fortran/49562 * gfortran.dg/typebound_proc_23.f90: New. Index: gcc/fortran/expr.c === --- gcc/fortran/expr.c (revision 175580) +++ gcc/fortran/expr.c (working copy) @@ -4394,8 +4394,8 @@ gfc_check_vardef_context (gfc_expr* e, bool pointe sym = e->value.function.esym ? e->value.function.esym : e->symtree->n.sym; } - if (!pointer && e->expr_type == EXPR_FUNCTION - && sym->result->attr.pointer) + attr = gfc_expr_attr (e); + if (!pointer && e->expr_type == EXPR_FUNCTION && attr.pointer) { if (!(gfc_option.allow_std & GFC_STD_F2008)) { @@ -4432,7 +4432,6 @@ gfc_check_vardef_context (gfc_expr* e, bool pointe /* Find out whether the expr is a pointer; this also means following component references to the last one. */ - attr = gfc_expr_attr (e); is_pointer = (attr.pointer || attr.proc_pointer); if (pointer && !is_pointer) { ! { dg-do compile } ! ! PR 49562: [4.6/4.7 Regression] [OOP] assigning value to type-bound function ! ! 
Contributed by Hans-Werner Boschmann boschm...@tp1.physik.uni-siegen.de module ice type::ice_type contains procedure::ice_func end type integer, target :: it = 0 contains function ice_func(this) integer, pointer :: ice_func class(ice_type)::this ice_func => it end function ice_func subroutine ice_sub(a) class(ice_type)::a a%ice_func() = 1 end subroutine ice_sub end module use ice type(ice_type) :: t if (it/=0) call abort() call ice_sub(t) if (it/=1) call abort() end ! { dg-final { cleanup-modules "ice" } }
Re: PATCH [10/n]: Prepare x32: PR rtl-optimization/49114: Reload failed to handle (set reg:X (plus:X (subreg:X (reg:Y) 0) (const
On Tue, Jun 28, 2011 at 8:19 AM, Ulrich Weigand uweig...@de.ibm.com wrote: H.J. Lu wrote: it doesn't work; allocation.f: In function 'allocation': allocation.f:1048:0: internal compiler error: in subreg_get_info, at rtlanal.c:3235 Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. since subreg_regno_offset only works on hard registers. Hmm, OK. That looks like another latent bug in the original code ... + if (r->mode != VOIDmode && GET_MODE (reloadreg) != r->mode) + reloadreg = gen_rtx_REG (r->mode, REGNO (reloadreg)); (As an aside, this is wrong; it's already wrong in the place where you copied it from. This should now use reload_adjust_reg_for_mode just like subst_reload does.) + + if ((WORDS_BIG_ENDIAN || BYTES_BIG_ENDIAN) + && GET_MODE_SIZE (Pmode) > GET_MODE_SIZE (ptr_mode)) + { + offset = GET_MODE_SIZE (Pmode) - GET_MODE_SIZE (ptr_mode); + if (! BYTES_BIG_ENDIAN) + offset = (offset / UNITS_PER_WORD) * UNITS_PER_WORD; + else if (! WORDS_BIG_ENDIAN) + offset %= UNITS_PER_WORD; + } + else + offset = 0; + + return gen_rtx_SUBREG (ptr_mode, reloadreg, offset); works for me. This doesn't seem correct either, it completely ignores the SUBREG_BYTE of the original SUBREG ... Also, I don't quite see why this should have anything special for Pmode / ptr_mode. It seems simplest to just use simplify_gen_subreg here. Can you try the following version? Thanks, Ulrich ChangeLog: * reload.c (struct replacement): Remove SUBREG_LOC member. (push_reload): Do not set it. (push_replacement): Likewise. (subst_reload): Remove dead code. (copy_replacements): Remove assertion. (copy_replacements_1): Do not handle SUBREG_LOC. (move_replacements): Likewise. (find_replacement): Remove dead code. Use reload_adjust_reg_for_mode. Detect subregs via recursive descent instead of via SUBREG_LOC. It works much better. I am testing it now. Thanks. -- H.J.
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
Hi, On Tue, 28 Jun 2011, Richard Guenther wrote: I'd name the predicate value_preserving_conversion_p which I think is what you mean. harmless isn't really descriptive. Note that you include non-value-preserving conversions, namely int -> unsigned int. It seems that Andrew really does want to accept them. If so value_preserving_conversion_p would be the wrong name. It seems to me he wants to accept those conversions that make it possible to retrieve the old value, i.e. when T1 x; (T1)(T2)x == x, then T1 -> T2 has the to-be-named property. bits_preserving? Hmm. Ciao, Michael.
Re: [PATCH, SRA] Total scalarization and padding
Hi, On Tue, Jun 28, 2011 at 03:01:17PM +0200, Richard Guenther wrote: On Tue, Jun 28, 2011 at 2:50 PM, Martin Jambor mjam...@suse.cz wrote: Hi, at the moment SRA can get confused by alignment padding and think that it actually contains some data for which there is no planned replacement and thus might leave some loads and stores in place instead of removing them. This is perhaps the biggest problem when we attempt total scalarization of simple structures exactly in order to get rid of these and of the variables altogether. I've pondered for quite a while how to best deal with them. One option was to make just the total scalarization stronger. I have also contemplated creating phantom accesses for padding I could detect (i.e. in simple structures) which would be more general, but this would complicate the parts of SRA which are already quite convoluted and I was not really sure it was worth it. Eventually I decided for the total scalarization option. This patch changes it such that the flag is propagated down the access tree but also, if it does not work out, is reset on the way up. If the flag survives, the access tree is considered covered by scalar replacements and thus it is known not to contain unscalarized data. While changing function analyze_access_subtree I have simplified the way we compute the hole flag and also fixed one comparison which we currently have the wrong way round but it fortunately does not matter because if there is a hole, the covered_to will never add up to the total size. I'll probably post a separate patch against 4.6 just in case someone attempts to read the source. Bootstrapped and tested on x86_64-linux, OK for trunk? So, what will it do for the testcase? The following is what I _think_ it should do: bb 2: l = *p_1(D); l$i_6 = p_1(D)->i; D.2700_2 = l$i_6; D.2701_3 = D.2700_2 + 1; l$i_12 = D.2701_3; *p_1(D) = l; p_1(D)->i = l$i_12; and let FRE/DSE do their job (which they don't do, unfortunately). 
So does your patch then remove the load/store from/to l but keep the elementwise loads/stores (which are probably cleaned up by FRE)? Well, that is what would happen if no total scalarization was going on. Total scalarization is a poor-man's aggregate copy-propagation by splitting up small structures to individual fields whenever we can get rid of them this way (i.e. if they are never used in a non-assignment) which I introduced to fix PR 42585 - but unfortunately the padding problem did not occur to me until this winter. Currently, SRA performs very badly on the testcase, creating: bb 2: l = *p_1(D); l$i_6 = p_1(D)->i; l$f1_8 = p_1(D)->f1; l$f2_9 = p_1(D)->f2; l$f3_10 = p_1(D)->f3; l$f4_11 = p_1(D)->f4; D.1966_2 = l$i_6; D.1967_3 = D.1966_2 + 1; l$i_12 = D.1967_3; *p_1(D) = l; -- this should not be here p_1(D)->i = l$i_12; p_1(D)->f1 = l$f1_8; p_1(D)->f2 = l$f2_9; p_1(D)->f3 = l$f3_10; p_1(D)->f4 = l$f4_11; return; Unfortunately, this basically survives all the way to the optimized dump. With the patch, the assignment *p_1(D) = l; is removed and copyprop1 and cddce1 turn this into: bb 2: l$i_6 = p_1(D)->i; D.1967_3 = l$i_6 + 1; p_1(D)->i = D.1967_3; return; which is then the optimized gimple, already before IPA and at -O1. For the record, without total scalarization, the optimized gimple would be: bb 2: l = *p_1(D); l$i_6 = p_1(D)->i; D.1967_3 = l$i_6 + 1; *p_1(D) = l; p_1(D)->i = D.1967_3; return; So at the moment FRE/DSE certainly does not help. Eventually we should do something like that or a real aggregate copy propagation but until then we probably need to live with the total scalarization thingy - I have learned in the PR mentioned above and a few others, there are people who really want at least this functionality now - and it should not perform this badly on unaligned structures. Martin Richard. 
Thanks, Martin 2011-06-24 Martin Jambor mjam...@suse.cz * tree-sra.c (struct access): Rename total_scalarization to grp_total_scalarization. (completely_scalarize_var): New function. (sort_and_splice_var_accesses): Set total_scalarization in the representative access. (analyze_access_subtree): Propagate total scalarization across the tree, no holes in totally scalarized trees, simplify coverage computation. (analyze_all_variable_accesses): Call completely_scalarize_var instead of completely_scalarize_record. * testsuite/gcc.dg/tree-ssa/sra-12.c: New test. Index: src/gcc/tree-sra.c === *** src.orig/gcc/tree-sra.c --- src/gcc/tree-sra.c *** struct access *** 170,179 /* Is this particular access write access? */ unsigned write : 1; - /* Is this access an artificial one created to
[testsuite, objc] Don't XFAIL objc.dg/torture/forward-1.m
objc.dg/torture/forward-1.m now seems to XPASS everywhere, creating an annoying amount of testsuite noise. Dominique provided the following patch in PR libobjc/36610. Tested with the appropriate runtest invocations on i386-pc-solaris2.10 (both multilibs), sparc-sun-solaris2.10 (both multilibs), alpha-dec-osf5.1b, mips-sgi-irix6.5 (both multilibs), powerpc-apple-darwin9.8.0 (32-bit only). Ok for mainline? Thanks. Rainer 2011-06-28 Dominique d'Humieres domi...@lps.ens.fr * objc.dg/torture/forward-1.m: Remove dg-xfail-run-if, dg-skip-if. Index: gcc/testsuite/objc.dg/torture/forward-1.m === --- gcc/testsuite/objc.dg/torture/forward-1.m (revision 175589) +++ gcc/testsuite/objc.dg/torture/forward-1.m (working copy) @@ -1,7 +1,5 @@ /* { dg-do run } */ /* See if -forward:: is able to work. */ -/* { dg-xfail-run-if "PR36610" { ! { { i?86-*-* x86_64-*-* } && ilp32 } } { "-fgnu-runtime" } { } } */ -/* { dg-skip-if "Needs OBJC2 Implementation" { *-*-darwin* && { lp64 } } { "-fnext-runtime" } { } } */ #include <stdio.h> #include <stdlib.h> -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [testsuite, objc] Don't XFAIL objc.dg/torture/forward-1.m
On 28 Jun 2011, at 17:47, Rainer Orth wrote: objc.dg/torture/forward-1.m now seems to XPASS everywhere, creating an annoying amount of testsuite noise. Dominique provided the following patch in PR libobjc/36610. Tested with the appropriate runtest invocations on i386-pc-solaris2.10 (both multilibs), sparc-sun-solaris2.10 (both multilibs), alpha-dec-osf5.1b, mips-sgi-irix6.5 (both multilibs), powerpc-apple-darwin9.8.0 (32-bit only). Ok for mainline? Thanks. Rainer 2011-06-28 Dominique d'Humieres domi...@lps.ens.fr * objc.dg/torture/forward-1.m: Remove dg-xfail-run-if, dg-skip-if. Index: gcc/testsuite/objc.dg/torture/forward-1.m === --- gcc/testsuite/objc.dg/torture/forward-1.m (revision 175589) +++ gcc/testsuite/objc.dg/torture/forward-1.m (working copy) @@ -1,7 +1,5 @@ /* { dg-do run } */ /* See if -forward:: is able to work. */ -/* { dg-xfail-run-if "PR36610" { ! { { i?86-*-* x86_64-*-* } && ilp32 } } { "-fgnu-runtime" } { } } */ -/* { dg-skip-if "Needs OBJC2 Implementation" { *-*-darwin* && { lp64 } } { "-fnext-runtime" } { } } */ actually, looking at this, it should likely read (untested): /* { dg-skip-if "Needs OBJC2 Implementation" { *-*-darwin8* && { lp64 && { ! objc2 } } } { "-fnext-runtime" } { } } */ and should stay in place to protect the test-cases for m64 on *-*-darwin8* (not that there's ever likely to be an m64 objc2 on darwin 8.. but) Iain
Use common and target option handling hooks in driver
This patch makes the driver use the common and target option handling hooks, so making the option state in the driver much closer to that in the core compiler as needed for it to drive multilib selection. opts.o is put in libcommon-target; a few cases of global state usage in opts.c (either missed in my previous changes, or recently added) are fixed. In a few cases where the driver has its own handling of a common option, or where the common handling may not work in the driver at present, common_handle_option is made to return early in the driver. In particular, this applies to --help (right now the driver has its own code reporting help information for driver options and they generally don't have help text in the .opt files; it would be good to integrate things better so that there is only one set of --help machinery used) and to -Werror= (the diagnostic machinery is initialized in the driver without the support for individual option control, which doesn't seem particularly useful there). Bootstrapped with no regressions on x86_64-unknown-linux-gnu. Applied to mainline. 2011-06-28 Joseph Myers jos...@codesourcery.com * common.opt (in_lto_p): New Variable entry. * flags.h (in_lto_p): Move to common.opt. * gcc.c: Include params.h. (set_option_handlers): Also use common_handle_option and target_handle_option. (main): Call global_init_params, finish_params and init_options_struct. * opts.c (debug_type_names): Move from toplev.c. (print_filtered_help): Access quiet_flag through opts pointer. (common_handle_option): Return early in the driver for some options. Access in_lto_p, dwarf_version and warn_maybe_uninitialized through opts pointer. * toplev.c (in_lto_p): Move to common.opt. (debug_type_names): Move to opts.c. * Makefile.in (OBJS): Remove opts.o. (OBJS-libcommon-target): Add opts.o. (gcc.o): Update dependencies. Index: gcc/flags.h === --- gcc/flags.h (revision 175330) +++ gcc/flags.h (working copy) @@ -1,6 +1,6 @@ /* Compilation switch flag definitions for GCC. 
Copyright (C) 1987, 1988, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2002, - 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010 + 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011 Free Software Foundation, Inc. This file is part of GCC. @@ -34,13 +34,6 @@ extern const char *const debug_type_name extern void strip_off_ending (char *, int); extern int base_of_path (const char *path, const char **base_out); -/* True if this is the LTO front end (lto1). This is used to disable - gimple generation and lowering passes that are normally run on the - output of a front end. These passes must be bypassed for lto since - they have already been done before the gimple was written. */ -extern bool in_lto_p; - /* Return true iff flags are set as if -ffast-math. */ extern bool fast_math_flags_set_p (const struct gcc_options *); extern bool fast_math_flags_struct_set_p (struct cl_optimization *); Index: gcc/gcc.c === --- gcc/gcc.c (revision 175330) +++ gcc/gcc.c (working copy) @@ -43,6 +43,7 @@ compilation is specified by a string cal #include "diagnostic.h" #include "flags.h" #include "opts.h" +#include "params.h" #include "vec.h" #include "filenames.h" @@ -3532,9 +3533,13 @@ set_option_handlers (struct cl_option_ha handlers->unknown_option_callback = driver_unknown_option_callback; handlers->wrong_lang_callback = driver_wrong_lang_callback; handlers->post_handling_callback = driver_post_handling_callback; - handlers->num_handlers = 1; + handlers->num_handlers = 3; handlers->handlers[0].handler = driver_handle_option; handlers->handlers[0].mask = CL_DRIVER; + handlers->handlers[1].handler = common_handle_option; + handlers->handlers[1].mask = CL_COMMON; + handlers->handlers[2].handler = target_handle_option; + handlers->handlers[2].mask = CL_TARGET; } /* Create the vector `switches' and its contents. @@ -6156,7 +6161,11 @@ main (int argc, char **argv) if (argv != old_argv) at_file_supplied = true; - global_options = global_options_init; + /* Register the language-independent parameters. 
*/ + global_init_params (); + finish_params (); + + init_options_struct (&global_options, &global_options_set); decode_cmdline_options_to_array (argc, CONST_CAST2 (const char **, char **, argv), Index: gcc/toplev.c === --- gcc/toplev.c(revision 175330) +++ gcc/toplev.c(working copy) @@ -125,13 +125,6 @@ unsigned int save_decoded_options_count; const struct gcc_debug_hooks *debug_hooks; -/* True if this is the lto front end. This is used to disable - gimple generation and lowering passes that are normally
[testsuite] Remove dg-extra-errors in gcc.dg/inline_[12].c etc.
Three new testcases seem to XPASS everywhere, at least on all of my targets: XPASS: gcc.dg/inline_1.c (test for excess errors) XPASS: gcc.dg/inline_2.c (test for excess errors) XPASS: gcc.dg/unroll_1.c (test for excess errors) The following patch fixes this to remove the noise. Tested with the appropriate runtest invocation on i386-pc-solaris2.10. Ok for mainline? Rainer 2011-06-28 Rainer Orth r...@cebitec.uni-bielefeld.de * gcc.dg/inline_1.c: Remove dg-excess-errors. * gcc.dg/inline_2.c: Likewise. * gcc.dg/unroll_1.c: Likewise. Index: gcc/testsuite/gcc.dg/inline_2.c === --- gcc/testsuite/gcc.dg/inline_2.c (revision 175590) +++ gcc/testsuite/gcc.dg/inline_2.c (working copy) @@ -20,4 +20,3 @@ /* { dg-final { scan-tree-dump-times "bar" 5 "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ -/* { dg-excess-errors "extra notes" } */ Index: gcc/testsuite/gcc.dg/inline_1.c === --- gcc/testsuite/gcc.dg/inline_1.c (revision 175590) +++ gcc/testsuite/gcc.dg/inline_1.c (working copy) @@ -20,4 +20,3 @@ /* { dg-final { scan-tree-dump-times "bar" 5 "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ -/* { dg-excess-errors "extra notes" } */ Index: gcc/testsuite/gcc.dg/unroll_1.c === --- gcc/testsuite/gcc.dg/unroll_1.c (revision 175590) +++ gcc/testsuite/gcc.dg/unroll_1.c (working copy) @@ -30,4 +30,3 @@ /* { dg-final { scan-rtl-dump-times "Decided to peel loop completely" 2 "loop2_unroll" } } */ /* { dg-final { cleanup-rtl-dump "loop2_unroll" } } */ -/* { dg-excess-errors "extra notes" } */ -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [testsuite] Remove dg-extra-errors in gcc.dg/inline_[12].c etc.
Your fix works ok for me (on x86-64/linux) too. Thanks, David On Tue, Jun 28, 2011 at 10:09 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Three new testcases seem to XPASS everywhere, at least on all of my targets: XPASS: gcc.dg/inline_1.c (test for excess errors) XPASS: gcc.dg/inline_2.c (test for excess errors) XPASS: gcc.dg/unroll_1.c (test for excess errors) The following patch fixes this to remove the noise. Tested with the appropriate runtest invocation on i386-pc-solaris2.10. Ok for mainline? Rainer 2011-06-28 Rainer Orth r...@cebitec.uni-bielefeld.de * gcc.dg/inline_1.c: Remove dg-excess-errors. * gcc.dg/inline_2.c: Likewise. * gcc.dg/unroll_1.c: Likewise. Index: gcc/testsuite/gcc.dg/inline_2.c === --- gcc/testsuite/gcc.dg/inline_2.c (revision 175590) +++ gcc/testsuite/gcc.dg/inline_2.c (working copy) @@ -20,4 +20,3 @@ /* { dg-final { scan-tree-dump-times "bar" 5 "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ -/* { dg-excess-errors "extra notes" } */ Index: gcc/testsuite/gcc.dg/inline_1.c === --- gcc/testsuite/gcc.dg/inline_1.c (revision 175590) +++ gcc/testsuite/gcc.dg/inline_1.c (working copy) @@ -20,4 +20,3 @@ /* { dg-final { scan-tree-dump-times "bar" 5 "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ -/* { dg-excess-errors "extra notes" } */ Index: gcc/testsuite/gcc.dg/unroll_1.c === --- gcc/testsuite/gcc.dg/unroll_1.c (revision 175590) +++ gcc/testsuite/gcc.dg/unroll_1.c (working copy) @@ -30,4 +30,3 @@ /* { dg-final { scan-rtl-dump-times "Decided to peel loop completely" 2 "loop2_unroll" } } */ /* { dg-final { cleanup-rtl-dump "loop2_unroll" } } */ -/* { dg-excess-errors "extra notes" } */ -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [pph] Fix var order when streaming in. (issue4635074)
On 2011/06/28 11:27:56, Diego Novillo wrote: http://codereview.appspot.com/4635074/diff/1/gcc/cp/pph-streamer-in.c File gcc/cp/pph-streamer-in.c (right): http://codereview.appspot.com/4635074/diff/1/gcc/cp/pph-streamer-in.c#newcode1144 gcc/cp/pph-streamer-in.c:1144: /* The chains are built backwards (ref: add_decl_to_level@name-lookup.c), 1143 1144 /* The chains are built backwards (ref: add_decl_to_level@name-lookup.c), s/add_decl_to_level@name-lookup.c/add_decl_to_level/ Done. Committed as r175592. Gab http://codereview.appspot.com/4635074/
Re: Simplify Solaris configuration
The patch allowed a sparcv9-sun-solaris2.11 bootstrap to run well into building the target libraries (failed configuring libgfortran since I'd mis-merged the 32-bit and 64-bit gmp.h), a sparc-sun-solaris2.10 bootstrap is still running. I'll probably fix the gmp.h issue, rebuild the sparcv9-sun-solaris2.11 configuration and commit unless I find problems or you disapprove of the approach. No, this is fine by me, thanks. -- Eric Botcazou
Re: Simplify Solaris configuration
Eric Botcazou ebotca...@adacore.com writes: The patch allowed a sparcv9-sun-solaris2.11 bootstrap to run well into building the target libraries (failed configuring libgfortran since I'd mis-merged the 32-bit and 64-bit gmp.h), a sparc-sun-solaris2.10 bootstrap is still running. I'll probably fix the gmp.h issue, rebuild the sparcv9-sun-solaris2.11 configuration and commit unless I find problems or you disapprove of the approach. No, this is fine by me, thanks. Both bootstraps have completed successfully, so I've checked in the patch. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: PATCH [8/n]: Prepare x32: PR other/48007: Unwind library doesn't work with UNITS_PER_WORD > sizeof (void *)
On Mon, Jun 27, 2011 at 7:58 AM, Jason Merrill ja...@redhat.com wrote: On 06/26/2011 05:58 PM, H.J. Lu wrote: The current unwind library scheme provides only one unwind context and is backward compatible with multiple different unwind contexts from multiple unwind libraries: http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01769.html My patch fixes UNITS_PER_WORD > sizeof (void *) and enforces single unwind context when backward compatibility isn't needed. OK, there seem to be two things going on in this patch: 1) Handle registers larger than pointers. 2) Require that all code share a single copy of the unwinder. For #2, how are you avoiding the issues Jakub describes in that message? Isn't his scenario 2 still possible? Are you deciding that it's better to abort at run-time in that case? It seems to me that for targets newer than Jakub's patch we can hard-wire _Unwind_IsExtendedContext to true, but making further assumptions would be a mistake. Then, if we're still trying to handle versioning, I think your earlier patch for #1 (r170716) that just changes the type of the reg array is a better way to go. But that change should be dependent on a target macro to avoid ABI changes for existing targets. This updated patch. It allows multiple unwind contexts. It replaces char by_value[DWARF_FRAME_REGISTERS+1]; with _Unwind_Word value[DWARF_FRAME_REGISTERS+1]; The code is cleaner than conditionally replacing void *reg[DWARF_FRAME_REGISTERS+1]; with _Unwind_Word reg[DWARF_FRAME_REGISTERS+1]; with a bigger unwind context. But it is more flexible if we want to extend unwind context later, like saving/restoring 128-bit or vector registers which may be bigger than the current _Unwind_Word. Thanks. -- H.J. gcc/ 2011-06-28 H.J. Lu hongjiu...@intel.com * config.gcc (libgcc_tm_file): Add i386/value-unwind.h for Linux/x86. * system.h (REG_VALUE_IN_UNWIND_CONTEXT): Poisoned. * unwind-dw2.c (_Unwind_Context): If REG_VALUE_IN_UNWIND_CONTEXT is defined, add value and remove by_value. 
(SIGNAL_FRAME_BIT): Define if REG_VALUE_IN_UNWIND_CONTEXT is defined. (EXTENDED_CONTEXT_BIT): Don't define if REG_VALUE_IN_UNWIND_CONTEXT is defined. (_Unwind_IsExtendedContext): Likewise. (_Unwind_GetGR): Support REG_VALUE_IN_UNWIND_CONTEXT. (_Unwind_SetGR): Likewise. (_Unwind_GetGRPtr): Likewise. (_Unwind_SetGRPtr): Likewise. (_Unwind_SetGRValue): Likewise. (_Unwind_GRByValue): Likewise. (__frame_state_for): Likewise. (uw_install_context_1): Likewise. * doc/tm.texi.in: Document REG_VALUE_IN_UNWIND_CONTEXT. * doc/tm.texi: Regenerated. libgcc/ 2011-06-28 H.J. Lu hongjiu...@intel.com * config/i386/value-unwind.h: New. gcc/ 2011-06-28 H.J. Lu hongjiu...@intel.com * config.gcc (libgcc_tm_file): Add i386/value-unwind.h for Linux/x86. * system.h (REG_VALUE_IN_UNWIND_CONTEXT): Poisoned. * unwind-dw2.c (_Unwind_Context): If REG_VALUE_IN_UNWIND_CONTEXT is defined, add value and remove by_value. (SIGNAL_FRAME_BIT): Define if REG_VALUE_IN_UNWIND_CONTEXT is defined. (EXTENDED_CONTEXT_BIT): Don't define if REG_VALUE_IN_UNWIND_CONTEXT is defined. (_Unwind_IsExtendedContext): Likewise. (_Unwind_GetGR): Support REG_VALUE_IN_UNWIND_CONTEXT. (_Unwind_SetGR): Likewise. (_Unwind_GetGRPtr): Likewise. (_Unwind_SetGRPtr): Likewise. (_Unwind_SetGRValue): Likewise. (_Unwind_GRByValue): Likewise. (__frame_state_for): Likewise. (uw_install_context_1): Likewise. * doc/tm.texi.in: Document REG_VALUE_IN_UNWIND_CONTEXT. * doc/tm.texi: Regenerated. libgcc/ 2011-06-28 H.J. Lu hongjiu...@intel.com * config/i386/value-unwind.h: New. 
diff --git a/gcc/config.gcc b/gcc/config.gcc index a1dbd1a..c9867a2 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -2627,6 +2648,7 @@ esac case ${target} in i[34567]86-*-linux* | x86_64-*-linux*) tmake_file="${tmake_file} i386/t-pmm_malloc i386/t-i386" + libgcc_tm_file="${libgcc_tm_file} i386/value-unwind.h" ;; i[34567]86-*-* | x86_64-*-*) tmake_file="${tmake_file} i386/t-gmm_malloc i386/t-i386" diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 341628b..2666716 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -3701,6 +3701,14 @@ return @code{@var{regno}}. @end defmac +@defmac REG_VALUE_IN_UNWIND_CONTEXT + +Define this macro if the target stores register values as +@code{_Unwind_Word} type in unwind context. The default is to +store register values as @code{void *} type. + +@end defmac + @node Elimination @subsection Eliminating Frame Pointer and Arg Pointer diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index f7c16e9..690fa52 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -3687,6
Re: [patch, fortran] Fix PR 49479, reshape with optional arg
Hi Jerry, On 06/27/2011 03:18 PM, Thomas Koenig wrote: Hello world, the attached patch fixes PR 49479, a regression for 4.7 and 4.6. Test case was supplied by Joost, the approach to the patch was suggested by Tobias in comment#4 of the PR. The patch certainly looks safe enough. Regression-tested. OK for trunk and, after a couple of days, for 4.6? Thomas OK, After your approval, I realized that I had forgotten the generic reshape. I added that as obvious. Here is what I committed, revision 175594. Regards Thomas 2011-06-28 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/49479 * m4/reshape.m4: If source allocation is smaller than one, set it to one. * intrinsics/reshape_generic.c: Likewise. * generated/reshape_r16.c: Regenerated. * generated/reshape_c4.c: Regenerated. * generated/reshape_c16.c: Regenerated. * generated/reshape_c8.c: Regenerated. * generated/reshape_r4.c: Regenerated. * generated/reshape_i4.c: Regenerated. * generated/reshape_r10.c: Regenerated. * generated/reshape_r8.c: Regenerated. * generated/reshape_c10.c: Regenerated. * generated/reshape_i8.c: Regenerated. * generated/reshape_i16.c: Regenerated. 2011-06-28 Thomas Koenig tkoe...@gcc.gnu.org PR fortran/49479 * gfortran.dg/reshape_zerosize_3.f90: New test. 
Index: m4/reshape.m4 === --- m4/reshape.m4 (Revision 175593) +++ m4/reshape.m4 (Arbeitskopie) @@ -101,6 +101,8 @@ if (ret->data == NULL) { + index_type alloc_size; + rs = 1; for (n = 0; n < rdim; n++) { @@ -111,7 +113,13 @@ rs *= rex; } ret->offset = 0; - ret->data = internal_malloc_size ( rs * sizeof ('rtype_name`)); + + if (unlikely (rs < 1)) +alloc_size = 1; + else +alloc_size = rs * sizeof ('rtype_name`); + + ret->data = internal_malloc_size (alloc_size); ret->dtype = (source->dtype & ~GFC_DTYPE_RANK_MASK) | rdim; } Index: intrinsics/reshape_generic.c === --- intrinsics/reshape_generic.c (Revision 175593) +++ intrinsics/reshape_generic.c (Arbeitskopie) @@ -85,6 +85,8 @@ if (ret->data == NULL) { + index_type alloc_size; + rs = 1; for (n = 0; n < rdim; n++) { @@ -95,7 +97,14 @@ rs *= rex; } ret->offset = 0; - ret->data = internal_malloc_size ( rs * size ); + + if (unlikely (rs < 1)) + alloc_size = 1; + else + alloc_size = rs * size; + + ret->data = internal_malloc_size (alloc_size); + ret->dtype = (source->dtype & ~GFC_DTYPE_RANK_MASK) | rdim; } ! { dg-do run } ! PR 49479 - this used not to print anything. ! Test case by Joost VandeVondele. MODULE M1 IMPLICIT NONE type foo character(len=5) :: x end type foo CONTAINS SUBROUTINE S1(data) INTEGER, DIMENSION(:), INTENT(IN), OPTIONAL :: DATA character(20) :: line IF (.not. PRESENT(data)) call abort write (unit=line,fmt='(I5)') size(data) if (line /= '0 ') call abort END SUBROUTINE S1 subroutine s_type(data) type(foo), dimension(:), intent(in), optional :: data character(20) :: line IF (.not. PRESENT(data)) call abort write (unit=line,fmt='(I5)') size(data) if (line /= '0 ') call abort end subroutine s_type SUBROUTINE S2(N) INTEGER :: N INTEGER, ALLOCATABLE, DIMENSION(:, :):: blki type(foo), allocatable, dimension(:, :) :: bar ALLOCATE(blki(3,N)) allocate (bar(3,n)) blki=0 CALL S1(RESHAPE(blki,(/3*N/))) call s_type(reshape(bar, (/3*N/))) END SUBROUTINE S2 END MODULE M1 USE M1 CALL S2(0) END ! { dg-final { cleanup-modules m1 } }
Re: Updated: RFA: partially hookize POINTER_SIZE
Joern == Joern Rennecke amyl...@spamcop.net writes: Joern This is basically the same patch as posted before in Joern http://gcc.gnu.org/ml/gcc-patches/2010-11/msg02772.html and updated in Joern http://gcc.gnu.org/viewcvs?view=revision&revision=168273, but with a Joern few merge conflicts in current mainline resolved. Joern * java-tree.h (JAVA_POINTER_SIZE): Define. Joern * class.c (make_class_data): Use JAVA_POINTER_SIZE. Joern (emit_register_classes): Likewise. Joern * jcf-parse.c (handle_long_constant): Likewise. Joern * constants.c (build_constants_constructor): Likewise. Joern * builtins.c (UNMARSHAL3, UNMARSHAL4, UNMARSHAL5): Likewise. Joern (compareAndSwapObject_builtin): Likewise. Joern * boehm.c (get_boehm_type_descriptor): Likewise. Joern (mark_reference_fields): Add log2_size parameter. Changed all callers. Joern gcc/cp: One question about the Java parts... Joern - if (offset % (HOST_WIDE_INT) (POINTER_SIZE / BITS_PER_UNIT)) Joern + if (offset & ((1 << log2_size) - 1)) I think this has to be '(((HOST_WIDE_INT) 1) << log2_size) - 1'. Otherwise it seems like this could overflow. The rest of the java parts are ok. Tom
[pph] Add cp_global_trees to cache in preload (issue4635077)
Add the cp_global_trees to the cache during the preload. Those are preconstructed trees which we only need the pointers to (i.e. they should be identical in both the .cc and .h) One exception to this is the keyed_classes tree which is generated during parsing. We will need to merge the keyed_classes tree eventually when working with multiple pph's. 2011-06-28 Gabriel Charette gch...@google.com * pph-streamer.c (pph_preload_common_nodes): Add cp_global_trees[] to cache. * g++.dg/pph/x1typerefs.cc: Remove xfail. diff --git a/gcc/cp/pph-streamer.c b/gcc/cp/pph-streamer.c index e919baf..c62864a 100644 --- a/gcc/cp/pph-streamer.c +++ b/gcc/cp/pph-streamer.c @@ -79,6 +79,17 @@ pph_preload_common_nodes (struct lto_streamer_cache_d *cache) if (c_global_trees[i]) lto_streamer_cache_append (cache, c_global_trees[i]); + /* cp_global_trees[] can have NULL entries in it. Skip them. */ + for (i = 0; i < CPTI_MAX; i++) +{ + /* Also skip trees which are generated while parsing. */ + if (i == CPTI_KEYED_CLASSES) + continue; + + if (cp_global_trees[i]) + lto_streamer_cache_append (cache, cp_global_trees[i]); +} + lto_streamer_cache_append (cache, global_namespace); } diff --git a/gcc/testsuite/g++.dg/pph/x1typerefs.cc b/gcc/testsuite/g++.dg/pph/x1typerefs.cc index ba7580f..6aa0e96 100644 --- a/gcc/testsuite/g++.dg/pph/x1typerefs.cc +++ b/gcc/testsuite/g++.dg/pph/x1typerefs.cc @@ -1,6 +1,3 @@ -// { dg-xfail-if BOGUS { *-*-* } { -fpph-map=pph.map } } -// { dg-bogus c1typerefs.h:11:18: error: cannot convert 'const std::type_info.' to 'const std::type_info.' in initialization { xfail *-*-* } 0 } - #include "x1typerefs.h" int derived::method() { -- This patch is available for review at http://codereview.appspot.com/4635077
Re: [pph] Add cp_global_trees to cache in preload (issue4635077)
On Tue, Jun 28, 2011 at 15:23, Gabriel Charette gch...@google.com wrote: 2011-06-28 Gabriel Charette gch...@google.com * pph-streamer.c (pph_preload_common_nodes): Add cp_global_trees[] to cache. * g++.dg/pph/x1typerefs.cc: Remove xfail. OK. Diego.
Re: [pph] Add cp_global_trees to cache in preload (issue4635077)
Commited as r175595. http://codereview.appspot.com/4635077/
[patch] libiberty/cp-demangle.c: Fix CP_DEMANGLE_DEBUG SIGSEGV
Hi, a mechanical patch which fixes during #define CP_DEMANGLE_DEBUG make check - /bin/sh: line 1: 9179 Segmentation fault ./test-demangle ./demangle-expected which also fixes confusing output for _Z1hI1AIiEdEDTcldtfp_1gIT0_EEET_S2_ binary operator arguments binary operator operator . binary operator arguments ???--- template name 'g' template argument list template parameter 1 argument list Thanks, Jan libiberty/ 2011-06-28 Jan Kratochvil jan.kratoch...@redhat.com * cp-demangle.c (d_dump): Add (zero-based) to DEMANGLE_COMPONENT_TEMPLATE_PARAM. Implement DEMANGLE_COMPONENT_FUNCTION_PARAM, DEMANGLE_COMPONENT_VECTOR_TYPE, DEMANGLE_COMPONENT_NUMBER, DEMANGLE_COMPONENT_GLOBAL_CONSTRUCTORS, DEMANGLE_COMPONENT_GLOBAL_DESTRUCTORS, DEMANGLE_COMPONENT_LAMBDA, DEMANGLE_COMPONENT_DEFAULT_ARG and DEMANGLE_COMPONENT_UNNAMED_TYPE. Print "??? %d" on unknown dc->type. --- a/libiberty/cp-demangle.c +++ b/libiberty/cp-demangle.c @@ -506,7 +507,10 @@ d_dump (struct demangle_component *dc, int indent) printf ("name '%.*s'\n", dc->u.s_name.len, dc->u.s_name.s); return; case DEMANGLE_COMPONENT_TEMPLATE_PARAM: - printf ("template parameter %ld\n", dc->u.s_number.number); + printf ("template parameter %ld (zero-based)\n", dc->u.s_number.number); + return; +case DEMANGLE_COMPONENT_FUNCTION_PARAM: + printf ("function parameter %ld (zero-based)\n", dc->u.s_number.number); return; case DEMANGLE_COMPONENT_CTOR: printf ("constructor %d\n", (int) dc->u.s_ctor.kind); @@ -633,6 +637,9 @@ d_dump (struct demangle_component *dc, int indent) case DEMANGLE_COMPONENT_FIXED_TYPE: printf ("fixed-point type\n"); break; +case DEMANGLE_COMPONENT_VECTOR_TYPE: + printf ("vector type\n"); + break; case DEMANGLE_COMPONENT_ARGLIST: printf ("argument list\n"); break; @@ -675,12 +682,35 @@ d_dump (struct demangle_component *dc, int indent) case DEMANGLE_COMPONENT_CHARACTER: printf ("character '%c'\n", dc->u.s_character.character); return; +case DEMANGLE_COMPONENT_NUMBER: + printf ("number %ld\n", dc->u.s_number.number); + return; case
DEMANGLE_COMPONENT_DECLTYPE: printf ("decltype\n"); break; +case DEMANGLE_COMPONENT_GLOBAL_CONSTRUCTORS: + printf ("global constructors keyed to name\n"); + break; +case DEMANGLE_COMPONENT_GLOBAL_DESTRUCTORS: + printf ("global destructors keyed to name\n"); + break; +case DEMANGLE_COMPONENT_LAMBDA: + printf ("lambda %d (zero-based)\n", dc->u.s_unary_num.num); + d_dump (dc->u.s_unary_num.sub, indent + 2); + return; +case DEMANGLE_COMPONENT_DEFAULT_ARG: + printf ("default argument %d (zero-based)\n", dc->u.s_unary_num.num); + d_dump (dc->u.s_unary_num.sub, indent + 2); + return; +case DEMANGLE_COMPONENT_UNNAMED_TYPE: + printf ("unnamed type %ld\n", dc->u.s_number.number); + return; case DEMANGLE_COMPONENT_PACK_EXPANSION: printf ("pack expansion\n"); break; +default: + printf ("??? %d\n", dc->type); + break; } d_dump (d_left (dc), indent + 2);
Re: [patch] Fix oversight in tuplification of DOM
On 06/28/11 14:36, Eric Botcazou wrote: Hi, the attached testcase triggers an ICE when compiled at -O or above, on all the open branches. This is a regression introduced with the tuplification. The problem is that 2 ARRAY_RANGE_REFs are recognized as equivalent, although they don't have the same number of elements. This is so because their type isn't taken into account by the hash equality function as it simply isn't recorded in initialize_hash_element (GIMPLE_SINGLE_RHS case). Now in all the other cases it is recorded so this very likely is an oversight. Tested on x86_64-suse-linux, OK for all branches? 2011-06-28 Eric Botcazou ebotca...@adacore.com * tree-ssa-dom.c (initialize_hash_element): Fix oversight. 2011-06-28 Eric Botcazou ebotca...@adacore.com * gnat.dg/opt17.ad[sb]: New test. OK. Jeff
Re: [RFC] Fix full memory barrier on SPARC-V8
Fair enough, you can add this code if you want. Thanks. Note that this is marginal for Solaris as GCC defaults to -mcpu=v9 on Solaris but, in all other cases, it defaults to -mcpu=v8. I can reproduce the problem on the SPARC/Linux machine 'grobluk' of the CompileFarm: cpu : TI UltraSparc II (BlackBird) fpu : UltraSparc II integrated FPU prom: OBP 3.2.30 2002/10/25 14:03 type: sun4u ncpus probed: 4 ncpus active: 4 Linux grobluk 2.6.26-2-sparc64-smp #1 SMP Thu Nov 5 03:34:29 UTC 2009 sparc64 GNU/Linux With the pristine compiler, the test passes with -mcpu=v9 but fails otherwise. It passes with the patched compiler. However, I suspect that we would still have problems with newer UltraSparc CPUs supporting full RMO, because the new insn membar_v8 is only half a memory barrier for V9. -- Eric Botcazou
[patch, fortran] Always return malloc(1) for empty arrays in the library
Hello world, looking at PR 49479 and other functions in the library made me realize there are lots of places where we don't malloc one byte for empty arrays. This patch is an attempt at fixing the ton of regressions likely caused by this (like in the PR) which haven't been found yet. No test cases, as they haven't been found yet :-) I also noticed two places where we had a memory leak (in eoshift1 and eoshift3), which I also fixed. Regression-tested. OK for trunk and, after a few days, for 4.6? Thomas 2011-06-28 Thomas Koenig tkoe...@gcc.gnu.org * m4/in_pack.m4 (internal_pack_'rtype_ccode`): If size is less than one, allocate a single byte. * m4/transpose.m4 (transpose_'rtype_code`): Likewise. * m4/cshift1.m4 (cshift1): Likewise. * m4/matmull.m4 (matmul_'rtype_code`): Likewise. * m4/unpack.m4 (unpack0_'rtype_code`): Likewise. * m4/ifunction_logical.m4 (name`'rtype_qual`_'atype_code): Likewise. * m4/matmul.m4 (name`'rtype_qual`_'atype_code): Likewise. * intrinics/transpose_generic.c (transpose_internal): Likewise. * intrinsics/unpack_generic.c (unpack_internal): Likewise. * m4/eoshift1.m4 (eoshift1): Remove double allocation. * m4/eoshift3.m4 (eoshift3): Likewise. * generated/all_l16.c: Regenerated. * generated/all_l1.c: Regenerated. * generated/all_l2.c: Regenerated. * generated/all_l4.c: Regenerated. * generated/all_l8.c: Regenerated. * generated/any_l16.c: Regenerated. * generated/any_l1.c: Regenerated. * generated/any_l2.c: Regenerated. * generated/any_l4.c: Regenerated. * generated/any_l8.c: Regenerated. * generated/count_16_l.c: Regenerated. * generated/count_1_l.c: Regenerated. * generated/count_2_l.c: Regenerated. * generated/count_4_l.c: Regenerated. * generated/count_8_l.c: Regenerated. * generated/cshift1_16.c: Regenerated. * generated/cshift1_4.c: Regenerated. * generated/cshift1_8.c: Regenerated. * generated/eoshift1_16.c: Regenerated. * generated/eoshift1_4.c: Regenerated. * generated/eoshift1_8.c: Regenerated. * generated/eoshift3_16.c: Regenerated. 
* generated/eoshift3_4.c: Regenerated. * generated/eoshift3_8.c: Regenerated. * generated/in_pack_c10.c: Regenerated. * generated/in_pack_c16.c: Regenerated. * generated/in_pack_c4.c: Regenerated. * generated/in_pack_c8.c: Regenerated. * generated/in_pack_i16.c: Regenerated. * generated/in_pack_i1.c: Regenerated. * generated/in_pack_i2.c: Regenerated. * generated/in_pack_i4.c: Regenerated. * generated/in_pack_i8.c: Regenerated. * generated/in_pack_r10.c: Regenerated. * generated/in_pack_r16.c: Regenerated. * generated/in_pack_r4.c: Regenerated. * generated/in_pack_r8.c: Regenerated. * generated/matmul_c10.c: Regenerated. * generated/matmul_c16.c: Regenerated. * generated/matmul_c4.c: Regenerated. * generated/matmul_c8.c: Regenerated. * generated/matmul_i16.c: Regenerated. * generated/matmul_i1.c: Regenerated. * generated/matmul_i2.c: Regenerated. * generated/matmul_i4.c: Regenerated. * generated/matmul_i8.c: Regenerated. * generated/matmul_l16.c: Regenerated. * generated/matmul_l4.c: Regenerated. * generated/matmul_l8.c: Regenerated. * generated/matmul_r10.c: Regenerated. * generated/matmul_r16.c: Regenerated. * generated/matmul_r4.c: Regenerated. * generated/matmul_r8.c: Regenerated. * generated/maxloc1_16_i16.c: Regenerated. * generated/maxloc1_16_i1.c: Regenerated. * generated/maxloc1_16_i2.c: Regenerated. * generated/maxloc1_16_i4.c: Regenerated. * generated/maxloc1_16_i8.c: Regenerated. * generated/maxloc1_16_r10.c: Regenerated. * generated/maxloc1_16_r16.c: Regenerated. * generated/maxloc1_16_r4.c: Regenerated. * generated/maxloc1_16_r8.c: Regenerated. * generated/maxloc1_4_i16.c: Regenerated. * generated/maxloc1_4_i1.c: Regenerated. * generated/maxloc1_4_i2.c: Regenerated. * generated/maxloc1_4_i4.c: Regenerated. * generated/maxloc1_4_i8.c: Regenerated. * generated/maxloc1_4_r10.c: Regenerated. * generated/maxloc1_4_r16.c: Regenerated. * generated/maxloc1_4_r4.c: Regenerated. * generated/maxloc1_4_r8.c: Regenerated. * generated/maxloc1_8_i16.c: Regenerated. 
* generated/maxloc1_8_i1.c: Regenerated. * generated/maxloc1_8_i2.c: Regenerated. * generated/maxloc1_8_i4.c: Regenerated. * generated/maxloc1_8_i8.c: Regenerated. * generated/maxloc1_8_r10.c: Regenerated. * generated/maxloc1_8_r16.c: Regenerated. *
RE: [PATCH, i386] Enable -mprefer-avx128 by default for Bulldozer
Hi, I re-attached the patch here. Can someone review it? We would like to commit to trunk as well as 4.6 branch. Thanks, Changpeng From: Fang, Changpeng Sent: Monday, June 27, 2011 5:42 PM To: Fang, Changpeng; Jan Hubicka Cc: Uros Bizjak; gcc-patches@gcc.gnu.org; rguent...@suse.de Subject: RE: [PATCH, i386] Enable -mprefer-avx128 by default for Bulldozer Is this patch OK to commit to trunk? Also I would like to backport this patch to gcc 4.6 branch. Do I have to send a separate request or use this one? Thanks, Changpeng From: Fang, Changpeng Sent: Friday, June 24, 2011 7:12 PM To: Jan Hubicka Cc: Uros Bizjak; gcc-patches@gcc.gnu.org; rguent...@suse.de Subject: RE: [PATCH, i386] Enable -mprefer-avx128 by default for Bulldozer Hi, I have no preference in tune feature coding. But I agree with you it's better to put similar things together. I modified the code following your suggestion. Is it OK to commit this modified patch? Thanks, Changpeng From: Jan Hubicka [hubi...@ucw.cz] Sent: Thursday, June 23, 2011 6:20 PM To: Fang, Changpeng Cc: Uros Bizjak; gcc-patches@gcc.gnu.org; hubi...@ucw.cz; rguent...@suse.de Subject: Re: [PATCH, i386] Enable -mprefer-avx128 by default for Bulldozer Hi, --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2128,6 +2128,9 @@ static const unsigned int x86_avx256_split_unaligned_load static const unsigned int x86_avx256_split_unaligned_store = m_COREI7 | m_BDVER1 | m_GENERIC; +static const unsigned int x86_prefer_avx128 + = m_BDVER1; What is reason for stuff like this to not go into initial_ix86_tune_features? I sort of liked them better when they was individual flags, but having the target tunning flags spread across multiple places seems unnecesary. 
Honza From a325395439a314f87b3c79a5b9ce79a6a976a710 Mon Sep 17 00:00:00 2001 From: Changpeng Fang chfang@huainan.(none) Date: Wed, 22 Jun 2011 15:03:05 -0700 Subject: [PATCH] Auto-vectorizer generates 128-bit AVX insns by default for bdver1 * config/i386/i386.opt (mprefer-avx128): Redefine the flag as a Mask option. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_AVX128_OPTIMAL entry. (TARGET_AVX128_OPTIMAL): New definition. * config/i386/i386.c (initial_ix86_tune_features): Initialize X86_TUNE_AVX128_OPTIMAL entry. (ix86_option_override_internal): Enable the generation of the 128-bit instructions when TARGET_AVX128_OPTIMAL is set. (ix86_preferred_simd_mode): Use TARGET_PREFER_AVX128. (ix86_autovectorize_vector_sizes): Use TARGET_PREFER_AVX128. --- gcc/config/i386/i386.c | 16 gcc/config/i386/i386.h |4 +++- gcc/config/i386/i386.opt |2 +- 3 files changed, 16 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 014401b..b3434dd 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2089,7 +2089,11 @@ static unsigned int initial_ix86_tune_features[X86_TUNE_LAST] = { /* X86_SOFTARE_PREFETCHING_BENEFICIAL: Enable software prefetching at -O3. For the moment, the prefetching seems badly tuned for Intel chips. */ - m_K6_GEODE | m_AMD_MULTIPLE + m_K6_GEODE | m_AMD_MULTIPLE, + + /* X86_TUNE_AVX128_OPTIMAL: Enable 128-bit AVX instruction generation for + the auto-vectorizer. */ + m_BDVER1 }; /* Feature tests against the various architecture variations. 
*/ @@ -2623,6 +2627,7 @@ ix86_target_string (int isa, int flags, const char *arch, const char *tune, { "-mvzeroupper", MASK_VZEROUPPER }, { "-mavx256-split-unaligned-load", MASK_AVX256_SPLIT_UNALIGNED_LOAD}, { "-mavx256-split-unaligned-store", MASK_AVX256_SPLIT_UNALIGNED_STORE}, +{ "-mprefer-avx128", MASK_PREFER_AVX128}, }; const char *opts[ARRAY_SIZE (isa_opts) + ARRAY_SIZE (flag_opts) + 6][2]; @@ -3672,6 +3677,9 @@ ix86_option_override_internal (bool main_args_p) if ((x86_avx256_split_unaligned_store & ix86_tune_mask) && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE)) target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE; + /* Enable 128-bit AVX instruction generation for the auto-vectorizer. */ + if (TARGET_AVX128_OPTIMAL && !(target_flags_explicit & MASK_PREFER_AVX128)) + target_flags |= MASK_PREFER_AVX128; } } else @@ -34614,7 +34622,7 @@ ix86_preferred_simd_mode (enum machine_mode mode) return V2DImode; case SFmode: - if (TARGET_AVX && !flag_prefer_avx128) + if (TARGET_AVX && !TARGET_PREFER_AVX128) return V8SFmode; else return V4SFmode; @@ -34622,7 +34630,7 @@ ix86_preferred_simd_mode (enum machine_mode mode) case DFmode: if (!TARGET_VECTORIZE_DOUBLE) return word_mode; - else if (TARGET_AVX && !flag_prefer_avx128) + else if (TARGET_AVX && !TARGET_PREFER_AVX128) return V4DFmode; else if (TARGET_SSE2) return V2DFmode; @@ -34639,7 +34647,7 @@
[patch] Fix PR tree-optimization/49539
Hi, this is an ICE building the gnattools on ARM, a regression present on the mainline (and reproducible on x86/Linux by switching to SJLJ exceptions). For the reduced testcase compiled at -O: Unable to coalesce ssa_names 2 and 174 which are marked as MUST COALESCE. comp_last_2(ab) and comp_last_174(ab) +===GNAT BUG DETECTED==+ | 4.7.0 20110626 (experimental) [trunk revision 175408] (i586-suse-linux-gnu) GCC error:| | SSA corruption | | Error detected around p.adb:3:4 The SSA names (or rather 2 related ones) have overlapping lifetimes. The problem is created by forwprop1. Before: bb 23: # comp_last_1(ab) = PHI <comp_last_159(ab)(20), comp_last_2(ab)(22)> [...] comp_last_174(ab) = comp_last_1(ab) + 1; D.2425_175 = args.P_BOUNDS; D.2426_176 = D.2425_175->LB0; if (D.2426_176 > comp_last_174(ab)) goto bb 39; else goto bb 38; bb 38: D.2425_177 = args.P_BOUNDS; D.2427_178 = D.2425_177->UB0; if (D.2427_178 < comp_last_174(ab)) goto bb 39; else goto bb 40; [...] comp_last_185(ab) = comp_last_174(ab) + 1; D.2425_186 = args.P_BOUNDS; D.2426_187 = D.2425_186->LB0; if (D.2426_187 > comp_last_185(ab)) goto bb 43; else goto bb 42; After: comp_last_185(ab) = comp_last_1(ab) + 2; D.2425_186 = args.P_BOUNDS; D.2426_187 = D.2425_186->LB0; if (D.2426_187 > comp_last_185(ab)) goto bb 43; else goto bb 42; The pass already contains a check for this situation in can_propagate_from but it isn't applied in this case. Tested on x86_64-suse-linux, OK for the mainline? 2011-06-28 Eric Botcazou ebotca...@adacore.com PR tree-optimization/49539 * tree-ssa-forwprop.c (can_propagate_from): Check for abnormal SSA by means of stmt_references_abnormal_ssa_name. (associate_plusminus): Call can_propagate_from before propagating from definition statements. (ssa_forward_propagate_and_combine): Remove superfluous newline.
-- Eric Botcazou Index: tree-ssa-forwprop.c === --- tree-ssa-forwprop.c (revision 175408) +++ tree-ssa-forwprop.c (working copy) @@ -260,9 +260,6 @@ get_prop_source_stmt (tree name, bool si static bool can_propagate_from (gimple def_stmt) { - use_operand_p use_p; - ssa_op_iter iter; - gcc_assert (is_gimple_assign (def_stmt)); /* If the rhs has side-effects we cannot propagate from it. */ @@ -280,9 +277,8 @@ can_propagate_from (gimple def_stmt) return true; /* We cannot propagate ssa names that occur in abnormal phi nodes. */ - FOR_EACH_SSA_USE_OPERAND (use_p, def_stmt, iter, SSA_OP_USE) -if (SSA_NAME_OCCURS_IN_ABNORMAL_PHI (USE_FROM_PTR (use_p))) - return false; + if (stmt_references_abnormal_ssa_name (def_stmt)) +return false; /* If the definition is a conversion of a pointer to a function type, then we can not apply optimizations as some targets require @@ -1780,7 +1776,8 @@ associate_plusminus (gimple stmt) { gimple def_stmt = SSA_NAME_DEF_STMT (rhs2); if (is_gimple_assign (def_stmt) - && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR) + && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR + && can_propagate_from (def_stmt)) { code = (code == MINUS_EXPR) ?
PLUS_EXPR : MINUS_EXPR; gimple_assign_set_rhs_code (stmt, code); @@ -1797,7 +1794,8 @@ associate_plusminus (gimple stmt) { gimple def_stmt = SSA_NAME_DEF_STMT (rhs1); if (is_gimple_assign (def_stmt) - && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR) + && gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR + && can_propagate_from (def_stmt)) { code = MINUS_EXPR; gimple_assign_set_rhs_code (stmt, code); @@ -1840,7 +1838,7 @@ associate_plusminus (gimple stmt) if (TREE_CODE (rhs1) == SSA_NAME) { gimple def_stmt = SSA_NAME_DEF_STMT (rhs1); - if (is_gimple_assign (def_stmt)) + if (is_gimple_assign (def_stmt) && can_propagate_from (def_stmt)) { enum tree_code def_code = gimple_assign_rhs_code (def_stmt); if (def_code == PLUS_EXPR @@ -1940,7 +1938,7 @@ associate_plusminus (gimple stmt) if (rhs2 && TREE_CODE (rhs2) == SSA_NAME) { gimple def_stmt = SSA_NAME_DEF_STMT (rhs2); - if (is_gimple_assign (def_stmt)) + if (is_gimple_assign (def_stmt) && can_propagate_from (def_stmt)) { enum tree_code def_code = gimple_assign_rhs_code (def_stmt); if (def_code == PLUS_EXPR @@ -2262,8 +2260,7 @@ ssa_forward_propagate_and_combine (void) else gsi_next (&gsi); } - else if (code == POINTER_PLUS_EXPR - && can_propagate_from (stmt)) + else if (code == POINTER_PLUS_EXPR && can_propagate_from (stmt)) { if (TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST /* ??? Better adjust the interface to that function
[ARM] Clean up dead code in thumb_pushpop
When I presented the patch that converted thumb1 prologue to rtl, I said I didn't clean up thumb_pushpop. I had thought about converting the epilogue to rtl as well and deleting the function entirely. However, for my immediate purposes cleaning up dwarf2out, I need to remove the text-based interface to the unwind info, and that means cleaning out the dead code from thumb_pushpop now. Tested with crosses to arm-elf and arm-eabi, -mthumb. Committed as obvious. r~ * config/arm/arm.c (thumb_pop): Rename from thumb_pushpop. Delete all code and arguments that handled pushes. Update all callers. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index be03659..4c6041a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -20188,16 +20188,9 @@ thumb1_emit_multi_reg_push (unsigned long mask, unsigned long real_regs) } /* Emit code to push or pop registers to or from the stack. F is the - assembly file. MASK is the registers to push or pop. PUSH is - nonzero if we should push, and zero if we should pop. For debugging - output, if pushing, adjust CFA_OFFSET by the amount of space added - to the stack. REAL_REGS should have the same number of bits set as - MASK, and will be used instead (in the same order) to describe which - registers were saved - this is used to mark the save slots when we - push high registers after moving them to low registers. */ + assembly file. MASK is the registers to pop. */ static void -thumb_pushpop (FILE *f, unsigned long mask, int push, int *cfa_offset, - unsigned long real_regs) +thumb_pop (FILE *f, unsigned long mask) { int regno; int lo_mask = mask & 0xFF; @@ -20205,7 +20198,7 @@ thumb_pushpop (FILE *f, unsigned long mask, int push, int *cfa_offset, gcc_assert (mask); - if (lo_mask == 0 && !push && (mask & (1 << PC_REGNUM))) + if (lo_mask == 0 && (mask & (1 << PC_REGNUM))) { /* Special case.
Do not generate a POP PC statement here, do it in thumb_exit() */ @@ -20213,22 +20206,7 @@ thumb_pushpop (FILE *f, unsigned long mask, int push, int *cfa_offset, return; } - if (push && arm_except_unwind_info (global_options) == UI_TARGET) -{ - fprintf (f, "\t.save\t{"); - for (regno = 0; regno < 15; regno++) - { - if (real_regs & (1 << regno)) - { - if (real_regs & ((1 << regno) - 1)) - fprintf (f, ", "); - asm_fprintf (f, "%r", regno); - } - } - fprintf (f, "}\n"); -} - - fprintf (f, "\t%s\t{", push ? "push" : "pop"); + fprintf (f, "\tpop\t{"); /* Look at the low registers first. */ for (regno = 0; regno <= LAST_LO_REGNUM; regno++, lo_mask >>= 1) @@ -20244,17 +20222,7 @@ thumb_pushpop (FILE *f, unsigned long mask, int push, int *cfa_offset, } } - if (push && (mask & (1 << LR_REGNUM))) -{ - /* Catch pushing the LR. */ - if (mask & 0xFF) - fprintf (f, ", "); - - asm_fprintf (f, "%r", LR_REGNUM); - - pushed_words++; -} - else if (!push && (mask & (1 << PC_REGNUM))) + if (mask & (1 << PC_REGNUM)) { /* Catch popping the PC. */ if (TARGET_INTERWORK || TARGET_BACKTRACE @@ -20278,23 +20246,6 @@ thumb_pushpop (FILE *f, unsigned long mask, int push, int *cfa_offset, } fprintf (f, "}\n"); - - if (push && pushed_words && dwarf2out_do_frame ()) -{ - char *l = dwarf2out_cfi_label (false); - int pushed_mask = real_regs; - - *cfa_offset += pushed_words * 4; - dwarf2out_def_cfa (l, SP_REGNUM, *cfa_offset); - - pushed_words = 0; - pushed_mask = real_regs; - for (regno = 0; regno <= 14; regno++, pushed_mask >>= 1) - { - if (pushed_mask & 1) - dwarf2out_reg_save (l, regno, 4 * pushed_words++ - *cfa_offset); - } -} } /* Generate code to return from a thumb function. @@ -20440,8 +20391,7 @@ thumb_exit (FILE *f, int reg_containing_return_addr) } /* Pop as many registers as we can. */ - thumb_pushpop (f, regs_available_for_popping, FALSE, NULL, -regs_available_for_popping); + thumb_pop (f, regs_available_for_popping); /* Process the registers we popped.
*/ if (reg_containing_return_addr == -1) @@ -20522,8 +20472,7 @@ thumb_exit (FILE *f, int reg_containing_return_addr) int popped_into; int move_to; - thumb_pushpop (f, regs_available_for_popping, FALSE, NULL, -regs_available_for_popping); + thumb_pop (f, regs_available_for_popping); /* We have popped either FP or SP. Move whichever one it is into the correct register. */ @@ -20543,8 +20492,7 @@ thumb_exit (FILE *f, int reg_containing_return_addr) { int popped_into; - thumb_pushpop (f, regs_available_for_popping, FALSE, NULL, -regs_available_for_popping); + thumb_pop (f, regs_available_for_popping); popped_into = number_of_first_bit_set (regs_available_for_popping); @@
Re: [pph] Append DECL_CONTEXT of global namespace to cache in preload (issue4629081)
On Tue, Jun 28, 2011 at 18:37, Gabriel Charette gch...@google.com wrote: 2011-06-28 Gabriel Charette gch...@google.com * pph-streamer.c (pph_preload_common_nodes): Append DECL_CONTEXT of global_namespace to cache. OK. Diego.
Remove __GCC_FLOAT_NOT_NEEDED define
In the course of options changes I noted the existence of too many defines conditioning code built for the target http://gcc.gnu.org/ml/gcc-patches/2010-10/msg00947.html. One of those defines, __GCC_FLOAT_NOT_NEEDED, is not tested anywhere, and this patch removes the definition. Bootstrapped with no regressions on x86_64-unknown-linux-gnu. Applied to mainline as obvious. Index: ChangeLog === --- ChangeLog (revision 175606) +++ ChangeLog (working copy) @@ -1,3 +1,7 @@ +2011-06-28 Joseph Myers jos...@codesourcery.com + + * Makefile.in (LIBGCC2_CFLAGS): Remove -D__GCC_FLOAT_NOT_NEEDED. + 2011-06-28 Richard Henderson r...@redhat.com * config/arm/arm.c (thumb_pop): Rename from thumb_pushpop. Delete Index: Makefile.in === --- Makefile.in (revision 175606) +++ Makefile.in (working copy) @@ -670,7 +670,7 @@ LIBGCC2_DEBUG_CFLAGS = -g LIBGCC2_CFLAGS = -O2 $(LIBGCC2_INCLUDES) $(GCC_CFLAGS) $(TARGET_LIBGCC2_CFLAGS) \ $(LIBGCC2_DEBUG_CFLAGS) $(GTHREAD_FLAGS) \ --DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED \ +-DIN_LIBGCC2 \ -fbuilding-libgcc -fno-stack-protector \ $(INHIBIT_LIBC_CFLAGS) -- Joseph S. Myers jos...@codesourcery.com
Re: [RFC] Fix full memory barrier on SPARC-V8
From: Eric Botcazou ebotca...@adacore.com
Date: Tue, 28 Jun 2011 23:27:43 +0200

With the pristine compiler, the test passes with -mcpu=v9 but fails otherwise. It passes with the patched compiler. However, I suspect that we would still have problems with newer UltraSPARC CPUs supporting full RMO, because the new insn membar_v8 is only half a memory barrier for V9.

Linux doesn't ever run the cpu in the RMO memory model any more. All sparc64 chips run only in TSO now. All of the Niagara chips implement an even stricter than TSO memory model, and the membars we used to have all over the kernel to handle that properly were just wasted I-cache space. So I just moved unilaterally to TSO everywhere and killed off the membars necessitated by RMO.
Request to backport two -mvzeroupper related patches to 4.6 branch
Hi,

Attached are two patches from gcc 4.7 trunk that we request to backport to the 4.6 branch. They are both related to -mvzeroupper.

1) 0001-Save-the-initial-options-after-checking-vzeroupper.patch

This patch fixes bug 47315, "ICE: in extract_insn, at recog.c:2109 (unrecognizable insn) with -mvzeroupper and __attribute__((target(avx)))". The patch was committed to trunk:

2011-05-23  H.J. Lu  hongjiu...@intel.com

The bug still exists in gcc 4.6.1; backporting this patch would fix it.

2) 0001--config-i386-i386.c-ix86_reorg-Run-move_or_dele.patch

This patch runs move_or_delete_vzeroupper first, and was committed to trunk:

2011-05-04  Uros Bizjak  ubiz...@gmail.com

Is it OK to commit to the 4.6 branch?

Thanks,
Changpeng

From 0b70e1e33afa25536305f4a228409cf9b4e0eaad Mon Sep 17 00:00:00 2001
From: hjl hjl@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Mon, 23 May 2011 16:51:42 +
Subject: [PATCH] Save the initial options after checking vzeroupper.

gcc/

2011-05-23  H.J. Lu  hongjiu...@intel.com

	PR target/47315
	* config/i386/i386.c (ix86_option_override_internal): Save the
	initial options after checking vzeroupper.

gcc/testsuite/

2011-05-23  H.J. Lu  hongjiu...@intel.com

	PR target/47315
	* gcc.target/i386/pr47315.c: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@174078 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog                           |  6 ++
 gcc/config/i386/i386.c                  | 11 ++-
 gcc/testsuite/ChangeLog                 |  5 +
 gcc/testsuite/gcc.target/i386/pr47315.c | 10 ++
 4 files changed, 27 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr47315.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a3cb0f1..1d46b04 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2011-05-23  H.J. Lu  hongjiu...@intel.com
+
+	PR target/47315
+	* config/i386/i386.c (ix86_option_override_internal): Save the
+	initial options after checking vzeroupper.
+
 2011-05-23  David Li  davi...@google.com

 	PR tree-optimization/48988

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0709be8..854e376 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -4191,11 +4191,6 @@ ix86_option_override_internal (bool main_args_p)
 #endif
     }

-  /* Save the initial options in case the user does function specific
-     options */
-  if (main_args_p)
-    target_option_default_node = target_option_current_node
-      = build_target_option_node ();
-
   if (TARGET_AVX)
     {
       /* When not optimize for size, enable vzeroupper optimization for
@@ -4217,6 +4212,12 @@ ix86_option_override_internal (bool main_args_p)
       /* Disable vzeroupper pass if TARGET_AVX is disabled.  */
       target_flags &= ~MASK_VZEROUPPER;
     }
+
+  /* Save the initial options in case the user does function specific
+     options.  */
+  if (main_args_p)
+    target_option_default_node = target_option_current_node
+      = build_target_option_node ();
 }

 /* Return TRUE if VAL is passed in register with 256bit AVX modes.  */

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 72aae61..85137d0 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2011-05-23  H.J. Lu  hongjiu...@intel.com
+
+	PR target/47315
+	* gcc.target/i386/pr47315.c: New test.
+
 2011-05-23  Jason Merrill  ja...@redhat.com

 	* g++.dg/cpp0x/lambda/lambda-eh2.C: New.

diff --git a/gcc/testsuite/gcc.target/i386/pr47315.c b/gcc/testsuite/gcc.target/i386/pr47315.c
new file mode 100644
index 000..871d3f1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr47315.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mvzeroupper" } */
+
+__attribute__ ((__target__ ("avx")))
+float bar (float f) {}
+
+void foo (float f)
+{
+  bar (f);
+}
--
1.6.0.2

From 343f07cbec2d66bebe71e4f48b0403f52ebfe8f9 Mon Sep 17 00:00:00 2001
From: uros uros@138bc75d-0d04-0410-961f-82ee72b054a4
Date: Wed, 4 May 2011 17:07:03 +
Subject: [PATCH] * config/i386/i386.c (ix86_reorg): Run
 move_or_delete_vzeroupper first.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@173383 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog          | 16 ++--
 gcc/config/i386/i386.c |  8
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5412506..ca85616 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2011-05-04  Uros Bizjak  ubiz...@gmail.com
+
+	* config/i386/i386.c (ix86_reorg): Run move_or_delete_vzeroupper first.
+
 2011-05-04  Eric Botcazou  ebotca...@adacore.com

 	* stor-layout.c (variable_size): Do not issue errors.
@@ -263,9 +267,9 @@
 2011-05-03  Stuart Henderson  shend...@gcc.gnu.org

-From Mike Frysinger:
-* config/bfin/bfin.c (bfin_cpus[]): Add 0.4 for
-bf542/bf544/bf547/bf548/bf549.
+	From Mike Frysinger:
+	* config/bfin/bfin.c (bfin_cpus[]): Add 0.4 for
+	bf542/bf544/bf547/bf548/bf549.
Re: [patch] Fix oversight in tuplification of DOM
On Tue, 28 Jun 2011, Eric Botcazou wrote:

Hi, the attached testcase triggers an ICE when compiled at -O or above, on all the open branches. This is a regression introduced with the tuplification. The problem is that 2 ARRAY_RANGE_REFs are recognized as equivalent, although they don't have the same number of elements. This is so because their type isn't taken into account by the hash equality function, as it simply isn't recorded in initialize_hash_element (GIMPLE_SINGLE_RHS case). Now in all the other cases it is recorded, so this very likely is an oversight. Tested on x86_64-suse-linux, OK for all branches?

2011-06-28  Eric Botcazou  ebotca...@adacore.com

	* tree-ssa-dom.c (initialize_hash_element): Fix oversight.

This caused a regression on 4.4 for cris-elf (at least), see PR49572.

brgds, H-P
[pph] Support simple C++ programs (issue4630074)
This patch adds support for emitting functions read from a PPH image. With this, we can now run some simple C++ programs whose header has been reconstructed from a single PPH image. The core problem it fixes was in the saving and restoring of functions with a body.

1- When the parser wants to register a function for code generation, it calls expand_or_defer_fn(). When reading from the pph image, we were not calling this, so the callgraph manager was tossing these functions out.

2- Even when we call expand_or_defer_fn, we need to take care of another side-effect. In the writer, the call to expand_or_defer_fn sets DECL_EXTERNAL to 1 (for reasons that I'm not too sure I understand). At the same time, it remembers that it forced DECL_EXTERNAL by setting DECL_NOT_REALLY_EXTERN. Since I don't think I understand why it does this, I'm simply using DECL_NOT_REALLY_EXTERN in the reader to recognize that the decl should have DECL_EXTERNAL set to 0. Jason, does this make any sense?

This fixed a whole bunch of tests: c1builtin-object-size-2.cc, c1funcstatic.cc, c1return-5.cc, c1simple.cc, x1autometh.cc, x1funcstatic.cc, x1struct1.cc, x1ten-hellos.cc and x1tmplfunc.cc. It also exposed other bugs in c1attr-warn-unused-result.cc and x1template.cc. Lawrence, Gab, I think this affects some of the failures you were looking at today. Please double check.

I also added support for 'dg-do run' tests to support x1ten-hellos.cc, which now actually works (though it is not completely bug-free; I see that the counter it initializes starts with a bogus value).

Tested on x86_64. Committed to branch.

cp/ChangeLog.pph

2011-06-28  Diego Novillo  dnovi...@google.com

	* pph-streamer-in.c (pph_in_ld_fn): Instantiate
	DECL_STRUCT_FUNCTION by calling allocate_struct_function.
	Remove assertion for stream->data_in.
	(pph_in_function_decl): Factor out of ...
	(pph_read_tree): ... here.
	* pph-streamer-out.c (pph_out_function_decl): Factor out of ...
	(pph_write_tree): ... here.
testsuite/ChangeLog.pph

	* g++.dg/pph/c1attr-warn-unused-result.cc: Expect an ICE.
	* g++.dg/pph/x1template.cc: Likewise.
	* g++.dg/pph/c1builtin-object-size-2.cc: Expect no asm difference.
	* g++.dg/pph/c1funcstatic.cc: Likewise.
	* g++.dg/pph/c1return-5.cc: Likewise.
	* g++.dg/pph/c1simple.cc: Likewise.
	* g++.dg/pph/x1autometh.cc: Likewise.
	* g++.dg/pph/x1funcstatic.cc: Likewise.
	* g++.dg/pph/x1struct1.cc: Likewise.
	* g++.dg/pph/x1ten-hellos.cc: Likewise.
	* g++.dg/pph/x1tmplfunc.cc: Likewise.
	* g++.dg/pph/c1meteor-contest.cc: Adjust timeout.
	* g++.dg/pph/x1dynarray1.cc: Adjust expected ICE.
	* g++.dg/pph/x1namespace.cc: Likewise.
	* lib/dg-pph.exp: Do not compare assembly output if the test is
	marked 'dg-do run'.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 7f70b65..1dabcf1 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -767,18 +767,17 @@ pph_in_ld_fn (pph_stream *stream, struct lang_decl_fn *ldf)
 }


-/* Read applicable fields of struct function instance FN from STREAM.  */
+/* Read applicable fields of struct function from STREAM.  Associate
+   the read structure to DECL.  */

 static struct function *
-pph_in_struct_function (pph_stream *stream)
+pph_in_struct_function (pph_stream *stream, tree decl)
 {
   size_t count, i;
   unsigned ix;
   enum pph_record_marker marker;
   struct function *fn;

-  gcc_assert (stream->data_in != NULL);
-
   marker = pph_in_start_record (stream, &ix);
   if (marker == PPH_RECORD_END)
     return NULL;
@@ -786,7 +785,8 @@ pph_in_struct_function (pph_stream *stream)
   /* Since struct function is embedded in every decl, fn cannot be
      shared.  */
   gcc_assert (marker != PPH_RECORD_SHARED);

-  fn = ggc_alloc_cleared_function ();
+  allocate_struct_function (decl, false);
+  fn = DECL_STRUCT_FUNCTION (decl);

   input_struct_function_base (fn, stream->data_in, stream->ib);

@@ -1355,6 +1355,35 @@ pph_read_file (const char *filename)
 }


+/* Read the attributes for a FUNCTION_DECL FNDECL.  If FNDECL had
+   a body, mark it for expansion.
*/
+
+static void
+pph_in_function_decl (pph_stream *stream, tree fndecl)
+{
+  DECL_INITIAL (fndecl) = pph_in_tree (stream);
+  pph_in_lang_specific (stream, fndecl);
+  DECL_SAVED_TREE (fndecl) = pph_in_tree (stream);
+  DECL_STRUCT_FUNCTION (fndecl) = pph_in_struct_function (stream, fndecl);
+  DECL_CHAIN (fndecl) = pph_in_tree (stream);
+  if (DECL_SAVED_TREE (fndecl))
+    {
+      /* FIXME pph - This is somewhat gross.  When we generated the
+	 PPH image, the parser called expand_or_defer_fn on FNDECL,
+	 which marked it DECL_EXTERNAL (see expand_or_defer_fn_1 for
+	 details).
+
+	 However, this is not really an extern definition, so it was
+	 also marked not-really-extern (yes, I