Re: [PATCH][ARM] one_cmpldi2 in NEON
On 06/12/11 23:07, Richard Henderson wrote: On 12/06/2011 01:42 PM, Andrew Stubbs wrote: On Tue 06 Dec 2011 21:05:30 GMT, Richard Henderson wrote: On 12/06/2011 09:59 AM, Andrew Stubbs wrote: +(define_insn *one_cmpldi2_neon + [(set (match_operand:DI 0 s_register_operand =w,?r,?r,?w) +(not:DI (match_operand:DI 1 s_register_operand w, 0, r, w)))] alternative 0 == alternative 3? Yes and no. This is an idiom used in several places in neon.md. They are the same, but only one or other is enabled at the same time (see the arch attribute) so while both 'w' and 'r' options are always available, the order of preference is different. Except that without *, I don't think this is what is actually achieved. Perhaps I'm mistaken about how register preferencing is handled in the current state of the register allocator... Well I can demonstrate that it does work in a simple testcase, and it has a visible affect in benchmark figures. Of course, that doesn't mean it doing it the right way. As far as I understand it the current implementation relies on the left-to-right priority of the constraints. On A9, and other A-series parts, we want it to prefer 'w'-constraint registers, but still use 'r'-constraint registers if those are more convenient. On A8, we want it to prefer 'r' registers, all else being equal, but still have the freedom to use 'w' registers if the values are already there. From reading the internals manual, it's not clear to me what the '*' constraint modifier really means, or how it would work in this case? Could you enlighten me? Andrew
Re: [PATCH] _GCC_PICFLAG: use -fPIC for s390x targets
Mike Frysinger vap...@gentoo.org writes: Building newer libiberty for s390x targets fails with relocation errors: libiberty/pic/libiberty.a(hashtab.o): In function 'htab_create': libiberty/hashtab.c:408:(.text+0x5e4): relocation truncated to fit: R_390_GOT12 against symbol 'xcalloc' defined in .text section in libiberty/pic/libiberty.a(xmalloc.o) libiberty/pic/libiberty.a(hashtab.o): In function 'htab_try_create': libiberty/hashtab.c:414:(.text+0x61c): relocation truncated to fit: R_390_GOT12 against symbol 'calloc@@GLIBC_2.2' defined in .text section in /lib/libc.so.6 collect2: ld returned 1 exit status Building with larger GOT (-fPIC rather than -fpic) fixes this. CC: Aurelien Jarno aurel...@aurel32.net CC: Martin Schwidefsky schwidef...@de.ibm.com Signed-off-by: Mike Frysinger vap...@gentoo.org config/: 2011-12-06 Mike Frysinger vap...@gentoo.org * picflag.m4 (_GCC_PICFLAG): Set $1 to -fPIC for s390x*-*-*. gcc/: libada/: libgcc/: libiberty/: 2011-12-06 Mike Frysinger vap...@gentoo.org * configure: Regenerate. --- config/picflag.m4 |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/config/picflag.m4 b/config/picflag.m4 index f6f1b44..db2ce0f 100644 --- a/config/picflag.m4 +++ b/config/picflag.m4 @@ -51,6 +51,9 @@ case ${$2} in m68k-*-*) $1=-fpic ;; +s390x*-*-*) + $1=-fpic + ;; This doesn't match the ChangeLog/description, and certainly needs an explanation. s390*-*-*) $1=-fpic ;; Perhaps it's better to remove both s390* cases and use the -fPIC default everywhere, as does libtool. picflag.m4 is supposed to be usable everywhere. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [Patch] Increase array sizes in vect-tests to enable 256-bit vectorization
Thanks, Richard. Should somebody else approve the patch or is it ok for commit to trunk? On 5 December 2011 18:04, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Dec 5, 2011 at 2:28 PM, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: I'd just duplicate the tests you want to change to a larger array size and change those duplicates accordingly, leaving the original tests alone. Richard, I made the tests this way - please check them in the attached patch (it happened to be quite big). Works for me. Thanks, Richard. There is vect_multiple_sizes for such cases. Ira, thanks! This flag would be useful to avoid fails on the original tests when they are compiled with mavx/mavx2 - I'll prepare a patch for this soon. On 5 December 2011 13:10, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Dec 5, 2011 at 9:48 AM, Ira Rosen i...@il.ibm.com wrote: gcc-patches-ow...@gcc.gnu.org wrote on 05/12/2011 10:39:07 AM: From: Michael Zolotukhin michael.v.zolotuk...@gmail.com To: Richard Guenther richard.guent...@gmail.com Cc: gcc-patches@gcc.gnu.org, izamya...@gmail.com Date: 05/12/2011 10:39 AM Subject: Re: [Patch] Increase array sizes in vect-tests to enable 256-bit vectorization Sent by: gcc-patches-ow...@gcc.gnu.org On 5 December 2011 10:14, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Ok, will several tests with short arrays be enough for that or should we keep all the original tests plus new ones with longer arrays? BTW, there is another problem with current tests with short arrays - scans are expecting specific number of some diagnostic messages like not vectorized: unsupported unaligned store, and that number would be different if several vector-lengths are available - so we'll have fails in those tests. There is vect_multiple_sizes for such cases. I'd just duplicate the tests you want to change to a larger array size and change those duplicates accordingly, leaving the original tests alone. Richard. -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation. -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: [PATCH] Fix PR50823 and its gazillion dups
Am Tue 06 Dec 2011 03:41:36 PM CET schrieb Richard Guenther rguent...@suse.de: This removes accounting for the number of remaining calls in the inlining edge badness calculation (as discussed in private with Honza a long time ago - and yes, we disagreed). This Well, the main reason for disagreement was that is makes programs bigger. The basic idea of including badness in the cost is to make inliner preffer to fully inline functions that can be fully inline with little effort. This seems to work well to limit size growths when inlining. fixes the various ICEs of the edge badness update and caching code checking which are now present for over one month. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. A SPEC 2k6 LTO run is also in progress where I hope it will fix the 403.gcc and 435.gromacs builds which fail with the same ICE since a month. Honza, I guess you have some 12h to come up with a different fix now ;) I have patch for maintaining the growth via edge removal/addition hooks for some time now that speeds things up, too, but needs some cleanups (in particular it messes up heap updating somewhere and I did not work out where, yet). I think I can do it over weekend, but I am fine if this patch gets into tree beforehand. At least we will see how important the global metric still is. Thanks! Honza
Re: Ping #1: [Patch,AVR] Light-weight DImode implementation.
Georg-Johann Lay wrote: Denis Chertykov wrote: The only question that remains is what the -m64 option should be like? [ ] Omit it altogether [ ] Leave it as is (off per default) [ ] Set it on per default As soon as the direction is clear, I'll post a follow-up patch to add the missing bits like, e.g., documentation for the new switch. I'll leave the decision to Denis, but I'm for omitting it. I will also defer to Denis, but I'd rather avoid having another option, if we can. Keep it simple for the users. It might also be a hidden option like -morder2 and on per default. Such thing is nice for developers to play :-) I'm agree with Richard. I'm for omitting it. Denis. Here is a combined patch if that is more convenient for review. The variable avr_have_dimode that was originally set in avr.opt is still there but always set to true in avr.c. Option -m64 is omitted and thus avr.opt is unchanged. Passed without regressions. Ok? Johann gcc/ * config/avr/avr-dimode.md: New file. * config/avr/avr.md: Include it. (adjust_len): Add alternatives: plus64, compare64. (HIDI): Remove code iterator. (code_stdname): New code attribute. (rotx, rotsmode): Remove DI from interators. (rotlmode3, *rotwmode, *rotbmode): Use HISI instead of HIDI as code iterator. * config/avr/avr-protos.h (avr_have_dimode): New. (avr_out_plus64, avr_out_compare64): New. * config/avr/avr.c (avr_out_compare): Handle DImode. (avr_have_dimode): New variable definition and initialization. (avr_out_compare64, avr_out_plus64): New functions. (avr_out_plus_1): Use simplify_unary_operation to negate xval. (adjust_insn_length): Handle ADJUST_LEN_COMPARE64, ADJUST_LEN_PLUS64. (avr_compare_pattern): Skip DImode comparisons. libgcc/ * config/avr/t-avr (LIB1ASMFUNCS): Add _adddi3, _adddi3_s8, _subdi3, _cmpdi2, _cmpdi2_s8, _rotldi3. * config/avr/lib1funcs.S (__adddi3, __adddi3_s8, __subdi3, __cmpdi2, __cmpdi2_s8, __rotldi3): New functions. Index: gcc/config/avr/avr-dimode.md === --- gcc/config/avr/avr-dimode.md (revision 0) +++ gcc/config/avr/avr-dimode.md (revision 0) @@ -0,0 +1,283 @@ +;; Machine description for GNU compiler, +;; for Atmel AVR micro controllers. +;; Copyright (C) 1998 - 2011 +;; Free Software Foundation, Inc. +;; Contributed by Georg Lay (a...@gjlay.de) +;; +;; This file is part of GCC. +;; +;; GCC is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; GCC is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; http://www.gnu.org/licenses/. + +;; + +;; The purpose of this file is to provide a light-weight DImode +;; implementation for AVR. The trouble with DImode is that tree - RTL +;; lowering leads to really unpleasant code for operations that don't +;; work byte-wise like NEG, PLUS, MINUS, etc. Defining optabs entries for +;; them won't help because the optab machinery assumes these operations +;; are cheap and does not check if a libgcc implementation is available. +;; +;; The DImode insns are all straight forward -- except movdi. The approach +;; of this implementation is to provide DImode insns without the burden of +;; introducing movdi. +;; +;; The caveat is that if there are insns for some mode, there must also be a +;; respective move insn that describes reloads. Therefore, this +;; implementation uses an accumulator-based model with two hard-coded, +;; accumulator-like registers +;; +;;A[] = reg:DI 18 +;;B[] = reg:DI 10 +;; +;; so that no DImode insn contains pseudos or needs reloading. + +(define_constants + [(ACC_A 18) + (ACC_B 10)]) + +;; +;; Addition +;; + +(define_expand adddi3 + [(parallel [(match_operand:DI 0 general_operand ) + (match_operand:DI 1 general_operand ) + (match_operand:DI 2 general_operand )])] + avr_have_dimode + { +rtx acc_a = gen_rtx_REG (DImode, ACC_A); + +emit_move_insn (acc_a, operands[1]); + +if (s8_operand (operands[2], VOIDmode)) + { +emit_move_insn (gen_rtx_REG (QImode, REG_X), operands[2]); +emit_insn (gen_adddi3_const8_insn ()); + } +else if (CONST_INT_P
[Dodji Seketeli] Re: [PATCH] PR c++/51289 - ICE with alias template for bound template
Friendly pinging this patch. ---BeginMessage--- Dodji Seketeli do...@redhat.com writes: Jason Merrill ja...@redhat.com writes: I guess let's check DECL_ORIGINAL_TYPE instead of TREE_TYPE for alias templates. Like the below that I am currently bootstrapping? Finally this is what passed bootstrap and testing on x86_64-unknown-linux-gnu against trunk. From: Dodji Seketeli do...@redhat.com Date: Sat, 26 Nov 2011 11:50:43 +0100 Subject: [PATCH] PR c++/51289 - ICE with alias template for bound template template parm gcc/cp/ PR c++/51289 * cp-tree.h (TYPE_TEMPLATE_INFO): Rewrite this accessor macro to better support aliased types. (TYPE_ALIAS_P): Don't crash on TYPE_NAME nodes that are not TYPE_DECL. * pt.c (find_parameter_packs_r): Handle types aliases. (push_template_decl_real): Check for bare parameter packs in the underlying type of an alias template. gcc/PR51289/gcc/testsuite/ PR c++/51289 * g++.dg/cpp0x/alias-decl-17.C: New test. --- gcc/cp/cp-tree.h | 28 ++-- gcc/cp/pt.c| 19 ++- gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C | 21 + 3 files changed, 57 insertions(+), 11 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-17.C diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 3f4f408..b821928 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -2553,6 +2553,7 @@ extern void decl_shadowed_for_var_insert (tree, tree); #define TYPE_ALIAS_P(NODE) \ (TYPE_P (NODE) \ TYPE_NAME (NODE) \ +TREE_CODE (TYPE_NAME (NODE)) == TYPE_DECL\ TYPE_DECL_ALIAS_P (TYPE_NAME (NODE))) /* For a class type: if this structure has many fields, we'll sort them @@ -2605,17 +2606,24 @@ extern void decl_shadowed_for_var_insert (tree, tree); (LANG_TYPE_CLASS_CHECK (BOUND_TEMPLATE_TEMPLATE_PARM_TYPE_CHECK (NODE)) \ -template_info) -/* Template information for an ENUMERAL_, RECORD_, or UNION_TYPE. */ +/* Template information for an ENUMERAL_, RECORD_, UNION_TYPE, or + BOUND_TEMPLATE_TEMPLATE_PARM type. Note that if NODE is a + specialization of an alias template, this accessor returns the + template info for the alias template, not the one (if any) for the + template of the underlying type. */ #define TYPE_TEMPLATE_INFO(NODE) \ - (TREE_CODE (NODE) == ENUMERAL_TYPE \ - ? ENUM_TEMPLATE_INFO (NODE) : \ - (TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM \ -? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE) :\ -((CLASS_TYPE_P (NODE) !TYPE_ALIAS_P (NODE)) \ - ? CLASSTYPE_TEMPLATE_INFO (NODE) \ - : ((TYPE_NAME (NODE) DECL_LANG_SPECIFIC (TYPE_NAME (NODE)))\ - ? (DECL_TEMPLATE_INFO (TYPE_NAME (NODE))) \ - : NULL_TREE + (TYPE_ALIAS_P (NODE) \ + ? ((TYPE_NAME (NODE) DECL_LANG_SPECIFIC (TYPE_NAME (NODE))) \ + ? DECL_TEMPLATE_INFO (TYPE_NAME (NODE)) \ + : NULL_TREE) \ + : ((TREE_CODE (NODE) == ENUMERAL_TYPE) \ + ? ENUM_TEMPLATE_INFO (NODE) \ + : ((TREE_CODE (NODE) == BOUND_TEMPLATE_TEMPLATE_PARM)\ +? TEMPLATE_TEMPLATE_PARM_TEMPLATE_INFO (NODE) \ +: (CLASS_TYPE_P (NODE) \ + ? CLASSTYPE_TEMPLATE_INFO (NODE)\ + : NULL_TREE + /* Set the template information for an ENUMERAL_, RECORD_, or UNION_TYPE to VAL. */ diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 4725080..ee3a3ab 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -2976,6 +2976,20 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, void* data) (struct find_parameter_pack_data*)data; bool parameter_pack_p = false; + /* Handle type aliases/typedefs. */ + if (TYPE_P (t) + TYPE_NAME (t) + TREE_CODE (TYPE_NAME (t)) == TYPE_DECL + TYPE_DECL_ALIAS_P (TYPE_NAME (t))) +{ + if (TYPE_TEMPLATE_INFO (t)) + cp_walk_tree (TYPE_TI_ARGS (t), + find_parameter_packs_r, + ppd, ppd-visited); + *walk_subtrees = 0; + return NULL_TREE; +} + /* Identify whether this is a parameter pack or not. */ switch (TREE_CODE (t)) { @@ -4905,7 +4919,10 @@ push_template_decl_real (tree decl, bool is_friend) if (check_for_bare_parameter_packs (TYPE_RAISES_EXCEPTIONS (type))) TYPE_RAISES_EXCEPTIONS (type) = NULL_TREE; } - else if
Re: [PATCH] Fix PR50823 and its gazillion dups
On Wed, 7 Dec 2011, Jan Hubicka wrote: Am Tue 06 Dec 2011 03:41:36 PM CET schrieb Richard Guenther rguent...@suse.de: This removes accounting for the number of remaining calls in the inlining edge badness calculation (as discussed in private with Honza a long time ago - and yes, we disagreed). This Well, the main reason for disagreement was that is makes programs bigger. The basic idea of including badness in the cost is to make inliner preffer to fully inline functions that can be fully inline with little effort. This seems to work well to limit size growths when inlining. Well, the theory might work for functions called once, but I doubt it works for a function called 3 times and one called 2 times. For functions called once we have the inline-functions-called-once stuff, idependent on the fibheap priority driven inlining. fixes the various ICEs of the edge badness update and caching code checking which are now present for over one month. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. A SPEC 2k6 LTO run is also in progress where I hope it will fix the 403.gcc and 435.gromacs builds which fail with the same ICE since a month. Honza, I guess you have some 12h to come up with a different fix now ;) I have patch for maintaining the growth via edge removal/addition hooks for some time now that speeds things up, too, but needs some cleanups (in particular it messes up heap updating somewhere and I did not work out where, yet). I think I can do it over weekend, but I am fine if this patch gets into tree beforehand. At least we will see how important the global metric still is. SPEC works, shows no regressions (in fact, binary size improves slightly). I have applied the patch for now. You can resort to reverting it with a more proper fix, but there is no reason to keep this broken during stage3. Thanks, Richard.
[testsuite]: Move pr45830.c to gcc.dg/torture and make it pass for AVR
In principle, test case pr45830.c works for target avr, but there is an issue with the -ftree-switch-conversion optimization activated at higher optimization levels: It transforms code size into .data usage and thus exceeds AVRs' RAM size because of big CSWTCH lookup tables located in RAM. The patch moves the test case from gcc.c-torture/execute/ to more modern gcc.dg/torture and adds magic dg-comments as needed. Patch lightly tested with my avr-tools. Ok for trunk? Johann PR tree-optimization/45830 * gcc.c-torture/execute/pr45830.c: Move file from here to... * gcc.dg/torture/pr45830.c: ...here. Add dg-do and dg-additional-options magic. Index: gcc.c-torture/execute/pr45830.c === --- gcc.c-torture/execute/pr45830.c (revision 182043) +++ gcc.c-torture/execute/pr45830.c (working copy) @@ -1,97 +0,0 @@ -/* PR tree-optimization/45830 */ - -extern void abort (void); - -long long va, vb, vc, vd, ve; - -__attribute__((noinline)) int -foo (int x) -{ - long long a, b, c, d, e; - switch (x) -{ -case 0: -case 3: -case 1: -case 2: -case 4: - a = 1; - b = 129; - c = -12; - d = -4; - e = 128; - break; -case 23: -case 26: -case 19: -case 65: -case 5: - a = 2; - b = 138; - c = 115; - d = 128; - e = -16; - break; -case 21: -case 20: -case 22: -case 38: -case 27: -case 66: -case 45: -case 47: - a = 3; - b = 6; - c = 127; - d = 25; - e = 257; - break; -default: - a = 0; - b = 18; - c = 0; - d = 64; - e = 32768L; - break; -} - va = a; - vb = b; - vc = c; - vd = d; - ve = e; -} - -int -bar (int x) -{ - if (x 0) -return 3; - if (x 5) -return 0; - if (x == 5 || x == 19 || x == 23 | x == 26 || x == 65) -return 1; - if ((x = 20 x = 22) || x == 27 || x == 38 - || x == 45 || x == 47 || x == 66) -return 2; - return 3; -} - -long long expected[] = -{ 1, 129, -12, -4, 128, 2, 138, 115, 128, -16, - 3, 6, 127, 25, 257, 0, 18, 0, 64, 32768L }; - -int -main (void) -{ - int i, v; - for (i = -4; i 70; i++) -{ - foo (i); - v = bar (i); - if (va != expected[5 * v] || vb != expected[5 * v + 1] - || vc != expected[5 * v + 2] || vd != expected[5 * v + 3] - || ve != expected[5 * v + 4]) - abort (); -} - return 0; -} Index: gcc.dg/torture/pr45830.c === --- gcc.dg/torture/pr45830.c (revision 182043) +++ gcc.dg/torture/pr45830.c (working copy) @@ -1,4 +1,6 @@ /* PR tree-optimization/45830 */ +/* { dg-do run } */ +/* { dg-additional-options -fno-tree-switch-conversion { target avr-*-* } } */ extern void abort (void);
Re: [Patch] Increase array sizes in vect-tests to enable 256-bit vectorization
On Wed, Dec 7, 2011 at 11:27 AM, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Thanks, Richard. Should somebody else approve the patch or is it ok for commit to trunk? It's ok to commit. Richard. On 5 December 2011 18:04, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Dec 5, 2011 at 2:28 PM, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: I'd just duplicate the tests you want to change to a larger array size and change those duplicates accordingly, leaving the original tests alone. Richard, I made the tests this way - please check them in the attached patch (it happened to be quite big). Works for me. Thanks, Richard. There is vect_multiple_sizes for such cases. Ira, thanks! This flag would be useful to avoid fails on the original tests when they are compiled with mavx/mavx2 - I'll prepare a patch for this soon. On 5 December 2011 13:10, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Dec 5, 2011 at 9:48 AM, Ira Rosen i...@il.ibm.com wrote: gcc-patches-ow...@gcc.gnu.org wrote on 05/12/2011 10:39:07 AM: From: Michael Zolotukhin michael.v.zolotuk...@gmail.com To: Richard Guenther richard.guent...@gmail.com Cc: gcc-patches@gcc.gnu.org, izamya...@gmail.com Date: 05/12/2011 10:39 AM Subject: Re: [Patch] Increase array sizes in vect-tests to enable 256-bit vectorization Sent by: gcc-patches-ow...@gcc.gnu.org On 5 December 2011 10:14, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Ok, will several tests with short arrays be enough for that or should we keep all the original tests plus new ones with longer arrays? BTW, there is another problem with current tests with short arrays - scans are expecting specific number of some diagnostic messages like not vectorized: unsupported unaligned store, and that number would be different if several vector-lengths are available - so we'll have fails in those tests. There is vect_multiple_sizes for such cases. I'd just duplicate the tests you want to change to a larger array size and change those duplicates accordingly, leaving the original tests alone. Richard. -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation. -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)
On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote: This is a backport for two upstream patches into our 4.6 branch. I submitted the first patch by Julian a while ago for backport but Richard Earnshaw pointed out a problem with the first patch. The second patch from Joey fixes that problem. This was tested on x86 and ARM. Why hasn't this been proposed for upstream 4.6 instead? See PR51442. 2011-11-22 Doug Kwan dougk...@google.com Backport r171347 and r181549 from trunk. gcc/ 2011-03-23 Julian Brown jul...@codesourcery.com * expr.c (expand_expr_real_1): Only use BLKmode for volatile accesses which are not naturally aligned. 2011-11-20 Joey Ye joey...@arm.com * expr.c (expand_expr_real_1): Correctly handle strict volatile bitfield loads smaller than mode size. gcc/testsuite/ 2011-11-20 Joey Ye joey...@arm.com * gcc.dg/volatile-bitfields-1.c: New. Jakub
Re: [PATCH] Fix PR tree-optimization/51315
You are basically (but non-optimally) locally re-implementing what expr.c:get_object_or_type_alignment does, for the bare MEM_REF case (you don't consider offsets that do not change the alignment, nor alignment information present on the SSA name). Gosh. And now I distinctly remember suggesting that the function be moved to the same file as the others... Note that in expr.c gete_object_or_type_alignment is only called on [TARGET_]MEM_REFs (I didn't export that function exactly because of its restrictions, if we want to export it we should assert it is only called for [TARGET_]MEM_REFs). OK, let's export it, move it to builtins.c and and add the assert. Thus, apart from the exp_align computation the patch looks ok (it's only non-optimal, not incorrect). I'll post an adjusted version. Thanks for the review. -- Eric Botcazou
Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)
On Wed, Dec 7, 2011 at 1:34 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote: This is a backport for two upstream patches into our 4.6 branch. I submitted the first patch by Julian a while ago for backport but Richard Earnshaw pointed out a problem with the first patch. The second patch from Joey fixes that problem. This was tested on x86 and ARM. Why hasn't this been proposed for upstream 4.6 instead? See PR51442. It's indeed annoying to see arm related backports only going to vendor branches, not the last officially maintained GCC release branch (4.6). I keep getting local requests to include random patches to our 4.6 build to make arm work at all. It seriously seems like having arm-linux-gnueabi as a primary target is a lie to our users. Arm maintainers - please consider maintaining at least the current release series and shift focus away from your vendor branches. Thanks, Richard. 2011-11-22 Doug Kwan dougk...@google.com Backport r171347 and r181549 from trunk. gcc/ 2011-03-23 Julian Brown jul...@codesourcery.com * expr.c (expand_expr_real_1): Only use BLKmode for volatile accesses which are not naturally aligned. 2011-11-20 Joey Ye joey...@arm.com * expr.c (expand_expr_real_1): Correctly handle strict volatile bitfield loads smaller than mode size. gcc/testsuite/ 2011-11-20 Joey Ye joey...@arm.com * gcc.dg/volatile-bitfields-1.c: New. Jakub
Re: [PATCH] Remove dead labels to increase superblock scope
On 06/12/11 14:25, Michael Matz wrote: Hi, On Tue, 6 Dec 2011, Tom de Vries wrote: what should be the 'next' returned by delete_insn? There are exactly two calls of delete_insn that take the return value. One (delete_insn_and_edges) just uses it to return it itself (and there are no calls to delete_insn_and_edges that use the returned value), the other (delete_insn_chain) wants to have the original next insn (before reordering), even if that means that it can see the insn twice (once as label, once as note, the latter would be skipped). So, return the original next one. Even better would be to somehow clean up the single last use of the return value in delete_insn_chain, and make delete_insn return nothing. I did that now. The only problem I ran into was an assert in remove_insn: ... if (BB_HEAD (bb) == insn) { /* Never ever delete the basic block note without deleting whole basic block. */ gcc_assert (!NOTE_P (insn)); BB_HEAD (bb) = next; } ... The problematic case was a removing a basic_block using delete_insn_chain: ... (code_label 33 26 34 4 1 [0 uses]) (note 34 33 50 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (note 50 34 51 4 NOTE_INSN_EPILOGUE_BEG) ... Normally, first the code_label was replaced by a note, then the BASIC_BLOCK note was removed. The assert would not trigger while removing this note because the note was not BB_HEAD. With the fixup, when removing the BASIC_BLOCK note, the note would be at the head and the assert would trigger: ... (note 34 33 50 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (note 33 26 34 4 1 NOTE_INSN_DELETED_LABEL) (note 50 34 51 4 NOTE_INSN_EPILOGUE_BEG) ... I solved this by removing backwards rather than forwards in delete_insn_chain. Bootstrapped and reg-tested on x86_64. OK for next stage1? Thanks, - Tom 2011-12-07 Tom de Vries t...@codesourcery.com * cfgcleanup.c (try_optimize_cfg): Replace call to delete_isns_chain by call to delete_insn. Remove code to reorder BASIC_BLOCK note and DELETED_LABEL note, and move it to ... * cfg_rtl.c (delete_insn): Change return type to void. (delete_insn_and_edges): Likewise. (delete_insn_chain): Handle new return type of delete_insn. Delete chain backwards rather than forwards. * rtl.h (delete_insn, delete_insn_and_edges): Change return type to void. * cfglayout.c (fixup_reorder_chain): Delete unused label. * gcc.dg/superblock.c: New test. Index: gcc/cfgcleanup.c === --- gcc/cfgcleanup.c (revision 181652) +++ gcc/cfgcleanup.c (working copy) @@ -2632,20 +2632,7 @@ try_optimize_cfg (int mode) || ! label_is_jump_target_p (BB_HEAD (b), BB_END (single_pred (b) { - rtx label = BB_HEAD (b); - - delete_insn_chain (label, label, false); - /* If the case label is undeletable, move it after the - BASIC_BLOCK note. */ - if (NOTE_KIND (BB_HEAD (b)) == NOTE_INSN_DELETED_LABEL) - { - rtx bb_note = NEXT_INSN (BB_HEAD (b)); - - reorder_insns_nobb (label, label, bb_note); - BB_HEAD (b) = bb_note; - if (BB_END (b) == bb_note) - BB_END (b) = label; - } + delete_insn (BB_HEAD (b)); if (dump_file) fprintf (dump_file, Deleted label in block %i.\n, b-index); Index: gcc/cfglayout.c === --- gcc/cfglayout.c (revision 181652) +++ gcc/cfglayout.c (working copy) @@ -857,6 +857,9 @@ fixup_reorder_chain (void) (e_taken-src, e_taken-dest)); e_taken-flags |= EDGE_FALLTHRU; update_br_prob_note (bb); + if (LABEL_NUSES (ret_label) == 0 + single_pred_p (e_taken-dest)) + delete_insn (ret_label); continue; } } Index: gcc/rtl.h === --- gcc/rtl.h (revision 181652) +++ gcc/rtl.h (working copy) @@ -2460,12 +2460,12 @@ extern void add_insn_before (rtx, rtx, s extern void add_insn_after (rtx, rtx, struct basic_block_def *); extern void remove_insn (rtx); extern rtx emit (rtx); -extern rtx delete_insn (rtx); +extern void delete_insn (rtx); extern rtx entry_of_function (void); extern void emit_insn_at_entry (rtx); extern void delete_insn_chain (rtx, rtx, bool); extern rtx unlink_insn_chain (rtx, rtx); -extern rtx delete_insn_and_edges (rtx); +extern void delete_insn_and_edges (rtx); extern rtx gen_lowpart_SUBREG (enum machine_mode, rtx); extern rtx gen_const_mem (enum machine_mode, rtx); extern rtx gen_frame_mem (enum machine_mode, rtx); Index: gcc/cfgrtl.c === --- gcc/cfgrtl.c (revision 181652) +++ gcc/cfgrtl.c (working copy) @@ -111,12 +111,11 @@ can_delete_label_p (const_rtx label) !in_expr_list_p (forced_labels, label)); } -/* Delete INSN by patching it
Re: [coverage] Some restructuring
On 2011.12.04 at 18:35 +, Nathan Sidwell wrote: I've committed this patch to break apart the gcov finalization routines, I believe this will make it easier to fix the problem shown up by bug 51113 -- although this patch does not. Notable changes: * rename coverage_begin_output to coverage_begin_function for consistency with coverage_end_function * Only call it once from the profile code -- I suspect in the past it was called from several locations where it wasn't possible to statically determine which would be the first call. Now it's obvious. * Move the opening of the notes file to the coverage initialization. This does mean that we'll always create a notes file when generating coverage data, even if there are no functions in the resultant object file. I don't think that's unsurprising -- and will stop gcov itself complaining about a missing notes file in this case. * Replace the trailing array of the gcov_info object with a pointer to an array. This doesn't require changes to libgcov.c's source but will of course change the resultant object file it compiles to and constitutes an ABI change. This change allows the creation of the gcov_info data type without knowing the number of functions being instrumented. tested on i686-pc-linux-gnu with profiledbootstrap. Nathan, this patch breaks profiled build of tramp3d: % c++ -w -Ofast -fprofile-generate -march=native tramp3d-v4.cpp /tmp/ccMmeivA.o:tramp3d-v4.cpp:function Inform::flush(): error: undefined reference to '__gcov0__ZNKSt19basic_ostringstreamIcSt11char_traitsIcESaIcEE3strEv' /tmp/ccMmeivA.o:tramp3d-v4.cpp:function Inform::flush(): error: undefined reference to '__gcov0__ZNKSt19basic_ostringstreamIcSt11char_traitsIcESaIcEE3strEv' ... see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51449 -- Markus
Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)
On 07/12/11 13:02, Richard Guenther wrote: On Wed, Dec 7, 2011 at 1:34 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote: This is a backport for two upstream patches into our 4.6 branch. I submitted the first patch by Julian a while ago for backport but Richard Earnshaw pointed out a problem with the first patch. The second patch from Joey fixes that problem. This was tested on x86 and ARM. Why hasn't this been proposed for upstream 4.6 instead? See PR51442. It's indeed annoying to see arm related backports only going to vendor branches, not the last officially maintained GCC release branch (4.6). I keep getting local requests to include random patches to our 4.6 build to make arm work at all. It seriously seems like having arm-linux-gnueabi as a primary target is a lie to our users. Arm maintainers - please consider maintaining at least the current release series and shift focus away from your vendor branches. So this, to some extent seems to conflict with your rules for only fixing regressions. This code has always been broken in one way or another, so technically this doesn't qualify for the 4.6 branch. I think we need clearer rules (on the web site, not in a mailing list post) that describes which patches are considered acceptable on the branch and which are not. R.
[PATCH] Testcase for PR48100
Which got fixed on trunk. Tested on x86_64-unknown-linux-gnu, applied. Richard. 2011-12-07 Richard Guenther rguent...@suse.de PR lto/48100 * gcc.dg/lto/20111207-1_0.c: New testcase. * gcc.dg/lto/20111207-1_1.c: Likewise. * gcc.dg/lto/20111207-1_2.c: Likewise. * gcc.dg/lto/20111207-1_3.c: Likewise. Index: gcc/testsuite/gcc.dg/lto/20111207-1_0.c === --- gcc/testsuite/gcc.dg/lto/20111207-1_0.c (revision 0) +++ gcc/testsuite/gcc.dg/lto/20111207-1_0.c (revision 0) @@ -0,0 +1,4 @@ +/* { dg-lto-do run } */ +/* { dg-lto-options { { -flto } } } */ +/* { dg-require-linker-plugin } */ +/* { dg-extra-ld-options -fuse-linker-plugin } */ Index: gcc/testsuite/gcc.dg/lto/20111207-1_1.c === --- gcc/testsuite/gcc.dg/lto/20111207-1_1.c (revision 0) +++ gcc/testsuite/gcc.dg/lto/20111207-1_1.c (revision 0) @@ -0,0 +1,3 @@ +/* { dg-options -fno-lto } */ + +int i; Index: gcc/testsuite/gcc.dg/lto/20111207-1_2.c === --- gcc/testsuite/gcc.dg/lto/20111207-1_2.c (revision 0) +++ gcc/testsuite/gcc.dg/lto/20111207-1_2.c (revision 0) @@ -0,0 +1 @@ +int i; Index: gcc/testsuite/gcc.dg/lto/20111207-1_3.c === --- gcc/testsuite/gcc.dg/lto/20111207-1_3.c (revision 0) +++ gcc/testsuite/gcc.dg/lto/20111207-1_3.c (revision 0) @@ -0,0 +1,6 @@ +extern int i; + +int main() +{ + return i; +}
Re: [Patch, Fortran] PR50815 - don't -fcheck=bounds of deferred-length strings
** PING ** On 11/29/2011 07:35 PM, Tobias Burnus wrote: gfortran had an ICE when trying to insert a check whether the character length between actual and dummy argument matches. This check is pointless for deferred-length character arguments - thus, no bounds check should be generated. Build and regtested on x86-64-linux. OK for the trunk? Tobias
Re: [Patch, Fortran] PR50923 - [4.4-4.7] Print warning function does not return a value
** PING ** On 11/30/2011 07:19 PM, Tobias Burnus wrote: The bug has been introduced when changing the warning system to do more in the front end. The problem is that for: module m contains function f() end end the sym-attr.referenced gets set - and no warning is printed. I now ignore the sym-attr.referenced attribute as the RESULT == NULL_TREE should only be reached if the variable has not been assigned a value. Additionally, I now only set TREE_NO_WARNING() if a warning has been printed; that way, some other useful middle-end warnings might be still printed. I have to set the NO_WARNING also for sym != sym-result as it otherwise prints a middle end warning into addition to the FE warning. But the FE warning has been printed already before for the result variable. Build and regtested on x86-64-linux. OK for the trunk? How far should this be backported - all the way down to 4.4? Tobias
Re: [Patch, Fortran] PR51407 - allow BOZ edit descriptors for REAL/COMPLEX
* ping * ? On 12/04/2011 06:46 PM, Tobias Burnus wrote: Hi all, as Dominique has found, Fortran 2008 allows the BOZ edit descriptors now also with REAL and COMPLEX arguments. (See PR for quotes from the standard.) Build and regtested on x86-64-linux. OK for the trunk? Tobias PS: Thank you, Mikael, for reviewing my ASSOCIATE patch!
Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)
On Wed, Dec 7, 2011 at 2:32 PM, Richard Earnshaw rearn...@arm.com wrote: On 07/12/11 13:02, Richard Guenther wrote: On Wed, Dec 7, 2011 at 1:34 PM, Jakub Jelinek ja...@redhat.com wrote: On Tue, Nov 29, 2011 at 05:59:53PM -0800, Doug Kwan wrote: This is a backport for two upstream patches into our 4.6 branch. I submitted the first patch by Julian a while ago for backport but Richard Earnshaw pointed out a problem with the first patch. The second patch from Joey fixes that problem. This was tested on x86 and ARM. Why hasn't this been proposed for upstream 4.6 instead? See PR51442. It's indeed annoying to see arm related backports only going to vendor branches, not the last officially maintained GCC release branch (4.6). I keep getting local requests to include random patches to our 4.6 build to make arm work at all. It seriously seems like having arm-linux-gnueabi as a primary target is a lie to our users. Arm maintainers - please consider maintaining at least the current release series and shift focus away from your vendor branches. So this, to some extent seems to conflict with your rules for only fixing regressions. This code has always been broken in one way or another, so technically this doesn't qualify for the 4.6 branch. I think we need clearer rules (on the web site, not in a mailing list post) that describes which patches are considered acceptable on the branch and which are not. We generally accept wrong-code fixes (or rejects-valid) that are low-risk. Target maintainers have complete freedom for their targets provided a fix is really target-only (we even accept new ports or new ISAs on branches, well - in theory at least). If a maintainer thinks backporting a fix is important he can always defer to a release manager for a final decision. What we generally do not want is new middle-end functionality. And we generally raise the barrier for low-risk the more mature a branch is. As a general rule, if you'd point users to use the arm-embedded branch for bugreports you get, then you are doing sth wrong. If you say, use the arm-embedded branch to get smaller/faster code - well, that's ok. Pointing people to the latest official release series (in this case 4.6.x) is also ok, we're keeping too many branches active IMNSHO. Richard. R.
Re: RFC: ARM 64-bit shifts in NEON
On Wed 07 Dec 2011 13:42:37 GMT, Richard Earnshaw wrote: So it looks like the code generated for core registers with thumb2 is pretty rubbish (no real surprise there -- to get the best code you need to make use of the fact that on ARM a shift by a small negative number ( -128) will give zero. This gives us sequences like: For ARM state it's something like (untested) @ shft 32 , shft= 32 __ashldi3_v3: sub r3, r2, #32 @ -ve , shft - 32 lsl ah, ah, r2 @ ah shft , 0 rsb ip, r2, #32 @ 32 - shft , -ve orr ah, ah, al, lsl r3 @ ah shft , al shft - 32 orr ah, ah, al, lsr ip @ ah shft | al 32 - shft , al shft - 32 lsl al, al, r2 @ al shft , 0 For Thumb2 (where there is no orr with register shift) lslsah, ah, r2 @ ah shft , 0 sub r3, r2, #32 @ -ve , shft - 32 lsl ip, al, r3 @ 0 , al shft - 32 negsr3, r3 @ 32 - shft , -ve orr ah, ah, ip @ ah shft , al shft - 32 lsr r3, al, r3 @ al 32 - shft , 0 orrsah, ah, r3 @ ah shft | al 32 - shft , al shft - 32 lslsal, al, r2 @ al shft , 0 Neither of which needs the condition flags during execution (and indeed is probably better in both cases than the code currently in lib1funcs.asm for a modern core). The flag clobbering behaviour in the thumb2 variant is only for code size saving; that would normally be added by a late optimization pass. OK, those are interesting, and I can look into making it happen, with or without NEON. Would it not require an unspec to prevent 'clever things' happening to the negative shift, if we were to encode these in the machine description? I'm not too clear on what these 'clever things' might be in the case of shift-by-register (presumably value-range propagation is one), but I know the NEON shifts are encoded this way for safety. None of this directly helps with your neon usage, but it does show that we really don't need to clobber the condition code register to get an efficient sequence. Except that it doesn't in the case of a shift by one where there is a two-instruction sequence that clobbers CC. Presumably this special case can be treated differently though, right from expand. Andrew
RFA: Fix PR middle-end/40154
Having a value in the wrong mode is not the only hazard that users of set_unique_reg_note should anticipate; it could also be that the value is computed as a constant by the compiler, and thus the last instruction obtained with get_last_insn is completely (or somewhat) unrelated; or an expander could decide to save / restore some hard register around the actual operation. So, in general, the test should be if the candidate insn actually sets the expected target. There are already a number of users of set_unique_reg_note that do that. Others assume that a move, add or other expander emits a single insn, or a sequence of insns that ends with a single insn, which sets the target. Although that is often the case, this is not a safe assumption to make. Fixing all these places with inlined code would mean a lot of cut paste programming. So instead, I've added a new wrapper for set_unique_reg_note to be used in these circumstances, and also used it to replace the existing inlined checks unless something special was going on (e.g. testing two different target / value pairs). Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 2011-12-07 Joern Rennecke joern.renne...@embecosm.com PR middle-end/40154 * emit-rtl.c (set_dst_reg_note): New function. * rtl.h (set_dst_reg_note): Declare. * optabs.c (expand_binop, expand_absneg_bit): Use set_dst_reg_note. (emit_libcall_block, expand_fix): Likewise. * function.c (assign_parm_setup_reg, expand_function_start): Likewise. * expmed.c (expand_mult_const, expand_divmod): Likewise. * reload1.c (gen_reload): Likewise. Index: optabs.c === --- optabs.c(revision 182075) +++ optabs.c(working copy) @@ -2052,11 +2052,11 @@ expand_binop (enum machine_mode mode, op { rtx temp = emit_move_insn (target, xtarget); - set_unique_reg_note (temp, - REG_EQUAL, - gen_rtx_fmt_ee (binoptab-code, mode, - copy_rtx (xop0), - copy_rtx (xop1))); + set_dst_reg_note (temp, REG_EQUAL, + gen_rtx_fmt_ee (binoptab-code, mode, + copy_rtx (xop0), + copy_rtx (xop1)), + target); } else target = xtarget; @@ -2104,11 +2104,12 @@ expand_binop (enum machine_mode mode, op if (optab_handler (mov_optab, mode) != CODE_FOR_nothing) { temp = emit_move_insn (target ? target : product, product); - set_unique_reg_note (temp, - REG_EQUAL, - gen_rtx_fmt_ee (MULT, mode, - copy_rtx (op0), - copy_rtx (op1))); + set_dst_reg_note (temp, + REG_EQUAL, + gen_rtx_fmt_ee (MULT, mode, + copy_rtx (op0), + copy_rtx (op1)), + target ? target : product); } return product; } @@ -2966,8 +2967,9 @@ expand_absneg_bit (enum rtx_code code, e gen_lowpart (imode, target), 1, OPTAB_LIB_WIDEN); target = lowpart_subreg_maybe_copy (mode, temp, imode); - set_unique_reg_note (get_last_insn (), REG_EQUAL, - gen_rtx_fmt_e (code, mode, copy_rtx (op0))); + set_dst_reg_note (get_last_insn (), REG_EQUAL, + gen_rtx_fmt_e (code, mode, copy_rtx (op0)), + target); } return target; @@ -3899,8 +3901,7 @@ emit_libcall_block (rtx insns, rtx targe } last = emit_move_insn (target, result); - if (optab_handler (mov_optab, GET_MODE (target)) != CODE_FOR_nothing) -set_unique_reg_note (last, REG_EQUAL, copy_rtx (equiv)); + set_dst_reg_note (last, REG_EQUAL, copy_rtx (equiv), target); if (final_dest != target) emit_move_insn (final_dest, target); @@ -5213,11 +5214,10 @@ expand_fix (rtx to, rtx from, int unsign { /* Make a place for a REG_NOTE and add it. */ insn = emit_move_insn (to, to); - set_unique_reg_note (insn, - REG_EQUAL, - gen_rtx_fmt_e (UNSIGNED_FIX, - GET_MODE (to), - copy_rtx (from))); + set_dst_reg_note (insn, REG_EQUAL, + gen_rtx_fmt_e (UNSIGNED_FIX, GET_MODE (to), +
Re: RFC: ARM 64-bit shifts in NEON
On 07/12/11 14:03, Andrew Stubbs wrote: On Wed 07 Dec 2011 13:42:37 GMT, Richard Earnshaw wrote: So it looks like the code generated for core registers with thumb2 is pretty rubbish (no real surprise there -- to get the best code you need to make use of the fact that on ARM a shift by a small negative number ( -128) will give zero. This gives us sequences like: For ARM state it's something like (untested) @ shft 32 , shft= 32 __ashldi3_v3: sub r3, r2, #32 @ -ve , shft - 32 lsl ah, ah, r2 @ ah shft, 0 rsb ip, r2, #32 @ 32 - shft , -ve orr ah, ah, al, lsl r3 @ ah shft, al shft - 32 orr ah, ah, al, lsr ip @ ah shft | al 32 - shft , al shft - 32 lsl al, al, r2 @ al shft, 0 For Thumb2 (where there is no orr with register shift) lslsah, ah, r2 @ ah shft, 0 sub r3, r2, #32 @ -ve , shft - 32 lsl ip, al, r3 @ 0 , al shft - 32 negsr3, r3 @ 32 - shft , -ve orr ah, ah, ip @ ah shft, al shft - 32 lsr r3, al, r3 @ al 32 - shft , 0 orrsah, ah, r3 @ ah shft | al 32 - shft , al shft - 32 lslsal, al, r2 @ al shft, 0 Neither of which needs the condition flags during execution (and indeed is probably better in both cases than the code currently in lib1funcs.asm for a modern core). The flag clobbering behaviour in the thumb2 variant is only for code size saving; that would normally be added by a late optimization pass. OK, those are interesting, and I can look into making it happen, with or without NEON. Would it not require an unspec to prevent 'clever things' happening to the negative shift, if we were to encode these in the machine description? I'm not too clear on what these 'clever things' might be in the case of shift-by-register (presumably value-range propagation is one), but I know the NEON shifts are encoded this way for safety. Given the way the shift patterns in the compiler are written today, quite possibly. Though in the general case of a non-constant shift the optimizer probably wouldn't be able to safely make any assumptions that would break things. I suspect that the shift patterns should really be changed to make the shift be by a QImode value; this would then correctly describe the number of bits in the register that are really involved in the shift. Further, we could then say that, for core registers, the full value in that QI register was used to determine the shift. It would be quite a lot of churn to fix this though. None of this directly helps with your neon usage, but it does show that we really don't need to clobber the condition code register to get an efficient sequence. Except that it doesn't in the case of a shift by one where there is a two-instruction sequence that clobbers CC. Presumably this special case can be treated differently though, right from expand. All of the sequences above can be simplified significantly if the shift amount is constant and I think then, that with the exception of the special case you mention (which is only for shift right by 1) you never need the condition codes and you never need more than 3 ARM instructions: shifts 32 LSL AH, AH, #n ORR AH, AH, AL, LSR #(32 - n) LSL AL, AL, #n shifts = 32 LSL AH, AL, #(n - 32) MOV AL, #0 In fact both of the above sequences are equally good for Thumb2. If we lost the RRX tweak it wouldn't be a major loss (we could even put it back as a peephole2 to handle the common case where the condition code registers were known to be dead). R.
Re: [PATCH] Fix PR tree-optimization/51315
You are basically (but non-optimally) locally re-implementing what expr.c:get_object_or_type_alignment does, for the bare MEM_REF case (you don't consider offsets that do not change the alignment, nor alignment information present on the SSA name). Adjusted version attached, regression-tested on SPARC/Solaris. I'll do a full testing cycle if it is OK for mainline. get_object_or_type_alignment doesn't exist on the 4.6 branch, the expanders for MEM_REF and TARGET_MEM_REF do: MAX (TYPE_ALIGN (TREE_TYPE (exp)), get_object_alignment (exp, BIGGEST_ALIGNMENT))) instead, so I presume I should do the same for them in tree_non_aligned_mem_p? 2011-12-07 Eric Botcazou ebotca...@adacore.com PR tree-optimization/51315 * tree.h (get_object_or_type_alignment): Declare. * expr.c (get_object_or_type_alignment): Move to... * builtins.c (get_object_or_type_alignment): ...here. Add assertion. * tree-sra.c (tree_non_mode_aligned_mem_p): Rename to... (tree_non_aligned_mem_p): ...this. Add ALIGN parameter. Look into MEM_REFs and use get_object_or_type_alignment for them. (build_accesses_from_assign): Adjust for above change. (access_precludes_ipa_sra_p): Likewise. -- Eric Botcazou Index: tree.h === --- tree.h (revision 181993) +++ tree.h (working copy) @@ -5463,6 +5463,7 @@ extern rtx builtin_memset_read_str (void extern bool is_builtin_fn (tree); extern unsigned int get_object_alignment_1 (tree, unsigned HOST_WIDE_INT *); extern unsigned int get_object_alignment (tree); +extern unsigned int get_object_or_type_alignment (tree); extern unsigned int get_pointer_alignment (tree); extern tree fold_call_stmt (gimple, bool); extern tree gimple_fold_builtin_snprintf_chk (gimple, tree, enum built_in_function); Index: builtins.c === --- builtins.c (revision 181993) +++ builtins.c (working copy) @@ -452,6 +452,31 @@ get_object_alignment (tree exp) return align; } +/* Return the alignment of object EXP, also considering its type when we do + not know of explicit misalignment. Only handle MEM_REF and TARGET_MEM_REF. + + ??? Note that, in the general case, the type of an expression is not kept + consistent with misalignment information by the front-end, for example when + taking the address of a member of a packed structure. However, in most of + the cases, expressions have the alignment of their type so we optimistically + fall back to this alignment when we cannot compute a misalignment. */ + +unsigned int +get_object_or_type_alignment (tree exp) +{ + unsigned HOST_WIDE_INT misalign; + unsigned int align = get_object_alignment_1 (exp, misalign); + + gcc_assert (TREE_CODE (exp) == MEM_REF || TREE_CODE (exp) == TARGET_MEM_REF); + + if (misalign != 0) +align = (misalign -misalign); + else +align = MAX (TYPE_ALIGN (TREE_TYPE (exp)), align); + + return align; +} + /* Return the alignment in bits of EXP, a pointer valued expression. The alignment returned is, by default, the alignment of the thing that EXP points to. If it is not a POINTER_TYPE, 0 is returned. Index: expr.c === --- expr.c (revision 181993) +++ expr.c (working copy) @@ -4544,27 +4544,6 @@ get_bit_range (unsigned HOST_WIDE_INT *b } } -/* Return the alignment of the object EXP, also considering its type - when we do not know of explicit misalignment. - ??? Note that, in the general case, the type of an expression is not kept - consistent with misalignment information by the front-end, for - example when taking the address of a member of a packed structure. - However, in most of the cases, expressions have the alignment of - their type, so we optimistically fall back to the alignment of the - type when we cannot compute a misalignment. */ - -static unsigned int -get_object_or_type_alignment (tree exp) -{ - unsigned HOST_WIDE_INT misalign; - unsigned int align = get_object_alignment_1 (exp, misalign); - if (misalign != 0) -align = (misalign -misalign); - else -align = MAX (TYPE_ALIGN (TREE_TYPE (exp)), align); - return align; -} - /* Expand an assignment that stores the value of FROM into TO. If NONTEMPORAL is true, try generating a nontemporal store. */ Index: tree-sra.c === --- tree-sra.c (revision 181993) +++ tree-sra.c (working copy) @@ -1067,26 +1067,29 @@ disqualify_ops_if_throwing_stmt (gimple return false; } -/* Return true iff type of EXP is not sufficiently aligned. */ +/* Return true if EXP is a memory reference less aligned than ALIGN. This is + invoked only on strict-alignment targets. */ static bool -tree_non_mode_aligned_mem_p (tree exp) +tree_non_aligned_mem_p (tree exp, unsigned int align) { - enum
Re: [SMS] Support new loop pattern
2011/10/12 Ayal Zaks ayal.z...@gmail.com: - the last jump instruction should look like: pc=(regF!=0)?label:pc, regF is you'd probably want to bump to next instruction if falling through, e.g., pc=(regF!=0)?label:pc+4 It is considered that program counter is increased automatically onhardware level.Otherwise we should add something like pc=pc+4 in parallel to eachinstruction in RTL. flag register; - the last instruction which sets regF should be: regF=COMPARE(regC,X), where X is a constant, or maybe a register, which is not changed inside a loop; - only one instruction modifies regC inside a loop (other can use regC, but not write), and it should simply adjust it by a constant: regC=regC+step, where step is a constant. When doloop is succesfully scheduled by SMS, its number of iterations of loop kernel should be decreased by the number of stages in a schedule minus one, while other iterations expand to prologue and epilogue. In new supported loops such approach can't be used, because some instructions can use count register (regC). Instead of this, the final register value X in compare instruction regF=COMPARE(regC,X) is changed to another value Y respective to the stage this instruction is scheduled (Y = X - stage * step). making sure this does not underflow; i.e., that the number of iterations is no less than stage (you've addressed this towards the end below). Yes, this situation is processed correctly. The main difference from doloop case is that regC can be used by some instructions in loop body. That's why we are unable to simply adjust regC initial value, but have to keep it's value correct on each particular iteration. So, we change comparison instruction accordingly. An example: int a[100]; int main() { int i; for (i = 85; i 12; i -= 5) a[i] = i * i; return a[15]-225; } ARM assembler with -O2 -fno-auto-inc-dec: ldr r0, .L5 mov r3, #85 mov r2, r0 .L2: mul r1, r3, r3 sub r3, r3, #5 cmp r3, #10 str r1, [r2, #340] sub r2, r2, #20 bne .L2 ldr r0, [r0, #60] sub r0, r0, #225 bx lr .L5: .word a Loop body is executed 15 times. When compiling with SMS, it finds a schedule with ii=7, stage_count=3 and following times: Stage Time Insn 0 5 mul r1, r3, r3 1 10 sub r3, r3, #5 1 11 cmp r3, #10 1 11 str r1, [r2, #340] 1 13 bne .L2 2 16 sub r2, r2, #20 branch is not scheduled last? Yes, branch schedule time is smaller then sub's one.This mean that sub r2, r2, $20 instruction from original iterationnumber K will be executed afterbne .L2 from original iteration number K.But certainly bne remains to be the last instuction in new loop body.Below you can see how it looks after SMS. To make new schedule correct the loop body should be executed 14 times and we change compare instruction: the loop itself should execute 13 times. with i =85, 80, 75, 70, 6560, 55, 50, 45, 4035, 30, 25, 20, 15this gives total 15 iterations (15 stores to memory).And new loop body will be executed 13 times (one store goes toepilogue and one - to prologue). regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step. In our example regC is r3, X is 10, step = -5, compare instruction is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15. right. In general, if the compare is on stage s (starting from 0), it will be executed s times in the epilog, so it should exit the loop upon reaching Y = X - s * step. So, after SMS it looks like: ldr r0, .L5 mov r3, #85 mov r2, r0 ;;prologue mul r1, r3, r3 ;;from stage 0 first iteration sub r3, r3, #5 ;;3 insns from stage 1 first iteration cmp r3, #10 str r1, [r2, #340] mul r1, r3, r3 ;;from stage 0 second iteration ;;body .L2: sub r3, r3, #5 sub r2, r2, #20 cmp r3, #15 ;; new value to compare with is Y=15 str r1, [r2, #340] mul r1, r3, r3 bne .L2 ;;epilogue sub r2, r2, #20 ;;from stage 2 pre-last iteration sub r3, r3, #5 ;;3 insns from stage 1 last iteration cmp r3, #10 str r1, [r2, #340] sub r2, r2, #20 ;;from stage 2 last iteration ldr r0, [r0, #60] sub r0, r0, #225 bx lr .L5: .word a Here in comments I mention why insn was copied to prolog and epilog.Only branch is not copied at all. Testing of this appoach reveals two bugs, which do not appear while SMS was used only for doloop loops. Both these bugs happen due to the nature of the flag register. On x86_64 it is clobbered by most of arithmetic instructions. This should ideally be
Re: RFC: ARM 64-bit shifts in NEON
On Wed 07 Dec 2011 14:20:43 GMT, Richard Earnshaw wrote: Would it not require an unspec to prevent 'clever things' happening to the negative shift, if we were to encode these in the machine description? I'm not too clear on what these 'clever things' might be in the case of shift-by-register (presumably value-range propagation is one), but I know the NEON shifts are encoded this way for safety. Given the way the shift patterns in the compiler are written today, quite possibly. Though in the general case of a non-constant shift the optimizer probably wouldn't be able to safely make any assumptions that would break things. I've noticed that the right-shift NEON insns have an EQUIV entry in the dumps, so I'm suspicious that even then we're not totally safe from optimization. I suspect that the shift patterns should really be changed to make the shift be by a QImode value; this would then correctly describe the number of bits in the register that are really involved in the shift. Further, we could then say that, for core registers, the full value in that QI register was used to determine the shift. It would be quite a lot of churn to fix this though. Yeah, I considered this, but I couldn't figure out how to make the details work, and anyway, I've seen the compiler trying to truncate things (unnecessarily) for that sort of thing, so I haven't actually tried it. Maybe I'll have a go and see what happens. I suspect it's need extra patterns to combine the truncate seamlessly, and allow actual QImode input also? None of this directly helps with your neon usage, but it does show that we really don't need to clobber the condition code register to get an efficient sequence. Except that it doesn't in the case of a shift by one where there is a two-instruction sequence that clobbers CC. Presumably this special case can be treated differently though, right from expand. All of the sequences above can be simplified significantly if the shift amount is constant and I think then, that with the exception of the special case you mention (which is only for shift right by 1) you never need the condition codes and you never need more than 3 ARM instructions: Actually, there are 1bit patterns for all the shift types. shifts 32 LSL AH, AH, #n ORR AH, AH, AL, LSR #(32 - n) LSL AL, AL, #n shifts= 32 LSL AH, AL, #(n - 32) MOV AL, #0 In fact both of the above sequences are equally good for Thumb2. If we lost the RRX tweak it wouldn't be a major loss (we could even put it back as a peephole2 to handle the common case where the condition code registers were known to be dead). Yes, these are what the compiler currently generates. With my patch, *sh*di3 never fails to expand (not if TARGET_NEON is true, anyway), so the compiler doesn't do it automatically any more, so I have added splitters to do it manually. Andrew
Re: [SMS] Support new loop pattern
Apologies for the messed up previous e-mail. 2011/10/12 Ayal Zaks ayal.z...@gmail.com: - the last jump instruction should look like: pc=(regF!=0)?label:pc, regF is you'd probably want to bump to next instruction if falling through, e.g., pc=(regF!=0)?label:pc+4 It is considered that program counter is increased automatically on hardware level. Otherwise we should add something like pc=pc+4 in parallel to each instruction in RTL. flag register; - the last instruction which sets regF should be: regF=COMPARE(regC,X), where X is a constant, or maybe a register, which is not changed inside a loop; - only one instruction modifies regC inside a loop (other can use regC, but not write), and it should simply adjust it by a constant: regC=regC+step, where step is a constant. When doloop is succesfully scheduled by SMS, its number of iterations of loop kernel should be decreased by the number of stages in a schedule minus one, while other iterations expand to prologue and epilogue. In new supported loops such approach can't be used, because some instructions can use count register (regC). Instead of this, the final register value X in compare instruction regF=COMPARE(regC,X) is changed to another value Y respective to the stage this instruction is scheduled (Y = X - stage * step). making sure this does not underflow; i.e., that the number of iterations is no less than stage (you've addressed this towards the end below). Yes, this situation is processed correctly. The main difference from doloop case is that regC can be used by some instructions in loop body. That's why we are unable to simply adjust regC initial value, but have to keep it's value correct on each particular iteration. So, we change comparison instruction accordingly. An example: int a[100]; int main() { int i; for (i = 85; i 12; i -= 5) a[i] = i * i; return a[15]-225; } ARM assembler with -O2 -fno-auto-inc-dec: ldr r0, .L5 mov r3, #85 mov r2, r0 .L2: mul r1, r3, r3 sub r3, r3, #5 cmp r3, #10 str r1, [r2, #340] sub r2, r2, #20 bne .L2 ldr r0, [r0, #60] sub r0, r0, #225 bx lr .L5: .word a Loop body is executed 15 times. When compiling with SMS, it finds a schedule with ii=7, stage_count=3 and following times: Stage Time Insn 0 5 mul r1, r3, r3 1 10 sub r3, r3, #5 1 11 cmp r3, #10 1 11 str r1, [r2, #340] 1 13 bne .L2 2 16 sub r2, r2, #20 branch is not scheduled last? Yes, branch schedule time is smaller then sub's one. This mean that sub r2, r2, $20 instruction from original iteration number K will be executed after bne .L2 from original iteration number K. But certainly bne remains to be the last instuction in new loop body. Below you can see how it looks after SMS. To make new schedule correct the loop body should be executed 14 times and we change compare instruction: the loop itself should execute 13 times. with i = 85, 80, 75, 70, 65 60, 55, 50, 45, 40 35, 30, 25, 20, 15 this gives total 15 iterations (15 stores to memory). And new loop body will be executed 13 times (one store goes to epilogue and one - to prologue). regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step. In our example regC is r3, X is 10, step = -5, compare instruction is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15. right. In general, if the compare is on stage s (starting from 0), it will be executed s times in the epilog, so it should exit the loop upon reaching Y = X - s * step. So, after SMS it looks like: ldr r0, .L5 mov r3, #85 mov r2, r0 ;;prologue mul r1, r3, r3 ;;from stage 0 first iteration sub r3, r3, #5 ;;3 insns from stage 1 first iteration cmp r3, #10 str r1, [r2, #340] mul r1, r3, r3 ;;from stage 0 second iteration ;;body .L2: sub r3, r3, #5 sub r2, r2, #20 cmp r3, #15 ;; new value to compare with is Y=15 str r1, [r2, #340] mul r1, r3, r3 bne .L2 ;;epilogue sub r2, r2, #20 ;;from stage 2 pre-last iteration sub r3, r3, #5 ;;3 insns from stage 1 last iteration cmp r3, #10 str r1, [r2, #340] sub r2, r2, #20 ;;from stage 2 last iteration ldr r0, [r0, #60] sub r0, r0, #225 bx lr .L5: .word a Here in comments I mention why insn was copied to prolog and epilog. Only branch is not copied at all. Testing of this appoach reveals two bugs, which do not appear while SMS was used only for doloop loops. Both these bugs happen due to the nature of the flag register. On
[SMS, DDG] Additional edges to instructions with clobber
This letter contains the first separate patch for ddg.cIt creates additional nessesary anti-dep edges in data dependency graph. 2011/10/12 Ayal Zaks ayal.z...@gmail.com: The following situation happens when SMS is enabled without register renaming (-fno-modulo-sched-allow-regmoves). When data dependency graph is built, there is a step when we generate anti-dependencies from last register use to first write of this register at the next iteration. is a step when we generate anti-dependencies from all last registers uses (i.e., of last register def) to first write of this register at the next iteration. Right. At this moment we should also create such dependencies to all instructions which clobber the register to prevent this clobbers being before last use is new schedule. well, we simply need to connect these last uses to either the first write *or* the first clobber of this register at the next iteration, according to whichever is first, no? No, is works now just how you describe. Clobber is considered as a write,and last uses are connected with first write or first clobber.But this is not sufficient: similarly to how there's no dependenciesbetween last uses,there shall be no dependency between first clobbers (i.e. clobbersof a registercan be permuted freely). So at least we have to make an additional dependencyedge to each clobber before first write. Here is an model of example: loop { set1 regR use1 regR clobber regR set2 regR use2 regR } If we create only use2-set1 anti-dependency (and no use2-cloober) the following wrong schedule is possible: prologue { set1 regR use1 regR clobber regR } kernel { set2 regR clobber regR (instruction from next iteration in terms of old loop kernel) use2 regR set1 regR (also from next iteration) use1 regR (also from next iteration) } epilogue { set2 regR use2 regR } strange; according to prolog (and epilog), clobber should appear after use1 in the kernel, no? Aren't there (intra-iteration) dependencies preventing the clobber to skip over use1 and/or set1? Yeah, this is bad example - I wrongly suggested that there is noanti-dep between use1 and clobber. New example:loop { clobber1 clobber2 set use}When there is no use-clobber2 anti-dep - the following wrong scheduleis possible. Clobber2 is on stage0, other instructions are on stage1.prologue { clobber2}kernel { clobber1 set clobber2 use}epilogue { clobber1 set use} This problem was succesfully fixed by creating a vector of all clobbering instructions together with first write and adding all needed dependencies. seems like an overkill to me; we didn't draw an edge between every last use and every write, because writes are kept in order by having output dependences between them. So should be the case with clobbers. Clobbers themselves aren't kept in order - there are no output dependencesbetween them. They may be scheduled in any order. Presumably, the ddg already has all intra-iteration edges preventing motion of clobbering instructions within an iteration, and we are only missing inter-iteration deps or deps surfaced by eliminating anti-deps, right? I think the new version of a fix suits this reasoning.Now it creates dependencies only to clobber instructions before first write.And certainly to first write instruction itself. --Roman zhuykovzhr...@ispras.ru 2011-12-07 Roman Zhuykov zhr...@ispras.ru * ddg.c: New VEC. (add_cross_iteration_register_deps): Store information about clobbers before first write to a register. Use collected information to create anti-dependencies from last uses. --- diff --git a/gcc/ddg.c b/gcc/ddg.c index 2b1cfe8..e3e065c 100644 --- a/gcc/ddg.c +++ b/gcc/ddg.c @@ -49,6 +49,10 @@ along with GCC; see the file COPYING3. If not see /* A flag indicating that a ddg edge belongs to an SCC or not. */ enum edge_flag {NOT_IN_SCC = 0, IN_SCC}; +/* A vector of dependencies needed while processing clobbers. */ +DEF_VEC_P(df_ref); +DEF_VEC_ALLOC_P(df_ref,heap); + /* Forward declarations. */ static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr); static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr); @@ -273,24 +277,52 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to, static void add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def) { - int regno = DF_REF_REGNO (last_def); + unsigned int regno = DF_REF_REGNO (last_def); struct df_link *r_use; int has_use_in_bb_p = false; - rtx def_insn = DF_REF_INSN (last_def); + rtx insn, def_insn = DF_REF_INSN (last_def); ddg_node_ptr last_def_node = get_node_of_insn (g, def_insn); ddg_node_ptr use_node; #ifdef ENABLE_CHECKING struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb); #endif - df_ref first_def = df_bb_regno_first_def_find (g-bb, regno); + static VEC(df_ref,heap) *clobbers; + df_ref first_write_def = NULL; + df_ref *def_rec; + unsigned int uid; + + clobbers = VEC_alloc (df_ref, heap, 0); + + FOR_BB_INSNS (g-bb, insn) +{ +
[PATCH][LTO] Fix PR48437
I'm testing the following patch for PR48437 - the C frontend generates decl copies for placing them in BLOCK_VARS for int test (void) { extern int f (void); return 0; } we currently end up putting those into the decl merging machinery, causing DECL_CHAIN re-use and subsequent crashes. By not indexing them we make sure to keep them separate. LTO bootstrap regtest pending on x86_64-unknown-linux-gnu, ok? [Similar fix has to be employed for VLA types] Thanks, Richard. 2011-12-07 Richard Guenther rguent...@suse.de PR lto/48437 * lto-streamer-out.c (tree_is_indexable): Exclude block-local extern declarations. * gcc.dg/lto/20111207-2_0.c: New testcase. * gcc.dg/guality/pr48437.c: Likewise. Index: gcc/lto-streamer-out.c === *** gcc/lto-streamer-out.c (revision 182081) --- gcc/lto-streamer-out.c (working copy) *** tree_is_indexable (tree t) *** 129,134 --- 129,144 else if (TREE_CODE (t) == VAR_DECL decl_function_context (t) !TREE_STATIC (t)) return false; + /* If this is a decl generated for block local externs for + debug info generation, stream it unshared alongside BLOCK_VARS. */ + else if (VAR_OR_FUNCTION_DECL_P (t) + /* ??? The following tests are a literal match on what + c-decl.c:pop_scope does. */ + TREE_PUBLIC (t) + DECL_EXTERNAL (t) + DECL_CONTEXT (t) + TREE_CODE (DECL_CONTEXT (t)) == FUNCTION_DECL) + return false; else return (TYPE_P (t) || DECL_P (t) || TREE_CODE (t) == SSA_NAME); } Index: gcc/testsuite/gcc.dg/lto/20111207-2_0.c === *** gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0) --- gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0) *** *** 0 --- 1,17 + /* { dg-lto-do run } */ + + int + test (void) + { + int f (void); + return 0; + } + + int + main (void) + { + int f (void); + int test (void); + + return test (); + } Index: gcc/testsuite/gcc.dg/guality/pr48437.c === *** gcc/testsuite/gcc.dg/guality/pr48437.c (revision 0) --- gcc/testsuite/gcc.dg/guality/pr48437.c (revision 0) *** *** 0 --- 1,15 + /* PR lto/48437 */ + /* { dg-do run } */ + /* { dg-options -g } */ + + int i __attribute__((used)); + int main() + { + volatile int i; + for (i = 3; i 7; ++i) + { + extern int i; /* { dg-final { gdb-test 7 i 0 } } */ + asm volatile ( : : : memory); + } + return 0; + }
Re: [PATCH] Fix PR tree-optimization/51315
On Wed, Dec 7, 2011 at 3:32 PM, Eric Botcazou ebotca...@adacore.com wrote: You are basically (but non-optimally) locally re-implementing what expr.c:get_object_or_type_alignment does, for the bare MEM_REF case (you don't consider offsets that do not change the alignment, nor alignment information present on the SSA name). Adjusted version attached, regression-tested on SPARC/Solaris. I'll do a full testing cycle if it is OK for mainline. Yes, the patch is ok for mainline. get_object_or_type_alignment doesn't exist on the 4.6 branch, the expanders for MEM_REF and TARGET_MEM_REF do: MAX (TYPE_ALIGN (TREE_TYPE (exp)), get_object_alignment (exp, BIGGEST_ALIGNMENT))) instead, so I presume I should do the same for them in tree_non_aligned_mem_p? Yeah, you can do that. Thanks, Richard. 2011-12-07 Eric Botcazou ebotca...@adacore.com PR tree-optimization/51315 * tree.h (get_object_or_type_alignment): Declare. * expr.c (get_object_or_type_alignment): Move to... * builtins.c (get_object_or_type_alignment): ...here. Add assertion. * tree-sra.c (tree_non_mode_aligned_mem_p): Rename to... (tree_non_aligned_mem_p): ...this. Add ALIGN parameter. Look into MEM_REFs and use get_object_or_type_alignment for them. (build_accesses_from_assign): Adjust for above change. (access_precludes_ipa_sra_p): Likewise. -- Eric Botcazou
[SMS, DDG] Additional edges to instructions with clobber
Apologies for the messed up previous e-mail. This letter contains the first separate patch for ddg.cIt creates additional nessesary anti-dep edges in data dependency graph. 2011/10/12 Ayal Zaks ayal.z...@gmail.com: The following situation happens when SMS is enabled without register renaming (-fno-modulo-sched-allow-regmoves). When data dependency graph is built, there is a step when we generate anti-dependencies from last register use to first write of this register at the next iteration. is a step when we generate anti-dependencies from all last registers uses (i.e., of last register def) to first write of this register at the next iteration. Right. At this moment we should also create such dependencies to all instructions which clobber the register to prevent this clobbers being before last use is new schedule. well, we simply need to connect these last uses to either the first write *or* the first clobber of this register at the next iteration, according to whichever is first, no? No, is works now just how you describe. Clobber is considered as a write,and last uses are connected with first write or first clobber.But this is not sufficient: similarly to how there's no dependenciesbetween last uses,there shall be no dependency between first clobbers (i.e. clobbersof a registercan be permuted freely). So at least we have to make an additional dependencyedge to each clobber before first write. Here is an model of example: loop { set1 regR use1 regR clobber regR set2 regR use2 regR } If we create only use2-set1 anti-dependency (and no use2-cloober) the following wrong schedule is possible: prologue { set1 regR use1 regR clobber regR } kernel { set2 regR clobber regR (instruction from next iteration in terms of old loop kernel) use2 regR set1 regR (also from next iteration) use1 regR (also from next iteration) } epilogue { set2 regR use2 regR } strange; according to prolog (and epilog), clobber should appear after use1 in the kernel, no? Aren't there (intra-iteration) dependencies preventing the clobber to skip over use1 and/or set1? Yeah, this is bad example - I wrongly suggested that there is noanti-dep between use1 and clobber. New example:loop { clobber1 clobber2 set use}When there is no use-clobber2 anti-dep - the following wrong scheduleis possible. Clobber2 is on stage0, other instructions are on stage1.prologue { clobber2}kernel { clobber1 set clobber2 use}epilogue { clobber1 set use} This problem was succesfully fixed by creating a vector of all clobbering instructions together with first write and adding all needed dependencies. seems like an overkill to me; we didn't draw an edge between every last use and every write, because writes are kept in order by having output dependences between them. So should be the case with clobbers. Clobbers themselves aren't kept in order - there are no output dependencesbetween them. They may be scheduled in any order. Presumably, the ddg already has all intra-iteration edges preventing motion of clobbering instructions within an iteration, and we are only missing inter-iteration deps or deps surfaced by eliminating anti-deps, right? I think the new version of a fix suits this reasoning.Now it creates dependencies only to clobber instructions before first write.And certainly to first write instruction itself. --Roman zhuykovzhr...@ispras.ru 2011-12-07 Roman Zhuykov zhr...@ispras.ru * ddg.c: New VEC. (add_cross_iteration_register_deps): Store information about clobbers before first write to a register. Use collected information to create anti-dependencies from last uses. --- diff --git a/gcc/ddg.c b/gcc/ddg.c index 2b1cfe8..e3e065c 100644 --- a/gcc/ddg.c +++ b/gcc/ddg.c @@ -49,6 +49,10 @@ along with GCC; see the file COPYING3. If not see /* A flag indicating that a ddg edge belongs to an SCC or not. */ enum edge_flag {NOT_IN_SCC = 0, IN_SCC}; +/* A vector of dependencies needed while processing clobbers. */ +DEF_VEC_P(df_ref); +DEF_VEC_ALLOC_P(df_ref,heap); + /* Forward declarations. */ static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr); static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr); @@ -273,24 +277,52 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to, static void add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def) { - int regno = DF_REF_REGNO (last_def); + unsigned int regno = DF_REF_REGNO (last_def); struct df_link *r_use; int has_use_in_bb_p = false; - rtx def_insn = DF_REF_INSN (last_def); + rtx insn, def_insn = DF_REF_INSN (last_def); ddg_node_ptr last_def_node = get_node_of_insn (g, def_insn); ddg_node_ptr use_node; #ifdef ENABLE_CHECKING struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb); #endif - df_ref first_def = df_bb_regno_first_def_find (g-bb, regno); + static VEC(df_ref,heap) *clobbers; + df_ref first_write_def = NULL; + df_ref *def_rec; + unsigned int uid; + + clobbers = VEC_alloc (df_ref, heap, 0);
Re: [C++ PATCH] Fix ICE in build_value_init (PR c++/51369)
On 12/06/2011 02:48 PM, Jakub Jelinek wrote: - gcc_assert (!processing_template_decl || SCALAR_TYPE_P (type)); + gcc_assert (!processing_template_decl + || (SCALAR_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE)); How about SCALAR_TYPE_P (strip_array_types (type))? OK with that change. Jason
Re: [PATCH][LTO] Fix PR48437
On Wed, Dec 07, 2011 at 03:52:35PM +0100, Richard Guenther wrote: *** gcc/testsuite/gcc.dg/guality/pr48437.c(revision 0) --- gcc/testsuite/gcc.dg/guality/pr48437.c(revision 0) *** *** 0 --- 1,15 + /* PR lto/48437 */ + /* { dg-do run } */ + /* { dg-options -g } */ + + int i __attribute__((used)); + int main() + { + volatile int i; + for (i = 3; i 7; ++i) + { + extern int i; /* { dg-final { gdb-test 7 i 0 } } */ + asm volatile ( : : : memory); Do you really want to test it on line 7 ({ of after main())? I'd expect you want to look at it on the line where asm volatile is, and make sure you can put a breakpoint on it: #include ../nop.h early and use NOP instead of as the asm pattern. Jakub
Re: [PATCH][LTO] Fix PR48437
On Wed, 7 Dec 2011, Jakub Jelinek wrote: On Wed, Dec 07, 2011 at 03:52:35PM +0100, Richard Guenther wrote: *** gcc/testsuite/gcc.dg/guality/pr48437.c (revision 0) --- gcc/testsuite/gcc.dg/guality/pr48437.c (revision 0) *** *** 0 --- 1,15 + /* PR lto/48437 */ + /* { dg-do run } */ + /* { dg-options -g } */ + + int i __attribute__((used)); + int main() + { + volatile int i; + for (i = 3; i 7; ++i) + { + extern int i; /* { dg-final { gdb-test 7 i 0 } } */ + asm volatile ( : : : memory); Do you really want to test it on line 7 ({ of after main())? Oops - so much for adding the /* dg */ stuff late ... I'd expect you want to look at it on the line where asm volatile is, and make sure you can put a breakpoint on it: #include ../nop.h early and use NOP instead of as the asm pattern. Changed. Thanks, Richard.
Re: [C++ PATCH] Fix ICE in build_value_init (PR c++/51369)
On Wed, Dec 07, 2011 at 10:10:08AM -0500, Jason Merrill wrote: On 12/06/2011 02:48 PM, Jakub Jelinek wrote: - gcc_assert (!processing_template_decl || SCALAR_TYPE_P (type)); + gcc_assert (!processing_template_decl + || (SCALAR_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE)); How about SCALAR_TYPE_P (strip_array_types (type))? OK with that change. That would be readable, but expensive - SCALAR_TYPE_P evaluates its argument 11 times. Jakub
Re: [C++ PATCH] Fix ICE in build_value_init (PR c++/51369)
On 12/07/2011 10:18 AM, Jakub Jelinek wrote: On Wed, Dec 07, 2011 at 10:10:08AM -0500, Jason Merrill wrote: On 12/06/2011 02:48 PM, Jakub Jelinek wrote: - gcc_assert (!processing_template_decl || SCALAR_TYPE_P (type)); + gcc_assert (!processing_template_decl + || (SCALAR_TYPE_P (type) || TREE_CODE (type) == ARRAY_TYPE)); How about SCALAR_TYPE_P (strip_array_types (type))? OK with that change. That would be readable, but expensive - SCALAR_TYPE_P evaluates its argument 11 times. Good point. Your patch is OK, then. Jasno
[SMS, DDG] Additional edges to instructions with clobber
Apologies for the messed up previous e-mails. This letter contains the first separate patch for ddg.c It creates additional nessesary anti-dep edges in data dependency graph. 2011/10/12 Ayal Zaks ayal.z...@gmail.com: The following situation happens when SMS is enabled without register renaming (-fno-modulo-sched-allow-regmoves). When data dependency graph is built, there is a step when we generate anti-dependencies from last register use to first write of this register at the next iteration. is a step when we generate anti-dependencies from all last registers uses (i.e., of last register def) to first write of this register at the next iteration. Right. At this moment we should also create such dependencies to all instructions which clobber the register to prevent this clobbers being before last use is new schedule. well, we simply need to connect these last uses to either the first write *or* the first clobber of this register at the next iteration, according to whichever is first, no? No, is works now just how you describe. Clobber is considered as a write, and last uses are connected with first write or first clobber. But this is not sufficient: similarly to how there's no dependencies between last uses, there shall be no dependency between first clobbers (i.e. clobbers of a register can be permuted freely). So at least we have to make an additional dependency edge to each clobber before first write. Here is an model of example: loop { set1 regR use1 regR clobber regR set2 regR use2 regR } If we create only use2-set1 anti-dependency (and no use2-cloober) the following wrong schedule is possible: prologue { set1 regR use1 regR clobber regR } kernel { set2 regR clobber regR (instruction from next iteration in terms of old loop kernel) use2 regR set1 regR (also from next iteration) use1 regR (also from next iteration) } epilogue { set2 regR use2 regR } strange; according to prolog (and epilog), clobber should appear after use1 in the kernel, no? Aren't there (intra-iteration) dependencies preventing the clobber to skip over use1 and/or set1? Yeah, this is bad example - I wrongly suggested that there is no anti-dep between use1 and clobber. New example: loop { clobber1 clobber2 set use } When there is no use-clobber2 anti-dep - the following wrong schedule is possible. Clobber2 is on stage0, other instructions are on stage1. prologue { clobber2 } kernel { clobber1 set clobber2 use } epilogue { clobber1 set use } This problem was succesfully fixed by creating a vector of all clobbering instructions together with first write and adding all needed dependencies. seems like an overkill to me; we didn't draw an edge between every last use and every write, because writes are kept in order by having output dependences between them. So should be the case with clobbers. Clobbers themselves aren't kept in order - there are no output dependences between them. They may be scheduled in any order. Presumably, the ddg already has all intra-iteration edges preventing motion of clobbering instructions within an iteration, and we are only missing inter-iteration deps or deps surfaced by eliminating anti-deps, right? I think the new version of a fix suits this reasoning. Now it creates dependencies only to clobber instructions before first write. And certainly to first write instruction itself. -- Roman Zhuykov zhr...@ispras.ru 2011-12-07 Roman Zhuykov zhr...@ispras.ru * ddg.c: New VEC. (add_cross_iteration_register_deps): Store information about clobbers before first write to a register. Use collected information to create anti-dependencies from last uses. --- diff --git a/gcc/ddg.c b/gcc/ddg.c index 2b1cfe8..e3e065c 100644 --- a/gcc/ddg.c +++ b/gcc/ddg.c @@ -49,6 +49,10 @@ along with GCC; see the file COPYING3. If not see /* A flag indicating that a ddg edge belongs to an SCC or not. */ enum edge_flag {NOT_IN_SCC = 0, IN_SCC}; +/* A vector of dependencies needed while processing clobbers. */ +DEF_VEC_P(df_ref); +DEF_VEC_ALLOC_P(df_ref,heap); + /* Forward declarations. */ static void add_backarc_to_ddg (ddg_ptr, ddg_edge_ptr); static void add_backarc_to_scc (ddg_scc_ptr, ddg_edge_ptr); @@ -273,24 +277,52 @@ create_ddg_dep_no_link (ddg_ptr g, ddg_node_ptr from, ddg_node_ptr to, static void add_cross_iteration_register_deps (ddg_ptr g, df_ref last_def) { - int regno = DF_REF_REGNO (last_def); + unsigned int regno = DF_REF_REGNO (last_def); struct df_link *r_use; int has_use_in_bb_p = false; - rtx def_insn = DF_REF_INSN (last_def); + rtx insn, def_insn = DF_REF_INSN (last_def); ddg_node_ptr last_def_node = get_node_of_insn (g, def_insn); ddg_node_ptr use_node; #ifdef ENABLE_CHECKING struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb); #endif - df_ref first_def = df_bb_regno_first_def_find (g-bb, regno); + static VEC(df_ref,heap) *clobbers; + df_ref first_write_def = NULL; + df_ref
[SMS, DDG] Correctly delete anti-dep edges for renaming
This letter contains second separate patch for ddg.c It prevents removing some edges in data dependency graph when renaming is allowed. 2011/10/12 Ayal Zaks ayal.z...@gmail.com: The other bug happens only with -fmodulo-sched-allow-regmoves. Here we eliminate some anti-dependence edges in data dependency graph in order to resolve them later by adding some register moves (renaming instructions). But in situation as in example above compiler gives an ICE because it can't create a register move, when regR is hardware flag register. So we have to know which register(s) cause anti-dependency in order to understand whether we can ignore it. I can't find any easy way to gather this information, so I create my own structures to store this info and had implemented my own hooks for sched_analyze function. This leads to more complex interconnection between ddg.c and modulo-sched.c. Well, having ddg.c's/create_ddg_dep_from_intra_loop_link() consult Register sets from modulo scheduler structures to determine if an anti-dep can be eliminated, seems awkward. One should be able to build a ddg independent of any modulo scheduler structures. I think now I found a proper way to gather information about the register(s) causing anti-dependency (see the patch attached). One simple solution is to keep all anti-dep edges of registers that are clobbered anywhere in the loop. This might be overly conservative, but perhaps not so if On x86_64 it is clobbered by most of arithmetic instructions. If an anti-dep between a use and a dep of a clobber register is to be removed, a dependence should be created between the use and a clobbering instruction following the def. Hope this is clear. This too should be solved by a dedicated (separate) patch, for easier digestion. Presumably, the ddg already has all intra-iteration edges preventing motion of clobbering instructions within an iteration, and we are only missing inter-iteration deps or deps surfaced by eliminating anti-deps, right? You might consider introducing a new type of dependence for such edges, Clobber, if it would help. Ayal, I suppose here we have some misunderstanding. This is not directly related to the problem of clobbers discussed previously. Here I talk about the need to find out the register causing the anti-dependence (for instance, we need to check whether it can be renamed). Currently, SMS simply attempts to take the RHS of the second instruction (via single_set()), and if it's a register, SMS assumes it's a register causing the dependency. This breaks in a following scenario: insn1: ... (read flags) insn2: regA = regB - regC; update flags Here, SMS would wrongly take regA as the dependency register, while it was in fact the flags register. So, I rewrote this part in a different way and make a separate patch. -- Roman Zhuykov zhr...@ispras.ru 2011-12-07 Roman Zhuykov zhr...@ispras.ru * ddg.c (create_ddg_dep_from_intra_loop_link): Correclty determine which register causes anti-dependency. Prevent removing anti-dep edge from ddg when hard_register_p. --- diff --git a/gcc/ddg.c b/gcc/ddg.c index 2b1cfe8..0a3726f 100644 --- a/gcc/ddg.c +++ b/gcc/ddg.c @@ -204,23 +204,34 @@ create_ddg_dep_from_intra_loop_link (ddg_ptr g, ddg_node_ptr src_node, (t == ANTI_DEP dt == REG_DEP) !autoinc_var_is_used_p (dest_node-insn, src_node-insn)) { - rtx set; + df_ref *def_rec; + bool can_delete_dep = true; - set = single_set (dest_node-insn); - /* TODO: Handle registers that REG_P is not true for them, i.e. + /* TODO: Handle !REG_P register correctly, i.e. subregs and special registers. */ - if (set REG_P (SET_DEST (set))) -{ - int regno = REGNO (SET_DEST (set)); - df_ref first_def; - struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb); + for (def_rec = DF_INSN_DEFS (dest_node-insn); *def_rec; def_rec++) + { + df_ref first_def, def = *def_rec, *use_rec; + unsigned regno = DF_REF_REGNO (def); + struct df_rd_bb_info *bb_info = DF_RD_BB_INFO (g-bb); + bool src_uses = false; - first_def = df_bb_regno_first_def_find (g-bb, regno); - gcc_assert (first_def); + for (use_rec = DF_INSN_USES (src_node-insn); *use_rec; use_rec++) + if (regno == DF_REF_REGNO (*use_rec)) + src_uses = true; - if (bitmap_bit_p (bb_info-gen, DF_REF_ID (first_def))) -return; -} + if (!src_uses) + continue; + + first_def = df_bb_regno_first_def_find (g-bb, regno); + gcc_assert (first_def); + + if (HARD_REGISTER_NUM_P (regno) + || !bitmap_bit_p (bb_info-gen, DF_REF_ID (first_def))) + can_delete_dep = false; + } + if (can_delete_dep) + return; } latency = dep_cost (link);
Re: [PATCH][LTO] Fix PR48437
On 12/07/11 09:52, Richard Guenther wrote: Index: gcc/lto-streamer-out.c === *** gcc/lto-streamer-out.c (revision 182081) --- gcc/lto-streamer-out.c (working copy) *** tree_is_indexable (tree t) *** 129,134 --- 129,144 else if (TREE_CODE (t) == VAR_DECL decl_function_context (t) !TREE_STATIC (t)) return false; + /* If this is a decl generated for block local externs for + debug info generation, stream it unshared alongside BLOCK_VARS. */ + else if (VAR_OR_FUNCTION_DECL_P (t) + /* ??? The following tests are a literal match on what + c-decl.c:pop_scope does. */ Factor it into a common routine then? Diego.
Re: [PATCH][LTO] Fix PR48437
On Wed, 7 Dec 2011, Diego Novillo wrote: On 12/07/11 09:52, Richard Guenther wrote: Index: gcc/lto-streamer-out.c === *** gcc/lto-streamer-out.c (revision 182081) --- gcc/lto-streamer-out.c (working copy) *** tree_is_indexable (tree t) *** 129,134 --- 129,144 else if (TREE_CODE (t) == VAR_DECL decl_function_context (t) !TREE_STATIC (t)) return false; + /* If this is a decl generated for block local externs for + debug info generation, stream it unshared alongside BLOCK_VARS. */ + else if (VAR_OR_FUNCTION_DECL_P (t) + /* ??? The following tests are a literal match on what + c-decl.c:pop_scope does. */ Factor it into a common routine then? pop_scope of course _sets_ the values that way, it doesn't test them. Richard.
Re: [PATCH][LTO] Fix PR48437
On 12/07/11 10:46, Richard Guenther wrote: On Wed, 7 Dec 2011, Diego Novillo wrote: On 12/07/11 09:52, Richard Guenther wrote: Index: gcc/lto-streamer-out.c === *** gcc/lto-streamer-out.c (revision 182081) --- gcc/lto-streamer-out.c (working copy) *** tree_is_indexable (tree t) *** 129,134 --- 129,144 else if (TREE_CODE (t) == VAR_DECL decl_function_context (t) !TREE_STATIC (t)) return false; + /* If this is a decl generated for block local externs for + debug info generation, stream it unshared alongside BLOCK_VARS. */ + else if (VAR_OR_FUNCTION_DECL_P (t) + /* ??? The following tests are a literal match on what + c-decl.c:pop_scope does. */ Factor it into a common routine then? pop_scope of course _sets_ the values that way, it doesn't test them. Yes. I meant factoring, so future users have something to call, instead of three seemingly random flag checks. pop_scope could also be calling some complementary setter instead of doing the seemingly random flag setting. Diego.
Re: [PATCH][LTO] Fix PR48437
On Wed, 7 Dec 2011, Diego Novillo wrote: On 12/07/11 10:46, Richard Guenther wrote: On Wed, 7 Dec 2011, Diego Novillo wrote: On 12/07/11 09:52, Richard Guenther wrote: Index: gcc/lto-streamer-out.c === *** gcc/lto-streamer-out.c (revision 182081) --- gcc/lto-streamer-out.c (working copy) *** tree_is_indexable (tree t) *** 129,134 --- 129,144 else if (TREE_CODE (t) == VAR_DECL decl_function_context (t) !TREE_STATIC (t)) return false; + /* If this is a decl generated for block local externs for + debug info generation, stream it unshared alongside BLOCK_VARS. */ + else if (VAR_OR_FUNCTION_DECL_P (t) + /* ??? The following tests are a literal match on what + c-decl.c:pop_scope does. */ Factor it into a common routine then? pop_scope of course _sets_ the values that way, it doesn't test them. Yes. I meant factoring, so future users have something to call, instead of three seemingly random flag checks. pop_scope could also be calling some complementary setter instead of doing the seemingly random flag setting. I don't see a good way to factor out the flags setting. Factoring out the copying maybe. But well... we can do that when a 2nd user comes along? Richard.
Re: [PATCH][LTO] Fix PR48437
On 12/07/11 10:54, Richard Guenther wrote: On Wed, 7 Dec 2011, Diego Novillo wrote: On 12/07/11 10:46, Richard Guenther wrote: On Wed, 7 Dec 2011, Diego Novillo wrote: On 12/07/11 09:52, Richard Guenther wrote: Index: gcc/lto-streamer-out.c === *** gcc/lto-streamer-out.c (revision 182081) --- gcc/lto-streamer-out.c (working copy) *** tree_is_indexable (tree t) *** 129,134 --- 129,144 else if (TREE_CODE (t) == VAR_DECLdecl_function_context (t) !TREE_STATIC (t)) return false; + /* If this is a decl generated for block local externs for + debug info generation, stream it unshared alongside BLOCK_VARS. */ + else if (VAR_OR_FUNCTION_DECL_P (t) + /* ??? The following tests are a literal match on what + c-decl.c:pop_scope does. */ Factor it into a common routine then? pop_scope of course _sets_ the values that way, it doesn't test them. Yes. I meant factoring, so future users have something to call, instead of three seemingly random flag checks. pop_scope could also be calling some complementary setter instead of doing the seemingly random flag setting. I don't see a good way to factor out the flags setting. Factoring out the copying maybe. But well... we can do that when a 2nd user comes along? The problem is that the 2nd user will cut-n-paste from this one. However, if you find adding a little function too strenuous, I guess it's not too big a deal. Diego.
Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)
On Wed, 7 Dec 2011, Richard Guenther wrote: code - well, that's ok. Pointing people to the latest official release series (in this case 4.6.x) is also ok, we're keeping too many branches active IMNSHO. As I recall we agreed in London that both 4.3 and 4.4 should be closed (after a final release from each branch) - and in general two active release branches was enough - but so far have only closed 4.3. (Closed = no datestamp updates, no snapshots, no releases, no tracking regression status in Bugzilla, not absolutely disallowing commits if someone feels it's still useful to put some fix on an old branch.) -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH][LTO] Fix PR48437
On Wed, 7 Dec 2011, Diego Novillo wrote: On 12/07/11 10:54, Richard Guenther wrote: On Wed, 7 Dec 2011, Diego Novillo wrote: On 12/07/11 10:46, Richard Guenther wrote: On Wed, 7 Dec 2011, Diego Novillo wrote: On 12/07/11 09:52, Richard Guenther wrote: Index: gcc/lto-streamer-out.c === *** gcc/lto-streamer-out.c (revision 182081) --- gcc/lto-streamer-out.c (working copy) *** tree_is_indexable (tree t) *** 129,134 --- 129,144 else if (TREE_CODE (t) == VAR_DECLdecl_function_context (t) !TREE_STATIC (t)) return false; + /* If this is a decl generated for block local externs for + debug info generation, stream it unshared alongside BLOCK_VARS. */ + else if (VAR_OR_FUNCTION_DECL_P (t) + /* ??? The following tests are a literal match on what + c-decl.c:pop_scope does. */ Factor it into a common routine then? pop_scope of course _sets_ the values that way, it doesn't test them. Yes. I meant factoring, so future users have something to call, instead of three seemingly random flag checks. pop_scope could also be calling some complementary setter instead of doing the seemingly random flag setting. I don't see a good way to factor out the flags setting. Factoring out the copying maybe. But well... we can do that when a 2nd user comes along? The problem is that the 2nd user will cut-n-paste from this one. However, if you find adding a little function too strenuous, I guess it's not too big a deal. Testing this kind of patches turns out to be quite time-consuming (I do a LTO bootstrap and two SPEC 2k6 builds (-g and -g0)), so yeah ... The following is what I ended up LTO bootstrapping (finished ok), testing is still in progress, as is SPEC 2k6 build. It fixes PR49945 in a similar way, by streaming VLA types locally as well. I'm going to apply it tomorrow, when full testing hopefully finished Thanks, Richard. 2011-12-08 Richard Guenther rguent...@suse.de PR lto/48437 PR lto/49945 * lto-streamer-out.c (tree_is_indexable): Exclude block-local extern declarations. PR lto/48437 * gcc.dg/lto/20111207-2_0.c: New testcase. * gcc.dg/guality/pr48437.c: Likewise. Index: gcc/lto-streamer-out.c === *** gcc/lto-streamer-out.c (revision 182081) --- gcc/lto-streamer-out.c (working copy) *** tree_is_indexable (tree t) *** 129,134 --- 129,147 else if (TREE_CODE (t) == VAR_DECL decl_function_context (t) !TREE_STATIC (t)) return false; + /* If this is a decl generated for block local externs for + debug info generation, stream it unshared alongside BLOCK_VARS. */ + else if (VAR_OR_FUNCTION_DECL_P (t) + /* ??? The following tests are a literal match on what + c-decl.c:pop_scope does. */ + TREE_PUBLIC (t) + DECL_EXTERNAL (t) + DECL_CONTEXT (t) + TREE_CODE (DECL_CONTEXT (t)) == FUNCTION_DECL) + return false; + else if (TYPE_P (t) + variably_modified_type_p (t, NULL_TREE)) + return false; else return (TYPE_P (t) || DECL_P (t) || TREE_CODE (t) == SSA_NAME); } Index: gcc/testsuite/gcc.dg/lto/20111207-2_0.c === *** gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0) --- gcc/testsuite/gcc.dg/lto/20111207-2_0.c (revision 0) *** *** 0 --- 1,17 + /* { dg-lto-do run } */ + + int + test (void) + { + int f (void); + return 0; + } + + int + main (void) + { + int f (void); + int test (void); + + return test (); + } Index: gcc/testsuite/gcc.dg/guality/pr48437.c === *** gcc/testsuite/gcc.dg/guality/pr48437.c (revision 0) --- gcc/testsuite/gcc.dg/guality/pr48437.c (revision 0) *** *** 0 --- 1,17 + /* PR lto/48437 */ + /* { dg-do run } */ + /* { dg-options -g } */ + + #include ../nop.h + + int i __attribute__((used)); + int main() + { + volatile int i; + for (i = 3; i 7; ++i) + { + extern int i; + asm volatile (NOP : : : memory); /* { dg-final { gdb-test 14 i 0 } } */ + } + return 0; + }
Re: [PATCH][LTO] Fix PR48437
On Wed, Dec 7, 2011 at 11:16, Richard Guenther rguent...@suse.de wrote: I'm going to apply it tomorrow, when full testing hopefully finished Sure. But remember the zombie kitties! Diego.
Re: [PATCH][ARM] one_cmpldi2 in NEON
On 06/12/11 17:59, Andrew Stubbs wrote: This patch adds a one's complement pattern for doing DImode 'not' in NEON registers. There are already patterns for doing one's complement of vectors, and even though it boils down to the same instruction, the DImode case was missing. The patch needs to be a little more complicated than using a mode iterator that includes DI because it needs to coexist with the non-neon one_cmpldi2 (renamed by this patch to one_cmpldi2_core). OK for when stage 1 opens again? Andrew neon-not.patch 2011-12-06 Andrew Stubbs a...@codesourcery.com gcc/ * config/arm/arm.md (one_cmpldi2): Rename to ... (one_cmpldi2_core): ... this, and modify it to prevent it being used for NEON. (one_cmpldi2): New define_expand. * config/arm/neon.md (one_cmpldi2_neon): New define_insn. +(define_insn_and_split *one_cmpldi2_core + [(set (match_operand:DI 0 arm_general_register_operand =r,r) + (not:DI (match_operand:DI 1 arm_general_register_operand 0,r)))] Thinking about it, for an operation with one input and one output, there's no need for the earlyclobber marker when the input is tied to the output (there's no other operand that can be clobbered). Otherwise OK. R.
Re: [google] Backport r171347 and r181549 from trunk (strict volatile bitfield) (issue5434084)
On Wed, Dec 07, 2011 at 04:16:25PM +, Joseph S. Myers wrote: On Wed, 7 Dec 2011, Richard Guenther wrote: code - well, that's ok. Pointing people to the latest official release series (in this case 4.6.x) is also ok, we're keeping too many branches active IMNSHO. As I recall we agreed in London that both 4.3 and 4.4 should be closed (after a final release from each branch) - and in general two active release branches was enough - but so far have only closed 4.3. (Closed = no datestamp updates, no snapshots, no releases, no tracking regression status in Bugzilla, not absolutely disallowing commits if someone feels it's still useful to put some fix on an old branch.) I'll handle 4.4.7 and closing the branch in the coming weeks. Jakub
Re: Tidy up MD_INCLUDES in config/arm/t-arm
On 01/12/11 11:09, Georg-Johann Lay wrote: Richard Earnshaw wrote: On 29/11/11 09:42, Matthew Gretton-Dann wrote: All, Whilst developing the Cortex-A15 integer pipeline patch it was noted that the MD_INCLUDES variable in config/arm/t-arm has not been kept up-to-date. The attached patch fixes this, and rearranges the list of md files into alphabetical order. The list was generated using `ls -1 *.md | grep -v arm\\.md`. Tested by doing a arm-none-eabi build. Can someone please review, and if appropriate apply? Thanks, Matt gcc/ChangeLog: 2011-11-29 Matthew Gretton-Dann matthew.gretton-d...@arm.com * config/arm/t-arm (MD_INCLUDES): Ensure all md files are listed. OK. R. Is each entry mandatory in that list? I thought gen-tools already arrange for great part of MD_INCLUDES? For example, after adding (include avr-dimode.md) to avr.md, ./gcc/mddeps.mk reads: MD_INCLUDES = \ ../../../gcc.gnu.org/trunk/gcc/config/avr/predicates.md \ ../../../gcc.gnu.org/trunk/gcc/config/avr/constraints.md \ ../../../gcc.gnu.org/trunk/gcc/config/avr/avr-dimode.md ../../../gcc.gnu.org/trunk/gcc/config/avr/predicates.md: ../../../gcc.gnu.org/trunk/gcc/config/avr/constraints.md: ../../../gcc.gnu.org/trunk/gcc/config/avr/avr-dimode.md: so that maintaining such a list might be considerable easier. Indeed. I hadn't realised that that feature had been added since I first added that macro. It does indeed look as though the whole definition of MD_INCLUDES is now redundant. R.
Re: [patch] ARM: Fix miscompilation in arm.md:*minmax_arithsi. (PR target/51408)
On 05/12/11 10:42, Richard Earnshaw wrote: On 04/12/11 13:32, kazu_hir...@mentor.com wrote: Hi, Attached is a patch to fix miscompilation in arm.md:*minmax_arithsi. The following testcase, reduced from efgcvt_r.c:fcvt_r in glibc, gets miscompiled: extern void abort (void); int __attribute__((noinline)) foo (int a, int b) { int max = (b 0) ? b : 0; return max - a; } int main (void) { if (foo (3, -1) != -3) abort (); return 0; } arm-none-eabi-gcc -O1 generates: foo: @ Function supports interworking. @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. cmp r1, #0 rsbge r0, r0, r1 bx lr This would be equivalent to: return b = 0 ? b - a : a; which is different from: return b = 0 ? b - a : -a; That is, in assembly code, we should have an else clause like so: cmp r1, #0 rsbge r0, r0, r1 - then clause rsblt r0, r0, #0 - else clause bx lr The problem comes from the fact that arm.md:*minmax_arithsi does not add the else clause even though MINUS is not commutative. The patch fixes the problem by always requiring the else clause in the MINUS case. Tested by running gcc testsuite on various ARM subarchitectures. OK to apply? Kazu Hirata gcc/ 2011-12-04 Kazu Hirata k...@codesourcery.com PR target/51408 * config/arm/arm.md (*minmax_arithsi): Always require the else clause in the MINUS case. gcc/testsuite/ 2011-12-04 Kazu Hirata k...@codesourcery.com PR target/51408 * gcc.dg/pr51408.c: New. OK. R. BTW, I would expect this to also exist in all the release branches. Could you back-port it where needed please. TIA, R.
[Patch, Fortran] PR51448 [4.6/4.7] Fix realloc with RHS conversion function
This fixes a -frealloc-lhs regression where the RHS is handled by a conversion function whose argument has component refs. Build and regtested on x86-64-linux. OK for the trunk and 4.7? Tobias 2011-12-07 Tobias Burnus bur...@net-b.de PR fortran/51448 * fortran/trans-array.c (get_std_lbound): Fix handling of conversion functions. 2011-12-07 Tobias Burnus bur...@net-b.de PR fortran/51448 * gfortran.dg/realloc_on_assign_8.f90: New. diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index ee8f896..2fd5749 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -7428,7 +7584,16 @@ get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size) gfc_array_index_type, cond, lbound, gfc_index_one_node); } - else if (expr-expr_type == EXPR_VARIABLE) + + if (expr-expr_type == EXPR_FUNCTION) +{ + /* A conversion function, so use the argument. */ + gcc_assert (expr-value.function.isym + expr-value.function.isym-conversion); + expr = expr-value.function.actual-expr; +} + + if (expr-expr_type == EXPR_VARIABLE) { tmp = TREE_TYPE (expr-symtree-n.sym-backend_decl); for (ref = expr-ref; ref; ref = ref-next) @@ -7441,15 +7606,6 @@ get_std_lbound (gfc_expr *expr, tree desc, int dim, bool assumed_size) } return GFC_TYPE_ARRAY_LBOUND(tmp, dim); } - else if (expr-expr_type == EXPR_FUNCTION) -{ - /* A conversion function, so use the argument. */ - expr = expr-value.function.actual-expr; - if (expr-expr_type != EXPR_VARIABLE) - return gfc_index_one_node; - desc = TREE_TYPE (expr-symtree-n.sym-backend_decl); - return get_std_lbound (expr, desc, dim, assumed_size); -} return gfc_index_one_node; } --- /dev/null 2011-12-07 07:00:28.727532040 +0100 +++ gcc/gcc/testsuite/gfortran.dg/realloc_on_assign_8.f90 2011-12-07 16:34:46.0 +0100 @@ -0,0 +1,17 @@ +! { dg-do compile } +! +! PR fortran/51448 +! +! Contribued by François Willot +! + PROGRAM MAIN + IMPLICIT NONE + TYPE mytype +REAL b(2) + END TYPE mytype + TYPE(mytype) a + DOUBLE PRECISION, ALLOCATABLE :: x(:) + ALLOCATE(x(2)) + a%b=0.0E0 + x=a%b + END
Re: [C++ PATCH] ICE with invalid user-defined literals (PR c++/51420)
Applied, thanks. Jason
Re: Fix a bug in ThreadSanitizer pass (issue 5448109)
ok for google/main. David On Wed, Dec 7, 2011 at 3:04 AM, dvyu...@google.com wrote: On 2011/12/05 17:05:17, dvyukov wrote: This is for google-main branch. Fix taking address of SSA_NAME in ThreadSanitizer pass. Index: gcc/tree-tsan.c === --- gcc/tree-tsan.c (revision 182014) +++ gcc/tree-tsan.c (working copy) @@ -726,13 +726,20 @@ struct mop_desc mop; unsigned fld_off; unsigned fld_size; + tree base; + + base = get_base_address (expr); + if (base == NULL_TREE + || TREE_CODE (base) == SSA_NAME + || TREE_CODE (base) == STRING_CST) + return; + tcode = TREE_CODE (expr); /* Below are things we do not instrument (no possibility of races or not implemented yet). */ if ((func_ignore (tsan_ignore_mop | tsan_ignore_rec)) - || get_base_address (expr) == NULL /* Compiler-emitted artificial variables. */ || (DECL_P (expr) DECL_ARTIFICIAL (expr)) /* The var does not live in memory - no possibility of races. */ Index: gcc/ChangeLog.google-main === --- gcc/ChangeLog.google-main (revision 182014) +++ gcc/ChangeLog.google-main (working copy) @@ -1,3 +1,8 @@ +2011-12-05 Dmitriy Vyukov mailto:dvyu...@google.com + + Fix taking address of SSA_NAME in ThreadSanitizer pass. + * gcc/tree-tsan.c (handle_expr): Add the additional check. + 2011-11-30 Dmitriy Vyukov mailto:dvyu...@google.com Add directives to run ThreadSanitizer tests -- This patch is available for review at http://codereview.appspot.com/5448109 Sorry, I forgot to provide details about the crash. The compiler crashes on the following code (dump just before tsan pass): store_param_double (struct NET * net, struct MYSQL_BIND * param) { double value; unsigned char * D.26882; long int D.26881; long int D.26879; unsigned char * D.26877; double value.122; void * D.26875; bb 2: D.26875_2 = param_1(D)-buffer; value.122_3 = MEM[(double *)D.26875_2]; D.26877_5 = net_4(D)-write_pos; D.26879_7 = VIEW_CONVERT_EXPRunion doubleget_union(value.122_3).m[0]; MEM[(long int *)D.26877_5] = D.26879_7; D.26877_8 = net_4(D)-write_pos; D.26881_11 = VIEW_CONVERT_EXPRunion doubleget_union(value.122_3).m[1]; MEM[(long int *)D.26877_8 + 4B] = D.26881_11; D.26877_12 = net_4(D)-write_pos; D.26882_13 = D.26877_12 + 8; net_4(D)-write_pos = D.26882_13; return; } with the following error message: libmysql.c: In function 'store_param_double': libmysql.c:2314:13: error: conversion of an SSA_NAME on the left hand side VIEW_CONVERT_EXPRunion doubleget_union(D.29825_23); D.29824_24 = VIEW_CONVERT_EXPRunion doubleget_union(D.29825_23).m[0]; libmysql.c:2314:13: internal compiler error: verify_gimple failed Is it a correct fix for the issue? If base_address refers to SSA_NAME we just do not instrument it, right? http://codereview.appspot.com/5448109/
[PATCH] increase timeout in simulate-thread gdb test
Currently we are failing... FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O1 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O2 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O3 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -Os -g thread simulation test on x86_64-apple-darwin11 due to the 10 second timeout in simulate-thread of gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20 seconds eliminates the failures (as these test take ~16 seconds on x86_64-apple-darwin11). Okay for gcc trunk? Jack gcc/testsuite/ 2011-12-07 Jack Howarth howa...@bromo.med.uc.edu * lib/gcc-simulate-thread.exp (simulate-thread): Increase timeout to 20 seconds. Index: gcc/testsuite/lib/gcc-simulate-thread.exp === --- gcc/testsuite/lib/gcc-simulate-thread.exp (revision 182083) +++ gcc/testsuite/lib/gcc-simulate-thread.exp (working copy) @@ -56,8 +56,8 @@ proc simulate-thread { args } { set gdb_worked 0 -# Set timeout to 10 seconds due to huge amount of generated log. -remote_expect target 10 { +# Set timeout to 20 seconds due to huge amount of generated log. +remote_expect target 20 { # Too old GDB -re Unhandled dwarf expression|Error in sourced command file { unsupported $testcase $message
Re: [PATCH] increase timeout in simulate-thread gdb test
On Wed, Dec 7, 2011 at 7:43 PM, Jack Howarth howa...@bromo.med.uc.edu wrote: Currently we are failing... FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O1 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O2 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O3 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -Os -g thread simulation test on x86_64-apple-darwin11 due to the 10 second timeout in simulate-thread of gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20 seconds eliminates the failures (as these test take ~16 seconds on x86_64-apple-darwin11). Okay for gcc trunk? As said elsewhere, this will double the amount of already large logfile in case of failed test. Do we really need such detailed log? Uros.
Re: Update to Fortran invoke documentation about the features -finit-type *really* provides.
On 12/06/2011 08:32 PM, Steve Kargl wrote: On Mon, Dec 05, 2011 at 07:21:59PM +0100, Toon Moene wrote: 2011-12-05 Toon Moenet...@moene.org PR/51310 invoke.texi: Itemize the cases for which -finit-type doesn't work. OK for trunk ? (and perhaps later for the 4.6 branch ? Looks good to me. You can apply it to the 4.6 branch if you have time. And then shortly before applying it, I realized that the proper documentation of the limitations might be dependent on the -fno-automatic, -fstack-arrays and -fmax-stack-var-size=n compiler flags used. So I'll come back tomorrow with version 2.0 of this patch, after checking out all of the above (the documentation of 4.6 and 4.7 will be different if using -fstack-arrays makes a difference, because that option only exists in 4.7). -- Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/ Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news
Re: [PATCH] increase timeout in simulate-thread gdb test
On 7 Dec 2011, at 18:47, Uros Bizjak wrote: On Wed, Dec 7, 2011 at 7:43 PM, Jack Howarth howa...@bromo.med.uc.edu wrote: Currently we are failing... FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O1 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O2 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O3 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -Os -g thread simulation test on x86_64-apple-darwin11 due to the 10 second timeout in simulate- thread of gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20 seconds eliminates the failures (as these test take ~16 seconds on x86_64- apple-darwin11). Okay for gcc trunk? if it's only one test can't you use { dg-timeout-factor 2.0 ? As said elsewhere, this will double the amount of already large logfile in case of failed test. Do we really need such detailed log? anything to optimize what's in the logs would be welcome in debugging Iain
Re: [PATCH] increase timeout in simulate-thread gdb test
On Wed, Dec 7, 2011 at 7:58 PM, Iain Sandoe develo...@sandoe-acoustics.co.uk wrote: Currently we are failing... FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O1 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O2 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -O3 -g thread simulation test FAIL: gcc.dg/simulate-thread/atomic-load-int128.c -Os -g thread simulation test on x86_64-apple-darwin11 due to the 10 second timeout in simulate-thread of gcc/testsuite/lib/gcc-simulate-thread.exp. Increasing this timeout to 20 seconds eliminates the failures (as these test take ~16 seconds on x86_64-apple-darwin11). Okay for gcc trunk? if it's only one test can't you use { dg-timeout-factor 2.0 ? As said elsewhere, this will double the amount of already large logfile in case of failed test. Do we really need such detailed log? anything to optimize what's in the logs would be welcome in debugging I fully agree, but it is trivial to re-run the test in the debugger outside the testsuite run. IMO, logging a couple of lines for execution of every instruction (in a loop!) is a bit excessive. Uros.
Re: [PATCH] [MIPS] Add -march=octeon+ support for GCC
On Tue, Dec 6, 2011 at 6:28 PM, Andrew Pinski pins...@gmail.com wrote: Hi, This patch adds -march=octeon+ to GCC. OK? Bootstrapped and tested on mips64-linux-gnu configured with --with-arch=octeon+ . Thanks, Andrew Pinski gcc/ChangeLog: * mips/mips-cpus.def (octeon+): New CPU. * config/mips/mips-tables.opt: Regenerate. * config/mips/mips.h (MIPS_CPP_SET_PROCESSOR): Emit '+' as 'P'. testsuite/ChangeLog: * gcc.target/mips/mult-1.c: Forbit all Octeon processors. * gcc.target/mips/dmult-1.c: Likewise. * gcc.target/mips/branch-1.c: Likewise. * gcc.target/mips/extend-1.c: Likewise. Woops I forgot the patch. Thanks, Andrew Pinski Index: testsuite/gcc.target/mips/mult-1.c === --- testsuite/gcc.target/mips/mult-1.c (revision 182066) +++ testsuite/gcc.target/mips/mult-1.c (working copy) @@ -1,6 +1,6 @@ /* For SI-DI widening multiplication we should use DINS to combine the two halves. For Octeon use DMUL with explicit widening. */ -/* { dg-options -O -mgp64 isa_rev=2 forbid_cpu=octeon } */ +/* { dg-options -O -mgp64 isa_rev=2 forbid_cpu=octeon\[\+0-9\]* } */ /* { dg-final { scan-assembler \tdins\t } } */ /* { dg-final { scan-assembler-not \tdsll\t } } */ /* { dg-final { scan-assembler-not \tdsrl\t } } */ Index: testsuite/gcc.target/mips/dmult-1.c === --- testsuite/gcc.target/mips/dmult-1.c (revision 182066) +++ testsuite/gcc.target/mips/dmult-1.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-options forbid_cpu=octeon -mgp64 } */ +/* { dg-options forbid_cpu=octeon\[\+0-9\]* -mgp64 } */ /* { dg-final { scan-assembler \tdmult\t } } */ /* { dg-final { scan-assembler \tmflo\t } } */ /* { dg-final { scan-assembler-not \tdmul\t } } */ Index: testsuite/gcc.target/mips/branch-1.c === --- testsuite/gcc.target/mips/branch-1.c(revision 182066) +++ testsuite/gcc.target/mips/branch-1.c(working copy) @@ -2,7 +2,7 @@ but we test for bbit elsewhere. On other targets, we should implement the if statements using an andi instruction followed by a branch on zero. */ -/* { dg-options -O2 forbid_cpu=octeon } */ +/* { dg-options -O2 forbid_cpu=octeon\[\+0-9\]* } */ void bar (void); NOMIPS16 void f1 (int x) { if (x 4) bar (); } Index: testsuite/gcc.target/mips/extend-1.c === --- testsuite/gcc.target/mips/extend-1.c(revision 182066) +++ testsuite/gcc.target/mips/extend-1.c(working copy) @@ -1,4 +1,4 @@ -/* { dg-options -O -mgp64 forbid_cpu=octeon } */ +/* { dg-options -O -mgp64 forbid_cpu=octeon\[\+0-9\]* } */ /* { dg-final { scan-assembler-times \tdsll\t 5 } } */ /* { dg-final { scan-assembler-times \tdsra\t 5 } } */ /* { dg-final { scan-assembler-not \tsll\t } } */ Index: config/mips/mips-tables.opt === --- config/mips/mips-tables.opt (revision 182066) +++ config/mips/mips-tables.opt (working copy) @@ -603,3 +603,6 @@ EnumValue Enum(mips_arch_opt_value) String(octeon) Value(80) Canonical +EnumValue +Enum(mips_arch_opt_value) String(octeon+) Value(81) Canonical + Index: config/mips/mips-cpus.def === --- config/mips/mips-cpus.def (revision 182066) +++ config/mips/mips-cpus.def (working copy) @@ -145,3 +145,4 @@ /* MIPS64 Release 2 processors. */ MIPS_CPU (octeon, PROCESSOR_OCTEON, 65, PTF_AVOID_BRANCHLIKELY) +MIPS_CPU (octeon+, PROCESSOR_OCTEON, 65, PTF_AVOID_BRANCHLIKELY) Index: config/mips/mips.h === --- config/mips/mips.h (revision 182066) +++ config/mips/mips.h (working copy) @@ -329,7 +329,10 @@ \ macro = concat ((PREFIX), _, (INFO)-name, NULL); \ for (p = macro; *p != 0; p++)\ - *p = TOUPPER (*p); \ +if (*p == '+') \ + *p = 'P'; \ +else\ + *p = TOUPPER (*p);\ \ builtin_define (macro); \ builtin_define_with_value ((PREFIX), (INFO)-name, 1); \
Re: [Patch, Fortran] PR51448 [4.6/4.7] Fix realloc with RHS conversion function
On Wednesday 07 December 2011 17:45:52 Tobias Burnus wrote: This fixes a -frealloc-lhs regression where the RHS is handled by a conversion function whose argument has component refs. Build and regtested on x86-64-linux. OK for the trunk and 4.7? Tobias OK. Mikael
Re: PR 51386
Attached patch applied: 2011-12-07 François Dumont fdum...@gcc.gnu.org PR libstdc++/51386 * include/bits/hashtable_policy.h (_Prime_rehash_policy::_M_next_bkt): Fix computation of _M_prev_resize so that hashtable do not keep on being rehashed when _M_max_load_factor is lower than 1. François On 12/07/2011 11:21 AM, Paolo Carlini wrote: Hi, Ok to commit ? Yes, thanks a lot for handling this. Please remember to add a proper header to the ChangeLog entry. Thanks again, Paolo. Index: include/bits/hashtable_policy.h === --- include/bits/hashtable_policy.h (revision 181975) +++ include/bits/hashtable_policy.h (working copy) @@ -300,23 +300,30 @@ { // Optimize lookups involving the first elements of __prime_list. // (useful to speed-up, eg, constructors) -static const unsigned long __fast_bkt[12] +static const unsigned char __fast_bkt[12] = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11 }; +if (__n = 11) + { + _M_prev_resize = 0; + _M_next_resize + = __builtin_ceil(__fast_bkt[__n] * (long double)_M_max_load_factor); + return __fast_bkt[__n]; + } + const unsigned long* __p - = __n = 11 ? __fast_bkt + __n - : std::lower_bound(__prime_list + 5, - __prime_list + _S_n_primes, __n); + = std::lower_bound(__prime_list + 5, __prime_list + _S_n_primes, __n); -_M_prev_resize = __builtin_floor(*__p * (long double)_M_max_load_factor); -if (__p != __fast_bkt) - _M_prev_resize = std::min(_M_prev_resize, -static_caststd::size_t(*(__p - 1))); -// Lets guaranty a minimal grow step of 11: +// Shrink will take place only if the number of elements is small enough +// so that the prime number 2 steps before __p is large enough to still +// conform to the max load factor: +_M_prev_resize + = __builtin_floor(*(__p - 2) * (long double)_M_max_load_factor); + +// Let's guaranty that a minimal grow step of 11 is used if (*__p - __n 11) - __p = std::lower_bound(__prime_list + 5, - __prime_list + _S_n_primes, __n + 11); -_M_next_resize = __builtin_floor(*__p * (long double)_M_max_load_factor); + __p = std::lower_bound(__p, __prime_list + _S_n_primes, __n + 11); +_M_next_resize = __builtin_ceil(*__p * (long double)_M_max_load_factor); return *__p; }
Re: [Patch, Fortran] PR51407 - allow BOZ edit descriptors for REAL/COMPLEX
On Wednesday 07 December 2011 14:54:36 Tobias Burnus wrote: * ping * ? On 12/04/2011 06:46 PM, Tobias Burnus wrote: Hi all, as Dominique has found, Fortran 2008 allows the BOZ edit descriptors now also with REAL and COMPLEX arguments. (See PR for quotes from the standard.) Build and regtested on x86-64-linux. OK for the trunk? OK. Mikael
Re: [Patch, Fortran] PR 51378 Fix component-access check
On Friday 02 December 2011 22:01:19 Tobias Burnus wrote: Found via Reinhold Bader's test suite: If a component is public, it remains public even if the extended type has PRIVATE. Build and regtested on x86-64-linux. OK for the trunk? OK. Mikael
[PATCH] Avoid using constructor attribute in libcpp (PR bootstrap/50237)
Hi! While this isn't real fix for this PR, which should be fixed in configury, I think avoiding ctors in host code is useful for portability and Eric agreed with that in the PR. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-12-07 Jakub Jelinek ja...@redhat.com PR bootstrap/50237 * internal.h (_cpp_init_lexer): New prototype. * init.c (init_library): Call it. * lex.c (init_vectorized_lexer): Remove constructor attribute, add inline keyword. (HAVE_init_vectorized_lexer): Define. (_cpp_init_lexer): New function. --- libcpp/internal.h.jj2011-11-04 07:49:56.0 +0100 +++ libcpp/internal.h 2011-11-28 16:01:13.467831488 +0100 @@ -653,6 +653,7 @@ extern int _cpp_equiv_tokens (const cpp_ extern void _cpp_init_tokenrun (tokenrun *, unsigned int); extern cpp_hashnode *_cpp_lex_identifier (cpp_reader *, const char *); extern int _cpp_remaining_tokens_num_in_context (cpp_context *); +extern void _cpp_init_lexer (void); /* In init.c. */ extern void _cpp_maybe_push_include_file (cpp_reader *); --- libcpp/init.c.jj2011-10-31 20:44:15.0 +0100 +++ libcpp/init.c 2011-11-28 16:01:42.168044257 +0100 @@ -134,6 +134,8 @@ init_library (void) { initialized = 1; + _cpp_init_lexer (); + /* Set up the trigraph map. This doesn't need to do anything if we were compiled with a compiler that supports C99 designated initializers. */ --- libcpp/lex.c.jj 2011-10-27 08:43:10.0 +0200 +++ libcpp/lex.c2011-11-28 16:04:42.624703698 +0100 @@ -477,7 +477,8 @@ search_line_sse42 (const uchar *s, const typedef const uchar * (*search_line_fast_type) (const uchar *, const uchar *); static search_line_fast_type search_line_fast; -static void __attribute__((constructor)) +#define HAVE_init_vectorized_lexer 1 +static inline void init_vectorized_lexer (void) { unsigned dummy, ecx = 0, edx = 0; @@ -638,6 +639,16 @@ search_line_fast (const uchar *s, const #endif +/* Initialize the lexer if needed. */ + +void +_cpp_init_lexer (void) +{ +#ifdef HAVE_init_vectorized_lexer + init_vectorized_lexer (); +#endif +} + /* Returns with a logical line that contains no escaped newlines or trigraphs. This is a time-critical inner loop. */ void Jakub
[C++ PATCH] Error out on [N] = style designated initializers for classes (PR c++/51229)
Hi! We already error out on .foo = style designators for arrays, but for [N] = style designators for classes we'd just ICE. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? We still have a bunch of accepts-invalid, no idea how to solve them, because reshape_init_r is called often several times with the same d-cur. IMHO we should error on all of: struct A { int i; }; int a = { .foo = 6 }; int b = { [0] = 1 }; _Complex float c = { .foo = 0, 1 }; _Complex float d = { [0] = 0, 1 }; _Complex float e = { 0, .foo = 1 }; _Complex float f = { 0, [0] = 1 }; char g[] = { [7] = abcd }; The last one is a accepts-invalid even for C. Jason, do you think you could look at that on the C++ side, I'll look at the C one? 2011-12-07 Jakub Jelinek ja...@redhat.com PR c++/51229 * decl.c (reshape_init_class): Complain if d-cur-index is INTEGER_CST. * parser.c (cp_parser_initializer_list): If cp_parser_parse_definitely fails, clear designator. * g++.dg/ext/desig3.C: New test. --- gcc/cp/decl.c.jj2011-12-01 11:45:04.0 +0100 +++ gcc/cp/decl.c 2011-12-07 12:51:17.455762864 +0100 @@ -5078,6 +5078,14 @@ reshape_init_class (tree type, reshape_i /* Handle designated initializers, as an extension. */ if (d-cur-index) { + if (TREE_CODE (d-cur-index) == INTEGER_CST) + { + if (complain tf_error) + error (%[%E] =% used in a GNU-style designated initializer + for class %qT, d-cur-index, type); + return error_mark_node; + } + field = lookup_field_1 (type, d-cur-index, /*want_type=*/false); if (!field || TREE_CODE (field) != FIELD_DECL) --- gcc/cp/parser.c.jj 2011-11-28 17:58:00.0 +0100 +++ gcc/cp/parser.c 2011-12-07 11:31:33.525651711 +0100 @@ -17713,7 +17713,8 @@ cp_parser_initializer_list (cp_parser* p designator = cp_parser_constant_expression (parser, false, NULL); cp_parser_require (parser, CPP_CLOSE_SQUARE, RT_CLOSE_SQUARE); cp_parser_require (parser, CPP_EQ, RT_EQ); - cp_parser_parse_definitely (parser); + if (!cp_parser_parse_definitely (parser)) + designator = NULL_TREE; } else designator = NULL_TREE; --- gcc/testsuite/g++.dg/ext/desig3.C.jj2011-12-07 12:55:34.894323455 +0100 +++ gcc/testsuite/g++.dg/ext/desig3.C 2011-12-07 12:56:32.172001945 +0100 @@ -0,0 +1,9 @@ +// PR c++/51229 +// { dg-do compile } +// { dg-options } + +struct A { int i; }; + +int a[5] = { .foo = 7 };// { dg-error used in a GNU-style designated initializer for an array } +int b[] = { .foo = 8 };// { dg-error used in a GNU-style designated initializer for an array } +A c = { [0] = {} };// { dg-error used in a GNU-style designated initializer for class } Jakub
[C++ PATCH] Fix ICE in cxx_incomplete_type_diagnostic (PR c++/51429)
Hi! This testcase regressed recently when Paolo added the user friendlier invalid use of member function vs. invalid use of member distinction, when TREE_OPERAND (value, 1) is BASELINK, DECL_FUNCTION_MEMBER_P on it dies with a checking ICE. The following patch calls get_first_fn if it is overloaded_fn. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-12-07 Jakub Jelinek ja...@redhat.com PR c++/51429 * typeck2.c (cxx_incomplete_type_diagnostic): Don't ICE if TREE_OPERAND (value, 1) is overloaded. * g++.dg/parse/error45.C: New test. --- gcc/cp/typeck2.c.jj 2011-11-07 12:40:45.0 +0100 +++ gcc/cp/typeck2.c2011-12-07 15:31:30.506795286 +0100 @@ -428,15 +428,20 @@ cxx_incomplete_type_diagnostic (const_tr case OFFSET_TYPE: bad_member: - if (DECL_FUNCTION_MEMBER_P (TREE_OPERAND (value, 1)) - ! flag_ms_extensions) - emit_diagnostic (diag_kind, input_location, 0, -invalid use of member function -(did you forget the %()% ?)); - else - emit_diagnostic (diag_kind, input_location, 0, -invalid use of member -(did you forget the %% ?)); + { + tree member = TREE_OPERAND (value, 1); + if (is_overloaded_fn (member)) + member = get_first_fn (member); + if (DECL_FUNCTION_MEMBER_P (member) +! flag_ms_extensions) + emit_diagnostic (diag_kind, input_location, 0, + invalid use of member function + (did you forget the %()% ?)); + else + emit_diagnostic (diag_kind, input_location, 0, + invalid use of member + (did you forget the %% ?)); + } break; case TEMPLATE_TYPE_PARM: --- gcc/testsuite/g++.dg/parse/error45.C.jj 2011-12-07 15:45:28.732286838 +0100 +++ gcc/testsuite/g++.dg/parse/error45.C2011-12-07 15:44:20.0 +0100 @@ -0,0 +1,9 @@ +// PR c++/51429 +// { dg-do compile } + +struct A +{ + void foo (double); + void foo (int); + A () { foo = 0; }// { dg-error invalid use of member function } +}; Jakub
Re: [C++ PATCH] Error out on [N] = style designated initializers for classes (PR c++/51229)
On 12/07/2011 03:25 PM, Jakub Jelinek wrote: Hi! We already error out on .foo = style designators for arrays, but for [N] = style designators for classes we'd just ICE. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. The last one is a accepts-invalid even for C. Jason, do you think you could look at that on the C++ side, I'll look at the C one? At some point, sure. Would you open a PR? Jason
Re: [C++ PATCH] Fix ICE in cxx_incomplete_type_diagnostic (PR c++/51429)
OK. Jason
Re: [PATCH][ARM] one_cmpldi2 in NEON
On 07/12/11 16:25, Richard Earnshaw wrote: 2011-12-06 Andrew Stubbsa...@codesourcery.com gcc/ * config/arm/arm.md (one_cmpldi2): Rename to ... (one_cmpldi2_core): ... this, and modify it to prevent it being used for NEON. (one_cmpldi2): New define_expand. * config/arm/neon.md (one_cmpldi2_neon): New define_insn. +(define_insn_and_split *one_cmpldi2_core + [(set (match_operand:DI 0 arm_general_register_operand =r,r) + (not:DI (match_operand:DI 1 arm_general_register_operand 0,r)))] Thinking about it, for an operation with one input and one output, there's no need for the earlyclobber marker when the input is tied to the output (there's no other operand that can be clobbered). Otherwise OK. Thanks I'll test and commit the attached updated patch once stage 1 opens again. Andrew 2011-12-07 Andrew Stubbs a...@codesourcery.com gcc/ * config/arm/arm.md (one_cmpldi2): Rename to ... (one_cmpldi2_core): ... this, and modify it to prevent it being used for NEON. (one_cmpldi2): New define_expand. * config/arm/neon.md (one_cmpldi2_neon): New define_insn. --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -4199,10 +4199,16 @@ TARGET_32BIT TARGET_HARD_FLOAT (TARGET_FPA || TARGET_VFP_DOUBLE) ) -(define_insn_and_split one_cmpldi2 - [(set (match_operand:DI 0 s_register_operand =r,r) - (not:DI (match_operand:DI 1 s_register_operand 0,r)))] +(define_expand one_cmpldi2 + [(set (match_operand:DI 0 s_register_operand ) + (not:DI (match_operand:DI 1 s_register_operand )))] TARGET_32BIT + ) + +(define_insn_and_split *one_cmpldi2_core + [(set (match_operand:DI 0 arm_general_register_operand =r,r) + (not:DI (match_operand:DI 1 arm_general_register_operand 0,r)))] + TARGET_32BIT !TARGET_NEON # TARGET_32BIT reload_completed [(set (match_dup 0) (not:SI (match_dup 1))) --- a/gcc/config/arm/neon.md +++ b/gcc/config/arm/neon.md @@ -896,6 +896,20 @@ [(set_attr neon_type neon_int_1)] ) +(define_insn *one_cmpldi2_neon + [(set (match_operand:DI 0 s_register_operand =w,?r,?r,?w) + (not:DI (match_operand:DI 1 s_register_operand w, 0, r, w)))] + TARGET_NEON + @ + vmvn\t%P0, %P1 + # + # + vmvn\t%P0, %P1 + [(set_attr neon_type neon_int_1,*,*,neon_int_1) + (set_attr length *,8,8,*) + (set_attr arch nota8,*,*,onlya8)] +) + (define_insn absmode2 [(set (match_operand:VDQW 0 s_register_operand =w) (abs:VDQW (match_operand:VDQW 1 s_register_operand w)))]
[C++ PATCH] Avoid ICE on auto non-static data members (PR c++/51401)
Hi! By my reading auto is not valid on non-static data members, if they aren't NSDMI, then we'd error on them (but differently between templates and non-templates), but with NSDMI in template we just ICE because init is DEFAULT_ARG. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-12-07 Jakub Jelinek ja...@redhat.com PR c++/51401 * decl.c (grokdeclarator): Error for auto on non-static data members. * g++.dg/cpp0x/auto7.C: Adjust expected error message. * g++.dg/cpp0x/auto29.C: New test. --- gcc/cp/decl.c.jj2011-12-07 12:51:17.0 +0100 +++ gcc/cp/decl.c 2011-12-07 18:09:09.113310949 +0100 @@ -9971,6 +9971,12 @@ grokdeclarator (const cp_declarator *dec } else if (decl_context == FIELD) { + if (!staticp type_uses_auto (type)) + { + error (non-static data member declared %auto%); + type = error_mark_node; + } + /* The C99 flexible array extension. */ if (!staticp TREE_CODE (type) == ARRAY_TYPE TYPE_DOMAIN (type) == NULL_TREE) --- gcc/testsuite/g++.dg/cpp0x/auto7.C.jj 2011-05-31 08:02:58.0 +0200 +++ gcc/testsuite/g++.dg/cpp0x/auto7.C 2011-12-07 18:14:18.011549947 +0100 @@ -9,5 +9,5 @@ templateint struct A { static auto k = 7; // { dg-error non-const } static auto l; // { dg-error has no initializer } - auto m; // { dg-error has no initializer } + auto m; // { dg-error non-static data member declared } }; --- gcc/testsuite/g++.dg/cpp0x/auto29.C.jj 2011-12-07 18:12:22.181184910 +0100 +++ gcc/testsuite/g++.dg/cpp0x/auto29.C 2011-12-07 18:11:52.0 +0100 @@ -0,0 +1,25 @@ +// PR c++/51401 +// { dg-do compile } +// { dg-options -std=c++11 } + +template int +struct A +{ + auto i; // { dg-error non-static data member declared } +}; + +template int +struct B +{ + auto i = 0; // { dg-error non-static data member declared } +}; + +struct C +{ + auto i; // { dg-error non-static data member declared } +}; + +struct D +{ + auto i = 0; // { dg-error non-static data member declared } +}; Jakub
Re: [Patch, Fortran] PR50815 - don't -fcheck=bounds of deferred-length strings
On Wednesday 07 December 2011 14:53:20 Tobias Burnus wrote: ** PING ** On 11/29/2011 07:35 PM, Tobias Burnus wrote: gfortran had an ICE when trying to insert a check whether the character length between actual and dummy argument matches. This check is pointless for deferred-length character arguments - thus, no bounds check should be generated. Build and regtested on x86-64-linux. OK for the trunk? OK, though I would have merged the skip for deferred lengths into the toplevel if condition (either is fine anyway). Mikael
Re: [PATCH] Avoid using constructor attribute in libcpp (PR bootstrap/50237)
On 12/07/2011 12:20 PM, Jakub Jelinek wrote: PR bootstrap/50237 * internal.h (_cpp_init_lexer): New prototype. * init.c (init_library): Call it. * lex.c (init_vectorized_lexer): Remove constructor attribute, add inline keyword. (HAVE_init_vectorized_lexer): Define. (_cpp_init_lexer): New function. Ok. r~
[Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase
An issue turned up in our internal 4.6 based testing that has been fixed on trunk. This patch backports the fix to 4.6. I also have a small test case that I will add to both 4.6 and 4.7. Bootstrapped and checked with x86_64-unknown-linux-gnu. Can someone review? Thanks, Teresa 2011-12-07 Teresa Johnson tejohn...@google.com Backport from mainline: 2011-08-05 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*movdi_internal_rex64): Use !o constraint instead of !m for operand 0, alternative 4. (*movdf_internal_rex64): Ditto for operand 0, alernative 6. 2011-12-07 Teresa Johnson tejohn...@google.com * gcc.target/i386/movdi-rex64.c: New. Index: config/i386/i386.md===--- config/i386/i386.md (revision 182083)+++ config/i386/i386.md(working copy)@@ -1960,7 +1960,7 @@ (define_insn *movdi_internal_rex64 [(set (match_operand:DI 0 nonimmediate_operand- =r,r ,r,m ,!m,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym)+ =r,r ,r,m ,!o,*y,*y,?r ,m ,?*Ym,?*y,*x,*x,?r ,m,?*Yi,*x,?*x,?*Ym) (match_operand:DI 1 general_operand Z ,rem,i,re,n ,C ,*y,*Ym,*y,r ,m ,C ,*x,*Yi,*x,r ,m ,*Ym,*x))] TARGET_64BIT !(MEM_P (operands[0]) MEM_P (operands[1]))@@ -2905,7 +2905,7 @@ (define_insn *movdf_internal_rex64 [(set (match_operand:DF 0 nonimmediate_operand- =f,m,f,r ,m,!r,!m,Y2*x,Y2*x,Y2*x,m ,Yi,r )+ =f,m,f,r ,m,!r,!o,Y2*x,Y2*x,Y2*x,m ,Yi,r ) (match_operand:DF 1 general_operand fm,f,G,rm,r,F ,F ,C ,Y2*x,m ,Y2*x,r ,Yi))] TARGET_64BIT !(MEM_P (operands[0]) MEM_P (operands[1])) Index: testsuite/gcc.target/i386/movdi-rex64.c === --- testsuite/gcc.target/i386/movdi-rex64.c (revision 0) +++ testsuite/gcc.target/i386/movdi-rex64.c (revision 0) @@ -0,0 +1,11 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -fPIE -Wwrite-strings } */ + +#include string.h +static __thread char buffer[25]; +const char * error_message (void) +{ +oops: +strcpy (buffer, Unknown code ); +return 0; +} -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase
Hello! An issue turned up in our internal 4.6 based testing that has been fixed on trunk. This patch backports the fix to 4.6. I also have a small test case that I will add to both 4.6 and 4.7. Bootstrapped and checked with x86_64-unknown-linux-gnu. 2011-12-07 Teresa Johnson tejohn...@google.com Backport from mainline: 2011-08-05 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*movdi_internal_rex64): Use !o constraint instead of !m for operand 0, alternative 4. (*movdf_internal_rex64): Ditto for operand 0, alernative 6. 2011-12-07 Teresa Johnson tejohn...@google.com * gcc.target/i386/movdi-rex64.c: New. Index: testsuite/gcc.target/i386/movdi-rex64.c === --- testsuite/gcc.target/i386/movdi-rex64.c (revision 0) +++ testsuite/gcc.target/i386/movdi-rex64.c (revision 0) @@ -0,0 +1,11 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -fPIE -Wwrite-strings } */ + +#include string.h +static __thread char buffer[25]; +const char * error_message (void) +{ +oops: +strcpy (buffer, Unknown code ); +return 0; +} You don't need #include for compile tests, just use: --cut here-- /* { dg-do compile } */ /* { dg-options -fPIE } */ char *strcpy (char *dest, const char *src); static __thread char buffer[25]; const char * error_message (void) { strcpy (buffer, Unknown code ); return 0; } --cut here-- Also this can be compiled everywhere, not just linux. OK with the testcase above. Thanks, Uros.
Re: [Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase
On Wed, Dec 7, 2011 at 2:12 PM, Uros Bizjak ubiz...@gmail.com wrote: Hello! An issue turned up in our internal 4.6 based testing that has been fixed on trunk. This patch backports the fix to 4.6. I also have a small test case that I will add to both 4.6 and 4.7. Bootstrapped and checked with x86_64-unknown-linux-gnu. 2011-12-07 Teresa Johnson tejohn...@google.com Backport from mainline: 2011-08-05 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*movdi_internal_rex64): Use !o constraint instead of !m for operand 0, alternative 4. (*movdf_internal_rex64): Ditto for operand 0, alernative 6. 2011-12-07 Teresa Johnson tejohn...@google.com * gcc.target/i386/movdi-rex64.c: New. Index: testsuite/gcc.target/i386/movdi-rex64.c === --- testsuite/gcc.target/i386/movdi-rex64.c (revision 0) +++ testsuite/gcc.target/i386/movdi-rex64.c (revision 0) @@ -0,0 +1,11 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -fPIE -Wwrite-strings } */ + +#include string.h +static __thread char buffer[25]; +const char * error_message (void) +{ +oops: + strcpy (buffer, Unknown code ); + return 0; +} You don't need #include for compile tests, just use: --cut here-- /* { dg-do compile } */ /* { dg-options -fPIE } */ char *strcpy (char *dest, const char *src); static __thread char buffer[25]; const char * error_message (void) { strcpy (buffer, Unknown code ); return 0; } --cut here-- Also this can be compiled everywhere, not just linux. Ok, I will change the testcase to replace the include and retest. Regarding the linux target restriction, though, I was concerned about the -fPIE option used for the test case. I noticed that in 4.7 there is a pie effective target keyword (check_effective_target_pie in testsuite/lib/target-supports.exp). However, that does not yet exist in 4.6, so rather than backport that as well I added the linux restriction. I see the same restriction in the other tests that use -fpie in gcc.target/i386 (pr39013-[12].c). What do you think? Thanks! Teresa OK with the testcase above. Thanks, Uros. -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[Committed] Fix PR libffi/50051 (MIPS libffi does not compile for mips64octeon-linux-gnu)
Hi, The problem here is Cavium's octeon assembler rejects floating point if the arch is set to either octeon or octeon2. This fixes the issue by adding: .set mips4 so that floating point instructions are enabled. Committed as approved by Anthony in the bugzilla. Thanks, Andrew Pinski libffi/ChangeLog: * src/mips/n32.S: Add .set mips4. Index: src/mips/n32.S === --- src/mips/n32.S (revision 182083) +++ src/mips/n32.S (working copy) @@ -43,6 +43,7 @@ #ifdef __GNUC__ .abicalls #endif + .set mips4 .text .align 2 .globl ffi_call_N32
Re: [v3] RFC: rename __calculate_memory_order
* include/bits/atomic_base.h (__calculate_memory_order): Rename to... (__cmpexch_failure_order): This, and rewrite as constexpr function. (compare_exchange_strong, compare_exchange_weak): Use it. * include/std/atomic (compare_exchange_strong, compare_exchange_weak): Likewise. Tested x86_64-linux. looks great to me. More constexpr, what's not to like? -benjamin
-finstrument-functions leads to unsats to _mm instrinsic wrappers
Build the test case in the patch file with -finstrument-functions, the link will fail with unsat. The problem is gcc instruments the artificial wrappers that will won't be emitted. The patch fixes the problem. Bootstrap and tested on x86-64/linux. Ok for mainline? thanks, David Index: gimplify.c === --- gimplify.c (revision 182083) +++ gimplify.c (working copy) @@ -8048,6 +8048,12 @@ flag_instrument_functions_exclude_p (tre return true; } +/* Avoid instrumenting wrapper functions to builtins. */ + +if (DECL_DISREGARD_INLINE_LIMITS (fndecl) + lookup_attribute (always_inline, DECL_ATTRIBUTES (fndecl))) + return true; + return false; } Index: testsuite/gcc.target/i386/instrument-func.c === --- testsuite/gcc.target/i386/instrument-func.c (revision 0) +++ testsuite/gcc.target/i386/instrument-func.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-require-effective-target sse } */ +/* { dg-options -O2 -msse -finstrument-functions } */ + +#include xmmintrin.h + +__attribute__((noinline)) __m128 foo( float *row, int x) +{ + __m128 vals = _mm_loadu_ps(row + x); + return vals; +} + +int main() +{ + float f[10]; + foo(f, 5); + return 0; +} cg1 Description: Binary data cg2 Description: Binary data
Re: -finstrument-functions leads to unsats to _mm instrinsic wrappers
On Wed, Dec 7, 2011 at 4:01 PM, Xinliang David Li davi...@google.com wrote: Build the test case in the patch file with -finstrument-functions, the link will fail with unsat. The problem is gcc instruments the artificial wrappers that will won't be emitted. The patch fixes the problem. Bootstrap and tested on x86-64/linux. I think you really should be checking for artificial attribute and not looking at always_inline. Thanks, Andrew Pinski
Re: -finstrument-functions leads to unsats to _mm instrinsic wrappers
On Wed, Dec 7, 2011 at 4:25 PM, Andrew Pinski pins...@gmail.com wrote: On Wed, Dec 7, 2011 at 4:01 PM, Xinliang David Li davi...@google.com wrote: Build the test case in the patch file with -finstrument-functions, the link will fail with unsat. The problem is gcc instruments the artificial wrappers that will won't be emitted. The patch fixes the problem. Bootstrap and tested on x86-64/linux. I think you really should be checking for artificial attribute and not looking at always_inline. Interesting -- I thought I added that. The following is the revised one. thanks, David Thanks, Andrew Pinski Index: gimplify.c === --- gimplify.c (revision 182083) +++ gimplify.c (working copy) @@ -8048,6 +8048,13 @@ flag_instrument_functions_exclude_p (tre return true; } +/* Avoid instrumenting wrapper functions to builtins. */ + +if (DECL_ARTIFICIAL (fndecl) + DECL_DISREGARD_INLINE_LIMITS (fndecl) + lookup_attribute (always_inline, DECL_ATTRIBUTES (fndecl))) + return true; + return false; } Index: testsuite/gcc.target/i386/instrument-func.c === --- testsuite/gcc.target/i386/instrument-func.c (revision 0) +++ testsuite/gcc.target/i386/instrument-func.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-require-effective-target sse } */ +/* { dg-options -O2 -msse -finstrument-functions } */ + +#include xmmintrin.h + +__attribute__((noinline)) __m128 foo( float *row, int x) +{ + __m128 vals = _mm_loadu_ps(row + x); + return vals; +} + +int main() +{ + float f[10]; + foo(f, 5); + return 0; +}
Re: [C++ PATCH] Avoid ICE on auto non-static data members (PR c++/51401)
OK. Jason
[4.7][google] Adding a new option -fstack-protector-strong. (issue5461043)
Hi, this patch provides a new stack protection option - fstack-protector-strong. Background - some times stack-protector is too-simple while stack-protector-all over-kills, for example, to build one of our core systems, we forcibly add -fstack-protector-all to all compile commands, which brings big performance penalty (due to extra stack guard/check insns on function prologue and epilogue) on both atom and arm. To use -fstack-protector is just regarded as not secure enough (only protects 2% functions) by the system secure team. So I'd like to add the option -fstack-protector-strong, that hits the balance between -fstack-protector and -fstack-protector-all. Benefit - gain big performance while sacrificing little security (for scenarios using -fstack-protector-all) Status - implemented internally, to be up-streamed or merged to google branch only. Detail - https://docs.google.com/a/google.com/document/d/1xXBH6rRZue4f296vGt9YQcuLVQHeE516stHwt8M9xyU/edit?hl=en_US Tested - manually, built chrome browser using the modified compiler with -fstack-protector-strong. = gcc/ChangeLog: * cfgexpand.c (expand_used_vars): Add logic handling stack-protector-strong. (is_local_address_taken): Internal function that returns true when gimple contains an address taken on function local variables. (record_or_union_type_has_array): New, tests if a record or union type contains an array. * common.opt (fstack-protector-all): New option. gcc/testsuite/ChangeLog * gcc.dg/fstack-protector-strong.c: New. == diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 8684721..1d9df87 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -1507,6 +1507,50 @@ estimated_stack_frame_size (struct cgraph_node *node) return size; } +static int is_local_address_taken(tree t) { + if (t TREE_CODE(t) == ADDR_EXPR) { +int i; +tree local_var; +tree v = TREE_OPERAND(t, 0); +switch (TREE_CODE(v)) { +case MEM_REF: + for (i = 0; i TREE_OPERAND_LENGTH(v); ++i) +if (is_local_address_taken(TREE_OPERAND(v, i))) + return 1; + return 0; +case COMPONENT_REF: + while (v TREE_CODE(v) == COMPONENT_REF) +v = TREE_OPERAND(v, 0); + break; +case VAR_DECL: +default: + ; +} +if (v TREE_CODE(v) == VAR_DECL !is_global_var(v)) { + FOR_EACH_VEC_ELT(tree, cfun-local_decls, i, local_var) { +if (local_var == v) + return 1; + } +} + } + return 0; +} + +static int record_or_union_type_has_array(tree tree_type) { + tree fields = TYPE_FIELDS(tree_type); + tree f; + for (f = fields; f; f = DECL_CHAIN (f)) { +if (TREE_CODE(f) == FIELD_DECL) { + tree field_type = TREE_TYPE(f); + if (RECORD_OR_UNION_TYPE_P(field_type)) +return record_or_union_type_has_array(field_type); + if (TREE_CODE(field_type) == ARRAY_TYPE) +return 1; +} + } + return 0; +} + /* Expand all variables used in the function. */ static void @@ -1516,6 +1560,8 @@ expand_used_vars (void) VEC(tree,heap) *maybe_local_decls = NULL; unsigned i; unsigned len; + int gen_stack_protect_signal = 0; + basic_block bb; /* Compute the phase of the stack frame for this function. */ { @@ -1548,6 +1594,63 @@ expand_used_vars (void) } } + /* Examine each basic block for address taking of local variables. */ + FOR_EACH_BB(bb) { +gimple_stmt_iterator si; +/* Scanning phis. */ +for (si = gsi_start_phis(bb); !gsi_end_p(si); gsi_next(si)) { + gimple phi_stmt = gsi_stmt(si); + unsigned int i; + for (i = 0; i gimple_phi_num_args(phi_stmt); ++i) +if (is_local_address_taken(gimple_phi_arg(phi_stmt, i)-def)) + ++gen_stack_protect_signal; +} +/* Scanning assignments and calls. */ +for (si = gsi_start_bb(bb); !gen_stack_protect_signal !gsi_end_p(si); + gsi_next(si)) { + gimple stmt = gsi_stmt (si); + if (is_gimple_assign(stmt)) { +switch(gimple_assign_rhs_class(stmt)) { +case GIMPLE_TERNARY_RHS: + if (is_local_address_taken(gimple_assign_rhs3(stmt))) { +++gen_stack_protect_signal; +break; + } + /* Otherwise, fall through. */ +case GIMPLE_BINARY_RHS: + if (is_local_address_taken(gimple_assign_rhs2(stmt))) { +++gen_stack_protect_signal; +break; + } +case GIMPLE_SINGLE_RHS: +case GIMPLE_UNARY_RHS: + if (is_local_address_taken(gimple_assign_rhs1(stmt))) +++gen_stack_protect_signal; + break; +case GIMPLE_INVALID_RHS: + break; +} + } + + if (!gen_stack_protect_signal is_gimple_call(stmt)) { +int ii, num_arg = gimple_call_num_args(stmt); +for (ii = 0; !gen_stack_protect_signal ii num_arg; ++ii) + if
Re: [4.7][google] Adding a new option -fstack-protector-strong. (issue5461043)
On Wed, Dec 7, 2011 at 5:07 PM, Han Shen shen...@google.com wrote: + /* Examine each basic block for address taking of local variables. */ I don't think you need to look at the basic blocks to figure out if a local variable has its address taken. You can look through referenced variables and see if it is a local variable and TREE_ADDRESSABLE is set. This should speed up the code a lot. Though that might has some false positives with arrays but that is ok because you are checking if there are any arrays already anyways. Thanks, Andrew Pinski
Re: [Patch, i386] [4.6] Backport movd*_internal_rex64 fix from trunk and [4.6/4.7] add testcase
On Wed, Dec 7, 2011 at 11:32 PM, Teresa Johnson tejohn...@google.com wrote: An issue turned up in our internal 4.6 based testing that has been fixed on trunk. This patch backports the fix to 4.6. I also have a small test case that I will add to both 4.6 and 4.7. Bootstrapped and checked with x86_64-unknown-linux-gnu. 2011-12-07 Teresa Johnson tejohn...@google.com Backport from mainline: 2011-08-05 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*movdi_internal_rex64): Use !o constraint instead of !m for operand 0, alternative 4. (*movdf_internal_rex64): Ditto for operand 0, alernative 6. 2011-12-07 Teresa Johnson tejohn...@google.com * gcc.target/i386/movdi-rex64.c: New. Index: testsuite/gcc.target/i386/movdi-rex64.c === --- testsuite/gcc.target/i386/movdi-rex64.c (revision 0) +++ testsuite/gcc.target/i386/movdi-rex64.c (revision 0) @@ -0,0 +1,11 @@ +/* { dg-do compile { target *-*-linux* } } */ +/* { dg-options -fPIE -Wwrite-strings } */ + +#include string.h +static __thread char buffer[25]; +const char * error_message (void) +{ +oops: + strcpy (buffer, Unknown code ); + return 0; +} You don't need #include for compile tests, just use: --cut here-- /* { dg-do compile } */ /* { dg-options -fPIE } */ char *strcpy (char *dest, const char *src); static __thread char buffer[25]; const char * error_message (void) { strcpy (buffer, Unknown code ); return 0; } --cut here-- Also this can be compiled everywhere, not just linux. Ok, I will change the testcase to replace the include and retest. Regarding the linux target restriction, though, I was concerned about the -fPIE option used for the test case. I noticed that in 4.7 there is a pie effective target keyword (check_effective_target_pie in testsuite/lib/target-supports.exp). However, that does not yet exist in 4.6, so rather than backport that as well I added the linux restriction. I see the same restriction in the other tests that use -fpie in gcc.target/i386 (pr39013-[12].c). What do you think? Ah, I see. Then pleasee add back linux target selctor for 4.6 and add fpie effective target check for 4.7. Thanks, Uros.