RE: A case exposing code sink issue
-----Original Message-----
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu
Sent: Tuesday, December 27, 2011 5:10 PM
To: 'Richard Guenther'
Cc: Michael Matz; gcc@gcc.gnu.org
Subject: RE: A case exposing code sink issue

The job to do this is final value replacement, not sinking (we do not sink non-invariant expressions - you'd have to translate them through the loop-closed SSA exit PHI node, certainly doable, patches welcome ;)).

Richard,

In final value replacement, expression &a + D. can be figured out, while &a[i_xxx] failed to be CHRECed, so I'm wondering if we should lower &a[i_xxx] to &a + unitsize(a) * i_xxx first? It seems GCC intends to keep &a[i_xxx] until cfgexpand pass. Or we have to directly modify CHREC algorithm to get it calculated?

Appreciate your kindly help in advance!

Richard,

Now I have a patch working for the case of step i++, by directly modifying scalar evolution algorithm. The following code would be generated after SCCP,

    # i_13 = PHI <i_6(7), k_2(D)(4)>
    a_p.0_4 = &a[i_13];
    MEM[(int *)&a][i_13] = 100;
    i_6 = i_13 + 1;
    if (i_6 <= 999)
      goto <bb 7>;
    else
      goto <bb 6>;

    <bb 6>:
    a_p_lsm.5_11 = &MEM[(void *)&a + 3996B];
    a_p = a_p_lsm.5_11;
    goto <bb 3>;

It looks good, but I still have problem when the case has step i+=k. For this case the value of variable i exiting loop isn't invariant, the algorithm below in scalar evolution doesn't work on it,

    compute_overall_effect_of_inner_loop()
    {
      ...
      tree nb_iter = number_of_latch_executions (inner_loop);

      if (nb_iter == chrec_dont_know)
        return chrec_dont_know;
      else
        {
          tree res;

          /* evolution_fn is the evolution function in LOOP.  Get
             its value in the nb_iter-th iteration.  */
          res = chrec_apply (inner_loop->num, evolution_fn, nb_iter);

          if (chrec_contains_symbols_defined_in_loop (res, loop->num))
            res = instantiate_parameters (loop, res);

          /* Continue the computation until ending on a parent of LOOP.  */
          return compute_overall_effect_of_inner_loop (loop, res);
        }
    }

In theory, we can still have the transformation like below even if the step is i+=k,

    # i_13 = PHI <i_6(7), k_2(D)(4)>
    i_14 = i_13,
    a_p.0_4 = &a[i_13];
    MEM[(int *)&a][i_13] = 100;
    i_6 = i_13 + k_2(D);    // i+=k
    if (i_6 <= 999)
      goto <bb 7>;
    else
      goto <bb 6>;

    <bb 6>:
    a_p_lsm.5_11 = &a[i_14];
    a_p = a_p_lsm.5_11;
    goto <bb 3>;

But I realize this is not a loop closed SSA form at all, because i_14 is being used out of the loop. Where could we extend the liverange of variable i in GCC infrastructure and finally solve this problem?

Thanks,
-Jiangning
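For readers who want to see what "the final value" means at the source level, here is a minimal C sketch. The original test case is not included in this message, so the loop below is only a reconstruction from the dump (an assumed `int a[1000]`, a global pointer `a_p`, and a loop that runs while the incremented index is at most 999). The helper names are hypothetical, and the closed form is plain arithmetic, not what SCCP or the scalar evolution code computes internally.

```c
enum { N = 1000 };   /* assumed array size, so the last valid index is 999 */

static int a[N];
static int *a_p;

/* Loop shaped like the dump: the body runs at least once and the loop
   continues while the incremented index is still <= 999.
   Assumes 0 <= k <= 999 and step >= 1. */
static void run_loop(int k, int step)
{
    int i = k;
    do {
        a_p = &a[i];
        *a_p = 100;
        i += step;
    } while (i <= N - 1);
}

/* Index used by the last iteration, computed without running the loop. */
static int last_index(int k, int step)
{
    return k + ((N - 1 - k) / step) * step;
}

/* Returns 1 when the closed form agrees with the loop for every start
   value k in 1..999, both for i++ and for i += k. */
static int closed_form_matches(void)
{
    for (int k = 1; k <= N - 1; k++) {
        run_loop(k, 1);
        if (a_p != &a[last_index(k, 1)])
            return 0;
        run_loop(k, k);
        if (a_p != &a[last_index(k, k)])
            return 0;
    }
    return 1;
}
```

With step 1 the last index is always 999, which corresponds to the constant offset of 3996 bytes in the dump when `int` is 4 bytes wide. With step k the last index still depends on k, which is why the exit value is not a loop-invariant constant in that case.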
Re: FW: a nifty feature for c preprocessor
On 28/12/2011 07:48, R A wrote:

i'm an amateur programmer that just started learning C. i like most of the features, especially the c preprocessor that it comes packed with. it's an extremely portable way of implementing metaprogramming in C. though i've always thought it lacked a single feature -- an evaluation feature.

I think you have missed the point about the C pre-processor. It is not a metaprogramming language - it is a simple text substitution macro processor. It does not have any understanding of the symbols (except for #) in the code, nor does it support recursion - it's pure text substitution. Your suggestion would therefore need a complete re-design of the C pre-processor. And the result is not a feature that people would want.

Many uses of the C pre-processor are deprecated with modern use of C and C++. Where possible, it is usually better programming practice to use a static const instead of a simple numeric #define, and a static inline function instead of a function-like macro. With C++, even more pre-processor functionality can be replaced by language features - templates give you metaprogramming. There are plenty of exceptions, of course, but in general it is better to use a feature that is part of the language itself (C or C++) rather than the preprocessor.

It looks like you are wanting to get the compiler to pre-calculate results rather than have them calculated at run-time. That's a good idea - so the gcc developers have worked hard to make the compiler do that in many cases. If your various expressions here boil down to constants that the compiler can see, and you have at least some optimisation enabled, then it will pre-calculate the results.

If you have particular need of more complicated pre-processing, then what you want is generally some sort of code generator. C has a simple enough syntax - write code in any language you want (C itself, or anything else) that outputs a C file. I've done that a few times, such as for scripts to generate CRC tables.

And if you really want to use a pre-processing macro style, then there are more powerful languages suited to that. You could use PHP, for example - while the output of a PHP script is usually HTML, there is no reason why it couldn't be used as a C pre-processor.

say i have these definitions:

    #define MACRO_1 (x/y)*y
    #define MACRO_2 sqrt(a)
    #define MACRO_3 calc13()
    #define MACRO_15 (a + b)/c

now, all throughout the codebase, whenever and whichever of MACRO_1, or MACRO_2 (or so forth) needs to be called, they are conveniently indexed by another macro expansion:

    #define CONCAT(a, b) a##b
    #define CONCAT_VAR(a, b) CONCAT(a, b)
    #define MASTER_MACRO(N) CONCAT_VAR(MACRO_, N)

now, if we use MASTER_MACRO with a direct value:

    MASTER_MACRO(10)

or

    #define N 10
    MASTER_MACRO(10)

both will work. but substitute this with:

    #define N ((5*a)/c + (10*b)/c + ((5*a) % c + (10*b) % c)/c)

and MASTER_MACRO expands to:

    MACRO_((5*a)/c + (10*b)/c + ((5*a) % c + (10*b) % c)/c)

which, of course is wrong. there are other workarounds or many times this scheme can be avoided altogether. but it can be made to work (elegantly) by adding an eval preprocessor operation: so we redefine MASTER_MACRO this way:

    #define MASTER_MACRO(N) CONCAT_VAR(MACRO_, eval(N))

which evaluates correctly.

this nifty trick (though a bit more extended than what i elaborated above) can also be used to *finally* have increments and decrements (among others). since eval forces the evaluation of an *arithmetic* expression (for now), it will force the evaluation of an expression, then define it to itself. this will of course trigger a redefinition flag from our beloved preprocessor, but the defined effect would be:

    #define X (((14*x)/y)/z)    /* say this evaluates to simply 3 */

incrementing X, will simply be:

    #define X eval(eval(X) + 1) /* 1) will be evaluated as 4 before any token substitution */
    #define X eval(eval(X) + 1) /* 2) will be evaluated as 5 before any token substitution */

that easy. to suppress the redef warnings, we can have another directive like force_redef (which can only work in conjunction with eval):

    #force_redef X eval(eval(X) + 1)

i'm just confused :-S... why hasn't this been suggested? i would love to have this incorporated (even just on test builds) to gcc. it would make my code so, so much more manageable and virtually extensible to more platforms.

i would love to have a go at it and probably modify the gcc preprocessor, but since i know nothing of its implementation details, i don't know where to begin. i was hoping that this being a gnu implementation, it's been heavily modularized (the fact that gcc was heavily revised back then to use abstract syntax trees, gimple, etc, past version 2.95 -- ???). so i can easily interrupt the
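The indexing scheme described in this message can be tried out with a small sketch. The macro values 100 and 200, the name INDEX, and the two helper functions are hypothetical stand-ins chosen so the result is easy to check; only CONCAT, CONCAT_VAR and MASTER_MACRO come from the message itself.

```c
#define MACRO_1 100   /* hypothetical values, standing in for real expressions */
#define MACRO_2 200

#define CONCAT(a, b) a##b
#define CONCAT_VAR(a, b) CONCAT(a, b)
#define MASTER_MACRO(N) CONCAT_VAR(MACRO_, N)

#define INDEX 2

/* A literal index is pasted directly: MACRO_ ## 1 becomes MACRO_1. */
static int via_literal(void)
{
    return MASTER_MACRO(1);
}

/* An object-like macro is expanded to its replacement text (2) before the
   inner CONCAT pastes it, which is the purpose of the extra CONCAT_VAR level. */
static int via_macro(void)
{
    return MASTER_MACRO(INDEX);
}

/* Not compiled: MASTER_MACRO((1 + 1)) would try to paste MACRO_ with "(",
   which does not form a valid token. The preprocessor substitutes text here;
   it never computes 1 + 1. */
```

The extra level of indirection is what lets a macro-valued index work, but it only ever substitutes tokens, so an index written as an arithmetic expression stays an expression.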
return vs simple_return
Hi --

I've run into a problem with the MicroBlaze backend where it is not recognizing a return pattern. I'm trying to modify the back end to use the 'simple_return' pattern, rather than 'return', since MicroBlaze has exactly what the documentation describes: a no-frills return instruction which does nothing more than branch back to the caller.

When I define only 'simple_return', there are undefined references in function.c for emit_return_into_block() and emit_use_return_register_into_block(), since these are defined when HAVE_return is defined.

MIPS has a similar call/return model, with a trivial return instruction. mips.md defines expanders for both 'return' and 'simple_return' and identical insns for both which generate the return jump.

ARM also has a simple return, but the back end defines 'return' and does not define 'simple_return'.

My guess is that the #ifdef HAVE_return in function.c which surrounds the undefined functions should be removed.

What is the correct model for the back end? Define only 'return' like ARM, define both 'return' and 'simple_return' like MIPS, or define only 'simple_return' like I tried to do?

--
Michael Eager ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
RE: a nifty feature for c preprocessor
yes, i do realize that c preprocessor is but a text substitution tool from days past when programmers were only starting to develop the rudiments of high-level programming. but the reason i'm sticking with the c preprocessor is the fact that code that i write with it is extremely portable. copy the code and you can use it in any IDE or stand-alone compiler, it's as simple as that. i have considered using gnu make, writing scripts with m4 and other parsers or lexers, but sticking with the preprocessor's minimalism is still too attractive an idea.

about the built in features in c and C++ to alleviate the extensive use of the preprocessor, like inline functions, static consts. the fact is NOT ALL compilers out there would optimize a function so that it will not have to use a return stack. simply using a macro FORCES the compiler to do so. the same goes for static const, if you use a precompiled value, you are forcing an immediate addressing, something of a good optimization. so it's still mostly an issue of portability of optimization.

templates, i have no problem with, i wish there could be a C dialect that can integrate it, so i wouldn't have to be forced to use C++ and all the bloat that usually comes from a lot of its implementations (by that i mean a performance close to C i think is very possible for C++'s library).

but, of course, one has to ask if you're making your code portable to any C compiler, why do you want gcc to change (or modify it for your own use)? you should be persuading the c committee. well, that's the thing, it's harder to do the latter, so by doing this, i can demonstrate that it's a SIMPLE, but good idea.

Date: Wed, 28 Dec 2011 10:57:28 +0100
From: da...@westcontrol.com
To: ren_zokuke...@hotmail.com
CC: gcc@gcc.gnu.org
Subject: Re: FW: a nifty feature for c preprocessor
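The macro versus inline-function argument in this exchange is easier to weigh with something concrete. The sketch below uses hypothetical names (SQUARE_MACRO, square, scale, next_value) and makes no claim about which code any particular compiler emits. It only shows a behavioural difference that holds on every conforming C99 compiler: a function-like macro pastes its argument text wherever the parameter appears, while a static inline function evaluates its argument exactly once.

```c
/* Macro version: the argument text appears twice in the expansion. */
#define SQUARE_MACRO(x) ((x) * (x))

/* Typed constant instead of "#define SCALE 3". */
static const int scale = 3;

/* Function version: the argument is evaluated once, then multiplied. */
static inline int square(int x)
{
    return x * x;
}

static int scaled_square(int x)
{
    return scale * square(x);
}

/* Counter used to observe how often an argument expression is evaluated. */
static int calls;

static int next_value(void)
{
    return ++calls;
}

static int calls_made_by_macro(void)
{
    calls = 0;
    (void)SQUARE_MACRO(next_value());   /* expands to two calls */
    return calls;
}

static int calls_made_by_inline(void)
{
    calls = 0;
    (void)square(next_value());         /* one call */
    return calls;
}
```

Whether `square` ends up inlined is still the compiler's decision, so this does not settle the portability-of-optimisation question raised above; it only shows that the two forms are not interchangeable text.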
Re: a nifty feature for c preprocessor
On 28 December 2011 20:57, R A wrote:

templates, i have no problem with, i wish there could be a C dialect that can integrate it, so i wouldn't have to be forced to use C++ and all the bloat that usually come from a lot of it's implementation (by that i mean a performance close to C i think is very possible for C++'s library).

What bloat? If you only use the subset of C++ that is compatible with C then you don't get any additional cost, you are not forced to use anything, or to get any mythical bloat.

but, of course, one has to ask if you're making your code portable to any C compiler, why do you want gcc to change (or modify it for your own use)? you should be persuading the c committee. well, that's the thing, it's harder to do the latter, so by doing this, i can demonstrate that it's a SIMPLE, but good idea.

It's not simple, or IMHO a good idea.
Re: a nifty feature for c preprocessor
On 28/12/11 21:57, R A wrote:

yes, i do realize that c preprocessor is but a text substitution tool from days past when programmers where only starting to develop the rudimentaries of high-level programming. but the reason i'm sticking with the c preprocessor if the fact that code that i write from it is extremely portable. copy the code and you can use it in any IDE or stand-alone compiler, it's as simple as that. i have considered using gnu make, writing scripts with m4 and other parsers or lexers, but sticking with the preprocessor's minimalism is still too attractive an idea.

If you want portable, use features that already exist. Lots of people write lots of C code that is portable across huge ranges of compilers and target processors.

And if you want portable pre-processing or code generation, use something that generates the code rather than inventing tools and features that don't exist, nor will ever exist. It is also quite common to use scripts in languages like perl or python to generate tables and other pre-calculated values for inclusion in C code.

about the built in features in c and C++ to alleviate the extensive use for the preprocessor, like inline functions, static consts. the fact is NOT ALL compilers out there would optimize a function so that it will not have to use a return stack. simply using a macro FORCES the compiler to do so. the same goes for static const, if you use a precompiled value, you are forcing an immediate addressing, something of a good optimization. so it's still mostly an issue of portability of optimization.

Most modern compilers will do a pretty reasonable job of constant propagation and calculating expressions using constant values. And most will apply inline as you would expect, unless you intentionally hamper the compiler by not enabling optimisations. Using macros, incidentally, does not FORCE the compiler to do anything - I know at least one compiler that will take common sections of code (from macros or normal text) and refactor it into artificial functions, expending stack space and run time speed to reduce code size. And immediate addressing is not necessarily a good optimisation - beware making generalisations like that. Let the compiler do what it is good at doing - generating optimal code for the target in question - and don't try to second-guess it. You will end up with bigger and slower code.

templates, i have no problem with, i wish there could be a C dialect that can integrate it, so i wouldn't have to be forced to use C++ and all the bloat that usually come from a lot of it's implementation (by that i mean a performance close to C i think is very possible for C++'s library).

C++ does not have bloat. The only feature of C++ that can occasionally lead to larger or slower code, or fewer optimisations, than the same code in C is exceptions - if you don't need them, disable them with -fno-exceptions. Other than that C++ is zero cost compared to C - you only pay for the features you use.

but, of course, one has to ask if you're making your code portable to any C compiler, why do you want gcc to change (or modify it for your own use)? you should be persuading the c committee. well, that's the thing, it's harder to do the latter, so by doing this, i can demonstrate that it's a SIMPLE, but good idea.

It's not a good idea, and it would not be simple to implement.

I really don't want to discourage someone from wanting to contribute to gcc development, but this is very much a dead-end idea. I applaud your enthusiasm, but keep a check on reality - you are an amateur just starting C programming. C has been used for the last forty years - with gcc coming up for its 25th birthday this spring. If this idea were that simple, and that good, it would already be implemented. As you gain experience and knowledge with C (and possibly C++), you will quickly find that a preprocessor like you describe is neither necessary nor desirable.

mvh.,

David
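The code-generator approach recommended in this thread (generating tables and other pre-calculated values for inclusion in C code) can be sketched in C itself. This is only an illustration under stated assumptions: it uses the common reflected CRC-32 polynomial 0xEDB88320, and the function names and the emitted table name `crc32_table` are hypothetical. It is not the script the author mentions having written.

```c
#include <stdint.h>
#include <stdio.h>

/* One entry of the byte-wise CRC-32 lookup table (reflected polynomial
   0xEDB88320): shift the index right eight times, xoring in the
   polynomial whenever a 1 bit falls out. */
static uint32_t crc32_table_entry(uint32_t index)
{
    uint32_t value = index;
    for (int bit = 0; bit < 8; bit++)
        value = (value & 1u) ? (value >> 1) ^ 0xEDB88320u : value >> 1;
    return value;
}

/* Write the whole table as C source text, four entries per line. */
static void emit_crc32_table(FILE *out)
{
    fprintf(out, "static const unsigned long crc32_table[256] = {\n");
    for (uint32_t i = 0; i < 256; i++)
        fprintf(out, "    0x%08lXUL,%s",
                (unsigned long)crc32_table_entry(i),
                (i % 4 == 3) ? "\n" : "");
    fprintf(out, "};\n");
}
```

Calling `emit_crc32_table(stdout)` from a small `main` and redirecting the output to a header gives a file that any C compiler can include, so the portability of the final code does not depend on the tool that generated it.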
RE: a nifty feature for c preprocessor
And if you want portable pre-processing or code generation, use something that generates the code rather than inventing tools and features that don't exist, nor will ever exist. It is also quite common to use scripts in languages like perl or python to generate tables and other pre-calculated values for inclusion in C code.

though there are things that i will not disclose, i've never had to invent any tools for the project i'm working on; everything is legit. this is the only time that i've had to. so believe me when i say i've considered all *conventional* solutions.

Most modern compilers will do a pretty reasonable job of constant propagation and calculating expressions using constant values. And most will apply inline as you would expect, unless you intentionally hamper the compiler by not enabling optimisations. Using macros, incidentally, does not FORCE the compiler to do anything - I know at least one compiler that will take common sections of code (from macros or normal text) and refactor it artificial functions, expending stack space and run time speed to reduce code size. And immediate addressing is not necessarily a good optimisation - beware making generalisations like that. Let the compiler do what it is good at doing - generating optimal code for the target in question - and don't try to second-guess it. You will end up with bigger and slower code.

i'm not one to share techniques/methodologies, 1) but if it's the case for more than, say 70%, of systems/processors and 2) it takes very little penalty; then i'd write it that way. if it's not optimized, just let the compiler (if it's as good as you say it is) re-optimize it. if the compiler ain't good enough to do that, well it's not a good compiler anyway. but the code will still work.

I really don't want to discourage someone from wanting to contribute to gcc development, but this is very much a dead-end idea. I applaud your enthusiasm, but keep a check on reality - you are an amateur just starting C programming. C has been used for the last forty years - with gcc coming up for its 25th birthday this spring. If this idea were that simple, and that good, it would already be implemented. As you gain experience and knowledge with C (and possibly C++), you will quickly find that a preprocessor like you describe is neither necessary nor desirable.

you know there's no way i can answer that without invoking the wrath of the community.
FW: a nifty feature for c preprocessor
sorry: 2) it takes very little penalty, otherwise.
Re: a nifty feature for c preprocessor
that all being said, i really don't think it's a hard feature to implement. like i said, whenever 1) there is an evaluation in the conditional directives or 2) #define is called, look for eval; if it's there, evaluate the expression, then substitute the token. the rest of the code needs no tampering at all.

libcpp's implementation is great, neatly divided. probably have to edit only half a dozen files, at most -- at least from what i can tell from scanning the code. it'll just take me a long time to learn how to work with setting all the flags, attributes, and working with the structs, so it's hard for me to do by myself.
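For comparison with the proposed eval, standard C already evaluates integer constant expressions in one place, the conditional directives. The sketch below is a conventional workaround, not the proposed feature: it picks one of a fixed set of macros with an #if chain. INDEX_EXPR, SELECTED and the values 10, 20 and 30 are hypothetical, and the approach only works when every operand is itself a preprocessor constant rather than a run-time variable.

```c
#define MACRO_1 10
#define MACRO_2 20
#define MACRO_3 30

#define A 6
#define C 3
#define INDEX_EXPR ((5 * A) / C - 8)   /* (30 / 3) - 8 = 2 */

/* #if evaluates INDEX_EXPR as an integer constant expression. */
#if INDEX_EXPR == 1
#  define SELECTED MACRO_1
#elif INDEX_EXPR == 2
#  define SELECTED MACRO_2
#elif INDEX_EXPR == 3
#  define SELECTED MACRO_3
#else
#  error "INDEX_EXPR is outside the handled range 1..3"
#endif

static int selected_value(void)
{
    return SELECTED;
}
```

The chain has to list every index it supports, which is exactly the repetition the eval proposal wants to avoid, but it compiles with any conforming preprocessor.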
[Bug ada/51691] New: Cast of an array with type generates a please file bug message (See below)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51691

Bug #: 51691
Summary: Cast of an array with type generates a please file bug message (See below)
Classification: Unclassified
Product: gcc
Version: 4.4.5
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: ada
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ale...@m2osw.com

Created attachment 26193
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26193
Case Folding implementation for my own Ada compiler

---

prompt> gnatmake case_folding
gcc-4.4 -c case_folding.adb
+========================GNAT BUG DETECTED=========================+
| 4.4.5 (x86_64-pc-linux-gnu) Assert_Failure sinfo.adb:880         |
| Error detected at case_folding.adb:401:32                        |
| Please submit a bug report; see http://gcc.gnu.org/bugs.html.    |
| Use a subject line meaningful to you and us to track the bug.    |
| Include the entire contents of this bug box in the report.       |
| Include the exact gcc-4.4 or gnatmake command that you entered.  |
| Also include sources listed below in gnatchop format             |
| (concatenated together with no headers between files).           |
+==================================================================+

Please include these source files with error report
Note that list may not be accurate in some cases,
so please double check that the problem can still
be reproduced with the set of files listed.

case_folding.adb

case_folding.adb:401:53: missing )
compilation abandoned
gnatmake: case_folding.adb compilation error

---

As I type fast, the error came from this line:

    output_line(1 .. indent) := string(1 .. indent => ' ');

which includes an invalid cast, the proper line should be (without string):

    output_line(1 .. indent) := (1 .. indent => ' ');

There are still problems on line 403 which I left in case the bug would not be reported without that other error (unlikely though.)

Just in case, I'm on Ubuntu 11.04. I use the stock version of Ada.

---

More info about my project can be found here: http://aada.m2osw.com/compiler
[Bug tree-optimization/51684] [4.7 Regression]: ICE in gfortran.dg/maxloc_bounds_5 on ia64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51684 --- Comment #2 from Uros Bizjak ubizjak at gmail dot com 2011-12-28 09:06:45 UTC --- (In reply to comment #1) Untested patch: I have bootstrapped and regression tested the patch on ia64-unknown-linux-gnu [1], where it fixes all mentioned failures. [1] http://gcc.gnu.org/ml/gcc-testresults/2011-12/msg02709.html
[Bug rtl-optimization/51667] [4.7 Regression] new FAIL: 27_io/basic_*stream/* execution test with -m32
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51667 --- Comment #19 from Uros Bizjak ubizjak at gmail dot com 2011-12-28 09:09:02 UTC --- FYI, the patch also works correctly on alpha [1], a target with sign-extended instructions. [1] http://gcc.gnu.org/ml/gcc-testresults/2011-12/msg02710.html
[Bug target/50038] redundant zero extensions
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50038

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed      |Added
             Status|NEW          |RESOLVED
         Resolution|             |FIXED
   Target Milestone|---          |4.7.0

--- Comment #9 from Uros Bizjak <ubizjak at gmail dot com> 2011-12-28 09:13:03 UTC ---
Patch was committed to mainline.
[Bug testsuite/50722] FAIL: gcc.dg/pr49994-3.c (test for excess errors)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50722

--- Comment #7 from uros at gcc dot gnu.org 2011-12-28 09:16:28 UTC ---
Author: uros
Date: Wed Dec 28 09:16:24 2011
New Revision: 182704

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182704
Log:
    PR testsuite/50722
    * gcc.dg/pr49994-3.c: Skip on ia64-*-*-*, hppa*-*-* and *-*-hpux*.

Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/pr49994-3.c
[Bug tree-optimization/51684] [4.7 Regression]: ICE in gfortran.dg/maxloc_bounds_5 on ia64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51684

--- Comment #3 from irar at gcc dot gnu.org 2011-12-28 09:20:20 UTC ---
Author: irar
Date: Wed Dec 28 09:20:16 2011
New Revision: 182705

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182705
Log:
    PR tree-optimization/51684
    * tree-vect-slp.c (vect_schedule_slp_instance): Get gsi of original
    statement in case of a pattern.
    (vect_schedule_slp): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-slp.c
[Bug target/51685] FAIL: gcc.dg/tm/pr51472.c (internal compiler error) on ppc*-*-*, s390*-*-*, spu-*-*
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51685

Hans-Peter Nilsson <hp at gcc dot gnu.org> changed:

           What    |Removed      |Added
             Status|UNCONFIRMED  |NEW
   Last reconfirmed|             |2011-12-28
                 CC|             |hp at gcc dot gnu.org
     Ever Confirmed|0            |1

--- Comment #1 from Hans-Peter Nilsson <hp at gcc dot gnu.org> 2011-12-28 09:38:27 UTC ---
I checked my logs for r182695 (latest at this time) and yes, cris-axi-elf too, same message. A quick peek in gcc-testresults@ shows the same error for armv7l-unknown-linux-gnueabi (http://gcc.gnu.org/ml/gcc-testresults/2011-12/msg02689.html) and ia64-linux (http://gcc.gnu.org/ml/gcc-testresults/2011-12/msg02709.html) so it looks almost universal.
[Bug tree-optimization/51684] [4.7 Regression]: ICE in gfortran.dg/maxloc_bounds_5 on ia64
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51684

Ira Rosen <irar at il dot ibm.com> changed:

           What    |Removed      |Added
             Status|ASSIGNED     |RESOLVED
         Resolution|             |FIXED

--- Comment #4 from Ira Rosen <irar at il dot ibm.com> 2011-12-28 10:22:07 UTC ---
Fixed.
[Bug tree-optimization/51692] New: [4.7 Regression] ICE on several valgrind tests
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51692

Bug #: 51692
Summary: [4.7 Regression] ICE on several valgrind tests
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Keywords: ice-on-valid-code
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ja...@gcc.gnu.org
Target: x86_64-linux

    int
    main ()
    {
      volatile double d = 0.0;
      double *p = __builtin_calloc (1, sizeof (double));
      d += 1.0;
      *p += 2.0;
      __builtin_free (p);
      return 0;
    }

ICEs at -O2, the free argument becomes a freed SSA_NAME for some reason. Started with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182009
[Bug tree-optimization/51692] [4.7 Regression] ICE on several valgrind tests
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51692

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed      |Added
   Target Milestone|---          |4.7.0
[Bug testsuite/51693] New: New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

Bug #: 51693
Summary: New XPASSes in vectorizer testsuite on powerpc64-suse-linux
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: testsuite
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: i...@il.ibm.com
CC: michael.v.zolotuk...@gmail.com
Host: powerpc64-suse-linux
Target: powerpc64-suse-linux
Build: powerpc64-suse-linux

Revision 182583 http://gcc.gnu.org/viewcvs?view=revision&revision=182583 caused several XPASSes on powerpc64-suse-linux:

XPASS: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect Alignment of access forced using peeling 2
XPASS: gcc.dg/vect/vect-multitypes-1.c scan-tree-dump-times vect Vectorizing an unaligned access 4
XPASS: gcc.dg/vect/vect-peel-3.c scan-tree-dump-times vect Vectorizing an unaligned access 1
XPASS: gcc.dg/vect/vect-peel-3.c scan-tree-dump-times vect Alignment of access forced using peeling 1
XPASS: gcc.dg/vect/vect-multitypes-1.c -flto scan-tree-dump-times vect Alignment of access forced using peeling 2
XPASS: gcc.dg/vect/vect-multitypes-1.c -flto scan-tree-dump-times vect Vectorizing an unaligned access 4
XPASS: gcc.dg/vect/vect-peel-3.c -flto scan-tree-dump-times vect Vectorizing an unaligned access 1
XPASS: gcc.dg/vect/vect-peel-3.c -flto scan-tree-dump-times vect Alignment of access forced using peeling 1
XPASS: gcc.dg/vect/no-section-anchors-vect-69.c scan-tree-dump-times vect Alignment of access forced using peeling 2

The reason is that {!vect_aligned_arrays} was added to xfail of the above checks, while vect_aligned_arrays is false for power.
Changing that, i.e.:

Index: ../../lib/target-supports.exp
===
--- ../../lib/target-supports.exp (revision 182703)
+++ ../../lib/target-supports.exp (working copy)
@@ -3222,7 +3222,8 @@ proc check_effective_target_vect_aligned_arrays {
            set et_vect_aligned_arrays_saved 1
        }
    }
-   if [istarget spu-*-*] {
+   if {[istarget spu-*-*]
+       || [istarget powerpc*-*-*] } {
        set et_vect_aligned_arrays_saved 1
    }
 }

fixes the XPASSes and doesn't cause any problems (on powerpc64-suse-linux), but AFAIU arrays are not always vector aligned on power, so this is not a good idea, unless we change the definition of check_effective_target_vect_aligned_arrays.

What was the purpose of adding {!vect_aligned_arrays} to these tests? If peeling is impossible on AVX because arrays are never vector aligned, maybe we need a new target check instead of vect_aligned_arrays?
[Bug tree-optimization/51694] New: [4.7 Regression] ICE while compiling alliance package
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51694

Bug #: 51694
Summary: [4.7 Regression] ICE while compiling alliance package
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ja...@gcc.gnu.org
CC: mkuvyr...@gcc.gnu.org
Target: x86_64-linux

    void
    foo (x, fn)
      void (*fn) ();
    {
      int a = baz ((void *) 0, x);
      (*fn) (x, 0);
    }

    void
    bar (void)
    {
      void *x = 0;
      foo (x);
    }

ICEs at -O2 starting with http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181377
[Bug tree-optimization/51694] [4.7 Regression] ICE while compiling alliance package
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51694

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed      |Added
   Target Milestone|---          |4.7.0
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #1 from Michael Zolotukhin michael.v.zolotukhin at gmail dot com 2011-12-28 11:08:36 UTC ---
I thought that if {vect_aligned_arrays} isn't true, then arrays could be aligned even after peeling - that's why I added such a check.

Unfortunately, I can't reproduce these failures, as I have no PowerPC. By the way, if arrays aren't aligned on Power, why does GCC produce such messages - does it really try to peel something?

Maybe we should just refine the check? Anyway, if everything is ok with the tests (in the original version) and with gcc itself, we could check not for vect_aligned_arrays, but for AVX. Please check http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600.html and the patch attached to that letter.

Thanks, Michael
[Bug tree-optimization/51694] [4.7 Regression] ICE while compiling alliance package
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51694

Maxim Kuvyrkov mkuvyrkov at gcc dot gnu.org changed:

What             |Removed     |Added
Status           |UNCONFIRMED |ASSIGNED
Last reconfirmed |            |2011-12-28
Ever Confirmed   |0           |1

--- Comment #1 from Maxim Kuvyrkov mkuvyrkov at gcc dot gnu.org 2011-12-28 11:09:29 UTC ---
Will investigate. Jakub, thanks for reporting this.
[Bug debug/51695] [4.7 Regression] ICE while compiling argyllcms package
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51695

Jakub Jelinek jakub at gcc dot gnu.org changed:

What             |Removed           |Added
Component        |tree-optimization |debug
Target Milestone |---               |4.7.0
[Bug tree-optimization/51695] New: [4.7 Regression] ICE while compiling argyllcms package
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51695 Bug #: 51695 Summary: [4.7 Regression] ICE while compiling argyllcms package Classification: Unclassified Product: gcc Version: 4.7.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: ja...@gcc.gnu.org CC: aol...@gcc.gnu.org Target: x86_64-linux typedef struct { struct { unsigned int t1, t2, t3, t4, t5, t6; } t; int p; struct { double X, Y, Z; } r; } T; typedef struct { T *h; } S; static unsigned int v = 0x12345678; int foo (void) { v = (v 0x8000) ? ((v 1) ^ 0xa398655d) : (v 1); return 0; } double bar (void) { unsigned int o; v = (v 0x8000) ? ((v 1) ^ 0xa398655d) : (v 1); o = v 0x; return (double) o / 32768.0; } int baz (void) { foo (); return 0; } void test (S *x) { T *t = x-h; t-t.t1 = foo (); t-t.t2 = foo (); t-t.t3 = foo (); t-t.t4 = foo (); t-t.t5 = foo (); t-t.t6 = foo (); t-p = baz (); t-r.X = bar (); t-r.Y = bar (); t-r.Z = bar (); } ICEs at -O2 -g, starting with http://gcc.gnu.org/viewcvs?root=gccview=revrev=180194
[Bug debug/51695] [4.7 Regression] ICE while compiling argyllcms package
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51695 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org 2011-12-28 11:35:23 UTC --- The NOTE_INSN_VAR_LOCATION argument for variable o is extremely huge in this case and we hit the 64KB limit on .debug_loc expressions.
[Bug target/51345] [avr] Devices with 8-bit SP need their own multilib(s)
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51345 --- Comment #3 from Georg-Johann Lay gjl at gcc dot gnu.org 2011-12-28 12:21:40 UTC --- Created attachment 26194 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26194 tentative patch
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #2 from Ira Rosen irar at il dot ibm.com 2011-12-28 12:27:18 UTC ---
(In reply to comment #1)
> I thought that if {vect_aligned_arrays} isn't true, then arrays could be aligned even after peeling - that's why I added such check.

Sorry, I don't understand this sentence. What do you mean by "aligned after peeling"? Could you please explain what exactly happens on AVX (a dump file with -fdump-tree-vect-details would be the best thing).

> Unfortunately, I can't reproduce these fails, as I have no PowerPC. By the way, if arrays aren't aligned on Power, why does GCC produce such messages - does it really try to peel something?

The arrays in the tests are aligned. I said that I think that we can't promise that all the arrays are vector aligned on power. BTW, we can peel for unknown misalignment as well.

> Maybe we should just refine the check? Anyway, if everything is ok with the tests (in original version) and with gcc itself - we could check not for vect_aligned_arrays, but for AVX. Please check http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600.html and the attached to that letter patch.

I think that everything was ok, but I don't think that using vect_sizes_32B_16B is a good idea. I would really like to see an AVX vect dump for e.g. vect-peel-3.c.

Thanks, Ira

> Thanks, Michael
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693 --- Comment #3 from Michael Zolotukhin michael.v.zolotukhin at gmail dot com 2011-12-28 12:59:24 UTC --- Created attachment 26195 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26195 AVX2 vect dump
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #4 from Michael Zolotukhin michael.v.zolotukhin at gmail dot com 2011-12-28 13:01:51 UTC ---
(In reply to comment #2)
> > I thought that if {vect_aligned_arrays} isn't true, then arrays could be aligned even after peeling - that's why I added such check.
>
> Sorry, I don't understand this sentence. What do you mean by "aligned after peeling"? Could you please explain what exactly happens on AVX (a dump file with -fdump-tree-vect-details would be the best thing).

Sorry, I mistyped. I meant that arrays couldn't be aligned - at least not without some runtime checks. I.e. we can't peel some compile-time-known number of iterations and be sure that the array becomes aligned. E.g., if we have an array IA of ints aligned to 16 bytes, and we have the access IA[i+3], then peeling one iteration will guarantee alignment to 16 bytes. But we don't know how many iterations need to be peeled to reach alignment to 32 bytes (as needed for AVX operations).

> > Unfortunately, I can't reproduce these fails, as I have no PowerPC. By the way, if arrays aren't aligned on Power, why does GCC produce such messages - does it really try to peel something?
>
> The arrays in the tests are aligned. I said that I think that we can't promise that all the arrays are vector aligned on power. BTW, we can peel for unknown misalignment as well.

In this case we shouldn't add Power to vect_aligned_arrays, I guess.

> > Maybe we should just refine the check? Anyway, if everything is ok with the tests (in original version) and with gcc itself - we could check not for vect_aligned_arrays, but for AVX. Please check http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600.html and the attached to that letter patch.
>
> I think that everything was ok, but I don't think that using vect_sizes_32B_16B is a good idea. I would really like to see an AVX vect dump for e.g. vect-peel-3.c.

In vect-peel-3.c we actually assume that the vector length is 16 bytes. Here is the loop body:

  suma += ia[i];
  sumb += ib[i+5];
  sumc += ic[i+1];

When the vector size is 16, peeling can make two of the three accesses aligned, but when the vector size is 32 that's impossible. That's why using vect_sizes_32B_16B might be correct here.

Also, I uploaded the dump you asked for.

Michael

> Thanks, Ira
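The peeling argument in comment #4 can be checked with a small sketch. It assumes 4-byte int elements and array bases that are themselves aligned to the vector size (an assumption that, as the thread notes, may not hold for 32-byte vectors). The function names are hypothetical and are not part of GCC.

```c
#include <assert.h>

/* Number of scalar iterations to peel so that base[i + offset] starts on a
   vector boundary.  Assumes base itself is aligned to vec_size bytes.  */
static unsigned peel_count(unsigned offset, unsigned elem_size,
                           unsigned vec_size)
{
    assert(elem_size != 0 && vec_size % elem_size == 0);
    unsigned elems_per_vec = vec_size / elem_size;
    return (elems_per_vec - offset % elems_per_vec) % elems_per_vec;
}

/* Largest number of accesses that a single common peel count can align.  */
static unsigned max_aligned_by_peeling(const unsigned *offsets, unsigned n,
                                       unsigned elem_size, unsigned vec_size)
{
    unsigned best = 0;
    for (unsigned i = 0; i < n; i++) {
        unsigned want = peel_count(offsets[i], elem_size, vec_size);
        unsigned hits = 0;
        for (unsigned j = 0; j < n; j++)
            if (peel_count(offsets[j], elem_size, vec_size) == want)
                hits++;
        if (hits > best)
            best = hits;
    }
    return best;
}
```

For the loop body ia[i], ib[i+5], ic[i+1] the offsets are {0, 5, 1}. With 16-byte vectors max_aligned_by_peeling gives 2 (peeling three iterations aligns ib and ic), while with 32-byte vectors the three required peel counts are all different, so only one access can be aligned.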
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #5 from Ira Rosen irar at il dot ibm.com 2011-12-28 13:11:53 UTC ---
(In reply to comment #4)
> In vect-peel-3.c we actually assume that vector length is 16 byte. Here is the loop body:
>   suma += ia[i];
>   sumb += ib[i+5];
>   sumc += ic[i+1];
> When vector-size is 16, then peeling can make two of three accesses aligned, but when vector size is 32 that's impossible. That's why using vect_sizes_32B_16B might be correct here.

Ah, now I understand. I was confused by vect_aligned_arrays, and it's irrelevant here, right?

Yes, vect_sizes_32B_16B seems to be ok in that case.

Thanks, Ira
[Bug c++/51680] g++ 4.7 fails to inline trivial template stuff
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51680

Marc Glisse marc.glisse at normalesup dot org changed:

What |Removed |Added
CC   |        |marc.glisse at normalesup dot org

--- Comment #7 from Marc Glisse marc.glisse at normalesup dot org 2011-12-28 13:44:16 UTC ---
With g++-4.6, -O1 -finline-small-functions already inlines everything, so maybe the definition of "small" somehow changed a bit?

g++-4.7 -fdump-ipa-all says that it doesn't inline because "function not declared inline and code size would grow". g++-4.6 only tells me that the code size was unchanged by inlining the 2 calls.
[Bug c++/51547] auto, type deduction, reference collapsing and const: invalid initialization of reference of type 'const X' from expression of type 'const X'
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51547

--- Comment #4 from paolo at gcc dot gnu.org paolo at gcc dot gnu.org 2011-12-28 15:53:01 UTC ---
Author: paolo
Date: Wed Dec 28 15:52:54 2011
New Revision: 182709

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182709
Log:
2011-12-27 Paolo Carlini paolo.carl...@oracle.com

	PR c++/51547
	* g++.dg/cpp0x/pr51547.C: New.

Modified:
    trunk/gcc/testsuite/ChangeLog
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #6 from Oleg Endo oleg.e...@t-online.de 2011-12-28 15:59:35 UTC ---
(In reply to comment #3)
> Created attachment 26191 [details]
> Proposed patch to improve some of the issues.
>
> The attached patch removes the useless sequence and still allows the -1 constant to be CSE-ed for such cases as the example function above.
>
> I haven't run all tests on it yet, but CSiBE shows average code size reduction of approx. -0.1% for -m4* with some code size increases in some files.

Some of the code size increases are caused by the ifcvt.c pass, which tries to transform sequences like:

int test_func_6 (int a, int b, int c)
{
  if (a == 16)
    c = 0;
  return b + c;
}

into branch-free code like:

        mov     r4,r0       ! 45  movsi_ie/2          [length = 2]
        cmp/eq  #16,r0      ! 9   cmpeqsi_t/2         [length = 2]
        mov     #-1,r0      ! 34  movsi_ie/3          [length = 2]
        negc    r0,r0       ! 38  *negc               [length = 2]
        neg     r0,r0       ! 36  negsi2              [length = 2]
        and     r6,r0       ! 37  *andsi3_compact/2   [length = 2]
        rts                 ! 48  *return_i           [length = 2]
        add     r5,r0       ! 14  *addsi3_compact     [length = 2]

instead of the more compact (and on SH4 most likely better):

        mov     r4,r0       ! 41  movsi_ie/2          [length = 2]
        cmp/eq  #16,r0      ! 9   cmpeqsi_t/2         [length = 2]
        bf      0f          ! 34  *movsicc_t_true/2   [length = 4]
        mov     #0,r6
0:
        add     r5,r6       ! 14  *addsi3_compact     [length = 2]
        rts                 ! 44  *return_i           [length = 2]
        mov     r6,r0       ! 19  movsi_ie/2          [length = 2]

This particular case is handled in noce_try_store_flag_mask, which does the transformation if BRANCH_COST >= 2, which is true for -m4. I guess before the patch ifcvt didn't realize that this transformation can be applied. I've tried setting BRANCH_COST to 1, which avoids this transformation but increases overall code size a bit.
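The two code shapes discussed above correspond roughly to the following C. This is only an illustration of the idea behind the store-flag-mask transformation, not the RTL that ifcvt.c generates, and the function names are made up for this sketch.

```c
#include <assert.h>

/* Branchy form, as written in the test case.  */
static int test_func_6_branch(int a, int b, int c)
{
    if (a == 16)
        c = 0;
    return b + c;
}

/* Branch-free form: mask is 0 when a == 16 and -1 (all bits set)
   otherwise, mirroring the cmp/eq, negc, neg, and sequence.  */
static int test_func_6_mask(int a, int b, int c)
{
    int mask = (a == 16) - 1;
    return b + (c & mask);
}

/* Sanity check that both forms agree for one input.  */
static void check_same(int a, int b, int c)
{
    assert(test_func_6_branch(a, b, c) == test_func_6_mask(a, b, c));
}
```

The masked form costs three extra ALU instructions on SH to build the mask, which is why it only pays off when a branch is considered expensive.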
[Bug libstdc++/51673] undefined references / libstdc++-7.dll
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51673

--- Comment #6 from Pawel Sikora pluto at agmk dot net 2011-12-28 16:06:47 UTC ---
BTW, I've tested the default allocator with std::__7: the i686-pc-mingw32 toolchain works fine, while x86_64-pc-mingw32 reports

undefined reference to .text$_ZN9__gnu_cxx3__713new_allocatorIiE8allocateEyPKv[__gnu_cxx::__7::new_allocator<int>::allocate(unsigned long long, void const*)]

So there's a bug with symbol exporting that is not directly related to mt_allocator. A _Znwj vs. _Znwy issue?
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #6 from Michael Zolotukhin michael.v.zolotukhin at gmail dot com 2011-12-28 16:19:54 UTC ---
(In reply to comment #5)
> > In vect-peel-3.c we actually assume that vector length is 16 byte. Here is the loop body:
> >   suma += ia[i];
> >   sumb += ib[i+5];
> >   sumc += ic[i+1];
> > When vector-size is 16, then peeling can make two of three accesses aligned, but when vector size is 32 that's impossible. That's why using vect_sizes_32B_16B might be correct here.
>
> Ah, now I understand. I was confused by vect_aligned_arrays, and it's irrelevant here, right?

Actually yes, you're right. I think, ideally, vect_aligned_arrays should be somehow checked in such tests, as in them we assume that the array's beginning is aligned - but that's not the root cause of the XPASSes.

> Yes, vect_sizes_32B_16B seems to be ok in that case.

The other two tests (vect-multitypes-1.c and no-section-anchors-vect-69.c) look like they have the same problem - are you OK with a similar fix for them too, i.e. is the patch http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600/vec-tests-avx2_fixes-7.patch ok for trunk?

Thanks, Michael
[Bug rtl-optimization/51623] PowerPC section type conflict
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51623

--- Comment #2 from Michael Meissner meissner at gcc dot gnu.org 2011-12-28 18:02:56 UTC ---
Author: meissner
Date: Wed Dec 28 18:02:49 2011
New Revision: 182710

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182710
Log:
Fix PR 51623

Added:
    trunk/gcc/testsuite/gcc.target/powerpc/pr51623.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/51623] PowerPC section type conflict
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51623

Michael Meissner meissner at gcc dot gnu.org changed:

What       |Removed                       |Added
Status     |UNCONFIRMED                   |RESOLVED
CC         |                              |meissner at gcc dot gnu.org
Resolution |                              |FIXED
AssignedTo |unassigned at gcc dot gnu.org |meissner at gcc dot gnu.org

--- Comment #3 from Michael Meissner meissner at gcc dot gnu.org 2011-12-28 18:04:03 UTC ---
Fixed in subversion revision 182710.
[Bug c++/51556] Bizarre member template access control errors
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51556 --- Comment #5 from Paolo Carlini paolo.carlini at oracle dot com 2011-12-28 18:12:17 UTC --- This works with current (Rev 182710) mainline.
[Bug rtl-optimization/49710] [4.7 Regression] segfault
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49710

Jan Hubicka hubicka at gcc dot gnu.org changed:

What       |Removed                       |Added
Status     |NEW                           |ASSIGNED
AssignedTo |unassigned at gcc dot gnu.org |hubicka at gcc dot gnu.org

--- Comment #4 from Jan Hubicka hubicka at gcc dot gnu.org 2011-12-28 18:41:12 UTC ---
Looking into it now. I am by no means an expert on this code ;))
[Bug rtl-optimization/49710] [4.7 Regression] segfault
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49710

--- Comment #5 from Jan Hubicka hubicka at gcc dot gnu.org 2011-12-28 19:37:38 UTC ---
OK, the loop hierarchy looks as follows:

loop_0 (header = 0, latch = 1, niter = )
{
  bb_2 (preds = {bb_0 }, succs = {bb_3 })
  bb_6 (preds = {bb_5 }, succs = {bb_13 })
  bb_12 (preds = {bb_4 }, succs = {bb_1 })
  loop_4 (header = 13, latch = 14, niter = )
  {
    bb_13 (preds = {bb_6 bb_14 }, succs = {bb_14 })
    bb_14 (preds = {bb_13 }, succs = {bb_13 })
  }
  loop_1 (header = 3, latch = 9, niter = )
  {
    bb_3 (preds = {bb_2 bb_9 }, succs = {bb_4 })
    bb_9 (preds = {bb_8 }, succs = {bb_3 })
    loop_2 (header = 4, latch = 11, niter = )
    {
      bb_4 (preds = {bb_3 bb_11 }, succs = {bb_12 bb_5 })
      bb_5 (preds = {bb_4 }, succs = {bb_6 bb_7 })
      bb_7 (preds = {bb_5 }, succs = {bb_10 })
      bb_11 (preds = {bb_10 }, succs = {bb_4 })
      loop_3 (header = 10, latch = 15, niter = )
      {
        bb_8 (preds = {bb_10 }, succs = {bb_9 bb_15 })
        bb_15 (preds = {bb_8 }, succs = {bb_10 })
        bb_10 (preds = {bb_7 bb_15 }, succs = {bb_8 bb_11 })
      }
    }
  }
}

We remove the path from 10 to 8, that is, the one closing the loop of loop_3. The basic blocks removed are 8, 9 and 15. Finally we fail on BB 3, which is believed to be in loop 1, but the header is NULL at this point because of this code in delete_basic_block:

  /* If we remove the header or the latch of a loop, mark the loop for
     removal by setting its header and latch to NULL.  */
  if (loop->latch == bb
      || loop->header == bb)
    {
      loop->header = NULL;
      loop->latch = NULL;
    }

OK, so it seems that fix_bb_placements is not ready to see loops marked for removal. I guess the catch is that loop peeling renders bb 3 unreachable. I however do not understand how loop peeling can make this happen; perhaps folding of the header condition is done?

Honza
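To make the failure mode concrete, here is a minimal sketch of the marking scheme in the quoted delete_basic_block code, together with the kind of guard a consumer would need. The struct and function names are simplified stand-ins chosen for this example, not GCC's actual cfgloop data structures, and the guard is only an assumption about what a fix might look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for a basic block and a loop (hypothetical).  */
struct block { int index; };
struct loop  { struct block *header; struct block *latch; };

/* Mirror of the quoted logic: deleting a loop's header or latch marks the
   loop for removal by clearing both pointers.  */
static void note_block_deleted(struct loop *loop, const struct block *bb)
{
    if (loop->latch == bb || loop->header == bb) {
        loop->header = NULL;
        loop->latch = NULL;
    }
}

/* A consumer walking the loop tree has to test for the marker before it
   dereferences loop->header.  */
static bool loop_marked_for_removal(const struct loop *loop)
{
    return loop->header == NULL;
}

/* Returns the header index, or -1 for a loop already marked for removal.  */
static int header_index_or_minus_one(const struct loop *loop)
{
    if (loop_marked_for_removal(loop))
        return -1;
    assert(loop->header != NULL);
    return loop->header->index;
}
```

In the scenario from the comment, deleting bb 9 (the latch of loop_1) clears loop_1's header, so any later code that still treats bb 3 as belonging to loop_1 and reads its header without such a check will dereference NULL.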
[Bug libstdc++/51673] undefined references / libstdc++-7.dll
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51673

--- Comment #7 from Pawel Sikora pluto at agmk dot net 2011-12-28 19:51:55 UTC ---
Please apply the following obvious patch:

--- gcc-4.6.0/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver.orig 2011-12-28 12:43:50.0 +0100
+++ gcc-4.6.0/libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver 2011-12-28 20:25:36.603040153 +0100
@@ -42,9 +42,9 @@
     __once_proxy;

     # operator new(size_t)
-    _Znw[jm];
+    _Znw[jmy];
     # operator new(size_t, std::nothrow_t const&)
-    _Znw[jm]RKSt9nothrow_t;
+    _Znw[jmy]RKSt9nothrow_t;

     # operator delete(void*)
     _ZdlPv;
@@ -52,9 +52,9 @@
     _ZdlPvRKSt9nothrow_t;

     # operator new[](size_t)
-    _Zna[jm];
+    _Zna[jmy];
     # operator new[](size_t, std::nothrow_t const&)
-    _Zna[jm]RKSt9nothrow_t;
+    _Zna[jmy]RKSt9nothrow_t;

     # operator delete[](void*)
     _ZdaPv;

It fixes the new/delete exports for x86_64-pc-mingw32. mt-allocator needs more exports...
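The extra y in the patterns follows from Itanium C++ ABI name mangling, where j, m and y encode unsigned int, unsigned long and unsigned long long. size_t is unsigned long long on x86_64-pc-mingw32 (as the allocate(unsigned long long, void const*) signature in comment #6 shows), so operator new(size_t) mangles to _Znwy there. A small C11 sketch, with hypothetical helper names, that reports which letter applies on the host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Itanium ABI builtin-type code for the host's size_t.  */
static char size_t_mangling_code(void)
{
    return _Generic((size_t)0,
                    unsigned int: 'j',
                    unsigned long: 'm',
                    unsigned long long: 'y',
                    default: '?');
}

/* Writes the mangled name of operator new(size_t) into buf.  */
static void mangled_operator_new(char *buf, size_t len)
{
    assert(len >= 6); /* "_Znw", one type letter, terminating NUL */
    snprintf(buf, len, "_Znw%c", size_t_mangling_code());
}
```

A version script that only lists [jm] therefore misses the symbol on any target where size_t is unsigned long long.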
[Bug c++/23211] using dec in nested class doesn't import name
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23211

--- Comment #15 from fabien at gcc dot gnu.org 2011-12-28 19:53:19 UTC ---
Author: fabien
Date: Wed Dec 28 19:53:14 2011
New Revision: 182711

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182711
Log:
gcc/testsuite/ChangeLog

2011-12-28 Fabien Chene fab...@gcc.gnu.org

	PR c++/23211
	* g++.dg/template/using18.C: New.
	* g++.dg/template/using19.C: New.
	* g++.dg/template/nested3.C: Remove dg-message at instantiation.
	* g++.dg/template/crash13.C: Likewise.

gcc/cp/ChangeLog

2011-12-28 Fabien Chene fab...@gcc.gnu.org

	PR c++/23211
	* name-lookup.c (do_class_using_decl): Use dependent_scope_p
	instead of dependent_type_p, to check that a non-dependent
	nested-name-specifier of a class-scope using declaration refers to
	a base, even if the current scope is dependent.
	* parser.c (cp_parser_using_declaration): Set
	USING_DECL_TYPENAME_P to 1 if the DECL is not null. Re-indent a
	'else' close to the prior modification.

Added:
    trunk/gcc/testsuite/g++.dg/template/using18.C
    trunk/gcc/testsuite/g++.dg/template/using19.C
Modified:
    trunk/gcc/cp/ChangeLog
    trunk/gcc/cp/name-lookup.c
    trunk/gcc/cp/parser.c
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/g++.dg/template/crash13.C
    trunk/gcc/testsuite/g++.dg/template/nested3.C
[Bug c++/23211] using dec in nested class doesn't import name
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23211

fabien at gcc dot gnu.org changed:

What       |Removed  |Added
Status     |REOPENED |RESOLVED
Resolution |         |FIXED

--- Comment #16 from fabien at gcc dot gnu.org 2011-12-28 20:04:25 UTC ---
Fixed.
[Bug c++/51680] g++ 4.7 fails to inline trivial template stuff
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51680

--- Comment #8 from Jonathan Wakely redi at gcc dot gnu.org 2011-12-28 20:09:32 UTC ---
(In reply to comment #6)
> Well, it's just an impression ... :]
>
> I think one reason is that unlike normal functions, template functions are implicitly sort of local (by necessity), in that they can have a definition in many compilation units without causing a link conflict. To get this effect for normal functions, one must use the static or inline keywords -- so the impression (rightly or wrongly) is that template functions definitions are like one of those.

Inline functions and templates both have vague linkage, which is how they avoid multiple definitions. That has nothing to do with inlining.
[Bug c++/51316] alignof doesn't work with arrays of unknown bound
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51316

Paolo Carlini paolo.carlini at oracle dot com changed:

What             |Removed                       |Added
Status           |UNCONFIRMED                   |ASSIGNED
Last reconfirmed |                              |2011-12-28
AssignedTo       |unassigned at gcc dot gnu.org |paolo.carlini at oracle dot com
Ever Confirmed   |0                             |1

--- Comment #4 from Paolo Carlini paolo.carlini at oracle dot com 2011-12-28 20:24:44 UTC ---
On it.
[Bug c/51696] New: [trans-mem] unsafe indirect function call in struct not properly displayed
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51696

Bug #: 51696
Summary: [trans-mem] unsafe indirect function call in struct not properly displayed
Classification: Unclassified
Product: gcc
Version: 4.7.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: c
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: patrick.marl...@gmail.com
CC: al...@gcc.gnu.org, torv...@gcc.gnu.org

Created attachment 26196 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26196
Attached testcase

With an unsafe indirect function call, the error message is not clear. I don't know if it can display the declaration. In the worst case, a message like "unsafe indirect function call within ‘transaction_safe’ function" would be OK.

$ ./gcc/xgcc -B./gcc/ -fgnu-tm -O0 testcase.i
testcase.i: In function ‘func’:
testcase.i:7:21: error: unsafe function call ‘Uf3c0’ within ‘transaction_safe’ function
testcase.i:8:12: error: unsafe function call ‘compare.1’ within ‘transaction_safe’ function

Patrick Marlier.
[Bug rtl-optimization/51623] PowerPC section type conflict
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51623

--- Comment #4 from Michael Meissner meissner at gcc dot gnu.org 2011-12-28 20:53:33 UTC ---
Author: meissner
Date: Wed Dec 28 20:53:30 2011
New Revision: 182712

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=182712
Log:
Backport PR 51623 change

Added:
    branches/gcc-4_6-branch/gcc/testsuite/gcc.target/powerpc/pr51623.c
      - copied unchanged from r182710, trunk/gcc/testsuite/gcc.target/powerpc/pr51623.c
Modified:
    branches/gcc-4_6-branch/gcc/ChangeLog
    branches/gcc-4_6-branch/gcc/config/rs6000/rs6000.c
    branches/gcc-4_6-branch/gcc/testsuite/ChangeLog
[Bug libstdc++/51673] undefined references / libstdc++-7.dll
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51673

--- Comment #8 from Kai Tietz ktietz at gcc dot gnu.org 2011-12-28 21:24:25 UTC ---
(In reply to comment #7)
> please apply following obvious patch:
> [...]
> it fixes new/delete exports for x86_64-pc-mingw32. mt-allocator needs more exports...

Thanks. Yes, I can confirm that the patch fixes the reported new/delete issue. From my side this patch is OK. If a C++ maintainer OKs it too, I will apply it.

Kai
[Bug c++/51316] alignof doesn't work with arrays of unknown bound
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51316

--- Comment #5 from Nikolka tsoae at mail dot ru 2011-12-28 22:06:18 UTC ---
> On it.

There is an active core issue about alignof: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3309.html#1305

You should probably take the proposed resolution into account.
[Bug target/51244] SH Target: Inefficient conditional branch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51244

--- Comment #7 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-28 22:25:48 UTC ---
(In reply to comment #3)
> I haven't run all tests on it yet, but CSiBE shows average code size reduction of approx. -0.1% for -m4* with some code size increases in some files. Would something like that be OK for stage 3?

Looks good, though not appropriate for stage 3, I think.
[Bug testsuite/50988] gcc.target/powerpc/*: Several tests fail incorrectly on powerpc-linux-gnuspe
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50988 --- Comment #2 from Michael Meissner meissner at gcc dot gnu.org 2011-12-28 22:30:22 UTC --- Created attachment 26197 -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=26197 Proposed patch Please check this patch on the spe compiler.
[Bug c++/51316] alignof doesn't work with arrays of unknown bound
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51316 --- Comment #6 from Paolo Carlini paolo.carlini at oracle dot com 2011-12-28 22:31:02 UTC --- Yeah, just allow the types at issue, that was clarified in core/930 actually.
[Bug target/51340] SH Target: Make -mfused-madd enabled by default
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51340

--- Comment #3 from Kazumoto Kojima kkojima at gcc dot gnu.org 2011-12-28 22:31:27 UTC ---
(In reply to comment #2)
> Uhm, yes... The title should have been "Enable -mfused-madd by -ffast-math"

Do you mean something like this?

--- ORIG/trunk/gcc/config/sh/sh.c 2011-12-03 10:03:41.0 +0900
+++ trunk/gcc/config/sh/sh.c 2011-12-27 08:33:23.0 +0900
@@ -838,6 +838,11 @@ sh_option_override (void)
       align_functions = min_align;
     }

+  /* Default to use fmac insn when -ffast-math.  See PR target/29100.  */
+  if (global_options_set.x_TARGET_FMAC == 0
+      && fast_math_flags_set_p (&global_options))
+    TARGET_FMAC = 1;
+
   if (sh_fixed_range_str)
     sh_fix_range (sh_fixed_range_str);

> I don't know the exact semantics for the new patterns. All I know is that rounding is supposed to be done only once after the two operations. This is the case for the SH fmac insn. Not sure whether this is enough though.

It seems that we can use the fma pattern, though that would be another issue.
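The intent of the patch, as far as it can be read from the snippet, is that -ffast-math only changes the default and an explicit user choice still wins. A minimal sketch of that pattern follows. The struct and field names are invented for illustration and do not correspond to GCC's option machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option state, loosely modelled on the patch's three inputs.  */
struct target_opts {
    bool fused_madd;      /* current value of the option */
    bool fused_madd_set;  /* true if given explicitly on the command line */
    bool fast_math;       /* true if fast-math flags are in effect */
};

/* Turn the option on by default under fast-math, unless the user set it.  */
static void override_fused_madd_default(struct target_opts *opts)
{
    assert(opts != NULL);
    if (!opts->fused_madd_set && opts->fast_math)
        opts->fused_madd = true;
}
```

With this shape, an explicit request to disable the option together with fast-math leaves it off, because the "set" flag is checked before the default is applied.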
[Bug testsuite/50988] gcc.target/powerpc/*: Several tests fail incorrectly on powerpc-linux-gnuspe
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50988

Michael Meissner meissner at gcc dot gnu.org changed:

What             |Removed                       |Added
Status           |UNCONFIRMED                   |ASSIGNED
Last reconfirmed |                              |2011-12-28
CC               |                              |meissner at gcc dot gnu.org
AssignedTo       |unassigned at gcc dot gnu.org |meissner at gcc dot gnu.org
Ever Confirmed   |0                             |1
[Bug testsuite/50988] gcc.target/powerpc/*: Several tests fail incorrectly on powerpc-linux-gnuspe
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50988

Michael Meissner meissner at gcc dot gnu.org changed:

What   |Removed  |Added
Status |ASSIGNED |WAITING

--- Comment #3 from Michael Meissner meissner at gcc dot gnu.org 2011-12-28 22:32:41 UTC ---
Kyle, could you check this patch on your SPE compiler before I check it in?
[Bug fortran/51502] [4.6/4.7 Regression] Potentially wrong code generation due to wrong implict_pure check
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51502

Thomas Koenig tkoenig at gcc dot gnu.org changed:

What       |Removed                       |Added
Status     |NEW                           |ASSIGNED
AssignedTo |unassigned at gcc dot gnu.org |tkoenig at gcc dot gnu.org
[Bug middle-end/42668] internal compiler error: in expand_expr_real_1, at expr.c:9314
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42668

Andrew Pinski pinskia at gcc dot gnu.org changed:

What             |Removed     |Added
Status           |UNCONFIRMED |RESOLVED
Component        |c           |middle-end
Resolution       |            |FIXED
Target Milestone |---         |4.4.3
Severity         |major       |normal

--- Comment #2 from Andrew Pinski pinskia at gcc dot gnu.org 2011-12-28 22:52:12 UTC ---
Has been fixed for a while now.
[Bug target/51340] SH Target: Make -mfused-madd enabled by default
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51340

--- Comment #4 from Oleg Endo oleg.e...@t-online.de 2011-12-29 00:02:40 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > Uhm, yes... The title should have been "Enable -mfused-madd by -ffast-math"
>
> Do you mean something like this?
> [...]

Yes, something like that. Or maybe check flag_unsafe_math_optimizations, as it is done for the FSCA and FSRRA insns in sh.md.

> > I don't know the exact semantics for the new patterns. All I know is that rounding is supposed to be done only once after the two operations. This is the case for the SH fmac insn. Not sure whether this is enough though.
>
> It seems that we can use the fma pattern, though it would be an another issue.

Maybe when trunk is back to stage 1.
[Bug target/51697] New: SH Target: Inefficient DImode comparisons for -Os
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51697

             Bug #: 51697
           Summary: SH Target: Inefficient DImode comparisons for -Os
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: oleg.e...@t-online.de
                CC: kkoj...@gcc.gnu.org
            Target: sh*-*-*

For -Os and everything but -m1, DImode comparisons are not optimized properly,
which results in redundant SImode comparisons, producing code worse than for
-O1.

A reduced example:

int test_0 (long long* x)
{
  return *x 0x ? -20 : -40;
}

-Os -m2/-m3/-m4:
        mov     #0,r2           ! 55 movsi_ie/3 [length = 2]
        tst     r2,r2           ! 57 cmpeqsi_t/1 [length = 2]
        bf/s    .L12            ! 58 branch_false [length = 2]
        mov.l   @(4,r4),r3      ! 12 movsi_ie/7 [length = 2]
        tst     r3,r3           ! 59 cmpeqsi_t/1 [length = 2]
.L12:
        bt/s    .L11            ! 14 branch_true [length = 2]
        mov     #-40,r0         ! 5 movsi_ie/3 [length = 2]
        mov     #-20,r0         ! 4 movsi_ie/3 [length = 2]
.L11:
        rts
        nop                     ! 65 *return_i [length = 4]

-Os -m1:
-O2 -m4:
        mov.l   @(4,r4),r1      ! 10 movsi_i/5 [length = 2]
        mov     #-40,r0         ! 5 movsi_i/3 [length = 2]
        tst     r1,r1           ! 15 cmpeqsi_t/1 [length = 2]
        bt      .L7             ! 16 branch_true [length = 2]
        mov     #-20,r0         ! 4 movsi_i/3 [length = 2]
.L7:
        rts
        nop                     ! 61 *return_i [length = 4]

-O1 -m4:
        mov.l   @(4,r4),r1      ! 10 movsi_ie/7 [length = 2]
        tst     r1,r1           ! 17 cmpeqsi_t/1 [length = 2]
        bt/s    .L6             ! 18 branch_true [length = 2]
        mov     #-40,r0         ! 5 movsi_ie/3 [length = 2]
        mov     #-20,r0         ! 4 movsi_ie/3 [length = 2]
.L6:
        rts
        nop                     ! 62 *return_i [length = 4]

Another example would be:

int test_2 (unsigned long long x)
{
  return x = 0x1LL ? -20 : -40;
}

-Os -m2/-m3/-m4:
        mov     #0,r2           ! 48 movsi_ie/3 [length = 2]
        mov     #-1,r3          ! 49 movsi_ie/3 [length = 2]
        cmp/eq  r2,r4           ! 9 cmpgtudi_t [length = 8]
        bf/s    .Ldi67
        cmp/hi  r2,r4
        cmp/hi  r3,r5
.Ldi67:
        bf/s    .L16            ! 10 branch_false [length = 2]
        mov     #-40,r0         ! 5 movsi_ie/3 [length = 2]
        mov     #-20,r0         ! 4 movsi_ie/3 [length = 2]
.L16:
        rts
        nop                     ! 52 *return_i [length = 4]

-Os -m1:
        tst     r4,r4           ! 9 cmpeqsi_t/1 [length = 2]
        mov     #-20,r0         ! 4 movsi_i/3 [length = 2]
        bf      .L12            ! 10 branch_false [length = 2]
        mov     #-40,r0         ! 5 movsi_i/3 [length = 2]
.L12:
        rts
        nop                     ! 56 *return_i [length = 4]

The problem does not appear for -m1, only for -Os and -m2*, -m3*, -m4*.
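For reference, a cmpgtudi_t style sequence (cmp/eq on one pair of words, then cmp/hi on either the same pair or the other pair) matches the usual word-wise expansion of an unsigned 64-bit greater-than. The following is only a minimal C sketch of that decomposition, assuming 32-bit words; the helper names are made up for illustration and are not taken from the SH backend.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 64-bit "a > b" expressed with 32-bit
   word comparisons, as a 32-bit target has to expand it.  */
static int
gtu_di_by_words (uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo)
{
  if (a_hi != b_hi)
    return a_hi > b_hi;   /* high words differ, so they decide */
  return a_lo > b_lo;     /* high words equal, so low words decide */
}

/* Hypothetical wrapper that splits two 64-bit values into words.  */
static int
gtu_di (uint64_t a, uint64_t b)
{
  return gtu_di_by_words ((uint32_t) (a >> 32), (uint32_t) a,
                          (uint32_t) (b >> 32), (uint32_t) b);
}
```

When the constant has a high word of 0 and a low word of all ones (which is what the mov #0 / mov #-1 pair in the -Os listing appears to load), the low-word comparison can never be true and the whole test reduces to "high word != 0", i.e. a single tst as in the -m1 output.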
[Bug target/49263] SH Target: underutilized TST #imm, R0 instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #15 from Oleg Endo <oleg.e...@t-online.de> 2011-12-29 00:34:53 UTC ---
(In reply to comment #14)
> With trunk rev 181517 I have observed the following problem, which happens
> when compiling for -m2*, -m3*, -m4* and -Os:

This is still present as of rev 182713 and seems to be a different issue.
I've created PR51697 for it.
[Bug lto/51698] New: [trans-mem] TM runtime and application with LTO
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51698

             Bug #: 51698
           Summary: [trans-mem] TM runtime and application with LTO
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: patrick.marl...@gmail.com
                CC: al...@gcc.gnu.org, r...@gcc.gnu.org, torv...@gcc.gnu.org

Created attachment 26198
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26198
testcase app-itm with lto

In my attempt to get the _ITM_R/W* calls inlined into the application code, it
seems that the TM builtins and TM definitions don't work as expected with LTO.

$ gcc -flto -fgnu-tm -Wall -o bin appitm.c
`_ITM_beginTransaction' referenced in section `.text' of
/tmp/cc7uGSe1.ltrans0.ltrans.o: defined in discarded section `.text' of
/tmp/ccJk2crp.o (symbol from plugin)
`_ITM_RU4' referenced in section `.text' of /tmp/cc7uGSe1.ltrans0.ltrans.o:
defined in discarded section `.text' of /tmp/ccJk2crp.o (symbol from plugin)
`_ITM_commitTransaction' referenced in section `.text' of
/tmp/cc7uGSe1.ltrans0.ltrans.o: defined in discarded section `.text' of
/tmp/ccJk2crp.o (symbol from plugin)
collect2: error: ld returned 1 exit status

I have merged all .c files into the same source for the testcase, but the
problem is the same if the TM runtime is in a library.

Patrick Marlier.
[Bug libstdc++/51699] New: Clang refuses to compile ext/rope citing scope resolution issues
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51699

             Bug #: 51699
           Summary: Clang refuses to compile ext/rope citing scope
                    resolution issues
    Classification: Unclassified
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: libstdc++
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: fedorabugm...@yahoo.com

When using clang to compile an existing program, clang refuses to compile the
ext/rope header files. One of the errors given is below.

ropeimpl.h:433:2: error: use of undeclared identifier '_Data_allocate'
        _Data_allocate(_S_rounded_up_size(__old_len + __len));

g++ will compile this okay but the clang authors claim this code is invalid,
http://llvm.org/bugs/show_bug.cgi?id=6454.

Below are the 7 changes to the two files that allowed a successful compile.
Line numbers may not be exact.

In ropeimpl.h

383c381
< this->_L_deallocate(__l, 1);
---
> _L_deallocate(__l, 1);
392c390
< this->_C_deallocate(__c, 1);
---
> _C_deallocate(__c, 1);
400c398
< this->_F_deallocate(__f, 1);
---
> _F_deallocate(__f, 1);
409c407
< this->_S_deallocate(__ss, 1);
---
> _S_deallocate(__ss, 1);
433c431
< _Rope_base<_CharT, _Alloc>::_Data_allocate(_S_rounded_up_size(__old_len + __len));
---
> _Data_allocate(_S_rounded_up_size(__old_len + __len));
514c512
< _Rope_base<_CharT, _Alloc>::_C_deallocate(__result,1);
---
> _C_deallocate(__result,1);
817c815
< _Rope_base<_CharT, _Alloc>::_Data_allocate(_S_rounded_up_size(__result_len));
---
> _Data_allocate(_S_rounded_up_size(__result_len));

In rope

732c730
< this->_S_free_string(_M_data, this->_M_size,this->_M_get_allocator());
---
> __STL_FREE_STRING(_M_data, this->_M_size, this->_M_get_allocator());
[Bug target/51565] [4.4/4.5/4.6/4.7 Regression] fastcall in array of method pointers: internal compiler error
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51565

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-12-29
          Component|c++                         |target
      Known to work|                            |4.3.5
   Target Milestone|---                         |4.4.7
            Summary|fastcall in array of method |[4.4/4.5/4.6/4.7
                   |pointers: internal compiler |Regression] fastcall in
                   |error                       |array of method pointers:
                   |                            |internal compiler error
     Ever Confirmed|0                           |1
      Known to fail|                            |4.4.5, 4.7.0

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-12-29 06:00:53 UTC ---
Confirmed.
[Bug fortran/51569] documentation on sign intrinsic
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51569

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-12-29 06:02:53 UTC ---
-0.0 does not exist in Fortran except when using the IEEE module IIRC.
[Bug c++/51613] [4.4/4.5/4.6/4.7 Regression] Ambiguous function template instantiations as template argument are not rejected
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51613

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |4.3.5
           Keywords|                            |accepts-invalid
   Last reconfirmed|                            |2011-12-29
     Ever Confirmed|0                           |1
            Summary|Ambiguous function template |[4.4/4.5/4.6/4.7
                   |instantiations as template  |Regression] Ambiguous
                   |argument are not rejected   |function template
                   |                            |instantiations as template
                   |                            |argument are not rejected
   Target Milestone|---                         |4.4.7
      Known to fail|                            |4.4.5, 4.7.0

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2011-12-29 06:07:16 UTC ---
Confirmed.
[Bug testsuite/51693] New XPASSes in vectorizer testsuite on powerpc64-suse-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51693

--- Comment #7 from Ira Rosen <irar at il dot ibm.com> 2011-12-29 07:37:53 UTC ---
(In reply to comment #6)
> Yes, vector_sizes_32B_16B seems to be ok in that case.
> Other two tests (vect-multitypes-1.c and no-section-anchors-vect-69.c) look
> like having the same problem - are you ok for similar fix for them too, i.e.
> is patch
> http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01600/vec-tests-avx2_fixes-7.patch
> ok for trunk?

Yes, just please don't forget to update testsuite/ChangeLog.

Thanks,
Ira

> Thanks, Michael
[patch] Fix PR tree-optimization/51684
Hi,

This patch fixes an attempt to access gsi of pattern statement.

Bootstrapped and tested on ia64-unknown-linux-gnu by Uros and on
powerpc64-suse-linux by me.
Committed.

Ira

ChangeLog:

        PR tree-optimization/51684
        * tree-vect-slp.c (vect_schedule_slp_instance): Get gsi of original
        statement in case of a pattern.
        (vect_schedule_slp): Likewise.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c (revision 182703)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2885,6 +2885,8 @@ vect_schedule_slp_instance (slp_tree node, slp_ins
       && REFERENCE_CLASS_P (gimple_get_lhs (stmt)))
     {
       gimple last_store = vect_find_last_store_in_slp_instance (instance);
+      if (is_pattern_stmt_p (vinfo_for_stmt (last_store)))
+       last_store = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (last_store));
       si = gsi_for_stmt (last_store);
     }
 
@@ -2989,6 +2991,8 @@ vect_schedule_slp (loop_vec_info loop_vinfo, bb_ve
           if (!STMT_VINFO_DATA_REF (vinfo_for_stmt (store)))
             break;
 
+         if (is_pattern_stmt_p (vinfo_for_stmt (store)))
+           store = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (store));
           /* Free the attached stmt_vec_info and remove the stmt.  */
           gsi = gsi_for_stmt (store);
           gsi_remove (&gsi, true);
[PATCH, testsuite]: Use dg-add-options ieee in gcc.dg/torture/pr50396.c
Hello!

Some targets (e.g. alpha) need -mieee to handle NaNs.

2011-12-28  Uros Bizjak  <ubiz...@gmail.com>

        * gcc.dg/torture/pr50396.c: Use dg-add-options ieee.

Tested on alphaev68-pc-linux-gnu, committed to mainline SVN and 4.6.

Uros.

Index: gcc.dg/torture/pr50396.c
===================================================================
--- gcc.dg/torture/pr50396.c    (revision 182694)
+++ gcc.dg/torture/pr50396.c    (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-add-options ieee } */
 
 extern void abort (void);
 typedef float vf128 __attribute__((vector_size(16)));
Ping: backport fix for PR 48660 (assigning to BLKmode return regs)
Ping for backporting this expand patch, which fixes an ice-on-valid regression from 4.4 while compiling certain C++ packages on ARM: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg01550.html As I understand it, this bug is the only reason Ubuntu is keeping a GCC 4.4 package: there's no known workaround besides changing the source, so affected packages have to be compiled with an older compiler. I think this is going to be one of those patches that distros who care about ARM will end up having to backport individually if we don't do it in the FSF version. Richard
[C++ testcase, commited] PR 51547
[resend: the first time the message didn't get through because it was
miscategorized as spam]

Hi,

I'm adding the testcase to mainline and closing the PR.

Thanks,
Paolo.

/////////////////////

2011-12-27  Paolo Carlini  <paolo.carl...@oracle.com>

        PR c++/51547
        * g++.dg/cpp0x/pr51547.C: New.

Index: g++.dg/cpp0x/pr51547.C
===================================================================
--- g++.dg/cpp0x/pr51547.C      (revision 0)
+++ g++.dg/cpp0x/pr51547.C      (revision 0)
@@ -0,0 +1,50 @@
+// PR c++/51547
+// { dg-options "-std=c++0x" }
+
+template <class T>
+struct vector
+{
+  T*
+  begin()
+  { return &member; }
+
+  const T*
+  begin() const
+  { return &member; }
+
+  T member;
+};
+
+struct Bar {
+  int x;
+};
+
+struct Foo {
+  const vector<Bar>& bar() const {
+    return bar_;
+  }
+
+  vector<Bar> bar_;
+};
+
+template <class X>
+struct Y {
+  void foo() {
+    Foo a;
+    auto b = a.bar().begin();
+    auto c = b->x;
+  }
+};
+
+template <class X>
+void foo() {
+  Foo a;
+  auto b = a.bar().begin();
+  auto c = b->x;
+}
+
+int main() {
+  Y<int> p;
+  p.foo();
+  foo<int>();
+}
Re: RFC: An alternative -fsched-pressure implementation
Vladimir Makarov <vmaka...@redhat.com> writes:

In the end I tried an ad-hoc approach in an attempt to do something about (2),
(3) and (4b). The idea was to construct a preliminary model schedule in which
the only objective is to keep register pressure to a minimum. This schedule
ignores pipeline characteristics, latencies, and the number of available
registers. The maximum pressure seen in this initial model schedule (MP) is
then the benchmark for ECC(X).

I always had an impression that the code before scheduler is close to minimal
register pressure because of specific expression generation. Maybe I was wrong
and some optimizations (global ones like pre) change this a lot.

One of the examples I was looking at was:

-----------------------------------------------------------------------
#include <stdint.h>

#define COUNT 8

void
loop (uint8_t *__restrict dst, uint8_t *__restrict src,
      uint8_t *__restrict ff_cropTbl, int dstStride, int srcStride)
{
  const int w = COUNT;
  uint8_t *cm = ff_cropTbl + 1024;
  for(int i=0; i<w; i++)
    {
      const int srcB = src[-2*srcStride];
      const int srcA = src[-1*srcStride];
      const int src0 = src[0 *srcStride];
      const int src1 = src[1 *srcStride];
      const int src2 = src[2 *srcStride];
      const int src3 = src[3 *srcStride];
      const int src4 = src[4 *srcStride];
      const int src5 = src[5 *srcStride];
      const int src6 = src[6 *srcStride];
      const int src7 = src[7 *srcStride];
      const int src8 = src[8 *srcStride];
      const int src9 = src[9 *srcStride];
      const int src10 = src[10*srcStride];
      dst[0*dstStride] = cm[(((src0+src1)*20 - (srcA+src2)*5 + (srcB+src3)) + 16)>>5];
      dst[1*dstStride] = cm[(((src1+src2)*20 - (src0+src3)*5 + (srcA+src4)) + 16)>>5];
      dst[2*dstStride] = cm[(((src2+src3)*20 - (src1+src4)*5 + (src0+src5)) + 16)>>5];
      dst[3*dstStride] = cm[(((src3+src4)*20 - (src2+src5)*5 + (src1+src6)) + 16)>>5];
      dst[4*dstStride] = cm[(((src4+src5)*20 - (src3+src6)*5 + (src2+src7)) + 16)>>5];
      dst[5*dstStride] = cm[(((src5+src6)*20 - (src4+src7)*5 + (src3+src8)) + 16)>>5];
      dst[6*dstStride] = cm[(((src6+src7)*20 - (src5+src8)*5 + (src4+src9)) + 16)>>5];
      dst[7*dstStride] = cm[(((src7+src8)*20 - (src6+src9)*5 + (src5+src10)) + 16)>>5];
      dst++;
      src++;
    }
}
-----------------------------------------------------------------------

(based on the libav h264 code). In this example the loads from src and stores
to dst are still in their original order by the time we reach sched1, so src,
dst, srcA, srcB, and src0..10 are all live at once. There's no aliasing reason
why they can't be reordered, and we do that during scheduling.

During the main scheduling, an instruction X that occurs at or before the next
point of maximum pressure in the model schedule is measured based on the
current register pressure. If X doesn't increase the current pressure beyond
the current maximum, its ECC(X) is zero, otherwise ECC(X) is the cost of going
from MP to the new maximum. The idea is that the final net pressure of
scheduling a given set of instructions is going to be the same regardless of
the order; we simply want to keep the intermediate pressure under control. An
ECC(X) of zero usually[*] means that scheduling X next won't send the rest of
the sequence beyond the current maximum pressure.

[*] but not always. There's more about this in the patch comments.

If an instruction X occurs _after_ the next point of maximum pressure, X is
measured based on that maximum pressure. If the current maximum pressure is
MP', and X increases pressure by dP, ECC(X) is the cost of going from MP to
MP' + dP.

Of course, this all depends on how good a value MP is, and therefore on how
good the model schedule is. I tried a few variations before settling on the
one in the patch (which I hope makes conceptual sense).

I initially stayed with the idea above about assigning different costs to
(Ra), (Rb) and (Rc). This produces some good results, but was still a little
too conservative in general, in that other tests were still worse with
-fsched-pressure than without. I described some of the problems with these
costs above.
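The two ECC(X) cases described above can be written down as a small function. This is only a sketch of the rule as stated in this message, not the code from the patch: the function names, the flat per-register cost in pressure_cost, and the at_or_before_peak flag are all assumptions made for illustration.

```c
/* Hypothetical cost of raising the maximum pressure from FROM to TO.
   Assumed to be one unit per extra register; the real patch derives
   costs from the target, which is not modelled here.  */
static int
pressure_cost (int from, int to)
{
  return to > from ? to - from : 0;
}

/* Sketch of ECC(X).
   mp                model schedule maximum pressure (MP)
   cur_max           maximum pressure seen so far (MP')
   cur_pressure      pressure at the current scheduling point
   dp                pressure change caused by X (dP)
   at_or_before_peak nonzero if X occurs at or before the next point of
                     maximum pressure in the model schedule  */
static int
ecc (int mp, int cur_max, int cur_pressure, int dp, int at_or_before_peak)
{
  int new_max;

  if (at_or_before_peak)
    {
      /* Measured against the current pressure.  */
      new_max = cur_pressure + dp;
      if (new_max <= cur_max)
        return 0;               /* does not exceed the current maximum */
    }
  else
    /* Measured against the maximum pressure itself.  */
    new_max = cur_max + dp;

  return pressure_cost (mp, new_max);
}
```

For instance, with MP = 10 and a current maximum of 10, an instruction before the peak that takes the pressure from 9 to 12 gets a cost of 2 under this model, while one that takes it from 8 to 9 gets 0.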
Another is that if an instruction X has a spill cost of 6, say, then:

    ECC(X) + delay(X)

will only allow X to be scheduled if the next instruction without a spill cost
has a delay of 6 cycles or more. This is overly harsh, especially seeing as few
ARM instructions have such a high latency. The benefit of spilling is often to
avoid a series of short (e.g. single-cycle) stalls, rather than to avoid a
single long one.

I then adjusted positive ECC(X) values based on the priority of X relative to
the highest-priority zero-cost instruction. This was better, but a DES filter
in particular still suffered from the "lots of short stalls" problem.

Then, as an experiment, I tried ignoring MEMORY_MOVE_COST altogether and simply treating
Ping**1.57 [Patch, fortran] Improve common function elimination
http://gcc.gnu.org/ml/fortran/2011-12/msg00102.html OK for trunk? Regards Thomas
Re: [PATCH] PowerPC section type conflict (created PR 51623)
On Mon, Dec 19, 2011 at 11:45:35PM +0800, Chung-Lin Tang wrote:
> On 2011/12/19 03:18 AM, Richard Henderson wrote:
> > On 12/17/2011 10:36 PM, Chung-Lin Tang wrote:
> > > I don't think it's that kind of problem; the powerpc backend uses
> > > unlikely_text_section_p(), which compares the passed in argument section
> > > and the value of function_section_1(current_function_decl,true).
> > >
> > > I think this might be the real bug, or something related. Since
> > > current_function_decl is NULL at assembly phase, it retrieves
> > > .text.unlikely to test for equality. It's the retrieving/lookup that
> > > fails here, because the default looked-up section flags set when
> > > decl == NULL does not really seem to make sense (adds SECTION_WRITE).
> >
> > current_function_decl is only null when we're not inside a function.
> > One possible fix is to test for current_function_section inside
> > unlikely_text_section_p.
> >
> > However, I think that begs the question of what in the world is actually
> > going on in rs6000_assemble_integer. Why are we testing for emitting
> > data in text sections?
>
> I think I sort of mis-represented the context here; this was not really
> during the assembly phase of a function, but already in
> toplev.c:output_object_blocks().
>
> I've created a bugzilla PR for this, with a testcase from U-boot, and a
> minimal testcase:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51623

This one line patch fixes the problem by using a different test than
unlikely_text_section_p, which assumes it is called within a function context.
I bootstrapped it, and there were no regressions. I have added the test case
from the PR so it doesn't come back. Is it ok to apply? It is also a bug in
GCC 4.6, and I will backport the patch to that branch as well.

FWIW, I wrote -mrelocatable around 1990 or so for a specific Cygnus customer
that needed to have pseudo shared libraries in embedded code, as long as they
were willing to live with various restrictions. At the time, the Linux shared
library code was non-existent, and this was a quick hack.
In the nature of all quick hacks, eventually things change in the machine
independent code layer, and it has to be revisited. In hindsight, it would
have been better if the Linux shared library code had been operational, and if
there had been a non-GPL dynamic linker written to handle the relocations,
rather than having this quick hack.

The check for unlikely text was added in 2004 by Caroline Tice of Apple, and
it is curious that they didn't add a check for it being in a hot function as
well as a cold function. I also suspect the check would not work as well if
-ffunction-sections was used. The point of the check is to avoid adding to the
fixup table pointers that are stored in the read-only text section: fixing
those up would cause a segfault at runtime, although skipping them leaves a
pointer that is not fixed up when the program starts. It was modified in 2005
by Richard Sandiford in a global change in how sections are dealt with, and
modified by Alan Modra in 2006.

[gcc]
2011-12-27  Michael Meissner  <meiss...@linux.vnet.ibm.com>

        PR target/51623
        * config/rs6000/rs6000.c (rs6000_assemble_integer): Don't call
        unlikely_text_section_p. Instead check for being in a code section.

[gcc/testsuite]
2011-12-27  Michael Meissner  <meiss...@linux.vnet.ibm.com>

        PR target/51623
        * gcc.target/powerpc/pr51623.c: New file.
-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com     fax +1 (978) 399-6899

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c  (revision 182694)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -15461,7 +15461,7 @@ rs6000_assemble_integer (rtx x, unsigned
       if (TARGET_RELOCATABLE
          && in_section != toc_section
          && in_section != text_section
-         && !unlikely_text_section_p (in_section)
+         && (in_section && (in_section->common.flags & SECTION_CODE)) == 0
          && !recurse
          && GET_CODE (x) != CONST_INT
          && GET_CODE (x) != CONST_DOUBLE
Index: gcc/testsuite/gcc.target/powerpc/pr51623.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr51623.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr51623.c  (revision 0)
@@ -0,0 +1,123 @@
+/* PR target/51623 */
+/* { dg-do compile { target { { powerpc*-*-linux* && ilp32 } || { powerpc-*-eabi* } } } } */
+/* { dg-options "-mrelocatable -ffreestanding" } */
+
+/* This generated an error, since the compiler was calling
+   unlikely_text_section_p in a context where it wasn't valid.  */
+
+typedef long long loff_t;
+typedef unsigned size_t;
+
+
+struct mtd_info {
+  unsigned writesize;
+  unsigned oobsize;
+  const char *name;
+};
+
+extern int strcmp(const char *,const char *);
+extern char * strchr(const char *,int);
+
+struct cmd_tbl_s {
+  char *name;
+};
+
+
+int printf(const char *fmt, ...)
PR rtl-optimization/51069 (verify_loop_info failed)
Hi,
in this testcase peeling of a loop containing an irreducible region increases
the size of the region (by removing the conditional path into it).
remove_path is not quite ready for this scenario. Still, it would be nice to
avoid creating irreducible regions in cases where they are not irreducible.

Bootstrapped/regtested x86_64-linux, OK?

int a, b, c, d, e, f, bar (void);

void
foo (int x)
{
  for (;;)
    {
      if (!x)
        {
          for (d = 6; d = 0; d--)
            {
              while (!b)
                ;
              if (e)
                return foo (x);
              if (f)
                {
                  a = 0;
                  continue;
                }
              for (; c; c--)
                ;
            }
        }
      if (bar ())
        break;
      e = 0;
      if (x)
        for (;;)
          ;
    }
}

        PR rtl-optimization/51069
        * cfgloopmanip.c (remove_path): Removing path making irreducible
        region unconditional makes BB part of the region.

Index: cfgloopmanip.c
===================================================================
*** cfgloopmanip.c      (revision 182708)
--- cfgloopmanip.c      (working copy)
*************** remove_path (edge e)
*** 290,295 ****
--- 290,296 ----
    int i, nrem, n_bord_bbs;
    sbitmap seen;
    bool irred_invalidated = false;
+   edge_iterator ei;
  
    if (!can_remove_branch_p (e))
      return false;
*************** remove_path (edge e)
*** 329,337 ****
    /* Find border hexes -- i.e. those with predecessor in removed path.  */
    for (i = 0; i < nrem; i++)
      SET_BIT (seen, rem_bbs[i]->index);
    for (i = 0; i < nrem; i++)
      {
-       edge_iterator ei;
        bb = rem_bbs[i];
        FOR_EACH_EDGE (ae, ei, rem_bbs[i]->succs)
        if (ae->dest != EXIT_BLOCK_PTR && !TEST_BIT (seen, ae->dest->index))
--- 330,341 ----
    /* Find border hexes -- i.e. those with predecessor in removed path.  */
    for (i = 0; i < nrem; i++)
      SET_BIT (seen, rem_bbs[i]->index);
+   FOR_EACH_EDGE (ae, ei, e->src->succs)
+     if (ae != e && ae->dest != EXIT_BLOCK_PTR && !TEST_BIT (seen, ae->dest->index)
+       && ae->flags & EDGE_IRREDUCIBLE_LOOP)
+       irred_invalidated = true;
    for (i = 0; i < nrem; i++)
      {
        bb = rem_bbs[i];
        FOR_EACH_EDGE (ae, ei, rem_bbs[i]->succs)
        if (ae->dest != EXIT_BLOCK_PTR && !TEST_BIT (seen, ae->dest->index))
Re: PR rtl-optimization/51069 (verify_loop_info failed)
On Wed, Dec 28, 2011 at 07:31:57PM +0100, Jan Hubicka wrote:
> *** cfgloopmanip.c    (revision 182708)
> --- cfgloopmanip.c    (working copy)
> *************** remove_path (edge e)
> *** 290,295 ****
> --- 290,296 ----
>     int i, nrem, n_bord_bbs;
>     sbitmap seen;
>     bool irred_invalidated = false;
> +   edge_iterator ei;
>   
>     if (!can_remove_branch_p (e))
>       return false;
> *************** remove_path (edge e)
> *** 329,337 ****
>     /* Find border hexes -- i.e. those with predecessor in removed path.  */
>     for (i = 0; i < nrem; i++)
>       SET_BIT (seen, rem_bbs[i]->index);
>     for (i = 0; i < nrem; i++)
>       {
> -       edge_iterator ei;
>         bb = rem_bbs[i];
>         FOR_EACH_EDGE (ae, ei, rem_bbs[i]->succs)
>         if (ae->dest != EXIT_BLOCK_PTR && !TEST_BIT (seen, ae->dest->index))
> --- 330,341 ----
>     /* Find border hexes -- i.e. those with predecessor in removed path.  */
>     for (i = 0; i < nrem; i++)
>       SET_BIT (seen, rem_bbs[i]->index);
> +   FOR_EACH_EDGE (ae, ei, e->src->succs)
> +     if (ae != e && ae->dest != EXIT_BLOCK_PTR && !TEST_BIT (seen, ae->dest->index)
> +       && ae->flags & EDGE_IRREDUCIBLE_LOOP)
> +       irred_invalidated = true;

Just a nit, can't you break out of the loop when irred_invalidated is set to
true as well? There is no need to look through any further edges. I.e.
perhaps:

  if (!irred_invalidated)
    FOR_EACH_EDGE (ae, ei, e->src->succs)
      if (ae != e && ae->dest != EXIT_BLOCK_PTR
          && (ae->flags & EDGE_IRREDUCIBLE_LOOP)
          && !TEST_BIT (seen, ae->dest->index))
        {
          irred_invalidated = true;
          break;
        }

Thanks for looking into this, I'll defer the review to somebody familiar with
cfgloopmanip.c though.

        Jakub
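The early-exit idea can be shown outside of GCC with a plain array of edges. This is a minimal standalone sketch, not cfgloopmanip.c code: the edge_s structure, the flag value and the function name are stand-ins invented for the example, and the seen array plays the role of the sbitmap.

```c
#include <stdbool.h>
#include <stddef.h>

#define EDGE_IRREDUCIBLE_LOOP 0x1   /* stand-in value, not GCC's */

/* Hypothetical, simplified edge: destination block index, flags, and
   whether the destination is the exit block.  */
struct edge_s
{
  int dest_index;
  int flags;
  bool dest_is_exit;
};

/* Return true as soon as one successor edge other than REMOVED leads to
   a block that is not being removed (not in SEEN), is not the exit
   block, and is marked as part of an irreducible region.  Returning
   from inside the loop is the "break" suggested above.  */
static bool
path_removal_invalidates_irred (const struct edge_s *succs, size_t n,
                                const struct edge_s *removed,
                                const bool *seen)
{
  for (size_t i = 0; i < n; i++)
    {
      const struct edge_s *ae = &succs[i];
      if (ae != removed
          && !ae->dest_is_exit
          && (ae->flags & EDGE_IRREDUCIBLE_LOOP)
          && !seen[ae->dest_index])
        return true;
    }
  return false;
}
```

Testing the cheap flag before the bitmap lookup mirrors the ordering in the suggested condition; whether that matters in practice is not something this sketch measures.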
[PATCH] Don't optimize away non-pure/const calls during ccp (PR tree-optimization/51683)
Hi!

For some calls (like memcpy and other builtins that are known to pass through
the first argument) we know the value of the lhs, but still we shouldn't be
replacing the call with just a mere assignment of that known value to the LHS
SSA_NAME, because the call has other side-effects.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2011-12-28  Jakub Jelinek  <ja...@redhat.com>

        PR tree-optimization/51683
        * tree-ssa-propagate.c (substitute_and_fold): Don't optimize away
        calls with side-effects.
        * tree-ssa-ccp.c (ccp_fold_stmt): Likewise.

        * gcc.dg/pr51683.c: New test.

--- gcc/tree-ssa-propagate.c.jj 2011-11-11 20:54:59.0 +0100
+++ gcc/tree-ssa-propagate.c    2011-12-27 12:23:41.334187258 +0100
@@ -1056,6 +1056,12 @@ substitute_and_fold (ssa_prop_get_value_
                }
              else if (is_gimple_call (def_stmt))
                {
+                 int flags = gimple_call_flags (def_stmt);
+
+                 /* Don't optimize away calls that have side-effects.  */
+                 if ((flags & (ECF_CONST|ECF_PURE)) == 0
+                     || (flags & ECF_LOOPING_CONST_OR_PURE))
+                   continue;
                  if (update_call_from_tree (&gsi, val)
                      && maybe_clean_or_replace_eh_stmt (def_stmt, gsi_stmt (gsi)))
                    gimple_purge_dead_eh_edges (gimple_bb (gsi_stmt (gsi)));
--- gcc/tree-ssa-ccp.c.jj       2011-12-19 09:21:07.0 +0100
+++ gcc/tree-ssa-ccp.c  2011-12-27 12:29:48.620880857 +0100
@@ -1878,6 +1878,7 @@ ccp_fold_stmt (gimple_stmt_iterator *gsi
     case GIMPLE_CALL:
       {
        tree lhs = gimple_call_lhs (stmt);
+       int flags = gimple_call_flags (stmt);
        tree val;
        tree argt;
        bool changed = false;
@@ -1888,7 +1889,10 @@ ccp_fold_stmt (gimple_stmt_iterator *gsi
           type issues.  */
        if (lhs
            && TREE_CODE (lhs) == SSA_NAME
-           && (val = get_constant_value (lhs)))
+           && (val = get_constant_value (lhs))
+           /* Don't optimize away calls that have side-effects.  */
+           && (flags & (ECF_CONST|ECF_PURE)) != 0
+           && (flags & ECF_LOOPING_CONST_OR_PURE) == 0)
          {
            tree new_rhs = unshare_expr (val);
            bool res;
--- gcc/testsuite/gcc.dg/pr51683.c.jj   2011-12-27 12:21:43.662925435 +0100
+++ gcc/testsuite/gcc.dg/pr51683.c      2011-12-27 12:21:23.0 +0100
@@ -0,0 +1,18 @@
+/* PR tree-optimization/51683 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+static inline void *
+bar (void *p, void *q, int r)
+{
+  return __builtin_memcpy (p, q, r);
+}
+
+void *
+foo (void *p)
+{
+  return bar ((void *) 0x12345000, p, 256);
+}
+
+/* { dg-final { scan-tree-dump "memcpy" "optimized" } } */
+/* { dg-final { cleanup-tree-dump "optimized" } } */

        Jakub
[SH] Fix defunct -mbranch-cost option
Hello,

while working on another PR I've noticed that the -mbranch-cost option in the
SH target is not really working. The attached patch brings it back to life,
leaving the default behavior unchanged.

Cheers,
Oleg

2011-12-28  Oleg Endo  <oleg.e...@t-online.de>

        * config/sh/sh.h (BRANCH_COST): Use sh_branch_cost variable.
        * config/sh/sh.c (sh_option_override): Simplify sh_branch_cost
        expression.

Index: gcc/config/sh/sh.c
===================================================================
--- gcc/config/sh/sh.c  (revision 182695)
+++ gcc/config/sh/sh.c  (working copy)
@@ -724,9 +724,16 @@
   else
     sh_divsi3_libfunc = "__sdivsi3";
   if (sh_branch_cost == -1)
-    sh_branch_cost
-      = TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1;
+    {
+      sh_branch_cost = 1;
+      /* The SH1 does not have delay slots, hence we get a pipeline stall
+        at every branch.  The SH4 is superscalar, so the single delay slot
+        is not sufficient to keep both pipelines filled.  */
+      if (! TARGET_SH2 || TARGET_HARD_SH4)
+       sh_branch_cost = 2;
+    }
+
 
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     if (! VALID_REGISTER_P (regno))
       sh_register_names[regno][0] = '\0';
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h  (revision 182695)
+++ gcc/config/sh/sh.h  (working copy)
@@ -2088,12 +2088,8 @@
    different code that does fewer memory accesses.  */
 
 /* A C expression for the cost of a branch instruction.  A value of 1
-   is the default; other values are interpreted relative to that.
-   The SH1 does not have delay slots, hence we get a pipeline stall
-   at every branch.  The SH4 is superscalar, so the single delay slot
-   is not sufficient to keep both pipelines filled.  */
-#define BRANCH_COST(speed_p, predictable_p) \
-       (TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1)
+   is the default; other values are interpreted relative to that.  */
+#define BRANCH_COST(speed_p, predictable_p) sh_branch_cost
 
 /* Assembler output control.  */
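As a side note on "leaving the default behavior unchanged": the old conditional expression and the new if-based default can be compared as plain boolean functions. The sketch below does only that. The function names are made up, the three target macros are modelled as bool parameters, and whether TARGET_SH5 really implies TARGET_SH2 and excludes TARGET_HARD_SH4 is an assumption that has not been checked against sh.h here.

```c
#include <stdbool.h>

/* Old default: TARGET_SH5 ? 1 : ! TARGET_SH2 || TARGET_HARD_SH4 ? 2 : 1,
   with the target macros modelled as parameters.  */
static int
old_branch_cost (bool sh5, bool sh2, bool hard_sh4)
{
  return sh5 ? 1 : (!sh2 || hard_sh4) ? 2 : 1;
}

/* New default from the patch: 1, raised to 2 for SH1 and for SH4.  */
static int
new_branch_cost (bool sh2, bool hard_sh4)
{
  int cost = 1;
  if (!sh2 || hard_sh4)
    cost = 2;
  return cost;
}

/* True if both forms give the same cost for this flag combination.  */
static bool
costs_agree (bool sh5, bool sh2, bool hard_sh4)
{
  return old_branch_cost (sh5, sh2, hard_sh4)
         == new_branch_cost (sh2, hard_sh4);
}
```

Under this model the two forms agree for every combination except those where sh5 is set together with !sh2 or hard_sh4, so the patch preserves the defaults provided such combinations cannot occur.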
Re: PR rtl-optimization/51069 (verify_loop_info failed)
> Just a nit, can't you break out of the loop when irred_invalidated is set to
> true as well? There is no need to look through any further edges. I.e.

Sure, though we do have horrible time complexity in case irreducible regions
are involved, including recomputing the whole CFG flags after every path
removal.

Honza
Re: [patch testsuite g++.dg]: Reflect ABI change for windows native targets about bitfield layout in structures
On Dec 16, 2011, at 9:56 AM, Dave Korn <dave.korn.cyg...@gmail.com> wrote:
> On 16/12/2011 09:01, Kai Tietz wrote:
> > 2011/12/15 Dave Korn:
> > > { dg-options "-mno-align-double" { target i?86-*-cygwin* i?86-*-mingw* } }
> > > { dg-additional-options "-mno-ms-bitfields" { target i?86-*-mingw* } }
> > >
> > > ... so that MinGW gets both and Cygwin only the one it wants? (Actually
> > > the first one could just as well be changed to dg-additional-options at
> > > the same time, couldn't it?)
> >
> > Well, interesting. I think it should be the additional variant for
> > cygwin/mingw, as otherwise -O2 gets clobbered for it, isn't it?
>
> Yes, that's what I was concerned with.
>
> > So I modified patch as attached.
>
> Thanks for that. I recommend this patch for approval.

Ok.
Re: [patch testsuite g++.old-deja]: Fix some testcases for mingw targets
On Dec 27, 2011, at 10:55 PM, Kai Tietz <ktiet...@googlemail.com> wrote:
> Ping

It was previously approved in the email you quote. See the Ok buried in there.

> 2011/12/15 Dave Korn <dave.korn.cyg...@gmail.com>:
> > On 15/12/2011 17:44, Mike Stump wrote:
> > > On Dec 15, 2011, at 1:43 AM, Kai Tietz wrote:
> > > > This patch takes care that we are using for operator new/delete
> > > > replacement test static version on mingw-targets. As the shared (DLL)
> > > > version isn't able to have operator overload within DLL itself, as a
> > > > DLL is finally-linked for PE-COFF.
> > > >
> > > > Ok for apply?
> > >
> > > Not sure who would review this if I don't, so, Ok. That said, if a
> > > shared library C++ type person wants to chime in... I get the feeling
> > > this is unfortunate, and it might have been nice to manage this in some
> > > other way, but, I just want to step back and let others think about it.
> >
> > Well, it's a consequence of how you can't leave undefined references in
> > Windows DLLs at link-time for the loader to just fill in with the first
> > definition it comes across at run-time (as you can on ELF). We have to
> > jump through hoops to get operator new/delete replacement working on
> > Cygwin, and were lucky in that the cygwin1.dll is linked against
> > absolutely everything, so we had somewhere to hang our redirection hooks.
> > Without someone adding some similar amount of infrastructure to MinGW,
> > the only time function replacement can work is for a statically-linked
> > executable, when all definitions are visible in one single link.
> >
> > > > * g++.old-deja/g++.brendan/new3.C: Adjust test for mingw
> > > > targets to use static-version.
> >
> > s/static-version/static linking/
> >
> > > > +// Avoid use of none-overridable new/delete operators in shared
> >
> > s/none-overridable/non-overridable/g
> > s/in shared/in shared link/g
> >
> > Patch looks perfectly sensible to me, but I can't approve.
> >
> > cheers,
> > DaveK
> > --
> > | (\_/) This is Bunny. Copy and paste
> > | (='.'=) Bunny into your signature to help
> > | ()_() him gain world domination
Re: [PATCH] Don't optimize away non-pure/const calls during ccp (PR tree-optimization/51683)
----- Original Message -----
> else if (is_gimple_call (def_stmt))
>   {
> +   int flags = gimple_call_flags (def_stmt);
> +
> +   /* Don't optimize away calls that have side-effects.  */
> +   if ((flags & (ECF_CONST|ECF_PURE)) == 0
> +       || (flags & ECF_LOOPING_CONST_OR_PURE))

This patch does this computation twice; grepping through the tree for
ECF_CONST suggests it's done quite a few more times. Could we get a predicate
in gimple.h to encapsulate this?

-Nathan
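To make the request concrete, such a predicate might look roughly like the sketch below. It is standalone C, not a proposal for gimple.h: the predicate name is invented, and the three flag values are stand-ins defined locally so the example compiles on its own (in GCC they come from the compiler's own headers).

```c
#include <stdbool.h>

/* Stand-in flag bits for illustration only.  */
#define ECF_CONST                  (1 << 0)
#define ECF_PURE                   (1 << 1)
#define ECF_LOOPING_CONST_OR_PURE  (1 << 2)

/* Hypothetical predicate: true if a call with these flags is const or
   pure and is known not to loop forever, i.e. it has no side effects
   that would be lost by replacing it with its known value.  */
static bool
nonlooping_const_or_pure_flags_p (int flags)
{
  return (flags & (ECF_CONST | ECF_PURE)) != 0
         && (flags & ECF_LOOPING_CONST_OR_PURE) == 0;
}
```

With such a helper, the first hunk's test would read as the negation of the predicate and the second hunk's as the predicate itself.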
Re: [PATCH] Don't optimize away non-pure/const calls during ccp (PR tree-optimization/51683)
On Wed, Dec 28, 2011 at 11:53:41AM -0800, Nathan Froyd wrote:
> ----- Original Message -----
> > else if (is_gimple_call (def_stmt))
> >   {
> > +   int flags = gimple_call_flags (def_stmt);
> > +
> > +   /* Don't optimize away calls that have side-effects.  */
> > +   if ((flags & (ECF_CONST|ECF_PURE)) == 0
> > +       || (flags & ECF_LOOPING_CONST_OR_PURE))
>
> This patch does this computation twice; grepping through the tree for
> ECF_CONST suggests it's done quite a few more times. Could we get a
> predicate in gimple.h to encapsulate this?

I think it would be an overkill to have a predicate for
nonlooping_const_or_pure_flags, we don't have predicates for similar RTL or
decl flags either. We write:

      /* We can delete dead const or pure calls as long as they do not
         infinite loop.  */
      && (RTL_CONST_OR_PURE_CALL_P (insn)
          && !RTL_LOOPING_CONST_OR_PURE_CALL_P (insn)))

and not RTL_CONST_OR_PURE_NONLOOPING_CALL_P (insn) etc.

        Jakub
Re: [PATCH] PowerPC section type conflict (created PR 51623)
On 12/28/2011 09:39 AM, Michael Meissner wrote:
>         && in_section != text_section
> -       && !unlikely_text_section_p (in_section)
> +       && (in_section && (in_section->common.flags & SECTION_CODE)) == 0

You should be able to delete the text_section test as well, and in_section
should *never* be null, when emitting data.

Otherwise this looks much better to me.

r~
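With both simplifications applied, the section test reduces to a single flag check. The sketch below shows just that check in standalone C. The section layout is reduced to the one field used here, the SECTION_CODE value is a local stand-in, and the function name is invented; none of it is the rs6000.c code itself.

```c
#include <stdbool.h>

#define SECTION_CODE 0x100   /* stand-in value for illustration */

/* Hypothetical, heavily reduced model of a section: only the flags
   field that the check looks at.  */
struct section_common
{
  unsigned int flags;
};

union section_u
{
  struct section_common common;
};

/* True if IN_SECTION holds code.  Assumes IN_SECTION is never null
   while data is being emitted, as stated above.  */
static bool
in_code_section_p (const union section_u *in_section)
{
  return (in_section->common.flags & SECTION_CODE) != 0;
}
```

The condition in the patch would then use the negation of this test, and a separate comparison against text_section becomes redundant as long as the ordinary text section carries the code flag.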
Ping [ARM back-end and middle-end patch] stack check for threads
ping

I would like to introduce two new -fstack-check options named "direct" and
"indirect". Targets that do not support the new stack checking options will
work as before.

On the ARM platform the old "generic" option works as before. (It is now also
possible to have a checking code sequence even if optimization is switched
on.) The check against a given limit value during dynamic stack allocation
now works, too. Previously it did not, because the trap function was missing.
For this case I've added a code sequence that lets "generic" behave like the
dynamic part by doing a compare against a given limit value. I'm treating
this as keeping old stuff alive.

Back to the new options I would like to have here. Maybe you are happy with
the above, but I'm not. Sometimes you do not have one single limit value that
is valid for everything, for example in an environment with threads where
each thread uses its own stack at a different location. In that case all
functions should share knowledge of a global limit variable that holds the
limit value. This limit value can be used to check whether a stack overflow
has occurred.

There are two ways to inform the compiler about this limit variable. If it is
an ordinary variable (located somewhere in data space) you should use the
option combination
  -fstack-check=indirect and -fstack-limit-symbol=global_stack_limit
If it is a global register variable you should use the option combination
  -fstack-check=direct and -fstack-limit-register=r6
In this case you have to make sure that this register isn't used by others.
For example you can add the option -ffixed-r6 to all files that are not going
to do stack checking.

The OS is responsible for inserting the correct limit value, for example at
the end of a context switch.

I've added a little bit of documentation, too. This may not be as good as you
expect, but it is the best I can do. Sorry for that.
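To illustrate what a check against a per-thread limit variable amounts to, here is a small C model. It is not the code sequence the patch emits (that lives in the function prologue and works on the real stack pointer); the function name is invented, the stack is assumed to grow downwards, and the stack pointer is passed in as a plain integer so the example can run anywhere. Only the variable name global_stack_limit is taken from the option example above.

```c
#include <stddef.h>
#include <stdint.h>

/* Limit variable in data space, as used with -fstack-limit-symbol.
   The OS (or thread library) is assumed to update it on every context
   switch so that it always describes the running thread's stack.  */
static uintptr_t global_stack_limit;

/* Hypothetical model of the check: would allocating FRAME_SIZE bytes
   below SP cross the current limit?  Assumes a descending stack.  */
static int
stack_would_overflow (uintptr_t sp, size_t frame_size)
{
  if (sp < global_stack_limit)
    return 1;                   /* already below the limit */
  return sp - global_stack_limit < frame_size;
}
```

In the "direct" variant the limit would be read from a fixed register instead of from memory, which saves a load in every checked prologue at the price of reserving that register everywhere.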
I have added some tests running on arm simulator and linux arm target machine. I'm using ../src/configure --target=arm-elf and --target=arm-elf-eabi cross compilers and running tests with: gmake check-gcc RUNTESTFLAGS=--target_board=arm-sim arm_stack_check.exp Also using a native linux compiler (on armv7-a machine) and running tests with: gmake check-gcc RUNTESTFLAGS=arm_stack_check.exp Each test case is done with: - stack checking variants generic using a limit-symbol, generic using a limit register, direct using a limit-symbol, direct using a limit register and indirect using a limit-symbol - various modes ARM, Thumb, (and if possible with Thumb-2) - With and without optimization. - Without -fpic, with -fpic and with -fpic -msingle-pic-base I have also detected a minor bug if using combination: -fpic -mpic-register=r9 -march=armv4t -mthumb. (A move of the hi register to a lo register is missing here.) So I've added the few lines of code in here, too. Maybe you think that this is a nasty hack, so insert a better one instead. All tests succeeds. I'm still thinking that my idea isn't that bad. How ever any feedback from the ARM maintainers would be god. Even if is something like: We hate this bull shit at all. Any feedback is better than no feedback. Regards Thomas Klein references http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01261.html http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00310.html http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00216.html http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00281.html http://gcc.gnu.org/ml/gcc-patches/2011-07/msg00149.html http://gcc.gnu.org/ml/gcc-patches/2011-06/msg01872.html http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01226.html ChangeLog.check.bz2 Description: Binary data gcc.diff_chk.bz2 Description: Binary data gcc.diff_dep.bz2 Description: Binary data ChangeLog.test.bz2 Description: Binary data gcc.diff_test.bz2 Description: Binary data
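The per-thread limit idea described in this message can be modelled in portable C. The following is only a minimal sketch under stated assumptions: it does not use the proposed -fstack-check=direct/indirect options, and `struct thread_info`, `switch_to` and `stack_would_overflow` are hypothetical names invented for illustration. Only the symbol name `global_stack_limit` is taken from the message, and a downward-growing stack is assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Global limit variable, named as in the -fstack-limit-symbol example
   from the message.  Here it is an ordinary variable in data space.  */
uintptr_t global_stack_limit;

/* Hypothetical per-thread record kept by the OS (not from the patch).  */
struct thread_info
{
  uintptr_t stack_base;  /* highest address of this thread's stack */
  size_t stack_size;     /* number of bytes reserved for the stack */
};

/* What an OS might do at the end of a context switch: publish the
   limit of the thread that is about to run.  Assumes the stack grows
   towards lower addresses.  */
void
switch_to (const struct thread_info *t)
{
  global_stack_limit = t->stack_base - t->stack_size;
}

/* Model of the check a function prologue would perform: return true
   if moving the stack pointer SP down by FRAME bytes would cross the
   current limit.  */
bool
stack_would_overflow (uintptr_t sp, size_t frame)
{
  return sp < global_stack_limit || sp - global_stack_limit < frame;
}
```

Because the limit lives in one shared variable that is rewritten on every switch, each function can test against it without knowing which thread it runs in, which seems to be the point of the indirect variant.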
Re: [PATCH] PowerPC section type conflict (created PR 51623)
On Wed, Dec 28, 2011 at 12:34:25PM -0800, Richard Henderson wrote:
> On 12/28/2011 09:39 AM, Michael Meissner wrote:
> >       && in_section != text_section
> > -     && !unlikely_text_section_p (in_section)
> > +     && (in_section && (in_section->common.flags & SECTION_CODE)) == 0
>
> You should be able to delete the text_section test as well, and in_section should *never* be null when emitting data.
>
> Otherwise this looks much better to me.

Yeah, I thought about that. I'm wondering whether any integer is ever emitted in the text section, and whether I can just delete the two lines. I'll try it out.

--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com
fax +1 (978) 399-6899
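The flag test under discussion can be illustrated outside of GCC. This is a minimal sketch, not the rs6000 code: `union model_section`, the `MODEL_SECTION_CODE` value and both function names are stand-ins invented here, and only the shape of the expression follows the quoted patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag; the numeric value is an assumption for illustration
   and is not GCC's SECTION_CODE.  */
#define MODEL_SECTION_CODE 0x1u

/* Simplified model of a section object with a common flags word.  */
struct model_section_common
{
  unsigned int flags;
};

union model_section
{
  struct model_section_common common;
};

/* Shape of the test in the quoted patch: true when IN_SECTION is null
   or is not a code section.  */
bool
not_code_section_guarded (const union model_section *in_section)
{
  return (in_section && (in_section->common.flags & MODEL_SECTION_CODE)) == 0;
}

/* Shape suggested in the review: if IN_SECTION can never be null when
   emitting data, the null guard can be dropped.  */
bool
not_code_section (const union model_section *in_section)
{
  return (in_section->common.flags & MODEL_SECTION_CODE) == 0;
}
```

Testing the flag rather than comparing against individual section pointers covers every code section at once, which is presumably why the separate text_section comparison becomes redundant.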
Re: Ping**1.57 [Patch, fortran] Improve common function elimination
On Wed, Dec 28, 2011 at 04:21:55PM +0100, Thomas Koenig wrote: http://gcc.gnu.org/ml/fortran/2011-12/msg00102.html OK for trunk? I did not test the patch, but it appears correct to me. OK. -- Steve
MAINTAINERS: Add myself
Just committed:

* MAINTAINERS (Write After Approval): Add myself.

Index: MAINTAINERS
===================================================================
--- MAINTAINERS (revision 182712)
+++ MAINTAINERS (working copy)
@@ -352,6 +352,7 @@
 Michael Eager                 ea...@eagercon.com
 Phil Edwards                  p...@gcc.gnu.org
 Mohan Embar                   gnust...@thisiscool.com
+Oleg Endo                     olege...@gcc.gnu.org
 Revital Eres                  e...@il.ibm.com
 Marc Espie                    es...@cvs.openbsd.org
 Rafael Ávila de Espíndola     espind...@google.com
[PATCH] PR testsuite/51097 fix: a lot of FAIL: gcc.dg/vect on i686 avx build 181167 to 181177
Hi,

Here is another patch about failures in gcc.dg/vect tests. These changes fix failures that can be seen on avx-built compilers. The patch also introduces no FAILs/XFAILs/XPASSes/ERRORs on regular i686, x86_64, avx2_32, avx2_64.

Is it ok for the trunk?

Thanks,
Igor

2011-12-28  Igor Zamyatin  igor.zamya...@intel.com

PR testsuite/51097
* lib/target-supports.exp (check_effective_target_vect_float_no_int): New function.
(check_avx2_available): Ditto.
* gcc.dg/vect/no-scevccp-outer-7.c: Adjust dg-scans for AVX-built compiler.
* gcc.dg/vect/no-scevccp-vect-iv-3.c: Likewise.
* gcc.dg/vect/no-vfa-vect-depend-1.c: Likewise.
* gcc.dg/vect/no-vfa-vect-dv-2.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Likewise.
* gcc.dg/vect/slp-reduc-6.c: Likewise.
* gcc.dg/vect/slp-widen-mult-half.c: Likewise.
* gcc.dg/vect/vect-109.c: Likewise.
* gcc.dg/vect/vect-119.c: Likewise.
* gcc.dg/vect/vect-35-big-array.c: Likewise.
* gcc.dg/vect/vect-91.c: Likewise.
* gcc.dg/vect/vect-multitypes-4.c: Likewise.
* gcc.dg/vect/vect-multitypes-6.c: Likewise.
* gcc.dg/vect/vect-outer-4c-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-1.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.
* gcc.dg/vect/vect-peel-1.c: Likewise.
* gcc.dg/vect/vect-peel-3.c: Likewise.
* gcc.dg/vect/vect-peel-4.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-s16a.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-s8a.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-u8a.c: Likewise.
* gcc.dg/vect/vect-reduc-dot-u8b.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-1a.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-1b-big-array.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-1c-big-array.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-2a.c: Likewise.
* gcc.dg/vect/vect-reduc-pattern-2b-big-array.c: Likewise.
* gcc.dg/vect/vect-widen-mult-const-s16.c: Likewise.
* gcc.dg/vect/vect-widen-mult-const-u16.c: Likewise.
* gcc.dg/vect/vect-widen-mult-half-u8.c: Likewise.
* gcc.dg/vect/vect-widen-mult-half.c: Likewise.
* gcc.dg/vect/vect-widen-mult-sum.c: Likewise.
* gcc.dg/vect/vect-widen-mult-u16.c: Likewise.
* gcc.dg/vect/wrapv-vect-reduc-dot-s8b.c: Likewise.

Attachment: 51097.patch (binary data)
Re: PR rtl-optimization/51069 (verify_loop_info failed)
Hi,

Just a nit, can't you break out of the loop when irred_invalidated is set to true as well? There is no need to look through any further edges. I.e. perhaps:

  if (!irred_invalidated)
    FOR_EACH_EDGE (ae, ei, e->src->succs)
      if (ae != e
          && ae->dest != EXIT_BLOCK_PTR
          && (ae->flags & EDGE_IRREDUCIBLE_LOOP)
          && !TEST_BIT (seen, ae->dest->index))
        {
          irred_invalidated = true;
          break;
        }

Thanks for looking into this, I'll defer the review to somebody familiar with cfgloopmanip.c though.

the change looks fine to me.

Sure, though we do have horrible time complexity in case irreducible regions are including recomputing the whole CFG flags after every path removal.

Yeah, though trying to keep it up-to-date locally was a nightmare.

We actually do not use the information about irreducible regions all that much, so maybe the right approach would be to just compute it when needed.

Zdenek
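The early-exit suggestion can be tried in isolation. The sketch below is an assumption-laden model, not cfgloopmanip.c: `struct model_edge`, `MODEL_EDGE_IRREDUCIBLE_LOOP`, the `edges_examined` counter and the function name are all invented for illustration, a plain array replaces FOR_EACH_EDGE, and a bool array replaces the `seen` bitmap.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the EDGE_IRREDUCIBLE_LOOP flag; the value is arbitrary.  */
#define MODEL_EDGE_IRREDUCIBLE_LOOP 0x1u

/* Simplified edge: flags, index of the destination block, and whether
   the destination is the exit block.  */
struct model_edge
{
  unsigned int flags;
  int dest_index;
  bool dest_is_exit;
};

/* Number of edges looked at by the last scan, kept only so the effect
   of the early break is observable.  */
size_t edges_examined;

/* Return true as soon as one successor edge other than E is marked
   irreducible and leads to a non-exit block that has not been seen.  */
bool
scan_for_irreducible (const struct model_edge *succs, size_t n_succs,
                      const struct model_edge *e, const bool *seen)
{
  bool irred_invalidated = false;

  edges_examined = 0;
  for (size_t i = 0; i < n_succs; i++)
    {
      const struct model_edge *ae = &succs[i];

      edges_examined++;
      if (ae != e
          && !ae->dest_is_exit
          && (ae->flags & MODEL_EDGE_IRREDUCIBLE_LOOP)
          && !seen[ae->dest_index])
        {
          irred_invalidated = true;
          break;  /* no need to look through any further edges */
        }
    }
  return irred_invalidated;
}
```

The result is the same with or without the break; only the number of edges visited changes.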
[wwwdocs] - changes to GUPC page
Hello,

I updated the GUPC page with a Download section. Attached is a patch. Ok to commit? Feedback is appreciated.

Nenad

Index: htdocs/projects/gupc.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/gupc.html,v
retrieving revision 1.5
diff -u -r1.5 gupc.html
--- htdocs/projects/gupc.html 31 Dec 2010 11:36:07 - 1.5
+++ htdocs/projects/gupc.html 28 Dec 2011 21:47:42 -
@@ -68,7 +68,7 @@
 <li>Intel x86 Linux uniprocessor and symmetric multiprocessor
 systems (CentOS 5.3)</li>
 <li>Intel x86 Apple Mac OS X uniprocessor and symmetric multiprocessor
-systems (Leopard 10.5.7+ and Snow Leopard 10.6)</li>
+systems (Leopard 10.5.7+, Snow Leopard 10.6, and Lion 1.7)</li>
 <li>Mips2 32-bit (-n32) ABI and mips4 64-bit (-n64) ABI (SGI IRIX 6.5)</li>
 <li>Cray XT3/4/5 CNL and Catamount</li>
 <li>As a front-end to the Berkeley UPC Berkeley UPC runtime
@@ -81,6 +81,16 @@
 <a href="#gupc_discuss">GUPC discussion list</a>.
 </p>
 
+<h2>Download</h2>
+
+<p>The latest release of GUPC can be downloaded from <a
+href="http://www.gccupc.org/downloads.html">gccupc.org</a>.</p>
+
+<p>Alternatively, read-only SVN access to the GUPC branch can be used to
+acquire the latest development source tree:</p>
+
+<pre>svn checkout svn://gcc.gnu.org/svn/gcc/branches/gupc</pre>
+
 <h2>Documentation</h2>
 
 <p>For a list of configuration switches that you can use to build GUPC, consult
Re: [SH] Fix defunct -mbranch-cost option
Oleg Endo oleg.e...@t-online.de wrote: while working on another PR I've noticed that the -mbranch-cost option in the SH target is not really working. The attached patch brings it back to life, leaving the default behavior unchanged. Cheers, Oleg 2011-12-28 Oleg Endo oleg.e...@t-online.de * config/sh/sh.h (BRANCH_COST): Use sh_branch_cost variable. * config/sh/sh.c (sh_option_override): Simplify sh_branch_cost expression. Ok as the obvious fix. Thanks for the patch. Regards, kaz
[C++ Patch] PR 51316
Hi,

I think the resolution of core/930 and C++11 itself are pretty clear: alignof of an array of unknown bound is fine, provided the element type is complete of course.

Tested x86_64-linux.

Thanks,
Paolo.

//

/c-family
2011-12-29  Paolo Carlini  paolo.carl...@oracle.com

PR c++/51316
* c-common.c (c_sizeof_or_alignof_type): In C++ allow for alignof of array types with an unknown bound.

/testsuite
2011-12-29  Paolo Carlini  paolo.carl...@oracle.com

PR c++/51316
* g++.dg/cpp0x/alignof4.C: New.

Index: testsuite/g++.dg/cpp0x/alignof4.C
===================================================================
--- testsuite/g++.dg/cpp0x/alignof4.C (revision 0)
+++ testsuite/g++.dg/cpp0x/alignof4.C (revision 0)
@@ -0,0 +1,7 @@
+// PR c++/51316
+// { dg-options "-std=c++0x" }
+
+int main()
+{
+  alignof(int []);
+}
Index: c-family/c-common.c
===================================================================
--- c-family/c-common.c (revision 182710)
+++ c-family/c-common.c (working copy)
@@ -4382,13 +4382,22 @@ c_sizeof_or_alignof_type (location_t loc,
         return error_mark_node;
       value = size_one_node;
     }
-  else if (!COMPLETE_TYPE_P (type))
+  else if (!COMPLETE_TYPE_P (type)
+           && (!c_dialect_cxx () || is_sizeof || type_code != ARRAY_TYPE))
     {
       if (complain)
-        error_at (loc, "invalid application of %qs to incomplete type %qT ",
+        error_at (loc, "invalid application of %qs to incomplete type %qT",
                   op_name, type);
       return error_mark_node;
     }
+  else if (c_dialect_cxx () && type_code == ARRAY_TYPE
+           && !COMPLETE_TYPE_P (TREE_TYPE (type)))
+    {
+      if (complain)
+        error_at (loc, "invalid application of %qs to array type %qT of "
+                  "incomplete element type", op_name, type);
+      return error_mark_node;
+    }
   else
     {
       if (is_sizeof)
Use DW_LANG_Go for Go
This patch to gcc uses the new DW_LANG_Go DWARF language code for Go. Bootstrapped and ran testsuite on x86_64-unknown-linux-gnu. Committed on the basis of 1) I am a middle-end maintainer; 2) I am a Go maintainer; 3) the patch is obvious.

Ian

2011-12-28  Ian Lance Taylor  i...@google.com

* dwarf2out.c (gen_compile_unit_die): Use DW_LANG_Go for Go.

Index: dwarf2out.c
===================================================================
--- dwarf2out.c (revision 182694)
+++ dwarf2out.c (working copy)
@@ -18433,6 +18433,11 @@ gen_compile_unit_die (const char *filena
         language = DW_LANG_ObjC;
       else if (strcmp (language_string, "GNU Objective-C++") == 0)
         language = DW_LANG_ObjC_plus_plus;
+      else if (dwarf_version >= 5 || !dwarf_strict)
+        {
+          if (strcmp (language_string, "GNU Go") == 0)
+            language = DW_LANG_Go;
+        }
     }
   add_AT_unsigned (die, DW_AT_language, language);
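The gating condition in this patch (emit the new code only for DWARF 5 or later, or when strict DWARF is off) can be shown with a small stand-alone function. This is a sketch under assumptions, not dwarf2out.c: `pick_language` is a hypothetical helper, the fallback to DW_LANG_C89 is assumed here for illustration, and the other language branches of the real function are omitted. The two numeric values are the DWARF language codes for C89 and Go.

```c
#include <stdbool.h>
#include <string.h>

/* DWARF language codes used in this sketch.  */
enum
{
  DW_LANG_C89 = 0x0001,
  DW_LANG_Go = 0x0016
};

/* Hypothetical helper mirroring the shape of the new branch: choose
   DW_LANG_Go only when the requested DWARF version is at least 5 or
   strict DWARF conformance is not requested.  Falling back to
   DW_LANG_C89 is an assumption made for this sketch.  */
int
pick_language (const char *language_string, int dwarf_version,
               bool dwarf_strict)
{
  int language = DW_LANG_C89;

  if (dwarf_version >= 5 || !dwarf_strict)
    {
      if (strcmp (language_string, "GNU Go") == 0)
        language = DW_LANG_Go;
    }
  return language;
}
```

With strict DWARF and a version below 5, a consumer would still see the fallback code, which keeps the output within what older standards define.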
[libitm] Remove variadic argument of _ITM_beginTransaction from libitm.h
With i386, the regparm(2) is not taken into account when there is a variadic function. All parameters are on the stack. Since this variable argument is never used, removing it is not a problem.

This solves libitm testcases memset-1.c/memcpy-1.c on i686 (part of PR51655/51124).

Before:
FAIL: libitm.c/memcpy-1.c execution test
FAIL: libitm.c/memset-1.c execution test

=== libitm Summary ===
# of expected passes            21
# of unexpected failures        2
# of expected failures          5
# of unresolved testcases       1

After:
=== libitm Summary ===
# of expected passes            23
# of expected failures          5
# of unresolved testcases       1

Tested on i686. If ok, please commit. Thanks.

Patrick Marlier.

2011-12-28  Patrick Marlier  patrick.marl...@gmail.com

PR testsuite/51655
* libitm.h (_ITM_beginTransaction): Remove unused argument.

Index: libitm.h
===================================================================
--- libitm.h (revision 182549)
+++ libitm.h (working copy)
@@ -136,7 +136,7 @@ typedef uint64_t _ITM_transactionId_t; /* Transact
 extern _ITM_transactionId_t _ITM_getTransactionId(void) ITM_REGPARM;
-extern uint32_t _ITM_beginTransaction(uint32_t, ...) ITM_REGPARM;
+extern uint32_t _ITM_beginTransaction(uint32_t) ITM_REGPARM;
 extern void _ITM_abortTransaction(_ITM_abortReason) ITM_REGPARM ITM_NORETURN;
Re: [wwwdocs] - changes to GUPC page
On Dec 28, 2011, at 1:52 PM, Nenad Vukicevic wrote: -systems (Leopard 10.5.7+ and Snow Leopard 10.6)/li +systems (Leopard 10.5.7+, Snow Leopard 10.6, and Lion 1.7)/li 1.7? Should this be 10.7?
RE: PING: [PATCH, ARM, iWMMXt][4/5]: WMMX machine description
At 2011-12-22 17:53:45,Richard Earnshaw rearn...@arm.com wrote:
On 22/12/11 06:38, Xinyu Qi wrote:
At 2011-12-15 01:32:13,Richard Earnshaw rearn...@arm.com wrote:
On 24/11/11 01:33, Xinyu Qi wrote:

Hi Ramana, I solve the conflict, please try again. The new diff is attached. Thanks, Xinyu

At 2011-11-19 07:36:15,Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote:

Hi Xinyu, This doesn't apply cleanly currently on trunk and the reject appears to come from iwmmxt.md and I've not yet investigated why. Can you have a look ?

This patch is NOT ok. You're adding features that were new in iWMMXt2 (ie not in the original implementation) but you've provided no means by which the compiler can detect which operations are only available on the new cores.

Hi Richard, All of the WMMX chips support WMMX2 instructions.

This may be true for Marvell's current range of processors, but I find it hard to reconcile with the assembler support in GAS, which clearly distinguishes between iWMMXT and iWMMXT2 instruction sets. Are you telling me that no cores were ever manufactured (even by Intel) that only supported iWMMXT? I'm concerned that this patch will break support for existing users who have older chips (for GCC we have to go through a deprecation cycle if we want to drop support for something we now believe is no-longer worth maintaining).

What I do is to complement the WMMX2 intrinsic support in GCC.

I understand that, and I'm not saying the patch can never go in; just that it needs to separate out the support for the different architecture variants.

I don't think it is necessary for users to consider whether one WMMX insn is a WMMX2 insn or not.

Users don't (unless they want their code to run on legacy processors that only support the original instruction set), but the compiler surely must know what it is targeting. Remember that the instruction patterns are not entirely black boxes, the compiler can do optimizations on intrinsics (it's one of the reasons why they are better than inline assembly). Unless the compiler knows exactly what instructions are legal, it could end up optimizing something that started as a WMMX insn into something that's a WMMX2 insn (for example, propagating a constant into a vector shift expression).

R.

Hi, Richard,

You are right. Historically, there were chips that only supported WMMX instructions. This update of the patch distinguishes iWMMXt from iWMMXt2.

In current GCC, there is almost no difference between -march=iwmmxt and -march=iwmmxt2 (or -mcpu=iwmmxt and -mcpu=iwmmxt2) at the compiling stage. I take advantage of them to do the work, that is, make -march=iwmmxt (or -mcpu=iwmmxt) only support iWMMXt intrinsics, iWMMXt built-ins and WMMX instructions, and make -march=iwmmxt2 (or -mcpu=iwmmxt2) fully support iWMMXt2.

Define a new flag FL_IWMMXT2 to indicate that the chip supports the iWMMXt2 extension; it directly controls the iWMMXt2 built-in initialization and the following defines.
Define __IWMMXT2__ in TARGET_CPU_CPP_BUILTINS to control access to the iWMMXt2 intrinsics.
Define TARGET_REALLY_IWMMXT2 to control access to the machine description of the WMMX2 instructions.
In arm.md, define iwmmxt2 in the arch attr to control access to the alternative in the shift patterns.

The updated patch 4/5 is attached here. 1/5, 2/5 and 3/5 are updated accordingly and attached to the related mails. Please take a look and see whether this modification is appropriate.

Changelog:

* config/arm/arm.c (arm_output_iwmmxt_shift_immediate): New function.
(arm_output_iwmmxt_tinsr): Likewise.
* config/arm/arm-protos.h (arm_output_iwmmxt_shift_immediate): Declare.
(arm_output_iwmmxt_tinsr): Likewise.
* config/arm/iwmmxt.md (WCGR0, WCGR1, WCGR2, WCGR3): New constant.
(iwmmxt_psadbw, iwmmxt_walign, iwmmxt_tmrc, iwmmxt_tmcr): Delete.
(rorv4hi3, rorv2si3, rordi3): Likewise.
(rorv4hi3_di, rorv2si3_di, rordi3_di): Likewise.
(ashrv4hi3_di, ashrv2si3_di, ashrdi3_di): Likewise.
(lshrv4hi3_di, lshrv2si3_di, lshrdi3_di): Likewise.
(ashlv4hi3_di, ashlv2si3_di, ashldi3_di): Likewise.
(iwmmxt_tbcstqi, iwmmxt_tbcsthi, iwmmxt_tbcstsi): Likewise.
(*iwmmxt_clrv8qi, *iwmmxt_clrv4hi, *iwmmxt_clrv2si): Likewise.
(tbcstv8qi, tbcstv4hi, tbsctv2si): New pattern.
(iwmmxt_clrv8qi, iwmmxt_clrv4hi, iwmmxt_clrv2si): Likewise.
(*and<mode>3_iwmmxt, *ior<mode>3_iwmmxt, *xor<mode>3_iwmmxt): Likewise.
(ror<mode>3, ror<mode>3_di): Likewise.
(ashr<mode>3_di, lshr<mode>3_di, ashl<mode>3_di): Likewise.
(ashli<mode>3_iwmmxt, iwmmxt_waligni, iwmmxt_walignr): Likewise.
(iwmmxt_walignr0, iwmmxt_walignr1): Likewise.
(iwmmxt_walignr2, iwmmxt_walignr3): Likewise.
(iwmmxt_setwcgr0, iwmmxt_setwcgr1): Likewise.
(iwmmxt_setwcgr2, iwmmxt_setwcgr3): Likewise.
(iwmmxt_getwcgr0, iwmmxt_getwcgr1): Likewise.
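The separation asked for in this thread (the compiler must know whether iWMMXt2 instructions are legal for the selected architecture) comes down to a feature-flag test. The sketch below only illustrates that idea and is not taken from arm.c: the `MODEL_` flag values, `struct model_arch`, the table and both functions are hypothetical names introduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for FL_IWMMXT and the proposed FL_IWMMXT2; the bit
   positions are arbitrary choices for this sketch.  */
#define MODEL_FL_IWMMXT  (1u << 0)
#define MODEL_FL_IWMMXT2 (1u << 1)

/* Hypothetical architecture table entry.  */
struct model_arch
{
  const char *name;
  unsigned int flags;
};

/* An iwmmxt2 target carries both flags, an iwmmxt target only one.  */
static const struct model_arch model_arches[] = {
  { "iwmmxt",  MODEL_FL_IWMMXT },
  { "iwmmxt2", MODEL_FL_IWMMXT | MODEL_FL_IWMMXT2 },
};

/* Look up the feature flags for an -march style name; 0 if unknown.  */
unsigned int
model_arch_flags (const char *name)
{
  for (size_t i = 0; i < sizeof model_arches / sizeof model_arches[0]; i++)
    if (strcmp (model_arches[i].name, name) == 0)
      return model_arches[i].flags;
  return 0;
}

/* An instruction pattern is usable only if every feature bit it
   requires is present in the target's flags.  */
bool
model_insn_enabled (unsigned int arch_flags, unsigned int required)
{
  return (arch_flags & required) == required;
}
```

Under this model an optimization that would turn a WMMX operation into a WMMX2-only form can be rejected for an "iwmmxt" target, which is the concern raised in the review.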
RE: [PATCH, ARM, iWMMXt][1/5]: ARM code generic change
At 2011-12-15 00:47:48,Richard Earnshaw rearn...@arm.com wrote:
On 14/07/11 08:35, Xinyu Qi wrote:

Hi, It is the first part of iWMMXt maintenance.

*config/arm/arm.c (arm_option_override): Enable iWMMXt with VFP. iWMMXt and NEON are incompatible. iWMMXt unsupported under Thumb-2 mode.
(arm_expand_binop_builtin): Accept immediate op (with mode VOID)
*config/arm/arm.md: Resettle include location of iwmmxt.md so that *arm_movdi and *arm_movsi_insn could be used when iWMMXt is enabled.

With the current work in trunk to handle enabled attributes and per-alternative predicable attributes (Thanks Bernd) we should be able to get rid of *cond_iwmmxt_movsi_insn in iwmmxt.md file. It's not a matter for this patch but for a follow-up patch. Actually we should probably do the same for the various insns that are dotted around all over the place with final conditions that prevent matching - at least makes the backend description slightly smaller :).

Add pipeline description file include.

It is enough to say (filename): Include. in the changelog entry. The include for the pipeline description file should be with the patch that you add this in i.e. patch #5. Please add this to MD_INCLUDES in t-arm as well.

Also as a general note, please provide a correct Changelog entry. This is not the format that we expect Changelog entries to be in. Please look at the coding standards on the website for this or at other patches submitted with respect to Changelog entries. Please fix this for each patch in the patch stack.

cheers
Ramana

Thanks for reviewing. I have updated the patches and the Changelog.

*config/arm/arm.c (arm_option_override): Enable iWMMXt with VFP.
(arm_expand_binop_builtin): Accept VOIDmode op.
*config/arm/arm.md (*arm_movdi, *arm_movsi_insn): Remove condition !TARGET_IWMMXT.
(iwmmxt.md): Include location.

Thanks, Xinyu

+     VFP and iWMMXt however can coexist.  */
+  if (TARGET_IWMMXT && TARGET_HARD_FLOAT && !TARGET_VFP)
+    sorry ("iWMMXt and non-VFP floating point unit");
+
+  /* iWMMXt and NEON are incompatible.  */
+  if (TARGET_IWMMXT && TARGET_NEON)
+    sorry ("iWMMXt and NEON");

-  /* ??? iWMMXt insn patterns need auditing for Thumb-2.  */
+  /* iWMMXt unsupported under Thumb-2 mode.  */
   if (TARGET_THUMB2 && TARGET_IWMMXT)
     sorry ("Thumb-2 iWMMXt");

Don't use sorry() when a feature is not supported by the hardware; sorry() is used when GCC is currently unable to support something that it should. Use error() in these cases.

Secondly, iWMMXt is incompatible with the entire Thumb ISA, not just the Thumb-2 extensions to the Thumb ISA.

Done.

+;; Load the Intel Wireless Multimedia Extension patterns
+(include "iwmmxt.md")
+

No, the extension patterns need to come at the end of the main machine description. The list at the top of the MD file is purely for pipeline descriptions. Why do you think this is needed?

This modification is needless right now since *iwmmxt_movsi_insn and *iwmmxt_arm_movdi have been corrected in the fourth part of the patch. Revert it.

The new modified patch is attached.

* config/arm/arm.c (arm_option_override): Enable use of iWMMXt with VFP. Disable use of iWMMXt with NEON. Disable use of iWMMXt under Thumb mode.
(arm_expand_binop_builtin): Accept VOIDmode op.

Thanks, Xinyu

Other bits are ok.

R.

New changelog

* config/arm/arm.c (FL_IWMMXT2): New define.
(arm_arch_iwmmxt2): New variable.
(arm_option_override): Enable use of iWMMXt with VFP. Disable use of iWMMXt with NEON. Disable use of iWMMXt under Thumb mode. Set arm_arch_iwmmxt2.
(arm_expand_binop_builtin): Accept VOIDmode op.
* config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Define __IWMMXT2__.
(TARGET_IWMMXT2): New define.
(TARGET_REALLY_IWMMXT2): Likewise.
(arm_arch_iwmmxt2): Declare.
* config/arm/arm-cores.def (iwmmxt2): Add FL_IWMMXT2.
* config/arm/arm-arches.def (iwmmxt2): Likewise.
* config/arm/arm.md (arch): Add iwmmxt2.
(arch_enabled): Handle iwmmxt2.

Thanks, Xinyu

Attachment: 1_generic.diff
RE: PING: [PATCH, ARM, iWMMXt][2/5]: intrinsic head file change
* config/arm/mmintrin.h: Use __IWMMXT__ to enable iWMMXt intrinsics. Use __IWMMXT2__ to enable iWMMXt2 intrinsics. Use C name-mangling for intrinsics.
(__v8qi): Redefine.
(_mm_cvtsi32_si64, _mm_andnot_si64, _mm_sad_pu8): Revise.
(_mm_sad_pu16, _mm_align_si64, _mm_setwcx, _mm_getwcx): Likewise.
(_m_from_int): Likewise.
(_mm_sada_pu8, _mm_sada_pu16): New intrinsic.
(_mm_alignr0_si64, _mm_alignr1_si64, _mm_alignr2_si64): Likewise.
(_mm_alignr3_si64, _mm_tandcb, _mm_tandch, _mm_tandcw): Likewise.
(_mm_textrcb, _mm_textrch, _mm_textrcw, _mm_torcb): Likewise.
(_mm_torch, _mm_torcw, _mm_tbcst_pi8, _mm_tbcst_pi16): Likewise.
(_mm_tbcst_pi32): Likewise.
(_mm_abs_pi8, _mm_abs_pi16, _mm_abs_pi32): New iWMMXt2 intrinsic.
(_mm_addsubhx_pi16, _mm_absdiff_pu8, _mm_absdiff_pu16): Likewise.
(_mm_absdiff_pu32, _mm_addc_pu16, _mm_addc_pu32): Likewise.
(_mm_avg4_pu8, _mm_avg4r_pu8, _mm_maddx_pi16, _mm_maddx_pu16): Likewise.
(_mm_msub_pi16, _mm_msub_pu16, _mm_mulhi_pi32): Likewise.
(_mm_mulhi_pu32, _mm_mulhir_pi16, _mm_mulhir_pi32): Likewise.
(_mm_mulhir_pu16, _mm_mulhir_pu32, _mm_mullo_pi32): Likewise.
(_mm_qmulm_pi16, _mm_qmulm_pi32, _mm_qmulmr_pi16): Likewise.
(_mm_qmulmr_pi32, _mm_subaddhx_pi16, _mm_addbhusl_pu8): Likewise.
(_mm_addbhusm_pu8, _mm_qmiabb_pi32, _mm_qmiabbn_pi32): Likewise.
(_mm_qmiabt_pi32, _mm_qmiabtn_pi32, _mm_qmiatb_pi32): Likewise.
(_mm_qmiatbn_pi32, _mm_qmiatt_pi32, _mm_qmiattn_pi32): Likewise.
(_mm_wmiabb_si64, _mm_wmiabbn_si64, _mm_wmiabt_si64): Likewise.
(_mm_wmiabtn_si64, _mm_wmiatb_si64, _mm_wmiatbn_si64): Likewise.
(_mm_wmiatt_si64, _mm_wmiattn_si64, _mm_wmiawbb_si64): Likewise.
(_mm_wmiawbbn_si64, _mm_wmiawbt_si64, _mm_wmiawbtn_si64): Likewise.
(_mm_wmiawtb_si64, _mm_wmiawtbn_si64, _mm_wmiawtt_si64): Likewise.
(_mm_wmiawttn_si64, _mm_merge_si64): Likewise.
(_mm_torvscb, _mm_torvsch, _mm_torvscw): Likewise.
(_m_to_int): New define.

Thanks, Xinyu

Attachment: 2_mmintrin.diff