[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505

Richard Guenther changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |4.6.0
   Target Milestone|4.4.6                       |4.6.0

--- Comment #15 from Richard Guenther 2011-01-13 16:47:22 UTC ---
For 4.6.  Nothing to backport here.
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |law at redhat dot com
         Resolution|                            |FIXED
      Known to fail|                            |

--- Comment #14 from Jeffrey A. Law 2011-01-13 15:45:22 UTC ---
Fixed long ago.
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505

--- Comment #13 from Sandra Loosemore 2010-10-01 15:01:08 UTC ---
I think this bug is fixed now.
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.5                       |4.4.6
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--- Comment #12 from sandra at gcc dot gnu dot org 2010-07-10 18:43 ---
Subject: Bug 42505

Author: sandra
Date: Sat Jul 10 18:43:29 2010
New Revision: 162043

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162043
Log:
2010-07-10  Sandra Loosemore

    PR middle-end/42505

    gcc/
    * tree-inline.c (estimate_num_insns): Refactor builtin complexity
    lookup code into
    * builtins.c (is_simple_builtin, is_inexpensive_builtin): ...these
    new functions.
    * tree.h (is_simple_builtin, is_inexpensive_builtin): Declare.
    * cfgloopanal.c (target_clobbered_regs): Define.
    (init_set_costs): Initialize target_clobbered_regs.
    (estimate_reg_pressure_cost): Add call_p argument.  When true,
    adjust the number of available registers to exclude the
    call-clobbered registers.
    * cfgloop.h (target_clobbered_regs): Declare.
    (estimate_reg_pressure_cost): Adjust declaration.
    * tree-ssa-loop-ivopts.c (struct ivopts_data): Add
    body_includes_call.
    (ivopts_global_cost_for_size): Pass it to
    estimate_reg_pressure_cost.
    (determine_set_costs): Dump target_clobbered_regs.
    (loop_body_includes_call): New function.
    (tree_ssa_iv_optimize_loop): Use it to initialize new field.
    * loop-invariant.c (gain_for_invariant): Adjust arguments to pass
    call_p flag through.
    (best_gain_for_invariant): Likewise.
    (find_invariants_to_move): Likewise.
    (move_single_loop_invariants): Likewise, using already-computed
    has_call field.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/builtins.c
    trunk/gcc/cfgloop.h
    trunk/gcc/cfgloopanal.c
    trunk/gcc/loop-invariant.c
    trunk/gcc/tree-inline.c
    trunk/gcc/tree-ssa-loop-ivopts.c
    trunk/gcc/tree.h

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--- Comment #11 from sandra at gcc dot gnu dot org 2010-07-05 17:41 ---
Subject: Bug 42505

Author: sandra
Date: Mon Jul 5 17:40:57 2010
New Revision: 161844

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=161844
Log:
2010-07-05  Sandra Loosemore

    PR middle-end/42505

    gcc/
    * tree-ssa-loop-ivopts.c (determine_set_costs): Delete obsolete
    comments about cost model.
    (try_add_cand_for): Add second strategy for choosing initial set
    based on original IVs, controlled by ORIGINALP argument.
    (get_initial_solution): Add ORIGINALP argument.
    (find_optimal_iv_set_1): New function, split from
    find_optimal_iv_set.
    (find_optimal_iv_set): Try two different strategies for choosing
    the IV set, and return the one with lower cost.

    gcc/testsuite/
    * gcc.target/arm/pr42505.c: New test case.

Added:
    trunk/gcc/testsuite/gcc.target/arm/pr42505.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-ssa-loop-ivopts.c

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505
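The "try two strategies and keep the cheaper one" idea in the log above can be sketched in a few lines of C. This is only an illustration, not the code from tree-ssa-loop-ivopts.c: the names iv_solution, iv_search_fn, choose_cheaper_iv_set and toy_search are hypothetical, the costs are made up, and preferring the original-IV set on a tie is an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical result of one search for an induction-variable set. */
struct iv_solution {
    unsigned cost;            /* estimated cost of the chosen set */
    bool from_original_ivs;   /* true if the search was seeded with the original IVs */
};

/* Hypothetical search routine; ORIGINALP selects the seeding strategy. */
typedef struct iv_solution (*iv_search_fn)(bool originalp);

/* Run the search once per strategy and return the cheaper result.
   Preferring the original-IV result on a tie is an assumption. */
struct iv_solution choose_cheaper_iv_set(iv_search_fn search)
{
    struct iv_solution orig = search(true);
    struct iv_solution fresh = search(false);
    return orig.cost <= fresh.cost ? orig : fresh;
}

/* Toy search with made-up costs, used only to exercise the chooser. */
struct iv_solution toy_search(bool originalp)
{
    struct iv_solution s = { originalp ? 5u : 7u, originalp };
    return s;
}
```

With the toy costs, choose_cheaper_iv_set(toy_search) returns the solution seeded from the original IVs, since 5 is lower than 7.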
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--- Comment #10 from sandra at codesourcery dot com 2010-06-19 12:56 ---
Patch posted here:

http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01920.html

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--- Comment #9 from sandra at codesourcery dot com 2010-06-12 07:42 ---
I now have a specific theory of what is going on here.  There are two
problems:

(1) estimate_reg_pressure_cost is not accounting for the function call in the
loop body.  In this case it ought to use call_used_regs instead of fixed_regs
to determine how many registers are available for loop invariants.  Here the
target is Thumb-1 and there are only 4 non-call-clobbered registers available
rather than 9, so we are much more constrained than ivopts thinks we are.
This is pretty straightforward to fix.

(2) For the test case filed with the issue, there are 4 registers needed for
the two candidates and two invariants ivopts is selecting, so even with the
fix for (1) ivopts thinks it has enough registers available.  But, there are
two uses of the form (src + offset) in the ivopts output, although they appear
differently in the gimple code.  RTL optimizations are combining these and
allocating a temporary.  Since the two uses span the function call in the loop
body, the temporary needs to be assigned to a non-call-clobbered register.
This is why there is a spill of the other loop invariant.

Perhaps we could make the RA smarter about recomputing the src + offset value
rather than resort to spilling something, but since I am dumb about the RA ;-)
I'm planning to keep poking at the ivopts cost model instead.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505
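To make point (1) concrete, here is a rough, self-contained model of a register-pressure cost that shrinks the register budget when the loop body contains a call. It is a sketch, not GCC's estimate_reg_pressure_cost: the struct, the function name pressure_cost, and the cost values 1 and 10 are hypothetical. The figures 9 and 4 come from the comment above (so 5 registers are treated as call-clobbered), and the 3 reserved registers come from comment #8.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of a target's register budget. */
struct reg_model {
    unsigned avail_regs;      /* registers usable for loop invariants */
    unsigned clobbered_regs;  /* how many of those a call clobbers */
    unsigned reserved_regs;   /* registers kept "free" for scratch use */
    unsigned reg_cost;        /* made-up cost per register when nearly full */
    unsigned spill_cost;      /* made-up cost per register once spilling */
};

/* Figures from the thread: 9 registers, only 4 of which survive a call,
   and 3 reserved.  The two cost values are illustrative only. */
static const struct reg_model example_target = { 9, 5, 3, 1, 10 };

/* Cost of keeping N_NEW more values live on top of N_OLD existing ones.
   When CALL_P is true the call-clobbered registers are excluded. */
unsigned pressure_cost(const struct reg_model *m, unsigned n_new,
                       unsigned n_old, bool call_p)
{
    unsigned needed = n_new + n_old;
    unsigned avail = m->avail_regs;

    if (call_p)
        avail -= m->clobbered_regs;

    if (needed + m->reserved_regs <= avail)
        return 0;                       /* plenty of registers left */
    if (needed <= avail)
        return m->reg_cost * n_new;     /* close to running out */
    return m->spill_cost * n_new;       /* out of registers: spills */
}
```

In this model, 4 live values cost nothing when the call is ignored, become non-free once the call is taken into account, and a fifth value (like the combined src + offset temporary in point (2)) pushes the estimate into the spill range.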
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--- Comment #8 from sandra at codesourcery dot com 2010-06-10 13:01 ---
I was barking up the wrong tree with my last idea -- the signed/unsigned
conversion business was a red herring.  Here's what I now believe is the
problem: the costs computation is underestimating the register pressure costs
so that we are in fact spilling when the cost computation thinks it still has
"free" registers.

A hack to make get_computation_cost_at add target_reg_cost to the result when
it must use a scratch register seemed to have positive overall effects on code
size (as well as fixing the test case).  But, I don't think that's the real
solution, as I can't come up with a good logical justification for putting
such a cost there.  :-)  estimate_reg_pressure_cost already reserves 3 "free"
registers for such things.

Anyway, I am continuing to poke at this in hopes of figuring out where the
register costs model is really going wrong.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--- Comment #7 from sandra at codesourcery dot com 2010-06-05 20:41 ---
OK, I'm testing a hack to rewrite_use_compare to make it know that it doesn't
have to introduce a temporary just to compare against constant zero.  I'm also
doing a little tuning of the costs model for -Os, using CSiBE.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--- Comment #6 from rguenth at gcc dot gnu dot org 2010-06-04 09:08 ---
If the result of the conversion is only used in an exit equality test against
a constant it can be dropped.  This could also happen in a following forwprop
run which is our single tree-combiner (though that currently will combine
into comparisons only if the result will be a constant, it doesn't treat
defs with a single use specially which it could, if the combined constant
is in gimple form).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505
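For the specific case of an equality test against zero, the claim can be checked with a small C sketch. The function names are hypothetical. Converting an out-of-range unsigned value to int is implementation-defined in C; the sketch assumes the modulo (two's complement) behaviour that GCC documents, under which only the value 0 converts to 0.

```c
#include <stdbool.h>

/* Exit test as ivopts emitted it for the signed case: convert the
   unsigned IV to int first, then compare against zero. */
bool exit_test_with_conversion(unsigned int iv)
{
    int count = (int) iv;   /* implementation-defined if iv > INT_MAX */
    return count != 0;
}

/* Exit test with the conversion dropped. */
bool exit_test_direct(unsigned int iv)
{
    return iv != 0;
}
```

Under that assumption both functions return the same result for every input, so the converted value is dead if the exit test is its only use.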
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--- Comment #5 from steven at gcc dot gnu dot org 2010-06-04 07:45 ---
AFAIU, you can't randomly change signed to unsigned, due to different overflow
semantics, which is why IVOPTS doesn't make this change itself.  Imagine you
enter the loop with count = 0, and with a second counter hidden in func.  You
will not get the same number of iterations if you change the type of count from
"int" to "unsigned int".

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505
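A separate, deliberately simple illustration of how the type of a counter can change the trip count. This is not the loop from the bug report: the relational exit test (count > 0), the function names, and the cap parameter that keeps the loop finite are all assumptions chosen to make the difference visible.

```c
/* Count down a signed counter; a negative start value never enters the loop. */
unsigned iterations_signed(int count, unsigned cap)
{
    unsigned n = 0;
    while (count > 0 && n < cap) {
        count--;
        n++;
    }
    return n;
}

/* Same loop with an unsigned counter; a "negative" start value has
   wrapped to a huge positive one, so the loop runs until the cap. */
unsigned iterations_unsigned(unsigned int count, unsigned cap)
{
    unsigned n = 0;
    while (count > 0 && n < cap) {
        count--;
        n++;
    }
    return n;
}
```

For ordinary positive start values the two agree; for a start value of -1 the signed loop runs zero times while the unsigned one runs until the cap is hit.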
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--- Comment #4 from sandra at codesourcery dot com 2010-06-04 00:09 ---
I've been looking at this problem today.  Here's the stupid part coming out of
ivopts:

:
  # ivtmp.7_21 = PHI <0(2), ivtmp.7_20(4)>
  # ivtmp.10_22 = PHI
  count_25 = (int) ivtmp.10_22;
  if (count_25 != 0)
    goto ;
  else
    goto ;

No subsequent pass is recognizing that the unsigned-to-signed conversion is
useless and "count" is otherwise dead.

If I change the parameter "count" to have type "unsigned int", then ivopts does
the obvious replacement itself:

:
  # ivtmp.7_21 = PHI <0(2), ivtmp.7_20(4)>
  # ivtmp.10_22 = PHI
  if (ivtmp.10_22 != 0)
    goto ;
  else
    goto ;

Then "count" is completely gone from the loop after ivopts and the resulting
code looks good.

So, fix this somewhere inside ivopts to make the signed case produce the same
code as the unsigned one?  Or tell it not to replace count at all if it has to
do a type conversion?  I'm still trying to find my way around the code for this
pass to figure out where things happen, so if this is obvious to someone else
I'd appreciate a pointer.  :-)

--

sandra at codesourcery dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sandra at codesourcery dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505
[Bug middle-end/42505] [4.4/4.5/4.6 Regression] loop canonicalization causes a lot of unnecessary temporary variables
--

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.4                       |4.4.5

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42505