On Sun, Mar 10, 2013 at 2:08 AM, Steven Bosscher <stevenb....@gmail.com> wrote: > Hello, > > The attached patch fixes another one of the scalability problems > reported in PR middle-end/39326. > > This problem is that tree loop-invariant code motion explodes on basic > blocks with many memory references. Compile time is quadratic in the > number of memory references in the basic block, and so are the memory > requirements when the dependences or independences are propagated > bottom-up through the loop tree. > > The fix is to give up on loops with too many memory references to > handle. There is already a param for that for loop dependence > analysis, and this patch makes LIM use the same param. > > Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu. > OK for trunk?
Given + well. Return true if all is well, false if something happened + that is fatal to the rest of the LIM pass. */ -static void +static bool gather_mem_refs_stmt (struct loop *loop, gimple stmt) and FOR_EACH_BB (bb) { ... + for (bsi = gsi_start_bb (bb); + !gsi_end_p (bsi) && all_ok; + gsi_next (&bsi)) + all_ok = gather_mem_refs_stmt (loop, gsi_stmt (bsi)); + + if (! all_ok) + bitmap_set_bit (loops_with_too_many_memrefs, loop->num); + } + + /* Propagate the information about loops with too many memory + references up the loop hierarchy. */ + FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) + { + struct loop *outer = loop_outer (loop); + if (outer == current_loops->tree_root + || ! bitmap_bit_p (loops_with_too_many_memrefs, loop->num)) + continue; + bitmap_set_bit (loops_with_too_many_memrefs, outer->num); } I don't see how this propagation works correctly as you start to mark BBs as not-ok starting from a "random" basic-block in the loop tree. You of course also end up disqualifying very small loops completely if they happen to be analyzed after a very big one you disqualify. Of course that's partly because memory_accesses contains all memory accesses in the function - so I think rather than limiting on length of memory_accesses you want to limit on the length of memory_accesses.refs_in_loop (well, on memory_accesses.all_refs_in_loop). And you want the initial walk over all BBs to instead walk on BBs FOR_EACH_LOOP and LI_FROM_INNERMOST (you can then do the propagation to fill all_refs_in_loop there, too). I realize there isn't a good existing BB walker for this (with a suitable place to re-set all-ok) - a simple walk over get_loop_body via LI_FROM_INNERMOST would get you visit sub-loop bodies multiple times (easily skipped by a bb->loop_father test, of course, but still ...) That is, sth like the following preparatory patch to change the iteration: Index: gcc/tree-ssa-loop-im.c =================================================================== --- gcc/tree-ssa-loop-im.c (revision 196547) +++ gcc/tree-ssa-loop-im.c (working copy) @@ -1629,29 +1629,30 @@ gather_mem_refs_in_loops (void) loop_iterator li; bitmap lrefs, alrefs, alrefso; - FOR_EACH_BB (bb) - { - loop = bb->loop_father; - if (loop == current_loops->tree_root) - continue; - - for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi)) - gather_mem_refs_stmt (loop, gsi_stmt (bsi)); - } - - /* Propagate the information about accessed memory references up - the loop hierarchy. */ FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST) { + basic_block bbs = get_loop_body (loop); + for (i = 0; i < loop->num_nodes; ++i) + { + bb = bbs[i]; + if (bb->loop_father != loop) + continue; + for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (&bsi)) + gather_mem_refs_stmt (loop, gsi_stmt (bsi)); + } + free (bbs); + + /* Propagate the information about accessed memory references up + the loop hierarchy. */ lrefs = memory_accesses.refs_in_loop[loop->num]; alrefs = memory_accesses.all_refs_in_loop[loop->num]; bitmap_ior_into (alrefs, lrefs); - if (loop_outer (loop) == current_loops->tree_root) - continue; - - alrefso = memory_accesses.all_refs_in_loop[loop_outer (loop)->num]; - bitmap_ior_into (alrefso, alrefs); + if (loop_outer (loop) != current_loops->tree_root) + { + alrefso = memory_accesses.all_refs_in_loop[loop_outer (loop)->num]; + bitmap_ior_into (alrefso, alrefs); + } } } At this point this should be stage1 material, eventually backported for 4.8.1. And yes, aside from the above the rest of the patch looks good to me (move loops_with_too_many_memrefs into the memory_accesses struct?) Thanks, Richard. > Ciao! > Steven > > (The ChangeLog is a bit long but the patch is relatively straight-forward.) > > * tree-flow.h (enum move_pos): Moved to tree-ssa-loop-im.c. > * tree-ssa-loop-im.c: Include diagnostic-core.h for warning_at() > (enum move_pos): Moved here. > (movement_possibility): Made static. Reject memory operations in > loops with too many memory references to handle. > (determine_max_movement): Take loops_with_too_many_memrefs argument. > For statements referencing memory, find the outermost superloop > that is not in the loops_with_too_many_memrefs set. > (determine_invariantness_stmt): Take loops_with_too_many_memrefs > via dom_walk_data.global_data, and pass it along where necessary. > Hoist "pos == MOVE_POSSIBLE" test. > (determine_invariantness): Take loops_with_too_many_memrefs argument. > (move_computations): Likewise, but unused for now. > (gather_mem_refs_stmt): Fail if there are too many memory references. > Use PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS as threshold. Add disabled > optimization warning. > (gather_mem_refs_in_loops): Take loops_with_too_many_memrefs argument. > Propagate it from inner loops to outer loops. Do not propagate > recorded memory references for loops on which memory optimizations > are disabled. > (create_vop_ref_mapping): Take loops_with_too_many_memrefs argument. > Don't create a mapping on loops that are in this set. > (analyze_memory_references): Take loops_with_too_many_memrefs argument > and let subroutines fill it. > (store_motion): Take loops_with_too_many_memrefs argument. > Skip loops that are in this set. > (tree_ssa_lim): Allocate, pass, and free loops_with_too_many_memrefs.