On Thu, 14 May 2015, Tom de Vries wrote: > On 20-04-15 14:25, Richard Biener wrote: > > On Wed, 15 Apr 2015, Tom de Vries wrote: > > > > > On 03-04-15 14:39, Tom de Vries wrote: > > > > On 27-03-15 15:10, Tom de Vries wrote: > > > > > Hi, > > > > > > > > > > this patch fixes PR65443, a todo in the parloops pass for function > > > > > transform_to_exit_first_loop: > > > > > ... > > > > > TODO: the common case is that latch of the loop is empty and > > > > > immediately > > > > > follows the loop exit. In this case, it would be better not to > > > > > copy > > > > > the > > > > > body of the loop, but only move the entry of the loop directly > > > > > before > > > > > the > > > > > exit check and increase the number of iterations of the loop by > > > > > one. > > > > > This may need some additional preconditioning in case NIT = ~0. > > > > > ... > > > > > > > > > > The current implementation of transform_to_exit_first_loop transforms > > > > > the > > > > > loop > > > > > by moving and duplicating the loop body. This patch transforms the > > > > > loop > > > > > into the > > > > > same normal form, but without the duplication, if that's possible > > > > > (determined by > > > > > try_transform_to_exit_first_loop_alt). > > > > > > > > > > The actual transformation, done by transform_to_exit_first_loop_alt > > > > > transforms > > > > > this loop: > > > > > ... > > > > > bb preheader: > > > > > ... > > > > > goto <bb header> > > > > > > > > > > <bb header>: > > > > > ... > > > > > if (ivtmp < n) > > > > > goto <bb latch>; > > > > > else > > > > > goto <bb exit>; > > > > > > > > > > <bb latch>: > > > > > ivtmp = ivtmp + inc; > > > > > goto <bb header> > > > > > ... > > > > > > > > > > into this one: > > > > > ... > > > > > bb preheader: > > > > > ... > > > > > goto <bb newheader> > > > > > > > > > > <bb header>: > > > > > ... > > > > > goto <bb latch>; > > > > > > > > > > <bb newheader>: > > > > > if (ivtmp < n + 1) > > > > > goto <bb header>; > > > > > else > > > > > goto <bb exit>; > > > > > > > > > > <bb latch>: > > > > > ivtmp = ivtmp + inc; > > > > > goto <bb newheader> > > > > > ... > > > > > > > > > > > > > Updated patch, now using redirect_edge_var_map and flush_pending_stmts. > > > > > > > > Bootstrapped and reg-tested on x86_64. > > > > > > > > OK for stage1 trunk? > > > > > > > > > > Ping. > > > > Hi Richard, > > thanks for the review.
Sorry for the delay (too many things to review, too much other work...) > > +static void > > +replace_imm_uses (tree val, imm_use_iterator *imm_iter) > > +{ > > + use_operand_p use_p; > > + > > + FOR_EACH_IMM_USE_ON_STMT (use_p, *imm_iter) > > + SET_USE (use_p, val); > > > > Use propagate_value. Why this odd interface passing a imm_iter?! > > > > The replace_imm_uses function is common code factored out of > replace_uses_in_bb_by and replace_uses_in_bbs_by. I'm not sure what is odd > about the interface of the replace_imm_uses function, but I've removed the > function. > > I tried using propagate_value, but that one doesn't like virtuals. > > > In fact most of the "repair SSA" in transform_to_exit_first_loop_alt > > looks too ad-hoc to me ... it also looks somewhat familiar to other > > code ... > > > > Unfortunately the comment before the function isn't in SSA form > > so it's hard to follow the transform. > > > > I've added the ssa bits in the transformation comment before the function, and > updated variable names and comments in the function. > > > I consider the parloops code bitrotten, no repair possible, so > > I might be convinced to not care about new spaghetti in there... > > > > + /* Fix up loop structure. TODO: Check whether this is sufficient. */ > > + loop->header = new_header; > > + > > > > no, surely not. Number of iterations (and estimates) are off > > after the transform > > The loop is cancelled at the end of gen_parallel_loop, and the estimations are > removed. We only seem to be using a limited set of the loop data until the > loop is cancelled. Updated comment accordingly. > > > and loop->latch might also need updating. > > > > That one actually stays the same. Updated comment accordingly. > > > "Safest" is to simply schedule a fixup (but you'll lose any > > loop annotation in that process). > > > > Since the loop is cancelled, AFAIU we don't need that, but... You're > referring to the annotations mentioned in > replace_loop_annotate_in_block? Losing the annotations seems to be a > generic problem of scheduling such a fixup, not particular to this > patch. Do you have a concern related to the patch and these annotations? For example such annotations (or others added by OpenMP like the safe_len stuff). Generally fixups preserve those _iff_ the loop header stays the same. So in your case doing loop->header = new_header; set_loops_state (LOOPS_NEED_FIXUP); would probably "work". But yes, if the loop is cancelled anyway there is no point jumping through hoops. > > + /* Figure out whether nit + 1 overflows. */ > > + if (TREE_CODE (nit) == INTEGER_CST) > > + { > > + if (!tree_int_cst_equal (nit, TYPE_MAXVAL (nit_type))) > > > > in case nit_type is a pointer type TYPE_MAXVAL will be NULL I think. > > > > We've done ivcanon just before this point, so AFAIU we can assume we're > dealing with an unsigned integer. > > > Is the whole exercise for performance? > > I'm trying to fix the todo in the code for parloops, which is about > performance, though I don't expect a large gain. > > OTOH, my main focus is a related oacc kernels issue. For an oacc kernels > region, the entire loop is enclosed in a kernels region. Peeling of the last > iteration means I have to guard that iteration to run on only one thread, > which currently isn't done, so in that sense it's a correctness issue as well. But as the patch doesn't cover all cases you still have a correctness issue to solve, no? It seems to me the last iteration should simply _not_ be in the oacc kernels region? > > In that case using an > > entirely new, unsigned IV, that runs from 0 to niter should be easiest and > > Introducting such an IV would mean that we don't have to update the IV, but > still we have to connect the new IV to the uses of the old IV. > > The current special handling of the IV variable is actually not that > elaborate: > ... > + /* Replace ivtmp_a with ivtmp_c in condition 'if (ivtmp_a < n)'. */ > + replace_uses_in_bb_by (res_a, res_c, new_header); > ... > So I'm not sure it's easier. > > just run-time guard that niter == +INF case? > > That doesn't work out nicely for the oacc kernels case. I need to know > compile-time wheter I can parallelize or not. Hmm, I see. > > For the graphite case, can't you make graphite generate the > > loops exit-first in the first place? > > > > Perhaps, but that doesn't make a difference for the oacc kernels case. > > > The testcases are just correctness ones? That is, there isn't > > any testcase that checks whether the new code is exercised? > > (no extra debugging dumping?) > > > > There are 3 new test patterns, each with a libgomp/testsuite/libgomp.c and a > gcc/testsuite/gcc.dg variant, so 6 new test-cases in total. The > gcc/testsuite/gcc.dg variant checks that the new code is exercised. The > libgomp/testsuite/libgomp.c variant checks that the new code generates correct > code. > > Reposting updated patch (for brevity, without testcases) below. I'm ok with the patch and count on you to fix eventual fallout ;) Thanks, Richard.