2009/12/17 Zdenek Dvorak <rakd...@kam.mff.cuni.cz>:
> Hi,
>
>> > Is there a way to pass to the unroller the maximum number of iterations
>> > of the loop such that it can decide to avoid unrolling if
>> > the maximum number is small.
>> >
>> > To be more specific, I am referring to the following case:
>> > After the vectorizer decides to peel for alignment
>> > it creates three loops:
>> > [1] scalar loop - the prologue to align
>> >     memory access.
>> > [2] the vectorized loop
>> > [3] scalar loop - the remaining scalar computations.
>> >
>> > If the unroller does not know the number of iterations at compile time
>> > it unrolls loops with run-time checks in the following way
>> > (taken from loop-unroll.c):
>> >
>> >   for (i = 0; i < n; i++)
>> >     body;
>> >
>> >   ==>
>> >
>> >   i = 0;
>> >   mod = n % 4;
>> >
>> >   switch (mod)
>> >     {
>> >     case 3:
>> >       body; i++;
>> >     case 2:
>> >       body; i++;
>> >     case 1:
>> >       body; i++;
>> >     case 0: ;
>> >     }
>> >
>> >   while (i < n)
>> >     {
>> >       body; i++;
>> >       body; i++;
>> >       body; i++;
>> >       body; i++;
>> >     }
>> >
>> >
>> > The vectorizer knows at compile time the maximum number of iterations
>> > that will be needed for the prologue and the epilogue. In some cases
>> > it seems there is no need to unroll and create redundant loops.
>>
>> You can set niter_max in the niter_desc of simple loops.  There is
>> also nb_iter_bound for all loops.  Of course the
>> issue is that loop information is destroyed sometimes.  It also looks
>> like RTL loop analysis may not re-use this information.
>>
>> Maybe Zdenek knows a better answer.
>
> currently, there is no reliable way to pass this information to RTL.
> The best I can come up with (without a significant amount of changes
> to other parts of the compiler) would be to insert code like
>
>   if (n > 5)
>     special_abort ();
>
> before the loop in the vectorizer if you know for sure that the loop
> will iterate at most 5 times, use these hints to bound the number of
> iterations in the unroller (we do not do this at the moment, but it
> should be easy), and remove the calls to special_abort and the
> corresponding branches after the unroller.

We do have __builtin_unreachable (), but that is removed at expansion
time already ;)

Also the code does have n = orig_n % 4; already (or the equivalent
masking and subtraction), but I guess the RTL loop analysis doesn't
catch that either.

Richard.

> Zdenek
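
For reference, here is a minimal C sketch of the run-time-check unrolling
quoted above, written by hand and not taken from GCC. It assumes the loop
body is a simple array sum. All names (sum_rolled, sum_unrolled4,
sum_bounded, unrolled_body_runs, MAX_SCALAR_ITERS) are hypothetical, and
the bound of 3 assumes a vectorization factor of 4. The last function
shows the kind of bound hint discussed in this thread, spelled with
__builtin_unreachable (); it makes no claim that any pass uses the hint.

```c
#include <assert.h>

/* Assumed bound for illustration: with 4 elements per vector, a scalar
   prologue or epilogue runs at most 3 iterations.  */
#define MAX_SCALAR_ITERS 3

/* Counts how often the 4x unrolled loop body executes.  */
int unrolled_body_runs;

/* The original rolled loop.  */
int
sum_rolled (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Hand-written equivalent of unrolling by 4 with a run-time check.  */
int
sum_unrolled4 (const int *a, int n)
{
  assert (n >= 0);
  int s = 0;
  int i = 0;
  int mod = n % 4;

  /* Peel n % 4 iterations; the cases fall through on purpose.  */
  switch (mod)
    {
    case 3:
      s += a[i]; i++;
      /* fall through */
    case 2:
      s += a[i]; i++;
      /* fall through */
    case 1:
      s += a[i]; i++;
      /* fall through */
    case 0: ;
    }

  /* The remaining iteration count is now a multiple of 4.  */
  while (i < n)
    {
      unrolled_body_runs++;
      s += a[i]; i++;
      s += a[i]; i++;
      s += a[i]; i++;
      s += a[i]; i++;
    }
  return s;
}

/* Same loop with a source-level bound hint.  Calling this with
   n > MAX_SCALAR_ITERS is undefined behavior.  */
int
sum_bounded (const int *a, int n)
{
#if defined (__GNUC__)
  if (n > MAX_SCALAR_ITERS)
    __builtin_unreachable ();
#endif
  int s = 0;
  for (int i = 0; i < n; i++)
    s += a[i];
  return s;
}
```

When n is at most MAX_SCALAR_ITERS, the switch handles every iteration and
the unrolled while loop never executes, which is the redundancy the
original question is about.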