>> at least for chips with no control flow support like nv30 and i915
> s/at least/only
> This doesn't reduce divergence, only increases code size.
The purpose of this unrolling is not to reduce divergence, but to avoid the cost of computing and checking the loop iteration variable, plus the cost of the loop construct itself, both of which can be significant if the loop body is only an instruction or so. Thus, it should be an optimization on any hardware, provided that appropriate heuristics are employed so that only short loops with few iterations are unrolled (this will probably need Gallium caps to give hints to the compiler).

Right now it will usually trigger only when control flow is unsupported, because breaks are only lowered in that case, and otherwise the loop is not likely to be in the form this technique requires. However, we should add a "small loop with few iterations" heuristic that turns on all lowering for the loop in question, so that it can always be unrolled.

Functions should similarly have their returns unified only if the inliner is going to inline the function, which it shouldn't do indiscriminately, as it does right now, if the hardware supports function calls.

Anyway, a higher priority in this area is to prevent the inliner and loop unroller from running indefinitely, as the tests I posted on the piglit ML show.
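To make the trade-off concrete, here is a rough Python sketch of the kind of heuristic I have in mind. It is a toy model, not Mesa code. The names (UnrollLimits, should_unroll, unroll), the limit values, the assumed overhead of three instructions per iteration (increment, compare, branch) and the "%i" counter placeholder are all made up for illustration. Real limits would have to come from driver hints.

```python
from dataclasses import dataclass

# Assumption: a counted loop costs an increment, a compare and a branch
# per iteration on top of its body. Real hardware will differ.
LOOP_OVERHEAD_INSTRUCTIONS = 3


@dataclass(frozen=True)
class UnrollLimits:
    """Hypothetical driver-supplied hints (these are not real Gallium caps)."""
    max_iterations: int = 8
    max_unrolled_instructions: int = 32


def executed_instructions(body_size: int, iterations: int, unrolled: bool) -> int:
    """Instructions executed at run time for a loop with a known trip count."""
    per_iteration = body_size if unrolled else body_size + LOOP_OVERHEAD_INSTRUCTIONS
    return per_iteration * iterations


def should_unroll(body_size: int, iterations: int, limits: UnrollLimits) -> bool:
    """Accept only short loops with a small, known iteration count."""
    if body_size <= 0 or iterations <= 0:
        return False
    return (iterations <= limits.max_iterations
            and body_size * iterations <= limits.max_unrolled_instructions)


def unroll(body: list[str], iterations: int) -> list[str]:
    """Emit one copy of the body per iteration, substituting the counter value."""
    return [instr.replace("%i", str(i)) for i in range(iterations) for instr in body]
```

Under these assumptions, a one-instruction body iterated 4 times executes 16 instructions as a loop and 4 when unrolled, while a 100-iteration loop or a 20-instruction body iterated 4 times is rejected because of the code size.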
