On 06/17/2016 08:33 AM, Ilya Enkovich wrote:

Hmm, there seems to be a level of indirection I'm missing here.  We're
smuggling LOOP_VINFO_ORIG_LOOP_INFO around in loop->aux.  Ewww.  I thought
the whole point of LOOP_VINFO_ORIG_LOOP_INFO was to smuggle the VINFO from
the original loop to the vectorized epilogue.  What am I missing?  Rather
than smuggling around in the aux field, is there some inherent reason why we
can't just copy the info from the original loop directly into
LOOP_VINFO_ORIG_LOOP_INFO for the vectorized epilogue?

LOOP_VINFO_ORIG_LOOP_INFO is used for several things:
 - mark this loop as epilogue
 - get VF of original loop (required for both mask and nomask modes)
 - get decision about epilogue masking

That's all.  When epilogue is created it has no LOOP_VINFO.  Also when we
vectorize loop we create and destroy its LOOP_VINFO multiple times.  When
loop has LOOP_VINFO loop->aux points to it and original LOOP_VINFO is in
LOOP_VINFO_ORIG_LOOP_INFO.  When loop has no LOOP_VINFO associated I have no
place to bind it with the original loop and therefore I use vacant loop->aux
for that.  Any other way to bind epilogue with its original loop would work
as well.  I just chose loop->aux to avoid new fields and data structures.
I was starting to draw the conclusion that the smuggling in the aux field was for cases when there was no LOOP_VINFO. But it was rather late at night and I didn't follow that idea through the code. Thanks for clarifying.
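
To make the binding described above concrete, here is a minimal standalone sketch. It is not GCC code: the types loop_sketch and loop_vinfo_sketch, the aux_is_own_vinfo flag, and the helpers bind_epilogue, create_vinfo, destroy_vinfo and is_epilogue are all hypothetical stand-ins. Only the idea comes from the explanation above: while the epilogue has no LOOP_VINFO of its own, aux holds the original loop's info, and once a LOOP_VINFO exists that pointer lives in the ORIG_LOOP_INFO slot. Putting the pointer back into aux on destruction is an assumption about how the binding could survive repeated analysis.

```cpp
#include <cassert>

// Simplified stand-in for a loop_vec_info; not the real GCC definition.
struct loop_vinfo_sketch {
  int vf;                             // vectorization factor of this loop
  bool mask_epilogue;                 // decision about epilogue masking
  loop_vinfo_sketch *orig_loop_info;  // models LOOP_VINFO_ORIG_LOOP_INFO
};

// Simplified stand-in for struct loop.
struct loop_sketch {
  void *aux = nullptr;            // models loop->aux
  bool aux_is_own_vinfo = false;  // hypothetical flag, for this sketch only
};

// The freshly created epilogue has no vinfo yet, so park the original
// loop's vinfo in the otherwise vacant aux field.
void bind_epilogue(loop_sketch &epilogue, loop_vinfo_sketch *orig) {
  assert(epilogue.aux == nullptr && !epilogue.aux_is_own_vinfo);
  epilogue.aux = orig;
}

// Creating a vinfo moves whatever was parked in aux into orig_loop_info
// and makes aux point at the new vinfo.
loop_vinfo_sketch *create_vinfo(loop_sketch &l, int vf) {
  assert(!l.aux_is_own_vinfo);
  auto *parked = static_cast<loop_vinfo_sketch *>(l.aux);
  auto *v = new loop_vinfo_sketch{vf, false, parked};
  l.aux = v;
  l.aux_is_own_vinfo = true;
  return v;
}

// Destroying the vinfo parks the original's vinfo back in aux
// (an assumption of this sketch), so a later re-analysis still finds it.
void destroy_vinfo(loop_sketch &l) {
  assert(l.aux_is_own_vinfo);
  auto *v = static_cast<loop_vinfo_sketch *>(l.aux);
  l.aux = v->orig_loop_info;
  l.aux_is_own_vinfo = false;
  delete v;
}

// "Mark this loop as epilogue": an epilogue is a loop whose vinfo
// records an original loop.
bool is_epilogue(const loop_vinfo_sketch &v) {
  return v.orig_loop_info != nullptr;
}
```

With this shape, the three uses listed above (epilogue marker, VF of the original loop, masking decision) all read through orig_loop_info, and no new field is needed on the loop itself.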



And something just occurred to me -- is there some inherent reason why SLP
doesn't vectorize the epilogue, particularly for the cases where we can
vectorize the epilogue using smaller vectors?  Sorry if you've already
answered this somewhere or it's a dumb question.

IIUC this may happen only if we unroll epilogue into a single BB which happens
only when epilogue iterations count is known. Right?
Probably. The need to make sure the epilogue is unrolled probably makes this a non-starter.

I have a soft spot for SLP as I stumbled on the idea while rewriting a presentation in the wee hours of the morning for the next day. Essentially it was a "poor man's" vectorizer that could be done for dramatically less engineering cost than a traditional vectorizer. The MIT paper outlining the same ideas came out a couple years later...


+       /* Add new loop to a processing queue.  To make it easier
+          to match loop and its epilogue vectorization in dumps
+          put new loop as the next loop to process.  */
+       if (new_loop)
+         {
+           loops.safe_insert (i + 1, new_loop->num);
+           vect_loops_num = number_of_loops (cfun);
+         }
+

So just to be clear, the only reason to do this is for dumps -- other than
processing the loop before its epilogue, there's no other inherently
necessary ordering of the loops, right?

Right, I don't see other reasons to do it.
Perfect.  Thanks for confirming.
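
For readers following the thread, the effect of inserting at i + 1 can be sketched outside of GCC. This is an illustration only: std::vector stands in for GCC's vec, and the helpers enqueue_epilogue and processing_order, along with the with_epilogue and next_num parameters, are hypothetical names that do not appear in the patch. It only shows that each new epilogue is visited immediately after the loop it came from, which is what keeps the two adjacent in the dumps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Insert the epilogue's loop number right after position i,
// mirroring the loops.safe_insert (i + 1, ...) call in the hunk above.
void enqueue_epilogue(std::vector<int> &loops, std::size_t i,
                      int new_loop_num) {
  assert(i < loops.size());
  loops.insert(loops.begin() + static_cast<std::ptrdiff_t>(i) + 1,
               new_loop_num);
}

// Walk the queue and return the order in which loops are visited.
// Loops listed in with_epilogue are assumed to produce one epilogue each;
// epilogues are numbered from next_num upward (hypothetical numbering).
std::vector<int> processing_order(std::vector<int> loops,
                                  const std::vector<int> &with_epilogue,
                                  int next_num) {
  std::vector<int> order;
  // loops.size() is re-read every iteration because the queue grows.
  for (std::size_t i = 0; i < loops.size(); ++i) {
    int num = loops[i];
    order.push_back(num);
    bool makes_epilogue =
        std::find(with_epilogue.begin(), with_epilogue.end(), num) !=
        with_epilogue.end();
    if (makes_epilogue)
      enqueue_epilogue(loops, i, next_num++);
  }
  return order;
}
```

Starting from the queue {1, 2, 3} where loops 1 and 3 get epilogues numbered from 4, the visiting order is 1, 4, 2, 3, 5. Appending to the end instead would also process each loop before its epilogue, but would give 1, 2, 3, 4, 5 and separate the pairs in the dump.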

jeff
