On 30/07/2019 13:16, Andre Vieira (lists) wrote:
Hi Richard,
I believe this is in line with what you were explaining to me earlier.
The one thing I haven't quite done here is the jump around for if there
is no prolog peeling. Though I think skip_vectors introduces the jumping
we need.
The main gist of it is:
1) if '--param vect-epilogues-nomask=1' is passed we refrain from adding
the versioning threshold check when versioning a loop at first,
2) later we allow the vectorizer to use skip_vectors to basically check
niters to be able to skip the vectorized loop on to the right epilogue,
3) after vectorizing the epilogue, we check whether this was a versioned
loop and whether we skipped adding the versioning threshold, if so we
add a threshold based on the last VF
There is a caveat here, if we don't vectorize the epilogue, because say
our architecture doesn't know how, we will end up with a regression.
Let's say we have a loop for which our target can vectorize for 32x but
not 16x, we get:
if (alias & threshold_for_16x ())
{
if (niters > 31)
vectorized_31 ();
else
scalar ();
}
else
scalar ();
Imagine our iteration count is 18, all we hit is the scalar loop, but
now we go through 1x alias and 2x niters. Whereas potentially we could
have done with just 1x niters.
A mitigation for this could be to only add the threshold check if we
actually vectorized the last loop, I believe a:
'if (LOOP_VINFO_VECTORIZABLE_P (loop_vinfo)) return NULL;' inside that
new "else" in vect_transform_loop would do the trick. Though then we
will still have that extra alias check... >
I am going to hack around in the back-end to "fake" a situation like
this and see what happens.
Right should have quickly tried this before sending the email, what
happens is actually vect_transform_loop never gets called because
try_vectorize_loop_1 will recognize it can't vectorize the epilogue. So
we end up with the "mitigation" result I suggested, where we simply
don't have a versioning threshold which might still not be ideal.
Another potential issue arises when the profitability threshold obtained
from the cost model isn't guaranteed to be either LE or EQ to the
versioning threshold. In that case there are two separate checks, which
now we no longer will attempt to fold.
And I just realized I don't take the cost model threshold into account
when creating the new threshold check, I guess if it is ordered_p we
should again take the max there between the new threshold check and the
cost threshold for the new check.
I am trying to find a way to test and benchmark these changes.
Unfortunately I am having trouble doing this for AVX512 as I find that
the old '--param vect_epilogues_nomask=1' already causes wrong codegen
in SPEC2017 for the gcc and perlbench benchmarks.
Cheers,
Andre
On 19/07/2019 13:38, Andre Vieira (lists) wrote:
On 19/07/2019 12:35, Richard Biener wrote:
On Fri, 19 Jul 2019, Andre Vieira (lists) wrote:
On 15/07/2019 11:54, Richard Biener wrote:
On Mon, 15 Jul 2019, Andre Vieira (lists) wrote:
On 12/07/2019 11:19, Richard Biener wrote:
On Thu, 11 Jul 2019, Andre Vieira (lists) wrote:
I have code that can split the condition and alias checks in
'vect_loop_versioning'. For this approach I am considering keeping
that
bit of
code and seeing if I can patch up the checks after vectorizing the
epilogue
further. I think initially I will just go with a "hacked up" way of
passing
down the bb with the iteration check and split the false edge
every time
we
vectorize it further. Will keep you posted on progress. If you
have any
pointers/tips they are most welc ome!
I thought to somehow force the idea that we have a prologue loop
to the vectorizer so it creates the number-of-vectorized iterations
check and branch around the main (highest VF) vectorized loop.
Hmm I think I may have skimmed over this earlier. I am reading it
now and am
not entirely sure what you mean by this. Specifically the "number of
vectorized iterations" or how a prologue loop plays a role in this.
When there is no prologue then the versioning condition currently
ensures we always enter the vector loop. Only if there is a prologue
is there a check and branch eventually skipping right to the epilogue.
If the versioning condition (now using a lower VF) doesn't guarantee
this we have to add this jump-around.
Right, I haven't looked at the prologue path yet. I had a quick look
and can't find where this branch skipping to the epilogue is
constructed. I will take a better look after I got my current example
to work.
I guess this sheds some light on the comment above. And it
definitely implies
we would need to know the lowest VF when creating this condition.
Which is
tricky.
We can simply use the smallest vector size supported by the target to
derive it from the actual VF, no?
So I could wait to introduce this check after all epilogue
vectorization is done, back track to the last niter check and
duplicate that in the outer loop.
What I didn't want to do was use the smallest possible vector size for
the target because I was under the impression that does not
necessarily correspond to the smallest VF we may have for a loop, as
the vectorizer may have decided not to vectorize for that vector size
because of costs? If it I can assume this never happens, that once it
starts to vectorize epilogues that it will keep vectorizing them for
any vector size it knows off then yeah I can use that.
I'm not sure I understand - why would you have any check not inside
the outer loop? Yes, we now eventually hoist versioning checks
but the VF checks for the individual variants should be around
the vectorized loop itself (so not really part of the versioning check).
Yeah I agree. I was just explaining what I had done wrong now.
Cheers,
Andre
PS: I often find myself having to patch the DOMINATOR information,
sometimes
its easy to, but sometimes it can get pretty complicated. I wonder
whether it
would make sense to write something that traverses a loop and
corrects this,
if it doesn't exist already.
There's iterate-fix-dominators, but unless you create new edges/blocks
manually rather than doing split-block/redirect-edge which should do
dominator updating for you.
Ah I was doing everything manually after having some bad experiences
with lv_add_condition_to_bb. I will have a look at those thanks!
Cheers,
Andre
Richard.
Richard.