On 26/10/15 15:04, Richard Biener wrote:
Apart from the fact that you'll post a new version, you need to adjust GROUP_GAP.
You also seem to somewhat "confuse" "first I stmts" and "a group of size I";
those are not the same when the group has gaps. I'd say "a group of size I"
makes the most sense here, so I suggest adjusting the function comment
accordingly.
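To illustrate the difference between "first I stmts" and "a group of size I", here is a minimal C sketch. It assumes that each non-first member's gap is its distance in elements from the previous member. The helper `stmts_in_group_of_size` is hypothetical and not a GCC function.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GCC.  gaps[k] for k >= 1 is the distance
   (in elements) from member k-1 to member k; gaps[0] is ignored here.
   Returns how many of the n members fall within element slots [0, size).  */
size_t stmts_in_group_of_size (const unsigned *gaps, size_t n, unsigned size)
{
  unsigned offset = 0;   /* element offset of the current member */
  size_t count = 0;
  for (size_t k = 0; k < n; k++)
    {
      if (k > 0)
        offset += gaps[k];
      if (offset >= size)
        break;           /* this member lies beyond the first 'size' slots */
      count++;
    }
  return count;
}
```

For gaps 0 1 2 1 (members at offsets 0, 1, 3 and 4), a group of size 3 contains only two statements, while the first three statements span four element slots. Without gaps (0 1 1 1) the two notions coincide.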
Ok, thanks for pointing this out. My objective had been to split only the store
groups, which in BB vectorization always seem to have gaps 0 1 1 ... 1. I
didn't come up with a good scheme for splitting load groups, but it seemed
that I didn't need to do anything there if I restricted the change to BB
vectorization only. For example, consider (ignoring that we could multiply the
first four elements by 1 and add 0 to the last four):
a[0] = b[I] + 1;
a[1] = b[J] + 2;
a[2] = b[K] + 3;
a[3] = b[L] + 4;
a[4] = b[M] * 3;
a[5] = b[N] * 4;
a[6] = b[O] * 5;
a[7] = b[P] * 7;
with constants I,J,K,L,M,N,O,P. Even if those form a sequence such as
2 0 1 1 3 0 2 1, with overlaps and repetitions, this works fine for BB SLP (two
subgroups of stores *sharing* a load group, but with different permutations).
Likewise for 0 1 2 3 0 2 4 6.
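A minimal C sketch of the BB SLP case above: both store subgroups read from one shared load group, and each subgroup just carries its own permutation relative to the start of that group. The function `compute_load_permutations` and its parameters are hypothetical, not GCC code, and the sketch assumes the eight constant indices I..P are known.

```c
#include <stddef.h>

/* Hypothetical sketch, not GCC code.  idx[] holds the constant load indices
   I..P used by the eight stores a[0..7].  Both store subgroups of four read
   from one shared load group starting at b[*base]; each subgroup's
   permutation is its indices relative to that base.  Repeated lanes are
   allowed, since a permutation here may select the same element twice.  */
void compute_load_permutations (const unsigned idx[8], unsigned perm[2][4],
                                unsigned *base)
{
  unsigned lo = idx[0];
  for (size_t k = 1; k < 8; k++)
    if (idx[k] < lo)
      lo = idx[k];
  *base = lo;
  for (size_t g = 0; g < 2; g++)       /* store subgroup a[0..3] or a[4..7] */
    for (size_t k = 0; k < 4; k++)
      perm[g][k] = idx[g * 4 + k] - lo;
}
```

For 2 0 1 1 3 0 2 1 this gives base 0 and permutations {2,0,1,1} and {3,0,2,1}; for 0 1 2 3 0 2 4 6 it gives {0,1,2,3} and {0,2,4,6}.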
For loop SLP, yes, it looks like the load group needs to be split. So how, and
what constraints should we impose on those constants? (There is no single right
answer!)
A fairly strict scheme could be that (I,J,K,L) must lie within a contiguous
block of memory that does not overlap the contiguous block containing
(M,N,O,P). Then splitting the load group on the boundary seems reasonable, as
does updating the gaps as you suggest. However, when you say "the group's first
element's GROUP_GAP is the gap at the _end_ of the whole group", the gap at the
end is the gap after the last element, up to... what?
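The "fairly strict" scheme can be sketched as a simple check on the element offsets: the block spanned by the first subgroup must not overlap the block spanned by the second. The names `can_split_load_group` and `span` are hypothetical, and the sketch assumes the offsets are plain element indices within one iteration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: smallest and largest of n offsets.  */
static void span (const unsigned *off, size_t n, unsigned *lo, unsigned *hi)
{
  *lo = *hi = off[0];
  for (size_t k = 1; k < n; k++)
    {
      if (off[k] < *lo)
        *lo = off[k];
      if (off[k] > *hi)
        *hi = off[k];
    }
}

/* Hypothetical check, not GCC code: may the n load offsets (e.g. I..P) be
   split after the first n1 under the strict scheme, i.e. do the blocks
   [min, max] of the two halves not overlap?  */
bool can_split_load_group (const unsigned *off, size_t n, size_t n1)
{
  unsigned lo1, hi1, lo2, hi2;
  if (n1 == 0 || n1 >= n)
    return false;
  span (off, n1, &lo1, &hi1);
  span (off + n1, n - n1, &lo2, &hi2);
  return hi1 < lo2 || hi2 < lo1;       /* disjoint blocks */
}
```

Under this check, consecutive offsets 0..7 split at 4 are accepted, while 0 2 4 6 1 3 5 7 and 2 0 1 1 3 0 2 1 are rejected because their blocks overlap.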
Say I...P are consecutive; then the input would have gaps 0 1 1 1 1 1 1 1. If
we split the load group, would we want subgroups with gaps 0 1 1 1 and 0 1 1 1?
(IIUC, you suggest 1 1 1 1 and 0 1 1 1?)
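One possible reading of the gap convention, written as a C sketch: the first member's gap is the gap at the end of the group, measured (by assumption) up to the same group's first element in the next iteration, and every other member's gap is its distance from the previous member. `compute_group_gaps` and its `stride` parameter are hypothetical, not GCC code.

```c
#include <stddef.h>

/* Hypothetical sketch, not GCC code.  off[] holds the sorted element offsets
   of one (sub)group within a single iteration, and stride is the number of
   elements between iterations (assumed >= the span of the group).
   Assumption: gaps[0] is the gap at the end of the group, up to the group's
   first element in the next iteration; gaps[k] for k >= 1 is the distance
   from member k-1 to member k.  */
void compute_group_gaps (const unsigned *off, size_t n, unsigned stride,
                         unsigned *gaps)
{
  unsigned span = off[n - 1] - off[0] + 1;   /* elements covered per iteration */
  gaps[0] = stride - span;
  for (size_t k = 1; k < n; k++)
    gaps[k] = off[k] - off[k - 1];
}
```

With I...P consecutive and a stride of 8, this reproduces 0 1 1 1 1 1 1 1 for the whole group. Under the same assumption, each half of the split (offsets 0..3 and 4..7) gets 4 1 1 1, because each half must skip the other on every iteration. A different reading of "the end" would give different values.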
If they are disjoint sets but overlapping blocks of memory, say
0 2 4 6 1 3 5 7, do we then create two load groups, with gaps 0 2 2 2 and
0 2 2 2 again? Does anything record that the load groups access overlapping
areas, and record their offset relative to each other?
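If such information had to be recorded, a minimal form might look like the C sketch below. The struct `load_subgroup` and the function `subgroups_overlap` are hypothetical illustrations, not existing GCC data structures; offsets are assumed to be element indices within one iteration.

```c
#include <stdbool.h>

/* Hypothetical record, not a GCC data structure: where a load subgroup
   starts and how many elements it spans within one iteration.  */
struct load_subgroup
{
  unsigned start;   /* element offset of the first member */
  unsigned span;    /* elements from first to last member, inclusive */
};

/* Do the two subgroups touch overlapping memory?  If so, *delta receives
   the offset of b's start relative to a's start; otherwise *delta is left
   unchanged.  */
bool subgroups_overlap (struct load_subgroup a, struct load_subgroup b,
                        int *delta)
{
  unsigned a_end = a.start + a.span;   /* one past a's last element */
  unsigned b_end = b.start + b.span;
  bool overlap = a.start < b_end && b.start < a_end;
  if (overlap)
    *delta = (int) b.start - (int) a.start;
  return overlap;
}
```

For 0 2 4 6 (start 0, span 7) and 1 3 5 7 (start 1, span 7) this reports an overlap with offset 1, whereas the halves 0..3 and 4..7 do not overlap.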
If there are repeated elements (as in the BB SLP case mentioned above), it's not
clear to me how we can split the group effectively, so we may have to rule out
that case.
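Ruling that case out needs only a check for duplicate load offsets. A minimal C sketch, with a hypothetical function name, not GCC code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: does any load offset occur more than once?  Such
   groups (e.g. 2 0 1 1 3 0 2 1) might have to be excluded from splitting.
   Quadratic, which is acceptable for the small group sizes seen in SLP.  */
bool has_repeated_offsets (const unsigned *off, size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (off[i] == off[j])
        return true;
  return false;
}
```

Sorting a copy of the offsets first would make this O(n log n), but for groups of a handful of elements the nested loop is simpler.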
(Moreover, if we are considering hybrid SLP, it may not be clear what the loop
accesses are; we may be presented only with the SLP accesses. Do we necessarily
want to pull those out of a load group?)
So I expect I may resolve some of these issues as I progress, but I'm curious as
to whether (and why) the patch was really broken (wrt gaps) as it stood...
Thanks,
Alan