On 26/10/15 15:04, Richard Biener wrote:

apart from the fact that you'll post a new version you need to adjust GROUP_GAP.
You also seem to somewhat "confuse" "first I stmts" and "a group of
size I", those
are not the same when the group has haps.  I'd say "a group of size i" makes the
most sense here thus I suggest to adjust the function comment accordingly.

Ok, thanks for pointing this out. My objective had been to only split the store groups - which in BB vectorization, always seem to have gap 0 1 1 .... 1. I didn't come up with a good scheme for how to split load groups, but it seemed that I didn't need to do anything there if I restricted to BB vectorization only. For example, consider (ignoring that we could multiply the first four elements by 1 and add 0 to the last four):

    a[0] = b[I] + 1;
    a[1] = b[J] + 2;
    a[2] = b[K] + 3;
    a[3] = b[L] + 4;
    a[4] = b[M] * 3;
    a[5] = b[N] * 4;
    a[6] = b[O] * 5;
    a[7] = b[P] * 7;


with constants I,J,K,L,M,N,O,P. Even with those being a sequence 2 0 1 1 3 0 2 1 with overlaps and repetitions, this works fine for BB SLP (two subgroups of stores, *sharing* a load group but with different permutations). Likewise 0 1 2 3 0 2 4 6.

For loop SLP, yes it looks like the load group needs to be split. So how; and what constraints to impose on those constants? (There is no single right answer!)

A fairly-strict scheme could be that (I,J,K,L) must be within a contiguous block of memory, that does not overlap with the contiguous block containing (M,N,O,P). Then, splitting the load group on the boundary seems reasonable, and updating the gaps as you suggest. However, when you say "the group first elements GROUP_GAP is the gap at the _end_ of the whole group" - the gap at the end is the gap that comes after the last element and up to....what?

Say I...P are consecutive, the input would have gaps 0 1 1 1 1 1 1 1. If we split the load group, we would want subgroups with gaps 0 1 1 1 and 0 1 1 1? (IIUC, you suggest 1111 and 0111?)

If they are disjoint sets, but overlapping blocks of memory, say 0 2 4 6 1 3 5 7...then do we create two load groups, with gap 0 2 2 2 and 0 2 2 2 again? Does something record that the load groups access overlapping areas, and record the offset against each other?

If there are repeated elements (as in the BB SLP case mentioned above), I'm not clear how we can split this effectively...so may have to rule out that case. (Moreover, if we are considering hybrid SLP, it may not be clear what the loop accesses are, we may be presented only with the SLP accesses. Do we necessarily want to pull those out of a load group?)

So I expect I may resolve some of these issues as I progress, but I'm curious as to whether (and why) the patch was really broken (wrt gaps) as it stood...

Thanks,
Alan

Reply via email to