Re: combine permutations in gimple

Marc Glisse Mon, 13 Aug 2012 07:29:28 -0700

On Mon, 13 Aug 2012, Ramana Radhakrishnan wrote:

On 13 August 2012 14:21, Marc Glisse <marc.gli...@inria.fr> wrote:

On Mon, 13 Aug 2012, Richard Guenther wrote:

On Mon, Aug 13, 2012 at 3:12 PM, Ramana Radhakrishnan
<ramana.radhakrish...@linaro.org> wrote:



I guess people will complain soon enough if this causes horrible
performance
regressions in vectorized code.



Not having looked at your patch in great detail,. surely what we don't
want is a situation where 2 constant permutations are converted into
one generic permute. Based on a quick read of your patch I couldn't
work that out.  It might be that 2 constant  permutes are cheaper than
a generic permute. Have you looked at any examples in that space . I
surely wouldn't like to see a sequence of interleave / transpose
change into a generic permute operation on Neon as that would be far
more expensive than this.  It surely needs more testting than just
this bit before going in. The reason being that this would likely take
more registers and indeed produce loads of a constant pool for the new
mask.



What do you call constant / non-constant? The combined permutation is still
constant, although the expansion (in the back-end) might fail to expand it
efficiently and fall back to the generic permutation expansion...



That is exactly what I was worried about. By constant I would expect
something that is expanded as  a constant permute by the backend - an
interleave operation or a transpose operation or indeed some of the
funky operations as below in ARM / Neon which is the SIMD architecture
I'm most familiar with .


If you had something that did the following :


uint8x8_t
tst_vrev642_u8 (uint8x8_t __a)
{
 uint8x8_t __rv;
 uint8x8_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0};
 uint8x8_t __mask2 =  { 0, 8, 2, 10, 4, 12, 6, 14 };
 return __builtin_shuffle ( __builtin_shuffle ( __a, __mask1), __mask2) ;
}



I would expect these instructions

        vrev64.8        d0, d0
        vmov    d16, d0  @ v8qi
        vtrn.8  d0, d16
        bx      lr

With your patch I see

tst_vrev642_u8:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        fldd    d16, .L2
        vtbl.8  d0, {d0}, d16
        bx      lr
.L3:
        .align  3
.L2:
        .byte   7
        .byte   7
        .byte   5
        .byte   5
        .byte   3
        .byte   3
        .byte   1
        .byte   1

Seems to be one instruction shorter at least ;-) Yes, there can be muchworse regressions than that because of the patch (like 40 instructionsinstead of 4, in the x86 backend).

It might be that the backend predicates need tightening in this case
but  why not try to get cost info about such combinations rather than
just doing this gratuitously ?

I don't think the word gratuitous is appropriate. middle-end replaces a+-bwith a-b without first asking the backend whether it might be moreefficient. One permutation is better than 2. It just happens that therange of possible permutations is too large (and the basic instructionsare too strange) for backends to do a good job on them, and thus keepingtoplevel input as a hint is important.

While in this case the ARM port might
be wrong , but in general when the vector permute rewrites were done
we chose to go ahead and keep the generic constant permutes in the
backend as the last resort to fall back to. However if fwprop starts
making such transformations really one ought to get relative costs for
each of these operations rather than allowing gratuitous replacements.


Indeed, having costs would help, but that's a lot of work.

As you can see from the original email, I am willing to limit theoptimization to the case where the combined permutation is the identity(yes, that's quite drastic...). I should probably implement that specialcase anyway, because it doesn't require its argument to have a single usefor the optimization to make sense.


--
Marc Glisse

Re: combine permutations in gimple

Reply via email to