On Tue, Jun 12, 2012 at 1:07 PM, Ramana Radhakrishnan <ramana.radhakrish...@linaro.org> wrote: > On 12 June 2012 11:46, Richard Guenther <richard.guent...@gmail.com> wrote: >> On Tue, Jun 12, 2012 at 11:22 AM, Julian Brown <jul...@codesourcery.com> >> wrote: >>> On Mon, 11 Jun 2012 16:46:27 +0100 >>> Ramana Radhakrishnan <ramana.radhakrish...@linaro.org> wrote: >>> >>>> Hi, >>>> >>>> I don't like the ML bits of the patch as it stands today and before >>>> committing I would like to clean up the ML bits quite a bit further >>>> especially in areas where I've put FIXMEs [...] >>> >>> I had a go at this, see attached. Untested. Note there are some >>> semantic differences in output: >>> >>> vzipq_p8 (poly8x16_t __a, poly8x16_t __b) >>> { >>> poly8x16x2_t __rv; >>> - uint8x16_t __mask1 = {0, 2}; >>> - uint8x16_t __mask2 = {1, 3}; >>> - __rv.val[0] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask1); >>> - __rv.val[1] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask2); >>> + uint8x16_t __mask1 = { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, >>> 22, 7, 23 }; >>> + uint8x16_t __mask2 = { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, >>> 14, 30, 15, 31 }; >>> + __rv.val[0] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask1); >>> + __rv.val[1] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask2); >> >> You should get better code at -O0 when not using a temporary __mask1/__mask2 >> but directly pasting the constant in the builtin call. > > I tried that yesterday but it didn't seem to help - from a quick peek > at the dumps it looks like we could do with some limited const prop > just for the vec_perm expand cases. > > D.14032 = { 0, 8, 1, 9, 2, 10, 3, 11 }; > D.14044 = VEC_PERM_EXPR <__a, __b, D.14032>; > > That's what I see from the dumps and from a quick skim of the sources > - my suspicion is that lower_vec_perm in tree-vect-generic.c is where > we could try doing a limited constant propagation in this case. ? Is > that where one should attempt to fix this ? > > Consider the following testcase at O0 rewritten with just > __builtin_shuffle so that you can see it on other platforms as well > that have vec_perm_const defined for doing the interleave style > operations. and look at what you get for O1. On arm-linux-gnueabi with > -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a9 at O0 you'd see it use > the generic permute operations and at O1 you'd see a vzip.32 > instruction > > > typedef int v4si __attribute__ ((vector_size (16))); > > v4si vs (v4si a, v4si b) > { > return __builtin_shuffle (a, b, (v4si) {0, 4, 1, 5}); > }
Ok, I see the C frontend hands us this as return VEC_PERM_EXPR < a , b , <<< Unknown tree: compound_literal_expr v4si D.1712 = { 0, 4, 1, 5 }; >>> > ; and gimplification in some way fails to gimplify it to { 0, 4, 1, 5 }. Yes, tree-vect-generic.c could just lookup the SSA def stmt in this case - it does so in most cases already. Richard. > > regards > Ramana > >> >>> return __rv; >>> } >>> >>> I wasn't quite sure which version was correct -- but your version >>> doesn't seem to have enough elements for these cases? >>> >>> HTH, >>> >>> Julian