On Tue, Jun 12, 2012 at 2:12 PM, Richard Guenther <richard.guent...@gmail.com> wrote: > On Tue, Jun 12, 2012 at 1:07 PM, Ramana Radhakrishnan > <ramana.radhakrish...@linaro.org> wrote: >> On 12 June 2012 11:46, Richard Guenther <richard.guent...@gmail.com> wrote: >>> On Tue, Jun 12, 2012 at 11:22 AM, Julian Brown <jul...@codesourcery.com> >>> wrote: >>>> On Mon, 11 Jun 2012 16:46:27 +0100 >>>> Ramana Radhakrishnan <ramana.radhakrish...@linaro.org> wrote: >>>> >>>>> Hi, >>>>> >>>>> I don't like the ML bits of the patch as it stands today and before >>>>> committing I would like to clean up the ML bits quite a bit further >>>>> especially in areas where I've put FIXMEs [...] >>>> >>>> I had a go at this, see attached. Untested. Note there are some >>>> semantic differences in output: >>>> >>>> vzipq_p8 (poly8x16_t __a, poly8x16_t __b) >>>> { >>>> poly8x16x2_t __rv; >>>> - uint8x16_t __mask1 = {0, 2}; >>>> - uint8x16_t __mask2 = {1, 3}; >>>> - __rv.val[0] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask1); >>>> - __rv.val[1] = (poly8x16_t)__builtin_shuffle (__a, __b, __mask2); >>>> + uint8x16_t __mask1 = { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, >>>> 22, 7, 23 }; >>>> + uint8x16_t __mask2 = { 8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, >>>> 14, 30, 15, 31 }; >>>> + __rv.val[0] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask1); >>>> + __rv.val[1] = (poly8x16_t) __builtin_shuffle (__a, __b, __mask2); >>> >>> You should get better code at -O0 when not using a temporary __mask1/__mask2 >>> but directly pasting the constant in the builtin call. >> >> I tried that yesterday but it didn't seem to help - from a quick peek >> at the dumps it looks like we could do with some limited const prop >> just for the vec_perm expand cases. >> >> D.14032 = { 0, 8, 1, 9, 2, 10, 3, 11 }; >> D.14044 = VEC_PERM_EXPR <__a, __b, D.14032>; >> >> That's what I see from the dumps and from a quick skim of the sources >> - my suspicion is that lower_vec_perm in tree-vect-generic.c is where >> we could try doing a limited constant propagation in this case. ? Is >> that where one should attempt to fix this ? >> >> Consider the following testcase at O0 rewritten with just >> __builtin_shuffle so that you can see it on other platforms as well >> that have vec_perm_const defined for doing the interleave style >> operations. and look at what you get for O1. On arm-linux-gnueabi with >> -mfpu=neon -mfloat-abi=softfp -mcpu=cortex-a9 at O0 you'd see it use >> the generic permute operations and at O1 you'd see a vzip.32 >> instruction >> >> >> typedef int v4si __attribute__ ((vector_size (16))); >> >> v4si vs (v4si a, v4si b) >> { >> return __builtin_shuffle (a, b, (v4si) {0, 4, 1, 5}); >> } > > Ok, I see the C frontend hands us this as > > return VEC_PERM_EXPR < a , b , <<< Unknown tree: compound_literal_expr > v4si D.1712 = { 0, 4, 1, 5 }; >>> > ; > > and gimplification in some way fails to gimplify it to { 0, 4, 1, 5 }. Yes, > tree-vect-generic.c could just lookup the SSA def stmt in this case - it > does so in most cases already.
Like with Index: tree-vect-generic.c =================================================================== --- tree-vect-generic.c (revision 188428) +++ tree-vect-generic.c (working copy) @@ -628,6 +628,14 @@ lower_vec_perm (gimple_stmt_iterator *gs location_t loc = gimple_location (gsi_stmt (*gsi)); unsigned i; + if (TREE_CODE (mask) == SSA_NAME) + { + gimple def_stmt = SSA_NAME_DEF_STMT (mask); + if (is_gimple_assign (def_stmt) + && gimple_assign_rhs_code (def_stmt) == VECTOR_CST) + mask = gimple_assign_rhs1 (def_stmt); + } + if (TREE_CODE (mask) == VECTOR_CST) { unsigned char *sel_int = XALLOCAVEC (unsigned char, elements); pre-approved if it passes bootstrap & regtest. Richard. > Richard. > >> >> regards >> Ramana >> >>> >>>> return __rv; >>>> } >>>> >>>> I wasn't quite sure which version was correct -- but your version >>>> doesn't seem to have enough elements for these cases? >>>> >>>> HTH, >>>> >>>> Julian