On Mon, Aug 31, 2015 at 10:28 PM, Bill Schmidt
<wschm...@linux.vnet.ibm.com> wrote:
> Hi,
>
> The following simple test fails when attempting to convert a vector
> shift-by-scalar into a vector shift-by-vector.
>
>   typedef unsigned char v16ui __attribute__((vector_size(16)));
>
>   v16ui vslb(v16ui v, unsigned char i)
>   {
>     return v << i;
>   }
>
> When this code is gimplified, the shift amount gets expanded to an
> unsigned int:
>
>   vslb (v16ui v, unsigned char i)
>   {
>     v16ui D.2300;
>     unsigned int D.2301;
>
>     D.2301 = (unsigned int) i;
>     D.2300 = v << D.2301;
>     return D.2300;
>   }
>
> In expand_binop, the shift-by-scalar is converted into a shift-by-vector
> using expand_vector_broadcast, which produces the following rtx to be
> used to initialize a V16QI vector:
>
> (parallel:V16QI [
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>         (subreg/s/v:SI (reg:DI 155) 0)
>     ])
>
> The back end eventually chokes trying to generate a copy of the SImode
> expression into a QImode memory slot.
>
> This patch fixes this problem by ensuring that the shift amount is
> truncated to the inner mode of the vector when necessary.  I've added a
> test case verifying correct PowerPC code generation in this case.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> [gcc]
>
> 2015-08-31  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * optabs.c (expand_binop): Don't create a broadcast vector with a
>         source element wider than the inner mode.
>
> [gcc/testsuite]
>
> 2015-08-31  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
>
>         * gcc.target/powerpc/vec-shift.c: New test.
>
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        (revision 227353)
> +++ gcc/optabs.c        (working copy)
> @@ -1608,6 +1608,13 @@ expand_binop (machine_mode mode, optab binoptab, r
>
>        if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
>         {
> +         /* The scalar may have been extended to be too wide.  Truncate
> +            it back to the proper size to fit in the broadcast vector.  */
> +         machine_mode inner_mode = GET_MODE_INNER (mode);
> +         if (GET_MODE_BITSIZE (inner_mode)
> +             < GET_MODE_BITSIZE (GET_MODE (op1)))

Does that work for modeless constants?  Btw, what do other targets do
here?  Do they also choke or do they cope with the wide operand?

> +           op1 = simplify_gen_unary (TRUNCATE, inner_mode, op1,
> +                                     GET_MODE (op1));
>           rtx vop1 = expand_vector_broadcast (mode, op1);
>           if (vop1)
>             {
> Index: gcc/testsuite/gcc.target/powerpc/vec-shift.c
> ===================================================================
> --- gcc/testsuite/gcc.target/powerpc/vec-shift.c        (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/vec-shift.c        (working copy)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
> +/* { dg-options "-mcpu=power7 -O2" } */
> +
> +/* This used to ICE.  During gimplification, "i" is widened to an unsigned
> +   int.  We used to fail at expand time as we tried to cram an SImode item
> +   into a QImode memory slot.  This has been fixed to properly truncate the
> +   shift amount when splatting it into a vector.  */
> +
> +typedef unsigned char v16ui __attribute__((vector_size(16)));
> +
> +v16ui vslb(v16ui v, unsigned char i)
> +{
> +       return v << i;
> +}
> +
> +/* { dg-final { scan-assembler "vspltb" } } */
> +/* { dg-final { scan-assembler "vslb" } } */
>
>
>
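For readers following the thread, here is a minimal sketch in plain C of the
idea behind the patch: the shift amount is widened to 32 bits, then narrowed
back to the element width before it is broadcast into every lane.  This is
only a model, not GCC internals.  The names v16qi_model, broadcast_truncated,
shift_left_by_vector and shift_left_by_scalar are hypothetical, and the cast
to uint8_t merely stands in for the TRUNCATE that the patch generates.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16

/* Hypothetical stand-in for a V16QI value: sixteen 8-bit lanes.  */
typedef struct { uint8_t lane[LANES]; } v16qi_model;

/* Broadcast a scalar into every lane.  The scalar arrives as a 32-bit
   value, mirroring the widening seen after gimplification; the cast
   plays the role of truncating to the vector's inner mode.  */
static v16qi_model broadcast_truncated(uint32_t wide_amount)
{
    v16qi_model out;
    uint8_t narrow = (uint8_t) wide_amount;
    for (int k = 0; k < LANES; k++)
        out.lane[k] = narrow;
    return out;
}

/* Lane-wise shift-by-vector.  Amounts of 8 or more are rejected here
   because this model does not define them.  */
static v16qi_model shift_left_by_vector(v16qi_model v, v16qi_model amounts)
{
    v16qi_model out;
    for (int k = 0; k < LANES; k++) {
        assert(amounts.lane[k] < 8);
        out.lane[k] = (uint8_t) (v.lane[k] << amounts.lane[k]);
    }
    return out;
}

/* Shift-by-scalar expressed as broadcast followed by shift-by-vector.  */
static v16qi_model shift_left_by_scalar(v16qi_model v, unsigned char i)
{
    uint32_t widened = (uint32_t) i;   /* models D.2301 = (unsigned int) i */
    return shift_left_by_vector(v, broadcast_truncated(widened));
}
```

In this model every lane of the broadcast holds an 8-bit copy of the amount,
so nothing wider than the element is ever stored into a lane.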
