On Thu, Oct 31, 2013 at 04:49:47PM +0000, Jakub Jelinek wrote:

> 2013-10-31  Jakub Jelinek  <ja...@redhat.com>
> 
>         * optabs.c (expand_vec_perm): Avoid vector mode punning
>         SUBREGs in SET_DEST.
>         * expmed.c (store_bit_field_1): Likewise.
>         * config/i386/sse.md (movdi_to_sse, vec_pack_sfix_trunc_v2df,
>         vec_pack_sfix_v2df, vec_shl_<mode>, vec_shr_<mode>,
>         vec_interleave_high<mode>, vec_interleave_low<mode>): Likewise.
>         * config/i386/i386.c (ix86_expand_vector_move_misalign,
>         ix86_expand_sse_movcc, ix86_expand_int_vcond, ix86_expand_vec_perm,
>         ix86_expand_sse_unpack, ix86_expand_args_builtin,
>         ix86_expand_vector_init_duplicate, ix86_expand_vector_set,
>         emit_reduc_half, expand_vec_perm_blend, expand_vec_perm_pshufb,
>         expand_vec_perm_interleave2, expand_vec_perm_pshufb2,
>         expand_vec_perm_vpshufb2_vpermq,
>         expand_vec_perm_vpshufb2_vpermq_even_odd, expand_vec_perm_even_odd_1,
>         expand_vec_perm_broadcast_1, expand_vec_perm_vpshufb4_vpermq2,
>         ix86_expand_sse2_mulv4si3, ix86_expand_pinsr): Likewise.
>         (expand_vec_perm_palignr): Likewise.  Modify a copy of *d rather
>         than *d itself.
> 
> --- gcc/optabs.c.jj     2013-10-29 09:25:45.000000000 +0100
> +++ gcc/optabs.c        2013-10-31 13:20:40.384808642 +0100
> @@ -6674,7 +6674,7 @@ expand_vec_perm (enum machine_mode mode,
>         }
>        tmp = gen_rtx_CONST_VECTOR (qimode, vec);
>        sel = gen_lowpart (qimode, sel);
> -      sel = expand_vec_perm (qimode, sel, sel, tmp, NULL);
> +      sel = expand_vec_perm (qimode, gen_reg_rtx (qimode), sel, tmp, NULL);
>        gcc_assert (sel != NULL);
> 
>        /* Add the byte offset to each byte element.  */

This hunk causes issues on AArch64 and ARM.

We look to see which permute operation we should generate in
aarch64_expand_vec_perm_const. If we notice that all elements in 
would select from op0 we copy op0 to op1 and generate appropriate code.

With this hunk applied we end up selecting from the register
generated by gen_reg_rtx (qimode), rather than from 'sel' as we
intended. Thus we lose the value of 'sel' and everything starts to
go wrong!

The hunk looks suspicious to me (why do we pick a register out of
thin air?), and reverting it fixes the problems I see. Could you
give me a pointer as to what this hunk fixes on i?86?

I don't think I can fix this up in the expander very easily?

Thanks,
James


Reply via email to