[Bug target/88278] Fails to elide zeroing of upper vector register

rguenther at suse dot de Fri, 30 Nov 2018 06:41:36 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88278


--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 30 Nov 2018, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88278
> 
> Jakub Jelinek <jakub at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                 |NEW
>    Last reconfirmed|                            |2018-11-30
>      Ever confirmed|0                           |1
> 
> --- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> I guess
> #include <x86intrin.h>
> 
> __m128i
> foo (__m64 *x)
> {
>   return _mm_movpi64_epi64 (*x);
> }
> is what intrinsic users would write for this case, and that is optimized
> properly:
> (insn 7 6 12 2 (set (reg:V2DI 87)
>         (vec_concat:V2DI (mem:DI (reg:DI 89) [0 *x_3(D)+0 S8 A64])
>             (const_int 0 [0]))) "include/emmintrin.h":592:24 3956 {vec_concatv2di}
>      (expr_list:REG_DEAD (reg:DI 89)
>         (nil)))
> 
> Similarly e.g.
> #include <x86intrin.h>
> 
> __m256
> foo (__m128 *x)
> {
>   return _mm256_castps128_ps256 (*x);
> }
> which is conceptually closest to this case.
> Or
> #include <x86intrin.h>
> 
> __m256i
> foo (__m128i *x)
> {
>   return _mm256_castsi128_si256 (*x);
> }
> 
> All these use something like:
> (insn 7 6 13 2 (set (reg:V8SI 87)
>         (unspec:V8SI [
>                 (mem:V4SI (reg:DI 90) [0 *x_3(D)+0 S16 A128])
>             ] UNSPEC_CAST)) "include/avxintrin.h":1484:20 4813 {avx_si256_si}
>      (expr_list:REG_DEAD (reg:DI 90)
>         (nil)))
> Not really sure why UNSPEC_CAST is used rather than representing it with
> something natural like a VEC_CONCAT of a nonimmediate_operand and a const0_operand.
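For comparison, here is a portable sketch of the zero-extending 128-to-256-bit load that a VEC_CONCAT with a const0_operand would describe. It uses GNU C vector extensions, which GCC and Clang both support, and the helper name zext_v4si_to_v8si is hypothetical. Intel documents the upper 128 bits of the result of _mm256_castsi128_si256 as undefined. The explicit zeroing here therefore models the proposed VEC_CONCAT form, not the intrinsic's documented contract.

```c
#include <assert.h>

/* 32-bit lanes: v4si mirrors __m128i, v8si mirrors __m256i.  */
typedef int v4si __attribute__((vector_size(16)));
typedef int v8si __attribute__((vector_size(32)));
static_assert (sizeof (v8si) == 2 * sizeof (v4si),
               "upper half must match the lower half in size");

/* Hypothetical helper: load 128 bits and zero the upper 128 bits, i.e. the
   value described by (vec_concat:V8SI (mem:V4SI ...) (const_vector:V4SI 0)).
   The result goes through OUT so no 256-bit vector is passed by value.  */
void zext_v4si_to_v8si (v8si *out, const v4si *in)
{
  v4si lo = *in;
  *out = (v8si) { lo[0], lo[1], lo[2], lo[3], 0, 0, 0, 0 };
}
```

The result is written through a pointer because returning a 256-bit vector by value without -mavx changes the x86-64 ABI, and GCC warns about that under -Wpsabi.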

OK, it indeed seems to "work" when punning via integers:

typedef unsigned long v2di __attribute__((vector_size(16)));

v2di __GIMPLE baz (unsigned long *p)
{
  unsigned long _2;
  v2di _3;

bb_2:
  _2 = __MEM <unsigned long, 64> (p_1(D));
  _3 = _Literal (v2di) { _2, _Literal (unsigned long) 0 };
  return _3;
}
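For readers without the GIMPLE front end (-fgimple), here is a plain C sketch of what baz computes, written with GCC's vector extension. The name baz_c is hypothetical, and uint64_t replaces unsigned long so that the lane width does not depend on the data model. One would expect GCC at -O2 to expand this into the same vec_concat-with-zero shape that combine matches for the GIMPLE version, but that is an assumption. The thread does not show it.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the v2di type above: two 64-bit lanes.  */
typedef uint64_t v2di __attribute__((vector_size(16)));
static_assert (sizeof (v2di) == 2 * sizeof (uint64_t), "two 64-bit lanes");

/* Hypothetical C counterpart of baz: load 64 bits into lane 0 and
   set lane 1 to zero, i.e. { *p, 0 }.  */
v2di baz_c (const uint64_t *p)
{
  uint64_t lo = *p;
  return (v2di) { lo, 0 };
}
```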

It looks like combine can handle this case:

Successfully matched this instruction:
(set (reg:V2DI 87)
    (vec_concat:V2DI (mem:DI (reg:DI 89) [1 *p_1(D)+0 S8 A64])
        (const_int 0 [0])))

which suggests the vector variants could be handled similarly
by macroizing the pattern over vector modes of matching sizes?  Or is
this undesirable?  If we declare the above to be the canonical RTL
for zero-"extending" loads into vector registers, then we
can handle this during RTL expansion, I guess.
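One possible instance of such a "vector variant" has a low half that is itself a vector (two floats, 64 bits) rather than a DImode integer. A sketch in GNU C follows, assuming that V2SF to V4SF is among the mode pairs of matching sizes meant here. The function name load_low_v2sf is hypothetical. Whether GCC expands it to a single zero-extending load (such as movq on x86-64) without a separate zeroing instruction is exactly the open question. This sketch only pins down the intended value.

```c
#include <assert.h>

/* Low half is a 64-bit vector of two floats; the result is 128 bits.  */
typedef float v2sf __attribute__((vector_size(8)));
typedef float v4sf __attribute__((vector_size(16)));
static_assert (sizeof (v4sf) == 2 * sizeof (v2sf),
               "halves must match in size");

/* Hypothetical zero-extending load.  Presumably the RTL one would want is
   (vec_concat:V4SF (mem:V2SF ...) (const_vector:V2SF [0.0 0.0])).  */
v4sf load_low_v2sf (const v2sf *p)
{
  v2sf lo = *p;
  return (v4sf) { lo[0], lo[1], 0.0f, 0.0f };
}
```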
