[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2019-03-10 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #8 from Uroš Bizjak  ---
(In reply to Uroš Bizjak from comment #6)
> Please report this problem in another PR (it is a case of a missing v->r
> alternative in the *vec_extractv2di_0_sse pattern for SSE4+, where we can
> split directly to movd/pextrd).

Also fixed for gcc-9.

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2019-03-10 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #7 from uros at gcc dot gnu.org ---
Author: uros
Date: Sun Mar 10 22:59:31 2019
New Revision: 269562

URL: https://gcc.gnu.org/viewcvs?rev=269562&root=gcc&view=rev
Log:
PR target/68924
* config/i386/sse.md (*vec_extractv2di_0_sse):
Add (=r,x) alternative and corresponding splitter.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/sse.md

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2019-03-08 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

Uroš Bizjak  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |9.0

--- Comment #6 from Uroš Bizjak  ---
(In reply to Peter Cordes from comment #3)
> For the reverse, we get:
> 
> long long extract(__m128i v) {
> return ((__v2di)v)[0];
> }
> 
> subl    $28, %esp
> vmovq   %xmm0, 8(%esp)
> movl    8(%esp), %eax
> movl    12(%esp), %edx
> addl    $28, %esp
> ret
> 
> MOVD / PEXTRD might be better, but gcc does handle it.  It's all using
> syntax that's available in 32-bit mode, not a special built-in.

Please report this problem in another PR (it is a case of a missing v->r
alternative in the *vec_extractv2di_0_sse pattern for SSE4+, where we can split
directly to movd/pextrd).
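
For anyone who needs the better code on a compiler without that follow-up fix,
here is a minimal sketch of doing the MOVD/PEXTRD extraction by hand with
SSE4.1 intrinsics (the function name is made up for illustration; this is only
a source-level workaround, not the pattern the splitter generates):

#include <smmintrin.h>   /* SSE4.1: _mm_extract_epi32; build with -msse4.1 */
#include <stdint.h>

/* Hypothetical workaround: extract the low 64 bits of v without a
   stack round-trip.  movd gives the low half, pextrd the high half. */
static inline uint64_t extract_lo64_sse4(__m128i v)
{
  uint32_t lo = (uint32_t) _mm_cvtsi128_si32 (v);    /* movd   %xmm0, %eax */
  uint32_t hi = (uint32_t) _mm_extract_epi32 (v, 1); /* pextrd $1, %xmm0, %edx */
  return ((uint64_t) hi << 32) | lo;
}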

The original problem is fixed for gcc-9.

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2019-03-08 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #5 from Uroš Bizjak  ---
The _mm_loadu_si64 intrinsic can now be used in the example from the Description:

#include <immintrin.h>
#include <stdint.h>
__m256 load_bytes_to_m256(uint8_t *p)
{
  __m128i small_load = _mm_loadu_si64( (void *)p );
  __m256i intvec = _mm256_cvtepu8_epi32( small_load );
  return _mm256_cvtepi32_ps(intvec);
}

With -O2 -mavx2 this now compiles on 32-bit targets to:

...
vpmovzxbd   (%eax), %ymm0
vcvtdq2ps   %ymm0, %ymm0
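
The same commit (see comment #4) also added _mm_storeu_si64 for the opposite
direction.  A minimal sketch of using it, with an invented function name, to
store the low 64 bits of a vector to an unaligned pointer in 32-bit mode:

#include <immintrin.h>
#include <stdint.h>

/* Sketch: store the low 64 bits of v to an unaligned address. */
void store_low64(uint8_t *p, __m128i v)
{
  _mm_storeu_si64((void *)p, v);
}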

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2019-03-08 Thread uros at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #4 from uros at gcc dot gnu.org ---
Author: uros
Date: Fri Mar  8 15:53:47 2019
New Revision: 269497

URL: https://gcc.gnu.org/viewcvs?rev=269497&root=gcc&view=rev
Log:
PR target/68924
PR target/78782
PR target/87558
* config/i386/emmintrin.h (_mm_loadu_si64): New intrinsic.
(_mm_storeu_si64): Ditto.

testsuite/ChangeLog:

PR target/68924
PR target/78782
PR target/87558
* gcc.target/i386/pr78782.c: New test.
* gcc.target/i386/pr87558.c: Ditto.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr78782.c
trunk/gcc/testsuite/gcc.target/i386/pr87558.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/emmintrin.h
trunk/gcc/testsuite/ChangeLog

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2017-09-27 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #3 from Peter Cordes  ---
(In reply to Marc Glisse from comment #2)
> Does anything bad happen if you remove the #ifdef/#endif for
> _mm_cvtsi64_si128? (2 files in the testsuite would need updating for a
> proper patch)

It's just a wrapper for

__m128i _mm_cvtsi64_si128 (long long __A) {
  return _mm_set_epi64x (0, __A);
}

and _mm_set_epi64x is already available in 32-bit mode.

I tried using _mm_set_epi64x(0, i) (https://godbolt.org/g/24AYPk) and got the
expected results (same as with _mm_loadl_epi64(&i)):

__m128i movq_test(uint64_t *p) {
  return _mm_set_epi64x( 0, *p );
}

movl    4(%esp), %eax
vmovq   (%eax), %xmm0
ret

For the test where we shift before movq, it still uses 32-bit integer
double-precision shifts, stores to the stack, then vmovq (instead of optimizing
to vmovq / vpsllq).
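
That test case isn't reproduced here; purely as a hypothetical reconstruction
(the function name and shift count are guesses, not taken from the godbolt
link), it was along these lines:

#include <immintrin.h>
#include <stdint.h>

/* Hypothetical: shift a 64-bit scalar before moving it into an xmm
   register.  The hoped-for 32-bit codegen is vmovq + vpsllq rather
   than a shld/shl pair, a stack store and then vmovq. */
__m128i movq_shift_test(uint64_t *p)
{
  return _mm_set_epi64x(0, *p << 2);
}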


For the reverse, we get:

long long extract(__m128i v) {
return ((__v2di)v)[0];
}

subl    $28, %esp
vmovq   %xmm0, 8(%esp)
movl    8(%esp), %eax
movl    12(%esp), %edx
addl    $28, %esp
ret

MOVD / PEXTRD might be better, but gcc does handle it.  It's all using syntax
that's available in 32-bit mode, not a special built-in.

I don't think it's helpful to disable the 64-bit integer intrinsics for 32-bit
mode, even though they are no longer always single instructions.  I guess it
could be worse if someone used it without thinking, assuming it would be the
same cost as MOVD, and didn't really need the full 64 bits.  In that case, a
compile-time error would prompt them to port more optimally to 32-bit.  But
it's not usually gcc's job to refuse to compile code that might be sub-optimal!

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2017-09-27 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #2 from Marc Glisse  ---
Does anything bad happen if you remove the #ifdef/#endif for _mm_cvtsi64_si128?
(2 files in the testsuite would need updating for a proper patch)

[Bug target/68924] No intrinsic for x86 `MOVQ m64, %xmm` in 32bit mode.

2017-09-26 Thread peter at cordes dot ca
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68924

--- Comment #1 from Peter Cordes  ---
There's __m128i _mm_loadl_epi64 (__m128i const* mem_addr)
(https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=movq&expand=5450,4247,3115&techs=SSE2),
which gcc makes available in 32-bit mode.

This does solve the correctness problem for 32-bit, but gcc still compiles it
to a separate vmovq before a vpmovzxbd %xmm,%ymm.  (Using _mm_loadu_si128 still
optimizes away to vpmovzxbd (%eax), %ymm0.)

https://godbolt.org/g/Zuf26P
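
For reference, a sketch of the _mm_loadl_epi64 form of the function from the
Description (the cast and the function name are just for illustration):

#include <immintrin.h>
#include <stdint.h>

/* Sketch: correct in 32-bit mode, but at the time of this comment gcc
   still emitted a separate vmovq instead of folding the load into
   vpmovzxbd.  Build with -O2 -mavx2. */
__m256 load_bytes_to_m256_loadl(uint8_t *p)
{
  __m128i small_load = _mm_loadl_epi64((const __m128i *)p);
  __m256i intvec = _mm256_cvtepu8_epi32(small_load);
  return _mm256_cvtepi32_ps(intvec);
}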