[PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

2021-08-23 Thread Paul A. Clarke via Gcc-patches
Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cmpeq_epi64
- _mm_mullo_epi32, _mm_mul_epi32
- _mm_packus_epi32
- _mm_cmpgt_epi64 (SSE4.2)

from gcc/testsuite/gcc.target/i386.

2021-08-23  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
_mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.
* config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-packusdw.c: Same.
* gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
* gcc.target/powerpc/sse4_1-pmuldq.c: Same.
* gcc.target/powerpc/sse4_1-pmulld.c: Same.
* gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
* gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386,
tweak to suit.
---
v3:
- Add nmmintrin.h. _mm_cmpgt_epi64 is part of SSE4.2, which is
  ostensibly defined in nmmintrin.h. Following the i386 implementation,
  however, nmmintrin.h only includes smmintrin.h, and the actual
  implementations appear there.
- Add sse4_2-check.h, required by sse4_2-pcmpgtq.c. My testing was
  obviously inadequate.
v2:
- Added "extern" to functions to maintain compatible decorations with
  like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

 gcc/config/rs6000/nmmintrin.h | 40 ++
 gcc/config/rs6000/smmintrin.h | 41 +++
 gcc/testsuite/gcc.target/powerpc/pr78102.c| 23 ++
 .../gcc.target/powerpc/sse4_1-packusdw.c  | 73 +++
 .../gcc.target/powerpc/sse4_1-pcmpeqq.c   | 46 
 .../gcc.target/powerpc/sse4_1-pmuldq.c| 51 +
 .../gcc.target/powerpc/sse4_1-pmulld.c| 46 
 .../gcc.target/powerpc/sse4_2-check.h | 18 +
 .../gcc.target/powerpc/sse4_2-pcmpgtq.c   | 46 
 9 files changed, 384 insertions(+)
 create mode 100644 gcc/config/rs6000/nmmintrin.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-check.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c

diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h
new file mode 100644
index ..20a70bee3776
--- /dev/null
+++ b/gcc/config/rs6000/nmmintrin.h
@@ -0,0 +1,40 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.  */
+#endif
+
+#ifndef _NMMINTRIN_H_INCLUDED
+#define _NMMINTRIN_H_INCLUDED
+
+/* We just include SSE4.1 header file.  */
+#include 
+
+#endif /* _NMMINTRIN_H_INCLUDED */
diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index fdef6674d16c..c04d2bb5b6d3 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -386,6 +386,15 @@ _mm_testnzc_si128 (__m128i __A, __m128i __B)
 
 #define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
 
+#ifdef _ARCH_PWR8
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmpeq_epi64 (__m128i __X, __m128i __Y)
+{
+  return (

Re: [PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

2021-08-27 Thread Bill Schmidt via Gcc-patches

Hi Paul,

On 8/23/21 2:03 PM, Paul A. Clarke wrote:

Function signatures and decorations match gcc/config/i386/smmintrin.h.

Also, copy tests for:
- _mm_cmpeq_epi64
- _mm_mullo_epi32, _mm_mul_epi32
- _mm_packus_epi32
- _mm_cmpgt_epi64 (SSE4.2)

from gcc/testsuite/gcc.target/i386.

2021-08-23  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
_mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.
* config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.

gcc/testsuite
* gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
adjust dg directives to suit.
* gcc.target/powerpc/sse4_1-packusdw.c: Same.
* gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
* gcc.target/powerpc/sse4_1-pmuldq.c: Same.
* gcc.target/powerpc/sse4_1-pmulld.c: Same.
* gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
* gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386,
tweak to suit.
---
v3:
- Add nmmintrin.h. _mm_cmpgt_epi64 is part of SSE4.2, which is
   ostensibly defined in nmmintrin.h. Following the i386 implementation,
   however, nmmintrin.h only includes smmintrin.h, and the actual
   implementations appear there.
- Add sse4_2-check.h, required by sse4_2-pcmpgtq.c. My testing was
   obviously inadequate.
v2:
- Added "extern" to functions to maintain compatible decorations with
   like implementations in gcc/config/i386.
- Removed "-Wno-psabi" from tests as unnecessary, per v1 review.
- Noted testing in patch series cover letter.

  gcc/config/rs6000/nmmintrin.h | 40 ++
  gcc/config/rs6000/smmintrin.h | 41 +++
  gcc/testsuite/gcc.target/powerpc/pr78102.c| 23 ++
  .../gcc.target/powerpc/sse4_1-packusdw.c  | 73 +++
  .../gcc.target/powerpc/sse4_1-pcmpeqq.c   | 46 
  .../gcc.target/powerpc/sse4_1-pmuldq.c| 51 +
  .../gcc.target/powerpc/sse4_1-pmulld.c| 46 
  .../gcc.target/powerpc/sse4_2-check.h | 18 +
  .../gcc.target/powerpc/sse4_2-pcmpgtq.c   | 46 
  9 files changed, 384 insertions(+)
  create mode 100644 gcc/config/rs6000/nmmintrin.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr78102.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-packusdw.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pcmpeqq.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmuldq.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-pmulld.c
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-check.h
  create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_2-pcmpgtq.c

diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h
new file mode 100644
index ..20a70bee3776
--- /dev/null
+++ b/gcc/config/rs6000/nmmintrin.h
@@ -0,0 +1,40 @@
+/* Copyright (C) 2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#ifndef NO_WARN_X86_INTRINSICS
+/* This header is distributed to simplify porting x86_64 code that
+   makes explicit use of Intel intrinsics to powerpc64le.
+   It is the user's responsibility to determine if the results are
+   acceptable and make additional changes as necessary.
+   Note that much code that uses Intel intrinsics can be rewritten in
+   standard C or GNU C extensions, which are more portable and better
+   optimized across multiple targets.  */
+#endif
+
+#ifndef _NMMINTRIN_H_INCLUDED
+#define _NMMINTRIN_H_INCLUDED
+
+/* We just include SSE4.1 header file.  */
+#include 
+
+#endif /* _NMMINTRIN_H_INCLUDED */


Should there be something in here indicating that nmmintrin.h is for SSE 
4.2?  Otherwise it's a bit of a head-scratcher to a new person wondering 
why this file exists.  No big deal either way.


This looks fine to me with or without that.  Recommend approval.

Thanks!
Bill


diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index fdef6674d16c..c04d2bb5b6d3 100644
--- a/gcc/config/rs6000/

Re: [PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

2021-08-27 Thread Paul A. Clarke via Gcc-patches
On Fri, Aug 27, 2021 at 10:21:35AM -0500, Bill Schmidt via Gcc-patches wrote:
> On 8/23/21 2:03 PM, Paul A. Clarke wrote:
> > Function signatures and decorations match gcc/config/i386/smmintrin.h.

> > gcc

> > * config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.

> > ---
> > v3:
> > - Add nmmintrin.h. _mm_cmpgt_epi64 is part of SSE4.2, which is
> >ostensibly defined in nmmintrin.h. Following the i386 implementation,
> >however, nmmintrin.h only includes smmintrin.h, and the actual
> >implementations appear there.

> > v2:
> > - Added "extern" to functions to maintain compatible decorations with
> >like implementations in gcc/config/i386.

> > diff --git a/gcc/config/rs6000/nmmintrin.h b/gcc/config/rs6000/nmmintrin.h
> > new file mode 100644
> > index ..20a70bee3776
> > --- /dev/null
> > +++ b/gcc/config/rs6000/nmmintrin.h
> > @@ -0,0 +1,40 @@
> > +/* Copyright (C) 2021 Free Software Foundation, Inc.
> > +
> > +   This file is part of GCC.
> > +
> > +   GCC is free software; you can redistribute it and/or modify
> > +   it under the terms of the GNU General Public License as published by
> > +   the Free Software Foundation; either version 3, or (at your option)
> > +   any later version.
> > +
> > +   GCC is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > +   GNU General Public License for more details.
> > +
> > +   Under Section 7 of GPL version 3, you are granted additional
> > +   permissions described in the GCC Runtime Library Exception, version
> > +   3.1, as published by the Free Software Foundation.
> > +
> > +   You should have received a copy of the GNU General Public License and
> > +   a copy of the GCC Runtime Library Exception along with this program;
> > +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> > +   .  */
> > +
> > +#ifndef NO_WARN_X86_INTRINSICS
> > +/* This header is distributed to simplify porting x86_64 code that
> > +   makes explicit use of Intel intrinsics to powerpc64le.
> > +   It is the user's responsibility to determine if the results are
> > +   acceptable and make additional changes as necessary.
> > +   Note that much code that uses Intel intrinsics can be rewritten in
> > +   standard C or GNU C extensions, which are more portable and better
> > +   optimized across multiple targets.  */
> > +#endif
> > +
> > +#ifndef _NMMINTRIN_H_INCLUDED
> > +#define _NMMINTRIN_H_INCLUDED
> > +
> > +/* We just include SSE4.1 header file.  */
> > +#include 
> > +
> > +#endif /* _NMMINTRIN_H_INCLUDED */
> 
> Should there be something in here indicating that nmmintrin.h is for SSE
> 4.2?  Otherwise it's a bit of a head-scratcher to a new person wondering why
> this file exists.  No big deal either way.

For good or bad, I have been trying to minimize differences with the
analogous i386 files.  With the exception of the copyright and our annoying
litte warning, the only difference was this comment:

--
/* Implemented from the specification included in the Intel C++ Compiler
   User Guide and Reference, version 10.0.  */
--

I didn't find that (1) accurate, since there are no implementations therein,
or (2) particularly informative, as I imagine that document has a much
bigger scope than SSE4.2.  And keeping it would be a bit misleading, I think.
So, I intentionally removed the comment.

> This looks fine to me with or without that.  Recommend approval.

Thanks for the review!

PC


Re: [PATCH v3 5/6] rs6000: Support more SSE4 "cmp", "mul", "pack" intrinsics

2021-10-11 Thread Segher Boessenkool
Hi!

On Mon, Aug 23, 2021 at 02:03:09PM -0500, Paul A. Clarke wrote:
> gcc
>   * config/rs6000/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
>   _mm_mullo_epi32, _mm_mul_epi32, _mm_packus_epi32): New.
>   * config/rs6000/nmmintrin.h: Copy from i386, tweak to suit.
> 
> gcc/testsuite
>   * gcc.target/powerpc/pr78102.c: Copy from gcc.target/i386,
>   adjust dg directives to suit.
>   * gcc.target/powerpc/sse4_1-packusdw.c: Same.
>   * gcc.target/powerpc/sse4_1-pcmpeqq.c: Same.
>   * gcc.target/powerpc/sse4_1-pmuldq.c: Same.
>   * gcc.target/powerpc/sse4_1-pmulld.c: Same.
>   * gcc.target/powerpc/sse4_2-pcmpgtq.c: Same.
>   * gcc.target/powerpc/sse4_2-check.h: Copy from gcc.target/i386,
>   tweak to suit.

Okay for trunk (with the vsx_hw thing).  Thanks!


Segher