Re: [PATCH] rs6000: Add support for the vec_sbox_be, vec_cipher_be etc. builtins.

2019-02-11 Thread Xiong Hu Luo

Hi Segher,

On 2019/1/26 AM1:43, Segher Boessenkool wrote:

Hi!

On Wed, Jan 23, 2019 at 03:57:28AM -0600, luo...@linux.vnet.ibm.com wrote:

The 5 new builtins vec_sbox_be, vec_cipher_be, vec_cipherlast_be, vec_ncipher_be
and vec_ncipherlast_be only support vector unsigned char type parameters.
Add new instructions crypto_vsbox_ and crypto__ to handle
them accordingly, where the new mode CR_vqdi can be expanded to vector unsigned
long long for builtins without the _be postfix, or to vector unsigned char for
the _be postfix builtins.


Hrm, can't you use the existing CR_mode iterator here?


2019-01-23  Xiong Hu Luo  

* gcc/testsuite/gcc.target/powerpc/crypto-builtin-1.c
(crpyto1_be, crpyto2_be, crpyto3_be, crpyto4_be, crpyto5_be):
 New testcases.


Typos ("crypto").  And that last line is indented incorrectly.

With those things fixed, okay for trunk, with the new iterator if CR_mode
isn't usable here.  Thanks!


Segher



Thanks, I will fix the typos and indentation.
CR_mode supports all 4 types (v16qi, v8hi, v4si and v2di), while we need the
new mode CR_vqdi to represent only 2 of them (v16qi and v2di), which means
the existing iterator cannot be reused here.

BTW, does this patch need to be backported to gcc-7 and gcc-8?

Xionghu







[PATCH] rs6000: new vec-s*d-modulo.c tests should require p8vector_hw

2019-02-11 Thread Bill Schmidt
Hi,

It turns out that the new tests added today actually require POWER8 hardware at
a minimum, since the vec_vsrad interface requires it.  (Note that requiring
P8 hardware obviates the need to specify -mvsx, so that is now removed.)

Tested on powerpc64le (P9, P8) and powerpc64 (P7) with correct behavior.  Is this
okay for trunk?

Thanks,
Bill


2019-02-11  Bill Schmidt  

* gcc.target/powerpc/vec-sld-modulo.c: Require p8vector_hw.
* gcc.target/powerpc/vec-srad-modulo.c: Likewise.
* gcc.target/powerpc/vec-srd-modulo.c: Likewise.


Index: gcc/testsuite/gcc.target/powerpc/vec-sld-modulo.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-sld-modulo.c   (revision 268771)
+++ gcc/testsuite/gcc.target/powerpc/vec-sld-modulo.c   (working copy)
@@ -1,8 +1,8 @@
 /* Test that using a character splat to set up a shift-left
for a doubleword vector works correctly after gimple folding.  */
 
-/* { dg-do run { target { vsx_hw } } } */
-/* { dg-options "-O2 -mvsx" } */
+/* { dg-do run { target { p8vector_hw } } } */
+/* { dg-options "-O2" } */
 
 #include 
 
Index: gcc/testsuite/gcc.target/powerpc/vec-srad-modulo.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-srad-modulo.c  (revision 268771)
+++ gcc/testsuite/gcc.target/powerpc/vec-srad-modulo.c  (working copy)
@@ -1,8 +1,8 @@
 /* Test that using a character splat to set up a shift-right algebraic
for a doubleword vector works correctly after gimple folding.  */
 
-/* { dg-do run { target { vsx_hw } } } */
-/* { dg-options "-O2 -mvsx" } */
+/* { dg-do run { target { p8vector_hw } } } */
+/* { dg-options "-O2" } */
 
 #include 
 
Index: gcc/testsuite/gcc.target/powerpc/vec-srd-modulo.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-srd-modulo.c   (revision 268771)
+++ gcc/testsuite/gcc.target/powerpc/vec-srd-modulo.c   (working copy)
@@ -1,8 +1,8 @@
 /* Test that using a character splat to set up a shift-right logical
for a doubleword vector works correctly after gimple folding.  */
 
-/* { dg-do run { target { vsx_hw } } } */
-/* { dg-options "-O2 -mvsx" } */
+/* { dg-do run { target { p8vector_hw } } } */
+/* { dg-options "-O2" } */
 
 #include 
 



[PATCH] Avoid assuming valid_constant_size_p argument is a constant expression (PR 89294)

2019-02-11 Thread Martin Sebor

The attached patch removes the assumption introduced earlier today
in my fix for bug 87996 that the valid_constant_size_p argument is
a constant expression.  I couldn't come up with a C/C++ test case
where this isn't true, but apparently it can happen in Ada, which I
inadvertently didn't build.  I still haven't figured out what
I have to do to build it on my Fedora 29 machine, so I tested
this change by hand (besides bootstrapping without Ada).

The first set of instructions Google gives me doesn't seem to do
it:

  https://fedoraproject.org/wiki/Features/Ada_developer_tools

and neither does dnf install gcc-gnat as explained on our Wiki:

  https://gcc.gnu.org/wiki/GNAT

If someone knows the magic chant I would be grateful (it might
be helpful to also update the Wiki page -- the last change to
it was made in 2012; I volunteer to do that).

Martin
PR middle-end/89294 - ICE in valid_constant_size_p

gcc/c-family/ChangeLog:

	PR middle-end/89294
	* c-common.c (invalid_array_size_error): Handle cst_size_not_constant.

gcc/ChangeLog:

	PR middle-end/89294
	* tree.c (valid_constant_size_p): Avoid assuming size is a constant
	expression.
	* tree.h (cst_size_error): Add the cst_size_not_constant enumerator.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 268783)
+++ gcc/c-family/c-common.c	(working copy)
@@ -8241,6 +8241,13 @@ invalid_array_size_error (location_t loc, cst_size
   tree maxsize = max_object_size ();
   switch (error)
 {
+case cst_size_not_constant:
+  if (name)
+	error_at (loc, "size of array %qE is not a constant expression",
+		  name);
+  else
+	error_at (loc, "size of array is not a constant expression");
+  break;
 case cst_size_negative:
   if (name)
 	error_at (loc, "size %qE of array %qE is negative",
Index: gcc/tree.c
===
--- gcc/tree.c	(revision 268783)
+++ gcc/tree.c	(working copy)
@@ -7521,8 +7521,14 @@ valid_constant_size_p (const_tree size, cst_size_e
   if (!perr)
 perr = 
 
-  if (TREE_OVERFLOW (size))
+  if (TREE_CODE (size) != INTEGER_CST)
 {
+  *perr = cst_size_not_constant;
+  return false;
+}
+
+  if (TREE_OVERFLOW_P (size))
+{
   *perr = cst_size_overflow;
   return false;
 }
Index: gcc/tree.h
===
--- gcc/tree.h	(revision 268783)
+++ gcc/tree.h	(working copy)
@@ -4352,6 +4352,7 @@ extern tree excess_precision_type (tree);
is not a valid size.  */
 enum cst_size_error {
   cst_size_ok,
+  cst_size_not_constant,
   cst_size_negative,
   cst_size_too_big,
   cst_size_overflow


[committed] linemap_line_start: protect against location_t overflow (PR lto/88147)

2019-02-11 Thread David Malcolm
PR lto/88147 reports an assertion failure due to a bogus location_t value
when adding a line to a pre-existing line map, when there's a large
difference between the two line numbers.

For some "large differences", this leads to a location_t value that exceeds
LINE_MAP_MAX_LOCATION, in which case linemap_line_start returns 0.  This
isn't ideal, but at least should lead to safe degradation of location
information.

However, if the difference is very large, it's possible for the line
number offset (relative to the start of the map) to be sufficiently large
that overflow occurs when left-shifted by the column-bits, and hence
the check against the LINE_MAP_MAX_LOCATION limit fails, leading to
a seemingly-valid location_t value, but encoding the wrong location.  This
triggers the assertion failure:
  linemap_assert (SOURCE_LINE (map, r) == to_line);

The fix (thanks to Martin) is to check for overflow when determining
whether to reuse an existing map, and to not reuse it if it would occur.

Successfully bootstrapped & regtested on x86_64-pc-linux-gnu.

Committed to trunk as r268789.

gcc/ChangeLog: David Malcolm  
PR lto/88147
* input.c (selftest::test_line_offset_overflow): New selftest.
(selftest::input_c_tests): Call it.

libcpp/ChangeLog: Martin Liska  
PR lto/88147
* line-map.c (linemap_line_start): Don't reuse the existing line
map if the line offset is sufficiently large to cause overflow
when computing location_t values.
---
 gcc/input.c   | 30 ++
 libcpp/line-map.c |  4 
 2 files changed, 34 insertions(+)

diff --git a/gcc/input.c b/gcc/input.c
index bf1ca66..c589d70 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -3557,6 +3557,34 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
   ASSERT_EQ (num_cases_tested, 2 * 12);
 }
 
+/* Verify that when presented with a consecutive pair of locations with
+   a very large line offset, we don't attempt to consolidate them into
+   a single ordinary linemap where the line offsets within the line map
+   would lead to overflow (PR lto/88147).  */
+
+static void
+test_line_offset_overflow ()
+{
+  line_table_test ltt (line_table_case (5, 0));
+
+  linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
+  linemap_line_start (line_table, 1, 100);
+  location_t loc_a = linemap_line_start (line_table, 2578, 255);
+  assert_loceq ("foo.c", 2578, 0, loc_a);
+
+  const line_map_ordinary *ordmap_a = LINEMAPS_LAST_ORDINARY_MAP (line_table);
+  ASSERT_EQ (ordmap_a->m_column_and_range_bits, 13);
+  ASSERT_EQ (ordmap_a->m_range_bits, 5);
+
+  location_t loc_b = linemap_line_start (line_table, 404198, 512);
+  assert_loceq ("foo.c", 404198, 0, loc_b);
+
+  /* We should have started a new linemap, rather than attempting to store
+ a very large line offset.  */
+  const line_map_ordinary *ordmap_b = LINEMAPS_LAST_ORDINARY_MAP (line_table);
+  ASSERT_NE (ordmap_a, ordmap_b);
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -3596,6 +3624,8 @@ input_c_tests ()
   for_each_line_table_case (test_lexer_char_constants);
 
   test_reading_source_line ();
+
+  test_line_offset_overflow ();
 }
 
 } // namespace selftest
diff --git a/libcpp/line-map.c b/libcpp/line-map.c
index ff679ed..0e30b4b 100644
--- a/libcpp/line-map.c
+++ b/libcpp/line-map.c
@@ -742,6 +742,10 @@ linemap_line_start (struct line_maps *set, linenum_type to_line,
   if (line_delta < 0
  || last_line != ORDINARY_MAP_STARTING_LINE_NUMBER (map)
  || SOURCE_COLUMN (map, highest) >= (1U << (column_bits - range_bits))
+ || ( /* We can't reuse the map if the line offset is sufficiently
+ large to cause overflow when computing location_t values.  */
+ (to_line - ORDINARY_MAP_STARTING_LINE_NUMBER (map))
+ >= (1U << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
  || range_bits < map->m_range_bits)
map = linemap_check_ordinary
(const_cast 
-- 
1.8.5.3



Re: [PATCH] Updated patches for the port of gccgo to GNU/Hurd

2019-02-11 Thread Ian Lance Taylor
On Sun, Feb 10, 2019 at 3:40 AM Svante Signell  wrote:
>
> > I've found some problems. Current problem is with the mksysinfo.sh patch. But
> > there are some other things missing. New patches will be submitted tomorrow.
>
> Attached are three additional patches needed to build libgo on GNU/Hurd:
> src_libgo_mksysinfo.sh.diff
> src_libgo_go_syscall_wait.c.diff
> src_libgo_testsuite_gotest.diff
>
> For the first patch, src_libgo_mksysinfo.sh.diff, I had to go back to the old
> version, using sed -i -e. As written now ${fsid_to_dev} expands to
> fsid_to_dev='-e '\''s/st_fsid/Dev/'\''' resulting in: "sed: -e expression #4,
> char 1: unknown command: `''". Unfortunately, I have not yet been able to
> modify the expansion to omit the single quotes around the shell variable.

I fixed this a slightly different way, as attached.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.


Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268605)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-9b66264ed6adcf3fd215dbfd125c12b022b7280e
+fc8aa5a46433d6ecba9fd1cd0bee4290c314ca06
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 268461)
+++ libgo/mksysinfo.sh  (working copy)
@@ -486,9 +486,9 @@ grep '^type _st_timespec ' gen-sysinfo.g
 
 # Special treatment of struct stat st_dev for GNU/Hurd
 # /usr/include/i386-gnu/bits/stat.h: #define st_dev st_fsid
-fsid_to_dev=
+st_dev='-e s/st_dev/Dev/'
 if grep 'define st_dev st_fsid' gen-sysinfo.go > /dev/null 2>&1; then
-  fsid_to_dev="-e 's/st_fsid/Dev/'"
+  st_dev='-e s/st_fsid/Dev/'
 fi
 
 # The stat type.
@@ -500,8 +500,7 @@ else
   grep '^type _stat ' gen-sysinfo.go
 fi | sed -e 's/type _stat64/type Stat_t/' \
  -e 's/type _stat/type Stat_t/' \
- -e 's/st_dev/Dev/' \
- ${fsid_to_dev} \
+ ${st_dev} \
  -e 's/st_ino/Ino/g' \
  -e 's/st_nlink/Nlink/' \
  -e 's/st_mode/Mode/' \


Re: [PATCH 08/40] i386: Emulate MMX ashr3/3 with SSE

2019-02-11 Thread Uros Bizjak
On Tue, Feb 12, 2019 at 12:15 AM Uros Bizjak  wrote:
>
> On Mon, Feb 11, 2019 at 11:55 PM H.J. Lu  wrote:
> >
> > Emulate MMX ashr3/3 with SSE.  Only SSE register
> > source operand is allowed.
> >
> > PR target/89021
> > * config/i386/mmx.md (mmx_ashr3): Changed to define_expand.
> > Disallow TARGET_MMX_WITH_SSE.
> > (mmx_3): Likewise.
> > (ashr3): New.
> > (*ashr3): Likewise.
> > (3): Likewise.
> > (*3): Likewise.
> > ---
> >  gcc/config/i386/mmx.md | 68 --
> >  1 file changed, 52 insertions(+), 16 deletions(-)
> >
> > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > index 0e44b3ce9b8..1b4f67be902 100644
> > --- a/gcc/config/i386/mmx.md
> > +++ b/gcc/config/i386/mmx.md
> > @@ -983,33 +983,69 @@
> >[(set_attr "type" "mmxadd")
> > (set_attr "mode" "DI")])
> >
> > -(define_insn "mmx_ashr3"
> > -  [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
> > +(define_expand "mmx_ashr3"
> > +  [(set (match_operand:MMXMODE24 0 "register_operand")
> >  (ashiftrt:MMXMODE24
> > - (match_operand:MMXMODE24 1 "register_operand" "0")
> > - (match_operand:DI 2 "nonmemory_operand" "yN")))]
> > -  "TARGET_MMX"
> > -  "psra\t{%2, %0|%0, %2}"
> > -  [(set_attr "type" "mmxshft")
> > + (match_operand:MMXMODE24 1 "register_operand")
> > + (match_operand:DI 2 "nonmemory_operand")))]
> > +  "TARGET_MMX && !TARGET_MMX_WITH_SSE")
>
> Are you sure this is the correct condition? This pattern is called
> from a builtin, which should be enabled for TARGET_MMX *or*
> TARGET_MMX_WITH_SSE.

It looks to me that "TARGET_MMX || TARGET_MMX_WITH_SSE" should be used
with mmx_* patterns (and new SSE constraints should be added to these
patterns) and new named shift expanders should be added here.

Uros.

> > +(define_expand "ashr3"
> > +  [(set (match_operand:MMXMODE24 0 "register_operand")
> > +(ashiftrt:MMXMODE24
> > + (match_operand:MMXMODE24 1 "register_operand")
> > + (match_operand:DI 2 "nonmemory_operand")))]
> > +  "TARGET_MMX_WITH_SSE")
> > +
> > +(define_insn "*ashr3"
> > +  [(set (match_operand:MMXMODE24 0 "register_operand" "=y,x,Yv")
> > +(ashiftrt:MMXMODE24
> > + (match_operand:MMXMODE24 1 "register_operand" "0,0,Yv")
> > + (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
> > +  "TARGET_MMX || TARGET_MMX_WITH_SSE"
> > +  "@
> > +   psra\t{%2, %0|%0, %2}
> > +   psra\t{%2, %0|%0, %2}
> > +   vpsra\t{%2, %1, %0|%0, %1, %2}"
> > +  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > +   (set_attr "type" "mmxshft,sseishft,sseishft")
> > (set (attr "length_immediate")
> >   (if_then_else (match_operand 2 "const_int_operand")
> > (const_string "1")
> > (const_string "0")))
> > -   (set_attr "mode" "DI")])
> > +   (set_attr "mode" "DI,TI,TI")])
> >
> > -(define_insn "mmx_3"
> > -  [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
> > +(define_expand "mmx_3"
> > +  [(set (match_operand:MMXMODE248 0 "register_operand")
> >  (any_lshift:MMXMODE248
> > - (match_operand:MMXMODE248 1 "register_operand" "0")
> > - (match_operand:DI 2 "nonmemory_operand" "yN")))]
> > -  "TARGET_MMX"
> > -  "p\t{%2, %0|%0, %2}"
> > -  [(set_attr "type" "mmxshft")
> > + (match_operand:MMXMODE248 1 "register_operand")
> > + (match_operand:DI 2 "nonmemory_operand")))]
> > +  "TARGET_MMX && !TARGET_MMX_WITH_SSE")
> > +
> > +(define_expand "3"
> > +  [(set (match_operand:MMXMODE248 0 "register_operand")
> > +(any_lshift:MMXMODE248
> > + (match_operand:MMXMODE248 1 "register_operand")
> > + (match_operand:DI 2 "nonmemory_operand")))]
> > +  "TARGET_MMX_WITH_SSE")
> > +
> > +(define_insn "*3"
> > +  [(set (match_operand:MMXMODE248 0 "register_operand" "=y,x,Yv")
> > +(any_lshift:MMXMODE248
> > + (match_operand:MMXMODE248 1 "register_operand" "0,0,Yv")
> > + (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
> > +  "TARGET_MMX || TARGET_MMX_WITH_SSE"
> > +  "@
> > +   p\t{%2, %0|%0, %2}
> > +   p\t{%2, %0|%0, %2}
> > +   vp\t{%2, %1, %0|%0, %1, %2}"
> > +  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > +   (set_attr "type" "mmxshft,sseishft,sseishft")
> > (set (attr "length_immediate")
> >   (if_then_else (match_operand 2 "const_int_operand")
> > (const_string "1")
> > (const_string "0")))
> > -   (set_attr "mode" "DI")])
> > +   (set_attr "mode" "DI,TI,TI")])
> >
> >  ;
> >  ;;
> > --
> > 2.20.1
> >


[PATCH 19/40] i386: Emulate MMX mmx_pmovmskb with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mmx_pmovmskb with SSE by zero-extending the result of SSE pmovmskb
from QImode to SImode.  Only an SSE register source operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_pmovmskb): Changed to
define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/mmx.md | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 4cf008e99c7..d9ff70884bd 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1799,14 +1799,30 @@
   [(set_attr "type" "mmxshft")
(set_attr "mode" "DI")])
 
-(define_insn "mmx_pmovmskb"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:V8QI 1 "register_operand" "y")]
+(define_insn_and_split "mmx_pmovmskb"
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
+   (unspec:SI [(match_operand:V8QI 1 "register_operand" "y,x")]
   UNSPEC_MOVMSK))]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pmovmskb\t{%1, %0|%0, %1}"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   pmovmskb\t{%1, %0|%0, %1}
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (zero_extend:SI (match_dup 1)))]
+{
+  /* Generate SSE pmovmskb and zero-extend from QImode to SImode.  */
+  rtx op1 = lowpart_subreg (V16QImode, operands[1],
+   GET_MODE (operands[1]));
+  rtx insn = gen_sse2_pmovmskb (operands[0], op1);
+  emit_insn (insn);
+  operands[1] = lowpart_subreg (QImode, operands[0],
+   GET_MODE (operands[0]));
+}
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,ssemov")
+   (set_attr "mode" "DI,TI")])
 
 (define_expand "mmx_maskmovq"
   [(set (match_operand:V8QI 0 "memory_operand")
-- 
2.20.1



Re: [PATCH 08/40] i386: Emulate MMX ashr3/3 with SSE

2019-02-11 Thread Uros Bizjak
On Mon, Feb 11, 2019 at 11:55 PM H.J. Lu  wrote:
>
> Emulate MMX ashr3/3 with SSE.  Only SSE register
> source operand is allowed.
>
> PR target/89021
> * config/i386/mmx.md (mmx_ashr3): Changed to define_expand.
> Disallow TARGET_MMX_WITH_SSE.
> (mmx_3): Likewise.
> (ashr3): New.
> (*ashr3): Likewise.
> (3): Likewise.
> (*3): Likewise.
> ---
>  gcc/config/i386/mmx.md | 68 --
>  1 file changed, 52 insertions(+), 16 deletions(-)
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index 0e44b3ce9b8..1b4f67be902 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -983,33 +983,69 @@
>[(set_attr "type" "mmxadd")
> (set_attr "mode" "DI")])
>
> -(define_insn "mmx_ashr3"
> -  [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
> +(define_expand "mmx_ashr3"
> +  [(set (match_operand:MMXMODE24 0 "register_operand")
>  (ashiftrt:MMXMODE24
> - (match_operand:MMXMODE24 1 "register_operand" "0")
> - (match_operand:DI 2 "nonmemory_operand" "yN")))]
> -  "TARGET_MMX"
> -  "psra\t{%2, %0|%0, %2}"
> -  [(set_attr "type" "mmxshft")
> + (match_operand:MMXMODE24 1 "register_operand")
> + (match_operand:DI 2 "nonmemory_operand")))]
> +  "TARGET_MMX && !TARGET_MMX_WITH_SSE")

Are you sure this is the correct condition? This pattern is called
from a builtin, which should be enabled for TARGET_MMX *or*
TARGET_MMX_WITH_SSE.

Please review other builtins enable conditions for this issue.

Uros.

> +(define_expand "ashr3"
> +  [(set (match_operand:MMXMODE24 0 "register_operand")
> +(ashiftrt:MMXMODE24
> + (match_operand:MMXMODE24 1 "register_operand")
> + (match_operand:DI 2 "nonmemory_operand")))]
> +  "TARGET_MMX_WITH_SSE")
> +
> +(define_insn "*ashr3"
> +  [(set (match_operand:MMXMODE24 0 "register_operand" "=y,x,Yv")
> +(ashiftrt:MMXMODE24
> + (match_operand:MMXMODE24 1 "register_operand" "0,0,Yv")
> + (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
> +  "TARGET_MMX || TARGET_MMX_WITH_SSE"
> +  "@
> +   psra\t{%2, %0|%0, %2}
> +   psra\t{%2, %0|%0, %2}
> +   vpsra\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> +   (set_attr "type" "mmxshft,sseishft,sseishft")
> (set (attr "length_immediate")
>   (if_then_else (match_operand 2 "const_int_operand")
> (const_string "1")
> (const_string "0")))
> -   (set_attr "mode" "DI")])
> +   (set_attr "mode" "DI,TI,TI")])
>
> -(define_insn "mmx_3"
> -  [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
> +(define_expand "mmx_3"
> +  [(set (match_operand:MMXMODE248 0 "register_operand")
>  (any_lshift:MMXMODE248
> - (match_operand:MMXMODE248 1 "register_operand" "0")
> - (match_operand:DI 2 "nonmemory_operand" "yN")))]
> -  "TARGET_MMX"
> -  "p\t{%2, %0|%0, %2}"
> -  [(set_attr "type" "mmxshft")
> + (match_operand:MMXMODE248 1 "register_operand")
> + (match_operand:DI 2 "nonmemory_operand")))]
> +  "TARGET_MMX && !TARGET_MMX_WITH_SSE")
> +
> +(define_expand "3"
> +  [(set (match_operand:MMXMODE248 0 "register_operand")
> +(any_lshift:MMXMODE248
> + (match_operand:MMXMODE248 1 "register_operand")
> + (match_operand:DI 2 "nonmemory_operand")))]
> +  "TARGET_MMX_WITH_SSE")
> +
> +(define_insn "*3"
> +  [(set (match_operand:MMXMODE248 0 "register_operand" "=y,x,Yv")
> +(any_lshift:MMXMODE248
> + (match_operand:MMXMODE248 1 "register_operand" "0,0,Yv")
> + (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
> +  "TARGET_MMX || TARGET_MMX_WITH_SSE"
> +  "@
> +   p\t{%2, %0|%0, %2}
> +   p\t{%2, %0|%0, %2}
> +   vp\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> +   (set_attr "type" "mmxshft,sseishft,sseishft")
> (set (attr "length_immediate")
>   (if_then_else (match_operand 2 "const_int_operand")
> (const_string "1")
> (const_string "0")))
> -   (set_attr "mode" "DI")])
> +   (set_attr "mode" "DI,TI,TI")])
>
>  ;
>  ;;
> --
> 2.20.1
>


Re: C++ PATCH for c++/89217 - ICE with list-initialization in range-based for loop

2019-02-11 Thread Marek Polacek
On Mon, Feb 11, 2019 at 01:43:36PM -0500, Jason Merrill wrote:
> On 2/7/19 6:02 PM, Marek Polacek wrote:
> > Since r268321 we can call digest_init even in a template, when the compound
> > literal isn't instantiation-dependent.
> 
> Right.  And since digest_init modifies the CONSTRUCTOR in place, that means
> the template trees are digested rather than the original parse trees that we
> try to use.  If we're going to use digest_init, we should probably save
> another CONSTRUCTOR with the original trees.

I tried unsharing the constructor and even its contents but only then did I
realize that this cannot work.  It's not digest_init that adds the problematic
INDIRECT_REF via convert_from_reference, it's instantiate_pending_templates
-> tsubst_expr -> ... -> finish_non_static_data_member.

So the problem isn't sharing the contents of the CONSTRUCTOR, but rather what
finish_non_static_data_member does with the 

  {.r=(struct R &) (struct R *) ((struct S *) this)->r}

expression.  The same problem would appear even before r268321 changes if we
called tsubst_* twice on the CONSTRUCTOR above.

Do you still think digest_init and/or finish_compound_literal need tweaking?

Marek


[PATCH 17/40] i386: Emulate MMX mmx_pinsrw with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mmx_pinsrw with SSE.  Only an SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_pinsrw): Also check TARGET_MMX and
TARGET_MMX_WITH_SSE.
(*mmx_pinsrw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b1d27506131..836adf3e533 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1321,32 +1321,45 @@
 (match_operand:SI 2 "nonimmediate_operand"))
  (match_operand:V4HI 1 "register_operand")
   (match_operand:SI 3 "const_0_to_3_operand")))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
 {
   operands[2] = gen_lowpart (HImode, operands[2]);
   operands[3] = GEN_INT (1 << INTVAL (operands[3]));
 })
 
 (define_insn "*mmx_pinsrw"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
 (vec_merge:V4HI
   (vec_duplicate:V4HI
-(match_operand:HI 2 "nonimmediate_operand" "rm"))
- (match_operand:V4HI 1 "register_operand" "0")
+(match_operand:HI 2 "nonimmediate_operand" "rm,rm,rm"))
+ (match_operand:V4HI 1 "register_operand" "0,0,Yv")
   (match_operand:SI 3 "const_int_operand")))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ((unsigned) exact_log2 (INTVAL (operands[3]))
< GET_MODE_NUNITS (V4HImode))"
 {
   operands[3] = GEN_INT (exact_log2 (INTVAL (operands[3])));
-  if (MEM_P (operands[2]))
-return "pinsrw\t{%3, %2, %0|%0, %2, %3}";
+  if (TARGET_MMX_WITH_SSE && TARGET_AVX)
+{
+  if (MEM_P (operands[2]))
+   return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  else
+   return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+}
   else
-return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}";
+{
+  if (MEM_P (operands[2]))
+   return "pinsrw\t{%3, %2, %0|%0, %2, %3}";
+  else
+   return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}";
+}
 }
-  [(set_attr "type" "mmxcvt")
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcvt,sselog,sselog")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_pextrw"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-- 
2.20.1



[C++ PATCH] PR c++/89241 - ICE with __func__ in lambda in template.

2019-02-11 Thread Jason Merrill
When we're instantiating a generic lambda, its enclosing context will
have already been instantiated, so we need to look for that as well.

Tested x86_64-pc-linux-gnu, applying to trunk.

* pt.c (enclosing_instantiation_of): Also check
instantiated_lambda_fn_p for the template context.
---
 gcc/cp/pt.c   |  3 ++-
 gcc/testsuite/g++.dg/cpp1y/lambda-generic-func1.C | 12 
 gcc/cp/ChangeLog  |  6 ++
 3 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-generic-func1.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index b8fbf4046f0..6304c99c5b1 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13353,7 +13353,8 @@ enclosing_instantiation_of (tree otctx)
   tree fn = current_function_decl;
   int lambda_count = 0;
 
-  for (; tctx && lambda_fn_in_template_p (tctx);
+  for (; tctx && (lambda_fn_in_template_p (tctx)
+ || instantiated_lambda_fn_p (tctx));
tctx = decl_function_context (tctx))
 ++lambda_count;
   for (; fn; fn = decl_function_context (fn))
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-func1.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-func1.C
new file mode 100644
index 000..4a6385529f8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-func1.C
@@ -0,0 +1,12 @@
+// PR c++/89241
+// { dg-do compile { target c++14 } }
+
+template  void m(al p) {
+  p(1);
+}
+
+template  void f() {
+  m([](auto) { __func__; });
+}
+
+template void f();
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 76f22f5e363..24939e9fc9e 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,9 @@
+2019-02-11  Jason Merrill  
+
+   PR c++/89241 - ICE with __func__ in lambda in template.
+   * pt.c (enclosing_instantiation_of): Also check
+   instantiated_lambda_fn_p for the template context.
+
 2019-02-11  Martin Sebor  
 
PR c++/87996

base-commit: 5f2991399dfbe89e3a6a4b241f489631e806a272
-- 
2.20.1



[PATCH 20/40] i386: Emulate MMX mmx_umulv4hi3_highpart with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mmx_umulv4hi3_highpart with SSE.  Only an SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_umulv4hi3_highpart): Also check
TARGET_MMX and TARGET_MMX_WITH_SSE.
(*mmx_umulv4hi3_highpart): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d9ff70884bd..3c432f09e31 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -810,24 +810,30 @@
  (zero_extend:V4SI
(match_operand:V4HI 2 "nonimmediate_operand")))
(const_int 16]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_umulv4hi3_highpart"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(mult:V4SI
  (zero_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
  (zero_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
  (const_int 16]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmulhuw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "@
+   pmulhuw\t{%2, %0|%0, %2}
+   pmulhuw\t{%2, %0|%0, %2}
+   vpmulhuw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_pmaddwd"
   [(set (match_operand:V2SI 0 "register_operand")
-- 
2.20.1



[PATCH 15/40] i386: Emulate MMX sse_cvtpi2ps with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX sse_cvtpi2ps with SSE2 cvtdq2ps, preserving the upper 64 bits of
the destination XMM register.  Only an SSE register source operand is allowed.

PR target/89021
* config/i386/mmx.md (UNSPEC_CVTPI2PS): New.
(sse_cvtpi2ps): Renamed to ...
(*mmx_cvtpi2ps): This.  Disabled for TARGET_MMX_WITH_SSE.
(sse_cvtpi2ps): New.
(mmx_cvtpi2ps_sse): Likewise.
---
 gcc/config/i386/sse.md | 83 +-
 1 file changed, 81 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 80bb4cb935d..75e711624ce 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18,6 +18,9 @@
 ;; .
 
 (define_c_enum "unspec" [
+  ;; MMX with SSE
+  UNSPEC_CVTPI2PS
+
   ;; SSE
   UNSPEC_MOVNT
 
@@ -4655,14 +4658,90 @@
 ;;
 ;
 
-(define_insn "sse_cvtpi2ps"
+(define_expand "sse_cvtpi2ps"
+  [(set (match_operand:V4SF 0 "register_operand")
+   (vec_merge:V4SF
+ (vec_duplicate:V4SF
+   (float:V2SF (match_operand:V2SI 2 "nonimmediate_operand")))
+ (match_operand:V4SF 1 "register_operand")
+ (const_int 3)))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+{
+  if (TARGET_MMX_WITH_SSE)
+{
+  rtx op2 = force_reg (V2SImode, operands[2]);
+  rtx op3 = gen_reg_rtx (V4SFmode);
+  rtx op4 = gen_reg_rtx (V4SFmode);
+  rtx insn = gen_mmx_cvtpi2ps_sse (operands[0], operands[1], op2,
+  op3, op4);
+  emit_insn (insn);
+  DONE;
+}
+})
+
+(define_insn_and_split "mmx_cvtpi2ps_sse"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,Yv")
+   (unspec:V4SF [(match_operand:V2SI 2 "register_operand" "x,Yv")
+ (match_operand:V4SF 1 "register_operand" "0,Yv")]
+UNSPEC_CVTPI2PS))
+   (set (match_operand:V4SF 3 "register_operand" "=x,Yv")
+   (unspec:V4SF [(match_operand:V4SF 4 "register_operand" "3,3")]
+UNSPEC_CVTPI2PS))]
+  "TARGET_MMX_WITH_SSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op2 = gen_rtx_REG (V4SImode, REGNO (operands[2]));
+  /* Generate SSE2 cvtdq2ps.  */
+  rtx insn = gen_floatv4siv4sf2 (operands[3], op2);
+  emit_insn (insn);
+
+  /* Merge operands[3] with operands[0].  */
+  rtx mask, op1;
+  if (TARGET_AVX)
+{
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (0), GEN_INT (1),
+ GEN_INT (6), GEN_INT (7)));
+  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[3], operands[1]);
+  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
+  insn = gen_rtx_SET (operands[0], op2);
+}
+  else
+{
+  /* NB: SSE can only concatenate OP0 and OP3 to OP0.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (4), GEN_INT (5)));
+  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[0], operands[3]);
+  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
+  insn = gen_rtx_SET (operands[0], op2);
+  emit_insn (insn);
+
+  /* Swap bits 0:63 with bits 64:127.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (0), GEN_INT (1)));
+  rtx dest = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  insn = gen_rtx_SET (dest, op1);
+}
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "*mmx_cvtpi2ps"
   [(set (match_operand:V4SF 0 "register_operand" "=x")
(vec_merge:V4SF
  (vec_duplicate:V4SF
(float:V2SF (match_operand:V2SI 2 "nonimmediate_operand" "ym")))
  (match_operand:V4SF 1 "register_operand" "0")
  (const_int 3)))]
-  "TARGET_SSE"
+  "TARGET_SSE && !TARGET_MMX_WITH_SSE"
   "cvtpi2ps\t{%2, %0|%0, %2}"
   [(set_attr "type" "ssecvt")
(set_attr "mode" "V4SF")])
-- 
2.20.1



[PATCH 37/40] i386: Allow MMX intrinsic emulation with SSE

2019-02-11 Thread H.J. Lu
Allow MMX intrinsic emulation with SSE/SSE2/SSSE3.  Don't enable MMX ISA
by default with TARGET_MMX_WITH_SSE.

For pr82483-1.c and pr82483-2.c, "-mssse3 -mno-mmx" compiles in 64-bit
mode since MMX intrinsics can be emulated with SSE.
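
As a concrete illustration of what the re-enabled __m64 builtins must compute,
here is a plain-C model of one of them, __builtin_ia32_paddb (wrap-around
byte addition across a 64-bit vector); the model function name is mine:

```c
#include <stdint.h>
#include <assert.h>

/* Model of __builtin_ia32_paddb: eight independent 8-bit additions
   with wrap-around, packed in a 64-bit value.  Under SSE emulation the
   same operation is performed by paddb on the low half of an XMM
   register instead of an MMX register.  */
static uint64_t
paddb_model (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      uint8_t ai = (uint8_t) (a >> (8 * i));
      uint8_t bi = (uint8_t) (b >> (8 * i));
      r |= (uint64_t) (uint8_t) (ai + bi) << (8 * i);
    }
  return r;
}
```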

gcc/

PR target/89021
* config/i386/i386-builtin.def: Enable MMX intrinsics with
SSE/SSE2/SSSE3.
* config/i386/i386.c (ix86_option_override_internal): Don't
enable MMX ISA with TARGET_MMX_WITH_SSE by default.
(bdesc_tm): Enable MMX intrinsics with SSE/SSE2/SSSE3.
(ix86_init_mmx_sse_builtins): Likewise.
(ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX
intrinsics with TARGET_MMX_WITH_SSE.
* config/i386/mmintrin.h: Don't require MMX in 64-bit mode.

gcc/testsuite/

PR target/89021
* gcc.target/i386/pr82483-1.c: Error only on ia32.
* gcc.target/i386/pr82483-2.c: Likewise.
---
 gcc/config/i386/i386-builtin.def  | 126 +++---
 gcc/config/i386/i386.c|  62 +++
 gcc/config/i386/mmintrin.h|  10 +-
 gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
 5 files changed, 118 insertions(+), 84 deletions(-)

diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 88005f4687f..10a9d631f29 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -100,7 +100,7 @@ BDESC (0, 0, CODE_FOR_fnstsw, "__builtin_ia32_fnstsw", 
IX86_BUILTIN_FNSTSW, UNKN
 BDESC (0, 0, CODE_FOR_fnclex, "__builtin_ia32_fnclex", IX86_BUILTIN_FNCLEX, 
UNKNOWN, (int) VOID_FTYPE_VOID)
 
 /* MMX */
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_emms, "__builtin_ia32_emms", 
IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
+BDESC (OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_mmx_emms, 
"__builtin_ia32_emms", IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
 
 /* 3DNow! */
 BDESC (OPTION_MASK_ISA_3DNOW, 0, CODE_FOR_mmx_femms, "__builtin_ia32_femms", 
IX86_BUILTIN_FEMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
@@ -442,68 +442,68 @@ BDESC (0, 0, CODE_FOR_rotrqi3, "__builtin_ia32_rorqi", 
IX86_BUILTIN_RORQI, UNKNO
 BDESC (0, 0, CODE_FOR_rotrhi3, "__builtin_ia32_rorhi", IX86_BUILTIN_RORHI, 
UNKNOWN, (int) UINT16_FTYPE_UINT16_INT)
 
 /* MMX */
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv8qi3, "__builtin_ia32_paddb", 
IX86_BUILTIN_PADDB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv4hi3, "__builtin_ia32_paddw", 
IX86_BUILTIN_PADDW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv2si3, "__builtin_ia32_paddd", 
IX86_BUILTIN_PADDD, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv8qi3, "__builtin_ia32_psubb", 
IX86_BUILTIN_PSUBB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv4hi3, "__builtin_ia32_psubw", 
IX86_BUILTIN_PSUBW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv2si3, "__builtin_ia32_psubd", 
IX86_BUILTIN_PSUBD, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv8qi3, 
"__builtin_ia32_paddsb", IX86_BUILTIN_PADDSB, UNKNOWN, (int) 
V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv4hi3, 
"__builtin_ia32_paddsw", IX86_BUILTIN_PADDSW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_sssubv8qi3, 
"__builtin_ia32_psubsb", IX86_BUILTIN_PSUBSB, UNKNOWN, (int) 
V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_sssubv4hi3, 
"__builtin_ia32_psubsw", IX86_BUILTIN_PSUBSW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_usaddv8qi3, 
"__builtin_ia32_paddusb", IX86_BUILTIN_PADDUSB, UNKNOWN, (int) 
V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_usaddv4hi3, 
"__builtin_ia32_paddusw", IX86_BUILTIN_PADDUSW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ussubv8qi3, 
"__builtin_ia32_psubusb", IX86_BUILTIN_PSUBUSB, UNKNOWN, (int) 
V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ussubv4hi3, 
"__builtin_ia32_psubusw", IX86_BUILTIN_PSUBUSW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_mulv4hi3, "__builtin_ia32_pmullw", 
IX86_BUILTIN_PMULLW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_smulv4hi3_highpart, 
"__builtin_ia32_pmulhw", IX86_BUILTIN_PMULHW, UNKNOWN, (int) 
V4HI_FTYPE_V4HI_V4HI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_andv2si3, "__builtin_ia32_pand", 
IX86_BUILTIN_PAND, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_andnotv2si3, 
"__builtin_ia32_pandn", IX86_BUILTIN_PANDN, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_iorv2si3, "__builtin_ia32_por", 
IX86_BUILTIN_POR, UNKNOWN, (int) 

[PATCH 39/40] i386: Also enable SSSE3 __m64 tests in 64-bit mode

2019-02-11 Thread H.J. Lu
Since we now emulate MMX intrinsics with SSE in 64-bit mode, we can
enable SSSE3 __m64 tests even when AVX is enabled.

PR target/89021
* gcc.target/i386/ssse3-pabsb.c: Also enable __m64 check in
64-bit mode.
* gcc.target/i386/ssse3-pabsd.c: Likewise.
* gcc.target/i386/ssse3-pabsw.c: Likewise.
* gcc.target/i386/ssse3-palignr.c: Likewise.
* gcc.target/i386/ssse3-phaddd.c: Likewise.
* gcc.target/i386/ssse3-phaddsw.c: Likewise.
* gcc.target/i386/ssse3-phaddw.c: Likewise.
* gcc.target/i386/ssse3-phsubd.c: Likewise.
* gcc.target/i386/ssse3-phsubsw.c: Likewise.
* gcc.target/i386/ssse3-phsubw.c: Likewise.
* gcc.target/i386/ssse3-pmaddubsw.c: Likewise.
* gcc.target/i386/ssse3-pmulhrsw.c: Likewise.
* gcc.target/i386/ssse3-pshufb.c: Likewise.
* gcc.target/i386/ssse3-psignb.c: Likewise.
* gcc.target/i386/ssse3-psignd.c: Likewise.
* gcc.target/i386/ssse3-psignw.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/ssse3-pabsb.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pabsd.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pabsw.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-palignr.c   | 6 +++---
 gcc/testsuite/gcc.target/i386/ssse3-phaddd.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phaddsw.c   | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phaddw.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phsubd.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phsubsw.c   | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phsubw.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pmaddubsw.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pmulhrsw.c  | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pshufb.c| 6 +++---
 gcc/testsuite/gcc.target/i386/ssse3-psignb.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-psignd.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-psignw.c| 4 ++--
 16 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c 
b/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c
index 7caa1b6c3a6..eef4ccae222 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c
@@ -15,7 +15,7 @@
 #include "ssse3-vals.h"
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 /* Test the 64-bit form */
 static void
 ssse3_test_pabsb (int *i1, int *r)
@@ -63,7 +63,7 @@ TEST (void)
   /* Manually compute the result */
   compute_correct_result([i + 0], ck);
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
   /* Run the 64-bit tests */
   ssse3_test_pabsb ([i + 0], [0]);
   ssse3_test_pabsb ([i + 2], [2]);
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c 
b/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c
index 3a73cf01170..60043bad4a4 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c
@@ -16,7 +16,7 @@
 
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 /* Test the 64-bit form */
 static void
 ssse3_test_pabsd (int *i1, int *r)
@@ -62,7 +62,7 @@ TEST (void)
   /* Manually compute the result */
   compute_correct_result([i + 0], ck);
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
   /* Run the 64-bit tests */
   ssse3_test_pabsd ([i + 0], [0]);
   ssse3_test_pabsd ([i + 2], [2]);
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c 
b/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c
index 67e4721b8e6..dd0caa9783f 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c
@@ -16,7 +16,7 @@
 
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 /* Test the 64-bit form */
 static void
 ssse3_test_pabsw (int *i1, int *r)
@@ -64,7 +64,7 @@ TEST (void)
   /* Manually compute the result */
   compute_correct_result ([i + 0], ck);
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
   /* Run the 64-bit tests */
   ssse3_test_pabsw ([i + 0], [0]);
   ssse3_test_pabsw ([i + 2], [2]);
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-palignr.c 
b/gcc/testsuite/gcc.target/i386/ssse3-palignr.c
index dbee9bee4aa..f266f7805b8 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-palignr.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-palignr.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 /* Test the 64-bit form */
 static void
 ssse3_test_palignr (int *i1, int *i2, unsigned int imm, int *r)
@@ -214,7 +214,7 @@ compute_correct_result_128 (int *i1, int *i2, unsigned int 
imm, int *r)
   bout[i] = buf[imm + i];
 }
 
-#ifndef __AVX__
+#if !defined __AVX__ || defined __x86_64__
 static void
 compute_correct_result_64 (int *i1, int *i2, unsigned int imm, int *r)
 {
@@ -256,7 +256,7 @@ TEST (void)
   for (i = 0; i < 256; i += 8)
 for (imm = 0; imm < 100; imm++)
   {

[PATCH 40/40] i386: Enable 8-byte vectorizer for TARGET_MMX_WITH_SSE

2019-02-11 Thread H.J. Lu
In 64-bit mode, we support the 8-byte vectorizer with SSE.  Also xfail x86-64
targets for gcc.dg/tree-ssa/pr84512.c.

gcc/

PR target/89028
* config/i386/i386.c (ix86_autovectorize_vector_sizes): Enable
8-byte vectorizer for TARGET_MMX_WITH_SSE.

gcc/testsuite/

PR target/89028
* gcc.dg/tree-ssa/pr84512.c: Also xfail x86-64 targets.
* gcc.target/i386/pr89028-1.c: New test.
---
 gcc/config/i386/i386.c|  2 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr89028-1.c | 10 ++
 3 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-1.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8b822b6d34f..d088fd19673 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -50219,6 +50219,8 @@ ix86_autovectorize_vector_sizes (vector_sizes *sizes)
   sizes->safe_push (32);
   sizes->safe_push (16);
 }
+  if (TARGET_MMX_WITH_SSE)
+sizes->safe_push (8);
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
index 3975757d844..8f8529ba8cf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
@@ -13,4 +13,4 @@ int foo()
 }
 
 /* Listed targets xfailed due to PR84958.  */
-/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail { { 
alpha*-*-* amdgcn*-*-* nvptx*-*-* } || { sparc*-*-* && lp64 } } } } } */
+/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail { { { 
alpha*-*-* amdgcn*-*-* nvptx*-*-* } || { sparc*-*-* && lp64 } } || { { i?86-*-* 
x86_64-*-* } && { ! ia32 } } } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89028-1.c 
b/gcc/testsuite/gcc.target/i386/pr89028-1.c
new file mode 100644
index 000..d2ebb7f844d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89028-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx2 -O3" } */
+/* { dg-final { scan-assembler "vpaddb\[ \\t\]+\[^\n\]*%xmm\[0-9\]" } } */
+
+void
+foo (char* restrict r, char* restrict a)
+{
+  for (int i = 0; i < 8; i++)
+r[i] += a[i];
+}
-- 
2.20.1



[PATCH 36/40] i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE

2019-02-11 Thread H.J. Lu
PR target/89021
* config/i386/i386.c (ix86_expand_vector_init_duplicate): Set
mmx_ok to true if TARGET_MMX_WITH_SSE is true.
(ix86_expand_vector_init_one_nonzero): Likewise.
(ix86_expand_vector_init_one_var): Likewise.
(ix86_expand_vector_init_general): Likewise.
(ix86_expand_vector_init): Likewise.
(ix86_expand_vector_set): Likewise.
(ix86_expand_vector_extract): Likewise.
* config/i386/mmx.md (*vec_dupv2sf): Changed to
define_insn_and_split to support SSE emulation.
(vec_setv2sf): Also allow TARGET_MMX_WITH_SSE.
(vec_extractv2sf_1 splitter): Likewise.
(vec_extractv2sfsf): Likewise.
(vec_setv2si): Likewise.
(vec_extractv2si_1 splitter): Likewise.
(vec_extractv2sisi): Likewise.
(vec_setv4hi): Likewise.
(vec_extractv4hihi): Likewise.
(vec_setv8qi): Likewise.
(vec_extractv8qiqi): Likewise.
(*vec_extractv2sf_0): Don't allow TARGET_MMX_WITH_SSE.
(*vec_extractv2sf_1): Likewise.
(*vec_extractv2si_0): Likewise.
(*vec_extractv2si_1): Likewise.
(*vec_extractv2sf_0_sse): New.
(*vec_extractv2sf_1_sse): Likewise.
(*vec_extractv2si_0_sse): Likewise.
(*vec_extractv2si_1_sse): Likewise.
---
 gcc/config/i386/i386.c |   8 +++
 gcc/config/i386/mmx.md | 129 +
 2 files changed, 113 insertions(+), 24 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7d65192c1cd..4e776b8c3ea 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42365,6 +42365,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, 
machine_mode mode,
 {
   bool ok;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SImode:
@@ -42524,6 +42525,7 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, 
machine_mode mode,
   bool use_vector_set = false;
   rtx (*gen_vec_set_0) (rtx, rtx, rtx) = NULL;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2DImode:
@@ -42717,6 +42719,7 @@ ix86_expand_vector_init_one_var (bool mmx_ok, 
machine_mode mode,
   XVECEXP (const_vec, 0, one_var) = CONST0_RTX (GET_MODE_INNER (mode));
   const_vec = gen_rtx_CONST_VECTOR (mode, XVEC (const_vec, 0));
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2DFmode:
@@ -43102,6 +43105,7 @@ ix86_expand_vector_init_general (bool mmx_ok, 
machine_mode mode,
   machine_mode quarter_mode = VOIDmode;
   int n, i;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SFmode:
@@ -43301,6 +43305,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx 
vals)
   int i;
   rtx x;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
+
   /* Handle first initialization from vector elts.  */
   if (n_elts != XVECLEN (vals, 0))
 {
@@ -43400,6 +43406,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx 
val, int elt)
   machine_mode mmode = VOIDmode;
   rtx (*gen_blendm) (rtx, rtx, rtx, rtx);
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SFmode:
@@ -43755,6 +43762,7 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, 
rtx vec, int elt)
   bool use_vec_extr = false;
   rtx tmp;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SImode:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index c8bd544dc9e..4e8b6e54b4c 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -591,14 +591,23 @@
(set_attr "prefix_extra" "1")
(set_attr "mode" "V2SF")])
 
-(define_insn "*vec_dupv2sf"
-  [(set (match_operand:V2SF 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2sf"
+  [(set (match_operand:V2SF 0 "register_operand" "=y,x,Yv")
(vec_duplicate:V2SF
- (match_operand:SF 1 "register_operand" "0")))]
-  "TARGET_MMX"
-  "punpckldq\t%0, %0"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+ (match_operand:SF 1 "register_operand" "0,0,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   punpckldq\t%0, %0
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (vec_duplicate:V4SF (match_dup 1)))]
+  "operands[0] = lowpart_subreg (V4SFmode, operands[0],
+GET_MODE (operands[0]));"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "*mmx_concatv2sf"
   [(set (match_operand:V2SF 0 "register_operand" "=y,y")
@@ -616,7 +625,7 @@
   [(match_operand:V2SF 0 "register_operand")
(match_operand:SF 1 "register_operand")
(match_operand 2 "const_int_operand")]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_set (false, operands[0], operands[1],
  INTVAL (operands[2]));
@@ -630,7 +639,20 @@
(vec_select:SF
  (match_operand:V2SF 1 "nonimmediate_operand" " xm,x,ym,y,m,m")
  

[PATCH 12/40] i386: Emulate MMX vec_dupv2si with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX vec_dupv2si with SSE.  Add the "Yw" constraint to allow
broadcast from integer register for AVX512BW with TARGET_AVX512VL.
Only SSE register source operand is allowed.

PR target/89021
* config/i386/constraints.md (Yw): New constraint.
* config/i386/mmx.md (*vec_dupv2si): Changed to
define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
support SSE emulation.
---
 gcc/config/i386/constraints.md |  6 ++
 gcc/config/i386/mmx.md | 24 +---
 2 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 16075b4acf3..c546b20d9dc 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -110,6 +110,8 @@
 ;;  v  any EVEX encodable SSE register for AVX512VL target,
 ;; otherwise any SSE register
 ;;  h  EVEX encodable SSE register with number factor of four
+;;  w  any EVEX encodable SSE register for AVX512BW with TARGET_AVX512VL
+;; target.
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -146,6 +148,10 @@
  "TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
  "@internal For AVX512VL, any EVEX encodable SSE register 
(@code{%xmm0-%xmm31}), otherwise any SSE register.")
 
+(define_register_constraint "Yw"
+ "TARGET_AVX512BW && TARGET_AVX512VL ? ALL_SSE_REGS : NO_REGS"
+ "@internal Any EVEX encodable SSE register (@code{%xmm0-%xmm31}) for AVX512BW 
with TARGET_AVX512VL target.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  f  FLAGS_REG
 ;;  g  GOT memory operand.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 7a5e41defe4..3b6f2c1c87b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1420,14 +1420,24 @@
(set_attr "length_immediate" "1")
(set_attr "mode" "DI")])
 
-(define_insn "*vec_dupv2si"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2si"
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv,Yw")
(vec_duplicate:V2SI
- (match_operand:SI 1 "register_operand" "0")))]
-  "TARGET_MMX"
-  "punpckldq\t%0, %0"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+ (match_operand:SI 1 "register_operand" "0,0,Yv,r")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   punpckldq\t%0, %0
+   #
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (vec_duplicate:V4SI (match_dup 1)))]
+  "operands[0] = lowpart_subreg (V4SImode, operands[0],
+GET_MODE (operands[0]));"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx,x64_avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI,TI")])
 
 (define_insn "*mmx_concatv2si"
   [(set (match_operand:V2SI 0 "register_operand" "=y,y")
-- 
2.20.1



[PATCH 35/40] i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE

2019-02-11 Thread H.J. Lu
PR target/89021
* config/i386/mmx.md (MMXMODE:mov<mode>): Also allow
TARGET_MMX_WITH_SSE.
(MMXMODE:*mov<mode>_internal): Likewise.
(MMXMODE:movmisalign<mode>): Likewise.
---
 gcc/config/i386/mmx.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 5159c68925b..c8bd544dc9e 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -70,7 +70,7 @@
(define_expand "mov<mode>"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
  ix86_expand_vector_move (<MODE>mode, operands);
   DONE;
@@ -81,7 +81,7 @@
 "=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
(match_operand:MMXMODE 1 "nonimm_or_0_operand"
 "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
-  "TARGET_MMX
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -232,7 +232,7 @@
(define_expand "movmisalign<mode>"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
 {
   ix86_expand_vector_move (mode, operands);
   DONE;
-- 
2.20.1



[PATCH 30/40] i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX ssse3_pmulhrswv4hi3 with SSE.  Only SSE register source
operand is allowed.
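
The pattern's RTL encodes, per 16-bit lane, the rounding multiply
((a * b >> 14) + 1) >> 1.  A plain-C model of one lane (function name mine):

```c
#include <stdint.h>
#include <assert.h>

/* One lane of pmulhrsw: widen to 32 bits, multiply, shift right 14,
   round by adding 1 and shifting right once more, truncate to 16 bits.
   This matches the (truncate (lshiftrt (plus (lshiftrt (mult ...)
   (const_int 14)) 1) (const_int 1))) RTL of the pattern.  */
static int16_t
pmulhrsw_model (int16_t a, int16_t b)
{
  int32_t t = ((int32_t) a * (int32_t) b) >> 14;
  return (int16_t) ((t + 1) >> 1);
}
```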

PR target/89021
* config/i386/sse.md (*ssse3_pmulhrswv4hi3): Add SSE emulation.
---
 gcc/config/i386/sse.md | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 70ac2259107..dc35fcfd34a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15774,25 +15774,31 @@
(set_attr "mode" "")])
 
 (define_insn "*ssse3_pmulhrswv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(plus:V4SI
  (lshiftrt:V4SI
(mult:V4SI
  (sign_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
  (sign_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
(const_int 14))
  (match_operand:V4HI 3 "const1_operand"))
(const_int 1]
-  "TARGET_SSSE3 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "pmulhrsw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseimul")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && TARGET_SSSE3
+   && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
+  "@
+   pmulhrsw\t{%2, %0|%0, %2}
+   pmulhrsw\t{%2, %0|%0, %2}
+   vpmulhrsw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseimul")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
(define_insn "<ssse3_avx2>_pshufb<mode>3"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
-- 
2.20.1



[PATCH 34/40] i386: Emulate MMX abs<mode>2 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX abs<mode>2 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/sse.md (abs<mode>2): Add SSE emulation.
---
 gcc/config/i386/sse.md | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a1d43204344..e444e599734 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16091,16 +16091,19 @@
 })
 
(define_insn "abs<mode>2"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yv")
(abs:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "ym")))]
-  "TARGET_SSSE3"
-  "pabs<mmxvecsize>\t{%1, %0|%0, %1}";
-  [(set_attr "type" "sselog1")
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "ym,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   pabs<mmxvecsize>\t{%1, %0|%0, %1}
+   %vpabs<mmxvecsize>\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "sselog1")
(set_attr "prefix_rep" "0")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 33/40] i386: Emulate MMX ssse3_palignrdi with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX version of palignrq with SSE version by concatenating 2
64-bit MMX operands into a single 128-bit SSE operand, followed by
SSE psrldq.  Only SSE register source operand is allowed.
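
The concatenate-then-shift sequence described above computes, for the 64-bit
case, the following (model function name mine; IMM8 is in bytes as in the
palignr instruction, while the pattern carries it pre-multiplied by 8):

```c
#include <stdint.h>
#include <assert.h>

/* Model of MMX palignr on 64-bit operands: form the 128-bit value
   HI:LO, shift it right by IMM8 bytes, keep the low 64 bits.  This is
   what the vec_concatv2di + V1TI lshiftrt split performs in XMM
   registers.  */
static uint64_t
palignrq_model (uint64_t hi, uint64_t lo, unsigned imm8)
{
  if (imm8 >= 16)
    return 0;
  unsigned bits = imm8 * 8;
  if (bits == 0)
    return lo;
  if (bits < 64)
    return (lo >> bits) | (hi << (64 - bits));
  return hi >> (bits - 64);
}
```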

PR target/89021
* config/i386/sse.md (ssse3_palignrdi): Changed to
define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 54 ++
 1 file changed, 44 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 15c187f7f5c..a1d43204344 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15977,23 +15977,57 @@
(set_attr "prefix" "orig,vex,evex")
(set_attr "mode" "")])
 
-(define_insn "ssse3_palignrdi"
-  [(set (match_operand:DI 0 "register_operand" "=y")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "0")
-   (match_operand:DI 2 "nonimmediate_operand" "ym")
-   (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n")]
+(define_insn_and_split "ssse3_palignrdi"
+  [(set (match_operand:DI 0 "register_operand" "=y,x,Yv")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "0,0,Yv")
+   (match_operand:DI 2 "nonimmediate_operand" "ym,x,Yv")
+   (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n,n")]
   UNSPEC_PALIGNR))]
-  "TARGET_SSSE3"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
 {
-  operands[3] = GEN_INT (INTVAL (operands[3]) / 8);
-  return "palignr\t{%3, %2, %0|%0, %2, %3}";
+  if (TARGET_MMX_WITH_SSE)
+return "#";
+  else
+{
+  operands[3] = GEN_INT (INTVAL (operands[3]) / 8);
+  return "palignr\t{%3, %2, %0|%0, %2, %3}";
+}
 }
-  [(set_attr "type" "sseishft")
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 0)
+   (lshiftrt:V1TI (match_dup 0) (match_dup 3)))]
+{
+  /* Emulate MMX palignrdi with SSE psrldq.  */
+  rtx op0 = lowpart_subreg (V2DImode, operands[0],
+   GET_MODE (operands[0]));
+  rtx insn;
+  if (TARGET_AVX)
+insn = gen_vec_concatv2di (op0, operands[2], operands[1]);
+  else
+{
+  /* NB: SSE can only concatenate OP0 and OP1 to OP0.  */
+  insn = gen_vec_concatv2di (op0, operands[1], operands[2]);
+  emit_insn (insn);
+  /* Swap bits 0:63 with bits 64:127.  */
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2),
+ GEN_INT (3),
+ GEN_INT (0),
+ GEN_INT (1)));
+  rtx op1 = gen_rtx_REG (V4SImode, REGNO (op0));
+  rtx op2 = gen_rtx_VEC_SELECT (V4SImode, op1, mask);
+  insn = gen_rtx_SET (op1, op2);
+}
+  emit_insn (insn);
+  operands[0] = lowpart_subreg (V1TImode, op0, GET_MODE (op0));
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseishft")
(set_attr "atom_unit" "sishuf")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 ;; Mode iterator to handle singularity w/ absence of V2DI and V4DI
 ;; modes for abs instruction on pre AVX-512 targets.
-- 
2.20.1



[PATCH 32/40] i386: Emulate MMX ssse3_psign<mode>3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX ssse3_psign<mode>3 with SSE.  Only SSE register source operand
is allowed.
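
psign's per-lane behavior -- keep, negate, or zero the first operand according
to the sign of the second -- can be sketched in plain C (model name mine):

```c
#include <stdint.h>
#include <assert.h>

/* One 16-bit lane of psignw: the result is A negated when B is
   negative, zero when B is zero, and A unchanged when B is positive.
   The same per-lane rule applies to psignb and psignd.  */
static int16_t
psignw_model (int16_t a, int16_t b)
{
  if (b < 0)
    return (int16_t) -a;
  if (b == 0)
    return 0;
  return a;
}
```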

PR target/89021
* config/i386/sse.md (ssse3_psign<mode>3): Add SSE emulation.
---
 gcc/config/i386/sse.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6e748d0543c..15c187f7f5c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15908,17 +15908,21 @@
(set_attr "mode" "")])
 
(define_insn "ssse3_psign<mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
(unspec:MMXMODEI
- [(match_operand:MMXMODEI 1 "register_operand" "0")
-  (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")]
+ [(match_operand:MMXMODEI 1 "register_operand" "0,0,Yv")
+  (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")]
  UNSPEC_PSIGN))]
-  "TARGET_SSSE3"
-  "psign<mmxvecsize>\t{%2, %0|%0, %2}";
-  [(set_attr "type" "sselog1")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   psign<mmxvecsize>\t{%2, %0|%0, %2}
+   psign<mmxvecsize>\t{%2, %0|%0, %2}
+   vpsign<mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
(define_insn "<ssse3_avx2>_palignr<mode>_mask"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=v")
-- 
2.20.1



[PATCH 29/40] i386: Emulate MMX ssse3_pmaddubsw with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX ssse3_pmaddubsw with SSE.  Only SSE register source operand
is allowed.
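
The RTL below pairs even/odd byte lanes; the per-pair computation -- unsigned
bytes from the first operand, signed bytes from the second, products summed
with signed saturation -- can be sketched as (model name mine):

```c
#include <stdint.h>
#include <assert.h>

/* One 16-bit result lane of pmaddubsw: multiply an unsigned byte pair
   from operand 1 with a signed byte pair from operand 2 and add the
   two products with signed 16-bit saturation (the ss_plus in the
   pattern).  */
static int16_t
pmaddubsw_model (uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
  int32_t sum = (int32_t) a0 * b0 + (int32_t) a1 * b1;
  if (sum > 32767)
    sum = 32767;
  else if (sum < -32768)
    sum = -32768;
  return (int16_t) sum;
}
```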

PR target/89021
* config/i386/sse.md (ssse3_pmaddubsw): Add SSE emulation.
---
 gcc/config/i386/sse.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 3fe41b772c2..70ac2259107 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15666,17 +15666,17 @@
(set_attr "mode" "TI")])
 
 (define_insn "ssse3_pmaddubsw"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(ss_plus:V4HI
  (mult:V4HI
(zero_extend:V4HI
  (vec_select:V4QI
-   (match_operand:V8QI 1 "register_operand" "0")
+   (match_operand:V8QI 1 "register_operand" "0,0,Yv")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)])))
(sign_extend:V4HI
  (vec_select:V4QI
-   (match_operand:V8QI 2 "nonimmediate_operand" "ym")
+   (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)]
  (mult:V4HI
@@ -15688,13 +15688,17 @@
  (vec_select:V4QI (match_dup 2)
(parallel [(const_int 1) (const_int 3)
   (const_int 5) (const_int 7)]))]
-  "TARGET_SSSE3"
-  "pmaddubsw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   pmaddubsw\t{%2, %0|%0, %2}
+   pmaddubsw\t{%2, %0|%0, %2}
+   vpmaddubsw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "simul")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_mode_iterator PMULHRSW
   [V4HI V8HI (V16HI "TARGET_AVX2")])
-- 
2.20.1



[PATCH 31/40] i386: Emulate MMX pshufb with SSE version

2019-02-11 Thread H.J. Lu
Emulate the MMX version of pshufb with the SSE version by masking out bit 3
of each shuffle control byte.  Only SSE register source operand is allowed.
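The 0xf7 mask is the key trick; a behavioral C model of why it works (illustrative only, not part of the patch): MMX pshufb uses selector bits 0:2 while SSE pshufb uses bits 0:3, so clearing bit 3 confines the 16-byte form to the low eight (valid) bytes of the zero-extended source while preserving bit 7, the "store zero" bit.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the masked emulation on the low 8 bytes.  */
static void
pshufb8_model (uint8_t dst[8], const uint8_t src[8], const uint8_t sel[8])
{
  for (int i = 0; i < 8; i++)
    {
      uint8_t s = sel[i] & 0xf7;	/* the AND with 0xf7f7f7f7 */
      dst[i] = (s & 0x80) ? 0 : src[s & 0x0f];	/* index is < 8 here */
    }
}
```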

PR target/89021
* config/i386/sse.md (ssse3_pshufbv8qi3): Renamed to ...
(ssse3_pshufbv8qi3_mmx): This.
(ssse3_pshufbv8qi3): New.
(ssse3_pshufbv8qi3_sse): Likewise.
---
 gcc/config/i386/sse.md | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 61 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index dc35fcfd34a..6e748d0543c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15819,18 +15819,77 @@
(set_attr "btver2_decode" "vector")
(set_attr "mode" "")])
 
-(define_insn "ssse3_pshufbv8qi3"
+(define_expand "ssse3_pshufbv8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+   (unspec:V8QI [(match_operand:V8QI 1 "register_operand")
+ (match_operand:V8QI 2 "nonimmediate_operand")]
+UNSPEC_PSHUFB))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+{
+  if (TARGET_MMX_WITH_SSE)
+{
+  /* Emulate MMX version of pshufb with SSE version by masking
+out the bit 3 of the shuffle control byte.  */
+  rtvec par = gen_rtvec (4, GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7));
+  rtx vec_const = gen_rtx_CONST_VECTOR (V4SImode, par);
+  vec_const = force_const_mem (V4SImode, vec_const);
+  rtx op3 = gen_reg_rtx (V4SImode);
+  rtx op4 = gen_reg_rtx (V4SImode);
+  rtx insn = gen_rtx_SET (op4, vec_const);
+  emit_insn (insn);
+  rtx op2 = force_reg (V8QImode, operands[2]);
+  insn = gen_ssse3_pshufbv8qi3_sse (operands[0], operands[1],
+   op2, op3, op4);
+  emit_insn (insn);
+  DONE;
+}
+})
+
+(define_insn "ssse3_pshufbv8qi3_mmx"
   [(set (match_operand:V8QI 0 "register_operand" "=y")
(unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0")
  (match_operand:V8QI 2 "nonimmediate_operand" "ym")]
 UNSPEC_PSHUFB))]
-  "TARGET_SSSE3"
+  "TARGET_SSSE3 && !TARGET_MMX_WITH_SSE"
   "pshufb\t{%2, %0|%0, %2}";
   [(set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
(set_attr "mode" "DI")])
 
+(define_insn_and_split "ssse3_pshufbv8qi3_sse"
+  [(set (match_operand:V8QI 0 "register_operand" "=x,Yv")
+   (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,Yv")
+ (match_operand:V8QI 2 "register_operand" "x,Yv")]
+UNSPEC_PSHUFB))
+   (set (match_operand:V4SI 3 "register_operand" "=x,Yv")
+   (unspec:V4SI [(match_operand:V4SI 4 "register_operand" "3,3")]
+UNSPEC_PSHUFB))]
+  "TARGET_SSSE3 && TARGET_MMX_WITH_SSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  /* Mask out the bit 3 of the shuffle control byte.  */
+  rtx op2 = gen_rtx_REG (V4SImode, REGNO (operands[2]));
+  rtx op3 = operands[3];
+  rtx insn = gen_andv4si3 (op3, op3, op2);
+  emit_insn (insn);
+  /* Generate SSE version of pshufb.  */
+  rtx op0 = gen_rtx_REG (V16QImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V16QImode, REGNO (operands[1]));
+  op3 = gen_rtx_REG (V16QImode, REGNO (op3));
+  insn = gen_ssse3_pshufbv16qi3 (op0, op1, op3);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "mmx_isa" "x64_noavx,x64_avx")
+   (set_attr "type" "sselog1")
+   (set_attr "mode" "TI,TI")])
+
 (define_insn "_psign3"
   [(set (match_operand:VI124_AVX2 0 "register_operand" "=x,x")
(unspec:VI124_AVX2
-- 
2.20.1



[PATCH 26/40] i386: Emulate MMX umulv1siv1di3 with SSE2

2019-02-11 Thread H.J. Lu
Emulate MMX umulv1siv1di3 with SSE2.  Only SSE register source operand
is allowed.
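For reference, the scalar operation being emulated (a minimal illustrative C model, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* umulv1siv1di: pmuludq zero-extends element 0 of each V2SI operand
   and stores the full 64-bit unsigned product.  */
static uint64_t
umulv1siv1di_model (const uint32_t a[2], const uint32_t b[2])
{
  return (uint64_t) a[0] * (uint64_t) b[0];
}
```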

PR target/89021
* config/i386/mmx.md (sse2_umulv1siv1di3): Add SSE emulation
support.
(*sse2_umulv1siv1di3): Add SSE2 emulation.
---
 gcc/config/i386/mmx.md | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 2efa663b3e2..5159c68925b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -947,24 +947,30 @@
(vec_select:V1SI
  (match_operand:V2SI 2 "nonimmediate_operand")
  (parallel [(const_int 0)])]
-  "TARGET_SSE2"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE2"
   "ix86_fixup_binary_operands_no_copy (MULT, V2SImode, operands);")
 
 (define_insn "*sse2_umulv1siv1di3"
-  [(set (match_operand:V1DI 0 "register_operand" "=y")
+  [(set (match_operand:V1DI 0 "register_operand" "=y,x,Yv")
 (mult:V1DI
  (zero_extend:V1DI
(vec_select:V1SI
- (match_operand:V2SI 1 "nonimmediate_operand" "%0")
+ (match_operand:V2SI 1 "nonimmediate_operand" "%0,0,Yv")
  (parallel [(const_int 0)])))
  (zero_extend:V1DI
(vec_select:V1SI
- (match_operand:V2SI 2 "nonimmediate_operand" "ym")
+ (match_operand:V2SI 2 "nonimmediate_operand" "ym,x,Yv")
  (parallel [(const_int 0)])]
-  "TARGET_SSE2 && ix86_binary_operator_ok (MULT, V2SImode, operands)"
-  "pmuludq\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && TARGET_SSE2
+   && ix86_binary_operator_ok (MULT, V2SImode, operands)"
+  "@
+   pmuludq\t{%2, %0|%0, %2}
+   pmuludq\t{%2, %0|%0, %2}
+   vpmuludq\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_v4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 25/40] i386: Emulate MMX movntq with SSE2 movntidi

2019-02-11 Thread H.J. Lu
Emulate MMX movntq with SSE2 movntidi.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/mmx.md (sse_movntq): Renamed to ...
(*sse_movntq): This.  Require TARGET_MMX and disallow
TARGET_MMX_WITH_SSE.
(sse_movntq): New.  Emulate MMX movntq with SSE2 movntidi.
---
 gcc/config/i386/mmx.md | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b3048a6a3b8..2efa663b3e2 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -238,11 +238,26 @@
   DONE;
 })
 
-(define_insn "sse_movntq"
+(define_expand "sse_movntq"
+  [(set (match_operand:DI 0 "memory_operand")
+   (unspec:DI [(match_operand:DI 1 "register_operand")]
+  UNSPEC_MOVNTQ))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+{
+  if (TARGET_MMX_WITH_SSE)
+{
+  rtx insn = gen_sse2_movntidi (operands[0], operands[1]);
+  emit_insn (insn);
+  DONE;
+}
+})
+
+(define_insn "*sse_movntq"
   [(set (match_operand:DI 0 "memory_operand" "=m")
(unspec:DI [(match_operand:DI 1 "register_operand" "y")]
   UNSPEC_MOVNTQ))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "TARGET_MMX && !TARGET_MMX_WITH_SSE && (TARGET_SSE || TARGET_3DNOW_A)"
   "movntq\t{%1, %0|%0, %1}"
   [(set_attr "type" "mmxmov")
(set_attr "mode" "DI")])
-- 
2.20.1



[PATCH 28/40] i386: Emulate MMX ssse3_phdv2si3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX ssse3_phdv2si3 with SSE by moving bits
64:95 to bits 32:63 in the SSE register.  Only SSE register source operand
is allowed.
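An illustrative C model of why the fixup is needed (not part of the patch): running the 4-wide SSE phaddd on the zero-extended operands leaves the sum of the second operand in element 2, so it must be moved down to element 1, which is what the bits 64:95 to 32:63 move does.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the split path for the horizontal add.  */
static void
mmx_phaddd_model (int32_t dst[2], const int32_t a[2], const int32_t b[2])
{
  int32_t a4[4] = { a[0], a[1], 0, 0 };	/* zero-extended inputs */
  int32_t b4[4] = { b[0], b[1], 0, 0 };
  int32_t t[4];
  t[0] = a4[0] + a4[1];			/* SSE phaddd: pairwise sums */
  t[1] = a4[2] + a4[3];
  t[2] = b4[0] + b4[1];
  t[3] = b4[2] + b4[3];
  t[1] = t[2];				/* move bits 64:95 to 32:63 */
  dst[0] = t[0];
  dst[1] = t[1];
}
```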

PR target/89021
* config/i386/sse.md (ssse3_phdv2si3):
Changed to define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index e3b63b0e890..3fe41b772c2 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15480,26 +15480,42 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
-(define_insn "ssse3_phdv2si3"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "ssse3_phdv2si3"
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
(vec_concat:V2SI
  (plusminus:SI
(vec_select:SI
- (match_operand:V2SI 1 "register_operand" "0")
+ (match_operand:V2SI 1 "register_operand" "0,0,Yv")
  (parallel [(const_int 0)]))
(vec_select:SI (match_dup 1) (parallel [(const_int 1)])))
  (plusminus:SI
(vec_select:SI
- (match_operand:V2SI 2 "nonimmediate_operand" "ym")
+ (match_operand:V2SI 2 "nonimmediate_operand" "ym,x,Yv")
  (parallel [(const_int 0)]))
(vec_select:SI (match_dup 2) (parallel [(const_int 1)])]
-  "TARGET_SSSE3"
-  "phd\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   phd\t{%2, %0|%0, %2}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  /* Generate SSE version of the operation.  */
+  rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V4SImode, REGNO (operands[1]));
+  rtx op2 = gen_rtx_REG (V4SImode, REGNO (operands[2]));
+  rtx insn = gen_ssse3_phdv4si3 (op0, op1, op2);
+  emit_insn (insn);
+  ix86_move_vector_high_sse_to_mmx (op0);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "complex")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "avx2_pmaddubsw256"
   [(set (match_operand:V16HI 0 "register_operand" "=x,v")
-- 
2.20.1



[PATCH 27/40] i386: Emulate MMX ssse3_phwv4hi3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX ssse3_phwv4hi3 with SSE by moving bits
64:95 to bits 32:63 in the SSE register.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/sse.md (ssse3_phwv4hi3):
Changed to define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 75e711624ce..e3b63b0e890 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15358,13 +15358,13 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
-(define_insn "ssse3_phwv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+(define_insn_and_split "ssse3_phwv4hi3"
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(vec_concat:V4HI
  (vec_concat:V2HI
(ssse3_plusminus:HI
  (vec_select:HI
-   (match_operand:V4HI 1 "register_operand" "0")
+   (match_operand:V4HI 1 "register_operand" "0,0,Yv")
(parallel [(const_int 0)]))
  (vec_select:HI (match_dup 1) (parallel [(const_int 1)])))
(ssse3_plusminus:HI
@@ -15373,19 +15373,35 @@
  (vec_concat:V2HI
(ssse3_plusminus:HI
  (vec_select:HI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")
(parallel [(const_int 0)]))
  (vec_select:HI (match_dup 2) (parallel [(const_int 1)])))
(ssse3_plusminus:HI
  (vec_select:HI (match_dup 2) (parallel [(const_int 2)]))
  (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))]
-  "TARGET_SSSE3"
-  "phw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   phw\t{%2, %0|%0, %2}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  /* Generate SSE version of the operation.  */
+  rtx op0 = gen_rtx_REG (V8HImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V8HImode, REGNO (operands[1]));
+  rtx op2 = gen_rtx_REG (V8HImode, REGNO (operands[2]));
+  rtx insn = gen_ssse3_phwv8hi3 (op0, op1, op2);
+  emit_insn (insn);
+  ix86_move_vector_high_sse_to_mmx (op0);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "complex")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "avx2_phdv8si3"
   [(set (match_operand:V8SI 0 "register_operand" "=x")
-- 
2.20.1



[PATCH 11/40] i386: Emulate MMX mmx_eq/mmx_gt3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mmx_eq/mmx_gt3 with SSE.  Only SSE register source
operand is allowed.
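For readers following along, the compare semantics being emulated (illustrative C only, not part of the patch): each lane gets an all-ones mask when the predicate holds, zero otherwise, and pcmpgt compares as signed.

```c
#include <assert.h>
#include <stdint.h>

/* pcmpeqw: per-lane equality mask.  */
static void
pcmpeqw_model (uint16_t dst[4], const int16_t a[4], const int16_t b[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = a[i] == b[i] ? 0xffff : 0;
}

/* pcmpgtw: per-lane signed greater-than mask.  */
static void
pcmpgtw_model (uint16_t dst[4], const int16_t a[4], const int16_t b[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = a[i] > b[i] ? 0xffff : 0;
}
```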

PR target/89021
* config/i386/mmx.md (mmx_eq3): Also allow
TARGET_MMX_WITH_SSE.
(*mmx_eq3): Also allow TARGET_MMX_WITH_SSE.  Add SSE
support.
(mmx_gt3): Likewise.
---
 gcc/config/i386/mmx.md | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index cbad3da1fe0..7a5e41defe4 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1058,28 +1058,37 @@
 (eq:MMXMODEI
  (match_operand:MMXMODEI 1 "nonimmediate_operand")
  (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (EQ, mode, operands);")
 
 (define_insn "*mmx_eq3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (eq:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (EQ, mode, operands)"
-  "pcmpeq\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxcmp")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (EQ, mode, operands)"
+  "@
+   pcmpeq\t{%2, %0|%0, %2}
+   pcmpeq\t{%2, %0|%0, %2}
+   vpcmpeq\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcmp,ssecmp,ssecmp")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_gt3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (gt:MMXMODEI
- (match_operand:MMXMODEI 1 "register_operand" "0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX"
-  "pcmpgt\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxcmp")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "register_operand" "0,0,Yv")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   pcmpgt\t{%2, %0|%0, %2}
+   pcmpgt\t{%2, %0|%0, %2}
+   vpcmpgt\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxcmp,ssecmp,ssecmp")
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 24/40] i386: Emulate MMX mmx_psadbw with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mmx_psadbw with SSE.  Only SSE register source operand is
allowed.
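As an illustrative C model of the operation (not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* psadbw: sum of absolute differences of eight unsigned bytes; the
   16-bit sum is zero-extended into the 64-bit destination.  */
static uint64_t
psadbw_model (const uint8_t a[8], const uint8_t b[8])
{
  uint64_t sum = 0;
  for (int i = 0; i < 8; i++)
    sum += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
  return sum;
}
```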

PR target/89021
* config/i386/mmx.md (mmx_psadbw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 5e4fd499658..b3048a6a3b8 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1807,14 +1807,19 @@
(set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_psadbw"
-  [(set (match_operand:V1DI 0 "register_operand" "=y")
-(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0")
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")]
+  [(set (match_operand:V1DI 0 "register_operand" "=y,x,Yv")
+(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0,0,Yv")
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")]
 UNSPEC_PSADBW))]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "psadbw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   psadbw\t{%2, %0|%0, %2}
+   psadbw\t{%2, %0|%0, %2}
+   vpsadbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn_and_split "mmx_pmovmskb"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-- 
2.20.1



[PATCH 22/40] i386: Emulate MMX mmx_uavgv8qi3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mmx_uavgv8qi3 with SSE.  Only SSE register source operand is
allowed.
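Both pavgb and the 3DNow! pavgusb alternative compute the same unsigned rounding average, which the RTL spells out as zero_extend/plus/lshiftrt; a one-line illustrative C model (not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned average with rounding up: (a + b + 1) >> 1, formed in a
   wider type so the +1 cannot wrap.  */
static uint8_t
pavgb_model (uint8_t a, uint8_t b)
{
  return (uint8_t) (((unsigned int) a + b + 1) >> 1);
}
```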

PR target/89021
* config/i386/mmx.md (mmx_uavgv8qi3): Also check TARGET_MMX
and TARGET_MMX_WITH_SSE.
(*mmx_uavgv8qi3): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3c432f09e31..7fd29094836 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1723,42 +1723,47 @@
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "TARGET_SSE || TARGET_3DNOW"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (PLUS, V8QImode, operands);")
 
 (define_insn "*mmx_uavgv8qi3"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
(truncate:V8QI
  (lshiftrt:V8HI
(plus:V8HI
  (plus:V8HI
(zero_extend:V8HI
- (match_operand:V8QI 1 "nonimmediate_operand" "%0"))
+ (match_operand:V8QI 1 "nonimmediate_operand" "%0,0,Yv"))
(zero_extend:V8HI
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")))
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")))
  (const_vector:V8HI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "(TARGET_SSE || TARGET_3DNOW)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (PLUS, V8QImode, operands)"
 {
   /* These two instructions have the same operation, but their encoding
  is different.  Prefer the one that is de facto standard.  */
-  if (TARGET_SSE || TARGET_3DNOW_A)
+  if (TARGET_MMX_WITH_SSE && TARGET_AVX)
+return "vpavgb\t{%2, %1, %0|%0, %1, %2}";
+  else if (TARGET_SSE || TARGET_3DNOW_A)
 return "pavgb\t{%2, %0|%0, %2}";
   else
 return "pavgusb\t{%2, %0|%0, %2}";
 }
-  [(set_attr "type" "mmxshft")
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
(set (attr "prefix_extra")
  (if_then_else
(not (ior (match_test "TARGET_SSE")
 (match_test "TARGET_3DNOW_A")))
(const_string "1")
(const_string "*")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_uavgv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 23/40] i386: Emulate MMX mmx_uavgv4hi3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mmx_uavgv4hi3 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_uavgv4hi3): Also check TARGET_MMX and
TARGET_MMX_WITH_SSE.
(*mmx_uavgv4hi3): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 7fd29094836..5e4fd499658 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1778,27 +1778,33 @@
  (const_vector:V4SI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
   "ix86_fixup_binary_operands_no_copy (PLUS, V4HImode, operands);")
 
 (define_insn "*mmx_uavgv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(plus:V4SI
  (plus:V4SI
(zero_extend:V4SI
- (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+ (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
(zero_extend:V4SI
- (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+ (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
  (const_vector:V4SI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (PLUS, V4HImode, operands)"
-  "pavgw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
-   (set_attr "mode" "DI")])
+  "@
+   pavgw\t{%2, %0|%0, %2}
+   pavgw\t{%2, %0|%0, %2}
+   vpavgw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_psadbw"
   [(set (match_operand:V1DI 0 "register_operand" "=y")
-- 
2.20.1



[PATCH 21/40] i386: Emulate MMX maskmovq with SSE2 maskmovdqu

2019-02-11 Thread H.J. Lu
Emulate MMX maskmovq with SSE2 maskmovdqu in 64-bit mode by zero-extending
the source and mask operands to 128 bits.  Handle the possibly unmapped bits
64:127 at the memory address by adjusting the source and mask operands
together with the memory address.
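The address adjustment is the subtle part.  Here is a byte-level C model of the scheme (illustrative only, assuming little-endian byte order as on x86): shifting the zero-extended data and mask left by the clamped misalignment and lowering the store address keeps every byte actually written inside the eight bytes maskmovq itself would write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the emulated 8-byte masked store via a 16-byte masked store.  */
static void
masked_store8_model (uint8_t *p, const uint8_t data[8], const uint8_t mask[8])
{
  uint8_t d16[16] = { 0 }, m16[16] = { 0 };
  memcpy (d16, data, 8);		/* zero-extend to 128 bits */
  memcpy (m16, mask, 8);
  size_t off = (uintptr_t) p & 0xf;
  if (off > 8)
    off = 8;				/* clamp, as in the patch */
  p -= off;
  memmove (d16 + off, d16, 16 - off);	/* pslldq by OFF bytes */
  memset (d16, 0, off);
  memmove (m16 + off, m16, 16 - off);
  memset (m16, 0, off);
  for (int i = 0; i < 16; i++)		/* the maskmovdqu store */
    if (m16[i] & 0x80)
      p[i] = d16[i];
}
```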

PR target/89021
* config/i386/xmmintrin.h: Emulate MMX maskmovq with SSE2
maskmovdqu in 64-bit mode.
---
 gcc/config/i386/xmmintrin.h | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index 58284378514..e797795f127 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1165,7 +1165,68 @@ _m_pshufw (__m64 __A, int const __N)
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_maskmove_si64 (__m64 __A, __m64 __N, char *__P)
 {
+#ifdef __x86_64__
+  /* Emulate MMX maskmovq with SSE2 maskmovdqu and handle unmapped bits
+ 64:127 at address __P.  */
+  typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+  typedef char __v16qi __attribute__ ((__vector_size__ (16)));
+  /* Zero-extend __A and __N to 128 bits.  */
+  __v2di __A128 = __extension__ (__v2di) { ((__v1di) __A)[0], 0 };
+  __v2di __N128 = __extension__ (__v2di) { ((__v1di) __N)[0], 0 };
+
+  /* Check the alignment of __P.  */
+  __SIZE_TYPE__ offset = ((__SIZE_TYPE__) __P) & 0xf;
+  if (offset)
+{
+  /* If the misalignment of __P > 8, subtract __P by 8 bytes.
+Otherwise, subtract __P by the misalignment.  */
+  if (offset > 8)
+   offset = 8;
+  __P = (char *) (((__SIZE_TYPE__) __P) - offset);
+
+  /* Shift __A128 and __N128 to the left by the adjustment.  */
+  switch (offset)
+   {
+   case 1:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 8);
+ break;
+   case 2:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 2 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 2 * 8);
+ break;
+   case 3:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 3 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 3 * 8);
+ break;
+   case 4:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 4 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 4 * 8);
+ break;
+   case 5:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 5 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 5 * 8);
+ break;
+   case 6:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 6 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 6 * 8);
+ break;
+   case 7:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 7 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 7 * 8);
+ break;
+   case 8:
+ __A128 = __builtin_ia32_pslldqi128 (__A128, 8 * 8);
+ __N128 = __builtin_ia32_pslldqi128 (__N128, 8 * 8);
+ break;
+   default:
+ break;
+   }
+}
+  __builtin_ia32_maskmovdqu ((__v16qi)__A128, (__v16qi)__N128, __P);
+#else
   __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P);
+#endif
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
-- 
2.20.1



[PATCH 13/40] i386: Emulate MMX pshufw with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX pshufw with SSE.  Only SSE register source operand is allowed.
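The emulation works because pshufw and SSE pshuflw interpret the same 2-bit-per-lane selector; pshuflw simply operates on the low four words of an XMM register and leaves the high 64 bits untouched.  An illustrative C model of the shuffle (not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* pshufw: lane I of DST is lane ((SEL >> 2*I) & 3) of SRC.  */
static void
pshufw_model (uint16_t dst[4], const uint16_t src[4], unsigned int sel)
{
  for (int i = 0; i < 4; i++)
    dst[i] = src[(sel >> (2 * i)) & 3];
}
```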

PR target/89021
* config/i386/mmx.md (mmx_pshufw): Also check TARGET_MMX and
TARGET_MMX_WITH_SSE.
(mmx_pshufw_1): Add SSE emulation.
(*vec_dupv4hi): Changed to define_insn_and_split and also allow
TARGET_MMX_WITH_SSE to support SSE emulation.
---
 gcc/config/i386/mmx.md | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 64 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 3b6f2c1c87b..69ed2d07022 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1364,7 +1364,8 @@
   [(match_operand:V4HI 0 "register_operand")
(match_operand:V4HI 1 "nonimmediate_operand")
(match_operand:SI 2 "const_int_operand")]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
 {
   int mask = INTVAL (operands[2]);
   emit_insn (gen_mmx_pshufw_1 (operands[0], operands[1],
@@ -1376,14 +1377,15 @@
 })
 
 (define_insn "mmx_pshufw_1"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv")
 (vec_select:V4HI
-  (match_operand:V4HI 1 "nonimmediate_operand" "ym")
+  (match_operand:V4HI 1 "nonimmediate_operand" "ym,Yv")
   (parallel [(match_operand 2 "const_0_to_3_operand")
  (match_operand 3 "const_0_to_3_operand")
  (match_operand 4 "const_0_to_3_operand")
  (match_operand 5 "const_0_to_3_operand")])))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
 {
   int mask = 0;
   mask |= INTVAL (operands[2]) << 0;
@@ -1392,11 +1394,20 @@
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
 
-  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
+  switch (which_alternative)
+{
+case 0:
+  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
+case 1:
+  return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
+default:
+  gcc_unreachable ();
+}
 }
-  [(set_attr "type" "mmxcvt")
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,sselog")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_insn "mmx_pswapdv2si2"
   [(set (match_operand:V2SI 0 "register_operand" "=y")
@@ -1409,16 +1420,54 @@
(set_attr "prefix_extra" "1")
(set_attr "mode" "DI")])
 
-(define_insn "*vec_dupv4hi"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv4hi"
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv,Yw")
(vec_duplicate:V4HI
  (truncate:HI
-   (match_operand:SI 1 "register_operand" "0"]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pshufw\t{$0, %0, %0|%0, %0, 0}"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (match_operand:SI 1 "register_operand" "0,Yv,r"]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "@
+   pshufw\t{$0, %0, %0|%0, %0, 0}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(const_int 0)]
+{
+  rtx op;
+  operands[0] = lowpart_subreg (V8HImode, operands[0],
+   GET_MODE (operands[0]));
+  if (TARGET_AVX2)
+{
+  operands[1] = lowpart_subreg (HImode, operands[1],
+   GET_MODE (operands[1]));
+  op = gen_rtx_VEC_DUPLICATE (V8HImode, operands[1]);
+}
+  else
+{
+  operands[1] = lowpart_subreg (V8HImode, operands[1],
+   GET_MODE (operands[1]));
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (8,
+ GEN_INT (0),
+ GEN_INT (0),
+ GEN_INT (0),
+ GEN_INT (0),
+ GEN_INT (4),
+ GEN_INT (5),
+ GEN_INT (6),
+ GEN_INT (7)));
+
+  op = gen_rtx_VEC_SELECT (V8HImode, operands[1], mask);
+}
+  rtx insn = gen_rtx_SET (operands[0], op);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "mmx_isa" "native,x64,x64_avx")
+   (set_attr "type" "mmxcvt,sselog1,ssemov")
+   (set_attr "length_immediate" "1,1,0")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn_and_split "*vec_dupv2si"
   [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv,Yw")
-- 
2.20.1



[PATCH 18/40] i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_v4hi3): Also check TARGET_MMX
and TARGET_MMX_WITH_SSE.
(mmx_v8qi3): Likewise.
(smaxmin:v4hi3): New.
(umaxmin:v8qi3): Likewise.
(smaxmin:*mmx_v4hi3): Add SSE emulation.
(umaxmin:*mmx_v8qi3): Likewise.
---
 gcc/config/i386/mmx.md | 60 ++++++++++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 836adf3e533..4cf008e99c7 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -950,38 +950,66 @@
 (smaxmin:V4HI
  (match_operand:V4HI 1 "nonimmediate_operand")
  (match_operand:V4HI 2 "nonimmediate_operand")))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
+
+(define_expand "v4hi3"
+  [(set (match_operand:V4HI 0 "register_operand")
+(smaxmin:V4HI
+ (match_operand:V4HI 1 "nonimmediate_operand")
+ (match_operand:V4HI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
 
 (define_insn "*mmx_v4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
 (smaxmin:V4HI
- (match_operand:V4HI 1 "nonimmediate_operand" "%0")
- (match_operand:V4HI 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+ (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (, V4HImode, operands)"
-  "pw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   pw\t{%2, %0|%0, %2}
+   pw\t{%2, %0|%0, %2}
+   vpw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_v8qi3"
   [(set (match_operand:V8QI 0 "register_operand")
 (umaxmin:V8QI
  (match_operand:V8QI 1 "nonimmediate_operand")
  (match_operand:V8QI 2 "nonimmediate_operand")))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "ix86_fixup_binary_operands_no_copy (, V8QImode, operands);")
+
+(define_expand "v8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+(umaxmin:V8QI
+ (match_operand:V8QI 1 "nonimmediate_operand")
+ (match_operand:V8QI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (, V8QImode, operands);")
 
 (define_insn "*mmx_v8qi3"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
 (umaxmin:V8QI
- (match_operand:V8QI 1 "nonimmediate_operand" "%0")
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_SSE || TARGET_3DNOW_A)
+ (match_operand:V8QI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (, V8QImode, operands)"
-  "pb\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   pb\t{%2, %0|%0, %2}
+   pb\t{%2, %0|%0, %2}
+   vpb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_ashr3"
   [(set (match_operand:MMXMODE24 0 "register_operand")
-- 
2.20.1



[PATCH 14/40] i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.
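For the truncating form, a minimal illustrative C model (not part of the patch): cvttps2pi truncates the two low single-precision lanes toward zero, and the emulation's %vcvttps2dq converts all four lanes of which only lanes 0 and 1 land in the V2SI destination.

```c
#include <assert.h>
#include <stdint.h>

/* cvttps2pi: convert the two low floats with truncation toward zero.  */
static void
cvttps2pi_model (int32_t dst[2], const float src[4])
{
  dst[0] = (int32_t) src[0];	/* C casts truncate toward zero */
  dst[1] = (int32_t) src[1];
}
```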

PR target/89021
* config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation.
(sse_cvttps2pi): Likewise.
---
 gcc/config/i386/sse.md | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5dc0930ac1f..80bb4cb935d 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4668,26 +4668,32 @@
(set_attr "mode" "V4SF")])
 
 (define_insn "sse_cvtps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")
(vec_select:V2SI
- (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm")]
+ (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm")]
   UNSPEC_FIX_NOTRUNC)
  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
-  "cvtps2pi\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+  "@
+   cvtps2pi\t{%1, %0|%0, %q1}
+   %vcvtps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "ssecvt")
+   (set_attr "unit" "mmx,*")
(set_attr "mode" "DI")])
 
 (define_insn "sse_cvttps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yv")
(vec_select:V2SI
- (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm"))
+ (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm,YvBm"))
  (parallel [(const_int 0) (const_int 1)])))]
-  "TARGET_SSE"
-  "cvttps2pi\t{%1, %0|%0, %q1}"
-  [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE"
+  "@
+   cvttps2pi\t{%1, %0|%0, %q1}
+   %vcvttps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "ssecvt")
+   (set_attr "unit" "mmx,*")
(set_attr "prefix_rep" "0")
(set_attr "mode" "SF")])
 
-- 
2.20.1



[PATCH 10/40] i386: Emulate MMX mmx_andnot<mode>3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mmx_andnot<mode>3 with SSE.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/mmx.md (mmx_andnot<mode>3): Also allow
TARGET_MMX_WITH_SSE.  Add SSE support.
---
 gcc/config/i386/mmx.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index e4b0a3b0311..cbad3da1fe0 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1088,14 +1088,18 @@
 ;
 
 (define_insn "mmx_andnot<mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
(and:MMXMODEI
- (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand" "0"))
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX"
-  "pandn\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand" "0,0,Yv"))
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   pandn\t{%2, %0|%0, %2}
+   pandn\t{%2, %0|%0, %2}
+   vpandn\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_<code><mode>3"
   [(set (match_operand:MMXMODEI 0 "register_operand")
-- 
2.20.1



[PATCH 08/40] i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE.  Only SSE register
source operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_ashr<mode>3): Changed to define_expand.
Disallow TARGET_MMX_WITH_SSE.
(mmx_<shift_insn><mode>3): Likewise.
(ashr<mode>3): New.
(*ashr<mode>3): Likewise.
(<shift_insn><mode>3): Likewise.
(*<shift_insn><mode>3): Likewise.
---
 gcc/config/i386/mmx.md | 68 --
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 0e44b3ce9b8..1b4f67be902 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -983,33 +983,69 @@
   [(set_attr "type" "mmxadd")
(set_attr "mode" "DI")])
 
-(define_insn "mmx_ashr<mode>3"
-  [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
+(define_expand "mmx_ashr<mode>3"
+  [(set (match_operand:MMXMODE24 0 "register_operand")
 (ashiftrt:MMXMODE24
- (match_operand:MMXMODE24 1 "register_operand" "0")
- (match_operand:DI 2 "nonmemory_operand" "yN")))]
-  "TARGET_MMX"
-  "psra<mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
+ (match_operand:MMXMODE24 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX && !TARGET_MMX_WITH_SSE")
+
+(define_expand "ashr<mode>3"
+  [(set (match_operand:MMXMODE24 0 "register_operand")
+(ashiftrt:MMXMODE24
+ (match_operand:MMXMODE24 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE")
+
+(define_insn "*ashr<mode>3"
+  [(set (match_operand:MMXMODE24 0 "register_operand" "=y,x,Yv")
+(ashiftrt:MMXMODE24
+ (match_operand:MMXMODE24 1 "register_operand" "0,0,Yv")
+ (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   psra<mmxvecsize>\t{%2, %0|%0, %2}
+   psra<mmxvecsize>\t{%2, %0|%0, %2}
+   vpsra<mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseishft,sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
-(define_insn "mmx_<shift_insn><mode>3"
-  [(set (match_operand:MMXMODE248 0 "register_operand" "=y")
+(define_expand "mmx_<shift_insn><mode>3"
+  [(set (match_operand:MMXMODE248 0 "register_operand")
 (any_lshift:MMXMODE248
- (match_operand:MMXMODE248 1 "register_operand" "0")
- (match_operand:DI 2 "nonmemory_operand" "yN")))]
-  "TARGET_MMX"
-  "p<vshift><mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
+ (match_operand:MMXMODE248 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX && !TARGET_MMX_WITH_SSE")
+
+(define_expand "<shift_insn><mode>3"
+  [(set (match_operand:MMXMODE248 0 "register_operand")
+(any_lshift:MMXMODE248
+ (match_operand:MMXMODE248 1 "register_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE")
+
+(define_insn "*<shift_insn><mode>3"
+  [(set (match_operand:MMXMODE248 0 "register_operand" "=y,x,Yv")
+(any_lshift:MMXMODE248
+ (match_operand:MMXMODE248 1 "register_operand" "0,0,Yv")
+ (match_operand:DI 2 "nonmemory_operand" "yN,xN,YvN")))]
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "@
+   p<vshift><mmxvecsize>\t{%2, %0|%0, %2}
+   p<vshift><mmxvecsize>\t{%2, %0|%0, %2}
+   vp<vshift><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxshft,sseishft,sseishft")
(set (attr "length_immediate")
  (if_then_else (match_operand 2 "const_int_operand")
(const_string "1")
(const_string "0")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 16/40] i386: Emulate MMX mmx_pextrw with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mmx_pextrw with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_pextrw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 69ed2d07022..b1d27506131 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1349,16 +1349,18 @@
(set_attr "mode" "DI")])
 
 (define_insn "mmx_pextrw"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
 (zero_extend:SI
  (vec_select:HI
-   (match_operand:V4HI 1 "register_operand" "y")
-   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n")]]
-  "TARGET_SSE || TARGET_3DNOW_A"
-  "pextrw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "mmxcvt")
+   (match_operand:V4HI 1 "register_operand" "y,Yv")
+   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n")]]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && (TARGET_SSE || TARGET_3DNOW_A)"
+  "%vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64")
+   (set_attr "type" "mmxcvt,sselog1")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_expand "mmx_pshufw"
   [(match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 09/40] i386: Emulate MMX <any_logic><mode>3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX <any_logic><mode>3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (any_logic:<code><mode>3): New.
(any_logic:*mmx_<code><mode>3): Also allow TARGET_MMX_WITH_SSE.
Add SSE support.
---
 gcc/config/i386/mmx.md | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 1b4f67be902..e4b0a3b0311 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1105,15 +1105,28 @@
   "TARGET_MMX"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
+(define_expand "<code><mode>3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (any_logic:MMXMODEI
+ (match_operand:MMXMODEI 1 "nonimmediate_operand")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
 (define_insn "*mmx_<code><mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,x,Yv")
 (any_logic:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<logic>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,Yv")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "@
+   p<logic>\t{%2, %0|%0, %2}
+   p<logic>\t{%2, %0|%0, %2}
+   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 06/40] i386: Emulate MMX smulv4hi3_highpart with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX smulv4hi3_highpart with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_smulv4hi3_highpart): Also allow
TARGET_MMX_WITH_SSE.
(*mmx_smulv4hi3_highpart): Also allow TARGET_MMX_WITH_SSE. Add
SSE support.
---
 gcc/config/i386/mmx.md | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8ebaf9b3ee5..aaae5576967 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -777,23 +777,28 @@
  (sign_extend:V4SI
(match_operand:V4HI 2 "nonimmediate_operand")))
(const_int 16]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_smulv4hi3_highpart"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
(truncate:V4HI
  (lshiftrt:V4SI
(mult:V4SI
  (sign_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv"))
  (sign_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))
(const_int 16]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmulhw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmulhw\t{%2, %0|%0, %2}
+   pmulhw\t{%2, %0|%0, %2}
+   vpmulhw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_umulv4hi3_highpart"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 07/40] i386: Emulate MMX mmx_pmaddwd with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX pmaddwd with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE.
(*mmx_pmaddwd): Also allow TARGET_MMX_WITH_SSE.  Add SSE support.
---
 gcc/config/i386/mmx.md | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index aaae5576967..0e44b3ce9b8 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -848,20 +848,20 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_pmaddwd"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
 (plus:V2SI
  (mult:V2SI
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0")
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv")
(parallel [(const_int 0) (const_int 2)])))
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")
(parallel [(const_int 0) (const_int 2)]
  (mult:V2SI
(sign_extend:V2SI
@@ -870,10 +870,15 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmaddwd\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmaddwd\t{%2, %0|%0, %2}
+   pmaddwd\t{%2, %0|%0, %2}
+   vpmaddwd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_pmulhrwv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 03/40] i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX

2019-02-11 Thread H.J. Lu
Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX.  For MMX punpckhXX,
move bits 64:127 to bits 0:63 in SSE register.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/i386-protos.h (ix86_split_mmx_punpck): New
prototype.
* config/i386/i386.c (ix86_split_mmx_punpck): New function.
* config/i386/mmx.md (mmx_punpckhbw): Changed to
define_insn_and_split to support SSE emulation.
(mmx_punpcklbw): Likewise.
(mmx_punpckhwd): Likewise.
(mmx_punpcklwd): Likewise.
(mmx_punpckhdq): Likewise.
(mmx_punpckldq): Likewise.
---
 gcc/config/i386/i386-protos.h |   1 +
 gcc/config/i386/i386.c|  77 +++
 gcc/config/i386/mmx.md| 138 ++
 3 files changed, 168 insertions(+), 48 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bb96a420a85..dc7fc38d8e4 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -202,6 +202,7 @@ extern rtx ix86_split_stack_guard (void);
 
 extern void ix86_move_vector_high_sse_to_mmx (rtx);
 extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+extern void ix86_split_mmx_punpck (rtx[], bool);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b8d5ba7f28f..7d65192c1cd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20009,6 +20009,83 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
   ix86_move_vector_high_sse_to_mmx (op0);
 }
 
+/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  */
+
+void
+ix86_split_mmx_punpck (rtx operands[], bool high_p)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  machine_mode mode = GET_MODE (op0);
+  rtx mask;
+  /* The corresponding SSE mode.  */
+  machine_mode sse_mode, double_sse_mode;
+
+  switch (mode)
+{
+case E_V8QImode:
+  sse_mode = V16QImode;
+  double_sse_mode = V32QImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (16,
+ GEN_INT (0), GEN_INT (16),
+ GEN_INT (1), GEN_INT (17),
+ GEN_INT (2), GEN_INT (18),
+ GEN_INT (3), GEN_INT (19),
+ GEN_INT (4), GEN_INT (20),
+ GEN_INT (5), GEN_INT (21),
+ GEN_INT (6), GEN_INT (22),
+ GEN_INT (7), GEN_INT (23)));
+  break;
+
+case E_V4HImode:
+  sse_mode = V8HImode;
+  double_sse_mode = V16HImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (8,
+ GEN_INT (0), GEN_INT (8),
+ GEN_INT (1), GEN_INT (9),
+ GEN_INT (2), GEN_INT (10),
+ GEN_INT (3), GEN_INT (11)));
+  break;
+
+case E_V2SImode:
+  sse_mode = V4SImode;
+  double_sse_mode = V8SImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4,
+ GEN_INT (0), GEN_INT (4),
+ GEN_INT (1), GEN_INT (5)));
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  /* Generate SSE punpcklXX.  */
+  rtx dest = gen_rtx_REG (sse_mode, REGNO (op0));
+  op1 = gen_rtx_REG (sse_mode, REGNO (op1));
+  op2 = gen_rtx_REG (sse_mode, REGNO (op2));
+
+  op1 = gen_rtx_VEC_CONCAT (double_sse_mode, op1, op2);
+  op2 = gen_rtx_VEC_SELECT (sse_mode, op1, mask);
+  rtx insn = gen_rtx_SET (dest, op2);
+  emit_insn (insn);
+
+  if (high_p)
+{
+  /* Move bits 64:127 to bits 0:63.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (0), GEN_INT (0)));
+  dest = gen_rtx_REG (V4SImode, REGNO (dest));
+  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  insn = gen_rtx_SET (dest, op1);
+  emit_insn (insn);
+}
+}
+
 /* Helper function of ix86_fixup_binary_operands to canonicalize
operand order.  Returns true if the operands should be swapped.  */
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 840d369ab02..034c6a855e0 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1089,87 +1089,129 @@
(set_attr "type" "mmxshft,sselog,sselog")
(set_attr "mode" "DI,TI,TI")])
 
-(define_insn "mmx_punpckhbw"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+(define_insn_and_split "mmx_punpckhbw"
+  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
(vec_select:V8QI
 

[PATCH 05/40] i386: Emulate MMX mulv4hi3 with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX mulv4hi3 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mulv4hi3): New.
(*mmx_mulv4hi3): Also allow TARGET_MMX_WITH_SSE.  Add SSE
support.
---
 gcc/config/i386/mmx.md | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index f4c9aa37f7d..8ebaf9b3ee5 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -746,14 +746,26 @@
   "TARGET_MMX"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
+(define_expand "mulv4hi3"
+  [(set (match_operand:V4HI 0 "register_operand")
+(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand")
+  (match_operand:V4HI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
+
 (define_insn "*mmx_mulv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
-(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand" "%0")
-  (match_operand:V4HI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmullw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  [(set (match_operand:V4HI 0 "register_operand" "=y,x,Yv")
+(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yv")
+  (match_operand:V4HI 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmullw\t{%2, %0|%0, %2}
+   pmullw\t{%2, %0|%0, %2}
+   vpmullw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_smulv4hi3_highpart"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 04/40] i386: Emulate MMX plusminus/sat_plusminus with SSE

2019-02-11 Thread H.J. Lu
Emulate MMX plusminus/sat_plusminus with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (MMXMODEI8): Require TARGET_SSE2 for V1DI.
(plusminus:mmx_<plusminus_insn><mode>3): Check
TARGET_MMX_WITH_SSE.
(sat_plusminus:mmx_<plusminus_insn><mode>3): Likewise.
(<plusminus_insn><mode>3): New.
(*mmx_<plusminus_insn><mode>3): Add SSE emulation.
(*mmx_<plusminus_insn><mode>3): Likewise.
---
 gcc/config/i386/mmx.md | 51 --
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 034c6a855e0..f4c9aa37f7d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -45,7 +45,7 @@
 
 ;; 8 byte integral modes handled by MMX (and by extension, SSE)
 (define_mode_iterator MMXMODEI [V8QI V4HI V2SI])
-(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI V1DI])
+(define_mode_iterator MMXMODEI8 [V8QI V4HI V2SI (V1DI "TARGET_SSE2")])
 
 ;; All 8-byte vector modes handled by MMX
 (define_mode_iterator MMXMODE [V8QI V4HI V2SI V1DI V2SF])
@@ -690,37 +690,54 @@
(plusminus:MMXMODEI8
  (match_operand:MMXMODEI8 1 "nonimmediate_operand")
  (match_operand:MMXMODEI8 2 "nonimmediate_operand")))]
-  "TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode)"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
+(define_expand "<plusminus_insn><mode>3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (plusminus:MMXMODEI
+ (match_operand:MMXMODEI 1 "nonimmediate_operand")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*mmx_<plusminus_insn><mode>3"
-  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,x,Yv")
 (plusminus:MMXMODEI8
- (match_operand:MMXMODEI8 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode))
+ (match_operand:MMXMODEI8 1 "nonimmediate_operand" "0,0,Yv")
+ (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_<plusminus_insn><mode>3"
   [(set (match_operand:MMXMODE12 0 "register_operand")
(sat_plusminus:MMXMODE12
  (match_operand:MMXMODE12 1 "nonimmediate_operand")
  (match_operand:MMXMODE12 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+  "TARGET_MMX || TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
 (define_insn "*mmx_<plusminus_insn><mode>3"
-  [(set (match_operand:MMXMODE12 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODE12 0 "register_operand" "=y,x,Yv")
 (sat_plusminus:MMXMODE12
- (match_operand:MMXMODE12 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODE12 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODE12 1 "nonimmediate_operand" "0,0,Yv")
+ (match_operand:MMXMODE12 2 "nonimmediate_operand" "ym,x,Yv")))]
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE)
+   && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "@
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_mulv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 01/40] i386: Allow MMX register modes in SSE registers

2019-02-11 Thread H.J. Lu
In 64-bit mode, SSE2 can be used to emulate MMX instructions without
3DNOW.  We can use SSE2 to support MMX register modes.

PR target/89021
* config/i386/i386.c (ix86_set_reg_reg_cost): Also support
VALID_MMX_WITH_SSE_REG_MODE.
(ix86_vector_mode_supported_p): Likewise.
* config/i386/i386.h (TARGET_MMX_WITH_SSE): New.
(TARGET_MMX_WITH_SSE_P): Likewise.
---
 gcc/config/i386/i386.c | 5 +++--
 gcc/config/i386/i386.h | 5 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 12bc7926f86..61e602bdb38 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -40235,7 +40235,8 @@ ix86_set_reg_reg_cost (machine_mode mode)
  || (TARGET_AVX && VALID_AVX256_REG_MODE (mode))
  || (TARGET_SSE2 && VALID_SSE2_REG_MODE (mode))
  || (TARGET_SSE && VALID_SSE_REG_MODE (mode))
- || (TARGET_MMX && VALID_MMX_REG_MODE (mode)))
+ || ((TARGET_MMX || TARGET_MMX_WITH_SSE)
+ && VALID_MMX_REG_MODE (mode)))
units = GET_MODE_SIZE (mode);
 }
 
@@ -44061,7 +44062,7 @@ ix86_vector_mode_supported_p (machine_mode mode)
 return true;
   if (TARGET_AVX512F && VALID_AVX512F_REG_MODE (mode))
 return true;
-  if (TARGET_MMX && VALID_MMX_REG_MODE (mode))
+  if ((TARGET_MMX || TARGET_MMX_WITH_SSE) && VALID_MMX_REG_MODE (mode))
 return true;
   if (TARGET_3DNOW && VALID_MMX_REG_MODE_3DNOW (mode))
 return true;
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 83b025e0cf5..db814d9ed17 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -201,6 +201,11 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_16BIT   TARGET_CODE16
 #define TARGET_16BIT_P(x)  TARGET_CODE16_P(x)
 
+#define TARGET_MMX_WITH_SSE \
+  (TARGET_64BIT && TARGET_SSE2)
+#define TARGET_MMX_WITH_SSE_P(x) \
+  (TARGET_64BIT_P (x) && TARGET_SSE2_P (x))
+
 #include "config/vxworks-dummy.h"
 
 #include "config/i386/i386-opts.h"
-- 
2.20.1



[PATCH 02/40] i386: Emulate MMX packsswb/packssdw/packuswb with SSE2

2019-02-11 Thread H.J. Lu
Emulate MMX packsswb/packssdw/packuswb with SSE packsswb/packssdw/packuswb
plus moving bits 64:95 to bits 32:63 in SSE register.  Only SSE register
source operand is allowed.

2019-02-08  H.J. Lu  
Uros Bizjak  

PR target/89021
* config/i386/i386-protos.h (ix86_move_vector_high_sse_to_mmx):
New prototype.
(ix86_split_mmx_pack): Likewise.
* config/i386/i386.c (ix86_move_vector_high_sse_to_mmx): New
function.
(ix86_split_mmx_pack): Likewise.
* config/i386/i386.md (mmx_isa): New.
(enabled): Also check mmx_isa.
* config/i386/mmx.md (any_s_truncate): New code iterator.
(s_trunsuffix): New code attr.
(mmx_packsswb): Removed.
(mmx_packssdw): Likewise.
(mmx_packuswb): Likewise.
(mmx_pack<s_trunsuffix>swb): New define_insn_and_split to emulate
MMX packsswb/packuswb with SSE2.
(mmx_packssdw): Likewise.
---
 gcc/config/i386/i386-protos.h |  3 ++
 gcc/config/i386/i386.c| 54 
 gcc/config/i386/i386.md   | 12 +++
 gcc/config/i386/mmx.md| 67 +++
 4 files changed, 106 insertions(+), 30 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 2d600173917..bb96a420a85 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -200,6 +200,9 @@ extern void ix86_expand_vecop_qihi (enum rtx_code, rtx, rtx, rtx);
 
 extern rtx ix86_split_stack_guard (void);
 
+extern void ix86_move_vector_high_sse_to_mmx (rtx);
+extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
 #endif /* TREE_CODE  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 61e602bdb38..b8d5ba7f28f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19955,6 +19955,60 @@ ix86_expand_vector_move_misalign (machine_mode mode, rtx operands[])
 gcc_unreachable ();
 }
 
+/* Move bits 64:95 to bits 32:63.  */
+
+void
+ix86_move_vector_high_sse_to_mmx (rtx op)
+{
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (0), GEN_INT (2),
+ GEN_INT (0), GEN_INT (0)));
+  rtx dest = gen_rtx_REG (V4SImode, REGNO (op));
+  op = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  rtx insn = gen_rtx_SET (dest, op);
+  emit_insn (insn);
+}
+
+/* Split MMX pack with signed/unsigned saturation with SSE/SSE2.  */
+
+void
+ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+
+  machine_mode dmode = GET_MODE (op0);
+  machine_mode smode = GET_MODE (op1);
+  machine_mode inner_dmode = GET_MODE_INNER (dmode);
+  machine_mode inner_smode = GET_MODE_INNER (smode);
+
+  /* Get the corresponding SSE mode for destination.  */
+  int nunits = 16 / GET_MODE_SIZE (inner_dmode);
+  machine_mode sse_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+   nunits).require ();
+  machine_mode sse_half_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+nunits / 2).require ();
+
+  /* Get the corresponding SSE mode for source.  */
+  nunits = 16 / GET_MODE_SIZE (inner_smode);
+  machine_mode sse_smode = mode_for_vector (GET_MODE_INNER (smode),
+   nunits).require ();
+
+  /* Generate SSE pack with signed/unsigned saturation.  */
+  rtx dest = gen_rtx_REG (sse_dmode, REGNO (op0));
+  op1 = gen_rtx_REG (sse_smode, REGNO (op1));
+  op2 = gen_rtx_REG (sse_smode, REGNO (op2));
+
+  op1 = gen_rtx_fmt_e (code, sse_half_dmode, op1);
+  op2 = gen_rtx_fmt_e (code, sse_half_dmode, op2);
+  rtx insn = gen_rtx_SET (dest, gen_rtx_VEC_CONCAT (sse_dmode,
+   op1, op2));
+  emit_insn (insn);
+
+  ix86_move_vector_high_sse_to_mmx (op0);
+}
+
 /* Helper function of ix86_fixup_binary_operands to canonicalize
operand order.  Returns true if the operands should be swapped.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5b89e52493e..633b1dab523 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -792,6 +792,9 @@
avx512vl,noavx512vl,x64_avx512dq,x64_avx512bw"
   (const_string "base"))
 
+;; Define instruction set of MMX instructions
+(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx" (const_string "base"))
+
 (define_attr "enabled" ""
   (cond [(eq_attr "isa" "x64") (symbol_ref "TARGET_64BIT")
 (eq_attr "isa" "x64_sse2")
@@ -830,6 +833,15 @@
 (eq_attr "isa" "noavx512dq") (symbol_ref "!TARGET_AVX512DQ")
 (eq_attr "isa" "avx512vl") (symbol_ref "TARGET_AVX512VL")
 (eq_attr "isa" "noavx512vl") (symbol_ref "!TARGET_AVX512VL")
+
+(eq_attr "mmx_isa" "native")
+  

[PATCH 00/40] V4: Emulate MMX intrinsics with SSE

2019-02-11 Thread H.J. Lu
On x86-64, since __m64 is returned and passed in XMM registers, we can
emulate MMX intrinsics with SSE instructions. To support it, we added

 #define TARGET_MMX_WITH_SSE \
  (TARGET_64BIT && TARGET_SSE2)
 #define TARGET_MMX_WITH_SSE_P(x) \
  (TARGET_64BIT_P (x) && TARGET_SSE2_P (x))

;; Define instruction set of MMX instructions
(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx" (const_string "base"))

 (eq_attr "mmx_isa" "native")
   (symbol_ref "!TARGET_MMX_WITH_SSE")
 (eq_attr "mmx_isa" "x64")
   (symbol_ref "TARGET_MMX_WITH_SSE")
 (eq_attr "mmx_isa" "x64_avx")
   (symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
 (eq_attr "mmx_isa" "x64_noavx")
   (symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")

We added SSE emulation to MMX patterns and disabled MMX alternatives with
TARGET_MMX_WITH_SSE.

Most MMX instructions have equivalent SSE versions, and the results of some
SSE versions need to be reshuffled into the right order for MMX.  There are
a couple of tricky cases:

1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
mask operand and handle unmapped bits 64:127 at memory address by
adjusting source and mask operands together with memory address.

2. MMX movntq is emulated with SSE2 DImode movnti, which is available
in 64-bit mode.

3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
SSE emulation must clear the bit 4 in the shuffle control mask.

4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
the upper 64 bits of destination XMM register.

Tests are also added to check each SSE emulation of MMX intrinsics.

With SSE emulation in 64-bit mode, 8-byte vectorizer is enabled with SSE2.

There are no regressions on i686 and x86-64.  For x86-64, GCC is also
tested with

--with-arch=native --with-cpu=native

on AVX2 and AVX512F machines.

H.J. Lu (40):
  i386: Allow MMX register modes in SSE registers
  i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
  i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
  i386: Emulate MMX plusminus/sat_plusminus with SSE
  i386: Emulate MMX mulv4hi3 with SSE
  i386: Emulate MMX smulv4hi3_highpart with SSE
  i386: Emulate MMX mmx_pmaddwd with SSE
  i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE
  i386: Emulate MMX <any_logic><mode>3 with SSE
  i386: Emulate MMX mmx_andnot<mode>3 with SSE
  i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE
  i386: Emulate MMX vec_dupv2si with SSE
  i386: Emulate MMX pshufw with SSE
  i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
  i386: Emulate MMX sse_cvtpi2ps with SSE
  i386: Emulate MMX mmx_pextrw with SSE
  i386: Emulate MMX mmx_pinsrw with SSE
  i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
  i386: Emulate MMX mmx_pmovmskb with SSE
  i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
  i386: Emulate MMX maskmovq with SSE2 maskmovdqu
  i386: Emulate MMX mmx_uavgv8qi3 with SSE
  i386: Emulate MMX mmx_uavgv4hi3 with SSE
  i386: Emulate MMX mmx_psadbw with SSE
  i386: Emulate MMX movntq with SSE2 movntidi
  i386: Emulate MMX umulv1siv1di3 with SSE2
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>wv4hi3 with SSE
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>dv2si3 with SSE
  i386: Emulate MMX ssse3_pmaddubsw with SSE
  i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
  i386: Emulate MMX pshufb with SSE version
  i386: Emulate MMX ssse3_psign<mode>3 with SSE
  i386: Emulate MMX ssse3_palignrdi with SSE
  i386: Emulate MMX abs<mode>2 with SSE
  i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE
  i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE
  i386: Allow MMX intrinsic emulation with SSE
  i386: Add tests for MMX intrinsic emulations with SSE
  i386: Also enable SSSE3 __m64 tests in 64-bit mode
  i386: Enable 8-byte vectorizer for TARGET_MMX_WITH_SSE

 gcc/config/i386/constraints.md|   6 +
 gcc/config/i386/i386-builtin.def  | 126 +--
 gcc/config/i386/i386-protos.h |   4 +
 gcc/config/i386/i386.c| 208 +++-
 gcc/config/i386/i386.h|   5 +
 gcc/config/i386/i386.md   |  12 +
 gcc/config/i386/mmintrin.h|  10 +-
 gcc/config/i386/mmx.md| 980 --
 gcc/config/i386/sse.md| 365 +--
 gcc/config/i386/xmmintrin.h   |  61 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c   |   2 +-
 gcc/testsuite/gcc.target/i386/mmx-vals.h  |  77 ++
 gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr89028-1.c |  10 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c   |  42 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c   |  39 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c   |  41 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c   |  40 +
 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c   |  30 +
 

[PR fortran/88299, patch] - [F18] COMMON in a legacy module produces bogus warnings in dependent code

2019-02-11 Thread Harald Anlauf
The attached patch moves the check for this F2018 obsolescent feature
to a better place where the warning is only emitted when the COMMON is
declared.  No warning should be emitted when such a legacy module is
simply used.

Regtested on x86_64-pc-linux-gnu.

OK for trunk?

Thanks,
Harald

2019-02-11  Harald Anlauf  

PR fortran/88299
* resolve.c (resolve_common_blocks, resolve_common_vars): Move
check for obsolescent COMMON feature in F2018 to a better place.

2019-02-11  Harald Anlauf  

PR fortran/88299
* gfortran.dg/pr88299.f90: New test.

Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c   (revision 268778)
+++ gcc/fortran/resolve.c   (working copy)
@@ -940,7 +940,11 @@
 have been ignored to continue parsing.
 We do the checks again here.  */
   if (!csym->attr.use_assoc)
-   gfc_add_in_common (&csym->attr, csym->name, &common_block->where);
+   {
+ gfc_add_in_common (&csym->attr, csym->name, &common_block->where);
+ gfc_notify_std (GFC_STD_F2018_OBS, "COMMON block at %L",
+ &common_block->where);
+   }
 
   if (csym->value || csym->attr.data)
{
@@ -998,10 +1002,6 @@
 
   resolve_common_vars (common_root->n.common, true);
 
-  if (!gfc_notify_std (GFC_STD_F2018_OBS, "COMMON block at %L",
-  &common_root->n.common->where))
-return;
-
   /* The common name is a global name - in Fortran 2003 also if it has a
  C binding name, since Fortran 2008 only the C binding name is a global
  identifier.  */
Index: gcc/testsuite/gfortran.dg/pr88299.f90
===
--- gcc/testsuite/gfortran.dg/pr88299.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr88299.f90   (working copy)
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-options "-std=f2018" }
+!
+! PR 85839: [F18] COMMON in a legacy module produces bogus warnings
+!   in dependent code
+
+module legacy
+  integer :: major, n
+  common /version/ major  ! { dg-warning "obsolescent feature" }
+  public  :: n
+  private
+end module legacy
+
+module mod1
+  use legacy, only: n ! No warning expected here
+end module mod1


Re: [PATCH] Updated patches for the port of gccgo to GNU/Hurd

2019-02-11 Thread Svante Signell
On Mon, 2019-02-11 at 10:27 -0800, Ian Lance Taylor wrote:
> On Mon, Feb 11, 2019 at 3:10 AM Svante Signell 
> wrote:
> > On Sun, 2019-02-10 at 22:08 -0800, Ian Lance Taylor wrote:
> > > On Sun, Feb 10, 2019 at 3:41 AM Svante Signell 
> > > wrote:
> > > > On Sat, 2019-02-09 at 23:57 +0100, Svante Signell wrote:
> > > > > On Sat, 2019-02-09 at 14:40 -0800, Ian Lance Taylor wrote:
> > > > > > On Fri, Feb 8, 2019 at 3:07 PM Matthias Klose 
> > > > > > wrote:
> > > > > > > On 07.02.19 06:04, Ian Lance Taylor wrote:
> > > > > > What are the lines before that in the log?  For some reason libtool
> > > > > > is
> > > > > > being invoked with no source files.  The lines before the failing
> > > > > > line
> > > > > > should show an invocation of match.sh that determines the source
> > > > > > files.
> > > > > 
> > > > > Thanks for your job upstreaming the patches!
> > > > > 
> > > > > I've found some problems. Current problem is with the mksysinfo.sh
> > > > > patch.
> > > > > But there are some other things missing. New patches will be submitted
> > > > > tomorrow.
> > > > 
> > > > Attached are three additional patches needed to build libgo on GNU/Hurd:
> > > > src_libgo_mksysinfo.sh.diff
> > > > src_libgo_go_syscall_wait.c.diff
> > > > src_libgo_testsuite_gotest.diff
> > > > 
> > > > For the first patch, src_libgo_mksysinfo.sh.diff, I had to go back to
> > > > the
> > > > old version, using sed -i -e. As written now ${fsid_to_dev} expands to
> > > > fsid_to_dev='-e '\''s/st_fsid/Dev/'\''' resulting in: "sed: -e
> > > > expression
> > > > #4, char 1: unknown command: `''". Unfortunately, I have not yet been
> > > > able
> > > > to modify the expansion omitting the single qoutes around the shell
> > > > variable.
> > > 
> > > I'm sorry, I don't want to use "sed -i".  That loses the original file
> > > and makes it harder to reconstruct what has happened.
> > 
> > What to do then?
> > 
> > > > The second patch, src_libgo_go_syscall_wait.c.diff, is needed since
> > > > WCONTINUED is not defined and is needed for WIFCONTINUED to be defined
> > > > in
> > > > wait.h.
> > > 
> > > I don't understand that.  <sys/wait.h> is a system header file.  Are
> > > you saying that it is impossible to use <sys/wait.h> and WIFCONTINUED
> > > unless your source code does a #define WCONTINUED before #include'ing
> > > <sys/wait.h>?  That seems like a bug in the Hurd library code.
> > 
> > The problem is that WCONTINUED is not defined in /usr/include/i386-
> > gnu/bits/waitflags.h on Hurd. Only WNOHANG and WUNTRACED are. That causes
> > WIFCONTINUED not to be defined in /usr/include/i386-gnu/bits/waitstatus.h.
> > As
> > WCONTINUED is not defined, I assume that WIFCONTINUED is not supported.
> > 
> > From waitpid(2):
> > WCONTINUED (since Linux 2.6.10)
> >also return if a stopped child has been resumed by delivery of SIGCONT.
> > 
> > @Samuel: more info?
> > 
> > I think that the call to WIFCONTINUED in libgo/go/syscall/wait.c
> > _Bool
> > Continued (uint32_t *w)
> > {
> >   return WIFCONTINUED (*w) != 0;
> > }
> > 
> > has to be omitted somehow for Hurd.
> 
> It sound like the right fix is to use #ifdef WIFCONTINUED in
> syscall/wait.c.  If WIFCONTINUED is not defined, the Continued
> function should always return 0.

I've got some ideas on how to solve the mksysinfo.sh problem. I just don't have
time to try it out now. The idea is:
fsid_to_dev='s/st_dev/Dev/'
if grep 'define st_dev st_fsid' gen-sysinfo.go > /dev/null 2>&1; then
fsid_to_dev='s/st_fsid/Dev/'
...
remove: -e 's/st_dev/Dev/' \
add:-e ${fsid_to_dev} \


I can also easily submit a patch for WIFCONTINUED returning 0. Problem is I'll
be AFK for the next week. Maybe this can wait, or you can find a solution?
Regarding a comm option for ps, Samuel is the best source.
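For what it's worth, the #ifdef guard Ian suggests could look roughly like this (a sketch against libgo's syscall/wait.c, not a tested patch):

```c
#include <stdint.h>
#include <sys/wait.h>

/* Sketch: guard the WIFCONTINUED use so that targets which do not
   define WCONTINUED (such as GNU/Hurd) still compile; Continued
   then simply always reports false.  */
_Bool
Continued (uint32_t *w)
{
#ifdef WIFCONTINUED
  return WIFCONTINUED (*w) != 0;
#else
  return 0;
#endif
}
```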

Thanks!



Re: arm access to stack slot out of allocated area

2019-02-11 Thread Ramana Radhakrishnan
On Mon, Feb 11, 2019 at 4:48 PM Olivier Hainque  wrote:
>
> Hi Wilco,
>
> > On 8 Feb 2019, at 22:35, Wilco Dijkstra  wrote:
>
> > So I think we need to push much harder on getting rid of obsolete stuff and
> > avoid people encountering these nasty issues.
>
> Numbers I just received indicate that we can legitimately head
> in this direction for VxWorks as well (move towards VxWorks 7 only
> ports, AAPCS based).
>
> Good news :)
>

Yay !

Ramana


Re: [PATCH][ARM] Fix PR89222

2019-02-11 Thread Ramana Radhakrishnan
On Mon, Feb 11, 2019 at 5:35 PM Wilco Dijkstra  wrote:
>
> The GCC optimizer can generate symbols with non-zero offset from simple
> if-statements. Bit zero is used for the Arm/Thumb state bit, so relocations
> with offsets fail if it changes bit zero and the relocation forces bit zero
> to true.  The fix is to disable offsets on function pointer symbols.
>
> ARMv5te bootstrap OK, regression tests pass. OK for commit?

Interesting bug.  armv5te-linux bootstrap?  Can you share your --target
and --with-arch flags?

>
> ChangeLog:
> 2019-02-06  Wilco Dijkstra  
>
> gcc/
> PR target/89222
> * config/arm/arm.md (movsi): Use arm_cannot_force_const_mem
> to decide when to split off an offset from a symbol.
> * config/arm/arm.c (arm_cannot_force_const_mem): Disallow offsets
> in function symbols.
> * config/arm/arm-protos.h (arm_cannot_force_const_mem): Add.
>
> testsuite/
> PR target/89222
> * gcc.target/arm/pr89222.c: Add new test.
>
> --
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 
> 79ede0db174fcce87abe8b4d18893550d4c7e2f6..0bedbe5110853617ecf7456bbaa56b1405fb65dd
>  100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -184,6 +184,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, 
> tree, rtx, tree);
>  extern bool arm_pad_reg_upward (machine_mode, tree, int);
>  #endif
>  extern int arm_apply_result_size (void);
> +extern bool arm_cannot_force_const_mem (machine_mode, rtx);
>
>  #endif /* RTX_CODE */
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 
> c4c9b4a667100d81d918196713e40b01ee232ee2..ccd4211045066d8edb89dd4c23d554517639f8f6
>  100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -178,7 +178,6 @@ static void arm_internal_label (FILE *, const char *, 
> unsigned long);
>  static void arm_output_mi_thunk (FILE *, tree, HOST_WIDE_INT, HOST_WIDE_INT,
>  tree);
>  static bool arm_have_conditional_execution (void);
> -static bool arm_cannot_force_const_mem (machine_mode, rtx);
>  static bool arm_legitimate_constant_p (machine_mode, rtx);
>  static bool arm_rtx_costs (rtx, machine_mode, int, int, int *, bool);
>  static int arm_address_cost (rtx, machine_mode, addr_space_t, bool);
> @@ -8936,15 +8935,20 @@ arm_legitimate_constant_p (machine_mode mode, rtx x)
>
>  /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
>
> -static bool

Let's keep this static ...

> +bool
>  arm_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
>rtx base, offset;
> +  split_const (x, , );
>
> -  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P)
> +  if (GET_CODE (base) == SYMBOL_REF)

Isn't there a SYMBOL_REF_P predicate for this ?

>  {
> -  split_const (x, , );
> -  if (GET_CODE (base) == SYMBOL_REF
> +  /* Function symbols cannot have an offset due to the Thumb bit.  */
> +  if ((SYMBOL_REF_FLAGS (base) & SYMBOL_FLAG_FUNCTION)
> + && INTVAL (offset) != 0)
> +   return true;
> +

Can we look to allow any even offset, i.e. anything with bit 0 set
to 0?  Could you please file an enhancement request on binutils for
both gold and ld to catch the linker warning case?  I suspect we are
looking for addends which have the lower bit set on function symbols?


> +  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P
>   && !offset_within_block_p (base, INTVAL (offset)))
> return true;
>  }

this looks ok.

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 
> aa759624f8f617576773aa75fd6239d6e06e8a13..00fccd964a86dd814f15e4a1fdf5b47173a3ee3f
>  100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -5981,17 +5981,13 @@ (define_expand "movsi"
>  }
>  }
>
> -  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P)
> +  if (arm_cannot_force_const_mem (SImode, operands[1]))

Firstly (targetm.cannot_force_const_mem (...)) please instead of
arm_cannot_force_const_mem, then that can remain static.  Let's look
to use the targetm interface instead of direct calls here. We weren't
hitting this path for non-vxworks code, however now we do so if
arm_tls_referenced_p is true at the end of arm_cannot_force_const_mem
which means that we could well have a TLS address getting spat out or
am I mis-reading something ?

This is my main concern with this patch ..

>  {
>split_const (operands[1], , );

> -  if (GET_CODE (base) == SYMBOL_REF
> - && !offset_within_block_p (base, INTVAL (offset)))
> -   {
> - tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
> - emit_move_insn (tmp, base);
> - emit_insn (gen_addsi3 (operands[0], tmp, offset));
> - DONE;
> -   }
> +  tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
> +  emit_move_insn (tmp, base);
> +  emit_insn (gen_addsi3 (operands[0], tmp, offset));
> +  

Re: [PATCH] fix ICE in __builtin_has_attribute (PR 88383 and 89288)

2019-02-11 Thread Martin Sebor

On 2/11/19 1:23 PM, Jakub Jelinek wrote:

On Mon, Feb 11, 2019 at 12:20:42PM -0700, Martin Sebor wrote:

--- gcc/c-family/c-attribs.c(revision 268774)
+++ gcc/c-family/c-attribs.c(working copy)
@@ -4032,8 +4032,12 @@ validate_attribute (location_t atloc, tree oper, t
  
if (TYPE_P (oper))

  tmpdecl = build_decl (atloc, TYPE_DECL, tmpid, oper);
+  else if (DECL_P (oper))
+tmpdecl = build_decl (atloc, TREE_CODE (oper), tmpid, TREE_TYPE (oper));
+  else if (EXPR_P (oper))
+tmpdecl = build_decl (atloc, TYPE_DECL, tmpid, TREE_TYPE (oper));
else
-tmpdecl = build_decl (atloc, TREE_CODE (oper), tmpid, TREE_TYPE (oper));
+return false;
  
/* Temporarily clear CURRENT_FUNCTION_DECL to make decl_attributes

   believe the DECL declared above is at file scope.  (See bug 87526.)  */


Why do you do this kind of validation at all?  You do compare the arguments
later on in the caller, why isn't that sufficient?  Creating some decl (and
ignoring for that whether the attribute is decl_required, type_required and/or
function_type_required) is a sure way to get many warnings, even if the
arguments are reasonable.



The snippet above deals with validating an attribute with one or
more arguments in the context where it's being queried, to make
sure the whole attribute specification makes any sense at all.
It's meant to catch gross mistakes like

  __builtin_has_attribute (func, alloc_size ("1"));

but it's far from perfect.

The difference in the example you asked about in your other mail
can be seen here:

  const int i = 1;

  enum {
e = __builtin_has_attribute (1 + 0, alloc_size ("foo")),
f = __builtin_has_attribute (i + 1, alloc_size ("bar"))
  };

Because the EXPR_P (oper) test fails for the first enumerator
the first invalid attribute specification is not diagnosed.  But
because it succeeds for (i + 1), the second specification triggers
a warning about alloc_size being only applicable to function types.
I suppose this could be improved by handling constants the same as
expressions.

But this is far from the only limitation.  Attributes with no
arguments don't even make it this far.  Because the validation
depends on decl_attributes to detect these mistakes and that
function doesn't distinguish success from failure, the validation
succeeds even for non-sensical attributes.  But it should never
cause the built-in to give a false positive.  There are at least
a couple of FIXMEs in the code where I would like to improve
the validation in GCC 10.  But none of them is inherent in
the design of the built-in or serious enough to compromise its
usefulness.

Martin


Re: [PATCH][ARM] Fix PR89222

2019-02-11 Thread Alexander Monakov
On Mon, 11 Feb 2019, Wilco Dijkstra wrote:
> > With Gold linker this is handled correctly.  So it looks to me like a
> > bug in BFD linker, where it ignores any addend (not just +1/-1) when
> > resolving a relocation against a Thumb function.
> 
> If the Gold linker doesn't fail that means Gold has a serious bug in the way
> it handles Thumb relocations. Can you elaborate, does it do S+A+1 rather than
> (S+A) | 1 as the ARM-ELF spec requires?

Apologies - it appears I might have mistyped something, as re-trying my tests
shows that both linkers properly implement the required '(S+A) | 1'.  I can't
reproduce any linker bug I suspected.

It seems odd to me that the spec requires '(S+A) | T' instead of the (imho
more intuitive) '(S|T) + A', but apart from the missing diagnostic from the
linkers, it seems they do as they must and GCC was at fault.

(perhaps it's okay to allow addends with low bit zero though, instead of
allowing only zero addends as your patch does?)
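For the record, the difference between the two formulas is easy to demonstrate with made-up addresses (the numbers below are purely hypothetical):

```c
#include <stdint.h>

/* '(S + A) | T' per the ARM-ELF spec vs. the seemingly more intuitive
   '(S | T) + A', for a Thumb function (T = 1).  With an odd addend A,
   the spec's rule silently absorbs the +1 into the state bit, while
   the naive rule yields an even address with the Thumb bit lost.  */
static uint32_t
reloc_abi (uint32_t S, uint32_t A, uint32_t T)
{
  return (S + A) | T;
}

static uint32_t
reloc_naive (uint32_t S, uint32_t A, uint32_t T)
{
  return (S | T) + A;
}
```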

Thanks for clearing this up!
Alexander

Re: [PATCH] fix ICE in __builtin_has_attribute (PR 88383 and 89288)

2019-02-11 Thread Jakub Jelinek
On Mon, Feb 11, 2019 at 12:20:42PM -0700, Martin Sebor wrote:
> --- gcc/c-family/c-attribs.c  (revision 268774)
> +++ gcc/c-family/c-attribs.c  (working copy)
> @@ -4032,8 +4032,12 @@ validate_attribute (location_t atloc, tree oper, t
>  
>if (TYPE_P (oper))
>  tmpdecl = build_decl (atloc, TYPE_DECL, tmpid, oper);
> +  else if (DECL_P (oper))
> +tmpdecl = build_decl (atloc, TREE_CODE (oper), tmpid, TREE_TYPE (oper));
> +  else if (EXPR_P (oper))
> +tmpdecl = build_decl (atloc, TYPE_DECL, tmpid, TREE_TYPE (oper));
>else
> -tmpdecl = build_decl (atloc, TREE_CODE (oper), tmpid, TREE_TYPE (oper));
> +return false;
>  
>/* Temporarily clear CURRENT_FUNCTION_DECL to make decl_attributes
>   believe the DECL declared above is at file scope.  (See bug 87526.)  */

Why do you do this kind of validation at all?  You do compare the arguments
later on in the caller, why isn't that sufficient?  Creating some decl (and
ignoring for that whether the attribute is decl_required, type_required and/or
function_type_required) is a sure way to get many warnings, even if the
arguments are reasonable.

Jakub


Re: [C++ PATCH] Fix std::is_constant_evaluated() in non-type template parameters (PR c++/88977)

2019-02-11 Thread Jason Merrill

On 2/8/19 6:18 PM, Jakub Jelinek wrote:

Hi!

Non-type template arguments are constant-expression in the grammar and thus
manifestly constant-evaluated.
For e.g. class templates, convert_nontype_argument is called with
tf_warning_or_error and so while we called in the below spots
maybe_constant_value without manifestly_const_eval=true, there is a
   if (TREE_CODE (expr) != INTEGER_CST
   && !value_dependent_expression_p (expr))
 {
   if (complain & tf_error)
 {
   int errs = errorcount, warns = warningcount + werrorcount;
   if (!require_potential_constant_expression (expr))
 expr = error_mark_node;
   else
 expr = cxx_constant_value (expr);
later on and cxx_constant_value will do the manifestly_const_eval=true.
On the testcase below with function template, complain is tf_none and
so we only call that maybe_constant_value and not cxx_constant_value.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?


OK.


Re: C++ PATCH for c++/89212 - ICE converting nullptr to pointer-to-member-function

2019-02-11 Thread Jason Merrill

On 2/11/19 2:21 PM, Marek Polacek wrote:

On Fri, Feb 08, 2019 at 05:37:00PM -0500, Jason Merrill wrote:

On 2/8/19 12:21 PM, Marek Polacek wrote:

r256999 removed early bailout for pointer-to-member-function types, so we
now try to tsubst each element of a pointer-to-member-function CONSTRUCTOR.

That's fine but the problem here is that we end up converting a null pointer
to pointer-to-member-function type and that crashes in fold_convert:

16035 case INTEGER_CST:
16039   {
16040 /* Instantiate any typedefs in the type.  */
16041 tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
16042 r = fold_convert (type, t);

It seems obvious to use cp_fold_convert which handles TYPE_PTRMEM_P, but
that then ICEs too, the infamous "canonical types differ for identical types":
type is

struct
{
void A:: (struct A *) * __pfn;
long int __delta;
}

and the type of "0" is "void A:: (struct A *) *".  These types are
structurally equivalent but have different canonical types.  (What's up
with that, anyway?  It seems OK that the canonical type of the struct is
the struct itself and that the canonical type of the pointer is the pointer
itself.)

That could be handled in cp_fold_convert: add code to convert an integer_zerop 
to
TYPE_PTRMEMFUNC_P.  Unfortunately the 0 is not null_ptr_cst_p because it's got
a pointer type.

Or just don't bother substituting null_member_pointer_value_p and avoid the
above.

Bootstrapped/regtested on x86_64-linux, ok for trunk and 8?

2019-02-08  Marek Polacek  

PR c++/89212 - ICE converting nullptr to pointer-to-member-function.
* pt.c (tsubst_copy_and_build) : Return early for
null member pointer value.

* g++.dg/cpp0x/nullptr40.C: New test.

diff --git gcc/cp/pt.c gcc/cp/pt.c
index b8fbf4046f0..acc2d8f1feb 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -19251,6 +19251,9 @@ tsubst_copy_and_build (tree t,
   looked up by digest_init.  */
process_index_p = !(type && MAYBE_CLASS_TYPE_P (type));
+   if (null_member_pointer_value_p (t))
+ RETURN (t);


I would expect this to do the wrong thing if type is different from
TREE_TYPE (t).  Can we get here for a dependent PMF type like T (A::*)()?
If not, let's assert that they're the same.  Otherwise, maybe cp_convert
(type, nullptr_node)?


Yup, that's a concern.  But I'm not seeing any ICEs with the assert added and a
dependent PMF as in the new testcase.  And it seems we get a conversion error
if the types of the PMFs don't match.  If I'm wrong, this would be easy to
fix anyway.

Bootstrapped/regtested on x86_64-linux, ok for trunk?


OK.

Jason


Re: [PATCH 14/43] i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE

2019-02-11 Thread Uros Bizjak
On Mon, Feb 11, 2019 at 8:08 PM H.J. Lu  wrote:
>
> On Sun, Feb 10, 2019 at 2:48 AM Uros Bizjak  wrote:
> >
> > On 2/10/19, H.J. Lu  wrote:
> > > Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.
> > >
> > >   PR target/89021
> > >   * config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation.
> > >   (sse_cvttps2pi): Likewise.
> >
> > It looks to me that this description is wrong. We don't have V4SF
> > modes here, but V2SF, so we have to fake 64bit load in case of MMX.
> > The cvtps2dq will access memory in true 128bit width, so this is
> > wrong.
> >
> > We have to fix the description to not fake wide mode.
> >
>
> What do you propose to implement
>
> __m64 _mm_cvtps_pi32 (__m128 __A);

Hm...

In your original patch, we *do* have V4SF memory access, but the
original insn accesses it in __m64 mode. This should be OK, but then
accessing this memory in __m128 mode should also be OK. So, on a more
detailed look, the original patch looks OK to me. Luckily, a false
alarm...

>
> We also have
>
> (define_insn "sse2_cvtps2pd"
>   [(set (match_operand:V2DF 0 "register_operand" "=v")
> (float_extend:V2DF
>   (vec_select:V2SF
> (match_operand:V4SF 1 "vector_operand" "vm")
> (parallel [(const_int 0) (const_int 1)]]
>   "TARGET_SSE2 && "
>   "%vcvtps2pd\t{%1, %0|%0, %q1}"
>
> These aren't new problems introduced by my MMX work.

This one is not problematic, since the instruction accesses memory in
__m64 mode, which is narrower than V4SFmode.

Uros.


Re: Trivial C++ PATCH to remove commented code

2019-02-11 Thread Jason Merrill

OK.

On 2/11/19 2:27 PM, Marek Polacek wrote:

This I don't like.

Obvious, but ok?

2019-02-11  Marek Polacek  

* typeck2.c (digest_init_r): Remove commented code.

--- gcc/cp/typeck2.c
+++ gcc/cp/typeck2.c
@@ -1099,7 +1099,6 @@ digest_init_r (tree type, tree init, int nested, int 
flags,
  
tree typ1 = TYPE_MAIN_VARIANT (TREE_TYPE (type));

if (char_type_p (typ1)
- /*&& init */
  && TREE_CODE (stripped_init) == STRING_CST)
{
  tree char_type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (init)));





Re: [PATCH][ARM] Fix PR89222

2019-02-11 Thread Wilco Dijkstra
Hi Alexander,

> Just to be sure the issue is analyzed properly: if it's certain that this 
> usage
> is not allowed, shouldn't the linker produce a diagnostic instead of silently
> concealing the issue?

The ABI doesn't require this but yes a linker could report a warning if the
addend of a function symbol has bit 0 set.

> With Gold linker this is handled correctly.  So it looks to me like a
> bug in BFD linker, where it ignores any addend (not just +1/-1) when
> resolving a relocation against a Thumb function.

If the Gold linker doesn't fail that means Gold has a serious bug in the way
it handles Thumb relocations. Can you elaborate, does it do S+A+1 rather than
(S+A) | 1 as the ARM-ELF spec requires?

Cheers,
Wilco

Re: [PATCH 13/43] i386: Emulate MMX pshufw with SSE

2019-02-11 Thread Uros Bizjak
On Mon, Feb 11, 2019 at 7:09 PM H.J. Lu  wrote:
>
> On Sun, Feb 10, 2019 at 3:16 AM Uros Bizjak  wrote:
> >
> > On 2/10/19, H.J. Lu  wrote:
> > > Emulate MMX pshufw with SSE.  Only SSE register source operand is allowed.
> > >
> > >   PR target/89021
> > >   * config/i386/mmx.md (mmx_pshufw_1): Add SSE emulation.
> > >   (*vec_dupv4hi): Likewise.
> > >   emulation.
> > > ---
> > >  gcc/config/i386/mmx.md | 33 +
> > >  1 file changed, 21 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > > index 1ee51c5deb7..dc81d7f45df 100644
> > > --- a/gcc/config/i386/mmx.md
> > > +++ b/gcc/config/i386/mmx.md
> > > @@ -1364,7 +1364,8 @@
> > >[(match_operand:V4HI 0 "register_operand")
> > > (match_operand:V4HI 1 "nonimmediate_operand")
> > > (match_operand:SI 2 "const_int_operand")]
> > > -  "TARGET_SSE || TARGET_3DNOW_A"
> > > +  "((TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE)
> > > +   || TARGET_3DNOW_A"
> >
> > I think that the above condition should read
> >
> > (TARGET_MMX || TARGET_MMX_WITH_SSE) && (TARGET_SSE || TARGET_3DNOW_A)
> >
> > and with TARGET_MMX_WITH_SSE (which implies SSE2) we always use XMM
> > registers. Without SSE2, we use MMX registers, as before.
>
> Done.
>
> > >  {
> > >int mask = INTVAL (operands[2]);
> > >emit_insn (gen_mmx_pshufw_1 (operands[0], operands[1],
> > > @@ -1376,14 +1377,15 @@
> > >  })
> > >
> > >  (define_insn "mmx_pshufw_1"
> > > -  [(set (match_operand:V4HI 0 "register_operand" "=y")
> > > +  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv")
> > >  (vec_select:V4HI
> > > -  (match_operand:V4HI 1 "nonimmediate_operand" "ym")
> > > +  (match_operand:V4HI 1 "nonimmediate_operand" "ym,Yv")
> > >(parallel [(match_operand 2 "const_0_to_3_operand")
> > >   (match_operand 3 "const_0_to_3_operand")
> > >   (match_operand 4 "const_0_to_3_operand")
> > >   (match_operand 5 "const_0_to_3_operand")])))]
> > > -  "TARGET_SSE || TARGET_3DNOW_A"
> > > +  "((TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE)
> > > +   || TARGET_3DNOW_A"
> > >  {
> > >int mask = 0;
> > >mask |= INTVAL (operands[2]) << 0;
> > > @@ -1392,11 +1394,15 @@
> > >mask |= INTVAL (operands[5]) << 6;
> > >operands[2] = GEN_INT (mask);
> > >
> > > -  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
> > > +  if (TARGET_MMX_WITH_SSE)
> > > +return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
> > > +  else
> > > +return "pshufw\t{%2, %1, %0|%0, %1, %2}";
> >
> > The above should be implemented as multi-output template.
>
> I have
>
> {
>   int mask = 0;
>   mask |= INTVAL (operands[2]) << 0;
>   mask |= INTVAL (operands[3]) << 2;
>   mask |= INTVAL (operands[4]) << 4;
>   mask |= INTVAL (operands[5]) << 6;
>   operands[2] = GEN_INT (mask);
>
>   if (TARGET_MMX_WITH_SSE)
> return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
>   else
> return "pshufw\t{%2, %1, %0|%0, %1, %2}";
> }
>
> How can I build mask before multi-output template?

You are right, mask has to be adjusted before output.

Maybe we should be more explicit here with:

  switch (which_alternative)
{
case 0:
  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
case 1:
  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
default:
  gcc_unreachable ();
}

Uros.


Re: [PATCH] fix ICE in __builtin_has_attribute (PR 88383 and 89288)

2019-02-11 Thread Jakub Jelinek
On Mon, Feb 11, 2019 at 12:20:42PM -0700, Martin Sebor wrote:
> This is a repost of a patch for PR 88383 updated to also fix the just
> reported PR 89288 (the original patch only partially handles this case).
> The review of the first patch was derailed by questions about the design
> of the built-in so the fix for the ICE was never approved.  I think
> the ICEs should be fixed for GCC 9 and any open design questions should
> be dealt with independently.

Well, it is closely coupled with the design questions.

>if (TYPE_P (oper))
>  tmpdecl = build_decl (atloc, TYPE_DECL, tmpid, oper);
> +  else if (DECL_P (oper))
> +tmpdecl = build_decl (atloc, TREE_CODE (oper), tmpid, TREE_TYPE (oper));
> +  else if (EXPR_P (oper))
> +tmpdecl = build_decl (atloc, TYPE_DECL, tmpid, TREE_TYPE (oper));
>else
> -tmpdecl = build_decl (atloc, TREE_CODE (oper), tmpid, TREE_TYPE (oper));
> +return false;

The EXPR_P conditional makes no sense.  Why should __builtin_has_attribute (1 + 
1, ...)
do something (if unfolded yet) and __builtin_has_attribute (2, ...)
something different?  1 + 1 when unfolded is EXPR_P, but 2 is not.

Jakub


Trivial C++ PATCH to remove commented code

2019-02-11 Thread Marek Polacek
This I don't like.

Obvious, but ok?

2019-02-11  Marek Polacek  

* typeck2.c (digest_init_r): Remove commented code.

--- gcc/cp/typeck2.c
+++ gcc/cp/typeck2.c
@@ -1099,7 +1099,6 @@ digest_init_r (tree type, tree init, int nested, int 
flags,
 
   tree typ1 = TYPE_MAIN_VARIANT (TREE_TYPE (type));
   if (char_type_p (typ1)
- /*&& init */
  && TREE_CODE (stripped_init) == STRING_CST)
{
  tree char_type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (init)));


Re: C++ PATCH for c++/89212 - ICE converting nullptr to pointer-to-member-function

2019-02-11 Thread Marek Polacek
On Fri, Feb 08, 2019 at 05:37:00PM -0500, Jason Merrill wrote:
> On 2/8/19 12:21 PM, Marek Polacek wrote:
> > r256999 removed early bailout for pointer-to-member-function types, so we
> > now try to tsubst each element of a pointer-to-member-function CONSTRUCTOR.
> > 
> > That's fine but the problem here is that we end up converting a null pointer
> > to pointer-to-member-function type and that crashes in fold_convert:
> > 
> > 16035 case INTEGER_CST:
> > 16039   {
> > 16040 /* Instantiate any typedefs in the type.  */
> > 16041 tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
> > 16042 r = fold_convert (type, t);
> > 
> > It seems obvious to use cp_fold_convert which handles TYPE_PTRMEM_P, but
> > that then ICEs too, the infamous "canonical types differ for identical 
> > types":
> > type is
> > 
> > struct
> > {
> >void A:: (struct A *) * __pfn;
> >long int __delta;
> > }
> > 
> > and the type of "0" is "void A:: (struct A *) *".  These types are
> > structurally equivalent but have different canonical types.  (What's up
> > with that, anyway?  It seems OK that the canonical type of the struct is
> > the struct itself and that the canonical type of the pointer is the pointer
> > itself.)
> > 
> > That could be handled in cp_fold_convert: add code to convert an 
> > integer_zerop to
> > TYPE_PTRMEMFUNC_P.  Unfortunately the 0 is not null_ptr_cst_p because it's 
> > got
> > a pointer type.
> > 
> > Or just don't bother substituting null_member_pointer_value_p and avoid the
> > above.
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 8?
> > 
> > 2019-02-08  Marek Polacek  
> > 
> > PR c++/89212 - ICE converting nullptr to pointer-to-member-function.
> > * pt.c (tsubst_copy_and_build) : Return early for
> > null member pointer value.
> > 
> > * g++.dg/cpp0x/nullptr40.C: New test.
> > 
> > diff --git gcc/cp/pt.c gcc/cp/pt.c
> > index b8fbf4046f0..acc2d8f1feb 100644
> > --- gcc/cp/pt.c
> > +++ gcc/cp/pt.c
> > @@ -19251,6 +19251,9 @@ tsubst_copy_and_build (tree t,
> >looked up by digest_init.  */
> > process_index_p = !(type && MAYBE_CLASS_TYPE_P (type));
> > +   if (null_member_pointer_value_p (t))
> > + RETURN (t);
> 
> I would expect this to do the wrong thing if type is different from
> TREE_TYPE (t).  Can we get here for a dependent PMF type like T (A::*)()?
> If not, let's assert that they're the same.  Otherwise, maybe cp_convert
> (type, nullptr_node)?

Yup, that's a concern.  But I'm not seeing any ICEs with the assert added and a
dependent PMF as in the new testcase.  And it seems we get a conversion error
if the types of the PMFs don't match.  If I'm wrong, this would be easy to
fix anyway.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-02-11  Marek Polacek  

PR c++/89212 - ICE converting nullptr to pointer-to-member-function.
* pt.c (tsubst_copy_and_build) : Return early for
null member pointer value.

* g++.dg/cpp0x/nullptr40.C: New test.
* g++.dg/cpp0x/nullptr41.C: New test.

diff --git gcc/cp/pt.c gcc/cp/pt.c
index b8fbf4046f0..2682c68dcfa 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -19251,6 +19251,12 @@ tsubst_copy_and_build (tree t,
   looked up by digest_init.  */
process_index_p = !(type && MAYBE_CLASS_TYPE_P (type));
 
+   if (null_member_pointer_value_p (t))
+ {
+   gcc_assert (same_type_p (type, TREE_TYPE (t)));
+   RETURN (t);
+ }
+
n = vec_safe_copy (CONSTRUCTOR_ELTS (t));
 newlen = vec_safe_length (n);
FOR_EACH_VEC_SAFE_ELT (n, idx, ce)
diff --git gcc/testsuite/g++.dg/cpp0x/nullptr40.C 
gcc/testsuite/g++.dg/cpp0x/nullptr40.C
new file mode 100644
index 000..21c188bdb5e
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/nullptr40.C
@@ -0,0 +1,19 @@
+// PR c++/89212
+// { dg-do compile { target c++11 } }
+
+template  using enable_if_t = int;
+
+template
+struct p
+{
+template>
+p(T) { }
+p() = default;
+};
+
+struct A
+{
+p i = 1;
+void bar();
+p j;
+};
diff --git gcc/testsuite/g++.dg/cpp0x/nullptr41.C 
gcc/testsuite/g++.dg/cpp0x/nullptr41.C
new file mode 100644
index 000..54e66af2095
--- /dev/null
+++ gcc/testsuite/g++.dg/cpp0x/nullptr41.C
@@ -0,0 +1,19 @@
+// PR c++/89212
+// { dg-do compile { target c++11 } }
+
+template  using enable_if_t = int;
+
+template
+struct p
+{
+template>
+p(T) { }
+p() = default;
+};
+
+struct A
+{
+p i = 1;
+void bar();
+p j;
+};


[PATCH] fix ICE in __builtin_has_attribute (PR 88383 and 89288)

2019-02-11 Thread Martin Sebor

This is a repost of a patch for PR 88383 updated to also fix the just
reported PR 89288 (the original patch only partially handles this case).
The review of the first patch was derailed by questions about the design
of the built-in so the fix for the ICE was never approved.  I think
the ICEs should be fixed for GCC 9 and any open design questions should
be dealt with independently.

Martin

The patch for PR 88383 was originally posted last December:
  https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00337.html
PR c/88383 - ICE calling __builtin_has_attribute on a reference
PR c/89288 - ICE in tree_code_size, at tree.c:865

gcc/c-family/ChangeLog:

	PR c/88383
	PR c/89288
	* c-attribs.c (validate_attribute): Handle expressions.
	(has_attribute): Handle types referenced by expressions.
	Avoid considering array attributes in ARRAY_REF expressions.

gcc/cp/ChangeLog:

	PR c/88383
	PR c/89288
	* parser.c (cp_parser_has_attribute_expression): Handle assignment
	expressions.

gcc/testsuite/ChangeLog:

	PR c/88383
	PR c/89288
	* c-c++-common/builtin-has-attribute-4.c: Adjust expectations.
	* c-c++-common/builtin-has-attribute-6.c: New test.

Index: gcc/c-family/c-attribs.c
===
--- gcc/c-family/c-attribs.c	(revision 268774)
+++ gcc/c-family/c-attribs.c	(working copy)
@@ -4032,8 +4032,12 @@ validate_attribute (location_t atloc, tree oper, t
 
   if (TYPE_P (oper))
 tmpdecl = build_decl (atloc, TYPE_DECL, tmpid, oper);
+  else if (DECL_P (oper))
+tmpdecl = build_decl (atloc, TREE_CODE (oper), tmpid, TREE_TYPE (oper));
+  else if (EXPR_P (oper))
+tmpdecl = build_decl (atloc, TYPE_DECL, tmpid, TREE_TYPE (oper));
   else
-tmpdecl = build_decl (atloc, TREE_CODE (oper), tmpid, TREE_TYPE (oper));
+return false;
 
   /* Temporarily clear CURRENT_FUNCTION_DECL to make decl_attributes
  believe the DECL declared above is at file scope.  (See bug 87526.)  */
@@ -4042,7 +4046,7 @@ validate_attribute (location_t atloc, tree oper, t
   if (DECL_P (tmpdecl))
 {
   if (DECL_P (oper))
-	/* An alias cannot be a defintion so declare the symbol extern.  */
+	/* An alias cannot be a definition so declare the symbol extern.  */
 	DECL_EXTERNAL (tmpdecl) = true;
   /* Attribute visibility only applies to symbols visible from other
 	 translation units so make it "public."   */
@@ -4078,11 +4082,17 @@ has_attribute (location_t atloc, tree t, tree attr
   do
 	{
 	  /* Determine the array element/member declaration from
-	 an ARRAY/COMPONENT_REF.  */
+	 a COMPONENT_REF and an INDIRECT_REF involving a refeence.  */
 	  STRIP_NOPS (t);
 	  tree_code code = TREE_CODE (t);
-	  if (code == ARRAY_REF)
-	t = TREE_OPERAND (t, 0);
+	  if (code == INDIRECT_REF)
+	{
+	  tree op0 = TREE_OPERAND (t, 0);
+	  if (TREE_CODE (TREE_TYPE (op0)) == REFERENCE_TYPE)
+		t = op0;
+	  else
+		break;
+	}
 	  else if (code == COMPONENT_REF)
 	t = TREE_OPERAND (t, 1);
 	  else
@@ -4133,7 +4143,8 @@ has_attribute (location_t atloc, tree t, tree attr
 	}
   else
 	{
-	  atlist = TYPE_ATTRIBUTES (TREE_TYPE (expr));
+	  type = TREE_TYPE (expr);
+	  atlist = TYPE_ATTRIBUTES (type);
 	  done = true;
 	}
 
Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c	(revision 268774)
+++ gcc/cp/parser.c	(working copy)
@@ -8542,9 +8542,9 @@ cp_parser_has_attribute_expression (cp_parser *par
   cp_parser_parse_definitely (parser);
 
   /* If the type-id production did not work out, then we must be
- looking at the unary-expression production.  */
+ looking at an expression.  */
   if (!oper || oper == error_mark_node)
-oper = cp_parser_unary_expression (parser);
+oper = cp_parser_assignment_expression (parser);
 
   STRIP_ANY_LOCATION_WRAPPER (oper);
 
Index: gcc/testsuite/c-c++-common/builtin-has-attribute-4.c
===
--- gcc/testsuite/c-c++-common/builtin-has-attribute-4.c	(revision 268774)
+++ gcc/testsuite/c-c++-common/builtin-has-attribute-4.c	(working copy)
@@ -154,7 +154,8 @@ void test_packed (struct PackedMember *p)
   A (0, gpak[0].c, packed);
   A (0, gpak[1].s, packed);
   A (1, gpak->a, packed);
-  A (1, (*gpak).a[0], packed);
+  /* It's the array that's declared packed but not its elements.  */
+  A (0, (*gpak).a[0], packed);
 
   /* The following fails because in C it's represented as
INDIRECT_REF (POINTER_PLUS (NOP_EXPR (ADDR_EXPR (gpak)), ...))
@@ -164,7 +165,8 @@ void test_packed (struct PackedMember *p)
   A (0, p->c, packed);
   A (0, p->s, packed);
   A (1, p->a, packed);
-  A (1, p->a[0], packed);
+  /* It's the array that's declared packed but not its elements.  */
+  A (0, p->a[0], packed);
   /* Similar to the comment above.
A (1, *p->a, packed);  */
 }
Index: gcc/testsuite/c-c++-common/builtin-has-attribute-6.c

Re: [PATCH 14/43] i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE

2019-02-11 Thread H.J. Lu
On Sun, Feb 10, 2019 at 2:48 AM Uros Bizjak  wrote:
>
> On 2/10/19, H.J. Lu  wrote:
> > Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.
> >
> >   PR target/89021
> >   * config/i386/mmx.md (sse_cvtps2pi): Add SSE emulation.
> >   (sse_cvttps2pi): Likewise.
>
> It looks to me that this description is wrong. We don't have V4SF
> modes here, but V2SF, so we have to fake 64bit load in case of MMX.
> The cvtps2dq will access memory in true 128bit width, so this is
> wrong.
>
> We have to fix the description to not fake wide mode.
>

What do you propose to implement

__m64 _mm_cvtps_pi32 (__m128 __A);

We also have

(define_insn "sse2_cvtps2pd"
  [(set (match_operand:V2DF 0 "register_operand" "=v")
(float_extend:V2DF
  (vec_select:V2SF
(match_operand:V4SF 1 "vector_operand" "vm")
(parallel [(const_int 0) (const_int 1)]]
  "TARGET_SSE2 && "
  "%vcvtps2pd\t{%1, %0|%0, %q1}"

These aren't new problems introduced by my MMX work.

-- 
H.J.


Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-02-11 Thread Steve Ellcey
On Thu, 2019-02-07 at 18:13 +, Wilco Dijkstra wrote:
> External Email
> 
> Hi Steve,
> 
> > > After special cases you could do something like t = mask2 +
> > > (HWI_1U << shift);
> > > return t == (t & -t) to check for a valid bfi.
> > 
> > I am not sure I follow this logic and my attempts to use this did not
> > work so I kept my original code.
> 
> It's similar to the initial code in aarch64_bitmask_imm, but rather than
> adding the lowest bit to the value to verify it is a mask (val + (val & -val)),
> we use the shift instead. If the shift is exactly right, it reaches the first
> set bit of the mask.
> set bit of the mask.
> Adding the low bit to a valid mask always results in zero or a single set bit.
> The standard idiom to check that is t == (t & -t).
> 
> > > +  "bfi\t%0, %1, 0, %P2"
> > > 
> > > This could emit a width that may be 32 too large in SImode if bit 31 is 
> > > set
> > > (there is another use of %P in aarch64.md which may have the same
> > > issue).
> > 
> > I am not sure why having bit 31 set would be a problem.  Sign
> > extension?
> 
> Yes, if bit 31 is set, %P will emit 33 for a 32-bit constant which is 
> obviously wrong.
> Your patch avoids this for bfi by explicitly computing the correct value.
> 
> This looks good to me (and creates useful bfi's as expected), but I
> can't approve.
> 
> Wilco

Thanks for looking this over.  I have updated the mask check to use
your method and retested to make sure it still works.  Can one of the
aarch64 maintainers approve this patch?

2019-02-11  Steve Ellcey  

PR rtl-optimization/87763
* config/aarch64/aarch64-protos.h (aarch64_masks_and_shift_for_bfi_p):
New prototype.
* config/aarch64/aarch64.c (aarch64_masks_and_shift_for_bfi_p):
New function.
* config/aarch64/aarch64.md (*aarch64_bfi5_shift):
New instruction.
(*aarch64_bfi4_noand): Ditto.
(*aarch64_bfi4_noshift): Ditto.
(*aarch64_bfi4_noshift_alt): Ditto.

2019-02-11  Steve Ellcey  

PR rtl-optimization/87763
* gcc.target/aarch64/combine_bfxil.c: Change some bfxil checks
to bfi.
* gcc.target/aarch64/combine_bfi_2.c: New test.


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b035e35f33b..b6c0d0a8eb6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -429,6 +429,9 @@ bool aarch64_label_mentioned_p (rtx);
 void aarch64_declare_function_name (FILE *, const char*, tree);
 bool aarch64_legitimate_pic_operand_p (rtx);
 bool aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode, rtx, rtx);
+bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT,
+	unsigned HOST_WIDE_INT);
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 opt_machine_mode aarch64_sve_pred_mode (unsigned int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d7c453cdad0..a7ef952ad1b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9330,6 +9330,35 @@ aarch64_mask_and_shift_for_ubfiz_p (scalar_int_mode mode, rtx mask,
 	 & ((HOST_WIDE_INT_1U << INTVAL (shft_amnt)) - 1)) == 0;
 }
 
+/* Return true if the masks and a shift amount from an RTX of the form
+   ((x & MASK1) | ((y << SHIFT_AMNT) & MASK2)) are valid to combine into
+   a BFI instruction of mode MODE.  See *arch64_bfi patterns.  */
+
+bool
+aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode,
+   unsigned HOST_WIDE_INT mask1,
+   unsigned HOST_WIDE_INT shft_amnt,
+   unsigned HOST_WIDE_INT mask2)
+{
+  unsigned HOST_WIDE_INT t;
+
+  /* Verify that there is no overlap in what bits are set in the two masks.  */
+  if (mask1 != ~mask2)
+return false;
+
+  /* Verify that mask2 is not all zeros or ones.  */
+  if (mask2 == 0 || mask2 == HOST_WIDE_INT_M1U)
+return false;
+
+  /* The shift amount should always be less than the mode size.  */
+  gcc_assert (shft_amnt < GET_MODE_BITSIZE (mode));
+
+  /* Verify that the mask being shifted is contiguous and would be in the
+ least significant bits after shifting by shft_amnt.  */
+  t = mask2 + (HOST_WIDE_INT_1U << shft_amnt);
+  return (t == (t & -t));
+}
+
 /* Calculate the cost of calculating X, storing it in *COST.  Result
is true if the total cost of the operation has now been calculated.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b7f6fe0f135..2bbd3f1055c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5476,6 +5476,76 @@
   [(set_attr "type" "bfm")]
 )
 
+;;  Match a bfi instruction where the shift of OP3 means that we are
+;;  actually copying the least significant bits of OP3 into OP0 by way
+;;  of the AND masks and the IOR instruction.  A similar instruction
+;;  with the two parts of the IOR swapped around was never 

Re: C++ PATCH for c++/89217 - ICE with list-initialization in range-based for loop

2019-02-11 Thread Jason Merrill

On 2/7/19 6:02 PM, Marek Polacek wrote:

Since r268321 we can call digest_init even in a template, when the compound
literal isn't instantiation-dependent.


Right.  And since digest_init modifies the CONSTRUCTOR in place, that 
means the template trees are digested rather than the original parse 
trees that we try to use.  If we're going to use digest_init, we should 
probably save another CONSTRUCTOR with the original trees.


Jason


Re: [PATCH] Updated patches for the port of gccgo to GNU/Hurd

2019-02-11 Thread Samuel Thibault
Svante Signell, le lun. 11 févr. 2019 12:10:21 +0100, a ecrit:
> WCONTINUED is not defined, I assume that WIFCONTINUED is not supported.
> 
> From waitpid(2):
> WCONTINUED (since Linux 2.6.10)
>also return if a stopped child has been resumed by delivery of SIGCONT.
> 
> @Samuel: more info?

git grep WCONTINUED .
yields nothing in hurd/proc, so it's probably just not supported yet
indeed.

> > > The third patch, src_libgo_testsuite_gotest.diff, is not strictly needed,
> > > but running the tests the annoying text is displayed: "ps: comm: Unknown
> > > format spec"
> > 
> > I get that "comm" doesn't work, but the change in that patch is simply
> > incorrect.  If you don't pass "comm", the "grep sleep" will never
> > succeed.  If there is no way to support this code on Hurd then we
> > should skip it, not put in a command that can never work.
> 
> OK, let's drop that part then.
> 
> @Samuel: more info?

arg0 can be used instead.

That said, we should implement comm since it's defined by posix, it's
probably a matter of duplicating the line "Arg0" in hurd/libps/spec.c

Samuel


Re: [PATCH] Updated patches for the port of gccgo to GNU/Hurd

2019-02-11 Thread Ian Lance Taylor
On Mon, Feb 11, 2019 at 3:10 AM Svante Signell  wrote:
>
> On Sun, 2019-02-10 at 22:08 -0800, Ian Lance Taylor wrote:
> > On Sun, Feb 10, 2019 at 3:41 AM Svante Signell 
> > wrote:
> > > On Sat, 2019-02-09 at 23:57 +0100, Svante Signell wrote:
> > > > On Sat, 2019-02-09 at 14:40 -0800, Ian Lance Taylor wrote:
> > > > > On Fri, Feb 8, 2019 at 3:07 PM Matthias Klose  wrote:
> > > > > > On 07.02.19 06:04, Ian Lance Taylor wrote:
> > > > > What are the lines before that in the log?  For some reason libtool is
> > > > > being invoke with no source files.  The lines before the failing line
> > > > > should show an invocation of match.sh that determines the source
> > > > > files.
> > > >
> > > > Thanks for your job upstreaming the patches!
> > > >
> > > > I've found some problems. Current problem is with the mksysinfo.sh 
> > > > patch.
> > > > But there are some other things missing. New patches will be submitted
> > > > tomorrow.
> > >
> > > Attached are three additional patches needed to build libgo on GNU/Hurd:
> > > src_libgo_mksysinfo.sh.diff
> > > src_libgo_go_syscall_wait.c.diff
> > > src_libgo_testsuite_gotest.diff
> > >
> > > For the first patch, src_libgo_mksysinfo.sh.diff, I had to go back to the
> > > old version, using sed -i -e. As written now ${fsid_to_dev} expands to
> > > fsid_to_dev='-e '\''s/st_fsid/Dev/'\''' resulting in: "sed: -e expression
> > > #4, char 1: unknown command: `''". Unfortunately, I have not yet been able
> > > to modify the expansion omitting the single qoutes around the shell
> > > variable.
> >
> > I'm sorry, I don't want to use "sed -i".  That loses the original file
> > and makes it harder to reconstruct what has happened.
>
> What to do then?
>
> > > The second patch, src_libgo_go_syscall_wait.c.diff, is needed since
> > > WCONTINUED is not defined and is needed for WIFCONTINUED to be defined in
> > > wait.h.
> >
> > I don't understand that.  <sys/wait.h> is a system header file.  Are
> > you saying that it is impossible to use <sys/wait.h> and WIFCONTINUED
> > unless your source code does a #define WCONTINUED before #include'ing
> > <sys/wait.h>?  That seems like a bug in the Hurd library code.
>
> The problem is that WCONTINUED is not defined in /usr/include/i386-
> gnu/bits/waitflags.h on Hurd. Only WNOHANG and WUNTRACED are. That causes
> WIFCONTINUED not to be defined in /usr/include/i386-gnu/bits/waitstatus.h. As
> WCONTINUED is not defined, I assume that WIFCONTINUED is not supported.
>
> From waitpid(2):
> WCONTINUED (since Linux 2.6.10)
>also return if a stopped child has been resumed by delivery of SIGCONT.
>
> @Samuel: more info?
>
> I think that that call to WIFCONTINUED in libgo/go/syscall/wait.c
> _Bool
> Continued (uint32_t *w)
> {
>   return WIFCONTINUED (*w) != 0;
> }
>
> has to be omitted somehow for Hurd.

It sounds like the right fix is to use #ifdef WIFCONTINUED in
syscall/wait.c.  If WIFCONTINUED is not defined, the Continued
function should always return 0.

Ian


Re: [PATCH] avoid 4095/INT_MAX warning for fprintf (PR 88993)

2019-02-11 Thread Martin Sebor

Ping: https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00224.html

(This patch also handles bug 88835.)

On 2/4/19 8:58 PM, Martin Sebor wrote:

The attached patch relaxes -Wformat-overflow=2 to avoid warning about
individual directives that might (but need not) exceed the 4095 byte
limit, and about the total function output that likewise might (but
need not) exceed the INT_MAX limit.

The bug report actually requests that instead of the standard minimum
of 4095 bytes, GCC consider real libc limits, but trying to figure
out what these real limits might be (they're not documented anywhere,
AFAIK) and hardcoding them into GCC doesn't seem like a good solution.

Instead, the patch only does little more than the bare minimum to
suppress these pedantic warnings, and it only does that for the "may
exceed" cases and not for those where the size of output definitely
exceeds either limit.  Using the formatted functions to write such
large amounts of data seems more likely to be a bug than intentional,
and at level 2 issuing the warning seems appropriate unless the return
value of the function is tested.  When it is, even though exceeding
these limits is strictly undefined, it seems reasonable to assume that
a quality libc implementation will detect it and return an error (as
required by POSIX).

So with the patch, the only way to get this warning is for calls to
sprintf or to unchecked snprintf.

Martin




Re: [PATCH][GCC][AArch64] Allow any offset for SVE addressing modes before reload

2019-02-11 Thread Richard Sandiford
Tamar Christina  writes:
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 5df5a8b78439e69705e62845a4d1f86166a01894..59f03e688e58c1aab37629555c7b3f19e5075935
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3414,6 +3414,14 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm,
>  void
>  aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
>  {
> +  /* Make sure that the address is legitimate.  */
> +  if (MEM_P (dest)
> +  && !aarch64_sve_struct_memory_operand_p (dest))
> +{
> +  rtx addr = force_reg (Pmode, XEXP (dest, 0));
> +  dest = replace_equiv_address (dest, addr);
> +}
> +
>emit_insn (gen_rtx_SET (dest, gen_rtx_UNSPEC (GET_MODE (dest),
>   gen_rtvec (2, pred, src),
>   UNSPEC_MERGE_PTRUE)));

I think the same thing could happen for the src as well as dest.
The function is also used for single-vector modes, not just struct modes.

This code predated the support for "@" patterns.  Now that we have them,
it might be better to make:

(define_insn "*pred_mov"
  [(set (match_operand:SVE_ALL 0 "nonimmediate_operand" "=w, m")
(unspec:SVE_ALL
  [(match_operand: 1 "register_operand" "Upl, Upl")
   (match_operand:SVE_ALL 2 "nonimmediate_operand" "m, w")]
  UNSPEC_MERGE_PTRUE))]

and:

(define_insn_and_split "pred_mov"
  [(set (match_operand:SVE_STRUCT 0 "aarch64_sve_struct_nonimmediate_operand" 
"=w, Utx")
(unspec:SVE_STRUCT
  [(match_operand: 1 "register_operand" "Upl, Upl")
   (match_operand:SVE_STRUCT 2 
"aarch64_sve_struct_nonimmediate_operand" "Utx, w")]
  UNSPEC_MERGE_PTRUE))]

public as "@aarch64_pred_mov" ("aarch64_" so that we don't
pollute the target-independent namespace).  Then we can use something like:

  expand_operand ops[3];
  create_output_operand (&ops[0], dest, mode);
  create_input_operand (&ops[1], pred, GET_MODE (pred));
  create_input_operand (&ops[2], src, mode);
  expand_insn (code_for_aarch64_pred_mov (mode), 3, ops);

(completely untested).

Thanks,
Richard


[COMMITTED] Fix pthread errors in pr86637-2.c

2019-02-11 Thread Wilco Dijkstra
Fix test errors on targets which do not support pthreads.

Committed as obvious.

ChangeLog:
2019-02-11  Wilco Dijkstra  

PR tree-optimization/86637
* gcc.c-torture/compile/pr86637-2.c: Test pthread and graphite target.
---
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr86637-2.c 
b/gcc/testsuite/gcc.c-torture/compile/pr86637-2.c
index 
3b675eae1b685fb1f7a431bb84dc5b3dbb327177..2f69c292f7e9d205e92f797b78cebd769e0c1dc6
 100644
--- a/gcc/testsuite/gcc.c-torture/compile/pr86637-2.c
+++ b/gcc/testsuite/gcc.c-torture/compile/pr86637-2.c
@@ -1,4 +1,6 @@
-/* { dg-do compile { target fgraphite } } */
+/* { dg-do compile } */
+/* { dg-require-effective-target fgraphite } */
+/* { dg-require-effective-target pthread } */
 /* { dg-options "-floop-parallelize-all -fsave-optimization-record 
-ftree-parallelize-loops=2 -ftree-slp-vectorize" } */
 
 #include 


Re: [PATCH][ARM] Fix PR89222

2019-02-11 Thread Alexander Monakov
On Mon, 11 Feb 2019, Wilco Dijkstra wrote:

> The GCC optimizer can generate symbols with non-zero offset from simple
> if-statements. Bit zero is used for the Arm/Thumb state bit, so relocations
> with offsets fail if it changes bit zero and the relocation forces bit zero
> to true.  The fix is to disable offsets on function pointer symbols.  
> 
> ARMv5te bootstrap OK, regression tests pass. OK for commit?

Just to be sure the issue is analyzed properly: if it's certain that this usage
is not allowed, shouldn't the linker produce a diagnostic instead of silently
concealing the issue?

With Gold linker this is handled correctly.  So it looks to me like a
bug in BFD linker, where it ignores any addend (not just +1/-1) when
resolving a relocation against a Thumb function.

Alexander


Re: [PATCH 13/43] i386: Emulate MMX pshufw with SSE

2019-02-11 Thread H.J. Lu
On Sun, Feb 10, 2019 at 3:16 AM Uros Bizjak  wrote:
>
> On 2/10/19, H.J. Lu  wrote:
> > Emulate MMX pshufw with SSE.  Only SSE register source operand is allowed.
> >
> >   PR target/89021
> >   * config/i386/mmx.md (mmx_pshufw_1): Add SSE emulation.
> >   (*vec_dupv4hi): Likewise.
> > ---
> >  gcc/config/i386/mmx.md | 33 +
> >  1 file changed, 21 insertions(+), 12 deletions(-)
> >
> > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > index 1ee51c5deb7..dc81d7f45df 100644
> > --- a/gcc/config/i386/mmx.md
> > +++ b/gcc/config/i386/mmx.md
> > @@ -1364,7 +1364,8 @@
> >[(match_operand:V4HI 0 "register_operand")
> > (match_operand:V4HI 1 "nonimmediate_operand")
> > (match_operand:SI 2 "const_int_operand")]
> > -  "TARGET_SSE || TARGET_3DNOW_A"
> > +  "((TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE)
> > +   || TARGET_3DNOW_A"
>
> I think that the above condition should read
>
> (TARGET_MMX || TARGET_MMX_WITH_SSE) && (TARGET_SSE || TARGET_3DNOW_A)
>
> and with TARGET_MMX_WITH_SSE (which implies SSE2) we always use XMM
> registers. Without SSE2, we use MMX registers, as before.

Done.

> >  {
> >int mask = INTVAL (operands[2]);
> >emit_insn (gen_mmx_pshufw_1 (operands[0], operands[1],
> > @@ -1376,14 +1377,15 @@
> >  })
> >
> >  (define_insn "mmx_pshufw_1"
> > -  [(set (match_operand:V4HI 0 "register_operand" "=y")
> > +  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv")
> >  (vec_select:V4HI
> > -  (match_operand:V4HI 1 "nonimmediate_operand" "ym")
> > +  (match_operand:V4HI 1 "nonimmediate_operand" "ym,Yv")
> >(parallel [(match_operand 2 "const_0_to_3_operand")
> >   (match_operand 3 "const_0_to_3_operand")
> >   (match_operand 4 "const_0_to_3_operand")
> >   (match_operand 5 "const_0_to_3_operand")])))]
> > -  "TARGET_SSE || TARGET_3DNOW_A"
> > +  "((TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSE)
> > +   || TARGET_3DNOW_A"
> >  {
> >int mask = 0;
> >mask |= INTVAL (operands[2]) << 0;
> > @@ -1392,11 +1394,15 @@
> >mask |= INTVAL (operands[5]) << 6;
> >operands[2] = GEN_INT (mask);
> >
> > -  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
> > +  if (TARGET_MMX_WITH_SSE)
> > +return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
> > +  else
> > +return "pshufw\t{%2, %1, %0|%0, %1, %2}";
>
> The above should be implemented as multi-output template.

I have

{
  int mask = 0;
  mask |= INTVAL (operands[2]) << 0;
  mask |= INTVAL (operands[3]) << 2;
  mask |= INTVAL (operands[4]) << 4;
  mask |= INTVAL (operands[5]) << 6;
  operands[2] = GEN_INT (mask);

  if (TARGET_MMX_WITH_SSE)
return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
  else
return "pshufw\t{%2, %1, %0|%0, %1, %2}";
}

How can I build mask before multi-output template?

> >  }
> > -  [(set_attr "type" "mmxcvt")
> > +  [(set_attr "mmx_isa" "native,x64")
> > +   (set_attr "type" "mmxcvt,sselog")
> > (set_attr "length_immediate" "1")
> > -   (set_attr "mode" "DI")])
> > +   (set_attr "mode" "DI,TI")])
> >
> >  (define_insn "mmx_pswapdv2si2"
> >[(set (match_operand:V2SI 0 "register_operand" "=y")
> > @@ -1410,15 +1416,18 @@
> > (set_attr "mode" "DI")])
> >
> >  (define_insn "*vec_dupv4hi"
> > -  [(set (match_operand:V4HI 0 "register_operand" "=y")
> > +  [(set (match_operand:V4HI 0 "register_operand" "=y,Yv")
> >   (vec_duplicate:V4HI
> > (truncate:HI
> > - (match_operand:SI 1 "register_operand" "0"]
> > + (match_operand:SI 1 "register_operand" "0,Yv"]
> >"TARGET_SSE || TARGET_3DNOW_A"
>
> Here we also need "(TARGET_MMX || TARGET_MMX_WITH_SSE) &&"

Fixed.

> Uros.
>
> > -  "pshufw\t{$0, %0, %0|%0, %0, 0}"
> > -  [(set_attr "type" "mmxcvt")
> > +  "@
> > +   pshufw\t{$0, %0, %0|%0, %0, 0}
> > +   %vpshuflw\t{$0, %1, %0|%0, %1, 0}"
> > +  [(set_attr "mmx_isa" "native,x64")
> > +   (set_attr "type" "mmxcvt,sselog1")
> > (set_attr "length_immediate" "1")
> > -   (set_attr "mode" "DI")])
> > +   (set_attr "mode" "DI,TI")])
> >
> >  (define_insn_and_split "*vec_dupv2si"
> >[(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> > --
> > 2.20.1
> >
> >



-- 
H.J.


[PATCH] PR c++/89267 - change of error location.

2019-02-11 Thread Jason Merrill
My patch for 86943 on the branch removed this code, which led to a location
change on one of the diagnostics in constexpr-lambda8.C.  Removing this bit
wasn't the point of the patch, so let's put it back.

Applying to 8 branch.

* pt.c (tsubst_copy_and_build): Do still clear expr location
for instantiated thunk calls.
---
 gcc/cp/pt.c  | 8 +++-
 gcc/cp/ChangeLog | 6 ++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index aa57811d7b7..72dc1e0b569 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -18539,12 +18539,18 @@ tsubst_copy_and_build (tree t,
bool op = CALL_EXPR_OPERATOR_SYNTAX (t);
bool ord = CALL_EXPR_ORDERED_ARGS (t);
bool rev = CALL_EXPR_REVERSE_ARGS (t);
-   if (op || ord || rev)
+   bool thk = CALL_FROM_THUNK_P (t);
+   if (op || ord || rev || thk)
  {
function = extract_call_expr (ret);
CALL_EXPR_OPERATOR_SYNTAX (function) = op;
CALL_EXPR_ORDERED_ARGS (function) = ord;
CALL_EXPR_REVERSE_ARGS (function) = rev;
+   if (thk)
+ {
+   /* The thunk location is not interesting.  */
+   SET_EXPR_LOCATION (function, UNKNOWN_LOCATION);
+ }
  }
  }
 
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 27f7032652f..7e3d056dc7b 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,9 @@
+2019-02-11  Jason Merrill  
+
+   PR c++/89267 - change of error location.
+   * pt.c (tsubst_copy_and_build): Do still clear expr location
+   for instantiated thunk calls.
+
 2019-02-08  Jason Merrill  
 
PR c++/88761 - ICE with reference capture of constant.

base-commit: 161d165c044a1cc7e5d4c15358817afaf6e82f58
-- 
2.20.1



[PATCH] S/390: Reject invalid Q/R/S/T addresses after LRA

2019-02-11 Thread Ilya Leoshkevich
Bootstrapped and regtested on s390x-redhat-linux.

The previous attempt to fix PR89233 [1] went in the wrong direction of
dealing with symptoms rather than the root cause.  Since the approach
here is completely different, I'm not sending it as v2.

The following insn:

(insn (set (reg:DI %r2)
   (sign_extend:DI (mem:SI
(const:DI (plus:DI (symbol_ref:DI ("*.LC0"))
   (const_int 16)))

is correctly recognized by LRA as RIL alternative of extendsidi2
define_insn.  However, when recognition runs after LRA, it returns RXY
alternative, which is incorrect, since the offset 16 points past the
end of the *.LC0 literal pool entry.  Such addresses are normally
rejected by s390_decompose_address ().

This inconsistency confuses annotate_constant_pool_refs: the selected
alternative makes it proceed with annotation, only to find that the
annotated address is invalid, causing ICE.

This patch fixes the root cause, namely, that s390_check_qrst_address ()
behaves differently during and after LRA.

[1] https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00736.html

gcc/ChangeLog:

2019-02-11  Ilya Leoshkevich  

PR target/89233
* config/s390/s390.c (s390_decompose_address): Update comment.
(s390_check_qrst_address): Reject invalid address forms after
LRA.

gcc/testsuite/ChangeLog:

2019-02-11  Ilya Leoshkevich  

PR target/89233
* gcc.target/s390/pr89233.c: New test.
---
 gcc/config/s390/s390.c  | 10 +++---
 gcc/testsuite/gcc.target/s390/pr89233.c | 11 +++
 2 files changed, 18 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr89233.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 6a571a3e054..713973a3fd4 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -3020,7 +3020,9 @@ s390_decompose_address (rtx addr, struct s390_address 
*out)
  if (offset)
{
  /* If we have an offset, make sure it does not
-exceed the size of the constant pool entry.  */
+exceed the size of the constant pool entry.
+Otherwise we might generate an out-of-range
+displacement for the base register form.  */
  rtx sym = XVECEXP (disp, 0, 0);
  if (offset >= GET_MODE_SIZE (get_pool_mode (sym)))
return false;
@@ -3193,8 +3195,10 @@ s390_check_qrst_address (char c, rtx op, bool 
lit_pool_ok)
  generic cases below ('R' or 'T'), since reload will in fact fix
  them up.  LRA behaves differently here; we never see such forms,
  but on the other hand, we need to strictly reject every invalid
- address form.  Perform this check right up front.  */
-  if (lra_in_progress)
+ address form.  After both reload and LRA invalid address forms
+ must be rejected, because nothing will fix them up later.  Perform
+ this check right up front.  */
+  if (lra_in_progress || reload_completed)
 {
    if (!decomposed && !s390_decompose_address (op, &addr))
return 0;
diff --git a/gcc/testsuite/gcc.target/s390/pr89233.c 
b/gcc/testsuite/gcc.target/s390/pr89233.c
new file mode 100644
index 000..f572bfa08d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr89233.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-march=z13 -O1" } */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+int
+f ()
+{
+  v4si x = {0, 1, 2, 3};
+  return x[4];
+}
-- 
2.20.1



[PATCH][ARM] Fix PR89222

2019-02-11 Thread Wilco Dijkstra
The GCC optimizer can generate symbol references with a non-zero offset from
simple if-statements.  Bit zero of a function address is used as the Arm/Thumb
state bit, so relocations with offsets fail if the offset changes bit zero
while the relocation forces bit zero to true.  The fix is to disallow offsets
on function pointer symbols.

ARMv5te bootstrap OK, regression tests pass. OK for commit?

ChangeLog:
2019-02-06  Wilco Dijkstra  

gcc/
PR target/89222
* config/arm/arm.md (movsi): Use arm_cannot_force_const_mem
to decide when to split off an offset from a symbol.
* config/arm/arm.c (arm_cannot_force_const_mem): Disallow offsets
in function symbols.
* config/arm/arm-protos.h (arm_cannot_force_const_mem): Add.

testsuite/
PR target/89222
* gcc.target/arm/pr89222.c: Add new test.

--
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
79ede0db174fcce87abe8b4d18893550d4c7e2f6..0bedbe5110853617ecf7456bbaa56b1405fb65dd
 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -184,6 +184,7 @@ extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, 
tree, rtx, tree);
 extern bool arm_pad_reg_upward (machine_mode, tree, int);
 #endif
 extern int arm_apply_result_size (void);
+extern bool arm_cannot_force_const_mem (machine_mode, rtx);
 
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
c4c9b4a667100d81d918196713e40b01ee232ee2..ccd4211045066d8edb89dd4c23d554517639f8f6
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -178,7 +178,6 @@ static void arm_internal_label (FILE *, const char *, 
unsigned long);
 static void arm_output_mi_thunk (FILE *, tree, HOST_WIDE_INT, HOST_WIDE_INT,
 tree);
 static bool arm_have_conditional_execution (void);
-static bool arm_cannot_force_const_mem (machine_mode, rtx);
 static bool arm_legitimate_constant_p (machine_mode, rtx);
 static bool arm_rtx_costs (rtx, machine_mode, int, int, int *, bool);
 static int arm_address_cost (rtx, machine_mode, addr_space_t, bool);
@@ -8936,15 +8935,20 @@ arm_legitimate_constant_p (machine_mode mode, rtx x)
 
 /* Implement TARGET_CANNOT_FORCE_CONST_MEM.  */
 
-static bool
+bool
 arm_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
   rtx base, offset;
+  split_const (x, , );
 
-  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P)
+  if (GET_CODE (base) == SYMBOL_REF)
 {
-  split_const (x, , );
-  if (GET_CODE (base) == SYMBOL_REF
+  /* Function symbols cannot have an offset due to the Thumb bit.  */
+  if ((SYMBOL_REF_FLAGS (base) & SYMBOL_FLAG_FUNCTION)
+ && INTVAL (offset) != 0)
+   return true;
+
+  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P
  && !offset_within_block_p (base, INTVAL (offset)))
return true;
 }
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
aa759624f8f617576773aa75fd6239d6e06e8a13..00fccd964a86dd814f15e4a1fdf5b47173a3ee3f
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5981,17 +5981,13 @@ (define_expand "movsi"
 }
 }
 
-  if (ARM_OFFSETS_MUST_BE_WITHIN_SECTIONS_P)
+  if (arm_cannot_force_const_mem (SImode, operands[1]))
 {
   split_const (operands[1], , );
-  if (GET_CODE (base) == SYMBOL_REF
- && !offset_within_block_p (base, INTVAL (offset)))
-   {
- tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
- emit_move_insn (tmp, base);
- emit_insn (gen_addsi3 (operands[0], tmp, offset));
- DONE;
-   }
+  tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+  emit_move_insn (tmp, base);
+  emit_insn (gen_addsi3 (operands[0], tmp, offset));
+  DONE;
 }
 
   /* Recognize the case where operand[1] is a reference to thread-local
diff --git a/gcc/testsuite/gcc.target/arm/pr89222.c 
b/gcc/testsuite/gcc.target/arm/pr89222.c
new file mode 100644
index 
..d26d7df17544db8426331e67b9a36d749ec6c6d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr89222.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void g (void);
+
+void f1 (int x)
+{
+  if (x != (int) g + 3)
+return;
+  g();
+}
+
+void (*a2)(void);
+
+void f2 (void)
+{
+  a2 = &g + 3;
+}
+
+typedef void (*__sighandler_t)(int);
+void handler (int);
+
+void f3 (int x)
+{
+  __sighandler_t h = &handler;
+  if (h != (__sighandler_t) 2 && h != (__sighandler_t) 1)
+h (x);
+}
+
+/* { dg-final { scan-assembler-times {add(?:s)?\tr[0-9]+, r[0-9]+, #3} 2 } } */
+/* { dg-final { scan-assembler-not {.word\tg\+3} } } */
+/* { dg-final { scan-assembler-not {.word\thandler-1} } } */



Re: arm access to stack slot out of allocated area

2019-02-11 Thread Olivier Hainque
Hello Ramana,

> Olivier, while you are here could you also document the choices made by
> the vxworks port in terms of the ABI and how it differs from EABI ? It
> would certainly help with patch review.

Thanks for your feedback as well.

Yes, I'll add a comment and macro defs to the VxWorks
headers to state the ABI settings explicitly.

We're using the defaults from arm.h for vxworks6 today
(ABI_APCS & MASK_APCS_FRAME), which matches what the system
environment provides.

I'll stick a note mentioning the deprecation Wilco
described and pushing towards VxWorks 7.

We'll probably end up removing support entirely at
some point.

Cheers,

Olivier



Re: arm access to stack slot out of allocated area

2019-02-11 Thread Olivier Hainque
Hi Wilco,

> On 8 Feb 2019, at 22:35, Wilco Dijkstra  wrote:

> So I think we need to push much harder on getting rid of obsolete stuff and
> avoid people encountering these nasty issues.

Numbers I just received indicate that we can legitimately head
in this direction for VxWorks as well (move towards VxWorks 7 only
ports, AAPCS based).

Good news :)

Thanks for your input!




Re: [PATCH] correct comments in tree-prof/inliner-1.c

2019-02-11 Thread Jan Hubicka
> I noticed the comments in the test don't correspond to what it's
> designed to exercise: namely that the call to hot_function() is
> inlined and the call to cold_function() is not, rather than
> the other way around.
> 
> Attached is a patch that adjusts the comments.  Honza, please let
> me know if this looks correct to you.
> 
> Thanks
> Martin

> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/tree-prof/inliner-1.c: Correct comments.

This looks ok, thanks!
Honza

> 
> Index: gcc/testsuite/gcc.dg/tree-prof/inliner-1.c
> ===
> --- gcc/testsuite/gcc.dg/tree-prof/inliner-1.c(revision 268755)
> +++ gcc/testsuite/gcc.dg/tree-prof/inliner-1.c(working copy)
> @@ -28,15 +28,15 @@ main ()
>for (i = 0; i < 100; i++)
>  {
>if (a)
> -cold_function ();
> +cold_function ();   /* Should not be inlined.  */
>else
> -hot_function ();
> +hot_function ();/* Should be inlined.  */
>  }
>return 0;
>  }
>  
> -/* cold function should be inlined, while hot function should not.  
> -   Look for "cold_function () [tail call];" call statement not for the
> -   declaration or other appearances of the string in dump.  */
> +/* The call to hot_function should be inlined, while cold_function should
> +   not be.  Look for the "cold_function ();" call statement and not for
> +   its declaration or other occurrences of the string in the dump.  */
>  /* { dg-final-use { scan-tree-dump "cold_function ..;" "optimized"} } */
>  /* { dg-final-use { scan-tree-dump-not "hot_function ..;" "optimized"} } */



[PATCH] correct comments in tree-prof/inliner-1.c

2019-02-11 Thread Martin Sebor

I noticed the comments in the test don't correspond to what it's
designed to exercise: namely that the call to hot_function() is
inlined and the call to cold_function() is not, rather than
the other way around.

Attached is a patch that adjusts the comments.  Honza, please let
me know if this looks correct to you.

Thanks
Martin
gcc/testsuite/ChangeLog:

	* gcc.dg/tree-prof/inliner-1.c: Correct comments.

Index: gcc/testsuite/gcc.dg/tree-prof/inliner-1.c
===
--- gcc/testsuite/gcc.dg/tree-prof/inliner-1.c	(revision 268755)
+++ gcc/testsuite/gcc.dg/tree-prof/inliner-1.c	(working copy)
@@ -28,15 +28,15 @@ main ()
   for (i = 0; i < 100; i++)
 {
   if (a)
-cold_function ();
+cold_function ();   /* Should not be inlined.  */
   else
-hot_function ();
+hot_function ();/* Should be inlined.  */
 }
   return 0;
 }
 
-/* cold function should be inlined, while hot function should not.  
-   Look for "cold_function () [tail call];" call statement not for the
-   declaration or other appearances of the string in dump.  */
+/* The call to hot_function should be inlined, while cold_function should
+   not be.  Look for the "cold_function ();" call statement and not for
+   its declaration or other occurrences of the string in the dump.  */
 /* { dg-final-use { scan-tree-dump "cold_function ..;" "optimized"} } */
 /* { dg-final-use { scan-tree-dump-not "hot_function ..;" "optimized"} } */


Re: Make clear, when contributions will be ignored

2019-02-11 Thread Segher Boessenkool
On Mon, Feb 11, 2019 at 02:16:27PM +, Дилян Палаузов wrote:
> Hello Segher,
> 
> my question was how do you propose to proceed, so that a 
> no-reminders-for-patches-are-necessary-state is reached.
> 
> There is no relation with having infinite time or dealing with high-cost 
> low-profit patches.
> 
> Previously I raised the question of whether automating the process for
> sending reminders is a good idea.  This would save people the time of
> writing reminders.

But that would be "optimising" exactly the wrong thing!  The choke point is
patch review.  So you should make it easier to review a patch, instead of
making it easier to send in more patches.  Your complaint is that many
patches are sent in but then not reviewed, or not reviewed for a long while,
after all.

Easy to review patches are of course first and foremost patches that do the
correct thing.  But also they need to clearly say what they fix (and how),
how the patch was tested, and they should often contain testcases for the
testsuite.  Easy to review patches usually use the same style and
presentation as all other easy to review patches.


Segher


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread Jakub Jelinek
On Mon, Feb 11, 2019 at 04:56:45PM +0100, Uros Bizjak wrote:
> > Let's first define what MODE_XI means in standard_sse_constant_opcode
> > as well as in all these mov patterns for with and without AVX512VL.   
> > Without
> > a clear definition, we can't get out of this mess.
> 
> INT_MODE (OI, 32);
> INT_MODE (XI, 64);
> 
> So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
> in case of const_1, all 512 bits set.
> 
> We can load zeros with narrower instruction, (e.g. 256 bit by inherent
> zeroing of highpart in case of 128 bit xor), so TImode in this case.
> 
> Some targets prefer V4SF mode, so they will emit float xorps for zeroing
> 
> Then the introduction of AVX512F fubared everything by overloading the
> meaning of insn mode.

I don't see much changes in AVX512F here, most of the behavior has been
there already in AVX.
Most of the SSE/AVX/AVX512 instructions affect the whole register,
usually there is DEST[MAX_VL-1:VL] <- 0 at the end of each instruction.
But, using the MAX_VL to determine get_attr_mode doesn't seem really useful,
because that changes dynamically at runtime based on the actual hw, not on
what we've been compiled for.
So, I believe we want to use that VL value to determine the bitsize of the
mode corresponding to get_attr_mode.  And in that case, for
*movoi_internal_avx and *movti_internal, I believe the right mode is MODE_OI
resp. MODE_TI for AVX512VL, because e.g.
vmovdqa32 %ymm12, %ymm23
is a VL = 256 instruction, not VL = 512.  Similarly, if we want to set
%ymm25 to all ones, i.e. movoi_internal_avx, we use
vpternlogd  $0xFF, %ymm25, %ymm25, %ymm25
which is again VL = 256 instruction, so should use MODE_OI.
We'd need to use
vmovdqa32 %zmm12, %zmm23
or
vpternlogd  $0xFF, %zmm25, %zmm25, %zmm25
instructions for AVX512F without AVX512VL, but as has been discussed, this
won't really happen, because hard_regno_mode_ok refuses to allocate 256-bit
or 128-bit modes in ext sse registers.

Jakub


Re: [PATCH, v2] rs6000: Vector shift-right should honor modulo semantics

2019-02-11 Thread Bill Schmidt
On 2/11/19 10:01 AM, Segher Boessenkool wrote:

> Hi Bill,
>
> On Mon, Feb 11, 2019 at 07:36:11AM -0600, Bill Schmidt wrote:
>> 2019-02-11  Bill Schmidt  
>>
>>  * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Shift-right
>>  and shift-left vector built-ins need to include a TRUNC_MOD_EXPR
>>  for correct semantics.
>>
>> [gcc/testsuite]
>>
>> 2019-02-11  Bill Schmidt  
>>
>>  * gcc.target/powerpc/vec-sld-modulo.c: New.
>>  * gcc.target/powerpc/vec-srad-modulo.c: New.
>>  * gcc.target/powerpc/vec-srd-modulo.c: New.
> This is okay for trunk and backports.  Thanks!
>
> One comment:
>
>> +vec_sldi (vui64_t vra, const unsigned int shb)
>> +{
>> +  vui64_t lshift;
>> +  vui64_t result;
>> +
>> +  /* Note legitimate use of wrong-type splat due to expectation that only
>> + lower 6-bits are read.  */
>> +  lshift = (vui64_t) vec_splat_s8((const unsigned char)shb);
>> +
>> +  /* Vector Shift Left Doublewords based on the lower 6-bits
>> + of corresponding element of lshift.  */
>> +  result = vec_vsld (vra, lshift);
>> +
>> +  return (vui64_t) result;
>> +}
> I realise this is a testcase, and in one frame of mind it is good to test
> all different styles and bad habits.  But please never use casts that do not
> do anything in correct programs: the only thing such casts do is they shut
> up warnings in incorrect programs (including the same program after a wrong
> change).  

Agreed!  Thanks.  I wasn't careful to remove these as I modified the original
test where they were pertinent.  Will fix before committing.

Thanks,
Bill

>
>
> Segher
>



Re: [PATCH][GCC][Arm] Update tests after register allocation changes. (PR/target 88560)

2019-02-11 Thread Kyrill Tkachov



On 11/02/19 15:17, Tamar Christina wrote:

Hi all,

After the register allocator changes of r268705 we need to update a few tests
with new output.

In all cases the compiler is now generating the expected code.  Since the tests
are all float16 testcases using a hard-float ABI, we expect actual fp16
instructions to be used rather than integer loads and stores.  Because of this
we also save on some mov.f16s that were previously emitted to move between the
two.

The aapcs cases now match the f32 cases in using floating point operations.

Regtested on arm-none-eabi and no issues.

Ok for trunk?



Ok.
Thanks,
Kyrill


Thanks,
Tamar

2019-02-11  Tamar Christina  

PR middle-end/88560
* gcc.target/arm/armv8_2-fp16-move-1.c: Update assembler scans.
* gcc.target/arm/fp16-aapcs-1.c: Likewise.
* gcc.target/arm/fp16-aapcs-3.c: Likewise.

--




Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread H.J. Lu
On Mon, Feb 11, 2019 at 7:56 AM Uros Bizjak  wrote:
>
> On Mon, Feb 11, 2019 at 3:32 PM H.J. Lu  wrote:
> >
> > On Mon, Feb 11, 2019 at 5:51 AM Uros Bizjak  wrote:
> > >
> > > On Mon, Feb 11, 2019 at 2:29 PM H.J. Lu  wrote:
> > >
> > > > > No. As said, please correctly set mode to XImode in mode attribute 
> > > > > calculation.
> > > >
> > > > There is
> > > >
> > > >  switch (get_attr_type (insn))
> > > > {
> > > > case TYPE_SSELOG1:
> > > >   return standard_sse_constant_opcode (insn, operands);
> > > >
> > > > standard_sse_constant_opcode has
> > > >
> > > > else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
> > > > {
> > > >   enum attr_mode insn_mode = get_attr_mode (insn);
> > > >
> > > >   switch (insn_mode)
> > > > {
> > > > case MODE_XI:
> > > > case MODE_V8DF:
> > > > case MODE_V16SF:
> > > >   gcc_assert (TARGET_AVX512F);
> > > >   return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 
> > > > 0xFF}";
> > >
> > > If there is something wrong with standard_sse_constant_opcode, then
> > > fix the problem in the function itself. With your previous patch, you
> > > introduced a regression, and the presented fix is another kludge to
> > > fix a stack of kludges inside standard_sse_constant_opcode.
> > >
> > > Please take your time and propose some acceptable solution that would
> > > put some logic into const_0/const_1 handling. The situation is not OK
> > > and your patch makes it even worse.
> > >
> >
> > Let's first define what MODE_XI means in standard_sse_constant_opcode
> > as well as in all these mov patterns for with and without AVX512VL.   
> > Without
> > a clear definition, we can't get out of this mess.
>
> INT_MODE (OI, 32);
> INT_MODE (XI, 64);
>
> So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
> in case of const_1, all 512 bits set.
>
> We can load zeros with narrower instruction, (e.g. 256 bit by inherent
> zeroing of highpart in case of 128 bit xor), so TImode in this case.
>
> Some targets prefer V4SF mode, so they will emit float xorps for zeroing
>
> Then the introduction of AVX512F fubared everything by overloading the
> meaning of insn mode.

Exactly.

How should we use INSN mode,  MODE_XI, in standard_sse_constant_opcode
and patterns which use standard_sse_constant_opcode? 2 options:

1.  MODE_XI should only be used to check if EXT_REX_SSE_REG_P is true
for any register operand.  The operand size must be determined by the
operand itself, not by MODE_XI.  The operand encoding size should be
determined by the operand size, EXT_REX_SSE_REG_P and AVX512VL.
2. MODE_XI should be used to determine the operand encoding size.
EXT_REX_SSE_REG_P and AVX512VL should be checked for encoding
instructions.

Which way should we go?

-- 
H.J.


Re: [PATCH, v2] rs6000: Vector shift-right should honor modulo semantics

2019-02-11 Thread Segher Boessenkool
Hi Bill,

On Mon, Feb 11, 2019 at 07:36:11AM -0600, Bill Schmidt wrote:
> 2019-02-11  Bill Schmidt  
> 
>   * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Shift-right
>   and shift-left vector built-ins need to include a TRUNC_MOD_EXPR
>   for correct semantics.
> 
> [gcc/testsuite]
> 
> 2019-02-11  Bill Schmidt  
> 
>   * gcc.target/powerpc/vec-sld-modulo.c: New.
>   * gcc.target/powerpc/vec-srad-modulo.c: New.
>   * gcc.target/powerpc/vec-srd-modulo.c: New.

This is okay for trunk and backports.  Thanks!

One comment:

> +vec_sldi (vui64_t vra, const unsigned int shb)
> +{
> +  vui64_t lshift;
> +  vui64_t result;
> +
> +  /* Note legitimate use of wrong-type splat due to expectation that only
> + lower 6-bits are read.  */
> +  lshift = (vui64_t) vec_splat_s8((const unsigned char)shb);
> +
> +  /* Vector Shift Left Doublewords based on the lower 6-bits
> + of corresponding element of lshift.  */
> +  result = vec_vsld (vra, lshift);
> +
> +  return (vui64_t) result;
> +}

I realise this is a testcase, and in one frame of mind it is good to test
all different styles and bad habits.  But please never use casts that do not
do anything in correct programs: the only thing such casts do is they shut
up warnings in incorrect programs (including the same program after a wrong
change).  


Segher


Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread Uros Bizjak
On Mon, Feb 11, 2019 at 3:32 PM H.J. Lu  wrote:
>
> On Mon, Feb 11, 2019 at 5:51 AM Uros Bizjak  wrote:
> >
> > On Mon, Feb 11, 2019 at 2:29 PM H.J. Lu  wrote:
> >
> > > > No. As said, please correctly set mode to XImode in mode attribute 
> > > > calculation.
> > >
> > > There is
> > >
> > >  switch (get_attr_type (insn))
> > > {
> > > case TYPE_SSELOG1:
> > >   return standard_sse_constant_opcode (insn, operands);
> > >
> > > standard_sse_constant_opcode has
> > >
> > > else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
> > > {
> > >   enum attr_mode insn_mode = get_attr_mode (insn);
> > >
> > >   switch (insn_mode)
> > > {
> > > case MODE_XI:
> > > case MODE_V8DF:
> > > case MODE_V16SF:
> > >   gcc_assert (TARGET_AVX512F);
> > >   return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
> >
> > If there is something wrong with standard_sse_constant_opcode, then
> > fix the problem in the function itself. With your previous patch, you
> > introduced a regression, and the presented fix is another kludge to
> > fix a stack of kludges inside standard_sse_constant_opcode.
> >
> > Please take your time and propose some acceptable solution that would
> > put some logic into const_0/const_1 handling. The situation is not OK
> > and your patch makes it even worse.
> >
>
> Let's first define what MODE_XI means in standard_sse_constant_opcode
> as well as in all these mov patterns for with and without AVX512VL.   Without
> a clear definition, we can't get out of this mess.

INT_MODE (OI, 32);
INT_MODE (XI, 64);

So, XI_MODE represents 64 INTEGER bytes = 64 * 8 = 512 bit operation,
in case of const_1, all 512 bits set.

We can load zeros with narrower instruction, (e.g. 256 bit by inherent
zeroing of highpart in case of 128 bit xor), so TImode in this case.

Some targets prefer V4SF mode, so they will emit float xorps for zeroing

Then the introduction of AVX512F fubared everything by overloading the
meaning of insn mode.

Uros.


RE: [Aarch64][SVE] Vectorise sum-of-absolute-differences

2019-02-11 Thread Alejandro Martinez Vicente
> -Original Message-
> From: James Greenhalgh 
> Sent: 06 February 2019 17:42
> To: Alejandro Martinez Vicente 
> Cc: GCC Patches ; nd ; Richard
> Sandiford ; Richard Biener
> 
> Subject: Re: [Aarch64][SVE] Vectorise sum-of-absolute-differences
> 
> On Mon, Feb 04, 2019 at 07:34:05AM -0600, Alejandro Martinez Vicente
> wrote:
> > Hi,
> >
> > This patch adds support to vectorize sum of absolute differences
> > (SAD_EXPR) using SVE. It also uses the new functionality to ensure
> > that the resulting loop is masked. Therefore, it depends on
> >
> > https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00016.html
> >
> > Given this input code:
> >
> > int
> > sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n) {
> >   int sum = 0;
> >
> >   for (int i = 0; i < n; i++)
> > {
> >   sum += __builtin_abs (x[i] - y[i]);
> > }
> >
> >   return sum;
> > }
> >
> > The resulting SVE code is:
> >
> >  :
> >0:   715fcmp w2, #0x0
> >4:   5400026db.le50 
> >8:   d283mov x3, #0x0// #0
> >c:   93407c42sxtwx2, w2
> >   10:   2538c002mov z2.b, #0
> >   14:   25221fe0whilelo p0.b, xzr, x2
> >   18:   2538c023mov z3.b, #1
> >   1c:   2518e3e1ptrue   p1.b
> >   20:   a4034000ld1b{z0.b}, p0/z, [x0, x3]
> >   24:   a4034021ld1b{z1.b}, p0/z, [x1, x3]
> >   28:   0430e3e3incbx3
> >   2c:   0520c021sel z1.b, p0, z1.b, z0.b
> >   30:   25221c60whilelo p0.b, x3, x2
> >   34:   040d0420uabdz0.b, p1/m, z0.b, z1.b
> >   38:   44830402udotz2.s, z0.b, z3.b
> >   3c:   5421b.ne20   // b.any
> >   40:   2598e3e0ptrue   p0.s
> >   44:   04812042uaddv   d2, p0, z2.s
> >   48:   1e260040fmovw0, s2
> >   4c:   d65f03c0ret
> >   50:   1e2703e2fmovs2, wzr
> >   54:   1e260040fmovw0, s2
> >   58:   d65f03c0ret
> >
> > Notice how udot is used inside a fully masked loop.
> >
> > I tested this patch in an aarch64 machine bootstrapping the compiler
> > and running the checks.
> 
> This doesn't give us much confidence in SVE coverage; unless you have been
> running in an environment using SVE by default? Do you have some set of
> workloads you could test the compiler against to ensure correct operation of
> the SVE vectorization?
> 
I tested it using an SVE model and a big set of workloads, including SPEC 2000,
2006 and 2017. On the plus side, nothing got broken. But impact on performance
was very minimal (on average, a tiny gain over the whole set of workloads).

I still want this patch (and the companion dot product patch) to make it into
the compiler because they are the first steps towards vectorising workloads using
fully masked loops when the target ISA (like SVE) doesn't support masking in
all the operations.

Alejandro

> >
> > I admit it is too late to merge this into gcc 9, but I'm posting it
> > anyway so it can be considered for gcc 10.
> 
> Richard Sandiford has the call on whether this patch is OK for trunk now or
> GCC 10. With the minimal testing it has had, I'd be uncomfortable with it as a
> GCC 9 patch. That said, it is a fairly self-contained pattern for the compiler
> and it would be good to see this optimization in GCC 9.
> 
> >
> > Alejandro
> >
> >
> > gcc/Changelog:
> >
> > 2019-02-04  Alejandro Martinez  
> >
> > * config/aarch64/aarch64-sve.md (abd_3): New
> define_expand.
> > (aarch64_abd_3): Likewise.
> > (*aarch64_abd_3): New define_insn.
> > (sad): New define_expand.
> > * config/aarch64/iterators.md: Added MAX_OPP and max_opp
> attributes.
> > Added USMAX iterator.
> > * config/aarch64/predicates.md: Added aarch64_smin and
> aarch64_umin
> > predicates.
> > * tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
> > (build_vect_cond_expr): Likewise.
> >
> > gcc/testsuite/Changelog:
> >
> > 2019-02-04  Alejandro Martinez  
> >
> > * gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
> > differences.
> 



[PATCH][GCC][Arm] Update tests after register allocation changes. (PR/target 88560)

2019-02-11 Thread Tamar Christina
Hi all,

After the register allocator changes of r268705 we need to update a few tests
with new output.

In all cases the compiler is now generating the expected code.  Since the tests
are all float16 testcases using a hard-float ABI, we expect actual fp16
instructions to be used rather than integer loads and stores.  Because of this
we also save on some mov.f16s that were previously emitted to move between the
two.

The aapcs cases now match the f32 cases in using floating point operations.

Regtested on arm-none-eabi and no issues.

Ok for trunk?

Thanks,
Tamar

2019-02-11  Tamar Christina  

PR middle-end/88560
* gcc.target/arm/armv8_2-fp16-move-1.c: Update assembler scans.
* gcc.target/arm/fp16-aapcs-1.c: Likewise.
* gcc.target/arm/fp16-aapcs-3.c: Likewise.

-- 
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index 56d87eb6f716718595dc6acdf0744b1d9ecf4a42..2321dd38cc6d7a3635f01180ad0f235b2a183ec2 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -16,7 +16,6 @@ test_load_2 (__fp16* a, int i)
   return a[i];
 }
 
-/* { dg-final { scan-assembler-times {vld1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 2 } }  */
 
 void
 test_store_1 (__fp16* a, __fp16 b)
@@ -30,7 +29,6 @@ test_store_2 (__fp16* a, int i, __fp16 b)
   a[i] = b;
 }
 
-/* { dg-final { scan-assembler-times {vst1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 2 } }  */
 
 __fp16
 test_load_store_1 (__fp16* a, int i, __fp16* b)
@@ -44,8 +42,9 @@ test_load_store_2 (__fp16* a, int i, __fp16* b)
   a[i] = b[i + 2];
   return a[i];
 }
-/* { dg-final { scan-assembler-times {ldrh\tr[0-9]+} 2 } }  */
-/* { dg-final { scan-assembler-times {strh\tr[0-9]+} 2 } }  */
+
+/* { dg-final { scan-assembler-times {vst1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 3 } }  */
+/* { dg-final { scan-assembler-times {vld1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 3 } }  */
 
 __fp16
 test_select_1 (int sel, __fp16 a, __fp16 b)
@@ -102,7 +101,7 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c)
 /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
 /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } }  */
 
-/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler-not {vmov\.f16} } }  */
 
 int
 test_compare_1 (__fp16 a, __fp16 b)
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c
index b91168d43b389675909cabc1950c750c1c5dbf24..0a0a60f3503387f96eed881645aae031275d21ff 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-1.c
@@ -16,6 +16,7 @@ F (__fp16 a, __fp16 b, __fp16 c)
   return c;
 }
 
-/* { dg-final { scan-assembler {vmov(\.f16)?\tr[0-9]+, s[0-9]+} } }  */
-/* { dg-final { scan-assembler {vmov(\.f32)?\ts1, s0} } }  */
-/* { dg-final { scan-assembler {vmov(\.f16)?\ts0, r[0-9]+} } }  */
+/* { dg-final { scan-assembler {vmov\.f32\ts[0-9]+, s1} } }  */
+/* { dg-final { scan-assembler {vmov\.f32\ts1, s0} } }  */
+/* { dg-final { scan-assembler {vmov\.f32\ts[0-9]+, s2+} } }  */
+/* { dg-final { scan-assembler-times {vmov\.f32\ts0, s[0-9]+} 2 } }  */
diff --git a/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c b/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
index 84fc0a0f5f06b1714a70f4703213ca10ea0b268e..56a3ae2618432a408cd9b20f9e1334106efab98b 100644
--- a/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
+++ b/gcc/testsuite/gcc.target/arm/fp16-aapcs-3.c
@@ -16,6 +16,8 @@ F (__fp16 a, __fp16 b, __fp16 c)
   return c;
 }
 
-/* { dg-final { scan-assembler-times {vmov\tr[0-9]+, s[0-2]} 2 } }  */
-/* { dg-final { scan-assembler-times {vmov.f32\ts1, s0} 1 } }  */
-/* { dg-final { scan-assembler-times {vmov\ts0, r[0-9]+} 2 } }  */
+/* { dg-final { scan-assembler {vmov\.f32\ts[0-9]+, s1} } }  */
+/* { dg-final { scan-assembler {vmov\.f32\ts1, s0} } }  */
+/* { dg-final { scan-assembler {vmov\.f32\ts[0-9]+, s2+} } }  */
+/* { dg-final { scan-assembler-times {vmov\.f32\ts0, s[0-9]+} 2 } }  */
+



[PATCH][GCC][AArch64] Allow any offset for SVE addressing modes before reload

2019-02-11 Thread Tamar Christina
Hi All,

On AArch64 aarch64_classify_address has a case for when it's non-strict
that will allow it to accept any byte offset from a reg when validating
an address in a given addressing mode.

This is because reload would later make the address valid.  SVE, however,
requires the address to always be valid, but currently any address is accepted
when a MEM + offset is used.  This causes an ICE, as nothing later forces the
address to be legitimate.

The patch forces aarch64_emit_sve_pred_move to ensure that the addressing mode
is valid for any loads/stores it creates, which follows the SVE way of handling
address classifications.

Bootstrapped on aarch64-none-linux-gnu and no issues.
Regtested on aarch64-none-elf with SVE on and no issues.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2019-02-11  Tamar Christina  

PR target/88847
* config/aarch64/aarch64.c (aarch64_classify_address):
For SVE enforce that the address is always valid when doing a MEM +
offset.

gcc/testsuite/ChangeLog:

2019-02-11  Tamar Christina  

PR target/88847
* gcc.target/aarch64/sve/pr88847.c: New test.

-- 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5df5a8b78439e69705e62845a4d1f86166a01894..59f03e688e58c1aab37629555c7b3f19e5075935 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3414,6 +3414,14 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm,
 void
 aarch64_emit_sve_pred_move (rtx dest, rtx pred, rtx src)
 {
+  /* Make sure that the address is legitimate.  */
+  if (MEM_P (dest)
+  && !aarch64_sve_struct_memory_operand_p (dest))
+{
+  rtx addr = force_reg (Pmode, XEXP (dest, 0));
+  dest = replace_equiv_address (dest, addr);
+}
+
   emit_insn (gen_rtx_SET (dest, gen_rtx_UNSPEC (GET_MODE (dest),
 		gen_rtvec (2, pred, src),
 		UNSPEC_MERGE_PTRUE)));
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr88847.c b/gcc/testsuite/gcc.target/aarch64/sve/pr88847.c
new file mode 100644
index ..b7504add9a9f2eedf9421328a87b75b53d492860
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr88847.c
@@ -0,0 +1,21 @@
+/* { dg-do assemble { target aarch64_asm_sve_ok } } */
+/* { dg-additional-options "-O0 -msve-vector-bits=256 -mbig-endian --save-temps" } */
+
+typedef struct _b {
+  __attribute__((__vector_size__(32))) int a[2];
+} b;
+
+b *c;
+
+void
+foo (void)
+{
+  char *p = '\0';
+  b e = c[0];
+}
+
+/* { dg-final { scan-assembler {\tld1w\tz[0-9]+.s, p[0-9]+/z, \[x[0-9]+\]\n} } } */
+/* { dg-final { scan-assembler {\tld1w\tz[0-9]+.s, p[0-9]+/z, \[x[0-9]+, #1, mul vl\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz[0-9]+.s, p[0-9]+, \[(sp|x[0-9]+)\]\n} } } */
+/* { dg-final { scan-assembler {\tst1w\tz[0-9]+.s, p[0-9]+, \[(sp|x[0-9]+), #1, mul vl\]\n} } } */
+



Re: [PATCH] rs6000: Vector shift-right should honor modulo semantics

2019-02-11 Thread Bill Schmidt
On 2/11/19 8:11 AM, Segher Boessenkool wrote:
> On Mon, Feb 11, 2019 at 07:17:16AM -0600, Bill Schmidt wrote:
>> At -O0 (if I hand-inline everything myself to avoid errors), we scalarize
>> the modulo/masking operation into a rldicl for each doubleword.  I really
>> don't see any reason to change the code.
> So what does this look like at expand (at -O0)?  Is it something that
> is done at gimple level, is it expand itself, is it some target thing?

It's already a mask at expand, even at -O0.  vregs dump excerpt:

(insn 13 12 14 2 (set (reg:DI 129 [ _16 ])
(vec_select:DI (reg:V2DI 126 [ _9 ])
(parallel [
(const_int 0 [0])
]))) "vec-srad-modulo.c":40:10 1223 {vsx_extract_v2di}
 (nil))
(insn 14 13 15 2 (set (reg:DI 130 [ _17 ])
(and:DI (reg:DI 129 [ _16 ])
(const_int 63 [0x3f]))) "vec-srad-modulo.c":40:10 195 {anddi3_mask}
 (nil))
(insn 15 14 16 2 (set (reg:DI 131 [ _18 ])
(vec_select:DI (reg:V2DI 126 [ _9 ])
(parallel [
(const_int 1 [0x1])
]))) "vec-srad-modulo.c":40:10 1223 {vsx_extract_v2di}
 (nil))
(insn 16 15 17 2 (set (reg:DI 132 [ _19 ])
(and:DI (reg:DI 131 [ _18 ])
(const_int 63 [0x3f]))) "vec-srad-modulo.c":40:10 195 {anddi3_mask}
 (nil))

>
>>>> For -mcpu=power9, we get close, but have some bad register allocation and
>>>> an unnecessary extend:
>>>>
>>>> xxspltib 0,4   <- why not just xxspltib 32,4?
>>>> xxlor 32,0,0   <- wasted copy
>>> Yeah, huh.  Where does that come from...  I blame splitters after reload.
>> This only happens at -O2 and up, FWIW.  At -O1 we allocate the registers
>> reasonably.
> Heh.
>
>>>> Weird.  I just tried adding -mvsx
>>> Does it _need_ VSX anyway?  Are these builtins defined without it, too?
>> Yes (vector long long / V2DImode requires VSX).
> So something like
>
> /* { dg-do run } */
> /* { dg-require-effective-target vsx_hw } */
> /* { dg-options "-mvsx -O2" } */
>
> then?

What I have now:

/* { dg-do run { target { vsx_hw } } } */  
/* { dg-options "-O2 -mvsx" } */   

>
>> I pointed to the bugzilla in another reply -- which was "resolved" with a 
>> hack.
>> I consider it still broken this way...
> Reopen that PR?

Already done. ;-)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88920

Bill

>
>> I tested a revised version of the patch overnight and will submit shortly.
> Thanks.
>
>
> Segher
>



Re: [Patch] [arm] Fix 88714, Arm LDRD/STRD peepholes

2019-02-11 Thread Matthew Malcomson
On 10/02/19 09:42, Christophe Lyon wrote:
> 
> Both this simple patch and the previous one fix all the ICEs I reported, 
> thanks.
> 
> Of course, the scan-assembler failures remain to be fixed.
> 

In the testcase I failed to account for targets that don't support arm mode
or targets that do not support the ldrd/strd instructions.

This patch accounts for both of these by adding some
dg-require-effective-target lines to the testcase.

This patch also adds a new effective-target procedure to check a target
supports arm ldrd/strd.
This check uses the 'r' constraint to ensure SP is not used so that it will
work for thumb mode code generation as well as arm mode.

Tested by running this testcase with cross compilers using "-march=armv5t",
"-mcpu=cortex-m3", "-mcpu=arm7tdmi", "-mcpu=cortex-a9 -march=armv5t" for both
arm-none-eabi and arm-none-linux-gnueabihf.
Also ran this testcase with `make check` natively.

Ok for trunk?

gcc/testsuite/ChangeLog:

2019-02-11  Matthew Malcomson  

* gcc.dg/rtl/arm/ldrd-peepholes.c: Restrict testcase.
* lib/target-supports.exp: Add procedure to check for ldrd.



diff --git a/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c b/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c
index 4c3949c0963b8482545df670c31db2d9ec0f26b3..cbb64a770f5d796250601cafe481d7c2ea13f2eb 100644
--- a/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c
+++ b/gcc/testsuite/gcc.dg/rtl/arm/ldrd-peepholes.c
@@ -1,4 +1,6 @@
  /* { dg-do compile { target arm*-*-* } } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_ldrd_strd_ok } */
 /* { dg-skip-if "Ensure only targetting arm with TARGET_LDRD" { *-*-* } { "-mthumb" } { "" } } */
  /* { dg-options "-O3 -marm -fdump-rtl-peephole2" } */

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a0b4b99067f9ae225bde3b6bc719e89e1ea8e0e1..16dd018e8020fdf8e104690fed6a4e8919aa4aa1 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4918,6 +4918,27 @@ proc check_effective_target_arm_prefer_ldrd_strd { } {
  }  "-O2 -mthumb" ]
  }

+# Return true if LDRD/STRD instructions are available on this target.
+proc check_effective_target_arm_ldrd_strd_ok { } {
+if { ![check_effective_target_arm32] } {
+  return 0;
+}
+
+return [check_no_compiler_messages arm_ldrd_strd_ok object {
+  int main(void)
+  {
+__UINT64_TYPE__ a = 1, b = 10;
+__UINT64_TYPE__ *c = &b;
+// `a` will be in a valid register since it's a DImode quantity.
+asm ("ldrd %0, %1"
+ : "=r" (a)
+ : "m" (c));
+return a == 10;
+  }
+}]
+}
+
  # Return 1 if this is a PowerPC target supporting -meabi.

  proc check_effective_target_powerpc_eabi_ok { } {



Re: [PATCH] i386: Use EXT_REX_SSE_REG_P in *movoi_internal_avx/movti_internal

2019-02-11 Thread H.J. Lu
On Mon, Feb 11, 2019 at 5:51 AM Uros Bizjak  wrote:
>
> On Mon, Feb 11, 2019 at 2:29 PM H.J. Lu  wrote:
>
> > > No. As said, please correctly set mode to XImode in mode attribute 
> > > calculation.
> >
> > There is
> >
> >  switch (get_attr_type (insn))
> > {
> > case TYPE_SSELOG1:
> >   return standard_sse_constant_opcode (insn, operands);
> >
> > standard_sse_constant_opcode has
> >
> > else if (x == constm1_rtx || vector_all_ones_operand (x, mode))
> > {
> >   enum attr_mode insn_mode = get_attr_mode (insn);
> >
> >   switch (insn_mode)
> > {
> > case MODE_XI:
> > case MODE_V8DF:
> > case MODE_V16SF:
> >   gcc_assert (TARGET_AVX512F);
> >   return "vpternlogd\t{$0xFF, %g0, %g0, %g0|%g0, %g0, %g0, 0xFF}";
>
> If there is something wrong with standard_sse_constant_opcode, then
> fix the problem in the function itself. With your previous patch, you
> introduced a regression, and the presented fix is another kludge to
> fix a stack of kludges inside standard_sse_constant_opcode.
>
> Please take your time and propose some acceptable solution that would
> put some logic into const_0/const_1 handling. The situation is not OK
> and your patch makes it even worse.
>

Let's first define what MODE_XI means in standard_sse_constant_opcode
as well as in all these mov patterns, both with and without AVX512VL.  Without
a clear definition, we can't get out of this mess.

-- 
H.J.


Re: arm access to stack slot out of allocated area

2019-02-11 Thread Olivier Hainque
Hi Wilco,

Thanks for your feedback.

> On 8 Feb 2019, at 22:35, Wilco Dijkstra  wrote:
> 
> Hi Olivier,
> 
>> Sorry, I had -mapcs-frame in mind.
> 
> That's identical to -mapcs, and equally deprecated. It was superseded two
> decades ago. -mapcs-frame bugs have been reported multiple times, including
> on VxWorks.

> For example https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64379 suggests
> VxWorks doesn't need -mapcs-frame. And that was in 2014!

> So I think we need to push much harder on getting rid of obsolete stuff and
> avoid people encountering these nasty issues.

People are strongly encouraged to move to VxWorks7 (bpabi)
these days and we'll probably reinforce that message.

VxWorks6 is however still around and the environment (toolchain,
libraries, ...) is abi=apcs-gnu + apcs-frame, so right now we
unfortunately don't have much of a choice, and it was not working
that badly until gcc-7 in our experience.

I performed a round of testing with an aapcs compiler, out of
curiosity to double check, and the incompatibilities are indeed
visible.

On calls to printf with a "double" as the first va-argument, for
example, where the argument is passed through r2 & r3 but expected
in r1 & r2 by the target library.

apcs-frame apart, it looks like stack-checking could be affected as
well. I need to think about this some more.

Olivier



Re: Make clear, when contributions will be ignored

2019-02-11 Thread Дилян Палаузов
Hello Segher,

my question was how you propose to proceed so that a state is reached in
which no reminders for patches are necessary.

This has no relation to having infinite time or to dealing with high-cost,
low-profit patches.

Previously I raised the question of whether automating the process for
sending reminders is a good idea.  That would save people the time of
writing reminders.

Greetings
  Дилян

On Mon, 2019-02-11 at 07:57 -0600, Segher Boessenkool wrote:
> On Mon, Feb 11, 2019 at 12:44:31PM +, Дилян Палаузов wrote:
> > -- at https://www.gnu.org/software/gcc/contribute.html is written “If you 
> > do not receive a response to a patch that you
> > have submitted within two weeks or so, it may be a good idea to chase it by 
> > sending a follow-up email to the same
> > list(s).”
> 
> That is about patches.  Not about bugzilla.  Sending reminders for bugzilla
> reports is useless and annoying.  Sending reminders for patches however is
> necessary, the way our development works currently.  It isn't clear any
> change to procedures would help at all, since the fundamental problems need
> to be attacked to make any progress.  Maintainers do not have infinite time,
> and there is no incentive to deal with high-cost low-profit patches.
> 
> 
> Segher



Re: [PATCH] rs6000: Vector shift-right should honor modulo semantics

2019-02-11 Thread Segher Boessenkool
On Mon, Feb 11, 2019 at 07:17:16AM -0600, Bill Schmidt wrote:
> At -O0 (if I hand-inline everything myself to avoid errors), we scalarize
> the modulo/masking operation into a rldicl for each doubleword.  I really
> don't see any reason to change the code.

So what does this look like at expand (at -O0)?  Is it something that
is done at gimple level, is it expand itself, is it some target thing?

> >> For -mcpu=power9, we get close, but have some bad register allocation and
> >> an unnecessary extend:
> >>
> >> xxspltib 0,4   <- why not just xxspltib 32,4?
> >> xxlor 32,0,0   <- wasted copy
> > Yeah, huh.  Where does that come from...  I blame splitters after reload.
> 
> This only happens at -O2 and up, FWIW.  At -O1 we allocate the registers
> reasonably.

Heh.

> >> Weird.  I just tried adding -mvsx
> > Does it _need_ VSX anyway?  Are these builtins defined without it, too?
> 
> Yes (vector long long / V2DImode requires VSX).

So something like

/* { dg-do run } */
/* { dg-require-effective-target vsx_hw } */
/* { dg-options "-mvsx -O2" } */

then?

> I pointed to the bugzilla in another reply -- which was "resolved" with a 
> hack.
> I consider it still broken this way...

Reopen that PR?

> I tested a revised version of the patch overnight and will submit shortly.

Thanks.


Segher


Re: Make clear, when contributions will be ignored

2019-02-11 Thread Segher Boessenkool
On Mon, Feb 11, 2019 at 12:44:31PM +, Дилян Палаузов wrote:
> -- at https://www.gnu.org/software/gcc/contribute.html is written “If you do 
> not receive a response to a patch that you
> have submitted within two weeks or so, it may be a good idea to chase it by 
> sending a follow-up email to the same
> list(s).”

That is about patches.  Not about bugzilla.  Sending reminders for bugzilla
reports is useless and annoying.  Sending reminders for patches however is
necessary, the way our development works currently.  It isn't clear any
change to procedures would help at all, since the fundamental problems need
to be attacked to make any progress.  Maintainers do not have infinite time,
and there is no incentive to deal with high-cost low-profit patches.


Segher

