Re: Late-breaking jit features (was Re: [PATCH][gcc] libgccjit: introduce gcc_jit_context_add_driver_option)

2019-02-01 Thread Richard Biener
On February 1, 2019 10:11:12 PM GMT+01:00, David Malcolm wrote:
>On Mon, 2019-01-21 at 08:40 +, Andrea Corallo wrote:
>> Hi all,
>> Second version of the patch addressing David's comment about all-non-
>> failing-tests.h
>> 
>> Adds gcc_jit_context_add_driver_option to the libgccjit ABI and a
>> testcase for it.
>> 
>> Using this interface is now possible to pass options affecting
>> assembler and linker.
>> 
>> Does not introduce regressions running make check-jit
>
>Thanks; the patch looks good.
>
>[CCing the release managers]
>
>Given that gcc development is now in stage 4, we really shouldn't be
>adding new features, but I'm wondering if an exception can be made for
>libgccjit?  (this patch purely touches the jit subdirectories).
>
>There's one other late-breaking change, here:
>  [PATCH][jit] Add thread-local globals to the libgccjit frontend
>https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00227.html
>which is nearly ready, but is awaiting copyright assignment paperwork.
>
>Alternatively, should these patches go into a branch of queued jit
>changes for gcc 10?

Is there anything like an ABI involved? If so we should avoid breaking it all
the time. Otherwise JIT is not release critical and thus if you break it at the
wrong moment it's your own fault.

Richard. 

>Thanks
>Dave
>
>
>> Bests
>> 
>>   Andrea
>> 
>> 
>> gcc/jit/ChangeLog
>> 2019-01-16  Andrea Corallo  andrea.cora...@arm.com
>> 
>> * docs/topics/compatibility.rst (LIBGCCJIT_ABI_11): New ABI tag.
>> * docs/topics/contexts.rst (Additional driver options): New
>> section.
>> * jit-playback.c (invoke_driver): Add call to append_driver_options.
>> * jit-recording.c: Within namespace gcc::jit...
>> (recording::context::~context): Free the optnames within
>> m_driver_options.
>> (recording::context::add_driver_option): New method.
>> (recording::context::append_driver_options): New method.
>> (recording::context::dump_reproducer_to_file): Add driver
>> options.
>> * jit-recording.h: Within namespace gcc::jit...
>> (recording::context::add_driver_option): New method.
>> (recording::context::append_driver_options): New method.
>> (recording::context::m_driver_options): New field.
>> * libgccjit++.h (gccjit::context::add_driver_option): New
>> method.
>> * libgccjit.c (gcc_jit_context_add_driver_option): New API
>> entrypoint.
>> * libgccjit.h (gcc_jit_context_add_driver_option): New API
>> entrypoint.
>> (LIBGCCJIT_HAVE_gcc_jit_context_add_driver_option): New
>> macro.
>> * libgccjit.map (LIBGCCJIT_ABI_11): New ABI tag.
>> 
>> 
>> 
>> gcc/testsuite/ChangeLog
>> 2019-01-16  Andrea Corallo  andrea.cora...@arm.com
>> 
>> * jit.dg/add-driver-options-testlib.c: Add support file for
>> test-add-driver-options.c testcase.
>> * jit.dg/all-non-failing-tests.h: Add note about
>> test-add-driver-options.c
>> * jit.dg/jit.exp (jit-dg-test): Update to support
>> add-driver-options-testlib.c compilation.
>> * jit.dg/test-add-driver-options.c: New testcase.
>> 



[C++ PATCH] PR c++/88761 - ICE with reference capture of constant.

2019-02-01 Thread Jason Merrill
Here, we capture nf, then the use of the proxy decays to a constant during
semantic processing of +nf.  Since we saw some decay from proxy to constant,
we walk through the lambda body to see which proxies are still used, but we
weren't walking into subtrees of DECL_EXPR at all, so we missed the use of
&nf in the initializer of y, and removed the capture.  But then at
instantiation time we try to use nf, don't have a proxy anymore, and ICE.

Currently the template representation of +nf still uses the proxy rather than
the decayed constant; I will address that in a follow-up patch.
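
To make the two kinds of use concrete, here is a minimal standalone sketch modelled on the testcase in this patch.  The helper names invoke_with_int and capture_demo are made up for illustration and appear nowhere in GCC.  The value use +nf only needs the constant, while the address use &nf needs the by-reference capture to survive.

```cpp
#include <cassert>

// Hypothetical helper (not part of the patch): calling the generic lambda
// forces its body to be instantiated, which is where the ICE was triggered.
template <class F>
void invoke_with_int(F f) { f(1); }

// Returns true when both uses of nf inside the lambda behave as expected.
bool capture_demo()
{
  const unsigned long nf = 10'000'000;
  unsigned long value_seen = 0;
  const unsigned long *address_seen = nullptr;

  auto loop = [&](auto)
  {
    auto x = +nf;  // value use only: may be folded to the constant
    auto y = &nf;  // address use: requires the capture to be kept
    value_seen = x;
    address_seen = y;
  };

  invoke_with_int(loop);
  return value_seen == 10'000'000 && address_seen == &nf;
}
```

With a compiler that has the fix, capture_demo() is expected to return true; the address stored by the lambda is that of the original nf because the capture is by reference.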

Tested x86_64-pc-linux-gnu, applying to trunk.

* lambda.c (mark_const_cap_r): Do walk subtrees of DECL_EXPR for
non-proxy decls.
---
 gcc/cp/lambda.c|  6 --
 .../g++.dg/cpp1y/lambda-generic-const6.C   | 18 ++
 gcc/cp/ChangeLog   |  6 ++
 3 files changed, 28 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/lambda-generic-const6.C

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 4b7a358a0ad..c31b06e2b1e 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -1488,8 +1488,10 @@ mark_const_cap_r (tree *t, int *walk_subtrees, void *data)
 {
   tree decl = DECL_EXPR_DECL (*t);
   if (is_constant_capture_proxy (decl))
-   var = DECL_CAPTURED_VARIABLE (decl);
-  *walk_subtrees = 0;
+   {
+ var = DECL_CAPTURED_VARIABLE (decl);
+ *walk_subtrees = 0;
+   }
 }
   else if (is_constant_capture_proxy (*t))
 var = DECL_CAPTURED_VARIABLE (*t);
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-const6.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-const6.C
new file mode 100644
index 000..e85d6497488
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-const6.C
@@ -0,0 +1,18 @@
+// PR c++/88761
+// { dg-do compile { target c++14 } }
+
+template <class T>
+void f(T t) { t(1); }
+
+int main()
+{
+  const unsigned long nf = 10'000'000;
+
+  auto loop = [&](auto)
+  {
+auto x = +nf;
+auto y = &nf;
+  };
+
+  f(loop);
+}
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index f0545aee116..6c474fdda16 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,9 @@
+2019-02-01  Jason Merrill  
+
+   PR c++/88761 - ICE with reference capture of constant.
+   * lambda.c (mark_const_cap_r): Do walk subtrees of DECL_EXPR for
+   non-proxy decls.
+
 2019-02-01  Marek Polacek  
 
PR c++/88325 - ICE with invalid out-of-line template member definition.

base-commit: 5f6f6e51c0ff7f6e9e26f026e8ea856e0c4b91b5
-- 
2.20.1



Re: [PATCH 00/46] Implement MMX intrinsics with SSE

2019-02-01 Thread H.J. Lu
On Fri, Feb 1, 2019 at 4:50 PM Andi Kleen  wrote:
>
> "H.J. Lu"  writes:
>
> > To support it, we disable
> > MMX by default in 64-bit mode so that MMX registers won't be available
>
> Wouldn't that break inline assembler that references MMX register clobbers?

Yes.  You need to use -mmmx explicitly to enable MMX.

-- 
H.J.


Re: [PATCH 00/46] Implement MMX intrinsics with SSE

2019-02-01 Thread Andi Kleen
"H.J. Lu"  writes:

> To support it, we disable
> MMX by default in 64-bit mode so that MMX registers won't be available

Wouldn't that break inline assembler that references MMX register clobbers?

-Andi


[committed] Fix omp declare simd ICE (PR middle-end/87887)

2019-02-01 Thread Jakub Jelinek
Hi!

We can't create a vector type with aggregate elements.  In the past
we used to reject BLKmode aggregates, but accepted (and ICEd on) [QHSD]Imode
aggregates.  On the other side, as the FIXME says, we should accept any
argument type if it is uniform (linear should be only integers or pointers
or references and we do already accept those).  From what I can see, aarch64
already does not let aggregates through (but it could be changed to ignore
uniform argument types).
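
As a rough sketch of the rule the patch implements (not the GCC code itself), the acceptance test can be written as below.  ArgKind, ParamInfo, return_type_ok and argument_ok are hypothetical names; the real hook works on trees and machine modes.

```cpp
#include <cassert>

// Hypothetical stand-ins for the information the target hook inspects.
enum class ArgKind { Vector, Uniform, Linear };

struct ParamInfo {
  bool mode_supported;  // mode is one of the scalar modes listed in the switch
  bool is_aggregate;    // struct, union or array type
  ArgKind kind;         // how the argument is classified for the simd clone
};

// Return types: a supported mode is not enough, aggregates are rejected.
bool return_type_ok(bool mode_supported, bool is_aggregate)
{
  return mode_supported && !is_aggregate;
}

// Arguments: same check, except that uniform arguments are accepted
// whatever their type, since they are never turned into vectors.
bool argument_ok(const ParamInfo &p)
{
  if (p.mode_supported && !p.is_aggregate)
    return true;
  return p.kind == ArgKind::Uniform;
}
```

In terms of the new tests, struct S { int n; } has a supported integer mode but is an aggregate, so foo and bar are rejected with a warning while baz, where x is uniform, is accepted.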

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2019-02-02  Jakub Jelinek  

PR middle-end/87887
* config/i386/i386.c (ix86_simd_clone_compute_vecsize_and_simdlen):
Punt with warning on aggregate return or argument types.  Ignore
type/mode checking for uniform arguments.

* gcc.dg/gomp/pr87887-1.c: New test.
* gcc.dg/gomp/pr87887-2.c: New test.

--- gcc/config/i386/i386.c.jj   2019-01-30 08:50:26.824223494 +0100
+++ gcc/config/i386/i386.c  2019-02-01 23:14:10.450518093 +0100
@@ -50433,7 +50433,9 @@ ix86_simd_clone_compute_vecsize_and_simd
   case E_DFmode:
   /* case E_SCmode: */
   /* case E_DCmode: */
-   break;
+   if (!AGGREGATE_TYPE_P (ret_type))
+ break;
+   /* FALLTHRU */
   default:
warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
"unsupported return type %qT for simd", ret_type);
@@ -50444,7 +50446,6 @@ ix86_simd_clone_compute_vecsize_and_simd
   int i;
 
   for (t = DECL_ARGUMENTS (node->decl), i = 0; t; t = DECL_CHAIN (t), i++)
-/* FIXME: Shouldn't we allow such arguments if they are uniform?  */
 switch (TYPE_MODE (TREE_TYPE (t)))
   {
   case E_QImode:
@@ -50455,8 +50456,12 @@ ix86_simd_clone_compute_vecsize_and_simd
   case E_DFmode:
   /* case E_SCmode: */
   /* case E_DCmode: */
-   break;
+   if (!AGGREGATE_TYPE_P (TREE_TYPE (t)))
+ break;
+   /* FALLTHRU */
   default:
+   if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_UNIFORM)
+ break;
warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
"unsupported argument type %qT for simd", TREE_TYPE (t));
return 0;
--- gcc/testsuite/gcc.dg/gomp/pr87887-1.c.jj   2019-02-01 20:55:11.383755787 +0100
+++ gcc/testsuite/gcc.dg/gomp/pr87887-1.c   2019-02-01 23:15:07.376587238 +0100
@@ -0,0 +1,26 @@
+/* PR middle-end/87887 */
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-w" } */
+
+struct S { int n; };
+#pragma omp declare simd
+struct S
+foo (int x)
+{
+  return (struct S) { x };
+}
+
+#pragma omp declare simd
+int
+bar (struct S x)
+{
+  return x.n;
+}
+
+#pragma omp declare simd uniform (x)
+int
+baz (int w, struct S x, int y)
+{
+  return w + x.n + y;
+}
--- gcc/testsuite/gcc.dg/gomp/pr87887-2.c.jj   2019-02-01 20:55:18.618636600 +0100
+++ gcc/testsuite/gcc.dg/gomp/pr87887-2.c   2019-02-01 23:15:14.118476995 +0100
@@ -0,0 +1,25 @@
+/* PR middle-end/87887 */
+/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-require-effective-target vect_simd_clones } */
+
+struct S { int n; };
+#pragma omp declare simd
+struct S
+foo (int x)    /* { dg-warning "unsupported return type 'struct S' for simd" } */
+{
+  return (struct S) { x };
+}
+
+#pragma omp declare simd
+int
+bar (struct S x)   /* { dg-warning "unsupported argument type 'struct S' for simd" } */
+{
+  return x.n;
+}
+
+#pragma omp declare simd uniform (x)
+int
+baz (int w, struct S x, int y)
+{
+  return w + x.n + y;
+}

Jakub


libgo patch committed: Add Hurd netpoll and semaphore support

2019-02-01 Thread Ian Lance Taylor
This patch by Svante Signell adds Hurd netpoll and semaphore support
to the runtime package.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268463)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-582392b80c07bd7e830e177b775dc4ef802b5fd6
+047b0aa6a29d46fde99b3e5823339ac8866f797c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/netpoll.go
===
--- libgo/go/runtime/netpoll.go (revision 268459)
+++ libgo/go/runtime/netpoll.go (working copy)
@@ -102,7 +102,7 @@ func netpollinited() bool {
 // descriptor being used by netpoll.
 func poll_runtime_isPollServerDescriptor(fd uintptr) bool {
fds := netpolldescriptor()
-   if GOOS != "aix" {
+   if GOOS != "aix" && GOOS != "hurd" {
return fd == fds
} else {
// AIX have a pipe in its netpoll implementation.
@@ -178,8 +178,8 @@ func poll_runtime_pollWait(pd *pollDesc,
if err != 0 {
return err
}
-   // As for now only Solaris and AIX use level-triggered IO.
-   if GOOS == "solaris" || GOOS == "aix" {
+   // As for now only Solaris, AIX and Hurd use level-triggered IO.
+   if GOOS == "solaris" || GOOS == "aix" || GOOS == "hurd" {
netpollarm(pd, mode)
}
for !netpollblock(pd, int32(mode), false) {
Index: libgo/go/runtime/netpoll_hurd.go
===
--- libgo/go/runtime/netpoll_hurd.go    (nonexistent)
+++ libgo/go/runtime/netpoll_hurd.go    (working copy)
@@ -0,0 +1,240 @@
+// Copyright 2019 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package runtime
+
+import "unsafe"
+
+// FIXME: Improve network poller for hurd.
+// This is based on the former libgo/runtime/netpoll_select.c implementation
+// except that it uses poll instead of select and is written in Go.
+// It's also based on Solaris implementation for the arming mechanisms
+// Inspiration was also taken from netpoll_aix.go and netpoll_solaris.go
+
+//From /usr/include/x86_64-linux-gnu/sys/poll.h
+//go:noescape
+//extern poll
+func libc_poll(pfds *pollfd, nfds int32, timeout int32) int32
+
+//go:noescape
+//extern pipe2
+func libc_pipe2(fd *int32, flags int32) int32
+
+//pollfd represents the poll structure for GNU/Hurd operating system.
+type pollfd struct {
+   fd  int32 // File descriptor to poll.
+   events  int16 // Types of events poller cares about.
+   revents int16 // Types of events that actually occurred.
+}
+
+//From /usr/include/i386-gnu/bits/poll.h
+const _POLLIN = 01    // There is data to read.
+const _POLLPRI = 02   // There is urgent data to read.
+const _POLLOUT = 04   // Writing now will not block.
+const _POLLERR = 010  // Error condition.
+const _POLLHUP = 020  // Hung up.
+const _POLLNVAL = 040 // Invalid polling request.
+
+var (
+   pfds           []pollfd
+   pds            []*pollDesc
+   mtxpoll        mutex
+   mtxset         mutex
+   rdwake         int32
+   wrwake         int32
+   pendingUpdates int32
+)
+
+const pollVerbose = false
+
+func netpollinit() {
+   var p [2]int32
+
+   // Create the pipe we use to wakeup poll.
+   if err := libc_pipe2(&p[0], _O_CLOEXEC|_O_NONBLOCK); err < 0 {
+   throw("runtime:netpollinit(): failed to create pipe2")
+   }
+   rdwake = p[0]
+   wrwake = p[1]
+
+   // Pre-allocate array of pollfd structures for poll.
+   if pollVerbose {
+   println("*** allocating")
+   }
+   pfds = make([]pollfd, 1, 128)
+   if pollVerbose {
+   println("*** allocating done", &pfds[0])
+   }
+
+   // Poll the read side of the pipe.
+   pfds[0].fd = int32(rdwake)
+   pfds[0].events = int16(_POLLIN)
+   pfds[0].revents = int16(0)
+
+   pds = make([]*pollDesc, 1, 128)
+   // Checks for pd != nil are made in netpoll.
+   pds[0] = nil
+}
+
+func netpolldescriptor() uintptr {
+   // Both fds must be returned.
+   if rdwake > 0xFFFF || wrwake > 0xFFFF {
+   throw("netpolldescriptor: invalid fd number")
+   }
+   return uintptr(rdwake<<16 | wrwake)
+}
+
+// netpollwakeup writes on wrwake to wakeup poll before any changes.
+func netpollwakeup() {
+   if pendingUpdates == 0 {
+   pendingUpdates = 1
+   if pollVerbose {
+   println("*** writing 1 byte")
+   }
+   b := [1]byte{0}
+   write(uintptr(wrwake), unsafe.Pointer(&b[0]), 1)
+   }
+}
+
+func netpollopen(fd uintptr, pd *pollDesc) int32 
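
One detail of the patch above is that netpolldescriptor returns both ends of the wakeup pipe in a single value, as rdwake<<16 | wrwake.  The following is a minimal sketch of that packing and of a matching membership test, written in C++ purely for illustration.  The names pack_descriptors and is_poll_descriptor are hypothetical, the unpacking side is an assumption about how a caller would compare against the packed value, and the negative-fd check is an addition of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Pack two descriptors into one word: read end in bits 16..31, write end in
// bits 0..15.  Each descriptor therefore has to fit in 16 bits.
std::uintptr_t pack_descriptors(std::int32_t rdwake, std::int32_t wrwake)
{
  if (rdwake < 0 || wrwake < 0 || rdwake > 0xFFFF || wrwake > 0xFFFF)
    throw std::out_of_range("invalid fd number");
  return (static_cast<std::uintptr_t>(rdwake) << 16)
         | static_cast<std::uintptr_t>(wrwake);
}

// Assumed counterpart: fd belongs to the poller if it matches either half.
bool is_poll_descriptor(std::uintptr_t fd, std::uintptr_t packed)
{
  return fd == (packed & 0xFFFF) || fd == ((packed >> 16) & 0xFFFF);
}
```

This also shows why a plain fd == fds comparison is not enough once two descriptors share one value, which is the reason poll_runtime_isPollServerDescriptor gains a hurd case next to aix.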

Re: [PATCH] Handle timeout warnings in dg-extract-results

2019-02-01 Thread Mike Stump
On Jan 23, 2019, at 5:16 AM, Christophe Lyon  wrote:
> What do people think about this?

Seems reasonable.

Re: libgo patch committed: Hurd configury

2019-02-01 Thread Ian Lance Taylor
And this patch adds the new expected file,
libgo/runtime/getncpu-hurd.c.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268461)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-51fb93fd71b8a0a690455dfdd3d12b2aa0171f5c
+582392b80c07bd7e830e177b775dc4ef802b5fd6
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/getncpu-hurd.c
===
--- libgo/runtime/getncpu-hurd.c    (nonexistent)
+++ libgo/runtime/getncpu-hurd.c    (working copy)
@@ -0,0 +1,16 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+#include <unistd.h>
+
+#include "runtime.h"
+#include "defs.h"
+
+int32
+getproccount(void)
+{
+   int32 n;
+   n = (int32)sysconf(_SC_NPROCESSORS_ONLN);
+   return n > 1 ? n : 1;
+}


Re: [PATCH] Fix Fortran handling of PARAMETERs in BLOCK which is not the first statement in function (PR fortran/83246, PR fortran/89084)

2019-02-01 Thread Steve Kargl
On Fri, Feb 01, 2019 at 11:36:27PM +0100, Jakub Jelinek wrote:
> 
> As mentioned in the PR, the following testcases FAIL, because a VAR_DECL
> for a PARAMETER inside of a BLOCK is not added to the BIND_EXPR vars and
> thus the middle-end doesn't consider it defined.
> 
> The problem is in the following test, which passes for all the PR67885
> tests, but doesn't really test whether this is a parameter in a BLOCK
> namespace.  It actually tests whether the parent namespace (usually
> function/subroutine body, but could be another BLOCK) has EXEC_BLOCK as
> the first executable statement in it.
> As I said in the PR, we could walk the linked list from sym->ns->parent->code
> through ->next pointers and look for EXEC_BLOCK which has sym->ns as the
> attached namespace, but that could be compile time expensive (consider
> a function with a million of blocks in it and in each of them one or more
> referenced parameters - that would be O(n^2) compile time).
> The following patch instead checks the bit flag set only in EXEC_BLOCK
> namespaces.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 

OK.  Thanks for fixing my previous attempt to make this work.

-- 
Steve


Re: [PATCH] Move stack protector epilogue before loading return hard reg(s) from pseudo(s) (PR rtl-optimization/87485)

2019-02-01 Thread Eric Botcazou
> So, can we e.g. keep emitting the epilogue where it is now for
> naked_return_label != NULL_RTX and move it otherwise?
> For __builtin_return the setter and use of the hard register won't be
> adjacent in any case.

See my comment in the audit trail of the PR; I'd suspend it and go to bed. ;-)

-- 
Eric Botcazou


libgo patch committed: Hurd configury

2019-02-01 Thread Ian Lance Taylor
This libgo patch by Svante Signell adds Hurd configury support, and
also sysinfo/sigtab support.  On Hurd systems it expects a file that
will be added in a later patch.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268460)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-87dd981901c645a7d54a52c5f4c35caec31a8978
+51fb93fd71b8a0a690455dfdd3d12b2aa0171f5c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 268458)
+++ libgo/Makefile.am   (working copy)
@@ -428,10 +428,14 @@ else
 if LIBGO_IS_AIX
 runtime_getncpu_file = runtime/getncpu-aix.c
 else
+if LIBGO_IS_HURD
+runtime_getncpu_file = runtime/getncpu-hurd.c
+else
 runtime_getncpu_file = runtime/getncpu-none.c
 endif
 endif
 endif
+endif
 endif
 endif
 endif
Index: libgo/configure.ac
===
--- libgo/configure.ac  (revision 268458)
+++ libgo/configure.ac  (working copy)
@@ -165,6 +165,7 @@ is_dragonfly=no
 is_rtems=no
 is_solaris=no
 is_aix=no
+is_hurd=no
 GOOS=unknown
 case ${host} in
   *-*-darwin*)   is_darwin=yes;  GOOS=darwin ;;
@@ -177,6 +178,7 @@ case ${host} in
   *-*-rtems*)    is_rtems=yes;   GOOS=rtems ;;
   *-*-solaris2*) is_solaris=yes; GOOS=solaris ;;
   *-*-aix*)      is_aix=yes;     GOOS=aix ;;
+  *-*-gnu*)      is_hurd=yes;    GOOS=hurd ;;
 esac
 AM_CONDITIONAL(LIBGO_IS_DARWIN, test $is_darwin = yes)
 AM_CONDITIONAL(LIBGO_IS_FREEBSD, test $is_freebsd = yes)
@@ -188,6 +190,7 @@ AM_CONDITIONAL(LIBGO_IS_DRAGONFLY, test
 AM_CONDITIONAL(LIBGO_IS_RTEMS, test $is_rtems = yes)
 AM_CONDITIONAL(LIBGO_IS_SOLARIS, test $is_solaris = yes)
 AM_CONDITIONAL(LIBGO_IS_AIX, test $is_aix = yes)
+AM_CONDITIONAL(LIBGO_IS_HURD, test $is_hurd = yes)
 AM_CONDITIONAL(LIBGO_IS_BSD, test $is_darwin = yes -o $is_dragonfly = yes -o $is_freebsd = yes -o $is_netbsd = yes -o $is_openbsd = yes)
 AC_SUBST(GOOS)
 AC_SUBST(ALLGOOS)
Index: libgo/mksigtab.sh
===
--- libgo/mksigtab.sh   (revision 268369)
+++ libgo/mksigtab.sh   (working copy)
@@ -91,6 +91,7 @@ checksig _SIGCANCEL  '{_SigSetStack + _S
 checksig _SIGXRES '{_SigNotify, "SIGXRES: resource control exceeded"}'
 checksig _SIGJVM1 '{_SigNotify, "SIGJVM1: reserved signal for Java Virtual Machine"}'
 checksig _SIGJVM2 '{_SigNotify, "SIGJVM2: reserved signal for Java Virtual Machine"}'
+checksig _SIGLOST '   {_SigNotify, "SIGLOST: resource lost (Sun); server died (GNU)"}'
 
 # Special handling of signals 32 and 33 on GNU/Linux systems,
 # because they are special to glibc.
@@ -112,6 +113,11 @@ else
rtmax=`grep 'const _*SIGRTMAX = [0-9]*$' gen-sysinfo.go | sed -e 's/.* = \([0-9]*\)/\1/'`
if test -n "$rtmax"; then
nsig=`expr $rtmax + 1`
+   elif grep 'const _*SIGRTMAX = [ (]*_*SIGRTMIN[ )]*' gen-sysinfo.go >/dev/null 2>&1; then
+   rtmin=`grep 'const _*SIGRTMIN = [0-9]*$' gen-sysinfo.go | sed -e 's/.* = \([0-9]*\)/\1/'`
+   if test -n "$rtmin"; then
+   nsig=`expr $rtmin + 1`
+   fi
fi
fi
 fi
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 268369)
+++ libgo/mksysinfo.sh  (working copy)
@@ -55,9 +55,13 @@ grep '^type _mld_hdr_t ' gen-sysinfo.go
   sed -e 's/_in6_addr/[16]byte/' >> ${OUT}
 
 # The errno constants.  These get type Errno.
-  egrep '#define E[A-Z0-9_]+ ' errno.i | \
+egrep '#define E[A-Z0-9_]+ [0-9E]' errno.i | \
   sed -e 's/^#define \(E[A-Z0-9_]*\) .*$/const \1 = Errno(_\1)/' >> ${OUT}
 
+# Workaround for GNU/Hurd _EMIG_* errors having negative values
+egrep '#define E[A-Z0-9_]+ -[0-9]' errno.i | \
+  sed -e 's/^#define \(E[A-Z0-9_]*\) .*$/const \1 = Errno(-_\1)/' >> ${OUT}
+
 # The O_xxx flags.
 egrep '^const _(O|F|FD)_' gen-sysinfo.go | \
   sed -e 's/^\(const \)_\([^= ]*\)\(.*\)$/\1\2 = _\2/' >> ${OUT}
@@ -130,6 +134,11 @@ grep '^const _SYS_' gen-sysinfo.go | \
 echo "const $sup = _$sys" >> ${OUT}
   done
 
+# Special treatment of SYS_IOCTL for GNU/Hurd.
+if ! grep '^const SYS_IOCTL' ${OUT} > /dev/null 2>&1; then
+  echo "const SYS_IOCTL = 0" >> ${OUT}
+fi
+
 # The GNU/Linux support wants to use SYS_GETDENTS64 if available.
 if ! grep '^const SYS_GETDENTS ' ${OUT} >/dev/null 2>&1; then
   echo "const SYS_GETDENTS = 0" >> ${OUT}
@@ -475,6 +484,13 @@ grep '^type _st_timespec ' gen-sysinfo.g
   -e 's/tv_sec/Sec/' \
   -e 's/tv_nsec/Nsec/' >> ${OUT}
 
+# Special treatment of struct stat st_dev for GNU/Hurd
+# /usr/include/i386-gnu/bits/stat.h: #define st_dev st_fsid
+fsid_to_dev=

Re: [PATCH] Move stack protector epilogue before loading return hard reg(s) from pseudo(s) (PR rtl-optimization/87485)

2019-02-01 Thread Jakub Jelinek
On Fri, Feb 01, 2019 at 11:37:06PM +0100, Eric Botcazou wrote:
> > As discussed in the PR and suggested by Uros, scheduler has code to keep a
> > use of hard register next to the assignment that sets that hard register
> > from a pseudo, which is desirable so that RA can deal with it properly.
> > Unfortunately, with -fstack-protector* we stick the stack protect epilogue
> > in between, which splits the load and use to different basic blocks.
> > The code emitted by expand_function_end between these two spots is only the
> > loading of the return value into registers, so generally it shouldn't
> > contain any stores which stack protection wants to guard against, so I
> > believe from security POV this shouldn't weaken anything, but fixes the
> > testcase.
> 
> This moves the stack protect epilogue from after the naked_return_label to 
> before though, so it will be skipped for a naked return.

So, can we e.g. keep emitting the epilogue where it is now for
naked_return_label != NULL_RTX and move it otherwise?
For __builtin_return the setter and use of the hard register won't be
adjacent in any case.

Jakub


[omp] Move NE_EXPR handling to omp_adjust_for_condition

2019-02-01 Thread Martin Jambor
Hi,

even after the two previous HSA fixes, there is still one remaining
libgomp failure in the testsuite when run on an HSA-enabled APU.  The
problem is that grid calculation does not work with NE_EXPR conditions
in omp loop constructs, which are now permitted in OpenMP 5.

The patch below fixes it by simply moving the code that deals with it
into the function shared between omp expansion and gridification, and a
place which also feels more natural, to omp_adjust_for_condition.  For
some reason, this function is also called twice in omp_extract_for_data
but the second call cannot have any effect, so I removed one.
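
For readers who do not have the function in front of them, here is a simplified integer-only sketch of the normalization.  It is not the GCC code: Cond, LoopBound and adjust_for_condition are made-up names, the pointer case (where the step is compared against the pointee size) is left out, and the GE branch is assumed to mirror the LE branch.

```cpp
#include <cassert>

enum class Cond { LT, GT, LE, GE, NE };

struct LoopBound {
  Cond cond;  // loop condition code
  long n2;    // upper (or lower) bound the index is compared against
};

// Rewrite the condition so that only LT or GT remain.
LoopBound adjust_for_condition(LoopBound b, long step)
{
  switch (b.cond)
    {
    case Cond::LT:
    case Cond::GT:
      break;

    case Cond::NE:
      // With != the step must be exactly +1 or -1, which decides the
      // direction of the loop.
      assert(step == 1 || step == -1);
      b.cond = (step == 1) ? Cond::LT : Cond::GT;
      break;

    case Cond::LE:
      b.n2 += 1;
      b.cond = Cond::LT;
      break;

    case Cond::GE:
      b.n2 -= 1;
      b.cond = Cond::GT;
      break;
    }
  return b;
}
```

Once a condition has been through this function it is LT or GT, so running it a second time changes nothing, which is why one of the two calls in omp_extract_for_data could go.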

I have tested this on an HSA APU system with hsa offloading enabled and
also bootstrapped and tested on a bigger x86_64-linux system.  OK for
trunk?

Thanks,

Martin


2019-02-01  Martin Jambor  

* omp-general.c (omp_extract_for_data): Removed a duplicate call
to omp_adjust_for_condition, moved NE_EXPR code_cond processing...
(omp_adjust_for_condition): ...here.  Added necessary parameters.
* omp-general.h (omp_adjust_for_condition): Updated declaration.
* omp-grid.c (grid_attempt_target_gridification): Adjust to pass
proper values to new parameters of omp_adjust_for_condition.
---
 gcc/omp-general.c | 67 ---
 gcc/omp-general.h |  2 +-
 gcc/omp-grid.c|  9 ---
 3 files changed, 40 insertions(+), 38 deletions(-)

diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index 12210c556fc..0f66ba0c5d8 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -56,18 +56,47 @@ omp_is_reference (tree decl)
   return lang_hooks.decls.omp_privatize_by_reference (decl);
 }
 
-/* Adjust *COND_CODE and *N2 so that the former is either LT_EXPR or
-   GT_EXPR.  */
+/* Adjust *COND_CODE and *N2 so that the former is either LT_EXPR or GT_EXPR,
+   given that V is the loop index variable and STEP is loop step. */
 
 void
-omp_adjust_for_condition (location_t loc, enum tree_code *cond_code, tree *n2)
+omp_adjust_for_condition (location_t loc, enum tree_code *cond_code, tree *n2,
+ tree v, tree step)
 {
   switch (*cond_code)
 {
 case LT_EXPR:
 case GT_EXPR:
+  break;
+
 case NE_EXPR:
+  gcc_assert (TREE_CODE (step) == INTEGER_CST);
+  if (TREE_CODE (TREE_TYPE (v)) == INTEGER_TYPE)
+   {
+ if (integer_onep (step))
+   *cond_code = LT_EXPR;
+ else
+   {
+ gcc_assert (integer_minus_onep (step));
+ *cond_code = GT_EXPR;
+   }
+   }
+  else
+   {
+ tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (v)));
+ gcc_assert (TREE_CODE (unit) == INTEGER_CST);
+ if (tree_int_cst_equal (unit, step))
+   *cond_code = LT_EXPR;
+ else
+   {
+ gcc_assert (wi::neg (wi::to_widest (unit))
+ == wi::to_widest (step));
+ *cond_code = GT_EXPR;
+   }
+   }
+
   break;
+
 case LE_EXPR:
   if (POINTER_TYPE_P (TREE_TYPE (*n2)))
*n2 = fold_build_pointer_plus_hwi_loc (loc, *n2, 1);
@@ -258,41 +287,13 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
   gcc_assert (loop->cond_code != NE_EXPR
  || (gimple_omp_for_kind (for_stmt)
  != GF_OMP_FOR_KIND_OACC_LOOP));
-  omp_adjust_for_condition (loc, &loop->cond_code, &loop->n2);
 
   t = gimple_omp_for_incr (for_stmt, i);
   gcc_assert (TREE_OPERAND (t, 0) == var);
   loop->step = omp_get_for_step_from_incr (loc, t);
 
-  if (loop->cond_code == NE_EXPR)
-   {
- gcc_assert (TREE_CODE (loop->step) == INTEGER_CST);
- if (TREE_CODE (TREE_TYPE (loop->v)) == INTEGER_TYPE)
-   {
- if (integer_onep (loop->step))
-   loop->cond_code = LT_EXPR;
- else
-   {
- gcc_assert (integer_minus_onep (loop->step));
- loop->cond_code = GT_EXPR;
-   }
-   }
- else
-   {
- tree unit = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (loop->v)));
- gcc_assert (TREE_CODE (unit) == INTEGER_CST);
- if (tree_int_cst_equal (unit, loop->step))
-   loop->cond_code = LT_EXPR;
- else
-   {
- gcc_assert (wi::neg (wi::to_widest (unit))
- == wi::to_widest (loop->step));
- loop->cond_code = GT_EXPR;
-   }
-   }
-   }
-
-  omp_adjust_for_condition (loc, &loop->cond_code, &loop->n2);
+  omp_adjust_for_condition (loc, &loop->cond_code, &loop->n2, loop->v,
+   loop->step);
 
   if (simd
  || (fd->sched_kind == OMP_CLAUSE_SCHEDULE_STATIC
diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index f5f03c8b056..0cbbb31e73b 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -73,7 +73,7 @@ struct omp_for_data
 extern tree 

Re: [PATCH] Move stack protector epilogue before loading return hard reg(s) from pseudo(s) (PR rtl-optimization/87485)

2019-02-01 Thread Eric Botcazou
> As discussed in the PR and suggested by Uros, scheduler has code to keep a
> use of hard register next to the assignment that sets that hard register
> from a pseudo, which is desirable so that RA can deal with it properly.
> Unfortunately, with -fstack-protector* we stick the stack protect epilogue
> in between, which splits the load and use to different basic blocks.
> The code emitted by expand_function_end between these two spots is only the
> loading of the return value into registers, so generally it shouldn't
> contain any stores which stack protection wants to guard against, so I
> believe from security POV this shouldn't weaken anything, but fixes the
> testcase.

This moves the stack protect epilogue from after the naked_return_label to 
before though, so it will be skipped for a naked return.

-- 
Eric Botcazou


[PATCH] Fix Fortran handling of PARAMETERs in BLOCK which is not the first statement in function (PR fortran/83246, PR fortran/89084)

2019-02-01 Thread Jakub Jelinek
Hi!

As mentioned in the PR, the following testcases FAIL, because a VAR_DECL
for a PARAMETER inside of a BLOCK is not added to the BIND_EXPR vars and
thus the middle-end doesn't consider it defined.

The problem is in the following test, which passes for all the PR67885
tests, but doesn't really test whether this is a parameter in a BLOCK
namespace.  It actually tests whether the parent namespace (usually
function/subroutine body, but could be another BLOCK) has EXEC_BLOCK as
the first executable statement in it.
As I said in the PR, we could walk the linked list from sym->ns->parent->code
through ->next pointers and look for EXEC_BLOCK which has sym->ns as the
attached namespace, but that could be compile time expensive (consider
a function with a million of blocks in it and in each of them one or more
referenced parameters - that would be O(n^2) compile time).
The following patch instead checks the bit flag set only in EXEC_BLOCK
namespaces.
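
To illustrate the three options with a toy model (not the gfortran data structures), consider the sketch below.  Op, Statement, Namespace and the three check functions are hypothetical names; only the construct_entities flag and the shape of the old test are taken from the patch.

```cpp
#include <cassert>

enum class Op { Assign, Write, Block };

struct Namespace;

struct Statement {
  Op op;
  Namespace *block_ns;  // namespace attached to a Block statement, else null
  Statement *next;
};

struct Namespace {
  Namespace *parent;
  Statement *code;          // first executable statement
  bool construct_entities;  // set only for namespaces of BLOCK constructs
};

// Old test: only looks at the FIRST statement of the parent namespace.
bool old_check(const Namespace *ns)
{
  return ns && ns->parent && ns->parent->code
         && ns->parent->code->op == Op::Block;
}

// Correct but linear in the number of statements per query.
bool walk_check(const Namespace *ns)
{
  if (!ns || !ns->parent)
    return false;
  for (const Statement *s = ns->parent->code; s; s = s->next)
    if (s->op == Op::Block && s->block_ns == ns)
      return true;
  return false;
}

// New test: constant time, uses the flag stored in the namespace itself.
bool new_check(const Namespace *ns)
{
  return ns && ns->construct_entities;
}
```

For a function body of the form "write; block ... end block", as in the new pr89084.f90 test, old_check fails because the first statement is the write, while walk_check and new_check both recognize the BLOCK namespace.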

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-02-01  Jakub Jelinek  

PR fortran/83246
PR fortran/89084
* trans-decl.c (generate_local_decl): Add referenced FL_PARAMETERs
if sym->ns->construct_entities rather than if
sym->ns->parent->code->op == EXEC_BLOCK.

* gfortran.dg/pr89084.f90: New test.
* gfortran.dg/lto/pr89084_0.f90: New test.
* gfortran.dg/pr83246.f90: New test.

--- gcc/fortran/trans-decl.c.jj 2019-01-19 09:41:46.465393897 +0100
+++ gcc/fortran/trans-decl.c    2019-02-01 18:28:35.376176176 +0100
@@ -5735,10 +5735,7 @@ generate_local_decl (gfc_symbol * sym)
   "imported at %L", sym->name, &sym->declared_at);
}
 
-  if (sym->ns
- && sym->ns->parent
- && sym->ns->parent->code
- && sym->ns->parent->code->op == EXEC_BLOCK)
+  if (sym->ns && sym->ns->construct_entities)
{
  if (sym->attr.referenced)
gfc_get_symbol_decl (sym);
--- gcc/testsuite/gfortran.dg/pr89084.f90.jj   2019-02-01 18:29:56.776826089 +0100
+++ gcc/testsuite/gfortran.dg/pr89084.f90   2019-02-01 18:32:11.305594831 +0100
@@ -0,0 +1,23 @@
+! PR fortran/89084
+! { dg-do run }
+
+integer function foo ()
+  write (*,*) 'foo'
+  block
+integer, parameter :: idxs(3) = (/ 1, 2, 3 /)
+integer :: i
+foo = 0
+do i = 1, size(idxs)
+  foo = foo + idxs(i)
+enddo
+  end block
+end function foo
+program pr89084
+  integer :: i
+  interface
+integer function foo ()
+end function
+  end interface
+  i = foo ()
+  if (i.ne.6) stop 1
+end
--- gcc/testsuite/gfortran.dg/lto/pr89084_0.f90.jj  2019-02-01 18:33:12.500579868 +0100
+++ gcc/testsuite/gfortran.dg/lto/pr89084_0.f90 2019-02-01 18:33:31.387266622 +0100
@@ -0,0 +1,24 @@
+! PR fortran/89084
+! { dg-lto-do link }
+! { dg-lto-options {{ -O0 -flto }} }
+
+integer function foo ()
+  write (*,*) 'foo'
+  block
+integer, parameter :: idxs(3) = (/ 1, 2, 3 /)
+integer :: i
+foo = 0
+do i = 1, size(idxs)
+  foo = foo + idxs(i)
+enddo
+  end block
+end function foo
+program pr89084
+  integer :: i
+  interface
+integer function foo ()
+end function
+  end interface
+  i = foo ()
+  if (i.ne.6) stop 1
+end
--- gcc/testsuite/gfortran.dg/pr83246.f90.jj   2019-02-01 20:19:04.567300205 +0100
+++ gcc/testsuite/gfortran.dg/pr83246.f90   2019-02-01 20:18:02.446316150 +0100
@@ -0,0 +1,9 @@
+! PR fortran/83246
+! { dg-do link }
+   program dusty_corner 
+   write(*,*)'BLOCK TESTS' 
+   MAKEDATAP: block
+   integer,parameter :: scratch(*)=[1,2,3]
+   write(*,*)scratch
+   endblock MAKEDATAP
+   end program dusty_corner

Jakub


[PATCH] Move stack protector epilogue before loading return hard reg(s) from pseudo(s) (PR rtl-optimization/87485)

2019-02-01 Thread Jakub Jelinek
Hi!

As discussed in the PR and suggested by Uros, the scheduler has code to keep
a use of a hard register next to the assignment that sets that hard register
from a pseudo, which is desirable so that RA can deal with it properly.
Unfortunately, with -fstack-protector* we stick the stack protect epilogue
in between, which splits the load and the use into different basic blocks.
The code emitted by expand_function_end between these two spots only loads
the return value into registers, so generally it shouldn't contain any
stores which stack protection wants to guard against.  I believe that from a
security POV this shouldn't weaken anything, and it fixes the testcase.
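
To make the ordering concrete, here is a hand-written C model of what the
patch changes.  It is only a sketch: guard_value and copy_len are made-up
names, and the real check is emitted as RTL by stack_protect_epilogue, not
written in C.  The point is that the guard comparison now happens before the
result is moved into the return register, so that move stays adjacent to its
use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-process stack guard value.  */
static uintptr_t guard_value = 0x5a5a5a5au;

/* Models a protected function: the "prologue" copies the guard into the
   frame, the body works on a local buffer, and the "epilogue" verifies the
   copy before the result is handed back.  */
static int
copy_len (const char *src)
{
  uintptr_t canary = guard_value;   /* prologue: save the guard */
  char buf[16];
  int result;

  strncpy (buf, src, sizeof buf - 1);
  buf[sizeof buf - 1] = '\0';
  result = (int) strlen (buf);      /* value still lives in a "pseudo" */

  if (canary != guard_value)        /* guard check comes first ...  */
    abort ();
  return result;                    /* ... then the return register load */
}
```

Calling copy_len ("hello") yields 5; the guard check never fires because
nothing writes past buf.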

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-02-01  Jakub Jelinek  

PR rtl-optimization/87485
* function.c (expand_function_end): Move stack_protect_epilogue
before loading of return value into hard register(s).

* gcc.dg/pr87485.c: New test.

--- gcc/function.c.jj   2019-01-29 16:47:02.0 +0100
+++ gcc/function.c  2019-02-01 16:23:07.471877843 +0100
@@ -5330,6 +5330,10 @@ expand_function_end (void)
  communicate between __builtin_eh_return and the epilogue.  */
   expand_eh_return ();
 
+  /* If stack protection is enabled for this function, check the guard.  */
+  if (crtl->stack_protect_guard && targetm.stack_protect_runtime_enabled_p ())
+stack_protect_epilogue ();
+
   /* If scalar return value was computed in a pseudo-reg, or was a named
  return value that got dumped to the stack, copy that to the hard
  return register.  */
@@ -5475,10 +5479,6 @@ expand_function_end (void)
   && targetm_common.except_unwind_info (_options) != UI_SJLJ)
 emit_insn (gen_blockage ());
 
-  /* If stack protection is enabled for this function, check the guard.  */
-  if (crtl->stack_protect_guard && targetm.stack_protect_runtime_enabled_p ())
-stack_protect_epilogue ();
-
   /* If we had calls to alloca, and this machine needs
  an accurate stack pointer to exit the function,
  insert some code to save and restore the stack pointer.  */
--- gcc/testsuite/gcc.dg/pr87485.c.jj   2019-02-01 16:30:51.101211900 +0100
+++ gcc/testsuite/gcc.dg/pr87485.c  2019-02-01 16:31:48.660260183 +0100
@@ -0,0 +1,29 @@
+/* PR rtl-optimization/87485 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -fschedule-insns -fno-guess-branch-probability 
-fno-isolate-erroneous-paths-dereference -fno-omit-frame-pointer 
-fno-split-wide-types -fno-tree-ccp -fno-tree-sra" } */
+/* { dg-additional-options "-fstack-protector-strong" { target 
fstack_protector } } */
+
+int *a;
+
+int
+foo (__int128 x, int y, int z)
+{
+  __int128 b;
+  *a = ((!!y ? y : x) * y | x) * 2;
+  if (z == 0)
+{
+  unsigned int c = 1;
+  __int128 *d = 
+  for (*a = 0; *a < 1; *a += y)
+   ;
+  *a += b < (c / 0);   /* { dg-warning "division by zero" } */
+  goto l;
+ m:
+  while (b < 1)
+   ;
+  ++*a;
+}
+  goto m;
+ l:
+  return 0;
+}

Jakub


Re: libgo patch committed: Add hurd build tags

2019-02-01 Thread Ian Lance Taylor
And this patch by
adds more hurd build tags, this time to test files.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268459)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-aa860a9ab0d1b60d1f499065a40a11e8a247422f
+87dd981901c645a7d54a52c5f4c35caec31a8978
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/go_unix_test.go
===
--- libgo/go/cmd/go/go_unix_test.go (revision 268369)
+++ libgo/go/cmd/go/go_unix_test.go (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build darwin dragonfly freebsd hurd linux netbsd openbsd solaris
 
 package main_test
 
Index: libgo/go/internal/poll/export_posix_test.go
===
--- libgo/go/internal/poll/export_posix_test.go (revision 268369)
+++ libgo/go/internal/poll/export_posix_test.go (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux nacl netbsd openbsd solaris 
windows
+// +build aix darwin dragonfly freebsd hurd linux nacl netbsd openbsd solaris 
windows
 
 // Export guts for testing on posix.
 // Since testing imports os and os imports internal/poll,
Index: libgo/go/internal/poll/fd_posix_test.go
===
--- libgo/go/internal/poll/fd_posix_test.go (revision 268369)
+++ libgo/go/internal/poll/fd_posix_test.go (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux nacl netbsd openbsd solaris 
windows
+// +build aix darwin dragonfly freebsd hurd linux nacl netbsd openbsd solaris 
windows
 
 package poll_test
 
Index: libgo/go/net/addrselect_test.go
===
--- libgo/go/net/addrselect_test.go (revision 268369)
+++ libgo/go/net/addrselect_test.go (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build darwin dragonfly freebsd hurd linux netbsd openbsd solaris
 
 package net
 
Index: libgo/go/net/cgo_unix_test.go
===
--- libgo/go/net/cgo_unix_test.go   (revision 268369)
+++ libgo/go/net/cgo_unix_test.go   (working copy)
@@ -3,7 +3,7 @@
 // license that can be found in the LICENSE file.
 
 // +build cgo,!netgo
-// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd hurd linux netbsd openbsd solaris
 
 package net
 
Index: libgo/go/net/conf_test.go
===
--- libgo/go/net/conf_test.go   (revision 268369)
+++ libgo/go/net/conf_test.go   (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build darwin dragonfly freebsd hurd linux netbsd openbsd solaris
 
 package net
 
Index: libgo/go/net/dial_unix_test.go
===
--- libgo/go/net/dial_unix_test.go  (revision 268369)
+++ libgo/go/net/dial_unix_test.go  (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd hurd linux netbsd openbsd solaris
 
 package net
 
Index: libgo/go/net/dnsclient_unix_test.go
===
--- libgo/go/net/dnsclient_unix_test.go (revision 268369)
+++ libgo/go/net/dnsclient_unix_test.go (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd hurd linux netbsd openbsd solaris
 
 package net
 
Index: libgo/go/net/dnsconfig_unix_test.go
===
--- libgo/go/net/dnsconfig_unix_test.go (revision 268369)
+++ libgo/go/net/dnsconfig_unix_test.go (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is 

libgo patch committed: Add hurd build tags

2019-02-01 Thread Ian Lance Taylor
This libgo patch by Svante Signell adds hurd build tags.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268458)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-c49ad6c4e66fa7ca992d947a5f0377090abadf6b
+aa860a9ab0d1b60d1f499065a40a11e8a247422f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/archive/tar/stat_actime1.go
===
--- libgo/go/archive/tar/stat_actime1.go(revision 268369)
+++ libgo/go/archive/tar/stat_actime1.go(working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build linux dragonfly openbsd solaris
+// +build hurd linux dragonfly openbsd solaris
 
 package tar
 
Index: libgo/go/archive/tar/stat_unix.go
===
--- libgo/go/archive/tar/stat_unix.go   (revision 268369)
+++ libgo/go/archive/tar/stat_unix.go   (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix linux darwin dragonfly freebsd openbsd netbsd solaris
+// +build aix hurd linux darwin dragonfly freebsd openbsd netbsd solaris
 
 package tar
 
Index: libgo/go/cmd/go/internal/base/signal_unix.go
===
--- libgo/go/cmd/go/internal/base/signal_unix.go(revision 268369)
+++ libgo/go/cmd/go/internal/base/signal_unix.go(working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd js linux nacl netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd hurd js linux nacl netbsd openbsd 
solaris
 
 package base
 
Index: libgo/go/cmd/go/internal/lockedfile/internal/filelock/filelock_other.go
===
--- libgo/go/cmd/go/internal/lockedfile/internal/filelock/filelock_other.go 
(revision 268369)
+++ libgo/go/cmd/go/internal/lockedfile/internal/filelock/filelock_other.go 
(working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build 
!aix,!darwin,!dragonfly,!freebsd,!linux,!netbsd,!openbsd,!plan9,!solaris,!windows
+// +build 
!aix,!darwin,!dragonfly,!freebsd,!hurd,!linux,!netbsd,!openbsd,!plan9,!solaris,!windows
 
 package filelock
 
Index: libgo/go/cmd/go/internal/lockedfile/internal/filelock/filelock_unix.go
===
--- libgo/go/cmd/go/internal/lockedfile/internal/filelock/filelock_unix.go  
(revision 268369)
+++ libgo/go/cmd/go/internal/lockedfile/internal/filelock/filelock_unix.go  
(working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd linux netbsd openbsd
+// +build darwin dragonfly freebsd hurd linux netbsd openbsd
 
 package filelock
 
Index: libgo/go/crypto/rand/eagain.go
===
--- libgo/go/crypto/rand/eagain.go  (revision 268369)
+++ libgo/go/crypto/rand/eagain.go  (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux nacl netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd hurd linux nacl netbsd openbsd solaris
 
 package rand
 
Index: libgo/go/crypto/rand/rand_unix.go
===
--- libgo/go/crypto/rand/rand_unix.go   (revision 268369)
+++ libgo/go/crypto/rand/rand_unix.go   (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix darwin dragonfly freebsd linux nacl netbsd openbsd plan9 solaris
+// +build aix darwin dragonfly freebsd hurd linux nacl netbsd openbsd plan9 
solaris
 
 // Unix cryptographically secure pseudorandom number
 // generator.
Index: libgo/go/crypto/x509/root_unix.go
===
--- libgo/go/crypto/x509/root_unix.go   (revision 268369)
+++ libgo/go/crypto/x509/root_unix.go   (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build aix dragonfly freebsd js,wasm linux nacl netbsd openbsd solaris
+// +build aix dragonfly freebsd hurd js,wasm linux nacl netbsd openbsd solaris
 
 

libgo patch committed: Use __atomic intrinsics instead of __sync

2019-02-01 Thread Ian Lance Taylor
This patch changes libgo to use the __atomic intrinsics instead of the
older __sync intrinsics.  libgo already used some __atomic calls; this
replaces all the __sync calls.  GCC has supported the __atomic
intrinsics since 4.7.  They are better than the __sync intrinsics in
that they specify a memory model and, more importantly for our
purposes, they are reliably implemented either in the compiler or in
libatomic.  This fixes the reopened GCC PR 52084.  Bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
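
For readers unfamiliar with the two builtin families, a minimal C sketch of
the difference follows.  The counter and the helper names are invented for
illustration and are not taken from libgo; only the __sync and __atomic
builtins themselves are real GCC interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared counter.  */
static uint32_t thread_exited;

/* Old style: always a full barrier, no memory model argument.  */
static uint32_t
bump_sync (void)
{
  return __sync_add_and_fetch (&thread_exited, 1);
}

/* New style: same operation, with the memory model spelled out.  */
static uint32_t
bump_atomic (void)
{
  return __atomic_add_fetch (&thread_exited, 1, __ATOMIC_SEQ_CST);
}

/* Compare-and-swap with __atomic: returns nonzero if *p held EXPECTED
   and was replaced by DESIRED.  */
static int
cas_atomic (uint32_t *p, uint32_t expected, uint32_t desired)
{
  return __atomic_compare_exchange_n (p, &expected, desired, 0,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```

Both bump functions return the incremented value.  On targets without
native support for a given width, the __atomic calls may end up as calls
into libatomic, so linking with -latomic can be required there.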

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268450)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-cbcc538adc518da5788d1101e16f106a1514
+c49ad6c4e66fa7ca992d947a5f0377090abadf6b
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 268369)
+++ libgo/Makefile.am   (working copy)
@@ -471,7 +471,6 @@ runtime_files = \
runtime/proc.c \
runtime/runtime_c.c \
runtime/stack.c \
-   runtime/thread.c \
runtime/yield.c \
$(rtems_task_variable_add_file) \
$(runtime_getncpu_file)
Index: libgo/configure.ac
===
--- libgo/configure.ac  (revision 268369)
+++ libgo/configure.ac  (working copy)
@@ -578,62 +578,6 @@ LIBS="$LIBS $MATH_LIBS"
 AC_CHECK_FUNCS(matherr)
 LIBS="$LIBS_hold"
 
-AC_CACHE_CHECK([for __sync_bool_compare_and_swap_4],
-[libgo_cv_func___sync_bool_compare_and_swap_4],
-[AC_LINK_IFELSE([AC_LANG_SOURCE([
-typedef unsigned int uint32  __attribute__ ((mode (SI)));
-uint32 i;
-int main() { return __sync_bool_compare_and_swap (&i, 0, 1); }
-])],
-[libgo_cv_func___sync_bool_compare_and_swap_4=yes],
-[libgo_cv_func___sync_bool_compare_and_swap_4=no])])
-if test "$libgo_cv_func___sync_bool_compare_and_swap_4" = "yes"; then
-  AC_DEFINE(HAVE_SYNC_BOOL_COMPARE_AND_SWAP_4, 1,
-[Define to 1 if the compiler provides the __sync_bool_compare_and_swap 
function for uint32])
-fi
-
-AC_CACHE_CHECK([for __sync_bool_compare_and_swap_8],
-[libgo_cv_func___sync_bool_compare_and_swap_8],
-[AC_LINK_IFELSE([AC_LANG_SOURCE([
-typedef unsigned int uint64  __attribute__ ((mode (DI)));
-uint64 i;
-int main() { return __sync_bool_compare_and_swap (&i, 0, 1); }
-])],
-[libgo_cv_func___sync_bool_compare_and_swap_8=yes],
-[libgo_cv_func___sync_bool_compare_and_swap_8=no])])
-if test "$libgo_cv_func___sync_bool_compare_and_swap_8" = "yes"; then
-  AC_DEFINE(HAVE_SYNC_BOOL_COMPARE_AND_SWAP_8, 1,
-[Define to 1 if the compiler provides the __sync_bool_compare_and_swap 
function for uint64])
-fi
-
-AC_CACHE_CHECK([for __sync_fetch_and_add_4],
-[libgo_cv_func___sync_fetch_and_add_4],
-[AC_LINK_IFELSE([AC_LANG_SOURCE([
-typedef unsigned int uint32  __attribute__ ((mode (SI)));
-uint32 i;
-int main() { return __sync_fetch_and_add (&i, 1); }
-])],
-[libgo_cv_func___sync_fetch_and_add_4=yes],
-[libgo_cv_func___sync_fetch_and_add_4=no])])
-if test "$libgo_cv_func___sync_fetch_and_add_4" = "yes"; then
-  AC_DEFINE(HAVE_SYNC_FETCH_AND_ADD_4, 1,
-[Define to 1 if the compiler provides the __sync_fetch_and_add function 
for uint32])
-fi
-
-AC_CACHE_CHECK([for __sync_add_and_fetch_8],
-[libgo_cv_func___sync_add_and_fetch_8],
-[AC_LINK_IFELSE([AC_LANG_SOURCE([
-typedef unsigned int uint64  __attribute__ ((mode (DI)));
-uint64 i;
-int main() { return __sync_add_and_fetch (&i, 1); }
-])],
-[libgo_cv_func___sync_add_and_fetch_8=yes],
-[libgo_cv_func___sync_add_and_fetch_8=no])])
-if test "$libgo_cv_func___sync_add_and_fetch_8" = "yes"; then
-  AC_DEFINE(HAVE_SYNC_ADD_AND_FETCH_8, 1,
-[Define to 1 if the compiler provides the __sync_add_and_fetch function 
for uint64])
-fi
-
 dnl For x86 we want to use the -minline-all-stringops option to avoid
 dnl forcing a stack split when calling memcpy and friends.
 AC_CACHE_CHECK([whether compiler supports -minline-all-stringops],
Index: libgo/go/runtime/testdata/testprogcgo/lockosthread.c
===
--- libgo/go/runtime/testdata/testprogcgo/lockosthread.c(revision 
268369)
+++ libgo/go/runtime/testdata/testprogcgo/lockosthread.c(working copy)
@@ -9,5 +9,5 @@
 uint32_t threadExited;
 
 void setExited(void *x) {
-   __sync_fetch_and_add(&threadExited, 1);
+   __atomic_add_fetch(&threadExited, 1, __ATOMIC_SEQ_CST);
 }
Index: libgo/go/runtime/testdata/testprogcgo/threadpprof.go
===
--- libgo/go/runtime/testdata/testprogcgo/threadpprof.go(revision 
268369)
+++ libgo/go/runtime/testdata/testprogcgo/threadpprof.go(working copy)
@@ -50,13 +50,13 @@ void pprofCgoThreadTraceback(void* parg)
arg->buf[0] = (uintptr_t)(cpuHogThread) + 

Re: [PATCH] [og8] Allow optional arguments to be used in the use_device OpenACC clause

2019-02-01 Thread Kwok Cheung Yeung
I have retested all the failing cases and they now pass with the 
attached patch. I will commit this to openacc-gcc-8-branch now as the 
fix is obvious.


Kwok

On 01/02/2019 6:02 pm, Kwok Cheung Yeung wrote:

There is an error in the logic here:

--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -8938,18 +8938,51 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, 
omp_context *ctx)

   tkind = GOMP_MAP_FIRSTPRIVATE_INT;
     type = TREE_TYPE (ovar);
     if (TREE_CODE (type) == ARRAY_TYPE)
- var = build_fold_addr_expr (var);
+ {
+   var = build_fold_addr_expr (var);
+   gimplify_assign (x, var, );
+ }
     else
...
+   if (omp_is_reference (ovar) || optional_arg_p)
   {
...
+   gimplify_assign (x, var, );
   }
+
+   if (optional_arg_p)
+ gimple_seq_add_stmt (,
+  gimple_build_label (opt_arg_label));
   }
-   gimplify_assign (x, var, );

The gimplify_assign was hoisted into the two branches of the preceding 
if-else because I wanted to skip the assign if there was a non-present 
optional argument. However, in the else case, the assign only happens if 
omp_is_reference or optional_arg_p is true, when it should be 
unconditional.
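
The shape of the mistake can be reduced to a few lines of C.  This is a
simplified model, not the omp-low.c code: the function and parameter names
are hypothetical, and each function just counts how many assignments would
be emitted for a given combination of conditions.

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy shape: in the non-array branch the assignment is trapped inside
   the block that was only meant to hold the extra reference/optional
   handling.  */
static int
assigns_buggy (bool is_array, bool is_reference, bool optional_arg)
{
  int assigns = 0;
  if (is_array)
    assigns++;
  else
    {
      if (is_reference || optional_arg)
        {
          /* extra handling would go here */
          assigns++;            /* wrong: skipped for plain variables */
        }
    }
  return assigns;
}

/* Intended shape: the extra handling stays conditional, the assignment
   in the non-array branch does not.  */
static int
assigns_fixed (bool is_array, bool is_reference, bool optional_arg)
{
  int assigns = 0;
  if (is_array)
    assigns++;
  else
    {
      if (is_reference || optional_arg)
        {
          /* extra handling would go here */
        }
      assigns++;                /* unconditional in this branch */
    }
  return assigns;
}
```

For a plain variable that is neither an array, a reference, nor an optional
argument, assigns_buggy returns 0 while assigns_fixed returns 1.  A missing
assignment of that kind would be consistent with the "pointer wasn't mapped"
failures reported for the C tests.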


I can confirm that fixing this allows at least 
libgomp.oacc-fortran/host_data-1.f90 to pass again. I will post the 
patch when I have double-checked the other cases.


Thanks

Kwok

On 01/02/2019 4:24 pm, Thomas Schwinge wrote:

Hi Kwok!

On Thu, 31 Jan 2019 18:30:35 +, Kwok Cheung Yeung 
 wrote:

This patch allows for the use of Fortran optional arguments in the
use_device clause of a host_data directive.

I will push this into openacc-gcc-8-branch later today.


Per my testing, it unfortunately also introduces a number of regressions:

 [-PASS:-]{+FAIL:+} 
gfortran.dg/goacc/uninit-use-device-clause.f95   -O   (test for 
warnings, line 7)
 PASS: gfortran.dg/goacc/uninit-use-device-clause.f95   -O  (test 
for excess errors)


(This probably means that the clause argument is no longer
"evaluated/used".)

 PASS: libgomp.c/target-14.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/target-14.c execution test

 libgomp: cuCtxSynchronize error: an illegal memory access was 
encountered


 PASS: libgomp.c/target-18.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/target-18.c execution test

 libgomp: use_device_ptr pointer wasn't mapped

 PASS: libgomp.c++/target-9.C (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c++/target-9.C execution test

 libgomp: use_device_ptr pointer wasn't mapped

 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 
-foffload=nvptx-none  -O0  (test for excess errors)
 [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 
-foffload=nvptx-none  -O0  execution test


 libgomp: use_device_ptr pointer wasn't mapped

 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 
-foffload=nvptx-none  -O2  (test for excess errors)
 [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 
-foffload=nvptx-none  -O2  execution test


No error message.

 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 
-foffload=nvptx-none  -O0  (test for excess errors)
 [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 
-foffload=nvptx-none  -O0  execution test
 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 
-foffload=nvptx-none  -O2  (test for excess errors)
 [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 
-foffload=nvptx-none  -O2  execution test
 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O2  
(test for excess errors)
 [-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O2  
execution test


 host_data-6.exe: 
[...]/libgomp.oacc-c-c++-common/host_data-6.c:15: foo: Assertion `p == 
(float *) host_p' failed.

Same for C++, for "libgomp.oacc-c-c++-common/host_data-5.c", and
"libgomp.oacc-c-c++-common/host_data-6.c".

 PASS: libgomp.oacc-fortran/host_data-1.f90 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O0  
(test for excess errors)
 

[PATCH 11/46] i386: Emulate MMX ashr3/3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX ashr3/3 with SSE.  Only SSE register
source operand is allowed.
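
As an illustration of the operation being emulated, the sketch below uses
GCC's generic vector extensions to do an element-wise arithmetic right shift
on a 64-bit vector of four shorts.  The v4hi typedef and the function name
are chosen here for the example; whether the compiler actually selects the
new pattern for this code depends on the target and options, which this
sketch does not verify.

```c
#include <assert.h>

/* 64-bit vector of four signed 16-bit elements.  */
typedef short v4hi __attribute__ ((vector_size (8)));

/* Shift every element right by COUNT, preserving the sign bit
   (GCC defines >> on signed values as an arithmetic shift).  */
static v4hi
ashr_v4hi (v4hi v, short count)
{
  v4hi counts = { count, count, count, count };
  return v >> counts;
}
```

With the input { -8, 8, -1, 1 } and a count of 2, the result is
{ -2, 2, -1, 0 }.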

PR target/89021
* config/i386/mmx.md (ashr3): New.
(3): Likewise.
---
 gcc/config/i386/mmx.md | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index fe199b84935..0b2383ef764 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1012,6 +1012,40 @@
(const_string "0")))
(set_attr "mode" "DI")])
 
+(define_insn "ashr3"
+  [(set (match_operand:MMXMODE24 0 "register_operand" "=Yx,Yy")
+(ashiftrt:MMXMODE24
+ (match_operand:MMXMODE24 1 "register_operand" "0,Yy")
+ (match_operand:DI 2 "nonmemory_operand" "YxN,YyN")))]
+  "TARGET_MMX_WITH_SSE"
+  "@
+   psra\t{%2, %0|%0, %2}
+   vpsra\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseishft,sseishft")
+   (set (attr "length_immediate")
+ (if_then_else (match_operand 2 "const_int_operand")
+   (const_string "1")
+   (const_string "0")))
+   (set_attr "mode" "TI")])
+
+(define_insn "3"
+  [(set (match_operand:MMXMODE248 0 "register_operand" "=Yx,Yy")
+(any_lshift:MMXMODE248
+ (match_operand:MMXMODE248 1 "register_operand" "0,Yy")
+ (match_operand:DI 2 "nonmemory_operand" "YxN,YyN")))]
+  "TARGET_MMX_WITH_SSE"
+  "@
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseishft,sseishft")
+   (set (attr "length_immediate")
+ (if_then_else (match_operand 2 "const_int_operand")
+   (const_string "1")
+   (const_string "0")))
+   (set_attr "mode" "TI")])
+
 ;
 ;;
 ;; Parallel integral comparisons
-- 
2.20.1



[PATCH 16/46] i386: Emulate MMX pshufw with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX pshufw with SSE.  Only SSE register source operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_pshufw_1): Check TARGET_MMX_WITH_SSE
for SSE emulation.
(*vec_dupv4hi): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 74efe680d9e..599c762e166 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1344,9 +1344,9 @@
 })
 
 (define_insn "mmx_pshufw_1"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yy")
 (vec_select:V4HI
-  (match_operand:V4HI 1 "nonimmediate_operand" "ym")
+  (match_operand:V4HI 1 "nonimmediate_operand" "ym,Yy")
   (parallel [(match_operand 2 "const_0_to_3_operand")
  (match_operand 3 "const_0_to_3_operand")
  (match_operand 4 "const_0_to_3_operand")
@@ -1360,11 +1360,14 @@
   mask |= INTVAL (operands[5]) << 6;
   operands[2] = GEN_INT (mask);
 
-  return "pshufw\t{%2, %1, %0|%0, %1, %2}";
+  if (TARGET_MMX_WITH_SSE)
+return "%vpshuflw\t{%2, %1, %0|%0, %1, %2}";
+  else
+return "pshufw\t{%2, %1, %0|%0, %1, %2}";
 }
-  [(set_attr "type" "mmxcvt")
+  [(set_attr "type" "mmxcvt,sselog")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_insn "mmx_pswapdv2si2"
   [(set (match_operand:V2SI 0 "register_operand" "=y")
@@ -1378,15 +1381,17 @@
(set_attr "mode" "DI")])
 
 (define_insn "*vec_dupv4hi"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yy")
(vec_duplicate:V4HI
  (truncate:HI
-   (match_operand:SI 1 "register_operand" "0"]
+   (match_operand:SI 1 "register_operand" "0,Yy"]
   "TARGET_SSE || TARGET_3DNOW_A"
-  "pshufw\t{$0, %0, %0|%0, %0, 0}"
-  [(set_attr "type" "mmxcvt")
+  "@
+   pshufw\t{$0, %0, %0|%0, %0, 0}
+   %vpshuflw\t{$0, %1, %0|%0, %1, 0}"
+  [(set_attr "type" "mmxcvt,ssemov")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_insn_and_split "*vec_dupv2si"
   [(set (match_operand:V2SI 0 "register_operand" "=y,Yx,Yy")
-- 
2.20.1



[PATCH 41/46] i386: Add tests for MMX intrinsic emulations with SSE

2019-02-01 Thread H.J. Lu
Test MMX intrinsics with -msse2 -mno-mmx in 64-bit mode.

PR target/89021
* gcc.target/i386/mmx-vals.h: New file.
* gcc.target/i386/sse2-mmx-2.c: Likewise.
* gcc.target/i386/sse2-mmx-3.c: Likewise.
* gcc.target/i386/sse2-mmx-4.c: Likewise.
* gcc.target/i386/sse2-mmx-5.c: Likewise.
* gcc.target/i386/sse2-mmx-6.c: Likewise.
* gcc.target/i386/sse2-mmx-7.c: Likewise.
* gcc.target/i386/sse2-mmx-8.c: Likewise.
* gcc.target/i386/sse2-mmx-9.c: Likewise.
* gcc.target/i386/sse2-mmx-10.c: Likewise.
* gcc.target/i386/sse2-mmx-11.c: Likewise.
* gcc.target/i386/sse2-mmx-12.c: Likewise.
* gcc.target/i386/sse2-mmx-13.c: Likewise.
* gcc.target/i386/sse2-mmx-14.c: Likewise.
* gcc.target/i386/sse2-mmx-15.c: Likewise.
* gcc.target/i386/sse2-mmx-16.c: Likewise.
* gcc.target/i386/sse2-mmx-17.c: Likewise.
* gcc.target/i386/sse2-mmx-18.c: Likewise.
* gcc.target/i386/sse2-mmx-19.c: Likewise.
* gcc.target/i386/sse2-mmx-20.c: Likewise.
* gcc.target/i386/sse2-mmx-21.c: Likewise.
* gcc.target/i386/sse2-mmx-cvtpi2ps.c: Likewise.
* gcc.target/i386/sse2-mmx-cvtps2pi.c: Likewise.
* gcc.target/i386/sse2-mmx-cvttps2pi.c: Likewise.
* gcc.target/i386/sse2-mmx-maskmovq.c: Likewise.
* gcc.target/i386/sse2-mmx-packssdw.c: Likewise.
* gcc.target/i386/sse2-mmx-packsswb.c: Likewise.
* gcc.target/i386/sse2-mmx-packuswb.c: Likewise.
* gcc.target/i386/sse2-mmx-paddb.c: Likewise.
* gcc.target/i386/sse2-mmx-paddd.c: Likewise.
* gcc.target/i386/sse2-mmx-paddq.c: Likewise.
* gcc.target/i386/sse2-mmx-paddsb.c: Likewise.
* gcc.target/i386/sse2-mmx-paddsw.c: Likewise.
* gcc.target/i386/sse2-mmx-paddusb.c: Likewise.
* gcc.target/i386/sse2-mmx-paddusw.c: Likewise.
* gcc.target/i386/sse2-mmx-paddw.c: Likewise.
* gcc.target/i386/sse2-mmx-pand.c: Likewise.
* gcc.target/i386/sse2-mmx-pandn.c: Likewise.
* gcc.target/i386/sse2-mmx-pavgb.c: Likewise.
* gcc.target/i386/sse2-mmx-pavgw.c: Likewise.
* gcc.target/i386/sse2-mmx-pcmpeqb.c: Likewise.
* gcc.target/i386/sse2-mmx-pcmpeqd.c: Likewise.
* gcc.target/i386/sse2-mmx-pcmpeqw.c: Likewise.
* gcc.target/i386/sse2-mmx-pcmpgtb.c: Likewise.
* gcc.target/i386/sse2-mmx-pcmpgtd.c: Likewise.
* gcc.target/i386/sse2-mmx-pcmpgtw.c: Likewise.
* gcc.target/i386/sse2-mmx-pextrw.c: Likewise.
* gcc.target/i386/sse2-mmx-pinsrw.c: Likewise.
* gcc.target/i386/sse2-mmx-pmaddwd.c: Likewise.
* gcc.target/i386/sse2-mmx-pmaxsw.c: Likewise.
* gcc.target/i386/sse2-mmx-pmaxub.c: Likewise.
* gcc.target/i386/sse2-mmx-pminsw.c: Likewise.
* gcc.target/i386/sse2-mmx-pminub.c: Likewise.
* gcc.target/i386/sse2-mmx-pmovmskb.c: Likewise.
* gcc.target/i386/sse2-mmx-pmulhuw.c: Likewise.
* gcc.target/i386/sse2-mmx-pmulhw.c: Likewise.
* gcc.target/i386/sse2-mmx-pmullw.c: Likewise.
* gcc.target/i386/sse2-mmx-pmuludq.c: Likewise.
* gcc.target/i386/sse2-mmx-por.c: Likewise.
* gcc.target/i386/sse2-mmx-psadbw.c: Likewise.
* gcc.target/i386/sse2-mmx-pshufw.c: Likewise.
* gcc.target/i386/sse2-mmx-pslld.c: Likewise.
* gcc.target/i386/sse2-mmx-pslldi.c: Likewise.
* gcc.target/i386/sse2-mmx-psllq.c: Likewise.
* gcc.target/i386/sse2-mmx-psllqi.c: Likewise.
* gcc.target/i386/sse2-mmx-psllw.c: Likewise.
* gcc.target/i386/sse2-mmx-psllwi.c: Likewise.
* gcc.target/i386/sse2-mmx-psrad.c: Likewise.
* gcc.target/i386/sse2-mmx-psradi.c: Likewise.
* gcc.target/i386/sse2-mmx-psraw.c: Likewise.
* gcc.target/i386/sse2-mmx-psrawi.c: Likewise.
* gcc.target/i386/sse2-mmx-psrld.c: Likewise.
* gcc.target/i386/sse2-mmx-psrldi.c: Likewise.
* gcc.target/i386/sse2-mmx-psrlq.c: Likewise.
* gcc.target/i386/sse2-mmx-psrlqi.c: Likewise.
* gcc.target/i386/sse2-mmx-psrlw.c: Likewise.
* gcc.target/i386/sse2-mmx-psrlwi.c: Likewise.
* gcc.target/i386/sse2-mmx-psubb.c: Likewise.
* gcc.target/i386/sse2-mmx-psubd.c: Likewise.
* gcc.target/i386/sse2-mmx-psubq.c: Likewise.
* gcc.target/i386/sse2-mmx-psubusb.c: Likewise.
* gcc.target/i386/sse2-mmx-psubusw.c: Likewise.
* gcc.target/i386/sse2-mmx-psubw.c: Likewise.
* gcc.target/i386/sse2-mmx-punpckhbw.c: Likewise.
* gcc.target/i386/sse2-mmx-punpckhdq.c: Likewise.
* gcc.target/i386/sse2-mmx-punpckhwd.c: Likewise.
* gcc.target/i386/sse2-mmx-punpcklbw.c: Likewise.
* gcc.target/i386/sse2-mmx-punpckldq.c: Likewise.
* gcc.target/i386/sse2-mmx-punpcklwd.c: Likewise.
* gcc.target/i386/sse2-mmx-pxor.c: Likewise.
---
 

[PATCH 46/46] i386: Implement V2SF comparisons with SSE

2019-02-01 Thread H.J. Lu
In 64-bit mode, implement V2SF comparisons with SSE.  Only SSE register
source operand is allowed.
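
For context, a V2SF comparison at the source level can be written with GCC's
generic vector extensions as in the sketch below.  The typedef and function
names are made up for this example and are not copied from the new pr89028
tests; the code only shows the semantics (an all-ones or all-zeros integer
mask per element), not which instruction gets selected.

```c
#include <assert.h>

/* 64-bit vectors: two floats, and the matching two-int mask type.  */
typedef float v2sf __attribute__ ((vector_size (8)));
typedef int v2si __attribute__ ((vector_size (8)));

/* Element-wise a < b: each result element is -1 where the comparison
   holds and 0 where it does not.  */
static v2si
less_than (v2sf a, v2sf b)
{
  return a < b;
}
```

Comparing { 1.0f, 4.0f } against { 2.0f, 3.0f } gives the mask { -1, 0 }.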

gcc/

PR target/89028
* config/i386/sse.md (V_128_64): New mode iterator.
(VF_128_64): Likewise.
(sseintvecmode): Add V2SF.
(sseintvecmodelower): Likewise.
(*sse_maskcmpv2sf3_comm): New.
(*sse_maskcmpv2sf3): Likewise.
(vcond): Renamed to ...
(vcond): This.

gcc/testsuite/

PR target/89028
* gcc.target/i386/pr89028-10.c: New test.
* gcc.target/i386/pr89028-11.c: Likewise.
* gcc.target/i386/pr89028-12.c: Likewise.
* gcc.target/i386/pr89028-13.c: Likewise.
---
 gcc/config/i386/sse.md | 61 ++
 gcc/testsuite/gcc.target/i386/pr89028-10.c | 39 ++
 gcc/testsuite/gcc.target/i386/pr89028-11.c | 39 ++
 gcc/testsuite/gcc.target/i386/pr89028-12.c | 39 ++
 gcc/testsuite/gcc.target/i386/pr89028-13.c | 39 ++
 5 files changed, 208 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-13.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 512b6c71a75..03a5160a2e6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -246,6 +246,12 @@
 (define_mode_iterator V_128
   [V16QI V8HI V4SI V2DI V4SF (V2DF "TARGET_SSE2")])
 
+;; All 128bit and 64bit vector modes
+(define_mode_iterator V_128_64
+  [V16QI V8HI V4SI V2DI V4SF (V2DF "TARGET_SSE2")
+   (V8QI "TARGET_MMX_WITH_SSE") (V4HI "TARGET_MMX_WITH_SSE")
+   (V2SI "TARGET_MMX_WITH_SSE") (V2SF "TARGET_MMX_WITH_SSE")])
+
 ;; All 256bit vector modes
 (define_mode_iterator V_256
   [V32QI V16HI V8SI V4DI V8SF V4DF])
@@ -302,6 +308,10 @@
 (define_mode_iterator VF_128
   [V4SF (V2DF "TARGET_SSE2")])
 
+;; All 128bit and 64bit vector float modes
+(define_mode_iterator VF_128_64
+  [V4SF (V2DF "TARGET_SSE2") (V2SF "TARGET_MMX_WITH_SSE")])
+
 ;; All 256bit vector float modes
 (define_mode_iterator VF_256
   [V8SF V4DF])
@@ -734,6 +744,7 @@
   [(V16SF "V16SI") (V8DF  "V8DI")
(V8SF  "V8SI")  (V4DF  "V4DI")
(V4SF  "V4SI")  (V2DF  "V2DI")
+   (V2SF  "V2SI")
(V16SI "V16SI") (V8DI  "V8DI")
(V8SI  "V8SI")  (V4DI  "V4DI")
(V4SI  "V4SI")  (V2DI  "V2DI")
@@ -749,6 +760,7 @@
   [(V16SF "v16si") (V8DF "v8di")
(V8SF "v8si") (V4DF "v4di")
(V4SF "v4si") (V2DF "v2di")
+   (V2SF "v2si")
(V8SI "v8si") (V4DI "v4di")
(V4SI "v4si") (V2DI "v2di")
(V16HI "v16hi") (V8HI "v8hi")
@@ -2766,6 +2778,37 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "")])
 
+(define_insn "*sse_maskcmpv2sf3_comm"
+  [(set (match_operand:V2SF 0 "register_operand" "=Yx,Yx")
+   (match_operator:V2SF 3 "sse_comparison_operator"
+ [(match_operand:V2SF 1 "register_operand" "%0,Yx")
+  (match_operand:V2SF 2 "register_operand" "Yx,Yx")]))]
+  "TARGET_MMX_WITH_SSE
+   && GET_RTX_CLASS (GET_CODE (operands[3])) == RTX_COMM_COMPARE"
+  "@
+   cmp%D3ps\t{%2, %0|%0, %2}
+   vcmp%D3ps\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssecmp")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "SF")])
+
+(define_insn "*sse_maskcmpv2sf3"
+  [(set (match_operand:V2SF 0 "register_operand" "=Yx,Yx")
+   (match_operator:V2SF 3 "sse_comparison_operator"
+ [(match_operand:V2SF 1 "register_operand" "0,Yx")
+  (match_operand:V2SF 2 "register_operand" "Yx,Yx")]))]
+  "TARGET_MMX_WITH_SSE"
+  "@
+   cmp%D3ps\t{%2, %0|%0, %2}
+   vcmp%D3ps\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssecmp")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "orig,vex")
+   (set_attr "mode" "SF")])
+
 (define_mode_attr cmp_imm_predicate
   [(V16SF "const_0_to_31_operand")  (V8DF "const_0_to_31_operand")
(V16SI "const_0_to_7_operand")   (V8DI "const_0_to_7_operand")
@@ -3089,17 +3132,17 @@
   DONE;
 })
 
-(define_expand "vcond"
-  [(set (match_operand:V_128 0 "register_operand")
-   (if_then_else:V_128
+(define_expand "vcond"
+  [(set (match_operand:V_128_64 0 "register_operand")
+   (if_then_else:V_128_64
  (match_operator 3 ""
-   [(match_operand:VF_128 4 "vector_operand")
-(match_operand:VF_128 5 "vector_operand")])
- (match_operand:V_128 1 "general_operand")
- (match_operand:V_128 2 "general_operand")))]
+   [(match_operand:VF_128_64 4 "vector_operand")
+(match_operand:VF_128_64 5 "vector_operand")])
+ (match_operand:V_128_64 1 "general_operand")
+ (match_operand:V_128_64 2 "general_operand")))]
   "TARGET_SSE
-   && (GET_MODE_NUNITS (mode)
-   == GET_MODE_NUNITS (mode))"
+   && (GET_MODE_NUNITS (mode)
+   == GET_MODE_NUNITS 

[PATCH 40/46] i386: Don't enable MMX in 64-bit mode by default

2019-02-01 Thread H.J. Lu
In 64-bit mode, don't enable MMX by default and allow SSE/SSE2/SSSE3
to emulate MMX intrinsics when MMX is disabled.  For pr82483-1.c and
pr82483-2.c, "-mssse3 -mno-mmx" no longer ICEs in 64-bit mode since MMX
intrinsics are supported now when MMX is disabled.

gcc/

PR target/89021
* config/i386/driver-i386.c (host_detect_local_cpu): Pass
-mmmx-native instead of -mmmx to ix86_option_override_internal.
* config/i386/i386-builtin.def: Enable MMX intrinsics with
SSE/SSE2/SSSE3.
* config/i386/i386.c (ix86_option_override_internal): Don't
enable MMX in 64-bit mode by default.  Turn on MMX for -msse
or -mmmx-native when not in 64-bit mode.
(bdesc_tm): Enable MMX intrinsics with SSE/SSE2/SSSE3.
(ix86_init_mmx_sse_builtins): Likewise.
(ix86_expand_builtin): Allow SSE/SSE2/SSSE3 to emulate MMX
intrinsics in 64-bit mode when MMX is disabled.
* config/i386/i386.opt (-mmmx-native): New.  Undocumented
command-line option.
* config/i386/mmintrin.h: Don't enable MMX in 64-bit mode.

gcc/testsuite/

PR target/89021
* gcc.target/i386/pr82483-1.c: Error only on ia32.
* gcc.target/i386/pr82483-2.c: Likewise.
* gcc.target/i386/sse-mmx-1.c: New test.
* objc.dg/gnu-encoding/struct-layout-1.h: Include 
in 64-bit mode when MMX is disabled and SSE2 is enabled.
---
 gcc/config/i386/driver-i386.c |   4 +-
 gcc/config/i386/i386-builtin.def  | 126 +-
 gcc/config/i386/i386.c|  64 ++---
 gcc/config/i386/i386.opt  |   4 +
 gcc/config/i386/mmintrin.h|  10 +-
 gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
 gcc/testsuite/gcc.target/i386/sse-mmx-1.c |  12 ++
 .../objc.dg/gnu-encoding/struct-layout-1.h|   2 +-
 9 files changed, 139 insertions(+), 87 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse-mmx-1.c

diff --git a/gcc/config/i386/driver-i386.c b/gcc/config/i386/driver-i386.c
index 75f70269517..38914126354 100644
--- a/gcc/config/i386/driver-i386.c
+++ b/gcc/config/i386/driver-i386.c
@@ -1070,7 +1070,9 @@ const char *host_detect_local_cpu (int argc, const char 
**argv)
 
   if (arch)
 {
-  const char *mmx = has_mmx ? " -mmmx" : " -mno-mmx";
+  /* Pass -mmmx-native, instead of -mmmx, to x86 backend so that
+ MMX won't be enabled by -march=native in 64-bit mode.  */
+  const char *mmx = has_mmx ? " -mmmx-native" : " -mno-mmx";
   const char *mmx3dnow = has_3dnow ? " -m3dnow" : " -mno-3dnow";
   const char *sse = has_sse ? " -msse" : " -mno-sse";
   const char *sse2 = has_sse2 ? " -msse2" : " -mno-sse2";
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 88005f4687f..10a9d631f29 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -100,7 +100,7 @@ BDESC (0, 0, CODE_FOR_fnstsw, "__builtin_ia32_fnstsw", 
IX86_BUILTIN_FNSTSW, UNKN
 BDESC (0, 0, CODE_FOR_fnclex, "__builtin_ia32_fnclex", IX86_BUILTIN_FNCLEX, 
UNKNOWN, (int) VOID_FTYPE_VOID)
 
 /* MMX */
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_emms, "__builtin_ia32_emms", 
IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
+BDESC (OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE2, 0, CODE_FOR_mmx_emms, 
"__builtin_ia32_emms", IX86_BUILTIN_EMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
 
 /* 3DNow! */
 BDESC (OPTION_MASK_ISA_3DNOW, 0, CODE_FOR_mmx_femms, "__builtin_ia32_femms", 
IX86_BUILTIN_FEMMS, UNKNOWN, (int) VOID_FTYPE_VOID)
@@ -442,68 +442,68 @@ BDESC (0, 0, CODE_FOR_rotrqi3, "__builtin_ia32_rorqi", 
IX86_BUILTIN_RORQI, UNKNO
 BDESC (0, 0, CODE_FOR_rotrhi3, "__builtin_ia32_rorhi", IX86_BUILTIN_RORHI, 
UNKNOWN, (int) UINT16_FTYPE_UINT16_INT)
 
 /* MMX */
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv8qi3, "__builtin_ia32_paddb", 
IX86_BUILTIN_PADDB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv4hi3, "__builtin_ia32_paddw", 
IX86_BUILTIN_PADDW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_addv2si3, "__builtin_ia32_paddd", 
IX86_BUILTIN_PADDD, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv8qi3, "__builtin_ia32_psubb", 
IX86_BUILTIN_PSUBB, UNKNOWN, (int) V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv4hi3, "__builtin_ia32_psubw", 
IX86_BUILTIN_PSUBW, UNKNOWN, (int) V4HI_FTYPE_V4HI_V4HI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_subv2si3, "__builtin_ia32_psubd", 
IX86_BUILTIN_PSUBD, UNKNOWN, (int) V2SI_FTYPE_V2SI_V2SI)
-
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv8qi3, 
"__builtin_ia32_paddsb", IX86_BUILTIN_PADDSB, UNKNOWN, (int) 
V8QI_FTYPE_V8QI_V8QI)
-BDESC (OPTION_MASK_ISA_MMX, 0, CODE_FOR_mmx_ssaddv4hi3, 
"__builtin_ia32_paddsw", IX86_BUILTIN_PADDSW, UNKNOWN, (int) 

[PATCH 42/46] i386: Also enable SSSE3 __m64 tests without MMX

2019-02-01 Thread H.J. Lu
Since we now emulate MMX intrinsics with SSE when MMX is disabled, we
can enable SSSE3 __m64 tests without MMX even when AVX is enabled.

PR target/89021
* gcc.target/i386/ssse3-pabsb.c: Also enable __m64 check when
MMX is disabled.
* gcc.target/i386/ssse3-pabsd.c: Likewise.
* gcc.target/i386/ssse3-pabsw.c: Likewise.
* gcc.target/i386/ssse3-palignr.c: Likewise.
* gcc.target/i386/ssse3-phaddd.c: Likewise.
* gcc.target/i386/ssse3-phaddsw.c: Likewise.
* gcc.target/i386/ssse3-phaddw.c: Likewise.
* gcc.target/i386/ssse3-phsubd.c: Likewise.
* gcc.target/i386/ssse3-phsubsw.c: Likewise.
* gcc.target/i386/ssse3-phsubw.c: Likewise.
* gcc.target/i386/ssse3-pmaddubsw.c: Likewise.
* gcc.target/i386/ssse3-pmulhrsw.c: Likewise.
* gcc.target/i386/ssse3-pshufb.c: Likewise.
* gcc.target/i386/ssse3-psignb.c: Likewise.
* gcc.target/i386/ssse3-psignd.c: Likewise.
* gcc.target/i386/ssse3-psignw.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/ssse3-pabsb.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pabsd.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pabsw.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-palignr.c   | 6 +++---
 gcc/testsuite/gcc.target/i386/ssse3-phaddd.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phaddsw.c   | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phaddw.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phsubd.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phsubsw.c   | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-phsubw.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pmaddubsw.c | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pmulhrsw.c  | 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-pshufb.c| 6 +++---
 gcc/testsuite/gcc.target/i386/ssse3-psignb.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-psignd.c| 4 ++--
 gcc/testsuite/gcc.target/i386/ssse3-psignw.c| 4 ++--
 16 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c 
b/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c
index 7caa1b6c3a6..36c62a73052 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-pabsb.c
@@ -15,7 +15,7 @@
 #include "ssse3-vals.h"
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || !defined __MMX__
 /* Test the 64-bit form */
 static void
 ssse3_test_pabsb (int *i1, int *r)
@@ -63,7 +63,7 @@ TEST (void)
   /* Manually compute the result */
   compute_correct_result([i + 0], ck);
 
-#ifndef __AVX__
+#if !defined __AVX__ || !defined __MMX__
   /* Run the 64-bit tests */
   ssse3_test_pabsb ([i + 0], [0]);
   ssse3_test_pabsb ([i + 2], [2]);
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c 
b/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c
index 3a73cf01170..4c600377e6a 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-pabsd.c
@@ -16,7 +16,7 @@
 
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || !defined __MMX__
 /* Test the 64-bit form */
 static void
 ssse3_test_pabsd (int *i1, int *r)
@@ -62,7 +62,7 @@ TEST (void)
   /* Manually compute the result */
   compute_correct_result([i + 0], ck);
 
-#ifndef __AVX__
+#if !defined __AVX__ || !defined __MMX__
   /* Run the 64-bit tests */
   ssse3_test_pabsd ([i + 0], [0]);
   ssse3_test_pabsd ([i + 2], [2]);
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c 
b/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c
index 67e4721b8e6..8d0e2386e3d 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-pabsw.c
@@ -16,7 +16,7 @@
 
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || !defined __MMX__
 /* Test the 64-bit form */
 static void
 ssse3_test_pabsw (int *i1, int *r)
@@ -64,7 +64,7 @@ TEST (void)
   /* Manually compute the result */
   compute_correct_result ([i + 0], ck);
 
-#ifndef __AVX__
+#if !defined __AVX__ || !defined __MMX__
   /* Run the 64-bit tests */
   ssse3_test_pabsw ([i + 0], [0]);
   ssse3_test_pabsw ([i + 2], [2]);
diff --git a/gcc/testsuite/gcc.target/i386/ssse3-palignr.c 
b/gcc/testsuite/gcc.target/i386/ssse3-palignr.c
index dbee9bee4aa..1e01e3a69bd 100644
--- a/gcc/testsuite/gcc.target/i386/ssse3-palignr.c
+++ b/gcc/testsuite/gcc.target/i386/ssse3-palignr.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 
-#ifndef __AVX__
+#if !defined __AVX__ || !defined __MMX__
 /* Test the 64-bit form */
 static void
 ssse3_test_palignr (int *i1, int *i2, unsigned int imm, int *r)
@@ -214,7 +214,7 @@ compute_correct_result_128 (int *i1, int *i2, unsigned int 
imm, int *r)
   bout[i] = buf[imm + i];
 }
 
-#ifndef __AVX__
+#if !defined __AVX__ || !defined __MMX__
 static void
 compute_correct_result_64 (int *i1, int *i2, unsigned int imm, int *r)
 {
@@ -256,7 +256,7 @@ TEST (void)
   for (i = 0; i < 256; i += 8)
 for (imm = 0; imm < 100; imm++)
 

[PATCH 38/46] i386: Allow MMXMODE moves without MMX

2019-02-01 Thread H.J. Lu
In 64-bit mode, allow MMXMODE moves with SSE when MMX is disabled.

PR target/89021
* config/i386/mmx.md (MMXMODE:mov): Check TARGET_MMX_INSNS
instead of TARGET_MMX.
(MMXMODE:*mov_internal): Likewise.
(MMXMODE:movmisalign): Likewise.
---
 gcc/config/i386/mmx.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 8dc17f0241f..d269e85d332 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -75,7 +75,7 @@
 (define_expand "mov"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX_INSNS"
 {
   ix86_expand_vector_move (mode, operands);
   DONE;
@@ -86,7 +86,7 @@
 "=r ,o ,r,r ,m ,?!y,!y,?!y,m  ,r  ,?!y,v,v,v,m,r,v,!y,*x")
(match_operand:MMXMODE 1 "nonimm_or_0_operand"
 "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
-  "TARGET_MMX
+  "TARGET_MMX_INSNS
&& !(MEM_P (operands[0]) && MEM_P (operands[1]))"
 {
   switch (get_attr_type (insn))
@@ -237,7 +237,7 @@
 (define_expand "movmisalign"
   [(set (match_operand:MMXMODE 0 "nonimmediate_operand")
(match_operand:MMXMODE 1 "nonimmediate_operand"))]
-  "TARGET_MMX"
+  "TARGET_MMX_INSNS"
 {
   ix86_expand_vector_move (mode, operands);
   DONE;
-- 
2.20.1



[PATCH 43/46] i386: Enable 8-byte vectorizer for TARGET_MMX_WITH_SSE

2019-02-01 Thread H.J. Lu
In 64-bit mode, support 8-byte vectorization with SSE when MMX is disabled.
Also xfail x86-64 targets for gcc.dg/tree-ssa/pr84512.c.

gcc/

PR target/89028
* config/i386/i386.c (ix86_autovectorize_vector_sizes): Enable
8-byte vectorizer for TARGET_MMX_WITH_SSE.

gcc/testsuite/

PR target/89028
* gcc.dg/tree-ssa/pr84512.c: Also xfail x86-64 targets.
* gcc.target/i386/pr89028-1.c: New test.
---
 gcc/config/i386/i386.c|  2 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c   |  2 +-
 gcc/testsuite/gcc.target/i386/pr89028-1.c | 10 ++
 3 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-1.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ae517959639..b1f538969d2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -50215,6 +50215,8 @@ ix86_autovectorize_vector_sizes (vector_sizes *sizes)
   sizes->safe_push (32);
   sizes->safe_push (16);
 }
+  if (TARGET_MMX_WITH_SSE)
+sizes->safe_push (8);
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
index 3975757d844..8f8529ba8cf 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
@@ -13,4 +13,4 @@ int foo()
 }
 
 /* Listed targets xfailed due to PR84958.  */
-/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail { { 
alpha*-*-* amdgcn*-*-* nvptx*-*-* } || { sparc*-*-* && lp64 } } } } } */
+/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail { { { 
alpha*-*-* amdgcn*-*-* nvptx*-*-* } || { sparc*-*-* && lp64 } } || { { i?86-*-* 
x86_64-*-* } && { ! ia32 } } } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr89028-1.c 
b/gcc/testsuite/gcc.target/i386/pr89028-1.c
new file mode 100644
index 000..d2ebb7f844d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89028-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-mavx2 -O3" } */
+/* { dg-final { scan-assembler "vpaddb\[ \\t\]+\[^\n\]*%xmm\[0-9\]" } } */
+
+void
+foo (char* restrict r, char* restrict a)
+{
+  for (int i = 0; i < 8; i++)
+r[i] += a[i];
+}
-- 
2.20.1



[PATCH 39/46] i386: Allow MMX vector expanders with SSE

2019-02-01 Thread H.J. Lu
In 64-bit mode, allow MMX vector expanders with SSE when MMX is disabled.

PR target/89021
* config/i386/i386.c (ix86_expand_vector_init_duplicate): Set
mmx_ok to true if TARGET_MMX_WITH_SSE is true.
(ix86_expand_vector_init_one_nonzero): Likewise.
(ix86_expand_vector_init_one_var): Likewise.
(ix86_expand_vector_init_general): Likewise.
(ix86_expand_vector_init): Likewise.
(ix86_expand_vector_set): Likewise.
(ix86_expand_vector_extract): Likewise.
* config/i386/mmx.md (*vec_dupv2sf): Changed to
define_insn_and_split to support SSE emulation.
(vec_setv2sf): Check TARGET_MMX_INSNS instead of TARGET_MMX.
(*vec_extractv2sf_0): Likewise.
(*vec_extractv2sf_1): Likewise.
(vec_extractv2sf_1 splitter): Likewise.
(vec_extractv2sfsf): Likewise.
(vec_setv2si): Likewise.
(*vec_extractv2si_0): Likewise.
(*vec_extractv2si_1): Likewise.
(vec_extractv2si_1 splitter): Likewise.
(*vec_extractv2si_zext_mem): Likewise.
(vec_extractv2sisi): Likewise.
(vec_setv4hi): Likewise.
(vec_extractv4hihi): Likewise.
(vec_setv8qi): Likewise.
(vec_extractv8qiqi): Likewise.
---
 gcc/config/i386/i386.c |  8 +++
 gcc/config/i386/mmx.md | 52 +-
 2 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d795af1dd93..d8bd018b800 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -42361,6 +42361,7 @@ ix86_expand_vector_init_duplicate (bool mmx_ok, 
machine_mode mode,
 {
   bool ok;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SImode:
@@ -42520,6 +42521,7 @@ ix86_expand_vector_init_one_nonzero (bool mmx_ok, 
machine_mode mode,
   bool use_vector_set = false;
   rtx (*gen_vec_set_0) (rtx, rtx, rtx) = NULL;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2DImode:
@@ -42713,6 +42715,7 @@ ix86_expand_vector_init_one_var (bool mmx_ok, 
machine_mode mode,
   XVECEXP (const_vec, 0, one_var) = CONST0_RTX (GET_MODE_INNER (mode));
   const_vec = gen_rtx_CONST_VECTOR (mode, XVEC (const_vec, 0));
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2DFmode:
@@ -43098,6 +43101,7 @@ ix86_expand_vector_init_general (bool mmx_ok, 
machine_mode mode,
   machine_mode quarter_mode = VOIDmode;
   int n, i;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SFmode:
@@ -43297,6 +43301,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx 
vals)
   int i;
   rtx x;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
+
   /* Handle first initialization from vector elts.  */
   if (n_elts != XVECLEN (vals, 0))
 {
@@ -43396,6 +43402,7 @@ ix86_expand_vector_set (bool mmx_ok, rtx target, rtx 
val, int elt)
   machine_mode mmode = VOIDmode;
   rtx (*gen_blendm) (rtx, rtx, rtx, rtx);
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SFmode:
@@ -43751,6 +43758,7 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, 
rtx vec, int elt)
   bool use_vec_extr = false;
   rtx tmp;
 
+  mmx_ok |= TARGET_MMX_WITH_SSE;
   switch (mode)
 {
 case E_V2SImode:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d269e85d332..ba81b48e515 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -595,14 +595,24 @@
(set_attr "prefix_extra" "1")
(set_attr "mode" "V2SF")])
 
-(define_insn "*vec_dupv2sf"
-  [(set (match_operand:V2SF 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2sf"
+  [(set (match_operand:V2SF 0 "register_operand" "=y,Yx,Yy")
(vec_duplicate:V2SF
- (match_operand:SF 1 "register_operand" "0")))]
-  "TARGET_MMX"
+ (match_operand:SF 1 "register_operand" "0,0,Yy")))]
+  "TARGET_MMX_INSNS"
   "punpckldq\t%0, %0"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+  "&& reload_completed && TARGET_MMX_WITH_SSE"
+  [(const_int 0)]
+{
+  /* Emulate MMX vec_dupv2sf with SSE vec_dupv4sf.  */
+  rtx op0 = gen_rtx_REG (V4SFmode, REGNO (operands[0]));
+  rtx insn = gen_vec_dupv4sf (op0, operands[1]);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "*mmx_concatv2sf"
   [(set (match_operand:V2SF 0 "register_operand" "=y,y")
@@ -620,7 +630,7 @@
   [(match_operand:V2SF 0 "register_operand")
(match_operand:SF 1 "register_operand")
(match_operand 2 "const_int_operand")]
-  "TARGET_MMX"
+  "TARGET_MMX_INSNS"
 {
   ix86_expand_vector_set (false, operands[0], operands[1],
  INTVAL (operands[2]));
@@ -634,7 +644,7 @@
(vec_select:SF
  (match_operand:V2SF 1 "nonimmediate_operand" " xm,x,ym,y,m,m")
  (parallel [(const_int 0)])))]
-  "TARGET_MMX && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  

[PATCH 37/46] i386: Emulate MMX abs2 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX abs2 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/sse.md (abs2): Add SSE emulation.
---
 gcc/config/i386/sse.md | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f0d42a17c93..931c40934b1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16077,16 +16077,18 @@
 })
 
 (define_insn "abs2"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yy")
(abs:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "ym")))]
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "ym,Yy")))]
   "TARGET_SSSE3"
-  "pabs\t{%1, %0|%0, %1}";
+  "@
+   pabs\t{%1, %0|%0, %1}
+   %vpabs\t{%1, %0|%0, %1}"
   [(set_attr "type" "sselog1")
(set_attr "prefix_rep" "0")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 44/46] i386: Implement V2SF add/sub/mul with SSE

2019-02-01 Thread H.J. Lu
In 64-bit mode, implement V2SF add/sub/mul with SSE.  Only SSE register
source operands are allowed.
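
For illustration only (this is not one of the new tests in the patch), the
kind of source these patterns are meant to cover can be written with GCC's
generic vector extension.  The type name v2sf and the three function names
below are made up for this sketch.  Whether the compiler actually emits
addps/subps/mulps on xmm registers depends on the compiler version and on
options such as "-O2 -msse2 -mno-mmx"; the C semantics are the same either
way.

```c
/* 8-byte vector of two floats, using GCC's generic vector extension.  */
typedef float v2sf __attribute__ ((__vector_size__ (8)));

/* Element-wise V2SF addition.  */
v2sf
add_v2sf (v2sf a, v2sf b)
{
  return a + b;
}

/* Element-wise V2SF subtraction.  */
v2sf
sub_v2sf (v2sf a, v2sf b)
{
  return a - b;
}

/* Element-wise V2SF multiplication.  */
v2sf
mul_v2sf (v2sf a, v2sf b)
{
  return a * b;
}
```

Compiling this with -S and looking for the SSE mnemonics is one way to see
whether the V2SF patterns were used on a given build.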

gcc/

PR target/89028
* config/i386/i386.md (comm): Handle mult.
* config/i386/mmx.md (plusminusmult): New.
(plusminusmult_insn): Likewise.
(plusminusmult_mnemonic): Likewise.
(plusminusmult_type): Likewise.
(mmx_addv2sf3): Add "&& !TARGET_MMX_WITH_SSE".
(*mmx_addv2sf3): Likewise.
(mmx_subv2sf3): Likewise.
(mmx_subrv2sf3): Likewise.
(*mmx_subv2sf3): Likewise.
(mmx_mulv2sf3): Likewise.
(*mmx_mulv2sf3): Likewise.
(v2sf3): New.
(*sse_v2sf3): Likewise.

gcc/testsuite/

PR target/89028
* gcc.target/i386/pr89028-2.c: New test.
* gcc.target/i386/pr89028-3.c: Likewise.
* gcc.target/i386/pr89028-4.c: Likewise.
* gcc.target/i386/pr89028-5.c: Likewise.
* gcc.target/i386/pr89028-6.c: Likewise.
* gcc.target/i386/pr89028-7.c: Likewise.
---
 gcc/config/i386/i386.md   |  3 +-
 gcc/config/i386/mmx.md| 56 ---
 gcc/testsuite/gcc.target/i386/pr89028-2.c | 11 +
 gcc/testsuite/gcc.target/i386/pr89028-3.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr89028-4.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr89028-5.c | 11 +
 gcc/testsuite/gcc.target/i386/pr89028-6.c | 14 ++
 gcc/testsuite/gcc.target/i386/pr89028-7.c | 14 ++
 8 files changed, 129 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-7.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 744f155fca6..89ebca70e6d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -861,7 +861,8 @@
 
 ;; Mark commutative operators as such in constraints.
 (define_code_attr comm [(plus "%") (ss_plus "%") (us_plus "%")
-   (minus "") (ss_minus "") (us_minus "")])
+   (minus "") (ss_minus "") (us_minus "")
+   (mult "%")])
 
 ;; Mapping of max and min
 (define_code_iterator maxmin [smax smin umax umin])
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index ba81b48e515..7350da03069 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -63,6 +63,20 @@
 ;; Instruction suffix for truncations with saturation.
 (define_code_attr s_trunsuffix [(ss_truncate "s") (us_truncate "u")])
 
+(define_code_iterator plusminusmult [plus minus mult])
+
+;; Base name for define_insn
+(define_code_attr plusminusmult_insn
+  [(plus "add") (minus "sub") (mult "mul")])
+
+;; Base name for insn mnemonic.
+(define_code_attr plusminusmult_mnemonic
+  [(plus "add") (minus "sub") (mult "mul")])
+
+;; Insn type name for insn mnemonic.
+(define_code_attr plusminusmult_type
+  [(plus "add") (minus "add") (mult "mul")])
+
 ;
 ;;
 ;; Move patterns
@@ -277,14 +291,16 @@
(plus:V2SF
  (match_operand:V2SF 1 "nonimmediate_operand")
  (match_operand:V2SF 2 "nonimmediate_operand")))]
-  "TARGET_3DNOW"
+  "TARGET_3DNOW && !TARGET_MMX_WITH_SSE"
   "ix86_fixup_binary_operands_no_copy (PLUS, V2SFmode, operands);")
 
 (define_insn "*mmx_addv2sf3"
   [(set (match_operand:V2SF 0 "register_operand" "=y")
(plus:V2SF (match_operand:V2SF 1 "nonimmediate_operand" "%0")
   (match_operand:V2SF 2 "nonimmediate_operand" "ym")))]
-  "TARGET_3DNOW && ix86_binary_operator_ok (PLUS, V2SFmode, operands)"
+  "TARGET_3DNOW
+   && !TARGET_MMX_WITH_SSE
+   && ix86_binary_operator_ok (PLUS, V2SFmode, operands)"
   "pfadd\t{%2, %0|%0, %2}"
   [(set_attr "type" "mmxadd")
(set_attr "prefix_extra" "1")
@@ -294,19 +310,21 @@
   [(set (match_operand:V2SF 0 "register_operand")
 (minus:V2SF (match_operand:V2SF 1 "register_operand")
(match_operand:V2SF 2 "nonimmediate_operand")))]
-  "TARGET_3DNOW")
+  "TARGET_3DNOW && !TARGET_MMX_WITH_SSE")
 
 (define_expand "mmx_subrv2sf3"
   [(set (match_operand:V2SF 0 "register_operand")
 (minus:V2SF (match_operand:V2SF 2 "register_operand")
(match_operand:V2SF 1 "nonimmediate_operand")))]
-  "TARGET_3DNOW")
+  "TARGET_3DNOW && !TARGET_MMX_WITH_SSE")
 
 (define_insn "*mmx_subv2sf3"
   [(set (match_operand:V2SF 0 "register_operand" "=y,y")
 (minus:V2SF (match_operand:V2SF 1 "nonimmediate_operand" "0,ym")
(match_operand:V2SF 2 "nonimmediate_operand" "ym,0")))]
-  "TARGET_3DNOW && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "TARGET_3DNOW
+   && !TARGET_MMX_WITH_SSE
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
   "@
   

[PATCH 45/46] i386: Implement V2SF <-> V2SI conversions with SSE

2019-02-01 Thread H.J. Lu
In 64-bit mode, implement V2SF <-> V2SI conversions with SSE.  Only SSE
register source operands are allowed.

gcc/

PR target/89028
* config/i386/sse.md (floatv2siv2sf2): New.
(fix_truncv2sfv2si2): Likewise.

gcc/testsuite/

PR target/89028
* gcc.target/i386/pr89028-8.c: New test.
* gcc.target/i386/pr89028-9.c: Likewise.
---
 gcc/config/i386/sse.md| 31 +++
 gcc/testsuite/gcc.target/i386/pr89028-8.c | 12 +
 gcc/testsuite/gcc.target/i386/pr89028-9.c | 12 +
 3 files changed, 55 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-9.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 931c40934b1..512b6c71a75 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4895,6 +4895,17 @@
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "")])
 
+(define_insn "floatv2siv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand" "=Yx,Yy")
+   (float:V2SF
+ (match_operand:V2SI 1 "register_operand" "Yx,Yy")))]
+  "TARGET_MMX_WITH_SSE"
+  "%vcvtdq2ps\t{%1, %0|%0, %1}"
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "prefix" "maybe_vex")
+   (set_attr "mode" "V4SF")])
+
 (define_insn "ufloat2"
   [(set (match_operand:VF1_AVX512VL 0 "register_operand" "=v")
(unsigned_float:VF1_AVX512VL
@@ -5054,6 +5065,26 @@
(set_attr "prefix" "")
(set_attr "mode" "TI")])
 
+(define_insn "fix_truncv2sfv2si2"
+  [(set (match_operand:V2SI 0 "register_operand" "=Yy")
+   (fix:V2SI (match_operand:V2SF 1 "register_operand" "Yy")))]
+  "TARGET_MMX_WITH_SSE"
+  "%vcvttps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set (attr "prefix_rep")
+ (if_then_else
+   (match_test "TARGET_AVX")
+ (const_string "*")
+ (const_string "1")))
+   (set (attr "prefix_data16")
+ (if_then_else
+   (match_test "TARGET_AVX")
+ (const_string "*")
+ (const_string "0")))
+   (set_attr "prefix_data16" "0")
+   (set_attr "prefix" "maybe_evex")
+   (set_attr "mode" "TI")])
+
 (define_expand "fixuns_trunc2"
   [(match_operand: 0 "register_operand")
(match_operand:VF1 1 "register_operand")]
diff --git a/gcc/testsuite/gcc.target/i386/pr89028-8.c 
b/gcc/testsuite/gcc.target/i386/pr89028-8.c
new file mode 100644
index 000..35cdf1ed332
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89028-8.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse2 -mno-mmx" } */
+/* { dg-final { scan-assembler-times "cvttps2dq" 1 } } */
+
+typedef int __v2si __attribute__ ((__vector_size__ (8)));
+typedef float __v2sf __attribute__ ((__vector_size__ (8)));
+
+__v2si
+foo1 ( __v2sf x)
+{
+  return __builtin_convertvector (x, __v2si);
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr89028-9.c 
b/gcc/testsuite/gcc.target/i386/pr89028-9.c
new file mode 100644
index 000..17242c0402d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr89028-9.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse2 -mno-mmx" } */
+/* { dg-final { scan-assembler-times "cvtdq2ps" 1 } } */
+
+typedef int __v2si __attribute__ ((__vector_size__ (8)));
+typedef float __v2sf __attribute__ ((__vector_size__ (8)));
+
+__v2sf
+foo1 ( __v2si x)
+{
+  return __builtin_convertvector (x, __v2sf);
+}
-- 
2.20.1



[PATCH 36/46] i386: Emulate MMX ssse3_palignrdi with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX version of palignrq with SSE version by concatenating 2
64-bit MMX operands into a single 128-bit SSE operand, followed by
SSE psrldq.  Only SSE register source operand is allowed.
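
As a rough scalar sketch of why this works (plain C, not code from the
patch): the 64-bit palignr takes the low 8 bytes of the 16-byte
concatenation shifted right by a byte count, which is what a 128-bit byte
shift of the concatenated value gives.  All names below (palignr64_model,
psrldq_model, palignr64_via_128, u128) are hypothetical, and the shift
count here is in bytes, as in the instruction immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of the 64-bit palignr: concatenate DST (high half) and
   SRC (low half), shift right by IMM bytes, keep the low 8 bytes.  */
uint64_t
palignr64_model (uint64_t dst, uint64_t src, unsigned imm)
{
  uint8_t buf[16];
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    {
      buf[i] = (uint8_t) (src >> (8 * i));
      buf[8 + i] = (uint8_t) (dst >> (8 * i));
    }
  for (unsigned i = 0; i < 8; i++)
    if (imm + i < 16)
      r |= (uint64_t) buf[imm + i] << (8 * i);
  return r;
}

/* A 128-bit value as two 64-bit halves.  */
typedef struct { uint64_t lo, hi; } u128;

/* Model of a 128-bit logical right shift by a whole number of bytes;
   counts of 16 or more give zero.  */
static u128
psrldq_model (u128 x, unsigned bytes)
{
  u128 r = { 0, 0 };
  if (bytes == 0)
    r = x;
  else if (bytes < 8)
    {
      r.lo = (x.lo >> (8 * bytes)) | (x.hi << (64 - 8 * bytes));
      r.hi = x.hi >> (8 * bytes);
    }
  else if (bytes < 16)
    r.lo = x.hi >> (8 * (bytes - 8));
  return r;
}

/* Emulation: put SRC in bits 0:63 and DST in bits 64:127, shift the
   128-bit value, and read back the low 64 bits.  */
uint64_t
palignr64_via_128 (uint64_t dst, uint64_t src, unsigned imm)
{
  u128 cat = { src, dst };
  return psrldq_model (cat, imm).lo;
}

/* Compare both models over a range of byte counts.  */
void
palignr_self_check (void)
{
  uint64_t dst = 0x1122334455667788ULL, src = 0x99aabbccddeeff00ULL;
  for (unsigned imm = 0; imm <= 20; imm++)
    assert (palignr64_model (dst, src, imm)
	    == palignr64_via_128 (dst, src, imm));
}
```

The sketch only models the data flow; register allocation and the AVX
versus non-AVX concatenation order are handled by the splitter in the
patch itself.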

PR target/89021
* config/i386/sse.md (ssse3_palignrdi): Changed to
define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 44 +++---
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6cad298eb86..f0d42a17c93 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15967,23 +15967,53 @@
(set_attr "prefix" "orig,vex,evex")
(set_attr "mode" "")])
 
-(define_insn "ssse3_palignrdi"
-  [(set (match_operand:DI 0 "register_operand" "=y")
-   (unspec:DI [(match_operand:DI 1 "register_operand" "0")
-   (match_operand:DI 2 "nonimmediate_operand" "ym")
-   (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n")]
+(define_insn_and_split "ssse3_palignrdi"
+  [(set (match_operand:DI 0 "register_operand" "=y,Yx,Yy")
+   (unspec:DI [(match_operand:DI 1 "register_operand" "0,0,Yy")
+   (match_operand:DI 2 "nonimmediate_operand" "ym,Yx,Yy")
+   (match_operand:SI 3 "const_0_to_255_mul_8_operand" "n,n,n")]
   UNSPEC_PALIGNR))]
   "TARGET_SSSE3"
 {
   operands[3] = GEN_INT (INTVAL (operands[3]) / 8);
   return "palignr\t{%3, %2, %0|%0, %2, %3}";
 }
-  [(set_attr "type" "sseishft")
+  "&& reload_completed && TARGET_MMX_WITH_SSE"
+  [(const_int 0)]
+{
+  /* Emulate MMX palignrdi with SSE psrldq.  */
+  rtx op0 = gen_rtx_REG (V2DImode, REGNO (operands[0]));
+  rtx insn;
+  if (TARGET_AVX)
+insn = gen_vec_concatv2di (op0, operands[2], operands[1]);
+  else
+{
+  /* NB: SSE can only concatenate OP0 and OP1 to OP0.  */
+  insn = gen_vec_concatv2di (op0, operands[1], operands[2]);
+  emit_insn (insn);
+  /* Swap bits 0:63 with bits 64:127.  */
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2),
+ GEN_INT (3),
+ GEN_INT (0),
+ GEN_INT (1)));
+  rtx op1 = gen_rtx_REG (V4SImode, REGNO (op0));
+  rtx op2 = gen_rtx_VEC_SELECT (V4SImode, op1, mask);
+  insn = gen_rtx_SET (op1, op2);
+}
+  emit_insn (insn);
+  op0 = gen_rtx_REG (V1TImode, REGNO (op0));
+  insn = gen_sse2_lshrv1ti3 (op0, op0, operands[3]);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "sseishft")
(set_attr "atom_unit" "sishuf")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 ;; Mode iterator to handle singularity w/ absence of V2DI and V4DI
 ;; modes for abs instruction on pre AVX-512 targets.
-- 
2.20.1



[PATCH 34/46] i386: Emulate MMX pshufb with SSE version

2019-02-01 Thread H.J. Lu
Emulate MMX version of pshufb with SSE version by masking out the bit 3
of the shuffle control byte.  Only SSE register source operand is allowed.

PR target/89021
* config/i386/sse.md (ssse3_pshufbv8qi3): Renamed to ...
(ssse3_pshufbv8qi3_mmx): This.
(ssse3_pshufbv8qi3): New.
(ssse3_pshufbv8qi3_sse): Likewise.
---
 gcc/config/i386/sse.md | 63 --
 1 file changed, 61 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c170cc75e5a..f932369c740 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15809,18 +15809,77 @@
(set_attr "btver2_decode" "vector")
(set_attr "mode" "")])
 
-(define_insn "ssse3_pshufbv8qi3"
+(define_expand "ssse3_pshufbv8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+   (unspec:V8QI [(match_operand:V8QI 1 "register_operand")
+ (match_operand:V8QI 2 "nonimmediate_operand")]
+UNSPEC_PSHUFB))]
+  "TARGET_SSSE3"
+{
+  if (TARGET_MMX_WITH_SSE)
+{
+  /* Emulate MMX version of pshufb with SSE version by masking
+out the bit 3 of the shuffle control byte.  */
+  rtvec par = gen_rtvec (4, GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7),
+GEN_INT (0xf7f7f7f7));
+  rtx vec_const = gen_rtx_CONST_VECTOR (V4SImode, par);
+  vec_const = force_const_mem (V4SImode, vec_const);
+  rtx op3 = gen_reg_rtx (V4SImode);
+  rtx op4 = gen_reg_rtx (V4SImode);
+  rtx insn = gen_rtx_SET (op4, vec_const);
+  emit_insn (insn);
+  rtx op2 = force_reg (V8QImode, operands[2]);
+  insn = gen_ssse3_pshufbv8qi3_sse (operands[0], operands[1],
+   op2, op3, op4);
+  emit_insn (insn);
+  DONE;
+}
+})
+
+(define_insn "ssse3_pshufbv8qi3_mmx"
   [(set (match_operand:V8QI 0 "register_operand" "=y")
(unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0")
  (match_operand:V8QI 2 "nonimmediate_operand" "ym")]
 UNSPEC_PSHUFB))]
-  "TARGET_SSSE3"
+  "TARGET_SSSE3 && !TARGET_MMX_WITH_SSE"
   "pshufb\t{%2, %0|%0, %2}";
   [(set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
(set_attr "mode" "DI")])
 
+(define_insn_and_split "ssse3_pshufbv8qi3_sse"
+  [(set (match_operand:V8QI 0 "register_operand" "=Yx,Yy")
+   (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,Yy")
+ (match_operand:V8QI 2 "register_operand" "Yx,Yy")]
+UNSPEC_PSHUFB))
+   (set (match_operand:V4SI 3 "register_operand" "=Yx,Yy")
+   (unspec:V4SI [(match_operand:V4SI 4 "register_operand" "3,3")]
+UNSPEC_PSHUFB))]
+  "TARGET_SSSE3 && TARGET_MMX_WITH_SSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  /* Mask out the bit 3 of the shuffle control byte.  */
+  rtx op2 = gen_rtx_REG (V4SImode, REGNO (operands[2]));
+  rtx op3 = operands[3];
+  rtx insn = gen_andv4si3 (op3, op3, op2);
+  emit_insn (insn);
+  /* Generate SSE version of pshufb.  */
+  rtx op0 = gen_rtx_REG (V16QImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V16QImode, REGNO (operands[1]));
+  op3 = gen_rtx_REG (V16QImode, REGNO (op3));
+  insn = gen_ssse3_pshufbv16qi3 (op0, op1, op3);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sselog1")
+   (set_attr "mode" "TI,TI")])
+
 (define_insn "_psign3"
   [(set (match_operand:VI124_AVX2 0 "register_operand" "=x,x")
(unspec:VI124_AVX2
-- 
2.20.1



[PATCH 32/46] i386: Emulate MMX ssse3_pmaddubsw with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX ssse3_pmaddubsw with SSE.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/sse.md (ssse3_pmaddubsw): Add SSE emulation.
---
 gcc/config/i386/sse.md | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index b06e105e379..7c066a2cdc5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15658,17 +15658,17 @@
(set_attr "mode" "TI")])
 
 (define_insn "ssse3_pmaddubsw"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yx,Yy")
(ss_plus:V4HI
  (mult:V4HI
(zero_extend:V4HI
  (vec_select:V4QI
-   (match_operand:V8QI 1 "register_operand" "0")
+   (match_operand:V8QI 1 "register_operand" "0,0,Yy")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)])))
(sign_extend:V4HI
  (vec_select:V4QI
-   (match_operand:V8QI 2 "nonimmediate_operand" "ym")
+   (match_operand:V8QI 2 "nonimmediate_operand" "ym,Yx,Yy")
(parallel [(const_int 0) (const_int 2)
   (const_int 4) (const_int 6)]
  (mult:V4HI
@@ -15681,12 +15681,16 @@
(parallel [(const_int 1) (const_int 3)
   (const_int 5) (const_int 7)]))]
   "TARGET_SSSE3"
-  "pmaddubsw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "@
+   pmaddubsw\t{%2, %0|%0, %2}
+   pmaddubsw\t{%2, %0|%0, %2}
+   vpmaddubsw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "simul")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_mode_iterator PMULHRSW
   [V4HI V8HI (V16HI "TARGET_AVX2")])
-- 
2.20.1



[PATCH 28/46] i386: Emulate MMX movntq with SSE2 movntidi

2019-02-01 Thread H.J. Lu
Emulate MMX movntq with SSE2 movntidi.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/mmx.md (sse_movntq): Renamed to ...
(*sse_movntq): This.
(sse_movntq): New.  Emulate MMX movntq with SSE2 movntidi.
---
 gcc/config/i386/mmx.md | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index a832e25e9d0..569aeae98ab 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -243,7 +243,21 @@
   DONE;
 })
 
-(define_insn "sse_movntq"
+(define_expand "sse_movntq"
+  [(set (match_operand:DI 0 "memory_operand")
+   (unspec:DI [(match_operand:DI 1 "register_operand")]
+  UNSPEC_MOVNTQ))]
+  "TARGET_SSE || TARGET_3DNOW_A"
+{
+  if (TARGET_MMX_WITH_SSE)
+{
+  rtx insn = gen_sse2_movntidi (operands[0], operands[1]);
+  emit_insn (insn);
+  DONE;
+}
+})
+
+(define_insn "*sse_movntq"
   [(set (match_operand:DI 0 "memory_operand" "=m")
(unspec:DI [(match_operand:DI 1 "register_operand" "y")]
   UNSPEC_MOVNTQ))]
-- 
2.20.1



[PATCH 25/46] i386: Emulate MMX mmx_uavgv8qi3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX mmx_uavgv8qi3 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (*mmx_uavgv8qi3): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 92252984482..7e2e3741255 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1641,15 +1641,15 @@
   "ix86_fixup_binary_operands_no_copy (PLUS, V8QImode, operands);")
 
 (define_insn "*mmx_uavgv8qi3"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+  [(set (match_operand:V8QI 0 "register_operand" "=y,Yx,Yy")
(truncate:V8QI
  (lshiftrt:V8HI
(plus:V8HI
  (plus:V8HI
(zero_extend:V8HI
- (match_operand:V8QI 1 "nonimmediate_operand" "%0"))
+ (match_operand:V8QI 1 "nonimmediate_operand" "%0,0,Yy"))
(zero_extend:V8HI
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")))
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,Yx,Yy")))
  (const_vector:V8HI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)
  (const_int 1) (const_int 1)
@@ -1660,19 +1660,22 @@
 {
   /* These two instructions have the same operation, but their encoding
  is different.  Prefer the one that is de facto standard.  */
-  if (TARGET_SSE || TARGET_3DNOW_A)
+  if (TARGET_MMX_WITH_SSE && TARGET_AVX)
+return "vpavgb\t{%2, %1, %0|%0, %1, %2}";
+  else if (TARGET_SSE || TARGET_3DNOW_A)
 return "pavgb\t{%2, %0|%0, %2}";
   else
 return "pavgusb\t{%2, %0|%0, %2}";
 }
-  [(set_attr "type" "mmxshft")
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
(set (attr "prefix_extra")
  (if_then_else
(not (ior (match_test "TARGET_SSE")
 (match_test "TARGET_3DNOW_A")))
(const_string "1")
(const_string "*")))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_uavgv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 29/46] i386: Emulate MMX umulv1siv1di3 with SSE2

2019-02-01 Thread H.J. Lu
Emulate MMX umulv1siv1di3 with SSE2.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/mmx.md (*sse2_umulv1siv1di3): Add SSE2 emulation.
---
 gcc/config/i386/mmx.md | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 569aeae98ab..8dc17f0241f 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -949,20 +949,25 @@
   "ix86_fixup_binary_operands_no_copy (MULT, V2SImode, operands);")
 
 (define_insn "*sse2_umulv1siv1di3"
-  [(set (match_operand:V1DI 0 "register_operand" "=y")
+  [(set (match_operand:V1DI 0 "register_operand" "=y,Yx,Yy")
 (mult:V1DI
  (zero_extend:V1DI
(vec_select:V1SI
- (match_operand:V2SI 1 "nonimmediate_operand" "%0")
+ (match_operand:V2SI 1 "nonimmediate_operand" "%0,0,Yy")
  (parallel [(const_int 0)])))
  (zero_extend:V1DI
(vec_select:V1SI
- (match_operand:V2SI 2 "nonimmediate_operand" "ym")
+ (match_operand:V2SI 2 "nonimmediate_operand" "ym,Yx,Yy")
  (parallel [(const_int 0)])]
   "TARGET_SSE2 && ix86_binary_operator_ok (MULT, V2SImode, operands)"
-  "pmuludq\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "@
+   pmuludq\t{%2, %0|%0, %2}
+   pmuludq\t{%2, %0|%0, %2}
+   vpmuludq\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
+
 
 (define_expand "mmx_v4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 33/46] i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX ssse3_pmulhrswv4hi3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/sse.md (*ssse3_pmulhrswv4hi3): Add SSE emulation.
---
 gcc/config/i386/sse.md | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7c066a2cdc5..c170cc75e5a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15766,25 +15766,29 @@
(set_attr "mode" "")])
 
 (define_insn "*ssse3_pmulhrswv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yx,Yy")
(truncate:V4HI
  (lshiftrt:V4SI
(plus:V4SI
  (lshiftrt:V4SI
(mult:V4SI
  (sign_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yy"))
  (sign_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,Yx,Yy")))
(const_int 14))
  (match_operand:V4HI 3 "const1_operand"))
(const_int 1]
   "TARGET_SSSE3 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
-  "pmulhrsw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseimul")
+  "@
+   pmulhrsw\t{%2, %0|%0, %2}
+   pmulhrsw\t{%2, %0|%0, %2}
+   vpmulhrsw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "sseimul")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "_pshufb3"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x,v")
-- 
2.20.1



[PATCH 35/46] i386: Emulate MMX ssse3_psign3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX ssse3_psign3 with SSE.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/sse.md (ssse3_psign3): Add SSE emulation.
---
 gcc/config/i386/sse.md | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f932369c740..6cad298eb86 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15898,17 +15898,21 @@
(set_attr "mode" "")])
 
 (define_insn "ssse3_psign3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yx,Yy")
(unspec:MMXMODEI
- [(match_operand:MMXMODEI 1 "register_operand" "0")
-  (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")]
+ [(match_operand:MMXMODEI 1 "register_operand" "0,0,Yy")
+  (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,Yx,Yy")]
  UNSPEC_PSIGN))]
   "TARGET_SSSE3"
-  "psign\t{%2, %0|%0, %2}";
-  [(set_attr "type" "sselog1")
+  "@
+   psign\t{%2, %0|%0, %2}
+   psign\t{%2, %0|%0, %2}
+   vpsign\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "_palignr_mask"
   [(set (match_operand:VI1_AVX512 0 "register_operand" "=v")
-- 
2.20.1



[PATCH 30/46] i386: Emulate MMX ssse3_phwv4hi3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX ssse3_phwv4hi3 with SSE by moving bits
64:95 to bits 32:63 in SSE register.  Only SSE register source operand
is allowed.
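
For illustration only (not part of the patch), the lane movement described
above can be modelled in plain C.  This sketch uses the add variant on 16-bit
lanes; the function names are hypothetical, and the final lane copy is an
assumption about what ix86_move_vector_high_sse_to_mmx does, inferred from
the description above, since that helper's body is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of the 128-bit horizontal add on 8 x 16-bit lanes:
   lanes 0..3 come from pairs of A, lanes 4..7 from pairs of B.  */
static void
phaddw_sse_model (uint16_t dst[8], const uint16_t a[8], const uint16_t b[8])
{
  uint16_t out[8];
  for (int i = 0; i < 4; i++)
    {
      out[i] = (uint16_t) (a[2 * i] + a[2 * i + 1]);
      out[4 + i] = (uint16_t) (b[2 * i] + b[2 * i + 1]);
    }
  memcpy (dst, out, sizeof out);
}

/* Hypothetical emulation of the 64-bit operation: the 4-lane inputs live
   in the low 64 bits of 128-bit registers whose upper halves hold
   unrelated data.  */
static void
phaddw_mmx_via_sse (uint16_t dst[4], const uint16_t a[4], const uint16_t b[4])
{
  uint16_t a8[8], b8[8], r[8];
  memset (a8, 0x5a, sizeof a8);	/* Stand-in for stale upper bits.  */
  memset (b8, 0xa5, sizeof b8);
  memcpy (a8, a, 4 * sizeof (uint16_t));
  memcpy (b8, b, 4 * sizeof (uint16_t));
  phaddw_sse_model (r, a8, b8);
  /* Bits 64:95 are lanes 4 and 5; move them to bits 32:63 (lanes 2, 3).  */
  r[2] = r[4];
  r[3] = r[5];
  memcpy (dst, r, 4 * sizeof (uint16_t));
}
```

Lanes 2 and 3 of the 128-bit result are computed from the stale upper half of
the first operand, which is why they are overwritten rather than kept.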

PR target/89021
* config/i386/sse.md (ssse3_phwv4hi3):
Changed to define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7218c9cd646..2f25780ebee 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15356,13 +15356,13 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
-(define_insn "ssse3_phwv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+(define_insn_and_split "ssse3_phwv4hi3"
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yx,Yy")
(vec_concat:V4HI
  (vec_concat:V2HI
(ssse3_plusminus:HI
  (vec_select:HI
-   (match_operand:V4HI 1 "register_operand" "0")
+   (match_operand:V4HI 1 "register_operand" "0,0,Yy")
(parallel [(const_int 0)]))
  (vec_select:HI (match_dup 1) (parallel [(const_int 1)])))
(ssse3_plusminus:HI
@@ -15371,7 +15371,7 @@
  (vec_concat:V2HI
(ssse3_plusminus:HI
  (vec_select:HI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,Yx,Yy")
(parallel [(const_int 0)]))
  (vec_select:HI (match_dup 2) (parallel [(const_int 1)])))
(ssse3_plusminus:HI
@@ -15379,11 +15379,24 @@
  (vec_select:HI (match_dup 2) (parallel [(const_int 3)]))]
   "TARGET_SSSE3"
   "phw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "&& reload_completed && TARGET_MMX_WITH_SSE"
+  [(const_int 0)]
+{
+  /* Generate SSE version of the operation.  */
+  rtx op0 = gen_rtx_REG (V8HImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V8HImode, REGNO (operands[1]));
+  rtx op2 = gen_rtx_REG (V8HImode, REGNO (operands[2]));
+  rtx insn = gen_ssse3_phwv8hi3 (op0, op1, op2);
+  emit_insn (insn);
+  ix86_move_vector_high_sse_to_mmx (op0);
+  DONE;
+}
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "complex")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "avx2_phdv8si3"
   [(set (match_operand:V8SI 0 "register_operand" "=x")
-- 
2.20.1



[PATCH 31/46] i386: Emulate MMX ssse3_phdv2si3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX ssse3_phdv2si3 with SSE by moving bits
64:95 to bits 32:63 in SSE register.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/sse.md (ssse3_phdv2si3):
Changed to define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/sse.md | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2f25780ebee..b06e105e379 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15475,26 +15475,39 @@
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
 
-(define_insn "ssse3_phdv2si3"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "ssse3_phdv2si3"
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yx,Yy")
(vec_concat:V2SI
  (plusminus:SI
(vec_select:SI
- (match_operand:V2SI 1 "register_operand" "0")
+ (match_operand:V2SI 1 "register_operand" "0,0,Yy")
  (parallel [(const_int 0)]))
(vec_select:SI (match_dup 1) (parallel [(const_int 1)])))
  (plusminus:SI
(vec_select:SI
- (match_operand:V2SI 2 "nonimmediate_operand" "ym")
+ (match_operand:V2SI 2 "nonimmediate_operand" "ym,Yx,Yy")
  (parallel [(const_int 0)]))
(vec_select:SI (match_dup 2) (parallel [(const_int 1)])]
   "TARGET_SSSE3"
   "phd\t{%2, %0|%0, %2}"
-  [(set_attr "type" "sseiadd")
+  "&& reload_completed && TARGET_MMX_WITH_SSE"
+  [(const_int 0)]
+{
+  /* Generate SSE version of the operation.  */
+  rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+  rtx op1 = gen_rtx_REG (V4SImode, REGNO (operands[1]));
+  rtx op2 = gen_rtx_REG (V4SImode, REGNO (operands[2]));
+  rtx insn = gen_ssse3_phdv4si3 (op0, op1, op2);
+  emit_insn (insn);
+  ix86_move_vector_high_sse_to_mmx (op0);
+  DONE;
+}
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "sseiadd")
(set_attr "atom_unit" "complex")
(set_attr "prefix_extra" "1")
(set (attr "prefix_rex") (symbol_ref "x86_extended_reg_mentioned_p (insn)"))
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "avx2_pmaddubsw256"
   [(set (match_operand:V16HI 0 "register_operand" "=x,v")
-- 
2.20.1



[PATCH 27/46] i386: Emulate MMX mmx_psadbw with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX mmx_psadbw with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_psadbw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index f0987051f33..a832e25e9d0 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1717,14 +1717,19 @@
(set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_psadbw"
-  [(set (match_operand:V1DI 0 "register_operand" "=y")
-(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0")
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")]
+  [(set (match_operand:V1DI 0 "register_operand" "=y,Yx,Yy")
+(unspec:V1DI [(match_operand:V8QI 1 "register_operand" "0,0,Yy")
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,Yx,Yy")]
 UNSPEC_PSADBW))]
   "TARGET_SSE || TARGET_3DNOW_A"
-  "psadbw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
-   (set_attr "mode" "DI")])
+  "@
+   psadbw\t{%2, %0|%0, %2}
+   psadbw\t{%2, %0|%0, %2}
+   vpsadbw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
+
 
 (define_insn_and_split "mmx_pmovmskb"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-- 
2.20.1



[PATCH 21/46] i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (smaxmin:v4hi3): New.
(umaxmin:v8qi3): Likewise.
(smaxmin:*mmx_v4hi3): Add SSE emulation.
(umaxmin:*mmx_v8qi3): Likewise.
---
 gcc/config/i386/mmx.md | 48 +++---
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index a2cddea2235..b7bd975712a 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -954,16 +954,28 @@
   "TARGET_SSE || TARGET_3DNOW_A"
   "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
 
+(define_expand "v4hi3"
+  [(set (match_operand:V4HI 0 "register_operand")
+(smaxmin:V4HI
+ (match_operand:V4HI 1 "nonimmediate_operand")
+ (match_operand:V4HI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (, V4HImode, operands);")
+
 (define_insn "*mmx_v4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yx,Yy")
 (smaxmin:V4HI
- (match_operand:V4HI 1 "nonimmediate_operand" "%0")
- (match_operand:V4HI 2 "nonimmediate_operand" "ym")))]
+ (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yy")
+ (match_operand:V4HI 2 "nonimmediate_operand" "ym,Yx,Yy")))]
   "(TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (, V4HImode, operands)"
-  "pw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   pw\t{%2, %0|%0, %2}
+   pw\t{%2, %0|%0, %2}
+   vpw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxadd,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_v8qi3"
   [(set (match_operand:V8QI 0 "register_operand")
@@ -973,16 +985,28 @@
   "TARGET_SSE || TARGET_3DNOW_A"
   "ix86_fixup_binary_operands_no_copy (, V8QImode, operands);")
 
+(define_expand "v8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+(umaxmin:V8QI
+ (match_operand:V8QI 1 "nonimmediate_operand")
+ (match_operand:V8QI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (, V8QImode, operands);")
+
 (define_insn "*mmx_v8qi3"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+  [(set (match_operand:V8QI 0 "register_operand" "=y,Yx,Yy")
 (umaxmin:V8QI
- (match_operand:V8QI 1 "nonimmediate_operand" "%0")
- (match_operand:V8QI 2 "nonimmediate_operand" "ym")))]
+ (match_operand:V8QI 1 "nonimmediate_operand" "%0,0,Yy")
+ (match_operand:V8QI 2 "nonimmediate_operand" "ym,Yx,Yy")))]
   "(TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (, V8QImode, operands)"
-  "pb\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   pb\t{%2, %0|%0, %2}
+   pb\t{%2, %0|%0, %2}
+   vpb\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxadd,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_ashr3"
   [(set (match_operand:MMXMODE24 0 "register_operand" "=y")
-- 
2.20.1



[PATCH 26/46] i386: Emulate MMX mmx_uavgv4hi3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX mmx_uavgv4hi3 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (*mmx_uavgv4hi3): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 7e2e3741255..f0987051f33 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1694,23 +1694,27 @@
   "ix86_fixup_binary_operands_no_copy (PLUS, V4HImode, operands);")
 
 (define_insn "*mmx_uavgv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yx,Yy")
(truncate:V4HI
  (lshiftrt:V4SI
(plus:V4SI
  (plus:V4SI
(zero_extend:V4SI
- (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+ (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yy"))
(zero_extend:V4SI
- (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+ (match_operand:V4HI 2 "nonimmediate_operand" "ym,Yx,Yy")))
  (const_vector:V4SI [(const_int 1) (const_int 1)
  (const_int 1) (const_int 1)]))
(const_int 1]
   "(TARGET_SSE || TARGET_3DNOW_A)
&& ix86_binary_operator_ok (PLUS, V4HImode, operands)"
-  "pavgw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxshft")
-   (set_attr "mode" "DI")])
+  "@
+   pavgw\t{%2, %0|%0, %2}
+   pavgw\t{%2, %0|%0, %2}
+   vpavgw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxshft,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_psadbw"
   [(set (match_operand:V1DI 0 "register_operand" "=y")
-- 
2.20.1



[PATCH 22/46] i386: Emulate MMX mmx_pmovmskb with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX mmx_pmovmskb with SSE by zero-extending the result of SSE
pmovmskb from QImode to SImode.  Only an SSE register source operand is
allowed.
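
For illustration only (not part of the patch), a scalar C model shows why the
zero-extension is needed: the 128-bit pmovmskb also collects sign bits from
the upper eight bytes, which hold unrelated data when the register carries a
64-bit MMX value.  The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of SSE2 pmovmskb: bit I of the result is the sign bit
   of byte I, for all 16 bytes.  */
static uint32_t
pmovmskb_sse_model (const uint8_t v[16])
{
  uint32_t mask = 0;
  for (int i = 0; i < 16; i++)
    mask |= (uint32_t) (v[i] >> 7) << i;
  return mask;
}

/* Hypothetical emulation of the 64-bit form: keep only the low 8 mask
   bits, i.e. zero-extend the low byte of the result to 32 bits.  */
static uint32_t
pmovmskb_mmx_via_sse (const uint8_t v[16])
{
  return (uint32_t) (uint8_t) pmovmskb_sse_model (v);
}
```

With sign bits set in bytes 0, 2 and 7 and arbitrary upper bytes, the emulated
result is 0x85 regardless of what the upper half contains.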

PR target/89021
* config/i386/mmx.md (mmx_pmovmskb): Changed to
define_insn_and_split to support SSE emulation.
---
 gcc/config/i386/mmx.md | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b7bd975712a..7b58e9dcc6f 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1715,14 +1715,28 @@
   [(set_attr "type" "mmxshft")
(set_attr "mode" "DI")])
 
-(define_insn "mmx_pmovmskb"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (unspec:SI [(match_operand:V8QI 1 "register_operand" "y")]
+(define_insn_and_split "mmx_pmovmskb"
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
+   (unspec:SI [(match_operand:V8QI 1 "register_operand" "y,Yx")]
   UNSPEC_MOVMSK))]
   "TARGET_SSE || TARGET_3DNOW_A"
   "pmovmskb\t{%1, %0|%0, %1}"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+  "&& reload_completed && TARGET_MMX_WITH_SSE"
+  [(const_int 0)]
+{
+  /* Generate SSE pmovmskb.  */
+  rtx op0 = operands[0];
+  rtx op1 = gen_rtx_REG (V16QImode, REGNO (operands[1]));
+  rtx insn = gen_sse2_pmovmskb (op0, op1);
+  emit_insn (insn);
+  /* Zero-extend from QImode to SImode.  */
+  op1 = gen_rtx_REG (QImode, REGNO (operands[0]));
+  insn = gen_zero_extendqisi2 (op0, op1);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "type" "mmxcvt,ssemov")
+   (set_attr "mode" "DI,TI")])
 
 (define_expand "mmx_maskmovq"
   [(set (match_operand:V8QI 0 "memory_operand")
-- 
2.20.1



[PATCH 19/46] i386: Emulate MMX mmx_pextrw with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX mmx_pextrw with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_pextrw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 599c762e166..b22e3dec784 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1317,16 +1317,16 @@
(set_attr "mode" "DI")])
 
 (define_insn "mmx_pextrw"
-  [(set (match_operand:SI 0 "register_operand" "=r")
+  [(set (match_operand:SI 0 "register_operand" "=r,r")
 (zero_extend:SI
  (vec_select:HI
-   (match_operand:V4HI 1 "register_operand" "y")
-   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n")]]
+   (match_operand:V4HI 1 "register_operand" "y,Yy")
+   (parallel [(match_operand:SI 2 "const_0_to_3_operand" "n,n")]]
   "TARGET_SSE || TARGET_3DNOW_A"
-  "pextrw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "mmxcvt")
+  "%vpextrw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "mmxcvt,sselog1")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI")])
 
 (define_expand "mmx_pshufw"
   [(match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 24/46] i386: Emulate MMX maskmovq with SSE2 maskmovdqu

2019-02-01 Thread H.J. Lu
Emulate MMX maskmovq with SSE2 maskmovdqu by zeroing out the upper 64
bits of the mask operand.  A diagnostic note is issued because an invalid
memory access may occur when bits 64:127 at the memory location are unmapped:

xmmintrin.h:1168:3: note: Emulate MMX maskmovq with SSE2 maskmovdqu may result in invalid memory access
 1168 |   __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P);
  |   ^~~

Only SSE register source operand is allowed.
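
For illustration only (not part of the patch), the effect of zeroing the upper
half of the mask can be modelled in plain C.  The function names are
hypothetical, and the model describes only which bytes are stored; it cannot
reproduce the fault described above, which depends on whether the upper eight
bytes of the destination are mapped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of SSE2 maskmovdqu: byte I of DATA is stored to MEM only
   when the top bit of mask byte I is set.  */
static void
maskmovdqu_model (uint8_t mem[16], const uint8_t data[16],
		  const uint8_t mask[16])
{
  for (int i = 0; i < 16; i++)
    if (mask[i] & 0x80)
      mem[i] = data[i];
}

/* Hypothetical emulation of the 64-bit maskmovq: copy the low 64 bits of
   the mask and zero bits 64:127, so the upper 8 bytes are never stored.  */
static void
maskmovq_via_sse (uint8_t mem[16], const uint8_t data[16],
		  const uint8_t mask[16])
{
  uint8_t m[16] = {0};
  memcpy (m, mask, 8);
  maskmovdqu_model (mem, data, m);
}
```

Even with every mask byte set, only the low eight destination bytes change in
this model.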

PR target/89021
* config/i386/mmx.md (mmx_maskmovq): Emulate MMX maskmovq with
SSE2 maskmovdqu and a warning.
(sse2_maskmovq_): New.
(*mmx_maskmovq): Add "&& !TARGET_MMX_WITH_SSE".
* config/i386/sse.md (*sse2_maskmovdqu): Renamed to ...
(sse2_maskmovdqu_): This.
---
 gcc/config/i386/mmx.md | 59 --
 gcc/config/i386/sse.md |  2 +-
 2 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index f90574a7255..92252984482 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1748,7 +1748,62 @@
  (match_operand:V8QI 2 "register_operand")
  (match_dup 0)]
 UNSPEC_MASKMOV))]
-  "TARGET_SSE || TARGET_3DNOW_A")
+  "TARGET_SSE || TARGET_3DNOW_A"
+{
+  if (TARGET_MMX_WITH_SSE)
+{
+  /* Emulate MMX maskmovq with SSE2 maskmovdqu and issue a warning
+since they aren't equivalent.  */
+  inform (input_location, "Emulate MMX maskmovq with SSE2 maskmovdqu "
+ "may result in invalid memory access");
+  rtx insn;
+  rtx op = gen_reg_rtx (V2DImode);
+  if (Pmode == SImode)
+   insn = gen_sse2_maskmovq_si (XEXP (operands[0], 0),
+operands[1], operands[2], op, op);
+  else
+   insn = gen_sse2_maskmovq_di (XEXP (operands[0], 0),
+operands[1], operands[2], op, op);
+  emit_insn (insn);
+  DONE;
+}
+})
+
+(define_insn_and_split "sse2_maskmovq_"
+  [(set (mem:V8QI (match_operand:P 0 "register_operand" "D"))
+   (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "Yy")
+ (match_operand:V8QI 2 "register_operand" "Yy")
+ (mem:V8QI (match_dup 0))]
+UNSPEC_MASKMOV))
+   (set (match_operand:V2DI 3 "register_operand" "=Yy")
+   (unspec:V2DI [(match_operand:V2DI 4 "register_operand" "3")]
+UNSPEC_MASKMOV))]
+  "TARGET_MMX_WITH_SSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  /* Copy the lower 64 bits of operand 2 (the mask operand) to operand 3.
+ NB: Invalid memory access may happen when bits 64:127 at memory
+ location are unmapped.  */
+  rtx op3 = operands[3];
+  rtx op2 = gen_rtx_REG (V2DImode, REGNO (operands[2]));
+  rtx insn = gen_sse2_movq128 (op3, op2);
+  emit_insn (insn);
+
+  /* Generate SSE2 maskmovdqu with operand 3.  */
+  rtx op1 = gen_rtx_REG (V16QImode, REGNO (operands[1]));
+  op3 = gen_rtx_REG (V16QImode, REGNO (operands[3]));
+  if (Pmode == SImode)
+insn = gen_sse2_maskmovdqu_si (operands[0], op1, op3);
+  else
+insn = gen_sse2_maskmovdqu_di (operands[0], op1, op3);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "type" "ssemov")
+   (set_attr "znver1_decode" "vector")
+   (set_attr "mode" "TI")])
 
 (define_insn "*mmx_maskmovq"
   [(set (mem:V8QI (match_operand:P 0 "register_operand" "D"))
@@ -1756,7 +1811,7 @@
  (match_operand:V8QI 2 "register_operand" "y")
  (mem:V8QI (match_dup 0))]
 UNSPEC_MASKMOV))]
-  "TARGET_SSE || TARGET_3DNOW_A"
+  "(TARGET_SSE || TARGET_3DNOW_A) && !TARGET_MMX_WITH_SSE"
   ;; @@@ check ordering of operands in intel/nonintel syntax
   "maskmovq\t{%2, %1|%1, %2}"
   [(set_attr "type" "mmxcvt")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9ecd9789c1e..7218c9cd646 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15142,7 +15142,7 @@
  UNSPEC_MASKMOV))]
   "TARGET_SSE2")
 
-(define_insn "*sse2_maskmovdqu"
+(define_insn "sse2_maskmovdqu_"
   [(set (mem:V16QI (match_operand:P 0 "register_operand" "D"))
(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "x")
   (match_operand:V16QI 2 "register_operand" "x")
-- 
2.20.1



[PATCH 17/46] i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE.

PR target/89021
* config/i386/sse.md (sse_cvtps2pi): Add SSE emulation.
(sse_cvttps2pi): Likewise.
---
 gcc/config/i386/sse.md | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 7d2c0367911..836bc5e05a0 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4668,26 +4668,30 @@
(set_attr "mode" "V4SF")])
 
 (define_insn "sse_cvtps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yy")
(vec_select:V2SI
- (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm")]
+ (unspec:V4SI [(match_operand:V4SF 1 "nonimmediate_operand" "xm,YyBm")]
   UNSPEC_FIX_NOTRUNC)
  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_SSE"
-  "cvtps2pi\t{%1, %0|%0, %q1}"
+  "@
+   cvtps2pi\t{%1, %0|%0, %q1}
+   %vcvtps2dq\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+   (set_attr "unit" "mmx,*")
(set_attr "mode" "DI")])
 
 (define_insn "sse_cvttps2pi"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yy")
(vec_select:V2SI
- (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm"))
+ (fix:V4SI (match_operand:V4SF 1 "nonimmediate_operand" "xm,YyBm"))
  (parallel [(const_int 0) (const_int 1)])))]
   "TARGET_SSE"
-  "cvttps2pi\t{%1, %0|%0, %q1}"
+  "@
+   cvttps2pi\t{%1, %0|%0, %q1}
+   %vcvttps2dq\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssecvt")
-   (set_attr "unit" "mmx")
+   (set_attr "unit" "mmx,*")
(set_attr "prefix_rep" "0")
(set_attr "mode" "SF")])
 
-- 
2.20.1



[PATCH 20/46] i386: Emulate MMX mmx_pinsrw with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX mmx_pinsrw with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_pinsrw): Add SSE emulation.
---
 gcc/config/i386/mmx.md | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index b22e3dec784..a2cddea2235 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1296,25 +1296,36 @@
 })
 
 (define_insn "*mmx_pinsrw"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yx,Yy")
 (vec_merge:V4HI
   (vec_duplicate:V4HI
-(match_operand:HI 2 "nonimmediate_operand" "rm"))
- (match_operand:V4HI 1 "register_operand" "0")
+(match_operand:HI 2 "nonimmediate_operand" "rm,rm,rm"))
+ (match_operand:V4HI 1 "register_operand" "0,0,Yy")
   (match_operand:SI 3 "const_int_operand")))]
   "(TARGET_SSE || TARGET_3DNOW_A)
&& ((unsigned) exact_log2 (INTVAL (operands[3]))
< GET_MODE_NUNITS (V4HImode))"
 {
   operands[3] = GEN_INT (exact_log2 (INTVAL (operands[3])));
-  if (MEM_P (operands[2]))
-return "pinsrw\t{%3, %2, %0|%0, %2, %3}";
+  if (TARGET_MMX_WITH_SSE && TARGET_AVX)
+{
+  if (MEM_P (operands[2]))
+   return "vpinsrw\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+  else
+   return "vpinsrw\t{%3, %k2, %1, %0|%0, %1, %k2, %3}";
+}
   else
-return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}";
+{
+  if (MEM_P (operands[2]))
+   return "pinsrw\t{%3, %2, %0|%0, %2, %3}";
+  else
+   return "pinsrw\t{%3, %k2, %0|%0, %k2, %3}";
+}
 }
-  [(set_attr "type" "mmxcvt")
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxcvt,sselog,sselog")
(set_attr "length_immediate" "1")
-   (set_attr "mode" "DI")])
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_pextrw"
   [(set (match_operand:SI 0 "register_operand" "=r,r")
-- 
2.20.1



[PATCH 18/46] i386: Emulate MMX sse_cvtpi2ps with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX sse_cvtpi2ps with SSE2 cvtdq2ps, preserving upper 64 bits of
destination XMM register.  Only SSE register source operand is allowed.
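
The merge step can be modelled in plain C.  The following is only a
sketch under stated assumptions: all function names are hypothetical
(they are not GCC functions), lanes are modelled as array elements, and
the two index sets are the ones used by the AVX and non-AVX paths of the
splitter in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of SSE2 cvtdq2ps: convert four signed 32-bit integers to float.  */
static void
cvtdq2ps_model (float dst[4], const int32_t src[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = (float) src[i];
}

/* Model of (vec_select (vec_concat LO HI) IDX): pick four lanes out of
   the eight-lane concatenation.  DST may alias LO or HI.  */
static void
select_concat (float dst[4], const float lo[4], const float hi[4],
	       const int idx[4])
{
  float cat[8], out[4];
  memcpy (cat, lo, sizeof (float) * 4);
  memcpy (cat + 4, hi, sizeof (float) * 4);
  for (int i = 0; i < 4; i++)
    out[i] = cat[idx[i]];
  memcpy (dst, out, sizeof out);
}

/* Hypothetical model of the emulated cvtpi2ps.  USE_AVX selects which of
   the two merge sequences is modelled.  */
static void
cvtpi2ps_emulated (float dst[4], const int32_t src[2], int use_avx)
{
  /* The upper two source lanes are "don't care"; fill them with junk.  */
  const int32_t xsrc[4] = { src[0], src[1], 0x55555555, -0x55555555 };
  float cvt[4];
  cvtdq2ps_model (cvt, xsrc);

  if (use_avx)
    {
      /* One three-operand select: low half of CVT, high half of DST.  */
      static const int idx[4] = { 0, 1, 6, 7 };
      select_concat (dst, cvt, dst, idx);
    }
  else
    {
      /* Two-operand form: first merge into DST, then swap the halves.  */
      static const int merge[4] = { 2, 3, 4, 5 };
      static const int swap[4] = { 2, 3, 0, 1 };
      select_concat (dst, dst, cvt, merge);
      select_concat (dst, dst, dst, swap);
    }
}

/* Both paths must replace lanes 0-1 and keep lanes 2-3.  */
void
cvtpi2ps_self_check (void)
{
  const int32_t src[2] = { 7, -9 };
  for (int use_avx = 0; use_avx < 2; use_avx++)
    {
      float dst[4] = { 1.5f, 2.5f, 3.5f, 4.5f };
      cvtpi2ps_emulated (dst, src, use_avx);
      assert (dst[0] == 7.0f && dst[1] == -9.0f);
      assert (dst[2] == 3.5f && dst[3] == 4.5f);
    }
}
```

In this model both paths give the same result; the non-AVX path needs
the extra swap because its merge can only place the converted lanes in
the upper half of the destination.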

PR target/89021
* config/i386/sse.md (UNSPEC_CVTPI2PS): New.
(sse_cvtpi2ps): Renamed to ...
(*mmx_cvtpi2ps): This.
(sse_cvtpi2ps): New.
(mmx_cvtpi2ps_sse): Likewise.
---
 gcc/config/i386/sse.md | 81 +-
 1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 836bc5e05a0..9ecd9789c1e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18,6 +18,9 @@
 ;; .
 
 (define_c_enum "unspec" [
+  ;; MMX with SSE
+  UNSPEC_CVTPI2PS
+
   ;; SSE
   UNSPEC_MOVNT
 
@@ -4655,7 +4658,83 @@
 ;;
 ;
 
-(define_insn "sse_cvtpi2ps"
+(define_expand "sse_cvtpi2ps"
+  [(set (match_operand:V4SF 0 "register_operand")
+   (vec_merge:V4SF
+ (vec_duplicate:V4SF
+   (float:V2SF (match_operand:V2SI 2 "nonimmediate_operand")))
+ (match_operand:V4SF 1 "register_operand")
+ (const_int 3)))]
+  "TARGET_SSE"
+{
+  if (TARGET_MMX_WITH_SSE)
+{
+  rtx op2 = force_reg (V2SImode, operands[2]);
+  rtx op3 = gen_reg_rtx (V4SFmode);
+  rtx op4 = gen_reg_rtx (V4SFmode);
+  rtx insn = gen_mmx_cvtpi2ps_sse (operands[0], operands[1], op2,
+  op3, op4);
+  emit_insn (insn);
+  DONE;
+}
+})
+
+(define_insn_and_split "mmx_cvtpi2ps_sse"
+  [(set (match_operand:V4SF 0 "register_operand" "=Yx,Yy")
+   (unspec:V4SF [(match_operand:V2SI 2 "register_operand" "Yx,Yy")
+ (match_operand:V4SF 1 "register_operand" "0,Yy")]
+UNSPEC_CVTPI2PS))
+   (set (match_operand:V4SF 3 "register_operand" "=Yx,Yy")
+   (unspec:V4SF [(match_operand:V4SF 4 "register_operand" "3,3")]
+UNSPEC_CVTPI2PS))]
+  "TARGET_MMX_WITH_SSE"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op2 = gen_rtx_REG (V4SImode, REGNO (operands[2]));
+  /* Generate SSE2 cvtdq2ps.  */
+  rtx insn = gen_floatv4siv4sf2 (operands[3], op2);
+  emit_insn (insn);
+
+  /* Merge operands[3] with operands[0].  */
+  rtx mask, op1;
+  if (TARGET_AVX)
+{
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (0), GEN_INT (1),
+ GEN_INT (6), GEN_INT (7)));
+  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[3], operands[1]);
+  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
+  insn = gen_rtx_SET (operands[0], op2);
+}
+  else
+{
+  /* NB: SSE can only concatenate OP0 and OP3 to OP0.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (4), GEN_INT (5)));
+  op1 = gen_rtx_VEC_CONCAT (V8SFmode, operands[0], operands[3]);
+  op2 = gen_rtx_VEC_SELECT (V4SFmode, op1, mask);
+  insn = gen_rtx_SET (operands[0], op2);
+  emit_insn (insn);
+
+  /* Swap bits 0:63 with bits 64:127.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (0), GEN_INT (1)));
+  rtx dest = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  insn = gen_rtx_SET (dest, op1);
+}
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "*mmx_cvtpi2ps"
   [(set (match_operand:V4SF 0 "register_operand" "=x")
(vec_merge:V4SF
  (vec_duplicate:V4SF
-- 
2.20.1



[PATCH 15/46] i386: Emulate MMX vec_dupv2si with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX vec_dupv2si with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (*vec_dupv2si): Changed to
define_insn_and_split to support SSE emulation.
* config/i386/sse.md (*vec_dupv4si): Renamed to ...
(vec_dupv4si): This.
---
 gcc/config/i386/mmx.md | 22 --
 gcc/config/i386/sse.md |  2 +-
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 87e7f7e3921..74efe680d9e 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1388,14 +1388,24 @@
(set_attr "length_immediate" "1")
(set_attr "mode" "DI")])
 
-(define_insn "*vec_dupv2si"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2si"
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yx,Yy")
(vec_duplicate:V2SI
- (match_operand:SI 1 "register_operand" "0")))]
-  "TARGET_MMX"
+ (match_operand:SI 1 "register_operand" "0,0,Yy")))]
+  "TARGET_MMX_INSNS"
   "punpckldq\t%0, %0"
-  [(set_attr "type" "mmxcvt")
-   (set_attr "mode" "DI")])
+  "&& reload_completed && TARGET_MMX_WITH_SSE"
+  [(const_int 0)]
+{
+  /* Emulate MMX vec_dupv2si with SSE vec_dupv4si.  */
+  rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+  rtx insn = gen_vec_dupv4si (op0, operands[1]);
+  emit_insn (insn);
+  DONE;
+}
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxcvt,ssemov,ssemov")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "*mmx_concatv2si"
   [(set (match_operand:V2SI 0 "register_operand" "=y,y")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5dc0930ac1f..7d2c0367911 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18976,7 +18976,7 @@
(set_attr "prefix" "maybe_evex,maybe_evex,orig")
(set_attr "mode" "V4SF")])
 
-(define_insn "*vec_dupv4si"
+(define_insn "vec_dupv4si"
   [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
(vec_duplicate:V4SI
  (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
-- 
2.20.1



[PATCH 14/46] i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_eq<mode>3): Check TARGET_MMX_INSNS
instead of TARGET_MMX.
(*mmx_eq<mode>3): Check TARGET_MMX_INSNS instead of TARGET_MMX.
Add SSE support.
(mmx_gt<mode>3): Likewise.
---
 gcc/config/i386/mmx.md | 38 +++---
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 12a933c636c..87e7f7e3921 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1057,28 +1057,36 @@
 (eq:MMXMODEI
  (match_operand:MMXMODEI 1 "nonimmediate_operand")
  (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+  "TARGET_MMX_INSNS"
   "ix86_fixup_binary_operands_no_copy (EQ, <MODE>mode, operands);")
 
 (define_insn "*mmx_eq<mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yx,Yy")
 (eq:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (EQ, <MODE>mode, operands)"
-  "pcmpeq<mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxcmp")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,Yy")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,Yx,Yy")))]
+  "TARGET_MMX_INSNS && ix86_binary_operator_ok (EQ, <MODE>mode, operands)"
+  "@
+   pcmpeq<mmxvecsize>\t{%2, %0|%0, %2}
+   pcmpeq<mmxvecsize>\t{%2, %0|%0, %2}
+   vpcmpeq<mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxcmp,ssecmp,ssecmp")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_insn "mmx_gt<mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yx,Yy")
 (gt:MMXMODEI
- (match_operand:MMXMODEI 1 "register_operand" "0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX"
-  "pcmpgt<mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxcmp")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "register_operand" "0,0,Yy")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,Yx,Yy")))]
+  "TARGET_MMX_INSNS"
+  "@
+   pcmpgt<mmxvecsize>\t{%2, %0|%0, %2}
+   pcmpgt<mmxvecsize>\t{%2, %0|%0, %2}
+   vpcmpgt<mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxcmp,ssecmp,ssecmp")
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 12/46] i386: Emulate MMX <any_logic><mode>3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX <any_logic><mode>3 with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (any_logic:<code><mode>3): New.
(any_logic:*mmx_<code><mode>3): Check TARGET_MMX_INSNS instead of
TARGET_MMX.  Add SSE support.
---
 gcc/config/i386/mmx.md | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 0b2383ef764..376163a41af 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1104,15 +1104,27 @@
   "TARGET_MMX"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
+(define_expand "<code><mode>3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (any_logic:MMXMODEI
+ (match_operand:MMXMODEI 1 "nonimmediate_operand")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
 (define_insn "*mmx_<code><mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yx,Yy")
 (any_logic:MMXMODEI
- (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0")
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<logic>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODEI 1 "nonimmediate_operand" "%0,0,Yy")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,Yx,Yy")))]
+  "TARGET_MMX_INSNS && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
+  "@
+   p<logic>\t{%2, %0|%0, %2}
+   p<logic>\t{%2, %0|%0, %2}
+   vp<logic>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxadd,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
 
 ;
 ;;
-- 
2.20.1



[PATCH 10/46] i386: Emulate MMX mmx_pmaddwd with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX pmaddwd with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mmx_pmaddwd): Check TARGET_MMX_INSNS
instead of TARGET_MMX.
(*mmx_pmaddwd): Check TARGET_MMX_INSNS instead of TARGET_MMX.
Add SSE support.
---
 gcc/config/i386/mmx.md | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 5ba8b46fc73..fe199b84935 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -850,20 +850,20 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))]
-  "TARGET_MMX"
+  "TARGET_MMX_INSNS"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_pmaddwd"
-  [(set (match_operand:V2SI 0 "register_operand" "=y")
+  [(set (match_operand:V2SI 0 "register_operand" "=y,Yx,Yy")
 (plus:V2SI
  (mult:V2SI
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0")
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yy")
(parallel [(const_int 0) (const_int 2)])))
(sign_extend:V2SI
  (vec_select:V2HI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,Yx,Yy")
(parallel [(const_int 0) (const_int 2)]
  (mult:V2SI
(sign_extend:V2SI
@@ -872,10 +872,14 @@
(sign_extend:V2SI
  (vec_select:V2HI (match_dup 2)
(parallel [(const_int 1) (const_int 3)]))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmaddwd\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "TARGET_MMX_INSNS && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmaddwd\t{%2, %0|%0, %2}
+   pmaddwd\t{%2, %0|%0, %2}
+   vpmaddwd\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxmul,sseiadd,sseiadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_pmulhrwv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 13/46] i386: Emulate MMX mmx_andnot<mode>3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX mmx_andnot<mode>3 with SSE.  Only SSE register source operand
is allowed.

PR target/89021
* config/i386/mmx.md (mmx_andnot<mode>3): Check TARGET_MMX_INSNS
instead of TARGET_MMX.  Add SSE support.
---
 gcc/config/i386/mmx.md | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 376163a41af..12a933c636c 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1087,14 +1087,18 @@
 ;
 
 (define_insn "mmx_andnot<mode>3"
-  [(set (match_operand:MMXMODEI 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI 0 "register_operand" "=y,Yx,Yy")
(and:MMXMODEI
- (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand" "0"))
- (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX"
-  "pandn\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (not:MMXMODEI (match_operand:MMXMODEI 1 "register_operand" "0,0,Yy"))
+ (match_operand:MMXMODEI 2 "nonimmediate_operand" "ym,Yx,Yy")))]
+  "TARGET_MMX_INSNS"
+  "@
+   pandn\t{%2, %0|%0, %2}
+   pandn\t{%2, %0|%0, %2}
+   vpandn\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxadd,sselog,sselog")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_<code><mode>3"
   [(set (match_operand:MMXMODEI 0 "register_operand")
-- 
2.20.1



[PATCH 09/46] i386: Emulate MMX smulv4hi3_highpart with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX smulv4hi3_highpart with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (mmx_smulv4hi3_highpart): Check
TARGET_MMX_INSNS instead of TARGET_MMX.
(*mmx_smulv4hi3_highpart): Check TARGET_MMX_INSNS instead of
TARGET_MMX.  Add SSE support.
---
 gcc/config/i386/mmx.md | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d3f300a901c..5ba8b46fc73 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -780,23 +780,27 @@
  (sign_extend:V4SI
(match_operand:V4HI 2 "nonimmediate_operand")))
(const_int 16]
-  "TARGET_MMX"
+  "TARGET_MMX_INSNS"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
 (define_insn "*mmx_smulv4hi3_highpart"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yx,Yy")
(truncate:V4HI
  (lshiftrt:V4SI
(mult:V4SI
  (sign_extend:V4SI
-   (match_operand:V4HI 1 "nonimmediate_operand" "%0"))
+   (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yy"))
  (sign_extend:V4SI
-   (match_operand:V4HI 2 "nonimmediate_operand" "ym")))
+   (match_operand:V4HI 2 "nonimmediate_operand" "ym,Yx,Yy")))
(const_int 16]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmulhw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  "TARGET_MMX_INSNS && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmulhw\t{%2, %0|%0, %2}
+   pmulhw\t{%2, %0|%0, %2}
+   vpmulhw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_umulv4hi3_highpart"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 08/46] i386: Emulate MMX mulv4hi3 with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX mulv4hi3 with SSE.  Only SSE register source operand is
allowed.

PR target/89021
* config/i386/mmx.md (mulv4hi3): New.
(*mmx_mulv4hi3): Check TARGET_MMX_INSNS instead of TARGET_MMX.
Add SSE support.
---
 gcc/config/i386/mmx.md | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 33754910232..d3f300a901c 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -750,14 +750,25 @@
   "TARGET_MMX"
   "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
 
+(define_expand "mulv4hi3"
+  [(set (match_operand:V4HI 0 "register_operand")
+(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand")
+  (match_operand:V4HI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (MULT, V4HImode, operands);")
+
 (define_insn "*mmx_mulv4hi3"
-  [(set (match_operand:V4HI 0 "register_operand" "=y")
-(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand" "%0")
-  (match_operand:V4HI 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (MULT, V4HImode, operands)"
-  "pmullw\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxmul")
-   (set_attr "mode" "DI")])
+  [(set (match_operand:V4HI 0 "register_operand" "=y,Yx,Yy")
+(mult:V4HI (match_operand:V4HI 1 "nonimmediate_operand" "%0,0,Yy")
+  (match_operand:V4HI 2 "nonimmediate_operand" "ym,Yx,Yy")))]
+  "TARGET_MMX_INSNS && ix86_binary_operator_ok (MULT, V4HImode, operands)"
+  "@
+   pmullw\t{%2, %0|%0, %2}
+   pmullw\t{%2, %0|%0, %2}
+   vpmullw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxmul,ssemul,ssemul")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_smulv4hi3_highpart"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 05/46] i386: Emulate MMX packsswb/packssdw/packuswb with SSE2

2019-02-01 Thread H.J. Lu
Emulate MMX packsswb/packssdw/packuswb with SSE packsswb/packssdw/packuswb
plus moving bits 64:95 to bits 32:63 in SSE register.  Only SSE register
source operand is allowed.
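
As an illustration only, the sequence described above can be modelled in
plain C for the packsswb case.  This is a sketch, not the code in
ix86_split_mmx_pack: the function names are hypothetical, registers are
modelled as byte and word arrays, and the upper 64 bits of each source
XMM register are assumed to hold unrelated junk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Signed saturation of a 16-bit value to 8 bits, as packsswb does.  */
static int8_t
sat_s16_to_s8 (int16_t v)
{
  return v > INT8_MAX ? INT8_MAX : v < INT8_MIN ? INT8_MIN : (int8_t) v;
}

/* Reference: 64-bit MMX packsswb, four words from A then four from B.  */
static void
mmx_packsswb_ref (int8_t dst[8], const int16_t a[4], const int16_t b[4])
{
  for (int i = 0; i < 4; i++)
    {
      dst[i] = sat_s16_to_s8 (a[i]);
      dst[i + 4] = sat_s16_to_s8 (b[i]);
    }
}

/* Model of 128-bit SSE2 packsswb, eight words from each source.  */
static void
sse_packsswb_model (int8_t dst[16], const int16_t a[8], const int16_t b[8])
{
  for (int i = 0; i < 8; i++)
    {
      dst[i] = sat_s16_to_s8 (a[i]);
      dst[i + 8] = sat_s16_to_s8 (b[i]);
    }
}

/* Hypothetical model of the emulation: SSE pack, then move bits 64:95
   (bytes 8..11) down to bits 32:63 (bytes 4..7).  */
static void
mmx_packsswb_emulated (int8_t dst[8], const int16_t a[4], const int16_t b[4])
{
  int16_t xa[8], xb[8];
  int8_t x[16];
  for (int i = 0; i < 8; i++)
    {
      /* Upper halves of the XMM sources are junk by assumption.  */
      xa[i] = i < 4 ? a[i] : 0x1234;
      xb[i] = i < 4 ? b[i] : -0x1234;
    }
  sse_packsswb_model (x, xa, xb);
  memcpy (x + 4, x + 8, 4);
  memcpy (dst, x, 8);
}

/* The emulation must agree with the MMX reference.  */
void
packsswb_self_check (void)
{
  const int16_t a[4] = { 300, -300, 5, -5 };
  const int16_t b[4] = { 127, -128, 128, -129 };
  int8_t want[8], got[8];
  mmx_packsswb_ref (want, a, b);
  mmx_packsswb_emulated (got, a, b);
  assert (memcmp (want, got, 8) == 0);
}
```

After the 128-bit pack, the bytes derived from the second operand sit in
bits 64:95, which is why a single dword move is enough to produce the
MMX layout in the low 64 bits.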

PR target/89021
* config/i386/constraints.md (Yx): Any SSE register if MMX is
disabled in 64-bit mode.
(Yy): Any EVEX encodable SSE register for AVX512VL target,
otherwise any SSE register if MMX is disabled in 64-bit mode.
* config/i386/i386-protos.h (ix86_move_vector_high_sse_to_mmx):
New prototype.
(ix86_split_mmx_pack): Likewise.
* config/i386/i386.c (ix86_move_vector_high_sse_to_mmx): New
function.
(ix86_split_mmx_pack): Likewise.
* config/i386/mmx.md (any_s_truncate): New code iterator.
(s_trunsuffix): New code attr.
(mmx_packsswb): Removed.
(mmx_packssdw): Likewise.
(mmx_packuswb): Likewise.
(mmx_pack<s_trunsuffix>swb): New define_insn_and_split to emulate
MMX packsswb/packuswb with SSE2.
(mmx_packssdw): Likewise.
---
 gcc/config/i386/constraints.md | 10 ++
 gcc/config/i386/i386-protos.h  |  3 ++
 gcc/config/i386/i386.c | 54 +++
 gcc/config/i386/mmx.md | 59 +-
 4 files changed, 97 insertions(+), 29 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 33921aea267..6e9244ad77f 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -110,6 +110,9 @@
 ;;  v  any EVEX encodable SSE register for AVX512VL target,
 ;; otherwise any SSE register
 ;;  h  EVEX encodable SSE register with number factor of four
+;;  x  SSE register if MMX is disabled in 64-bit mode
+;;  y  any EVEX encodable SSE register for AVX512VL target, otherwise
+;;  any SSE register if MMX is disabled in 64-bit mode
 
 (define_register_constraint "Yz" "TARGET_SSE ? SSE_FIRST_REG : NO_REGS"
  "First SSE register (@code{%xmm0}).")
@@ -146,6 +149,13 @@
  "TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS"
  "@internal For AVX512VL, any EVEX encodable SSE register 
(@code{%xmm0-%xmm31}), otherwise any SSE register.")
 
+(define_register_constraint "Yx" "TARGET_MMX_WITH_SSE ? SSE_REGS : NO_REGS"
+ "@internal Any SSE register if MMX is disabled in 64-bit mode.")
+
+(define_register_constraint "Yy"
+ "TARGET_MMX_WITH_SSE ? (TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS) : NO_REGS"
+ "@internal Any EVEX encodable SSE register for AVX512VL target, otherwise any SSE register if MMX is disabled in 64-bit mode.")
+
 ;; We use the B prefix to denote any number of internal operands:
 ;;  f  FLAGS_REG
 ;;  g  GOT memory operand.
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 2d600173917..bb96a420a85 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -200,6 +200,9 @@ extern void ix86_expand_vecop_qihi (enum rtx_code, rtx, 
rtx, rtx);
 
 extern rtx ix86_split_stack_guard (void);
 
+extern void ix86_move_vector_high_sse_to_mmx (rtx);
+extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
 #endif /* TREE_CODE  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4e67abe8764..fde32983fa2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19952,6 +19952,60 @@ ix86_expand_vector_move_misalign (machine_mode mode, 
rtx operands[])
 gcc_unreachable ();
 }
 
+/* Move bits 64:95 to bits 32:63.  */
+
+void
+ix86_move_vector_high_sse_to_mmx (rtx op)
+{
+  rtx mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (0), GEN_INT (2),
+ GEN_INT (0), GEN_INT (0)));
+  rtx dest = gen_rtx_REG (V4SImode, REGNO (op));
+  op = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  rtx insn = gen_rtx_SET (dest, op);
+  emit_insn (insn);
+}
+
+/* Split MMX pack with signed/unsigned saturation with SSE/SSE2.  */
+
+void
+ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+
+  machine_mode dmode = GET_MODE (op0);
+  machine_mode smode = GET_MODE (op1);
+  machine_mode inner_dmode = GET_MODE_INNER (dmode);
+  machine_mode inner_smode = GET_MODE_INNER (smode);
+
+  /* Get the corresponding SSE mode for destination.  */
+  int nunits = 16 / GET_MODE_SIZE (inner_dmode);
+  machine_mode sse_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+   nunits).require ();
+  machine_mode sse_half_dmode = mode_for_vector (GET_MODE_INNER (dmode),
+nunits / 2).require ();
+
+  /* Get the corresponding SSE mode for source.  */
+  nunits = 16 / GET_MODE_SIZE (inner_smode);
+  machine_mode sse_smode = mode_for_vector (GET_MODE_INNER (smode),
+ 

[PATCH 03/46] i386: Allow 64-bit vector modes in SSE registers

2019-02-01 Thread H.J. Lu
In 64-bit mode, we can use SSE2 to support 64-bit vectors.
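
For readers less familiar with the mode names, the 64-bit vector modes
added to the macros below correspond to 8-byte vector types at the
source level.  The following is a minimal sketch using the GCC vector
extension; the typedef and function names are made up for illustration,
and nothing here shows which registers the compiler actually picks.

```c
#include <assert.h>
#include <stdint.h>

/* 8-byte vector types; GCC represents these as V2SImode and V4HImode.  */
typedef int32_t v2si __attribute__ ((vector_size (8)));
typedef int16_t v4hi __attribute__ ((vector_size (8)));

/* Element-wise addition of two 2 x 32-bit vectors.  */
v2si
add_v2si (v2si a, v2si b)
{
  return a + b;
}

/* Element-wise multiplication of two 4 x 16-bit vectors.  */
v4hi
mul_v4hi (v4hi a, v4hi b)
{
  return a * b;
}

/* Small check that the operations are element-wise.  */
void
v64_self_check (void)
{
  v2si s = add_v2si ((v2si) { 1, 2 }, (v2si) { 3, 4 });
  v4hi p = mul_v4hi ((v4hi) { 1, 2, 3, 4 }, (v4hi) { 5, 6, 7, 8 });
  assert (s[0] == 4 && s[1] == 6);
  assert (p[0] == 5 && p[1] == 12 && p[2] == 21 && p[3] == 32);
}
```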

PR target/89021
* config/i386/i386.h (VALID_SSE_REG_MODE): Allow 64-bit vector
modes for TARGET_MMX_WITH_SSE.
(VALID_SSE2_REG_MODE): Likewise.
---
 gcc/config/i386/i386.h | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index b62305fceec..10e882015f0 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1155,7 +1155,11 @@ extern const char *host_detect_local_cpu (int argc, 
const char **argv);
 
 #define VALID_SSE2_REG_MODE(MODE)  \
   ((MODE) == V16QImode || (MODE) == V8HImode || (MODE) == V2DFmode \
-   || (MODE) == V2DImode || (MODE) == DFmode)
+   || (MODE) == V2DImode || (MODE) == DFmode   \
+   || (TARGET_MMX_WITH_SSE && ((MODE) == V1DImode || (MODE) == V8QImode \
+  || (MODE) == V4HImode\
+  || (MODE) == V2SImode\
+  || (MODE) == V2SFmode)))
 
 #define VALID_SSE_REG_MODE(MODE)   \
   ((MODE) == V1TImode || (MODE) == TImode  \
@@ -1198,7 +1202,11 @@ extern const char *host_detect_local_cpu (int argc, 
const char **argv);
|| (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode   \
|| (MODE) == V2TImode || (MODE) == V8DImode || (MODE) == V64QImode  \
|| (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode \
-   || (MODE) == V16SFmode)
+   || (MODE) == V16SFmode  \
+   || (TARGET_MMX_WITH_SSE && ((MODE) == V1DImode || (MODE) == V8QImode \
+  || (MODE) == V4HImode\
+  || (MODE) == V2SImode\
+  || (MODE) == V2SFmode)))
 
 #define X87_FLOAT_MODE_P(MODE) \
   (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
-- 
2.20.1



[PATCH 00/46] Implement MMX intrinsics with SSE

2019-02-01 Thread H.J. Lu
On x86-64, since __m64 is returned and passed in XMM registers, we can
implement MMX intrinsics with SSE instructions.  To support this, we disable
MMX by default in 64-bit mode so that MMX registers won't be available
on x86-64.  Most MMX instructions have equivalent SSE versions, although
the results of some SSE versions need to be reshuffled into the right
order for MMX.  There are a couple of tricky cases:

1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent.  We emulate MMX
maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
mask operand.  A warning is issued since an invalid memory access may
happen when bits 64:127 at the memory location are unmapped:

xmmintrin.h:1168:3: note: Emulate MMX maskmovq with SSE2 maskmovdqu may result in invalid memory access
 1168 |   __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P);
  |   ^~~

2. MMX movntq is emulated with SSE2 DImode movnti, which is available
in 64-bit mode.

3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
SSE emulation must clear the fourth index bit (bit 3, counting from zero)
in each byte of the shuffle control mask.

4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
the upper 64 bits of the destination XMM register.
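
To make the pshufb case (item 3) concrete, here is a small scalar model
in C.  It is only a sketch: the function names are hypothetical, the
instructions are modelled on byte arrays rather than with intrinsics,
and the 0xf7 mask (which clears the fourth-lowest bit of each control
byte) is an assumption about how the clearing is done.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference: 64-bit MMX pshufb.  Each control byte selects one of 8
   source bytes with its low 3 bits, or yields zero if bit 7 is set.  */
static void
mmx_pshufb_ref (uint8_t dst[8], const uint8_t src[8], const uint8_t ctl[8])
{
  for (int i = 0; i < 8; i++)
    dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x07];
}

/* Model of 128-bit SSE pshufb: the low 4 bits index 16 source bytes.  */
static void
sse_pshufb_model (uint8_t dst[16], const uint8_t src[16],
		  const uint8_t ctl[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
}

/* Hypothetical emulation: clear the extra index bit in the control
   bytes, run the 128-bit shuffle, keep the low 64 bits.  */
static void
mmx_pshufb_emulated (uint8_t dst[8], const uint8_t src[8],
		     const uint8_t ctl[8])
{
  uint8_t xsrc[16], xctl[16], xdst[16];
  /* Upper halves of the XMM operands are junk by assumption.  */
  memset (xsrc + 8, 0xaa, 8);
  memset (xctl + 8, 0x55, 8);
  memcpy (xsrc, src, 8);
  for (int i = 0; i < 8; i++)
    xctl[i] = ctl[i] & 0xf7;
  sse_pshufb_model (xdst, xsrc, xctl);
  memcpy (dst, xdst, 8);
}

/* Check every possible control byte value against the reference.  */
void
pshufb_self_check (void)
{
  const uint8_t src[8] = { 10, 11, 12, 13, 14, 15, 16, 17 };
  for (int c = 0; c < 256; c++)
    {
      uint8_t ctl[8], want[8], got[8];
      memset (ctl, c, 8);
      mmx_pshufb_ref (want, src, ctl);
      mmx_pshufb_emulated (got, src, ctl);
      assert (memcmp (want, got, 8) == 0);
    }
}
```

Without the masking step, a control byte such as 8 would select byte 8
of the XMM source, which lies outside the 64-bit MMX operand.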

Tests are also added to check each SSE emulation of MMX intrinsics.

With MMX disabled in 64-bit mode, 8-byte vectorizer is enabled with SSE2.

There are no regressions on i686 and x86-64.  For x86-64, GCC is also
tested with

--with-arch=native --with-cpu=native

on AVX2 and AVX512F machines.

H.J. Lu (46):
  i386: Add TARGET_MMX_INSNS and TARGET_MMX_WITH_SSE
  libitm: Support _ITM_TYPE_M64 with SSE2 in 64-bit mode
  i386: Allow 64-bit vector modes in SSE registers
  i386: Allow UNSPECV_EMMS with SSE2 in 64-bit mode
  i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
  i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
  i386: Emulate MMX plusminus/sat_plusminus with SSE
  i386: Emulate MMX mulv4hi3 with SSE
  i386: Emulate MMX smulv4hi3_highpart with SSE
  i386: Emulate MMX mmx_pmaddwd with SSE
  i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE
  i386: Emulate MMX <any_logic><mode>3 with SSE
  i386: Emulate MMX mmx_andnot<mode>3 with SSE
  i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE
  i386: Emulate MMX vec_dupv2si with SSE
  i386: Emulate MMX pshufw with SSE
  i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
  i386: Emulate MMX sse_cvtpi2ps with SSE
  i386: Emulate MMX mmx_pextrw with SSE
  i386: Emulate MMX mmx_pinsrw with SSE
  i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
  i386: Emulate MMX mmx_pmovmskb with SSE
  i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
  i386: Emulate MMX maskmovq with SSE2 maskmovdqu
  i386: Emulate MMX mmx_uavgv8qi3 with SSE
  i386: Emulate MMX mmx_uavgv4hi3 with SSE
  i386: Emulate MMX mmx_psadbw with SSE
  i386: Emulate MMX movntq with SSE2 movntidi
  i386: Emulate MMX umulv1siv1di3 with SSE2
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>wv4hi3 with SSE
  i386: Emulate MMX ssse3_ph<plusminus_mnemonic>dv2si3 with SSE
  i386: Emulate MMX ssse3_pmaddubsw with SSE
  i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
  i386: Emulate MMX pshufb with SSE version
  i386: Emulate MMX ssse3_psign<mode>3 with SSE
  i386: Emulate MMX ssse3_palignrdi with SSE
  i386: Emulate MMX abs<mode>2 with SSE
  i386: Allow MMXMODE moves without MMX
  i386: Allow MMX vector expanders with SSE
  i386: Don't enable MMX in 64-bit mode by default
  i386: Add tests for MMX intrinsic emulations with SSE
  i386: Also enable SSSE3 __m64 tests without MMX
  i386: Enable 8-byte vectorizer for TARGET_MMX_WITH_SSE
  i386: Implement V2SF add/sub/mul with SSE
  i386: Implement V2SF <-> V2SI conversions with SSE
  i386: Implement V2SF comparisons with SSE

 gcc/config/i386/constraints.md|  10 +
 gcc/config/i386/driver-i386.c |   4 +-
 gcc/config/i386/i386-builtin.def  | 126 +--
 gcc/config/i386/i386-protos.h |   4 +
 gcc/config/i386/i386.c| 205 -
 gcc/config/i386/i386.h|  22 +-
 gcc/config/i386/i386.md   |   3 +-
 gcc/config/i386/i386.opt  |   4 +
 gcc/config/i386/mmintrin.h|  10 +-
 gcc/config/i386/mmx.md| 824 --
 gcc/config/i386/sse.md| 412 +++--
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c   |   2 +-
 gcc/testsuite/gcc.target/i386/mmx-vals.h  |  77 ++
 gcc/testsuite/gcc.target/i386/pr82483-1.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr82483-2.c |   2 +-
 gcc/testsuite/gcc.target/i386/pr89028-1.c |  10 +
 gcc/testsuite/gcc.target/i386/pr89028-10.c|  39 +
 gcc/testsuite/gcc.target/i386/pr89028-11.c|  39 +
 gcc/testsuite/gcc.target/i386/pr89028-12.c|  39 +
 gcc/testsuite/gcc.target/i386/pr89028-13.c|  39 +
 gcc/testsuite/gcc.target/i386/pr89028-2.c |  11 +
 gcc/testsuite/gcc.target/i386/pr89028-3.c |  14 +
 gcc/testsuite/gcc.target/i386/pr89028-4.c |  14 +
 

[PATCH 07/46] i386: Emulate MMX plusminus/sat_plusminus with SSE

2019-02-01 Thread H.J. Lu
Emulate MMX plusminus/sat_plusminus with SSE.  Only SSE register source
operand is allowed.

PR target/89021
* config/i386/mmx.md (<plusminus_insn><mode>3): New.
(*mmx_<plusminus_insn><mode>3): Changed to define_insn_and_split
to support SSE emulation.
(*mmx_<plusminus_insn><mode>3): Likewise.
(mmx_<plusminus_insn><mode>3): Check TARGET_MMX_INSNS instead of
TARGET_MMX.
---
 gcc/config/i386/mmx.md | 46 --
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index fbd341109d6..33754910232 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -698,34 +698,50 @@
   "TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode)"
   "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
 
+(define_expand "<plusminus_insn><mode>3"
+  [(set (match_operand:MMXMODEI 0 "register_operand")
+   (plusminus:MMXMODEI
+ (match_operand:MMXMODEI 1 "nonimmediate_operand")
+ (match_operand:MMXMODEI 2 "nonimmediate_operand")))]
+  "TARGET_MMX_WITH_SSE"
+  "ix86_fixup_binary_operands_no_copy (<CODE>, <MODE>mode, operands);")
+
 (define_insn "*mmx_<plusminus_insn><mode>3"
-  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODEI8 0 "register_operand" "=y,Yx,Yy")
 (plusminus:MMXMODEI8
- (match_operand:MMXMODEI8 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym")))]
-  "(TARGET_MMX || (TARGET_SSE2 && <MODE>mode == V1DImode))
+ (match_operand:MMXMODEI8 1 "nonimmediate_operand" "0,0,Yy")
+ (match_operand:MMXMODEI8 2 "nonimmediate_operand" "ym,Yx,Yy")))]
+  "(TARGET_MMX_INSNS || (TARGET_SSE2 && <MODE>mode == V1DImode))
    && ix86_binary_operator_ok (<CODE>, <MODE>mode, operands)"
-  "p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+  "@
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   p<plusminus_mnemonic><mmxvecsize>\t{%2, %0|%0, %2}
+   vp<plusminus_mnemonic><mmxvecsize>\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_3"
   [(set (match_operand:MMXMODE12 0 "register_operand")
(sat_plusminus:MMXMODE12
  (match_operand:MMXMODE12 1 "nonimmediate_operand")
  (match_operand:MMXMODE12 2 "nonimmediate_operand")))]
-  "TARGET_MMX"
+  "TARGET_MMX_INSNS"
   "ix86_fixup_binary_operands_no_copy (, mode, operands);")
 
 (define_insn "*mmx_3"
-  [(set (match_operand:MMXMODE12 0 "register_operand" "=y")
+  [(set (match_operand:MMXMODE12 0 "register_operand" "=y,Yx,Yy")
 (sat_plusminus:MMXMODE12
- (match_operand:MMXMODE12 1 "nonimmediate_operand" "0")
- (match_operand:MMXMODE12 2 "nonimmediate_operand" "ym")))]
-  "TARGET_MMX && ix86_binary_operator_ok (, mode, operands)"
-  "p\t{%2, %0|%0, %2}"
-  [(set_attr "type" "mmxadd")
-   (set_attr "mode" "DI")])
+ (match_operand:MMXMODE12 1 "nonimmediate_operand" "0,0,Yy")
+ (match_operand:MMXMODE12 2 "nonimmediate_operand" "ym,Yx,Yy")))]
+  "TARGET_MMX_INSNS && ix86_binary_operator_ok (, mode, operands)"
+  "@
+   p\t{%2, %0|%0, %2}
+   p\t{%2, %0|%0, %2}
+   vp\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "*,noavx,avx")
+   (set_attr "type" "mmxadd,sseadd,sseadd")
+   (set_attr "mode" "DI,TI,TI")])
 
 (define_expand "mmx_mulv4hi3"
   [(set (match_operand:V4HI 0 "register_operand")
-- 
2.20.1



[PATCH 02/46] libitm: Support _ITM_TYPE_M64 with SSE2 in 64-bit mode

2019-02-01 Thread H.J. Lu
In 64-bit mode when MMX is disabled, we can use SSE2 to emulate MMX
intrinsics.

PR target/89021
* libitm.h (_ITM_TYPE_M64): Also enabled with SSE2 in 64-bit mode.
---
 libitm/libitm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libitm/libitm.h b/libitm/libitm.h
index c892608cef7..7b471d4ad4b 100644
--- a/libitm/libitm.h
+++ b/libitm/libitm.h
@@ -217,7 +217,7 @@ ITM_LOG(CD)
 ITM_LOG(CE)
 
 #if defined(__i386__) || defined(__x86_64__)
-# ifdef __MMX__
+# if defined(__MMX__) || (defined(__x86_64__) && defined(__SSE2__))
   typedef int _ITM_TYPE_M64 __attribute__((vector_size(8), may_alias));
   ITM_BARRIERS(M64)
   ITM_LOG(M64)
-- 
2.20.1



[PATCH 06/46] i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX

2019-02-01 Thread H.J. Lu
Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX.  For MMX punpckhXX,
move bits 64:127 to bits 0:63 in SSE register.  Only SSE register source
operand is allowed.
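
The byte case of this scheme can be modelled on plain arrays. The sketch below is not part of the patch and all names in it are hypothetical. It assumes the MMX value occupies the low 64 bits of the SSE register and checks that SSE punpcklbw, followed by moving bits 64:127 down for the high variant, matches the MMX semantics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V8 = std::array<std::uint8_t, 8>;    // models a 64-bit MMX value (V8QI)
using V16 = std::array<std::uint8_t, 16>;  // models a 128-bit SSE register (V16QI)

// Reference semantics of MMX punpcklbw/punpckhbw on 64-bit operands.
V8 mmx_punpck_ref(const V8 &a, const V8 &b, bool high)
{
  V8 r{};
  const int base = high ? 4 : 0;
  for (int i = 0; i < 4; ++i) {
    r[2 * i] = a[base + i];
    r[2 * i + 1] = b[base + i];
  }
  return r;
}

// SSE punpcklbw: interleave the low 8 bytes of two 128-bit registers.
V16 sse_punpcklbw(const V16 &a, const V16 &b)
{
  V16 r{};
  for (int i = 0; i < 8; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  return r;
}

// Emulation as described above: always use SSE punpcklbw; for the "high"
// variant, take bits 64:127 of the result instead of bits 0:63.
V8 mmx_punpck_emulated(const V8 &a, const V8 &b, bool high)
{
  V16 xa{}, xb{};  // MMX value sits in the low half; the upper half is unused
  for (int i = 0; i < 8; ++i) {
    xa[i] = a[i];
    xb[i] = b[i];
  }
  const V16 t = sse_punpcklbw(xa, xb);
  V8 r{};
  for (int i = 0; i < 8; ++i)
    r[i] = t[(high ? 8 : 0) + i];
  return r;
}
```

The upper 8 bytes of the interleaved result are exactly the interleave of bytes 4..7 of each operand, so a single SSE instruction plus a shuffle covers both MMX variants.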

PR target/89021
* config/i386/i386-protos.h (ix86_split_mmx_punpck): New
prototype.
* config/i386/i386.c (ix86_split_mmx_punpck): New function.
* config/i386/mmx.md (mmx_punpckhbw): Changed to
define_insn_and_split to support SSE emulation.
(mmx_punpcklbw): Likewise.
(mmx_punpckhwd): Likewise.
(mmx_punpcklwd): Likewise.
(mmx_punpckhdq): Likewise.
(mmx_punpckldq): Likewise.
---
 gcc/config/i386/i386-protos.h |   1 +
 gcc/config/i386/i386.c|  77 
 gcc/config/i386/mmx.md| 108 +-
 3 files changed, 144 insertions(+), 42 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bb96a420a85..dc7fc38d8e4 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -202,6 +202,7 @@ extern rtx ix86_split_stack_guard (void);
 
 extern void ix86_move_vector_high_sse_to_mmx (rtx);
 extern void ix86_split_mmx_pack (rtx[], enum rtx_code);
+extern void ix86_split_mmx_punpck (rtx[], bool);
 
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fde32983fa2..d795af1dd93 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20006,6 +20006,83 @@ ix86_split_mmx_pack (rtx operands[], enum rtx_code code)
   ix86_move_vector_high_sse_to_mmx (op0);
 }
 
+/* Split MMX punpcklXX/punpckhXX with SSE punpcklXX.  */
+
+void
+ix86_split_mmx_punpck (rtx operands[], bool high_p)
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  machine_mode mode = GET_MODE (op0);
+  rtx mask;
+  /* The corresponding SSE mode.  */
+  machine_mode sse_mode, double_sse_mode;
+
+  switch (mode)
+{
+case E_V8QImode:
+  sse_mode = V16QImode;
+  double_sse_mode = V32QImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (16,
+ GEN_INT (0), GEN_INT (16),
+ GEN_INT (1), GEN_INT (17),
+ GEN_INT (2), GEN_INT (18),
+ GEN_INT (3), GEN_INT (19),
+ GEN_INT (4), GEN_INT (20),
+ GEN_INT (5), GEN_INT (21),
+ GEN_INT (6), GEN_INT (22),
+ GEN_INT (7), GEN_INT (23)));
+  break;
+
+case E_V4HImode:
+  sse_mode = V8HImode;
+  double_sse_mode = V16HImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (8,
+ GEN_INT (0), GEN_INT (8),
+ GEN_INT (1), GEN_INT (9),
+ GEN_INT (2), GEN_INT (10),
+ GEN_INT (3), GEN_INT (11)));
+  break;
+
+case E_V2SImode:
+  sse_mode = V4SImode;
+  double_sse_mode = V8SImode;
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4,
+ GEN_INT (0), GEN_INT (4),
+ GEN_INT (1), GEN_INT (5)));
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  /* Generate SSE punpcklXX.  */
+  rtx dest = gen_rtx_REG (sse_mode, REGNO (op0));
+  op1 = gen_rtx_REG (sse_mode, REGNO (op1));
+  op2 = gen_rtx_REG (sse_mode, REGNO (op2));
+
+  op1 = gen_rtx_VEC_CONCAT (double_sse_mode, op1, op2);
+  op2 = gen_rtx_VEC_SELECT (sse_mode, op1, mask);
+  rtx insn = gen_rtx_SET (dest, op2);
+  emit_insn (insn);
+
+  if (high_p)
+{
+  /* Move bits 64:127 to bits 0:63.  */
+  mask = gen_rtx_PARALLEL (VOIDmode,
+  gen_rtvec (4, GEN_INT (2), GEN_INT (3),
+ GEN_INT (0), GEN_INT (0)));
+  dest = gen_rtx_REG (V4SImode, REGNO (dest));
+  op1 = gen_rtx_VEC_SELECT (V4SImode, dest, mask);
+  insn = gen_rtx_SET (dest, op1);
+  emit_insn (insn);
+}
+}
+
 /* Helper function of ix86_fixup_binary_operands to canonicalize
operand order.  Returns true if the operands should be swapped.  */
 
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index c183f949a7c..fbd341109d6 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1083,87 +1083,111 @@
(set_attr "type" "mmxshft,sselog,sselog")
(set_attr "mode" "DI,TI,TI")])
 
-(define_insn "mmx_punpckhbw"
-  [(set (match_operand:V8QI 0 "register_operand" "=y")
+(define_insn_and_split "mmx_punpckhbw"
+  [(set (match_operand:V8QI 0 "register_operand" "=y,Yx,Yy")

[PATCH 01/46] i386: Add TARGET_MMX_INSNS and TARGET_MMX_WITH_SSE

2019-02-01 Thread H.J. Lu
In 64-bit mode when MMX is disabled, SSE2 can be used to emulate MMX
instructions.
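
As a reading aid, the two new predicates can be restated as ordinary functions. This is only a sketch: `IsaFlags`, `mmx_insns` and `mmx_with_sse` are hypothetical stand-ins for the target flags and the TARGET_MMX_INSNS / TARGET_MMX_WITH_SSE macros, not GCC code.

```cpp
#include <cassert>

// Hypothetical stand-in for the ISA flags tested by the i386.h macros.
struct IsaFlags {
  bool is_64bit;
  bool mmx;
  bool sse2;
};

// Mirrors TARGET_MMX_INSNS: MMX operations are usable, natively or emulated.
constexpr bool mmx_insns(IsaFlags f)
{
  return f.mmx || (f.is_64bit && f.sse2);
}

// Mirrors TARGET_MMX_WITH_SSE: MMX is disabled, so the SSE2 emulation
// patterns are the only way to implement MMX operations.
constexpr bool mmx_with_sse(IsaFlags f)
{
  return f.is_64bit && f.sse2 && !f.mmx;
}
```

In this model the first predicate answers "may MMX patterns be used at all", while the second selects the emulation-only patterns; the two are true together only for 64-bit SSE2 targets with MMX disabled.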

PR target/89021
* config/i386/i386.h (TARGET_MMX_INSNS): New.
(TARGET_MMX_INSNS_P): Likewise.
(TARGET_MMX_WITH_SSE): Likewise.
(TARGET_MMX_WITH_SSE_P): Likewise.
---
 gcc/config/i386/i386.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 83b025e0cf5..b62305fceec 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -43,6 +43,16 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define TARGET_64BIT_P(x)  TARGET_ISA_64BIT_P(x)
 #define TARGET_MMX TARGET_ISA_MMX
 #define TARGET_MMX_P(x)TARGET_ISA_MMX_P(x)
+/* In 64-bit mode, SSE2 can be used to emulate MMX instructions.  */
+#define TARGET_MMX_INSNS   \
+  (TARGET_MMX || (TARGET_64BIT && TARGET_SSE2))
+#define TARGET_MMX_INSNS_P(x) \
+  (TARGET_MMX_P(x) || (TARGET_64BIT_P (x) && TARGET_SSE2_P (x)))
+/* In 64-bit mode, use special SSE2 patterns for MMX emulation.  */
+#define TARGET_MMX_WITH_SSE\
+  (TARGET_64BIT && TARGET_SSE2 && !TARGET_MMX)
+#define TARGET_MMX_WITH_SSE_P(x) \
+  (TARGET_64BIT_P (x) && TARGET_SSE2_P (x) && !TARGET_MMX_P(x))
 #define TARGET_3DNOW   TARGET_ISA_3DNOW
 #define TARGET_3DNOW_P(x)  TARGET_ISA_3DNOW_P(x)
 #define TARGET_3DNOW_A TARGET_ISA_3DNOW_A
-- 
2.20.1



[PATCH 04/46] i386: Allow UNSPECV_EMMS with SSE2 in 64-bit mode

2019-02-01 Thread H.J. Lu
In 64-bit mode, also support __builtin_ia32_emms with SSE2.

PR target/89021
* config/i386/mmx.md (UNSPECV_EMMS): Replace TARGET_MMX with
TARGET_MMX_INSNS.
---
 gcc/config/i386/mmx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index c1e0f2c411e..0d44aa60c79 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1578,7 +1578,7 @@
(set_attr "mode" "DI")])
 
 (define_int_iterator EMMS
-  [(UNSPECV_EMMS "TARGET_MMX")
+  [(UNSPECV_EMMS "TARGET_MMX_INSNS")
(UNSPECV_FEMMS "TARGET_3DNOW")])
 
 (define_int_attr emms
-- 
2.20.1



Late-breaking jit features (was Re: [PATCH][gcc] libgccjit: introduce gcc_jit_context_add_driver_option)

2019-02-01 Thread David Malcolm
On Mon, 2019-01-21 at 08:40 +, Andrea Corallo wrote:
> Hi all,
> Second version of the patch addressing David's comment about all-non-
> failing-tests.h
> 
> Adds gcc_jit_context_add_driver_option to the libgccjit ABI and a
> testcase for it.
> 
> Using this interface it is now possible to pass options affecting
> the assembler and linker.
> 
> Does not introduce regressions running make check-jit

Thanks; the patch looks good.

[CCing the release managers]

Given that gcc development is now in stage 4, we really shouldn't be
adding new features, but I'm wondering if an exception can be made for
libgccjit?  (this patch purely touches the jit subdirectories).

There's one other late-breaking change, here:
  [PATCH][jit] Add thread-local globals to the libgccjit frontend
https://gcc.gnu.org/ml/gcc-patches/2019-01/msg00227.html
which is nearly ready, but is awaiting copyright assignment paperwork.

Alternatively, should these patches go into a branch of queued jit
changes for gcc 10?

Thanks
Dave


> Bests
> 
>   Andrea
> 
> 
> gcc/jit/ChangeLog
> 2019-01-16  Andrea Corallo  andrea.cora...@arm.com
> 
> * docs/topics/compatibility.rst (LIBGCCJIT_ABI_11): New ABI tag.
> * docs/topics/contexts.rst (Additional driver options): New
> section.
> * jit-playback.c (invoke_driver): Add call to append_driver_options.
> * jit-recording.c: Within namespace gcc::jit...
> (recording::context::~context): Free the optnames within
> m_driver_options.
> (recording::context::add_driver_option): New method.
> (recording::context::append_driver_options): New method.
> (recording::context::dump_reproducer_to_file): Add driver
> options.
> * jit-recording.h: Within namespace gcc::jit...
> (recording::context::add_driver_option): New method.
> (recording::context::append_driver_options): New method.
> (recording::context::m_driver_options): New field.
> * libgccjit++.h (gccjit::context::add_driver_option): New
> method.
> * libgccjit.c (gcc_jit_context_add_driver_option): New API
> entrypoint.
> * libgccjit.h (gcc_jit_context_add_driver_option): New API
> entrypoint.
> (LIBGCCJIT_HAVE_gcc_jit_context_add_driver_option): New
> macro.
> * libgccjit.map (LIBGCCJIT_ABI_11): New ABI tag.
> 
> 
> 
> gcc/testsuite/ChangeLog
> 2019-01-16  Andrea Corallo  andrea.cora...@arm.com
> 
> * jit.dg/add-driver-options-testlib.c: Add support file for
> test-add-driver-options.c testcase.
> * jit.dg/all-non-failing-tests.h: Add note about
> test-add-driver-options.c
> * jit.dg/jit.exp (jit-dg-test): Update to support
> add-driver-options-testlib.c compilation.
> * jit.dg/test-add-driver-options.c: New testcase.
> 


[PATCH, rs6000] Correct dg directives on recently added vec-extract tests

2019-02-01 Thread Kelvin Nilsen


Overnight regression testing revealed a portability problem with several 
recently installed tests.  The tests were observed to fail on a power7 test 
platform.

The tests, which are intended to execute, are compiled with -mcpu=power8.  
Thus, they require power8 hardware.

I have regression tested this on powerpc64-linux (P7 big-endian, both -m32 and 
-m64).  Is this ok for trunk and for the backports to which the original patch 
is to be directed?

gcc/testsuite/ChangeLog:

2019-02-01  Kelvin Nilsen  

* gcc.target/powerpc/vec-extract-slong-1.c: Require p8 execution
hardware.
* gcc.target/powerpc/vec-extract-schar-1.c: Likewise.
* gcc.target/powerpc/vec-extract-sint128-1.c: Likewise.
* gcc.target/powerpc/vec-extract-sshort-1.c: Likewise.
* gcc.target/powerpc/vec-extract-ulong-1.c: Likewise.
* gcc.target/powerpc/vec-extract-uchar-1.c: Likewise.
* gcc.target/powerpc/vec-extract-sint-1.c: Likewise.
* gcc.target/powerpc/vec-extract-uint128-1.c: Likewise.
* gcc.target/powerpc/vec-extract-ushort-1.c: Likewise.
* gcc.target/powerpc/vec-extract-uint-1.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c  (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-slong-1.c  (working copy)
@@ -2,7 +2,7 @@
signed longs remains signed.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c  (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-schar-1.c  (working copy)
@@ -2,7 +2,7 @@
signed chars remains signed.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c  (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-sint128-1.c  (working copy)
@@ -2,7 +2,7 @@
signed __int128s remains signed.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-sshort-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-sshort-1.c (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-sshort-1.c (working copy)
@@ -2,7 +2,7 @@
signed shorts remains signed.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-ulong-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-ulong-1.c  (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-ulong-1.c  (working copy)
@@ -2,7 +2,7 @@
unsigned longs remains unsigned.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: gcc/testsuite/gcc.target/powerpc/vec-extract-uchar-1.c
===
--- gcc/testsuite/gcc.target/powerpc/vec-extract-uchar-1.c  (revision 268424)
+++ gcc/testsuite/gcc.target/powerpc/vec-extract-uchar-1.c  (working copy)
@@ -2,7 +2,7 @@
unsigned chars remains unsigned.  */
 /* { dg-do run } */
 /* { dg-options "-ansi -mcpu=power8 " } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-require-effective-target p8vector_hw } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 
 #include 
Index: 

[C++ Patch] PR 88986 ("[7/8/9 Regression] ICE: tree check: expected tree that contains 'decl minimal' structure, have 'error_mark' in member_vec_binary_search, at cp/name-lookup.c:1136")

2019-02-01 Thread Paolo Carlini

Hi,

I think that this ICE on invalid (and valid, for c++17+) can in fact be 
avoided by accepting in make_typename_type a TYPE_PACK_EXPANSION as 
context, i.e. by not triggering the "‘T ...’ is not a class" error. Not 
sure if a better fix would be something more general. Note, anyway, that 
we are asserting TYPE_P (context) thus TYPE_PACK_EXPANSIONs definitely 
get through beyond MAYBE_CLASS_TYPE_P.
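
For readers less familiar with the C++17 feature involved, here is a small, self-contained sketch of a pack expansion in a using-declaration. It is not the testcase from this PR: it uses the common overload-set idiom rather than `using typename`, and the type names are hypothetical. Compilers in pre-C++17 modes may accept it only with a warning.

```cpp
#include <cassert>

// Two unrelated callable types (hypothetical, for illustration).
struct IntHandler  { int operator()(int) const { return 1; } };
struct CharHandler { int operator()(const char *) const { return 2; } };

// The using-declaration is a pack expansion: it imports operator() from
// every base class in the pack, so all of them take part in overload
// resolution on the derived type.
template <typename... T>
struct Overloaded : T... {
  using T::operator()...;
};
```

With `Overloaded<IntHandler, CharHandler> o;`, the call `o(1)` selects the `int` overload and `o("x")` selects the `const char *` one.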


Tested x86_64-linux.

Thanks, Paolo.

///

/cp
2019-02-01  Paolo Carlini  

PR c++/88986
* decl.c (make_typename_type): Allow for TYPE_PACK_EXPANSION as
context (the first argument).

/testsuite
2019-02-01  Paolo Carlini  

PR c++/88986
* g++.dg/cpp1z/using4.C: New.

Index: cp/decl.c
===
--- cp/decl.c   (revision 268447)
+++ cp/decl.c   (working copy)
@@ -3816,7 +3816,9 @@ make_typename_type (tree context, tree name, enum
   gcc_assert (identifier_p (name));
   gcc_assert (TYPE_P (context));
 
-  if (!MAYBE_CLASS_TYPE_P (context))
+  if (TREE_CODE (context) == TYPE_PACK_EXPANSION)
+/* This can happen for C++17 variadic using (c++/88986).  */;
+  else if (!MAYBE_CLASS_TYPE_P (context))
 {
   if (complain & tf_error)
error ("%q#T is not a class", context);
Index: testsuite/g++.dg/cpp1z/using4.C
===
--- testsuite/g++.dg/cpp1z/using4.C (nonexistent)
+++ testsuite/g++.dg/cpp1z/using4.C (working copy)
@@ -0,0 +1,8 @@
+// PR c++/88986
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+template<typename ...T> struct C : T... {
+  using typename T::type ...;  // { dg-warning "pack expansion" "" { target c++14_down } }
+  void f() { type value; }
+};


Re: Move -Wmaybe-uninitialized to -Wextra

2019-02-01 Thread Marc Glisse

On Fri, 1 Feb 2019, Jeff Law wrote:

> On 2/1/19 7:01 AM, Marek Polacek wrote:
>> On Fri, Feb 01, 2019 at 07:19:25AM -0600, Segher Boessenkool wrote:
>>> Hi Marc,
>>>
>>> On Fri, Feb 01, 2019 at 12:32:45PM +0100, Marc Glisse wrote:
>>>> -Wmaybe-uninitialized generates false positives, we can tweak the compiler
>>>> to reduce them, but there will always be some, that's in the nature of
>>>> this warning.
>>>
>>> That is true for *every* warning; if not, it should be an error, not a
>>> warning.
>>>
>>>> My opinion is that -Wmaybe-uninitialized would serve its purpose better as
>>>> part of -Wextra.
>>>
>>> +1
>>
>> +1 from me too.
>
> I disagree strongly.


I am not surprised, but I had to at least start the conversation. Would 
you mind providing a patch that changes the definition of -Wall, since the 
current one doesn't quite match reality anymore? Also, what do you 
recommend people do when they hit false positives?


> If we move it to Wextra it's going to see a lot less usage in real world
> codebases


I am very tempted by the strawman: should we deprecate -Wextra since 
nobody uses it? (Sorry)


Ideally serious projects would use (parts of) -Wextra, at least 
occasionally, and with circumspection. But some warnings like 
-Wmaybe-uninitialized are dangerous tools in the hands of quickfix 
developers, and I am indeed trying to keep it out of their hands...


> and potentially lead to the re-introduction of a class of bugs that
> we've largely helped stomp out.


That's very optimistic. -Wmaybe-uninitialized only detects a very small 
proportion of uninitialized uses. Also, my experience is that people have 
stomped out the warning, not the bugs. In some cases they even introduced 
bugs to stomp out false warnings, or made it harder to detect real bugs in 
the future, so the warning did more harm than good. I am mostly working on 
large C++ header-only template-heavy scientific libraries, it is quite 
possible that people who handle different types of code bases have a 
different experience, and -Wmaybe-uninitialized may have had a more 
positive impact on other projects.
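
To make the kind of false positive under discussion concrete, here is a minimal sketch of a correlated-condition pattern. It is an illustration only, with a hypothetical function name. Whether a particular GCC version actually warns on it depends on the optimization level and on inlining decisions, so it should not be read as a reproducer.

```cpp
#include <cassert>

// 'value' is assigned only when 'have_value' is true, and read only when
// 'have_value' is true, so it is never read uninitialized. A compiler that
// cannot prove the two conditions are correlated may still report that
// 'value' "may be used uninitialized".
int lookup_or_default(bool have_value, int source, int fallback)
{
  int value;  // deliberately left uninitialized
  if (have_value)
    value = source * 2;

  int result = fallback;
  if (have_value)
    result = value;
  return result;
}
```

The usual quick fix is to write `int value = 0;`. That silences the warning, but it also means that a later edit which really does read `value` on the wrong path would no longer be diagnosed.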



> It's also the case that maybe uninitialized vs is uninitialized is
> really just a function of CFG shape.  Give me any "maybe uninitialized"
> case and I can turn it into a "is uninitialized" with simple block
> duplication of the forms done by jump threading, path isolation,
> superblock formation, etc.


Hmm, you know those things better than me, so you are probably right, but 
I am not seeing it. We omit "maybe" if, starting from the entry of the 
function, and barring exceptions, we know the statement will always be 
executed. If you take a false positive for maybe-uninitialized, i.e. a 
statement in a dead branch, I don't see how block duplication can make it 
so the statement is now always executed. The natural way to remove "maybe" 
is through function cloning or outlining. Then you can create functions 
that are never called, and any warning about those is a false positive.


There is a matter of statistics. In practice maybe-uninitialized has way 
more false positives than uninitialized, which makes it more problematic. 
But if you prefer to move both to -Wextra (this is the current default 
when front-ends don't override it), that's ok with me ;-)


--
Marc Glisse


Re: C++ PATCH for c++/88325 - ICE with invalid out-of-line template member definition

2019-02-01 Thread Marek Polacek
On Fri, Feb 01, 2019 at 12:02:44PM -0500, Jason Merrill wrote:
> On 2/1/19 11:26 AM, Marek Polacek wrote:
> > On Wed, Jan 30, 2019 at 01:39:11PM -0500, Jason Merrill wrote:
> > > On 1/28/19 9:46 PM, Marek Polacek wrote:
> > > > This patch fixes an ICE-on-invalid (because out-of-line constructors 
> > > > can't have
> > > > template arguments and also because function templates can't be 
> > > > partially
> > > > specialized) in C++2a: when we're parsing
> > > > 
> > > > template template A::A ()
> > > > 
> > > > in the attached test we end up parsing "A::A" as a type name, and 
> > > > first we
> > > > try a class-name.  First we process "A::" as the nested name 
> > > > specifier and then
> > > > we parse "A".  In this test that results in a BASELINK.  Because in 
> > > > this context
> > > > we're supposed to treat it as a typename ([temp.res]/6), we call 
> > > > make_typename_type,
> > > > but that crashes.
> > > 
> > > Hmm.  If we've done an actual lookup (that gave us a BASELINK), we aren't
> > > dealing with a member of an unknown specialization anymore, so we should
> > > just use the result of the lookup rather than speculate about what the 
> > > name
> > > might mean.  Why are we still trying to treat it as a typename?
> > 
> > Good point.  It's because cp_parser_class_name has:
> > 
> > 23095   /* Any name names a type if we're following the `typename' keyword
> > 23096  in a qualified name where the enclosing scope is type-dependent. 
> >  */
> > 23097   typename_p = (typename_keyword_p && scope && TYPE_P (scope)
> > 23098 && dependent_type_p (scope));
> > 
> > and scope is in this case "A" which is dependent.  Then there's this
> > "identifier, but not template-id" case which only performs name lookup when
> > typename_p is false.  But we're parsing "A" so we call
> > cp_parser_template_id.  It sees CPP_TEMPLATE_ID -- an already parsed
> > template-id, so it just returns it, which is a BASELINK.  So even messing
> > with tag_type wouldn't help.
> > 
> > Does this answer your question?
> 
> Mostly, though I'm still curious how we got the previous parse of the
> template-id and yet continued to try to parse it as a class-name.

So we have
template template
A::A ()

and we're in cp_parser_single_declaration.  We're trying to parse the
decl-specifier-seq, and that first tries to parse various RID_* keywords
and if that doesn't work it tries to parse a constructor:

  /* Constructors are a special case.  The `S' in `S()' is not a
 decl-specifier; it is the beginning of the declarator.  */
  constructor_p
= (!found_decl_spec
   && constructor_possible_p
   && (cp_parser_constructor_declarator_p
   (parser, decl_spec_seq_has_spec_p (decl_specs, ds_friend))));

cp_parser_constructor_declarator_p calls cp_parser_nested_name_specifier_opt
and that's where we parse the two template-ids, A and A.  But
cp_parser_constructor_declarator_p isn't successful so we go to parsing
A::A as a type-specifier, and that's where the above happens.

> The patch is OK.

Thanks.

Marek


Re: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions

2019-02-01 Thread Thomas Schwinge
Hi!

On Fri, 01 Feb 2019 00:59:30 +0100, I wrote:
> From c7713be32fc5eace2b1e9c20447da849d23f6076 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Gerg=C3=B6=20Barany?= 
> Date: Wed, 23 Jan 2019 22:11:11 -0800
> Subject: [PATCH 6/9] Adjust parallelism of loops in gang-single parts of
>  OpenACC kernels regions

>  transform_kernels_loop_clauses (gimple *omp_for,

> +  struct walk_stmt_info wi;
> +  memset (&wi, 0, sizeof (wi));
> +  tree *num_clauses[GOMP_DIM_MAX]
> += { [GOMP_DIM_GANG] = &loop_gang_clause,
> +[GOMP_DIM_WORKER] = &loop_worker_clause,
> +[GOMP_DIM_VECTOR] = &loop_vector_clause };
> +  wi.info = num_clauses;
> +  gimple *body = gimple_omp_body (omp_for);
> +  walk_gimple_seq (body, adjust_nested_loop_clauses, NULL, &wi);

It makes sense to me, but not to GCC 4.6 ;-) -- pushed to
openacc-gcc-8-branch the attached commit
5885db6f8466e13ddfab046bae3149a992a30926 'Adjust parallelism of loops in
gang-single parts of OpenACC kernels regions: "struct
adjust_nested_loop_clauses_wi_info"'.
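
The shape of that change can be sketched outside GCC as follows. This is an illustration under stated assumptions, not the committed code: `clause_ptrs_sketch` and `fill_clauses_sketch` are hypothetical names, and plain `int` stands in for `tree`. The point is that a named struct passed through the untyped `info` pointer avoids the C99-style designated array initializers, which are not standard C++ and which older g++ versions reject.

```cpp
#include <cassert>

// Hypothetical miniature of the data handed to the walker callback: three
// out-pointers collected in a named struct instead of an array indexed by
// GOMP_DIM_* and built with designated initializers.
struct clause_ptrs_sketch {
  int *gang_ptr;
  int *worker_ptr;
  int *vector_ptr;
};

// Hypothetical callback: receives the struct through an untyped pointer,
// as a statement-walker callback would, and writes through each field.
void fill_clauses_sketch(void *info)
{
  clause_ptrs_sketch *p = static_cast<clause_ptrs_sketch *>(info);
  *p->gang_ptr = 1;
  *p->worker_ptr = 2;
  *p->vector_ptr = 3;
}
```

The caller fills the three fields by plain assignment before the walk, which any C++ compiler accepts, and the field names document which slot is which without relying on enum values as array indices.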


Grüße
 Thomas


>From 5885db6f8466e13ddfab046bae3149a992a30926 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 1 Feb 2019 18:12:05 +0100
Subject: [PATCH] Adjust parallelism of loops in gang-single parts of OpenACC
 kernels regions: "struct adjust_nested_loop_clauses_wi_info"

The current code apparently is too freaky at least for GCC 4.6:

[...]/gcc/omp-oacc-kernels.c: In function 'tree_node* transform_kernels_loop_clauses(gimple*, tree, tree, tree, tree)':
[...]/gcc/omp-oacc-kernels.c:584:10: error: expected identifier before numeric constant
[...]/gcc/omp-oacc-kernels.c: In lambda function:
[...]/gcc/omp-oacc-kernels.c:584:25: error: expected '{' before '=' token
[...]/gcc/omp-oacc-kernels.c: In function 'tree_node* transform_kernels_loop_clauses(gimple*, tree, tree, tree, tree)':
[...]/gcc/omp-oacc-kernels.c:584:25: warning: lambda expressions only available with -std=c++0x or -std=gnu++0x [enabled by default]
[...]/gcc/omp-oacc-kernels.c:584:28: error: no match for 'operator=' in '{} = & loop_gang_clause'
[...]

	gcc/
	* omp-oacc-kernels.c (struct adjust_nested_loop_clauses_wi_info): New.
	(adjust_nested_loop_clauses, transform_kernels_loop_clauses): Use it.
---
 gcc/ChangeLog.openacc  |  5 +
 gcc/omp-oacc-kernels.c | 29 +
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 433653b2b38..a3472637729 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-02-01  Thomas Schwinge  
+
+	* omp-oacc-kernels.c (struct adjust_nested_loop_clauses_wi_info): New.
+	(adjust_nested_loop_clauses, transform_kernels_loop_clauses): Use it.
+
 2019-01-31  Thomas Schwinge  
 
 	* doc/invoke.texi (-fopenacc-kernels): Update.
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index a8860c98e11..d1db4924b1c 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -409,14 +409,19 @@ add_parent_or_loop_num_clause (tree parent_clause, tree loop_clause,
nested loops.  It adds an auto clause unless there is already an
independent/seq/auto clause or a gang/worker/vector annotation.  */
 
+struct adjust_nested_loop_clauses_wi_info
+{
+  tree *loop_gang_clause_ptr;
+  tree *loop_worker_clause_ptr;
+  tree *loop_vector_clause_ptr;
+};
+
 static tree
 adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *,
 struct walk_stmt_info *wi)
 {
-  tree **clauses = (tree **) wi->info;
-  tree *gang_num_clause = clauses[GOMP_DIM_GANG];
-  tree *worker_num_clause = clauses[GOMP_DIM_WORKER];
-  tree *vector_length_clause = clauses[GOMP_DIM_VECTOR];
+  struct adjust_nested_loop_clauses_wi_info *wi_info
+= (struct adjust_nested_loop_clauses_wi_info *) wi->info;
   gimple *stmt = gsi_stmt (*gsi_p);
 
   if (gimple_code (stmt) == GIMPLE_OMP_FOR)
@@ -430,13 +435,13 @@ adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *,
   switch (OMP_CLAUSE_CODE (loop_clause))
 {
   case OMP_CLAUSE_GANG:
-outer_clause_ptr = gang_num_clause;
+outer_clause_ptr = wi_info->loop_gang_clause_ptr;
 break;
   case OMP_CLAUSE_WORKER:
-outer_clause_ptr = worker_num_clause;
+outer_clause_ptr = wi_info->loop_worker_clause_ptr;
 break;
   case OMP_CLAUSE_VECTOR:
-outer_clause_ptr = vector_length_clause;
+outer_clause_ptr = wi_info->loop_vector_clause_ptr;
 break;
   case OMP_CLAUSE_INDEPENDENT:
   case OMP_CLAUSE_SEQ:
@@ -580,11 +585,11 @@ transform_kernels_loop_clauses (gimple *omp_for,
  Turn these into worker/vector annotations on the parallel region.  */
   struct walk_stmt_info wi;
   memset (&wi, 0, sizeof (wi));
-  tree *num_clauses[GOMP_DIM_MAX]
-= { [GOMP_DIM_GANG] = &loop_gang_clause,
-[GOMP_DIM_WORKER] 

Re: [Patch, fortran] PR88393 - [7/8/9 Regression] [OOP] Segfault with type-bound assignment

2019-02-01 Thread Steve Kargl
On Fri, Feb 01, 2019 at 06:15:21PM +, Paul Richard Thomas wrote:
> I will commit this patch as 'obvious' tomorrow.
> 
> Cheers
> 
> Paul
> 
> 2019-02-01  Paul Thomas  
> 
> PR fortran/88393
> * trans-expr.c (gfc_conv_procedure_call): For derived entities,
> passed in parentheses to class formals, invert the order of
> copying allocatable components to taking the _data of
> the class expression.
> 
> 2019-02-01  Paul Thomas  
> 
> PR fortran/88393
> * gfortran.dg/alloc_comp_assign_16.f03 : New test.

Paul,

Does this patch also fix PR57710?

-- 
Steve


Re: Move -Wmaybe-uninitialized to -Wextra

2019-02-01 Thread Jeff Law
On 2/1/19 7:01 AM, Marek Polacek wrote:
> On Fri, Feb 01, 2019 at 07:19:25AM -0600, Segher Boessenkool wrote:
>> Hi Marc,
>>
>> On Fri, Feb 01, 2019 at 12:32:45PM +0100, Marc Glisse wrote:
>>> -Wmaybe-uninitialized generates false positives, we can tweak the compiler 
>>> to reduce them, but there will always be some, that's in the nature of 
>>> this warning.
>>
>> That is true for *every* warning; if not, it should be an error, not a
>> warning.
>>
>>> My opinion is that -Wmaybe-uninitialized would serve its purpose better as 
>>> part of -Wextra.
>>
>> +1
> 
> +1 from me too.
I disagree strongly.  If we move it to Wextra it's going to see a lot
less usage in real world codebases and potentially lead to the
re-introduction of a class of bugs that we've largely helped stomp out.

It's also the case that maybe uninitialized vs is uninitialized is
really just a function of CFG shape.  Give me any "maybe uninitialized"
case and I can turn it into a "is uninitialized" with simple block
duplication of the forms done by jump threading, path isolation,
superblock formation, etc.


> 
>>> People tend to use -Wall with -Werror (either explicitly 
>>> or implicitly by modifying the code until all warnings are gone). What I 
>>> see currently in projects where I participate is that 
>>> -Wmaybe-uninitialized is making things worse. People don't investigate 
>>> deeply the cause of the warning, they just go for whatever "quick-fix" 
>>> makes the compiler shut up. Quite often, this involves extra code that is 
>>> less readable and performs worse, while it didn't even "fix" what caused 
>>> the warning, it just randomly ended up with code where the compiler 
>>> doesn't warn (possibly because the function got bigger, which changed 
>>> inlining decisions...).
>>
>> Yes, using -Werror is usually a terrible idea.
Generally agreed in released versions of any code.  -Werror *may* be
appropriate in development versions depending on the project's policies,
procedures and quality of codebase.


Jeff


[PATCH] [8/9 Regression] i386: Add pass_remove_partial_avx_dependency

2019-02-01 Thread H.J. Lu
On Mon, Jan 28, 2019 at 9:08 AM H.J. Lu  wrote:
>
> On Tue, Jan 22, 2019 at 5:28 AM H.J. Lu  wrote:
> >
> > On Tue, Jan 22, 2019 at 4:08 AM Richard Biener
> >  wrote:
> > >
> > > On Mon, Jan 21, 2019 at 10:27 PM H.J. Lu  wrote:
> > > >
> > > > On Mon, Jan 21, 2019 at 10:54 AM Jeff Law  wrote:
> > > > >
> > > > > On 1/7/19 6:55 AM, H.J. Lu wrote:
> > > > > > On Sun, Dec 30, 2018 at 8:40 AM H.J. Lu  wrote:
> > > > > >> On Wed, Nov 28, 2018 at 12:17 PM Jeff Law  wrote:
> > > > > >>> On 11/28/18 12:48 PM, H.J. Lu wrote:
> > > > >  On Mon, Nov 5, 2018 at 7:29 AM Jan Hubicka  
> > > > >  wrote:
> > > > > >> On 11/5/18 7:21 AM, Jan Hubicka wrote:
> > > > >  Did you mean "the nearest common dominator"?
> > > > > >>> If the nearest common dominator appears in the loop while all 
> > > > > >>> uses are
> > > > > >>> out of loops, this will result in suboptimal xor placement.
> > > > > >>> In this case you want to split edges out of the loop.
> > > > > >>>
> > > > > >>> In general this is what the LCM framework will do for you if 
> > > > > >>> the problem
> > > > > > >>> is modelled in a similar way as in mode_switching.  At entry 
> > > > > >>> function mode is
> > > > > >>> "no zero register needed" and all conversions need mode "zero 
> > > > > >>> register
> > > > > >>> needed".  Mode switching should then do the correct placement 
> > > > > >>> decisions
> > > > > >>> (reaching minimal number of executions of xor).
> > > > > >>>
> > > > > > >>> Jeff, what is your opinion on the approach taken by the 
> > > > > > >>> patch?
> > > > > > >>> It seems like a special case of more general issue, but I do 
> > > > > > >>> not see
> > > > > > >>> very elegant way to solve it at least in the GCC 9 horizon, 
> > > > > > >>> so if
> > > > > > >>> the placement is correct we can probably go either with new 
> > > > > > >>> pass or
> > > > > > >>> making this part of mode switching (which is anyway run by 
> > > > > >>> x86 backend)
> > > > > >> So I haven't followed this discussion at all, but did touch on 
> > > > > >> this
> > > > > >> issue with some patch a month or two ago with a target patch 
> > > > > >> that was
> > > > > >> trying to avoid the partial stalls.
> > > > > >>
> > > > > >> My assumption is that we're trying to find one or more places 
> > > > > >> to
> > > > > >> initialize the upper half of an avx register so as to avoid 
> > > > > >> partial
> > > > > >> register stall at existing sites that set the upper half.
> > > > > >>
> > > > > >> This sounds like a classic PRE/LCM style problem (of which mode
> > > > > >> switching is just another variant).   A common-dominator 
> > > > > >> approach is
> > > > > > >> closer to a classic GCSE and is going to result in more 
> > > > > >> initializations
> > > > > >> at sub-optimal points than a PRE/LCM style.
> > > > > > yes, it is usual code placement problem. It is special case 
> > > > > > because the
> > > > > > zero register is not modified by the conversion (just we need 
> > > > > > to have
> > > > > > zero somewhere).  So basically we do not have kills to the zero 
> > > > > > except
> > > > > > for entry block.
> > > > > >
> > > > > >  Do you have a testcase to show that the nearest common dominator
> > > > > >  in the loop, while all uses are out of loops, leads to suboptimal 
> > > > >  xor
> > > > >  placement?
> > > > > >>> I don't have a testcase, but it's all but certain nearest common
> > > > > >>> dominator is going to be a suboptimal placement.  That's going to 
> > > > > >>> create
> > > > > >>> paths where you're going to emit the xor when it's not used.
> > > > > >>>
> > > > > >>> The whole point of the LCM algorithms is they are optimal in 
> > > > > >>> terms of
> > > > > >>> expression evaluations.
> > > > > >> We tried LCM and it didn't work well for this case.  LCM places a 
> > > > > >> single
> > > > > >> VXOR close to the location where it is needed, which can be inside 
> > > > > >> a
> > > > > >> loop.  There is nothing wrong with the LCM algorithms.   But this 
> > > > > >> doesn't
> > > > > >> solve
> > > > > >>
> > > > > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87007
> > > > > >>
> > > > > >> where VXOR is executed multiple times inside of a function, 
> > > > > >> instead of
> > > > > >> just once.   We are investigating to generate a single VXOR at 
> > > > > >> entry of the
> > > > > >> nearest dominator for basic blocks with SF/DF conversions, which 
> > > > > >> is in
> > > > > > >> the fake loop that contains the whole function:
> > > > > >>
> > > > > >>   bb = nearest_common_dominator_for_set (CDI_DOMINATORS,
> > > > > >>  convert_bbs);
> > > > > >>   while (bb->loop_father->latch
> > > > > >>  != EXIT_BLOCK_PTR_FOR_FN (cfun))
> > > > > >>

Re: [Patch, fortran] PR88393 - [7/8/9 Regression] [OOP] Segfault with type-bound assignment

2019-02-01 Thread Steve Kargl
On Fri, Feb 01, 2019 at 06:15:21PM +, Paul Richard Thomas wrote:
> 2019-02-01  Paul Thomas  
> 
> PR fortran/88393
> * trans-expr.c (gfc_conv_procedure_call): For derived entities,
> passed in parentheses to class formals, invert the order of
> copying allocatable components to taking taking the _data of
> the class expression.

taking taking

> Index: gcc/fortran/trans-expr.c
> ===
> *** gcc/fortran/trans-expr.c  (revision 268231)
> --- gcc/fortran/trans-expr.c  (working copy)
> *** gfc_conv_procedure_call (gfc_se * se, gf
> *** 6042,6047 
> --- 6042,6057 
> break;
>   }
>   
> +   if (e->ts.type == BT_DERIVED && fsym && fsym->ts.type == BT_CLASS)
> + {
> +   /* The derived type is passed to gfc_deallocate_alloc_comp.
> +  Therefore, class actuals can handled correctly but derived

s/can handled/can be handled/

-- 
Steve


Re: [PATCH] Add simplification rule tanh (x) * cosh (x) -> sinh (x)

2019-02-01 Thread Jeff Law
On 1/30/19 7:10 AM, Bárbara de Castro Fernandes wrote:
> This patch simplifies the function tanh (x) * cosh (x) -> sinh (x).
> This rule is derived from the relationship between hyperbolic
> functions.
> 
> I ran the tests and gfortran.dg/pr79966.f90 failed, but this failure
> is unrelated to the patch (see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88711 for more
> information). My architecture is x86_64.
> 
> gcc/ChangeLog:
> 2019-01-30  Bárbara Fernandes 
> 
> * match.pd (tanh (x) * cosh (x)): New simplification rule.
> 
> gcc/testsuite/ChangeLog:
> 2019-01-30  Bárbara Fernandes  
> 
> * tanhtimescosh.c: New test.
> 
Just a note.  The trunk is only open for regression bugfixes right now
as we prepare for the spring release.  This patch has been queued for
analysis after the gcc-9 release.
jeff
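
For reference, the identity behind the proposed rule follows directly
from the definition of tanh.  This is a worked step over the reals only;
the quoted message does not say under which floating-point flags the
rule would be enabled.

```latex
\tanh(x)\,\cosh(x)
  = \frac{\sinh(x)}{\cosh(x)}\,\cosh(x)
  = \sinh(x),
\qquad \text{since } \cosh(x) \ge 1 > 0 \text{ for all real } x .
```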


[Patch, fortran] PR88393 - [7/8/9 Regression] [OOP] Segfault with type-bound assignment

2019-02-01 Thread Paul Richard Thomas
I will commit this patch as 'obvious' tomorrow.

Cheers

Paul

2019-02-01  Paul Thomas  

PR fortran/88393
* trans-expr.c (gfc_conv_procedure_call): For derived entities,
passed in parentheses to class formals, invert the order of
copying allocatable components to taking taking the _data of
the class expression.

2019-02-01  Paul Thomas  

PR fortran/88393
* gfortran.dg/alloc_comp_assign_16.f03 : New test.
Index: gcc/fortran/trans-expr.c
===
*** gcc/fortran/trans-expr.c	(revision 268231)
--- gcc/fortran/trans-expr.c	(working copy)
*** gfc_conv_procedure_call (gfc_se * se, gf
*** 6042,6047 
--- 6042,6057 
  	  break;
  	}
  
+ 	  if (e->ts.type == BT_DERIVED && fsym && fsym->ts.type == BT_CLASS)
+ 	{
+ 	  /* The derived type is passed to gfc_deallocate_alloc_comp.
+ 		 Therefore, class actuals can handled correctly but derived
+ 		 types passed to class formals need the _data component.  */
+ 	  tmp = gfc_class_data_get (tmp);
+ 	  if (!CLASS_DATA (fsym)->attr.dimension)
+ 		tmp = build_fold_indirect_ref_loc (input_location, tmp);
+ 	}
+ 
  	  if (e->expr_type == EXPR_OP
  		&& e->value.op.op == INTRINSIC_PARENTHESES
  		&& e->value.op.op1->expr_type == EXPR_VARIABLE)
*** gfc_conv_procedure_call (gfc_se * se, gf
*** 6053,6068 
  	  gfc_add_expr_to_block (&se->post, local_tmp);
  	}
  
- 	  if (e->ts.type == BT_DERIVED && fsym && fsym->ts.type == BT_CLASS)
- 	{
- 	  /* The derived type is passed to gfc_deallocate_alloc_comp.
- 		 Therefore, class actuals can handled correctly but derived
- 		 types passed to class formals need the _data component.  */
- 	  tmp = gfc_class_data_get (tmp);
- 	  if (!CLASS_DATA (fsym)->attr.dimension)
- 		tmp = build_fold_indirect_ref_loc (input_location, tmp);
- 	}
- 
  	  if (!finalized && !e->must_finalize)
  	{
  	  if ((e->ts.type == BT_CLASS
--- 6063,6068 
Index: gcc/testsuite/gfortran.dg/alloc_comp_assign_16.f03
===
*** gcc/testsuite/gfortran.dg/alloc_comp_assign_16.f03	(nonexistent)
--- gcc/testsuite/gfortran.dg/alloc_comp_assign_16.f03	(working copy)
***
*** 0 
--- 1,37 
+ ! { dg-do run }
+ !
+ ! Test the fix for PR88393 in which a segfault occurred as indicated.
+ !
+ ! Contributed by Janus Weil  
+ !
+ module m
+implicit none
+type :: t
+   character(len=:), allocatable :: cs
+contains
+   procedure :: ass
+   generic :: assignment(=) => ass
+end type
+ contains
+subroutine ass(a, b)
+   class(t), intent(inout) :: a
+   class(t), intent(in):: b
+   a%cs = b%cs
+   print *, "ass"
+end subroutine
+ end module
+ 
+ program p
+use m
+implicit none
+type :: t2
+   type(t) :: c
+end type
+type(t2), dimension(1:2) :: arr
+arr(1)%c%cs = "abcd"
+arr(2)%c = arr(1)%c  ! Segfault here.
+print *, "done", arr(2)%c%cs, arr(2)%c%cs
+ ! Make sure with valgrind that there are no memory leaks.
+deallocate (arr(1)%c%cs)
+deallocate (arr(2)%c%cs)
+ end


Re: [PATCH] [og8] Allow optional arguments to be used in the use_device OpenACC clause

2019-02-01 Thread Kwok Cheung Yeung

There is an error in the logic here:

--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -8938,18 +8938,51 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)

  tkind = GOMP_MAP_FIRSTPRIVATE_INT;
type = TREE_TYPE (ovar);
if (TREE_CODE (type) == ARRAY_TYPE)
- var = build_fold_addr_expr (var);
+ {
+   var = build_fold_addr_expr (var);
+   gimplify_assign (x, var, &ilist);
+ }
else
...
+   if (omp_is_reference (ovar) || optional_arg_p)
  {
...
+   gimplify_assign (x, var, &ilist);
  }
+
+   if (optional_arg_p)
+ gimple_seq_add_stmt (&ilist,
+  gimple_build_label (opt_arg_label));
  }
-   gimplify_assign (x, var, &ilist);

The gimplify_assign was hoisted into the two branches of the preceding 
if-else because I wanted to skip the assign if there was a non-present 
optional argument. However, in the else case, the assign only happens if 
omp_is_reference or optional_arg_p is true, when it should be unconditional.
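
The shape of the mistake can be pictured with a toy C model.  It is not
the omp-low.c code: struct toy_var and both functions are hypothetical,
and the model only counts whether the assignment is emitted.  It does
not model the run-time skip for a non-present optional argument.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the properties the lowering code tests.  */
struct toy_var {
    bool is_array;
    bool is_reference;
    bool optional_arg;
};

/* As posted: the assignment was hoisted into both arms, but in the
   else arm it ended up inside the reference/optional condition.  */
static int assigns_as_posted(struct toy_var v)
{
    int assigns = 0;
    if (v.is_array) {
        assigns++;
    } else if (v.is_reference || v.optional_arg) {
        assigns++;
    }
    return assigns;
}

/* As intended: the else arm always emits the assignment.  */
static int assigns_as_intended(struct toy_var v)
{
    int assigns = 0;
    if (v.is_array) {
        assigns++;
    } else {
        /* Reference/optional handling would adjust the value here,
           without deciding whether the assignment happens.  */
        assigns++;
    }
    return assigns;
}
```

For a plain variable (not an array, not a reference, not optional) the
first function emits no assignment at all, while the second emits one.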


I can confirm that fixing this allows at least 
libgomp.oacc-fortran/host_data-1.f90 to pass again. I will post the 
patch when I have double-checked the other cases.


Thanks

Kwok

On 01/02/2019 4:24 pm, Thomas Schwinge wrote:

Hi Kwok!

On Thu, 31 Jan 2019 18:30:35 +, Kwok Cheung Yeung  
wrote:

This patch allows for the use of Fortran optional arguments in the
use_device clause of a host_data directive.

I will push this into openacc-gcc-8-branch later today.


Per my testing, it unfortunately also introduces a number of regressions:

 [-PASS:-]{+FAIL:+} gfortran.dg/goacc/uninit-use-device-clause.f95   -O   
(test for warnings, line 7)
 PASS: gfortran.dg/goacc/uninit-use-device-clause.f95   -O  (test for 
excess errors)

(This probably means that the clause argument is no longer
"evaluated/used".)

 PASS: libgomp.c/target-14.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/target-14.c execution test

 libgomp: cuCtxSynchronize error: an illegal memory access was encountered

 PASS: libgomp.c/target-18.c (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c/target-18.c execution test

 libgomp: use_device_ptr pointer wasn't mapped

 PASS: libgomp.c++/target-9.C (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.c++/target-9.C execution test

 libgomp: use_device_ptr pointer wasn't mapped

 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O0  (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O0  execution test

 libgomp: use_device_ptr pointer wasn't mapped

 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O2  (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O2  execution test

No error message.

 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O0  (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O0  execution test
 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O2  (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O2  execution test
 PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O2  execution 
test

 host_data-6.exe: [...]/libgomp.oacc-c-c++-common/host_data-6.c:15: foo: 
Assertion `p == (float *) host_p' failed.
 
Same for C++, for "libgomp.oacc-c-c++-common/host_data-5.c", and

"libgomp.oacc-c-c++-common/host_data-6.c".

 PASS: libgomp.oacc-fortran/host_data-1.f90 -DACC_DEVICE_TYPE_host="" 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
 [-PASS:-]{+FAIL:+} libgomp.oacc-fortran/host_data-1.f90 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O0  execution 
test
 PASS: libgomp.oacc-fortran/host_data-1.f90 -DACC_DEVICE_TYPE_host="" 

Re: C++ PATCH for c++/88325 - ICE with invalid out-of-line template member definition

2019-02-01 Thread Jason Merrill

On 2/1/19 11:26 AM, Marek Polacek wrote:

On Wed, Jan 30, 2019 at 01:39:11PM -0500, Jason Merrill wrote:

On 1/28/19 9:46 PM, Marek Polacek wrote:

This patch fixes an ICE-on-invalid (because out-of-line constructors can't have
template arguments and also because function templates can't be partially
specialized) in C++2a: when we're parsing

template template A::A ()

in the attached test we end up parsing "A::A" as a type name, and first we
try a class-name.  First we process "A::" as the nested name specifier and 
then
we parse "A".  In this test that results in a BASELINK.  Because in this 
context
we're supposed to treat it as a typename ([temp.res]/6), we call 
make_typename_type,
but that crashes.


Hmm.  If we've done an actual lookup (that gave us a BASELINK), we aren't
dealing with a member of an unknown specialization anymore, so we should
just use the result of the lookup rather than speculate about what the name
might mean.  Why are we still trying to treat it as a typename?


Good point.  It's because cp_parser_class_name has:

  /* Any name names a type if we're following the `typename' keyword
     in a qualified name where the enclosing scope is type-dependent.  */
  typename_p = (typename_keyword_p && scope && TYPE_P (scope)
                && dependent_type_p (scope));

and scope is in this case "A" which is dependent.  Then there's this
"identifier, but not template-id" case which only performs name lookup when
typename_p is false.  But we're parsing "A" so we call
cp_parser_template_id.  It sees CPP_TEMPLATE_ID -- an already parsed
template-id, so it just returns it, which is a BASELINK.  So even messing
with tag_type wouldn't help.

Does this answer your question?


Mostly, though I'm still curious how we got the previous parse of the 
template-id and yet continued to try to parse it as a class-name.


The patch is OK.

Jason


Re: C++ PATCH for c++/88325 - ICE with invalid out-of-line template member definition

2019-02-01 Thread Marek Polacek
On Wed, Jan 30, 2019 at 01:39:11PM -0500, Jason Merrill wrote:
> On 1/28/19 9:46 PM, Marek Polacek wrote:
> > This patch fixes an ICE-on-invalid (because out-of-line constructors can't 
> > have
> > template arguments and also because function templates can't be partially
> > specialized) in C++2a: when we're parsing
> > 
> >template template A::A ()
> > 
> > in the attached test we end up parsing "A::A" as a type name, and 
> > first we
> > try a class-name.  First we process "A::" as the nested name specifier 
> > and then
> > we parse "A".  In this test that results in a BASELINK.  Because in this 
> > context
> > we're supposed to treat it as a typename ([temp.res]/6), we call 
> > make_typename_type,
> > but that crashes.
> 
> Hmm.  If we've done an actual lookup (that gave us a BASELINK), we aren't
> dealing with a member of an unknown specialization anymore, so we should
> just use the result of the lookup rather than speculate about what the name
> might mean.  Why are we still trying to treat it as a typename?

Good point.  It's because cp_parser_class_name has:

  /* Any name names a type if we're following the `typename' keyword
     in a qualified name where the enclosing scope is type-dependent.  */
  typename_p = (typename_keyword_p && scope && TYPE_P (scope)
                && dependent_type_p (scope));

and scope is in this case "A" which is dependent.  Then there's this
"identifier, but not template-id" case which only performs name lookup when
typename_p is false.  But we're parsing "A" so we call
cp_parser_template_id.  It sees CPP_TEMPLATE_ID -- an already parsed
template-id, so it just returns it, which is a BASELINK.  So even messing
with tag_type wouldn't help.

Does this answer your question?

Marek


Re: [PATCH] [og8] Allow optional arguments to be used in the use_device OpenACC clause

2019-02-01 Thread Thomas Schwinge
Hi Kwok!

On Thu, 31 Jan 2019 18:30:35 +, Kwok Cheung Yeung  
wrote:
> This patch allows for the use of Fortran optional arguments in the 
> use_device clause of a host_data directive.
> 
> I will push this into openacc-gcc-8-branch later today.

Per my testing, it unfortunately also introduces a number of regressions:

[-PASS:-]{+FAIL:+} gfortran.dg/goacc/uninit-use-device-clause.f95   -O   
(test for warnings, line 7)
PASS: gfortran.dg/goacc/uninit-use-device-clause.f95   -O  (test for excess 
errors)

(This probably means that the clause argument is no longer
"evaluated/used".)

PASS: libgomp.c/target-14.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/target-14.c execution test

libgomp: cuCtxSynchronize error: an illegal memory access was encountered

PASS: libgomp.c/target-18.c (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c/target-18.c execution test

libgomp: use_device_ptr pointer wasn't mapped

PASS: libgomp.c++/target-9.C (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.c++/target-9.C execution test

libgomp: use_device_ptr pointer wasn't mapped

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O0  (test for excess errors)
[-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O0  execution test

libgomp: use_device_ptr pointer wasn't mapped

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O2  (test for excess errors)
[-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-5.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O2  execution test

No error message.

PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O0  (test for excess errors)
[-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O0  execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O2  (test for excess errors)
[-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_nvidia="nvptx-none" -DACC_MEM_SHARED=0 -foffload=nvptx-none  
-O2  execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for 
excess errors)
[-PASS:-]{+FAIL:+} 
libgomp.oacc-c/../libgomp.oacc-c-c++-common/host_data-6.c 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O2  execution 
test

host_data-6.exe: [...]/libgomp.oacc-c-c++-common/host_data-6.c:15: foo: 
Assertion `p == (float *) host_p' failed.

Same for C++, for "libgomp.oacc-c-c++-common/host_data-5.c", and
"libgomp.oacc-c-c++-common/host_data-6.c".

PASS: libgomp.oacc-fortran/host_data-1.f90 -DACC_DEVICE_TYPE_host="" 
-DACC_MEM_SHARED=1 -foffload=disable  -O0  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/host_data-1.f90 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O0  execution 
test
PASS: libgomp.oacc-fortran/host_data-1.f90 -DACC_DEVICE_TYPE_host="" 
-DACC_MEM_SHARED=1 -foffload=disable  -O1  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/host_data-1.f90 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O1  execution 
test
PASS: libgomp.oacc-fortran/host_data-1.f90 -DACC_DEVICE_TYPE_host="" 
-DACC_MEM_SHARED=1 -foffload=disable  -O2  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/host_data-1.f90 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O2  execution 
test
PASS: libgomp.oacc-fortran/host_data-1.f90 -DACC_DEVICE_TYPE_host="" 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/host_data-1.f90 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test
PASS: libgomp.oacc-fortran/host_data-1.f90 -DACC_DEVICE_TYPE_host="" 
-DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.oacc-fortran/host_data-1.f90 
-DACC_DEVICE_TYPE_host="" -DACC_MEM_SHARED=1 -foffload=disable  -O3 -g  
execution test
PASS: libgomp.oacc-fortran/host_data-1.f90 -DACC_DEVICE_TYPE_host="" 
-DACC_MEM_SHARED=1 -foffload=disable  -Os  (test for excess errors)
[-PASS:-]{+FAIL:+} 

Re: [Patch, fortran] PR88980 - [9 regression] segfault on allocatable string member assignment

2019-02-01 Thread Steve Kargl
On Fri, Feb 01, 2019 at 01:10:21PM +, Paul Richard Thomas wrote:
> This patch is rather simpler than it looks.
> 
> The segfault was occurring because r264724 changed the array reference
> for cases like these to use pointer arithmetic to obtain the element.
> Unfortunately, in this case, the span field of the descriptor was not
> being set during the allocation of the component items.
> 
> The ChangeLog adequately explains the fix and results in the span
> field being set unconditionally.
> 
> Bootstrapped and regtested on FC28/x86_64 - OK for trunk?
> 


OK. Thanks for the patch.

-- 
Steve


Re: Move -Wmaybe-uninitialized to -Wextra

2019-02-01 Thread Segher Boessenkool
On Fri, Feb 01, 2019 at 09:01:53AM -0500, Marek Polacek wrote:
> On Fri, Feb 01, 2019 at 07:19:25AM -0600, Segher Boessenkool wrote:
> > > Some people tend to consider that 
> > > if a warning is not part of -Wall, it might as well not exist. Obviously 
> > > I 
> > > disagree with that.
> > 
> > If it is not part of -Wall and not of -W, and not special purpose, then it
> > might as well not exist.
> 
> There are warnings that *do* make sense, but have issues e.g. with macro
> expansion, so will be outside -Wall/-Wextra unless that's fixed.  E.g.
> -Wlogical-op, -Wduplicated-conds, or a warning I posted to some PR
> called -Wsame-arguments I think, etc.

Yes, we agree on that.  I'm just saying such general-purpose warnings are
not used much until they are part of -Wall or -W.


Segher


Re: Go patch committed: Support alias to pointer type as method receiver

2019-02-01 Thread Ian Lance Taylor
On Wed, Jan 30, 2019 at 7:57 AM Ian Lance Taylor  wrote:
>
> This patch by Ben Shi to the Go frontend fixes it to support an
> alias to a pointer type as a method receiver.  This fixes
> https://golang.org/issue/28252.  Bootstrapped and ran Go testsuite on
> x86_64-pc-linux-gnu.  Committed to mainline.

This patch, also by Ben Shi, extends the same idea to a method
declaration.  This fixes https://golang.org/issue/27994.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.
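
The receiver handling can be pictured with a toy C model: look through
any chain of aliases, strip one pointer level, then look through aliases
again.  The toy_type structure and function names are hypothetical and
are not the gofrontend API.

```c
#include <stddef.h>

enum toy_kind { TOY_NAMED, TOY_ALIAS, TOY_POINTER };

/* For TOY_ALIAS, target is the aliased type; for TOY_POINTER, the
   pointed-to type; for TOY_NAMED it is unused.  */
struct toy_type {
    enum toy_kind kind;
    const struct toy_type *target;
};

/* Follow alias links until a non-alias type is reached.  */
static const struct toy_type *toy_skip_aliases(const struct toy_type *t)
{
    while (t->kind == TOY_ALIAS)
        t = t->target;
    return t;
}

/* Find the type a method receiver ultimately refers to.  */
static const struct toy_type *toy_receiver_base(const struct toy_type *t)
{
    t = toy_skip_aliases(t);        /* alias to a pointer type */
    if (t->kind == TOY_POINTER)
        t = t->target;              /* look through the pointer */
    return toy_skip_aliases(t);     /* pointer to an alias */
}
```

Given a named type T, an alias A of T, a pointer P to A and an alias PA
of P, toy_receiver_base applied to PA yields T.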

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 268397)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-2206f40fc1e0e1e2ba3eacb7388dd26b72729bde
+cbcc538adc518da5788d1101e16f106a1514
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/gogo.cc
===
--- gcc/go/gofrontend/gogo.cc   (revision 268397)
+++ gcc/go/gofrontend/gogo.cc   (working copy)
@@ -2096,12 +2096,20 @@ Gogo::declare_function(const std::string
   // declarations.
   Type* rtype = type->receiver()->type();
 
+  while (rtype->named_type() != NULL
+ && rtype->named_type()->is_alias())
+   rtype = rtype->named_type()->real_type()->forwarded();
+
   // We want to look through the pointer created by the
   // parser, without getting an error if the type is not yet
   // defined.
   if (rtype->classification() == Type::TYPE_POINTER)
rtype = rtype->points_to();
 
+  while (rtype->named_type() != NULL
+ && rtype->named_type()->is_alias())
+   rtype = rtype->named_type()->real_type()->forwarded();
+
   if (rtype->is_error_type())
return NULL;
   else if (rtype->named_type() != NULL)


Re: [PATCH] Another -fdebug-type-section fix

2019-02-01 Thread Jason Merrill

On 2/1/19 7:29 AM, Richard Biener wrote:


This fixes another case where we end up with duplicate stub DIEs
and thus build_abbrev_table trying to adjust a DW_AT_signature
ref to a local DIE.  This happens when we have two unworthy DIEs
from different type units rooted in stub DIEs themselves.  Here
copy_ancestor_tree records the stubs as original DIE that gets
copied failing to see we already copied the thing.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

OK?


OK, thanks.

Jason



Re: [PATCH] print correct array sizes in errors (PR 87996)

2019-02-01 Thread Jason Merrill

On 1/31/19 5:49 PM, Martin Sebor wrote:

On 1/30/19 3:15 PM, Jason Merrill wrote:

On 1/29/19 7:15 PM, Martin Sebor wrote:

+  /* Try to convert the original SIZE to a ssizetype.  */
+  if (orig_size != error_mark_node
+  && !TYPE_UNSIGNED (TREE_TYPE (orig_size)))
+    {
+  if (TREE_CODE (size) == INTEGER_CST
+  && tree_int_cst_sign_bit (size))
+    diagsize = build_converted_constant_expr (ssizetype, size,
+  tsubst_flags_t ());
+  else if (size == error_mark_node
+   && TREE_CODE (orig_size) == INTEGER_CST
+   && tree_int_cst_sign_bit (orig_size))
+    diagsize = build_converted_constant_expr (ssizetype, orig_size,
+  tsubst_flags_t ());
+    }


Using build_converted_constant_expr here looks odd; that's a 
language-level notion, and we're dealing with compiler internals. 
fold_convert seems more appropriate.


Done.




+  if (TREE_CONSTANT (size))
+    {
+  if (!diagsize && TREE_CODE (size) == INTEGER_CST)
+    diagsize = size;
+    }
+  else
 size = osize;
 }

@@ -9732,15 +9758,12 @@ compute_array_index_type_loc (location_t 
name_loc,

   if (TREE_CODE (size) == INTEGER_CST)
 {
   /* An array must have a positive number of elements.  */
-  if (!valid_constant_size_p (size))
+  if (!diagsize)
+    diagsize = size;


It seems like the earlier hunk here is unnecessary; if size is an 
INTEGER_CST, it will be unchanged, and so be used for diagsize in the 
latter hunk without any changes to the earlier location.  Actually, 
why not do all of the diagsize logic down here?  There doesn't seem to 
be anything above that relies on information we will have lost at this 
point.


Sure.  Done in the attached revision.



-  tree osize = size;
+  /* The original numeric size as seen in the source code after
+ any substitution and before conversion to size_t.  */
+  tree origsize = NULL_TREE;


Can't you use osize?  instantiate_non_dependent_expr doesn't do any 
actual substitution, it shouldn't change the type of the expression or 
affect whether it's an INTEGER_CST.


Jason


Re: [libphobos] Work around lack of dlpi_tls_modid before Solaris 11.5

2019-02-01 Thread Rainer Orth
Hi Johannes,

> I'd recommend not using such a workaround:
>
> This means getTLSRange will always return an empty range, but the GC uses
> this to scan TLS memory. This means a GC collection can delete objects
> which are still pointed to from TLS. This leads to hard to debug errors,
> and if I remember correctly, the testsuite will not catch these errors. I
> think we have code in phobos though which references objects only from TLS
> and this will break after a GC collection.

I fully admit to have been wary about such an approach myself, but was
astonished how far it seemed to get me.

I suspect the two testsuite regressions (compared to a build with
dlpi_tls_modid present) I mentioned are exactly of the kind you mention:

e.g. the gdc.test/runnable/testaa.d failures are like this

core.exception.RangeError@gdc.test/runnable/testaa.d(410): Range violation

/vol/gcc/src/hg/trunk/local/libphobos/libdruntime/core/exception.d:496 
onRangeError [0x80f0d2c]
/vol/gcc/src/hg/trunk/local/libphobos/libdruntime/core/exception.d:672 
_d_arraybounds [0x80f132f]
??:? void testaa.test15() [0x80d7ae4]
??:? _Dmain [0x80dd3fc]
before test 1

and gdc.test/runnable/xtest55.d fails like so:

core.exception.AssertError@gdc.test/runnable/xtest55.d(19): Assertion failure

/vol/gcc/src/hg/trunk/local/libphobos/libdruntime/core/exception.d:441 
onAssertError [0x7fff55dd3b56]
??:? _Dmain [0x418959]
7FFFBEB07FFFBEB0

It's a small set admittedly (but there are the libphobos failures as
well), but a compiler that leaves its users with a feeling of
unreliability is probably worse than none at all.

Just for the record, I saw the same regressions on Linux/x86_64 when I
accidentally didn't define _GNU_SOURCE in the configure test for
dlpi_tls_modid, producing an equivalent configuration.  So this isn't
Solaris-specific in any way.

> I'm not sure what's a good solution here. EmuTLS has got the same problem,
> but I'll post a RFC patch next weekend which would allow to scan the emuTLS
> memory. If we somehow make that work, I'd recommend to use emuTLS instead
> of native TLS if there's no way to scan the native TLS*.

The problem here is that we'd probably need to build gcc twice in this
case: once with native TLS for all non-D languages, and a second time
with --disable-tls for D.  AFAICS TARGET_HAVE_TLS needs to be a
compile-time constant and cannot depend on the language being compiled
for.

> FYI Martin Nowak(in CC) wrote most of the original code for rt.sections so
> he's the expert we'd have to ask.
>
> * Maybe we could implement a more runtime-independent approach to scan
> native TLS?
> 1) We somehow need to bracket the TLS section (it would have to be
>per-shared-library though, we basically need thread-local, hidden
>__start_tls and __stop_tls symbols).
> 2) We need to emit a hidden _dso_scan_tls function into each D library.
>A pointer to  this DSO specific function then has to be passed in
>CompilerDSOData to _d_dso_registry.
> 3) tlsRange has to forward to the correct, DSO specific _dso_scan_tls.
>
> 2 and 3 are easy but I'm not sure if we can do 1.

Right: I suspect 1 would be way more difficult than the
__start_minfo/__stop_minfo stuff.

I failed to mention another approach in my patch submission, though I
alluded to it in PR d/88150: the ldc fork of libdruntime

https://github.com/ldc-developers/druntime

has in src/rt/sections_ldc.d an implementation of getTLSRange for
Illumos/Solaris without dlpi_tls_modid.  I managed to adapt it to
sections_elf_shared.d, but apart from the fact that it uses undocumented
libc internals (which probably don't change between Solaris 10 and 11.4,
so that shouldn't be too bad) that implementation only gets you the TLS
range for the main executable, so isn't very useful AFAICS.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, GCC] PR target/86487: fix the way 'uses_hard_regs_p' handles paradoxical subregs

2019-02-01 Thread Andre Vieira (lists)



On 11/01/2019 22:54, Jeff Law wrote:

On 1/8/19 8:21 AM, Andre Vieira (lists) wrote:



On 07/01/2019 22:50, Jeff Law wrote:

On 1/7/19 7:42 AM, Andre Vieira (lists) wrote:

Hi,

This patch fixes the way 'uses_hard_regs_p' handles paradoxical subregs.
   The function is supposed to detect whether a register access of 'x'
overlaps with 'set'.  For SUBREGs it should check whether any register of
the full multi-register set overlaps with 'set'.  The former behavior used to
grab the widest mode of the inner/outer registers of a SUBREG and the
inner register, and check all registers from the inner-register onwards
for the given width.  For normal SUBREGS this gives you the full
register, for paradoxical SUBREGS however it may give you the wrong set
of registers if the index is not the first of the multi-register set.

The original error reported in PR target/86487 can no longer be
reproduced with the given test; this is due to an unrelated code-gen
change.  Regardless, I believe this should still be fixed, as it is simply
wrong behavior by uses_hard_regs_p which may be triggered by a different
test-case or by future changes to the compiler.  Also it is useful to
point out that this isn't actually a 'target' issue as this code could
potentially hit any other target using paradoxical SUBREGS.  Should I
change the Bugzilla ticket to reflect this is actually a target agnostic
issue in RTL?

There is a gotcha here, I don't know what would happen if you hit the
cases of get_hard_regno where it would return -1, quoting the comment
above that function "If X is not a register or a subreg of a register,
return -1." though I think if we are hitting this then things must have
gone wrong before?

Bootstrapped on aarch64, arm and x86, no regressions.

Is this OK for trunk?


gcc/ChangeLog:
2019-01-07 Andre Vieira  


  PR target/86487
  * lra-constraints.c (uses_hard_regs_p): Fix handling of
paradoxical SUBREGS.

But doesn't wider_subreg_mode give us the wider of the two modes here
and we use that wider mode when we call overlaps_hard_reg_set_p which
should ultimately check all the registers in the paradoxical.

I must be missing something here?!?

jeff



Hi Jeff,

It does give us the wider of the two modes, but we also then grab the
"subreg" of the paradoxical subreg.  If you look at the first example
case of the bugzilla ticket, for an older gcc (say gcc-8) and the
options provided (using big-endian), it will generate the following subreg:
(subreg:DI (reg:SI 122) 0)

This paradoxical subreg represents a register pair r0-r1, where because
of big-endian and subreg index 0, r1 is the value we care about and r0
the one we say "it can be whatever" by using this paradoxical subreg.

When 'uses_hard_regs_p' sees this as a subreg, it sets 'mode' to the
wider, i.e. DImode, but it also sets 'x' to the subreg i.e. 'reg:SI
122', for which get_hard_regno correctly returns 'r1'.  But if you now
pass 'overlaps_hard_reg_set_p' DImode and 'r1', it will check whether
'set' contains any of 'r1-r2', and not 'r0-r1'.
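
To make the off-by-one register range concrete, here is a minimal toy model of the check described above. It is only a sketch: `HardRegSet`, `overlaps`, `old_check` and `fixed_check` are hypothetical names standing in for the real LRA code, and the register numbers are the r0/r1/r2 of the example.

```cpp
#include <bitset>
#include <cassert>

// Toy model: a hard register set is a bitset indexed by register number.
using HardRegSet = std::bitset<16>;

// Hypothetical stand-in for overlaps_hard_reg_set_p: does any of the
// `nregs` consecutive registers starting at `first` appear in `set`?
bool overlaps(const HardRegSet& set, unsigned first, unsigned nregs) {
    for (unsigned r = first; r < first + nregs; ++r)
        if (set.test(r))
            return true;
    return false;
}

// Old behaviour as described above: start at the inner register (r1)
// but use the width of the outer DImode value, so r1-r2 is checked.
bool old_check(const HardRegSet& set) { return overlaps(set, 1, 2); }

// Intended behaviour: check the pair the paradoxical subreg really
// occupies, which starts at r0, so r0-r1 is checked.
bool fixed_check(const HardRegSet& set) { return overlaps(set, 0, 2); }
```

With a set containing only r0, the old check misses the overlap; with a set containing only r2, it reports an overlap that does not exist.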

To reproduce this again I now applied this patch to GCC 8 and found an
issue with it. 'REG_P (x)' returns false if x is a 'SUBREG'. So I will
need to change the later check to also include 'SUBREG_P (x)', I guess I
was testing with a too new version of gcc that didn't lead to the bogus
register allocation...

Which really encourages me to add some sort of testcase, but I'd very
much like to come up with a less flaky one, we basically need to force
the generation of a paradoxical subreg 'x', where 'get_hard_regno
(SUBREG_REG (x)) != get_hard_regno (x)'.  This will cause
'uses_hard_regs_p' to give you a wrong answer.

BTW, you might look at 87305 which is another problem with big endian
pseudos and paradoxical subregs that Vlad just fixed.

jeff



Hi Jeff,

Thank you, I had a look but I don't think it is the same issue and even 
though I can no longer reproduce this particular issue on trunk I do 
believe the latent fault is still there and I would like to fix it.



As I mentioned in the earlier emails I forgot to make sure 'SUBREG_P 
(x)' was also handled where we previously only accepted 'REG_P (x)' as 
we would always strip away the parent subreg. I have also added the 
testcase that triggered this issue on arm in previous GCC versions. This 
does not trigger it on trunk any more due to other codegen changes.



Bootstrapped on aarch64, arm and x86, no regressions.

Is this OK for trunk?


gcc/ChangeLog:
2019-02-01 Andre Vieira  


PR target/86487
* lra-constraints.c (uses_hard_regs_p): Fix handling of 
paradoxical SUBREGS.


gcc/testsuite/ChangeLog:
2019-02-01 Andre Vieira  


PR target/86487
* gcc.target/arm/pr86487.c: New.
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index c061093ed699620afe2dfda60d58066d6967523a..736b084acc552b75ff4d369b6584bc9ab422e21b 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1761,11 +1761,21 @@ uses_hard_regs_p (rtx x, 

Re: Move -Wmaybe-uninitialized to -Wextra

2019-02-01 Thread Marek Polacek
On Fri, Feb 01, 2019 at 07:19:25AM -0600, Segher Boessenkool wrote:
> Hi Marc,
> 
> On Fri, Feb 01, 2019 at 12:32:45PM +0100, Marc Glisse wrote:
> > -Wmaybe-uninitialized generates false positives, we can tweak the compiler 
> > to reduce them, but there will always be some, that's in the nature of 
> > this warning.
> 
> That is true for *every* warning; if not, it should be an error, not a
> warning.
> 
> > My opinion is that -Wmaybe-uninitialized would serve its purpose better as 
> > part of -Wextra.
> 
> +1

+1 from me too.

> > People tend to use -Wall with -Werror (either explicitly 
> > or implicitly by modifying the code until all warnings are gone). What I 
> > see currently in projects where I participate is that 
> > -Wmaybe-uninitialized is making things worse. People don't investigate 
> > deeply the cause of the warning, they just go for whatever "quick-fix" 
> > makes the compiler shut up. Quite often, this involves extra code that is 
> > less readable and performs worse, while it didn't even "fix" what caused 
> > the warning, it just randomly ended up with code where the compiler 
> > doesn't warn (possibly because the function got bigger, which changed 
> > inlining decisions...).
> 
> Yes, using -Werror is usually a terrible idea.
> 
> > Note that similar arguments may apply to some other warnings that somehow 
> > made their way into -Wall when they shouldn't have, but for now I am only 
> > proposing to move -Wmaybe-uninitialized. Some people tend to consider that 
> > if a warning is not part of -Wall, it might as well not exist. Obviously I 
> > disagree with that.
> 
> If it is not part of -Wall and not of -W, and not special purpose, then it
> might as well not exist.

There are warnings that *do* make sense, but have issues e.g. with macro
expansion, so will be outside -Wall/-Wextra unless that's fixed.  E.g.
-Wlogical-op, -Wduplicated-conds, or a warning I posted to some PR
called -Wsame-arguments I think, etc.

Marek


Re: Provide __start_minfo/__stop_minfo for linkers that don't (PR d/87864)

2019-02-01 Thread Rainer Orth
Hi Johannes,

> Looks good to me, although ultimately Iain has to decide of course.

fine, thanks.  However, if we cannot find an acceptable solution for the
lack of dlpi_tls_modid, the patch isn't of much use:

* Solaris 11.5 will have both dlpi_tls_modid and section bracketing.

* Solaris 11.4 has ld section bracketing, but might or might not get
  dlpi_tls_modid by a patch.

* Solaris 11.3 lacks ld section bracketing, won't get dlpi_tls_modid,
  and also needs this patch to link with -lsocket -lnsl separately:

[build] Fix libgphobos linking on Solaris 11
https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02248.html

> One nitpick: wouldn't you have to somehow mark __start/__stop _minfo as
> hidden? This is important in the case where you have multiple shared
> libraries and each library should have its own __start/__stop symbols to
> bracket the library's minfo section.

Here's what I see in libgdruntime.so with various linkers/approaches for
minfo bracketing, using readelf -s libgdruntime.so|grep __start:

* gld 2.31 on x86_64-pc-linux-gnu (native gld section bracketing):

     2: 0012b770     0 NOTYPE  LOCAL  DEFAULT   26 __start_minfo
  1763: 0012b770     0 NOTYPE  LOCAL  DEFAULT   26 __start_minfo

* Solaris ld on i386-pc-solaris2.11 (Solaris 11.4 where ld supports
  section bracketing natively):

   136: 00147f40   836 OBJECT  LOCAL  HIDDEN    28 __start_minfo

* Solaris ld on i386-pc-solaris2.11 (Solaris 11.3 with drtstuff patch):

   150: 0014731c     0 OBJECT  LOCAL  HIDDEN    40 __start_minfo

I guess it's enough that the symbols are local, irrespective of visibility.
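
For readers unfamiliar with section bracketing, the following is a rough sketch of the mechanism in C++ rather than D. It assumes an ELF target and a linker that provides `__start_<section>`/`__stop_<section>` symbols natively (GNU ld does); the `ModuleInfo` struct, the `REGISTER_MODULE` macro and the helper functions are hypothetical and much simpler than what libdruntime uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record type; the real ModuleInfo is far richer.
struct ModuleInfo { int id; };

// Place each record in a section named "minfo".  `used` keeps the
// otherwise unreferenced objects from being discarded by the compiler.
#define REGISTER_MODULE(var, n) \
    static const ModuleInfo var __attribute__((section("minfo"), used)) = {n}

REGISTER_MODULE(mod_a, 1);
REGISTER_MODULE(mod_b, 2);
REGISTER_MODULE(mod_c, 3);

// Bracketing symbols supplied by the linker.  Declaring them hidden
// keeps each shared object bound to its own pair.
extern "C" {
extern const ModuleInfo __start_minfo[] __attribute__((visibility("hidden")));
extern const ModuleInfo __stop_minfo[] __attribute__((visibility("hidden")));
}

// Walk every record between the two bracketing symbols.
std::size_t count_modules() {
    std::size_t n = 0;
    for (const ModuleInfo* p = __start_minfo; p != __stop_minfo; ++p)
        ++n;
    return n;
}

int sum_module_ids() {
    int sum = 0;
    for (const ModuleInfo* p = __start_minfo; p != __stop_minfo; ++p)
        sum += p->id;
    return sum;
}
```

On linkers without native support, the same two symbols have to come from somewhere else, which is what the startfile/endfile approach in this patch provides.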

> Also 'if !DRUNTIME_OS_MINFO_BRACKETING' might be the wrong condition/name
> in Makefile.am if we add back support for per-module constructor-based
> module registration (instead of calling _d_dso_registry once per shared
> library passing all ModuleInfos in that library, call another hook
> _d_register_module once per module and pass only one ModuleInfo). But we
> can fix that if we ever need to add back support for that second
> approach. (If this startfile/endfile approach is portable enough, we may be
> able to avoid that).

It seemed natural to me to call the automake conditional similarly to
the variable set by the DRUNTIME_OS_MINFO_BRACKETING macro.  But I'm of
course open to other names, although it's probably best to only think
about renaming once section bracketing gets other uses.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Fix inconsistent operator delete usages

2019-02-01 Thread Jonathan Wakely

On 31/01/19 22:29 +0100, François Dumont wrote:
    I was writing a test which needed to override the std::nothrow 
versions of the operator new and delete to control 
get_temporary_buffer behavior and noticed that it is inconsistent with 
return_temporary_buffer in terms of new/delete operators.


    Grepping for other std::nothrow usages I found some others and 
especially one in libsupc++.


    I don't know how serious it is considering the Standard. As long 
as you stick to the libstdc++ operators it is fine. Only users 
overriding those operators will notice.


    Tested under Linux x86_64 normal mode with some failures but none 
related to this patch I think but of course you better check on your 
side.


    * libsupc++/atexit_thread.cc (run(void*)): Call std::nothrow delete
    operator.
    * include/bits/stl_tempbuf.h (return_temporary_buffer): Likewise.
    * include/profile/impl/profiler_trace.h
    (__trace_base<>::~__trace_base()): Likewise.
    (__trace_base<>::__add_object(__stack_t)): Likewise.
    (__trace_base<>::__retire_object(__stack_t)): Likewise.

Let me know if it is a go.


Nope.


François




diff --git a/libstdc++-v3/include/bits/stl_tempbuf.h 
b/libstdc++-v3/include/bits/stl_tempbuf.h
index b6ad9ee6a46..e614a77bc4f 100644
--- a/libstdc++-v3/include/bits/stl_tempbuf.h
+++ b/libstdc++-v3/include/bits/stl_tempbuf.h
@@ -110,7 +110,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  template<typename _Tp>
inline void
return_temporary_buffer(_Tp* __p)
-{ ::operator delete(__p); }
+{ ::operator delete(__p, std::nothrow); }


This change is harmless, but unnecessary.

The standard requires that the nothrow versions of operator new must
obtain memory from the same source as the normal version of operator
new (even if the user has replaced one or both versions of operator
new). That means you can always use the normal version of operator
delete instead of the nothrow one.

If your tests failed because of this, then your replacement versions
of operator new and operator delete were wrong.

See [new.delete.single] p7.
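
As an illustration of that requirement, here is a minimal sketch of a consistent set of replacement operators. It assumes malloc/free as the single memory source; the counter `live_blocks` and the helper `nothrow_new_plain_delete_balances` are made-up names for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static std::size_t live_blocks = 0;  // blocks currently allocated

// Replacement throwing form: the single source of memory.
void* operator new(std::size_t n) {
    if (void* p = std::malloc(n ? n : 1)) {
        ++live_blocks;
        return p;
    }
    throw std::bad_alloc();
}

// Replacement nothrow form: draws from the same source.
void* operator new(std::size_t n, const std::nothrow_t&) noexcept {
    void* p = std::malloc(n ? n : 1);
    if (p)
        ++live_blocks;
    return p;
}

// Replacement plain delete: releases memory from either form above.
void operator delete(void* p) noexcept {
    if (p) {
        --live_blocks;
        std::free(p);
    }
}

// Allocate with the nothrow form and release with the plain form.
bool nothrow_new_plain_delete_balances() {
    std::size_t before = live_blocks;
    void* p = ::operator new(64, std::nothrow);
    if (!p)
        return false;
    bool counted = (live_blocks == before + 1);
    ::operator delete(p);
    return counted && live_blocks == before;
}
```

Because both forms of operator new share one source, the plain operator delete is always a valid way to release the block.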



  /**
diff --git a/libstdc++-v3/include/profile/impl/profiler_trace.h 
b/libstdc++-v3/include/profile/impl/profiler_trace.h
index 261f3b3cc72..36822ef77ac 100644
--- a/libstdc++-v3/include/profile/impl/profiler_trace.h
+++ b/libstdc++-v3/include/profile/impl/profiler_trace.h
@@ -200,7 +200,7 @@ namespace __gnu_profile
  {
for (typename __stack_table_t::iterator __it
   = __stack_table.begin(); __it != __stack_table.end(); ++__it)
- delete __it->first;
+ ::operator delete(__it->first, std::nothrow);


This introduces a bug. The previous code called the destructor before
deallocating the memory, after your change it would not run the
destructor. This is definitely not OK.

__it->first is a pointer to a std::vector, so its destructor
definitely must be run.

You're making the code *less* consistent, by changing it from
new/delete to new/operator delete.

Also Profile Mode was deprecated in gcc-7 and I think instead of
maintaining the code we should just remove it early in stage 1 for
gcc-10.


diff --git a/libstdc++-v3/libsupc++/atexit_thread.cc 
b/libstdc++-v3/libsupc++/atexit_thread.cc
index 25334250dab..d47d1654b28 100644
--- a/libstdc++-v3/libsupc++/atexit_thread.cc
+++ b/libstdc++-v3/libsupc++/atexit_thread.cc
@@ -79,7 +79,7 @@ namespace {
  FreeLibrary (e->dll);
#endif
e = e->next;
-   delete (old_e);
+   ::operator delete(old_e, std::nothrow);


This type has a trivial destructor, so this doesn't actually cause a
change in behaviour, but it's still making it inconsistent.



  }
  }





Re: Move -Wmaybe-uninitialized to -Wextra

2019-02-01 Thread Segher Boessenkool
Hi Marc,

On Fri, Feb 01, 2019 at 12:32:45PM +0100, Marc Glisse wrote:
> -Wmaybe-uninitialized generates false positives, we can tweak the compiler 
> to reduce them, but there will always be some, that's in the nature of 
> this warning.

That is true for *every* warning; if not, it should be an error, not a
warning.

> My opinion is that -Wmaybe-uninitialized would serve its purpose better as 
> part of -Wextra.

+1

> People tend to use -Wall with -Werror (either explicitly 
> or implicitly by modifying the code until all warnings are gone). What I 
> see currently in projects where I participate is that 
> -Wmaybe-uninitialized is making things worse. People don't investigate 
> deeply the cause of the warning, they just go for whatever "quick-fix" 
> makes the compiler shut up. Quite often, this involves extra code that is 
> less readable and performs worse, while it didn't even "fix" what caused 
> the warning, it just randomly ended up with code where the compiler 
> doesn't warn (possibly because the function got bigger, which changed 
> inlining decisions...).

Yes, using -Werror is usually a terrible idea.

> Note that similar arguments may apply to some other warnings that somehow 
> made their way into -Wall when they shouldn't have, but for now I am only 
> proposing to move -Wmaybe-uninitialized. Some people tend to consider that 
> if a warning is not part of -Wall, it might as well not exist. Obviously I 
> disagree with that.

If it is not part of -Wall and not of -W, and not special purpose, then it
might as well not exist.


Segher


[Patch, fortran] PR88980 - [9 regression] segfault on allocatable string member assignment

2019-02-01 Thread Paul Richard Thomas
This patch is rather simpler than it looks.

The segfault was occurring because r264724 changed the array reference
for cases like these to use pointer arithmetic to obtain the element.
Unfortunately, in this case, the span field of the descriptor was not
being set during the allocation of the component items.

The ChangeLog adequately explains the fix, which results in the span
field being set unconditionally.
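
To show why an unset span matters once element access uses pointer arithmetic, here is a deliberately simplified model. It is not gfortran's descriptor layout: `Descriptor`, `Item` and `element` are hypothetical, and the model only captures the idea that an element's address is the base plus the index times the span.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, heavily simplified descriptor: just a base address and
// the distance in bytes between consecutive elements.
struct Descriptor {
    char* base;
    std::size_t span;
};

// Element access by pointer arithmetic: correct only if span is set.
char* element(const Descriptor& d, std::size_t i) {
    return d.base + i * d.span;
}

// Stand-in for a derived type with a string member.
struct Item {
    int id;
    char name[8];
};
```

With span set to sizeof(Item), `element(d, 2)` lands on the third item's member; with span left at zero, every index resolves to the first element, and a garbage value sends the access to an arbitrary address.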

Bootstrapped and regtested on FC28/x86_64 - OK for trunk?

Paul

2019-02-01  Paul Thomas  

PR fortran/88980
* trans-array.c (gfc_array_init_size): Add element_size to the
arguments.
(gfc_array_allocate): Remove the recalculation of the size of
the element and use element_size from the call to the above.
Unconditionally set the span field of the descriptor.

2019-02-01  Paul Thomas  

PR fortran/88980
* gfortran.dg/realloc_on_assign_32.f90 : New test.
Index: gcc/fortran/trans-array.c
===
*** gcc/fortran/trans-array.c	(revision 268231)
--- gcc/fortran/trans-array.c	(working copy)
*** gfc_array_init_size (tree descriptor, in
*** 5370,5383 
  		 gfc_expr ** lower, gfc_expr ** upper, stmtblock_t * pblock,
  		 stmtblock_t * descriptor_block, tree * overflow,
  		 tree expr3_elem_size, tree *nelems, gfc_expr *expr3,
! 		 tree expr3_desc, bool e3_has_nodescriptor, gfc_expr *expr)
  {
tree type;
tree tmp;
tree size;
tree offset;
tree stride;
-   tree element_size;
tree or_expr;
tree thencase;
tree elsecase;
--- 5370,5383 
  		 gfc_expr ** lower, gfc_expr ** upper, stmtblock_t * pblock,
  		 stmtblock_t * descriptor_block, tree * overflow,
  		 tree expr3_elem_size, tree *nelems, gfc_expr *expr3,
! 		 tree expr3_desc, bool e3_has_nodescriptor, gfc_expr *expr,
! 		 tree *element_size)
  {
tree type;
tree tmp;
tree size;
tree offset;
tree stride;
tree or_expr;
tree thencase;
tree elsecase;
*** gfc_array_init_size (tree descriptor, in
*** 5628,5637 
  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
  
/* Convert to size_t.  */
!   element_size = fold_convert (size_type_node, tmp);
  
if (rank == 0)
! return element_size;
  
*nelems = gfc_evaluate_now (stride, pblock);
stride = fold_convert (size_type_node, stride);
--- 5628,5637 
  tmp = TYPE_SIZE_UNIT (gfc_get_element_type (type));
  
/* Convert to size_t.  */
!   *element_size = fold_convert (size_type_node, tmp);
  
if (rank == 0)
! return *element_size;
  
*nelems = gfc_evaluate_now (stride, pblock);
stride = fold_convert (size_type_node, stride);
*** gfc_array_init_size (tree descriptor, in
*** 5641,5654 
   dividing.  */
tmp = fold_build2_loc (input_location, TRUNC_DIV_EXPR,
  			 size_type_node,
! 			 TYPE_MAX_VALUE (size_type_node), element_size);
cond = gfc_unlikely (fold_build2_loc (input_location, LT_EXPR,
  	logical_type_node, tmp, stride),
  		   PRED_FORTRAN_OVERFLOW);
tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node, cond,
  			 integer_one_node, integer_zero_node);
cond = gfc_unlikely (fold_build2_loc (input_location, EQ_EXPR,
! 	logical_type_node, element_size,
  	build_int_cst (size_type_node, 0)),
  		   PRED_FORTRAN_SIZE_ZERO);
tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node, cond,
--- 5641,5654 
   dividing.  */
tmp = fold_build2_loc (input_location, TRUNC_DIV_EXPR,
  			 size_type_node,
! 			 TYPE_MAX_VALUE (size_type_node), *element_size);
cond = gfc_unlikely (fold_build2_loc (input_location, LT_EXPR,
  	logical_type_node, tmp, stride),
  		   PRED_FORTRAN_OVERFLOW);
tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node, cond,
  			 integer_one_node, integer_zero_node);
cond = gfc_unlikely (fold_build2_loc (input_location, EQ_EXPR,
! 	logical_type_node, *element_size,
  	build_int_cst (size_type_node, 0)),
  		   PRED_FORTRAN_SIZE_ZERO);
tmp = fold_build3_loc (input_location, COND_EXPR, integer_type_node, cond,
*** gfc_array_init_size (tree descriptor, in
*** 5658,5664 
*overflow = gfc_evaluate_now (tmp, pblock);
  
size = fold_build2_loc (input_location, MULT_EXPR, size_type_node,
! 			  stride, element_size);
  
if (poffset != NULL)
  {
--- 5658,5664 
*overflow = gfc_evaluate_now (tmp, pblock);
  
size = fold_build2_loc (input_location, MULT_EXPR, size_type_node,
! 			  stride, *element_size);
  
if (poffset != NULL)
  {
*** gfc_array_allocate (gfc_se * se, gfc_exp
*** 5736,5741 
--- 5736,5742 
tree var_overflow = NULL_TREE;
tree cond;
tree set_descriptor;
+   tree element_size = NULL_TREE;
stmtblock_t set_descriptor_block;
stmtblock_t elseblock;
gfc_expr **lower;
*** gfc_array_allocate 

[PATCH] Another -fdebug-type-section fix

2019-02-01 Thread Richard Biener


This fixes another case where we end up with duplicate stub DIEs
and thus build_abbrev_table trying to adjust a DW_AT_signature
ref to a local DIE.  This happens when we have two unworthy DIEs
from different type units rooted in stub DIEs themselves.  Here
copy_ancestor_tree records the stub as the original DIE that gets
copied, failing to see that we already copied the thing.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2019-02-01  Richard Biener  

PR debug/87295
* dwarf2out.c (copy_ancestor_tree): Register non-stubs as
orig.

* g++.dg/debug/dwarf2/pr87295.C: New testcase.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 268446)
+++ gcc/dwarf2out.c (working copy)
@@ -8169,6 +8169,11 @@ copy_ancestor_tree (dw_die_ref unit, dw_
   decl_table_entry **slot = NULL;
   struct decl_table_entry *entry = NULL;
 
+  /* If DIE refers to a stub unfold that so we get the appropriate
+ DIE registered as orig in decl_table.  */
+  if (dw_die_ref c = get_AT_ref (die, DW_AT_signature))
+die = c;
+
   if (decl_table)
 {
   /* Check if the entry has already been copied to UNIT.  */
Index: gcc/testsuite/g++.dg/debug/dwarf2/pr87295.C
===
--- gcc/testsuite/g++.dg/debug/dwarf2/pr87295.C (nonexistent)
+++ gcc/testsuite/g++.dg/debug/dwarf2/pr87295.C (working copy)
@@ -0,0 +1,22 @@
+// { dg-additional-options "-fdebug-types-section" }
+// { dg-require-effective-target c++11 }
+
+struct A {};
+namespace N {
+struct B {
+   using C = struct H {};
+   using D = A;
+};
+}
+struct E : N::B {
+typedef C C;
+};
+namespace N {
+struct F {
+   E::C d;
+   E::D h;
+};
+}
+struct G {
+N::F i;
+} j;


[PATCH] Fix PR88597

2019-02-01 Thread Richard Biener


The following fixes the compile-time explosion for the PR88597
testcase.  The fix isn't really "complete" but as usual I'd like
to see testcases for other cases.  I've queued a more complete
fix for GCC 10.  The issue is exponential work done by
SCEV instantiation which eventually hits "cached" but needs to
dive down the whole GENERIC tree it is fed.  So the new short-cut
in instantiate_scev_binary is the weak part of the fix.

The chrec_contains_undetermined fix is required to not blow up
there since CHRECs happily (and luckily!) exploit tree sharing
to limit their size but these simple recursive functions do
not expect to be fed a graph.  (there's some more functions
with the same issue not fixed with this patch)
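
The effect of tree sharing on a plain recursive walk can be reproduced outside GCC. The sketch below is an assumption-laden toy, not the patch itself: `Node`, `contains_naive`, `contains_visited` and `build_shared_chain` are invented names, and the graph is the worst case where both operands of every node point at the same child.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

struct Node {
    const Node* op0 = nullptr;
    const Node* op1 = nullptr;
    bool undetermined = false;
};

// Naive recursion: walks the shared graph as if it were a tree.
bool contains_naive(const Node* n, std::size_t& visits) {
    if (!n)
        return false;
    ++visits;
    if (n->undetermined)
        return true;
    return contains_naive(n->op0, visits) || contains_naive(n->op1, visits);
}

// Same walk with a visited set: each node is examined at most once.
bool contains_visited(const Node* n, std::unordered_set<const Node*>& seen,
                      std::size_t& visits) {
    if (!n || !seen.insert(n).second)
        return false;
    ++visits;
    if (n->undetermined)
        return true;
    return contains_visited(n->op0, seen, visits)
           || contains_visited(n->op1, seen, visits);
}

// Build `depth` nodes where each node uses the previous one as both
// operands; the last element of the result is the root.
std::vector<std::unique_ptr<Node>> build_shared_chain(int depth) {
    std::vector<std::unique_ptr<Node>> nodes;
    const Node* prev = nullptr;
    for (int i = 0; i < depth; ++i) {
        nodes.emplace_back(new Node());
        nodes.back()->op0 = prev;
        nodes.back()->op1 = prev;
        prev = nodes.back().get();
    }
    return nodes;
}
```

For a chain of depth d the naive walk makes 2^d - 1 visits while the visited-set walk makes d, which is the kind of gap the patch closes.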

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2019-02-01  Richard Biener  

PR middle-end/88597
* tree-scalar-evolution.c (analyze_scalar_evolution): Set up
the instantiate cache.
(instantiate_scev_binary): Elide second operand processing
if equal to the first.
* tree-chrec.c (chrec_contains_symbols): Add visited set.
(chrec_contains_undetermined): Likewise.
(tree_contains_chrecs): Likewise.

* gcc.dg/torture/pr88597.c: New testcase.

Index: gcc/tree-scalar-evolution.c
===
--- gcc/tree-scalar-evolution.c (revision 268446)
+++ gcc/tree-scalar-evolution.c (working copy)
@@ -380,6 +380,37 @@ find_var_scev_info (basic_block instanti
   return &res->chrec;
 }
 
+
+/* Hashtable helpers for a temporary hash-table used when
+   analyzing a scalar evolution, instantiating a CHREC or
+   resolving mixers.  */
+
+struct instantiate_cache_type
+{
+  htab_t map;
+  vec entries;
+
+  instantiate_cache_type () : map (NULL), entries (vNULL) {}
+  ~instantiate_cache_type ();
+  tree get (unsigned slot) { return entries[slot].chrec; }
+  void set (unsigned slot, tree chrec) { entries[slot].chrec = chrec; }
+};
+
+instantiate_cache_type::~instantiate_cache_type ()
+{
+  if (map != NULL)
+{
+  htab_delete (map);
+  entries.release ();
+}
+}
+
+/* Cache to avoid infinite recursion when instantiating an SSA name.
+   Live during the outermost analyze_scalar_evolution, instantiate_scev
+   or resolve_mixers call.  */
+static instantiate_cache_type *global_cache;
+
+
 /* Return true when CHREC contains symbolic names defined in
LOOP_NB.  */
 
@@ -2117,7 +2148,22 @@ analyze_scalar_evolution (struct loop *l
 
   res = get_scalar_evolution (block_before_loop (loop), var);
   if (res == chrec_not_analyzed_yet)
-res = analyze_scalar_evolution_1 (loop, var);
+{
+  /* We'll recurse into instantiate_scev, avoid tearing down the
+ instantiate cache repeatedly and keep it live from here.  */
+  bool destr = false;
+  if (!global_cache)
+   {
+ global_cache = new instantiate_cache_type;
+ destr = true;
+   }
+  res = analyze_scalar_evolution_1 (loop, var);
+  if (destr)
+   {
+ delete global_cache;
+ global_cache = NULL;
+   }
+}
 
   if (dump_file && (dump_flags & TDF_SCEV))
 fprintf (dump_file, ")\n");
@@ -2231,34 +2277,6 @@ analyze_scalar_evolution_in_loop (struct
 }
 
 
-/* Hashtable helpers for a temporary hash-table used when
-   instantiating a CHREC or resolving mixers.  For this use
-   instantiated_below is always the same.  */
-
-struct instantiate_cache_type
-{
-  htab_t map;
-  vec entries;
-
-  instantiate_cache_type () : map (NULL), entries (vNULL) {}
-  ~instantiate_cache_type ();
-  tree get (unsigned slot) { return entries[slot].chrec; }
-  void set (unsigned slot, tree chrec) { entries[slot].chrec = chrec; }
-};
-
-instantiate_cache_type::~instantiate_cache_type ()
-{
-  if (map != NULL)
-{
-  htab_delete (map);
-  entries.release ();
-}
-}
-
-/* Cache to avoid infinite recursion when instantiating an SSA name.
-   Live during the outermost instantiate_scev or resolve_mixers call.  */
-static instantiate_cache_type *global_cache;
-
 /* Computes a hash function for database element ELT.  */
 
 static inline hashval_t
@@ -2562,10 +2580,18 @@ instantiate_scev_binary (edge instantiat
   if (op0 == chrec_dont_know)
 return chrec_dont_know;
 
-  op1 = instantiate_scev_r (instantiate_below, evolution_loop, inner_loop,
-   c1, fold_conversions, size_expr);
-  if (op1 == chrec_dont_know)
-return chrec_dont_know;
+  /* While we eventually compute the same op1 if c0 == c1 the process
+ of doing this is expensive so the following short-cut prevents
+ exponential compile-time behavior.  */
+  if (c0 != c1)
+{
+  op1 = instantiate_scev_r (instantiate_below, evolution_loop, inner_loop,
+   c1, fold_conversions, size_expr);
+  if (op1 == chrec_dont_know)
+   return chrec_dont_know;
+}
+  else
+op1 = op0;
 
   if (c0 != op0
   || c1 != op1)
Index: 

Move -Wmaybe-uninitialized to -Wextra

2019-02-01 Thread Marc Glisse

Hello,

first, I expect this to be controversial, so feel free to complain.

The description of -Wall says "This enables all the warnings about
constructions that some users consider questionable, and that are easy
to avoid (or modify to prevent the warning), even in conjunction with
macros."

And the description of -Wmaybe-uninitialized "For an automatic variable,
if there exists a path from the function entry to a use of the variable
that is initialized, but there exist some other paths for which the
variable is not initialized, the compiler emits a warning if it cannot
prove the uninitialized paths are not executed at run time. These
warnings are made optional because GCC is not smart enough to see all
the reasons why the code might be correct in spite of appearing to have
an error."

-Wmaybe-uninitialized generates false positives, we can tweak the compiler 
to reduce them, but there will always be some, that's in the nature of 
this warning.


These false positives are not easy to avoid, as required to be part of 
-Wall. Avoiding them, when it is possible at all, requires not just a 
syntactic tweak, like adding parentheses, but a semantic change that can 
make the code worse. Initializing something that does not need it is extra 
code (increases code size and running time). It also prevents better tools 
from detecting true uninitialized uses, either static analyzers or runtime 
checkers (sanitizer, valgrind).
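
As a sketch of the trade-off, consider the pattern below. Whether a particular GCC version actually warns on it depends on optimization level and inlining, so this shows only the shape of the problem; `compute` and `compute_quickfix` are hypothetical names.

```cpp
#include <cassert>

// The variable is written and read under the same condition, so no
// path reads it uninitialized, but proving that requires correlating
// the two tests of `have`.
int compute(bool have, int input) {
    int value;  // deliberately not initialized
    if (have)
        value = input * 2;
    if (have)
        return value;  // only read when it was written
    return -1;
}

// The usual "quick fix": a dummy initializer.  It silences the warning
// but costs a store, and it would also hide a genuine missing
// assignment from valgrind or a sanitizer.
int compute_quickfix(bool have, int input) {
    int value = 0;
    if (have)
        value = input * 2;
    if (have)
        return value;
    return -1;
}
```

Both functions behave identically; the second one merely trades a possible false positive for less information at run time.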


This message concentrates on the negatives, but that doesn't mean I 
consider -Wmaybe-uninitialized as useless. It can find true uninitialized 
uses. And even some false positives can point at places where we can help 
the compiler generate better code (say with a default 
__builtin_unreachable case in a switch). I did in the past contribute 
patches to make it warn more often, and I might do so again in the future.
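
A minimal sketch of the switch case mentioned above, assuming GCC or Clang (both provide `__builtin_unreachable`); the `Mode` enum and `buffer_size` are made up for the example.

```cpp
#include <cassert>

enum class Mode { Small, Large };

int buffer_size(Mode m) {
    int size;  // assigned on every valid path
    switch (m) {
    case Mode::Small:
        size = 16;
        break;
    case Mode::Large:
        size = 256;
        break;
    default:
        // Tells the compiler this path cannot happen, so `size` is
        // always assigned before the return below.
        __builtin_unreachable();
    }
    return size;
}
```

The hint is a promise, not a check: passing a value outside the enumerators would be undefined behaviour.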


My opinion is that -Wmaybe-uninitialized would serve its purpose better as 
part of -Wextra. People tend to use -Wall with -Werror (either explicitly 
or implicitly by modifying the code until all warnings are gone). What I 
see currently in projects where I participate is that 
-Wmaybe-uninitialized is making things worse. People don't investigate 
deeply the cause of the warning, they just go for whatever "quick-fix" 
makes the compiler shut up. Quite often, this involves extra code that is 
less readable and performs worse, while it didn't even "fix" what caused 
the warning, it just randomly ended up with code where the compiler 
doesn't warn (possibly because the function got bigger, which changed 
inlining decisions...).


If the warning is enabled less widely by default, and only when people are
ready to investigate any warning thoroughly, the quick-fix mentality is
less likely to be present. People using -Wmaybe-uninitialized need to be
willing to ignore false positives, or disable them with pragmas.
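
For reference, disabling the warning locally with pragmas could look like the sketch below. The function `parse_digit` is hypothetical; the `#pragma GCC diagnostic` directives are the documented GCC mechanism, and this example does not claim that GCC warns on this exact code.

```cpp
#include <cassert>

int parse_digit(char c, bool& ok) {
    int digit;  // written only when c is a digit
    ok = (c >= '0' && c <= '9');
    if (ok)
        digit = c - '0';

// Suppress the warning only around the guarded read, and restore the
// previous diagnostic state right afterwards.
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
    int result = ok ? digit : -1;  // read is guarded by the same flag
#pragma GCC diagnostic pop
    return result;
}
```

Keeping the push/pop pair tight around the use leaves the warning active for the rest of the translation unit.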


Note that similar arguments may apply to some other warnings that somehow 
made their way into -Wall when they shouldn't have, but for now I am only 
proposing to move -Wmaybe-uninitialized. Some people tend to consider that 
if a warning is not part of -Wall, it might as well not exist. Obviously I 
disagree with that.


---

Now the actual patch. Surprisingly, the middle-end puts both 
Wuninitialized and Wmaybe-uninitialized in Wextra, it is the C-family of 
front-ends that puts them in Wall. It also makes Wuninitialized enable 
Wmaybe-uninitialized, which is backwards (though it made sense historically): 
Wuninitialized has far fewer false positives, and if we are willing to be 
warned about possibly uninitialized uses, we certainly also want warnings 
about uninitialized uses that are certain. So I am switching the enabling 
relation between those 2, and enabling only Wuninitialized at Wall.


If the patch gets in, this will of course require a mention in the release 
notes.


I changed a set of tests based on a mix of grep and seeing what failed 
make check. The exact list may not be optimal.


gcc/ChangeLog:

2019-02-01  Marc Glisse  

* common.opt (Wuninitialized): Enable with Wmaybe-uninitialized.
(Wmaybe-uninitialized): Enable with Wextra.
* doc/invoke.texi: Update implications between Wuninitialized,
Wmaybe-uninitialized, Wall and Wextra.

gcc/c-family/ChangeLog:

2019-02-01  Marc Glisse  

* c.opt (Wmaybe-uninitialized): Enable with Wextra.

gcc/testsuite/ChangeLog:

2019-02-01  Marc Glisse  

* c-c++-common/pr69543-1.c: Use -Wmaybe-uninitialized.
* c-c++-common/pr69543-2.c: Likewise.
* c-c++-common/pr69543-3.c: Likewise.
* c-c++-common/pr69543-4.c: Likewise.
* c-c++-common/uninit-17.c: Likewise.
* g++.dg/pr48484.C: Likewise.
* g++.dg/uninit-pred-1_b.C: Likewise.
* g++.dg/uninit-pred-2_b.C: Likewise.
* g++.dg/uninit-pred-3_b.C: Likewise.
* g++.dg/warn/Wuninitialized-5.C: Likewise.
* g++.dg/warn/Wuninitialized-6.C: Likewise.
* 

Re: [PATCH][wwwdocs][Arm] Mention the fixed configurations for Cortex-R7 and Cortex-R8

2019-02-01 Thread Gerald Pfeifer
On Fri, 1 Feb 2019, Andre Vieira (lists) wrote:
> This patch adds the documentation for the FPU configuration fixes for 
> Cortex-R7 and Cortex-R8 to changes.html for GCC 9. See 
> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02183.html

Looks good to me.  (And I'm happy to see all those improvements and
doc snippets around Arm. ;-)

Gerald


[PATCH][wwwdocs][Arm] Mention the fixed configurations for Cortex-R7 and Cortex-R8

2019-02-01 Thread Andre Vieira (lists)

Hi,

This patch adds the documentation for the FPU configuration fixes for 
Cortex-R7 and Cortex-R8 to changes.html for GCC 9.

See https://gcc.gnu.org/ml/gcc-patches/2018-11/msg02183.html

I have validated the html using the W3C validator.

Is it OK?

Cheers,
Andre
? .changes.html.swp
? patch
cvs diff: Diffing .
Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.35
diff -U 3 -r1.35 changes.html
--- changes.html	15 Jan 2019 13:17:49 -	1.35
+++ changes.html	23 Jan 2019 11:35:15 -
@@ -250,6 +250,10 @@
  (which have no known implementations) has been removed.
  Note that Armv5T, Armv5TE and Armv5TEJ architectures remain supported.
   
+  
+ Corrected FPU configurations for Cortex-R7 and Cortex-R8 when using their
+ respective -mcpu options.
+  
 
 
 


[PATCH fortran] PR 81344 - Can't disable -ffpe-trap (or not documented)

2019-02-01 Thread Dominique d'Humières
Hi!

I am planning to commit the following patch (with a suitable ChangeLog entry)

--- ../_clean/gcc/fortran/invoke.texi   2019-01-30 16:54:38.0 +0100
+++ gcc/fortran/invoke.texi 2019-02-01 11:52:01.0 +0100
@@ -1203,6 +1203,12 @@ The first three exceptions (@samp{invali
 has provisions for dealing with these exceptions, enabling traps for
 these three exceptions is probably a good idea.
 
+If the option is used more than once in the command line, the lists will
+be joined: '@code{ffpe-trap=}@var{list1} @code{ffpe-trap=}@var{list2}'
+is equivalent to @code{ffpe-trap=}@var{list1,list2}.
+
+Note that once enabled an exception cannot be disabled (no negative form).
+
 Many, if not most, floating point operations incur loss of precision
 due to rounding, and hence the @code{ffpe-trap=inexact} is likely to
 be uninteresting in practice.
@@ -1218,6 +1224,9 @@ of the following exceptions: @samp{inval
 @samp{underflow}, @samp{inexact} and @samp{denormal}. (See
 @option{-ffpe-trap} for a description of the exceptions.)
 
+If the option is used more than once in the command line, only the
+last one will be used.
+
 By default, a summary for all exceptions but @samp{inexact} is shown.
 
 @item -fno-backtrace

then to close the PR as WONTFIX. Is it OK?

TIA

Dominique
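
The behavior documented by the patch above (repeated -ffpe-trap= lists are
joined, while only the last -ffpe-summary= is used) can be modelled in a few
lines of C. This is a toy sketch, not gfortran's option-handling code: the bit
values, struct fpe_state, parse_fpe_list and apply_option are all invented for
illustration, and unknown exception names are simply ignored.

```c
#include <string.h>

/* Hypothetical bit values for this sketch; not gfortran's internal ones. */
enum {
    FPE_INVALID   = 1 << 0,
    FPE_ZERO      = 1 << 1,
    FPE_OVERFLOW  = 1 << 2,
    FPE_UNDERFLOW = 1 << 3,
    FPE_INEXACT   = 1 << 4,
    FPE_DENORMAL  = 1 << 5
};

/* Convert a comma-separated list such as "invalid,zero" to a bit mask.
   The order of 'names' matches the order of the enum above. */
int parse_fpe_list(const char *list)
{
    static const char *const names[] = {
        "invalid", "zero", "overflow", "underflow", "inexact", "denormal"
    };
    int mask = 0;

    while (*list) {
        size_t len = strcspn(list, ",");
        for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
            if (strlen(names[i]) == len && strncmp(list, names[i], len) == 0)
                mask |= 1 << i;
        list += len;
        if (*list == ',')
            list++;
    }
    return mask;
}

struct fpe_state {
    int trap;     /* accumulated over every -ffpe-trap= occurrence */
    int summary;  /* taken from the last -ffpe-summary= occurrence */
};

/* Apply one command-line argument to the state. */
void apply_option(struct fpe_state *st, const char *arg)
{
    if (strncmp(arg, "-ffpe-trap=", 11) == 0)
        st->trap |= parse_fpe_list(arg + 11);    /* lists are joined */
    else if (strncmp(arg, "-ffpe-summary=", 14) == 0)
        st->summary = parse_fpe_list(arg + 14);  /* last one wins */
}
```

In this model there is no way to clear a bit in 'trap' once it is set, which
mirrors the "no negative form" note in the patch.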


