Re: [nvptx] vector length patch series

2018-12-21 Thread Tom de Vries
On 14-12-18 20:58, Tom de Vries wrote:
> 0003-openacc-Add-target-hook-TARGET_GOACC_ADJUST_PARALLEL.patch

> 0017-nvptx-Enable-large-vectors.patch

1.

If I void nvptx_adjust_parallelism like this:
...
static unsigned
nvptx_adjust_parallelism (unsigned inner_mask, unsigned outer_mask)
{
  return default_goacc_adjust_parallelism (inner_mask, outer_mask);
}
...
I don't run into any failing tests. From what I can tell, the only
test-case that the proposed implementation of the hook has an effect on
is the worker vector loop in vred2d-128.c, but that one is passing.

Can you confirm that this hook is in fact needed? Does this test fail on
a specific card? Or is there another test-case that exercises this?

2.

If you have a test-case where this is indeed failing without the
proposed hook implementation, then please try to remove the hardcoding
of vector_length > 32 from the test-source and instead set it using
-fopenacc-dim. AFAIU, the proposed hook does not handle that case, so
you should be able to make it fail.
If so, can you test whether attached implementation fixes it?

Thanks,
- Tom
[nvptx] Add nvptx_adjust_parallelism

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (nvptx_adjust_parallelism): New function.
	(TARGET_GOACC_ADJUST_PARALLELISM): Define.

---
 gcc/config/nvptx/nvptx.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/omp-offload.c        |  7 +++++++
 gcc/omp-offload.h        |  1 +
 3 files changed, 63 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index f4095ff5f55..90bbc5b251e 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5314,6 +5314,58 @@ nvptx_dim_limit (int axis)
   return 0;
 }
 
+/* This is a copy of oacc_validate_dims from omp-offload.c that does not update
+   the function attributes.  */
+
+static void
+oacc_validate_dims_no_update (tree fn, tree attrs, int *dims, int level,
+			  unsigned used)
+{
+  tree purpose[GOMP_DIM_MAX];
+  unsigned ix;
+  tree pos = TREE_VALUE (attrs);
+
+  gcc_assert (pos);
+
+  for (ix = 0; ix != GOMP_DIM_MAX; ix++)
+    {
+      purpose[ix] = TREE_PURPOSE (pos);
+      tree val = TREE_VALUE (pos);
+      dims[ix] = val ? TREE_INT_CST_LOW (val) : -1;
+      pos = TREE_CHAIN (pos);
+    }
+
+  targetm.goacc.validate_dims (fn, dims, level);
+
+  for (ix = 0; ix != GOMP_DIM_MAX; ix++)
+    if (dims[ix] < 0)
+      dims[ix] = (used & GOMP_DIM_MASK (ix)
+		  ? oacc_get_default_dim (ix) : oacc_get_min_dim (ix));
+}
+
+/* Adjust the parallelism available to a loop given vector_length
+   associated with the offloaded function.  */
+
+static unsigned
+nvptx_adjust_parallelism (unsigned inner_mask, unsigned outer_mask)
+{
+  bool wv = ((inner_mask & GOMP_DIM_MASK (GOMP_DIM_WORKER))
+	     && (inner_mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR)));
+  if (!wv)
+    return default_goacc_adjust_parallelism (inner_mask, outer_mask);
+
+  int dims[GOMP_DIM_MAX];
+  tree attrs = oacc_get_fn_attrib (current_function_decl);
+  int fn_level = oacc_fn_attrib_level (attrs);
+  oacc_validate_dims_no_update (current_function_decl, attrs, dims, fn_level,
+				inner_mask);
+
+  if (dims[GOMP_DIM_VECTOR] > PTX_WARP_SIZE)
+    inner_mask &= ~GOMP_DIM_MASK (GOMP_DIM_WORKER);
+
+  return default_goacc_adjust_parallelism (inner_mask, outer_mask);
+}
+
 /* Determine whether fork & joins are needed.  */
 
 static bool
@@ -6109,6 +6161,9 @@ nvptx_set_current_function (tree fndecl)
 #undef TARGET_GOACC_DIM_LIMIT
 #define TARGET_GOACC_DIM_LIMIT nvptx_dim_limit
 
+#undef TARGET_GOACC_ADJUST_PARALLELISM
+#define TARGET_GOACC_ADJUST_PARALLELISM nvptx_adjust_parallelism
+
 #undef TARGET_GOACC_FORK_JOIN
 #define TARGET_GOACC_FORK_JOIN nvptx_goacc_fork_join
 
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 3338e0633a1..80ecda82d24 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -580,6 +580,13 @@ oacc_get_default_dim (int dim)
   return oacc_default_dims[dim];
 }
 
+int
+oacc_get_min_dim (int dim)
+{
+  gcc_assert (0 <= dim && dim < GOMP_DIM_MAX);
+  return oacc_min_dims[dim];
+}
+
 /* Parse the default dimension parameter.  This is a set of
    :-separated optional compute dimensions.  Each specified dimension
    is a positive integer.  When device type support is added, it is
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index 176c4da7e88..08e994abdb9 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_OMP_DEVICE_H
 
 extern int oacc_get_default_dim (int dim);
+extern int oacc_get_min_dim (int dim);
 extern int oacc_fn_attrib_level (tree attr);
 
 extern GTY(()) vec *offload_funcs;


Re: [PATCH] attribute copy, leaf, weakref and -Wmissing-attributes (PR 88546)

2018-12-21 Thread Martin Sebor

On 12/21/18 5:16 PM, Jakub Jelinek wrote:

On Fri, Dec 21, 2018 at 04:50:47PM -0700, Martin Sebor wrote:

The first revision of the patch was missing a test and didn't
completely or completely correctly handle attribute noreturn.
Attached is an update with the test included and the omission
and bug fixed.

I think it makes sense to consider the patch independently of
the question whether weakrefs should be extern.  That change can


Weakrefs shouldn't be extern, that is what we were using initially and
changed to static.  At this point we can't change that again IMNSHO.


Sorry, I mixed things up.  What I meant is "independently of
the question whether leaf should be accepted on extern declarations."

Martin


Re: [PATCH] attribute copy, leaf, weakref and -Wmissing-attributes (PR 88546)

2018-12-21 Thread Jakub Jelinek
On Fri, Dec 21, 2018 at 04:50:47PM -0700, Martin Sebor wrote:
> The first revision of the patch was missing a test and didn't
> completely or completely correctly handle attribute noreturn.
> Attached is an update with the test included and the omission
> and bug fixed.
> 
> I think it makes sense to consider the patch independently of
> the question whether weakrefs should be extern.  That change can

Weakrefs shouldn't be extern, that is what we were using initially and
changed to static.  At this point we can't change that again IMNSHO.

Jakub


Re: [PATCH] attribute copy, leaf, weakref and -Wmissing-attributes (PR 88546)

2018-12-21 Thread Martin Sebor

The first revision of the patch was missing a test and didn't
completely or completely correctly handle attribute noreturn.
Attached is an update with the test included and the omission
and bug fixed.

I think it makes sense to consider the patch independently of
the question whether weakrefs should be extern.  That change can
be made separately, with only minor tweaks to the attribute copy
handling and the warning.  None of the other fixes in this patch
(precipitated by more thorough testing) should be affected by it.

Martin

On 12/20/18 8:45 PM, Martin Sebor wrote:

The enhancement to detect mismatched attributes between function
aliases and their targets triggers (expected) warnings in GCC
builds due to aliases being declared with fewer attributes than
their targets.

Using attribute copy as recommended to copy the attributes from
the target to the alias triggers another warning, this time due
to applying attribute leaf to static functions (the attribute
only applies to extern functions).  This is due to an oversight
in both the handler for attribute copy and in
the -Wmissing-attributes warning.

In addition, the copy attribute handler doesn't account for C11
_Noreturn and C++ throw() specifications, both of which set
the corresponding tree bits but don't attach the synonymous
attribute to it.  This also leads to warnings in GCC builds
(in libgfortran).

The attached patch corrects all of these problems: the attribute
copy handler to avoid copying attribute leaf to declarations of
static functions, and to set the noreturn and nonthrow bits, and
the missing attribute warning to avoid triggering for static
weakref aliases whose targets are decorated with attribute leaf.

With this patch, GCC should build with no -Wmissing-attributes
warnings.

Tested on x86_64-linux.

Martin


PR c/88546 - Copy attribute unusable for weakrefs

gcc/c-family/ChangeLog:

	PR c/88546
	* c-attribs.c (handle_copy_attribute): Avoid copying attribute leaf.
	Handle C++ empty throw specification and C11 _Noreturn.
	(has_attribute): Also handle C11 _Noreturn.

gcc/ChangeLog:

	PR c/88546
	* attribs.c (decls_mismatched_attributes): Avoid warning for attribute
	leaf.

libgcc/ChangeLog:

	PR c/88546
	* gthr-posix.h (__gthrw2): Use attribute copy.

libgfortran/ChangeLog:

	PR c/88546
	* libgfortran.h (iexport2): Use attribute copy.

gcc/testsuite/ChangeLog:

	PR c/88546
	* g++.dg/ext/attr-copy.C: New test.
	* gcc.dg/attr-copy-4.c: Disable macro expansion tracking.
	* gcc.dg/attr-copy-6.c: New test.
	* gcc.dg/attr-copy-7.c: New test.

Index: gcc/attribs.c
===
--- gcc/attribs.c	(revision 267301)
+++ gcc/attribs.c	(working copy)
@@ -1912,6 +1912,12 @@ decls_mismatched_attributes (tree tmpl, tree decl,
 
   for (unsigned i = 0; blacklist[i]; ++i)
 {
+  /* Attribute leaf only applies to extern functions.  Avoid mentioning
+	 it when it's missing from a static declaration.  */
+  if (!TREE_PUBLIC (decl)
+	  && !strcmp ("leaf", blacklist[i]))
+	continue;
+
   for (unsigned j = 0; j != 2; ++j)
 	{
 	  if (!has_attribute (tmpls[j], tmpl_attrs[j], blacklist[i]))
Index: gcc/c-family/c-attribs.c
===
--- gcc/c-family/c-attribs.c	(revision 267301)
+++ gcc/c-family/c-attribs.c	(working copy)
@@ -2455,6 +2455,12 @@ handle_copy_attribute (tree *node, tree name, tree
 	  || is_attribute_p ("weakref", atname))
 	continue;
 
+	  /* Attribute leaf only applies to extern functions.
+	 Avoid copying it to static ones.  */
+	  if (!TREE_PUBLIC (decl)
+	  && is_attribute_p ("leaf", atname))
+	continue;
+
 	  tree atargs = TREE_VALUE (at);
 	  /* Create a copy of just the one attribute at AT, including
 	     its arguments, and add it to DECL.  */
@@ -2472,7 +2478,19 @@ handle_copy_attribute (tree *node, tree name, tree
   return NULL_TREE;
 }
 
+  /* A function declared with attribute nothrow has the attribute
+ attached to it, but a C++ throw() function does not.  */
+  if (TREE_NOTHROW (ref))
+TREE_NOTHROW (decl) = true;
+
+  /* Similarly, a function declared with attribute noreturn has it
+ attached on to it, but a C11 _Noreturn function does not.  */
   tree reftype = ref;
+  if (DECL_P (ref)
+  && TREE_THIS_VOLATILE (ref)
+  && FUNC_OR_METHOD_TYPE_P (TREE_TYPE (reftype)))
+TREE_THIS_VOLATILE (decl) = true;
+
   if (DECL_P (ref) || EXPR_P (ref))
 reftype = TREE_TYPE (ref);
 
@@ -2479,6 +2497,9 @@ handle_copy_attribute (tree *node, tree name, tree
   if (POINTER_TYPE_P (reftype))
 reftype = TREE_TYPE (reftype);
 
+  if (!TYPE_P (reftype))
+return NULL_TREE;
+
   tree attrs = TYPE_ATTRIBUTES (reftype);
 
   /* Copy type attributes from REF to DECL.  */
@@ -4188,6 +4209,15 @@ has_attribute (location_t atloc, tree t, tree attr
 	  if (expr && DECL_P (expr))
 		found_match = TREE_READONLY (expr);
 	}
+	  else if (!strcmp ("noreturn", namestr))
+	

[PATCH] PR fortran/88169 -- remove error condition/message

2018-12-21 Thread Steve Kargl
The attached patch addresses an issue submitted by Neil
Carlson.  He and I have an exchange in the PR's audit
trail hashing out the validity of his code example.  I
also asked on the J3 mailing list about his code.  It seems
that the language of the Fortran standard may have been
misinterpreted when the gfortran code was committed.  See
the PR for more information.

The patch has been tested on x86_64-*-freebsd.  OK to commit?

2018-12-21  Steven G. Kargl  

PR fortran/88169
* module.c (mio_namelist): Remove an error condition/message that
is contrary to the Fortran standard.

2018-12-21  Steven G. Kargl  

PR fortran/88169
* gfortran.dg/pr88169_1.f90: New test.
* gfortran.dg/pr88169_2.f90: Ditto.
* gfortran.dg/pr88169_3.f90: Ditto.

-- 
Steve
Index: gcc/fortran/module.c
===
--- gcc/fortran/module.c	(revision 267342)
+++ gcc/fortran/module.c	(working copy)
@@ -3711,7 +3711,6 @@ static void
 mio_namelist (gfc_symbol *sym)
 {
   gfc_namelist *n, *m;
-  const char *check_name;
 
   mio_lparen ();
 
@@ -3722,17 +3721,6 @@ mio_namelist (gfc_symbol *sym)
 }
   else
 {
-  /* This departure from the standard is flagged as an error.
-	 It does, in fact, work correctly. TODO: Allow it
-	 conditionally?  */
-  if (sym->attr.flavor == FL_NAMELIST)
-	{
-	  check_name = find_use_name (sym->name, false);
-	  if (check_name && strcmp (check_name, sym->name) != 0)
-	gfc_error ("Namelist %s cannot be renamed by USE "
-		   "association to %s", sym->name, check_name);
-	}
-
   m = NULL;
   while (peek_atom () != ATOM_RPAREN)
 	{
Index: gcc/testsuite/gfortran.dg/pr88169_1.f90
===
--- gcc/testsuite/gfortran.dg/pr88169_1.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr88169_1.f90	(working copy)
@@ -0,0 +1,21 @@
+! { dg-do run }
+module foo_nml
+   implicit none
+   real :: x = -1
+   namelist /foo/ x
+end module
+
+program main
+   use foo_nml, only: bar => foo, x
+   implicit none
+   integer fd
+   x = 42
+   open(newunit=fd, file='tmp.dat', status='replace')
+   write(fd,nml=bar)
+   close(fd)
+   open(newunit=fd, file='tmp.dat', status='old')
+   read(fd,nml=bar)
+   if (x /= 42) stop 1
+   close(fd)
+end program
+! { dg-final { cleanup-modules "foo_nml" } }
Index: gcc/testsuite/gfortran.dg/pr88169_2.f90
===
--- gcc/testsuite/gfortran.dg/pr88169_2.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr88169_2.f90	(working copy)
@@ -0,0 +1,31 @@
+! { dg-do run }
+module foo_nml
+   implicit none
+   real :: x = -1
+   namelist /foo/ x
+end module
+!
+! Yes, implicit typing of local variable 'x'.
+!
+program main
+   use foo_nml, only: bar => foo
+   integer fd
+   x = 42
+   open(newunit=fd, file='tmp.dat', status='replace')
+   write(fd,nml=bar)
+   close(fd)
+   open(newunit=fd, file='tmp.dat', status='old')
+   read(fd,nml=bar)
+   close(fd)
+   call bah
+   if (x /= 42) stop 1
+end program
+
+subroutine bah
+   use foo_nml
+   integer fd
+   open(newunit=fd, file='tmp.dat', status='old')
+   read(fd,nml=foo)
+   if (x /= -1) stop 2
+   close(fd, status='delete')
+end subroutine bah
Index: gcc/testsuite/gfortran.dg/pr88169_3.f90
===
--- gcc/testsuite/gfortran.dg/pr88169_3.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/pr88169_3.f90	(working copy)
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-std=f95" }
+module foo_nml
+   implicit none
+   real :: x = -1
+   namelist /foo/ x
+end module
+
+program main
+   use foo_nml, only: bar => foo, x
+   implicit none
+   real a
+   namelist /bar/a  ! { dg-error "already is USE associated" }
+end program
+! { dg-final { cleanup-modules "foo_nml" } }


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-21 Thread Hans-Peter Nilsson
On Tue, 18 Dec 2018, Andi Kleen wrote:

> > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > -  int i;
> > -  for (i = 0; i < 1000; i++)
> > +  int i, j;
> > +  for (i = 0; i < 100; i++)
> > +for (j = 0; j < 50; j++)
> >   g += tc->foo();
> > if (g<100) g++;
> >  }
> > @@ -27,8 +28,9 @@ void test1 (A *tc)
> >  static __attribute__((always_inline))
> >  void test2 (B *tc)
> >  {
> > -  int i;
> > +  int i, j;
> >for (i = 0; i < 100; i++)
> > +for (j = 0; j < 50; j++)
> >
> > I have to increase loop count like this to get stable pass on my
> > machine.  The original count (1000) is too small to be sampled.
>
> IIRC It was originally higher, but people running on slow simulators 
> complained,
> so it was reduced.  Perhaps we need some way to detect in the test suite
> that the test runs on a real CPU.

Doesn't check_effective_target_simulator work here?
See e.g. libstdc++-v3/testsuite/25_algorithms/heap/moveable2.cc
for an example.

brgds, H-P


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-21 Thread Uecker, Martin
Am Freitag, den 21.12.2018, 16:13 -0500 schrieb Hans-Peter Nilsson:
> On Tue, 18 Dec 2018, Uecker, Martin wrote:
> > Am Dienstag, den 18.12.2018, 17:29 +0100 schrieb Martin Uecker:
> > > Am Dienstag, den 18.12.2018, 17:24 +0100 schrieb Jakub Jelinek:
> > > > On Tue, Dec 18, 2018 at 09:03:41AM -0700, Jeff Law wrote:
> > > > > Right.  This is the classic example and highlights the ABI concerns.  
> > > > > If
> > > > > we use the low bit to distinguish between a normal function pointer 
> > > > > and
> > > > > a pointer to a descriptor and qsort doesn't know about it, then we 
> > > > > lose.
> > > > > 
> > > > > One way around this is to make *all* function pointers be some kind of
> > > > > descriptor and route all indirect calls through a resolver.  THen you
> > > > 
> > > > Either way, you are creating a new ABI for calling functions through
> > > > function pointers.  Because of how rarely GNU C nested functions are 
> > > > used
> > > > these days, if we want to do anything I'd think it might be better to 
> > > > use
> > > > trampolines, just don't place them on the stack, say have a mmaped page 
> > > > of
> > > > trampolines perhaps with some pointer encryption to where they jump to, 
> > > > so
> > > > it isn't a way to circumvent non-executable stack, and have some 
> > > > register
> > > > and unregister function you'd call to get or release the trampoline.
> > > > If more trampolines are needed than currently available, the library 
> > > > could
> > > > just mmap another such page.  A problem is how it should interact with
> > > > longjmp or similar APIs, because then we could leak some trampolines (no
> > > > "destructor" for the trampoline would be called).  The leaking could be
> > > > handled e.g. through remembering the thread and frame pointer for which 
> > > > it
> > > > has been allocated and if you ask for a new trampoline with a frame 
> > > > pointer
> > > > above the already allocated one, release those entries or reuse them,
> > > > instead of allocating a new one.  And somehow deal with thread exit.
> > > 
> > > Yes, something like this. If the trampolines are pre-allocated, this could
> > > even avoid the need to clear the cache on archs where this is needed.
> > 
> > And if we can make the trampolines be all the same (and it somehow derived
> > from the IP where it has to look for the static chain), we could map the
> > same page of pre-allocated trampolines and not use memory on platforms
> > with virtual memory.
> 
> All fine with new ideas, but consider the case where the nested
> functions are nested.  All mentioned ideas seem to fail for the
> case where a caller (generating a trampoline to be called later)
> is re-entered, i.e. need to generate another trampoline.  The
> same location can't be re-used.  You need a sort of stack.

Yes, you need to be able to create an arbitrary number of trampolines.

But this would work: one can use a second stack with pre-allocated
read-only trampolines.  Every time you would now create a trampoline on
the real stack, you simply refer to an existing trampoline at the
same location on the parallel stack.  And if these trampolines are
all identical, you only need a single real page which is mapped
many times.

Setting up a stack would be more complicated because you also
need to set up this parallel stack.  Maybe simulating this second
stack with a global hash table, indexed by thread id and the real
stack pointer, is better...

Best,
Martin



Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-21 Thread Hans-Peter Nilsson
On Tue, 18 Dec 2018, Uecker, Martin wrote:
> Am Dienstag, den 18.12.2018, 17:29 +0100 schrieb Martin Uecker:
> > Am Dienstag, den 18.12.2018, 17:24 +0100 schrieb Jakub Jelinek:
> > > On Tue, Dec 18, 2018 at 09:03:41AM -0700, Jeff Law wrote:
> > > > Right.  This is the classic example and highlights the ABI concerns.  If
> > > > we use the low bit to distinguish between a normal function pointer and
> > > > a pointer to a descriptor and qsort doesn't know about it, then we lose.
> > > >
> > > > One way around this is to make *all* function pointers be some kind of
> > > > descriptor and route all indirect calls through a resolver.  THen you
> > >
> > > Either way, you are creating a new ABI for calling functions through
> > > function pointers.  Because of how rarely GNU C nested functions are used
> > > these days, if we want to do anything I'd think it might be better to use
> > > trampolines, just don't place them on the stack, say have a mmaped page of
> > > trampolines perhaps with some pointer encryption to where they jump to, so
> > > it isn't a way to circumvent non-executable stack, and have some register
> > > and unregister function you'd call to get or release the trampoline.
> > > If more trampolines are needed than currently available, the library could
> > > just mmap another such page.  A problem is how it should interact with
> > > longjmp or similar APIs, because then we could leak some trampolines (no
> > > "destructor" for the trampoline would be called).  The leaking could be
> > > handled e.g. through remembering the thread and frame pointer for which it
> > > has been allocated and if you ask for a new trampoline with a frame 
> > > pointer
> > > above the already allocated one, release those entries or reuse them,
> > > instead of allocating a new one.  And somehow deal with thread exit.
> >
> > Yes, something like this. If the trampolines are pre-allocated, this could
> > even avoid the need to clear the cache on archs where this is needed.
>
> And if we can make the trampolines be all the same (and it somehow derived
> from the IP where it has to look for the static chain), we could map the
> same page of pre-allocated trampolines and not use memory on platforms
> with virtual memory.

All fine with new ideas, but consider the case where the nested
functions are nested.  All mentioned ideas seem to fail for the
case where a caller (generating a trampoline to be called later)
is re-entered, i.e. need to generate another trampoline.  The
same location can't be re-used.  You need a sort of stack.

brgds, H-P

Re: libbacktrace integration for _GLIBCXX_DEBUG mode

2018-12-21 Thread Jonathan Wakely

On 21/12/18 22:47 +0200, Ville Voutilainen wrote:

On Fri, 21 Dec 2018 at 22:35, Jonathan Wakely  wrote:

>I also explicitly define BACKTRACE_SUPPORTED to 0 to make sure
>libstdc++ has no libbacktrace dependency after usual build.



I'm concerned about the requirement to link to libbacktrace
explicitly (which will break existing makefiles and build systems that
currently use debug mode in testing).


But see what Francois wrote, "I also explicitly define
BACKTRACE_SUPPORTED to 0 to make sure
libstdc++ has no libbacktrace dependency after usual build."


Yes, but if you happen to install libbacktrace headers, the behaviour
for users building their own code changes. I agree that if you install
those headers, it's probably for a reason, but it might be a different
reason to "so that libstdc++ prints better backtraces".


Also, some of the glibc team pointed out to me that running *any*
extra code after undefined behaviour has been detected is a potential
risk. The less that you do between detecting UB and calling abort(),
the better. Giving the users more information is helpful, but comes
with some additional risk.


Ditto. Having said those things, I think we need to figure out a good
way to provide this sensibly
as an opt-in. The backtrace support is bloody useful, and dovetails
into a possible Contracts-aware
implementation of our library, but I think we need to do some more
thought-work on this, thus I agree
that it's not stage3 material. I do think it's something that we need
to keep in mind, thanks
for working on it, Francois!


Yes, I agree that making it available via a more explicit opt-in would
be good. Maybe require users to define _GLIBCXX_DEBUG_BACKTRACE as well
as _GLIBCXX_DEBUG, or something like that.




Re: [C++ PATCH] Speed up inplace_merge algorithm & fix inefficient logic(PR libstdc++/83938)

2018-12-21 Thread Jonathan Wakely

On 29/10/18 07:06 +0100, François Dumont wrote:

Hi

    Any feedback regarding this patch?


Sorry this got missed, please resubmit during stage 1.

You haven't CC'd the original patch author (chang jc) to give them a
chance to comment on your proposed changes to the patch.

The attached PDF on PR libstdc++/83938 has extensive discussion of the
performance issue, but I don't see any for your version. Some
performance benchmarks for your version would be helpful.





Thanks,
François

On 8/21/18 10:34 PM, François Dumont wrote:
I missed a test that was failing because of this patch.  So I reverted
a small part of it, and here is the new proposal.


Tested under Linux x86_64, ok to commit ?

François


On 24/07/2018 12:22, François Dumont wrote:

Hi

    Any chance to review this patch ?

François


On 06/06/2018 18:39, François Dumont wrote:

Hi

    I reviewed and reworked this proposal.  I noticed that the same
idea of limiting buffer size within inplace_merge also applies to
stable_sort.


    I also changed the decision made when the buffer is too small, to
consider the buffer size rather than going through successive
cuts of the original ranges.  This way we won't cut the range
more than necessary.  Note that if you prefer, this second part
could be committed separately.


    PR libstdc++/83938
    * include/bits/stl_algo.h:
    (__stable_partition_adaptive): When buffer too small, cut range
    using buffer size.
    (__inplace_merge): Take temporary buffer length from smallest
    range.
    (__merge_adaptive): When buffer too small, consider smallest
    range first and cut based on buffer size.
    (__stable_sort_adaptive): When buffer too small, cut based on
    buffer size.
    (__stable_sort): Limit temporary buffer length.
    * include/bits/stl_tempbuf.h (get_temporary_buffer): Change
    function to reduce requested buffer length on allocation failure.

Tested under Linux x86_64.

Ok to commit ?

François


On 25/01/2018 23:37, chang jc wrote:

Hi:

1. The "__len = (__len + 1) / 2;" needs, as you suggested, to be
modified to "__len = (__len == 1) ? 0 : ((__len + 1) / 2);".

2. The coding gain has been shown in PR c++/83938; I re-post it here:




   buffer size (log2)   current (s)   proposed (s)
           21             0.471136      0.340845
           20             0.625695      0.48651
           19             0.767262      0.639139
           18             0.907461      0.770133
           17             1.04838       0.898454
           16             1.19508       1.04632

It means merging [0, 4325376, 16777216): A is a sorted integer range
with 4325376 elements and B one with 12451840 elements, 16M integers
in total.

The proposed method shows a speedup for each given buffer size, e.g.
2^16, 2^17, ..., 2^21 in units of sizeof(int); for example, 2^16 means
sizeof(int)*64K bytes are given.

3. As you suggested, _TmpBuf __buf should be rewritten.

4. It reflects the fact that the intuitive idea of splitting from the
larger part is wrong.

For example, if you have an input sorted array A & B, A has 8 integers
& B has 24 integers. Given tmp buffer whose capacity = 4 integers.

Current it tries to split from B, right?

Then we have:

A1 | A2  B1 | B2

B1 & B2 has 12 integers each, right?

The current algorithm selects the pivot as the 13th integer from B, right?

If the corresponding upper bound of A is 6th integer.

Then it split in

A1 = 5 | A2 = 3 | B1 = 12 | B2 = 12

After rotate, we have two arrays to merge

[A1 = 5 | B1 = 12]  & [A2 = 3 | B2 = 12]

Great, [A2 = 3 | B2 = 12] can use tmp buffer to merge.

Sadly, [A1 = 5 | B1 = 12] CANNOT.

So we do rotate again, split & merge the two split arrays from [A1 = 5
| B1 = 12] again.


But wait, what if we always split from the smaller one instead of
the larger one?


After rotate, it promises that both split arrays contain at most
ceiling[small/2] elements of the smaller range.


Since the tmp buffer is also allocated starting from sizeof(small) and
recursively downgraded to ceiling[small/2^(# of failed allocations)],
the allocated tmp buffer is promised to be sufficient at the
level of (# of failed allocations).

Instead, you can see that if we split from the larger side, at level
(# of failed allocations) several split arrays still CANNOT use the
tmp buffer to do a buffered merge.


As you know, a buffered merge is far faster than (split, rotate, and
merge two sub-arrays) (PR c++/83938 gives the profiling figures),

so this way should provide a speedup.


Thanks.


On 24/01/2018 18:23, François Dumont wrote:

Hi


 It sounds like a very sensitive change to make, but nothing
here is backed by figures.

Do you have any benchmark showing the importance of the gain?

 At least the memory usage optimization is obvious.

On 19/01/2018 10:43, chang jc wrote:

The current std::inplace_merge() suffers from a performance issue
caused by inefficient logic under limited memory,
which leads to degraded performance.

Please help to review it.

Index: include/bits/stl_algo.h
===
--- include/bits/stl_algo.h    (revision 256871)
+++ include/bits/stl_algo.h    (working copy)
@@ -2437,7 +2437,7 @@
 _BidirectionalIterator __second_cut = __middle;
 _Distance __len11 = 0;
 _Distance __len22 = 0;
-  if (__len1 

Re: libbacktrace integration for _GLIBCXX_DEBUG mode

2018-12-21 Thread Ville Voutilainen
On Fri, 21 Dec 2018 at 22:35, Jonathan Wakely  wrote:
> >I also explicitly define BACKTRACE_SUPPORTED to 0 to make sure
> >libstdc++ has no libbacktrace dependency after usual build.

> I'm concerned about the requirement to link to libbacktrace
> explicitly (which will break existing makefiles and build systems that
> currently use debug mode in testing).

But see what Francois wrote, "I also explicitly define
BACKTRACE_SUPPORTED to 0 to make sure
libstdc++ has no libbacktrace dependency after usual build."

> Also, some of the glibc team pointed out to me that running *any*
> extra code after undefined behaviour has been detected is a potential
> risk. The less that you do between detecting UB and calling abort(),
> the better. Giving the users more information is helpful, but comes
> with some additional risk.

Ditto. Having said those things, I think we need to figure out a good
way to provide this sensibly
as an opt-in. The backtrace support is bloody useful, and dovetails
into a possible Contracts-aware
implementation of our library, but I think we need to do some more
thought-work on this, thus I agree
that it's not stage3 material. I do think it's something that we need
to keep in mind, thanks
for working on it, Francois!


Re: libbacktrace integration for _GLIBCXX_DEBUG mode

2018-12-21 Thread Jonathan Wakely

On 11/12/18 00:08 +0100, François Dumont wrote:

Hi

    Here is the integration of libbacktrace to provide the backtrace 
on _GLIBCXX_DEBUG assertions.


    I decided to integrate it without impacting the build scripts. 
Users just need to install libbacktrace and once done _GLIBCXX_DEBUG 
will look for it and start using it if supported. The drawback is that 
as soon as libbacktrace is installed users will have to add 
-lbacktrace in order to use _GLIBCXX_DEBUG mode. But I expect that if 
you install libbacktrace it is for a reason.


    Note that when libbacktrace is not supported, I include stdint.h to
get uintptr_t; I hope that is the correct way to get it
portably.


    I also explicitly define BACKTRACE_SUPPORTED to 0 to make sure 
libstdc++ has no libbacktrace dependency after usual build.


    As this starts to display a lot of information on a Debug
assertion, I have created print_function to filter the output of
function names.  It removes things like __cxx1998:: and std::allocator,
and greatly simplifies _Safe_iterator rendering.


    Here is an example of output when building 
23_containers/vector/debug/construct3_neg.cc:


/home/fdt/dev/gcc/install/include/c++/9.0.0/debug/safe_iterator.h:321:
In function:
    __gnu_debug::_Safe_iterator<_Iterator, _Sequence, _Category>&
    __gnu_debug::_Safe_iterator<_Iterator, _Sequence,
    _Category>::operator++() [with _Iterator = std::_List_iterator;
    _Sequence = std::__debug::list; _Category =
    std::forward_iterator_tag]

Backtrace:
    0x40275f 
__gnu_debug::_Safe_iterator>::operator++()

/home/fdt/dev/gcc/install/include/c++/9.0.0/debug/safe_iterator.h:321
    0x402181 
__gnu_debug::_Safe_iterator>::operator++()

/home/fdt/dev/gcc/install/include/c++/9.0.0/debug/safe_iterator.h:570
    0x404082 std::iterator_traits<__gnu_debug::_Safe_iterator> 
::difference_type 
std::__distance<__gnu_debug::_Safe_iterator> 
(__gnu_debug::_Safe_iterator>, 
__gnu_debug::_Safe_iterator>, 
std::input_iterator_tag)

/home/fdt/dev/gcc/install/include/c++/9.0.0/bits/stl_iterator_base_funcs.h:89
    0x403795 std::iterator_traits<__gnu_debug::_Safe_iterator> 
::difference_type 
std::distance<__gnu_debug::_Safe_iterator> 
(__gnu_debug::_Safe_iterator>, 

__gnu_debug::_Safe_iterator>)
/home/fdt/dev/gcc/install/include/c++/9.0.0/bits/stl_iterator_base_funcs.h:141
    0x4030b9 void std::vector::_M_range_initialize<__gnu_debug::_Safe_iterator> 
(__gnu_debug::_Safe_iterator>, 
__gnu_debug::_Safe_iterator>, 
std::forward_iterator_tag)

/home/fdt/dev/gcc/install/include/c++/9.0.0/bits/stl_vector.h:1541
    0x402a2d std::vector::vector<__gnu_debug::_Safe_iterator>, 
void>(__gnu_debug::_Safe_iterator>, 
__gnu_debug::_Safe_iterator>)

/home/fdt/dev/gcc/install/include/c++/9.0.0/bits/stl_vector.h:618
    0x4022ec std::__debug::vector::vector<__gnu_debug::_Safe_iterator>, 
void>(__gnu_debug::_Safe_iterator>, 
__gnu_debug::_Safe_iterator>)

    /home/fdt/dev/gcc/install/include/c++/9.0.0/debug/vector:195
    0x401e2c void 
__gnu_test::check_construct3 >()

    ./util/debug/checks.h:234
    0x401460 test01()
    /home/fdt/dev/poc/construct3_neg.cc:26
    0x40146c main
    /home/fdt/dev/poc/construct3_neg.cc:31

Error: attempt to increment a past-the-end iterator.

Objects involved in the operation:
    iterator "this" @ 0x0x7fff068adce0 {
  type = std::_List_iterator (mutable iterator);
  state = past-the-end;
  references sequence with type 'std::__debug::list' @ 
0x0x7fff068ae080

    }

    * include/debug/formatter.h: Check for backtrace-supported.h access
    and include it.
    [BACKTRACE_SUPPORTED] Include 
    (_Error_formatter::_Bt_full_t): New function pointer type.
    (_Error_formatter::_M_backtrace_state): New.
    (_Error_formatter::_M_backtrace_full_func): New.
    * src/c++11/debug.cc: Include .
    (PrintContext::_M_demangle_name): New.
    (_Print_func_t): New.
    (print_word(PrintContext&, const char*)): New.
    (print_raw(PrintContext&, const char*)): New.
    (print_function(PrintContext&, const char*, _Print_func_t)): New.
    (print_type): Use latter.
    (print_string(PrintContext&, const char*)): New.
    (print_backtrace(void*, uintptr_t, const char*, int, const char*)):
    New.
    (_Error_formatter::_M_error()): Adapt.

Tested under Linux x86_64.

Ok to commit? One day?


Maybe one day, but not now during stage 3.

I'm concerned about the requirement to link to libbacktrace
explicitly (which will break existing makefiles and build systems that
currently use debug mode in testing).

Also, some of the glibc team pointed out to me that running *any*
extra code after undefined behaviour has been detected is a potential
risk. The less that you do between detecting UB and calling abort(),
the better. Giving the users more information is helpful, but comes
with some additional risk.



Re: Fix hashtable node deallocation

2018-12-21 Thread Jonathan Wakely

On 16/12/18 14:16 +0100, François Dumont wrote:

Gentle reminder, we still have this issue pending.

    * include/bits/hashtable_policy.h
(_Hashtable_alloc<>::_M_deallocate_node_ptr(__node_type*)): New.
    (_Hashtable_alloc<>::_M_deallocate_node(__node_type*)): Use latter.
(_ReuseOrAllocNode<>::operator<_Arg>()(_Arg&&)): Likewise.


Please add more detail to the commit message explaining the problem.
Either as a paragraph of text in the commit message before the
changelog (e.g. see https://gcc.gnu.org/r267236 or
https://gcc.gnu.org/r267276 for commits with additional text in the
commit message), or in the changelog itself, e.g.

   (_ReuseOrAllocNode<>::operator<_Arg>()(_Arg&&)): Likewise, so
   that the argument to __node_alloc_traits::deallocate is the
   correct pointer type.


    * libstdc++-v3/testsuite/util/testsuite_allocator.h
    (CustomPointerAlloc<>::allocate(size_t, pointer)): Replace by...
    (CustomPointerAlloc<>::allocate(size_t, const_void_pointer)): ...this.


This should have been a separate commit really.

OK for trunk with a better commit message that explains what the
change does.



Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-21 Thread Steve Kargl
On Fri, Dec 21, 2018 at 07:39:45PM +, Joseph Myers wrote:
> On Fri, 21 Dec 2018, Steve Kargl wrote:
> 
> > scalbln(double x, long n)
> > {
> > 
> > return (scalbn(x, (n > NMAX) ? NMAX : (n < NMIN) ? NMIN : (int)n));
> > }
> > 
> > A search for glibc's libm locates https://tinyurl.com/ybcy8w4t
> > which is a bit-twiddling routine.  Not sure it's worth the
> > effort.  Joseph Myers might have an opinion.
> 
> Such comparisons are needed in the scalbn / scalbln implementations anyway 
> to deal with large exponents.  I suppose where there's a suitable scalbln 
> implementation, and you don't know if the arguments are within the range 
> of int, calling scalbln at least saves code size in the caller and avoids 
> duplicating those range checks.
> 

I was thinking along the lines of -ffast-math and whether 
__builtin_scalbn and __builtin_scalbln are then inlined.  
The comparisons may inhibit inlining __builtin_scalbn;
while, if gfortran used __builtin_scalbln, inlining would
occur.

As it is, for

   function foo(x,i)
 use ieee_arithmetic
 real(8) foo, c
 integer(8) i
 foo = ieee_scalb(c, i)
   end function foo

the options -ffast-math -O3 -fdump-tree-optimized give

   [local count: 1073741824]:
  _gfortran_ieee_procedure_entry ();
  _8 = *i_7(D);
  _1 = MIN_EXPR <_8, 2147483647>;
  _2 = MAX_EXPR <_1, -2147483647>;
  _3 = (integer(kind=4)) _2;
  _4 = __builtin_scalbn (c_9(D), _3);
  _gfortran_ieee_procedure_exit ();
  fpstate.0 ={v} {CLOBBER};
  return _4;

It seems this could be 

   [local count: 1073741824]:
  _gfortran_ieee_procedure_entry ();
  _3 = (integer(kind=4)) *i_7(D);
  _4 = __builtin_scalbn (c_9(D), _3);
  _gfortran_ieee_procedure_exit ();
  fpstate.0 ={v} {CLOBBER};

-- 
Steve


Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-21 Thread Joseph Myers
On Fri, 21 Dec 2018, Steve Kargl wrote:

> scalbln(double x, long n)
> {
> 
> return (scalbn(x, (n > NMAX) ? NMAX : (n < NMIN) ? NMIN : (int)n));
> }
> 
> A search for glibc's libm locates https://tinyurl.com/ybcy8w4t
> which is a bit-twiddling routine.  Not sure it's worth the
> effort.  Joseph Myers might have an opinion.

Such comparisons are needed in the scalbn / scalbln implementations anyway 
to deal with large exponents.  I suppose where there's a suitable scalbln 
implementation, and you don't know if the arguments are within the range 
of int, calling scalbln at least saves code size in the caller and avoids 
duplicating those range checks.

-- 
Joseph S. Myers
jos...@codesourcery.com


[patch, fortran] PR87881 - gfortran.dg/inquiry_type_ref_(1.f08|3.f90) fail on darwin

2018-12-21 Thread Paul Richard Thomas
Applied as 'obvious' after regtesting on FC28/x86_64.

The second part of the patch (in simplify_ref_chain) is due to Jakub,
for which many thanks. The first is consequent on the need to deal
with more than one inquiry part ref (see the testcase) and yet be able
to return true from simplify_ref_chain. The testcase checks this part.

Paul

2018-12-21  Paul Thomas  

PR fortran/87881
* expr.c (find_inquiry_ref): Loop through the inquiry refs in
case there are two of them.
(simplify_ref_chain): Return true after a successful call to
find_inquiry_ref.

2018-12-21  Paul Thomas  

PR fortran/87881
* gfortran.dg/inquiry_part_ref_4.f90: New test.


Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-21 Thread Steve Kargl
On Fri, Dec 21, 2018 at 08:59:04PM +0200, Janne Blomqvist wrote:
> On Fri, Dec 21, 2018 at 7:59 PM Steve Kargl <
> >
> > D.3853 = *i;
> > __result_foo = scalbnq (c,
> > (integer(kind=4)) MAX_EXPR <MIN_EXPR <D.3853, 2147483647>,
> > -2147483647>);
> >
> > The range [-32443,32443] is a subset of [-huge(),huge(0)].
> >
> 
> True. I guess the advantage of scalbln* would be to avoid the MAX/MIN_EXPR
> and casting for kind int64.
> 

fdlibm-based libm has 

#define NMAX 65536
#define NMIN -65536
double
scalbln(double x, long n)
{
	return (scalbn(x, (n > NMAX) ? NMAX : (n < NMIN) ? NMIN : (int)n));
}

A search for glibc's libm locates https://tinyurl.com/ybcy8w4t
which is a bit-twiddling routine.  Not sure it's worth the
effort.  Joseph Myers might have an opinion.

-- 
Steve


Re: [C++ PATCH] Fix __builtin_{is_constant_evaluated,constant_p} handling in static_assert (PR c++/86524, PR c++/88446, take 2)

2018-12-21 Thread Jason Merrill

On 12/21/18 3:51 AM, Jakub Jelinek wrote:

On Thu, Dec 20, 2018 at 09:49:39PM -0500, Jason Merrill wrote:

But if we need cp_fully_fold, doesn't that mean that the earlier
cxx_eval_constant_expression failed and thus the argument is not a constant
expression?  Should __builtin_is_constant_evaluated () evaluate to true
even if the argument is not a constant expression?


Ah, no, good point.


Is there a reason to call that maybe_constant_value at all when we've called
cxx_eval_constant_expression first?  Wouldn't cp_fold_rvalue (or
c_fully_fold with false as last argument) be sufficient there?


I think that would be better, yes.


As cp_fold_rvalue* is static in cp-gimplify.c, I've used c_fully_fold
(or do you want to export cp_fold_rvalue*?).


Let's export it.  OK with that change.

Jason


Fix devirtualiation in expanded thunks

2018-12-21 Thread Jan Hubicka
Hi,
this patch fixes polymorphic call analysis in thunks.  Unlike normal
methods, thunks take the THIS pointer offsetted by a known constant.  This
needs to be compensated for when calculating the address of the outer type.

Bootstrapped/regtested x86_64-linux, also tested with Firefox where this
bug triggered a misoptimization in the spellchecker.  I plan to backport it to
release branches soon.

Honza

PR ipa/88561
* ipa-polymorphic-call.c
(ipa_polymorphic_call_context::ipa_polymorphic_call_context): Handle
arguments of thunks correctly.
(ipa_polymorphic_call_context::get_dynamic_context): Be ready for
NULL instance pointer.
* lto-cgraph.c (lto_output_node): Always stream thunk info.
* g++.dg/tree-prof/devirt.C: New testcase.
Index: ipa-polymorphic-call.c
===
--- ipa-polymorphic-call.c  (revision 267325)
+++ ipa-polymorphic-call.c  (working copy)
@@ -995,9 +995,22 @@ ipa_polymorphic_call_context::ipa_polymo
{
  outer_type
 = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (base_pointer)));
+ cgraph_node *node = cgraph_node::get (current_function_decl);
  gcc_assert (TREE_CODE (outer_type) == RECORD_TYPE
  || TREE_CODE (outer_type) == UNION_TYPE);
 
+ /* Handle the case we inlined into a thunk.  In this case
+thunk has THIS pointer of type bar, but it really receives
+address to its base type foo which sits in bar at 
+0-thunk.fixed_offset.  It starts with code that adds
+thunk.fixed_offset to the pointer to compensate for this.
+
+Because we walked all the way to the beginning of thunk, we now
+see pointer _offset and need to compensate
+for it.  */
+ if (node->thunk.fixed_offset)
+   offset -= node->thunk.fixed_offset * BITS_PER_UNIT;
+
  /* Dynamic casting has possibly upcasted the type
 in the hiearchy.  In this case outer type is less
 informative than inner type and we should forget
@@ -1005,7 +1018,11 @@ ipa_polymorphic_call_context::ipa_polymo
  if ((otr_type
   && !contains_type_p (outer_type, offset,
otr_type))
- || !contains_polymorphic_type_p (outer_type))
+ || !contains_polymorphic_type_p (outer_type)
+ /* If we compile thunk with virtual offset, the THIS pointer
+is adjusted by unknown value.  We can't thus use outer info
+at all.  */
+ || node->thunk.virtual_offset_p)
{
  outer_type = NULL;
  if (instance)
@@ -1030,7 +1047,15 @@ ipa_polymorphic_call_context::ipa_polymo
  maybe_in_construction = false;
}
  if (instance)
-   *instance = base_pointer;
+   {
+ /* If method is expanded thunk, we need to apply thunk offset
+to instance pointer.  */
+ if (node->thunk.virtual_offset_p
+ || node->thunk.fixed_offset)
+   *instance = NULL;
+ else
+   *instance = base_pointer;
+   }
  return;
}
   /* Non-PODs passed by value are really passed by invisible
@@ -1547,6 +1572,9 @@ ipa_polymorphic_call_context::get_dynami
   HOST_WIDE_INT instance_offset = offset;
   tree instance_outer_type = outer_type;
 
+  if (!instance)
+return false;
+
   if (otr_type)
 otr_type = TYPE_MAIN_VARIANT (otr_type);
 
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 267325)
+++ lto-cgraph.c(working copy)
@@ -547,7 +547,11 @@ lto_output_node (struct lto_simple_outpu
   streamer_write_bitpack (&bp);
   streamer_write_data_stream (ob->main_stream, section, strlen (section) + 1);
 
-  if (node->thunk.thunk_p)
+  /* Stream thunk info always because we use it in
+ ipa_polymorphic_call_context::ipa_polymorphic_call_context
+ to properly interpret THIS pointers for thunks that have been converted
+ to Gimple.  */
+  if (node->definition)
 {
   streamer_write_uhwi_stream
 (ob->main_stream,
@@ -1295,7 +1299,7 @@ input_node (struct lto_file_decl_data *f
   if (section)
 node->set_section_for_node (section);
 
-  if (node->thunk.thunk_p)
+  if (node->definition)
 {
   int type = streamer_read_uhwi (ib);
   HOST_WIDE_INT fixed_offset = streamer_read_uhwi (ib);
Index: testsuite/g++.dg/tree-prof/devirt.C
===
--- testsuite/g++.dg/tree-prof/devirt.C (nonexistent)
+++ testsuite/g++.dg/tree-prof/devirt.C (working copy)
@@ -0,0 +1,123 @@
+/* { dg-options "-O3 -fdump-tree-dom3" } */
+struct nsISupports
+{
+  virtual int QueryInterface (const int , void **aInstancePtr) = 0;
+  virtual __attribute__((noinline, noclone)) 

[committed] OpenMP expansion regimplification fix (PR middle-end/85594, PR middle-end/88553)

2018-12-21 Thread Jakub Jelinek
Hi!

As the testcase shows, even result of force_gimple_operand_gsi
might need regimplification, e.g. if it is ADDR_EXPR of a reference with
a decl with DECL_VALUE_EXPR set.

Fixed thusly.  Additionally, I've noticed the OpenMP 4.5 support broke a
case where we assigned some expression to the addressable decl vback and
then wanted to use the t temporary holding that expression rather than
vback, to avoid having an addressable decl in there, but the 4.5 addition
inserted code in between that reused the temporary for something else.
Haven't managed to construct a testcase for that quickly though.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2018-12-21  Jakub Jelinek  

PR middle-end/85594
PR middle-end/88553
* omp-expand.c (extract_omp_for_update_vars): Regimplify the condition
if needed.
(expand_omp_for_generic): Don't clobber t temporary for ordered loops.

* gcc.dg/gomp/pr85594.c: New test.
* gcc.dg/gomp/pr88553.c: New test.

--- gcc/omp-expand.c.jj 2018-11-19 14:43:35.710901282 +0100
+++ gcc/omp-expand.c2018-12-21 16:33:21.732632819 +0100
@@ -2076,6 +2076,11 @@ extract_omp_for_update_vars (struct omp_
  t = fold_build2 (fd->loops[i].cond_code, boolean_type_node, v, t);
  stmt = gimple_build_cond_empty (t);
   gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+ if (walk_tree (gimple_cond_lhs_ptr (as_a <gcond *> (stmt)),
+expand_omp_regimplify_p, NULL, NULL)
+ || walk_tree (gimple_cond_rhs_ptr (as_a <gcond *> (stmt)),
+   expand_omp_regimplify_p, NULL, NULL))
+   gimple_regimplify_operands (stmt, &gsi);
  e = make_edge (bb, body_bb, EDGE_TRUE_VALUE);
  e->probability = profile_probability::guessed_always ().apply_scale 
(7, 8);
}
@@ -3209,20 +3214,21 @@ expand_omp_for_generic (struct omp_regio
 
  if (fd->ordered && counts[fd->collapse - 1] == NULL_TREE)
{
+ tree tem;
  if (fd->collapse > 1)
-   t = fd->loop.v;
+   tem = fd->loop.v;
  else
{
- t = fold_build2 (MINUS_EXPR, TREE_TYPE (fd->loops[0].v),
-  fd->loops[0].v, fd->loops[0].n1);
- t = fold_convert (fd->iter_type, t);
+ tem = fold_build2 (MINUS_EXPR, TREE_TYPE (fd->loops[0].v),
+fd->loops[0].v, fd->loops[0].n1);
+ tem = fold_convert (fd->iter_type, tem);
}
  tree aref = build4 (ARRAY_REF, fd->iter_type,
  counts[fd->ordered], size_zero_node,
  NULL_TREE, NULL_TREE);
- t = force_gimple_operand_gsi (&gsi, t, true, NULL_TREE,
-   true, GSI_SAME_STMT);
- expand_omp_build_assign (&gsi, aref, t);
+ tem = force_gimple_operand_gsi (&gsi, tem, true, NULL_TREE,
+ true, GSI_SAME_STMT);
+ expand_omp_build_assign (&gsi, aref, tem);
}
 
  t = build2 (fd->loop.cond_code, boolean_type_node,
--- gcc/testsuite/gcc.dg/gomp/pr85594.c.jj  2018-12-21 16:47:08.430529687 
+0100
+++ gcc/testsuite/gcc.dg/gomp/pr85594.c 2018-12-21 16:46:59.682670326 +0100
@@ -0,0 +1,5 @@
+/* PR middle-end/85594 */
+/* { dg-do compile } */
+/* { dg-additional-options "-fwrapv" } */
+
+#include "pr81768-2.c"
--- gcc/testsuite/gcc.dg/gomp/pr88553.c.jj  2018-12-21 16:48:02.930653492 
+0100
+++ gcc/testsuite/gcc.dg/gomp/pr88553.c 2018-12-21 16:48:20.577369790 +0100
@@ -0,0 +1,5 @@
+/* PR middle-end/88553 */
+/* { dg-do compile } */
+/* { dg-additional-options "-O1 -ftree-loop-vectorize -fwrapv" } */
+
+#include "pr81768-2.c"

Jakub


Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-21 Thread Jakub Jelinek
On Fri, Dec 21, 2018 at 06:01:56PM +, Steve Ellcey wrote:
> Here is an update to the test part of this patch.  I did not change the
> actual source code part of this, just the tests, so that is all I am
> including here.  I removed the x86 changes that had gotten in there by
> accident and used relative line numbers in the warning checks instead
> of absolute line numbers.  I also moved the warning checks to be closer
> to the lines where the warnings are generated.
> 
> Retested on x86 and aarch64 with no regressions.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2018-12-21  Steve Ellcey  
> 
>   * g++.dg/gomp/declare-simd-1.C: Add aarch64 specific
>   warning checks and assembler scans.
>   * g++.dg/gomp/declare-simd-3.C: Ditto.
>   * g++.dg/gomp/declare-simd-4.C: Ditto.
>   * g++.dg/gomp/declare-simd-7.C: Ditto.
>   * gcc.dg/gomp/declare-simd-1.c: Ditto.
>   * gcc.dg/gomp/declare-simd-3.c: Ditto.

LGTM.

Jakub


Re: [PATCH] attribute copy, leaf, weakref and -Wmisisng-attributes (PR 88546)

2018-12-21 Thread Joseph Myers
On Fri, 21 Dec 2018, Martin Sebor wrote:

> That said, I'm also not sure the warning is necessarily the best way
> to deal with the attribute mismatches in these cases (declarations
> of aliases in .c files).  Wouldn't it make more sense to copy
> the attributes from targets to their aliases unconditionally?
> 
> Joseph, any thoughts based on your experience with the warning (and
> attribute copy) in Glibc?

My expectation is that the normal case is that the same attributes should 
apply to all names for a function (except for the ones already excluded 
from copying because they're properties of a symbol rather than of the 
function that symbol points to), but there may be niche cases where you 
deliberately want calls to different names for a function to be handled 
differently based on different attributes (and so have deliberately 
different declarations for both names in a header which is included in the 
translation unit defining the function and alias, say).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-21 Thread Janne Blomqvist
On Fri, Dec 21, 2018 at 7:59 PM Steve Kargl <
s...@troutmask.apl.washington.edu> wrote:

> On Fri, Dec 21, 2018 at 11:07:08AM +0200, Janne Blomqvist wrote:
> > On Fri, Dec 21, 2018 at 8:22 AM Steve Kargl <
> > s...@troutmask.apl.washington.edu> wrote:
> >
> > > On Thu, Dec 20, 2018 at 01:47:39PM -0800, Steve Kargl wrote:
> > > > The attached patch has been tested on x86_64-*-freebsd.
> > > >
> > > > OK to commit?
> > > >
> > > > 2018-12-20  Steven G. Kargl  
> > > >
> > > >   PR fortran/69121
> > > >   * libgfortran/ieee/ieee_arithmetic.F90: Provide missing
> functions
> > > >   in interface for IEEE_SCALB.
> > > >
> > > > 2018-12-20  Steven G. Kargl  
> > > >
> > > >   PR fortran/69121
> > > >   * gfortran.dg/ieee/ieee_9.f90: New test.
> > >
> > > Now, tested on i586-*-freebsd.
> > >
> >
> > Hi, looks ok for trunk.
> >
> > A few questions popped into my mind while looking into this:
> >
> > 1) Why are none of the _gfortran_ieee_scalb_X_Y functions mentioned in
> > gfortran.map? I guess they should all be there?
> >
> > 2) Currently all the intrinsics map to the scalbn{,f,l} builtins.
> However,
> > when the integer argument is of kind int64 or int128 we should instead
> use
> > scalbln{,f,l}. This also applies to other intrinsics that use scalbn
> under
> > the hood.
> >
> > To clarify, fixing these is not a prerequisite for accepting the patch (I
> > already accepted it), but more like topics for further work.
> >
>
> I forgot to address your 2) item above.  ieee_scalb appears
> to do the right thing.  FX addressed that with his implementation.
> The 2nd argument is always cast to integer after reducing the range
> to that of integer(4).
>
> The binary floating point representation for a REAL(16) finite number
> is x=f*2**e with f in [0.5,1) and e in [-16059,16384].  scalb(x,n) is
> x*2**n, which becomes f*2**e*2**n = f*2**(e+n).  If x is the smallest
> positive subnormal number, then n can be at most 32443 to still return
> a finite REAL(16) number.  Any larger value overflows to infinity.
> If x is the largest positive finite number, then n can be -32443 to
> return the small positive subnormal number.  Any more negative value
> of n underflows to zero.  (Note, I could be off-by-one, but that is
> just a detail.)
>
> Consider
>
> function foo(x,i)
>use ieee_arithmetic
>real(16) foo, c
>integer(8) i
>print *, ieee_scalb(c, i)
> end function foo
>
> -fdump-tree-original gives
>
> D.3853 = *i;
> __result_foo = scalbnq (c,
> (integer(kind=4)) MAX_EXPR <MIN_EXPR <D.3853, 2147483647>,
> -2147483647>);
>
> The range [-32443,32443] is a subset of [-huge(),huge(0)].
>

True. I guess the advantage of scalbln* would be to avoid the MAX/MIN_EXPR
and casting for kind int64.

-- 
Janne Blomqvist


[Committed] S/390: Add support for double<->long vector converts

2018-12-21 Thread Andreas Krebbel
Bootstrapped and regression tested on s390x (IBM z14).

Committed to mainline

gcc/ChangeLog:

2018-12-21  Andreas Krebbel  

* config/s390/vector.md ("floatv2div2df2", "floatunsv2div2df2")
("fix_truncv2dfv2di2", "fixuns_truncv2dfv2di2"): New pattern
definitions.

gcc/testsuite/ChangeLog:

2018-12-21  Andreas Krebbel  

* gcc.target/s390/vector/fp-signedint-convert-1.c: New test.
* gcc.target/s390/vector/fp-unsignedint-convert-1.c: New test.
---
 gcc/config/s390/vector.md  | 52 ++
 .../s390/vector/fp-signedint-convert-1.c   | 26 +++
 .../s390/vector/fp-unsignedint-convert-1.c | 26 +++
 3 files changed, 104 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/fp-signedint-convert-1.c
 create mode 100644 
gcc/testsuite/gcc.target/s390/vector/fp-unsignedint-convert-1.c

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index f0e4049..4c84505 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1960,6 +1960,58 @@
   operands[6] = gen_reg_rtx (V16QImode);
 })
 
+;
+; BFP <-> integer conversions
+;
+
+; signed integer to floating point
+
+; op2: inexact exception not suppressed (IEEE 754 2008)
+; op3: according to current rounding mode
+
+(define_insn "floatv2div2df2"
+  [(set (match_operand:V2DF 0 "register_operand" "=v")
+   (float:V2DF (match_operand:V2DI 1 "register_operand"  "v")))]
+  "TARGET_VX"
+  "vcdgb\t%v0,%v1,0,0"
+  [(set_attr "op_type" "VRR")])
+
+; unsigned integer to floating point
+
+; op2: inexact exception not suppressed (IEEE 754 2008)
+; op3: according to current rounding mode
+
+(define_insn "floatunsv2div2df2"
+  [(set (match_operand:V2DF  0 "register_operand" "=v")
+   (unsigned_float:V2DF (match_operand:V2DI 1 "register_operand"  "v")))]
+  "TARGET_VX"
+  "vcdlgb\t%v0,%v1,0,0"
+  [(set_attr "op_type" "VRR")])
+
+; floating point to signed integer
+
+; op2: inexact exception not suppressed (IEEE 754 2008)
+; op3: rounding mode 5 (round towards 0 C11 6.3.1.4)
+
+(define_insn "fix_truncv2dfv2di2"
+  [(set (match_operand:V2DI   0 "register_operand" "=v")
+   (fix:V2DI (match_operand:V2DF 1 "register_operand"  "v")))]
+  "TARGET_VX"
+  "vcgdb\t%v0,%v1,0,5"
+  [(set_attr "op_type" "VRR")])
+
+; floating point to unsigned integer
+
+; op2: inexact exception not suppressed (IEEE 754 2008)
+; op3: rounding mode 5 (round towards 0 C11 6.3.1.4)
+
+(define_insn "fixuns_truncv2dfv2di2"
+  [(set (match_operand:V2DI0 "register_operand" "=v")
+   (unsigned_fix:V2DI (match_operand:V2DF 1 "register_operand"  "v")))]
+  "TARGET_VX"
+  "vclgdb\t%v0,%v1,0,5"
+  [(set_attr "op_type" "VRR")])
+
 ; reduc_smin
 ; reduc_smax
 ; reduc_umin
diff --git a/gcc/testsuite/gcc.target/s390/vector/fp-signedint-convert-1.c 
b/gcc/testsuite/gcc.target/s390/vector/fp-signedint-convert-1.c
new file mode 100644
index 000..536817a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/fp-signedint-convert-1.c
@@ -0,0 +1,26 @@
+/* { dg-compile } */
+/* { dg-options "-O3 -march=z13 -mzarch" } */
+
+typedef long long __attribute__((vector_size(16))) v2di;
+typedef double __attribute__((vector_size(16))) v2df;
+
+v2di longvec;
+v2df doublevec;
+
+v2di
+tolong (v2df a)
+{
+  v2di out = (v2di){ (long long)a[0], (long long)a[1] };
+  return out;
+}
+
+/* { dg-final { scan-assembler-times "vcgdb\t%v24,%v24,0,5" 1 } } */
+
+v2df
+todouble (v2di a)
+{
+  v2df out = (v2df){ (double)a[0], (double)a[1] };
+  return out;
+}
+
+/* { dg-final { scan-assembler-times "vcdgb\t%v24,%v24,0,0" 1 } } */
diff --git a/gcc/testsuite/gcc.target/s390/vector/fp-unsignedint-convert-1.c 
b/gcc/testsuite/gcc.target/s390/vector/fp-unsignedint-convert-1.c
new file mode 100644
index 000..61409bc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/vector/fp-unsignedint-convert-1.c
@@ -0,0 +1,26 @@
+/* { dg-compile } */
+/* { dg-options "-O3 -march=z13 -mzarch" } */
+
+typedef unsigned long long __attribute__((vector_size(16))) v2di;
+typedef double __attribute__((vector_size(16))) v2df;
+
+v2di longvec;
+v2df doublevec;
+
+v2di
+toulong (v2df a)
+{
+  v2di out = (v2di){ (unsigned long long)a[0], (unsigned long long)a[1] };
+  return out;
+}
+
+/* { dg-final { scan-assembler-times "vclgdb\t%v24,%v24,0,5" 1 } } */
+
+v2df
+todouble (v2di a)
+{
+  v2df out = (v2df){ (double)a[0], (double)a[1] };
+  return out;
+}
+
+/* { dg-final { scan-assembler-times "vcdlgb\t%v24,%v24,0,0" 1 } } */
-- 
2.7.4



Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-21 Thread Steve Kargl
On Fri, Dec 21, 2018 at 06:31:27PM +0100, Thomas Koenig wrote:
> Hi Steve,
> 
> > No, I'm adding the missing functions to the INTERFACE.
> 
> Ah, I see. What I missed was that the function is actually translated
> to something else.
> 
> So, OK for trunk, and thanks for the patch!
> 

Janne, Thomas, Thanks for the quick review.

I first reported the bug 2 years ago (2016-01-02).  I left
it as low-hanging fruit with the hope that a junior gfortran
hacker would cut his/her teeth on a patch.  Junior hasn't
come along, and with nearly 1000 open PRs, I decided to
fix it.  I took a 5-6 month break from working on gfortran.
When I looked a month ago, there were 1002 open PRs.  With my
recent burst of fixing/closing 16 PRs in December, we're 
still at 991.  Too many PRs, too few hands. :-(

-- 
Steve


Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex multiplication and addition

2018-12-21 Thread Tamar Christina
Hi All,

I have made a trivial change in the patch and will assume the OK still applies.

I have also changed the tests from compile to assemble tests.

Kind Regards,
Tamar

The 12/21/2018 11:40, Kyrill Tkachov wrote:
> Hi Tamar,
> 
> On 11/12/18 15:46, Tamar Christina wrote:
> > Hi All,
> >
> > This patch adds NEON intrinsics and tests for the Armv8.3-a complex
> > multiplication and add instructions with a rotate along the Argand plane.
> >
> > The instructions are documented in the ArmARM[1] and the intrinsics 
> > specification
> > will be published on the Arm website [2].
> >
> > The Lane versions of these instructions are special in that they always
> > select a pair: using index 0 means selecting lanes 0 and 1.  Because of
> > this the range check for the intrinsics requires special handling.
> >
> > On Arm, in order to implement some of the lane intrinsics we're using the 
> > structure of the
> > register file.  The lane variant of these instructions always select a D 
> > register, but the data
> > itself can be stored in Q registers.  This means that for single precision 
> > complex numbers you are
> > only allowed to select D[0] but using the register file layout you can get 
> > the range 0-1 for lane indices
> > by selecting between Dn[0] and Dn+1[0].
> >
> > Same reasoning applies for half float complex numbers, except there your D 
> > register indexes can be 0 or 1, so you have
> > a total range of 4 elements (for a V8HF).
> >
> >
> > [1] 
> > https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
> > [2] https://developer.arm.com/docs/101028/latest
> >
> > Bootstrapped Regtested on arm-none-gnueabihf and no issues.
> >
> > Ok for trunk?
> >
> 
> Ok.
> Thanks,
> Kyrill
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 2018-12-11  Tamar Christina  
> >
> > * config/arm/arm-builtins.c
> > (enum arm_type_qualifiers): Add qualifier_lane_pair_index.
> > (MAC_LANE_PAIR_QUALIFIERS): New.
> > (arm_expand_builtin_args): Use it.
> > (arm_expand_builtin_1): Likewise.
> > * config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
> > * config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
> > * config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
> > * config/arm/arm_neon.h:
> > (vcadd_rot90_f16): New.
> > (vcaddq_rot90_f16): New.
> > (vcadd_rot270_f16): New.
> > (vcaddq_rot270_f16): New.
> > (vcmla_f16): New.
> > (vcmlaq_f16): New.
> > (vcmla_lane_f16): New.
> > (vcmla_laneq_f16): New.
> > (vcmlaq_lane_f16): New.
> > (vcmlaq_laneq_f16): New.
> > (vcmla_rot90_f16): New.
> > (vcmlaq_rot90_f16): New.
> > (vcmla_rot90_lane_f16): New.
> > (vcmla_rot90_laneq_f16): New.
> > (vcmlaq_rot90_lane_f16): New.
> > (vcmlaq_rot90_laneq_f16): New.
> > (vcmla_rot180_f16): New.
> > (vcmlaq_rot180_f16): New.
> > (vcmla_rot180_lane_f16): New.
> > (vcmla_rot180_laneq_f16): New.
> > (vcmlaq_rot180_lane_f16): New.
> > (vcmlaq_rot180_laneq_f16): New.
> > (vcmla_rot270_f16): New.
> > (vcmlaq_rot270_f16): New.
> > (vcmla_rot270_lane_f16): New.
> > (vcmla_rot270_laneq_f16): New.
> > (vcmlaq_rot270_lane_f16): New.
> > (vcmlaq_rot270_laneq_f16): New.
> > (vcadd_rot90_f32): New.
> > (vcaddq_rot90_f32): New.
> > (vcadd_rot270_f32): New.
> > (vcaddq_rot270_f32): New.
> > (vcmla_f32): New.
> > (vcmlaq_f32): New.
> > (vcmla_lane_f32): New.
> > (vcmla_laneq_f32): New.
> > (vcmlaq_lane_f32): New.
> > (vcmlaq_laneq_f32): New.
> > (vcmla_rot90_f32): New.
> > (vcmlaq_rot90_f32): New.
> > (vcmla_rot90_lane_f32): New.
> > (vcmla_rot90_laneq_f32): New.
> > (vcmlaq_rot90_lane_f32): New.
> > (vcmlaq_rot90_laneq_f32): New.
> > (vcmla_rot180_f32): New.
> > (vcmlaq_rot180_f32): New.
> > (vcmla_rot180_lane_f32): New.
> > (vcmla_rot180_laneq_f32): New.
> > (vcmlaq_rot180_lane_f32): New.
> > (vcmlaq_rot180_laneq_f32): New.
> > (vcmla_rot270_f32): New.
> > (vcmlaq_rot270_f32): New.
> > (vcmla_rot270_lane_f32): New.
> > (vcmla_rot270_laneq_f32): New.
> > (vcmlaq_rot270_lane_f32): New.
> > (vcmlaq_rot270_laneq_f32): New.
> > * config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, 
> > vcmla90,
> > vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180, 
> > vcmla_lane270,
> > vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
> > vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270): New.
> > * config/arm/neon.md (neon_vcmla_lane,
> > neon_vcmla_laneq, 

Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-21 Thread Steve Ellcey
Here is an update to the test part of this patch.  I did not change the
actual source code part of this, just the tests, so that is all I am
including here.  I removed the x86 changes that had gotten in there by
accident and used relative line numbers in the warning checks instead
of absolute line numbers.  I also moved the warning checks to be closer
to the lines where the warnings are generated.

Retested on x86 and aarch64 with no regressions.

Steve Ellcey
sell...@cavium.com


2018-12-21  Steve Ellcey  

* g++.dg/gomp/declare-simd-1.C: Add aarch64 specific
warning checks and assembler scans.
* g++.dg/gomp/declare-simd-3.C: Ditto.
* g++.dg/gomp/declare-simd-4.C: Ditto.
* g++.dg/gomp/declare-simd-7.C: Ditto.
* gcc.dg/gomp/declare-simd-1.c: Ditto.
* gcc.dg/gomp/declare-simd-3.c: Ditto.

diff --git a/gcc/testsuite/g++.dg/gomp/declare-simd-1.C b/gcc/testsuite/g++.dg/gomp/declare-simd-1.C
index d2659e1..f44efd5 100644
--- a/gcc/testsuite/g++.dg/gomp/declare-simd-1.C
+++ b/gcc/testsuite/g++.dg/gomp/declare-simd-1.C
@@ -14,6 +14,7 @@ int f2 (int a, int *b, int c)
   return a + *b + c;
 }
 
+// { dg-warning "GCC does not currently support simdlen 8 for type 'int'" "" { target aarch64-*-* } .-5 }
 // { dg-final { scan-assembler-times "_ZGVbM8uva32l4__Z2f2iPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVbN8uva32l4__Z2f2iPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVcM8uva32l4__Z2f2iPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
@@ -89,6 +90,7 @@ namespace N1
 // { dg-final { scan-assembler-times "_ZGVdN2va16__ZN2N12N23f10EPx:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeM2va16__ZN2N12N23f10EPx:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeN2va16__ZN2N12N23f10EPx:" 1 { target { i?86-*-* x86_64-*-* } } } }
+// { dg-final { scan-assembler-times "_ZN2N12N23f10EPx:" 1 { target { aarch64-*-* } } } }
 
 struct A
 {
@@ -191,6 +193,7 @@ int B::f25<7> (int a, int *b, int c)
   return a + *b + c;
 }
 
+// { dg-warning "unsupported argument type 'B' for simd" "" { target aarch64-*-* } .-5 }
 // { dg-final { scan-assembler-times "_ZGVbM8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVbN8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVcM8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
@@ -216,6 +219,7 @@ int B::f26<-1> (int a, int *b, int c)
 // { dg-final { scan-assembler-times "_ZGVdN4vl2va32__ZN1BIiE3f26ILin1EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeM4vl2va32__ZN1BIiE3f26ILin1EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeN4vl2va32__ZN1BIiE3f26ILin1EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } } } }
+// { dg-final { scan-assembler-times "_ZN1BIiE3f26ILin1EEEiiPii:" 1 { target { aarch64-*-* } } } }
 
 int
 f27 (int x)
@@ -239,6 +243,7 @@ f30 (int x)
   return x;
 }
 
+// { dg-warning "GCC does not currently support simdlen 16 for type 'int'" "" { target aarch64-*-* } .-7 }
 // { dg-final { scan-assembler-times "_ZGVbM16v__Z3f30i:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVbN16v__Z3f30i:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVcM16v__Z3f30i:" 1 { target { i?86-*-* x86_64-*-* } } } }
@@ -281,6 +286,7 @@ struct D
   int f37 (int a);
   int e;
 };
+// { dg-warning "GCC does not currently support simdlen 16 for type 'int'" "" { target aarch64-*-* } .-3 }
 
 void
 f38 (D &d)
diff --git a/gcc/testsuite/g++.dg/gomp/declare-simd-3.C b/gcc/testsuite/g++.dg/gomp/declare-simd-3.C
index 32cdc58..3d668ff 100644
--- a/gcc/testsuite/g++.dg/gomp/declare-simd-3.C
+++ b/gcc/testsuite/g++.dg/gomp/declare-simd-3.C
@@ -21,6 +21,8 @@ int f1 (int a, int b, int c, int &d, int &e, int &f)
 // { dg-final { scan-assembler-times "_ZGVdN8vulLUR4__Z2f1iiiRiS_S_:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeM16vulLUR4__Z2f1iiiRiS_S_:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeN16vulLUR4__Z2f1iiiRiS_S_:" 1 { target { i?86-*-* x86_64-*-* } } } }
+// { dg-final { scan-assembler-times "_ZGVnN4vulLUR4__Z2f1iiiRiS_S_:" 1 { target { aarch64-*-* } } } }
+  
 
 #pragma omp declare simd uniform(b) linear(c, d) linear(uval(e)) linear(ref(f))
int f2 (int a, int b, int c, int &d, int &e, int &f)
@@ -48,6 +50,7 @@ int f2 (int a, int b, int c, int , int , int )
 // { dg-final { scan-assembler-times "_ZGVdN8vulLUR4__Z2f2iiiRiS_S_:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { dg-final { scan-assembler-times "_ZGVeM16vulLUR4__Z2f2iiiRiS_S_:" 1 { target { i?86-*-* x86_64-*-* } } } }
 // { 

Re: [ping] Change static chain to r11 on aarch64

2018-12-21 Thread Wilco Dijkstra
Hi Olivier,

> I'm experimenting with the idea of adjusting the
> stack probing code using r9 today, to see if it could
> save/restore that reg if it happens to be the static chain
> as well.
>
> If that can be made to work, maybe that would be a better
> alternative than just swapping and have the stack probing
> code use r10 and r11 instead (1 fewer register with dedicated
> use).

Remember these are just temporaries for use in the prolog and epilog -
there is no need to save/restore the static chain. Setting static chain
to x9 and the temporaries to x10/x11 is the simplest solution. We
can separately look at why the prolog uses more than a single
temporary.

Cheers,
Wilco



Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-21 Thread Steve Kargl
On Fri, Dec 21, 2018 at 11:07:08AM +0200, Janne Blomqvist wrote:
> On Fri, Dec 21, 2018 at 8:22 AM Steve Kargl <
> s...@troutmask.apl.washington.edu> wrote:
> 
> > On Thu, Dec 20, 2018 at 01:47:39PM -0800, Steve Kargl wrote:
> > > The attached patch has been tested on x86_64-*-freebsd.
> > >
> > > OK to commit?
> > >
> > > 2018-12-20  Steven G. Kargl  
> > >
> > >   PR fortran/69121
> > >   * libgfortran/ieee/ieee_arithmetic.F90: Provide missing functions
> > >   in interface for IEEE_SCALB.
> > >
> > > 2018-12-20  Steven G. Kargl  
> > >
> > >   PR fortran/69121
> > >   * gfortran.dg/ieee/ieee_9.f90: New test.
> >
> > Now, tested on i586-*-freebsd.
> >
> 
> Hi, looks ok for trunk.
> 
> A few questions popped into my mind while looking into this:
> 
> 1) Why are none of the _gfortran_ieee_scalb_X_Y functions mentioned in
> gfortran.map? I guess they should all be there?
> 
> 2) Currently all the intrinsics map to the scalbn{,f,l} builtins. However,
> when the integer argument is of kind int64 or int128 we should instead use
> scalbln{,f,l}. This also applies to other intrinsics that use scalbn under
> the hood.
> 
> To clarify, fixing these is not a prerequisite for accepting the patch (I
> already accepted it), but more like topics for further work.
> 

I forgot to address the 2) item you had above.  ieee_scalb appears
to do the right thing.  FX addressed that with his implementation.
The 2nd argument is always cast to integer after reducing the range
to that of integer(4).

The binary floating point representation for a REAL(16) finite number
is x=f*2**e with f in [0.5,1) and e in [-16059,16384].  scalb(x,n) is
x*2**n, which becomes f*2**e*2**n = f*2**(e+n).  If x is the smallest
positive subnormal number, then n can be at most 32443 to still return 
a finite REAL(16) number.  Any larger value overflows to infinity.
If x is the largest positive finite number, then n can be -32443 to
return the smallest positive subnormal number.  Any more negative value
of n underflows to zero.  (Note, I could be off-by-one, but that is
just a detail.)

Consider

function foo(x,i)
   use ieee_arithmetic
   real(16) foo, c
   integer(8) i
   print *, ieee_scalb(c, i)
end function foo

-fdump-tree-original gives 
 
D.3853 = *i;
__result_foo = scalbnq (c,
(integer(kind=4)) MAX_EXPR <MIN_EXPR <D.3853, 2147483647>, -2147483647>);

The range [-32443,32443] is a subset of [-huge(0), huge(0)].

-- 
Steve


Re: [patch] Fix PR rtl-optimization/87727

2018-12-21 Thread Segher Boessenkool
On Fri, Dec 21, 2018 at 06:35:14PM +0100, Richard Biener wrote:
> On Fri, Dec 21, 2018 at 5:25 PM Segher Boessenkool
>  wrote:
> >
> > On Fri, Dec 21, 2018 at 04:55:28PM +0100, Richard Biener wrote:
> > > On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov  
> > > wrote:
> > > > On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > > > > On 12/20/18 4:41 PM, Jeff Law wrote:
> > > > >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> > > >I am just saying that you need at least have cost for each insn
> > > > alternative (may be sub-targets).  Although some approximation can be
> > > > possible (like insns number generated from the alternative or even their
> > > > size).
> >
> > For RISC targets, most instructions have exactly the same cost (and all
> > have the same size, or just a few sizes if you look at thumb etc.)
> >
> > > A further away (in pass distance) but maybe related project is to
> > > replace the current "instruction selection" (I'm talking about RTL
> > > expansion)
> >
> > In current GCC the instruction selection is expand+combine really, and
> > more the latter even, for well-written backends anyway.  Most "smarts"
> > expand does does only get in the way, even.
> >
> > > with a scheme that works on (GIMPLE) SSA.  My
> > > rough idea for prototyping pieces would be to first do this
> > > completely on GIMPLE by replacing a "instruction" by
> > > a GIMPLE asm with an "RTL" body (well, that doesn't have to
> > > be explicit, it just needs to remember the insn chosen). The
> > > available patterns are readily available in the .md files, we
> > > just need some GIMPLE <-> RTL translation of the operations.
> > >
> > > In the end this would do away with our named patterns
> > > for expansion purposes.
> >
> > That sounds nice :-)
> >
> > Do you see some way we can transition to such a scheme bit by bit, or
> > will there be a flag day?
> 
> Well, we could do a "pre-expand" GIMPLE instruction selection
> phase doing instruction selection on (parts) of the IL either
> substituting internal-function calls and use direct-optabs for
> later RTL expansion (that would then introduce target-specific
> internal functions) or try using the suggested scheme^Whack
> of using a GIMPLE ASM kind with instead of the asm text
> something that RTL expansion can work with.  The ASM approach
> has the advantage that we could put in constraints to guide RTL
> expansion, avoiding more "magic" (aka recog) there.

Hrm, so a special kind of GIMPLE ASM, let's call it "GIMPLE RTL"...
That sounds good yes!  As an intermediate, of course :-)

> Not sure what the hard part here is, but I guess it might be
> mapping of GIMPLE SSA to .md file define-insn patterns.

Expand does so *much* currently.  Maybe it shouldn't.  But then we
need to move much of what it does to a better place, because not all
of it is useless.

> Or maybe not.  As said, it should be reasonable easy to
> handle it for the standard named patterns which is where
> you could prototype the plumbing w/o doing the .md file
> parsing and matcher auto-generation.


Segher


RE: [PATCH 6/9][GCC][AArch64] Add Armv8.3-a complex intrinsics

2018-12-21 Thread Tamar Christina
Hi All,

This updated patch adds NEON intrinsics and tests for the Armv8.3-a complex
multiplication and add instructions with a rotate along the Argand plane.

The instructions are documented in the ArmARM [1] and the intrinsics
specification will be published on the Arm website [2].

The Lane versions of these instructions are special in that they always
select a pair: using index 0 means selecting lanes 0 and 1.  Because of this
the range check for the intrinsics requires special handling.

There are a few complexities with the intrinsics for the laneq variants for
AArch64:

1) The architecture does not have a version for V2SF.  However, since the
   instructions always select a pair of values, the only valid index for
   V2SF would have been 0.  As such, the lane versions for V2SF are all
   mapped to the 3SAME variant of the instructions and not the By-element
   variant.

2) Because of no. 1 above, the laneq versions of the instruction become
   tricky.  The valid indices are 0 and 1.  For index 0 we treat it the
   same as the lane version of this instruction and just pass the lower
   half of the register to the 3SAME instruction.  When the index is 1 we
   extract the upper half of the register and pass that to the 3SAME
   version of the instruction.

3) The architecture forbids the laneq version of the V4HF instruction from
   having an index greater than 1.  For indices 0-1 we do no extra work.
   For indices 2-3 we extract the upper parts of the register and pass that
   to the instruction it would normally have used, and re-map the index
   into the range 0-1.

[1] 
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
[2] https://developer.arm.com/docs/101028/latest

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
Additional runtime checks done but not posted with the patch.

Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-12-22  Tamar Christina  

* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers): Add 
qualifier_lane_pair_index.
(emit-rtl.h): Include.
(TYPES_QUADOP_LANE_PAIR): New.
(aarch64_simd_expand_args): Use it.
(aarch64_simd_expand_builtin): Likewise.
(AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_laneq_builtin_datum): 
New.
(FCMLA_LANEQ_BUILTIN, AARCH64_SIMD_FCMLA_LANEQ_BUILTIN_BASE,
AARCH64_SIMD_FCMLA_LANEQ_BUILTINS, aarch64_fcmla_lane_builtin_data,
aarch64_init_fcmla_laneq_builtins): New.
(aarch64_init_builtins): Add aarch64_init_fcmla_laneq_builtins.
(aarch64_expand_builtin): Add AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V2SF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V2SF, 
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V2SF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V2SF, 
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ0_V4HF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ90_V4HF, 
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ180_V4HF,
AARCH64_SIMD_BUILTIN_FCMLA_LANEQ270_V4HF.
* config/aarch64/iterators.md (FCMLA_maybe_lane): New.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add 
__ARM_FEATURE_COMPLEX.
* config/aarch64/aarch64-simd-builtins.def (fcadd90, fcadd270, fcmla0, 
fcmla90,
fcmla180, fcmla270, fcmla_lane0, fcmla_lane90, fcmla_lane180, 
fcmla_lane270,
fcmla_laneq0, fcmla_laneq90, fcmla_laneq180, fcmla_laneq270,
fcmlaq_lane0, fcmlaq_lane90, fcmlaq_lane180, fcmlaq_lane270): New.
* config/aarch64/aarch64-simd.md (aarch64_fcmla_lane,
aarch64_fcmla_laneqv4hf, aarch64_fcmlaq_lane): New.
* config/aarch64/arm_neon.h:
(vcadd_rot90_f16): New.
(vcaddq_rot90_f16): New.
(vcadd_rot270_f16): New.
(vcaddq_rot270_f16): New.
(vcmla_f16): New.
(vcmlaq_f16): New.
(vcmla_lane_f16): New.
(vcmla_laneq_f16): New.
(vcmlaq_lane_f16): New.
(vcmlaq_rot90_lane_f16): New.
(vcmla_rot90_laneq_f16): New.
(vcmla_rot90_lane_f16): New.
(vcmlaq_rot90_f16): New.
(vcmla_rot90_f16): New.
(vcmlaq_laneq_f16): New.
(vcmla_rot180_laneq_f16): New.
(vcmla_rot180_lane_f16): New.
(vcmlaq_rot180_f16): New.
(vcmla_rot180_f16): New.
(vcmlaq_rot90_laneq_f16): New.
(vcmlaq_rot270_laneq_f16): New.
(vcmlaq_rot270_lane_f16): New.
(vcmla_rot270_laneq_f16): New.
(vcmlaq_rot270_f16): New.
(vcmla_rot270_f16): New.
(vcmlaq_rot180_laneq_f16): New.
(vcmlaq_rot180_lane_f16): New.
(vcmla_rot270_lane_f16): New.
(vcadd_rot90_f32): New.
(vcaddq_rot90_f32): New.
(vcaddq_rot90_f64): New.
(vcadd_rot270_f32): New.
(vcaddq_rot270_f32): New.
(vcaddq_rot270_f64): New.
(vcmla_f32): New.
(vcmlaq_f32): New.
(vcmlaq_f64): New.
(vcmla_lane_f32): New.
(vcmla_laneq_f32): New.
(vcmlaq_lane_f32): New.

[PATCH, committed] Changing maintainer email address

2018-12-21 Thread Thomas Preudhomme
Hi,

I've updated my email address in MAINTAINERS file since I'm leaving my
company. I'll do the copyright assignment paperwork before
contributing any new patches.

Best regards,

Thomas
From c486e31b10ae0ec648ba256a92d5a4bcef1ef83d Mon Sep 17 00:00:00 2001
From: thopre01 
Date: Fri, 21 Dec 2018 17:53:03 +
Subject: [PATCH] Update maintainer email address

2018-12-21  Thomas Preud'homme  

* MAINTAINERS (Write After Approval): Update my maintainer address.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267330 138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog   | 4 
 MAINTAINERS | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/ChangeLog b/ChangeLog
index 11cfa2a6789..a86c3fc40c0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2018-12-21  Thomas Preud'homme  
+
+	* MAINTAINERS (Write After Approval): Update my maintainer address.
+
 2018-12-21  Gergö Barany  
 
 	* MAINTAINERS (Write After Approval): Add myself.
diff --git a/MAINTAINERS b/MAINTAINERS
index dcf744d023b..8ccd0ca7c33 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -537,7 +537,7 @@ Paul Pluzhnikov	
 Antoniu Pop	
 Siddhesh Poyarekar
 Vidya Praveen	
-Thomas Preud'homme
+Thomas Preud'homme
 Vladimir Prus	
 Yao Qi		
 Jerry Quinn	
-- 
2.19.1



Re: Fix hashtable node deallocation

2018-12-21 Thread François Dumont

Still waiting for this (I hope) last fix before gcc 9 release...

We could perhaps make:
  _Pointer_adapter(element_type* __arg = 0)

explicit to find out if there are other places where we fail to properly 
call pointer_traits<>::pointer_to. But I'm not sure we can do it in 
this extension and I haven't tried yet.


François

On 12/16/18 2:16 PM, François Dumont wrote:

Gentle reminder, we still have this issue pending.

    * include/bits/hashtable_policy.h
(_Hashtable_alloc<>::_M_deallocate_node_ptr(__node_type*)): New.
    (_Hashtable_alloc<>::_M_deallocate_node(__node_type*)): Use latter.
(_ReuseOrAllocNode<>::operator<_Arg>()(_Arg&&)): Likewise.
    * libstdc++-v3/testsuite/util/testsuite_allocator.h
    (CustomPointerAlloc<>::allocate(size_t, pointer)): Replace by...
    (CustomPointerAlloc<>::allocate(size_t, const_void_pointer)): 
...this.


François

On 11/29/18 7:08 AM, François Dumont wrote:

I am unclear about this patch; has it been accepted?


On 11/19/18 10:19 PM, François Dumont wrote:

On 11/19/18 1:34 PM, Jonathan Wakely wrote:

On 10/11/18 22:40 +0100, François Dumont wrote:
While working on a hashtable enhancement I noticed that we are not 
using the correct method to deallocate a node if the constructor 
throws in _ReuseOrAllocNode operator(). I had to introduce a new 
_M_deallocate_node_ptr for that, as the node value shall not be 
destroyed again.


I also checked other places and noticed that a __node_type 
destructor call was missing.


That's intentional. The type has a trivial destructor, so its storage
can just be reused, we don't need to destroy it.



Ok, do you want to also remove the other call to ~__node_type(), then?

Here is the updated patch and the right ChangeLog entry:

    * include/bits/hashtable_policy.h
(_Hashtable_alloc<>::_M_deallocate_node_ptr(__node_type*)): New.
(_Hashtable_alloc<>::_M_deallocate_node(__node_type*)): Use latter.
(_ReuseOrAllocNode<>::operator<_Arg>()(_Arg&&)): Likewise.
    (_Hashtable_alloc<>::_M_allocate_node): Add ~__node_type call.
    * libstdc++-v3/testsuite/util/testsuite_allocator.h
    (CustomPointerAlloc<>::allocate(size_t, pointer)): Replace by...
    (CustomPointerAlloc<>::allocate(size_t, const_void_pointer)): 
...this.

    * testsuite/23_containers/unordered_set/allocator/ext_ptr.cc: Add
    check.

Ok to commit ?

François









Re: [patch] Fix PR rtl-optimization/87727

2018-12-21 Thread Richard Biener
On Fri, Dec 21, 2018 at 6:35 PM Richard Biener
 wrote:
>
> On Fri, Dec 21, 2018 at 5:25 PM Segher Boessenkool
>  wrote:
> >
> > On Fri, Dec 21, 2018 at 04:55:28PM +0100, Richard Biener wrote:
> > > On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov  
> > > wrote:
> > > > On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > > > > On 12/20/18 4:41 PM, Jeff Law wrote:
> > > > >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> > > >I am just saying that you need at least have cost for each insn
> > > > alternative (may be sub-targets).  Although some approximation can be
> > > > possible (like insns number generated from the alternative or even their
> > > > size).
> >
> > For RISC targets, most instructions have exactly the same cost (and all
> > have the same size, or just a few sizes if you look at thumb etc.)
> >
> > > A further away (in pass distance) but maybe related project is to
> > > replace the current "instruction selection" (I'm talking about RTL
> > > expansion)
> >
> > In current GCC the instruction selection is expand+combine really, and
> > more the latter even, for well-written backends anyway.  Most "smarts"
> > expand does does only get in the way, even.
> >
> > > with a scheme that works on (GIMPLE) SSA.  My
> > > rough idea for prototyping pieces would be to first do this
> > > completely on GIMPLE by replacing a "instruction" by
> > > a GIMPLE asm with an "RTL" body (well, that doesn't have to
> > > be explicit, it just needs to remember the insn chosen). The
> > > available patterns are readily available in the .md files, we
> > > just need some GIMPLE <-> RTL translation of the operations.
> > >
> > > In the end this would do away with our named patterns
> > > for expansion purposes.
> >
> > That sounds nice :-)
> >
> > Do you see some way we can transition to such a scheme bit by bit, or
> > will there be a flag day?
>
> Well, we could do a "pre-expand" GIMPLE instruction selection
> phase doing instruction selection on (parts) of the IL either
> substituting internal-function calls and use direct-optabs for
> later RTL expansion (that would then introduce target-specific
> internal functions) or try using the suggested scheme^Whack
> of using a GIMPLE ASM kind with instead of the asm text
> something that RTL expansion can work with.  The ASM approach
> has the advantage that we could put in constraints to guide RTL
> expansion, avoiding more "magic" (aka recog) there.
>
> Not sure what the hard part here is, but I guess it might be
> mapping of GIMPLE SSA to .md file define-insn patterns.
>
> Or maybe not.  As said, it should be reasonable easy to
> handle it for the standard named patterns which is where
> you could prototype the plumbing w/o doing the .md file
> parsing and matcher auto-generation.

To expand on this I was thinking about doing such partial
transition to get rid of TER - all the cases TER now is
required for would be "early instruction selected".

Richard.

> Richard.
>
> >
> > Segher


Re: [patch] Fix PR rtl-optimization/87727

2018-12-21 Thread Richard Biener
On Fri, Dec 21, 2018 at 5:25 PM Segher Boessenkool
 wrote:
>
> On Fri, Dec 21, 2018 at 04:55:28PM +0100, Richard Biener wrote:
> > On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov  
> > wrote:
> > > On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > > > On 12/20/18 4:41 PM, Jeff Law wrote:
> > > >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> > >I am just saying that you need at least have cost for each insn
> > > alternative (may be sub-targets).  Although some approximation can be
> > > possible (like insns number generated from the alternative or even their
> > > size).
>
> For RISC targets, most instructions have exactly the same cost (and all
> have the same size, or just a few sizes if you look at thumb etc.)
>
> > A further away (in pass distance) but maybe related project is to
> > replace the current "instruction selection" (I'm talking about RTL
> > expansion)
>
> In current GCC the instruction selection is expand+combine really, and
> more the latter even, for well-written backends anyway.  Most "smarts"
> expand does does only get in the way, even.
>
> > with a scheme that works on (GIMPLE) SSA.  My
> > rough idea for prototyping pieces would be to first do this
> > completely on GIMPLE by replacing a "instruction" by
> > a GIMPLE asm with an "RTL" body (well, that doesn't have to
> > be explicit, it just needs to remember the insn chosen). The
> > available patterns are readily available in the .md files, we
> > just need some GIMPLE <-> RTL translation of the operations.
> >
> > In the end this would do away with our named patterns
> > for expansion purposes.
>
> That sounds nice :-)
>
> Do you see some way we can transition to such a scheme bit by bit, or
> will there be a flag day?

Well, we could do a "pre-expand" GIMPLE instruction selection
phase doing instruction selection on (parts) of the IL either
substituting internal-function calls and use direct-optabs for
later RTL expansion (that would then introduce target-specific
internal functions) or try using the suggested scheme^Whack
of using a GIMPLE ASM kind with, instead of the asm text,
something that RTL expansion can work with.  The ASM approach
has the advantage that we could put in constraints to guide RTL
expansion, avoiding more "magic" (aka recog) there.

Not sure what the hard part here is, but I guess it might be
mapping of GIMPLE SSA to .md file define-insn patterns.

Or maybe not.  As said, it should be reasonably easy to
handle it for the standard named patterns which is where
you could prototype the plumbing w/o doing the .md file
parsing and matcher auto-generation.

Richard.

>
> Segher


Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-21 Thread Thomas Koenig

Hi Steve,


No, I'm adding the missing functions to the INTERFACE.


Ah, I see. What I missed was that the function is actually translated
to something else.

So, OK for trunk, and thanks for the patch!

Regards

Thomas


[PATCH] Improve PR85574

2018-12-21 Thread Richard Biener


It looks like IPA ICF does code generation based on the order of
a hashtable walk which is keyed on pointers.  That's a no-no.
Fixing that doesn't solve the cc1 miscompare for LTO bootstrap
but at least the IPA ICF WPA dumps are now consistent between
stages.

[LTO] Bootstrapped and tested on x86_64-unknown-linux-gnu, ok
for trunk (and branches)?

Will only get to applying after Christmas holidays so in case
you want to poke further feel free to apply yourself.

Thanks,
Richard.

2018-12-21  Richard Biener  

PR ipa/85574
* ipa-icf.h (sem_item_optimizer::sort_congruence_split): Declare.
* ipa-icf.c (sem_item_optimizer::sort_congruence_split): New
function.
(sem_item_optimizer::do_congruence_step_f): Sort the congruence
set after UIDs before splitting them.

Index: gcc/ipa-icf.c
===
--- gcc/ipa-icf.c   (revision 267301)
+++ gcc/ipa-icf.c   (working copy)
@@ -3117,6 +3117,18 @@ sem_item_optimizer::traverse_congruence_
   return true;
 }
 
+int
+sem_item_optimizer::sort_congruence_split (const void *a_, const void *b_)
+{
+  const std::pair<congruence_class *, bitmap> *a
+    = (const std::pair<congruence_class *, bitmap> *) a_;
+  const std::pair<congruence_class *, bitmap> *b
+    = (const std::pair<congruence_class *, bitmap> *) b_;
+  if (a->first->id < b->first->id)
+    return -1;
+  else if (a->first->id > b->first->id)
+    return 1;
+  return 0;
+}
+
 /* Tests if a class CLS used as INDEXth splits any congruence classes.
Bitmap stack BMSTACK is used for bitmap allocation.  */
 
@@ -3157,13 +3169,20 @@ sem_item_optimizer::do_congruence_step_f
}
 }
 
+  auto_vec<std::pair<congruence_class *, bitmap> > to_split;
+  to_split.reserve_exact (split_map.elements ());
+  for (hash_map ::iterator i = split_map.begin ();
+   i != split_map.end (); ++i)
+    to_split.safe_push (*i);
+  to_split.qsort (sort_congruence_split);
+
   traverse_split_pair pair;
   pair.optimizer = this;
   pair.cls = cls;
 
   splitter_class_removed = false;
-  split_map.traverse  ();
+  for (unsigned i = 0; i < to_split.length (); ++i)
+    traverse_congruence_split (to_split[i].first, to_split[i].second, &pair);
 
   /* Bitmap clean-up.  */
   split_map.traverse 

[PATCH] Add myself to MAINTAINERS

2018-12-21 Thread Gergö Barany

Hi all,

this patch adds me to MAINTAINERS in the Write After Approval section.

Will commit to trunk.


Thanks,
Gergö
From 2b5e62781aadfb5d89f6b11f4c4cb8e5cfe373be Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerg=C3=B6=20Barany?= 
Date: Fri, 21 Dec 2018 09:05:35 -0800
Subject: [PATCH] Add myself to MAINTAINERS.

* MAINTAINERS (Write After Approval): Add myself.
---
 ChangeLog   | 4 
 MAINTAINERS | 1 +
 2 files changed, 5 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index e36fc6f..11cfa2a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2018-12-21  Gergö Barany  
+
+	* MAINTAINERS (Write After Approval): Add myself.
+
 2018-12-10  Segher Boessenkool  
 
 	* contrib/config-list.mk: Remove powerpc-eabispe and powerpc-linux_spe.
diff --git a/MAINTAINERS b/MAINTAINERS
index 5d88479..dcf744d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -311,6 +311,7 @@ Giovanni Bajo	
 Simon Baldwin	
 Scott Bambrough	
 Wolfgang Bangerth
+Gergö Barany	
 Charles Baylis	
 Tejas Belagod	
 Jon Beniston	
-- 
2.8.1



Re: [patch,openacc] Fix PR71959: lto dump of callee counts

2018-12-21 Thread Julian Brown
Hi Jakub,

Thanks for review!

On Fri, 21 Dec 2018 14:31:19 +0100
Jakub Jelinek  wrote:

> On Fri, Dec 21, 2018 at 01:23:03PM +, Julian Brown wrote:
> > 2018-xx-yy  Nathan Sidwell  
> > 
> > PR lto/71959
> > libgomp/
> > * testsuite/libgomp.oacc-c++/pr71959-a.C: New.
> > * testsuite/libgomp.oacc-c++/pr71959.C: New.  
> 
> Just nits, better use pr71959-aux.cc (*.cc files aren't considered as
> testcases by *.exp:
> set tests [lsort [concat \
>   [find $srcdir/$subdir *.C] \
>   [find
> $srcdir/$subdir/../libgomp.oacc-c-c++-common *.c]]] ) and just 'a' is
> weird.

Fixed.

> > commit c69dce8ba0ecd7ff620f4f1b8dacc94c61984107
> > Author: Julian Brown 
> > Date:   Wed Dec 19 05:01:58 2018 -0800
> > 
> > Add testcase from PR71959
> > 
> > libgomp/  
> 
> Please mention
>   PR lto/71959
> here in the ChangeLog.

Fixed.

> > * testsuite/libgomp.oacc-c++/pr71959-a.C: New.
> > * testsuite/libgomp.oacc-c++/pr71959.C: New.  
> 
> > +void apply (int (*fn)(), Iter out) asm
> > ("_ZN5Apply5applyEPFivE4Iter");  
> 
> Will this work even on targets that use _ or other symbol prefixes?

I'd guess so, else there would be no portable way of using "asm" to
write pre-mangled C++ names. The only existing similar uses I could find
in the testsuite are for the ifunc attribute, not asm, though (e.g.
g++.dg/ext/attr-ifunc-*.C).

Anyway, OpenACC is only useful for a handful of targets at present,
none of which use special symbol prefixes AFAIK.

> > --- /dev/null
> > +++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
> > @@ -0,0 +1,31 @@
> > +// { dg-additional-sources "pr71959-a.C" }
> > +
> > +// pr lto/71959 ICEd LTO due to mismatch between writing & reading
> > behaviour  
> 
> Capital PR instead of pr .

Fixed. OK now?

Thanks,

Julian


Re: [ping] Change static chain to r11 on aarch64

2018-12-21 Thread Olivier Hainque
Hi Wilco,

(and thanks everyone for the interesting input on this)

> On 17 Dec 2018, at 14:55, Wilco Dijkstra  wrote:
> 
> The AArch64 ABI defines x18 as platform specific:
[...]
> Using x9 would make its use as an extra argument clearer.

I'm experimenting with the idea of adjusting the
stack probing code using r9 today, to see if it could
save/restore that reg if it happens to be the static chain
as well.

If that can be made to work, maybe that would be a better
alternative than just swapping and have the stack probing
code use r10 and r11 instead (1 fewer register with dedicated
use).

Olivier



Re: [patch] Fix PR rtl-optimization/87727

2018-12-21 Thread Segher Boessenkool
On Fri, Dec 21, 2018 at 04:55:28PM +0100, Richard Biener wrote:
> On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov  wrote:
> > On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > > On 12/20/18 4:41 PM, Jeff Law wrote:
> > >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> >I am just saying that you need at least have cost for each insn
> > alternative (may be sub-targets).  Although some approximation can be
> > possible (like insns number generated from the alternative or even their
> > size).

For RISC targets, most instructions have exactly the same cost (and all
have the same size, or just a few sizes if you look at thumb etc.)

> A further away (in pass distance) but maybe related project is to
> replace the current "instruction selection" (I'm talking about RTL
> expansion)

In current GCC the instruction selection is expand+combine really, and
more the latter even, for well-written backends anyway.  Most of the
"smarts" expand does only get in the way, even.

> with a scheme that works on (GIMPLE) SSA.  My
> rough idea for prototyping pieces would be to first do this
> completely on GIMPLE by replacing a "instruction" by
> a GIMPLE asm with an "RTL" body (well, that doesn't have to
> be explicit, it just needs to remember the insn chosen). The
> available patterns are readily available in the .md files, we
> just need some GIMPLE <-> RTL translation of the operations.
> 
> In the end this would do away with our named patterns
> for expansion purposes.

That sounds nice :-)

Do you see some way we can transition to such a scheme bit by bit, or
will there be a flag day?


Segher


Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts (revised, v3)

2018-12-21 Thread Chung-Lin Tang

On 2018/12/19 5:03 AM, Thomas Schwinge wrote:

Hi Chung-Lin!

On Tue, 18 Dec 2018 23:06:38 +0800, Chung-Lin Tang  
wrote:

this part includes some of the lookup_goacc_asyncqueue fixes we talked about.
I am still thinking about how the queue lock problem should really be solved,
so regard this patch as just fixing some of the problems.


Hi Thomas,
This is my solution to the queue lock stuff we talked about. I've only
attached a diff to be applied atop the existing changes, though that may
actually be easier to review.

Note that this is still in testing, which means it hasn't been tested :P,
but I'm posting to see if you have time to give it a look before the holidays.

Having the need for a lock on the async queues is quite irritating, especially
when the structure needed for managing them is quite simple.

Therefore, let's do away with the need for locks entirely.

This patch makes the asyncqueue part of device->openacc.async managed by
lock-free atomic operations; almost all of the complexity is contained in
lookup_goacc_asyncqueue(), so it should not be too complex. A descriptor and
the queue array are allocated and exchanged atomically when more storage is
required, while in the common case a simple lookup is enough.
The fact that we only ever expand, and never destroy, asyncqueues during the
device lifetime also simplifies many things.

The current implementation may not be that optimized, and clunky in some
cases, but I think this should be the way to implement what should be a
fairly simple asyncqueue array and active list. I'll update more on the
status as testing proceeds.

(and about the other corners you noticed in the last mail, I'll get to that
later...)

Thanks,
Chung-Lin

Sure, thanks.

Two comments, though:


--- libgomp/oacc-async.c (revision 267226)
+++ libgomp/oacc-async.c (working copy)



+attribute_hidden struct goacc_asyncqueue *
+lookup_goacc_asyncqueue (struct goacc_thread *thr, bool create, int async)
+{
+  /* The special value acc_async_noval (-1) maps to the thread-specific
+ default async stream.  */
+  if (async == acc_async_noval)
+async = thr->default_async;
+
+  if (async == acc_async_sync)
+return NULL;
+
+  if (async < 0)
+gomp_fatal ("bad async %d", async);
+
+  struct gomp_device_descr *dev = thr->dev;
+
+  gomp_mutex_lock (&dev->openacc.async.lock);
+
+  if (!create
+  && (async >= dev->openacc.async.nasyncqueue
+ || !dev->openacc.async.asyncqueue[async]))
+{
+  gomp_mutex_unlock (&dev->openacc.async.lock);
+  return NULL;
+}
+
+  if (async >= dev->openacc.async.nasyncqueue)
+{
+  int diff = async + 1 - dev->openacc.async.nasyncqueue;
+  dev->openacc.async.asyncqueue
+   = gomp_realloc (dev->openacc.async.asyncqueue,
+   sizeof (goacc_aq) * (async + 1));
+  memset (dev->openacc.async.asyncqueue + dev->openacc.async.nasyncqueue,
+ 0, sizeof (goacc_aq) * diff);
+  dev->openacc.async.nasyncqueue = async + 1;
+}
+
+  if (!dev->openacc.async.asyncqueue[async])
+{
+  dev->openacc.async.asyncqueue[async] = dev->openacc.async.construct_func ();
+
+  if (!dev->openacc.async.asyncqueue[async])
+   {
+ gomp_mutex_unlock (&dev->openacc.async.lock);
+ gomp_fatal ("async %d creation failed", async);
+   }


That will now always fail for host fallback, where
"host_openacc_async_construct" just always does "return NULL".

Actually, if the device doesn't support asyncqueues, this whole function
should turn into some kind of no-op, so that we don't again and again try
to create a new one for every call to "lookup_goacc_asyncqueue".

I'm attaching one possible solution.  I think it's fine to assume that
the majority of devices will support asyncqueues, and for those that
don't, this is just a one-time overhead per async-argument.  So, no
special handling required in "lookup_goacc_asyncqueue".


+  /* Link new async queue into active list.  */
+  goacc_aq_list n = gomp_malloc (sizeof (struct goacc_asyncqueue_list));
+  n->aq = dev->openacc.async.asyncqueue[async];
+  n->next = dev->openacc.async.active;
+  dev->openacc.async.active = n;
+}
+  gomp_mutex_unlock (&dev->openacc.async.lock);


You still need to keep "async" locked during...


+  return dev->openacc.async.asyncqueue[async];


... this dereference.


+}



Oh, and:


--- libgomp/oacc-plugin.c   (revision 267226)
+++ libgomp/oacc-plugin.c   (working copy)
@@ -31,14 +31,10 @@
  #include "oacc-int.h"
  
  void

-GOMP_PLUGIN_async_unmap_vars (void *ptr, int async)
+GOMP_PLUGIN_async_unmap_vars (void *ptr __attribute__((unused)),
+ int async __attribute__((unused)))
  {
-  struct target_mem_desc *tgt = ptr;
-  struct gomp_device_descr *devicep = tgt->device_descr;
-
-  devicep->openacc.async_set_async_func (async);
-  gomp_unmap_vars (tgt, true);
-  devicep->openacc.async_set_async_func (acc_async_sync);

Re: [PATCH] attribute copy, leaf, weakref and -Wmissing-attributes (PR 88546)

2018-12-21 Thread Martin Sebor

On 12/21/18 3:05 AM, Jakub Jelinek wrote:

Hi!

I think the main question is whether we should accept leaf attribute
on weakrefs, despite them being marked as !TREE_PUBLIC.

I know we haven't allowed that until now, but weakrefs are weirdo things
which have both static and external effects: static in that they are a
local alias, and external in that they are actually aliases to (usually)
external functions.  If we add a weakref for some function declared as leaf,
it is unnecessarily pessimizing when we don't allow the leaf attribute on
the weakref.

Your patch looks reasonable to me to revert to previous state, but if we
decide to change the above, it would need to change.


Yes.  I confess I don't fully understand the rationale for disallowing
attribute leaf on extern functions, or the implications of allowing it
again.  It's accepted by GCC 4.1 but in r108074 it was made an error.
(GCC 4.1 also requires the declaration to be extern and gives an error
for static.)  The description in the patch submission says it's because:

  - requires that 'weakref' is attached only to a static object,
because that's all that the object file formats support; and
  - makes it work on Darwin, or at least makes the testcases pass.

  https://gcc.gnu.org/ml/gcc-patches/2005-12/msg00375.html

The text added to the manual makes it sound like it was thought to be
a limitation that could be removed in the future:

  At present, a declaration to which @code{weakref} is attached can
  only be @code{static}.

I leave it up to you and others to decide if it's possible to accept
it on externs again.

That said, I'm also not sure the warning is necessarily the best way
to deal with the attribute mismatches in these cases (declarations
of aliases in .c files).  Wouldn't it make more sense to copy
the attributes from targets to their aliases unconditionally?

Joseph, any thoughts based on your experience with the warning (and
attribute copy) in Glibc?


On Thu, Dec 20, 2018 at 08:45:03PM -0700, Martin Sebor wrote:

--- gcc/c-family/c-attribs.c(revision 267282)
+++ gcc/c-family/c-attribs.c(working copy)
@@ -2455,6 +2455,12 @@ handle_copy_attribute (tree *node, tree name, tree
  || is_attribute_p ("weakref", atname))
continue;
  
+	  /* Aattribute leaf only applies to extern functions.

+Avoid copying it to static ones.  */


s/Aattribute/Attribute/


Fixed, thanks.

Martin


Re: [patch] Fix PR rtl-optimization/87727

2018-12-21 Thread Richard Biener
On Fri, Dec 21, 2018 at 4:25 PM Vladimir Makarov  wrote:
>
>
>
> On 12/20/2018 06:14 PM, Peter Bergner wrote:
> > On 12/20/18 4:41 PM, Jeff Law wrote:
> >> On 12/20/18 2:30 PM, Peter Bergner wrote:
> >>> For stage1, I'd like to fix that conflict wart if I can.  I have also
> >>> wondered about adding a copy coalesce phase just before we enter RA,
> >>> which would ensure the copies are removed, instead of hoping RA assigns
> >>> the same reg to the source and destination of the copy making it a nop
> >>> that can be removed.
> >> The difficulty with coalescing is that if you get too aggressive then
> >> you end up removing degrees of freedom from the allocator and you can
> >> easily make the final results worse.
> > I agree, but being too aggressive leading to bad decisions/code is
> > true for a lot of optimizations. :-)   I do plan on attacking
> > the conservative conflict info for pseudos first and seeing what
> > that buys us before attempting any coalescing.
> When I started to work on IRA, I tried several coalescing techniques
> (I recall only conservative, iterative and optimistic ones).  The
> results were not promising.  But that was a very long time ago, my major
> target was i686 at the time, and there were no accurate conflict
> calculations for irregular register files.  So maybe it will work in
> the current environment and in a different implementation.
>
> Currently IRA has coalescing only for spilled pseudos after coloring
> (because mem<->mem moves are very expensive).  LRA has the same technique.
>
> > As for removing degrees of freedom for the allocator, sometimes that can
> > be a good thing, if it can makes the allocator simpler.  For example, I
> > think we have forced the allocator to do too much by not only being an RA,
> > but being an instruction selector as well.  Doing both RA and instruction
> > selection at the same time makes everything very complicated and I think
> > we probably don't compute allocation costs correctly, since we seem to
> > calculate costs on a per alternative per insn basis and I don't think we
> > ever see what the ramifications of using an alternative in one insn
> > has on the costs of another alternative in another insn.  Sometimes using
> > the cheapest alternative in one insn and the cheapest alternative in
> > another insn can lead us into a situation that requires spilling to
> > resolve the conflicting choices.
> I completely agree.  The big remaining part of modernizing GCC is
> code selection.  I believe LLVM has a big advantage in this area over
> GCC.  A modern approach could make RA much simpler.  But it is a very
> big job involving changes in machine descriptions (a lot of them).
>
> I don't mean machine descriptions in IBURG style.  That would be a
> huge, enormous job requiring a lot of expertise, part of which is lost
> for some targets (I was thinking about starting this job several times
> but gave up when I saw how much effort it would take; it would be an
> even bigger job than writing IRA/LRA).
>
> I am just saying that you need at least to have a cost for each insn
> alternative (maybe per sub-target).  Although some approximation may be
> possible (like the number of insns generated from the alternative, or
> even their size).
>
> There are also some smaller projects in this direction.  For
> example, I tried to use code selection in register cost calculation (the
> code is on the ira-select branch).  The algorithm is based on choosing an
> alternative for each insn first and then calculating costs and register
> classes for the pseudos involved in the insn.  The chosen alternatives
> could be propagated later to LRA (this work has not even started yet).
> The cost of each insn alternative (if we add them in the future in md
> files) could be easily integrated into the algorithm.
>
> Unfortunately the algorithm did not improve SPEC2006 overall on x86-64
> (i7-8700k), although one benchmark was improved by about 5% if
> I remember correctly.  But modern Intel CPUs are very insensitive
> to optimizations (they are complicated black boxes which do their own
> optimizations, and anecdotally I have seen code where adding an extra
> move sped up the code a lot).  Maybe the algorithm will have better
> results on other targets (power or aarch64).  I never tried other targets.
> > I've wondered if running something like lra_constraints() (but using
> > pseudos for fixups rather than hard regs) early in the rtl passes as
> > a pseudo instruction selection pass wouldn't make things easier for
> > the following passes like RA, etc?
> >
> I think it might.  As I wrote, we could propagate the above algorithm's
> decisions to LRA.
>
> Peter, also if you are interested in doing RA work, there is another
> problem, which is to implement sub-register level conflict calculations
> in LRA.  Currently, IRA has a simple subregister level conflict
> calculation (see allocno objects), and in the case of sub-register presence
> IRA and LRA decisions are different and this 

Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-21 Thread Steve Kargl
On Fri, Dec 21, 2018 at 11:07:08AM +0200, Janne Blomqvist wrote:
> On Fri, Dec 21, 2018 at 8:22 AM Steve Kargl <
> s...@troutmask.apl.washington.edu> wrote:
> 
> > On Thu, Dec 20, 2018 at 01:47:39PM -0800, Steve Kargl wrote:
> > > The attached patch has been tested on x86_64-*-freebsd.
> > >
> > > OK to commit?
> > >
> > > 2018-12-20  Steven G. Kargl  
> > >
> > >   PR fortran/69121
> > >   * libgfortran/ieee/ieee_arithmetic.F90: Provide missing functions
> > >   in interface for IEEE_SCALB.
> > >
> > > 2018-12-20  Steven G. Kargl  
> > >
> > >   PR fortran/69121
> > >   * gfortran.dg/ieee/ieee_9.f90: New test.
> >
> > Now, tested on i586-*-freebsd.
> >
> 
> Hi, looks ok for trunk.
> 
> A few questions popped into my mind while looking into this:
> 
> 1) Why are none of the _gfortran_ieee_scalb_X_Y functions mentioned in
> gfortran.map? I guess they should all be there?
> 
> 2) Currently all the intrinsics map to the scalbn{,f,l} builtins. However,
> when the integer argument is of kind int64 or int128 we should instead use
> scalbln{,f,l}. This also applies to other intrinsics that use scalbn under
> the hood.
> 
> To clarify, fixing these is not a prerequisite for accepting the patch (I
> already accepted it), but more like topics for further work.

Just sent a shorter note in private email to Thomas.

No, I'm adding the missing functions to the INTERFACE.

This will not compile: 

   program foo
   use ieee_arithmetic
   real x
   integer(8) i
   x = 2
   i = 2_8
   print *, ieee_scalb(x,i)
   end program

because the module has a generic interface that does not
include the integer(8) argument.

FX seems to have wanted to avoid the explosion of functions in
the library.  In trans-intrinsic.c (conv_intrinsic_ieee_scalb),
he does a conversion of the integer(8) to an integer(4).
Unfortunately, checking the interface for ieee_scalb occurs
before code generation.  Compiling the above after my patch,
the -ftree-dump-original contains

D.3769 = __builtin_scalbnf (x, (integer(kind=4)) MAX_EXPR , -2147483647>);

-- 
Steve


Re: [patch] Fix PR rtl-optimization/87727

2018-12-21 Thread Vladimir Makarov




On 12/20/2018 06:14 PM, Peter Bergner wrote:

On 12/20/18 4:41 PM, Jeff Law wrote:

On 12/20/18 2:30 PM, Peter Bergner wrote:

For stage1, I'd like to fix that conflict wart if I can.  I have also
wondered about adding a copy coalesce phase just before we enter RA,
which would ensure the copies are removed, instead of hoping RA assigns
the same reg to the source and destination of the copy making it a nop
that can be removed.

The difficulty with coalescing is that if you get too aggressive then
you end up removing degrees of freedom from the allocator and you can
easily make the final results worse.

I agree, but being too aggressive leading to bad decisions/code is
true for a lot of optimizations. :-)   I do plan on attacking
the conservative conflict info for pseudos first and seeing what
that buys us before attempting any coalescing.
When I started to work on IRA, I tried several coalescing techniques
(I recall only conservative, iterative and optimistic ones).  The
results were not promising.  But that was a very long time ago, my major
target was i686 at the time, and there were no accurate conflict
calculations for irregular register files.  So maybe it will work in
the current environment and in a different implementation.


Currently IRA has coalescing only for spilled pseudos after coloring 
(because mem<->mem moves are very expensive).  LRA has the same technique.



As for removing degrees of freedom for the allocator, sometimes that can
be a good thing, if it makes the allocator simpler.  For example, I
think we have forced the allocator to do too much by not only being an RA,
but being an instruction selector as well.  Doing both RA and instruction
selection at the same time makes everything very complicated and I think
we probably don't compute allocation costs correctly, since we seem to
calculate costs on a per alternative per insn basis and I don't think we
ever see what the ramifications of using an alternative in one insn
has on the costs of another alternative in another insn.  Sometimes using
the cheapest alternative in one insn and the cheapest alternative in
another insn can lead us into a situation that requires spilling to
resolve the conflicting choices.
  I completely agree.  The big remaining part of modernizing GCC is 
code selection.  I believe LLVM has a big advantage in this area over 
GCC.  A modern approach could make RA much simpler.  But it is a very 
big job involving changes in machine descriptions (a lot of them).


  I don't mean machine descriptions in IBURG style.  That would be a 
huge, enormous job requiring a lot of expertise, part of which is lost 
for some targets (I was thinking about starting this job several times 
but gave up when I saw how much effort it would take; it would be an 
even bigger job than writing IRA/LRA).


  I am just saying that you need at least to have a cost for each insn 
alternative (maybe per sub-target).  Although some approximation may be 
possible (like the number of insns generated from the alternative, or 
even their size).


  There are also some smaller projects in this direction.  For 
example, I tried to use code selection in register cost calculation (the 
code is on the ira-select branch).  The algorithm is based on choosing an 
alternative for each insn first and then calculating costs and register 
classes for the pseudos involved in the insn.  The chosen alternatives 
could be propagated later to LRA (this work has not even started yet).  
The cost of each insn alternative (if we add them in the future in md 
files) could be easily integrated into the algorithm.


  Unfortunately the algorithm did not improve SPEC2006 overall on x86-64 
(i7-8700k), although one benchmark was improved by about 5% if 
I remember correctly.  But modern Intel CPUs are very insensitive 
to optimizations (they are complicated black boxes which do their own 
optimizations, and anecdotally I have seen code where adding an extra 
move sped up the code a lot).  Maybe the algorithm will have better 
results on other targets (power or aarch64).  I never tried other targets.

I've wondered if running something like lra_constraints() (but using
pseudos for fixups rather than hard regs) early in the rtl passes as
a pseudo instruction selection pass wouldn't make things easier for
the following passes like RA, etc?

I think it might.  As I wrote, we could propagate the above algorithm's 
decisions to LRA.


Peter, also if you are interested in doing RA work, there is another 
problem, which is to implement sub-register level conflict calculations 
in LRA.  Currently, IRA has a simple subregister level conflict 
calculation (see allocno objects), and in the case of sub-register 
presence IRA and LRA decisions are different, and this results in worse 
code generation (there are some PRs for this).  It would also be a big 
RA project to do.




Re: [PATCH] Change AVX512 gathers/scatters to not print *WORD PTR in -masm=intel mode (PR target/88522)

2018-12-21 Thread Uros Bizjak
On Fri, Dec 21, 2018 at 4:04 PM Jakub Jelinek  wrote:
>
> Hi!
>
> Binutils recently changed the expected *WORD PTR sizes for AVX512
> scatter/gather insns incompatibly.
>
> If we want to stay compatible with both old and new gas, we need to
> avoid printing that *WORD PTR.

Please add a comment explaining why we avoid PTR and why the X modifier is used.

> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux
> (though just with gas 2.29).  Ok for trunk?
>
> 2018-12-21  Jakub Jelinek  
>
> PR target/88522
> * config/i386/sse.md (*avx512pf_gatherpfsf_mask,
> *avx512pf_gatherpfdf_mask, *avx512pf_scatterpfsf_mask,
> *avx512pf_scatterpfdf_mask): Use %X5 instead of %5 for
> -masm=intel.
> (gatherq_mode): Remove mode iterator.
> (*avx512f_gathersi, *avx512f_gathersi_2): Use X instead
> of .
> (*avx512f_gatherdi): Use X instead of .
> (*avx512f_gatherdi_2, *avx512f_scattersi,
> *avx512f_scatterdi): Use %X5 for -masm=intel.

OK with the above comment addition.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2018-12-18 12:20:53.519482252 +0100
> +++ gcc/config/i386/sse.md  2018-12-21 10:44:34.614068251 +0100
> @@ -17269,9 +17269,9 @@ (define_insn "*avx512pf_gatherpfsf
>switch (INTVAL (operands[4]))
>  {
>  case 3:
> -  return "vgatherpf0ps\t{%5%{%0%}|%5%{%0%}}";
> +  return "vgatherpf0ps\t{%5%{%0%}|%X5%{%0%}}";
>  case 2:
> -  return "vgatherpf1ps\t{%5%{%0%}|%5%{%0%}}";
> +  return "vgatherpf1ps\t{%5%{%0%}|%X5%{%0%}}";
>  default:
>gcc_unreachable ();
>  }
> @@ -17314,9 +17314,9 @@ (define_insn "*avx512pf_gatherpfdf
>switch (INTVAL (operands[4]))
>  {
>  case 3:
> -  return "vgatherpf0pd\t{%5%{%0%}|%5%{%0%}}";
> +  return "vgatherpf0pd\t{%5%{%0%}|%X5%{%0%}}";
>  case 2:
> -  return "vgatherpf1pd\t{%5%{%0%}|%5%{%0%}}";
> +  return "vgatherpf1pd\t{%5%{%0%}|%X5%{%0%}}";
>  default:
>gcc_unreachable ();
>  }
> @@ -17360,10 +17360,10 @@ (define_insn "*avx512pf_scatterpfs
>  {
>  case 3:
>  case 7:
> -  return "vscatterpf0ps\t{%5%{%0%}|%5%{%0%}}";
> +  return "vscatterpf0ps\t{%5%{%0%}|%X5%{%0%}}";
>  case 2:
>  case 6:
> -  return "vscatterpf1ps\t{%5%{%0%}|%5%{%0%}}";
> +  return "vscatterpf1ps\t{%5%{%0%}|%X5%{%0%}}";
>  default:
>gcc_unreachable ();
>  }
> @@ -17407,10 +17407,10 @@ (define_insn "*avx512pf_scatterpfd
>  {
>  case 3:
>  case 7:
> -  return "vscatterpf0pd\t{%5%{%0%}|%5%{%0%}}";
> +  return "vscatterpf0pd\t{%5%{%0%}|%X5%{%0%}}";
>  case 2:
>  case 6:
> -  return "vscatterpf1pd\t{%5%{%0%}|%5%{%0%}}";
> +  return "vscatterpf1pd\t{%5%{%0%}|%X5%{%0%}}";
>  default:
>gcc_unreachable ();
>  }
> @@ -20290,12 +20290,6 @@ (define_insn "*avx2_gatherdi_4"
> (set_attr "prefix" "vex")
> (set_attr "mode" "")])
>
> -;; Memory operand override for -masm=intel of the v*gatherq* patterns.
> -(define_mode_attr gatherq_mode
> -  [(V4SI "q") (V2DI "x") (V4SF "q") (V2DF "x")
> -   (V8SI "x") (V4DI "t") (V8SF "x") (V4DF "t")
> -   (V16SI "t") (V8DI "g") (V16SF "t") (V8DF "g")])
> -
>  (define_expand "_gathersi"
>[(parallel [(set (match_operand:VI48F 0 "register_operand")
>(unspec:VI48F
> @@ -20329,7 +20323,7 @@ (define_insn "*avx512f_gathersi"
>   UNSPEC_GATHER))
> (clobber (match_scratch: 2 "="))]
>"TARGET_AVX512F"
> -  "vgatherd\t{%6, %0%{%2%}|%0%{%2%}, 
> %6}"
> +  "vgatherd\t{%6, %0%{%2%}|%0%{%2%}, %X6}"
>[(set_attr "type" "ssemov")
> (set_attr "prefix" "evex")
> (set_attr "mode" "")])
> @@ -20348,7 +20342,7 @@ (define_insn "*avx512f_gathersi_2"
>   UNSPEC_GATHER))
> (clobber (match_scratch: 1 "="))]
>"TARGET_AVX512F"
> -  "vgatherd\t{%5, %0%{%1%}|%0%{%1%}, 
> %5}"
> +  "vgatherd\t{%5, %0%{%1%}|%0%{%1%}, %X5}"
>[(set_attr "type" "ssemov")
> (set_attr "prefix" "evex")
> (set_attr "mode" "")])
> @@ -20388,7 +20382,7 @@ (define_insn "*avx512f_gatherdi"
> (clobber (match_scratch:QI 2 "="))]
>"TARGET_AVX512F"
>  {
> -  return "vgatherq\t{%6, %1%{%2%}|%1%{%2%}, 
> %6}";
> +  return "vgatherq\t{%6, %1%{%2%}|%1%{%2%}, 
> %X6}";
>  }
>[(set_attr "type" "ssemov")
> (set_attr "prefix" "evex")
> @@ -20412,11 +20406,11 @@ (define_insn "*avx512f_gatherdi_2"
>if (mode != mode)
>  {
>if ( != 64)
> -   return "vgatherq\t{%5, 
> %x0%{%1%}|%x0%{%1%}, %5}";
> +   return "vgatherq\t{%5, 
> %x0%{%1%}|%x0%{%1%}, %X5}";
>else
> -   return "vgatherq\t{%5, 
> %t0%{%1%}|%t0%{%1%}, %t5}";
> +   return "vgatherq\t{%5, 
> %t0%{%1%}|%t0%{%1%}, %X5}";
>  }
> -  return "vgatherq\t{%5, %0%{%1%}|%0%{%1%}, 
> %5}";
> +  return "vgatherq\t{%5, %0%{%1%}|%0%{%1%}, 
> %X5}";
>  }
>[(set_attr "type" "ssemov")
> (set_attr "prefix" "evex")
> @@ -20453,7 +20447,7 @@ (define_insn "*avx512f_scattersi"
>   

Re: [PATCH] Fix WIDEN_MULT_EXPR expansion (PR rtl-optimization/88563)

2018-12-21 Thread Richard Biener
On Fri, 21 Dec 2018, Jakub Jelinek wrote:

> Hi!
> 
> In the PR57251 fixes, I messed up the modes: convert_modes' first argument
> is the to mode and the second argument is the from mode, i.e. the mode the
> third argument has or is assumed to have.
> 
> The following patch fixes that.  Additionally, I've swapped two conditions
> to avoid first calling convert_modes on a CONST_INT to one mode and then
> converting it again from the old mode rather than the new one.  With the
> two conditions swapped it uses just one convert_modes call for that case.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2018-12-21  Jakub Jelinek  
> 
>   PR rtl-optimization/88563
>   * expr.c (expand_expr_real_2) : Swap innermode
>   and mode arguments to convert_modes.  Likewise swap mode and word_mode
>   arguments.  Handle both arguments with VOIDmode before convert_modes
>   of one of them.  Formatting fixes.
> 
>   * gcc.dg/pr88563.c: New test.
> 
> --- gcc/expr.c.jj 2018-12-18 12:20:53.511482381 +0100
> +++ gcc/expr.c2018-12-21 11:57:25.523576071 +0100
> @@ -8775,8 +8775,8 @@ expand_expr_real_2 (sepops ops, rtx targ
>!= INTEGER_CST check.  Handle it.  */
> if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
>   {
> -   op0 = convert_modes (innermode, mode, op0, true);
> -   op1 = convert_modes (innermode, mode, op1, false);
> +   op0 = convert_modes (mode, innermode, op0, true);
> +   op1 = convert_modes (mode, innermode, op1, false);
> return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1,
>   target, unsignedp));
>   }
> @@ -8798,7 +8798,7 @@ expand_expr_real_2 (sepops ops, rtx targ
> if (TREE_CODE (treeop0) != INTEGER_CST)
>   {
> if (find_widening_optab_handler (this_optab, mode, innermode)
> - != CODE_FOR_nothing)
> +   != CODE_FOR_nothing)
>   {
> expand_operands (treeop0, treeop1, NULL_RTX, , ,
>  EXPAND_NORMAL);
> @@ -8807,9 +8807,9 @@ expand_expr_real_2 (sepops ops, rtx targ
> if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
>   {
>widen_mult_const:
> -   op0 = convert_modes (innermode, mode, op0, zextend_p);
> +   op0 = convert_modes (mode, innermode, op0, zextend_p);
> op1
> - = convert_modes (innermode, mode, op1,
> + = convert_modes (mode, innermode, op1,
>TYPE_UNSIGNED (TREE_TYPE (treeop1)));
> return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1,
>   target,
> @@ -8820,21 +8820,19 @@ expand_expr_real_2 (sepops ops, rtx targ
> return REDUCE_BIT_FIELD (temp);
>   }
> if (find_widening_optab_handler (other_optab, mode, innermode)
> - != CODE_FOR_nothing
> +   != CODE_FOR_nothing
> && innermode == word_mode)
>   {
> rtx htem, hipart;
> op0 = expand_normal (treeop0);
> -   if (TREE_CODE (treeop1) == INTEGER_CST)
> - op1 = convert_modes (word_mode, mode,
> -  expand_normal (treeop1),
> -  TYPE_UNSIGNED (TREE_TYPE (treeop1)));
> -   else
> - op1 = expand_normal (treeop1);
> -   /* op0 and op1 might still be constant, despite the above
> +   op1 = expand_normal (treeop1);
> +   /* op0 and op1 might be constants, despite the above
>!= INTEGER_CST check.  Handle it.  */
> if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
>   goto widen_mult_const;
> +   if (TREE_CODE (treeop1) == INTEGER_CST)
> + op1 = convert_modes (mode, word_mode, op1,
> +  TYPE_UNSIGNED (TREE_TYPE (treeop1)));
> temp = expand_binop (mode, other_optab, op0, op1, target,
>  unsignedp, OPTAB_LIB_WIDEN);
> hipart = gen_highpart (word_mode, temp);
> --- gcc/testsuite/gcc.dg/pr88563.c.jj 2018-12-21 12:19:02.850681604 +0100
> +++ gcc/testsuite/gcc.dg/pr88563.c2018-12-21 12:18:52.891841942 +0100
> @@ -0,0 +1,15 @@
> +/* PR rtl-optimization/88563 */
> +/* { dg-do run { target int128 } } */
> +/* { dg-options "-O2 -fno-code-hoisting -fno-tree-ccp 
> -fno-tree-dominator-opts -fno-tree-forwprop -fno-tree-fre -fno-tree-pre 
> -fno-tree-vrp" } */
> +
> +int
> +main ()
> +{
> +#if __SIZEOF_LONG_LONG__ == 8 && __SIZEOF_INT128__ == 16 && __CHAR_BIT__ == 8
> +  unsigned 

[PATCH] Fix WIDEN_MULT_EXPR expansion (PR rtl-optimization/88563)

2018-12-21 Thread Jakub Jelinek
Hi!

In the PR57251 fixes, I messed up the modes: convert_modes' first argument
is the to mode and the second argument is the from mode, i.e. the mode the
third argument has or is assumed to have.

The following patch fixes that.  Additionally, I've swapped two conditions
to avoid first calling convert_modes on a CONST_INT to one mode and then
converting it again from the old mode rather than the new one.  With the
two conditions swapped it uses just one convert_modes call for that case.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-21  Jakub Jelinek  

PR rtl-optimization/88563
* expr.c (expand_expr_real_2) : Swap innermode
and mode arguments to convert_modes.  Likewise swap mode and word_mode
arguments.  Handle both arguments with VOIDmode before convert_modes
of one of them.  Formatting fixes.

* gcc.dg/pr88563.c: New test.

--- gcc/expr.c.jj   2018-12-18 12:20:53.511482381 +0100
+++ gcc/expr.c  2018-12-21 11:57:25.523576071 +0100
@@ -8775,8 +8775,8 @@ expand_expr_real_2 (sepops ops, rtx targ
 != INTEGER_CST check.  Handle it.  */
  if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
{
- op0 = convert_modes (innermode, mode, op0, true);
- op1 = convert_modes (innermode, mode, op1, false);
+ op0 = convert_modes (mode, innermode, op0, true);
+ op1 = convert_modes (mode, innermode, op1, false);
  return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1,
target, unsignedp));
}
@@ -8798,7 +8798,7 @@ expand_expr_real_2 (sepops ops, rtx targ
  if (TREE_CODE (treeop0) != INTEGER_CST)
{
  if (find_widening_optab_handler (this_optab, mode, innermode)
-   != CODE_FOR_nothing)
+ != CODE_FOR_nothing)
{
  expand_operands (treeop0, treeop1, NULL_RTX, , ,
   EXPAND_NORMAL);
@@ -8807,9 +8807,9 @@ expand_expr_real_2 (sepops ops, rtx targ
  if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
{
 widen_mult_const:
- op0 = convert_modes (innermode, mode, op0, zextend_p);
+ op0 = convert_modes (mode, innermode, op0, zextend_p);
  op1
-   = convert_modes (innermode, mode, op1,
+   = convert_modes (mode, innermode, op1,
 TYPE_UNSIGNED (TREE_TYPE (treeop1)));
  return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1,
target,
@@ -8820,21 +8820,19 @@ expand_expr_real_2 (sepops ops, rtx targ
  return REDUCE_BIT_FIELD (temp);
}
  if (find_widening_optab_handler (other_optab, mode, innermode)
-   != CODE_FOR_nothing
+ != CODE_FOR_nothing
  && innermode == word_mode)
{
  rtx htem, hipart;
  op0 = expand_normal (treeop0);
- if (TREE_CODE (treeop1) == INTEGER_CST)
-   op1 = convert_modes (word_mode, mode,
-expand_normal (treeop1),
-TYPE_UNSIGNED (TREE_TYPE (treeop1)));
- else
-   op1 = expand_normal (treeop1);
- /* op0 and op1 might still be constant, despite the above
+ op1 = expand_normal (treeop1);
+ /* op0 and op1 might be constants, despite the above
 != INTEGER_CST check.  Handle it.  */
  if (GET_MODE (op0) == VOIDmode && GET_MODE (op1) == VOIDmode)
goto widen_mult_const;
+ if (TREE_CODE (treeop1) == INTEGER_CST)
+   op1 = convert_modes (mode, word_mode, op1,
+TYPE_UNSIGNED (TREE_TYPE (treeop1)));
  temp = expand_binop (mode, other_optab, op0, op1, target,
   unsignedp, OPTAB_LIB_WIDEN);
  hipart = gen_highpart (word_mode, temp);
--- gcc/testsuite/gcc.dg/pr88563.c.jj   2018-12-21 12:19:02.850681604 +0100
+++ gcc/testsuite/gcc.dg/pr88563.c  2018-12-21 12:18:52.891841942 +0100
@@ -0,0 +1,15 @@
+/* PR rtl-optimization/88563 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2 -fno-code-hoisting -fno-tree-ccp -fno-tree-dominator-opts 
-fno-tree-forwprop -fno-tree-fre -fno-tree-pre -fno-tree-vrp" } */
+
+int
+main ()
+{
+#if __SIZEOF_LONG_LONG__ == 8 && __SIZEOF_INT128__ == 16 && __CHAR_BIT__ == 8
+  unsigned __int128 a = 5;
+  __builtin_mul_overflow (0xULL, (unsigned long long) a, );
+  if (a != ((unsigned __int128)4 << 64 

[PATCH] Change AVX512 gathers/scatters to not print *WORD PTR in -masm=intel mode (PR target/88522)

2018-12-21 Thread Jakub Jelinek
Hi!

Binutils recently changed the expected *WORD PTR sizes for AVX512
scatter/gather insns incompatibly.

If we want to stay compatible with both old and new gas, we need to
avoid printing that *WORD PTR.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux
(though just with gas 2.29).  Ok for trunk?

2018-12-21  Jakub Jelinek  

PR target/88522
* config/i386/sse.md (*avx512pf_gatherpfsf_mask,
*avx512pf_gatherpfdf_mask, *avx512pf_scatterpfsf_mask,
*avx512pf_scatterpfdf_mask): Use %X5 instead of %5 for
-masm=intel.
(gatherq_mode): Remove mode iterator.
(*avx512f_gathersi, *avx512f_gathersi_2): Use X instead
of .
(*avx512f_gatherdi): Use X instead of .
(*avx512f_gatherdi_2, *avx512f_scattersi,
*avx512f_scatterdi): Use %X5 for -masm=intel.

--- gcc/config/i386/sse.md.jj   2018-12-18 12:20:53.519482252 +0100
+++ gcc/config/i386/sse.md  2018-12-21 10:44:34.614068251 +0100
@@ -17269,9 +17269,9 @@ (define_insn "*avx512pf_gatherpfsf
   switch (INTVAL (operands[4]))
 {
 case 3:
-  return "vgatherpf0ps\t{%5%{%0%}|%5%{%0%}}";
+  return "vgatherpf0ps\t{%5%{%0%}|%X5%{%0%}}";
 case 2:
-  return "vgatherpf1ps\t{%5%{%0%}|%5%{%0%}}";
+  return "vgatherpf1ps\t{%5%{%0%}|%X5%{%0%}}";
 default:
   gcc_unreachable ();
 }
@@ -17314,9 +17314,9 @@ (define_insn "*avx512pf_gatherpfdf
   switch (INTVAL (operands[4]))
 {
 case 3:
-  return "vgatherpf0pd\t{%5%{%0%}|%5%{%0%}}";
+  return "vgatherpf0pd\t{%5%{%0%}|%X5%{%0%}}";
 case 2:
-  return "vgatherpf1pd\t{%5%{%0%}|%5%{%0%}}";
+  return "vgatherpf1pd\t{%5%{%0%}|%X5%{%0%}}";
 default:
   gcc_unreachable ();
 }
@@ -17360,10 +17360,10 @@ (define_insn "*avx512pf_scatterpfs
 {
 case 3:
 case 7:
-  return "vscatterpf0ps\t{%5%{%0%}|%5%{%0%}}";
+  return "vscatterpf0ps\t{%5%{%0%}|%X5%{%0%}}";
 case 2:
 case 6:
-  return "vscatterpf1ps\t{%5%{%0%}|%5%{%0%}}";
+  return "vscatterpf1ps\t{%5%{%0%}|%X5%{%0%}}";
 default:
   gcc_unreachable ();
 }
@@ -17407,10 +17407,10 @@ (define_insn "*avx512pf_scatterpfd
 {
 case 3:
 case 7:
-  return "vscatterpf0pd\t{%5%{%0%}|%5%{%0%}}";
+  return "vscatterpf0pd\t{%5%{%0%}|%X5%{%0%}}";
 case 2:
 case 6:
-  return "vscatterpf1pd\t{%5%{%0%}|%5%{%0%}}";
+  return "vscatterpf1pd\t{%5%{%0%}|%X5%{%0%}}";
 default:
   gcc_unreachable ();
 }
@@ -20290,12 +20290,6 @@ (define_insn "*avx2_gatherdi_4"
(set_attr "prefix" "vex")
(set_attr "mode" "")])
 
-;; Memory operand override for -masm=intel of the v*gatherq* patterns.
-(define_mode_attr gatherq_mode
-  [(V4SI "q") (V2DI "x") (V4SF "q") (V2DF "x")
-   (V8SI "x") (V4DI "t") (V8SF "x") (V4DF "t")
-   (V16SI "t") (V8DI "g") (V16SF "t") (V8DF "g")])
-
 (define_expand "_gathersi"
   [(parallel [(set (match_operand:VI48F 0 "register_operand")
   (unspec:VI48F
@@ -20329,7 +20323,7 @@ (define_insn "*avx512f_gathersi"
  UNSPEC_GATHER))
(clobber (match_scratch: 2 "="))]
   "TARGET_AVX512F"
-  "vgatherd\t{%6, %0%{%2%}|%0%{%2%}, 
%6}"
+  "vgatherd\t{%6, %0%{%2%}|%0%{%2%}, %X6}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -20348,7 +20342,7 @@ (define_insn "*avx512f_gathersi_2"
  UNSPEC_GATHER))
(clobber (match_scratch: 1 "="))]
   "TARGET_AVX512F"
-  "vgatherd\t{%5, %0%{%1%}|%0%{%1%}, 
%5}"
+  "vgatherd\t{%5, %0%{%1%}|%0%{%1%}, %X5}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -20388,7 +20382,7 @@ (define_insn "*avx512f_gatherdi"
(clobber (match_scratch:QI 2 "="))]
   "TARGET_AVX512F"
 {
-  return "vgatherq\t{%6, %1%{%2%}|%1%{%2%}, 
%6}";
+  return "vgatherq\t{%6, %1%{%2%}|%1%{%2%}, %X6}";
 }
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -20412,11 +20406,11 @@ (define_insn "*avx512f_gatherdi_2"
   if (mode != mode)
 {
   if ( != 64)
-   return "vgatherq\t{%5, 
%x0%{%1%}|%x0%{%1%}, %5}";
+   return "vgatherq\t{%5, 
%x0%{%1%}|%x0%{%1%}, %X5}";
   else
-   return "vgatherq\t{%5, 
%t0%{%1%}|%t0%{%1%}, %t5}";
+   return "vgatherq\t{%5, 
%t0%{%1%}|%t0%{%1%}, %X5}";
 }
-  return "vgatherq\t{%5, %0%{%1%}|%0%{%1%}, 
%5}";
+  return "vgatherq\t{%5, %0%{%1%}|%0%{%1%}, %X5}";
 }
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
@@ -20453,7 +20447,7 @@ (define_insn "*avx512f_scattersi"
  UNSPEC_SCATTER))
(clobber (match_scratch: 1 "="))]
   "TARGET_AVX512F"
-  "vscatterd\t{%3, %5%{%1%}|%5%{%1%}, %3}"
+  "vscatterd\t{%3, %5%{%1%}|%X5%{%1%}, %3}"
   [(set_attr "type" "ssemov")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -20489,11 +20483,7 @@ (define_insn "*avx512f_scatterdi"
  UNSPEC_SCATTER))
(clobber (match_scratch:QI 1 "="))]
   "TARGET_AVX512F"
-{
-  if (GET_MODE_SIZE (GET_MODE_INNER (mode)) == 8)
-return 

Re: [PATCH 3/3][GCC][AARCH64] Add support for pointer authentication B key

2018-12-21 Thread Sam Tebbs
On 11/9/18 11:04 AM, Sam Tebbs wrote:
> On 11/02/2018 06:01 PM, Sam Tebbs wrote:
>
>> On 11/02/2018 05:35 PM, Sam Tebbs wrote:
>>
>>> Hi all,
>>>
>>> This patch adds support for the Armv8.3-A pointer authentication 
>>> instructions
>>> that use the B-key (pacib*, autib* and retab). This required adding 
>>> builtins for
>>> pacib1716 and autib1716, adding the "b-key" feature to the 
>>> -mbranch-protection
>>> option, and required emitting a new CFI directive ".cfi_b_key_frame" which
>>> causes GAS to add 'B' to the CIE augmentation string. I also had to add a 
>>> new
>>> hook called ASM_POST_CFI_STARTPROC which is triggered when the 
>>> .cfi_startproc
>>> directive is emitted.
>>>
>>> The libgcc stack unwinder has been amended to authenticate return addresses
>>> with the B key when the function has been signed with the B key.
>>>
>>> The previous patch in this series is here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00104.html
>>>
>>> Bootstrapped successfully and regression tested on aarch64-none-elf.
>>>
>>> OK for trunk?
>>>
>>> gcc/
>>> 2018-11-02  Sam Tebbs  
>>>
>>> * config/aarch64/aarch64-builtins.c (aarch64_builtins): Add
>>> AARCH64_PAUTH_BUILTIN_AUTIB1716 and AARCH64_PAUTH_BUILTIN_PACIB1716.
>>> * config/aarch64/aarch64-builtins.c (aarch64_init_pauth_hint_builtins):
>>> Add autib1716 and pacib1716 initialisation.
>>> * config/aarch64/aarch64-builtins.c (aarch64_expand_builtin): Add checks
>>> for autib1716 and pacib1716.
>>> * config/aarch64/aarch64-protos.h (aarch64_key_type,
>>> aarch64_post_cfi_startproc): Define.
>>> * config/aarch64/aarch64-protos.h (aarch64_ra_sign_key): Define extern.
>>> * config/aarch64/aarch64.c (aarch64_return_address_signing_enabled): Add
>>> check for b-key, remove frame.laid_out assertion.
>>> * config/aarch64/aarch64.c (aarch64_ra_sign_key,
>>> aarch64_post_cfi_startproc, aarch64_handle_pac_ret_b_key): Define.
>>> * config/aarch64/aarch64.h (TARGET_ASM_POST_CFI_STARTPROC): Define.
>>> * config/aarch64/aarch64.c (aarch64_pac_ret_subtypes): Add "b-key".
>>> * config/aarch64/aarch64.md (unspec): Add UNSPEC_AUTIA1716,
>>> UNSPEC_AUTIB1716, UNSPEC_AUTIASP, UNSPEC_AUTIBSP, UNSPEC_PACIA1716,
>>> UNSPEC_PACIB1716, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>> * config/aarch64/aarch64.md (do_return): Add check for b-key.
>>> * config/aarch64/aarch64.md (sp): Add check for
>>> signing key and scope selected.
>>> * config/aarch64/aarch64.md (1716): Add check for
>>> signing key and scope selected.
>>> * config/aarch64/aarch64.opt (msign-return-address=): Deprecate.
>>> * config/aarch64/iterators.md (PAUTH_LR_SP): Add UNSPEC_AUTIASP,
>>> UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>> * config/aarch64/iterators.md (PAUTH_17_16): Add UNSPEC_AUTIA1716,
>>> UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716.
>>> * config/aarch64/iterators.md (pauth_mnem_prefix): Add UNSPEC_AUTIA1716,
>>> UNSPEC_AUTIB1716, UNSPEC_PACIA1716, UNSPEC_PACIB1716, UNSPEC_AUTIASP,
>>> UNSPEC_AUTIBSP, UNSPEC_PACIASP, UNSPEC_PACIBSP.
>>> * config/aarch64/iterators.md (pauth_hint_num_a): Replace
>>> UNSPEC_PACI1716 and UNSPEC_AUTI1716 with UNSPEC_PACIA1716 and
>>> UNSPEC_AUTIA1716 respectively.
>>> * config/aarch64/iterators.md (pauth_hint_num_b): New int attribute.
>>>
>>> gcc/testsuite
>>> 2018-11-02  Sam Tebbs  
>>>
>>> * gcc.target/aarch64/return_address_sign_1.c (dg-final): Replace
>>> "autiasp" and "paciasp" with "hint\t29 // autisp" and
>>> "hint\t25 // pacisp" respectively.
>>> * gcc.target/aarch64/return_address_sign_2.c (dg-final): Replace
>>> "paciasp" with "hint\t25 // pacisp".
>>> * gcc.target/aarch64/return_address_sign_3.c (dg-final): Replace
>>> "paciasp" and "autiasp" with "pacisp" and "autisp" respectively.
>>> * gcc.target/aarch64/return_address_sign_b_1.c: New file.
>>> * gcc.target/aarch64/return_address_sign_b_2.c: New file.
>>> * gcc.target/aarch64/return_address_sign_b_3.c: New file.
>>> * gcc.target/aarch64/return_address_sign_b_exception.c: New file.
>>> * gcc.target/aarch64/return_address_sign_builtin.c: New file
>>>
>>> libgcc/
>>> 2018-11-02  Sam Tebbs  
>>>
>>> * config/aarch64/aarch64-unwind.h (aarch64_cie_signed_with_b_key): New
>>> function.
>>> * config/aarch64/aarch64-unwind.h (aarch64_post_extract_frame_addr,
>>> aarch64_post_frob_eh_handler_addr): Add check for b-key.
>>> * unwind-dw2-fde.c (get_cie_encoding): Add check for 'B' in augmentation
>>> string.
>>> * unwind-dw2.c (extract_cie_info): Add check for 'B' in augmentation
>>> string.
>> Attached is an updated patch rebased on an improvement to the
>> -mbranch-protection option documentation.
> ping

Attached is an improved patch with "hint" removed from the test scans, 
pauth_hint_num_a and pauth_hint_num_b merged into pauth_hint_num and the 
"gcc_assert (cfun->machine->frame.laid_out)" 

Re: [PATCH] x86-64: {,V}CVTSI2Sx are ambiguous without suffix

2018-12-21 Thread Uros Bizjak
On Fri, Dec 21, 2018 at 9:08 AM Jan Beulich  wrote:
>
> For 64-bit these should not be emitted without suffix in AT&T mode (as
> being ambiguous that way); the suffixes are benign for 32-bit. For
> consistency also omit the suffix in Intel mode for {,V}CVTSI2SxQ.
>
> The omission had originally (prior to rev 260691) led to wrong code
> being generated for the 64-bit unsigned-to-float/double conversions (as
> gas guesses an L suffix instead of the required Q one when the operand
> is in memory). In all remaining cases (being changed here) the omission
> would "just" lead to warnings with future gas versions.
>
> Since rex64suffix so far has been used also on {,V}CVTSx2SI (but
> not on VCVTSx2USI, as gas doesn't permit suffixes there), testsuite
> adjustments are also necessary for their test cases. Rather than
> making things check for the L suffixes in 32-bit cases, make things
> symmetric with VCVTSx2USI and drop the redundant suffixes instead,
> dropping the Q suffix expectations at the same time from the 64-bit
> cases.

This diverges from established practice, where all instructions have
suffixes in the AT&T dialect. I think that we should continue to follow
the established convention (which has found a couple of bugs in the past),
so I think that "l" should be emitted where appropriate. I wonder if gas
should be fixed to accept suffixes for VCVTSx2USI.

For now, let's leave all suffixes, but skip problematic VCVTSx2USI.

> In order for related test cases to actually test what they're supposed
> to test, add (seemingly unrelated) a few empty "asm volatile()".
> Presumably there are more where constant propagation voids the intended
> effect of the tests, but these are ones helping make sure the assembler
> actually still assembles correctly the output after the changes here.

Please just make the relevant variable volatile. There are plenty of
examples in the i386 target testsuite.

Uros.

> gcc/
> 2018-12-21  Jan Beulich  
>
> * config/i386/i386.md (rex64suffix): Add L suffix for SI.
> * config/i386/sse.md (sse_cvtss2si,
> sse_cvtss2si_2,
> sse_cvttss2si,
> sse2_cvtsd2si,
> sse2_cvtsd2si_2,
> sse2_cvttsd2si): Drop
> .
> (cvtusi232, sse2_cvtsi2sd): Add
> {l}.
> (sse2_cvtsi2sdq): Make q conditional upon AT&T
> syntax.
>
> gcc/testsuite/
> 2018-12-21  Jan Beulich  
>
> * gcc.target/i386/avx512f-vcvtsd2si64-1.c,
> gcc.target/i386/avx512f-vcvtss2si64-1.c
> gcc.target/i386/avx512f-vcvttsd2si64-1.c
> gcc.target/i386/avx512f-vcvttss2si64-1.c: Drop q suffix
> expectation.
> * gcc.target/i386/avx512f-vcvtsi2ss-1.c,
> gcc.target/i386/avx512f-vcvtusi2sd-1.c,
> gcc.target/i386/avx512f-vcvtusi2ss-1.c: Expect l suffix.
> * gcc.target/i386/avx512f-vcvtusi2sd-2.c,
> gcc.target/i386/avx512f-vcvtusi2sd64-2.c,
> gcc.target/i386/avx512f-vcvtusi2ss-2.c,
> gcc.target/i386/avx512f-vcvtusi2ss64-2.c: Add asm volatile().
>
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1162,7 +1162,7 @@
>[(QI "V64QI") (HI "V32HI") (SI "V16SI") (DI "V8DI") (SF "V16SF") (DF 
> "V8DF")])
>
>  ;; Instruction suffix for REX 64bit operators.
> -(define_mode_attr rex64suffix [(SI "") (DI "{q}")])
> +(define_mode_attr rex64suffix [(SI "{l}") (DI "{q}")])
>  (define_mode_attr rex64namesuffix [(SI "") (DI "q")])
>
>  ;; This mode iterator allows :P to be used for patterns that operate on
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -4720,7 +4720,7 @@
>  (parallel [(const_int 0)]))]
>   UNSPEC_FIX_NOTRUNC))]
>"TARGET_SSE"
> -  "%vcvtss2si\t{%1, %0|%0, %k1}"
> +  "%vcvtss2si\t{%1, %0|%0, %k1}"
>[(set_attr "type" "sseicvt")
> (set_attr "athlon_decode" "double,vector")
> (set_attr "bdver1_decode" "double,double")
> @@ -4733,7 +4733,7 @@
> (unspec:SWI48 [(match_operand:SF 1 "nonimmediate_operand" "v,m")]
>   UNSPEC_FIX_NOTRUNC))]
>"TARGET_SSE"
> -  "%vcvtss2si\t{%1, %0|%0, %k1}"
> +  "%vcvtss2si\t{%1, %0|%0, %k1}"
>[(set_attr "type" "sseicvt")
> (set_attr "athlon_decode" "double,vector")
> (set_attr "amdfam10_decode" "double,double")
> @@ -4749,7 +4749,7 @@
> (match_operand:V4SF 1 "" 
> "v,")
> (parallel [(const_int 0)]]
>"TARGET_SSE"
> -  "%vcvttss2si\t{%1, %0|%0, 
> %k1}"
> +  "%vcvttss2si\t{%1, %0|%0, %k1}"
>[(set_attr "type" "sseicvt")
> (set_attr "athlon_decode" "double,vector")
> (set_attr "amdfam10_decode" "double,double")
> @@ -4767,7 +4767,7 @@
>   (match_operand:VF_128 1 "register_operand" "v")
>   (const_int 1)))]
>"TARGET_AVX512F && "
> -  "vcvtusi2\t{%2, %1, %0|%0, %1, 
> %2}"
> +  "vcvtusi2{l}\t{%2, %1, %0|%0, 
> %1, %2}"
>[(set_attr "type" "sseicvt")
> (set_attr "prefix" "evex")
> (set_attr "mode" "")])
> @@ -5026,9 +5026,9 @@
>   (const_int 1)))]
>

Re: [PATCH] [og8] Add OpenACC 2.6 if and if_present clauses on host_data construct

2018-12-21 Thread Thomas Schwinge
Hi Gergő!

On Fri, 21 Dec 2018 13:29:09 +0100, Gergö Barany  wrote:
> OpenACC 2.6 specifies `if' and `if_present' clauses on the `host_data' 
> construct. These patches add support for these clauses. The first patch, 
> by Thomas, reorganizes libgomp internals to turn a "device" argument 
> into "flags" that can provide more information to the runtime. The 
> second patch adds support for the `if' and `if_present' clauses, using 
> the new flag mechanism.
> 
> OK for openacc-gcc-8-branch?

Yes, thanks.  To record the review effort, please include "Reviewed-by:
Thomas Schwinge " in the commit log, see
.  (Not for my own commit, of
course.)

Again, just the commit message of the second commit needs to be adjusted, from:

> [...]
> gcc/testsuite/c-c++-common/goacc/
> * host_data-1.c: Add tests of if and if_present clauses on host_data.
> gcc/testsuite/gfortran.dg/goacc/
> * host_data-tree.f95: Likewise.
> [...]
> libgomp/
> * libgomp.h (enum gomp_map_vars_kind): Add
> GOMP_MAP_VARS_OPENACC_IF_PRESENT.
> 
> libgomp/
> * oacc-parallel.c (GOACC_data_start): Handle
> GOACC_FLAG_HOST_DATA_IF_PRESENT flag.
> * target.c (gomp_map_vars_async): Handle
> GOMP_MAP_VARS_OPENACC_IF_PRESENT mapping kind.
> 
> libgomp/testsuite/libgomp.oacc-c-c++-common/
> * host_data-6.c: New test.

... to:

> [...]
> gcc/testsuite/
> * c-c++-common/goacc/host_data-1.c: Add tests of if and if_present 
> clauses on host_data. [add suitable line break some where]
> * gfortran.dg/goacc/host_data-tree.f95: Likewise.
> [...]
> libgomp/
> * libgomp.h (enum gomp_map_vars_kind): Add
> GOMP_MAP_VARS_OPENACC_IF_PRESENT.
> * oacc-parallel.c (GOACC_data_start): Handle
> GOACC_FLAG_HOST_DATA_IF_PRESENT flag.
> * target.c (gomp_map_vars_async): Handle
> GOMP_MAP_VARS_OPENACC_IF_PRESENT mapping kind.
> * testsuite/libgomp.oacc-c-c++-common/host_data-6.c: New test.


Grüße
 Thomas


> From 6d719cc2bcfa8f7ed8cb59e753e44aab6bf634fb Mon Sep 17 00:00:00 2001
> From: Thomas Schwinge 
> Date: Wed, 19 Dec 2018 20:04:18 +0100
> Subject: [PATCH 1/2] For libgomp OpenACC entry points, redefine the "device"
>  argument to "flags"
> 
> ... so that we're then able to use this for other flags in addition to
> "GOACC_FLAG_HOST_FALLBACK".
> 
>   gcc/
>   * omp-expand.c (expand_omp_target): Restructure OpenACC vs. OpenMP
>   code paths.  Update for libgomp OpenACC entry points change.
>   include/
>   * gomp-constants.h (GOACC_FLAG_HOST_FALLBACK)
>   (GOACC_FLAGS_MARSHAL_OP, GOACC_FLAGS_UNMARSHAL): Define.
>   libgomp/
>   * oacc-parallel.c (GOACC_parallel_keyed, GOACC_parallel)
>   (GOACC_data_start, GOACC_enter_exit_data, GOACC_update)
>   (GOACC_declare): Redefine the "device" argument to "flags".
> ---
>  gcc/ChangeLog.openacc  |   5 ++
>  gcc/omp-expand.c   | 111 
> +
>  gcc/tree-ssa-structalias.c |   4 +-
>  include/ChangeLog.openacc  |   5 ++
>  include/gomp-constants.h   |  12 +
>  libgomp/ChangeLog.openacc  |   6 +++
>  libgomp/oacc-parallel.c|  60 ++--
>  7 files changed, 139 insertions(+), 64 deletions(-)
> 
> diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
> index 718044c..6a51b1e 100644
> --- a/gcc/ChangeLog.openacc
> +++ b/gcc/ChangeLog.openacc
> @@ -1,3 +1,8 @@
> +2018-12-21  Thomas Schwinge  
> +
> + * omp-expand.c (expand_omp_target): Restructure OpenACC vs. OpenMP
> + code paths.  Update for libgomp OpenACC entry points change.
> +
>  2018-12-21  Gergö Barany  
>  
>   * omp-low.c (scan_sharing_clauses): Fix call to renamed function
> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> index 988b1bb..ea264da 100644
> --- a/gcc/omp-expand.c
> +++ b/gcc/omp-expand.c
> @@ -7204,7 +7204,7 @@ expand_omp_target (struct omp_region *region)
>   transfers.  */
>tree t1, t2, t3, t4, device, cond, depend, c, clauses;
>enum built_in_function start_ix;
> -  location_t clause_loc;
> +  location_t clause_loc = UNKNOWN_LOCATION;
>unsigned int flags_i = 0;
>  
>switch (gimple_omp_target_kind (entry_stmt))
> @@ -7249,49 +7249,62 @@ expand_omp_target (struct omp_region *region)
>  
>clauses = gimple_omp_target_clauses (entry_stmt);
>  
> -  /* By default, the value of DEVICE is GOMP_DEVICE_ICV (let runtime
> - library choose) and there is no conditional.  */
> -  cond = NULL_TREE;
> -  device = build_int_cst (integer_type_node, GOMP_DEVICE_ICV);
> -
> -  c = omp_find_clause (clauses, OMP_CLAUSE_IF);
> -  if (c)
> -cond = OMP_CLAUSE_IF_EXPR (c);
> -
> -  c = omp_find_clause (clauses, OMP_CLAUSE_DEVICE);
> -  if (c)
> +  device = NULL_TREE;
> +  tree goacc_flags = NULL_TREE;
> +  if (is_gimple_omp_oacc (entry_stmt))
>  {
> -  /* Even if we pass it to all library function calls, it is currently 
> only
> -  defined/used for the OpenMP 

Re: [PATCH] x86: relax mask register constraints

2018-12-21 Thread Uros Bizjak
On Fri, Dec 21, 2018 at 9:43 AM Jan Beulich  wrote:
>
> While their use for masking is indeed restricted to %k1...%k7, use as
> "normal" insn operands also permits %k0. Remove the unnecessary
> limitations, requiring quite a few testsuite adjustments.
>
> Oddly enough some AVX512{F,DQ} test cases already check for %k[0-7],
> while others did permit {%k0} - where they get touched here anyway this
> gets fixed at the same time.
>
> gcc/
> 2018-12-21  Jan Beulich  
>
> * config/i386/sse.md
> (_cmp3,
> _cmp3,
> _ucmp3,
> _ucmp3,
> avx512f_vmcmp3,
> avx512f_vmcmp3_mask,
> avx512f_maskcmp3,
> _cvt2mask,
> _cvt2mask,
> *_cvtmask2,
> *_cvtmask2,
> _eq3_1,
> _eq3_1,
> _gt3,
> _gt3,
> _testm3,
> _testnm3,
> *_testm3_zext,
> *_testm3_zext_mask,
> *_testnm3_zext,
> *_testnm3_zext_mask,
> avx512cd_maskb_vec_dup,
> avx512cd_maskw_vec_dup,
> avx512dq_fpclass,
> avx512dq_vmfpclass,
> avx512vl_vpshufbitqmb): Use =k
> instead of =Yk.
>
> gcc/testsuite/
> 2018-12-21  Jan Beulich  
>
> * gcc.target/i386/avx512bitalg-vpshufbitqmb.c,
> gcc.target/i386/avx512bw-vpcmpeqb-1.c,
> gcc.target/i386/avx512bw-vpcmpequb-1.c,
> gcc.target/i386/avx512bw-vpcmpequw-1.c,
> gcc.target/i386/avx512bw-vpcmpeqw-1.c,
> gcc.target/i386/avx512bw-vpcmpgeb-1.c,
> gcc.target/i386/avx512bw-vpcmpgeub-1.c,
> gcc.target/i386/avx512bw-vpcmpgeuw-1.c,
> gcc.target/i386/avx512bw-vpcmpgew-1.c,
> gcc.target/i386/avx512bw-vpcmpgtb-1.c,
> gcc.target/i386/avx512bw-vpcmpgtub-1.c,
> gcc.target/i386/avx512bw-vpcmpgtuw-1.c,
> gcc.target/i386/avx512bw-vpcmpgtw-1.c,
> gcc.target/i386/avx512bw-vpcmpleb-1.c,
> gcc.target/i386/avx512bw-vpcmpleub-1.c,
> gcc.target/i386/avx512bw-vpcmpleuw-1.c,
> gcc.target/i386/avx512bw-vpcmplew-1.c,
> gcc.target/i386/avx512bw-vpcmpltb-1.c,
> gcc.target/i386/avx512bw-vpcmpltub-1.c,
> gcc.target/i386/avx512bw-vpcmpltuw-1.c,
> gcc.target/i386/avx512bw-vpcmpltw-1.c,
> gcc.target/i386/avx512bw-vpcmpneqb-1.c,
> gcc.target/i386/avx512bw-vpcmpnequb-1.c,
> gcc.target/i386/avx512bw-vpcmpnequw-1.c,
> gcc.target/i386/avx512bw-vpcmpneqw-1.c,
> gcc.target/i386/avx512bw-vpmovb2m-1.c,
> gcc.target/i386/avx512bw-vpmovm2b-1.c,
> gcc.target/i386/avx512bw-vpmovm2w-1.c,
> gcc.target/i386/avx512bw-vpmovw2m-1.c,
> gcc.target/i386/avx512bw-vptestmb-1.c,
> gcc.target/i386/avx512bw-vptestmw-1.c,
> gcc.target/i386/avx512bw-vptestnmb-1.c,
> gcc.target/i386/avx512bw-vptestnmw-1.c,
> gcc.target/i386/avx512cd-vpbroadcastmb2q-1.c,
> gcc.target/i386/avx512cd-vpbroadcastmw2d-1.c,
> gcc.target/i386/avx512dq-vfpclasssd-1.c,
> gcc.target/i386/avx512dq-vfpcla-1.c,
> gcc.target/i386/avx512dq-vpmovd2m-1.c,
> gcc.target/i386/avx512dq-vpmovm2d-1.c,
> gcc.target/i386/avx512dq-vpmovm2q-1.c,
> gcc.target/i386/avx512dq-vpmovq2m-1.c,
> gcc.target/i386/avx512vl-vpbroadcastmb2q-1.c,
> gcc.target/i386/avx512vl-vpbroadcastmw2d-1.c,
> gcc.target/i386/avx512vl-vpcmpeqd-1.c,
> gcc.target/i386/avx512vl-vpcmpeqq-1.c,
> gcc.target/i386/avx512vl-vpcmpequd-1.c,
> gcc.target/i386/avx512vl-vpcmpequq-1.c,
> gcc.target/i386/avx512vl-vpcmpged-1.c,
> gcc.target/i386/avx512vl-vpcmpgeq-1.c,
> gcc.target/i386/avx512vl-vpcmpgeud-1.c,
> gcc.target/i386/avx512vl-vpcmpgeuq-1.c,
> gcc.target/i386/avx512vl-vpcmpgtd-1.c,
> gcc.target/i386/avx512vl-vpcmpgtq-1.c,
> gcc.target/i386/avx512vl-vpcmpgtud-1.c,
> gcc.target/i386/avx512vl-vpcmpgtuq-1.c,
> gcc.target/i386/avx512vl-vpcmpled-1.c,
> gcc.target/i386/avx512vl-vpcmpleq-1.c,
> gcc.target/i386/avx512vl-vpcmpleud-1.c,
> gcc.target/i386/avx512vl-vpcmpleuq-1.c,
> gcc.target/i386/avx512vl-vpcmpltd-1.c,
> gcc.target/i386/avx512vl-vpcmpltq-1.c,
> gcc.target/i386/avx512vl-vpcmpltud-1.c,
> gcc.target/i386/avx512vl-vpcmpltuq-1.c,
> gcc.target/i386/avx512vl-vpcmpneqd-1.c,
> gcc.target/i386/avx512vl-vpcmpneqq-1.c,
> gcc.target/i386/avx512vl-vpcmpnequd-1.c,
> gcc.target/i386/avx512vl-vpcmpnequq-1.c,
> gcc.target/i386/avx512vl-vptestmd-1.c,
> gcc.target/i386/avx512vl-vptestmq-1.c,
> gcc.target/i386/avx512vl-vptestnmd-1.c,
> gcc.target/i386/avx512vl-vptestnmq-1.c: Permit %k0 as ordinary
> operand.
> * gcc.target/i386/avx512bw-vpcmpb-1.c,
> gcc.target/i386/avx512bw-vpcmpub-1.c,
> gcc.target/i386/avx512bw-vpcmpuw-1.c,
> gcc.target/i386/avx512bw-vpcmpw-1.c,
>  

Re: [patch,openacc] Fix PR71959: lto dump of callee counts

2018-12-21 Thread Jakub Jelinek
On Fri, Dec 21, 2018 at 01:23:03PM +, Julian Brown wrote:
> 2018-xx-yy  Nathan Sidwell  
> 
> PR lto/71959
> libgomp/
> * testsuite/libgomp.oacc-c++/pr71959-a.C: New.
> * testsuite/libgomp.oacc-c++/pr71959.C: New.

Just nits: better use pr71959-aux.cc (*.cc files aren't considered as
testcases by *.exp:
set tests [lsort [concat \
  [find $srcdir/$subdir *.C] \
  [find $srcdir/$subdir/../libgomp.oacc-c-c++-common *.c]]]
), and a bare "a" suffix is weird.

> commit c69dce8ba0ecd7ff620f4f1b8dacc94c61984107
> Author: Julian Brown 
> Date:   Wed Dec 19 05:01:58 2018 -0800
> 
> Add testcase from PR71959
> 
>   libgomp/

Please mention
PR lto/71959
here in the ChangeLog.

>   * testsuite/libgomp.oacc-c++/pr71959-a.C: New.
>   * testsuite/libgomp.oacc-c++/pr71959.C: New.

> +void apply (int (*fn)(), Iter out) asm ("_ZN5Apply5applyEPFivE4Iter");

Will this work even on targets that use _ or other symbol prefixes?

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
> @@ -0,0 +1,31 @@
> +// { dg-additional-sources "pr71959-a.C" }
> +
> +// pr lto/71959 ICEd LTO due to mismatch between writing & reading behaviour

Capital PR instead of pr .

Jakub


Re: Relax std::move_if_noexcept for std::pair

2018-12-21 Thread Jonathan Wakely

On 20/12/18 22:53 +0100, François Dumont wrote:

On 12/20/18 9:04 AM, Ville Voutilainen wrote:

On Thu, 20 Dec 2018 at 08:29, François Dumont  wrote:

Hi

 I eventually found out what the problem was with
std::move_if_noexcept within associative containers.

 The std::pair defaulted move constructor might not move both the first
and the second member. If either is not movable it will just copy it. And then

..as it should..


the noexcept qualification of the copy constructor will participate in
the noexcept qualification of the std::pair move constructor. So
std::move_if_noexcept can eventually decide not to move because a
_copy_ constructor is not noexcept qualified.

..and again, as it should.


 This is why I am partially specializing __move_if_noexcept_cond. As
there doesn't seem to be any standard metafunction to find out whether a
move will take place, I resort to using std::is_const, as in this case the
compiler surely won't call the move constructor.

That seems wrong; just because a type is or is not const has nothing
to do whether
it's nothrow_move_constructible.


Indeed, I am not changing that.




I don't understand what problem this is solving, and how it's not
introducing new problems.

The problem I am trying to solve is shown by the tests I have adapted:
allowing more move semantics in associative containers, where keys are
stored as const.


I'm not convinced that's a desirable property, especially not if it
needs changes to move_if_noexcept.

But if I make the counter_type copy constructor noexcept then I also get
the move on the pair.second instance, great. I am just surprised that I
have to make a copy constructor noexcept for std::move_if_noexcept to
work as I expect.


Because the move constructor of pair will copy the first
element, not move it, because you can't move from a const object. If
the T(const T&) constructor is noexcept, and the U(U&&) constructor is
also noexcept, then the pair move constructor is noexcept.
The move constructor's exception specification depends on the
exception specifications of whichever constructors it invokes for its
members.

I think I just need to understand why we need std::move_if_noexcept in
unordered containers or even the rb_tree. Couldn't we just use std::move?


No. If moving can throw then we can't provide strong exception safety.

I don't understand what we are trying to avoid with this noexcept 
check.


Then maybe stop trying to change how it works :-)



Re: [PATCH][og8] Update code and reduction tests for `serial' construct

2018-12-21 Thread Thomas Schwinge
Hi Gergő!

On Fri, 21 Dec 2018 13:24:34 +0100, Gergö Barany  wrote:
> This fixes a conflict between two recently committed patches to 
> openacc-gcc-8-branch, Maciej's "Add OpenACC 2.6 `serial' construct 
> support" and my "Report errors on missing OpenACC reduction clauses in 
> nested reductions". The former renamed a function which caused the 
> latter to no longer compile.

My bad for not noticing that one before pushing these changes
yesterday...  ;-|

> Additionally, new tests for OpenACC reductions in serial regions are 
> added, and the existing ones separated out by region kind 
> (parallel/kernels/serial).
> 
> OK for openacc-gcc-8-branch?

Yes, thanks.  To record the review effort, please include "Reviewed-by:
Thomas Schwinge " in the commit log, see
.

Just two minor notes:

> From 72098b852c0cee656f61395c04f9271a0a598761 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Gerg=C3=B6=20Barany?= 
> Date: Fri, 21 Dec 2018 00:08:09 -0800
> Subject: [PATCH] [og8] Update code and reduction tests for `serial' construct

Might want to say "OpenACC `serial' construct" here, so that it's
obviously clear what this is about.

> gcc/
> * omp-low.c (scan_sharing_clauses): Fix call to renamed function
> is_oacc_parallel.
> gcc/testsuite/c-c++-common/goacc/
> * nested-reductions-fail.c: Renamed to...
> * nested-reductions-parallel-fail.c: ...this file, with kernels tests...
> * nested-reductions-kernels-fail.c: ... moved to this new file.
> * nested-reductions-serial-fail.c: New test.
> * nested-reductions.c: Renamed to...
> * nested-reductions-parallel.c: ... this file, with kernels tests...
> * nested-reductions-kernels.c: ... moved to this new file.
> * nested-reductions-serial.c: New test.

The paths used inside the snippets are always relative to the respective
ChangeLog file, so here:

>   gcc/testsuite/
>   * c-c++-common/goacc/nested-reductions-fail.c: Renamed to...
>   [...]

You got that right in the "gcc/testsuite/ChangeLog.openacc" file, but it
needs to be updated in the commit message.  (... which in GCC typically
is just a copy'n'paste of the respective ChangeLog updates, possibly with
some introductory text to describe the "why" of the commit etc., instead
of just the "how" as done by the GNU ChangeLogs.)  For avoidance of
doubt, in your example here, I'd say that the summary line plus ChangeLog
is sufficient.


Grüße
 Thomas


> ---
>  gcc/ChangeLog.openacc  |   5 +
>  gcc/omp-low.c  |   2 +-
>  gcc/testsuite/ChangeLog.openacc|  15 +
>  .../c-c++-common/goacc/nested-reductions-fail.c| 492 
> -
>  .../goacc/nested-reductions-kernels-fail.c | 273 
>  .../c-c++-common/goacc/nested-reductions-kernels.c | 227 ++
>  .../goacc/nested-reductions-parallel-fail.c| 447 +++
>  .../goacc/nested-reductions-parallel.c | 384 
>  .../goacc/nested-reductions-serial-fail.c  | 446 +++
>  .../c-c++-common/goacc/nested-reductions-serial.c  | 391 
>  .../c-c++-common/goacc/nested-reductions.c | 420 --
>  11 files changed, 2189 insertions(+), 913 deletions(-)
>  delete mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
>  create mode 100644 
> gcc/testsuite/c-c++-common/goacc/nested-reductions-kernels-fail.c
>  create mode 100644 
> gcc/testsuite/c-c++-common/goacc/nested-reductions-kernels.c
>  create mode 100644 
> gcc/testsuite/c-c++-common/goacc/nested-reductions-parallel-fail.c
>  create mode 100644 
> gcc/testsuite/c-c++-common/goacc/nested-reductions-parallel.c
>  create mode 100644 
> gcc/testsuite/c-c++-common/goacc/nested-reductions-serial-fail.c
>  create mode 100644 
> gcc/testsuite/c-c++-common/goacc/nested-reductions-serial.c
>  delete mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c
> 
> diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
> index 5973625..718044c 100644
> --- a/gcc/ChangeLog.openacc
> +++ b/gcc/ChangeLog.openacc
> @@ -1,3 +1,8 @@
> +2018-12-21  Gergö Barany  
> +
> + * omp-low.c (scan_sharing_clauses): Fix call to renamed function
> + is_oacc_parallel.
> +
>  2018-12-20  Gergö Barany  
>  
>   * omp-low.c (struct omp_context): New fields
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index 6b7b23e..72b6548 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -1286,7 +1286,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
> goto do_private;
>  
>   case OMP_CLAUSE_REDUCTION:
> -  if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
> +  if (is_gimple_omp_oacc (ctx->stmt))
>  ctx->local_reduction_clauses
> = tree_cons (NULL, c, ctx->local_reduction_clauses);
> decl = OMP_CLAUSE_DECL (c);
> diff --git 

Re: [patch,openacc] Fix PR71959: lto dump of callee counts

2018-12-21 Thread Julian Brown
On Fri, 21 Dec 2018 02:56:36 +
Julian Brown  wrote:

> On Tue, 25 Sep 2018 14:59:18 +0200
> Martin Jambor  wrote:
> 
> > Hi,
> > 
> > I have noticed a few things...
> > 
> > On Thu, Sep 20 2018, Cesar Philippidis wrote:  
> > > This is another old gomp4 patch that demotes an ICE in PR71959 to
> > > a linker warning. One problem here is that it is not clear if
> > > OpenACC allows individual member functions in C++ classes to be
> > > marked as acc routines. There's another issue accessing member
> > > data inside offloaded regions. We'll add some support for member
> > > data in OpenACC 2.6, but some of the OpenACC C++ semantics are still
> > > unclear.
> > >
> > > Is this OK for trunk? I bootstrapped and regtested it for x86_64
> > > Linux with nvptx offloading.  
> > [...]  
> 
> The testcase associated with this bug appears to be fixed by the
> following patch:
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01167.html
> 
> So, it's unclear if there's anything left to do here, and this patch
> can probably be withdrawn.

...or actually, maybe we should keep the new testcase in case of future
regressions. This patch contains just that.

OK to apply?

Thanks,

Julian

ChangeLog

2018-xx-yy  Nathan Sidwell  

PR lto/71959
libgomp/
* testsuite/libgomp.oacc-c++/pr71959-a.C: New.
* testsuite/libgomp.oacc-c++/pr71959.C: New.
commit c69dce8ba0ecd7ff620f4f1b8dacc94c61984107
Author: Julian Brown 
Date:   Wed Dec 19 05:01:58 2018 -0800

Add testcase from PR71959

	libgomp/
	* testsuite/libgomp.oacc-c++/pr71959-a.C: New.
	* testsuite/libgomp.oacc-c++/pr71959.C: New.

diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C b/libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C
new file mode 100644
index 000..ec4b14a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C
@@ -0,0 +1,31 @@
+// { dg-do compile }
+
+struct Iter
+{
+  int *cursor;
+
+  void ctor (int *cursor_) asm("_ZN4IterC1EPi");
+  int *point () const asm("_ZNK4Iter5pointEv");
+};
+
+#pragma acc routine
+void  Iter::ctor (int *cursor_)
+{
+  cursor = cursor_;
+}
+
+#pragma acc routine
+int *Iter::point () const
+{
+  return cursor;
+}
+
+void apply (int (*fn)(), Iter out) asm ("_ZN5Apply5applyEPFivE4Iter");
+
+#pragma acc routine
+void apply (int (*fn)(), struct Iter out)
+{ *out.point() = fn (); }
+
+extern "C" void __gxx_personality_v0 ()
+{
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959.C b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
new file mode 100644
index 000..8508c17
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
@@ -0,0 +1,31 @@
+// { dg-additional-sources "pr71959-a.C" }
+
+// pr lto/71959 ICEd LTO due to mismatch between writing & reading behaviour
+
+struct Iter
+{
+  int *cursor;
+
+  Iter(int *cursor_) : cursor(cursor_) {}
+
+  int *point() const { return cursor; }
+};
+
+#pragma acc routine seq
+int one () { return 1; }
+
+struct Apply
+{
+  static void apply (int (*fn)(), Iter out)
+  { *out.point() = fn (); }
+};
+
+int main ()
+{
+  int x;
+
+#pragma acc parallel copyout(x)
+  Apply::apply (one, Iter (&x));
+
+  return x != 1;
+}


PING^2 [PATCH] x86: Add -mzero-caller-saved-regs=[skip|used-gpr|all-gpr|used|all]

2018-12-21 Thread H.J. Lu
On Thu, Nov 29, 2018 at 3:14 PM H.J. Lu  wrote:
>
> On Wed, Oct 31, 2018 at 12:42 PM H.J. Lu  wrote:
> >
> > On Thu, Sep 27, 2018 at 7:58 AM Richard Biener
> >  wrote:
> > >
> > > On Thu, Sep 27, 2018 at 3:16 PM H.J. Lu  wrote:
> > > >
> > > > On Thu, Sep 27, 2018 at 6:08 AM, Szabolcs Nagy  
> > > > wrote:
> > > > > On 26/09/18 19:10, H.J. Lu wrote:
> > > > >>
> > > > >> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and
> > > > >> zero_caller_saved_regs("skip|used|all") function attribute:
> > > > >>
> > > > >> 1. -mzero-caller-saved-regs=skip and zero_caller_saved_regs("skip")
> > > > >>
> > > > >> Don't zero caller-saved integer registers upon function return.
> > > > >>
> > > > >> 2. -mzero-caller-saved-regs=used and zero_caller_saved_regs("used")
> > > > >>
> > > > >> Zero used caller-saved integer registers upon function return.
> > > > >>
> > > > >> 3. -mzero-caller-saved-regs=all and zero_caller_saved_regs("all")
> > > > >>
> > > > >> Zero all caller-saved integer registers upon function return.
> > > > >>
> > > > >> Tested on i686 and x86-64 with bootstrapping GCC trunk and
> > > > >> -mzero-caller-saved-regs=used as well as -mzero-caller-saved-regs=all
> > > > >> enabled by default.
> > > > >>
> > > > >
> > > > > from this description and the documentation it's
> > > > > not clear to me what this tries to achieve.
> > > > >
> > > > > is it trying to prevent information leak?
> > > > > or some pcs hack the caller may rely on?
> > > > >
> > > > > if it's for information leak then i'd expect such
> > > > > attribute to be used on crypto code.. however i'd
> > > > > expect crypto code to use simd registers as well,
> > > > > so integer only cleaning needs explanation.
> > > >
> > > > The target usage is in Linux kernel.
> > >
> > > Maybe still somehow encode that in the option since it otherwise raises
> > > expectations that are not met?
> > > -mzero-call-clobbered-regs=used-int|all-int|skip|used-simd|used-fp,etc.?
> > > and sorry() on unimplemented ones?  Or simply zero also non-integer
> > > regs the same way?  I suppose
> > > there isn't sth like vzeroupper that zeros all SIMD regs and completely?
> > >
> >
> > Here is the updated patch to zero caller-saved vector registers.   I don't
> > mind a different option name if it is preferred.  I may be able to create
> > some generic utility functions which can be used by other backends.  But
> > the actual implementation must be target-specific.
> >
> > Any comments?
>
> PING.
>
> https://gcc.gnu.org/ml/gcc-patches/2018-10/msg02079.html
>

PING.

-- 
H.J.
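As an aside for readers skimming the thread, here is a minimal sketch of how the proposed knob would be used. Only the attribute spelling comes from the proposal above; everything else is illustrative, and an unpatched compiler merely warns that the attribute is unknown:

```c
#include <assert.h>

/* Sketch of the proposed zero_caller_saved_regs attribute from this
   thread (hypothetical until the patch lands).  With
   -mzero-caller-saved-regs=used, or the attribute below, the epilogue
   would additionally zero the caller-saved integer registers the
   function actually used, so their final contents cannot leak to the
   caller.  On an unpatched compiler the unknown attribute is ignored
   with a warning; the C-level result is unchanged either way.  */
__attribute__ ((zero_caller_saved_regs ("used")))
static int
mix_secret (int secret, int salt)
{
  return (secret ^ salt) + 1;
}
```

The zeroing itself would only be visible in the generated epilogue (e.g. as extra `xor` instructions), not in the function's return value.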


[PATCH][AArch64] Use Q-reg loads/stores in movmem expansion

2018-12-21 Thread Kyrill Tkachov

Hi all,

Our movmem expansion currently emits TImode loads and stores when copying 
128-bit chunks.
This generates X-register LDP/STP sequences as these are the most preferred 
registers for that mode.

For the purpose of copying memory, however, we want to prefer Q-registers.
This uses one fewer register, helping with register pressure.
It also allows merging of 256-bit and larger copies into Q-reg LDP/STP, further 
helping code size.

The implementation of that is easy: we just use a 128-bit vector mode
(V4SImode in this patch) rather than TImode.

With this patch the testcase:
#define N 8
int src[N], dst[N];

void
foo (void)
{
  __builtin_memcpy (dst, src, N * sizeof (int));
}

generates:
foo:
        adrp    x1, src
        add     x1, x1, :lo12:src
        adrp    x0, dst
        add     x0, x0, :lo12:dst
        ldp     q1, q0, [x1]
        stp     q1, q0, [x0]
        ret

instead of:
foo:
        adrp    x1, src
        add     x1, x1, :lo12:src
        adrp    x0, dst
        add     x0, x0, :lo12:dst
        ldp     x2, x3, [x1]
        stp     x2, x3, [x0]
        ldp     x2, x3, [x1, 16]
        stp     x2, x3, [x0, 16]
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.
I hope this is a small enough change for GCC 9.
One could argue that it is finishing up the work done this cycle to support
Q-register LDP/STPs.

I've seen this give about 1.8% on 541.leela_r on Cortex-A57, with other changes
in SPEC2017 in the noise, but there is a reduction in code size everywhere (due
to more LDP/STP-Q pairs being formed).

Ok for trunk?

Thanks,
Kyrill

2018-12-21  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_expand_movmem): Use V4SImode for
128-bit moves.

2018-12-21  Kyrylo Tkachov  

* gcc.target/aarch64/movmem-q-reg_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 88b14179a4cbc5357dfabe21227ff9c8a111804c..a8dcdd4c9e22a7583a197372e500c787c91fe459 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -16448,6 +16448,16 @@ aarch64_expand_movmem (rtx *operands)
 	if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
 	  cur_mode = mode_iter.require ();
 
+  /* If we want to use 128-bit chunks use a vector mode to prefer the use
+	 of Q registers.  This is preferable to using load/store-pairs of X
+	 registers as we need 1 Q-register vs 2 X-registers.
+	 Also, for targets that prefer it, further passes can create
+	 LDP/STP of Q-regs to further reduce the code size.  */
+  if (TARGET_SIMD
+	  && known_eq (GET_MODE_SIZE (cur_mode), GET_MODE_SIZE (TImode)))
+	cur_mode = V4SImode;
+
+
   gcc_assert (cur_mode != BLKmode);
 
   mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
diff --git a/gcc/testsuite/gcc.target/aarch64/movmem-q-reg_1.c b/gcc/testsuite/gcc.target/aarch64/movmem-q-reg_1.c
new file mode 100644
index ..09afad59712b939e25519f02153b5156ddacbf5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movmem-q-reg_1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#define N 8
+int src[N], dst[N];
+
+void
+foo (void)
+{
+  __builtin_memcpy (dst, src, N * sizeof (int));
+}
+
+/* { dg-final { scan-assembler {ld[rp]\tq[0-9]*} } } */
+/* { dg-final { scan-assembler-not {ld[rp]\tx[0-9]*} } } */
+/* { dg-final { scan-assembler {st[rp]\tq[0-9]*} } } */
+/* { dg-final { scan-assembler-not {st[rp]\tx[0-9]*} } } */
\ No newline at end of file


[PATCH] [og8] Add OpenACC 2.6 if and if_present clauses on host_data construct

2018-12-21 Thread Gergö Barany
OpenACC 2.6 specifies `if' and `if_present' clauses on the `host_data' 
construct. These patches add support for these clauses. The first patch, 
by Thomas, reorganizes libgomp internals to turn a "device" argument 
into "flags" that can provide more information to the runtime. The 
second patch adds support for the `if' and `if_present' clauses, using 
the new flag mechanism.
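For illustration, the two clauses would be used as below. This snippet is mine, not from the patches; only the clause names come from the OpenACC 2.6 spec. Compiled without -fopenacc the pragmas are ignored and the code runs serially on the host:

```c
/* Hedged sketch of OpenACC 2.6 `if' and `if_present' on `host_data'.
   `if(cond)' makes the use_device mapping conditional at run time;
   `if_present' requests device pointers only for data already present
   on the device.  Without -fopenacc the pragmas are ignored and this
   is plain host code.  */
float
sum (const float *p, int n, int want_device_ptrs)
{
  float s = 0.0f;
#pragma acc data copyin(p[0:n])
  {
#pragma acc host_data use_device(p) if(want_device_ptrs) if_present
    for (int i = 0; i < n; i++)
      s += p[i];
  }
  return s;
}
```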


OK for openacc-gcc-8-branch?

gcc/
* omp-expand.c (expand_omp_target): Restructure OpenACC vs. OpenMP
code paths.  Update for libgomp OpenACC entry points change.
include/
* gomp-constants.h (GOACC_FLAG_HOST_FALLBACK)
(GOACC_FLAGS_MARSHAL_OP, GOACC_FLAGS_UNMARSHAL): Define.
libgomp/
* oacc-parallel.c (GOACC_parallel_keyed, GOACC_parallel)
(GOACC_data_start, GOACC_enter_exit_data, GOACC_update)
(GOACC_declare): Redefine the "device" argument to "flags".


gcc/c/
* c-parser.c (OACC_HOST_DATA_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_IF
and PRAGMA_OACC_CLAUSE_IF_PRESENT.
gcc/cp/
* parser.c (OACC_HOST_DATA_CLAUSE_MASK): Likewise.

gcc/fortran/
* openmp.c (OACC_HOST_DATA_CLAUSES): Add OMP_CLAUSE_IF and
OMP_CLAUSE_IF_PRESENT.

gcc/
* omp-expand.c (expand_omp_target): Handle if_present flag on
OpenACC host_data construct.

gcc/testsuite/c-c++-common/goacc/
* host_data-1.c: Add tests of if and if_present clauses on host_data.
gcc/testsuite/gfortran.dg/goacc/
* host_data-tree.f95: Likewise.

include/
* gomp-constants.h (GOACC_FLAG_HOST_DATA_IF_PRESENT): New constant.

libgomp/
* libgomp.h (enum gomp_map_vars_kind): Add
GOMP_MAP_VARS_OPENACC_IF_PRESENT.

libgomp/
* oacc-parallel.c (GOACC_data_start): Handle
GOACC_FLAG_HOST_DATA_IF_PRESENT flag.
* target.c (gomp_map_vars_async): Handle
GOMP_MAP_VARS_OPENACC_IF_PRESENT mapping kind.

libgomp/testsuite/libgomp.oacc-c-c++-common/
* host_data-6.c: New test.
From 6d719cc2bcfa8f7ed8cb59e753e44aab6bf634fb Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 19 Dec 2018 20:04:18 +0100
Subject: [PATCH 1/2] For libgomp OpenACC entry points, redefine the "device"
 argument to "flags"

... so that we're then able to use this for other flags in addition to
"GOACC_FLAG_HOST_FALLBACK".

	gcc/
	* omp-expand.c (expand_omp_target): Restructure OpenACC vs. OpenMP
	code paths.  Update for libgomp OpenACC entry points change.
	include/
	* gomp-constants.h (GOACC_FLAG_HOST_FALLBACK)
	(GOACC_FLAGS_MARSHAL_OP, GOACC_FLAGS_UNMARSHAL): Define.
	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed, GOACC_parallel)
	(GOACC_data_start, GOACC_enter_exit_data, GOACC_update)
	(GOACC_declare): Redefine the "device" argument to "flags".
---
 gcc/ChangeLog.openacc  |   5 ++
 gcc/omp-expand.c   | 111 +
 gcc/tree-ssa-structalias.c |   4 +-
 include/ChangeLog.openacc  |   5 ++
 include/gomp-constants.h   |  12 +
 libgomp/ChangeLog.openacc  |   6 +++
 libgomp/oacc-parallel.c|  60 ++--
 7 files changed, 139 insertions(+), 64 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 718044c..6a51b1e 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2018-12-21  Thomas Schwinge  
+
+	* omp-expand.c (expand_omp_target): Restructure OpenACC vs. OpenMP
+	code paths.  Update for libgomp OpenACC entry points change.
+
 2018-12-21  Gergö Barany  
 
 	* omp-low.c (scan_sharing_clauses): Fix call to renamed function
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 988b1bb..ea264da 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -7204,7 +7204,7 @@ expand_omp_target (struct omp_region *region)
  transfers.  */
   tree t1, t2, t3, t4, device, cond, depend, c, clauses;
   enum built_in_function start_ix;
-  location_t clause_loc;
+  location_t clause_loc = UNKNOWN_LOCATION;
   unsigned int flags_i = 0;
 
   switch (gimple_omp_target_kind (entry_stmt))
@@ -7249,49 +7249,62 @@ expand_omp_target (struct omp_region *region)
 
   clauses = gimple_omp_target_clauses (entry_stmt);
 
-  /* By default, the value of DEVICE is GOMP_DEVICE_ICV (let runtime
- library choose) and there is no conditional.  */
-  cond = NULL_TREE;
-  device = build_int_cst (integer_type_node, GOMP_DEVICE_ICV);
-
-  c = omp_find_clause (clauses, OMP_CLAUSE_IF);
-  if (c)
-cond = OMP_CLAUSE_IF_EXPR (c);
-
-  c = omp_find_clause (clauses, OMP_CLAUSE_DEVICE);
-  if (c)
+  device = NULL_TREE;
+  tree goacc_flags = NULL_TREE;
+  if (is_gimple_omp_oacc (entry_stmt))
 {
-  /* Even if we pass it to all library function calls, it is currently only
-	 defined/used for the OpenMP target ones.  */
-  gcc_checking_assert (start_ix == BUILT_IN_GOMP_TARGET
-			   || start_ix == BUILT_IN_GOMP_TARGET_DATA
-			   || start_ix == BUILT_IN_GOMP_TARGET_UPDATE
-			   || start_ix == BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA);
-
-  device = OMP_CLAUSE_DEVICE_ID (c);
-  clause_loc = 

[PATCH][og8] Update code and reduction tests for `serial' construct

2018-12-21 Thread Gergö Barany
This fixes a conflict between two recently committed patches to 
openacc-gcc-8-branch: Maciej's "Add OpenACC 2.6 `serial' construct 
support" and my "Report errors on missing OpenACC reduction clauses in 
nested reductions". The former renamed a function, which caused the 
latter to no longer compile.


Additionally, new tests for OpenACC reductions in serial regions are 
added, and the existing ones separated out by region kind 
(parallel/kernels/serial).


OK for openacc-gcc-8-branch?


2018-12-21  Gergö Barany  

gcc/
* omp-low.c (scan_sharing_clauses): Fix call to renamed function
is_oacc_parallel.
gcc/testsuite/c-c++-common/goacc/
* nested-reductions-fail.c: Renamed to...
* nested-reductions-parallel-fail.c: ...this file, with kernels tests...
* nested-reductions-kernels-fail.c: ... moved to this new file.
* nested-reductions-serial-fail.c: New test.
* nested-reductions.c: Renamed to...
* nested-reductions-parallel.c: ... this file, with kernels tests...
* nested-reductions-kernels.c: ... moved to this new file.
* nested-reductions-serial.c: New test.
From 72098b852c0cee656f61395c04f9271a0a598761 Mon Sep 17 00:00:00 2001
From: Gergö Barany 
Date: Fri, 21 Dec 2018 00:08:09 -0800
Subject: [PATCH] [og8] Update code and reduction tests for `serial' construct

gcc/
* omp-low.c (scan_sharing_clauses): Fix call to renamed function
is_oacc_parallel.
gcc/testsuite/c-c++-common/goacc/
* nested-reductions-fail.c: Renamed to...
* nested-reductions-parallel-fail.c: ...this file, with kernels tests...
* nested-reductions-kernels-fail.c: ... moved to this new file.
* nested-reductions-serial-fail.c: New test.
* nested-reductions.c: Renamed to...
* nested-reductions-parallel.c: ... this file, with kernels tests...
* nested-reductions-kernels.c: ... moved to this new file.
* nested-reductions-serial.c: New test.
---
 gcc/ChangeLog.openacc  |   5 +
 gcc/omp-low.c  |   2 +-
 gcc/testsuite/ChangeLog.openacc|  15 +
 .../c-c++-common/goacc/nested-reductions-fail.c| 492 -
 .../goacc/nested-reductions-kernels-fail.c | 273 
 .../c-c++-common/goacc/nested-reductions-kernels.c | 227 ++
 .../goacc/nested-reductions-parallel-fail.c| 447 +++
 .../goacc/nested-reductions-parallel.c | 384 
 .../goacc/nested-reductions-serial-fail.c  | 446 +++
 .../c-c++-common/goacc/nested-reductions-serial.c  | 391 
 .../c-c++-common/goacc/nested-reductions.c | 420 --
 11 files changed, 2189 insertions(+), 913 deletions(-)
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-kernels-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-kernels.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-parallel-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-parallel.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-serial-fail.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions-serial.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/nested-reductions.c

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 5973625..718044c 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2018-12-21  Gergö Barany  
+
+	* omp-low.c (scan_sharing_clauses): Fix call to renamed function
+	is_oacc_parallel.
+
 2018-12-20  Gergö Barany  
 
 	* omp-low.c (struct omp_context): New fields
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6b7b23e..72b6548 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1286,7 +1286,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	  goto do_private;
 
 	case OMP_CLAUSE_REDUCTION:
-  if (is_oacc_parallel (ctx) || is_oacc_kernels (ctx))
+  if (is_gimple_omp_oacc (ctx->stmt))
 ctx->local_reduction_clauses
 	  = tree_cons (NULL, c, ctx->local_reduction_clauses);
 	  decl = OMP_CLAUSE_DECL (c);
diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 4af31e5..473eb9d 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,3 +1,18 @@
+2018-12-21  Gergö Barany  
+
+	* c-c++-common/goacc/nested-reductions-fail.c: Renamed to...
+	* c-c++-common/goacc/nested-reductions-parallel-fail.c: ...this file,
+	with kernels tests...
+	* c-c++-common/goacc/nested-reductions-kernels-fail.c: ... moved to this
+	new file.
+	* c-c++-common/goacc/nested-reductions-serial-fail.c: New test.
+	* c-c++-common/goacc/nested-reductions.c: Renamed to...
+	* c-c++-common/goacc/nested-reductions-parallel.c: ... 

[committed] Add testcase for already fixed PR c++/87125

2018-12-21 Thread Jakub Jelinek
On Fri, Dec 14, 2018 at 06:15:54PM -0200, Alexandre Oliva wrote:
> On Dec  6, 2018, Alexandre Oliva  wrote:
> 
> > Regstrapped on x86_64- and i686-linux-gnu, mistakenly along with a patch
> > with a known regression, and got only that known regression.  Retesting
> > without it.  Ok to install?
> 
> Ping?  That retesting confirmed no regressions.
> https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00423.html
> 
> 
> > for  gcc/cp/ChangeLog
> 
> > PR c++/87814
> > * pt.c (tsubst_exception_specification): Handle
> > DEFERRED_NOEXCEPT with !defer_ok.
> 
> > for  gcc/testsuite/ChangeLog
> 
> > PR c++/87814
> > * g++.dg/cpp1z/pr87814.C: New.

This patch fixed also PR87125, I've added the simplified testcase to
the testsuite after verifying it still ICEs before your commit and doesn't
after it or before r261084, so that we can close the PR.

2018-12-21  Jakub Jelinek  

PR c++/87125
* g++.dg/cpp0x/pr87125.C: New test.

--- gcc/testsuite/g++.dg/cpp0x/pr87125.C.jj	2018-12-21 13:03:02.781212081 +0100
+++ gcc/testsuite/g++.dg/cpp0x/pr87125.C	2018-12-21 13:08:10.941433896 +0100
@@ -0,0 +1,15 @@
+// PR c++/87125
+// { dg-do compile { target c++11 } }
+
+template <typename T>
+struct S {
+  template <typename U>
+  constexpr S (U) noexcept (T ()) {}
+};
+struct V : S<int> { using S::S; };
+
+bool
+foo ()
+{
+  return noexcept (V (0));
+}


Jakub


[PATCH, ARM, committed] Fix size-optimization-ieee testcase failure

2018-12-21 Thread Thomas Preudhomme
I've committed the obvious attached patch to fix the
gcc.target/arm/size-optimization-ieee-* testcase failures.

On some versions of DejaGnu, options in RUNTESTFLAGS are appended to the
command line, and thus any -mfloat-abi=softfp or -mfloat-abi=hard in
there overrides the -mfloat-abi=soft in the dg-options for the
size-optimization-ieee-* tests. The tests are still run, though, because
arm_soft_ok returns true if -mfloat-abi=soft is accepted, even if the
file is not compiled for soft-float due to a later -mfloat-abi on the
command line.

This patch adds a dg-skip-if to those tests to ensure they are not run
in softfp or hard mode.

2018-12-21  Thomas Preud'homme  

gcc/testsuite/
* gcc.target/arm/size-optimization-ieee-1.c: Skip if passing
-mfloat-abi=softfp or -mfloat-abi=hard.
* gcc.target/arm/size-optimization-ieee-2.c: Likewise.
* gcc.target/arm/size-optimization-ieee-3.c: Likewise.
From c13cca23aa64a07f66c80f14dbdd79c63163783c Mon Sep 17 00:00:00 2001
From: thopre01 
Date: Fri, 21 Dec 2018 11:49:04 +
Subject: [PATCH] [ARM] Fix size-optimization-ieee testcase failure

On some version of dejagnu, options in RUNTESTFLAGS are appended to the
command-line and thus any -mfloat-abi=softfp or -mfloat-abi=hard in
there overwrite the -mfloat-abi=soft in the dg-options for
size-optimization-ieee-* tests. Test is still run though because
arm_soft_ok returns true if -mfloat-abi=soft is accepted, even if the
file is not compiled for softfloat due to a later -mfloat-abi on the
command line.

This patch adds a dg-skip-if to those tests to ensure they are not run
in softfp or hard mode.

2018-12-21  Thomas Preud'homme  

gcc/testsuite/
* gcc.target/arm/size-optimization-ieee-1.c: Skip if passing
-mfloat-abi=softfp or -mfloat-abi=hard.
* gcc.target/arm/size-optimization-ieee-2.c: Likewise.
* gcc.target/arm/size-optimization-ieee-3.c: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267323 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog | 7 +++
 gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c | 1 +
 gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c | 1 +
 gcc/testsuite/gcc.target/arm/size-optimization-ieee-3.c | 1 +
 4 files changed, 10 insertions(+)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index dcac93bb275..1569e7aaa0f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2018-12-21  Thomas Preud'homme  
+
+	* gcc.target/arm/size-optimization-ieee-1.c: Skip if passing
+	-mfloat-abi=softfp or -mfloat-abi=hard.
+	* gcc.target/arm/size-optimization-ieee-2.c: Likewise.
+	* gcc.target/arm/size-optimization-ieee-3.c: Likewise.
+
 2018-12-21  Jakub Jelinek  
 
 	PR target/88547
diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
index 34090f20fec..61475eb4c67 100644
--- a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
+++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-1.c
@@ -1,4 +1,5 @@
 /* { dg-do link { target arm_soft_ok } } */
+/* { dg-skip-if "Feature is -mfloat-abi=soft only" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
 /* { dg-options "-mfloat-abi=soft" } */
 
 int
diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
index 75337894a9c..b4699271cea 100644
--- a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
+++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-2.c
@@ -1,4 +1,5 @@
 /* { dg-do link { target arm_soft_ok } } */
+/* { dg-skip-if "Feature is -mfloat-abi=soft only" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
 /* { dg-options "-mfloat-abi=soft" } */
 
 int
diff --git a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-3.c b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-3.c
index 63c92b3bbb7..34b1ebe7afd 100644
--- a/gcc/testsuite/gcc.target/arm/size-optimization-ieee-3.c
+++ b/gcc/testsuite/gcc.target/arm/size-optimization-ieee-3.c
@@ -1,4 +1,5 @@
 /* { dg-do link { target arm_soft_ok } } */
+/* { dg-skip-if "Feature is -mfloat-abi=soft only" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=soft" } } */
 /* { dg-options "-mfloat-abi=soft" } */
 
 int
-- 
2.19.1



Re: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex multiplication and addition

2018-12-21 Thread Kyrill Tkachov

Hi Tamar,

On 11/12/18 15:46, Tamar Christina wrote:

Hi All,

This patch adds NEON intrinsics and tests for the Armv8.3-a complex
multiplication and add instructions with a rotate along the Argand plane.

The instructions are documented in the ArmARM[1] and the intrinsics 
specification
will be published on the Arm website [2].

The lane versions of these instructions are special in that they always select 
a pair: using index 0 means selecting lanes 0 and 1.  Because of this, the 
range check for the intrinsics requires special handling.

On Arm, in order to implement some of the lane intrinsics we're using the 
structure of the register file.  The lane variant of these instructions always 
selects a D register, but the data itself can be stored in Q registers.  This 
means that for single-precision complex numbers you are only allowed to select 
D[0], but using the register file layout you can get the range 0-1 for lane 
indices by selecting between Dn[0] and Dn+1[0].

The same reasoning applies for half-float complex numbers, except there your 
D-register indexes can be 0 or 1, so you have a total range of 4 elements (for 
a V8HF).
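The indexing scheme described above can be sketched in plain C. This is an illustration of the mapping only; the struct and function names are made up and this is not GCC code:

```c
#include <assert.h>

/* A 128-bit Q register is viewed as two consecutive 64-bit D registers.
   For single-precision (V4SF), each complex pair fills a whole D
   register, so logical pair index i (0-1) selects register Dn+i, pair 0.
   For half-precision (V8HF), each D register holds two pairs, so
   logical pair index i (0-3) selects pair i%2 of register Dn+i/2.  */
struct d_sel { int d_offset; int pair_in_d; };

static struct d_sel
complex_lane_f32 (int i)   /* i in [0, 1] */
{
  struct d_sel s = { i, 0 };
  return s;
}

static struct d_sel
complex_lane_f16 (int i)   /* i in [0, 3] */
{
  struct d_sel s = { i / 2, i % 2 };
  return s;
}
```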


[1] 
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
[2] https://developer.arm.com/docs/101028/latest

Bootstrapped Regtested on arm-none-gnueabihf and no issues.

Ok for trunk?



Ok.
Thanks,
Kyrill


Thanks,
Tamar

gcc/ChangeLog:

2018-12-11  Tamar Christina  

* config/arm/arm-builtins.c
(enum arm_type_qualifiers): Add qualifier_lane_pair_index.
(MAC_LANE_PAIR_QUALIFIERS): New.
(arm_expand_builtin_args): Use it.
(arm_expand_builtin_1): Likewise.
* config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands): New.
* config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
* config/arm/arm-c.c (arm_cpu_builtins): Add __ARM_FEATURE_COMPLEX.
* config/arm/arm_neon.h:
(vcadd_rot90_f16): New.
(vcaddq_rot90_f16): New.
(vcadd_rot270_f16): New.
(vcaddq_rot270_f16): New.
(vcmla_f16): New.
(vcmlaq_f16): New.
(vcmla_lane_f16): New.
(vcmla_laneq_f16): New.
(vcmlaq_lane_f16): New.
(vcmlaq_laneq_f16): New.
(vcmla_rot90_f16): New.
(vcmlaq_rot90_f16): New.
(vcmla_rot90_lane_f16): New.
(vcmla_rot90_laneq_f16): New.
(vcmlaq_rot90_lane_f16): New.
(vcmlaq_rot90_laneq_f16): New.
(vcmla_rot180_f16): New.
(vcmlaq_rot180_f16): New.
(vcmla_rot180_lane_f16): New.
(vcmla_rot180_laneq_f16): New.
(vcmlaq_rot180_lane_f16): New.
(vcmlaq_rot180_laneq_f16): New.
(vcmla_rot270_f16): New.
(vcmlaq_rot270_f16): New.
(vcmla_rot270_lane_f16): New.
(vcmla_rot270_laneq_f16): New.
(vcmlaq_rot270_lane_f16): New.
(vcmlaq_rot270_laneq_f16): New.
(vcadd_rot90_f32): New.
(vcaddq_rot90_f32): New.
(vcadd_rot270_f32): New.
(vcaddq_rot270_f32): New.
(vcmla_f32): New.
(vcmlaq_f32): New.
(vcmla_lane_f32): New.
(vcmla_laneq_f32): New.
(vcmlaq_lane_f32): New.
(vcmlaq_laneq_f32): New.
(vcmla_rot90_f32): New.
(vcmlaq_rot90_f32): New.
(vcmla_rot90_lane_f32): New.
(vcmla_rot90_laneq_f32): New.
(vcmlaq_rot90_lane_f32): New.
(vcmlaq_rot90_laneq_f32): New.
(vcmla_rot180_f32): New.
(vcmlaq_rot180_f32): New.
(vcmla_rot180_lane_f32): New.
(vcmla_rot180_laneq_f32): New.
(vcmlaq_rot180_lane_f32): New.
(vcmlaq_rot180_laneq_f32): New.
(vcmla_rot270_f32): New.
(vcmlaq_rot270_f32): New.
(vcmla_rot270_lane_f32): New.
(vcmla_rot270_laneq_f32): New.
(vcmlaq_rot270_lane_f32): New.
(vcmlaq_rot270_laneq_f32): New.
* config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0, vcmla90,
vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180, 
vcmla_lane270,
vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270): New.
* config/arm/neon.md (neon_vcmla_lane,
neon_vcmla_laneq, neon_vcmlaq_lane): New.

gcc/testsuite/ChangeLog:

2018-12-11  Tamar Christina  

* gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add AArch32 
regexpr.
* gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c: Likewise.

--




Re: [040/nnn] poly_int: get_inner_reference & co.

2018-12-21 Thread Jakub Jelinek
On Fri, Dec 21, 2018 at 12:10:26PM +0100, Thomas Schwinge wrote:
>   gcc/
>   * gimplify.c (gimplify_scan_omp_clauses): Fix known_eq typo/bug.

Ok, thanks.

> ---
>  gcc/gimplify.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 465d138abbed..40ed18e30271 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -8719,7 +8719,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
> o2 = 0;
>   o2 += bits_to_bytes_round_down (bitpos2);
>   if (maybe_lt (o1, o2)
> - || (known_eq (o1, 2)
> + || (known_eq (o1, o2)
>   && maybe_lt (bitpos, bitpos2)))
> {
>   if (ptr)
> -- 
> 2.17.1
> 


Jakub


Re: [040/nnn] poly_int: get_inner_reference & co.

2018-12-21 Thread Thomas Schwinge
Hi!

On Mon, 23 Oct 2017 18:17:38 +0100, Richard Sandiford 
 wrote:
> This patch makes get_inner_reference and ptr_difference_const return the
> bit size and bit position as poly_int64s rather than HOST_WIDE_INTS.
> The non-mechanical changes were handled by previous patches.

(A variant of that got committed to trunk in r255914.)

> --- gcc/gimplify.c2017-10-23 17:11:40.246949037 +0100
> +++ gcc/gimplify.c2017-10-23 17:18:47.663057272 +0100

> @@ -8056,13 +8056,13 @@ gimplify_scan_omp_clauses (tree *list_p,

> - if (bitpos2)
> -   o2 = o2 + bitpos2 / BITS_PER_UNIT;
> - if (wi::ltu_p (o1, o2)
> - || (wi::eq_p (o1, o2) && bitpos < bitpos2))
> + o2 += bits_to_bytes_round_down (bitpos2);
> + if (may_lt (o1, o2)
> + || (must_eq (o1, 2)
> + && may_lt (bitpos, bitpos2)))
> {

("must_eq" is nowadays known as "known_eq".)  As Julian points out in
 (thanks!,
but please, bug fixes separate from code refactoring), there is an
'apparent bug introduced [...]: "known_eq (o1, 2)" should have been
"known_eq (o1, o2)"'.

I have not searched now for any other such issues -- could this one have
been (or, any others now be) found automatically?
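Reduced to plain integers (dropping the poly_int wrappers), the ordering test is meant to compare byte offsets first and bit positions second, which is why comparing o1 against the literal 2 is wrong:

```c
#include <assert.h>

/* The intended comparison from gimplify_scan_omp_clauses, reduced to
   plain integers for illustration: clause 1 precedes clause 2 if its
   byte offset o1 is smaller, or the byte offsets are equal and its
   bit position is smaller.  The typo compared o1 with the literal 2
   instead of with o2, so the tie-breaking branch was wrong whenever
   the byte offsets were equal but not 2.  */
static int
precedes (long o1, long bitpos1, long o2, long bitpos2)
{
  return o1 < o2 || (o1 == o2 && bitpos1 < bitpos2);
}
```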

OK to fix (on all relevant branches) this as in the patch attached?  If
approving this patch, please respond with "Reviewed-by: NAME " so
that your effort will be recorded in the commit log, see
.


Grüße
 Thomas


>From 0396c4087114d4a63824d89ff33110b76d607768 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 21 Dec 2018 11:58:45 +0100
Subject: [PATCH] poly_int: get_inner_reference & co.: fix known_eq typo/bug

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Fix known_eq typo/bug.
---
 gcc/gimplify.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 465d138abbed..40ed18e30271 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8719,7 +8719,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 			  o2 = 0;
 			o2 += bits_to_bytes_round_down (bitpos2);
 			if (maybe_lt (o1, o2)
-|| (known_eq (o1, 2)
+|| (known_eq (o1, o2)
 && maybe_lt (bitpos, bitpos2)))
 			  {
 if (ptr)
-- 
2.17.1



Re: [PATCH 4/9][GCC][AArch64/Arm] Add new testsuite directives to check complex instructions.

2018-12-21 Thread Kyrill Tkachov

Hi Tamar,

On 11/11/18 10:27, Tamar Christina wrote:

Hi All,

This patch adds new testsuite directives for both Arm and AArch64 to support
testing of the Complex Arithmetic operations from Armv8.3-a.

Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
x86_64-pc-linux-gnu
and no regressions.

The instructions have also been tested on aarch64-none-elf and arm-none-eabi on 
an Armv8.3-a model
and -march=Armv8.3-a+fp16 and all tests pass.

Ok for trunk?



This is ok from an arm perspective.

Thanks,
Kyrill


Thanks,
Tamar

gcc/testsuite/ChangeLog:

2018-11-11  Tamar Christina  

* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache,
check_effective_target_arm_v8_3a_complex_neon_ok,
add_options_for_arm_v8_3a_complex_neon,
check_effective_target_arm_v8_3a_complex_neon_hw,
check_effective_target_vect_complex_rot_N): New.

--




Re: [PATCH 7/9][GCC][Arm] Enable autovectorization of Half float values

2018-12-21 Thread Kyrill Tkachov

Hi Tamar,

On 11/11/18 10:27, Tamar Christina wrote:

Hi All,

The AArch32 backend is currently not able to support autovectorization of 
half-float values
on ARM. This is because we never told the vectorizer what the vector modes are 
for Half floats.

This enables autovectorization by defining V4HF and V8HF as the vector modes.

Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and 
x86_64-pc-linux-gnu
are still on going but previous patch showed no regressions.



Did the testing go okay in the end?
This looks ok to me, but can you provide an example, or better yet, add a test 
that demonstrates this change?

Thanks,
Kyrill


Ok for trunk?

Thanks,
Tamar

gcc/ChangeLog:

2018-11-11  Tamar Christina  

* config/arm/arm.c (arm_preferred_simd_mode): Add V4HF and V8HF.

--




RE: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex multiplication and addition

2018-12-21 Thread Tamar Christina
Ping

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Tamar Christina
> Sent: Tuesday, December 11, 2018 15:47
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [PATCH 9/9][GCC][Arm] Add ACLE intrinsics for complex
> multiplication and addition
> 
> Hi All,
> 
> This patch adds NEON intrinsics and tests for the Armv8.3-a complex
> multiplication and add instructions with a rotate along the Argand plane.
> 
> The instructions are documented in the ArmARM[1] and the intrinsics
> specification will be published on the Arm website [2].
> 
> The Lane versions of these instructions are special in that they always
> select a pair: using index 0 means selecting lanes 0 and 1.  Because of
> this, the range check for the intrinsics requires special handling.
> 
> On Arm, in order to implement some of the lane intrinsics we're using the
> structure of the register file.  The lane variant of these instructions always
> select a D register, but the data itself can be stored in Q registers.  This 
> means
> that for single precision complex numbers you are only allowed to select D[0]
> but using the register file layout you can get the range 0-1 for lane indices 
> by
> selecting between Dn[0] and Dn+1[0].
> 
> Same reasoning applies for half float complex numbers, except there your D
> register indexes can be 0 or 1, so you have a total range of 4 elements (for a
> V8HF).
> 
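The pair-selection rule described above can be sketched in scalar C.  This is an illustrative model only (the `vcmla_lane_rot0` helper is mine, and the rotation-0 accumulation `acc.re += a.re * b.re; acc.im += a.re * b.im` is my reading of the FCMLA description, not GCC code): lane L of the source vector names elements 2*L and 2*L+1, i.e. one complex number.

```c
#include <assert.h>

/* Two single-precision complex numbers packed as {re0, im0, re1, im1}.  */
typedef struct { float v[4]; } v4sf;

/* Scalar model of a lane-pair complex multiply-accumulate, rotation 0:
   the lane index selects one complex *pair* of B (elements 2*lane and
   2*lane + 1), which is then applied to every complex element of A.  */
static v4sf
vcmla_lane_rot0 (v4sf acc, v4sf a, v4sf b, int lane)
{
  float br = b.v[2 * lane];      /* real part of the selected pair */
  float bi = b.v[2 * lane + 1];  /* imaginary part of the selected pair */
  for (int i = 0; i < 2; i++)
    {
      acc.v[2 * i]     += a.v[2 * i] * br;  /* acc.re += a.re * b.re */
      acc.v[2 * i + 1] += a.v[2 * i] * bi;  /* acc.im += a.re * b.im */
    }
  return acc;
}
```

With only two single-precision pairs per Q register (one per D register), an in-range lane index on a D operand can only be 0, which is why the 0-1 range described above has to be obtained through the Dn[0]/Dn+1[0] register-file trick.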
> 
> [1] https://developer.arm.com/docs/ddi0487/latest/arm-architecture-
> reference-manual-armv8-for-armv8-a-architecture-profile
> [2] https://developer.arm.com/docs/101028/latest
> 
> Bootstrapped Regtested on arm-none-gnueabihf and no issues.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 2018-12-11  Tamar Christina  
> 
>   * config/arm/arm-builtins.c
>   (enum arm_type_qualifiers): Add qualifier_lane_pair_index.
>   (MAC_LANE_PAIR_QUALIFIERS): New.
>   (arm_expand_builtin_args): Use it.
>   (arm_expand_builtin_1): Likewise.
>   * config/arm/arm-protos.h (neon_vcmla_lane_prepare_operands):
> New.
>   * config/arm/arm.c (neon_vcmla_lane_prepare_operands): New.
>   * config/arm/arm-c.c (arm_cpu_builtins): Add
> __ARM_FEATURE_COMPLEX.
>   * config/arm/arm_neon.h:
>   (vcadd_rot90_f16): New.
>   (vcaddq_rot90_f16): New.
>   (vcadd_rot270_f16): New.
>   (vcaddq_rot270_f16): New.
>   (vcmla_f16): New.
>   (vcmlaq_f16): New.
>   (vcmla_lane_f16): New.
>   (vcmla_laneq_f16): New.
>   (vcmlaq_lane_f16): New.
>   (vcmlaq_laneq_f16): New.
>   (vcmla_rot90_f16): New.
>   (vcmlaq_rot90_f16): New.
>   (vcmla_rot90_lane_f16): New.
>   (vcmla_rot90_laneq_f16): New.
>   (vcmlaq_rot90_lane_f16): New.
>   (vcmlaq_rot90_laneq_f16): New.
>   (vcmla_rot180_f16): New.
>   (vcmlaq_rot180_f16): New.
>   (vcmla_rot180_lane_f16): New.
>   (vcmla_rot180_laneq_f16): New.
>   (vcmlaq_rot180_lane_f16): New.
>   (vcmlaq_rot180_laneq_f16): New.
>   (vcmla_rot270_f16): New.
>   (vcmlaq_rot270_f16): New.
>   (vcmla_rot270_lane_f16): New.
>   (vcmla_rot270_laneq_f16): New.
>   (vcmlaq_rot270_lane_f16): New.
>   (vcmlaq_rot270_laneq_f16): New.
>   (vcadd_rot90_f32): New.
>   (vcaddq_rot90_f32): New.
>   (vcadd_rot270_f32): New.
>   (vcaddq_rot270_f32): New.
>   (vcmla_f32): New.
>   (vcmlaq_f32): New.
>   (vcmla_lane_f32): New.
>   (vcmla_laneq_f32): New.
>   (vcmlaq_lane_f32): New.
>   (vcmlaq_laneq_f32): New.
>   (vcmla_rot90_f32): New.
>   (vcmlaq_rot90_f32): New.
>   (vcmla_rot90_lane_f32): New.
>   (vcmla_rot90_laneq_f32): New.
>   (vcmlaq_rot90_lane_f32): New.
>   (vcmlaq_rot90_laneq_f32): New.
>   (vcmla_rot180_f32): New.
>   (vcmlaq_rot180_f32): New.
>   (vcmla_rot180_lane_f32): New.
>   (vcmla_rot180_laneq_f32): New.
>   (vcmlaq_rot180_lane_f32): New.
>   (vcmlaq_rot180_laneq_f32): New.
>   (vcmla_rot270_f32): New.
>   (vcmlaq_rot270_f32): New.
>   (vcmla_rot270_lane_f32): New.
>   (vcmla_rot270_laneq_f32): New.
>   (vcmlaq_rot270_lane_f32): New.
>   (vcmlaq_rot270_laneq_f32): New.
>   * config/arm/arm_neon_builtins.def (vcadd90, vcadd270, vcmla0,
> vcmla90,
>   vcmla180, vcmla270, vcmla_lane0, vcmla_lane90, vcmla_lane180,
> vcmla_lane270,
>   vcmla_laneq0, vcmla_laneq90, vcmla_laneq180, vcmla_laneq270,
>   vcmlaq_lane0, vcmlaq_lane90, vcmlaq_lane180, vcmlaq_lane270):
> New.
>   * config/arm/neon.md (neon_vcmla_lane,
>   neon_vcmla_laneq, neon_vcmlaq_lane):
> New.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-12-11  Tamar Christina  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vector-complex.c: Add
> AArch32 regexpr.
>   * gcc.target/aarch64/advsimd-intrinsics/vector-complex_f16.c:
> Likewise.
> 
> --


RE: [PATCH 7/9][GCC][Arm] Enable autovectorization of Half float values

2018-12-21 Thread Tamar Christina
Ping.

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Tamar Christina
> Sent: Sunday, November 11, 2018 10:28
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [PATCH 7/9][GCC][Arm] Enable autovectorization of Half float values
> 
> Hi All,
> 
> The AArch32 backend is currently not able to support autovectorization of
> half-float values on ARM. This is because we never told the vectorizer what
> the vector modes are for Half floats.
> 
> This enables autovectorization by defining V4HF and V8HF as the vector
> modes.
> 
> Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf and
> x86_64-pc-linux-gnu are still on going but previous patch showed no
> regressions.
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 2018-11-11  Tamar Christina  
> 
>   * config/arm/arm.c (arm_preferred_simd_mode): Add V4HF and
> V8HF.
> 
> --


RE: [PATCH 4/9][GCC][AArch64/Arm] Add new testsuite directives to check complex instructions.

2018-12-21 Thread Tamar Christina
Ping arm maintainers.

> -Original Message-
> From: James Greenhalgh 
> Sent: Wednesday, November 28, 2018 17:18
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; ni...@redhat.com; Ramana Radhakrishnan
> ; Kyrylo Tkachov
> 
> Subject: Re: [PATCH 4/9][GCC][AArch64/Arm] Add new testsuite directives
> to check complex instructions.
> 
> On Sun, Nov 11, 2018 at 04:27:04AM -0600, Tamar Christina wrote:
> > Hi All,
> >
> > This patch adds new testsuite directives for both Arm and AArch64 to
> > support testing of the Complex Arithmetic operations from Armv8.3-a.
> >
> > Bootstrap and Regtest on aarch64-none-linux-gnu, arm-none-gnueabihf
> > and x86_64-pc-linux-gnu and no regressions.
> >
> > The instructions have also been tested on aarch64-none-elf and
> > arm-none-eabi on an Armv8.3-a model and -march=Armv8.3-a+fp16 and all
> tests pass.
> >
> > Ok for trunk?
> 
> OK by me on principle, but I don't speak TCL and can't approve the Arm part.
> 
> Please ask a testsuite maintainer.
> 
> Thanks,
> James
> 
> >
> > Thanks,
> > Tamar
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2018-11-11  Tamar Christina  
> >
> > * lib/target-supports.exp
> > (check_effective_target_arm_v8_3a_complex_neon_ok_nocache,
> > check_effective_target_arm_v8_3a_complex_neon_ok,
> > add_options_for_arm_v8_3a_complex_neon,
> > check_effective_target_arm_v8_3a_complex_neon_hw,
> > check_effective_target_vect_complex_rot_N): New.
> >
> > --


libgomp/target.c magic constants self-documentation

2018-12-21 Thread Thomas Schwinge
Hi!

On Sat, 10 Nov 2018 09:11:18 -0800, Julian Brown  
wrote:
> This patch (by Cesar, with some minor additional changes)

Cesar's changes we're handling separately (already approved; will commit
soon), so it remains here:

> replaces usage
> of several magic constants in target.c with named macros

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -902,6 +902,11 @@ struct target_mem_desc {
> artificial pointer to "omp declare target link" object.  */
>  #define REFCOUNT_LINK (~(uintptr_t) 1)
>  
> +/* Special offset values.  */
> +#define OFFSET_INLINED (~(uintptr_t) 0)
> +#define OFFSET_POINTER (~(uintptr_t) 1)
> +#define OFFSET_STRUCT (~(uintptr_t) 2)
> +
>  struct splay_tree_key_s {
>/* Address of the host object.  */
>uintptr_t host_start;

I'd move these close to the struct they apply to.
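As a side note on these sentinel values: being the top three values of uintptr_t, they cannot collide with one another, and the code implicitly assumes no real offset ever reaches that range.  A minimal sketch (macros copied verbatim from the patch):

```c
#include <stdint.h>

/* Special offset sentinels from the patch: the three largest uintptr_t
   values, guaranteed pairwise distinct by construction.  */
#define OFFSET_INLINED (~(uintptr_t) 0)
#define OFFSET_POINTER (~(uintptr_t) 1)
#define OFFSET_STRUCT (~(uintptr_t) 2)
```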


> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -45,6 +45,8 @@
>  #include "plugin-suffix.h"
>  #endif
>  
> +#define FIELD_TGT_EMPTY (~(size_t) 0)
> +
>  static void gomp_target_init (void);
>  
>  /* The whole initialization code for offloading plugins is only run one.  */

As it's only used there, I'd actually move that one into "gomp_map_vars",
as a "const size_t field_tgt_empty".  And, you missed using it in the
initialization of "field_tgt_clear".  ;-)


> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -876,6 +892,8 @@ gomp_map_vars_async (struct gomp_device_descr *devicep,
>   else
> k->host_end = k->host_start + sizeof (void *);
>   splay_tree_key n = splay_tree_lookup (mem_map, k);
> + /* Need to account for the case where a struct field hasn't been
> +mapped onto the accelerator yet.  */
>   if (n && n->refcount != REFCOUNT_LINK)
> gomp_map_vars_existing (devicep, aq, n, k, &tgt->list[i],
> kind & typemask, cbufp);

We usually talk about "device", not "accelerator".


All that I'm changing with the incremental patch attached.


I'm also again attaching the complete patch that we'd like to commit to
trunk; Jakub, OK?  If approving this patch, please respond with
"Reviewed-by: NAME " so that your effort will be recorded in the
commit log, see .


Grüße
 Thomas


>From 8f36a7d620b3e1d0130b352dc02d58c066c7ba92 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 21 Dec 2018 11:28:49 +0100
Subject: [PATCH] [WIP] libgomp/target.c magic constants self-documentation

---
 libgomp/libgomp.h | 10 +-
 libgomp/target.c  | 11 +--
 2 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 19e5fbb24e26..eef380d7b0fc 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -873,6 +873,11 @@ struct target_var_desc {
   uintptr_t length;
 };
 
+/* Special values for struct target_var_desc's offset.  */
+#define OFFSET_INLINED (~(uintptr_t) 0)
+#define OFFSET_POINTER (~(uintptr_t) 1)
+#define OFFSET_STRUCT (~(uintptr_t) 2)
+
 struct target_mem_desc {
   /* Reference count.  */
   uintptr_t refcount;
@@ -903,11 +908,6 @@ struct target_mem_desc {
artificial pointer to "omp declare target link" object.  */
 #define REFCOUNT_LINK (~(uintptr_t) 1)
 
-/* Special offset values.  */
-#define OFFSET_INLINED (~(uintptr_t) 0)
-#define OFFSET_POINTER (~(uintptr_t) 1)
-#define OFFSET_STRUCT (~(uintptr_t) 2)
-
 struct splay_tree_key_s {
   /* Address of the host object.  */
   uintptr_t host_start;
diff --git a/libgomp/target.c b/libgomp/target.c
index d7acdd9b784b..201da567d73a 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -45,8 +45,6 @@
 #include "plugin-suffix.h"
 #endif
 
-#define FIELD_TGT_EMPTY (~(size_t) 0)
-
 static void gomp_target_init (void);
 
 /* The whole initialization code for offloading plugins is only run one.  */
@@ -748,7 +746,8 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
   if (not_found_cnt)
 	tgt->array = gomp_malloc (not_found_cnt * sizeof (*tgt->array));
   splay_tree_node array = tgt->array;
-  size_t j, field_tgt_offset = 0, field_tgt_clear = ~(size_t) 0;
+  const size_t field_tgt_empty = ~(size_t) 0;
+  size_t j, field_tgt_offset = 0, field_tgt_clear = field_tgt_empty;
   uintptr_t field_tgt_base = 0;
 
   for (i = 0; i < mapnum; i++)
@@ -841,7 +840,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	  k->host_end = k->host_start + sizeof (void *);
 	splay_tree_key n = splay_tree_lookup (mem_map, k);
 	/* Need to account for the case where a struct field hasn't been
-	   mapped onto the accelerator yet.  */
+	   mapped onto the device yet.  */
 	if (n && n->refcount != REFCOUNT_LINK)
	  gomp_map_vars_existing (devicep, n, k, &tgt->list[i],
   kind & typemask, cbufp);
@@ -858,12 +857,12 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i].key = k;
 		k->tgt = tgt;
-		if 

Re: [PATCH] Use vpmin to optimize some vector comparisons (PR target/88547)

2018-12-21 Thread Uros Bizjak
On Fri, Dec 21, 2018 at 10:13 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch attempts to improve code generated for some
> integral vector comparisons and VEC_COND_EXPRs with integral comparisons.
> The only available integral vector comparison instructions are GT and
> EQ, the rest is handled either by negating the result (for vcond swapping
> op_true/op_false), or swapping comparison operands and for unsigned
> comparisons we use various tricks (either subtract *_MAX from both operands
> for V*SI/V*DImode, or use saturating subtractions for V*QI/V*HImode).
>
> If op_true is -1 and op_false is 0, at least when not using AVX512 mask
> reg comparisons we have the right result right after the comparison.
> So for signed x > y we can just
> vpcmpgtd%ymm1, %ymm0, %ymm0
> but if op_true is 0 and op_false is -1, i.e. x <= y, we generate
> vpcmpgtd%ymm1, %ymm0, %ymm0
> vpcmpeqd%ymm1, %ymm1, %ymm1
> vpandn  %ymm1, %ymm0, %ymm0
> The following patch attempts to detect these cases where we would have
> op_true 0 and op_false -1, and rather than generating a single vpcmpgtd
> for the comparison and wait for 2 more instructions for the conditional move
> we generate two instructions for the comparison - min (x, y) == x and don't
> need anything for the rest, so:
> vpminud %ymm1, %ymm0, %ymm1
> vpcmpeqd%ymm0, %ymm1, %ymm0
> For most cases it is done only in these cases where
> 1) mask registers aren't involved
> 2) op_true == 0 and op_false == -1 (with the *negate considered, as the
>transformation inverts *negate)
>
> There is one case where this is useful to do even in other cases, for
> V*SI/V*DImode and unsigned comparisons we generate those
> x -= INT_MAX; y -= INT_MAX subtractions before comparison, so using
> vpminu[dq] + vpcmpeq[dq] is shorter regardless of what follows (with the
> exception when op_true is -1 and op_false is 0, when both sequences are the
> same).
>
> Of course, we can do this optimization only if the corresponding %vpmin
> instructions are available, which varies a lot depending on mode (sometimes
> SSE2, SSE4.1, AVX2, AVX512DQ {,+VL}, AVX512BW {,+VL}).
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> The reason for the avx512f_cond_move.c testcase adjustment is that it hits
> the above second case, where vpminud + vpcmpeqd is shorter, but that means
> we transform that cmp1 ? 2 : 0 which can use a {z} masking into cmp2 ? 0 : 2
> which can't (still, the whole thing is shorter).  On the other side, if the
> original testcase was cmp1 ? 0 : 2 then we wouldn't use {z} masking before
> and with cmp2 ? 2 : 0 we would with the patch.  By changing the comparison
> to be signed, we change what I believe the testcase meant to test - verify
> that combiner can produce {z} masking.
>
> 2018-12-21  Jakub Jelinek  
>
> PR target/88547
> * config/i386/i386.c (ix86_expand_int_sse_cmp): Optimize
> x > y ? 0 : -1 into min (x, y) == x ? -1 : 0.
>
> * gcc.target/i386/pr88547-1.c: Expect only 2 knotb and 2 knotw
> insns instead of 4, check for vpminud, vpminuq and no vpsubd or
> vpsubq.
> * gcc.target/i386/sse2-pr88547-1.c: New test.
> * gcc.target/i386/sse2-pr88547-2.c: New test.
> * gcc.target/i386/sse4_1-pr88547-1.c: New test.
> * gcc.target/i386/sse4_1-pr88547-2.c: New test.
> * gcc.target/i386/avx2-pr88547-1.c: New test.
> * gcc.target/i386/avx2-pr88547-2.c: New test.
> * gcc.target/i386/avx512f-pr88547-2.c: New test.
> * gcc.target/i386/avx512vl-pr88547-1.c: New test.
> * gcc.target/i386/avx512vl-pr88547-2.c: New test.
> * gcc.target/i386/avx512vl-pr88547-3.c: New test.
> * gcc.target/i386/avx512f_cond_move.c (y): Change from unsigned int
> array to int array.

Nice patch, LGTM.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2018-12-20 18:28:51.118253338 +0100
> +++ gcc/config/i386/i386.c  2018-12-21 02:17:23.049042774 +0100
> @@ -24126,6 +24126,104 @@ ix86_expand_int_sse_cmp (rtx dest, enum
> }
> }
>
> +  rtx optrue = op_true ? op_true : CONSTM1_RTX (data_mode);
> +  rtx opfalse = op_false ? op_false : CONST0_RTX (data_mode);
> +  if (*negate)
> +   std::swap (optrue, opfalse);
> +
> +  /* Transform x > y ? 0 : -1 (i.e. x <= y ? -1 : 0 or x <= y) when
> +not using integer masks into min (x, y) == x ? -1 : 0 (i.e.
> +min (x, y) == x).  While we add one instruction (the minimum),
> +we remove the need for two instructions in the negation, as the
> +result is done this way.
> +When using masks, do it for SI/DImode element types, as it is shorter
> +than the two subtractions.  */
> +  if ((code != EQ
> +  && GET_MODE_SIZE (mode) != 64
> +  && vector_all_ones_operand (opfalse, data_mode)
> +  && optrue == 

Re: [PATCH] attribute copy, leaf, weakref and -Wmisisng-attributes (PR 88546)

2018-12-21 Thread Jakub Jelinek
Hi!

I think the main question is whether we should accept leaf attribute
on weakrefs, despite them being marked as !TREE_PUBLIC.

I know we haven't allowed that until now, but weakrefs are weirdo things
which have both static and external effects, static for that they are a
local alias and external for being actually aliases to (usually) external
functions.  If we add a weakref for some function declared as leaf,
it is unnecessarily pessimizing when we don't allow the leaf attribute on
the weakref.

Your patch looks reasonable to me to revert to previous state, but if we
decide to change the above, it would need to change.

On Thu, Dec 20, 2018 at 08:45:03PM -0700, Martin Sebor wrote:
> --- gcc/c-family/c-attribs.c  (revision 267282)
> +++ gcc/c-family/c-attribs.c  (working copy)
> @@ -2455,6 +2455,12 @@ handle_copy_attribute (tree *node, tree name, tree
> || is_attribute_p ("weakref", atname))
>   continue;
>  
> +   /* Aattribute leaf only applies to extern functions.
> +  Avoid copying it to static ones.  */

s/Aattribute/Attribute/

Jakub


Re: [Patch] Bug 88521 - gcc 9.0 from r266355 miscompile x265 for mingw-w64 target

2018-12-21 Thread JonY
On 12/21/18 9:08 AM, Uros Bizjak wrote:
> On Thu, Dec 20, 2018 at 1:09 PM Jakub Jelinek  wrote:
>>
>> On Thu, Dec 20, 2018 at 01:42:15PM +0530, Lokesh Janghel wrote:
>>> Hi Mateuszb,
>>>
>>> I tested with your proposition patch and it is working right.
>>> I also added the patch with test case.
>>> Please let me know your thoughts/suggestions.
>>
>> ChangeLog entry is missing, please write it (and mention there
>> Mateusz's name/mail as he wrote the i386.c part).
>>

Patch looks good to me, but please add a ChangeLog.






Re: [PATCH] Rip out rhs-to-tree from tree-affine.c

2018-12-21 Thread Richard Biener
On Tue, 11 Dec 2018, Richard Biener wrote:

> 
> After the previous cleanup the following is more straight-forward now.
> This should make tree-affine behave wrt the "new" GIMPLE world.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, now making
> sure there are no codegen changes as expected.

So this is the final patch.  It generates the same code for gcc/*.o
besides one case where it improved code-generation slightly.
Unfortunately it regresses

FAIL: gcc.dg/tree-ssa/ivopts-lt-2.c scan-tree-dump-times ivopts "PHI" 1
FAIL: gcc.dg/tree-ssa/ivopts-lt-2.c scan-tree-dump-times ivopts "p_[0-9]* <" 1

on x86_64 where IVOPTs fails to eliminate one induction variable.
This is because we no longer expand

 {
   type = long unsigned int
   offset = 0
   elements = {
 [0] = _1 * 4
   }
 }

to

 {
   type = long unsigned int
   offset = 0
   elements = {
 [0] = (long unsigned int) i_6(D) * 1
   }
 }
 
which I think is reasonable.  This has to be dealt with in
IVOPTs.  On 32bit the testcase still passes.

I do not want to introduce a regression at this point
so I am deferring this patch to GCC 10.

Developing this patch also exposed IVOPTs calling
tree_to_aff_combination with essentially an outer
widening cast missing, which required doing

+CASE_CONVERT:
+  if (expr_to_aff_combination (comb, code,
+  TREE_TYPE (expr), TREE_OPERAND (expr, 0)))
+   {
+ aff_combination_convert (comb, type);
+ return;

which shouldn't be required.  That's also something to fix
and there's now plenty of time in the GCC 10 timeframe to
do so.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

2018-12-21  Richard Biener  

* tree-affine.c (expr_to_aff_combination): New function split
out from...
(tree_to_aff_combination): ... here.
(aff_combination_expand): Avoid building a GENERIC tree.


Index: gcc/tree-affine.c
===
--- gcc/tree-affine.c   (revision 267297)
+++ gcc/tree-affine.c   (working copy)
@@ -259,104 +259,66 @@ aff_combination_convert (aff_tree *comb,
 }
 }
 
-/* Splits EXPR into an affine combination of parts.  */
+/* Tries to handle OP0 CODE OP1 as affine combination of parts.  Returns
+   true when that was successful and returns the combination in COMB.  */
 
-void
-tree_to_aff_combination (tree expr, tree type, aff_tree *comb)
+static bool
+expr_to_aff_combination (aff_tree *comb, tree_code code, tree type,
+tree op0, tree op1 = NULL_TREE)
 {
   aff_tree tmp;
-  enum tree_code code;
-  tree cst, core, toffset;
   poly_int64 bitpos, bitsize, bytepos;
-  machine_mode mode;
-  int unsignedp, reversep, volatilep;
-
-  STRIP_NOPS (expr);
 
-  code = TREE_CODE (expr);
   switch (code)
 {
 case POINTER_PLUS_EXPR:
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
-  tree_to_aff_combination (TREE_OPERAND (expr, 1), sizetype, &tmp);
+  tree_to_aff_combination (op0, type, comb);
+  tree_to_aff_combination (op1, sizetype, &tmp);
   aff_combination_add (comb, &tmp);
-  return;
+  return true;
 
 case PLUS_EXPR:
 case MINUS_EXPR:
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
-  tree_to_aff_combination (TREE_OPERAND (expr, 1), type, &tmp);
+  tree_to_aff_combination (op0, type, comb);
+  tree_to_aff_combination (op1, type, &tmp);
   if (code == MINUS_EXPR)
	aff_combination_scale (&tmp, -1);
   aff_combination_add (comb, &tmp);
-  return;
+  return true;
 
 case MULT_EXPR:
-  cst = TREE_OPERAND (expr, 1);
-  if (TREE_CODE (cst) != INTEGER_CST)
+  if (TREE_CODE (op1) != INTEGER_CST)
break;
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
-  aff_combination_scale (comb, wi::to_widest (cst));
-  return;
+  tree_to_aff_combination (op0, type, comb);
+  aff_combination_scale (comb, wi::to_widest (op1));
+  return true;
 
 case NEGATE_EXPR:
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
+  tree_to_aff_combination (op0, type, comb);
   aff_combination_scale (comb, -1);
-  return;
+  return true;
 
 case BIT_NOT_EXPR:
   /* ~x = -x - 1 */
-  tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
+  tree_to_aff_combination (op0, type, comb);
   aff_combination_scale (comb, -1);
   aff_combination_add_cst (comb, -1);
-  return;
-
-case ADDR_EXPR:
-  /* Handle [ptr + CST] which is equivalent to POINTER_PLUS_EXPR.  */
-  if (TREE_CODE (TREE_OPERAND (expr, 0)) == MEM_REF)
-   {
- expr = TREE_OPERAND (expr, 0);
- tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
- tree_to_aff_combination (TREE_OPERAND (expr, 1), sizetype, &tmp);
- aff_combination_add (comb, &tmp);
- return;
-   }
-  core = get_inner_reference (TREE_OPERAND (expr, 0), &bitsize, &bitpos,
-

Re: [Patch] Bug 88521 - gcc 9.0 from r266355 miscompile x265 for mingw-w64 target

2018-12-21 Thread Jakub Jelinek
On Fri, Dec 21, 2018 at 10:08:09AM +0100, Uros Bizjak wrote:
> This patch should be reviewed and eventually approved by cygwin/mingw
> maintainer.

CCing.

Jakub


[PATCH] Use vpmin to optimize some vector comparisons (PR target/88547)

2018-12-21 Thread Jakub Jelinek
Hi!

The following patch attempts to improve code generated for some
integral vector comparisons and VEC_COND_EXPRs with integral comparisons.
The only available integral vector comparison instructions are GT and
EQ, the rest is handled either by negating the result (for vcond swapping
op_true/op_false), or swapping comparison operands and for unsigned
comparisons we use various tricks (either subtract *_MAX from both operands
for V*SI/V*DImode, or use saturating subtractions for V*QI/V*HImode).

If op_true is -1 and op_false is 0, at least when not using AVX512 mask
reg comparisons we have the right result right after the comparison.
So for signed x > y we can just
vpcmpgtd%ymm1, %ymm0, %ymm0
but if op_true is 0 and op_false is -1, i.e. x <= y, we generate
vpcmpgtd%ymm1, %ymm0, %ymm0
vpcmpeqd%ymm1, %ymm1, %ymm1
vpandn  %ymm1, %ymm0, %ymm0
The following patch attempts to detect these cases where we would have
op_true 0 and op_false -1, and rather than generating a single vpcmpgtd
for the comparison and wait for 2 more instructions for the conditional move
we generate two instructions for the comparison - min (x, y) == x and don't
need anything for the rest, so:
vpminud %ymm1, %ymm0, %ymm1
vpcmpeqd%ymm0, %ymm1, %ymm0
For most cases it is done only in these cases where
1) mask registers aren't involved
2) op_true == 0 and op_false == -1 (with the *negate considered, as the
   transformation inverts *negate)

There is one case where this is useful to do even in other cases, for
V*SI/V*DImode and unsigned comparisons we generate those
x -= INT_MAX; y -= INT_MAX subtractions before comparison, so using
vpminu[dq] + vpcmpeq[dq] is shorter regardless of what follows (with the
exception when op_true is -1 and op_false is 0, when both sequences are the
same).

Of course, we can do this optimization only if the corresponding %vpmin
instructions are available, which varies a lot depending on mode (sometimes
SSE2, SSE4.1, AVX2, AVX512DQ {,+VL}, AVX512BW {,+VL}).
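The transformation rests on a scalar identity that is easy to check exhaustively.  Here is a minimal sketch (the helper name is mine), showing what each vector lane of the vpminu/vpcmpeq pair computes: for unsigned x and y, x <= y holds exactly when min (x, y) == x, so the <= mask falls out of the two instructions with no extra negation.

```c
#include <stdint.h>

/* Per-lane scalar model of the vpminud + vpcmpeqd sequence: returns the
   all-ones mask element (-1) when x <= y, else 0.  */
static int32_t
le_mask_via_min (uint32_t x, uint32_t y)
{
  uint32_t m = x < y ? x : y;   /* what vpminud computes per element */
  return m == x ? -1 : 0;       /* what vpcmpeqd produces per element */
}
```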

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

The reason for the avx512f_cond_move.c testcase adjustment is that it hits
the above second case, where vpminud + vpcmpeqd is shorter, but that means
we transform that cmp1 ? 2 : 0 which can use a {z} masking into cmp2 ? 0 : 2
which can't (still, the whole thing is shorter).  On the other side, if the
original testcase was cmp1 ? 0 : 2 then we wouldn't use {z} masking before
and with cmp2 ? 2 : 0 we would with the patch.  By changing the comparison
to be signed, we change what I believe the testcase meant to test - verify
that combiner can produce {z} masking.

2018-12-21  Jakub Jelinek  

PR target/88547
* config/i386/i386.c (ix86_expand_int_sse_cmp): Optimize
x > y ? 0 : -1 into min (x, y) == x ? -1 : 0.

* gcc.target/i386/pr88547-1.c: Expect only 2 knotb and 2 knotw
insns instead of 4, check for vpminud, vpminuq and no vpsubd or
vpsubq.
* gcc.target/i386/sse2-pr88547-1.c: New test.
* gcc.target/i386/sse2-pr88547-2.c: New test.
* gcc.target/i386/sse4_1-pr88547-1.c: New test.
* gcc.target/i386/sse4_1-pr88547-2.c: New test.
* gcc.target/i386/avx2-pr88547-1.c: New test.
* gcc.target/i386/avx2-pr88547-2.c: New test.
* gcc.target/i386/avx512f-pr88547-2.c: New test.
* gcc.target/i386/avx512vl-pr88547-1.c: New test.
* gcc.target/i386/avx512vl-pr88547-2.c: New test.
* gcc.target/i386/avx512vl-pr88547-3.c: New test.
* gcc.target/i386/avx512f_cond_move.c (y): Change from unsigned int
array to int array.

--- gcc/config/i386/i386.c.jj   2018-12-20 18:28:51.118253338 +0100
+++ gcc/config/i386/i386.c  2018-12-21 02:17:23.049042774 +0100
@@ -24126,6 +24126,104 @@ ix86_expand_int_sse_cmp (rtx dest, enum
}
}
 
+  rtx optrue = op_true ? op_true : CONSTM1_RTX (data_mode);
+  rtx opfalse = op_false ? op_false : CONST0_RTX (data_mode);
+  if (*negate)
+   std::swap (optrue, opfalse);
+
+  /* Transform x > y ? 0 : -1 (i.e. x <= y ? -1 : 0 or x <= y) when
+not using integer masks into min (x, y) == x ? -1 : 0 (i.e.
+min (x, y) == x).  While we add one instruction (the minimum),
+we remove the need for two instructions in the negation, as the
+result is done this way.
+When using masks, do it for SI/DImode element types, as it is shorter
+than the two subtractions.  */
+  if ((code != EQ
+  && GET_MODE_SIZE (mode) != 64
+  && vector_all_ones_operand (opfalse, data_mode)
+  && optrue == CONST0_RTX (data_mode))
+ || (code == GTU
+ && GET_MODE_SIZE (GET_MODE_INNER (mode)) >= 4
+ /* Don't do it if not using integer masks and we'd end up with
+the right values in the registers though.  */
+ && 

Re: [Patch] Bug 88521 - gcc 9.0 from r266355 miscompile x265 for mingw-w64 target

2018-12-21 Thread Uros Bizjak
On Thu, Dec 20, 2018 at 1:09 PM Jakub Jelinek  wrote:
>
> On Thu, Dec 20, 2018 at 01:42:15PM +0530, Lokesh Janghel wrote:
> > Hi Mateuszb,
> >
> > I tested with your proposition patch and it is working right.
> > I also added the patch with test case.
> > Please let me know your thoughts/suggestions.
>
> ChangeLog entry is missing, please write it (and mention there
> Mateusz's name/mail as he wrote the i386.c part).
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index b3c8676..e54c489 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -9063,6 +9063,13 @@ function_value_ms_64 (machine_mode orig_mode, 
> machine_mode mode,
>   && !COMPLEX_MODE_P (mode))
> regno = FIRST_SSE_REG;
>   break;
> +   case 8:
> +   case 4:
> + if (valtype != NULL_TREE && AGGREGATE_TYPE_P (valtype))
> +   break;
> + if (mode == SFmode || mode == DFmode)
> +   regno = FIRST_SSE_REG;
> + break;
> default:
>   break;
>  }
> diff --git a/gcc/testsuite/gcc.target/i386/pr88521.c 
> b/gcc/testsuite/gcc.target/i386/pr88521.c
> new file mode 100644
> index 000..f42703a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr88521.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-times "movl\[^\n\r]*, %eax|mov\[ \t]*eax," 1 
> } } */
>
> You try here to handle both -masm=att and -masm=intel
>
> +/* { dg-final { scan-assembler-times "movss\[^\n\r]*, %xmm" 1 } } */
> +/* { dg-final { scan-assembler-times "movsd\[^\n\r]*, %xmm" 1 } } */
>
> but not here.  For that it would need to be "movss\[^\n\r]*(?:, %xmm|xmm, )"
> and similarly for movsd (please verify with
> make check-gcc 
> RUNTESTFLAGS='--target_board=unix\{-m32/-masm=att,-m32/-masm=intel,-m64/-masm=att,-m64/-masm=intel\}
>  i386.exp=pr88521.c'
>
> I'll defer the final review to Uros.

This patch should be reviewed and eventually approved by the cygwin/mingw
maintainer.

Uros.


Re: [PATCH] fortran/69121 -- Make IEEE_SCALB generic

2018-12-21 Thread Janne Blomqvist
On Fri, Dec 21, 2018 at 8:22 AM Steve Kargl <
s...@troutmask.apl.washington.edu> wrote:

> On Thu, Dec 20, 2018 at 01:47:39PM -0800, Steve Kargl wrote:
> > The attached patch has been tested on x86_64-*-freebsd.
> >
> > OK to commit?
> >
> > 2018-12-20  Steven G. Kargl  
> >
> >   PR fortran/69121
> >   * libgfortran/ieee/ieee_arithmetic.F90: Provide missing functions
> >   in interface for IEEE_SCALB.
> >
> > 2018-12-20  Steven G. Kargl  
> >
> >   PR fortran/69121
> >   * gfortran.dg/ieee/ieee_9.f90: New test.
>
> Now, tested on i586-*-freebsd.
>
> --
> Steve
>

Hi, looks ok for trunk.

A few questions popped into my mind while looking into this:

1) Why are none of the _gfortran_ieee_scalb_X_Y functions mentioned in
gfortran.map? I guess they should all be there?

2) Currently all the intrinsics map to the scalbn{,f,l} builtins. However,
when the integer argument is of kind int64 or int128 we should instead use
scalbln{,f,l}. This also applies to other intrinsics that use scalbn under
the hood.

To clarify, fixing these is not a prerequisite for accepting the patch (I
already accepted it), but more like topics for further work.

-- 
Janne Blomqvist


Re: [PATCH] x86: VAESDEC{,LAST} allow memory inputs

2018-12-21 Thread Uros Bizjak
On Fri, Dec 21, 2018 at 9:43 AM Jan Beulich  wrote:
>
> They are no different from their VAESENC{,LAST} counterparts in this
> regard.
>
> gcc/
> 2018-12-21  Jan Beulich  
>
> * config/i386/sse.md (vaesdec_, vaesdeclast_): Allow
> memory input.

OK.

Thanks,
Uros.

> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -21659,7 +21659,7 @@
>[(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> (unspec:VI1_AVX512VL_F
>   [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -  (match_operand:VI1_AVX512VL_F 2 "vector_operand" "v")]
> +  (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
>   UNSPEC_VAESDEC))]
>"TARGET_VAES"
>"vaesdec\t{%2, %1, %0|%0, %1, %2}"
> @@ -21669,7 +21669,7 @@
>[(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
> (unspec:VI1_AVX512VL_F
>   [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
> -  (match_operand:VI1_AVX512VL_F 2 "vector_operand" "v")]
> +  (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
>   UNSPEC_VAESDECLAST))]
>"TARGET_VAES"
>"vaesdeclast\t{%2, %1, %0|%0, %1, %2}"
>
>
>


[C++ PATCH] Fix __builtin_{is_constant_evaluated,constant_p} handling in static_assert (PR c++/86524, PR c++/88446, take 2)

2018-12-21 Thread Jakub Jelinek
On Thu, Dec 20, 2018 at 09:49:39PM -0500, Jason Merrill wrote:
> > But if we need cp_fully_fold, doesn't that mean that the earlier
> > cxx_eval_constant_expression failed and thus the argument is not a constant
> > expression?  Should __builtin_is_constant_evaluated () evaluate to true
> > even if the argument is not a constant expression?
> 
> Ah, no, good point.
> 
> > Is there a reason to call that maybe_constant_value at all when we've called
> > cxx_eval_constant_expression first?  Wouldn't cp_fold_rvalue (or
> > c_fully_fold with false as last argument) be sufficient there?
> 
> I think that would be better, yes.

As cp_fold_rvalue* is static in cp-gimplify.c, I've used c_fully_fold
(or do you want to export cp_fold_rvalue*?).

There is another fix: not reusing the dummy bools between different args.
I think if e.g. the first argument is non-constant, dummy1 would be set,
and then the processing of the second argument, which might be a constant
expression, could behave differently, as *non_constant_p would be true from
the start of the processing.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-20  Jakub Jelinek  

PR c++/86524
PR c++/88446
* cp-tree.h (fold_non_dependent_expr): Add manifestly_const_eval
argument.
* constexpr.c (cxx_eval_builtin_function_call): Evaluate
__builtin_constant_p if ctx->manifestly_const_eval even in constexpr
functions.  Don't reuse dummy{1,2} vars between different arguments.
Use c_fully_fold instead of cp_fully_fold.  Fix comment typo.
(fold_non_dependent_expr): Add manifestly_const_eval argument, pass
it through to cxx_eval_outermost_constant_expr and
maybe_constant_value.
* semantics.c (finish_static_assert): Call fold_non_dependent_expr
with true as manifestly_const_eval.

* g++.dg/cpp1y/constexpr-86524.C: New test.
* g++.dg/cpp2a/is-constant-evaluated4.C: New test.
* g++.dg/cpp2a/is-constant-evaluated5.C: New test.
* g++.dg/cpp2a/is-constant-evaluated6.C: New test.

--- gcc/cp/cp-tree.h.jj 2018-12-20 18:29:24.069715207 +0100
+++ gcc/cp/cp-tree.h2018-12-20 22:10:46.686521475 +0100
@@ -7668,7 +7668,9 @@ extern tree cxx_constant_value(tree,
 extern tree cxx_constant_init  (tree, tree = NULL_TREE);
 extern tree maybe_constant_value   (tree, tree = NULL_TREE, bool = 
false);
 extern tree maybe_constant_init(tree, tree = 
NULL_TREE, bool = false);
-extern tree fold_non_dependent_expr(tree, tsubst_flags_t = 
tf_warning_or_error);
+extern tree fold_non_dependent_expr(tree,
+tsubst_flags_t = 
tf_warning_or_error,
+bool = false);
 extern tree fold_simple(tree);
 extern bool is_sub_constant_expr(tree);
 extern bool reduced_constant_expression_p   (tree);
--- gcc/cp/constexpr.c.jj   2018-12-20 08:50:29.695444227 +0100
+++ gcc/cp/constexpr.c  2018-12-20 22:12:43.754615737 +0100
@@ -1197,7 +1197,7 @@ cxx_eval_builtin_function_call (const co
   /* If we aren't requiring a constant expression, defer __builtin_constant_p
  in a constexpr function until we have values for the parameters.  */
   if (bi_const_p
-  && ctx->quiet
+  && !ctx->manifestly_const_eval
   && current_function_decl
   && DECL_DECLARED_CONSTEXPR_P (current_function_decl))
 {
@@ -1222,7 +1222,6 @@ cxx_eval_builtin_function_call (const co
  return constant false for a non-constant argument.  */
   constexpr_ctx new_ctx = *ctx;
   new_ctx.quiet = true;
-  bool dummy1 = false, dummy2 = false;
   for (i = 0; i < nargs; ++i)
 {
   args[i] = CALL_EXPR_ARG (t, i);
@@ -1231,12 +1230,16 @@ cxx_eval_builtin_function_call (const co
 of the builtin, verify it here.  */
   if (!builtin_valid_in_constant_expr_p (fun)
  || potential_constant_expression (args[i]))
-   args[i] = cxx_eval_constant_expression (_ctx, args[i], false,
-   , );
+   {
+ bool dummy1 = false, dummy2 = false;
+ args[i] = cxx_eval_constant_expression (_ctx, args[i], false,
+ , );
+   }
+
   if (bi_const_p)
-   /* For __built_in_constant_p, fold all expressions with constant values
+   /* For __builtin_constant_p, fold all expressions with constant values
   even if they aren't C++ constant-expressions.  */
-   args[i] = cp_fully_fold (args[i]);
+   args[i] = c_fully_fold (args[i], false, NULL, false);
 }
 
   bool save_ffbcp = force_folding_builtin_constant_p;
@@ -5340,6 +5343,7 @@ clear_cv_and_fold_caches (void)
(t, complain) followed by maybe_constant_value but is more efficient,
because it calls instantiation_dependent_expression_p 

[PATCH] x86: VAESDEC{,LAST} allow memory inputs

2018-12-21 Thread Jan Beulich
They are no different from their VAESENC{,LAST} counterparts in this
regard.

gcc/
2018-12-21  Jan Beulich  

* config/i386/sse.md (vaesdec_, vaesdeclast_): Allow
memory input.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -21659,7 +21659,7 @@
   [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
(unspec:VI1_AVX512VL_F
  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-  (match_operand:VI1_AVX512VL_F 2 "vector_operand" "v")]
+  (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
  UNSPEC_VAESDEC))]
   "TARGET_VAES"
   "vaesdec\t{%2, %1, %0|%0, %1, %2}"
@@ -21669,7 +21669,7 @@
   [(set (match_operand:VI1_AVX512VL_F 0 "register_operand" "=v")
(unspec:VI1_AVX512VL_F
  [(match_operand:VI1_AVX512VL_F 1 "register_operand" "v")
-  (match_operand:VI1_AVX512VL_F 2 "vector_operand" "v")]
+  (match_operand:VI1_AVX512VL_F 2 "vector_operand" "vm")]
  UNSPEC_VAESDECLAST))]
   "TARGET_VAES"
   "vaesdeclast\t{%2, %1, %0|%0, %1, %2}"





[PATCH] x86: relax mask register constraints

2018-12-21 Thread Jan Beulich
While their use for masking is indeed restricted to %k1...%k7, use as
"normal" insn operands also permits %k0. Remove the unnecessary
limitations, which requires quite a few testsuite adjustments.

Oddly enough, some AVX512{F,DQ} test cases already check for %k[0-7],
while others did permit {%k0}; where those tests get touched here anyway,
this gets fixed at the same time.

gcc/
2018-12-21  Jan Beulich  

* config/i386/sse.md
(_cmp3,
_cmp3,
_ucmp3,
_ucmp3,
avx512f_vmcmp3,
avx512f_vmcmp3_mask,
avx512f_maskcmp3,
_cvt2mask,
_cvt2mask,
*_cvtmask2,
*_cvtmask2,
_eq3_1,
_eq3_1,
_gt3,
_gt3,
_testm3,
_testnm3,
*_testm3_zext,
*_testm3_zext_mask,
*_testnm3_zext,
*_testnm3_zext_mask,
avx512cd_maskb_vec_dup,
avx512cd_maskw_vec_dup,
avx512dq_fpclass,
avx512dq_vmfpclass,
avx512vl_vpshufbitqmb): Use =k
instead of =Yk.

gcc/testsuite/
2018-12-21  Jan Beulich  

* gcc.target/i386/avx512bitalg-vpshufbitqmb.c,
gcc.target/i386/avx512bw-vpcmpeqb-1.c,
gcc.target/i386/avx512bw-vpcmpequb-1.c,
gcc.target/i386/avx512bw-vpcmpequw-1.c,
gcc.target/i386/avx512bw-vpcmpeqw-1.c,
gcc.target/i386/avx512bw-vpcmpgeb-1.c,
gcc.target/i386/avx512bw-vpcmpgeub-1.c,
gcc.target/i386/avx512bw-vpcmpgeuw-1.c,
gcc.target/i386/avx512bw-vpcmpgew-1.c,
gcc.target/i386/avx512bw-vpcmpgtb-1.c,
gcc.target/i386/avx512bw-vpcmpgtub-1.c,
gcc.target/i386/avx512bw-vpcmpgtuw-1.c,
gcc.target/i386/avx512bw-vpcmpgtw-1.c,
gcc.target/i386/avx512bw-vpcmpleb-1.c,
gcc.target/i386/avx512bw-vpcmpleub-1.c,
gcc.target/i386/avx512bw-vpcmpleuw-1.c,
gcc.target/i386/avx512bw-vpcmplew-1.c,
gcc.target/i386/avx512bw-vpcmpltb-1.c,
gcc.target/i386/avx512bw-vpcmpltub-1.c,
gcc.target/i386/avx512bw-vpcmpltuw-1.c,
gcc.target/i386/avx512bw-vpcmpltw-1.c,
gcc.target/i386/avx512bw-vpcmpneqb-1.c,
gcc.target/i386/avx512bw-vpcmpnequb-1.c,
gcc.target/i386/avx512bw-vpcmpnequw-1.c,
gcc.target/i386/avx512bw-vpcmpneqw-1.c,
gcc.target/i386/avx512bw-vpmovb2m-1.c,
gcc.target/i386/avx512bw-vpmovm2b-1.c,
gcc.target/i386/avx512bw-vpmovm2w-1.c,
gcc.target/i386/avx512bw-vpmovw2m-1.c,
gcc.target/i386/avx512bw-vptestmb-1.c,
gcc.target/i386/avx512bw-vptestmw-1.c,
gcc.target/i386/avx512bw-vptestnmb-1.c,
gcc.target/i386/avx512bw-vptestnmw-1.c,
gcc.target/i386/avx512cd-vpbroadcastmb2q-1.c,
gcc.target/i386/avx512cd-vpbroadcastmw2d-1.c,
gcc.target/i386/avx512dq-vfpclasssd-1.c,
gcc.target/i386/avx512dq-vfpcla-1.c,
gcc.target/i386/avx512dq-vpmovd2m-1.c,
gcc.target/i386/avx512dq-vpmovm2d-1.c,
gcc.target/i386/avx512dq-vpmovm2q-1.c,
gcc.target/i386/avx512dq-vpmovq2m-1.c,
gcc.target/i386/avx512vl-vpbroadcastmb2q-1.c,
gcc.target/i386/avx512vl-vpbroadcastmw2d-1.c,
gcc.target/i386/avx512vl-vpcmpeqd-1.c,
gcc.target/i386/avx512vl-vpcmpeqq-1.c,
gcc.target/i386/avx512vl-vpcmpequd-1.c,
gcc.target/i386/avx512vl-vpcmpequq-1.c,
gcc.target/i386/avx512vl-vpcmpged-1.c,
gcc.target/i386/avx512vl-vpcmpgeq-1.c,
gcc.target/i386/avx512vl-vpcmpgeud-1.c,
gcc.target/i386/avx512vl-vpcmpgeuq-1.c,
gcc.target/i386/avx512vl-vpcmpgtd-1.c,
gcc.target/i386/avx512vl-vpcmpgtq-1.c,
gcc.target/i386/avx512vl-vpcmpgtud-1.c,
gcc.target/i386/avx512vl-vpcmpgtuq-1.c,
gcc.target/i386/avx512vl-vpcmpled-1.c,
gcc.target/i386/avx512vl-vpcmpleq-1.c,
gcc.target/i386/avx512vl-vpcmpleud-1.c,
gcc.target/i386/avx512vl-vpcmpleuq-1.c,
gcc.target/i386/avx512vl-vpcmpltd-1.c,
gcc.target/i386/avx512vl-vpcmpltq-1.c,
gcc.target/i386/avx512vl-vpcmpltud-1.c,
gcc.target/i386/avx512vl-vpcmpltuq-1.c,
gcc.target/i386/avx512vl-vpcmpneqd-1.c,
gcc.target/i386/avx512vl-vpcmpneqq-1.c,
gcc.target/i386/avx512vl-vpcmpnequd-1.c,
gcc.target/i386/avx512vl-vpcmpnequq-1.c,
gcc.target/i386/avx512vl-vptestmd-1.c,
gcc.target/i386/avx512vl-vptestmq-1.c,
gcc.target/i386/avx512vl-vptestnmd-1.c,
gcc.target/i386/avx512vl-vptestnmq-1.c: Permit %k0 as ordinary
operand.
* gcc.target/i386/avx512bw-vpcmpb-1.c,
gcc.target/i386/avx512bw-vpcmpub-1.c,
gcc.target/i386/avx512bw-vpcmpuw-1.c,
gcc.target/i386/avx512bw-vpcmpw-1.c,
gcc.target/i386/avx512dq-vfpclasspd-1.c,
gcc.target/i386/avx512dq-vfpclassps-1.c,
gcc.target/i386/avx512f-vcmppd-1.c,
gcc.target/i386/avx512f-vcmpps-1.c,
gcc.target/i386/avx512f-vcmpsd-1.c,
gcc.target/i386/avx512f-vcmpss-1.c,

[PATCH] x86-64: {,V}CVTSI2Sx are ambiguous without suffix

2018-12-21 Thread Jan Beulich
For 64-bit these should not be emitted without a suffix in AT&T mode (as
they are ambiguous that way); the suffixes are benign for 32-bit. For
consistency, also omit the suffix in Intel mode for {,V}CVTSI2SxQ.

The omission originally (prior to rev 260691) led to wrong code
being generated for the 64-bit unsigned-to-float/double conversions (as
gas guesses an L suffix instead of the required Q one when the operand
is in memory). In all remaining cases (being changed here) the omission
would "just" lead to warnings with future gas versions.

Since rex64suffix so far has been used also on {,V}CVTSx2SI (but
not on VCVTSx2USI, as gas doesn't permit suffixes there), testsuite
adjustments are also necessary for their test cases. Rather than
making things check for the L suffixes in the 32-bit cases, make things
symmetric with VCVTSx2USI and drop the redundant suffixes instead,
dropping the Q suffix expectations at the same time from the 64-bit
cases.

In order for related test cases to actually test what they're supposed
to test, add a few (seemingly unrelated) empty "asm volatile()"
statements. Presumably there are more tests where constant propagation
voids the intended effect, but these are the ones that help make sure
the assembler still assembles the resulting output correctly after the
changes here.

gcc/
2018-12-21  Jan Beulich  

* config/i386/i386.md (rex64suffix): Add L suffix for SI.
* config/i386/sse.md (sse_cvtss2si,
sse_cvtss2si_2,
sse_cvttss2si,
sse2_cvtsd2si,
sse2_cvtsd2si_2,
sse2_cvttsd2si): Drop
.
(cvtusi232, sse2_cvtsi2sd): Add
{l}.
(sse2_cvtsi2sdq): Make q conditional upon AT&T
syntax.

gcc/testsuite/
2018-12-21  Jan Beulich  

* gcc.target/i386/avx512f-vcvtsd2si64-1.c,
gcc.target/i386/avx512f-vcvtss2si64-1.c
gcc.target/i386/avx512f-vcvttsd2si64-1.c
gcc.target/i386/avx512f-vcvttss2si64-1.c: Drop q suffix
expectation.
* gcc.target/i386/avx512f-vcvtsi2ss-1.c,
gcc.target/i386/avx512f-vcvtusi2sd-1.c,
gcc.target/i386/avx512f-vcvtusi2ss-1.c: Expect l suffix.
* gcc.target/i386/avx512f-vcvtusi2sd-2.c,
gcc.target/i386/avx512f-vcvtusi2sd64-2.c,
gcc.target/i386/avx512f-vcvtusi2ss-2.c,
gcc.target/i386/avx512f-vcvtusi2ss64-2.c: Add asm volatile().

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1162,7 +1162,7 @@
   [(QI "V64QI") (HI "V32HI") (SI "V16SI") (DI "V8DI") (SF "V16SF") (DF 
"V8DF")])
 
 ;; Instruction suffix for REX 64bit operators.
-(define_mode_attr rex64suffix [(SI "") (DI "{q}")])
+(define_mode_attr rex64suffix [(SI "{l}") (DI "{q}")])
 (define_mode_attr rex64namesuffix [(SI "") (DI "q")])
 
 ;; This mode iterator allows :P to be used for patterns that operate on
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -4720,7 +4720,7 @@
 (parallel [(const_int 0)]))]
  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE"
-  "%vcvtss2si\t{%1, %0|%0, %k1}"
+  "%vcvtss2si\t{%1, %0|%0, %k1}"
   [(set_attr "type" "sseicvt")
(set_attr "athlon_decode" "double,vector")
(set_attr "bdver1_decode" "double,double")
@@ -4733,7 +4733,7 @@
(unspec:SWI48 [(match_operand:SF 1 "nonimmediate_operand" "v,m")]
  UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE"
-  "%vcvtss2si\t{%1, %0|%0, %k1}"
+  "%vcvtss2si\t{%1, %0|%0, %k1}"
   [(set_attr "type" "sseicvt")
(set_attr "athlon_decode" "double,vector")
(set_attr "amdfam10_decode" "double,double")
@@ -4749,7 +4749,7 @@
(match_operand:V4SF 1 "" 
"v,")
(parallel [(const_int 0)]]
   "TARGET_SSE"
-  "%vcvttss2si\t{%1, %0|%0, 
%k1}"
+  "%vcvttss2si\t{%1, %0|%0, %k1}"
   [(set_attr "type" "sseicvt")
(set_attr "athlon_decode" "double,vector")
(set_attr "amdfam10_decode" "double,double")
@@ -4767,7 +4767,7 @@
  (match_operand:VF_128 1 "register_operand" "v")
  (const_int 1)))]
   "TARGET_AVX512F && "
-  "vcvtusi2\t{%2, %1, %0|%0, %1, 
%2}"
+  "vcvtusi2{l}\t{%2, %1, %0|%0, %1, 
%2}"
   [(set_attr "type" "sseicvt")
(set_attr "prefix" "evex")
(set_attr "mode" "")])
@@ -5026,9 +5026,9 @@
  (const_int 1)))]
   "TARGET_SSE2"
   "@
-   cvtsi2sd\t{%2, %0|%0, %2}
-   cvtsi2sd\t{%2, %0|%0, %2}
-   vcvtsi2sd\t{%2, %1, %0|%0, %1, %2}"
+   cvtsi2sd{l}\t{%2, %0|%0, %2}
+   cvtsi2sd{l}\t{%2, %0|%0, %2}
+   vcvtsi2sd{l}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseicvt")
(set_attr "athlon_decode" "double,direct,*")
@@ -5048,9 +5048,9 @@
  (const_int 1)))]
   "TARGET_SSE2 && TARGET_64BIT"
   "@
-   cvtsi2sdq\t{%2, %0|%0, %2}
-   cvtsi2sdq\t{%2, %0|%0, %2}
-   vcvtsi2sdq\t{%2, %1, %0|%0, %1, %2}"
+   cvtsi2sd{q}\t{%2, %0|%0, %2}
+   cvtsi2sd{q}\t{%2, %0|%0, %2}
+   vcvtsi2sd{q}\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "sseicvt")
(set_attr "athlon_decode" "double,direct,*")
@@ -5119,7 +5119,7 

+reminder+ Re: Make clear, when contributions will be ignored

2018-12-21 Thread Дилян Палаузов
Hello,

What shall happen so that no reminders are necessary to move things
forward? Why does sending a reminder make a difference, and are only
persistent people blessed?

Regards
  Дилян


On Fri, 2018-12-07 at 10:55 +, Дилян Палаузов wrote:
> Hello,
> 
> Will it help if Bugzilla is reprogrammed to send automatic weekly
> reminders on all patches that are not yet integrated?
> 
> Will it help, if I hire myself to integrate the patch, or shall I
> rather hire somebody to send reminders?
> 
> If something can be done after sending a reminder, then it can be
> arranged also without reminders.  In particular, dealing with reminders
> is avoidable extra work.
> 
> Whether people are paid or not does not change the subject very
> much.  I have experienced organizations where people are not paid and
> they manage to tackle everything.  I have seen organizations where
> people are paid and they do not get the management right.
> 
> I am not speaking about having some strict time to get a response, but
> rather about ensuring an answer in reasonable time.  No answer in reasonable
> time is the same as ignorance — the subject of this thread.
> 
> The patch I proposed on 27th Oct was first submitted towards GDB and
> then I was told to send it to GCC.  Here I was told to send it to GDB. 
> What shall happen to quit the loop?
> 
> In any case, if the common aim is to have a system where contributions
> do not get lost, then I’m sure the workflows can be adjusted to achieve
> this aim.
> 
> Regards
>   Дилян
> 
> 
> On Wed, 2018-12-05 at 17:37 +, Joseph Myers wrote:
> > On Wed, 5 Dec 2018, Segher Boessenkool wrote:
> > 
> > > Patches are usually ignored because everyone thinks someone else will
> > > handle it.
> > 
> > And in this case, it looks like this patch would best be reviewed first in 
> > the GDB context - then once committed to binutils-gdb, the committer could 
> > post to gcc-patches (CC:ing build system maintainers) requesting a commit 
> > to GCC if they don't have write access to GCC themselves.  I consider 
> > synchronizing changes to such top-level files in either direction to be 
> > obvious and not to need a separate review.
> >