Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

2017-10-27 Thread Jeff Law
On 10/13/2017 02:26 PM, Wilco Dijkstra wrote:
> Hi,
> 
> To continue the review of the AArch64 frame code I tried a few examples
> to figure out what it does now. For initial_adjust <= 63*1024 and
> final_adjust < 1024 there are no probes inserted as expected, i.e. the
> vast majority of functions are unaffected. So that works perfectly.
Right.

> 
> For larger frames the first oddity is that there are now 2 separate params
> controlling how probes are generated:
> 
> stack-clash-protection-guard-size (default 12, but set to 16 on AArch64)
> stack-clash-protection-probe-interval (default 12)
> 
> I don't see how this makes sense. These values are closely related, so if
> one is different from the other, probing becomes ineffective/incorrect. 
> For example we generate code that trivially bypasses the guard despite
> all the probing:
My hope would be that we simply don't ever use the params.  They were
done as much for *you* to experiment with as anything.  I'd happily just
delete them, as there are essentially no guard rails to ensure their
values are sane.


> 
> --param=stack-clash-protection-probe-interval=13
> --param=stack-clash-protection-guard-size=12
> 
> So if there is a good reason to continue with 2 separate values, we must
> force probe interval <= guard size!
The param code really isn't designed to enforce values that are
inter-dependent.  It has min, max and default values.  No more, no less.
If you set up something inconsistent with the params, it's simply not
going to work.


> 
> Also on AArch64 --param=stack-clash-protection-probe-interval=16 causes
> crashes due to the offsets used in the probes - we don't need large offsets
> as we want to probe close to the bottom of the stack.
Not a surprise.  While I tried to handle larger intervals, I certainly
didn't test them.  Given the ISA I wouldn't expect an interval > 12 to
be useful or necessarily even work correctly.


> 
> Functions with a large stack emit, like alloca, a lot of code; here I used
> --param=stack-clash-protection-probe-interval=15:
> 
> int f1(int x)
> {
>   char arr[128*1024];
>   return arr[x];
> }
> 
> f1:
>   mov x16, 64512
>   sub sp, sp, x16
>   .cfi_def_cfa_offset 64512
>   mov x16, -32768
>   add sp, sp, x16
>   .cfi_def_cfa_offset -1024
>   str xzr, [sp, 32760]
>   add sp, sp, x16
>   .cfi_def_cfa_offset -66560
>   str xzr, [sp, 32760]
>   sub sp, sp, #1024
>   .cfi_def_cfa_offset -65536
>   str xzr, [sp, 1016]
>   ldrb w0, [sp, w0, sxtw]
>   .cfi_def_cfa_offset 131072
>   add sp, sp, 131072
>   .cfi_def_cfa_offset 0
>   ret
> 
> Note the cfa offsets are wrong.
Yes.  They definitely look wrong.  There's a clear logic error in
setting up the ADJUST_CFA note when the probing interval is larger than
2**12.  That should be easily fixed.  Let me poke at it.

> 
> There is an odd mix of a big initial adjustment, then some
> probes+adjustments and then a final adjustment and probe for the
> remainder. I can't see the point of having both an initial and
> remainder adjustment. I would expect this:
> 
>   sub sp, sp, 65536
>   str xzr, [sp, 1024]
>   sub sp, sp, 65536
>   str xzr, [sp, 1024]
>   ldrb w0, [sp, w0, sxtw]
>   add sp, sp, 131072
>   ret
I'm really not able to justify spending further time optimizing the
aarch64 implementation.  I've done the best I can.  You can take the
work as-is or improve it, but I really can't justify further time
investment on that architecture.

> 
> 
> int f2(int x)
> {
>   char arr[128*1024];
>   return arr[x];
> }
> 
> f2:
>   mov x16, 64512
>   sub sp, sp, x16
>   mov x16, -65536
>   movk x16, 0xfffd, lsl 16
>   add x16, sp, x16
> .LPSRL0:
>   sub sp, sp, 4096
>   str xzr, [sp, 4088]
>   cmp sp, x16
>   b.ne .LPSRL0
>   sub sp, sp, #1024
>   str xzr, [sp, 1016]
>   ldrb w0, [sp, w0, sxtw]
>   add sp, sp, 262144
>   ret
> 
> The cfa entries are OK for this case. There is a mix of
> positive/negative offsets which makes things confusing. Again there are
> 3 kinds of adjustments when for this size we only need the loop.
> 
> Reusing the existing gen_probe_stack_range code appears to be a bad idea
> since it ignores the probe interval and just defaults to 4KB. I don't
> see why it should be any more complex than this:
> 
>   sub x16, sp, 262144  // only need temporary if > 1MB
> .LPSRL0:
>   sub sp, sp, 65536
>   str xzr, [sp, 1024]
>   cmp sp, x16
>   b.ne .LPSRL0
>   ldrb w0, [sp, w0, sxtw]
>   add sp, sp, 262144
>   ret
> 
> Probe insertion if final adjustment >= 1024 also generates a lot of redundant
> code - although this is more a theoretical issue given this is so rare.
Again, if ARM wants this optimized, then ARM's engineers are going to
have to take 

[PATCH,committed] PR fortran/82620 -- fix detection of syntax error

2017-10-27 Thread Steve Kargl
I've committed the following patch to fix a problem
where gfortran ICEs after detection of a syntax 
error in an allocate statement.  The patch was
regression tested on x86_64-*-freebsd.

=== gfortran Summary ===

# of expected passes            46027
# of expected failures          97
# of unsupported tests          82
/mnt/sgk/gcc/obj/gcc/gfortran  version 8.0.0 20171028 (experimental) (GCC) 


2017-10-27  Steven G. Kargl  

PR fortran/82620
* match.c (gfc_match_allocate): Exit early on syntax error.

2017-10-27  Steven G. Kargl  

PR fortran/82620
* gfortran.dg/allocate_error_7.f90: New test.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow
Index: gcc/fortran/match.c
===
--- gcc/fortran/match.c	(revision 254192)
+++ gcc/fortran/match.c	(working copy)
@@ -3968,7 +3968,10 @@ gfc_match_allocate (void)
   saw_stat = saw_errmsg = saw_source = saw_mold = saw_deferred = false;
 
   if (gfc_match_char ('(') != MATCH_YES)
-goto syntax;
+{
+  gfc_syntax_error (ST_ALLOCATE);
+  return MATCH_ERROR;
+}
 
   /* Match an optional type-spec.  */
   old_locus = gfc_current_locus;
Index: gcc/testsuite/gfortran.dg/allocate_error_7.f90
===
--- gcc/testsuite/gfortran.dg/allocate_error_7.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/allocate_error_7.f90	(working copy)
@@ -0,0 +1,12 @@
+! { dg-do compile }
+!
+! Code contributed by Gerhard Steinmetz
+!
+program pr82620
+   type t(a)
+  integer, len :: a
+   end type
+   type(t(:)), allocatable :: x, y
+   allocate(t(4) :: x)
+   allocate)t(7) :: y) ! { dg-error "Syntax error in ALLOCATE" }
+end program pr82620


[PATCH], Add rounding built-ins to the _Float and _FloatX built-in functions

2017-10-27 Thread Michael Meissner
The power9 (running PowerPC ISA 3.0) has a round to integer instruction
(XSRQPI) that provides various flavors of rounding an IEEE 128-bit floating
point value to integral values.  This patch adds the support to the machine
independent portion of the compiler, and adds the necessary support for
ceilf128, floorf128, truncf128, and roundf128 to the PowerPC backend when
you use -mcpu=power9.

I have done bootstrap builds on both x86-64 and a little endian power8 system.
Can I install these patches to the trunk?

[gcc]
2017-10-27  Michael Meissner  

* builtins.def (_Float and _FloatX BUILT_IN_CEIL): Add
_Float and _FloatX variants for rounding built-in
functions.
(_Float and _FloatX BUILT_IN_FLOOR): Likewise.
(_Float and _FloatX BUILT_IN_NEARBYINT): Likewise.
(_Float and _FloatX BUILT_IN_RINT): Likewise.
(_Float and _FloatX BUILT_IN_ROUND): Likewise.
(_Float and _FloatX BUILT_IN_TRUNC): Likewise.
* builtins.c (mathfn_built_in_2): Likewise.
* internal-fn.def (CEIL): Likewise.
(FLOOR): Likewise.
(NEARBYINT): Likewise.
(RINT): Likewise.
(ROUND): Likewise.
(TRUNC): Likewise.
* fold-const.c (tree_call_nonnegative_warnv_p): Likewise.
(integer_valued_real_call_p): Likewise.
* fold-const-call.c (fold_const_call_ss): Likewise.
* config/rs6000/rs6000.md (floor<mode>2): Add support for IEEE
128-bit round to integer instructions.
(ceil<mode>2): Likewise.
(btrunc<mode>2): Likewise.
(round<mode>2): Likewise.

[gcc/c]
2017-10-27  Michael Meissner  

* c-decl.c (header_for_builtin_fn): Add integer rounding _Float
and _FloatX built-in functions.

[gcc/testsuite]
2017-10-27  Michael Meissner  

* gcc.target/powerpc/float128-hw2.c: Add tests for ceilf128,
floorf128, truncf128, and roundf128.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/builtins.def
===
--- gcc/builtins.def(revision 254172)
+++ gcc/builtins.def(working copy)
@@ -335,6 +335,9 @@ DEF_C99_BUILTIN(BUILT_IN_CBRTL, 
 DEF_LIB_BUILTIN(BUILT_IN_CEIL, "ceil", BT_FN_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_C90RES_BUILTIN (BUILT_IN_CEILF, "ceilf", BT_FN_FLOAT_FLOAT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_C90RES_BUILTIN (BUILT_IN_CEILL, "ceill", BT_FN_LONGDOUBLE_LONGDOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+#define CEIL_TYPE(F) BT_FN_##F##_##F
+DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_CEIL, "ceil", CEIL_TYPE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+#undef CEIL_TYPE
 DEF_C99_BUILTIN(BUILT_IN_COPYSIGN, "copysign", 
BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_COPYSIGNF, "copysignf", 
BT_FN_FLOAT_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_COPYSIGNL, "copysignl", 
BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
@@ -394,6 +397,9 @@ DEF_C99_BUILTIN(BUILT_IN_FEUPDAT
 DEF_LIB_BUILTIN(BUILT_IN_FLOOR, "floor", BT_FN_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_C90RES_BUILTIN (BUILT_IN_FLOORF, "floorf", BT_FN_FLOAT_FLOAT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_C90RES_BUILTIN (BUILT_IN_FLOORL, "floorl", 
BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
+#define FLOOR_TYPE(F) BT_FN_##F##_##F
+DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_FLOOR, "floor", FLOOR_TYPE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+#undef FLOOR_TYPE
 DEF_C99_BUILTIN(BUILT_IN_FMA, "fma", 
BT_FN_DOUBLE_DOUBLE_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING)
 DEF_C99_BUILTIN(BUILT_IN_FMAF, "fmaf", BT_FN_FLOAT_FLOAT_FLOAT_FLOAT, 
ATTR_MATHFN_FPROUNDING)
 DEF_C99_BUILTIN(BUILT_IN_FMAL, "fmal", 
BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING)
@@ -531,6 +537,9 @@ DEF_GCC_FLOATN_NX_BUILTINS (BUILT_IN_NAN
 DEF_C99_BUILTIN(BUILT_IN_NEARBYINT, "nearbyint", BT_FN_DOUBLE_DOUBLE, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_NEARBYINTF, "nearbyintf", BT_FN_FLOAT_FLOAT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_NEARBYINTL, "nearbyintl", 
BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
+#define NEARBYINT_TYPE(F) BT_FN_##F##_##F
+DEF_EXT_LIB_FLOATN_NX_BUILTINS (BUILT_IN_NEARBYINT, "nearbyint", 
NEARBYINT_TYPE, ATTR_CONST_NOTHROW_LEAF_LIST)
+#undef NEARBYINT_TYPE
 DEF_C99_BUILTIN(BUILT_IN_NEXTAFTER, "nextafter", 
BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
 DEF_C99_BUILTIN(BUILT_IN_NEXTAFTERF, "nextafterf", 
BT_FN_FLOAT_FLOAT_FLOAT, ATTR_MATHFN_FPROUNDING_ERRNO)
 DEF_C99_BUILTIN(BUILT_IN_NEXTAFTERL, "nextafterl", 
BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
@@ -555,9 

Re: [patch, fortran, RFC] Interchange indices for FORALL and DO CONCURRENT if profitable

2017-10-27 Thread Steve Kargl
Hi Thomas,

In general, I like the idea.  I have some minor suggestions below.


On Sat, Oct 28, 2017 at 12:03:58AM +0200, Thomas Koenig wrote:
> +/* Callback function to determine if an expression is the 
> +   corresponding variable.  */
> +
> +static int

static bool

> +has_var (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED, void *data)
> +{
> +  gfc_expr *expr = *e;
> +  gfc_symbol *sym;
> +
> +  if (expr->expr_type != EXPR_VARIABLE)
> +return 0;

return false;

> +
> +  sym = (gfc_symbol *) data;
> +  return sym == expr->symtree->n.sym;
> +}
> +
> +/* Callback function to calculate the cost of a certain index.  */

This function always returns 0, so

> +static int

static void

> +index_cost (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED,
> + void *data)
> +{
> +  ind_type *ind;
> +  gfc_expr *expr;
> +  gfc_array_ref *ar;
> +  gfc_ref *ref;
> +  int i,j;
> +
> +  expr = *e;
> +  if (expr->expr_type != EXPR_VARIABLE)
> +return 0;

return;

> +
> +  ar = NULL;
> +  for (ref = expr->ref; ref; ref = ref->next)
> +{
> +  if (ref->type == REF_ARRAY)
> + {
> +   ar = &ref->u.ar;
> +   break;
> + }
> +}
> +  if (ar == NULL || ar->type != AR_ELEMENT)
> +return 0;

return;

> +
> +  ind = (ind_type *) data;
> +  for (i = 0; i < ar->dimen; i++)
> +{
> +  for (j=0; ind[j].sym != NULL; j++)
> + {
> +   if (gfc_expr_walker (&ar->start[i], has_var, (void *) (ind[j].sym)))
> +   ind[j].n[i]++;
> + }
> +}
> +  return 0;

Delete this return; a void function that reaches its
end simply returns.

> +}
> +
> +/* Callback function for qsort, to sort the loop indices. */
> +
> +static int
> +loop_comp (const void *e1, const void *e2)
> +{
> +  const ind_type *i1 = (const ind_type *) e1;
> +  const ind_type *i2 = (const ind_type *) e2;
> +  int i;
> +
> +  for (i=GFC_MAX_DIMENSIONS-1; i >= 0; i--)
> +{
> +  if (i1->n[i] != i2->n[i])
> + return i1->n[i] - i2->n[i];
> +}
> +  /* All other things being equal, let's not change the ordering.  */
> +  return i2->num - i1->num;
> +}
> +
> +/* Main function to do the index interchange.  */
> +

This function always returns 0, so

> +static int

static void

> +index_interchange (gfc_code **c, int *walk_subtrees ATTRIBUTE_UNUSED,
> +   void *data ATTRIBUTE_UNUSED)
> +{
> +  gfc_code *co;
> +  co = *c;
> +  int n_iter;
> +  gfc_forall_iterator *fa;
> +  ind_type *ind;
> +  int i, j;
> +  
> +  if (co->op != EXEC_FORALL && co->op != EXEC_DO_CONCURRENT)
> +return 0;

return;

> +
> +  n_iter = 0;
> +  for (fa = co->ext.forall_iterator; fa; fa = fa->next)
> +n_iter ++;
> +
> +  /* Nothing to reorder. */
> +  if (n_iter < 2)
> +return 0;

return;

> +
> +  ind = XALLOCAVEC (ind_type, n_iter + 1);
> +
> +  i = 0;
> +  for (fa = co->ext.forall_iterator; fa; fa = fa->next)
> +{
> +  ind[i].sym = fa->var->symtree->n.sym;
> +  ind[i].fa = fa;
> +  for (j=0; j<GFC_MAX_DIMENSIONS; j++)
> +	ind[i].n[j] = 0;
> +  ind[i].num = i;
> +  i++;
> +}
> +  ind[n_iter].sym = NULL;
> +  ind[n_iter].fa = NULL;
> +
> +  gfc_code_walker (c, gfc_dummy_code_callback, index_cost, (void *) ind);
> +  qsort ((void *) ind, n_iter, sizeof (ind_type), loop_comp);
> +
> +  /* Do the actual index interchange.  */
> +  co->ext.forall_iterator = fa = ind[0].fa;
> +  for (i=1; i<n_iter; i++)
> +{
> +  fa->next = ind[i].fa;
> +  fa = fa->next;
> +}
> +  fa->next = NULL;
> +
> +  return 0;

Delete this return.

-- 
Steve


[patch, fortran, RFC] Interchange indices for FORALL and DO CONCURRENT if profitable

2017-10-27 Thread Thomas Koenig

Hello world,

this is a draft patch which interchanges the indices for FORALL and
DO CONCURRENT loops for cases like PR 82471, where code like

  DO CONCURRENT( K=1:N, J=1:M, I=1:L)
 C(I,J,K) = A(I,J,K) + B(I,J,K)
  END DO

led to very poor code because of stride issues.  Currently,
Graphite is not able to do this.

Without the patch, the code above is translated as


i.7 = 1;
count.10 = 512;
while (1)
  {
if (ANNOTATE_EXPR ) goto L.4;
j.6 = 1;
count.9 = 512;
while (1)
  {
if (ANNOTATE_EXPR ) goto L.3;
k.5 = 1;
count.8 = 512;
while (1)
  {
if (ANNOTATE_EXPR ) goto L.2;
(*(real(kind=4)[0:] * restrict) c.data)[((c.offset 
+ (integer(kind=8)) k.5 * c.dim[2].stride) + (integer(kind=8)) j.6 * 
c.dim[1].stride) + (integer(kind=8)) i.7] = (*(real(kind=4)[0:] * 
restrict) a.data)[((a.offset + (integer(kind=8)) k.5 * a.dim[2].stride) 
+ (integer(kind=8)) j.6 * a.dim[1].stride) + (integer(kind=8)) i.7] + 
(*(real(kind=4)[0:] * restrict) b.data)[((b.offset + (integer(kind=8)) 
k.5 * b.dim[2].stride) + (integer(kind=8)) j.6 * b.dim[1].stride) + 
(integer(kind=8)) i.7];

L.1:;
k.5 = k.5 + 1;
count.8 = count.8 + -1;
  }
L.2:;
j.6 = j.6 + 1;
count.9 = count.9 + -1;
  }
L.3:;
i.7 = i.7 + 1;
count.10 = count.10 + -1;
  }
L.4:;

so the innermost loop has the biggest stride. With the patch
(and front-end optimization turned on), this is turned into

k.7 = 1;
count.10 = 512;
while (1)
  {
if (ANNOTATE_EXPR ) goto L.4;
j.6 = 1;
count.9 = 512;
while (1)
  {
if (ANNOTATE_EXPR ) goto L.3;
i.5 = 1;
count.8 = 512;
while (1)
  {
if (ANNOTATE_EXPR ) goto L.2;
(*(real(kind=4)[0:] * restrict) c.data)[((c.offset 
+ (integer(kind=8)) k.7 * c.dim[2].stride) + (integer(kind=8)) j.6 * 
c.dim[1].stride) + (integer(kind=8)) i.5] = (*(real(kind=4)[0:] * 
restrict) a.data)[((a.offset + (integer(kind=8)) k.7 * a.dim[2].stride) 
+ (integer(kind=8)) j.6 * a.dim[1].stride) + (integer(kind=8)) i.5] + 
(*(real(kind=4)[0:] * restrict) b.data)[((b.offset + (integer(kind=8)) 
k.7 * b.dim[2].stride) + (integer(kind=8)) j.6 * b.dim[1].stride) + 
(integer(kind=8)) i.5];

L.1:;
i.5 = i.5 + 1;
count.8 = count.8 + -1;
  }
L.2:;
j.6 = j.6 + 1;
count.9 = count.9 + -1;
  }
L.3:;
k.7 = k.7 + 1;
count.10 = count.10 + -1;
  }
L.4:;

so the innermost loop is the one that gets traversed with unity stride
(the way it should have been done).

Although the algorithm here is quite simple, it resolves the issues
raised in the PR so far, and it is definitely worth fixing.

If we do this kind of thing, it might also be possible to
interchange normal DO loops in a similar way (which Graphite also
cannot do at the moment, at least not if the bounds are variables).

So, comments? Suggestions? Other ideas? Any ideas how to write
a test case? Any volunteers to re-implement Graphite in the
Fortran front end (didn't think so) or to make Graphite catch
this sort of pattern (which it currently doesn't) instead?

Regards

Thomas

2017-10-27  Thomas Koenig  

* frontend-passes.c (index_interchange): New function,
prototype.
(optimize_namespace): Call index_interchange.
(ind_type): New type.
(has_var): New function.
(index_cost): New function.
(loop_comp): New function.
Index: frontend-passes.c
===
--- frontend-passes.c	(Revision 253872)
+++ frontend-passes.c	(Arbeitskopie)
@@ -55,6 +55,7 @@ static gfc_expr* check_conjg_transpose_variable (g
 		 bool *);
 static bool has_dimen_vector_ref (gfc_expr *);
 static int matmul_temp_args (gfc_code **, int *,void *data);
+static int index_interchange (gfc_code **, int*, void *);
 
 #ifdef CHECKING_P
 static void check_locus (gfc_namespace *);
@@ -1385,6 +1386,9 @@ optimize_namespace (gfc_namespace *ns)
 		   NULL);
 }
 
+  gfc_code_walker (&ns->code, index_interchange, dummy_expr_callback,
+		   NULL);
+
   /* BLOCKs are handled in the expression walker below.  */
   for (ns = ns->contained; ns; ns = ns->sibling)
 {
@@ -4225,6 +4229,157 @@ inline_matmul_assign (gfc_code **c, int *walk_subt
   return 0;
 }
 
+
+/* Code for index interchange for loops which are grouped together in DO
+   

Add -std=c17, -std=gnu17

2017-10-27 Thread Joseph Myers
C17, a bug-fix version of the C11 standard with DR resolutions
integrated, will soon go to ballot.  This patch adds corresponding
options -std=c17, -std=gnu17 (new default version, replacing
-std=gnu11 as the default), -std=iso9899:2017.  As a bug-fix version
of the standard, there is no need for flag_isoc17 or any options for
compatibility warnings; however, there is a new __STDC_VERSION__
value, so new cpplib languages CLK_GNUC17 and CLK_STDC17 are added to
support using that new value with the new options.  (If the standard
ends up being published in 2018 and being known as C18, option aliases
can be added.  Note however that -std=iso9899:199409 corresponds to a
__STDC_VERSION__ value rather than a publication date.)

(There are a couple of DR resolutions needing implementing in GCC, but
that's independent of the new options.)

(I'd propose to add -std=c2x / -std=gnu2x / -Wc11-c2x-compat for the
next major C standard revision once there are actually C2x drafts
being issued with new features included.)

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Are the
non-front-end changes for the "GNU C17" language name OK?

gcc:
2017-10-27  Joseph Myers  

* doc/invoke.texi (C Dialect Options): Document -std=c17,
-std=iso9899:2017 and -std=gnu17.
* doc/standards.texi (C Language): Document C17 support.
* doc/cpp.texi (Overview): Mention -std=c17.
(Standard Predefined Macros): Document C11 and C17 values of
__STDC_VERSION__.  Do not refer to C99 support as incomplete.
* doc/extend.texi (Inline): Do not list individual options for
standards newer than C99.
* dwarf2out.c (highest_c_language, gen_compile_unit_die): Handle
"GNU C17".
* config/rl78/rl78.c (rl78_option_override): Handle "GNU C17"
language name.

gcc/c-family:
2017-10-27  Joseph Myers  

* c.opt (std=c17, std=gnu17, std=iso9899:2017): New options.
* c-opts.c (set_std_c17): New function.
(c_common_init_options): Use gnu17 as default C version.
(c_common_handle_option): Handle -std=c17 and -std=gnu17.

gcc/testsuite:
2017-10-27  Joseph Myers  

* gcc.dg/c17-version-1.c, gcc.dg/c17-version-2.c: New tests.

libcpp:
2017-10-27  Joseph Myers  

* include/cpplib.h (enum c_lang): Add CLK_GNUC17 and CLK_STDC17.
* init.c (lang_defaults): Add GNUC17 and STDC17 data.
(cpp_init_builtins): Handle C17 value of __STDC_VERSION__.

Index: gcc/c-family/c-opts.c
===
--- gcc/c-family/c-opts.c   (revision 254145)
+++ gcc/c-family/c-opts.c   (working copy)
@@ -115,6 +115,7 @@ static void set_std_cxx2a (int);
 static void set_std_c89 (int, int);
 static void set_std_c99 (int);
 static void set_std_c11 (int);
+static void set_std_c17 (int);
 static void check_deps_environment_vars (void);
 static void handle_deferred_opts (void);
 static void sanitize_cpp_opts (void);
@@ -236,8 +237,8 @@ c_common_init_options (unsigned int decoded_option
 
   if (c_language == clk_c)
 {
-  /* The default for C is gnu11.  */
-  set_std_c11 (false /* ISO */);
+  /* The default for C is gnu17.  */
+  set_std_c17 (false /* ISO */);
 
   /* If preprocessing assembly language, accept any of the C-family
 front end options since the driver may pass them through.  */
@@ -675,6 +676,16 @@ c_common_handle_option (size_t scode, const char *
set_std_c11 (false /* ISO */);
   break;
 
+case OPT_std_c17:
+  if (!preprocessing_asm_p)
+   set_std_c17 (true /* ISO */);
+  break;
+
+case OPT_std_gnu17:
+  if (!preprocessing_asm_p)
+   set_std_c17 (false /* ISO */);
+  break;
+
 case OPT_trigraphs:
   cpp_opts->trigraphs = 1;
   break;
@@ -1559,6 +1570,21 @@ set_std_c11 (int iso)
   lang_hooks.name = "GNU C11";
 }
 
+/* Set the C 17 standard (without GNU extensions if ISO).  */
+static void
+set_std_c17 (int iso)
+{
+  cpp_set_lang (parse_in, iso ? CLK_STDC17 : CLK_GNUC17);
+  flag_no_asm = iso;
+  flag_no_nonansi_builtin = iso;
+  flag_iso = iso;
+  flag_isoc11 = 1;
+  flag_isoc99 = 1;
+  flag_isoc94 = 1;
+  lang_hooks.name = "GNU C17";
+}
+
+
 /* Set the C++ 98 standard (without GNU extensions if ISO).  */
 static void
 set_std_cxx98 (int iso)
Index: gcc/c-family/c.opt
===
--- gcc/c-family/c.opt  (revision 254145)
+++ gcc/c-family/c.opt  (working copy)
@@ -1944,6 +1944,10 @@ std=c1x
 C ObjC Alias(std=c11)
 Deprecated in favor of -std=c11.
 
+std=c17
+C ObjC
+Conform to the ISO 2017 C standard.
+
 std=c89
 C ObjC Alias(std=c90)
 Conform to the ISO 1990 C standard.
@@ -2006,6 +2010,10 @@ std=gnu1x
 C ObjC Alias(std=gnu11)
 Deprecated in favor of -std=gnu11.
 
+std=gnu17
+C ObjC
+Conform to the ISO 2017 C standard with GNU 

Re: [PATCH] [testsuite/i386] PR 82268 Correct FAIL when configured --with-cpu

2017-10-27 Thread Uros Bizjak
On Fri, Oct 27, 2017 at 11:17 PM, Daniel Santos  wrote:
> When I originally wrote this test I wasn't aware of the
> --with-cpu configure option, so this change explicitly disables avx to
> make sure we choose the sse implementation, even when --with-cpu
> specifies an arch that has avx support.
>
> OK for head?
>
> gcc/testsuite/ChangeLog:
>
> gcc.target/i386/pr82196-1.c (dg-options): Add -mno-avx.

OK.

Thanks,
Uros.

> Thanks,
> Daniel
>
> ---
>  gcc/testsuite/gcc.target/i386/pr82196-1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c 
> b/gcc/testsuite/gcc.target/i386/pr82196-1.c
> index 541d975480d..ff108132bb5 100644
> --- a/gcc/testsuite/gcc.target/i386/pr82196-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target lp64 } } */
> -/* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */
> +/* { dg-options "-mno-avx -msse -mcall-ms2sysv-xlogues -O2" } */
>  /* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */
>  /* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */
>
> --
> 2.14.3
>


[PATCH] [testsuite/i386] PR 82268 Correct FAIL when configured --with-cpu

2017-10-27 Thread Daniel Santos
When I originally wrote this test I wasn't aware of the
--with-cpu configure option, so this change explicitly disables avx to
make sure we choose the sse implementation, even when --with-cpu
specifies an arch that has avx support.

OK for head?

gcc/testsuite/ChangeLog:

gcc.target/i386/pr82196-1.c (dg-options): Add -mno-avx.

Thanks,
Daniel

---
 gcc/testsuite/gcc.target/i386/pr82196-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c 
b/gcc/testsuite/gcc.target/i386/pr82196-1.c
index 541d975480d..ff108132bb5 100644
--- a/gcc/testsuite/gcc.target/i386/pr82196-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target lp64 } } */
-/* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */
+/* { dg-options "-mno-avx -msse -mcall-ms2sysv-xlogues -O2" } */
 /* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */
 /* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */
 
-- 
2.14.3



Re: [committed][PATCH] Convert sprintf warning code to a dominator walk

2017-10-27 Thread Jeff Law
On 10/27/2017 12:03 PM, David Malcolm wrote:
> On Fri, 2017-10-27 at 10:55 -0600, Jeff Law wrote:
>> Prereq for eventually embedding range analysis into the sprintf
>> warning
>> pass.  The only thing that changed since the original from a few days
>> ago was the addition of FINAL OVERRIDE to the before_dom_children
>> override function.
>>
>> Re-bootstrapped and regression tested on x86.
>>
>> Installing on the trunk.  Final patch attached for archival purposes.
>>
>>
>> Jeff
> 
> Sorry to be re-treading the FINAL/OVERRIDE stuff, but...
> 
> [...snip...]
> 
>> +class sprintf_dom_walker : public dom_walker
>> +{
>> + public:
>> +  sprintf_dom_walker () : dom_walker (CDI_DOMINATORS) {}
>> +  ~sprintf_dom_walker () {}
>> +
>> +  virtual edge before_dom_children (basic_block) FINAL OVERRIDE;
> 
> Is it just me, or is it a code smell to have both "virtual" and
> "final"/"override" on a decl?
> 
> In particular, AIUI:
> "virtual" says: "some subclass might override this method"
> "final" says: "no subclass will override this method"
> 
> so having both seems contradictory.
> 
> If sprintf_dom_walker is providing a implementation of a vfunc of
> dom_walker, then presumably this should just lose the "virtual" on the
> subclass, it's presumably already got the "virtual" it needs in the
> base class.
I thought I'd removed all the virtuals when I added the FINAL OVERRIDEs.
Sigh.

I'll take care of it.

jeff


[committed] Backports to 7.x

2017-10-27 Thread Jakub Jelinek
Hi!

I've backported the following 5 patches to 7.x.
The PR81715 one, because it could be risky, is enabled only
for -fsanitize=kernel-address, for which it has been reported
and where it causes major issues for the kernel during sanitization.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to 7
branch.

Jakub
2017-10-27  Jakub Jelinek  

Backported from mainline
2017-09-15  Jakub Jelinek  

PR rtl-optimization/82192
* combine.c (make_extraction): Don't look through non-paradoxical
SUBREGs or TRUNCATE if pos + len is or might be bigger than
inner's mode.

* gcc.c-torture/execute/pr82192.c: New test.

--- gcc/combine.c   (revision 252823)
+++ gcc/combine.c   (revision 252824)
@@ -7442,7 +7442,14 @@ make_extraction (machine_mode mode, rtx
   if (pos_rtx && CONST_INT_P (pos_rtx))
 pos = INTVAL (pos_rtx), pos_rtx = 0;
 
-  if (GET_CODE (inner) == SUBREG && subreg_lowpart_p (inner))
+  if (GET_CODE (inner) == SUBREG
+  && subreg_lowpart_p (inner)
+  && (paradoxical_subreg_p (inner)
+	 /* If trying or potentially trying to extract
+	    bits outside of is_mode, don't look through
+	    non-paradoxical SUBREGs.  See PR82192.  */
+	 || (pos_rtx == NULL_RTX
+	     && pos + len <= GET_MODE_PRECISION (is_mode))))
 {
   /* If going from (subreg:SI (mem:QI ...)) to (mem:QI ...),
 consider just the QI as the memory to extract from.
@@ -7468,7 +7475,12 @@ make_extraction (machine_mode mode, rtx
   if (new_rtx != 0)
return gen_rtx_ASHIFT (mode, new_rtx, XEXP (inner, 1));
 }
-  else if (GET_CODE (inner) == TRUNCATE)
+  else if (GET_CODE (inner) == TRUNCATE
+	   /* If trying or potentially trying to extract
+	      bits outside of is_mode, don't look through
+	      TRUNCATE.  See PR82192.  */
+  && pos_rtx == NULL_RTX
+  && pos + len <= GET_MODE_PRECISION (is_mode))
 inner = XEXP (inner, 0);
 
   inner_mode = GET_MODE (inner);
--- gcc/testsuite/gcc.c-torture/execute/pr82192.c   (nonexistent)
+++ gcc/testsuite/gcc.c-torture/execute/pr82192.c   (revision 252824)
@@ -0,0 +1,22 @@
+/* PR rtl-optimization/82192 */
+
+unsigned long long int a = 0x95dd3d896f7422e2ULL;
+struct S { unsigned int m : 13; } b;
+
+__attribute__((noinline, noclone)) void
+foo (void)
+{
+  b.m = ((unsigned) a) >> (0x644eee9667723bf7LL
+  | a & ~0xdee27af8U) - 0x644eee9667763bd8LL;
+}
+
+int
+main ()
+{
+  if (__INT_MAX__ != 0x7fffffffULL)
+return 0;
+  foo ();
+  if (b.m != 0)
+__builtin_abort ();
+  return 0;
+}
2017-10-27  Jakub Jelinek  

Backported from mainline
2017-09-18  Jakub Jelinek  

PR c/82234
* doc/extend.texi: Add @findex entry for __builtin_shuffle.

--- gcc/doc/extend.texi (revision 252946)
+++ gcc/doc/extend.texi (revision 252947)
@@ -9683,6 +9683,7 @@ For mixed operations between a scalar @c
 @code{s && v} is equivalent to @code{s?v!=0:0} (the evaluation is
 short-circuit) and @code{v && s} is equivalent to @code{v!=0 & (s?-1:0)}.
 
+@findex __builtin_shuffle
 Vector shuffling is available using functions
 @code{__builtin_shuffle (vec, mask)} and
 @code{__builtin_shuffle (vec0, vec1, mask)}.
2017-10-27  Jakub Jelinek  

Backported from mainline
2017-09-21  Jakub Jelinek  

PR sanitizer/81715
* tree-inline.c (expand_call_inline): Emit clobber stmts for
VAR_DECLs to which addressable non-volatile parameters are mapped
and for id->retvar after the return value assignment, though
for -fsanitize=kernel-address only.  Clear id->retval and id->retbnd
after inlining.

* g++.dg/asan/pr81715.C: New test.

--- gcc/tree-inline.c   (revision 253064)
+++ gcc/tree-inline.c   (revision 253065)
@@ -4796,6 +4796,23 @@ expand_call_inline (basic_block bb, gimp
 
   reset_debug_bindings (id, stmt_gsi);
 
+  if (flag_stack_reuse != SR_NONE
+  && (flag_sanitize & SANITIZE_KERNEL_ADDRESS) != 0)
+for (tree p = DECL_ARGUMENTS (id->src_fn); p; p = DECL_CHAIN (p))
+  if (!TREE_THIS_VOLATILE (p))
+   {
+ tree *varp = id->decl_map->get (p);
+ if (varp && VAR_P (*varp) && !is_gimple_reg (*varp))
+   {
+ tree clobber = build_constructor (TREE_TYPE (*varp), NULL);
+ gimple *clobber_stmt;
+ TREE_THIS_VOLATILE (clobber) = 1;
+ clobber_stmt = gimple_build_assign (*varp, clobber);
+ gimple_set_location (clobber_stmt, gimple_location (stmt));
+ gsi_insert_before (&stmt_gsi, clobber_stmt, GSI_SAME_STMT);
+   }
+   }
+
   /* Reset the escaped solution.  */
   if (cfun->gimple_df)
 pt_solution_reset (&cfun->gimple_df->escaped);
@@ -4846,6 +4863,24 @@ expand_call_inline (basic_block bb, gimp
   stmt = 

Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Nathan Sidwell

On 10/27/2017 02:34 PM, Jakub Jelinek wrote:


But when singly inheriting a polymorphic base and thus mapped to the same
vptr all but the last dtor will not be in charge, right?


Correct.


So, if using build_clobber_this for this, instead of clobbering what we
clobber we'd just clear the single vptr (couldn't clobber the rest, even
if before the store, because that would make the earlier other vptr stores
dead).


ok (I'd not looked at the patch to see if in-chargeness was significant)

nathan

--
Nathan Sidwell


[PATCH] Assorted store-merging improvements (PR middle-end/22141)

2017-10-27 Thread Jakub Jelinek
Hi!

The following patch attempts to improve store merging; for the time being
it still only optimizes constant stores to adjacent memory.

The biggest improvement is handling bitfields: it uses the get_bit_range
helper to find the bounds of what can be modified when modifying the bitfield,
and instead of requiring all the stores to be adjacent it now only requires
that their bitregion_* regions are adjacent.  If get_bit_range fails (e.g. for
non-C/C++ languages), it still rounds the boundaries down and up to whole
bytes, as any change within a byte affects the rest.  At the end,
if there are any gaps in between the stored values, the old value is loaded from
memory (had to set TREE_NO_WARNING on it, so that uninit doesn't complain)
masked with the mask, ORed with the constant masked with the negation of the
mask, and stored, pretty much what the expansion emits.  As an incremental
improvement, perhaps we could emit a BIT_INSERT_EXPR instead of doing the
and/or in cases where all the stored bitfields in one load/store set are
adjacent.
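The masked read-modify-write described above can be sketched as follows (a
simplified model, not GCC's implementation; the function name is made up):

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the emitted sequence: load the old value of the
// region, keep the bits outside the merged constant's mask, OR in the
// masked constant, and store the result back.
uint32_t
merged_rmw (uint32_t old_word, uint32_t cst, uint32_t mask)
{
  return (old_word & ~mask) | (cst & mask);
}
```

With mask == 0 the old word passes through unchanged, which is why gaps in
between the stored values force the load of the old value in the first place.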

Another improvement is related to alignment handling.  Previously the code
used get_object_alignment, which e.g. for the store_merging_11.c testcase has
to return 8-bit alignment, as the whole struct is 64-bit aligned but the first
store is 1 byte after that.  The old code would then, on targets that don't
allow unaligned stores or have them slow, emit just byte stores (many of
them).  The patch uses get_object_alignment_1, so that we get both the maximum
known alignment and the misalignment, and computes the alignment for every
bitpos we try, such that for stores starting 1 byte after a 64-bit alignment
boundary we get a 1 byte store, then 2 byte, then 4 byte and then 8 byte.
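A sketch of the per-bitpos width selection (a hypothetical helper, not the
patch's code):

```cpp
#include <cassert>

// Pick the widest naturally aligned power-of-two store (capped at 8
// bytes) that can start at the given byte offset from a base whose
// maximum known alignment is MAX_ALIGN bytes.  For a 64-bit aligned
// object with stores starting 1 byte in, this yields widths 1, 2, 4, 8.
unsigned
aligned_store_width (unsigned max_align, unsigned offset)
{
  for (unsigned w = 8; w > 1; w /= 2)
    if (max_align >= w && offset % w == 0)
      return w;
  return 1;
}
```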

Another improvement is for targets that allow unaligned stores: the new code
performs a dry run of split_group, and if it determines that aligned stores
are as many as or fewer than unaligned stores, it prefers the aligned ones.
E.g. for the case in store_merging_11.c, where ptr is 64-bit aligned and we
store 15 bytes, unpatched gcc with unaligned stores would choose to do an
8 byte, then a 4 byte, then a 2 byte and then a 1 byte store.  Aligned stores,
1, 2, 4, 8 in that order, are also 4 stores, so it is better to do those.
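The dry-run comparison can be modeled like this (an illustrative greedy split,
not the actual split_group logic):

```cpp
#include <cassert>

// Count the power-of-two stores needed to cover LEN bytes starting
// OFFSET bytes past an aligned base, either requiring natural alignment
// for each store (aligned == true) or not.  For 15 bytes starting at
// offset 1, both strategies need 4 stores, so the aligned split wins.
unsigned
count_split_stores (unsigned offset, unsigned len, bool aligned)
{
  unsigned n = 0;
  while (len > 0)
    {
      unsigned w = 8;                       // word size cap
      while (w > len || (aligned && offset % w != 0))
        w /= 2;                             // w == 1 always qualifies
      offset += w;
      len -= w;
      ++n;
    }
  return n;
}
```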

The patch also attempts to reuse original stores (well, just their lhs/rhs1),
if we choose a split store that has a single original insn in it.  That way
we don't lose ARRAY_REFs/COMPONENT_REFs etc. unnecessarily.  Furthermore, if
there is a larger original store than the maximum we try (wordsize), e.g. when
there is originally an 8 byte long long store followed by two 1 byte stores,
on 32-bit targets we'd previously try to split it into a 4 byte store, a
4 byte store and a 2 byte store, figure out that is 3 stores just like
before, and give up.  With the patch, if we see a single original larger
store at the
bitpos we want, we just reuse that store, so we get in that case an 8 byte
original store (lhs/rhs1) followed by 2 byte store.

In find_constituent_stmts it optimizes by not unnecessarily walking
group->stores entries that are already known to be before the bitpos we ask
for.  It fixes the comparisons, which were off by one, so previously it often
chose more original stores than were really in the split store.

Another change is that output_merged_store used to emit the new stores into a
sequence and, if it found out there were too many, released all ssa names and
failed.  That seems unnecessary to me, because we know how many split stores
there are before entering the loop, so we can just fail at that point and only
start emitting something once we have decided to do the replacement.

I had to disable store merging in the g++.dg/pr71694.C testcase, but that is
just because the testcase doesn't test what it should.  In my understanding,
it wants to verify that the c.c store isn't done with a 32-bit RMW, because
that would create a data race for c.d.  But it stores both c.c and c.d next
to each other, so even when c.c's bitregion is the first 3 bytes and c.d's
bitregion is the following byte, we are then touching bytes in both of the
regions and thus an RMW cycle for the whole 32-bit word is fine: as c.d is
written, it will store the new value and ignore the old value of the c.d
byte.  What is wrong, but store merging doesn't emit, is what we emitted
before, i.e. a 32-bit RMW that just stored c.c, followed by a c.d store.
Another thread could have stored value1 into c.d; we'd Read it with the
32-bit read and Modify, while another thread stored value2 into c.d; then we
Write the 32-bit word and thus reintroduce the value1 store into c.d; then
another thread reads it and finds value1 instead of the expected value2.
Finally we store value3 into c.d.  So, an alternative to -fno-store-merging
in the testcase would probably be separate functions where one stores to c.c
and another to c.d; then we can make sure neither store uses movl.  Though,
it probably still should only look at movl stores or loads, other 
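For concreteness, the situation can be sketched with a stand-in layout (field
widths and placement assumed, not copied from the testcase):

```cpp
#include <cassert>

// Stand-in for the pr71694.C layout: c occupies the low 24 bits of a
// 32-bit word, d the remaining byte (common-ABI layout assumed).
struct S
{
  unsigned c : 24;
  unsigned char d;
};

// Storing both fields: a single 32-bit RMW of the word is harmless,
// since d's new value overwrites whatever another thread stored there.
void
store_both (S *s)
{
  s->c = 0x111111;
  s->d = 42;
}

// Storing only c: here a 32-bit RMW would read d, then write the stale
// byte back, losing a concurrent store to d -- the race described
// above.  A conforming store must touch only c's bytes.
void
store_c_only (S *s)
{
  s->c = 0x222222;
}

// Single-threaded check that the visible results are as expected.
bool
check ()
{
  S s {};
  store_both (&s);
  if (s.c != 0x111111 || s.d != 42)
    return false;
  store_c_only (&s);
  return s.c == 0x222222 && s.d == 42;
}
```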

Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Jakub Jelinek
On Fri, Oct 27, 2017 at 02:30:39PM -0400, Nathan Sidwell wrote:
> On 10/27/2017 02:18 PM, Jakub Jelinek wrote:
> > On Fri, Oct 27, 2017 at 02:10:10PM -0400, Jason Merrill wrote:
> 
> > > If the point is to clear the vptr, why are you also clearing the rest
> > > of the object?
> > 
> > Can there be multiple vptr pointers in the object or is there just one?
> > Even if there can be multiple, perhaps earlier destructors would
> > have cleared those other vptr pointers though.
> 
> There can be multiple vptrs in an object (multiple polymorphic bases).
> However, each such case will have its own base dtor invoked, as you
> postulated.  In fact, there may be a base dtor invoked that maps onto the
> single vptr, in the cases when we're singly inheriting a polymorphic base.

But when singly inheriting a polymorphic base and thus mapped to the same
vptr all but the last dtor will not be in charge, right?
So, if using build_clobber_this for this, instead of clobbering what we
clobber we'd just clear the single vptr (couldn't clobber the rest, even
if before the store, because that would make the earlier other vptr stores
dead).

Jakub


Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Nathan Sidwell

On 10/27/2017 02:18 PM, Jakub Jelinek wrote:

On Fri, Oct 27, 2017 at 02:10:10PM -0400, Jason Merrill wrote:



If the point is to clear the vptr, why are you also clearing the rest
of the object?


Can there be multiple vptr pointers in the object or is there just one?
Even if there can be multiple, perhaps earlier destructors would
have cleared those other vptr pointers though.


There can be multiple vptrs in an object (multiple polymorphic bases). 
However, each such case will have its own base dtor invoked, as you 
postulated.  In fact, there may be a base dtor invoked that maps onto 
the single vptr, in the cases when we're singly inheriting a polymorphic 
base.


nathan

[polymorphic here includes the case of having virtual bases].

--
Nathan Sidwell
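A small illustration of the layouts being discussed (vptr counts and sizes are
ABI-dependent; the comments assume the common Itanium C++ ABI):

```cpp
#include <cassert>
#include <cstddef>

struct A { virtual ~A () {} };
struct B { virtual ~B () {} };

// Two polymorphic bases: D carries two vptrs, and each base dtor in the
// chain resets its own.
struct D : A, B {};

// Single inheritance from one polymorphic base: D2 shares A's single
// vptr, so several dtors in the chain map onto the same slot, and only
// the in-charge one runs last.
struct D2 : A {};
```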


[PATCH, i386]: Fix PR 82692, Ordered comparisons used for unordered built-ins

2017-10-27 Thread Uros Bizjak
Hello!

As discussed in the PR, different modes of a FP compare RTX are not
strong enough to survive through the RTL optimization passes.  The attached
testcase was miscompiled due to combine changing the mode of the FP
compare through SELECT_CC_MODE.

The solution, implemented in the attached patch, is to drop CCFPUmode
(which was used to distinguish unordered and ordered compares) and use
UNSPEC_NOTRAP unspec wrappers around compare RTXes for unordered
comparisons.
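At the source level, the ordered/unordered distinction the patch encodes looks
like this (illustrative only; the patch itself is about the RTL
representation):

```cpp
#include <cassert>
#include <cmath>

// A raw `<` is an ordered comparison: it may raise FE_INVALID when an
// operand is a NaN.  std::isless is the quiet, unordered-aware form.
// Both yield false when either operand is a NaN.
bool
ordered_less (double a, double b)
{
  return a < b;
}

bool
quiet_less (double a, double b)
{
  return std::isless (a, b);
}
```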

2017-10-27  Uros Bizjak  

PR target/82692
* config/i386/i386-modes.def (CCFPU): Remove definition.
* config/i386/i386.c (put_condition_mode): Remove CCFPU mode handling.
(ix86_cc_modes_compatible): Ditto.
(ix86_expand_carry_flag_compare): Ditto.
(ix86_expand_int_movcc): Ditto.
(ix86_expand_int_addcc): Ditto.
(ix86_reverse_condition): Ditto.
(ix86_unordered_fp_compare): Rename from ix86_fp_compare_mode.
Return true/false for unordered/ordered fp comparisons.
(ix86_cc_mode): Always return CCFPmode for float mode comparisons.
(ix86_prepare_fp_compare_args): Update for rename.
(ix86_expand_fp_compare): Update for rename.  Generate unordered
compare RTXes wrapped with UNSPEC_NOTRAP unspec.
(ix86_expand_sse_compare_and_jump): Ditto.
* config/i386/predicates.md (fcmov_comparison_operator):
Remove CCFPU mode handling.
(ix86_comparison_operator): Ditto.
(ix86_carry_flag_operator): Ditto.
* config/i386/i386.md (UNSPEC_NOTRAP): New unspec.
(*cmpu_i387): Wrap compare RTX with UNSPEC_NOTRAP unspec.
(*cmpu_cc_i387): Ditto.
(FPCMP): Remove mode iterator.
(unord): Remove mode attribute.
(unord_subst): New define_subst transformation
(unord): New define_subst attribute.
(unordered): Ditto.
(*cmpi): Rewrite using unord_subst transformation.
(*cmpixf_i387): Ditto.
* config/i386/sse.md (_comi): Merge
from _comi and _ucomi
using unord_subst transformation.
* config/i386/subst.md (SUBST_A): Remove CCFP and CCFPU modes.
(round_saeonly): Also handle CCFP mode.
* reg-stack.c (subst_stack_regs_pat): Handle UNSPEC_NOTRAP unspec.
Remove UNSPEC_SAHF unspec handling.

testsuite/ChangeLog:

2017-10-27  Uros Bizjak  

PR target/82692
* gcc.dg/torture/pr82692.c: New test.

Patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386-modes.def
===
--- config/i386/i386-modes.def  (revision 254111)
+++ config/i386/i386-modes.def  (working copy)
@@ -72,8 +72,8 @@ CC_MODE (CCO);
 CC_MODE (CCP);
 CC_MODE (CCS);
 CC_MODE (CCZ);
+
 CC_MODE (CCFP);
-CC_MODE (CCFPU);
 
 /* Vector modes.  Note that VEC_CONCAT patterns require vector
sizes twice as big as implemented in hardware.  */
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 254111)
+++ config/i386/i386.c  (working copy)
@@ -16930,7 +16930,7 @@ put_condition_code (enum rtx_code code, machine_mo
 {
   const char *suffix;
 
-  if (mode == CCFPmode || mode == CCFPUmode)
+  if (mode == CCFPmode)
 {
   code = ix86_fp_compare_code_to_integer (code);
   mode = CCmode;
@@ -21709,14 +21709,13 @@ ix86_expand_int_compare (enum rtx_code code, rtx o
   return gen_rtx_fmt_ee (code, VOIDmode, flags, const0_rtx);
 }
 
-/* Figure out whether to use ordered or unordered fp comparisons.
-   Return the appropriate mode to use.  */
+/* Figure out whether to use unordered fp comparisons.  */
 
-machine_mode
-ix86_fp_compare_mode (enum rtx_code code)
+static bool
+ix86_unordered_fp_compare (enum rtx_code code)
 {
   if (!TARGET_IEEE_FP)
-return CCFPmode;
+return false;
 
   switch (code)
 {
@@ -21724,7 +21723,7 @@ ix86_expand_int_compare (enum rtx_code code, rtx o
 case GE:
 case LT:
 case LE:
-  return CCFPmode;
+  return false;
 
 case EQ:
 case NE:
@@ -21737,7 +21736,7 @@ ix86_expand_int_compare (enum rtx_code code, rtx o
 case UNGT:
 case UNGE:
 case UNEQ:
-  return CCFPUmode;
+  return true;
 
 default:
   gcc_unreachable ();
@@ -21752,7 +21751,7 @@ ix86_cc_mode (enum rtx_code code, rtx op0, rtx op1
   if (SCALAR_FLOAT_MODE_P (mode))
 {
   gcc_assert (!DECIMAL_FLOAT_MODE_P (mode));
-  return ix86_fp_compare_mode (code);
+  return CCFPmode;
 }
 
   switch (code)
@@ -21874,7 +21873,6 @@ ix86_cc_modes_compatible (machine_mode m1, machine
}
 
 case E_CCFPmode:
-case E_CCFPUmode:
   /* These are only compatible with themselves, which we already
 checked above.  */
   return VOIDmode;
@@ -21978,7 +21976,7 @@ ix86_fp_comparison_strategy (enum rtx_code)
 static enum rtx_code
 ix86_prepare_fp_compare_args (enum rtx_code code, rtx *pop0, rtx *pop1)
 {
-  machine_mode fpcmp_mode = ix86_fp_compare_mode (code);
+  bool unordered_compare = 

Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Jakub Jelinek
On Fri, Oct 27, 2017 at 02:10:10PM -0400, Jason Merrill wrote:
> On Fri, Oct 27, 2017 at 9:52 AM, Jakub Jelinek  wrote:
> > On Fri, Oct 27, 2017 at 03:48:41PM +0200, Martin Liška wrote:
> >> --- a/gcc/cp/decl.c
> >> +++ b/gcc/cp/decl.c
> >> @@ -14639,8 +14639,12 @@ implicit_default_ctor_p (tree fn)
> >>  /* Clobber the contents of *this to let the back end know that the object
> >> storage is dead when we enter the constructor or leave the destructor. 
> >>  */
> >>
> >> +/* Clobber or zero (depending on CLOBBER_P argument) the contents of *this
> >> +   to let the back end know that the object storage is dead
> >> +   when we enter the constructor or leave the destructor.  */
> >> +
> >>  static tree
> >> -build_clobber_this ()
> >> +build_this_constructor (bool clobber_p)
> >
> > I think build_clobber_this is better name, but will defer final review
> > to Jason or Nathan.  Also, seems there was already a function comment
> > and you've added yet another one, instead of amending the first one.
> 
> Agreed.
> 
> If the point is to clear the vptr, why are you also clearing the rest
> of the object?

Can there be multiple vptr pointers in the object or is there just one?
Even if there can be multiple, perhaps earlier destructors would
have cleared those other vptr pointers though.

Jakub


Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Jason Merrill
On Fri, Oct 27, 2017 at 9:52 AM, Jakub Jelinek  wrote:
> On Fri, Oct 27, 2017 at 03:48:41PM +0200, Martin Liška wrote:
>> --- a/gcc/cp/decl.c
>> +++ b/gcc/cp/decl.c
>> @@ -14639,8 +14639,12 @@ implicit_default_ctor_p (tree fn)
>>  /* Clobber the contents of *this to let the back end know that the object
>> storage is dead when we enter the constructor or leave the destructor.  
>> */
>>
>> +/* Clobber or zero (depending on CLOBBER_P argument) the contents of *this
>> +   to let the back end know that the object storage is dead
>> +   when we enter the constructor or leave the destructor.  */
>> +
>>  static tree
>> -build_clobber_this ()
>> +build_this_constructor (bool clobber_p)
>
> I think build_clobber_this is better name, but will defer final review
> to Jason or Nathan.  Also, seems there was already a function comment
and you've added yet another one, instead of amending the first one.

Agreed.

If the point is to clear the vptr, why are you also clearing the rest
of the object?

Jason


Re: [committed][PATCH] Convert sprintf warning code to a dominator walk

2017-10-27 Thread David Malcolm
On Fri, 2017-10-27 at 10:55 -0600, Jeff Law wrote:
> Prereq for eventually embedding range analysis into the sprintf
> warning
> pass.  The only thing that changed since the original from a few days
> ago was the addition of FINAL OVERRIDE to the before_dom_children
> override function.
> 
> Re-bootstrapped and regression tested on x86.
> 
> Installing on the trunk.  Final patch attached for archival purposes.
> 
> 
> Jeff

Sorry to be re-treading the FINAL/OVERRIDE stuff, but...

[...snip...]

> +class sprintf_dom_walker : public dom_walker
> +{
> + public:
> +  sprintf_dom_walker () : dom_walker (CDI_DOMINATORS) {}
> +  ~sprintf_dom_walker () {}
> +
> +  virtual edge before_dom_children (basic_block) FINAL OVERRIDE;

Is it just me, or is it a code smell to have both "virtual" and
"final"/"override" on a decl?

In particular, AIUI:
"virtual" says: "some subclass might override this method"
"final" says: "no subclass will override this method"

so having both seems contradictory.

If sprintf_dom_walker is providing an implementation of a vfunc of
dom_walker, then presumably this should just lose the "virtual" on the
subclass; it's already got the "virtual" it needs in the base class.


Dave
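A minimal sketch of the convention being suggested (hypothetical names, not
the GCC classes):

```cpp
#include <cassert>

// The base class declares the function virtual once; the overrider
// spells only `final override` (it is implicitly virtual), avoiding the
// redundant keyword David points out.
struct walker_base
{
  virtual int before_dom_children (int bb) { return bb; }
  virtual ~walker_base () {}
};

struct my_walker : walker_base
{
  int before_dom_children (int bb) final override { return bb + 1; }
};

// Dispatch still goes through the vtable.
int
dispatch (walker_base &w, int bb)
{
  return w.before_dom_children (bb);
}

bool
check ()
{
  walker_base b;
  my_walker m;
  return dispatch (b, 1) == 1 && dispatch (m, 1) == 2;
}
```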


[PATCH] Simplify _Node_insert_return to avoid including <tuple>

2017-10-27 Thread Jonathan Wakely

We can use auto return types and if constexpr to do this without
including <tuple>.

* include/bits/node_handle.h (_Node_insert_return::get): Avoid
use of std::tie and std::get.

Tested powerpc64le-linux, committed to trunk.
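A minimal standalone model of the same technique (simplified member types and
hypothetical names, not the libstdc++ class):

```cpp
#include <cassert>

// Per-member get() via `if constexpr`, so no std::tie/std::get and thus
// no <tuple> dependency.  Only the taken branch is instantiated, so each
// instantiation has one consistent return type.
struct insert_return
{
  bool inserted;
  int  position;   // stand-in for the iterator member
  long node;       // stand-in for the node-handle member

  template<int Idx>
  decltype(auto)
  get () &
  {
    static_assert (Idx < 3);
    if constexpr (Idx == 0)
      return (inserted);   // parentheses => lvalue reference
    else if constexpr (Idx == 1)
      return (position);
    else
      return (node);
  }
};

bool
check ()
{
  insert_return r {true, 7, 9L};
  r.get<1> () = 8;         // get() really returns a reference
  return r.get<0> () && r.position == 8 && r.get<2> () == 9L;
}
```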

commit 397d3b68eb53ff6b229ac777a05dff4ae842a19b
Author: Jonathan Wakely 
Date:   Fri Oct 27 18:07:35 2017 +0100

Simplify _Node_insert_return to avoid including <tuple>

* include/bits/node_handle.h (_Node_insert_return::get): Avoid
use of std::tie and std::get.

diff --git a/libstdc++-v3/include/bits/node_handle.h 
b/libstdc++-v3/include/bits/node_handle.h
index c7694a1e0ef..f93bfd7f686 100644
--- a/libstdc++-v3/include/bits/node_handle.h
+++ b/libstdc++-v3/include/bits/node_handle.h
@@ -37,7 +37,6 @@
 # define __cpp_lib_node_extract 201606
 
 #include 
-#include <tuple>
 #include 
 #include 
 
@@ -286,22 +285,50 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template<size_t _Idx>
decltype(auto) get() &
-   { return std::get<_Idx>(std::tie(inserted, position, node)); }
+   {
+ static_assert(_Idx < 3);
+ if constexpr (_Idx == 0)
+   return inserted;
+ else if constexpr (_Idx == 1)
+   return position;
+ else if constexpr (_Idx == 2)
+   return node;
+   }
 
   template<size_t _Idx>
decltype(auto) get() const &
-   { return std::get<_Idx>(std::tie(inserted, position, node)); }
+   {
+ static_assert(_Idx < 3);
+ if constexpr (_Idx == 0)
+   return inserted;
+ else if constexpr (_Idx == 1)
+   return position;
+ else if constexpr (_Idx == 2)
+   return node;
+   }
 
   template<size_t _Idx>
decltype(auto) get() &&
{
- return std::move(std::get<_Idx>(std::tie(inserted, position, node)));
+ static_assert(_Idx < 3);
+ if constexpr (_Idx == 0)
+   return std::move(inserted);
+ else if constexpr (_Idx == 1)
+   return std::move(position);
+ else if constexpr (_Idx == 2)
+   return std::move(node);
}
 
   template<size_t _Idx>
decltype(auto) get() const &&
{
- return std::move(std::get<_Idx>(std::tie(inserted, position, node)));
+ static_assert(_Idx < 3);
+ if constexpr (_Idx == 0)
+   return std::move(inserted);
+ else if constexpr (_Idx == 1)
+   return std::move(position);
+ else if constexpr (_Idx == 2)
+   return std::move(node);
}
 };
 


[PATCH] List headers in Makefile in alphabetical order

2017-10-27 Thread Jonathan Wakely

I think this header was going to be called math_specfun.h but was
renamed at the last minute, without re-ordering it in the Makefile.

* include/Makefile.am: Put headers in alphabetical order.
* include/Makefile.in: Regenerate.

Tested powerpc64le-linux, committed to trunk.

commit a4cac4d8ad8481903f2ad213f7fa692dbe856195
Author: Jonathan Wakely 
Date:   Fri Oct 27 18:04:38 2017 +0100

List headers in Makefile in alphabetical order

* include/Makefile.am: Put headers in alphabetical order.
* include/Makefile.in: Regenerate.

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index 2c4d193d0a4..3e34dc00747 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -130,7 +130,6 @@ bits_headers = \
${bits_srcdir}/locale_facets_nonio.tcc \
${bits_srcdir}/localefwd.h \
${bits_srcdir}/mask_array.h \
-   ${bits_srcdir}/specfun.h \
${bits_srcdir}/memoryfwd.h \
${bits_srcdir}/move.h \
${bits_srcdir}/node_handle.h \
@@ -161,6 +160,7 @@ bits_headers = \
${bits_srcdir}/shared_ptr_atomic.h \
${bits_srcdir}/shared_ptr_base.h \
${bits_srcdir}/slice_array.h \
+   ${bits_srcdir}/specfun.h \
${bits_srcdir}/sstream.tcc \
${bits_srcdir}/std_abs.h \
${bits_srcdir}/std_function.h \


[PATCH] Remove noexcept from filesystem iterators and operations (LWG 3013, 3014)

2017-10-27 Thread Jonathan Wakely

These issues haven't been resolved yet, but the discussion is correct
that these operations allocate memory, so the proposed resolutions are
correct.

* include/bits/fs_dir.h (directory_iterator): Remove noexcept from
constructors and increment member (LWG 3013).
(recursive_directory_iterator): Likewise.
* include/bits/fs_ops.h (copy, copy_file, create_directories)
(is_empty, remove_all): Remove noexcept (LWG 3013 and LWG 3014).
* src/filesystem/std-dir.cc (directory_iterator::increment)
(recursive_directory_iterator::increment): Remove noexcept.
* src/filesystem/std-ops.cc (copy, copy_file, create_directories)
(is_empty, remove_all): Remove noexcept

Tested powerpc64le-linux, committed to trunk.


commit eeb3ecd4345471956f95e77c853e1ed00fded065
Author: Jonathan Wakely 
Date:   Fri Oct 27 17:58:22 2017 +0100

Remove noexcept from filesystem iterators and operations (LWG 3013, 3014)

* include/bits/fs_dir.h (directory_iterator): Remove noexcept from
constructors and increment member (LWG 3013).
(recursive_directory_iterator): Likewise.
* include/bits/fs_ops.h (copy, copy_file, create_directories)
(is_empty, remove_all): Remove noexcept (LWG 3013 and LWG 3014).
* src/filesystem/std-dir.cc (directory_iterator::increment)
(recursive_directory_iterator::increment): Remove noexcept.
* src/filesystem/std-ops.cc (copy, copy_file, create_directories)
(is_empty, remove_all): Remove noexcept

diff --git a/libstdc++-v3/include/bits/fs_dir.h 
b/libstdc++-v3/include/bits/fs_dir.h
index cd83d25c4ad..579a269711e 100644
--- a/libstdc++-v3/include/bits/fs_dir.h
+++ b/libstdc++-v3/include/bits/fs_dir.h
@@ -355,12 +355,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 directory_iterator(const path& __p, directory_options __options)
 : directory_iterator(__p, __options, nullptr) { }
 
-directory_iterator(const path& __p, error_code& __ec) noexcept
+directory_iterator(const path& __p, error_code& __ec)
 : directory_iterator(__p, directory_options::none, __ec) { }
 
-directory_iterator(const path& __p,
-  directory_options __options,
-  error_code& __ec) noexcept
+directory_iterator(const path& __p, directory_options __options,
+  error_code& __ec)
 : directory_iterator(__p, __options, &__ec) { }
 
 directory_iterator(const directory_iterator& __rhs) = default;
@@ -378,7 +377,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 const directory_entry& operator*() const;
 const directory_entry* operator->() const { return &**this; }
 directory_iterator&operator++();
-directory_iterator&increment(error_code& __ec) noexcept;
+directory_iterator&increment(error_code& __ec);
 
 __directory_iterator_proxy operator++(int)
 {
@@ -436,12 +435,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 recursive_directory_iterator(const path& __p, directory_options __options)
 : recursive_directory_iterator(__p, __options, nullptr) { }
 
-recursive_directory_iterator(const path& __p,
- directory_options __options,
- error_code& __ec) noexcept
+recursive_directory_iterator(const path& __p, directory_options __options,
+ error_code& __ec)
 : recursive_directory_iterator(__p, __options, &__ec) { }
 
-recursive_directory_iterator(const path& __p, error_code& __ec) noexcept
+recursive_directory_iterator(const path& __p, error_code& __ec)
 : recursive_directory_iterator(__p, directory_options::none, &__ec) { }
 
 recursive_directory_iterator(
@@ -466,7 +464,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 operator=(recursive_directory_iterator&& __rhs) noexcept;
 
 recursive_directory_iterator& operator++();
-recursive_directory_iterator& increment(error_code& __ec) noexcept;
+recursive_directory_iterator& increment(error_code& __ec);
 
 __directory_iterator_proxy operator++(int)
 {
diff --git a/libstdc++-v3/include/bits/fs_ops.h 
b/libstdc++-v3/include/bits/fs_ops.h
index 563d63de81a..075d61e2a63 100644
--- a/libstdc++-v3/include/bits/fs_ops.h
+++ b/libstdc++-v3/include/bits/fs_ops.h
@@ -56,31 +56,31 @@ namespace filesystem
   { copy(__from, __to, copy_options::none); }
 
   inline void
-  copy(const path& __from, const path& __to, error_code& __ec) noexcept
+  copy(const path& __from, const path& __to, error_code& __ec)
   { copy(__from, __to, copy_options::none, __ec); }
 
   void copy(const path& __from, const path& __to, copy_options __options);
   void copy(const path& __from, const path& __to, copy_options __options,
-   error_code& __ec) noexcept;
+   error_code& __ec);
 
   inline bool
   copy_file(const path& __from, const path& __to)
   { return copy_file(__from, __to, 

[PATCH] Make filesystem::file_status default constructor non-explicit (LWG 2787)

2017-10-27 Thread Jonathan Wakely

Also add tests for experimental::file_status.

* include/bits/fs_dir.h (file_status): Make default constructor
non-explicit (LWG 2787).
* testsuite/27_io/filesystem/file_status/1.cc: New test.
* testsuite/experimental/filesystem/file_status/1.cc: New test.

Tested powerpc64le-linux, committed to trunk.


commit 33acd39c23232d98eb0b621ac856f07a0bbf6259
Author: Jonathan Wakely 
Date:   Fri Oct 27 17:53:14 2017 +0100

Make filesystem::file_status default constructor non-explicit (LWG 2787)

* include/bits/fs_dir.h (file_status): Make default constructor
non-explicit (LWG 2787).
* testsuite/27_io/filesystem/file_status/1.cc: New test.
* testsuite/experimental/filesystem/file_status/1.cc: New test.

diff --git a/libstdc++-v3/include/bits/fs_dir.h 
b/libstdc++-v3/include/bits/fs_dir.h
index 20ce9beb023..cd83d25c4ad 100644
--- a/libstdc++-v3/include/bits/fs_dir.h
+++ b/libstdc++-v3/include/bits/fs_dir.h
@@ -50,10 +50,11 @@ namespace filesystem
   class file_status
   {
   public:
-// constructors
+// constructors and destructor
+file_status() noexcept : file_status(file_type::none) {}
+
 explicit
-file_status(file_type __ft = file_type::none,
-   perms __prms = perms::unknown) noexcept
+file_status(file_type __ft, perms __prms = perms::unknown) noexcept
 : _M_type(__ft), _M_perms(__prms) { }
 
 file_status(const file_status&) noexcept = default;
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/file_status/1.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/file_status/1.cc
new file mode 100644
index 000..21613020f88
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/file_status/1.cc
@@ -0,0 +1,84 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17 -lstdc++fs" }
+// { dg-do run { target c++17 } }
+// { dg-require-filesystem-ts "" }
+
+#include 
+#include 
+
+namespace fs = std::filesystem;
+
+template<typename... Args>
+constexpr bool nothrow_constructible() {
+  return std::is_nothrow_constructible<fs::file_status, Args...>::value;
+}
+
+void
+test01()
+{
+  fs::file_status st0;
+  VERIFY( st0.type() == fs::file_type::none );
+  VERIFY( st0.permissions() == fs::perms::unknown );
+  static_assert( nothrow_constructible<>(), "" );
+
+  fs::file_status st1(fs::file_type::regular);
+  VERIFY( st1.type() == fs::file_type::regular );
+  VERIFY( st1.permissions() == fs::perms::unknown );
+  static_assert( nothrow_constructible(), "" );
+
+  fs::file_status st2(fs::file_type::directory, fs::perms::owner_all);
+  VERIFY( st2.type() == fs::file_type::directory );
+  VERIFY( st2.permissions() == fs::perms::owner_all );
+  static_assert( nothrow_constructible(), "" );
+
+  static_assert( nothrow_constructible(), "" );
+  static_assert( nothrow_constructible(), "" );
+}
+
+void
+test02()
+{
+  fs::file_status st;
+  VERIFY( st.type() == fs::file_type::none );
+  VERIFY( st.permissions() == fs::perms::unknown );
+
+  st.type(fs::file_type::symlink);
+  VERIFY( st.type() == fs::file_type::symlink );
+  VERIFY( st.permissions() == fs::perms::unknown );
+
+  st.permissions(fs::perms::owner_all);
+  VERIFY( st.type() == fs::file_type::symlink );
+  VERIFY( st.permissions() == fs::perms::owner_all );
+}
+
+void check_non_explicit_constructor(fs::file_status) { }
+
+void
+test03()
+{
+  check_non_explicit_constructor( {} ); // LWG 2787
+}
+
+int
+main()
+{
+  test01();
+  test02();
+  test03();
+}
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/file_status/1.cc 
b/libstdc++-v3/testsuite/experimental/filesystem/file_status/1.cc
new file mode 100644
index 000..970ad177b95
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/file_status/1.cc
@@ -0,0 +1,75 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but 

Re: [PATCH, rs6000] (v3) Gimple folding for vec_madd()

2017-10-27 Thread Segher Boessenkool
On Fri, Oct 27, 2017 at 12:51:56PM -0400, David Edelsohn wrote:
> On Fri, Oct 27, 2017 at 11:51 AM, Will Schmidt
>  wrote:
> > [PATCH, rs6000] (v2) Gimple folding for vec_madd()
> >
> > Add support for gimple folding of the vec_madd() (vector multiply-add)
> > intrinsics.
> > Renamed the define_insn of altivec_vmladduhm to fmav8hi4, Refreshed the
> > caller of gen_altivec_vmladduhm to call gen_fmav8hi, and updated the
> > rs6000-builtin.def entry for VMLADDUHM to point to the new name.
> > With this refresh I am no longer adding a define_expand.
> > Plus a few cosmetic tweaks per feedback.
> >
> > Testcase coverage is provided by the existing tests as
> > gcc.target/powerpc/fold-vec-madd-*.c
> >
> > Sniff-tests passed. Regtests will be kicked off shortly. OK for trunk?

Okay, with David's comments taken care of.  Thanks!


Segher


[committed][PATCH] Convert sprintf warning code to a dominator walk

2017-10-27 Thread Jeff Law

Prereq for eventually embedding range analysis into the sprintf warning
pass.  The only thing that changed since the original from a few days
ago was the addition of FINAL OVERRIDE to the before_dom_children
override function.

Re-bootstrapped and regression tested on x86.

Installing on the trunk.  Final patch attached for archival purposes.


Jeff
commit 9d8823fc2cd32cc3ef667e96f445f3cbf5192441
Author: law 
Date:   Fri Oct 27 16:54:49 2017 +

* gimple-ssa-sprintf.c: Include domwalk.h.
(class sprintf_dom_walker): New class, derived from dom_walker.
(sprintf_dom_walker::before_dom_children): New function.
(struct call_info): Moved into sprintf_dom_walker class
(compute_format_length, handle_gimple_call): Likewise.
(sprintf_length::execute): Call the dominator walker rather
than walking the statements.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@254156 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 33e275f6b3c..c740c8b1705 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,13 @@
 2017-10-27  Jeff Law  
 
+   * gimple-ssa-sprintf.c: Include domwalk.h.
+   (class sprintf_dom_walker): New class, derived from dom_walker.
+   (sprintf_dom_walker::before_dom_children): New function.
+   (struct call_info): Moved into sprintf_dom_walker class
+   (compute_format_length, handle_gimple_call): Likewise.
+   (sprintf_length::execute): Call the dominator walker rather
+   than walking the statements.
+
* tree-vrp.c (check_all_array_refs): Do not use wi->info to smuggle
gimple statement locations.
(check_array_bounds): Corresponding changes.  Get the statement's
diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 9770df72898..74154138fc9 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "toplev.h"
 #include "substring-locations.h"
 #include "diagnostic.h"
+#include "domwalk.h"
 
 /* The likely worst case value of MB_LEN_MAX for the target, large enough
for UTF-8.  Ideally, this would be obtained by a target hook if it were
@@ -113,6 +114,19 @@ static int warn_level;
 
 struct format_result;
 
+class sprintf_dom_walker : public dom_walker
+{
+ public:
+  sprintf_dom_walker () : dom_walker (CDI_DOMINATORS) {}
+  ~sprintf_dom_walker () {}
+
+  virtual edge before_dom_children (basic_block) FINAL OVERRIDE;
+  bool handle_gimple_call (gimple_stmt_iterator *);
+
+  struct call_info;
+  bool compute_format_length (call_info &, format_result *);
+};
+
 class pass_sprintf_length : public gimple_opt_pass
 {
   bool fold_return_value;
@@ -135,10 +149,6 @@ public:
   fold_return_value = param;
 }
 
-  bool handle_gimple_call (gimple_stmt_iterator *);
-
-  struct call_info;
-  bool compute_format_length (call_info &, format_result *);
 };
 
 bool
@@ -976,7 +986,7 @@ bytes_remaining (unsigned HOST_WIDE_INT navail, const 
format_result )
 
 /* Description of a call to a formatted function.  */
 
-struct pass_sprintf_length::call_info
+struct sprintf_dom_walker::call_info
 {
   /* Function call statement.  */
   gimple *callstmt;
@@ -2348,7 +2358,7 @@ format_plain (const directive , tree)
should be diagnosed given the AVAILable space in the destination.  */
 
 static bool
-should_warn_p (const pass_sprintf_length::call_info ,
+should_warn_p (const sprintf_dom_walker::call_info ,
   const result_range , const result_range )
 {
   if (result.max <= avail.min)
@@ -2419,7 +2429,7 @@ should_warn_p (const pass_sprintf_length::call_info ,
 
 static bool
 maybe_warn (substring_loc , location_t argloc,
-   const pass_sprintf_length::call_info ,
+   const sprintf_dom_walker::call_info ,
const result_range _range, const result_range ,
const directive )
 {
@@ -2716,7 +2726,7 @@ maybe_warn (substring_loc , location_t argloc,
in *RES.  Return true if the directive has been handled.  */
 
 static bool
-format_directive (const pass_sprintf_length::call_info ,
+format_directive (const sprintf_dom_walker::call_info ,
  format_result *res, const directive )
 {
   /* Offset of the beginning of the directive from the beginning
@@ -3004,7 +3014,7 @@ format_directive (const pass_sprintf_length::call_info 
,
the directive.  */
 
 static size_t
-parse_directive (pass_sprintf_length::call_info ,
+parse_directive (sprintf_dom_walker::call_info ,
 directive , format_result *res,
 const char *str, unsigned *argno)
 {
@@ -3431,7 +3441,7 @@ parse_directive (pass_sprintf_length::call_info ,
that caused the processing to be terminated early).  */
 
 bool
-pass_sprintf_length::compute_format_length (call_info ,
+sprintf_dom_walker::compute_format_length (call_info ,

Re: [PATCH] Fix nrv-1.c false failure on aarch64.

2017-10-27 Thread Egeyar Bagcioglu

On 10/26/2017 05:03 PM, Jeff Law wrote:

On 10/18/2017 10:59 AM, Egeyar Bagcioglu wrote:

Hello,

Test case "guality.exp=nrv-1.c" fails on aarch64. Optimizations reorder
the instructions and cause the value of a variable to be checked before
its first assignment. The following patch moves the breakpoint to the
end of the function, ensuring that it is reached only after the
assignment instruction has executed.

Please review the patch and apply if legitimate.

This seems wrong.

If I understand the test correctly, we want to break on the line with
the assignment to a2.i[4] = 7 and verify that before that line executes
that a2.i[0] == 42.
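For reference, the shape of the test as described reads roughly like the
sketch below (an assumed reconstruction: only `a2` and the two array
assignments come from the discussion, everything else is illustrative, not
the actual nrv-1.c source):

```cpp
#include <cassert>

// Assumed reconstruction of the guality test's shape; identifiers other
// than a2 are guesses for illustration.
struct A { int i[8]; };

struct A
f (void)
{
  struct A a2 = {};
  a2.i[0] = 42;
  a2.i[4] = 7;   // breakpoint line: a2.i[0] must already read as 42 here
  return a2;
}
```

The debugging-experience question is whether, at the breakpoint on the
second assignment, the debugger still sees 42 in `a2.i[0]` even after the
optimizer has moved the stores around.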

Moving the test point to the end of the function seems to defeat the
purpose of the test.  A breakpoint at the end of the function to test
state is pointless as it doesn't reflect what a user is likely to want
to do.

I'm guessing based on your description that optimization has sunk the
assignment to a2.i[0] down past the assignment to a2.i[4]?  What
optimization did this and what do the dwarf records look like?


Jeff

Hi Jeff,

Thank you for the review. Your analysis of the issue is correct. Indeed, 
I realized after the first reviews that the test is aimed at the 
debugging experience rather than the functionality of the executable. As 
a result, I withdraw this patch.


Earlier reviewers mentioned that Alex Oliva's SFN work could fix this test 
case as well, and I have verified that it does. His patches are currently in 
review; I am following their progress before taking any further action.


Egeyar


Re: [PATCH, rs6000] (v3) Gimple folding for vec_madd()

2017-10-27 Thread David Edelsohn
On Fri, Oct 27, 2017 at 11:51 AM, Will Schmidt
 wrote:
> Hi,
>  V3. :-)
>
> [PATCH, rs6000] (v2) Gimple folding for vec_madd()
>
> Add support for gimple folding of the vec_madd() (vector multiply-add)
> intrinsics.
> Renamed the define_insn of altivec_vmladduhm to fmav8hi4, refreshed the
> caller of gen_altivec_vmladduhm to call gen_fmav8hi, and updated the
> rs6000-builtin.def entry for VMLADDUHM to point to the new name.
> With this refresh I am no longer adding a define_expand.
> Plus a few cosmetic tweaks per feedback.
>
> Testcase coverage is provided by the existing tests as
> gcc.target/powerpc/fold-vec-madd-*.c
>
> Sniff-tests passed. Regtests will be kicked off shortly. OK for trunk?
>
> Thanks,
> -Will
>
> [gcc]

Thanks for spinning the patch again without the define_expand.

The altivec.md ChangeLog entry should be more explicit for each change.

>
> 2017-10-27  Will Schmidt 
>
> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
> gimple folding of vec_madd() intrinsics.
> * config/rs6000/altivec.md: Rename altivec_vmladduhm to fmav8hi4

* config/rs6000/altivec.md (mulv8hi3): Rename altivec_vmladduhm to fmav8hi4.
(altivec_vmladduhm): Rename to fmav8hi4.

> * config/rs6000/rs6000-builtin.def: Rename vmladduhm to fmav8hi4

Thanks, David


Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2017-10-27 Thread Kyrill Tkachov

Hi Wilco,

On 16/10/17 12:29, Wilco Dijkstra wrote:

ping
 
Kyrill Tkachov wrote:

On 14/12/16 16:37, Wilco Dijkstra wrote:


Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
unnecessary duplication and repeating bugs like PR78439 due to changes being
applied only to one of the duplicates.

When this was brought up, Ramana requested some more investigation of the 
codegen impact [1]. Have you done the archaeology on why we had two patterns 
in the first place, and what the codegen effect of removing the 
Cortex-A8-specific one is?

Yes, the reason to split the pattern was to introduce the '!' to discourage 
Neon->int moves on Cortex-A8 (https://patches.linaro.org/patch/541/). I am not 
removing the optimization for Cortex-A8; however, I haven't been able to find 
an example where it makes a difference, even in code with high register pressure.


Thanks, as far as I can see this doesn't impact codegen much these days 
[1], and this Cortex-A8-specific pattern is a huge liability in my eyes.
So this patch is ok but please re-bootstrap and re-test it on 
arm-none-linux-gnueabihf before committing.


Thank you for your patience with this,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01585.html

Wilco



Bootstrap OK for ARM and Thumb-2 gnueabihf targets. OK for commit?

ChangeLog:
2016-11-29  Wilco Dijkstra  

   * config/arm/vfp.md (movdi_vfp): Merge changes from 
movdi_vfp_cortexa8.
   * (movdi_vfp_cortexa8): Remove pattern.
--

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 
2051f1018f1cbff9c5bf044e71304d78e615458e..a917aa625a7b15f6c9e2b549ab22e5219bb9b99c
 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -304,9 +304,9 @@
;; DImode moves

(define_insn "*movdi_vfp"

-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,q,q,m,w,r,w,w, 
Uv")
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,q,q,m,w,!r,w,w, 
Uv")
   (match_operand:DI 1 "di_operand"  
"r,rDa,Db,Dc,mi,mi,q,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune != TARGET_CPU_cortexa8
+  "TARGET_32BIT && TARGET_HARD_FLOAT
   && (   register_operand (operands[0], DImode)
   || register_operand (operands[1], DImode))
   && !(TARGET_NEON && CONST_INT_P (operands[1])
@@ -339,71 +339,25 @@
}
  "
  [(set_attr "type" 
"multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1,4,5,6") (const_int 8)
+   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
  (eq_attr "alternative" "2") (const_int 12)
  (eq_attr "alternative" "3") (const_int 16)
+ (eq_attr "alternative" "4,5,6")
+  (symbol_ref "arm_count_output_move_double_insns 
(operands) * 4")
  (eq_attr "alternative" "9")
   (if_then_else
 (match_test "TARGET_VFP_SINGLE")
 (const_int 8)
 (const_int 4))]
  (const_int 4)))
+   (set_attr "predicable""yes")
   (set_attr "arm_pool_range" "*,*,*,*,1020,4096,*,*,*,*,1020,*")
   (set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
   (set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
+   (set (attr "ce_count") (symbol_ref "get_attr_length (insn) / 4"))
   (set_attr "arch"   
"t2,any,any,any,a,t2,any,any,any,any,any,any")]
)

-(define_insn "*movdi_vfp_cortexa8"

-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,r,r,m,w,!r,w,w, 
Uv")
-   (match_operand:DI 1 "di_operand"  
"r,rDa,Db,Dc,mi,mi,r,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune == TARGET_CPU_cortexa8
-&& (   register_operand (operands[0], DImode)
-|| register_operand (operands[1], DImode))
-&& !(TARGET_NEON && CONST_INT_P (operands[1])
-&& neon_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
-  "*
-  switch (which_alternative)
-{
-case 0:
-case 1:
-case 2:
-case 3:
-  return \"#\";
-case 4:
-case 5:
-case 6:
-  return output_move_double (operands, true, NULL);
-case 7:
-  return \"vmov%?\\t%P0, %Q1, %R1\\t%@ int\";
-case 8:
-  return \"vmov%?\\t%Q0, %R0, %P1\\t%@ int\";
-case 9:
-  return \"vmov%?.f64\\t%P0, %P1\\t%@ int\";
-case 10: case 11:
-  return output_move_vfp (operands);
-default:
-  gcc_unreachable ();
-}
-  "
-  [(set_attr "type" 
"multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
-   

Re: [PATCH, rs6000 V3] Add Power 8 support to vec_revb

2017-10-27 Thread Carl Love
Segher:

I still have issues with the <VSX_XXBR> mode attribute.

> > +(define_mode_attr VSX_XXBR  [(V16QI "q")
> > +  (V8HI  "h")
> > +  (V4SI  "w")
> > +  (V4SF  "w")
> > +  (V2DF  "d")
> > +  (V2DI  "d")
> > +  (V1TI  "q")])
> 
> So I think this is wrong for V16QI.  You can use <VSX_XXBR> fine, but you need
> to avoid generating "xxbrb" insns; instead, do a register move?  xxbrq
> isn't the insn you want, as far as I see.
> 
> > +;; Swap all bytes in each element of vector
> > +(define_expand "revb_"
> > +  [(set (match_operand:VEC_A 0 "vsx_register_operand")
> > +   (bswap:VEC_A (match_operand:VEC_A 1 "vsx_register_operand")))]
> > +  "TARGET_P9_VECTOR"
> > +{
> > +  rtx sel;
> 
> So a special case here:
> 
>   if (<MODE>mode == V16QImode)
> {
>   emit_move_insn (operands[0], operands[1]);
>   DONE;
> }

Even if I put in the above special case, I still have issues with the
<VSX_XXBR> attribute.  The updated code for the expand with the special case
above is:

(define_expand "revb_<mode>"
  [(set (match_operand:VEC_A 0 "vsx_register_operand")  
(bswap:VEC_A (match_operand:VEC_A 1 "vsx_register_operand")))]  
  "TARGET_P8_VECTOR"
{   
  rtx sel;  

  if (TARGET_P9_VECTOR) 
    if (<MODE>mode == V16QImode)
  emit_move_insn (operands[0], operands[1]);
else
      emit_insn (gen_p9_xxbr<VSX_XXBR>_<mode> (operands[0], operands[1]));
  
 etc. 


The issue is that the if (<MODE>mode == V16QImode) does not prevent the code
in the else statement from getting expanded for <VSX_XXBR>. I agree it will
prevent the generation of the instruction but the code is still expanded
and compiled.  I get the error message:

/home/carll/GCC/gcc-revb/gcc/config/rs6000/vsx.md:4727:62: error:
‘gen_p9_xxbrb_v16qi’ was not declared in this scope
   emit_insn (gen_p9_xxbr<VSX_XXBR>_<mode> (operands[0], operands[1]));

Because <VSX_XXBR> for mode V16QI still gets expanded to "b", not "q".

There is no definition for "gen_p9_xxbrb_v16qi" since xxbrb is not
valid.  Short of using a different expander for V16QI I don't see how to
avoid the expansion.  Sorry if I am missing something obvious here.
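For readers unfamiliar with the machine-description machinery, the problem
can be modeled like this: mode iterators and attributes are substituted
textually, once per mode, before the C body of the expander is ever
compiled, so a runtime `if` cannot stop a per-mode `gen_*` name from being
formed. The sketch below is an illustrative model in C++, not the real
genexpand code; the function name it forms matches the error message quoted
above.

```cpp
#include <cassert>
#include <string>

// Illustrative model of .md template expansion: the whole expander body,
// including both arms of any runtime `if`, is stamped out for each mode,
// with the <VSX_XXBR> letter and <mode> substituted textually.
std::string
expand_body (const std::string &mode, const std::string &letter)
{
  // This gen_* reference is emitted even for modes where the
  // corresponding insn (e.g. xxbrb) does not exist.
  return "gen_p9_xxbr" + letter + "_" + mode;
}
```

That is why the compile fails: the reference to the nonexistent generator is
produced at expansion time regardless of the runtime guard.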

  Carl Love







Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-10-27 Thread Kyrill Tkachov

Hi Wilco,

Sorry for the delay.

On 16/10/17 12:30, Wilco Dijkstra wrote:
 


ping

From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
 
A left shift of 1 can always be done using an add, so slightly adjust the rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
making the arm_ashldi3_1bit pattern redundant.


I agree...


DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.


... but it's still used, and the patterns were put there for a reason.
Even if GCC itself doesn't use them much they may be used by other 
applications.


So I'd support removing the left shift 1-bit expansions, but not the 
right shift ones.
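The equivalence the left-shift change relies on (x << 1 == x + x, with the
double-word case handled by an add/add-with-carry pair, exactly what the
removed "movs ... adc" pattern emitted) can be sketched as follows
(illustrative C++, not the GCC implementation):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a 64-bit left shift by 1 on a 32-bit machine: an ADDS/ADC
// pair, which is why adddi3 can stand in for the 1-bit shift pattern.
uint64_t
shl1_via_add (uint32_t lo, uint32_t hi)
{
  uint32_t new_lo = lo + lo;           // ADDS: low words; carry out on overflow
  uint32_t carry = new_lo < lo;        // recover the carry flag
  uint32_t new_hi = hi + hi + carry;   // ADC: high words plus carry in
  return ((uint64_t) new_hi << 32) | new_lo;
}
```

No such simple add-based rewrite exists for the right shifts, which is part
of why keeping those patterns is defensible.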


Thanks,
Kyrill


Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

 * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
 (arm_ashldi3_1bit): Remove pattern.
 (ashrdi3): Remove shift by 1 expansion.
 (arm_ashrdi3_1bit): Remove pattern.
 (lshrdi3): Remove shift by 1 expansion.
 (arm_lshrdi3_1bit): Remove pattern.
 * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
 cost of ashldi3 by 1.
 * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
 (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
 + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
if (speed_p)
  *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 so that we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
return true;
  }
else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
  {
rtx scratch1, scratch2;
  
-  if (operands[2] == CONST1_RTX (SImode))

-{
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-}
-
/* Ideally we should use iwmmxt here if we could know that operands[1]
   ends up already living in an iwmmxt register. Otherwise it's
   cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
"
  )
  
-(define_insn "arm_ashldi3_1bit"

-  [(set (match_operand:DI 0 "s_register_operand" "=r,&r")
-(ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
  (define_expand "ashlsi3"
[(set (match_operand:SI 0 "s_register_operand" "")
  (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
  {
rtx scratch1, scratch2;
  
-  if (operands[2] == CONST1_RTX (SImode))

-{
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-}
-
/* Ideally we should use iwmmxt here if we could know that operands[1]
   ends up already living in an iwmmxt register. Otherwise it's
   cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
"
  )
  
-(define_insn "arm_ashrdi3_1bit"

-  [(set (match_operand:DI  0 "s_register_operand" "=r,&r")
-(ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
  (define_expand "ashrsi3"
[(set (match_operand:SI  0 "s_register_operand" "")
  (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
  {
rtx scratch1, scratch2;
  
-  if (operands[2] == CONST1_RTX (SImode))

-{
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-}
-
/* Ideally we should use iwmmxt here if we could know that operands[1]
   ends up already living in an iwmmxt register. Otherwise it's
   cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@

Re: [doc] Remove Tru64 UNIX and IRIX references in install.texi

2017-10-27 Thread Sandra Loosemore

On 10/27/2017 05:48 AM, Rainer Orth wrote:

I happened to notice that install.texi still contains references to the
Tru64 UNIX and IRIX ports I've removed in GCC 4.8.  I believe it's time
now to get rid of those completely.

Tested with make doc/gccinstall.info and doc/gccinstall.pdf.  Ok for
mainline?  This falls under my prior maintainership, I guess, but I
think it's best to get a second opinion.


Thanks for catching this.  In general I think we should document only 
GCC's current behavior and not mention removed functionality except in 
release notes.  I think this whole document needs review for that sort 
of thing, but every bit helps.


Can you fix this nit while you're in there?


@@ -3353,8 +3347,7 @@ The workaround is disabled by default if
 @anchor{alpha-x-x}
 @heading alpha*-*-*
 This section contains general configuration information for all
-alpha-based platforms using ELF (in particular, ignore this section for
-DEC OSF/1, Digital UNIX and Tru64 UNIX)@.  In addition to reading this
+alpha-based platforms using ELF@.  In addition to reading this
 section, please read all other sections that match your target.

 We require binutils 2.11.2 or newer.


s/alpha-based/Alpha-based/

-Sandra



Re: [PATCH] Change default to -fno-math-errno

2017-10-27 Thread Joseph Myers
No existing glibc version defines math_errhandling based on 
__NO_MATH_ERRNO__.  I'd expect such a change to come with a glibc patch, 
and indeed a GCC execution test of the value of math_errhandling to make 
sure the compiler's behavior isn't contradicting what's declared by the 
runtime libraries.  Also, the default should not be changed for pre-C99 C 
standard modes (or pre-C++11 C++, I suppose).
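A minimal sketch of the kind of execution test being asked for follows.
Note the assumption: coupling math_errhandling to __NO_MATH_ERRNO__ is
precisely the hypothetical glibc change under discussion, not current glibc
behavior, so under -fno-math-errno this check would (intentionally) expose
the mismatch today.

```cpp
#include <cassert>
#include <cmath>

// Sketch: verify the compiler's errno-math setting agrees with what the
// library headers declare via math_errhandling.  With current glibc this
// only holds for the default -fmath-errno compilation.
bool
compiler_and_libm_agree (void)
{
#ifdef __NO_MATH_ERRNO__
  return (math_errhandling & MATH_ERRNO) == 0;
#else
  return (math_errhandling & MATH_ERRNO) != 0;
#endif
}
```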

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Adjust empty class parameter passing ABI (PR c++/60336)

2017-10-27 Thread Jeff Law
On 10/27/2017 04:38 AM, Jakub Jelinek wrote:
> On Fri, Oct 27, 2017 at 12:31:46PM +0200, Richard Biener wrote:
>> I fear it doesn't work at all with LTO (you'll always get the old ABI
>> if I read the patch correctly).  This is because the function
>> computing the size looks at flag_abi_version which isn't saved
>> per function / TU.
>>
>> Similarly you'll never get the ABI warning with LTO (less of a big
>> deal of course) because the langhook doesn't reflect things correctly
>> either.
>>
>> So...  can we instead compute whether a type is "empty" according
>> to the ABI early and store the result in the type (thinking of
>> doing this in layout_type?).  Similarly set a flag whether to
>> warn.  Why do you warn from backends / code emission and not
>> from the FEs?  Is that to avoid warnings for calls that got inlined?
>> Maybe the FE could set a flag on the call itself (ok, somewhat
>> awkward to funnel through gimple).
> 
> Warning in the FE is too early both because of the inlining, never
> emitted functions and because whether an empty struct is passed differently
> from the past matters on the backend (whether its psABI says it should be
> not passed at all or not).
Right.  My recollection was there were tons of warnings when we tried to
do this in the FE and everything we looked at was a false positive.
Moving the warning to be more controlled by the backend was an effort to
reduce the noise so that the result was actually meaningful.
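To make the subject concrete, the ABI question is about calls of the
following shape (a hypothetical example, not taken from the PR): whether the
empty argument occupies a parameter-passing slot at all is what changed, and
what the warning is meant to flag.

```cpp
#include <cassert>

// Hypothetical example: `Empty` has no data members.  The psABI may say it
// takes up no register/stack slot when passed; GCC historically passed it
// as if it occupied one byte, which is the incompatibility in question.
struct Empty {};

int
takes_empty (Empty, int x)
{
  return x;  // only the int carries information
}
```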

Jeff


Re: [doc] Remove Tru64 UNIX and IRIX references in install.texi

2017-10-27 Thread Jeff Law
On 10/27/2017 05:48 AM, Rainer Orth wrote:
> I happened to notice that install.texi still contains references to the
> Tru64 UNIX and IRIX ports I've removed in GCC 4.8.  I believe it's time
> now to get rid of those completely.
> 
> Tested with make doc/gccinstall.info and doc/gccinstall.pdf.  Ok for
> mainline?  This falls under my prior maintainership, I guess, but I
> think it's best to get a second opinion.
OK.  There's probably a ton of ancient host/build issues that should
just get removed.

jeff


Re: [RFA][PATCH] Provide a class interface into substitute_and_fold.

2017-10-27 Thread Jeff Law
On 10/26/2017 12:11 PM, Richard Biener wrote:
> On October 26, 2017 6:50:15 PM GMT+02:00, Jeff Law  wrote:

[ Big snip ]

>>> Both patches look ok to me though it would be nice to
>>> do the actual composition with a comment that the
>>> lattices might be moved here (if all workers became
>>> member functions as well).
>> I'm actually hoping to post a snapshot of how this will look in a clean
>> consumer today (sprintf bits).  There's enough in place that I can do
>> that while I continue teasing apart vrp and evrp.
> 
> Good. Nice to see we're in the same boat. 
And just so folks can see where this is going.  This is what it takes to
embed a vrp-style range analyzer into the sprintf warnings pass.

There's a #include to pick up the range analyzer.  A new override in the
sprintf dom walker and a new data member in the sprintf dom walker.

You can see the calls into the range analyzer, one in the
before_dom_children override and the other in the after_dom_children
override to generate the context sensitive range information.  There's 3
calls into a class method to get the range info.

That's the general procedure for any dom walker that we'd want to have
access to context sensitive range data.  #include, new data member in
the walker class,  an enter/exit call in the {before,after}_dom_children
overrides and redirecting the queries into the range analyzer rather
than querying the global info.



Two warts in the current implementation:

First, the queries into the range analyzer occur deeper into the call
chain, so we don't have trivial access to the analyzer class instance.
So for now I just shove it into a global.

This is a common issue I see with our codebase -- we may have some of
the higher level objects in place, but when it comes to accessing those
objects deeper in the call chains, we punt to globals.  It's probably as
much an artifact of our history as a C project as anything.

Truly class-ifying the code or passing down the class instance sometimes
has little/no benefit and we stop and punt to a global.  That's probably
not an unreasonable decision, particularly in self-contained code.  But
once we want to re-use the code it's a major hindrance.  In fact, the
whole process around cleaning up vr_data in tree-vrp is another instance
of this exact issue.  Anyway, I'll try to avoid ranting too much
because I'm as guilty as anyone on this issue.

Second.  The name of the method to get the range information from the
analyzer is the same as the global scoped function to get range
information from the SSA_NAME.  That opens us up to shadowing problems.
Obviously changing the name of one or the other will need to happen.

Anyway, the idea is to show the general procedure for adding range
analysis to a domwalker based optimization/analysis pass.

Anyway, back to cleaning up vr_values :-)

Jeff
commit dca6f665879a75becb08697653a47ced71a627a0
Author: Jeff Law 
Date:   Thu Oct 26 14:19:42 2017 -0400

Range analyzer in sprintf code

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 7415413..4ccc140 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -80,6 +80,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "substring-locations.h"
 #include "diagnostic.h"
 #include "domwalk.h"
+#include "range-analyzer.h"
 
 /* The likely worst case value of MB_LEN_MAX for the target, large enough
for UTF-8.  Ideally, this would be obtained by a target hook if it were
@@ -121,12 +122,16 @@ class sprintf_dom_walker : public dom_walker
   ~sprintf_dom_walker () {}
 
   virtual edge before_dom_children (basic_block) FINAL OVERRIDE;
+  virtual void after_dom_children (basic_block) FINAL OVERRIDE;
   bool handle_gimple_call (gimple_stmt_iterator *);
 
   struct call_info;
   bool compute_format_length (call_info &, format_result *);
+  class range_analyzer range_analyzer;
 };
 
+static class vr_values *x;
+
 class pass_sprintf_length : public gimple_opt_pass
 {
   bool fold_return_value;
@@ -1143,7 +1148,7 @@ get_int_range (tree arg, HOST_WIDE_INT *pmin, 
HOST_WIDE_INT *pmax,
{
  /* Try to determine the range of values of the integer argument.  */
  wide_int min, max;
- enum value_range_type range_type = get_range_info (arg, &min, &max);
+ enum value_range_type range_type = x->get_range_info (arg, &min, 
&max);
  if (range_type == VR_RANGE)
{
  HOST_WIDE_INT type_min
@@ -1443,7 +1448,7 @@ format_integer (const directive , tree arg)
   /* Try to determine the range of values of the integer argument
 (range information is not available for pointers).  */
   wide_int min, max;
-  enum value_range_type range_type = get_range_info (arg, &min, &max);
+  enum value_range_type range_type = x->get_range_info (arg, &min, &max);
   if (range_type == VR_RANGE)
{
  argmin = wide_int_to_tree (argtype, min);
@@ -3883,7 +3888,7 @@ 

Re: [03/nn] Allow vector CONSTs

2017-10-27 Thread Richard Sandiford
Jeff Law  writes:
> On 10/23/2017 05:18 AM, Richard Sandiford wrote:
>> This patch allows (const ...) wrappers to be used for rtx vector
>> constants, as an alternative to const_vector.  This is useful
>> for SVE, where the number of elements isn't known until runtime.
> Right.  It's constant, but not knowable at compile time.  That seems an
> exact match for how we've used CONST.
>
>> 
>> It could also be useful in future for fixed-length vectors, to
>> reduce the amount of memory needed to represent simple constants
>> with high element counts.  However, one nice thing about keeping
>> it restricted to variable-length vectors is that there is never
>> any need to handle combinations of (const ...) and CONST_VECTOR.
> Yea, but is the memory consumption of these large vectors a real
> problem?  I suspect, relative to other memory issues they're in the noise.

Yeah, maybe not, especially since the elements themselves are shared.

>> 2017-10-23  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>> 
>> gcc/
>>  * doc/rtl.texi (const): Update description of address constants.
>>  Say that vector constants are allowed too.
>>  * common.md (E, F): Use CONSTANT_P instead of checking for
>>  CONST_VECTOR.
>>  * emit-rtl.c (gen_lowpart_common): Use const_vec_p instead of
>>  checking for CONST_VECTOR.
>>  * expmed.c (make_tree): Use build_vector_from_val for a CONST
>>  VEC_DUPLICATE.
>>  * expr.c (expand_expr_real_2): Check for vector modes instead
>>  of checking for CONST_VECTOR.
>>  * rtl.h (const_vec_p): New function.
>>  (const_vec_duplicate_p): Check for a CONST VEC_DUPLICATE.
>>  (unwrap_const_vec_duplicate): Handle them here too.
> My only worry here is code that is a bit loose in checking for a CONST,
> but not the innards and perhaps isn't prepared for for the new forms
> that appear inside the CONST.
>
> If we have such problems I'd expect it's in the targets as the targets
> have traditionally have had to validate the innards of a CONST to ensure
> it could be handled by the assembler/linker.  Hmm, that may save the
> targets since they'd likely need an update to LEGITIMATE_CONSTANT_P to
> ever see these new forms.
>
> Presumably an aarch64 specific patch to recognize these as valid
> constants in LEGITIMATE_CONSTANT_P is in the works?

Yeah, via the const_vec_duplicate_p helper.  For the default
variable-length mode of SVE we use the (const ...) while for the
fixed-length mode we use (const_vector ...) as normal.  Advanced SIMD
always uses (const_vector ...).

> OK for the trunk.
>
> jeff

Thanks,
Richard


Re: [01/nn] Add gen_(const_)vec_duplicate helpers

2017-10-27 Thread Richard Sandiford
Jeff Law  writes:
> On 10/23/2017 05:16 AM, Richard Sandiford wrote:
>> This patch adds helper functions for generating constant and
>> non-constant vector duplicates.  These routines help with SVE because
>> it is then easier to use:
>> 
>>(const:M (vec_duplicate:M X))
>> 
>> for a broadcast of X, even if the number of elements in M isn't known
>> at compile time.  It also makes it easier for general rtx code to treat
>> constant and non-constant duplicates in the same way.
>> 
>> In the target code, the patch uses gen_vec_duplicate instead of
>> gen_rtx_VEC_DUPLICATE if handling constants correctly is potentially
>> useful.  It might be that some or all of the call sites only handle
>> non-constants in practice, in which case the change is a harmless
>> no-op (and a saving of a few characters).
>> 
>> Otherwise, the target changes use gen_const_vec_duplicate instead
>> of gen_rtx_CONST_VECTOR if the constant is obviously a duplicate.
>> They also include some changes to use CONSTxx_RTX for easy global
>> constants.
>> 
>> 
>> 2017-10-23  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>> 
>> gcc/
>>  * emit-rtl.h (gen_const_vec_duplicate): Declare.
>>  (gen_vec_duplicate): Likewise.
>>  * emit-rtl.c (gen_const_vec_duplicate_1): New function, split
>>  out from...
>>  (gen_const_vector): ...here.
>>  (gen_const_vec_duplicate, gen_vec_duplicate): New functions.
>>  (gen_rtx_CONST_VECTOR): Use gen_const_vec_duplicate for constants
>>  whose elements are all equal.
>>  * optabs.c (expand_vector_broadcast): Use gen_const_vec_duplicate.
>>  * simplify-rtx.c (simplify_const_unary_operation): Likewise.
>>  (simplify_relational_operation): Likewise.
>>  * config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
>>  Likewise.
>>  (aarch64_simd_dup_constant): Use gen_vec_duplicate.
>>  (aarch64_expand_vector_init): Likewise.
>>  * config/arm/arm.c (neon_vdup_constant): Likewise.
>>  (neon_expand_vector_init): Likewise.
>>  (arm_expand_vec_perm): Use gen_const_vec_duplicate.
>>  (arm_block_set_unaligned_vect): Likewise.
>>  (arm_block_set_aligned_vect): Likewise.
>>  * config/arm/neon.md (neon_copysignf): Likewise.
>>  * config/i386/i386.c (ix86_expand_vec_perm): Likewise.
>>  (expand_vec_perm_even_odd_pack): Likewise.
>>  (ix86_vector_duplicate_value): Use gen_vec_duplicate.
>>  * config/i386/sse.md (one_cmpl2): Use CONSTM1_RTX.
>>  * config/ia64/ia64.c (ia64_expand_vecint_compare): Use
>>  gen_const_vec_duplicate.
>>  * config/ia64/vect.md (addv2sf3, subv2sf3): Use CONST1_RTX.
>>  * config/mips/mips.c (mips_gen_const_int_vector): Use
>>  gen_const_vec_duplicate.
>>  (mips_expand_vector_init): Use CONST0_RTX.
>>  * config/powerpcspe/altivec.md (abs2, nabs2): Likewise.
>>  (define_split): Use gen_const_vec_duplicate.
>>  * config/rs6000/altivec.md (abs2, nabs2): Use CONST0_RTX.
>>  (define_split): Use gen_const_vec_duplicate.
>>  * config/s390/vx-builtins.md (vec_genmask): Likewise.
>>  (vec_ctd_s64, vec_ctd_u64, vec_ctsl, vec_ctul): Likewise.
>>  * config/spu/spu.c (spu_const): Likewise.
> I'd started looking at this a couple times when it was originally
> submitted, but never seemed to get through it.  It seems like a nice
> cleanup.
>
> So in gen_const_vector we had an assert to verify that const_tiny_rtx
> was set up.  That seems to have been lost.  It's probably not a big
> deal, but I mention it in case the loss was unintentional.

This morphed into:

+static rtx
+gen_const_vector (machine_mode mode, int constant)
+{
+  machine_mode inner = GET_MODE_INNER (mode);
+
+  gcc_assert (!DECIMAL_FLOAT_MODE_P (inner));
+
+  rtx el = const_tiny_rtx[constant][(int) inner];
+  gcc_assert (el);

but it wasn't obvious due to the way the unified diff mixed up the
functions.  I should have posted that one as context, sorry...

Richard


Re: [Patch, fortran] PR81758 - [7/8 Regression] [OOP] Broken vtab

2017-10-27 Thread Jerry DeLisle
On 10/26/2017 12:20 PM, Andre Vehreschild wrote:
> Hi Paul,
> 
> Without having tested the patch, it looks reasonable to me. So ok from my 
> side.
> 
> - Andre
> 

Seconded, thanks.

Jerry


[PATCH, rs6000] update vec_perm testcase

2017-10-27 Thread Will Schmidt
Hi,

Update the vec-perm testcase to use 'long long' rather than 'long'.  This was a
typo I missed when I initially committed the test.

Credit given to Carl for noticing this one.

OK for trunk?

Thanks,
-Will


diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-longlong.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-longlong.c
index 7f3e574..1333d88 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-perm-longlong.c
@@ -14,11 +14,11 @@ testbl (vector bool long long vbl2, vector bool long long 
vbl3,
 {
   return vec_perm (vbl2, vbl3, vuc);
 }
 
 vector signed long long
-testsl (vector signed long vsl2, vector signed long vsl3,
+testsl (vector signed long long vsl2, vector signed long long vsl3,
vector unsigned char vuc)
 {
   return vec_perm (vsl2, vsl3, vuc);
 }
 




Re: [07/nn] Add unique CONSTs

2017-10-27 Thread Richard Sandiford
Jeff Law  writes:
> On 10/23/2017 05:21 AM, Richard Sandiford wrote:
>> This patch adds a way of treating certain kinds of CONST as unique,
>> so that pointer equality is equivalent to value equality.  For now it
>> is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
>> generate them remains in the else arm of an "if (1)" until a later
>> patch.
>> 
>> This is needed so that (const (vec_duplicate xx)) can be used as the
>> CONSTxx_RTX of a variable-length vector.
> You're brave :-)  I know we looked at making CONST_INTs behave in this
> manner eons ago in an effort to reduce memory consumption and it was
> just plain painful.   There may still be comments from that project
> littering the source code.
>
> I do wonder if we might want to revisit this again as we have better
> infrastructure in place.

For vectors it isn't so bad, since we already do the same thing
for CONST_VECTOR.  Fortunately CONST_VECTOR and CONST always have
a mode, so there's no awkward sharing between modes...

>> 2017-10-23  Richard Sandiford  
>>  Alan Hayward  
>>  David Sherwood  
>> 
>> gcc/
>>  * rtl.h (unique_const_p): New function.
>>  (gen_rtx_CONST): Declare.
>>  * emit-rtl.c (const_hasher): New struct.
>>  (const_htab): New variable.
>>  (init_emit_once): Initialize it.
>>  (const_hasher::hash, const_hasher::equal): New functions.
>>  (gen_rtx_CONST): New function.
>>  (spare_vec_duplicate, spare_vec_series): New variables.
>>  (gen_const_vec_duplicate_1): Add code for use (const (vec_duplicate)),
>>  but disable it for now.
>>  (gen_const_vec_series): Likewise (const (vec_series)).
>>  * gengenrtl.c (special_rtx): Return true for CONST.
>>  * rtl.c (shared_const_p): Return true if unique_const_p.
> ISTM that you need an update the rtl.texi's structure sharing
> assumptions section to describe the new rules around CONSTs.

Oops, yeah.  How about the attached?

> So what's the purpose of the spare_vec_* stuff that you're going to use
> in the future?  It looks like a single-element cache to me.  Am I
> missing something?

No, that's right.  When looking up the const for (vec_duplicate x), say,
it's easier to create the vec_duplicate rtx first.  But if the lookup
succeeds (and so we already have an rtx with that value), we keep the
discarded vec_duplicate around so that we can reuse it for the next
lookup.
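The interning-plus-spare-node scheme is easy to model outside of RTL.  Below is a hypothetical miniature in C (the names, node layout and hash function are invented, not emit-rtl.c's): lookups go through a hash table so that pointer equality implies value equality, and a candidate discarded by a successful lookup is parked in a one-element cache for the next lookup.

```c
#include <stdlib.h>

/* Hypothetical miniature of the interning scheme: every constant is
   looked up in a hash table, so pointer equality implies value
   equality.  A single "spare" node caches the candidate discarded by a
   successful lookup so the next lookup can reuse it.  */

typedef struct node { int mode; int value; } node;

#define TABLE_SIZE 64
static node *table[TABLE_SIZE];
static node *spare;   /* one-element cache, like spare_vec_duplicate */

static node *
intern_const (int mode, int value)
{
  /* Build the candidate first; reuse the spare node if we have one.  */
  node *cand = spare ? spare : (node *) malloc (sizeof (node));
  spare = NULL;
  cand->mode = mode;
  cand->value = value;

  unsigned h = ((unsigned) mode * 31u + (unsigned) value) % TABLE_SIZE;
  for (unsigned i = 0; i < TABLE_SIZE; i++)
    {
      unsigned slot = (h + i) % TABLE_SIZE;
      if (table[slot] == NULL)
        {
          /* Not present: the candidate becomes the canonical node.  */
          table[slot] = cand;
          return cand;
        }
      if (table[slot]->mode == mode && table[slot]->value == value)
        {
          /* Present: park the candidate for the next lookup.  */
          spare = cand;
          return table[slot];
        }
    }
  abort ();  /* Table full; a real implementation would resize.  */
}
```

With this, intern_const (m, x) == intern_const (m, x) holds as pointer equality, which is the property the patch wants for (const (vec_duplicate x)).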

Thanks for the reviews,

Richard


2017-10-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* doc/rtl.texi: Document rtl sharing rules.
* rtl.h (unique_const_p): New function.
(gen_rtx_CONST): Declare.
* emit-rtl.c (const_hasher): New struct.
(const_htab): New variable.
(init_emit_once): Initialize it.
(const_hasher::hash, const_hasher::equal): New functions.
(gen_rtx_CONST): New function.
(spare_vec_duplicate, spare_vec_series): New variables.
(gen_const_vec_duplicate_1): Add code for use (const (vec_duplicate)),
but disable it for now.
(gen_const_vec_series): Likewise (const (vec_series)).
* gengenrtl.c (special_rtx): Return true for CONST.
* rtl.c (shared_const_p): Return true if unique_const_p.

Index: gcc/doc/rtl.texi
===
--- gcc/doc/rtl.texi2017-10-27 16:48:35.827706696 +0100
+++ gcc/doc/rtl.texi2017-10-27 16:48:37.617270148 +0100
@@ -4197,6 +4197,20 @@ There is only one @code{pc} expression.
 @item
 There is only one @code{cc0} expression.
 
+@cindex @code{const}, RTL sharing
+@item
+There is only one instance of the following structures for a given
+@var{m}, @var{x} and @var{y}:
+@example
+(const:@var{m} (vec_duplicate:@var{m} @var{x}))
+(const:@var{m} (vec_series:@var{m} @var{x} @var{y}))
+@end example
+This means, for example, that for a given @var{n} there is only ever a
+single instance of an expression like:
+@example
+(const:V@var{n}DI (vec_duplicate:V@var{n}DI (const_int 0)))
+@end example
+
 @cindex @code{const_double}, RTL sharing
 @item
 There is only one @code{const_double} expression with value 0 for
Index: gcc/rtl.h
===
--- gcc/rtl.h   2017-10-27 16:48:37.433286940 +0100
+++ gcc/rtl.h   2017-10-27 16:48:37.619280894 +0100
@@ -2861,6 +2861,23 @@ vec_series_p (const_rtx x, rtx *base_out
   return const_vec_series_p (x, base_out, step_out);
 }
 
+/* Return true if there should only ever be one instance of (const X),
+   so that constants of this type can be compared using pointer equality.  */
+
+inline bool
+unique_const_p (const_rtx x)
+{
+  switch (GET_CODE (x))
+{
+case VEC_DUPLICATE:
+case VEC_SERIES:
+  return true;
+
+default:
+  return false;
+}
+}

Re: [PATCH, rs6000] (v3) Gimple folding for vec_madd()

2017-10-27 Thread Will Schmidt
Hi, 
 V3. :-)

[PATCH, rs6000] (v2) Gimple folding for vec_madd()

Add support for gimple folding of the vec_madd() (vector multiply-add)
intrinsics.
Renamed the define_insn altivec_vmladduhm to fmav8hi4, refreshed the
caller of gen_altivec_vmladduhm to call gen_fmav8hi4, and updated the
rs6000-builtin.def entry for VMLADDUHM to point to the new name.
With this refresh I am no longer adding a define_expand.
Plus a few cosmetic tweaks per feedback.
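As a side note, the effect of the folding is easy to model: once the builtin is lowered to FMA_EXPR, each lane computes a*b + c with the usual modulo-2^16 wraparound of vmladduhm.  A scalar sketch (purely illustrative, not the actual V8HI lowering):

```c
#include <stdint.h>

#define LANES 8

/* Scalar model of what the folded builtin computes: an elementwise
   multiply-add, a[i] * b[i] + c[i], truncated modulo 2^16 the way
   vmladduhm wraps.  Illustrative only -- not the V8HI lowering.  */
static void
model_vmladduhm (const uint16_t *a, const uint16_t *b,
                 const uint16_t *c, uint16_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (uint16_t) (a[i] * b[i] + c[i]);
}
```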

Testcase coverage is provided by the existing tests as
gcc.target/powerpc/fold-vec-madd-*.c

Sniff-tests passed. Regtests will be kicked off shortly. OK for trunk?

Thanks,
-Will

[gcc]

2017-10-27  Will Schmidt 

* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support for
gimple folding of vec_madd() intrinsics.
* config/rs6000/altivec.md: Rename altivec_vmladduhm to fmav8hi4.
* config/rs6000/rs6000-builtin.def: Rename vmladduhm to fmav8hi4.

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6ea529b..b2f173d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -793,15 +793,16 @@
"TARGET_ALTIVEC"
 {
   rtx zero = gen_reg_rtx (V8HImode);
 
   emit_insn (gen_altivec_vspltish (zero, const0_rtx));
-  emit_insn (gen_altivec_vmladduhm(operands[0], operands[1], operands[2], 
zero));
+  emit_insn (gen_fmav8hi4 (operands[0], operands[1], operands[2], zero));
 
   DONE;
 })
 
+
 ;; Fused multiply subtract 
 (define_insn "*altivec_vnmsubfp"
   [(set (match_operand:V4SF 0 "register_operand" "=v")
(neg:V4SF
 (fma:V4SF (match_operand:V4SF 1 "register_operand" "v")
@@ -934,11 +935,11 @@
(set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
   "TARGET_ALTIVEC"
   "vmhraddshs %0,%1,%2,%3"
   [(set_attr "type" "veccomplex")])
 
-(define_insn "altivec_vmladduhm"
+(define_insn "fmav8hi4"
   [(set (match_operand:V8HI 0 "register_operand" "=v")
 (plus:V8HI (mult:V8HI (match_operand:V8HI 1 "register_operand" "v")
  (match_operand:V8HI 2 "register_operand" "v"))
   (match_operand:V8HI 3 "register_operand" "v")))]
   "TARGET_ALTIVEC"
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index ac9ddae..7834bef 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -959,11 +959,11 @@ BU_SPECIAL_X (RS6000_BUILTIN_NONE, NULL, 0, 
RS6000_BTC_MISC)
 
 /* 3 argument Altivec builtins.  */
 BU_ALTIVEC_3 (VMADDFP,"vmaddfp",FP,fmav4sf4)
 BU_ALTIVEC_3 (VMHADDSHS,  "vmhaddshs",  SAT,   
altivec_vmhaddshs)
 BU_ALTIVEC_3 (VMHRADDSHS, "vmhraddshs", SAT,   
altivec_vmhraddshs)
-BU_ALTIVEC_3 (VMLADDUHM,  "vmladduhm",  CONST, 
altivec_vmladduhm)
+BU_ALTIVEC_3 (VMLADDUHM,  "vmladduhm",  CONST, fmav8hi4)
 BU_ALTIVEC_3 (VMSUMUBM,   "vmsumubm",   CONST, 
altivec_vmsumubm)
 BU_ALTIVEC_3 (VMSUMMBM,   "vmsummbm",   CONST, 
altivec_vmsummbm)
 BU_ALTIVEC_3 (VMSUMUHM,   "vmsumuhm",   CONST, 
altivec_vmsumuhm)
 BU_ALTIVEC_3 (VMSUMSHM,   "vmsumshm",   CONST, 
altivec_vmsumshm)
 BU_ALTIVEC_3 (VMSUMUHS,   "vmsumuhs",   SAT,   
altivec_vmsumuhs)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4837e14..aef34b7 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16606,10 +16606,26 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   build_int_cst (arg2_type, 0)), arg0);
 gimple_set_location (g, loc);
 gsi_replace (gsi, g, true);
 return true;
   }
+
+/* Vector Fused multiply-add (fma).  */
+case ALTIVEC_BUILTIN_VMADDFP:
+case VSX_BUILTIN_XVMADDDP:
+case ALTIVEC_BUILTIN_VMLADDUHM:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   arg1 = gimple_call_arg (stmt, 1);
+   tree arg2 = gimple_call_arg (stmt, 2);
+   lhs = gimple_call_lhs (stmt);
+   gimple *g = gimple_build_assign (lhs, FMA_EXPR , arg0, arg1, arg2);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+
 default:
if (TARGET_DEBUG_BUILTIN)
   fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
fn_code, fn_name1, fn_name2);
   break;




Re: [07/nn] Add unique CONSTs

2017-10-27 Thread Jeff Law
On 10/23/2017 05:21 AM, Richard Sandiford wrote:
> This patch adds a way of treating certain kinds of CONST as unique,
> so that pointer equality is equivalent to value equality.  For now it
> is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
> generate them remains in the else arm of an "if (1)" until a later
> patch.
> 
> This is needed so that (const (vec_duplicate xx)) can be used as the
> CONSTxx_RTX of a variable-length vector.
You're brave :-)  I know we looked at making CONST_INTs behave in this
manner eons ago in an effort to reduce memory consumption and it was
just plain painful.   There may still be comments from that project
littering the source code.

I do wonder if we might want to revisit this again as we have better
infrastructure in place.


> 
> 
> 2017-10-23  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * rtl.h (unique_const_p): New function.
>   (gen_rtx_CONST): Declare.
>   * emit-rtl.c (const_hasher): New struct.
>   (const_htab): New variable.
>   (init_emit_once): Initialize it.
>   (const_hasher::hash, const_hasher::equal): New functions.
>   (gen_rtx_CONST): New function.
>   (spare_vec_duplicate, spare_vec_series): New variables.
>   (gen_const_vec_duplicate_1): Add code for use (const (vec_duplicate)),
>   but disable it for now.
>   (gen_const_vec_series): Likewise (const (vec_series)).
>   * gengenrtl.c (special_rtx): Return true for CONST.
>   * rtl.c (shared_const_p): Return true if unique_const_p.
ISTM that you need an update the rtl.texi's structure sharing
assumptions section to describe the new rules around CONSTs.

So what's the purpose of the spare_vec_* stuff that you're going to use
in the future?  It looks like a single-element cache to me.  Am I
missing something?

jeff


Re: [PATCH] RISC-V: Correct and improve the "-mabi" documentation

2017-10-27 Thread Palmer Dabbelt
Committed.

On Thu, 26 Oct 2017 09:45:07 PDT (-0700), Palmer Dabbelt wrote:
> The documentation for the "-mabi" argument on RISC-V was incorrect.  We
> chose to treat this as a documentation bug rather than a code bug, and
> to make the documentation match what GCC currently does.  In the
> process, I also improved the documentation a bit.
>
> Thanks to Alex Bradbury for finding the bug!
>
> PR target/82717: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82717
>
> gcc/ChangeLog
>
> 2017-10-26  Palmer Dabbelt  
>
> PR target/82717
> * doc/invoke.texi (RISC-V) <-mabi>: Correct and improve.
> ---
>  gcc/doc/invoke.texi | 23 ---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 71b2445f70fd..d184e1d7b7d4 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -21669,9 +21669,26 @@ When generating PIC code, allow the use of PLTs. 
> Ignored for non-PIC.
>
>  @item -mabi=@var{ABI-string}
>  @opindex mabi
> -Specify integer and floating-point calling convention.  This defaults to the
> -natural calling convention: e.g.@ LP64 for RV64I, ILP32 for RV32I, LP64D for
> -RV64G.
> +@item -mabi=@var{ABI-string}
> +@opindex mabi
> +Specify integer and floating-point calling convention.  @var{ABI-string}
> +contains two parts: the size of integer types and the registers used for
> +floating-point types.  For example @samp{-march=rv64ifd -mabi=lp64d} means 
> that
> +@samp{long} and pointers are 64-bit (implicitly defining @samp{int} to be
> +32-bit), and that floating-point values up to 64 bits wide are passed in F
> +registers.  Contrast this with @samp{-march=rv64ifd -mabi=lp64f}, which still
> +allows the compiler to generate code that uses the F and D extensions but 
> only
> +allows floating-point values up to 32 bits long to be passed in registers; or
> +@samp{-march=rv64ifd -mabi=lp64}, in which no floating-point arguments will 
> be
> +passed in registers.
> +
> +The default for this argument is system dependent, users who want a specific
> +calling convention should specify one explicitly.  The valid calling
> +conventions are: @samp{ilp32}, @samp{ilp32f}, @samp{ilp32d}, @samp{lp64},
> +@samp{lp64f}, and @samp{lp64d}.  Some calling conventions are impossible to
> +implement on some ISAs: for example, @samp{-march=rv32if -mabi=ilp32d} is
> +invalid because the ABI requires 64-bit values be passed in F registers, but 
> F
> +registers are only 32 bits wide.
>
>  @item -mfdiv
>  @itemx -mno-fdiv


Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-27 Thread Jeff Law
On 10/27/2017 02:35 AM, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 9:43 PM, Jakub Jelinek  wrote:
>> On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
>>> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
>>>  wrote:
 Richard Biener  writes:
> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>  wrote:
>> This patch adds a POD version of fixed_size_mode.  The only current use
>> is for storing the __builtin_apply and __builtin_result register modes,
>> which were made fixed_size_modes by the previous patch.
>
> Bah - can we update our host compiler to C++11/14 please ...?
> (maybe requiring that build with GCC 4.8 as host compiler works,
> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).

 That'd be great :-)  It would avoid all the poly_int_pod stuff too,
 and allow some clean-up of wide-int.h.
>>>
>>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>>> that would be required?
>>
>> I think it is too early for that, we aren't LLVM or Rust that don't really
>> care about what build requirements they impose on users.
> 
> That's true, which is why I asked.  For me requiring sth newer than GCC 4.8
> would be a blocker given that's the system compiler on our latest server
> (and "stable" OSS) product.
> 
> I guess it depends on the amount of pain we have going forward with C++
> use in GCC.  Given that gdb already requires C++11 people building
> GCC are likely already experiencing the "issue".
It's always going to be a balancing act.  Clearly we don't want to go to
something like the Rust model.  But we also don't want to limit
ourselves to such old tools that we end up hacking around compiler bugs
or avoiding features that can make the codebase easier to maintain and
improve or end up depending on dusty corners of C++98/C++03
implementations that nobody else uses/tests anymore because they've
moved on to C++11.


To be more concrete, if I had to put a stake in the ground.  I'd want to
pick a semi-recent version of Sun, IBM and Clang/LLVM as well as GCC.
Ideally it'd be something that supports C++11 as a language, even if the
runtime isn't fully compliant.   I suspect anything older than GCC 4.8
wouldn't have enough C++11 and anything newer to not work well for the
distros (Red Hat included).

Jeff


Re: [PATCH] Change default optimization level to -Og

2017-10-27 Thread Jeff Law
On 10/26/2017 01:50 PM, Jakub Jelinek wrote:
> On Thu, Oct 26, 2017 at 05:12:40PM +, Wilco Dijkstra wrote:
>> GCC's default optimization level is -O0.  Unfortunately unlike other 
>> compilers,
>> GCC generates extremely inefficient code with -O0.  It is almost unusable for
>> low-level debugging or manual inspection of generated code.  So a -O option 
>> is
>> always required for compilation.  -Og not only allows for fast compilation, 
>> but
>> also produces code that is efficient, readable as well as debuggable.
>> Therefore -Og makes for a much better default setting.
>>
>> Any comments?
>>
>> 2017-10-26  Wilco Dijkstra  
>>
>>  * opts.c (default_options_optimization): Set default to -Og.
>>
>> doc/
>>  * invoke.texi (-O0) Remove default mention.
>>  (-Og): Add mention of default setting.
> 
> This would only severely confuse users.  -Og has lots of unresolved issues
> for debugging experience, and changing the default this way is IMHO
> extremely undesirable.
And changing a default that has been in place for 30 years just seems
unwise at this point.

jeff


[PATCH] Change default to -fno-math-errno

2017-10-27 Thread Wilco Dijkstra
GCC currently defaults to -fmath-errno.  This generates code assuming math
functions set errno and the application checks errno.  Very few applications
test errno and various systems and math libraries no longer set errno since it
is optional.  GCC generates much faster code for simple math functions with
-fno-math-errno such as sqrt and lround (avoiding a call and PLT redirection).
Therefore it seems reasonable to change the default to -fno-math-errno.

long f(float x) { return lroundf(x) + 1; }

by default:

f:
str x30, [sp, -16]!
bl  lroundf
add x0, x0, 1
ldr x30, [sp], 16
ret

lroundf in GLIBC doesn't set errno, so all the inefficiency was for nothing:

 <__lroundf>:
   0:   9e240000    fcvtas  x0, s0
   4:   d65f03c0    ret

With -fno-math-errno:

f:
fcvtas  x0, s0
add x0, x0, 1
ret

OK for commit?

2017-10-27  Wilco Dijkstra  

* common.opt (fmath-errno): Change default to 0.
* opts.c (set_fast_math_flags): Force -fno-math-errno with -ffast-math.

doc/
* invoke.texi (-fmath-errno): Update documentation.

--
diff --git a/gcc/common.opt b/gcc/common.opt
index 
836f05b95a219d17614f35e8dad61b22fef7d748..1bb87353f760d7c60c39de8b9de4311c1ec3d892
 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1842,7 +1842,7 @@ Common Report Var(flag_lto_report_wpa) Init(0)
 Report various link-time optimization statistics for WPA only.
 
 fmath-errno
-Common Report Var(flag_errno_math) Init(1) Optimization SetByCombined
+Common Report Var(flag_errno_math) Init(0) Optimization SetByCombined
 Set errno after built-in math functions.
 
 fmax-errors=
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 
71b2445f70fd5b832c68c08e69e71d8ecad37a4a..3328a3b5fafa6a98007eff52d2a26af520de9128
 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9397,24 +9397,16 @@ that depend on an exact implementation of IEEE or ISO 
rules/specifications
 for math functions. It may, however, yield faster code for programs
 that do not require the guarantees of these specifications.
 
-@item -fno-math-errno
-@opindex fno-math-errno
-Do not set @code{errno} after calling math functions that are executed
-with a single instruction, e.g., @code{sqrt}.  A program that relies on
-IEEE exceptions for math error handling may want to use this flag
-for speed while maintaining IEEE arithmetic compatibility.
+@item -fmath-errno
+@opindex fmath-errno
+Generate code that assumes math functions may set errno.  This disables
+inlining of simple math functions like @code{sqrt} and @code{lround}.
 
-This option is not turned on by any @option{-O} option since
-it can result in incorrect output for programs that depend on
-an exact implementation of IEEE or ISO rules/specifications for
-math functions. It may, however, yield faster code for programs
-that do not require the guarantees of these specifications.
-
-The default is @option{-fmath-errno}.
+A program which relies on math functions setting errno may need to
+use this flag.  However note various systems and math libraries never
+set errno.
 
-On Darwin systems, the math library never sets @code{errno}.  There is
-therefore no reason for the compiler to consider the possibility that
-it might, and @option{-fno-math-errno} is the default.
+The default is @option{-fno-math-errno}.
 
 @item -funsafe-math-optimizations
 @opindex funsafe-math-optimizations
@@ -21315,8 +21307,8 @@ truncation towards zero.
 @item @samp{round}
 Conversion from single-precision floating point to signed integer,
 rounding to the nearest integer and ties away from zero.
-This corresponds to the @code{__builtin_lroundf} function when
-@option{-fno-math-errno} is used.
+This corresponds to the @code{__builtin_lroundf} function unless
+@option{-fmath-errno} is used.
 
 @item @samp{floatis}, @samp{floatus}, @samp{floatid}, @samp{floatud}
 Conversion from signed or unsigned integer types to floating-point types.
diff --git a/gcc/opts.c b/gcc/opts.c
index 
6600a5afd488e89262e6327f7370057c7ae234ba..dfad955e220870a3250198640f3790c804b191e0
 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2530,8 +2530,6 @@ set_fast_math_flags (struct gcc_options *opts, int set)
 }
   if (!opts->frontend_set_flag_finite_math_only)
 opts->x_flag_finite_math_only = set;
-  if (!opts->frontend_set_flag_errno_math)
-opts->x_flag_errno_math = !set;
   if (set)
 {
   if (opts->frontend_set_flag_excess_precision_cmdline
@@ -2544,6 +2542,8 @@ set_fast_math_flags (struct gcc_options *opts, int set)
opts->x_flag_rounding_math = 0;
   if (!opts->frontend_set_flag_cx_limited_range)
opts->x_flag_cx_limited_range = 1;
+  if (!opts->frontend_set_flag_errno_math)
+   opts->x_flag_errno_math = 0;
 }
 }
 

Re: [RFC] Make 4-stage PGO bootstrap really working

2017-10-27 Thread Markus Trippelsdorf
On 2017.10.27 at 15:03 +0200, Martin Liška wrote:
> > And BTW would it make sense to add -gtoggle to stage2 in bootstrap-lto?
> 
> Why do you want to have it there? Am I right that we do not do a stage
> comparison with LTO bootstrap?

The idea was to trigger -g -flto at least during one stage, just as a
sanity check. 
Comparison wouldn't make sense, because we would compare LTO object
files.

-- 
Markus


Re: [PATCH, Fortran, v1] Clarify error message of co_reduce

2017-10-27 Thread Steve Kargl
On Fri, Oct 27, 2017 at 12:19:02PM +0200, Andre Vehreschild wrote:
> Hi all,
> 
> as noted on IRC is one of the error message in check.c co_reduce misleading.
> The attached patch fixes this. The intention of the error message is to tell
> that the type of the argument bound to parameter A is of wrong type and not
> that an unspecific argument has a wrong type.
> 
> If no one objects within 24 h I am planning to commit this patch as obvious to
> trunk gcc-7. Bootstrapped and regtested on x86_64-linux-gnu/f25.
> 
> Regards,
>   Andre
> -- 
> Andre Vehreschild * Email: vehre ad gmx dot de 

> gcc/fortran/ChangeLog:
> 
> 2017-10-27  Andre Vehreschild  
> 
>   * check.c (gfc_check_co_reduce): Clarify error message.
> 

> diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
> index 681950e..759c15a 100644
> --- a/gcc/fortran/check.c
> +++ b/gcc/fortran/check.c
> @@ -1731,7 +1731,7 @@ gfc_check_co_reduce (gfc_expr *a, gfc_expr *op, 
> gfc_expr *result_image,
>  
>if (!gfc_compare_types (&a->ts, &op->result->ts))
>  {
> -  gfc_error ("A argument at %L has type %s but the function passed as "
> +  gfc_error ("The A argument at %L has type %s but the function passed 
> as "
>"OPERATOR at %L returns %s",
>&a->where, gfc_typename (&a->ts), &op->where,
>gfc_typename (&op->result->ts));

Andre,

Can I suggest that you take a look at scalar_check in check.c
and use a similar approach for your error message?

gfc_error ("%qs argument of %qs intrinsics at %L has type %s but "
"the function passed as OPERATOR at %L returns %s",
gfc_current_intrinsic_arg[0]->name, gfc_current_intrinsic,
&a->where, gfc_typename (&a->ts), &op->where,
gfc_typename (&op->result->ts));

Note, you'll need to use the correct index in gfc_current...arg[0]->name.
Also note, %qs gives a 'quoted' string, which in my xterm is a boldface
font.



-- 
Steve


[PATCH] Implement omp async support for nvptx

2017-10-27 Thread Tom de Vries

[ was: Re: [RFC PATCH] Coalesce host to device transfers in libgomp ]
On 10/25/2017 01:38 PM, Jakub Jelinek wrote:

And we don't really have the async target implemented yet for NVPTX:(,
guess that should be the highest priority after this optimization.


Hi,

how about this approach:
1 - Move async_run from plugin-hsa.c to default_async_run
2 - Implement omp async support for nvptx
?

The first patch moves the GOMP_OFFLOAD_async_run implementation from 
plugin-hsa.c to target.c, making it the default implementation if the 
plugin does not define the GOMP_OFFLOAD_async_run symbol.


The second patch removes the GOMP_OFFLOAD_async_run symbol from the 
nvptx plugin, activating the default implementation, and makes sure 
GOMP_OFFLOAD_run can be called from a fresh thread.
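The default implementation boils down to a generic "run the work in a detached thread, then fire a completion callback" pattern.  A standalone model of that pattern (all names here are illustrative, not the libgomp plugin interface):

```c
#include <pthread.h>
#include <stdlib.h>

/* Generic model of the default async_run scheme: run the work in a
   detached thread and invoke a completion callback afterwards.  */

struct async_info
{
  void (*work) (void *);
  void (*completion) (void *);
  void *data;
};

static void *
async_run_1 (void *arg)
{
  struct async_info *info = (struct async_info *) arg;
  void (*work) (void *) = info->work;
  void (*completion) (void *) = info->completion;
  void *data = info->data;

  free (info);  /* Heap-allocated so the spawner could return early.  */
  work (data);
  completion (data);
  return NULL;
}

static int
async_run (void (*work) (void *), void (*completion) (void *), void *data)
{
  pthread_t pt;
  struct async_info *info
    = (struct async_info *) malloc (sizeof (struct async_info));
  if (info == NULL)
    return -1;
  info->work = work;
  info->completion = completion;
  info->data = data;
  if (pthread_create (&pt, NULL, async_run_1, info) != 0)
    {
      free (info);
      return -1;
    }
  return pthread_detach (pt);
}
```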


I've tested this with libgomp.c/c.exp and the previously failing 
target-33.c and target-34.c are now passing, and there are no regressions.


OK for trunk after complete testing (and adding function comment for 
default_async_run)?


Thanks,
- Tom

Move async_run from plugin-hsa.c to default_async_run

2017-10-27  Tom de Vries  

	* plugin/plugin-hsa.c (struct async_run_info): Move ...
	(run_kernel_asynchronously): Rename to ...
	(GOMP_OFFLOAD_async_run): Rename to ...
	* target.c (struct async_run_info): ... here.
	(default_async_run_1): ... this.
	(default_async_run): ... this.
	(gomp_target_task_fn): Handle missing async_run.
	(gomp_load_plugin_for_device): Make async_run optional.

---
 libgomp/plugin/plugin-hsa.c | 58 -
 libgomp/target.c| 63 ++---
 2 files changed, 60 insertions(+), 61 deletions(-)

diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c
index fc08f5d..65a89a3 100644
--- a/libgomp/plugin/plugin-hsa.c
+++ b/libgomp/plugin/plugin-hsa.c
@@ -1625,64 +1625,6 @@ GOMP_OFFLOAD_run (int n __attribute__((unused)),
   run_kernel (kernel, vars, kla);
 }
 
-/* Information to be passed to a thread running a kernel asycnronously.  */
-
-struct async_run_info
-{
-  int device;
-  void *tgt_fn;
-  void *tgt_vars;
-  void **args;
-  void *async_data;
-};
-
-/* Thread routine to run a kernel asynchronously.  */
-
-static void *
-run_kernel_asynchronously (void *thread_arg)
-{
-  struct async_run_info *info = (struct async_run_info *) thread_arg;
-  int device = info->device;
-  void *tgt_fn = info->tgt_fn;
-  void *tgt_vars = info->tgt_vars;
-  void **args = info->args;
-  void *async_data = info->async_data;
-
-  free (info);
-  GOMP_OFFLOAD_run (device, tgt_fn, tgt_vars, args);
-  GOMP_PLUGIN_target_task_completion (async_data);
-  return NULL;
-}
-
-/* Part of the libgomp plugin interface.  Run a kernel like GOMP_OFFLOAD_run
-   does, but asynchronously and call GOMP_PLUGIN_target_task_completion when it
-   has finished.  */
-
-void
-GOMP_OFFLOAD_async_run (int device, void *tgt_fn, void *tgt_vars,
-			void **args, void *async_data)
-{
-  pthread_t pt;
-  struct async_run_info *info;
-  HSA_DEBUG ("GOMP_OFFLOAD_async_run invoked\n")
-  info = GOMP_PLUGIN_malloc (sizeof (struct async_run_info));
-
-  info->device = device;
-  info->tgt_fn = tgt_fn;
-  info->tgt_vars = tgt_vars;
-  info->args = args;
-  info->async_data = async_data;
-
-  int err = pthread_create (&pt, NULL, &run_kernel_asynchronously, info);
-  if (err != 0)
-GOMP_PLUGIN_fatal ("HSA asynchronous thread creation failed: %s",
-		   strerror (err));
-  err = pthread_detach (pt);
-  if (err != 0)
-GOMP_PLUGIN_fatal ("Failed to detach a thread to run HSA kernel "
-		   "asynchronously: %s", strerror (err));
-}
-
 /* Deinitialize all information associated with MODULE and kernels within
it.  Return TRUE on success.  */
 
diff --git a/libgomp/target.c b/libgomp/target.c
index 3dd119f..456ed78 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1868,6 +1868,59 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
 gomp_exit_data (devicep, mapnum, hostaddrs, sizes, kinds);
 }
 
+/* Information to be passed to a thread running a kernel asycnronously.  */
+
+struct async_run_info
+{
+  struct gomp_device_descr *devicep;
+  void *tgt_fn;
+  void *tgt_vars;
+  void **args;
+  void *async_data;
+};
+
+/* Thread routine to run a kernel asynchronously.  */
+
+static void *
+default_async_run_1 (void *thread_arg)
+{
+  struct async_run_info *info = (struct async_run_info *) thread_arg;
+  struct gomp_device_descr *devicep = info->devicep;
+  void *tgt_fn = info->tgt_fn;
+  void *tgt_vars = info->tgt_vars;
+  void **args = info->args;
+  void *async_data = info->async_data;
+
+  free (info);
+  devicep->run_func (devicep->target_id, tgt_fn, tgt_vars, args);
+  GOMP_PLUGIN_target_task_completion (async_data);
+  return NULL;
+}
+
+static void
+default_async_run (struct gomp_device_descr *devicep, void *tgt_fn,
+		   void *tgt_vars, void **args, void *async_data)
+{
+  pthread_t pt;
+  struct async_run_info 

Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Jakub Jelinek
On Fri, Oct 27, 2017 at 03:48:41PM +0200, Martin Liška wrote:
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -14639,8 +14639,12 @@ implicit_default_ctor_p (tree fn)
>  /* Clobber the contents of *this to let the back end know that the object
> storage is dead when we enter the constructor or leave the destructor.  */
>  
> +/* Clobber or zero (depending on CLOBBER_P argument) the contents of *this
> +   to let the back end know that the object storage is dead
> +   when we enter the constructor or leave the destructor.  */
> +
>  static tree
> -build_clobber_this ()
> +build_this_constructor (bool clobber_p)

I think build_clobber_this is a better name, but will defer final review
to Jason or Nathan.  Also, it seems there was already a function comment
and you've added yet another one, instead of amending the first one.

Otherwise LGTM.

Jakub


Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Martin Liška

On 10/27/2017 01:26 PM, Jakub Jelinek wrote:

On Fri, Oct 27, 2017 at 01:16:08PM +0200, Martin Liška wrote:

On 10/27/2017 12:52 PM, Jakub Jelinek wrote:

The decl.c change seems to be only incremental change from a not publicly
posted patch rather than the full diff against trunk.


Sorry for that. Sending full patch.


Thanks.


--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -15280,7 +15280,19 @@ begin_destructor_body (void)
if (flag_lifetime_dse
  /* Clobbering an empty base is harmful if it overlays real data.  */
  && !is_empty_class (current_class_type))
-   finish_decl_cleanup (NULL_TREE, build_clobber_this ());
+   {
+ if (sanitize_flags_p (SANITIZE_VPTR)
+ && (flag_sanitize_recover & SANITIZE_VPTR) == 0)
+   {
+ tree fndecl = builtin_decl_explicit (BUILT_IN_MEMSET);
+ tree call = build_call_expr (fndecl, 3,
+  current_class_ptr, integer_zero_node,
+  TYPE_SIZE_UNIT (current_class_type));


I wonder if it wouldn't be cheaper to just use thisref = {}; rather than
memset, pretty much the same thing as build_clobber_this () emits, except
for the TREE_VOLATILE.  Also, build_clobber_this has:
   if (vbases)
 exprstmt = build_if_in_charge (exprstmt);
so it doesn't clobber if not in charge, not sure if it applies here too.
So maybe easiest would be add a bool argument to build_clobber_this which
would say whether it is a clobber or real clearing?


Hello.

Did that in newer version of the patch, good idea!




+ finish_decl_cleanup (NULL_TREE, call);
+   }
+ else
+   finish_decl_cleanup (NULL_TREE, build_clobber_this ());
+   }
  
/* And insert cleanups for our bases and members so that they

 will be properly destroyed if we throw.  */
diff --git a/gcc/testsuite/g++.dg/ubsan/vptr-12.C 
b/gcc/testsuite/g++.dg/ubsan/vptr-12.C
new file mode 100644
index 000..96c8473d757
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ubsan/vptr-12.C
@@ -0,0 +1,26 @@
+// { dg-do run }
+// { dg-shouldfail "ubsan" }
+// { dg-options "-fsanitize=vptr -fno-sanitize-recover=vptr" }
+
+struct MyClass
+{
+  virtual ~MyClass () {}
+  virtual void
+  Doit ()
+  {
+  }


Why not put all four of the above lines on a single line?  The dtor already
uses that kind of formatting.


Sure.




+};
+
+int
+main ()
+{
+  MyClass *c = new MyClass;
+  c->~MyClass ();
+  c->Doit ();
+
+  return 0;
+}
+
+// { dg-output "\[^\n\r]*vptr-12.C:19:\[0-9]*: runtime error: member call on address 0x\[0-9a-fA-F]* which does not point to an object of type 'MyClass'(\n|\r\n|\r)" }
+// { dg-output "0x\[0-9a-fA-F]*: note: object has invalid vptr(\n|\r\n|\r)" }
+


Unnecessary empty line at end.


Likewise.

Martin



Jakub



From b1da5f4de8b630f284627f422b902d28cd1d408b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 19 Oct 2017 11:10:19 +0200
Subject: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

gcc/cp/ChangeLog:

2017-10-27  Martin Liska  

	* decl.c (build_clobber_this): Rename to ...
	(build_this_constructor): ... this. Add argument clobber_p.
	(start_preparsed_function): Use the argument.
	(begin_destructor_body): In case of disabled recovery,
	zero the object in order to catch virtual calls after
	the end of the object's lifetime.

gcc/testsuite/ChangeLog:

2017-10-27  Martin Liska  

	* g++.dg/ubsan/vptr-12.C: New test.
---
 gcc/cp/decl.c                        | 18 ++++++++++++++----
 gcc/testsuite/g++.dg/ubsan/vptr-12.C | 22 ++++++++++++++++++++++
 2 files changed, 36 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ubsan/vptr-12.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 519aa06a0f9..ee48d1c157e 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -14639,8 +14639,12 @@ implicit_default_ctor_p (tree fn)
 /* Clobber the contents of *this to let the back end know that the object
storage is dead when we enter the constructor or leave the destructor.  */
 
+/* Clobber or zero (depending on CLOBBER_P argument) the contents of *this
+   to let the back end know that the object storage is dead
+   when we enter the constructor or leave the destructor.  */
+
 static tree
-build_clobber_this ()
+build_this_constructor (bool clobber_p)
 {
   /* Clobbering an empty base is pointless, and harmful if its one byte
  TYPE_SIZE overlays real data.  */
@@ -14657,7 +14661,9 @@ build_clobber_this ()
 ctype = CLASSTYPE_AS_BASE (ctype);
 
   tree clobber = build_constructor (ctype, NULL);
-  TREE_THIS_VOLATILE (clobber) = true;
+
+  if (clobber_p)
+TREE_THIS_VOLATILE (clobber) = true;
 
   tree thisref = current_class_ref;
   if (ctype != current_class_type)
@@ -15086,7 +15092,7 @@ start_preparsed_function (tree decl1, tree attrs, int flags)
 	 because part of the initialization might happen before we enter the
 	 constructor, via AGGR_INIT_ZERO_FIRST (c++/68006). 

Re: [hsa] Add missing guard in OMP gridification

2017-10-27 Thread Jakub Jelinek
On Fri, Oct 27, 2017 at 03:19:05PM +0200, Martin Jambor wrote:
> 2017-10-10  Martin Jambor  
> 
>   * omp-grid.c (grid_attempt_target_gridification): Also insert a
>   condition whether loop should be executed at all.

Ok, thanks.

> --- a/gcc/omp-grid.c
> +++ b/gcc/omp-grid.c
> @@ -1315,6 +1315,7 @@ grid_attempt_target_gridification (gomp_target *target,
>n1 = fold_convert (itype, n1);
>n2 = fold_convert (itype, n2);
>  
> +  tree cond = fold_build2 (cond_code, boolean_type_node, n1, n2);
>tree step
>   = omp_get_for_step_from_incr (loc, gimple_omp_for_incr (inner_loop, i));
>  
> @@ -1328,6 +1329,7 @@ grid_attempt_target_gridification (gomp_target *target,
>fold_build1 (NEGATE_EXPR, itype, step));
>else
>   t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
> +  t = fold_build3 (COND_EXPR, itype, cond, t, build_zero_cst (itype));
>if (grid.tiling)
>   {
> if (cond_code == GT_EXPR)
> -- 
> 2.14.2

Jakub
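
As a rough illustration of what the inserted COND_EXPR in the patch above
does (hypothetical helper names, not the real omp-grid internals): the
computed trip count is forced to zero when the initial loop condition
fails, so the gridified kernel does not run iterations for a loop that
would not execute at all.

```c
#include <assert.h>

/* Hypothetical helper mirroring the fix: compute the trip count of
   "for (i = n1; i < n2; i += step)" (or ">" for a negative step),
   returning 0 when the guard condition fails up front.  */
static long
grid_trip_count (long n1, long n2, long step)
{
  /* The newly inserted condition: should the loop run at all?  */
  int cond = step > 0 ? n1 < n2 : n1 > n2;
  long t = step > 0
	   ? (n2 - n1 + step - 1) / step
	   : (n1 - n2 - step - 1) / -step;
  /* The added COND_EXPR: t if cond holds, zero otherwise.  */
  return cond ? t : 0;
}
```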


[12/nn] [AArch64] Add const_offset field to aarch64_address_info

2017-10-27 Thread Richard Sandiford
This patch records the integer value of the address offset in
aarch64_address_info, so that it doesn't need to be re-extracted
from the rtx.  The SVE port will make more use of this.  The patch
also uses poly_int64 routines to manipulate the offset, rather than
just handling CONST_INTs.
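
The caching pattern itself is simple; a minimal sketch (with made-up names,
not the real aarch64_address_info layout) is:

```c
#include <assert.h>

/* Sketch of the idea: decode the offset once at classification time and
   store it, so later users (such as the operand printer) can test it
   without re-extracting a value from the rtx.  Names are illustrative.  */
struct addr_info
{
  long const_offset;		/* stands in for the new poly_int64 field */
};

static struct addr_info
classify_address (long decoded_offset)
{
  struct addr_info info;
  info.const_offset = decoded_offset;	/* recorded exactly once */
  return info;
}

/* The printer-style check, analogous to "must_eq (addr.const_offset, 0)".  */
static int
offset_is_zero (const struct addr_info *info)
{
  return info->const_offset == 0;
}
```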


2017-10-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64.c (aarch64_address_info): Add a const_offset
field.
(aarch64_classify_address): Initialize it.  Track polynomial offsets.
(aarch64_print_operand_address): Use it to check for a zero offset.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-10-27 14:13:59.548121066 +0100
+++ gcc/config/aarch64/aarch64.c2017-10-27 14:14:17.047874812 +0100
@@ -113,6 +113,7 @@ struct aarch64_address_info {
   enum aarch64_address_type type;
   rtx base;
   rtx offset;
+  poly_int64 const_offset;
   int shift;
   enum aarch64_symbol_type symbol_type;
 };
@@ -4728,6 +4729,8 @@ aarch64_classify_address (struct aarch64
 {
   enum rtx_code code = GET_CODE (x);
   rtx op0, op1;
+  poly_int64 offset;
+
   HOST_WIDE_INT const_size;
 
   /* On BE, we use load/store pair for all large int mode load/stores.
@@ -4756,6 +4759,7 @@ aarch64_classify_address (struct aarch64
   info->type = ADDRESS_REG_IMM;
   info->base = x;
   info->offset = const0_rtx;
+  info->const_offset = 0;
   return aarch64_base_register_rtx_p (x, strict_p);
 
 case PLUS:
@@ -4765,24 +4769,24 @@ aarch64_classify_address (struct aarch64
   if (! strict_p
  && REG_P (op0)
  && virt_or_elim_regno_p (REGNO (op0))
- && CONST_INT_P (op1))
+ && poly_int_rtx_p (op1, &offset))
{
  info->type = ADDRESS_REG_IMM;
  info->base = op0;
  info->offset = op1;
+ info->const_offset = offset;
 
  return true;
}
 
   if (may_ne (GET_MODE_SIZE (mode), 0)
- && CONST_INT_P (op1)
- && aarch64_base_register_rtx_p (op0, strict_p))
+ && aarch64_base_register_rtx_p (op0, strict_p)
+ && poly_int_rtx_p (op1, &offset))
{
- HOST_WIDE_INT offset = INTVAL (op1);
-
  info->type = ADDRESS_REG_IMM;
  info->base = op0;
  info->offset = op1;
+ info->const_offset = offset;
 
  /* TImode and TFmode values are allowed in both pairs of X
 registers and individual Q registers.  The available
@@ -4862,13 +4866,12 @@ aarch64_classify_address (struct aarch64
   info->type = ADDRESS_REG_WB;
   info->base = XEXP (x, 0);
   if (GET_CODE (XEXP (x, 1)) == PLUS
- && CONST_INT_P (XEXP (XEXP (x, 1), 1))
+ && poly_int_rtx_p (XEXP (XEXP (x, 1), 1), &offset)
  && rtx_equal_p (XEXP (XEXP (x, 1), 0), info->base)
  && aarch64_base_register_rtx_p (info->base, strict_p))
{
- HOST_WIDE_INT offset;
  info->offset = XEXP (XEXP (x, 1), 1);
- offset = INTVAL (info->offset);
+ info->const_offset = offset;
 
  /* TImode and TFmode values are allowed in both pairs of X
 registers and individual Q registers.  The available
@@ -5899,7 +5902,7 @@ aarch64_print_operand_address (FILE *f,
 switch (addr.type)
   {
   case ADDRESS_REG_IMM:
-   if (addr.offset == const0_rtx)
+   if (must_eq (addr.const_offset, 0))
  asm_fprintf (f, "[%s]", reg_names [REGNO (addr.base)]);
else
  asm_fprintf (f, "[%s, %wd]", reg_names [REGNO (addr.base)],


[11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2

2017-10-27 Thread Richard Sandiford
This patch switches the AArch64 port to use 2 poly_int coefficients
and updates code as necessary to keep it compiling.

One potentially-significant change is to
aarch64_hard_regno_caller_save_mode.  The old implementation
was written in a pretty conservative way: it changed the default
behaviour for single-register values, but used the default handling
for multi-register values.

I don't think that's necessary, since the interesting cases for this
macro are usually the single-register ones.  Multi-register modes take
up the whole of the constituent registers and the move patterns for all
multi-register modes should be equally good.

Using the original mode for multi-register cases stops us from using
SVE modes to spill multi-register NEON values.  This was caught by
gcc.c-torture/execute/pr47538.c.

Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1.
GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles
(which are all scalars), and I think it's more obvious, since if we ever
do use this for elementwise shifts of vector modes, the mask will depend
on the number of bits in each element rather than the number of bits in
the whole vector.
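
A sketch of that distinction (illustrative values, not the real GCC hook):

```c
#include <assert.h>

/* For elementwise shifts the natural truncation mask comes from the
   element (unit) bit size, not the bit size of the whole vector: a
   128-bit vector of 32-bit elements would still mask with 31.  */
static unsigned int
shift_truncation_mask (unsigned int unit_bitsize)
{
  return unit_bitsize - 1;
}
```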


2017-10-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-modes.def (NUM_POLY_INT_COEFFS): Set to 2.
* config/aarch64/aarch64-protos.h (aarch64_initial_elimination_offset):
Return a poly_int64 rather than a HOST_WIDE_INT.
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a poly_int64
rather than a HOST_WIDE_INT.
* config/aarch64/aarch64.h (aarch64_frame): Protect with
HAVE_POLY_INT_H rather than HOST_WIDE_INT.  Change locals_offset,
hard_fp_offset, frame_size, initial_adjust, callee_offset and
final_offset from HOST_WIDE_INT to poly_int64.
* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use
to_constant when getting the number of units in an Advanced SIMD
mode.
(aarch64_builtin_vectorized_function): Check for a constant number
of units.
* config/aarch64/aarch64-simd.md (mov): Handle polynomial
GET_MODE_SIZE.
(aarch64_ld_lane): Use the nunits
attribute instead of GET_MODE_NUNITS.
* config/aarch64/aarch64.c (aarch64_hard_regno_nregs)
(aarch64_class_max_nregs): Use the constant_lowest_bound of the
GET_MODE_SIZE for fixed-size registers.
(aarch64_hard_regno_call_part_clobbered, aarch64_classify_index)
(aarch64_mode_valid_for_sched_fusion_p, aarch64_classify_address)
(aarch64_legitimize_address_displacement, aarch64_secondary_reload)
(aarch64_print_operand_address, aarch64_address_cost)
(aarch64_register_move_cost, aarch64_short_vector_p)
(aapcs_vfp_sub_candidate, aarch64_simd_attr_length_rglist)
(aarch64_operands_ok_for_ldpstp): Handle polynomial GET_MODE_SIZE.
(aarch64_hard_regno_caller_save_mode): Likewise.  Return modes
wider than SImode without modification.
(tls_symbolic_operand_type): Use strip_offset instead of split_const.
(aarch64_pass_by_reference, aarch64_layout_arg, aarch64_pad_reg_upward)
(aarch64_gimplify_va_arg_expr): Assert that we don't yet handle
passing and returning SVE modes.
(aarch64_function_value, aarch64_layout_arg): Use gen_int_mode
rather than GEN_INT.
(aarch64_emit_probe_stack_range): Take the size as a poly_int64
rather than a HOST_WIDE_INT, but call sorry if it isn't constant.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_layout_frame): Cope with polynomial offsets.
(aarch64_save_callee_saves, aarch64_restore_callee_saves): Take the
start_offset as a poly_int64 rather than a HOST_WIDE_INT.  Track
polynomial offsets.
(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p)
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a
poly_int64 rather than a HOST_WIDE_INT.
(aarch64_get_separate_components, aarch64_process_components)
(aarch64_expand_prologue, aarch64_expand_epilogue)
(aarch64_use_return_insn_p): Handle polynomial frame offsets.
(aarch64_anchor_offset): New function, split out from...
(aarch64_legitimize_address): ...here.
(aarch64_builtin_vectorization_cost): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(aarch64_simd_check_vect_par_cnst_half): Handle polynomial
GET_MODE_NUNITS.
(aarch64_simd_make_constant, aarch64_expand_vector_init): Get the
number of elements from the PARALLEL rather than the mode.
(aarch64_shift_truncation_mask): Use GET_MODE_UNIT_BITSIZE
rather than GET_MODE_BITSIZE.
(aarch64_evpc_tbl): Use nelt rather than GET_MODE_NUNITS.

[10/nn] [AArch64] Minor rtx costs tweak

2017-10-27 Thread Richard Sandiford
aarch64_rtx_costs uses the number of registers in a mode as the basis
of SET costs.  This patch makes it get the number of registers from
aarch64_hard_regno_nregs rather than repeating the calculation inline.
Handling SVE modes in aarch64_hard_regno_nregs is then enough to get
the correct SET cost as well.
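
Both forms compute the same ceiling division; a sketch of the equivalence
(hypothetical helper names, not the real hook signatures):

```c
#include <assert.h>

/* An aarch64_hard_regno_nregs-style helper: number of registers of
   REG_SIZE bytes needed to hold a MODE_SIZE-byte value.  */
static int
nregs_for_size (int mode_size, int reg_size)
{
  return (mode_size + reg_size - 1) / reg_size;
}

/* The old inline expression from aarch64_rtx_costs.  */
static int
old_inline_cost_regs (int mode_size, int reg_size)
{
  int n_minus_1 = (mode_size - 1) / reg_size;
  return n_minus_1 + 1;
}
```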


2017-10-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64.c (aarch64_rtx_costs): Use
aarch64_hard_regno_nregs to get the number of registers
in a mode.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-10-27 14:12:11.045026014 +0100
+++ gcc/config/aarch64/aarch64.c2017-10-27 14:12:14.533257115 +0100
@@ -7200,18 +7200,16 @@ aarch64_rtx_costs (rtx x, machine_mode m
  /* The cost is one per vector-register copied.  */
  if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
{
- int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
- / GET_MODE_SIZE (V4SImode);
- *cost = COSTS_N_INSNS (n_minus_1 + 1);
+ int nregs = aarch64_hard_regno_nregs (V0_REGNUM, GET_MODE (op0));
+ *cost = COSTS_N_INSNS (nregs);
}
  /* const0_rtx is in general free, but we will use an
 instruction to set a register to 0.  */
  else if (REG_P (op1) || op1 == const0_rtx)
{
  /* The cost is 1 per register copied.  */
- int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
- / UNITS_PER_WORD;
- *cost = COSTS_N_INSNS (n_minus_1 + 1);
+ int nregs = aarch64_hard_regno_nregs (R0_REGNUM, GET_MODE (op0));
+ *cost = COSTS_N_INSNS (nregs);
}
   else
/* Cost is just the cost of the RHS of the set.  */


[09/nn] [AArch64] Pass number of units to aarch64_expand_vec_perm(_const)

2017-10-27 Thread Richard Sandiford
This patch passes the number of units to aarch64_expand_vec_perm
and aarch64_expand_vec_perm_const, which avoids a to_constant ()
once GET_MODE_NUNITS is variable.
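
The operand-classification loop that now receives NELT as a parameter can
be sketched like this (a standalone model of the loop in
aarch64_expand_vec_perm_const, not the real function):

```c
#include <assert.h>

/* Each selector element indexes one of the 2*nelt lanes of {op0, op1};
   "which" records whether op0 (bit 0), op1 (bit 1) or both are used.  */
static unsigned int
classify_perm (const unsigned int *sel, unsigned int nelt)
{
  unsigned int which = 0;
  for (unsigned int i = 0; i < nelt; i++)
    {
      unsigned int ei = sel[i] & (2 * nelt - 1);
      which |= (ei < nelt ? 1 : 2);
    }
  return which;
}
```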


2017-10-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm)
(aarch64_expand_vec_perm_const): Take the number of units too.
* config/aarch64/aarch64.c (aarch64_expand_vec_perm)
(aarch64_expand_vec_perm_const): Likewise.
* config/aarch64/aarch64-simd.md (vec_perm_const)
(vec_perm): Update accordingly.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:07.203885483 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:11.042239887 +0100
@@ -484,11 +484,11 @@ tree aarch64_builtin_rsqrt (unsigned int
 tree aarch64_builtin_vectorized_function (unsigned int, tree, tree);
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
-extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
+extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int);
 extern bool aarch64_madd_needs_nop (rtx_insn *);
 extern void aarch64_final_prescan_insn (rtx_insn *);
 extern bool
-aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
+aarch64_expand_vec_perm_const (rtx, rtx, rtx, rtx, unsigned int);
 void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
 int aarch64_ccmp_mode_to_code (machine_mode mode);
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-10-27 14:12:07.205742901 +0100
+++ gcc/config/aarch64/aarch64.c2017-10-27 14:12:11.045026014 +0100
@@ -13488,11 +13488,14 @@ aarch64_expand_vec_perm_1 (rtx target, r
 }
 }
 
+/* Expand a vec_perm with the operands given by TARGET, OP0, OP1 and SEL.
+   NELT is the number of elements in the vector.  */
+
 void
-aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
+aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel,
+unsigned int nelt)
 {
   machine_mode vmode = GET_MODE (target);
-  unsigned int nelt = GET_MODE_NUNITS (vmode);
   bool one_vector_p = rtx_equal_p (op0, op1);
   rtx mask;
 
@@ -13848,13 +13851,15 @@ aarch64_expand_vec_perm_const_1 (struct
   return false;
 }
 
-/* Expand a vec_perm_const pattern.  */
+/* Expand a vec_perm_const pattern with the operands given by TARGET,
+   OP0, OP1 and SEL.  NELT is the number of elements in the vector.  */
 
 bool
-aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel)
+aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel,
+  unsigned int nelt)
 {
   struct expand_vec_perm_d d;
-  int i, nelt, which;
+  unsigned int i, which;
 
   d.target = target;
   d.op0 = op0;
@@ -13864,12 +13869,11 @@ aarch64_expand_vec_perm_const (rtx targe
   gcc_assert (VECTOR_MODE_P (d.vmode));
   d.testing_p = false;
 
-  nelt = GET_MODE_NUNITS (d.vmode);
   d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
 {
   rtx e = XVECEXP (sel, 0, i);
-  int ei = INTVAL (e) & (2 * nelt - 1);
+  unsigned int ei = INTVAL (e) & (2 * nelt - 1);
   which |= (ei < nelt ? 1 : 2);
   d.perm.quick_push (ei);
 }
Index: gcc/config/aarch64/aarch64-simd.md
===
--- gcc/config/aarch64/aarch64-simd.md  2017-10-27 14:12:07.203885483 +0100
+++ gcc/config/aarch64/aarch64-simd.md  2017-10-27 14:12:11.043168596 +0100
@@ -5238,7 +5238,7 @@ (define_expand "vec_perm_const"
   "TARGET_SIMD"
 {
   if (aarch64_expand_vec_perm_const (operands[0], operands[1],
-operands[2], operands[3]))
+operands[2], operands[3], <nunits>))
 DONE;
   else
 FAIL;
@@ -5252,7 +5252,7 @@ (define_expand "vec_perm"
   "TARGET_SIMD"
 {
   aarch64_expand_vec_perm (operands[0], operands[1],
-  operands[2], operands[3]);
+  operands[2], operands[3], <nunits>);
   DONE;
 })
 


[08/nn] [AArch64] Pass number of units to aarch64_simd_vect_par_cnst_half

2017-10-27 Thread Richard Sandiford
This patch passes the number of units to aarch64_simd_vect_par_cnst_half,
which avoids a to_constant () once GET_MODE_NUNITS is variable.


2017-10-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_simd_vect_par_cnst_half):
Take the number of units too.
* config/aarch64/aarch64.c (aarch64_simd_vect_par_cnst_half): Likewise.
(aarch64_simd_check_vect_par_cnst_half): Update call accordingly,
but check for a vector mode before rather than after the call.
* config/aarch64/aarch64-simd.md (aarch64_split_simd_mov)
(move_hi_quad_, vec_unpack_hi_)
(vec_unpack_lo_

[07/nn] [AArch64] Pass number of units to aarch64_reverse_mask

2017-10-27 Thread Richard Sandiford
This patch passes the number of units to aarch64_reverse_mask,
which avoids a to_constant () once GET_MODE_NUNITS is variable.
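
The mask the function builds can be modelled in isolation (a sketch of the
loop body in aarch64_reverse_mask, using plain arrays instead of rtvecs):

```c
#include <assert.h>

/* Build a 16-byte permute mask that reverses the order of NUNITS units
   of USIZE bytes each, keeping the byte order within each unit; this is
   what the big-endian lane-reversal after multi-register loads needs.  */
static void
reverse_mask (unsigned char mask[16], unsigned int nunits,
	      unsigned int usize)
{
  for (unsigned int i = 0; i < nunits; i++)
    for (unsigned int j = 0; j < usize; j++)
      mask[i * usize + j] = (nunits - i - 1) * usize + j;
}
```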


2017-10-26  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_reverse_mask): Take
the number of units too.
* config/aarch64/aarch64.c (aarch64_reverse_mask): Likewise.
* config/aarch64/aarch64-simd.md (vec_load_lanesoi)
(vec_store_lanesoi, vec_load_lanesci)
(vec_store_lanesci, vec_load_lanesxi)
(vec_store_lanesxi): Update accordingly.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:00.601693018 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:04.192082112 +0100
@@ -365,7 +365,7 @@ bool aarch64_mask_and_shift_for_ubfiz_p
 bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
 bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
 bool aarch64_mov_operand_p (rtx, machine_mode);
-rtx aarch64_reverse_mask (machine_mode);
+rtx aarch64_reverse_mask (machine_mode, unsigned int);
 bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
 char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
 char *aarch64_output_simd_mov_immediate (rtx, unsigned,
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-10-27 14:12:00.603550436 +0100
+++ gcc/config/aarch64/aarch64.c2017-10-27 14:12:04.193939530 +0100
@@ -13945,16 +13945,18 @@ aarch64_vectorize_vec_perm_const_ok (mac
   return ret;
 }
 
+/* Generate a byte permute mask for a register of mode MODE,
+   which has NUNITS units.  */
+
 rtx
-aarch64_reverse_mask (machine_mode mode)
+aarch64_reverse_mask (machine_mode mode, unsigned int nunits)
 {
   /* We have to reverse each vector because we dont have
  a permuted load that can reverse-load according to ABI rules.  */
   rtx mask;
   rtvec v = rtvec_alloc (16);
-  int i, j;
-  int nunits = GET_MODE_NUNITS (mode);
-  int usize = GET_MODE_UNIT_SIZE (mode);
+  unsigned int i, j;
+  unsigned int usize = GET_MODE_UNIT_SIZE (mode);
 
   gcc_assert (BYTES_BIG_ENDIAN);
   gcc_assert (AARCH64_VALID_SIMD_QREG_MODE (mode));
Index: gcc/config/aarch64/aarch64-simd.md
===
--- gcc/config/aarch64/aarch64-simd.md  2017-10-27 14:12:00.602621727 +0100
+++ gcc/config/aarch64/aarch64-simd.md  2017-10-27 14:12:04.193010821 +0100
@@ -4632,7 +4632,7 @@ (define_expand "vec_load_lanesoi"
   if (BYTES_BIG_ENDIAN)
 {
   rtx tmp = gen_reg_rtx (OImode);
-  rtx mask = aarch64_reverse_mask (<MODE>mode);
+  rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
   emit_insn (gen_aarch64_simd_ld2 (tmp, operands[1]));
   emit_insn (gen_aarch64_rev_reglistoi (operands[0], tmp, mask));
 }
@@ -4676,7 +4676,7 @@ (define_expand "vec_store_lanesoi"
   if (BYTES_BIG_ENDIAN)
 {
   rtx tmp = gen_reg_rtx (OImode);
-  rtx mask = aarch64_reverse_mask (<MODE>mode);
+  rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
   emit_insn (gen_aarch64_rev_reglistoi (tmp, operands[1], mask));
   emit_insn (gen_aarch64_simd_st2 (operands[0], tmp));
 }
@@ -4730,7 +4730,7 @@ (define_expand "vec_load_lanesci"
   if (BYTES_BIG_ENDIAN)
 {
   rtx tmp = gen_reg_rtx (CImode);
-  rtx mask = aarch64_reverse_mask (<MODE>mode);
+  rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
   emit_insn (gen_aarch64_simd_ld3 (tmp, operands[1]));
   emit_insn (gen_aarch64_rev_reglistci (operands[0], tmp, mask));
 }
@@ -4774,7 +4774,7 @@ (define_expand "vec_store_lanesci"
   if (BYTES_BIG_ENDIAN)
 {
   rtx tmp = gen_reg_rtx (CImode);
-  rtx mask = aarch64_reverse_mask (<MODE>mode);
+  rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
   emit_insn (gen_aarch64_rev_reglistci (tmp, operands[1], mask));
   emit_insn (gen_aarch64_simd_st3 (operands[0], tmp));
 }
@@ -4828,7 +4828,7 @@ (define_expand "vec_load_lanesxi"
   if (BYTES_BIG_ENDIAN)
 {
   rtx tmp = gen_reg_rtx (XImode);
-  rtx mask = aarch64_reverse_mask (<MODE>mode);
+  rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
   emit_insn (gen_aarch64_simd_ld4 (tmp, operands[1]));
   emit_insn (gen_aarch64_rev_reglistxi (operands[0], tmp, mask));
 }
@@ -4872,7 +4872,7 @@ (define_expand "vec_store_lanesxi"
   if (BYTES_BIG_ENDIAN)
 {
   rtx tmp = gen_reg_rtx (XImode);
-  rtx mask = aarch64_reverse_mask (<MODE>mode);
+  rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
   emit_insn (gen_aarch64_rev_reglistxi (tmp, operands[1], mask));
   emit_insn (gen_aarch64_simd_st4 (operands[0], tmp));
 }


[06/nn] [AArch64] Add an endian_lane_rtx helper routine

2017-10-27 Thread Richard Sandiford
Later patches turn the number of vector units into a poly_int.
We deliberately don't support applying GEN_INT to those (except
in target code that doesn't distinguish between poly_ints and normal
constants); gen_int_mode needs to be used instead.

This patch therefore replaces instances of:

  GEN_INT (ENDIAN_LANE_N (builtin_mode, INTVAL (op[opc])))

with uses of a new endian_lane_rtx function.
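
The underlying lane correction is easy to state on its own (a sketch of
ENDIAN_LANE_N after the patch, with big-endianness passed as a parameter
instead of the BYTES_BIG_ENDIAN macro):

```c
#include <assert.h>

/* Map architectural lane N of a vector with NUNITS units to its
   GCC-internal lane number: big-endian targets number lanes from the
   other end.  Taking NUNITS directly means a variable GET_MODE_NUNITS
   never has to be forced to a constant at the call site.  */
static unsigned int
endian_lane_n (int big_endian, unsigned int nunits, unsigned int n)
{
  return big_endian ? nunits - 1 - n : n;
}
```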


2017-10-26  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_endian_lane_rtx): Declare.
* config/aarch64/aarch64.c (aarch64_endian_lane_rtx): New function.
* config/aarch64/aarch64.h (ENDIAN_LANE_N): Take the number
of units rather than the mode.
* config/aarch64/iterators.md (nunits): New mode attribute.
* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args):
Use aarch64_endian_lane_rtx instead of GEN_INT (ENDIAN_LANE_N ...).
* config/aarch64/aarch64-simd.md (aarch64_dup_lane)
(aarch64_dup_lane_, *aarch64_mul3_elt)
(*aarch64_mul3_elt_): Likewise.
(*aarch64_mul3_elt_to_64v2df, *aarch64_mla_elt): Likewise.
(*aarch64_mla_elt_, *aarch64_mls_elt)
(*aarch64_mls_elt_, *aarch64_fma4_elt)
(*aarch64_fma4_elt_):: Likewise.
(*aarch64_fma4_elt_to_64v2df, *aarch64_fnma4_elt): Likewise.
(*aarch64_fnma4_elt_): Likewise.
(*aarch64_fnma4_elt_to_64v2df, reduc_plus_scal_): Likewise.
(reduc_plus_scal_v4sf, reduc__scal_): Likewise.
(reduc__scal_): Likewise.
(*aarch64_get_lane_extend): Likewise.
(*aarch64_get_lane_zero_extendsi): Likewise.
(aarch64_get_lane, *aarch64_mulx_elt_)
(*aarch64_mulx_elt, *aarch64_vgetfmulx): Likewise.
(aarch64_sqdmulh_lane, aarch64_sqdmulh_laneq)
(aarch64_sqrdmlh_lane): Likewise.
(aarch64_sqrdmlh_laneq): Likewise.
(aarch64_sqdmll_lane): Likewise.
(aarch64_sqdmll_laneq): Likewise.
(aarch64_sqdmll2_lane_internal): Likewise.
(aarch64_sqdmll2_laneq_internal): Likewise.
(aarch64_sqdmull_lane, aarch64_sqdmull_laneq): Likewise.
(aarch64_sqdmull2_lane_internal): Likewise.
(aarch64_sqdmull2_laneq_internal): Likewise.
(aarch64_vec_load_lanesoi_lane): Likewise.
(aarch64_vec_store_lanesoi_lane): Likewise.
(aarch64_vec_load_lanesci_lane): Likewise.
(aarch64_vec_store_lanesci_lane): Likewise.
(aarch64_vec_load_lanesxi_lane): Likewise.
(aarch64_vec_store_lanesxi_lane): Likewise.
(aarch64_simd_vec_set): Update use of ENDIAN_LANE_N.
(aarch64_simd_vec_setv2di): Likewise.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:11:56.993658452 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:00.601693018 +0100
@@ -437,6 +437,7 @@ void aarch64_simd_emit_reg_reg_move (rtx
 rtx aarch64_simd_expand_builtin (int, tree, rtx);
 
 void aarch64_simd_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree);
+rtx aarch64_endian_lane_rtx (machine_mode, unsigned int);
 
 void aarch64_split_128bit_move (rtx, rtx);
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-10-27 14:11:56.995515870 +0100
+++ gcc/config/aarch64/aarch64.c2017-10-27 14:12:00.603550436 +0100
@@ -12083,6 +12083,15 @@ aarch64_simd_lane_bounds (rtx operand, H
   }
 }
 
+/* Perform endian correction on lane number N, which indexes a vector
+   of mode MODE, and return the result as an SImode rtx.  */
+
+rtx
+aarch64_endian_lane_rtx (machine_mode mode, unsigned int n)
+{
+  return gen_int_mode (ENDIAN_LANE_N (GET_MODE_NUNITS (mode), n), SImode);
+}
+
 /* Return TRUE if OP is a valid vector addressing mode.  */
 bool
 aarch64_simd_mem_operand_p (rtx op)
Index: gcc/config/aarch64/aarch64.h
===
--- gcc/config/aarch64/aarch64.h2017-10-27 14:05:38.132936808 +0100
+++ gcc/config/aarch64/aarch64.h2017-10-27 14:12:00.603550436 +0100
@@ -910,8 +910,8 @@ #define AARCH64_VALID_SIMD_QREG_MODE(MOD
|| (MODE) == V4SFmode || (MODE) == V8HFmode || (MODE) == V2DImode \
|| (MODE) == V2DFmode)
 
-#define ENDIAN_LANE_N(mode, n)  \
-  (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
+#define ENDIAN_LANE_N(NUNITS, N) \
+  (BYTES_BIG_ENDIAN ? NUNITS - 1 - N : N)
 
 /* Support for a configure-time default CPU, etc.  We currently support
--with-arch and --with-cpu.  Both are ignored if either is specified
Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2017-10-27 14:11:56.995515870 +0100
+++ 

[05/nn] [AArch64] Rewrite aarch64_simd_valid_immediate

2017-10-27 Thread Richard Sandiford
This patch reworks aarch64_simd_valid_immediate so that
it's easier to add SVE support.  The main changes are:

- make simd_immediate_info easier to construct
- replace the while (1) { ... break; } blocks with checks that use
  the full 64-bit value of the constant
- treat floating-point modes as integers if they aren't valid
  as floating-point values


2017-10-26  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_output_simd_mov_immediate):
Remove the mode argument.
(aarch64_simd_valid_immediate): Remove the mode and inverse
arguments.
* config/aarch64/iterators.md (bitsize): New iterator.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov, and3)
(ior3): Update calls to aarch64_output_simd_mov_immediate.
* config/aarch64/constraints.md (Do, Db, Dn): Update calls to
aarch64_simd_valid_immediate.
* config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Likewise.
(aarch64_reg_or_bic_imm): Likewise.
* config/aarch64/aarch64.c (simd_immediate_info): Replace mvn
with an insn_type enum and msl with a modifier_type enum.
Replace element_width with a scalar_mode.  Change the shift
to unsigned int.  Add constructors for scalar_float_mode and
scalar_int_mode elements.
(aarch64_vect_float_const_representable_p): Delete.
(aarch64_can_const_movi_rtx_p, aarch64_legitimate_constant_p)
(aarch64_simd_scalar_immediate_valid_for_move)
(aarch64_simd_make_constant): Update call to
aarch64_simd_valid_immediate.
(aarch64_advsimd_valid_immediate_hs): New function.
(aarch64_advsimd_valid_immediate): Likewise.
(aarch64_simd_valid_immediate): Remove mode and inverse
arguments.  Rewrite to use the above.  Use const_vec_duplicate_p
to detect duplicated constants and use aarch64_float_const_zero_rtx_p
and aarch64_float_const_representable_p on the result.
(aarch64_output_simd_mov_immediate): Remove mode argument.
Update call to aarch64_simd_valid_immediate and use of
simd_immediate_info.
(aarch64_output_scalar_simd_mov_immediate): Update call
accordingly.

gcc/testsuite/
* gcc.target/aarch64/vect-movi.c (movi_float_lsl24): New function.
(main): Call it.

Index: gcc/config/aarch64/aarch64-protos.h
===
*** gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:06:16.157803281 +0100
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:26:40.949165813 +0100
*** bool aarch64_mov_operand_p (rtx, machine
*** 368,374 
  rtx aarch64_reverse_mask (machine_mode);
  bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
  char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
! char *aarch64_output_simd_mov_immediate (rtx, machine_mode, unsigned,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
  bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
  bool aarch64_regno_ok_for_base_p (int, bool);
--- 368,374 
  rtx aarch64_reverse_mask (machine_mode);
  bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
  char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
! char *aarch64_output_simd_mov_immediate (rtx, unsigned,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
  bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
  bool aarch64_regno_ok_for_base_p (int, bool);
*** bool aarch64_simd_check_vect_par_cnst_ha
*** 379,386 
  bool aarch64_simd_imm_zero_p (rtx, machine_mode);
  bool aarch64_simd_scalar_immediate_valid_for_move (rtx, scalar_int_mode);
  bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
! bool aarch64_simd_valid_immediate (rtx, machine_mode, bool,
!   struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
  bool aarch64_split_dimode_const_store (rtx, rtx);
  bool aarch64_symbolic_address_p (rtx);
--- 379,385 
  bool aarch64_simd_imm_zero_p (rtx, machine_mode);
  bool aarch64_simd_scalar_immediate_valid_for_move (rtx, scalar_int_mode);
  bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
! bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
  bool aarch64_split_dimode_const_store (rtx, rtx);
  bool aarch64_symbolic_address_p (rtx);
Index: gcc/config/aarch64/iterators.md
===
*** gcc/config/aarch64/iterators.md 2017-10-27 14:05:38.185854661 +0100
--- gcc/config/aarch64/iterators.md 2017-10-27 14:26:40.949165813 +0100
*** 

[04/nn] [AArch64] Rename the internal "Upl" constraint

2017-10-27 Thread Richard Sandiford
The SVE port uses the public constraints "Upl" and "Upa" to mean
"low predicate register" and "any predicate register" respectively.
"Upl" was already used as an internal-only constraint by the
addition patterns, so this patch renames it to "Uaa" ("two adds
needed").


2017-10-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/constraints.md (Upl): Rename to...
(Uaa): ...this.
* config/aarch64/aarch64.md
(*zero_extend2_aarch64, *addsi3_aarch64_uxtw):
Update accordingly.

Index: gcc/config/aarch64/constraints.md
===
--- gcc/config/aarch64/constraints.md   2017-10-27 14:06:16.159815485 +0100
+++ gcc/config/aarch64/constraints.md   2017-10-27 14:11:54.071011147 +0100
@@ -35,7 +35,7 @@ (define_constraint "I"
  (and (match_code "const_int")
   (match_test "aarch64_uimm12_shift (ival)")))
 
-(define_constraint "Upl"
+(define_constraint "Uaa"
   "@internal A constant that matches two uses of add instructions."
   (and (match_code "const_int")
(match_test "aarch64_pluslong_strict_immedate (op, VOIDmode)")))
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   2017-10-27 14:07:01.875769946 +0100
+++ gcc/config/aarch64/aarch64.md   2017-10-27 14:11:54.071011147 +0100
@@ -1562,7 +1562,7 @@ (define_insn "*add3_aarch64"
 (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r")
 (plus:GPI
  (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk")
- (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Upl")))]
+ (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa")))]
   ""
   "@
   add\\t%0, %1, %2
@@ -1580,7 +1580,7 @@ (define_insn "*addsi3_aarch64_uxtw"
 (match_operand:DI 0 "register_operand" "=rk,rk,rk,r")
 (zero_extend:DI
  (plus:SI (match_operand:SI 1 "register_operand" "%rk,rk,rk,rk")
-  (match_operand:SI 2 "aarch64_pluslong_operand" "I,r,J,Upl"]
+ (match_operand:SI 2 "aarch64_pluslong_operand" "I,r,J,Uaa"]
   ""
   "@
   add\\t%w0, %w1, %2


[03/nn] [AArch64] Rework interface to add constant/offset routines

2017-10-27 Thread Richard Sandiford
The port had aarch64_add_offset and aarch64_add_constant routines
that did similar things.  This patch replaces them with an expanded
version of aarch64_add_offset that takes separate source and
destination registers.  The new routine also takes a poly_int64 offset
instead of a HOST_WIDE_INT offset, but it leaves the HOST_WIDE_INT
case to aarch64_add_offset_1, which is basically a repurposed
aarch64_add_constant_internal.  The SVE patch will put the handling
of VL-based constants in aarch64_add_offset, while still using
aarch64_add_offset_1 for the constant part.

The vcall_offset == 0 path in aarch64_output_mi_thunk will use temp0
as well as temp1 once SVE is added.

A side-effect of the patch is that we now generate:

mov x29, sp

instead of:

add x29, sp, 0

in the pr70044.c test.


2017-10-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64.c (aarch64_force_temporary): Assert that
x exists before using it.
(aarch64_add_constant_internal): Rename to...
(aarch64_add_offset_1): ...this.  Replace regnum with separate
src and dest rtxes.  Handle the case in which they're different,
including when the offset is zero.  Replace scratchreg with an rtx.
Use 2 additions if there is no spare register into which we can
move a 16-bit constant.
(aarch64_add_constant): Delete.
(aarch64_add_offset): Replace reg with separate src and dest
rtxes.  Take a poly_int64 offset instead of a HOST_WIDE_INT.
Use aarch64_add_offset_1.
(aarch64_add_sp, aarch64_sub_sp): Take the scratch register as
an rtx rather than an int.  Take the delta as a poly_int64
rather than a HOST_WIDE_INT.  Use aarch64_add_offset.
(aarch64_expand_mov_immediate): Update uses of aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Take the scratch register
as an rtx rather than an int.  Use Pmode rather than word_mode
in the loop code.  Update calls to aarch64_sub_sp.
(aarch64_expand_prologue): Update calls to aarch64_sub_sp,
aarch64_allocate_and_probe_stack_space and aarch64_add_offset.
(aarch64_expand_epilogue): Update calls to aarch64_add_offset
and aarch64_add_sp.
(aarch64_output_mi_thunk): Use aarch64_add_offset rather than
aarch64_add_constant.

gcc/testsuite/
* gcc.target/aarch64/pr70044.c: Allow "mov x29, sp" too.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-10-27 14:10:17.740863052 +0100
+++ gcc/config/aarch64/aarch64.c2017-10-27 14:11:14.425034427 +0100
@@ -1818,30 +1818,13 @@ aarch64_force_temporary (machine_mode mo
 return force_reg (mode, value);
   else
 {
-  x = aarch64_emit_move (x, value);
+  gcc_assert (x);
+  aarch64_emit_move (x, value);
   return x;
 }
 }
 
 
-static rtx
-aarch64_add_offset (scalar_int_mode mode, rtx temp, rtx reg,
-   HOST_WIDE_INT offset)
-{
-  if (!aarch64_plus_immediate (GEN_INT (offset), mode))
-{
-  rtx high;
-  /* Load the full offset into a register.  This
- might be improvable in the future.  */
-  high = GEN_INT (offset);
-  offset = 0;
-  high = aarch64_force_temporary (mode, temp, high);
-  reg = aarch64_force_temporary (mode, temp,
-gen_rtx_PLUS (mode, high, reg));
-}
-  return plus_constant (mode, reg, offset);
-}
-
 static int
 aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
scalar_int_mode mode)
@@ -1966,86 +1949,123 @@ aarch64_internal_mov_immediate (rtx dest
   return num_insns;
 }
 
-/* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to hold a
-   temporary value if necessary.  FRAME_RELATED_P should be true if
-   the RTX_FRAME_RELATED flag should be set and CFA adjustments added
-   to the generated instructions.  If SCRATCHREG is known to hold
-   abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
-   immediate again.
-
-   Since this function may be used to adjust the stack pointer, we must
-   ensure that it cannot cause transient stack deallocation (for example
-   by first incrementing SP and then decrementing when adjusting by a
-   large immediate).  */
+/* A subroutine of aarch64_add_offset that handles the case in which
+   OFFSET is known at compile time.  The arguments are otherwise the same.  */
 
 static void
-aarch64_add_constant_internal (scalar_int_mode mode, int regnum,
-  int scratchreg, HOST_WIDE_INT delta,
-  bool frame_related_p, bool emit_move_imm)
+aarch64_add_offset_1 (scalar_int_mode mode, rtx dest,
+ rtx src, HOST_WIDE_INT 

[02/nn] [AArch64] Move code around

2017-10-27 Thread Richard Sandiford
This patch simply moves code around, in order to make the later
patches easier to read, and to avoid forward declarations.
It doesn't add the missing function comments because the interfaces
will change in a later patch.


2017-10-26  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64.c (aarch64_add_constant_internal)
(aarch64_add_constant, aarch64_add_sp, aarch64_sub_sp): Move
earlier in file.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-10-27 14:10:14.622293803 +0100
+++ gcc/config/aarch64/aarch64.c2017-10-27 14:10:17.740863052 +0100
@@ -1966,6 +1966,87 @@ aarch64_internal_mov_immediate (rtx dest
   return num_insns;
 }
 
+/* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to hold a
+   temporary value if necessary.  FRAME_RELATED_P should be true if
+   the RTX_FRAME_RELATED flag should be set and CFA adjustments added
+   to the generated instructions.  If SCRATCHREG is known to hold
+   abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
+   immediate again.
+
+   Since this function may be used to adjust the stack pointer, we must
+   ensure that it cannot cause transient stack deallocation (for example
+   by first incrementing SP and then decrementing when adjusting by a
+   large immediate).  */
+
+static void
+aarch64_add_constant_internal (scalar_int_mode mode, int regnum,
+  int scratchreg, HOST_WIDE_INT delta,
+  bool frame_related_p, bool emit_move_imm)
+{
+  HOST_WIDE_INT mdelta = abs_hwi (delta);
+  rtx this_rtx = gen_rtx_REG (mode, regnum);
+  rtx_insn *insn;
+
+  if (!mdelta)
+return;
+
+  /* Single instruction adjustment.  */
+  if (aarch64_uimm12_shift (mdelta))
+{
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  return;
+}
+
+  /* Emit 2 additions/subtractions if the adjustment is less than 24 bits.
+ Only do this if mdelta is not a 16-bit move as adjusting using a move
+ is better.  */
+  if (mdelta < 0x1000000 && !aarch64_move_imm (mdelta, mode))
+{
+  HOST_WIDE_INT low_off = mdelta & 0xfff;
+
+  low_off = delta < 0 ? -low_off : low_off;
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  return;
+}
+
+  /* Emit a move immediate if required and an addition/subtraction.  */
+  rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
+  if (emit_move_imm)
+aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
+  insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
+ : gen_add2_insn (this_rtx, scratch_rtx));
+  if (frame_related_p)
+{
+  RTX_FRAME_RELATED_P (insn) = frame_related_p;
+  rtx adj = plus_constant (mode, this_rtx, delta);
+  add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
+}
+}
+
+static inline void
+aarch64_add_constant (scalar_int_mode mode, int regnum, int scratchreg,
+ HOST_WIDE_INT delta)
+{
+  aarch64_add_constant_internal (mode, regnum, scratchreg, delta, false, true);
+}
+
+static inline void
+aarch64_add_sp (int scratchreg, HOST_WIDE_INT delta, bool emit_move_imm)
+{
+  aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, delta,
+true, emit_move_imm);
+}
+
+static inline void
+aarch64_sub_sp (int scratchreg, HOST_WIDE_INT delta, bool frame_related_p)
+{
+  aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, -delta,
+frame_related_p, true);
+}
 
 void
 aarch64_expand_mov_immediate (rtx dest, rtx imm)
@@ -2077,88 +2158,6 @@ aarch64_expand_mov_immediate (rtx dest,
  as_a <scalar_int_mode> (mode));
 }
 
-/* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to hold a
-   temporary value if necessary.  FRAME_RELATED_P should be true if
-   the RTX_FRAME_RELATED flag should be set and CFA adjustments added
-   to the generated instructions.  If SCRATCHREG is known to hold
-   abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
-   immediate again.
-
-   Since this function may be used to adjust the stack pointer, we must
-   ensure that it cannot cause transient stack deallocation (for example
-   by first incrementing SP and then decrementing when adjusting by a
-   large immediate).  */
-
-static void
-aarch64_add_constant_internal (scalar_int_mode mode, int regnum,
-  int scratchreg, HOST_WIDE_INT delta,
-  

[01/nn] [AArch64] Generate permute patterns using rtx builders

2017-10-27 Thread Richard Sandiford
This patch replaces switch statements that call specific generator
functions with code that constructs the rtl pattern directly.
This seemed to scale better to SVE and also seems less error-prone.

As a side-effect, the patch fixes the REV handling for diff==1,
vmode==E_V4HFmode and adds missing support for diff==3,
vmode==E_V4HFmode.

To compensate for the lack of switches that check for specific modes,
the patch makes aarch64_expand_vec_perm_const_1 reject permutes on
single-element vectors (specifically V1DImode).


2017-10-27  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* config/aarch64/aarch64.c (aarch64_evpc_trn, aarch64_evpc_uzp)
(aarch64_evpc_zip, aarch64_evpc_ext, aarch64_evpc_rev)
(aarch64_evpc_dup): Generate rtl directly, rather than using
named expanders.
(aarch64_expand_vec_perm_const_1): Explicitly check for permutes
of a single element.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-10-27 14:10:08.337833963 +0100
+++ gcc/config/aarch64/aarch64.c2017-10-27 14:10:14.622293803 +0100
@@ -13475,7 +13475,6 @@ aarch64_evpc_trn (struct expand_vec_perm
 {
   unsigned int i, odd, mask, nelt = d->perm.length ();
   rtx out, in0, in1, x;
-  rtx (*gen) (rtx, rtx, rtx);
   machine_mode vmode = d->vmode;
 
   if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13512,48 +13511,8 @@ aarch64_evpc_trn (struct expand_vec_perm
 }
   out = d->target;
 
-  if (odd)
-{
-  switch (vmode)
-   {
-   case E_V16QImode: gen = gen_aarch64_trn2v16qi; break;
-   case E_V8QImode: gen = gen_aarch64_trn2v8qi; break;
-   case E_V8HImode: gen = gen_aarch64_trn2v8hi; break;
-   case E_V4HImode: gen = gen_aarch64_trn2v4hi; break;
-   case E_V4SImode: gen = gen_aarch64_trn2v4si; break;
-   case E_V2SImode: gen = gen_aarch64_trn2v2si; break;
-   case E_V2DImode: gen = gen_aarch64_trn2v2di; break;
-   case E_V4HFmode: gen = gen_aarch64_trn2v4hf; break;
-   case E_V8HFmode: gen = gen_aarch64_trn2v8hf; break;
-   case E_V4SFmode: gen = gen_aarch64_trn2v4sf; break;
-   case E_V2SFmode: gen = gen_aarch64_trn2v2sf; break;
-   case E_V2DFmode: gen = gen_aarch64_trn2v2df; break;
-   default:
- return false;
-   }
-}
-  else
-{
-  switch (vmode)
-   {
-   case E_V16QImode: gen = gen_aarch64_trn1v16qi; break;
-   case E_V8QImode: gen = gen_aarch64_trn1v8qi; break;
-   case E_V8HImode: gen = gen_aarch64_trn1v8hi; break;
-   case E_V4HImode: gen = gen_aarch64_trn1v4hi; break;
-   case E_V4SImode: gen = gen_aarch64_trn1v4si; break;
-   case E_V2SImode: gen = gen_aarch64_trn1v2si; break;
-   case E_V2DImode: gen = gen_aarch64_trn1v2di; break;
-   case E_V4HFmode: gen = gen_aarch64_trn1v4hf; break;
-   case E_V8HFmode: gen = gen_aarch64_trn1v8hf; break;
-   case E_V4SFmode: gen = gen_aarch64_trn1v4sf; break;
-   case E_V2SFmode: gen = gen_aarch64_trn1v2sf; break;
-   case E_V2DFmode: gen = gen_aarch64_trn1v2df; break;
-   default:
- return false;
-   }
-}
-
-  emit_insn (gen (out, in0, in1));
+  emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+ odd ? UNSPEC_TRN2 : UNSPEC_TRN1));
   return true;
 }
 
@@ -13563,7 +13522,6 @@ aarch64_evpc_uzp (struct expand_vec_perm
 {
   unsigned int i, odd, mask, nelt = d->perm.length ();
   rtx out, in0, in1, x;
-  rtx (*gen) (rtx, rtx, rtx);
   machine_mode vmode = d->vmode;
 
   if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13599,48 +13557,8 @@ aarch64_evpc_uzp (struct expand_vec_perm
 }
   out = d->target;
 
-  if (odd)
-{
-  switch (vmode)
-   {
-   case E_V16QImode: gen = gen_aarch64_uzp2v16qi; break;
-   case E_V8QImode: gen = gen_aarch64_uzp2v8qi; break;
-   case E_V8HImode: gen = gen_aarch64_uzp2v8hi; break;
-   case E_V4HImode: gen = gen_aarch64_uzp2v4hi; break;
-   case E_V4SImode: gen = gen_aarch64_uzp2v4si; break;
-   case E_V2SImode: gen = gen_aarch64_uzp2v2si; break;
-   case E_V2DImode: gen = gen_aarch64_uzp2v2di; break;
-   case E_V4HFmode: gen = gen_aarch64_uzp2v4hf; break;
-   case E_V8HFmode: gen = gen_aarch64_uzp2v8hf; break;
-   case E_V4SFmode: gen = gen_aarch64_uzp2v4sf; break;
-   case E_V2SFmode: gen = gen_aarch64_uzp2v2sf; break;
-   case E_V2DFmode: gen = gen_aarch64_uzp2v2df; break;
-   default:
- return false;
-   }
-}
-  else
-{
-  switch (vmode)
-   {
-   case E_V16QImode: gen = gen_aarch64_uzp1v16qi; break;
-   case E_V8QImode: gen = gen_aarch64_uzp1v8qi; break;
-   case E_V8HImode: gen = gen_aarch64_uzp1v8hi; break;
-   case E_V4HImode: gen = gen_aarch64_uzp1v4hi; break;
-   case E_V4SImode: gen = 

[00/nn] AArch64 patches preparing for SVE

2017-10-27 Thread Richard Sandiford
This series of patches are the AArch64 changes needed before SVE
support goes in.  It's based on top of:

  https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01163.html

and Jeff's latest stack-clash protection changes.

Series tested on aarch64-linux-gnu.

Richard


[hsa] Add missing guard in OMP gridification

2017-10-27 Thread Martin Jambor
Hi,

rather embarrassingly, I found out that there is a missing condition to
make sure that HSA grid size is zero when the OpenMP loop bounds
should preclude the loop from executing at all.  I do not know whether
I lost it somewhere when preparing patches for trunk or whether I
forgot about it from the beginning.  In any case, the patch below adds
it where it should be.

This popped up during my libgomp testsuite runs as a consequence of
Jakub's revision 253395 after which HSAIL was apparently generated for
a few more kernels and libgomp.c/for-5.c started to fail (taking the
whole machine GPGPU subsystem with it).  So there is already a testcase
for this.

My long term plan for gridification is to replace it with the approach
that our nvidia offloading uses once we have simpler (and better
supported) function pointers in HSA or/and, better yet, a full blown
GCN BE.  It does not currently work well but I still try to avoid any
regressions (this one took long because the bug started happening when
I changed some unrelated things on the APU machine and was suspecting
them).

Bootstrapped with hsa enabled on an x86_64-linux and tested on an HSA
capable APU, OK for trunk?

Thanks,

Martin


2017-10-10  Martin Jambor  

* omp-grid.c (grid_attempt_target_gridification): Also insert a
condition whether loop should be executed at all.
---
 gcc/omp-grid.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/omp-grid.c b/gcc/omp-grid.c
index a7b6f60aeaf..121c96ebe39 100644
--- a/gcc/omp-grid.c
+++ b/gcc/omp-grid.c
@@ -1315,6 +1315,7 @@ grid_attempt_target_gridification (gomp_target *target,
   n1 = fold_convert (itype, n1);
   n2 = fold_convert (itype, n2);
 
+  tree cond = fold_build2 (cond_code, boolean_type_node, n1, n2);
   tree step
= omp_get_for_step_from_incr (loc, gimple_omp_for_incr (inner_loop, i));
 
@@ -1328,6 +1329,7 @@ grid_attempt_target_gridification (gomp_target *target,
 fold_build1 (NEGATE_EXPR, itype, step));
   else
t = fold_build2 (TRUNC_DIV_EXPR, itype, t, step);
+  t = fold_build3 (COND_EXPR, itype, cond, t, build_zero_cst (itype));
   if (grid.tiling)
{
  if (cond_code == GT_EXPR)
-- 
2.14.2



[PATCH] Append PWD to path when using -fprofile-generate=/some/path.

2017-10-27 Thread Martin Liška

Hello.

It's an improvement that I consider still useful even though we're not going to use
it for profiled bootstrap.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready for trunk?
Thanks,
Martin
>From 1a32e0b41b291bef3d58126754834f6c41148ace Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 16 Aug 2017 10:22:57 +0200
Subject: [PATCH] Append PWD to path when using -fprofile-generate=/some/path.

gcc/ChangeLog:

2017-10-27  Martin Liska  

	* coverage.c (coverage_init): Append absolute path of object
	file.
	* doc/invoke.texi: Document the behavior.
---
 gcc/coverage.c  | 20 ++--
 gcc/doc/invoke.texi |  3 +++
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/gcc/coverage.c b/gcc/coverage.c
index 8a56a677f15..c66651f1045 100644
--- a/gcc/coverage.c
+++ b/gcc/coverage.c
@@ -1220,8 +1220,24 @@ coverage_init (const char *filename)
 g->get_passes ()->get_pass_profile ()->static_pass_number;
   g->get_dumps ()->dump_start (profile_pass_num, NULL);
 
-  if (!profile_data_prefix && !IS_ABSOLUTE_PATH (filename))
-profile_data_prefix = getpwd ();
+  if (!IS_ABSOLUTE_PATH (filename))
+{
+  if (profile_data_prefix)
+	{
+	  const char *pwd = getpwd ();
+	  unsigned l1 = strlen (profile_data_prefix);
+	  unsigned l2 = strlen (pwd);
+
+	  char *b = XNEWVEC (char, l1 + l2 + 2);
+	  memcpy (b, profile_data_prefix, l1);
+	  b[l1] = '/';
+	  memcpy (b + l1 + 1, pwd, l2);
+	  b[l1 + l2 + 1] = '\0';
+	  profile_data_prefix = b;
+	}
+  else
+	profile_data_prefix = getpwd ();
+}
 
   if (profile_data_prefix)
 prefix_len = strlen (profile_data_prefix);
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71b2445f70f..682ab570584 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10915,6 +10915,9 @@ and used by @option{-fprofile-use} and @option{-fbranch-probabilities}
 and its related options.  Both absolute and relative paths can be used.
 By default, GCC uses the current directory as @var{path}, thus the
 profile data file appears in the same directory as the object file.
+In order to prevent filename clashing, if the object file name is not an
+absolute path, the current working directory path is appended to @var{path}
+and used as the destination for the @file{@var{sourcename}.gcda} file.
 
 @item -fprofile-generate
 @itemx -fprofile-generate=@var{path}
-- 
2.14.2



[PATCH][OBVIOUS] Fix profiledbootstrap.

2017-10-27 Thread Martin Liška

Hello.

So eventually it looks like the needed fix is much simpler; one just needs to rename a 
folder.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
I'm going to install it.

Martin
>From 028b5cc3865bdc77c185172de84627e140030303 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 27 Oct 2017 12:21:02 +0200
Subject: [PATCH] Fix profiledbootstrap.

ChangeLog:

2017-10-27  Martin Liska  

	* Makefile.tpl: Use the proper name of the folder, as it was
	renamed during the transition to 4 stages.
	* Makefile.in: Regenerate.
---
 Makefile.in  | 4 ++--
 Makefile.tpl | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index 78db0982ba2..13d23915349 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -57174,8 +57174,8 @@ stageprofile-end::
 stagefeedback-start::
 	@r=`${PWD_COMMAND}`; export r; \
 	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-	for i in prev-*; do \
-	  j=`echo $$i | sed s/^prev-//`; \
+	for i in stageprofile-*; do \
+	  j=`echo $$i | sed s/^stageprofile-//`; \
 	  cd $$r/$$i && \
 	  { find . -type d | sort | sed 's,.*,$(SHELL) '"$$s"'/mkinstalldirs "../'$$j'/&",' | $(SHELL); } && \
 	  { find . -name '*.*da' | sed 's,.*,$(LN) -f "&" "../'$$j'/&",' | $(SHELL); }; \
diff --git a/Makefile.tpl b/Makefile.tpl
index 5fcd7e358d9..1f23b79b4b2 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -1718,8 +1718,8 @@ stageprofile-end::
 stagefeedback-start::
 	@r=`${PWD_COMMAND}`; export r; \
 	s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-	for i in prev-*; do \
-	  j=`echo $$i | sed s/^prev-//`; \
+	for i in stageprofile-*; do \
+	  j=`echo $$i | sed s/^stageprofile-//`; \
 	  cd $$r/$$i && \
 	  { find . -type d | sort | sed 's,.*,$(SHELL) '"$$s"'/mkinstalldirs "../'$$j'/&",' | $(SHELL); } && \
 	  { find . -name '*.*da' | sed 's,.*,$(LN) -f "&" "../'$$j'/&",' | $(SHELL); }; \
-- 
2.14.2
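What the corrected `stagefeedback-start` loop does can be reproduced in isolation (a sketch run in a scratch directory; `stageprofile-gcc` and `foo.gcda` are made-up names standing in for real stage directories and profile files):

```shell
# Mirror each stageprofile-* directory tree, then hard-link every profile
# data (*.gcda) file into the corresponding feedback-stage directory --
# the same shape as the fixed Makefile recipe.
cd "$(mktemp -d)"
mkdir -p stageprofile-gcc/sub
touch stageprofile-gcc/sub/foo.gcda
for i in stageprofile-*; do
  j=`echo $i | sed s/^stageprofile-//`
  (cd $i && find . -type d | sed "s,.*,mkdir -p ../$j/&," | sh)
  (cd $i && find . -name '*.*da' | sed "s,.*,ln -f & ../$j/&," | sh)
done
test -f gcc/sub/foo.gcda && echo "feedback stage populated"
```

The pre-rename recipe looked for `prev-*` directories, which no longer exist after the move to 4 stages, so the feedback stage silently got no profile data.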



Re: [RFC] Make 4-stage PGO bootstrap really working

2017-10-27 Thread Martin Liška

On 10/25/2017 02:19 PM, Markus Trippelsdorf wrote:

On 2017.08.30 at 11:45 +0200, Martin Liška wrote:

diff --git a/Makefile.in b/Makefile.in
index 78db0982ba2..16b76906ad0 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -529,13 +529,14 @@ STAGE1_CONFIGURE_FLAGS = --disable-intermodule 
$(STAGE1_CHECKING) \
  --disable-coverage --enable-languages="$(STAGE1_LANGUAGES)" \
  --disable-build-format-warnings
  
-STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate

+profile_folder=`${PWD_COMMAND}`/gcov-profiles/
+STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate=$(profile_folder)
  STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
  
  STAGEtrain_CFLAGS = $(STAGE3_CFLAGS)

  STAGEtrain_TFLAGS = $(STAGE3_TFLAGS)
  
-STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use

+STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use=$(profile_folder) 
-fdump-ipa-profile


-fdump-ipa-profile looks like a debugging leftover and should be
dropped.


Sure. Let me prepare a new version of patch.




  STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
  
  STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g

diff --git a/Makefile.tpl b/Makefile.tpl
index 5fcd7e358d9..129175a579c 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -452,13 +452,14 @@ STAGE1_CONFIGURE_FLAGS = --disable-intermodule 
$(STAGE1_CHECKING) \
  --disable-coverage --enable-languages="$(STAGE1_LANGUAGES)" \
  --disable-build-format-warnings
  
-STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate

+profile_folder=`${PWD_COMMAND}`/gcov-profiles/
+STAGEprofile_CFLAGS = $(STAGE2_CFLAGS) -fprofile-generate=$(profile_folder)
  STAGEprofile_TFLAGS = $(STAGE2_TFLAGS)
  
  STAGEtrain_CFLAGS = $(STAGE3_CFLAGS)

  STAGEtrain_TFLAGS = $(STAGE3_TFLAGS)
  
-STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use

+STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use=$(profile_folder) 
-fdump-ipa-profile


ditto.


And BTW would it make sense to add -gtoggle to stage2 in bootstrap-lto?


Why do you want to have it there? Am I right that we do not do a stage 
comparison
with LTO bootstrap?

Martin



diff --git a/config/bootstrap-lto.mk b/config/bootstrap-lto.mk
index 50b86ef1c81..c0cdee69288 100644
--- a/config/bootstrap-lto.mk
+++ b/config/bootstrap-lto.mk
@@ -1,8 +1,8 @@
  # This option enables LTO for stage2 and stage3 in slim mode
  
-STAGE2_CFLAGS += -flto=jobserver -frandom-seed=1

+STAGE2_CFLAGS += -flto=jobserver -frandom-seed=1 -gtoggle
  STAGE3_CFLAGS += -flto=jobserver -frandom-seed=1
-STAGEprofile_CFLAGS += -flto=jobserver -frandom-seed=1
+STAGEprofile_CFLAGS += -flto=jobserver -frandom-seed=1 -gtoggle
  STAGEtrain_CFLAGS += -flto=jobserver -frandom-seed=1
  STAGEfeedback_CFLAGS += -flto=jobserver -frandom-seed=1
  





Re: [PATCH] Change default optimization level to -Og

2017-10-27 Thread Wilco Dijkstra
Andrew Pinski wrote:

> I think this goes against what most folks are used to.  I know you are
> saying most folks are used to a compiler defaulting to optimizations
> on but I don't think that is true.  In fact GCC has been this way
> since day one.

Well, it may depend on which part of the industry you're coming from. GCC 
certainly has a long history doing it one way, however other compilers took
a different approach and have supported optimized debugging for decades.
So I don't understand the use of having a "turn every optimization off" option,
let alone for it to be the default today...

> Plus you also missed changing the following part of the documentation:
> If you are not using some other optimization option, consider using
> -Og (see Optimize Options) with -g. With no -O option at all, some
> compiler passes that collect information useful for debugging do not
> run at all, so that -Og may result in a better debugging experience.

Sure, the doc part of the patch will need further revision if we agree to
change the default.

Wilco


Re: [PATCH] Provide filesystem::path overloads for file streams (LWG 2676, partial)

2017-10-27 Thread Jonathan Wakely

On 27/10/17 13:43 +0100, Jonathan Wakely wrote:

This implements part of LWG 2676. I haven't added the new members
taking wide character strings, because they're only needed on Windows,
where the Filesystem library doesn't work yet. I'll send a follow-up
patch about those overloads.


This patch should add the wide character overloads, for systems that
support _wfopen for opening a FILE from a wchar_t string (i.e. MinGW
and MinGW-w64). This is the missing part of LWG 2676.

These are not templates, so would require new symbols to be exported
from the library (but only for Windows). As is done with std::string
for now, I've just disabled the explicit instantiation declarations
for C++17, so the functions get implicitly instantiated as needed.

I'm not committing this, because I haven't tested it, and I get angry
people complaining when I try to support Windows in good faith. So this
is provided with no testing and not committed. Windows users can do
their own testing.


commit f6a912da2ebcfd1eaedb8fb894421f3accf9cb06
Author: Jonathan Wakely 
Date:   Tue Oct 24 19:11:06 2017 +0100

create fstreams from wide strings

diff --git a/libstdc++-v3/config/io/basic_file_stdio.cc b/libstdc++-v3/config/io/basic_file_stdio.cc
index eeb1e5e94b6..2114698a3b8 100644
--- a/libstdc++-v3/config/io/basic_file_stdio.cc
+++ b/libstdc++-v3/config/io/basic_file_stdio.cc
@@ -249,6 +249,39 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 return __ret;
   }
 
+#if _GLIBCXX_HAVE__WFOPEN && _GLIBCXX_USE_WCHAR_T
+  __basic_file<char>*
+  __basic_file<char>::open(const wchar_t* __name, ios_base::openmode __mode)
+  {
+__basic_file* __ret = NULL;
+const char* __c_mode = fopen_mode(__mode);
+if (__c_mode && !this->is_open())
+  {
+	wchar_t __wc_mode[4] = { };
+	int __i = 0;
+	do
+	  {
+	switch(__c_mode[__i]) {
+	case 'a': __wc_mode[__i] = L'a'; break;
+	case 'b': __wc_mode[__i] = L'b'; break;
+	case 'r': __wc_mode[__i] = L'r'; break;
+	case 'w': __wc_mode[__i] = L'w'; break;
+	case '+': __wc_mode[__i] = L'+'; break;
+	default: return __ret;
+	}
+	  }
+	while (__c_mode[++__i]);
+
+	if ((_M_cfile = _wfopen(__name, __wc_mode)))
+	  {
+	_M_cfile_created = true;
+	__ret = this;
+	  }
+  }
+return __ret;
+  }
+#endif
+
   bool
   __basic_file<char>::is_open() const throw ()
   { return _M_cfile != 0; }
diff --git a/libstdc++-v3/config/io/basic_file_stdio.h b/libstdc++-v3/config/io/basic_file_stdio.h
index f959ea534cb..11dc47de809 100644
--- a/libstdc++-v3/config/io/basic_file_stdio.h
+++ b/libstdc++-v3/config/io/basic_file_stdio.h
@@ -84,6 +84,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __basic_file*
   open(const char* __name, ios_base::openmode __mode, int __prot = 0664);
 
+#if _GLIBCXX_HAVE__WFOPEN && _GLIBCXX_USE_WCHAR_T
+  __basic_file*
+  open(const wchar_t* __name, ios_base::openmode __mode);
+#endif
+
   __basic_file*
   sys_open(__c_file* __file, ios_base::openmode);
 
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index 270dcbaf723..7ae3d4376aa 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -257,6 +257,7 @@ if $GLIBCXX_IS_NATIVE; then
 
   AC_CHECK_FUNCS(__cxa_thread_atexit_impl __cxa_thread_atexit)
   AC_CHECK_FUNCS(aligned_alloc posix_memalign memalign _aligned_malloc)
+  AC_CHECK_FUNCS(_wfopen)
 
   # For iconv support.
   AM_ICONV
diff --git a/libstdc++-v3/include/bits/fstream.tcc b/libstdc++-v3/include/bits/fstream.tcc
index 12ea977b997..5b094a3f6e1 100644
--- a/libstdc++-v3/include/bits/fstream.tcc
+++ b/libstdc++-v3/include/bits/fstream.tcc
@@ -207,6 +207,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __ret;
 }
 
+#if __cplusplus >= 201703L
+#if _GLIBCXX_HAVE__WFOPEN && _GLIBCXX_USE_WCHAR_T
+  template<typename _CharT, typename _Traits>
+auto
+basic_filebuf<_CharT, _Traits>::
+open(const wchar_t* __s, ios_base::openmode __mode)
+-> __filebuf_type*
+{
+  __filebuf_type *__ret = 0;
+  if (!this->is_open())
+	{
+	  _M_file.open(__s, __mode);
+	  if (this->is_open())
+	{
+	  _M_allocate_internal_buffer();
+	  _M_mode = __mode;
+
+	  // Setup initial buffer to 'uncommitted' mode.
+	  _M_reading = false;
+	  _M_writing = false;
+	  _M_set_buffer(-1);
+
+	  // Reset to initial state.
+	  _M_state_last = _M_state_cur = _M_state_beg;
+
+	  // 27.8.1.3,4
+	  if ((__mode & ios_base::ate)
+		  && this->seekoff(0, ios_base::end, __mode)
+		  == pos_type(off_type(-1)))
+		this->close();
+	  else
+		__ret = this;
+	}
+	}
+  return __ret;
+}
+#endif
+#endif // C++17
+
   template<typename _CharT, typename _Traits>
 typename basic_filebuf<_CharT, _Traits>::__filebuf_type*
 basic_filebuf<_CharT, _Traits>::
@@ -1048,6 +1087,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Inhibit implicit instantiations for required instantiations,
   // which are defined via explicit instantiations elsewhere.
 #if _GLIBCXX_EXTERN_TEMPLATE
+#if !(__cplusplus >= 201703L && _GLIBCXX_HAVE__WFOPEN && 

Re: [RFC] propagate malloc attribute in ipa-pure-const pass

2017-10-27 Thread Richard Biener
On Fri, 27 Oct 2017, Jan Hubicka wrote:

> > On 25 October 2017 at 20:44, Jan Hubicka  wrote:
> > >> On 24 October 2017 at 16:26, Jan Hubicka  wrote:
> > >> >> 2017-10-13  Prathamesh Kulkarni  
> > >> >>
> > >> >>   * cgraph.h (set_malloc_flag): Declare.
> > >> >>   * cgraph.c (set_malloc_flag_1): New function.
> > >> >>   (set_malloc_flag): Likewise.
> > >> >>   * ipa-fnsummary.h (ipa_call_summary): Add new field 
> > >> >> is_return_callee.
> > >> >>   * ipa-fnsummary.c (ipa_call_summary::reset): Set 
> > >> >> is_return_callee to
> > >> >>   false.
> > >> >>   (read_ipa_call_summary): Add support for reading 
> > >> >> is_return_callee.
> > >> >>   (write_ipa_call_summary): Stream is_return_callee.
> > >> >>   * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.
> > >> >>   * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, 
> > >> >> symbol-summary.h,
> > >> >>   ipa-prop.h, ipa-fnsummary.h.
> > >> >>   (pure_const_names): Change to static.
> > >> >>   (malloc_state_e): Define.
> > >> >>   (malloc_state_names): Define.
> > >> >>   (funct_state_d): Add field malloc_state.
> > >> >>   (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.
> > >> >>   (check_retval_uses): New function.
> > >> >>   (malloc_candidate_p): Likewise.
> > >> >>   (analyze_function): Add support for malloc attribute.
> > >> >>   (pure_const_write_summary): Stream malloc_state.
> > >> >>   (pure_const_read_summary): Add support for reading malloc_state.
> > >> >>   (dump_malloc_lattice): New function.
> > >> >>   (propagate_malloc): New function.
> > >> >>   (ipa_pure_const::execute): Call propagate_malloc and
> > >> >>   ipa_free_fn_summary.
> > >> >>   (pass_local_pure_const::execute): Add support for malloc 
> > >> >> attribute.
> > >> >>   * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.
> > >> >>
> > >> >> testsuite/
> > >> >>   * gcc.dg/ipa/propmalloc-1.c: New test-case.
> > >> >>   * gcc.dg/ipa/propmalloc-2.c: Likewise.
> > >> >>   * gcc.dg/ipa/propmalloc-3.c: Likewise.
> > >> >
> > >> > OK.
> > >> > Perhaps we could also add -Wsuggest-attribute=malloc and mention it in
> > >> > changes.html?
> > >> Done in this version.
> > >> In warn_function_malloc(), I passed false for known_finite param to
> > >> suggest_attribute().
> > >> Does that look OK ?
> > >> Validation in progress. OK to commit if passes ?
> > >
> > > OK, thanks!
> > Thanks, committed as r254140 after following validation:
> > 1] Bootstrap+test with --enable-languages=all,ada,go on
> > x86_64-unknown-linux-gnu and ppc64le-linux-gnu.
> > 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu
> > 3] Cross tested on arm*-*-* and aarch64*-*-*.
> > 
> > Would it be a good idea to extend ipa-pure-const to propagate
> > alloc_size/alloc_align and returns_nonnull attributes ?
> > Which other attributes would be useful to propagate in ipa-pure-const ?
> 
> One extension I was also considering is a TBAA mod-ref pass, i.e. propagate
> what types are read from/stored to by the call, rather than having just
> pure/const (no stores, no reads at all).
> 
> This will be a bit of fun to implement in IPA, but it may be useful.  If you
> would be interested in looking into this, we can discuss it (I wanted to
> implement it this stage1, but I think I have way too many other plans).
> 
> LLVM also has nocapture that seems useful for PTA. Richi may have useful
> opinions on that ;)

I once tried to prototype "fn spec" attribute autodetection and
IPA propagation (also into IPA pure const).  Didn't get very far
though.  That tracks per function argument properties like whether
memory is written to through this pointer or whether a pointer
possibly escapes from the function (including through its returned
value).

Richard.
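As a concrete illustration of the malloc propagation discussed in this thread
(the wrapper names below are hypothetical, not taken from the patch or
testsuite), a thin allocation wrapper is exactly the kind of function the pass
can now mark `malloc` automatically:

```cpp
#include <cstdlib>

// xmalloc-style wrapper: it only ever returns the fresh result of
// malloc (or aborts), so the new ipa-pure-const analysis can derive
// the malloc attribute for it without an explicit declaration.
static void *xalloc(std::size_t n)
{
  void *p = std::malloc(n);
  if (!p)
    std::abort();
  return p;  // the return value is always a fresh allocation
}

// A second-level wrapper: with IPA propagation this one can be
// marked malloc too, purely transitively.
void *xalloc_pair(std::size_t n)
{
  return xalloc(2 * n);
}
```

Marking such wrappers `malloc` tells points-to analysis that the returned
pointer cannot alias any existing object, which is what makes the propagation
worthwhile beyond the diagnostic from -Wsuggest-attribute=malloc.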


[PATCH] Provide filesystem::path overloads for file streams (LWG 2676, partial)

2017-10-27 Thread Jonathan Wakely

This implements part of LWG 2676. I haven't added the new members
taking wide character strings, because they're only needed on Windows,
where the Filesystem library doesn't work yet. I'll send a follow-up
patch about those overloads.

I've implemented these new functions as templates, which means they
work with both std::filesystem::path and
std::experimental::filesystem::path. There's another benefit, which is
that we don't need to include <filesystem> in <fstream>, and we don't
need to export new symbols from libstdc++.so for these functions
(we have explicit instantiation definitions in the library).

This isn't entirely conforming, because a type that is merely convertible to
filesystem::path will not match these new function templates. We can
revisit it once the <filesystem> symbols are added to libstdc++.so, but
this should be OK for now.
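For reference, a small sketch of how the new overloads are used (the filename
and helper below are made up for illustration; this assumes a C++17 compiler
and a library with these additions):

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Write and re-read a file, passing filesystem::path objects directly
// to the fstream constructor and to open() -- previously only
// const char* and std::string were accepted.
std::string round_trip(const std::filesystem::path& p, const std::string& text)
{
  {
    std::ofstream out(p);   // constructor overload taking a path
    out << text;
  }                         // out is flushed and closed here
  std::ifstream in;
  in.open(p);               // open() overload taking a path
  std::string line;
  std::getline(in, line);
  return line;
}
```

Because the overloads are templates constrained by the _If_path helper, they
match filesystem::path itself; a user-defined type convertible to path will
not bind to them, which is the non-conformance noted above.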

* include/std/fstream (basic_filebuf::_If_path): New SFINAE helper.
(basic_filebuf::open(const Path&, const ios_base::openmode&))
(basic_ifstream(const Path&, const ios_base::openmode&))
(basic_ifstream::open(const Path&, const ios_base::openmode&))
(basic_ofstream(const Path&, const ios_base::openmode&))
(basic_ofstream::open(const Path&, const ios_base::openmode&))
(basic_fstream(const Path&, const ios_base::openmode&))
(basic_fstream::open(const Path&, const ios_base::openmode&)):
New constructors and member functions.
* testsuite/27_io/basic_filebuf/open/char/path.cc: New test.
* testsuite/27_io/basic_fstream/cons/char/path.cc: New test.
* testsuite/27_io/basic_fstream/open/char/path.cc: New test.
* testsuite/27_io/basic_ifstream/cons/char/path.cc: New test.
* testsuite/27_io/basic_ifstream/open/char/path.cc: New test.
* testsuite/27_io/basic_ofstream/cons/char/path.cc: New test.
* testsuite/27_io/basic_ofstream/open/char/path.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit 67bafe2c8409f91c2de758e37830b2d57dedbf02
Author: Jonathan Wakely 
Date:   Tue Oct 24 19:08:15 2017 +0100

Provide filesystem::path overloads for file streams (LWG 2676, partial)

* include/std/fstream (basic_filebuf::_If_path): New SFINAE helper.
(basic_filebuf::open(const Path&, const ios_base::openmode&))
(basic_ifstream(const Path&, const ios_base::openmode&))
(basic_ifstream::open(const Path&, const ios_base::openmode&))
(basic_ofstream(const Path&, const ios_base::openmode&))
(basic_ofstream::open(const Path&, const ios_base::openmode&))
(basic_fstream(const Path&, const ios_base::openmode&))
(basic_fstream::open(const Path&, const ios_base::openmode&)):
New constructors and member functions.
* testsuite/27_io/basic_filebuf/open/char/path.cc: New test.
* testsuite/27_io/basic_fstream/cons/char/path.cc: New test.
* testsuite/27_io/basic_fstream/open/char/path.cc: New test.
* testsuite/27_io/basic_ifstream/cons/char/path.cc: New test.
* testsuite/27_io/basic_ifstream/open/char/path.cc: New test.
* testsuite/27_io/basic_ofstream/cons/char/path.cc: New test.
* testsuite/27_io/basic_ofstream/open/char/path.cc: New test.

diff --git a/libstdc++-v3/include/std/fstream b/libstdc++-v3/include/std/fstream
index 52830945fe2..3205f81fb47 100644
--- a/libstdc++-v3/include/std/fstream
+++ b/libstdc++-v3/include/std/fstream
@@ -216,6 +216,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  }
   }
 
+#if __cplusplus >= 201703L
+  template().make_preferred().native())>
+   using _If_path = enable_if_t, _Result>;
+#endif // C++17
+
 public:
   // Constructors/destructor:
   /**
@@ -306,7 +312,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __filebuf_type*
   open(const std::string& __s, ios_base::openmode __mode)
   { return open(__s.c_str(), __mode); }
-#endif
+
+#if __cplusplus >= 201703L
+  /**
+   *  @brief  Opens an external file.
+   *  @param  __s  The name of the file, as a filesystem::path.
+   *  @param  __mode  The open mode flags.
+   *  @return  @c this on success, NULL on failure
+   */
+  template<typename _Path>
+   _If_path<_Path, __filebuf_type*>
+   open(const _Path& __s, ios_base::openmode __mode)
+   { return open(__s.c_str(), __mode); }
+#endif // C++17
+#endif // C++11
 
   /**
*  @brief  Closes the currently associated file.
@@ -516,13 +535,29 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
this->open(__s, __mode);
   }
 
+#if __cplusplus >= 201703L
+  /**
+   *  @brief  Create an input file stream.
+   *  @param  __s  filesystem::path specifying the filename.
+   *  @param  __mode  Open file in specified mode (see std::ios_base).
+   *
+   *  @c ios_base::in is automatically included in @a __mode.
+   */
+  template>>
+   

[PATCH] New option saphira for Qualcomm server part

2017-10-27 Thread Siddhesh Poyarekar
From: Siddhesh Poyarekar 

This patch adds an -mcpu option for the Qualcomm Saphira server part.
Tested on aarch64 and did not find any regressions resulting from this
patch.

Siddhesh

2017-10-27  Siddhesh Poyarekar  
Jim Wilson  

gcc/
* config/aarch64/aarch64-cores.def (saphira): New.
* config/aarch64/aarch64-tune.md: Regenerated.
* doc/invoke.texi (AArch64 Options/-mtune): Add "saphira".
* config/aarch64/aarch64.c (saphira_tunings): New.

Change-Id: I23c4a1ab74e4376c3800cb1481c508bc27418508
---
 gcc/config/aarch64/aarch64-cores.def |  5 +
 gcc/config/aarch64/aarch64-tune.md   |  2 +-
 gcc/config/aarch64/aarch64.c | 28 
 gcc/doc/invoke.texi  |  2 +-
 4 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 16e4485..cdf047c 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -86,6 +86,11 @@ AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 
8_1A,  AARCH64_FL_FOR
 AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa53, 0x41, 
0xd05, -1)
 AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa73, 0x41, 
0xd0a, -1)
 
+/* ARMv8.3-A Architecture Processors.  */
+
+/* Qualcomm ('Q') cores. */
+AARCH64_CORE("saphira", saphira,falkor,8_3A,  
AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 
0xC01, -1)
+
 /* ARMv8-A big.LITTLE implementations.  */
 
 AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE 
(0xd07, 0xd03), -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index 7fcd6cb..7b3a746 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
+   
"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d1aaf19..f554ffb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -822,6 +822,34 @@ static const struct tune_params qdf24xx_tunings =
   _prefetch_tune
 };
 
+/* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
+   for now.  */
+static const struct tune_params saphira_tunings =
+{
+  _extra_costs,
+  _addrcost_table,
+  _regmove_cost,
+  _vector_cost,
+  _branch_cost,
+  _approx_modes,
+  4, /* memmov_cost  */
+  4, /* issue_rate  */
+  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
+  16,  /* function_align.  */
+  8,   /* jump_align.  */
+  16,  /* loop_align.  */
+  2,   /* int_reassoc_width.  */
+  4,   /* fp_reassoc_width.  */
+  1,   /* vec_reassoc_width.  */
+  2,   /* min_div_recip_mul_sf.  */
+  2,   /* min_div_recip_mul_df.  */
+  0,   /* max_case_values.  */
+  tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  _prefetch_tune
+};
+
 static const struct tune_params thunderx2t99_tunings =
 {
   _extra_costs,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71b2445..bc480ad 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14326,7 +14326,7 @@ Specify the name of the target processor for which GCC 
should tune the
 performance of the code.  Permissible values for this option are:
 @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a55},
 @samp{cortex-a57}, @samp{cortex-a72}, @samp{cortex-a73}, @samp{cortex-a75},
-@samp{exynos-m1}, @samp{falkor}, @samp{qdf24xx},
+@samp{exynos-m1}, @samp{falkor}, @samp{qdf24xx}, @samp{saphira},
 @samp{xgene1}, @samp{vulcan}, @samp{thunderx},
 @samp{thunderxt88}, @samp{thunderxt88p1}, @samp{thunderxt81},
 @samp{thunderxt83}, @samp{thunderx2t99}, @samp{cortex-a57.cortex-a53},
-- 
2.7.4



Re: [RFC] propagate malloc attribute in ipa-pure-const pass

2017-10-27 Thread Jan Hubicka
> On 25 October 2017 at 20:44, Jan Hubicka  wrote:
> >> On 24 October 2017 at 16:26, Jan Hubicka  wrote:
> >> >> 2017-10-13  Prathamesh Kulkarni  
> >> >>
> >> >>   * cgraph.h (set_malloc_flag): Declare.
> >> >>   * cgraph.c (set_malloc_flag_1): New function.
> >> >>   (set_malloc_flag): Likewise.
> >> >>   * ipa-fnsummary.h (ipa_call_summary): Add new field 
> >> >> is_return_callee.
> >> >>   * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee 
> >> >> to
> >> >>   false.
> >> >>   (read_ipa_call_summary): Add support for reading is_return_callee.
> >> >>   (write_ipa_call_summary): Stream is_return_callee.
> >> >>   * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.
> >> >>   * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, 
> >> >> symbol-summary.h,
> >> >>   ipa-prop.h, ipa-fnsummary.h.
> >> >>   (pure_const_names): Change to static.
> >> >>   (malloc_state_e): Define.
> >> >>   (malloc_state_names): Define.
> >> >>   (funct_state_d): Add field malloc_state.
> >> >>   (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.
> >> >>   (check_retval_uses): New function.
> >> >>   (malloc_candidate_p): Likewise.
> >> >>   (analyze_function): Add support for malloc attribute.
> >> >>   (pure_const_write_summary): Stream malloc_state.
> >> >>   (pure_const_read_summary): Add support for reading malloc_state.
> >> >>   (dump_malloc_lattice): New function.
> >> >>   (propagate_malloc): New function.
> >> >>   (ipa_pure_const::execute): Call propagate_malloc and
> >> >>   ipa_free_fn_summary.
> >> >>   (pass_local_pure_const::execute): Add support for malloc 
> >> >> attribute.
> >> >>   * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.
> >> >>
> >> >> testsuite/
> >> >>   * gcc.dg/ipa/propmalloc-1.c: New test-case.
> >> >>   * gcc.dg/ipa/propmalloc-2.c: Likewise.
> >> >>   * gcc.dg/ipa/propmalloc-3.c: Likewise.
> >> >
> >> > OK.
> >> > Perhaps we could also add -Wsuggest-attribute=malloc and mention it in
> >> > changes.html?
> >> Done in this version.
> >> In warn_function_malloc(), I passed false for known_finite param to
> >> suggest_attribute().
> >> Does that look OK ?
> >> Validation in progress. OK to commit if passes ?
> >
> > OK, thanks!
> Thanks, committed as r254140 after following validation:
> 1] Bootstrap+test with --enable-languages=all,ada,go on
> x86_64-unknown-linux-gnu and ppc64le-linux-gnu.
> 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu
> 3] Cross tested on arm*-*-* and aarch64*-*-*.
> 
> Would it be a good idea to extend ipa-pure-const to propagate
> alloc_size/alloc_align and returns_nonnull attributes ?
> Which other attributes would be useful to propagate in ipa-pure-const ?

One extension I was also considering is a TBAA mod-ref pass, i.e. propagate
what types are read from/stored to by the call, rather than having just
pure/const (no stores, no reads at all).

This will be a bit of fun to implement in IPA, but it may be useful.  If you
would be interested in looking into this, we can discuss it (I wanted to
implement it this stage1, but I think I have way too many other plans).

LLVM also has nocapture that seems useful for PTA. Richi may have useful
opinions on that ;)

Honza


Re: [RFC, PR 80689] Copy small aggregates element-wise

2017-10-27 Thread Jan Hubicka
> On Thu, Oct 26, 2017 at 2:55 PM, Jan Hubicka  wrote:
> >> I think the limit should be on the number of generated copies and not
> >> the overall size of the structure...  If the struct were composed of
> >> 32 individual chars we wouldn't want to emit 32 loads and 32 stores...
> >>
> >> I wonder how rep; movb; interacts with store to load forwarding?  Is
> >> that maybe optimized well on some archs?  movb should always
> >> forward and wasn't the setup cost for small N reasonable on modern
> >> CPUs?
> >
> > rep mov is a win over a loop for blocks over 128 bytes on Core, and for
> > blocks in the range 24-128 on Zen.  This is w/o store/load forwarding, but
> > I doubt those provide a cheap way around.
> >
> >>
> >> It probably depends on the width of the entries in the store buffer,
> >> if they appear in-order and the alignment of the stores (if they are 
> >> larger than
> >> 8 bytes they are surely aligned).  IIRC CPUs had smaller store buffer
> >> entries than cache line size.
> >>
> >> Given that load bandwidth is usually higher than store bandwidth, it
> >> might make sense to do the store combining in our copying sequence,
> >> like for the 8 byte entry case use sth like
> >>
> >>   movq 0(%eax), %xmm0
> >>   movhps 8(%eax), %xmm0 // or vpinsert
> >>   mov[au]ps %xmm0, 0%(ebx)
> >> ...
> >>
> >> thus do two loads per store and perform the stores in wider
> >> mode?
> >
> > This may be somewhat faster indeed.  I am not sure if store to load
> > forwarding will work for the later half when read again by halves.
> > It would not happen on older CPUs :)
> 
> Yes, forwarding larger stores to smaller loads generally works fine
> since forever with the usual restrictions of alignment/size being
> power of two "halves".
> 
> The question is of course what to do for 4 byte or smaller elements or
> mixed size elements.  We can do zero-extending loads
> (do we have them for QI, HI mode loads as well?) and
> do shift and or's.  I'm quite sure the CPUs wouldn't like to
> see vpinsert's of different vector mode destinations.  So it
> would be 8 byte stores from GPRs and values built up via
> shift & or.
> 
> As said, the important part is that IIRC CPUs can usually
> have more loads in flight than stores.  Esp. Bulldozer
> with the split core was store buffer size limited (but it
> could do merging of store buffer entries IIRC).

In a way this (store forwarding) seems an independent optimization to me,
because it can certainly help user code that does not originate from a
copy sequence.

It seems like something a bit tricky to implement on top of RTL,
though.
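A rough user-level sketch of the "two narrow loads, one wide store" sequence
Richard describes, written with SSE2 intrinsics rather than the RTL the
compiler would emit (x86-specific; the function name is made up):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Copy two adjacent 8-byte elements with two narrow loads and one
// 16-byte store: each narrow load can still be forwarded from an
// earlier narrow store, while the combined store consumes only a
// single store-buffer entry.
static void copy_two_doubles(double *dst, const double *src)
{
  __m128d v = _mm_load_sd(src);   // movsd: load low half, zero high
  v = _mm_loadh_pd(v, src + 1);   // movhpd: load high half
  _mm_storeu_pd(dst, v);          // one unaligned 16-byte store
}
```

This mirrors the movq/movhps pattern from the thread: loads stay at element
width (so store-to-load forwarding keeps working) while stores are merged.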

Honza
> 
> Richard.
> 
> > Honza
> >>
> >> As said a general concern was you not copying padding.  If you
> >> put this into an even more common place you surely will break
> >> stuff, no?
> >>
> >> Richard.
> >>
> >> >
> >> > Martin
> >> >
> >> >
> >> >>
> >> >> Richard.
> >> >>
> >> >> > Martin
> >> >> >
> >> >> >
> >> >> > 2017-10-12  Martin Jambor  
> >> >> >
> >> >> > PR target/80689
> >> >> > * tree-sra.h: New file.
> >> >> > * ipa-prop.h: Moved declaration of build_ref_for_offset to
> >> >> > tree-sra.h.
> >> >> > * expr.c: Include params.h and tree-sra.h.
> >> >> > (emit_move_elementwise): New function.
> >> >> > (store_expr_with_bounds): Optionally use it.
> >> >> > * ipa-cp.c: Include tree-sra.h.
> >> >> > * params.def (PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY): New.
> >> >> > * config/i386/i386.c (ix86_option_override_internal): Set
> >> >> > PARAM_MAX_SIZE_FOR_ELEMENTWISE_COPY to 35.
> >> >> > * tree-sra.c: Include tree-sra.h.
> >> >> > (scalarizable_type_p): Renamed to
> >> >> > simple_mix_of_records_and_arrays_p, made public, renamed the
> >> >> > second parameter to allow_char_arrays.
> >> >> > (extract_min_max_idx_from_array): New function.
> >> >> > (completely_scalarize): Moved bits of the function to
> >> >> > extract_min_max_idx_from_array.
> >> >> >
> >> >> > testsuite/
> >> >> > * gcc.target/i386/pr80689-1.c: New test.
> >> >> > ---
> >> >> >  gcc/config/i386/i386.c|   4 ++
> >> >> >  gcc/expr.c| 103 
> >> >> > --
> >> >> >  gcc/ipa-cp.c  |   1 +
> >> >> >  gcc/ipa-prop.h|   4 --
> >> >> >  gcc/params.def|   6 ++
> >> >> >  gcc/testsuite/gcc.target/i386/pr80689-1.c |  38 +++
> >> >> >  gcc/tree-sra.c|  86 
> >> >> > +++--
> >> >> >  gcc/tree-sra.h|  33 ++
> >> >> >  8 files changed, 233 insertions(+), 42 deletions(-)
> >> >> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr80689-1.c
> >> >> >  create mode 100644 gcc/tree-sra.h
> >> >> >
> >> >> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c

Re: [RFC] propagate malloc attribute in ipa-pure-const pass

2017-10-27 Thread Jan Hubicka
Prathamesh
> > OK, thanks!
> Thanks, committed as r254140 after following validation:
> 1] Bootstrap+test with --enable-languages=all,ada,go on
> x86_64-unknown-linux-gnu and ppc64le-linux-gnu.
> 2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu
> 3] Cross tested on arm*-*-* and aarch64*-*-*.
> 
> Would it be a good idea to extend ipa-pure-const to propagate
> alloc_size/alloc_align and returns_nonnull attributes ?
> Which other attributes would be useful to propagate in ipa-pure-const ?

Good that I did not discourage you with the slow review rate (I will try to
do something about it).
I guess alloc_size/alloc_align are obvious candidates.

returns_nonnull is a special case of VRP over returns so I would rather
like to see ipa-prop/ipa-cp extended to handle return values and this done
as a consequence.

One interesting case I think we could try to track is what types of exceptions
are thrown.  Currently, if a function throws a specific exception which is
handled by its caller, we still think the caller may throw, because we do not
take exception types into consideration at all.  I think tracking this may let
us mark a significant portion of functions nothrow, as this seems like a
common coding practice.
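A minimal sketch of that missed-nothrow pattern (hypothetical functions, not
from any testsuite): the callee can only throw one exception type, and the
caller catches exactly that type, yet without type-aware analysis the caller
is still treated as throwing.

```cpp
#include <new>

// g() can only ever throw std::bad_alloc.
int g(bool fail)
{
  if (fail)
    throw std::bad_alloc();
  return 42;
}

// h() catches every exception type g() can throw, so no exception can
// ever propagate out of it -- but an analysis that ignores thrown
// types must still assume h() may throw.
int h(bool fail) noexcept
{
  try {
    return g(fail);
  } catch (const std::bad_alloc&) {
    return -1;
  }
}
```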

It would also be nice to clean up ipa-pure-const.  I think one could implement
a propagation template where one just feeds in the basic parameters (what data
to store, what edges to consider) rather than copying the rather outdated
code again and again.
> 
> Also, I would be grateful for suggestions on how to propagate malloc
> attribute through indirect calls.
> Currently, I have left it as FIXME, and simply drop the lattice to
> MALLOC_BOTTOM if there's an indirect call :(
> 
> I am not able to understand how attribute propagation across indirect
> calls works.
> For example consider propagation of nothrow attribute in following test-case:
> 
> __attribute__((noinline, noclone, nothrow))
> int f1(int k) { return k; }
> 
> __attribute__((noinline, noclone))
> static int foo(int (*p)(int))
> {
>   return p(10);
> }
> 
> __attribute__((noinline, noclone))
> int bar(void)
> {
>   return foo(f1);
> }
> 
> Shouldn't foo and bar be also marked as nothrow ?
> Since foo indirectly calls f1 which is nothrow and bar only calls foo ?

Well, foo may have other callers that pass something other than f1,
so one needs to track "nothrow in the context where f1 is the callee".

In general I think this reduces to "may" edges in the callgraph (for a given
indirect call we want to track whether we know the complete list of possible
targets).  We do that for polymorphic calls via the ipa-polymorphic-call
analysis, but I did not get around to implementing anything for plain
indirect calls.

To get the list of targets, you call possible_polymorphic_call_targets, which
also tells you whether the list is complete (the final flag) or whether there
may be some callees invisible to the compiler (such as derived types from
another compilation unit).

The lists may be large, and for that reason there is a cache token which tells
you whether you are seeing the same list again.

Extending ipa-pure-const to work across final lists of polymorphic targets
may be first step to handle indirect calls in general.
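To make the context-sensitivity concrete, here is a hypothetical variation of
the test-case quoted above: a second call site widens the set of targets
flowing into foo's parameter, so foo's nothrow-ness cannot be summarized
independently of its callers.

```cpp
// f1 can never throw; f2 can.
static int f1(int k) noexcept { return k; }
static int f2(int k) { if (k == 0) throw k; return 100 / k; }

static int foo(int (*p)(int)) { return p(10); }

// bar only ever feeds f1 into foo, so in bar's context the indirect
// call cannot throw -- but a single summary for foo must account for
// every target any caller may pass, such as f2 from baz.
int bar() { return foo(f1); }
int baz() { return foo(f2); }
```

With a complete ("final") target list for the call through p at each context,
the analysis could mark bar nothrow while keeping baz conservative.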

Honza
> 
> The local-pure-const2 dump shows "function is locally throwing" for
> "foo" and "bar", and the ipa-pure-const dump doesn't appear to show foo
> and bar marked as nothrow.
> 
> Thanks,
> Prathamesh
> > Honza


[PATCH] Define std::filesystem::path::format enum (P0430R2)

2017-10-27 Thread Jonathan Wakely

This enum doesn't do anything for POSIX systems, so this is a pretty
uninteresting change.

* include/bits/fs_path.h (path::format): Define new enumeration type.
(path(string_type&&), path(const Source&))
(path(InputIterator, InputIterator))
(path(const Source&, const locale&))
(path(InputIterator, InputIterator, const locale&)):
Add format parameter.
* testsuite/27_io/filesystem/path/construct/format.cc: New test.

Tested powerpc64le-linux, committed to trunk.

commit 94acf4d0cfca501eab1387d544ffbf1ccba3e323
Author: Jonathan Wakely 
Date:   Fri Oct 27 12:47:35 2017 +0100

Define std::filesystem::path::format enum (P0430R2)

* include/bits/fs_path.h (path::format): Define new enumeration 
type.
(path(string_type&&), path(const Source&))
(path(InputIterator, InputIterator))
(path(const Source&, const locale&))
(path(InputIterator, InputIterator, const locale&)):
Add format parameter.
* testsuite/27_io/filesystem/path/construct/format.cc: New test.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index 6ba2bd2d43a..7d97cdfbb81 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -156,6 +156,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 #endif
 typedef std::basic_string  string_type;
 
+enum format { native_format, generic_format, auto_format };
+
 // constructors and destructor
 
 path() noexcept { }
@@ -169,27 +171,27 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   __p.clear();
 }
 
-path(string_type&& __source)
+path(string_type&& __source, format = auto_format)
 : _M_pathname(std::move(__source))
 { _M_split_cmpts(); }
 
 template>
-  path(_Source const& __source)
+  path(_Source const& __source, format = auto_format)
   : _M_pathname(_S_convert(_S_range_begin(__source),
   _S_range_end(__source)))
   { _M_split_cmpts(); }
 
 template>
-  path(_InputIterator __first, _InputIterator __last)
+  path(_InputIterator __first, _InputIterator __last, format = auto_format)
   : _M_pathname(_S_convert(__first, __last))
   { _M_split_cmpts(); }
 
 template,
 typename _Require2 = __value_type_is_char<_Source>>
-  path(_Source const& __source, const locale& __loc)
+  path(_Source const& __source, const locale& __loc, format = auto_format)
   : _M_pathname(_S_convert_loc(_S_range_begin(__source),
   _S_range_end(__source), __loc))
   { _M_split_cmpts(); }
@@ -197,7 +199,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 template,
 typename _Require2 = __value_type_is_char<_InputIterator>>
-  path(_InputIterator __first, _InputIterator __last, const locale& __loc)
+  path(_InputIterator __first, _InputIterator __last, const locale& __loc,
+  format = auto_format)
   : _M_pathname(_S_convert_loc(__first, __last, __loc))
   { _M_split_cmpts(); }
 
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/construct/format.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/construct/format.cc
new file mode 100644
index 000..e7ed19cafe9
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/construct/format.cc
@@ -0,0 +1,116 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17 -lstdc++fs" }
+// { dg-do run { target c++17 } }
+// { dg-require-filesystem-ts "" }
+
+#include 
+#include 
+
+using std::filesystem::path;
+
+void
+test01()
+{
+  auto s = [&]() -> path::string_type { return "foo/bar"; };
+  path p0(s());
+  path p1(s(), path::auto_format);
+  VERIFY( p1 == p0 );
+  path p2(s(), path::native_format);
+  VERIFY( p2 == p0 );
+  path p3(s(), path::generic_format);
+  VERIFY( p3 == p0 );
+}
+
+void
+test02()
+{
+  path::string_type s = "foo/bar";
+  path p0(s);
+  path p1(s, path::auto_format);
+  VERIFY( p1 == p0 );
+  path p2(s, path::native_format);
+  VERIFY( p2 == p0 );
+  path p3(s, path::generic_format);
+  VERIFY( p3 == p0 );
+}
+
+void
+test03()
+{
+  const char* s = "foo/bar";
+  path p0(s);
+  path p1(s, path::auto_format);

[doc] Update install.texi for Solaris 12 rename

2017-10-27 Thread Rainer Orth
There's one Solaris 12 reference in install.texi that also needs
adapting for the Solaris 12 -> 11.4 change.  However, instead of simply
updating it I chose to update the relevant sections for current binutils
versions.

Tested with make doc/gccinstall.info doc/gccinstall.pdf and visually
inspecting the pdf.  Will install on mainline shortly.  I've no
intentions of backporting to the gcc-7 and gcc-6 branches, though:
unless something critical comes up, I usually keep branches as is.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-10-26  Rainer Orth  

* doc/install.texi (Specific, i?86-*-solaris2.10): Simplify gas
2.26 caveat.  Update gas and gld versions.
(Specific, *-*-solaris2*): Update binutils version.  Remove caveat
reference.

# HG changeset patch
# Parent  944ee8a0ae454d5689607ef4f4abc535f866c766
Update install.texi for Solaris 12 rename

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3806,12 +3806,10 @@ It is recommended that you configure GCC
 versions included in Solaris 10, from GNU binutils 2.15 (in
 @file{/usr/sfw/bin/gas}), and Solaris 11, from GNU binutils 2.19 or
 newer (also available as @file{/usr/bin/gas} and
-@file{/usr/gnu/bin/as}), work fine.  Please note that the current
-version, from GNU binutils 2.26, only works on Solaris 12 when using the
-Solaris linker.  On Solaris 10 and 11, you either have to wait for GNU
-binutils 2.26.1 or newer, or stay with GNU binutils 2.25.1.  Recent
-versions of the Solaris assembler in @file{/usr/ccs/bin/as} work almost
-as well, though.
+@file{/usr/gnu/bin/as}), work fine.  The current version, from GNU
+binutils 2.29, is known to work, but the version from GNU binutils 2.26
+must be avoided.  Recent versions of the Solaris assembler in
+@file{/usr/ccs/bin/as} work almost as well, though.
 @c FIXME: as patch requirements?
 
 For linking, the Solaris linker, is preferred.  If you want to use the GNU
@@ -3819,7 +3817,7 @@ linker instead, note that due to a packa
 10, from GNU binutils 2.15 (in @file{/usr/sfw/bin/gld}), cannot be used,
 while the version in Solaris 11, from GNU binutils 2.19 or newer (also
 in @file{/usr/gnu/bin/ld} and @file{/usr/bin/gld}), works, as does the
-latest version, from GNU binutils 2.26.
+latest version, from GNU binutils 2.29.
 
 To use GNU @command{as}, configure with the options
 @option{--with-gnu-as --with-as=@//usr/@/sfw/@/bin/@/gas}.  It may be necessary
@@ -4456,9 +4454,8 @@ versions included in Solaris 10, from GN
 @file{/usr/sfw/bin/gas}), and Solaris 11,
 from GNU binutils 2.19 or newer (also in @file{/usr/bin/gas} and
 @file{/usr/gnu/bin/as}), are known to work.
-Current versions of GNU binutils (2.26)
-are known to work as well, with the caveat mentioned in
-@uref{#ix86-x-solaris210,,i?86-*-solaris2.10} .  Note that your mileage may vary
+The current version, from GNU binutils 2.29,
+is known to work as well.  Note that your mileage may vary
 if you use a combination of the GNU tools and the Solaris tools: while the
 combination GNU @command{as} + Sun @command{ld} should reasonably work,
 the reverse combination Sun @command{as} + GNU @command{ld} may fail to
@@ -4466,7 +4463,7 @@ build or cause memory corruption at runt
 @c FIXME: still?
 GNU @command{ld} usually works as well, although the version included in
 Solaris 10 cannot be used due to several bugs.  Again, the current
-version (2.26) is known to work, but generally lacks platform specific
+version (2.29) is known to work, but generally lacks platform specific
 features, so better stay with Solaris @command{ld}.  To use the LTO linker
 plugin (@option{-fuse-linker-plugin}) with GNU @command{ld}, GNU
 binutils @emph{must} be configured with @option{--enable-largefile}.


[doc] Remove Tru64 UNIX and IRIX references in install.texi

2017-10-27 Thread Rainer Orth
I happened to notice that install.texi still contains references to the
Tru64 UNIX and IRIX ports I've removed in GCC 4.8.  I believe it's time
now to get rid of those completely.

Tested with make doc/gccinstall.info and doc/gccinstall.pdf.  Ok for
mainline?  This falls under my prior maintainership, I guess, but I
think it's best to get a second opinion.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2017-10-27  Rainer Orth  

* doc/install.texi (Specific, alpha*-*-*): Remove DEC OSF/1
etc. reference.
(Specific, alpha*-dec-osf5.1): Remove.
(Specific, mips-sgi-irix5): Remove.
(Specific, mips-sgi-irix6): Remove.

# HG changeset patch
# Parent  8d97816c0ce1dd84c7463b60d689c69e6314c67d
Remove Tru64 UNIX and IRIX references in install.texi

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3168,8 +3168,6 @@ information have to.
 @item
 @uref{#alpha-x-x,,alpha*-*-*}
 @item
-@uref{#alpha-dec-osf51,,alpha*-dec-osf5.1}
-@item
 @uref{#amd64-x-solaris210,,amd64-*-solaris2.10}
 @item
 @uref{#arm-x-eabi,,arm-*-eabi}
@@ -3220,10 +3218,6 @@ information have to.
 @item
 @uref{#mips-x-x,,mips-*-*}
 @item
-@uref{#mips-sgi-irix5,,mips-sgi-irix5}
-@item
-@uref{#mips-sgi-irix6,,mips-sgi-irix6}
-@item
 @uref{#nds32le-x-elf,,nds32le-*-elf}
 @item
 @uref{#nds32be-x-elf,,nds32be-*-elf}
@@ -3353,8 +3347,7 @@ The workaround is disabled by default if
 @anchor{alpha-x-x}
 @heading alpha*-*-*
 This section contains general configuration information for all
-alpha-based platforms using ELF (in particular, ignore this section for
-DEC OSF/1, Digital UNIX and Tru64 UNIX)@.  In addition to reading this
+alpha-based platforms using ELF@.  In addition to reading this
 section, please read all other sections that match your target.
 
 We require binutils 2.11.2 or newer.
@@ -3365,20 +3358,6 @@ shared libraries.
 @html
 
 @end html
-@anchor{alpha-dec-osf51}
-@heading alpha*-dec-osf5.1
-Systems using processors that implement the DEC Alpha architecture and
-are running the DEC/Compaq/HP Unix (DEC OSF/1, Digital UNIX, or Compaq/HP
-Tru64 UNIX) operating system, for example the DEC Alpha AXP systems.
-
-Support for Tru64 UNIX V5.1 has been removed in GCC 4.8.  As of GCC 4.6,
-support for Tru64 UNIX V4.0 and V5.0 has been removed.  As of GCC 3.2,
-versions before @code{alpha*-dec-osf4} are no longer supported.  (These
-are the versions which identify themselves as DEC OSF/1.)
-
-@html
-
-@end html
 @anchor{amd64-x-solaris210}
 @heading amd64-*-solaris2.1[0-9]*
 This is a synonym for @samp{x86_64-*-solaris2.1[0-9]*}.
@@ -4165,22 +4144,6 @@ use traps on systems that support them.
 @html
 
 @end html
-@anchor{mips-sgi-irix5}
-@heading mips-sgi-irix5
-Support for IRIX 5 has been removed in GCC 4.6.
-
-@html
-
-@end html
-@anchor{mips-sgi-irix6}
-@heading mips-sgi-irix6
-Support for IRIX 6.5 has been removed in GCC 4.8.  Support for IRIX 6
-releases before 6.5 has been removed in GCC 4.6, as well as support for
-the O32 ABI.
-
-@html
-
-@end html
 @anchor{moxie-x-elf}
 @heading moxie-*-elf
 The moxie processor.


Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Jakub Jelinek
On Fri, Oct 27, 2017 at 01:16:08PM +0200, Martin Liška wrote:
> On 10/27/2017 12:52 PM, Jakub Jelinek wrote:
> > The decl.c change seems to be only an incremental change from a not publicly
> > posted patch rather than the full diff against trunk.
> 
> Sorry for that. Sending full patch.

Thanks.

> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -15280,7 +15280,19 @@ begin_destructor_body (void)
>if (flag_lifetime_dse
> /* Clobbering an empty base is harmful if it overlays real data.  */
> && !is_empty_class (current_class_type))
> - finish_decl_cleanup (NULL_TREE, build_clobber_this ());
> + {
> +   if (sanitize_flags_p (SANITIZE_VPTR)
> +   && (flag_sanitize_recover & SANITIZE_VPTR) == 0)
> + {
> +   tree fndecl = builtin_decl_explicit (BUILT_IN_MEMSET);
> +   tree call = build_call_expr (fndecl, 3,
> +current_class_ptr, integer_zero_node,
> +TYPE_SIZE_UNIT (current_class_type));

I wonder if it wouldn't be cheaper to just use thisref = {}; rather than
memset, pretty much the same thing as build_clobber_this () emits, except
for the TREE_VOLATILE.  Also, build_clobber_this has:
  if (vbases)
exprstmt = build_if_in_charge (exprstmt);
so it doesn't clobber if not in charge, not sure if it applies here too.
So maybe easiest would be add a bool argument to build_clobber_this which
would say whether it is a clobber or real clearing?

> +   finish_decl_cleanup (NULL_TREE, call);
> + }
> +   else
> + finish_decl_cleanup (NULL_TREE, build_clobber_this ());
> + }
>  
>/* And insert cleanups for our bases and members so that they
>will be properly destroyed if we throw.  */
> diff --git a/gcc/testsuite/g++.dg/ubsan/vptr-12.C 
> b/gcc/testsuite/g++.dg/ubsan/vptr-12.C
> new file mode 100644
> index 000..96c8473d757
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ubsan/vptr-12.C
> @@ -0,0 +1,26 @@
> +// { dg-do run }
> +// { dg-shouldfail "ubsan" }
> +// { dg-options "-fsanitize=vptr -fno-sanitize-recover=vptr" }
> +
> +struct MyClass
> +{
> +  virtual ~MyClass () {}
> +  virtual void
> +  Doit ()
> +  {
> +  }

Why not put all the above 4 lines into a single one, the dtor already uses
that kind of formatting.

> +};
> +
> +int
> +main ()
> +{
> +  MyClass *c = new MyClass;
> +  c->~MyClass ();
> +  c->Doit ();
> +
> +  return 0;
> +}
> +
> +// { dg-output "\[^\n\r]*vptr-12.C:19:\[0-9]*: runtime error: member call on 
> address 0x\[0-9a-fA-F]* which does not point to an object of type 
> 'MyClass'(\n|\r\n|\r)" }
> +// { dg-output "0x\[0-9a-fA-F]*: note: object has invalid vptr(\n|\r\n|\r)" }
> +

Unnecessary empty line at end.

Jakub


Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Martin Liška

On 10/27/2017 12:52 PM, Jakub Jelinek wrote:

The decl.c change seems to be only an incremental change from a not publicly
posted patch rather than the full diff against trunk.


Sorry for that. Sending full patch.

Martin
From df0cc0c2da18b150b1f0fbef418450a223470d7f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 19 Oct 2017 11:10:19 +0200
Subject: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

gcc/cp/ChangeLog:

2017-10-27  Martin Liska  

	* decl.c (begin_destructor_body): In case of disabled recovery,
	we can zero object in order to catch virtual calls after
	an object lifetime.

gcc/testsuite/ChangeLog:

2017-10-27  Martin Liska  

	* g++.dg/ubsan/vptr-12.C: New test.
---
 gcc/cp/decl.c| 14 +-
 gcc/testsuite/g++.dg/ubsan/vptr-12.C | 26 ++
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/ubsan/vptr-12.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 42b52748e2a..69636e30008 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -15280,7 +15280,19 @@ begin_destructor_body (void)
   if (flag_lifetime_dse
 	  /* Clobbering an empty base is harmful if it overlays real data.  */
 	  && !is_empty_class (current_class_type))
-	finish_decl_cleanup (NULL_TREE, build_clobber_this ());
+	{
+	  if (sanitize_flags_p (SANITIZE_VPTR)
+	  && (flag_sanitize_recover & SANITIZE_VPTR) == 0)
+	{
+	  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMSET);
+	  tree call = build_call_expr (fndecl, 3,
+	   current_class_ptr, integer_zero_node,
+	   TYPE_SIZE_UNIT (current_class_type));
+	  finish_decl_cleanup (NULL_TREE, call);
+	}
+	  else
+	finish_decl_cleanup (NULL_TREE, build_clobber_this ());
+	}
 
   /* And insert cleanups for our bases and members so that they
 	 will be properly destroyed if we throw.  */
diff --git a/gcc/testsuite/g++.dg/ubsan/vptr-12.C b/gcc/testsuite/g++.dg/ubsan/vptr-12.C
new file mode 100644
index 000..96c8473d757
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ubsan/vptr-12.C
@@ -0,0 +1,26 @@
+// { dg-do run }
+// { dg-shouldfail "ubsan" }
+// { dg-options "-fsanitize=vptr -fno-sanitize-recover=vptr" }
+
+struct MyClass
+{
+  virtual ~MyClass () {}
+  virtual void
+  Doit ()
+  {
+  }
+};
+
+int
+main ()
+{
+  MyClass *c = new MyClass;
+  c->~MyClass ();
+  c->Doit ();
+
+  return 0;
+}
+
+// { dg-output "\[^\n\r]*vptr-12.C:19:\[0-9]*: runtime error: member call on address 0x\[0-9a-fA-F]* which does not point to an object of type 'MyClass'(\n|\r\n|\r)" }
+// { dg-output "0x\[0-9a-fA-F]*: note: object has invalid vptr(\n|\r\n|\r)" }
+
-- 
2.14.2



[wwwdocs] Document change that ipa-pure-const pass propagates malloc attribute.

2017-10-27 Thread Prathamesh Kulkarni
Applied the attached patch to changes.html.

Thanks,
Prathamesh
* changes.html: Document change that ipa-pure-const is extended to 
propagate malloc attribute.

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.15
diff -r1.15 changes.html
37c37,39
<   
---
>   The ipa-pure-const pass is extended to propagate malloc attribute, and 
> the
>   corresponding warning option Wsuggest-attribute=malloc emits a
>   diagnostic for a function, which can be annotated with malloc 
> attribute.


Re: [PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Jakub Jelinek
On Fri, Oct 27, 2017 at 12:47:12PM +0200, Martin Liška wrote:
> Hello.
> 
> This is a small improvement that can catch a virtual call after the
> lifetime of an object has ended.
> 
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?

The decl.c change seems to be only an incremental change from a not publicly
posted patch rather than the full diff against trunk.

> 2017-10-27  Martin Liska  
> 
>   * decl.c (begin_destructor_body): In case of disabled recovery,
>   we can zero object in order to catch virtual calls after
>   an object lifetime.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-10-27  Martin Liska  
> 
>   * g++.dg/ubsan/vptr-12.C: New test.
> ---
>  gcc/cp/decl.c|  3 ++-
>  gcc/testsuite/g++.dg/ubsan/vptr-12.C | 26 ++
>  2 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/ubsan/vptr-12.C
> 
> 

> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index 15a8d283353..69636e30008 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -15281,7 +15281,8 @@ begin_destructor_body (void)
> /* Clobbering an empty base is harmful if it overlays real data.  */
> && !is_empty_class (current_class_type))
>   {
> -   if (sanitize_flags_p (SANITIZE_VPTR))
> +   if (sanitize_flags_p (SANITIZE_VPTR)
> +   && (flag_sanitize_recover & SANITIZE_VPTR) == 0)
>   {
> tree fndecl = builtin_decl_explicit (BUILT_IN_MEMSET);
> tree call = build_call_expr (fndecl, 3,

Jakub


Re: [RFC] propagate malloc attribute in ipa-pure-const pass

2017-10-27 Thread Prathamesh Kulkarni
On 25 October 2017 at 20:44, Jan Hubicka  wrote:
>> On 24 October 2017 at 16:26, Jan Hubicka  wrote:
>> >> 2017-10-13  Prathamesh Kulkarni  
>> >>
>> >>   * cgraph.h (set_malloc_flag): Declare.
>> >>   * cgraph.c (set_malloc_flag_1): New function.
>> >>   (set_malloc_flag): Likewise.
>> >>   * ipa-fnsummary.h (ipa_call_summary): Add new field 
>> >> is_return_callee.
>> >>   * ipa-fnsummary.c (ipa_call_summary::reset): Set is_return_callee to
>> >>   false.
>> >>   (read_ipa_call_summary): Add support for reading is_return_callee.
>> >>   (write_ipa_call_summary): Stream is_return_callee.
>> >>   * ipa-inline.c (ipa_inline): Remove call to ipa_free_fn_summary.
>> >>   * ipa-pure-const.c: Add headers ssa.h, alloc-pool.h, 
>> >> symbol-summary.h,
>> >>   ipa-prop.h, ipa-fnsummary.h.
>> >>   (pure_const_names): Change to static.
>> >>   (malloc_state_e): Define.
>> >>   (malloc_state_names): Define.
>> >>   (funct_state_d): Add field malloc_state.
>> >>   (varying_state): Set malloc_state to STATE_MALLOC_BOTTOM.
>> >>   (check_retval_uses): New function.
>> >>   (malloc_candidate_p): Likewise.
>> >>   (analyze_function): Add support for malloc attribute.
>> >>   (pure_const_write_summary): Stream malloc_state.
>> >>   (pure_const_read_summary): Add support for reading malloc_state.
>> >>   (dump_malloc_lattice): New function.
>> >>   (propagate_malloc): New function.
>> >>   (ipa_pure_const::execute): Call propagate_malloc and
>> >>   ipa_free_fn_summary.
>> >>   (pass_local_pure_const::execute): Add support for malloc attribute.
>> >>   * ssa-iterators.h (RETURN_FROM_IMM_USE_STMT): New macro.
>> >>
>> >> testsuite/
>> >>   * gcc.dg/ipa/propmalloc-1.c: New test-case.
>> >>   * gcc.dg/ipa/propmalloc-2.c: Likewise.
>> >>   * gcc.dg/ipa/propmalloc-3.c: Likewise.
>> >
>> > OK.
>> > Perhaps we could also add -Wsuggest-sttribute=malloc and mention it in 
>> > changes.html?
>> Done in this version.
>> In warn_function_malloc(), I passed false for known_finite param to
>> suggest_attribute().
>> Does that look OK ?
>> Validation in progress. OK to commit if passes ?
>
> OK, thanks!
Thanks, committed as r254140 after following validation:
1] Bootstrap+test with --enable-languages=all,ada,go on
x86_64-unknown-linux-gnu and ppc64le-linux-gnu.
2] LTO bootstrap+test on x86_64-unknown-linux-gnu and ppc64le-linux-gnu
3] Cross tested on arm*-*-* and aarch64*-*-*.

Would it be a good idea to extend ipa-pure-const to propagate
alloc_size/alloc_align and returns_nonnull attributes?
Which other attributes would be useful to propagate in ipa-pure-const?

Also, I would be grateful for suggestions on how to propagate malloc
attribute through indirect calls.
Currently, I have left it as FIXME, and simply drop the lattice to
MALLOC_BOTTOM if there's an indirect call :(

I am not able to understand how attribute propagation across indirect
calls works.
For example consider propagation of nothrow attribute in following test-case:

__attribute__((noinline, noclone, nothrow))
int f1(int k) { return k; }

__attribute__((noinline, noclone))
static int foo(int (*p)(int))
{
  return p(10);
}

__attribute__((noinline, noclone))
int bar(void)
{
  return foo(f1);
}

Shouldn't foo and bar be also marked as nothrow ?
Since foo indirectly calls f1 which is nothrow and bar only calls foo ?

The local-pure-const2 dump shows "function is locally throwing" for
"foo" and "bar", and the ipa-pure-const dump doesn't appear to show foo and bar
marked as nothrow.

Thanks,
Prathamesh
> Honza


[PATCH] Zero vptr in dtor for -fsanitize=vptr.

2017-10-27 Thread Martin Liška

Hello.

This is a small improvement that can catch a virtual call after the
lifetime of an object has ended.


Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/cp/ChangeLog:

2017-10-27  Martin Liska  

* decl.c (begin_destructor_body): In case of disabled recovery,
we can zero object in order to catch virtual calls after
an object lifetime.

gcc/testsuite/ChangeLog:

2017-10-27  Martin Liska  

* g++.dg/ubsan/vptr-12.C: New test.
---
 gcc/cp/decl.c|  3 ++-
 gcc/testsuite/g++.dg/ubsan/vptr-12.C | 26 ++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/ubsan/vptr-12.C


diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 15a8d283353..69636e30008 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -15281,7 +15281,8 @@ begin_destructor_body (void)
 	  /* Clobbering an empty base is harmful if it overlays real data.  */
 	  && !is_empty_class (current_class_type))
 	{
-	  if (sanitize_flags_p (SANITIZE_VPTR))
+	  if (sanitize_flags_p (SANITIZE_VPTR)
+	  && (flag_sanitize_recover & SANITIZE_VPTR) == 0)
 	{
 	  tree fndecl = builtin_decl_explicit (BUILT_IN_MEMSET);
 	  tree call = build_call_expr (fndecl, 3,
diff --git a/gcc/testsuite/g++.dg/ubsan/vptr-12.C b/gcc/testsuite/g++.dg/ubsan/vptr-12.C
new file mode 100644
index 000..96c8473d757
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ubsan/vptr-12.C
@@ -0,0 +1,26 @@
+// { dg-do run }
+// { dg-shouldfail "ubsan" }
+// { dg-options "-fsanitize=vptr -fno-sanitize-recover=vptr" }
+
+struct MyClass
+{
+  virtual ~MyClass () {}
+  virtual void
+  Doit ()
+  {
+  }
+};
+
+int
+main ()
+{
+  MyClass *c = new MyClass;
+  c->~MyClass ();
+  c->Doit ();
+
+  return 0;
+}
+
+// { dg-output "\[^\n\r]*vptr-12.C:19:\[0-9]*: runtime error: member call on address 0x\[0-9a-fA-F]* which does not point to an object of type 'MyClass'(\n|\r\n|\r)" }
+// { dg-output "0x\[0-9a-fA-F]*: note: object has invalid vptr(\n|\r\n|\r)" }
+



Re: Adjust empty class parameter passing ABI (PR c++/60336)

2017-10-27 Thread Richard Biener
On Fri, 27 Oct 2017, Jakub Jelinek wrote:

> On Fri, Oct 27, 2017 at 12:31:46PM +0200, Richard Biener wrote:
> > I fear it doesn't work at all with LTO (you'll always get the old ABI
> > if I read the patch correctly).  This is because the function
> > computing the size looks at flag_abi_version which isn't saved
> > per function / TU.
> > 
> > Similarly you'll never get the ABI warning with LTO (less of a big
> > deal of course) because the langhook doesn't reflect things correctly
> > either.
> > 
> > So...  can we instead compute whether a type is "empty" according
> > to the ABI early and store the result in the type (thinking of
> > doing this in layout_type?).  Similarly set a flag whether to
> > warn.  Why do you warn from backends / code emission and not
> > from the FEs?  Is that to avoid warnings for calls that got inlined?
> > Maybe the FE could set a flag on the call itself (ok, somewhat
> > awkward to funnel through gimple).
> 
> Warning in the FE is too early both because of the inlining, never
> emitted functions and because whether an empty struct is passed differently
> from the past matters on the backend (whether its psABI says it should be
> not passed at all or not).
> 
> Perhaps if empty types are rare enough it could be an artificial attribute
> on the type if we can't get a spare bit for that.  But computing in the FE
> or before free_lang_data and saving on the type whether it is empty or not
> seems reasonable to me.

There are 18 unused bits in tree_type_common if we don't want to re-use
any.  For the warning I first thought of setting TREE_NO_WARNING on it
but that bit is used already.  OTOH given the "fit" of TREE_NO_WARNING
I'd move TYPE_ARTIFICIAL somewhere else.

Richard.


Disable partial reg dependencies for haswell+

2017-10-27 Thread Jan Hubicka
Hi,
while looking for x86 tuning issues I noticed PR81614 about partial register
stalls on core.  We currently support two schemes for out-of-order CPUs:
partial register dependencies, where registers are always renamed as a whole
and thus it is important to always write the complete register at the beginning
of a dependency chain, and partial register stalls, where registers are renamed
by parts and it is important not to read the full size after a partial store.

Core renames partial registers, like PentiumPro+, but it is currently set
to partial reg dependency.

The PR log also claims that there was a change in Haswell that avoids the
partial register stalls completely (how?).  This is per Agner Fog's manual, and
I have verified that dropping the partial register dependency on Haswell
produces no regressions and slightly reduces code size.

I plan to experiment with switching pre-Haswell cores to partial register
stalls but need to find a setup for benchmarking (Vladimir is still running a
regular tester on Conroe).  But I plan to do that incrementally.

Because AMD chips are all partial reg dependency, we will probably need to
find a way to avoid both on the code sequences mentioned in the PRs.  This
is another incremental step.

Bootstrapped/regtested x86_64-linux and benchmarked on Haswell on Spec2k,
spec2k6, C++ benchmarks, polyhedron and my own microbenchmarks which I developed
for partial register stalls/dependencies at the PPro/K7 times.

I have also noticed:
#define m_CORE_ALL (m_CORE2 | m_NEHALEM  | m_SANDYBRIDGE | m_HASWELL)
#define m_SKYLAKE_AVX512 (1U<

Re: Adjust empty class parameter passing ABI (PR c++/60336)

2017-10-27 Thread Jakub Jelinek
On Fri, Oct 27, 2017 at 12:31:46PM +0200, Richard Biener wrote:
> I fear it doesn't work at all with LTO (you'll always get the old ABI
> if I read the patch correctly).  This is because the function
> computing the size looks at flag_abi_version which isn't saved
> per function / TU.
> 
> Similarly you'll never get the ABI warning with LTO (less of a big
> deal of course) because the langhook doesn't reflect things correctly
> either.
> 
> So...  can we instead compute whether a type is "empty" according
> to the ABI early and store the result in the type (thinking of
> doing this in layout_type?).  Similarly set a flag whether to
> warn.  Why do you warn from backends / code emission and not
> from the FEs?  Is that to avoid warnings for calls that got inlined?
> Maybe the FE could set a flag on the call itself (ok, somewhat
> awkward to funnel through gimple).

Warning in the FE is too early both because of the inlining, never
emitted functions and because whether an empty struct is passed differently
from the past matters on the backend (whether its psABI says it should be
not passed at all or not).

Perhaps if empty types are rare enough it could be an artificial attribute
on the type if we can't get a spare bit for that.  But computing in the FE
or before free_lang_data and saving on the type whether it is empty or not
seems reasonable to me.

Jakub


Re: Adjust empty class parameter passing ABI (PR c++/60336)

2017-10-27 Thread Richard Biener
On Fri, 27 Oct 2017, Marek Polacek wrote:

> This is my attempt at the empty class ABI change.  To recap quickly, the C++
> compiler has used a different calling convention for passing empty classes,
> because C++ says they have size 1, while the GCC C extension gives them size 
> 0.
> But this difference doesn't mean that they need to be passed differently; in
> either case there's no actual data involved.
> 
> I've made use of all the previous patches:
> 
> 
> 
> but this approach uses two target hooks which check whether a type is an empty
> type according to the x86_64 psABI, and the second implements the warning for 
> it.
> It also uses a lang hook to determine whether to print a -Wabi warning.  The
> new passing can be turned back into the old passing using -fabi-version=11.  
> So
> I had to use the new langhook, otherwise I wouldn't be able to make it
> dependent on the C++ ABI version.  The
> 
> Some earlier comments from Jason:
> > I'm still uneasy about how much this requires generic code to think
> > about empty types specifically.  I'd prefer to encapsulate this as
> > much as possible.  Rather than places saying (empty ? 0 :
> > int_size_in_bytes), I figured that would all be hidden in the target
> > code, along with the warning.  Places where you currently emit a
> > warning from generic code ought to come from a target hook, either an
> > existing one or a new one called something like
> > warn_parameter_passing_abi.
>  
> I hope I've improved this now.  I've introduced two new wrappers,
> maybe_empty_type_size, and int_maybe_empty_type_size.  I've moved the
> warning to its own target hook with a new field in CUMULATIVE_ARGS; that
> seems to work well.
> 
> > Note that nothing in gcc/ currently refers to warn_abi or warn_psabi,
> > which are both c-common flags; some targets refer to warn_psabi.
>  
> True.
> 
> > > It also uses a lang hook to determine
> > > whether to print a -Wabi warning.  The new passing can be turned back into
> > > the old passing using -fabi-version=11.  So I had to use a new langhook,
> > otherwise I couldn't make it dependent on the C++ ABI version.
> > 
> > Hmm, that's unfortunate.
> 
> It is, but the possibility of using -fabi-version=11 to revert to the old
> behavior seems useful.
> 
> > > The users will get the new empty records passing even with -m32.
> > 
> > I thought we only wanted to make the change for -m64; -m32 is a legacy
> > ABI at this point, it doesn't seem advisable to change it.
> 
> Sure thing, done (using the TARGET_64BIT check in ix86_is_empty_record_p).
> 
> > > One thing I wasn't sure about is what to do with a struct that contains
> > > a zero-length array.  We should consider it an empty type, but my
> > > is_empty_record_p won't do that, the reason is that there doesn't seem
> > > to be any change regarding passing: older GCCs wouldn't pass such a struct
> > > via stack.  So there's no warning for that, either.
> > 
> > Not warning for that makes sense.  Doing that by making
> > is_empty_record_p give the wrong answer seems unwise.
>  
> is_empty_record_p should now give the correct answer.  As I said yesterday,
> I think we shouldn't consider struct S { struct { } a; int b[0]; } as empty,
> because [0] is a GNU extension and struct S { struct { } a; int b[]; } is
> considered non-empty.  Also, for the record, the struct with the flexible 
> array
> member [0] broke struct-layout when considered empty.  Thus the uglifying of
> is_empty_record_p.  empty23.C and empty24.C test that.
> 
> Also, you need to handle unnamed bit-fields, as they aren't data members.
> 
> Here's another hiccup: e.g. struct S { unsigned : 15; }; is considered empty,
> but it was considered empty even before my change (so we probably shouldn't
> warn); with "unsigned : 0;" it's different and there seems to be a change in
> how we pass that.  So I kept the warning.  Not sure how important this is.
> 
> Any other concerns?

I fear it doesn't work at all with LTO (you'll always get the old ABI
if I read the patch correctly).  This is because the function
computing the size looks at flag_abi_version which isn't saved
per function / TU.

Similarly you'll never get the ABI warning with LTO (less of a big
deal of course) because the langhook doesn't reflect things correctly
either.

So...  can we instead compute whether a type is "empty" according
to the ABI early and store the result in the type (thinking of
doing this in layout_type?).  Similarly set a flag whether to
warn.  Why do you warn from backends / code emission and not
from the FEs?  Is that to avoid warnings for calls that got inlined?
Maybe the FE could set a flag on the call itself (ok, somewhat
awkward to funnel through gimple).

Richard.

> Bootstrapped/regtested on x86_64-linux, ppc64-linux, and aarch64-linux.
> 
> 2017-10-27  

[PATCH, Fortran, v1] Clarify error message of co_reduce

2017-10-27 Thread Andre Vehreschild
Hi all,

as noted on IRC, one of the error messages in check.c's co_reduce check is
misleading.  The attached patch fixes this.  The intention of the error message
is to say that the argument bound to parameter A is of the wrong type, not
that some unspecified argument has a wrong type.

If no one objects within 24 h I am planning to commit this patch as obvious to
trunk gcc-7. Bootstrapped and regtested on x86_64-linux-gnu/f25.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
gcc/fortran/ChangeLog:

2017-10-27  Andre Vehreschild  

* check.c (gfc_check_co_reduce): Clarify error message.

diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c
index 681950e..759c15a 100644
--- a/gcc/fortran/check.c
+++ b/gcc/fortran/check.c
@@ -1731,7 +1731,7 @@ gfc_check_co_reduce (gfc_expr *a, gfc_expr *op, gfc_expr *result_image,
 
   if (!gfc_compare_types (&a->ts, &sym->result->ts))
     {
-  gfc_error ("A argument at %L has type %s but the function passed as "
+  gfc_error ("The A argument at %L has type %s but the function passed as "
 		 "OPERATOR at %L returns %s",
 		 &a->where, gfc_typename (&a->ts), &op->where,
 		 gfc_typename (&sym->result->ts));


Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-27 Thread Pedro Alves
On 10/27/2017 09:35 AM, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 9:43 PM, Jakub Jelinek  wrote:
>> On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:

>>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>>> that would be required?
>>
>> I think it is too early for that, we aren't LLVM or Rust that don't really
>> care about what build requirements they impose on users.
> 
> That's true, which is why I asked.  For me requiring sth newer than GCC 4.8
> would be a blocker given that's the system compiler on our latest server
> (and "stable" OSS) product.
> 
> I guess it depends on the amount of pain we have going forward with C++
> use in GCC.  Given that gdb already requires C++11 people building
> GCC are likely already experiencing the "issue".

Right, GDB's baseline is GCC 4.8 too.  When GDB was deciding whether
to start requiring full C++11 (about a year ago), we looked at the
latest stable release of all the "big" distros to see whether:

#1 - the system compiler was new enough (gcc >= 4.8), or failing
 that,
#2 - whether there's an easy to install package providing a
 new-enough compiler.

and it turns out that that was true for all.  Meanwhile another year
has passed and there have been no complaints.

Thanks,
Pedro Alves



Adjust empty class parameter passing ABI (PR c++/60336)

2017-10-27 Thread Marek Polacek
This is my attempt at the empty class ABI change.  To recap quickly, the C++
compiler has used a different calling convention for passing empty classes,
because C++ says they have size 1, while the GCC C extension gives them size 0.
But this difference doesn't mean that they need to be passed differently; in
either case there's no actual data involved.

I've made use of all the previous patches:



but this approach uses two target hooks which check whether a type is an empty
type according to the x86_64 psABI, and the second implements the warning for 
it.
It also uses a lang hook to determine whether to print a -Wabi warning.  The
new passing can be turned back into the old passing using -fabi-version=11.  So
I had to use the new langhook, otherwise I wouldn't be able to make it dependent
on the C++ ABI version.

Some earlier comments from Jason:
> I'm still uneasy about how much this requires generic code to think
> about empty types specifically.  I'd prefer to encapsulate this as
> much as possible.  Rather than places saying (empty ? 0 :
> int_size_in_bytes), I figured that would all be hidden in the target
> code, along with the warning.  Places where you currently emit a
> warning from generic code ought to come from a target hook, either an
> existing one or a new one called something like
> warn_parameter_passing_abi.
 
I hope I've improved this now.  I've introduced two new wrappers,
maybe_empty_type_size, and int_maybe_empty_type_size.  I've moved the
warning to its own target hook with a new field in CUMULATIVE_ARGS; that
seems to work well.

> Note that nothing in gcc/ currently refers to warn_abi or warn_psabi,
> which are both c-common flags; some targets refer to warn_psabi.
 
True.

> > It also uses a lang hook to determine
> > whether to print a -Wabi warning.  The new passing can be turned back into
> > the old passing using -fabi-version=11.  So I had to use a new langhook,
> > otherwise I couldn't make it dependent on the C++ ABI version.
> 
> Hmm, that's unfortunate.

It is, but the possibility of using -fabi-version=11 to revert to the old
behavior seems useful.

> > The users will get the new empty records passing even with -m32.
> 
> I thought we only wanted to make the change for -m64; -m32 is a legacy
> ABI at this point, it doesn't seem advisable to change it.

Sure thing, done (using the TARGET_64BIT check in ix86_is_empty_record_p).

> > One thing I wasn't sure about is what to do with a struct that contains
> > a zero-length array.  We should consider it an empty type, but my
> > is_empty_record_p won't do that, the reason is that there doesn't seem
> > to be any change regarding passing: older GCCs wouldn't pass such a struct
> > via stack.  So there's no warning for that, either.
> 
> Not warning for that makes sense.  Doing that by making
> is_empty_record_p give the wrong answer seems unwise.
 
is_empty_record_p should now give the correct answer.  As I said yesterday,
I think we shouldn't consider struct S { struct { } a; int b[0]; } as empty,
because [0] is a GNU extension and struct S { struct { } a; int b[]; } is
considered non-empty.  Also, for the record, treating the struct with the [0]
array member as empty broke struct-layout.  Hence the uglifying of
is_empty_record_p.  empty23.C and empty24.C test that.

> Also, you need to handle unnamed bit-fields, as they aren't data members.

Here's another hiccup: structs such as struct S { unsigned : 15; }; are
considered empty, but they were considered empty even before my change, so we
probably shouldn't warn.  With "unsigned : 0;" it's different: there seems to
be a change in how we pass that, so I kept the warning.  Not sure how
important this is.

Any other concerns?

Bootstrapped/regtested on x86_64-linux, ppc64-linux, and aarch64-linux.

2017-10-27  Marek Polacek  
H.J. Lu  
Jason Merrill  

	PR c++/60336
	PR middle-end/67239
	PR target/68355
	* calls.c (initialize_argument_information): Call
	warn_parameter_passing_abi target hook.
	(store_one_arg): Use 0 for empty record size.  Don't push 0 size
	argument onto stack.
	(must_pass_in_stack_var_size_or_pad): Return false for empty types.
	* common.opt: Update -fabi-version description.
	* config/i386/i386.c (init_cumulative_args): Set cum->warn_empty.
	(ix86_function_arg_advance): Skip empty records.
	(ix86_return_in_memory): Return false for empty types.
	(ix86_gimplify_va_arg): Call int_maybe_empty_type_size instead of
	int_size_in_bytes.
	(ix86_is_empty_record_p): New function.
	(ix86_warn_parameter_passing_abi): New function.
	(TARGET_EMPTY_RECORD_P): Redefine.

[C++ Patch, obvious] Change invalid_nontype_parm_type_p to return a bool

2017-10-27 Thread Paolo Carlini

Hi,

today I noticed once more that this function still pointlessly returns an
int instead of a bool.  Unless somebody complains, I'm going to apply
the below.  Tested x86_64-linux.


Thanks, Paolo.



2017-10-27  Paolo Carlini  

	* pt.c (invalid_nontype_parm_type_p): Return a bool instead of an int.
Index: pt.c
===================================================================
--- pt.c	(revision 254136)
+++ pt.c	(working copy)
@@ -203,7 +203,7 @@ static void tsubst_default_arguments (tree, tsubst
 static tree for_each_template_parm_r (tree *, int *, void *);
 static tree copy_default_args_to_explicit_spec_1 (tree, tree);
 static void copy_default_args_to_explicit_spec (tree);
-static int invalid_nontype_parm_type_p (tree, tsubst_flags_t);
+static bool invalid_nontype_parm_type_p (tree, tsubst_flags_t);
 static bool dependent_template_arg_p (tree);
 static bool any_template_arguments_need_structural_equality_p (tree);
 static bool dependent_type_p_r (tree);
@@ -23618,31 +23618,31 @@ instantiating_current_function_p (void)
 }
 
 /* [temp.param] Check that template non-type parm TYPE is of an allowable
-   type. Return zero for ok, nonzero for disallowed. Issue error and
-   warning messages under control of COMPLAIN.  */
+   type.  Return false for ok, true for disallowed.  Issue error and
+   inform messages under control of COMPLAIN.  */
 
-static int
+static bool
 invalid_nontype_parm_type_p (tree type, tsubst_flags_t complain)
 {
   if (INTEGRAL_OR_ENUMERATION_TYPE_P (type))
-return 0;
+return false;
   else if (POINTER_TYPE_P (type))
-return 0;
+return false;
   else if (TYPE_PTRMEM_P (type))
-return 0;
+return false;
   else if (TREE_CODE (type) == TEMPLATE_TYPE_PARM)
-return 0;
+return false;
   else if (TREE_CODE (type) == TYPENAME_TYPE)
-return 0;
+return false;
   else if (TREE_CODE (type) == DECLTYPE_TYPE)
-return 0;
+return false;
   else if (TREE_CODE (type) == NULLPTR_TYPE)
-return 0;
+return false;
   /* A bound template template parm could later be instantiated to have a valid
  nontype parm type via an alias template.  */
   else if (cxx_dialect >= cxx11
   && TREE_CODE (type) == BOUND_TEMPLATE_TEMPLATE_PARM)
-return 0;
+return false;
 
   if (complain & tf_error)
 {
@@ -23652,7 +23652,7 @@ invalid_nontype_parm_type_p (tree type, tsubst_fla
error ("%q#T is not a valid type for a template non-type parameter",
   type);
 }
-  return 1;
+  return true;
 }
 
 /* Returns TRUE if TYPE is dependent, in the sense of [temp.dep.type].


Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-27 Thread Eric Botcazou
> There's always the possibility of building GCC 4.8 with the other compiler
> and then GCC 9+ (?) with GCC 4.8.

What a user-friendly solution...

> What's the list of other compilers people routinely use?  I see various
> comments on other compilers in install.texi but those are already saying
> those cannot be used to build GCC but you need to build an older GCC first
> (like xlc or the HP compiler).

I read the opposite for XLC:

"GCC can bootstrap with recent versions of IBM XLC, but bootstrapping with an 
earlier release of GCC is recommended."

I think that the major supported compilers are IBM, Sun/Oracle and LLVM.

-- 
Eric Botcazou


Re: [PATCH][AArch64] Simplify frame layout for stack probing

2017-10-27 Thread James Greenhalgh
On Thu, Oct 26, 2017 at 04:19:35PM +0100, James Greenhalgh wrote:
> On Tue, Jul 25, 2017 at 02:58:04PM +0100, Wilco Dijkstra wrote:
> > This patch makes some changes to the frame layout in order to simplify
> > stack probing.  We want to use the save of LR as a probe in any non-leaf
> > function.  With shrinkwrapping we may only save LR before a call, so it
> > is useful to define a fixed location in the callee-saves. So force LR at
> > the bottom of the callee-saves even with -fomit-frame-pointer.
> > 
> > Also remove a rarely used frame layout that saves the callee-saves first
> > with -fomit-frame-pointer.
> > 
> > OK for commit (and backport to GCC7)?
> 
> OK. Leave it a week before backporting.

This caused:

  Failures:
gcc.target/aarch64/test_frame_4.c
gcc.target/aarch64/test_frame_2.c
gcc.target/aarch64/test_frame_7.c
gcc.target/aarch64/test_frame_10.c

  Bisected to: 

  Author: wilco
  Date:   Thu Oct 26 16:40:25 2017 +

Simplify frame layout for stack probing

This patch makes some changes to the frame layout in order to simplify
stack probing.  We want to use the save of LR as a probe in any non-leaf
function.  With shrinkwrapping we may only save LR before a call, so it
is useful to define a fixed location in the callee-saves. So force LR at
the bottom of the callee-saves even with -fomit-frame-pointer.

Also remove a rarely used frame layout that saves the callee-saves first
with -fomit-frame-pointer.  Doing so allows the store of LR to be used as
a valid stack probe in all frames.

gcc/
* config/aarch64/aarch64.c (aarch64_layout_frame):
Ensure LR is always stored at the bottom of the callee-saves.
Remove rarely used frame layout which saves callee-saves at top of
frame, so the store of LR can be used as a valid probe in all cases.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@254112

Please look into this.

This will also block the request to backport the patch until after the
failures have been resolved.

There's no reason we shouldn't be catching bugs like this (simple
scan-assembler tests which have been in the port for years, that will
obviously never pass after your changes) before the patch makes it to
trunk. How was this patch tested?

Thanks,
James



Re: [PATCH 06/13] remove sdb and -gcoff from non-target files

2017-10-27 Thread Richard Biener
On Fri, Oct 27, 2017 at 12:12 AM, Jim Wilson  wrote:
> On Thu, 2017-10-26 at 11:38 +0200, Richard Biener wrote:
>> You can eventually keep the option, marking it as Ignore (like we do
>> for options we remove but "keep" for backward compatibility).  The
>> diagnostic (as warning, given the option will be just ignored) could
>> be emited from option processing in opts.c then.
>
> I seriously doubt that anyone will miss the -gcoff option.  The last
> bug report I can find is
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=9963
> which was fixed in 2005.  There is also a bug report from 2004
> https://gcc.gnu.org/ml/gcc/2004-06/msg00708.html
> which suggests it should just be removed instead of fixed.
>
> I see Kai Tietz fixing some bugs in sdbout in 2014, but that is only
> because he was doing cygwin maintenance, and these problems turned up
> during testsuite debug torture testing.  So it wasn't an end user
> problem.  Also, in this thread, there are questions about why we don't
> just delete it instead.
>
> If we ignore the option, we can't have code in opts.c to emit a warning
> for it, but we can put a warning in the common.opt file.  I tried this
> and ran into a minor problem which is that the code to check the debug
> level only works for options that exist.  So I get
>
> palantir:2277$ ./xgcc -B./ -O -S -gcoff tmp.c
> xgcc: warning: switch ‘-gcoff’ no longer supported
> palantir:2278$ ./xgcc -B./ -O -S -gcoff3 tmp.c
> xgcc: warning: switch ‘-gcoff3’ no longer supported
> palantir:2279$ ./xgcc -B./ -O -S -gcofffoo tmp.c
> xgcc: warning: switch ‘-gcofffoo’ no longer supported
> palantir:2280$
>
> The last one has never been a valid option.  If we don't care about
> this, then the attached patch works.
>
> Otherwise I think I have to add 4 stanzas for the four valid options,
> -gcoff, -gcoff1, -gcoff2, and -gcoff3.  I'd rather not do that.  Or
> leave -gcoff in as a supported option and ignore it in opts.c, which I
> would also rather not do. I just want it gone.  I can live with the
> ignored option.
>
> OK?

Does

gcoff
Common Driver JoinedOrMissing Ignore Warn(switch %qs no longer supported)
Does nothing.  Preserved for backward compatibility.

gcoff1
Common Driver Alias(gcoff)

gcoff2
Common Driver Alias(gcoff)

gcoff3
Common Driver Alias(gcoff)

work to that effect?  Not sure if we really should care ;)

I'm ok with your patch as approved or the Alias variant if it
avoids the odd warnings for options that never existed
and you're fine with the reduced duplication.

Thanks,
Richard.

> Jim
>
> 2017-10-26  Jim Wilson  
>
> gcc/
> * common.opt (gcoff): Re-add as ignored option.
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 25e86ec..c248d95 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2868,6 +2868,10 @@ g
>  Common Driver RejectNegative JoinedOrMissing
>  Generate debug information in default format.
>
> +gcoff
> +Common Driver JoinedOrMissing Ignore Warn(switch %qs no longer supported)
> +Does nothing.  Preserved for backward compatibility.
> +
>  gcolumn-info
>  Common Driver Var(debug_column_info,1) Init(1)
>  Record DW_AT_decl_column and DW_AT_call_column in DWARF.
>


Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-27 Thread Jakub Jelinek
On Fri, Oct 27, 2017 at 10:35:56AM +0200, Richard Biener wrote:
> > I think it is too early for that, we aren't LLVM or Rust that don't really
> > care about what build requirements they impose on users.
> 
> That's true, which is why I asked.  For me requiring sth newer than GCC 4.8
> would be a blocker given that's the system compiler on our latest server
> (and "stable" OSS) product.
> 
> I guess it depends on the amount of pain we have going forward with C++
> use in GCC.  Given that gdb already requires C++11 people building
> GCC are likely already experiencing the "issue".

Well, they can always start by building a new GCC and then build GDB with
it.  If they'd need to build an intermediate, already unsupported, GCC in
between as well, it might be a bigger pain.
GCC 4.8 as system compiler certainly needs to be supported, it is still
heavily used in the wild, but I'd say even e.g. GCC 4.4 or 4.3 isn't
something that can be ignored.  And there are also non-GCC system compilers
we need to cope with.

Jakub


Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-27 Thread Richard Biener
On Thu, Oct 26, 2017 at 9:43 PM, Jakub Jelinek  wrote:
> On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
>> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
>>  wrote:
>> > Richard Biener  writes:
>> >> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>> >>  wrote:
>> >>> This patch adds a POD version of fixed_size_mode.  The only current use
>> >>> is for storing the __builtin_apply and __builtin_result register modes,
>> >>> which were made fixed_size_modes by the previous patch.
>> >>
>> >> Bah - can we update our host compiler to C++11/14 please ...?
>> >> (maybe requiring that build with GCC 4.8 as host compiler works,
>> >> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>> >
>> > That'd be great :-)  It would avoid all the poly_int_pod stuff too,
>> > and allow some clean-up of wide-int.h.
>>
>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>> that would be required?
>
> I think it is too early for that, we aren't LLVM or Rust that don't really
> care about what build requirements they impose on users.

That's true, which is why I asked.  For me requiring sth newer than GCC 4.8
would be a blocker given that's the system compiler on our latest server
(and "stable" OSS) product.

I guess it depends on the amount of pain we have going forward with C++
use in GCC.  Given that gdb already requires C++11 people building
GCC are likely already experiencing the "issue".

Richard.

> Jakub


Re: [PATCH] Document --coverage and fork-like functions (PR gcov-profile/82457).

2017-10-27 Thread Martin Liška

On 10/26/2017 06:30 PM, Sandra Loosemore wrote:

On 10/26/2017 01:21 AM, Martin Liška wrote:

On 10/20/2017 06:03 AM, Sandra Loosemore wrote:

On 10/19/2017 12:26 PM, Eric Gallager wrote:

On 10/19/17, Martin Liška  wrote:

Hi.

As discussed in the PR, we should be more precise in our documentation.
The patch does that.

Ready for trunk?
Martin

gcc/ChangeLog:

2017-10-19  Martin Liska  

 PR gcov-profile/82457
 * doc/invoke.texi: Document that one needs a non-strict ISO mode
 for fork-like functions to be properly instrumented.
---
   gcc/doc/invoke.texi | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)





The wording is kinda unclear because the modes in the parentheses are
all strict ISO modes, but the part before the parentheses says
NON-strict... I think you either need an additional "not" inside the
parentheses, or to change all the instances of -std=c* to -std=gnu*.


The wording in the patch doesn't make sense to me, either.  If I understand the 
issue correctly, the intent is probably to say something like

Unless a strict ISO C dialect option is in effect,
@code{fork} calls are detected and correctly handled without double counting.

??


Hi Sandra.

Thank you for the feedback; I'm sending the version you suggested.  Hope it's
fine to install the patch?


Ummm, no.  Sorry to have been unclear; the wording I suggested above was 
intended to replace the existing sentence about fork behavior, not to be 
appended to it.

-Sandra



Sorry, stupid mistake. Should be fixed, I'm going to install the patch.

Martin
>From 30a3987aae8d306764bf38037f3f197cfe0d2aef Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 19 Oct 2017 12:18:45 +0200
Subject: [PATCH] Document --coverage and fork-like functions (PR
 gcov-profile/82457).

gcc/ChangeLog:

2017-10-19  Martin Liska  

	PR gcov-profile/82457
	* doc/invoke.texi: Document that one needs a non-strict ISO mode
	for fork-like functions to be properly instrumented.
---
 gcc/doc/invoke.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 71b2445f70f..6ca59baac67 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10862,9 +10862,9 @@ Link your object files with @option{-lgcov} or @option{-fprofile-arcs}
 Run the program on a representative workload to generate the arc profile
 information.  This may be repeated any number of times.  You can run
 concurrent instances of your program, and provided that the file system
-supports locking, the data files will be correctly updated.  Also
-@code{fork} calls are detected and correctly handled (double counting
-will not happen).
+supports locking, the data files will be correctly updated.  Unless
+a strict ISO C dialect option is in effect, @code{fork} calls are
+detected and correctly handled without double counting.
 
 @item
 For profile-directed optimizations, compile the source files again with
-- 
2.14.2



Re: [09/nn] Add a fixed_size_mode_pod class

2017-10-27 Thread Richard Biener
On Thu, Oct 26, 2017 at 9:37 PM, Eric Botcazou  wrote:
>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>> that would be required?
>
> GCC needs to be buildable by other compilers than itself though.

There's always the possibility of building GCC 4.8 with the other compiler and
then GCC 9+ (?) with GCC 4.8.

What's the list of other compilers people routinely use?  I see various comments
on other compilers in install.texi but those are already saying those cannot be
used to build GCC but you need to build an older GCC first (like xlc or the HP
compiler).

Richard.

> --
> Eric Botcazou

