Re: [PATCH version 4, rs6000] Add builtins to convert from float/double to int/long using current rounding mode

2017-09-22 Thread Segher Boessenkool
Hi Carl,

Some comments, mostly trivial:

On Thu, Sep 21, 2017 at 04:09:11PM -0700, Carl Love wrote:
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -608,6 +608,16 @@
>   CODE_FOR_ ## ICODE) /* ICODE */
>  
>  
> +/* Miscellaneous builtins for instructions added prior to ISA 2.04.  These
> +  operate on floating point registers.  */

"operate" should align with "Miscellaneous".

> +#define BU_FP_MISC_1(ENUM, NAME, ATTR, ICODE)
> \
> +  RS6000_BUILTIN_1 (MISC_BUILTIN_ ## ENUM,   /* ENUM */  \
> + "__builtin_" NAME,  /* NAME */  \
> +RS6000_BTM_HARD_FLOAT,   /* MASK */  \
> + (RS6000_BTC_ ## ATTR/* ATTR */  \
> +  | RS6000_BTC_UNARY),   \
> + CODE_FOR_ ## ICODE) /* ICODE */

RS6000_BTM_HARD_FLOAT needs to be indented one extra space, too.

> +(define_insn "lrintsfsi2"
> +  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
> + (unspec:SI [(match_operand:DF 1 "gpc_reg_operand" "d")]
> +UNSPEC_FCTIW))]
> +  "TARGET_SF_FPR && TARGET_FPRND"
> +  "fctiw %0,%1"
> +  [(set_attr "type" "fp")])

Should this be match_operand:SF?
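
(For reference, a minimal sketch of the pattern with the mode changed as
suggested, assuming the constraint and the rest of the pattern stay the same:)

(define_insn "lrintsfsi2"
  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
	(unspec:SI [(match_operand:SF 1 "gpc_reg_operand" "d")]
		   UNSPEC_FCTIW))]
  "TARGET_SF_FPR && TARGET_FPRND"
  "fctiw %0,%1"
  [(set_attr "type" "fp")])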

Okay for trunk if that works.  Thanks,


Segher


Re: [PATCH, rs6000] folding of vector stores in GIMPLE

2017-09-22 Thread Segher Boessenkool
On Fri, Sep 22, 2017 at 02:58:54PM -0500, Bill Schmidt wrote:
> OK.  Will, for now, let's again use the (void *) solution for the time being,
> and add commentary recording this improvement for future work.  Same would go
> for the vec_vsx_ld/st variations once you get to those.
> 
> Otherwise the patch looks ok to me.  I'll defer to Segher for approval, of
> course.

It's okay for trunk with the suggested improvements.  Thanks for the
review!


Segher


Re: [PATCH, rs6000] testcase coverage for vector store builtins

2017-09-22 Thread Segher Boessenkool
On Thu, Sep 21, 2017 at 01:11:11PM -0500, Will Schmidt wrote:
> Add testcase coverage for the vec_st (vector store)
> intrinsic builtins.
> 
> Tested across power platforms (p6 and newer). OK for trunk?

Yep, looks fine.  Thanks!


Segher


> 2017-09-21  Will Schmidt  
> 
> * gcc.target/powerpc/fold-vec-st-char.c: New.
> * gcc.target/powerpc/fold-vec-st-double.c: New.
> * gcc.target/powerpc/fold-vec-st-float.c: New.
> * gcc.target/powerpc/fold-vec-st-int.c: New.
> * gcc.target/powerpc/fold-vec-st-longlong.c: New.
> * gcc.target/powerpc/fold-vec-st-pixel.c: New.
> * gcc.target/powerpc/fold-vec-st-short.c: New.


Re: [Patch] Edit contrib/ files to download gfortran prerequisites

2017-09-22 Thread Jerry DeLisle
On Thu, 21 Sep 2017 09:40:49, Richard Biener wrote:
> On Wed, Sep 20, 2017 at 10:35 PM, Damian Rouson
>  wrote:
>> Attached is a patch that adds the downloading of the gfortran prerequisites
>> OpenCoarrays and MPICH in the contrib/download_prerequisites script.  The
>> patch also provides a useful error message when neither wget nor curl is
>> available on the target platform.  I tested this patch with several choices
>> for the command-line options on macOS (including --md5 and --sha512) and
>> Ubuntu Linux (including --sha512).  A suggested ChangeLog entry is
>>
>> * contrib/download_prerequisites: Download OpenCoarrays and MPICH.
>> * contrib/prerequisites.sha5: Add sha512 message digests for
>> OpenCoarrays and MPICH.
>> * contrib/prerequisites.md5: Add md5 message digests for OpenCoarrays
>> and MPICH.
>>
>>
>> OK for trunk?  If so, I’ll ask Jerry to commit this.  I don’t have commit
>> rights.

> Can you make this optional similar to graphite/isl?  Also I see no support in
> the toplevel build machinery to build/install the libs as part of GCC
> so how does that work in the end?

In the end GCC needs to know what to do and just do it, the same way it handles
mpfr and libquadmath.

I started looking at Makefile.def, and I am guessing this is where the changes
need to be made to enable the GCC build machinery to actually build the two
packages.  I am not knowledgeable enough to figure this out, and I recently got
a new job, so I am now very time constrained as far as reverse engineering what
is going on.

I did manage to add GNU Build System (autotools) support to OpenCoarrays so we
are closer to integrating it (./configure && make instead of cmake).

We definitely need some guidance from someone regarding the changes required for
the top level gcc items.

Also, we are getting some libtool-related build failures with OpenCoarrays on
Mac and some other platforms where we know GCC builds fine, so I suspect we need
to build a libtool script or something to ensure that it works everywhere.  It
works fine on Fedora, which is what I use (naturally).

So if anyone would step up to the plate and advise on the toplevel machinery it
would be greatly appreciated.

Regards,

Jerry


Re: [PATCH, rs6000 version 3] Add support for vec_xst_len_r() and vec_xl_len_r() builtins

2017-09-22 Thread Segher Boessenkool
Hi Carl,

On Mon, Sep 18, 2017 at 11:31:07AM -0700, Carl Love wrote:
>   * gcc.target/powerpc/builtins-5-p9-runnable.c: Add new runable test file
>   for the new built-ins and the existing built-ins.

Typo ("runable").

>  (define_expand "altivec_lvsr"
> -  [(use (match_operand:V16QI 0 "register_operand" ""))
> +  [(use (match_operand:V16QI 0 "altivec_register_operand" ""))
> (use (match_operand:V16QI 1 "memory_operand" ""))]

Empty constraint strings are the default in a define_expand, so just leave
them out.

> +;; Expand for builtin xl_len_r
> +(define_expand "xl_len_r"
> +  [(match_operand:V16QI 0 "vsx_register_operand" "=wa")
> +   (match_operand:DI 1 "register_operand" "b")
> +   (match_operand:DI 2 "register_operand" "r")]
> +  ""

Non-empty constraints in an expander do not really make sense either :-)
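
(For illustration, the xl_len_r expander with the constraint strings simply
dropped, assuming nothing else changes; the body, not shown in the quote
above, stays as it was:)

(define_expand "xl_len_r"
  [(match_operand:V16QI 0 "vsx_register_operand")
   (match_operand:DI 1 "register_operand")
   (match_operand:DI 2 "register_operand")]
  ""
  ...)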

All the rest looks fine.  Please fix up the expanders and commit.

Thanks!


Segher


Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Andrew Pinski
On Fri, Sep 22, 2017 at 11:39 AM, Jim Wilson  wrote:
> On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski  wrote:
>> Two overall comments:
>> * What about splitting register_offset into two different elements,
>> one for non 128bit modes and one for 128bit (and more; OI, etc.) modes
>> so you get better address generation right away for the simd load
>> cases rather than having LRA/reload having to reload the address into
>> a register.
>
> I'm not sure if changing register_offset cost would make a difference,
> since costs are usually used during optimization, not during address
> generation.  This is something that I didn't think to try though.  I
> can try taking a look at this.

It does get taken into account when fwprop is propagating the addition into
the MEM (at the tree level it is always a_1 = POINTER_PLUS_EXPR; MEM_REF(a_1)).
IV-OPTS will produce much better code if the address_cost is correct.

It looks like no other pass (combine, etc.) would take that into
account except for postreload CSE but maybe they should.

>
> I did try writing a patch to modify predicates to disallow reg offset
> for 128bit modes, and that got complicated, as I had to split apart a
> number of patterns in the aarch64-simd.md file that accept both VD and
> VQ modes.  I ended up with a patch 3-4 times as big as the one I
> submitted, without any additional performance improvement, so it
> wasn't worth the trouble.
>
>> * Maybe adding a testcase to the testsuite to show this change.
>
> Yes, I can add a testcase.
>
>> One extra comment:
>> * should we change the generic tuning to avoid reg+reg for 128bit modes?
>
> Are there other targets with a similar problem?  I only know that it
> is a problem for Falkor.  It might be a loss for some targets as it is
> replacing one instruction with two.

Well that is why I was suggesting the address cost model change.
Because the cost model change actually might provide better code in
the first place and still allow for reasonable generic code to be
produced.

Thanks,
Andrew

>
> Jim


Re: [PATCH] Fix PR82144

2017-09-22 Thread Alexandre Oliva
On Sep 21, 2017, Richard Biener  wrote:

> On Tue, 12 Sep 2017, Richard Biener wrote:
>> 
>> The following avoids adding DW_AT_alignment twice by not doing it
>> for incomplete types.
>> 
>> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>> 
>> Alex, is that ok or do we want DW_AT_alignment for incomplete types as 
>> well?

> Alex, ping?

Sorry, still catching up after Cauldron and recovery from the bug I
caught at the conference ;-)

Since incomplete enum types are non-standard, there's no standard to
guide us.  Pointers (the primary viable use of such types) generally
benefit from alignment information, but I don't see how the alignment
could be meaningfully determined for such types.  For debug information,
we're probably better off leaving any tentative or uninitialized
alignment information out of the DIE, so the patch looks good to me.
Thanks for your patience!

Reviewed-by: Alexandre Oliva 

>> PR middle-end/82144
>> * dwarf2out.c (gen_enumeration_type_die): Do not add alignment
>> attribute for incomplete types nor twice for complete ones.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [C++] Fix PR bootstrap/81926

2017-09-22 Thread Eric Botcazou
> This looks good.  But let's call it debug_type_hash instead.  OK with
> that change.

Thanks.  It occurred to me that we don't need to look up the type twice when we
need to insert it, so I have installed the attached patchlet as obvious after
bootstrapping/regtesting on x86-64/Linux.


PR bootstrap/81926
* cp-objcp-common.c (cp_get_debug_type): Do only one lookup.

-- 
Eric Botcazou

Index: cp-objcp-common.c
===
--- cp-objcp-common.c	(revision 253049)
+++ cp-objcp-common.c	(working copy)
@@ -162,13 +162,13 @@ cp_get_debug_type (const_tree type)
 	 types on the fly for the debug info only, they would not be attached
 	 to any GC root and always be swept, so we would make the contents of
 	 the debug info depend on the collection points.  */
-  struct tree_map in, *h;
+  struct tree_map in, *h, **slot;
 
   in.base.from = CONST_CAST_TREE (type);
   in.hash = htab_hash_pointer (type);
-  h = debug_type_hash->find_with_hash (&in, in.hash);
-  if (h)
-	return h->to;
+  slot = debug_type_hash->find_slot_with_hash (&in, in.hash, INSERT);
+  if (*slot)
+	return (*slot)->to;
 
   tree t = build_offset_type (TYPE_PTRMEMFUNC_OBJECT_TYPE (type),
   TREE_TYPE (TYPE_PTRMEMFUNC_FN_TYPE (type)));
@@ -177,7 +177,7 @@ cp_get_debug_type (const_tree type)
   h->base.from = CONST_CAST_TREE (type);
   h->hash = htab_hash_pointer (type);
   h->to = t;
-  *debug_type_hash->find_slot_with_hash (h, h->hash, INSERT) = h;
+  *slot = h;
 
   return t;
 }


Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Wilco Dijkstra
Hi Jim,

This looks like a general issue with reg+reg addressing modes being generated
in cases where it is not correct.  I haven't looked at lmbench for a while, but
it generated absolutely horrible code like:

add x1, x0, #120
ldr v0, [x2, x1]
add x1, x0, #128
ldr v1, [x2, x1]

If this is still happening, we need to fix this in a general way as this is bad
for any CPU even if reg+reg is fast.  Reduced testcases for this would be
welcome as it's likely many other targets are affected.  A while ago I posted a
patch to reassociate (x + C1) * C2 -> x * C2 + C3 to improve cases like this.
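
To make the pattern concrete, here is a hypothetical reduced example of the
kind of source that can end up in this shape (whether it still does depends on
the target and the pass pipeline):

/* Addresses are p + (i + 15)*8 and p + (i + 16)*8; once (x + C1) * C2 is
   folded to x * C2 + C3, the backend sees p + i*8 + 120 and p + i*8 + 128
   and may materialize the constants with separate adds, as above.  */
double
f (double *p, long i)
{
  return p[i + 15] + p[i + 16];
}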

Note we already support adjusting addressing costs.  There are several other
CPUs which increase the reg+reg costs.  So that's where I would start -
assuming the reg+reg addressing mode was correctly used.

Finally, if you really want to completely disable a particular addressing mode,
it's best done in classify_address rather than changing the md patterns.  My
experience is that if you use anything other than the standard 'm' constraint,
GCC reload starts to generate inefficient code even if the pattern should still
apply.  I have posted several patches to use 'm' more to get better spill code
and efficient expansions if the offset happens to be too large.

Wilco



Re: [PATCH, rs6000] folding of vector stores in GIMPLE

2017-09-22 Thread Bill Schmidt
On Sep 22, 2017, at 5:27 AM, Richard Biener  wrote:
> 
> On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt
>  wrote:
>> Hi,
>> 
>> Folding of vector stores in GIMPLE.
>> 
>> - Add code to handle gimple folding for the vec_st (vector store) builtins.
>> - Remove the now obsoleted folding code for vec_st from rs6000-c.c.
>> 
>> There are two spots that I could use some feedback on.
>> 
>> First -
>> An early exit remains in place prevent folding of statements that do not
>> have a LHS.  To allow folding of the stores to get past the check, I have
>> added a helper function (rs6000_builtin_valid_without_lhs) that allows
>> those store intrinsics to proceed.  I'm not sure the approach (or the name I 
>> chose)
>> is the best choice, so I'll defer to recommendations on how to improve that. 
>> :-)

It's fine, but please make the helper function static.

>> 
>> Second -
>> This code (as-is) is subject to a TBAA-related issue (similar to what was
>> noticed in the gimple folding code for loads).  As-is, with a testcase such as:
>> 
>> void testst_struct1b (vector double vd1, long long ll1, struct S *p)
>>{
>>vec_st (vd1, ll1, (vector double *)p);
>>}
>> 
>> will generate gimple that looks like:
>>MEM[(struct S *)D.3218] = vd1;
>> 
>> If I rework the code, setting arg2_type to be ptr_type_node, i.e.
>> + tree arg2_type = TREE_TYPE (arg2);
>> to:
>> + tree arg2_type = ptr_type_node;
>> 
>> the generated gimple then looks like
>>  MEM[(void *)D.3218] = vd1;
>> 
>> Which is probably OK, but I cannot say for certain.  The generated .s 
>> content is at least equivalent.
> 
> It looks safe.
> 
> The question I had is whether vec_st (vd1, ll1, (vector double *)p) is
> equivalent
> to *(vector double *)((char *)p + ll1) = vd1; from a TBAA perspective.  Thus
> whether the type of the third argument to vec_st defines the type of the access
> (in C standards meaning).  If so then what we do now is pessimizing (but
> as you say previously (long long *) and (long *) were aliased together and
> you got wrong-code with aliasing with regular stores of the "wrong" same 
> type).
> 
> A proper fix would be to transition this type as seen from the frontend to
> GIMPLE, for example in a similar way we do for MEM_REFs by piggy-backing
> that on an extra argument, a constant zero pointer of the alias
> pointer type to use
> (which would also serve as a type indicator of the store itself).  I'd use a
> target specific internal function for this (not sure if we can have those 
> target
> specific, but I guess if it's folded away then that's fine) and get away with
> the overload set.

OK.  Will, for now, let's again use the (void *) solution for the time being,
and add commentary recording this improvement for future work.  Same would go
for the vec_vsx_ld/st variations once you get to those.

Otherwise the patch looks ok to me.  I'll defer to Segher for approval, of
course.

Thanks,
Bill
> 
> Richard.
> 
>> The resulting code is verified by testcases powerpc/fold-vec-st-*.c, which
>> has been posted separately.
>> 
>> regtest looks clean on power6 and newer.
>> 
>> pending feedback, OK for trunk?
>> 
>> Thanks,
>> -Will
>> 
>>[gcc]
>> 
>>2017-09-21  Will Schmidt  
>> 
>>* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
>>for early folding of vector stores (ALTIVEC_BUILTIN_ST_*).
>>(rs6000_builtin_valid_without_lhs): helper function.
>>* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST.
>> 
>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>> index a49db97..4a363a1 100644
>> --- a/gcc/config/rs6000/rs6000-c.c
>> +++ b/gcc/config/rs6000/rs6000-c.c
>> @@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, 
>> tree fndecl,
>> convert (TREE_TYPE (stmt), arg0));
>>   stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>   return stmt;
>> }
>> 
>> -  /* Expand vec_st into an expression that masks the address and
>> - performs the store.  We need to expand this early to allow
>> - the best aliasing, as by the time we get into RTL we no longer
>> - are able to honor __restrict__, for example.  We may want to
>> - consider this for all memory access built-ins.
>> -
>> - When -maltivec=be is specified, or the wrong number of arguments
>> - is provided, simply punt to existing built-in processing.  */
>> -
>> -  if (fcode == ALTIVEC_BUILTIN_VEC_ST
>> -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
>> -  && nargs == 3)
>> -{
>> -  tree arg0 = (*arglist)[0];
>> -  tree arg1 = (*arglist)[1];
>> -  tree arg2 = (*arglist)[2];
>> -
>> -  /* Construct the masked address.  Let existing error handling take
>> -over if we don't have a constant offset.  */
>> -  arg1 = fold (arg1);
>> -
>> -  if (TREE_CODE (arg1) == INTEGER_CST)
>> -   {
>> -   

Re: [PATCH][aarch64] Fix error calls in aarch64 code so they can be translated

2017-09-22 Thread Martin Sebor

On 09/22/2017 11:54 AM, Steve Ellcey wrote:



>> In my view, the ideal message would use the same form as in
>> the source code.  But I suspect that's not easy to determine at
>> all sites where the message is being formatted, so the next best
>> thing is to avoid using the keyword and fall back on the general
>> term.
>>
>> Martin
>>
>> PS If possible, I would also suggest adopting a style that is
>> already in use elsewhere, for consistency.  (That would suggest
>> using % with no spaces and no "string" after
>> arch=.)
>
> Here is a new version of my patch.  I think it addresses these issues
> and is consistent.  It does cause some regressions in tests that check
> for certain error messages, specifically:
>
> gcc.target/aarch64/spellcheck_1.c
> gcc.target/aarch64/spellcheck_2.c
> gcc.target/aarch64/spellcheck_3.c
> gcc.target/aarch64/target_attr_11.c
> gcc.target/aarch64/target_attr_12.c
> gcc.target/aarch64/target_attr_17.c
>
> But I thought we should try and get some agreement on the format of the
> messages before updating the tests.  If we think these changes look
> good I can submit a final patch that includes the testsuite changes.


Thank you for taking my suggestions!  The changes look good to me.
Just one minor nit.  The following won't produce the kind of output
we'd like it to:

  error (is_pragma
 ? G_("pragma % does not accept an argument")
 : G_("attribute % does not accept an argument"),
 str_to_check);

It will end up printing:

  error: pragma ‘target(‘...’)’ does not accept an argument

(note the single quotes around the ... where the syntax otherwise
calls for a string).  The %s directive should not use the 'q' flag
when nested between %< and %>.  -Wformat normally warns about this
but in this case it doesn't see the format string because of the
conditional (I think -Wformat can be extended to handle this case).

  warning: 'q' flag used within a quoted sequence [-Wformat=]

To get the right output I think the format string should look
something like this:

  "pragma % does not accept an argument"

Martin


Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag

2017-09-22 Thread Janus Weil
2017-09-22 11:44 GMT+02:00 Janus Weil :
> 2017-09-22 9:11 GMT+02:00 Janne Blomqvist :
>> And since the standard requires that double precision variables are
>> twice as big as reals, in the absence of an explicit -fdefault-double=
>> flag, would it make sense to have -fdefault-real=N imply
>> -fdefault-double=[2*N or if that isn't supported on the target, the
>> largest supported real kind]?
>
> That's basically the behavior I tried to implement in my current patch
> (although I notice now that you're not necessarily getting the largest
> real kind, if 16 is not supported).

Attached is a new version of the patch, which improves this point. In
the absence of further comments, I'll commit this by tomorrow.

Cheers,
Janus
Index: gcc/fortran/invoke.texi
===
--- gcc/fortran/invoke.texi (revision 253108)
+++ gcc/fortran/invoke.texi (working copy)
@@ -119,8 +119,8 @@ by type.  Explanations are in the following sectio
 @gccoptlist{-fall-intrinsics -fbackslash -fcray-pointer -fd-lines-as-code @gol
 -fd-lines-as-comments @gol
 -fdec -fdec-structure -fdec-intrinsic-ints -fdec-static -fdec-math @gol
--fdefault-double-8 -fdefault-integer-8 @gol
--fdefault-real-8 -fdollar-ok -ffixed-line-length-@var{n} @gol
+-fdefault-double-8 -fdefault-integer-8 -fdefault-real-8 @gol
+-fdefault-real-10 -fdefault-real-16 -fdollar-ok -ffixed-line-length-@var{n} @gol
 -ffixed-line-length-none -ffree-form -ffree-line-length-@var{n} @gol
 -ffree-line-length-none -fimplicit-none -finteger-4-integer-8 @gol
 -fmax-identifier-length -fmodule-private -ffixed-form -fno-range-check @gol
@@ -404,6 +404,22 @@ the default width of @code{DOUBLE PRECISION} to 16
 @code{-fdefault-double-8} is given, too. Unlike @option{-freal-4-real-8},
 it does not promote variables with explicit kind declaration.
 
+@item -fdefault-real-10
+@opindex @code{fdefault-real-10}
+Set the default real type to a 10 byte wide type. This option also affects
+the kind of non-double real constants like @code{1.0}, and does promote
+the default width of @code{DOUBLE PRECISION} to 16 bytes if possible, unless
+@code{-fdefault-double-8} is given. Unlike @option{-freal-4-real-10},
+it does not promote variables with explicit kind declaration.
+
+@item -fdefault-real-16
+@opindex @code{fdefault-real-16}
+Set the default real type to a 16 byte wide type. This option also affects
+the kind of non-double real constants like @code{1.0}, and does promote
+the default width of @code{DOUBLE PRECISION} to 16 bytes if possible, unless
+@code{-fdefault-double-8} is given. Unlike @option{-freal-4-real-16},
+it does not promote variables with explicit kind declaration.
+
 @item -fdefault-double-8
 @opindex @code{fdefault-double-8}
 Set the @code{DOUBLE PRECISION} type to an 8 byte wide type.  Do nothing if 
this
Index: gcc/fortran/lang.opt
===
--- gcc/fortran/lang.opt(revision 253108)
+++ gcc/fortran/lang.opt(working copy)
@@ -457,9 +457,17 @@ Fortran Var(flag_default_integer)
 Set the default integer kind to an 8 byte wide type.
 
 fdefault-real-8
-Fortran Var(flag_default_real)
+Fortran Var(flag_default_real_8)
 Set the default real kind to an 8 byte wide type.
 
+fdefault-real-10
+Fortran Var(flag_default_real_10)
+Set the default real kind to a 10 byte wide type.
+
+fdefault-real-16
+Fortran Var(flag_default_real_16)
+Set the default real kind to a 16 byte wide type.
+
 fdollar-ok
 Fortran Var(flag_dollar_ok)
 Allow dollar signs in entity names.
Index: gcc/fortran/module.c
===
--- gcc/fortran/module.c(revision 253108)
+++ gcc/fortran/module.c(working copy)
@@ -6741,7 +6741,7 @@ use_iso_fortran_env_module (void)
   "standard", symbol[i].name, &u->where))
continue;
 
- if ((flag_default_integer || flag_default_real)
+ if ((flag_default_integer || flag_default_real_8)
  && symbol[i].id == ISOFORTRANENV_NUMERIC_STORAGE_SIZE)
gfc_warning_now (0, "Use of the NUMERIC_STORAGE_SIZE named "
 "constant from intrinsic module "
@@ -6808,7 +6808,7 @@ use_iso_fortran_env_module (void)
  if ((gfc_option.allow_std & symbol[i].standard) == 0)
continue;
 
- if ((flag_default_integer || flag_default_real)
+ if ((flag_default_integer || flag_default_real_8)
  && symbol[i].id == ISOFORTRANENV_NUMERIC_STORAGE_SIZE)
gfc_warning_now (0,
 "Use of the NUMERIC_STORAGE_SIZE named constant "
Index: gcc/fortran/trans-types.c
===
--- gcc/fortran/trans-types.c   (revision 253108)
+++ gcc/fortran/trans-types.c   (working copy)
@@ -530,7 +530,7 @@ gfc_init_kinds (void)
 }
 
   /* Choose the default real kind.

libgo patch committed: Add XCOFF support

2017-09-22 Thread Ian Lance Taylor
This patch by Tony Reix adds XCOFF support to libgo and to the go and
cgo commands.  Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 253042)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-1fcb9bb3cefb97cfab1e623826a1cc3f6aadd5f7
+e0c1f0b645b12a544b484c0f477f8fb6f5980550
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 253042)
+++ libgo/Makefile.am   (working copy)
@@ -217,7 +217,8 @@ toolexeclibgodebug_DATA = \
debug/gosym.gox \
debug/macho.gox \
debug/pe.gox \
-   debug/plan9obj.gox
+   debug/plan9obj.gox \
+   debug/xcoff.gox
 
 toolexeclibgoencodingdir = $(toolexeclibgodir)/encoding
 
@@ -722,6 +723,7 @@ PACKAGES = \
debug/macho \
debug/pe \
debug/plan9obj \
+   debug/xcoff \
encoding \
encoding/ascii85 \
encoding/asn1 \
@@ -1293,6 +1295,7 @@ TEST_PACKAGES = \
debug/macho/check \
debug/pe/check \
debug/plan9obj/check \
+   debug/xcoff/check \
encoding/ascii85/check \
encoding/asn1/check \
encoding/base32/check \
Index: libgo/go/cmd/cgo/gcc.go
===
--- libgo/go/cmd/cgo/gcc.go (revision 253025)
+++ libgo/go/cmd/cgo/gcc.go (working copy)
@@ -13,6 +13,7 @@ import (
"debug/elf"
"debug/macho"
"debug/pe"
+   "debug/xcoff"
"encoding/binary"
"errors"
"flag"
@@ -1289,6 +1290,10 @@ func (p *Package) gccMachine() []string
return []string{"-mabi=64"}
case "mips", "mipsle":
return []string{"-mabi=32"}
+   case "ppc64":
+   if goos == "aix" {
+   return []string{"-maix64"}
+   }
}
return nil
 }
@@ -1616,7 +1621,79 @@ func (p *Package) gccDebug(stdin []byte,
return d, ints, floats, strs
}
 
-   fatalf("cannot parse gcc output %s as ELF, Mach-O, PE object", gccTmp())
+   if f, err := xcoff.Open(gccTmp()); err == nil {
+   defer f.Close()
+   d, err := f.DWARF()
+   if err != nil {
+   fatalf("cannot load DWARF output from %s: %v", 
gccTmp(), err)
+   }
+   bo := binary.BigEndian
+   for _, s := range f.Symbols {
+   switch {
+   case isDebugInts(s.Name):
+   if i := int(s.SectionNumber) - 1; 0 <= i && i < 
len(f.Sections) {
+   sect := f.Sections[i]
+   if s.Value < sect.Size {
+   if sdat, err := sect.Data(); 
err == nil {
+   data := sdat[s.Value:]
+   ints = make([]int64, 
len(data)/8)
+   for i := range ints {
+   ints[i] = 
int64(bo.Uint64(data[i*8:]))
+   }
+   }
+   }
+   }
+   case isDebugFloats(s.Name):
+   if i := int(s.SectionNumber) - 1; 0 <= i && i < 
len(f.Sections) {
+   sect := f.Sections[i]
+   if s.Value < sect.Size {
+   if sdat, err := sect.Data(); 
err == nil {
+   data := sdat[s.Value:]
+   floats = 
make([]float64, len(data)/8)
+   for i := range floats {
+   floats[i] = 
math.Float64frombits(bo.Uint64(data[i*8:]))
+   }
+   }
+   }
+   }
+   default:
+   if n := indexOfDebugStr(s.Name); n != -1 {
+   if i := int(s.SectionNumber) - 1; 0 <= 
i && i < len(f.Sections) {
+   sect := f.Sections[i]
+   if s.Value < sect.Size {
+   if sdat, err 

Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Jim Wilson
On Fri, Sep 22, 2017 at 10:58 AM, Andrew Pinski  wrote:
> Two overall comments:
> * What about splitting register_offset into two different elements,
> one for non 128bit modes and one for 128bit (and more; OI, etc.) modes
> so you get better address generation right away for the simd load
> cases rather than having LRA/reload having to reload the address into
> a register.

I'm not sure if changing register_offset cost would make a difference,
since costs are usually used during optimization, not during address
generation.  This is something that I didn't think to try though.  I
can try taking a look at this.

I did try writing a patch to modify predicates to disallow reg offset
for 128bit modes, and that got complicated, as I had to split apart a
number of patterns in the aarch64-simd.md file that accept both VD and
VQ modes.  I ended up with a patch 3-4 times as big as the one I
submitted, without any additional performance improvement, so it
wasn't worth the trouble.

> * Maybe adding a testcase to the testsuite to show this change.

Yes, I can add a testcase.

> One extra comment:
> * should we change the generic tuning to avoid reg+reg for 128bit modes?

Are there other targets with a similar problem?  I only know that it
is a problem for Falkor.  It might be a loss for some targets as it is
replacing one instruction with two.

Jim


Re: [PATCH] Optimize x == -1 && y == -1 (PR middle-end/35691)

2017-09-22 Thread Richard Biener
On September 22, 2017 6:59:48 PM GMT+02:00, Jakub Jelinek  
wrote:
>Hi!
>
>While fixing libgcc2.c (__mulvDI3), I've noticed that while we
>optimize x == 0 && y == 0 into (x | y) == 0, we don't optimize
>analogical x == -1 && y == -1 into (x & y) == -1.
>
>The following patch does that, bootstrapped/regtested on x86_64-linux
>and
>i686-linux, ok for trunk?

OK. 

Thanks, 
Richard. 

>2017-09-22  Jakub Jelinek  
>
>   PR middle-end/35691
>   * match.pd: Simplify x == -1 & y == -1 into (x & y) == -1
>   and x != -1 | y != -1 into (x & y) != -1.
>
>   * gcc.dg/pr35691-1.c: Use -fdump-tree-forwprop1-details
>   instead of -fdump-tree-forwprop-details in dg-options.
>   * gcc.dg/pr35691-2.c: Likewise.
>   * gcc.dg/pr35691-3.c: New test.
>   * gcc.dg/pr35691-4.c: New test.
>
>--- gcc/match.pd.jj2017-09-15 20:42:29.0 +0200
>+++ gcc/match.pd   2017-09-22 13:34:29.413534762 +0200
>@@ -630,17 +630,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (TYPE_UNSIGNED (type))
>(bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1)
> 
>-/* PR35691: Transform
>-   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
>-   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
> (for bitop (bit_and bit_ior)
>  cmp (eq ne)
>+ /* PR35691: Transform
>+(x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
>+(x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
>  (simplify
>   (bitop (cmp @0 integer_zerop@2) (cmp @1 integer_zerop))
>(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>-   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>-   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>(@1)))
>-(cmp (bit_ior @0 (convert @1)) @2
>+  && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>+  && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>(@1)))
>+(cmp (bit_ior @0 (convert @1)) @2)))
>+ /* Transform:
>+(x == -1 & y == -1) -> (x & typeof(x)(y)) == -1.
>+(x != -1 | y != -1) -> (x & typeof(x)(y)) != -1.  */
>+ (simplify
>+  (bitop (cmp @0 integer_all_onesp@2) (cmp @1 integer_all_onesp))
>+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>+  && INTEGRAL_TYPE_P (TREE_TYPE (@1))
>+  && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE
>(@1)))
>+(cmp (bit_and @0 (convert @1)) @2
> 
> /* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
> (simplify
>--- gcc/testsuite/gcc.dg/pr35691-1.c.jj2016-11-09 15:22:31.0
>+0100
>+++ gcc/testsuite/gcc.dg/pr35691-1.c   2017-09-22 13:58:32.665455251
>+0200
>@@ -1,5 +1,5 @@
> /* { dg-do compile } */
>-/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
>+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
> 
> int foo(int z0, unsigned z1)
> {
>--- gcc/testsuite/gcc.dg/pr35691-2.c.jj2016-11-09 15:22:32.0
>+0100
>+++ gcc/testsuite/gcc.dg/pr35691-2.c   2017-09-22 13:58:38.174386317
>+0200
>@@ -1,5 +1,5 @@
> /* { dg-do compile } */
>-/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
>+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
> 
> int foo(int z0, unsigned z1)
> {
>--- gcc/testsuite/gcc.dg/pr35691-3.c.jj2017-09-22 13:55:36.186663576
>+0200
>+++ gcc/testsuite/gcc.dg/pr35691-3.c   2017-09-22 13:58:44.070312539
>+0200
>@@ -0,0 +1,12 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
>+
>+int foo(int z0, unsigned z1)
>+{
>+  int t0 = (z0 == -1);
>+  int t1 = (z1 == -1U);
>+  int t2 = (t0 & t1);
>+  return t2;
>+}
>+
>+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* =
>\\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */
>--- gcc/testsuite/gcc.dg/pr35691-4.c.jj2017-09-22 13:55:39.855617665
>+0200
>+++ gcc/testsuite/gcc.dg/pr35691-4.c   2017-09-22 13:58:50.132236685
>+0200
>@@ -0,0 +1,12 @@
>+/* { dg-do compile } */
>+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
>+
>+int foo(int z0, unsigned z1)
>+{
>+  int t0 = (z0 != -1);
>+  int t1 = (z1 != -1U);
>+  int t2 = (t0 | t1);
>+  return t2;
>+}
>+
>+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* =
>\\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */
>
>   Jakub



Re: [PATCH][aarch64] Enable ifunc resolver attribute by default

2017-09-22 Thread Steve Ellcey
On Thu, 2017-09-21 at 12:37 +, Joseph Myers wrote:
> On Tue, 5 Sep 2017, Steve Ellcey wrote:
> 
> > 
> > 2017-09-05  Steve Ellcey  
> > 
> > * config.gcc: Add new case statement to set
> > default_gnu_indirect_function.  Remove it from x86_64-*-linux*,
> > i[34567]86-*, powerpc*-*-linux*spe*, powerpc*-*-linux*, 
> > s390-*-linux*,
> > s390x-*-linux* case statements.   Added aarch64 to the list of
> > supported architectures.
> This is OK subject to AArch64 maintainer approval, or in the absence of 
> AArch64 maintainer objections within 24 hours.

Since I didn't see any objections I have checked this in.

Steve Ellcey
sell...@cavium.com




Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Andrew Pinski
On Fri, Sep 22, 2017 at 8:59 AM, Jim Wilson  wrote:
> On Fri, Sep 22, 2017 at 8:49 AM, Jim Wilson  wrote:
>> On Falkor, because of an idiosyncrasy of how the pipelines are designed, a
>> quad-word store using a reg+reg addressing mode is almost twice as slow as an
>> add followed by a quad-word store with a single reg addressing mode.  So we
>> get better performance if we disallow addressing modes using register offsets
>> with quad-word stores.
>
> This was tested with a bootstrap and make check of course.  Also,
> gcc.c-torture/compile/20021212-1.c compiled with -O3 -mcpu=falkor
> makes a nice testcase to demonstrate that the patch works.
>
> OK?

Two overall comments:
* What about splitting register_offset into two different elements,
one for non 128bit modes and one for 128bit (and more; OI, etc.) modes
so you get better address generation right away for the simd load
cases rather than having LRA/reload having to reload the address into
a register.
* Maybe adding a testcase to the testsuite to show this change.

One extra comment:
* should we change the generic tuning to avoid reg+reg for 128bit modes?

Thanks,
Andrew

>
> Jim


Re: [C++ PATCH] Fix compile time hog in replace_placeholders (PR sanitizer/81929)

2017-09-22 Thread Jason Merrill
OK.

On Thu, Sep 14, 2017 at 4:12 PM, Jakub Jelinek  wrote:
> Hi!
>
> When the expression replace_placeholders is called on contains
> many SAVE_EXPRs that appear more than once in the tree, we hang walking them
> over and over again, while it is sufficient to just walk it without
> duplicates (not using cp_walk_tree_without_duplicates, because the callback
> can cp_walk_tree* again and we want to use the same pointer set between
> those).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-09-14  Jakub Jelinek  
>
> PR sanitizer/81929
> * tree.c (struct replace_placeholders_t): Add pset field.
> (replace_placeholders_r): Call cp_walk_tree with d->pset as
> last argument instead of NULL.  Formatting fix.
> (replace_placeholders): Add pset variable, add its address
> into data.  Pass &pset instead of NULL to cp_walk_tree.
>
> * g++.dg/ubsan/pr81929.C: New test.
>
> --- gcc/cp/tree.c.jj2017-09-12 09:35:47.0 +0200
> +++ gcc/cp/tree.c   2017-09-14 17:38:07.717064412 +0200
> @@ -3063,6 +3063,7 @@ struct replace_placeholders_t
>  {
>tree obj;/* The object to be substituted for a PLACEHOLDER_EXPR.  
> */
>bool seen;   /* Whether we've encountered a PLACEHOLDER_EXPR.  */
> +  hash_set *pset;/* To avoid walking same trees multiple 
> times.  */
>  };
>
>  /* Like substitute_placeholder_in_expr, but handle C++ tree codes and
> @@ -3085,8 +3086,8 @@ replace_placeholders_r (tree* t, int* wa
>  case PLACEHOLDER_EXPR:
>{
> tree x = obj;
> -   for (; !(same_type_ignoring_top_level_qualifiers_p
> -(TREE_TYPE (*t), TREE_TYPE (x)));
> +   for (; !same_type_ignoring_top_level_qualifiers_p (TREE_TYPE (*t),
> +  TREE_TYPE (x));
>  x = TREE_OPERAND (x, 0))
>   gcc_assert (TREE_CODE (x) == COMPONENT_REF);
> *t = x;
> @@ -3118,8 +3119,7 @@ replace_placeholders_r (tree* t, int* wa
>   valp = &TARGET_EXPR_INITIAL (*valp);
>   }
> d->obj = subob;
> -   cp_walk_tree (valp, replace_placeholders_r,
> - data_, NULL);
> +   cp_walk_tree (valp, replace_placeholders_r, data_, d->pset);
> d->obj = obj;
>   }
> *walk_subtrees = false;
> @@ -3151,10 +3151,11 @@ replace_placeholders (tree exp, tree obj
>  return exp;
>
>tree *tp = &exp;
> -  replace_placeholders_t data = { obj, false };
> +  hash_set pset;
> +  replace_placeholders_t data = { obj, false, &pset };
>if (TREE_CODE (exp) == TARGET_EXPR)
>  tp = &TARGET_EXPR_INITIAL (exp);
> -  cp_walk_tree (tp, replace_placeholders_r, &data, NULL);
> +  cp_walk_tree (tp, replace_placeholders_r, &data, &pset);
>if (seen_p)
>  *seen_p = data.seen;
>return exp;
> --- gcc/testsuite/g++.dg/ubsan/pr81929.C.jj 2017-09-14 17:48:09.052611540 
> +0200
> +++ gcc/testsuite/g++.dg/ubsan/pr81929.C2017-09-14 17:49:21.644711332 
> +0200
> @@ -0,0 +1,14 @@
> +// PR sanitizer/81929
> +// { dg-do compile }
> +// { dg-options "-std=c++14 -fsanitize=undefined" }
> +
> +struct S { S &operator<< (long); S foo (); S (); };
> +
> +void
> +bar ()
> +{
> +  static_cast(S () << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0
> +  << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0
> +  << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0
> +  << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 0 << 
> 0).foo ();
> +}
>
> Jakub


Re: [PATCH][aarch64] Fix error calls in aarch64 code so they can be translated

2017-09-22 Thread Steve Ellcey

> In my view, the ideal message would use the same form as in
> the source code.  But I suspect that's not easy to determine at
> all sites where the message is being formatted, so the next best
> thing is to avoid using the keyword and fall back on the general
> term.
> 
> Martin
> 
> PS If possible, I would also suggest adopting a style that is
> already in use elsewhere, for consistency.  (That would suggest
> using % with no spaces and no "string" after
> arch=.)

Here is a new version of my patch.  I think it addresses these issues
and is consistent.  It does cause some regressions in tests that check
for certain error messages, specifically:

gcc.target/aarch64/spellcheck_1.c
gcc.target/aarch64/spellcheck_2.c
gcc.target/aarch64/spellcheck_3.c
gcc.target/aarch64/target_attr_11.c
gcc.target/aarch64/target_attr_12.c
gcc.target/aarch64/target_attr_17.c

But I thought we should try and get some agreement on the format of the
messages before updating the tests.  If we think these changes look
good I can submit a final patch that includes the testsuite changes.

Steve Ellcey
sell...@cavium.com

2017-09-22  Steve Ellcey  

PR target/79868
* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse):
Change argument type on aarch64_process_target_attr call.
* config/aarch64/aarch64-protos.h (aarch64_process_target_attr):
Change argument type.
* config/aarch64/aarch64.c (aarch64_attribute_info): Change
field type.
(aarch64_handle_attr_arch): Change argument type, use boolean
argument to use different strings in error calls.
(aarch64_handle_attr_cpu): Ditto.
(aarch64_handle_attr_tune): Ditto.
(aarch64_handle_attr_isa_flags): Ditto.
(aarch64_process_one_target_attr): Ditto.
(aarch64_process_target_attr): Ditto.
(aarch64_option_valid_attribute_p): Change argument type on
aarch64_process_target_attr call.


diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
index 177e638..c9945db 100644
--- a/gcc/config/aarch64/aarch64-c.c
+++ b/gcc/config/aarch64/aarch64-c.c
@@ -165,7 +165,7 @@ aarch64_pragma_target_parse (tree args, tree pop_target)
  information that it specifies.  */
   if (args)
 {
-  if (!aarch64_process_target_attr (args, "pragma"))
+  if (!aarch64_process_target_attr (args, true))
 	return false;
 
   aarch64_override_options_internal (&global_options);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e67c2ed..4323e9e 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -445,7 +445,7 @@ bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, RTX_CODE);
 
 void aarch64_init_builtins (void);
 
-bool aarch64_process_target_attr (tree, const char*);
+bool aarch64_process_target_attr (tree, bool);
 void aarch64_override_options_internal (struct gcc_options *);
 
 rtx aarch64_expand_builtin (tree exp,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1c14008..3bf91b3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -67,6 +67,7 @@
 #include "common/common-target.h"
 #include "selftest.h"
 #include "selftest-rtl.h"
+#include "intl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -9554,15 +9555,15 @@ struct aarch64_attribute_info
   const char *name;
   enum aarch64_attr_opt_type attr_type;
   bool allow_neg;
-  bool (*handler) (const char *, const char *);
+  bool (*handler) (const char *, bool);
   enum opt_code opt_num;
 };
 
 /* Handle the ARCH_STR argument to the arch= target attribute.
-   PRAGMA_OR_ATTR is used in potential error messages.  */
+   IS_PRAGMA is used in potential error messages.  */
 
 static bool
-aarch64_handle_attr_arch (const char *str, const char *pragma_or_attr)
+aarch64_handle_attr_arch (const char *str, bool is_pragma)
 {
   const struct processor *tmp_arch = NULL;
   enum aarch64_parse_opt_result parse_res
@@ -9579,15 +9580,22 @@ aarch64_handle_attr_arch (const char *str, const char *pragma_or_attr)
   switch (parse_res)
 {
   case AARCH64_PARSE_MISSING_ARG:
-	error ("missing architecture name in 'arch' target %s", pragma_or_attr);
+	error (is_pragma
+	   ? G_("missing name in % pragma")
+	   : G_("missing name in % attribute"));
 	break;
   case AARCH64_PARSE_INVALID_ARG:
-	error ("unknown value %qs for 'arch' target %s", str, pragma_or_attr);
+	error (is_pragma
+	   ? G_("invalid name (%qs) in % pragma")
+	   : G_("invalid name (%qs) in % attribute"),
+	   str);
 	aarch64_print_hint_for_arch (str);
 	break;
   case AARCH64_PARSE_INVALID_FEATURE:
-	error ("invalid feature modifier %qs for 'arch' target %s",
-	   str, pragma_or_attr);
+	error (is_pragma
+	   ? G_("invalid value (%qs) in % pragma")
+	   : G_("invalid value (%qs) in % attribute"),
+	   str);
 	break;
   default:
 	gcc_unre

[RFA][PATCH] Stack clash protection 06/08 - V4

2017-09-22 Thread Jeff Law

Enough changed since V3 that I think another review round on the ppc
bits is warranted.

wrap_frame_mem is gone.  Allocation, attribute setting and
notes/scheduler barriers are now in rs6000_emit_allocate_stack_1.

handle_residual is gone, it's folded into rs6000_emit_allocate_stack_1.

rs6000_emit_probe_stack_range_stack_clash now returns an insn with the
first stack adjustment or just before the stack adjustment loop.  This
seems more correct given how its return value (sp_adjust) is used.

The go test suite has been run with and without this patch.  I've found
os/signal seems to flip between pass and fail without rhyme or reason.
It may be related to system load.  TestCgoCallbackGC now passes for
reasons unknown.  I haven't run it enough to know if it is sensitive to
load or other timing factors.

I also flipped things so that clash protection is enabled by default and
re-ran the tests.  The idea being to see if I could exercise the path
that uses SP_ADJUST a bit more.  But that gave me the same results.
While I think the change in the return value in
rs6000_emit_probe_stack_range_stack_clash is more correct, I don't have
a good way to test it.

There were other minor improvements pointed out by Segher that were
fixed as well.

I kept the separation of the case where there's no stack space
allocated.   I did look at pushing that into
emit_probe_stack_range_stack_clash, but doing so actually complicates
things.  It's cleaner to leave it as-is.


OK for the trunk?

Jeff




commit b25ea3141981f833f3d1211f50209f349c2eae3e
Author: Jeff Law 
Date:   Fri Sep 22 11:47:26 2017 -0600

* config/rs6000/rs6000-protos.h (output_probe_stack_range): Update
prototype for new argument.
* config/rs6000/rs6000.c (rs6000_emit_allocate_stack_1): New 
function,
mostly extracted from rs6000_emit_allocate_stack.
(rs6000_emit_probe_stack_range_stack_clash): New function.
(rs6000_emit_allocate_stack): Call
rs6000_emit_probe_stack_range_stack_clash as needed.
(rs6000_emit_probe_stack_range): Add additional argument
to call to gen_probe_stack_range{si,di}.
(output_probe_stack_range): New.
(output_probe_stack_range_1): Renamed from output_probe_stack_range.
(output_probe_stack_range_stack_clash): New.
(rs6000_emit_prologue): Emit notes into dump file as requested.
* rs6000.md (allocate_stack): Handle -fstack-clash-protection.
(probe_stack_range): Operand 0 is now early-clobbered.
Add additional operand and pass it to output_probe_stack_range.

* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection): Enable for
rs6000 and powerpc targets.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fdb7221848c..2bdb8c07372 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,22 @@
+2017-09-22  Jeff Law  
+
+   * config/rs6000/rs6000-protos.h (output_probe_stack_range): Update
+   prototype for new argument.
+   * config/rs6000/rs6000.c (rs6000_emit_allocate_stack_1): New function,
+   mostly extracted from rs6000_emit_allocate_stack.
+   (rs6000_emit_probe_stack_range_stack_clash): New function.
+   (rs6000_emit_allocate_stack): Call
+   rs6000_emit_probe_stack_range_stack_clash as needed.
+   (rs6000_emit_probe_stack_range): Add additional argument
+   to call to gen_probe_stack_range{si,di}.
+   (output_probe_stack_range): New.
+   (output_probe_stack_range_1): Renamed from output_probe_stack_range.
+   (output_probe_stack_range_stack_clash): New.
+   (rs6000_emit_prologue): Emit notes into dump file as requested.
+   * rs6000.md (allocate_stack): Handle -fstack-clash-protection.
+   (probe_stack_range): Operand 0 is now early-clobbered.
+   Add additional operand and pass it to output_probe_stack_range.
+
 2017-09-22  Richard Sandiford  
Alan Hayward  
David Sherwood  
diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 3f86aba947e..781349b850e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -128,7 +128,7 @@ extern void rs6000_emit_sISEL (machine_mode, rtx[]);
 extern void rs6000_emit_sCOND (machine_mode, rtx[]);
 extern void rs6000_emit_cbranch (machine_mode, rtx[]);
 extern char * output_cbranch (rtx, const char *, int, rtx_insn *);
-extern const char * output_probe_stack_range (rtx, rtx);
+extern const char * output_probe_stack_range (rtx, rtx, rtx);
 extern void rs6000_emit_dot_insn (rtx dst, rtx src, int dot, rtx ccreg);
 extern bool rs6000_emit_set_const (rtx, rtx);
 extern int rs6000_emit_cmove (rtx, rtx, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e5ef63889b7..9e486f592b9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -25458,6 +25458,201 

[committed] PR82289: Computing peeling costs for irrelevant drs

2017-09-22 Thread Richard Sandiford
This PR shows that we weren't filtering out irrelevant stmts in
vect_get_peeling_costs_all_drs (unlike related loops in which
we iterate over all datarefs).

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Installed as obvious.

Richard


2017-09-22  Richard Sandiford  

gcc/
PR tree-optimization/82289
* tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Check
STMT_VINFO_RELEVANT_P.

gcc/testsuite/
PR tree-optimization/82289
* gcc.dg/vect/pr82289.c: New test.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-09-22 17:44:23.043135080 +0100
+++ gcc/tree-vect-data-refs.c   2017-09-22 17:50:04.801574308 +0100
@@ -1326,6 +1326,9 @@ vect_get_peeling_costs_all_drs (vec 0)
+{
+  d = j;
+  return;
+}
+  if (!h)
+while (g)
+  ;
+  while (h < 1)
+if (a)
+  {
+fn1 (&h);
+h = 0;
+  }
+  f[e] = &c;
+}
+  while (1)
+;
+}


Re: range_int_cst_p handling in extract_range_from_binary_expr_1

2017-09-22 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Sep 20, 2017 at 2:06 PM, Richard Sandiford
>  wrote:
>> extract_range_from_binary_expr_1 had:
>>
>>   if (range_int_cst_p (&vr0)
>>   && range_int_cst_p (&vr1)
>>   && TYPE_OVERFLOW_WRAPS (expr_type))
>> ...
>>   ...
>>   extract_range_from_multiplicative_op_1 (vr, code, &vr0, &vr1);
>>
>> but extract_range_from_multiplicative_op_1 also requires range_int_cst_p.
>> I think we should bail out if either range isn't a constant.
>>
>> This might only be theoretical with current sources, but it's needed
>> once polynomial constants are added.
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
>> OK to install?
>
> extract_range_from_multiplicative_op_1 also allows anti-ranges but indeed
> requires INTEGER_CSTs.  Note it won't get anti-ranges because we handle
> this case by ranges_from_anti_range.
>
> Thus ok if you also remove the apparent anti-range handling from
> extract_range_from_multiplicative_op_1 (in the assert).

Thanks, here's what I committed after testing as before.

Richard


2017-09-22  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vrp.c (extract_range_from_multiplicative_op_1): Assert
for VR_RANGE only; don't allow VR_ANTI_RANGE.
(extract_range_from_binary_expr_1): Don't call
extract_range_from_multiplicative_op_1 if !range_int_cst_p.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  2017-09-21 22:35:16.63940 +0100
+++ gcc/tree-vrp.c  2017-09-22 17:46:45.207205032 +0100
@@ -1851,8 +1851,7 @@ extract_range_from_multiplicative_op_1 (
  || code == ROUND_DIV_EXPR
  || code == RSHIFT_EXPR
  || code == LSHIFT_EXPR);
-  gcc_assert ((vr0->type == VR_RANGE
-  || (code == MULT_EXPR && vr0->type == VR_ANTI_RANGE))
+  gcc_assert (vr0->type == VR_RANGE
  && vr0->type == vr1->type);
 
   rtype = vr0->type;
@@ -2462,9 +2461,14 @@ extract_range_from_binary_expr_1 (value_
   signop sign = TYPE_SIGN (expr_type);
   unsigned int prec = TYPE_PRECISION (expr_type);
 
-  if (range_int_cst_p (&vr0)
- && range_int_cst_p (&vr1)
- && TYPE_OVERFLOW_WRAPS (expr_type))
+  if (!range_int_cst_p (&vr0)
+ || !range_int_cst_p (&vr1))
+   {
+ set_value_range_to_varying (vr);
+ return;
+   }
+
+  if (TYPE_OVERFLOW_WRAPS (expr_type))
{
  typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) vrp_int;
  typedef generic_wide_int


[PATCH] Optimize x == -1 && y == -1 (PR middle-end/35691)

2017-09-22 Thread Jakub Jelinek
Hi!

While fixing libgcc2.c (__mulvDI3), I've noticed that while we
optimize x == 0 && y == 0 into (x | y) == 0, we don't optimize
analogical x == -1 && y == -1 into (x & y) == -1.

The following patch does that, bootstrapped/regtested on x86_64-linux and
i686-linux, ok for trunk?
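
A tiny illustration of the new simplification (hypothetical functions; the
testcases added below exercise the same transform through temporaries):

int f (int x, int y) { return x == -1 && y == -1; }  /* can become (x & y) == -1 */
int g (int x, int y) { return x != -1 || y != -1; }  /* can become (x & y) != -1 */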

2017-09-22  Jakub Jelinek  

PR middle-end/35691
* match.pd: Simplify x == -1 & y == -1 into (x & y) == -1
and x != -1 | y != -1 into (x & y) != -1.

* gcc.dg/pr35691-1.c: Use -fdump-tree-forwprop1-details
instead of -fdump-tree-forwprop-details in dg-options.
* gcc.dg/pr35691-2.c: Likewise.
* gcc.dg/pr35691-3.c: New test.
* gcc.dg/pr35691-4.c: New test.

--- gcc/match.pd.jj 2017-09-15 20:42:29.0 +0200
+++ gcc/match.pd2017-09-22 13:34:29.413534762 +0200
@@ -630,17 +630,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (TYPE_UNSIGNED (type))
 (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1)
 
-/* PR35691: Transform
-   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
-   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
 (for bitop (bit_and bit_ior)
  cmp (eq ne)
+ /* PR35691: Transform
+(x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
+(x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
  (simplify
   (bitop (cmp @0 integer_zerop@2) (cmp @1 integer_zerop))
(if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
-   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
-   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
-(cmp (bit_ior @0 (convert @1)) @2
+   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+(cmp (bit_ior @0 (convert @1)) @2)))
+ /* Transform:
+(x == -1 & y == -1) -> (x & typeof(x)(y)) == -1.
+(x != -1 | y != -1) -> (x & typeof(x)(y)) != -1.  */
+ (simplify
+  (bitop (cmp @0 integer_all_onesp@2) (cmp @1 integer_all_onesp))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+   && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+(cmp (bit_and @0 (convert @1)) @2
 
 /* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
 (simplify
--- gcc/testsuite/gcc.dg/pr35691-1.c.jj 2016-11-09 15:22:31.0 +0100
+++ gcc/testsuite/gcc.dg/pr35691-1.c2017-09-22 13:58:32.665455251 +0200
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
 
 int foo(int z0, unsigned z1)
 {
--- gcc/testsuite/gcc.dg/pr35691-2.c.jj 2016-11-09 15:22:32.0 +0100
+++ gcc/testsuite/gcc.dg/pr35691-2.c2017-09-22 13:58:38.174386317 +0200
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
 
 int foo(int z0, unsigned z1)
 {
--- gcc/testsuite/gcc.dg/pr35691-3.c.jj 2017-09-22 13:55:36.186663576 +0200
+++ gcc/testsuite/gcc.dg/pr35691-3.c2017-09-22 13:58:44.070312539 +0200
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 == -1);
+  int t1 = (z1 == -1U);
+  int t2 = (t0 & t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) 
z1_\[0-9\]*\\(D\\);" "forwprop1" } } */
--- gcc/testsuite/gcc.dg/pr35691-4.c.jj 2017-09-22 13:55:39.855617665 +0200
+++ gcc/testsuite/gcc.dg/pr35691-4.c2017-09-22 13:58:50.132236685 +0200
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 != -1);
+  int t1 = (z1 != -1U);
+  int t2 = (t0 | t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) 
z1_\[0-9\]*\\(D\\);" "forwprop1" } } */

Jakub


Re: Don't query the frontend for unsupported types

2017-09-22 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Sep 21, 2017 at 2:56 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On September 20, 2017 2:36:03 PM GMT+02:00, Richard Sandiford
>>>  wrote:
When forcing a constant of mode MODE into memory, force_const_mem
asks the frontend to provide the type associated with that mode.
In principle type_for_mode is allowed to return null, and although
one use site correctly handled that, the other didn't.

I think there's agreement that it's bogus to use type_for_mode for
this kind of thing, since it forces frontends to handle types that
don't exist in that language.  See e.g. http://gcc.gnu.org/PR46805
where the Go frontend was forced to handle vector types even though
Go doesn't have vector types.

Also, the frontends use code like:

  else if (VECTOR_MODE_P (mode))
{
  machine_mode inner_mode = GET_MODE_INNER (mode);
  tree inner_type = c_common_type_for_mode (inner_mode, unsignedp);
  if (inner_type != NULL_TREE)
return build_vector_type_for_mode (inner_type, mode);
}

and there's no guarantee that every vector mode M used by backend
rtl has an associated vector type whose TYPE_MODE is M.  I think
really the type_for_mode hook should only return trees that _do_ have
the requested TYPE_MODE, but PR46805 linked above shows that this is
likely to have too many knock-on consequences.  It doesn't make sense
for force_const_mem to ask about vector modes that aren't valid for
vector types, so this patch handles the condition there instead.

This is needed for SVE multi-register modes, which are modelled as
vector modes but are not usable as vector types.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and
powerpc64le-linux-gnu.
OK to install?
>>>
>>> I think we should get rid of the use entirely.
>>
>> I first read this as not using type_for_mode at all in force_const_mem,
>> which sounded like a good thing :-)
>
> That's what I meant ;)  A mode doesn't really have a type...
>
>>   I tried it overnight on the usual
>> at-least-one-target-per-CPU set and diffing the before and after
>> assembly for the testsuite.  And it looks like i686 relies on this
>> to get an alignment of 16 rather than 4 for XFmode constants:
>> GET_MODE_ALIGNMENT (XFmode) == 32 (as requested by i386-modes.def),
>> but i386's CONSTANT_ALIGNMENT increases it to 128 for static constants.
>
> Then the issue is that CONSTANT_ALIGNMENT takes a tree and not a mode...
> even worse than type_for_mode is a use of make_tree!  Incidentially
> ix86_constant_alignment _does_ look at the mode in the end...

OK, I guess this means another target hook conversion.  The patch
below converts CONSTANT_ALIGNMENT with its current interface.
The definition:

  #define CONSTANT_ALIGNMENT(EXP, ALIGN) \
(TREE_CODE (EXP) == STRING_CST \
 && (ALIGN) < BITS_PER_WORD ? BITS_PER_WORD : (ALIGN))

was very common, so the patch adds a canned definition for that,
called constant_alignment_word_strings.  Some ports had a variation
that used a port-local FASTEST_ALIGNMENT instead of BITS_PER_WORD;
the patch uses constant_alignment_word_strings if FASTEST_ALIGNMENT
was always BITS_PER_WORD and a port-local hook function otherwise.
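
For reference, a minimal sketch of what the two canned implementations
boil down to (the exact prototype is whatever targhooks.h ends up
declaring; a (const_tree, HOST_WIDE_INT) -> HOST_WIDE_INT signature is
assumed here):

HOST_WIDE_INT
default_constant_alignment (const_tree, HOST_WIDE_INT align)
{
  return align;
}

/* Mirrors the common CONSTANT_ALIGNMENT idiom quoted above: word-align
   string constants, leave every other constant's alignment alone.  */
HOST_WIDE_INT
constant_alignment_word_strings (const_tree exp, HOST_WIDE_INT align)
{
  if (TREE_CODE (exp) == STRING_CST && align < BITS_PER_WORD)
    return BITS_PER_WORD;
  return align;
}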

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the testsuite assembly output on at least one
target per CPU directory.  I don't think this comes under Jeff's
preapproval due to the constant_alignment_word_strings thing, so:
OK to install?

If so, then I'll follow up with a separate hook for rtl modes, which
varasm, default_constant_alignment and constant_alignment_word_strings
can all use.

Thanks,
Richard


2017-09-22  Richard Sandiford  

gcc/
* target.def (constant_alignment): New hook.
* defaults.h (CONSTANT_ALIGNMENT): Delete.
* doc/tm.texi.in (CONSTANT_ALIGNMENT): Replace with...
(TARGET_CONSTANT_ALIGNMENT): ...this new hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_constant_alignment): Declare.
(constant_alignment_word_strings): Likewise.
* targhooks.c (default_constant_alignment): New function.
(constant_alignment_word_strings): Likewise.
* builtins.c (get_object_alignment_2): Use targetm.constant_alignment
instead of CONSTANT_ALIGNMENT.
* varasm.c (align_variable, get_variable_align, build_constant_desc)
(force_const_mem): Likewise.
* config/aarch64/aarch64.h (CONSTANT_ALIGNMENT): Delete.
* config/aarch64/aarch64.c (aarch64_constant_alignment): New function.
(aarch64_classify_address): Call it instead of CONSTANT_ALIGNMENT.
(TARGET_CONSTANT_ALIGNMENT): Redefine.
* config/alpha/alpha.h (CONSTANT_ALIGNMENT): Delete commented-out
definition.
* config/arc/arc.h (CONSTANT_ALIGNMENT): Delete.
* config/arc/arc.c (TA

Change permute index type to unsigned short

2017-09-22 Thread Richard Sandiford
This patch changes the element type of (auto_)vec_perm_indices from
unsigned char to unsigned short.  This is needed for fixed-length
2048-bit SVE: a 2048-bit vector of bytes has 256 elements, so an index
into the two concatenated input vectors can be as large as 511, which
no longer fits in unsigned char.  (SVE is variable-length by default,
but it's possible to ask for specific vector lengths if you want to.)

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the testsuite assembly output on at least one
target per CPU directory.  OK to install?

Richard


2017-09-22  Richard Sandiford  

gcc/
* target.h (vec_perm_indices): Use unsigned short rather than
unsigned char.
(auto_vec_perm_indices): Likewise.
* config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const_ok):
Use unsigned int rather than unsigned char.
* config/arm/arm.c (arm_vectorize_vec_perm_const_ok): Likewise.

Index: gcc/target.h
===
--- gcc/target.h2017-09-14 17:04:19.080694343 +0100
+++ gcc/target.h2017-09-22 17:35:22.486794044 +0100
@@ -193,11 +193,11 @@ enum vect_cost_model_location {
 
 /* The type to use for vector permutes with a constant permute vector.
Each entry is an index into the concatenated input vectors.  */
-typedef vec vec_perm_indices;
+typedef vec vec_perm_indices;
 
 /* Same, but can be used to construct local permute vectors that are
automatically freed.  */
-typedef auto_vec auto_vec_perm_indices;
+typedef auto_vec auto_vec_perm_indices;
 
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-09-22 17:31:56.412840135 +0100
+++ gcc/config/aarch64/aarch64.c2017-09-22 17:35:22.483794044 +0100
@@ -13820,7 +13820,7 @@ aarch64_vectorize_vec_perm_const_ok (mac
   nelt = sel.length ();
   for (i = which = 0; i < nelt; ++i)
 {
-  unsigned char e = d.perm[i];
+  unsigned int e = d.perm[i];
   gcc_assert (e < 2 * nelt);
   which |= (e < nelt ? 1 : 2);
 }
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c2017-09-22 17:31:56.414735941 +0100
+++ gcc/config/arm/arm.c2017-09-22 17:35:22.486794044 +0100
@@ -29261,7 +29261,7 @@ arm_vectorize_vec_perm_const_ok (machine
   nelt = GET_MODE_NUNITS (d.vmode);
   for (i = which = 0; i < nelt; ++i)
 {
-  unsigned char e = d.perm[i];
+  unsigned int e = d.perm[i];
   gcc_assert (e < 2 * nelt);
   which |= (e < nelt ? 1 : 2);
 }


Update interface to TARGET_VECTORIZE_VEC_PERM_CONST_OK

2017-09-22 Thread Richard Sandiford
This patch makes TARGET_VECTORIZE_VEC_PERM_CONST_OK take the permute
vector in the form of a vec_perm_indices instead of an unsigned char *.
It follows on from the recent patch that did the same in target-independent
code.

It was easy to make ARM and AArch64 use vec_perm_indices internally
as well, and converting AArch64 helps with SVE.  I did try doing the same
for the other ports, but the surgery needed was much more invasive and
much less obviously correct.
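
For a port that isn't converted here, the change is mechanical.  A sketch
for a hypothetical port "foo" (names invented, not from the patch), based
on the aarch64/arm loops shown below:

/* Before:
     static bool
     foo_vectorize_vec_perm_const_ok (machine_mode vmode,
                                      const unsigned char *sel);
   After: the indices arrive as a vec, so sel.length () can replace a
   separate GET_MODE_NUNITS (vmode) computation where only the number
   of indices matters.  */
static bool
foo_vectorize_vec_perm_const_ok (machine_mode vmode ATTRIBUTE_UNUSED,
                                 vec_perm_indices sel)
{
  unsigned int nelt = sel.length ();
  unsigned int which = 0;
  for (unsigned int i = 0; i < nelt; ++i)
    {
      unsigned int e = sel[i];
      gcc_assert (e < 2 * nelt);
      which |= (e < nelt ? 1 : 2);
    }
  /* A real port would now try its permute patterns; this sketch just
     accepts single-input permutes.  */
  return which != 3;
}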

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
Also tested by comparing the testsuite assembly output on at least one
target per CPU directory.  OK to install?

Richard


2017-09-22  Richard Sandiford  

gcc/
* target.def (vec_perm_const_ok): Change sel parameter to
vec_perm_indices.
* optabs-query.c (can_vec_perm_p): Update accordingly.
* doc/tm.texi: Regenerate.
* config/aarch64/aarch64.c (expand_vec_perm_d): Change perm to
auto_vec_perm_indices and remove separate nelt field.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_zip)
(aarch64_evpc_ext, aarch64_evpc_rev, aarch64_evpc_dup)
(aarch64_evpc_tbl, aarch64_expand_vec_perm_const_1)
(aarch64_expand_vec_perm_const): Update accordingly.
(aarch64_vectorize_vec_perm_const_ok): Likewise.  Change sel
to vec_perm_indices.
* config/arm/arm.c (expand_vec_perm_d): Change perm to
auto_vec_perm_indices and remove separate nelt field.
(arm_evpc_neon_vuzp, arm_evpc_neon_vzip, arm_evpc_neon_vrev)
(arm_evpc_neon_vtrn, arm_evpc_neon_vext, arm_evpc_neon_vtbl)
(arm_expand_vec_perm_const_1, arm_expand_vec_perm_const): Update
accordingly.
(arm_vectorize_vec_perm_const_ok): Likewise.  Change sel
to vec_perm_indices.
* config/i386/i386.c (ix86_vectorize_vec_perm_const_ok): Change
sel to vec_perm_indices.
* config/ia64/ia64.c (ia64_vectorize_vec_perm_const_ok): Likewise.
* config/mips/mips.c (mips_vectorize_vec_perm_const_ok): Likewise.
* config/powerpcspe/powerpcspe.c (rs6000_vectorize_vec_perm_const_ok):
Likewise.
* config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const_ok):
Likewise.

Index: gcc/target.def
===
--- gcc/target.def  2017-09-22 17:31:36.935337179 +0100
+++ gcc/target.def  2017-09-22 17:31:56.428954480 +0100
@@ -1847,7 +1847,7 @@ DEFHOOK
 DEFHOOK
 (vec_perm_const_ok,
  "Return true if a vector created for @code{vec_perm_const} is valid.",
- bool, (machine_mode, const unsigned char *sel),
+ bool, (machine_mode, vec_perm_indices),
  NULL)
 
 /* Return true if the target supports misaligned store/load of a
Index: gcc/optabs-query.c
===
--- gcc/optabs-query.c  2017-09-14 17:04:19.080694343 +0100
+++ gcc/optabs-query.c  2017-09-22 17:31:56.428006577 +0100
@@ -367,7 +367,7 @@ can_vec_perm_p (machine_mode mode, bool
   if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing
  && (sel == NULL
  || targetm.vectorize.vec_perm_const_ok == NULL
- || targetm.vectorize.vec_perm_const_ok (mode, &(*sel)[0])))
+ || targetm.vectorize.vec_perm_const_ok (mode, *sel)))
return true;
 }
 
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2017-09-22 17:31:36.933441374 +0100
+++ gcc/doc/tm.texi 2017-09-22 17:31:56.428006577 +0100
@@ -5774,7 +5774,7 @@ correct for most targets.
 Return true if vector alignment is reachable (by peeling N iterations) for the 
given scalar type @var{type}.  @var{is_packed} is false if the scalar access 
using @var{type} is known to be naturally aligned.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (machine_mode, const unsigned char *@var{sel})
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (machine_mode, @var{vec_perm_indices})
 Return true if a vector created for @code{vec_perm_const} is valid.
 @end deftypefn
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-09-21 11:53:16.681759682 +0100
+++ gcc/config/aarch64/aarch64.c2017-09-22 17:31:56.412840135 +0100
@@ -141,8 +141,8 @@ static void aarch64_elf_asm_constructor
 static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
 static void aarch64_override_options_after_change (void);
 static bool aarch64_vector_mode_supported_p (machine_mode);
-static bool aarch64_vectorize_vec_perm_const_ok (machine_mode vmode,
-const unsigned char *sel);
+static bool aarch64_vectorize_vec_perm_const_ok (machine_mode,
+vec_perm_indices);
 static int aarch64_address_cost (

Re: [PING][PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope

2017-09-22 Thread Jakub Jelinek
On Thu, Sep 21, 2017 at 01:08:01PM -0700, Kevin Buettner wrote:
> Ping.

Ok, thanks.

> On Mon, 7 Aug 2017 17:51:38 -0700
> Kevin Buettner  wrote:
> 
> > On Wed, 10 May 2017 17:24:27 +0200
> > Jakub Jelinek  wrote:
> >  
> > > What I don't like is that the patch is inconsistent, it sets DECL_CONTEXT
> > > of the child function for all kinds of outlined functions, but then you 
> > > just
> > > choose one of the many places and add it into the BLOCK tree.  Any reason
> > > why the DECL_CONTEXT change can't be done in a helper function together
> > > with all the changes you've added into omp-expand.c, and then call it from
> > > expand_omp_parallel (with the child_fn and entry_stmt arguments) so that
> > > you can call it easily also for other constructs, either now or later on? 
> > >  
> > 
> > I've worked out a way to do the DECL_CONTEXT and the scope change
> > together.  The helper function should be usable for other constructs,
> > though I have not tested this yet.
> > 
> > > Also, is there any rationale on appending the FUNCTION_DECL to BLOCK_VARS
> > > instead of prepending it there (which is cheaper)?  Does the debugger
> > > care about relative order of those artificial functions vs. other
> > > variables in the lexical scope?  
> > 
> > To the best of my knowledge, the debugger doesn't care about the
> > order.  I've changed the code to prepend the FUNCTION_DECL to
> > BLOCK_VARS instead.
> > 
> > How does this new version (below) look?
> > 
> > I've done a "make bootstrap" and "make -k check". No regressions found
> > for x86_64.
> > 
> > gcc/ChangeLog:
> > 
> > * omp-expand.c (adjust_context_scope): New function.
> > (expand_parallel_call): Call adjust_context_scope.
> > ---
> >  gcc/omp-expand.c | 38 ++
> >  1 file changed, 38 insertions(+)
> > 
> > diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> > index d6755cd..9eb0a89 100644
> > --- a/gcc/omp-expand.c
> > +++ b/gcc/omp-expand.c
> > @@ -498,6 +498,42 @@ parallel_needs_hsa_kernel_p (struct omp_region *region)
> >return false;
> >  }
> >  
> > +/* Change DECL_CONTEXT of CHILD_FNDECL to that of the parent function.
> > +   Add CHILD_FNDECL to decl chain of the supercontext of the block
> > +   ENTRY_BLOCK - this is the block which originally contained the
> > +   code from which CHILD_FNDECL was created.
> > +   
> > +   Together, these actions ensure that the debug info for the outlined
> > +   function will be emitted with the correct lexical scope.  */
> > +
> > +static void
> > +adjust_context_and_scope (tree entry_block, tree child_fndecl)
> > +{
> > +  if (entry_block != NULL_TREE && TREE_CODE (entry_block) == BLOCK)
> > +{
> > +  tree b = BLOCK_SUPERCONTEXT (entry_block);
> > +
> > +  if (TREE_CODE (b) == BLOCK)
> > +{
> > + tree parent_fndecl;
> > +
> > + /* Follow supercontext chain until the parent fndecl
> > +is found.  */
> > + for (parent_fndecl = BLOCK_SUPERCONTEXT (b);
> > +  TREE_CODE (parent_fndecl) == BLOCK;
> > +  parent_fndecl = BLOCK_SUPERCONTEXT (parent_fndecl))
> > +   ;
> > +
> > + gcc_assert (TREE_CODE (parent_fndecl) == FUNCTION_DECL);
> > +
> > + DECL_CONTEXT (child_fndecl) = parent_fndecl;
> > +
> > + DECL_CHAIN (child_fndecl) = BLOCK_VARS (b);
> > + BLOCK_VARS (b) = child_fndecl;
> > +   }
> > +}
> > +}
> > +
> >  /* Build the function calls to GOMP_parallel_start etc to actually
> > generate the parallel operation.  REGION is the parallel region
> > being expanded.  BB is the block where to insert the code.  WS_ARGS
> > @@ -667,6 +703,8 @@ expand_parallel_call (struct omp_region *region, 
> > basic_block bb,
> >tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
> >t2 = build_fold_addr_expr (child_fndecl);
> >  
> > +  adjust_context_and_scope (gimple_block (entry_stmt), child_fndecl);
> > +
> >vec_alloc (args, 4 + vec_safe_length (ws_args));
> >args->quick_push (t2);
> >args->quick_push (t1);

Jakub


Re: [patch][arm] (respin) Improve error checking in parsecpu.awk

2017-09-22 Thread Ramana Radhakrishnan

On 9/22/17 4:58 PM, Vidya Praveen wrote:

Hello,

This patch by Richard Earnshaw was reverted along with the commit that preceded
it as the preceding commit was causing cross-native builds to fail and I
presumed this patch was related too.  Now I am respinning as the issue that
caused the cross-native failure has been fixed.  This patch, however, is simply
rebased and has no other changes.

For reference, the ChangeLog of the preceding patch that broke cross-native 
build.

 [arm] auto-generate arm-isa.h from CPU descriptions

 This patch autogenerates arm-isa.h from new entries in arm-cpus.in.
 This has the primary advantage that it makes the description file more
 self-contained, but it also solves the 'array dimensioning' problem
 that Tamar recently encountered.  It adds two new constructs to
 arm-cpus.in: features and fgroups.  Fgroups are simply a way of naming
 a group of feature bits so that they can be referenced together.  We
 follow the convention that feature bits are all lower case, while
 fgroups are (predominantly) upper case.  This is helpful as in some
 contexts they share the same namespace.  Most of the minor changes in
 this patch are related to adopting this new naming convention.

 * config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
 * config/arm/arm-isa.h: Delete.  Move definitions to ...
 * arm-cpus.in: ... here.  Use new feature and fgroup values.
 * config/arm/arm.c (arm_option_override): Use lower case for feature
 bit names.
 * config/arm/arm.h (TARGET_HARD_FLOAT): Likewise.
 (TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise.
 * config/arm/parsecpu.awk (END): Add new command 'isa'.
 (isa_pfx): Delete.
 (print_isa_bits_for): New function.
 (gen_isa): New function.
 (gen_comm_data): Use print_isa_bits_for.
 (define feature): New keyword.
 (define fgroup): New keyword.
 * config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h
 (arm-isa.h): Add rule to generate file.
 * common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower
 case for feature bit names.

Tested by building cross/cross-native arm-none-linux-gnueabihf and baremetal
cross build (arm-none-eabi) on x86_64.

OK for trunk?



Ok, please apply.

Ramana


Regards
VP.

gcc/ChangeLog:

[arm] Improve error checking in parsecpu.awk

This patch adds a bit more error checking to parsecpu.awk to ensure
that statements are not missing arguments or have excess arguments
beyond those permitted.  It also slightly improves the handling of
errors so that we terminate properly if parsing fails and be as
helpful as we can while in the parsing phase.

2017-09-22  Richard Earnshaw  

* config/arm/parsecpu.awk (fatal): Note that we've encountered an
error.  Only quit immediately if parsing is complete.
(BEGIN): Initialize fatal_err and parse_done.
(begin fpu, end fpu): Check number of arguments.
(begin arch, end arch): Likewise.
(begin cpu, end cpu): Likewise.
(cname, tune for, tune flags, architecture, fpu, option): Likewise.
(optalias): Likewise.





Re: [PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Jim Wilson
On Fri, Sep 22, 2017 at 8:49 AM, Jim Wilson  wrote:
> On Falkor, because of an idiosyncrasy of how the pipelines are designed, a
> quad-word store using a reg+reg addressing mode is almost twice as slow as an
> add followed by a quad-word store with a single reg addressing mode.  So we
> get better performance if we disallow addressing modes using register offsets
> with quad-word stores.

This was tested with a bootstrap and make check of course.  Also,
gcc.c-torture/compile/20021212-1.c compiled with -O3 -mcpu=falkor
makes a nice testcase to demonstrate that the patch works.

OK?

Jim


[patch][arm] (respin) Improve error checking in parsecpu.awk

2017-09-22 Thread Vidya Praveen
Hello,

This patch by Richard Earnshaw was reverted along with the commit that preceded
it as the preceding commit was causing  cross-native builds  to fail and I
presumed this patch was related too.  Now I am respinning as the issue that
caused the cross-native failure have been fixed. This patch however is simply
rebased and has no other changes.

For reference, the ChangeLog of the preceding patch that broke cross-native 
build.

[arm] auto-generate arm-isa.h from CPU descriptions

This patch autogenerates arm-isa.h from new entries in arm-cpus.in.
This has the primary advantage that it makes the description file more
self-contained, but it also solves the 'array dimensioning' problem
that Tamar recently encountered.  It adds two new constructs to
arm-cpus.in: features and fgroups.  Fgroups are simply a way of naming
a group of feature bits so that they can be referenced together.  We
follow the convention that feature bits are all lower case, while
fgroups are (predominantly) upper case.  This is helpful as in some
contexts they share the same namespace.  Most of the minor changes in
this patch are related to adopting this new naming convention.

* config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
* config/arm/arm-isa.h: Delete.  Move definitions to ...
* arm-cpus.in: ... here.  Use new feature and fgroup values.
* config/arm/arm.c (arm_option_override): Use lower case for feature
bit names.
* config/arm/arm.h (TARGET_HARD_FLOAT): Likewise.
(TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise.
* config/arm/parsecpu.awk (END): Add new command 'isa'.
(isa_pfx): Delete.
(print_isa_bits_for): New function.
(gen_isa): New function.
(gen_comm_data): Use print_isa_bits_for.
(define feature): New keyword.
(define fgroup): New keyword.
* config/arm/t-arm (OPTIONS_H_EXTRA): Add arm-isa.h
(arm-isa.h): Add rule to generate file.
* common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower
case for feature bit names.

Tested by building cross/cross-native arm-none-linux-gnueabihf and baremetal
cross build (arm-none-eabi) on x86_64.

OK for trunk?

Regards
VP.

gcc/ChangeLog:

[arm] Improve error checking in parsecpu.awk

This patch adds a bit more error checking to parsecpu.awk to ensure
that statements are not missing arguments or have excess arguments
beyond those permitted.  It also slightly improves the handling of
errors so that we terminate properly if parsing fails and be as
helpful as we can while in the parsing phase.

2017-09-22  Richard Earnshaw  

* config/arm/parsecpu.awk (fatal): Note that we've encountered an
error.  Only quit immediately if parsing is complete.
(BEGIN): Initialize fatal_err and parse_done.
(begin fpu, end fpu): Check number of arguments.
(begin arch, end arch): Likewise.
(begin cpu, end cpu): Likewise.
(cname, tune for, tune flags, architecture, fpu, option): Likewise.
(optalias): Likewise.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 5b9217c..4885746 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -2154,6 +2154,17 @@
 
 2017-09-06  Richard Earnshaw  
 
+	* config/arm/parsecpu.awk (fatal): Note that we've encountered an
+	error.  Only quit immediately if parsing is complete.
+	(BEGIN): Initialize fatal_err and parse_done.
+	(begin fpu, end fpu): Check number of arguments.
+	(begin arch, end arch): Likewise.
+	(begin cpu, end cpu): Likewise.
+	(cname, tune for, tune flags, architecture, fpu, option): Likewise.
+	(optalias): Likewise.
+
+2017-09-06  Richard Earnshaw  
+
 	* config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
 	* config/arm/arm-isa.h: Delete.  Move definitions to ...
 	* arm-cpus.in: ... here.  Use new feature and fgroup values.
diff --git a/gcc/config/arm/parsecpu.awk b/gcc/config/arm/parsecpu.awk
index d07d3fc..0b4fc68 100644
--- a/gcc/config/arm/parsecpu.awk
+++ b/gcc/config/arm/parsecpu.awk
@@ -32,7 +32,8 @@
 
 function fatal (m) {
 print "error ("lineno"): " m > "/dev/stderr"
-exit 1
+fatal_err = 1
+if (parse_done) exit 1
 }
 
 function toplevel () {
@@ -502,14 +503,18 @@ BEGIN {
 arch_name = ""
 fpu_name = ""
 lineno = 0
+fatal_err = 0
+parse_done = 0
 if (cmd == "") fatal("Usage parsecpu.awk -v cmd=")
 }
 
+# New line.  Reset parse status and increment line count for error messages
 // {
 lineno++
 parse_ok = 0
 }
 
+# Comments must be on a line on their own.
 /^#/ {
 parse_ok = 1
 }
@@ -552,12 +557,14 @@ BEGIN {
 }
 
 /^begin fpu / {
+if (NF != 3) fatal("syntax: begin fpu ")
 toplevel()
 fpu_name = $3
 parse_ok = 1
 }
 
 /^end fpu / {
+if (NF != 3) fatal("syntax: end fpu ")
 if (fpu_name != $3) fatal("mimatched end fpu")
 if (! (fpu_name in fpu_isa)) {
 	fatal("fpu definition \"" fpu_name "\" lacks an \"isa\" stat

Re: [patch][arm] (respin) auto-generate arm-isa.h from CPU descriptions

2017-09-22 Thread Ramana Radhakrishnan

On 22/09/17 16:54, Vidya Praveen wrote:

Hello,

This patch by Richard Earnshaw was reverted earlier as it was breaking
cross-native builds. Respinning now with a minor change that fixes the build
issue - adding arm-isa.h to GTM_H. Also remove a redundant dependency (TM_H
includes GTM_H).

Tested by building cross/cross-native arm-none-linux-gnueabihf and baremetal
cross build (arm-none-eabi) on x86_64.

OK for trunk?


Ok, please apply.

Ramana



Regards
VP.

gcc/ChangeLog:

[arm] auto-generate arm-isa.h from CPU descriptions

This patch autogenerates arm-isa.h from new entries in arm-cpus.in.
This has the primary advantage that it makes the description file more
self-contained, but it also solves the 'array dimensioning' problem
that Tamar recently encountered.  It adds two new constructs to
arm-cpus.in: features and fgroups.  Fgroups are simply a way of naming
a group of feature bits so that they can be referenced together.  We
follow the convention that feature bits are all lower case, while
fgroups are (predominantly) upper case. This is helpful as in some
contexts they share the same namespace. Most of the minor changes in
this patch are related to adopting this new naming convention.

2017-09-22  Richard Earnshaw  

* config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
* config/arm/arm-isa.h: Delete.  Move definitions to ...
* arm-cpus.in: ... here.  Use new feature and fgroup values.
* config/arm/arm.c (arm_option_override): Use lower case for feature
bit names.
* config/arm/arm.h (TARGET_HARD_FLOAT): Likewise.
(TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise.
* config/arm/parsecpu.awk (END): Add new command 'isa'.
(isa_pfx): Delete.
(print_isa_bits_for): New function.
(gen_isa): New function.
(gen_comm_data): Use print_isa_bits_for.
(define feature): New keyword.
(define fgroup): New keyword.
* config/arm/t-arm (TM_H): Remove.
(GTM_H): Add arm-isa.h.
(arm-isa.h): Add rule to generate file.
* common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower
case for feature bit names.







[patch][arm] (respin) auto-generate arm-isa.h from CPU descriptions

2017-09-22 Thread Vidya Praveen
Hello,

This patch by Richard Earnshaw was reverted earlier as it was breaking
cross-native builds. Respinning now with a minor change that fixes the build
issue - adding arm-isa.h to GTM_H. Also remove a redundant dependency (TM_H
includes GTM_H).

Tested by building cross/cross-native arm-none-linux-gnueabihf and baremetal
cross build (arm-none-eabi) on x86_64.

OK for trunk?

Regards
VP.

gcc/ChangeLog:

[arm] auto-generate arm-isa.h from CPU descriptions

This patch autogenerates arm-isa.h from new entries in arm-cpus.in.
This has the primary advantage that it makes the description file more
self-contained, but it also solves the 'array dimensioning' problem
that Tamar recently encountered.  It adds two new constructs to
arm-cpus.in: features and fgroups.  Fgroups are simply a way of naming
a group of feature bits so that they can be referenced together.  We
follow the convention that feature bits are all lower case, while
fgroups are (predominantly) upper case. This is helpful as in some
contexts they share the same namespace. Most of the minor changes in
this patch are related to adopting this new naming convention.

2017-09-22  Richard Earnshaw  

* config.gcc (arm*-*-*): Don't add arm-isa.h to tm_p_file.
* config/arm/arm-isa.h: Delete.  Move definitions to ...
* arm-cpus.in: ... here.  Use new feature and fgroup values.
* config/arm/arm.c (arm_option_override): Use lower case for feature
bit names.
* config/arm/arm.h (TARGET_HARD_FLOAT): Likewise.
(TARGET_VFP3, TARGET_VFP5, TARGET_FMA): Likewise.
* config/arm/parsecpu.awk (END): Add new command 'isa'.
(isa_pfx): Delete.
(print_isa_bits_for): New function.
(gen_isa): New function.
(gen_comm_data): Use print_isa_bits_for.
(define feature): New keyword.
(define fgroup): New keyword.
* config/arm/t-arm (TM_H): Remove.
(GTM_H): Add arm-isa.h.
(arm-isa.h): Add rule to generate file.
* common/config/arm/arm-common.c: (arm_canon_arch_option): Use lower
case for feature bit names.

diff --git a/gcc/common/config/arm/arm-common.c b/gcc/common/config/arm/arm-common.c
index 38bd3a7..7cb99ec 100644
--- a/gcc/common/config/arm/arm-common.c
+++ b/gcc/common/config/arm/arm-common.c
@@ -574,7 +574,7 @@ arm_canon_arch_option (int argc, const char **argv)
 	{
 	  /* The easiest and safest way to remove the default fpu
 	 capabilities is to look for a '+no..' option that removes
-	 the base FPU bit (isa_bit_VFPv2).  If that doesn't exist
+	 the base FPU bit (isa_bit_vfpv2).  If that doesn't exist
 	 then the best we can do is strip out all the bits that
 	 might be part of the most capable FPU we know about,
 	 which is "crypto-neon-fp-armv8".  */
@@ -586,7 +586,7 @@ arm_canon_arch_option (int argc, const char **argv)
 		   ++ext)
 		{
 		  if (ext->remove
-		  && check_isa_bits_for (ext->isa_bits, isa_bit_VFPv2))
+		  && check_isa_bits_for (ext->isa_bits, isa_bit_vfpv2))
 		{
 		  arm_initialize_isa (fpu_isa, ext->isa_bits);
 		  bitmap_and_compl (target_isa, target_isa, fpu_isa);
@@ -620,7 +620,7 @@ arm_canon_arch_option (int argc, const char **argv)
 {
   /* Clearing the VFPv2 bit is sufficient to stop any extention that
 	 builds on the FPU from matching.  */
-  bitmap_clear_bit (target_isa, isa_bit_VFPv2);
+  bitmap_clear_bit (target_isa, isa_bit_vfpv2);
 }
 
   /* If we don't have a selected architecture by now, something's
@@ -692,8 +692,8 @@ arm_canon_arch_option (int argc, const char **argv)
  capable FPU variant that we do support.  This is sufficient for
  multilib selection.  */
 
-  if (bitmap_bit_p (target_isa_unsatisfied, isa_bit_VFPv2)
-  && bitmap_bit_p (fpu_isa, isa_bit_VFPv2))
+  if (bitmap_bit_p (target_isa_unsatisfied, isa_bit_vfpv2)
+  && bitmap_bit_p (fpu_isa, isa_bit_vfpv2))
 {
   std::list::iterator ipoint = extensions.begin ();
 
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 555ed69..00225104 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -593,7 +593,7 @@ x86_64-*-*)
 	tm_file="vxworks-dummy.h ${tm_file}"
 	;;
 arm*-*-*)
-	tm_p_file="arm/arm-flags.h arm/arm-isa.h ${tm_p_file} arm/aarch-common-protos.h"
+	tm_p_file="arm/arm-flags.h ${tm_p_file} arm/aarch-common-protos.h"
 	tm_file="vxworks-dummy.h ${tm_file}"
 	;;
 mips*-*-* | sh*-*-* | sparc*-*-*)
diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
index d009a9e..07de4c9 100644
--- a/gcc/config/arm/arm-cpus.in
+++ b/gcc/config/arm/arm-cpus.in
@@ -40,6 +40,210 @@
 # names in the final compiler.  The order within each group is preserved and
 # forms the order for the list within the compiler.
 
+# Most objects in this file support forward references.  The major
+# exception is feature groups, which may only refer to previously
+# defined features or feature groups.  This is done to avoid the risk
+# of feature groups recursively ref

[PATCH, AArch64] Disable reg offset in quad-word store for Falkor.

2017-09-22 Thread Jim Wilson
On Falkor, because of an idiosyncrasy of how the pipelines are designed, a
quad-word store using a reg+reg addressing mode is almost twice as slow as an
add followed by a quad-word store with a single reg addressing mode.  So we
get better performance if we disallow addressing modes using register offsets
with quad-word stores.
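
As a hand-written illustration (not from the patch or its testcase), an
indexed 128-bit vector copy is the kind of code affected; the type name
and the expected code generation below are assumptions, not output from
the patch:

typedef int v4si __attribute__ ((vector_size (16)));

/* Illustration only: with -O2 -mcpu=falkor the quad-word store to
   dst[i] could previously use a register-offset address; with the new
   tuning flag (and the Utf constraint) it is expected to go through a
   separate address add and a single-base-register store instead.
   Loads are not restricted.  */
void
copy_q (v4si *dst, const v4si *src, long i)
{
  dst[i] = src[i];
}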

Using lmbench compiled with -O2 -ftree-vectorize as my benchmark, I see a 13%
performance increase on stream copy using this patch, and a 16% performance
increase on stream scale using this patch.  I also see a small performance
increase on SPEC CPU2006 of around 0.2% for int and 0.4% for FP at -O3.

gcc/
* config/aarch64/aarch64-protos.h (aarch64_movti_target_operand_p):
New.
* config/aarch64/aarch64-simd.md (aarch64_simd_mov): Use Utf.
* config/aarch64/aarch64-tuning-flags.def
(SLOW_REGOFFSET_QUADWORD_STORE): New.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
SLOW_REGOFFSET_QUADWORD_STORE to tuning flags.
(aarch64_movti_target_operand_p): New.
* config/aarch64/aarch64.md (movti_aarch64): Use Utf.
(movtf_aarch64): Likewise.
* config/aarch64/constraints.md (Utf): New.
---
 gcc/config/aarch64/aarch64-protos.h |  1 +
 gcc/config/aarch64/aarch64-simd.md  |  4 ++--
 gcc/config/aarch64/aarch64-tuning-flags.def |  4 
 gcc/config/aarch64/aarch64.c| 14 +-
 gcc/config/aarch64/aarch64.md   |  6 +++---
 gcc/config/aarch64/constraints.md   |  6 ++
 6 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index e67c2ed..2dfd057 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -379,6 +379,7 @@ const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 bool aarch64_simd_mem_operand_p (rtx);
+bool aarch64_movti_target_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
 rtx aarch64_tls_get_addr (void);
 tree aarch64_fold_builtin (tree, int, tree *, bool);
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 70e9339..88bf210 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -133,9 +133,9 @@
 
 (define_insn "*aarch64_simd_mov"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, Umq,  m,  w, ?r, ?w, ?r, w")
+   "=w, Umq,  Utf,  w, ?r, ?w, ?r, w")
(match_operand:VQ 1 "general_operand"
-   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz,   w,w,  w,  r,  r, Dn"))]
   "TARGET_SIMD
&& (register_operand (operands[0], mode)
|| aarch64_simd_reg_or_zero (operands[1], mode))"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index f48642c..7d0b104 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", 
SLOW_UNALIGNED_LDPW)
are not considered cheap.  */
 AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 
+/* Don't use a register offset in a memory address for a quad-word store.  */
+AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store",
+SLOW_REGOFFSET_QUADWORD_STORE)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5e26cb7..d6f1133 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -818,7 +818,7 @@ static const struct tune_params qdf24xx_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_STRONG,  /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE),  /* tune_flags.  */
   &qdf24xx_prefetch_tune
 };
 
@@ -11821,6 +11821,18 @@ aarch64_simd_mem_operand_p (rtx op)
|| REG_P (XEXP (op, 0)));
 }
 
+/* Return TRUE if OP uses an efficient memory address for quad-word target.  */
+bool
+aarch64_movti_target_operand_p (rtx op)
+{
+  if (! optimize_size
+  && (aarch64_tune_params.extra_tuning_flags
+ & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE))
+return MEM_P (op) && ! (GET_CODE (XEXP (op, 0)) == PLUS
+   && ! CONST_INT_P (XEXP (XEXP (op, 0), 1)));
+  return MEM_P (op);
+}
+
 /* Emit a register copy from operand to operand, taking care not to
early-clobber source registers in the process.
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f8cdb06..9c7e356 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1023,7 +1023,7 @@
 
 (define_i

[committed] v2: C++: underline parameters in mismatching function calls

2017-09-22 Thread David Malcolm
On Thu, 2017-09-21 at 09:20 -0700, Nathan Sidwell wrote:
> On 09/20/2017 12:52 PM, David Malcolm wrote:
> > When we have a type mismatch in a C++ function call, e.g.
> > whereas underlining the mismatching things would make the messages
> > easier to comprehend:
> > 
> >test.c: In function 'int caller(int, int, float)':
> >test.c:5:38: error: invalid conversion from 'int' to 'const
> > char*'
> >[-fpermissive]
> > return callee (first, second, third);
> >   ^~
> >test.c:1:12: note:   initializing argument 2 of 'int callee(int,
> >const char*, float)'
> > extern int callee (int one, const char *two, float three);
> 
> Looks nice.  Thanks for addressing what Jason & I said at the summit.
> 
> > OK for trunk?
> 
> A few nits, I think ...
> 
> 
> > +  location_t param_loc;
> > +  if (declarator && declarator->id_loc)
> > +param_loc = make_location (declarator->id_loc,
> > +  decl_spec_token_start->location,
> > +  input_location);
> > +  else
> > +param_loc = make_location (decl_spec_token_start->location,
> > +  decl_spec_token_start->location,
> > +  input_location);
> 
> Are you implicitly relying on UNKNOWN_LOCATION being zero?  Should
> the 
> if condition be
>if (declarator && declarator->id_loc != UNKNOWN_LOCATION)
> ?
> 
> The only difference between those 2 calls is the first param.  I
> find 
> such idiom confusing.  Can you assign the arg to a temp and then
> call 
> the function exactly once? No need for re-review on that change.
> 
> >   /* A collection of calls where argument 2 is of the wrong type.
> >   
> >  TODO: we should put the caret and underline for the diagnostic
> > -   at the second argument, rather than the close paren.
> > -
> > -   TODO: we should highlight the second parameter of the callee,
> > rather
> > -   than its name.  */
> > +   at the second argument, rather than the close paren.  */
> 
> Yay, x-off a TODO!
> 
> Looks good otherwise.
> 
> nathan

Thanks.

I went ahead and committed the following to trunk (as r253096), which
fixes the issues you pointed out above, and the FUNCTION_FIRST_USER_PARM
one Jason pointed out.

(Successfully bootstrapped & regtested on x86_64-pc-linux-gnu)

gcc/cp/ChangeLog:
* call.c (get_fndecl_argument_location): New function.
(convert_like_real): Use it  when complaining about argument type
mismatches.
* cp-tree.h (struct cp_parameter_declarator): Add "loc" field.
* parser.c (make_parameter_declarator): Add "loc" param and use
it to initialize the new field.
(cp_parser_translation_unit): Add UNKNOWN_LOCATION for "loc" of
the "no_parameters" parameter.
(cp_parser_parameter_declaration_list): Set the location of the
result of grokdeclarator to be the parameter's loc, assuming no
errors.
(cp_parser_parameter_declaration): Generate a location for the
parameter and pass to make_parameter_declarator.

gcc/testsuite/ChangeLog:
* g++.dg/diagnostic/param-type-mismatch.C: Update expected results
to reflect highlighting of parameters; add test coverage for
callback parameters.
---
 gcc/cp/call.c  | 26 +-
 gcc/cp/cp-tree.h   |  2 +
 gcc/cp/parser.c| 32 +++-
 .../g++.dg/diagnostic/param-type-mismatch.C| 57 +-
 4 files changed, 103 insertions(+), 14 deletions(-)

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 4fa0d03..e83cf99 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6579,6 +6579,30 @@ maybe_print_user_conv_context (conversion *convs)
}
 }
 
+/* Locate the parameter with the given index within FNDECL.
+   ARGNUM is zero based, -1 indicates the `this' argument of a method.
+   Return the location of the FNDECL itself if there are problems.  */
+
+static location_t
+get_fndecl_argument_location (tree fndecl, int argnum)
+{
+  int i;
+  tree param;
+
+  /* Locate param by index within DECL_ARGUMENTS (fndecl).  */
+  for (i = 0, param = FUNCTION_FIRST_USER_PARM (fndecl);
+   i < argnum && param;
+   i++, param = TREE_CHAIN (param))
+;
+
+  /* If something went wrong (e.g. if we have a builtin and thus no arguments),
+ return the location of FNDECL.  */
+  if (param == NULL)
+return DECL_SOURCE_LOCATION (fndecl);
+
+  return DECL_SOURCE_LOCATION (param);
+}
+
 /* Perform the conversions in CONVS on the expression EXPR.  FN and
ARGNUM are used for diagnostics.  ARGNUM is zero based, -1
indicates the `this' argument of a method.  INNER is nonzero when
@@ -6680,7 +6704,7 @@ convert_like_real (conversion *convs, tree expr, tree fn, 
int argnum,
complained = permerror (loc, "invalid conversion from %qH to %qI",
TREE_TYPE (expr), totype);
   if (co

C++ patch ping

2017-09-22 Thread Jakub Jelinek
Hi!

I'd like to ping the
  http://gcc.gnu.org/ml/gcc-patches/2017-09/msg00937.html
  - fix compile time hog in replace_placeholders
patch.

Thanks

Jakub


Re: [PATCH][GRAPHITE] More TLC

2017-09-22 Thread Sebastian Pop
On Fri, Sep 22, 2017 at 8:03 AM, Richard Biener  wrote:

>
> This simplifies canonicalize_loop_closed_ssa and does other minimal
> TLC.  It also adds a testcase I reduced from a stupid mistake I made
> when reworking canonicalize_loop_closed_ssa.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> SPEC CPU 2006 is happy with it, current statistics on x86_64 with
> -Ofast -march=haswell -floop-nest-optimize are
>
>  61 loop nests "optimized"
>  45 loop nest transforms cancelled because of code generation issues
>  21 loop nest optimizations timed out the 35 ISL "operations" we allow
>
> I say "optimized" because the usual transform I've seen is static tiling
> as enforced by GRAPHITE according to --param loop-block-tile-size.
> There's no way to automagically figure what kind of transform ISL did
>

Here is how to automate (without magic) the detection
of the transform that isl did.

The problem solved by isl is the minimization of strides
in memory, and to do this, we need to tell the isl scheduler
the validity dependence graph; in graphite-optimize-isl.c,
see the validity (RAW, WAR, WAW) and the proximity
(RAR + validity) maps.  The proximity does include the
read after read, as the isl scheduler needs to minimize
strides between consecutive reads.

When you apply the schedule to the dependence graph,
one can tell from the result the strides in memory; a good
way to tell whether a transform was beneficial is to sum up
all memory strides and check that the sum of all strides
decreases after the transform.  We could add a printf with the
sum of strides before and after transforms, and have the
testcases check for that.
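
To make "minimizing strides" concrete, a hand-written example (not from
the patch or the GRAPHITE testcases):

/* With row-major storage the inner j loop below touches A with a stride
   of N doubles per iteration; interchanging the two loops makes the
   innermost stride 1.  The interchanged schedule has the smaller summed
   stride, which is what the proximity (RAR + validity) map rewards.  */
double
sum_transposed (int N, double A[N][N])
{
  double s = 0.0;
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      s += A[j][i];
  return s;
}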

> (usually none with the schedule identical check confused by FILTER
> stuff positioning).  This is also the issue with most GRAPHITE testcases.
> We can't really verify easily whether we performed loop interchange
> or not.  We can probably tell whether we applied loop fusion or
> splitting (by counting loops).
>
> I'm not aware of any remaining ICEs / wrong-code issues with GRAPHITE.
>
> I'm aware that the current "black-box" granularity hinders
> scheduling freedom (each GIMPLE BB is mapped to a ISL stmt, this
> is too coarse to schedule say two writes in a BB independently
> from each other).  Quick experiments could be done by simply
> splitting gimple BBs at some points.
>
> I'm aware that the SCOP detection algorithm assumes that it can
> walk loop->next and find loops "in order" -- but while that's
> true for the initial flow_loops_find result (DFS walk) it isn't
> true for any later created / discovered loops.  Sorting of
> loop siblings in DFS order should be easy (and a general cfgloopanal
> helper).
>
> Richard.
>
> 2017-09-22  Richard Biener  
>
> * graphite-isl-ast-to-gimple.c (graphite_verify): Inline into
> single caller.
> (graphite_regenerate_ast_isl): Do not reset SCEV.  Move debug
> print of no dependency loops ...
> * graphite.c (graphite_transform_loops): ... here.
> (canonicalize_loop_closed_ssa_form): Work from inner to outer
> loops.
> (same_close_phi_node, remove_duplicate_close_phi,
> make_close_phi_nodes_unique, defined_in_loop_p): Fold into ...
> (canonicalize_loop_closed_ssa): ... here and simplify.
> * graphite-optimize-isl.c: Include tree-vectorizer.h.
> (optimize_isl): Use dump_printf_loc to tell when we stopped
> optimizing because of an ISL timeout.
>
> * gcc.dg/graphite/scop-24.c: New testcase.
>
>
The change looks good to me.

Thanks,
Sebastian


Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag

2017-09-22 Thread Steve Kargl
On Fri, Sep 22, 2017 at 10:11:55AM +0300, Janne Blomqvist wrote:
> On Fri, Sep 22, 2017 at 8:02 AM, Janus Weil  wrote:
> > Thanks, Steve. I'm attaching the updated ChangeLog and the two test
> > cases for the two new flags. Since this appears to be  a somewhat
> > controversial feature, I'll wait two more days to allow for others to
> > contribute their feedback (positive or negative). I'll commit on
> > Sunday if I hear nothing until then.
> 
> Well, if you're actively soliciting bikeshedding, here goes:
> 
> Since we're about to have several -fdefault-real-N flags, would it
> make sense to instead make a single flag -fdefault-real=SOMEVALUE (and
> for backwards compatibility, make -fdefault-real-8 an alias for
> -fdefault-real=8)? And similarly for -fdefault-double=.
> 
> And since the standard requires that double precision variables are
> twice as big as reals, in the absence of an explicit -fdefault-double=
> flag, would it make sense to have -fdefault-real=N imply
> -fdefault-double=[2*N or if that isn't supported on the target, the
> largest supported real kind]? (This is sort-of an extension of the
> current -fdefault-real-8 behavior)

The standard requires more than just double precision being twice
as big as default real.  Among other things, storage association
rules require default logical, integer, and real to all occupy
the same number of storage units.  The promotion of DOUBLE
PRECISION to REAL(16), if available, was my misguided attempt to
enforce storage association.  That is the essence of why I think
these options should go away.

But, I understand Janus's position that -fdefault-real-16 allows
a developer to quickly test whether an algorithm works and/or is
stable at higher precision.  The unfortunate side-effect is that
a developer finds that the result of the option appears to work,
so the developer never reviews the actual code.  It is not uncommon
to find old code of the form

   REAL EPS, NEW, OLD
   PARAMETER(EPS=1.E-8)
   ...
C  TEST FOR CONVERGENCE
   IF (ABS(NEW-OLD).LT.EPS) THEN

Is the convergence criteria still correct for -fdefault-real-16?
Only a developer reviewing and properly porting the code to a 
higher precision can make that decision.

-- 
Steve
20170425 https://www.youtube.com/watch?v=VWUpyCsUKR4
20161221 https://www.youtube.com/watch?v=IbCHE-hONow


libbacktrace patch committed: Replace lstat and readlink if not available

2017-09-22 Thread Ian Lance Taylor
This patch to libbacktrace provides dummy versions of lstat and
readlink if they are not available on the system.  Bootstrapped and
ran libbacktrace tests on x86_64-pc-linux-gnu both normally and with a
hand-edited config.h.  Committed to mainline.

Ian

2017-09-22  Ian Lance Taylor  

PR sanitizer/77631
* configure.ac: Check for lstat and readlink.
* elf.c (lstat, readlink): Provide dummy versions if real versions
are not available.
* configure, config.h.in: Rebuild.
Index: configure.ac
===
--- configure.ac(revision 253093)
+++ configure.ac(working copy)
@@ -373,6 +373,7 @@ if test "$have_fcntl" = "yes"; then
 fi
 
 AC_CHECK_DECLS(strnlen)
+AC_CHECK_FUNCS(lstat readlink)
 
 # Check for getexecname function.
 if test -n "${with_target_subdir}"; then
Index: elf.c
===
--- elf.c   (revision 253093)
+++ elf.c   (working copy)
@@ -75,6 +75,35 @@ xstrnlen (const char *s, size_t maxlen)
 
 #endif
 
+#ifndef HAVE_LSTAT
+
+/* Dummy version of lstat for systems that don't have it.  */
+
+static int
+xlstat (const char *path ATTRIBUTE_UNUSED, struct stat *st ATTRIBUTE_UNUSED)
+{
+  return -1;
+}
+
+#define lstat xlstat
+
+#endif
+
+#ifndef HAVE_READLINK
+
+/* Dummy version of readlink for systems that don't have it.  */
+
+static ssize_t
+xreadlink (const char *path ATTRIBUTE_UNUSED, char *buf ATTRIBUTE_UNUSED,
+  size_t bufsz ATTRIBUTE_UNUSED)
+{
+  return -1;
+}
+
+#define readlink xreadlink
+
+#endif
+
 #ifndef HAVE_DL_ITERATE_PHDR
 
 /* Dummy version of dl_iterate_phdr for systems that don't have it.  */


Re: [PATCH][GRAPHITE] Simplify move_sese_in_condition

2017-09-22 Thread Sebastian Pop
On Fri, Sep 22, 2017 at 4:37 AM, Richard Biener  wrote:

>
> This re-implements it avoding the need to recompute dominators and in
> a much simpler way.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress, SPEC CPU
> 2006 is happy.
>
> Richard.
>
> 2017-09-22  Richard Biener  
>
> * sese.c: Include cfganal.h.
> (if_region_set_false_region): Remove.
> (create_if_region_on_edge): Likewise.
> (move_sese_in_condition): Re-implement without destroying
> dominators.
>

This is an excellent cleanup.  Thanks!

Sebastian


[PATCH] Fix __mulv[dt]i3 and expand_mul_overflow (PR target/82274)

2017-09-22 Thread Jakub Jelinek
Hi!

An overflow isn't detected on multiplication of a signed integer
with all ones in the upper half of bits and all zeros in the lower half
by itself, either with -ftrapv, or __builtin_mul_overflow{,_p},
or signed-integer-overflow sanitization.

For libgcc, we have various cases we handle earlier (e.g. when one or both
operands are properly sign-extended from half precision to the full
precision), and for other operands we mostly abort, except for some special
cases (when upper half is all 0s or all 1s, but the msb of lower half is
different from that).  When upper halves of both operands are all 1s,
we mishandle the above mentioned special case:
  DWunion ww = {.ll = (UDWtype) (UWtype) uu.s.low
* (UDWtype) (UWtype) vv.s.low};

  ww.s.high -= uu.s.low;
  ww.s.high -= vv.s.low;
  if (__builtin_expect (ww.s.high >= 0, 1))
return ww.ll;
ww.ll will be 0 and so will be ww.s.high, so we return 0, but
the only case when multiplication result is 0 without overflow is when
either of the operands is 0 (but in that case it is properly sign-extended
from lower half and thus we handle it earlier).
So, either we could do if (__builtin_expect (ww.s.high >= 0 && ww.ll, 1))
or what the patch does (which is as expensive as that IMHO, but without
performing the multiplication in that case).
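
To make the failure concrete, here is the value the new pr82274-1.c test
below feeds in, walked through by hand for 64-bit DWtype with 32-bit
halves: take u = v = -(1LL << 32), i.e. 0xFFFFFFFF00000000.

/* uu.s.high = vv.s.high = -1 and uu.s.low = vv.s.low = 0, so neither
   operand is sign-extended from the lower half and the special case
   quoted above is entered.  */
ww.ll = (UDWtype) (UWtype) 0 * (UDWtype) (UWtype) 0;  /* 0 */
ww.s.high -= uu.s.low;                                /* still 0 */
ww.s.high -= vv.s.low;                                /* still 0 */
/* ww.s.high >= 0 holds, so the old code returned 0 without trapping,
   although the true product is 2**64 and does not fit in DWtype.  */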

The expand_mul_overflow fix is similar; in that case we know that either
both operands have all upper-half bits set and the most significant bit
of the lower half clear, or both have all upper-half bits clear and the
most significant bit of the lower half set.
In any case, we know that neither operand is 0, and thus if the result is
0, we certainly have overflow.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
release branches?

2017-09-22  Jakub Jelinek  

PR target/82274
* internal-fn.c (expand_mul_overflow): If both operands have
the same highpart of -1 or 0 and the topmost bit of lowpart
is different, overflow is if res <= 0 rather than res < 0.

* libgcc2.c (__mulvDI3): If both operands have
the same highpart of -1 and the topmost bit of lowpart is 0,
multiplication overflows even if both lowparts are 0.

* gcc.dg/pr82274-1.c: New test.
* gcc.dg/pr82274-2.c: New test.

--- gcc/internal-fn.c.jj2017-09-01 09:25:44.0 +0200
+++ gcc/internal-fn.c   2017-09-22 11:10:13.901438868 +0200
@@ -1770,8 +1770,8 @@ expand_mul_overflow (location_t loc, tre
}
 
  /* At this point hipart{0,1} are both in [-1, 0].  If they are
-the same, overflow happened if res is negative, if they are
-different, overflow happened if res is positive.  */
+the same, overflow happened if res is non-positive, if they
+are different, overflow happened if res is positive.  */
  if (op0_sign != 1 && op1_sign != 1 && op0_sign != op1_sign)
emit_jump (hipart_different);
  else if (op0_sign == 1 || op1_sign == 1)
@@ -1779,7 +1779,7 @@ expand_mul_overflow (location_t loc, tre
 NULL_RTX, NULL, hipart_different,
 profile_probability::even ());
 
- do_compare_rtx_and_jump (res, const0_rtx, LT, false, mode,
+ do_compare_rtx_and_jump (res, const0_rtx, LE, false, mode,
   NULL_RTX, NULL, do_error,
   profile_probability::very_unlikely ());
  emit_jump (done_label);
--- libgcc/libgcc2.c.jj 2017-01-01 12:45:50.0 +0100
+++ libgcc/libgcc2.c2017-09-22 10:17:56.010757022 +0200
@@ -375,7 +375,8 @@ __mulvDI3 (DWtype u, DWtype v)
}
  else
{
- if (uu.s.high == (Wtype) -1 && vv.s.high == (Wtype) - 1)
+ if ((uu.s.high & vv.s.high) == (Wtype) -1
+ && (uu.s.low | vv.s.low) != 0)
{
  DWunion ww = {.ll = (UDWtype) (UWtype) uu.s.low
* (UDWtype) (UWtype) vv.s.low};
--- gcc/testsuite/gcc.dg/pr82274-1.c.jj 2017-09-22 09:58:06.016739604 +0200
+++ gcc/testsuite/gcc.dg/pr82274-1.c2017-09-22 10:10:53.466071164 +0200
@@ -0,0 +1,16 @@
+/* PR target/82274 */
+/* { dg-do run } */
+/* { dg-shouldfail "trapv" } */
+/* { dg-options "-ftrapv" } */
+
+int
+main ()
+{
+#ifdef __SIZEOF_INT128__
+  volatile __int128 m = -(((__int128) 1) << (__CHAR_BIT__ * __SIZEOF_INT128__ / 2));
+#else
+  volatile long long m = -(1LL << (__CHAR_BIT__ * __SIZEOF_LONG_LONG__ / 2));
+#endif
+  m = m * m;
+  return 0;
+}
--- gcc/testsuite/gcc.dg/pr82274-2.c.jj 2017-09-22 11:17:59.534630946 +0200
+++ gcc/testsuite/gcc.dg/pr82274-2.c2017-09-22 11:19:17.340661013 +0200
@@ -0,0 +1,26 @@
+/* PR target/82274 */
+/* { dg-do run } */
+/

[PATCH][GRAPHITE] More TLC

2017-09-22 Thread Richard Biener

This simplifies canonicalize_loop_closed_ssa and does other minimal
TLC.  It also adds a testcase I reduced from a stupid mistake I made
when reworking canonicalize_loop_closed_ssa.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

SPEC CPU 2006 is happy with it, current statistics on x86_64 with
-Ofast -march=haswell -floop-nest-optimize are

 61 loop nests "optimized"
 45 loop nest transforms cancelled because of code generation issues
 21 loop nest optimizations timed out the 35 ISL "operations" we allow

I say "optimized" because the usual transform I've seen is static tiling
as enforced by GRAPHITE according to --param loop-block-tile-size.
There's no way to automagically figure what kind of transform ISL did
(usually none with the schedule identical check confused by FILTER
stuff positioning).  This is also the issue with most GRAPHITE testcases.
We can't really verify easily whether we performed loop interchange
or not.  We can probably tell whether we applied loop fusion or
splitting (by counting loops).

I'm not aware of any remaining ICEs / wrong-code issues with GRAPHITE.

I'm aware that the current "black-box" granularity hinders
scheduling freedom (each GIMPLE BB is mapped to a ISL stmt, this
is too coarse to schedule say two writes in a BB independently
from each other).  Quick experiments could be done by simply
splitting gimple BBs at some points.

I'm aware that the SCOP detection algorithm assumes that it can
walk loop->next and find loops "in order" -- but while that's
true for the initial flow_loops_find result (DFS walk) it isn't
true for any later created / discovered loops.  Sorting of
loop siblings in DFS order should be easy (and a general cfgloopanal
helper).

Richard.

2017-09-22  Richard Biener  

* graphite-isl-ast-to-gimple.c (graphite_verify): Inline into
single caller.
(graphite_regenerate_ast_isl): Do not reset SCEV.  Move debug
print of no dependency loops ...
* graphite.c (graphite_transform_loops): ... here.
(canonicalize_loop_closed_ssa_form): Work from inner to outer
loops.
(same_close_phi_node, remove_duplicate_close_phi,
make_close_phi_nodes_unique, defined_in_loop_p): Fold into ...
(canonicalize_loop_closed_ssa): ... here and simplify.
* graphite-optimize-isl.c: Include tree-vectorizer.h.
(optimize_isl): Use dump_printf_loc to tell when we stopped
optimizing because of an ISL timeout.

* gcc.dg/graphite/scop-24.c: New testcase.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 253091)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -73,15 +73,6 @@ struct ast_build_info
   bool is_parallelizable;
 };
 
-/* Verifies properties that GRAPHITE should maintain during translation.  */
-
-static inline void
-graphite_verify (void)
-{
-  checking_verify_loop_structure ();
-  checking_verify_loop_closed_ssa (true);
-}
-
 /* IVS_PARAMS maps isl's scattering and parameter identifiers
to corresponding trees.  */
 
@@ -2997,8 +2988,9 @@ graphite_regenerate_ast_isl (scop_p scop
  delete_loop (loop);
 }
 
-  graphite_verify ();
-  scev_reset ();
+  /* Verifies properties that GRAPHITE should maintain during translation.  */
+  checking_verify_loop_structure ();
+  checking_verify_loop_closed_ssa (true);
 
   free (if_region->true_region);
   free (if_region->region);
@@ -3008,19 +3000,6 @@ graphite_regenerate_ast_isl (scop_p scop
   isl_ast_node_free (root_node);
   timevar_pop (TV_GRAPHITE_CODE_GEN);
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
-{
-  loop_p loop;
-  int num_no_dependency = 0;
-
-  FOR_EACH_LOOP (loop, 0)
-   if (loop->can_be_parallel)
- num_no_dependency++;
-
-  fprintf (dump_file, "%d loops carried no dependency.\n",
-  num_no_dependency);
-}
-
   return !t.codegen_error_p ();
 }
 
Index: gcc/graphite-optimize-isl.c
===
--- gcc/graphite-optimize-isl.c (revision 253091)
+++ gcc/graphite-optimize-isl.c (working copy)
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
 #include "tree-data-ref.h"
 #include "params.h"
 #include "dumpfile.h"
+#include "tree-vectorizer.h"
 #include "graphite.h"
 
 
@@ -156,9 +157,12 @@ optimize_isl (scop_p scop)
   if (!scop->transformed_schedule
   || isl_ctx_last_error (scop->isl_context) == isl_error_quota)
 {
-  if (dump_file && dump_flags)
-   fprintf (dump_file, "isl timed out --param max-isl-operations=%d\n",
-max_operations);
+  location_t loc = find_loop_location
+   (scop->scop_info->region.entry->dest->loop_father);
+  dump_printf_loc (MSG_MISSED_OPTIMIZATION, loc,
+  "loop nest not optimized, optimization timed out "
+  "after %d operations [--param ma

Re: [PATCH PR82163/V2]New interface checking LCSSA for single loop

2017-09-22 Thread Richard Biener
On Fri, Sep 22, 2017 at 1:37 PM, Bin Cheng  wrote:
> Hi,
> This is the V2 patch fixing PR82163.  It rewrites verify_loop_closed_ssa by 
> checking
> uses of all definitions inside of loop.  This gives advantage that we can 
> check loop
> closed ssa form for a specific loop, rather than for whole function.  The 
> interface
> is used in fixing this issue.
> Bootstrap and test on x86_64, is it OK?

Looks good to me.

Thanks,
Richard.

> Thanks,
> bin
> 2017-09-21  Bin Cheng  
>
> PR tree-optimization/82163
> * tree-ssa-loop-manip.h (verify_loop_closed_ssa): New parameter.
> (checking_verify_loop_closed_ssa): New parameter.
> * tree-ssa-loop-manip.c (check_loop_closed_ssa_use): Delete.
> (check_loop_closed_ssa_stmt): Delete.
> (check_loop_closed_ssa_def, check_loop_closed_ssa_bb): New functions.
> (verify_loop_closed_ssa): Check loop closed ssa form for LOOP.
> (tree_transform_and_unroll_loop): Check loop closed ssa form only for
> changed loops.
>
> gcc/testsuite
> 2017-09-21  Bin Cheng  
>
> PR tree-optimization/82163
> * gcc.dg/tree-ssa/pr82163.c: New test.


[PATCH] Fix PR82291

2017-09-22 Thread Richard Biener

The following fixes a goof in if-conversion which did nothing to
false predicated blocks.  But we have to at least remove all stores.
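
For illustration only (this is not the committed gcc.dg/torture/pr82291.c
testcase, whose exact shape is not reproduced here): the situation is a
predicated store whose block predicate turns out to be false; the old code
skipped such blocks entirely and left the store in place, so it now has to
be removed instead.

/* Illustrative sketch of the scenario, not the actual testcase.  The
   guarded store can only execute when the contradictory condition holds,
   i.e. never, so if-conversion must delete it rather than keep it as an
   unconditional store in the if-converted body.  */
int a[64], b[64];

void
f (int n)
{
  for (int i = 0; i < 64; i++)
    if (b[i] > n && b[i] < n)	/* predicate simplifies to false */
      a[i] = b[i];
}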

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-09-22  Richard Biener  

PR tree-optimization/82291
* tree-if-conv.c (predicate_mem_writes): Make sure to
remove writes in blocks predicated with false.

* gcc.dg/torture/pr82291.c: New testcase.

Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 253090)
+++ gcc/tree-if-conv.c  (working copy)
@@ -2197,7 +2197,7 @@ predicate_mem_writes (loop_p loop)
   gimple *stmt;
   int index;
 
-  if (is_true_predicate (cond) || is_false_predicate (cond))
+  if (is_true_predicate (cond))
continue;
 
   swap = false;
@@ -2210,97 +2210,107 @@ predicate_mem_writes (loop_p loop)
   vect_sizes.truncate (0);
   vect_masks.truncate (0);
 
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-   if (!gimple_assign_single_p (stmt = gsi_stmt (gsi)))
- continue;
-   else if (gimple_plf (stmt, GF_PLF_2))
- {
-   tree lhs = gimple_assign_lhs (stmt);
-   tree rhs = gimple_assign_rhs1 (stmt);
-   tree ref, addr, ptr, mask;
-   gcall *new_stmt;
-   gimple_seq stmts = NULL;
-   int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
-   ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
-   mark_addressable (ref);
-   addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
-true, NULL_TREE, true,
-GSI_SAME_STMT);
-   if (!vect_sizes.is_empty ()
-   && (index = mask_exists (bitsize, vect_sizes)) != -1)
- /* Use created mask.  */
- mask = vect_masks[index];
-   else
- {
-   if (COMPARISON_CLASS_P (cond))
- mask = gimple_build (&stmts, TREE_CODE (cond),
-  boolean_type_node,
-  TREE_OPERAND (cond, 0),
-  TREE_OPERAND (cond, 1));
-   else
- {
-   gcc_assert (TREE_CODE (cond) == SSA_NAME);
-   mask = cond;
- }
-
-   if (swap)
- {
-   tree true_val
- = constant_boolean_node (true, TREE_TYPE (mask));
-   mask = gimple_build (&stmts, BIT_XOR_EXPR,
-TREE_TYPE (mask), mask, true_val);
- }
-   gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
-
-   mask = ifc_temp_var (TREE_TYPE (mask), mask, &gsi);
-   /* Save mask and its size for further use.  */
-   vect_sizes.safe_push (bitsize);
-   vect_masks.safe_push (mask);
- }
-   ptr = build_int_cst (reference_alias_ptr_type (ref),
-get_object_alignment (ref));
-   /* Copy points-to info if possible.  */
-   if (TREE_CODE (addr) == SSA_NAME && !SSA_NAME_PTR_INFO (addr))
- copy_ref_info (build2 (MEM_REF, TREE_TYPE (ref), addr, ptr),
-ref);
-   if (TREE_CODE (lhs) == SSA_NAME)
- {
-   new_stmt
- = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr,
-   ptr, mask);
-   gimple_call_set_lhs (new_stmt, lhs);
-   gimple_set_vuse (new_stmt, gimple_vuse (stmt));
- }
-   else
- {
-   new_stmt
- = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr,
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
+   {
+ if (!gimple_assign_single_p (stmt = gsi_stmt (gsi)))
+   ;
+ else if (is_false_predicate (cond))
+   {
+ unlink_stmt_vdef (stmt);
+ gsi_remove (&gsi, true);
+ release_defs (stmt);
+ continue;
+   }
+ else if (gimple_plf (stmt, GF_PLF_2))
+   {
+ tree lhs = gimple_assign_lhs (stmt);
+ tree rhs = gimple_assign_rhs1 (stmt);
+ tree ref, addr, ptr, mask;
+ gcall *new_stmt;
+ gimple_seq stmts = NULL;
+ int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
+ ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
+ mark_addressable (ref);
+ addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
+  true, NULL_TREE, true,
+  GSI_SAME_STMT);
+ if (!vect_sizes.is_empty ()
+

Re: [PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch

2017-09-22 Thread Uros Bizjak
On Fri, Sep 22, 2017 at 1:27 PM, Uros Bizjak  wrote:
> On Fri, Sep 22, 2017 at 12:28 PM, Daniel Santos  
> wrote:
>> On 09/22/2017 03:28 AM, Rainer Orth wrote:
>>> Hi Daniel,
>>>
 On 09/22/2017 02:18 AM, Rainer Orth wrote:
> Hi Daniel,
>
>> On 09/21/2017 05:18 PM, Daniel Santos wrote:
>>> So libgcc doesn't use a config.in. :(
>> Scratch that, I forgot that we're using gcc/config.in via auto-host.h.
>> So I only have to add this to gcc/configure.ac and it will be available
>> for my libgcc header -- this is what I used to sniff out support for the
>> .hidden directive.
> Please don't go that route: it's totally the wrong direction.  There's
> work going on to further decouple libgcc from gcc-private headers and
> configure results.  libgcc already has its own configure tests for
> assembler features, and its own config.in.  What's wrong with adapting
> libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for
> libgcc?  Should be trivial...
>
> Rainer
>
 Oops, I just saw your email after submitting my other patch.  Yes, I am
 mistaken about config.in, sorry about that.  I didn't see a config.h
 file, but examining further it looks like it outputs to auto-target.h.
 Also, I was looking for some HAVE_AS* macros, but they are named
 differently.
>>> Right: though some are for assembler features, the macros are named
>>> differently.
>>>
 I had previously included gcc's auto-host.h since it was in the include
 path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll
>>> HAVE_GAS_HIDDEN actually ;-)
>>>
 need to add that check into libgcc/configure.ac as well.  Again,
 shouldn't be that much code.  Sound sane to you?
>>> You could do that, but it was already used before your patches, so
>>> please separate it from the current issue if you go down that route.
>>> libgcc is still full of cleanup possibilities :-)
>>>
>>>   Rainer
>>
>> OK, so I'm just adding HAVE_AS_AVX mostly as-is from libitm (we don't
>> have $target_cpu so I'm using $target).  I do have minor concerns about
>> how this test will work on a cross-build -- I'm not an autotools expert
>> and I don't understand which assembler it will invoke, but the results
>> of the test failing only means we use .byte instead of the real
>> mnemonic, so it really shouldn't be a problem.
>>
>> I've got tests started again, so presuming that *this* one passes, is it
>> OK for the trunk?
>>
>> gcc/testsuite:
>> * gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break
>> on Solaris or with -mno-omit-frame-pointer.
>
> No need to explain the change in the ChangeLog. Just say "(b): Remove
> volatile asm."
>
>> * gcc.target/i386/pr82196-2.c: Likewise.
>>
>> libgcc:
>> * configure.ac: Add check for HAVE_AS_AVX.
>> * config.in: Regenerate.
>> * configure: Likewise.
>> * config/i386/i386-asm.h: Include auto-target.h from libgcc.
>> (SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_AVX and directly emit raw
>> .byte code when assembler doesn't support avx, correct
>> out-of-date comments.
>
> (SSE_SAVE, SSE_RESTORE): Emit .byte sequence for !HAVE_AS_AVX.
> Correct out-of-date comments.
>
>>
>>  gcc/testsuite/gcc.target/i386/pr82196-1.c |  5 ++-
>>  gcc/testsuite/gcc.target/i386/pr82196-2.c |  5 ++-
>>  libgcc/config.in  |  3 ++
>>  libgcc/config/i386/i386-asm.h | 45 ++-
>>  libgcc/configure  | 39 +++
>>  libgcc/configure.ac   | 16 ++
>>  6 files changed, 100 insertions(+), 13 deletions(-)
>
>
>>
>
>  #ifdef MS2SYSV_STUB_AVX
>  # define MS2SYSV_STUB_PREFIX __avx_
> -# define MOVAPS vmovaps
> +# ifdef HAVE_AS_AVX
> +#  define MOVAPS vmovaps
> +# endif
>
> This is unnecessarily complex. Please define MOVAPS unconditionally, and ...

Please disregard the above... It is OK, since the code also handles sse.

> +#  define BYTE .byte
> +#  define SSE_SAVE\
> + BYTE 0xc5, 0x78, 0x29, 0x78, 0xd0; /* vmovaps %xmm15,-0x30(%rax) */ \
>
> Is there a reason for BYTE definition? Every known assembler supports
> .byte directive.

+ BYTE 0xc5, 0xf8, 0x28, 0x76, 0x60; /* vmovaps  0x60(%rsi),%xmm6  */
+# endif /* MOVAPS */
 #endif /* defined (MS2SYSV_STUB_ISA) && defined (MOVAPS) */
 #endif /* I386_ASM_H */

Please update the #endif comment.

Uros.


[PATCH PR82163/V2]New interface checking LCSSA for single loop

2017-09-22 Thread Bin Cheng
Hi,
This is the V2 patch fixing PR82163.  It rewrites verify_loop_closed_ssa by 
checking
uses of all definitions inside of loop.  This gives advantage that we can check 
loop
closed ssa form for a specific loop, rather than for whole function.  The 
interface
is used in fixing this issue.
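
For context, a minimal sketch (mine, not part of the patch; the function and
variable names are made up) of the invariant being verified:

/* Loop-closed SSA requires that a value defined inside a loop and used
   after the loop is funneled through a single-argument "close" PHI on the
   loop exit.  */
int
last_element (int *a, int n)
{
  int last = 0;
  for (int i = 0; i < n; i++)
    last = a[i];		/* definition inside the loop */
  return last;			/* use outside the loop */
}
/* In loop-closed SSA the exit block reads roughly:
     last_5 = PHI <last_3>;	// the loop-closed PHI the checker expects
     return last_5;
   The rewritten checker walks each definition in LOOP and asserts that
   every non-debug use is either inside the loop or exactly such an exit
   PHI (see check_loop_closed_ssa_def below).  */
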
Bootstrap and test on x86_64, is it OK?

Thanks,
bin
2017-09-21  Bin Cheng  

PR tree-optimization/82163
* tree-ssa-loop-manip.h (verify_loop_closed_ssa): New parameter.
(checking_verify_loop_closed_ssa): New parameter.
* tree-ssa-loop-manip.c (check_loop_closed_ssa_use): Delete.
(check_loop_closed_ssa_stmt): Delete.
(check_loop_closed_ssa_def, check_loop_closed_ssa_bb): New functions.
(verify_loop_closed_ssa): Check loop closed ssa form for LOOP.
(tree_transform_and_unroll_loop): Check loop closed ssa form only for
changed loops.

gcc/testsuite
2017-09-21  Bin Cheng  

PR tree-optimization/82163
* gcc.dg/tree-ssa/pr82163.c: New test.

From ef756285db3685bd97bbe7d144d58af477251416 Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Thu, 21 Sep 2017 12:41:32 +0100
Subject: [PATCH] pr82163-20170921.txt

---
 gcc/testsuite/gcc.dg/tree-ssa/pr82163.c | 23 +
 gcc/tree-ssa-loop-manip.c   | 91 +++--
 gcc/tree-ssa-loop-manip.h   |  6 +--
 3 files changed, 80 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr82163.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr82163.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr82163.c
new file mode 100644
index 000..fef2b1d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr82163.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, b, c[4], d, e, f, g;
+
+void h ()
+{
+  for (; a; a++)
+{
+  c[a + 3] = g;
+  if (b)
+c[a] = f;
+  else
+{
+  for (; d; d++)
+c[d + 3] = c[d];
+  for (e = 1; e == 2; e++)
+;
+  if (e)
+break;
+}
+}
+}
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index d6ba305..6ad0b75 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -690,48 +690,62 @@ rewrite_virtuals_into_loop_closed_ssa (struct loop *loop)
   rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_VIRTUAL_USES, loop);
 }
 
-/* Check invariants of the loop closed ssa form for the USE in BB.  */
+/* Check invariants of the loop closed ssa form for the def in DEF_BB.  */
 
 static void
-check_loop_closed_ssa_use (basic_block bb, tree use)
+check_loop_closed_ssa_def (basic_block def_bb, tree def)
 {
-  gimple *def;
-  basic_block def_bb;
+  use_operand_p use_p;
+  imm_use_iterator iterator;
+  FOR_EACH_IMM_USE_FAST (use_p, iterator, def)
+{
+  if (is_gimple_debug (USE_STMT (use_p)))
+   continue;
 
-  if (TREE_CODE (use) != SSA_NAME || virtual_operand_p (use))
-return;
+  basic_block use_bb = gimple_bb (USE_STMT (use_p));
+  if (is_a <gphi *> (USE_STMT (use_p)))
+   use_bb = EDGE_PRED (use_bb, PHI_ARG_INDEX_FROM_USE (use_p))->src;
 
-  def = SSA_NAME_DEF_STMT (use);
-  def_bb = gimple_bb (def);
-  gcc_assert (!def_bb
- || flow_bb_inside_loop_p (def_bb->loop_father, bb));
+  gcc_assert (flow_bb_inside_loop_p (def_bb->loop_father, use_bb));
+}
 }
 
-/* Checks invariants of loop closed ssa form in statement STMT in BB.  */
+/* Checks invariants of loop closed ssa form in BB.  */
 
 static void
-check_loop_closed_ssa_stmt (basic_block bb, gimple *stmt)
+check_loop_closed_ssa_bb (basic_block bb)
 {
-  ssa_op_iter iter;
-  tree var;
+  for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+   gsi_next (&bsi))
+{
+  gphi *phi = bsi.phi ();
 
-  if (is_gimple_debug (stmt))
-return;
+  if (!virtual_operand_p (PHI_RESULT (phi)))
+   check_loop_closed_ssa_def (bb, PHI_RESULT (phi));
+}
+
+  for (gimple_stmt_iterator bsi = gsi_start_bb (bb); !gsi_end_p (bsi);
+   gsi_next (&bsi))
+{
+  ssa_op_iter iter;
+  tree var;
+  gimple *stmt = gsi_stmt (bsi);
+
+  if (is_gimple_debug (stmt))
+   continue;
 
-  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_USE)
-check_loop_closed_ssa_use (bb, var);
+  FOR_EACH_SSA_TREE_OPERAND (var, stmt, iter, SSA_OP_DEF)
+   check_loop_closed_ssa_def (bb, var);
+}
 }
 
 /* Checks that invariants of the loop closed ssa form are preserved.
-   Call verify_ssa when VERIFY_SSA_P is true.  */
+   Call verify_ssa when VERIFY_SSA_P is true.  Note all loops are checked
+   if LOOP is NULL, otherwise, only LOOP is checked.  */
 
 DEBUG_FUNCTION void
-verify_loop_closed_ssa (bool verify_ssa_p)
+verify_loop_closed_ssa (bool verify_ssa_p, struct loop *loop)
 {
-  basic_block bb;
-  edge e;
-  edge_iterator ei;
-
   if (number_of_loops (cfun) <= 1)
 return;
 
@@ -740,20 +754,22 @@ verify_loop_closed_ssa (bool verify_ssa_p)
 
   timevar_push (TV_VERIFY_LOO

Re: [PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch

2017-09-22 Thread Uros Bizjak
On Fri, Sep 22, 2017 at 12:28 PM, Daniel Santos  wrote:
> On 09/22/2017 03:28 AM, Rainer Orth wrote:
>> Hi Daniel,
>>
>>> On 09/22/2017 02:18 AM, Rainer Orth wrote:
 Hi Daniel,

> On 09/21/2017 05:18 PM, Daniel Santos wrote:
>> So libgcc doesn't use a config.in. :(
> Scratch that, I forgot that we're using gcc/config.in via auto-host.h.
> So I only have to add this to gcc/configure.ac and it will be available
> for my libgcc header -- this is what I used to sniff out support for the
> .hidden directive.
 Please don't go that route: it's totally the wrong direction.  There's
 work going on to further decouple libgcc from gcc-private headers and
 configure results.  libgcc already has its own configure tests for
 assembler features, and its own config.in.  What's wrong with adapting
 libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for
 libgcc?  Should be trivial...

 Rainer

>>> Oops, I just saw your email after submitting my other patch.  Yes, I am
>>> mistaken about config.in, sorry about that.  I didn't see a config.h
>>> file, but examining further it looks like it outputs to auto-target.h.
>>> Also, I was looking for some HAVE_AS* macros, but they are named
>>> differently.
>> Right: though some are for assembler features, the macros are named
>> differently.
>>
>>> I had previously included gcc's auto-host.h since it was in the include
>>> path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll
>> HAVE_GAS_HIDDEN actually ;-)
>>
>>> need to add that check into libgcc/configure.ac as well.  Again,
>>> shouldn't be that much code.  Sound sane to you?
>> You could do that, but it was already used before your patches, so
>> please separate it from the current issue if you go down that route.
>> libgcc is still full of cleanup possibilities :-)
>>
>>   Rainer
>
> OK, so I'm just adding HAVE_AS_AVX mostly as-is from libitm (we don't
> have $target_cpu so I'm using $target).  I do have minor concerns about
> how this test will work on a cross-build -- I'm not an autotools expert
> and I don't understand which assembler it will invoke, but the results
> of the test failing only means we use .byte instead of the real
> mnemonic, so it really shouldn't be a problem.
>
> I've got tests started again, so presuming that *this* one passes, is it
> OK for the trunk?
>
> gcc/testsuite:
> * gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break
> on Solaris or with -mno-omit-frame-pointer.

No need to explain the change in the ChangeLog. Just say "(b): Remove
volatile asm."

> * gcc.target/i386/pr82196-2.c: Likewise.
>
> libgcc:
> * configure.ac: Add check for HAVE_AS_AVX.
> * config.in: Regenerate.
> * configure: Likewise.
> * config/i386/i386-asm.h: Include auto-target.h from libgcc.
> (SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_AVX and directly emit raw
> .byte code when assembler doesn't support avx, correct
> out-of-date comments.

(SSE_SAVE, SSE_RESTORE): Emit .byte sequence for !HAVE_AS_AVX.
Correct out-of-date comments.

>
>  gcc/testsuite/gcc.target/i386/pr82196-1.c |  5 ++-
>  gcc/testsuite/gcc.target/i386/pr82196-2.c |  5 ++-
>  libgcc/config.in  |  3 ++
>  libgcc/config/i386/i386-asm.h | 45 ++-
>  libgcc/configure  | 39 +++
>  libgcc/configure.ac   | 16 ++
>  6 files changed, 100 insertions(+), 13 deletions(-)


>

 #ifdef MS2SYSV_STUB_AVX
 # define MS2SYSV_STUB_PREFIX __avx_
-# define MOVAPS vmovaps
+# ifdef HAVE_AS_AVX
+#  define MOVAPS vmovaps
+# endif

This is unnecessarily complex. Please define MOVAPS unconditionally, and ...

 #elif defined(MS2SYSV_STUB_SSE)
 # define MS2SYSV_STUB_PREFIX __sse_
 # define MOVAPS movaps
 #endif

-#if defined (MS2SYSV_STUB_PREFIX) && defined (MOVAPS)
+#if defined (MS2SYSV_STUB_PREFIX)

 # define MS2SYSV_STUB_BEGIN(base_name) \
  HIDDEN_FUNC(PASTE2(MS2SYSV_STUB_PREFIX, base_name))
@@ -83,8 +86,10 @@ ASMNAME(fn):
 # define MS2SYSV_STUB_END(base_name) \
  FUNC_END(PASTE2(MS2SYSV_STUB_PREFIX, base_name))

-/* Save SSE registers 6-15. off is the offset of rax to get to xmm6.  */
-# define SSE_SAVE   \
+/* If expanding for sse or avx and we have assembler support.  */
+# ifdef MOVAPS

... check HAVE_AS_AVX here.

+/* Save SSE registers 6-15 using rax as the base address.  */
+#  define SSE_SAVE   \
  MOVAPS %xmm15,-0x30(%rax); \


+#  define BYTE .byte
+#  define SSE_SAVE\
+ BYTE 0xc5, 0x78, 0x29, 0x78, 0xd0; /* vmovaps %xmm15,-0x30(%rax) */ \

Is there a reason for BYTE definition? Every known assembler supports
.byte directive.

Uros.


Re: [PATCH] haifa-sched: fix autopref_rank_for_schedule qsort comparator

2017-09-22 Thread Alexander Monakov
On Tue, 19 Sep 2017, Alexander Monakov wrote:
>   * haifa-sched.c (autopref_rank_for_schedule): Order 'irrelevant' insns
>   first, always call autopref_rank_data otherwise.

May I apply this patch now to unblock qsort checking?  Further changes or
adjustments can then go in independently at a later time.
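
To spell out why the reordering matters (my own illustration, not part of
the patch): a comparator that reports "equal" whenever either insn is
irrelevant does not define a consistent total preorder, which is what the
qsort checking trips over.

/* Illustrative only.  With A = {.irrelevant = 0, .rank = 1},
   B = {.irrelevant = 1} and C = {.irrelevant = 0, .rank = 5} the old
   scheme gives cmp (A, B) == 0 and cmp (B, C) == 0 but cmp (A, C) != 0,
   i.e. an intransitive "equality".  Ordering irrelevant insns first and
   always calling autopref_rank_data otherwise avoids that.  */
struct insn_info { int irrelevant; int rank; };

static int
old_style_cmp (const struct insn_info *a, const struct insn_info *b)
{
  if (a->irrelevant || b->irrelevant)
    return 0;			/* the old "continue" path */
  return a->rank - b->rank;
}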

Thanks.
Alexander

> --- a/gcc/haifa-sched.c
> +++ b/gcc/haifa-sched.c
> @@ -5707,7 +5707,8 @@ autopref_rank_data (autopref_multipass_data_t data1,
>  static int
>  autopref_rank_for_schedule (const rtx_insn *insn1, const rtx_insn *insn2)
>  {
> -  for (int write = 0; write < 2; ++write)
> +  int r = 0;
> +  for (int write = 0; write < 2 && !r; ++write)
>  {
>autopref_multipass_data_t data1
>   = &INSN_AUTOPREF_MULTIPASS_DATA (insn1)[write];
> @@ -5716,21 +5717,20 @@ autopref_rank_for_schedule (const rtx_insn *insn1, 
> const rtx_insn *insn2)
>  
>if (data1->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
>   autopref_multipass_init (insn1, write);
> -  if (data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
> - continue;
>  
>if (data2->status == AUTOPREF_MULTIPASS_DATA_UNINITIALIZED)
>   autopref_multipass_init (insn2, write);
> -  if (data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT)
> - continue;
>  
> -  if (!rtx_equal_p (data1->base, data2->base))
> - continue;
> +  int irrel1 = data1->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
> +  int irrel2 = data2->status == AUTOPREF_MULTIPASS_DATA_IRRELEVANT;
>  
> -  return autopref_rank_data (data1, data2);
> +  if (!irrel1 && !irrel2)
> + r = autopref_rank_data (data1, data2);
> +  else
> + r = irrel2 - irrel1;
>  }
>  
> -  return 0;
> +  return r;
>  }
>  
>  /* True if header of debug dump was printed.  */
> 


Re: [PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch

2017-09-22 Thread Jakub Jelinek
On Fri, Sep 22, 2017 at 05:28:00AM -0500, Daniel Santos wrote:
> +/* If the assembler doesn't support AVX then directly emit machine code
> +   for the instructions above directly.  */

Just a nit: too many "directly" words.

Jakub


Re: Add a vect_get_dr_size helper function

2017-09-22 Thread Richard Biener
On Fri, Sep 22, 2017 at 10:15 AM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Sep 14, 2017 at 4:05 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
 On Thu, Sep 14, 2017 at 1:23 PM, Richard Sandiford
  wrote:
> This patch adds a helper function for getting the number of
> bytes accessed by a scalar data reference, which helps when general
> modes have a variable size.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

 Can you put it into tree-data-ref.h?
>>>
>>> The idea (which I forgot to say) was to collect the uses within the
>>> vectoriser into one place so that we can assert in only one place
>>> that the reference is constant-sized.
>>>
>>> A general data_reference can be variable-sized, so the guarantees
>>> wouldn't be the same elsewhere.
>>>
>>> Would putting it in tree-vectorizer.h be OK?
>>
>> Maybe name it vect_get_scalar_dr_size then?
>>
>> Ok with that.
>
> As discussed on IRC, I tried also using SCALAR_TYPE_MODE instead of
> TYPE_MODE, to go with the new function name, but it turns out that
> the mode can be a single-element vector instead of a "true" scalar.
>
> This version of the patch sticks with vect_get_scalar_dr_size as
> the name, but adds a comment to say that "scalar" includes "scalar
> equivalent".  Since SCALAR_TYPE_MODE is too strong an assertion,
> I went with tree_to_uhwi instead.
>
> As also discussed on IRC, it might be possible in current sources
> to use SCALAR_TYPE_MODE (TREE_TYPE (STMT_VINFO_VECTYPE ...)) instead.
> But SVE has extending loads and truncating stores, for which the DR size
> will be different from the vector element size, so if we switch it now,
> we'd have to switch back to the DR size again later.
>
> (Sorry again for labouring such a simple change!)
>
> Tested as before.  OK to install?

Ok.

Richard.

> Thanks,
> Richard
>
>
> 2017-09-22  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * tree-vectorizer.h (vect_get_scalar_dr_size): New function.
> * tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use it.
> (vect_enhance_data_refs_alignment): Likewise.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2017-09-18 16:49:51.902170781 +0100
> +++ gcc/tree-vectorizer.h   2017-09-22 08:58:24.308199570 +0100
> @@ -1095,6 +1095,19 @@ vect_get_num_copies (loop_vec_info loop_
>   / TYPE_VECTOR_SUBPARTS (vectype));
>  }
>
> +/* Return the size of the value accessed by unvectorized data reference DR.
> +   This is only valid once STMT_VINFO_VECTYPE has been calculated for the
> +   associated gimple statement, since that guarantees that DR accesses
> +   either a scalar or a scalar equivalent.  ("Scalar equivalent" here
> +   includes things like V1SI, which can be vectorized in the same way
> +   as a plain SI.)  */
> +
> +inline unsigned int
> +vect_get_scalar_dr_size (struct data_reference *dr)
> +{
> +  return tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))));
> +}
> +
>  /* Source location */
>  extern source_location vect_location;
>
> Index: gcc/tree-vect-data-refs.c
> ===
> --- gcc/tree-vect-data-refs.c   2017-09-18 16:42:00.553203578 +0100
> +++ gcc/tree-vect-data-refs.c   2017-09-22 08:58:24.308199570 +0100
> @@ -955,7 +955,6 @@ vect_compute_data_ref_alignment (struct
>return true;
>  }
>
> -
>  /* Function vect_update_misalignment_for_peel.
> Sets DR's misalignment
> - to 0 if it has the same alignment as DR_PEEL,
> @@ -975,8 +974,8 @@ vect_update_misalignment_for_peel (struc
>unsigned int i;
>vec<dr_p> same_aligned_drs;
>struct data_reference *current_dr;
> -  int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
> -  int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF 
> (dr_peel))));
> +  int dr_size = vect_get_scalar_dr_size (dr);
> +  int dr_peel_size = vect_get_scalar_dr_size (dr_peel);
>stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
>stmt_vec_info peel_stmt_info = vinfo_for_stmt (DR_STMT (dr_peel));
>
> @@ -1664,8 +1663,7 @@ vect_enhance_data_refs_alignment (loop_v
>
>vectype = STMT_VINFO_VECTYPE (stmt_info);
>nelements = TYPE_VECTOR_SUBPARTS (vectype);
> -  mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE (
> -TREE_TYPE (DR_REF (dr))));
> + mis = DR_MISALIGNMENT (dr) / vect_get_scalar_dr_size (dr);
>   if (DR_MISALIGNMENT (dr) != 0)
> npeel_tmp = (negative ? (mis - nelements)
>  : (nelements - mis)) & (nelements - 1);
> @@ -1937,8 +1935,7 @@ vect_enhance_data_refs_alignment (loop_v
>   updating DR_MISALIGNMENT values.  The peeling factor is the
>

Re: [PATCH, rs6000] folding of vector stores in GIMPLE

2017-09-22 Thread Richard Biener
On Thu, Sep 21, 2017 at 10:56 PM, Will Schmidt
 wrote:
> Hi,
>
> Folding of vector stores in GIMPLE.
>
> - Add code to handle gimple folding for the vec_st (vector store) builtins.
> - Remove the now obsoleted folding code for vec_st from rs6000-c-c.
>
> There are two spots that I could use some feedback on.
>
> First -
>  An early exit remains in place to prevent folding of statements that do not
>  have a LHS.  To allow folding of the stores to get past the check, I have
>  added a helper function (rs6000_builtin_valid_without_lhs) that allows
>  those store intrinsics to proceed.  I'm not sure the approach (or the name I 
> chose)
>  is the best choice, so I'll defer to recommendations on how to improve that. 
> :-)
>
> Second -
>  This code (as-is) is subject to a TBAA related issue (similar to what was 
> noticed
>  in the gimple folding code for loads).  As-is, with a testcase such as:
>
> void testst_struct1b (vector double vd1, long long ll1, struct S *p)
> {
> vec_st (vd1, ll1, (vector double *)p);
> }
>
>  will generate gimple that looks like:
> MEM[(struct S *)D.3218] = vd1;
>
> If I rework the code, setting arg2_type to be ptr_type_node, i.e.
> + tree arg2_type = TREE_TYPE (arg2);
> to:
> + tree arg2_type = ptr_type_node;
>
> the generated gimple then looks like
>   MEM[(void *)D.3218] = vd1;
>
> Which is probably OK, but I cannot say for certain.  The generated .s content 
> is at least equivalent.

It looks safe.

The question I had is whether vec_st (vd1, ll1, (vector double *)p) is
equivalent
to *(vector double *)((char *)p + ll1) = vd1; from a TBAA perspective.  Thus
whether the type of the tird argument to vec_st defines the type of the access
(in C standards meaning).  If so then what we do now is pessimizing (but
as you say previously (long long *) and (long *) were aliased together and
you got wrong-code with aliasing with regular stores of the "wrong" same type).

A proper fix would be to transition this type as seen from the frontend to
GIMPLE, for example in a similar way we do for MEM_REFs by piggy-backing
that on an extra argument, a constant zero pointer of the alias
pointer type to use
(which would also serve as a type indicator of the store itself).  I'd use a
target specific internal function for this (not sure if we can have those target
specific, but I guess if it's folded away then that's fine) and get away with
the overload set.
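
Very roughly, and only as a sketch of that idea (IFN_VSX_ST below is a
made-up placeholder rather than an existing internal function, and
arg0/arg1/arg2 name the three vec_st operands as in the folding code being
discussed):

/* Hypothetical folding fragment: keep the frontend access type visible in
   GIMPLE by passing a constant zero pointer of that type, the same trick
   MEM_REF uses for its alias/type operand; later folding or expansion can
   then build a MEM_REF with the right alias set instead of (void *).  */
tree alias_ptr = build_int_cst (TREE_TYPE (arg2), 0);	/* 'vector double *' */
gcall *new_call
  = gimple_build_call_internal (IFN_VSX_ST, 4, arg0, arg1, arg2, alias_ptr);
gimple_set_location (new_call, gimple_location (stmt));
gsi_replace (gsi, new_call, true);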

Richard.

> The resulting code is verified by testcases powerpc/fold-vec-st-*.c, which
> has been posted separately.
>
> regtest looks clean on power6 and newer.
>
> pending feedback, OK for trunk?
>
> Thanks,
> -Will
>
> [gcc]
>
> 2017-09-21  Will Schmidt  
>
> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add handling
> for early folding of vector stores (ALTIVEC_BUILTIN_ST_*).
> (rs6000_builtin_valid_without_lhs): helper function.
> * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> Remove obsoleted code for handling ALTIVEC_BUILTIN_VEC_ST.
>
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index a49db97..4a363a1 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -6470,82 +6470,10 @@ altivec_resolve_overloaded_builtin (location_t loc, 
> tree fndecl,
>  convert (TREE_TYPE (stmt), arg0));
>stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>return stmt;
>  }
>
> -  /* Expand vec_st into an expression that masks the address and
> - performs the store.  We need to expand this early to allow
> - the best aliasing, as by the time we get into RTL we no longer
> - are able to honor __restrict__, for example.  We may want to
> - consider this for all memory access built-ins.
> -
> - When -maltivec=be is specified, or the wrong number of arguments
> - is provided, simply punt to existing built-in processing.  */
> -
> -  if (fcode == ALTIVEC_BUILTIN_VEC_ST
> -  && (BYTES_BIG_ENDIAN || !VECTOR_ELT_ORDER_BIG)
> -  && nargs == 3)
> -{
> -  tree arg0 = (*arglist)[0];
> -  tree arg1 = (*arglist)[1];
> -  tree arg2 = (*arglist)[2];
> -
> -  /* Construct the masked address.  Let existing error handling take
> -over if we don't have a constant offset.  */
> -  arg1 = fold (arg1);
> -
> -  if (TREE_CODE (arg1) == INTEGER_CST)
> -   {
> - if (!ptrofftype_p (TREE_TYPE (arg1)))
> -   arg1 = build1 (NOP_EXPR, sizetype, arg1);
> -
> - tree arg2_type = TREE_TYPE (arg2);
> - if (TREE_CODE (arg2_type) == ARRAY_TYPE && c_dialect_cxx ())
> -   {
> - /* Force array-to-pointer decay for C++.  */
> - arg2 = default_conversion (arg2);
> - arg2_type = TREE_TYPE (arg2);
> -   }
> -
> - /* Find the built-in to make sure a compatible one exists; if not
> -we fall back to default handling to get the error message. 

Re: [PATCH GCC]A simple implementation of loop interchange

2017-09-22 Thread Bin.Cheng
On Mon, Sep 4, 2017 at 2:54 PM, Richard Biener
 wrote:
> On Wed, Aug 30, 2017 at 6:32 PM, Bin.Cheng  wrote:
>> On Wed, Aug 30, 2017 at 3:19 PM, Richard Biener
>>  wrote:
>>> On Wed, Aug 30, 2017 at 3:18 PM, Bin Cheng  wrote:
 Hi,
 This patch implements a simple loop interchange pass in GCC, as described 
 by its comments:
 +/* This pass performs loop interchange: for example, the loop nest
 +
 +   for (int j = 0; j < N; j++)
 + for (int k = 0; k < N; k++)
 +   for (int i = 0; i < N; i++)
 +c[i][j] = c[i][j] + a[i][k]*b[k][j];
 +
 +   is transformed to
 +
 +   for (int i = 0; i < N; i++)
 + for (int j = 0; j < N; j++)
 +   for (int k = 0; k < N; k++)
 +c[i][j] = c[i][j] + a[i][k]*b[k][j];
 +
 +   This pass implements loop interchange in the following steps:
 +
 + 1) Find perfect loop nest for each innermost loop and compute data
 +   dependence relations for it.  For above example, loop nest is
 +   .
 + 2) From innermost to outermost loop, this pass tries to interchange
 +   each loop pair.  For above case, it firstly tries to interchange
 +and loop nest becomes .
 +   Then it tries to interchange  and loop nest becomes
 +   .  The overall effect is to move innermost
 +   loop to the outermost position.  For loop pair 
 +   to be interchanged, we:
 + 3) Check if data dependence relations are valid for loop interchange.
 + 4) Check if both loops can be interchanged in terms of 
 transformation.
 + 5) Check if interchanging the two loops is profitable.
 + 6) Interchange the two loops by mapping induction variables.
 +
 +   This pass also handles reductions in loop nest.  So far we only support
 +   simple reduction of inner loop and double reduction of the loop nest.  
 */

 Actually, this pass only does loop shifting, i.e. it moves the innermost loop to
 the outermost position, rather
 than doing a general permutation.  Also as a traditional loop optimizer, it only works for 
 perfect loop
 nest.  I put it just after loop distribution thus ideally loop 
 split/distribution could
 create perfect nest for it.  Unfortunately, we don't get any perfect nest 
 from distribution
 for now because it only works for innermost loop.  For example, the 
 motivation case in
 spec2k6/bwaves is not handled on this pass alone.  I have a patch 
 extending distribution
 for (innermost) loop nest and with that patch bwaves case can be handled.
 Another point is I deliberately make both the cost model and code 
 transformation (very)
 conservative.  We can support more cases, or more transformations with 
 great care when
 it is for sure known beneficial.  IMHO, we already hit over-baked issues 
 quite often and
 don't want to introduce more.
 As for code generation, this patch has an issue that invariant code in 
 outer loop could
 be moved to inner loop.  For the moment, we rely on the last lim pass to 
 handle all INV
 generated during interchange.  In the future, we may need to avoid that in 
 interchange
 itself, or another lim pass just like the one after graphite optimizations.

 Boostrap and test on x86_64 and AArch64.  Various benchmarks built and run 
 successfully.
 Note this pass is disabled in patch, while the code is exercised by 
 bootstrap/building
 programs with it enabled by default.  Any comments?
>>>
>> Thanks for quick review.
>>> +/* The same as above, but this one is only used for interchanging not
>>> +   innermost loops.  */
>>> +#define OUTER_STRIDE_RATIO (2)
>>>
>>> please make all these knobs --params.
>>>
>>> +/* Enum type for loop reduction variable.  */
>>> +enum reduction_type
>>> +{
>>> +  UNKNOWN_RTYPE = 0,
>>> +  SIMPLE_RTYPE,
>>> +  DOUBLE_RTYPE
>>> +};
>>>
>>> seeing this we should have some generic data structure / analysis for
>>> reduction detection.  This adds a third user (autopar and vectorizer
>>> are the others).  Just an idea.
>>>
>>> +/* Return true if E is abnormal edge.  */
>>> +
>>> +static inline bool
>>> +abnormal_edge (edge e)
>>> +{
>>> +  return (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP));
>>> +}
>>>
>>> bad name/comment for what it does.
>>>
>>> ... jumping to end of file / start of pass
>>>
>>> +  /* Get the outer loop.  */
>>> +  loop = superloop_at_depth (loop, loop_depth (loop) - 1);
>>>
>>> loop_outer (loop)?
>>>
>>> +  /* Only support rectangle loop nest, i.e, inner loop's niters doesn't
>>> +depends on outer loop's IV.  */
>>> +  if (chrec_contains_symbols_defined_in_loop (niters, loop->num))
>>> +   break;
>>>
>>> but you don't check for a three-nest whether niters depends on outer outer
>>> loop's IV that way.  Either the check is superfluous here or incomplete.
>> It is checked 

[PATCH try 2] [i386,testsuite,libgcc] Fix build breakage on Mac and test FAILS on Solaris caused by PR82196 patch

2017-09-22 Thread Daniel Santos
On 09/22/2017 03:28 AM, Rainer Orth wrote:
> Hi Daniel,
>
>> On 09/22/2017 02:18 AM, Rainer Orth wrote:
>>> Hi Daniel,
>>>
 On 09/21/2017 05:18 PM, Daniel Santos wrote:
> So libgcc doesn't use a config.in. :(
 Scratch that, I forgot that we're using gcc/config.in via auto-host.h. 
 So I only have to add this to gcc/configure.ac and it will be available
 for my libgcc header -- this is what I used to sniff out support for the
 .hidden directive.
>>> Please don't go that route: it's totally the wrong direction.  There's
>>> work going on to further decouple libgcc from gcc-private headers and
>>> configure results.  libgcc already has its own configure tests for
>>> assembler features, and its own config.in.  What's wrong with adapting
>>> libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for
>>> libgcc?  Should be trivial...
>>>
>>> Rainer
>>>
>> Oops, I just saw your email after submitting my other patch.  Yes, I am
>> mistaken about config.in, sorry about that.  I didn't see a config.h
>> file, but examining further it looks like it outputs to auto-target.h. 
>> Also, I was looking for some HAVE_AS* macros, but they are named
>> differently.
> Right: though some are for assembler features, the macros are named
> differently.
>
>> I had previously included gcc's auto-host.h since it was in the include
>> path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll
> HAVE_GAS_HIDDEN actually ;-)
>
>> need to add that check into libgcc/configure.ac as well.  Again,
>> shouldn't be that much code.  Sound sane to you?
> You could do that, but it was already used before your patches, so
> please separate it from the current issue if you go down that route.
> libgcc is still full of cleanup possibilities :-)
>
>   Rainer

OK, so I'm just adding HAVE_AS_AVX mostly as-is from libitm (we don't
have $target_cpu so I'm using $target).  I do have minor concerns about
how this test will work on a cross-build -- I'm not an autotools expert
and I don't understand which assembler it will invoke, but the results
of the test failing only means we use .byte instead of the real
mnemonic, so it really shouldn't be a problem.

I've got tests started again, so presuming that *this* one passes, is it
OK for the trunk?

gcc/testsuite:
* gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break
on Solaris or with -mno-omit-frame-pointer.
* gcc.target/i386/pr82196-2.c: Likewise.

libgcc:
* configure.ac: Add check for HAVE_AS_AVX.
* config.in: Regenerate.
* configure: Likewise.
* config/i386/i386-asm.h: Include auto-target.h from libgcc.
(SSE_SAVE, SSE_RESTORE): Sniff HAVE_AS_AVX and directly emit raw
.byte code when assembler doesn't support avx, correct
out-of-date comments.


 gcc/testsuite/gcc.target/i386/pr82196-1.c |  5 ++-
 gcc/testsuite/gcc.target/i386/pr82196-2.c |  5 ++-
 libgcc/config.in  |  3 ++
 libgcc/config/i386/i386-asm.h | 45 ++-
 libgcc/configure  | 39 +++
 libgcc/configure.ac   | 16 ++
 6 files changed, 100 insertions(+), 13 deletions(-)


Thanks,
Daniel
diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c b/gcc/testsuite/gcc.target/i386/pr82196-1.c
index ef858328f00..541d975480d 100644
--- a/gcc/testsuite/gcc.target/i386/pr82196-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target lp64 } } */
 /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */
-/* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */
-/* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */
+/* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */
+/* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */
 
 void __attribute__((sysv_abi)) a() {
 }
@@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() {
 static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a;
 
 void __attribute__((ms_abi)) b() {
-  __asm__ __volatile__ ("" :::"rbx", "rbp", "r12", "r13", "r14", "r15");
   a_noinfo ();
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c b/gcc/testsuite/gcc.target/i386/pr82196-2.c
index 8fe58411d5e..7166d068bc1 100644
--- a/gcc/testsuite/gcc.target/i386/pr82196-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr82196-2.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target lp64 } } */
 /* { dg-options "-mavx -mcall-ms2sysv-xlogues -O2" } */
-/* { dg-final { scan-assembler "call.*__avx_savms64_18" } } */
-/* { dg-final { scan-assembler "jmp.*__avx_resms64x_18" } } */
+/* { dg-final { scan-assembler "call.*__avx_savms64f?_12" } } */
+/* { dg-final { scan-assembler "jmp.*__avx_resms64f?x_12" } } */
 
 void __attribute__((sysv_abi)) a() {
 }
@@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() {
 static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a;
 
 void __attribute__((ms_abi)) b() {
-  __asm__ __volat

Re: [PATCH] PR target/80556

2017-09-22 Thread Iain Sandoe
> On 18 Sep 2017, at 22:08, Simon Wright  wrote:
> 
> On 18 Sep 2017, at 21:09, Iain Sandoe  wrote:
>> 

>>> If I propose this alternative patch, should it be a new post, or should I 
>>> continue this thread?
>> 
>> thanks for the patch.
>> 
>> The basic idea seems sound - as a workaround (as noted in comment #20 in the 
>> PR, we should really rationalise the libgcc/crts stuff to reflect the modern 
>> world, but these things take time...).
>> 
>> The patch as you have it would apply to every version of Darwin.
>> 
>> AFAICT from the published sources, i386 Darwin should be able to work with 
>> the libgcc unwinder (and all earlier Darwin *have* to) - so I’ve proposed a 
>> modified patch in the PR that makes the changes specific to m64 x86 and 
>> doesn’t make any alteration for PPC and/or Darwin < 10.
> 
> That sounds like the right thing to do. I hadn't considered the older 
> hardware/os issues (I only have kit back to macOS 10.11, Darwin 15).

So here’s the revised version with the comments slightly updated; checked on
Darwin 10, 15 and 16 for x86_64, with i386 still in progress.
OK if i386 succeeds?

Iain Sandoe
CodeSourcery / Mentor Embedded / Siemens

gcc/

2017-09-xx Iain Sandoe 

PR target/80556
* config/i386/darwin.h (REAL_LIB_SPEC): New; put libSystem ahead of 
libgcc_eh
for m64.
* config/i386/darwin64.h: Likewise.


diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index fccaf7e..321ed27 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -39,6 +39,32 @@ along with GCC; see the file COPYING3.  If not see
 #endif
 #endif
 
+/* WORKAROUND pr80556:
+   For x86_64 Darwin10 and later, the unwinder is in libunwind (redirected
+   from libSystem).  This doesn't use the keymgr (see keymgr.c) and therefore
+   the calls that libgcc makes to obtain the KEYMGR_GCC3_DW2_OBJ_LIST are not
+   updated to include new images, and might not even be valid for a single
+   image.
+   Therefore, for 64b exes at least, we must use the libunwind implementation,
+   even when static-libgcc is specified.  We put libSystem first so that
+   unwinder symbols are satisfied from there. */
+#undef REAL_LIBGCC_SPEC
+#define REAL_LIBGCC_SPEC  \
+   "%{static-libgcc|static:   \
+  %{m64:%:version-compare(>= 10.6 mmacosx-version-min= -lSystem)} \
+      -lgcc_eh -lgcc;					      \
+  shared-libgcc|fexceptions|fgnu-runtime: \
+   %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_s.10.4)   \
+   %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5)   \
+   %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_ext.10.4) \
+   %:version-compare(>= 10.5 mmacosx-version-min= -lgcc_ext.10.5) \
+   -lgcc ;\
+  :%:version-compare(>< 10.3.9 10.5 mmacosx-version-min= -lgcc_s.10.4) \
+   %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5)   \
+   %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_ext.10.4) \
+   %:version-compare(>= 10.5 mmacosx-version-min= -lgcc_ext.10.5) \
+   -lgcc }"
+
 /* Size of the Obj-C jump buffer.  */
 #define OBJC_JBLEN ((TARGET_64BIT) ? ((9 * 2) + 3 + 16) : (18))
 
diff --git a/gcc/config/i386/darwin64.h b/gcc/config/i386/darwin64.h
index f2982ed..32cb789 100644
--- a/gcc/config/i386/darwin64.h
+++ b/gcc/config/i386/darwin64.h
@@ -21,6 +21,32 @@ along with GCC; see the file COPYING3.  If not see
 #undef  DARWIN_ARCH_SPEC
 #define DARWIN_ARCH_SPEC "%{m32:i386;:x86_64}"
 
+/* WORKAROUND pr80556:
+   For x86_64 Darwin10 and later, the unwinder is in libunwind (redirected
+   from libSystem).  This doesn't use the keymgr (see keymgr.c) and therefore
+   the calls that libgcc makes to obtain the KEYMGR_GCC3_DW2_OBJ_LIST are not
+   updated to include new images, and might not even be valid for a single
+   image.
+   Therefore, for 64b exes at least, we must use the libunwind implementation,
+   even when static-libgcc is specified.  We put libSystem first so that
+   unwinder symbols are satisfied from there. */
+#undef REAL_LIBGCC_SPEC
+#define REAL_LIBGCC_SPEC  \
+   "%{static-libgcc|static:   \
+  %{!m32:%:version-compare(>= 10.6 mmacosx-version-min= -lSystem)}\
+      -lgcc_eh -lgcc;					      \
+  shared-libgcc|fexceptions|fgnu-runtime: \
+   %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_s.10.4)   \
+   %:version-compare(>< 10.5 10.6 mmacosx-version-min= -lgcc_s.10.5)   \
+   %:version-compare(!> 10.5 mmacosx-version-min= -lgcc_ext.10.4) \
+   %:version-compare(>= 10.5 mmacosx-version-min= -lgcc_ext.10.5) \
+   -lgc

Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag

2017-09-22 Thread Janus Weil
2017-09-22 9:11 GMT+02:00 Janne Blomqvist :
> On Fri, Sep 22, 2017 at 8:02 AM, Janus Weil  wrote:
>> Thanks, Steve. I'm attaching the updated ChangeLog and the two test
>> cases for the two new flags. Since this appears to be  a somewhat
>> controversial feature, I'll wait two more days to allow for others to
>> contribute their feedback (positive or negative). I'll commit on
>> Sunday if I hear nothing until then.
>
> Well, if you're actively soliciting bikeshedding, here goes:

I'm not soliciting anything. I'm giving people a fair chance to
comment. What you make out of that is completely up to you.


> Since we're about to have several -fdefault-real-N flags, would it
> make sense to instead make a single flag -fdefault-real=SOMEVALUE

I don't think that's a good idea, for several reasons:

1) We're probably not going to have many more N values (adding a
single one is tough enough of a job apparently, plus the list of
supported real kinds is very much limited anyway).

2) The syntax you're proposing is inconsistent with other flags like
-fdefault-integer-8, -fdefault-double-8. I'm sure we don't want to
change all of them.

3) That kind of syntax is not even used in other flags like
-ffree-line-length-n, where the range of possible values is
potentially much larger.



> And since the standard requires that double precision variables are
> twice as big as reals, in the absence of an explicit -fdefault-double=
> flag, would it make sense to have -fdefault-real=N imply
> -fdefault-double=[2*N or if that isn't supported on the target, the
> largest supported real kind]?

That's basically the behavior I tried to implement in my current patch
(although I notice now that you're not necessarily getting the largest
real kind, if 16 is not supported).

Cheers,
Janus


[PATCH][GRAPHITE] Simplify move_sese_in_condition

2017-09-22 Thread Richard Biener

This re-implements it avoiding the need to recompute dominators and in
a much simpler way.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress, SPEC CPU 
2006 is happy.

Richard.

2017-09-22  Richard Biener  

* sese.c: Include cfganal.h.
(if_region_set_false_region): Remove.
(create_if_region_on_edge): Likewise.
(move_sese_in_condition): Re-implement without destroying
dominators.

Index: gcc/sese.c
===
--- gcc/sese.c  (revision 253090)
+++ gcc/sese.c  (working copy)
@@ -40,8 +40,9 @@ along with GCC; see the file COPYING3.
 #include "cfgloop.h"
 #include "tree-data-ref.h"
 #include "tree-scalar-evolution.h"
-#include "sese.h"
 #include "tree-ssa-propagate.h"
+#include "cfganal.h"
+#include "sese.h"
 
 /* For a USE in BB, if BB is outside REGION, mark the USE in the
LIVEOUTS set.  */
@@ -333,99 +334,6 @@ get_false_edge_from_guard_bb (basic_bloc
   return NULL;
 }
 
-/* Sets the false region of an IF_REGION to REGION.  */
-
-static void
-if_region_set_false_region (ifsese if_region, sese_info_p region)
-{
-  free_dominance_info (CDI_DOMINATORS);
-
-  basic_block condition = if_region_get_condition_block (if_region);
-  edge false_edge = get_false_edge_from_guard_bb (condition);
-  basic_block dummy = false_edge->dest;
-  edge entry_region = region->region.entry;
-  edge exit_region = region->region.exit;
-  basic_block before_region = entry_region->src;
-  basic_block last_in_region = exit_region->src;
-  hashval_t hash = htab_hash_pointer (exit_region);
-  loop_exit **slot
-= current_loops->exits->find_slot_with_hash (exit_region, hash, NO_INSERT);
-  bool latch_p
-= exit_region->dest->loop_father->latch == exit_region->src;
-
-  entry_region->flags = false_edge->flags;
-  false_edge->flags = exit_region->flags;
-
-  redirect_edge_pred (entry_region, condition);
-  redirect_edge_pred (exit_region, before_region);
-  redirect_edge_pred (false_edge, last_in_region);
-  redirect_edge_succ (false_edge, single_succ (dummy));
-  delete_basic_block (dummy);
-
-  exit_region->flags = EDGE_FALLTHRU;
-
-  region->region.exit = false_edge;
-
-  free (if_region->false_region);
-  if_region->false_region = region;
-
-  if (slot)
-{
-  struct loop_exit *loop_exit = ggc_cleared_alloc<loop_exit> ();
-
-  memcpy (loop_exit, *((struct loop_exit **) slot),
- sizeof (struct loop_exit));
-  current_loops->exits->clear_slot (slot);
-
-  hashval_t hash = htab_hash_pointer (false_edge);
-  slot = current_loops->exits->find_slot_with_hash (false_edge, hash,
-   INSERT);
-  loop_exit->e = false_edge;
-  *slot = loop_exit;
-  false_edge->src->loop_father->exits->next = loop_exit;
-}
-  if (latch_p)
-exit_region->dest->loop_father->latch = before_region;
-
-  calculate_dominance_info (CDI_DOMINATORS);
-}
-
-/* Creates an IFSESE with CONDITION on edge ENTRY.  */
-
-static ifsese
-create_if_region_on_edge (edge entry, tree condition)
-{
-  edge e;
-  edge_iterator ei;
-  sese_info_p sese_region = XNEW (struct sese_info_t);
-  sese_info_p true_region = XNEW (struct sese_info_t);
-  sese_info_p false_region = XNEW (struct sese_info_t);
-  ifsese if_region = XNEW (struct ifsese_s);
-  edge exit = create_empty_if_region_on_edge (entry, condition);
-
-  if_region->region = sese_region;
-  if_region->region->region.entry = entry;
-  if_region->region->region.exit = exit;
-
-  FOR_EACH_EDGE (e, ei, entry->dest->succs)
-{
-  if (e->flags & EDGE_TRUE_VALUE)
-   {
- true_region->region.entry = e;
- true_region->region.exit = single_succ_edge (e->dest);
- if_region->true_region = true_region;
-   }
-  else if (e->flags & EDGE_FALSE_VALUE)
-   {
- false_region->region.entry = e;
- false_region->region.exit = single_succ_edge (e->dest);
- if_region->false_region = false_region;
-   }
-}
-
-  return if_region;
-}
-
 /* Moves REGION in a condition expression:
| if (1)
|   ;
@@ -436,14 +344,32 @@ create_if_region_on_edge (edge entry, tr
 ifsese
 move_sese_in_condition (sese_info_p region)
 {
-  gcc_assert (! dom_info_available_p (cfun, CDI_POST_DOMINATORS));
+  basic_block region_entry_dest = region->region.entry->dest;
   basic_block pred_block = split_edge (region->region.entry);
-  ifsese if_region;
+  basic_block merge_block = split_edge (region->region.exit);
 
-  region->region.entry = single_succ_edge (pred_block);
-  if_region = create_if_region_on_edge (single_pred_edge (pred_block),
-   integer_one_node);
-  if_region_set_false_region (if_region, region);
+  edge true_edge = make_edge (pred_block, merge_block, EDGE_TRUE_VALUE);
+  edge false_edge = find_edge (pred_block, region_entry_dest);
+  false_edge->flags &= ~EDGE_FALLTHRU;
+  false_edge->flags |= EDGE_FALSE_VALUE;
+  gimple_stmt_iterator gsi =

Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV

2017-09-22 Thread Rainer Orth
Hi Daniel,

> On 09/22/2017 02:18 AM, Rainer Orth wrote:
>> Hi Daniel,
>>
>>> On 09/21/2017 05:18 PM, Daniel Santos wrote:
 So libgcc doesn't use a config.in. :(
>>> Scratch that, I forgot that we're using gcc/config.in via auto-host.h. 
>>> So I only have to add this to gcc/configure.ac and it will be available
>>> for my libgcc header -- this is what I used to sniff out support for the
>>> .hidden directive.
>> Please don't go that route: it's totally the wrong direction.  There's
>> work going on to further decouple libgcc from gcc-private headers and
>> configure results.  libgcc already has its own configure tests for
>> assembler features, and its own config.in.  What's wrong with adapting
>> libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for
>> libgcc?  Should be trivial...
>>
>>  Rainer
>>
>
> Oops, I just saw your email after submitting my other patch.  Yes, I am
> mistaken about config.in, sorry about that.  I didn't see a config.h
> file, but examining further it looks like it outputs to auto-target.h. 
> Also, I was looking for some HAVE_AS* macros, but they are named
> differently.

Right: though some are for assembler features, the macros are named
differently.

> I had previously included gcc's auto-host.h since it was in the include
> path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll

HAVE_GAS_HIDDEN actually ;-)

> need to add that check into libgcc/configure.ac as well.  Again,
> shouldn't be that much code.  Sound sane to you?

You could do that, but it was already used before your patches, so
please separate it from the current issue if you go down that route.
libgcc is still full of cleanup possibilities :-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PING][PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-09-22 Thread Michael Collison
Ping. Updated patch posted here:



https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00390.html



Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV

2017-09-22 Thread Daniel Santos
On 09/22/2017 02:18 AM, Rainer Orth wrote:
> Hi Daniel,
>
>> On 09/21/2017 05:18 PM, Daniel Santos wrote:
>>> So libgcc doesn't use a config.in. :(
>> Scratch that, I forgot that we're using gcc/config.in via auto-host.h. 
>> So I only have to add this to gcc/configure.ac and it will be available
>> for my libgcc header -- this is what I used to sniff out support for the
>> .hidden directive.
> Please don't go that route: it's totally the wrong direction.  There's
> work going on to further decouple libgcc from gcc-private headers and
> configure results.  libgcc already has its own configure tests for
> assembler features, and its own config.in.  What's wrong with adapting
> libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for
> libgcc?  Should be trivial...
>
>   Rainer
>

Oops, I just saw your email after submitting my other patch.  Yes, I am
mistaken about config.in, sorry about that.  I didn't see a config.h
file, but examining further it looks like it outputs to auto-target.h. 
Also, I was looking for some HAVE_AS* macros, but they are named
differently.

I had previously included gcc's auto-host.h since it was in the include
path in order to use HAVE_AS_HIDDEN, so in order to decouple this I'll
need to add that check into libgcc/configure.ac as well.  Again,
shouldn't be that much code.  Sound sane to you?

Thanks,
Daniel


[RFC PATCH 4/4] Extra async tests, not for merging

2017-09-22 Thread Mike Crowe
These tests show that changing the system clock has an effect on
std::future::wait_until when using std::chrono::system_clock but not when
using std::chrono::steady_clock. Unfortunately these tests have a number of
downsides:

1. Nothing that is attempting to keep the clock set correctly (ntpd,
   systemd-timesyncd) can be running at the same time.

2. The test process requires the CAP_SYS_TIME capability.

3. Other processes running concurrently may misbehave when the clock darts
   back and forth.

4. They are slow to run.

As such, I don't think they are suitable for merging. I include them here
because I wanted to document how I had tested the changes in the previous
commits.
---
 libstdc++-v3/testsuite/30_threads/async/async.cc | 76 
 1 file changed, 76 insertions(+)

diff --git a/libstdc++-v3/testsuite/30_threads/async/async.cc 
b/libstdc++-v3/testsuite/30_threads/async/async.cc
index 59905b4666f..0a3cb28fbda 100644
--- a/libstdc++-v3/testsuite/30_threads/async/async.cc
+++ b/libstdc++-v3/testsuite/30_threads/async/async.cc
@@ -25,6 +25,7 @@
 
 #include 
 #include 
+#include 
 
 using namespace std;
 
@@ -94,11 +95,86 @@ void test03()
   VERIFY( elapsed < std::chrono::seconds(20) );
 }
 
+void perturb_system_clock(const std::chrono::seconds &seconds)
+{
+  struct timeval tv;
+  if (gettimeofday(&tv, NULL))
+abort();
+
+  tv.tv_sec += seconds.count();
+  if (settimeofday(&tv, NULL))
+abort();
+}
+
+void work04()
+{
+  std::this_thread::sleep_for(std::chrono::seconds(10));
+}
+
+// Ensure that advancing CLOCK_REALTIME doesn't make any difference
+// when we're waiting on std::chrono::steady_clock.
+void test04()
+{
+  auto const start = chrono::steady_clock::now();
+  future<void> f1 = async(launch::async, &work04);
+
+  perturb_system_clock(chrono::seconds(20));
+
+  std::future_status status;
+  status = f1.wait_for(std::chrono::seconds(4));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(6));
+  VERIFY( status == std::future_status::timeout );
+
+  status = f1.wait_until(start + std::chrono::seconds(12));
+  VERIFY( status == std::future_status::ready );
+
+  auto const elapsed = chrono::steady_clock::now() - start;
+  VERIFY( elapsed >= std::chrono::seconds(10) );
+  VERIFY( elapsed < std::chrono::seconds(15) );
+
+  perturb_system_clock(chrono::seconds(-20));
+}
+
+void work05()
+{
+  std::this_thread::sleep_for(std::chrono::seconds(5));
+  perturb_system_clock(chrono::seconds(60));
+  std::this_thread::sleep_for(std::chrono::seconds(5));
+}
+
+// Ensure that advancing CLOCK_REALTIME does make a difference when
+// we're waiting on std::chrono::system_clock.
+void test05()
+{
+  auto const start = chrono::system_clock::now();
+  auto const start_steady = chrono::steady_clock::now();
+
+  future<void> f1 = async(launch::async, &work05);
+  future_status status;
+  status = f1.wait_until(start + std::chrono::seconds(60));
+  VERIFY( status == std::future_status::timeout );
+
+  auto const elapsed_steady = chrono::steady_clock::now() - start_steady;
+  VERIFY( elapsed_steady >= std::chrono::seconds(5) );
+  VERIFY( elapsed_steady < std::chrono::seconds(10) );
+
+  status = f1.wait_until(start + std::chrono::seconds(75));
+  VERIFY( status == std::future_status::ready );
+
+  perturb_system_clock(chrono::seconds(-60));
+}
+
 int main()
 {
   test01();
   test02();
   test03();
   test03();
+  if (geteuid() == 0) {
+test04();
+test05();
+  }
   return 0;
 }
-- 
2.11.0



[RFC PATCH 2/4] libstdc++ futex: Support waiting on std::chrono::steady_clock directly

2017-09-22 Thread Mike Crowe
The user-visible effect of this change is for std::future::wait_until
to use CLOCK_MONOTONIC when passed a timeout of
std::chrono::steady_clock::time_point type. This makes it immune to
any changes made to the system clock (CLOCK_REALTIME).

Add an overload of __atomic_futex_unsigned::_M_load_and_test_until_impl
that accepts a std::chrono::steady_clock time point, and correctly passes this through
to __atomic_futex_unsigned_base::_M_futex_wait_until_steady which uses
CLOCK_MONOTONIC for the timeout within the futex system call. These
functions are mostly just copies of the std::chrono::system_clock versions
with small tweaks.

Prior to this commit, a std::chrono::steady_clock timeout would be converted via
std::chrono::system_clock, which risks reducing or increasing the timeout if
someone changes CLOCK_REALTIME whilst the wait is happening. (The commit
immediately prior to this one increases the window of opportunity for that
from a short period during the calculation of a relative timeout, to the
entire duration of the wait.)
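
To make the hazard concrete, here is a minimal sketch (my own illustration,
not code from the patch) of the pre-patch conversion path: a steady_clock
deadline is rebased onto system_clock once, up front, so any clock step that
happens after the rebase goes unnoticed:

  // Hypothetical names; __atime is a steady_clock deadline and __clock_t
  // is chrono::system_clock, as before this series.
  const auto __c_entry = chrono::steady_clock::now();
  const auto __s_entry = chrono::system_clock::now();
  const auto __delta   = __atime - __c_entry;
  const auto __s_atime = __s_entry + __delta;  // deadline now in CLOCK_REALTIME
  // If CLOCK_REALTIME is stepped after this point, the absolute timeout
  // handed to the futex wait is early or late by the size of the step.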

I believe that I've added this functionality in a way that doesn't break
ABI compatibility, but that has made it more verbose and less type safe. I
believe that it would be better to maintain the timeout as an instance of
the correct clock type all the way down to a single _M_futex_wait_until
function with an overload for each clock. The current scheme of separating
out the seconds and nanoseconds early risks accidentally calling the wait
function for the wrong clock. Unfortunately, doing this would break code
that compiled against the old header.
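
A rough sketch of what that might look like (purely illustrative, with
invented parameter shapes, and it would change the exported symbols):

  // One _M_futex_wait_until overload per supported clock, keeping the
  // clock in the type instead of splitting out the seconds and
  // nanoseconds early.
  bool
  _M_futex_wait_until(unsigned* __addr, unsigned __val,
                      const chrono::time_point<chrono::system_clock,
                                               chrono::nanoseconds>& __atime);
  bool
  _M_futex_wait_until(unsigned* __addr, unsigned __val,
                      const chrono::time_point<chrono::steady_clock,
                                               chrono::nanoseconds>& __atime);
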
---
 libstdc++-v3/include/bits/atomic_futex.h | 67 +++-
 libstdc++-v3/src/c++11/futex.cc  | 33 
 2 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index afcfeb7720d..c2b3df03592 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -52,11 +52,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #if defined(_GLIBCXX_HAVE_LINUX_FUTEX) && ATOMIC_INT_LOCK_FREE > 1
   struct __atomic_futex_unsigned_base
   {
-// Returns false iff a timeout occurred.
+// __s and __ns are measured against CLOCK_REALTIME. Returns false
+// iff a timeout occurred.
 bool
 _M_futex_wait_until(unsigned *__addr, unsigned __val, bool __has_timeout,
chrono::seconds __s, chrono::nanoseconds __ns);
 
+// __s and __ns are measured against CLOCK_MONOTONIC. Returns
+// false iff a timeout occurred.
+bool
+_M_futex_wait_until_steady(unsigned *__addr, unsigned __val, bool 
__has_timeout,
+   chrono::seconds __s, chrono::nanoseconds __ns);
+
 // This can be executed after the object has been destroyed.
 static void _M_futex_notify_all(unsigned* __addr);
   };
@@ -86,6 +93,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 // value if equal is false.
 // The assumed value is the caller's assumption about the current value
 // when making the call.
+// __s and __ns are measured against CLOCK_REALTIME.
 unsigned
 _M_load_and_test_until(unsigned __assumed, unsigned __operand,
bool __equal, memory_order __mo, bool __has_timeout,
@@ -110,6 +118,36 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
 }
 
+// If a timeout occurs, returns a current value after the timeout;
+// otherwise, returns the operand's value if equal is true or a different
+// value if equal is false.
+// The assumed value is the caller's assumption about the current value
+// when making the call.
+// __s and __ns are measured against CLOCK_MONOTONIC.
+unsigned
+_M_load_and_test_until_steady(unsigned __assumed, unsigned __operand,
+   bool __equal, memory_order __mo, bool __has_timeout,
+   chrono::seconds __s, chrono::nanoseconds __ns)
+{
+  for (;;)
+   {
+ // Don't bother checking the value again because we expect the caller
+ // to have done it recently.
+ // memory_order_relaxed is sufficient because we can rely on just the
+ // modification order (store_notify uses an atomic RMW operation too),
+ // and the futex syscalls synchronize between themselves.
+ _M_data.fetch_or(_Waiter_bit, memory_order_relaxed);
+ bool __ret = _M_futex_wait_until_steady((unsigned*)(void*)&_M_data,
+  __assumed | _Waiter_bit,
+  __has_timeout, __s, __ns);
+ // Fetch the current value after waiting (clears _Waiter_bit).
+ __assumed = _M_load(__mo);
+ if (!__ret || ((__operand == __assumed) == __equal))
+   return __assumed;
+ // TODO adapt wait time
+   }
+}
+
 // Returns the operand's value if equal is true or a different value if
 // equal is false.
 // The assumed value is the caller's assumption about the current value
@@ -140,6 +178,19 @@ _GL

[RFC PATCH 3/4] libstdc++ atomic_futex: Use std::chrono::steady_clock as reference clock

2017-09-22 Thread Mike Crowe
The user-visible effect of this change is that std::future::wait_for now
uses std::chrono::steady_clock to determine the timeout. This makes it
immune to changes made to the system clock. It also means that anyone using
their own clock types with std::future::wait_until will have the timeout
converted to std::chrono::steady_clock rather than
std::chrono::system_clock.

Now that both std::chrono::steady_clock and std::chrono::system_clock are
correctly supported for the wait timeout, I believe that
std::chrono::steady_clock is the better choice for the reference clock used
by std::future::wait_for (the clock that all other clocks are converted to),
since it is guaranteed to advance steadily. The previous behaviour of
converting to
std::chrono::system_clock risks timeouts changing dramatically when
the system clock is changed.
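
As a hypothetical illustration of the user-visible effect (not one of the
testsuite changes), a relative wait is now anchored to steady_clock, so it
keeps its intended length even if CLOCK_REALTIME is stepped meanwhile:

  std::future<int> f = std::async(std::launch::async, [] { return 42; });
  // Measured against steady_clock after this patch.
  if (f.wait_for(std::chrono::seconds(2)) == std::future_status::ready)
    f.get();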

---
 libstdc++-v3/include/bits/atomic_futex.h | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/include/bits/atomic_futex.h 
b/libstdc++-v3/include/bits/atomic_futex.h
index c2b3df03592..7ba86dfeec7 100644
--- a/libstdc++-v3/include/bits/atomic_futex.h
+++ b/libstdc++-v3/include/bits/atomic_futex.h
@@ -71,7 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template 
   class __atomic_futex_unsigned : __atomic_futex_unsigned_base
   {
-typedef chrono::system_clock __clock_t;
+typedef chrono::steady_clock __clock_t;
 
 // This must be lock-free and at offset 0.
 atomic _M_data;
@@ -169,7 +169,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 unsigned
 _M_load_and_test_until_impl(unsigned __assumed, unsigned __operand,
bool __equal, memory_order __mo,
-   const chrono::time_point<__clock_t, _Dur>& __atime)
+   const chrono::time_point& __atime)
 {
   auto __s = chrono::time_point_cast(__atime);
   auto __ns = chrono::duration_cast(__atime - __s);
@@ -229,7 +229,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_load_when_equal_until(unsigned __val, memory_order __mo,
  const chrono::time_point<_Clock, _Duration>& __atime)
   {
-   // DR 887 - Sync unknown clock to known clock.
const typename _Clock::time_point __c_entry = _Clock::now();
const __clock_t::time_point __s_entry = __clock_t::now();
const auto __delta = __atime - __c_entry;
@@ -241,7 +240,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 template
 _GLIBCXX_ALWAYS_INLINE bool
 _M_load_when_equal_until(unsigned __val, memory_order __mo,
-   const chrono::time_point<__clock_t, _Duration>& __atime)
+   const chrono::time_point& __atime)
 {
   unsigned __i = _M_load(__mo);
   if ((__i & ~_Waiter_bit) == __val)
-- 
2.11.0



[RFC PATCH 1/4] libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait

2017-09-22 Thread Mike Crowe
The futex system call supports waiting for an absolute time if
FUTEX_WAIT_BITSET is used rather than FUTEX_WAIT. Doing so provides two
benefits:

1. The call to gettimeofday is not required in order to calculate a
   relative timeout.

2. If someone changes the system clock during the wait then the futex
   timeout will correctly expire earlier or later. Currently that only
   happens if the clock is changed prior to the call to gettimeofday.

According to futex(2), support for FUTEX_CLOCK_REALTIME was added in the
v2.6.28 Linux kernel and FUTEX_WAIT_BITSET was added in v2.6.25. There is
no attempt to detect the kernel version and fall back to the previous
method.
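
For comparison, a minimal sketch of the two call shapes (illustrative only;
relative_ts and absolute_ts are placeholder names):

  // Old: FUTEX_WAIT takes a relative timeout, computed from gettimeofday()
  // just before the call, so a clock step in between is never noticed.
  syscall (SYS_futex, __addr, futex_wait_op, __val, &relative_ts);

  // New: FUTEX_WAIT_BITSET | FUTEX_CLOCK_REALTIME takes an absolute
  // CLOCK_REALTIME timeout which the kernel tracks, so a clock step during
  // the wait makes the timeout expire earlier or later as appropriate.
  syscall (SYS_futex, __addr,
           futex_wait_bitset_op | futex_clock_realtime_flag,
           __val, &absolute_ts, nullptr, futex_bitset_match_any);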
---
 libstdc++-v3/src/c++11/futex.cc | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/libstdc++-v3/src/c++11/futex.cc b/libstdc++-v3/src/c++11/futex.cc
index 64aada06969..217aeefe005 100644
--- a/libstdc++-v3/src/c++11/futex.cc
+++ b/libstdc++-v3/src/c++11/futex.cc
@@ -35,6 +35,9 @@
 
 // Constants for the wait/wake futex syscall operations
 const unsigned futex_wait_op = 0;
+const unsigned futex_wait_bitset_op = 9;
+const unsigned futex_clock_realtime_flag = 256;
+const unsigned futex_bitset_match_any = ~0;
 const unsigned futex_wake_op = 1;
 
 namespace std _GLIBCXX_VISIBILITY(default)
@@ -58,22 +61,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 else
   {
-   struct timeval tv;
-   gettimeofday (&tv, NULL);
-   // Convert the absolute timeout value to a relative timeout
struct timespec rt;
-   rt.tv_sec = __s.count() - tv.tv_sec;
-   rt.tv_nsec = __ns.count() - tv.tv_usec * 1000;
-   if (rt.tv_nsec < 0)
- {
-   rt.tv_nsec += 1000000000;
-   --rt.tv_sec;
- }
-   // Did we already time out?
-   if (rt.tv_sec < 0)
- return false;
-
-   if (syscall (SYS_futex, __addr, futex_wait_op, __val, &rt) == -1)
+   rt.tv_sec = __s.count();
+   rt.tv_nsec = __ns.count();
+   if (syscall (SYS_futex, __addr, futex_wait_bitset_op | 
futex_clock_realtime_flag, __val, &rt, nullptr, futex_bitset_match_any) == -1)
  {
_GLIBCXX_DEBUG_ASSERT(errno == EINTR || errno == EAGAIN
  || errno == ETIMEDOUT);
-- 
2.11.0



[RFC PATCH 0/4] Make std::future::wait_* use std::chrono::steady_clock when required

2017-09-22 Thread Mike Crowe
This is a first attempt to make std::future::wait_until and
std::future::wait_for make correct use of
std::chrono::steady_clock/CLOCK_MONOTONIC. It also makes
std::future::wait_until react to changes to CLOCK_REALTIME during the
wait, but only when passed a std::chrono::system_clock time point.
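
As a quick sketch of the intended usage (not one of the new tests), a wait
expressed against steady_clock is now unaffected by settimeofday() or NTP
steps; do_work is a placeholder:

  auto f = std::async(std::launch::async, do_work);
  f.wait_until(std::chrono::steady_clock::now() + std::chrono::seconds(5));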

Prior to these patches, the situation is similar to that described at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861 for
std::condition_variable, but with the benefit that this time the bug can be
fixed entirely within libstdc++.

(I've omitted the ChangeLog entries for now, since I don't believe that
this is ready to be accepted in its current form.)

Mike.

Mike Crowe (4):
  libstdc++ futex: Use FUTEX_CLOCK_REALTIME for wait
  libstdc++ futex: Support waiting on std::chrono::steady_clock directly
  libstdc++ atomic_futex: Use std::chrono::steady_clock as reference
clock
  Extra async tests, not for merging

 libstdc++-v3/include/bits/atomic_futex.h | 74 +--
 libstdc++-v3/src/c++11/futex.cc  | 48 +++
 libstdc++-v3/testsuite/30_threads/async/async.cc | 76 
 3 files changed, 181 insertions(+), 17 deletions(-)

-- 
2.11.0



Re: Add a vect_get_dr_size helper function

2017-09-22 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Sep 14, 2017 at 4:05 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Thu, Sep 14, 2017 at 1:23 PM, Richard Sandiford
>>>  wrote:
 This patch adds a helper function for getting the number of
 bytes accessed by a scalar data reference, which helps when general
 modes have a variable size.

 Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
 OK to install?
>>>
>>> Can you put it into tree-data-ref.h?
>>
>> The idea (which I forgot to say) was to collect the uses within the
>> vectoriser into one place so that we can assert in only one place
>> that the reference is constant-sized.
>>
>> A general data_reference can be variable-sized, so the guarantees
>> wouldn't be the same elsewhere.
>>
>> Would putting it in tree-vectorizer.h be OK?
>
> Maybe name it vect_get_scalar_dr_size then?
>
> Ok with that.

As discussed on IRC, I tried also using SCALAR_TYPE_MODE instead of
TYPE_MODE, to go with the new function name, but it turns out that
the mode can be a single-element vector instead of a "true" scalar.

This version of the patch sticks with vect_get_scalar_dr_size as
the name, but adds a comment to say that "scalar" includes "scalar
equivalent".  Since SCALAR_TYPE_MODE is too strong an assertion,
I went with tree_to_uhwi instead.

As also discussed on IRC, it might be possible in current sources
to use SCALAR_TYPE_MODE (TREE_TYPE (STMT_VINFO_VECTYPE ...)) instead.
But SVE has extending loads and truncating stores, for which the DR size
will be different from the vector element size, so if we switch it now,
we'd have to switch back to the DR size again later.

(Sorry again for labouring such a simple change!)

Tested as before.  OK to install?

Thanks,
Richard


2017-09-22  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* tree-vectorizer.h (vect_get_scalar_dr_size): New function.
* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Use it.
(vect_enhance_data_refs_alignment): Likewise.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2017-09-18 16:49:51.902170781 +0100
+++ gcc/tree-vectorizer.h   2017-09-22 08:58:24.308199570 +0100
@@ -1095,6 +1095,19 @@ vect_get_num_copies (loop_vec_info loop_
  / TYPE_VECTOR_SUBPARTS (vectype));
 }
 
+/* Return the size of the value accessed by unvectorized data reference DR.
+   This is only valid once STMT_VINFO_VECTYPE has been calculated for the
+   associated gimple statement, since that guarantees that DR accesses
+   either a scalar or a scalar equivalent.  ("Scalar equivalent" here
+   includes things like V1SI, which can be vectorized in the same way
+   as a plain SI.)  */
+
+inline unsigned int
+vect_get_scalar_dr_size (struct data_reference *dr)
+{
+  return tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr))));
+}
+
 /* Source location */
 extern source_location vect_location;
 
Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   2017-09-18 16:42:00.553203578 +0100
+++ gcc/tree-vect-data-refs.c   2017-09-22 08:58:24.308199570 +0100
@@ -955,7 +955,6 @@ vect_compute_data_ref_alignment (struct
   return true;
 }
 
-
 /* Function vect_update_misalignment_for_peel.
Sets DR's misalignment
- to 0 if it has the same alignment as DR_PEEL,
@@ -975,8 +974,8 @@ vect_update_misalignment_for_peel (struc
   unsigned int i;
   vec same_aligned_drs;
   struct data_reference *current_dr;
-  int dr_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr))));
-  int dr_peel_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr_peel))));
+  int dr_size = vect_get_scalar_dr_size (dr);
+  int dr_peel_size = vect_get_scalar_dr_size (dr_peel);
   stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr));
   stmt_vec_info peel_stmt_info = vinfo_for_stmt (DR_STMT (dr_peel));
 
@@ -1664,8 +1663,7 @@ vect_enhance_data_refs_alignment (loop_v
 
   vectype = STMT_VINFO_VECTYPE (stmt_info);
   nelements = TYPE_VECTOR_SUBPARTS (vectype);
-  mis = DR_MISALIGNMENT (dr) / GET_MODE_SIZE (TYPE_MODE (
-TREE_TYPE (DR_REF (dr))));
+ mis = DR_MISALIGNMENT (dr) / vect_get_scalar_dr_size (dr);
  if (DR_MISALIGNMENT (dr) != 0)
npeel_tmp = (negative ? (mis - nelements)
 : (nelements - mis)) & (nelements - 1);
@@ -1937,8 +1935,7 @@ vect_enhance_data_refs_alignment (loop_v
  updating DR_MISALIGNMENT values.  The peeling factor is the
  vectorization factor minus the misalignment as an element
  count.  */
-  mis = DR_MISALIGNMENT (dr0);
-  mis /= GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (DR_REF (dr0))));
+ mis = DR_MISALIGNMENT (dr0) / vect_get_

Re: [Patch, fortran] Bug fixes, including regressions, for the associate construct

2017-09-22 Thread Paul Richard Thomas
Dear All,

I seem to have messed up associate_31.f90 totally. Not only is it not
doing the intended test but it is failing at all levels of
optimization. I will remove it first thing tomorrow. Strangely, it did
not fail when I regtested before committing.

Sorry about this.

Paul


On 21 September 2017 at 19:41, Paul Richard Thomas
 wrote:
> Dear Jerry,
>
> Thanks! Committed as revision 253077.
>
> Cheers
>
> Paul
>
> On 21 September 2017 at 01:52, Jerry DeLisle  wrote:
>> On 09/20/2017 09:45 AM, Paul Richard Thomas wrote:
>>> In the last update to the Parameterized Derived Types implementation,
>>> I fixed PR60483 as a side effect. I then checked all the ASSOCIATE
>>> bugs and noted that I was responsible for a number of regressions due
>>> to a patch that I applied last year. I determined to fix them and
>>> found that I couldn't stop.
>>>
>>> The PRs in the ChangeLogs are the low(ish) lying fruit and the changes
>>> are fairly obvious or are described in the ChangeLog or the comments.
>>>
>>> Bootstrapped and regtested on FC23/x86_64 - OK for trunk and later on 
>>> 7-branch?
>>
>> Yes OK and thanks for the fixes.
>>
>>
>> Cheers,
>>
>> Jerry
>>
>
>
>
> --
> "If you can't explain it simply, you don't understand it well enough"
> - Albert Einstein



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


[PATCH] [i386, testsuite, libgcc] Fix build breakage on Mac and test FAILS on Solaris.

2017-09-22 Thread Daniel Santos
I've bootstrapped and reg-tested {,-m64} with this on
x86_64-pc-linux-gnu, but I'm waiting for a reference test set to finish
to compare them.  I've verified that we're getting HAVE_AS_IX86_AVX in
auto-host.h and I've also built and tested w/o to double-verify that the
AVX code is correct.  Now that I think of it, I didn't run the tests
with -mno-omit-frame-pointer (to simulate Solaris), but since the tests
pass as-is, I presume that it understands the 'f?' part of my regex.  (A
separate set of stubs is used when rbp is the frame pointer, and these
have 'f' appended to their names.)

OK to commit once I get a clean compare?

gcc:
* configure.ac: Add check for HAVE_AS_IX86_AVX.
* config.in: Regenerate.
* configure: Likewise.

gcc/testsuite:
* gcc.target/i386/pr82196-1.c: Simplify so that it doesn't break
on Solaris or with -mno-omit-frame-pointer.
* gcc.target/i386/pr82196-2.c: Likewise.

libgcc:
* config/i386/i386-asm.h (SSE_SAVE, SSE_RESTORE): Sniff
HAVE_AS_IX86_AVX and directly emit raw .byte code when the assembler
doesn't support AVX; correct out-of-date comments.

Thanks,
Daniel

Signed-off-by: Daniel Santos 
---
 gcc/config.in |  6 +
 gcc/configure | 32 ++
 gcc/configure.ac  |  6 +
 gcc/testsuite/gcc.target/i386/pr82196-1.c |  5 ++--
 gcc/testsuite/gcc.target/i386/pr82196-2.c |  5 ++--
 libgcc/config/i386/i386-asm.h | 44 ++-
 6 files changed, 85 insertions(+), 13 deletions(-)

diff --git a/gcc/config.in b/gcc/config.in
index 89d7108e8db..df2e518baa6 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -406,6 +406,12 @@
 #endif
 
 
+/* Define if your assembler supports avx extensions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_AS_IX86_AVX
+#endif
+
+
 /* Define if your assembler supports the Sun syntax for cmov. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_AS_IX86_CMOV_SUN_SYNTAX
diff --git a/gcc/configure b/gcc/configure
index 13f97cd3663..e982b86c25c 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -25881,6 +25881,38 @@ if test $gcc_cv_as_ix86_swap = yes; then
 
 $as_echo "#define HAVE_AS_IX86_SWAP 1" >>confdefs.h
 
+fi
+
+
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for avx 
extensions" >&5
+$as_echo_n "checking assembler for avx extensions... " >&6; }
+if test "${gcc_cv_as_ix86_avx+set}" = set; then :
+  $as_echo_n "(cached) " >&6
+else
+  gcc_cv_as_ix86_avx=no
+  if test x$gcc_cv_as != x; then
+$as_echo 'vzeroupper' > conftest.s
+if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
+  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
+  (eval $ac_try) 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; }
+then
+   gcc_cv_as_ix86_avx=yes
+else
+  echo "configure: failed program was" >&5
+  cat conftest.s >&5
+fi
+rm -f conftest.o conftest.s
+  fi
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_ix86_avx" >&5
+$as_echo "$gcc_cv_as_ix86_avx" >&6; }
+if test $gcc_cv_as_ix86_avx = yes; then
+
+$as_echo "#define HAVE_AS_IX86_AVX 1" >>confdefs.h
+
 fi
 
 
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 82711389281..a05f2ca10b2 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4171,6 +4171,12 @@ foo: nop
   [AC_DEFINE(HAVE_AS_IX86_SWAP, 1,
 [Define if your assembler supports the swap suffix.])])
 
+gcc_GAS_CHECK_FEATURE([avx extensions],
+  gcc_cv_as_ix86_avx,,,
+  [vzeroupper],,
+  [AC_DEFINE(HAVE_AS_IX86_AVX, 1,
+[Define if your assembler supports avx extensions.])])
+
 gcc_GAS_CHECK_FEATURE([different section symbol subtraction],
   gcc_cv_as_ix86_diff_sect_delta,,,
   [.section .rodata
diff --git a/gcc/testsuite/gcc.target/i386/pr82196-1.c 
b/gcc/testsuite/gcc.target/i386/pr82196-1.c
index ef858328f00..541d975480d 100644
--- a/gcc/testsuite/gcc.target/i386/pr82196-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr82196-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target lp64 } } */
 /* { dg-options "-msse -mcall-ms2sysv-xlogues -O2" } */
-/* { dg-final { scan-assembler "call.*__sse_savms64_18" } } */
-/* { dg-final { scan-assembler "jmp.*__sse_resms64x_18" } } */
+/* { dg-final { scan-assembler "call.*__sse_savms64f?_12" } } */
+/* { dg-final { scan-assembler "jmp.*__sse_resms64f?x_12" } } */
 
 void __attribute__((sysv_abi)) a() {
 }
@@ -9,6 +9,5 @@ void __attribute__((sysv_abi)) a() {
 static void __attribute__((sysv_abi)) (*volatile a_noinfo)() = a;
 
 void __attribute__((ms_abi)) b() {
-  __asm__ __volatile__ ("" :::"rbx", "rbp", "r12", "r13", "r14", "r15");
   a_noinfo ();
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr82196-2.c 
b/gcc/testsuite/gcc.target/i386/pr82196-2.c
index 8fe58411d5e..7166d068bc1 100644
--- a/gcc/testsuite/gcc.target/i386/pr82196-2.c
+++ b/gcc/testsuite/gcc.t

Re: [PATCH] [i386, libgcc] PR 82196 -mcall-ms2sysv-xlogues emits wrong AVX/SSE MOV

2017-09-22 Thread Rainer Orth
Hi Daniel,

> On 09/21/2017 05:18 PM, Daniel Santos wrote:
>> So libgcc doesn't use a config.in. :(
>
> Scratch that, I forgot that we're using gcc/config.in via auto-host.h. 
> So I only have to add this to gcc/configure.ac and it will be available
> for my libgcc header -- this is what I used to sniff out support for the
> .hidden directive.

Please don't go that route: it's totally the wrong direction.  There's
work going on to further decouple libgcc from gcc-private headers and
configure results.  libgcc already has its own configure tests for
assembler features, and its own config.in.  What's wrong with adapting
libitm's avx test in libitm/acinclude.m4 (LIBITM_CHECK_AS_AVX) for
libgcc?  Should be trivial...

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Backtrace library [3/3]

2017-09-22 Thread Richard Biener
On Thu, 21 Sep 2017, Matthias Klose wrote:

> On 21.09.2017 17:50, Ian Lance Taylor via gcc-patches wrote:
> > On Thu, Sep 21, 2017 at 4:52 AM, Thomas Schwinge
> >  wrote:
> >>
> >> On Fri, 14 Sep 2012 14:20:08 -0700, Ian Lance Taylor  
> >> wrote:
>  [libbacktrace]
> >>
> >>> I won't commit for a day or two in case anybody sees a
> >>> problem in the implementation.
> >>
> >> Eventually got committed, trunk r191397.
> >>
> >>
> >> I just happened to notice that contrib/gcc_update never got updated for
> >> libbacktrace.  OK to commit the following?
> >>
> >> commit 5f59657d84228734944dc6216e947f2ee627fcf7
> >> Author: Thomas Schwinge 
> >> Date:   Thu Sep 21 13:49:47 2017 +0200
> >>
> >> Handle libbacktrace in contrib/gcc_update
> >>
> >> contrib/
> >> * gcc_update (files_and_dependencies): Handle libbacktrace.
> > 
> > Fine by me.  Thanks.
> 
> ok to backport to the active branches?

This one specifically?  Yes.

Richard.


Re: [Patch, Fortran] PR 82143: add a -fdefault-real-16 flag

2017-09-22 Thread Janne Blomqvist
On Fri, Sep 22, 2017 at 8:02 AM, Janus Weil  wrote:
> Thanks, Steve. I'm attaching the updated ChangeLog and the two test
> cases for the two new flags. Since this appears to be  a somewhat
> controversial feature, I'll wait two more days to allow for others to
> contribute their feedback (positive or negative). I'll commit on
> Sunday if I hear nothing until then.

Well, if you're actively soliciting bikeshedding, here goes:

Since we're about to have several -fdefault-real-N flags, would it
make sense to instead make a single flag -fdefault-real=SOMEVALUE (and
for backwards compatibility, make -fdefault-real-8 an alias for
-fdefault-real=8)? And similarly for -fdefault-double=.

And since the standard requires that double precision variables are
twice as big as reals, in the absence of an explicit -fdefault-double=
flag, would it make sense to have -fdefault-real=N imply
-fdefault-double=[2*N or if that isn't supported on the target, the
largest supported real kind]? (This is sort-of an extension of the
current -fdefault-real-8 behavior)


-- 
Janne Blomqvist