PING #2 [PATCH] specify large command line option arguments (PR 82063)

2018-07-09 Thread Martin Sebor

Ping #2: https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01509.html

On 07/03/2018 08:12 PM, Martin Sebor wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01509.html

On 06/24/2018 03:05 PM, Martin Sebor wrote:

Storing integer command line option arguments in type int
limits options such as -Wlarger-than= or -Walloca-larger-than
to at most INT_MAX (see bug 71905).  Larger values wrap around
to zero.  The value zero is treated as disabling the option,
making it impossible to specify a zero limit.

To get around these limitations, the -Walloc-size-larger-than=
option accepts a string argument that it then parses itself
and interprets as HOST_WIDE_INT.  The option also accepts byte
size suffixes like KB, MB, GiB, etc. to make it convenient to
specify very large limits.
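The suffix handling described above can be illustrated with a minimal sketch. This is a hypothetical stand-alone parser, not GCC's actual option-handling code: the function name, the exact set of accepted suffixes, and the omitted overflow checking are all illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: parse "<number><suffix>" where the suffix is one of
   KB/MB/GB (powers of 1000) or KiB/MiB/GiB (powers of 1024).  Returns 0 on
   success and stores the scaled value in *result.  GCC's real parser lives
   in its option-handling machinery and differs in detail (e.g. it checks
   for overflow, which is omitted here for brevity).  */
static int
parse_byte_size (const char *arg, uint64_t *result)
{
  char *end;
  uint64_t n = strtoull (arg, &end, 10);
  uint64_t unit = 1;
  if (end == arg)
    return -1;                     /* no digits at all */
  if (*end != '\0')
    {
      uint64_t base = 1000;
      size_t len = strlen (end);
      if (len == 3 && end[1] == 'i' && end[2] == 'B')
        base = 1024;               /* KiB/MiB/GiB */
      else if (!(len == 2 && end[1] == 'B'))
        return -1;                 /* unrecognized suffix */
      switch (end[0])
        {
        case 'K': unit = base; break;
        case 'M': unit = base * base; break;
        case 'G': unit = base * base * base; break;
        default: return -1;
        }
    }
  *result = n * unit;
  return 0;
}
```

With such a scheme, `-Walloc-size-larger-than=4GiB` can express a limit that would not fit in an `int`-typed option argument.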

The int limitation is obviously less than ideal in a 64-bit
world.  The treatment of zero as a toggle is just a minor wart.
The special treatment to make it work for just a single option
makes option handling inconsistent.  It should be possible for
any option that takes an integer argument to use the same logic.

The attached patch enhances GCC option processing to do that.
It changes the storage type of option arguments from int to
HOST_WIDE_INT and extends the existing (although undocumented)
option property Host_Wide_Int to specify wide option arguments.
It also introduces the ByteSize property for options for which
specifying the byte-size suffix makes sense.

To make it possible to consider zero as a meaningful argument
value rather than a flag indicating that an option is disabled,
the patch also adds a CLVC_SIZE enumerator to the cl_var_type
enumeration, and modifies how options of this kind are handled.

Warning options that take large byte-size arguments can be
disabled by specifying a value equal to or greater than
HOST_WIDE_INT_M1U.  For convenience, aliases in the form of
-Wno-xxx-larger-than have been provided for all the affected
options.

In the patch all the existing -larger-than options are set
to PTRDIFF_MAX.  This makes them effectively enabled, but
because the setting is exceedingly permissive, and because
some of the existing warnings are already set to the same
value and some other checks detect and reject such exceedingly
large values with errors, this change shouldn't noticeably
affect what constructs are diagnosed.

Although all the options are set to PTRDIFF_MAX, I think it
would make sense to consider setting some of them lower, say
to PTRDIFF_MAX / 2.  I'd like to propose that in a followup
patch.

To minimize observable changes, the -Walloca-larger-than and
-Wvla-larger-than warnings required more extensive work to
make use of the new mechanism because of the "unbounded" argument
handling (the warnings trigger for arguments that are not
visibly constrained), and because of the zero handling
(the warnings also trigger for a zero bound).


Martin







Go patch committed: Fix double evaluation when using interface field expression

2018-07-09 Thread Ian Lance Taylor
This patch to the Go frontend by Cherry Zhang fixes the compiler to
avoid a double evaluation of an interface field expression.  In
Interface_field_reference_expression, the interface expression is used
in two places, so a temporary variable is used.  Previously, we used a
Set_and_use_temporary_expression, which, when evaluated twice, causes
double evaluation of the underlying expression.  The fix is to set the
temporary once and use a Temporary_reference_expression instead.
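The difference is easy to reproduce outside the compiler. In this hypothetical C++ sketch (the names are illustrative, not the frontend's), re-running the expression each time it is referenced, which is what the old set-and-use pattern amounted to, evaluates its side effects once per use, while binding it to a temporary evaluates it exactly once:

```cpp
#include <cassert>
#include <functional>

int evaluations = 0;

// Stand-in for the interface expression: evaluating it has a side effect.
int make_iface () { ++evaluations; return 7; }

// Old pattern: the wrapped expression is re-evaluated at each use site.
int use_twice_reevaluating (std::function<int ()> expr)
{
  return expr () + expr ();   // two uses => two evaluations
}

// New pattern: evaluate once into a temporary, then reference the temporary.
int use_twice_via_temporary (std::function<int ()> expr)
{
  int temp = expr ();         // set the temporary once
  return temp + temp;         // further uses are plain references
}
```

The second pattern is the analogue of setting the temporary once and using Temporary_reference_expression for later uses.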

This fixes https://golang.org/issue/26248.  The test case for this is
in https://golang.org/cl/122757.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 262313)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-94738979a3422e845acf358a766aabf8b9275d43
+8ad67a72a4fa59efffc891e73ecf10020e3c565d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc (revision 262312)
+++ gcc/go/gofrontend/expressions.cc (working copy)
@@ -11886,10 +11886,9 @@ Interface_field_reference_expression::do
   if (!this->expr_->is_variable())
 {
   Temporary_statement* temp =
-   Statement::make_temporary(this->expr_->type(), NULL, this->location());
+   Statement::make_temporary(NULL, this->expr_, this->location());
   inserter->insert(temp);
-  this->expr_ = Expression::make_set_and_use_temporary(temp, this->expr_,
-  this->location());
+  this->expr_ = Expression::make_temporary_reference(temp, this->location());
 }
   return this;
 }


[PATCH 4/5] Use opt_result throughout vectorizer

2018-07-09 Thread David Malcolm
This patch uses the opt_result framework throughout the vectorizer
(and a few other places) so that specific details on problems are
propagated back up to the top-level of the vectorizer.

The changes are mostly fairly mechanical.

There are various FIXMEs in the code: for example, in various places
it feels like it would be useful to have a formatted printing API
for opt_result::failure etc, or other ways to capture the information
of interest (e.g. if there's a dependence between two data references
that's blocking vectorization, we probably should capture them so
we can highlight them to the user).

Almost everywhere using "bool" to handle error-checking uses "true"
for success, but there was one place where "true" meant "failure":
vect_analyze_data_ref_dependence, so the patch changes the sense
of that function's return value (as well as converting it from
bool to opt_result).

Placeholder ChangeLog follows:

gcc/ChangeLog:
* tree-data-ref.c: Use opt-problem.h in many places.
* tree-data-ref.h: Likewise.
* tree-vect-data-refs.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vect-stmts.c: Likewise.
* tree-vectorizer.h: Likewise.
---
 gcc/tree-data-ref.c   |  33 +---
 gcc/tree-data-ref.h   |  10 ++-
 gcc/tree-vect-data-refs.c | 189 -
 gcc/tree-vect-loop.c  | 212 +++---
 gcc/tree-vect-slp.c   |   4 +-
 gcc/tree-vect-stmts.c | 130 
 gcc/tree-vectorizer.h |  35 
 7 files changed, 360 insertions(+), 253 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index b163eaf..a6319f83 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -827,7 +827,7 @@ canonicalize_base_object_address (tree addr)
Return true if the analysis succeeded and store the results in DRB if so.
BB analysis can only fail for bitfield or reversed-storage accesses.  */
 
-bool
+opt_result
 dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
  struct loop *loop)
 {
@@ -851,14 +851,14 @@ dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "failed: bit offset alignment.\n");
-  return false;
+  return opt_result::failure ("bit offset alignment", NULL);
 }
 
   if (preversep)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "failed: reverse storage order.\n");
-  return false;
+  return opt_result::failure ("reverse storage order", NULL);
 }
 
   /* Calculate the alignment and misalignment for the inner reference.  */
@@ -898,7 +898,9 @@ dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
 {
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "failed: evolution of base is not affine.\n");
- return false;
+ // FIXME: should this propagate an opt_problem from simple_iv?
+ return opt_result::failure ("evolution of base is not affine",
+ NULL);
 }
 }
   else
@@ -924,7 +926,9 @@ dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
 {
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "failed: evolution of offset is not affine.\n");
- return false;
+ // FIXME: should this propagate an opt_problem from simple_iv?
+ return opt_result::failure ("evolution of offset is not affine",
+ NULL);
 }
 }
 
@@ -981,7 +985,7 @@ dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "success.\n");
 
-  return true;
+  return opt_result::success ();
 }
 
 /* Return true if OP is a valid component reference for a DR access
@@ -1318,7 +1322,7 @@ data_ref_compare_tree (tree t1, tree t2)
 /* Return TRUE if it's possible to resolve data dependence DDR by runtime alias
   check.  */
 
-bool
+opt_result
 runtime_alias_check_p (ddr_p ddr, struct loop *loop, bool speed_p)
 {
   if (dump_enabled_p ())
@@ -1336,7 +1340,8 @@ runtime_alias_check_p (ddr_p ddr, struct loop *loop, bool speed_p)
dump_printf (MSG_MISSED_OPTIMIZATION,
 "runtime alias check not supported when optimizing "
 "for size.\n");
-  return false;
+  return opt_result::failure ("runtime alias check not supported when"
+ " optimizing for size", NULL);
 }
 
   /* FORNOW: We don't support versioning with outer-loop in either
@@ -1346,10 +1351,11 @@ runtime_alias_check_p (ddr_p ddr, struct loop *loop, bool speed_p)
   if (dump_enabled_p ())
dump_printf (MSG_MISSED_OPTIMIZATION,
 "runtime alias check not supported for outer 

[PATCH 1/5] Add opt-problem.h

2018-07-09 Thread David Malcolm
gcc/ChangeLog:
* opt-problem.h: New file.
* tree-vectorizer.h (opt_loop_vec_info): New typedef.
---
 gcc/opt-problem.h | 326 ++
 gcc/tree-vectorizer.h |   6 +
 2 files changed, 332 insertions(+)
 create mode 100644 gcc/opt-problem.h

diff --git a/gcc/opt-problem.h b/gcc/opt-problem.h
new file mode 100644
index 000..100eed0
--- /dev/null
+++ b/gcc/opt-problem.h
@@ -0,0 +1,326 @@
+/* Rich information on why an optimization wasn't possible.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#ifndef GCC_OPT_PROBLEM_H
+#define GCC_OPT_PROBLEM_H
+
+#include "diagnostic-core.h" /* for ATTRIBUTE_GCC_DIAG.  */
+
+/* This header declares a family of wrapper classes for tracking a
+   success/failure value, while optionally supporting propagating an
+   opt_problem * describing any failure back up the call stack.
+
+   For instance, at the deepest point of the callstack where the failure
+   happens, rather than:
+
+ if (!check_something ())
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"foo is unsupported.\n");
+ return false;
+   }
+ // [...more checks...]
+
+ // All checks passed:
+ return true;
+
+   we can capture the cause of the failure via:
+
+ if (!check_something ())
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"foo is unsupported.\n");
+ return opt_result::failure ("foo is unsupported",
+ stmt);
+   }
+ // [...more checks...]
+
+ // All checks passed:
+ return opt_result::success ();
+
+   which effectively returns true or false, whilst recording any problem.
+
+   opt_result::success and opt_result::failure return opt_result values
+   which "look like" true/false respectively, via operator bool().
+   If dump_enabled_p, then opt_result::failure also creates an opt_problem *,
+   capturing the pertinent data (here, "foo is unsupported" and "stmt").
+   If dumps are disabled, then opt_problem instances aren't
+   created, and it's equivalent to just returning a bool.
+
+   The opt_problem can be propagated via opt_result values back up
+   the call stack to where it makes most sense to the user.
+   For instance, rather than:
+
+ bool ok = try_something_that_might_fail ();
+ if (!ok)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"some message.\n");
+ return false;
+   }
+
+   we can replace the bool with an opt_result, so that if dump_enabled_p,
+   an opt_problem * will be created when try_something_that_might_fail
+   fails, and we can propagate it up the call chain:
+
+ opt_result ok = try_something_that_might_fail ();
+ if (!ok)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"some message.\n");
+ return ok; // propagating the opt_result
+   }
+
+   opt_result is an opt_wrapper<bool>, where opt_wrapper<T> is a base
+   class for wrapping a T, optionally propagating an opt_problem in
+   case of failure (when dumps are enabled).  Similarly,
+   opt_pointer_wrapper<T> can be used to wrap pointer types (where non-NULL
+   signifies success, NULL signifies failure).
+
+   In all cases, opt_wrapper<T> acts as if the opt_problem were one of its
+   fields, but the opt_problem is actually stored in a global, so that when
+   compiled, an opt_wrapper<T> is effectively just a T, so that we're
+   still just passing e.g. a bool around; the opt_wrapper classes
+   simply provide type-checking and an API to ensure that we provide
+   error-messages deep in the callstack at the places where problems
+   occur, and that we propagate them.  This also avoids having
+   to manage the ownership of the opt_problem instances.
+
+   Using opt_result and opt_wrapper documents the intent of the code
+   for the places where we represent success values, and allows the C++ type
+   system to track where the deepest points in the callstack are where we
+   need to emit the failure messages from.  */
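Stripped of the GCC plumbing, the design described in the header comment can be sketched as a self-contained model. This is a simplified stand-in with hypothetical names: the real opt-problem.h uses a class template, gates allocation on dump_enabled_p, and manages the global slot differently.

```cpp
#include <cassert>
#include <string>

// Simplified model of opt_problem: details about a failure, kept in a
// single global slot so the wrapper itself stays the size of a bool.
struct opt_problem
{
  std::string text;
  static opt_problem *the_problem;
};
opt_problem *opt_problem::the_problem = nullptr;

// Simplified model of opt_result: behaves like a bool via operator bool(),
// but a failure also records an opt_problem that callers can inspect and
// propagate up the call stack.
class opt_result
{
  bool m_result;
  explicit opt_result (bool r) : m_result (r) {}

public:
  static opt_result success () { return opt_result (true); }
  static opt_result failure (const char *reason)
  {
    delete opt_problem::the_problem;
    opt_problem::the_problem = new opt_problem {reason};
    return opt_result (false);
  }
  operator bool () const { return m_result; }
  opt_problem *get_problem () const { return opt_problem::the_problem; }
};

// Deep in the callstack: fail with a reason instead of a bare "false".
opt_result check_something (bool ok)
{
  if (!ok)
    return opt_result::failure ("foo is unsupported");
  return opt_result::success ();
}
```

Because the problem lives in a global rather than in the wrapper, returning and passing an opt_result costs no more than returning a bool, which is the point of the design.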

[PATCH 5/5] Add opt-problem.cc

2018-07-09 Thread David Malcolm
gcc/ChangeLog:
* Makefile.in (OBJS): Add opt-problem.o.
* opt-problem.cc: New file.
---
 gcc/Makefile.in|  1 +
 gcc/opt-problem.cc | 96 ++
 2 files changed, 97 insertions(+)
 create mode 100644 gcc/opt-problem.cc

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c5c3d3c..fb262da 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1425,6 +1425,7 @@ OBJS = \
omp-grid.o \
omp-low.o \
omp-simd-clone.o \
+   opt-problem.o \
optabs.o \
optabs-libfuncs.o \
optabs-query.o \
diff --git a/gcc/opt-problem.cc b/gcc/opt-problem.cc
new file mode 100644
index 000..f518b16
--- /dev/null
+++ b/gcc/opt-problem.cc
@@ -0,0 +1,96 @@
+/* Rich information on why an optimization wasn't possible.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by David Malcolm .
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "diagnostic.h"
+#include "tree.h"
+#include "gimple.h"
+#include "pretty-print.h"
+#include "gimple-pretty-print.h"
+#include "opt-problem.h"
+
+/* Emit a remark about an optimization.  */
+
+bool
+opt_report::remark (dump_location_t loc, const char *gmsgid, ...)
+{
+  if (!flag_remarks)
+return false;
+
+  va_list ap;
+  va_start (ap, gmsgid);
+  bool ret = emit_diagnostic_valist (DK_REMARK, loc.get_location_t (), -1,
+gmsgid, &ap);
+  va_end (ap);
+
+  // FIXME: how should this interact with optinfo? (we don't want a duplicate remark)
+  return ret;
+}
+
+/* Emit a note about an optimization.  */
+
+bool
+opt_report::note (dump_location_t loc, const char *gmsgid, ...)
+{
+  va_list ap;
+  va_start (ap, gmsgid);
+  bool ret = emit_diagnostic_valist (DK_NOTE, loc.get_location_t (), -1,
+gmsgid, &ap);
+  va_end (ap);
+
+  // FIXME: how should this interact with optinfo? (we don't want a duplicate note)
+  return ret;
+}
+
+opt_problem *opt_problem::s_the_problem;
+
+// FIXME: some refactoring is needed, based on exactly where remarks land,
+// and how this interacts with the optinfo stuff.  For now, this decl:
+
+extern void
+print_impl_location (pretty_printer *pp, const dump_impl_location_t &impl_loc);
+
+/* Emit a diagnostic note describing why an optimization wasn't possible.  */
+
+void
+opt_problem::report_reason (opt_report &report)
+{
+  bool show_color = pp_show_color (global_dc->printer);
+
+  pretty_printer pp;
+  pp_show_color (&pp) = pp_show_color (global_dc->printer);
+  pp_string (&pp, m_text);
+  if (m_stmt)
+{
+  pp_string (&pp, ": ");
+  pp_begin_quote (&pp, show_color);
+  pp_gimple_stmt_1 (&pp, m_stmt, 0, TDF_SLIM);
+  pp_end_quote (&pp, show_color);
+}
+
+  print_impl_location (&pp, m_location.get_impl_location ());
+
+  const char *msg = pp_formatted_text (&pp);
+
+  report.note (m_location, "%s", msg);
+}
-- 
1.8.5.3



[PATCH 2/5] Use opt-problem.h in try_vectorize_loop_1

2018-07-09 Thread David Malcolm
gcc/ChangeLog:
* tree-vectorizer.c: Include "opt-problem.h".
(try_vectorize_loop_1): Convert "loop_vinfo" from loop_vec_info to
opt_loop_vec_info loop_vinfo, and use opt_report to report any
opt_problem thus captured.  Use opt_report to report on successful
loop vectorization.
---
 gcc/tree-vectorizer.c | 34 --
 1 file changed, 28 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 8d54fbb..c60d0d9 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stringpool.h"
 #include "attribs.h"
 #include "gimple-pretty-print.h"
+#include "opt-problem.h"
 
 
 /* Loop or bb location, with hotness information.  */
@@ -705,9 +706,17 @@ try_vectorize_loop_1 (hash_table <simduid_to_vf> *&simduid_to_vf_htab,
 LOCATION_FILE (vect_location.get_location_t ()),
 LOCATION_LINE (vect_location.get_location_t ()));
 
-  loop_vec_info loop_vinfo = vect_analyze_loop (loop, orig_loop_vinfo, );
+  opt_loop_vec_info loop_vinfo = vect_analyze_loop (loop, orig_loop_vinfo, );
   loop->aux = loop_vinfo;
 
+  if (!loop_vinfo)
+if (loop_vinfo.get_problem ())
+  {
+   opt_report report;
+   if (report.remark (vect_location, "couldn't vectorize loop"))
+ loop_vinfo.get_problem ()->report_reason (report);
+  }
+
   if (!loop_vinfo || !LOOP_VINFO_VECTORIZABLE_P (loop_vinfo))
 {
   /* Free existing information if loop is analyzed with some
@@ -775,13 +784,26 @@ try_vectorize_loop_1 (hash_table <simduid_to_vf> *&simduid_to_vf_htab,
 
   unsigned HOST_WIDE_INT bytes;
   if (current_vector_size.is_constant ())
-dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-"loop vectorized vectorized using "
+{
+  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
+  "loop vectorized vectorized using "
+  HOST_WIDE_INT_PRINT_UNSIGNED " byte "
+  "vectors\n", bytes);
+  opt_report report;
+  // FIXME: is this the correct format code?
+  report.remark (vect_location,
+"loop vectorized using "
 HOST_WIDE_INT_PRINT_UNSIGNED " byte "
-"vectors\n", bytes);
+"vectors", bytes);
+}
   else
-dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-"loop vectorized using variable length vectors\n");
+{
+  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
+  "loop vectorized using variable length vectors\n");
+  opt_report report;
+  report.remark (vect_location,
+"loop vectorized using variable length vectors");
+}
 
   loop_p new_loop = vect_transform_loop (loop_vinfo);
   (*num_vectorized_loops)++;
-- 
1.8.5.3



[PATCH 3/5] Add some test coverage

2018-07-09 Thread David Malcolm
In particular, note how this allows us to highlight specific loops in
testcases (via dg-remark), and to highlight the specific lines that cause
problems for the vectorizer (via dg-message).

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-alias-check-4.c: Add -fremarks to options.  Add
dg-remark and dg-message directives to cover the expected
vectorization failure reports.
* gcc.dg/vect/vect-remarks-1.c: New test.
---
 gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c | 11 +++
 gcc/testsuite/gcc.dg/vect/vect-remarks-1.c | 18 ++
 2 files changed, 25 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-remarks-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c b/gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c
index 1e5fc27..b08b4b4 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_int } */
-/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */
+/* { dg-additional-options "--param vect-max-version-for-alias-checks=0 -fremarks" } */
 
 #define N 16
 
@@ -14,22 +14,25 @@ union u { struct s2 f; struct s3 g; };
 void
 f1 (int a[][N], int b[][N])
 {
-  for (int i = 0; i < N; ++i)
+  for (int i = 0; i < N; ++i) /* { dg-remark "couldn't vectorize loop" } */
 a[0][i] += b[0][i];
+  /* { dg-message "will not create alias checks, as --param vect-max-version-for-alias-checks == 0" "" { target *-*-* } .-2 } */
 }
 
 void
 f2 (union u *a, union u *b)
 {
-  for (int i = 0; i < N; ++i)
+  for (int i = 0; i < N; ++i) /* { dg-remark "couldn't vectorize loop" } */
 a->f.b.a[i] += b->g.e.a[i];
+  /* { dg-message "will not create alias checks, as --param vect-max-version-for-alias-checks == 0" "" { target *-*-* } .-2 } */
 }
 
 void
 f3 (struct s1 *a, struct s1 *b)
 {
-  for (int i = 0; i < N - 1; ++i)
+  for (int i = 0; i < N - 1; ++i) /* { dg-remark "couldn't vectorize loop" } */
 a->a[i + 1] += b->a[i];
+  /* { dg-message "possible dependence between data-refs" "" { target *-*-* } .-2 } */
 }
 
 /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-remarks-1.c b/gcc/testsuite/gcc.dg/vect/vect-remarks-1.c
new file mode 100644
index 000..5006742
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-remarks-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-fremarks -O3" } */
+
+extern void accumulate (int x, int *a);
+
+int test_1 (int arr[], int n)
+{
+  int sum = 0;
+  for (int i = 0; i < n; ++i) /* { dg-remark "couldn't vectorize loop" } */
+accumulate (arr[i], &sum); /* { dg-message "statement clobbers memory: 'accumulate .*'" } */
+  return sum;
+}
+
+void test_2 (int *dst, int *src_a, int *src_b, int n)
+{
+  for (int i = 0; i < n; i++) /* { dg-remark "loop vectorized" } */
+dst[i] = src_a[i] + src_b[i];
+}
-- 
1.8.5.3



[PATCH 0/5] [RFC v2] Higher-level reporting of vectorization problems

2018-07-09 Thread David Malcolm
On Mon, 2018-06-25 at 11:10 +0200, Richard Biener wrote:
> On Fri, 22 Jun 2018, David Malcolm wrote:
> 
> > NightStrike and I were chatting on IRC last week about
> > issues with trying to vectorize the following code:
> > 
> > #include <vector>
> > std::size_t f(std::vector<std::vector<int>> const & v) {
> > std::size_t ret = 0;
> > for (auto const & w: v)
> > ret += w.size();
> > return ret;
> > }
> > 
> > icc could vectorize it, but gcc couldn't, but neither of us could
> > immediately figure out what the problem was.
> > 
> > Using -fopt-info leads to a wall of text.
> > 
> > I tried using my patch here:
> > 
> >  "[PATCH] v3 of optinfo, remarks and optimization records"
> >   https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
> > 
> > It improved things somewhat, by showing:
> > (a) the nesting structure via indentation, and
> > (b) the GCC line at which each message is emitted (by using the
> > "remark" output)
> > 
> > but it's still a wall of text:
> > 
> >   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.
> > html
> >   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C.
> > .%7Csrc%7Ctest.cc.html#line-4
> > 
> > It doesn't yet provide a simple high-level message to a
> > tech-savvy user on what they need to do to get GCC to
> > vectorize their loop.
> 
> Yeah, in particular the vectorizer is way too noisy in its low-level
> functions.  IIRC -fopt-info-vec-missed is "somewhat" better:
> 
> t.C:4:26: note: step unknown.
> t.C:4:26: note: vector alignment may not be reachable
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: can't use a fully-masked loop because the target
> doesn't 
> have the appropriate masked load or store.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: op not supported by target.
> t.C:4:26: note: not vectorized: relevant stmt not supported: _15 =
> _14 
> /[ex] 4;
> t.C:4:26: note: bad operation or unsupported loop bound.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:6:12: note: not vectorized: not enough data-refs in basic block.
> 
> 
> > The pertinent dump messages are:
> > 
> > test.cc:4:23: remark: === try_vectorize_loop_1 ===
> > [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
> > cc1plus: remark:
> > Analyzing loop at test.cc:4
> > [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
> > test.cc:4:23: remark:  === analyze_loop_nest ===
> > [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
> > [...snip...]
> > test.cc:4:23: remark:   === vect_analyze_loop_operations ===
> > [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
> > [...snip...]
> > test.cc:4:23: remark:==> examining statement: ‘_15 = _14 /[ex]
> > 4;’ [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
> > test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’
> > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> > test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’
> > [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
> > test.cc:4:23: remark:type of def: internal [../../src/gcc/tree-
> > vect-stmts.c:10112:vect_is_simple_use]
> > test.cc:4:23: remark:vect_is_simple_use: operand ‘4’
> > [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
> > test.cc:4:23: remark:op not supported by target.
> > [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
> > test.cc:4:23: remark:not vectorized: relevant stmt not
> > supported: ‘_15 = _14 /[ex] 4;’ [../../src/gcc/tree-vect-
> > stmts.c:9565:vect_analyze_stmt]
> > test.cc:4:23: remark:   bad operation or unsupported loop bound.
> > [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
> > cc1plus: remark: vectorized 0 loops in function.
> > [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
> > 
> > In particular, that complaint from
> >   [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
> > is coming from:
> > 
> >   if (!ok)
> > {
> >   if (dump_enabled_p ())
> > {
> >   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >"not vectorized: relevant stmt not ");
> >   dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
> >   dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
> > stmt, 0);
> > }
> > 
> >   return false;
> > }
> > 
> > This got me thinking: the user presumably wants to know several
> > things:
> > 
> > * the location of the loop that can't be vectorized (vect_location
> >   captures this)

[PATCH,rs6000] Backport of stxvl instruction fix to GCC 7

2018-07-09 Thread Carl Love
GCC Maintainers:

The following patch is a back port for a commit to mainline prior to
GCC 8 release.  Note, the code fixed by this patch was later modified
in commit 256798 as part of adding vec_xst_len support.  The sldi
instruction gets replaced by an ashift of the operand for the stxvl
instruction.  Commit 256798 adds additional functionality and does not
fix any functional issues.  Hence it is not being back ported, just the
original bug fix given below.

The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE)
with no regressions.

Please let me know if the patch looks OK for GCC 7.

 Carl Love
---
2018-07-09  Carl Love  

Backport from mainline
2017-09-07  Carl Love  

* config/rs6000/vsx.md (define_insn "*stxvl"): Add missing argument to
the sldi instruction.
---
 gcc/config/rs6000/vsx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index eef5357..37d768f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3946,7 +3946,7 @@
      (match_operand:DI 2 "register_operand" "+r")]
     UNSPEC_STXVL))]
   "TARGET_P9_VECTOR && TARGET_64BIT"
-  "sldi %2,%2\;stxvl %x0,%1,%2"
+  "sldi %2,%2,56\;stxvl %x0,%1,%2"
   [(set_attr "length" "8")
(set_attr "type" "vecstore")])
 
-- 
2.7.4



Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-09 Thread Jeff Law
On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
> 
> The patches I posted earlier this year for mitigating against
> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
> which it became obvious that a rethink was needed.  This mail, and the
> following patches attempt to address that feedback and present a new
> approach to mitigating against this form of attack surface.
> 
> There were two major issues with the original approach:
> 
> - The speculation bounds were too tightly constrained - essentially
>   they had to represent an upper and lower bound on a pointer, or a
>   pointer offset.
> - The speculation constraints could only cover the immediately preceding
>   branch, which often did not fit well with the structure of the existing
>   code.
> 
> An additional criticism was that the shape of the intrinsic did not
> fit particularly well with systems that used a single speculation
> barrier that essentially had to wait until all preceding speculation
> had to be resolved.
Right.  I suggest the Intel and IBM reps chime in on the updated semantics.

> 
> To address all of the above, these patches adopt a new approach, based
> in part on a posting by Chandler Carruth to the LLVM developers list
> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
> but which we have extended to deal with inter-function speculation.
> The patches divide the problem into two halves.
We're essentially turning the control dependency into a value that we
can then use to munge the pointer or the resultant data.
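The "control dependency as a value" idea is essentially the masking idiom from Carruth's proposal: maintain a predicate that is all-ones only on the architecturally correct path, and AND it into the index (or pointer) before the dependent load. A hypothetical source-level sketch follows; the real mitigations do this in the target backend with barrier/conditional-select sequences precisely so that, unlike this C version, the optimizers cannot fold the predicate to a compile-time constant.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical sketch of the masking idiom: 'predicate' is all-ones when
// the bounds check really passed and zero on a mispredicted path, so ANDing
// it into the index forces speculative out-of-bounds accesses to read
// index 0 instead of attacker-chosen memory.
static uint8_t
load_if_in_bounds (const uint8_t *arr, size_t len, size_t i)
{
  if (i < len)
    {
      // In a real mitigation this predicate is computed with branchless,
      // speculation-tracking target instructions; written in plain C inside
      // the taken branch, a compiler is free to fold it to ~0 -- which is
      // exactly the hazard discussed above.
      uintptr_t predicate = (i < len) ? ~(uintptr_t) 0 : 0;
      return arr[i & predicate];
    }
  return 0;
}
```

Architecturally the function behaves like an ordinary bounds-checked load; the mask only matters on speculative paths.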

> 
> The first half is some target-specific code to track the speculation
> condition through the generated code to provide an internal variable
> which can tell us whether or not the CPU's control flow speculation
> matches the data flow calculations.  The idea is that the internal
> variable starts with the value TRUE and if the CPU's control flow
> speculation ever causes a jump to the wrong block of code the variable
> becomes false until such time as the incorrect control flow
> speculation gets unwound.
Right.

So one of the things that comes immediately to mind is you have to run
this early enough that you can still get to all the control flow and
build your predicates.  Otherwise you have do undo stuff like
conditional move generation.

On the flip side, the earlier you do this mitigation, the more you have
to worry about what the optimizers are going to do to the code later in
the pipeline.  It's almost guaranteed a naive implementation is going to
muck this up since we can propagate the state of the condition into the
arms which will make the predicate state a compile time constant.

In fact this seems to be running into the area of pointer provenance and
some discussions we had around atomics a few years back.

I also wonder if this could be combined with taint analysis to produce a
much lower overhead solution in cases where developers have done analysis
and know what objects are potentially under attacker control.  So
instead of analyzing everything, we can have a much narrower focus.

The pointer munging could well run afoul of alias analysis engines that
don't expect to be seeing those kind of operations.

Anyway, just some initial high level thoughts.  I'm sure there'll be
more as I read the implementation.


Jeff


Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-09 Thread Martin Sebor

On 07/09/2018 06:40 AM, Richard Biener wrote:

On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:


On 07/06/2018 09:52 AM, Richard Biener wrote:

On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:


GCC folds accesses to members of constant aggregates except
for character arrays/strings.  For example, the strlen() call
below is not folded:

   const char a[][4] = { "1", "12" };

   int f (void) { return strlen (a[1]); }

The attached change set enhances the string_constant() function
to make it possible to extract string constants from aggregate
initializers (CONSTRUCTORS).

The initial solution was much simpler but as is often the case,
MEM_REF made it fail to fold things like:

   int f (void) { return strlen (a[1] + 1); }

Handling those made the project a bit more interesting and
the final solution somewhat more involved.

To handle offsets into aggregate string members the patch also
extends the fold_ctor_reference() function to extract entire
string array initializers even if the offset points past
the beginning of the string and even though the size and
exact type of the reference are not known (there isn't enough
information in a MEM_REF to determine that).

Tested along with the patch for PR 86415 on x86_64-linux.


+  if (TREE_CODE (init) == CONSTRUCTOR)
+   {
+ tree type;
+ if (TREE_CODE (arg) == ARRAY_REF
+ || TREE_CODE (arg) == MEM_REF)
+   type = TREE_TYPE (arg);
+ else if (TREE_CODE (arg) == COMPONENT_REF)
+   {
+ tree field = TREE_OPERAND (arg, 1);
+ type = TREE_TYPE (field);
+   }
+ else
+   return NULL_TREE;

what's wrong with just

type = TREE_TYPE (field);


In response to your comment below about size I simplified things
further so determining the type a priori is no longer necessary.


?

+ base_off *= BITS_PER_UNIT;

poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
for poly you'd then use poly_offset?


Okay, I tried to avoid the overflow.  (Converting between all
these flavors of wide int types is a monumental PITA.)



You extend fold_ctor_reference to treat size == 0 specially but then
bother to compute a size here - that looks unneeded?


Yes, well spotted, thanks!  I simplified the code so this isn't
necessary, and neither is the type.



While the offset of the reference determines the first field in the
CONSTRUCTOR, how do you know the access doesn't touch
adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
so consider

  char x[2][4] = { "abcd", "abcd" };

and MEM[] with a char[8] type?  memcpy "inlining" will create
such MEMs for example.
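
For concreteness, a self-contained C illustration of that situation (the
memcmp check is mine, added only to show the layout):

```c
#include <string.h>

/* "abcd" exactly fills char[4], so no terminating '\0' is stored and
   each row's STRING_CST is not NUL-terminated.  The rows are adjacent,
   so a char[8] access (e.g. one created by memcpy inlining) legitimately
   spans both of them.  */
static char x[2][4] = { "abcd", "abcd" };

static int
rows_read_as_one_block (void)
{
  return memcmp (x, "abcdabcd", 8) == 0;
}
```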


The code is only used to find string constants in initializer
expressions where I don't think the size of the access comes
into play.  If a memcpy() call results in a MEM_REF[char[8],
, 8] that's fine.  It's a valid reference and we can still
get the underlying character sequence (which is represented
as two STRING_CSTs with the two string literals).  I might
be missing the point of your question.


Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.



@@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
   tree byte_offset = DECL_FIELD_OFFSET (cfield);
   tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
   tree field_size = DECL_SIZE (cfield);
-  offset_int bitoffset;
-  offset_int bitoffset_end, access_end;
+
+  if (!field_size && TREE_CODE (cval) == STRING_CST)
+   {
+ /* Determine the size of the flexible array member from
+the size of the string initializer provided for it.  */
+ unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
+ tree eltype = TREE_TYPE (TREE_TYPE (cval));
+ len *= tree_to_uhwi (TYPE_SIZE (eltype));
+ field_size = build_int_cst (size_type_node, len);
+   }

Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
say, for

struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };


I can't think of a use for it.  Do you have something in mind?


Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler.  So I don't see why
it shouldn't work for general flex-arrays.
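
For reference, the flexible-array-member cases under discussion look like
this in source form (static initialization of a flexible array member is
a GNU C extension; the initializer is what fixes the member's effective
size):

```c
#include <string.h>

/* GNU C extension: flexible array members with static initializers.
   Here s effectively has type char[6] ("hello" plus the terminating
   NUL) -- the size derivable from the STRING_CST initializer.  The
   CONSTRUCTOR-initialized variant behaves analogously: a[] gets five
   elements.  */
struct flex_str { int n; char s[]; };
struct flex_int { int i; int a[]; };

static struct flex_str fs = { 1, "hello" };
static struct flex_int fi = { 1, { 2, 3, 4, 5, 6 } };

static size_t flex_len (void)  { return strlen (fs.s); }
static int    flex_last (void) { return fi.a[4]; }
```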



?  And why not use simply

  field_size = TYPE_SIZE (TREE_TYPE (cval));

like you do in c_strlen?


Yes, that's simpler, thanks.



Otherwise looks reasonable.


Attached is an updated patch.  I also enhanced the handling
of non-constant indices.  They were already handled before
to a smaller extent.  (There may be other opportunities
here.)


Please don't do functional changes to a patch in review, without
exactly pointing out the change.  It makes review inefficient for me.

Looks like it might be the NULL type argument handling?


Sorry.  The change I was referring to is the addition and handling
of the varidx variable to string_constant.  It was necessary for
parity 

Re: [PATCH] add support for strnlen (PR 81384)

2018-07-09 Thread Martin Sebor

On 07/09/2018 08:36 AM, Aldy Hernandez wrote:

   { dg-do run }
   { do-options "-O2 -fno-tree-strlen" }  */

 I don't think this is doing anything.

If you look at the test run you can see that -fno-tree-strlen is never
passed (I think you actually mean -fno-optimize-strlen for that
matter).  Also, the builtins.exp harness runs your test for an
assortment of other flags, not just -O2.


I didn't know the harness ignores dg-options specified in these
tests.  That's surprising and feels like a bug in the harness
not to complain about it.  The purpose of the test is to verify
that the strnlen expansion in builtins.c does the right thing
and it deliberately tries to disable the earlier strlen
optimizations to make sure the expansion in builtins.c is fully
exercised.  By not pointing out my mistake the harness effectively
let me commit a change without making sure it's thoroughly tested
(I tested it manually before committing the patch but things could
regress without us noticing).  I'll look into fixing this somehow.



This test is failing on my range branch for -Og, because
expand_builtin_strnlen() needs range info:

+  wide_int min, max;
+  enum value_range_type rng = get_range_info (bound, &min, &max);
+  if (rng != VR_RANGE)
+return NULL_RTX;

but interestingly enough, it seems to be calculated in the sprintf
pass as part of the DOM walk:

  /* First record ranges generated by this statement.  */
  evrp_range_analyzer.record_ranges_from_stmt (stmt, false);

It feels wrong that the sprintf warning pass is generating range info
that you may later depend on at rtl expansion time (and for a totally
unrelated thing-- strlen expansion).


Any pass that records ranges for statements will have this
effect.  The sprintf pass seems to be the first one to make
use of this utility (and it's not just a warning pass but also
an optimization pass) but it would be a shame to put it off
limits to warning-only passes only because it happens to set
ranges.



I don't know if this is just a quirk of builtins.exp calling your test
with flags you didn't intend, but the inconsistency could cause
problems in the future.  Errr, or my present ;-).

Would it be too much to ask for you to either fix the flags being
passed down to the test, or better yet, find some non-sprintf
dependent way of calculating range info earlier?


At the time I wrote the test I didn't realize the statement
range info was being computed only in the sprintf pass --
I thought it was done as "a basic service for the greater
good" by VRP.  It seems that it should be such a service.

Let me look into tweaking the test.

Martin



Aldy
On Mon, Jun 18, 2018 at 6:35 PM Martin Sebor  wrote:


On 06/12/2018 03:11 PM, Jeff Law wrote:

On 06/05/2018 03:43 PM, Martin Sebor wrote:

The attached patch adds basic support for handling strnlen
as a built-in function.  It touches the strlen pass where
it folds constant results of the function, and builtins.c
to add simple support for expanding strnlen calls with known
results.  It also changes calls.c to detect excessive bounds
to the function and unsafe calls with arguments declared
attribute nonstring.
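
For readers unfamiliar with the function, strnlen's semantics can be
sketched as follows (a reference implementation for context, not the
expansion the patch emits):

```c
#include <stddef.h>

/* strnlen: the length of s, but never more than maxlen, and never
   reading past the first maxlen bytes.  A bound far larger than the
   addressed object is therefore suspect, which is what the calls.c
   check diagnoses.  */
static size_t
ref_strnlen (const char *s, size_t maxlen)
{
  size_t i = 0;
  while (i < maxlen && s[i] != '\0')
    i++;
  return i;
}
```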

A side-effect of the strlen change I should call out is that
strlen() calls to all zero-length arrays that aren't considered
flexible array members (i.e., internal members or non-members)
are folded into zero.  No warning is issued for such invalid
uses of zero-length arrays but based on the responses to my
question Re: aliasing between internal zero-length-arrays and
other members(*) it sounds like one would be appropriate.
I will see about adding one in a separate patch.

Martin

[*] https://gcc.gnu.org/ml/gcc/2018-06/msg00046.html

gcc-81384.diff


PR tree-optimization/81384 - built-in form of strnlen missing

gcc/ChangeLog:

 PR tree-optimization/81384
 * builtin-types.def (BT_FN_SIZE_CONST_STRING_SIZE): New.
 * builtins.c (expand_builtin_strnlen): New function.
 (expand_builtin): Call it.
 (fold_builtin_n): Avoid setting TREE_NO_WARNING.
 * builtins.def (BUILT_IN_STRNLEN): New.
 * calls.c (maybe_warn_nonstring_arg): Handle BUILT_IN_STRNLEN.
 Warn for bounds in excess of maximum object size.
 * tree-ssa-strlen.c (maybe_set_strlen_range): Return tree representing
 single-value ranges.  Handle strnlen.
 (handle_builtin_strlen): Handle strnlen.
 (strlen_check_and_optimize_stmt): Same.

gcc/testsuite/ChangeLog:

 PR tree-optimization/81384
 * gcc.c-torture/execute/builtins/lib/strnlen.c: New test.
 * gcc.c-torture/execute/builtins/strnlen-lib.c: New test.
 * gcc.c-torture/execute/builtins/strnlen.c: New test.
 * gcc.dg/attr-nonstring-2.c: New test.
 * gcc.dg/attr-nonstring-3.c: New test.
 * gcc.dg/attr-nonstring-4.c: New test.
 * gcc.dg/strlenopt-44.c: New test.
 * gcc.dg/strlenopt.h (strnlen):  Declare.

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 5365bef..1f15350 100644
--- a/gcc/builtin-types.def
+++ 

Re: [PING][PATCH, rs6000, C/C++] Fix PR target/86324: divkc3-1.c FAILs when compiling with -mabi=ieeelongdouble

2018-07-09 Thread Jeff Law
On 07/06/2018 02:59 PM, Peter Bergner wrote:
> On 7/5/18 2:36 PM, Jeff Law wrote:
>> On 07/02/2018 03:50 PM, Peter Bergner wrote:
>>> I'd like to PING:
>>>
>>>   https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01713.html
>>>
>>> I've included the entire patch below, since I missed the test cases in
>>> the original submission and Segher asked for some updated text for the
>>> hook documentation which I've included below.
>>
>> OK.
> 
> Is that an ok for GCC 8 as well which I asked for in the initial patch
> submission?
Well, I think that's more up to Jakub and Richi -- it doesn't seem to me
like it fixes a regression, so ISTM they'd need to agree to an exception
here.

I don't think it's risky at all.  But I don't want to step on their toes
since they're the release managers.

jeff


Re: [PATCH] add support for strnlen (PR 81384)

2018-07-09 Thread Martin Sebor

On 07/09/2018 01:51 PM, Jeff Law wrote:

On 07/09/2018 08:36 AM, Aldy Hernandez wrote:

   { dg-do run }
   { do-options "-O2 -fno-tree-strlen" }  */

 I don't think this is doing anything.

If you look at the test run you can see that -fno-tree-strlen is never
passed (I think you actually mean -fno-optimize-strlen for that
matter).  Also, the builtins.exp harness runs your test for an
assortment of other flags, not just -O2.

This test is failing on my range branch for -Og, because
expand_builtin_strnlen() needs range info:

+  wide_int min, max;
+  enum value_range_type rng = get_range_info (bound, &min, &max);
+  if (rng != VR_RANGE)
+return NULL_RTX;

but interestingly enough, it seems to be calculated in the sprintf
pass as part of the DOM walk:

  /* First record ranges generated by this statement.  */
  evrp_range_analyzer.record_ranges_from_stmt (stmt, false);

It feels wrong that the sprintf warning pass is generating range info
that you may later depend on at rtl expansion time (and for a totally
unrelated thing-- strlen expansion).

We have a fair amount of expansion code these days that will use
globally valid range information if it's been computed.




I don't know if this is just a quirk of builtins.exp calling your test
with flags you didn't intend, but the inconsistency could cause
problems in the future.  Errr, or my present ;-).

Would it be too much to ask for you to either fix the flags being
passed down to the test, or better yet, find some non-sprintf
dependent way of calculating range info earlier?

I think the general issue here is if we do something like
-O <-fblahblahblah> -Wsprintf-blah

Where <-fblahblahblah> is whatever aggregate of -f options is needed to
disable any optimizer that generates range information.

In that case the sprintf warning pass becomes the only pass that
generates ranges.  So whether or not we run the sprintf warning pass
would affect the generated code.  And I think that's really the bigger
issue here -- warnings shouldn't affect code generation.

The question is what to do about it.  It probably wouldn't be too
terrible to have clients of the EVRP analyzer to specify if any
discovered ranges should be mirrored into the global range information.

Optimization passes would ask for the mirroring, the sprintf pass or any
other warning pass would not ask for mirroring.

Thoughts?


The sprintf pass doesn't just emit warnings -- it also performs
the sprintf optimizations, and the two functions are independent
of one another.  I'm pretty sure there are tests to verify that
it's so.

The sprintf optimizations include replacing constant calls with
their results and setting ranges for others.  As you mentioned,
the sprintf pass isn't the only one in GCC that does the latter.
The strlen pass does as well, and so does gimple-fold.c.  There
is also a call to set_range_info() in tree-vect-loop-manip.c
though I'm not familiar with that one yet.

Martin


Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-09 Thread Martin Sebor

On 07/09/2018 01:28 PM, Qing Zhao wrote:

Hi, Martin,

thanks a lot for your comments.


On Jul 5, 2018, at 11:30 AM, Martin Sebor  wrote:

One of the basic design principles that I myself have
accidentally violated in the past is that warning options
should not impact the emitted object code.  I don't think
your patch actually does introduce this dependency by having
the codegen depend on the result of check_access() -- I'm
pretty sure the function is designed to do the validation
irrespective of warning options and return based on
the result of the validation and not based on whether
a warning was issued.  But the choice of the variable name,
no_overflow_warn, suggests that it does, in fact, have this
effect.  So I would suggest to rename the variable and add
a test that verifies that this dependency does not exist.


I agree that warning options should not impact the emitted object code.
and my current change seems to violate this principle:

in the following change:

+  bool no_overflow_warn = true;

   /* Diagnose calls where the specified length exceeds the size of either
  object.  */
   if (warn_stringop_overflow)
 {
   tree size = compute_objsize (arg1, 0);
-  if (check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
-   /*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE))
+  no_overflow_warn = check_access (exp, /*dst=*/NULL_TREE, 
/*src=*/NULL_TREE,
+  len, /*maxread=*/NULL_TREE, size,
+  /*objsize=*/NULL_TREE);
+  if (no_overflow_warn)
{
  size = compute_objsize (arg2, 0);
- check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
-   /*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE);
+ no_overflow_warn = check_access (exp, /*dst=*/NULL_TREE, 
/*src=*/NULL_TREE,
+  len,  /*maxread=*/NULL_TREE, size,
+  /*objsize=*/NULL_TREE);
}
 }

+  /* Due to the performance benefit, always inline the calls first
+ when result_eq is false.  */
+  rtx result = NULL_RTX;
+
+  if (!result_eq && fcode != BUILT_IN_BCMP && no_overflow_warn)
+{
+  result = inline_expand_builtin_string_cmp (exp, target, true);
+  if (result)

The variable no_overflow_warn DEPENDs on the warning option 
warn_stringop_overflow, and this
variable is used to control the code generation.  Such behavior seems to violate
the above-mentioned principle.

However, this is not a problem that can be easily fixed based on the
current design, which has the following issues as my
understanding:

1. the routine check_access issues warnings by default, so it seems
necessary to guard the call
to this routine with the warning option;
2. then the returned value of the routine check_access has to depend on 
the warning option.

in order to fix the current problem I have, an approach is to rewrite the
routine check_access to guard the warning issuance inside
the routine with the warning option passed as an additional parameter.

let me know anything I am missing so far.


check_access() calls warning_at() to issue warnings, and that
function only issues warnings if they are enabled, so the guard
isn't necessary to make it work this way.


Beyond that, an enhancement to this optimization that might
be worth considering is inlining even non-constant calls
with array arguments whose size is no greater than the limit.
As in:

 extern char a[4], *b;

 int n = strcmp (a, b);

Because strcmp arguments are required to be nul-terminated
strings, a's length above must be at most 3.  This is analogous
to similar optimizations GCC performs, such as folding to zero
calls to strlen() with one-element arrays.
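
A minimal sketch of the equivalence behind that suggestion (the strncmp
rewrite is mine, shown only to illustrate why the bound is safe):

```c
#include <string.h>

/* When one argument is char[4], a valid nul-terminated string stored in
   it holds at most 3 characters, so strcmp's result is decided within
   the first 4 bytes and a comparison bounded by sizeof a is
   equivalent.  */
static char a[4] = "abc";

static int
cmp_via_bound (const char *b)
{
  return strncmp (a, b, sizeof a);   /* same result as strcmp (a, b) */
}
```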


Yes, I agree that this will be another good enhancement to the strcmp inlining.

however, it’s not easy to integrate with my current patch.  The major issue
is:

 The inlined code for the strcmp call without string constant will be 
different than the inlined code for the
strcmp call with string constant,  then:

1. the default value for the threshold that controls the maximum string
length for inlining will be different than the one for the strcmp call
with a string constant; more experiments need to be run and a new
parameter needs to be added to control this;
2. the inlined transformed code will be different than the current one.

based on the above, I’d like to open a new PR to record this new enhancement 
and finish it with a new patch later.

what’s your opinion on this?


I'm not sure I see the issues above as problems and I would expect
the non-constant optimization to naturally handle the constant case
as well.  But if you prefer it that way, implementing the non-constant
optimization in a separate step sounds reasonable to me.  It's your
call.

Martin


Re: [patch] jump threading multiple paths that start from the same BB

2018-07-09 Thread Jeff Law
On 07/09/2018 01:19 AM, Aldy Hernandez wrote:
>>
>> I'd like decisions about how to expand branches deferred until rtl
>> expansion.  Kai was poking at this in the past but never really got any
>> traction.
> 
> For the record, the problem in this testcase is that switch lowering is
> riddled with back end specific knowledge (GET_MODE_SIZE uses as well as
> some rtx cost hacks).
Yea.  Switch lowering is going to have some of these as well, though I
think BRANCH_COST is more pervasive.

> 
>>
>> Many tests should turn into gimple IL tests.
> 
> Yeah, though for tests like the threading ones, they're already
> sufficiently convoluted that turning them into gimple IL tests will make
> them even harder to read.  Oh well, I guess?
It might make them harder to read, but it would guarantee consistent
gimple fed into the optimizer across our targets which in turn ought to
result in consistent behavior by the optimizer which in turn should
simplify the tests and make them more consistent over time.

Jeff


Re: [PATCH] add support for strnlen (PR 81384)

2018-07-09 Thread Jeff Law
On 07/09/2018 08:36 AM, Aldy Hernandez wrote:
>{ dg-do run }
>{ do-options "-O2 -fno-tree-strlen" }  */
> 
>  I don't think this is doing anything.
> 
> If you look at the test run you can see that -fno-tree-strlen is never
> passed (I think you actually mean -fno-optimize-strlen for that
> matter).  Also, the builtins.exp harness runs your test for an
> assortment of other flags, not just -O2.
> 
> This test is failing on my range branch for -Og, because
> expand_builtin_strnlen() needs range info:
> 
> +  wide_int min, max;
> +  enum value_range_type rng = get_range_info (bound, &min, &max);
> +  if (rng != VR_RANGE)
> +return NULL_RTX;
> 
> but interestingly enough, it seems to be calculated in the sprintf
> pass as part of the DOM walk:
> 
>   /* First record ranges generated by this statement.  */
>   evrp_range_analyzer.record_ranges_from_stmt (stmt, false);
> 
> It feels wrong that the sprintf warning pass is generating range info
> that you may later depend on at rtl expansion time (and for a totally
> unrelated thing-- strlen expansion).
We have a fair amount of expansion code these days that will use
globally valid range information if it's been computed.


> 
> I don't know if this is just a quirk of builtins.exp calling your test
> with flags you didn't intend, but the inconsistency could cause
> problems in the future.  Errr, or my present ;-).
> 
> Would it be too much to ask for you to either fix the flags being
> passed down to the test, or better yet, find some non-sprintf
> dependent way of calculating range info earlier?
I think the general issue here is if we do something like
-O <-fblahblahblah> -Wsprintf-blah

Where <-fblahblahblah> is whatever aggregate of -f options is needed to
disable any optimizer that generates range information.

In that case the sprintf warning pass becomes the only pass that
generates ranges.  So whether or not we run the sprintf warning pass
would affect the generated code.  And I think that's really the bigger
issue here -- warnings shouldn't affect code generation.

The question is what to do about it.  It probably wouldn't be too
terrible to have clients of the EVRP analyzer to specify if any
discovered ranges should be mirrored into the global range information.

Optimization passes would ask for the mirroring, the sprintf pass or any
other warning pass would not ask for mirroring.

Thoughts?

jeff


Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-09 Thread Qing Zhao
Hi, Martin,

thanks a lot for your comments. 

> On Jul 5, 2018, at 11:30 AM, Martin Sebor  wrote:
> 
> One of the basic design principles that I myself have
> accidentally violated in the past is that warning options
> should not impact the emitted object code.  I don't think
> your patch actually does introduce this dependency by having
> the codegen depend on the result of check_access() -- I'm
> pretty sure the function is designed to do the validation
> irrespective of warning options and return based on
> the result of the validation and not based on whether
> a warning was issued.  But the choice of the variable name,
> no_overflow_warn, suggests that it does, in fact, have this
> effect.  So I would suggest to rename the variable and add
> a test that verifies that this dependency does not exist.

I agree that warning options should not impact the emitted object code. 
and my current change seems to violate this principle:

in the following change:

+  bool no_overflow_warn = true;

   /* Diagnose calls where the specified length exceeds the size of either
  object.  */
   if (warn_stringop_overflow)
 {
   tree size = compute_objsize (arg1, 0);
-  if (check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
-   /*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE))
+  no_overflow_warn = check_access (exp, /*dst=*/NULL_TREE, 
/*src=*/NULL_TREE,
+  len, /*maxread=*/NULL_TREE, size,
+  /*objsize=*/NULL_TREE);
+  if (no_overflow_warn) 
{
  size = compute_objsize (arg2, 0);
- check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
-   /*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE);
+ no_overflow_warn = check_access (exp, /*dst=*/NULL_TREE, 
/*src=*/NULL_TREE,
+  len,  /*maxread=*/NULL_TREE, size,
+  /*objsize=*/NULL_TREE);
}
 }

+  /* Due to the performance benefit, always inline the calls first 
+ when result_eq is false.  */
+  rtx result = NULL_RTX;
+   
+  if (!result_eq && fcode != BUILT_IN_BCMP && no_overflow_warn) 
+{
+  result = inline_expand_builtin_string_cmp (exp, target, true);
+  if (result)

The variable no_overflow_warn DEPENDs on the warning option 
warn_stringop_overflow, and this
variable is used to control the code generation.  Such behavior seems to violate
the above-mentioned principle.

However, this is not a problem that can be easily fixed based on the
current design, which has the following issues as my
understanding:

1. the routine check_access issues warnings by default, so it seems
necessary to guard the call
to this routine with the warning option;
2. then the returned value of the routine check_access has to depend on 
the warning option.

in order to fix the current problem I have, an approach is to rewrite the
routine check_access to guard the warning issuance inside
the routine with the warning option passed as an additional parameter.

let me know anything I am missing so far.

> 
> Beyond that, an enhancement to this optimization that might
> be worth considering is inlining even non-constant calls
> with array arguments whose size is no greater than the limit.
> As in:
> 
>  extern char a[4], *b;
> 
>  int n = strcmp (a, b);
> 
> Because strcmp arguments are required to be nul-terminated
> strings, a's length above must be at most 3.  This is analogous
> to similar optimizations GCC performs, such as folding to zero
> calls to strlen() with one-element arrays.

Yes, I agree that this will be another good enhancement to the strcmp inlining.

however, it’s not easy to integrate with my current patch.  The major issue
is:

 The inlined code for the strcmp call without string constant will be 
different than the inlined code for the
strcmp call with string constant,  then:

1. the default value for the threshold that controls the maximum string
length for inlining will be different than the one for the strcmp call
with a string constant; more experiments need to be run and a new
parameter needs to be added to control this;
2. the inlined transformed code will be different than the current one. 

based on the above, I’d like to open a new PR to record this new enhancement 
and finish it with a new patch later.

what’s your opinion on this?

Qing
> 
> Martin



[PATCH, rs6000] Add support for gimple folding vec_perm()

2018-07-09 Thread Will Schmidt
 Hi,
   Add support for early gimple folding of vec_perm.   Testcases are already 
in-tree as
gcc.target/powerpc/fold-vec-perm-*.c

OK for trunk?

Thanks,
-Will

[gcc]

2018-07-09  Will Schmidt  

* gcc/config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Add support
for folding vec_perm.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1335661..7e4370c9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16155,10 +16155,41 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case ALTIVEC_BUILTIN_VUPKLPX:
   {
return false;
   }
 
+/* vec_perm.  */
+case ALTIVEC_BUILTIN_VPERM_16QI:
+case ALTIVEC_BUILTIN_VPERM_8HI:
+case ALTIVEC_BUILTIN_VPERM_4SI:
+case ALTIVEC_BUILTIN_VPERM_2DI:
+case ALTIVEC_BUILTIN_VPERM_4SF:
+case ALTIVEC_BUILTIN_VPERM_2DF:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   arg1 = gimple_call_arg (stmt, 1);
+   tree permute = gimple_call_arg (stmt, 2);
+   lhs = gimple_call_lhs (stmt);
+   location_t loc = gimple_location (stmt);
+   gimple_seq stmts = NULL;
+   // convert arg0 and arg1 to match the type of the permute
+   // for the VEC_PERM_EXPR operation.
+   tree permute_type = (TREE_TYPE (permute));
+   tree arg0_ptype = gimple_convert (, loc, permute_type, arg0);
+   tree arg1_ptype = gimple_convert (, loc, permute_type, arg1);
+   tree lhs_ptype = gimple_build (, loc, VEC_PERM_EXPR,
+ permute_type, arg0_ptype, arg1_ptype,
+ permute);
+   // Convert the result back to the desired lhs type upon completion.
+   tree temp = gimple_convert (, loc, TREE_TYPE (lhs), lhs_ptype);
+   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+   g = gimple_build_assign (lhs, temp);
+   gimple_set_location (g, loc);
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+
 default:
   if (TARGET_DEBUG_BUILTIN)
fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
 fn_code, fn_name1, fn_name2);
   break;




[PATCH, rs6000] Testcase adds for vec_unpack

2018-07-09 Thread Will Schmidt
Hi,
  Testcases to exercise the vec_unpack intrinsics.

Tested clean across assorted systems.
OK for trunk?

Thanks,
-Will

[testsuite]

2018-07-09  Will Schmidt  

* gcc.target/powerpc/fold-vec-unpack-char.c: New.
* gcc.target/powerpc/fold-vec-unpack-float.c: New.
* gcc.target/powerpc/fold-vec-unpack-int.c: New.
* gcc.target/powerpc/fold-vec-unpack-pixel.c: New.
* gcc.target/powerpc/fold-vec-unpack-short.c: New.

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-char.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-char.c
new file mode 100644
index 000..7f4b372
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-char.c
@@ -0,0 +1,36 @@
+/* Verify that overloaded built-ins for vec_unpackh and vec_unpackl with char
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+#include <altivec.h>
+
+vector bool short
+testbc_l (vector bool char vbc2)
+{
+  return vec_unpackl (vbc2);
+}
+
+vector signed short
+testsc_l (vector signed char vsc2)
+{
+  return vec_unpackl (vsc2);
+}
+
+vector bool short
+testbc_h (vector bool char vbc2)
+{
+  return vec_unpackh (vbc2);
+}
+
+vector signed short
+testsc_h (vector signed char vsc2)
+{
+  return vec_unpackh (vsc2);
+}
+
+/* { dg-final { scan-assembler-times "vupkhsb" 2 } } */
+/* { dg-final { scan-assembler-times "vupklsb" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-float.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-float.c
new file mode 100644
index 000..78e8eb3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-float.c
@@ -0,0 +1,23 @@
+/* Verify that overloaded built-ins for vec_unpackh and vec_unpackl with float
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mpower8-vector -O2" } */
+
+#include <altivec.h>
+
+vector double
+testf_l (vector float vf2)
+{
+  return vec_unpackl (vf2);
+}
+
+vector double
+testf_h (vector float vf2)
+{
+  return vec_unpackh (vf2);
+}
+
+/* { dg-final { scan-assembler-times "xxsldwi" 4 } } */
+/* { dg-final { scan-assembler-times "xvcvspdp" 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-int.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-int.c
new file mode 100644
index 000..621c4eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-int.c
@@ -0,0 +1,35 @@
+/* Verify that overloaded built-ins for vec_unpackh and vec_unpackl with int
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mpower8-vector -O2" } */
+
+#include <altivec.h>
+
+vector bool long long
+testbi_l (vector bool int vbi2)
+{
+  return vec_unpackl (vbi2);
+}
+
+vector signed long long
+testsi_l (vector signed int vsi2)
+{
+  return vec_unpackl (vsi2);
+}
+
+vector bool long long
+testbi_h (vector bool int vbi2)
+{
+  return vec_unpackh (vbi2);
+}
+
+vector signed long long
+testsi_h (vector signed int vsi2)
+{
+  return vec_unpackh (vsi2);
+}
+
+/* { dg-final { scan-assembler-times "vupkhsw" 2 } } */
+/* { dg-final { scan-assembler-times "vupklsw" 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-pixel.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-pixel.c
new file mode 100644
index 0000000..8e7d110
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-pixel.c
@@ -0,0 +1,23 @@
+/* Verify that overloaded built-ins for vec_unpackh and vec_unpackl with pixel
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+#include <altivec.h>
+
+vector unsigned int
+testf_el (vector pixel vpx2)
+{
+  return vec_unpackl (vpx2);
+}
+
+vector unsigned int
+testf_eh (vector pixel vpx2)
+{
+  return vec_unpackh (vpx2);
+}
+
+/* { dg-final { scan-assembler-times "vupkhpx" 1 } } */
+/* { dg-final { scan-assembler-times "vupklpx" 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-short.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-short.c
new file mode 100644
index 0000000..da51012
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-unpack-short.c
@@ -0,0 +1,36 @@
+/* Verify that overloaded built-ins for vec_unpackh and vec_unpackl with short
+   inputs produce the right code.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-maltivec -O2" } */
+
+#include <altivec.h>
+
+vector bool int
+testbi_el (vector bool short vbs2)
+{
+  return vec_unpackl (vbs2);
+}
+
+vector signed int
+testsi_el (vector signed short vss2)
+{
+  return vec_unpackl (vss2);
+}
+
+vector bool int
+testbi_eh (vector bool short vbs2)
+{
+  return vec_unpackh (vbs2);
+}
+
+vector signed int
+testsi_eh (vector signed short vss2)
+{
+  

[PATCH, rs6000] gimple folding support for vec_pack and vec_unpack

2018-07-09 Thread Will Schmidt
Hi,
  Add support for gimple folding for vec_pack(), vec_unpackh() and
vec_unpackl().
Testcases for vec_pack are already in tree. Tests for vec_unpack
have been posted separately.
OK for trunk?

Thanks,
-Will

[gcc]

2018-07-09 Will Schmidt  

* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin):
Add support for gimple-folding of vec_pack() and vec_unpack()
intrinsics.


diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 774c60a..1335661 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16100,10 +16100,65 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 case VSX_BUILTIN_XXMRGHW_4SI:
 case ALTIVEC_BUILTIN_VMRGHB:
 case VSX_BUILTIN_VEC_MERGEH_V2DI:
fold_mergehl_helper (gsi, stmt, 0);
return true;
+
+/* d = vec_pack (a, b) */
+case P8V_BUILTIN_VPKUDUM:
+case ALTIVEC_BUILTIN_VPKUHUM:
+case ALTIVEC_BUILTIN_VPKUWUM:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   arg1 = gimple_call_arg (stmt, 1);
+   lhs = gimple_call_lhs (stmt);
+   gimple *g = gimple_build_assign (lhs, VEC_PACK_TRUNC_EXPR, arg0, arg1);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+
+   /* d = vec_unpackh (a) */
+   /* Note that the UNPACK_{HI,LO}_EXPR used in the gimple_build_assign call
+  in this code is sensitive to endian-ness, and needs to be inverted to
+  handle both LE and BE targets.  */
+case ALTIVEC_BUILTIN_VUPKHSB:
+case ALTIVEC_BUILTIN_VUPKHSH:
+case P8V_BUILTIN_VUPKHSW:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   lhs = gimple_call_lhs (stmt);
+   if (BYTES_BIG_ENDIAN)
+g = gimple_build_assign (lhs, VEC_UNPACK_HI_EXPR, arg0);
+   else
+g = gimple_build_assign (lhs, VEC_UNPACK_LO_EXPR, arg0);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+   /* d = vec_unpackl (a) */
+case ALTIVEC_BUILTIN_VUPKLSB:
+case ALTIVEC_BUILTIN_VUPKLSH:
+case P8V_BUILTIN_VUPKLSW:
+  {
+   arg0 = gimple_call_arg (stmt, 0);
+   lhs = gimple_call_lhs (stmt);
+   if (BYTES_BIG_ENDIAN)
+g = gimple_build_assign (lhs, VEC_UNPACK_LO_EXPR, arg0);
+   else
+g = gimple_build_assign (lhs, VEC_UNPACK_HI_EXPR, arg0);
+   gimple_set_location (g, gimple_location (stmt));
+   gsi_replace (gsi, g, true);
+   return true;
+  }
+/* There is no gimple type corresponding with pixel, so just return.  */
+case ALTIVEC_BUILTIN_VUPKHPX:
+case ALTIVEC_BUILTIN_VUPKLPX:
+  {
+   return false;
+  }
+
 default:
   if (TARGET_DEBUG_BUILTIN)
fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
 fn_code, fn_name1, fn_name2);
   break;




[PATCH, rs6000 v3] enable gimple folding for vec_xl, vec_xst

2018-07-09 Thread Will Schmidt
Hi,
  Re-posting.  Richard provided feedback on a previous version of this
patch, I wanted to make sure he was/is OK with the latest. :-) 

Add support for Gimple folding for unaligned vector loads and stores.

Regtest completed across variety of systems, P6,P7,P8,P9.

[v2] Added the type for the MEM_REF, per feedback.
Testcases for gimple-folding of the same are currently in-tree
as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c.
Re-tested, still looks good. :-)

[v3] Updated the alignment for the MEM_REF to be 4 bytes.
Updated/added/removed comments in the code for clarity.

OK for trunk?

Thanks
-Will

[gcc]

2018-07-09 Will Schmidt 

* config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
vec_xst variants to the list.
(rs6000_gimple_fold_builtin): Add support for folding unaligned
vector loads and stores.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8bc4109..774c60a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
 case ALTIVEC_BUILTIN_STVX_V8HI:
 case ALTIVEC_BUILTIN_STVX_V4SI:
 case ALTIVEC_BUILTIN_STVX_V4SF:
 case ALTIVEC_BUILTIN_STVX_V2DI:
 case ALTIVEC_BUILTIN_STVX_V2DF:
+case VSX_BUILTIN_STXVW4X_V16QI:
+case VSX_BUILTIN_STXVW4X_V8HI:
+case VSX_BUILTIN_STXVW4X_V4SF:
+case VSX_BUILTIN_STXVW4X_V4SI:
+case VSX_BUILTIN_STXVD2X_V2DF:
+case VSX_BUILTIN_STXVD2X_V2DI:
   return true;
 default:
   return false;
 }
 }
@@ -15910,10 +15916,79 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
gimple_set_location (g, loc);
gsi_replace (gsi, g, true);
return true;
   }
 
+/* unaligned Vector loads.  */
+case VSX_BUILTIN_LXVW4X_V16QI:
+case VSX_BUILTIN_LXVW4X_V8HI:
+case VSX_BUILTIN_LXVW4X_V4SF:
+case VSX_BUILTIN_LXVW4X_V4SI:
+case VSX_BUILTIN_LXVD2X_V2DF:
+case VSX_BUILTIN_LXVD2X_V2DI:
+  {
+arg0 = gimple_call_arg (stmt, 0);  // offset
+arg1 = gimple_call_arg (stmt, 1);  // address
+lhs = gimple_call_lhs (stmt);
+location_t loc = gimple_location (stmt);
+/* Since arg1 may be cast to a different type, just use ptr_type_node
+   here instead of trying to enforce TBAA on pointer types.  */
+tree arg1_type = ptr_type_node;
+tree lhs_type = TREE_TYPE (lhs);
+/* In GIMPLE the type of the MEM_REF specifies the alignment.  The
+   required alignment (power) is 4 bytes regardless of data type.  */
+tree align_ltype = build_aligned_type (lhs_type, 4);
+/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
+   the tree using the value from arg0.  The resulting type will match
+   the type of arg1.  */
+gimple_seq stmts = NULL;
+tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
+tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+  arg1_type, arg1, temp_offset);
+gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
+   take an offset, but since we've already incorporated the offset
+   above, here we just pass in a zero.  */
+gimple *g;
+g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, temp_addr,
+   build_int_cst (arg1_type, 0)));
+gimple_set_location (g, loc);
+gsi_replace (gsi, g, true);
+return true;
+  }
+
+/* unaligned Vector stores.  */
+case VSX_BUILTIN_STXVW4X_V16QI:
+case VSX_BUILTIN_STXVW4X_V8HI:
+case VSX_BUILTIN_STXVW4X_V4SF:
+case VSX_BUILTIN_STXVW4X_V4SI:
+case VSX_BUILTIN_STXVD2X_V2DF:
+case VSX_BUILTIN_STXVD2X_V2DI:
+  {
+arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
+arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
+tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
+location_t loc = gimple_location (stmt);
+tree arg0_type = TREE_TYPE (arg0);
+/* Use ptr_type_node (no TBAA) for the arg2_type.  */
+tree arg2_type = ptr_type_node;
+/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
+   the tree using the value from arg1.  The resulting type will match
+   the type of arg2.  */
+gimple_seq stmts = NULL;
+tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
+tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+  arg2_type, arg2, temp_offset);
+gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+gimple *g;
+g = gimple_build_assign (build2 (MEM_REF, arg0_type, temp_addr,
+  

[PATCH] alpha: Use TARGET_COMPUTE_FRAME_LAYOUT

2018-07-09 Thread Richard Henderson
At the same time, merge several related frame computing functions.
Recall that HWI is now always 64-bit, so merge IMASK and FMASK,
which allows merging of several loops within prologue and epilogue.

Full regression testing will take some time, but a quick browse
suggests no change in generated code.


r~


* config/alpha/alpha.c (direct_return): Move down after
struct machine_function definition; use saved frame_size;
return bool.
(struct machine_function): Add sa_mask, sa_size, frame_size.
(alpha_sa_mask, alpha_sa_size, compute_frame_size): Merge into ...
(alpha_compute_frame_layout): ... new function.
(TARGET_COMPUTE_FRAME_LAYOUT): New.
(alpha_initial_elimination_offset): Use saved sa_size.
(alpha_vms_initial_elimination_offset): Likewise.
(alpha_vms_can_eliminate): Remove alpha_sa_size call.
(alpha_expand_prologue): Use saved frame data.  Merge integer
and fp register save loops.
(alpha_expand_epilogue): Likewise.
(alpha_start_function): Use saved frame data.
* config/alpha/alpha-protos.h (direct_return): Update.
(alpha_sa_size): Remove.
---
 gcc/config/alpha/alpha-protos.h |   3 +-
 gcc/config/alpha/alpha.c| 293 
 2 files changed, 109 insertions(+), 187 deletions(-)

diff --git a/gcc/config/alpha/alpha-protos.h b/gcc/config/alpha/alpha-protos.h
index d171f4eb414..099ce0e0c42 100644
--- a/gcc/config/alpha/alpha-protos.h
+++ b/gcc/config/alpha/alpha-protos.h
@@ -21,9 +21,8 @@ extern int alpha_next_sequence_number;
 
 extern void literal_section (void);
 extern int zap_mask (HOST_WIDE_INT);
-extern int direct_return (void);
+extern bool direct_return (void);
 
-extern int alpha_sa_size (void);
 extern HOST_WIDE_INT alpha_initial_elimination_offset (unsigned int,
   unsigned int);
 extern void alpha_expand_prologue (void);
diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 9adfe159381..218306d3a07 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -728,19 +728,6 @@ alpha_vector_mode_supported_p (machine_mode mode)
   return mode == V8QImode || mode == V4HImode || mode == V2SImode;
 }
 
-/* Return 1 if this function can directly return via $26.  */
-
-int
-direct_return (void)
-{
-  return (TARGET_ABI_OSF
- && reload_completed
- && alpha_sa_size () == 0
- && get_frame_size () == 0
- && crtl->outgoing_args_size == 0
- && crtl->args.pretend_args_size == 0);
-}
-
 /* Return the TLS model to use for SYMBOL.  */
 
 static enum tls_model
@@ -4837,6 +4824,10 @@ struct GTY(()) alpha_links;
 
 struct GTY(()) machine_function
 {
+  unsigned HOST_WIDE_INT sa_mask;
+  HOST_WIDE_INT sa_size;
+  HOST_WIDE_INT frame_size;
+
   /* For flag_reorder_blocks_and_partition.  */
   rtx gp_save_rtx;
 
@@ -7236,83 +7227,59 @@ static int vms_save_fp_regno;
 /* Register number used to reference objects off our PV.  */
 static int vms_base_regno;
 
-/* Compute register masks for saved registers.  */
-
+/* Compute register masks for saved registers, register save area size,
+   and total frame size.  */
 static void
-alpha_sa_mask (unsigned long *imaskP, unsigned long *fmaskP)
+alpha_compute_frame_layout (void)
 {
-  unsigned long imask = 0;
-  unsigned long fmask = 0;
-  unsigned int i;
+  unsigned HOST_WIDE_INT sa_mask = 0;
+  HOST_WIDE_INT frame_size;
+  int sa_size;
 
   /* When outputting a thunk, we don't have valid register life info,
  but assemble_start_function wants to output .frame and .mask
  directives.  */
-  if (cfun->is_thunk)
+  if (!cfun->is_thunk)
 {
-  *imaskP = 0;
-  *fmaskP = 0;
-  return;
-}
+  if (TARGET_ABI_OPEN_VMS && alpha_procedure_type == PT_STACK)
+   sa_mask |= HOST_WIDE_INT_1U << HARD_FRAME_POINTER_REGNUM;
 
-  if (TARGET_ABI_OPEN_VMS && alpha_procedure_type == PT_STACK)
-imask |= (1UL << HARD_FRAME_POINTER_REGNUM);
+  /* One for every register we have to save.  */
+  for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+   if (! fixed_regs[i] && ! call_used_regs[i]
+   && df_regs_ever_live_p (i) && i != REG_RA)
+ sa_mask |= HOST_WIDE_INT_1U << i;
 
-  /* One for every register we have to save.  */
-  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-if (! fixed_regs[i] && ! call_used_regs[i]
-   && df_regs_ever_live_p (i) && i != REG_RA)
-  {
-   if (i < 32)
- imask |= (1UL << i);
-   else
- fmask |= (1UL << (i - 32));
-  }
-
-  /* We need to restore these for the handler.  */
-  if (crtl->calls_eh_return)
-{
-  for (i = 0; ; ++i)
+  /* We need to restore these for the handler.  */
+  if (crtl->calls_eh_return)
{
- unsigned regno = EH_RETURN_DATA_REGNO (i);
- if (regno == INVALID_REGNUM)
-   break;
- imask |= 1UL << regno;
+ for (unsigned i 

[PATCH 5/7] AArch64 - disable CB[N]Z TB[N]Z when tracking speculation

2018-07-09 Thread Richard Earnshaw

The CB[N]Z and TB[N]Z instructions do not expose the comparison through
the condition code flags.  This makes it impossible to track speculative
execution through such a branch.  We can handle this relatively easily
by simply disabling the patterns in this case.

A side effect of this is that the split patterns for the atomic operations
need to also avoid generating these instructions.  They mostly have simple
fall-backs for this already.

* config/aarch64/aarch64.md (cb<optab><mode>1): Disable when
aarch64_track_speculation is true.
(tb<optab><mode>1): Likewise.
* config/aarch64/aarch64.c (aarch64_split_compare_regs): Do not
generate CB[N]Z when tracking speculation.
(aarch64_split_compare_and_swap): Likewise.
(aarch64_split_atomic_op): Likewise.
---
 gcc/config/aarch64/aarch64.c  | 33 ++---
 gcc/config/aarch64/aarch64.md |  6 +++---
 2 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 01f35f8..da96afd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14465,7 +14465,16 @@ aarch64_split_compare_and_swap (rtx operands[])
 
   if (strong_zero_p)
 {
-  x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
+  if (aarch64_track_speculation)
+	{
+	  /* Emit an explicit compare instruction, so that we can correctly
+	 track the condition codes.  */
+	  rtx cc_reg = aarch64_gen_compare_reg (NE, rval, const0_rtx);
+	  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+	}
+  else
+	x = gen_rtx_NE (VOIDmode, rval, const0_rtx);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -14483,7 +14492,16 @@ aarch64_split_compare_and_swap (rtx operands[])
 
   if (!is_weak)
 {
-  x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
+  if (aarch64_track_speculation)
+	{
+	  /* Emit an explicit compare instruction, so that we can correctly
+	 track the condition codes.  */
+	  rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
+	  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+	}
+  else
+	x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 gen_rtx_LABEL_REF (Pmode, label1), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
@@ -14819,7 +14837,16 @@ aarch64_split_atomic_op (enum rtx_code code, rtx old_out, rtx new_out, rtx mem,
   aarch64_emit_store_exclusive (mode, cond, mem,
 gen_lowpart (mode, new_out), model_rtx);
 
-  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+  if (aarch64_track_speculation)
+{
+  /* Emit an explicit compare instruction, so that we can correctly
+	 track the condition codes.  */
+  rtx cc_reg = aarch64_gen_compare_reg (NE, cond, const0_rtx);
+  x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+}
+  else
+x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+
   x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
 			gen_rtx_LABEL_REF (Pmode, label), pc_rtx);
   aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c135ada..259a07d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -690,7 +690,7 @@ (define_insn "*cb<optab><mode>1"
 (const_int 0))
 			   (label_ref (match_operand 1 "" ""))
 			   (pc)))]
-  ""
+  "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   return aarch64_gen_far_branch (operands, 1, "Lcb", "\\t%0, ");
@@ -720,7 +720,7 @@ (define_insn "*tb<optab><mode>1"
 	 (label_ref (match_operand 2 "" ""))
 	 (pc)))
(clobber (reg:CC CC_REGNUM))]
-  ""
+  "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   {
@@ -756,7 +756,7 @@ (define_insn "*cb<optab><mode>1"
 			   (label_ref (match_operand 1 "" ""))
 			   (pc)))
(clobber (reg:CC CC_REGNUM))]
-  ""
+  "!aarch64_track_speculation"
   {
 if (get_attr_length (insn) == 8)
   {


[PATCH 6/7] AArch64 - new pass to add conditional-branch speculation tracking

2018-07-09 Thread Richard Earnshaw

This patch is the main part of the speculation tracking code.  It adds
a new target-specific pass that is run just before the final branch
reorg pass (so that it can clean up any new edge insertions we make).
The pass is only run when -mtrack-speculation is passed on the command
line.

One thing that did come to light as part of this was that the stack pointer
register was not being permitted in comparison instructions.  We rely on
that for moving the tracking state between SP and the scratch register at
function call boundaries.

* config/aarch64/aarch64-speculation.cc: New file.
* config/aarch64/aarch64-passes.def (pass_track_speculation): Add before
pass_reorder_blocks.
* config/aarch64/aarch64-protos.h (make_pass_track_speculation): Add
prototype.
* config/aarch64/aarch64.c (aarch64_conditional_register_usage): Fix
X14 and X15 when tracking speculation.
* config/aarch64/aarch64.md (register name constants): Add
SPECULATION_TRACKER_REGNUM and SPECULATION_SCRATCH_REGNUM.
(unspec): Add UNSPEC_SPECULATION_TRACKER.
(speculation_barrier): New insn attribute.
(cmp): Allow SP in comparisons.
(speculation_tracker): New insn.
(speculation_barrier): Add speculation_barrier attribute.
* config/aarch64/t-aarch64: Add make rule for aarch64-speculation.o.
* config.gcc (aarch64*-*-*): Add aarch64-speculation.o to extra_objs.
* doc/invoke.texi (AArch64 Options): Document -mtrack-speculation.
---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |   3 +-
 gcc/config/aarch64/aarch64-speculation.cc | 494 ++
 gcc/config/aarch64/aarch64.c  |  13 +
 gcc/config/aarch64/aarch64.md |  30 +-
 gcc/config/aarch64/t-aarch64  |  10 +
 gcc/doc/invoke.texi   |  10 +-
 8 files changed, 558 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/aarch64/aarch64-speculation.cc

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 78e84c2..b17fdba 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -304,7 +304,7 @@ aarch64*-*-*)
 	extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
-	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+	extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o aarch64-speculation.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 87747b4..3d6a254 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,3 +19,4 @@
.  */
 
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
+INSERT_PASS_BEFORE (pass_reorder_blocks, 1, pass_track_speculation);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index bc11a78..e80ffcf 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -554,7 +554,8 @@ enum aarch64_parse_opt_result aarch64_parse_extension (const char *,
 std::string aarch64_get_extension_string_for_isa_flags (unsigned long,
 			unsigned long);
 
-rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
+rtl_opt_pass *make_pass_fma_steering (gcc::context *);
+rtl_opt_pass *make_pass_track_speculation (gcc::context *);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
diff --git a/gcc/config/aarch64/aarch64-speculation.cc b/gcc/config/aarch64/aarch64-speculation.cc
new file mode 100644
index 0000000..2dd06ae
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-speculation.cc
@@ -0,0 +1,494 @@
+/* Speculation tracking and mitigation (e.g. CVE 2017-5753) for AArch64.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "target.h"
+#include "rtl.h"
+#include "tree-pass.h"
+#include "profile-count.h"
+#include "cfg.h"
+#include "cfgbuild.h"
+#include "print-rtl.h"
+#include "cfgrtl.h"
+#include "function.h"
+#include 

[PATCH 7/7] AArch64 - use CSDB based sequences if speculation tracking is enabled

2018-07-09 Thread Richard Earnshaw

In this final patch, now that we can track speculation through conditional
branches, we can use this information to use a less expensive CSDB based
speculation barrier.

* config/aarch64/iterators.md (ALLI_TI): New iterator.
* config/aarch64/aarch64.md (despeculate_copy<mode>): New
expand.
(despeculate_copy<mode>_insn): New insn.
(despeculate_copyti_insn): New insn.
(despeculate_simple<mode>): New insn.
(despeculate_simpleti): New insn.
* config/aarch64/aarch64.c (aarch64_speculation_safe_value): New
function.
(TARGET_SPECULATION_SAFE_VALUE): Redefine to
aarch64_speculation_safe_value.
---
 gcc/config/aarch64/aarch64.c| 42 ++
 gcc/config/aarch64/aarch64.md   | 96 +
 gcc/config/aarch64/iterators.md |  3 ++
 3 files changed, 141 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b11d768..b30b857 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17648,6 +17648,45 @@ aarch64_select_early_remat_modes (sbitmap modes)
 }
 }
 
+/* Override the default target speculation_safe_value.  */
+static rtx
+aarch64_speculation_safe_value (machine_mode mode,
+rtx result, rtx val, rtx failval)
+{
+  /* Maybe we should warn if falling back to hard barriers.  They are
+ likely to be noticeably more expensive than the alternative below.  */
+  if (!aarch64_track_speculation)
+return default_speculation_safe_value (mode, result, val, failval);
+
+  if (!REG_P (val))
+val = copy_to_mode_reg (mode, val);
+
+  if (!aarch64_reg_or_zero (failval, mode))
+failval = copy_to_mode_reg (mode, failval);
+
+  switch (mode)
+{
+case E_QImode:
+  emit_insn (gen_despeculate_copyqi (result, val, failval));
+  break;
+case E_HImode:
+  emit_insn (gen_despeculate_copyhi (result, val, failval));
+  break;
+case E_SImode:
+  emit_insn (gen_despeculate_copysi (result, val, failval));
+  break;
+case E_DImode:
+  emit_insn (gen_despeculate_copydi (result, val, failval));
+  break;
+case E_TImode:
+  emit_insn (gen_despeculate_copyti (result, val, failval));
+  break;
+default:
+  gcc_unreachable ();
+}
+  return result;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -18117,6 +18156,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SELECT_EARLY_REMAT_MODES
 #define TARGET_SELECT_EARLY_REMAT_MODES aarch64_select_early_remat_modes
 
+#undef TARGET_SPECULATION_SAFE_VALUE
+#define TARGET_SPECULATION_SAFE_VALUE aarch64_speculation_safe_value
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 528d03d..cbcada2 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6129,6 +6129,102 @@ (define_insn "speculation_barrier"
(set_attr "speculation_barrier" "true")]
 )
 
+;; Support for __builtin_speculation_safe_value when we have speculation
+;; tracking enabled.  Use the speculation tracker to decide whether to
+;; copy operand 1 to the target, or to copy the fail value (operand 2).
+(define_expand "despeculate_copy<mode>"
+  [(set (match_operand:ALLI_TI 0 "register_operand" "=r")
+	(unspec_volatile:ALLI_TI
+	 [(match_operand:ALLI_TI 1 "register_operand" "r")
+	  (match_operand:ALLI_TI 2 "aarch64_reg_or_zero" "rZ")
+	  (use (reg:DI SPECULATION_TRACKER_REGNUM))
+	  (clobber (reg:CC CC_REGNUM))] UNSPECV_SPECULATION_BARRIER))]
+  ""
+  "
+  {
+if (operands[2] == const0_rtx)
+  {
+	rtx tracker;
+	if (<MODE>mode == TImode)
+	  tracker = gen_rtx_REG (DImode, SPECULATION_TRACKER_REGNUM);
+	else
+	  tracker = gen_rtx_REG (<MODE>mode, SPECULATION_TRACKER_REGNUM);
+
+	emit_insn (gen_despeculate_simple<mode> (operands[0], operands[1],
+		 tracker));
+	DONE;
+  }
+  }
+  "
+)
+
+;; Pattern to match despeculate_copy
+(define_insn "*despeculate_copy<mode>_insn"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+	(unspec_volatile:ALLI
+	 [(match_operand:ALLI 1 "register_operand" "r")
+	  (match_operand:ALLI 2 "aarch64_reg_or_zero" "rZ")
+	  (use (reg:DI SPECULATION_TRACKER_REGNUM))
+	  (clobber (reg:CC CC_REGNUM))] UNSPECV_SPECULATION_BARRIER))]
+  ""
+  {
+operands[3] = gen_rtx_REG (DImode, SPECULATION_TRACKER_REGNUM);
+output_asm_insn ("cmp\\t%3, #0\;csel\\t%0, %1, %2, ne\;csdb",
+		 operands);
+return "";
+  }
+  [(set_attr "length" "12")
+   (set_attr "type" "block")
+   (set_attr "speculation_barrier" "true")]
+)
+
+;; Pattern to match despeculate_copyti
+(define_insn "*despeculate_copyti_insn"
+  [(set (match_operand:TI 0 "register_operand" "=r")
+	(unspec_volatile:TI
+	 [(match_operand:TI 1 "register_operand" "r")
+	  (match_operand:TI 2 "aarch64_reg_or_zero" "rZ")
+	  (use (reg:DI SPECULATION_TRACKER_REGNUM))
+	  (clobber (reg:CC CC_REGNUM))] 

[PATCH 1/7] Add __builtin_speculation_safe_value

2018-07-09 Thread Richard Earnshaw

This patch defines a new intrinsic function
__builtin_speculation_safe_value.  A generic default implementation is
defined which will attempt to use the backend pattern
"speculation_barrier".  If this pattern is not defined, or if it
is not available, then the compiler will emit a warning, but
compilation will continue.

Note that the test spec-barrier-1.c will currently fail on all
targets.  This is deliberate, the failure will go away when
appropriate action is taken for each target backend.

gcc:
* builtin-types.def (BT_FN_PTR_PTR_VAR): New function type.
(BT_FN_I1_I1_VAR, BT_FN_I2_I2_VAR, BT_FN_I4_I4_VAR): Likewise.
(BT_FN_I8_I8_VAR, BT_FN_I16_I16_VAR): Likewise.
* builtins.def (BUILT_IN_SPECULATION_SAFE_VALUE_N): New builtin.
(BUILT_IN_SPECULATION_SAFE_VALUE_PTR): New internal builtin.
(BUILT_IN_SPECULATION_SAFE_VALUE_1): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_2): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_4): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_8): Likewise.
(BUILT_IN_SPECULATION_SAFE_VALUE_16): Likewise.
* builtins.c (expand_speculation_safe_value): New function.
(expand_builtin): Call it.
* doc/cpp.texi: Document predefine __HAVE_SPECULATION_SAFE_VALUE.
* doc/extend.texi: Document __builtin_speculation_safe_value.
* doc/md.texi: Document "speculation_barrier" pattern.
* doc/tm.texi.in: Pull in TARGET_SPECULATION_SAFE_VALUE.
* doc/tm.texi: Regenerated.
* target.def (speculation_safe_value): New hook.
* targhooks.c (default_speculation_safe_value): New function.
* targhooks.h (default_speculation_safe_value): Add prototype.

c-family:
* c-common.c (speculation_safe_resolve_size): New function.
(speculation_safe_resolve_params): New function.
(speculation_safe_resolve_return): New function.
(resolve_overloaded_builtin): Handle __builtin_speculation_safe_value.
* c-cppbuiltin.c (c_cpp_builtins): Add pre-define for
__HAVE_SPECULATION_SAFE_VALUE.

testsuite:
* gcc.dg/spec-barrier-1.c: New test.
* gcc.dg/spec-barrier-2.c: New test.
* gcc.dg/spec-barrier-3.c: New test.
---
 gcc/builtin-types.def |   6 ++
 gcc/builtins.c|  57 ++
 gcc/builtins.def  |  20 +
 gcc/c-family/c-common.c   | 143 ++
 gcc/c-family/c-cppbuiltin.c   |   5 +-
 gcc/doc/cpp.texi  |   4 +
 gcc/doc/extend.texi   |  29 +++
 gcc/doc/md.texi   |  15 
 gcc/doc/tm.texi   |  20 +
 gcc/doc/tm.texi.in|   2 +
 gcc/target.def|  23 ++
 gcc/targhooks.c   |  27 +++
 gcc/targhooks.h   |   2 +
 gcc/testsuite/gcc.dg/spec-barrier-1.c |  40 ++
 gcc/testsuite/gcc.dg/spec-barrier-2.c |  19 +
 gcc/testsuite/gcc.dg/spec-barrier-3.c |  13 
 16 files changed, 424 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-1.c
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-2.c
 create mode 100644 gcc/testsuite/gcc.dg/spec-barrier-3.c

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index b01095c..70fae35 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -763,6 +763,12 @@ DEF_FUNCTION_TYPE_VAR_1 (BT_FN_VOID_LONG_VAR,
 			 BT_VOID, BT_LONG)
 DEF_FUNCTION_TYPE_VAR_1 (BT_FN_VOID_ULL_VAR,
 			 BT_VOID, BT_ULONGLONG)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_PTR_PTR_VAR, BT_PTR, BT_PTR)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I1_I1_VAR, BT_I1, BT_I1)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I2_I2_VAR, BT_I2, BT_I2)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I4_I4_VAR, BT_I4, BT_I4)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I8_I8_VAR, BT_I8, BT_I8)
+DEF_FUNCTION_TYPE_VAR_1 (BT_FN_I16_I16_VAR, BT_I16, BT_I16)
 
 DEF_FUNCTION_TYPE_VAR_2 (BT_FN_INT_FILEPTR_CONST_STRING_VAR,
 			 BT_INT, BT_FILEPTR, BT_CONST_STRING)
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 91658e8..9f97ecf 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6716,6 +6716,52 @@ expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
   return target;
 }
 
+/* Expand a call to __builtin_speculation_safe_value_<N>.  MODE
+   represents the size of the first argument to that call, or VOIDmode
+   if the argument is a pointer.  IGNORE will be true if the result
+   isn't used.  */
+static rtx
+expand_speculation_safe_value (machine_mode mode, tree exp, rtx target,
+			   bool ignore)
+{
+  rtx val, failsafe;
+  unsigned nargs = call_expr_nargs (exp);
+
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+
+  if (mode == VOIDmode)
+{
+  mode = TYPE_MODE (TREE_TYPE (arg0));
+  gcc_assert (GET_MODE_CLASS (mode) == MODE_INT);
+}
+
+  val = expand_expr (arg0, NULL_RTX, mode, EXPAND_NORMAL);
+
+  /* An optional second 

[PATCH 4/7] AArch64 - Add new option -mtrack-speculation

2018-07-09 Thread Richard Earnshaw

This patch doesn't do anything useful, it simply adds a new command-line
option -mtrack-speculation to AArch64.  Subsequent patches build on this.

* config/aarch64/aarch64.opt (mtrack-speculation): New target option.
---
 gcc/config/aarch64/aarch64.opt | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 1426b45..bc9b22a 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -214,3 +214,7 @@ Target RejectNegative Joined Enum(sve_vector_bits) Var(aarch64_sve_vector_bits)
 mverbose-cost-dump
 Common Undocumented Var(flag_aarch64_verbose_cost)
 Enables verbose cost model dumping in the debug dump files.
+
+mtrack-speculation
+Target Var(aarch64_track_speculation)
+Generate code to track when the CPU might be speculating incorrectly.


[PATCH 3/7] AArch64 - add speculation barrier

2018-07-09 Thread Richard Earnshaw

Similar to Arm, this adds an unconditional speculation barrier for AArch64.

* config/aarch64/aarch64.md (unspecv): Add UNSPECV_SPECULATION_BARRIER.
(speculation_barrier): New insn.
---
 gcc/config/aarch64/aarch64.md | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..c135ada 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -205,6 +205,7 @@ (define_c_enum "unspecv" [
 UNSPECV_SET_FPSR		; Represent assign of FPSR content.
 UNSPECV_BLOCKAGE		; Represent a blockage
 UNSPECV_PROBE_STACK_RANGE	; Represent stack range probing.
+UNSPECV_SPECULATION_BARRIER ; Represent speculation barrier.
   ]
 )
 
@@ -6093,6 +6094,15 @@ (define_expand "set_clobber_cc"
 		   (match_operand 1))
 	  (clobber (reg:CC CC_REGNUM))])])
 
+;; Hard speculation barrier.
+(define_insn "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] UNSPECV_SPECULATION_BARRIER)]
+  ""
+  "isb\;dsb\\tsy"
+  [(set_attr "length" "8")
+   (set_attr "type" "block")]
+)
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
 


[PATCH 2/7] Arm - add speculation_barrier pattern

2018-07-09 Thread Richard Earnshaw

This patch defines a speculation barrier for AArch32.

* config/arm/unspecs.md (unspecv): Add VUNSPEC_SPECULATION_BARRIER.
* config/arm/arm.md (speculation_barrier): New expand.
(speculation_barrier_insn): New pattern.
---
 gcc/config/arm/arm.md | 21 +
 gcc/config/arm/unspecs.md |  1 +
 2 files changed, 22 insertions(+)

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 361a026..ca2a2f5 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12012,6 +12012,27 @@ (define_insn ""
   [(set_attr "length" "4")
(set_attr "type" "coproc")])
 
+(define_expand "speculation_barrier"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
+  "TARGET_EITHER"
+  "
+/* Don't emit anything for Thumb1 and suppress the warning from the
+   generic expansion.  */
+if (!TARGET_32BIT)
+   DONE;
+  "
+)
+
+;; Generate a hard speculation barrier when we have not enabled speculation
+;; tracking.
+(define_insn "*speculation_barrier_insn"
+  [(unspec_volatile [(const_int 0)] VUNSPEC_SPECULATION_BARRIER)]
+  "TARGET_32BIT"
+  "isb\;dsb\\tsy"
+  [(set_attr "type" "block")
+   (set_attr "length" "8")]
+)
+
 ;; Vector bits common to IWMMXT and Neon
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b05f85e..1941673 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -168,6 +168,7 @@ (define_c_enum "unspecv" [
   VUNSPEC_MCRR2		; Represent the coprocessor mcrr2 instruction.
   VUNSPEC_MRRC		; Represent the coprocessor mrrc instruction.
   VUNSPEC_MRRC2		; Represent the coprocessor mrrc2 instruction.
+  VUNSPEC_SPECULATION_BARRIER ; Represents an unconditional speculation barrier.
 ])
 
 ;; Enumerators for NEON unspecs.


[PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-09 Thread Richard Earnshaw

The patches I posted earlier this year for mitigating against
CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
which it became obvious that a rethink was needed.  This mail and the
following patches attempt to address that feedback and present a new
approach to mitigating this form of attack.

There were two major issues with the original approach:

- The speculation bounds were too tightly constrained - essentially
  they had to represent an upper and lower bound on a pointer, or a
  pointer offset.
- The speculation constraints could only cover the immediately preceding
  branch, which often did not fit well with the structure of the existing
  code.

An additional criticism was that the shape of the intrinsic did not
fit particularly well with systems that use a single speculation
barrier that essentially has to wait until all preceding speculation
has been resolved.

To address all of the above, these patches adopt a new approach, based
in part on a posting by Chandler Carruth to the LLVM developers list
(https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
but which we have extended to deal with inter-function speculation.
The patches divide the problem into two halves.

The first half is some target-specific code to track the speculation
condition through the generated code to provide an internal variable
which can tell us whether or not the CPU's control flow speculation
matches the data flow calculations.  The idea is that the internal
variable starts out TRUE; if the CPU's control flow speculation ever
causes a jump to the wrong block of code, the variable becomes FALSE
until the incorrect speculation is unwound.

The second half is that a new intrinsic function is introduced that is
much simpler than we had before.  The basic version of the intrinsic
is now simply:

  T var = __builtin_speculation_safe_value (T unsafe_var);

Full details of the syntax can be found in the documentation patch, in
patch 1.  In summary, when not speculating the intrinsic returns
unsafe_var; when speculating, if it can be shown that the speculative
flow has diverged from the intended control flow, zero is returned
instead.  An optional second argument can be used to return an
alternative value to zero.  The builtin may cause execution to pause
until the speculation state is resolved.

There are seven patches in this set, as follows.

1) Introduces the new intrinsic __builtin_speculation_safe_value.
2) Adds a basic hard barrier implementation for AArch32 (arm) state.
3) Adds a basic hard barrier implementation for AArch64 state.
4) Adds a new command-line option -mtrack-speculation (currently a no-op).
5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
6) Adds the new speculation tracking pass for AArch64
7) Uses the new speculation tracking pass to generate CSDB-based barrier
   sequences

I haven't added a speculation-tracking pass for AArch32 at this time.
It is possible to do this, but would require quite a lot of rework for
the arm backend due to the limited number of registers that are
available.

Although patch 6 is AArch64 specific, I'd appreciate a review from
someone more familiar with the branch edge code than myself.  There
appear to be a number of tricky issues with more complex edges so I'd
like a second opinion on that code in case I've missed an important
case.

R.


Richard Earnshaw (7):
  Add __builtin_speculation_safe_value
  Arm - add speculation_barrier pattern
  AArch64 - add speculation barrier
  AArch64 - Add new option -mtrack-speculation
  AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
  AArch64 - new pass to add conditional-branch speculation tracking
  AArch64 - use CSDB based sequences if speculation tracking is enabled

 gcc/builtin-types.def |   6 +
 gcc/builtins.c|  57 
 gcc/builtins.def  |  20 ++
 gcc/c-family/c-common.c   | 143 +
 gcc/c-family/c-cppbuiltin.c   |   5 +-
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |   3 +-
 gcc/config/aarch64/aarch64-speculation.cc | 494 ++
 gcc/config/aarch64/aarch64.c  |  88 +-
 gcc/config/aarch64/aarch64.md | 140 -
 gcc/config/aarch64/aarch64.opt|   4 +
 gcc/config/aarch64/iterators.md   |   3 +
 gcc/config/aarch64/t-aarch64  |  10 +
 gcc/config/arm/arm.md |  21 ++
 gcc/config/arm/unspecs.md |   1 +
 gcc/doc/cpp.texi  |   4 +
 gcc/doc/extend.texi   |  29 ++
 gcc/doc/invoke.texi   |  10 +-
 gcc/doc/md.texi   |  15 +
 gcc/doc/tm.texi   |  20 ++
 gcc/doc/tm.texi.in  

Re: [PATCH, GCC, AARCH64] Add support for +profile extension

2018-07-09 Thread Andrew Pinski
On Mon, Jul 9, 2018 at 6:21 AM Andre Vieira (lists)
 wrote:
>
> Hi,
>
> This patch adds support for the Statistical Profiling Extension (SPE) on
> AArch64. Even though the compiler will not generate code any differently
> given this extension, it will need to pass it on to the assembler in
> order to let it correctly assemble inline asm containing accesses to the
> extension's system registers.  The same applies when using the
> preprocessor on an assembly file as this first must pass through cc1.
>
> I left the hwcaps string for SPE empty as the kernel does not define a
> feature string for this extension.  The current effect of this is that
> the driver will disable the profile feature bit in GCC.  This is OK,
> though, because we don't, nor do we ever, enable this feature bit:
> codegen is not affected by the SPE support and, more importantly, the
> driver will still pass the extension down to the assembler regardless.
>
> Bootstrapped aarch64-none-linux-gnu and ran regression tests.
>
> Is it OK for trunk?

I use a similar patch for the last year and half.

Thanks,
Andrew

>
> gcc/ChangeLog:
> 2018-07-09  Andre Vieira  
>
> * config/aarch64/aarch64-option-extensions.def: New entry for profile
> extension.
> * config/aarch64/aarch64.h (AARCH64_FL_PROFILE): New.
> * doc/invoke.texi (aarch64-feature-modifiers): New entry for profile
> extension.
>
> gcc/testsuite/ChangeLog:
> 2018-07-09 Andre Vieira 
>
> * gcc.target/aarch64/profile.c: New test.


Re: calculate overflow type in wide int arithmetic

2018-07-09 Thread Aldy Hernandez




On 07/09/2018 07:16 AM, Rainer Orth wrote:


and several more.  This seems to be the only backend that uses the
additional bool * argument to wi::neg etc.

Fixed as follows, bootstrapped on sparc-sun-solaris2.11.


Thanks.  Sorry for the oversight.

Aldy


Re: [PATCH] add support for strnlen (PR 81384)

2018-07-09 Thread Aldy Hernandez
   /* { dg-do run }
      { do-options "-O2 -fno-tree-strlen" }  */

 I don't think this is doing anything.

If you look at the test run you can see that -fno-tree-strlen is never
passed (I think you actually mean -fno-optimize-strlen for that
matter).  Also, the builtins.exp harness runs your test for an
assortment of other flags, not just -O2.
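Presumably the intended directive was `dg-options` (note the `dg-`
prefix; `do-options` is silently ignored by the harness), with the flag
spelled `-fno-optimize-strlen`.  A header that would actually request
those flags might look like the sketch below, though note that
builtins.exp supplies its own matrix of options, so the test may need
to live outside that harness for the directive to stick:

```c
/* { dg-do run } */
/* { dg-options "-O2 -fno-optimize-strlen" } */
```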

This test is failing on my range branch for -Og, because
expand_builtin_strnlen() needs range info:

+  wide_int min, max;
+  enum value_range_type rng = get_range_info (bound, &min, &max);
+  if (rng != VR_RANGE)
+return NULL_RTX;

but interestingly enough, it seems to be calculated in the sprintf
pass as part of the DOM walk:

  /* First record ranges generated by this statement.  */
  evrp_range_analyzer.record_ranges_from_stmt (stmt, false);

It feels wrong that the sprintf warning pass is generating range info
that you may later depend on at rtl expansion time (and for a totally
unrelated thing-- strlen expansion).

I don't know if this is just a quirk of builtins.exp calling your test
with flags you didn't intend, but the inconsistency could cause
problems in the future.  Errr, or my present ;-).

Would it be too much to ask for you to either fix the flags being
passed down to the test, or better yet, find some non-sprintf
dependent way of calculating range info earlier?

Aldy
On Mon, Jun 18, 2018 at 6:35 PM Martin Sebor  wrote:
>
> On 06/12/2018 03:11 PM, Jeff Law wrote:
> > On 06/05/2018 03:43 PM, Martin Sebor wrote:
> >> The attached patch adds basic support for handling strnlen
> >> as a built-in function.  It touches the strlen pass where
> >> it folds constant results of the function, and builtins.c
> >> to add simple support for expanding strnlen calls with known
> >> results.  It also changes calls.c to detect excessive bounds
> >> to the function and unsafe calls with arguments declared
> >> attribute nonstring.
> >>
> >> A side-effect of the strlen change I should call out is that
> >> strlen() calls to all zero-length arrays that aren't considered
> >> flexible array members (i.e., internal members or non-members)
> >> are folded into zero.  No warning is issued for such invalid
> >> uses of zero-length arrays but based on the responses to my
> >> question Re: aliasing between internal zero-length-arrays and
> >> other members(*) it sounds like one would be appropriate.
> >> I will see about adding one in a separate patch.
> >>
> >> Martin
> >>
> >> [*] https://gcc.gnu.org/ml/gcc/2018-06/msg00046.html
> >>
> >> gcc-81384.diff
> >>
> >>
> >> PR tree-optimization/81384 - built-in form of strnlen missing
> >>
> >> gcc/ChangeLog:
> >>
> >>  PR tree-optimization/81384
> >>  * builtin-types.def (BT_FN_SIZE_CONST_STRING_SIZE): New.
> >>  * builtins.c (expand_builtin_strnlen): New function.
> >>  (expand_builtin): Call it.
> >>  (fold_builtin_n): Avoid setting TREE_NO_WARNING.
> >>  * builtins.def (BUILT_IN_STRNLEN): New.
> >>  * calls.c (maybe_warn_nonstring_arg): Handle BUILT_IN_STRNLEN.
> >>  Warn for bounds in excess of maximum object size.
> >>  * tree-ssa-strlen.c (maybe_set_strlen_range): Return tree representing
> >>  single-value ranges.  Handle strnlen.
> >>  (handle_builtin_strlen): Handle strnlen.
> >>  (strlen_check_and_optimize_stmt): Same.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>  PR tree-optimization/81384
> >>  * gcc.c-torture/execute/builtins/lib/strnlen.c: New test.
> >>  * gcc.c-torture/execute/builtins/strnlen-lib.c: New test.
> >>  * gcc.c-torture/execute/builtins/strnlen.c: New test.
> >>  * gcc.dg/attr-nonstring-2.c: New test.
> >>  * gcc.dg/attr-nonstring-3.c: New test.
> >>  * gcc.dg/attr-nonstring-4.c: New test.
> >>  * gcc.dg/strlenopt-44.c: New test.
> >>  * gcc.dg/strlenopt.h (strnlen):  Declare.
> >>
> >> diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
> >> index 5365bef..1f15350 100644
> >> --- a/gcc/builtin-types.def
> >> +++ b/gcc/builtin-types.def
> >> @@ -322,6 +322,8 @@ DEF_FUNCTION_TYPE_2 (BT_FN_STRING_CONST_STRING_INT,
> >>   BT_STRING, BT_CONST_STRING, BT_INT)
> >>  DEF_FUNCTION_TYPE_2 (BT_FN_STRING_CONST_STRING_SIZE,
> >>   BT_STRING, BT_CONST_STRING, BT_SIZE)
> >> +DEF_FUNCTION_TYPE_2 (BT_FN_SIZE_CONST_STRING_SIZE,
> >> + BT_SIZE, BT_CONST_STRING, BT_SIZE)
> >>  DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_STRING_FILEPTR,
> >>   BT_INT, BT_CONST_STRING, BT_FILEPTR)
> >>  DEF_FUNCTION_TYPE_2 (BT_FN_INT_INT_FILEPTR,
> > I believe Jakub already suggested these change and you ack'd that.
> >
> > You have some hunks which will need updating now that the CHKP/MPX bits
> > are gone.
> >
> > So OK after the cleanups noted above and a fresh bootstrap & regression
> > test cycle.
> >
>
> Done.  I also added documentation for the built-in, reran GCC
> and Glibc tests with no regressions, and committed 261705.
>
> Martin


Re: [PATCH] fold strings as long as the array they're stored in (PR 86428)

2018-07-09 Thread Richard Biener
On Sat, Jul 7, 2018 at 11:47 PM Martin Sebor  wrote:
>
> While working on other string folding improvements (PR 77357)
> I came across another distinct case where GCC could do better:
> it doesn't consider as candidates strings that have as many
> elements as the size of the array they are stored in, even if
> their length is within the bounds of the array.  For instance,
> only the first strlen() call below is folded even though
> the arguments to both are valid NUL-terminated strings.
>
> const char a[4] = "123";
> int f (void)
> {
>   return strlen (a);   // folded
> }
>
> const char b[4] = "123\0";
> int g (void)
> {
>   return strlen (b);   // not folded
> }
>
> The attached tiny patch adjusts the string_constant() function
> to recognize as valid string constants all those whose length (as
> returned by strlen()) is less than the size of the array they are
> stored in.
>
> Tested on x86_64-linux.
>
> Testing of an earlier version of the patch exposed what I think
> is a deficiency in the (very) early strlen folding: c_strlen()
> folds expressions of the form strlen(A + I) to sizeof(A) - I when
> A is an array of known size and I a non-const variable.  This not
> only prevents -Warray-bounds from warning on invalid indices but
> also defeats other, possibly more profitable optimizations based
> on the range of the result of the strlen() call.  The logs show
> that the code dates at least as far back as 1992.  With VRP and
> other data flow based optimizations I think we can do better
> today.  I opened bug 86434 to improve things.

I think that

+  unsigned HOST_WIDE_INT length = strlen (TREE_STRING_POINTER (init));

is dangerous and you should use strnlen (TREE_STRING_POINTER (init),
TREE_STRING_LENGTH (init))
instead.

OK with that change.

Richard.

> Martin
>


[PATCH, committed] Improve code generation for pdp11 target

2018-07-09 Thread Paul Koning
This patch improves the generated code for the pdp11 target.

Committed.

paul

ChangeLog:

2018-07-09  Paul Koning  

* config/pdp11/pdp11.c (pdp11_addr_cost): New function.
(pdp11_insn_cost): New function.
(pdp11_md_asm_adjust): New function.
(TARGET_INVALID_WITHIN_DOLOOP): Define.
(pdp11_rtx_costs): Update to match machine better.
(output_addr_const_pdp11): Correct format mismatch warnings.
* config/pdp11/pdp11.h (SLOW_BYTE_ACCESS): Correct definition.
* config/pdp11/pdp11.md: General change to add base_cost and/or
length attributes for use by new pdp11_insn_cost function.
(MIN_BRANCH): Correct definition.
(MIN_SOB): Ditto.
(doloop_end): Use standard pattern name for looping pattern.
(doloop_end_nocc): New.
(movsf): Add another constraint alternative.
(zero_extendqihi2): Add constraint alternatives for not in place
extend.
(zero_extendhisi2): Remove.
(shift patterns): Add CC handling variants.
(bswaphi2): New.
(bswapsi2): New.
(rothi3): New.
(define_peephole2): New peephole to recognize mov that sets CC for
subsequent test.

Index: config/pdp11/pdp11.c
===
--- config/pdp11/pdp11.c(revision 262518)
+++ config/pdp11/pdp11.c(working copy)
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "memmodel.h"
 #include "tm_p.h"
 #include "insn-config.h"
+#include "insn-attr.h"
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
@@ -150,6 +151,11 @@ decode_pdp11_d (const struct real_format *fmt ATTR
 static const char *singlemove_string (rtx *);
 static bool pdp11_assemble_integer (rtx, unsigned int, int);
 static bool pdp11_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static int pdp11_addr_cost (rtx, machine_mode, addr_space_t, bool);
+static int pdp11_insn_cost (rtx_insn *insn, bool speed);
+static rtx_insn *pdp11_md_asm_adjust (vec<rtx> &, vec<rtx> &,
+				      vec<const char *> &,
+				      vec<rtx> &, HARD_REG_SET &);
 static bool pdp11_return_in_memory (const_tree, const_tree);
 static rtx pdp11_function_value (const_tree, const_tree, bool);
 static rtx pdp11_libcall_value (machine_mode, const_rtx);
@@ -174,6 +180,8 @@ static bool pdp11_scalar_mode_supported_p (scalar_
 #undef TARGET_ASM_INTEGER
 #define TARGET_ASM_INTEGER pdp11_assemble_integer
 
+/* These two apply to Unix and GNU assembler; for DEC, they are
+   overridden during option processing.  */
 #undef TARGET_ASM_OPEN_PAREN
 #define TARGET_ASM_OPEN_PAREN "["
 #undef TARGET_ASM_CLOSE_PAREN
@@ -182,6 +190,15 @@ static bool pdp11_scalar_mode_supported_p (scalar_
 #undef TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS pdp11_rtx_costs
 
+#undef  TARGET_ADDRESS_COST
+#define TARGET_ADDRESS_COST pdp11_addr_cost
+
+#undef  TARGET_INSN_COST
+#define TARGET_INSN_COST pdp11_insn_cost
+
+#undef  TARGET_MD_ASM_ADJUST
+#define TARGET_MD_ASM_ADJUST pdp11_md_asm_adjust
+
 #undef TARGET_FUNCTION_ARG
 #define TARGET_FUNCTION_ARG pdp11_function_arg
 #undef TARGET_FUNCTION_ARG_ADVANCE
@@ -271,6 +288,9 @@ static bool pdp11_scalar_mode_supported_p (scalar_
 
 #undef  TARGET_CAN_CHANGE_MODE_CLASS
 #define TARGET_CAN_CHANGE_MODE_CLASS pdp11_can_change_mode_class
+
+#undef TARGET_INVALID_WITHIN_DOLOOP
+#define TARGET_INVALID_WITHIN_DOLOOP hook_constcharptr_const_rtx_insn_null
 
 /* A helper function to determine if REGNO should be saved in the
current function's stack frame.  */
@@ -968,12 +988,8 @@ pdp11_assemble_integer (rtx x, unsigned int size,
 }
 
 
-/* Register to register moves are cheap if both are general registers.
-   The same is true for FPU, but there we return cost of 3 rather than
-   2 to make reload look at the constraints.  The raeson is that
-   load/store double require extra care since load touches condition
-   codes and store doesn't, which is (partly anyway) described by
-   constraints.  */
+/* Register to register moves are cheap if both are general
+   registers.  */
 static int 
 pdp11_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
  reg_class_t c1, reg_class_t c2)
@@ -983,151 +999,270 @@ pdp11_register_move_cost (machine_mode mode ATTRIB
 return 2;
   else if ((c1 >= LOAD_FPU_REGS && c1 <= FPU_REGS && c2 == LOAD_FPU_REGS) ||
   (c2 >= LOAD_FPU_REGS && c2 <= FPU_REGS && c1 == LOAD_FPU_REGS))
-return 3;
+return 2;
   else
 return 22;
 }
 
-
+/* This tries to approximate what pdp11_insn_cost would do, but
+   without visibility into the actual instruction being generated it's
+   inevitably a rough approximation.  */
 static bool
-pdp11_rtx_costs (rtx x, machine_mode mode, int outer_code ATTRIBUTE_UNUSED,
-int opno ATTRIBUTE_UNUSED, int *total,
-bool speed ATTRIBUTE_UNUSED)
+pdp11_rtx_costs (rtx x, machine_mode mode, 

Re: [PATCH, GCC, AARCH64] Add support for +profile extension

2018-07-09 Thread Kyrill Tkachov

Hi Andre,

On 09/07/18 14:20, Andre Vieira (lists) wrote:

Hi,

This patch adds support for the Statistical Profiling Extension (SPE) on
AArch64. Even though the compiler will not generate code any differently
given this extension, it will need to pass it on to the assembler in
order to let it correctly assemble inline asm containing accesses to the
extension's system registers.  The same applies when using the
preprocessor on an assembly file as this first must pass through cc1.

I left the hwcaps string for SPE empty as the kernel does not define a
feature string for this extension.  The current effect of this is that
the driver will disable the profile feature bit in GCC.  This is OK,
though, because we don't, nor do we ever, enable this feature bit:
codegen is not affected by the SPE support and, more importantly, the
driver will still pass the extension down to the assembler regardless.

Bootstrapped aarch64-none-linux-gnu and ran regression tests.

Is it OK for trunk?



This looks sensible to me (though you'll need approval from a maintainer) with 
a small ChangeLog nit...


gcc/ChangeLog:
2018-07-09  Andre Vieira 

* config/aarch64/aarch64-option-extensions.def: New entry for profile
extension.
* config/aarch64/aarch64.h (AARCH64_FL_PROFILE): New.
* doc/invoke.texi (aarch64-feature-modifiers): New entry for profile
extension.

gcc/testsuite/ChangeLog:
2018-07-09 Andre Vieira 



... Make sure there's two spaces between the date, name and email.

Thanks,
Kyrill


* gcc.target/aarch64/profile.c: New test.




[PATCH, GCC, AARCH64] Add support for +profile extension

2018-07-09 Thread Andre Vieira (lists)
Hi,

This patch adds support for the Statistical Profiling Extension (SPE) on
AArch64. Even though the compiler will not generate code any differently
given this extension, it will need to pass it on to the assembler in
order to let it correctly assemble inline asm containing accesses to the
extension's system registers.  The same applies when using the
preprocessor on an assembly file as this first must pass through cc1.

I left the hwcaps string for SPE empty as the kernel does not define a
feature string for this extension.  The current effect of this is that
the driver will disable the profile feature bit in GCC.  This is OK,
though, because we don't, nor do we ever, enable this feature bit:
codegen is not affected by the SPE support and, more importantly, the
driver will still pass the extension down to the assembler regardless.

Bootstrapped aarch64-none-linux-gnu and ran regression tests.

Is it OK for trunk?

gcc/ChangeLog:
2018-07-09  Andre Vieira  

* config/aarch64/aarch64-option-extensions.def: New entry for profile
extension.
* config/aarch64/aarch64.h (AARCH64_FL_PROFILE): New.
* doc/invoke.texi (aarch64-feature-modifiers): New entry for profile
extension.

gcc/testsuite/ChangeLog:
2018-07-09 Andre Vieira 

* gcc.target/aarch64/profile.c: New test.
diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 5fe5e3f7dddf622a48a5b9458ef30449a886f395..69ab796a4e1a959b89ebb55b599919c442cfb088 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -105,4 +105,7 @@ AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, AARCH64_FL_FP | AARCH64_FL_F
Disabling "sve" just disables "sve".  */
 AARCH64_OPT_EXTENSION("sve", AARCH64_FL_SVE, AARCH64_FL_FP | AARCH64_FL_SIMD | AARCH64_FL_F16, 0, "sve")
 
+/* Enabling/Disabling "profile" does not enable/disable any other feature.  */
+AARCH64_OPT_EXTENSION("profile", AARCH64_FL_PROFILE, 0, 0, "")
+
 #undef AARCH64_OPT_EXTENSION
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f284e74bfb8c9bab2aa22cc6c5a67750cbbba3c2..c1218503bab19323eee1cca8b7e4bea8fbfcf573 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -158,6 +158,9 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_FL_SHA3	  (1 << 18)  /* Has ARMv8.4-a SHA3 and SHA512.  */
 #define AARCH64_FL_F16FML (1 << 19)  /* Has ARMv8.4-a FP16 extensions.  */
 
+/* Statistical Profiling extensions.  */
+#define AARCH64_FL_PROFILE(1 << 20)
+
 /* Has FP and SIMD.  */
 #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 56cd122b0d7b420e2b16ceb02907860879d3b9d7..4ca68a563297482afc75abed4a31c106af38caf7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14813,6 +14813,8 @@ instructions. Use of this option with architectures prior to Armv8.2-A is not su
 @item sm4
 Enable the sm3 and sm4 crypto extension.  This also enables Advanced SIMD instructions.
 Use of this option with architectures prior to Armv8.2-A is not supported.
+@item profile
+Enable the Statistical Profiling extension.
 
 @end table
 
diff --git a/gcc/testsuite/gcc.target/aarch64/profile.c b/gcc/testsuite/gcc.target/aarch64/profile.c
new file mode 100644
index ..db51b4746dd60009d784bc0b37ea99b2f120d856
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/profile.c
@@ -0,0 +1,9 @@
+/* { dg-do assemble } */
+/* { dg-options "-std=gnu99 -march=armv8.2-a+profile" } */
+
+int foo (void)
+{
+  int ret;
+  asm ("mrs  %0, pmblimitr_el1" : "=r" (ret));
+  return ret;
+}


Re: Extend tree code folds to IFN_COND_*

2018-07-09 Thread Richard Biener
On Wed, Jul 4, 2018 at 8:46 PM Richard Sandiford
 wrote:
>
> Finally getting back to this...
>
> Richard Biener  writes:
> > On Wed, Jun 6, 2018 at 10:16 PM Richard Sandiford
> >  wrote:
> >>
> >> > On Thu, May 24, 2018 at 11:36 AM Richard Sandiford
> >> >  wrote:
> >> >>
> >> >> This patch adds match.pd support for applying normal folds to their
> >> >> IFN_COND_* forms.  E.g. the rule:
> >> >>
> >> >>   (plus @0 (negate @1)) -> (minus @0 @1)
> >> >>
> >> >> also allows the fold:
> >> >>
> >> >>   (IFN_COND_ADD @0 @1 (negate @2) @3) -> (IFN_COND_SUB @0 @1 @2 @3)
> >> >>
> >> >> Actually doing this by direct matches in gimple-match.c would
> >> >> probably lead to combinatorial explosion, so instead, the patch
> >> >> makes gimple_match_op carry a condition under which the operation
> >> >> happens ("cond"), and the value to use when the condition is false
> >> >> ("else_value").  Thus in the example above we'd do the following
> >> >>
> >> >> (a) convert:
> >> >>
> >> >>   cond:NULL_TREE (IFN_COND_ADD @0 @1 @4 @3) else_value:NULL_TREE
> >> >>
> >> >> to:
> >> >>
> >> >>   cond:@0 (plus @1 @4) else_value:@3
> >> >>
> >> >> (b) apply gimple_resimplify to (plus @1 @4)
> >> >>
> >> >> (c) reintroduce cond and else_value when constructing the result.
> >> >>
> >> >> Nested operations inherit the condition of the outer operation
> >> >> (so that we don't introduce extra faults) but have a null else_value.
> >> >> If we try to build such an operation, the target gets to choose what
> >> >> else_value it can handle efficiently: obvious choices include one of
> >> >> the operands or a zero constant.  (The alternative would be to have some
> >> >> representation for an undefined value, but that seems a bit invasive,
> >> >> and isn't likely to be useful here.)
> >> >>
> >> >> I've made the condition a mandatory part of the gimple_match_op
> >> >> constructor so that it doesn't accidentally get dropped.
> >> >>
> >> >> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> >> >> and x86_64-linux-gnu.  OK to install?
> >> >
> >> > It looks somewhat clever but after looking for a while it doesn't handle
> >> > simplifying
> >> >
> >> >  (IFN_COND_ADD @0 @1 (IFN_COND_SUB @0 @2 @1 @3) @3)
> >> >
> >> > to
> >> >
> >> >  (cond @0 @2 @3)
> >> >
> >> > right?  Because while the conditional gimple_match_op is built
> >> > by try_conditional_simplification it isn't built when doing
> >> > SSA use->def following in the generated matching code?
> >>
> >> Right.  This would be easy to add, but there's no motivating case yet.
> >
> > ...
> >
> >> > So it looks like a bit much noise for this very special case?
> >> >
> >> > I suppose you ran into the need of these foldings from looking
> >> > at real code - which foldings specifically were appearing here?
> >> > Usually code is well optimized before if-conversion/vectorization
> >> > so we shouldn't need full-blown handling?
> >>
> >> It's needed to get the FMA, FMS, FNMA and FNMS folds for IFN_COND_* too.
> >> I thought it'd be better to do it "automatically" rather than add specific
> >> folds, since if we don't do it automatically now, it's going to end up
> >> being a precedent for not doing it automatically in future either.
> >
> > ... not like above isn't a similar precedent ;)  But OK, given...
>
> But we're not doing the above case manually either yet :-)  Whereas the
> series does need to do what the patch does one way or another.
>
> Also, it might be hard to do the above case manually anyway (i.e. match
> nested IFN_COND_* ops with an implicitly-conditional top-level op),
> since the match.pd rule wouldn't have easy access to the overall condition.
> And that's by design, or so I'd like to claim.
>
> >> > That said, I'm not sure how much work it is to massage
> >> >
> >> >   if (gimple *def_stmt = get_def (valueize, op2))
> >> > {
> >> >   if (gassign *def = dyn_cast  (def_stmt))
> >> > switch (gimple_assign_rhs_code (def))
> >> >   {
> >> >   case PLUS_EXPR:
> >> >
> >> > to look like
> >> >
> >> >   if (gimple *def_stmt = get_def (valueize, op2))
> >> > {
> >> >code = ERROR_MARK;
> >> >if (!is_cond_ifn_with_cond (curr_gimple_match_op, ))
> >> >  if (gassign *def dyn_cast  (def_stmt))
> >> >code = gimple_assign_rhs_code (def);
> >> >switch (code)
> >> >  {
> >> >  case PLUS_EXPR:
> >> >
> >> > thus transparently treat the IFN_COND_* as their "code" if the condition
> >> > matches that of the context (I'm not sure if we can do anything for
> >> > mismatching contexts).
> >>
> >> Yeah, this was one approach I had in mind for the subnodes, if we do
> >> end up needing it.  But at least for the top-level node, we want to try
> >> both as a native IFN_COND_FOO and as a conditional FOO, which is why the
> >> top-level case is handled directly in gimple-match-head.c.
> >>
> >> Of course, trying 

Re: [PATCH 2/2] optinfo: add diagnostic remarks

2018-07-09 Thread Richard Biener
On Mon, Jul 2, 2018 at 10:51 PM David Malcolm  wrote:
>
> This patch adds the first destination for optinfo instances to be
> emitted to: as "remarks" through the diagnostics subsystem.
>
> Examples can be seen at
>   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
>
> Remarks look a lot like the output of -fopt-info, with the following
> differences:
>
> * announcing "In function 'blah':" etc when the function changes.
>
> * printing the corresponding source lines (unless
>   "-fno-diagnostics-show-caret"), as per other diagnostics, and
>
> * colorizing the various parts of the message if stderr is at a tty.
>
> * showing extra metadata:
>   * the pass that emitted the remark,
>   * the execution count of the code in question
>   * which file/line/function of GCC emitted the remark
>
> * possibly allowing for remarks to be used in DejaGnu tests to better
>   associate testing of an optimization with the source line in
>   question, rather than relying on a scan of the dumpfile (which is
> >   per-source file and thus rather "coarse-grained").
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> (see the notes in the cover letter about the state of this patch;
> posting for motivation of the optinfo stuff).

As an overall comment I do like this as it is an improvement
over -fopt-info which is currently the user-facing way of
showing remarks about what code is optimized and what
not and why.

But for this very reason I don't like introducing a separate
set of options.  This means the remark stuff should
replace -fopt-info (and re-use that flag as far as possible).
This eventually means some -fXYZ might reduce the
verbosity level (source line / hotness, etc.) to the current
state - but I don't think we have made any guarantees
about the current format.  Given there'll be the JSON stuff
for machine-consumption I am also not worried about
existing machines parsing the stuff.

Anyway, I think the issue outlined in the comment to 1/n
needs to be addressed first.

Thanks for working on this!
Richard.

> gcc/ChangeLog:
> * Makefile.in (OBJS): Add optinfo-emit-diagnostics.o.
> * common.opt (fremarks): New option.
> (fdiagnostics-show-remark-hotness): New option.
> (fdiagnostics-show-remark-origin): New option.
> (fdiagnostics-show-remark-pass): New option.
> * diagnostic-color.c (color_dict): Add "remark", as bold green.
> * diagnostic-core.h (remark): New decl.
> * diagnostic.c (diagnostic_action_after_output): Handle DK_REMARK.
> (remark): New function.
> * diagnostic.def (DK_REMARK): New diagnostic kind.
> * doc/invoke.texi (Remarks): New section.
> (-fremarks): New option.
> (-fno-diagnostics-show-remark-hotness): New option.
> (-fno-diagnostics-show-remark-origin): New option.
> (-fno-diagnostics-show-remark-pass): New option.
> * dumpfile.c (dump_context::get_scope_depth): Update comment.
> (dump_context::end_any_optinfo): Likewise.
> * dumpfile.h: Update comment.
> * opt-functions.awk (function): Handle "Remark" by adding
> CL_REMARK.
> * optinfo-emit-diagnostics.cc: New file.
> * optinfo-emit-diagnostics.h: New file.
> * optinfo.cc: Include "optinfo-emit-diagnostics.h".
> (optinfo::emit): Call emit_optinfo_as_diagnostic_remark.
> (optinfo_enabled_p): Use flag_remarks.
> * optinfo.h: Update comment.
> * opts.c (print_specific_help): Handle CL_REMARK.
> (common_handle_option): Likewise.
> * opts.h (CL_REMARK): New macro.
> (CL_MAX_OPTION_CLASS): Update for CL_REMARK.
> (CL_JOINED, CL_SEPARATE, CL_UNDOCUMENTED, CL_NO_DWARF_RECORD)
> (CL_PCH_IGNORE): Likewise.
> * profile-count.c (profile_quality_as_string): New function.
> * profile-count.h (profile_quality_as_string): New decl.
> (profile_count::quality): New accessor.
> * selftest-run-tests.c (selftest::run_tests): Call
> selftest::optinfo_emit_diagnostics_cc_tests.
> * selftest.h (selftest::optinfo_emit_diagnostics_cc_tests): New
> decl.
>
> gcc/fortran/ChangeLog:
> * gfc-diagnostic.def (DK_REMARK): New diagnostic kind.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/plugin/plugin.exp (plugin_test_list): Add
> remarks_plugin.c.
> * gcc.dg/plugin/remarks-1.c: New test.
> * gcc.dg/plugin/remarks_plugin.c: New test plugin.
> * lib/gcc-dg.exp (dg-remark): New function.
> ---
>  gcc/Makefile.in  |   1 +
>  gcc/common.opt   |  17 ++
>  gcc/diagnostic-color.c   |   2 +
>  gcc/diagnostic-core.h|   2 +
>  gcc/diagnostic.c |  17 ++
>  gcc/diagnostic.def   |   1 +
>  gcc/doc/invoke.texi  |  67 ++
>  

Re: [PATCH 1/2] Add "optinfo" framework

2018-07-09 Thread Richard Biener
On Mon, Jul 2, 2018 at 10:51 PM David Malcolm  wrote:
>
> This patch implements a way to consolidate dump_* calls into
> optinfo objects, as enabling work towards being able to write out
> optimization records to a file, or emit them as diagnostic "remarks".
>
> The patch adds the support for building optinfo instances from dump_*
> calls, but leaves implementing any *users* of them to followup patches.
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

Looks good overall, but ...

To "fix" the GC issue you'd need to capture all possibly interesting
information from tree/gimple while it is still in flight.  This _may_ be
necessary anyway since I remember writing code like

  fprintf (dump_file, "old: ");
  print_gimple_stmt (..., stmt);
  gimple_set_rhs1 (stmt, op);
  fprintf (dump_file, "new: ");
  print_gimple_stmt (..., stmt);
  fprintf (dump_file, "\n");

Capturing interesting information means we know all targeted
optinfo channels, right?  And the optinfo consumers
need to handle "streams" of input and may not look back.

I've yet have to look at the 2nd patch but can you comment on
this?  How difficult is it to re-wire how the data flows to make
stmt re-use like the above possible?

Thanks,
Richard.

> gcc/ChangeLog:
> * Makefile.in (OBJS): Add optinfo.o.
> * coretypes.h (class symtab_node): New forward decl.
> (struct cgraph_node): New forward decl.
> (class varpool_node): New forward decl.
> * dump-context.h: New file.
> * dumpfile.c: Include "optinfo.h", "dump-context.h", "cgraph.h",
> "tree-pass.h", "optinfo-internal.h".
> (refresh_dumps_are_enabled): Use optinfo_enabled_p.
> (set_dump_file): Call dumpfile_ensure_any_optinfo_are_flushed.
> (set_alt_dump_file): Likewise.
> (dump_context::~dump_context): New dtor.
> (dump_gimple_stmt): Move implementation to...
> (dump_context::dump_gimple_stmt): ...this new member function.
> Add the stmt to any pending optinfo, creating one if need be.
> (dump_gimple_stmt_loc): Move implementation to...
> (dump_context::dump_gimple_stmt_loc): ...this new member function.
> Convert param "loc" from location_t to const dump_location_t &.
> Start a new optinfo and add the stmt to it.
> (dump_generic_expr): Move implementation to...
> (dump_context::dump_generic_expr): ...this new member function.
> Add the tree to any pending optinfo, creating one if need be.
> (dump_generic_expr_loc): Move implementation to...
> (dump_context::dump_generic_expr_loc): ...this new member
> function.  Add the tree to any pending optinfo, creating one if
> need be.
> (dump_printf): Move implementation to...
> (dump_context::dump_printf_va): ...this new member function.  Add
> the text to any pending optinfo, creating one if need be.
> (dump_printf_loc): Move implementation to...
> (dump_context::dump_printf_loc_va): ...this new member function.
> Convert param "loc" from location_t to const dump_location_t &.
> Start a new optinfo and add the stmt to it.
> (dump_dec): Move implementation to...
> (dump_context::dump_dec): ...this new member function.  Add the
> value to any pending optinfo, creating one if need be.
> (dump_context::dump_symtab_node): New member function.
> (dump_context::get_scope_depth): New member function.
> (dump_context::begin_scope): New member function.
> (dump_context::end_scope): New member function.
> (dump_context::ensure_pending_optinfo): New member function.
> (dump_context::begin_next_optinfo): New member function.
> (dump_context::end_any_optinfo): New member function.
> (dump_context::s_current): New global.
> (dump_context::s_default): New global.
> (dump_scope_depth): Delete global.
> (dumpfile_ensure_any_optinfo_are_flushed): New function.
> (dump_symtab_node): New function.
> (get_dump_scope_depth): Reimplement in terms of dump_context.
> (dump_begin_scope): Likewise.
> (dump_end_scope): Likewise.
> (selftest::temp_dump_context::temp_dump_context): New ctor.
> (selftest::temp_dump_context::~temp_dump_context): New dtor.
> (selftest::assert_is_text): New support function.
> (selftest::assert_is_tree): New support function.
> (selftest::assert_is_gimple): New support function.
> (selftest::test_capture_of_dump_calls): New test.
> (selftest::dumpfile_c_tests): Call it.
> * dumpfile.h (dump_printf, dump_printf_loc, dump_basic_block,
> dump_generic_expr_loc, dump_generic_expr, dump_gimple_stmt_loc,
> dump_gimple_stmt, dump_dec): Gather these related decls and add a
> descriptive comment.
> (dump_function, print_combine_total_stats, 

Re: [PATCH][debug] Reuse debug exprs generated in remap_ssa_name

2018-07-09 Thread Richard Biener
On Sun, Jul 8, 2018 at 11:27 AM Tom de Vries  wrote:
>
> On Sun, Jul 08, 2018 at 11:22:41AM +0200, Tom de Vries wrote:
> > On Fri, Jul 06, 2018 at 04:38:50PM +0200, Richard Biener wrote:
> > > On Fri, Jul 6, 2018 at 12:47 PM Tom de Vries  wrote:
> > > > On 07/05/2018 01:39 PM, Richard Biener wrote:
> >
> > 
> >
> > > I now also spotted the code in remap_ssa_name that is supposed to handle
> > > this it seems and for the testcase we only give up because the PARM_DECL
> > > is remapped to a VAR_DECL.  So I suppose it is to be handled via the
> > > debug-args stuff
> > > which probably lacks in the area of versioning.
> > >
> > > Your patch feels like it adds stuff ontop of existing mechanisms that
> > > should "just work"
> > > with the correct setup at the correct places...
> > >
> >
> > Hmm, I realized that I may be complicating things, by trying to do an
> > optimal fix in a single patch, so I decided to write two patches, one
> > with a fix, and then one improving the fix to be more optimal.
> >
> > Also, I suspect that the "just work" approach is this:
> > ...
> ># DEBUG D#8 s=> iD.1900
> ># DEBUG iD.1949 => D#8
> ># DEBUG D#6 s=> iD.1949
> > ...
> > whereas previously I tried to map 'D#6' on iD.1900 directly.
> >
>
> Second patch OK for trunk?

OK, though I wonder how it doesn't fail with that testcase with
the mismatching type where the removed param-decl is mapped
to a local var-decl.

Richard.

> Thanks,
> - Tom
>
> [debug] Reuse debug exprs generated in remap_ssa_name
>
> When compiling gcc.dg/vla-1.c with -O3 -g, vla a and b in f1 are optimized
> away, and f1 is cloned to a version f1.constprop with no parameters, 
> eliminating
> parameter i.  Debug info is generated to describe the sizes of a and b, but
> that process generates debug expressions that are not reused.
>
> Fix the duplication by saving and reusing the generated debug expressions in
> remap_ssa_name.  Concretely: reuse D#7 here instead of generating D#8:
> ...
>  __attribute__((noinline))
>  f1.constprop ()
>  {
>intD.6 iD.1935;
>
>
># DEBUG D#10 s=> iD.1897
># DEBUG iD.1935 => D#10
>
>
> -  # DEBUG D#8 s=> iD.1935
># DEBUG D#7 s=> iD.1935
>saved_stack.2_1 = __builtin_stack_save ();
># DEBUG BEGIN_STMT
># DEBUG D#6 => D#7 + 1
># DEBUG D#5 => (long intD.8) D#6
># DEBUG D#4 => D#5 + -1
># DEBUG D.1937 => (sizetype) D#4
># DEBUG a.0D.1942 => NULL
># DEBUG BEGIN_STMT
> -  # DEBUG D#3 => D#8 + 2
> +  # DEBUG D#3 => D#7 + 2
># DEBUG D#2 => (long intD.8) D#3
># DEBUG D#1 => D#2 + -1
># DEBUG D.1944 => (sizetype) D#1
># DEBUG b.1D.1949 => NULL
> ...
>
> Bootstrapped and reg-tested on x86_64.
>
> 2018-07-07  Tom de Vries  
>
> * tree-inline.c (remap_ssa_name): Save and reuse debug exprs generated
> in remap_ssa_name.
>
> * gcc.dg/vla-1.c: Update.
>
> ---
>  gcc/testsuite/gcc.dg/vla-1.c | 5 +++--
>  gcc/tree-inline.c| 4 ++++
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/vla-1.c b/gcc/testsuite/gcc.dg/vla-1.c
> index 0c19feffd2b..94db23d1336 100644
> --- a/gcc/testsuite/gcc.dg/vla-1.c
> +++ b/gcc/testsuite/gcc.dg/vla-1.c
> @@ -20,6 +20,7 @@ main ()
>return 0;
>  }
>
> -/* One debug source bind is generated for the parameter, and two to describe the
> +/* One debug source bind is generated for the parameter, and one to describe the
>     sizes of a and b.  */
> -/* { dg-final { scan-tree-dump-times " s=> i" 3 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times " s=> i" 2 "optimized" } } */
> +
> diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
> index 6fbd8c3ca61..164c7fff710 100644
> --- a/gcc/tree-inline.c
> +++ b/gcc/tree-inline.c
> @@ -215,12 +215,16 @@ remap_ssa_name (tree name, copy_body_data *id)
>   processing_debug_stmt = -1;
>   return name;
> }
> + n = id->decl_map->get (val);
> + if (n && TREE_CODE (*n) == DEBUG_EXPR_DECL)
> +   return *n;
>   def_temp = gimple_build_debug_source_bind (vexpr, val, NULL);
>   DECL_ARTIFICIAL (vexpr) = 1;
>   TREE_TYPE (vexpr) = TREE_TYPE (name);
>   SET_DECL_MODE (vexpr, DECL_MODE (SSA_NAME_VAR (name)));
>   gsi = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
>   gsi_insert_before (&gsi, def_temp, GSI_SAME_STMT);
> + insert_decl_map (id, val, vexpr);
>   return vexpr;
> }
>
>


Re: [PATCH] fold strlen() of aggregate members (PR 77357)

2018-07-09 Thread Richard Biener
On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor  wrote:
>
> On 07/06/2018 09:52 AM, Richard Biener wrote:
> > On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor  wrote:
> >>
> >> GCC folds accesses to members of constant aggregates except
> >> for character arrays/strings.  For example, the strlen() call
> >> below is not folded:
> >>
> >>const char a[][4] = { "1", "12" };
> >>
> >>int f (void) { return strlen (a[1]); }
> >>
> >> The attached change set enhances the string_constant() function
> >> to make it possible to extract string constants from aggregate
> >> initializers (CONSTRUCTORS).
> >>
> >> The initial solution was much simpler but as is often the case,
> >> MEM_REF made it fail to fold things like:
> >>
> >>int f (void) { return strlen (a[1] + 1); }
> >>
> >> Handling those made the project a bit more interesting and
> >> the final solution somewhat more involved.
> >>
> >> To handle offsets into aggregate string members the patch also
> >> extends the fold_ctor_reference() function to extract entire
> >> string array initializers even if the offset points past
> >> the beginning of the string and even though the size and
> >> exact type of the reference are not known (there isn't enough
> >> information in a MEM_REF to determine that).
> >>
> >> Tested along with the patch for PR 86415 on x86_64-linux.
> >
> > +  if (TREE_CODE (init) == CONSTRUCTOR)
> > +   {
> > + tree type;
> > + if (TREE_CODE (arg) == ARRAY_REF
> > + || TREE_CODE (arg) == MEM_REF)
> > +   type = TREE_TYPE (arg);
> > + else if (TREE_CODE (arg) == COMPONENT_REF)
> > +   {
> > + tree field = TREE_OPERAND (arg, 1);
> > + type = TREE_TYPE (field);
> > +   }
> > + else
> > +   return NULL_TREE;
> >
> > what's wrong with just
> >
> > type = TREE_TYPE (field);
>
> In response to your comment below abut size I simplified things
> further so determining the type a priori is no longer necessary.
>
> > ?
> >
> > + base_off *= BITS_PER_UNIT;
> >
> > poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int,
> > for poly you'd then use poly_offset?
>
> Okay, I tried to avoid the overflow.  (Converting between all
> these flavors of wide int types is a monumental PITA.)
>
> >
> > You extend fold_ctor_reference to treat size == 0 specially but then
> > bother to compute a size here - that looks unneeded?
>
> Yes, well spotted, thanks!  I simplified the code so this isn't
> necessary, and neither is the type.
>
> >
> > While the offset of the reference determines the first field in the
> > CONSTRUCTOR, how do you know the access doesn't touch
> > adjacent ones?  STRING_CSTs do not have to be '\0' terminated,
> > so consider
> >
> >   char x[2][4] = { "abcd", "abcd" };
> >
> > and MEM[&x] with a char[8] type?  memcpy "inlining" will create
> > such MEMs for example.
>
> The code is only used to find string constants in initializer
> expressions where I don't think the size of the access comes
> into play.  If a memcpy() call results in a MEM_REF[char[8],
> , 8] that's fine.  It's a valid reference and we can still
> get the underlying character sequence (which is represented
> as two STRING_CSTs with the two string literals).  I might
> be missing the point of your question.

Maybe irrelevant for strlen folding depending on what you do
for missing '\0' termination.

> >
> > @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor,
> >tree byte_offset = DECL_FIELD_OFFSET (cfield);
> >tree field_offset = DECL_FIELD_BIT_OFFSET (cfield);
> >tree field_size = DECL_SIZE (cfield);
> > -  offset_int bitoffset;
> > -  offset_int bitoffset_end, access_end;
> > +
> > +  if (!field_size && TREE_CODE (cval) == STRING_CST)
> > +   {
> > + /* Determine the size of the flexible array member from
> > +the size of the string initializer provided for it.  */
> > + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval);
> > + tree eltype = TREE_TYPE (TREE_TYPE (cval));
> > + len *= tree_to_uhwi (TYPE_SIZE (eltype));
> > + field_size = build_int_cst (size_type_node, len);
> > +   }
> >
> > Why does this only apply to STRING_CST initializers and not CONSTRUCTORS,
> > say, for
> >
> > struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } };
>
> I can't think of a use for it.  Do you have something in mind?

Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset
which is useful in other parts of the compiler.  So I don't see why
it shouldn't work for general flex-arrays.

> >
> > ?  And why not use simply
> >
> >   field_size = TYPE_SIZE (TREE_TYPE (cval));
> >
> > like you do in c_strlen?
>
> Yes, that's simpler, thanks.
>
> >
> > Otherwise looks reasonable.
>
> Attached is an updated patch.  I also enhanced the handling
> of non-constant indices.  They were already handled before
> to a smaller extent.  

Re: calculate overflow type in wide int arithmetic

2018-07-09 Thread Richard Biener
On July 9, 2018 1:16:59 PM GMT+02:00, Rainer Orth wrote:
>Hi Aldy,
>
>> On 07/05/2018 05:50 AM, Richard Biener wrote:
>>> On Thu, Jul 5, 2018 at 9:35 AM Aldy Hernandez wrote:

 The reason for this patch are the changes showcased in tree-vrp.c.
 Basically I'd like to discourage rolling our own overflow and underflow
 calculation when doing wide int arithmetic.  We should have a
 centralized place for this, that is-- in the wide int code itself ;-).

 The only cases I care about are plus/minus, which I have implemented,
 but we also get division for free, since AFAICT, division can only
 positive overflow:

  -MIN / -1 => +OVERFLOW

 Multiplication OTOH, can underflow, but I've not implemented it because
 we have no uses for it.  I have added a note in the code explaining this.

 Originally I tried to only change plus/minus, but that made code that
 dealt with plus/minus in addition to div or mult a lot uglier.  You'd
 have to special case "int overflow_for_add_stuff" and "bool
 overflow_for_everything_else".  Changing everything to int, makes things
 consistent.

 Note: I have left poly-int as is, with its concept of yes/no for
 overflow.  I can adapt this as well if desired.

 Tested on x86-64 Linux.

 OK for trunk?
>>>
>>> looks all straight-forward but the following:
>>>
>>> else if (op1)
>>>   {
>>> if (minus_p)
>>> -   {
>>> - wi = -wi::to_wide (op1);
>>> -
>>> - /* Check for overflow.  */
>>> - if (sgn == SIGNED
>>> - && wi::neg_p (wi::to_wide (op1))
>>> - && wi::neg_p (wi))
>>> -   ovf = 1;
>>> - else if (sgn == UNSIGNED && wi::to_wide (op1) != 0)
>>> -   ovf = -1;
>>> -   }
>>> +   wi = wi::neg (wi::to_wide (op1));
>>> else
>>>  wi = wi::to_wide (op1);
>>>
>>> you fail to handle - -INT_MIN.
>>
>> Woah, very good catch.  I previously had this implemented as wi::sub (0,
>> op1, &ovf) which was calculating overflow correctly but when I implemented
>> the overflow type in wi::neg I missed this.  Thanks.
>>
>>>
>>> Given the fact that for multiplication (or others, didn't look too close)
>>> you didn't implement the direction indicator I wonder if it would be more
>>> appropriate to do
>>>
>>> enum ovfl { OVFL_NONE = 0, OVFL_UNDERFLOW = -1, OVFL_OVERFLOW = 1,
>>> OVFL_UNKNOWN = 2 };
>>>
>>> and tell us the "truth" here?
>>
>> Excellent idea...though it came with lots of typing :).  Fixed.
>>
>> BTW, if I understand correctly, I've implemented the overflow types
>> correctly for everything but multiplication (which we have no users for and
>> I return OVF_UNKNOWN).  I have indicated this in comments.  Also, for
>> division I did nothing special, as we can only +OVERFLOW.
>>
>>>
>>> Hopefully if (overflow) will still work with that.
>>
>> It does.
>>
>>>
>>> Otherwise can you please add a toplevel comment to wide-int.h as to what the
>>> overflow result semantically is for a) SIGNED and b) UNSIGNED operations?
>>
>> Done.  Let me know if the current comment is what you had in mind.
>>
>> OK for trunk?
>
>this patch broke SPARC bootstrap:
>
>/vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c: In function
>'tree_node* sparc_fold_builtin(tree, int, tree_node**, bool)':
>/vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:11868:52: error:
>no matching function for call to 'neg(wi::tree_to_widest_ref, bool*)'
>tmp = wi::neg (wi::to_widest (e1), &neg1_ovf);
>^
>In file included from
>/vol/gcc/src/hg/trunk/local/gcc/coretypes.h:399:0,
>  from /vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:27:
>/vol/gcc/src/hg/trunk/local/gcc/wide-int.h:2127:1: note: candidate:
>template<class T> typename wi::binary_traits<T>::result_type
>wi::neg(const T&)
> wi::neg (const T &x)
> ^~
>/vol/gcc/src/hg/trunk/local/gcc/wide-int.h:2127:1: note:   template
>argument deduction/substitution failed:
>/vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:11868:52: note:  
>candidate expects 1 argument, 2 provided
>tmp = wi::neg (wi::to_widest (e1), &neg1_ovf);
>^
>In file included from
>/vol/gcc/src/hg/trunk/local/gcc/coretypes.h:399:0,
>  from /vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:27:
>/vol/gcc/src/hg/trunk/local/gcc/wide-int.h:2136:1: note: candidate:
>template<class T> typename wi::binary_traits<T>::result_type
>wi::neg(const T&, wi::overflow_type*)
> wi::neg (const T &x, overflow_type *overflow)
> ^~
>/vol/gcc/src/hg/trunk/local/gcc/wide-int.h:2136:1: note:   template
>argument deduction/substitution failed:
>/vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:11868:43: note:  
>cannot convert '& neg1_ovf' (type 'bool*') to type 'wi::overflow_type*'
>tmp = wi::neg (wi::to_widest (e1), &neg1_ovf);
>   ^
>In file included 

[PATCH][AAarch64] Add support for TARGET_COMPUTE_FRAME_LAYOUT

2018-07-09 Thread Vlad Lazar
Hi all,

The patch adds support for the TARGET_COMPUTE_FRAME_LAYOUT hook on AArch64
and removes unneeded frame layout recalculation.

Bootstrapped and regtested on aarch64-none-linux-gnu and there are no 
regressions.

Thanks,
Vlad

gcc/
2018-07-02  Vlad Lazar  

* config/aarch64/aarch64.h (TARGET_COMPUTE_FRAME_LAYOUT): Define.
* config/aarch64/aarch64.c (aarch64_expand_prologue, 
aarch64_expand_epilogue,
aarch64_initial_elimination_offset): Remove aarch64_layout_frame calls.

diff
Description: diff


Re: [PATCH] Fix __mmask* types on many AVX512 intrinsics

2018-07-09 Thread Jakub Jelinek
On Mon, Jul 09, 2018 at 12:00:46PM +0300, Grazvydas Ignotas wrote:
> On Mon, Jul 9, 2018 at 10:37 AM, Jakub Jelinek  wrote:
> > On Sun, Jul 08, 2018 at 02:39:40AM +0300, Grazvydas Ignotas wrote:
> >> > I guess we want to backport it soon, but would appreciate somebody 
> >> > testing
> >> > it on real AVX512-{BW,VL} hw before doing the backports.
> >>
> >> I've run the testsuite with this patch applied and all tests passed on
> >> i7-7800X.
> >
> > Thanks for the testing.
> >
> >> There are avx512vl-vmovdqa64-1.c and avx512vl-vpermilpdi-1.c
> >> failures, but those seem unrelated.
> >
> > These are dg-do compile tests, and they PASS for me, even when doing
> > make check-gcc RUNTESTFLAGS="--target_board=unix/-march=skylake-avx512 
> > i386.exp='avx512vl-vmovdqa64-1.c avx512vl-vpermilpdi-1.c'"
> > So, how exactly you've configured your gcc, what kind of options are
> > passed to the test and how they FAIL?
> 
> I should've mentioned I've tested this patch on top of 8.1 release
> tarball and used crosstool-NG to build the toolchain with it's "GCC
> test suite" option enabled. It looks like crosstool is applying some
> patches, so the results might not be valid. Here is the log (seems to
> contain the configuration info), where I just grepped for FAIL and the
> new test names to see if they were actually run:
> 
> http://notaz.gp2x.de/misc/unsorted/gcc.log.xz

Don't see any FAILs even in your log file on the above tests:
spawn -ignore SIGHUP gcc 
/home/notaz/x-tools/x86_64-unknown-linux-gnu/test-suite/gcc/testsuite/gcc.target/i386/avx512vl-vmovdqa64-1.c
 -fno-diagnostics-show-caret -fdiagnostics-color=never -mavx512vl -O2 
-ffat-lto-objects -S -o avx512vl-vmovdqa64-1.s
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c (test for excess errors)
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%ymm[0-9]+[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%ymm[0-9]+[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+\\([^\n]*%ymm[0-9]+(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+\\([^\n]*%xmm[0-9]+(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*\\)[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*\\)[^\n]*%xmm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*\\)[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*\\)[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%ymm[0-9]+[^\nxy]*\\(.{5,6}(?:\n|[ \\t]+#) 1
gcc.target/i386/avx512vl-vmovdqa64-1.c: vmovdqa64[ 
\\t]+[^{\n]*%xmm[0-9]+[^\nxy]*\\((?:\n|[ \\t]+#) found 0 times
XFAIL: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%xmm[0-9]+[^\nxy]*\\((?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%ymm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vmovdqa64-1.c scan-assembler-times vmovdqa64[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n]*\\){%k[1-7]}(?:\n|[ \\t]+#) 1

spawn -ignore SIGHUP gcc 
/home/notaz/x-tools/x86_64-unknown-linux-gnu/test-suite/gcc/testsuite/gcc.target/i386/avx512vl-vpermilpdi-1.c
 -fno-diagnostics-show-caret -fdiagnostics-color=never -mavx512vl -O2 
-ffat-lto-objects -S -o avx512vl-vpermilpdi-1.s
PASS: gcc.target/i386/avx512vl-vpermilpdi-1.c (test for excess errors)
gcc.target/i386/avx512vl-vpermilpdi-1.c: vpermilpd[ 
\\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) found 0 times
XFAIL: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ 
\\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
gcc.target/i386/avx512vl-vpermilpdi-1.c: vpermilpd[ 
\\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) found 0 times
XFAIL: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ 
\\t]+[^{\n]*13[^\n]*%ymm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ 
\\t]+[^{\n]*3[^\n]*%xmm[0-9]+{%k[1-7]}(?:\n|[ \\t]+#) 1
PASS: gcc.target/i386/avx512vl-vpermilpdi-1.c scan-assembler-times vpermilpd[ 
\\t]+[^{\n]*3[^\n]*%xmm[0-9]+{%k[1-7]}{z}(?:\n|[ \\t]+#) 1

XFAIL is expected fail, not unexpected...

Jakub


Re: calculate overflow type in wide int arithmetic

2018-07-09 Thread Rainer Orth
Hi Aldy,

> On 07/05/2018 05:50 AM, Richard Biener wrote:
>> On Thu, Jul 5, 2018 at 9:35 AM Aldy Hernandez  wrote:
>>>
>>> The reason for this patch are the changes showcased in tree-vrp.c.
>>> Basically I'd like to discourage rolling our own overflow and underflow
>>> calculation when doing wide int arithmetic.  We should have a
>>> centralized place for this, that is-- in the wide int code itself ;-).
>>>
>>> The only cases I care about are plus/minus, which I have implemented,
>>> but we also get division for free, since AFAICT, division can only
>>> positive overflow:
>>>
>>>  -MIN / -1 => +OVERFLOW
>>>
>>> Multiplication OTOH, can underflow, but I've not implemented it because
>>> we have no uses for it.  I have added a note in the code explaining this.
>>>
>>> Originally I tried to only change plus/minus, but that made code that
>>> dealt with plus/minus in addition to div or mult a lot uglier.  You'd
>>> have to special case "int overflow_for_add_stuff" and "bool
>>> overflow_for_everything_else".  Changing everything to int, makes things
>>> consistent.
>>>
>>> Note: I have left poly-int as is, with its concept of yes/no for
>>> overflow.  I can adapt this as well if desired.
>>>
>>> Tested on x86-64 Linux.
>>>
>>> OK for trunk?
>>
>> looks all straight-forward but the following:
>>
>> else if (op1)
>>   {
>> if (minus_p)
>> -   {
>> - wi = -wi::to_wide (op1);
>> -
>> - /* Check for overflow.  */
>> - if (sgn == SIGNED
>> - && wi::neg_p (wi::to_wide (op1))
>> - && wi::neg_p (wi))
>> -   ovf = 1;
>> - else if (sgn == UNSIGNED && wi::to_wide (op1) != 0)
>> -   ovf = -1;
>> -   }
>> +   wi = wi::neg (wi::to_wide (op1));
>> else
>>  wi = wi::to_wide (op1);
>>
>> you fail to handle - -INT_MIN.
>
> Woah, very good catch.  I previously had this implemented as wi::sub(0,
> op1, &ovf) which was calculating overflow correctly but when I implemented
> the overflow type in wi::neg I missed this.  Thanks.
>
>>
>> Given the fact that for multiplication (or others, didn't look too  close)
>> you didn't implement the direction indicator I wonder if it would be more
>> appropriate to do
>>
>> enum ovfl { OVFL_NONE = 0, OVFL_UNDERFLOW = -1, OVFL_OVERFLOW = 1,
>> OVFL_UNKNOWN = 2 };
>>
>> and tell us the "truth" here?
>
> Excellent idea...though it came with lots of typing :).  Fixed.
>
> BTW, if I understand correctly, I've implemented the overflow types
> correctly for everything but multiplication (which we have no users for and
> I return OVF_UNKNOWN).  I have indicated this in comments.  Also, for
> division I did nothing special, as we can only +OVERFLOW.
>
>>
>> Hopefully if (overflow) will still work with that.
>
> It does.
>
>>
>> Otherwise can you please add a toplevel comment to wide-int.h as to what the
>> overflow result semantically is for a) SIGNED and b) UNSIGNED operations?
>
> Done.  Let me know if the current comment is what you had in mind.
>
> OK for trunk?

this patch broke SPARC bootstrap:

/vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c: In function 'tree_node* 
sparc_fold_builtin(tree, int, tree_node**, bool)':
/vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:11868:52: error: no 
matching function for call to 'neg(wi::tree_to_widest_ref, bool*)'
tmp = wi::neg (wi::to_widest (e1), &neg1_ovf);
^
In file included from /vol/gcc/src/hg/trunk/local/gcc/coretypes.h:399:0,
 from /vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:27:
/vol/gcc/src/hg/trunk/local/gcc/wide-int.h:2127:1: note: candidate: 
template<typename T> typename wi::binary_traits<T, T>::result_type wi::neg(const T&)
 wi::neg (const T &x)
 ^~
/vol/gcc/src/hg/trunk/local/gcc/wide-int.h:2127:1: note:   template argument 
deduction/substitution failed:
/vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:11868:52: note:   
candidate expects 1 argument, 2 provided
tmp = wi::neg (wi::to_widest (e1), &neg1_ovf);
^
In file included from /vol/gcc/src/hg/trunk/local/gcc/coretypes.h:399:0,
 from /vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:27:
/vol/gcc/src/hg/trunk/local/gcc/wide-int.h:2136:1: note: candidate: 
template<typename T> typename wi::binary_traits<T, T>::result_type wi::neg(const T&, wi::overflow_type*)
 wi::neg (const T &x, overflow_type *overflow)
 ^~
/vol/gcc/src/hg/trunk/local/gcc/wide-int.h:2136:1: note:   template argument 
deduction/substitution failed:
/vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:11868:43: note:   cannot 
convert '& neg1_ovf' (type 'bool*') to type 'wi::overflow_type*'
tmp = wi::neg (wi::to_widest (e1), &neg1_ovf);
   ^
In file included from /vol/gcc/src/hg/trunk/local/gcc/coretypes.h:414:0,
 from /vol/gcc/src/hg/trunk/local/gcc/config/sparc/sparc.c:27:

[testsuite, committed] Use relative line numbers in gcc.dg/guality

2018-07-09 Thread Tom de Vries
Hi,

this patch uses relative line numbers in gcc.dg/guality where obvious:
either the relative line number is '.', '.-1' or '.+1', or adjacent to
another obvious case.

Committed as obvious.

Thanks,
- Tom

[testsuite] Use relative line numbers in gcc.dg/guality

2018-07-09  Tom de Vries  

* gcc.dg/guality/asm-1.c: Use relative line numbers where obvious.
* gcc.dg/guality/bswaptest.c: Same.
* gcc.dg/guality/clztest.c: Same.
* gcc.dg/guality/csttest.c: Same.
* gcc.dg/guality/ctztest.c: Same.
* gcc.dg/guality/drap.c: Same.
* gcc.dg/guality/nrv-1.c: Same.
* gcc.dg/guality/pr41353-1.c: Same.
* gcc.dg/guality/pr41353-2.c: Same.
* gcc.dg/guality/pr41404-1.c: Same.
* gcc.dg/guality/pr43051-1.c: Same.
* gcc.dg/guality/pr43077-1.c: Same.
* gcc.dg/guality/pr43177.c: Same.
* gcc.dg/guality/pr43329-1.c: Same.
* gcc.dg/guality/pr43479.c: Same.
* gcc.dg/guality/pr43593.c: Same.
* gcc.dg/guality/pr45003-1.c: Same.
* gcc.dg/guality/pr45003-2.c: Same.
* gcc.dg/guality/pr45003-3.c: Same.
* gcc.dg/guality/pr48437.c: Same.
* gcc.dg/guality/pr48466.c: Same.
* gcc.dg/guality/pr49888.c: Same.
* gcc.dg/guality/pr54200.c: Same.
* gcc.dg/guality/pr54519-1.c: Same.
* gcc.dg/guality/pr54519-2.c: Same.
* gcc.dg/guality/pr54519-3.c: Same.
* gcc.dg/guality/pr54519-4.c: Same.
* gcc.dg/guality/pr54519-5.c: Same.
* gcc.dg/guality/pr54519-6.c: Same.
* gcc.dg/guality/pr54551.c: Same.
* gcc.dg/guality/pr54693-2.c: Same.
* gcc.dg/guality/pr54693.c: Same.
* gcc.dg/guality/pr54796.c: Same.
* gcc.dg/guality/pr54970.c: Same.
* gcc.dg/guality/pr67192.c: Same.
* gcc.dg/guality/pr69947.c: Same.
* gcc.dg/guality/pr78726.c: Same.
* gcc.dg/guality/rotatetest.c: Same.
* gcc.dg/guality/sra-1.c: Same.
* gcc.dg/guality/vla-2.c: Same.

---
 gcc/testsuite/gcc.dg/guality/asm-1.c  |  2 +-
 gcc/testsuite/gcc.dg/guality/bswaptest.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/clztest.c|  4 +-
 gcc/testsuite/gcc.dg/guality/csttest.c| 72 +++
 gcc/testsuite/gcc.dg/guality/ctztest.c|  4 +-
 gcc/testsuite/gcc.dg/guality/drap.c   |  4 +-
 gcc/testsuite/gcc.dg/guality/nrv-1.c  |  2 +-
 gcc/testsuite/gcc.dg/guality/pr41353-1.c  | 30 ++---
 gcc/testsuite/gcc.dg/guality/pr41353-2.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr41404-1.c  |  6 +--
 gcc/testsuite/gcc.dg/guality/pr43051-1.c  | 12 +++---
 gcc/testsuite/gcc.dg/guality/pr43077-1.c  | 20 -
 gcc/testsuite/gcc.dg/guality/pr43177.c|  8 ++--
 gcc/testsuite/gcc.dg/guality/pr43329-1.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr43479.c| 10 ++---
 gcc/testsuite/gcc.dg/guality/pr43593.c|  2 +-
 gcc/testsuite/gcc.dg/guality/pr45003-1.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr45003-2.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr45003-3.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr48437.c|  2 +-
 gcc/testsuite/gcc.dg/guality/pr48466.c|  8 ++--
 gcc/testsuite/gcc.dg/guality/pr49888.c|  4 +-
 gcc/testsuite/gcc.dg/guality/pr54200.c|  2 +-
 gcc/testsuite/gcc.dg/guality/pr54519-1.c  | 12 +++---
 gcc/testsuite/gcc.dg/guality/pr54519-2.c  |  6 +--
 gcc/testsuite/gcc.dg/guality/pr54519-3.c  | 12 +++---
 gcc/testsuite/gcc.dg/guality/pr54519-4.c  |  6 +--
 gcc/testsuite/gcc.dg/guality/pr54519-5.c  |  6 +--
 gcc/testsuite/gcc.dg/guality/pr54519-6.c  |  4 +-
 gcc/testsuite/gcc.dg/guality/pr54551.c|  4 +-
 gcc/testsuite/gcc.dg/guality/pr54693-2.c  |  8 ++--
 gcc/testsuite/gcc.dg/guality/pr54693.c|  2 +-
 gcc/testsuite/gcc.dg/guality/pr54796.c|  6 +--
 gcc/testsuite/gcc.dg/guality/pr54970.c| 68 ++---
 gcc/testsuite/gcc.dg/guality/pr67192.c| 10 ++---
 gcc/testsuite/gcc.dg/guality/pr69947.c|  4 +-
 gcc/testsuite/gcc.dg/guality/pr78726.c|  6 +--
 gcc/testsuite/gcc.dg/guality/rotatetest.c | 12 +++---
 gcc/testsuite/gcc.dg/guality/sra-1.c  | 12 +++---
 gcc/testsuite/gcc.dg/guality/vla-2.c  |  4 +-
 40 files changed, 199 insertions(+), 199 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/guality/asm-1.c 
b/gcc/testsuite/gcc.dg/guality/asm-1.c
index e9cf4a167aa..089badf812a 100644
--- a/gcc/testsuite/gcc.dg/guality/asm-1.c
+++ b/gcc/testsuite/gcc.dg/guality/asm-1.c
@@ -10,7 +10,7 @@ foo (struct A *p, char *q)
 {
   int f = >z[p->y] - q;
   asm volatile (NOP);
-  asm volatile (NOP : : "g" (f));  /* { dg-final { gdb-test 14 "f" "14" } } */
+  asm volatile (NOP : : "g" (f));  /* { dg-final { gdb-test .+1 "f" "14" } } */
   asm volatile ("" : : "g" (p), "g" (q));
 }
 
diff --git a/gcc/testsuite/gcc.dg/guality/bswaptest.c 
b/gcc/testsuite/gcc.dg/guality/bswaptest.c
index c924f6e2fc8..8d70cb3c5c8 100644
--- a/gcc/testsuite/gcc.dg/guality/bswaptest.c
+++ 

Re: [PATCH] Don't fold nextafter/nexttoward if -ftrapping-math or -fmath-errno if they produce denormal results (PR c/86420)

2018-07-09 Thread Richard Biener
On Sat, 7 Jul 2018, Marc Glisse wrote:

> On Sat, 7 Jul 2018, Jakub Jelinek wrote:
> 
> > 2018-07-07  Jakub Jelinek  
> > 
> > PR c/86420
> > * real.c (real_nextafter): Return true if result is denormal.
> 
> I have a question on the side: would it be hard / useful, in cases where
> nextafter may set errno or some exception flag, to fold the result to a
> constant while keeping the function call (ignoring the value it returns)? To
> clarify, I mean replace
> 
> _2 = nextafter(DBL_DENORM_MIN, 0);
> 
> with
> 
> nextafter(DBL_DENORM_MIN, 0);
> _2 = 0;
> 
> I think we already do that for some other calls, although I can't remember
> where. The point would be that we have the value of _2 and can keep folding
> its uses.

There's tree-call-dce.c which is doing a related (more complex) transform
but nothing doing constant propagation through calls.  I think it would
be more useful for the target C library to expose sth like
set_errno (...) so we can constant fold the errno setting as well.
Maybe there's some cheap non-DCEable math function we could abuse...

Richard.


Re: [PATCH] Don't fold nextafter/nexttoward if -ftrapping-math or -fmath-errno if they produce denormal results (PR c/86420)

2018-07-09 Thread Richard Biener
On Sat, 7 Jul 2018, Jakub Jelinek wrote:

> Hi!
> 
> So, apparently I've misread when exceptions are raised by
> nextafter/nexttoward (and errno set).
> if(((ix>=0x7ff00000)&&((ix-0x7ff00000)|lx)!=0) ||   /* x is nan */
>((iy>=0x7ff00000)&&((iy-0x7ff00000)|ly)!=0)) /* y is nan */
>return x+y;
> I believe the above only raises exception if either operand is sNaN and thus
> we should handle it:
>   if (REAL_VALUE_ISSIGNALING_NAN (*arg0)
>   || REAL_VALUE_ISSIGNALING_NAN (*arg1))
> return false;
> Then:
> if(x==y) return y;  /* x=y, return y */
> If arguments are equal, no exception is raised even if it is denormal,
> we handle this with:
>   /* If x == y, return y cast to target type.  */
>   if (cmp == 0)
> {
>   real_convert (r, fmt, y);
>   return false;
> }
> Next:
> if((ix|lx)==0) {/* x == 0 */
> double u;
> INSERT_WORDS(x,hy&0x80000000,1);/* return +-minsubnormal */
> u = math_opt_barrier (x);
> u = u*u;
> math_force_eval (u);/* raise underflow flag */
> return x;
> }
> going from zero to +/- ulp should only raise underflow, but not set errno,
> handled with:
>   /* Similarly for nextafter (0, 1) raising underflow.  */
>   else if (flag_trapping_math
>&& arg0->cl == rvc_zero
>&& result->cl != rvc_zero)
> return false;
> Then overflow:
> hy = hx&0x7ff00000;
> if(hy>=0x7ff00000) {
>   double u = x+x;   /* overflow  */
>   math_force_eval (u);
>   __set_errno (ERANGE);
> }
> should be handled with:
>   if (REAL_EXP (r) > fmt->emax)
> {
>   get_inf (r, x->sign);
>   return true;
> }
> And finally there is:
> if(hy<0x00100000) {
> double u = x*x; /* underflow */
> math_force_eval (u);/* raise underflow flag */
>   __set_errno (ERANGE);
> }
> which I've misread as raising exception and setting errno only if
> the result is 0 and first arg isn't:
>   return r->cl == rvc_zero;
> but actually it is true also for all denormal returns except for the x == y
> case, so the following patch uses:
>   return r->cl == rvc_zero || REAL_EXP (r) < fmt->emin;
> instead.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2018-07-07  Jakub Jelinek  
> 
>   PR c/86420
>   * real.c (real_nextafter): Return true if result is denormal.
> 
>   * gcc.dg/nextafter-1.c (TEST): Adjust the tests that expect denormals
>   to be returned and when first argument is not 0, so that they don't do
>   anything for NEED_EXC or NEED_ERRNO.
> 
> --- gcc/real.c.jj 2018-05-06 23:12:49.211619736 +0200
> +++ gcc/real.c2018-07-06 18:42:44.761026632 +0200
> @@ -5141,7 +5141,7 @@ real_nextafter (REAL_VALUE_TYPE *r, form
>get_zero (r, x->sign);
>return true;
>  }
> -  return r->cl == rvc_zero;
> +  return r->cl == rvc_zero || REAL_EXP (r) < fmt->emin;
>  }
>  
>  /* Write into BUF the maximum representable finite floating-point
> --- gcc/testsuite/gcc.dg/nextafter-1.c.jj 2018-05-10 09:38:03.040250709 
> +0200
> +++ gcc/testsuite/gcc.dg/nextafter-1.c2018-07-06 19:17:55.138355524 
> +0200
> @@ -58,23 +58,41 @@ name (void)   
>  \
>  = (NEED_EXC || NEED_ERRNO) ? __builtin_inf##l1 ()
>  \
>: fn (MAX1, __builtin_inf ());  \
>CHECK (__builtin_isinf##l1 (m) && !__builtin_signbit (m));  \
> -  const type n = fn (DENORM_MIN1, 12.0##L2);  \
> +  const type n   
>  \
> += (NEED_EXC || NEED_ERRNO) ? 2.0##L1 * DENORM_MIN1   
>  \
> +  : fn (DENORM_MIN1, 12.0##L2);   \
>CHECK (n == 2.0##L1 * DENORM_MIN1);
>  \
> -  const type o = fn (n, 24.0##L2);\
> +  const type o   
>  \
> += (NEED_EXC || NEED_ERRNO) ? 3.0##L1 * DENORM_MIN1   
>  \
> +  : fn (n, 24.0##L2); \
>CHECK (o == 3.0##L1 * DENORM_MIN1);
>  \
> -  const type p = fn (o, 132.0##L2);   \
> +  const type p   
>  \
> += (NEED_EXC || NEED_ERRNO) ? 4.0##L1 * DENORM_MIN1   
>  \
> +  : fn (o, 132.0##L2);\
>CHECK 

Re: [PATCH] Remove "note: " prefix from some scan-tree-dump directives

2018-07-09 Thread Christophe Lyon
On Tue, 3 Jul 2018 at 16:11, Richard Biener  wrote:
>
> On Tue, Jul 3, 2018 at 4:10 PM David Malcolm  wrote:
> >
> > On Tue, 2018-07-03 at 15:53 +0200, Richard Biener wrote:
> > > On Tue, Jul 3, 2018 at 3:52 PM David Malcolm 
> > > wrote:
> > > >
> > > > On Tue, 2018-07-03 at 09:37 +0200, Richard Biener wrote:
> > > > > On Mon, Jul 2, 2018 at 7:00 PM David Malcolm  > > > > >
> > > > > wrote:
> > > > > >
> > > > > > On Mon, 2018-07-02 at 14:23 +0200, Christophe Lyon wrote:
> > > > > > > On Fri, 29 Jun 2018 at 10:09, Richard Biener  > > > > > > r@gm
> > > > > > > ail.
> > > > > > > com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jun 26, 2018 at 5:43 PM David Malcolm  > > > > > > > hat.
> > > > > > > > com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > This patch adds a concept of nested "scopes" to
> > > > > > > > > dumpfile.c's
> > > > > > > > > dump_*_loc
> > > > > > > > > calls, and wires it up to the DUMP_VECT_SCOPE macro in
> > > > > > > > > tree-
> > > > > > > > > vectorizer.h,
> > > > > > > > > so that the nested structure is shown in -fopt-info by
> > > > > > > > > indentation.
> > > > > > > > >
> > > > > > > > > For example, this converts -fopt-info-all e.g. from:
> > > > > > > > >
> > > > > > > > > test.c:8:3: note: === analyzing loop ===
> > > > > > > > > test.c:8:3: note: === analyze_loop_nest ===
> > > > > > > > > test.c:8:3: note: === vect_analyze_loop_form ===
> > > > > > > > > test.c:8:3: note: === get_loop_niters ===
> > > > > > > > > test.c:8:3: note: symbolic number of iterations is
> > > > > > > > > (unsigned
> > > > > > > > > int)
> > > > > > > > > n_9(D)
> > > > > > > > > test.c:8:3: note: not vectorized: loop contains function
> > > > > > > > > calls or
> > > > > > > > > data references that cannot be analyzed
> > > > > > > > > test.c:8:3: note: vectorized 0 loops in function
> > > > > > > > >
> > > > > > > > > to:
> > > > > > > > >
> > > > > > > > > test.c:8:3: note: === analyzing loop ===
> > > > > > > > > test.c:8:3: note:  === analyze_loop_nest ===
> > > > > > > > > test.c:8:3: note:   === vect_analyze_loop_form ===
> > > > > > > > > test.c:8:3: note:=== get_loop_niters ===
> > > > > > > > > test.c:8:3: note:   symbolic number of iterations is
> > > > > > > > > (unsigned
> > > > > > > > > int) n_9(D)
> > > > > > > > > test.c:8:3: note:   not vectorized: loop contains
> > > > > > > > > function
> > > > > > > > > calls
> > > > > > > > > or data references that cannot be analyzed
> > > > > > > > > test.c:8:3: note: vectorized 0 loops in function
> > > > > > > > >
> > > > > > > > > showing that the "symbolic number of iterations" message
> > > > > > > > > is
> > > > > > > > > within
> > > > > > > > > the "=== analyze_loop_nest ===" (and not within the
> > > > > > > > > "=== vect_analyze_loop_form ===").
> > > > > > > > >
> > > > > > > > > This is also enabling work for followups involving
> > > > > > > > > optimization
> > > > > > > > > records
> > > > > > > > > (allowing the records to directly capture the nested
> > > > > > > > > structure of
> > > > > > > > > the
> > > > > > > > > dump messages).
> > > > > > > > >
> > > > > > > > > Successfully bootstrapped & regrtested on x86_64-pc-
> > > > > > > > > linux-
> > > > > > > > > gnu.
> > > > > > > > >
> > > > > > > > > OK for trunk?
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I've noticed that this patch (r262246) caused regressions on
> > > > > > > aarch64:
> > > > > > > gcc.dg/vect/slp-perm-1.c -flto -ffat-lto-objects  scan-
> > > > > > > tree-
> > > > > > > dump
> > > > > > > vect "note: Built SLP cancelled: can use load/store-lanes"
> > > > > > > gcc.dg/vect/slp-perm-1.c scan-tree-dump vect "note: Built
> > > > > > > SLP
> > > > > > > cancelled: can use load/store-lanes"
> > > > > > > gcc.dg/vect/slp-perm-2.c -flto -ffat-lto-objects  scan-
> > > > > > > tree-
> > > > > > > dump
> > > > > > > vect "note: Built SLP cancelled: can use load/store-lanes"
> > > > > > > gcc.dg/vect/slp-perm-2.c scan-tree-dump vect "note: Built
> > > > > > > SLP
> > > > > > > cancelled: can use load/store-lanes"
> > > > > > > gcc.dg/vect/slp-perm-3.c -flto -ffat-lto-objects  scan-
> > > > > > > tree-
> > > > > > > dump
> > > > > > > vect "note: Built SLP cancelled: can use load/store-lanes"
> > > > > > > gcc.dg/vect/slp-perm-3.c scan-tree-dump vect "note: Built
> > > > > > > SLP
> > > > > > > cancelled: can use load/store-lanes"
> > > > > > > gcc.dg/vect/slp-perm-5.c -flto -ffat-lto-objects  scan-
> > > > > > > tree-
> > > > > > > dump
> > > > > > > vect "note: Built SLP cancelled: can use load/store-lanes"
> > > > > > > gcc.dg/vect/slp-perm-5.c scan-tree-dump vect "note: Built
> > > > > > > SLP
> > > > > > > cancelled: can use load/store-lanes"
> > > > > > > gcc.dg/vect/slp-perm-6.c -flto -ffat-lto-objects  scan-
> > > > > > > tree-
> > > > > > > dump
> > > > > > > vect "note: Built SLP cancelled: can use load/store-lanes"
> > > > > > > gcc.dg/vect/slp-perm-6.c scan-tree-dump 

Re: [PATCH] Fix __mmask* types on many AVX512 intrinsics

2018-07-09 Thread Grazvydas Ignotas
On Mon, Jul 9, 2018 at 10:37 AM, Jakub Jelinek  wrote:
> On Sun, Jul 08, 2018 at 02:39:40AM +0300, Grazvydas Ignotas wrote:
>> > I guess we want to backport it soon, but would appreciate somebody testing
>> > it on real AVX512-{BW,VL} hw before doing the backports.
>>
>> I've run the testsuite with this patch applied and all tests passed on
>> i7-7800X.
>
> Thanks for the testing.
>
>> There are avx512vl-vmovdqa64-1.c and avx512vl-vpermilpdi-1.c
>> failures, but those seem unrelated.
>
> These are dg-do compile tests, and they PASS for me, even when doing
> make check-gcc RUNTESTFLAGS="--target_board=unix/-march=skylake-avx512 
> i386.exp='avx512vl-vmovdqa64-1.c avx512vl-vpermilpdi-1.c'"
> So, how exactly you've configured your gcc, what kind of options are
> passed to the test and how they FAIL?

I should've mentioned I've tested this patch on top of 8.1 release
tarball and used crosstool-NG to build the toolchain with its "GCC
test suite" option enabled. It looks like crosstool is applying some
patches, so the results might not be valid. Here is the log (seems to
contain the configuration info), where I just grepped for FAIL and the
new test names to see if they were actually run:

http://notaz.gp2x.de/misc/unsorted/gcc.log.xz

Gražvydas


Re: [patch] jump threading multiple paths that start from the same BB

2018-07-09 Thread Aldy Hernandez




On 07/09/2018 04:29 AM, Richard Biener wrote:

On Mon, Jul 9, 2018 at 9:19 AM Aldy Hernandez  wrote:




On 07/05/2018 03:27 PM, Jeff Law wrote:

On 07/04/2018 02:12 AM, Aldy Hernandez wrote:



Many tests should turn into gimple IL tests.


Yeah, though for tests like the threading ones, they're already
sufficiently convoluted that turning them into gimple IL tests will make
them even harder to read.  Oh well, I guess?

To make it easier to transition to gimple IL tests, I suppose we could
add -fdump-tree-*-something-more-suitable-for-gimplefe ;-).


There is -fdump-tree-*-gimple ;)


Ahh!!!  Thanks.




Is it by
design that the gimple fe is sufficiently different from what we dump
for -fdump-tree (ahem, things like phi arguments, probabilities, etc)?


Well.  The only reason is that I didn't want to adjust all testcases so
I introduced the -gimple dump modifier...

Note there's still some "magic" fiddling needed to make the GIMPLE FE
grok -gimple dump files as well as working around limitations with the
current way of having SSA/CFG testcases.


I'll keep it in mind as I deal with threading going forward.





And I'd like a better framework for testing what we're doing to the IL.


Sigh.  Me too.


Well, the GIMPLE FE is supposed to be that - it's just not perfect
(or rather incomplete).


Hey, it's pretty good IMO :).

Thanks.
Aldy


Re: [patch] jump threading multiple paths that start from the same BB

2018-07-09 Thread Richard Biener
On Mon, Jul 9, 2018 at 9:19 AM Aldy Hernandez  wrote:
>
>
>
> On 07/05/2018 03:27 PM, Jeff Law wrote:
> > On 07/04/2018 02:12 AM, Aldy Hernandez wrote:
> >>
> >>
> >> On 07/03/2018 08:16 PM, Jeff Law wrote:
> >>> On 07/03/2018 03:31 AM, Aldy Hernandez wrote:
>  On 07/02/2018 07:08 AM, Christophe Lyon wrote:
> 
> >>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
>  While poking around in the backwards threader I noticed that we bail 
>  if
> 
> 
>  we have already seen a starting BB.
> 
>    /* Do not jump-thread twice from the same block.  */
>    if (bitmap_bit_p (threaded_blocks, entry->src->index)
> 
>  This limitation discards paths that are sub-paths of paths that have
>  already been threaded.
> 
>  The following patch scans the remaining to-be-threaded paths to 
>  identify
> 
> 
>  if any of them start from the same point, and are thus sub-paths of 
>  the
> 
> 
>  just-threaded path.  By removing the common prefix of blocks in 
>  upcoming
> 
> 
>  threadable paths, and then rewiring first non-common block
>  appropriately, we expose new threading opportunities, since we are no
> 
>  longer starting from the same BB.  We also simplify the would-be
>  threaded paths, because we don't duplicate already duplicated paths.
>  [snip]
> > Hi,
> >
> > I've noticed a regression on aarch64:
> > FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
> > threaded: 3"
> > very likely caused by this patch (appeared between 262282 and 262294)
> >
> > Christophe
> 
>  The test needs to be adjusted here.
> 
>  The long story is that the aarch64 IL is different at thread3 time in
>  that it has 2 profitable sub-paths that can now be threaded with my
>  patch.  This is causing the threaded count to be 5 for aarch64, versus 3
>  for x86 64.  Previously we couldn't thread these in aarch64, so the
>  backwards threader would bail.
> 
>  One can see the different threading opportunities by sticking
>  debug_all_paths() at the top of thread_through_all_blocks().  You will
>  notice that aarch64 has far more candidates to begin with.  The IL on
>  the x86 backend, has no paths that start on the same BB.  The aarch64,
>  on the other hand, has many to choose from:
> 
>  path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
>  path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>  path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
>  path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>  path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>  path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
>  path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>  path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
>  path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,
> 
>  Some of these prove unprofitable, but 2 more than before are profitable 
>  now.
> 
> 
> 
>  BTW, I see another threading related failure on aarch64 which is
>  unrelated to my patch, and was previously there:
> 
>  FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
>  threaded"
> 
>  This is probably another IL incompatibility between architectures.
> 
>  Anyways... the attached path fixes the regression.  I have added a note
>  to the test explaining the IL differences.  We really should rewrite all
>  the threading tests (I am NOT volunteering ;-)).
> 
>  OK for trunk?
>  Aldy
> 
>  curr.patch
> 
> 
>  gcc/testsuite/
> 
>   * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
>   has a slightly different IL that provides more threading
>   opportunities.
> >>> OK.
> >>>
> >>> WRT rewriting the tests.  I'd certainly agree that we don't have the
> >>> right set of knobs to allow us to characterize the target nor do we have
> >>> the right dumping/scanning facilities to describe and query the CFG
> >>> changes.
> >>>
> >>> The fact that the IL changes so much across targets is a sign that
> >>> target dependency (probably BRANCH_COST) is twiddling the gimple we
> >>> generate.  I strongly suspect we'd be a lot better off if we tackled the
> >>> BRANCH_COST problem first.
> >>
> >> Huh.  I've always accepted differing IL between architectures as a
> >> necessary evil for things like auto-vectorization and the like.
> > Yes.  We've made a conscious decision that introducing target
> > dependencies for the autovectorizer makes sense.  BRANCH_COST on the
> > other hand is a different beast :-)
> >
> >>
> >> What's the 

Re: [PATCH 1/3][POPCOUNT] Handle COND_EXPR in expression_expensive_p

2018-07-09 Thread Richard Biener
On Mon, Jul 9, 2018 at 9:05 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> Thanks for the review.
>
> On 6 July 2018 at 20:17, Richard Biener  wrote:
> > On Fri, Jul 6, 2018 at 11:45 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >>
> >> > It was rewrite_to_non_trapping_overflow available  in tree.h.  Thus
> >> > final value replacement
> >> > could use that before gimplifying instead of using 
> >> > rewrite_to_defined_overflow
> >> Thanks.
> >>
> >> Is the attached patch OK? I am testing this on x86_64-linux-gnu and if
> >> there is no new regressions.
> >
> > Please clean up the control flow to
> >
> >   if (...)
> > def = rewrite_to_non_trapping_overflow (def);
> >   def = force_gimple_operand_gsi (, def, false, NULL_TREE,
> > true, GSI_SAME_STMT);
>
> I also had to add flag_trapv like we do in other places (for
> flag_wrapv) when calling rewrite_to_non_trapping_overflow. Attached
> patch bootstraps and there is no new regressions in x86-64-linux-gnu.
> Is this OK?

Aww, no.  Sorry for misleading you - final value replacement - in addition
to the trapping issue - needs to make sure to not introduce undefined
overflow as well.  So the rewrite_to_non_trapping_overflow should avoid
the gimplification issue you ran into (and thus the extra predicate in
expression_expensive) but you can't remove the rewrite_to_defined_overflow
code.

So, can you rework things again, doing the rewrtite_to_non_trapping_overflow
but keep the rewrite_to_defined_overflow code as well?  The you should
be able to remove the generic_xpr_could_trap_p checks (and TREE_SIDE_EFFECTS).

Thanks,
Richard.

> Thanks,
> Kugan
> >
> > OK with that change.
> > Richard.
> >
> >> Thanks,
> >> Kugan
> >>
> >> gcc/ChangeLog:
> >>
> >> 2018-07-06  Kugan Vivekanandarajah  
> >>
> >> * tree-scalar-evolution.c (final_value_replacement_loop): Use
> >> rewrite_to_non_trapping_overflow instead of 
> >> rewrite_to_defined_overflow.


[PATCH][OBVIOUS] Add missing Optimization attribute.

2018-07-09 Thread Martin Liška
Hi.

I'm putting back what I accidentally removed.

Martin

gcc/ChangeLog:

2018-07-09  Martin Liska  

* common.opt: Add back wrongly removed attribute.
---
 gcc/common.opt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/common.opt b/gcc/common.opt
index 963c37f04cd..c29abdb5cb1 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -987,7 +987,7 @@ Common Report Var(flag_align_loops) Optimization
 Align the start of loops.
 
 falign-loops=
-Common RejectNegative Joined Var(str_align_loops)
+Common RejectNegative Joined Var(str_align_loops) Optimization
 
 fargument-alias
 Common Ignore



Re: [C++ Patch] Use rich_location::add_range in three more places

2018-07-09 Thread Jason Merrill
Ok.

On Mon, Jul 9, 2018, 6:42 AM Paolo Carlini  wrote:

> Hi,
>
> noticed three additional error messages where an additional range seems
> appropriate. Tested x86_64-linux.
>
> Thanks, Paolo.
>
> //
>
>


Re: [PATCH] Fix __mmask* types on many AVX512 intrinsics

2018-07-09 Thread Jakub Jelinek
On Sun, Jul 08, 2018 at 02:39:40AM +0300, Grazvydas Ignotas wrote:
> > I guess we want to backport it soon, but would appreciate somebody testing
> > it on real AVX512-{BW,VL} hw before doing the backports.
> 
> I've run the testsuite with this patch applied and all tests passed on
> i7-7800X.

Thanks for the testing.

> There are avx512vl-vmovdqa64-1.c and avx512vl-vpermilpdi-1.c
> failures, but those seem unrelated.

These are dg-do compile tests, and they PASS for me, even when doing
make check-gcc RUNTESTFLAGS="--target_board=unix/-march=skylake-avx512 
i386.exp='avx512vl-vmovdqa64-1.c avx512vl-vpermilpdi-1.c'"
So, how exactly you've configured your gcc, what kind of options are
passed to the test and how they FAIL?

Jakub


Re: [patch] jump threading multiple paths that start from the same BB

2018-07-09 Thread Aldy Hernandez




On 07/05/2018 03:27 PM, Jeff Law wrote:

On 07/04/2018 02:12 AM, Aldy Hernandez wrote:



On 07/03/2018 08:16 PM, Jeff Law wrote:

On 07/03/2018 03:31 AM, Aldy Hernandez wrote:

On 07/02/2018 07:08 AM, Christophe Lyon wrote:


On 11/07/2017 10:33 AM, Aldy Hernandez wrote:

While poking around in the backwards threader I noticed that we bail if


we have already seen a starting BB.

  /* Do not jump-thread twice from the same block.  */
  if (bitmap_bit_p (threaded_blocks, entry->src->index)

This limitation discards paths that are sub-paths of paths that have
already been threaded.

The following patch scans the remaining to-be-threaded paths to identify


if any of them start from the same point, and are thus sub-paths of the


just-threaded path.  By removing the common prefix of blocks in upcoming


threadable paths, and then rewiring first non-common block
appropriately, we expose new threading opportunities, since we are no

longer starting from the same BB.  We also simplify the would-be
threaded paths, because we don't duplicate already duplicated paths.

[snip]

Hi,

I've noticed a regression on aarch64:
FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
threaded: 3"
very likely caused by this patch (appeared between 262282 and 262294)

Christophe


The test needs to be adjusted here.

The long story is that the aarch64 IL is different at thread3 time in
that it has 2 profitable sub-paths that can now be threaded with my
patch.  This is causing the threaded count to be 5 for aarch64, versus 3
for x86 64.  Previously we couldn't thread these in aarch64, so the
backwards threader would bail.

One can see the different threading opportunities by sticking
debug_all_paths() at the top of thread_through_all_blocks().  You will
notice that aarch64 has far more candidates to begin with.  The IL on
the x86 backend, has no paths that start on the same BB.  The aarch64,
on the other hand, has many to choose from:

path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16,
path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 11, 11 -> 35,
path: 52 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
path: 51 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 17,
path: 53 -> 56, 56 -> 57, 57 -> 58, 58 -> 10, 10 -> 16, 16 -> 19,

Some of these prove unprofitable, but 2 more than before are profitable now.



BTW, I see another threading related failure on aarch64 which is
unrelated to my patch, and was previously there:

FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump-not vrp2 "Jumps
threaded"

This is probably another IL incompatibility between architectures.

Anyways... the attached path fixes the regression.  I have added a note
to the test explaining the IL differences.  We really should rewrite all
the threading tests (I am NOT volunteering ;-)).

OK for trunk?
Aldy

curr.patch


gcc/testsuite/

 * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust test because aarch64
 has a slightly different IL that provides more threading
 opportunities.

OK.

WRT rewriting the tests.  I'd certainly agree that we don't have the
right set of knobs to allow us to characterize the target nor do we have
the right dumping/scanning facilities to describe and query the CFG
changes.

The fact that the IL changes so much across targets is a sign that
target dependency (probably BRANCH_COST) is twiddling the gimple we
generate.  I strongly suspect we'd be a lot better off if we tackled the
BRANCH_COST problem first.


Huh.  I've always accepted differing IL between architectures as a
necessary evil for things like auto-vectorization and the like.

Yes.  We've made a conscious decision that introducing target
dependencies for the autovectorizer makes sense.  BRANCH_COST on the
other hand is a different beast :-)



What's the ideal plan here? A knob to set default values for target
dependent variables that can affect IL layout?  Then we could pass
-fthis-is-an-IL-test and things be normalized?

Well, lots of things.

I'd like decisions about how to expand branches deferred until rtl
expansion.  Kai was poking at this in the past but never really got any
traction.


For the record, the problem in this testcase is that switch lowering is 
riddled with back end specific knowledge (GET_MODE_SIZE uses as well as 
some rtx cost hacks).




Many tests should turn into gimple IL tests.


Yeah, though for tests like the threading ones, they're already 
sufficiently convoluted that turning them into gimple IL tests will make 
them even harder to read.  Oh well, I guess?


To make it easier to transition to gimple IL tests, I suppose we could 
add -fdump-tree-*-something-more-suitable-for-gimplefe ;-).  Is it by 
design that the gimple fe is 

Re: [PATCH 1/3][POPCOUNT] Handle COND_EXPR in expression_expensive_p

2018-07-09 Thread Kugan Vivekanandarajah
Hi Richard,

Thanks for the review.

On 6 July 2018 at 20:17, Richard Biener  wrote:
> On Fri, Jul 6, 2018 at 11:45 AM Kugan Vivekanandarajah
>  wrote:
>>
>> Hi Richard,
>>
>> > There is rewrite_to_non_trapping_overflow available in tree.h.  Thus
>> > final value replacement could use that before gimplifying instead
>> > of using rewrite_to_defined_overflow.
>> Thanks.
>>
>> Is the attached patch OK? I am testing this on x86_64-linux-gnu to
>> confirm there are no new regressions.
>
> Please clean up the control flow to
>
>   if (...)
>     def = rewrite_to_non_trapping_overflow (def);
>   def = force_gimple_operand_gsi (&gsi, def, false, NULL_TREE,
>                                   true, GSI_SAME_STMT);

I also had to temporarily set flag_trapv, like we do in other places
(for flag_wrapv), when calling rewrite_to_non_trapping_overflow.  The
attached patch bootstraps with no new regressions on x86_64-linux-gnu.
Is this OK?

Thanks,
Kugan
>
> OK with that change.
> Richard.
>
>> Thanks,
>> Kugan
>>
>> gcc/ChangeLog:
>>
>> 2018-07-06  Kugan Vivekanandarajah  
>>
>> * tree-scalar-evolution.c (final_value_replacement_loop): Use
>> rewrite_to_non_trapping_overflow instead of rewrite_to_defined_overflow.
From f3ecde5ff57d361e452965550aca94560629e784 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Fri, 6 Jul 2018 13:34:41 +1000
Subject: [PATCH] rewrite with rewrite_to_non_trapping_overflow

Change-Id: I18bb9713b4562cd3f3954c3997bb376969d8ce9b
---
 gcc/tree-scalar-evolution.c | 29 +++--
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 4b0ec02..4feb4b1 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -267,6 +267,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimplify-me.h"
 #include "tree-cfg.h"
+#include "tree-eh.h"
 #include "tree-ssa-loop-ivopts.h"
 #include "tree-ssa-loop-manip.h"
 #include "tree-ssa-loop-niter.h"
@@ -3615,30 +3616,14 @@ final_value_replacement_loop (struct loop *loop)
   if (folded_casts && ANY_INTEGRAL_TYPE_P (TREE_TYPE (def))
 	  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (def)))
 	{
-	  gimple_seq stmts;
-	  gimple_stmt_iterator gsi2;
-	  def = force_gimple_operand (def, &stmts, true, NULL_TREE);
-	  gsi2 = gsi_start (stmts);
-	  while (!gsi_end_p (gsi2))
-	{
-	  gimple *stmt = gsi_stmt (gsi2);
-	  gimple_stmt_iterator gsi3 = gsi2;
-	  gsi_next (&gsi2);
-	  gsi_remove (&gsi3, false);
-	  if (is_gimple_assign (stmt)
-		  && arith_code_with_undefined_signed_overflow
-		  (gimple_assign_rhs_code (stmt)))
-		gsi_insert_seq_before (&gsi3,
-				       rewrite_to_defined_overflow (stmt),
-				       GSI_SAME_STMT);
-	  else
-		gsi_insert_before (&gsi3, stmt, GSI_SAME_STMT);
-	}
+	  bool saved_flag_trapv = flag_trapv;
+	  flag_trapv = 1;
+	  def = rewrite_to_non_trapping_overflow (def);
+	  flag_trapv = saved_flag_trapv;
 	}
-  else
-	def = force_gimple_operand_gsi (&gsi, def, false, NULL_TREE,
-					true, GSI_SAME_STMT);
 
+  def = force_gimple_operand_gsi (&gsi, def, false, NULL_TREE,
+				  true, GSI_SAME_STMT);
   gassign *ass = gimple_build_assign (rslt, def);
   gsi_insert_before (&gsi, ass, GSI_SAME_STMT);
   if (dump_file)
-- 
2.7.4
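For readers outside this pass: what both rewrite_to_defined_overflow and rewrite_to_non_trapping_overflow do, conceptually, is move signed arithmetic (whose overflow is undefined behaviour, and traps under -ftrapv) to unsigned arithmetic, which is defined to wrap.  A source-level sketch of the same idea follows; this is not GCC's internal code, and the final cast back to int is implementation-defined in ISO C (though well-defined, as two's-complement, in GCC):

```c
#include <limits.h>

/* Overflow here is undefined behaviour, and traps under -ftrapv.  */
static int add_signed (int a, int b)
{
  return a + b;
}

/* The rewritten form: unsigned addition wraps modulo 2^N and can
   never trap.  The cast back to int relies on two's-complement
   conversion, which GCC guarantees.  */
static int add_wrapping (int a, int b)
{
  return (int) ((unsigned int) a + (unsigned int) b);
}
```

Doing this rewrite on the final-value expression is what lets the replacement statement be inserted without introducing traps the original loop did not have.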