Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Andrew Pinski
On Tue, Jul 10, 2018 at 6:35 PM Kugan Vivekanandarajah
 wrote:
>
> Hi Andrew,
>
> On 11 July 2018 at 11:19, Andrew Pinski  wrote:
> > On Tue, Jul 10, 2018 at 6:14 PM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> On 10 July 2018 at 23:17, Richard Biener  
> >> wrote:
> >> > On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
> >> >  wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> Jeff told me that the recent popcount built-in detection is causing
> >> >> kernel build issues as
> >> >> ERROR: "__popcountsi2"
> >> >> [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] 
> >> >> undefined!
> >> >>
> >> >> I could also reproduce this. AFAIK, we should check if the libfunc is
> >> >> defined while checking popcount?
> >> >>
> >> >> I am testing the attached RFC patch. Is this reasonable?
> >> >
> >> > It doesn't work that way, all targets have this libfunc in libgcc.  This 
> >> > means
> >> > the kernel has to provide it.  The only thing you could do is restrict
> >> > replacement of CALL_EXPRs (in SCEV cprop) to those the target
> >> > natively supports.
> >>
> >> How about restricting it in expression_expensive_p?  Is that what you
> >> wanted?  The attached patch does this.
> >> Bootstrap and regression testing progressing.
> >
> > Seems like that should go into is_inexpensive_builtin  instead which
> > is just tested right below.
>
> I thought about that. is_inexpensive_builtin is used in various other
> places, including some inlining decisions, so I wasn't sure if it is the
> right thing. Happy to change it if that is the right thing to do.

I audited all of the users (and their users when it is used in a
wrapper) and found that is_inexpensive_builtin should return false for
this builtin if it ends up being a function call; there are other
builtins which should be checked in a similar way, but I am not going
to force you to do the same for those builtins.
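
Roughly, the sketch I have in mind (untested, purely illustrative; it
reuses the optab test from the RFC patch further down this thread) would
add something like the following to the switch in is_inexpensive_builtin:

    case BUILT_IN_POPCOUNT:
    case BUILT_IN_POPCOUNTL:
    case BUILT_IN_POPCOUNTLL:
      {
        /* Expensive if the target has no native popcount instruction,
           since the call would then expand to a libgcc libcall.  */
        tree arg_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (decl)));
        return (optab_handler (popcount_optab, TYPE_MODE (arg_type))
                != CODE_FOR_nothing);
      }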

Thanks,
Andrew

>
> Thanks,
> Kugan
> >
> > Thanks,
> > Andrew
> >
> >>
> >> Thanks,
> >> Kugan
> >>
> >> >
> >> > Richard.
> >> >
> >> >> Thanks,
> >> >> Kugan
> >> >>
> >> >> gcc/ChangeLog:
> >> >>
> >> >> 2018-07-10  Kugan Vivekanandarajah  
> >> >>
> >> >> * tree-ssa-loop-niter.c (number_of_iterations_popcount): Check
> >> >> if libfunc for popcount is available.


Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Kugan Vivekanandarajah
Hi Andrew,

On 11 July 2018 at 11:19, Andrew Pinski  wrote:
> On Tue, Jul 10, 2018 at 6:14 PM Kugan Vivekanandarajah
>  wrote:
>>
>> On 10 July 2018 at 23:17, Richard Biener  wrote:
>> > On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
>> >  wrote:
>> >>
>> >> Hi,
>> >>
>> >> Jeff told me that the recent popcount built-in detection is causing
>> >> kernel build issues as
>> >> ERROR: "__popcountsi2"
>> >> [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
>> >>
>> >> I could also reproduce this. AFAIK, we should check if the libfunc is
>> >> defined while checking popcount?
>> >>
>> >> I am testing the attached RFC patch. Is this reasonable?
>> >
>> > It doesn't work that way, all targets have this libfunc in libgcc.  This 
>> > means
>> > the kernel has to provide it.  The only thing you could do is restrict
>> > replacement of CALL_EXPRs (in SCEV cprop) to those the target
>> > natively supports.
>>
>> How about restricting it in expression_expensive_p?  Is that what you
>> wanted?  The attached patch does this.
>> Bootstrap and regression testing progressing.
>
> Seems like that should go into is_inexpensive_builtin  instead which
> is just tested right below.

I thought about that. is_inexpensive_builtin is used in various other
places, including some inlining decisions, so I wasn't sure if it is the
right thing. Happy to change it if that is the right thing to do.

Thanks,
Kugan
>
> Thanks,
> Andrew
>
>>
>> Thanks,
>> Kugan
>>
>> >
>> > Richard.
>> >
>> >> Thanks,
>> >> Kugan
>> >>
>> >> gcc/ChangeLog:
>> >>
>> >> 2018-07-10  Kugan Vivekanandarajah  
>> >>
>> >> * tree-ssa-loop-niter.c (number_of_iterations_popcount): Check
>> >> if libfunc for popcount is available.


Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Andrew Pinski
On Tue, Jul 10, 2018 at 6:14 PM Kugan Vivekanandarajah
 wrote:
>
> On 10 July 2018 at 23:17, Richard Biener  wrote:
> > On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi,
> >>
> >> Jeff told me that the recent popcount built-in detection is causing
> >> kernel build issues as
> >> ERROR: "__popcountsi2"
> >> [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
> >>
> >> I could also reproduce this. AFAIK, we should check if the libfunc is
> >> defined while checking popcount?
> >>
> >> I am testing the attached RFC patch. Is this reasonable?
> >
> > It doesn't work that way, all targets have this libfunc in libgcc.  This 
> > means
> > the kernel has to provide it.  The only thing you could do is restrict
> > replacement of CALL_EXPRs (in SCEV cprop) to those the target
> > natively supports.
>
> How about restricting it in expression_expensive_p?  Is that what you
> wanted?  The attached patch does this.
> Bootstrap and regression testing progressing.

Seems like that should go into is_inexpensive_builtin  instead which
is just tested right below.

Thanks,
Andrew

>
> Thanks,
> Kugan
>
> >
> > Richard.
> >
> >> Thanks,
> >> Kugan
> >>
> >> gcc/ChangeLog:
> >>
> >> 2018-07-10  Kugan Vivekanandarajah  
> >>
> >> * tree-ssa-loop-niter.c (number_of_iterations_popcount): Check
> >> if libfunc for popcount is available.


Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Kugan Vivekanandarajah
On 10 July 2018 at 23:17, Richard Biener  wrote:
> On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
>  wrote:
>>
>> Hi,
>>
>> Jeff told me that the recent popcount built-in detection is causing
>> kernel build issues as
>> ERROR: "__popcountsi2"
>> [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
>>
>> I could also reproduce this. AFAIK, we should check if the libfunc is
>> defined while checking popcount?
>>
>> I am testing the attached RFC patch. Is this reasonable?
>
> It doesn't work that way, all targets have this libfunc in libgcc.  This means
> the kernel has to provide it.  The only thing you could do is restrict
> replacement of CALL_EXPRs (in SCEV cprop) to those the target
> natively supports.

How about restricting it in expression_expensive_p?  Is that what you
wanted?  The attached patch does this.
Bootstrap and regression testing progressing.

Thanks,
Kugan

>
> Richard.
>
>> Thanks,
>> Kugan
>>
>> gcc/ChangeLog:
>>
>> 2018-07-10  Kugan Vivekanandarajah  
>>
>> * tree-ssa-loop-niter.c (number_of_iterations_popcount): Check
>> if libfunc for popcount is available.
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount4.c b/gcc/testsuite/gcc.target/aarch64/popcount4.c
index e69de29..ee55b2e 100644
--- a/gcc/testsuite/gcc.target/aarch64/popcount4.c
+++ b/gcc/testsuite/gcc.target/aarch64/popcount4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -mgeneral-regs-only" } */
+
+int PopCount (long b) {
+int c = 0;
+
+while (b) {
+   b &= b - 1;
+   c++;
+}
+return c;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 0 "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 69122f2..ec8e4ec 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -257,7 +257,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
 #include "rtl.h"
+#include "optabs-query.h"
 #include "tree.h"
 #include "gimple.h"
 #include "ssa.h"
@@ -282,6 +284,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "tree-into-ssa.h"
 #include "builtins.h"
+#include "case-cfn-macros.h"
 
 static tree analyze_scalar_evolution_1 (struct loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
@@ -3500,6 +3503,23 @@ expression_expensive_p (tree expr)
 {
   tree arg;
   call_expr_arg_iterator iter;
+  tree fndecl = get_callee_fndecl (expr);
+
+  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+   {
+ combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
+ switch (cfn)
+   {
+   CASE_CFN_POPCOUNT:
+ /* Check if opcode for popcount is available.  */
+ if (optab_handler (popcount_optab,
+TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0))))
+ == CODE_FOR_nothing)
+   return true;
+   default:
+ break;
+   }
+   }
 
   if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
return true;


[PATCH, rs6000] Alphabetize prototypes of AltiVec built-in functions in extend.texi

2018-07-10 Thread Kelvin Nilsen
This patch alphabetizes the list of AltiVec built-in function prototypes that 
consume about 15 pages of the gcc.pdf file.  As part of the alphabetization 
effort, certain functions that should not be documented in this section of the 
manual are separated from the others and moved to the end of the section with 
comments to explain their role.

This patch prepares the way for future patches that will remove certain 
prototypes from this section and will insert certain prototypes that are 
currently missing from this section.  It also improves readability and 
maintainability of the section.

This patch has bootstrapped and tested without regressions on 
powerpc64le-unknown-linux (P8).  I have also built the gcc.pdf file and 
reviewed its contents.

In total, the diffs may appear daunting.  A condensation of the diffs is 
obtained by separating out the insertions (+ in the first column) from the 
deletions (- in the first column), sorting the respective files, and performing 
a diff.  This condensed diff reveals that the entirety of this patch results 
only in the following "net changes", all of which are (temporary) additions to 
the extend.texi file:

< 
< 
< 
< 
< 
< @end smallexample
< /* __int128, long long, and double arguments and results require -mvsx.  */
< @smallexample
< The following built-in functions which are currently documented in
< this section are not alphabetized with other built-in functions of
< this section because they belong in different sections.
< /* vec_doublee requires -mvsx.  */
< /* vec_doubleh requires -mvsx.  */
< /* vec_doublel requires -mvsx.  */
< /* vec_doubleo requires -mvsx.  */
< /* vec_float2 requires -mvsx.  */
< /* vec_floate requires -mvsx.  */
< /* vec_floato requires -mvsx.  */
< /* vec_float requires -mvsx.  */
< /* vec_neg requires P8_vector */
< /* vec_signed2 requires -mcpu=power8.  */
< /* vec_signede requires -mvsx.  */
< /* vec_signedo requires -mvsx.  */
< /* vec_signed requires -mvsx.  */
< /* vec_sldw requires -mvsx.  */
< /* vec_unsignede requires -mcpu=power8.  */
< /* vec_unsignede requires -mvsx.  */
< /* vec_unsignedo requires -mvsx.  */
< /* vec_unsigned requires -mvsx.  */

Is this patch ok for trunk?

gcc/ChangeLog:

2018-07-10  Kelvin Nilsen  

* doc/extend.texi (PowerPC AltiVec Built-in Functions):
Alphabetize prototypes of built-in functions, separating out
built-in functions that are listed in this section but should be
described elsewhere.

Index: gcc/doc/extend.texi
===
--- gcc/doc/extend.texi (revision 262542)
+++ gcc/doc/extend.texi (working copy)
@@ -16065,29 +16065,6 @@ vector unsigned int vec_add (vector unsigned int,
 vector unsigned int vec_add (vector unsigned int, vector unsigned int);
 vector float vec_add (vector float, vector float);
 
-vector float vec_vaddfp (vector float, vector float);
-
-vector signed int vec_vadduwm (vector bool int, vector signed int);
-vector signed int vec_vadduwm (vector signed int, vector bool int);
-vector signed int vec_vadduwm (vector signed int, vector signed int);
-vector unsigned int vec_vadduwm (vector bool int, vector unsigned int);
-vector unsigned int vec_vadduwm (vector unsigned int, vector bool int);
-vector unsigned int vec_vadduwm (vector unsigned int, vector unsigned int);
-
-vector signed short vec_vadduhm (vector bool short, vector signed short);
-vector signed short vec_vadduhm (vector signed short, vector bool short);
-vector signed short vec_vadduhm (vector signed short, vector signed short);
-vector unsigned short vec_vadduhm (vector bool short, vector unsigned short);
-vector unsigned short vec_vadduhm (vector unsigned short, vector bool short);
-vector unsigned short vec_vadduhm (vector unsigned short, vector unsigned short);
-
-vector signed char vec_vaddubm (vector bool char, vector signed char);
-vector signed char vec_vaddubm (vector signed char, vector bool char);
-vector signed char vec_vaddubm (vector signed char, vector signed char);
-vector unsigned char vec_vaddubm (vector bool char, vector unsigned char);
-vector unsigned char vec_vaddubm (vector unsigned char, vector bool char);
-vector unsigned char vec_vaddubm (vector unsigned char, vector unsigned char);
-
 vector unsigned int vec_addc (vector unsigned int, vector unsigned int);
 
 vector unsigned char vec_adds (vector bool char, vector unsigned char);
@@ -16109,34 +16086,151 @@ vector signed int vec_adds (vector bool int, vecto
 vector signed int vec_adds (vector signed int, vector bool int);
 vector signed int vec_adds (vector signed int, vector signed int);
 
-vector signed int vec_vaddsws (vector bool int, vector signed int);
-vector signed int vec_vaddsws (vector signed int, vector bool int);
-vector signed int vec_vaddsws (vector signed int, vector signed int);
+int vec_all_eq (vector signed char, vector bool char);
+int vec_all_eq (vector signed char, vector signed char);
+int vec_all_eq (vector unsigned 

Re: [PATCH] PR debug/86459 - Fix -gsplit-dwarf -g3 gcc_assert

2018-07-10 Thread Jakub Jelinek
On Wed, Jul 11, 2018 at 12:20:08AM +0200, Mark Wielaard wrote:
> There was a typo in the output_macinfo_op gcc_assert.
> The function is called dwarf_FORM, not dwarf_form.
> Add the provided testcase from the bug to test -gsplit-dwarf -g3.
> 
> gcc/ChangeLog:
> 
>   PR debug/86459
>   * dwarf2out.c (output_macinfo_op): Fix dwarf_FORM typo in gcc_assert.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR debug/86459
>   * gcc.dg/pr86459.c: New test.

Ok, thanks.

Jakub


[PATCH] PR debug/86459 - Fix -gsplit-dwarf -g3 gcc_assert

2018-07-10 Thread Mark Wielaard
There was a typo in the output_macinfo_op gcc_assert.
The function is called dwarf_FORM, not dwarf_form.
Add the provided testcase from the bug to test -gsplit-dwarf -g3.

gcc/ChangeLog:

PR debug/86459
* dwarf2out.c (output_macinfo_op): Fix dwarf_FORM typo in gcc_assert.

gcc/testsuite/ChangeLog:

PR debug/86459
* gcc.dg/pr86459.c: New test.
---
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 9523217..4640a22 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -28066,7 +28066,7 @@ output_macinfo_op (macinfo_entry *ref)
   node = find_AT_string (ref->info);
   gcc_assert (node
  && (node->form == DW_FORM_strp
- || node->form == dwarf_form (DW_FORM_strx)));
+ || node->form == dwarf_FORM (DW_FORM_strx)));
   dw2_asm_output_data (1, ref->code,
   ref->code == DW_MACRO_define_strp
   ? "Define macro strp"
diff --git a/gcc/testsuite/gcc.dg/pr86459.c b/gcc/testsuite/gcc.dg/pr86459.c
new file mode 100644
index 000..7856a37
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr86459.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-g -O2 -fno-var-tracking-assignments -gsplit-dwarf -g3" } */
+
+/* Same as pr86064.c but compiled with -g3 it showed an issue in
+   output_macinfo_op because of a typo in an assert.  */
+
+int a;
+__attribute__((__cold__)) void b();
+
+void e(int *);
+int f();
+
+void c() {
+  int d;
+  e(&d);
+  a = d;
+  if (f())
+b();
+}
-- 
1.8.3.1



Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-10 Thread Qing Zhao
Richard and Martin,

thanks for the info.

> On Jul 10, 2018, at 11:29 AM, Richard Biener  wrote:
>> Is the above condition on variable warn_stringop_overflow unnecessary?
>> all the warnings inside check_access are controlled by
>> OPT_Wstringop_overflow_.
> 
> Well, the condition certainly saves compile time. 



> On Jul 10, 2018, at 11:55 AM, Martin Sebor  wrote:
>> 
>> Is the above condition on variable warn_stringop_overflow unnecessary?
>> all the warnings inside check_access are controlled by 
>> OPT_Wstringop_overflow_.
>> 
>> can I safely delete the above condition if (warn_stringop_overflow)?
> 
> I think the check above is only there to avoid the overhead
> of the two calls to compute_objsize and check_access.  There
> are a few more like it in other functions in the file and
> they all should be safe to remove, but also safe to keep.
> (Some of them might make it easy to inadvertently introduce
> a dependency between the warning option and an optimization
> so that's something to consider.)

Currently, the condition is there to save compilation time.
However, for my patch I need the return value of check_access to control
whether to invoke inlining or not; therefore the call to check_access should
always be invoked for code generation, and the condition needs to be deleted.
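
Something like the following is what I have in mind (just a sketch of the
direction, not the final patch):

  /* Always compute the object sizes and call check_access; use its result
     to decide whether the inline expansion is safe, independently of
     whether -Wstringop-overflow itself is enabled.  */
  tree size = compute_objsize (arg1, 0);
  bool access_ok
    = check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
                    /*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE);
  if (access_ok)
    {
      size = compute_objsize (arg2, 0);
      access_ok = check_access (exp, NULL_TREE, NULL_TREE, len,
                                NULL_TREE, size, NULL_TREE);
    }
  /* ... only perform the inlining when access_ok is true ...  */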

let me know if I still miss anything here.

 based on the above, I’d like to open a new PR to record this new 
 enhancement and finish it with a new patch later.
 
 what’s your opinion on this?
>>> 
>>> I'm not sure I see the issues above as problems and I would expect
>>> the non-constant optimization to naturally handle the constant case
>>> as well.  But if you prefer it that way, implementing the non-constant
>>> optimization in a separate step sounds reasonable to me.  It's your
>>> call.
>> 
>> the inlined code for call to strcmp with constant string will only have one 
>> load instruction for each byte, but for call to strcmp
>> without constant string, there will be  two load instructions for each byte. 
>>  So, the run time performance impact will be different.
>> we need separate default values of the maximum length of the string to 
>> enable the transformation.
> 
> You're right, that's true for builtins.c where all we have to
> work with is arrays with unknown contents and string literals.
> The strlen pass, on the other hand, has access to the lengths
> of even unknown strings.  That suggests that an even better
> place for the optimization might be the strlen pass where
> the folding could happen earlier and at a higher level, which
> might even obviate having to worry about the constant vs non-
> constant handling.

Yes, it looks like the inlining of a call to strcmp with all-variable strings
might need to be done in the strlen pass in order to get the necessary info.

In addition to this, I still feel that these two inlinings could be separated.
The generated code for inlining a call to strcmp with a constant string can be
more optimal than that for a call to strcmp without constant strings; the cost
models are different.
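
To illustrate the difference (a hand-written sketch, not compiler output):
an inlined strcmp against the literal "ab" needs one load per byte
position, while an inlined strcmp of two unknown strings (bounded here to
three bytes, as an array-size bound would do) needs two loads per byte
position:

  /* strcmp (s, "ab"): one load per byte position.  */
  static int
  cmp_with_literal (const char *s)
  {
    int d;
    if ((d = (unsigned char) s[0] - 'a') != 0 || s[0] == '\0') return d;
    if ((d = (unsigned char) s[1] - 'b') != 0 || s[1] == '\0') return d;
    return (unsigned char) s[2];   /* compared against the literal's NUL */
  }

  /* strcmp (a, b), both unknown: two loads per byte position.  */
  static int
  cmp_two_strings (const char *a, const char *b)
  {
    int d;
    if ((d = (unsigned char) a[0] - (unsigned char) b[0]) != 0
        || a[0] == '\0') return d;
    if ((d = (unsigned char) a[1] - (unsigned char) b[1]) != 0
        || a[1] == '\0') return d;
    return (unsigned char) a[2] - (unsigned char) b[2];
  }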

I just created PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86467
for this work.

> 
>> I will create a PR on this and add a new patch after this one.
> 
> Sure, one step at a time makes sense.  I don't think there is
> any harm in having the optimization in two places: builtins.c
> and strlen.

Thanks a lot for your suggestions.

Qing



Re: [PATCH,rs6000] Backport of stxvl instruction fix to GCC 7

2018-07-10 Thread Segher Boessenkool
On Mon, Jul 09, 2018 at 04:50:03PM -0700, Carl Love wrote:
> The following patch is a back port for a commit to mainline prior to
> GCC 8 release.  Note, the code fixed by this patch was later modified
> in commit 256798 as part of adding vec_xst_len support.  The sldi
> instruction gets replaced by an ashift of the operand for the stxvl
> instruction.  Commit 256798 adds additional functionality and does not
> fix any functional issues.  Hence it is not being back ported, just the
> original bug fix given below.
> 
> The patch has been tested on 
> 
> powerpc64le-unknown-linux-gnu (Power 8 LE)  
> 
> With no regressions.
> 
> Please let me know if the patch looks OK for GCC 7.

This is fine.  Thanks!


Segher


>   Backport from mainline
>   2017-09-07  Carl Love  
> 
>   * config/rs6000/vsx.md (define_insn "*stxvl"): Add missing argument to
>   the sldi instruction.


[PATCH, rs6000 v4] enable gimple folding for vec_xl, vec_xst

2018-07-10 Thread Will Schmidt
Hi,
Add support for Gimple folding for unaligned vector loads and stores.

Regtest completed across variety of systems, P6,P7,P8,P9.

[v2] Added the type for the MEM_REF, per feedback.
Testcases for gimple-folding of the same are currently in-tree
as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c.
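
For reference, the kind of user code those tests exercise looks like the
following (illustrative only, not one of the in-tree tests; requires
-mvsx):

  #include <altivec.h>

  vector signed int
  load_unaligned (signed int *p)
  {
    return vec_xl (0, p);       /* folded to an unaligned MEM_REF load */
  }

  void
  store_unaligned (vector signed int v, signed int *p)
  {
    vec_xst (v, 0, p);          /* folded to an unaligned MEM_REF store */
  }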

[v3] Updated the alignment for the MEM_REF to be 4 bytes.
Updated/added/removed comments in the code for clarity.

[v4] Updated the alignment for the stores to match what was done
for the loads.

Sniff tests passed for V4.  Will do full regtest as well, just in case.
OK for trunk assuming successful test results?

Thanks
-Will

[gcc]

2018-07-10 Will Schmidt 

* config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
vec_xst variants to the list.
(rs6000_gimple_fold_builtin): Add support for folding unaligned
vector loads and stores.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8bc4109..229cfac 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum rs6000_builtins fn_code)
 case ALTIVEC_BUILTIN_STVX_V8HI:
 case ALTIVEC_BUILTIN_STVX_V4SI:
 case ALTIVEC_BUILTIN_STVX_V4SF:
 case ALTIVEC_BUILTIN_STVX_V2DI:
 case ALTIVEC_BUILTIN_STVX_V2DF:
+case VSX_BUILTIN_STXVW4X_V16QI:
+case VSX_BUILTIN_STXVW4X_V8HI:
+case VSX_BUILTIN_STXVW4X_V4SF:
+case VSX_BUILTIN_STXVW4X_V4SI:
+case VSX_BUILTIN_STXVD2X_V2DF:
+case VSX_BUILTIN_STXVD2X_V2DI:
   return true;
 default:
   return false;
 }
 }
@@ -15910,10 +15916,81 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
gimple_set_location (g, loc);
gsi_replace (gsi, g, true);
return true;
   }
 
+/* unaligned Vector loads.  */
+case VSX_BUILTIN_LXVW4X_V16QI:
+case VSX_BUILTIN_LXVW4X_V8HI:
+case VSX_BUILTIN_LXVW4X_V4SF:
+case VSX_BUILTIN_LXVW4X_V4SI:
+case VSX_BUILTIN_LXVD2X_V2DF:
+case VSX_BUILTIN_LXVD2X_V2DI:
+  {
+arg0 = gimple_call_arg (stmt, 0);  // offset
+arg1 = gimple_call_arg (stmt, 1);  // address
+lhs = gimple_call_lhs (stmt);
+location_t loc = gimple_location (stmt);
+/* Since arg1 may be cast to a different type, just use ptr_type_node
+   here instead of trying to enforce TBAA on pointer types.  */
+tree arg1_type = ptr_type_node;
+tree lhs_type = TREE_TYPE (lhs);
+/* In GIMPLE the type of the MEM_REF specifies the alignment.  The
+  required alignment (power) is 4 bytes regardless of data type.  */
+tree align_ltype = build_aligned_type (lhs_type, 4);
+/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
+   the tree using the value from arg0.  The resulting type will match
+   the type of arg1.  */
+gimple_seq stmts = NULL;
+tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
+tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+  arg1_type, arg1, temp_offset);
+gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
+   take an offset, but since we've already incorporated the offset
+   above, here we just pass in a zero.  */
+gimple *g;
+g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, temp_addr,
+   build_int_cst (arg1_type, 0)));
+gimple_set_location (g, loc);
+gsi_replace (gsi, g, true);
+return true;
+  }
+
+/* unaligned Vector stores.  */
+case VSX_BUILTIN_STXVW4X_V16QI:
+case VSX_BUILTIN_STXVW4X_V8HI:
+case VSX_BUILTIN_STXVW4X_V4SF:
+case VSX_BUILTIN_STXVW4X_V4SI:
+case VSX_BUILTIN_STXVD2X_V2DF:
+case VSX_BUILTIN_STXVD2X_V2DI:
+  {
+arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
+arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
+tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
+location_t loc = gimple_location (stmt);
+tree arg0_type = TREE_TYPE (arg0);
+/* Use ptr_type_node (no TBAA) for the arg2_type.  */
+tree arg2_type = ptr_type_node;
+/* In GIMPLE the type of the MEM_REF specifies the alignment.  The
+   required alignment (power) is 4 bytes regardless of data type.  */
+tree align_stype = build_aligned_type (arg0_type, 4);
+/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
+   the tree using the value from arg1.  */
+gimple_seq stmts = NULL;
+tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
+tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+  arg2_type, arg2, temp_offset);
+

Re: [PATCH, rs6000 v3] enable gimple folding for vec_xl, vec_xst

2018-07-10 Thread Will Schmidt
On Tue, 2018-07-10 at 16:23 +0200, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 3:51 PM Bill Schmidt  wrote:
> >
> >
> > > On Jul 10, 2018, at 8:48 AM, Richard Biener  
> > > wrote:
> > >
> > > On Tue, Jul 10, 2018 at 3:33 PM Bill Schmidt  
> > > wrote:
> > >>
> > >>
> > >>> On Jul 10, 2018, at 2:10 AM, Richard Biener 
> > >>>  wrote:
> > >>>
> > >>> On Mon, Jul 9, 2018 at 9:08 PM Will Schmidt  
> > >>> wrote:
> > 
> >  Hi,
> >  Re-posting.  Richard provided feedback on a previous version of this
> >  patch, I wanted to make sure he was/is OK with the latest. :-)
> > 
> >  Add support for Gimple folding for unaligned vector loads and stores.
> > 
> >  Regtest completed across variety of systems, P6,P7,P8,P9.
> > 
> >  [v2] Added the type for the MEM_REF, per feedback.
> >  Testcases for gimple-folding of the same are currently in-tree
> >  as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c.
> >  Re-tested, still looks good. :-)
> > 
> >  [v3] Updated the alignment for the MEM_REF to be 4bytes.
> >  Updated/added/removed comments in the code for clarity.
> > 
> >  OK for trunk?
> > 
> >  Thanks
> >  -Will
> > 
> >  [gcc]
> > 
> >  2018-07-09 Will Schmidt 
> > 
> >    * config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
> >    vec_xst variants to the list.
> >    (rs6000_gimple_fold_builtin): Add support for folding unaligned
> >    vector loads and stores.
> > 
> >  diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >  index 8bc4109..774c60a 100644
> >  --- a/gcc/config/rs6000/rs6000.c
> >  +++ b/gcc/config/rs6000/rs6000.c
> >  @@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum 
> >  rs6000_builtins fn_code)
> > case ALTIVEC_BUILTIN_STVX_V8HI:
> > case ALTIVEC_BUILTIN_STVX_V4SI:
> > case ALTIVEC_BUILTIN_STVX_V4SF:
> > case ALTIVEC_BUILTIN_STVX_V2DI:
> > case ALTIVEC_BUILTIN_STVX_V2DF:
> >  +case VSX_BUILTIN_STXVW4X_V16QI:
> >  +case VSX_BUILTIN_STXVW4X_V8HI:
> >  +case VSX_BUILTIN_STXVW4X_V4SF:
> >  +case VSX_BUILTIN_STXVW4X_V4SI:
> >  +case VSX_BUILTIN_STXVD2X_V2DF:
> >  +case VSX_BUILTIN_STXVD2X_V2DI:
> >   return true;
> > default:
> >   return false;
> > }
> >  }
> >  @@ -15910,10 +15916,79 @@ rs6000_gimple_fold_builtin 
> >  (gimple_stmt_iterator *gsi)
> >    gimple_set_location (g, loc);
> >    gsi_replace (gsi, g, true);
> >    return true;
> >   }
> > 
> >  +/* unaligned Vector loads.  */
> >  +case VSX_BUILTIN_LXVW4X_V16QI:
> >  +case VSX_BUILTIN_LXVW4X_V8HI:
> >  +case VSX_BUILTIN_LXVW4X_V4SF:
> >  +case VSX_BUILTIN_LXVW4X_V4SI:
> >  +case VSX_BUILTIN_LXVD2X_V2DF:
> >  +case VSX_BUILTIN_LXVD2X_V2DI:
> >  +  {
> >  +arg0 = gimple_call_arg (stmt, 0);  // offset
> >  +arg1 = gimple_call_arg (stmt, 1);  // address
> >  +lhs = gimple_call_lhs (stmt);
> >  +location_t loc = gimple_location (stmt);
> >  +/* Since arg1 may be cast to a different type, just use 
> >  ptr_type_node
> >  +   here instead of trying to enforce TBAA on pointer types.  
> >  */
> >  +tree arg1_type = ptr_type_node;
> >  +tree lhs_type = TREE_TYPE (lhs);
> >  +/* in GIMPLE the type of the MEM_REF specifies the alignment. 
> >   The
> >  +  required alignment (power) is 4 bytes regardless of data 
> >  type.  */
> >  +tree align_ltype = build_aligned_type (lhs_type, 4);
> >  +/* POINTER_PLUS_EXPR wants the offset to be of type 
> >  'sizetype'.  Create
> >  +   the tree using the value from arg0.  The resulting type 
> >  will match
> >  +   the type of arg1.  */
> >  +gimple_seq stmts = NULL;
> >  +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
> >  +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> >  +  arg1_type, arg1, temp_offset);
> >  +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> >  +/* Use the build2 helper to set up the mem_ref.  The MEM_REF 
> >  could also
> >  +   take an offset, but since we've already incorporated the 
> >  offset
> >  +   above, here we just pass in a zero.  */
> >  +gimple *g;
> >  +g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, 
> >  temp_addr,
> >  +   build_int_cst 
> >  (arg1_type, 0)));
> >  +gimple_set_location (g, loc);
> >  +gsi_replace (gsi, g, true);
> >  +

Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-10 Thread Martin Sebor

On 07/10/2018 09:14 AM, Qing Zhao wrote:



On Jul 9, 2018, at 3:25 PM, Martin Sebor  wrote:

check_access() calls warning_at() to issue warnings, and that
function only issues warnings if they are enabled, so the guard
isn't necessary to make it work this way.


Okay I see.

then, in the current code: (for routine expand_builtin_memcmp)

  /* Diagnose calls where the specified length exceeds the size of either
 object.  */
  if (warn_stringop_overflow)
{
  tree size = compute_objsize (arg1, 0);
  if (check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
/*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE))
{
  size = compute_objsize (arg2, 0);
  check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
/*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE);
}
}

Is the above condition on variable warn_stringop_overflow unnecessary?
all the warnings inside check_access are controlled by OPT_Wstringop_overflow_.

can I safely delete the above condition if (warn_stringop_overflow)?


I think the check above is only there to avoid the overhead
of the two calls to compute_objsize and check_access.  There
are a few more like it in other functions in the file and
they all should be safe to remove, but also safe to keep.
(Some of them might make it easy to inadvertently introduce
a dependency between the warning option and an optimization
so that's something to consider.)


Beyond that, an enhancement to this optimization that might
be worth considering is inlining even non-constant calls
with array arguments whose size is no greater than the limit.
As in:

extern char a[4], *b;

int n = strcmp (a, b);

Because strcmp arguments are required to be nul-terminated
strings, a's length above must be at most 3.  This is analogous
to similar optimizations GCC performs, such as folding to zero
calls to strlen() with one-element arrays.


Yes, I agree that this will be another good enhancement to the strcmp inlining.

however, it’s not easy to be integrated with my current patch.  The major issue 
is:

 The inlined code for the strcmp call without string constant will be 
different than the inlined code for the
strcmp call with string constant,  then:

1. the default value for the threshold that control the maximum length 
of the string length for inlining will
be different than the one for the strcmp call with string constant,  more 
experiments need to be run and a new parameter
need to be added to control this;
2. the inlined transformed code will be different than the current one.

based on the above, I’d like to open a new PR to record this new enhancement 
and finish it with a new patch later.

what’s your opinion on this?


I'm not sure I see the issues above as problems and I would expect
the non-constant optimization to naturally handle the constant case
as well.  But if you prefer it that way, implementing the non-constant
optimization in a separate step sounds reasonable to me.  It's your
call.


the inlined code for call to strcmp with constant string will only have one 
load instruction for each byte, but for call to strcmp
without constant string, there will be  two load instructions for each byte.  
So, the run time performance impact will be different.
we need separate default values of the maximum length of the string to enable 
the transformation.


You're right, that's true for builtins.c where all we have to
work with is arrays with unknown contents and string literals.
The strlen pass, on the other hand, has access to the lengths
of even unknown strings.  That suggests that an even better
place for the optimization might be the strlen pass where
the folding could happen earlier and at a higher level, which
might even obviate having to worry about the constant vs non-
constant handling.


I will create a PR on this and add a new patch after this one.


Sure, one step at a time makes sense.  I don't think there is
any harm in having the optimization in two places: builtins.c
and strlen.

Martin



thanks.

Qing





[PATCH] doc: add missing "mode" type attribute

2018-07-10 Thread Paul Koning
"mode" is documented as a variable attribute but not as a type attribute.  This 
fixes that omission.  I simply copied the other text, it seemed suitable as it 
stands.

The attributes are normally listed in alphabetical order but "mode" was out of 
order in the variable attributes.

Ok for trunk?

paul

ChangeLog:

2018-07-10  Paul Koning  

* doc/extend.texi (Common Variable Attributes): Move "mode" into
alphabetical order.
(Common Type Attributes): Add "mode" attribute.

Index: doc/extend.texi
===
--- doc/extend.texi (revision 262540)
+++ doc/extend.texi (working copy)
@@ -6123,6 +6123,19 @@ types (@pxref{Common Function Attributes},
 The message attached to the attribute is affected by the setting of
 the @option{-fmessage-length} option.
 
+@item mode (@var{mode})
+@cindex @code{mode} variable attribute
+This attribute specifies the data type for the declaration---whichever
+type corresponds to the mode @var{mode}.  This in effect lets you
+request an integer or floating-point type according to its width.
+
+@xref{Machine Modes,,, gccint, GNU Compiler Collection (GCC) Internals},
+for a list of the possible keywords for @var{mode}.
+You may also specify a mode of @code{byte} or @code{__byte__} to
+indicate the mode corresponding to a one-byte integer, @code{word} or
+@code{__word__} for the mode of a one-word integer, and @code{pointer}
+or @code{__pointer__} for the mode used to represent pointers.
+
 @item nonstring
 @cindex @code{nonstring} variable attribute
 The @code{nonstring} variable attribute specifies that an object or member
@@ -6158,19 +6171,6 @@ int f (struct Data *pd, const char *s)
 @}
 @end smallexample
 
-@item mode (@var{mode})
-@cindex @code{mode} variable attribute
-This attribute specifies the data type for the declaration---whichever
-type corresponds to the mode @var{mode}.  This in effect lets you
-request an integer or floating-point type according to its width.
-
-@xref{Machine Modes,,, gccint, GNU Compiler Collection (GCC) Internals},
-for a list of the possible keywords for @var{mode}.
-You may also specify a mode of @code{byte} or @code{__byte__} to
-indicate the mode corresponding to a one-byte integer, @code{word} or
-@code{__word__} for the mode of a one-word integer, and @code{pointer}
-or @code{__pointer__} for the mode used to represent pointers.
-
 @item packed
 @cindex @code{packed} variable attribute
 The @code{packed} attribute specifies that a variable or structure field
@@ -7112,6 +7112,19 @@ declaration, the above program would abort when co
 @option{-fstrict-aliasing}, which is on by default at @option{-O2} or
 above.
 
+@item mode (@var{mode})
+@cindex @code{mode} type attribute
+This attribute specifies the data type for the declaration---whichever
+type corresponds to the mode @var{mode}.  This in effect lets you
+request an integer or floating-point type according to its width.
+
+@xref{Machine Modes,,, gccint, GNU Compiler Collection (GCC) Internals},
+for a list of the possible keywords for @var{mode}.
+You may also specify a mode of @code{byte} or @code{__byte__} to
+indicate the mode corresponding to a one-byte integer, @code{word} or
+@code{__word__} for the mode of a one-word integer, and @code{pointer}
+or @code{__pointer__} for the mode used to represent pointers.
+
 @item packed
 @cindex @code{packed} type attribute
 This attribute, attached to @code{struct} or @code{union} type
Index: doc/md.texi
===
--- doc/md.texi (revision 262540)
+++ doc/md.texi (working copy)
@@ -10263,7 +10263,11 @@ the expression from the original pattern, which ma
 @code{match_operand N} from the input pattern.  As a consequence,
 @code{match_dup} cannot be used to point to @code{match_operand}s from
 the output pattern, it should always refer to a @code{match_operand}
-from the input pattern.
+from the input pattern.  If a @code{match_dup N} occurs more than once
+in the output template, its first occurrence is replaced with the
+expression from the original pattern, and the subsequent expressions
+are replaced with @code{match_dup N}, i.e., a reference to the first
+expression.
 
 In the output template one can refer to the expressions from the
 original pattern and create new ones.  For instance, some operands could



Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Richard Earnshaw (lists)
On 10/07/18 16:42, Jeff Law wrote:
> On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote:
>> On 10/07/18 00:13, Jeff Law wrote:
>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:

 To address all of the above, these patches adopt a new approach, based
 in part on a posting by Chandler Carruth to the LLVM developers list
 (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
 but which we have extended to deal with inter-function speculation.
 The patches divide the problem into two halves.
>>> We're essentially turning the control dependency into a value that we
>>> can then use to munge the pointer or the resultant data.
>>>

 The first half is some target-specific code to track the speculation
 condition through the generated code to provide an internal variable
 which can tell us whether or not the CPU's control flow speculation
 matches the data flow calculations.  The idea is that the internal
 variable starts with the value TRUE and if the CPU's control flow
 speculation ever causes a jump to the wrong block of code the variable
 becomes false until such time as the incorrect control flow
 speculation gets unwound.
>>> Right.
>>>
>>> So one of the things that comes immediately to mind is you have to run
>>> this early enough that you can still get to all the control flow and
>>> build your predicates.  Otherwise you have do undo stuff like
>>> conditional move generation.
>>
>> No, the opposite, in fact.  We want to run this very late, at least on
>> Arm systems (AArch64 or AArch32).  Conditional move instructions are
>> fine - they're data-flow operations, not control flow (in fact, that's
>> exactly what the control flow tracker instructions are).  By running it
>> late we avoid disrupting any of the earlier optimization passes as well.
> Ack.  I looked at the aarch64 implementation after sending my message
> and it clearly runs very late.
> 
> I haven't convinced myself that all the work the generic parts of the
> compiler do to rewrite and eliminate conditionals is safe.  But even if it
> isn't, you're probably getting enough coverage to drastically reduce the
> attack surface.  I'm going to have to think about the early
> transformations we make and how they interact here harder.  But I think
> the general approach can dramatically reduce the attack surface.

My argument here would be that we are concerned about speculation that
the CPU does with the generated program.  We're not particularly
bothered about the abstract machine description it's based upon.  As
long as the earlier transforms lead to a valid translation (it hasn't
removed a necessary bounds check) then running late is fine.

I can't currently conceive a situation where the compiler would be able
to remove a /necessary/ bounds check that could lead to unsafe
speculation later on.  A redundant bounds check removal shouldn't be a
problem as the non-redundant check should remain and that will still get
tracking code added.
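
As a purely conceptual illustration (this is not how the pass emits it),
the combination of the tracker value and the masking step amounts to
something like:

  /* tracker is ~0 while the CPU's control-flow speculation matches the
     data-flow computation, and becomes 0 on a mis-speculated path.  */
  unsigned long tracker = ~0UL;

  int
  lookup (const int *table, unsigned long idx, unsigned long limit)
  {
    if (idx >= limit)
      return 0;
    /* If speculation wrongly jumped past the bounds check, tracker is 0
       and the masked index collapses to a harmless value.  */
    idx &= tracker;
    return table[idx];
  }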

> 
> With running very late, as you noted, the big concern is edge
> insertions.  I'm going to have to re-familiarize myself with all the
> rules there :-)I did note you stumbled on some of the issues in that
> space (what to do with calls that throw exceptions).
> 
> Placement before the final bbro pass probably avoids a lot of pain.  So
> the basic placement seems reasonable.  And again, if we're missing
> something due to the effects of earlier passes, I still think you're
> reducing the attack surface in a meaningful way.
> 
> 
> 
>>
>>>
>>> On the flip side, the earlier you do this mitigation, the more you have
>>> to worry about what the optimizers are going to do to the code later in
>>> the pipeline.  It's almost guaranteed a naive implementation is going to
>>> muck this up since we can propagate the state of the condition into the
>>> arms which will make the predicate state a compile time constant.
>>>
>>> In fact this seems to be running into the area of pointer provenance and
>>> some discussions we had around atomic a few years back.
>>>
>>> I also wonder if this could be combined with taint analysis to produce a
>>> much lower overhead solution in cases were developers have done analysis
>>> and know what objects are potentially under attacker control.  So
>>> instead of analyzing everything, we can have a much narrower focus.
>>
>> Automatic application of the tracker to vulnerable variables would be
>> nice, but I haven't attempted to go there yet: at present I still rely
>> on the user to annotate code with the new intrinsic.
> ACK.  My sense is we are going to want taint analysis.  I think it'd be
> useful here and in other contexts.  However, I don't think it
> necessarily needs to be a requirement to go forward.
> 
> I'm going to review the atomic discussion we had a while back with the
> kernel folks as well as some pointer provenance discussions I've had
> with Martin S.  I can't put my finger on it yet, but I still 

Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-10 Thread Richard Biener
On July 10, 2018 5:14:37 PM GMT+02:00, Qing Zhao  wrote:
>
>> On Jul 9, 2018, at 3:25 PM, Martin Sebor  wrote:
>> 
>> check_access() calls warning_at() to issue warnings, and that
>> function only issues warnings if they are enabled, so the guard
>> isn't necessary to make it work this way.
>
>Okay I see.
>
>then, in the current code: (for routine expand_builtin_memcmp)
>
>/* Diagnose calls where the specified length exceeds the size of either
> object.  */
>  if (warn_stringop_overflow)
>{
>  tree size = compute_objsize (arg1, 0);
>  if (check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
>   /*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE))
>{
>  size = compute_objsize (arg2, 0);
>  check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
>   /*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE);
>}
>}
>
>Is the above condition on variable warn_stringop_overflow unnecessary?
>all the warnings inside check_access are controlled by
>OPT_Wstringop_overflow_.

Well, the condition certainly saves compile time. 

>
>can I safely delete the above condition if (warn_stringop_overflow)?
>
>> 
 Beyond that, an enhancement to this optimization that might
 be worth considering is inlining even non-constant calls
 with array arguments whose size is no greater than the limit.
 As in:
 
 extern char a[4], *b;
 
 int n = strcmp (a, b);
 
 Because strcmp arguments are required to be nul-terminated
 strings, a's length above must be at most 3.  This is analogous
 to similar optimizations GCC performs, such as folding to zero
 calls to strlen() with one-element arrays.
>>> 
>>> Yes, I agree that this will be another good enhancement to the
>strcmp inlining.
>>> 
>>> however, it’s not easy to be integrated with my current patch.  The
>major issue is:
>>> 
>>>  The inlined code for the strcmp call without string constant will
>be different than the inlined code for the
>>> strcmp call with string constant,  then:
>>> 
>>> 1. the default value for the threshold that control the maximum
>length of the string length for inlining will
>>> be different than the one for the strcmp call with string constant, 
>more experiments need to be run and a new parameter
>>> need to be added to control this;
>>> 2. the inlined transformed code will be different than the current
>one.
>>> 
>>> based on the above, I’d like to open a new PR to record this new
>enhancement and finish it with a new patch later.
>>> 
>>> what’s your opinion on this?
>> 
>> I'm not sure I see the issues above as problems and I would expect
>> the non-constant optimization to naturally handle the constant case
>> as well.  But if you prefer it that way, implementing the
>non-constant
>> optimization in a separate step sounds reasonable to me.  It's your
>> call.
>
>the inlined code for call to strcmp with constant string will only have
>one load instruction for each byte, but for call to strcmp
>without constant string, there will be  two load instructions for each
>byte.  So, the run time performance impact will be different.
>we need separate default values of the maximum length of the string to
>enable the transformation. 
>
>I will create a PR on this and add a new patch after this one.
>
>thanks.
>
>Qing



Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Jeff Law
On 07/10/2018 04:53 AM, Richard Earnshaw (lists) wrote:
> On 10/07/18 11:10, Richard Biener wrote:
>> On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
>>  wrote:
>>>
>>> On 10/07/18 08:19, Richard Biener wrote:
 On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
  wrote:
>
>
> The patches I posted earlier this year for mitigating against
> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
> which it became obvious that a rethink was needed.  This mail, and the
> following patches attempt to address that feedback and present a new
> approach to mitigating against this form of attack surface.
>
> There were two major issues with the original approach:
>
> - The speculation bounds were too tightly constrained - essentially
>   they had to represent an upper and lower bound on a pointer, or a
>   pointer offset.
> - The speculation constraints could only cover the immediately preceding
>   branch, which often did not fit well with the structure of the existing
>   code.
>
> An additional criticism was that the shape of the intrinsic did not
> fit particularly well with systems that used a single speculation
> barrier that essentially had to wait until all preceding speculation
> had to be resolved.
>
> To address all of the above, these patches adopt a new approach, based
> in part on a posting by Chandler Carruth to the LLVM developers list
> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
> but which we have extended to deal with inter-function speculation.
> The patches divide the problem into two halves.
>
> The first half is some target-specific code to track the speculation
> condition through the generated code to provide an internal variable
> which can tell us whether or not the CPU's control flow speculation
> matches the data flow calculations.  The idea is that the internal
> variable starts with the value TRUE and if the CPU's control flow
> speculation ever causes a jump to the wrong block of code the variable
> becomes false until such time as the incorrect control flow
> speculation gets unwound.
>
> The second half is that a new intrinsic function is introduced that is
> much simpler than we had before.  The basic version of the intrinsic
> is now simply:
>
>   T var = __builtin_speculation_safe_value (T unsafe_var);
>
> Full details of the syntax can be found in the documentation patch, in
> patch 1.  In summary, when not speculating the intrinsic returns
> unsafe_var; when speculating then if it can be shown that the
> speculative flow has diverged from the intended control flow then zero
> is returned.  An optional second argument can be used to return an
> alternative value to zero.  The builtin may cause execution to pause
> until the speculation state is resolved.

 So a trivial target implementation would be to emit a barrier and then
 it would always return unsafe_var and never zero.  What I don't understand
 fully is what users should do here, thus what the value of ever returning
 "unsafe" is.  Also I wonder why the API is forcing you to single-out a
 special value instead of doing

  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
  if (!safe)
/* what now? */

 I'm only guessing that the correct way to handle "unsafe" is basically

  while (__builtin_speculation_safe_value (val) == 0)
 ;

 use val, it's now safe
>>>
>>> No, a safe version of val is returned, not a bool telling you it is now
>>> safe to use the original.
>>
>> OK, so making the old value dead is required to preserve the desired
>> dataflow.
>>
>> But how should I use the special value that signaled "failure"?
>>
>> Obviously the user isn't supposed to simply replace 'val' with
>>
>>  val = __builtin_speculation_safe_value (val);
>>
>> to make it speculation-proof.  So - how should the user _use_ this
>> builtin?  The docs do not say anything about this but says the
>> very confusing
>>
>> +The function may use target-dependent speculation tracking state to cause
>> +@var{failval} to be returned when it is known that speculative
>> +execution has incorrectly predicted a conditional branch operation.
>>
>> because speculation is about executing instructions as if they were
>> supposed to be executed.  Once it is known the prediciton was wrong
>> no more "wrong" instructions will be executed but a previously
>> speculated instruction cannot know it was "falsely" speculated.
>>
>> Does the above try to say that the function may return failval if the
>> instruction is currently executed speculatively instead?  That would
>> make sense to me.  And return failval independent of if the speculation
>> later turns out to be correct or not.
>>
>>>  You must use the sanitized 

Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Jeff Law
On 07/10/2018 08:14 AM, Richard Earnshaw (lists) wrote:
> On 10/07/18 14:48, Bill Schmidt wrote:
>>
>>> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) 
>>>  wrote:
>>>
>>> On 10/07/18 00:13, Jeff Law wrote:
 On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>
> The patches I posted earlier this year for mitigating against
> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
> which it became obvious that a rethink was needed.  This mail, and the
> following patches attempt to address that feedback and present a new
> approach to mitigating against this form of attack surface.
>
> There were two major issues with the original approach:
>
> - The speculation bounds were too tightly constrained - essentially
>  they had to represent an upper and lower bound on a pointer, or a
>  pointer offset.
> - The speculation constraints could only cover the immediately preceding
>  branch, which often did not fit well with the structure of the existing
>  code.
>
> An additional criticism was that the shape of the intrinsic did not
> fit particularly well with systems that used a single speculation
> barrier that essentially had to wait until all preceding speculation
> had to be resolved.
 Right.  I suggest the Intel and IBM reps chime in on the updated semantics.

>>>
>>> Yes, logically, this is a boolean tracker value.  In practice we use ~0
>>> for true and 0 for false, so that we can simply use it as a mask
>>> operation later.
>>>
>>> I hope this intrinsic will be even more acceptable than the one that
>>> Bill Schmidt acked previously, it's even simpler than the version we had
>>> last time.
>>
>> Yes, I think this looks quite good.  Thanks!
>>
>> Thanks also for digging into the speculation tracking algorithm.  This
>> has good potential as a conservative opt-in approach.  The obvious
>> concern is whether performance will be acceptable even for apps
>> that really want the protection.
>>
>> We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 
>> numbers on a Skylake box.  We saw geomean degradations of about
>> 42% (int) and 33% (fp).  (This was just one test, so caveat emptor.)
>> This isn't terrible given the number of potential false positives and the
>> early state of the algorithm, but it's still a lot from a customer 
>> perspective.
>> I'll be interested in whether your interprocedural improvements are
>> able to reduce the conservatism a bit.
>>
> 
> So I don't have any numbers for SPEC2006.  I have some initial numbers
> for SPEC2000 when just adding the tracking code (so not applying the
> second part of the mitigation).  In that case INT2000 is down by ~13%
> and FP2000 was by comparison almost in the noise (~2.4%).
> 
> Applying the tracker value to all memory loads would push those numbers
> up significantly, I suspect.  That's part of the reason for preferring
> the intrinsic rather than automatic mitigation: the intrinsic is much
> more targeted.
Right.  Fully automatic without any "hints" is going to be very
expensive, possibly prohibitively expensive.

Using the intrinsic or exploiting some kind of taint analysis has the
potential to drastically reduce the overhead.  At least it seems like
they should :-)
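
For instance (my sketch, using the syntax from patch 1 and assuming
"array" and "ARRAY_LEN" are declared elsewhere), an annotated
bounds-checked load would look something like:

  int
  load_element (unsigned untrusted_idx)
  {
    if (untrusted_idx < ARRAY_LEN)
      {
        /* Returns untrusted_idx on the architectural path; returns 0 if
           speculation has incorrectly predicted the bounds check.  */
        untrusted_idx = __builtin_speculation_safe_value (untrusted_idx);
        return array[untrusted_idx];
      }
    return 0;
  }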

jeff


Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Jeff Law
On 07/10/2018 02:49 AM, Richard Earnshaw (lists) wrote:
> On 10/07/18 00:13, Jeff Law wrote:
>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>>
>>> To address all of the above, these patches adopt a new approach, based
>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>> but which we have extended to deal with inter-function speculation.
>>> The patches divide the problem into two halves.
>> We're essentially turning the control dependency into a value that we
>> can then use to munge the pointer or the resultant data.
>>
>>>
>>> The first half is some target-specific code to track the speculation
>>> condition through the generated code to provide an internal variable
>>> which can tell us whether or not the CPU's control flow speculation
>>> matches the data flow calculations.  The idea is that the internal
>>> variable starts with the value TRUE and if the CPU's control flow
>>> speculation ever causes a jump to the wrong block of code the variable
>>> becomes false until such time as the incorrect control flow
>>> speculation gets unwound.
>> Right.
>>
>> So one of the things that comes immediately to mind is you have to run
>> this early enough that you can still get to all the control flow and
>> build your predicates.  Otherwise you have do undo stuff like
>> conditional move generation.
> 
> No, the opposite, in fact.  We want to run this very late, at least on
> Arm systems (AArch64 or AArch32).  Conditional move instructions are
> fine - they're data-flow operations, not control flow (in fact, that's
> exactly what the control flow tracker instructions are).  By running it
> late we avoid disrupting any of the earlier optimization passes as well.
Ack.  I looked at the aarch64 implementation after sending my message
and it clearly runs very late.

I haven't convinced myself that all the work the generic parts of the
compiler do to rewrite and eliminate conditionals is safe.  But even if it
isn't, you're probably getting enough coverage to drastically reduce the
attack surface.  I'm going to have to think about the early
transformations we make and how they interact here harder.  But I think
the general approach can dramatically reduce the attack surface.

With running very late, as you noted, the big concern is edge
insertions.  I'm going to have to re-familiarize myself with all the
rules there :-) I did note you stumbled on some of the issues in that
space (what to do with calls that throw exceptions).

Placement before the final bbro pass probably avoids a lot of pain.  So
the basic placement seems reasonable.  And again, if we're missing
something due to the effects of earlier passes, I still think you're
reducing the attack surface in a meaningful way.



> 
>>
>> On the flip side, the earlier you do this mitigation, the more you have
>> to worry about what the optimizers are going to do to the code later in
>> the pipeline.  It's almost guaranteed a naive implementation is going to
>> muck this up since we can propagate the state of the condition into the
>> arms which will make the predicate state a compile time constant.
>>
>> In fact this seems to be running into the area of pointer provenance and
>> some discussions we had around atomics a few years back.
>>
>> I also wonder if this could be combined with taint analysis to produce a
>> much lower overhead solution in cases where developers have done analysis
>> and know what objects are potentially under attacker control.  So
>> instead of analyzing everything, we can have a much narrower focus.
> 
> Automatic application of the tracker to vulnerable variables would be
> nice, but I haven't attempted to go there yet: at present I still rely
> on the user to annotate code with the new intrinsic.
ACK.  My sense is we are going to want taint analysis.  I think it'd be
useful here and in other contexts.  However, I don't think it
necessarily needs to be a requirement to go forward.

I'm going to review the atomics discussion we had a while back with the
kernel folks as well as some pointer provenance discussions I've had
with Martin S.  I can't put my finger on it yet, but I still have the
sense there's some interactions here we want to at least be aware of.

> 
> That doesn't mean that we couldn't extend the overall approach later to
> include automatic tracking.
Absolutely.

> 
>>
>> The pointer munging could well run afoul of alias analysis engines that
>> don't expect to be seeing those kind of operations.
> 
> I think the pass runs late enough that it isn't a problem.
Yea, I think you're right.


> 
>>
>> Anyway, just some initial high level thoughts.  I'm sure there'll be
>> more as I read the implementation.
>>
> 
> Thanks for starting to look at this so quickly.
NP.  Your timing to come back to this is good.

Jeff


Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-10 Thread Qing Zhao


> On Jul 9, 2018, at 3:25 PM, Martin Sebor  wrote:
> 
> check_access() calls warning_at() to issue warnings, and that
> function only issues warnings if they are enabled, so the guard
> isn't necessary to make it work this way.

Okay I see.

Then, in the current code (for the routine expand_builtin_memcmp):

  /* Diagnose calls where the specified length exceeds the size of either
 object.  */
  if (warn_stringop_overflow)
{
  tree size = compute_objsize (arg1, 0);
  if (check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
/*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE))
{
  size = compute_objsize (arg2, 0);
  check_access (exp, /*dst=*/NULL_TREE, /*src=*/NULL_TREE, len,
/*maxread=*/NULL_TREE, size, /*objsize=*/NULL_TREE);
}
}

Is the above condition on the variable warn_stringop_overflow unnecessary?
All the warnings inside check_access are controlled by OPT_Wstringop_overflow_.

Can I safely delete the above condition if (warn_stringop_overflow)?

> 
>>> Beyond that, an enhancement to this optimization that might
>>> be worth considering is inlining even non-constant calls
>>> with array arguments whose size is no greater than the limit.
>>> As in:
>>> 
>>> extern char a[4], *b;
>>> 
>>> int n = strcmp (a, b);
>>> 
>>> Because strcmp arguments are required to be nul-terminated
>>> strings, a's length above must be at most 3.  This is analogous
>>> to similar optimizations GCC performs, such as folding to zero
>>> calls to strlen() with one-element arrays.
>> 
>> Yes, I agree that this will be another good enhancement to the strcmp 
>> inlining.
>> 
>> However, it’s not easy to integrate with my current patch.  The major
>> issues are:
>> 
>>   The inlined code for a strcmp call without a string constant will be
>> different from the inlined code for a
>> strcmp call with a string constant, so:
>> 
>>  1. the default value for the threshold that controls the maximum
>> string length for inlining will
>> be different from the one for the strcmp call with a string constant; more
>> experiments need to be run and a new parameter
>> needs to be added to control this;
>>  2. the inlined, transformed code will be different from the current one.
>> 
>> Based on the above, I’d like to open a new PR to record this new enhancement
>> and finish it with a new patch later.
>> 
>> What’s your opinion on this?
> 
> I'm not sure I see the issues above as problems and I would expect
> the non-constant optimization to naturally handle the constant case
> as well.  But if you prefer it that way, implementing the non-constant
> optimization in a separate step sounds reasonable to me.  It's your
> call.

The inlined code for a call to strcmp with a constant string needs only one
load instruction for each byte, but for a call to strcmp
without a constant string there will be two load instructions for each byte.
So the run-time performance impact will be different, and
we need separate default values for the maximum string length that enables
the transformation.
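
To make the difference concrete, hand-written equivalents of the two
expansions would look something like this (purely illustrative; the
lengths and the early-exit logic are not the exact generated code):

/* strcmp (s, "ab"): the constant operand needs no loads, so one load
   per byte overall.  */
static inline int
strcmp_with_constant (const char *s)
{
  unsigned char c;
  if ((c = s[0]) != 'a') return c - 'a';        /* one load  */
  if ((c = s[1]) != 'b') return c - 'b';        /* one load  */
  return (unsigned char) s[2];                  /* one load  */
}

/* strcmp (s, t): neither operand is constant, so two loads per byte.  */
static inline int
strcmp_without_constant (const char *s, const char *t)
{
  for (int i = 0; ; i++)
    {
      unsigned char a = s[i], b = t[i];         /* two loads  */
      if (a != b || a == 0)
        return a - b;
    }
}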

I will create a PR on this and add a new patch after this one.

thanks.

Qing



Re: [PATCH] add support for strnlen (PR 81384)

2018-07-10 Thread Martin Sebor

On 07/10/2018 08:34 AM, Jeff Law wrote:

On 07/10/2018 08:25 AM, Richard Biener wrote:

On Tue, Jul 10, 2018 at 4:10 PM Martin Sebor  wrote:


On 07/10/2018 01:12 AM, Richard Biener wrote:

On Mon, Jul 9, 2018 at 11:26 PM Martin Sebor  wrote:


On 07/09/2018 08:36 AM, Aldy Hernandez wrote:

   { dg-do run }
   { do-options "-O2 -fno-tree-strlen" }  */

 I don't think this is doing anything.

If you look at the test run you can see that -fno-tree-strlen is never
passed (I think you actually mean -fno-optimize-strlen for that
matter).  Also, the builtins.exp harness runs your test for an
assortment of other flags, not just -O2.


I didn't know the harness ignores dg-options specified in these
tests.  That's surprising and feels like a bug in the harness
not to complain about it.  The purpose of the test is to verify
that the strnlen expansion in builtins.c does the right thing
and it deliberately tries to disable the earlier strlen
optimizations to make sure the expansion in builtins.c is fully
exercised.  By not pointing out my mistake the harness effectively
let me commit a change without making sure it's thoroughly tested
(I tested it manually before committing the patch but things could
regress without us noticing).  I'll look into fixing this somehow.



This test is failing on my range branch for -Og, because
expand_builtin_strnlen() needs range info:

+  wide_int min, max;
+  enum value_range_type rng = get_range_info (bound, &min, &max);
+  if (rng != VR_RANGE)
+    return NULL_RTX;

but interestingly enough, it seems to be calculated in the sprintf
pass as part of the DOM walk:

  /* First record ranges generated by this statement.  */
  evrp_range_analyzer.record_ranges_from_stmt (stmt, false);

It feels wrong that the sprintf warning pass is generating range info
that you may later depend on at rtl expansion time (and for a totally
unrelated thing-- strlen expansion).


Any pass that records ranges for statements will have this
effect.  The sprintf pass seems to be the first one to make
use of this utility (and it's not just a warning pass but also
an optimization pass) but it would be a shame to put it off
limits to warning-only passes only because it happens to set
ranges.


As you noted elsewhere warning options shouldn't change code-generation.
This means that ranges may not be set to the IL in case they are only
computed when a warning option is enabled.


I agree.  That's also why I think it should be a basic service
rather than a side-effect of tree traversal, either one done
to implement a particular optimization, or one done by a warning.

The main reason for the work Jeff put in to extracting it out
of EVRP, if I recall correctly, was to expose more accurate
range information to warning passes.  Would setting statement
ranges make sense as part of EVRP (or some other similar pass)?
If not, the only way to conform to the policy that I can think
of is to have warning-only  passes that need it make
the traversal and call record_ranges regardless of whether or
not the warning itself is enabled.  That would seem needlessly
inefficient, even if the code implementing the warning itself
were disabled.


Well, simply not set range-info on SSA names but use the
lattice values?  Should be easy to key that to a flag.

Right.  That was essentially my suggestion.  For a client that uses EVRP
analysis to drive warnings, they do not mirror results into the global
range info we attach to SSA_NAMEs.  An optimization pass which utilizes
EVRP can (of course) set the global range info.

The sprintf bits are a bit tricky as it's a mix of warning and
optimization, but I think there's enough separation that we can arrange
to do the right thing.

Since I introduced EVRP into the sprintf bits, I'm happy to own fixing
this up.  I just wanted to get general agreement on the approach.


I'm not sure I understand what about the sprintf pass needs
changing, since the warning is independent of the optimization.
The gate function makes the intent clear:

pass_sprintf_length::gate (function *)
{
  /* Run the pass iff -Warn-format-overflow or -Warn-format-truncation
 is specified and either not optimizing and the pass is being invoked
 early, or when optimizing and the pass is being invoked during
 optimization (i.e., "late").  */
  return ((warn_format_overflow > 0
   || warn_format_trunc > 0
   || flag_printf_return_value)
  && (optimize > 0) == fold_return_value);
}

The only other use of the warn_format_overflow and
warn_format_trunc variables is to set the warn_level variable,
and that one is only used to affect the LIKELY counter which
is used for warnings only but not for optimization.

Am I missing something?

Martin


Re: [PATCH][OBVIOUS] Add missing Optimization attribute.

2018-07-10 Thread Martin Liška
On 07/10/2018 10:01 AM, Andre Vieira (lists) wrote:
> On 09/07/18 09:11, Martin Liška wrote:
>> Hi.
>>
>> I'm putting back what I accidentally removed.
>>
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-07-09  Martin Liska  
>>
>>  * common.opt: Add back wrongly removed attribute.
>> ---
>>  gcc/common.opt | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>
> Hi Martin,
> 
> Re-enabling this option made the test for warnings in gcc.dg/pr84100.c fail.
> 
> Maybe the test needs some adjusting?
> 
> Cheers,
> Andre
> 

The following patch fixes that. I've tested the patch and I'm going to install it.

Martin
From d375378fc6c61e0c279a75565cc2195b4fa54fab Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 10 Jul 2018 16:42:23 +0200
Subject: [PATCH] Change dg-warning to dg-bogus in a test-case (PR
 testsuite/86445).

gcc/testsuite/ChangeLog:

2018-07-10  Martin Liska  

PR testsuite/86445
	* gcc.dg/pr84100.c: Change it back to dg-bogus.
---
 gcc/testsuite/gcc.dg/pr84100.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr84100.c b/gcc/testsuite/gcc.dg/pr84100.c
index 676d0c78dea..86fbc4f7a3e 100644
--- a/gcc/testsuite/gcc.dg/pr84100.c
+++ b/gcc/testsuite/gcc.dg/pr84100.c
@@ -8,7 +8,7 @@ __attribute__((optimize ("align-loops=16", "align-jumps=16",
 			 "align-labels=16", "align-functions=16")))
 void
 foo (void)
-{			/* { dg-warning "bad option" } */
+{			/* { dg-bogus "bad option" } */
   for (int i = 0; i < 1024; ++i)
 bar ();
 }
-- 
2.18.0



Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Jeff Law
On 07/10/2018 07:27 AM, Jakub Jelinek wrote:
> On Tue, Jul 10, 2018 at 03:17:48PM +0200, Richard Biener wrote:
>> On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
>>> Jeff told me that the recent popcount built-in detection is causing
>>> kernel build issues as
>>> ERROR: "__popcountsi2"
>>> [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
>>>
>>> I could also reproduce this. AFIK, we should check if the libfunc is
>>> defined while checking popcount?
>>>
>>> I am testing the attached RFC patch. Is this reasonable?
>>
>> It doesn't work that way, all targets have this libfunc in libgcc.  This 
>> means
>> the kernel has to provide it.  The only thing you could do is restrict
>> replacement of CALL_EXPRs (in SCEV cprop) to those the target
>> natively supports.
> 
> Yeah, that is what we've done in the past in other cases, e.g. PR82981
> is somewhat similar.  So perhaps just check the optab and use it only if it
> is supported?
And I could live with this too.  Essentially I'm just looking to get the
issue raised and addressed now rather than waiting for stage3/stage4 :-)
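
Something along these lines is what I'd expect the optab check to look
like (only a sketch -- the helper name and its placement, e.g. called
from expression_expensive_p for the popcount builtins, are assumptions):

/* Sketch: return true if a __builtin_popcount* call whose argument is
   ARG would have to become a libgcc __popcount*2 libcall on this
   target, i.e. the target has no native popcount expansion.  */
static bool
popcount_call_expensive_p (tree arg)
{
  machine_mode mode = TYPE_MODE (TREE_TYPE (arg));
  return optab_handler (popcount_optab, mode) == CODE_FOR_nothing;
}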


jeff


Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Jeff Law
On 07/10/2018 07:17 AM, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
>  wrote:
>>
>> Hi,
>>
>> Jeff told me that the recent popcount built-in detection is causing
>> kernel build issues as
>> ERROR: "__popcountsi2"
>> [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
>>
>> I could also reproduce this. AFIK, we should check if the libfunc is
>> defined while checking popcount?
>>
>> I am testing the attached RFC patch. Is this reasonable?
> 
> It doesn't work that way, all targets have this libfunc in libgcc.  This means
> the kernel has to provide it.  The only thing you could do is restrict
> replacement of CALL_EXPRs (in SCEV cprop) to those the target
> natively supports.
I can certainly live with that, but I think we should reach out to the
kernel developers to proactively make them aware of the requirement to
provide popcount.

Jeff


Re: [PATCH] add support for strnlen (PR 81384)

2018-07-10 Thread Jeff Law
On 07/10/2018 08:25 AM, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 4:10 PM Martin Sebor  wrote:
>>
>> On 07/10/2018 01:12 AM, Richard Biener wrote:
>>> On Mon, Jul 9, 2018 at 11:26 PM Martin Sebor  wrote:

 On 07/09/2018 08:36 AM, Aldy Hernandez wrote:
>{ dg-do run }
>{ do-options "-O2 -fno-tree-strlen" }  */
>
>  I don't think this is doing anything.
>
> If you look at the test run you can see that -fno-tree-strlen is never
> passed (I think you actually mean -fno-optimize-strlen for that
> matter).  Also, the builtins.exp harness runs your test for an
> assortment of other flags, not just -O2.

 I didn't know the harness ignores dg-options specified in these
 tests.  That's surprising and feels like a bug in the harness
 not to complain about it.  The purpose of the test is to verify
 that the strnlen expansion in builtins.c does the right thing
 and it deliberately tries to disable the earlier strlen
 optimizations to make sure the expansion in builtins.c is fully
 exercised.  By not pointing out my mistake the harness effectively
 let me commit a change without making sure it's thoroughly tested
 (I tested it manually before committing the patch but things could
 regress without us noticing).  I'll look into fixing this somehow.

>
> This test is failing on my range branch for -Og, because
> expand_builtin_strnlen() needs range info:
>
> +  wide_int min, max;
> +  enum value_range_type rng = get_range_info (bound, &min, &max);
> +  if (rng != VR_RANGE)
> +    return NULL_RTX;
>
> but interestingly enough, it seems to be calculated in the sprintf
> pass as part of the DOM walk:
>
>   /* First record ranges generated by this statement.  */
>   evrp_range_analyzer.record_ranges_from_stmt (stmt, false);
>
> It feels wrong that the sprintf warning pass is generating range info
> that you may later depend on at rtl expansion time (and for a totally
> unrelated thing-- strlen expansion).

 Any pass that records ranges for statements will have this
 effect.  The sprintf pass seems to be the first one to make
 use of this utility (and it's not just a warning pass but also
 an optimization pass) but it would be a shame to put it off
 limits to warning-only passes only because it happens to set
 ranges.
>>>
>>> As you noted elsewhere warning options shouldn't change code-generation.
>>> This means that ranges may not be set to the IL in case they are only
>>> computed when a warning option is enabled.
>>
>> I agree.  That's also why I think it should be a basic service
>> rather than a side-effect of tree traversal, either one done
>> to implement a particular optimization, or one done by a warning.
>>
>> The main reason for the work Jeff put in to extracting it out
>> of EVRP, if I recall correctly, was to expose more accurate
>> range information to warning passes.  Would setting statement
>> ranges make sense as part of EVRP (or some other similar pass)?
>> If not, the only way to conform to the policy that I can think
>> of is to have warning-only  passes that need it make
>> the traversal and call record_ranges regardless of whether or
>> not the warning itself is enabled.  That would seem needlessly
>> inefficient, even if the code implementing the warning itself
>> were disabled.
> 
> Well, simply not set range-info on SSA names but use the
> lattice values?  Should be easy to key that to a flag.
Right.  That was essentially my suggestion.  For a client that uses EVRP
analysis to drive warnings, they do not mirror results into the global
range info we attach to SSA_NAMEs.  An optimization pass which utilizes
EVRP can (of course) set the global range info.

The sprintf bits are a bit tricky as it's a mix of warning and
optimization, but I think there's enough separation that we can arrange
to do the right thing.

Since I introduced EVRP into the sprintf bits, I'm happy to own fixing
this up.  I just wanted to get general agreement on the approach.
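
Roughly what I have in mind is below (a sketch only -- whether the
second argument of record_ranges_from_stmt is the right knob, and gating
on flag_printf_return_value, are assumptions to double-check):

/* Sketch: only mirror EVRP results into the global SSA_NAME range info
   when the optimization side of the sprintf pass is enabled, so a
   warning-only run does not change the IL.  */
bool update_global_ranges = flag_printf_return_value && optimize > 0;
evrp_range_analyzer.record_ranges_from_stmt
  (stmt, /*temporary=*/!update_global_ranges);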

jeff


Re: [PATCH] add support for strnlen (PR 81384)

2018-07-10 Thread Richard Biener
On Tue, Jul 10, 2018 at 4:10 PM Martin Sebor  wrote:
>
> On 07/10/2018 01:12 AM, Richard Biener wrote:
> > On Mon, Jul 9, 2018 at 11:26 PM Martin Sebor  wrote:
> >>
> >> On 07/09/2018 08:36 AM, Aldy Hernandez wrote:
> >>>{ dg-do run }
> >>>{ do-options "-O2 -fno-tree-strlen" }  */
> >>>
> >>>  I don't think this is doing anything.
> >>>
> >>> If you look at the test run you can see that -fno-tree-strlen is never
> >>> passed (I think you actually mean -fno-optimize-strlen for that
> >>> matter).  Also, the builtins.exp harness runs your test for an
> >>> assortment of other flags, not just -O2.
> >>
> >> I didn't know the harness ignores dg-options specified in these
> >> tests.  That's surprising and feels like a bug in the harness
> >> not to complain about it.  The purpose of the test is to verify
> >> that the strnlen expansion in builtins.c does the right thing
> >> and it deliberately tries to disable the earlier strlen
> >> optimizations to make sure the expansion in builtins.c is fully
> >> exercised.  By not pointing out my mistake the harness effectively
> >> let me commit a change without making sure it's thoroughly tested
> >> (I tested it manually before committing the patch but things could
> >> regress without us noticing).  I'll look into fixing this somehow.
> >>
> >>>
> >>> This test is failing on my range branch for -Og, because
> >>> expand_builtin_strnlen() needs range info:
> >>>
> >>> +  wide_int min, max;
> >>> +  enum value_range_type rng = get_range_info (bound, &min, &max);
> >>> +  if (rng != VR_RANGE)
> >>> +    return NULL_RTX;
> >>>
> >>> but interestingly enough, it seems to be calculated in the sprintf
> >>> pass as part of the DOM walk:
> >>>
> >>>   /* First record ranges generated by this statement.  */
> >>>   evrp_range_analyzer.record_ranges_from_stmt (stmt, false);
> >>>
> >>> It feels wrong that the sprintf warning pass is generating range info
> >>> that you may later depend on at rtl expansion time (and for a totally
> >>> unrelated thing-- strlen expansion).
> >>
> >> Any pass that records ranges for statements will have this
> >> effect.  The sprintf pass seems to be the first one to make
> >> use of this utility (and it's not just a warning pass but also
> >> an optimization pass) but it would be a shame to put it off
> >> limits to warning-only passes only because it happens to set
> >> ranges.
> >
> > As you noted elsewhere warning options shouldn't change code-generation.
> > This means that ranges may not be set to the IL in case they are only
> > computed when a warning option is enabled.
>
> I agree.  That's also why I think it should be a basic service
> rather than a side-effect of tree traversal, either one done
> to implement a particular optimization, or one done by a warning.
>
> The main reason for the work Jeff put in to extracting it out
> of EVRP, if I recall correctly, was to expose more accurate
> range information to warning passes.  Would setting statement
> ranges make sense as part of EVRP (or some other similar pass)?
> If not, the only way to conform to the policy that I can think
> of is to have warning-only  passes that need it make
> the traversal and call record_ranges regardless of whether or
> not the warning itself is enabled.  That would seem needlessly
> inefficient, even if the code implementing the warning itself
> were disabled.

Well, simply not set range-info on SSA names but use the
lattice values?  Should be easy to key that to a flag.

Richard.

>
> Martin


Re: [PATCH, rs6000 v3] enable gimple folding for vec_xl, vec_xst

2018-07-10 Thread Richard Biener
On Tue, Jul 10, 2018 at 3:51 PM Bill Schmidt  wrote:
>
>
> > On Jul 10, 2018, at 8:48 AM, Richard Biener  
> > wrote:
> >
> > On Tue, Jul 10, 2018 at 3:33 PM Bill Schmidt  wrote:
> >>
> >>
> >>> On Jul 10, 2018, at 2:10 AM, Richard Biener  
> >>> wrote:
> >>>
> >>> On Mon, Jul 9, 2018 at 9:08 PM Will Schmidt  
> >>> wrote:
> 
>  Hi,
>  Re-posting.  Richard provided feedback on a previous version of this
>  patch, I wanted to make sure he was/is OK with the latest. :-)
> 
>  Add support for Gimple folding for unaligned vector loads and stores.
> 
>  Regtest completed across variety of systems, P6,P7,P8,P9.
> 
>  [v2] Added the type for the MEM_REF, per feedback.
>  Testcases for gimple-folding of the same are currently in-tree
>  as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c.
>  Re-tested, still looks good. :-)
> 
>  [v3] Updated the alignment for the MEM_REF to be 4bytes.
>  Updated/added/removed comments in the code for clarity.
> 
>  OK for trunk?
> 
>  Thanks
>  -Will
> 
>  [gcc]
> 
>  2018-07-09 Will Schmidt 
> 
>    * config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
>    vec_xst variants to the list.
>    (rs6000_gimple_fold_builtin): Add support for folding unaligned
>    vector loads and stores.
> 
>  diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>  index 8bc4109..774c60a 100644
>  --- a/gcc/config/rs6000/rs6000.c
>  +++ b/gcc/config/rs6000/rs6000.c
>  @@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum 
>  rs6000_builtins fn_code)
> case ALTIVEC_BUILTIN_STVX_V8HI:
> case ALTIVEC_BUILTIN_STVX_V4SI:
> case ALTIVEC_BUILTIN_STVX_V4SF:
> case ALTIVEC_BUILTIN_STVX_V2DI:
> case ALTIVEC_BUILTIN_STVX_V2DF:
>  +case VSX_BUILTIN_STXVW4X_V16QI:
>  +case VSX_BUILTIN_STXVW4X_V8HI:
>  +case VSX_BUILTIN_STXVW4X_V4SF:
>  +case VSX_BUILTIN_STXVW4X_V4SI:
>  +case VSX_BUILTIN_STXVD2X_V2DF:
>  +case VSX_BUILTIN_STXVD2X_V2DI:
>   return true;
> default:
>   return false;
> }
>  }
>  @@ -15910,10 +15916,79 @@ rs6000_gimple_fold_builtin 
>  (gimple_stmt_iterator *gsi)
>    gimple_set_location (g, loc);
>    gsi_replace (gsi, g, true);
>    return true;
>   }
> 
>  +/* unaligned Vector loads.  */
>  +case VSX_BUILTIN_LXVW4X_V16QI:
>  +case VSX_BUILTIN_LXVW4X_V8HI:
>  +case VSX_BUILTIN_LXVW4X_V4SF:
>  +case VSX_BUILTIN_LXVW4X_V4SI:
>  +case VSX_BUILTIN_LXVD2X_V2DF:
>  +case VSX_BUILTIN_LXVD2X_V2DI:
>  +  {
>  +arg0 = gimple_call_arg (stmt, 0);  // offset
>  +arg1 = gimple_call_arg (stmt, 1);  // address
>  +lhs = gimple_call_lhs (stmt);
>  +location_t loc = gimple_location (stmt);
>  +/* Since arg1 may be cast to a different type, just use 
>  ptr_type_node
>  +   here instead of trying to enforce TBAA on pointer types.  */
>  +tree arg1_type = ptr_type_node;
>  +tree lhs_type = TREE_TYPE (lhs);
>  +/* in GIMPLE the type of the MEM_REF specifies the alignment.  
>  The
>  +  required alignment (power) is 4 bytes regardless of data 
>  type.  */
>  +tree align_ltype = build_aligned_type (lhs_type, 4);
>  +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'. 
>   Create
>  +   the tree using the value from arg0.  The resulting type will 
>  match
>  +   the type of arg1.  */
>  +gimple_seq stmts = NULL;
>  +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>  +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>  +  arg1_type, arg1, temp_offset);
>  +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>  +/* Use the build2 helper to set up the mem_ref.  The MEM_REF 
>  could also
>  +   take an offset, but since we've already incorporated the 
>  offset
>  +   above, here we just pass in a zero.  */
>  +gimple *g;
>  +g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, 
>  temp_addr,
>  +   build_int_cst 
>  (arg1_type, 0)));
>  +gimple_set_location (g, loc);
>  +gsi_replace (gsi, g, true);
>  +return true;
>  +  }
>  +
>  +/* unaligned Vector stores.  */
>  +case VSX_BUILTIN_STXVW4X_V16QI:
>  +case VSX_BUILTIN_STXVW4X_V8HI:
>  +case VSX_BUILTIN_STXVW4X_V4SF:
>  +case VSX_BUILTIN_STXVW4X_V4SI:
>  +case VSX_BUILTIN_STXVD2X_V2DF:
>  +case 

Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Richard Earnshaw (lists)
On 10/07/18 14:48, Bill Schmidt wrote:
> 
>> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) 
>>  wrote:
>>
>> On 10/07/18 00:13, Jeff Law wrote:
>>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:

 The patches I posted earlier this year for mitigating against
 CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
 which it became obvious that a rethink was needed.  This mail, and the
 following patches attempt to address that feedback and present a new
 approach to mitigating against this form of attack surface.

 There were two major issues with the original approach:

 - The speculation bounds were too tightly constrained - essentially
  they had to represent an upper and lower bound on a pointer, or a
  pointer offset.
 - The speculation constraints could only cover the immediately preceding
  branch, which often did not fit well with the structure of the existing
  code.

 An additional criticism was that the shape of the intrinsic did not
 fit particularly well with systems that used a single speculation
 barrier that essentially had to wait until all preceding speculation
 had been resolved.
>>> Right.  I suggest the Intel and IBM reps chime in on the updated semantics.
>>>
>>
>> Yes, logically, this is a boolean tracker value.  In practice we use ~0
>> for true and 0 for false, so that we can simply use it as a mask
>> operation later.
>>
>> I hope this intrinsic will be even more acceptable than the one that
>> Bill Schmidt acked previously, it's even simpler than the version we had
>> last time.
> 
> Yes, I think this looks quite good.  Thanks!
> 
> Thanks also for digging into the speculation tracking algorithm.  This
> has good potential as a conservative opt-in approach.  The obvious
> concern is whether performance will be acceptable even for apps
> that really want the protection.
> 
> We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 
> numbers on a Skylake box.  We saw geomean degradations of about
> 42% (int) and 33% (fp).  (This was just one test, so caveat emptor.)
> This isn't terrible given the number of potential false positives and the
> early state of the algorithm, but it's still a lot from a customer 
> perspective.
> I'll be interested in whether your interprocedural improvements are
> able to reduce the conservatism a bit.
> 

So I don't have any numbers for SPEC2006.  I have some initial numbers
for SPEC2000 when just adding the tracking code (so not applying the
second part of the mitigation).  In that case INT2000 is down by ~13%
and FP2000 was by comparison almost in the noise (~2.4%).

Applying the tracker value to all memory loads would push those numbers
up significantly, I suspect.  That's part of the reason for preferring
the intrinsic rather than automatic mitigation: the intrinsic is much
more targeted.

R.


> Thanks,
> Bill
>>

 To address all of the above, these patches adopt a new approach, based
 in part on a posting by Chandler Carruth to the LLVM developers list
 (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
 but which we have extended to deal with inter-function speculation.
 The patches divide the problem into two halves.
>>> We're essentially turning the control dependency into a value that we
>>> can then use to munge the pointer or the resultant data.
>>>

 The first half is some target-specific code to track the speculation
 condition through the generated code to provide an internal variable
 which can tell us whether or not the CPU's control flow speculation
 matches the data flow calculations.  The idea is that the internal
 variable starts with the value TRUE and if the CPU's control flow
 speculation ever causes a jump to the wrong block of code the variable
 becomes false until such time as the incorrect control flow
 speculation gets unwound.
>>> Right.
>>>
>>> So one of the things that comes immediately to mind is you have to run
>>> this early enough that you can still get to all the control flow and
>>> build your predicates.  Otherwise you have to undo stuff like
>>> conditional move generation.
>>
>> No, the opposite, in fact.  We want to run this very late, at least on
>> Arm systems (AArch64 or AArch32).  Conditional move instructions are
>> fine - they're data-flow operations, not control flow (in fact, that's
>> exactly what the control flow tracker instructions are).  By running it
>> late we avoid disrupting any of the earlier optimization passes as well.
>>
>>>
>>> On the flip side, the earlier you do this mitigation, the more you have
>>> to worry about what the optimizers are going to do to the code later in
>>> the pipeline.  It's almost guaranteed a naive implementation is going to
>>> muck this up since we can propagate the state of the condition into the
>>> arms which will make the predicate state a compile time constant.
>>>

Re: [PATCH] add support for strnlen (PR 81384)

2018-07-10 Thread Martin Sebor

On 07/10/2018 01:12 AM, Richard Biener wrote:

On Mon, Jul 9, 2018 at 11:26 PM Martin Sebor  wrote:


On 07/09/2018 08:36 AM, Aldy Hernandez wrote:

   { dg-do run }
   { do-options "-O2 -fno-tree-strlen" }  */

 I don't think this is doing anything.

If you look at the test run you can see that -fno-tree-strlen is never
passed (I think you actually mean -fno-optimize-strlen for that
matter).  Also, the builtins.exp harness runs your test for an
assortment of other flags, not just -O2.


I didn't know the harness ignores dg-options specified in these
tests.  That's surprising and feels like a bug in the harness
not to complain about it.  The purpose of the test is to verify
that the strnlen expansion in builtins.c does the right thing
and it deliberately tries to disable the earlier strlen
optimizations to make sure the expansion in builtins.c is fully
exercised.  By not pointing out my mistake the harness effectively
let me commit a change without making sure it's thoroughly tested
(I tested it manually before committing the patch but things could
regress without us noticing).  I'll look into fixing this somehow.



This test is failing on my range branch for -Og, because
expand_builtin_strnlen() needs range info:

+  wide_int min, max;
+  enum value_range_type rng = get_range_info (bound, &min, &max);
+  if (rng != VR_RANGE)
+    return NULL_RTX;

but interestingly enough, it seems to be calculated in the sprintf
pass as part of the DOM walk:

  /* First record ranges generated by this statement.  */
  evrp_range_analyzer.record_ranges_from_stmt (stmt, false);

It feels wrong that the sprintf warning pass is generating range info
that you may later depend on at rtl expansion time (and for a totally
unrelated thing-- strlen expansion).


Any pass that records ranges for statements will have this
effect.  The sprintf pass seems to be the first one to make
use of this utility (and it's not just a warning pass but also
an optimization pass) but it would be a shame to put it off
limits to warning-only passes only because it happens to set
ranges.


As you noted elsewhere warning options shouldn't change code-generation.
This means that ranges may not be set to the IL in case they are only
computed when a warning option is enabled.


I agree.  That's also why I think it should be a basic service
rather than a side-effect of tree traversal, either one done
to implement a particular optimization, or one done by a warning.

The main reason for the work Jeff put in to extracting it out
of EVRP, if I recall correctly, was to expose more accurate
range information to warning passes.  Would setting statement
ranges make sense as part of EVRP (or some other similar pass)?
If not, the only way to conform to the policy that I can think
of is to have warning-only  passes that need it make
the traversal and call record_ranges regardless of whether or
not the warning itself is enabled.  That would seem needlessly
inefficient, even if the code implementing the warning itself
were disabled.

Martin


Re: [PATCH, rs6000 v3] enable gimple folding for vec_xl, vec_xst

2018-07-10 Thread Bill Schmidt


> On Jul 10, 2018, at 8:48 AM, Richard Biener  
> wrote:
> 
> On Tue, Jul 10, 2018 at 3:33 PM Bill Schmidt  wrote:
>> 
>> 
>>> On Jul 10, 2018, at 2:10 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Mon, Jul 9, 2018 at 9:08 PM Will Schmidt  
>>> wrote:
 
 Hi,
 Re-posting.  Richard provided feedback on a previous version of this
 patch, I wanted to make sure he was/is OK with the latest. :-)
 
 Add support for Gimple folding for unaligned vector loads and stores.
 
 Regtest completed across variety of systems, P6,P7,P8,P9.
 
 [v2] Added the type for the MEM_REF, per feedback.
 Testcases for gimple-folding of the same are currently in-tree
 as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c.
 Re-tested, still looks good. :-)
 
 [v3] Updated the alignment for the MEM_REF to be 4bytes.
 Updated/added/removed comments in the code for clarity.
 
 OK for trunk?
 
 Thanks
 -Will
 
 [gcc]
 
 2018-07-09 Will Schmidt 
 
   * config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
   vec_xst variants to the list.
   (rs6000_gimple_fold_builtin): Add support for folding unaligned
   vector loads and stores.
 
 diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
 index 8bc4109..774c60a 100644
 --- a/gcc/config/rs6000/rs6000.c
 +++ b/gcc/config/rs6000/rs6000.c
 @@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum 
 rs6000_builtins fn_code)
case ALTIVEC_BUILTIN_STVX_V8HI:
case ALTIVEC_BUILTIN_STVX_V4SI:
case ALTIVEC_BUILTIN_STVX_V4SF:
case ALTIVEC_BUILTIN_STVX_V2DI:
case ALTIVEC_BUILTIN_STVX_V2DF:
 +case VSX_BUILTIN_STXVW4X_V16QI:
 +case VSX_BUILTIN_STXVW4X_V8HI:
 +case VSX_BUILTIN_STXVW4X_V4SF:
 +case VSX_BUILTIN_STXVW4X_V4SI:
 +case VSX_BUILTIN_STXVD2X_V2DF:
 +case VSX_BUILTIN_STXVD2X_V2DI:
  return true;
default:
  return false;
}
 }
 @@ -15910,10 +15916,79 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
 *gsi)
   gimple_set_location (g, loc);
   gsi_replace (gsi, g, true);
   return true;
  }
 
 +/* unaligned Vector loads.  */
 +case VSX_BUILTIN_LXVW4X_V16QI:
 +case VSX_BUILTIN_LXVW4X_V8HI:
 +case VSX_BUILTIN_LXVW4X_V4SF:
 +case VSX_BUILTIN_LXVW4X_V4SI:
 +case VSX_BUILTIN_LXVD2X_V2DF:
 +case VSX_BUILTIN_LXVD2X_V2DI:
 +  {
 +arg0 = gimple_call_arg (stmt, 0);  // offset
 +arg1 = gimple_call_arg (stmt, 1);  // address
 +lhs = gimple_call_lhs (stmt);
 +location_t loc = gimple_location (stmt);
 +/* Since arg1 may be cast to a different type, just use 
 ptr_type_node
 +   here instead of trying to enforce TBAA on pointer types.  */
 +tree arg1_type = ptr_type_node;
 +tree lhs_type = TREE_TYPE (lhs);
 +/* in GIMPLE the type of the MEM_REF specifies the alignment.  The
 +  required alignment (power) is 4 bytes regardless of data type.  
 */
 +tree align_ltype = build_aligned_type (lhs_type, 4);
 +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
 Create
 +   the tree using the value from arg0.  The resulting type will 
 match
 +   the type of arg1.  */
 +gimple_seq stmts = NULL;
 +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
 +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
 +  arg1_type, arg1, temp_offset);
 +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
 +/* Use the build2 helper to set up the mem_ref.  The MEM_REF 
 could also
 +   take an offset, but since we've already incorporated the offset
 +   above, here we just pass in a zero.  */
 +gimple *g;
 +g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, 
 temp_addr,
 +   build_int_cst (arg1_type, 
 0)));
 +gimple_set_location (g, loc);
 +gsi_replace (gsi, g, true);
 +return true;
 +  }
 +
 +/* unaligned Vector stores.  */
 +case VSX_BUILTIN_STXVW4X_V16QI:
 +case VSX_BUILTIN_STXVW4X_V8HI:
 +case VSX_BUILTIN_STXVW4X_V4SF:
 +case VSX_BUILTIN_STXVW4X_V4SI:
 +case VSX_BUILTIN_STXVD2X_V2DF:
 +case VSX_BUILTIN_STXVD2X_V2DI:
 +  {
 +arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
 +arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
 +tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
 +location_t loc = gimple_location (stmt);

Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Bill Schmidt


> On Jul 10, 2018, at 3:49 AM, Richard Earnshaw (lists) 
>  wrote:
> 
> On 10/07/18 00:13, Jeff Law wrote:
>> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>> 
>>> The patches I posted earlier this year for mitigating against
>>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>>> which it became obvious that a rethink was needed.  This mail, and the
>>> following patches attempt to address that feedback and present a new
>>> approach to mitigating against this form of attack surface.
>>> 
>>> There were two major issues with the original approach:
>>> 
>>> - The speculation bounds were too tightly constrained - essentially
>>>  they had to represent an upper and lower bound on a pointer, or a
>>>  pointer offset.
>>> - The speculation constraints could only cover the immediately preceding
>>>  branch, which often did not fit well with the structure of the existing
>>>  code.
>>> 
>>> An additional criticism was that the shape of the intrinsic did not
>>> fit particularly well with systems that used a single speculation
>>> barrier that essentially had to wait until all preceding speculation
>>>  had been resolved.
>> Right.  I suggest the Intel and IBM reps chime in on the updated semantics.
>> 
> 
> Yes, logically, this is a boolean tracker value.  In practice we use ~0
> for true and 0 for false, so that we can simply use it as a mask
> operation later.
> 
> I hope this intrinsic will be even more acceptable than the one that
> Bill Schmidt acked previously, it's even simpler than the version we had
> last time.

Yes, I think this looks quite good.  Thanks!

Thanks also for digging into the speculation tracking algorithm.  This
has good potential as a conservative opt-in approach.  The obvious
concern is whether performance will be acceptable even for apps
that really want the protection.

We took a look at Chandler's WIP LLVM patch and ran some SPEC2006 
numbers on a Skylake box.  We saw geomean degradations of about
42% (int) and 33% (fp).  (This was just one test, so caveat emptor.)
This isn't terrible given the number of potential false positives and the
early state of the algorithm, but it's still a lot from a customer perspective.
I'll be interested in whether your interprocedural improvements are
able to reduce the conservatism a bit.

Thanks,
Bill
> 
>>> 
>>> To address all of the above, these patches adopt a new approach, based
>>> in part on a posting by Chandler Carruth to the LLVM developers list
>>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>>> but which we have extended to deal with inter-function speculation.
>>> The patches divide the problem into two halves.
>> We're essentially turning the control dependency into a value that we
>> can then use to munge the pointer or the resultant data.
>> 
>>> 
>>> The first half is some target-specific code to track the speculation
>>> condition through the generated code to provide an internal variable
>>> which can tell us whether or not the CPU's control flow speculation
>>> matches the data flow calculations.  The idea is that the internal
>>> variable starts with the value TRUE and if the CPU's control flow
>>> speculation ever causes a jump to the wrong block of code the variable
>>> becomes false until such time as the incorrect control flow
>>> speculation gets unwound.
>> Right.
>> 
>> So one of the things that comes immediately to mind is you have to run
>> this early enough that you can still get to all the control flow and
>> build your predicates.  Otherwise you have to undo stuff like
>> conditional move generation.
> 
> No, the opposite, in fact.  We want to run this very late, at least on
> Arm systems (AArch64 or AArch32).  Conditional move instructions are
> fine - they're data-flow operations, not control flow (in fact, that's
> exactly what the control flow tracker instructions are).  By running it
> late we avoid disrupting any of the earlier optimization passes as well.
> 
>> 
>> On the flip side, the earlier you do this mitigation, the more you have
>> to worry about what the optimizers are going to do to the code later in
>> the pipeline.  It's almost guaranteed a naive implementation is going to
>> muck this up since we can propagate the state of the condition into the
>> arms which will make the predicate state a compile time constant.
>> 
>> In fact this seems to be running into the area of pointer provenance and
>> some discussions we had around atomics a few years back.
>> 
>> I also wonder if this could be combined with taint analysis to produce a
>> much lower overhead solution in cases were developers have done analysis
>> and know what objects are potentially under attacker control.  So
>> instead of analyzing everything, we can have a much narrower focus.
> 
> Automatic application of the tracker to vulnerable variables would be
> nice, but I haven't attempted to go there yet: at present I still rely
> on the user to annotate code with the new intrinsic.
> 
> 

Re: [PATCH, rs6000 v3] enable gimple folding for vec_xl, vec_xst

2018-07-10 Thread Richard Biener
On Tue, Jul 10, 2018 at 3:33 PM Bill Schmidt  wrote:
>
>
> > On Jul 10, 2018, at 2:10 AM, Richard Biener  
> > wrote:
> >
> > On Mon, Jul 9, 2018 at 9:08 PM Will Schmidt  
> > wrote:
> >>
> >> Hi,
> >>  Re-posting.  Richard provided feedback on a previous version of this
> >> patch, I wanted to make sure he was/is OK with the latest. :-)
> >>
> >> Add support for Gimple folding for unaligned vector loads and stores.
> >>
> >> Regtest completed across variety of systems, P6,P7,P8,P9.
> >>
> >> [v2] Added the type for the MEM_REF, per feedback.
> >> Testcases for gimple-folding of the same are currently in-tree
> >> as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c.
> >> Re-tested, still looks good. :-)
> >>
> >> [v3] Updated the alignment for the MEM_REF to be 4bytes.
> >> Updated/added/removed comments in the code for clarity.
> >>
> >> OK for trunk?
> >>
> >> Thanks
> >> -Will
> >>
> >> [gcc]
> >>
> >> 2018-07-09 Will Schmidt 
> >>
> >>* config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
> >>vec_xst variants to the list.
> >>(rs6000_gimple_fold_builtin): Add support for folding unaligned
> >>vector loads and stores.
> >>
> >> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> >> index 8bc4109..774c60a 100644
> >> --- a/gcc/config/rs6000/rs6000.c
> >> +++ b/gcc/config/rs6000/rs6000.c
> >> @@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum 
> >> rs6000_builtins fn_code)
> >> case ALTIVEC_BUILTIN_STVX_V8HI:
> >> case ALTIVEC_BUILTIN_STVX_V4SI:
> >> case ALTIVEC_BUILTIN_STVX_V4SF:
> >> case ALTIVEC_BUILTIN_STVX_V2DI:
> >> case ALTIVEC_BUILTIN_STVX_V2DF:
> >> +case VSX_BUILTIN_STXVW4X_V16QI:
> >> +case VSX_BUILTIN_STXVW4X_V8HI:
> >> +case VSX_BUILTIN_STXVW4X_V4SF:
> >> +case VSX_BUILTIN_STXVW4X_V4SI:
> >> +case VSX_BUILTIN_STXVD2X_V2DF:
> >> +case VSX_BUILTIN_STXVD2X_V2DI:
> >>   return true;
> >> default:
> >>   return false;
> >> }
> >> }
> >> @@ -15910,10 +15916,79 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> >> *gsi)
> >>gimple_set_location (g, loc);
> >>gsi_replace (gsi, g, true);
> >>return true;
> >>   }
> >>
> >> +/* unaligned Vector loads.  */
> >> +case VSX_BUILTIN_LXVW4X_V16QI:
> >> +case VSX_BUILTIN_LXVW4X_V8HI:
> >> +case VSX_BUILTIN_LXVW4X_V4SF:
> >> +case VSX_BUILTIN_LXVW4X_V4SI:
> >> +case VSX_BUILTIN_LXVD2X_V2DF:
> >> +case VSX_BUILTIN_LXVD2X_V2DI:
> >> +  {
> >> +arg0 = gimple_call_arg (stmt, 0);  // offset
> >> +arg1 = gimple_call_arg (stmt, 1);  // address
> >> +lhs = gimple_call_lhs (stmt);
> >> +location_t loc = gimple_location (stmt);
> >> +/* Since arg1 may be cast to a different type, just use 
> >> ptr_type_node
> >> +   here instead of trying to enforce TBAA on pointer types.  */
> >> +tree arg1_type = ptr_type_node;
> >> +tree lhs_type = TREE_TYPE (lhs);
> >> +/* in GIMPLE the type of the MEM_REF specifies the alignment.  The
> >> +  required alignment (power) is 4 bytes regardless of data type.  
> >> */
> >> +tree align_ltype = build_aligned_type (lhs_type, 4);
> >> +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
> >> Create
> >> +   the tree using the value from arg0.  The resulting type will 
> >> match
> >> +   the type of arg1.  */
> >> +gimple_seq stmts = NULL;
> >> +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
> >> +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> >> +  arg1_type, arg1, temp_offset);
> >> +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> >> +/* Use the build2 helper to set up the mem_ref.  The MEM_REF 
> >> could also
> >> +   take an offset, but since we've already incorporated the offset
> >> +   above, here we just pass in a zero.  */
> >> +gimple *g;
> >> +g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, 
> >> temp_addr,
> >> +   build_int_cst (arg1_type, 
> >> 0)));
> >> +gimple_set_location (g, loc);
> >> +gsi_replace (gsi, g, true);
> >> +return true;
> >> +  }
> >> +
> >> +/* unaligned Vector stores.  */
> >> +case VSX_BUILTIN_STXVW4X_V16QI:
> >> +case VSX_BUILTIN_STXVW4X_V8HI:
> >> +case VSX_BUILTIN_STXVW4X_V4SF:
> >> +case VSX_BUILTIN_STXVW4X_V4SI:
> >> +case VSX_BUILTIN_STXVD2X_V2DF:
> >> +case VSX_BUILTIN_STXVD2X_V2DI:
> >> +  {
> >> +arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
> >> +arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
> >> +tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
> >> +location_t loc = gimple_location (stmt);
> >> +tree arg0_type = TREE_TYPE (arg0);
> >> +

Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Richard Earnshaw (lists)
On 10/07/18 12:21, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 12:53 PM Richard Earnshaw (lists)
>  wrote:
>>
>> On 10/07/18 11:10, Richard Biener wrote:
>>> On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
>>>  wrote:

 On 10/07/18 08:19, Richard Biener wrote:
> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
>  wrote:
>>
>>
>> The patches I posted earlier this year for mitigating against
>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>> which it became obvious that a rethink was needed.  This mail, and the
>> following patches attempt to address that feedback and present a new
>> approach to mitigating against this form of attack surface.
>>
>> There were two major issues with the original approach:
>>
>> - The speculation bounds were too tightly constrained - essentially
>>   they had to represent an upper and lower bound on a pointer, or a
>>   pointer offset.
>> - The speculation constraints could only cover the immediately preceding
>>   branch, which often did not fit well with the structure of the existing
>>   code.
>>
>> An additional criticism was that the shape of the intrinsic did not
>> fit particularly well with systems that used a single speculation
>> barrier that essentially had to wait until all preceding speculation
>>  had been resolved.
>>
>> To address all of the above, these patches adopt a new approach, based
>> in part on a posting by Chandler Carruth to the LLVM developers list
>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>> but which we have extended to deal with inter-function speculation.
>> The patches divide the problem into two halves.
>>
>> The first half is some target-specific code to track the speculation
>> condition through the generated code to provide an internal variable
>> which can tell us whether or not the CPU's control flow speculation
>> matches the data flow calculations.  The idea is that the internal
>> variable starts with the value TRUE and if the CPU's control flow
>> speculation ever causes a jump to the wrong block of code the variable
>> becomes false until such time as the incorrect control flow
>> speculation gets unwound.
>>
>> The second half is that a new intrinsic function is introduced that is
>> much simpler than we had before.  The basic version of the intrinsic
>> is now simply:
>>
>>   T var = __builtin_speculation_safe_value (T unsafe_var);
>>
>> Full details of the syntax can be found in the documentation patch, in
>> patch 1.  In summary, when not speculating the intrinsic returns
>> unsafe_var; when speculating then if it can be shown that the
>> speculative flow has diverged from the intended control flow then zero
>> is returned.  An optional second argument can be used to return an
>> alternative value to zero.  The builtin may cause execution to pause
>> until the speculation state is resolved.
>
> So a trivial target implementation would be to emit a barrier and then
> it would always return unsafe_var and never zero.  What I don't understand
> fully is what users should do here, thus what the value of ever returning
> "unsafe" is.  Also I wonder why the API is forcing you to single-out a
> special value instead of doing
>
>  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
>  if (!safe)
>/* what now? */
>
> I'm only guessing that the correct way to handle "unsafe" is basically
>
>  while (__builtin_speculation_safe_value (val) == 0)
> ;
>
> use val, it's now safe

 No, a safe version of val is returned, not a bool telling you it is now
 safe to use the original.
>>>
>>> OK, so making the old value dead is required to preserve the desired
>>> dataflow.
>>>
>>> But how should I use the special value that signaled "failure"?
>>>
>>> Obviously the user isn't supposed to simply replace 'val' with
>>>
>>>  val = __builtin_speculation_safe_value (val);
>>>
>>> to make it speculation-proof.  So - how should the user _use_ this
>>> builtin?  The docs do not say anything about this but says the
>>> very confusing
>>>
>>> +The function may use target-dependent speculation tracking state to cause
>>> +@var{failval} to be returned when it is known that speculative
>>> +execution has incorrectly predicted a conditional branch operation.
>>>
>>> because speculation is about executing instructions as if they were
>>> supposed to be executed.  Once it is known the prediction was wrong
>>> no more "wrong" instructions will be executed but a previously
>>> speculated instruction cannot know it was "falsely" speculated.
>>>
>>> Does the above try to say that the function may return failval if the
>>> instruction is currently executed speculatively instead?  That 

Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Richard Biener
On Tue, Jul 10, 2018 at 3:27 PM Jakub Jelinek  wrote:
>
> On Tue, Jul 10, 2018 at 03:17:48PM +0200, Richard Biener wrote:
> > On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
> > > Jeff told me that the recent popcount built-in detection is causing
> > > kernel build issues as
> > > ERROR: "__popcountsi2"
> > > [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
> > >
> > > I could also reproduce this. AFIK, we should check if the libfunc is
> > > defined while checking popcount?
> > >
> > > I am testing the attached RFC patch. Is this reasonable?
> >
> > It doesn't work that way, all targets have this libfunc in libgcc.  This 
> > means
> > the kernel has to provide it.  The only thing you could do is restrict
> > replacement of CALL_EXPRs (in SCEV cprop) to those the target
> > natively supports.
>
> Yeah, that is what we've done in the past in other cases, e.g. PR82981
> is somewhat similar.  So perhaps just check the optab and use it only if it
> is supported?

Btw, given that we do not want to fail niter analysis we'd have to audit
all places that eventually code-generate it which isn't only SCEV-cprop ...

So not sure if it isn't better to user-control this in a way not depending
on target HW support...

Richard.

>
> Jakub


Re: [PATCH, rs6000 v3] enable gimple folding for vec_xl, vec_xst

2018-07-10 Thread Bill Schmidt


> On Jul 10, 2018, at 2:10 AM, Richard Biener  
> wrote:
> 
> On Mon, Jul 9, 2018 at 9:08 PM Will Schmidt  wrote:
>> 
>> Hi,
>>  Re-posting.  Richard provided feedback on a previous version of this
>> patch, I wanted to make sure he was/is OK with the latest. :-)
>> 
>> Add support for Gimple folding for unaligned vector loads and stores.
>> 
>> Regtest completed across variety of systems, P6,P7,P8,P9.
>> 
>> [v2] Added the type for the MEM_REF, per feedback.
>> Testcases for gimple-folding of the same are currently in-tree
>> as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c.
>> Re-tested, still looks good. :-)
>> 
>> [v3] Updated the alignment for the MEM_REF to be 4bytes.
>> Updated/added/removed comments in the code for clarity.
>> 
>> OK for trunk?
>> 
>> Thanks
>> -Will
>> 
>> [gcc]
>> 
>> 2018-07-09 Will Schmidt 
>> 
>>* config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
>>vec_xst variants to the list.
>>(rs6000_gimple_fold_builtin): Add support for folding unaligned
>>vector loads and stores.
>> 
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 8bc4109..774c60a 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum 
>> rs6000_builtins fn_code)
>> case ALTIVEC_BUILTIN_STVX_V8HI:
>> case ALTIVEC_BUILTIN_STVX_V4SI:
>> case ALTIVEC_BUILTIN_STVX_V4SF:
>> case ALTIVEC_BUILTIN_STVX_V2DI:
>> case ALTIVEC_BUILTIN_STVX_V2DF:
>> +case VSX_BUILTIN_STXVW4X_V16QI:
>> +case VSX_BUILTIN_STXVW4X_V8HI:
>> +case VSX_BUILTIN_STXVW4X_V4SF:
>> +case VSX_BUILTIN_STXVW4X_V4SI:
>> +case VSX_BUILTIN_STXVD2X_V2DF:
>> +case VSX_BUILTIN_STXVD2X_V2DI:
>>   return true;
>> default:
>>   return false;
>> }
>> }
>> @@ -15910,10 +15916,79 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
>> *gsi)
>>gimple_set_location (g, loc);
>>gsi_replace (gsi, g, true);
>>return true;
>>   }
>> 
>> +/* unaligned Vector loads.  */
>> +case VSX_BUILTIN_LXVW4X_V16QI:
>> +case VSX_BUILTIN_LXVW4X_V8HI:
>> +case VSX_BUILTIN_LXVW4X_V4SF:
>> +case VSX_BUILTIN_LXVW4X_V4SI:
>> +case VSX_BUILTIN_LXVD2X_V2DF:
>> +case VSX_BUILTIN_LXVD2X_V2DI:
>> +  {
>> +arg0 = gimple_call_arg (stmt, 0);  // offset
>> +arg1 = gimple_call_arg (stmt, 1);  // address
>> +lhs = gimple_call_lhs (stmt);
>> +location_t loc = gimple_location (stmt);
>> +/* Since arg1 may be cast to a different type, just use 
>> ptr_type_node
>> +   here instead of trying to enforce TBAA on pointer types.  */
>> +tree arg1_type = ptr_type_node;
>> +tree lhs_type = TREE_TYPE (lhs);
>> +/* in GIMPLE the type of the MEM_REF specifies the alignment.  The
>> +  required alignment (power) is 4 bytes regardless of data type.  */
>> +tree align_ltype = build_aligned_type (lhs_type, 4);
>> +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
>> Create
>> +   the tree using the value from arg0.  The resulting type will 
>> match
>> +   the type of arg1.  */
>> +gimple_seq stmts = NULL;
>> +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>> +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>> +  arg1_type, arg1, temp_offset);
>> +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +/* Use the build2 helper to set up the mem_ref.  The MEM_REF could 
>> also
>> +   take an offset, but since we've already incorporated the offset
>> +   above, here we just pass in a zero.  */
>> +gimple *g;
>> +g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, 
>> temp_addr,
>> +   build_int_cst (arg1_type, 
>> 0)));
>> +gimple_set_location (g, loc);
>> +gsi_replace (gsi, g, true);
>> +return true;
>> +  }
>> +
>> +/* unaligned Vector stores.  */
>> +case VSX_BUILTIN_STXVW4X_V16QI:
>> +case VSX_BUILTIN_STXVW4X_V8HI:
>> +case VSX_BUILTIN_STXVW4X_V4SF:
>> +case VSX_BUILTIN_STXVW4X_V4SI:
>> +case VSX_BUILTIN_STXVD2X_V2DF:
>> +case VSX_BUILTIN_STXVD2X_V2DI:
>> +  {
>> +arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
>> +arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
>> +tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
>> +location_t loc = gimple_location (stmt);
>> +tree arg0_type = TREE_TYPE (arg0);
>> +/* Use ptr_type_node (no TBAA) for the arg2_type.  */
>> +tree arg2_type = ptr_type_node;
>> +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
>> Create
>> +   the tree using the value from arg0.  The resulting type will 
>> match
>> +   the type 
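
For readers unfamiliar with these builtins, here is a minimal sketch
(illustrative only, not part of the patch) of the kind of source the new
folding applies to; after the fold the vec_xl/vec_xst calls become plain
MEM_REF accesses with the 4-byte alignment discussed above, so later GIMPLE
passes can see and optimize them.

/* Illustrative only; compile with e.g. -mcpu=power8 -mvsx.  */
#include <altivec.h>

void
scale (float *dst, const float *src, long n)
{
  for (long i = 0; i + 4 <= n; i += 4)
    {
      vector float v = vec_xl (i * sizeof (float), src);   /* unaligned load */
      v = vec_mul (v, vec_splats (2.0f));
      vec_xst (v, i * sizeof (float), dst);                /* unaligned store */
    }
}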

Re: [AArch64] Generate load-pairs when the last load clobbers the address register [2/2]

2018-07-10 Thread Sudakshina Das

Hi Jackson


On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote:

Hi all,

This patch resolves PR86014.  It does so by noticing that the last 
load may clobber the address register without issue (regardless of 
where it exists in the final ldp/stp sequence).  That check has been 
changed so that the last register may be clobbered and the testcase 
(gcc.target/aarch64/ldp_stp_10.c) now passes.


Bootstrap and regtest OK.

OK for trunk?

Jackson

Changelog:

gcc/

2018-06-25  Jackson Woodruff  

    PR target/86014
    * config/aarch64/aarch64.c 
(aarch64_operands_adjust_ok_for_ldpstp):

    Remove address clobber check on last register.


This looks good to me, but you will need a maintainer to approve it. The only
thing I would add is that you could move the comment on top of the for loop
to this patch. That is, keep the original
/* Check if the addresses are clobbered by load.  */
in your [1/2] and make the comment change in [2/2].

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
d0e9b2d464183eecc8cc7639ca3e981d2ff243ba..feffe8ebdbd4efd0ffc09834547767ceec46f4e4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17074,7 +17074,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, 
bool load,
   /* Only the last register in the order in which they occur
  may be clobbered by the load.  */
   if (load)
-for (int i = 0; i < num_instructions; i++)
+for (int i = 0; i < num_instructions - 1; i++)
   if (reg_mentioned_p (reg[i], mem[i]))
return false;


Thanks
Sudi
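
As a concrete illustration of the case the patch now accepts (hypothetical
example, not taken from PR86014): in a run of four loads that gets fused into
two ldp instructions, only the final load may overwrite the register holding
the address, because no later access needs it.

/* Hypothetical example.  The last of the four loads overwrites the
   pointer that serves as the base address; with the patch this no longer
   prevents forming the load-pair sequence.  */
long
f (long *p)
{
  long a = p[0];
  long b = p[1];
  long c = p[2];
  p = (long *) p[3];   /* last load clobbers the address register */
  return a + b + c + (long) p;
}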



Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Jakub Jelinek
On Tue, Jul 10, 2018 at 03:17:48PM +0200, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
> > Jeff told me that the recent popcount built-in detection is causing
> > kernel build issues as
> > ERROR: "__popcountsi2"
> > [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
> >
> > I could also reproduce this. AFIK, we should check if the libfunc is
> > defined while checking popcount?
> >
> > I am testing the attached RFC patch. Is this reasonable?
> 
> It doesn't work that way, all targets have this libfunc in libgcc.  This means
> the kernel has to provide it.  The only thing you could do is restrict
> replacement of CALL_EXPRs (in SCEV cprop) to those the target
> natively supports.

Yeah, that is what we've done in the past in other cases, e.g. PR82981
is somewhat similar.  So perhaps just check the optab and use it only if it
is supported?

Jakub
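
A minimal sketch of the check being suggested (names follow the RFC patch
quoted further down; this is not a committed change): test whether the target
can expand popcount natively instead of testing for a libfunc, since the
libfunc always exists in libgcc.

/* Sketch only; assumes optabs-query.h is included.  Returns true if the
   target can expand popcount for SRC's mode without falling back to
   libgcc's __popcountsi2/__popcountdi2.  */
static bool
target_has_popcount_p (tree src)
{
  machine_mode mode = TYPE_MODE (TREE_TYPE (src));
  return optab_handler (popcount_optab, mode) != CODE_FOR_nothing;
}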


Re: [RFC] Fix recent popcount change is breaking

2018-07-10 Thread Richard Biener
On Tue, Jul 10, 2018 at 3:06 PM Kugan Vivekanandarajah
 wrote:
>
> Hi,
>
> Jeff told me that the recent popcount built-in detection is causing
> kernel build issues as
> ERROR: "__popcountsi2"
> [drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!
>
> I could also reproduce this. AFIK, we should check if the libfunc is
> defined while checking popcount?
>
> I am testing the attached RFC patch. Is this reasonable?

It doesn't work that way, all targets have this libfunc in libgcc.  This means
the kernel has to provide it.  The only thing you could do is restrict
replacement of CALL_EXPRs (in SCEV cprop) to those the target
natively supports.

Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2018-07-10  Kugan Vivekanandarajah  
>
> * tree-ssa-loop-niter.c (number_of_iterations_popcount): Check
> if libfunc for popcount is available.


[PATCH] Do less (redundant) constant propagation during unrolling

2018-07-10 Thread Richard Biener


The following avoids constant propagating both on a loop and its
children.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I'm going to remove the checking code I added before committing
(and I will of course re-test).

Richard.

2018-07-10  Richard Biener  

* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1):
Rework father_bb setting in a way to avoid propagating constants
multiple times on a loop body.

diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index c951f17f9ba..4f6080b339f 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -1378,20 +1378,53 @@ tree_unroll_loops_completely_1 (bool may_increase_size, 
bool unroll_outer,
   enum unroll_level ul;
   unsigned num = number_of_loops (cfun);
 
+  gcc_assert (bitmap_empty_p (father_bbs));
+
   /* Process inner loops first.  Don't walk loops added by the recursive
  calls because SSA form is not up-to-date.  They can be handled in the
  next iteration.  */
+  bitmap child_father_bbs = NULL;
   for (inner = loop->inner; inner != NULL; inner = inner->next)
 if ((unsigned) inner->num < num)
-  changed |= tree_unroll_loops_completely_1 (may_increase_size,
-unroll_outer, father_bbs,
-inner);
+  {
+   if (!child_father_bbs)
+ child_father_bbs = BITMAP_ALLOC (NULL);
+   if (tree_unroll_loops_completely_1 (may_increase_size, unroll_outer,
+   child_father_bbs, inner))
+ {
+   bitmap_ior_into (father_bbs, child_father_bbs);
+   bitmap_clear (child_father_bbs);
+   changed = true;
+ }
+  }
+  if (child_father_bbs)
+BITMAP_FREE (child_father_bbs);
 
   /* If we changed an inner loop we cannot process outer loops in this
  iteration because SSA form is not up-to-date.  Continue with
  siblings of outer loops instead.  */
   if (changed)
-return true;
+{
+  /* If we are recorded as father clear all other fathers that
+ are necessarily covered already to avoid redundant work.  */
+  if (bitmap_bit_p (father_bbs, loop->header->index))
+   {
+ if (flag_checking)
+   {
+ bitmap_iterator bi;
+ unsigned i;
+ EXECUTE_IF_SET_IN_BITMAP (father_bbs, 0, i, bi)
+   {
+ loop_p floop = BASIC_BLOCK_FOR_FN (cfun, i)->loop_father;
+ gcc_assert (floop == loop
+ || flow_loop_nested_p (loop, floop));
+   }
+   }
+ bitmap_clear (father_bbs);
+ bitmap_set_bit (father_bbs, loop->header->index);
+   }
+  return true;
+}
 
   /* Don't unroll #pragma omp simd loops until the vectorizer
  attempts to vectorize those.  */
@@ -1421,7 +1454,23 @@ tree_unroll_loops_completely_1 (bool may_increase_size, 
bool unroll_outer,
 computations; otherwise, the size might blow up before the
 iteration is complete and the IR eventually cleaned up.  */
   if (loop_outer (loop_father))
-   bitmap_set_bit (father_bbs, loop_father->header->index);
+   {
+ /* Once we process our father we will have processed
+the fathers of our children as well, so avoid doing
+redundant work and clear fathers we've gathered sofar.  */
+ if (flag_checking)
+   {
+ bitmap_iterator bi;
+ unsigned i;
+ EXECUTE_IF_SET_IN_BITMAP (father_bbs, 0, i, bi)
+   {
+ loop_p floop = BASIC_BLOCK_FOR_FN (cfun, i)->loop_father;
+ gcc_assert (flow_loop_nested_p (loop_father, floop));
+   }
+   }
+ bitmap_clear (father_bbs);
+ bitmap_set_bit (father_bbs, loop_father->header->index);
+   }
 
   return true;
 }


[RFC] Fix recent popcount change is breaking

2018-07-10 Thread Kugan Vivekanandarajah
Hi,

Jeff told me that the recent popcount built-in detection is causing
kernel build issues as
ERROR: "__popcountsi2"
[drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko] undefined!

I could also reproduce this. AFIK, we should check if the libfunc is
defined while checking popcount?

I am testing the attached RFC patch. Is this reasonable?

Thanks,
Kugan

gcc/ChangeLog:

2018-07-10  Kugan Vivekanandarajah  

* tree-ssa-loop-niter.c (number_of_iterations_popcount): Check
if libfunc for popcount is available.
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index f6fa2f7..2e2b9c6 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
 #include "rtl.h"
 #include "tree.h"
 #include "gimple.h"
@@ -42,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-chrec.h"
 #include "tree-scalar-evolution.h"
 #include "params.h"
+#include "optabs-libfuncs.h"
 #include "tree-dfa.h"
 
 
@@ -2570,6 +2572,10 @@ number_of_iterations_popcount (loop_p loop, edge exit,
   (long_long_integer_type_node))
 fn = builtin_decl_implicit (BUILT_IN_POPCOUNTLL);
 
+  /* Check if libfunc for popcount is available.  */
+  if (!optab_libfunc (popcount_optab,
+		      TYPE_MODE (TREE_TYPE (src))))
+    return false;
   /* ??? Support promoting char/short to int.  */
   if (!fn)
 return false;
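
For context, a sketch of the loop idiom the niter analysis now recognizes
(hypothetical example, not from the kernel): the loop below is replaced by
__builtin_popcount, which on a target without a native popcount instruction
expands to a call to libgcc's __popcountsi2, the symbol the kernel does not
provide.

/* Hypothetical example of the detected idiom.  */
unsigned
count_bits (unsigned x)
{
  unsigned n = 0;
  while (x)
    {
      x &= x - 1;   /* clear the lowest set bit */
      n++;
    }
  return n;   /* becomes __builtin_popcount (x) after the optimization */
}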


Re: [PATCH][RFC] Make iterating over hash-map elide copying/destructing

2018-07-10 Thread Richard Biener
On Tue, 10 Jul 2018, Richard Biener wrote:

> On Tue, 10 Jul 2018, Trevor Saunders wrote:
> 
> > On Tue, Jul 10, 2018 at 10:43:20AM +0200, Richard Biener wrote:
> > > 
> > > The following makes the hash-map iterator dereference return a pair<Key, Value&> rather than a copy of Value.  This matches the hash-table iterator
> > > behavior and avoids issues with
> > > 
> > >   hash_map<K, auto_vec<V> >
> > 
> > Eventually somebody's probably going to want
> > hash_map<auto_vec<K>, auto_vec<V>> too, so we might as well go ahead
> > and make it pair<Key&, Value&>?
> > 
> > > where iterating over the hash-table will call the auto_vec destructor
> > > when dereferencing the iterator.  I note that the copy ctor of
> > > auto_vec should probably be deleted and the hash-table/map iterators
> > > should possibly support an alternate "reference" type to the stored
> > > Values so we can use vec<> for "references" and auto_vec<> for
> > > stored members.
> > 
> > I think code somewhere uses the auto_vec copy ctor to return an auto_vec,
> > this is pretty similar to the situation with unique_ptr in c++98 mode.
> > 
> > > But that's out of scope - the patch below seems to survive minimal
> > > testing at least.
> > > 
> > > I suppose we still want to somehow hide the copy ctors of auto_vec?
> > 
> > I suspect the best we can do is delete it in c++11 mode and provide a
> > auto_vec(auto_vec &&) move ctor instead.  Though I think for the
> > case where auto_vec has inline storage we should be able to just delete
> > the copy ctor?
> > 
> > > How does hash-map growth work here?  (I suppose it doesn't...?)
> > 
> > Yeah was going to ask, I think hash_table memcpy's the elements? in
> > which case memcpying a pointer into yourself isn't going to work.
> 
> It doesn't work.  It uses assignment but auto_vec doesn't implement
> that so auto-storage breaks.  So you say it should use
> std::move<> where that's obviously not available for us :/
> 
> > However I think if you use the auto_vec specialization for 0 internal
> > elements that should be able to work if we null out the old auto_vec or
> > avoid running dtors on the old elements.
> 
> Well, then I don't really need auto_vec, I'm more interested in the
> embedded storage than the destructor ;)
> 
> > > Any further comments?
> > 
> > other than using a reference for the key type seems good.
> 
> OK, I suppose it should be 'const Key&' then (hopefully that
> works for Key == const X / X * as intended).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-07-10  Richard Biener  

* hash-map.h (hash_map::iterator::operator*): Return
references to key and value.

diff --git a/gcc/hash-map.h b/gcc/hash-map.h
index 7861440f3b3..39848289d80 100644
--- a/gcc/hash-map.h
+++ b/gcc/hash-map.h
@@ -223,10 +223,10 @@ public:
   return *this;
 }
 
-    std::pair<Key, Value> operator* ()
+    std::pair<const Key &, Value &> operator* ()
     {
       hash_entry &e = *m_iter;
-      return std::pair<Key, Value> (e.m_key, e.m_value);
+      return std::pair<const Key &, Value &> (e.m_key, e.m_value);
 }
 
 bool
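
An illustration (not part of the patch) of why returning the Value by copy was
a problem: with a map whose values are auto_vec's, every dereference of the
old iterator created a temporary auto_vec whose destructor ran at the end of
the statement.  The element types below are only an example of the
problematic combination discussed above.

static void
dump_uses (hash_map<tree, auto_vec<gimple *> > &uses)
{
  for (hash_map<tree, auto_vec<gimple *> >::iterator it = uses.begin ();
       it != uses.end (); ++it)
    /* (*it).second now refers to the stored vector, not a copy of it.  */
    fprintf (stderr, "%u uses\n", (*it).second.length ());
}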


[PATCH] Fix PR86457

2018-07-10 Thread Richard Biener


The following fixes PR86457.  I also removed some leftover tests
of dwarf2out_as_loc_support in favor of output_asm_line_debug_info.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-07-10  Richard Biener  

PR debug/86457
* dwarf2out.c (init_sections_and_labels): Use
output_asm_line_debug_info consistently.
(dwarf2out_early_finish): Likewise.
(dwarf2out_finish): Remove DW_AT_stmt_list from early generated
type units.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 262538)
+++ gcc/dwarf2out.c (working copy)
@@ -28483,7 +28483,7 @@ init_sections_and_labels (bool early_lto
   debug_str_section = get_section (DEBUG_LTO_STR_SECTION,
   DEBUG_STR_SECTION_FLAGS
   | SECTION_EXCLUDE, NULL);
-  if (!dwarf_split_debug_info && !dwarf2out_as_loc_support)
+  if (!dwarf_split_debug_info && !output_asm_line_debug_info ())
debug_line_str_section
  = get_section (DEBUG_LTO_LINE_STR_SECTION,
 DEBUG_STR_SECTION_FLAGS | SECTION_EXCLUDE, NULL);
@@ -31125,9 +31125,9 @@ dwarf2out_finish (const char *)
  if (*slot != HTAB_EMPTY_ENTRY)
continue;
 
- /* Add a pointer to the line table for the main compilation unit
-so that the debugger can make sense of DW_AT_decl_file
-attributes.  */
+ /* Remove the pointer to the line table.  */
+ remove_AT (ctnode->root_die, DW_AT_stmt_list);
+
  if (debug_info_level >= DINFO_LEVEL_TERSE)
reset_dies (ctnode->root_die);
 
@@ -31819,7 +31819,7 @@ dwarf2out_early_finish (const char *file
 
   /* When emitting DWARF5 .debug_line_str, move DW_AT_name and
  DW_AT_comp_dir into .debug_line_str section.  */
-  if (!dwarf2out_as_loc_support
+  if (!output_asm_line_debug_info ()
   && dwarf_version >= 5
   && DWARF5_USE_DEBUG_LINE_STR)
 {


Re: [PATCH] alpha: Use TARGET_COMPUTE_FRAME_LAYOUT

2018-07-10 Thread Richard Biener
On Tue, Jul 10, 2018 at 2:50 PM Richard Henderson  wrote:
>
> On 07/10/2018 12:05 AM, Richard Biener wrote:
> > On Mon, Jul 9, 2018 at 9:05 PM Richard Henderson  wrote:
> >>
> >> At the same time, merge several related frame computing functions.
> >> Recall that HWI is now always 64-bit, so merge IMASK and FMASK,
> >> which allows merging of several loops within prologue and epilogue.
> >
> > Btw, if you're not using these with existing HWI APIs it is now prefered
> > to use [u]int64_t where appropriate.
>
> I do use both popcount_hwi and ctz_hwi.  Enough to swing the preference?

Yes.  Getting rid of HWI in APIs is difficult, and s/HWI/uint64_t/ loses
in my eye...

Richard.

>
> r~


Re: [PATCH] alpha: Use TARGET_COMPUTE_FRAME_LAYOUT

2018-07-10 Thread Richard Henderson
On 07/10/2018 12:05 AM, Richard Biener wrote:
> On Mon, Jul 9, 2018 at 9:05 PM Richard Henderson  wrote:
>>
>> At the same time, merge several related frame computing functions.
>> Recall that HWI is now always 64-bit, so merge IMASK and FMASK,
>> which allows merging of several loops within prologue and epilogue.
> 
> Btw, if you're not using these with existing HWI APIs it is now prefered
> to use [u]int64_t where appropriate.

I do use both popcount_hwi and ctz_hwi.  Enough to swing the preference?


r~


Re: [PATCH 1/2] Add "optinfo" framework

2018-07-10 Thread Richard Biener
On Tue, Jul 10, 2018 at 1:00 PM David Malcolm  wrote:
>
> On Mon, 2018-07-09 at 15:00 +0200, Richard Biener wrote:
> > On Mon, Jul 2, 2018 at 10:51 PM David Malcolm 
> > wrote:
> > >
> > > This patch implements a way to consolidate dump_* calls into
> > > optinfo objects, as enabling work towards being able to write out
> > > optimization records to a file, or emit them as diagnostic
> > > "remarks".
> > >
> > > The patch adds the support for building optinfo instances from
> > > dump_*
> > > calls, but leaves implementing any *users* of them to followup
> > > patches.
> > >
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > >
> > > OK for trunk?
> >
> > Looks good overall, but ...
> >
> > To "fix" the GC issue you'd need to capture all possibly interesting
> > information from tree/gimple while it is still in flight.  This _may_
> > be
> > necessary anyway since I remember writing code like
> >
> >   fprintf (dump_file, "old: ");
> >   print_gimple_stmt (..., stmt);
> >   gimple_set_rhs1 (stmt, op);
> >   fprintf (dump_file, "new: ");
> >   print_gmple_stmt (..., stmt);
> >   fprintf (dump_file, "\n");
> >
> > capturing interesting information means we know all targeted
> > optinfo channels, right?  And the optinfo consumers
> > need to handle "streams" of input and may not look back.
>
> > I've yet have to look at the 2nd patch but can you comment on
> > this?  How difficult is it to re-wire how the data flows to make
> > stmt re-use like the above possible?
>
> I *think* it's doable: rather than capture, say, a gimple *, the
> optinfo_item would capture the result of pp_gimple_stmt_1, plus some
> metadata.  In fact, it would probably allow for removing the
> optinfo_item subclasses, making optinfo_item concrete, containing
> something like:
>
>   /* Textual form.  */
>   char *m_text;
>   bool m_ownership_of_text;
>
>   /* Metadata for optimization records.  */
>   enum optinfo_item_kind m_kind;
>   location_t m_location;
>
> or somesuch.
>
> I'll have a go at implementing this.

Thanks, that would be much cleaner (if also a bit more fugly
when you need to debug things)

Richard.
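
A rough sketch of the "capture at emission time" idea under discussion (the
function name and the optinfo_item field it feeds are hypothetical; this is
not the patch): pretty-print the statement immediately and keep only the text
plus metadata, so later modification or collection of the gimple cannot affect
the record.

/* Hypothetical sketch only.  */
static char *
capture_stmt_text (gimple *stmt)
{
  pretty_printer pp;
  pp_needs_newline (&pp) = true;
  pp_gimple_stmt_1 (&pp, stmt, 0, TDF_SLIM);   /* print the stmt now */
  return xstrdup (pp_formatted_text (&pp));    /* stored as e.g. m_text */
}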

> Thanks
> Dave
>
> > Thanks,
> > Richard.
> >
> > > gcc/ChangeLog:
> > > * Makefile.in (OBJS): Add optinfo.o.
> > > * coretypes.h (class symtab_node): New forward decl.
> > > (struct cgraph_node): New forward decl.
> > > (class varpool_node): New forward decl.
> > > * dump-context.h: New file.
> > > * dumpfile.c: Include "optinfo.h", "dump-context.h",
> > > "cgraph.h",
> > > "tree-pass.h", "optinfo-internal.h".
> > > (refresh_dumps_are_enabled): Use optinfo_enabled_p.
> > > (set_dump_file): Call
> > > dumpfile_ensure_any_optinfo_are_flushed.
> > > (set_alt_dump_file): Likewise.
> > > (dump_context::~dump_context): New dtor.
> > > (dump_gimple_stmt): Move implementation to...
> > > (dump_context::dump_gimple_stmt): ...this new member
> > > function.
> > > Add the stmt to any pending optinfo, creating one if need
> > > be.
> > > (dump_gimple_stmt_loc): Move implementation to...
> > > (dump_context::dump_gimple_stmt_loc): ...this new member
> > > function.
> > > Convert param "loc" from location_t to const
> > > dump_location_t &.
> > > Start a new optinfo and add the stmt to it.
> > > (dump_generic_expr): Move implementation to...
> > > (dump_context::dump_generic_expr): ...this new member
> > > function.
> > > Add the tree to any pending optinfo, creating one if need
> > > be.
> > > (dump_generic_expr_loc): Move implementation to...
> > > (dump_context::dump_generic_expr_loc): ...this new member
> > > function.  Add the tree to any pending optinfo, creating
> > > one if
> > > need be.
> > > (dump_printf): Move implementation to...
> > > (dump_context::dump_printf_va): ...this new member
> > > function.  Add
> > > the text to any pending optinfo, creating one if need be.
> > > (dump_printf_loc): Move implementation to...
> > > (dump_context::dump_printf_loc_va): ...this new member
> > > function.
> > > Convert param "loc" from location_t to const
> > > dump_location_t &.
> > > Start a new optinfo and add the stmt to it.
> > > (dump_dec): Move implementation to...
> > > (dump_context::dump_dec): ...this new member function.  Add
> > > the
> > > value to any pending optinfo, creating one if need be.
> > > (dump_context::dump_symtab_node): New member function.
> > > (dump_context::get_scope_depth): New member function.
> > > (dump_context::begin_scope): New member function.
> > > (dump_context::end_scope): New member function.
> > > (dump_context::ensure_pending_optinfo): New member
> > > function.
> > > (dump_context::begin_next_optinfo): New member function.
> > > 

Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Richard Biener
On Tue, Jul 10, 2018 at 12:53 PM Richard Earnshaw (lists)
 wrote:
>
> On 10/07/18 11:10, Richard Biener wrote:
> > On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
> >  wrote:
> >>
> >> On 10/07/18 08:19, Richard Biener wrote:
> >>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
> >>>  wrote:
> 
> 
>  The patches I posted earlier this year for mitigating against
>  CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>  which it became obvious that a rethink was needed.  This mail, and the
>  following patches attempt to address that feedback and present a new
>  approach to mitigating against this form of attack surface.
> 
>  There were two major issues with the original approach:
> 
>  - The speculation bounds were too tightly constrained - essentially
>    they had to represent and upper and lower bound on a pointer, or a
>    pointer offset.
>  - The speculation constraints could only cover the immediately preceding
>    branch, which often did not fit well with the structure of the existing
>    code.
> 
>  An additional criticism was that the shape of the intrinsic did not
>  fit particularly well with systems that used a single speculation
>  barrier that essentially had to wait until all preceding speculation
>  had to be resolved.
> 
>  To address all of the above, these patches adopt a new approach, based
>  in part on a posting by Chandler Carruth to the LLVM developers list
>  (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>  but which we have extended to deal with inter-function speculation.
>  The patches divide the problem into two halves.
> 
>  The first half is some target-specific code to track the speculation
>  condition through the generated code to provide an internal variable
>  which can tell us whether or not the CPU's control flow speculation
>  matches the data flow calculations.  The idea is that the internal
>  variable starts with the value TRUE and if the CPU's control flow
>  speculation ever causes a jump to the wrong block of code the variable
>  becomes false until such time as the incorrect control flow
>  speculation gets unwound.
> 
>  The second half is that a new intrinsic function is introduced that is
>  much simpler than we had before.  The basic version of the intrinsic
>  is now simply:
> 
>    T var = __builtin_speculation_safe_value (T unsafe_var);
> 
>  Full details of the syntax can be found in the documentation patch, in
>  patch 1.  In summary, when not speculating the intrinsic returns
>  unsafe_var; when speculating then if it can be shown that the
>  speculative flow has diverged from the intended control flow then zero
>  is returned.  An optional second argument can be used to return an
>  alternative value to zero.  The builtin may cause execution to pause
>  until the speculation state is resolved.
> >>>
> >>> So a trivial target implementation would be to emit a barrier and then
> >>> it would always return unsafe_var and never zero.  What I don't understand
> >>> fully is what users should do here, thus what the value of ever returning
> >>> "unsafe" is.  Also I wonder why the API is forcing you to single-out a
> >>> special value instead of doing
> >>>
> >>>  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
> >>>  if (!safe)
> >>>/* what now? */
> >>>
> >>> I'm only guessing that the correct way to handle "unsafe" is basically
> >>>
> >>>  while (__builtin_speculation_safe_value (val) == 0)
> >>> ;
> >>>
> >>> use val, it's now safe
> >>
> >> No, a safe version of val is returned, not a bool telling you it is now
> >> safe to use the original.
> >
> > OK, so making the old value dead is required to preserve the desired
> > dataflow.
> >
> > But how should I use the special value that signaled "failure"?
> >
> > Obviously the user isn't supposed to simply replace 'val' with
> >
> >  val = __builtin_speculation_safe_value (val);
> >
> > to make it speculation-proof.  So - how should the user _use_ this
> > builtin?  The docs do not say anything about this but says the
> > very confusing
> >
> > +The function may use target-dependent speculation tracking state to cause
> > +@var{failval} to be returned when it is known that speculative
> > +execution has incorrectly predicted a conditional branch operation.
> >
> > because speculation is about executing instructions as if they were
> > supposed to be executed.  Once it is known the prediction was wrong
> > no more "wrong" instructions will be executed but a previously
> > speculated instruction cannot know it was "falsely" speculated.
> >
> > Does the above try to say that the function may return failval if the
> > instruction is currently executed speculatively instead?  That would
> > make sense to me.  And return failval 
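
To make the intended use of the proposed intrinsic concrete, here is the
bounds-check pattern the series is aimed at (hypothetical array and function
names; the builtin is the one proposed in this series, not an existing GCC
builtin):

#define N 64
static int array[N];

int
load_element (unsigned idx)
{
  if (idx < N)
    /* Architecturally this is just array[idx]; if the branch was
       mispredicted and we are running speculatively down the wrong path,
       the builtin yields 0 instead, so no attacker-chosen index is used.  */
    return array[__builtin_speculation_safe_value (idx)];
  return 0;
}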

allow thread_through_all_blocks() to start from the same initial BB

2018-07-10 Thread Aldy Hernandez

I believe I missed this companion patch when I submitted...

   Subject: jump threading multiple paths that start from the same BB

The attached patch changes thread_through_all_blocks to allow threads 
that start from the same basic block as another thread.
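
For illustration, a (hypothetical) source pattern where two jump-thread paths
start from the same block, one per outgoing edge of the first conditional:

int
f (int x, int a, int b)
{
  int r = (x == 0) ? a : b;
  /* ... code that does not change x ... */
  if (x == 0)   /* same predicate again: both edges above are threadable */
    return r + 1;
  return r - 1;
}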


OK for trunk?
gcc/

* tree-ssa-threadupdate.c (thread_through_all_blocks): Do not jump
	thread twice from the same starting edge.

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 17f9b89d6a7..8080dff76d0 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2428,6 +2428,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   unsigned int i;
   struct loop *loop;
   auto_bitmap threaded_blocks;
+  hash_set<edge> visited_starting_edges;
 
   if (!paths.exists ())
 {
@@ -2473,10 +2474,17 @@ thread_through_all_blocks (bool may_peel_loop_headers)
 	  continue;
 	}
 
-  /* Do not jump-thread twice from the same block.  */
-  if (bitmap_bit_p (threaded_blocks, entry->src->index)
-	  /* We may not want to realize this jump thread path
-	 for various reasons.  So check it first.  */
+  /* Do not jump-thread twice from the same starting edge.
+
+	 Previously we only checked that we weren't threading twice
+	 from the same BB, but that was too restrictive.  Imagine a
+	 path that starts from GIMPLE_COND(x_123 == 0,...), where both
+	 edges out of this conditional yield paths that can be
+	 threaded (for example, both lead to an x_123==0 or x_123!=0
+	 conditional further down the line).  */
+  if (visited_starting_edges.contains (entry)
+	  /* We may not want to realize this jump thread path for
+	 various reasons.  So check it first.  */
 	  || !valid_jump_thread_path (path))
 	{
 	  /* Remove invalid FSM jump-thread paths.  */
@@ -2496,7 +2504,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
 	{
 	  /* We do not update dominance info.  */
 	  free_dominance_info (CDI_DOMINATORS);
-	  bitmap_set_bit (threaded_blocks, entry->src->index);
+	  visited_starting_edges.add (entry);
 	  retval = true;
 	  thread_stats.num_threaded_edges++;
 	}
@@ -2514,7 +2522,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   edge entry = (*path)[0]->e;
 
   /* Do not jump-thread twice from the same block.  */
-  if (bitmap_bit_p (threaded_blocks, entry->src->index))
+  if (visited_starting_edges.contains (entry))
 	{
 	  delete_jump_thread_path (path);
 	  paths.unordered_remove (i);
@@ -2523,8 +2531,6 @@ thread_through_all_blocks (bool may_peel_loop_headers)
 	i++;
 }
 
-  bitmap_clear (threaded_blocks);
-
   mark_threaded_blocks (threaded_blocks);
 
   initialize_original_copy_tables ();



Re: [PATCH 1/2] Add "optinfo" framework

2018-07-10 Thread David Malcolm
On Mon, 2018-07-09 at 15:00 +0200, Richard Biener wrote:
> On Mon, Jul 2, 2018 at 10:51 PM David Malcolm 
> wrote:
> > 
> > This patch implements a way to consolidate dump_* calls into
> > optinfo objects, as enabling work towards being able to write out
> > optimization records to a file, or emit them as diagnostic
> > "remarks".
> > 
> > The patch adds the support for building optinfo instances from
> > dump_*
> > calls, but leaves implementing any *users* of them to followup
> > patches.
> > 
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > 
> > OK for trunk?
> 
> Looks good overall, but ...
> 
> To "fix" the GC issue you'd need to capture all possibly interesting
> information from tree/gimple while it is still in flight.  This _may_
> be
> necessary anyway since I remember writing code like
> 
>   fprintf (dump_file, "old: ");
>   print_gimple_stmt (..., stmt);
>   gimple_set_rhs1 (stmt, op);
>   fprintf (dump_file, "new: ");
>   print_gmple_stmt (..., stmt);
>   fprintf (dump_file, "\n");
> 
> capturing interesting information means we know all targeted
> optinfo channels, right?  And the optinfo consumers
> need to handle "streams" of input and may not look back.

> I've yet have to look at the 2nd patch but can you comment on
> this?  How difficult is it to re-wire how the data flows to make
> stmt re-use like the above possible?

I *think* it's doable: rather than capture, say, a gimple *, the
optinfo_item would capture the result of pp_gimple_stmt_1, plus some
metadata.  In fact, it would probably allow for removing the
optinfo_item subclasses, making optinfo_item concrete, containing
something like:

  /* Textual form.  */
  char *m_text;
  bool m_ownership_of_text;

  /* Metadata for optimization records.  */
  enum optinfo_item_kind m_kind;
  location_t m_location;

or somesuch.

I'll have a go at implementing this.

Thanks
Dave

> Thanks,
> Richard.
> 
> > gcc/ChangeLog:
> > * Makefile.in (OBJS): Add optinfo.o.
> > * coretypes.h (class symtab_node): New forward decl.
> > (struct cgraph_node): New forward decl.
> > (class varpool_node): New forward decl.
> > * dump-context.h: New file.
> > * dumpfile.c: Include "optinfo.h", "dump-context.h",
> > "cgraph.h",
> > "tree-pass.h", "optinfo-internal.h".
> > (refresh_dumps_are_enabled): Use optinfo_enabled_p.
> > (set_dump_file): Call
> > dumpfile_ensure_any_optinfo_are_flushed.
> > (set_alt_dump_file): Likewise.
> > (dump_context::~dump_context): New dtor.
> > (dump_gimple_stmt): Move implementation to...
> > (dump_context::dump_gimple_stmt): ...this new member
> > function.
> > Add the stmt to any pending optinfo, creating one if need
> > be.
> > (dump_gimple_stmt_loc): Move implementation to...
> > (dump_context::dump_gimple_stmt_loc): ...this new member
> > function.
> > Convert param "loc" from location_t to const
> > dump_location_t &.
> > Start a new optinfo and add the stmt to it.
> > (dump_generic_expr): Move implementation to...
> > (dump_context::dump_generic_expr): ...this new member
> > function.
> > Add the tree to any pending optinfo, creating one if need
> > be.
> > (dump_generic_expr_loc): Move implementation to...
> > (dump_context::dump_generic_expr_loc): ...this new member
> > function.  Add the tree to any pending optinfo, creating
> > one if
> > need be.
> > (dump_printf): Move implementation to...
> > (dump_context::dump_printf_va): ...this new member
> > function.  Add
> > the text to any pending optinfo, creating one if need be.
> > (dump_printf_loc): Move implementation to...
> > (dump_context::dump_printf_loc_va): ...this new member
> > function.
> > Convert param "loc" from location_t to const
> > dump_location_t &.
> > Start a new optinfo and add the stmt to it.
> > (dump_dec): Move implementation to...
> > (dump_context::dump_dec): ...this new member function.  Add
> > the
> > value to any pending optinfo, creating one if need be.
> > (dump_context::dump_symtab_node): New member function.
> > (dump_context::get_scope_depth): New member function.
> > (dump_context::begin_scope): New member function.
> > (dump_context::end_scope): New member function.
> > (dump_context::ensure_pending_optinfo): New member
> > function.
> > (dump_context::begin_next_optinfo): New member function.
> > (dump_context::end_any_optinfo): New member function.
> > (dump_context::s_current): New global.
> > (dump_context::s_default): New global.
> > (dump_scope_depth): Delete global.
> > (dumpfile_ensure_any_optinfo_are_flushed): New function.
> > (dump_symtab_node): New function.
> > (get_dump_scope_depth): Reimplement in terms of
> > dump_context.

Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Richard Earnshaw (lists)
On 10/07/18 11:10, Richard Biener wrote:
> On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
>  wrote:
>>
>> On 10/07/18 08:19, Richard Biener wrote:
>>> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
>>>  wrote:


 The patches I posted earlier this year for mitigating against
 CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
 which it became obvious that a rethink was needed.  This mail, and the
 following patches attempt to address that feedback and present a new
 approach to mitigating against this form of attack surface.

 There were two major issues with the original approach:

 - The speculation bounds were too tightly constrained - essentially
   they had to represent and upper and lower bound on a pointer, or a
   pointer offset.
 - The speculation constraints could only cover the immediately preceding
   branch, which often did not fit well with the structure of the existing
   code.

 An additional criticism was that the shape of the intrinsic did not
 fit particularly well with systems that used a single speculation
 barrier that essentially had to wait until all preceding speculation
 had to be resolved.

 To address all of the above, these patches adopt a new approach, based
 in part on a posting by Chandler Carruth to the LLVM developers list
 (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
 but which we have extended to deal with inter-function speculation.
 The patches divide the problem into two halves.

 The first half is some target-specific code to track the speculation
 condition through the generated code to provide an internal variable
 which can tell us whether or not the CPU's control flow speculation
 matches the data flow calculations.  The idea is that the internal
 variable starts with the value TRUE and if the CPU's control flow
 speculation ever causes a jump to the wrong block of code the variable
 becomes false until such time as the incorrect control flow
 speculation gets unwound.

 The second half is that a new intrinsic function is introduced that is
 much simpler than we had before.  The basic version of the intrinsic
 is now simply:

   T var = __builtin_speculation_safe_value (T unsafe_var);

 Full details of the syntax can be found in the documentation patch, in
 patch 1.  In summary, when not speculating the intrinsic returns
 unsafe_var; when speculating then if it can be shown that the
 speculative flow has diverged from the intended control flow then zero
 is returned.  An optional second argument can be used to return an
 alternative value to zero.  The builtin may cause execution to pause
 until the speculation state is resolved.
>>>
>>> So a trivial target implementation would be to emit a barrier and then
>>> it would always return unsafe_var and never zero.  What I don't understand
>>> fully is what users should do here, thus what the value of ever returning
>>> "unsafe" is.  Also I wonder why the API is forcing you to single-out a
>>> special value instead of doing
>>>
>>>  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
>>>  if (!safe)
>>>/* what now? */
>>>
>>> I'm only guessing that the correct way to handle "unsafe" is basically
>>>
>>>  while (__builtin_speculation_safe_value (val) == 0)
>>> ;
>>>
>>> use val, it's now safe
>>
>> No, a safe version of val is returned, not a bool telling you it is now
>> safe to use the original.
> 
> OK, so making the old value dead is required to preserve the desired
> dataflow.
> 
> But how should I use the special value that signaled "failure"?
> 
> Obviously the user isn't supposed to simply replace 'val' with
> 
>  val = __builtin_speculation_safe_value (val);
> 
> to make it speculation-proof.  So - how should the user _use_ this
> builtin?  The docs do not say anything about this but says the
> very confusing
> 
> +The function may use target-dependent speculation tracking state to cause
> +@var{failval} to be returned when it is known that speculative
> +execution has incorrectly predicted a conditional branch operation.
> 
> because speculation is about executing instructions as if they were
> supposed to be executed.  Once it is known the prediction was wrong
> no more "wrong" instructions will be executed but a previously
> speculated instruction cannot know it was "falsely" speculated.
> 
> Does the above try to say that the function may return failval if the
> instruction is currently executed speculatively instead?  That would
> make sense to me.  And return failval independent of if the speculation
> later turns out to be correct or not.
> 
>>  You must use the sanitized version in future,
>> not the unprotected version.
>>
>>
>> So the usage is going to be more like:
>>
>> val = __builtin_speculation_safe_value (val);  // 

abstract gimple_call_nonnull*() from vr-values

2018-07-10 Thread Aldy Hernandez

Ho hum, more abstractions.

No change in functionality.

OK for trunk?

gcc/

* vr-values.c (gimple_stmt_nonzero_p): Abstract common code to...
* gimple.c (gimple_call_nonnull_result_p): ...here...
(gimple_call_nonnull_arg): ...and here.
* gimple.h (gimple_call_nonnull_result_p): New.
(gimple_call_nonnull_arg): New.

diff --git a/gcc/gimple.c b/gcc/gimple.c
index afdf583256c..8d56a966cc1 100644
--- a/gcc/gimple.c
+++ b/gcc/gimple.c
@@ -1548,6 +1548,57 @@ gimple_call_return_flags (const gcall *stmt)
 }
 
 
+/* Return true if call STMT is known to return a non-zero result.  */
+
+bool
+gimple_call_nonnull_result_p (gcall *call)
+{
+  tree fndecl = gimple_call_fndecl (call);
+  if (!fndecl)
+return false;
+  if (flag_delete_null_pointer_checks && !flag_check_new
+  && DECL_IS_OPERATOR_NEW (fndecl)
+  && !TREE_NOTHROW (fndecl))
+return true;
+
+  /* References are always non-NULL.  */
+  if (flag_delete_null_pointer_checks
+  && TREE_CODE (TREE_TYPE (fndecl)) == REFERENCE_TYPE)
+return true;
+
+  if (flag_delete_null_pointer_checks
+  && lookup_attribute ("returns_nonnull",
+			   TYPE_ATTRIBUTES (gimple_call_fntype (call))))
+    return true;
+  return gimple_alloca_call_p (call);
+}
+
+
+/* If CALL returns a non-null result in an argument, return that arg.  */
+
+tree
+gimple_call_nonnull_arg (gcall *call)
+{
+  tree fndecl = gimple_call_fndecl (call);
+  if (!fndecl)
+return NULL_TREE;
+
+  unsigned rf = gimple_call_return_flags (call);
+  if (rf & ERF_RETURNS_ARG)
+{
+  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+  if (argnum < gimple_call_num_args (call))
+	{
+	  tree arg = gimple_call_arg (call, argnum);
+	  if (SSA_VAR_P (arg)
+	  && infer_nonnull_range_by_attribute (call, arg))
+	return arg;
+	}
+}
+  return NULL_TREE;
+}
+
+
 /* Return true if GS is a copy assignment.  */
 
 bool
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 32e1908c534..a5dda9369bc 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1488,6 +1488,8 @@ bool gimple_call_same_target_p (const gimple *, const gimple *);
 int gimple_call_flags (const gimple *);
 int gimple_call_arg_flags (const gcall *, unsigned);
 int gimple_call_return_flags (const gcall *);
+bool gimple_call_nonnull_result_p (gcall *);
+tree gimple_call_nonnull_arg (gcall *);
 bool gimple_assign_copy_p (gimple *);
 bool gimple_assign_ssa_name_copy_p (gimple *);
 bool gimple_assign_unary_nop_p (gimple *);
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 32f64e047af..bba170f341b 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -313,35 +313,9 @@ gimple_stmt_nonzero_p (gimple *stmt)
   return gimple_assign_nonzero_p (stmt);
 case GIMPLE_CALL:
   {
-	tree fndecl = gimple_call_fndecl (stmt);
-	if (!fndecl) return false;
-	if (flag_delete_null_pointer_checks && !flag_check_new
-	&& DECL_IS_OPERATOR_NEW (fndecl)
-	&& !TREE_NOTHROW (fndecl))
-	  return true;
-	/* References are always non-NULL.  */
-	if (flag_delete_null_pointer_checks
-	&& TREE_CODE (TREE_TYPE (fndecl)) == REFERENCE_TYPE)
-	  return true;
-	if (flag_delete_null_pointer_checks && 
-	lookup_attribute ("returns_nonnull",
-			  TYPE_ATTRIBUTES (gimple_call_fntype (stmt))))
-	  return true;
-
-	gcall *call_stmt = as_a <gcall *> (stmt);
-	unsigned rf = gimple_call_return_flags (call_stmt);
-	if (rf & ERF_RETURNS_ARG)
-	  {
-	unsigned argnum = rf & ERF_RETURN_ARG_MASK;
-	if (argnum < gimple_call_num_args (call_stmt))
-	  {
-		tree arg = gimple_call_arg (call_stmt, argnum);
-		if (SSA_VAR_P (arg)
-		&& infer_nonnull_range_by_attribute (stmt, arg))
-		  return true;
-	  }
-	  }
-	return gimple_alloca_call_p (stmt);
+	gcall *call_stmt = as_a <gcall *> (stmt);
+	return (gimple_call_nonnull_result_p (call_stmt)
+		|| gimple_call_nonnull_arg (call_stmt));
   }
 default:
   gcc_unreachable ();
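
An illustration (not part of the patch) of the kind of calls these helpers
recognize; value-range propagation can then fold null checks on their results.
The function name below is made up.

extern char *get_name (void) __attribute__ ((returns_nonnull));

int
f (void)
{
  char *p = get_name ();   /* gimple_call_nonnull_result_p returns true */
  return p == 0;           /* can be folded to 0 (given
			      -fdelete-null-pointer-checks) */
}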


Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Richard Biener
On Tue, Jul 10, 2018 at 10:39 AM Richard Earnshaw (lists)
 wrote:
>
> On 10/07/18 08:19, Richard Biener wrote:
> > On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
> >  wrote:
> >>
> >>
> >> The patches I posted earlier this year for mitigating against
> >> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
> >> which it became obvious that a rethink was needed.  This mail, and the
> >> following patches attempt to address that feedback and present a new
> >> approach to mitigating against this form of attack surface.
> >>
> >> There were two major issues with the original approach:
> >>
> >> - The speculation bounds were too tightly constrained - essentially
> >>   they had to represent and upper and lower bound on a pointer, or a
> >>   pointer offset.
> >> - The speculation constraints could only cover the immediately preceding
> >>   branch, which often did not fit well with the structure of the existing
> >>   code.
> >>
> >> An additional criticism was that the shape of the intrinsic did not
> >> fit particularly well with systems that used a single speculation
> >> barrier that essentially had to wait until all preceding speculation
> >> had to be resolved.
> >>
> >> To address all of the above, these patches adopt a new approach, based
> >> in part on a posting by Chandler Carruth to the LLVM developers list
> >> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
> >> but which we have extended to deal with inter-function speculation.
> >> The patches divide the problem into two halves.
> >>
> >> The first half is some target-specific code to track the speculation
> >> condition through the generated code to provide an internal variable
> >> which can tell us whether or not the CPU's control flow speculation
> >> matches the data flow calculations.  The idea is that the internal
> >> variable starts with the value TRUE and if the CPU's control flow
> >> speculation ever causes a jump to the wrong block of code the variable
> >> becomes false until such time as the incorrect control flow
> >> speculation gets unwound.
> >>
> >> The second half is that a new intrinsic function is introduced that is
> >> much simpler than we had before.  The basic version of the intrinsic
> >> is now simply:
> >>
> >>   T var = __builtin_speculation_safe_value (T unsafe_var);
> >>
> >> Full details of the syntax can be found in the documentation patch, in
> >> patch 1.  In summary, when not speculating the intrinsic returns
> >> unsafe_var; when speculating then if it can be shown that the
> >> speculative flow has diverged from the intended control flow then zero
> >> is returned.  An optional second argument can be used to return an
> >> alternative value to zero.  The builtin may cause execution to pause
> >> until the speculation state is resolved.
> >
> > So a trivial target implementation would be to emit a barrier and then
> > it would always return unsafe_var and never zero.  What I don't understand
> > fully is what users should do here, thus what the value of ever returning
> > "unsafe" is.  Also I wonder why the API is forcing you to single-out a
> > special value instead of doing
> >
> >  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
> >  if (!safe)
> >/* what now? */
> >
> > I'm only guessing that the correct way to handle "unsafe" is basically
> >
> >  while (__builtin_speculation_safe_value (val) == 0)
> > ;
> >
> > use val, it's now safe
>
> No, a safe version of val is returned, not a bool telling you it is now
> safe to use the original.

OK, so making the old value dead is required to preserve the desired
dataflow.

But how should I use the special value that signaled "failure"?

Obviously the user isn't supposed to simply replace 'val' with

 val = __builtin_speculation_safe_value (val);

to make it speculation-proof.  So - how should the user _use_ this
builtin?  The docs do not say anything about this but says the
very confusing

+The function may use target-dependent speculation tracking state to cause
+@var{failval} to be returned when it is known that speculative
+execution has incorrectly predicted a conditional branch operation.

because speculation is about executing instructions as if they were
supposed to be executed.  Once it is known the prediction was wrong
no more "wrong" instructions will be executed but a previously
speculated instruction cannot know it was "falsely" speculated.

Does the above try to say that the function may return failval if the
instruction is currently executed speculatively instead?  That would
make sense to me.  And return failval independent of if the speculation
later turns out to be correct or not.

>  You must use the sanitized version in future,
> not the unprotected version.
>
>
> So the usage is going to be more like:
>
> val = __builtin_speculation_safe_value (val);  // Overwrite val with a
> sanitized version.
>
> You have to use the cleaned up version, the unclean version is still
> 

[PATCH 2/2] condition_variable: Use steady_clock to implement wait_for

2018-07-10 Thread Mike Crowe
I believe[1][2] that the C++ standard says that
std::condition_variable::wait_for should be implemented to be equivalent
to:

 return wait_until(lock, chrono::steady_clock::now() + rel_time);

But the existing implementation uses chrono::system_clock. Now that
wait_until has potentially-different behaviour for chrono::steady_clock,
let's at least try to wait using the correct clock.

[1] https://en.cppreference.com/w/cpp/thread/condition_variable/wait_for
[2] https://github.com/cplusplus/draft/blob/master/source/threads.tex
---
 libstdc++-v3/ChangeLog  | 3 +++
 libstdc++-v3/include/std/condition_variable | 5 +++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index ea7875ace9f..4500273ace7 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,4 +1,7 @@
 2018-07-09  Mike Crowe 
+   * include/std/condition_variable (wait_for): Use steady_clock.
+
+2018-07-09  Mike Crowe 
* include/std/condition_variable (wait_until): Only report timeout
if we really have timed out when measured against the
caller-supplied clock.
diff --git a/libstdc++-v3/include/std/condition_variable 
b/libstdc++-v3/include/std/condition_variable
index a2d146a9b09..ce583990b9d 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -65,6 +65,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   class condition_variable
   {
 typedef chrono::system_clock   __clock_t;
+typedef chrono::steady_clock   __steady_clock_t;
 typedef __gthread_cond_t   __native_type;

 #ifdef __GTHREAD_COND_INIT
@@ -142,11 +143,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   wait_for(unique_lock& __lock,
   const chrono::duration<_Rep, _Period>& __rtime)
   {
-   using __dur = typename __clock_t::duration;
+   using __dur = typename __steady_clock_t::duration;
auto __reltime = chrono::duration_cast<__dur>(__rtime);
if (__reltime < __rtime)
  ++__reltime;
-   return wait_until(__lock, __clock_t::now() + __reltime);
+   return wait_until(__lock, __steady_clock_t::now() + __reltime);
   }

 template
--
2.11.0



[PATCH 1/2] condition_variable: Report early wakeup of wait_until as no_timeout

2018-07-10 Thread Mike Crowe
As currently implemented, condition_variable always ultimately waits
against std::chrono::system_clock. This clock can be changed in arbitrary
ways by the user which may result in us waking up too early or too late
when measured against the caller-supplied clock.

We can't (yet) do much about waking up too late[1], but
if we wake up too early we must return cv_status::no_timeout to indicate a
spurious wakeup rather than incorrectly returning cv_status::timeout.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41861
---
 libstdc++-v3/ChangeLog  | 5 +
 libstdc++-v3/include/std/condition_variable | 8 +++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog
index cceef0271ae..ea7875ace9f 100644
--- a/libstdc++-v3/ChangeLog
+++ b/libstdc++-v3/ChangeLog
@@ -1,3 +1,8 @@
+2018-07-09  Mike Crowe 
+   * include/std/condition_variable (wait_until): Only report timeout
+   if we really have timed out when measured against the
+   caller-supplied clock.
+
 2018-07-06  François Dumont  

* include/debug/functions.h (__gnu_debug::__check_string): Move...
diff --git a/libstdc++-v3/include/std/condition_variable 
b/libstdc++-v3/include/std/condition_variable
index 84863a162d6..a2d146a9b09 100644
--- a/libstdc++-v3/include/std/condition_variable
+++ b/libstdc++-v3/include/std/condition_variable
@@ -116,7 +116,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
const auto __delta = __atime - __c_entry;
const auto __s_atime = __s_entry + __delta;

-   return __wait_until_impl(__lock, __s_atime);
+   // We might get a timeout when measured against __clock_t but
+   // we need to check against the caller-supplied clock to tell
+   // whether we should return a timeout.
+   if (__wait_until_impl(__lock, __s_atime) == cv_status::timeout)
+ return _Clock::now() < __atime ? cv_status::no_timeout : 
cv_status::timeout;
+   else
+ return cv_status::no_timeout;
   }

 template
--
2.11.0
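
A small usage sketch of the behaviour the two condition_variable patches are
after (illustrative only): the deadline is expressed on the caller's clock,
and a timeout must only be reported once that clock has really passed the
deadline, even though the wait may internally be performed against a
different clock.

#include <condition_variable>
#include <mutex>
#include <chrono>
#include <cassert>

std::condition_variable cv;
std::mutex m;

void
wait_a_bit ()
{
  std::unique_lock<std::mutex> lock (m);
  auto deadline = std::chrono::steady_clock::now ()
		  + std::chrono::milliseconds (100);
  std::cv_status s = cv.wait_until (lock, deadline);
  if (s == std::cv_status::timeout)
    /* With patch 1/2, a reported timeout really means the steady_clock
       deadline has passed, even if the system clock was stepped.  */
    assert (std::chrono::steady_clock::now () >= deadline);
}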



Re: [AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]

2018-07-10 Thread Sudakshina

Hi Jackson


On Tuesday 10 July 2018 09:37 AM, Jackson Woodruff wrote:

Hi all,

This patch removes some duplicated code.  Since this method deals with 
four loads or stores, there is a lot of duplicated code that can 
easily be replaced with smaller loops.


Regtest and bootstrap OK.

OK for trunk?

Thanks,

Jackson

Changelog:

gcc/

2018-06-28  Jackson Woodruff  

    * config/aarch64/aarch64.c 
(aarch64_operands_adjust_ok_for_ldpstp):

    Use arrays instead of numbered variables.


Thank you for doing this. This looks a lot neater now.
I am not a maintainer but I noticed a couple of nits:

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
01f35f8e8525adb455780269757452c8c3eb20be..d0e9b2d464183eecc8cc7639ca3e981d2ff243ba 
100644

--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17026,23 +17026,21 @@ bool
 aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
                scalar_mode mode)
 {
-  enum reg_class rclass_1, rclass_2, rclass_3, rclass_4;
-  HOST_WIDE_INT offvals[4], msize;
-  rtx mem_1, mem_2, mem_3, mem_4, reg_1, reg_2, reg_3, reg_4;
-  rtx base_1, base_2, base_3, base_4, offset_1, offset_2, offset_3, 
offset_4;

+  const int num_instructions = 4;
+  enum reg_class rclass[num_instructions];
+  HOST_WIDE_INT offvals[num_instructions], msize;
+  rtx mem[num_instructions], reg[num_instructions],
+  base[num_instructions], offset[num_instructions];
...
   /* Skip if memory operand is by itself valid for ldp/stp.  */
-  if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
+  if (!MEM_P (mem[1]) || aarch64_mem_pair_operand (mem[1], mode))

mem_1 == mem[1]?

 return false;

-  /* The mems cannot be volatile.  */
...

/* If we have SImode and slow unaligned ldp,
  check the alignment to be at least 8 byte. */
   if (mode == SImode
   && (aarch64_tune_params.extra_tuning_flags
-  & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
+      & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
   && !optimize_size
-  && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
+  && MEM_ALIGN (mem[1]) < 8 * BITS_PER_UNIT)

Likewise
...
   /* Check if the registers are of same class.  */
-  if (rclass_1 != rclass_2 || rclass_2 != rclass_3 || rclass_3 != rclass_4)
-    return false;
+  for (int i = 0; i < 3; i++)

num_instructions - 1 instead of 3 would be more consistent.

+    if (rclass[i] != rclass[i + 1])
+  return false;

It looks good otherwise.

Thanks
Sudi



Re: [AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]

2018-07-10 Thread Kyrill Tkachov

Hi Jackson,

On 10/07/18 09:37, Jackson Woodruff wrote:

Hi all,

This patch removes some duplicated code.  Since this method deals with
four loads or stores, there is a lot of duplicated code that can easily
be replaced with smaller loops.

Regtest and bootstrap OK.

OK for trunk?



This looks like a good cleanup. There are no functional changes, right?
Looks good to me, but you'll need approval from a maintainer.

Thanks,
Kyrill


Thanks,

Jackson

Changelog:

gcc/

2018-06-28  Jackson Woodruff 

 * config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp):
 Use arrays instead of numbered variables.





Re: [PATCH][RFC] Make iterating over hash-map elide copying/destructing

2018-07-10 Thread Richard Biener
On Tue, 10 Jul 2018, Trevor Saunders wrote:

> On Tue, Jul 10, 2018 at 10:43:20AM +0200, Richard Biener wrote:
> > 
> > The following makes the hash-map iterator dereference return a pair<Key, Value&> rather than a copy of Value.  This matches the hash-table iterator
> > behavior and avoids issues with
> > 
> >   hash_map<K, auto_vec<V> >
> 
> Eventually somebody's probably going to want
> hash_map<auto_vec<K>, auto_vec<V>> too, so we might as well go ahead
> and make it pair<Key&, Value&>?
> 
> > where iterating over the hash-table will call the auto_vec destructor
> > when dereferencing the iterator.  I note that the copy ctor of
> > auto_vec should probably be deleted and the hash-table/map iterators
> > should possibly support an alternate "reference" type to the stored
> > Values so we can use vec<> for "references" and auto_vec<> for
> > stored members.
> 
> I think code somewhere uses the auto_vec copy ctor to return an auto_vec,
> this is pretty similar to the situation with unique_ptr in c++98 mode.
> 
> > But that's out of scope - the patch below seems to survive minimal
> > testing at least.
> > 
> > I suppose we still want to somehow hide the copy ctors of auto_vec?
> 
> I suspect the best we can do is delete it in c++11 mode and provide a
> auto_vec(auto_vec &&) move ctor instead.  Though I think for the
> case where auto_vec has inline storage we should be able to just delete
> the copy ctor?
> 
> > How does hash-map growth work here?  (I suppose it doesn't...?)
> 
> Yeah was going to ask, I think hash_table memcpy's the elements? in
> which case memcpying a pointer into yourself isn't going to work.

It doesn't work.  It uses assignment but auto_vec doesn't implement
that so auto-storage breaks.  So you say it should use
std::move<> where that's obviously not available for us :/

> However I think if you use the auto_vec specialization for 0 internal
> elements that should be able to work if we null out the old auto_vec or
> avoid running dtors on the old elements.

Well, then I don't really need auto_vec, I'm more interested in the
embedded storage than the destructor ;)

> > Any further comments?
> 
> other than using a reference for the key type seems good.

OK, I suppose it should be 'const Key&' then (hopefully that
works for Key == const X / X * as intended).

I guess given the expansion problem I'm going to re-think using
auto_vec for now :/

Can we please move to C++11? ;)

Richard.


Re: [PATCH][RFC] Make iterating over hash-map elide copying/destructing

2018-07-10 Thread Trevor Saunders
On Tue, Jul 10, 2018 at 10:43:20AM +0200, Richard Biener wrote:
> 
> The following makes the hash-map iterator dereference return a pair<Key, Value&> rather than a copy of Value.  This matches the hash-table iterator
> behavior and avoids issues with
> 
>   hash_map<tree, auto_vec<tree> >

Eventually somebody's probably going to want
hash_map<auto_vec<...>, auto_vec<...>> too, so we might as well go ahead
and make it pair<Key&, Value&>?

> where iterating over the hash-table will call the auto_vec destructor
> when dereferencing the iterator.  I note that the copy ctor of
> auto_vec should probably be deleted and the hash-table/map iterators
> should possibly support an alternate "reference" type to the stored
> Values so we can use vec<> for "references" and auto_vec<> for
> stored members.

I think code somewhere uses the auto_vec copy ctor to return a auto_vec,
this is pretty similar to the situation with unique_ptr in c++98 mode.

> But that's out of scope - the patch below seems to survive minimal
> testing at least.
> 
> I suppose we still want to somehow hide the copy ctors of auto_vec?

I suspect the best we can do is delete it in c++11 mode and provide a
auto_vec(auto_vec &&) move ctor instead.  Though I think for the
case where auto_vec has inline storage we should be able to just delete
the copy ctor?

> How does hash-map growth work here?  (I suppose it doesn't...?)

Yeah was going to ask, I think hash_table memcpy's the elements? in
which case memcpying a pointer into yourself isn't going to work.
However I think if you use the auto_vec specialization for 0 internal
elements that should be able to work if we null out the old auto_vec or
avoid running dtors on the old elements.

> Any further comments?

other than using a reference for the key type seems good.

thanks

trev



Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM

2018-07-10 Thread Thomas Preudhomme
Adding Jeff and Eric since the patch adds an RTL target hook.

Best regards,

Thomas

On Thu, 5 Jul 2018 at 15:48, Thomas Preudhomme
 wrote:
>
> In case of high register pressure in PIC mode, address of the stack
> protector's guard can be spilled on ARM targets as shown in PR85434,
> thus allowing an attacker to control what the canary would be compared
> against. ARM does lack stack_protect_set and stack_protect_test insn
> patterns, defining them does not help as the address is expanded
> regularly and the patterns only deal with the copy and test of the
> guard with the canary.
>
> This problem does not occur for x86 targets because the PIC access and
> the test can be done in the same instruction. Aarch64 is exempt too
> because its PIC access insn patterns are movs of an UNSPEC, which prevents
> the second access in the epilogue from being CSEd in the cse_local pass with
> the first access in the prologue.
>
> The approach followed here is to create new "combined" set and test
> standard pattern names that take the unexpanded guard and do the set or
> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> to hide the individual instructions being generated to the compiler and
> split the pattern into generic load, compare and branch instruction
> after register allocator, therefore avoiding any spilling. This is here
> implemented for the ARM targets. For targets not implementing these new
> standard pattern names, the existing stack_protect_set and
> stack_protect_test pattern names are used.
>
> To be able to split PIC access after register allocation, the functions
> had to be augmented to force a new PIC register load and to control
> which register it loads into. This is because sharing the PIC register
> between prologue and epilogue could lead to spilling due to CSE again
> which an attacker could use to control what the canary gets compared
> against.
>
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2018-07-05  Thomas Preud'homme  
>
> PR target/85434
> * target-insns.def (stack_protect_combined_set): Define new standard
> pattern name.
> (stack_protect_combined_test): Likewise.
> * cfgexpand.c (stack_protect_prologue): Try new
> stack_protect_combined_set pattern first.
> * function.c (stack_protect_epilogue): Try new
> stack_protect_combined_test pattern first.
> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> parameters to control which register to use as PIC register and force
> reloading PIC register respectively.
> (legitimize_pic_address): Expose above new parameters in prototype and
> adapt recursive calls accordingly.
> (arm_legitimize_address): Adapt to new legitimize_pic_address
> prototype.
> (thumb_legitimize_address): Likewise.
> (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> change.
> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> prototype change.
> (stack_protect_combined_set): New insn_and_split pattern.
> (stack_protect_set): New insn pattern.
> (stack_protect_combined_test): New insn_and_split pattern.
> (stack_protect_test): New insn pattern.
> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> (UNSPEC_SP_TEST): Likewise.
> * doc/md.texi (stack_protect_combined_set): Document new standard
> pattern name.
> (stack_protect_set): Clarify that the operand for guard's address is
> legal.
> (stack_protect_combined_test): Document new standard pattern name.
> (stack_protect_test): Clarify that the operand for guard's address is
> legal.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-07-05  Thomas Preud'homme  
>
> PR target/85434
> * gcc.target/arm/pr85434.c: New test.
>
> Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
> Aarch64. Testsuite shows no regression on these 3 variants either both
> with default flags and with -fstack-protector-all.
>
> Is this ok for trunk? If yes, would this be acceptable as a backport to
> GCC 6, 7 and 8 provided that no regression is found?
>
> Best regards,
>
> Thomas
From d917d48c2005e46154383589f203d06f3c6167e0 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme 
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, defining them does not help as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the 

Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Richard Earnshaw (lists)
On 10/07/18 00:13, Jeff Law wrote:
> On 07/09/2018 10:38 AM, Richard Earnshaw wrote:
>>
>> The patches I posted earlier this year for mitigating against
>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>> which it became obvious that a rethink was needed.  This mail, and the
>> following patches attempt to address that feedback and present a new
>> approach to mitigating against this form of attack surface.
>>
>> There were two major issues with the original approach:
>>
>> - The speculation bounds were too tightly constrained - essentially
>>   they had to represent an upper and lower bound on a pointer, or a
>>   pointer offset.
>> - The speculation constraints could only cover the immediately preceding
>>   branch, which often did not fit well with the structure of the existing
>>   code.
>>
>> An additional criticism was that the shape of the intrinsic did not
>> fit particularly well with systems that used a single speculation
>> barrier that essentially had to wait until all preceding speculation
>> had to be resolved.
> Right.  I suggest the Intel and IBM reps chime in on the updated semantics.
> 

Yes, logically, this is a boolean tracker value.  In practice we use ~0
for true and 0 for false, so that we can simply use it as a mask
operation later.
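
(To make the mask encoding concrete, a minimal sketch -- hypothetical code,
not from the patches -- of how the ~0/0 tracker is meant to be consumed:

  unsigned long tracker = ~0ul;   /* all ones while speculation is correct */
  /* ... the target pass flips it to 0 on a mis-speculated path ... */
  unsigned long safe_idx = idx & tracker;   /* collapses to 0 when mis-speculating */

i.e. the tracker can be ANDed straight into a vulnerable value rather than
needing a separate branch.)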

I hope this intrinsic will be even more acceptable than the one that
Bill Schmidt acked previously, it's even simpler than the version we had
last time.

>>
>> To address all of the above, these patches adopt a new approach, based
>> in part on a posting by Chandler Carruth to the LLVM developers list
>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>> but which we have extended to deal with inter-function speculation.
>> The patches divide the problem into two halves.
> We're essentially turning the control dependency into a value that we
> can then use to munge the pointer or the resultant data.
> 
>>
>> The first half is some target-specific code to track the speculation
>> condition through the generated code to provide an internal variable
>> which can tell us whether or not the CPU's control flow speculation
>> matches the data flow calculations.  The idea is that the internal
>> variable starts with the value TRUE and if the CPU's control flow
>> speculation ever causes a jump to the wrong block of code the variable
>> becomes false until such time as the incorrect control flow
>> speculation gets unwound.
> Right.
> 
> So one of the things that comes immediately to mind is you have to run
> this early enough that you can still get to all the control flow and
> build your predicates.  Otherwise you have do undo stuff like
> conditional move generation.

No, the opposite, in fact.  We want to run this very late, at least on
Arm systems (AArch64 or AArch32).  Conditional move instructions are
fine - they're data-flow operations, not control flow (in fact, that's
exactly what the control flow tracker instructions are).  By running it
late we avoid disrupting any of the earlier optimization passes as well.

> 
> On the flip side, the earlier you do this mitigation, the more you have
> to worry about what the optimizers are going to do to the code later in
> the pipeline.  It's almost guaranteed a naive implementation is going to
> muck this up since we can propagate the state of the condition into the
> arms which will make the predicate state a compile time constant.
> 
> In fact this seems to be running into the area of pointer provenance and
> some discussions we had around atomic a few years back.
> 
> I also wonder if this could be combined with taint analysis to produce a
> much lower overhead solution in cases were developers have done analysis
> and know what objects are potentially under attacker control.  So
> instead of analyzing everything, we can have a much narrower focus.

Automatic application of the tracker to vulnerable variables would be
nice, but I haven't attempted to go there yet: at present I still rely
on the user to annotate code with the new intrinsic.

That doesn't mean that we couldn't extend the overall approach later to
include automatic tracking.

> 
> The pointer munging could well run afoul of alias analysis engines that
> don't expect to be seeing those kind of operations.

I think the pass runs late enough that it isn't a problem.

> 
> Anyway, just some initial high level thoughts.  I'm sure there'll be
> more as I read the implementation.
> 

Thanks for starting to look at this so quickly.

R.

> 
> Jeff
> 



[PATCH][RFC] Make iterating over hash-map elide copying/destructing

2018-07-10 Thread Richard Biener


The following makes the hash-map iterator dereference return a pair<Key, Value&> rather than a copy of Value.  This matches the hash-table iterator
behavior and avoids issues with

  hash_map<tree, auto_vec<tree> >

where iterating over the hash-table will call the auto_vec destructor
when dereferencing the iterator.  I note that the copy ctor of
auto_vec should probably be deleted and the hash-table/map iterators
should possibly support an alternate "reference" type to the stored
Values so we can use vec<> for "references" and auto_vec<> for
stored members.
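
(As a standalone C++ illustration of the difference -- not GCC's hash_map,
just the shape of the problem -- returning the mapped value by value copies
and destroys it on every dereference, while returning a reference leaves the
stored object alone:

  #include <utility>
  #include <vector>

  struct entry { int key; std::vector<int> value; };

  /* Copy: the temporary pair owns its own vector, which is destroyed at the
     end of the full expression.  */
  std::pair<int, std::vector<int> > deref_copy (entry &e)
  { return std::pair<int, std::vector<int> > (e.key, e.value); }

  /* Reference: no copy, no destructor run on the stored vector.  */
  std::pair<int, std::vector<int> &> deref_ref (entry &e)
  { return std::pair<int, std::vector<int> &> (e.key, e.value); }

With an auto_vec-style type whose destructor frees storage, the by-value
variant is what releases the element's storage out from under the table.)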

But that's out of scope - the patch below seems to survive minimal
testing at least.

I suppose we still want to somehow hide the copy ctors of auto_vec?
How does hash-map growth work here?  (I suppose it doesn't...?)

Any further comments?

Thanks,
Richard.

2018-07-10  Richard Biener  

* hash-map.h (hash_map::iterator::operator*): Return
a reference to Value.

diff --git a/gcc/hash-map.h b/gcc/hash-map.h
index 7861440f3b3..9d2b38a843e 100644
--- a/gcc/hash-map.h
+++ b/gcc/hash-map.h
@@ -223,10 +223,10 @@ public:
   return *this;
 }
 
-std::pair<Key, Value> operator* ()
+std::pair<Key, Value&> operator* ()
 {
   hash_entry &e = *m_iter;
-  return std::pair<Key, Value> (e.m_key, e.m_value);
+  return std::pair<Key, Value&> (e.m_key, e.m_value);
 }
 
 bool



Re: [patch] jump threading multiple paths that start from the same BB

2018-07-10 Thread Aldy Hernandez




On 07/09/2018 03:56 PM, Jeff Law wrote:

On 07/09/2018 01:19 AM, Aldy Hernandez wrote:


I'd like decisions about how to expand branches deferred until rtl
expansion.  Kai was poking at this in the past but never really got any
traction.


For the record, the problem in this testcase is that switch lowering is
riddled with back end specific knowledge (GET_MODE_SIZE uses as well as
some rtx cost hacks).

Yea.  Switch lowering is going to have some of these as well, though I
think BRANCH_COST is more pervasive.





Many tests should turn into gimple IL tests.


Yeah, though for tests like the threading ones, they're already
sufficiently convoluted that turning them into gimple IL tests will make
them even harder to read.  Oh well, I guess?

It might make them harder to read, but it would guarantee consistent
gimple fed into the optimizer across our targets which in turn ought to
result in consistent behavior by the optimizer which in turn should
simplify the test and make them more consistent over time.


Ok.  When I submit my queued up range based changes to the threader I'll 
see if I can convert a big chunk of the threader tests to gimple IL.


Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Richard Earnshaw (lists)
On 10/07/18 08:19, Richard Biener wrote:
> On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
>  wrote:
>>
>>
>> The patches I posted earlier this year for mitigating against
>> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
>> which it became obvious that a rethink was needed.  This mail, and the
>> following patches attempt to address that feedback and present a new
>> approach to mitigating against this form of attack surface.
>>
>> There were two major issues with the original approach:
>>
>> - The speculation bounds were too tightly constrained - essentially
>>   they had to represent an upper and lower bound on a pointer, or a
>>   pointer offset.
>> - The speculation constraints could only cover the immediately preceding
>>   branch, which often did not fit well with the structure of the existing
>>   code.
>>
>> An additional criticism was that the shape of the intrinsic did not
>> fit particularly well with systems that used a single speculation
>> barrier that essentially had to wait until all preceding speculation
>> had to be resolved.
>>
>> To address all of the above, these patches adopt a new approach, based
>> in part on a posting by Chandler Carruth to the LLVM developers list
>> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
>> but which we have extended to deal with inter-function speculation.
>> The patches divide the problem into two halves.
>>
>> The first half is some target-specific code to track the speculation
>> condition through the generated code to provide an internal variable
>> which can tell us whether or not the CPU's control flow speculation
>> matches the data flow calculations.  The idea is that the internal
>> variable starts with the value TRUE and if the CPU's control flow
>> speculation ever causes a jump to the wrong block of code the variable
>> becomes false until such time as the incorrect control flow
>> speculation gets unwound.
>>
>> The second half is that a new intrinsic function is introduced that is
>> much simpler than we had before.  The basic version of the intrinsic
>> is now simply:
>>
>>   T var = __builtin_speculation_safe_value (T unsafe_var);
>>
>> Full details of the syntax can be found in the documentation patch, in
>> patch 1.  In summary, when not speculating the intrinsic returns
>> unsafe_var; when speculating then if it can be shown that the
>> speculative flow has diverged from the intended control flow then zero
>> is returned.  An optional second argument can be used to return an
>> alternative value to zero.  The builtin may cause execution to pause
>> until the speculation state is resolved.
> 
> So a trivial target implementation would be to emit a barrier and then
> it would always return unsafe_var and never zero.  What I don't understand
> fully is what users should do here, thus what the value of ever returning
> "unsafe" is.  Also I wonder why the API is forcing you to single-out a
> special value instead of doing
> 
>  bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
>  if (!safe)
>/* what now? */
> 
> I'm only guessing that the correct way to handle "unsafe" is basically
> 
>  while (__builtin_speculation_safe_value (val) == 0)
> ;
> 
> use val, it's now safe

No, a safe version of val is returned, not a bool telling you it is now
safe to use the original.  You must use the sanitized version in future,
not the unprotected version.


So the usage is going to be more like:

val = __builtin_speculation_safe_value (val);  // Overwrite val with a
sanitized version.

You have to use the cleaned up version, the unclean version is still
vulnerable to incorrect speculation.
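
For a Spectre-v1 style bounds check the expected shape is then something
like the following (a hedged sketch based on the description above, not a
testcase from the series):

  int load_element (int *array, unsigned idx, unsigned size)
  {
    if (idx < size)
      {
        /* Sanitize idx before it feeds a dependent load; under incorrect
           speculation past the bounds check the builtin yields 0 instead
           of the attacker-controlled value.  */
        idx = __builtin_speculation_safe_value (idx);
        return array[idx];
      }
    return 0;
  }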

R.

> 
> that is, the return value is only interesting in sofar as to whether it is 
> equal
> to val or the special value?
> 
> That said, I wonder why we don't hide that details from the user and
> provide a predicate instead.
> 
> Richard.
> 
>> There are seven patches in this set, as follows.
>>
>> 1) Introduces the new intrinsic __builtin_speculation_safe_value.
>> 2) Adds a basic hard barrier implementation for AArch32 (arm) state.
>> 3) Adds a basic hard barrier implementation for AArch64 state.
>> 4) Adds a new command-line option -mtrack-speculation (currently a no-op).
>> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
>> 6) Adds the new speculation tracking pass for AArch64
>> 7) Uses the new speculation tracking pass to generate CSDB-based barrier
>>sequences
>>
>> I haven't added a speculation-tracking pass for AArch32 at this time.
>> It is possible to do this, but would require quite a lot of rework for
>> the arm backend due to the limited number of registers that are
>> available.
>>
>> Although patch 6 is AArch64 specific, I'd appreciate a review from
>> someone more familiar with the branch edge code than myself.  There
>> appear to be a number of tricky issues with more complex edges so I'd
>> like a second opinion on that code in case I've missed an important
>> 

[AArch64] Generate load-pairs when the last load clobbers the address register [2/2]

2018-07-10 Thread Jackson Woodruff

Hi all,

This patch resolves PR86014.  It does so by noticing that the last load 
may clobber the address register without issue (regardless of where it 
exists in the final ldp/stp sequence).  That check has been changed so 
that the last register may be clobbered and the testcase 
(gcc.target/aarch64/ldp_stp_10.c) now passes.
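
To illustrate the situation (hypothetical source, not the ldp_stp_10.c
testcase): with four consecutive loads the base register may end up being
the destination of the final load, which is fine because the address has
already been consumed by the time that destination is written.

  long f (long *p)
  {
    long a = p[0];
    long b = p[1];
    long c = p[2];
    long d = p[3];      /* the register holding p may be reused for d */
    return a + b + c + d;
  }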


Bootstrap and regtest OK.

OK for trunk?

Jackson

Changelog:

gcc/

2018-06-25  Jackson Woodruff  

    PR target/86014
    * config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp):
    Remove address clobber check on last register.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d0e9b2d464183eecc8cc7639ca3e981d2ff243ba..feffe8ebdbd4efd0ffc09834547767ceec46f4e4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17074,7 +17074,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
   /* Only the last register in the order in which they occur
  may be clobbered by the load.  */
   if (load)
-for (int i = 0; i < num_instructions; i++)
+for (int i = 0; i < num_instructions - 1; i++)
   if (reg_mentioned_p (reg[i], mem[i]))
 	return false;
 


[AArch64] Use arrays and loops rather than numbered variables in aarch64_operands_adjust_ok_for_ldpstp [1/2]

2018-07-10 Thread Jackson Woodruff

Hi all,

This patch removes some duplicated code.  Since this method deals with 
four loads or stores, there is a lot of duplicated code that can easily 
be replaced with smaller loops.


Regtest and bootstrap OK.

OK for trunk?

Thanks,

Jackson

Changelog:

gcc/

2018-06-28  Jackson Woodruff  

    * config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp):
    Use arrays instead of numbered variables.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 01f35f8e8525adb455780269757452c8c3eb20be..d0e9b2d464183eecc8cc7639ca3e981d2ff243ba 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -17026,23 +17026,21 @@ bool
 aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
    scalar_mode mode)
 {
-  enum reg_class rclass_1, rclass_2, rclass_3, rclass_4;
-  HOST_WIDE_INT offvals[4], msize;
-  rtx mem_1, mem_2, mem_3, mem_4, reg_1, reg_2, reg_3, reg_4;
-  rtx base_1, base_2, base_3, base_4, offset_1, offset_2, offset_3, offset_4;
+  const int num_instructions = 4;
+  enum reg_class rclass[num_instructions];
+  HOST_WIDE_INT offvals[num_instructions], msize;
+  rtx mem[num_instructions], reg[num_instructions],
+  base[num_instructions], offset[num_instructions];
 
   if (load)
 {
-  reg_1 = operands[0];
-  mem_1 = operands[1];
-  reg_2 = operands[2];
-  mem_2 = operands[3];
-  reg_3 = operands[4];
-  mem_3 = operands[5];
-  reg_4 = operands[6];
-  mem_4 = operands[7];
-  gcc_assert (REG_P (reg_1) && REG_P (reg_2)
-		  && REG_P (reg_3) && REG_P (reg_4));
+  for (int i = 0; i < num_instructions; i++)
+	{
+	  reg[i] = operands[2 * i];
+	  mem[i] = operands[2 * i + 1];
+
+	  gcc_assert (REG_P (reg[i]));
+	}
 
   /* Do not attempt to merge the loads if the loads clobber each other.  */
   for (int i = 0; i < 8; i += 2)
@@ -17051,53 +17049,48 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
 	return false;
 }
   else
-{
-  mem_1 = operands[0];
-  reg_1 = operands[1];
-  mem_2 = operands[2];
-  reg_2 = operands[3];
-  mem_3 = operands[4];
-  reg_3 = operands[5];
-  mem_4 = operands[6];
-  reg_4 = operands[7];
-}
+for (int i = 0; i < num_instructions; i++)
+  {
+	mem[i] = operands[2 * i];
+	reg[i] = operands[2 * i + 1];
+  }
+
   /* Skip if memory operand is by itslef valid for ldp/stp.  */
-  if (!MEM_P (mem_1) || aarch64_mem_pair_operand (mem_1, mode))
+  if (!MEM_P (mem[1]) || aarch64_mem_pair_operand (mem[1], mode))
 return false;
 
-  /* The mems cannot be volatile.  */
-  if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2)
-  || MEM_VOLATILE_P (mem_3) ||MEM_VOLATILE_P (mem_4))
-return false;
+  for (int i = 0; i < num_instructions; i++)
+{
+  /* The mems cannot be volatile.  */
+  if (MEM_VOLATILE_P (mem[i]))
+	return false;
 
-  /* Check if the addresses are in the form of [base+offset].  */
-  extract_base_offset_in_addr (mem_1, &base_1, &offset_1);
-  if (base_1 == NULL_RTX || offset_1 == NULL_RTX)
-return false;
-  extract_base_offset_in_addr (mem_2, &base_2, &offset_2);
-  if (base_2 == NULL_RTX || offset_2 == NULL_RTX)
-return false;
-  extract_base_offset_in_addr (mem_3, &base_3, &offset_3);
-  if (base_3 == NULL_RTX || offset_3 == NULL_RTX)
-return false;
-  extract_base_offset_in_addr (mem_4, &base_4, &offset_4);
-  if (base_4 == NULL_RTX || offset_4 == NULL_RTX)
-return false;
+  /* Check if the addresses are in the form of [base+offset].  */
+  extract_base_offset_in_addr (mem[i], base + i, offset + i);
+  if (base[i] == NULL_RTX || offset[i] == NULL_RTX)
+	return false;
+}
+
+  /* Only the last register in the order in which they occur
+ may be clobbered by the load.  */
+  if (load)
+for (int i = 0; i < num_instructions; i++)
+  if (reg_mentioned_p (reg[i], mem[i]))
+	return false;
 
   /* Check if the bases are same.  */
-  if (!rtx_equal_p (base_1, base_2)
-  || !rtx_equal_p (base_2, base_3)
-  || !rtx_equal_p (base_3, base_4))
-return false;
+  for (int i = 0; i < num_instructions - 1; i++)
+if (!rtx_equal_p (base[i], base[i + 1]))
+  return false;
+
+  for (int i = 0; i < num_instructions; i++)
+offvals[i] = INTVAL (offset[i]);
 
-  offvals[0] = INTVAL (offset_1);
-  offvals[1] = INTVAL (offset_2);
-  offvals[2] = INTVAL (offset_3);
-  offvals[3] = INTVAL (offset_4);
   msize = GET_MODE_SIZE (mode);
 
   /* Check if the offsets can be put in the right order to do a ldp/stp.  */
-  qsort (offvals, 4, sizeof (HOST_WIDE_INT), aarch64_host_wide_int_compare);
+  qsort (offvals, num_instructions, sizeof (HOST_WIDE_INT),
+	 aarch64_host_wide_int_compare);
 
   if (!(offvals[1] == offvals[0] + msize
 	&& offvals[3] == offvals[2] + msize))
@@ -17112,45 +17105,25 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
   if (offvals[0] % msize != offvals[2] % msize)
 return false;
 
-  /* Check if the addresses are 

abstract wide int binop code from VRP

2018-07-10 Thread Aldy Hernandez

Howdy!

Attached are more cleanups to VRP getting rid of some repetitive code, 
as well as abstracting wide int handling code into their own functions. 
There should be no change to existing functionality.


You may notice that I have removed the PLUS/MINUS_EXPR handling in 
vrp_int_const_binop, even from the new abstracted code:


-  /* For addition, the operands must be of the same sign
-to yield an overflow.  Its sign is therefore that
-of one of the operands, for example the first.  */
- || (code == PLUS_EXPR && sgn1 >= 0)
- /* For subtraction, operands must be of
-different signs to yield an overflow.  Its sign is
-therefore that of the first operand or the opposite of
-that of the second operand.  A first operand of 0 counts
-as positive here, for the corner case 0 - (-INF), which
-overflows, but must yield +INF.  */
- || (code == MINUS_EXPR && sgn1 >= 0)

This code is actually unreachable, as the switch above this snippet was 
already aborting if code was not one of the shift or mult/div operators.


Oh yeah, don't blame me for the cryptic comment to 
range_easy_mask_min_max().  That machine language comment was already 
there ;-).
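
For reference, a hedged sketch of how a caller uses the new helper after
this refactor (the operand names here are made up; only the wide_int_binop
signature comes from the patch):

  wide_int res;
  wi::overflow_type ovf;
  signop sign = TYPE_SIGN (TREE_TYPE (op0));
  if (!wide_int_binop (MULT_EXPR, res, wi::to_wide (op0), wi::to_wide (op1),
                       sign, ovf))
    return NULL_TREE;
  return force_fit_type (TREE_TYPE (op0), res, 1, ovf != wi::OVF_NONE);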


OK pending one more round of tests?

Aldy
gcc/

* fold-const.c (int_const_binop_2): Abstract wide int code to...
(wide_int_binop): ...here.
* fold-const.h (wide_int_binop): New.
* tree-vrp.c (vrp_int_const_binop): Call wide_int_binop.
	Remove useless PLUS/MINUS_EXPR case.
(zero_nonzero_bits_from_vr): Move wide int code...
(zero_nonzero_bits_from_bounds): ...here.
(extract_range_from_binary_expr_1): Move mask optimization code...
(range_easy_mask_min_max): ...here.
* tree-vrp.h (zero_nonzero_bits_from_bounds): New.
(range_easy_mask_min_max): New.

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 5b94c700c81..35171c5de08 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -966,21 +966,18 @@ int_binop_types_match_p (enum tree_code code, const_tree type1, const_tree type2
 	 && TYPE_MODE (type1) == TYPE_MODE (type2);
 }
 
-/* Subroutine of int_const_binop_1 that handles two INTEGER_CSTs.  */
-
-static tree
-int_const_binop_2 (enum tree_code code, const_tree parg1, const_tree parg2,
-		   int overflowable)
-{
-  wide_int res;
-  tree t;
-  tree type = TREE_TYPE (parg1);
-  signop sign = TYPE_SIGN (type);
-  wi::overflow_type overflow = wi::OVF_NONE;
+/* Perform binary tree operation CODE on ARG1 and ARG2 and return the
+   result in RES.  If an overflow occurs, it is stored in OVERFLOW.
 
-  wi::tree_to_wide_ref arg1 = wi::to_wide (parg1);
-  wide_int arg2 = wi::to_wide (parg2, TYPE_PRECISION (type));
+   Return TRUE if the operation is handled and was successful.  */
 
+bool
+wide_int_binop (enum tree_code code,
+		wide_int &res, const wide_int &arg1, const wide_int &arg2,
+		signop sign, wi::overflow_type &overflow)
+{
+  wide_int tmp;
+  overflow = wi::OVF_NONE;
   switch (code)
 {
 case BIT_IOR_EXPR:
@@ -999,37 +996,41 @@ int_const_binop_2 (enum tree_code code, const_tree parg1, const_tree parg2,
 case LSHIFT_EXPR:
   if (wi::neg_p (arg2))
 	{
-	  arg2 = -arg2;
+	  tmp = -arg2;
 	  if (code == RSHIFT_EXPR)
 	code = LSHIFT_EXPR;
 	  else
 	code = RSHIFT_EXPR;
 	}
+  else
+tmp = arg2;
 
   if (code == RSHIFT_EXPR)
 	/* It's unclear from the C standard whether shifts can overflow.
 	   The following code ignores overflow; perhaps a C standard
 	   interpretation ruling is needed.  */
-	res = wi::rshift (arg1, arg2, sign);
+	res = wi::rshift (arg1, tmp, sign);
   else
-	res = wi::lshift (arg1, arg2);
+	res = wi::lshift (arg1, tmp);
   break;
 
 case RROTATE_EXPR:
 case LROTATE_EXPR:
   if (wi::neg_p (arg2))
 	{
-	  arg2 = -arg2;
+	  tmp = -arg2;
 	  if (code == RROTATE_EXPR)
 	code = LROTATE_EXPR;
 	  else
 	code = RROTATE_EXPR;
 	}
+  else
+tmp = arg2;
 
   if (code == RROTATE_EXPR)
-	res = wi::rrotate (arg1, arg2);
+	res = wi::rrotate (arg1, tmp);
   else
-	res = wi::lrotate (arg1, arg2);
+	res = wi::lrotate (arg1, tmp);
   break;
 
 case PLUS_EXPR:
@@ -1051,49 +1052,49 @@ int_const_binop_2 (enum tree_code code, const_tree parg1, const_tree parg2,
 case TRUNC_DIV_EXPR:
 case EXACT_DIV_EXPR:
   if (arg2 == 0)
-	return NULL_TREE;
+	return false;
   res = wi::div_trunc (arg1, arg2, sign, &overflow);
   break;
 
 case FLOOR_DIV_EXPR:
   if (arg2 == 0)
-	return NULL_TREE;
+	return false;
   res = wi::div_floor (arg1, arg2, sign, &overflow);
   break;
 
 case CEIL_DIV_EXPR:
   if (arg2 == 0)
-	return NULL_TREE;
+	return false;
   res = wi::div_ceil (arg1, arg2, sign, &overflow);
   break;
 
 case ROUND_DIV_EXPR:
   if (arg2 == 0)
-	return NULL_TREE;
+	return false;
   res = wi::div_round (arg1, arg2, sign, &overflow);
   break;
 
 case TRUNC_MOD_EXPR:
   if (arg2 

Re: [PATCH] Fix -fcompare-debug issue in cp_maybe_instrument_return (PR sanitizer/86406)

2018-07-10 Thread Jakub Jelinek
On Tue, Jul 10, 2018 at 10:01:10AM +0200, Richard Biener wrote:
> On Tue, 10 Jul 2018, Jakub Jelinek wrote:
> > cp_maybe_instrument_return is looking for a return stmt at the end of
> > function to decide whether to omit -fsanitize=return instrumentation or
> > __builtin_unreachable addition.  If a STATEMENT_LIST has a return followed
> > by DEBUG_BEGIN_STMT (or multiple of them), it doesn't find the return
> > though.
> > 
> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> > trunk?
> 
> OK.  This also affects the branch?

8.x only, will commit it there too.

> > 2018-07-10  Jakub Jelinek  
> > 
> > PR sanitizer/86406
> > * cp-gimplify.c (cp_maybe_instrument_return): Skip trailing
> > DEBUG_BEGIN_STMTs.
> > 
> > * g++.dg/ubsan/pr86406.C: New test.

Jakub


Re: [PATCH][OBVIOUS] Add missing Optimization attribute.

2018-07-10 Thread Andre Vieira (lists)
On 09/07/18 09:11, Martin Liška wrote:
> Hi.
> 
> I'm putting back what I accidentally removed.
> 
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-07-09  Martin Liska  
> 
>   * common.opt: Add back wrongly removed attribute.
> ---
>  gcc/common.opt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
Hi Martin,

Re-enabling this option made the test for warnings in gcc.dg/pr84100.c fail.

Maybe the test needs some adjusting?

Cheers,
Andre



Re: [PATCH] Fix -fcompare-debug issue in cp_maybe_instrument_return (PR sanitizer/86406)

2018-07-10 Thread Richard Biener
On Tue, 10 Jul 2018, Jakub Jelinek wrote:

> Hi!
> 
> cp_maybe_instrument_return is looking for a return stmt at the end of
> function to decide whether to omit -fsanitize=return instrumentation or
> __builtin_unreachable addition.  If a STATEMENT_LIST has a return followed
> by DEBUG_BEGIN_STMT (or multiple of them), it doesn't find the return
> though.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?

OK.  This also affects the branch?

Thanks,
Richard.

> 2018-07-10  Jakub Jelinek  
> 
>   PR sanitizer/86406
>   * cp-gimplify.c (cp_maybe_instrument_return): Skip trailing
>   DEBUG_BEGIN_STMTs.
> 
>   * g++.dg/ubsan/pr86406.C: New test.
> 
> --- gcc/cp/cp-gimplify.c.jj   2018-07-05 11:41:51.687718588 +0200
> +++ gcc/cp/cp-gimplify.c  2018-07-09 09:57:16.368775004 +0200
> @@ -1621,6 +1621,13 @@ cp_maybe_instrument_return (tree fndecl)
>   case STATEMENT_LIST:
> {
>   tree_stmt_iterator i = tsi_last (t);
> + while (!tsi_end_p (i))
> +   {
> + tree p = tsi_stmt (i);
> + if (TREE_CODE (p) != DEBUG_BEGIN_STMT)
> +   break;
> + tsi_prev (&i);
> +   }
>   if (!tsi_end_p (i))
> {
>   t = tsi_stmt (i);
> --- gcc/testsuite/g++.dg/ubsan/pr86406.C.jj   2018-07-09 09:58:57.362878125 
> +0200
> +++ gcc/testsuite/g++.dg/ubsan/pr86406.C  2018-07-09 09:58:37.716858063 
> +0200
> @@ -0,0 +1,33 @@
> +// PR sanitizer/86406
> +// { dg-do compile }
> +// { dg-options "-fcompare-debug -fsanitize=undefined -g -O1" }
> +
> +typedef enum { } cmd_status;
> +class ECell;
> +class ECell_const_ptr { };
> +class ECell_ptr
> +{
> +  ECell *mp_element;
> +  ECell *getPointer () const { return mp_element; }
> +public:
> +  operator  ECell_const_ptr () const { return ECell_const_ptr(); }
> +};
> +
> +extern ECell_ptr NULL_CELL;
> +class VwUI_2DCellLayerView;
> +class view_cell_layoutImpl
> +{
> +  cmd_status handleChangeFlags (VwUI_2DCellLayerView *
> +  p_ui_celllayerview,
> +  ECell_const_ptr p_peekCell);
> +  cmd_status openCellLayoutView ();
> +};
> +
> +cmd_status
> +view_cell_layoutImpl::openCellLayoutView ()
> +{
> +  ECell_const_ptr pcell = NULL_CELL;
> +  VwUI_2DCellLayerView *p_user_interface;
> +  return handleChangeFlags (p_user_interface, pcell);
> +  ;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH] Fix -fcompare-debug issue in cp_maybe_instrument_return (PR sanitizer/86406)

2018-07-10 Thread Jakub Jelinek
Hi!

cp_maybe_instrument_return is looking for a return stmt at the end of
function to decide whether to omit -fsanitize=return instrumentation or
__builtin_unreachable addition.  If a STATEMENT_LIST has a return followed
by DEBUG_BEGIN_STMT (or multiple of them), it doesn't find the return
though.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-07-10  Jakub Jelinek  

PR sanitizer/86406
* cp-gimplify.c (cp_maybe_instrument_return): Skip trailing
DEBUG_BEGIN_STMTs.

* g++.dg/ubsan/pr86406.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2018-07-05 11:41:51.687718588 +0200
+++ gcc/cp/cp-gimplify.c2018-07-09 09:57:16.368775004 +0200
@@ -1621,6 +1621,13 @@ cp_maybe_instrument_return (tree fndecl)
case STATEMENT_LIST:
  {
tree_stmt_iterator i = tsi_last (t);
+   while (!tsi_end_p (i))
+ {
+   tree p = tsi_stmt (i);
+   if (TREE_CODE (p) != DEBUG_BEGIN_STMT)
+ break;
+   tsi_prev (&i);
+ }
if (!tsi_end_p (i))
  {
t = tsi_stmt (i);
--- gcc/testsuite/g++.dg/ubsan/pr86406.C.jj 2018-07-09 09:58:57.362878125 
+0200
+++ gcc/testsuite/g++.dg/ubsan/pr86406.C2018-07-09 09:58:37.716858063 
+0200
@@ -0,0 +1,33 @@
+// PR sanitizer/86406
+// { dg-do compile }
+// { dg-options "-fcompare-debug -fsanitize=undefined -g -O1" }
+
+typedef enum { } cmd_status;
+class ECell;
+class ECell_const_ptr { };
+class ECell_ptr
+{
+  ECell *mp_element;
+  ECell *getPointer () const { return mp_element; }
+public:
+  operator  ECell_const_ptr () const { return ECell_const_ptr(); }
+};
+
+extern ECell_ptr NULL_CELL;
+class VwUI_2DCellLayerView;
+class view_cell_layoutImpl
+{
+  cmd_status handleChangeFlags (VwUI_2DCellLayerView *
+  p_ui_celllayerview,
+  ECell_const_ptr p_peekCell);
+  cmd_status openCellLayoutView ();
+};
+
+cmd_status
+view_cell_layoutImpl::openCellLayoutView ()
+{
+  ECell_const_ptr pcell = NULL_CELL;
+  VwUI_2DCellLayerView *p_user_interface;
+  return handleChangeFlags (p_user_interface, pcell);
+  ;
+}

Jakub


[committed] Save/restore OpenMP linear clause modifiers in modules (PR fortran/86421)

2018-07-10 Thread Jakub Jelinek
Hi!

This patch (in a backwards compatible way) handles saving and restoring of
linear clause modifiers.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2018-07-10  Jakub Jelinek  

PR fortran/86421
* module.c (omp_declare_simd_clauses): Add LINEAR with _REF, _VAL and
_UVAL suffixes.
(mio_omp_declare_simd): Save and restore ref, val and uval modifiers
on linear clauses.  Initialize n->where to gfc_current_locus.

* gfortran.dg/vect/pr86421.f90: New test.

--- gcc/fortran/module.c.jj 2018-02-13 09:28:10.0 +0100
+++ gcc/fortran/module.c2018-07-09 18:56:49.595348962 +0200
@@ -4098,6 +4098,9 @@ static const mstring omp_declare_simd_cl
 minit ("UNIFORM", 3),
 minit ("LINEAR", 4),
 minit ("ALIGNED", 5),
+minit ("LINEAR_REF", 33),
+minit ("LINEAR_VAL", 34),
+minit ("LINEAR_UVAL", 35),
 minit (NULL, -1)
 };
 
@@ -4140,7 +4143,10 @@ mio_omp_declare_simd (gfc_namespace *ns,
}
  for (n = ods->clauses->lists[OMP_LIST_LINEAR]; n; n = n->next)
{
- mio_name (4, omp_declare_simd_clauses);
+ if (n->u.linear_op == OMP_LINEAR_DEFAULT)
+   mio_name (4, omp_declare_simd_clauses);
+ else
+   mio_name (32 + n->u.linear_op, omp_declare_simd_clauses);
	 mio_symbol_ref (&n->sym);
	 mio_expr (&n->expr);
}
@@ -4181,11 +4187,20 @@ mio_omp_declare_simd (gfc_namespace *ns,
case 4:
case 5:
  *ptrs[t - 3] = n = gfc_get_omp_namelist ();
+   finish_namelist:
+ n->where = gfc_current_locus;
	 ptrs[t - 3] = &n->next;
	 mio_symbol_ref (&n->sym);
  if (t != 3)
	mio_expr (&n->expr);
  break;
+   case 33:
+   case 34:
+   case 35:
+ *ptrs[1] = n = gfc_get_omp_namelist ();
+ n->u.linear_op = (enum gfc_omp_linear_op) (t - 32);
+ t = 4;
+ goto finish_namelist;
}
}
 }
--- gcc/testsuite/gfortran.dg/vect/pr86421.f90.jj   2018-07-09 
19:09:56.662398875 +0200
+++ gcc/testsuite/gfortran.dg/vect/pr86421.f90  2018-07-09 19:07:57.432240946 
+0200
@@ -0,0 +1,35 @@
+! PR fortran/86421
+! { dg-require-effective-target vect_simd_clones }
+! { dg-additional-options "-fopenmp-simd" }
+! { dg-additional-options "-mavx" { target avx_runtime } }
+
+module mod86421
+  implicit none
+contains
+  subroutine foo(x, y, z)
+real :: x
+integer :: y, z
+!$omp declare simd linear(ref(x)) linear(val(y)) linear(uval(z))
+x = x + y
+z = z + 1
+  end subroutine
+end module mod86421
+
+program pr86421
+  use mod86421
+  implicit none
+  integer :: i, j
+  real :: a(64)
+  j = 0
+  do i = 1, 64
+a(i) = i
+  end do
+  !$omp simd
+  do i = 1, 64
+call foo (a(i), i, j)
+  end do
+  do i = 1, 64
+if (a(i) .ne. (2 * i)) stop 1
+  end do
+  if (j .ne. 64) stop 2
+end program pr86421

Jakub


Re: [PATCH 0/7] Mitigation against unsafe data speculation (CVE-2017-5753)

2018-07-10 Thread Richard Biener
On Mon, Jul 9, 2018 at 6:39 PM Richard Earnshaw
 wrote:
>
>
> The patches I posted earlier this year for mitigating against
> CVE-2017-5753 (Spectre variant 1) attracted some useful feedback, from
> which it became obvious that a rethink was needed.  This mail, and the
> following patches attempt to address that feedback and present a new
> approach to mitigating against this form of attack surface.
>
> There were two major issues with the original approach:
>
> - The speculation bounds were too tightly constrained - essentially
>   they had to represent an upper and lower bound on a pointer, or a
>   pointer offset.
> - The speculation constraints could only cover the immediately preceding
>   branch, which often did not fit well with the structure of the existing
>   code.
>
> An additional criticism was that the shape of the intrinsic did not
> fit particularly well with systems that used a single speculation
> barrier that essentially had to wait until all preceding speculation
> had to be resolved.
>
> To address all of the above, these patches adopt a new approach, based
> in part on a posting by Chandler Carruth to the LLVM developers list
> (https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html),
> but which we have extended to deal with inter-function speculation.
> The patches divide the problem into two halves.
>
> The first half is some target-specific code to track the speculation
> condition through the generated code to provide an internal variable
> which can tell us whether or not the CPU's control flow speculation
> matches the data flow calculations.  The idea is that the internal
> variable starts with the value TRUE and if the CPU's control flow
> speculation ever causes a jump to the wrong block of code the variable
> becomes false until such time as the incorrect control flow
> speculation gets unwound.
>
> The second half is that a new intrinsic function is introduced that is
> much simpler than we had before.  The basic version of the intrinsic
> is now simply:
>
>   T var = __builtin_speculation_safe_value (T unsafe_var);
>
> Full details of the syntax can be found in the documentation patch, in
> patch 1.  In summary, when not speculating the intrinsic returns
> unsafe_var; when speculating then if it can be shown that the
> speculative flow has diverged from the intended control flow then zero
> is returned.  An optional second argument can be used to return an
> alternative value to zero.  The builtin may cause execution to pause
> until the speculation state is resolved.

So a trivial target implementation would be to emit a barrier and then
it would always return unsafe_var and never zero.  What I don't understand
fully is what users should do here, thus what the value of ever returning
"unsafe" is.  Also I wonder why the API is forcing you to single-out a
special value instead of doing

 bool safe = __builtin_speculation_safe_value_p (T unsafe_value);
 if (!safe)
   /* what now? */

I'm only guessing that the correct way to handle "unsafe" is basically

 while (__builtin_speculation_safe_value (val) == 0)
;

use val, it's now safe

that is, the return value is only interesting in sofar as to whether it is equal
to val or the special value?

That said, I wonder why we don't hide that details from the user and
provide a predicate instead.

Richard.

> There are seven patches in this set, as follows.
>
> 1) Introduces the new intrinsic __builtin_speculation_safe_value.
> 2) Adds a basic hard barrier implementation for AArch32 (arm) state.
> 3) Adds a basic hard barrier implementation for AArch64 state.
> 4) Adds a new command-line option -mtrack-speculation (currently a no-op).
> 5) Disables CB[N]Z and TB[N]Z when -mtrack-speculation.
> 6) Adds the new speculation tracking pass for AArch64
> 7) Uses the new speculation tracking pass to generate CSDB-based barrier
>sequences
>
> I haven't added a speculation-tracking pass for AArch32 at this time.
> It is possible to do this, but would require quite a lot of rework for
> the arm backend due to the limited number of registers that are
> available.
>
> Although patch 6 is AArch64 specific, I'd appreciate a review from
> someone more familiar with the branch edge code than myself.  There
> appear to be a number of tricky issues with more complex edges so I'd
> like a second opinion on that code in case I've missed an important
> case.
>
> R.
>
>
>
> Richard Earnshaw (7):
>   Add __builtin_speculation_safe_value
>   Arm - add speculation_barrier pattern
>   AArch64 - add speculation barrier
>   AArch64 - Add new option -mtrack-speculation
>   AArch64 - disable CB[N]Z TB[N]Z when tracking speculation
>   AArch64 - new pass to add conditional-branch speculation tracking
>   AArch64 - use CSDB based sequences if speculation tracking is enabled
>
>  gcc/builtin-types.def |   6 +
>  gcc/builtins.c|  57 
>  gcc/builtins.def  |  20 ++
>  

[committed] Fix OpenMP class iterators in distribute parallel for (PR c++/86443)

2018-07-10 Thread Jakub Jelinek
Hi!

While working on OpenMP 5.0 range-for support, I've discovered that even for
normal class iterators distribute parallel for gimplification ICEs in
several ways (other composite loop constructs work only because class
iterators are not allowed on them).  The problem is that the FEs emit the
code that needs to be done before computing number of the iterations around
the innermost construct, which we then wrap into OMP_PARALLEL and
OMP_DISTRIBUTE and then we want to compute number of iterations on the
OMP_DISTRIBUTE.  The following patch fixes it by detecting these cases and
moving the outer composite constructs right around the innermost one.
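
A hedged example of the kind of loop involved (hypothetical; the committed
testcase is libgomp.c++/for-15.C, which is not reproduced here):

  #include <vector>

  void f (std::vector<int> &v)
  {
    /* The iterator setup emitted by the FE ends up wrapped by OMP_PARALLEL
       and OMP_DISTRIBUTE, which is what the gimplifier now rearranges.  */
  #pragma omp target teams
  #pragma omp distribute parallel for
    for (std::vector<int>::iterator it = v.begin (); it != v.end (); ++it)
      *it += 1;
  }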

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2018-07-10  Jakub Jelinek  

PR c++/86443
* gimplify.c (find_combined_omp_for): Add DATA argument, in addition
to finding the inner OMP_FOR/OMP_SIMD stmt find non-trivial wrappers,
BLOCKs with BLOCK_VARs, OMP_PARALLEL in between, OMP_FOR in between.
(gimplify_omp_for): For composite loops, move outer
OMP_{DISTRIBUTE,TASKLOOP,FOR,PARALLEL} right around innermost
OMP_FOR/OMP_SIMD if there are any non-trivial wrappers.  For class
iterators add any needed clauses.  Allow OMP_FOR_ORIG_DECLS to contain
TREE_LIST for both the original class iterator and the "last" helper
var.  Gimplify OMP_FOR_PRE_BODY before the outermost composite
loop, remember has_decl_expr from outer composite loops for the
innermost OMP_SIMD in TREE_PRIVATE bit on OMP_FOR_INIT.
gcc/c-family/
* c-omp.c (c_omp_check_loop_iv_r, c_omp_check_loop_iv): Allow declv
to contain TREE_LIST for both the original class iterator and the
"last" helper var.
gcc/cp/
* semantics.c (handle_omp_for_class_iterator): Remove lastp argument,
instead of setting *lastp turn orig_declv elt into a TREE_LIST.
(finish_omp_for): Adjust handle_omp_for_class_iterator caller.
* pt.c (tsubst_omp_for_iterator): Allow OMP_FOR_ORIG_DECLS to contain
TREE_LIST for both the original class iterator and the "last" helper
var.
libgomp/
* testsuite/libgomp.c++/for-15.C: New test.

--- gcc/gimplify.c.jj   2018-07-07 09:45:42.133890332 +0200
+++ gcc/gimplify.c  2018-07-09 15:47:14.587400243 +0200
@@ -9532,24 +9532,53 @@ gimplify_omp_task (tree *expr_p, gimple_
 }
 
 /* Helper function of gimplify_omp_for, find OMP_FOR resp. OMP_SIMD
-   with non-NULL OMP_FOR_INIT.  */
+   with non-NULL OMP_FOR_INIT.  Also, fill in pdata array,
+   pdata[0] non-NULL if there is anything non-trivial in between, pdata[1]
+   is address of OMP_PARALLEL in between if any, pdata[2] is address of
+   OMP_FOR in between if any and pdata[3] is address of the inner
+   OMP_FOR/OMP_SIMD.  */
 
 static tree
-find_combined_omp_for (tree *tp, int *walk_subtrees, void *)
+find_combined_omp_for (tree *tp, int *walk_subtrees, void *data)
 {
+  tree **pdata = (tree **) data;
   *walk_subtrees = 0;
   switch (TREE_CODE (*tp))
 {
 case OMP_FOR:
+  if (OMP_FOR_INIT (*tp) != NULL_TREE)
+   {
+ pdata[3] = tp;
+ return *tp;
+   }
+  pdata[2] = tp;
   *walk_subtrees = 1;
-  /* FALLTHRU */
+  break;
 case OMP_SIMD:
   if (OMP_FOR_INIT (*tp) != NULL_TREE)
-   return *tp;
+   {
+ pdata[3] = tp;
+ return *tp;
+   }
   break;
 case BIND_EXPR:
+  if (BIND_EXPR_VARS (*tp)
+ || (BIND_EXPR_BLOCK (*tp)
+ && BLOCK_VARS (BIND_EXPR_BLOCK (*tp))))
+   pdata[0] = tp;
+  *walk_subtrees = 1;
+  break;
 case STATEMENT_LIST:
+  if (!tsi_one_before_end_p (tsi_start (*tp)))
+   pdata[0] = tp;
+  *walk_subtrees = 1;
+  break;
+case TRY_FINALLY_EXPR:
+  pdata[0] = tp;
+  *walk_subtrees = 1;
+  break;
 case OMP_PARALLEL:
+  pdata[1] = tp;
   *walk_subtrees = 1;
   break;
 default:
@@ -9574,6 +9603,115 @@ gimplify_omp_for (tree *expr_p, gimple_s
 
   orig_for_stmt = for_stmt = *expr_p;
 
+  if (OMP_FOR_INIT (for_stmt) == NULL_TREE)
+{
+  tree *data[4] = { NULL, NULL, NULL, NULL };
+  gcc_assert (TREE_CODE (for_stmt) != OACC_LOOP);
  inner_for_stmt = walk_tree (&OMP_FOR_BODY (for_stmt),
+ find_combined_omp_for, data, NULL);
+  if (inner_for_stmt == NULL_TREE)
+   {
+ gcc_assert (seen_error ());
+ *expr_p = NULL_TREE;
+ return GS_ERROR;
+   }
+  if (data[2] && OMP_FOR_PRE_BODY (*data[2]))
+   {
+ append_to_statement_list_force (OMP_FOR_PRE_BODY (*data[2]),
+ &OMP_FOR_PRE_BODY (for_stmt));
+ OMP_FOR_PRE_BODY (*data[2]) = NULL_TREE;
+   }
+  if (OMP_FOR_PRE_BODY (inner_for_stmt))
+   {
+ append_to_statement_list_force (OMP_FOR_PRE_BODY (inner_for_stmt),
+ &OMP_FOR_PRE_BODY 

Re: [PATCH] add support for strnlen (PR 81384)

2018-07-10 Thread Richard Biener
On Mon, Jul 9, 2018 at 11:26 PM Martin Sebor  wrote:
>
> On 07/09/2018 08:36 AM, Aldy Hernandez wrote:
> >{ dg-do run }
> >{ do-options "-O2 -fno-tree-strlen" }  */
> >
> >  I don't think this is doing anything.
> >
> > If you look at the test run you can see that -fno-tree-strlen is never
> > passed (I think you actually mean -fno-optimize-strlen for that
> > matter).  Also, the builtins.exp harness runs your test for an
> > assortment of other flags, not just -O2.
>
> I didn't know the harness ignores dg-options specified in these
> tests.  That's surprising and feels like a bug in the harness
> not to complain about it.  The purpose of the test is to verify
> that the strnlen expansion in builtins.c does the right thing
> and it deliberately tries to disable the earlier strlen
> optimizations to make sure the expansion in builtins.c is fully
> exercised.  By not pointing out my mistake the harness effectively
> let me commit a change without making sure it's thoroughly tested
> (I tested it manually before committing the patch but things could
> regress without us noticing).  I'll look into fixing this somehow.
>
> >
> > This test is failing on my range branch for -Og, because
> > expand_builtin_strnlen() needs range info:
> >
> > +  wide_int min, max;
> > +  enum value_range_type rng = get_range_info (bound, &min, &max);
> > +  if (rng != VR_RANGE)
> > +return NULL_RTX;
> >
> > but interestingly enough, it seems to be calculated in the sprintf
> > pass as part of the DOM walk:
> >
> >   /* First record ranges generated by this statement.  */
> >   evrp_range_analyzer.record_ranges_from_stmt (stmt, false);
> >
> > It feels wrong that the sprintf warning pass is generating range info
> > that you may later depend on at rtl expansion time (and for a totally
> > unrelated thing-- strlen expansion).
>
> Any pass that records ranges for statements will have this
> effect.  The sprintf pass seems to be the first one to make
> use of this utility (and it's not just a warning pass but also
> an optimization pass) but it would be a shame to put it off
> limits to warning-only passes only because it happens to set
> ranges.

As you noted elsewhere warning options shouldn't change code-generation.
This means that ranges may not be set to the IL in case they are only
computed when a warning option is enabled.

Richard.

> >
> > I don't know if this is just a quirk of builtins.exp calling your test
> > with flags you didn't intend, but the inconsistency could cause
> > problems in the future.  Errr, or my present ;-).
> >
> > Would it be too much to ask for you to either fix the flags being
> > passed down to the test, or better yet, find some non-sprintf
> > dependent way of calculating range info earlier?
>
> At the time I wrote the test I didn't realize the statement
> range info was being computed only in the sprintf pass --
> I thought it was done as "a basic service for the greater
> good" by VRP.  It seems that it should be such a service.
>
> Let me look into tweaking the test.
>
> Martin
>
> >
> > Aldy
> > On Mon, Jun 18, 2018 at 6:35 PM Martin Sebor  wrote:
> >>
> >> On 06/12/2018 03:11 PM, Jeff Law wrote:
> >>> On 06/05/2018 03:43 PM, Martin Sebor wrote:
>  The attached patch adds basic support for handling strnlen
>  as a built-in function.  It touches the strlen pass where
>  it folds constant results of the function, and builtins.c
>  to add simple support for expanding strnlen calls with known
>  results.  It also changes calls.c to detect excessive bounds
>  to the function and unsafe calls with arguments declared
>  attribute nonstring.
> 
>  A side-effect of the strlen change I should call out is that
>  strlen() calls to all zero-length arrays that aren't considered
>  flexible array members (i.e., internal members or non-members)
>  are folded into zero.  No warning is issued for such invalid
>  uses of zero-length arrays but based on the responses to my
>  question Re: aliasing between internal zero-length-arrays and
>  other members(*) it sounds like one would be appropriate.
>  I will see about adding one in a separate patch.
> 
>  Martin
> 
>  [*] https://gcc.gnu.org/ml/gcc/2018-06/msg00046.html
> 
>  gcc-81384.diff
> 
> 
>  PR tree-optimization/81384 - built-in form of strnlen missing
> 
>  gcc/ChangeLog:
> 
>   PR tree-optimization/81384
>   * builtin-types.def (BT_FN_SIZE_CONST_STRING_SIZE): New.
>   * builtins.c (expand_builtin_strnlen): New function.
>   (expand_builtin): Call it.
>   (fold_builtin_n): Avoid setting TREE_NO_WARNING.
>   * builtins.def (BUILT_IN_STRNLEN): New.
>   * calls.c (maybe_warn_nonstring_arg): Handle BUILT_IN_STRNLEN.
>   Warn for bounds in excess of maximum object size.
>   * tree-ssa-strlen.c (maybe_set_strlen_range): Return tree 
>  representing
>   

Re: [PATCH, rs6000 v3] enable gimple folding for vec_xl, vec_xst

2018-07-10 Thread Richard Biener
On Mon, Jul 9, 2018 at 9:08 PM Will Schmidt  wrote:
>
> Hi,
>   Re-posting.  Richard provided feedback on a previous version of this
> patch, I wanted to make sure he was/is OK with the latest. :-)
>
> Add support for Gimple folding for unaligned vector loads and stores.
>
> Regtest completed across variety of systems, P6,P7,P8,P9.
>
> [v2] Added the type for the MEM_REF, per feedback.
> Testcases for gimple-folding of the same are currently in-tree
> as powerpc/fold-vec-load-*.c and powerpc/fold-vec-store-*.c.
> Re-tested, still looks good. :-)
>
> [v3] Updated the alignment for the MEM_REF to be 4bytes.
> Updated/added/removed comments in the code for clarity.
>
> OK for trunk?
>
> Thanks
> -Will
>
> [gcc]
>
> 2018-07-09 Will Schmidt 
>
> * config/rs6000/rs6000.c (rs6000_builtin_valid_without_lhs): Add
> vec_xst variants to the list.
> (rs6000_gimple_fold_builtin): Add support for folding unaligned
> vector loads and stores.
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 8bc4109..774c60a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -15401,10 +15401,16 @@ rs6000_builtin_valid_without_lhs (enum 
> rs6000_builtins fn_code)
>  case ALTIVEC_BUILTIN_STVX_V8HI:
>  case ALTIVEC_BUILTIN_STVX_V4SI:
>  case ALTIVEC_BUILTIN_STVX_V4SF:
>  case ALTIVEC_BUILTIN_STVX_V2DI:
>  case ALTIVEC_BUILTIN_STVX_V2DF:
> +case VSX_BUILTIN_STXVW4X_V16QI:
> +case VSX_BUILTIN_STXVW4X_V8HI:
> +case VSX_BUILTIN_STXVW4X_V4SF:
> +case VSX_BUILTIN_STXVW4X_V4SI:
> +case VSX_BUILTIN_STXVD2X_V2DF:
> +case VSX_BUILTIN_STXVD2X_V2DI:
>return true;
>  default:
>return false;
>  }
>  }
> @@ -15910,10 +15916,79 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator 
> *gsi)
> gimple_set_location (g, loc);
> gsi_replace (gsi, g, true);
> return true;
>}
>
> +/* unaligned Vector loads.  */
> +case VSX_BUILTIN_LXVW4X_V16QI:
> +case VSX_BUILTIN_LXVW4X_V8HI:
> +case VSX_BUILTIN_LXVW4X_V4SF:
> +case VSX_BUILTIN_LXVW4X_V4SI:
> +case VSX_BUILTIN_LXVD2X_V2DF:
> +case VSX_BUILTIN_LXVD2X_V2DI:
> +  {
> +arg0 = gimple_call_arg (stmt, 0);  // offset
> +arg1 = gimple_call_arg (stmt, 1);  // address
> +lhs = gimple_call_lhs (stmt);
> +location_t loc = gimple_location (stmt);
> +/* Since arg1 may be cast to a different type, just use ptr_type_node
> +   here instead of trying to enforce TBAA on pointer types.  */
> +tree arg1_type = ptr_type_node;
> +tree lhs_type = TREE_TYPE (lhs);
> +/* in GIMPLE the type of the MEM_REF specifies the alignment.  The
> +  required alignment (power) is 4 bytes regardless of data type.  */
> +tree align_ltype = build_aligned_type (lhs_type, 4);
> +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  
> Create
> +   the tree using the value from arg0.  The resulting type will match
> +   the type of arg1.  */
> +gimple_seq stmts = NULL;
> +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
> +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> +  arg1_type, arg1, temp_offset);
> +gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
> +   take an offset, but since we've already incorporated the offset
> +   above, here we just pass in a zero.  */
> +gimple *g;
> +g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, temp_addr,
> +   build_int_cst (arg1_type, 0)));
> +gimple_set_location (g, loc);
> +gsi_replace (gsi, g, true);
> +return true;
> +  }
> +
> +/* unaligned Vector stores.  */
> +case VSX_BUILTIN_STXVW4X_V16QI:
> +case VSX_BUILTIN_STXVW4X_V8HI:
> +case VSX_BUILTIN_STXVW4X_V4SF:
> +case VSX_BUILTIN_STXVW4X_V4SI:
> +case VSX_BUILTIN_STXVD2X_V2DF:
> +case VSX_BUILTIN_STXVD2X_V2DI:
> +  {
> +arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
> +arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
> +tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
> +location_t loc = gimple_location (stmt);
> +tree arg0_type = TREE_TYPE (arg0);
> +/* Use ptr_type_node (no TBAA) for the arg2_type.  */
> +tree arg2_type = ptr_type_node;
> +/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
> +   the tree using the value from arg0.  The resulting type will match
> +   the type of arg2.  */
> +gimple_seq stmts = NULL;
> +tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
> +tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> +  
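
For reference, a minimal sketch (illustrative only, not part of the posted
patch) of the source the new folding targets, assuming the usual <altivec.h>
forms of vec_xl and vec_xst; after the fold, each call is just a vector
MEM_REF with 4-byte alignment at address pointer + offset.  The function
names below are made up for this example.

  /* Illustrative sketch; not code from the patch.  */
  #include <altivec.h>

  vector int
  load_words (long off, int *p)
  {
    return vec_xl (off, p);     /* folded to a vector MEM_REF load at p + off */
  }

  void
  store_words (long off, int *p, vector int v)
  {
    vec_xst (v, off, p);        /* folded to a vector MEM_REF store at p + off */
  }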

Re: [PATCH] alpha: Use TARGET_COMPUTE_FRAME_LAYOUT

2018-07-10 Thread Richard Biener
On Mon, Jul 9, 2018 at 9:05 PM Richard Henderson  wrote:
>
> At the same time, merge several related frame computing functions.
> Recall that HWI is now always 64-bit, so merge IMASK and FMASK,
> which allows merging of several loops within prologue and epilogue.

Btw, if you're not using these with existing HWI APIs, it is now preferred
to use [u]int64_t where appropriate.

Richard.
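
As a minimal sketch of the merged-mask idea (illustrative only, not code from
the patch): with HOST_WIDE_INT guaranteed to be 64 bits, bits 0-31 can hold
the integer-register save set and bits 32-63 the FP set, so a single loop can
replace the separate imask/fmask walks.

  #include <stdint.h>

  /* Illustrative only: walk one 64-bit save mask; bits 0-31 name integer
     registers, bits 32-63 name FP registers.  */
  void
  walk_saved_regs (uint64_t sa_mask, void (*save_one) (int regno))
  {
    for (int i = 0; i < 64; i++)
      if (sa_mask & ((uint64_t) 1 << i))
        save_one (i);
  }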

> Full regression testing will take some time, but a quick browse
> suggests no change in generated code.
>
>
> r~
>
>
> * config/alpha/alpha.c (direct_return): Move down after
> struct machine_function definition; use saved frame_size;
> return bool.
> (struct machine_function): Add sa_mask, sa_size, frame_size.
> (alpha_sa_mask, alpha_sa_size, compute_frame_size): Merge into ...
> (alpha_compute_frame_layout): ... new function.
> (TARGET_COMPUTE_FRAME_LAYOUT): New.
> (alpha_initial_elimination_offset): Use saved sa_size.
> (alpha_vms_initial_elimination_offset): Likewise.
> (alpha_vms_can_eliminate): Remove alpha_sa_size call.
> (alpha_expand_prologue): Use saved frame data.  Merge integer
> and fp register save loops.
> (alpha_expand_epilogue): Likewise.
> (alpha_start_function): Use saved frame data.
> * config/alpha/alpha-protos.h (direct_return): Update.
> (alpha_sa_size): Remove.
> ---
>  gcc/config/alpha/alpha-protos.h |   3 +-
>  gcc/config/alpha/alpha.c| 293 
>  2 files changed, 109 insertions(+), 187 deletions(-)
>
> diff --git a/gcc/config/alpha/alpha-protos.h b/gcc/config/alpha/alpha-protos.h
> index d171f4eb414..099ce0e0c42 100644
> --- a/gcc/config/alpha/alpha-protos.h
> +++ b/gcc/config/alpha/alpha-protos.h
> @@ -21,9 +21,8 @@ extern int alpha_next_sequence_number;
>
>  extern void literal_section (void);
>  extern int zap_mask (HOST_WIDE_INT);
> -extern int direct_return (void);
> +extern bool direct_return (void);
>
> -extern int alpha_sa_size (void);
>  extern HOST_WIDE_INT alpha_initial_elimination_offset (unsigned int,
>unsigned int);
>  extern void alpha_expand_prologue (void);
> diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
> index 9adfe159381..218306d3a07 100644
> --- a/gcc/config/alpha/alpha.c
> +++ b/gcc/config/alpha/alpha.c
> @@ -728,19 +728,6 @@ alpha_vector_mode_supported_p (machine_mode mode)
>return mode == V8QImode || mode == V4HImode || mode == V2SImode;
>  }
>
> -/* Return 1 if this function can directly return via $26.  */
> -
> -int
> -direct_return (void)
> -{
> -  return (TARGET_ABI_OSF
> - && reload_completed
> - && alpha_sa_size () == 0
> - && get_frame_size () == 0
> - && crtl->outgoing_args_size == 0
> - && crtl->args.pretend_args_size == 0);
> -}
> -
>  /* Return the TLS model to use for SYMBOL.  */
>
>  static enum tls_model
> @@ -4837,6 +4824,10 @@ struct GTY(()) alpha_links;
>
>  struct GTY(()) machine_function
>  {
> +  unsigned HOST_WIDE_INT sa_mask;
> +  HOST_WIDE_INT sa_size;
> +  HOST_WIDE_INT frame_size;
> +
>/* For flag_reorder_blocks_and_partition.  */
>rtx gp_save_rtx;
>
> @@ -7236,83 +7227,59 @@ static int vms_save_fp_regno;
>  /* Register number used to reference objects off our PV.  */
>  static int vms_base_regno;
>
> -/* Compute register masks for saved registers.  */
> -
> +/* Compute register masks for saved registers, register save area size,
> +   and total frame size.  */
>  static void
> -alpha_sa_mask (unsigned long *imaskP, unsigned long *fmaskP)
> +alpha_compute_frame_layout (void)
>  {
> -  unsigned long imask = 0;
> -  unsigned long fmask = 0;
> -  unsigned int i;
> +  unsigned HOST_WIDE_INT sa_mask = 0;
> +  HOST_WIDE_INT frame_size;
> +  int sa_size;
>
>/* When outputting a thunk, we don't have valid register life info,
>   but assemble_start_function wants to output .frame and .mask
>   directives.  */
> -  if (cfun->is_thunk)
> +  if (!cfun->is_thunk)
>  {
> -  *imaskP = 0;
> -  *fmaskP = 0;
> -  return;
> -}
> +  if (TARGET_ABI_OPEN_VMS && alpha_procedure_type == PT_STACK)
> +   sa_mask |= HOST_WIDE_INT_1U << HARD_FRAME_POINTER_REGNUM;
>
> -  if (TARGET_ABI_OPEN_VMS && alpha_procedure_type == PT_STACK)
> -imask |= (1UL << HARD_FRAME_POINTER_REGNUM);
> +  /* One for every register we have to save.  */
> +  for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +   if (! fixed_regs[i] && ! call_used_regs[i]
> +   && df_regs_ever_live_p (i) && i != REG_RA)
> + sa_mask |= HOST_WIDE_INT_1U << i;
>
> -  /* One for every register we have to save.  */
> -  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> -if (! fixed_regs[i] && ! call_used_regs[i]
> -   && df_regs_ever_live_p (i) && i != REG_RA)
> -  {
> -   if (i < 32)
> - imask |= (1UL << i);
> -   else
> - fmask |=