[gomp4] add -finform-parallelism

2017-02-20 Thread Cesar Philippidis
This patch introduces a new -finform-parallelism flag to report any
parallelism detected by the compiler. Initially, it is used to report
how the oaccdevlow pass partitions OpenACC loops. Currently, to extract
this information, you need to compile the program with
-fdump-tree-oaccdevlow, then scan the tree dump for lines marked Loop
and decode the decimal bitmask that encodes the parallelism. This patch
makes the process more user-friendly by using inform messages to
highlight the directives in the source code and print the associated
parallelism clearly. E.g. given

  !$acc parallel loop
  do i = ...
    !$acc loop
    do j = ...

-finform-parallelism reports

  inform-parallelism.f90: In function ‘MAIN__._omp_fn.0’:
  inform-parallelism.f90:10:0: note: ACC LOOP GANG
    !$acc parallel loop

  inform-parallelism.f90:12:0: note: ACC LOOP WORKER VECTOR
    !$acc loop

Unfortunately, because oaccdevlow runs so late, the offloaded
function name doesn't match the one specified by the user.

While working on this, I noticed that the Fortran FE wasn't recording
the location of combined loop directives properly, so I fixed that bug.
I also removed an unused variable in trans-openmp.c.

This patch still isn't complete because I found a similar bug in the C++
FE. Thomas, before I fix that bug, do you think this patch is worth
pursuing for gomp-4_0-branch or maybe even trunk in general? Ideally, we
can extend these diagnostics to report any loops detected inside kernels
regions.

Cesar
2017-02-20  Cesar Philippidis  

	gcc/
	* common.opt (finform-parallelism): New option.
	* omp-low.c (inform_oacc_loop): New function.
	(execute_oacc_device_lower): Use it to report how ACC LOOPs are
	assigned parallelism.

	gcc/doc/
	* invoke.texi: Document -finform-parallelism.

	gcc/fortran/
	* trans-openmp.c (gfc_trans_omp_clauses_1): Delete unused orig_decl.
	(gfc_trans_oacc_combined_directive): Set the location of
	combined acc loops.

	gcc/testsuite/
	* c-c++-common/goacc/inform-parallelism.c: New test.
	* gfortran.dg/goacc/inform-parallelism.f90: New test.


diff --git a/gcc/common.opt b/gcc/common.opt
index 42c0b2f..a7e5494 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1451,6 +1451,10 @@ fif-conversion2
 Common Report Var(flag_if_conversion2) Optimization
 Perform conversion of conditional jumps to conditional execution.
 
+finform-parallelism
+Common Var(flag_inform_parallelism) Init(0)
+Report all parallelism detected inside offloaded regions.
+
 fstack-reuse=
 Common Joined RejectNegative Enum(stack_reuse_level) Var(flag_stack_reuse) Init(SR_ALL) Optimization
 -fstack-reuse=[all|named_vars|none] Set stack reuse level for local variables.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fd8ba42..9cc8a8d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
 -fforward-propagate -ffp-contract=@var{style} -ffunction-sections @gol
 -fgcse -fgcse-after-reload -fgcse-las -fgcse-lm -fgraphite-identity @gol
 -fgcse-sm -fhoist-adjacent-loads -fif-conversion @gol
--fif-conversion2 -findirect-inlining @gol
+-fif-conversion2 -findirect-inlining -finform-parallelism @gol
 -finline-functions -finline-functions-called-once -finline-limit=@var{n} @gol
 -finline-small-functions -fipa-cp -fipa-cp-clone -fipa-cp-alignment @gol
 -fipa-pta -fipa-profile -fipa-pure-const -fipa-reference -fipa-icf @gol
@@ -6428,6 +6428,13 @@ or @option{-finline-small-functions} options.
 
 Enabled at level @option{-O2}.
 
+@item -finform-parallelism
+@opindex finform-parallelism
+Report any parallelism detected by the compiler.  Inside OpenACC
+offloaded regions, this includes the gang, worker and vector level
+parallelism associated with any @code{ACC LOOP}s.  This option is disabled
+by default.
+
 @item -finline-functions
 @opindex finline-functions
 Consider all functions for inlining, even if they are not declared inline.
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 295f172..8688425 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1947,7 +1947,6 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 			  && n->expr->ref->next->u.ar.type == AR_FULL)))
 		{
 		  gfc_ref *ref = n->expr->ref;
-		  tree orig_decl = decl;
 		  gfc_component *c = ref->u.c.component;
 		  tree field;
 		  tree context;
@@ -3819,6 +3818,7 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
   enum tree_code construct_code;
   bool scan_nodesc_arrays = false;
   hash_set *array_set = NULL;
+  location_t loc = input_location;
 
   switch (code->op)
 {
@@ -3850,6 +3850,9 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
 pushlevel ();
   stmt = gfc_trans_omp_do (code, EXEC_OACC_LOOP, pblock, loop_clauses, NULL);
 
+  if (CAN_HAVE_LOCATION_P (stmt))
+SET_EXPR_LOCATION (stmt, loc);
+
   if (array_set && array_set->elements 

Re: [PATCH, testsuite]: Use posix_memalign instead of aligned_alloc in gcc.dg/strncmp-2.c

2017-02-20 Thread Aaron Sawdey
On Fri, 2017-02-17 at 10:50 +0100, Uros Bizjak wrote:
> posix_memalign is portable to older, non-c11 runtimes.
> 
> 2017-02-17  Uros Bizjak  
> 
> * gcc.dg/strncmp-2.c (test_driver_strncmp): Use posix_memalign
> instead of aligned_alloc.
> 
> Tested on x86_64-linux-gnu, CentOS 5.11.
> 
> OK for mainline?

Uros,
  I posted something very similar last Tuesday:

https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00937.html

I didn't get around to applying it until this morning in 245608.
Apologies for wasting your time tracking down the same issue again.

  Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH] Fix simplify_const_unary_operation sNaN handling (PR tree-optimization/61441)

2017-02-20 Thread Segher Boessenkool
On Tue, Feb 21, 2017 at 12:12:28AM +0100, Jakub Jelinek wrote:
> As I wrote earlier today on gcc-patches, pr61441.c started failing
> on i?86-linux when -fsignaling-nans argument has been added to it.
> 
> I believe this is because the simplify_const_unary_operation
> changes in r231901 were broken, when the operand is a sNaN and
> -fsignaling-nans, we want to avoid folding FLOAT_TRUNCATE, FLOAT_EXTEND
> and FIX of a sNaN, not just return it unmodified (this is in a switch
> that optionally modifies the real value just to create another CONST_DOUBLE
> usually in a different mode out from it).  This patch instead
> returns NULL, which means no simplification.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, fixes pr61441.c
> on the latter, ok for trunk?

This patch looks fine to me, fwiw.

I did test the patch adding -fsignaling-nans to those two testcases on x86,
but for some reason there is no issignaling on the machine I tested on,
so the testcases are skipped.

Thanks,


Segher


Go patch committed: Fix formatting in go/lang.opt

2017-02-20 Thread Ian Lance Taylor
This patch fixes a character in go/lang.opt to be a tab rather than a
space.  This fixes GCC PR 79642.  Bootstrapped on x86_64-pc-linux-gnu.
Committed to mainline.

Ian

2017-02-20  Ian Lance Taylor  

PR go/79642
* lang.opt (-fgo-relative-import-path): Change space to tab.
Index: gcc/go/lang.opt
===================================================================
--- gcc/go/lang.opt (revision 245615)
+++ gcc/go/lang.opt (working copy)
@@ -71,7 +71,7 @@
 
 fgo-relative-import-path=
 Go Joined RejectNegative
--fgo-relative-import-path= Treat a relative import as relative to path.
+-fgo-relative-import-path=   Treat a relative import as relative to path.
 
 frequire-return-statement
 Go Var(go_require_return_statement) Init(1) Warning


Re: [PATCH] Fix -fsplit-stack with non-local gotos (PR target/79494)

2017-02-20 Thread Ian Lance Taylor
On Mon, Feb 20, 2017 at 12:42 PM, Jakub Jelinek  wrote:
>
> We ICE on the following testcase, because we have abnormal edges
> from both __morestack call (which is before prologue) and call
> to the nested function (which is in between prologue and epilogue)
> to a label reachable through non-local goto.  This is something
> dwarf2cfi doesn't allow, it doesn't know what CFI state should
> be at that label.
> As __morestack really doesn't do non-local gotos and while it probably
> can throw, it is never something that can be caught in the function that
> calls __morestack, the following patch fixes it by telling middle-end
> that __morestack can't do nl goto.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-02-20  Jakub Jelinek  
>
> PR target/79494
> * config/i386/i386.c (ix86_expand_split_stack_prologue): Call
> make_reg_eh_region_note_nothrow_nononlocal on call_insn.
> * config/rs6000/rs6000.c: Include except.h.
> (rs6000_expand_split_stack_prologue): Call
> make_reg_eh_region_note_nothrow_nononlocal on the call insn.
>
> * gcc.dg/pr79494.c: New test.

Seems fine to me.

Thanks.

Ian


[PATCH] Fix simplify_const_unary_operation sNaN handling (PR tree-optimization/61441)

2017-02-20 Thread Jakub Jelinek
Hi!

As I wrote earlier today on gcc-patches, pr61441.c started failing
on i?86-linux when -fsignaling-nans argument has been added to it.

I believe this is because the simplify_const_unary_operation
changes in r231901 were broken, when the operand is a sNaN and
-fsignaling-nans, we want to avoid folding FLOAT_TRUNCATE, FLOAT_EXTEND
and FIX of a sNaN, not just return it unmodified (this is in a switch
that optionally modifies the real value just to create another CONST_DOUBLE
usually in a different mode out from it).  This patch instead
returns NULL, which means no simplification.

Bootstrapped/regtested on x86_64-linux and i686-linux, fixes pr61441.c
on the latter, ok for trunk?

2017-02-20  Jakub Jelinek  

PR tree-optimization/61441
* simplify-rtx.c (simplify_const_unary_operation): For
-fsignaling-nans and sNaN operand, return NULL_RTX rather than
the sNaN unmodified.

--- gcc/simplify-rtx.c.jj   2017-01-01 12:45:37.0 +0100
+++ gcc/simplify-rtx.c  2017-02-20 20:02:33.446251366 +0100
@@ -1889,23 +1889,26 @@ simplify_const_unary_operation (enum rtx
case FLOAT_TRUNCATE:
  /* Don't perform the operation if flag_signaling_nans is on
 and the operand is a signaling NaN.  */
- if (!(HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d)))
-   d = real_value_truncate (mode, d);
+ if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+   return NULL_RTX;
+ d = real_value_truncate (mode, d);
  break;
case FLOAT_EXTEND:
- /* All this does is change the mode, unless changing
-mode class.  */
  /* Don't perform the operation if flag_signaling_nans is on
 and the operand is a signaling NaN.  */
- if (GET_MODE_CLASS (mode) != GET_MODE_CLASS (GET_MODE (op))
- && !(HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d)))
+ if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+   return NULL_RTX;
+ /* All this does is change the mode, unless changing
+mode class.  */
+ if (GET_MODE_CLASS (mode) != GET_MODE_CLASS (GET_MODE (op)))
real_convert (, mode, );
  break;
case FIX:
  /* Don't perform the operation if flag_signaling_nans is on
 and the operand is a signaling NaN.  */
- if (!(HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d)))
-   real_arithmetic (, FIX_TRUNC_EXPR, , NULL);
+ if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+   return NULL_RTX;
+ real_arithmetic (, FIX_TRUNC_EXPR, , NULL);
  break;
case NOT:
  {

Jakub


Re: [PATCH] Change default of param not being smaller that min.

2017-02-20 Thread Alexandre Oliva
On Feb 20, 2017, Richard Biener  wrote:

> On Fri, Feb 17, 2017 at 4:18 PM, Martin Liška  wrote:
>> Hello.
>> 
>> We should not have a default value (different from -1) that is smaller than 
>> minimum.
>> 
>> Ready to be installed after tests?

> I think you should instead change the minimum value.

> But OTOH I wonder why this is a --param at all...  Alex?  Why's this
> not just a #define
> inside emit-rtl.c?  In what circumstances is the user supposed to
> alter this --param?

It's convenient to debug -fcompare-debug errors.  You set the param to a
large number, and then non-debug insns will get the same uid in both
debug and non-debug compilations, so it's easier to compare RTL dumps
(using the attached gcc-diff-dump; I thought it was in contrib, but it's
not; should I put it there?) and locate the differences.



gcc-diff-dump
Description: Bourne shell script


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


New German PO file for 'gcc' (version 7.1-b20170101)

2017-02-20 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-7.1-b20170101.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] Fix -fsplit-stack with non-local gotos (PR target/79494)

2017-02-20 Thread Segher Boessenkool
Hi Jakub,

On Mon, Feb 20, 2017 at 09:42:26PM +0100, Jakub Jelinek wrote:
> We ICE on the following testcase, because we have abnormal edges
> from both __morestack call (which is before prologue) and call
> to the nested function (which is in between prologue and epilogue)
> to a label reachable through non-local goto.  This is something
> dwarf2cfi doesn't allow, it doesn't know what CFI state should
> be at that label.
> As __morestack really doesn't do non-local gotos and while it probably
> can throw, it is never something that can be caught in the function that
> calls __morestack, the following patch fixes it by telling middle-end
> that __morestack can't do nl goto.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

The rs6000 part is fine, thanks!


Segher


> 2017-02-20  Jakub Jelinek  
> 
>   PR target/79494
>   * config/i386/i386.c (ix86_expand_split_stack_prologue): Call
>   make_reg_eh_region_note_nothrow_nononlocal on call_insn.
>   * config/rs6000/rs6000.c: Include except.h.
>   (rs6000_expand_split_stack_prologue): Call
>   make_reg_eh_region_note_nothrow_nononlocal on the call insn.
> 
>   * gcc.dg/pr79494.c: New test.


Re: [PATCH] Look through clobber stmts in uninit (PR tree-optimization/79345)

2017-02-20 Thread Marc Glisse

On Mon, 20 Feb 2017, Jakub Jelinek wrote:


As mentioned by Jason in the PR, we've regressed on the following testcase
since we started emitting CLOBBERs at the start of ctors (and we warn as
before with -fno-lifetime-dse -Wuninitialized).
With -fno-lifetime-dse, the vuse on the b.x read stmt is default def
and thus we warn, but if there are clobbers before that, that is not the
case and we don't warn.  The patch is a quick hack to bypass the initial
clobbers as long as there aren't really many.  If we wanted to handle
all initial clobbers, I bet the first time we run into this we could
recursively walk vop uses from the default def and build say a bitmap
of vop SSA_NAMEs which are vdefs of clobbers that only have clobbers
before it as vdef stmts.


  MEM[(struct  &)] ={v} {CLOBBER};
  _4 = b.x;
  if (_4 != 0)

It would be nice (at some point in a distant future) to turn that into

  MEM[(struct  &)] ={v} {CLOBBER};
  if (_4(D) != 0)

i.e. replace reads from clobbered memory with a default def (with a 
gimple_nop defining statement).


IIRC I tried to do it in FRE once, but failed.


Now, the comment says:
 /* For memory the only cheap thing we can do is see if we
have a use of the default def of the virtual operand.
???  Not so cheap would be to use the alias oracle via
walk_aliased_vdefs, if we don't find any aliasing vdef
warn as is-used-uninitialized, if we don't find an aliasing
vdef that kills our use (stmt_kills_ref_p), warn as
may-be-used-uninitialized.  But this walk is quadratic and
so must be limited which means we would miss warning
opportunities.  */
I wonder if it isn't useful to walk even limited number of vdefs this way


Seems worth a try for gcc8, there are several PRs it could help with.


anyway (likely GCC8 material though), e.g. if we run into a clobber that
must (rather than just may) cover a portion of the read ref (and of course when
seeing non-clobber stmt that could alias with the ref give up before that),
we could warn even if we are very far from the start of the function.


--
Marc Glisse


[PATCH] Look through clobber stmts in uninit (PR tree-optimization/79345)

2017-02-20 Thread Jakub Jelinek
Hi!

As mentioned by Jason in the PR, we've regressed on the following testcase
since we started emitting CLOBBERs at the start of ctors (and we warn as
before with -fno-lifetime-dse -Wuninitialized).
With -fno-lifetime-dse, the vuse on the b.x read stmt is default def
and thus we warn, but if there are clobbers before that, that is not the
case and we don't warn.  The patch is a quick hack to bypass the initial
clobbers as long as there aren't really many.  If we wanted to handle
all initial clobbers, I bet the first time we run into this we could
recursively walk vop uses from the default def and build say a bitmap
of vop SSA_NAMEs which are vdefs of clobbers that only have clobbers
before it as vdef stmts.

Now, the comment says:
  /* For memory the only cheap thing we can do is see if we
 have a use of the default def of the virtual operand.
 ???  Not so cheap would be to use the alias oracle via
 walk_aliased_vdefs, if we don't find any aliasing vdef
 warn as is-used-uninitialized, if we don't find an aliasing
 vdef that kills our use (stmt_kills_ref_p), warn as
 may-be-used-uninitialized.  But this walk is quadratic and
 so must be limited which means we would miss warning
 opportunities.  */
I wonder if it isn't useful to walk even limited number of vdefs this way
anyway (likely GCC8 material though), e.g. if we run into a clobber that
must (rather than just may) cover a portion of the read ref (and of course when
seeing non-clobber stmt that could alias with the ref give up before that),
we could warn even if we are very far from the start of the function.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk (or do
you want a version with a bitmap and really ignoring all the clobbers,
rather than just 128 of them)?

2017-02-20  Jakub Jelinek  

PR tree-optimization/79345
* tree-ssa-uninit.c (warn_uninitialized_vars): If vuse is not
default def, but there are only CLOBBER stmts as vdefs, handle it like
default def.

* g++.dg/warn/pr79345.C: New test.

--- gcc/tree-ssa-uninit.c.jj2017-01-01 12:45:35.0 +0100
+++ gcc/tree-ssa-uninit.c   2017-02-20 16:43:08.244868075 +0100
@@ -248,8 +248,7 @@ warn_uninitialized_vars (bool warn_possi
  use = gimple_vuse (stmt);
  if (use
  && gimple_assign_single_p (stmt)
- && !gimple_vdef (stmt)
- && SSA_NAME_IS_DEFAULT_DEF (use))
+ && !gimple_vdef (stmt))
{
  tree rhs = gimple_assign_rhs1 (stmt);
  tree base = get_base_address (rhs);
@@ -260,6 +259,23 @@ warn_uninitialized_vars (bool warn_possi
  || is_global_var (base))
continue;
 
+ /* Look through some CLOBBER stmts.  */
+ for (unsigned int cnt = 0; cnt < 128 && use; cnt++)
+   {
+ if (SSA_NAME_IS_DEFAULT_DEF (use))
+   break;
+
+ gimple *def_stmt = SSA_NAME_DEF_STMT (use);
+ if (!gimple_clobber_p (def_stmt))
+   {
+ use = NULL_TREE;
+ break;
+   }
+ use = gimple_vuse (def_stmt);
+   }
+ if (use == NULL_TREE)
+   continue;
+
  if (always_executed)
warn_uninit (OPT_Wuninitialized, use, gimple_assign_rhs1 (stmt),
 base, "%qE is used uninitialized in this function",
--- gcc/testsuite/g++.dg/warn/pr79345.C.jj  2017-02-20 17:19:01.138952915 +0100
+++ gcc/testsuite/g++.dg/warn/pr79345.C 2017-02-20 17:18:20.0 +0100
@@ -0,0 +1,22 @@
+// PR tree-optimization/79345
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -Wuninitialized" }
+
+struct A {
+  A (int);
+};
+
+struct B : A {
+  const bool x = true;
+
+  B () : A (x ? 3 : 7) { } // { dg-warning "is used uninitialized in this function" }
+};
+
+void foo (void*);
+
+void
+bar ()
+{
+  B b;
+  foo ();
+}

Jakub


[PATCH] Fix -fsplit-stack with non-local gotos (PR target/79494)

2017-02-20 Thread Jakub Jelinek
Hi!

We ICE on the following testcase, because we have abnormal edges
from both __morestack call (which is before prologue) and call
to the nested function (which is in between prologue and epilogue)
to a label reachable through non-local goto.  This is something
dwarf2cfi doesn't allow, it doesn't know what CFI state should
be at that label.
As __morestack really doesn't do non-local gotos and while it probably
can throw, it is never something that can be caught in the function that
calls __morestack, the following patch fixes it by telling middle-end
that __morestack can't do nl goto.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-20  Jakub Jelinek  

PR target/79494
* config/i386/i386.c (ix86_expand_split_stack_prologue): Call
make_reg_eh_region_note_nothrow_nononlocal on call_insn.
* config/rs6000/rs6000.c: Include except.h.
(rs6000_expand_split_stack_prologue): Call
make_reg_eh_region_note_nothrow_nononlocal on the call insn.

* gcc.dg/pr79494.c: New test.

--- gcc/config/i386/i386.c.jj   2017-02-20 13:51:22.0 +0100
+++ gcc/config/i386/i386.c  2017-02-20 15:26:04.184793980 +0100
@@ -15057,6 +15057,8 @@ ix86_expand_split_stack_prologue (void)
   add_function_usage_to (call_insn, call_fusage);
   if (!TARGET_64BIT)
 add_reg_note (call_insn, REG_ARGS_SIZE, GEN_INT (0));
+  /* Indicate that this function can't jump to non-local gotos.  */
+  make_reg_eh_region_note_nothrow_nononlocal (as_a <rtx_insn *> (call_insn));
 
   /* In order to make call/return prediction work right, we now need
  to execute a return instruction.  See
--- gcc/config/rs6000/rs6000.c.jj   2017-02-18 14:12:36.0 +0100
+++ gcc/config/rs6000/rs6000.c  2017-02-20 15:47:56.602798893 +0100
@@ -66,6 +66,7 @@
 #include "builtins.h"
 #include "context.h"
 #include "tree-pass.h"
+#include "except.h"
 #if TARGET_XCOFF
 #include "xcoffout.h"  /* get declarations of xcoff_*_section_name */
 #endif
@@ -31680,6 +31681,8 @@ rs6000_expand_split_stack_prologue (void
  split_stack_return use r0.  */
   use_reg (_fusage, r0);
   add_function_usage_to (insn, call_fusage);
+  /* Indicate that this function can't jump to non-local gotos.  */
+  make_reg_eh_region_note_nothrow_nononlocal (insn);
   emit_insn (gen_frame_load (r0, r1, info->lr_save_offset));
   insn = emit_move_insn (lr, r0);
   add_reg_note (insn, REG_CFA_RESTORE, lr);
--- gcc/testsuite/gcc.dg/pr79494.c.jj   2017-02-20 15:51:12.970247250 +0100
+++ gcc/testsuite/gcc.dg/pr79494.c  2017-02-20 15:52:59.753859680 +0100
@@ -0,0 +1,22 @@
+/* PR target/79494 */
+/* { dg-do compile } */
+/* { dg-require-effective-target split_stack } */
+/* { dg-options "-O2 -fsplit-stack" } */
+
+void
+foo (int a)
+{
+  __label__ lab;
+  __attribute__((noinline, noclone)) void bar (int b)
+  {
+switch (b)
+  {
+  case 1:
+   goto lab;
+  case 2:
+   goto lab;
+  }
+  }
+  bar (a);
+lab:;
+}

Jakub


[C++ PATCH] Fix ICE with decomp gimplification (PR sanitizer/79589)

2017-02-20 Thread Jakub Jelinek
Hi!

When generic is unshared, we generally don't unshare DECL_VALUE_EXPRs,
so those are assumed to be not shared, otherwise as in the testcase
we can clear first argument of a COMPOUND_EXPR after it has been gimplified
in one use and by that clobbering all others.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-20  Jakub Jelinek  

PR sanitizer/79589
* decl.c: Include gimplify.h.
(cp_finish_decomp): Make sure there is no sharing of trees
in between DECL_VALUE_EXPR of decomposition decls.

* g++.dg/ubsan/pr79589.C: New test.

--- gcc/cp/decl.c.jj2017-02-17 18:29:21.0 +0100
+++ gcc/cp/decl.c   2017-02-20 11:45:55.281636544 +0100
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.
 #include "plugin.h"
 #include "cilk.h"
 #include "builtins.h"
+#include "gimplify.h"
 
 /* Possible cases of bad specifiers type used by bad_specifiers. */
 enum bad_spec_place {
@@ -7467,7 +7468,7 @@ cp_finish_decomp (tree decl, tree first,
{
  TREE_TYPE (v[i]) = eltype;
  layout_decl (v[i], 0);
- tree t = dexp;
+ tree t = unshare_expr (dexp);
  t = build4_loc (DECL_SOURCE_LOCATION (v[i]), ARRAY_REF,
  eltype, t, size_int (i), NULL_TREE,
  NULL_TREE);
@@ -7486,7 +7487,7 @@ cp_finish_decomp (tree decl, tree first,
{
  TREE_TYPE (v[i]) = eltype;
  layout_decl (v[i], 0);
- tree t = dexp;
+ tree t = unshare_expr (dexp);
  t = build1_loc (DECL_SOURCE_LOCATION (v[i]),
  i ? IMAGPART_EXPR : REALPART_EXPR, eltype,
  t);
@@ -7504,7 +7505,7 @@ cp_finish_decomp (tree decl, tree first,
{
  TREE_TYPE (v[i]) = eltype;
  layout_decl (v[i], 0);
- tree t = dexp;
+ tree t = unshare_expr (dexp);
  convert_vector_to_array_for_subscript (DECL_SOURCE_LOCATION (v[i]),
 , size_int (i));
  t = build4_loc (DECL_SOURCE_LOCATION (v[i]), ARRAY_REF,
@@ -7603,7 +7604,8 @@ cp_finish_decomp (tree decl, tree first,
  continue;
else
  {
-   tree tt = finish_non_static_data_member (field, t, NULL_TREE);
+   tree tt = finish_non_static_data_member (field, unshare_expr (t),
+NULL_TREE);
if (REFERENCE_REF_P (tt))
  tt = TREE_OPERAND (tt, 0);
TREE_TYPE (v[i]) = TREE_TYPE (tt);
--- gcc/testsuite/g++.dg/ubsan/pr79589.C.jj 2017-02-20 11:51:36.732130221 +0100
+++ gcc/testsuite/g++.dg/ubsan/pr79589.C 2017-02-20 11:52:09.012704729 +0100
@@ -0,0 +1,13 @@
+// PR sanitizer/79589
+// { dg-do compile }
+// { dg-options "-fsanitize=undefined -std=c++1z" }
+
+struct A { char a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r; } a[64];
+
+void
+foo ()
+{
+  int z = 0;
+  for (auto & [ b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s ] : a)
+z += b + c + d + e + f + g + h + i + j + k + l + m + n + o + p + q + r + s;
+}

Jakub


[PATCH] Move -Wrestrict warning later in the FEs and fix some issues in it (PR c++/79588)

2017-02-20 Thread Jakub Jelinek
Hi!

As mentioned in the PR, -Wrestrict warning is done way too early, where
e.g. default arguments aren't filled up yet (reason for ICE on first
testcase) or where arguments in templates aren't instantiated yet (reason
why we don't diagnose anything on the second testcase).

This patch moves it later where e.g. -Wformat is diagnosed and fixes
some issues I found while looking at the code.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-20  Jakub Jelinek  

PR c++/79588
c-family/
* c-common.c (check_function_arguments): Add FNDECL argument.
Handle -Wrestrict here.
* c-warn.c (warn_for_restrict): Remove ARGS argument, add ARGARRAY
and NARGS.  Use auto_vec for ARG_POSITIONS, simplify.
* c-common.h (check_function_arguments): Add FNDECL argument.
(warn_for_restrict): Remove ARGS argument, add ARGARRAY and NARGS.
c/
* c-parser.c (c_parser_postfix_expression_after_primary): Don't
handle -Wrestrict here.
* c-typeck.c (build_function_call_vec): Adjust
check_function_arguments caller.
cp/
* call.c (build_over_call): Call check_function_arguments even for
-Wrestrict, adjust check_function_arguments caller.
* parser.c (cp_parser_postfix_expression): Don't handle -Wrestrict
here.
* typeck.c (cp_build_function_call_vec): Adjust
check_function_arguments caller.
testsuite/
* g++.dg/warn/Wrestrict-1.C: New test.
* g++.dg/warn/Wrestrict-2.C: New test.

--- gcc/c-family/c-common.c.jj  2017-01-24 23:29:05.0 +0100
+++ gcc/c-family/c-common.c 2017-02-20 13:17:06.601211847 +0100
@@ -5605,8 +5605,8 @@ attribute_fallthrough_p (tree attr)
There are NARGS arguments in the array ARGARRAY.  LOC should be used for
diagnostics.  Return true if -Wnonnull warning has been diagnosed.  */
 bool
-check_function_arguments (location_t loc, const_tree fntype, int nargs,
- tree *argarray)
+check_function_arguments (location_t loc, const_tree fndecl, const_tree fntype,
+ int nargs, tree *argarray)
 {
   bool warned_p = false;
 
@@ -5624,6 +5624,44 @@ check_function_arguments (location_t loc
 
   if (warn_format)
 check_function_sentinel (fntype, nargs, argarray);
+
+  if (warn_restrict)
+{
+  int i;
+  tree parms;
+
+  if (fndecl
+ && TREE_CODE (fndecl) == FUNCTION_DECL
+ && DECL_ARGUMENTS (fndecl))
+   parms = DECL_ARGUMENTS (fndecl);
+  else
+   parms = TYPE_ARG_TYPES (fntype);
+
+  for (i = 0; i < nargs; i++)
+TREE_VISITED (argarray[i]) = 0;
+
+  for (i = 0; i < nargs && parms && parms != void_list_node; i++)
+   {
+ tree type;
+ if (TREE_CODE (parms) == PARM_DECL)
+   {
+ type = TREE_TYPE (parms);
+ parms = DECL_CHAIN (parms);
+   }
+ else
+   {
+ type = TREE_VALUE (parms);
+ parms = TREE_CHAIN (parms);
+   }
+ if (POINTER_TYPE_P (type)
+ && TYPE_RESTRICT (type)
+ && !TYPE_READONLY (TREE_TYPE (type)))
+   warn_for_restrict (i, argarray, nargs);
+   }
+
+  for (i = 0; i < nargs; i++)
+TREE_VISITED (argarray[i]) = 0;
+}
   return warned_p;
 }
 
--- gcc/c-family/c-warn.c.jj2017-02-15 18:06:19.0 +0100
+++ gcc/c-family/c-warn.c   2017-02-20 12:36:29.008455672 +0100
@@ -2170,55 +2170,49 @@ maybe_warn_bool_compare (location_t loc,
restrict-qualified param, and it aliases with another argument.  */
 
 void
-warn_for_restrict (unsigned param_pos, vec *args)
+warn_for_restrict (unsigned param_pos, tree *argarray, unsigned nargs)
 {
-  tree arg = (*args)[param_pos];
-  if (TREE_VISITED (arg) || operand_equal_p (arg, null_pointer_node, 0))
+  tree arg = argarray[param_pos];
+  if (TREE_VISITED (arg) || integer_zerop (arg))
 return;
 
   location_t loc = EXPR_LOC_OR_LOC (arg, input_location);
   gcc_rich_location richloc (loc);
 
   unsigned i;
-  tree current_arg;
-  int *arg_positions = XNEWVEC (int, args->length ());
-  unsigned arg_positions_len = 0;
+  auto_vec<int> arg_positions;
 
-  FOR_EACH_VEC_ELT (*args, i, current_arg) 
+  for (i = 0; i < nargs; i++)
 {
   if (i == param_pos)
continue;
 
-  tree current_arg = (*args)[i];
+  tree current_arg = argarray[i];
   if (operand_equal_p (arg, current_arg, 0))
{
  TREE_VISITED (current_arg) = 1; 
- arg_positions[arg_positions_len++] = (i + 1);
+ arg_positions.safe_push (i + 1);
}
 }
 
-  if (arg_positions_len == 0)
-{
-  free (arg_positions);
-  return;
-}
+  if (arg_positions.is_empty ())
+return;
 
-  for (unsigned i = 0; i < arg_positions_len; i++)
+  int pos;
+  FOR_EACH_VEC_ELT (arg_positions, i, pos)
 {
-  unsigned pos = arg_positions[i];
-  tree arg = 

Re: [PATCH] testsuite: pr59833.c and pr61441.c should use -fsignaling-nans

2017-02-20 Thread Jakub Jelinek
On Mon, Feb 20, 2017 at 07:24:50PM +0100, Jakub Jelinek wrote:
> On Thu, Feb 16, 2017 at 05:46:30PM +, Segher Boessenkool wrote:
> > The testcases pr59833.c and pr61441.c check whether signaling NaNs as
> > input to some operation result in quiet NaNs.  Without -fsignaling-nans
> this is not guaranteed to happen.  So, this patch adds this option to
> > these testcases.
> > 
> > Tested on powerpc64-linux {-m32,-m64}, on aarch64-linux, and on
> > x86_64-linux (where there is no issignaling, huh -- I tested on gcc20).
> 
> pr61441.c now fails since your change on i?86-linux (and x86_64-linux -m32).
> Before combine we have:
> (insn 6 5 7 2 (set (reg:DF 89 [ x ])
> (float_extend:DF (mem/u/c:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) 
> [1  S4 A32]))) "pr61441.c":12 152 {*extendsfdf2}
>  (expr_list:REG_EQUAL (const_double:DF +SNaN [+SNaN])
> (nil)))
> (insn 7 6 8 2 (set (mem:DF (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
> (reg:DF 89 [ x ])) "pr61441.c":13 119 {*pushdf}
>  (expr_list:REG_DEAD (reg:DF 89 [ x ])
> (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
> (nil
> (not sure if that REG_EQUAL is not already invalid for -fsignaling-nans),
> and combine turns that into:
> (insn 7 6 8 2 (set (mem:DF (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
> (const_double:DF +SNaN [+SNaN])) "pr61441.c":13 119 {*pushdf}
>  (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
> (nil)))
> which doesn't drop the signal bit.

I have a patch, I'll bootstrap/regtest it and post afterwards.

Jakub


Re: [PATCH] testsuite: pr59833.c and pr61441.c should use -fsignaling-nans

2017-02-20 Thread Jakub Jelinek
On Thu, Feb 16, 2017 at 05:46:30PM +, Segher Boessenkool wrote:
> The testcases pr59833.c and pr61441.c check whether signaling NaNs as
> input to some operation result in quiet NaNs.  Without -fsignaling-nans
> this is not guaranteed to happen.  So, this patch adds this option to
> these testcases.
> 
> Tested on powerpc64-linux {-m32,-m64}, on aarch64-linux, and on
> x86_64-linux (where there is no issignaling, huh -- I tested on gcc20).

pr61441.c now fails since your change on i?86-linux (and x86_64-linux -m32).
Before combine we have:
(insn 6 5 7 2 (set (reg:DF 89 [ x ])
(float_extend:DF (mem/u/c:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [1 
 S4 A32]))) "pr61441.c":12 152 {*extendsfdf2}
 (expr_list:REG_EQUAL (const_double:DF +SNaN [+SNaN])
(nil)))
(insn 7 6 8 2 (set (mem:DF (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
(reg:DF 89 [ x ])) "pr61441.c":13 119 {*pushdf}
 (expr_list:REG_DEAD (reg:DF 89 [ x ])
(expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
(nil
(not sure if that REG_EQUAL is not already invalid for -fsignaling-nans),
and combine turns that into:
(insn 7 6 8 2 (set (mem:DF (pre_dec:SI (reg/f:SI 7 sp)) [2  S8 A64])
(const_double:DF +SNaN [+SNaN])) "pr61441.c":13 119 {*pushdf}
 (expr_list:REG_ARGS_SIZE (const_int 16 [0x10])
(nil)))
which doesn't drop the signal bit.

Jakub


C++ PATCH for c++/78139, private destructor and new-expression

2017-02-20 Thread Jason Merrill
The C++17 copy elision code in build_special_member_call was creating
a temporary and then eliding the copy directly, but creating that
temporary meant building a cleanup, which requires the destructor to
be callable.  And we shouldn't require that when building a
new-expression.  Fixed by using the new tf_no_cleanup flag.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit f41a463a7fba438a5127dc15e1043f287a16e9ad
Author: Jason Merrill 
Date:   Sun Feb 19 20:41:23 2017 -0800

PR c++/78139 - destructor needed by new-expression

* call.c (build_special_member_call): Use tf_no_cleanup.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index d6d3a8f..93fae0d 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8356,9 +8356,15 @@ build_special_member_call (tree instance, tree name, 
vec **args,
   /* FIXME P0135 doesn't say how to handle direct initialization from a
 type with a suitable conversion operator.  Let's handle it like
 copy-initialization, but allowing explict conversions.  */
+  tsubst_flags_t sub_complain = tf_warning;
+  if (!is_dummy_object (instance))
+   /* If we're using this to initialize a non-temporary object, don't
+  require the destructor to be accessible.  */
+   sub_complain |= tf_no_cleanup;
   if (!reference_related_p (class_type, TREE_TYPE (arg)))
arg = perform_implicit_conversion_flags (class_type, arg,
-tf_warning, flags);
+sub_complain,
+flags);
   if ((TREE_CODE (arg) == TARGET_EXPR
   || TREE_CODE (arg) == CONSTRUCTOR)
  && (same_type_ignoring_top_level_qualifiers_p
diff --git a/gcc/testsuite/g++.dg/init/new48.C 
b/gcc/testsuite/g++.dg/init/new48.C
new file mode 100644
index 000..528a582
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/new48.C
@@ -0,0 +1,18 @@
+// PR c++/78139
+
+struct A
+{
+  A(int);
+private:
+  ~A();
+};
+
+struct B
+{
+  B(void*);
+};
+
+int main()
+{
+  B(new A(42));
+}


Re: [PATCH] Fix fixincludes for canadian cross builds

2017-02-20 Thread Bruce Korb
On 02/18/17 01:01, Bernd Edlinger wrote:
> On 02/18/17 00:37, Bruce Korb wrote:
>> On 02/06/17 10:44, Bernd Edlinger wrote:
>>> I tested this change with different arm-linux-gnueabihf cross
>>> compilers, and verified that mkheaders still works on the host system.
>>>
>>> Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
>>> Is it OK for trunk?
>>
>> As long as you certify that this is correct for all systems we care about:
>>
>> +BUILD_SYSTEM_HEADER_DIR = `
>> +echo $(CROSS_SYSTEM_HEADER_DIR) | \
>> +sed -e :a -e 's,[^/]*/\.\.\/,,' -e ta`
>>
>> that is pretty obtuse sed-speak to me.  I suggest a comment
>> explaining what sed is supposed to be doing.  What should
>> "$(CROSS_SYSTEM_HEADER_DIR)" look like?
>>
> 
> I took it just from a few lines above, so I thought that comment would
> sufficiently explain the syntax:

I confess, I didn't pull a new copy of gcc, sorry.
So it looks good to me.
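[Editorial aside: for readers decoding the quoted sed loop, it deletes one "component/../" pair per pass, and the ":a" label plus the "t" (branch-if-substituted) command repeat the substitution until nothing matches.  A minimal sketch, assuming a POSIX sed; the helper name is invented:]

```shell
# Collapse "component/../" pairs the way the quoted Makefile fragment does.
# ":a" defines a label; "ta" branches back to it whenever the preceding
# "s" command made a substitution, so the loop runs to a fixed point.
collapse_dotdot () {
  printf '%s\n' "$1" | sed -e :a -e 's,[^/]*/\.\./,,' -e ta
}

collapse_dotdot '/usr/local/../include'   # -> /usr/include
collapse_dotdot '/a/b/../c/../d'          # -> /a/d
```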


Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

2017-02-20 Thread Bin.Cheng
On Mon, Feb 20, 2017 at 4:05 PM, Jan Hubicka  wrote:
>> BTW, if we use gcov_type in calculation from 
>> expected_loop_iterations_unbounded,
>> how should we adjust the number for calling scale_loop_frequencies
>> which has int type?  In extreme case, gcov_type could be out of int's
>> range, we have to cap the value anyway.  But yes, 1 in
>> expect_loop_iterations is too small.
>
> What I usually do is to use fixed point math in this case (based on 
> REG_BR_PROB_BASE).
> Just pass REG_BR_PROB_BASE as den and calculate the adjustment in gcov_type 
> converting
> to int. Because you should be just decreasing the counts, it won't overflow 
> and because
> the decrease will be in range, say 2...256 times, it should also be 
> sufficiently
> precise.
>
> Be careful to avoid overflow of gcov type - it is not safe to multiply two
> counts in 64bit math because each count can be more than 2^32.  (next stage1 I
> plan to switch most of this to sreals that will make this easier)
>
>> >> > But I guess here it is sort of safe because vectorized loops are simple.
>> >> > You can't just scale down the existing counts/frequencies by vf, 
>> >> > because the
>> >> > entry edge frequency was adjusted.
> >> I don't 100% follow here; it looks like the code avoids changing frequency
>> >> counter for preheader/exit edge, otherwise we would need to change all
>> >> counters dominated by them?
>> >
>> > I was just wondering what is wrong with taking the existing 
>> > frequencies/counts
>> > the loop body has and dividing them all by the unroll factor.  This is 
>> > correct
>> > if you ignore the versioning. With versioning I guess you want to scale by
> > the unroll factor and also subtract frequencies/counts that were accounted 
>> > to
>> > the other versions of the loop, right?
>> IIUC, for (vectorized) loop header, it's frequency is calculated by:
>>   freq_header = freq_preheader + freq_latch
>> and freq_latch = (niter * freq_preheader).  Simply scaling it by VF gives:
>>   freq_header = (freq_preheader + freq_latch) / VF
>> which is wrong.  Especially if the loop is vectorized by large VF
>> (=16) and we normally assume niter (=10) without profiling
>> information, it results not only in a mismatch, but also in
>> (loop->header->frequency < loop->preheader->frequency).  In fact, if
>> we have accurate niter information, the counter should be:
>>   freq_header = freq_preheader + niter * freq_preheader
>
> You are right. We need to compensate for the change of probability of the loop
> exit edge.
>>
>> >> >
>> >> > Also niter_for_unrolled_loop depends on sanity of the profile, so 
>> >> > perhaps you
> >> >> > need to compute it before you start changing the CFG by peeling 
> >> >> > prologue?
>> >> Peeling for prologue doesn't change profiling information of
>> >> vect_loop, it is the skip edge from before loop to preferred epilogue
>> >> loop that will change profile counters.  I guess here exists a dilemma
>> >> that niter_for_unrolled_loop is for loop after peeling for prologue?
>> >
> >> > expected_loop_iterations_unbounded calculates number of iterations by 
>> > computing
>> > sum of frequencies of edges entering the loop and comparing it to the 
>> > frequency
>> > of loop header.  While peeling the prologue, you split the preheader edge 
>> > and
>> > adjust frequency of the new preheader BB of the loop to be vectorized.  I 
>> > think
>> > that will adjust the #of iterations estimate.
> >> It's not the case now, I think.  One motivation of the new vect_do_peeling
>> is to avoid niter checks between prologue and vector loop.  Once
>> prologue loop is entered or checked, the vector loop must be executed
>> unconditionally.  So the preheader of the vector loop has consistent
>> frequency counters now.  The niter check on whether vector loop should
>> be executed is now merged with cost check before prologue, and in the
>> future I think this can be further merged if loop versioning is
>> needed.
>
> Originally you have
>
>   loop_preheader
>|
>v
>loop_header
>
> and the ratio of the two BB frequencies is the loop iteration count. Then you
> do something like:
>
>   orig_loop_preheader
>|
>v
>loop_prologue -> scalar_version_of_loop
>|
>v
>  new_loop_preheader
>|
>v
>loop_header
It's like:

   orig_loop_preheader
|
v
check_on_niter -> scalar_version_of_loop (i.e, epilog_loop)
|
v
loop_prologue
|
v
  new_loop_preheader
|
v
loop_header

Yes, the preheader/header need to be consistent here, and they are now.
Thanks for explaining.

> At some point, you need to update new_loop_preheader frequency/count
> to reflect the fact that with some probability the loop_prologue avoids
> the vectorized loop.  Once you do it and if you don't scale frequency of
> loop_header you will make expect_loop_iterations to return higher value
> than previously.
>
> So at 

Re: fwprop fix for PR79405

2017-02-20 Thread Richard Biener
On February 20, 2017 2:58:54 PM GMT+01:00, Bernd Schmidt  
wrote:
>On 02/17/2017 10:11 AM, Richard Biener wrote:
>> Index: gcc/fwprop.c
>> ===
>> --- gcc/fwprop.c(revision 245501)
>> +++ gcc/fwprop.c(working copy)
>> @@ -1478,7 +1478,8 @@ fwprop (void)
>>   Do not forward propagate addresses into loops until after
>unrolling.
>>   CSE did so because it was able to fix its own mess, but we are
>not.  */
>>
>> -  for (i = 0; i < DF_USES_TABLE_SIZE (); i++)
>> +  unsigned sz = DF_USES_TABLE_SIZE ();
>> +  for (i = 0; i < sz; i++)
>>  {
>>df_ref use = DF_USES_GET (i);
>>if (use)
>>
>> might work?  (not knowing too much about this detail of the DF data
>> structures - can the table shrink?)
>
>This would probably work to fix the bug, but this behaviour is
>explicitly documented as intentional (in the comment the second half of
>which you've quoted). I assume it enables additional substitutions.

Hmm, this means the walking-stmts solution sounds more correct and also gets 
these second-level opportunities.

Richard.

>
>Bernd
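[Editorial aside: the distinction under discussion — snapshotting the use-table bound up front versus re-reading it each iteration so that uses added during the walk are also processed — can be illustrated outside of DF entirely.  A sketch with invented names; nothing below is GCC code:]

```python
def walk_snapshot():
    """Bound cached up front, like the proposed 'unsigned sz' fix."""
    table = [1, 2, 3, 4]           # toy stand-in for the DF uses table
    bound = len(table)             # snapshot of DF_USES_TABLE_SIZE ()
    visited = 0
    for i in range(bound):
        if len(table) < 8:         # processing may append a brand-new use
            table.append(0)
        visited += 1
    return visited, len(table)

def walk_live():
    """Bound re-read each iteration -- the documented fwprop behaviour
    that also catches second-level opportunities."""
    table = [1, 2, 3, 4]
    visited = 0
    i = 0
    while i < len(table):          # bound re-evaluated every iteration
        if len(table) < 8:
            table.append(0)
        visited += 1
        i += 1
    return visited, len(table)

print(walk_snapshot())  # (4, 8): new entries appended but not visited
print(walk_live())      # (8, 8): new entries visited as well
```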



New Swedish PO file for 'gcc' (version 7.1-b20170101)

2017-02-20 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-7.1-b20170101.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

2017-02-20 Thread Jan Hubicka
> BTW, if we use gcov_type in calculation from 
> expected_loop_iterations_unbounded,
> how should we adjust the number for calling scale_loop_frequencies
> which has int type?  In extreme case, gcov_type could be out of int's
> range, we have to cap the value anyway.  But yes, 1 in
> expect_loop_iterations is too small.

What I usually do is to use fixed point math in this case (based on 
REG_BR_PROB_BASE).
Just pass REG_BR_PROB_BASE as den and calculate the adjustment in gcov_type 
converting
to int. Because you should be just decreasing the counts, it won't overflow and 
because
the decrease will be in range, say 2...256 times, it should also be sufficiently
precise.

Be careful to avoid overflow of gcov type - it is not safe to multiply two
counts in 64bit math because each count can be more than 2^32.  (next stage1 I
plan to switch most of this to sreals that will make this easier)
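[Editorial aside: the fixed-point scheme described above can be sketched in a few lines.  Python stands in for the C; REG_BR_PROB_BASE is GCC's probability base, and the helper name is invented:]

```python
REG_BR_PROB_BASE = 10000  # GCC's fixed-point probability base

def count_scale(old_count, new_count):
    """Numerator to pass alongside den = REG_BR_PROB_BASE when scaling.

    The multiply stays within 64-bit (gcov_type) range because a count
    is multiplied by the small base, never by another count -- two raw
    counts multiplied together could overflow, since each may exceed
    2**32.
    """
    num = new_count * REG_BR_PROB_BASE // max(old_count, 1)
    # We should only ever be decreasing counts, so cap the ratio at 1.0.
    return min(num, REG_BR_PROB_BASE)

# Scaling a body executed 1_000_000 times down to 250_000 times:
print(count_scale(1_000_000, 250_000))  # 2500, i.e. 2500/10000 = 1/4
```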

> >> > But I guess here it is sort of safe because vectorized loops are simple.
> >> > You can't just scale down the existing counts/frequencies by vf, because 
> >> > the
> >> > entry edge frequency was adjusted.
> >> I don't 100% follow here; it looks like the code avoids changing frequency
> >> counter for preheader/exit edge, otherwise we would need to change all
> >> counters dominated by them?
> >
> > I was just wondering what is wrong with taking the existing 
> > frequencies/counts
> > the loop body has and dividing them all by the unroll factor.  This is 
> > correct
> > if you ignore the versioning. With versioning I guess you want to scale by
> the unroll factor and also subtract frequencies/counts that were accounted to
> > the other versions of the loop, right?
> IIUC, for (vectorized) loop header, it's frequency is calculated by:
>   freq_header = freq_preheader + freq_latch
> and freq_latch = (niter * freq_preheader).  Simply scaling it by VF gives:
>   freq_header = (freq_preheader + freq_latch) / VF
> which is wrong.  Especially if the loop is vectorized by large VF
> (=16) and we normally assume niter (=10) without profiling
> information, it results not only in a mismatch, but also in
> (loop->header->frequency < loop->preheader->frequency).  In fact, if
> we have accurate niter information, the counter should be:
>   freq_header = freq_preheader + niter * freq_preheader

You are right. We need to compensate for the change of probability of the loop
exit edge.
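[Editorial aside: the arithmetic above is easy to check with made-up numbers — nothing here comes from GCC; it just replays the freq_header formula from the quoted derivation:]

```python
VF = 16             # vectorization factor
niter = 10          # assumed iterations when no profile data exists
freq_preheader = 100

# freq_header = freq_preheader + freq_latch, freq_latch = niter * freq_preheader
freq_header = freq_preheader + niter * freq_preheader   # 1100

# Naively dividing the header frequency by VF:
naive = freq_header // VF                               # 68
print(naive < freq_preheader)                           # True -- broken profile

# Scaling the iteration count instead keeps the CFG profile consistent:
new_niter = niter // VF                                 # 0 with these numbers
fixed = freq_preheader + new_niter * freq_preheader     # 100
print(fixed >= freq_preheader)                          # True
```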
> 
> >> >
> >> > Also niter_for_unrolled_loop depends on sanity of the profile, so 
> >> > perhaps you
> >> > need to compute it before you start changing the CFG by peeling 
> >> > prologue?
> >> Peeling for prologue doesn't change profiling information of
> >> vect_loop, it is the skip edge from before loop to preferred epilogue
> >> loop that will change profile counters.  I guess here exists a dilemma
> >> that niter_for_unrolled_loop is for loop after peeling for prologue?
> >
> > expected_loop_iterations_unbounded calculates number of iterations by 
> > computing
> > sum of frequencies of edges entering the loop and comparing it to the 
> > frequency
> > of loop header.  While peeling the prologue, you split the preheader edge 
> > and
> > adjust frequency of the new preheader BB of the loop to be vectorized.  I 
> > think
> > that will adjust the #of iterations estimate.
> It's not the case now, I think.  One motivation of the new vect_do_peeling
> is to avoid niter checks between prologue and vector loop.  Once
> prologue loop is entered or checked, the vector loop must be executed
> unconditionally.  So the preheader of the vector loop has consistent
> frequency counters now.  The niter check on whether vector loop should
> be executed is now merged with cost check before prologue, and in the
> future I think this can be further merged if loop versioning is
> needed.

Originally you have

  loop_preheader
   |
   v
   loop_header

and the ratio of the two BB frequencies is the loop iteration count. Then you
do something like:

  orig_loop_preheader
   |
   v
   loop_prologue -> scalar_version_of_loop
   |
   v
 new_loop_preheader
   |
   v
   loop_header

At some point, you need to update new_loop_preheader frequency/count
to reflect the fact that with some probability the loop_prologue avoids
the vectorized loop.  Once you do it and if you don't scale frequency of
loop_header you will make expect_loop_iterations to return higher value
than previously.

So at the time you are calling it, you need to be sure that the loop_header
and its preheader frequences was both adjusted by same factor.  Or you need
to call it early before you start hacking on the CFG and its profile.

Perhaps currently it is safe, because your peeling code is also scaling
the loop profiles.

Honza
> 
> Thanks,
> bin
> >
> > Honza
> >>
> >> Thanks,
> >> bin
> >> >
> >> > Finally if freq_e is 0, all frequencies and counts will be probably 
> >> > dropped to
> >> > 0.  What about determining fraction by counts if they are 

Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

2017-02-20 Thread Bin.Cheng
On Mon, Feb 20, 2017 at 3:17 PM, Jan Hubicka  wrote:
>> > +  /* Without profile feedback, loops for which we do not know a better 
>> > estimate
>> > + are assumed to roll 10 times.  When we unroll such loop, it appears 
>> > to
>> > + roll too little, and it may even seem to be cold.  To avoid this, we
>> > + ensure that the created loop appears to roll at least 5 times (but at
>> > + most as many times as before unrolling).  */
>> > +  if (new_est_niter < 5)
>> > +{
>> > +  if (est_niter < 5)
>> > +   new_est_niter = est_niter;
>> > +  else
>> > +   new_est_niter = 5;
>> > +}
>> > +
>> > +  return new_est_niter;
>> > +}
>> >
>> > I see this code is pre-existing, but please extend it to test if
>> > loop->header->count is non-zero.  Even if we do not have idea about loop
>> > iteration count estimate we may end up predicting more than 10 iterations 
>> > when
>> > predictors combine that way.
>> If I use expected_loop_iterations_unbounded, then do I need to handle
>> loop->header->count explicitly here?  I suppose not because it has
>> below code already:
>>
>>   /* If we have no profile at all, use AVG_LOOP_NITER.  */
>>   if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
>> expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER);
>>   else if (loop->latch && (loop->latch->count || loop->header->count))
>> {
>>   gcov_type count_in, count_latch;
>>   //...
>
> expected_loop_iterations_unbounded avoids capping the # of iterations coming
> from profile feedback (so if loop iterates 100 it returns 100 instead of
> 1). Testing loop->header->count for non-zero tests if the loop was ever
> entered in the train run, so it is an easy test if you have any profile feedback
> for a given loop available.
BTW, if we use gcov_type in calculation from expected_loop_iterations_unbounded,
how should we adjust the number for calling scale_loop_frequencies
which has int type?  In extreme case, gcov_type could be out of int's
range, we have to cap the value anyway.  But yes, 1 in
expect_loop_iterations is too small.
>
> If profile feedback is there, then making sure that new_est_niter is at least
> 5 will only make feedback less accurate.  I am not sure if we vectorize a loop
> that will iterate only 1...4 times after unrolling, but I guess we could,
> and it would be desirable to peel it afterwards.
Right, I think we can discard this lower bound if profiling
information is present.
>>
>> The second question is: it looks like it only takes latch->count into
>> consideration when PROFILE_ABSENT.  But according to your comments, we
>> could have nonzero count sometime?
>
> profile_status_for_fn (cfun) == PROFILE_ABSENT is set only when there is no
> profile estimate at all - i.e. you compile with
> -fno-guess-branch-probabilities.  We don't really use that path very often
> but in that case you have no data to guess your profile from.
>
>>
>> >
>> > Perhaps testing estimated-loop_iterations would also make sense, but that
>> > could be dealt with incrementally.
>> >
>> > +static void
>> > +scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
>> > +{
>> > +  unsigned freq_h = loop->header->frequency;
>> > +  unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
>> > +  /* Reduce loop iterations by the vectorization factor.  */
>> > +  unsigned new_est_niter = niter_for_unrolled_loop (loop, vf);
>> > +
>> > +  if (freq_h != 0)
>> > +scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
>> > +
>> > I am always trying to avoid propagating small mistakes (i.e. wrong freq_e 
>> > or
>> > freq_h) into bigger mistakes (i.e. wrong profile of the whole loop) to 
>> > avoid
>> > spreading mistakes across cfg.
>> >
>> > But I guess here it is sort of safe because vectorized loops are simple.
>> > You can't just scale down the existing counts/frequencies by vf, because 
>> > the
>> > entry edge frequency was adjusted.
>> I don't 100% follow here; it looks like the code avoids changing frequency
>> counter for preheader/exit edge, otherwise we would need to change all
>> counters dominated by them?
>
> I was just wondering what is wrong with taking the existing frequencies/counts
> the loop body has and dividing them all by the unroll factor.  This is correct
> if you ignore the versioning. With versioning I guess you want to scale by
> > the unroll factor and also subtract frequencies/counts that were accounted to
> the other versions of the loop, right?
IIUC, for (vectorized) loop header, it's frequency is calculated by:
  freq_header = freq_preheader + freq_latch
and freq_latch = (niter * freq_preheader).  Simply scaling it by VF gives:
  freq_header = (freq_preheader + freq_latch) / VF
which is wrong.  Especially if the loop is vectorized by large VF
(=16) and we normally assume niter (=10) without profiling
information, it results not only in a mismatch, but also in
(loop->header->frequency < loop->preheader->frequency).  In fact, 

Re: [PATCH] Fix -fsanitize=bounds crash with zero-size array (PR sanitizer/79558)

2017-02-20 Thread Jakub Jelinek
On Mon, Feb 20, 2017 at 04:44:10PM +0100, Marek Polacek wrote:
> We crash here because ubsan_type_descriptor isn't able to handle arrays
> such as int[0:], i.e. where the TYPE_MAX_VALUE of the domain is missing.
> Fixed by checking that first, which means we'd print '*' instead if it
> is missing.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk/6?
> 
> 2017-02-20  Marek Polacek  
> 
>   PR sanitizer/79558
>   * ubsan.c (ubsan_type_descriptor): Check if TYPE_MAX_VALUE is null.
> 
>   * c-c++-common/ubsan/bounds-14.c: New test.

Ok, thanks.

Jakub


[PATCH] Fix -fsanitize=bounds crash with zero-size array (PR sanitizer/79558)

2017-02-20 Thread Marek Polacek
We crash here because ubsan_type_descriptor isn't able to handle arrays
such as int[0:], i.e. where the TYPE_MAX_VALUE of the domain is missing.
Fixed by checking that first, which means we'd print '*' instead if it
is missing.

Bootstrapped/regtested on x86_64-linux, ok for trunk/6?

2017-02-20  Marek Polacek  

PR sanitizer/79558
* ubsan.c (ubsan_type_descriptor): Check if TYPE_MAX_VALUE is null.

* c-c++-common/ubsan/bounds-14.c: New test.

diff --git gcc/testsuite/c-c++-common/ubsan/bounds-14.c 
gcc/testsuite/c-c++-common/ubsan/bounds-14.c
index e69de29..ddb5251 100644
--- gcc/testsuite/c-c++-common/ubsan/bounds-14.c
+++ gcc/testsuite/c-c++-common/ubsan/bounds-14.c
@@ -0,0 +1,13 @@
+/* PR sanitizer/79558 */
+/* { dg-do compile } */
+/* { dg-options "-fsanitize=bounds" } */
+
+void
+fn1 (int n)
+{
+  int i, j;
+  int x[2][0];
+  for (i = 0; i < n; i++)
+for (j = 0; j < n; j++)
+  x[i][j] = 5;
+}
diff --git gcc/ubsan.c gcc/ubsan.c
index 0291401..11a41e1 100644
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -409,7 +409,9 @@ ubsan_type_descriptor (tree type, enum ubsan_print_style 
pstyle)
{
  pp_left_bracket (_name);
  tree dom = TYPE_DOMAIN (t);
- if (dom && TREE_CODE (TYPE_MAX_VALUE (dom)) == INTEGER_CST)
+ if (dom != NULL_TREE
+ && TYPE_MAX_VALUE (dom) != NULL_TREE
+ && TREE_CODE (TYPE_MAX_VALUE (dom)) == INTEGER_CST)
{
  if (tree_fits_uhwi_p (TYPE_MAX_VALUE (dom))
  && tree_to_uhwi (TYPE_MAX_VALUE (dom)) + 1 != 0)

Marek


Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

2017-02-20 Thread Jan Hubicka
> > +  /* Without profile feedback, loops for which we do not know a better 
> > estimate
> > + are assumed to roll 10 times.  When we unroll such loop, it appears to
> > + roll too little, and it may even seem to be cold.  To avoid this, we
> > + ensure that the created loop appears to roll at least 5 times (but at
> > + most as many times as before unrolling).  */
> > +  if (new_est_niter < 5)
> > +{
> > +  if (est_niter < 5)
> > +   new_est_niter = est_niter;
> > +  else
> > +   new_est_niter = 5;
> > +}
> > +
> > +  return new_est_niter;
> > +}
> >
> > I see this code is pre-existing, but please extend it to test if
> > loop->header->count is non-zero.  Even if we do not have idea about loop
> > iteration count estimate we may end up predicting more than 10 iterations 
> > when
> > predictors combine that way.
> If I use expected_loop_iterations_unbounded, then do I need to handle
> loop->header->count explicitly here?  I suppose not because it has
> below code already:
> 
>   /* If we have no profile at all, use AVG_LOOP_NITER.  */
>   if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
> expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER);
>   else if (loop->latch && (loop->latch->count || loop->header->count))
> {
>   gcov_type count_in, count_latch;
>   //...

expected_loop_iterations_unbounded avoids capping the # of iterations coming
from profile feedback (so if loop iterates 100 it returns 100 instead of
1). Testing loop->header->count for non-zero tests if the loop was ever
entered in the train run, so it is an easy test if you have any profile feedback
for a given loop available.

If profile feedback is there, then making sure that new_est_niter is at least
5 will only make feedback less accurate.  I am not sure if we vectorize a loop
that will iterate only 1...4 times after unrolling, but I guess we could,
and it would be desirable to peel it afterwards.
> 
> The second question is: it looks like it only takes latch->count into
> consideration when PROFILE_ABSENT.  But according to your comments, we
> could have nonzero count sometime?

profile_status_for_fn (cfun) == PROFILE_ABSENT is set only when there is no
profile estimate at all - i.e. you compile with
-fno-guess-branch-probabilities.  We don't really use that path very often
but in that case you have no data to guess your profile from.

> 
> >
> > Perhaps testing estimated-loop_iterations would also make sense, but that
> > could be dealt with incrementally.
> >
> > +static void
> > +scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
> > +{
> > +  unsigned freq_h = loop->header->frequency;
> > +  unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
> > +  /* Reduce loop iterations by the vectorization factor.  */
> > +  unsigned new_est_niter = niter_for_unrolled_loop (loop, vf);
> > +
> > +  if (freq_h != 0)
> > +scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
> > +
> > I am always trying to avoid propagating small mistakes (i.e. wrong freq_e or
> > freq_h) into bigger mistakes (i.e. wrong profile of the whole loop) to avoid
> > spreading mistakes across cfg.
> >
> > But I guess here it is sort of safe because vectorized loops are simple.
> > You can't just scale down the existing counts/frequencies by vf, because the
> > entry edge frequency was adjusted.
> I don't 100% follow here; it looks like the code avoids changing frequency
> counter for preheader/exit edge, otherwise we would need to change all
> counters dominated by them?

I was just wondering what is wrong with taking the existing frequencies/counts
the loop body has and dividing them all by the unroll factor.  This is correct
if you ignore the versioning. With versioning I guess you want to scale by
the unroll factor and also subtract frequencies/counts that were accounted to
the other versions of the loop, right?
> >
> > Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps 
> > you
> > need to compute it before you start changing the CFG by peeling prologue?
> Peeling for prologue doesn't change profiling information of
> vect_loop, it is the skip edge from before loop to preferred epilogue
> loop that will change profile counters.  I guess here exists a dilemma
> that niter_for_unrolled_loop is for loop after peeling for prologue?

expected_loop_iterations_unbounded calculates number of iterations by computing
sum of frequencies of edges entering the loop and comparing it to the frequency
of loop header.  While peeling the prologue, you split the preheader edge and
adjust frequency of the new preheader BB of the loop to be vectorized.  I think
that will adjust the #of iterations estimate.
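[Editorial aside: the estimate described above can be sanity-checked with invented frequencies.  The real function also weighs profile-feedback counts when they exist; this sketch only shows the frequency case:]

```python
# Frequencies of the edges entering the loop (e.g. the preheader plus any
# extra entry created when peeling splits edges); all numbers invented.
entry_edge_freqs = [80, 20]
count_in = sum(entry_edge_freqs)        # 100

# The header runs once per entry plus once per latch iteration, so with
# an average of 10 iterations per entry:
freq_header = count_in + 10 * count_in  # 1100

# Comparing the header frequency to the entry frequency recovers the
# iteration estimate:
expected_niter = (freq_header - count_in) / count_in
print(expected_niter)                   # 10.0
```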

Honza
> 
> Thanks,
> bin
> >
> > Finally if freq_e is 0, all frequencies and counts will be probably dropped 
> > to
> > 0.  What about determining fraction by counts if they are available?
> >
> > Otherwise the patch looks good and thanks a lot for working on this!
> >
> > 

Re: Improving code generation in the nvptx back end

2017-02-20 Thread Alexander Monakov
On Fri, 17 Feb 2017, Thomas Schwinge wrote:
> On Fri, 17 Feb 2017 14:00:09 +0100, I wrote:
> > [...] for "normal" functions there is no reason to use the
> > ".param" space for passing arguments in and out of functions.  We can
> > then get rid of the boilerplate code to move ".param %in_ar*" into ".reg
> > %ar*", and the other way round for "%value_out"/"%value".  This will then
> > also simplify the call sites, where all that code "evaporates".  That's
> > actually something I started to look into, many months ago, and I now
> > just dug out those changes, and will post them later.
> > 
> > (Very likely, the PTX "JIT" compiler will do the very same thing without
> > difficulty, but why not directly generate code that is less verbose to
> > read?)

In general you cannot use this shorthand notation because PTX interop guidelines
explicitly prescribe using the .param space for argument passing.  See
https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/ , section 3.

So at best GCC can use it for calls where interop concerns are guaranteed to not
arise: when the callee is not externally visible, and does not have its address
taken.  And there's a question of how well it's going to work after time passes,
since other compilers always use the verbose form (and thus the .reg calling
style is not frequently exercised).

Alexander


Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

2017-02-20 Thread Bin.Cheng
On Mon, Feb 20, 2017 at 2:02 PM, Jan Hubicka  wrote:
>> > 2017-02-16  Bin Cheng  
>> >
>> > PR tree-optimization/77536
>> > * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function.
>> > (tree_transform_and_unroll_loop): Use above function to compute the
>> > estimated niter of unrolled loop.
>> > * tree-ssa-loop-manip.h niter_for_unrolled_loop(): New declaration.
>> > * tree-vect-loop.c (scale_profile_for_vect_loop): New function.
>> > (vect_transform_loop): Call above function.
Thanks very much for your suggestions.  I don't know the profiling logic
very well and have some questions embedded below before I start
revising the patch.
>
> +/* Return estimated niter for LOOP after unrolling by FACTOR times.  */
> +
> +unsigned
> +niter_for_unrolled_loop (struct loop *loop, unsigned factor)
> +{
> +  unsigned est_niter = expected_loop_iterations (loop);
>
> What happens when you have a profile and the loop iterates very many times?
> Perhaps we want to do all calculation in gcov_type and use
> expected_loop_iterations_unbounded?
>
> expected_loop_iterations is capping by 1, which is easy to overflow.
>
> +  gcc_assert (factor != 0);
> +  unsigned new_est_niter = est_niter / factor;
> +
> +  /* Without profile feedback, loops for which we do not know a better estimate
> + are assumed to roll 10 times.  When we unroll such loop, it appears to
> + roll too little, and it may even seem to be cold.  To avoid this, we
> + ensure that the created loop appears to roll at least 5 times (but at
> + most as many times as before unrolling).  */
> +  if (new_est_niter < 5)
> +{
> +  if (est_niter < 5)
> +   new_est_niter = est_niter;
> +  else
> +   new_est_niter = 5;
> +}
> +
> +  return new_est_niter;
> +}
>
> I see this code is pre-existing, but please extend it to test if
> loop->header->count is non-zero.  Even if we have no idea about the loop
> iteration count estimate, we may end up predicting more than 10 iterations when
> predictors combine that way.
If I use expected_loop_iterations_unbounded, do I still need to handle
loop->header->count explicitly here?  I suppose not, because it already
contains this code:

  /* If we have no profile at all, use AVG_LOOP_NITER.  */
  if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
expected = PARAM_VALUE (PARAM_AVG_LOOP_NITER);
  else if (loop->latch && (loop->latch->count || loop->header->count))
{
  gcov_type count_in, count_latch;
  //...

The second question is: it looks like this only takes latch->count into
consideration for PROFILE_ABSENT.  But according to your comments, we
could have a nonzero count sometimes?

>
> Perhaps testing estimated-loop_iterations would also make sense, but that
> could be dealt with incrementally.
>
> +static void
> +scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
> +{
> +  unsigned freq_h = loop->header->frequency;
> +  unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
> +  /* Reduce loop iterations by the vectorization factor.  */
> +  unsigned new_est_niter = niter_for_unrolled_loop (loop, vf);
> +
> +  if (freq_h != 0)
> +scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
> +
> I am always trying to avoid propagating small mistakes (i.e. wrong freq_h or
> freq_e) into bigger mistakes (i.e. wrong profile of the whole loop) to avoid
> spreading mistakes across the cfg.
>
> But I guess here it is sort of safe because vectorized loops are simple.
> You can't just scale down the existing counts/frequencies by vf, because the
> entry edge frequency was adjusted.
I do not 100% follow here; it looks like the code avoids changing the
frequency counters for the preheader/exit edge, since otherwise we would
need to change all counters dominated by them?
>
> Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps you
> need to compute it before you start changing the CFG by peeling the prologue?
Peeling the prologue doesn't change the profiling information of the
vect_loop; it is the skip edge from before the loop to the preferred
epilogue loop that changes the profile counters.  I guess there is a
dilemma here, in that niter_for_unrolled_loop is for the loop after
prologue peeling?

Thanks,
bin
>
> Finally if freq_e is 0, all frequencies and counts will be probably dropped to
> 0.  What about determining fraction by counts if they are available?
>
> Otherwise the patch looks good and thanks a lot for working on this!
>
> Honza
>
>> >
>> > gcc/testsuite/ChangeLog
>> > 2017-02-16  Bin Cheng  
>> >
>> > PR tree-optimization/77536
>> > * gcc.dg/vect/pr79347.c: Revise testing string.


Re: [PATCH][DOC][OBVIOUS] Document default value for use-after-scope-direct-emission-threshold

2017-02-20 Thread Martin Liška
On 02/18/2017 09:48 AM, Gerald Pfeifer wrote:
> Let me know whether this looks fine to you, and I'll commit.
> 
> Gerald

I'm fine with that! Thanks.

Martin


Re: [GIMPLE FE] add fma_expr

2017-02-20 Thread Kyrill Tkachov

Hi Prathamesh,

On 16/02/17 12:47, Prathamesh Kulkarni wrote:

Hi Richard,
The attached patch handles fma_expr in gimple-fe.
Does it look OK ?

Thanks,
Prathamesh


I see the new test ICEing on aarch64-none-elf.

Thanks,
Kyrill

--

$DIR/build-aarch64/obj/gcc2/gcc/xgcc -B$DIR/build-aarch64/obj/gcc2/gcc/ $DIR/gcc/gcc/testsuite/gcc.dg/gimplefe-26.c -fno-diagnostics-show-caret -fdiagnostics-color=never -O -fgimple -fdump-tree-ssa-gimple -S -specs=aem-ve.specs -o gimplefe-26.s
$DIR/gcc/gcc/testsuite/gcc.dg/gimplefe-26.c: In function 'foo_3':
$DIR/gcc/gcc/testsuite/gcc.dg/gimplefe-26.c:5:18: internal compiler error: in expand_expr_real_2, at expr.c:8705
$DIR/gcc/gcc/testsuite/gcc.dg/gimplefe-26.c:14:1: note: in expansion of macro 'foo'
0x84e7a3 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
$DIR/gcc/gcc/expr.c:8705
0x838bd6 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
$DIR/gcc/gcc/expr.c:9730
0x83f0f5 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
$DIR/gcc/gcc/expr.c:8072
0x73ab2f expand_expr
$DIR/gcc/gcc/expr.h:276
0x73ab2f expand_return
$DIR/gcc/gcc/cfgexpand.c:3526
0x73ab2f expand_gimple_stmt_1
$DIR/gcc/gcc/cfgexpand.c:3610
0x73ab2f expand_gimple_stmt
$DIR/gcc/gcc/cfgexpand.c:3737
0x73d55b expand_gimple_basic_block
$DIR/gcc/gcc/cfgexpand.c:5744
0x740b14 execute
$DIR/gcc/gcc/cfgexpand.c:6357
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.



Re: [PATCH 0/6] Improve -fprefetch-loop-arrays in general and for AArch64 in particular

2017-02-20 Thread Kyrill Tkachov

Hi Maxim,

On 30/01/17 11:24, Maxim Kuvyrkov wrote:

This patch series improves -fprefetch-loop-arrays pass through small fixes and 
tweaks, and then enables it for several AArch64 cores.

My tunings were done on and for Qualcomm hardware, with results varying between 
+0.5-1.9% for SPEC2006 INT and +0.25%-1.0% for SPEC2006 FP at -O3, depending on 
hardware revision.

This patch series enables restricted -fprefetch-loop-arrays at -O2, which also
improves SPEC2006 numbers.

Biggest progressions are on 419.mcf and 437.leslie3d, with no serious 
regressions on other benchmarks.

I'm now investigating making -fprefetch-loop-arrays more aggressive for 
Qualcomm hardware, which improves performance on most benchmarks, but also 
causes big regressions on 454.calculix and 462.libquantum.  If I can fix these 
two regressions, prefetching will give another boost to AArch64.

Andrew just posted similar prefetching tunings for Cavium's cores, and the two
patches have trivial conflicts.  I'll post mine as-is, since it addresses one of
the comments on Andrew's review (adding a stand-alone struct for tuning
parameters).

Andrew, feel free to just copy-paste it to your patch, since it is just a 
mechanical change.

All patches were bootstrapped and regtested on x86_64-linux-gnu and 
aarch64-linux-gnu.
  


I've tried these patches out on Cortex-A72 and Cortex-A53, with the tuning
struct entries appropriately modified to enable the changes on those cores.
I'm seeing the mcf and leslie3d improvements as well on Cortex-A72 and
Cortex-A53, and no noticeable regressions.
I've also verified that the improvements are due to the prefetch instructions
rather than just the unrolling that the pass does.
So I'm in favor of enabling this for the cores that benefit from it.

Do you plan to get this in for GCC 8?
Thanks,
Kyrill


--
Maxim Kuvyrkov
www.linaro.org



[Patch, fortran] PRs79599 and 79523

2017-02-20 Thread Paul Richard Thomas
Dear All,

These trivial bugs have been fixed in revision 245603.

2017-02-20  Paul Thomas  

PR fortran/79599
* interface.c (check_dtio_arg_TKR_intent): Supply 'must'
missing from error message.

2017-02-20  Paul Thomas  

PR fortran/79523
* interface.c (gfc_find_typebound_dtio_proc): Guard test for
flavor attribute by checking that symbol is resolved.


Cheers

Paul


Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

2017-02-20 Thread Jan Hubicka
> > 2017-02-16  Bin Cheng  
> >
> > PR tree-optimization/77536
> > * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function.
> > (tree_transform_and_unroll_loop): Use above function to compute the
> > estimated niter of unrolled loop.
> > * tree-ssa-loop-manip.h niter_for_unrolled_loop(): New declaration.
> > * tree-vect-loop.c (scale_profile_for_vect_loop): New function.
> > (vect_transform_loop): Call above function.

+/* Return estimated niter for LOOP after unrolling by FACTOR times.  */
+
+unsigned
+niter_for_unrolled_loop (struct loop *loop, unsigned factor)
+{
+  unsigned est_niter = expected_loop_iterations (loop);

What happens when you have a profile and the loop iterates very many times?
Perhaps we want to do all calculation in gcov_type and use
expected_loop_iterations_unbounded?

expected_loop_iterations is capping by 1, which is easy to overflow.

+  gcc_assert (factor != 0);
+  unsigned new_est_niter = est_niter / factor;
+
+  /* Without profile feedback, loops for which we do not know a better estimate
+ are assumed to roll 10 times.  When we unroll such loop, it appears to
+ roll too little, and it may even seem to be cold.  To avoid this, we
+ ensure that the created loop appears to roll at least 5 times (but at
+ most as many times as before unrolling).  */
+  if (new_est_niter < 5)
+{
+  if (est_niter < 5)
+   new_est_niter = est_niter;
+  else
+   new_est_niter = 5;
+}
+
+  return new_est_niter;
+}

I see this code is pre-existing, but please extend it to test if
loop->header->count is non-zero.  Even if we have no idea about the loop
iteration count estimate, we may end up predicting more than 10 iterations when
predictors combine that way.

Perhaps testing estimated-loop_iterations would also make sense, but that
could be dealt with incrementally.

+static void
+scale_profile_for_vect_loop (struct loop *loop, unsigned vf)
+{
+  unsigned freq_h = loop->header->frequency;
+  unsigned freq_e = EDGE_FREQUENCY (loop_preheader_edge (loop));
+  /* Reduce loop iterations by the vectorization factor.  */
+  unsigned new_est_niter = niter_for_unrolled_loop (loop, vf);
+
+  if (freq_h != 0)
+scale_loop_frequencies (loop, freq_e * (new_est_niter + 1), freq_h);
+
I am always trying to avoid propagating small mistakes (i.e. wrong freq_h or
freq_e) into bigger mistakes (i.e. wrong profile of the whole loop) to avoid
spreading mistakes across the cfg.

But I guess here it is sort of safe because vectorized loops are simple.
You can't just scale down the existing counts/frequencies by vf, because the
entry edge frequency was adjusted.

Also niter_for_unrolled_loop depends on sanity of the profile, so perhaps you
need to compute it before you start changing the CFG by peeling the prologue?

Finally if freq_e is 0, all frequencies and counts will be probably dropped to
0.  What about determining fraction by counts if they are available?

Otherwise the patch looks good and thanks a lot for working on this!

Honza

> >
> > gcc/testsuite/ChangeLog
> > 2017-02-16  Bin Cheng  
> >
> > PR tree-optimization/77536
> > * gcc.dg/vect/pr79347.c: Revise testing string.


Re: fwprop fix for PR79405

2017-02-20 Thread Bernd Schmidt

On 02/17/2017 10:11 AM, Richard Biener wrote:

Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	(revision 245501)
+++ gcc/fwprop.c	(working copy)
@@ -1478,7 +1478,8 @@ fwprop (void)
  Do not forward propagate addresses into loops until after unrolling.
  CSE did so because it was able to fix its own mess, but we are not.  */

-  for (i = 0; i < DF_USES_TABLE_SIZE (); i++)
+  unsigned sz = DF_USES_TABLE_SIZE ();
+  for (i = 0; i < sz; i++)
 {
   df_ref use = DF_USES_GET (i);
   if (use)

might work?  (not knowing too much about this detail of the DF data
structures - can the table shrink?)


This would probably work to fix the bug, but this behaviour is
explicitly documented as intentional (in the comment whose second half
you've quoted).  I assume it enables additional substitutions.



Bernd


Re: Improving code generation in the nvptx back end

2017-02-20 Thread Bernd Schmidt

On 02/17/2017 02:09 PM, Thomas Schwinge wrote:

Hi!

On Fri, 17 Feb 2017 14:00:09 +0100, I wrote:

[...] for "normal" functions there is no reason to use the
".param" space for passing arguments in and out of functions.  We can
then get rid of the boilerplate code to move ".param %in_ar*" into ".reg
%ar*", and the other way round for "%value_out"/"%value".  This will then
also simplify the call sites, where all that code "evaporates".  That's
actually something I started to look into, many months ago, and I now
just dug out those changes, and will post them later.

(Very likely, the PTX "JIT" compiler will do the very same thing without
difficulty, but why not directly generate code that is less verbose to
read?)


Using my WIP patch, the generated PTX code changes/is simplified as
follows:


It's probably a good idea to run cuobjdump -sass to see whether this has 
any effect at all.


The most important issue that needs solving is probably still the old 
issue that ptxas isn't reliable. Looks like the llvm folks ran into the 
same problem, as I discovered last week:

  https://bugs.llvm.org/show_bug.cgi?id=27738


Bernd


Re: [libstdc++,doc] doc/xml/manual/profile_mode.xml: Update a paper reference

2017-02-20 Thread Gerald Pfeifer
On Fri, 10 Feb 2017, Joseph Myers wrote:
>> -   <link xmlns:xlink="http://www.w3.org/1999/xlink"
>> xlink:href="http://dx.doi.org/10.1109/CGO.2009.36">paper presented at
>> +   <link xmlns:xlink="http://www.w3.org/1999/xlink"
>> xlink:href="http://ieeexplore.ieee.org/document/4907670/">paper presented at
> dx.doi.org is specifically supposed to provide permanent URLs that 
> continue to work even if e.g. a journal changes publisher and the 
> old publisher fails to redirect for individual papers itself.  I'm 
> not convinced that replacing a link to such a URL with a link to the 
> current redirection target is a good idea.

You make a good point, Joseph, though I'd argue that my trust in
ieee.org being around ten years from now exceeds my trust in dx.doi.org
(especially after the citeseer cleanup I ended up working on last weekend).

For now I am thinking to keep this link (to ieee.org), but have started
to convert all citeseer references to dx.doi.org and copied you on my
initial patch in that direction.  Let's hope that service is going to
remain stable and reliable.


One aspect I do not like too much about such a forwarding service is
that it means one of the benefits of link checking (discovering a site 
we point to going rogue) is quite reduced since every link to dx.doi.org
is going to redirect, by definition.

(Note, that is not an academic concern.  We've had such cases in the
past.)

Gerald


Re: [PATCH] Fix gimplify_expr ICE with Labels as Values (PR middle-end/79537)

2017-02-20 Thread Marek Polacek
On Mon, Feb 20, 2017 at 10:51:04AM +0100, Richard Biener wrote:
> On Fri, Feb 17, 2017 at 2:53 PM, Marek Polacek  wrote:
> > We are crashing with a label as a value without an accompanying goto.
> > The problem is that TREE_SIDE_EFFECTS and FORCED_LABEL use the same
> > ->base.side_effects_flag, so gimplify_expr is confused.  We don't
> > ICE with 'goto *&&L;' because then we take this path:
> > 11406 case GOTO_EXPR:
> > 11407   /* If the target is not LABEL, then it is a computed jump
> > 11408  and the target needs to be gimplified.  */
> > 11409   if (TREE_CODE (GOTO_DESTINATION (*expr_p)) != LABEL_DECL)
> > 11410 {
> > 11411   ret = gimplify_expr (&GOTO_DESTINATION (*expr_p), pre_p,
> > 11412        NULL, is_gimple_val, fb_rvalue);
> > and because of that fb_rvalue we won't go to the switch where the ICE
> > occurred.  Because '*&&L;' on its own is useless, I think we can simply
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 6?
> 
> I think I prefer
> 
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c  (revision 245594)
> +++ gcc/gimplify.c  (working copy)
> @@ -11977,7 +11977,8 @@ gimplify_expr (tree *expr_p, gimple_seq
>  {
>/* We aren't looking for a value, and we don't have a valid
>  statement.  If it doesn't have side-effects, throw it away.  */
> -  if (!TREE_SIDE_EFFECTS (*expr_p))
> +  if (TREE_CODE (*expr_p) == LABEL_DECL
> +  || !TREE_SIDE_EFFECTS (*expr_p))
> *expr_p = NULL;
>else if (!TREE_THIS_VOLATILE (*expr_p))
> {
> 
> (with a comment).  Ideally we'd have TREE_NOT_CHECK (NON_TYPE_CHECK (NODE),
> LABEL_DECL) on TREE_SIDE_EFFECTS, but that may have unwanted fallout at
> this point.
> 
> Ok with the above change.

All right, I'll commit this after testing.  Thanks,

2017-02-20  Marek Polacek  

PR middle-end/79537
* gimplify.c (gimplify_expr): Handle unused *&.

* gcc.dg/comp-goto-4.c: New.

diff --git gcc/gimplify.c gcc/gimplify.c
index 1b9c8d2..820459c 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -11976,8 +11976,11 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
   if (fallback == fb_none && *expr_p && !is_gimple_stmt (*expr_p))
 {
   /* We aren't looking for a value, and we don't have a valid
-statement.  If it doesn't have side-effects, throw it away.  */
-  if (!TREE_SIDE_EFFECTS (*expr_p))
+statement.  If it doesn't have side-effects, throw it away.
+We can also get here with code such as "*&&L;", where L is
+a LABEL_DECL that is marked as FORCED_LABEL.  */
+  if (TREE_CODE (*expr_p) == LABEL_DECL
+ || !TREE_SIDE_EFFECTS (*expr_p))
*expr_p = NULL;
   else if (!TREE_THIS_VOLATILE (*expr_p))
{
diff --git gcc/testsuite/gcc.dg/comp-goto-4.c gcc/testsuite/gcc.dg/comp-goto-4.c
index e69de29..51a6a86 100644
--- gcc/testsuite/gcc.dg/comp-goto-4.c
+++ gcc/testsuite/gcc.dg/comp-goto-4.c
@@ -0,0 +1,21 @@
+/* PR middle-end/79537 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+/* { dg-require-effective-target indirect_jumps } */
+/* { dg-require-effective-target label_values } */
+
+void
+f (void)
+{
+L:
+  *&&L;
+}
+
+void
+f2 (void)
+{
+   void *p;
+L:
+   p = &&L;
+   *p; /* { dg-warning "dereferencing 'void \\*' pointer" } */
+}

Marek


Re: [PATCH PR77536]Generate correct profiling information for vectorized loop

2017-02-20 Thread Richard Biener
On Thu, Feb 16, 2017 at 6:48 PM, Bin Cheng  wrote:
> Hi,
> After fixing PR79347, the main problem with vectorizer is we scale down 
> profiling counters
> for vect_loop by VF, regardless of context CFG's profiling information.  This 
> is wrong and
> sometimes makes vect_loop not hot code, which may confuse following 
> optimizers.  This
> patch generates well-formed profiling information as generic tree unroller 
> does.  It borrows
> code from tree-ssa-loop-manip.c and does small refactors to existing code.  
> Vectorizer will
> not introduce mismatch counters with it, and unroll and vrp2 are the two 
> major passes
> messing up profiling counters now.  Though it removes all mismatches in 
> the vectorizer, I am not
> sure if this counts as a regression fix.
> BTW, it may also help PR78116?  Hi Pat, could you please help verify this?  
> Thanks!
>
> Bootstrap and test on x86_64 and AArch64 ongoing.  Is it OK if no failures?

The patch looks good to me but please wait for Honza to comment.

Thanks,
Richard.

> Thanks,
> bin
>
> 2017-02-16  Bin Cheng  
>
> PR tree-optimization/77536
> * tree-ssa-loop-manip.c (niter_for_unrolled_loop): New function.
> (tree_transform_and_unroll_loop): Use above function to compute the
> estimated niter of unrolled loop.
> * tree-ssa-loop-manip.h niter_for_unrolled_loop(): New declaration.
> * tree-vect-loop.c (scale_profile_for_vect_loop): New function.
> (vect_transform_loop): Call above function.
>
> gcc/testsuite/ChangeLog
> 2017-02-16  Bin Cheng  
>
> PR tree-optimization/77536
> * gcc.dg/vect/pr79347.c: Revise testing string.


RE: [PATCH 5/5] Ensure the mode used to create split registers is suppported

2017-02-20 Thread Matthew Fortune
Vladimir Makarov  writes:
> On 02/07/2017 09:08 AM, Matthew Fortune wrote:
> > Hi,
> >
> > This patch addresses a problem with LRA splitting hard registers where
> > the mode requires multiple registers.  When splitting then each
> > constituent register is split individually using the widest mode for
> > each register but no check is made that such a mode is actually
> > supported in those registers.
> >
> > MIPS has an ABI variant o32 FPXX that allows DFmode values to exist in
> > pairs of 32-bit floating point registers but only the first 32-bit
> > register is directly addressable.  The second register can only be
> > accessed as part of a 64-bit load/store or via a special move
> > instruction used as part of a 64-bit move.
> >
> > The split is simply rejected to ensure compliance with the ABI
> > although I expect the split logic could account for this case and
> > split using the wider mode.  Such a change appears more invasive than
> > appropriate in stage 4.
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78012
> >
> > It is unknown if any other LRA enabled target could hit this issue but
> > it is certainly possible depending on mode/register restrictions.
> The patch is ok for the trunk and it is pretty safe.  Thank you Robert
> and Matt.

Committed as r245601.

Thanks,
Matthew


RE: [PATCH 4/5] Partial revert of r243782 to restore previous behavior

2017-02-20 Thread Matthew Fortune
Vladimir Makarov  writes:
> On 02/07/2017 09:08 AM, Matthew Fortune wrote:
> > Hi,
> >
> > This patch partially reverts r243782 where a return false was added
> > expecting it to be a no-op.  Detailed inspection shows this was not
> > true.  Despite no bug being identified following the change, removing
> > the early return is likely to be safer than leaving it in place.
> >
> > This is supported by the discussion here:
> > https://gcc.gnu.org/ml/gcc/2017-02/msg7.html
> >
> OK for the trunk.  Matt, thank you very much for efforts to clean
> simplify_operand_subreg up.

Committed as r245600.

Thanks,
Matthew


RE: [PATCH 2/5] Tighten condition for converting SUBREG reloads from OP_OUT to OP_INOUT

2017-02-20 Thread Matthew Fortune
Matthew Fortune  writes:
> This change addresses a comment from Richard Sandiford in:
> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg02165.html
> 
> where a previous fix was over-eager in converting OP_OUT reloads to
> OP_INOUT.
> 
> No testcase here either but I will try and exercise this code later with
> a targeted test using the RTL frontend if possible.
> 
> Thanks,
> Matthew
> 
> gcc/
>   PR target/78660
>   * lra-constraints.c (curr_insn_transform): Tighten condition
>   for converting SUBREG reloads from OP_OUT to OP_INOUT.

Committed as r245599.

Thanks,
Matthew



RE: [PATCH 1/5] Handle WORD_REGISTER_OPERATIONS when reloading (subreg(reg))

2017-02-20 Thread Matthew Fortune
Vladimir Makarov  writes:
> On 02/07/2017 09:08 AM, Matthew Fortune wrote:
> > Hi,
> >
> > This change deals with reloading a subreg(reg) using the inner mode to
> > prevent partial spilling of data like in the case described here:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78660#c8
> >
> > No test case for now but I am investigating a targeted test using the
> > RTL frontend for later submission.
> >
> >
> > gcc/
> > PR target/78660
> > * lra-constraints.c (curr_insn_transform): Handle
> > WORD_REGISTER_OPERATIONS requirements when reloading SUBREGs.
> >
> The patch is OK.  So all 5 patches can be committed to the trunk.  I
> hope Eric's test of the patches on SPARC will be successful.  Thank you
> again for your efforts.

Committed as r245598.

Thanks,
Matthew


Re: [Patch, fortran] PR79382 - DTIO ICE

2017-02-20 Thread Paul Richard Thomas
Dear All,

Committed as revision 245596.

Thanks for the review.

Paul

On 16 February 2017 at 18:38, Jerry DeLisle  wrote:
> On 02/16/2017 03:31 AM, Paul Richard Thomas wrote:
>>
>> Dear All,
>>
>> The fix for the original bug is tested in dtio_24.f90. It is triggered
>> by the PRIVATE statement in the module and occurs because there is no
>> such generic interface in the module. Note, however, that there is a
>> typebound generic interface, which should not be affected by the
>> PRIVATE statement. The fix looks for the interface and issues an error
>> if it is not present.
>>
>> It was found that the absence of a DTIO procedure in a formatted
>> transfer, where a DT descriptor is present, caused a segfault. The fix
>> in transfer.c was to check if a reference to the DTIO procedure is
>> present and to issue an error if it is not. Unfortunately, since
>> trans-io.c transfers the components of derived types, in the absence
>> of a DTIO procedure, this negates the type check and requires that the
>> test in dtio_10.f90 be changed. I think that it would be a good idea
>> in the future to flag passing of components so that the type test can
>> be recovered. For this reason, I have left the calls in place.
>>
>> Bootstrapped and regtested on FC23/x86_64 - OK for trunk and 6-branch?
>
>
> OK for trunk. Not applicable for 6-branch
>>
>>
>> I am building up a backlog of approved patches: Including this one (if
>> approved :-) ), PRs79402, 79434 & 79447. Would it be OK to commit
>> these to trunk, even though it is in stage 4?
>
>
> Yes OK as long as we are not in freeze.
>
>
>>
>> Paul
>>
>> 2017-02-16  Paul Thomas  
>>
>> PR fortran/79382
>> * decl.c (access_attr_decl): Test for presence of generic DTIO
>> interface and emit error if not present.
>> (gfc_match_end): Catch case where a procedure is contained in
>> a module procedure and ensure that 'end procedure' is the
>> correct termination.
>>
>> 2017-02-16  Paul Thomas  
>>
>> PR fortran/79382
>> * io/transfer.c (check_dtio_proc): New function.
>> (formatted_transfer_scalar_read): Use it.
>> (formatted_transfer_scalar_write): ditto.
>>
>> 2017-02-16  Paul Thomas  
>>
>> PR fortran/79382
>> * gfortran.dg/dtio_10.f90 : Change test of error message.
>> * gfortran.dg/dtio_23.f90 : New test.
>> * gfortran.dg/dtio_24.f90 : New test.
>>
>



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH] Fix various issues with x86 builtins (PR target/79568)

2017-02-20 Thread Uros Bizjak
On Fri, Feb 17, 2017 at 7:49 PM, Jakub Jelinek  wrote:
> Hi!
>
> The masks for builtins seem to be quite messy.  Most of
> the builtins have a single OPTION_MASK_ISA_* in there and that is
> clear (i.e. that the builtin is only enabled/usable if that isa
> bit is on).  Then we have 0 and that is meant for always enabled builtins.
> Then there is OPTION_MASK_ISA_xxx | OPTION_MASK_ISA_64BIT and that
> means (according to def_builtin code and comments) that we enable
> only if TARGET_64BIT and the rest without the  | OPTION_MASK_ISA_64BIT
> needs to be satisfied.  Then there is
> OPTION_MASK_ISA_xxx | OPTION_MASK_ISA_AVX512VL and that is again
> in def_builtin code and comments documented to be only enabled if
> -mavx512vl and the rest without the | OPTION_MASK_ISA_AVX512VL
> needs to be satisfied.  Then we have
> OPTION_MASK_ISA_FMA | OPTION_MASK_ISA_FMA4
> OPTION_MASK_ISA_SSE4_2 | OPTION_MASK_ISA_CRC32
> OPTION_MASK_ISA_SSE | OPTION_MASK_ISA_3DNOW_A
> def_builtin here as well as ix86_expand_builtin suggest that it
> is either SSE or 3dNOWa etc. (I believe at least the last one
> is meant to be that, no idea about the other two).
> And finally various builtins use ~OPTION_MASK_ISA_64BIT, and I have
> no idea what the people adding those really meant to express.  At least
> for e.g. __builtin_ia32_pause I'd expect it wants to be enabled
> everywhere (that is 0 though), while what both def_builtin and
> ix86_expand_builtin actually implement for ~OPTION_MASK_ISA_64BIT
> is that it is enabled whenever any ISA (other than OPTION_MASK_ISA_64BIT)
> is set and disabled otherwise.  So, e.g. -m32 -march=i386 -mno-sahf -mno-mmx
> -mno-sse disables __builtin_ia32_pause, __builtin_ia32_rdtsc,
> __builtin_ia32_rolhi etc.
>
> For OPTION_MASK_ISA_64BIT and OPTION_MASK_ISA_AVX512VL, while
> def_builtin implements something documented (and what makes sense),
> ix86_expand_builtin actually takes it as any of the ISAs enabled,
> so if you manage to define the builtin earlier (e.g. through including
> x86intrin.h), then one can use
> OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VL
> builtin (meant to be enabled if both are on) in a function where just one
> of them is on (-mavx512bw or -mavx512vl), if both aren't on, that will ICE.
> Similarly, for OPTION_MASK_ISA_LWP | OPTION_MASK_ISA_64BIT, if the
> builtin is defined earlier, it will be enabled even in -mno-lwp -m64
> function and ICE (not for -m32 -mlwp, because -m64 is a global TU option
> and so nothing will define the builtin).
>
> This patch attempts to resolve it by changing all ~OPTION_MASK_ISA_64BIT
> builtins to 0, handling xxx | OPTION_MASK_ISA_64BIT
> as must satisfy TARGET_64BIT and xxx, handling yyy |
> OPTION_MASK_ISA_AVX512VL as must satisfy -mavx512vl and yyy and for the
> rest, if 0, enabling always, otherwise requiring at least one of the ISAs
> enabled from the bitmask.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> Or do we want a different behavior (then what and why)?

Thanks for the detailed analysis. After some thinking, I think that
the above is the correct approach.

> 2017-02-17  Jakub Jelinek  
>
> PR target/79568
> * config/i386/i386.c (ix86_expand_builtin): Handle
> OPTION_MASK_ISA_AVX512VL and OPTION_MASK_ISA_64BIT in
> ix86_builtins_isa[fcode].isa as a requirement of those
> flags and any other flag in the bitmask.
> (ix86_init_mmx_sse_builtins): Use 0 instead of
> ~OPTION_MASK_ISA_64BIT as mask.
> * config/i386/i386-builtin.def (__builtin_ia32_rdtsc,
> __builtin_ia32_rdtscp, __builtin_ia32_pause, __builtin_ia32_bsrsi,
> __builtin_ia32_rdpmc, __builtin_ia32_rolqi, __builtin_ia32_rolhi,
> __builtin_ia32_rorqi, __builtin_ia32_rorhi): Likewise.
>
> * gcc.target/i386/pr79568-1.c: New test.
> * gcc.target/i386/pr79568-2.c: New test.
> * gcc.target/i386/pr79568-3.c: New test.

OK for mainline and eventually branches.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2017-02-17 11:11:27.0 +0100
> +++ gcc/config/i386/i386.c  2017-02-17 17:10:07.899194674 +0100
> @@ -32075,11 +32075,11 @@ ix86_init_mmx_sse_builtins (void)
>IX86_BUILTIN_SBB64);
>
>/* Read/write FLAGS.  */
> -  def_builtin (~OPTION_MASK_ISA_64BIT, "__builtin_ia32_readeflags_u32",
> +  def_builtin (0, "__builtin_ia32_readeflags_u32",
> UNSIGNED_FTYPE_VOID, IX86_BUILTIN_READ_FLAGS);
>def_builtin (OPTION_MASK_ISA_64BIT, "__builtin_ia32_readeflags_u64",
> UINT64_FTYPE_VOID, IX86_BUILTIN_READ_FLAGS);
> -  def_builtin (~OPTION_MASK_ISA_64BIT, "__builtin_ia32_writeeflags_u32",
> +  def_builtin (0, "__builtin_ia32_writeeflags_u32",
> VOID_FTYPE_UNSIGNED, IX86_BUILTIN_WRITE_FLAGS);
>def_builtin (OPTION_MASK_ISA_64BIT, "__builtin_ia32_writeeflags_u64",
> VOID_FTYPE_UINT64, IX86_BUILTIN_WRITE_FLAGS);
> @@ -36723,9 

RE: [PATCH 3/5] Support WORD_REGISTER_OPERATIONS requirements in simplify_operand_subreg

2017-02-20 Thread Matthew Fortune
Hi Eric,

Any thoughts on this?

Thanks,
Matthew

> Sorry for the slow reply, been away for a few days
> 
> Eric Botcazou  writes:
> > > This patch is a minimal change to prevent (subreg(mem)) from being
> > > simplified to use the outer mode for WORD_REGISTER_OPERATIONS.
> > > There is high probability of refining and/or re-implementing this
> > > for GCC 8 but such a change would be too invasive.  This change at
> > > least ensures correctness but may prevent simplification of some
> > > acceptable cases.
> >
> > This one causes:
> >
> > +FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> > +WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to produce executable
> > +FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -g  (test for excess errors)
> > +WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -g  compilation failed to produce executable
> > +WARNING: program timed out.
> > +WARNING: program timed out.
> >
> > on SPARC 32-bit, i.e. LRA hangs.  Reduced testcase attached, compile
> > at -O3 with a cc1 configured for sparc-sun-solaris2.10.
> >
> > > gcc/
> > >   PR target/78660
> > >   * lra-constraints.c (simplify_operand_subreg): Handle
> > >   WORD_REGISTER_OPERATIONS targets.
> > > ---
> > >  gcc/lra-constraints.c | 17 -
> > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c index
> > > 66ff2bb..484a70d 100644
> > > --- a/gcc/lra-constraints.c
> > > +++ b/gcc/lra-constraints.c
> > > @@ -1541,11 +1541,18 @@ simplify_operand_subreg (int nop, machine_mode reg_mode)
> > >  subregs as we don't substitute such equiv memory (see
> > >  processing equivalences in function lra_constraints) and because for
> > >  spilled pseudos we allocate stack memory enough for the biggest
> > > -  corresponding paradoxical subreg.  */
> > > -   if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > > - && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > > -   || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > > -   && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))))
> > > +  corresponding paradoxical subreg.
> > > +
> > > +  However, never simplify a (subreg (mem ...)) for
> > > +  WORD_REGISTER_OPERATIONS targets as this may lead to loading junk
> > > +  data into a register when the inner is narrower than outer or
> > > +  missing important data from memory when the inner is wider than
> > > +  outer.  */
> > > +   if (!WORD_REGISTER_OPERATIONS
> > > +   && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > > + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > > +   || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > > +   && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))))
> > >   return true;
> > >
> > > *curr_id->operand_loc[nop] = operand;
> >
> > I think that we might need:
> >
> >   if (!(GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (innermode)
> > && WORD_REGISTER_OPERATIONS)
> >   && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> >   || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> >   && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)))))
> > return true;
> >
> > i.e. we force the reloading only for paradoxical subregs.
> 
> Maybe, though I'm not entirely convinced.  For WORD_REGISTER_OPERATIONS
> both paradoxical and normal SUBREGs can be a problem as the inner mode
> in both cases can be used elsewhere for a reload making the content of
> the spill slot wrong either in the subreg reload or the ordinary reload
> elsewhere. However, the condition can be tightened; I should not have
> made it so simplistic I guess. I.e. both modes must be no wider than a
> word and must be different precision to force an inner reload.
> 
> Adding that check would fix this case for SPARC and should be fine for
> MIPS but I need to wait for a bootstrap build to be sure.
> 
> I don't really understand why LRA can't reload this for SPARC though as
> I'm not sure there is any guarantee provided to backends that some
> SUBREGs will be reloaded using their outer mode.  If there is such a
> guarantee then it would be much easier to reason about this logic but as
> it stands I suspect we are having to tweak LRA to cope with assumptions
> made in various targets that happen to have held true (and I have no
> doubt that MIPS has some of these as well especially in terms of the
> FP/GP register usage with float and int modes.)  All being well we can
> capture such assumptions and formalise them so we ensure they hold true
> (or modify backends 

Re: [PATCH] Fix gimplify_expr ICE with Labels as Values (PR middle-end/79537)

2017-02-20 Thread Richard Biener
On Fri, Feb 17, 2017 at 2:53 PM, Marek Polacek  wrote:
> We are crashing with a label as a value without an accompanying goto.
> The problem is that TREE_SIDE_EFFECTS and FORCED_LABEL use the same
> ->base.side_effects_flag, so gimplify_expr is confused.  We don't
> ICE with 'goto *&&L;' because then we take this path:
> 11406 case GOTO_EXPR:
> 11407   /* If the target is not LABEL, then it is a computed jump
> 11408  and the target needs to be gimplified.  */
> 11409   if (TREE_CODE (GOTO_DESTINATION (*expr_p)) != LABEL_DECL)
> 11410 {
> 11411   ret = gimplify_expr (&GOTO_DESTINATION (*expr_p), pre_p,
> 11412NULL, is_gimple_val, fb_rvalue);
> and because of that fb_rvalue we won't go to the switch where the ICE
> occurred.  Because '*&&L;' on its own is useless, I think we can simply
> discard it, which is what the following does.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk and 6?

I think I prefer

Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 245594)
+++ gcc/gimplify.c  (working copy)
@@ -11977,7 +11977,8 @@ gimplify_expr (tree *expr_p, gimple_seq
 {
   /* We aren't looking for a value, and we don't have a valid
 statement.  If it doesn't have side-effects, throw it away.  */
-  if (!TREE_SIDE_EFFECTS (*expr_p))
+  if (TREE_CODE (*expr_p) == LABEL_DECL
+  || !TREE_SIDE_EFFECTS (*expr_p))
*expr_p = NULL;
   else if (!TREE_THIS_VOLATILE (*expr_p))
{

(with a comment).  Ideally we'd have TREE_NOT_CHECK (NON_TYPE_CHECK
(NODE), LABEL_DECL) on
TREE_SIDE_EFFECTS but that may have unwanted fallout at this point.

Ok with the above change.

Thanks,
Richard.

> 2017-02-17  Marek Polacek  
>
> PR middle-end/79537
> * gimplify.c (gimplify_expr): Handle unused *&&L;.
>
> * gcc.dg/comp-goto-4.c: New.
>
> diff --git gcc/gimplify.c gcc/gimplify.c
> index 1b9c8d2..5524357 100644
> --- gcc/gimplify.c
> +++ gcc/gimplify.c
> @@ -12003,6 +12003,11 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p,
>  gimple_test_f, fallback);
>   break;
>
> +   case LABEL_DECL:
> + /* We can get here with code such as "*&&L;", where L is
> +a LABEL_DECL that is marked as FORCED_LABEL.  Skip it.  */
> + break;
> +
> default:
>/* Anything else with side-effects must be converted to
>   a valid statement before we get here.  */
> diff --git gcc/testsuite/gcc.dg/comp-goto-4.c 
> gcc/testsuite/gcc.dg/comp-goto-4.c
> index e69de29..51a6a86 100644
> --- gcc/testsuite/gcc.dg/comp-goto-4.c
> +++ gcc/testsuite/gcc.dg/comp-goto-4.c
> @@ -0,0 +1,21 @@
> +/* PR middle-end/79537 */
> +/* { dg-do compile } */
> +/* { dg-options "" } */
> +/* { dg-require-effective-target indirect_jumps } */
> +/* { dg-require-effective-target label_values } */
> +
> +void
> +f (void)
> +{
> +L:
> +  *&&L;
> +}
> +
> +void
> +f2 (void)
> +{
> +   void *p;
> +L:
> +   p = &&L;
> +   *p; /* { dg-warning "dereferencing 'void \\*' pointer" } */
> +}
>
> Marek


Re: [Patch, fortran] PR79434 - [submodules] separate module procedure breaks encapsulation

2017-02-20 Thread Paul Richard Thomas
Hi Everybody,

Committed as revision 245595.

Thanks for the review.

Paul

On 16 February 2017 at 00:38, Jerry DeLisle  wrote:
> On 02/15/2017 10:12 AM, Paul Richard Thomas wrote:
>> Dear All,
>>
>> Another straightforward patch, although it took some head scratching
>> to realize how simple the fix could be :-)
>>
>> Bootstraps and regtests on FC23/x_86_64 - OK for trunk and 6-branch?
>>
>
> Yes, OK
>
> Jerry



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [PATCH] Change default of param not being smaller that min.

2017-02-20 Thread Richard Biener
On Fri, Feb 17, 2017 at 4:18 PM, Martin Liška  wrote:
> Hello.
>
> We should not have a default value (different from -1) that is smaller than
> the minimum.
>
> Ready to be installed after tests?

I think you should instead change the minimum value.

But OTOH I wonder why this is a --param at all...  Alex?  Why's this
not just a #define
inside emit-rtl.c?  In what circumstances is the user supposed to
alter this --param?

Richard.

> Martin


Re: Ping [Patch, fortran] PR79402 - ICE with submodules: module procedure interface defined in parent module

2017-02-20 Thread Stefano Zaghi
Dear Paul and Jerry (and all gfortran brave heroes),

thank you very much for your great work!

Cheers
Stefano Zaghi  Ph.D. Aerospace Engineer

Research Scientist, Dept. of Computational Hydrodynamics at CNR-INSEAN
p: +39 0650299260 | m: +39 3497730036 | e: stefano.za...@gmail.com

Member of Fortran-FOSS-Programmers

Codes Showcase
OFF  Open source Finite volumes Fluid dynamics code
Lib_VTK_IO   Fortran library to write and read data conforming the VTK standard
FLAP Fortran command Line Arguments Parser for poor men
BeFoR64  Base64 encoding/decoding library for FoRtran poor men
FiNeRFortran INI ParseR and generator for FoRtran poor men
IR_Precision Fortran (standard 2003) module to develop portable codes
FoBis.py Fortran Building System for poor men
PreForM.py   Preprocessor for Fortran poor men
MaTiSSe.py   Markdown To Impressive Scientific Slides



On Sun, Feb 19, 2017 at 7:29 PM, Paul Richard Thomas
 wrote:
> Hi Everybody,
>
> With Jerry's OK for a commit to trunk, committed as revision 245580.
>
> I will wait a few weeks before a commit to 6-branch.
>
> Cheers
>
>
> Paul
>
> On 11 February 2017 at 13:24, Paul Richard Thomas
>  wrote:
>> Ping!
>>
>> On 8 February 2017 at 16:00, Paul Richard Thomas
>>  wrote:
>>> Dear All,
>>>
>>> The attached rework of the patch functions in the same way as
>>> yesterday's but is based in resolve.c rather than trans-decl.c. It
>>> looks to me to be by far cleaner.
>>>
>>> Bootstraps and regtests on FC23/x86_64 - OK for trunk?
>>>
>>> Cheers
>>>
>>> Paul
>>>
>>> 2017-02-08  Paul Thomas  
>>>
>>> PR fortran/79344
>>> * resolve.c (fixup_unique_dummy): New function.
>>> (gfc_resolve_expr): Call it for dummy variables with a unique
>>> symtree name.
>>>
>>> 2017-02-08  Paul Thomas  
>>>
>>> PR fortran/79344
>>> * gfortran.dg/submodule_23.f90: New test.
>>>
>>>
>>>
>>> On 7 February 2017 at 16:06, Paul Richard Thomas
>>>  wrote:
 Dear All,

 This bug generates an ICE because the symbol for dummy 'n' in the
 specification expression for the result of 'fun1' is not the same as
 the symbol in the formal arglist. For some reason that I have been
 unable to uncover, this false dummy is associated with a unique
 symtree. The odd thing is that the dump of the parse tree for the
 failing module procedure case is identical to that where the interface
 is explicitly reproduced in the submodule. The cause of the ICE is
 that the false dummy has no backend_decl as it should.

 This patch hits the problem directly on the head by using the
 backend_decl from the symbol in the namespace of the formal arglist,
 as described in the comment in the patch. If it is deemed to be more
 hygienic, the chunk of code can be lifted out and deposited in a
 separate function.

 Bootstraps and regtests on FC23/x86_64 - OK for trunk?

 Cheers

 Paul

 2017-02-07  Paul Thomas  

 PR fortran/79344
 * trans-decl.c (gfc_get_symbol_decl): If a dummy apparently has
 a null backend_decl, look for a replacement symbol in the
 namespace of the 1st formal argument and use its backend_decl.

 2017-02-07  Paul Thomas  

 PR fortran/79344
 * gfortran.dg/submodule_23.f90: New test.
>>>
>>>
>>>
>>
>>
>>
>
>
>