Re: [PATCH] Fix PR70434

2016-04-01 Thread Richard Biener
On Wed, 30 Mar 2016, Jakub Jelinek wrote:

> On Wed, Mar 30, 2016 at 02:07:07PM +0200, Richard Biener wrote:
> > The patch for PR63764 (accepts-invalid / ICE) caused us to generate
> > worse code for rvalue vector indexing by forcing the vector to a
> > temporary.  It turns out this is not necessary if we alter the
> > way the C/C++ FEs lower the vector for the indexing operation:
> > instead of lowering to a pointer-to-element, lower to an array
> > using a VIEW_CONVERT_EXPR.
> > 
> > The alternate lowering has the advantage that the vector is not
> > required to be addressable, which should improve aliasing analysis.
> > Not lowering to indirect accesses should also improve optimizations
> > like value-numbering.
> > 
> > There's the fallout that we need to make sure to convert back
> > constant index array-refs to the canonical BIT_FIELD_REF form
> > we use for vectors (see the gimple-fold.c hunk pattern-matching this).
> > 
> > And there's the latent bug in update-address-taken which happily
> > re-wrote a vector array-ref base into SSA form.
> > 
> > Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress.
> > 
> > I'll happily defer to stage1 if you think that's better, as I don't
> > know of any project that makes heavy use of GCC's vector extension
> > and I'm not sure about our own testsuite coverage.
> 
> I think we should defer it for stage1 at this point.

For reference, further changes were necessary to make the patch
regression-free - here they are, queued for stage1.
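
For context, the kind of source construct affected is plain subscripting of
an rvalue vector with GCC's vector extension; a minimal, untested sketch
(not a testcase from the patch):

typedef int v4si __attribute__ ((vector_size (16)));

int
elem2 (v4si a, v4si b)
{
  /* (a + b) is an rvalue vector; with the new lowering its subscript goes
     through a VIEW_CONVERT_EXPR to an array instead of forcing the vector
     into an addressable temporary.  */
  return (a + b)[2];
}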

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2016-04-01  Richard Biener  

PR middle-end/70434
c-family/
* c-common.c (convert_vector_to_pointer_for_subscript): Use a
VIEW_CONVERT_EXPR to an array type.

cp/
* expr.c (mark_exp_read): Handle VIEW_CONVERT_EXPR.
* constexpr.c (cxx_eval_array_reference): Handle indexed
vectors.

c/
* c-typeck.c (build_array_ref): Do not complain about indexing
non-lvalue vectors.

* tree-ssa.c (non_rewritable_mem_ref_base): Make sure to mark
bases which are accessed with non-invariant indices.
* gimple-fold.c (maybe_canonicalize_mem_ref_addr): Re-write
constant index ARRAY_REFs of vectors into BIT_FIELD_REFs.

Index: gcc/c-family/c-common.c
===
*** gcc/c-family/c-common.c (revision 234546)
--- gcc/c-family/c-common.c (working copy)
*** convert_vector_to_pointer_for_subscript
*** 12448,12501 
if (VECTOR_TYPE_P (TREE_TYPE (*vecp)))
  {
tree type = TREE_TYPE (*vecp);
-   tree type1;
  
ret = !lvalue_p (*vecp);
if (TREE_CODE (index) == INTEGER_CST)
  if (!tree_fits_uhwi_p (index)
  || tree_to_uhwi (index) >= TYPE_VECTOR_SUBPARTS (type))
warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
  
!   if (ret)
!   {
! tree tmp = create_tmp_var_raw (type);
! DECL_SOURCE_LOCATION (tmp) = loc;
! *vecp = c_save_expr (*vecp);
! if (TREE_CODE (*vecp) == C_MAYBE_CONST_EXPR)
!   {
! bool non_const = C_MAYBE_CONST_EXPR_NON_CONST (*vecp);
! *vecp = C_MAYBE_CONST_EXPR_EXPR (*vecp);
! *vecp
!   = c_wrap_maybe_const (build4 (TARGET_EXPR, type, tmp,
! *vecp, NULL_TREE, NULL_TREE),
! non_const);
!   }
! else
!   *vecp = build4 (TARGET_EXPR, type, tmp, *vecp,
!   NULL_TREE, NULL_TREE);
! SET_EXPR_LOCATION (*vecp, loc);
! c_common_mark_addressable_vec (tmp);
!   }
!   else
!   c_common_mark_addressable_vec (*vecp);
!   type = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
!   type1 = build_pointer_type (TREE_TYPE (*vecp));
!   bool ref_all = TYPE_REF_CAN_ALIAS_ALL (type1);
!   if (!ref_all
! && !DECL_P (*vecp))
!   {
! /* If the original vector isn't declared may_alias and it
!isn't a bare vector look if the subscripting would
!alias the vector we subscript, and if not, force ref-all.  */
! alias_set_type vecset = get_alias_set (*vecp);
! alias_set_type sset = get_alias_set (type);
! if (!alias_sets_must_conflict_p (sset, vecset)
! && !alias_set_subset_of (sset, vecset))
!   ref_all = true;
!   }
!   type = build_pointer_type_for_mode (type, ptr_mode, ref_all);
!   *vecp = build1 (ADDR_EXPR, type1, *vecp);
!   *vecp = convert (type, *vecp);
  }
return ret;
  }
--- 12448,12470 
if (VECTOR_TYPE_P (TREE_TYPE (*vecp)))
  {
tree type = TREE_TYPE (*vecp);
  
ret = !lvalue_p (*vecp);
+ 
if (TREE_CODE (index) == INTEGER_CST)
  if (!tree_fits_uhwi_p (index)
  || tree_to_uhwi (i

Re: Various selective scheduling fixes

2016-04-01 Thread Christophe Lyon
On 31 March 2016 at 16:43, Andrey Belevantsev  wrote:
> Hello,
>
> On 14.03.2016 12:10, Andrey Belevantsev wrote:
>>
>> Hello,
>>
>> In this thread I will be posting the patches for the fixed selective
>> scheduling PRs (except the one that was already kindly checked in by
>> Jeff).
>>  The patches were tested both on x86-64 and ia64 with the following
>> combination: 1) the usual bootstrap/regtest, which only utilizes sel-sched
>> on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the
>> bootstrap/regtest with the second scheduler forced to sel-sched; 3) both
>> schedulers forced to sel-sched.  In all cases everything seemed to be
>> fine.
>>
>> Three of the PRs are regressions, the other two showed different errors
>> across the variety of releases tested by submitters;  I think all of them
>> are appropriate at this stage -- they do not touch anything outside of
>> selective scheduling except the first patch where a piece of code from
>> sched-deps.c needs to be refactored into a function to be called from
>> sel-sched.c.
>
>
> I've backported all regression PRs to gcc-5-branch after testing there again
> with selective scheduling force enabled: PRs 64411, 0, 69032, 69102.
> The first one was not marked as a regression as such, but the test for PR
> 70292, which is a duplicate, works for me on gcc 5.1, thus making it a
> regression, too.
>

Hi,

The backport for pr69102 shows that the new testcase fails to compile (ICE)
when GCC is configured as:

--target=arm-none-linux-gnueabihf --with-float=hard --with-mode=arm
--with-cpu=cortex-a15 --with-fpu=neon-vfpv4

/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:
In function 'foo':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:21:1:
internal compiler error: Segmentation fault
0xa64d15 crash_signal
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:383
0xfa41d7 autopref_multipass_dfa_lookahead_guard(rtx_insn*, int)
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752
0xa31cd2 invoke_dfa_lookahead_guard
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4212
0xa31cd2 find_best_expr
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4415
0xa343fb fill_insns
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5570
0xa343fb schedule_on_fences
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7395
0xa36010 sel_sched_region_2
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7533
0xa36f2a sel_sched_region_1
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7575
0xa36f2a sel_sched_region(int)
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7676
0xa37589 run_selective_scheduling()
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7752
0xa14aed rest_of_handle_sched2
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647
0xa14aed execute
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791

See 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234625/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt

Can you have a look?

Christophe

> Andrey
>
>>
>> Andrey
>
>


Re: [PATCH] Prevent loops from being optimized away

2016-04-01 Thread Richard Biener
On Fri, Apr 1, 2016 at 6:54 AM, Segher Boessenkool
 wrote:
> Sometimes people write loops that they do not want optimized away, even
> when the compiler can replace those loops by a simple expression (or
> nothing).  For such people, this patch adds a compiler option.
>
> Bootstrapped on powerpc64-linux; regression check still in progress
> (with Init(1) to actually test anything).

-fno-tree-scev-cprop?  -O0?

A new compiler option for this is complete overkill (and its implementation
is gross ;)).  Semantics are also unclear: your patch would only make sure
to preserve an empty loop with the asm in the latch; it wouldn't disallow
replacing the overall effect with a computation.
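
For illustration (an untested sketch, not from either patch): a loop like

void
burn_cycles (unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    ;  /* Empty body, provably finite: removed as having no effect.  */
}

is simply deleted at higher optimization levels, and the existing idiom for
keeping it is to put the volatile asm into the body yourself rather than
have the compiler add one to the latch:

void
burn_cycles_kept (unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    __asm__ volatile ("");  /* A volatile asm is a side effect, so the
                               loop stays.  */
}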

Your patch would also miss a few testcases.

Richard.

>
> Segher
>
>
> 2016-04-01  Segher Boessenkool  
>
> * loop-init.c: Include some more stuff that really doesn't belong
> here, oh well.
> (loop_optimizer_init): Add empty asm statements in all gimple loops,
> if asked to.
> * common.opt: Add new option.
>
> ---
>  gcc/common.opt  |  4 
>  gcc/loop-init.c | 15 +++
>  2 files changed, 19 insertions(+)
>
> diff --git a/gcc/loop-init.c b/gcc/loop-init.c
> index 8634591..7c5dc24 100644
> --- a/gcc/loop-init.c
> +++ b/gcc/loop-init.c
> @@ -24,6 +24,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "target.h"
>  #include "rtl.h"
>  #include "tree.h"
> +#include "gimple.h"
> +#include "gimple-iterator.h"
>  #include "cfghooks.h"
>  #include "df.h"
>  #include "regs.h"
> @@ -91,6 +93,19 @@ loop_optimizer_init (unsigned flags)
>
>/* Find the loops.  */
>current_loops = flow_loops_find (NULL);
> +
> +  if (flag_never_gonna_give_you_up && current_ir_type () == IR_GIMPLE)
> +   {
> + struct loop *loop;
> + FOR_EACH_LOOP (loop, 0)
> +   if (loop->latch)
> + {
> +   gasm *p = gimple_build_asm_vec ("", 0, 0, 0, 0);
> +   gimple_asm_set_volatile (p, true);
> +   gimple_stmt_iterator bsi = gsi_after_labels (loop->latch);
> +   gsi_insert_before (&bsi, p, GSI_SAME_STMT);
> + }
> +   }
>  }
>else
>  {
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 0f3bb4e..b7c0a6a 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2002,6 +2002,10 @@ frerun-loop-opt
>  Common Ignore
>  Does nothing.  Preserved for backward compatibility.
>
> +frickroll-all-loops
> +Common Report Var(flag_never_gonna_give_you_up) Init(0) Optimization
> +You know the rules, and so do I.
> +
>  frounding-math
>  Common Report Var(flag_rounding_math) Optimization SetByCombined
>  Disable optimizations that assume default FP rounding behavior.
> --
> 1.9.3
>


Re: [PATCH][PR rtl-optimization/69307] Handle hard registers in modes that span more than one register properly

2016-04-01 Thread Christophe Lyon
On 31 March 2016 at 16:55, Andrey Belevantsev  wrote:
> Hello,
>
> On 12.03.2016 20:13, Jeff Law wrote:
>>
>>
>> As Andrey outlined in the PR, selective-scheduling was missing a check &
>> handling of hard registers in modes that span more than one hard reg. This
>> caused an incorrect register selection during renaming.
>>
>> I verified removing the printf call from the test would not compromise the
>> test.  Then I did a normal x86 bootstrap & regression test with the patch.
>> Of course that's essentially useless, so I also did another bootstrap and
>> regression test with -fselective-scheduling in BOOT_CFLAGS with and
>> without
>> this patch.  In both cases there were no regressions.
>>
>> I'm installing Andrey's patch on the trunk.  I'm not sure this is worth
>> addressing in gcc-5.
>
>
> I've looked at the patch again and as it fixes general code and has a
> regression marker I've included it in the bunch of other PRs that were
> backported to gcc-5.  I forgot you were hesitant putting it to gcc-5 though
> :) So I can revert it from the branch if you want me to.
>
> Andrey
>

Hi,

I've noticed that the backport in the gcc-5 branch shows an ICE on
gcc.dg/pr69307.c
when GCC is configured with --target=arm-none-linux-gnueabihf
--with-cpu=cortex-a15 --with-fpu=neon-vfpv4
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/pr69307.c: In
function 'foo':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/pr69307.c:25:1:
internal compiler error: in autopref_multipass_dfa_lookahead_guard, at haifa-sched.c:5752
0xfa450e autopref_multipass_dfa_lookahead_guard(rtx_insn*, int)
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752
0xa31d42 invoke_dfa_lookahead_guard
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4225
0xa31d42 find_best_expr
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4428
0xa3446b fill_insns
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5583
0xa3446b schedule_on_fences
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7408
0xa36080 sel_sched_region_2
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7546
0xa36f9a sel_sched_region_1
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7588
0xa36f9a sel_sched_region(int)
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7689
0xa375f9 run_selective_scheduling()
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7765
0xa14aed rest_of_handle_sched2
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647
0xa14aed execute
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791

See 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234629/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt

Christophe.


>>
>> Jeff
>
>


Re: Various selective scheduling fixes

2016-04-01 Thread Andrey Belevantsev

Hi Christophe,

On 01.04.2016 10:33, Christophe Lyon wrote:

On 31 March 2016 at 16:43, Andrey Belevantsev  wrote:

Hello,

On 14.03.2016 12:10, Andrey Belevantsev wrote:


Hello,

In this thread I will be posting the patches for the fixed selective
scheduling PRs (except the one that was already kindly checked in by
Jeff).
 The patches were tested both on x86-64 and ia64 with the following
combination: 1) the usual bootstrap/regtest, which only utilizes sel-sched
on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the
bootstrap/regtest with the second scheduler forced to sel-sched; 3) both
schedulers forced to sel-sched.  In all cases everything seemed to be
fine.

Three of the PRs are regressions, the other two showed different errors
across the variety of releases tested by submitters;  I think all of them
are appropriate at this stage -- they do not touch anything outside of
selective scheduling except the first patch where a piece of code from
sched-deps.c needs to be refactored into a function to be called from
sel-sched.c.



I've backported all regression PRs to gcc-5-branch after testing there again
with selective scheduling force enabled: PRs 64411, 0, 69032, 69102.
The first one was not marked as a regression as such but the test for PR
70292, which is duplicate, works for me on gcc 5.1 thus making it a
regression, too.



Hi,

The backport for pr69102 shows that the new testcase fails to compile (ICE)
when GCC is configured as:

--target=arm-none-linux-gnueabihf --with-float=hard --with-mode=arm
--with-cpu=cortex-a15 --with-fpu=neon-vfpv4

/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:
In function 'foo':
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:21:1:
internal compiler error: Segmentation fault
0xa64d15 crash_signal
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:383
0xfa41d7 autopref_multipass_dfa_lookahead_guard(rtx_insn*, int)
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752
0xa31cd2 invoke_dfa_lookahead_guard
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4212
0xa31cd2 find_best_expr
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4415
0xa343fb fill_insns
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5570
0xa343fb schedule_on_fences
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7395
0xa36010 sel_sched_region_2
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7533
0xa36f2a sel_sched_region_1
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7575
0xa36f2a sel_sched_region(int)
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7676
0xa37589 run_selective_scheduling()
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7752
0xa14aed rest_of_handle_sched2
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647
0xa14aed execute
/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791

See 
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234625/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt

Can you have a look?


That's because A15 is the only configuration that enables 
autopref_multipass_dfa_lookahead_guard as the DFA lookahead guard hook. 
But autoprefetch modeling doesn't work for selective scheduling; it uses 
haifa structures that are not kept up to date during sel-sched.  So this is 
not supposed to work as soon as the param value for the prefetcher lookahead 
depth is positive.


The following patch works for me.  Could you check it with your testing? 
If it works fine for you, I would install the patch both for trunk and 
gcc-5.  It would be great to force sel-sched to be enabled, too.  I could 
do that but I don't have the hardware or cross-arm target tools at the moment.


* haifa-sched.c (autopref_multipass_dfa_lookahead_guard): Disable 
for selective scheduler.


Best,
Andrey




Christophe


Andrey



Andrey





diff --git a/gcc/haifa-sched.c b/gcc/haifa-sched.c
index ad2450b..c790830 100644
--- a/gcc/haifa-sched.c
+++ b/gcc/haifa-sched.c
@@ -5691,6 +5691,10 @@ autopref_multipass_dfa_lookahead_guard (rtx_insn *insn1, int ready_index)
 {
   int r = 0;
 
+  /* Autoprefetcher modeling is not supported by selective scheduler.  */
+  if (sel_sched_p ())
+return 0;
+
   if (PARAM_VALUE (PARAM_SCHED_AUTOPREF_QUEUE_DEPTH) <= 0)
 return 0;
 


[gomp4] Also test -O0 for OpenACC C, C++ offloading test cases

2016-04-01 Thread Thomas Schwinge
Hi!

On Thu, 24 Mar 2016 22:31:29 +0100, I wrote:
> On Wed, 23 Mar 2016 19:57:50 +0100, Bernd Schmidt  wrote:
> > Ok with [...].
> 
> Thanks for the review; committed in r234471:

> Also test -O0 for OpenACC C, C++ offloading test cases

Merged into gomp-4_0-branch in r234664:

commit 9973610b0d5cb41f380fa18eabacb5f967fe4d0e
Merge: e252cf1 0266264
Author: tschwinge 
Date:   Fri Apr 1 08:41:44 2016 +

svn merge -r 234469:234471 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@234664 
138bc75d-0d04-0410-961f-82ee72b054a4

 libgomp/ChangeLog  | 36 ++
 libgomp/testsuite/libgomp.oacc-c++/c++.exp | 55 +-
 .../libgomp.oacc-c-c++-common/acc-on-device-2.c|  5 +-
 .../libgomp.oacc-c-c++-common/acc-on-device.c  |  3 +-
 .../libgomp.oacc-c-c++-common/avoid-offloading-1.c |  3 ++
 .../libgomp.oacc-c-c++-common/avoid-offloading-2.c |  3 ++
 .../libgomp.oacc-c-c++-common/gang-static-2.c  |  4 +-
 .../libgomp.oacc-c-c++-common/host_data-1.c|  1 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c |  2 +
 .../libgomp.oacc-c-c++-common/loop-auto-1.c|  6 ++-
 .../loop-default-compile.c |  6 ++-
 .../loop-default-runtime.c |  6 ++-
 .../testsuite/libgomp.oacc-c-c++-common/loop-g-1.c |  6 ++-
 .../testsuite/libgomp.oacc-c-c++-common/loop-g-2.c |  6 ++-
 .../libgomp.oacc-c-c++-common/loop-gwv-1.c |  5 +-
 .../libgomp.oacc-c-c++-common/loop-red-g-1.c   |  6 ++-
 .../libgomp.oacc-c-c++-common/loop-red-gwv-1.c |  5 +-
 .../libgomp.oacc-c-c++-common/loop-red-v-1.c   |  5 +-
 .../libgomp.oacc-c-c++-common/loop-red-v-2.c   |  5 +-
 .../libgomp.oacc-c-c++-common/loop-red-w-1.c   |  6 ++-
 .../libgomp.oacc-c-c++-common/loop-red-w-2.c   |  6 ++-
 .../libgomp.oacc-c-c++-common/loop-red-wv-1.c  |  5 +-
 .../testsuite/libgomp.oacc-c-c++-common/loop-v-1.c |  5 +-
 .../testsuite/libgomp.oacc-c-c++-common/loop-w-1.c |  6 ++-
 .../libgomp.oacc-c-c++-common/loop-wv-1.c  |  5 +-
 .../libgomp.oacc-c-c++-common/routine-3.c  |  8 ++--
 .../libgomp.oacc-c-c++-common/routine-g-1.c|  6 ++-
 .../libgomp.oacc-c-c++-common/routine-gwv-1.c  |  5 +-
 .../libgomp.oacc-c-c++-common/routine-v-1.c|  5 +-
 .../libgomp.oacc-c-c++-common/routine-vec-1.c  |  5 +-
 .../libgomp.oacc-c-c++-common/routine-w-1.c|  6 ++-
 .../libgomp.oacc-c-c++-common/routine-work-1.c |  6 ++-
 .../libgomp.oacc-c-c++-common/routine-wv-1.c   |  5 +-
 .../libgomp.oacc-c-c++-common/routine-wv-2.c   |  5 +-
 .../libgomp.oacc-c-c++-common/worker-partn-8.c |  2 -
 libgomp/testsuite/libgomp.oacc-c/c.exp | 54 +
 36 files changed, 187 insertions(+), 121 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 5f2c401..e0cd567 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,39 @@
+2016-03-24  Thomas Schwinge  
+
+   * testsuite/libgomp.oacc-c++/c++.exp: Set up torture testing, use
+   gcc-dg-runtest.
+   * testsuite/libgomp.oacc-c/c.exp: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c: Specify
+   -fno-builtin-acc_on_device instead of -O0.
+   * testsuite/libgomp.oacc-c-c++-common/acc-on-device.c: Skip for
+   -O0.
+   * testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c:
+   Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
+   * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
+   Don't specify -O2.
+   * testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
+   Likewise.
+   * testsuite/libgomp.oacc-c-c

[PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-01 Thread Richard Biener

RTL DSE uses true_dependence to see whether a store may be killed by
another store - that's obviously broken; true_dependence models whether a
later read may depend on the store, while killing a store is a
write-after-write question.  The following patch makes it use
output_dependence instead (introducing a canon_ variant of that).

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Ok?

Thanks,
Richard.

2016-04-01  Richard Biener  

PR rtl-optimization/70484
* rtl.h (canon_output_dependence): Declare.
* alias.c (canon_output_dependence): New function.
* dse.c (record_store): Use canon_output_dependence rather
than canon_true_dependence.

* gcc.dg/torture/pr70484.c: New testcase.

Index: gcc/rtl.h
===
*** gcc/rtl.h   (revision 234663)
--- gcc/rtl.h   (working copy)
*** extern int anti_dependence (const_rtx, c
*** 3652,3657 
--- 3652,3659 
  extern int canon_anti_dependence (const_rtx, bool,
  const_rtx, machine_mode, rtx);
  extern int output_dependence (const_rtx, const_rtx);
+ extern int canon_output_dependence (const_rtx, bool,
+   const_rtx, machine_mode, rtx);
  extern int may_alias_p (const_rtx, const_rtx);
  extern void init_alias_target (void);
  extern void init_alias_analysis (void);
Index: gcc/alias.c
===
*** gcc/alias.c (revision 234663)
--- gcc/alias.c (working copy)
*** output_dependence (const_rtx mem, const_
*** 3057,3062 
--- 3057,3076 
 /*mem_canonicalized=*/false,
 /*x_canonicalized*/false, /*writep=*/true);
  }
+ 
+ /* Likewise, but we already have a canonicalized MEM, and X_ADDR for X.
+Also, consider X in X_MODE (which might be from an enclosing
+STRICT_LOW_PART / ZERO_EXTRACT).
+If MEM_CANONICALIZED is true, MEM is canonicalized.  */
+ 
+ int
+ canon_output_dependence (const_rtx mem, bool mem_canonicalized,
+const_rtx x, machine_mode x_mode, rtx x_addr)
+ {
+   return write_dependence_p (mem, x, x_mode, x_addr,
+mem_canonicalized, /*x_canonicalized=*/true,
+/*writep=*/true);
+ }
  
  
  
Index: gcc/dse.c
===
*** gcc/dse.c   (revision 234663)
--- gcc/dse.c   (working copy)
*** record_store (rtx body, bb_info_t bb_inf
*** 1609,1618 
   the value of store_info.  If it is, set the rhs to NULL to
   keep it from being used to remove a load.  */
{
! if (canon_true_dependence (s_info->mem,
!GET_MODE (s_info->mem),
!s_info->mem_addr,
!mem, mem_addr))
{
  s_info->rhs = NULL;
  s_info->const_rhs = NULL;
--- 1609,1617 
   the value of store_info.  If it is, set the rhs to NULL to
   keep it from being used to remove a load.  */
{
! if (canon_output_dependence (s_info->mem, true,
!  mem, GET_MODE (mem),
!  mem_addr))
{
  s_info->rhs = NULL;
  s_info->const_rhs = NULL;
Index: gcc/testsuite/gcc.dg/torture/pr70484.c
===
--- gcc/testsuite/gcc.dg/torture/pr70484.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr70484.c  (working copy)
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+int __attribute__((noinline,noclone))
+f(int *pi, long *pl)
+{
+  *pi = 1;
+  *pl = 0;
+  return *(char *)pi;
+}
+
+int main()
+{
+  char a[sizeof (long)];
+  if (f ((int *)a, (long *)a) != 0)
+abort ();
+  return 0;
+}


Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 11:08:09AM +0200, Richard Biener wrote:
> 
> RTL DSE uses true_dependence to see whether a store may be killed by
> another store - that's obviously broken.  The following patch makes
> it use output_dependence instead (introducing a canon_ variant of that).

I think it would be interesting to see some stats on what effect this has
on the optimization RTL DSE is doing (say, gather during an unpatched
bootstrap/regtest the number of successfully optimized replace_read calls,
and the same with a patched bootstrap/regtest).
From a quick look at the patch, this wouldn't optimize even the cases that
could be optimized (return *pi;) at the RTL level.  If the statistics
show this affects it significantly, perhaps we could do both
canon_true_dependence and canon_output_dependence, and if the two calls
differ, don't clear the rhs, but mark it somehow and then in replace_read
check what alias set is used for the read or something similar?

Jakub


Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-01 Thread Richard Biener
On Fri, 1 Apr 2016, Jakub Jelinek wrote:

> On Fri, Apr 01, 2016 at 11:08:09AM +0200, Richard Biener wrote:
> > 
> > RTL DSE uses true_dependence to see whether a store may be killed by
> > another store - that's obviously broken.  The following patch makes
> > it use output_dependence instead (introducing a canon_ variant of that).
> 
> I think it would be interesting to see some stats on what effect does this
> have on the optimization RTL DSE is doing (say gather during
> unpatched bootstrap/regtest number of successfully optimized replace_read
> calls, and the same with patched bootstrap/regtest).
> From quick look at the patch, this wouldn't optimize even the cases that
> could be optimized (return *pi;) at the RTL level.  If the statistics would
> show this affects it significantly, perhaps we could do both
> canon_true_dependence and canon_output_dependence, and if the two calls
> differ, don't clear the rhs, but mark it somehow and then in replace_read
> check what alias set is used for the read or something similar?

Well, I don't believe it is DSE's job to do CSE.  And I don't see how
we can efficiently do what you suggest - it seems DSE doesn't check
all possible aliases when CSEing.

But of course I didn't try to understand how it works and thus happily
defer to somebody else coming up with a better fix.  Maybe the correct
fix is to

  while (i_ptr)
{
  bool remove = false;
  store_info *store_info = i_ptr->store_rec;

  /* Skip the clobbers.  */
  while (!store_info->is_set)
store_info = store_info->next;
...

  /* If this read is just reading back something that we just
 stored, rewrite the read.  */
  else
{
  if (store_info->rhs
  && offset >= store_info->begin
  && offset + width <= store_info->end
  && all_positions_needed_p (store_info,
 offset - 
store_info->begin,
 width)
  && replace_read (store_info, i_ptr, read_info,
   insn_info, loc, 
bb_info->regs_live))
return;

  /* The bases are the same, just see if the offsets
 overlap.  */
  if ((offset < store_info->end)
  && (offset + width > store_info->begin))
remove = true;

which lacks something like a canon_true_dependence check so that if
replace_read fails it doesn't try again with other stores it finds
(hopefully the store_info->next list is "properly" ordered - you'd
want to walk backwards starting from the reads - I don't think this
is what the code above does).

So it really seems to be that the code I fixed (quoted again below)
is supposed to ensure the validity of the transform as the comment
suggests.

  else if (s_info->rhs)
/* Need to see if it is possible for this store to overwrite
   the value of store_info.  If it is, set the rhs to NULL to
   keep it from being used to remove a load.  */
{
  if (canon_output_dependence (s_info->mem, true,
   mem, GET_MODE (mem),
   mem_addr))
{
  s_info->rhs = NULL;
  s_info->const_rhs = NULL;
}

As said, I don't believe in a DSE algorithm doing CSE, so ...

Richard.


Re: [PATCH, AArch64] Fix for PR67896 (C++ FE cannot distinguish __Poly{8,16,64,128}_t types)

2016-04-01 Thread James Greenhalgh
On Mon, Jan 25, 2016 at 12:15:48PM +, James Greenhalgh wrote:
> On Wed, Jan 20, 2016 at 09:27:41PM +0100, Roger Ferrer Ibáñez wrote:
> > Hi James,
> > 
> > > This patch looks technically correct to me, though there is a small
> > > style issue to correct (in-line below), and your ChangeLogs don't fit
> > > our usual style.
> > 
> > thank you very much for the useful comments. I'm attaching a new
> > version of the patch with the style issues (hopefully) ironed out.
> 
> Thanks, this version of the patch looks correct to me.
> 
> > > > P.S.: I haven't signed the copyright assignment to the FSF. The change
> > > > is really small but I can do the paperwork if required.
> 
> I can't commit it on your behalf until we've heard back regarding whether
> this needs a copyright assignment to the FSF, but once I've heard I'd
> be happy to commit this for you. I'll expand the CC list a bit further
> to see if we can get an answer on that.
> 
> Thanks again for the analysis and patch.

This patch should also be backported to GCC 5, which has the same bug.

I've done that as r234665 as the merge was clean after a bootstrap and
test cycle on aarch64-none-linux-gnu.

Thanks,
James



Fix for PR70498 in Libiberty Demangler

2016-04-01 Thread Marcel Böhme
Hi,

This fixes the write access violation detailed in 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70498 (and a few other unreported 
cases).

Sometimes length variables for strings and arrays are of type long, other times 
of type int.  Since cp-demangle.h exports structs and methods with 
length variables of type int, this has been made consistent in cp-demangle.c.

Best regards,
- Marcel

Index: libiberty/cp-demangle.c
===
--- libiberty/cp-demangle.c (revision 234663)
+++ libiberty/cp-demangle.c (working copy)
@@ -398,7 +398,7 @@
  struct demangle_component *);
 
 static struct demangle_component *
-d_make_template_param (struct d_info *, long);
+d_make_template_param (struct d_info *, int);
 
 static struct demangle_component *
 d_make_sub (struct d_info *, const char *, int);
@@ -421,9 +421,9 @@
 
 static struct demangle_component *d_source_name (struct d_info *);
 
-static long d_number (struct d_info *);
+static int d_number (struct d_info *);
 
-static struct demangle_component *d_identifier (struct d_info *, long);
+static struct demangle_component *d_identifier (struct d_info *, int);
 
 static struct demangle_component *d_operator_name (struct d_info *);
 
@@ -1119,7 +1119,7 @@
 /* Add a new template parameter.  */
 
 static struct demangle_component *
-d_make_template_param (struct d_info *di, long i)
+d_make_template_param (struct d_info *di, int i)
 {
   struct demangle_component *p;
 
@@ -1135,7 +1135,7 @@
 /* Add a new function parameter.  */
 
 static struct demangle_component *
-d_make_function_param (struct d_info *di, long i)
+d_make_function_param (struct d_info *di, int i)
 {
   struct demangle_component *p;
 
@@ -1620,7 +1620,7 @@
 static struct demangle_component *
 d_source_name (struct d_info *di)
 {
-  long len;
+  int len;
   struct demangle_component *ret;
 
   len = d_number (di);
@@ -1633,12 +1633,12 @@
 
 /* number ::= [n] <(non-negative decimal integer)>  */
 
-static long
+static int
 d_number (struct d_info *di)
 {
   int negative;
   char peek;
-  long ret;
+  int ret;
 
   negative = 0;
   peek = d_peek_char (di);
@@ -1681,7 +1681,7 @@
 /* identifier ::= <(unqualified source code identifier)>  */
 
 static struct demangle_component *
-d_identifier (struct d_info *di, long len)
+d_identifier (struct d_info *di, int len)
 {
   const char *name;
 
@@ -1702,7 +1702,7 @@
   /* Look for something which looks like a gcc encoding of an
  anonymous namespace, and replace it with a more user friendly
  name.  */
-  if (len >= (long) ANONYMOUS_NAMESPACE_PREFIX_LEN + 2
+  if (len >= ANONYMOUS_NAMESPACE_PREFIX_LEN + 2
   && memcmp (name, ANONYMOUS_NAMESPACE_PREFIX,
 ANONYMOUS_NAMESPACE_PREFIX_LEN) == 0)
 {
@@ -1870,7 +1870,7 @@
 {
   struct demangle_component *p = NULL;
   struct demangle_component *next = NULL;
-  long len, i;
+  int len, i;
   char c;
   const char *str;
 
@@ -2012,7 +2012,7 @@
case 'C':
  {
struct demangle_component *derived_type;
-   long offset;
+   int offset;
struct demangle_component *base_type;
 
derived_type = cplus_demangle_type (di);
@@ -2946,10 +2946,10 @@
 
 /*  _ */
 
-static long
+static int
 d_compact_number (struct d_info *di)
 {
-  long num;
+  int num;
   if (d_peek_char (di) == '_')
 num = 0;
   else if (d_peek_char (di) == 'n')
@@ -2969,7 +2969,7 @@
 static struct demangle_component *
 d_template_param (struct d_info *di)
 {
-  long param;
+  int param;
 
   if (! d_check_char (di, 'T'))
 return NULL;
@@ -3502,7 +3502,7 @@
 static int
 d_discriminator (struct d_info *di)
 {
-  long discrim;
+  int discrim;
 
   if (d_peek_char (di) != '_')
 return 1;
@@ -3558,7 +3558,7 @@
 d_unnamed_type (struct d_info *di)
 {
   struct demangle_component *ret;
-  long num;
+  int num;
 
   if (! d_check_char (di, 'U'))
 return NULL;
@@ -4086,7 +4086,7 @@
 }
 
 static inline void
-d_append_num (struct d_print_info *dpi, long l)
+d_append_num (struct d_print_info *dpi, int l)
 {
   char buf[25];
   sprintf (buf,"%ld", l);



[PATCH, PR target/69890] Remove "string.h" dependency for string functions optimization tests

2016-04-01 Thread Ilya Enkovich
Hi,

This patch replaces "string.h" with "strlenopt.h" in all tests
checking string function optimizations with CHKP.  I added a memmove
definition to strlenopt.h for that.

Regtested on x86_64-pc-linux-gnu; Dominique d'Humieres checked that
PR69890 is resolved on Darwin.  Is the strlenopt.h extension and re-use
OK for trunk and gcc-5-branch?

Thanks,
Ilya
--
gcc/testsuite/

2016-04-01  Ilya Enkovich  

PR target/69890
* gcc.dg/strlenopt.h (memmove): New.
* gcc.target/i386/chkp-strlen-1.c: Include "../../gcc.dg/strlenopt.h"
instead of "string.h".
* gcc.target/i386/chkp-strlen-2.c: Likewise.
* gcc.target/i386/chkp-strlen-3.c: Likewise.
* gcc.target/i386/chkp-strlen-4.c: Likewise.
* gcc.target/i386/chkp-strlen-5.c: Likewise.
* gcc.target/i386/chkp-stropt-1.c: Likewise.
* gcc.target/i386/chkp-stropt-10.c: Likewise.
* gcc.target/i386/chkp-stropt-11.c: Likewise.
* gcc.target/i386/chkp-stropt-12.c: Likewise.
* gcc.target/i386/chkp-stropt-13.c: Likewise.
* gcc.target/i386/chkp-stropt-14.c: Likewise.
* gcc.target/i386/chkp-stropt-15.c: Likewise.
* gcc.target/i386/chkp-stropt-16.c: Likewise.
* gcc.target/i386/chkp-stropt-2.c: Likewise.
* gcc.target/i386/chkp-stropt-3.c: Likewise.
* gcc.target/i386/chkp-stropt-4.c: Likewise.
* gcc.target/i386/chkp-stropt-5.c: Likewise.
* gcc.target/i386/chkp-stropt-6.c: Likewise.
* gcc.target/i386/chkp-stropt-7.c: Likewise.
* gcc.target/i386/chkp-stropt-8.c: Likewise.
* gcc.target/i386/chkp-stropt-9.c: Likewise.


diff --git a/gcc/testsuite/gcc.dg/strlenopt.h b/gcc/testsuite/gcc.dg/strlenopt.h
index ef47e5a..8f69940 100644
--- a/gcc/testsuite/gcc.dg/strlenopt.h
+++ b/gcc/testsuite/gcc.dg/strlenopt.h
@@ -10,6 +10,7 @@ void free (void *);
 char *strdup (const char *);
 size_t strlen (const char *);
 void *memcpy (void *__restrict, const void *__restrict, size_t);
+void *memmove (void *, const void *, size_t);
 char *strcpy (char *__restrict, const char *__restrict);
 char *strcat (char *__restrict, const char *__restrict);
 char *strchr (const char *, int);
@@ -31,6 +32,12 @@ memcpy (void *__restrict dest, const void *__restrict src, size_t len)
   return __builtin___memcpy_chk (dest, src, len, bos0 (dest));
 }
 
+extern inline __attribute__((gnu_inline, always_inline, artificial)) void *
+memmove (void *dest, const void *src, size_t len)
+{
+  return __builtin___memmove_chk (dest, src, len, bos0 (dest));
+}
+
 extern inline __attribute__((gnu_inline, always_inline, artificial)) char *
 strcpy (char *__restrict dest, const char *__restrict src)
 {
diff --git a/gcc/testsuite/gcc.target/i386/chkp-strlen-1.c b/gcc/testsuite/gcc.target/i386/chkp-strlen-1.c
index de6279f..38d5390 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-strlen-1.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-strlen-1.c
@@ -2,7 +2,7 @@
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-strlen" } */
 /* { dg-final { scan-tree-dump "memcpy.chkp" "strlen" } } */
 
-#include "string.h"
+#include "../../gcc.dg/strlenopt.h"
 
 char *test (char *str1, char *str2)
 {
diff --git a/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c b/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c
index 470ac47..789ebc1 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-strlen-2.c
@@ -3,8 +3,8 @@
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-strlen" } */
 /* { dg-final { scan-tree-dump-not "strlen" "strlen" } } */
 
-#define _GNU_SOURCE
-#include "string.h"
+#define USE_GNU
+#include "../../gcc.dg/strlenopt.h"
 
 char *test (char *str1, char *str2)
 {
diff --git a/gcc/testsuite/gcc.target/i386/chkp-strlen-3.c b/gcc/testsuite/gcc.target/i386/chkp-strlen-3.c
index 311c9a0..276f412 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-strlen-3.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-strlen-3.c
@@ -2,7 +2,7 @@
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-strlen" } */
 /* { dg-final { scan-tree-dump-times "strlen" 1 "strlen" } } */
 
-#include "string.h"
+#include "../../gcc.dg/strlenopt.h"
 
 size_t test (char *str1, char *str2)
 {
diff --git a/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c b/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c
index dbf568b..51ff960 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c
+++ b/gcc/testsuite/gcc.target/i386/chkp-strlen-4.c
@@ -3,8 +3,8 @@
 /* { dg-options "-fcheck-pointer-bounds -mmpx -O2 -fdump-tree-strlen" } */
 /* { dg-final { scan-tree-dump-times "strlen" 1 "strlen" } } */
 
-#define _GNU_SOURCE
-#include "string.h"
+#define USE_GNU
+#include "../../gcc.dg/strlenopt.h"
 
 char * test (char *str1, char *str2)
 {
diff --git a/gcc/testsuite/gcc.target/i386/chkp-strlen-5.c b/gcc/testsuite/gcc.target/i386/chkp-strlen-5.c
index e44096c..bbafecc 100644
--- a/gcc/testsuite/gcc.target/i386/chkp-strlen-5.c
+++ b/gcc/testsuite/gcc.target

Re: [PATCH, PR target/69890] Remove "string.h" dependency for string functions optimization tests

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 01:28:10PM +0300, Ilya Enkovich wrote:
> This patch replaces "string.h" with "strlenopt.h" in all tests
> checking string function optimizations with CHKP.  I added memmove
> definition to strlenopt.h for that.
> 
> Regtested on x86_64-pc-linux-gnu and checked PR69890 is resolved
> on Darwin by Dominique d'Humieres.  Is strlenopt.h extension and
> re-usage OK for trunk and gcc-5-branch?

Ok, thanks.

Jakub


Re: [PATCH] Disable guality tests for powerpc*-linux*

2016-04-01 Thread Bill Schmidt
Thanks to all for the helpful explanations.  We plan to leave things as
they are.  I hope someday we can make some time to do some basic
investigations here.

Bill

On Fri, 2016-04-01 at 00:09 -0500, Aldy Hernandez wrote:
> Richard Biener  writes:
> 
> > Hell, even slapping a xfail powerpc*-*-* on all current ppc FAILs
> > would be better
> > than simply disabling all of guality for ppc.
> 
> FWIW, I agree.  While working on the debug early project, I found at
> least two legitimate bugs affecting all architectures with guality tests
> on ppc64.
> 
> Aldy
> 




Re: Fix for PR70498 in Libiberty Demangler

2016-04-01 Thread Bernd Schmidt

On 04/01/2016 12:21 PM, Marcel Böhme wrote:

This fixes the write access violation detailed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70498 (and a few other
unreported cases).

Sometimes length-variables for strings and arrays are of type long
other times of type int. Since cp-demangle.h exports structs and
methods with length-variables of type int, this has been made
consistent in cp-demangle.c.


Patches need to be bootstrapped and regression tested, and patch 
submissions should include which target this was done on.


Ideally you'd also want to include testcases along with your patches, 
although I'm not entirely sure how we can arrange for this type of 
problem to be tested.


Lastly, for this specific patch, I have trouble seeing how it fixes 
anything. I'd need a more detailed explanation of how the problem 
happens in the first place.



Bernd



Re: Patches to fix optimizer bug & C++ exceptions for GCC VAX backend

2016-04-01 Thread Bernd Schmidt
Cc'ing Matt Thomas, who is listed as the vax maintainer; most of the 
patch should be reviewed by him IMO.  If he is no longer active I'd 
frankly rather deprecate the port than invest effort in keeping 
it running.



Index: gcc/except.c
===
RCS file: /cvsroot/src/external/gpl3/gcc/dist/gcc/except.c,v
retrieving revision 1.3
diff -u -r1.3 except.c
--- gcc/except.c	23 Mar 2016 15:51:36 -  1.3
+++ gcc/except.c	28 Mar 2016 23:24:40 -
@@ -2288,7 +2288,8 @@
  #endif
  {
  #ifdef EH_RETURN_HANDLER_RTX
-  emit_move_insn (EH_RETURN_HANDLER_RTX, crtl->eh.ehr_handler);
+  rtx insn = emit_move_insn (EH_RETURN_HANDLER_RTX, crtl->eh.ehr_handler);
+  RTX_FRAME_RELATED_P (insn) = 1;  // XXX FIXME in VAX backend?
  #else
error ("__builtin_eh_return not supported on this target");
  #endif


This part looks highly suspicious and I think there needs to be further 
analysis.



Bernd



Re: [PATCH 1/2] Fix new -Wparentheses warnings encountered during bootstrap

2016-04-01 Thread Marek Polacek
On Thu, Mar 31, 2016 at 05:18:13PM -0400, Patrick Palka wrote:
> I hope someone else could do it since I'm not very familiar with the C
> parser :)  I think Marek said he would take care of it.

Sure, happy to.

Marek


Re: Backports to 5.x branch

2016-04-01 Thread Christophe Lyon
On 31 March 2016 at 10:57, Jakub Jelinek  wrote:
> On Thu, Mar 31, 2016 at 09:53:28AM +0100, Kyrill Tkachov wrote:
>>
>> On 31/03/16 09:48, Christophe Lyon wrote:
>> >On 30 March 2016 at 14:49, Jakub Jelinek  wrote:
>> >>Hi!
>> >>
>> >>I've bootstrapped/regtested on x86_64-linux and i686-linux following
>> >>backports from trunk and committed them to gcc-5-branch.
>> >>
>> >Hi,
>> >I've noticed that r234548 shows regressions on aarch64:
>> >PASS->FAIL:
>> >   gcc.target/aarch64/scalar_shift_1.c scan-assembler-times
>> >neg\td[0-9]+, d[0-9]+ 4
>>
>> Maybe that test just needs a backport of the fix for PR 70004?
>
> Sorry, I hadn't remembered that PR70004 was a follow-up to the invalid shift
> counts PRs.  The PR70004 patch is preapproved for the 5.x branch.
>
OK, done at r234669.
Thanks

Christophe.

> Jakub


[patch] Clarify doxygen comments w.r.t nulls in std::string

2016-04-01 Thread Jonathan Wakely

A stackoverflow poster was confused by the comments on
basic_string::length(), which talk about null-termination, as it isn't
obvious that embedded nulls are not terminators.

I think this makes it clearer, but would appreciate other opinions.

(The @{ and @} commands tell doxygen to use the same comment for both
functions, to avoid duplicating the text more than already necessary
for the two string implementations.)


commit e028b1ff506e21567bb315715d4e694c36d1d006
Author: Jonathan Wakely 
Date:   Fri Apr 1 11:48:26 2016 +0100

Clarify doxygen comments w.r.t nulls in std::string

	* include/bits/basic_string.h (basic_string::size,
	basic_string::length): Clarify handling of embedded null characters.

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 374c985..f79b531 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -783,18 +783,29 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 public:
   // Capacity:
-  ///  Returns the number of characters in the string, not including any
-  ///  null-termination.
+
+  /**
+   * @{
+   *
+   * @brief  Returns the number of characters in the string.
+   *
+   * The return value does not include the null character that follows
+   * the string contents, e.g. in the pointer returned by c_str(),
+   * but embedded null characters inside the string are part of the
+   * string contents and so they (and characters following them) are
+   * counted.
+   */
+
   size_type
   size() const _GLIBCXX_NOEXCEPT
   { return _M_string_length; }
 
-  ///  Returns the number of characters in the string, not including any
-  ///  null-termination.
   size_type
   length() const _GLIBCXX_NOEXCEPT
   { return _M_string_length; }
 
+  // @}
+
   ///  Returns the size() of the largest possible %string.
   size_type
   max_size() const _GLIBCXX_NOEXCEPT
@@ -1967,6 +1978,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   swap(basic_string& __s) _GLIBCXX_NOEXCEPT;
 
   // String operations:
+
   /**
*  @brief  Return const pointer to null-terminated contents.
*
@@ -3233,18 +3245,29 @@ _GLIBCXX_END_NAMESPACE_CXX11
 
 public:
   // Capacity:
-  ///  Returns the number of characters in the string, not including any
-  ///  null-termination.
+
+  /**
+   * @{
+   *
+   * @brief  Returns the number of characters in the string.
+   *
+   * The return value does not include the null character that follows
+   * the string contents, e.g. in the pointer returned by c_str(),
+   * but embedded null characters inside the string are part of the
+   * string contents and so they (and characters following them) are
+   * counted.
+   */
+
   size_type
   size() const _GLIBCXX_NOEXCEPT
   { return _M_rep()->_M_length; }
 
-  ///  Returns the number of characters in the string, not including any
-  ///  null-termination.
   size_type
   length() const _GLIBCXX_NOEXCEPT
   { return _M_rep()->_M_length; }
 
+  // @}
+
   ///  Returns the size() of the largest possible %string.
   size_type
   max_size() const _GLIBCXX_NOEXCEPT


Fix TRY_CATCH_EXPR comment

2016-04-01 Thread Nathan Sidwell
While working on 55635 I noticed the comment for TRY_CATCH_EXPR was confusing.  It 
uses 'operand 1' and 'operand 2' to mean 'first' and 'second', not literally 
op[1] and op[2].  Fixed to use indices 0 and 1.


Applied as obvious.

nathan
2016-04-01  Nathan Sidwell  

	* tree.def (TRY_CATCH_EXPR): Correct documentation.

Index: tree.def
===
--- tree.def	(revision 234668)
+++ tree.def	(working copy)
@@ -870,10 +870,10 @@ DEFTREECODE (POSTINCREMENT_EXPR, "postin
 /* Used to implement `va_arg'.  */
 DEFTREECODE (VA_ARG_EXPR, "va_arg_expr", tcc_expression, 1)
 
-/* Evaluate operand 1.  If and only if an exception is thrown during
-   the evaluation of operand 1, evaluate operand 2.
+/* Evaluate operand 0.  If and only if an exception is thrown during
+   the evaluation of operand 0, evaluate operand 1.
 
-   This differs from TRY_FINALLY_EXPR in that operand 2 is not evaluated
+   This differs from TRY_FINALLY_EXPR in that operand 1 is not evaluated
on a normal or jump exit, only on an exception.  */
 DEFTREECODE (TRY_CATCH_EXPR, "try_catch_expr", tcc_statement, 2)
 


Re: Various selective scheduling fixes

2016-04-01 Thread Christophe Lyon
On 1 April 2016 at 10:54, Andrey Belevantsev  wrote:
> Hi Christophe,
>
>
> On 01.04.2016 10:33, Christophe Lyon wrote:
>>
>> On 31 March 2016 at 16:43, Andrey Belevantsev  wrote:
>>>
>>> Hello,
>>>
>>> On 14.03.2016 12:10, Andrey Belevantsev wrote:


 Hello,

 In this thread I will be posting the patches for the fixed selective
 scheduling PRs (except the one that was already kindly checked in by
 Jeff).
  The patches were tested both on x86-64 and ia64 with the following
 combination: 1) the usual bootstrap/regtest, which only utilizes
 sel-sched
 on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the
 bootstrap/regtest with the second scheduler forced to sel-sched; 3) both
 schedulers forced to sel-sched.  In all cases everything seemed to be
 fine.

 Three of the PRs are regressions, the other two showed different errors
 across the variety of releases tested by submitters;  I think all of
 them
 are appropriate at this stage -- they do not touch anything outside of
 selective scheduling except the first patch where a piece of code from
 sched-deps.c needs to be refactored into a function to be called from
 sel-sched.c.
>>>
>>>
>>>
>>> I've backported all regression PRs to gcc-5-branch after testing there
>>> again
>>> with selective scheduling force enabled: PRs 64411, 0, 69032, 69102.
>>> The first one was not marked as a regression as such but the test for PR
>>> 70292, which is duplicate, works for me on gcc 5.1 thus making it a
>>> regression, too.
>>>
>>
>> Hi,
>>
>> The backport for pr69102 shows that the new testcase fails to compile
>> (ICE)
>> when GCC is configured as:
>>
>> --target=arm-none-linux-gnueabihf --with-float=hard --with-mode=arm
>> --with-cpu=cortex-a15 --with-fpu=neon-vfpv4
>>
>>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:
>> In function 'foo':
>>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:21:1:
>> internal compiler error: Segmentation fault
>> 0xa64d15 crash_signal
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:383
>> 0xfa41d7 autopref_multipass_dfa_lookahead_guard(rtx_insn*, int)
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752
>> 0xa31cd2 invoke_dfa_lookahead_guard
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4212
>> 0xa31cd2 find_best_expr
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4415
>> 0xa343fb fill_insns
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5570
>> 0xa343fb schedule_on_fences
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7395
>> 0xa36010 sel_sched_region_2
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7533
>> 0xa36f2a sel_sched_region_1
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7575
>> 0xa36f2a sel_sched_region(int)
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7676
>> 0xa37589 run_selective_scheduling()
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7752
>> 0xa14aed rest_of_handle_sched2
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647
>> 0xa14aed execute
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791
>>
>> See
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234625/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt
>>
>> Can you have a look?
>
>
> That's because A15 is the only place which enables
> autopref_multipass_dfa_lookahead_guard as the DFA lookahead guard hook. But
> autoprefetch modeling doesn't work for selective scheduling, it uses haifa
> structures that are not kept up to date during sel-sched.  So this is not
> supposed to work as soon as the param value for prefetcher lookahead depth
> is positive.
>
> The following patch works for me.  Could you check it with your testing? If
> it works fine for you, I would install the patch both for trunk and gcc-5.
> It would be great to force sel-sched to be enabled, too.  I could do that
> but I don't have the hardware or cross-arm target tools at the moment.
>
> * haifa-sched.c (autopref_multipass_dfa_lookahead_guard): Disable
> for selective scheduler.
>

It does work for me; it also fixes the other ICE I reported (on pr69307).

But note that both tests pass on trunk.

Christophe


> Best,
> Andrey
>
>
>>
>> Christophe
>>
>>> Andrey
>>>

 Andrey
>>>
>>>
>>>
>


Re: Various selective scheduling fixes

2016-04-01 Thread Kyrill Tkachov

Hi Christophe, Andrey,

On 01/04/16 14:09, Christophe Lyon wrote:

On 1 April 2016 at 10:54, Andrey Belevantsev  wrote:

Hi Christophe,


On 01.04.2016 10:33, Christophe Lyon wrote:

On 31 March 2016 at 16:43, Andrey Belevantsev  wrote:

Hello,

On 14.03.2016 12:10, Andrey Belevantsev wrote:


Hello,

In this thread I will be posting the patches for the fixed selective
scheduling PRs (except the one that was already kindly checked in by
Jeff).
  The patches were tested both on x86-64 and ia64 with the following
combination: 1) the usual bootstrap/regtest, which only utilizes
sel-sched
on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2) the
bootstrap/regtest with the second scheduler forced to sel-sched; 3) both
schedulers forced to sel-sched.  In all cases everything seemed to be
fine.

Three of the PRs are regressions, the other two showed different errors
across the variety of releases tested by submitters;  I think all of
them
are appropriate at this stage -- they do not touch anything outside of
selective scheduling except the first patch where a piece of code from
sched-deps.c needs to be refactored into a function to be called from
sel-sched.c.



I've backported all regression PRs to gcc-5-branch after testing there
again
with selective scheduling force enabled: PRs 64411, 0, 69032, 69102.
The first one was not marked as a regression as such but the test for PR
70292, which is duplicate, works for me on gcc 5.1 thus making it a
regression, too.


Hi,

The backport for pr69102 shows that the new testcase fails to compile
(ICE)
when GCC is configured as:

--target=arm-none-linux-gnueabihf --with-float=hard --with-mode=arm
--with-cpu=cortex-a15 --with-fpu=neon-vfpv4


/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:
In function 'foo':

/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:21:1:
internal compiler error: Segmentation fault
0xa64d15 crash_signal
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:383
0xfa41d7 autopref_multipass_dfa_lookahead_guard(rtx_insn*, int)
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752
0xa31cd2 invoke_dfa_lookahead_guard
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4212
0xa31cd2 find_best_expr
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4415
0xa343fb fill_insns
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5570
0xa343fb schedule_on_fences
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7395
0xa36010 sel_sched_region_2
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7533
0xa36f2a sel_sched_region_1
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7575
0xa36f2a sel_sched_region(int)
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7676
0xa37589 run_selective_scheduling()
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7752
0xa14aed rest_of_handle_sched2
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647
0xa14aed execute
 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791

See
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234625/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt

Can you have a look?


That's because A15 is the only place which enables
autopref_multipass_dfa_lookahead_guard as the DFA lookahead guard hook. But
autoprefetch modeling doesn't work for selective scheduling, it uses haifa
structures that are not kept up to date during sel-sched.  So this is not
supposed to work as soon as the param value for prefetcher lookahead depth
is positive.

The following patch works for me.  Could you check it with your testing? If
it works fine for you, I would install the patch both for trunk and gcc-5.
It would be great to force sel-sched to be enabled, too.  I could do that
but I don't have the hardware or cross-arm target tools at the moment.

 * haifa-sched.c (autopref_multipass_dfa_lookahead_guard): Disable
for selective scheduler.


It does work for me, it also fixes the other ICE I reported (on pr69307).

But note that both tests pass on trunk.


This looks to me like PR rtl-optimization/68236, which I fixed on trunk.

Kyrill


Christophe



Best,
Andrey



Christophe


Andrey

[PATCH] Improve CSE (PR rtl-optimization/70467)

2016-04-01 Thread Jakub Jelinek
Hi!

As the testcase below shows, we can end up with lots of useless
instructions from multi-word arithmetic.
simplify-rtx.c can optimize x {&,|,^}= {0,-1}, but while
x &= 0 or x {|,^}= -1 are optimized into constants and CSE can handle those
fine, we keep x &= -1 and x {|,^}= 0 in the IL until expansion if x
is a MEM.  There are two issues: one is that cse_insn has had, for a few
years, code that wants to prevent partially overlapping MEM->MEM moves,
but doesn't realize that fully overlapping MEM->MEM no-op moves
are fine.  The second one is that on most backends there are no
MEM->MEM move instructions, so we need to delete the useless insns instead,
because such a no-op set can't match any pattern.
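
To make the failure mode concrete, here is a reduced illustration (not the
attached testcase; the constant is chosen so one word of the mask is all-ones
and the other is zero):

/* Illustration only.  On a 32-bit target the 64-bit AND is split into two
   SImode operations; simplify-rtx folds the all-ones half into a plain copy,
   leaving a no-op set like (set (mem:SI a) (mem:SI a)) that no move pattern
   matches on most backends, so CSE has to delete it instead.  */
extern void use (unsigned long long *);

void
f (void)
{
  unsigned long long a;
  use (&a);                    /* force A into memory */
  a &= 0x00000000ffffffffULL;  /* low word: & -1 (no-op); high word: & 0 */
  use (&a);
}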

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.
Is this something we want for 6.x or defer for stage1?

2016-04-01  Jakub Jelinek  

PR rtl-optimization/70467
* cse.c (cse_insn): Handle no-op MEM moves after folding.

* gcc.target/i386/pr70467-1.c: New test.

--- gcc/cse.c.jj2016-04-01 12:12:22.630602270 +0200
+++ gcc/cse.c   2016-04-01 12:18:03.509947871 +0200
@@ -5166,7 +5166,7 @@ cse_insn (rtx_insn *insn)
}
 
  /* Avoid creation of overlapping memory moves.  */
- if (MEM_P (trial) && MEM_P (SET_DEST (sets[i].rtl)))
+ if (MEM_P (trial) && MEM_P (dest) && !rtx_equal_p (trial, dest))
{
  rtx src, dest;
 
@@ -5277,6 +5277,20 @@ cse_insn (rtx_insn *insn)
  break;
}
 
+ /* Similarly, lots of targets don't allow no-op
+(set (mem x) (mem x)) moves.  */
+ else if (n_sets == 1
+  && MEM_P (trial)
+  && MEM_P (dest)
+  && rtx_equal_p (trial, dest)
+  && !side_effects_p (dest)
+  && (cfun->can_delete_dead_exceptions
+  || insn_nothrow_p (insn)))
+   {
+ SET_SRC (sets[i].rtl) = trial;
+ break;
+   }
+
  /* Reject certain invalid forms of CONST that we create.  */
  else if (CONSTANT_P (trial)
   && GET_CODE (trial) == CONST
@@ -5495,6 +5509,21 @@ cse_insn (rtx_insn *insn)
  sets[i].rtl = 0;
}
 
+  /* Similarly for no-op MEM moves.  */
+  else if (n_sets == 1
+  && MEM_P (SET_DEST (sets[i].rtl))
+  && MEM_P (SET_SRC (sets[i].rtl))
+  && rtx_equal_p (SET_DEST (sets[i].rtl),
+  SET_SRC (sets[i].rtl))
+  && !side_effects_p (SET_DEST (sets[i].rtl))
+  && (cfun->can_delete_dead_exceptions || insn_nothrow_p (insn)))
+   {
+ if (cfun->can_throw_non_call_exceptions && can_throw_internal (insn))
+   cse_cfg_altered = true;
+ delete_insn_and_edges (insn);
+ sets[i].rtl = 0;
+   }
+
   /* If this SET is now setting PC to a label, we know it used to
 be a conditional or computed branch.  */
   else if (dest == pc_rtx && GET_CODE (src) == LABEL_REF
--- gcc/testsuite/gcc.target/i386/pr70467-1.c.jj2016-04-01 
12:13:06.945997183 +0200
+++ gcc/testsuite/gcc.target/i386/pr70467-1.c   2016-04-01 12:17:12.187648630 
+0200
@@ -0,0 +1,55 @@
+/* PR rtl-optimization/70467 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mno-sse" } */
+
+void foo (unsigned long long *);
+
+void
+bar (void)
+{
+  unsigned long long a;
+  foo (&a);
+  a &= 0x7fffULL;
+  foo (&a);
+  a &= 0x7fffULL;
+  foo (&a);
+  a &= 0x7fffULL;
+  foo (&a);
+  a &= 0x7fffULL;
+  foo (&a);
+  a &= 0xULL;
+  foo (&a);
+  a &= 0xULL;
+  foo (&a);
+  a |= 0x7fffULL;
+  foo (&a);
+  a |= 0x7fffULL;
+  foo (&a);
+  a |= 0x7fffULL;
+  foo (&a);
+  a |= 0x7fffULL;
+  foo (&a);
+  a |= 0xULL;
+  foo (&a);
+  a |= 0xULL;
+  foo (&a);
+  a ^= 0x7fffULL;
+  foo (&a);
+  a ^= 0x7fffULL;
+  foo (&a);
+  a ^= 0x7fffULL;
+  foo (&a);
+  a ^= 0x7fffULL;
+  foo (&a);
+  a ^= 0xULL;
+  foo (&a);
+  a ^= 0xULL;
+  foo (&a);
+}
+
+/* { dg-final { scan-assembler-not "andl\[ \t\]*.-1," { target ia32 } } } */
+/* { dg-final { scan-assembler-not "andl\[ \t\]*.0," { target ia32 } } } */
+/* { dg-final { scan-assembler-not "orl\[ \t\]*.-1," { target ia32 } } } */
+/* { dg-final { scan-assembler-not "orl\[ \t\]*.0," { target ia32 } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*.-1," { target ia32 } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*.0," { target ia32 } } } */

Jakub


[PATCH] Improve add/sub double word splitters (PR rtl-optimization/70467)

2016-04-01 Thread Jakub Jelinek
Hi!

As the testcase below shows, we generate awful code for double-word
additions/subtractions if the last argument is a constant whose
whole low word is 0 (and only some of the upper bits are nonzero).
In that case, there is no point emitting a useless addl $0, ... followed by
adcl $something, ..., because the low-word addition won't change anything and
the carry flag will also be clear; we can just add the high word of the
constant to the high part.
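
As an illustration of the shape of code this targets (the constant here is
mine, chosen so that the low 32-bit word is zero; it is not the attached
testcase):

/* Illustration only: the addend is 5 << 32, so its low word is 0.  On ia32
   the doubleword splitter can now emit a single addl to the high word
   instead of an addl $0 / adcl pair.  */
unsigned long long
add_high_word (unsigned long long x)
{
  return x + 0x500000000ULL;
}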

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-04-01  Jakub Jelinek  

PR rtl-optimization/70467
* config/i386/i386.md (*add3_doubleword, *sub3_doubleword):
If low word of the last operand is 0, just emit addition/subtraction
for the high word.

* gcc.target/i386/pr70467-2.c: New test.

--- gcc/config/i386/i386.md.jj  2016-03-29 19:31:23.0 +0200
+++ gcc/config/i386/i386.md 2016-03-31 17:33:36.848167239 +0200
@@ -5449,7 +5449,14 @@ (define_insn_and_split "*add3_doubl
   (match_dup 4))
 (match_dup 5)))
  (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (mode, &operands[0], 3, &operands[0], 
&operands[3]);")
+{
+  split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
+  if (operands[2] == const0_rtx)
+{
+  ix86_expand_binary_operator (PLUS, mode, &operands[3]);
+  DONE;
+}
+})
 
 (define_insn "*add_1"
   [(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm,r,r")
@@ -6379,7 +6386,14 @@ (define_insn_and_split "*sub3_doubl
   (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
 (match_dup 5)))
  (clobber (reg:CC FLAGS_REG))])]
-  "split_double_mode (mode, &operands[0], 3, &operands[0], 
&operands[3]);")
+{
+  split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
+  if (operands[2] == const0_rtx)
+{
+  ix86_expand_binary_operator (MINUS, mode, &operands[3]);
+  DONE;
+}
+})
 
 (define_insn "*sub_1"
   [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,")
--- gcc/testsuite/gcc.target/i386/pr70467-2.c.jj2016-04-01 
12:29:15.611785157 +0200
+++ gcc/testsuite/gcc.target/i386/pr70467-2.c   2016-04-01 12:31:19.980092446 
+0200
@@ -0,0 +1,20 @@
+/* PR rtl-optimization/70467 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned long long
+foo (unsigned long long x)
+{
+  return x + 0x123456ULL;
+}
+
+unsigned long long
+bar (unsigned long long x)
+{
+  return x - 0x123456ULL;
+}
+
+/* { dg-final { scan-assembler-not "addl\[ \t\]*.0," { target ia32 } } } */
+/* { dg-final { scan-assembler-not "subl\[ \t\]*.0," { target ia32 } } } */
+/* { dg-final { scan-assembler-not "adcl\[^\n\r\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler-not "sbbl\[^\n\r\]*%" { target ia32 } } } */

Jakub


[PATCH] Improve add/sub TImode double word splitters (PR rtl-optimization/70467)

2016-04-01 Thread Jakub Jelinek
Hi!

The previous patch apparently isn't enough for TImode, because
we don't even allow CONST_WIDE_INT operands there: the pattern uses the
"e" constraint and a similar predicate.  All we care about is that
both words of the argument can be expressed as addq/adcq/subq/sbbq
immediates, so this patch adds new predicates and a new constraint for
that purpose.  Suggestions for better names for those are appreciated.
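
For illustration, this is the kind of TImode addition the patch enables
(hypothetical example, assuming both 64-bit halves of the constant fit the
sign-extended 32-bit immediate range of addq/adcq):

/* Illustration only.  Both 64-bit halves of the 128-bit constant (0x1234 and
   0x5678) are valid sign-extended 32-bit immediates, which is what the new Wd
   constraint / x86_64_double_int_operand predicate check, so the doubleword
   splitter can use addq/adcq with immediates directly.  */
__int128
add128 (__int128 x)
{
  return x + ((((__int128) 0x1234) << 64) | 0x5678);
}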

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for stage1
(while the previous patch looks simple enough that I'd like to see it in
6.x, this one IMHO can wait).

2016-04-01  Jakub Jelinek  

PR rtl-optimization/70467
* config/i386/predicates.md (x86_64_double_int_operand,
x86_64_double_general_operand): New predicates.
* config/i386/constraints.md (Wd): New constraint.
* config/i386/i386.md (mode attr di): Use Wd instead of e.
(double_general_operand): New mode attr.
(add3, sub3): Use 
instead of .
(*add3_doubleword, *sub3_doubleword): Use
x86_64_double_general_operand instead of .

* gcc.target/i386/pr70467-3.c: New test.
* gcc.target/i386/pr70467-4.c: New test.

--- gcc/config/i386/predicates.md.jj2016-01-07 09:42:39.0 +0100
+++ gcc/config/i386/predicates.md   2016-04-01 11:48:17.640386878 +0200
@@ -332,6 +332,27 @@ (define_predicate "x86_64_zext_immediate
   return false;
 })
 
+;; Return true if VALUE is a constant integer whose low and high words satisfy
+;; x86_64_immediate_operand.
+(define_predicate "x86_64_double_int_operand"
+  (match_code "const_int,const_wide_int")
+{
+  switch (GET_CODE (op))
+{
+case CONST_INT:
+  return x86_64_immediate_operand (op, mode);
+
+case CONST_WIDE_INT:
+  return (x86_64_immediate_operand (simplify_subreg (DImode, op, TImode,
+0), DImode)
+ && x86_64_immediate_operand (simplify_subreg (DImode, op, TImode,
+   8), DImode));
+
+default:
+  gcc_unreachable ();
+}
+})
+
 ;; Return true if size of VALUE can be stored in a sign
 ;; extended immediate field.
 (define_predicate "x86_64_immediate_size_operand"
@@ -347,6 +368,14 @@ (define_predicate "x86_64_general_operan
 (match_operand 0 "x86_64_immediate_operand"))
 (match_operand 0 "general_operand")))
 
+;; Return true if OP's both words are general operands representable
+;; on x86_64.
+(define_predicate "x86_64_double_general_operand"
+  (if_then_else (match_test "TARGET_64BIT")
+(ior (match_operand 0 "nonimmediate_operand")
+(match_operand 0 "x86_64_double_int_operand"))
+(match_operand 0 "general_operand")))
+
 ;; Return true if OP is non-VOIDmode general operand representable
 ;; on x86_64.  This predicate is used in sign-extending conversion
 ;; operations that require non-VOIDmode immediate operands.
--- gcc/config/i386/constraints.md.jj   2016-01-29 21:32:56.0 +0100
+++ gcc/config/i386/constraints.md  2016-04-01 11:19:44.633921527 +0200
@@ -266,6 +266,11 @@ (define_constraint "Wz"
   (and (match_operand 0 "x86_64_zext_immediate_operand")
(match_test "GET_MODE (op) != VOIDmode")))
 
+(define_constraint "Wd"
+  "128-bit integer constant where both the high and low 64-bit word
+   of it satisfies the e constraint."
+  (match_operand 0 "x86_64_double_int_operand"))
+
 (define_constraint "Z"
   "32-bit unsigned integer constant, or a symbolic reference known
to fit that range (for immediate operands in zero-extending x86-64
--- gcc/config/i386/i386.md.jj  2016-03-31 17:33:36.0 +0200
+++ gcc/config/i386/i386.md 2016-04-01 11:29:40.705729897 +0200
@@ -1071,7 +1071,7 @@ (define_mode_attr i [(QI "n") (HI "n") (
 (define_mode_attr g [(QI "qmn") (HI "rmn") (SI "rme") (DI "rme")])
 
 ;; Immediate operand constraint for double integer modes.
-(define_mode_attr di [(SI "nF") (DI "e")])
+(define_mode_attr di [(SI "nF") (DI "Wd")])
 
 ;; Immediate operand constraint for shifts.
 (define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
@@ -1084,6 +1084,15 @@ (define_mode_attr general_operand
 (DI "x86_64_general_operand")
 (TI "x86_64_general_operand")])
 
+;; General operand predicate for integer modes, where for TImode
+;; we need both words of the operand to be general operands.
+(define_mode_attr double_general_operand
+   [(QI "general_operand")
+(HI "general_operand")
+(SI "x86_64_general_operand")
+(DI "x86_64_general_operand")
+(TI "x86_64_double_general_operand")])
+
 ;; General sign extend operand predicate for integer modes,
 ;; which disallows VOIDmode operands and thus it is suitable
 ;; for use inside sign_extend.
@@ -5423,7 +5432,7 @@ (define_insn_and_split "*lea"
 (define_expand "add3"
   [(set (match_operand:SDWIM 0 "nonimmediate_operand")
(plus:SDWIM (match_operand:SDWIM 1 "nonimmediate_operand")
- 

Re: Various selective scheduling fixes

2016-04-01 Thread Christophe Lyon
On 1 April 2016 at 15:12, Kyrill Tkachov  wrote:
> Hi Christophe, Andrey,
>
>
> On 01/04/16 14:09, Christophe Lyon wrote:
>>
>> On 1 April 2016 at 10:54, Andrey Belevantsev  wrote:
>>>
>>> Hi Christophe,
>>>
>>>
>>> On 01.04.2016 10:33, Christophe Lyon wrote:

 On 31 March 2016 at 16:43, Andrey Belevantsev  wrote:
>
> Hello,
>
> On 14.03.2016 12:10, Andrey Belevantsev wrote:
>>
>>
>> Hello,
>>
>> In this thread I will be posting the patches for the fixed selective
>> scheduling PRs (except the one that was already kindly checked in by
>> Jeff).
>>   The patches were tested both on x86-64 and ia64 with the following
>> combination: 1) the usual bootstrap/regtest, which only utilizes
>> sel-sched
>> on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2)
>> the
>> bootstrap/regtest with the second scheduler forced to sel-sched; 3)
>> both
>> schedulers forced to sel-sched.  In all cases everything seemed to be
>> fine.
>>
>> Three of the PRs are regressions, the other two showed different
>> errors
>> across the variety of releases tested by submitters;  I think all of
>> them
>> are appropriate at this stage -- they do not touch anything outside of
>> selective scheduling except the first patch where a piece of code from
>> sched-deps.c needs to be refactored into a function to be called from
>> sel-sched.c.
>
>
>
> I've backported all regression PRs to gcc-5-branch after testing there
> again
> with selective scheduling force enabled: PRs 64411, 0, 69032,
> 69102.
> The first one was not marked as a regression as such but the test for
> PR
> 70292, which is duplicate, works for me on gcc 5.1 thus making it a
> regression, too.
>
 Hi,

 The backport for pr69102 shows that the new testcase fails to compile
 (ICE)
 when GCC is configured as:

 --target=arm-none-linux-gnueabihf --with-float=hard --with-mode=arm
 --with-cpu=cortex-a15 --with-fpu=neon-vfpv4



 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:
 In function 'foo':


 /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:21:1:
 internal compiler error: Segmentation fault
 0xa64d15 crash_signal
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:383
 0xfa41d7 autopref_multipass_dfa_lookahead_guard(rtx_insn*, int)
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752
 0xa31cd2 invoke_dfa_lookahead_guard
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4212
 0xa31cd2 find_best_expr
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4415
 0xa343fb fill_insns
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5570
 0xa343fb schedule_on_fences
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7395
 0xa36010 sel_sched_region_2
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7533
 0xa36f2a sel_sched_region_1
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7575
 0xa36f2a sel_sched_region(int)
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7676
 0xa37589 run_selective_scheduling()
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7752
 0xa14aed rest_of_handle_sched2
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647
 0xa14aed execute
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791

 See

 http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234625/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt

 Can you have a look?
>>>
>>>
>>> That's because A15 is the only place which enables
>>> autopref_multipass_dfa_lookahead_guard as the DFA lookahead guard hook.
>>> But
>>> autoprefetch modeling doesn't work for selective scheduling, it uses
>>> haifa
>>> structures that are not kept up to date during sel-sched.  So this is not
>>> supposed to work as soon as the param value for prefetcher lookahead
>>> depth
>>> is positive.
>>>
>>> The following patch works for me.  Could you check it with your testing?
>>> If
>>> it works fine for you, I would install the patch both for trunk and
>>> gcc-5.
>>> It would be great to force sel-sched to be enabled, too.  I could do that
>>> but I don't have the hardware or cross-arm target tools at the moment.
>>>
>>>  * haifa-sched.c (autopref_multipass_dfa_lookahead_guard):
>>> Disable
>>> for selective scheduler.
>>>
>> It does work for me, it also fixes the other ICE I reported (on pr69307).
>>
>> But note that both tests pass on trunk.
>
>
> This looks to me like PR rtl-optimization/68236 which I fixed on trunk.
>
You are right: I've just checked that backporting your

Re: [PATCH] Improve CSE (PR rtl-optimization/70467)

2016-04-01 Thread Bernd Schmidt

On 04/01/2016 03:14 PM, Jakub Jelinek wrote:

As the testcase below shows, we can end up with lots of useless
instructions from multi-word arithmetics.
simplify-rtx.c can optimize x {&,|,^}= {0,-1}, but while
x &= 0 or x {|,^}= -1 are optimized into constants and CSE can handle those
fine, we keep x &= -1 and x {|,^}= 0 in the IL until expansion if x
is a MEM.  There are two issues, one is that cse_insn has for a few years
code that wants to prevent partially overlapping MEM->MEM moves,
but actually doesn't realize that fully overlapping MEM->MEM noop moves
are fine.  And the second one is that on most backends, there are no
MEM->MEM move instructions, so we need to delete the useless insns instead,
because it can't match.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.
Is this something we want for 6.x or defer for stage1?


It seems like a stage1 thing to me unless it's a regression. But you're 
in a better position to make that call.



+ /* Similarly, lots of targets don't allow no-op
+(set (mem x) (mem x)) moves.  */
+ else if (n_sets == 1
+  && MEM_P (trial)
+  && MEM_P (dest)
+  && rtx_equal_p (trial, dest)
+  && !side_effects_p (dest)
+  && (cfun->can_delete_dead_exceptions
+  || insn_nothrow_p (insn)))


Looks like this block of code is practically duplicated - I'd prefer a 
helper function set_of_equal_mems_removable_p or something. Ok with that 
change.



Bernd


Re: Fix for PR70498 in Libiberty Demangler

2016-04-01 Thread Marcel Böhme
Hi Bernd,

Thanks for the feedback!

> Patches need to be bootstrapped and regression tested, and patch submissions 
> should include which target this was done on.
> 
> Ideally you'd also want to include testcases along with your patches, 
> although I'm not entirely sure how we can arrange for this type of problem to 
> be tested.
Regression tested on x86_64-pc-linux-gnu (make check). Test cases were added
to libiberty/testsuite/demangler-expected, and I checked that PR70498 is
resolved. I am not sure how to bootstrap the patch.

> Lastly, for this specific patch, I have trouble seeing how it fixes anything. 
> I'd need a more detailed explanation of how the problem happens in the first 
> place.
In the patched version, values wrap around already when they are parsed in
d_number.  Since the mangled string may contain negative numbers, the clients
of d_number usually handle negative values properly.  Without the patch, a
value can become negative when cast from long to int *after* those checks.

For instance, in d_source_name the length of the identifier is parsed as a
long from the mangled string and checked for being negative.  Since
d_identifier takes an int as the length, d_identifier is called with a
negative length after the implicit cast:

static struct demangle_component *
d_source_name (struct d_info *di)
{
  int len;
  struct demangle_component *ret;

  len = d_number (di);
  if (len <= 0)
return NULL;
  ret = d_identifier (di, len);
  di->last_name = ret;
  return ret;
}
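
To spell out the truncation, a tiny standalone example (the length value is
hypothetical, only to show how the long-to-int conversion defeats the check):

#include <stdio.h>

/* On an LP64 host the value is positive as a long, so a "len <= 0" check
   passes, yet it typically becomes INT_MIN once converted to an int
   parameter.  */
int
main (void)
{
  long len = 0x80000000L;  /* 2147483648 */
  int as_int = len;        /* implementation-defined; typically -2147483648 */
  printf ("%ld -> %d\n", len, as_int);
  return 0;
}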

--

Index: libiberty/cp-demangle.c
===
--- libiberty/cp-demangle.c (revision 234663)
+++ libiberty/cp-demangle.c (working copy)
@@ -398,7 +398,7 @@
 struct demangle_component *);

static struct demangle_component *
-d_make_template_param (struct d_info *, long);
+d_make_template_param (struct d_info *, int);

static struct demangle_component *
d_make_sub (struct d_info *, const char *, int);
@@ -421,9 +421,9 @@

static struct demangle_component *d_source_name (struct d_info *);

-static long d_number (struct d_info *);
+static int d_number (struct d_info *);

-static struct demangle_component *d_identifier (struct d_info *, long);
+static struct demangle_component *d_identifier (struct d_info *, int);

static struct demangle_component *d_operator_name (struct d_info *);

@@ -1119,7 +1119,7 @@
/* Add a new template parameter.  */

static struct demangle_component *
-d_make_template_param (struct d_info *di, long i)
+d_make_template_param (struct d_info *di, int i)
{
  struct demangle_component *p;

@@ -1135,7 +1135,7 @@
/* Add a new function parameter.  */

static struct demangle_component *
-d_make_function_param (struct d_info *di, long i)
+d_make_function_param (struct d_info *di, int i)
{
  struct demangle_component *p;

@@ -1620,7 +1620,7 @@
static struct demangle_component *
d_source_name (struct d_info *di)
{
-  long len;
+  int len;
  struct demangle_component *ret;

  len = d_number (di);
@@ -1633,12 +1633,12 @@

/* number ::= [n] <(non-negative decimal integer)>  */

-static long
+static int
d_number (struct d_info *di)
{
  int negative;
  char peek;
-  long ret;
+  int ret;

  negative = 0;
  peek = d_peek_char (di);
@@ -1681,7 +1681,7 @@
/* identifier ::= <(unqualified source code identifier)>  */

static struct demangle_component *
-d_identifier (struct d_info *di, long len)
+d_identifier (struct d_info *di, int len)
{
  const char *name;

@@ -1702,7 +1702,7 @@
  /* Look for something which looks like a gcc encoding of an
 anonymous namespace, and replace it with a more user friendly
 name.  */
-  if (len >= (long) ANONYMOUS_NAMESPACE_PREFIX_LEN + 2
+  if (len >= ANONYMOUS_NAMESPACE_PREFIX_LEN + 2
  && memcmp (name, ANONYMOUS_NAMESPACE_PREFIX,
ANONYMOUS_NAMESPACE_PREFIX_LEN) == 0)
{
@@ -1870,7 +1870,7 @@
{
  struct demangle_component *p = NULL;
  struct demangle_component *next = NULL;
-  long len, i;
+  int len, i;
  char c;
  const char *str;

@@ -2012,7 +2012,7 @@
   case 'C':
 {
   struct demangle_component *derived_type;
-   long offset;
+   int offset;
   struct demangle_component *base_type;

   derived_type = cplus_demangle_type (di);
@@ -2946,10 +2946,10 @@

/*  _ */

-static long
+static int
d_compact_number (struct d_info *di)
{
-  long num;
+  int num;
  if (d_peek_char (di) == '_')
num = 0;
  else if (d_peek_char (di) == 'n')
@@ -2969,7 +2969,7 @@
static struct demangle_component *
d_template_param (struct d_info *di)
{
-  long param;
+  int param;

  if (! d_check_char (di, 'T'))
return NULL;
@@ -3502,7 +3502,7 @@
static int
d_discriminator (struct d_info *di)
{
-  long discrim;
+  int discrim;

  if (d_peek_char (di) != '_')
return 1;
@@ -3558,7 +3558,7 @@
d_unnamed_type (struct d_info *di)
{
  struct demangle_component *ret;
-  long num;
+  int num;

  if (! d_check_char (di, 'U'))
return NULL;
@@ -4086,7 +4086,7 @@

Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote:
>
> Ping^1

I haven't seen a newer version that incorporates my feedback. To recap what
I'd like to see is a more general way to select approximations based on mode. 
I don't believe that looking at the inner mode works in general, and it doesn't
make sense to add internal tune flags for all possible combinations.

To give an idea of what I mean, it would be easiest to add a single field to
the CPU tuning structure that contains a mask for all the combinations.  Then
we call a single function with the approximation kind, i.e. sqrt, rsqrt,
div (x/y) or recip (1/x), and the mode; that function uses the CPU tuning
field to decide whether the approximation should be inlined.
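
A rough sketch of the shape I have in mind (every name here is illustrative
only, not an existing GCC interface):

#include <stdbool.h>

/* Sketch only.  One mask in the per-CPU tuning structure encodes which
   (operation, scalar-or-vector) pairs may use the approximation sequences,
   and a single query function consults it.  */
enum approx_op { APPROX_SQRT, APPROX_RSQRT, APPROX_DIV, APPROX_RECIP };

struct cpu_approx_modes
{
  unsigned int mask;  /* One bit per (operation, scalar-or-vector) pair.  */
};

static bool
use_approx_p (const struct cpu_approx_modes *tune,
	      enum approx_op op, bool vector_p)
{
  unsigned int bit = 1u << (2 * (unsigned int) op + (vector_p ? 1 : 0));
  return (tune->mask & bit) != 0;
}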

Cheers,
Wilco



[RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-01 Thread Prathamesh Kulkarni
Hi,
The attached patch introduces a param, max-lto-partition, which sets an upper
bound on the partition size.

My primary motivation for this patch is to fix building chromium for arm
with -flto-partition=one.
Chromium fails to build with -flto-partition={none, one} with an assembler
"branch out of range" error, because in both these cases LTO creates a single
text section of 18 MB, which exceeds thumb's limit of 16 MB, and the arm
backend emits a short call if caller and callee are in the same section.
This is binutils PR18625:
https://sourceware.org/bugzilla/show_bug.cgi?id=18625
With the patch, chromium builds with -flto-partition=one (by creating more
than one partition, but the minimal number needed to honor the 16 MB limit).
I haven't tested with -flto-partition=none, but I suppose the build will
still fail in that case, because it doesn't involve partitioning at all?  I
am not sure how to fix the none case.

As suggested by Jim in binutils PR18625, the proper fix would be to implement
branch relaxation in arm's port of gas.  However, I suppose only LTO will
realistically create such large sections, and implementing branch relaxation
appears to be quite complicated and probably too much of an effort for this
single use case, so this patch serves as a work-around for the issue.
I am looking into fine-tuning the param value for the ARM backend to roughly
match the 16 MB limit.

AFAIU, this would change the semantics of --param n_lto_partitions (or
-flto-partition=one) from "exactly n_lto_partitions" to "at least
n_lto_partitions".  If that's not desirable, maybe we could add another
param/option?
Cross-tested on arm*-*-*.
Would this patch be OK for stage 1 (after getting the param value right
for the ARM target)?

Thanks,
Prathamesh
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c868490..f734d56 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3459,6 +3459,11 @@ arm_option_override (void)
 
   /* Init initial mode for testing.  */
   thumb_flipper = TARGET_THUMB;
+
+maybe_set_param_value (MAX_PARTITION_SIZE,
+ 1, /* FIXME: fine-tune this value to roughly 
match 16 mb limit.  */
+   global_options.x_param_values,
+  global_options_set.x_param_values);
 }
 
 static void
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 9eb63c2..bc0c612 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -511,9 +511,20 @@ lto_balanced_map (int n_lto_partitions)
   varpool_order.qsort (varpool_node_cmp);
 
   /* Compute partition size and create the first partition.  */
+  if (PARAM_VALUE (MIN_PARTITION_SIZE) > PARAM_VALUE (MAX_PARTITION_SIZE))
+fatal_error (input_location, "min partition size cannot be greater than 
max partition size");
+
   partition_size = total_size / n_lto_partitions;
   if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
 partition_size = PARAM_VALUE (MIN_PARTITION_SIZE);
+  else if (partition_size > PARAM_VALUE (MAX_PARTITION_SIZE))
+{
+  n_lto_partitions = total_size / PARAM_VALUE (MAX_PARTITION_SIZE);
+  if (total_size % PARAM_VALUE (MAX_PARTITION_SIZE))
+   n_lto_partitions++;
+  partition_size = total_size / n_lto_partitions;
+}
+
   npartitions = 1;
   partition = new_partition ("");
   if (symtab->dump_file)
diff --git a/gcc/params.def b/gcc/params.def
index 9362c15..b6055ff 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1029,6 +1029,11 @@ DEFPARAM (MIN_PARTITION_SIZE,
  "Minimal size of a partition for LTO (in estimated instructions).",
  1000, 0, 0)
 
+DEFPARAM (MAX_PARTITION_SIZE,
+ "lto-max-partition",
+ "Maximal size of a partition for LTO (in estimated instructions).",
+ INT_MAX, 0, INT_MAX)
+
 /* Diagnostic parameters.  */
 
 DEFPARAM (CXX_MAX_NAMESPACES_FOR_DIAGNOSTIC_HELP,


ChangeLog
Description: Binary data


Re: Fix for PR70498 in Libiberty Demangler

2016-04-01 Thread Marcel Böhme

> Since d_identifier takes an int as length, d_identifier is called with a 
> negative length after the implicit cast:
Sorry, it is d_make_name, called from d_identifier at cp-demangle.c:1721, that
takes an int as the length.

Best regards
- Marcel

Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote:
On 03/23/16 11:24, Evandro Menezes wrote:
> On 03/17/16 15:09, Evandro Menezes wrote:
>> This patch implements FP division by an approximation using the Newton
>> series.
>>
>> With this patch, DF division is sped up by over 100% and SF division,
>> zilch, both on A57 and on M1.

Mentioning throughput is not useful given that the vectorized single precision
case will give most of the speedup in actual code.

> gcc/
> * config/aarch64/aarch64-tuning-flags.def
> (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}: New tuning macros.
> * config/aarch64/aarch64-protos.h
> (AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
> (aarch64_emit_approx_div): Declare new function.
> * config/aarch64/aarch64.c
> (aarch64_emit_approx_div): Define new function.
> * config/aarch64/aarch64.md ("div3"): New expansion.
> * config/aarch64/aarch64-simd.md ("div3"): Likewise.
>
>
> This version of the patch cleans up the changes to the MD files and
> optimizes the division when the numerator is 1.0.

Adding support for plain recip is good.  Having the enabling logic no longer
in the md file is an improvement, but I don't believe adding tuning flags for
the inner mode is correct - we need a more generic solution like I mentioned
in my other mail.

The division variant should use the same latency reduction trick I mentioned 
for sqrt.

Wilco



Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-04-01 Thread James Greenhalgh
On Fri, Apr 01, 2016 at 02:47:05PM +0100, Wilco Dijkstra wrote:
> Evandro Menezes wrote:
> >
> > Ping^1
> 
> I haven't seen a newer version that incorporates my feedback. To recap what
> I'd like to see is a more general way to select approximations based on mode.
> I don't believe that looking at the inner mode works in general, and it
> doesn't make sense to add internal tune flags for all possible combinations.

Agreed. I don't think that a flag for each of the cartesian product of
{rsqrt,sqrt,div} X {SF,DF,V2SF,V4SF,V2DF} is a scalable solution - that's
at least 15 flags we'll need.

As I said earlier in the discussion, this particular split (between SF and
DF mode) seems strange to me. I'd expect the V4SF vs. SF would also be
interesting, and that a distinction between vector modes and scalar
modes would be more likely to be useful.

> To give an idea what I mean, it would be easiest to add a single field to the
> CPU tuning structure that contains a mask for all the combinations. Then we
> call a single function with approximation kind ie. sqrt, rsqrt, div (x/y),
> recip (1/x) and mode which uses the CPU tuning field to decide whether it
> should be inlined.

I like the idea of a single cost function.

These patches are well and truly on my radar for GCC 7, but as we're still
in bugfixing mode (and there's still plenty to do!), I'm not going to get
round to giving them a more detailed review until after the release. Feel
free to ping them again once GCC 6 has shipped.

Thanks,
James



Re: Proposed Patch for Bug 69687

2016-04-01 Thread Bernd Schmidt

On 03/31/2016 06:56 AM, Marcel Böhme wrote:

Hi Bernd,


Are all the places being patched really problematic ones where an input file 
could realistically cause an overflow, or just the string functions?

The loop in demangle_args allows the patched register* and remember* methods
to be called arbitrarily often, so those should also overflow at some point.
I found a few other segmentation faults in libiberty that I'll report and
patch separately.


I'm concerned about just returning without any kind of error indication. Not 
sure what we should be calling from libiberty, but I was thinking maybe 
xmalloc_failed.

Done. Now, clients of libiberty freeze for about 80 seconds and consume about
3GB of memory before exiting with "out of memory allocating 2147483647 bytes
after a total of 3221147648 bytes".


Might also want to guard against overflow from the first addition.

Done.




Index: libiberty/cplus-dem.c
===
--- libiberty/cplus-dem.c   (revision 234607)
+++ libiberty/cplus-dem.c   (working copy)
@@ -55,6 +55,7 @@ Boston, MA 02110-1301, USA.  */
  void * malloc ();
  void * realloc ();
  #endif
+#include 

  #include 
  #undef CURRENT_DEMANGLING_STYLE


Forgot about this issue, sorry. At least this needs guarding with #ifdef 
HAVE_LIMITS_H, as in the other files in libiberty. Several of them also 
go to trouble to define the macros if limits.h is missing; not sure how 
much of an issue that is nowadays, but you might want to adapt something 
like the code from strtol.c:


#ifndef ULONG_MAX
#define ULONG_MAX   ((unsigned long)(~0L))      /* 0xFFFFFFFF */
#endif

#ifndef LONG_MAX
#define LONG_MAX    ((long)(ULONG_MAX >> 1))    /* 0x7FFFFFFF */
#endif

Mind trying that and doing a full test run as described in my other mail?
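
Concretely, a minimal sketch of the guarded include plus a fallback in that
style (whether the INT_MAX fallback is still needed on modern hosts is the
open question above):

/* Sketch of the suggested guard, following the strtol.c pattern.  */
#ifdef HAVE_LIMITS_H
#include <limits.h>
#endif

#ifndef INT_MAX
#define INT_MAX ((int) (((unsigned int) ~0U) >> 1))   /* 0x7FFFFFFF */
#endif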


Bernd


Re: Fix for PR70498 in Libiberty Demangler

2016-04-01 Thread Bernd Schmidt

On 04/01/2016 03:39 PM, Marcel Böhme wrote:

Hi Bernd,

Thanks for the feedback!


Patches need to be bootstrapped and regression tested, and patch
submissions should include which target this was done on.

Ideally you'd also want to include testcases along with your
patches, although I'm not entirely sure how we can arrange for this
type of problem to be tested.

Regression tested on x86_64-pc-linux-gnu (make check). Test cases
added to libiberty/testsuite/demangler-expected and checked PR70498
is resolved. Not sure how to bootstrap the patch.


You configure gcc normally and build it - that should automatically 
bootstrap, unless you're cross-compiling. You'll have stage1-* and 
stage2-* directories at the end if that worked. You should then run the 
testsuite on the bootstrapped compiler.



Lastly, for this specific patch, I have trouble seeing how it fixes
anything. I'd need a more detailed explanation of how the problem
happens in the first place.

In the patched version, the values wrap around when they are parsed
in d_number. Since the mangled string may contain negative numbers,
there is usually proper handling of negative numbers in the clients
of d_number. Without the patch a value can become negative when cast
from long to int *after* these checks.

For instance, in d_source_name the length of the identifier is parsed
as long from the mangled string and checked whether it is negative.
Since d_identifier takes an int as length, d_identifier is called
with a negative length after the implicit cast:


Ok, I think I see it. Guess I'll queue this up and commit it for you in 
the next few days.



Bernd




[patch] Fortran fix for PR70289

2016-04-01 Thread Cesar Philippidis
The bug in PR70289 is an assertion failure triggered by a static
variable used inside an offloaded acc region which doesn't have a data
clause associated with it. Basically, that static variable ends up in a
different lto partition, which was not streamed to the offloaded
compiler. I'm not sure if we should try to replicate the static storage
in the offloaded regions, but it probably doesn't make sense in a
parallel environment anyway.

My solution to this problem was to teach the fortran front end to create
a data clause for each reduction variable it encounters. Furthermore,
I've decided to update the semantics of the acc parallel reduction
clause such that gfortran will emit an error when a reduction variable
is private or firstprivate (note that an acc loop reduction still works
with private and firstprivate reductions, just not acc parallel
reductions). The second change is to emit a warning when an incompatible
data clause is used with a reduction, and promote that data clause to a
present_or_copy. My rationale behind the promotion is that you cannot have a
copyin reduction variable, because the original variable on the host will
not be updated. Similarly, you cannot have a copyout reduction variable
because the reduction operator is supposed to combine the results of the
reduction with the original reduction variable, but in copyout the
original variable is not initialized on the accelerator. But perhaps the
copyout rule is too strict?

Tom and I are still working with the OpenACC technical committee to get
some clarification on how the reduction value should behave with respect
to data movement. In the meantime, I wanted to see if this type of
solution would be appropriate for trunk. I was trying to get this to
work in the gimplifier so that one patch could resolve the problem for
all of the front ends, but that was happening too late. Especially for
reference types.

By the way, we will also benefit from this patch too
. If not for
these reduction, but for global acc variables which haven't been
properly declared.

Cesar
2016-03-31  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (resolve_oacc_private_reductions): New function.
	(resolve_omp_clauses): Ensure that acc parallel reductions
	have a copy, pcopy or present clause associated with it,
	otherwise emit a warning or error as appropriate.

	gcc/testsuite/
	* gfortran.dg/goacc/reduction-promotions.f90: New test.
	* gfortran.dg/goacc/reduction.f95:

	libgomp/
	* testsuite/libgomp.oacc-fortran/pr70289.f90: New test.


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index a6c39cd..e59997f 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -3146,6 +3146,46 @@ resolve_omp_udr_clause (gfc_omp_namelist *n, gfc_namespace *ns,
 /* OpenMP directive resolving routines.  */
 
 static void
+resolve_oacc_private_reductions (gfc_omp_clauses *omp_clauses, int list)
+{
+  gfc_omp_namelist *n;
+
+  /* Check for bogus private reductions.  */
+  for (n = omp_clauses->lists[list]; n; n = n->next)
+n->sym->mark = 1;
+
+  for (n = omp_clauses->lists[OMP_LIST_REDUCTION]; n; n = n->next)
+if (n->sym->mark)
+  {
+	gfc_omp_namelist *prev = NULL, *tmp = NULL;
+
+	/* Remove it from the list of private clauses.  */
+	tmp = omp_clauses->lists[list];
+	while (tmp)
+	  {
+	if (tmp->sym == n->sym)
+	  {
+		gfc_error ("Reduction symbol %qs cannot be private "
+			   "at %L", tmp->sym->name, &tmp->where);
+
+		if (omp_clauses->lists[list] == tmp)
+		  omp_clauses->lists[list] = tmp->next;
+		else
+		  prev->next = tmp->next;
+		break;
+	  }
+	else
+	  {
+		prev = tmp;
+		tmp = tmp->next;
+	  }
+	  }
+
+	n->sym->mark = 0;
+  }
+}
+
+static void
 resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 		 gfc_namespace *ns, bool openacc = false)
 {
@@ -3320,6 +3360,50 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses *omp_clauses,
 	gfc_error ("Array %qs is not permitted in reduction at %L",
 		   n->sym->name, &n->where);
 	}
+
+  /* Parallel reductions have their own set of rules.  */
+  if (code->op == EXEC_OACC_PARALLEL)
+	{
+	  for (n = omp_clauses->lists[OMP_LIST_REDUCTION]; n; n = n->next)
+	n->sym->mark = 0;
+
+	  resolve_oacc_private_reductions (omp_clauses, OMP_LIST_PRIVATE);
+	  resolve_oacc_private_reductions (omp_clauses, OMP_LIST_FIRSTPRIVATE);
+
+	  /* Check for data maps which aren't copy or present_or_copy.  */
+	  for (n = omp_clauses->lists[OMP_LIST_MAP]; n; n = n->next)
+	{
+	  if (n->sym->mark == 0
+		  && !(n->u.map_op == OMP_MAP_TOFROM
+		   || n->u.map_op == OMP_MAP_FORCE_TOFROM
+		   || n->u.map_op == OMP_MAP_FORCE_PRESENT))
+		{
+		  gfc_warning (0, "incompatible data clause associated with "
+			   "symbol %qs; promoting this clause to "
+			   "present_or_copy at %L", n->sym->name,
+			   &n->where);
+		  n->u.map_op = OMP_MAP_TOFROM;
+		}
+	  n->sym->mark = 1;
+	}

Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-04-01 Thread Evandro Menezes

On 04/01/16 08:47, Wilco Dijkstra wrote:

Evandro Menezes wrote:

Ping^1

I haven't seen a newer version that incorporates my feedback. To recap what
I'd like to see is a more general way to select approximations based on mode.
I don't believe that looking at the inner mode works in general, and it doesn't
make sense to add internal tune flags for all possible combinations.

To give an idea what I mean, it would be easiest to add a single field to the 
CPU
tuning structure that contains a mask for all the combinations. Then we call a
single function with approximation kind ie. sqrt, rsqrt, div (x/y), recip (1/x) 
and
mode which uses the CPU tuning field to decide whether it should be inlined.


I think that I have a better idea of what you meant.

Thank you,

--
Evandro Menezes



Re: [PATCH] Improve CSE (PR rtl-optimization/70467)

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 03:35:19PM +0200, Bernd Schmidt wrote:
> On 04/01/2016 03:14 PM, Jakub Jelinek wrote:
> >As the testcase below shows, we can end up with lots of useless
> >instructions from multi-word arithmetics.
> >simplify-rtx.c can optimize x {&,|,^}= {0,-1}, but while
> >x &= 0 or x {|,^}= -1 are optimized into constants and CSE can handle those
> >fine, we keep x &= -1 and x {|,^}= 0 in the IL until expansion if x
> >is a MEM.  There are two issues, one is that cse_insn has for a few years
> >code that wants to prevent partially overlapping MEM->MEM moves,
> >but actually doesn't realize that fully overlapping MEM->MEM noop moves
> >are fine.  And the second one is that on most backends, there are no
> >MEM->MEM move instructions, so we need to delete the useless insns instead,
> >because it can't match.
> >
> >Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.
> >Is this something we want for 6.x or defer for stage1?
> 
> It seems like a stage1 thing to me unless it's a regression. But you're in a
> better position to make that call.

I guess it can wait for stage1.

> >+  /* Similarly, lots of targets don't allow no-op
> >+ (set (mem x) (mem x)) moves.  */
> >+  else if (n_sets == 1
> >+   && MEM_P (trial)
> >+   && MEM_P (dest)
> >+   && rtx_equal_p (trial, dest)
> >+   && !side_effects_p (dest)
> >+   && (cfun->can_delete_dead_exceptions
> >+   || insn_nothrow_p (insn)))
> 
> Looks like this block of code is practically duplicated - I'd prefer a
> helper function set_of_equal_mems_removable_p or something. Ok with that
> change.

Perhaps instead just set a bool in the second hunk and just test that at the
third hunk's condition?

Jakub


[gomp4] Merge trunk r234572 (2016-03-30) into gomp-4_0-branch

2016-04-01 Thread Thomas Schwinge
Hi!

Committed to gomp-4_0-branch in r234674:

commit e7e9a6012dca0546a8b1cd16bf51acdd85ec7654
Merge: 9973610 ef4f1cb
Author: tschwinge 
Date:   Fri Apr 1 14:23:38 2016 +

svn merge -r 234471:234572 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@234674 
138bc75d-0d04-0410-961f-82ee72b054a4


Regards
 Thomas


Re: [AArch64] Add precision choices for the reciprocal square root approximation

2016-04-01 Thread Evandro Menezes

On 04/01/16 09:06, James Greenhalgh wrote:

On Fri, Apr 01, 2016 at 02:47:05PM +0100, Wilco Dijkstra wrote:

Evandro Menezes wrote:

Ping^1

I haven't seen a newer version that incorporates my feedback. To recap what
I'd like to see is a more general way to select approximations based on mode.
I don't believe that looking at the inner mode works in general, and it
doesn't make sense to add internal tune flags for all possible combinations.

Agreed. I don't think that a flag for each of the cartesian product of
{rsqrt,sqrt,div} X {SF,DF,V2SF,V4SF,V2DF} is a scalable solution - that's
at least 15 flags we'll need.

As I said earlier in the discussion, this particular split (between SF and
DF mode) seems strange to me. I'd expect the V4SF vs. SF would also be
interesting, and that a distinction between vector modes and scalar
modes would be more likely to be useful.


To give an idea what I mean, it would be easiest to add a single field to the
CPU tuning structure that contains a mask for all the combinations. Then we
call a single function with approximation kind ie. sqrt, rsqrt, div (x/y),
recip (1/x) and mode which uses the CPU tuning field to decide whether it
should be inlined.

I like the idea of a single cost function.


I'll go with it.


These patches are well and truly on my radar for GCC 7, but as we're still
in bugfixing mode (and there's still plenty to do!), I'm not going to get
round to giving them a more detailed review until after the release. Feel
free to ping them again once GCC 6 has shipped.


I've been proposing this change for a few months now and I'd really like
to have it in 6.  I'd appreciate it if you'd consider this request, all
things considered.


Thank you,

--
Evandro Menezes



Re: [patch] Fortran fix for PR70289

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 07:49:16AM -0700, Cesar Philippidis wrote:
> The bug in PR70289 is an assertion failure triggered by a static
> variable used inside an offloaded acc region which doesn't have a data
> clause associated with it. Basically, that static variable ends up in a
> different lto partition, which was not streamed to the offloaded
> compiler. I'm not sure if we should try to replicate the static storage
> in the offloaded regions, but it probably doesn't make sense in a
> parallel environment anyway.

Is this really Fortran specific?  I'd expect the diagnostics to be in
gimplify.c and handle it for all 3 FEs...

Jakub


Re: [PATCH] Improve CSE (PR rtl-optimization/70467)

2016-04-01 Thread Bernd Schmidt

On 04/01/2016 04:51 PM, Jakub Jelinek wrote:

On Fri, Apr 01, 2016 at 03:35:19PM +0200, Bernd Schmidt wrote:

On 04/01/2016 03:14 PM, Jakub Jelinek wrote:

As the testcase below shows, we can end up with lots of useless
instructions from multi-word arithmetics.
simplify-rtx.c can optimize x {&,|,^}= {0,-1}, but while
x &= 0 or x {|,^}= -1 are optimized into constants and CSE can handle those
fine, we keep x &= -1 and x {|,^}= 0 in the IL until expansion if x
is a MEM.  There are two issues, one is that cse_insn has for a few years
code that wants to prevent partially overlapping MEM->MEM moves,
but actually doesn't realize that fully overlapping MEM->MEM noop moves
are fine.  And the second one is that on most backends, there are no
MEM->MEM move instructions, so we need to delete the useless insns instead,
because it can't match.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux.
Is this something we want for 6.x or defer for stage1?


It seems like a stage1 thing to me unless it's a regression. But you're in a
better position to make that call.


I guess it can wait for stage1.


+ /* Similarly, lots of targets don't allow no-op
+(set (mem x) (mem x)) moves.  */
+ else if (n_sets == 1
+  && MEM_P (trial)
+  && MEM_P (dest)
+  && rtx_equal_p (trial, dest)
+  && !side_effects_p (dest)
+  && (cfun->can_delete_dead_exceptions
+  || insn_nothrow_p (insn)))


Looks like this block of code is practically duplicated - I'd prefer a
helper function set_of_equal_mems_removable_p or something. Ok with that
change.


Perhaps instead just set a bool in the second hunk and just test that at the
third hunk's condition?


Also works for me. Or maybe set trial and dest to pc_rtx and merge the 
new case with the preexisting one. As long as we don't get a big block 
of duplicated conditions.



Bernd


[Patch ARM] Fix PR 70496 (unified assembler rewrite fall-out)

2016-04-01 Thread Ramana Radhakrishnan
Hi,

While doing the unified asm rewrite I inadvertently changed the meaning of
ASM_APP_OFF, which causes failures when folks who know what they are doing
switch between arm and thumb states within a function.  The intent of the
unified asm rewrite was not to affect any inline assembler code: inline asm
remains in divided syntax by default, and the compiler switches back to
unified asm for normally compiled code.

Thanks to Jim Wilson for pointing it out on the linaro toolchain list and
sorry about the breakage.

Tested arm-none-eabi cross with both arm and thumb multilibs and applied to 
trunk.

regards
Ramana

2016-04-01  Ramana Radhakrishnan  

PR target/70496
* config/arm/arm.h (ASM_APP_OFF): Handle TARGET_ARM
and TARGET_THUMB.

2016-04-01  Ramana Radhakrishnan  

PR target/70496
* gcc.target/arm/pr70496.c: New test.




commit 767f1ecfa04beab3c01db38999bf2cfe170c2e93
Author: Ramana Radhakrishnan 
Date:   Fri Apr 1 11:14:59 2016 +0100

Fix PR70496

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 6352140..ad123dd 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -2037,7 +2037,8 @@ extern int making_const_table;
"\t.syntax divided\n")
 
 #undef  ASM_APP_OFF
-#define ASM_APP_OFF "\t.syntax unified\n"
+#define ASM_APP_OFF (TARGET_ARM ? "\t.arm\n\t.syntax unified\n" : \
+"\t.thumb\n\t.syntax unified\n")
 
 /* Output a push or a pop instruction (only used when profiling).
We can't push STATIC_CHAIN_REGNUM (r12) directly with Thumb-1.  We know
diff --git a/gcc/testsuite/gcc.target/arm/pr70496.c 
b/gcc/testsuite/gcc.target/arm/pr70496.c
new file mode 100644
index 000..89957e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr70496.c
@@ -0,0 +1,12 @@
+/* { dg-do assemble } */
+/* { dg-options "-mthumb -O2" } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+
+int i;
+void
+main (void)
+{
+  __asm__ volatile (".arm");
+  i = 0;
+  __asm__ volatile ("\n cbz r0, 2f\n2:");
+}


[C PATCH] Fix ICE with VEC_COND_EXPR and compound literals (PR c/70307)

2016-04-01 Thread Marek Polacek
This is another case where a C_MAYBE_CONST_EXPR leaks into the gimplifier.
Starting with r229128, and thus the introduction of build_vec_cmp, we now
create a VEC_COND_EXPR when building a vector comparison.  The
C_MAYBE_CONST_EXPR originated in build_compound_literal when creating a
COMPOUND_LITERAL_EXPR.  This is then made part of op0 of a VEC_COND_EXPR.
But c_fully_fold_internal doesn't know what to do with VEC_COND_EXPRs, so the
C_MAYBE_CONST_EXPR went unnoticed into fold, oops.  The fix here is to teach
c_fully_fold_internal how to handle VEC_COND_EXPRs, which is what this patch
attempts to do.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-01  Marek Polacek  

PR c/70307
* c-fold.c (c_fully_fold_internal): Handle VEC_COND_EXPR.

* gcc.dg/torture/pr70307.c: New test.

diff --git gcc/c/c-fold.c gcc/c/c-fold.c
index f07917f..d512824 100644
--- gcc/c/c-fold.c
+++ gcc/c/c-fold.c
@@ -528,6 +528,23 @@ c_fully_fold_internal (tree expr, bool in_init, bool 
*maybe_const_operands,
*maybe_const_itself &= op2_const_self;
   goto out;
 
+case VEC_COND_EXPR:
+  orig_op0 = op0 = TREE_OPERAND (expr, 0);
+  op1 = TREE_OPERAND (expr, 1);
+  op2 = TREE_OPERAND (expr, 2);
+  op0 = c_fully_fold_internal (op0, in_init, maybe_const_operands,
+  maybe_const_itself, for_int_const);
+  STRIP_TYPE_NOPS (op0);
+
+  /* OP1 will be a vector of -1 and OP2 a vector of 0, as created in
+build_vec_cmp -- no need to fold them.  */
+
+  if (op0 != orig_op0)
+   ret = fold_build3_loc (loc, code, TREE_TYPE (expr), op0, op1, op2);
+  else
+   ret = fold (expr);
+  goto out;
+
 case EXCESS_PRECISION_EXPR:
   /* Each case where an operand with excess precision may be
 encountered must remove the EXCESS_PRECISION_EXPR around
diff --git gcc/testsuite/gcc.dg/torture/pr70307.c 
gcc/testsuite/gcc.dg/torture/pr70307.c
index e69de29..d47c4b6 100644
--- gcc/testsuite/gcc.dg/torture/pr70307.c
+++ gcc/testsuite/gcc.dg/torture/pr70307.c
@@ -0,0 +1,62 @@
+/* PR c/70307 */
+/* { dg-do compile } */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si foo (v4si);
+
+v4si
+fn1 (int i)
+{
+  return i <= (v4si){(0, 0)};
+}
+
+v4si
+fn2 (int i)
+{
+  v4si r;
+  r = i <= (v4si){(0, 0)};
+  return r;
+}
+
+v4si
+fn3 (int i)
+{
+  return foo (i <= (v4si){(0, 0)});
+}
+
+v4si
+fn4 (int i)
+{
+  struct S { v4si v; };
+  struct S s = { .v = i <= (v4si){(0, 0)} };
+  return s.v;
+}
+
+v4si
+fn5 (int i)
+{
+  return (v4si){(1, i++)} == (v4si){(0, 0)};
+}
+
+v4si
+fn6 (int i)
+{
+  v4si r;
+  r = (v4si){(1, i++)} == (v4si){(0, 0)};
+  return r;
+}
+
+v4si
+fn7 (int i)
+{
+  return foo ((v4si){(1, i++)} == (v4si){(0, 0)});
+}
+
+v4si
+fn8 (int i)
+{
+  struct S { v4si v; };
+  struct S s = { .v = (v4si){(1, i++)} == (v4si){(0, 0)} };
+  return s.v;
+}

Marek


Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 11:44:16AM +0200, Richard Biener wrote:
> On Fri, 1 Apr 2016, Jakub Jelinek wrote:
> 
> > On Fri, Apr 01, 2016 at 11:08:09AM +0200, Richard Biener wrote:
> > > 
> > > RTL DSE uses true_dependence to see whether a store may be killed by
> > > anothe store - that's obviously broken.  The following patch makes
> > > it use output_dependence instead (introducing a canon_ variant of that).
> > 
> > I think it would be interesting to see some stats on what effect does this
> > have on the optimization RTL DSE is doing (say gather during
> > unpatched bootstrap/regtest number of successfully optimized replace_read
> > calls, and the same with patched bootstrap/regtest).

So, I've gathered the stats, with:
--- gcc/dse.c.jj2016-03-02 10:47:25.0 +0100
+++ gcc/dse.c   2016-04-01 15:01:18.831249250 +0200
@@ -2047,6 +2047,11 @@ replace_read (store_info *store_info, in
  print_simple_rtl (dump_file, read_reg);
  fprintf (dump_file, "\n");
}
+{
+FILE *f = fopen ("/tmp/dse2", "a");
+fprintf (f, "%d %s %s\n", (int) BITS_PER_WORD, main_input_filename ? 
main_input_filename : "-", current_function_name ());
+fclose (f);
+}
   return true;
 }
   else
with my usual pair of rtl,yes checking bootstraps/regtests (x86_64-linux
and i686-linux, the former with ada, the latter without), both without your
patch and with the patch.
Without the patch I got 66555 successful replace_reads, with your patch
only 65971, that is a 1% difference.  I guess that is an acceptable level,
though for GCC 7 we should try to improve on that (either at the GIMPLE
level, or by doing something better at the RTL level).

A few randomly chosen cases where we don't optimize this anymore:
32 ../../gcc/tree-ssa-structalias.c variable_info* 
create_variable_info_for_1(tree, const char*, bool, bool, bitmap)
32 /home/jakub/src/gcc/gcc/testsuite/gcc.c-torture/compile/pr42237.c foo
32 /home/jakub/src/gcc/gcc/testsuite/gcc.dg/pr62167.c main
32 /home/jakub/src/gcc/gcc/testsuite/gcc.target/i386/avx-vdppd-2.c do_test
32 /home/jakub/src/gcc/libstdc++-v3/testsuite/20_util/shared_ptr/cons/58659.cc 
std::__shared_ptr<_Tp, _Lp>::__shared_ptr(std::unique_ptr<_Up, _Ep>&&) [with 
_Tp1 = X; _Del = std::default_delete;  = void; _Tp = 
X; __gnu_cxx::_Lock_policy _Lp = (__gnu_cxx::_Lock_policy)2u]
32 
/home/jakub/src/gcc/libstdc++-v3/testsuite/28_regex/algorithms/regex_match/ecma/char/68863.cc
 
std::__detail::_Compiler<_TraitsT>::_Compiler(std::__detail::_Compiler<_TraitsT>::_IterT,
 std::__detail::_Compiler<_TraitsT>::_IterT, const typename 
_TraitsT::locale_type&, std::__detail::_Compiler<_TraitsT>::_FlagT) [with 
_TraitsT = std::__cxx11::regex_traits]
32 ../../../libgo/go/crypto/x509/cert_pool.go x509.CreateCertificateRequest
64 ../../gcc/gimple-streamer-in.c void input_bb(lto_input_block*, LTO_tags, 
data_in*, function*, int)
64 /home/jakub/src/gcc/gcc/ada/switch-m.adb 
Switch.M.Normalize_Compiler_Switches.Add_Switch_Component
64 /home/jakub/src/gcc/gcc/testsuite/gcc.target/i386/sse4_1-dppd-2.c do_test
64 
/home/jakub/src/gcc/libstdc++-v3/testsuite/23_containers/unordered_map/allocator/copy_assign.cc
 void std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, 
_Hash, _RehashPolicy, _Traits>::_M_assign(const std::_Hashtable<_Key, _Value, 
_Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>&, const 
_NodeGenerator&) [with _NodeGenerator = std::_Hashtable<_Key, _Value, _Alloc, 
_ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::operator=(const 
std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, 
_RehashPolicy, _Traits>&) [with _Key = T; _Value = std::pair; 
_Alloc = __gnu_test::propagating_allocator; _ExtractKey = 
std::__detail::_Select1st; _Equal = equal_to; _H1 = hash; _H2 = 
std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; 
_RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits = 
std::__detail::_Hashtable_traits]::; _Key = T; _Value = std::pair; _Alloc = 
__gnu_test::propagating_allocator; _ExtractKey = 
std::__detail::_Select1st; _Equal = equal_to; _H1 = hash; _H2 = 
std::__detail::_Mod_range_hashing; _Hash = std::__detail::_Default_ranged_hash; 
_RehashPolicy = std::__detail::_Prime_rehash_policy; _Traits = 
std::__detail::_Hashtable_traits]

Jakub


Re: [patch] Fortran fix for PR70289

2016-04-01 Thread Cesar Philippidis
On 04/01/2016 07:56 AM, Jakub Jelinek wrote:
> On Fri, Apr 01, 2016 at 07:49:16AM -0700, Cesar Philippidis wrote:
>> The bug in PR70289 is an assertion failure triggered by a static
>> variable used inside an offloaded acc region which doesn't have a data
>> clause associated with it. Basically, that static variable ends up in a
>> different lto partition, which was not streamed to the offloaded
>> compiler. I'm not sure if we should try to replicate the static storage
>> in the offloaded regions, but it probably doesn't make sense in a
>> parallel environment anyway.
> 
> Is this really Fortran specific?  I'd expect the diagnostics to be in
> gimplify.c and handle it for all 3 FEs...

By the time the variable reaches the gimplifier, the reduction variable
may no longer match the ones inside the data clause. E.g. consider this
directive inside a fortran subroutine:

  !$acc parallel copyout(temp) reduction(+:temp)

The gimplifier would see something like:

  map(force_from:*temp.2 [len: 4]) map(alloc:temp [pointer assign, bias:
0]) reduction(+:temp)

At this point, unless I'm mistaken, it would be difficult to tell if
temp.2 is a pointer to the same temp in the reduction. Maybe I'm missing
something?

Cesar



[C++ PATCH] Fix ICE in warn_placement_new_too_small (PR c++/70488)

2016-04-01 Thread Jakub Jelinek
Hi!

The new warn_placement_new_too_small function blindly assumes that
if {DECL,TYPE}_SIZE_UNIT is non-NULL, then it must be INTEGER_CST
that fits into uhwi.  That is not the case, it could be a VLA, etc.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

OT, as I said in bugzilla, I see very questionable code in that function
too, no idea what Martin meant with that:
  while (TREE_CODE (oper) == COMPONENT_REF)
{
  tree op0 = oper;
  while (TREE_CODE (op0 = TREE_OPERAND (op0, 0)) == COMPONENT_REF);
  if (TREE_CODE (op0) == VAR_DECL)
var_decl = op0;
  oper = TREE_OPERAND (oper, 1);
}
TREE_OPERAND (oper, 1) of a COMPONENT_REF should be always a FIELD_DECL,
so this will never loop.  Did you mean to use if instead of while, something
different?

2016-04-01  Jakub Jelinek  
Marek Polacek  

PR c++/70488
* init.c (warn_placement_new_too_small): Test whether
DECL_SIZE_UNIT or TYPE_SIZE_UNIT are integers that fit into uhwi.

* g++.dg/init/new47.C: New test.

--- gcc/cp/init.c.jj2016-03-31 10:55:58.0 +0200
+++ gcc/cp/init.c   2016-04-01 14:23:25.977800499 +0200
@@ -2430,7 +2430,8 @@ warn_placement_new_too_small (tree type,
 though the size of a member of a union may be viewed as extending
 to the end of the union itself (it is by __builtin_object_size).  */
   if ((TREE_CODE (oper) == VAR_DECL || use_obj_size)
- && DECL_SIZE_UNIT (oper))
+ && DECL_SIZE_UNIT (oper)
+ && tree_fits_uhwi_p (DECL_SIZE_UNIT (oper)))
{
  /* Use the size of the entire array object when the expression
 refers to a variable or its size depends on an expression
@@ -2438,7 +2439,8 @@ warn_placement_new_too_small (tree type,
  bytes_avail = tree_to_uhwi (DECL_SIZE_UNIT (oper));
  exact_size = !use_obj_size;
}
-  else if (TYPE_SIZE_UNIT (TREE_TYPE (oper)))
+  else if (TYPE_SIZE_UNIT (TREE_TYPE (oper))
+  && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (oper))))
{
  /* Use the size of the type of the destination buffer object
 as the optimistic estimate of the available space in it.  */
--- gcc/testsuite/g++.dg/init/new47.C.jj	2016-04-01 14:36:20.516355623 +0200
+++ gcc/testsuite/g++.dg/init/new47.C   2016-04-01 14:36:34.162171718 +0200
@@ -0,0 +1,19 @@
+// PR c++/70448
+// { dg-do compile }
+// { dg-options "-Wall" }
+
+typedef __typeof__ (sizeof 0) size_t;
+void *operator new (size_t, void *p) { return p; }
+void *operator new[] (size_t, void *p) { return p; }
+struct S { size_t s; };
+void bar (S *);
+
+void
+foo (unsigned int s)
+{
+  char t[sizeof (S) + s];
+  S *f = new (t) S;
+  bar (f);
+  f = new (t) S[1];
+  bar (f);
+}

Jakub


Re: [patch] Fortran fix for PR70289

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 08:07:24AM -0700, Cesar Philippidis wrote:
> On 04/01/2016 07:56 AM, Jakub Jelinek wrote:
> > On Fri, Apr 01, 2016 at 07:49:16AM -0700, Cesar Philippidis wrote:
> >> The bug in PR70289 is an assertion failure triggered by a static
> >> variable used inside an offloaded acc region which doesn't have a data
> >> clause associated with it. Basically, that static variable ends up in a
> >> different lto partition, which was not streamed to the offloaded
> >> compiler. I'm not sure if we should try to replicate the static storage
> >> in the offloaded regions, but it probably doesn't make sense in a
> >> parallel environment anyway.
> > 
> > Is this really Fortran specific?  I'd expect the diagnostics to be in
> > gimplify.c and handle it for all 3 FEs...
> 
> By the time the variable reaches the gimplifier, the reduction variable
> may no longer match the ones inside the data clause. E.g. consider this
> directive inside a fortran subroutine:
> 
>   !$acc parallel copyout(temp) reduction(+:temp)
> 
> The gimplifier would see something like:
> 
>   map(force_from:*temp.2 [len: 4]) map(alloc:temp [pointer assign, bias:
> 0]) reduction(+:temp)
> 
> At this point, unless I'm mistaken, it would be difficult to tell if
> temp.2 is a pointer to the same temp in the reduction. Maybe I'm missing
> something?

All the info is still there, and this wouldn't be the only case where
we rely on exact clause ordering.  I think that is still much better than
doing it in all the FEs.

Jakub


Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-01 Thread Bernd Schmidt

On 04/01/2016 05:05 PM, Jakub Jelinek wrote:

with my usual pair of rtl,yes checking bootstraps/regtests (x86_64-linux
and i686-linux, former one with ada, latter without), both without your
patch and with the patch.
Without the patch got 66555 successful replace_reads, with your patch
only 65971, that is 1% difference.  I guess that is acceptable level,
though for GCC 7 we should try to improve that (either at the GIMPLE level,
or do something better at the RTL level).


I was also running before/after tests on my set of .i files, and so far 
the effect seems negligible.



Bernd



Re: [PATCH] Prevent loops from being optimized away

2016-04-01 Thread Segher Boessenkool
On Fri, Apr 01, 2016 at 09:36:49AM +0200, Richard Biener wrote:
> On Fri, Apr 1, 2016 at 6:54 AM, Segher Boessenkool
>  wrote:
> > Sometimes people write loops that they do not want optimized away, even
> > when the compiler can replace those loops by a simple expression (or
> > nothing).  For such people, this patch adds a compiler option.
> >
> > Bootstrapped on powerpc64-linux; regression check still in progress
> > (with Init(1) to actually test anything).
> 
> -fno-tree-scev-cprop?  -O0?

There are other cases where GCC can delete loops, for example cddce1.
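For instance (an illustrative sketch, not from the original mail), a
side-effect-free counted loop like the following is deleted entirely at -O2:

  int
  wait_a_bit (void)
  {
    for (int i = 0; i < 1000000; i++)
      ;                       /* empty body, nothing observable */
    return 0;                 /* the loop above is simply removed */
  }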

> A new compiler option for this is complete overkill (and it's implementation
> is gross ;)).  Semantics are also unclear, your patch would only make sure
> to preserve an empty loop with the asm in the latch, it wouldn't disallow
> replacing the overall effect with a computation.

That's right, and the loop can even still be unrolled, even fully
unrolled (which is good, not only should we not desert the loop but
we also shouldn't run around so much).

> Your patch would also miss a few testcases.

It already makes ~2000 (mainly vectorisation) testcases fail, is that
not enough coverage?  :-)

Cheers,


Segher


Re: [C++ PATCH] Fix ICE in warn_placement_new_too_small (PR c++/70488)

2016-04-01 Thread Jeff Law

On 04/01/2016 09:11 AM, Jakub Jelinek wrote:

Hi!

The new warn_placement_new_too_small function blindly assumes that
if {DECL,TYPE}_SIZE_UNIT is non-NULL, then it must be INTEGER_CST
that fits into uhwi.  That is not the case, it could be a VLA, etc.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

OT, as I said in bugzilla, I see very questionable code in that function
too, no idea what Martin meant with that:
   while (TREE_CODE (oper) == COMPONENT_REF)
 {
   tree op0 = oper;
   while (TREE_CODE (op0 = TREE_OPERAND (op0, 0)) == COMPONENT_REF);
   if (TREE_CODE (op0) == VAR_DECL)
 var_decl = op0;
   oper = TREE_OPERAND (oper, 1);
 }
TREE_OPERAND (oper, 1) of a COMPONENT_REF should be always a FIELD_DECL,
so this will never loop.  Did you mean to use if instead of while, something
different?

Or perhaps TREE_OPERAND (oper, 0)?



2016-04-01  Jakub Jelinek  
Marek Polacek  

PR c++/70488
* init.c (warn_placement_new_too_small): Test whether
DECL_SIZE_UNIT or TYPE_SIZE_UNIT are integers that fit into uhwi.

* g++.dg/init/new47.C: New test.

OK.
jeff



[c++/55635] operator delete and throwing dtors

2016-04-01 Thread Nathan Sidwell

this fixes bug 55635 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55635

Somewhat surprisingly, a delete expression should always call the operator 
delete function, regardless of whether the object destructor(s) throw.  See this 
note at the end of 5.3.5/7:


[Note: The deallocation function is called regardless of whether the destructor 
for the object or some element of the array throws an exception. — end note ]


The fix is relatively simple -- use a TRY_FINALLY_EXPR, rather than a 
COMPOUND_EXPR to glue the destruction and operator delete calls together.


While there I noticed that build_delete_destructor_body was unnecessarily 
emitting cdtor_label -- that looks like a bit of copy&paste when that 
functionality was broken out of finish_destructor_body (from  whence I removed 
the no longer active operator delete call yesterday).


Mostly, destructors are of course nothrow, particularly now that DR1123 is 
implemented.  Thus usually the exception path gets DCE'd.


Fixed thusly, ok?

nathan
2016-04-01  Nathan Sidwell  

	PR c++/55635
	* init.c (build_vec_delete_1): Protect operator delete call in try
	finally.
	(build_delete): Likewise.
	* optimize.c (build_delete_destructor_body): Likewise.

	PR c++/55635
	* g++.dg/eh/delete1.C: New.

Index: cp/init.c
===
--- cp/init.c	(revision 234668)
+++ cp/init.c	(working copy)
@@ -3671,7 +3671,9 @@ build_vec_delete_1 (tree base, tree maxi
   else if (!body)
 body = deallocate_expr;
   else
-body = build_compound_expr (input_location, body, deallocate_expr);
+/* The delete operator must be called, even if a destructor
+   throws.  */
+body = build2 (TRY_FINALLY_EXPR, void_type_node, body, deallocate_expr);
 
   if (!body)
 body = integer_zero_node;
@@ -4508,7 +4510,13 @@ build_delete (tree otype, tree addr, spe
   if (expr == error_mark_node)
 	return error_mark_node;
   if (do_delete)
-	expr = build2 (COMPOUND_EXPR, void_type_node, expr, do_delete);
+	/* The delete operator must be called, regardless of whether
+	   the destructor throws.
+
+	   [expr.delete]/7 The deallocation function is called
+	   regardless of whether the destructor for the object or some
+	   element of the array throws an exception.  */
+	expr = build2 (TRY_FINALLY_EXPR, void_type_node, expr, do_delete);
 
   /* We need to calculate this before the dtor changes the vptr.  */
   if (head)
Index: cp/optimize.c
===
--- cp/optimize.c	(revision 234668)
+++ cp/optimize.c	(working copy)
@@ -112,26 +112,24 @@ clone_body (tree clone, tree fn, void *a
 static void
 build_delete_destructor_body (tree delete_dtor, tree complete_dtor)
 {
-  tree call_dtor, call_delete;
   tree parm = DECL_ARGUMENTS (delete_dtor);
   tree virtual_size = cxx_sizeof (current_class_type);
 
   /* Call the corresponding complete destructor.  */
   gcc_assert (complete_dtor);
-  call_dtor = build_cxx_call (complete_dtor, 1, &parm,
-			  tf_warning_or_error);
-  add_stmt (call_dtor);
-
-  add_stmt (build_stmt (0, LABEL_EXPR, cdtor_label));
+  tree call_dtor = build_cxx_call (complete_dtor, 1, &parm,
+   tf_warning_or_error);
 
   /* Call the delete function.  */
-  call_delete = build_op_delete_call (DELETE_EXPR, current_class_ptr,
-  virtual_size,
-  /*global_p=*/false,
-  /*placement=*/NULL_TREE,
-  /*alloc_fn=*/NULL_TREE,
-  tf_warning_or_error);
-  add_stmt (call_delete);
+  tree call_delete = build_op_delete_call (DELETE_EXPR, current_class_ptr,
+	   virtual_size,
+	   /*global_p=*/false,
+	   /*placement=*/NULL_TREE,
+	   /*alloc_fn=*/NULL_TREE,
+	   tf_warning_or_error);
+
+  /* Operator delete must be called, whether or not the dtor throws.  */
+  add_stmt (build2 (TRY_FINALLY_EXPR, void_type_node, call_dtor, call_delete));
 
   /* Return the address of the object.  */
   if (targetm.cxx.cdtor_returns_this ())
Index: testsuite/g++.dg/eh/delete1.C
===
--- testsuite/g++.dg/eh/delete1.C	(nonexistent)
+++ testsuite/g++.dg/eh/delete1.C	(working copy)
@@ -0,0 +1,79 @@
+// { dg-do run }
+// pr 55635, the delete operator must be called, regardless of whether
+// the dtor throws
+
+static int deleted;
+
+void operator delete (void *) throw ()
+{
+  deleted = 1;
+}
+
+struct Foo {
+  ~Foo() throw(int) {throw 1;}
+};
+
+struct Baz {
+  void operator delete (void *) throw ()
+  {
+deleted = 2;
+  }
+  virtual ~Baz() throw(int) {throw 1;}
+};
+
+int non_virt ()
+{
+  deleted = 0;
+  
+  Foo *p = new Foo;
+  try { delete p; }
+  catch (...) { return deleted != 1;}
+  return 1;
+}
+
+int virt_glob ()
+{
+  deleted = 0;
+  
+  Baz *p = ::new Baz;
+  try { ::delete p; }
+  catch (...) { return deleted != 1;}
+  return 1;
+

[Committed] PR70404 S/390: Fix insv expansion.

2016-04-01 Thread Andreas Krebbel
While the expander accepts general_operand as the src operand, the risbg
pattern only accepts immediate_operand.  Unfortunately the expander called
force_reg only for VOIDmode constants, missing things like
symbol_refs.  Fixed with the attached patch.

Bootstrapped on s390 and s390x.

This fixes the pr70174.c testcase on s390x (-march=z10).

Applied to mainline.

Bye,

-Andreas-

gcc/ChangeLog:

2016-04-01  Andreas Krebbel  

PR target/70404
* config/s390/s390.c (s390_expand_insv): Check for everything
constant instead of just VOIDmode stuff.
---
 gcc/config/s390/s390.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 4f219be..1134d0f 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -6063,7 +6063,7 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
 {
   machine_mode mode_s = GET_MODE (src);
 
-  if (mode_s == VOIDmode)
+  if (CONSTANT_P (src))
{
  /* For constant zero values the representation with AND
 appears to be folded in more situations than the (set
-- 
1.9.1



Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-01 Thread Bernd Schmidt

On 04/01/2016 11:08 AM, Richard Biener wrote:

{
! if (canon_true_dependence (s_info->mem,
!GET_MODE (s_info->mem),
!s_info->mem_addr,
!mem, mem_addr))
{
  s_info->rhs = NULL;
  s_info->const_rhs = NULL;
--- 1609,1617 
   the value of store_info.  If it is, set the rhs to NULL to
   keep it from being used to remove a load.  */
{
! if (canon_output_dependence (s_info->mem, true,
!  mem, GET_MODE (mem),
!  mem_addr))
{
  s_info->rhs = NULL;
  s_info->const_rhs = NULL;


I think the patch is ok, but there is a comment in that function which 
references canon_true_dependence; that should also be fixed up.


Isn't the testcase invalid though? I thought accesses through char * 
pointers bypass aliasing rules, but accessing a char array through int * 
and long * pointers doesn't?



Bernd


Re: [PATCH 2/2] Fix PR hsa/70402

2016-04-01 Thread Martin Jambor
Hi,

On Thu, Mar 31, 2016 at 12:50:54PM +0200, Martin Liska wrote:
> On 03/29/2016 01:44 PM, Martin Liška wrote:
> > Second part of the patch set which omits one split_block (compared to the 
> > original patch).
> > Acceptable just in case the first part will be accepted.
> > 
> > Thanks
> > Martin
> > 
> 
> Hi.
> 
> I'm sending v3 of the patch which does not immediately update dominator,
> but sets a flag that eventually triggers the update.
> 

The patch is OK after you change the name of the flag (introduced in a
different patch) to the new one.

Thanks,

Martin


[Patch ARM] Fix PR target/53440 - handle generic thunks better for TARGET_32BIT.

2016-04-01 Thread Ramana Radhakrishnan
I've had this in my tree for a few months now but never got
around to submitting it.

This partially fixes PR target/53440, at least in ARM and
Thumb2 state. I haven't yet managed to get my head around
rewriting the Thumb1 support.

Tested on armhf with a bootstrap and regression test
with no regressions.

Queued for stage1 now as it isn't technically a regression.

regards
Ramana


  Ramana Radhakrishnan  

PR target/53440
* config/arm/arm.c (arm32_output_mi_thunk): New.
(arm_output_mi_thunk): Rename to arm_thumb1_mi_thunk. Rework
to split Thumb1 vs TARGET_32BIT functionality.
(arm_thumb1_mi_thunk): New.


* g++.dg/inherit/thunk1.C: Support arm / aarch64.
commit da356d8d7df8d958920c66bf18ae9c1473d98ec2
Author: Ramana Radhakrishnan 
Date:   Fri Nov 20 13:49:12 2015 +

thunk

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 2075c41..cdecf29 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -300,6 +300,9 @@ static void arm_canonicalize_comparison (int *code, rtx 
*op0, rtx *op1,
 static unsigned HOST_WIDE_INT arm_asan_shadow_offset (void);
 
 static void arm_sched_fusion_priority (rtx_insn *, int, int *, int*);
+static bool arm_can_output_mi_thunk (const_tree, HOST_WIDE_INT, HOST_WIDE_INT,
+const_tree);
+
 
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -463,7 +466,7 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef  TARGET_ASM_OUTPUT_MI_THUNK
 #define TARGET_ASM_OUTPUT_MI_THUNK arm_output_mi_thunk
 #undef  TARGET_ASM_CAN_OUTPUT_MI_THUNK
-#define TARGET_ASM_CAN_OUTPUT_MI_THUNK default_can_output_mi_thunk_no_vcall
+#define TARGET_ASM_CAN_OUTPUT_MI_THUNK arm_can_output_mi_thunk
 
 #undef  TARGET_RTX_COSTS
 #define TARGET_RTX_COSTS arm_rtx_costs
@@ -26040,11 +26043,10 @@ arm_internal_label (FILE *stream, const char *prefix, 
unsigned long labelno)
 
 /* Output code to add DELTA to the first argument, and then jump
to FUNCTION.  Used for C++ multiple inheritance.  */
+
 static void
-arm_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
-HOST_WIDE_INT delta,
-HOST_WIDE_INT vcall_offset ATTRIBUTE_UNUSED,
-tree function)
+arm_thumb1_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta,
+HOST_WIDE_INT, tree function)
 {
   static int thunk_label = 0;
   char label[256];
@@ -26185,6 +26187,76 @@ arm_output_mi_thunk (FILE *file, tree thunk 
ATTRIBUTE_UNUSED,
   final_end_function ();
 }
 
+/* MI thunk handling for TARGET_32BIT.  */
+
+static void
+arm32_output_mi_thunk (FILE *file, tree, HOST_WIDE_INT delta,
+  HOST_WIDE_INT vcall_offset, tree function)
+{
+  /* On ARM, this_regno is R0 or R1 depending on
+ whether the function returns an aggregate or not.
+  */
+  int this_regno = (aggregate_value_p (TREE_TYPE (TREE_TYPE (function)),
+  function)
+   ? R1_REGNUM : R0_REGNUM);
+
+  rtx temp = gen_rtx_REG (Pmode, IP_REGNUM);
+  rtx this_rtx = gen_rtx_REG (Pmode, this_regno);
+  reload_completed = 1;
+  emit_note (NOTE_INSN_PROLOGUE_END);
+
+  /* Add DELTA to THIS_RTX.  */
+  if (delta != 0)
+arm_split_constant (PLUS, Pmode, NULL_RTX,
+   delta, this_rtx, this_rtx, false);
+
+  /* Add *(*THIS_RTX + VCALL_OFFSET) to THIS_RTX.  */
+  if (vcall_offset != 0)
+{
+  /* Load *THIS_RTX.  */
+  emit_move_insn (temp, gen_rtx_MEM (Pmode, this_rtx));
+  /* Compute *THIS_RTX + VCALL_OFFSET.  */
+  arm_split_constant (PLUS, Pmode, NULL_RTX, vcall_offset, temp, temp,
+ false);
+  /* Compute *(*THIS_RTX + VCALL_OFFSET).  */
+  emit_move_insn (temp, gen_rtx_MEM (Pmode, temp));
+  emit_insn (gen_add3_insn (this_rtx, this_rtx, temp));
+}
+
+  /* Generate a tail call to the target function.  */
+  if (!TREE_USED (function))
+{
+  assemble_external (function);
+  TREE_USED (function) = 1;
+}
+  rtx funexp = XEXP (DECL_RTL (function), 0);
+  funexp = gen_rtx_MEM (FUNCTION_MODE, funexp);
+  rtx_insn * insn = emit_call_insn (gen_sibcall (funexp, const0_rtx, 
NULL_RTX));
+  SIBLING_CALL_P (insn) = 1;
+
+  insn = get_insns ();
+  shorten_branches (insn);
+  final_start_function (insn, file, 1);
+  final (insn, file, 1);
+  final_end_function ();
+
+  /* Stop pretending this is a post-reload pass.  */
+  reload_completed = 0;
+}
+
+/* Output code to add DELTA to the first argument, and then jump
+   to FUNCTION.  Used for C++ multiple inheritance.  */
+
+static void
+arm_output_mi_thunk (FILE *file, tree thunk, HOST_WIDE_INT delta,
+HOST_WIDE_INT vcall_offset, tree function)
+{
+  if (TARGET_32BIT)
+arm32_output_mi_thunk (file, thunk, delta, vcall_offset, function);
+  else
+arm_thumb1_mi_thunk (file, thunk, delta, vcall_offset, function);
+}
+
 int
 arm_emit_vector_co

Re: [PATCH] Prevent loops from being optimized away

2016-04-01 Thread Andrew Pinski
On Thu, Mar 31, 2016 at 9:54 PM, Segher Boessenkool
 wrote:
> Sometimes people write loops that they do not want optimized away, even
> when the compiler can replace those loops by a simple expression (or
> nothing).  For such people, this patch adds a compiler option.


The Linux kernel has a nice workaround for this case, at least for the
divide case.

Thanks,
Andrew

>
> Bootstrapped on powerpc64-linux; regression check still in progress
> (with Init(1) to actually test anything).
>
>
> Segher
>
>
> 2016-04-01  Segher Boessenkool  
>
> * loop-init.c: Include some more stuff that really doesn't belong
> here, oh well.
> (loop_optimizer_init): Add empty asm statements in all gimple loops,
> if asked to.
> * common.opt: Add new option.
>
> ---
>  gcc/common.opt  |  4 
>  gcc/loop-init.c | 15 +++
>  2 files changed, 19 insertions(+)
>
> diff --git a/gcc/loop-init.c b/gcc/loop-init.c
> index 8634591..7c5dc24 100644
> --- a/gcc/loop-init.c
> +++ b/gcc/loop-init.c
> @@ -24,6 +24,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "target.h"
>  #include "rtl.h"
>  #include "tree.h"
> +#include "gimple.h"
> +#include "gimple-iterator.h"
>  #include "cfghooks.h"
>  #include "df.h"
>  #include "regs.h"
> @@ -91,6 +93,19 @@ loop_optimizer_init (unsigned flags)
>
>/* Find the loops.  */
>current_loops = flow_loops_find (NULL);
> +
> +  if (flag_never_gonna_give_you_up && current_ir_type () == IR_GIMPLE)
> +   {
> + struct loop *loop;
> + FOR_EACH_LOOP (loop, 0)
> +   if (loop->latch)
> + {
> +   gasm *p = gimple_build_asm_vec ("", 0, 0, 0, 0);
> +   gimple_asm_set_volatile (p, true);
> +   gimple_stmt_iterator bsi = gsi_after_labels (loop->latch);
> +   gsi_insert_before (&bsi, p, GSI_SAME_STMT);
> + }
> +   }
>  }
>else
>  {
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 0f3bb4e..b7c0a6a 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2002,6 +2002,10 @@ frerun-loop-opt
>  Common Ignore
>  Does nothing.  Preserved for backward compatibility.
>
> +frickroll-all-loops
> +Common Report Var(flag_never_gonna_give_you_up) Init(0) Optimization
> +You know the rules, and so do I.
> +
>  frounding-math
>  Common Report Var(flag_rounding_math) Optimization SetByCombined
>  Disable optimizations that assume default FP rounding behavior.
> --
> 1.9.3
>


Re: [PATCH] Improve add/sub double word splitters (PR rtl-optimization/70467)

2016-04-01 Thread Uros Bizjak
On Fri, Apr 1, 2016 at 3:18 PM, Jakub Jelinek  wrote:
> Hi!
>
> As the testcase below shows, we generate awful code for double word
> additions/subtractions if the last argument is a constant that has the
> whole low word 0 (and only nonzero some of the upper bits).
> In that case, there is no point doing useless addl $0, ... followed by
> adcl $something, ... because the addition won't change anything and the
> carry flag will be also clear; we can just add the high word to the high
> part.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-04-01  Jakub Jelinek  
>
> PR rtl-optimization/70467
> * config/i386/i386.md (*add3_doubleword, *sub3_doubleword):
> If low word of the last operand is 0, just emit addition/subtraction
> for the high word.
>
> * gcc.target/i386/pr70467-2.c: New test.

OK with a small testcase adjustment.

Thanks,
Uros.

> --- gcc/config/i386/i386.md.jj  2016-03-29 19:31:23.0 +0200
> +++ gcc/config/i386/i386.md 2016-03-31 17:33:36.848167239 +0200
> @@ -5449,7 +5449,14 @@ (define_insn_and_split "*add3_doubl
>(match_dup 4))
>  (match_dup 5)))
>   (clobber (reg:CC FLAGS_REG))])]
> -  "split_double_mode (mode, &operands[0], 3, &operands[0], 
> &operands[3]);")
> +{
> +  split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
> +  if (operands[2] == const0_rtx)
> +{
> +  ix86_expand_binary_operator (PLUS, mode, &operands[3]);
> +  DONE;
> +}
> +})
>
>  (define_insn "*add_1"
>[(set (match_operand:SWI48 0 "nonimmediate_operand" "=r,rm,r,r")
> @@ -6379,7 +6386,14 @@ (define_insn_and_split "*sub3_doubl
>(ltu:DWIH (reg:CC FLAGS_REG) (const_int 0)))
>  (match_dup 5)))
>   (clobber (reg:CC FLAGS_REG))])]
> -  "split_double_mode (mode, &operands[0], 3, &operands[0], 
> &operands[3]);")
> +{
> +  split_double_mode (mode, &operands[0], 3, &operands[0], &operands[3]);
> +  if (operands[2] == const0_rtx)
> +{
> +  ix86_expand_binary_operator (MINUS, mode, &operands[3]);
> +  DONE;
> +}
> +})
>
>  (define_insn "*sub_1"
>[(set (match_operand:SWI 0 "nonimmediate_operand" "=m,")
> --- gcc/testsuite/gcc.target/i386/pr70467-2.c.jj	2016-04-01 12:29:15.611785157 +0200
> +++ gcc/testsuite/gcc.target/i386/pr70467-2.c   2016-04-01 12:31:19.980092446 +0200
> @@ -0,0 +1,20 @@
> +/* PR rtl-optimization/70467 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +unsigned long long
> +foo (unsigned long long x)
> +{
> +  return x + 0x123456ULL;
> +}
> +
> +unsigned long long
> +bar (unsigned long long x)
> +{
> +  return x - 0x123456ULL;
> +}
> +
> +/* { dg-final { scan-assembler-not "addl\[ \t\]*.0," { target ia32 } } } */
> +/* { dg-final { scan-assembler-not "subl\[ \t\]*.0," { target ia32 } } } */
> +/* { dg-final { scan-assembler-not "adcl\[^\n\r\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler-not "sbbl\[^\n\r\]*%" { target ia32 } } } */

Please compile the test only for ia32 target. The purpose of the test
is to scan assembly on ia32, and obviously, there is no point to
compile it on x86_64.


Re: [C++ PATCH] Fix ICE in warn_placement_new_too_small (PR c++/70488)

2016-04-01 Thread Martin Sebor

On 04/01/2016 09:11 AM, Jakub Jelinek wrote:

Hi!

The new warn_placement_new_too_small function blindly assumes that
if {DECL,TYPE}_SIZE_UNIT is non-NULL, then it must be INTEGER_CST
that fits into uhwi.  That is not the case, it could be a VLA, etc.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

OT, as I said in bugzilla, I see very questionable code in that function
too, no idea what Martin meant with that:
   while (TREE_CODE (oper) == COMPONENT_REF)
 {
   tree op0 = oper;
   while (TREE_CODE (op0 = TREE_OPERAND (op0, 0)) == COMPONENT_REF);
   if (TREE_CODE (op0) == VAR_DECL)
 var_decl = op0;
   oper = TREE_OPERAND (oper, 1);
 }
TREE_OPERAND (oper, 1) of a COMPONENT_REF should be always a FIELD_DECL,
so this will never loop.  Did you mean to use if instead of while, something
different?


Thanks for the patch!  I suspect the loop was either a thinko
or the result of moving code around while in development, or
both.  I do remember meaning to revisit the loop because it
didn't make complete sense even to me but forgot to get back
to it.  Let me look into cleaning it up after I'm done with
what I'm working on now.

Martin



2016-04-01  Jakub Jelinek  
Marek Polacek  

PR c++/70488
* init.c (warn_placement_new_too_small): Test whether
DECL_SIZE_UNIT or TYPE_SIZE_UNIT are integers that fit into uhwi.

* g++.dg/init/new47.C: New test.

--- gcc/cp/init.c.jj2016-03-31 10:55:58.0 +0200
+++ gcc/cp/init.c   2016-04-01 14:23:25.977800499 +0200
@@ -2430,7 +2430,8 @@ warn_placement_new_too_small (tree type,
 though the size of a member of a union may be viewed as extending
 to the end of the union itself (it is by __builtin_object_size).  */
if ((TREE_CODE (oper) == VAR_DECL || use_obj_size)
- && DECL_SIZE_UNIT (oper))
+ && DECL_SIZE_UNIT (oper)
+ && tree_fits_uhwi_p (DECL_SIZE_UNIT (oper)))
{
  /* Use the size of the entire array object when the expression
 refers to a variable or its size depends on an expression
@@ -2438,7 +2439,8 @@ warn_placement_new_too_small (tree type,
  bytes_avail = tree_to_uhwi (DECL_SIZE_UNIT (oper));
  exact_size = !use_obj_size;
}
-  else if (TYPE_SIZE_UNIT (TREE_TYPE (oper)))
+  else if (TYPE_SIZE_UNIT (TREE_TYPE (oper))
+  && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (oper))))
{
  /* Use the size of the type of the destination buffer object
 as the optimistic estimate of the available space in it.  */
--- gcc/testsuite/g++.dg/init/new47.C.jj	2016-04-01 14:36:20.516355623 +0200
+++ gcc/testsuite/g++.dg/init/new47.C   2016-04-01 14:36:34.162171718 +0200
@@ -0,0 +1,19 @@
+// PR c++/70448
+// { dg-do compile }
+// { dg-options "-Wall" }
+
+typedef __typeof__ (sizeof 0) size_t;
+void *operator new (size_t, void *p) { return p; }
+void *operator new[] (size_t, void *p) { return p; }
+struct S { size_t s; };
+void bar (S *);
+
+void
+foo (unsigned int s)
+{
+  char t[sizeof (S) + s];
+  S *f = new (t) S;
+  bar (f);
+  f = new (t) S[1];
+  bar (f);
+}

Jakub





Re: [c++/55635] operator delete and throwing dtors

2016-04-01 Thread Jason Merrill

On 04/01/2016 11:21 AM, Nathan Sidwell wrote:

this fixes bug 55635 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55635


Please include the word "patch" in the subject line of patch email.

The patch looks good, but this doesn't seem to be a regression; do you 
think it's an important enough bug to fix in stage 4?


Jason



Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-01 Thread Jeff Law

On 04/01/2016 03:44 AM, Richard Biener wrote:

On Fri, 1 Apr 2016, Jakub Jelinek wrote:


On Fri, Apr 01, 2016 at 11:08:09AM +0200, Richard Biener wrote:


RTL DSE uses true_dependence to see whether a store may be killed by
anothe store - that's obviously broken.  The following patch makes
it use output_dependence instead (introducing a canon_ variant of that).


I think it would be interesting to see some stats on what effect does this
have on the optimization RTL DSE is doing (say gather during
unpatched bootstrap/regtest number of successfully optimized replace_read
calls, and the same with patched bootstrap/regtest).
 From quick look at the patch, this wouldn't optimize even the cases that
could be optimized (return *pi;) at the RTL level.  If the statistics would
show this affects it significantly, perhaps we could do both
canon_true_dependence and canon_output_dependence, and if the two calls
differ, don't clear the rhs, but mark it somehow and then in replace_read
check what alias set is used for the read or something similar?


Well, I don't believe it is DSEs job to do CSE.  And I don't see how
we can efficiently do what you suggest - it seems DSE doesn't check
all possible aliases when CSEing.
IIRC, there's nowhere else we're making this kind of transformation and 
it fits in fairly naturally in the formulations I've looked at in the 
past.


It may also be the case that this was carried forward from the old DSE 
code to prevent regressions.  I'd have to do a lot more digging in the 
archives to know for sure -- what's in dse.c doesn't look anything like 
the old DSE implementation I was familiar with.


I've often speculated that all this stuff should be rewritten using the 
scheduler's dependency framework.  Essentially when a store gets scheduled 
we ought to be able to identify the potential set of dead stores, 
redundant loads and potential aliasing loads/stores as the store's 
dependencies are resolved.


I wouldn't suggest doing it as part of the scheduling pass, but instead 
as a separate pass that utilizes all the scheduler's dependency 
analysis, queue management, etc.


Any regressions (in terms of stores not eliminated or loads not CSE'd) 
would likely be due to inaccurate (overly conservative) dataflow built by the 
scheduler, and fixing those would help both the DSE bits as well as 
scheduling in general.



I'd first look to fix the CSE-ish transformation, but if that proves 
difficult/expensive, then we'd be looking at removal.  As part of the 
removal we should extract some testcases where it correctly fired and 
create a regression bug with those testcases.


Jeff



Re: [c++/55635] operator delete and throwing dtors

2016-04-01 Thread Nathan Sidwell

On 04/01/16 11:36, Jason Merrill wrote:

On 04/01/2016 11:21 AM, Nathan Sidwell wrote:

this fixes bug 55635 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55635


Please include the word "patch" in the subject line of patch email.

The patch looks good, but this doesn't seem to be a regression; do you think
it's an important enough bug to fix in stage 4?


nope -- I mistakenly thought 'bug fix & regressions'

nathan



Re: [C PATCH] Fix ICE with VEC_COND_EXPR and compound literals (PR c/70307)

2016-04-01 Thread Jeff Law

On 04/01/2016 09:03 AM, Marek Polacek wrote:

This is another case where a C_MAYBE_CONST_EXPR leaks into the gimplifier.
Starting with r229128 and thus introduction of build_vec_cmp we now create
VEC_COND_EXPR when building a vector comparison.  The C_MAYBE_CONST_EXPR
originated in build_compound_literal when creating a COMPOUND_LITERAL_EXPR.
This is then made a part of op0 of a VEC_COND_EXPR.  But c_fully_fold_internal
doesn't know what to do with VEC_COND_EXPRs so the C_MAYBE_CONST_EXPR went
unnoticed into fold, oops.  The fix here is to teach c_fully_fold_internal
how to handle VEC_COND_EXPRs, which is what this patch attempts to do.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-01  Marek Polacek  

PR c/70307
* c-fold.c (c_fully_fold_internal): Handle VEC_COND_EXPR.

* gcc.dg/torture/pr70307.c: New test.

diff --git gcc/c/c-fold.c gcc/c/c-fold.c
index f07917f..d512824 100644
--- gcc/c/c-fold.c
+++ gcc/c/c-fold.c
@@ -528,6 +528,23 @@ c_fully_fold_internal (tree expr, bool in_init, bool 
*maybe_const_operands,
*maybe_const_itself &= op2_const_self;
goto out;

+case VEC_COND_EXPR:
+  orig_op0 = op0 = TREE_OPERAND (expr, 0);
+  op1 = TREE_OPERAND (expr, 1);
+  op2 = TREE_OPERAND (expr, 2);
+  op0 = c_fully_fold_internal (op0, in_init, maybe_const_operands,
+  maybe_const_itself, for_int_const);
+  STRIP_TYPE_NOPS (op0);
+
+  /* OP1 will be a vector of -1 and OP2 a vector of 0, as created in
+build_vec_cmp -- no need to fold them.  */

Is this worth verifying with a gcc_assert?  Your call.

OK with or without the gcc_assert that op1/op2 are constants.

Jeff


Re: [PATCH] Prevent loops from being optimized away

2016-04-01 Thread Segher Boessenkool
On Fri, Apr 01, 2016 at 08:32:28AM -0700, Andrew Pinski wrote:
> On Thu, Mar 31, 2016 at 9:54 PM, Segher Boessenkool
>  wrote:
> > Sometimes people write loops that they do not want optimized away, even
> > when the compiler can replace those loops by a simple expression (or
> > nothing).  For such people, this patch adds a compiler option.
> 
> The Linux kernel has a nice workaround for this case, at least for the
> divide case.

I know, I wrote that code :-)  It has a slightly different purpose (and
semantics) though: it makes _sure_ that "inside, we know what is going on"
does not hold (that's what "+rm" does).  But yes, it is playing a similar
game.
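(For reference, a hedged sketch of that kind of construct -- the macro name
here is made up.  An empty volatile asm with a "+rm" constraint makes the
value opaque to the optimizer, so e.g. a division-by-subtraction loop cannot
be folded into a plain '/':)

  #define OPTIMIZER_HIDE(var) __asm__ __volatile__ ("" : "+rm" (var))

  unsigned
  slow_div (unsigned n, unsigned d)
  {
    unsigned q = 0;
    while (n >= d)
      {
        OPTIMIZER_HIDE (n);   /* value of n is now unknown to GCC */
        n -= d;
        q++;
      }
    return q;
  }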


Segher


Re: [C PATCH] Fix ICE with VEC_COND_EXPR and compound literals (PR c/70307)

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 09:54:39AM -0600, Jeff Law wrote:
> >--- gcc/c/c-fold.c
> >+++ gcc/c/c-fold.c
> >@@ -528,6 +528,23 @@ c_fully_fold_internal (tree expr, bool in_init, bool 
> >*maybe_const_operands,
> > *maybe_const_itself &= op2_const_self;
> >goto out;
> >
> >+case VEC_COND_EXPR:
> >+  orig_op0 = op0 = TREE_OPERAND (expr, 0);
> >+  op1 = TREE_OPERAND (expr, 1);
> >+  op2 = TREE_OPERAND (expr, 2);
> >+  op0 = c_fully_fold_internal (op0, in_init, maybe_const_operands,
> >+   maybe_const_itself, for_int_const);
> >+  STRIP_TYPE_NOPS (op0);
> >+
> >+  /* OP1 will be a vector of -1 and OP2 a vector of 0, as created in
> >+ build_vec_cmp -- no need to fold them.  */
> Is this worth verifying with a gcc_assert?  Your call.
> 
> OK with or without the gcc_assert that op1/op2 are constants.

I think either we should have an gcc_checking_assert, or can't
we just handle VEC_COND_EXPR like COND_EXPR, with no extra code?

Jakub


Re: [C PATCH] Fix ICE with VEC_COND_EXPR and compound literals (PR c/70307)

2016-04-01 Thread Marek Polacek
On Fri, Apr 01, 2016 at 06:02:24PM +0200, Jakub Jelinek wrote:
> On Fri, Apr 01, 2016 at 09:54:39AM -0600, Jeff Law wrote:
> > >--- gcc/c/c-fold.c
> > >+++ gcc/c/c-fold.c
> > >@@ -528,6 +528,23 @@ c_fully_fold_internal (tree expr, bool in_init, bool 
> > >*maybe_const_operands,
> > >   *maybe_const_itself &= op2_const_self;
> > >goto out;
> > >
> > >+case VEC_COND_EXPR:
> > >+  orig_op0 = op0 = TREE_OPERAND (expr, 0);
> > >+  op1 = TREE_OPERAND (expr, 1);
> > >+  op2 = TREE_OPERAND (expr, 2);
> > >+  op0 = c_fully_fold_internal (op0, in_init, maybe_const_operands,
> > >+ maybe_const_itself, for_int_const);
> > >+  STRIP_TYPE_NOPS (op0);
> > >+
> > >+  /* OP1 will be a vector of -1 and OP2 a vector of 0, as created in
> > >+   build_vec_cmp -- no need to fold them.  */
> > Is this worth verifying with a gcc_assert?  Your call.
> > 
> > OK with or without the gcc_assert that op1/op2 are constants.

I'm going to add the asserts.  I should've added them in the first place. 

> I think either we should have an gcc_checking_assert, or can't
> we just handle VEC_COND_EXPR like COND_EXPR, with no extra code?

It works (I seem to remember testing that variant), but the COND_EXPR case
code does a lot of stuff that doesn't seem to make sense for vectors, like
checking for truthvalue_false_node, folding op1/op2, playing games with
maybe_const_* stuff...  So I decided to add a new case.

Marek


Re: [C PATCH] Fix ICE with VEC_COND_EXPR and compound literals (PR c/70307)

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 06:14:00PM +0200, Marek Polacek wrote:
> On Fri, Apr 01, 2016 at 06:02:24PM +0200, Jakub Jelinek wrote:
> > On Fri, Apr 01, 2016 at 09:54:39AM -0600, Jeff Law wrote:
> > > >--- gcc/c/c-fold.c
> > > >+++ gcc/c/c-fold.c
> > > >@@ -528,6 +528,23 @@ c_fully_fold_internal (tree expr, bool in_init, 
> > > >bool *maybe_const_operands,
> > > > *maybe_const_itself &= op2_const_self;
> > > >goto out;
> > > >
> > > >+case VEC_COND_EXPR:
> > > >+  orig_op0 = op0 = TREE_OPERAND (expr, 0);
> > > >+  op1 = TREE_OPERAND (expr, 1);
> > > >+  op2 = TREE_OPERAND (expr, 2);
> > > >+  op0 = c_fully_fold_internal (op0, in_init, maybe_const_operands,
> > > >+   maybe_const_itself, for_int_const);
> > > >+  STRIP_TYPE_NOPS (op0);
> > > >+
> > > >+  /* OP1 will be a vector of -1 and OP2 a vector of 0, as created in
> > > >+ build_vec_cmp -- no need to fold them.  */
> > > Is this worth verifying with a gcc_assert?  Your call.
> > > 
> > > OK with or without the gcc_assert that op1/op2 are constants.
> 
> I'm going to add the asserts.  I should've add them in the first place. 
> 
> > I think either we should have an gcc_checking_assert, or can't
> > we just handle VEC_COND_EXPR like COND_EXPR, with no extra code?
> 
> It works (I seem to remember testing that variant), but the COND_EXPR case
> code does a lot of stuff that doesn't seem to make sense for vectors, like
> checking for truthvalue_false_node, folding op1/op2, playing games with
> maybe_const_* stuff...  So I decided to add a new case.

Those comparisons with truthvalue_*_node would fail and DTRT.
Or just c_fully_fold_internal all the arguments, be ready for any future
further uses of VEC_COND_EXPR early?

Jakub


Re: Various selective scheduling fixes

2016-04-01 Thread Jeff Law

On 04/01/2016 07:26 AM, Christophe Lyon wrote:

On 1 April 2016 at 15:12, Kyrill Tkachov  wrote:

Hi Christophe, Andrey,


On 01/04/16 14:09, Christophe Lyon wrote:


On 1 April 2016 at 10:54, Andrey Belevantsev  wrote:


Hi Christophe,


On 01.04.2016 10:33, Christophe Lyon wrote:


On 31 March 2016 at 16:43, Andrey Belevantsev  wrote:


Hello,

On 14.03.2016 12:10, Andrey Belevantsev wrote:



Hello,

In this thread I will be posting the patches for the fixed selective
scheduling PRs (except the one that was already kindly checked in by
Jeff).
   The patches were tested both on x86-64 and ia64 with the following
combination: 1) the usual bootstrap/regtest, which only utilizes
sel-sched
on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2)
the
bootstrap/regtest with the second scheduler forced to sel-sched; 3)
both
schedulers forced to sel-sched.  In all cases everything seemed to be
fine.

Three of the PRs are regressions, the other two showed different
errors
across the variety of releases tested by submitters;  I think all of
them
are appropriate at this stage -- they do not touch anything outside of
selective scheduling except the first patch where a piece of code from
sched-deps.c needs to be refactored into a function to be called from
sel-sched.c.




I've backported all regression PRs to gcc-5-branch after testing there
again
with selective scheduling force enabled: PRs 64411, 0, 69032,
69102.
The first one was not marked as a regression as such but the test for
PR
70292, which is duplicate, works for me on gcc 5.1 thus making it a
regression, too.


Hi,

The backport for pr69102 shows that the new testcase fails to compile
(ICE)
when GCC is configured as:

--target=arm-none-linux-gnueabihf --with-float=hard --with-mode=arm
--with-cpu=cortex-a15 --with-fpu=neon-vfpv4



/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:
In function 'foo':


/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:21:1:
internal compiler error: Segmentation fault
0xa64d15 crash_signal
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:383
0xfa41d7 autopref_multipass_dfa_lookahead_guard(rtx_insn*, int)
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752
0xa31cd2 invoke_dfa_lookahead_guard
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4212
0xa31cd2 find_best_expr
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4415
0xa343fb fill_insns
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5570
0xa343fb schedule_on_fences
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7395
0xa36010 sel_sched_region_2
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7533
0xa36f2a sel_sched_region_1
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7575
0xa36f2a sel_sched_region(int)
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7676
0xa37589 run_selective_scheduling()
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7752
0xa14aed rest_of_handle_sched2
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647
0xa14aed execute
  /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791

See

http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234625/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt

Can you have a look?



That's because A15 is the only place which enables
autopref_multipass_dfa_lookahead_guard as the DFA lookahead guard hook.
But
autoprefetch modeling doesn't work for selective scheduling, it uses
haifa
structures that are not kept up to date during sel-sched.  So this is not
supposed to work as soon as the param value for prefetcher lookahead
depth
is positive.

The following patch works for me.  Could you check it with your testing?
If
it works fine for you, I would install the patch both for trunk and
gcc-5.
It would be great to force sel-sched to be enabled, too.  I could do that
but I don't have the hardware or cross-arm target tools at the moment.

  * haifa-sched.c (autopref_multipass_dfa_lookahead_guard):
Disable
for selective scheduler.


It does work for me, it also fixes the other ICE I reported (on pr69307).

But note that both tests pass on trunk.



This looks to me like PR rtl-optimization/68236 which I fixed on trunk.


You are right: I've just checked that backporting your patch r230088 does
fix the problems in the gcc-5 branch.

Can I commit it ? I guess the question is for Jakub?

Yes, you can commit it to the branch.

jeff



Re: Proposed Patch for Bug 69687

2016-04-01 Thread Marcel Böhme

> 
> Forgot about this issue, sorry. At least this needs guarding with #ifdef 
> HAVE_LIMITS_H, as in the other files in libiberty. Several of them also go to 
> trouble to define the macros if limits.h is missing; not sure how much of an 
> issue that is nowadays, but you might want to adapt something like the code 
> from strtol.c:
> 
> #ifndef ULONG_MAX
> #define ULONG_MAX   ((unsigned long)(~0L))  /* 0xFFFFFFFF */
> #endif
> 
> #ifndef LONG_MAX
> #define LONG_MAX((long)(ULONG_MAX >> 1))/* 0x7FFFFFFF */
> #endif
> 
> Mind trying that and doing a full test run as described in my other mail?

Regression tested on x86_64-pc-linux-gnu (make check). Checked PR69687 is 
resolved (via binutils); even for our definition of INT_MAX.
No test cases added since the test would take up to 2 minutes and end in 
xmalloc_failed even in the successful case.

Thanks,
- Marcel

--

Index: libiberty/cplus-dem.c
===
--- libiberty/cplus-dem.c   (revision 234663)
+++ libiberty/cplus-dem.c   (working copy)
@@ -56,6 +56,13 @@ void * malloc ();
 void * realloc ();
 #endif
 
+#ifdef HAVE_LIMITS_H
+#include <limits.h>
+#endif
+#ifndef INT_MAX
+# define INT_MAX   (int)(((unsigned int) ~0) >> 1)  /* 0x7FFFFFFF */
+#endif
+
 #include <demangle.h>
 #undef CURRENT_DEMANGLING_STYLE
 #define CURRENT_DEMANGLING_STYLE work->options
@@ -4256,6 +4263,8 @@ remember_type (struct work_stuff *work,
}
   else
{
+  if (work -> typevec_size > INT_MAX / 2)
+   xmalloc_failed (INT_MAX);
  work -> typevec_size *= 2;
  work -> typevec
= XRESIZEVEC (char *, work->typevec, work->typevec_size);
@@ -4283,6 +4292,8 @@ remember_Ktype (struct work_stuff *work,
}
   else
{
+  if (work -> ksize > INT_MAX / 2)
+   xmalloc_failed (INT_MAX);
  work -> ksize *= 2;
  work -> ktypevec
= XRESIZEVEC (char *, work->ktypevec, work->ksize);
@@ -4312,6 +4323,8 @@ register_Btype (struct work_stuff *work)
}
   else
{
+  if (work -> bsize > INT_MAX / 2)
+   xmalloc_failed (INT_MAX);
  work -> bsize *= 2;
  work -> btypevec
= XRESIZEVEC (char *, work->btypevec, work->bsize);
@@ -4766,6 +4779,8 @@ string_need (string *s, int n)
   else if (s->e - s->p < n)
 {
   tem = s->p - s->b;
+  if (n > INT_MAX / 2 - tem)
+xmalloc_failed (INT_MAX); 
   n += tem;
   n *= 2;
   s->b = XRESIZEVEC (char, s->b, n);








Re: [C PATCH] Fix ICE with VEC_COND_EXPR and compound literals (PR c/70307)

2016-04-01 Thread Marek Polacek
On Fri, Apr 01, 2016 at 06:17:57PM +0200, Jakub Jelinek wrote:
> Those comparisons with truthvalue_*_node would fail and DTRT.
> Or just c_fully_fold_internal all the arguments, be ready for any future
> further uses of VEC_COND_EXPR early?

..thus revive my earlier version of the patch that does it:

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-01  Marek Polacek  

PR c/70307
* c-fold.c (c_fully_fold_internal): Handle VEC_COND_EXPR.

* gcc.dg/torture/pr70307.c: New test.

diff --git gcc/c/c-fold.c gcc/c/c-fold.c
index f07917f..6c82f24 100644
--- gcc/c/c-fold.c
+++ gcc/c/c-fold.c
@@ -528,6 +528,26 @@ c_fully_fold_internal (tree expr, bool in_init, bool 
*maybe_const_operands,
*maybe_const_itself &= op2_const_self;
   goto out;
 
+case VEC_COND_EXPR:
+  orig_op0 = op0 = TREE_OPERAND (expr, 0);
+  orig_op1 = op1 = TREE_OPERAND (expr, 1);
+  orig_op2 = op2 = TREE_OPERAND (expr, 2);
+  op0 = c_fully_fold_internal (op0, in_init, maybe_const_operands,
+  maybe_const_itself, for_int_const);
+  STRIP_TYPE_NOPS (op0);
+  op1 = c_fully_fold_internal (op1, in_init, maybe_const_operands,
+  maybe_const_itself, for_int_const);
+  STRIP_TYPE_NOPS (op1);
+  op2 = c_fully_fold_internal (op2, in_init, maybe_const_operands,
+  maybe_const_itself, for_int_const);
+  STRIP_TYPE_NOPS (op2);
+
+  if (op0 != orig_op0 || op1 != orig_op1 || op2 != orig_op2)
+   ret = fold_build3_loc (loc, code, TREE_TYPE (expr), op0, op1, op2);
+  else
+   ret = fold (expr);
+  goto out;
+
 case EXCESS_PRECISION_EXPR:
   /* Each case where an operand with excess precision may be
 encountered must remove the EXCESS_PRECISION_EXPR around
diff --git gcc/testsuite/gcc.dg/torture/pr70307.c 
gcc/testsuite/gcc.dg/torture/pr70307.c
index e69de29..d47c4b6 100644
--- gcc/testsuite/gcc.dg/torture/pr70307.c
+++ gcc/testsuite/gcc.dg/torture/pr70307.c
@@ -0,0 +1,62 @@
+/* PR c/70307 */
+/* { dg-do compile } */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+v4si foo (v4si);
+
+v4si
+fn1 (int i)
+{
+  return i <= (v4si){(0, 0)};
+}
+
+v4si
+fn2 (int i)
+{
+  v4si r;
+  r = i <= (v4si){(0, 0)};
+  return r;
+}
+
+v4si
+fn3 (int i)
+{
+  return foo (i <= (v4si){(0, 0)});
+}
+
+v4si
+fn4 (int i)
+{
+  struct S { v4si v; };
+  struct S s = { .v = i <= (v4si){(0, 0)} };
+  return s.v;
+}
+
+v4si
+fn5 (int i)
+{
+  return (v4si){(1, i++)} == (v4si){(0, 0)};
+}
+
+v4si
+fn6 (int i)
+{
+  v4si r;
+  r = (v4si){(1, i++)} == (v4si){(0, 0)};
+  return r;
+}
+
+v4si
+fn7 (int i)
+{
+  return foo ((v4si){(1, i++)} == (v4si){(0, 0)});
+}
+
+v4si
+fn8 (int i)
+{
+  struct S { v4si v; };
+  struct S s = { .v = (v4si){(1, i++)} == (v4si){(0, 0)} };
+  return s.v;
+}


Marek


Re: [C PATCH] Fix ICE with VEC_COND_EXPR and compound literals (PR c/70307)

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 07:10:06PM +0200, Marek Polacek wrote:
> On Fri, Apr 01, 2016 at 06:17:57PM +0200, Jakub Jelinek wrote:
> > Those comparisons with truthvalue_*_node would fail and DTRT.
> > Or just c_fully_fold_internal all the arguments, be ready for any future
> > further uses of VEC_COND_EXPR early?
> 
> ..thus revive my earlier version of the patch that does it:
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2016-04-01  Marek Polacek  
> 
>   PR c/70307
>   * c-fold.c (c_fully_fold_internal): Handle VEC_COND_EXPR.
> 
>   * gcc.dg/torture/pr70307.c: New test.

LGTM, thanks.

Jakub


Re: [PATCH] Fix PR70484, RTL DSE using wrong dependence check

2016-04-01 Thread Richard Biener
On April 1, 2016 5:26:21 PM GMT+02:00, Bernd Schmidt  
wrote:
>On 04/01/2016 11:08 AM, Richard Biener wrote:
>>  {
>> !  if (canon_true_dependence (s_info->mem,
>> ! GET_MODE (s_info->mem),
>> ! s_info->mem_addr,
>> ! mem, mem_addr))
>>  {
>>s_info->rhs = NULL;
>>s_info->const_rhs = NULL;
>> --- 1609,1617 
>> the value of store_info.  If it is, set the rhs to NULL to
>> keep it from being used to remove a load.  */
>>  {
>> !  if (canon_output_dependence (s_info->mem, true,
>> !   mem, GET_MODE (mem),
>> !   mem_addr))
>>  {
>>s_info->rhs = NULL;
>>s_info->const_rhs = NULL;
>
>I think the patch is ok, but there is a comment in that function which 
>references canon_true_dependence; that should also be fixed up.
>
>Isn't the testcase invalid though? I thought accesses through char * 
>pointers bypass aliasing rules, but accessing a char array through int
>* 
>and long * pointers doesn't?

It doesn't bypass aliasing rules but instead stores change the dynamic type of 
memory.

But the test case is invalid for reasons of alignment; I'll adjust it
accordingly before committing.

Richard.

>
>Bernd




Re: [RFC] introduce --param max-lto-partition for having an upper bound on partition size

2016-04-01 Thread Richard Biener
On April 1, 2016 3:48:35 PM GMT+02:00, Prathamesh Kulkarni 
 wrote:
>Hi,
>The attached patch introduces param max-lto-partition which creates an
>upper
>bound for partition size.
>
>My primary motivation for this patch is to fix building chromium for
>arm
>with -flto-partition=one.
>Chromium fails to build with -flto-partition={none, one} with assembler
>error:
>"branch out of range error"
>because in both these cases LTO creates a single text section of 18 mb
>which exceeds thumb's limit of 16 mb and arm backend emits a short
>call if caller and callee are in same section.
>This is binutils PR18625:
>https://sourceware.org/bugzilla/show_bug.cgi?id=18625
>With patch, chromium builds for -flto-partition=one (by creating more
>than one but minimal number of partitions to honor 16 mb limit).
>I haven't tested with -flto-partition=none but I suppose the build
>will still fail for none, because it won't involve partitioning?  I am
>not sure how to fix for none case.
>
>As suggested by Jim in binutils PR18625, the proper fix would be to
>implement branch relaxation in arm's port of gas, however I suppose
>only LTO will realistically create such large sections,
>and implementing branch relaxation appears to be quite complicated and
>probably too much of
>an effort for this single use case, so this patch serves as a
>work-around to the issue.
>I am looking into fine-tuning the param value for ARM backend to
>roughly match limit
>of 16 mb.
>
>AFAIU, this would change semantics of --param n_lto_partitions (or
>-flto-partition=one) from
>"exactly n_lto_partitions" to "at-least n_lto_partitions". If that's
>not desirable maybe we could add
>another param/option ?
>Cross-tested on arm*-*-*.
>Would this patch be OK for stage-1 (after getting param value right
>for ARM target) ?

What do you want to achieve?  Changing =one semantics doesn't look right to me.
Adding a param for maximum size sounds good in general, but only to increase 
the maximum number of partitions for =balanced (the default).
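(A hedged illustration of that use with the default balanced partitioning --
the param name follows the RFC above; the value and its units are only an
example, not a recommendation:)

  gcc -flto -flto-partition=balanced --param max-lto-partition=10000000 ...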

Richard.

>
>Thanks,
>Prathamesh




Re: backported patch for PR69614

2016-04-01 Thread Vladimir Makarov

On 03/31/2016 05:35 AM, Christophe Lyon wrote:

On 30 March 2016 at 18:01, Vladimir Makarov  wrote:

   The patch for PR69614 has been backported to gcc-5 branch:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614

   Committed as rev. 234577.


Hi,

As I've already reported:
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00773.html
the new test executes incorrectly on armeb --with-fpu=neon


Sorry, I tried to reproduce it on today's trunk on real hardware but
I failed.




Re: [C PATCH] Fix ICE with VEC_COND_EXPR and compound literals (PR c/70307)

2016-04-01 Thread Marek Polacek
On Fri, Apr 01, 2016 at 07:22:24PM +0200, Jakub Jelinek wrote:
> On Fri, Apr 01, 2016 at 07:10:06PM +0200, Marek Polacek wrote:
> > On Fri, Apr 01, 2016 at 06:17:57PM +0200, Jakub Jelinek wrote:
> > > Those comparisons with truthvalue_*_node would fail and DTRT.
> > > Or just c_fully_fold_internal all the arguments, be ready for any future
> > > further uses of VEC_COND_EXPR early?
> > 
> > ..thus revive my earlier version of the patch that does it:
> > 
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> > 
> > 2016-04-01  Marek Polacek  
> > 
> > PR c/70307
> > * c-fold.c (c_fully_fold_internal): Handle VEC_COND_EXPR.
> > 
> > * gcc.dg/torture/pr70307.c: New test.
> 
> LGTM, thanks.

Thanks.  I'll wait till tomorrow to see if Joseph has any comments on this
patch.

Marek


Re: [PATCH] Prevent loops from being optimized away

2016-04-01 Thread Richard Biener
On April 1, 2016 5:18:10 PM GMT+02:00, Segher Boessenkool 
 wrote:
>On Fri, Apr 01, 2016 at 09:36:49AM +0200, Richard Biener wrote:
>> On Fri, Apr 1, 2016 at 6:54 AM, Segher Boessenkool
>>  wrote:
>> > Sometimes people write loops that they do not want optimized away,
>even
>> > when the compiler can replace those loops by a simple expression
>(or
>> > nothing).  For such people, this patch adds a compiler option.
>> >
>> > Bootstrapped on powerpc64-linux; regression check still in progress
>> > (with Init(1) to actually test anything).
>> 
>> -fno-tree-scev-cprop?  -O0?
>
>There are other cases where GCC can delete loops, for example cddce1.
>
>> A new compiler option for this is complete overkill (and it's
>implementation
>> is gross ;)).  Semantics are also unclear, your patch would only make
>sure
>> to preserve an empty loop with the asm in the latch, it wouldn't
>disallow
>> replacing the overall effect with a computation.
>
>That's right, and the loop can even still be unrolled, even fully
>unrolled (which is good, not only should we not desert the loop but
>we also shouldn't run around so much).
>
>> Your patch would also miss a few testcases.
>
>It already makes ~2000 (mainly vectorisation) testcases fail, is that
>not enough coverage?  :-)

For an April 1st joke yes, otherwise no :)

Richard.

>Cheers,
>
>
>Segher




Re: backported patch for PR69614

2016-04-01 Thread Christophe Lyon
On 1 April 2016 at 19:34, Vladimir Makarov  wrote:
> On 03/31/2016 05:35 AM, Christophe Lyon wrote:
>>
>> On 30 March 2016 at 18:01, Vladimir Makarov  wrote:
>>>
>>>The patch for PR69614 has been backported to gcc-5 branch:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614
>>>
>>>Committed as rev. 234577.
>>>
>> Hi,
>>
>> As I've already reported:
>> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00773.html
>> the new test executes incorrectly on armeb --with-fpu=neon
>>
>>
> Sorry, I tried to reproduce it on today trunk on a real hardware but I've
> failed.
>
You have hardware running big-endian natively?


Re: Fix for PR70498 in Libiberty Demangler

2016-04-01 Thread Pedro Alves
On 04/01/2016 11:21 AM, Marcel Böhme wrote:
>  static inline void
> -d_append_num (struct d_print_info *dpi, long l)
> +d_append_num (struct d_print_info *dpi, int l)
>  {
>char buf[25];
>sprintf (buf,"%ld", l);

Doesn't this warn about %ld format vs int type mismatch?
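(Presumably the conversion specifier would need to follow the new parameter
type; an illustrative sketch of the matching change, not taken from the
submitted patch:)

  static inline void
  d_append_num (struct d_print_info *dpi, int l)
  {
    char buf[25];
    sprintf (buf, "%d", l);   /* format now matches the int argument */
    d_append_string (dpi, buf);
  }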

Thanks,
Pedro Alves



Re: Fix for PR70498 in Libiberty Demangler

2016-04-01 Thread Bernd Schmidt

On 04/01/2016 07:41 PM, Pedro Alves wrote:

On 04/01/2016 11:21 AM, Marcel Böhme wrote:

  static inline void
-d_append_num (struct d_print_info *dpi, long l)
+d_append_num (struct d_print_info *dpi, int l)
  {
char buf[25];
sprintf (buf,"%ld", l);


Doesn't this warn about %ld format vs int type mismatch?


Well spotted. Marcel, please correct this issue and check for other 
warnings. Unless libiberty is somehow exempt from -Werror, this should 
have shown up in a bootstrap.
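
A minimal sketch of a consistent variant (untested; either keep the wider type
with a matching format, as below, or switch both the parameter and the format
to int/%d -- the call that appends buf is elided rather than guessing at the
exact helper name):

static inline void
d_append_num (struct d_print_info *dpi, long l)
{
  char buf[25];
  sprintf (buf, "%ld", l);      /* type and format now agree */
  /* ... append buf to dpi exactly as the original function does ... */
}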



Bernd



Re: a patch for PR68695

2016-04-01 Thread Vladimir Makarov

On 03/30/2016 05:23 PM, Christophe Lyon wrote:

On 29 March 2016 at 18:28, Vladimir Makarov  wrote:

   The following patch improves the code in 2 out of 3 cases in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695

   The patch uses more accurate costs for the RA cost improvement
optimization after colouring.

   The patch was tested and bootstrapped on x86-64.  It is hard to create a
test to check the correct code generation.  Therefore there is no test.  As
the patch changes heuristics, the generated code in some cases will be
different, but at least the x86-64 tests expecting specific code are not broken
by the patch.

   Committed as rev.  234527


Hi,

I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm:
   gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue
"Performing shrink-wrapping"
   gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"


Thanks, Christophe.  I am seeing them too on today's trunk.  I'll 
investigate this more.




Re: [PATCH] Improve add/sub TImode double word splitters (PR rtl-optimization/70467)

2016-04-01 Thread Uros Bizjak
On Fri, Apr 1, 2016 at 3:23 PM, Jakub Jelinek  wrote:
> Hi!
>
> The previous patch apparently isn't enough for TImode, because
> we don't even allow the CONST_WIDE_INT operands in there, it uses
> "e" constraint and similar predicate.  All we care about is that
> both of the words of the argument can be expressed as addq/adcq/subq/sbbq
> immediates, so this patch adds new predicates and new constraint for
> that purpose.  Suggestions for better names for those appreciated.

Maybe simply x86_64_hilo_int_operand, x86_64_hilo_general_operand and
 to emphasize that the operand is specialized
for correct high/low part?

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for stage1
> (while the previous patch looks simple enough that I'd like to see it in
> 6.x, this one IMHO can wait).

Yes, please. This is not a regression.

> 2016-04-01  Jakub Jelinek  
>
> PR rtl-optimization/70467
> * config/i386/predicates.md (x86_64_double_int_operand,
> x86_64_double_general_operand): New predicates.
> * config/i386/constraints.md (Wd): New constraint.
> * config/i386/i386.md (mode attr di): Use Wd instead of e.
> (double_general_operand): New mode attr.
> (add3, sub3): Use 
> instead of .
> (*add3_doubleword, *sub3_doubleword): Use
> x86_64_double_general_operand instead of .
>
> * gcc.target/i386/pr70467-3.c: New test.
> * gcc.target/i386/pr70467-4.c: New test.

OK.

Thanks,
Uros.
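
As a small illustration of the case the new predicate and constraint target
(my own example, assuming x86-64 and -O2, not taken from the patch): a 128-bit
constant whose low and high 64-bit halves each satisfy the sign-extended
32-bit "e" constraint can stay an immediate in the doubleword splitter instead
of being forced into a register pair first.

/* Both halves (456 and 123) are valid addq/adcq immediates, so the
   doubleword add can be split into addq $456 / adcq $123 style code.  */
__int128
add_const (__int128 x)
{
  return x + ((((__int128) 123) << 64) | 456);
}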

> --- gcc/config/i386/predicates.md.jj2016-01-07 09:42:39.0 +0100
> +++ gcc/config/i386/predicates.md   2016-04-01 11:48:17.640386878 +0200
> @@ -332,6 +332,27 @@ (define_predicate "x86_64_zext_immediate
>return false;
>  })
>
> +;; Return true if VALUE is a constant integer whose low and high words 
> satisfy
> +;; x86_64_immediate_operand.
> +(define_predicate "x86_64_double_int_operand"
> +  (match_code "const_int,const_wide_int")
> +{
> +  switch (GET_CODE (op))
> +{
> +case CONST_INT:
> +  return x86_64_immediate_operand (op, mode);
> +case CONST_WIDE_INT:

For now, we can get away with:

  gcc_assert (CONST_WIDE_INT_NUNITS (op) == 2);

  return (x86_64_immediate_operand (GEN_INT (CONST_WIDE_INT_ELT (op, 0)))
	  && x86_64_immediate_operand (GEN_INT (CONST_WIDE_INT_ELT (op, 1))));

This approach is used in several targets, not to mention rtlanal.c ;)

> +  return (x86_64_immediate_operand (simplify_subreg (DImode, op, TImode,
> +0), DImode)
> + && x86_64_immediate_operand (simplify_subreg (DImode, op, 
> TImode,
> +   8), DImode));
> +
> +default:
> +  gcc_unreachable ();
> +}
> +})
> +
>  ;; Return true if size of VALUE can be stored in a sign
>  ;; extended immediate field.
>  (define_predicate "x86_64_immediate_size_operand"
> @@ -347,6 +368,14 @@ (define_predicate "x86_64_general_operan
>  (match_operand 0 "x86_64_immediate_operand"))
>  (match_operand 0 "general_operand")))
>
> +;; Return true if OP's both words are general operands representable
> +;; on x86_64.
> +(define_predicate "x86_64_double_general_operand"
> +  (if_then_else (match_test "TARGET_64BIT")
> +(ior (match_operand 0 "nonimmediate_operand")
> +(match_operand 0 "x86_64_double_int_operand"))
> +(match_operand 0 "general_operand")))
> +
>  ;; Return true if OP is non-VOIDmode general operand representable
>  ;; on x86_64.  This predicate is used in sign-extending conversion
>  ;; operations that require non-VOIDmode immediate operands.
> --- gcc/config/i386/constraints.md.jj   2016-01-29 21:32:56.0 +0100
> +++ gcc/config/i386/constraints.md  2016-04-01 11:19:44.633921527 +0200
> @@ -266,6 +266,11 @@ (define_constraint "Wz"
>(and (match_operand 0 "x86_64_zext_immediate_operand")
> (match_test "GET_MODE (op) != VOIDmode")))
>
> +(define_constraint "Wd"
> +  "128-bit integer constant where both the high and low 64-bit word
> +   of it satisfies the e constraint."
> +  (match_operand 0 "x86_64_double_int_operand"))
> +
>  (define_constraint "Z"
>"32-bit unsigned integer constant, or a symbolic reference known
> to fit that range (for immediate operands in zero-extending x86-64
> --- gcc/config/i386/i386.md.jj  2016-03-31 17:33:36.0 +0200
> +++ gcc/config/i386/i386.md 2016-04-01 11:29:40.705729897 +0200
> @@ -1071,7 +1071,7 @@ (define_mode_attr i [(QI "n") (HI "n") (
>  (define_mode_attr g [(QI "qmn") (HI "rmn") (SI "rme") (DI "rme")])
>
>  ;; Immediate operand constraint for double integer modes.
> -(define_mode_attr di [(SI "nF") (DI "e")])
> +(define_mode_attr di [(SI "nF") (DI "Wd")])
>
>  ;; Immediate operand constraint for shifts.
>  (define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
> @@ -1084,6 +1084,15 @@ (define_mode_attr general_operand
>  (DI "x86_64_general_operand")
>  

Re: Patches to fix optimizer bug & C++ exceptions for GCC VAX backend

2016-04-01 Thread Jake Hamby
Hi,

I apologize for the poor quality of the initial version of the patch that I 
submitted. I think I may have also mangled it by not disabling the "smart 
quotes" feature on my Mac before I pasted in the diff from the terminal window. 
I intentionally did not use gmail for fear of adding word wraps or converting 
tabs to spaces, but apparently I mangled the patch anyway. I emailed Christos a 
.tar.gz version separately.

Yes, the file you highlighted is definitely not correct and I need to figure 
out how to fix it properly. For some reason the optimizer is deleting the 
emit_move_insn() on VAX, while it seems to work on all the other platforms that 
define EH_RETURN_HANDLER_RTX() and depend on that instruction. So I'm looking 
into fixing it in gcc/config/vax/something. My next step to try to figure out 
what's going on is to dump the trees for all the phases when building 
unwind-dw2.o (the only file where __builtin_eh_return() has to work in GCC when 
libgcc is compiled in order for C++ exceptions to work) with and without "-O", 
and figure out when the instruction is being deleted and why. This only affects 
functions that call __builtin_eh_return() instead of return, but I think 
cc1plus may also need it to be defined correctly for some code that it 
generates.
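
A sketch of that, assuming a build-tree compiler (the exact flags and paths
will differ for the NetBSD cross setup):

  ./xgcc -B. -O -S -fdump-tree-all -fdump-rtl-all unwind-dw2.c

Diffing the per-pass tree and RTL dumps against a -O0 run should then show
which pass drops the move emitted for EH_RETURN_HANDLER_RTX.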

My theory is that it has to do with liveness detection and a write into the 
stack frame being incorrectly seen as updating a local variable, but that could 
be completely wrong. I was hoping that by cc'ing gcc-patches, somebody more 
familiar with the optimizer could explain why some phase might decide to delete 
this particular insn, which copies data into the stack (to overwrite the return 
address, e.g. to move to EH_RETURN_HANDLER_RTX) immediately before returning.

So far I haven't found any actual bugs in GCC that should be fixed. Perhaps it 
isn't correct to check in a hack like the change to gcc/except.c into 
netbsd-current except temporarily, until there's a correct fix for that part of 
the issue, which is what I'd like to figure out. In the meantime, I would 
highly recommend adding an #ifdef __vax around that line to avoid trouble with 
the other ports.

Now that I think about it, please do not check in the patch to 
gcc/dist/gcc/except.c without an #ifdef __vax, and certainly this isn't ready 
to go into GCC mainline. But I think it will be ready with a few more 
adjustments. The important thing is that I think most, if not all of the 
necessary fixes will be entirely modifications to VAX-related files that can be 
locally patched in NetBSD regardless of whether the GCC maintainers accept them 
or not.

To be honest, my hope in sending out my work now, even in such a rough state, 
is to avoid deprecating the GCC port to VAX, for three reasons:

1) There doesn't seem to be a compelling reason to remove support for it: it 
doesn't use any really old code paths that aren't also used by other backends 
(e.g. m68k and Atmel AVR use cc0, IBM S/390 uses non-IEEE FP formats), so as 
far as I can see it isn't preventing any optimizations or code refactoring 
elsewhere in GCC.

2) Even though NetBSD could continue to support VAX GCC on its own, I noticed 
in the ChangeLogs that whenever somebody changes a definition that affects the 
backends, they're usually very good about updating all of the backends so that 
they at least continue to compile. So leaving the VAX backend in the tree 
would be beneficial from a maintenance standpoint for its users.

3) The VAX backend is somewhat useful as a test case for GCC, because so many 
unusual RTX standard instructions were obviously defined *for* it (although 
x86 defines a lot of them, too). My own interest, though, is in preserving an 
interesting piece of computer history, and in nostalgia.

I sent an earlier email to port-vax suggesting that future discussions of this 
project aren't relevant to gcc-patches, but I did want to get it on a few 
people's radar on the NetBSD side and try to solicit a bit of help on the 
questions I had as to how to avoid having to add that hack to gcc/except.c to 
make the optimizer not delete the insns. All of the other stuff can be worked 
on in NetBSD-current and avoid bothering the 99% of people who subscribe to 
gcc-patches who have no interest in the VAX backend.

Best regards,
Jake


> On Apr 1, 2016, at 04:37, Bernd Schmidt  wrote:
> 
> Cc'ing Matt Thomas who is listed as the vax maintainer; most of the patch 
> should be reviewed by him IMO. If he is no longer active I'd frankly rather 
> deprecate the port rather than invest effort in keeping it running.
> 
>> Index: gcc/except.c
>> ===
>> RCS file: /cvsroot/src/external/gpl3/gcc/dist/gcc/except.c,v
>> retrieving revision 1.3
>> diff -u -r1.3 except.c
>> --- gcc/except.c 23 Mar 2016 15:51:36 -  1.3
>> +++ gcc/except.c 28 Mar 2016 23:24:40 -
>> @@ -2288,7 +2288,8 @@
>>  #en

[PATCH] Fix PR c++/70452 (regression in C++ parsing performance)

2016-04-01 Thread Patrick Palka
Currently during constexpr CALL_EXPR evaluation we create a new copy of
the callee function's body for each separate call with no attempt made
at reusing the function body.  So when a function ends up getting called
10s of thousands of times during constexpr evaluation, we end up
creating 10s of thousands of copies of the function.

This patch is an attempt at reusing these copies instead of having a new
copy get created each call.  It implements a per-function freelist of
the unshared body/parm/res copies that were created by copy_fn().  When
a constexpr call is finished it pushes the body/parm/res trees to the
freelist, and before a call is evaluated it tries to reuse the trees
from the freelist, falling back to copy_fn() only if the freelist is
empty.

Consequently, instead of creating 10s of thousands of copies for 10s of
thousands of calls, the number of copies that're made corresponds to the
recursion depth of the function.  This makes sense because the reason we
create copies in the first place (IIUC) is to ensure that recursive
calls to a function do not share the same
PARM_DECLs/VAR_DECLs/RESULT_DECLs.

In order for this reuse to work properly in the presence of SAVE_EXPRs,
we have to track each SAVE_EXPR (belonging to the callee) that we
evaluate during the call and then forget its value after the call.
constexpr-vla4.C is a new test that would fail if this patch had missed
this SAVE_EXPR part.

With this patch, the compile time of the test case in the PR gets cut
from 3.5s to 2s and memory usage gets cut from 550M to 200M.

I bootstrapped + regtested this change on x86_64-pc-linux-gnu, and also
compile-tested it against boost with no regressions.

gcc/cp/ChangeLog:

PR c++/70452
* constexpr.c (struct bpr_entry): New struct.
(struct constexpr_fundef): New field bpr_freelist.
(retrieve_constexpr_fundef): Initialize field bpr_freelist to
NULL.
(register_constexpr_fundef): Likewise.
(cxx_eval_call_expression): Check constexpr_fundef::bpr_freelist
for a reusable copy of the function's body before creating a new
copy.  Keep track of the callee's SAVE_EXPRs that we evaluate
and forget their values afterwards. Push the function body we
used onto the freelist for later reuse.

gcc/testsuite/ChangeLog:

PR c++/70452
* g++.dg/ext/constexpr-vla4.C: New test.
---
 gcc/cp/constexpr.c| 55 ---
 gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++
 2 files changed, 67 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ea605dc..fc9b67c 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -112,11 +112,24 @@ ensure_literal_type_for_constexpr_object (tree decl)
   return decl;
 }
 
+/* The representation of a single node in the constexpr_fundef::bpr_freelist 
chain.  */
+
+struct GTY((chain_next ("%h.prev"))) bpr_entry
+{
+  tree body;
+  tree parms;
+  tree res;
+  struct bpr_entry *prev;
+};
+
 /* Representation of entries in the constexpr function definition table.  */
 
 struct GTY((for_user)) constexpr_fundef {
   tree decl;
   tree body;
+  /* A chain of unused copies of this function's body awaiting reuse for
+ CALL_EXPR evaluation.  */
+  struct bpr_entry *bpr_freelist;
 };
 
 struct constexpr_fundef_hasher : ggc_ptr_hash
@@ -154,7 +167,7 @@ constexpr_fundef_hasher::hash (constexpr_fundef *fundef)
 static constexpr_fundef *
 retrieve_constexpr_fundef (tree fun)
 {
-  constexpr_fundef fundef = { NULL, NULL };
+  constexpr_fundef fundef = { NULL, NULL, NULL };
   if (constexpr_fundef_table == NULL)
 return NULL;
 
@@ -806,6 +819,7 @@ register_constexpr_fundef (tree fun, tree body)
 
   entry.decl = fun;
   entry.body = body;
+  entry.bpr_freelist = NULL;
   slot = constexpr_fundef_table->find_slot (&entry, INSERT);
 
   gcc_assert (*slot == NULL);
@@ -1377,10 +1391,21 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
tree t,
   if (!result || result == error_mark_node)
{
  gcc_assert (DECL_SAVED_TREE (fun));
- tree parms, res;
+ tree body, parms, res;
 
- /* Unshare the whole function body.  */
- tree body = copy_fn (fun, parms, res);
+ /* Reuse or create a new unshared copy of this function's body.  */
+ if (bpr_entry *bpr = new_call.fundef->bpr_freelist)
+   {
+ body = bpr->body;
+ parms = bpr->parms;
+ res = bpr->res;
+ new_call.fundef->bpr_freelist = bpr->prev;
+   }
+ else
+   {
+ /* Unshare the whole function body.  */
+ body = copy_fn (fun, parms, res);
+   }
 
  /* Associate the bindings with the remapped parms.  */
  tree bound = new_call.bindings;
@@ -1409,11 +1434,22 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
tree t

[C++ PATCH] Reject self-recursive constexpr calls even in templates (PR c++/70449)

2016-04-01 Thread Jakub Jelinek
Hi!

As the testcase shows, when not in a template, cxx_eval_call_expression
already complains about self-recursive calls in constexpr contexts,
but if we are in a function template, we ICE on the testcase,
because we try to instantiate the function template we are in the middle of
parsing, e.g. function_end_locus is UNKNOWN_LOCATION, and only the
statements that have been already parsed are in there.

The patch attempts to reject that.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-04-01  Jakub Jelinek  

PR c++/70449
* constexpr.c (cxx_eval_call_expression): Before calling
instantiate_decl check if not trying to instantiate current
function template and handle that the same as if
fun == current_function_decl.

* g++.dg/cpp1y/pr70449.C: New test.

--- gcc/cp/constexpr.c.jj   2016-03-29 19:31:21.0 +0200
+++ gcc/cp/constexpr.c  2016-04-01 16:26:53.591088640 +0200
@@ -1293,6 +1293,27 @@ cxx_eval_call_expression (const constexp
   if (!DECL_INITIAL (fun)
   && DECL_TEMPLOID_INSTANTIATION (fun))
 {
+  tree d = fun;
+  if (DECL_CLONED_FUNCTION_P (d))
+   d = DECL_CLONED_FUNCTION (d);
+  d = template_for_substitution (d);
+  if (DECL_TEMPLATE_RESULT (d) == current_function_decl)
+   {
+ /* A call to the current function template, i.e.
+template 
+constexpr int f (int i) {
+  constexpr int j = f(i-1);
+  return j;
+}
+This would be OK without the constexpr on the declaration
+of j.  */
+ if (!ctx->quiet)
+   error_at (loc, "%qD called in a constant expression before its "
+  "definition is complete", fun);
+ *non_constant_p = true;
+ return t;
+   }
+
   ++function_depth;
   instantiate_decl (fun, /*defer_ok*/false, /*expl_inst*/false);
   --function_depth;
--- gcc/testsuite/g++.dg/cpp1y/pr70449.C.jj 2016-04-01 16:42:55.055190752 
+0200
+++ gcc/testsuite/g++.dg/cpp1y/pr70449.C2016-04-01 16:43:43.207545314 
+0200
@@ -0,0 +1,26 @@
+// PR c++/70449
+// { dg-do compile { target c++14 } }
+// { dg-options "-Wall" }
+
+template 
+constexpr int f1 ()
+{
+  enum E { a = f1<0> () }; // { dg-error "called in a constant expression 
before its definition is complete|is not an integer constant" }
+  return 0;
+}
+
+template 
+constexpr int f2 ()
+{
+  enum E { a = f2<0> () };
+  return 0;
+}
+
+constexpr int f3 ()
+{
+  enum E { a = f3 () };// { dg-error "called in a constant expression 
before its definition is complete|is not an integer constant" }
+  return 0; 
+}
+
+constexpr int c = f1<0> ();
+constexpr int d = f3 ();

Jakub


Re: [PATCH] Fix PR c++/70452 (regression in C++ parsing performance)

2016-04-01 Thread Patrick Palka
On Fri, Apr 1, 2016 at 3:13 PM, Patrick Palka  wrote:
> Currently during constexpr CALL_EXPR evaluation we create a new copy of
> the callee function's body for each separate call with no attempt made
> at reusing the function body.  So when a function ends up getting called
> 10s of thousands of times during constexpr evaluation, we end up
> creating 10s of thousands of copies of the function.
>
> This patch is an attempt at reusing these copies instead of having a new
> copy get created each call.  It implements a per-function freelist of
> the unshared body/parm/res copies that were created by copy_fn().  When
> a constexpr call is finished it pushes the body/parm/res trees to the
> freelist, and before a call is evaluated it tries to reuse the trees
> from the freelist, falling back to copy_fn() only if the freelist is
> empty.
>
> Consequently, instead of creating 10s of thousands of copies for 10s of
> thousands of calls, the number of copies that're made corresponds to the
> recursion depth of the function.  This makes sense because the reason we
> create copies in the first place (IIUC) is to ensure that recursive
> calls to a function do not share the same
> PARM_DECLs/VAR_DECLs/RESULT_DECLs.
>
> In order for this reuse to work properly in the presence of SAVE_EXPRs,
> we have to track each SAVE_EXPR (belonging to the callee) that we
> evaluate during the call and then forget its value after the call.
> constexpr-vla4.C is a new test that would fail if this patch had missed
> this SAVE_EXPR part.
>
> With this patch, the compile time of the test case in the PR gets cut
> from 3.5s to 2s and memory usage gets cut from 550M to 200M.
>
> I bootstrapped + regtested this change on x86_64-pc-linux-gnu, and also
> compile-tested it against boost with no regressions.
>
> gcc/cp/ChangeLog:
>
> PR c++/70452
> * constexpr.c (struct bpr_entry): New struct.
> (struct constexpr_fundef): New field bpr_freelist.
> (retrieve_constexpr_fundef): Initialize field bpr_freelist to
> NULL.
> (register_constexpr_fundef): Likewise.
> (cxx_eval_call_expression): Check constexpr_fundef::bpr_freelist
> for a reusable copy of the function's body before creating a new
> copy.  Keep track of the callee's SAVE_EXPRs that we evaluate
> and forget their values afterwards. Push the function body we
> used onto the freelist for later reuse.
>
> gcc/testsuite/ChangeLog:
>
> PR c++/70452
> * g++.dg/ext/constexpr-vla4.C: New test.
> ---
>  gcc/cp/constexpr.c| 55 
> ---
>  gcc/testsuite/g++.dg/ext/constexpr-vla4.C | 17 ++
>  2 files changed, 67 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/constexpr-vla4.C
>
> diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
> index ea605dc..fc9b67c 100644
> --- a/gcc/cp/constexpr.c
> +++ b/gcc/cp/constexpr.c
> @@ -112,11 +112,24 @@ ensure_literal_type_for_constexpr_object (tree decl)
>return decl;
>  }
>
> +/* The representation of a single node in the constexpr_fundef::bpr_freelist 
> chain.  */
> +
> +struct GTY((chain_next ("%h.prev"))) bpr_entry
> +{
> +  tree body;
> +  tree parms;
> +  tree res;
> +  struct bpr_entry *prev;
> +};
> +
>  /* Representation of entries in the constexpr function definition table.  */
>
>  struct GTY((for_user)) constexpr_fundef {
>tree decl;
>tree body;
> +  /* A chain of unused copies of this function's body awaiting reuse for
> + CALL_EXPR evaluation.  */
> +  struct bpr_entry *bpr_freelist;
>  };
>
>  struct constexpr_fundef_hasher : ggc_ptr_hash
> @@ -154,7 +167,7 @@ constexpr_fundef_hasher::hash (constexpr_fundef *fundef)
>  static constexpr_fundef *
>  retrieve_constexpr_fundef (tree fun)
>  {
> -  constexpr_fundef fundef = { NULL, NULL };
> +  constexpr_fundef fundef = { NULL, NULL, NULL };
>if (constexpr_fundef_table == NULL)
>  return NULL;
>
> @@ -806,6 +819,7 @@ register_constexpr_fundef (tree fun, tree body)
>
>entry.decl = fun;
>entry.body = body;
> +  entry.bpr_freelist = NULL;
>slot = constexpr_fundef_table->find_slot (&entry, INSERT);
>
>gcc_assert (*slot == NULL);
> @@ -1377,10 +1391,21 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
> tree t,
>if (!result || result == error_mark_node)
> {
>   gcc_assert (DECL_SAVED_TREE (fun));
> - tree parms, res;
> + tree body, parms, res;
>
> - /* Unshare the whole function body.  */
> - tree body = copy_fn (fun, parms, res);
> + /* Reuse or create a new unshared copy of this function's body.  */
> + if (bpr_entry *bpr = new_call.fundef->bpr_freelist)
> +   {
> + body = bpr->body;
> + parms = bpr->parms;
> + res = bpr->res;
> + new_call.fundef->bpr_freelist = bpr->prev;
> +   }
> + else
> +   {
> + /* Unshare 

Re: [AArch64] Add more precision choices for the reciprocal square root approximation

2016-04-01 Thread Evandro Menezes

[AArch64] Add more choices for the reciprocal square root
   approximation

Allow a target to prefer such an operation depending on the
   operation mode.

gcc/
* config/aarch64/aarch64-protos.h
(AARCH64_APPROX_MODE): New macro.
(AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,ALL}): Likewise.
(tune_params): New member "approx_rsqrt_modes".
* config/aarch64/aarch64-tuning-flags.def
(AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
* config/aarch64/aarch64.c
(generic_tunings): New member "approx_rsqrt_modes".
(cortexa35_tunings): Likewise.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(exynosm1_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(use_rsqrt_p): New argument for the mode and use new member
"approx_rsqrt_modes" from "tune_params".
(aarch64_builtin_reciprocal): Devise mode from builtin.
(aarch64_optab_supported_p): New argument for the mode.

This patch allows a target to choose the mode of this operation when it 
is beneficial to use the approximate version.


I hope that this gets in the ballpark of what's been discussed previously.

Thank you,

--
Evandro Menezes

>From 17ac33719bae8966a481cc833c9ac062f7fb742f Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Thu, 3 Mar 2016 18:13:46 -0600
Subject: [PATCH] [AArch64] Add more choices for the reciprocal square root
 approximation

Allow a target to prefer such an operation depending on the operation mode.

gcc/
	* config/aarch64/aarch64-protos.h
	(AARCH64_APPROX_MODE): New macro.
	(AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,ALL}): Likewise.
	(tune_params): New member "approx_rsqrt_modes".
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNE_APPROX_RSQRT): Remove macro.
	* config/aarch64/aarch64.c
	(generic_tunings): New member "approx_rsqrt_modes".
	(cortexa35_tunings): Likewise.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(cortexa72_tunings): Likewise.
	(exynosm1_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(use_rsqrt_p): New argument for the mode and use new member
	"approx_rsqrt_modes" from "tune_params".
	(aarch64_builtin_reciprocal): Devise mode from builtin.
	(aarch64_optab_supported_p): New argument for the mode.
---
 gcc/config/aarch64/aarch64-protos.h | 21 
 gcc/config/aarch64/aarch64-tuning-flags.def |  2 --
 gcc/config/aarch64/aarch64.c| 39 ++---
 3 files changed, 46 insertions(+), 16 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 58c9d0d..46f4f18 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,26 @@ struct cpu_branch_cost
   const int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
 };
 
+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
+ : (0))
+#define AARCH64_APPROX_NONE (0)
+#define AARCH64_APPROX_SP (AARCH64_APPROX_MODE (SFmode) \
+			   | AARCH64_APPROX_MODE (V2SFmode) \
+			   | AARCH64_APPROX_MODE (V4SFmode))
+#define AARCH64_APPROX_DP (AARCH64_APPROX_MODE (DFmode) \
+			   | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_DFORM (AARCH64_APPROX_MODE (SFmode) \
+			  | AARCH64_APPROX_MODE (DFmode) \
+			  | AARCH64_APPROX_MODE (V2SFmode))
+#define AARCH64_APPROX_QFORM (AARCH64_APPROX_MODE (V4SFmode) \
+			  | AARCH64_APPROX_MODE (V2DFmode))
+#define AARCH64_APPROX_ALL (-1)
+
 struct tune_params
 {
   const struct cpu_cost_table *insn_extra_cost;
@@ -218,6 +238,7 @@ struct tune_params
   } autoprefetcher_model;
 
   unsigned int extra_tuning_flags;
+  unsigned int approx_rsqrt_modes;
 };
 
 #define AARCH64_FUSION_PAIR(x, name) \
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 7e45a0c..048c2a3 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -29,5 +29,3 @@
  AARCH64_TUNE_ to give an enum name. */
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
-AARCH64_EXTRA_TUNING_OPTION ("approx_rsqrt", APPROX_RSQRT)
-
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b7086dd..b0ee11e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -38,6 +38,7 @@
 #include "recog.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
+#include "insn-modes.h"
 #include "

Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Evandro Menezes

On 04/01/16 08:58, Wilco Dijkstra wrote:

Evandro Menezes wrote:
On 03/23/16 11:24, Evandro Menezes wrote:

On 03/17/16 15:09, Evandro Menezes wrote:

This patch implements FP division by an approximation using the Newton
series.

With this patch, DF division is sped up by over 100% and SF division,
zilch, both on A57 and on M1.

Mentioning throughput is not useful given that the vectorized single precision
case will give most of the speedup in actual code.


 gcc/
 * config/aarch64/aarch64-tuning-flags.def
 (AARCH64_EXTRA_TUNE_APPROX_DIV_{SF,DF}: New tuning macros.
 * config/aarch64/aarch64-protos.h
 (AARCH64_EXTRA_TUNE_APPROX_DIV): New macro.
 (aarch64_emit_approx_div): Declare new function.
 * config/aarch64/aarch64.c
 (aarch64_emit_approx_div): Define new function.
 * config/aarch64/aarch64.md ("div3"): New expansion.
 * config/aarch64/aarch64-simd.md ("div3"): Likewise.


This version of the patch cleans up the changes to the MD files and
optimizes the division when the numerator is 1.0.

Adding support for plain recip is good. Having the enabling logic no longer in
the md file is an improvement, but I don't believe adding tuning flags for the
inner mode is correct - we need a more generic solution like I mentioned in my
other mail.

The division variant should use the same latency reduction trick I mentioned 
for sqrt.


Wilco,

I don't think that it applies here, since it doesn't have to deal with 
special cases.


As for the finer grained flags, I'll wait for the feedback on 
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00089.html


Thank you,

--
Evandro Menezes



Re: [AArch64] Fix SIMD predicate

2016-04-01 Thread Evandro Menezes

On 03/31/16 04:52, James Greenhalgh wrote:

On Wed, Mar 30, 2016 at 11:18:27AM -0500, Evandro Menezes wrote:

Add scalar 0.0 to the aarch64_simd_reg_or_zero predicate.

2016-03-30  Evandro Menezes  

 * gcc/config/aarch64/predicates.md
 (aarch64_simd_reg_or_zero predicate): Add the "const_double"
constraint.


It seems to me that the aarch64_simd_reg_or_zero should also handle
the scalar constant 0.0 as well.

It took me an extra few minutes to figure out why this patch was correct - a
more detailed description of the code-gen issue this was intended to
fix would have helped. The only pattern I can see for which this matters
is aarch64_cm for the SF and DF modes. Clearly the predicate
is too tight here, and the relaxation you propose is correct.


I'm sorry.  Indeed, I meant this change for that very pattern for the 
scalar FP version, since it's the only instance when this predicate is 
used along with the constraint Y.  Without this patch, this constraint 
never matches.



OK to commit?

OK, and low-risk enough to take now.


Bootstrapped and checked on aarch64-unknown-linux-gnu.  Committed as 
r234685.


Thank you,

--
Evandro Menezes



Re: Various selective scheduling fixes

2016-04-01 Thread Christophe Lyon
On 1 April 2016 at 18:19, Jeff Law  wrote:
> On 04/01/2016 07:26 AM, Christophe Lyon wrote:
>>
>> On 1 April 2016 at 15:12, Kyrill Tkachov 
>> wrote:
>>>
>>> Hi Christophe, Andrey,
>>>
>>>
>>> On 01/04/16 14:09, Christophe Lyon wrote:


 On 1 April 2016 at 10:54, Andrey Belevantsev  wrote:
>
>
> Hi Christophe,
>
>
> On 01.04.2016 10:33, Christophe Lyon wrote:
>>
>>
>> On 31 March 2016 at 16:43, Andrey Belevantsev  wrote:
>>>
>>>
>>> Hello,
>>>
>>> On 14.03.2016 12:10, Andrey Belevantsev wrote:



 Hello,

 In this thread I will be posting the patches for the fixed selective
 scheduling PRs (except the one that was already kindly checked in by
 Jeff).
The patches were tested both on x86-64 and ia64 with the
 following
 combination: 1) the usual bootstrap/regtest, which only utilizes
 sel-sched
 on its own tests, made by default to run on arm/ppc/x86-64/ia64; 2)
 the
 bootstrap/regtest with the second scheduler forced to sel-sched; 3)
 both
 schedulers forced to sel-sched.  In all cases everything seemed to
 be
 fine.

 Three of the PRs are regressions, the other two showed different
 errors
 across the variety of releases tested by submitters;  I think all of
 them
 are appropriate at this stage -- they do not touch anything outside
 of
 selective scheduling except the first patch where a piece of code
 from
 sched-deps.c needs to be refactored into a function to be called
 from
 sel-sched.c.
>>>
>>>
>>>
>>>
>>> I've backported all regression PRs to gcc-5-branch after testing
>>> there
>>> again
>>> with selective scheduling force enabled: PRs 64411, 0, 69032,
>>> 69102.
>>> The first one was not marked as a regression as such but the test for
>>> PR
>>> 70292, which is a duplicate, works for me on gcc 5.1 thus making it a
>>> regression, too.
>>>
>> Hi,
>>
>> The backport for pr69102 shows that the new testcase fails to compile
>> (ICE)
>> when GCC is configured as:
>>
>> --target=arm-none-linux-gnueabihf --with-float=hard --with-mode=arm
>> --with-cpu=cortex-a15 --with-fpu=neon-vfpv4
>>
>>
>>
>>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:
>> In function 'foo':
>>
>>
>>
>> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.c-torture/compile/pr69102.c:21:1:
>> internal compiler error: Segmentation fault
>> 0xa64d15 crash_signal
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/toplev.c:383
>> 0xfa41d7 autopref_multipass_dfa_lookahead_guard(rtx_insn*, int)
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/haifa-sched.c:5752
>> 0xa31cd2 invoke_dfa_lookahead_guard
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4212
>> 0xa31cd2 find_best_expr
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:4415
>> 0xa343fb fill_insns
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:5570
>> 0xa343fb schedule_on_fences
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7395
>> 0xa36010 sel_sched_region_2
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7533
>> 0xa36f2a sel_sched_region_1
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7575
>> 0xa36f2a sel_sched_region(int)
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7676
>> 0xa37589 run_selective_scheduling()
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sel-sched.c:7752
>> 0xa14aed rest_of_handle_sched2
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3647
>> 0xa14aed execute
>>   /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/sched-rgn.c:3791
>>
>> See
>>
>>
>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234625/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a15-neon-vfpv4.txt
>>
>> Can you have a look?
>
>
>
> That's because A15 is the only place which enables
> autopref_multipass_dfa_lookahead_guard as the DFA lookahead guard hook.
> But
> autoprefetch modeling doesn't work for selective scheduling, it uses
> haifa
> structures that are not kept up to date during sel-sched.  So this is
> not
> supposed to work as soon as the param value for prefetcher lookahead
> depth
> is positive.
>
> The following patch works for me.  Could you check it with your
> testing?
> If
> it works fine for you, I would install the patch both for trunk and
> gcc-5.
> It would be great to force sel-sched 

Re: a patch for PR68695

2016-04-01 Thread Vladimir Makarov

On 03/30/2016 05:23 PM, Christophe Lyon wrote:

On 29 March 2016 at 18:28, Vladimir Makarov  wrote:

   The following patch improves the code in 2 out of 3 cases in

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695

   The patch uses more accurate costs for the RA cost improvement
optimization after colouring.

   The patch was tested and bootstrapped on x86-64.  It is hard to create a
test to check the correct code generation.  Therefore there is no test.  As
the patch changes heuristics, the generated code in some cases will be
different, but at least the x86-64 tests expecting specific code are not broken
by the patch.

   Committed as rev.  234527


Hi,

I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm:
   gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue
"Performing shrink-wrapping"
   gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"



I've checked the generated code.  RA with the patch generates better code 
for both tests, so the shrink-wrapping optimization no longer triggers.  The 
final code has one insn less for both tests when the patch is applied.


I guess it is wrong to write quality tests based on expected code generated 
before any optimization.  It makes sense only if we provide the same input.  
The LLVM testsuite mostly consists of such tests, as they have a readable IR.  
GCC unfortunately has no serialized and readable IR.  On the other hand, LLVM 
lacks integrated testing.


So I'd mark these tests as XFAIL or remove arm from the DejaGnu targets in 
the tests.





Re: a patch for PR68695

2016-04-01 Thread Jakub Jelinek
On Fri, Apr 01, 2016 at 04:26:41PM -0400, Vladimir Makarov wrote:
> >I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm:
> >   gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue
> >"Performing shrink-wrapping"
> >   gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing 
> > shrink-wrapping"
> >
> 
> I've checked the generated code.  RA with the patch generates better code
> for both tests, so the shrink-wrapping optimization no longer triggers.  The
> final code has one insn less for both tests when the patch is applied.
> 
> I guess it is wrong to write quality tests based on expected code generated
> before any optimization.  It makes sense only if we provide the same input.
> The LLVM testsuite mostly consists of such tests, as they have a readable IR.
> GCC unfortunately has no serialized and readable IR.  On the other hand, LLVM
> lacks integrated testing.
> 
> So I'd mark these tests as XFAIL or remove arm from the DejaGnu targets in
> the tests.

FYI, those 2 tests also now FAIL on ppc64{,le}-linux in addition to
armv7hl-linux-gnueabi.

Jakub


Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote:
> > The division variant should use the same latency reduction trick I 
> > mentioned for sqrt.
>
> I don't think that it applies here, since it doesn't have to deal with
> special cases.

No it applies as it's exactly the same calculation: x * rsqrt(y) and x * 
recip(y). In both
cases you don't need the final result of rsqrt(y) or recip(y), avoiding a 
multiply. 
Given these sequences are high latency this saving is actually quite important.

Wilco



Re: backported patch for PR69614

2016-04-01 Thread Vladimir Makarov

On 04/01/2016 01:39 PM, Christophe Lyon wrote:

On 1 April 2016 at 19:34, Vladimir Makarov  wrote:

Sorry, I tried to reproduce it on today's trunk on real hardware but I've
failed.


You have hardware running big-endian natively?

Oops, I missed that it is a big endian machine.  Thanks.



Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Evandro Menezes

On 04/01/16 16:22, Wilco Dijkstra wrote:

Evandro Menezes wrote:

The division variant should use the same latency reduction trick I mentioned 
for sqrt.

I don't think that it applies here, since it doesn't have to deal with
special cases.

No it applies as it's exactly the same calculation: x * rsqrt(y) and x * 
recip(y). In both
cases you don't need the final result of rsqrt(y) or recip(y), avoiding a 
multiply.
Given these sequences are high latency this saving is actually quite important.


Wilco,

In the case of sqrt(), handling the special case when the argument is 0.0 
is necessary in order to guarantee correctness.  Handling 
this special case hurts performance, which is where your suggestion helps.
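
A scalar sketch of that 0.0 issue (rsqrt_estimate() below is a stand-in name
for the hardware estimate, not an actual intrinsic):

/* sqrt(x) computed as x * rsqrt(x): at x == 0.0 the reciprocal-sqrt
   estimate is +Inf and 0 * Inf is NaN, so a select is needed to force
   the result back to 0.0.  */
float
approx_sqrt (float x)
{
  float e = rsqrt_estimate (x);       /* ~ 1/sqrt(x); +Inf when x == 0.0 */
  e = e * (3.0f - x * e * e) * 0.5f;  /* one Newton step; more for precision */
  float r = x * e;                    /* ~ sqrt(x), but NaN when x == 0.0 */
  return x == 0.0f ? 0.0f : r;        /* the special case discussed above */
}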


However, I don't think that there's the need to handle any special case 
for division.  The only case when the approximation differs from 
division is when the numerator is infinity and the denominator, zero, 
when the approximation returns infinity and the division, NAN.  So I 
don't think that it's a special case that deserves being handled.  IOW, 
the result of the approximate reciprocal is always needed.


Or am I missing something?

Thank you,

--
Evandro Menezes



[patch, fortran] PR68566 ICE on using unusable array in reshape

2016-04-01 Thread Jerry DeLisle
This problem occurs when array bounds are given by non-integer expressions
or otherwise bad arrays; it is not just related to reshape.

There are several test cases presented in the PR.  Most of these are fixed by
adding a check for any non-integer in match_array_element_spec.  The patch-let
in gfc_simplify_reshape avoids passing a NULL shape further into simplification.

I will add an additional test case for the original posted problem in the PR.
Two existing tests get exercised, changing the error message.  Finding the
problems earlier in the matchers is, I think, the right way to go. I am curious whether
the old checks ever get triggered (I will look into that a little later).

Regression tested on x86-64-linux.  OK for trunk?


2016-04-02  Jerry DeLisle  

PR fortran/68566
* array.c (match_array_element_spec): Add check for non-integer 
dimension given.
* simplify.c (gfc_simplify_reshape): If source shape is NULL return.
diff --git a/gcc/fortran/array.c b/gcc/fortran/array.c
index 2fc9dfa..57bdf7e 100644
--- a/gcc/fortran/array.c
+++ b/gcc/fortran/array.c
@@ -421,8 +421,12 @@ match_array_element_spec (gfc_array_spec *as)
   if (!gfc_expr_check_typed (*upper, gfc_current_ns, false))
 return AS_UNKNOWN;
 
-  if ((*upper)->expr_type == EXPR_FUNCTION && (*upper)->ts.type == BT_UNKNOWN
-  && (*upper)->symtree && strcmp ((*upper)->symtree->name, "null") == 0)
+  if (((*upper)->expr_type == EXPR_CONSTANT
+	&& (*upper)->ts.type != BT_INTEGER) ||
+  ((*upper)->expr_type == EXPR_FUNCTION
+	&& (*upper)->ts.type == BT_UNKNOWN
+	&& (*upper)->symtree
+	&& strcmp ((*upper)->symtree->name, "null") == 0))
 {
   gfc_error ("Expecting a scalar INTEGER expression at %C");
   return AS_UNKNOWN;
@@ -448,8 +452,12 @@ match_array_element_spec (gfc_array_spec *as)
   if (!gfc_expr_check_typed (*upper, gfc_current_ns, false))
 return AS_UNKNOWN;
 
-  if ((*upper)->expr_type == EXPR_FUNCTION && (*upper)->ts.type == BT_UNKNOWN
-  && (*upper)->symtree && strcmp ((*upper)->symtree->name, "null") == 0)
+  if (((*upper)->expr_type == EXPR_CONSTANT
+	&& (*upper)->ts.type != BT_INTEGER) ||
+  ((*upper)->expr_type == EXPR_FUNCTION
+	&& (*upper)->ts.type == BT_UNKNOWN
+	&& (*upper)->symtree
+	&& strcmp ((*upper)->symtree->name, "null") == 0))
 {
   gfc_error ("Expecting a scalar INTEGER expression at %C");
   return AS_UNKNOWN;
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index 12a8f32..a631010 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -5163,6 +5163,9 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp,
   || !is_constant_array_expr (order_exp))
 return NULL;
 
+  if (source->shape == NULL)
+return NULL;
+
   /* Proceed with simplification, unpacking the array.  */
 
   mpz_init (index);
diff --git a/gcc/testsuite/gfortran.dg/pr36192.f90 b/gcc/testsuite/gfortran.dg/pr36192.f90
index df3bfd7..d8b48f2 100644
--- a/gcc/testsuite/gfortran.dg/pr36192.f90
+++ b/gcc/testsuite/gfortran.dg/pr36192.f90
@@ -3,7 +3,7 @@
 !
 program three_body
   real, parameter :: n = 2, d = 2
-  real, dimension(n,d) :: x  ! { dg-error "of INTEGER type|of INTEGER type" }
-  x(1,:) = (/ 1.0, 0.0 /)
+  real, dimension(n,d) :: x  ! { dg-error "scalar INTEGER expression" }
+  x(1,:) = (/ 1.0, 0.0 /)! { dg-error "Unclassifiable statement" }
 end program three_body
-! { dg-prune-output "have constant shape" }
+
diff --git a/gcc/testsuite/gfortran.dg/real_dimension_1.f b/gcc/testsuite/gfortran.dg/real_dimension_1.f
index 73e9131..5fd200a 100644
--- a/gcc/testsuite/gfortran.dg/real_dimension_1.f
+++ b/gcc/testsuite/gfortran.dg/real_dimension_1.f
@@ -1,7 +1,7 @@
 ! { dg-do compile }
 ! PR 34305 - make sure there's an error message for specifying a
   program test
-  parameter (datasize = 1000) 
-  dimension idata (datasize)  ! { dg-error "must be of INTEGER type|must have constant shape" }
-  idata (1) = -1
+  parameter (datasize = 1000) ! Note that datasize is default type real
+  dimension idata (datasize)  ! { dg-error "Expecting a scalar INTEGER expression" }
   end
+


Re: [AArch64] Emit square root using the Newton series

2016-04-01 Thread Evandro Menezes

On 03/24/16 14:11, Evandro Menezes wrote:

On 03/17/16 17:46, Evandro Menezes wrote:
This patch refactors the function to emit the reciprocal square root 
approximation to also emit the square root approximation.
This version of the patch cleans up the changes to the MD files and 
fixes some bugs introduced in it since the first proposal.


[AArch64] Emit square root using the Newton series

2016-03-30  Evandro Menezes  
Wilco Dijkstra  

gcc/
* config/aarch64/aarch64-protos.h
(aarch64_emit_approx_rsqrt): Replace with new function
"aarch64_emit_approx_sqrt".
(AARCH64_APPROX_MODE): New macro.
   (AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}): Likewise.
(tune_params): New member "approx_sqrt_modes".
* config/aarch64/aarch64.c
(generic_tunings): New member "approx_rsqrt_modes".
(cortexa35_tunings): Likewise.
(cortexa53_tunings): Likewise.
(cortexa57_tunings): Likewise.
(cortexa72_tunings): Likewise.
(exynosm1_tunings): Likewise.
(thunderx_tunings): Likewise.
(xgene1_tunings): Likewise.
(aarch64_emit_approx_rsqrt): Replace with new function
"aarch64_emit_approx_sqrt".
(aarch64_override_options_after_change_1): Handle new option.
* config/aarch64/aarch64-simd.md
(rsqrt2): Use new function instead.
(sqrt2): New expansion and insn definitions.
* config/aarch64/aarch64.md: Likewise.
* config/aarch64/aarch64.opt
(mlow-precision-sqrt): Add new option description.
* doc/invoke.texi (mlow-precision-sqrt): Likewise.

This version of the patch uses the finer grained selection for the 
approximate sqrt() by the target first proposed at 
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00089.html


Additionally, I changed the handling of the special case when the 
argument is 0.0 for scalars to be the same as for vectors.  The reason 
is that by relying on the CC, a scarce resource, it hindered 
parallelism.  By using up an additional register to hold the mask also 
for scalars, the code is more... scalable.


Hopefully this patch gets close to what all have in mind.

Thank you,

--
Evandro Menezes

>From 6a508df89b9dde5506ec7c2fc40013850b1cd07c Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Thu, 17 Mar 2016 17:39:55 -0500
Subject: [PATCH] [AArch64] Emit square root using the Newton series

2016-03-30  Evandro Menezes  
Wilco Dijkstra  

gcc/
	* config/aarch64/aarch64-protos.h
	(aarch64_emit_approx_rsqrt): Replace with new function
	"aarch64_emit_approx_sqrt".
	(AARCH64_APPROX_MODE): New macro.
	(AARCH64_APPROX_{NONE,SP,DP,DFORM,QFORM,SCALAR,VECTOR,ALL}): Likewise.
	(tune_params): New member "approx_sqrt_modes".
	* config/aarch64/aarch64.c
	(generic_tunings): New member "approx_rsqrt_modes".
	(cortexa35_tunings): Likewise.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(cortexa72_tunings): Likewise.
	(exynosm1_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(aarch64_emit_approx_rsqrt): Replace with new function
	"aarch64_emit_approx_sqrt".
	(aarch64_override_options_after_change_1): Handle new option.
	* config/aarch64/aarch64-simd.md
	(rsqrt2): Use new function instead.
	(sqrt2): New expansion and insn definitions.
	* config/aarch64/aarch64.md: Likewise.
	* config/aarch64/aarch64.opt
	(mlow-precision-sqrt): Add new option description.
	* doc/invoke.texi (mlow-precision-sqrt): Likewise.
---
 gcc/config/aarch64/aarch64-protos.h |  28 -
 gcc/config/aarch64/aarch64-simd.md  |  13 +++-
 gcc/config/aarch64/aarch64.c| 114 +---
 gcc/config/aarch64/aarch64.md   |  11 +++-
 gcc/config/aarch64/aarch64.opt  |   9 ++-
 gcc/config/aarch64/predicates.md|   2 +-
 gcc/doc/invoke.texi |  10 
 7 files changed, 146 insertions(+), 41 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index dced209..055ba7a 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -178,6 +178,31 @@ struct cpu_branch_cost
   const int unpredictable;  /* Unpredictable branch or optimizing for speed.  */
 };
 
+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
+ : (0))
+#define AARCH64_APPROX_NONE (0)
+#define AARCH64_APPROX_SP (AARCH64_APPROX_MODE (SFmode) \
+			   | AARCH64_APPROX_MODE (V2SFmode) \
+			   | AARCH64_APPROX_MODE (V4SFmode))
+#define AARCH64_APPROX_DP (AARCH64_APPROX_MODE (DFmod

Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote:

> However, I don't think that there's the need to handle any special case
> for division.  The only case when the approximation differs from
> division is when the numerator is infinity and the denominator, zero,
> when the approximation returns infinity and the division, NAN.  So I
> don't think that it's a special case that deserves being handled.  IOW,
> the result of the approximate reciprocal is always needed.
 
No, the result of the approximate reciprocal is not needed. 

Basically a NR approximation produces a correction factor that is very close
to 1.0, and then multiplies that with the previous estimate to get a more
accurate estimate. The final calculation for x * recip(y) is:

result = (reciprocal_correction * reciprocal_estimate) * x

while what I am suggesting is a trivial reassociation:

result = reciprocal_correction * (reciprocal_estimate * x)

The computation of the final reciprocal_correction is on the critical latency
path, while reciprocal_estimate is computed earlier, so we can compute
(reciprocal_estimate * x) without increasing the overall latency. Ie. we saved
a multiply.
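
A scalar sketch of that reassociation (frecpe_estimate() and
frecps_correction() are placeholder names for the estimate and step
operations, where the step computes 2 - y*e):

float
approx_div (float x, float y)
{
  float e = frecpe_estimate (y);       /* rough 1/y */
  float c = frecps_correction (y, e);  /* ~ 1.0, on the critical path */

  /* Naive:        (c * e) * x  -- the multiply by x waits for c * e.
     Reassociated: c * (e * x)  -- e * x can issue as soon as e is ready,
     in parallel with computing c, so the multiply by x no longer adds
     to the latency of the dependence chain.  */
  float ex = e * x;
  return c * ex;
}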

In principle this could be done as a separate optimization pass that tries to 
reassociate to reduce latency. However I'm not too convinced this would be
easy to implement in GCC's scheduler, so it's best to do it explicitly.

Wilco



Re: [AArch64] Emit division using the Newton series

2016-04-01 Thread Evandro Menezes

On 04/01/16 17:45, Wilco Dijkstra wrote:

Evandro Menezes wrote:


However, I don't think that there's the need to handle any special case
for division.  The only case when the approximation differs from
division is when the numerator is infinity and the denominator, zero,
when the approximation returns infinity and the division, NAN.  So I
don't think that it's a special case that deserves being handled.  IOW,
the result of the approximate reciprocal is always needed.
  
No, the result of the approximate reciprocal is not needed.


Basically a NR approximation produces a correction factor that is very close
to 1.0, and then multiplies that with the previous estimate to get a more
accurate estimate. The final calculation for x * recip(y) is:

result = (reciprocal_correction * reciprocal_estimate) * x

while what I am suggesting is a trivial reassociation:

result = reciprocal_correction * (reciprocal_estimate * x)

The computation of the final reciprocal_correction is on the critical latency
path, while reciprocal_estimate is computed earlier, so we can compute
(reciprocal_estimate * x) without increasing the overall latency. Ie. we saved
a multiply.

In principle this could be done as a separate optimization pass that tries to
reassociate to reduce latency. However I'm not too convinced this would be
easy to implement in GCC's scheduler, so it's best to do it explicitly.


I think that I see what you mean.  I'll hack something tomorrow.

Thanks for your patience,

--
Evandro Menezes



Re: [PATCH] 69517 - [5/6 regression] SEGV on a VLA with excess initializer elements

2016-04-01 Thread Martin Sebor

Fair enough.  I don't think we can impose an arbitrary 64K limit,
however, as that is a lot smaller than the 8MB default stack size, and
programs can use setrlimit to increase the stack further.  For GCC 6 let's
not impose any limit beyond non-negative/overflowing, and as you say we
can do something better in GCC 7.


Would you be open to imposing a somewhat more permissive limit,
say on the order of one or a few megabytes (but less than the
default 8 MB on Linux)?

I ask because I expect the majority of programmer errors with
VLAs to be due to out-of-range bounds that couldn't be
accommodated even if stack space was extended well beyond
the Linux default (say hundreds of MB), or that would result
in complete stack space exhaustion.  I.e., not caused by
deliberately trying to create very large VLAs but rather by
accidentally creating VLAs with unconstrained bounds (due to
a failure to validate input, uninitialized unsigned variables,
etc.)
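
A made-up illustration of that failure mode (not from the PR):

#include <stdio.h>

/* The bound comes straight from input and is never validated, so the VLA
   can exhaust the stack long before any "reasonable" limit is reached.  */
void
read_record (FILE *f)
{
  unsigned n;
  if (fscanf (f, "%u", &n) != 1)
    return;
  char buf[n];                  /* unconstrained bound */
  if (n && fread (buf, 1, n, f) == n)
    printf ("%c\n", buf[0]);
}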

I expect fewer cases to be due to negative or zero bounds or
excessive initializers.

I also expect programmers to want to find out about such bugs
sooner (i.e., in unit testing with plentiful stack space) rather
than after software has been deployed (and under low stack space
conditions not exercised during unit testing).

To that end, I think a lower limit is going to be more helpful
than a permissive one (or none at all).

But if even a few MB seems too strict, I would find having even
an exceedingly liberal limit (say 1GB) much preferable to none
at all as it makes it possible to exercise boundary conditions
such as the size overflow problem you noted below.



I think all modes.


Sounds good.  I've enabled it in all modes.


The if (1) seems left over from development.


Right.



It looks like this will multiply *cst_size by the sub-array length once
for each element of the outer array, leading to a too-large result.  And
also generate redundant code to check the bounds of the sub-array
multiple times.


Great catch, thank you!  I think it was a mistake on my part to
try to build both kinds of checks in the same function.  In the
update patch I split it up into two: one to build the bounds
check and another to build the initializer check.

During additional testing it dawned on me that there is no good
way to validate (or even to initialize) the initializer list of
a multi-dimensional VLA that isn't unambiguously braced.

For example, the following VLA with M = 2 and N = 3:

int A [M][N] = { 1, 2, 3, 4 };

Unpatched, GCC initializes it to { { 1, 2, 3 }, { 0, 0, 0 } }.
With my first patch, GCC throws.  Neither is correct, but doing
the right thing would involve emitting parameterizing the
initialization code for the value of each bound.  While that
might be doable it feels like a bigger change than I would be
comfortable attempting at this stage.  To avoid the problem I've
made it an error to specify a partially braced VLA initializer.
If you think it's worthwhile, I can see about implementing the
runtime reshaping in stage 1.



It seems to me that we want use the existing check for excess
initializers in build_vec_init, in the if (length_check) section, though
as you mention in 70019 that needs to be improved to handle STRING_CST.


I don't think modifying build_vec_init() alone would be sufficient.
For example, the function isn't called for a VLA with a constant
bound like this one:

 int A [2][N] = { 1, 2, 3, 4 };


Also, I think we should check for invalid bounds in
compute_array_index_type, next to the UBsan code.  Checking bounds only
from cp_finish_decl means that we don't check uses of VLA types other
than variable declarations.


You mean VLA typedefs?  That's true, though I have consciously
avoided dealing with those.  They're outlawed in N3639 and so
I've been focusing just on variables.  But since GCC accepts
VLA typedefs too I was thinking I would bring them up at some
point in the future to decide what to do about them.

As for where to add the bounds checking code, I also at first
thought of checking the bounds parallel to the UBSan code in
compute_array_index_type() and even tried that approach. The
problem with it is that it only considers one array dimension
at a time, without knowledge of the others.  As a result,
as noted in sanitizer/70051, it doesn't correctly detect
overflows in the bounds of multidimensional VLAs.
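
For instance (my example, not from the PR), each bound below may look
plausible on its own while the total size still overflows:

/* With n = 65536, n * n * sizeof (int) is 2**34 bytes: it overflows a
   32-bit size computation and exceeds any stack on 64-bit targets, yet a
   per-dimension check sees nothing wrong with either bound.  */
void
g (unsigned n)
{
  int a[n][n];
  a[0][0] = 0;
}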




+  /* Avoid instrumenting constexpr functions.  Those must
+ be checked statically, and the (non-constexpr) dynamic
+ instrumentation would cause them to be rejected.  */


Hmm, this sounds wrong; constexpr functions can also be called with
non-constant arguments, and the instrumentation should be folded away
when evaluating a call with constant arguments.


You're right that constexpr functions should be checked as
well.  Unfortunately, at present, due to c++/70507 the check
(or rather the call to __builtin_mul_overflow) isn't folded
away and we end up with error: call to internal function.
As 

Re: [AArch64] Add more precision choices for the reciprocal square root approximation

2016-04-01 Thread Wilco Dijkstra
Evandro Menezes wrote:
>
> I hope that this gets in the ballpark of what's been discussed previously.

Yes that's very close to what I had in mind. A minor issue is that the vector 
modes cannot work as they start at MAX_MODE_FLOAT (which is > 32):

+/* Control approximate alternatives to certain FP operators.  */
+#define AARCH64_APPROX_MODE(MODE) \
+  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
+   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
+   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT + 1)) \
+ : (0))

That should be: 

+ ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \

It would be worth testing all the obvious cases to be sure they work.
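
Spelling out the bit layout (my reading of the intent): scalar float modes map
to bits 0 .. (MAX_MODE_FLOAT - MIN_MODE_FLOAT), so the first free bit for
vector float modes is MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1.  The original
expression added MAX_MODE_FLOAT + 1, an absolute enum value rather than the
count of scalar float modes, which pushes the shift amount past the width of
the mask.  The corrected macro would read:

#define AARCH64_APPROX_MODE(MODE) \
  ((MIN_MODE_FLOAT <= (MODE) && (MODE) <= MAX_MODE_FLOAT) \
   ? (1 << ((MODE) - MIN_MODE_FLOAT)) \
   : (MIN_MODE_VECTOR_FLOAT <= (MODE) && (MODE) <= MAX_MODE_VECTOR_FLOAT) \
     ? (1 << ((MODE) - MIN_MODE_VECTOR_FLOAT \
	      + MAX_MODE_FLOAT - MIN_MODE_FLOAT + 1)) \
     : (0))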

Also I don't think it is a good idea to enable all modes on Exynos-M1 and 
XGene-1 -
I haven't seen any evidence that shows it gives a speedup on real code for all 
modes
(or at least on a good micro benchmark like the unit vector test I suggested - 
a simple
throughput test does not count!).

The issue is it hides performance gains from an improved divider/sqrt on new 
revisions
or microarchitectures. That means you should only enable cases where there is 
evidence
of a major speedup that cannot be matched by a future improved divider/sqrt.

Wilco


