Re: [PATCH] PR testsuite/70364: Properly align stack in gcc.target/i386/cleanup-[12].c

2016-03-30 Thread Uros Bizjak
On Tue, Mar 29, 2016 at 10:13 PM, H.J. Lu  wrote:
> Tested on x86-64.  OK for trunk?
>
> H.J.
> ---
> PR testsuite/70364
> * gcc.target/i386/cleanup-1.c: Include <stdint.h>.
> (check): New function.
> (bar): Call check.
> (foo): Align stack to 16 bytes when calling bar.
> * gcc.target/i386/cleanup-2.c: Likewise.

OK, but let's also ask Jakub, the author of the testcases, for opinion.

Thanks,
Uros.

> ---
>  gcc/testsuite/gcc.target/i386/cleanup-1.c | 17 ++---
>  gcc/testsuite/gcc.target/i386/cleanup-2.c | 17 ++---
>  2 files changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/cleanup-1.c 
> b/gcc/testsuite/gcc.target/i386/cleanup-1.c
> index fc82f35..dcfcc4e 100644
> --- a/gcc/testsuite/gcc.target/i386/cleanup-1.c
> +++ b/gcc/testsuite/gcc.target/i386/cleanup-1.c
> @@ -4,6 +4,7 @@
>
>  #include 
>  #include 
> +#include <stdint.h>
>  #include 
>  #include 
>  #include 
> @@ -47,6 +48,14 @@ handler (void *p __attribute__((unused)))
>_exit (0);
>  }
>
> +static void
> +__attribute__((noinline))
> +check (intptr_t p)
> +{
> +  if ((p & 15) != 0)
> +abort ();
> +}
> +
>  static int __attribute__((noinline))
>  fn5 (void)
>  {
> @@ -59,6 +68,8 @@ void
>  bar (void)
>  {
>char dummy __attribute__((cleanup (counter)));
> +  unsigned long tmp[4] __attribute__((aligned(16)));
> +  check ((intptr_t) tmp);
>fn5 ();
>  }
>
> @@ -133,9 +144,9 @@ foo (int x)
> ".type  _L_mutex_lock_%=, @function\n"
>  "_L_mutex_lock_%=:\n"
>  "1:\t" "leaq   %1, %%rdi\n"
> -"2:\t" "subq   $128, %%rsp\n"
> +"2:\t" "subq   $136, %%rsp\n"
>  "3:\t" "call   bar\n"
> -"4:\t" "addq   $128, %%rsp\n"
> +"4:\t" "addq   $136, %%rsp\n"
>  "5:\t" "jmp24f\n"
>  "6:\t" ".size _L_mutex_lock_%=, .-_L_mutex_lock_%=\n\t"
> ".previous\n\t"
> @@ -179,7 +190,7 @@ foo (int x)
> ".sleb128 4b-3b\n"
>  "16:\t"".byte  0x40 + (4b-3b-1) # DW_CFA_advance_loc\n\t"
> ".byte  0x0e# DW_CFA_def_cfa_offset\n\t"
> -   ".uleb128 128\n\t"
> +   ".uleb128 136\n\t"
> ".byte  0x16# DW_CFA_val_expression\n\t"
> ".uleb128 0x10\n\t"
> ".uleb128 20f-17f\n"
> diff --git a/gcc/testsuite/gcc.target/i386/cleanup-2.c 
> b/gcc/testsuite/gcc.target/i386/cleanup-2.c
> index 0ec7c31..7e603233 100644
> --- a/gcc/testsuite/gcc.target/i386/cleanup-2.c
> +++ b/gcc/testsuite/gcc.target/i386/cleanup-2.c
> @@ -4,6 +4,7 @@
>
>  #include 
>  #include 
> +#include <stdint.h>
>  #include 
>  #include 
>  #include 
> @@ -47,6 +48,14 @@ handler (void *p __attribute__((unused)))
>_exit (0);
>  }
>
> +static void
> +__attribute__((noinline))
> +check (intptr_t p)
> +{
> +  if ((p & 15) != 0)
> +abort ();
> +}
> +
>  static int __attribute__((noinline))
>  fn5 (void)
>  {
> @@ -59,6 +68,8 @@ void
>  bar (void)
>  {
>char dummy __attribute__((cleanup (counter)));
> +  unsigned long tmp[4] __attribute__((aligned(16)));
> +  check ((intptr_t) tmp);
>fn5 ();
>  }
>
> @@ -74,9 +85,9 @@ foo (int x)
> ".type  _L_mutex_lock_%=, @function\n"
>  "_L_mutex_lock_%=:\n"
>  "1:\t" "leaq   %1, %%rdi\n"
> -"2:\t" "subq   $128, %%rsp\n"
> +"2:\t" "subq   $136, %%rsp\n"
>  "3:\t" "call   bar\n"
> -"4:\t" "addq   $128, %%rsp\n"
> +"4:\t" "addq   $136, %%rsp\n"
>  "5:\t" "jmp21f\n"
>  "6:\t" ".size _L_mutex_lock_%=, .-_L_mutex_lock_%=\n\t"
> ".previous\n\t"
> @@ -160,7 +171,7 @@ foo (int x)
> ".uleb128 6b-5b-1\n"
>  "19:\t"".byte  0x40 + (3b-1b) # DW_CFA_advance_loc\n\t"
> ".byte  0xe # DW_CFA_def_cfa_offset\n\t"
> -   ".uleb128 128\n\t"
> +   ".uleb128 136\n\t"
> ".byte  0x40 + (5b-3b) # DW_CFA_advance_loc\n\t"
> ".byte  0xe # DW_CFA_def_cfa_offset\n\t"
> ".uleb128 0\n\t"
> --
> 2.5.5
>
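
The substance of the fix is visible in the hunks above: bar gains a
16-byte-aligned local plus a runtime probe of its address, and the asm's
stack adjustment grows from 128 to 136 bytes so that the stack is properly
aligned when bar is called (per the ChangeLog entry).  A minimal standalone
sketch of that probe idiom, not the testcase itself (the main driver below
is only for illustration):

#include <stdint.h>
#include <stdlib.h>

/* Abort if P is not 16-byte aligned.  noinline keeps the probe from being
   folded into its caller.  */
static void __attribute__((noinline))
check (intptr_t p)
{
  if ((p & 15) != 0)
    abort ();
}

void
bar (void)
{
  /* A 16-byte-aligned local; it only ends up aligned if the caller kept
     the stack correctly aligned.  */
  unsigned long tmp[4] __attribute__((aligned(16)));
  check ((intptr_t) tmp);
}

int
main (void)
{
  bar ();
  return 0;
}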


Re: RFA: PATCH to tree-inline.c:remap_decls for c++/70353 (ICE with __func__ and constexpr)

2016-03-30 Thread Richard Biener
On Tue, Mar 29, 2016 at 8:42 PM, Jason Merrill  wrote:
> On 03/29/2016 06:37 AM, Jan Hubicka wrote:
>>>
>>> On Mon, Mar 28, 2016 at 11:26 PM, Jason Merrill  wrote:

 The constexpr evaluation code uses the inlining code to remap the
 constexpr
 function body for evaluation so that recursion works properly.  In this
 testcase __func__ is declared as a static local variable, so rather than
 remap it, remap_decls tries to add it to the local_decls list for the
 function we're inlining into.  But there is no such function in this
 case,
 so we crash.

 Avoiding the add_local_decl call when cfun is null avoids the ICE (thanks
 Jakub), but results in an undefined symbol.  Calling
 varpool_node::finalize_decl instead allows cgraph to handle the
 reference
 from 'c' properly.

 OK if testing passes?
>>>
>>>
>>> So ce will never be instantiated?
>
>
> Right, because no calls to it survive constexpr evaluation.  And the front
> end avoids finalizing it in make_rtl_for_nonlocal_decl...which is another
> place I could fix this.  Thus.
>
> Tested x86_64-pc-linux-gnu, applying to trunk.

Much better!

Thanks,
Richard.

> Jason
>


Re: [PATCH] Disable guality tests for powerpc*-linux*

2016-03-30 Thread Richard Biener
On Tue, Mar 29, 2016 at 7:16 PM, Jakub Jelinek  wrote:
> On Tue, Mar 29, 2016 at 12:01:20PM -0500, Bill Schmidt wrote:
>> Again, this is good information to know about.  But the "stuff" we were
>> talking about was the failures on powerpc*, and I took what you said to
>> mean that nobody was working on those.  It sounds like you're saying
>> that the community has spent time on debug improvements for optimized
>> code on x86_64/i?86, but only for that target.  Is that a fair
>> statement?  If so, it seems unsurprising that you would get more bug
>
> Well, most of the analysis has been done on x86_64/i?86.  The bug fixes,
> DWARF enhancements etc. were then in various areas, if something has been
> improved through some GIMPLE change, then likely all targets benefited,
> if it was something at the RTL level (or var-tracking pass itself), then
> it really depends on the various properties of the machine descriptions,
> argument passing etc.
> I'm not saying it is possible to have all the guality tests pass at all
> optimization levels on all targets, sometimes the value of some variable
> is really lost through optimizations and can't be reconstructed in any way,
> sometimes it is too costly to track it, etc.
> In other cases we have yet to create new DWARF extensions, known stuff is
> e.g. debugging vectorized loops, what kind of user experience we want for
> users if single set of instructions handles multiple iterations of the loop?
> Do we want user to think he is seeing e.g. the first iteration, then the
> fifth one, then ninth etc., or provide enough info for the debuggers so that
> the user could find out he is in vectorized loop and explicitly request
> he is e.g. interested in the 3rd iteration instead of 1st?
> Then there will be certainly cases where even without adding any extensions
> one can just add some smarts to var-tracking, or change other GCC internals
> to handle some stuff better.

Yes, I think we _do_ need some dg-effective-target stuff for guality as
some tests currently fail on arm (IIRC, I've only once done some non-x86
digging into guality fails) because of ABI issues that make a middle-end
debuginfo fix incomplete (or impossible, I don't remember).

For powerpc somebody needs to look at a few regressions towards x86_64 and
see if there's a similar pattern - adding arch-specific xfails (or adding
effective targets) is a good way to make progress as well.

Hell, even slapping an xfail powerpc*-*-* on all current ppc FAILs would be
better than simply disabling all of guality for ppc.

Richard.

> Jakub


Re: Fix overflow in loop peeling code

2016-03-30 Thread Richard Biener
On Wed, Mar 30, 2016 at 12:04 AM, Jan Hubicka  wrote:
> Hi,
> this patch fixes a stupid overflow in tree-ssa-loop-ivcanon.c.
> If the estimated number of executions of the loop is INT_MAX+1 it will get
> peeled incorrectly.
>
> Bootstrapped/regtested x86_64-linux and committed (it is a regression WRT
> the RTL implementation).
>
> Honza
>
> * tree-ssa-loop-ivcanon.c (try_peel_loop): Change type of peel
> to HOST_WIDE_INT
> Index: tree-ssa-loop-ivcanon.c
> ===
> --- tree-ssa-loop-ivcanon.c (revision 234516)
> +++ tree-ssa-loop-ivcanon.c (working copy)
> @@ -935,7 +935,7 @@ try_peel_loop (struct loop *loop,
>edge exit, tree niter,
>HOST_WIDE_INT maxiter)
>  {
> -  int npeel;
> +  HOST_WIDE_INT npeel;
>struct loop_size size;
>int peeled_size;
>sbitmap wont_exit;
> @@ -990,7 +990,7 @@ try_peel_loop (struct loop *loop,
>  {
>if (dump_file)
>  fprintf (dump_file, "Not peeling: rolls too much "
> -"(%i + 1 > --param max-peel-times)\n", npeel);
> +"(%i + 1 > --param max-peel-times)\n", (int) npeel);

Use "(" HOST_WIDE_INT_PRINT_DEC " + 1 > 

>return false;
>  }
>npeel++;
> @@ -998,7 +998,7 @@ try_peel_loop (struct loop *loop,
>/* Check peeled loops size.  */
>tree_estimate_loop_size (loop, exit, NULL, &size,
>PARAM_VALUE (PARAM_MAX_PEELED_INSNS));
> -  if ((peeled_size = estimated_peeled_sequence_size (&size, npeel))
> +  if ((peeled_size = estimated_peeled_sequence_size (&size, (int) npeel))

^^^ suggests estimated_peeled_sequence_size needs adjustment as well,
otherwise you'll get bogus param check.

>> PARAM_VALUE (PARAM_MAX_PEELED_INSNS))
>  {
>if (dump_file)
> @@ -1032,7 +1032,7 @@ try_peel_loop (struct loop *loop,
>if (dump_file && (dump_flags & TDF_DETAILS))
>  {
>fprintf (dump_file, "Peeled loop %d, %i times.\n",
> -  loop->num, npeel);
> +  loop->num, (int) npeel);

See above.

Richard.

>  }
>if (loop->any_upper_bound)
>  loop->nb_iterations_upper_bound -= npeel;


Re: [PATCH] Limit alias walking by speculative devirt

2016-03-30 Thread Richard Biener
On Wed, 23 Mar 2016, Richard Biener wrote:

> 
> This reduces the compile-time for the testcase from PR12392 from
> ~50s to ~35s, dropping the alias-stmt walking time from 40% to around 8%.
> 
> Currently (even when -fno-devirtualize-speculatively - heh), when
> looking for a must-def that specifies the dynamic type of an object
> we invoke a virtual call on, we skip may-defs speculatively until
> we run into the function start, which of course may be quite some
> work to do and which is of course not acceptable.
> 
> The following limits the number of may-defs we skip.
> 
> It does not limit the number of stmts we skip as non-aliasing, thus we can
> still run into large overhead cases but that would require to assign an
> overall budget to the current function which isn't that trivial
> because this is a helper which is called from multiple places in GCC.
> Ideally the devirt machinery would record interesting must-defs in
> a body walk and thus when looking for it it could find candidates
> with a hashtable lookup and would only need to check whether there
> is no intermediate may-def.  But as said, with the "tool" nature
> of the devirt thing this is hard (but maybe speculative devirt
> is really only done at IPA time and not from PRE?).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Honza, is this ok?  Can you check effects on devirt numbers for
> the testcases you have monitored that?

Honza had some comments and checked effects on libreoffice and approved
off-list.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-03-30  Michael Matz  
Richard Biener  

PR ipa/12392
* ipa-polymorphic-call.c (struct type_change_info): Change
speculative to an unsigned allowing to limit the work we do.
(csftc_abort_walking_p): New inline function..
(check_stmt_for_type_change): Limit the number of may-defs
skipped for speculative devirtualization to
max-speculative-devirt-maydefs.
* params.def (max-speculative-devirt-maydefs): New param.
* doc/invoke.texi (--param max-speculative-devirt-maydefs): Document.

Index: gcc/params.def
===
*** gcc/params.def  (revision 234415)
--- gcc/params.def  (working copy)
*** DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
*** 1203,1208 
--- 1203,1214 
  "hsa-gen-debug-stores",
  "Level of hsa debug stores verbosity",
  0, 0, 1)
+ 
+ DEFPARAM (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS,
+ "max-speculative-devirt-maydefs",
+ "Maximum number of may-defs visited when devirtualizing "
+ "speculatively", 50, 0, 0)
+ 
  /*
  
  Local variables:
Index: gcc/ipa-polymorphic-call.c
===
*** gcc/ipa-polymorphic-call.c  (revision 234415)
--- gcc/ipa-polymorphic-call.c  (working copy)
*** along with GCC; see the file COPYING3.
*** 38,43 
--- 38,44 
  #include "tree-dfa.h"
  #include "gimple-pretty-print.h"
  #include "tree-into-ssa.h"
+ #include "params.h"
  
  /* Return true when TYPE contains an polymorphic type and thus is interesting
 for devirtualization machinery.  */
*** struct type_change_info
*** 1094,1107 
tree known_current_type;
HOST_WIDE_INT known_current_offset;
  
/* Set to true if dynamic type change has been detected.  */
bool type_maybe_changed;
/* Set to true if multiple types have been encountered.  known_current_type
   must be disregarded in that case.  */
bool multiple_types_encountered;
-   /* Set to true if we possibly missed some dynamic type changes and we should
-  consider the set to be speculative.  */
-   bool speculative;
bool seen_unanalyzed_store;
  };
  
--- 1095,1109 
tree known_current_type;
HOST_WIDE_INT known_current_offset;
  
+   /* Set to nonzero if we possibly missed some dynamic type changes and we
+  should consider the set to be speculative.  */
+   unsigned speculative;
+ 
/* Set to true if dynamic type change has been detected.  */
bool type_maybe_changed;
/* Set to true if multiple types have been encountered.  known_current_type
   must be disregarded in that case.  */
bool multiple_types_encountered;
bool seen_unanalyzed_store;
  };
  
*** record_known_type (struct type_change_in
*** 1338,1343 
--- 1340,1358 
tci->type_maybe_changed = true;
  }
  
+ 
+ /* The maximum number of may-defs we visit when looking for a must-def
+that changes the dynamic type in check_stmt_for_type_change.  Tuned
+after the PR12392 testcase which unlimited spends 40% time within
+these alias walks and 8% with the following limit.  */
+ 
+ static inline bool
+ csftc_abort_walking_p (unsigned speculative)
+ {
+   unsigned max = PARAM_VALUE (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS);
+   return speculative > max ? true : fa
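
The new helper is truncated just above; from the visible lines its body is
simply a comparison of the running may-def count against the new param.  A
sketch of the complete function (the final lines are reconstructed, and the
call site in check_stmt_for_type_change is in the cut-off part of the
message):

/* Return true when SPECULATIVE, the number of may-defs skipped so far,
   exceeds the --param max-speculative-devirt-maydefs budget.  */
static inline bool
csftc_abort_walking_p (unsigned speculative)
{
  unsigned max = PARAM_VALUE (PARAM_MAX_SPECULATIVE_DEVIRT_MAYDEFS);
  return speculative > max ? true : false;
}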

Re: [PATCH] PR target/70439: Properly check conflict between DRAP register and __builtin_eh_return

2016-03-30 Thread Uros Bizjak
On Tue, Mar 29, 2016 at 8:56 PM, H.J. Lu  wrote:
> Since %ecx can't be used for both DRAP register and __builtin_eh_return,
> we need to check if crtl->drap_reg uses %ecx before using %ecx for
> __builtin_eh_return.
>
> Testing on x86-64.  OK for trunk if there are no regressions?
>
>
> H.J.
> ---
> PR target/70439
> * config/i386/i386.c (ix86_expand_epilogue): Properly check
> conflict between DRAP register and __builtin_eh_return.
> ---
>  gcc/config/i386/i386.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1639704..aafe171 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -13463,8 +13463,9 @@ ix86_expand_epilogue (int style)
>   rtx sa = EH_RETURN_STACKADJ_RTX;
>   rtx_insn *insn;
>
> - /* Stack align doesn't work with eh_return.  */
> - gcc_assert (!stack_realign_drap);
> + /* %ecx can't be used for both DRAP register and eh_return.  */
> + gcc_assert (!crtl->drap_reg
> + || REGNO (crtl->drap_reg) != CX_REG);

How about:

if (crtl->drap_reg)
  gcc_assert (REGNO (crtl->drap_reg) != CX_REG);

?

>   /* Neither does regparm nested functions.  */
>   gcc_assert (!ix86_static_chain_on_stack);

This comment needs to be updated, too.

Uros.
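
Putting the two review comments together, the reworked block in
ix86_expand_epilogue might end up looking roughly like this (a sketch only;
the final wording is up to the follow-up patch):

	  /* %ecx cannot serve as both the DRAP register and the
	     EH_RETURN_STACKADJ_RTX register (sa above).  */
	  if (crtl->drap_reg)
	    gcc_assert (REGNO (crtl->drap_reg) != CX_REG);

	  /* regparm nested functions do not work with eh_return either.  */
	  gcc_assert (!ix86_static_chain_on_stack);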


Re: [PATCH GCC]Reduce compilation time for IVOPT by skipping cost computation in use group

2016-03-30 Thread Richard Biener
On Thu, Mar 24, 2016 at 6:26 PM, Bin Cheng  wrote:
> Hi,
> Quite a lot of time is used when IVOPT computes cost for <use, cand> pairs.  As 
> a matter of fact, some pairs are very similar to each other, and we can 
> abstract and compute cost only once for these pairs.  This is a patch doing 
> so, the idea is skipping cost computation for sub-uses in each group, of 
> course it may result in different assembly code for some complicated cases 
> because it estimates cost rather than doing real computation.  I did double 
> check one of such case that the change in generated assembly is not 
> degeneration.  For an IVOPT heavy program (spec2k/173), this patch reduces 
> IVOPT's compilation time by 7~8%, as well as the memory consumption on my 
> developing machine.
>
> Bootstrap & test on x86_64.
>
> For spec2k6 data on x86_64.  Maybe because I ran spec2k6 compiled with 
> patched GCC in unclean environment, some cases are regressed by small amount 
> (< 1%).  I manually compared assembly code for several cases, including ones 
> with the largest regression (still within <1%).  I could confirm that 
> generated assembly code is exact the same as unpatched GCC, except for 
> function emit_library_call_value_1 in 403.gcc/calls.c.
>
> In this case, difference of IVOPT dumps is as below:
>
> $ diff -y trunk/calls.c.154t.ivopts patch/calls.c.154t.ivopts
>
>   ::
>   # val_21 = PHI   # val_21 = 
> PHI 
>   _811 = (void *) ivtmp.322_829;  _811 = 
> (void *) ivtmp.322_829;
>   MEM[base: _811, offset: -48B] = val_21; |   MEM[base: 
> _811, offset: -32B] = val_21;
>   _810 = (void *) ivtmp.322_829;  _810 = 
> (void *) ivtmp.322_829;
>   MEM[base: _810, offset: -40B] = mode_163;   |   MEM[base: 
> _810, offset: -24B] = mode_163;
>   _182 = function_arg (&args_so_far, mode_163, 0B, 1);_182 = 
> function_arg (&args_so_far, mode_163, 0B, 1);
>   _809 = (void *) ivtmp.322_829;  _809 = 
> (void *) ivtmp.322_829;
>   MEM[base: _809, offset: -32B] = _182;   |   MEM[base: 
> _809, offset: -16B] = _182;
>   _807 = (void *) ivtmp.322_829;  _807 = 
> (void *) ivtmp.322_829;
>   MEM[base: _807, offset: -24B] = 0;  |   MEM[base: 
> _807, offset: -8B] = 0;
>   _185 = (struct args_size *) ivtmp.322_829;  |   _801 = 
> ivtmp.322_829 + 16;
>   _801 = ivtmp.322_829 + 18446744073709551600;<
>   _800 = (struct args_size *) _801;   _800 = 
> (struct args_size *) _801;
>   _186 = _800;|   _185 = _800;
>   >   _186 = 
> (struct args_size *) ivtmp.322_829;
>   _187 = _182 != 0B;  _187 = _182 
> != 0B;
>   _188 = (int) _187;  _188 = 
> (int) _187;
>   locate_and_pad_parm (mode_163, 0B, _188, 0B, &args_size, _1 
> locate_and_pad_parm (mode_163, 0B, _188, 0B, &args_size, _1
>   _802 = (void *) ivtmp.322_829;  _802 = 
> (void *) ivtmp.322_829;
>   _190 = MEM[base: _802, offset: 8B]; |   _190 = 
> MEM[base: _802, offset: 24B];
>   if (_190 != 0B) if (_190 != 
> 0B)
> goto ;   goto  45>;
>   elseelse
> goto ;   goto  46>;
>
>   ::
>   fancy_abort ("calls.c", 3724, &__FUNCTION__);   fancy_abort 
> ("calls.c", 3724, &__FUNCTION__);
>
> It's only an offset difference in IV.  And below is difference of generated 
> assembly:
> $ diff -y trunk/calls.S patch/calls.S
> .L489:  .L489:
> leaq-80(%rbp), %rdi leaq  
>   -80(%rbp), %rdi
> xorl%edx, %edx  xorl  
>   %edx, %edx
> movl$1, %ecxmovl  
>   $1, %ecx
> movl%r13d, %esi movl  
>   %r13d, %esi
> movq%rax, -48(%r15)   <
> movl%r13d, -40(%r15)  <
> callfunction_arg  <
> movl$0, -24(%r15) <
> movq%rax, -32(%r15) movq  
>   %rax, -32(%r15)
>   > movl  
>   %r13d, -24(%r15)
>  
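
A self-contained toy illustration of the strategy under discussion (nothing
below is the IVOPT code itself): the first member of a use group pays for
the full cost computation and the remaining sub-uses reuse that result as
an estimate, which is where the compile-time and memory savings come from.

#include <stdio.h>

#define GROUP_SIZE 4

static int full_cost_computations;

/* Stand-in for the expensive per-<use, cand> cost computation.  */
static int
compute_cost (int use_id, int cand_id)
{
  full_cost_computations++;
  return use_id + cand_id;
}

int
main (void)
{
  int uses[GROUP_SIZE] = { 10, 11, 12, 13 };
  int cand = 3, cost[GROUP_SIZE];

  /* Only the group representative is computed for real ...  */
  cost[0] = compute_cost (uses[0], cand);
  /* ... similar sub-uses just inherit an estimated cost.  */
  for (int i = 1; i < GROUP_SIZE; i++)
    cost[i] = cost[0];

  printf ("full computations: %d instead of %d\n",
          full_cost_computations, GROUP_SIZE);
  return 0;
}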

Re: [PATCH][PR target/63890] Turn on ACCUMULATE_OUTGOING_ARGS when profiling on darwin

2016-03-30 Thread Mike Stump
On Mar 29, 2016, at 8:57 PM, Jeff Law  wrote:
> I'm installing this on the trunk momentarily.

Thank you for the review.


Re: [PATCH][ARM][4.9 Backport] PR target/69875 Fix atomic_loaddi expansion

2016-03-30 Thread Kyrill Tkachov


On 29/03/16 19:46, Christophe Lyon wrote:

On 16 March 2016 at 16:54, Ramana Radhakrishnan
 wrote:

On Wed, Feb 24, 2016 at 11:23 AM, Kyrill Tkachov
 wrote:

Hi all,

This is the GCC 4.9 backport of
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01338.html.
The differences are that TARGET_HAVE_LPAE has to be defined in arm.h in a
different way because
the ARM_FSET_HAS_CPU1 mechanism doesn't exist on this branch. Also, due to
the location of insn_flags
and the various FL_* (on the 4.9 branch they're defined locally in arm.c
rather than in arm-protos.h)
I chose to define TARGET_HAVE_LPAE in terms of hardware divide instruction
availability. This should be
an equivalent definition.

Also, the scan-assembler tests that check for the DMB instruction are
updated to check for
"dmb sy" rather than "dmb ish", because the memory barrier instruction
changed on trunk for GCC 6.

Bootstrapped and tested on the GCC 4.9 branch on arm-none-linux-gnueabihf.


Ok for the branch after the trunk patch has had a few days to bake?


OK.


Hi Kyrylo,

Since you backported this to branches 4.9 and 5, I've noticed cross-GCC build
failures:
--target arm-none-linux-gnueabihf
--with-mode=arm
--with-cpu=cortex-a57
--with-fpu=crypto-neon-fp-armv8

The build succeeds --with-mode=thumb.

The error message I'm seeing is:
/tmp/6190285_22.tmpdir/ccuX17sh.s: Assembler messages:
/tmp/6190285_22.tmpdir/ccuX17sh.s:34: Error: bad instruction `ldrdeq r0,r1,[r0]'
make[4]: *** [load_8_.lo] Error 1

while building libatomic


Darn, I had re-tested before committing with --with-mode=thumb :(
The problem here is that GCC 5 and 4.9 don't use unified syntax
for arm state (it was switched on for GCC 6), so the output template
in the new arm_atomic_loaddi2_ldrd pattern should be "ldr%(d%)" instead
of "ldrd%?".

I'll prepare a patch.
Thanks for catching this,
Kyrill



Christophe



Ramana

Thanks,
Kyrill

2016-02-24  Kyrylo Tkachov  

 PR target/69875
 * config/arm/arm.h (TARGET_HAVE_LPAE): Define.
 * config/arm/unspecs.md (VUNSPEC_LDRD_ATOMIC): New value.
 * config/arm/sync.md (arm_atomic_loaddi2_ldrd): New pattern.
 (atomic_loaddi_1): Delete.
 (atomic_loaddi): Rewrite expander using the above changes.

2016-02-24  Kyrylo Tkachov  

 PR target/69875
 * gcc.target/arm/atomic_loaddi_acquire.x: New file.
 * gcc.target/arm/atomic_loaddi_relaxed.x: Likewise.
 * gcc.target/arm/atomic_loaddi_seq_cst.x: Likewise.
 * gcc.target/arm/atomic_loaddi_1.c: New test.
 * gcc.target/arm/atomic_loaddi_2.c: Likewise.
 * gcc.target/arm/atomic_loaddi_3.c: Likewise.
 * gcc.target/arm/atomic_loaddi_4.c: Likewise.
 * gcc.target/arm/atomic_loaddi_5.c: Likewise.
 * gcc.target/arm/atomic_loaddi_6.c: Likewise.
 * gcc.target/arm/atomic_loaddi_7.c: Likewise.
 * gcc.target/arm/atomic_loaddi_8.c: Likewise.
 * gcc.target/arm/atomic_loaddi_9.c: Likewise.




Re: rs6000 stack_tie mishap again

2016-03-30 Thread Olivier Hainque
Hello Segher,

> On Mar 28, 2016, at 13:18 , Segher Boessenkool  
> wrote:
> 
>> You need to have had r11 last used to designate a global
>> symbol as part of the function body (in order to have base_term
>> designate a symbol_ref etc), and then have the scheduler
>> decide that moving across is a good idea. It's certainly not
>> an easy combination to trigger.
> 
> Yes, I did that (with some asm's).  Like this:
> 
> ===
> void g(int, char *);
> 
> int dum;
> 
> void f(int x)
> {
>char big[20];
>g(x, big);
>g(x, big);
>register void *p asm("r11") = &dum;
>asm("" : : "r"(p));
> }

Ah, I see, thanks. In this instance, the problem doesn't
trigger because CONSTANT_POOL_ADDRESS_P (base) is false in

  base = find_base_term (true_mem_addr);
  if (! writep
      && base
      && (GET_CODE (base) == LABEL_REF
	  || (GET_CODE (base) == SYMBOL_REF
	      && CONSTANT_POOL_ADDRESS_P (base))))
    return 0;

  (part of write_dependence_p)

With a minor variation:

void g(int, char *);

void f(int x)
{
   char big[20];
 start:
   g(x, big);
   g(x, big);
   register void *p asm("r11") = &&start;
   asm("" : : "r"(p));
   asm("" : : :"r28");
   asm("" : : :"r29");
   asm("" : : :"r30");
}

I'm getting:

lis 11,.L2@ha
la 11,.L2@l(11)

lwz 11,0(1)
lwz 0,4(11)
lwz 28,-16(11) 

mr 1,11

mtlr 0
lwz 29,-12(11)
lwz 30,-8(11)
lwz 31,-4(11)

blr

out of a powerpc-elf close-to-trunk compiler, despite the
presence of a stack_tie insn at the rtl level.

Olivier



Re: [PATCH ARM v2] PR69770 -mlong-calls does not affect calls to __gnu_mcount_nc generated by -pg

2016-03-30 Thread Ramana Radhakrishnan


On 24/03/16 21:02, Charles Baylis wrote:
> When compiling with -mlong-calls and -pg, calls to the __gnu_mcount_nc
> function are not generated as long calls.
> 
> This is the sequel to this patch
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00881.html
> 
> This patch fixes the following problems with the previous patch.
> . Nested functions now work (thanks to Richard E for spotting this)
> . Thumb-1 now works

> 
> This patch works by adding new patterns (one for ARM/Thumb-2 and one
> for Thumb-1) which are placed in the prologue as a placeholder for
> some RTL which describes the address. This is either a SYMBOL_REF for
> targets with MOVW/MOVT, or a literal pool reference for other targets.
> The implementation of ARM_FUNCTION_PROFILER is changed to search for
> this insn so that the the address of the __gnu_mcount_nc function can
> be loaded using an appropriate sequence for the target.

I'm reasonably happy with the approach but there are nits.

> 
> I also tried generating the profiling call sequence directly in the
> prologue, but this requires some unpleasant hacks to prevent spurious
> register pushes from ASM_OUTPUT_REG_PUSH.
> 
> Tested with no new regressions on arm-unknown-linux-gnueabihf on QEMU.
> The generated code sequences have been inspected for normal and nested
> functions on ARM v6, ARM v7, Thumb-1, and Thumb-2 targets.
> 
> This does not fix a regression, so I don't expect to apply it for
> GCC6, is it OK for when stage 1 re-opens.


I'm not sure how much testing coverage you get by just running the testsuite.
Doing a profiled bootstrap with -mlong-calls and a regression test run for arm
and / or thumb2 would be a useful test to do additionally.  Given that this
originally came from kernel builds with allyesconfig, how important is it for
this to be fixed for GCC 6?  I'd rather take the fix into GCC 6 to get the
kernel building again, but that's something we can discuss with the RMs once
the issues with the patch are fixed.
> 
> gcc/ChangeLog:
> 
> 2016-03-24  Charles Baylis  
> 
> * config/arm/arm-protos.h (arm_emit_long_call_profile): New function.
> * config/arm/arm.c (arm_emit_long_call_profile_insn): New function.
> (arm_expand_prologue): Likewise.
> (thumb1_expand_prologue): Likewise.
> (arm_output_long_call_to_profile_func): Likewise.
> (arm_emit_long_call_profile): Likewise.
> * config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment.
> * config/arm/arm.md (arm_long_call_profile): New pattern.
> * config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New
> define.
> * config/arm/thumb1.md (thumb1_long_call_profile): New pattern.
> * config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-03-24  Charles Baylis  
> 
> * gcc.target/arm/pr69770.c: New test.
> 
> 
> 0001-PR69770-mlong-calls-does-not-affect-calls-to-__gnu_m.patch
> 
> 
> From 5a39451f34be9b6ca98b3460bf40d879d6ee61a5 Mon Sep 17 00:00:00 2001
> From: Charles Baylis 
> Date: Thu, 24 Mar 2016 20:43:25 +
> Subject: [PATCH] PR69770 -mlong-calls does not affect calls to __gnu_mcount_nc
>  generated by -pg
> 
> gcc/ChangeLog:
> 
> 2016-03-24  Charles Baylis  
> 
> * config/arm/arm-protos.h (arm_emit_long_call_profile): New function.
> * config/arm/arm.c (arm_emit_long_call_profile_insn): New function.
> (arm_expand_prologue): Likewise.
> (thumb1_expand_prologue): Likewise.
> (arm_output_long_call_to_profile_func): Likewise.
> (arm_emit_long_call_profile): Likewise.
> * config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment.
> * config/arm/arm.md (arm_long_call_profile): New pattern.
> * config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New
>   define.
> * config/arm/thumb1.md (thumb1_long_call_profile): New pattern.
> * config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-03-24  Charles Baylis  
> 
> * gcc.target/arm/pr69770.c: New test.
> 
> Change-Id: I9b8de01fea083f17f729c3801f83174bedb3b0c6
> 
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 0083673..324c9f4 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -343,6 +343,7 @@ extern void arm_register_target_pragmas (void);
>  extern void arm_cpu_cpp_builtins (struct cpp_reader *);
>  
>  extern bool arm_is_constant_pool_ref (rtx);
> +void arm_emit_long_call_profile ();
>  
>  /* Flags used to identify the presence of processor capabilities.  */
>  
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index c868490..040b255 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -21426,6 +21426,22 @@ output_probe_stack_range (rtx reg1, rtx reg2)
>return "";
>  }
>  
> +static void
> +arm_emit_long_call_profile_insn ()

s/()/(void)

> +{
> +  rtx sym_r
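
On the s/()/(void) nit: in C, unlike C++, an empty parameter list in a
declaration leaves the parameters unspecified rather than declaring the
function as taking none, so mistaken arguments slip through without a
diagnostic.  A compile-only illustration (hypothetical functions, not from
the patch):

static void f ();      /* parameters unspecified, not "takes no arguments" */
static void g (void);  /* really takes no arguments */

void
caller (void)
{
  f (1, 2, 3);         /* accepted silently by the C front end */
  /* g (1); */         /* would be rejected: too many arguments to g */
}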

Re: Fix overflow in loop peeling code

2016-03-30 Thread Jan Hubicka
 > Index: tree-ssa-loop-ivcanon.c
> > ===
> > --- tree-ssa-loop-ivcanon.c (revision 234516)
> > +++ tree-ssa-loop-ivcanon.c (working copy)
> > @@ -935,7 +935,7 @@ try_peel_loop (struct loop *loop,
> >edge exit, tree niter,
> >HOST_WIDE_INT maxiter)
> >  {
> > -  int npeel;
> > +  HOST_WIDE_INT npeel;
> >struct loop_size size;
> >int peeled_size;
> >sbitmap wont_exit;
> > @@ -990,7 +990,7 @@ try_peel_loop (struct loop *loop,
> >  {
> >if (dump_file)
> >  fprintf (dump_file, "Not peeling: rolls too much "
> > -"(%i + 1 > --param max-peel-times)\n", npeel);
> > +"(%i + 1 > --param max-peel-times)\n", (int) npeel);
> 
> Use "(" HOST_WIDE_INT_PRINT_DEC " + 1 > 

OK, same code exists in the unroller code.  I will update both.
> 
> >return false;
> >  }
> >npeel++;
> > @@ -998,7 +998,7 @@ try_peel_loop (struct loop *loop,
> >/* Check peeled loops size.  */
> >tree_estimate_loop_size (loop, exit, NULL, &size,
> >PARAM_VALUE (PARAM_MAX_PEELED_INSNS));
> > -  if ((peeled_size = estimated_peeled_sequence_size (&size, npeel))
> > +  if ((peeled_size = estimated_peeled_sequence_size (&size, (int) npeel))
> 
> ^^^ suggests estimated_peeled_sequence_size needs adjustment as well,
> otherwise you'll get bogus param check.

This is safe - npeel is capped to a small value by the previous test about
rolling too much.  It is basically all about the first test not passing for
large values that convert to small integers.

Honza


Re: [PATCH ARM v2] PR69770 -mlong-calls does not affect calls to __gnu_mcount_nc generated by -pg

2016-03-30 Thread Ramana Radhakrishnan
And some more formatting issues.

On 30/03/16 10:33, Ramana Radhakrishnan wrote:
> 
> 
> On 24/03/16 21:02, Charles Baylis wrote:
>> When compiling with -mlong-calls and -pg, calls to the __gnu_mcount_nc
>> function are not generated as long calls.
>>
>> This is the sequel to this patch
>> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00881.html
>>
>> This patch fixes the following problems with the previous patch.
>> . Nested functions now work (thanks to Richard E for spotting this)
>> . Thumb-1 now works
> 
>>
>> This patch works by adding new patterns (one for ARM/Thumb-2 and one
>> for Thumb-1) which are placed in the prologue as a placeholder for
>> some RTL which describes the address. This is either a SYMBOL_REF for
>> targets with MOVW/MOVT, or a literal pool reference for other targets.
>> The implementation of ARM_FUNCTION_PROFILER is changed to search for
>> this insn so that the the address of the __gnu_mcount_nc function can
>> be loaded using an appropriate sequence for the target.
> 
> I'm reasonably happy with the approach but there are nits.
> 
>>
>> I also tried generating the profiling call sequence directly in the
>> prologue, but this requires some unpleasant hacks to prevent spurious
>> register pushes from ASM_OUTPUT_REG_PUSH.
>>
>> Tested with no new regressions on arm-unknown-linux-gnueabihf on QEMU.
>> The generated code sequences have been inspected for normal and nested
>> functions on ARM v6, ARM v7, Thumb-1, and Thumb-2 targets.
>>
>> This does not fix a regression, so I don't expect to apply it for
>> GCC6, is it OK for when stage 1 re-opens.
> 
> 
> I'm not sure how much testing coverage you get by just running the testsuite. 
> Doing a profiled bootstrap with -mlong-calls and a regression test run for 
> arm and / or thumb2 would be a useful test to do additionally - Given that 
> this originally came from kernel builds with allyesconfig how important is 
> this to be fixed for GCC 6 ? I'd rather take the fix into GCC 6 to get the 
> kernel building again but that's something we can discuss with the RM's once 
> the issues with the patch are fixed.
>>
>> gcc/ChangeLog:
>>
>> 2016-03-24  Charles Baylis  
>>
>> * config/arm/arm-protos.h (arm_emit_long_call_profile): New function.
>> * config/arm/arm.c (arm_emit_long_call_profile_insn): New function.
>> (arm_expand_prologue): Likewise.
>> (thumb1_expand_prologue): Likewise.
>> (arm_output_long_call_to_profile_func): Likewise.
>> (arm_emit_long_call_profile): Likewise.
>> * config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment.
>> * config/arm/arm.md (arm_long_call_profile): New pattern.
>> * config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New
>> define.
>> * config/arm/thumb1.md (thumb1_long_call_profile): New pattern.
>> * config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-03-24  Charles Baylis  
>>
>> * gcc.target/arm/pr69770.c: New test.
>>
>>
>> 0001-PR69770-mlong-calls-does-not-affect-calls-to-__gnu_m.patch
>>
>>
>> From 5a39451f34be9b6ca98b3460bf40d879d6ee61a5 Mon Sep 17 00:00:00 2001
>> From: Charles Baylis 
>> Date: Thu, 24 Mar 2016 20:43:25 +
>> Subject: [PATCH] PR69770 -mlong-calls does not affect calls to 
>> __gnu_mcount_nc
>>  generated by -pg
>>
>> gcc/ChangeLog:
>>
>> 2016-03-24  Charles Baylis  
>>
>> * config/arm/arm-protos.h (arm_emit_long_call_profile): New function.
>> * config/arm/arm.c (arm_emit_long_call_profile_insn): New function.
>> (arm_expand_prologue): Likewise.
>> (thumb1_expand_prologue): Likewise.
>> (arm_output_long_call_to_profile_func): Likewise.
>> (arm_emit_long_call_profile): Likewise.
>> * config/arm/arm.h: (ASM_OUTPUT_REG_PUSH) Update comment.
>> * config/arm/arm.md (arm_long_call_profile): New pattern.
>> * config/arm/bpabi.h (ARM_FUNCTION_PROFILER_SUPPORTS_LONG_CALLS): New
>>  define.
>> * config/arm/thumb1.md (thumb1_long_call_profile): New pattern.
>> * config/arm/unspecs.md (unspecv): Add VUNSPEC_LONG_CALL_PROFILE.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2016-03-24  Charles Baylis  
>>
>> * gcc.target/arm/pr69770.c: New test.
>>
>> Change-Id: I9b8de01fea083f17f729c3801f83174bedb3b0c6
>>
>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> index 0083673..324c9f4 100644
>> --- a/gcc/config/arm/arm-protos.h
>> +++ b/gcc/config/arm/arm-protos.h
>> @@ -343,6 +343,7 @@ extern void arm_register_target_pragmas (void);
>>  extern void arm_cpu_cpp_builtins (struct cpp_reader *);
>>  
>>  extern bool arm_is_constant_pool_ref (rtx);
>> +void arm_emit_long_call_profile ();
>>  
>>  /* Flags used to identify the presence of processor capabilities.  */
>>  
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index c868490..040b255 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config

Do not give realistic estimates for loop with array accesses

2016-03-30 Thread Jan Hubicka
Hi,
while looking into a sudoku solving benchmark, I noticed that we incorrectly
estimate a loop to iterate 10 times just because the array it traverses is of
dimension 10. This of course is just an upper bound and not a realistic bound.
Realistically those loops iterate once most of the time.
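
A made-up example of the kind of loop affected (not the benchmark itself;
the code assumes a zero entry is always present): the only thing bounding
the loop is that the array has 10 elements, so niter analysis may record 10
as an upper bound, but treating 10 as the realistic trip count is wrong -
the search typically stops after an iteration or two.

int
find_free_cell (const int board[10])
{
  int i = 0;
  while (board[i] != 0)   /* bounded only by the dimension of board[]  */
    i++;
  return i;
}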

It turns out this bug was introduced by me in
https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00444.html
While I do not recall doing that patch, it seems like a thinko about reliable
(name of the variable) and realistic (name of the parameter it is passed to).

Fixing this caused one testsuite fallout in a predictive commoning testcase,
because loop unswitching previously disabled itself when it saw an estimated
number of iterations of 1 (I am not sure whether that is supposed to be 0;
with 1 iteration unswitching may pay back a little bit, I suppose it wants to
test that the number of stmt executions of the conditional is at least 2,
which depends on whether the conditional is before or after the loop exits).
This made me notice that some loop passes that want to give up on a low trip
count use a combination of the estimated number of iterations and the max
number of iterations while others don't, which is fixed here.  The code
combining both realistic and max counts is the same as e.g. in the unroller
and other passes already.

I also wonder if predictive commoning is a win for loops with very low
trip count, but that is for a separate patch, too, anyway.

Bootstrapped/regtested x86_64-linux, OK?

Honza

* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
estimates here.
* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
max_loop_iterations_int.
* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 234516)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
   tree low, high, type, next;
   bool sign, upper = true, at_end = false;
   struct loop *loop = data->loop;
-  bool reliable = true;
 
   if (TREE_CODE (base) != ARRAY_REF)
 return true;
@@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
   && tree_int_cst_compare (next, high) <= 0)
 return true;
 
-  /* If access is not executed on every iteration, we must ensure that overlow 
may
- not make the access valid later.  */
+  /* If access is not executed on every iteration, we must ensure that overlow
+ may not make the access valid later.  */
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
   && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
step, data->stmt, loop, true))
-reliable = false;
+upper = false;
 
-  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, 
upper);
+  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, 
upper);
   return true;
 }
 
Index: tree-ssa-loop-unswitch.c
===
--- tree-ssa-loop-unswitch.c(revision 234516)
+++ tree-ssa-loop-unswitch.c(working copy)
@@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
   /* If the loop is not expected to iterate, there is no need
 for unswitching.  */
   iterations = estimated_loop_iterations_int (loop);
+  if (iterations < 0)
+iterations = max_loop_iterations_int (loop);
   if (iterations >= 0 && iterations <= 1)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
Index: tree-ssa-loop-ivopts.c
===
--- tree-ssa-loop-ivopts.c  (revision 234516)
+++ tree-ssa-loop-ivopts.c  (working copy)
@@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
 {
   HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
   if (niter == -1)
-return AVG_LOOP_NITER (loop);
+{
+  niter = max_stmt_executions_int (loop);
+  if (niter == -1 || niter > AVG_LOOP_NITER (loop))
+return AVG_LOOP_NITER (loop);
+}
 
   return niter;
 }
Index: tree-vect-loop.c
===
--- tree-vect-loop.c(revision 234516)
+++ tree-vect-loop.c(working copy)
@@ -2063,6 +2063,9 @@ start_over:
 
   estimated_niter
 = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
+  if (estimated_niter != -1)
+estimated_niter
+  = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
   if (estimated_niter != -1
   && ((unsigned HOST_WIDE_INT) estimated_niter
   <= MAX (th, (unsigned)min_profitable_estimate)))


Re: [PATCH] PR target/70439: Properly check conflict between DRAP register and __builtin_eh_return

2016-03-30 Thread H.J. Lu
On Wed, Mar 30, 2016 at 12:47 AM, Uros Bizjak  wrote:
> On Tue, Mar 29, 2016 at 8:56 PM, H.J. Lu  wrote:
>> Since %ecx can't be used for both DRAP register and __builtin_eh_return,
>> we need to check if crtl->drap_reg uses %ecx before using %ecx for
>> __builtin_eh_return.
>>
>> Testing on x86-64.  OK for trunk if there are no regressions?
>>
>>
>> H.J.
>> ---
>> PR target/70439
>> * config/i386/i386.c (ix86_expand_epilogue): Properly check
>> conflict between DRAP register and __builtin_eh_return.
>> ---
>>  gcc/config/i386/i386.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 1639704..aafe171 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -13463,8 +13463,9 @@ ix86_expand_epilogue (int style)
>>   rtx sa = EH_RETURN_STACKADJ_RTX;
>>   rtx_insn *insn;
>>
>> - /* Stack align doesn't work with eh_return.  */
>> - gcc_assert (!stack_realign_drap);
>> + /* %ecx can't be used for both DRAP register and eh_return.  */
>> + gcc_assert (!crtl->drap_reg
>> + || REGNO (crtl->drap_reg) != CX_REG);
>
> How about:
>
> if (crtl->drap_reg)
>   gcc_assert (REGNO (crtl->drap_reg) != CX_REG));
>
> ?
>
>>   /* Neither does regparm nested functions.  */
>>   gcc_assert (!ix86_static_chain_on_stack);
>
> This comment needs to be updated, too.
>
> Uros.

Like this?

-- 
H.J.


0001-Properly-check-conflict-between-DRAP-register-and-__.patch
Description: Binary data


Re: [PATCH] PR target/70439: Properly check conflict between DRAP register and __builtin_eh_return

2016-03-30 Thread Uros Bizjak
On Wed, Mar 30, 2016 at 1:00 PM, H.J. Lu  wrote:
> On Wed, Mar 30, 2016 at 12:47 AM, Uros Bizjak  wrote:
>> On Tue, Mar 29, 2016 at 8:56 PM, H.J. Lu  wrote:
>>> Since %ecx can't be used for both DRAP register and __builtin_eh_return,
>>> we need to check if crtl->drap_reg uses %ecx before using %ecx for
>>> __builtin_eh_return.
>>>
>>> Testing on x86-64.  OK for trunk if there are no regressions?
>>>
>>>
>>> H.J.
>>> ---
>>> PR target/70439
>>> * config/i386/i386.c (ix86_expand_epilogue): Properly check
>>> conflict between DRAP register and __builtin_eh_return.
>>> ---
>>>  gcc/config/i386/i386.c | 5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index 1639704..aafe171 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -13463,8 +13463,9 @@ ix86_expand_epilogue (int style)
>>>   rtx sa = EH_RETURN_STACKADJ_RTX;
>>>   rtx_insn *insn;
>>>
>>> - /* Stack align doesn't work with eh_return.  */
>>> - gcc_assert (!stack_realign_drap);
>>> + /* %ecx can't be used for both DRAP register and eh_return.  */
>>> + gcc_assert (!crtl->drap_reg
>>> + || REGNO (crtl->drap_reg) != CX_REG);
>>
>> How about:
>>
>> if (crtl->drap_reg)
>>   gcc_assert (REGNO (crtl->drap_reg) != CX_REG));
>>
>> ?
>>
>>>   /* Neither does regparm nested functions.  */
>>>   gcc_assert (!ix86_static_chain_on_stack);
>>
>> This comment needs to be updated, too.
>>
>> Uros.
>
> Like this?

Yes.

Thanks,
Uros.


Re: [PATCH] PR testsuite/70364: Properly align stack in gcc.target/i386/cleanup-[12].c

2016-03-30 Thread Jakub Jelinek
On Wed, Mar 30, 2016 at 09:07:17AM +0200, Uros Bizjak wrote:
> On Tue, Mar 29, 2016 at 10:13 PM, H.J. Lu  wrote:
> > Tested on x86-64.  OK for trunk?
> >
> > H.J.
> > ---
> > PR testsuite/70364
> > * gcc.target/i386/cleanup-1.c: Include <stdint.h>.
> > (check): New function.
> > (bar): Call check.
> > (foo): Align stack to 16 bytes when calling bar.
> > * gcc.target/i386/cleanup-2.c: Likewise.
> 
> OK, but let's also ask Jakub, the author of the testcases, for opinion.

Ok.

Jakub


Re: Do not give realistic estimates for loop with array accesses

2016-03-30 Thread Richard Biener
On Wed, 30 Mar 2016, Jan Hubicka wrote:

> Hi,
> while looking into sudoku solving benchark, I noticed that we incorrectly
> estimate loop to iterate 10 times just because the array it traverses is of
> dimension 10. This of course is just upper bound and not realistic bound.
> Realistically those loops iterates once most of time.
> 
> It turns out this bug was introduced by me in
> https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00444.html
> While I do not recall doing that patch, it seems like a thinko about reliable
> (name of the variable) and realistic (name of the parameter it is passed to).
> 
> Fixing this caused one testsuite fallout in predictive commoning testcase
> because loop unswitching previously disabled itself having an estimated number
> of iterations 1 (I am not sure if that is not supposed to be 0, with 1
> iteration unswithcing may pay back, little bit, I suppose it wants to test
> number of stmt executions of the condtional to be at least 2 which depends on
> whether the conditional is before or after the loop exits). This made me 
> notice
> that some loop passes that want to give up on low trip count uses combination
> of estimated number of iterations and max number of iterations while other
> don't which is fixed here. The code combining both realistic and max counts
> is same as i.e. in unroller and other passes already.
> 
> I also wonder if predictive comming is a win for loops with very low
> trip count, but that is for separate patch, too, anyway.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
> Honza
> 
>   * tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
>   estimates here.
>   * tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
>   max_loop_iterations_int.
>   * tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
>   * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> Index: tree-ssa-loop-niter.c
> ===
> --- tree-ssa-loop-niter.c (revision 234516)
> +++ tree-ssa-loop-niter.c (working copy)
> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>tree low, high, type, next;
>bool sign, upper = true, at_end = false;
>struct loop *loop = data->loop;
> -  bool reliable = true;
>  
>if (TREE_CODE (base) != ARRAY_REF)
>  return true;
> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>&& tree_int_cst_compare (next, high) <= 0)
>  return true;
>  
> -  /* If access is not executed on every iteration, we must ensure that 
> overlow may
> - not make the access valid later.  */
> +  /* If access is not executed on every iteration, we must ensure that 
> overlow
> + may not make the access valid later.  */
>if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>&& scev_probably_wraps_p (initial_condition_in_loop_num (ev, 
> loop->num),
>   step, data->stmt, loop, true))
> -reliable = false;
> +upper = false;
>  
> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, 
> upper);
> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, 
> upper);
>return true;
>  }
>  
> Index: tree-ssa-loop-unswitch.c
> ===
> --- tree-ssa-loop-unswitch.c  (revision 234516)
> +++ tree-ssa-loop-unswitch.c  (working copy)
> @@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
>/* If the loop is not expected to iterate, there is no need
>for unswitching.  */
>iterations = estimated_loop_iterations_int (loop);
> +  if (iterations < 0)
> +iterations = max_loop_iterations_int (loop);

You are only changing one place in this file.

>if (iterations >= 0 && iterations <= 1)
>   {
> if (dump_file && (dump_flags & TDF_DETAILS))
> Index: tree-ssa-loop-ivopts.c
> ===
> --- tree-ssa-loop-ivopts.c(revision 234516)
> +++ tree-ssa-loop-ivopts.c(working copy)
> @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
>  {
>HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
>if (niter == -1)
> -return AVG_LOOP_NITER (loop);
> +{
> +  niter = max_stmt_executions_int (loop);
> +  if (niter == -1 || niter > AVG_LOOP_NITER (loop))
> +return AVG_LOOP_NITER (loop);
> +}
>  
>return niter;
>  }
> Index: tree-vect-loop.c
> ===
> --- tree-vect-loop.c  (revision 234516)
> +++ tree-vect-loop.c  (working copy)
> @@ -2063,6 +2063,9 @@ start_over:
>  
>estimated_niter
>  = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
> +  if (estimated_niter != -1)
> +estimated_niter
> +  = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
>if (estimated_niter != -1
>&& ((unsigned HOST_WI

[PATCH] Fix PR70434

2016-03-30 Thread Richard Biener

The patch for PR63764 (accepts-invalid / ICE) caused us to generate
worse code for rvalue vector indexing by forcing the vector to a
temporary.  It turns out this is not necessary if we alter the
way the C/C++ FE lower the vector to perform the indexing operation
from lowering to a pointer-to-element to lowering to an array
using a VIEW_CONVERT_EXPR.

The alternate lowering has the advantage that the vector is not
required to be addressable which should improve aliasing analysis.
Not lowering to indirect accesses should also improve optimizations
like value-numbering.

There's the fallout that we need to make sure to convert back
constant index array-refs to the canonical BIT_FIELD_REF form
we use for vectors (see the gimple-fold.c hunk pattern-matching this).

And there's the latent bug in update-address-taken which happily
re-wrote a vector array-ref base into SSA form.

Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress.

I'll happily defer to stage1 if you think that's better as I don't
know any project that heavily makes use of GCCs vector extension
and I'm not sure about our own testsuite coverage.

Ok for trunk?

Thanks,
Richard.

2016-03-30  Richard Biener  

PR middle-end/70434
c-family/
* c-common.c (convert_vector_to_pointer_for_subscript): Use a
VIEW_CONVERT_EXPR to an array type.

* tree-ssa.c (non_rewritable_mem_ref_base): Make sure to mark
bases which are accessed with non-invariant indices.
* gimple-fold.c (maybe_canonicalize_mem_ref_addr): Re-write
constant index ARRAY_REFs of vectors into BIT_FIELD_REFs.

Index: gcc/c-family/c-common.c
===
*** gcc/c-family/c-common.c (revision 234545)
--- gcc/c-family/c-common.c (working copy)
*** convert_vector_to_pointer_for_subscript
*** 12448,12501 
if (VECTOR_TYPE_P (TREE_TYPE (*vecp)))
  {
tree type = TREE_TYPE (*vecp);
-   tree type1;
  
ret = !lvalue_p (*vecp);
if (TREE_CODE (index) == INTEGER_CST)
  if (!tree_fits_uhwi_p (index)
  || tree_to_uhwi (index) >= TYPE_VECTOR_SUBPARTS (type))
warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
  
!   if (ret)
!   {
! tree tmp = create_tmp_var_raw (type);
! DECL_SOURCE_LOCATION (tmp) = loc;
! *vecp = c_save_expr (*vecp);
! if (TREE_CODE (*vecp) == C_MAYBE_CONST_EXPR)
!   {
! bool non_const = C_MAYBE_CONST_EXPR_NON_CONST (*vecp);
! *vecp = C_MAYBE_CONST_EXPR_EXPR (*vecp);
! *vecp
!   = c_wrap_maybe_const (build4 (TARGET_EXPR, type, tmp,
! *vecp, NULL_TREE, NULL_TREE),
! non_const);
!   }
! else
!   *vecp = build4 (TARGET_EXPR, type, tmp, *vecp,
!   NULL_TREE, NULL_TREE);
! SET_EXPR_LOCATION (*vecp, loc);
! c_common_mark_addressable_vec (tmp);
!   }
!   else
!   c_common_mark_addressable_vec (*vecp);
!   type = build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
!   type1 = build_pointer_type (TREE_TYPE (*vecp));
!   bool ref_all = TYPE_REF_CAN_ALIAS_ALL (type1);
!   if (!ref_all
! && !DECL_P (*vecp))
!   {
! /* If the original vector isn't declared may_alias and it
!isn't a bare vector look if the subscripting would
!alias the vector we subscript, and if not, force ref-all.  */
! alias_set_type vecset = get_alias_set (*vecp);
! alias_set_type sset = get_alias_set (type);
! if (!alias_sets_must_conflict_p (sset, vecset)
! && !alias_set_subset_of (sset, vecset))
!   ref_all = true;
!   }
!   type = build_pointer_type_for_mode (type, ptr_mode, ref_all);
!   *vecp = build1 (ADDR_EXPR, type1, *vecp);
!   *vecp = convert (type, *vecp);
  }
return ret;
  }
--- 12448,12470 
if (VECTOR_TYPE_P (TREE_TYPE (*vecp)))
  {
tree type = TREE_TYPE (*vecp);
  
ret = !lvalue_p (*vecp);
+ 
if (TREE_CODE (index) == INTEGER_CST)
  if (!tree_fits_uhwi_p (index)
  || tree_to_uhwi (index) >= TYPE_VECTOR_SUBPARTS (type))
warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
  
!   /* We are building an ARRAY_REF so mark the vector as addressable
!  to not run into the gimplifiers premature setting of 
DECL_GIMPLE_REG_P
!for function parameters.  */
!   c_common_mark_addressable_vec (*vecp);
! 
!   *vecp = build1 (VIEW_CONVERT_EXPR,
! build_array_type_nelts (TREE_TYPE (type),
! TYPE_VECTOR_SUBPARTS (type)),
! *vecp);
  }
return ret;
  }
Index: gcc/tree-ssa.c
===
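
For context, the user-visible construct the lowering change is about is
subscripting of GCC vector-extension values, including rvalues.  A minimal
example (hypothetical code, not the PR testcase):

typedef int v4si __attribute__((vector_size (16)));

int
pick (v4si a, v4si b, int i)
{
  /* (a + b) is an rvalue vector.  With the old lowering its address had to
     be taken via a temporary; with the patch it is view-converted to a
     4-element array and indexed directly.  */
  return (a + b)[i];
}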

[c++/68475] ICE with fno-exceptions

2016-03-30 Thread Nathan Sidwell

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68475

This patch fixes an ICE with -fno-exceptions.  We were not checking eh spec 
equality when merging decls, leading to a checking-assert blowing up later.  As 
postulated in the bug report, always checking leads to good behaviour.  Even 
though we ignore the eh spec when generating code, we should check the source is 
well formed.


tested on x86_64-linux, ok?

nathan
2016-03-30  Nathan Sidwell  

	PR c++/68475
	* decl.c (check_redeclaration_exception_specification): Check
	regardless of -fno-exceptions.

	PR c++/68475
	* g++.dg/eh/spec12.C: New.

Index: cp/decl.c
===
--- cp/decl.c	(revision 234516)
+++ cp/decl.c	(working copy)
@@ -1200,9 +1200,11 @@ check_redeclaration_exception_specificat
  If any declaration of a function has an exception-specification,
  all declarations, including the definition and an explicit
  specialization, of that function shall have an
- exception-specification with the same set of type-ids.  */
+ exception-specification with the same set of type-ids.
+
+ Even when fno-exceptions is in effect, any specifications must
+ still  match.   */
   if (! DECL_IS_BUILTIN (old_decl)
-  && flag_exceptions
   && !comp_except_specs (new_exceptions, old_exceptions, ce_normal))
 {
   const char *msg
Index: testsuite/g++.dg/eh/spec12.C
===
--- testsuite/g++.dg/eh/spec12.C	(nonexistent)
+++ testsuite/g++.dg/eh/spec12.C	(working copy)
@@ -0,0 +1,19 @@
+// { dg-do compile }
+// { dg-additional-options "-fno-exceptions" }
+
+// PR68475 we used to not check eh spec matching with -fno-exceptions,
+// but this could lead to ICEs.
+
+template < typename > struct traits;
+
+template < typename T > struct X
+{
+  X & operator = (X &&) noexcept (traits < T >::foo ()); // { dg-inform "previous declaration" }
+};
+
+template < typename T >
+X < T > &
+X < T >::operator = (X &&) noexcept (traits < T >::bar ()) // { dg-error "different exception specifier" }
+{
+  return *this;
+}


Re: [PATCH GCC]Reduce compilation time for IVOPT by skipping cost computation in use group

2016-03-30 Thread Bin.Cheng
On Wed, Mar 30, 2016 at 9:09 AM, Richard Biener
 wrote:
> On Thu, Mar 24, 2016 at 6:26 PM, Bin Cheng  wrote:
>> Hi,
>> Quite a lot of time is used when IVOPT computes cost for <use, cand> pairs.  
>> As a matter of fact, some pairs are very similar to each other, and we can 
>> abstract and compute cost only once for these pairs.  This is a patch doing 
>> so, the idea is skipping cost computation for sub-uses in each group, of 
>> course it may result in different assembly code for some complicated cases 
>> because it estimates cost rather than doing real computation.  I did double 
>> check one of such case that the change in generated assembly is not 
>> degeneration.  For an IVOPT heavy program (spec2k/173), this patch reduces 
>> IVOPT's compilation time by 7~8%, as well as the memory consumption on my 
>> developing machine.
>>
>> Bootstrap & test on x86_64.
>>
>> For spec2k6 data on x86_64.  Maybe because I ran spec2k6 compiled with 
>> patched GCC in unclean environment, some cases are regressed by small amount 
>> (< %1).  I manually compared assembly code for several cases, including ones 
>> with the largest regression (still within <1%).  I could confirm that 
>> generated assembly code is exact the same as unpatched GCC, except for 
>> function emit_library_call_value_1 in 403.gcc/calls.c.
>>
>> In this case, difference of IVOPT dumps is as below:
>>
>> $ diff -y trunk/calls.c.154t.ivopts patch/calls.c.154t.ivopts
>>
>>   ::
>>   # val_21 = PHI                                           # val_21 = PHI
>>   _811 = (void *) ivtmp.322_829;                           _811 = (void *) ivtmp.322_829;
>>   MEM[base: _811, offset: -48B] = val_21;                |   MEM[base: _811, offset: -32B] = val_21;
>>   _810 = (void *) ivtmp.322_829;                           _810 = (void *) ivtmp.322_829;
>>   MEM[base: _810, offset: -40B] = mode_163;              |   MEM[base: _810, offset: -24B] = mode_163;
>>   _182 = function_arg (&args_so_far, mode_163, 0B, 1);      _182 = function_arg (&args_so_far, mode_163, 0B, 1);
>>   _809 = (void *) ivtmp.322_829;                           _809 = (void *) ivtmp.322_829;
>>   MEM[base: _809, offset: -32B] = _182;                  |   MEM[base: _809, offset: -16B] = _182;
>>   _807 = (void *) ivtmp.322_829;                           _807 = (void *) ivtmp.322_829;
>>   MEM[base: _807, offset: -24B] = 0;                     |   MEM[base: _807, offset: -8B] = 0;
>>   _185 = (struct args_size *) ivtmp.322_829;             |   _801 = ivtmp.322_829 + 16;
>>   _801 = ivtmp.322_829 + 18446744073709551600;           <
>>   _800 = (struct args_size *) _801;                        _800 = (struct args_size *) _801;
>>   _186 = _800;                                           |   _185 = _800;
>>                                                          >   _186 = (struct args_size *) ivtmp.322_829;
>>   _187 = _182 != 0B;                                       _187 = _182 != 0B;
>>   _188 = (int) _187;                                       _188 = (int) _187;
>>   locate_and_pad_parm (mode_163, 0B, _188, 0B, &args_size, _1     locate_and_pad_parm (mode_163, 0B, _188, 0B, &args_size, _1
>>   _802 = (void *) ivtmp.322_829;                           _802 = (void *) ivtmp.322_829;
>>   _190 = MEM[base: _802, offset: 8B];                    |   _190 = MEM[base: _802, offset: 24B];
>>   if (_190 != 0B)                                          if (_190 != 0B)
>>     goto ;                                                    goto 45>;
>>   else                                                     else
>>     goto ;                                                    goto 46>;
>>
>>   ::
>>   fancy_abort ("calls.c", 3724, &__FUNCTION__);            fancy_abort ("calls.c", 3724, &__FUNCTION__);
>>
>> It's only an offset difference in IV.  And below is difference of generated assembly:
>> $ diff -y trunk/calls.S patch/calls.S
>> .L489:                                                   .L489:
>>         leaq    -80(%rbp), %rdi                                  leaq    -80(%rbp), %rdi
>>         xorl    %edx, %edx                                       xorl    %edx, %edx
>>         movl    $1, %ecx                                         movl    $1, %ecx
>>         movl    %r13d, %esi                                      movl    %r13d, %esi
>>         movq    %rax, -48(%r15)                          <
>>         movl    %r13d, -40(%r15)                         <
>>         call    function_arg                             <
>>         movl    $0, -24(%r15)                            <
>>         movq    %rax, -32(%r15)

Re: [PATCH] Fix PR70434

2016-03-30 Thread Jakub Jelinek
On Wed, Mar 30, 2016 at 02:07:07PM +0200, Richard Biener wrote:
> The patch for PR63764 (accepts-invalid / ICE) caused us to generate
> worse code for rvalue vector indexing by forcing the vector to a
> temporary.  It turns out this is not necessary if we alter the
> way the C/C++ FE lower the vector to perform the indexing operation
> from lowering to a pointer-to-element to lowering to an array
> using a VIEW_CONVERT_EXPR.
> 
> The alternate lowering has the advantage that the vector is not
> required to be addressable which should improve aliasing analysis.
> Not lowering to indirect accesses should also improve optimizations
> like value-numbering.
> 
> There's the fallout that we need to make sure to convert back
> constant index array-refs to the canonical BIT_FIELD_REF form
> we use for vectors (see the gimple-fold.c hunk pattern-matching this).
> 
> And there's the latent bug in update-address-taken which happily
> re-wrote a vector array-ref base into SSA form.
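
(For illustration, a reduced, hypothetical example of the rvalue vector
indexing affected - not taken from the PR:)

typedef int v4si __attribute__ ((vector_size (16)));

int
elt (v4si v, int i)
{
  /* Lowering this subscript through a pointer to the element forces V into
     an addressable temporary; lowering through a VIEW_CONVERT_EXPR to an
     array type (int[4] here) keeps V non-addressable.  */
  return v[i];
}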
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress.
> 
> I'll happily defer to stage1 if you think that's better as I don't
> know any project that heavily makes use of GCCs vector extension
> and I'm not sure about our own testsuite coverage.

I think we should defer it for stage1 at this point.

> 2016-03-30  Richard Biener  
> 
>   PR middle-end/70434
>   c-family/
>   * c-common.c (convert_vector_to_pointer_for_subscript): Use a
>   VIEW_CONVERT_EXPR to an array type.
> 
>   * tree-ssa.c (non_rewritable_mem_ref_base): Make sure to mark
>   bases which are accessed with non-invariant indices.
>   * gimple-fold.c (maybe_canonicalize_mem_ref_addr): Re-write
>   constant index ARRAY_REFs of vectors into BIT_FIELD_REFs.

Jakub


Re: Do not give realistic estimates for loop with array accesses

2016-03-30 Thread Jan Hubicka
> 
> You are only changing one place in this file.

You are right. I am attaching the updated patch which I am re-testing now.
> 
> The vectorizer already checks this (albeit indirectly):
> 
>   HOST_WIDE_INT max_niter
> = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
>   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>&& (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
>   || (max_niter != -1
>   && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
> {
>   if (dump_enabled_p ())
> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>  "not vectorized: iteration count smaller than "
>  "vectorization factor.\n");
>   return false;
> }

Yes, but one tests only the vectorization_factor and the other tests
min_profitable_estimate, which should probably be greater than the
vectorization_factor.

The check above should therefore become redundant.  My reading of the code is
that min_profitable_estimate is computed after the check above, so it is
probably a useful shortcut and the message is also a bit more informative.
I updated the later test to use the max_niter variable once it is computed.

OK with those changes assuming testing passes?

Honza

* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
estimates here.
* tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
max_loop_iterations_int.
(tree_unswitch_outer_loop): Likewise.
* tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
Index: tree-ssa-loop-ivopts.c
===
--- tree-ssa-loop-ivopts.c  (revision 234516)
+++ tree-ssa-loop-ivopts.c  (working copy)
@@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
 {
   HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
   if (niter == -1)
-return AVG_LOOP_NITER (loop);
+{
+  niter = max_stmt_executions_int (loop);
+  if (niter == -1 || niter > AVG_LOOP_NITER (loop))
+return AVG_LOOP_NITER (loop);
+}
 
   return niter;
 }
Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 234516)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
   tree low, high, type, next;
   bool sign, upper = true, at_end = false;
   struct loop *loop = data->loop;
-  bool reliable = true;
 
   if (TREE_CODE (base) != ARRAY_REF)
 return true;
@@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
   && tree_int_cst_compare (next, high) <= 0)
 return true;
 
-  /* If access is not executed on every iteration, we must ensure that overlow 
may
- not make the access valid later.  */
+  /* If access is not executed on every iteration, we must ensure that overlow
+ may not make the access valid later.  */
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
   && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
step, data->stmt, loop, true))
-reliable = false;
+upper = false;
 
-  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, 
upper);
+  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, 
upper);
   return true;
 }
 
Index: tree-ssa-loop-unswitch.c
===
--- tree-ssa-loop-unswitch.c(revision 234516)
+++ tree-ssa-loop-unswitch.c(working copy)
@@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
   /* If the loop is not expected to iterate, there is no need
 for unswitching.  */
   iterations = estimated_loop_iterations_int (loop);
+  if (iterations < 0)
+iterations = max_loop_iterations_int (loop);
   if (iterations >= 0 && iterations <= 1)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -439,6 +441,8 @@ tree_unswitch_outer_loop (struct loop *l
   /* If the loop is not expected to iterate, there is no need
   for unswitching.  */
   iterations = estimated_loop_iterations_int (loop);
+  if (iterations < 0)
+iterations = max_loop_iterations_int (loop);
   if (iterations >= 0 && iterations <= 1)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
Index: tree-vect-loop.c
===
--- tree-vect-loop.c(revision 234516)
+++ tree-vect-loop.c(working copy)
@@ -2063,6 +2063,8 @@ start_over:
 
   estimated_niter
 = estimated_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
+  if (estimated_niter != -1)
+estimated_niter = max_niter;
   if (estimated_niter != -1
   && ((unsigned HOST_WIDE_INT) estimated_niter
   <= MAX (th, (unsigned)min_profitable_estimate)))


[PATCH] PR libitm/70456: Allocate aligned memory in gtm_thread operator new

2016-03-30 Thread H.J. Lu
Since GTM::gtm_thread has

gtm_thread *next_thread __attribute__((__aligned__(HW_CACHELINE_SIZE)));

GTM::gtm_thread::operator new should allocate aligned memory.
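
(A minimal sketch of the underlying issue, with hypothetical names and
assuming a 64-byte HW_CACHELINE_SIZE - not the patch itself:)

#include <stdlib.h>

struct padded_thread
{
  void *next __attribute__ ((__aligned__ (64)));  /* over-aligned member */
};

void *
allocate_padded_thread (void)
{
  /* malloc/xmalloc only guarantee alignof(max_align_t), typically 16 bytes,
     so a plain allocation may violate the 64-byte requirement;
     posix_memalign honours it.  */
  void *p = 0;
  if (posix_memalign (&p, __alignof__ (struct padded_thread),
                      sizeof (struct padded_thread)))
    return 0;
  return p;
}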

Tested on Linux/x86-64.  OK for trunk?


H.J.


PR libitm/70456
* beginend.cc (GTM::gtm_thread::operator new): Use posix_memalign
to allocate aligned memory.
---
 libitm/beginend.cc | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 20b5547..e2a8327 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -63,7 +63,14 @@ GTM::gtm_thread::operator new (size_t s)
 
   assert(s == sizeof(gtm_thread));
 
+#ifdef HAVE_POSIX_MEMALIGN
+  if (posix_memalign (&tx, __alignof__ (gtm_thread), sizeof (gtm_thread)))
+GTM_fatal ("Out of memory allocating %lu bytes aligned at %lu bytes",
+  (unsigned long) sizeof (gtm_thread),
+  (unsigned long) __alignof__ (gtm_thread));
+#else
   tx = xmalloc (sizeof (gtm_thread), true);
+#endif
   memset (tx, 0, sizeof (gtm_thread));
 
   return tx;
-- 
2.5.5



Re: Do not give realistic estimates for loop with array accesses

2016-03-30 Thread Richard Biener
On Wed, 30 Mar 2016, Jan Hubicka wrote:

> > 
> > You are only changing one place in this file.
> 
> You are right. I am attaching the updated patch which I am re-testing now.
> > 
> > The vectorizer already checks this (albeit indirectly):
> > 
> >   HOST_WIDE_INT max_niter
> > = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
> >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> >&& (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
> >   || (max_niter != -1
> >   && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
> > {
> >   if (dump_enabled_p ())
> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >  "not vectorized: iteration count smaller than "
> >  "vectorization factor.\n");
> >   return false;
> > }
> 
> Yes, but one tests only vectorization_factor and other min_profitable_estimate
> which probably should be greater than vectorization_factor.
> 
> The check above should therefore become redundant.  My reading of the code is
> that min_profiltable_estimate is computed after the check above, so it is
> probably an useful shortcut and the message is also bit more informative.
> I updated the later test to use max_niter variable once it is computed.
> 
> OK with those changes assuming testing passes?

Ok.

Richard.

> Honza
> 
>   * tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get realistic
>   estimates here.
>   * tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
>   max_loop_iterations_int.
>   (tree_unswitch_outer_loop): Likewise.
>   * tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
>   * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> Index: tree-ssa-loop-ivopts.c
> ===
> --- tree-ssa-loop-ivopts.c(revision 234516)
> +++ tree-ssa-loop-ivopts.c(working copy)
> @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
>  {
>HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
>if (niter == -1)
> -return AVG_LOOP_NITER (loop);
> +{
> +  niter = max_stmt_executions_int (loop);
> +  if (niter == -1 || niter > AVG_LOOP_NITER (loop))
> +return AVG_LOOP_NITER (loop);
> +}
>  
>return niter;
>  }
> Index: tree-ssa-loop-niter.c
> ===
> --- tree-ssa-loop-niter.c (revision 234516)
> +++ tree-ssa-loop-niter.c (working copy)
> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>tree low, high, type, next;
>bool sign, upper = true, at_end = false;
>struct loop *loop = data->loop;
> -  bool reliable = true;
>  
>if (TREE_CODE (base) != ARRAY_REF)
>  return true;
> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>&& tree_int_cst_compare (next, high) <= 0)
>  return true;
>  
> -  /* If access is not executed on every iteration, we must ensure that 
> overlow may
> - not make the access valid later.  */
> +  /* If access is not executed on every iteration, we must ensure that 
> overlow
> + may not make the access valid later.  */
>if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>&& scev_probably_wraps_p (initial_condition_in_loop_num (ev, 
> loop->num),
>   step, data->stmt, loop, true))
> -reliable = false;
> +upper = false;
>  
> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, 
> upper);
> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, 
> upper);
>return true;
>  }
>  
> Index: tree-ssa-loop-unswitch.c
> ===
> --- tree-ssa-loop-unswitch.c  (revision 234516)
> +++ tree-ssa-loop-unswitch.c  (working copy)
> @@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
>/* If the loop is not expected to iterate, there is no need
>for unswitching.  */
>iterations = estimated_loop_iterations_int (loop);
> +  if (iterations < 0)
> +iterations = max_loop_iterations_int (loop);
>if (iterations >= 0 && iterations <= 1)
>   {
> if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -439,6 +441,8 @@ tree_unswitch_outer_loop (struct loop *l
>/* If the loop is not expected to iterate, there is no need
>for unswitching.  */
>iterations = estimated_loop_iterations_int (loop);
> +  if (iterations < 0)
> +iterations = max_loop_iterations_int (loop);
>if (iterations >= 0 && iterations <= 1)
>  {
>if (dump_file && (dump_flags & TDF_DETAILS))
> Index: tree-vect-loop.c
> ===
> --- tree-vect-loop.c  (revision 234516)
> +++ tree-vect-loop.c  (working copy)
> @@ -2063,6 +2063,8 @@ start_over:
>  
>estimated_niter
>  = estimated_stmt_executi

Commit: ARM: Extend fix for PR 62254

2016-03-30 Thread Nick Clifton
Hi Guys,

  I am applying this patch as a further fix for PR 62254.

  In the long run we will hopefully be dropping support for ARM v3 (and
  earlier) so this is more in the nature of a plaster than a real fix.

Cheers
  Nick

gcc/ChangeLog
2016-03-30  Nick Clifton  

PR target/62254
* config/arm/arm.c (arm_reload_out_hi): Add code to handle the
case where we are already provided with an SImode SUBREG.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 234516)
+++ gcc/config/arm/arm.c(working copy)
@@ -15601,9 +15601,22 @@
 architecture variant does not have an HImode register move.  */
   if (base == NULL)
{
- gcc_assert (REG_P (outval));
- emit_insn (gen_movsi (gen_rtx_SUBREG (SImode, ref, 0),
-   gen_rtx_SUBREG (SImode, outval, 0)));
+ gcc_assert (REG_P (outval) || SUBREG_P (outval));
+
+ if (REG_P (outval))
+   {
+ emit_insn (gen_movsi (gen_rtx_SUBREG (SImode, ref, 0),
+   gen_rtx_SUBREG (SImode, outval, 0)));
+   }
+ else /* SUBREG_P (outval)  */
+   {
+ if (GET_MODE (SUBREG_REG (outval)) == SImode)
+   emit_insn (gen_movsi (gen_rtx_SUBREG (SImode, ref, 0),
+ SUBREG_REG (outval)));
+ else
+   /* FIXME: Handle other cases ?  */
+   gcc_unreachable ();
+   }
  return;
}
 }


Backports to 5.x branch

2016-03-30 Thread Jakub Jelinek
Hi!

I've bootstrapped/regtested on x86_64-linux and i686-linux following
backports from trunk and committed them to gcc-5-branch.

Jakub
2016-03-30  Jakub Jelinek  

Backported from mainline
2016-02-12  Jakub Jelinek  

PR ipa/68672
* ipa-split.c (split_function): Compute retval early in all cases
if split_part_return_p and return_bb is not EXIT.  Remove all
clobber stmts and reset all debug stmts that refer to SSA_NAMEs
defined in split part, except if it is retval, in that case replace
the old retval with the lhs of the call to the split part.

* g++.dg/ipa/pr68672-1.C: New test.
* g++.dg/ipa/pr68672-2.C: New test.
* g++.dg/ipa/pr68672-3.C: New test.

--- gcc/ipa-split.c (revision 233373)
+++ gcc/ipa-split.c (revision 233374)
@@ -1306,8 +1306,8 @@ split_function (basic_block return_bb, s
  FIXME: Once we are able to change return type, we should change function
  to return void instead of just outputting function with undefined return
  value.  For structures this affects quality of codegen.  */
-  else if (!split_point->split_part_set_retval
-   && find_retval (return_bb))
+  else if ((retval = find_retval (return_bb))
+  && !split_point->split_part_set_retval)
 {
   bool redirected = true;
   basic_block new_return_bb = create_basic_block (NULL, 0, return_bb);
@@ -1402,6 +1402,44 @@ split_function (basic_block return_bb, s
   DECL_FUNCTION_CODE (node->decl) = (enum built_in_function) 0;
 }
 
+  /* If return_bb contains any clobbers that refer to SSA_NAMEs
+ set in the split part, remove them.  Also reset debug stmts that
+ refer to SSA_NAMEs set in the split part.  */
+  if (return_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
+{
+  gimple_stmt_iterator gsi = gsi_start_bb (return_bb);
+  while (!gsi_end_p (gsi))
+   {
+ tree op;
+ ssa_op_iter iter;
+ gimple stmt = gsi_stmt (gsi);
+ bool remove = false;
+ if (gimple_clobber_p (stmt) || is_gimple_debug (stmt))
+   FOR_EACH_SSA_TREE_OPERAND (op, stmt, iter, SSA_OP_USE)
+ {
+   basic_block bb = gimple_bb (SSA_NAME_DEF_STMT (op));
+   if (op != retval
+   && bb
+   && bb != return_bb
+   && bitmap_bit_p (split_point->split_bbs, bb->index))
+ {
+   if (is_gimple_debug (stmt))
+ {
+   gimple_debug_bind_reset_value (stmt);
+   update_stmt (stmt);
+ }
+   else
+ remove = true;
+   break;
+ }
+ }
+ if (remove)
+   gsi_remove (&gsi, true);
+ else
+   gsi_next (&gsi);
+   }
+}
+
   /* If the original function is instrumented then it's
  part is also instrumented.  */
   if (with_bounds)
@@ -1554,7 +1592,7 @@ split_function (basic_block return_bb, s
  return value into and put call just before it.  */
   if (return_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
{
- real_retval = retval = find_retval (return_bb);
+ real_retval = retval;
  retbnd = find_retbnd (return_bb);
 
  if (real_retval && split_point->split_part_set_retval)
@@ -1600,6 +1638,28 @@ split_function (basic_block return_bb, s
break;
  }
  update_stmt (gsi_stmt (bsi));
+ /* Also adjust clobbers and debug stmts in return_bb.  */
+ for (bsi = gsi_start_bb (return_bb); !gsi_end_p (bsi);
+  gsi_next (&bsi))
+   {
+ gimple stmt = gsi_stmt (bsi);
+ if (gimple_clobber_p (stmt)
+ || is_gimple_debug (stmt))
+   {
+ ssa_op_iter iter;
+ use_operand_p use_p;
+ bool update = false;
+ FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter,
+   SSA_OP_USE)
+   if (USE_FROM_PTR (use_p) == real_retval)
+ {
+   SET_USE (use_p, retval);
+   update = true;
+ }
+ if (update)
+   update_stmt (stmt);
+   }
+   }
}
 
  /* Replace retbnd with new one.  */
--- gcc/testsuite/g++.dg/ipa/pr68672-1.C(revision 0)
+++ gcc/testsuite/g++.dg/ipa/pr68672-1.C(revision 233374)
@@ -0,0 +1,20 @@
+// PR ipa/68672
+// { dg-do compile }
+// { dg-options "-O -finline-small-fu

[PATCH] PR target/70454: Check --with-arch=/--with-arch-32= for 32-bit x86 libatomic library

2016-03-30 Thread H.J. Lu
If --with-arch-32= is used to configure GCC, it should be used to
compile the 32-bit x86 libatomic library.  Since any --with-arch= value
accepted for 64-bit is at least i486, we can also use it for the 32-bit
target library.

Tested on x86-64, with and without --with-arch=.  OK for stage 1?

H.J.
--
PR target/70454
* configure.tgt: Use --with-arch=/--with-arch-32= for 32-bit
x86 target library.
---
 libatomic/configure.tgt | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index c5470d7..345c02b 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -83,8 +83,12 @@ case "${target_cpu}" in
   x86_64)
case " ${CC} ${CFLAGS} " in
  *" -m32 "*)
-   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
-   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
+   # Since --with-arch for 64-bit > i486, we can use it for
+   # for 32-bit.
+   if test -z "$with_arch_32" && test -z "$with_arch"; then
+ XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
+ XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
+   fi
;;
  *)
;;
-- 
2.5.5



[PATCH] PR target/70454: Check --with-arch=/--with-arch-32= for 32-bit x86 libgomp library

2016-03-30 Thread H.J. Lu
If --with-arch-32= is used to configure GCC, it should be used to
compile the 32-bit x86 libgomp library.  Since any --with-arch= value
accepted for 64-bit is at least i486, we can also use it for the 32-bit
target library.

Tested on x86-64, with and without --with-arch=.  OK for stage 1?

H.J.
PR target/70454
* configure.tgt: Use --with-arch=/--with-arch-32= for 32-bit
x86 target library.
---
 libgomp/configure.tgt | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt
index 77e73f0..c99b620 100644
--- a/libgomp/configure.tgt
+++ b/libgomp/configure.tgt
@@ -79,14 +79,15 @@ if test x$enable_linux_futex = xyes; then
esac
;;
 
-# Similar jiggery-pokery for x86_64 multilibs, except here we
-# can't rely on the --with-arch configure option, since that
-# applies to the 64-bit side.
 x86_64-*-linux*)
config_path="linux/x86 linux posix"
case " ${CC} ${CFLAGS} " in
  *" -m32 "*)
-   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
+   # Since --with-arch for 64-bit > i486, we can use it for
+   # for 32-bit.
+   if test -z "$with_arch_32" && test -z "$with_arch"; then
+ XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
+   fi
;;
esac
;;
-- 
2.5.5



[PATCH] PR target/70454: Check --with-arch=/--with-arch-32= for 32-bit x86 libitm library

2016-03-30 Thread H.J. Lu
If --with-arch-32= is used to configure GCC, it should be used to
compile the 32-bit x86 libitm library.  Since any --with-arch= value
accepted for 64-bit is at least i486, we can also use it for the 32-bit
target library.

Tested on x86-64, with and without --with-arch=.  OK for stage 1?
 
H.J.
PR target/70454
* configure.tgt: Use --with-arch=/--with-arch-32= for 32-bit
x86 target library.
---
 libitm/configure.tgt | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libitm/configure.tgt b/libitm/configure.tgt
index e84382f..13e7cde 100644
--- a/libitm/configure.tgt
+++ b/libitm/configure.tgt
@@ -102,8 +102,12 @@ case "${target_cpu}" in
   x86_64)
case " ${CC} ${CFLAGS} " in
  *" -m32 "*)
-   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
-   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
+   # Since --with-arch for 64-bit > i486, we can use it for
+   # for 32-bit.
+   if test -z "$with_arch_32" && test -z "$with_arch"; then
+ XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
+ XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
+   fi
;;
esac
XCFLAGS="${XCFLAGS} -mrtm"
-- 
2.5.5



[PATCH][C++] Fix PR70430

2016-03-30 Thread Richard Biener

The following fixes a pasto in cp_build_binary_op.  Will apply
as obvious after it passes bootstrap/testing on x86_64-unknown-linux-gnu.

Richard.

2016-03-30  Richard Biener  

PR c++/70430
* typeck.c (cp_build_binary_op): Fix operand order of vector
conditional in truth op handling.

* g++.dg/ext/vector30.C: New testcase.
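
For reference, the behaviour the corrected operand order restores can be
sketched as follows (a reduced example using GNU vector extensions, not part
of the patch):

typedef int v4si __attribute__ ((vector_size (16)));

/* For a vector "&&"/"||" with a scalar operand, the scalar is first turned
   into a mask vector: all-ones lanes when it is true, all-zero lanes when it
   is false.  The pasto had the two arms of this conditional swapped.  */
v4si
scalar_to_mask (int c)
{
  v4si m1 = {-1, -1, -1, -1};   /* build_all_ones_cst */
  v4si z  = { 0,  0,  0,  0};   /* build_zero_cst */
  return c ? m1 : z;
}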

Index: gcc/cp/typeck.c
===
*** gcc/cp/typeck.c (revision 234546)
--- gcc/cp/typeck.c (working copy)
*** cp_build_binary_op (location_t location,
*** 4364,4370 
{
  tree m1 = build_all_ones_cst (TREE_TYPE (op0));
  tree z = build_zero_cst (TREE_TYPE (op0));
! op1 = build_conditional_expr (location, op1, z, m1, complain);
}
  else if (!COMPARISON_CLASS_P (op1))
op1 = cp_build_binary_op (EXPR_LOCATION (op1), NE_EXPR, op1,
--- 4364,4370 
{
  tree m1 = build_all_ones_cst (TREE_TYPE (op0));
  tree z = build_zero_cst (TREE_TYPE (op0));
! op1 = build_conditional_expr (location, op1, m1, z, complain);
}
  else if (!COMPARISON_CLASS_P (op1))
op1 = cp_build_binary_op (EXPR_LOCATION (op1), NE_EXPR, op1,
Index: gcc/testsuite/g++.dg/ext/vector30.C
===
*** gcc/testsuite/g++.dg/ext/vector30.C (revision 0)
--- gcc/testsuite/g++.dg/ext/vector30.C (working copy)
***
*** 0 
--- 1,15 
+ // PR c++/70430
+ // { dg-do run }
+ extern "C" void abort (void);
+ typedef int v4si __attribute__ ((vector_size (16)));
+ int main()
+ {
+   v4si b = {1,0,-1,2}, c;
+   c = b && 1;
+   if (c[0] != -1 || c[1] != 0 || c[2] != -1 || c[3] != -1)
+ abort ();
+   c = b && 0;
+   if (c[0] != 0 || c[1] != 0 || c[2] != 0 || c[3] != 0)
+ abort ();
+   return 0;
+ }


Re: Do not give realistic estimates for loop with array accesses

2016-03-30 Thread Bin.Cheng
On Wed, Mar 30, 2016 at 11:00 AM, Jan Hubicka  wrote:
> Hi,
> while looking into a sudoku solving benchmark, I noticed that we incorrectly
> estimate a loop to iterate 10 times just because the array it traverses is of
> dimension 10.  This of course is just an upper bound, not a realistic bound.
> Realistically those loops iterate only once most of the time.
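
(A reduced, hypothetical loop of the shape described above:)

int board[10];

int
first_empty (int n)
{
  int i;
  /* The exit test depends on N, so the only compile-time bound comes from
     the accesses to board[i]: at most 10 iterations.  That is an upper
     bound, not a realistic estimate - such loops typically exit after a
     couple of iterations.  */
  for (i = 0; i < n; i++)
    if (board[i] == 0)
      return i;
  return -1;
}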
>
> It turns out this bug was introduced by me in
> https://gcc.gnu.org/ml/gcc-patches/2013-01/msg00444.html
> While I do not recall doing that patch, it seems like a thinko about reliable
> (name of the variable) and realistic (name of the parameter it is passed to).
>
> Fixing this caused one testsuite fallout in a predictive commoning testcase,
> because loop unswitching previously disabled itself on an estimated number of
> iterations of 1 (I am not sure whether that is not supposed to be 0; with 1
> iteration unswitching may pay off a little bit, and I suppose it wants to test
> that the number of stmt executions of the conditional is at least 2, which
> depends on whether the conditional is before or after the loop exits).  This
> made me notice that some loop passes that want to give up on a low trip count
> use a combination of the estimated number of iterations and the max number of
> iterations while others don't, which is fixed here.  The code combining both
> realistic and max counts is the same as e.g. in the unroller and other passes
> already.
>
> I also wonder if predictive commoning is a win for loops with very low
> trip count, but that is for a separate patch, too, anyway.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
> * tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get 
> realistic
> estimates here.
> * tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
> max_loop_iterations_int.
> * tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
> * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
> Index: tree-ssa-loop-niter.c
> ===
> --- tree-ssa-loop-niter.c   (revision 234516)
> +++ tree-ssa-loop-niter.c   (working copy)
> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>tree low, high, type, next;
>bool sign, upper = true, at_end = false;
>struct loop *loop = data->loop;
> -  bool reliable = true;
>
>if (TREE_CODE (base) != ARRAY_REF)
>  return true;
> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>&& tree_int_cst_compare (next, high) <= 0)
>  return true;
>
> -  /* If access is not executed on every iteration, we must ensure that 
> overlow may
> - not make the access valid later.  */
> +  /* If access is not executed on every iteration, we must ensure that 
> overlow
> + may not make the access valid later.  */
>if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
>&& scev_probably_wraps_p (initial_condition_in_loop_num (ev, 
> loop->num),
> step, data->stmt, loop, true))
> -reliable = false;
> +upper = false;
>
> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, 
> upper);
> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, 
> upper);
>return true;
>  }
Hi,
I have a concern that GCC records bound information even from basic blocks
that don't dominate the loop latch.  Given the example below:

extern int b[];
void bar (int *);
int foo (int x, int n)
{
  int i;
  int a[128] = {0};

  for (i = 0; i < n; i++)
{
  if (x > i)
{
  a[i] = i;
  b[i] = i;
}
}
  bar (a);
  return 0;
}
The upper bound inferred from a[i] is 127.  This information is
recorded along with the stmt itself in loop->bounds.  Afterwards, this
information is also used in a flow-sensitive way.  In this example, we
are sure that &b[i] won't overflow (thus a SCEV) because it's in the
same basic block as a[i].  GCC currently relies on such information in
overflow detection for scev, i.e., loop_exits_before_overflow.

But with this change, we won't record upper bound information in
record_estimate because the parameter is set to false?

Thanks,
bin


[PATCH] Fix PR70450

2016-03-30 Thread Richard Biener

The following patch fixes PR70450, where I (again...) was not able
to decipher the wide-int workings on a first try.  (The 'sign' op
looks redundant for a tree wide_int::from, so I was thinking it
must apply to the destination to specify eventual zero/sign-extension
if the 'precision' arg is not a multiple of HWI bits - apparently
a case that needs to be handled separately - I wonder about
lurking bits in other similar users for this case.)
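
(A hypothetical reduction of the extension issue, assuming an LP64 target as
the testcase below does:)

#include <assert.h>

int
main (void)
{
  /* op1 is the int constant -4 and the result type is unsigned long.
     Extending it with the sign of the *result* type zero-extends it, while
     extending it with the sign of its *own* type sign-extends it - only the
     latter reproduces the value the testcase expects.  */
  unsigned long zext = (unsigned long) (unsigned int) -4;
  unsigned long sext = (unsigned long) (long) -4;
  assert (zext == 4294967292UL);
  assert (sext == 18446744073709551612UL);
  assert (2 * (4 * sext) == 18446744073709551584UL);
  return 0;
}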

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-03-30  Richard Biener  

PR middle-end/70450
* fold-const.c (extract_muldiv_1): Fix thinko in wide_int::from
usage.

* gcc.dg/torture/pr70450.c: New testcase.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 234546)
--- gcc/fold-const.c(working copy)
*** extract_muldiv_1 (tree t, tree c, enum t
*** 6375,6382 
  bool overflow_mul_p;
  signop sign = TYPE_SIGN (ctype);
  unsigned prec = TYPE_PRECISION (ctype);
! wide_int mul = wi::mul (wide_int::from (op1, prec, sign),
! wide_int::from (c, prec, sign),
  sign, &overflow_mul_p);
  overflow_p = TREE_OVERFLOW (c) | TREE_OVERFLOW (op1);
  if (overflow_mul_p
--- 6375,6384 
  bool overflow_mul_p;
  signop sign = TYPE_SIGN (ctype);
  unsigned prec = TYPE_PRECISION (ctype);
! wide_int mul = wi::mul (wide_int::from (op1, prec,
! TYPE_SIGN (TREE_TYPE (op1))),
! wide_int::from (c, prec,
! TYPE_SIGN (TREE_TYPE (c))),
  sign, &overflow_mul_p);
  overflow_p = TREE_OVERFLOW (c) | TREE_OVERFLOW (op1);
  if (overflow_mul_p
Index: gcc/testsuite/gcc.dg/torture/pr70450.c
===
*** gcc/testsuite/gcc.dg/torture/pr70450.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr70450.c  (working copy)
***
*** 0 
--- 1,19 
+ /* { dg-do run } */
+ /* { dg-require-effective-target lp64 } */
+ 
+ unsigned long int a = 2UL;
+ int b = 2;
+ unsigned long int c = 2UL;
+ 
+ void foo ()
+ {
+   c = 2 * ((2 * a) * (2 * (-b)));
+ }
+ 
+ int main ()
+ {
+   foo();
+   if (c != 18446744073709551584UL)
+ __builtin_abort();
+   return 0;
+ }


Re: [c++/68475] ICE with fno-exceptions

2016-03-30 Thread Jason Merrill

On 03/30/2016 08:09 AM, Nathan Sidwell wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68475

This patch fixes an ICE with -fno-exceptions.  We were not checking eh
spec equality when merging decls, leading to a checking-assert blowing
up later.  As postulated in the bug report, always checking leads to
good behaviour.  Even though we ignore the eh spec when generating code,
we should check the source is well formed.


Hmm, I think the use of the flag there was meant to allow leaving the 
exception specification off in some declarations.  I'm open to getting 
stricter, but I'd prefer to make it a pedwarn when !flag_exceptions 
rather than an error, in which case we still need to deal with the 
mismatch in merge_exception_specifiers.


Jason



Re: [PATCH] Fix ix86_expand_vector_set (PR target/70421)

2016-03-30 Thread Kirill Yukhin
On 29 Mar 19:49, Jakub Jelinek wrote:
> On Tue, Mar 29, 2016 at 11:44:15AM -0600, Jeff Law wrote:
> > On 03/29/2016 11:05 AM, Jakub Jelinek wrote:
> > >Hi!
> > >
> > >The various blendm expanders look like:
> > >(define_insn "_blendm"
> > >   [(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
> > > (vec_merge:V48_AVX512VL
> > >   (match_operand:V48_AVX512VL 2 "nonimmediate_operand" "vm")
> > >   (match_operand:V48_AVX512VL 1 "register_operand" "v")
> > One could argue this ordering is just asking for trouble.
> 
> I bet the reason for this ordering are both the x86 intrinsics and
> the HW behavior (see e.g. the order of arguments in the insn template
> of the define_insn, etc.).
> I think VEC_MERGE's definition on which argument you pick the elements from
> for 0 bits in the mask vs. 1 bits in the mask is the exact opposite of what
> the x86 HW wants and the intrinsics expect.
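
(A scalar model of the RTL semantics under discussion; per the GCC internals
definition of vec_merge, element i of the result comes from the first vector
when bit i of the mask is set, otherwise from the second.  Hypothetical
helper, not GCC code:)

static void
vec_merge_model (const int *a, const int *b, unsigned mask, int *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = ((mask >> i) & 1) ? a[i] : b[i];
}
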
I think the order of arguments in the built-in might be changed easily.
Because of that, this doesn't affect the intrinsics at all.

I can hardly recall, but my bet is that this order was dictated by
ix86_expand_sse_movcc, where the order of the blend args should correspond
on AVX*.
If we want to fix the order in the pattern we should swap op_[true|false]
here:
case V64QImode:
  gen = gen_avx512bw_blendmv64qi;
  break;
case V32HImode:
  gen = gen_avx512bw_blendmv32hi;
  break;
case V16SImode:
  gen = gen_avx512f_blendmv16si;
  break;
case V8DImode:
  gen = gen_avx512f_blendmv8di;
  break;
case V8DFmode:
  gen = gen_avx512f_blendmv8df;
  break;
case V16SFmode:
  gen = gen_avx512f_blendmv16sf;
  break;

Jakub, maybe add a comment in the patch on the blendm patterns emphasizing
this non-regular order?

Otherwise, patch is OK to me.

--
Thanks, K

> 
>   Jakub


Re: [PATCH] Fix ix86_expand_vector_set (PR target/70421)

2016-03-30 Thread Jakub Jelinek
On Wed, Mar 30, 2016 at 04:53:48PM +0300, Kirill Yukhin wrote:
> I think that order of arguments in built-in might be changed easily.
> This doesn't affect intrinsics at all, because of that.
> 
> I can hardly recall, but my bet is that this order was dictated by:
> ix86_expand_sse_movcc where order of blends args should corresond
> on AVX*. 

Having the AVX512* blends have different order from AVX{,2} blends would be
bad though, so if we want to change the order, we'd have to change it
everywhere.

> Jakub, nay be add comment in the patch on blendm patterns emphasizing
> this non-regular order?

Ok, will do.

Jakub


Re: Update OpenACC test cases

2016-03-30 Thread Jakub Jelinek
On Wed, Mar 30, 2016 at 04:06:30PM +0200, Thomas Schwinge wrote:
> This is to integrate into trunk a large amount of the test case updates
> that we have accumulated on gomp-4_0-branch.  OK to commit?

Ok.

Jakub


Re: Do not give realistic estimates for loop with array accesses

2016-03-30 Thread Jan Hubicka
> > -  /* If access is not executed on every iteration, we must ensure that 
> > overlow may
> > - not make the access valid later.  */
> > +  /* If access is not executed on every iteration, we must ensure that 
> > overlow
> > + may not make the access valid later.  */
> >if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
> >&& scev_probably_wraps_p (initial_condition_in_loop_num (ev, 
> > loop->num),
> > step, data->stmt, loop, true))
> > -reliable = false;
> > +upper = false;
> >
> > -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, 
> > reliable, upper);
> > +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, 
> > upper);
> >return true;
> >  }
> Hi,
> I have a concern that GCC records bound information from basic blocks
> even that don't dominate loop latch.  Given below example:
> 
> extern int b[];
> void bar (int *);
> int foo (int x, int n)
> {
>   int i;
>   int arr[128] = {0};
> 
>   for (i = 0; i < n; i++)
> {
>   if (x > i)
> {
>   a[i] = i;
>   b[i] = i;
> }
> }
>   bar (arr);
>   return 0;
> }

This testcase is not affected, because scev_probably_wraps_p returns false in
this case.
In the wrapping case, we can't derive an upper bound - this is indeed a
correctness issue.
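
(A hypothetical illustration of the wrapping case:)

extern int t[256];

int
sum (int n)
{
  int s = 0;
  /* The unsigned char index wraps back into range every 256 iterations, so
     "t[i] stays in bounds" says nothing about how many times the loop
     iterates - no upper bound can be derived from this access.  */
  for (int j = 0; j < n; j++)
    {
      unsigned char i = (unsigned char) j;
      s += t[i];
    }
  return s;
}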

I am experimenting with enabling loop peeling by default.  For that I extended
the code to record likely upper bounds, which can be used to test if the loop
most probably has a low trip count.  This is also useful to throttle other
transformations.

In this case we can probably assume that no sane person would count
on wrapping, and record this as a likely bound.

Honza


Re: rs6000 stack_tie mishap again

2016-03-30 Thread Alan Modra
On Wed, Mar 30, 2016 at 11:02:41AM +0200, Olivier Hainque wrote:
> void g(int, char *);
> 
> void f(int x)
> {
>char big[20];
>  start:
>g(x, big);
>g(x, big);
>register void *p asm("r11") = &&start;
>asm("" : : "r"(p));
>asm("" : : :"r28");
>asm("" : : :"r29");
>asm("" : : :"r30");
> }
> 
> I'm getting:
> 
> lis 11,.L2@ha
> la 11,.L2@l(11)
> 
> lwz 11,0(1)
> lwz 0,4(11)
> lwz 28,-16(11) 
> 
>   mr 1,11
> 
>   mtlr 0
>   lwz 29,-12(11)
>   lwz 30,-8(11)
>   lwz 31,-4(11)
> 
>   blr

BTW, the exact sequence you get depends on -mcpu (not surprising), but
yes, I see register restores after the "mr 1,11" too.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Do not give realistic estimates for loop with array accesses

2016-03-30 Thread Bin.Cheng
On Wed, Mar 30, 2016 at 3:22 PM, Jan Hubicka  wrote:
>> > -  /* If access is not executed on every iteration, we must ensure that 
>> > overlow may
>> > - not make the access valid later.  */
>> > +  /* If access is not executed on every iteration, we must ensure that 
>> > overlow
>> > + may not make the access valid later.  */
>> >if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb 
>> > (data->stmt))
>> >&& scev_probably_wraps_p (initial_condition_in_loop_num (ev, 
>> > loop->num),
>> > step, data->stmt, loop, true))
>> > -reliable = false;
>> > +upper = false;
>> >
>> > -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, 
>> > reliable, upper);
>> > +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, 
>> > upper);
>> >return true;
>> >  }
>> Hi,
>> I have a concern that GCC records bound information from basic blocks
>> even that don't dominate loop latch.  Given below example:
>>
>> extern int b[];
>> void bar (int *);
>> int foo (int x, int n)
>> {
>>   int i;
>>   int arr[128] = {0};
>>
>>   for (i = 0; i < n; i++)
>> {
>>   if (x > i)
>> {
>>   a[i] = i;
>>   b[i] = i;
>> }
>> }
>>   bar (arr);
>>   return 0;
>> }
>
> This testcase is not affected, becase scev_probably_wraps_p returns false in 
> this case.
> In the wrapping case, we can't derive upper bound - this is indeed a 
> correctness issue.
In the wrapping case, we can still derive an upper bound if the index's
wrapping range is larger than the array bound.  But I agree it looks like
a very corner case and is not likely to be useful in practice.

Thanks,
bin
>
> I am experiemtning with enabling loop peeling by default. For that I extended 
> the code
> to record likely upper bounds, which can be used to test if the loop most 
> probably has
> low trip count. This is also useful to trottle other transformations.
>
> In this case we can probably assume that no sane person would count
> on wrapping and record this as likely bound.
>
> Honza


Re: [PATCH] c++/67376 Comparison with pointer to past-the-end, of array fails inside constant expression

2016-03-30 Thread Jason Merrill

On 03/29/2016 11:57 PM, Martin Sebor wrote:

Are we confident that arr[0] won't make it here as POINTER_PLUS_EXPR or
some such?


I'm as confident as I can be given that this is my first time
working in this area.  Which piece of code or what assumption
in particular are you concerned about?


I want to be sure that we don't fold these conditions to false.

constexpr int *ip = 0;
constexpr struct A { int ar[3]; } *ap = 0;

static_assert(&ip[0] == 0);
static_assert(&(ap->ar[0]) == 0);

Jason



[arm-embedded]: Don't ignore target_header_dir when deciding inhibit_libc

2016-03-30 Thread Andre Vieira (lists)


On 17/03/16 16:33, Andre Vieira (lists) wrote:
> On 23/10/15 12:31, Bernd Schmidt wrote:
>> On 10/12/2015 11:58 AM, Ulrich Weigand wrote:
>>>
>>> Index: gcc/configure.ac
>>> ===
>>> --- gcc/configure.ac(revision 228530)
>>> +++ gcc/configure.ac(working copy)
>>> @@ -1993,7 +1993,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; t
>>>   fi
>>>
>>>   if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
>>> -  if test "x$with_headers" != x; then
>>> +  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
>>>   target_header_dir=$with_headers
>>> elif test "x$with_sysroot" = x; then
>>>  
>>> target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
>>>
>>
>> I'm missing the beginning of this conversation, but this looks like a
>> reasonable change (avoiding target_header_dir=yes for --with-headers).
>> So, approved.
>>
>>
>> Bernd
>>
> Hi there,
> 
> I was wondering why this never made it to trunk. I am currently running
> into an issue that this patch would fix.
> 
> Cheers,
> Andre
> 
We decided to apply this to the embedded-5-branch at revision r234576.

Cheers,
Andre


backported patch for PR69614

2016-03-30 Thread Vladimir Makarov

  The patch for PR69614 has been backported to gcc-5 branch:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69614

  Committed as rev. 234577.



[RFC][PATCH v3, ARM 5/8] ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers

2016-03-30 Thread Andre Vieira (lists)
On 29/03/16 17:49, Andre Vieira (lists) wrote:
> On 29/01/16 17:07, Andre Vieira (lists) wrote:
>> On 26/12/15 01:54, Thomas Preud'homme wrote:
>>> [Sending on behalf of Andre Vieira]
>>>
>>> Hello,
>>>
>>> This patch extends support for the ARMv8-M Security Extensions
>>> 'cmse_nonsecure_entry' attribute to safeguard against leak of
>>> information through unbanked registers.
>>>
>>> When returning from a nonsecure entry function we clear all
>>> caller-saved registers that are not used to pass return values, by
>>> writing either the LR, in case of general purpose registers, or the
>>> value 0, in case of FP registers. We use the LR to write to APSR and
>>> FPSCR too. We currently only support 32 FP registers as in we only
>>> clear D0-D7.
>>> We currently do not support entry functions that pass arguments or
>>> return variables on the stack and we diagnose this. This patch relies
>>> on the existing code to make sure callee-saved registers used in
>>> cmse_nonsecure_entry functions are saved and restored thus retaining
>>> their nonsecure mode value, this should be happening already as it is
>>> required by AAPCS.
>>>
>>>
>>> *** gcc/ChangeLog ***
>>> 2015-10-27  Andre Vieira
>>>  Thomas Preud'homme  
>>>
>>>  * gcc/config/arm/arm.c (output_return_instruction): Clear
>>>registers.
>>>(thumb2_expand_return): Likewise.
>>>(thumb1_expand_epilogue): Likewise.
>>>(arm_expand_epilogue): Likewise.
>>>(cmse_nonsecure_entry_clear_before_return): New.
>>>  * gcc/config/arm/arm.h (TARGET_DSP_ADD): New macro define.
>>>  * gcc/config/arm/thumb1.md (*epilogue_insns): Change length
>>> attribute.
>>>  * gcc/config/arm/thumb2.md (*thumb2_return): Likewise.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>> 2015-10-27  Andre Vieira
>>>  Thomas Preud'homme  
>>>
>>>  * gcc.target/arm/cmse/cmse.exp: Test different multilibs
>>> separate.
>>>  * gcc.target/arm/cmse/baseline/cmse-2.c: Test that registers
>>> are cleared.
>>>  * gcc.target/arm/cmse/mainline/soft/cmse-5.c: New.
>>>  * gcc.target/arm/cmse/mainline/hard/cmse-5.c: New.
>>>  * gcc.target/arm/cmse/mainline/hard-sp/cmse-5.c: New.
>>>  * gcc.target/arm/cmse/mainline/softfp/cmse-5.c: New.
>>>  * gcc.target/arm/cmse/mainline/softfp-sp/cmse-5.c: New.
>>>
>>>
>>> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
>>> index
>>> f12e3c93bbe24b10ed8eee6687161826773ef649..b06e0586a3da50f57645bda13629bc4dbd3d53b7
>>> 100644
>>> --- a/gcc/config/arm/arm.h
>>> +++ b/gcc/config/arm/arm.h
>>> @@ -230,6 +230,9 @@ extern void
>>> (*arm_lang_output_object_attributes_hook)(void);
>>>   /* Integer SIMD instructions, and extend-accumulate instructions.  */
>>>   #define TARGET_INT_SIMD \
>>> (TARGET_32BIT && arm_arch6 && (arm_arch_notm || arm_arch7em))
>>> +/* Parallel addition and subtraction instructions.  */
>>> +#define TARGET_DSP_ADD \
>>> +  (TARGET_ARM_ARCH >= 6 && (arm_arch_notm || arm_arch7em))
>>>
>>>   /* Should MOVW/MOVT be used in preference to a constant pool.  */
>>>   #define TARGET_USE_MOVT \
>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> index
>>> e530b772e3cc053c16421a2a2861d815d53ebb01..0700478ca38307f35d0cb01f83ea182802ba28fa
>>> 100644
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -19755,6 +19755,24 @@ output_return_instruction (rtx operand, bool
>>> really_return, bool reverse,
>>>   default:
>>> if (IS_CMSE_ENTRY (func_type))
>>>   {
>>> +  char flags[12] = "APSR_nzcvq";
>>> +  /* Check if we have to clear the 'GE bits' which is only
>>> used if
>>> + parallel add and subtraction instructions are available.  */
>>> +  if (TARGET_DSP_ADD)
>>> +{
>>> +  /* If so also clear the ge flags.  */
>>> +  flags[10] = 'g';
>>> +  flags[11] = '\0';
>>> +}
>>> +  snprintf (instr, sizeof (instr),  "msr%s\t%s, %%|lr",
>>> conditional,
>>> +flags);
>>> +  output_asm_insn (instr, & operand);
>>> +  if (TARGET_HARD_FLOAT && TARGET_VFP)
>>> +{
>>> +  snprintf (instr, sizeof (instr), "vmsr%s\tfpscr, %%|lr",
>>> +conditional);
>>> +  output_asm_insn (instr, & operand);
>>> +}
>>> snprintf (instr, sizeof (instr), "bxns%s\t%%|lr",
>>> conditional);
>>>   }
>>> /* Use bx if it's available.  */
>>> @@ -23999,6 +24017,17 @@ thumb_pop (FILE *f, unsigned long mask)
>>>   static void
>>>   thumb1_cmse_nonsecure_entry_return (FILE *f, int
>>> reg_containing_return_addr)
>>>   {
>>> +  char flags[12] = "APSR_nzcvq";
>>> +  /* Check if we have to clear the 'GE bits' which is only used if
>>> + parallel add and subtraction instructions are available.  */
>>> +  if (TARGET_DSP_ADD)
>>> +{
>>> +  flags[10] = 'g';
>>> +  flags[11] = '\0';
>>> +}
>

C++ PATCH for c++/70449 (ICE when printing a filename of unknown location)

2016-03-30 Thread Marek Polacek
This test ICEs since the addition of the assert in pp_string which ensures that
we aren't trying to print an empty string.  But that is exactly what happens
here: the location is actually UNKNOWN_LOCATION, so LOCATION_FILE on it yields
null.  Fixed by not trying to print the filename for UNKNOWN_LOCATIONs.

This isn't really related to the bogus -Wreturn-type warning as I initially
thought.  (With -Wreturn-type we say that F is missing a return statement,
which obviously isn't correct -- I'll open a separate PR for that.)
We might ICE anytime we call print_instantiation_full_context with location
that is in fact UNKNOWN_LOCATION.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-03-30  Marek Polacek  

PR c++/70449
* error.c (print_instantiation_full_context): Don't print the filename
for UNKNOWN_LOCATIONs.

* g++.dg/cpp0x/constexpr-70449.C: New test.

diff --git gcc/cp/error.c gcc/cp/error.c
index aa5fd41..159a9e0 100644
--- gcc/cp/error.c
+++ gcc/cp/error.c
@@ -3312,12 +3312,19 @@ print_instantiation_full_context (diagnostic_context 
*context)
 
   if (p)
 {
-  pp_verbatim (context->printer,
-  TREE_CODE (p->decl) == TREE_LIST
-  ? _("%s: In substitution of %qS:\n")
-  : _("%s: In instantiation of %q#D:\n"),
-  LOCATION_FILE (location),
-  p->decl);
+  if (location != UNKNOWN_LOCATION)
+   pp_verbatim (context->printer,
+TREE_CODE (p->decl) == TREE_LIST
+? _("%s: In substitution of %qS:\n")
+: _("%s: In instantiation of %q#D:\n"),
+LOCATION_FILE (location),
+p->decl);
+  else
+   pp_verbatim (context->printer,
+TREE_CODE (p->decl) == TREE_LIST
+? _("In substitution of %qS:\n")
+: _("In instantiation of %q#D:\n"),
+p->decl);
 
   location = p->locus;
   p = p->next;
diff --git gcc/testsuite/g++.dg/cpp0x/constexpr-70449.C 
gcc/testsuite/g++.dg/cpp0x/constexpr-70449.C
index e69de29..bc5dd71 100644
--- gcc/testsuite/g++.dg/cpp0x/constexpr-70449.C
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-70449.C
@@ -0,0 +1,12 @@
+// PR c++/70449
+// { dg-do compile { target c++11 } }
+
+template 
+constexpr
+int f (void)
+{
+  enum E { a = f<0> () };
+  return 0;
+}
+
+// { dg-error "body of constexpr function" "" { target { ! c++14 } } 0 }

Marek


[AArch64] Fix SIMD predicate

2016-03-30 Thread Evandro Menezes

   Add scalar 0.0 to the aarch64_simd_reg_or_zero predicate.

   2016-03-30  Evandro Menezes  

* gcc/config/aarch64/predicates.md
(aarch64_simd_reg_or_zero predicate): Add the "const_double"
   constraint.


It seems to me that the aarch64_simd_reg_or_zero predicate should handle the
scalar constant 0.0 as well.


OK to commit?

Thank you,

--
Evandro Menezes

diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 1186827..8f2726d 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -302,7 +302,7 @@
 })
 
 (define_predicate "aarch64_simd_reg_or_zero"
-  (and (match_code "reg,subreg,const_int,const_vector")
+  (and (match_code "reg,subreg,const_int,const_double,const_vector")
(ior (match_operand 0 "register_operand")
(ior (match_test "op == const0_rtx")
 (match_test "aarch64_simd_imm_zero_p (op, mode)")


Re: [gomp4] OpenACC async clause regressions

2016-03-30 Thread Thomas Schwinge
Hi!

On Wed, 18 Nov 2015 16:17:39 +0100, Tom de Vries  wrote:
> On 22/10/15 20:27, Thomas Schwinge wrote:
> > diff --cc libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> > index d478ce2,22cef6d..f3b490a
> > --- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> > +++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
> > @@@ -1,4 -1,4 +1,6 @@@
> >/* { dg-do run { target openacc_nvidia_accel_selected } } */
> > ++/*.
> > ++   { dg-xfail-run-if "TODO" { *-*-* } } */
> >/* { dg-additional-options "-lcuda" } */
> >
> >#include 
> 
> This failure shows up on trunk. Should it also be xfailed there?

I added the XFAIL as part of my recent r234575 "Update OpenACC test
cases" commit,
.
Chung-Lin, for avoidance of doubt, please remove that XFAIL once you get
to commit your fix for this issue (currently waiting for Jakub's
approval).


Grüße
 Thomas




Re: [PATCH, CHKP, Solaris, PR target/69917, committed] Respect transparent alias chains of assembler names

2016-03-30 Thread Rainer Orth
Hi Ilya,

> This patch fixes gcc.target/i386/chkp-hidden-def.c test failure
> on Solaris.  Failure happens because some Solaris emit code
> ignores transparent alias chains for assembler names.  The patch
> was tested and approved by Rainer Orth (see [1] for more details).
> Applied to trunk.
>
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69917

I've now backported the patch to the gcc-5 branch after testing on
i386-pc-solaris2.1[0-2] and sparc-sun-solaris2.1[0-2].

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] c++/67376 Comparison with pointer to past-the-end, of array fails inside constant expression

2016-03-30 Thread Martin Sebor

On 03/30/2016 09:30 AM, Jason Merrill wrote:

On 03/29/2016 11:57 PM, Martin Sebor wrote:

Are we confident that arr[0] won't make it here as POINTER_PLUS_EXPR or
some such?


I'm as confident as I can be given that this is my first time
working in this area.  Which piece of code or what assumption
in particular are you concerned about?


I want to be sure that we don't fold these conditions to false.

constexpr int *ip = 0;
constexpr struct A { int ar[3]; } *ap = 0;

static_assert(&ip[0] == 0);
static_assert(&(ap->ar[0]) == 0);


I see.  Thanks for clarifying.  The asserts pass.  The expressions
are folded earlier on (in fact, as we discussed, the second one
too early and is accepted even though it's undefined and should be
rejected in a constexpr context) and never reach fold_comparison.

Martin


[RFC][PATCHv3, ARM 7/8] ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call]

2016-03-30 Thread Andre Vieira (lists)
On 29/01/16 17:08, Andre Vieira (lists) wrote:
> On 19/01/16 15:28, Andre Vieira (lists) wrote:
>> On 16/01/16 14:49, Senthil Kumar Selvaraj wrote:
>>> User-agent: mu4e 0.9.13; emacs 24.5.1
>>>
>>> Hi,
>>>
>>> Apologies for the bad posting style (I don't have the
>>> original email handy), but shouldn't _gnu_cmse_nonsecure_call be defined
>>> with the .global directive in the below hunk (to make it visible when
>>> linking)?
>>>
>>> diff --git a/libgcc/config/arm/cmse_nonsecure_call.S
>>> b/libgcc/config/arm/cm=
>>> se_nonsecure_call.S
>>> new file mode 100644
>>> index
>>> ..bdc140f5bbe87c6599db225b1b9=
>>> b7bbc7d606710
>>> --- /dev/null
>>> +++ b/libgcc/config/arm/cmse_nonsecure_call.S
>>> @@ -0,0 +1,87 @@
>>> +.syntax unified
>>> +.thumb
>>> +__gnu_cmse_nonsecure_call:
>>>
>>> Right now, it ends up as a local symbol, and compiling and linking a
>>> program with cmse_nonsecure_call (say cmse-11.c), results in a linker
>>> error - the linker doesn't find the symbol even if it is present in
>>> libgcc.a. I found the problem that way - dumping symbols for my variant
>>> of libgcc.a and grepping showed the symbol to be available but local.
>>>
>>> Regards
>>> Senthil
>>>
>> Hi Senthil,
>>
>> Thanks for catching that!
>>
>> Cheers,
>> Andre
>>
> Hi there,
> 
> Added missing global symbol.
> 
> Is this OK?
> 
> Cheers,
> Andre
> 
> *** gcc/ChangeLog ***
> 2016-01-29  Andre Vieira
> Thomas Preud'homme  
> 
> * gcc/config/arm/arm.c (detect_cmse_nonsecure_call): New.
>   (cmse_nonsecure_call_clear_caller_saved): New.
> * gcc/config/arm/arm-protos.h (detect_cmse_nonsecure_call): New.
> * gcc/config/arm/arm.md (call): Handle cmse_nonsecure_entry.
>   (call_value): Likewise.
>   (nonsecure_call_internal): New.
>   (nonsecure_call_value_internal): New.
> * gcc/config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New.
>   (*nonsecure_call_value_reg_thumb1_v5): New.
> * gcc/config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New.
>   (*nonsecure_call_value_reg_thumb2): New.
> * gcc/config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New.
> * libgcc/config/arm/cmse_nonsecure_call.S: New.
> * libgcc/config/arm/t-arm: Compile cmse_nonsecure_call.S
> 
> 
> *** gcc/testsuite/ChangeLog ***
> 2016-01-29  Andre Vieira
> Thomas Preud'homme  
> 
> * gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-13.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-6.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/hard/cmse-13.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/hard/cmse-7.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/hard/cmse-8.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/soft/cmse-13.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/soft/cmse-7.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/soft/cmse-8.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp/cmse-13.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp/cmse-7.c: New.
> * gcc/testsuite/gcc.target/arm/cmse/mainline/softfp/cmse-8.c: New.

Hi there,

Forgot to add a copyright header to the cmse_nonsecure_call.S in
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02334.html.
Rectified with this patch.

Any comments?

Cheers,
Andre

*** gcc/ChangeLog ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* gcc/config/arm/arm.c (detect_cmse_nonsecure_call): New.
  (cmse_nonsecure_call_clear_caller_saved): New.
* gcc/config/arm/arm-protos.h (detect_cmse_nonsecure_call): New.
* gcc/config/arm/arm.md (call): Handle cmse_nonsecure_entry.
  (call_value): Likewise.
  (nonsecure_call_internal): New.
  (nonsecure_call_value_internal): New.
* gcc/config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New.
  (*nonsecure_call_value_reg_thumb1_v5): New.
* gcc/config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New.
  (*nonsecure_call_value_reg_thumb2): New.
* gcc/config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New.
* libgcc/config/arm/cmse_nonsecure_call.S: New.
* libgcc/config/arm/t-arm: Compile cmse_nonsecure_call.S


*** gcc/testsuite/ChangeLog ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* gcc/testsuite/gcc.target/arm/cmse/baseline/cmse-11.c: New.
* gcc/testsuite/gcc.target/arm/cmse/baselin

Re: C++ PATCH for c++/70449 (ICE when printing a filename of unknown location)

2016-03-30 Thread Jason Merrill

OK.

Jason


[testsuite, i386] Fix gcc.target/i386/avx-vextractf128-256-5.c with Solaris as (PR testsuite/70356)

2016-03-30 Thread Rainer Orth
This patch (provided by Jakub in the PR) fixes a failure of
gcc.target/i386/avx-vextractf128-256-5.c on the gcc-5 branch with the
Solaris assembler:

FAIL: gcc.target/i386/avx-vextractf128-256-5.c (test for excess errors)

The problem is that a target selector to dg-do overrides a previous
dg-require-effective-target, causing (in this case) an attempt to
assemble avx512f insns with the native Solaris assembler that lacks
support for them.

Tested with the appropriate runtest invocations on i386-pc-solaris2.12
with both as and gas, installed on the gcc-5 branch as approved by Uros
in the PR.

As Uros requested, I've also forward-ported the testcase to mainline
where it had been forgotten, tested as before and on x86_64-pc-linux-gnu.

Rainer


2016-03-29  Jakub Jelinek  

PR testsuite/70356
* gcc.target/i386/avx-vextractf128-256-5.c: Move
dg-require-effective-target after dg-do.

# HG changeset patch
# Parent  e98f530832af6d6305e1369245a5b79bb2d9f5bb
Fix gcc.target/i386/avx-vextractf128-256-5.c with Solaris as (PR testsuite/70356)

diff --git a/gcc/testsuite/gcc.target/i386/avx-vextractf128-256-5.c b/gcc/testsuite/gcc.target/i386/avx-vextractf128-256-5.c
--- a/gcc/testsuite/gcc.target/i386/avx-vextractf128-256-5.c
+++ b/gcc/testsuite/gcc.target/i386/avx-vextractf128-256-5.c
@@ -1,5 +1,5 @@
+/* { dg-do assemble { target { ! ia32 } } } */
 /* { dg-require-effective-target avx512f } */
-/* { dg-do assemble { target { ! ia32 } } } */
 /* { dg-options "-O2 -mavx512f" } */
 
 #include 

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [C++/70393] constexpr constructor

2016-03-30 Thread Jason Merrill

On 03/29/2016 08:40 AM, Nathan Sidwell wrote:

+ /* The field we're initializing must be on the field
+list.  Look to see if it is present before the
+field the current ELT initializes.  */
+ for (; fields != cep->index; fields = DECL_CHAIN (fields))
+   if (index == fields)
+ goto insert;


Instead of searching through the fields, could we compare 
DECL_FIELD_OFFSET of index and cep->index?


Jason

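(For illustration only: a minimal sketch of the DECL_FIELD_OFFSET-style
comparison suggested above, assuming index and cep->index are ordinary,
non-bitfield FIELD_DECLs of the same record; not necessarily what was
committed.)

  /* Sketch only: order the two FIELD_DECLs by position instead of
     walking DECL_CHAIN.  bit_position () combines DECL_FIELD_OFFSET
     and DECL_FIELD_BIT_OFFSET into a single bit offset.  */
  if (tree_int_cst_lt (bit_position (index), bit_position (cep->index)))
    goto insert;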


[PATCHv2 1/8, GCC, V8M][arm-embedded] Add support for ARMv8-M's Security Extensions flag and intrinsics

2016-03-30 Thread Andre Vieira (lists)
Hi there,

Applied https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02148.html on
embedded-5-branch using the included patch at revision r234582.


Cheers,
Andre

*** gcc ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* config.gcc (extra_headers): Added arm_cmse.h.
* config/arm/arm-arches.def (armv8-m.base): Add FL_CMSE.
(armv8-m.main): Likewise.
(armv8-m.main+dsp): Likewise.
* config/arm/arm-protos.h (arm_is_constant_pool_ref): Define
FL_CMSE.
* config/arm/arm.c (arm_arch_cmse): New.
(arm_option_override): New error for unsupported cmse target.
* config/arm/arm.h (arm_arch_cmse): New.
(arm_cpu_builtins): Added __ARM_FEATURE_CMSE macro.
* config/arm/arm.opt (mcmse): New.
* doc/invoke.texi (ARM Options): Add -mcmse.
* doc/extend.texi (ACLE): Add CMSE.
* config/arm/arm_cmse.h: New file.

*** libgcc ***
2016-03-30 Andre Vieira 
Thomas Preud'homme 

* config/arm/cmse.c: Likewise.
* config/arm/t-arm (HAVE_CMSE): New.


*** gcc/testsuite ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse.exp: New.
* gcc.target/arm/cmse/cmse-1.c: New.
* gcc.target/arm/cmse/cmse-12.c: New.
* lib/target-supports.exp
(check_effective_target_arm_cmse_ok): New.
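
For reference, a hedged usage sketch (not part of the patch) of the new
-mcmse option and the __ARM_FEATURE_CMSE macro named above; the value 3
follows the builtin_define_with_int_value call in the arm.h hunk below,
while without -mcmse the macro is plainly defined (i.e. to 1):

/* Illustration only: guard CMSE-specific code on the feature macro.
   Bit 1 of __ARM_FEATURE_CMSE is only set when compiling with -mcmse,
   per the arm.h change in this patch.  */
#if defined (__ARM_FEATURE_CMSE) && (__ARM_FEATURE_CMSE & 2)
#include <arm_cmse.h>   /* the new header added to extra_headers above */
#endif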
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 
9ee1024bd4d5f92d5dd28e763d37ee8324a7..4ec62db49f13642142b932d36f444f5ec9c74fd2
 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -319,7 +319,7 @@ arc*-*-*)
 arm*-*-*)
cpu_type=arm
extra_objs="arm-builtins.o aarch-common.o"
-   extra_headers="mmintrin.h arm_neon.h arm_acle.h"
+   extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_cmse.h"
target_type_format_char='%'
c_target_objs="arm-c.o"
cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def
index 
d44179f290134eb2ec7894b47aa5ccb74b42..8a7d8a3a3895aaf07a9b7e3c2f231357f8c81e21
 100644
--- a/gcc/config/arm/arm-arches.def
+++ b/gcc/config/arm/arm-arches.def
@@ -56,8 +56,8 @@ ARM_ARCH("armv7-m", cortexm3, 7M,  FL_CO_PROC | 
FL_FOR_ARCH7M)
 ARM_ARCH("armv7e-m", cortexm4,  7EM, FL_CO_PROC |FL_FOR_ARCH7EM)
 ARM_ARCH("armv8-a", cortexa53,  8A,  FL_CO_PROC | FL_FOR_ARCH8A)
 ARM_ARCH("armv8-a+crc",cortexa53, 8A,FL_CO_PROC | FL_CRC32  | FL_FOR_ARCH8A)
-ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE,  
FL_FOR_ARCH8M_BASE)
-ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN, FL_CO_PROC |  
FL_FOR_ARCH8M_MAIN)
-ARM_ARCH("armv8-m.main+dsp",cortexm7,8M_MAIN,FL_CO_PROC|FL_ARCH7EM|FL_FOR_ARCH8M_MAIN)
+ARM_ARCH("armv8-m.base", cortexm0, 8M_BASE,  
FL_FOR_ARCH8M_BASE |  FL_CMSE)
+ARM_ARCH("armv8-m.main", cortexm7, 8M_MAIN, FL_CO_PROC |  
FL_FOR_ARCH8M_MAIN | FL_CMSE)
+ARM_ARCH("armv8-m.main+dsp",cortexm7,8M_MAIN,FL_CO_PROC|FL_ARCH7EM|FL_FOR_ARCH8M_MAIN
 |FL_CMSE)
 ARM_ARCH("iwmmxt",  iwmmxt, 5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | 
FL_XSCALE | FL_IWMMXT)
 ARM_ARCH("iwmmxt2", iwmmxt2,5TE, FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | 
FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
f48366b2a77f59f91d339358912746f45de55a63..05acdfada28c619102059959bdcfa2a8223524ec
 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -368,6 +368,7 @@ extern bool arm_is_constant_pool_ref (rtx);
 
 #define FL_IWMMXT (1 << 29)  /* XScale v2 or "Intel Wireless 
MMX technology".  */
 #define FL_IWMMXT2(1 << 30)   /* "Intel Wireless MMX2 technology".  */
+#define FL_CMSE  (1 << 31)   /* ARMv8-M Security Extensions.  
*/
 
 /* Flags that only effect tuning, not available instructions.  */
 #define FL_TUNE(FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
d8d3ba5cba9807070989350644868fd88a98b4dc..7574064936e5217c8e553e7ab744cbe9320346d2
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -62,6 +62,13 @@ extern char arm_arch_name[];
  builtin_define ("__ARM_FEATURE_CRC32");   \
if (TARGET_32BIT)   \
  builtin_define ("__ARM_32BIT_STATE"); \
+   if (arm_arch8 && !arm_arch_notm)\
+ { \
+   if (arm_arch_cmse && use_cmse)  \
+ builtin_define_with_int_value ("__ARM_FEATURE_CMSE", 3);  \
+   else\
+ builtin_define ("__ARM_FEATURE_CMSE");\
+ } \
if (TARGET_ARM_FEATURE_LDREX)   \
  b

[PATCH 3/8, GCC, V8M][arm-embedded] Handling ARMv8-M Security Extension's cmse_nonsecure_entry attribute

2016-03-30 Thread Andre Vieira (lists)
Hi,

Applied https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02150.html on
embedded-5-branch using the included patch at revision r234583.

*** gcc ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (arm_handle_cmse_nonsecure_entry): New.
(arm_attribute_table): Added cmse_nonsecure_entry
(arm_compute_func_type): Handle cmse_nonsecure_entry.
(cmse_func_args_or_return_in_stack): New.
(arm_handle_cmse_nonsecure_entry): New.
* config/arm/arm.h (ARM_FT_CMSE_ENTRY): New macro define.
(IS_CMSE_ENTRY): Likewise.

*** gcc/testsuite ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse-3.c: New.
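
A minimal usage sketch (illustration only, not part of the patch) of the
attribute this patch wires up; the function name is made up, and the
rejection cases are exercised by the cmse-3.c test referenced above:

/* With -mcmse, the new attribute marks a secure entry point that may
   be called from the non-secure state.  */
int __attribute__ ((cmse_nonsecure_entry))
get_counter (void)
{
  return 42;
}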
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 
7574064936e5217c8e553e7ab744cbe9320346d2..3467a9ea3d3c59b0b41a59f22f14f277153cff0e
 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1510,6 +1510,7 @@ do {  
  \
 #define ARM_FT_VOLATILE(1 << 4) /* Does not return.  */
 #define ARM_FT_NESTED  (1 << 5) /* Embedded inside another func.  */
 #define ARM_FT_STACKALIGN  (1 << 6) /* Called with misaligned stack.  */
+#define ARM_FT_CMSE_ENTRY  (1 << 7) /* ARMv8-M non-secure entry function.  
*/
 
 /* Some macros to test these flags.  */
 #define ARM_FUNC_TYPE(t)   (t & ARM_FT_TYPE_MASK)
@@ -1518,6 +1519,7 @@ do {  
  \
 #define IS_NAKED(t)(t & ARM_FT_NAKED)
 #define IS_NESTED(t)   (t & ARM_FT_NESTED)
 #define IS_STACKALIGN(t)   (t & ARM_FT_STACKALIGN)
+#define IS_CMSE_ENTRY(t)   (t & ARM_FT_CMSE_ENTRY)
 
 
 /* Structure used to hold the function stack frame layout.  Offsets are
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
8c951490f0fa4eb5a5d14a1ca75a51bdbe03..d53d96e7e52cdcc3c2340d714d277e12a1ee07f6
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -167,6 +167,7 @@ static tree arm_handle_isr_attribute (tree *, tree, tree, 
int, bool *);
 #if TARGET_DLLIMPORT_DECL_ATTRIBUTES
 static tree arm_handle_notshared_attribute (tree *, tree, tree, int, bool *);
 #endif
+static tree arm_handle_cmse_nonsecure_entry (tree *, tree, tree, int, bool *);
 static void arm_output_function_epilogue (FILE *, HOST_WIDE_INT);
 static void arm_output_function_prologue (FILE *, HOST_WIDE_INT);
 static int arm_comp_type_attributes (const_tree, const_tree);
@@ -368,6 +369,9 @@ static const struct attribute_spec arm_attribute_table[] =
   { "notshared",0, 0, false, true, false, arm_handle_notshared_attribute,
 false },
 #endif
+  /* ARMv8-M Security Extensions support.  */
+  { "cmse_nonsecure_entry", 0, 0, true, false, false,
+arm_handle_cmse_nonsecure_entry, false },
   { NULL,   0, 0, false, false, false, NULL, false }
 };
 
@@ -3354,6 +3358,9 @@ arm_compute_func_type (void)
   else
 type |= arm_isr_value (TREE_VALUE (a));
 
+  if (lookup_attribute ("cmse_nonsecure_entry", attr))
+type |= ARM_FT_CMSE_ENTRY;
+
   return type;
 }
 
@@ -6348,6 +6355,109 @@ arm_handle_notshared_attribute (tree *node,
 }
 #endif
 
+/* This function is used to check whether functions with attributes
+   cmse_nonsecure_call or cmse_nonsecure_entry use the stack to pass arguments
+   or return variables.  If the function does indeed use the stack this
+   function returns true and diagnoses this, otherwise it returns false.  */
+
+static bool
+cmse_func_args_or_return_in_stack (tree fndecl, tree name, tree fntype)
+{
+  function_args_iterator args_iter;
+  CUMULATIVE_ARGS args_so_far_v;
+  cumulative_args_t args_so_far;
+  bool first_param = true;
+  tree arg_type, prev_arg_type = NULL_TREE, ret_type;
+
+  /* Error out if any argument is passed on the stack.  */
+  arm_init_cumulative_args (&args_so_far_v, fntype, NULL_RTX, fndecl);
+  args_so_far = pack_cumulative_args (&args_so_far_v);
+  FOREACH_FUNCTION_ARGS (fntype, arg_type, args_iter)
+{
+  rtx arg_rtx;
+  machine_mode arg_mode = TYPE_MODE (arg_type);
+
+  prev_arg_type = arg_type;
+  if (VOID_TYPE_P (arg_type))
+   continue;
+
+  if (!first_param)
+   arm_function_arg_advance (args_so_far, arg_mode, arg_type, true);
+  arg_rtx = arm_function_arg (args_so_far, arg_mode, arg_type, true);
+  if (!arg_rtx
+ || arm_arg_partial_bytes (args_so_far, arg_mode, arg_type, true))
+   {
+ error ("%qE attribute not available to functions with arguments "
+"passed on the stack", name);
+ return true;
+   }
+  first_param = false;
+}
+
+  /* Error out for variadic functions since we cannot control how many
+ arguments will be passed and thus stack could be used.  stdarg_p () is not
+ used for the checking to avoid browsing arguments twice.  */
+  if (prev_arg_type != NULL_TREE && !VOID_T

[PATCH, ARM 4/8][arm-embedded] ARMv8-M Security Extension's cmse_nonsecure_entry: __acle_se label and bxns return

2016-03-30 Thread Andre Vieira (lists)
Hi,

Applied the patch in
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02151.html on
embedded-5-branch at revision r234584.

Cheers,
Andre


Re: [C++/70393] constexpr constructor

2016-03-30 Thread Nathan Sidwell

On 03/30/16 13:22, Jason Merrill wrote:

On 03/29/2016 08:40 AM, Nathan Sidwell wrote:

+  /* The field we're initializing must be on the field
+ list.  Look to see if it is present before the
+ field the current ELT initializes.  */
+  for (; fields != cep->index; fields = DECL_CHAIN (fields))
+if (index == fields)
+  goto insert;


Instead of searching through the fields, could we compare DECL_FIELD_OFFSET of
index and cep->index?


I wondered about that, but then there are bitfields (and I can't recall how they
behave WRT DECL_FIELD_OFFSET).  Pointer equality seems cheap enough?


nathan


[PATCH v2, GCC, V8M 5/8][arm-embedded] ARMv8-M Security Extension's cmse_nonsecure_entry: clear registers

2016-03-30 Thread Andre Vieira (lists)
Applied the patch in
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01524.html on
embedded-5-branch at revision r234585.

Cheers,
Andre


[PATCH 6/8, GCC, V8M][arm-embedded] Handling ARMv8-M Security Extension's cmse_nonsecure_call attribute

2016-03-30 Thread Andre Vieira (lists)
Hi,

Applied https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02153.html on
embedded-5-branch using included patch at revision r234586.

*** gcc ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (gimplify.h): New include.
(arm_handle_cmse_nonsecure_call): New.
(arm_attribute_table): Added cmse_nonsecure_call.

*** gcc/testsuite ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse-3.c: Add tests.
* gcc.target/arm/cmse/cmse-4.c: Add tests.
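
For illustration (not part of the patch): the handler added below only
accepts the attribute on the base type of a function pointer or on a
typedef of function type, so intended use looks roughly like this, with
the identifiers here being hypothetical:

/* Declares a pointer type for calls into the non-secure world.  */
typedef int __attribute__ ((cmse_nonsecure_call)) (*ns_int_fn_t) (void);
ns_int_fn_t ns_callback;                              /* accepted */

/* Putting the attribute directly on a function declaration is warned
   about and the attribute is ignored.  */
int __attribute__ ((cmse_nonsecure_call)) f (void);   /* warning */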
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
d1be6eed1ac153903d50160f3b08d325187acf0b..d13bc2d49508863cf5b45a5f447a70fb468a115c
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -98,6 +98,7 @@
 #include "tm-constrs.h"
 #include "rtl-iter.h"
 #include "sched-int.h"
+#include "gimplify.h"
 
 /* Forward definitions of types.  */
 typedef struct minipool_nodeMnode;
@@ -168,6 +169,7 @@ static tree arm_handle_isr_attribute (tree *, tree, tree, 
int, bool *);
 static tree arm_handle_notshared_attribute (tree *, tree, tree, int, bool *);
 #endif
 static tree arm_handle_cmse_nonsecure_entry (tree *, tree, tree, int, bool *);
+static tree arm_handle_cmse_nonsecure_call (tree *, tree, tree, int, bool *);
 static void arm_output_function_epilogue (FILE *, HOST_WIDE_INT);
 static void arm_output_function_prologue (FILE *, HOST_WIDE_INT);
 static int arm_comp_type_attributes (const_tree, const_tree);
@@ -372,6 +374,8 @@ static const struct attribute_spec arm_attribute_table[] =
   /* ARMv8-M Security Extensions support.  */
   { "cmse_nonsecure_entry", 0, 0, true, false, false,
 arm_handle_cmse_nonsecure_entry, false },
+  { "cmse_nonsecure_call", 0, 0, true, false, false,
+arm_handle_cmse_nonsecure_call, false },
   { NULL,   0, 0, false, false, false, NULL, false }
 };
 
@@ -6463,6 +6467,76 @@ arm_handle_cmse_nonsecure_entry (tree *node, tree name,
   return NULL_TREE;
 }
 
+
+/* Called upon detection of the use of the cmse_nonsecure_call attribute, this
+   function will check whether the attribute is allowed here and will add the
+   attribute to the function type tree or otherwise issue a diagnostic.  The
+   reason we check this at declaration time is to only allow the use of the
+   attribute with declarations of function pointers and not function
+   declarations.  */
+
+static tree
+arm_handle_cmse_nonsecure_call (tree *node, tree name,
+tree /* args */,
+int /* flags */,
+bool *no_add_attrs)
+{
+  tree decl = NULL_TREE;
+  tree type, fntype, main_variant;
+
+  if (!use_cmse)
+{
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+
+  if (TREE_CODE (*node) == VAR_DECL || TREE_CODE (*node) == TYPE_DECL)
+{
+  decl = *node;
+  type = TREE_TYPE (decl);
+}
+
+  if (!decl
+  || (!(TREE_CODE (type) == POINTER_TYPE
+   && TREE_CODE (TREE_TYPE (type)) == FUNCTION_TYPE)
+ && TREE_CODE (type) != FUNCTION_TYPE))
+{
+   warning (OPT_Wattributes, "%qE attribute only applies to base type of a 
"
+"function pointer", name);
+   *no_add_attrs = true;
+   return NULL_TREE;
+}
+
+  /* type is either a function pointer, when the attribute is used on a 
function
+   * pointer, or a function type when used in a typedef.  */
+  if (TREE_CODE (type) == FUNCTION_TYPE)
+fntype = type;
+  else
+fntype = TREE_TYPE (type);
+
+  *no_add_attrs |= cmse_func_args_or_return_in_stack (NULL, name, fntype);
+
+  if (*no_add_attrs)
+return NULL_TREE;
+
+  /* Prevent tree's being shared among function types with and without
+ cmse_nonsecure_call attribute.  Do however make sure they keep the same
+ main_variant, this is required for correct DIE output.  */
+  main_variant = TYPE_MAIN_VARIANT (fntype);
+  fntype = build_distinct_type_copy (fntype);
+  TYPE_MAIN_VARIANT (fntype) = main_variant;
+  if (TREE_CODE (type) == FUNCTION_TYPE)
+TREE_TYPE (decl) = fntype;
+  else
+TREE_TYPE (type) = fntype;
+
+  /* Construct a type attribute and add it to the function type.  */
+  tree attrs = tree_cons (get_identifier ("cmse_nonsecure_call"), NULL_TREE,
+ TYPE_ATTRIBUTES (fntype));
+  TYPE_ATTRIBUTES (fntype) = attrs;
+  return NULL_TREE;
+}
+
 /* Return 0 if the attributes for two types are incompatible, 1 if they
are compatible, and 2 if they are nearly compatible (which causes a
warning to be generated).  */
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-3.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-3.c
index 
f806951e90256e8286d2d0f9467b51a73a522e2b..0fe6eff45d2884736ba7049ce4ed5b9785b1018d
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-3.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-3.c
@@ -36,3 +36,11 @@ norf (struct span2 a) {}
 
 void __attribute__ ((cmse_nonsecure_entry))
 foo2 (long long a, int b,

[PATCH 7/8, GCC, V8M][arm-embedded] ARMv8-M Security Extension's cmse_nonsecure_call: use __gnu_cmse_nonsecure_call

2016-03-30 Thread Andre Vieira (lists)
Hi,

Applied https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02334.html on
embedded-5-branch using the included patch at revision r234587.

*** gcc/ ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* config/arm/arm.c (detect_cmse_nonsecure_call): New.
(cmse_nonsecure_call_clear_caller_saved): New.
* config/arm/arm-protos.h (detect_cmse_nonsecure_call): New.
* config/arm/arm.md (call): Handle cmse_nonsecure_entry.
(call_value): Likewise.
(nonsecure_call_internal): New.
(nonsecure_call_value_internal): New.
* config/arm/thumb1.md (*nonsecure_call_reg_thumb1_v5): New.
(*nonsecure_call_value_reg_thumb1_v5): New.
* config/arm/thumb2.md (*nonsecure_call_reg_thumb2): New.
(*nonsecure_call_value_reg_thumb2): New.
* config/arm/unspecs.md (UNSPEC_NONSECURE_MEM): New.

*** libgcc/ ***
2016-03-30 Andre Vieira 
Thomas Preud'homme 

* config/arm/cmse_nonsecure_call.S: New.
* config/arm/t-arm: Compile cmse_nonsecure_call.S


*** gcc/testsuite/ ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/baseline/cmse-11.c: New.
* gcc.target/arm/cmse/baseline/cmse-13.c: New.
* gcc.target/arm/cmse/baseline/cmse-6.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/hard/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/soft/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp-sp/cmse-8.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-13.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-7.c: New.
* gcc.target/arm/cmse/mainline/softfp/cmse-8.c: New.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 
066e2318967e11f0eeba79ef80d990c149992426..27173fea25df0b56bef68656d5f0224c5b817fde
 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -135,6 +135,7 @@ extern int arm_const_double_inline_cost (rtx);
 extern bool arm_const_double_by_parts (rtx);
 extern bool arm_const_double_by_immediates (rtx);
 extern void arm_emit_call_insn (rtx, rtx, bool);
+bool detect_cmse_nonsecure_call (tree);
 extern const char *output_call (rtx *);
 extern const char *output_call_mem (rtx *);
 void arm_emit_movpair (rtx, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
d13bc2d49508863cf5b45a5f447a70fb468a115c..ec303e871f60485d06e35308b98154c1089bf330
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17415,6 +17415,129 @@ note_invalid_constants (rtx_insn *insn, HOST_WIDE_INT 
address, int do_pushes)
   return;
 }
 
+/* Saves callee saved registers, clears callee saved registers and caller saved
+   registers not used to pass arguments before a cmse_nonsecure_call.  And
+   restores the callee saved registers after.  */
+
+static void
+cmse_nonsecure_call_clear_caller_saved (void)
+{
+  basic_block bb;
+
+  FOR_EACH_BB_FN (bb, cfun)
+{
+  rtx_insn *insn;
+
+  FOR_BB_INSNS (bb, insn)
+   {
+ uint64_t to_clear_mask, float_mask;
+ rtx_insn *seq;
+ rtx pat, call, unspec, link, reg, cleared_reg, tmp;
+ unsigned int regno, maxregno;
+ rtx address;
+
+ if (!NONDEBUG_INSN_P (insn))
+   continue;
+
+ if (!CALL_P (insn))
+   continue;
+
+ pat = PATTERN (insn);
+ gcc_assert (GET_CODE (pat) == PARALLEL && XVECLEN (pat, 0) > 0);
+ call = XVECEXP (pat, 0, 0);
+
+ /* Get the real call RTX if the insn sets a value, ie. returns.  */
+ if (GET_CODE (call) == SET)
+ call = SET_SRC (call);
+
+ /* Check if it is a cmse_nonsecure_call.  */
+ unspec = XEXP (call, 0);
+ if (GET_CODE (unspec) != UNSPEC
+ || XINT (unspec, 1) != UNSPEC_NONSECURE_MEM)
+   continue;
+
+ /* Determine the caller-saved registers we need to clear.  */
+ to_clear_mask = (1LL << (NUM_ARG_REGS)) - 1;
+ maxregno = NUM_ARG_REGS - 1;
+ if (TARGET_HARD_FLOAT && TARGET_VFP)
+   {
+ float_mask = (1LL << (D7_VFP_REGNUM + 1)) - 1;
+ float_mask &= ~((1LL << FIRST_VFP_REGNUM) - 1);
+ to_clear_mask |= float_mask;
+ maxregno = D7_VFP_REGNUM;
+   }
+
+ /* Make sure the register used to hold the function address is not
+cleared.  */
+ address = RTVEC_ELT (XVEC (unspec, 0), 0);
+ gcc_assert (MEM_P (address));
+ gcc_asse

[PATCH 8/8, GCC, V8M][arm-embedded] Added support for ARMV8-M Security Extension cmse_nonsecure_caller intrinsic

2016-03-30 Thread Andre Vieira (lists)
Hi,

Applied https://gcc.gnu.org/ml/gcc-patches/2015-12/msg02155.html on
embedded-5-branch using included patch at revision r234589.

*** gcc/ ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* config/arm/arm-builtins.c (arm_builtins): Define
ARM_BUILTIN_CMSE_NONSECURE_CALLER.
(bdesc_2arg): Add line for cmse_nonsecure_caller.
(arm_init_builtins): Init for cmse_nonsecure_caller.
(arm_expand_builtin): Handle cmse_nonsecure_caller.
* config/arm/arm_cmse.h (cmse_nonsecure_caller): New.

*** gcc/testsuite/ ***
2016-03-30  Andre Vieira
Thomas Preud'homme  

* gcc.target/arm/cmse/cmse-1.c: Add test for
cmse_nonsecure_caller.
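
A hedged sketch (not part of the patch, and the exact header contents are
not shown in this excerpt) of what the arm_cmse.h addition named above
presumably looks like: the ACLE intrinsic forwarding to the new builtin
registered in the arm-builtins.c hunk below.

/* Sketch only: assumed shape of the cmse_nonsecure_caller wrapper in
   arm_cmse.h.  Per ACLE it returns nonzero when the enclosing entry
   function was called from the non-secure state.  */
__extension__ static __inline int __attribute__ ((__always_inline__))
cmse_nonsecure_caller (void)
{
  return __builtin_arm_cmse_nonsecure_caller ();
}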
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 
7a451133f861a476a7cad359bd0374e3c4f06f35..277046fe6a6d517d0b33797e45f5535d1d59c11a
 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -508,6 +508,8 @@ enum arm_builtins
   ARM_BUILTIN_GET_FPSCR,
   ARM_BUILTIN_SET_FPSCR,
 
+  ARM_BUILTIN_CMSE_NONSECURE_CALLER,
+
 #undef CRYPTO1
 #undef CRYPTO2
 #undef CRYPTO3
@@ -1224,6 +1226,10 @@ static const struct builtin_description bdesc_2arg[] =
   FP_BUILTIN (set_fpscr, SET_FPSCR)
 #undef FP_BUILTIN
 
+  {FL_CMSE, CODE_FOR_andsi3,
+   "__builtin_arm_cmse_nonsecure_caller", ARM_BUILTIN_CMSE_NONSECURE_CALLER,
+   UNKNOWN, 0},
+
 #define CRC32_BUILTIN(L, U) \
   {0, CODE_FOR_##L, "__builtin_arm_"#L, ARM_BUILTIN_##U, \
UNKNOWN, 0},
@@ -1753,6 +1759,17 @@ arm_init_builtins (void)
= add_builtin_function ("__builtin_arm_stfscr", ftype_set_fpscr,
ARM_BUILTIN_SET_FPSCR, BUILT_IN_MD, NULL, 
NULL_TREE);
 }
+
+  if (arm_arch_cmse)
+{
+  tree ftype_cmse_nonsecure_caller
+   = build_function_type_list (unsigned_type_node, NULL);
+  arm_builtin_decls[ARM_BUILTIN_CMSE_NONSECURE_CALLER]
+   = add_builtin_function ("__builtin_arm_cmse_nonsecure_caller",
+   ftype_cmse_nonsecure_caller,
+   ARM_BUILTIN_CMSE_NONSECURE_CALLER, BUILT_IN_MD,
+   NULL, NULL_TREE);
+}
 }
 
 /* Return the ARM builtin for CODE.  */
@@ -2272,6 +2289,14 @@ arm_expand_builtin (tree exp,
   emit_insn (pat);
   return target;
 
+case ARM_BUILTIN_CMSE_NONSECURE_CALLER:
+  icode = CODE_FOR_andsi3;
+  target = gen_reg_rtx (SImode);
+  op0 = arm_return_addr (0, NULL_RTX);
+  pat = GEN_FCN (icode) (target, op0, const1_rtx);
+  emit_insn (pat);
+  return target;
+
 case ARM_BUILTIN_TEXTRMSB:
 case ARM_BUILTIN_TEXTRMUB:
 case ARM_BUILTIN_TEXTRMSH:
diff --git a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c 
b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
index 
1c3d4e9e934f4b1166d4d98383cf4ae8c3515117..ccecf396d3cda76536537b4d146bbb5f70589fd5
 100644
--- a/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
+++ b/gcc/testsuite/gcc.target/arm/cmse/cmse-1.c
@@ -66,3 +66,32 @@ int foo (char * p)
 /* { dg-final { scan-assembler-times "ttat " 2 } } */
 /* { dg-final { scan-assembler-times "bl.cmse_check_address_range" 7 } } */
 /* { dg-final { scan-assembler-not "cmse_check_pointed_object" } } */
+
+typedef int (*int_ret_funcptr_t) (void);
+typedef int __attribute__ ((cmse_nonsecure_call)) (*int_ret_nsfuncptr_t) 
(void);
+
+int __attribute__ ((cmse_nonsecure_entry))
+baz (void)
+{
+  return cmse_nonsecure_caller ();
+}
+
+int __attribute__ ((cmse_nonsecure_entry))
+qux (int_ret_funcptr_t int_ret_funcptr)
+{
+  int_ret_nsfuncptr_t int_ret_nsfunc_ptr;
+
+  if (cmse_is_nsfptr (int_ret_funcptr))
+{
+  int_ret_nsfunc_ptr = cmse_nsfptr_create (int_ret_funcptr);
+  return int_ret_nsfunc_ptr ();
+}
+  return 0;
+}
+/* { dg-final { scan-assembler "baz:" } } */
+/* { dg-final { scan-assembler "__acle_se_baz:" } } */
+/* { dg-final { scan-assembler-not "\tcmse_nonsecure_caller" } } */
+/* { dg-final { scan-rtl-dump "and.*reg.*const_int 1" expand } } */
+/* { dg-final { scan-assembler "bic" } } */
+/* { dg-final { scan-assembler "push\t\{r4, r5, r6" } } */
+/* { dg-final { scan-assembler "msr\tAPSR_nzcvq" } } */


Re: [PATCH] gcc/final.c: -fdebug-prefix-map support to remap sources with relative path

2016-03-30 Thread Joseph Myers
On Sun, 27 Mar 2016, Hongxu Jia wrote:

> PR other/70428
> * final.c (remap_debug_filename): Use lrealpath to translate
> relative path before remapping

I think this would break cases that currently work.  When using 
-fdebug-prefix-map you should understand what paths will appear in debug 
info (which means understanding something about how a path gets split by 
DWARF into multiple pieces), and likely inspect the resulting debug info 
(in an automated way) to see if you missed any paths and need to add more 
such options.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/4] Add gcc-auto-profile script

2016-03-30 Thread Joseph Myers
On Sun, 27 Mar 2016, Andi Kleen wrote:

> 2016-03-27  Andi Kleen  
> 
>   * gen_autofdo_event.py: New file to regenerate
>   gcc-auto-profile.

It may not be required in contrib, but does this script work with both 
Python 2 and Python 3?  (New code that only works with Python 2 seems like 
a bad idea nowadays, with GNU/Linux distributions moving to having only 
Python 3 in a default install.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Do not give realistic estimates for loop with array accesses

2016-03-30 Thread Bernhard Reutner-Fischer
On March 30, 2016 2:36:14 PM GMT+02:00, Richard Biener wrote:
>On Wed, 30 Mar 2016, Jan Hubicka wrote:
>
>> > 
>> > You are only changing one place in this file.
>> 
>> You are right. I am attaching the updated patch which I am re-testing now.
>> > 
>> > The vectorizer already checks this (albeit indirectly):
>> > 
>> >   HOST_WIDE_INT max_niter
>> > = max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
>> >   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>> >&& (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
>> >   || (max_niter != -1
>> >   && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
>> > {
>> >   if (dump_enabled_p ())
>> > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >  "not vectorized: iteration count smaller than "
>> >  "vectorization factor.\n");
>> >   return false;
>> > }
>> 
>> Yes, but one tests only vectorization_factor and the other
>> min_profitable_estimate, which probably should be greater than
>> vectorization_factor.
>> 
>> The check above should therefore become redundant.  My reading of the
>> code is that min_profitable_estimate is computed after the check above,
>> so it is probably a useful shortcut and the message is also a bit more
>> informative.  I updated the later test to use the max_niter variable
>> once it is computed.
>> 
>> OK with those changes assuming testing passes?
>
>Ok.

Maybe s/overlow/overflow/g while at it..

TIA,
>
>Richard.
>
>> Honza
>> 
>>  * tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can't get
>realistic
>>  estimates here.
>>  * tree-ssa-loop-unswitch.c (tree_unswitch_single_loop): Use also
>>  max_loop_iterations_int.
>>  (tree_unswitch_outer_loop): Likewise.
>>  * tree-ssa-loop-ivopts.c (avg_loop_niter): Likewise.
>>  * tree-vect-loop.c (vect_analyze_loop_2): Likewise.
>> Index: tree-ssa-loop-ivopts.c
>> ===
>> --- tree-ssa-loop-ivopts.c   (revision 234516)
>> +++ tree-ssa-loop-ivopts.c   (working copy)
>> @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
>>  {
>>HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
>>if (niter == -1)
>> -return AVG_LOOP_NITER (loop);
>> +{
>> +  niter = max_stmt_executions_int (loop);
>> +  if (niter == -1 || niter > AVG_LOOP_NITER (loop))
>> +return AVG_LOOP_NITER (loop);
>> +}
>>  
>>return niter;
>>  }
>> Index: tree-ssa-loop-niter.c
>> ===
>> --- tree-ssa-loop-niter.c(revision 234516)
>> +++ tree-ssa-loop-niter.c(working copy)
>> @@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
>>tree low, high, type, next;
>>bool sign, upper = true, at_end = false;
>>struct loop *loop = data->loop;
>> -  bool reliable = true;
>>  
>>if (TREE_CODE (base) != ARRAY_REF)
>>  return true;
>> @@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
>>&& tree_int_cst_compare (next, high) <= 0)
>>  return true;
>>  
>> -  /* If access is not executed on every iteration, we must ensure
>that overlow may
>> - not make the access valid later.  */
>> +  /* If access is not executed on every iteration, we must ensure
>that overlow
>> + may not make the access valid later.  */
>>if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb
>(data->stmt))
>>&& scev_probably_wraps_p (initial_condition_in_loop_num (ev,
>loop->num),
>>  step, data->stmt, loop, true))
>> -reliable = false;
>> +upper = false;
>>  
>> -  record_nonwrapping_iv (loop, init, step, data->stmt, low, high,
>reliable, upper);
>> +  record_nonwrapping_iv (loop, init, step, data->stmt, low, high,
>false, upper);
>>return true;
>>  }
>>  
>> Index: tree-ssa-loop-unswitch.c
>> ===
>> --- tree-ssa-loop-unswitch.c (revision 234516)
>> +++ tree-ssa-loop-unswitch.c (working copy)
>> @@ -223,6 +223,8 @@ tree_unswitch_single_loop (struct loop *
>>/* If the loop is not expected to iterate, there is no need
>>   for unswitching.  */
>>iterations = estimated_loop_iterations_int (loop);
>> +  if (iterations < 0)
>> +iterations = max_loop_iterations_int (loop);
>>if (iterations >= 0 && iterations <= 1)
>>  {
>>if (dump_file && (dump_flags & TDF_DETAILS))
>> @@ -439,6 +441,8 @@ tree_unswitch_outer_loop (struct loop *l
>>/* If the loop is not expected to iterate, there is no need
>>for unswitching.  */
>>iterations = estimated_loop_iterations_int (loop);
>> +  if (iterations < 0)
>> +iterations = max_loop_iterations_int (loop);
>>if (iterations >= 0 && iterations <= 1)
>>  {
>>if (dump_file && (dump_flags & TDF_DETAILS))
>> Index: tree-vect-loop.c
>> ===

Re: [PATCH 1/4] Add gcc-auto-profile script

2016-03-30 Thread Andi Kleen
On Wed, Mar 30, 2016 at 06:05:00PM +, Joseph Myers wrote:
> On Sun, 27 Mar 2016, Andi Kleen wrote:
> 
> > 2016-03-27  Andi Kleen  
> > 
> > * gen_autofdo_event.py: New file to regenerate
> > gcc-auto-profile.
> 
> It may not be required in contrib, but does this script work with both 
> Python 2 and Python 3?  (New code that only works with Python 2 seems like 
> a bad idea nowadays, with GNU/Linux distributions moving to having only 
> Python 3 in a default install.)

Currently it's Python 2. It could be run through the conversion tool,
but then you would lose older distributions which don't have 2.7.
As I understand it, the newer distributions always have options to install 2.7.

-andi
-- 
a...@linux.intel.com -- Speaking for myself only


[testsuite, sparcv9] Fix gcc.dg/ifcvt-4.c on 64-bit SPARC (PR rtl-optimization/68749)

2016-03-30 Thread Rainer Orth
gcc.dg/ifcvt-4.c currently FAILs for 64-bit SPARC:

FAIL: gcc.dg/ifcvt-4.c scan-rtl-dump ce1 "2 true changes made"

Eric suggested in the PR that Jeff's fix for PR rtl-optimization/69942
to gcc.dg/ifcvt-5.c applies here as well and indeed it does.

While I was at it, I removed the superfluous default args to dg-skip-if,
which only hinder readability.  I should probably make a pass over the
whole testsuite to get rid of this nonsense once stage 1 opens.

Tested with the appropriate runtest invocations on sparc-sun-solaris2.12
and i386-pc-solaris2.12.  Ok for mainline?

Rainer


2016-03-29  Rainer Orth  

PR rtl-optimization/68749
* gcc.dg/ifcvt-4.c: Use "word_mode" rather than "int" to limit the
effects of argument promotions.
Remove default args to dg-skip-if.

# HG changeset patch
# Parent  a86884beb324b4a6a88b5f3dac6a6f72b8bbada1
Fix gcc.dg/ifcvt-4.c on 64-bit SPARC (PR rtl-optimization/68749)

diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -1,12 +1,14 @@
 /* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=3" } */
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
-/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* hppa*64*-*-* visium-*-*" } {"*"} { "" } }  */
+/* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* hppa*64*-*-* visium-*-*" } }  */
 
-int
-foo (int x, int y, int a)
+typedef int word __attribute__((mode(word)));
+
+word
+foo (word x, word y, word a)
 {
-  int i = x;
-  int j = y;
+  word i = x;
+  word j = y;
   /* Try to make taking the branch likely.  */
   __builtin_expect (x > y, 1);
   if (x > y)

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: out of bounds access in insn-automata.c

2016-03-30 Thread Bernd Schmidt

On 03/25/2016 04:43 AM, Aldy Hernandez wrote:

If Bernd is fine with this, I'm happy to retract my patch and any
possible followups.  I'm just interested in having no path causing a
possible out of bounds access.  If your patch will do that, I'm cool.


I'll need to see that patch first to comment :-)


Bernd


Re: [PATCH] Fix simplify_shift_const_1 once more (PR rtl-optimization/70429)

2016-03-30 Thread Jeff Law

On 03/29/2016 12:03 PM, Jakub Jelinek wrote:

On Tue, Mar 29, 2016 at 11:47:57AM -0600, Jeff Law wrote:

2016-03-29  Jakub Jelinek  

PR rtl-optimization/70429
* combine.c (simplify_shift_const_1): For ASHIFTRT don't optimize
(cst1 >> count) >> cst2 into (cst1 >> cst2) >> count if
mode != result_mode.

* gcc.c-torture/execute/pr70429.c: New test.

But isn't the point of this code that cst1 >> cst2 turns into a compile time
constant just leaving one runtime shift of the result by count?


But with the mode change then you are changing
(cst1 >> count) >> cst2
into
((((cst1 >> cst2) >> count) << (bitsz - cst2)) >> (bitsz - cst))

Why can't we sign extend cst1 >> cst2 at compile time and use an ASHIFTRT for
the >> count shift?   Even if we've got a mode change to deal with, we can
generate the constant in whatever mode we want.


I don't understand how you could do that.
In the original source there is a variable shift count first, then narrowing
cast, then further arithmetic shift by constant.
So sure, you can shift the cst1 by cst2, but which bit you want to sign
extend on depends on the count value, only known at runtime.

Consider the testcase I've posted in the patch:
__attribute__((noinline, noclone)) int
foo (int a)
{
   return (int) (0x14ff6e2207db5d1fLL >> a) >> 4;
}

if a is 1, 0x14ff6e2207db5d1fLL >> a is
0xa7fb71103edae8f
and bit 31 of this is 0, so in the end you get
0x03edae8f >> 4
If a is 2, 0x14ff6e2207db5d1fLL >> a is
0x53fdb8881f6d747
and bit 31 of this is 1, so in the end you get
0x81f6d747 >> 4
Now, if you want to shift by 4 first, you have cst1 >> cst2
0x14ff6e2207db5d1LL, but you need to sign extend this, but which bit
from depends on count (and the difference between bitsizes of mode and
result_mode).
Ah, I see.  I should have looked at your test.  It's the variable shift 
that defines the sign bit that we're using for the conversion.  Hence 
we can't know its value at compile-time.


Jeff



[PATCH] New plugin event when evaluating a constexpr call

2016-03-30 Thread Andres Tiraboschi
Hi
 This patch adds a plugin event when evaluating a call expression in a constexpr context.
 The goal of this patch is to allow plugins to analyze and/or
modify the evaluation of constant expressions.

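To make the intent concrete, here is a hedged sketch of a plugin hooking
the new event. It is illustration only: it relies on the
PLUGIN_EVAL_CALL_CONSTEXPR value and the constexpr_call_info structure
introduced by this patch, plus the standard plugin entry points, and the
callback name is made up.

#include "gcc-plugin.h"
#include "plugin-version.h"

int plugin_is_GPL_compatible;

/* Invoked for every constexpr call evaluation.  GCC_DATA points to the
   constexpr_call_info filled in by cxx_eval_call_expression (see the
   constexpr.c hunk below); a plugin that includes cp/cp-tree.h can
   inspect info->function and friends, or set info->result to
   short-circuit the evaluation.  */
static void
on_eval_call_constexpr (void *gcc_data, void * /* user_data */)
{
  (void) gcc_data;
}

int
plugin_init (struct plugin_name_args *plugin_info,
             struct plugin_gcc_version *version)
{
  if (!plugin_default_version_check (version, &gcc_version))
    return 1;
  register_callback (plugin_info->base_name, PLUGIN_EVAL_CALL_CONSTEXPR,
                     on_eval_call_constexpr, NULL);
  return 0;
}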

Changelog 2016-3-30  Andres Tiraboschi

*gcc/plugin.c (PLUGIN_EVAL_CALL_CONSTEXPR): New event.
*gcc/plugin.def (PLUGIN_EVAL_CALL_CONSTEXPR): New event.
*gcc/cp/constexpr.c (constexpr_fundef): Moved to gcc/cp/cp-tree.h.
*gcc/cp/constexpr.c (constexpr_call): Ditto.
*gcc/cp/constexpr.c (constexpr_ctx): Ditto.
*gcc/cp/constexpr.c (cxx_eval_constant_expression): Not static anymore.
*gcc/cp/cp-tree.h (constexpr_call_info): New type.
*gcc/cp/cp-tree.h (constexpr_fundef): Moved type from gcc/cp/constexpr.c.
*gcc/cp/cp-tree.h (constexpr_call): Ditto.
*gcc/cp/cp-tree.h (constexpr_ctx): Ditto.
*gcc/cp/cp-tree.h (cxx_eval_constant_expression): Declared.




diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 5f97c9d..5562e44 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "tree-inline.h"
 #include "ubsan.h"
+#include "plugin-api.h"
+#include "plugin.h"

 static bool verify_constant (tree, bool, bool *, bool *);
 #define VERIFY_CONSTANT(X)\
@@ -112,13 +114,6 @@ ensure_literal_type_for_constexpr_object (tree decl)
   return decl;
 }

-/* Representation of entries in the constexpr function definition table.  */
-
-struct GTY((for_user)) constexpr_fundef {
-  tree decl;
-  tree body;
-};
-
 struct constexpr_fundef_hasher : ggc_ptr_hash
 {
   static hashval_t hash (constexpr_fundef *);
@@ -856,62 +851,12 @@ explain_invalid_constexpr_fn (tree fun)
   input_location = save_loc;
 }

-/* Objects of this type represent calls to constexpr functions
-   along with the bindings of parameters to their arguments, for
-   the purpose of compile time evaluation.  */
-
-struct GTY((for_user)) constexpr_call {
-  /* Description of the constexpr function definition.  */
-  constexpr_fundef *fundef;
-  /* Parameter bindings environment.  A TREE_LIST where each TREE_PURPOSE
- is a parameter _DECL and the TREE_VALUE is the value of the parameter.
- Note: This arrangement is made to accommodate the use of
- iterative_hash_template_arg (see pt.c).  If you change this
- representation, also change the hash calculation in
- cxx_eval_call_expression.  */
-  tree bindings;
-  /* Result of the call.
-   NULL means the call is being evaluated.
-   error_mark_node means that the evaluation was erroneous;
-   otherwise, the actuall value of the call.  */
-  tree result;
-  /* The hash of this call; we remember it here to avoid having to
- recalculate it when expanding the hash table.  */
-  hashval_t hash;
-};
-
 struct constexpr_call_hasher : ggc_ptr_hash
 {
   static hashval_t hash (constexpr_call *);
   static bool equal (constexpr_call *, constexpr_call *);
 };

-/* The constexpr expansion context.  CALL is the current function
-   expansion, CTOR is the current aggregate initializer, OBJECT is the
-   object being initialized by CTOR, either a VAR_DECL or a _REF.  VALUES
-   is a map of values of variables initialized within the expression.  */
-
-struct constexpr_ctx {
-  /* The innermost call we're evaluating.  */
-  constexpr_call *call;
-  /* Values for any temporaries or local variables within the
- constant-expression. */
-  hash_map *values;
-  /* SAVE_EXPRs that we've seen within the current LOOP_EXPR.  NULL if we
- aren't inside a loop.  */
-  hash_set *save_exprs;
-  /* The CONSTRUCTOR we're currently building up for an aggregate
- initializer.  */
-  tree ctor;
-  /* The object we're building the CONSTRUCTOR for.  */
-  tree object;
-  /* Whether we should error on a non-constant expression or fail quietly.  */
-  bool quiet;
-  /* Whether we are strictly conforming to constant expression rules or
- trying harder to get a constant value.  */
-  bool strict;
-};
-
 /* A table of all constexpr calls that have been evaluated by the
compiler in this translation unit.  */

@@ -1303,6 +1248,22 @@ cxx_eval_call_expression (const constexpr_ctx
*ctx, tree t,
   bool non_constant_args = false;
   cxx_bind_parameters_in_call (ctx, t, &new_call,
non_constant_p, overflow_p, &non_constant_args);
+
+  constexpr_call_info call_info;
+  call_info.function = t;
+  call_info.lval = lval;
+  call_info.call = &new_call;
+  call_info.call_stack = call_stack;
+  call_info.non_constant_args = &non_constant_args;
+  call_info.non_const_p = non_constant_p;
+  call_info.ctx = ctx;
+  call_info.result = NULL_TREE;
+  invoke_plugin_callbacks (PLUGIN_EVAL_CALL_CONSTEXPR, &call_info);
+  if (call_info.result != NULL_TREE)
+{
+  return call_info.result;
+}
+
   if (*non_constant_p)
 return t;

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 15b004d..00856ec 100644
--- a/gcc/cp/cp-tree.h
+++ b

Re: [PATCH] c++/67376 Comparison with pointer to past-the-end, of array fails inside constant expression

2016-03-30 Thread Jason Merrill

On 03/30/2016 12:32 PM, Martin Sebor wrote:

On 03/30/2016 09:30 AM, Jason Merrill wrote:

On 03/29/2016 11:57 PM, Martin Sebor wrote:

Are we confident that arr[0] won't make it here as POINTER_PLUS_EXPR or
some such?


I'm as confident as I can be given that this is my first time
working in this area.  Which piece of code or what assumption
in particular are you concerned about?


I want to be sure that we don't fold these conditions to false.

constexpr int *ip = 0;
constexpr struct A { int ar[3]; } *ap = 0;

static_assert(&ip[0] == 0);
static_assert(&(ap->ar[0]) == 0);


I see.  Thanks for clarifying.  The asserts pass.  The expressions
are folded earlier on (in fact, as we discussed, the second one
too early and is accepted even though it's undefined and should be
rejected in a constexpr context) and never reach fold_comparison.


Good, then let's add at least the first to one of the tests.


+  /* Avoid folding references to struct members at offset 0 to
+ prevent tests like '&ptr->firstmember == 0' from getting
+ eliminated.  When ptr is null, although the -> expression
+ is strictly speaking invalid, GCC retains it as a matter
+ of QoI.  See PR c/44555. */
+  && (TREE_CODE (op0) != ADDR_EXPR
+  || TREE_CODE (TREE_OPERAND (op0, 0)) != COMPONENT_REF
+  || compare_tree_int (DECL_FIELD_OFFSET ((TREE_OPERAND
+  (TREE_OPERAND (op0, 0), 1))), 0))


Can we look at offset/bitpos here rather than examine the tree structure 
of op0?  get_inner_reference already examined it for us.


Also, it looks like you aren't handling the case with the operands 
switched, i.e. 0 == p and such.


Jason



Re: [C++/70393] constexpr constructor

2016-03-30 Thread Jason Merrill

On 03/30/2016 01:40 PM, Nathan Sidwell wrote:

On 03/30/16 13:22, Jason Merrill wrote:

On 03/29/2016 08:40 AM, Nathan Sidwell wrote:

+  /* The field we're initializing must be on the field
+ list.  Look to see if it is present before the
+ field the current ELT initializes.  */
+  for (; fields != cep->index; fields = DECL_CHAIN (fields))
+if (index == fields)
+  goto insert;


Instead of searching through the fields, could we compare
DECL_FIELD_OFFSET of
index and cep->index?


I wondered about that, but then there are bitfields (and I can't recall
how they behave WRT DECL_FIELD_OFFSET).  Pointer equality seems cheap
enough?


Fair enough.  OK.

Jason




[Patch] PR 70235 - [4.9/5/6 Regression] Incorrect output with PF format

2016-03-30 Thread Dominique d'Humières
Is the following patch OK for trunk and the gcc-4.9 and gcc-5 branches after 
some delay (bootstrapped and regtested on x86_64-apple-darwin15)?

TIA

Dominique

Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 234597)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2016-03-30  Dominique d'Humieres  
+   Jerry DeLisle  
+
+   gfortran.dg/fmt_pf.f90: New test.
+
 2016-03-30  Rainer Orth  
 
Forward-port from 5 branch
Index: libgfortran/ChangeLog
===
--- libgfortran/ChangeLog   (revision 234597)
+++ libgfortran/ChangeLog   (working copy)
@@ -1,3 +1,10 @@
+2016-03-30  Jerry DeLisle  
+   Dominique d'Humieres  
+
+   PR libgfortran/70235
+   * io/write_float.def: Fix PF format for negative values of the scale
+   factor.
+
 2016-03-28  Alessandro Fanfarillo  
 
* caf/libcaf.h: caf_stop_numeric and caf_stop_str prototype.
Index: libgfortran/io/write_float.def
===
--- libgfortran/io/write_float.def  (revision 234597)
+++ libgfortran/io/write_float.def  (working copy)
@@ -184,9 +184,6 @@
  memmove (digits + nbefore, digits + nbefore + 1, p);
  digits[nbefore + p] = '.';
  nbefore += p;
- nafter = d - p;
- if (nafter < 0)
-   nafter = 0;
  nafter = d;
  nzero = 0;
}
@@ -204,12 +201,27 @@
{
  nzero = -(nbefore + p);
  memmove (digits + 1, digits, nbefore);
- digits++;
- nafter = d + nbefore;
+ nafter = d - nzero;
+ if (nafter == 0 && d > 0)
+   {
+ /* This is needed to get the correct rounding. */
+ memmove (digits + 1, digits, ndigits - 1);
+ digits[1] = '0';
+ nafter = 1;
+ nzero = d - 1;
+   }
+ else if (nafter < 0)
+   {
+ /* Reset digits to 0 in order to get correct rounding
+towards infinity. */
+ for (i = 0; i < ndigits; i++)
+   digits[i] = '0';
+ digits[ndigits - 1] = '1';
+ nafter = d;
+ nzero = 0;
+   }
  nbefore = 0;
}
- if (nzero > d)
-   nzero = d;
}
}
   else



Re: [Patch] PR 70235 - [4.9/5/6 Regression] Incorrect output with PF format

2016-03-30 Thread Jerry DeLisle
On 03/30/2016 01:18 PM, Dominique d'Humières wrote:
> Is the following patch OK for trunk and the gcc-4.9 and gcc-5 branches after 
> some delay (bootstrapped and regtested on x86_64-apple-darwin15)?

Yes, OK, thanks for help!!

Jerry


Re: [PATCH] New flag in order to dump information about template instantiations.

2016-03-30 Thread Andres Tiraboschi
2016-03-29 12:07 GMT-03:00 Andres Tiraboschi:
> Hi,
> the attached patch adds a new compilation flag
> 'ftemplate-instantiations' in order
> to allow dumping debug information for template instantiations.
> This flag has 2 possible values: none (the default) and hreadable, which
> prints which template instantiations have been made in a human-readable way.
> This patch was also made in order to add options easily and to interact with
> plugins.
>   For example, a plugin can define a derived class of
> template_instantiations_callbacks implementing _function_instantiation,
> _class_instantiation and _using_instantiation, and then use
> add_template_instantiations_callbacks in order to access information
> about which template instantiations have been made.
>
> Changelog
> 2016-03-29  Andres Tiraboschi  
>
> * gcc/c-family/c.opt (ftemplate-instantiations): New flag.
> * gcc/flag-types.h (ti_dump_options): New type.
> * gcc/cp/decl2.c (cp_write_global_declarations): Added code to
> dump information.
> * gcc/cp/cp-tree.h (template_instantiations_callbacks): New type.
> (call_template_instantiation_callbacks): Declare.
> (add_template_instantiations_callbacks): Likewise.
> (clean_up_callbacks): Likewise.
> * gcc/cp/pt.c (human_readable_template_instantiations): New type.
> (instantiation_callbacks): Declare.
> (call_template_instantiation_callback): New function.
> (call_template_instantiation_callbacks): Likewise.
> (add_template_instantiations_callbacks): Likewise.
> (initialize_instantiations_callbacks): Likewise.
> (clean_up_callbacks): Likewise.
> (init_template_processing): Added code to initialize 
> instatiation_callbacks.
> (register_specialization): Added code to dump information.
> * gcc/doc/invoke.texi (ftemplate-instantiations): Added documentation.
>
>
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 7c5f6c7..a0ebcdc 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1487,6 +1487,19 @@ fstats
>  C++ ObjC++ Var(flag_detailed_statistics)
>  Display statistics accumulated during compilation.
>
> +ftemplate-instantiations=
> +C++ Joined RejectNegative Enum(ti_dump_options) Var(ti_dump_option)
> Init(TI_NONE)
> +Dump information about which templates have been instantiated
> +
> +Enum
> +Name(ti_dump_options) Type(enum ti_dump_options)
> UnknownError(unrecognized template instantiation dumping option %qs)
> +
> +EnumValue
> +Enum(ti_dump_options) String(none) Value(TI_NONE)
> +
> +EnumValue
> +Enum(ti_dump_options) String(hreadable) Value(TI_HREADABLE)
> +
>  fstrict-enums
>  C++ ObjC++ Optimization Var(flag_strict_enums)
>  Assume that values of enumeration type are always within the minimum
> range of that type.
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 15b004d..f682b4a 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -4816,6 +4816,61 @@ struct local_specialization_stack
>hash_map *saved;
>  };
>
> +class template_instantiations_callbacks
> +{
> +public:
> +  template_instantiations_callbacks () : next(NULL){}
> +
> +  void function_instantiation (tree tmpl, tree args, tree spec)
> +  {
> +_function_instantiation (tmpl, args, spec);
> +if (next != NULL)
> +  next->function_instantiation (tmpl, args, spec);
> +  }
> +
> +  void class_instantiation (tree tmpl, tree args, tree spec)
> +  {
> +_class_instantiation (tmpl, args, spec);
> +if (next != NULL)
> +  next->class_instantiation (tmpl, args, spec);
> +  }
> +
> +  void using_instantiation (tree tmpl, tree args, tree spec)
> +  {
> +_using_instantiation (tmpl, args, spec);
> +if (next != NULL)
> +  next->using_instantiation (tmpl, args, spec);
> +  }
> +
> +  void add_callbacks (template_instantiations_callbacks* new_next)
> +  {
> +if (next)
> +  next->add_callbacks (new_next);
> +else
> +  next = new_next;
> +  }
> +
> +  virtual ~template_instantiations_callbacks ()
> +  {
> +delete next;
> +  }
> +
> +private:
> +  template_instantiations_callbacks* next;
> +
> +  virtual void _function_instantiation (tree, tree, tree)
> +  {
> +  }
> +
> +  virtual void _class_instantiation (tree, tree, tree)
> +  {
> +  }
> +
> +  virtual void _using_instantiation (tree, tree, tree)
> +  {
> +  }
> +};
> +
>  /* in class.c */
>
>  extern int current_class_depth;
> @@ -6199,6 +6254,9 @@ extern void register_local_specialization
> (tree, tree);
>  extern tree retrieve_local_specialization   (tree);
>  extern tree extract_fnparm_pack (tree, tree *);
>  extern tree template_parm_to_arg(tree);
> +extern void call_template_instantiation_callbacks (void);
> +extern void add_template_instantiations_callbacks
> (template_instantiations_callbacks* new_callback);
> +extern void clean_up_callbacks (void);
>
>  /* in repo.c */
>  extern void init_repo(void);
> diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
> index 73b0d28..097e3564

Re: a patch for PR68695

2016-03-30 Thread Christophe Lyon
On 29 March 2016 at 18:28, Vladimir Makarov  wrote:
>   The following patch improves the code in 2 out of 3 cases in
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68695
>
>   The patch uses more accurate costs for the RA cost improvement
> optimization after colouring.
>
>   The patch was tested and bootstrapped on x86-64.  It is hard to create a
> test to check the correct code generation.  Therefore there is no test.  As
> the patch changes heuristics, a generated code in some cases will be
> different but at least x86-64 tests expecting a specific code are not broken
> by the patch.
>
>   Committed as rev.  234527
>

Hi,

I've noticed that after this patch, 2 tests regress (PASS -> FAIL) on arm:
  gcc.dg/ira-shrinkwrap-prep-2.c scan-rtl-dump pro_and_epilogue
"Performing shrink-wrapping"
  gcc.dg/pr10474.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"

Christophe


[PATCH] Fix ira.c indirect_jump_optimize (PR rtl-optimization/70460)

2016-03-30 Thread Jakub Jelinek
Hi!

As mentioned in the PR, we are miscompiling glibc on i686-linux, because
the new indirect_jump_optimize mini-pass thinks that an insn
which has a REG_LABEL_OPERAND note necessarily has to set the target register
to that label, while in the glibc case it is actually that label + some
offset, where the offset is read from a table of label differences (other
label - this label).

The following patch changes it to just look at SET_SRC of single_set and/or
REG_EQUAL note, and only consider those if one of them is a LABEL_REF.
That alone broke lots of tests, which contain non-local gotos, so I had
to add a check that we don't do anything in this mini-pass (like old ira.c
code did) if there is REG_NON_LOCAL_GOTO note.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/5/4.9?
Bernd has preapproved the patch, but that was before the REG_NON_LOCAL_GOTO
changes.

2016-03-30  Jakub Jelinek  

PR rtl-optimization/70460
* ira.c (indirect_jump_optimize): Don't substitute LABEL_REF
with operand from REG_LABEL_OPERAND, instead substitute
SET_SRC or REG_EQUAL note content if it is a LABEL_REF.
Don't do anything for REG_NON_LOCAL_GOTO jumps.

* gcc.c-torture/execute/pr70460.c: New test.

--- gcc/ira.c.jj2016-03-21 10:12:32.0 +0100
+++ gcc/ira.c   2016-03-30 19:06:42.883892088 +0200
@@ -3870,7 +3870,8 @@ indirect_jump_optimize (void)
   FOR_EACH_BB_REVERSE_FN (bb, cfun)
 {
   rtx_insn *insn = BB_END (bb);
-  if (!JUMP_P (insn))
+  if (!JUMP_P (insn)
+ || find_reg_note (insn, REG_NON_LOCAL_GOTO, NULL_RTX))
continue;
 
   rtx x = pc_set (insn);
@@ -3884,19 +3885,18 @@ indirect_jump_optimize (void)
  if (!DF_REF_IS_ARTIFICIAL (def))
{
  rtx_insn *def_insn = DF_REF_INSN (def);
- rtx note = find_reg_note (def_insn, REG_LABEL_OPERAND, NULL_RTX);
-
- if (note)
+ rtx lab = NULL_RTX;
+ rtx set = single_set (def_insn);
+ if (set && GET_CODE (SET_SRC (set)) == LABEL_REF)
+   lab = SET_SRC (set);
+ else
{
- /* Substitute a LABEL_REF to the label given by the
-note rather than using SET_SRC of DEF_INSN.
-DEF_INSN might be loading the label constant from
-a constant pool, which isn't what we want in a
-direct branch.  */
- rtx lab = gen_rtx_LABEL_REF (Pmode, XEXP (note, 0));
- if (validate_replace_rtx (SET_SRC (x), lab, insn))
-   rebuild_p = true;
+ rtx eqnote = find_reg_note (def_insn, REG_EQUAL, NULL_RTX);
+ if (eqnote && GET_CODE (XEXP (eqnote, 0)) == LABEL_REF)
+   lab = XEXP (eqnote, 0);
}
+ if (lab && validate_replace_rtx (SET_SRC (x), lab, insn))
+   rebuild_p = true;
}
}
 }
--- gcc/testsuite/gcc.c-torture/execute/pr70460.c.jj2016-03-30 
19:08:39.295335567 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr70460.c   2016-03-30 
19:08:18.0 +0200
@@ -0,0 +1,29 @@
+/* PR rtl-optimization/70460 */
+
+int c;
+
+__attribute__((noinline, noclone)) void
+foo (int x)
+{
+  static int b[] = { &&lab1 - &&lab0, &&lab2 - &&lab0 };
+  void *a = &&lab0 + b[x];
+  goto *a;
+lab1:
+  c += 2;
+lab2:
+  c++;
+lab0:
+  ;
+}
+
+int
+main ()
+{
+  foo (0);
+  if (c != 3)
+__builtin_abort ();
+  foo (1);
+  if (c != 4)
+__builtin_abort ();
+  return 0;
+}


Jakub


Re: [Patch] PR 70235 - [4.9/5/6 Regression] Incorrect output with PF format

2016-03-30 Thread Dominique d'Humières
Thanks for the quick review. Committed on trunk as revision r234600. I’ll wait 
a week for gcc-5 and gcc-4.9.

Dominique

> On 30 March 2016 at 22:38, Jerry DeLisle wrote:
> 
> On 03/30/2016 01:18 PM, Dominique d'Humières wrote:
>> Is the following patch OK for trunk and the gcc-4.9 and gcc-5 branches after 
>> some delay (bootstrapped and regtested on x86_64-apple-darwin15)?
> 
> Yes, OK, thanks for help!!
> 
> Jerry



Re: C++ PATCH for c++/70449 (ICE when printing a filename of unknown location)

2016-03-30 Thread Manuel López-Ibáñez

On 30/03/16 17:14, Marek Polacek wrote:

This test ICEs since the addition of the assert in pp_string which ensures that
we aren't trying to print an empty string.  But that's what happens here, the
location is actually UNKNOWN_LOCATION, so LOCATION_FILE on that yields null.
Fixed by not trying to print the filename of UNKNOWN_LOCATION.


How can it be UNKNOWN_LOCATION? It has to be somewhere in the input file!

This is what Clang prints:

prog.cc:5:3: warning: type definition in a constexpr function is a C++14 
extension [-Wc++14-extensions]

  enum E { a = f<0> () };
  ^

This is what we print:

In instantiation of 'constexpr int f() [with int  = 0]':
prog.cc:5:22:   required from here
cc1plus: error: body of constexpr function 'constexpr int f() [with int 
 = 0]' not a return-statement


This is very broken: The fact that input_location is UNKNOWN_LOCATION makes us 
print "cc1plus" in the error.


Even if we accept the broken location for now (adding some FIXME to the code 
would help the next person to realise this is not normal), if LOCATION_FILE() 
is NULL, we should print "progname" like diagnostic_build_prefix() does. 
Moreover, the filename string should be built with file_name_as_prefix() to get 
correct coloring.


Something like:

  const char *filename = LOCATION_FILE (location);
  /* FIXME: Somehow we may get UNKNOWN_LOCATION here: See 
g++.dg/cpp0x/constexpr-70449.C */

  if (filename == NULL) filename = progname;
  const char * prefix = file_name_as_prefix (context, filename);

pp_verbatim (context->printer,
 TREE_CODE (p->decl) == TREE_LIST
 ? _("%s: In substitution of %qS:\n")
 : _("%s: In instantiation of %q#D:\n"),
 prefix, p->decl);
  free (prefix);

The above also avoids adding yet another slightly different new string for 
translation.


Cheers,

Manuel.




Re: [PATCH] c++/67376 Comparison with pointer to past-the-end, of array fails inside constant expression

2016-03-30 Thread Martin Sebor

On 03/30/2016 01:25 PM, Jason Merrill wrote:

On 03/30/2016 12:32 PM, Martin Sebor wrote:

On 03/30/2016 09:30 AM, Jason Merrill wrote:

On 03/29/2016 11:57 PM, Martin Sebor wrote:

Are we confident that arr[0] won't make it here as
POINTER_PLUS_EXPR or
some such?


I'm as confident as I can be given that this is my first time
working in this area.  Which piece of code or what assumption
in particular are you concerned about?


I want to be sure that we don't fold these conditions to false.

constexpr int *ip = 0;
constexpr struct A { int ar[3]; } *ap = 0;

static_assert(&ip[0] == 0);
static_assert(&(ap->ar[0]) == 0);


I see.  Thanks for clarifying.  The asserts pass.  The expressions
are folded earlier on (in fact, as we discussed, the second one
too early and is accepted even though it's undefined and should be
rejected in a constexpr context) and never reach fold_comparison.


Good, then let's add at least the first to one of the tests.


I've enhanced the new constexpr-nullptr-1.C test to verify this.
I added assertions exercising the relational expressions as well
and for sanity compiled the test with Clang.  It turns out that
it rejects the relational expressions with null pointers like
the one below complaining they aren't constant.

  constexpr int i = 0;
  constexpr const int *p = &i;
  constexpr int *q = 0;

  static_assert (q < p, "q < p");

I ended up not using a static_assert for the unspecified subset
even though GCC accepts it.  It seems that they really aren't
valid constant expressions (their results are unspecified for
null pointers) and should be rejected.  Do you agree?  (If you
do, I'll add these cases to c++/70248 that's already tracking
another unspecified case that GCC incorrectly accepts).


+           /* Avoid folding references to struct members at offset 0 to
+              prevent tests like '&ptr->firstmember == 0' from getting
+              eliminated.  When ptr is null, although the -> expression
+              is strictly speaking invalid, GCC retains it as a matter
+              of QoI.  See PR c/44555.  */
+           && (TREE_CODE (op0) != ADDR_EXPR
+               || TREE_CODE (TREE_OPERAND (op0, 0)) != COMPONENT_REF
+               || compare_tree_int (DECL_FIELD_OFFSET ((TREE_OPERAND
+                                      (TREE_OPERAND (op0, 0), 1))), 0))


Can we look at offset/bitpos here rather than examine the tree structure
of op0?  get_inner_reference already examined it for us.


Good suggestion, thanks!



Also, it looks like you aren't handling the case with the operands
switched, i.e. 0 == p and such.


Based on my testing and reading the code I believe the caller
(fold_binary_loc) arranges for the constant argument to always
come second in comparisons.  I've added a comment to the code
to make it clear.

Attached is an updated patch retested on x86_64.

Martin
PR c++/67376 - [5/6 regression] Comparison with pointer to past-the-end
	of array fails inside constant expression
PR c++/70170 - [6 regression] bogus not a constant expression error comparing
	pointer to array to null
PR c++/70172 - incorrect reinterpret_cast from integer to pointer error
	on invalid constexpr initialization
PR c++/70228 - insufficient detail in diagnostics for a constexpr out of bounds
	array subscript

gcc/testsuite/ChangeLog:
2016-03-30  Martin Sebor  

	PR c++/67376
	PR c++/70170
	PR c++/70172
	PR c++/70228
	* g++.dg/cpp0x/constexpr-array-ptr10.C: New test.
	* g++.dg/cpp0x/constexpr-array-ptr9.C: New test.
	* g++.dg/cpp0x/constexpr-nullptr-1.C: New test.
	* g++.dg/cpp0x/constexpr-array5.C: Adjust text of expected diagnostic.
	* g++.dg/cpp0x/constexpr-string.C: Same.
	* g++.dg/cpp0x/constexpr-wstring2.C: Same.
	* g++.dg/cpp0x/pr65398.C: Same.
	* g++.dg/ext/constexpr-vla1.C: Same.
	* g++.dg/ext/constexpr-vla2.C: Same.
	* g++.dg/ext/constexpr-vla3.C: Same.
	* g++.dg/ubsan/pr63956.C: Same.

gcc/cp/ChangeLog:
2016-03-30  Martin Sebor  

	PR c++/67376
	PR c++/70170
	PR c++/70172
	PR c++/70228
	* constexpr.c (diag_array_subscript): New function.
	(cxx_eval_array_reference): Detect out of bounds array indices.

gcc/ChangeLog:
2016-03-30  Martin Sebor  

	PR c++/67376
	* fold-const.c (maybe_nonzero_address): New function.
	(fold_comparison): Call it.  Fold equality and relational
	expressions involving null pointers.
	(tree_single_nonzero_warnv_p): Call maybe_nonzero_address.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 8ea7111..5d1b8b3 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1837,6 +1837,30 @@ find_array_ctor_elt (tree ary, tree dindex, bool insert = false)
   return -1;
 }
 
+/* Under the control of CTX, issue a detailed diagnostic for
+   an out-of-bounds subscript INDEX into the expression ARRAY.  */
+
+static void
+diag_array_subscript (const constexpr_ctx *ctx, tree array, tree index)
+{
+  if (!ctx->quiet)
+{
+  tree arraytype = TREE_TYPE (array);
+
+  /* Convert the unsigned array subscript to a signed integer to avoid
+	 printing huge numbers for small negative v

Re: C++ PATCH for c++/70449 (ICE when printing a filename of unknown location)

2016-03-30 Thread Manuel López-Ibáñez
On 30 March 2016 at 23:42, Manuel López-Ibáñez  wrote:
> On 30/03/16 17:14, Marek Polacek wrote:
>>
>> This test ICEs since the addition of the assert in pp_string which ensures
>> that
>> we aren't trying to print an empty string.  But that's what happens here,
>> the
>> location is actually UNKNOWN_LOCATION, so LOCATION_FILE on that yields
>> null.
>> Fixed by not trying to print the filename of UNKNOWN_LOCATION.

> Even if we accept the broken location for now (adding some FIXME to the code
> would help the next person to realise this is not normal), if
> LOCATION_FILE() is NULL, we should print "progname" like
> diagnostic_build_prefix() does. Moreover, the filename string should be
> built with file_name_as_prefix() to get correct coloring.

Even better: Use "f ? f : progname" in file_name_as_prefix() and
simplify the code to:

  /* FIXME: Somehow we may get UNKNOWN_LOCATION here: see
     g++.dg/cpp0x/constexpr-70449.C.  */
  char *prefix = file_name_as_prefix (context,
                                      LOCATION_FILE (location));

  pp_verbatim (context->printer,
               TREE_CODE (p->decl) == TREE_LIST
               ? _("%s: In substitution of %qS:\n")
               : _("%s: In instantiation of %q#D:\n"),
               prefix, p->decl);
  free (prefix);

Fixes the ICE, adds colors, mentions the broken location and does not
add extra strings.

Cheers,

Manuel.


Re: [PATCH] Fix ira.c indirect_jump_optimize (PR rtl-optimization/70460)

2016-03-30 Thread Alan Modra
On Wed, Mar 30, 2016 at 11:27:28PM +0200, Jakub Jelinek wrote:
> As mentioned in the PR, we are miscompiling glibc on i686-linux, because
> the new indirect_jump_optimize mini-pass thinks that an insn
> which has a REG_LABEL_OPERAND note necessarily has to set the target register
> to that label, while in the glibc case it is actually that label + some
> offset, where the offset is read from a table which contains
> other-label minus this-label differences.
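
For reference, a minimal GNU C sketch of the shape being described (the
function and label names are invented for illustration; this is not glibc's
actual code):

__attribute__((noinline, noclone)) static int
dispatch (int x)
{
  /* A table of label differences; the dispatch target is base plus an
     offset, so an insn carrying a REG_LABEL_OPERAND note for 'base'
     does not necessarily leave the register holding &&base itself.  */
  static const int offsets[] = { &&op_add - &&base, &&op_sub - &&base };
  void *target = &&base + offsets[x];
  goto *target;
op_add:
  return x + 1;
op_sub:
  return x - 1;
base:
  return x;
}

int
main (void)
{
  /* dispatch (0) lands on op_add, dispatch (1) on op_sub.  */
  return (dispatch (0) == 1 && dispatch (1) == 0) ? 0 : 1;
}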

Sorry for the breakage, and thanks for cleaning up after me.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] gcc/final.c: -fdebug-prefix-map support to remap sources with relative path

2016-03-30 Thread Hongxu Jia

On 03/31/2016 01:58 AM, Joseph Myers wrote:

On Sun, 27 Mar 2016, Hongxu Jia wrote:


PR other/70428
* final.c (remap_debug_filename): Use lrealpath to translate
relative paths before remapping.

I think this would break cases that currently work.  When using
-fdebug-prefix-map you should understand what paths will appear in debug
info (which means understanding something about how a path gets split by
DWARF into multiple pieces), and likely inspect the resulting debug info
(in an automated way) to see if you missed any paths and need to add more
such options.


Got it, thanks for pointing it out.

//Hongxu




Re: [PATCH] c++/67376 Comparison with pointer to past-the-end of array fails inside constant expression

2016-03-30 Thread Jason Merrill

On 03/30/2016 06:50 PM, Martin Sebor wrote:

On 03/30/2016 01:25 PM, Jason Merrill wrote:

On 03/30/2016 12:32 PM, Martin Sebor wrote:

On 03/30/2016 09:30 AM, Jason Merrill wrote:

On 03/29/2016 11:57 PM, Martin Sebor wrote:

Are we confident that arr[0] won't make it here as
POINTER_PLUS_EXPR or
some such?


I'm as confident as I can be given that this is my first time
working in this area.  Which piece of code or what assumption
in particular are you concerned about?


I want to be sure that we don't fold these conditions to false.

constexpr int *ip = 0;
constexpr struct A { int ar[3]; } *ap = 0;

static_assert(&ip[0] == 0);
static_assert(&(ap->ar[0]) == 0);


I see.  Thanks for clarifying.  The asserts pass.  The expressions
are folded earlier on (in fact, as we discussed, the second one
too early and is accepted even though it's undefined and should be
rejected in a constexpr context) and never reach fold_comparison.


Good, then let's add at least the first to one of the tests.


I've enhanced the new constexpr-nullptr-1.C test to verify this.
I added assertions exercising the relational expressions as well
and for sanity compiled the test with Clang.  It turns out that
it rejects the relational expressions with null pointers like
the one below complaining they aren't constant.

   constexpr int i = 0;
   constexpr const int *p = &i;
   constexpr int *q = 0;

   static_assert (q < p, "q < p");

I ended up not using a static_assert for the unspecified subset
even though GCC accepts it.  It seems that they really aren't
valid constant expressions (their results are unspecified for
null pointers) and should be rejected.  Do you agree?  (If you
do, I'll add these cases to c++/70248 that's already tracking
another unspecified case that GCC incorrectly accepts).


I agree.


+           /* Avoid folding references to struct members at offset 0 to
+              prevent tests like '&ptr->firstmember == 0' from getting
+              eliminated.  When ptr is null, although the -> expression
+              is strictly speaking invalid, GCC retains it as a matter
+              of QoI.  See PR c/44555.  */
+           && (TREE_CODE (op0) != ADDR_EXPR
+               || TREE_CODE (TREE_OPERAND (op0, 0)) != COMPONENT_REF
+               || compare_tree_int (DECL_FIELD_OFFSET ((TREE_OPERAND
+                                      (TREE_OPERAND (op0, 0), 1))), 0))


Can we look at offset/bitpos here rather than examine the tree structure
of op0?  get_inner_reference already examined it for us.


Good suggestion, thanks!


But you're still examining the tree structure:


+      && (TREE_CODE (op0) != ADDR_EXPR
+          || TREE_CODE (TREE_OPERAND (op0, 0)) != COMPONENT_REF
+          || bitpos0 != 0)


Here instead of looking at op0 we can check (offset0 == NULL_TREE && 
bitpos0 != 0), which indicates a constant non-zero offset.
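
To make that concrete, here is a small self-contained C illustration (not
part of the patch; the struct and function names are made up) of the two
cases the offset check separates:

struct S { int first; int second; };

/* 'first' sits at offset 0, so when p may be null the comparison must
   not be folded away (see PR c/44555).  */
int
cmp_first (struct S *p)
{
  return &p->first == 0;
}

/* 'second' is at a constant non-zero offset, so &p->second can be
   treated as non-null and the comparison folded to false.  */
int
cmp_second (struct S *p)
{
  return &p->second == 0;
}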



+  && TREE_CODE (arg1) == INTEGER_CST
+  && integer_zerop (arg1))


And here you don't need the check for INTEGER_CST.


Also, it looks like you aren't handling the case with the operands
switched, i.e. 0 == p and such.


Based on my testing and reading the code I believe the caller
(fold_binary_loc) arranges for the constant argument to always
come second in comparisons.  I've added a comment to the code
to make it clear.


Great.

Jason



Re: Proposed Patch for Bug 69687

2016-03-30 Thread Marcel Böhme
Hi Bernd,

> Are all the places being patched really problematic ones where an input file 
> could realistically cause an overflow, or just the string functions?
The loop in demangle_args allows the patched register*- and remember*-
functions to be called arbitrarily often, so those can also overflow at
some point.
Found a few other segmentation faults in libiberty that I’ll report and patch 
separately.

> I'm concerned about just returning without any kind of error indication. Not 
> sure what we should be calling from libiberty, but I was thinking maybe 
> xmalloc_failed.
Done. Now, clients of libiberty freeze for about 80 seconds and consume about 
3GB of memory before exiting with "out of memory allocating 2147483647 bytes 
after a total of 3221147648 bytes".

> Might also want to guard against overflow from the first addition.
Done.
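
For illustration, here is a standalone sketch of the guarded-doubling idiom
the patch applies (the buffer-growing function below is hypothetical; the
real code operates on the demangler's struct work_stuff vectors and string
type):

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Grow BUF (capacity *CAP bytes, USED bytes in use) so it can hold NEED
   more bytes.  This mirrors the string_need guard: bail out before either
   the addition or the subsequent doubling can overflow int.  */
static char *
grow_checked (char *buf, int *cap, int used, int need)
{
  if (need > INT_MAX / 2 - used)
    {
      fprintf (stderr, "out of memory\n");
      exit (1);
    }
  int n = (used + need) * 2;
  if (n > *cap)
    {
      buf = realloc (buf, n);
      if (buf == NULL)
        exit (1);
      *cap = n;
    }
  return buf;
}

int
main (void)
{
  int cap = 8;
  char *buf = malloc (cap);
  buf = grow_checked (buf, &cap, 0, 100);
  free (buf);
  return 0;
}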

Index: libiberty/cplus-dem.c
===================================================================
--- libiberty/cplus-dem.c   (revision 234607)
+++ libiberty/cplus-dem.c   (working copy)
@@ -55,6 +55,7 @@ Boston, MA 02110-1301, USA.  */
 void * malloc ();
 void * realloc ();
 #endif
+#include <limits.h>
 
 #include <demangle.h>
 #undef CURRENT_DEMANGLING_STYLE
@@ -4254,6 +4255,8 @@ remember_type (struct work_stuff *work, 
}
   else
{
+ if (work -> typevec_size > INT_MAX / 2)
+   xmalloc_failed (INT_MAX);
  work -> typevec_size *= 2;
  work -> typevec
= XRESIZEVEC (char *, work->typevec, work->typevec_size);
@@ -4281,6 +4284,8 @@ remember_Ktype (struct work_stuff *work,
}
   else
{
+ if (work -> ksize > INT_MAX / 2)
+   xmalloc_failed (INT_MAX);
  work -> ksize *= 2;
  work -> ktypevec
= XRESIZEVEC (char *, work->ktypevec, work->ksize);
@@ -4310,6 +4315,8 @@ register_Btype (struct work_stuff *work)
}
   else
{
+ if (work -> bsize > INT_MAX / 2)
+   xmalloc_failed (INT_MAX);
  work -> bsize *= 2;
  work -> btypevec
= XRESIZEVEC (char *, work->btypevec, work->bsize);
@@ -4764,6 +4771,8 @@ string_need (string *s, int n)
   else if (s->e - s->p < n)
 {
   tem = s->p - s->b;
+      if (n > INT_MAX / 2 - tem)
+        xmalloc_failed (INT_MAX);
   n += tem;
   n *= 2;
   s->b = XRESIZEVEC (char, s->b, n);