Re: [PATCH][AArch64] Add separate insn sched class for vector LDP & STP

2015-09-29 Thread Marcus Shawcroft

On 29/09/15 00:52, Evandro Menezes wrote:

In some micro-architectures the insns to load or store pairs of vector
registers are implemented rather differently from those affecting lanes
in vector registers.  Then, it's important that such insns be described
likewise differently in the scheduling model.

This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart
from the current neon_load2_2reg_q and neon_store2_2reg_q types,
respectively.



Hi,

The AArch64 part of this is OK. Please wait for Kyrill or Ramana to 
comment on ARM side.  Cheers /Marcus



Thank you,

-- Evandro Menezes


0001-AArch64-Add-separate-insn-sched-class-for-vector-LDP.patch


 From 340249dcd2af8dfce486cb4f62d4eaf285c6a799 Mon Sep 17 00:00:00 2001
From: Evandro Menezes
Date: Mon, 28 Sep 2015 15:00:00 -0500
Subject: [PATCH] [AArch64] Add separate insn sched class for vector LDP & STP

2015-09-28  Evandro Menezes

gcc/
* config/arm/types.md (neon_ldp, neon_ldp_q, neon_stp, neon_stp_q):
add new insn types for vector load and store pairs.


s/add/Add/ and likewise the rest of the changelog comments.


* config/arm/cortex-a53.md (cortex_a53_f_load_2reg): add insn
types "neon_ldp{,_q}".
* config/arm/cortex-a57.md (neon_load_c): add insn types
"neon_ldp{,_q}".
(neon_store_complex): add insn types "neon_stp{,_q}".
* config/aarch64/aarch64-simd.md (aarch64_be_movoi): add insn types
"neon_{ldp,stp}_q".




RE: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-29 Thread Ajit Kumar Agarwal


-----Original Message-----
From: Aaron Sawdey [mailto:acsaw...@linux.vnet.ibm.com] 
Sent: Monday, September 28, 2015 11:55 PM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,optimization]: Optimized changes in the estimate register 
pressure cost.

On Sat, 2015-09-26 at 04:51 +, Ajit Kumar Agarwal wrote:
> I have made the following changes in the estimate_reg_pressure_cost 
> function used by the loop invariant and IVOPTS.
> 
> Earlier the estimate_reg_pressure cost uses the cost of n_new 
> variables that are generated by the Loop Invariant  and IVOPTS. These 
> are not sufficient for register pressure calculation. The register 
> pressure cost calculation should use the n_new + n_old (numbers) to 
> consider the cost. n_old is the register  used inside the loops and 
> the effect of  n_new new variables generated by loop invariant and 
> IVOPTS on register pressure is based on how the new variables impact 
> on register used inside the loops. The increase or decrease in register 
> pressure is due to the impact of new variables on the register used  inside 
> the loops. The register-register move cost or the spill cost should consider 
> the cost associated with register used and the new variables generated. The 
> movement  of new variables increases or decreases the register pressure, 
> which is based on  overall cost of n_new + n_old variables.
> 
> The increase and decrease in register pressure is based on the overall 
> cost of n_new + n_old as the changes in the register pressure caused 
> due to new variables is based on how the changes behave with respect to the 
> register used in the loops.
> 
> Thus the register pressure caused to new variables is based on the new 
> variables and its impact on register used inside  the loops and thus consider 
> the overall  cost of n_new + n_old.
> 
> Bootstrap for i386 and reg tested on i386 with the change is fine.
> 
> SPEC CPU 2000 benchmarks are run and there is following impact on the 
> performance and code size.
> 
> ratio with the optimization vs ratio without optimization for INT 
> benchmarks
> (3807.632 vs 3804.661)
> 
> ratio with the optimization vs ratio without optimization for FP 
> benchmarks ( 4668.743 vs 4778.741)
> 
> Code size reduction with respect to FP SPEC CPU 2000 benchmarks
> 
> Number of instructions with optimization = 1094117
> Number of instructions without optimization = 1094659
> 
> Reduction in number of instructions with the optimization = 542 instructions.
> 
> [Patch,optimization]: Optimized changes in the estimate  register 
> pressure cost.
> 
> Earlier the estimate_reg_pressure cost uses the cost of n_new 
> variables that are generated by the Loop Invariant and IVOPTS. These 
> are not sufficient for register pressure calculation. The register 
> pressure cost calculation should use the n_new + n_old (numbers) to 
> consider the cost. n_old is the register used inside the loops and the 
> effect of n_new new variables generated by loop invariant and IVOPTS 
> on register pressure is based on how the new variables impact on register 
> used inside the loops.
> 
> ChangeLog:
> 2015-09-26  Ajit Agarwal  
> 
>   * cfgloopanal.c (estimate_reg_pressure_cost) : Add changes
>   to consider the n_new plus n_old in the register pressure
>   cost.
> 
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com

>>Ajit,

>>It looks to me like your change doesn't do anything at all inside the 
>>loop-invariant.c code. There it's doing a difference between two 
>>estimate_reg_pressure_cost calls so adding n_old (regs_used) to both is 
>>canceled out.

>>  size_cost = (estimate_reg_pressure_cost (new_regs[0] + regs_needed[0],
>>  regs_used, speed, call_p)
>>   - estimate_reg_pressure_cost (new_regs[0],
>> regs_used, speed, call_p));

>>I'm not quite sure I understand the "why" of the heuristic you've added here 
>>-- can you explain your reasoning further?

Aaron:

Extract from function estimate_reg_pressure_cost() where the changes are made.

  if (regs_needed <= available_regs)
    /* If we are close to running out of registers, try to preserve
       them.  */
    /* Case 1 */
    cost = target_reg_cost [speed] * regs_needed;
  else
    /* If we run out of registers, it is very expensive to add another
       one.  */
    /* Case 2 */
    cost = target_spill_cost [speed] * regs_needed;

If both estimate_reg_pressure_cost calls fall into the same case (both 
Case 1 or both Case 2), the n_old term cancels out in the difference. If 
the two calls fall into different cases, for example the first one into 
Case 2 and the second one into Case 1, it does not cancel out, because 
target_reg_cost[speed] and target_spill_cost[speed] are different.
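
To make the cancellation argument concrete, here is a minimal sketch in plain C. It is a simplified model of the extract above, not the actual cfgloopanal.c code: REG_COST, SPILL_COST and AVAILABLE_REGS are hypothetical stand-ins for target_reg_cost[speed], target_spill_cost[speed] and the number of available registers, and model_cost and size_cost are illustrative names.

```c
/* Hypothetical constants; the real values come from the target.  */
enum { REG_COST = 2, SPILL_COST = 8, AVAILABLE_REGS = 10 };

/* Simplified model of the patched cost: charge for n_new + n_old.  */
static int
model_cost (int n_new, int n_old)
{
  int regs_needed = n_new + n_old;

  if (regs_needed <= AVAILABLE_REGS)
    return REG_COST * regs_needed;    /* Case 1 */
  return SPILL_COST * regs_needed;    /* Case 2 */
}

/* The difference of two estimates, in the shape used by
   loop-invariant.c in the quoted snippet.  */
static int
size_cost (int new_regs, int regs_needed, int regs_used)
{
  return model_cost (new_regs + regs_needed, regs_used)
         - model_cost (new_regs, regs_used);
}
```

With these made-up numbers, size_cost (1, 2, 3) and size_cost (1, 2, 5) both give 4, so regs_used has no influence while both calls stay in Case 1. size_cost (1, 2, 8) gives 70, because the first call crosses into Case 2 and the second does not.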

The 

Re: [Patch, testsuite] Skip addr_equal-1 if target keeps null pointer checks

2015-09-29 Thread Senthil Kumar Selvaraj
On Mon, Sep 28, 2015 at 01:38:18PM -0600, Jeff Law wrote:
> On 09/28/2015 02:15 AM, Senthil Kumar Selvaraj wrote:
> >Hi,
> >
> >   The below patch skips gcc.dg/addr_equal-1.c if the target keeps null
> >   pointer checks.
> >
> >   The test fails for such targets (avr, in my case) because the address
> >   comparison in the below code does not resolve to a constant, causing
> >   builtin_constant_p to return false and fail the test.
> >
> >   /* Variables and functions do not share same memory locations otherwise.  */
> >   if (!__builtin_constant_p ((void *)undef_fn0 == (void *)&undef_var0))
> >     abort ();
> >
> >   For targets that delete null pointer checks, the equality comparison
> >   expression is optimized away to 0, as the code in match.pd knows they
> >   can only be equal if they are both NULL, which cannot be true since
> >   flag-delete-null-pointer-checks is on.
> >
> >   For targets that keep null pointer checks, 0 is a valid address and the
> >   comparison expression is left as is, and that causes a later pass to
> >   fold the builtin_constant_p to a false value, resulting in the test
> >   failure.
> This sounds like a failing in the compiler itself, not a testsuite issue.
> 
> Even on a target where objects can be at address 0, you can't have a
> variable and a function at the same address.

Hmm, symtab_node::equal_address_to, which is where the address equality
check happens, has a comment that contradicts your statement, and the
function/variable overlap check is done after the NULL possibility check.
The current code looks like this:

  /* If both symbols may resolve to NULL, we can not really prove them
     different.  */
  if (!nonzero_address () && !s2->nonzero_address ())
    return 2;

  /* Except for NULL, functions and variables never overlap.  */
  if (TREE_CODE (decl) != TREE_CODE (s2->decl))
    return 0;
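
The effect of that ordering can be shown with a small stand-alone C model. This is only a sketch of the two checks quoted above: struct sym, its fields, and the ADDR_NOT_MODELLED value are hypothetical, and the rest of the real function is not modelled. The values 2 and 0 follow the excerpt.

```c
#include <stdbool.h>

enum sym_kind { SYM_FUNCTION, SYM_VARIABLE };

/* Hypothetical stand-in for a symbol table node.  */
struct sym
{
  enum sym_kind kind;
  bool nonzero_address;   /* True if the symbol cannot resolve to NULL.  */
};

enum { ADDR_DIFFERENT = 0, ADDR_UNKNOWN = 2, ADDR_NOT_MODELLED = -1 };

static int
equal_address_sketch (const struct sym *s1, const struct sym *s2)
{
  /* NULL possibility check comes first, as in the excerpt.  */
  if (!s1->nonzero_address && !s2->nonzero_address)
    return ADDR_UNKNOWN;

  /* Only reached if at least one symbol is known to be non-NULL.  */
  if (s1->kind != s2->kind)
    return ADDR_DIFFERENT;

  return ADDR_NOT_MODELLED;
}
```

In this model a function and a variable that may both be NULL compare as "unknown", and only once one of them is known to be non-NULL does the kind mismatch prove them different.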

Does anyone know why?

Regards
Senthil


Re: [PATCH][PR67666] Handle single restrict pointer in struct in create_variable_info_for_1

2015-09-29 Thread Tom de Vries

On 22/09/15 09:49, Richard Biener wrote:

On Tue, 22 Sep 2015, Tom de Vries wrote:


Hi,

Consider this test-case:

struct ps
{
   int *__restrict__ p;
};

void
f (struct ps &__restrict__ ps1)
{
   *(ps1.p) = 1;
}


Atm, the restrict on p has no effect. Now, say we add a field to the struct:

struct ps
{
   int *__restrict__ p;
   int a;
};


Then the restrict on p does have the desired effect.


This patch fixes the handling of structs with a single field in alias
analysis.

Bootstrapped and reg-tested on x86_64.

OK for trunk?


Ok.



Hi,

I wonder if this follow-up patch is necessary.

Now that we handle structs with one field in the final loop of 
create_variable_info_for_1, should we set the is_full_var field as well? 
It used to be set for such structs before I committed the "Handle single 
restrict pointer in struct in create_variable_info_for_1" patch.


Thanks,
- Tom

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index 8d86dcb..26d97a3 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -5720,6 +5720,8 @@ create_variable_info_for_1 (tree decl, const char *name)

   newvi->offset = fo->offset;
   newvi->size = fo->size;
   newvi->fullsize = vi->fullsize;
+  if (fieldstack.length () == 1)
+   newvi->is_full_var = true;
   newvi->may_have_pointers = fo->may_have_pointers;
   newvi->only_restrict_pointers = fo->only_restrict_pointers;
   if (i + 1 < fieldstack.length ())



Re: [PATCH] liboffloadmic emulation mode: make it asynchronous

2015-09-29 Thread Jakub Jelinek
On Mon, Sep 28, 2015 at 05:53:42PM +0300, Ilya Verbin wrote:
> Currently the COI emulator is single-threaded, i.e. it is able to run only one
> target function at a time, e.g. the following testcase:
> 
>   #pragma omp parallel sections num_threads(2)
> {
>   #pragma omp section
>   #pragma omp target
>   while (1)
>   putchar ('.');
> 
>   #pragma omp section
>   #pragma omp target
>   while (1)
>   putchar ('o');
> }
> 
> prints only dots using emul, while using real libcoi it prints:
> ...o.o.o.o...o...o.oo.o.o.ooo.oo...o.o.o...o.ooo
> Of course, it's not possible to test new OpenMP 4.1's async features using
> such an emulator.
> 
> The patch below makes it asynchronous: it creates an auxiliary thread for
> each COIPipeline in host and in target processes.  In general, a new
> COIPipeline is created by liboffloadmic for each host thread with offload,
> i.e. the example above has:
> 4 threads in the host process (2 OpenMP threads + 2 auxiliary threads) and
> 3 threads in the target process (1 main thread + 2 auxiliary threads).
> An auxiliary host thread runs a target function in the new thread in target
> process and waits for its completion.  When the function is finished, the host
> thread signals an event and can run a callback, if it is registered.
> liboffloadmic waits for signalled events by calling COIEventWait.
> This is identical to how real libcoi works.
> 
> make check-target-libgomp and some internal tests did not show any regression.
> TSan report is clean.  Is it OK for trunk?

For now ok.  Though, I'd say I'd prefer if there were no auxiliary threads
on the host side, just whatever thread is asked to send something to/from
the device, wait for something and/or poll for something just polling the
pipes.  Are there auxiliary host threads also for the case when using
the real COI, offloading to hw?
> 
> 
> liboffloadmic/
>   * plugin/libgomp-plugin-intelmic.cpp (OFFLOAD_ACTIVE_WAIT_ENV): New
>   define.
>   (init): Set OFFLOAD_ACTIVE_WAIT env var to 0, if it is not set.
>   * runtime/emulator/coi_common.h (PIPE_HOST_PATH): Replace with ...
>   (PIPE_HOST2TGT_NAME): ... this.
>   (PIPE_TARGET_PATH): Replace with ...
>   (PIPE_TGT2HOST_NAME): ... this.
>   (MALLOCN): New define.
>   (READN): Likewise.
>   (WRITEN): Likewise.
>   (enum cmd_t): Replace CMD_RUN_FUNCTION with CMD_PIPELINE_RUN_FUNCTION.
>   Add CMD_PIPELINE_CREATE, CMD_PIPELINE_DESTROY.
>   * runtime/emulator/coi_device.cpp (engine_dir): New static variable.
>   (pipeline_thread_routine): New static function.
>   (COIProcessWaitForShutdown): Use global engine_dir instead of mic_dir.
>   Rename pipe_host and pipe_target to pipe_host2tgt and pipe_tgt2host.
>   If cmd is CMD_PIPELINE_CREATE, create a new thread for the pipeline.
>   Remove cmd == CMD_RUN_FUNCTION case.
>   * runtime/emulator/coi_device.h (COIERRORN): New define.
>   * runtime/emulator/coi_host.cpp: Include set, map, queue.
>   Replace typedefs with enums and structs.
>   (struct Function): Remove name, add num_buffers, bufs_size,
>   bufs_data_target, misc_data_len, misc_data, return_value_len,
>   return_value, completion_event.
>   (struct Callback): New.
>   (struct Process): Remove pipeline.  Add pipe_host2tgt and pipe_tgt2host.
>   (struct Pipeline): Remove pipe_host and pipe_target.  Add thread,
>   destroy, is_destroyed, pipe_host2tgt_path, pipe_tgt2host_path,
>   pipe_host2tgt, pipe_tgt2host, queue, process.
>   (max_pipeline_num): New static variable.
>   (pipelines): Likewise.
>   (max_event_num): Likewise.
>   (non_signalled_events): Likewise.
>   (errored_events): Likewise.
>   (callbacks): Likewise.
>   (cleanup): Do not check tmp_dirs before free.
>   (start_critical_section): New static function.
>   (finish_critical_section): Likewise.
>   (pipeline_is_destroyed): Likewise.
>   (maybe_invoke_callback): Likewise.
>   (signal_event): Likewise.
>   (get_event_result): Likewise.
>   (COIBufferCopy): Rename arguments according to headers.  Add asserts.
>   Use process' main pipes, instead of pipeline's pipes.  Signal completion
>   event.
>   (COIBufferCreate): Rename arguments according to headers.  Add asserts.
>   Use process' main pipes, instead of pipeline's pipes.
>   (COIBufferCreateFromMemory): Rename arguments according to headers.
>   Add asserts.
>   (COIBufferDestroy): Rename arguments according to headers.  Add asserts.
>   Use process' main pipes, instead of pipeline's pipes.
>   (COIBufferGetSinkAddress): Rename arguments according to headers.
>   Add asserts.
>   (COIBufferMap): Rename arguments according to headers.  Add asserts.
>   Signal completion event.
>   (COIBufferRead): Likewise.
>   (COIBufferSetState): Likewise.
>   

Re: [PATCH][PR67666] Handle single restrict pointer in struct in create_variable_info_for_1

2015-09-29 Thread Richard Biener
On Tue, 29 Sep 2015, Tom de Vries wrote:

> On 22/09/15 09:49, Richard Biener wrote:
> > On Tue, 22 Sep 2015, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > Consider this test-case:
> > > 
> > > struct ps
> > > {
> > >int *__restrict__ p;
> > > };
> > > 
> > > void
> > > f (struct ps &__restrict__ ps1)
> > > {
> > >*(ps1.p) = 1;
> > > }
> > > 
> > > 
> > > Atm, the restrict on p has no effect. Now, say we add a field to the
> > > struct:
> > > 
> > > struct ps
> > > {
> > >int *__restrict__ p;
> > >int a;
> > > };
> > > 
> > > 
> > > Then the restrict on p does have the desired effect.
> > > 
> > > 
> > > This patch fixes the handling of structs with a single field in alias
> > > analysis.
> > > 
> > > Bootstrapped and reg-tested on x86_64.
> > > 
> > > OK for trunk?
> > 
> > Ok.
> > 
> 
> Hi,
> 
> I wonder if this follow-up patch is necessary.
> 
> Now that we handle structs with one field in the final loop of
> create_variable_info_for_1, should we set the is_full_var field as well? It
> used to be set for such structs before I committed the "Handle single restrict
> pointer in struct in create_variable_info_for_1" patch.

Yeah, I suppose so.  But I'd set vi->is_full_var to true when
allocating 'vi':

  vi = new_var_info (decl, name);
  vi->fullsize = tree_to_uhwi (declsize);
 +  if (fieldstack.length () == 1) 
 +   vi->is_full_var = true;


Ok with that change.

Thanks,
Richard.

> Thanks,
> - Tom
> 
> diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
> index 8d86dcb..26d97a3 100644
> --- a/gcc/tree-ssa-structalias.c
> +++ b/gcc/tree-ssa-structalias.c
> @@ -5720,6 +5720,8 @@ create_variable_info_for_1 (tree decl, const char *name)
>newvi->offset = fo->offset;
>newvi->size = fo->size;
>newvi->fullsize = vi->fullsize;
> +  if (fieldstack.length () == 1)
> +   newvi->is_full_var = true;
>newvi->may_have_pointers = fo->may_have_pointers;
>newvi->only_restrict_pointers = fo->only_restrict_pointers;
>if (i + 1 < fieldstack.length ())
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time

2015-09-29 Thread Jakub Jelinek
On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek  wrote:
> > So, do I understand well that you'll call GOMP_set_offload_targets from
> > construct[ors] of all shared libraries (and the binary) that contain
> > offloaded code?  If yes, that is surely going to fail the assertions in
> > there.
> 
> Indeed.  My original plan has been to generate/invoke this constructor
> only for/from the final executable and not for any shared libraries, but
> it seems I didn't implement this correctly.

How would you mean to implement it?  -fopenmp or -fopenacc code with
offloading bits might not be in the final executable at all, nor in shared
libraries it is linked against; such libraries could be only dlopened,
consider say python plugin.  And this is not just made up, perhaps not with
offloading yet, but people regularly use OpenMP code in plugins and then we
get complaints that the forked child of the main program is not allowed to do
anything but async-signal-safe functions.
> 
> > You can dlopen such libraries etc.  What if you link one library with
> > -fopenmp=nvptx-none and another one with -fopenmp=x86_64-intelmicemul-linux?
> 
> So, the first question to answer is: what do we expect to happen in this
> case, or similarly, if the executable and any shared libraries are
> compiled with different/incompatible -foffload options?

As the device numbers are per-process, the only possibility I see is that
all the physically available devices are always available, and just if you
try to offload from some code to a device that doesn't support it, you get
host fallback.  Because, one shared library could carefully use device(xyz)
to offload to say XeonPhi it is compiled for and supports, and another
library device(abc) to offload to PTX it is compiled for and supports.
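
A rough C model of that behaviour, to make the proposal concrete. Everything here is hypothetical and is not libgomp code: the device kinds, struct library, and run_target_region are invented for illustration. The only point is that each library carries its own set of offload images and an unsupported request degrades to host fallback instead of failing.

```c
#include <stdbool.h>

/* Illustrative device kinds; all devices stay visible to the process.  */
enum device_kind { DEV_HOST, DEV_XEON_PHI, DEV_PTX, DEV_KIND_COUNT };

/* Hypothetical per-library record of which offload images it contains.  */
struct library
{
  bool has_image[DEV_KIND_COUNT];
};

/* Return the device the region actually runs on: the requested one if
   this library was compiled for it, otherwise the host.  */
static enum device_kind
run_target_region (const struct library *lib, enum device_kind requested)
{
  if (requested != DEV_HOST && lib->has_image[requested])
    return requested;
  return DEV_HOST;
}
```

In this model a library built only for XeonPhi runs on DEV_XEON_PHI when it asks for it, and gets DEV_HOST if it asks for DEV_PTX.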

> For this, I propose that the only mode of operation that we currently can
> support is that all of the executable and any shared libraries agree on
> the offload targets specified by -foffload, and I thus propose the
> following patch on top of what Joseph has posted before (passes the
> testsuite, but not yet tested otherwise):

See above, no.

Jakub


Re: [AArch64] Fix Prefetch ICE

2015-09-29 Thread Marcus Shawcroft
On 28 September 2015 at 06:27, Hurugalawadi, Naveen
 wrote:
> Hi Marcus,
>
> Thanks for the review and comments.
>
>>> OK and can you back port to 5 ?
>
> Please find attached the backported patch on gcc-5-branch.
>
> Regression tested on AArch64 without any issues.
>
> 2015-09-28  Andrew Pinski  
>
> ChangeLog
>
> * config/aarch64/aarch64.md (prefetch):
> Change the predicate of operand 0 to register_operand.

Thank you, please commit it if you have not already.
/M


[gomp4] fold acc_on_device

2015-09-29 Thread Nathan Sidwell

I've committed this patch to gomp4.

It removes acc_on_device handling  from the oacc_xform pass, and moves it into 
the builtin folder.  I force the runtime version to be built with optimization 
and remove the expander too.


Expansion is rather later than I'm comfortable with, but until we have use cases 
where it causes a problem, this is fine.
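
For reference, the decision made by the removed oacc_xform_on_device (see the deleted lines in the patch below) can be restated as ordinary C. This is a sketch only: the numeric device values are illustrative placeholders, accel_compiler stands in for the ACCEL_COMPILER preprocessor test, and fold_on_device is a made-up name. It is an assumption that the new builtin folding computes the same result, since that part of the patch is not shown here.

```c
#include <stdbool.h>

/* Illustrative values standing in for GOMP_DEVICE_HOST,
   GOMP_DEVICE_NOT_HOST and ACCEL_COMPILER_acc_device.  */
enum { DEVICE_HOST = 2, DEVICE_NOT_HOST = 4, DEVICE_ACCEL = 5 };

/* Result of acc_on_device (arg), as the removed transform built it.  */
static int
fold_on_device (int arg, bool accel_compiler)
{
  if (!accel_compiler)
    /* Host compiler: only 'host' matches.  */
    return arg == DEVICE_HOST;

  /* Offload compiler: 'not_host' or this accelerator's own device.  */
  return arg == DEVICE_NOT_HOST || arg == DEVICE_ACCEL;
}
```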


Bernd, I'd managed to confuse myself last week -- compiling w/o optimization can 
generate a different set of rtl dumps than with optimization, so I ended up 
seeing some stale ones.


Will prepare trunk  versions next ...

nathan
2015-09-29  Nathan Sidwell  

	gcc/
	* omp-low.c (oacc_xform_on_device): Delete.
	(oacc_xform_dim): Return bool.
	(execute_oacc_transform): Don't handle acc_on_device here.  Adjust
	rescan logic.
	* builtins.c (expand_builtin_acc_on_device): Delete.
	(expand_builtin): Do not call it.
	(fold_builtin_1): Fold acc_on_device.

	libgomp/
	* oacc-init.c (acc_on_device): Compile with optimization.
	* config/nvptx/oacc-init.c (acc_on_device): Compile with optimization.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228215)
+++ gcc/omp-low.c	(working copy)
@@ -14719,45 +14719,10 @@ make_pass_late_lower_omp (gcc::context *
   return new pass_late_lower_omp (ctxt);
 }
 
-/* Transform an acc_on_device call.  OpenACC 2.0a requires this folded at
-   compile time for constant operands.  We always fold it.  In an
-   offloaded function we're never 'none'.  */
-
-static void
-oacc_xform_on_device (gcall *call)
-{
-  tree arg = gimple_call_arg (call, 0);
-  unsigned val = GOMP_DEVICE_HOST;
-	  
-#ifdef ACCEL_COMPILER
-  val = GOMP_DEVICE_NOT_HOST;
-#endif
-  tree result = build2 (EQ_EXPR, boolean_type_node, arg,
-			build_int_cst (integer_type_node, val));
-#ifdef ACCEL_COMPILER
-  {
-tree dev  = build2 (EQ_EXPR, boolean_type_node, arg,
-			build_int_cst (integer_type_node,
-   ACCEL_COMPILER_acc_device));
-result = build2 (TRUTH_OR_EXPR, boolean_type_node, result, dev);
-  }
-#endif
-  result = fold_convert (integer_type_node, result);
-  tree lhs = gimple_call_lhs (call);
-  gimple_seq seq = NULL;
-
-  push_gimplify_context (true);
-  gimplify_assign (lhs, result, );
-  pop_gimplify_context (NULL);
-
-  gimple_stmt_iterator gsi = gsi_for_stmt (call);
-  gsi_replace_with_seq (&gsi, seq, false);
-}
-
 /* Transform oacc_dim_size and oacc_dim_pos internal function calls to
constants, where possible.  */
 
-static void
+static bool
 oacc_xform_dim (gcall *call, const int dims[], bool is_pos)
 {
   tree arg = gimple_call_arg (call, 0);
@@ -14766,13 +14731,13 @@ oacc_xform_dim (gcall *call, const int d
 
   if (!size)
 /* Dimension size is dynamic.  */
-return;
+return false;
   
   if (is_pos)
 {
   if (size != 1)
 	/* Size is more than 1, so POS might be non-zero.  */
-	return;
+	return false;
   size = 0;
 }
 
@@ -14783,6 +14748,7 @@ oacc_xform_dim (gcall *call, const int d
 
   gimple_stmt_iterator gsi = gsi_for_stmt (call);
   gsi_replace (&gsi, g, false);
+  return true;
 }
 
 /* Validate and update the dimensions for offloaded FN.  ATTRS is the
@@ -14868,64 +14834,57 @@ execute_oacc_transform ()
 for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
   {
 	gimple *stmt = gsi_stmt (gsi);
-	int rescan = 0;
-	
 	if (!is_gimple_call (stmt))
 	  {
 	gsi_next (&gsi);
 	continue;
 	  }
 
+	gcall *call = as_a <gcall *> (stmt);
+	if (!gimple_call_internal_p (call))
+	  {
+	gsi_next (&gsi);
+	continue;
+	  }
+
 	/* Rewind to allow rescan.  */
 	gsi_prev (&gsi);
+	int rescan = 0;
+	unsigned ifn_code = gimple_call_internal_fn (call);
 
-	gcall *call = as_a <gcall *> (stmt);
-	
-	if (gimple_call_builtin_p (call, BUILT_IN_ACC_ON_DEVICE))
-	  /* acc_on_device must be evaluated at compile time for
-	 constant arguments.  */
+	switch (ifn_code)
 	  {
-	oacc_xform_on_device (call);
+	  default: break;
+
+	  case IFN_GOACC_DIM_POS:
+	  case IFN_GOACC_DIM_SIZE:
+	if (oacc_xform_dim (call, dims, ifn_code == IFN_GOACC_DIM_POS))
+	  rescan = 1;
+	break;
+
+	  case IFN_GOACC_REDUCTION_SETUP:
+	  case IFN_GOACC_REDUCTION_INIT:
+	  case IFN_GOACC_REDUCTION_FINI:
+	  case IFN_GOACC_REDUCTION_TEARDOWN:
+	/* Mark the function for SSA renaming.  */
+	mark_virtual_operands_for_renaming (cfun);
+	targetm.goacc.reduction (call);
 	rescan = 1;
+	break;
+
+	  case IFN_UNIQUE:
+	{
+	  unsigned code = TREE_INT_CST_LOW (gimple_call_arg (call, 0));
+
+	  if ((code == IFN_UNIQUE_OACC_FORK
+		   || code == IFN_UNIQUE_OACC_JOIN)
+		  && (targetm.goacc.fork_join
+		  (call, dims, code == IFN_UNIQUE_OACC_FORK)))
+		rescan = -1;
+	  break;
+	}
 	  }
-	else if (gimple_call_internal_p (call))
-	  {
-	unsigned ifn_code = gimple_call_internal_fn (call);
-	switch (ifn_code)
-	  {
-	  default: break;
-
-	  case IFN_GOACC_DIM_POS:
-	  case IFN_GOACC_DIM_SIZE:
-		

Re: [gomp4] Remove erroneous test and unreachable situation.

2015-09-29 Thread James Norris

Hi,

The original patch still missed some situations (thanks Cesar!)
and the attached patch addresses those. It also adds some new
tests.

Jim

Index: libgomp/ChangeLog.gomp
===
--- libgomp/ChangeLog.gomp	(revision 228245)
+++ libgomp/ChangeLog.gomp	(working copy)
@@ -1,3 +1,7 @@
+2015-09-29  James Norris  
+
+	* testsuite/libgomp.oacc-fortran/routine-9.f90: New test.
+
 2015-09-29  Nathan Sidwell  
 
 	* oacc-init.c (acc_on_device): Compile with optimization.
Index: libgomp/testsuite/libgomp.oacc-fortran/routine-9.f90
===
--- libgomp/testsuite/libgomp.oacc-fortran/routine-9.f90	(revision 0)
+++ libgomp/testsuite/libgomp.oacc-fortran/routine-9.f90	(revision 0)
@@ -0,0 +1,31 @@
+! { dg-do run }
+! { dg-options "-fno-inline" }
+
+program main
+  implicit none
+  integer, parameter :: n = 10
+  integer :: a(n), i
+  integer, external :: fact
+  !$acc routine (fact)
+  !$acc parallel
+  !$acc loop
+  do i = 1, n
+ a(i) = fact (i)
+  end do
+  !$acc end parallel
+  do i = 1, n
+ if (a(i) .ne. fact(i)) call abort
+  end do
+end program main
+
+recursive function fact (x) result (res)
+  implicit none
+  !$acc routine (fact)
+  integer, intent(in) :: x
+  integer :: res
+  if (x < 1) then
+ res = 1
+  else
+ res = x * fact(x - 1)
+  end if
+end function fact
Index: gcc/testsuite/ChangeLog.gomp
===
--- gcc/testsuite/ChangeLog.gomp	(revision 228245)
+++ gcc/testsuite/ChangeLog.gomp	(working copy)
@@ -1,3 +1,7 @@
+2015-08-29  James Norris  
+
+	* gfortran.dg/goacc/routine-6.f90: New test.
+
 2015-09-29  Tom de Vries  
 
 	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: New test.
Index: gcc/testsuite/gfortran.dg/goacc/routine-6.f90
===
--- gcc/testsuite/gfortran.dg/goacc/routine-6.f90	(revision 0)
+++ gcc/testsuite/gfortran.dg/goacc/routine-6.f90	(revision 0)
@@ -0,0 +1,79 @@
+
+module m
+  integer m1int
+contains
+  subroutine subr5 (x) 
+  implicit none
+  !$acc routine (subr5)
+  !$acc routine (m1int) ! { dg-error "invalid function name" }
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ x = x * x - 1
+  end if
+  end subroutine subr5
+end module m
+
+program main
+  implicit none
+  interface
+function subr6 (x) 
+!$acc routine (subr6) ! { dg-error "without list is allowed in interface" }
+integer, intent (in) :: x
+integer :: subr6
+end function subr6
+  end interface
+  integer, parameter :: n = 10
+  integer :: a(n), i
+  !$acc routine (subr1) ! { dg-error "invalid function name" }
+  external :: subr2
+  !$acc routine (subr2)
+  !$acc parallel
+  !$acc loop
+  do i = 1, n
+ call subr1 (i)
+ call subr2 (i)
+  end do
+  !$acc end parallel
+end program main
+
+subroutine subr1 (x) 
+  !$acc routine
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ x = x * x - 1
+  end if
+end subroutine subr1
+
+subroutine subr2 (x) 
+  !$acc routine (subr1) ! { dg-error "invalid function name" }
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ x = x * x - 1
+  end if
+end subroutine subr2
+
+subroutine subr3 (x) 
+  !$acc routine (subr3)
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ call subr4 (x)
+  end if
+end subroutine subr3
+
+subroutine subr4 (x) 
+  !$acc routine (subr4)
+  integer, intent(inout) :: x
+  if (x < 1) then
+ x = 1
+  else
+ x = x * x - 1
+  end if
+end subroutine subr4
Index: gcc/fortran/openmp.c
===
--- gcc/fortran/openmp.c	(revision 228245)
+++ gcc/fortran/openmp.c	(working copy)
@@ -1745,11 +1745,35 @@ gfc_match_oacc_routine (void)
 
   if (m == MATCH_YES)
 {
-  /* Scan for a function name/string.  */
-  m = gfc_match_symbol (&sym, 0);
+  char buffer[GFC_MAX_SYMBOL_LEN + 1];
+  gfc_symtree *st;
 
-  if (m == MATCH_NO)
+  m = gfc_match_name (buffer);
+  if (m == MATCH_YES)
 	{
+	  st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
+	  if (st)
+	{
+	  sym = st->n.sym;
+	  if (strcmp (sym->name, gfc_current_ns->proc_name->name) == 0)
+	sym = NULL;
+	}
+
+	  if (st == NULL
+	  || (sym
+		  && !sym->attr.external
+		  && !sym->attr.function
+		  && !sym->attr.subroutine))
+	{
+	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, "
+			 "invalid function name %s",
+			 (sym) ? sym->name : buffer);
+	  gfc_current_locus = old_loc;
+	  return MATCH_ERROR;
+	}
+	}
+  else
+{
 	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C");
 	  gfc_current_locus = old_loc;
 	  return MATCH_ERROR;
@@ -1761,7 +1785,7 @@ 

[PATCH] Fix PR67170

2015-09-29 Thread Richard Biener

The following patch addresses PR67170 which shows we fail to disambiguate
INTENT(IN) variables against for example recursive calls.  The trick
in solving this is to notice that when a function has a fn spec
attribute that says memory reachable by a parameter is not modified
then that memory behaves as if it were readonly throughout the function
and thus it doesn't have a dependence on any other reference in that
function.
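
As a rough illustration of the bookkeeping this needs, the sketch below turns a fn spec string into a set of "const" parameters. It assumes the encoding in which character 0 describes the return value, each following character describes one parameter, and 'R' or 'r' marks a parameter whose pointed-to memory is not written. const_parm_mask is a hypothetical helper, and a plain bit mask stands in for the const_parms bitmap from the ChangeLog.

```c
#include <stddef.h>
#include <string.h>

/* Return a mask with bit N set if parameter N (0-based) is marked
   read-only in FNSPEC.  Only the first 32 parameters are tracked.  */
static unsigned
const_parm_mask (const char *fnspec)
{
  unsigned mask = 0;
  size_t len = strlen (fnspec);

  /* Index 0 is the return value, so parameters start at index 1.  */
  for (size_t i = 1; i < len && i <= 32; i++)
    if (fnspec[i] == 'R' || fnspec[i] == 'r')
      mask |= 1u << (i - 1);

  return mask;
}
```

For the spec string ".R.r" this marks parameters 0 and 2. A load through such a parameter can then be treated as independent of stores and calls elsewhere in the function.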

In the PR I prototyped a patch in the alias oracle itself but that's
too expensive (we need to find the index of a PARM_DECL).  Thus the
following patch implements that trick in the value-numbering machinery
instead.  Going with the alias oracle patch would still be possible
if we decide on caching the fn spec information in a place that is
O(1) accessible from relevant memory references (thus either on the
SSA default def or the PARM_DECL itself).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

This improves a future important benchmark implementing a
Sudoku puzzle solver considerably (~10% on x86_64 IIRC).

Richard.

2015-09-29  Richard Biener  

PR tree-optimization/67170
* tree-ssa-alias.h (get_continuation_for_phi): Adjust
the translate function pointer parameter to get the
bool whether to disambiguate only by reference.
(walk_non_aliased_vuses): Likewise.
* tree-ssa-alias.c (maybe_skip_until): Adjust.
(get_continuation_for_phi_1): Likewise.
(get_continuation_for_phi): Likewise.
(walk_non_aliased_vuses): Likewise.
* tree-ssa-sccvn.c (const_parms): New bitmap.
(vn_reference_lookup_3): Adjust for interface change.
Disambiguate parameters pointing to readonly memory.
(free_scc_vn): Free const_parms.
(run_scc_vn): Initialize const_parms from a fn spec attribute.

* gfortran.dg/pr67170.f90: New testcase.

Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c(revision 228230)
--- gcc/tree-ssa-alias.c(working copy)
*** static bool
*** 2442,2448 
  maybe_skip_until (gimple *phi, tree target, ao_ref *ref,
  tree vuse, unsigned int *cnt, bitmap *visited,
  bool abort_on_visited,
! void *(*translate)(ao_ref *, tree, void *, bool),
  void *data)
  {
basic_block bb = gimple_bb (phi);
--- 2442,2448 
  maybe_skip_until (gimple *phi, tree target, ao_ref *ref,
  tree vuse, unsigned int *cnt, bitmap *visited,
  bool abort_on_visited,
! void *(*translate)(ao_ref *, tree, void *, bool *),
  void *data)
  {
basic_block bb = gimple_bb (phi);
*** maybe_skip_until (gimple *phi, tree targ
*** 2477,2484 
  ++*cnt;
  if (stmt_may_clobber_ref_p_1 (def_stmt, ref))
{
  if (translate
! && (*translate) (ref, vuse, data, true) == NULL)
;
  else
return false;
--- 2477,2485 
  ++*cnt;
  if (stmt_may_clobber_ref_p_1 (def_stmt, ref))
{
+ bool disambiguate_only = true;
  if (translate
! && (*translate) (ref, vuse, data, &disambiguate_only) == NULL)
;
  else
return false;
*** static tree
*** 2505,2511 
  get_continuation_for_phi_1 (gimple *phi, tree arg0, tree arg1,
ao_ref *ref, unsigned int *cnt,
bitmap *visited, bool abort_on_visited,
!   void *(*translate)(ao_ref *, tree, void *, bool),
void *data)
  {
gimple *def0 = SSA_NAME_DEF_STMT (arg0);
--- 2506,2512 
  get_continuation_for_phi_1 (gimple *phi, tree arg0, tree arg1,
ao_ref *ref, unsigned int *cnt,
bitmap *visited, bool abort_on_visited,
!   void *(*translate)(ao_ref *, tree, void *, bool *),
void *data)
  {
gimple *def0 = SSA_NAME_DEF_STMT (arg0);
*** get_continuation_for_phi_1 (gimple *phi,
*** 2547,2559 
else if ((common_vuse = gimple_vuse (def0))
   && common_vuse == gimple_vuse (def1))
  {
*cnt += 2;
if ((!stmt_may_clobber_ref_p_1 (def0, ref)
   || (translate
!  && (*translate) (ref, arg0, data, true) == NULL))
  && (!stmt_may_clobber_ref_p_1 (def1, ref)
  || (translate
! && (*translate) (ref, arg1, data, true) == NULL)))
return common_vuse;
  }
  
--- 2548,2561 
else if ((common_vuse = gimple_vuse (def0))
   && common_vuse == gimple_vuse (def1))
  {
+   bool disambiguate_only = true;
*cnt += 2;
if ((!stmt_may_clobber_ref_p_1 (def0, ref)
   || (translate

Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Jeff Law

On 09/29/2015 07:19 AM, Oleg Endo wrote:
> On Mon, 2015-09-28 at 15:28 -0500, Segher Boessenkool wrote:
>
>> We can at least change the default to LRA, so new ports get it unless
>> they like to hurt themselves.
>>
>> I don't think it makes sense to keep reload around *just* for the ports
>> that are in "maintenance mode": by the time we are down to *just* those
>> ports, it makes more sense to relabel them as "unmaintained".
>
> Just for my understanding ... what's the definition of "maintenance
> mode" or "unmaintained"?

I'm not sure there's any formal definition.

If the port isn't getting tested, bugs aren't getting fixed, it fails to
build, etc., then it's probably a good bet you could put it into the
unmaintained bucket.

If the port does get occasional fixes (primarily driven by BZs), but isn't
getting updated on a regular basis (such as conversion to LRA,
conversion to RTL prologue/epilogue, etc.) and is maybe only getting
occasional testing, then it's probably fair to call it in
maintenance mode.  A great example IMHO would be the m68k.


I would say we probably have many ports in maintenance mode right now. 
Not sure if any are in the unmaintained mode with perhaps the exception 
of interix.


jeff


Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Oleg Endo
On Mon, 2015-09-28 at 15:28 -0500, Segher Boessenkool wrote:

> We can at least change the default to LRA, so new ports get it unless
> they like to hurt themselves.
> 
> I don't think it makes sense to keep reload around *just* for the ports
> that are in "maintenance mode": by the time we are down to *just* those
> ports, it makes more sense to relabel them as "unmaintained".

Just for my understanding ... what's the definition of "maintenance
mode" or "unmaintained"?

Cheers,
Oleg



[PATCH] x86 interrupt attribute

2015-09-29 Thread Yulia Koval
Hi,

The patch below implements the interrupt attribute for x86 processors.

The interrupt and exception handlers are called by x86 processors.
X86 hardware pushes information onto the stack and calls the handler.  The
requirements are

1. Both interrupt and exception handlers must use the 'IRET'
instruction, instead of the 'RET' instruction, to return from the
handlers.
2. All registers are callee-saved in interrupt and exception handlers.
3. The difference between interrupt and exception handlers is the
exception handler must pop 'ERROR_CODE' off the stack before the
'IRET' instruction.

The design goals of interrupt and exception handlers for x86 processors
are:

1. Support both 32-bit and 64-bit modes.
2. Flexible for compilers to optimize.
3. Easy to use by programmers.

To implement interrupt and exception handlers for x86 processors, a
compiler should support:

'interrupt' attribute

Use this attribute to indicate that the specified function with
mandatory arguments is an interrupt or exception handler.  The
compiler generates function entry and exit sequences suitable for use
in an interrupt handler when this attribute is present.  The 'IRET'
instruction, instead of the 'RET' instruction, is used to return from
interrupt or exception handlers.  All registers, except for the EFLAGS
register which is restored by the 'IRET' instruction, are preserved by
the compiler.

Any interruptible-without-stack-switch code must be compiled with
-mno-red-zone since interrupt handlers can and will, because of the
hardware design, touch the red zone.

1. interrupt handler must be declared with a mandatory pointer argument:

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame)
{
...
}

and user must properly define the structure the pointer points to.



2. exception handler:

The exception handler is very similar to the interrupt handler with a
different mandatory function signature:

#ifdef __x86_64__
typedef unsigned long long int uword_t;
#else
typedef unsigned int uword_t;
#endif

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame, uword_t error_code)
{
...
}

and compiler pops the error code off stack before the 'IRET' instruction.

The exception handler should only be used for exceptions which push an
error code and all other exceptions must use the interrupt handler.
The system will crash if the wrong handler is used.



Bootstrapped/regtested on Linux/x86_64 and Linux/i686.

Ok for trunk?

2015-09-29  Julia Koval
H.J. Lu

PR target/67630
PR target/67634
* config/i386/i386-protos.h (ix86_interrupt_return_nregs): New.
* config/i386/i386.c (ix86_frame): Add nbndregs and nmaskregs.
(ix86_interrupt_return_nregs): New variable.
(ix86_nsaved_bndregs): New function.
(ix86_nsaved_maskregs): Likewise.
(ix86_reg_save_area_size): Likewise.
(ix86_nsaved_sseregs): Don't return 0 in interrupt handler.
(ix86_compute_frame_layout): Set nbndregs and nmaskregs.  Set
save_regs_using_mov to true to save bound and mask registers.
Call ix86_reg_save_area_size to get register save area size.
Allocate space to save full vector registers in interrupt handler.
(ix86_emit_save_reg_using_mov): Set alignment to word_mode
alignment when saving full vector registers in interrupt handler.
(ix86_emit_save_regs_using_mov): Use regno_reg_rtx to get
register size.
(ix86_emit_restore_regs_using_mov): Likewise.
(ix86_emit_save_sse_regs_using_mov): Save full vector registers in
interrupt handler.
(ix86_emit_restore_sse_regs_using_mov): Restore full vector
registers in interrupt handler.
(ix86_expand_epilogue): Use move to restore bound registers.
* config/i386/sse.md (*mov_internal): Handle misaligned
SSE load and store in interrupt handler.

PR target/66960
* config/i386/i386.c (ix86_conditional_register_usage): Set
ix86_interrupt_return_nregs.
(ix86_set_current_function): Set is_interrupt and is_exception.
Mark arguments in interrupt handler as used.
(ix86_function_ok_for_sibcall): Return false if in interrupt
handler.
(type_natural_mode): Don't warn ABI change for MMX in interrupt
handler.
(ix86_function_arg_advance): Skip for callee in interrupt
handler.
(ix86_function_arg): Handle arguments for callee in interrupt
handler.

Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Jeff Law

On 09/28/2015 02:28 PM, Segher Boessenkool wrote:
> On Mon, Sep 28, 2015 at 03:23:37PM -0400, Vladimir Makarov wrote:
>> There are more ports using reload than LRA now.  Even some major ports
>> (e.g. ppc64) did not switch to LRA.
>
> There still are some failures in the testsuite (ICEs even) so we're
> not there yet.
>
>> I usually tell target maintainers that if they don't switch to LRA they
>> will probably have problems with maintenance and development in the long
>> run.  New things are easier to implement in LRA.
>
> It is also true that new *ports* are easier to do with LRA than with
> reload :-)

Right.  And if we set the expectation that a new port must use LRA, then
I think we're fine.

> We can at least change the default to LRA, so new ports get it unless
> they like to hurt themselves.
>
> I don't think it makes sense to keep reload around *just* for the ports
> that are in "maintenance mode": by the time we are down to *just* those
> ports, it makes more sense to relabel them as "unmaintained".

FWIW, I tried to build a simple cc0 target with LRA (v850-elf), but it
fell over pretty early.  Essentially LRA doesn't seem to be cc0-aware in
split_reg, as it ultimately inserted something between a cc0-setter and
cc0-user.  Oops.



jeff


Re: OpenACC subarray data alignment in fortran

2015-09-29 Thread Cesar Philippidis
Ping.

In the meantime, I'll apply this patch to gomp-4_0-branch.

Cesar

On 09/22/2015 08:24 AM, Cesar Philippidis wrote:
> In both OpenACC and OpenMP, each subarray has at least two data mappings
> associated with it, one for the pointer and another for the data in
> the array section (Fortran also has a pset mapping). One problem I
> observed in Fortran is that array section data is cast to char *.
> Consequently, when lower_omp_target assigns alignment for the subarray
> data, it does so incorrectly. This is a problem on nvptx if you have a
> data clause such as
> 
>   integer foo
>   real*8 bar (100)
> 
>   !$acc data copy (foo, bar(1:100))
> 
> Here, the data associated with bar could get aligned on a 4 byte
> boundary instead of 8 byte. That causes problems on nvptx targets.
> 
> My fix for this is to prevent the Fortran front end from casting the
> data pointers to char *. I only prevented casting in the code which
> handles OMP_CLAUSE_MAP. The subarrays associated with OMP_CLAUSE_SHARED
> also get cast to char *, but I left those as-is because I'm not that
> familiar with how non-OpenMP target regions get lowered.
> 
> Is this patch OK for trunk?
> 
> Thanks,
> Cesar
> 
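
To make the arithmetic in the report above concrete, here is a small
sketch of how the alignment taken from the pointee type decides where bar
lands when it is laid out right after the 4-byte foo.  align_up and
place_after are hypothetical helpers, not libgomp or lower_omp_target
code, and the back-to-back layout is an assumption made for illustration.

```cpp
#include <cstddef>

// Round offset up to the next multiple of align.
// align must be a power of two.
constexpr std::size_t align_up(std::size_t offset, std::size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}

// Offset at which a second object starts when it is placed directly
// after a first object of first_size bytes that begins at offset 0.
constexpr std::size_t place_after(std::size_t first_size,
                                  std::size_t second_align)
{
  return align_up(first_size, second_align);
}
```

If the alignment is derived from char (1 byte), place_after (4, 1) puts
bar at offset 4, which is the 4-byte boundary described in the report.
With the 8-byte alignment a real*8 array needs, place_after (4, 8) gives
offset 8 instead.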



Re: [PATCH 2/4 v2] bb-reorder: Add the "simple" algorithm

2015-09-29 Thread Bernd Schmidt

On 09/25/2015 04:16 PM, Segher Boessenkool wrote:
> v2 changes:
> - Add a file header comment;
> - Use "for" loop initial declarations;
> - Handle asm goto.
>
> Testing this on x86_64-linux; okay if it succeeds?

No objections from me. Let's give Steven another day or so to comment.


Bernd


[PATCH] Fix PR67741

2015-09-29 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-09-29  Richard Biener  

PR tree-optimization/67741
* tree-ssa-math-opts.c (pass_cse_sincos::execute): Only recognize
builtin calls with correct signature.

* gcc.dg/torture/pr67741.c: New testcase.

Index: gcc/tree-ssa-math-opts.c
===
*** gcc/tree-ssa-math-opts.c(revision 228115)
--- gcc/tree-ssa-math-opts.c(working copy)
*** pass_cse_sincos::execute (function *fun)
*** 1738,1752 
 of a basic block.  */
  cleanup_eh = false;
  
! if (is_gimple_call (stmt)
! && gimple_call_lhs (stmt)
! && (fndecl = gimple_call_fndecl (stmt))
! && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
{
  tree arg, arg0, arg1, result;
  HOST_WIDE_INT n;
  location_t loc;
  
  switch (DECL_FUNCTION_CODE (fndecl))
{
CASE_FLT_FN (BUILT_IN_COS):
--- 1738,1751 
 of a basic block.  */
  cleanup_eh = false;
  
! if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL)
! && gimple_call_lhs (stmt))
{
  tree arg, arg0, arg1, result;
  HOST_WIDE_INT n;
  location_t loc;
  
+ fndecl = gimple_call_fndecl (stmt);
  switch (DECL_FUNCTION_CODE (fndecl))
{
CASE_FLT_FN (BUILT_IN_COS):
Index: gcc/testsuite/gcc.dg/torture/pr67741.c
===
*** gcc/testsuite/gcc.dg/torture/pr67741.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr67741.c  (working copy)
***
*** 0 
--- 1,13 
+ /* { dg-do compile } */
+ 
+ struct singlecomplex { float real, imag ; } ;
+ struct doublecomplex { double real, imag ; } ;
+ struct extendedcomplex { long double real, imag ; } ;
+ extern double cabs();
+ float cabsf(fc)
+  struct singlecomplex fc;  /* { dg-warning "doesn't match" } */
+ {
+   struct doublecomplex dc ;
+   dc.real=fc.real; dc.imag=fc.imag;
+   return (float) cabs(dc);
+ }


Re: [Patch, Fortran, 66927, v2] [6 Regression] ICE in gfc_conf_procedure_call

2015-09-29 Thread Andre Vehreschild
Hi Mikael, hi all,

sorry for the late reply, but I was a bit busy lately and the patch was
not as easy as expected. 

Mikael, I addressed your question about clarifying the comment and while
doing so the question arose "what happens when the function returns a
class object?" You have one guess; correct: ICE! This extended patch
now addresses the ICE and furthermore makes more consistent use of
the temporary created for the source= expression. I.e., when the
temporary is a class object, its vtab is more often retrieved from the
temporary and no longer generated from the gfc_expr's typespec.

To efficiently copy - in the class/derived cases - the data, I had to
open up the gfc_copy_class_to_class() routine a little bit, in that
it now accepts the destination object to be a BT_DERIVED, too.

I provide two testcases now and had to fix class_array_15, which was
expecting one too many calls to __builtin_free. With this patch the
creation of an unnecessary temporary object is prevented, which in
consequence leads to one fewer call to __builtin_free to free the
allocatable component of the temporary object.

Bootstraps and regtests ok on x86_64-linux-gnu/f21.

Ok, for trunk?

Regards,
Andre

On Sun, 9 Aug 2015 14:37:03 +0200
Mikael Morin  wrote:

> Le 06/08/2015 14:00, Mikael Morin a écrit :
> > Let me have a look at it.
> >
> So, I've had a look at it.
> This is a pandora box that I don't want to open.
> So your change is OK.
> However, could you clarify the comment?
> Function calls returning a class object are either pointer or 
> allocatable, so they don't call gfc_conv_expr_descriptor already, they 
> aren't an exception...
> 
> Mikael


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr66927_2.clog
Description: Binary data
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index a6b761b..504b08a 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -3222,7 +3222,7 @@ build_array_ref (tree desc, tree offset, tree decl, tree vptr)
 {
   type = gfc_get_element_type (type);
   tmp = TREE_OPERAND (cdecl, 0);
-  tmp = gfc_get_class_array_ref (offset, tmp);
+  tmp = gfc_get_class_array_ref (offset, tmp, NULL_TREE);
   tmp = fold_convert (build_pointer_type (type), tmp);
   tmp = build_fold_indirect_ref_loc (input_location, tmp);
   return tmp;
@@ -7079,9 +7079,20 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
 	}
 	  else if (GFC_ARRAY_TYPE_P (TREE_TYPE (desc)) || se->use_offset)
 	{
+	  bool toonebased;
 	  tmp = gfc_conv_array_lbound (desc, n);
+	  toonebased = integer_onep (tmp);
+	  // lb(arr) - from (- start + 1)
 	  tmp = fold_build2_loc (input_location, MINUS_EXPR,
  TREE_TYPE (base), tmp, from);
+	  if (onebased && toonebased)
+		{
+		  tmp = fold_build2_loc (input_location, MINUS_EXPR,
+	 TREE_TYPE (base), tmp, start);
+		  tmp = fold_build2_loc (input_location, PLUS_EXPR,
+	 TREE_TYPE (base), tmp,
+	 gfc_index_one_node);
+		}
 	  tmp = fold_build2_loc (input_location, MULT_EXPR,
  TREE_TYPE (base), tmp,
  gfc_conv_array_stride (desc, n));
@@ -7155,12 +7166,13 @@ gfc_conv_expr_descriptor (gfc_se *se, gfc_expr *expr)
   /* For class arrays add the class tree into the saved descriptor to
  enable getting of _vptr and the like.  */
   if (expr->expr_type == EXPR_VARIABLE && VAR_P (desc)
-  && IS_CLASS_ARRAY (expr->symtree->n.sym)
-  && DECL_LANG_SPECIFIC (expr->symtree->n.sym->backend_decl))
+  && IS_CLASS_ARRAY (expr->symtree->n.sym))
 {
   gfc_allocate_lang_decl (desc);
   GFC_DECL_SAVED_DESCRIPTOR (desc) =
-	  GFC_DECL_SAVED_DESCRIPTOR (expr->symtree->n.sym->backend_decl);
+	  DECL_LANG_SPECIFIC (expr->symtree->n.sym->backend_decl) ?
+	GFC_DECL_SAVED_DESCRIPTOR (expr->symtree->n.sym->backend_decl)
+	  : expr->symtree->n.sym->backend_decl;
 }
   if (!se->direct_byref || se->byref_noassign)
 {
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index e086fe3..90b5140 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -1039,9 +1039,10 @@ gfc_conv_class_to_class (gfc_se *parmse, gfc_expr *e, gfc_typespec class_ts,
of the referenced element.  */
 
 tree
-gfc_get_class_array_ref (tree index, tree class_decl)
+gfc_get_class_array_ref (tree index, tree class_decl, tree data_comp)
 {
-  tree data = gfc_class_data_get (class_decl);
+  tree data = data_comp != NULL_TREE ? data_comp :
+   gfc_class_data_get (class_decl);
   tree size = gfc_class_vtab_size_get (class_decl);
   tree offset = fold_build2_loc (input_location, MULT_EXPR,
  gfc_array_index_type,
@@ -1075,6 +1076,7 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems, bool unlimited)
   tree stdcopy;
   tree extcopy;
   tree index;
+  bool is_from_desc = false, is_to_class = false;
 
   args = NULL;
   /* To prevent warnings on uninitialized variables.  */
@@ -1088,7 

Re: [PATCH 1/3, libgomp] Adjust offload plugin interface for avoiding deadlock on exit

2015-09-29 Thread Chung-Lin Tang
On 2015/9/25 04:27 AM, Ilya Verbin wrote:
> On Thu, Aug 27, 2015 at 21:44:50 +0800, Chung-Lin Tang wrote:
>> We've discovered that, for several of the libgomp plugin interface routines,
>> if the target specific routine calls exit() (usually upon a fatal condition),
>> deadlock ensues. We found this using nvptx, but it's possible on intelmic as
>> well.
>>
>> This is because many of the plugin routines are called with the device lock
>> held, and when exit() is called inside the plugin code, the
>> GOMP_unregister_var() destructor tries to iterate through and acquire all
>> device locks to clean up. Since we already hold one of the device locks, this
>> just gets stuck.  Also, because gomp_mutex_t is a simple futex-based lock
>> implementation (instead of pthreads), we don't have a trylock mechanism to
>> use either.
>>
>> So this patch tries to alleviate this problem by changing the plugin
>> interface; the plugin routines that are called while holding the device lock
>> are adjusted to never exit fatally, but to return a value back to libgomp
>> proper to indicate execution results. The core libgomp code may then unlock
>> and call gomp_fatal().
>>
>> We believe this is the right route to solve the problem, since there are only
>> two accel target plugins so far. Besides the nvptx plugin, I have made some
>> effort to update the intelmic plugin as well, though it's not as thoroughly
>> audited. Intel folks might want to further make sure your plugin code is free
>> of this problem as well.
>>
>> This patch contains the libgomp proper changes. The nvptx and intelmic
>> patches follow. I have tested the libgomp testsuite without regressions for
>> both accel targets, is this okay for trunk?
> 
> (I have no objections)
> 
> However, in case of intelmic, these exit()s are just the tip of the iceberg,
> because underlying liboffloadmic contains other exit()s at fatal errors.
> And I don't know what to do with such deadlocks.
> 
>   -- Ilya

Yes, I think I saw more things to adjust wrt this issue within liboffloadmic,
though I hope this plugin interface change can set things ready.

And ping again, for the libgomp proper changes.

Thanks,
Chung-Lin
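
For anyone following the thread without the patch at hand, the idea
quoted above can be sketched roughly as follows.  This is only an
illustration under stated assumptions: Device, run_locked and on_fatal
are hypothetical names, std::mutex stands in for gomp_mutex_t, and none
of this is the actual libgomp plugin interface.

```cpp
#include <functional>
#include <mutex>
#include <string>

// Hypothetical device record; std::mutex stands in for the device lock.
struct Device
{
  std::mutex lock;
  // Plugin routine: reports failure through its return value and never
  // calls exit() itself.
  std::function<bool()> plugin_op;
};

// Run the plugin routine with the device lock held.  On failure the lock
// is released first, and only then is the fatal-error handler invoked,
// so any cleanup it triggers can take the device lock again.
bool run_locked(Device &dev,
                const std::function<void(const std::string &)> &on_fatal)
{
  bool ok;
  {
    std::lock_guard<std::mutex> guard(dev.lock);
    ok = dev.plugin_op();
  } // Device lock released here.

  if (!ok)
    on_fatal("plugin operation failed");
  return ok;
}
```

The point of the ordering is that the handler passed as on_fatal may
itself lock dev.lock (as the cleanup destructor does in the report)
without getting stuck, because run_locked no longer holds it by then.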





Re: [patch] Reduce space and time overhead of std::thread

2015-09-29 Thread Jonathan Wakely

On 23/09/15 17:18 +0100, Jonathan Wakely wrote:
> For PR 65393 I avoided some unnecessary shared_ptr copies while
> launching a std::thread. This goes further and avoids shared_ptr
> entirely, using unique_ptr instead. This reduces the memory overhead
> of a std::thread by 32 bytes (on 64-bit) and avoids any
> reference-count updates.
>
> The downside is it exports some new symbols, and we have to keep the
> old code for backwards compatibility, but I think it's worth doing.
>
> Does anybody disagree?


Tested powerpc64le-linux and x86_64-dragonfly4.1.

Committed to trunk.
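
As a rough illustration of the approach described above (a type-erased
callable owned through unique_ptr rather than shared_ptr), here is a
simplified, self-contained sketch.  State, StateImpl and make_state are
hypothetical names loosely modelled on the _State, _State_impl and
_S_make_state mentioned in the ChangeLog; this is not the libstdc++
implementation.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Abstract base: hides the concrete callable type behind a virtual call.
struct State
{
  virtual ~State() = default;
  virtual void run() = 0;
};

// Concrete wrapper that stores the callable by value.
template<typename Callable>
struct StateImpl : State
{
  Callable func;

  explicit StateImpl(Callable f) : func(std::move(f)) { }

  void run() override { func(); }
};

// Build a uniquely owned State.  There is a single owner and no
// reference count; ownership can simply be moved to whoever runs it.
template<typename Callable>
std::unique_ptr<State> make_state(Callable&& f)
{
  using Impl = StateImpl<typename std::decay<Callable>::type>;
  return std::unique_ptr<State>(new Impl(std::forward<Callable>(f)));
}
```

A caller would create the state with make_state, move the resulting
unique_ptr to the code that starts the thread, and have the new thread
call run() once before the object is destroyed.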



commit 2d7e89aae8ac12dd7a6b2083e5169679c1200cc5
Author: Jonathan Wakely 
Date:   Thu Mar 12 13:23:23 2015 +

   Reduce space and time overhead of std::thread
   
   	PR libstdc++/65393

* config/abi/pre/gnu.ver: Export new symbols.
* include/std/thread (thread::_State, thread::_State_impl): New types.
(thread::_M_start_thread): Add overload taking unique_ptr<_State>.
(thread::_M_make_routine): Remove.
(thread::_S_make_state): Add.
(thread::_Impl_base, thread::_Impl, thread::_M_start_thread)
[_GLIBCXX_THREAD_ABI_COMPAT] Only declare conditionally.
* src/c++11/thread.cc (execute_native_thread_routine): Rename to
execute_native_thread_routine_compat and re-define to use _State.
(thread::_State::~_State()): Define.
(thread::_M_make_thread): Define new overload.
(thread::_M_make_thread) [_GLIBCXX_THREAD_ABI_COMPAT]: Only define old
overloads conditionally.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index d42cd37..08d9bc6 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1870,6 +1870,11 @@ GLIBCXX_3.4.22 {
# std::uncaught_exceptions()
_ZSt19uncaught_exceptionsv;

+# std::thread::_State::~_State()
+_ZT[ISV]NSt6thread6_StateE;
+_ZNSt6thread6_StateD[012]Ev;
+_ZNSt6thread15_M_start_threadESt10unique_ptrINS_6_StateESt14default_deleteIS1_EEPFvvE;
+
} GLIBCXX_3.4.21;

# Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index ebbda62..c67ec46 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -60,9 +60,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  class thread
  {
  public:
+// Abstract base class for types that wrap arbitrary functors to be
+// invoked in the new thread of execution.
+struct _State
+{
+  virtual ~_State();
+  virtual void _M_run() = 0;
+};
+using _State_ptr = unique_ptr<_State>;
+
typedef __gthread_t native_handle_type;
-struct _Impl_base;
-typedef shared_ptr<_Impl_base>   __shared_base_type;

/// thread::id
class id
@@ -92,29 +99,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
operator<<(basic_ostream<_CharT, _Traits>& __out, thread::id __id);
};

-// Simple base type that the templatized, derived class containing
-// an arbitrary functor can be converted to and called.
-struct _Impl_base
-{
-  __shared_base_type   _M_this_ptr;
-
-  inline virtual ~_Impl_base();
-
-  virtual void _M_run() = 0;
-};
-
-template<typename _Callable>
-  struct _Impl : public _Impl_base
-  {
-   _Callable   _M_func;
-
-   _Impl(_Callable&& __f) : _M_func(std::forward<_Callable>(__f))
-   { }
-
-   void
-   _M_run() { _M_func(); }
-  };
-
  private:
id  _M_id;

@@ -133,16 +117,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  thread(_Callable&& __f, _Args&&... __args)
  {
#ifdef GTHR_ACTIVE_PROXY
-   // Create a reference to pthread_create, not just the gthr weak symbol
-_M_start_thread(_M_make_routine(std::__bind_simple(
-std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)),
-   reinterpret_cast<void(*)()>(&pthread_create));
+   // Create a reference to pthread_create, not just the gthr weak symbol.
+   auto __depend = reinterpret_cast<void(*)()>(&pthread_create);
#else
-_M_start_thread(_M_make_routine(std::__bind_simple(
-std::forward<_Callable>(__f),
-std::forward<_Args>(__args)...)));
+   auto __depend = nullptr;
#endif
+_M_start_thread(_S_make_state(
+ std::__bind_simple(std::forward<_Callable>(__f),
+std::forward<_Args>(__args)...)),
+   __depend);
  }

~thread()
@@ -190,23 +173,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
hardware_concurrency() noexcept;

  private:
+template<typename _Callable>
+  struct _State_impl : public _State
+  {
+   _Callable   _M_func;
+
+   _State_impl(_Callable&& __f) : _M_func(std::forward<_Callable>(__f))
+   { }
+
+   void
+   _M_run() { _M_func(); }
+  };
+
+void
+_M_start_thread(_State_ptr, void (*)());
+
+

Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Richard Biener
On Tue, Sep 29, 2015 at 3:39 PM, Jeff Law  wrote:
> On 09/29/2015 07:19 AM, Oleg Endo wrote:
>>
>> On Mon, 2015-09-28 at 15:28 -0500, Segher Boessenkool wrote:
>>
>>> We can at least change the default to LRA, so new ports get it unless
>>> they like to hurt themselves.
>>>
>>> I don't think it makes sense to keep reload around *just* for the ports
>>> that are in "maintenance mode": by the time we are down to *just* those
>>> ports, it makes more sense to relabel them as "unmaintained".
>>
>>
>> Just for my understanding ... what's the definition of "maintenance
>> mode" or "unmaintained"?
>
> I'm not sure there's any formal definition.
>
> If the port isn't getting tested, bugs aren't getting fixed, it fails to
> build, etc., then it's probably a good bet you could put it into the
> unmaintained bucket.
>
> If the port does get occasional fixes (primarily driven by BZs), but isn't
> getting updated on a regular basis (such as conversion to LRA, conversion to
> RTL prologue/epilogue, etc.) and is maybe only getting occasional testing,
> then it's probably fair to call it in maintenance mode.  A great example
> IMHO would be the m68k.

Another criterion would be available hardware, for which both the PA and
alpha ports are a good example.  When you can't buy new hardware then targets
that could formerly host GCC quickly rot to the state where only
cross-compilation is viable (and having "old" GCC is good enough).

> I would say we probably have many ports in maintenance mode right now. Not
> sure if any are in the unmaintained mode with perhaps the exception of
> interix.

I'd say that all ports not in maintenance mode should be at least secondary
archs as we can expect maintainers to be around to keep it at the quality
level we expect for secondary targets.  Now I'd like to draw the opposite
conclusion and declare all non-primary/secondary targets as in
maintenance mode ... ;)
We have 49 targets (counting directories) and 7 of them compose the list of
primary and secondary triplets.

Richard.

> jeff


[gomp4, committed] Ignore reduction clauses in kernels region

2015-09-29 Thread Tom de Vries

Hi,

this patch filters out reduction clauses in an oacc kernels region. This 
fixes an ICE in the test-case.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Ignore reduction clauses in kernels region

2015-09-29  Tom de Vries  

	* omp-low.c (ctx_in_oacc_kernels_region): New function.
	(scan_omp_for): Filter out reduction clauses in kernels region.

	* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
---
 gcc/omp-low.c  | 18 +++-
 .../goacc/kernels-acc-loop-reduction.c | 25 ++
 2 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a5904eb..597035f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2579,6 +2579,20 @@ oacc_loop_or_target_p (gimple *stmt)
 	  && gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_OACC_LOOP));
 }
 
+bool
+ctx_in_oacc_kernels_region (omp_context *ctx)
+{
+  for (;ctx != NULL; ctx = ctx->outer)
+{
+  gimple *stmt = ctx->stmt;
+  if (gimple_code (stmt) == GIMPLE_OMP_TARGET
+	  && gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+	return true;
+}
+
+  return false;
+}
+
 /* Scan a GIMPLE_OMP_FOR.  */
 
 static void
@@ -2592,6 +2606,7 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
   bool auto_clause = false;
   bool seq_clause = false;
   int gwv_routine = 0;
+  bool in_oacc_kernels_region = ctx_in_oacc_kernels_region (outer_ctx);
 
   if (outer_ctx)
 outer_type = gimple_code (outer_ctx->stmt);
@@ -2665,7 +2680,8 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
 
   /* Filter out any OpenACC clauses which aren't associated with
 	 gangs, workers or vectors.  Such reductions are no-ops.  */
-  if (extract_oacc_loop_mask (ctx) == 0)
+  if (extract_oacc_loop_mask (ctx) == 0
+	  || in_oacc_kernels_region)
 	{
 	  /* First filter out the clauses at the beginning of the chain.  */
 	  while (clauses && OMP_CLAUSE_CODE (clauses) == OMP_CLAUSE_REDUCTION)
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
new file mode 100644
index 000..f3aa4e7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+unsigned int
+foo (int n, unsigned int *a)
+{
+  unsigned int sum = 0;
+
+#pragma acc kernels loop gang reduction(+:sum)
+  for (int i = 0; i < n; i++)
+sum += a[i];
+
+  return sum;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 1 "parloops_oacc_kernels" } } */
-- 
1.9.1



Re: [PATCH][AArch64] Add separate insn sched class for vector LDP & STP

2015-09-29 Thread Kyrill Tkachov


On 29/09/15 09:03, Marcus Shawcroft wrote:

On 29/09/15 00:52, Evandro Menezes wrote:

In some micro-architectures the insns to load or store pairs of vector
registers are implemented rather differently from those affecting lanes
in vector registers.  Then, it's important that such insns be described
likewise differently in the scheduling model.

This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart
from the current neon_load2_2reg_q and neon_store2_2reg_q types,
respectively.


Hi,

The AArch64 part of this is OK. Please wait for Kyrill or Ramana to
comment on ARM side.  Cheers /Marcus



This is ok arm-wise. I see the instructions being modelled
with this type don't have a direct arm equivalent anyway.
Marcus' comment on the ChangeLog still applies.

Thanks,
Kyrill


Thank you,

-- Evandro Menezes


0001-AArch64-Add-separate-insn-sched-class-for-vector-LDP.patch


  From 340249dcd2af8dfce486cb4f62d4eaf285c6a799 Mon Sep 17 00:00:00 2001
From: Evandro Menezes
Date: Mon, 28 Sep 2015 15:00:00 -0500
Subject: [PATCH] [AArch64] Add separate insn sched class for vector LDP & STP

2015-09-28  Evandro Menezes

gcc/
* config/arm/types.md (neon_ldp, neon_ldp_q, neon_stp, neon_stp_q):
add new insn types for vector load and store pairs.

s/add/Add/ and likewise the rest of the changelog comments.


* config/arm/cortex-a53.md (cortex_a53_f_load_2reg): add insn
types "neon_ldp{,_q}".
* config/arm/cortex-a57.md (neon_load_c): add insn types
"neon_ldp{,_q}".
(neon_store_complex): add insn types "neon_stp{,_q}".
* config/aarch64/aarch64-simd.md (aarch64_be_movoi): add insn types
"neon_{ldp,stp}_q".




Re: [PR64164] drop copyrename, integrate into expand

2015-09-29 Thread Szabolcs Nagy

On 23/09/15 21:07, Alexandre Oliva wrote:

On Sep 18, 2015, Alan Lawrence  wrote:


With the latest git commit 2b27ef197ece54c4573c5a748b0d40076e35412c on
branch aoliva/pr64164, I am now able to build a cross toolchain for
aarch64 and aarch64_be, and can confirm the ABI failure is fixed on
the branch.




this commit

commit 33cc9081157a8c90460e4c0bdda2ac461a3822cc
Author: aoliva 
Date:   2015-09-27 09:02:00 +

revert to assign_parms assignments using default defs
...

introduced a test failure on arm-none-eabi (using newlib, compiling
with -mthumb -march=armv8-a -mfpu=crypto-neon-fp-armv8 -mfloat-abi=hard ):

FAIL: gcc.target/arm/pr43920-2.c scan-assembler-times pop 2

spawn arm-none-eabi-size pr43920-2.o
   text    data     bss     dec     hex filename
     56       0       0      56      38 pr43920-2.o
text size is 56
FAIL: gcc.target/arm/pr43920-2.c object-size text <= 54

(i haven't looked into the failure, attached asm output before and after).


Thanks for the confirmation.  I've made one further tweak for cris and
lm32, dropping the assert that caused build failures for libstdc++
atomics parms that required more alignment than
MAX_SUPPORTED_STACK_ALIGNMENT, consolidated the patchset and retested it
with a more recent baseline (r228019), with native regstraps on
x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu,
powerpc64le-linux-gnu, and cross toolchain builds for the following 73
platforms: aarch64_be-elf aarch64-elf arm-eabi armeb-eabihf
arm-symbianelf avr-elf bfin-elf c6x-elf cr16-elf cris-elf crisv32-elf
epiphany-elf fido-elf fr30-elf frv-elf ft32-elf h8300-elf i686-elf
ia64-elf iq2000-elf lm32-elf m32c-elf m32r-elf m32rle-elf m68k-elf
mcore-elf mep-elf microblaze-elf mips64el-elf mips64-elf mips64orion-elf
mips64vr-elf mipsel-elf mipsisa32-elfoabi mipsisa64-elfoabi
mipsisa64r2el-elf mipsisa64r2-sde-elf mipsisa64sb1-elf
mipsisa64sr71k-elf mipstx39-elf mn10300-elf moxie-elf msp430-elf
nds32be-elf nds32le-elf nios2-elf pdp11-aout powerpc-eabialtivec
powerpc-eabi powerpc-eabisimaltivec powerpc-eabisim powerpc-eabispe
powerpcle-eabi powerpcle-eabisim powerpcle-elf powerpc-xilinx-eabi
ppc64-eabi ppc-eabi ppc-elf rl78-elf rx-elf sh64-elf sh-elf
sh-superh-elf sparc64-elf sparc-elf sparc-leon-elf spu-elf v850e-elf
v850-elf visium-elf xstormy16-elf xtensa-elf.  Not all of them succeeded
in building, but those that didn't failed at the very same spots before
and after this patch.


This patch doesn't really add much functionality.  It rather
reimplements a lot of the ugly and fragile stuff I put in in the
previous big patchset in a far more robust and pleasant way.  It fixes a
number of regressions in the process, mainly because, instead of
modifying assign_parms so as to let cfgexpand do part of its job, it
reverts all of the RTL assignment for parameters and results to
assign_parms.  cfgexpand now leaves the RTL assignment of partitions
containing default defs or parms and results to assign_parms, and
assign_parms uses a single callback, set_parm_rtl, to tell cfgexpand the
assignment for the partition containing the default def of each
parameter.

This required introducing default defs for all parms and results, even
if unused; we could refrain from creating them, and refrain from
initializing those parameters (at least when optimizing), but that would
require messing with the fragile bits in assign_parms again, and it
would bring little benefit, since RTL optimization will likely notice
the initialization is unused and drop it anyway.  Besides, adding the
default defs was actually needed to fix a regression in the previous
patch, and even with the current patch it helps make sure we don't
assign more than one default def to the same SSA partition (the previous
patch attempted to do that, but there was a bug, fixed in the current
patch).  Having unused default defs makes it easier for us to decide
whether to use an entry_value rtx for the initial debug insn of a parm.
We track partitions holding default defs for parms and results with a
bitmap; we used to have a bitmap that tracked partitions holding default
defs, but it was unused!  I just renamed it and repurposed it.

I've also added checking asserts to set_rtl, to verify that, when we
expect a REG, we get a REG, and that it has the expected mode.  set_rtl
was also adjusted to record anonymous SSA names or their base types in
attrs of REGs or MEMs, respectively, so that code that relied on the
attrs to detect properties of the decl types no longer regresses just
because we no longer generate decls for anonymous SSA names.  Since
there were prior uses of types in MEM attrs, that was expected to go
smoothly, but I was surprised at how smoothly adding SSA names to REG
attrs went.  No adjustments required!

I also tightened a bit the conditions for coalescing: we used to require
the same canonical type; I've added tests for same alignment
requirements, and for same 

Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-29 Thread Bernd Schmidt

On 09/29/2015 12:29 PM, Richard Biener wrote:

I agree that obstacks are better here.  Efficiency shouldn't matter here.
But we're in C++ now so can't we statically construct the array with
sth like

const char *new_argv[] = { "objcopy", ... };

?  Thus have the compiler figure out the number of args.  That would work
for me as well.


The issue is that the code is about to be changed to conditionally pass 
certain arguments ("-v"), so you no longer have a fixed arglist.
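
To make the trade-off concrete, here is a minimal sketch in C.  The names
(argv_builder, argv_push, build_objcopy_argv) and the argument values are
hypothetical and not taken from mkoffload, and the growable array only
stands in for an obstack.  It shows why a conditionally passed "-v" does not
fit a statically initialized array: the final length is known only at run
time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an obstack: a growable array of pointers.  */
struct argv_builder
{
  const char **items;
  size_t len;
  size_t cap;
};

/* Append ARG, growing the array when it is full.  */
static void
argv_push (struct argv_builder *b, const char *arg)
{
  if (b->len == b->cap)
    {
      b->cap = b->cap ? b->cap * 2 : 4;
      b->items = realloc (b->items, b->cap * sizeof *b->items);
      if (b->items == NULL)
        abort ();
    }
  b->items[b->len++] = arg;
}

/* Build a NULL-terminated argument vector.  "-v" is passed only on
   request, so the number of arguments is not a compile-time constant.
   The caller frees the returned array.  */
static const char **
build_objcopy_argv (bool verbose, size_t *argc_out)
{
  struct argv_builder b = { NULL, 0, 0 };

  argv_push (&b, "objcopy");
  if (verbose)
    argv_push (&b, "-v");
  argv_push (&b, "in.o");
  argv_push (&b, "out.o");
  *argc_out = b.len;
  argv_push (&b, NULL);
  return b.items;
}
```

Adding another conditional argument is one more argv_push call, with no
array bound to keep in sync.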



Bernd


Re: [patch, committed] Dump function attributes

2015-09-29 Thread Richard Biener
On Tue, Sep 29, 2015 at 7:43 AM, Tom de Vries  wrote:
> [ was: Re: [RFC] Dump function attributes ]
>
> On 28/09/15 17:17, Bernd Schmidt wrote:
>>
>> On 09/28/2015 04:32 PM, Tom de Vries wrote:
>>>
>>> patch below prints the function attributes in the dump file.
>>
>>
>>> foo ()
>>> [ noclone , noinline ]
>>> {
>>> ...
>>>
>>> Good idea?
>>>
>>> If so, do we want one attribute per line?
>>
>>
>> Only for really long ones I'd think. Patch is ok for now.
>>
>>
>
> Reposting patch with ChangeLog entry added.
>
> Bootstrapped and reg-tested on x86_64.
>
> Committed to trunk.

Hmpf.  I always like to make the dump-files as much copy to testcases
as possible.  So why did you invent a new syntax for attributes instead of using
the existing __attribute__(("noclone", "noinline")) (in this case)?
Did you verify how attributes with arguments get printed?

Thanks,
Richard.

>
> Thanks,
> - Tom


[PATCH, testsuite]: Check all variables to be non-zero before signbit tests in tg-tests.h

2015-09-29 Thread Uros Bizjak
Hello!

On targets where denormals are flushed to zero with
-funsafe-math-optimizations (x86 SSE and alpha), it can happen that
a zero value enters signbit tests in unsafe math mode. Since signs of
zeroes and NaNs are not preserved in unsafe math mode,
gcc.dg/pr28796-2.c can fail on these targets.

We already have a check for non-zero double value in place for unsafe
math mode. Attached patch adds additional tests that guarantee that
float and long double values are non-zero before signbit tests.

2015-09-29  Uros Bizjak  

* gcc.dg/tg-tests.h (foo_1) [UNSAFE]: Also check if f and ld are
non-zero for __builtin_signbit tests.

Tested on alpha-linux-gnu (where the patch fixes gcc.dg/pr28796-2.c
failure) and x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.

Index: gcc.dg/tg-tests.h
===
--- gcc.dg/tg-tests.h   (revision 228229)
+++ gcc.dg/tg-tests.h   (working copy)
@@ -82,7 +82,7 @@

   /* Sign bit of zeros and nans is not preserved in unsafe math mode.  */
 #ifdef UNSAFE
-  if (!res_isnan && d != 0)
+  if (!res_isnan && f != 0 && d != 0 && ld != 0)
 #endif
 {
   if ((__builtin_signbit (f) ? 1 : 0) != res_signbit)
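
The guard in the hunk above can be sketched in isolation as follows.  This
is only an illustration of the idea, not code from tg-tests.h: the helper
names are hypothetical, and the unsafe_math parameter stands in for the
UNSAFE preprocessor macro used by the testsuite.

```c
#include <assert.h>

/* Return 1 when the sign of X may be tested reliably.  In unsafe math
   mode a zero (possibly a flushed denormal) has no dependable sign, so
   zeros are excluded, mirroring the "!= 0" checks in the patch.  */
static int
sign_is_testable (double x, int unsafe_math)
{
  if (unsafe_math && x == 0)
    return 0;
  return 1;
}

/* Normalize __builtin_signbit to 0 or 1, as the test does.  */
static int
sign_of (double x)
{
  return __builtin_signbit (x) ? 1 : 0;
}
```

Note that -0.0 compares equal to 0, so both zeros are filtered by the same
comparison.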


[PATCH] Clarify __atomic_compare_exchange_n docs

2015-09-29 Thread Jonathan Wakely

Someone on IRC incorrectly parsed the docs at
https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/_005f_005fatomic-Builtins.html#index-g_t_005f_005fatomic_005fcompare_005fexchange_005fn-3536
as:

 IF
 (
  desired is written into *ptr
  AND
  the execution is considered to conform to the memory model
  specified by success_memmodel.
 )
 {
  true is returned
 }
 otherwise ...

rather than the intended:

 IF ( desired is written into *ptr )
 {
  true is returned
  AND
  the execution is considered to conform to the memory model
  specified by success_memmodel.
 }
 otherwise ...

So they asked:


What is otherwise, here? Can I make the function return false even
when 'desired' has been written into 'ptr'? How do I do it? I could
not write an example, so far.


This patch rewords it to avoid the ambiguity.

I've also replaced the rather clunky "the operation is considered to
conform to" phrasing. (It's only _considered_ to? So does it or doesn't
it use that memory order?) Instead I've used the terminology from the
C and C++ standards, which say "memory is affected according to".

OK for trunk?

commit 370a92b7f4d318957a70d0d3f1185f1c6f282ff3
Author: Jonathan Wakely 
Date:   Tue Sep 29 12:45:21 2015 +0100

	* doc/extend.texi (__atomic Builtins): Clarify compare_exchange
	effects.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 8406945..0de94f2 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9353,17 +9353,17 @@ This compares the contents of @code{*@var{ptr}} with the contents of
 @code{*@var{expected}}. If equal, the operation is a @emph{read-modify-write}
 operation that writes @var{desired} into @code{*@var{ptr}}.  If they are not
 equal, the operation is a @emph{read} and the current contents of
-@code{*@var{ptr}} is written into @code{*@var{expected}}.  @var{weak} is true
+@code{*@var{ptr}} are written into @code{*@var{expected}}.  @var{weak} is true
 for weak compare_exchange, and false for the strong variation.  Many targets 
 only offer the strong variation and ignore the parameter.  When in doubt, use
 the strong variation.
 
-True is returned if @var{desired} is written into
-@code{*@var{ptr}} and the operation is considered to conform to the
+If @var{desired} is written into @code{*@var{ptr}} then true is returned
+and memory is affected according to the
 memory order specified by @var{success_memorder}.  There are no
 restrictions on what memory order can be used here.
 
-False is returned otherwise, and the operation is considered to conform
+Otherwise, false is returned and memory is affected according
 to @var{failure_memorder}. This memory order cannot be
 @code{__ATOMIC_RELEASE} nor @code{__ATOMIC_ACQ_REL}.  It also cannot be a
 stronger order than that specified by @var{success_memorder}.
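
A small example may help with the question quoted above.  This is a sketch
only; the wrapper name try_update is made up for illustration, while the
builtin and the memory-order constants are the documented GCC ones.  It
shows that true is returned exactly when desired was written, and that on
failure *expected receives the current contents of *ptr.

```c
#include <assert.h>
#include <stdbool.h>

/* Try to replace *PTR with DESIRED if it still holds *EXPECTED.
   Returns true when the store happened; on failure *EXPECTED is
   updated with the value actually found in *PTR.  */
static bool
try_update (int *ptr, int *expected, int desired)
{
  return __atomic_compare_exchange_n (ptr, expected, desired,
                                      false /* strong variation */,
                                      __ATOMIC_SEQ_CST /* success order */,
                                      __ATOMIC_RELAXED /* failure order */);
}
```

With a value of 5 and an expected value of 5, try_update stores the new
value and returns true.  Calling it again with the stale expected value 5
returns false, leaves *ptr alone, and overwrites the expected value with the
current contents.  The failure order here is weaker than the success order,
as the documentation requires.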


Re: [Graphite] Redesign Graphite scop detection

2015-09-29 Thread Andreas Schwab
FAIL: gcc.dg/graphite/interchange-1.c execution test
FAIL: gcc.dg/graphite/interchange-10.c execution test
FAIL: gcc.dg/graphite/interchange-11.c execution test
FAIL: gcc.dg/graphite/interchange-3.c execution test
FAIL: gcc.dg/graphite/interchange-4.c execution test
FAIL: gcc.dg/graphite/interchange-7.c execution test
FAIL: gcc.dg/graphite/pr46185.c execution test
FAIL: gcc.dg/graphite/uns-block-1.c execution test
FAIL: gcc.dg/graphite/uns-interchange-12.c execution test
FAIL: gcc.dg/graphite/uns-interchange-14.c execution test
FAIL: gcc.dg/graphite/uns-interchange-15.c execution test
FAIL: gcc.dg/graphite/uns-interchange-9.c execution test
FAIL: gcc.dg/graphite/uns-interchange-mvt.c execution test
FAIL: gfortran.dg/graphite/block-1.f90   -O  (internal compiler error)

/daten/aranym/gcc/gcc-20150929/gcc/testsuite/gfortran.dg/graphite/block-1.f90:1:0:
 internal compiler error: in extract_affine_chrec, at 
graphite-sese-to-poly.c:605
0xece332 extract_affine_chrec
../../gcc/graphite-sese-to-poly.c:604
0xece332 extract_affine
../../gcc/graphite-sese-to-poly.c:791
0xecdcec extract_affine_chrec
../../gcc/graphite-sese-to-poly.c:595
0xecdcec extract_affine
../../gcc/graphite-sese-to-poly.c:791
0xed3476 pdr_add_memory_accesses
../../gcc/graphite-sese-to-poly.c:1477
0xed3476 build_poly_dr
../../gcc/graphite-sese-to-poly.c:1572
0xed3476 build_pbb_drs
../../gcc/graphite-sese-to-poly.c:1836
0xed3476 build_scop_drs
../../gcc/graphite-sese-to-poly.c:1919
0xed3476 build_poly_scop(scop*)
../../gcc/graphite-sese-to-poly.c:3179
0xebdfc2 graphite_transform_loops()
../../gcc/graphite.c:318
0xebe6a0 graphite_transforms
../../gcc/graphite.c:353
0xebe6a0 execute
../../gcc/graphite.c:430

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [gomp4, committed] Ignore reduction clauses in kernels region

2015-09-29 Thread Tom de Vries

On 29/09/15 11:49, Tom de Vries wrote:

Hi,

this patch filters out reduction clauses in an oacc kernels region. This
fixes an ICE in the test-case.

Committed to gomp-4_0-branch.


I've committed this follow-up patch that marks the function 
ctx_in_oacc_kernels_region static, and adds the missing function header 
comment.


Thanks,
- Tom

Make ctx_in_oacc_kernels_region static

2015-09-29  Tom de Vries  

	* omp-low.c (ctx_in_oacc_kernels_region): Make static.  Add missing
	function header comment.
---
 gcc/omp-low.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 75044a5..64f6168 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2579,7 +2579,9 @@ oacc_loop_or_target_p (gimple *stmt)
 	  && gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_OACC_LOOP));
 }
 
-bool
+/* Return true if ctx is part of an oacc kernels region.  */
+
+static bool
 ctx_in_oacc_kernels_region (omp_context *ctx)
 {
   for (;ctx != NULL; ctx = ctx->outer)
-- 
1.9.1



Re: [PATCH 1/4] Add mkoffload for Intel MIC

2015-09-29 Thread Richard Biener
On Mon, Sep 28, 2015 at 2:05 PM, Bernd Schmidt  wrote:
> On 09/28/2015 02:00 PM, Jakub Jelinek wrote:
>>
>> On Mon, Sep 28, 2015 at 01:27:32PM +0200, Bernd Schmidt wrote:

 I've removed obstack_ptr_grow for arrays with known sizes after this
 review:
 https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02210.html
>>>
>>>
>>> That's unfortunate, I think that made the code less future-proof. IMO we
>>> should revert to the obstack method especially if Thomas -v patch goes
>>> in.
>>
>>
>> Why?  If the number of arguments is bound by a small constant, using
>> automatic fixed size array is certainly more efficient, and I really don't
>> see it as less readable or maintainable.
>
>
> The code becomes harder to modify, with more room for error, and you no
> longer have consistency in how you build argv arrays within the same file.
> The obstack method is pretty much foolproof and doesn't even remotely allow
> for the possibility of a buffer overflow, and adding new arguments, even
> conditionally, is entirely trivial. Efficiency is really not an issue for
> building arguments compared to the cost of executing another binary.

I agree that obstacks are better here.  Efficiency shouldn't matter here.
But we're in C++ now so can't we statically construct the array with
sth like

const char *new_argv[] = { "objcopy", ... };

?  Thus have the compiler figure out the number of args.  That would work
for me as well.

Richard.

>
> Bernd


Re: [PATCH] liboffloadmic emulation mode: make it asynchronous

2015-09-29 Thread Ilya Verbin
On Tue, Sep 29, 2015 at 09:01:33 +0200, Jakub Jelinek wrote:
> On Mon, Sep 28, 2015 at 05:53:42PM +0300, Ilya Verbin wrote:
> > Currently the COI emulator is single-threaded, i.e. it is able to run only one
> > target function at a time, e.g. the following testcase:
> >
> >   #pragma omp parallel sections num_threads(2)
> >     {
> >       #pragma omp section
> >       #pragma omp target
> >       while (1)
> >         putchar ('.');
> >
> >       #pragma omp section
> >       #pragma omp target
> >       while (1)
> >         putchar ('o');
> >     }
> >
> > prints only dots using emul, while using real libcoi it prints:
> > ...o.o.o.o...o...o.oo.o.o.ooo.oo...o.o.o...o.ooo
> > Of course, it's not possible to test new OpenMP 4.1's async features using such
> > an emulator.
> >
> > The patch below makes it asynchronous: it creates an auxiliary thread for each
> > COIPipeline in host and in target processes.  In general, a new COIPipeline is
> > created by liboffloadmic for each host thread with offload, i.e. the example
> > above has:
> > 4 threads in the host process (2 OpenMP threads + 2 auxiliary threads) and
> > 3 threads in the target process (1 main thread + 2 auxiliary threads).
> > An auxiliary host thread runs a target function in the new thread in target
> > process and waits for its completion.  When the function is finished, the host
> > thread signals an event and can run a callback, if it is registered.
> > liboffloadmic waits for signalled events by calling COIEventWait.
> > This is identical to how real libcoi works.
> >
> > make check-target-libgomp and some internal tests did not show any regression.
> > TSan report is clean.  Is it OK for trunk?
>
> For now ok.  Though, I'd say I'd prefer if there were no auxiliary threads
> on the host side, just whatever thread is asked to send something to/from
> the device, wait for something and/or poll for something just polling the
> pipes.  Are there auxiliary host threads also for the case when using
> the real COI, offloading to hw?

Yes.

  -- Ilya


Re: [patch, committed] Dump function attributes

2015-09-29 Thread Tom de Vries

On 29/09/15 12:36, Richard Biener wrote:

On Tue, Sep 29, 2015 at 7:43 AM, Tom de Vries  wrote:

[ was: Re: [RFC] Dump function attributes ]

On 28/09/15 17:17, Bernd Schmidt wrote:


On 09/28/2015 04:32 PM, Tom de Vries wrote:


patch below prints the function attributes in the dump file.




foo ()
[ noclone , noinline ]
{
...

Good idea?

If so, do we want one attribute per line?



Only for really long ones I'd think. Patch is ok for now.




Reposting patch with ChangeLog entry added.

Bootstrapped and reg-tested on x86_64.

Committed to trunk.


Hmpf.  I always like to make the dump-files as much copy to testcases
as possible.


Hmm, interesting. Not something I use, but I can imagine it's useful.


So why did you invent a new syntax for attributes instead of using
the existing __attribute__(("noclone", "noinline")) (in this case)?


My main concerns were:
- being able to see in dump files what the actual attributes of a
  function are (rather than having to figure it out in a debug session).
- being able to write testcases that can test for the presence of those
  attributes in dump files


Did you verify
how attributes with arguments get printed?


F.i. an oacc offload function compiled by the host compiler is annotated 
as follows:


before pass_oacc_transform (in the gomp-4_0-branch):
...
[ oacc function 32, , , omp target entrypoint ]
...

after pass_oacc_transform:

[ oacc function 1, 1, 1, omp target entrypoint ]
...
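
For reference, the simple case from the start of this thread corresponds to
source like the sketch below.  The function body is made up for
illustration; only the attribute names come from the thread.  With the
committed patch, the dump of such a function would be expected to carry the
"[ noclone , noinline ]" line shown earlier.

```c
#include <assert.h>

/* GNU attribute syntax as written in source.  The dump format under
   discussion prints the same two attributes in square brackets.  */
__attribute__ ((noclone, noinline))
static int
foo (int x)
{
  return x + 1;
}
```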

Thanks,
- Tom


Re: [RFC, PR target/65105] Use vector instructions for scalar 64bit computations on 32bit target

2015-09-29 Thread H.J. Lu
On Wed, Sep 23, 2015 at 3:29 AM, Uros Bizjak  wrote:
> On Wed, Sep 23, 2015 at 12:19 PM, Ilya Enkovich  wrote:
>> On 14 Sep 17:50, Uros Bizjak wrote:
>>>
>>> +(define_insn_and_split "*zext_doubleword"
>>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>>> + (zero_extend:DI (match_operand:SWI24 1 "nonimmediate_operand" "rm")))]
>>> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>>> +  "#"
>>> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
>>> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>>> +   (set (match_dup 2) (const_int 0))]
>>> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>>> +
>>> +(define_insn_and_split "*zextqi_doubleword"
>>> +  [(set (match_operand:DI 0 "register_operand" "=r")
>>> + (zero_extend:DI (match_operand:QI 1 "nonimmediate_operand" "qm")))]
>>> +  "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>>> +  "#"
>>> +  "&& reload_completed && GENERAL_REG_P (operands[0])"
>>> +  [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>>> +   (set (match_dup 2) (const_int 0))]
>>> +  "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>>> +
>>>
>>> Please put the above patterns together with other zero_extend
>>> patterns. You can also merge these two patterns using SWI124 mode
>>> iterator with  mode attribute as a register constraint. Also, no
>>> need to check for GENERAL_REG_P after reload, when "r" constraint is
>>> in effect:
>>>
>>> (define_insn_and_split "*zext_doubleword"
>>>   [(set (match_operand:DI 0 "register_operand" "=r")
>>>  (zero_extend:DI (match_operand:SWI124 1 "nonimmediate_operand" "m")))]
>>>   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
>>>   "#"
>>>   "&& reload_completed"
>>>   [(set (match_dup 0) (zero_extend:SI (match_dup 1)))
>>>(set (match_dup 2) (const_int 0))]
>>>   "split_double_mode (DImode, &operands[0], 1, &operands[0], &operands[2]);")
>>
>> Register constraint doesn't affect split and I need GENERAL_REG_P to filter 
>> other registers case.
>
> OK.
>
>> I merged QI and HI cases of zext but made a separate pattern for SI case 
>> because it doesn't need zero_extend in resulting code.  Bootstrapped and 
>> regtested for x86_64-unknown-linux-gnu.
>
> This change is OK.
>
> The patch LGTM, but please wait a couple of days if Jeff has some
> comment on algorithmic aspect of the patch.
>
> Thanks,
> Uros.
>
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2015-09-23  Ilya Enkovich  
>>
>> * config/i386/i386.c: Include dbgcnt.h.
>> (has_non_address_hard_reg): New.
>> (convertible_comparison_p): New.
>> (scalar_to_vector_candidate_p): New.
>> (remove_non_convertible_regs): New.
>> (scalar_chain): New.
>> (scalar_chain::scalar_chain): New.
>> (scalar_chain::~scalar_chain): New.
>> (scalar_chain::add_to_queue): New.
>> (scalar_chain::mark_dual_mode_def): New.
>> (scalar_chain::analyze_register_chain): New.
>> (scalar_chain::add_insn): New.
>> (scalar_chain::build): New.
>> (scalar_chain::compute_convert_gain): New.
>> (scalar_chain::replace_with_subreg): New.
>> (scalar_chain::replace_with_subreg_in_insn): New.
>> (scalar_chain::emit_conversion_insns): New.
>> (scalar_chain::make_vector_copies): New.
>> (scalar_chain::convert_reg): New.
>> (scalar_chain::convert_op): New.
>> (scalar_chain::convert_insn): New.
>> (scalar_chain::convert): New.
>> (convert_scalars_to_vector): New.
>> (pass_data_stv): New.
>> (pass_stv): New.
>> (make_pass_stv): New.
>> (ix86_option_override): Created and register stv pass.
>> (flag_opts): Add -mstv.
>> (ix86_option_override_internal): Likewise.
>> * config/i386/i386.md (SWIM1248x): New.
>> (*movdi_internal): Add xmm to mem alternative for TARGET_STV.
>> (and3): Use SWIM1248x iterator instead of SWIM.
>> (*anddi3_doubleword): New.
>> (*zext_doubleword): New.
>> (*zextsi_doubleword): New.
>> (3): Use SWIM1248x iterator instead of SWIM.
>> (*di3_doubleword): New.
>> * config/i386/i386.opt (mstv): New.
>> * dbgcnt.def (stv_conversion): New.
>>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67761



-- 
H.J.


Re: [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs hook for AArch64

2015-09-29 Thread Richard Biener
On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
 wrote:
>
> Hi,
>
> This patch is a simple prototype showing how a target might choose
> to implement TARGET_COSTS_IFCVT_NOCE_IS_PROFITABLE_P.  It has not been
> tuned, tested or looked at in any meaningful way.
>
> While the patch is in need of more detailed analysis it is sufficient to
> serve as an indication of what direction I was aiming for with this
> patch set.
>
> Clearly this is not OK for trunk without further work, but I thought I'd
> include it as an afterthought for the costs rework.

First of all don't include math.h or use FP math on the host.  If you need
fractional arithmetic use sreal.

It looks like with your hook implementation you are mostly hiding magic
numbers in the target.  I'm not sure how this is better than exposing them
as user-accessible --params (and thus their defaults controllable by
the target).
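
To illustrate the first point, a cost can be scaled by a branch probability
with integer arithmetic alone.  The sketch below is not sreal and not taken
from the patch: it is a plain fixed-point calculation, the names are
hypothetical, and the base of 10000 is an assumption chosen in the spirit of
GCC's REG_BR_PROB_BASE.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: probabilities are integers in [0, PROB_BASE].  */
#define PROB_BASE 10000

/* Scale a non-negative COST by PROB / PROB_BASE, rounding to nearest,
   without using any host floating-point arithmetic.  */
static int64_t
scale_cost_by_probability (int64_t cost, int prob)
{
  return (cost * prob + PROB_BASE / 2) / PROB_BASE;
}
```

For example, a cost of 8 at a probability of 2500 (25%) scales to 2, and the
result is identical on every host because no FP rounding mode is involved.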

Richard.

> Thanks,
> James
>
> ---
> 2015-09-26  James Greenhalgh  
>
> * config/aarch64/aarch64.c
> (aarch64_additional_branch_cost_for_probability): New.
> (aarch64_ifcvt_noce_profitable_p): Likewise.
> (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.
>


Re: [PATCH] Use stdint-wrap.h on *-*-netbsd[56]*

2015-09-29 Thread Jonathan Wakely

Ping.

On 18/09/15 13:59 +0100, Jonathan Wakely wrote:

This patch adjusts config.gcc so that it installs <stdint.h> for NetBSD
5.x and 6.x, which is necessary for the C++ library because the host
<stdint.h> has:

#if !defined(__cplusplus) || defined(__STDC_LIMIT_MACROS)
#include 
#endif

#if !defined(__cplusplus) || defined(__STDC_CONSTANT_MACROS)
#include 
#endif

This means that contrary to the C++11 standard the stdint macros are
only defined when __STDC_CONSTANT_MACROS / __STDC_LIMIT_MACROS are
defined.

I first noted the problem earlier this year and opened
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65806

I rediscovered the problem when I broke netbsd bootstrap by including
 during bootstrap with https://gcc.gnu.org/r227684

That header uses UINT32_C, which is not defined without this patch.

NetBSD 7.x should be OK, because it knows about C++11 (see the link in
the PR for details).

Tested x86_64-unknown-netbsd5.1, OK for trunk?




diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index affc5ba..9450dcb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2015-09-16  Jonathan Wakely  
+
+   * config.gcc (*-*-netbsd[5-6]*): Set use_gcc_stdint=wrap.
+
2015-09-15  Alan Lawrence  

* config/aarch64/aarch64-simd.md
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 75807f5..394ded3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -788,6 +788,14 @@ case ${target} in
  default_use_cxa_atexit=yes
  ;;
  esac
+
+  # NetBSD 5.x and 6.x provide <stdint.h> but require
+  # __STDC_LIMIT_MACROS and __STDC_CONSTANT_MACROS for C++.
+  case ${target} in
+*-*-netbsd[5-6]* | *-*-netbsdelf[5-6]*)
+  use_gcc_stdint=wrap
+  ;;
+  esac
  ;;
*-*-openbsd*)
  tmake_file="t-openbsd"
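
For code that has to build against such a host header without the wrapper,
the usual user-side workaround is to request the macros explicitly before
the first include.  The sketch below illustrates that workaround and is not
part of the patch; the function name is hypothetical.  The snippet is valid
as C, where the two defines are harmless, and the same lines apply to a C++
translation unit.

```c
/* Ask pre-C++11 style headers to provide the limit and constant macros.
   This must come before the first inclusion of <stdint.h>.  */
#ifndef __STDC_LIMIT_MACROS
#define __STDC_LIMIT_MACROS
#endif
#ifndef __STDC_CONSTANT_MACROS
#define __STDC_CONSTANT_MACROS
#endif
#include <stdint.h>
#include <assert.h>

/* Uses UINT32_C, one of the macros the affected headers guard.  */
static uint32_t
low_half_mask (void)
{
  return UINT32_C (0xffff);
}
```

The wrapper approach in the patch avoids the need for this in every
translation unit.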



Re: [gomp4] error on acc loops not associated with offloaded acc regions

2015-09-29 Thread Thomas Schwinge
Hi Cesar!

On Mon, 28 Sep 2015 10:08:34 -0700, Cesar Philippidis  
wrote:
> I've applied this patch to gomp-4_0-branch which teaches omplower how to
> error when it detects acc loops which aren't nested inside an acc
> parallel or kernels region or located within a function marked as an acc
> routine. A couple of test cases needed to be updated.
> 
> The error message is kind of long. Let me know if it should be revised.

>   gcc/testsuite/
>   * c-c++-common/goacc/non-routine.c: New test.
>   * c-c++-common/goacc-gomp/nesting-1.c: Add checks for invalid loop
>   nesting.
>   * c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
>   * c-c++-common/goacc/clauses-fail.c: Likewise.
>   * c-c++-common/goacc/sb-1.c: Likewise.
>   * c-c++-common/goacc/sb-3.c: Likewise.
>   * gcc.dg/goacc/sb-1.c: Likewise.
>   * gcc.dg/goacc/sb-3.c: Likewise.

What about any Fortran test cases?

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -2901,6 +2901,14 @@ check_omp_nesting_restrictions (gimple *stmt, 
> omp_context *ctx)
>   }
> return true;
>   }
> +  if (is_gimple_omp_oacc (stmt) && ctx == NULL
> +   && get_oacc_fn_attrib (current_function_decl) == NULL)
> + {
> +   error_at (gimple_location (stmt),
> + "acc loops must be associated with an acc region or "
> + "routine");
> +   return false;
> + }
>/* FALLTHRU */
>  case GIMPLE_CALL:
>if (is_gimple_call (stmt)

I see that the error reporting doesn't really use a consistent style
currently, but what about something like "loop directive must be
associated with compute region" (where "compute region" is the language
used by OpenACC 2.0a to mean the structured block associated with a
compute construct as well as routine directive)?

> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
> @@ -20,6 +20,7 @@ f_acc_kernels (void)
>}
>  }
>  
> +#pragma acc routine
>  void
>  f_acc_loop (void)
>  {

OK, but...

> --- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
> +++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
> @@ -361,72 +361,72 @@ f_acc_data (void)
>  void
>  f_acc_loop (void)
>  {
> -#pragma acc loop
> +#pragma acc loop /* { dg-error "acc loops must be associated with an acc region or routine" } */
>for (i = 0; i < 2; ++i)
>  {
> -#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
> +#pragma omp parallel
>;
>  }

... here you're changing what this is meant to be testing, so please
restore the original meaning (by adding "#pragma acc routine" to this
function, I suppose), and then perhaps add whichever additional test
cases you deem necessary.

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/non-routine.c
> @@ -0,0 +1,16 @@
> +/* This program validates the behavior of acc loops which are
> +   not associated with a parallel or kernles region or routine.  */

:-) Thanks for adding such a comment -- this is missing in too many test
cases.


Grüße,
 Thomas




[gomp4, committed] Don't unnecessarily set address taken in expand_omp_for_generic

2015-09-29 Thread Tom de Vries

Hi,

this patch sets the address-taken bit for istart0 and iend0 in
expand_omp_for_generic only if necessary. This fixes an ICE while
compiling the test-case.


Committed to gomp-4_0-branch.

Thanks,
- Tom
Don't unnecessarily set address taken in expand_omp_for_generic

2015-09-29  Tom de Vries  

	* omp-low.c (expand_omp_for_generic): Only set address taken for istart0
	and iend0 if necessary.

	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: New test.
---
 gcc/omp-low.c  | 10 ++---
 .../goacc/kernels-acc-loop-smaller-equal.c | 25 ++
 2 files changed, 32 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 597035f..a53a872 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6564,7 +6564,7 @@ expand_omp_for_generic (struct omp_region *region,
   gassign *assign_stmt;
   bool in_combined_parallel = is_combined_parallel (region);
   bool broken_loop = region->cont == NULL;
-  bool seq_loop = (!start_fn || !next_fn);
+  bool seq_loop = (start_fn == BUILT_IN_NONE || next_fn == BUILT_IN_NONE);
   edge e, ne;
   tree *counts = NULL;
   int i;
@@ -6576,8 +6576,12 @@ expand_omp_for_generic (struct omp_region *region,
   type = TREE_TYPE (fd->loop.v);
   istart0 = create_tmp_var (fd->iter_type, ".istart0");
   iend0 = create_tmp_var (fd->iter_type, ".iend0");
-  TREE_ADDRESSABLE (istart0) = 1;
-  TREE_ADDRESSABLE (iend0) = 1;
+
+  if (!seq_loop)
+{
+  TREE_ADDRESSABLE (istart0) = 1;
+  TREE_ADDRESSABLE (iend0) = 1;
+}
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
new file mode 100644
index 000..ba7414a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
@@ -0,0 +1,25 @@
+/* { dg-additional-options "-O2" } */
+/* { dg-additional-options "-ftree-parallelize-loops=32" } */
+/* { dg-additional-options "-fdump-tree-parloops_oacc_kernels-all" } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+unsigned int
+foo (int n)
+{
+  unsigned int sum = 1;
+
+  #pragma acc kernels loop
+  for (int i = 1; i <= n; i++)
+sum += i;
+
+  return sum;
+}
+
+/* Check that only one loop is analyzed, and that it can be parallelized.  */
+/* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-not "FAILED:" "parloops_oacc_kernels" } } */
+
+/* Check that the loop has been split off into a function.  */
+/* { dg-final { scan-tree-dump-times "(?n);; Function .*foo.*\\._omp_fn\\.0" 1 "optimized" } } */
+
+/* { dg-final { scan-tree-dump-times "(?n)pragma omp target oacc_parallel.*num_gangs\\(32\\)" 1 "parloops_oacc_kernels" } } */
-- 
1.9.1



Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-09-29 Thread Richard Biener
On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
 wrote:
> Hi,
>
> In relation to the patch I put up for review a few weeks ago to teach
> RTL if-convert to handle multiple sets in a basic block [1], I was
> asking about a sensible cost model to use. There was some consensus at
> Cauldron that what should be done in this situation is to introduce a
> target hook that delegates answering the question to the target.

Err - the consensus was to _not_ add gazillion of special target hooks
but instead enhance what we have with rtx_cost so that passes can
rely on comparing before and after costs of a sequence of insns.

Richard.

> This patch series introduces that new target hook to provide cost
> decisions for the RTL ifcvt pass.
>
> The idea is to give the target full visibility of the proposed
> transformation, and allow it to respond as to whether if-conversion in that
> way is profitable.
>
> In order to preserve current behaviour across targets, we will need the
> default implementation to keep to the strategy of simply comparing branch
> cost against a magic number. Patch 1/3 performs this refactoring, which is
> a bit hairy in some corner cases.
>
> Patch 2/3 is a simple code move, pulling the definition of the if_info
> structure used by RTL if-convert in to ifcvt.h where it can be included
> by targets.
>
> Patch 3/3 then introduces the new target hook, with the same default
> behaviour as was previously in noce_is_profitable_p.
>
> The series has been bootstrapped on ARM, AArch64 and x86_64 targets, and
> I've verified with Spec2000 and Spec2006 runs that there are no code
> generation differences for any of these three targets after the patch.
>
> I also gave ultrasparc3 a quick go, from what I could see, I changed the
> register allocation for the floating-point condition code registers.
> Presumably this is a side effect of first constructing RTXen that I then
> discard. I didn't see anything which looked like more frequent reloads or
> substantial code generation changes, though I'm not familiar with the
> intricacies of the Sparc condition registers :).
>
> I've included a patch 4/3, to give an example of what a target might want
> to do with this hook. It needs work for tuning and deciding how the function
> should actually behave, but works if it is thought of as more of a
> strawman/prototype than a patch submission.
>
> Are parts 1, 2 and 3 OK?
>
> Thanks,
> James
>
> [1]: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00781.html
>
> ---
> [Patch ifcvt 1/3] Factor out cost calculations from noce cases
>
> 2015-09-26  James Greenhalgh  
>
> * ifcvt.c (noce_if_info): Add a magic_number field :-(.
> (noce_is_profitable_p): New.
> (noce_try_store_flag_constants): Move cost calculation
> to after sequence generation, factor it out to noce_is_profitable_p.
> (noce_try_addcc): Likewise.
> (noce_try_store_flag_mask): Likewise.
> (noce_try_cmove): Likewise.
> (noce_try_cmove_arith): Likewise.
> (noce_try_sign_mask): Add comment regarding cost calculations.
>
> [Patch ifcvt 2/3] Move noce_if_info in to ifcvt.h
>
> 2015-09-26  James Greenhalgh  
>
> * ifcvt.c (noce_if_info): Move to...
> * ifcvt.h (noce_if_info): ...Here.
>
> [Patch ifcvt 3/3] Create a new target hook for deciding profitability
> of noce if-conversion
>
> 2015-09-26  James Greenhalgh  
>
> * target.def (costs): New hook vector.
> (ifcvt_noce_profitable_p): New hook.
> * doc/tm.texi.in: Document it.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_ifcvt_noce_profitable_p): New.
> * targhooks.c (default_ifcvt_noce_profitable_p): New.
> * ifcvt.c (noce_profitable_p): Use new target hook.
>
> [Patch Prototype AArch64 ifcvt 4/3] Wire up the new if-convert costs
> hook for AArch64
>
> 2015-09-26  James Greenhalgh  
>
> * config/aarch64/aarch64.c
> (aarch64_additional_branch_cost_for_probability): New.
> (aarch64_ifcvt_noce_profitable_p): Likewise.
> (TARGET_COSTS_IFCVT_NOCE_PROFITABLE_P): Likewise.


Re: [patch, committed] Dump function attributes

2015-09-29 Thread Richard Biener
On Tue, Sep 29, 2015 at 1:23 PM, Tom de Vries  wrote:
> On 29/09/15 12:36, Richard Biener wrote:
>>
>> On Tue, Sep 29, 2015 at 7:43 AM, Tom de Vries 
>> wrote:
>>>
>>> [ was: Re: [RFC] Dump function attributes ]
>>>
>>> On 28/09/15 17:17, Bernd Schmidt wrote:


 On 09/28/2015 04:32 PM, Tom de Vries wrote:
>
>
> patch below prints the function attributes in the dump file.



> foo ()
> [ noclone , noinline ]
> {
> ...
>
> Good idea?
>
> If so, do we want one attribute per line?



 Only for really long ones I'd think. Patch is ok for now.


>>>
>>> Reposting patch with ChangeLog entry added.
>>>
>>> Bootstrapped and reg-tested on x86_64.
>>>
>>> Committed to trunk.
>>
>>
>> Hmpf.  I always like to make the dump files as easy to copy into
>> testcases as possible.
>
>
> Hmm, interesting. Not something I use, but I can imagine it's useful.
>
>> So why did you invent a new syntax for attributes instead of using
>> the existing __attribute__(("noclone", "noinline")) (in this case)?
>
>
> My main concerns were:
> - being able to see in dump files what the actual attributes of a
>   function are (rather than having to figure it out in a debug session).
> - being able to write testcases that can test for the presence of those
>   attributes in dump files
>
>> Did you verify
>> how attributes with arguments get printed?
>
>
> F.i. an oacc offload function compiled by the host compiler is annotated as
> follows:
>
> before pass_oacc_transform (in the gomp-4_0-branch):
> ...
> [ oacc function 32, , , omp target entrypoint ]
> ...
>
> after pass_oacc_transform:
> 
> [ oacc function 1, 1, 1, omp target entrypoint ]
> .

Hmm, ok.  So without some extra dump_attribute_list wrapping
__attribute__(( ... )) around the above doesn't make it more amenable
for cut

Richard.

>
> Thanks,
> - Tom


[patch] libstdc++/67747 Allocate space for dirent::d_name

2015-09-29 Thread Jonathan Wakely

POSIX says that dirent::d_name has an unspecified length, so calls to
readdir_r must pass a buffer with enough trailing space for
{NAME_MAX}+1 characters. I wasn't doing that, which works OK on
GNU/Linux and BSD where d_name is a large array, but fails on Solaris
32-bit.

This uses pathconf to get NAME_MAX and allocates a buffer.

Tested powerpc64le-linux and x86_64-dragonfly4.1, I'm going to commit
this to trunk today (and backport all the filesystem fixes to
gcc-5-branch).
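
As a rough illustration of the sizing rule described above (not the code from the patch itself), the buffer length can be computed from the offset of d_name plus {NAME_MAX}+1. The helper names dirent_buf_size and make_dirent_buf are hypothetical, and the fallback of 255 is an assumption that mirrors the default used in the patch:

```cpp
#include <cstddef>    // offsetof, std::size_t
#include <memory>     // std::unique_ptr
#include <dirent.h>   // ::dirent
#include <unistd.h>   // pathconf, _PC_NAME_MAX

// Hypothetical helper: number of bytes needed for a dirent whose d_name
// can hold the longest file name allowed in directory `dir`.
inline std::size_t
dirent_buf_size(const char* dir)
{
  long name_max = ::pathconf(dir, _PC_NAME_MAX);
  if (name_max == -1)   // no limit reported, or pathconf failed
    name_max = 255;     // assumed fallback
  return offsetof(::dirent, d_name)
	 + static_cast<std::size_t>(name_max) + 1; // +1 for the terminator
}

// Hypothetical helper: allocate the buffer once so it can be reused for
// every readdir_r call on the same directory.
inline std::unique_ptr<char[]>
make_dirent_buf(const char* dir)
{
  return std::unique_ptr<char[]>(new char[dirent_buf_size(dir)]);
}
```

The result of make_dirent_buf(".").get() would then be cast to ::dirent* before being handed to readdir_r, which is the pattern the patch follows in _Dir::advance.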

commit 16ff5d124b8e6c5d1f9dd4edb81b6ca5c9129134
Author: Jonathan Wakely 
Date:   Tue Sep 29 11:58:19 2015 +0100

PR libstdc++/67747 Allocate space for dirent::d_name

	PR libstdc++/67747
	* src/filesystem/dir.cc (_Dir::dirent_buf): New member.
	(get_name_max): New function.
	(native_readdir) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Copy to supplied
	dirent object. Handle end of directory.
	(_Dir::advance): Allocate space for d_name.

diff --git a/libstdc++-v3/src/filesystem/dir.cc b/libstdc++-v3/src/filesystem/dir.cc
index bce751c..d29f8eb 100644
--- a/libstdc++-v3/src/filesystem/dir.cc
+++ b/libstdc++-v3/src/filesystem/dir.cc
@@ -25,8 +25,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#ifdef _GLIBCXX_HAVE_UNISTD_H
+# include <unistd.h>
+#endif
 #ifdef _GLIBCXX_HAVE_DIRENT_H
 # ifdef _GLIBCXX_HAVE_SYS_TYPES_H
 #  include <sys/types.h>
@@ -64,20 +68,23 @@ struct fs::_Dir
   fs::path		path;
   directory_entry	entry;
   file_type		type = file_type::none;
+  unique_ptr<char[]>	dirent_buf;
 };
 
 namespace
 {
   template<typename Bitmask>
-inline bool is_set(Bitmask obj, Bitmask bits)
+inline bool
+is_set(Bitmask obj, Bitmask bits)
 {
   return (obj & bits) != Bitmask::none;
 }
 
   // Returns {dirp, p} on success, {nullptr, p} on error.
   // If an ignored EACCES error occurs returns {}.
-  fs::_Dir
-  open_dir(const fs::path& p, fs::directory_options options, std::error_code* ec)
+  inline fs::_Dir
+  open_dir(const fs::path& p, fs::directory_options options,
+	   std::error_code* ec)
   {
 if (ec)
   ec->clear();
@@ -99,8 +106,22 @@ namespace
 return {nullptr, p};
   }
 
+  inline long
+  get_name_max(const fs::path& path __attribute__((__unused__)))
+  {
+#ifdef _GLIBCXX_HAVE_UNISTD_H
+long name_max = pathconf(path.c_str(), _PC_NAME_MAX);
+if (name_max != -1)
+  return name_max;
+#endif
+
+// Maximum path component on Windows is 255 (UTF-16?) characters,
+// which is a reasonable default for POSIX too.
+return 255;
+  }
+
   inline fs::file_type
-  get_file_type(const dirent& d __attribute__((__unused__)))
+  get_file_type(const ::dirent& d __attribute__((__unused__)))
   {
 #ifdef _GLIBCXX_HAVE_STRUCT_DIRENT_D_TYPE
 switch (d.d_type)
@@ -129,12 +150,26 @@ namespace
 #endif
   }
 
-  int
+  inline int
   native_readdir(DIR* dirp, ::dirent*& entryp)
   {
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-if ((entryp = ::readdir(dirp)))
-  return 0;
+const int saved_errno = errno;
+errno = 0;
+if (auto entp = ::readdir(dirp))
+  {
+	size_t name_len = strlen(entp->d_name);
+	if (name_len > 255)
+	  return ENAMETOOLONG;
+	size_t len = offsetof(::dirent, d_name) + name_len + 1;
+	memcpy(entryp, entp, len);
+	return 0;
+  }
+else if (errno == 0) // End of directory reached.
+  {
+	errno = saved_errno;
+	entryp = nullptr;
+  }
 return errno;
 #else
 return ::readdir_r(dirp, entryp, &entryp);
@@ -142,6 +177,7 @@ namespace
   }
 }
 
+
 // Returns false when the end of the directory entries is reached.
 // Reports errors by setting ec or throwing.
 bool
@@ -150,9 +186,15 @@ fs::_Dir::advance(error_code* ec, directory_options options)
   if (ec)
 ec->clear();
 
-  ::dirent ent;
-  ::dirent* result = &ent;
-  if (int err = native_readdir(dirp, result))
+  if (!dirent_buf)
+{
+  size_t len = offsetof(::dirent, d_name) + get_name_max(path) + 1;
+  dirent_buf.reset(new char[len]);
+}
+
+  ::dirent* entp = reinterpret_cast<::dirent*>(dirent_buf.get());
+
+  if (int err = native_readdir(dirp, entp))
 {
   if (err == EACCES
 && is_set(options, directory_options::skip_permission_denied))
@@ -165,13 +207,13 @@ fs::_Dir::advance(error_code* ec, directory_options options)
   ec->assign(err, std::generic_category());
   return true;
 }
-  else if (result != nullptr)
+  else if (entp != nullptr)
 {
   // skip past dot and dot-dot
-  if (!strcmp(ent.d_name, ".") || !strcmp(ent.d_name, ".."))
+  if (!strcmp(entp->d_name, ".") || !strcmp(entp->d_name, ".."))
 	return advance(ec, options);
-  entry = fs::directory_entry{path / ent.d_name};
-  type = get_file_type(ent);
+  entry = fs::directory_entry{path / entp->d_name};
+  type = get_file_type(*entp);
   return true;
 }
   else


Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Peter Bergner
On Mon, 2015-09-28 at 15:28 -0500, Segher Boessenkool wrote:
> On Mon, Sep 28, 2015 at 03:23:37PM -0400, Vladimir Makarov wrote:
> > There are more ports using reload than LRA now.  Even some major ports 
> > (e.g. ppc64) did not switch to LRA.
> 
> There still are some failures in the testsuite (ICEs even) so we're
> not there yet.

I've started looking through the failures with a target of getting POWER
converted to LRA before the switch to stage3.  From a quick scan, I see what
looks like two different ICEs on multiple tests and one wrong code gen issue.

The first ICE seems to be due to a conversion to long double and LRA ends
up going into an infinite loop spilling things until it hits a threshold and
quits with an ICE.  I haven't spent enough time to determine whether this
is an LRA or port issue yet though.  The simplest test case I have at the
moment is:

bergner@genoa:~/gcc/BUGS/LRA/20011123-1$ cat bug2.i
void
foo (long double *ldb1, double *db1)
{
  *ldb1 = *db1;
}
bergner@genoa:~/gcc/BUGS/LRA/20011123-1$ 
/home/bergner/gcc/build/gcc-fsf-mainline-bootstrap-lra-default-debug/gcc/xgcc 
-B/home/bergner/gcc/build/gcc-fsf-mainline-bootstrap-lra-default-debug/gcc/ -S 
-O1 -mvsx -S bug2.i
bug2.i: In function ‘foo’:
bug2.i:5:1: internal compiler error: Max. number of generated reload insns per 
insn is achieved (90)

 }
 ^
0x10962903 lra_constraints(bool)

/home/bergner/gcc/gcc-fsf-mainline-bootstrap-lra-default/gcc/lra-constraints.c:4351
0x10942af7 lra(_IO_FILE*)
/home/bergner/gcc/gcc-fsf-mainline-bootstrap-lra-default/gcc/lra.c:2298
0x108c0ac7 do_reload
/home/bergner/gcc/gcc-fsf-mainline-bootstrap-lra-default/gcc/ira.c:5391
0x108c1183 execute
/home/bergner/gcc/gcc-fsf-mainline-bootstrap-lra-default/gcc/ira.c:5562


After IRA, things are pretty simple, with just the following one insn which
needs a reload/spill, since we don't have memory to memory ops on POWER:

(insn 7 4 10 2 (parallel [
(set (mem:TF (reg:DI 3 3 [ ldb1 ]) [0 *ldb1_5(D)+0 S16 A128])
(float_extend:TF (mem:DF (reg:DI 4 4 [ db1 ]) [0 *db1_2(D)+0 S8 
A64])))
(use (const_double:DF 0.0 [0x0.0p+0]))
]) bug2.i:4 445 {*extenddftf2_internal}
 (expr_list:REG_DEAD (reg:DI 4 4 [ db1 ])
(expr_list:REG_DEAD (reg:DI 3 3 [ ldb1 ])
(nil

LRA comes along and gives us the following, which looks good:

(insn 7 4 11 2 (parallel [
(set (reg:TF 159)
(float_extend:TF (mem:DF (reg:DI 4 4 [ db1 ]) [0 *db1_2(D)+0 S8 
A64])))
(use (const_double:DF 0.0 [0x0.0p+0]))
]) bug2.i:4 445 {*extenddftf2_internal}
 (expr_list:REG_DEAD (reg:DI 4 4 [ db1 ])
(expr_list:REG_DEAD (reg:DI 3 3 [ ldb1 ])
(nil

(insn 11 7 10 2 (set (mem:TF (reg:DI 3 3 [ ldb1 ]) [0 *ldb1_5(D)+0 S16 A128])
(reg:TF 159)) bug2.i:4 435 {*movtf_64bit_dm}
 (nil))

but for some reason, it thinks reg 159 needs reloading and gives us:

(insn 7 4 12 2 (parallel [
(set (reg:TF 159)
(float_extend:TF (mem:DF (reg:DI 4 4 [ db1 ]) [0 *db1_2(D)+0 S8 
A64])))
(use (const_double:DF 0.0 [0x0.0p+0]))
]) bug2.i:4 445 {*extenddftf2_internal}
 (expr_list:REG_DEAD (reg:DI 4 4 [ db1 ])
(expr_list:REG_DEAD (reg:DI 3 3 [ ldb1 ])
(nil

(insn 12 7 11 2 (set (reg:TF 160 [159])
(reg:TF 159)) bug2.i:4 435 {*movtf_64bit_dm}
 (nil))

(insn 11 12 10 2 (set (mem:TF (reg:DI 3 3 [ ldb1 ]) [0 *ldb1_5(D)+0 S16 A128])
(reg:TF 160 [159])) bug2.i:4 435 {*movtf_64bit_dm}
 (nil))

and we end up doing it again and again and...until we hit the reload threshold
and ICE.  That's as far as I've gotten at this point.  Comments welcome since
I've had to put this on the shelf at the moment while working on next year's
work schedule for our team.

I haven't had a chance to look into the other ICE or wrong code gen issue yet,
but will eventually get to those.

Peter








Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Jakub Jelinek
On Tue, Sep 29, 2015 at 04:15:41PM +0100, Jonathan Wakely wrote:
> We set errno=0 in __gnu_cxx::__stoa in order to reliably detect when
> it gets set to ERANGE. This restores the previous value when the
> conversion is successful.
> 
> Tested powerpc64le-linux, committed to trunk.

> commit 412f75dc37b1048e14996c9caafa46c00db8eb30
> Author: Jonathan Wakely 
> Date:   Tue Sep 29 15:09:23 2015 +0100
> 
> Leave errno unchanged by successful std::stoi etc
> 
>   * include/ext/string_conversions.h (__stoa): Save and restore errno.
>   * testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc:
>   New.
> 
> diff --git a/libstdc++-v3/include/ext/string_conversions.h 
> b/libstdc++-v3/include/ext/string_conversions.h
> index f4648a8..58387a2 100644
> --- a/libstdc++-v3/include/ext/string_conversions.h
> +++ b/libstdc++-v3/include/ext/string_conversions.h
> @@ -58,6 +58,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>_Ret __ret;
>  
>_CharT* __endptr;
> +  const int __saved_errno = errno;
>errno = 0;
>const _TRet __tmp = __convf(__str, &__endptr, __base...);
>  
> @@ -70,6 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>   std::__throw_out_of_range(__name);
>else
>   __ret = __tmp;
> +  errno = __saved_errno;

That looks wrong to me, you only restore errno if you don't throw :(.
If you throw, then errno might remain 0, which is IMHO undesirable.
So, I'd say you want to restore it earlier, right after __convf, and
immediately before that copy the current errno to some other temporary
for the use in the condition?  Or restore errno = __saved_errno;
in all the 3 spots instead of just one.
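
A minimal sketch of the first suggestion, written against plain strtol rather than the real __stoa template: the errno value produced by the conversion is copied to a local, the caller's errno is restored immediately, and only the local copy is used to decide whether to throw. The function name to_int is hypothetical and not part of libstdc++.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a __stoa instantiation.  The caller's errno
// is restored right after the conversion, so it is unchanged on every
// exit path, including the two that throw.
inline int
to_int(const std::string& str)
{
  const int saved_errno = errno;
  errno = 0;
  char* endptr;
  const long tmp = std::strtol(str.c_str(), &endptr, 10);
  const int conv_errno = errno;  // what the conversion itself reported
  errno = saved_errno;           // restore before any throw

  if (endptr == str.c_str())
    throw std::invalid_argument("to_int");
  if (conv_errno == ERANGE || tmp < INT_MIN || tmp > INT_MAX)
    throw std::out_of_range("to_int");
  return static_cast<int>(tmp);
}
```

With this shape, a caller that sets errno to some value before calling to_int sees the same value afterwards, whether the call returns or throws.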

Jakub


Re: [Patch ifcvt costs 0/3] Introduce a new target hook for ifcvt costs.

2015-09-29 Thread James Greenhalgh
On Tue, Sep 29, 2015 at 11:16:37AM +0100, Richard Biener wrote:
> On Fri, Sep 25, 2015 at 5:04 PM, James Greenhalgh
>  wrote:
> > Hi,
> >
> > In relation to the patch I put up for review a few weeks ago to teach
> > RTL if-convert to handle multiple sets in a basic block [1], I was
> > asking about a sensible cost model to use. There was some consensus at
> > Cauldron that what should be done in this situation is to introduce a
> > target hook that delegates answering the question to the target.
> 
> Err - the consensus was to _not_ add a gazillion special target hooks
> but instead enhance what we have with rtx_cost so that passes can
> rely on comparing before and after costs of a sequence of insns.

Ah, I was not able to attend Cauldron this year, so I was trying to pick out
"consensus" from the video. Rewatching it now, I see a better phrase would
be "suggestion with some support".

Watching the video a second time, it seems your proposal is that we improve
the RTX costs infrastructure to handle sequences of Gimple/RTX. That would
get us some way to making a smart decision in if-convert, but I'm not
convinced it allows us to answer the question we are interested in.

We have the rtx for before and after, and we can generate costs for these
sequences. This allows us to calculate some weighted cost of the
instructions based on the calculated probabilities that each block is
executed. However, we are missing information on how expensive the branch
is, and we have no way to get that through an RTX-costs infrastructure.

We could add a hook to give a cost in COSTS_N_INSNS units to a branch based
on its predictability. This is difficult as COSTS_N_INSNS units can differ
depending on whether you are talking about floating-point or integer code.
By this I mean, the compiler considers a SET which costs more than
COSTS_N_INSNS (1) to be "expensive". Consequently, some targets set the cost
of both an integer SET and a floating-point SET to COSTS_N_INSNS (1).
In reality, these instructions may have different latency performance
characteristics. What real world quantity are we trying to invoke when we
say a branch costs the same as 3 SET instructions of any type? It certainly
isn't mispredict penalty (likely measured in cycles, not relative to the cost
of a SET instruction, which may well be completely free on modern x86
processors), nor is it the cost of executing the branch instruction which
is often constant to resolve regardless of predicted/mispredicted status.

On the other side of the equation, we want a cost for the converted
sequence. We can build a cost of the generated rtl sequence, but for
targets like AArch64 this is going to be wildly off. AArch64 will expand
(a > b) ? x : y; as a set to the CC register, followed by a conditional
move based on the CC register. Consequently, where we have multiple sets
back to back we end up with:

  set CC (a > b)
  set x1 (CC ? x : y)
  set CC (a > b)
  set x2 (CC ? x : z)
  set CC (a > b)
  set x3 (CC ? x : k)

Which we know will be simplified later to:

  set CC (a > b)
  set x1 (CC ? x : y)
  set x2 (CC ? x : z)
  set x3 (CC ? x : k)

I imagine other targets have something similar in their expansion of
movcc (though I haven't looked).

Our comparison for if-conversion then must be:

  weighted_old_cost = (taken_probability * (then_bb_cost)
                       + (1 - taken_probability) * (else_bb_cost));
  branch_cost = branch_cost_in_insns (taken_probability)
  weighted_new_cost = redundancy_factor (new_sequence) * seq_cost (new_sequence)

  profitable = weighted_new_cost <= weighted_old_cost + branch_cost

And we must define:

  branch_cost_in_insns (taken_probability)
  redundancy_factor (new_sequence)

At that point, I feel you are better giving the entire sequence to the
target and asking it to implement whatever logic is needed to return a
profitable/unprofitable analysis of the transformation.

The "redundancy_factor" in particular is pretty tough to define in a way
which makes sense outside of if_convert, without adding some pretty
detailed analysis to decide what might or might not be eliminated by
later passes. The alternative is to weight the other side of the equation
by tuning the cost of branch_cost_in_insns high. This only serves to increase
the disconnect between a real-world cost and a number to tweak to game
code generation.
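
To make the shape of that comparison concrete, here is a small sketch. It assumes the old cost is the probability-weighted sum of the two arms, and every name in it (ifcvt_costs, weighted_old_cost, profitable) is hypothetical; none of this exists in GCC, and the numeric inputs would have to come from the target.

```cpp
// Hypothetical bundle of the quantities discussed above, all in the same
// (target-defined) cost units.
struct ifcvt_costs
{
  double taken_probability;  // probability that the then-block executes
  double then_bb_cost;       // cost of the then-block
  double else_bb_cost;       // cost of the else-block
  double branch_cost;        // branch_cost_in_insns (taken_probability)
  double redundancy_factor;  // fraction of new_seq_cost expected to survive
  double new_seq_cost;       // seq_cost (new_sequence)
};

// Expected cost of the original branchy code, excluding the branch itself.
inline double
weighted_old_cost(const ifcvt_costs& c)
{
  return c.taken_probability * c.then_bb_cost
	 + (1.0 - c.taken_probability) * c.else_bb_cost;
}

// The conversion pays off when the discounted straight-line sequence is
// no more expensive than the branchy code plus the branch.
inline bool
profitable(const ifcvt_costs& c)
{
  const double weighted_new_cost = c.redundancy_factor * c.new_seq_cost;
  return weighted_new_cost <= weighted_old_cost(c) + c.branch_cost;
}
```

The two inputs that have no obvious target-independent definition, branch_cost and redundancy_factor, are exactly the ones that would need their own hooks.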

If you have a different way of phrasing the if-conversion question that
avoids the two very specific hooks, I'd be happy to try taking the patches
in that direction. I don't see a way to implement this as just queries to
a costing function which does not need substantial target and pass
dependent tweaking to make behave correctly.

Thanks,
James

> > This patch series introduces that new target hook to provide cost
> > decisions for the RTL ifcvt pass.
> >
> > The idea is to give the target full visibility of the proposed
> > transformation, and allow it to respond as to whether 

[patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Jonathan Wakely

We set errno=0 in __gnu_cxx::__stoa in order to reliably detect when
it gets set to ERANGE. This restores the previous value when the
conversion is successful.

Tested powerpc64le-linux, committed to trunk.
commit 412f75dc37b1048e14996c9caafa46c00db8eb30
Author: Jonathan Wakely 
Date:   Tue Sep 29 15:09:23 2015 +0100

Leave errno unchanged by successful std::stoi etc

	* include/ext/string_conversions.h (__stoa): Save and restore errno.
	* testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc:
	New.

diff --git a/libstdc++-v3/include/ext/string_conversions.h b/libstdc++-v3/include/ext/string_conversions.h
index f4648a8..58387a2 100644
--- a/libstdc++-v3/include/ext/string_conversions.h
+++ b/libstdc++-v3/include/ext/string_conversions.h
@@ -58,6 +58,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Ret __ret;
 
   _CharT* __endptr;
+  const int __saved_errno = errno;
   errno = 0;
   const _TRet __tmp = __convf(__str, &__endptr, __base...);
 
@@ -70,6 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	std::__throw_out_of_range(__name);
   else
 	__ret = __tmp;
+  errno = __saved_errno;
 
   if (__idx)
 	*__idx = __endptr - __str;
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc
new file mode 100644
index 000..4079744
--- /dev/null
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc
@@ -0,0 +1,36 @@
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++11" }
+// { dg-require-string-conversions "" }
+
+#include 
+#include 
+
+void
+test01()
+{
+  errno = ERANGE;
+  std::stoi("42");
+  VERIFY( errno == ERANGE ); // errno should not be altered by successful call
+}
+
+int
+main()
+{
+  test01();
+}


Re: [PATCH] Clarify __atomic_compare_exchange_n docs

2015-09-29 Thread Sandra Loosemore

On 09/29/2015 06:00 AM, Jonathan Wakely wrote:

Someone on IRC incorrectly parsed the docs at
https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/_005f_005fatomic-Builtins.html#index-g_t_005f_005fatomic_005fcompare_005fexchange_005fn-3536

as:

  IF
  (
   desired is written into *ptr
   AND
   the execution is considered to conform to the memory model
   specified by success_memmodel.
  )
  {
   true is returned
  }
  otherwise ...

rather than the intended:

  IF ( desired is written into *ptr )
  {
   true is returned
   AND
   the execution is considered to conform to the memory model
   specified by success_memmodel.
  }
  otherwise ...

So they asked:


What is otherwise, here? Can I make the function return false even
when 'desired' has been written into 'ptr'? How do I do it? I could
not write an example, so far.


This patch rewords it to avoid the ambiguity.

I've also replaced the rather clunky "the operation is considered to
conform to" phrasing. (It's only _considered_ to? So does it or doesn't
it use that memory order?) Instead I've used the terminology from the
C and C++ standards, which say "memory is affected according to".

OK for trunk?


This is OK, as far as it goes, but while we're at it, can we do 
something to fix the description of the weak parameter?



@@ -9353,17 +9353,17 @@ This compares the contents of @code{*@var{ptr}} with 
the contents of
 @code{*@var{expected}}. If equal, the operation is a @emph{read-modify-write}
 operation that writes @var{desired} into @code{*@var{ptr}}.  If they are not
 equal, the operation is a @emph{read} and the current contents of
-@code{*@var{ptr}} is written into @code{*@var{expected}}.  @var{weak} is true
+@code{*@var{ptr}} are written into @code{*@var{expected}}.  @var{weak} is true
 for weak compare_exchange, and false for the strong variation.  Many targets
 only offer the strong variation and ignore the parameter.  When in doubt, use
 the strong variation.


What is "weak compare_exchange", and what is "the strong variation", and 
how do they differ in terms of behavior?
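
For context, a sketch of how the two forms are typically used, relying on the C and C++ standards' description that a weak compare-exchange is allowed to fail spuriously (return false even though the values compared equal) while the strong form is not. The helper names fetch_add_weak and try_claim are hypothetical; only the __atomic builtins themselves come from GCC.

```cpp
// Retry loop: the weak form is acceptable here because a spurious
// failure only costs one more iteration.  Returns the previous value.
inline int
fetch_add_weak(int* ptr, int delta)
{
  int expected = __atomic_load_n(ptr, __ATOMIC_RELAXED);
  while (!__atomic_compare_exchange_n(ptr, &expected, expected + delta,
				      /*weak=*/true,
				      __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
    {
      // On failure, expected has been updated to the current *ptr.
    }
  return expected;
}

// One-shot attempt: the strong form is used so that a false result
// means *ptr really did not contain `from`.
inline bool
try_claim(int* ptr, int from, int to)
{
  return __atomic_compare_exchange_n(ptr, &from, to, /*weak=*/false,
				     __ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
}
```

On targets that only implement the strong variation, both calls behave the same; the distinction matters on targets where the weak form maps to a load-linked/store-conditional pair.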


-Sandra



Re: patch to fix PR66424

2015-09-29 Thread Matthias Klose
This was marked as a regression in 5 and 6, but never backported to the 
gcc-5-branch. Is it time to backport?


Matthias

On 21.07.2015 21:54, Vladimir Makarov wrote:

   The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66424

   The patch was tested and bootstrapped on x86/x86-64.

   Committed as rev. 226053.

2015-07-21  Vladimir Makarov  

 PR ipa/66424.
 * lra-remat.c (operand_to_remat): Prevent using insns with input
 subregs processed separately by IRA.

2015-07-21  Vladimir Makarov  

 PR ipa/66424.
 * gcc.target/i386/pr66424.c: New.






Re: [PATCH] Clear variables with stale SSA_NAME_RANGE_INFO (PR tree-optimization/67690)

2015-09-29 Thread Marek Polacek
On Fri, Sep 25, 2015 at 06:22:44PM +0200, Richard Biener wrote:
> On September 25, 2015 3:49:34 PM GMT+02:00, Marek Polacek 
>  wrote:
> >On Fri, Sep 25, 2015 at 09:29:30AM +0200, Richard Biener wrote:
> >> On Thu, 24 Sep 2015, Marek Polacek wrote:
> >> 
> >> > As Richi said in
> >,
> >> > using recorded SSA name range infos in VRP is likely to expose
> >errors in the
> >> > ranges.  This PR is such a case.  As discussed in the PR, after
> >tail merging
> >> > via PRE the range infos cannot be relied upon anymore, so we need
> >to clear
> >> > them.
> >> > 
> >> > Since tree-ssa-ifcombine.c already had code to clean up the flow
> >data in a BB,
> >> > I've factored it out to a common function.
> >> > 
> >> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?
> >> 
> >> I believe for tail-merge you also need to clear range info on
> >> PHI defs in the BB.  For ifcombine this wasn't necessary (no PHI
> >nodes
> >> in the relevant CFG), but it's ok to extend the new 
> >> reset_flow_sensitive_info_in_bb function to also reset PHI defs.
> >
> >All right.
> > 
> >> Ok with that change.
> >
> >Since I'm not completely sure if I did the right thing here, could you
> >please have another look at the new function?
> 
> Doesn't work that way.  You need to iterate over the PHI sequence separately 
> via gsi_start_phis(bb), etc.

Oops, sorry.  So like this?

Bootstrapped/regtested on x86_64-linux, ok for trunk (and a similar
patch for 5)?

2015-09-29  Marek Polacek  

PR tree-optimization/67690
* tree-ssa-ifcombine.c (pass_tree_ifcombine::execute): Call
reset_flow_sensitive_info_in_bb.
* tree-ssa-tail-merge.c (replace_block_by): Likewise.
* tree-ssanames.c: Include "gimple-iterator.h".
(reset_flow_sensitive_info_in_bb): New function.
* tree-ssanames.h (reset_flow_sensitive_info_in_bb): Declare.

* gcc.dg/torture/pr67690.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr67690.c 
gcc/testsuite/gcc.dg/torture/pr67690.c
index e69de29..491de51 100644
--- gcc/testsuite/gcc.dg/torture/pr67690.c
+++ gcc/testsuite/gcc.dg/torture/pr67690.c
@@ -0,0 +1,32 @@
+/* { dg-do run } */
+
+const int c1 = 1;
+const int c2 = 2;
+
+int
+check (int i)
+{
+  int j;
+  if (i >= 0)
+j = c2 - i;
+  else
+j = c2 - i;
+  return c2 - c1 + 1 > j;
+}
+
+int invoke (int *pi) __attribute__ ((noinline,noclone));
+int
+invoke (int *pi)
+{
+  return check (*pi);
+}
+
+int
+main ()
+{
+  int i = c1;
+  int ret = invoke (&i);
+  if (!ret)
+__builtin_abort ();
+  return 0;
+}
diff --git gcc/tree-ssa-ifcombine.c gcc/tree-ssa-ifcombine.c
index 9f04174..66be430 100644
--- gcc/tree-ssa-ifcombine.c
+++ gcc/tree-ssa-ifcombine.c
@@ -769,16 +769,7 @@ pass_tree_ifcombine::execute (function *fun)
  {
/* Clear range info from all stmts in BB which is now executed
   conditional on a always true/false condition.  */
-   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
-!gsi_end_p (gsi); gsi_next (&gsi))
- {
-   gimple *stmt = gsi_stmt (gsi);
-   ssa_op_iter i;
-   tree op;
-   FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
- reset_flow_sensitive_info (op);
- }
-
+   reset_flow_sensitive_info_in_bb (bb);
cfg_changed |= true;
  }
 }
diff --git gcc/tree-ssa-tail-merge.c gcc/tree-ssa-tail-merge.c
index 0ce59e8..487961e 100644
--- gcc/tree-ssa-tail-merge.c
+++ gcc/tree-ssa-tail-merge.c
@@ -1534,6 +1534,10 @@ replace_block_by (basic_block bb1, basic_block bb2)
   e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
 }
 
+  /* Clear range info from all stmts in BB2 -- this transformation
+ could make them out of date.  */
+  reset_flow_sensitive_info_in_bb (bb2);
+
   /* Do updates that use bb1, before deleting bb1.  */
   release_last_vdef (bb1);
   same_succ_flush_bb (bb1);
diff --git gcc/tree-ssanames.c gcc/tree-ssanames.c
index 4199290..7235dc3 100644
--- gcc/tree-ssanames.c
+++ gcc/tree-ssanames.c
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backend.h"
 #include "tree.h"
 #include "gimple.h"
+#include "gimple-iterator.h"
 #include "hard-reg-set.h"
 #include "ssa.h"
 #include "alias.h"
@@ -544,6 +545,29 @@ reset_flow_sensitive_info (tree name)
 SSA_NAME_RANGE_INFO (name) = NULL;
 }
 
+/* Clear all flow sensitive data from all statements and PHI definitions
+   in BB.  */
+
+void
+reset_flow_sensitive_info_in_bb (basic_block bb)
+{
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+   gsi_next (&gsi))
+{
+  gimple *stmt = gsi_stmt (gsi);
+  ssa_op_iter i;
+  tree op;
+  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
+   reset_flow_sensitive_info (op);
+}
+
+  for (gphi_iterator gsi = gsi_start_phis (bb); 

[gomp4] Rename oacc_transform pass

2015-09-29 Thread Nathan Sidwell
I've committed this to gomp4 branch.  It renames the oacc_transform pass to 
oacc_device_lower, in line with the (now withdrawn) patch for mainline.


I'm preparing a version of the pass for mainline with a different initial use 
than acc_on_device folding.


nathan
2015-09-29  Nathan Sidwell  
	Cesar Philippidis  

	* passes.def: Rename pass_oacc_transform to pass_oacc_device_lower.
	* tree-pass.h (make_pass_oacc_transform): Rename to ...
	(make_pass_oacc_device_lower): ... here.
	* doc/invoke.texi (oaccdevlow): Document tree dump flag.
	* omp-low.c (execute_oacc_transform): Rename to ...
	(execute_oacc_device_lower): ... here.
	(pass_data pass_data_oacc_transform): Rename to ...
	(pass_data pass_data_oacc_device_lower): ... here. Adjust name.
	(class pass_oacc_transform): Rename to ...
	(class pass_oacc_device_lower): ... here.
	(make_pass_oacc_transform): Rename to ...
	(make_pass_oacc_device_lower): ... here.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 228241)
+++ gcc/doc/invoke.texi	(working copy)
@@ -1,3 +1,4 @@
+
 @c Copyright (C) 1988-2015 Free Software Foundation, Inc.
 @c This is part of the GCC manual.
 @c For copying conditions, see the file gcc.texi.
@@ -332,6 +333,7 @@ Objective-C and Objective-C++ Dialects}.
 -fdump-passes @gol
 -fdump-statistics @gol
 -fdump-tree-all @gol
+-fdump-tree-oaccdevlow @gol
 -fdump-tree-original@r{[}-@var{n}@r{]}  @gol
 -fdump-tree-optimized@r{[}-@var{n}@r{]} @gol
 -fdump-tree-cfg -fdump-tree-alias @gol
@@ -7246,6 +7248,11 @@ is made by appending @file{.slp} to the
 Dump each function after Value Range Propagation (VRP).  The file name
 is made by appending @file{.vrp} to the source file name.
 
+@item oaccdevlow
+@opindex fdump-tree-oaccdevlow
+Dump each function after applying device-specific OpenACC transformations.
+The file name is made by appending @file{.oaccdevlow} to the source file name.
+
 @item all
 @opindex fdump-tree-all
 Enable all the available tree dumps with the flags provided in this option.
Index: gcc/passes.def
===
--- gcc/passes.def	(revision 228241)
+++ gcc/passes.def	(working copy)
@@ -164,7 +164,7 @@ along with GCC; see the file COPYING3.
   INSERT_PASSES_AFTER (all_passes)
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
-  NEXT_PASS (pass_oacc_transform);
+  NEXT_PASS (pass_oacc_device_lower);
   NEXT_PASS (pass_all_optimizations);
   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
   NEXT_PASS (pass_remove_cgraph_callee_edges);
Index: gcc/tree-pass.h
===
--- gcc/tree-pass.h	(revision 228241)
+++ gcc/tree-pass.h	(working copy)
@@ -411,7 +411,7 @@ extern gimple_opt_pass *make_pass_late_l
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
-extern gimple_opt_pass *make_pass_oacc_transform (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fold_builtins (gcc::context *ctxt);
Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 228241)
+++ gcc/omp-low.c	(working copy)
@@ -14836,7 +14836,7 @@ oacc_validate_dims (tree fn, tree attrs,
point (including the host fallback).  */
 
 static unsigned int
-execute_oacc_transform ()
+execute_oacc_device_lower ()
 {
   tree attrs = get_oacc_fn_attrib (current_function_decl);
   int dims[GOMP_DIM_MAX];
@@ -15036,10 +15036,10 @@ default_goacc_reduction (gcall *call)
 
 namespace {
 
-const pass_data pass_data_oacc_transform =
+const pass_data pass_data_oacc_device_lower =
 {
   GIMPLE_PASS, /* type */
-  "fold_oacc_transform", /* name */
+  "accdevlow", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_cfg, /* properties_required */
@@ -15049,11 +15049,11 @@ const pass_data pass_data_oacc_transform
   TODO_update_ssa | TODO_cleanup_cfg, /* todo_flags_finish */
 };
 
-class pass_oacc_transform : public gimple_opt_pass
+class pass_oacc_device_lower : public gimple_opt_pass
 {
 public:
-  pass_oacc_transform (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_oacc_transform, ctxt)
+  pass_oacc_device_lower (gcc::context *ctxt)
+: gimple_opt_pass (pass_data_oacc_device_lower, ctxt)
   {}
 
   /* opt_pass methods: */
@@ -15064,17 +15064,17 @@ public:
   if (!gate)
 	return 0;
 
-  return execute_oacc_transform ();
+  return execute_oacc_device_lower ();
 }
 
-}; // class pass_oacc_transform
+}; // class pass_oacc_device_lower
 
 

[PATCH] Fix undefined behaviour in msp430 port

2015-09-29 Thread Jeff Law


Similar to the fixes from the weekend.  Avoiding left shifts of negative 
signed values in the obvious way.
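
As a minimal sketch of the idea (not the msp430 code itself): left-shifting a negative signed value is undefined in C and in C++ before C++20, while shifting an all-ones unsigned value is well defined.  The constant kAllOnes below is a local stand-in for GCC's HOST_WIDE_INT_M1U, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for GCC's HOST_WIDE_INT_M1U (an all-ones unsigned
// host-wide value), defined here so the sketch compiles on its own.
constexpr std::uint64_t kAllOnes = ~std::uint64_t{0};

// Computes -(2^bits) without ever shifting a negative value: the shift
// is performed on an unsigned operand and only then converted to signed.
// The conversion is modular since C++20 (implementation-defined before
// that; GCC documents it as modular).
inline std::int64_t signed_lower_bound(unsigned bits)
{
  assert(bits < 64);
  return static_cast<std::int64_t>(kAllOnes << bits);
}

// Hypothetical range check in the spirit of
// INTVAL (x) >= -(2^20) && INTVAL (x) < 2^20.
inline bool fits_in_21_bits(std::int64_t value)
{
  return value >= signed_lower_bound(20)
         && value < (std::int64_t{1} << 20);
}
```

The explicit cast back to a signed type matters: comparing a signed value directly against the unsigned shift result would convert the signed side to unsigned and change the meaning of the test.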


Tested by building msp430 targets from config-all.mk.

Installed on the trunk.

Jeff
commit 679cec5bd2f9ca9c6dabff89d0103790d560c0cb
Author: Jeff Law 
Date:   Mon Sep 28 19:24:56 2015 -0400

[PATCH] Fix undefined behaviour in msp430 port

   * config/msp430/msp430.c (msp430_legitimate_constant): Fix undefined
left shift behaviour.
* config/msp430/constraints.md ('L' constraint): Similarly.
('Ys' constraint): Similarly.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 03f566c..1b9985a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2015-09-29  Jeff Law  
+
+   * config/msp430/msp430.c (msp430_legitimate_constant): Fix undefined
+   left shift behaviour.
+   * config/msp430/constraints.md ('L' constraint): Similarly.
+   ('Ys' constraint): Similarly.
+
 2015-09-29  Richard Biener  
 
PR tree-optimization/67170
diff --git a/gcc/config/msp430/constraints.md b/gcc/config/msp430/constraints.md
index 30f944c..dfda152 100644
--- a/gcc/config/msp430/constraints.md
+++ b/gcc/config/msp430/constraints.md
@@ -32,7 +32,7 @@
 (define_constraint "L"
   "Integer constant -1^20..1^19."
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, -1 << 20, 1 << 19)")))
+   (match_test "IN_RANGE (ival, HOST_WIDE_INT_M1U << 20, 1 << 19)")))
 
 (define_constraint "M"
   "Integer constant 1-4."
@@ -77,7 +77,7 @@
(and (match_code "plus" "0")
 (and (match_code "reg" "00")
  (match_test ("CONST_INT_P (XEXP (XEXP (op, 0), 1))"))
- (match_test ("IN_RANGE (INTVAL (XEXP (XEXP (op, 0), 1)), -1 
<< 15, (1 << 15)-1)"
+ (match_test ("IN_RANGE (INTVAL (XEXP (XEXP (op, 0), 1)), 
HOST_WIDE_INT_M1U << 15, (1 << 15)-1)"
(match_code "reg" "0")
)))
 
diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index d2308cb..ba8d862 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -998,7 +998,7 @@ msp430_legitimate_constant (machine_mode mode, rtx x)
 /* GCC does not know the width of the PSImode, so make
sure that it does not try to use a constant value that
is out of range.  */
-|| (INTVAL (x) < (1 << 20) && INTVAL (x) >= (-1 << 20));
+|| (INTVAL (x) < (1 << 20) && INTVAL (x) >= 
(HOST_WIDE_INT)(HOST_WIDE_INT_M1U << 20));
 }
 
 


Re: [C PATCH] Fix missing warning (PR c/67730)

2015-09-29 Thread Marek Polacek
On Tue, Sep 29, 2015 at 06:04:55PM +0200, Marc Glisse wrote:
> On Tue, 29 Sep 2015, Marek Polacek wrote:
> 
> >This fixes missing warning for the attached testcase.  In such a case,
> >we must use the expansion point location.  I didn't simply add
> > loc = expansion_point_location_if_in_system_header (loc);
> >as might be seen elsewhere in the codebase because we pass LOC down to
> >convert_for_assignment where many of the warnings are issued and I was
> >nervous about passing a different location there.
> 
> I assume this means that the other missing warning from
> http://stackoverflow.com/questions/32732281/no-warning-when-returning-null-with-gcc
> (same code but change the return type from void to int)
> is not fixed at the same time?

Nope, I wasn't aware of that one :(.  Maybe we want the
  loc = expansion_point_location_if_in_system_header (loc);
line after all...

Marek


Re: [C PATCH] Fix missing warning (PR c/67730)

2015-09-29 Thread Joseph Myers
On Tue, 29 Sep 2015, Marek Polacek wrote:

> This fixes missing warning for the attached testcase.  In such a case,
> we must use the expansion point location.  I didn't simply add
>   loc = expansion_point_location_if_in_system_header (loc);
> as might be seen elsewhere in the codebase because we pass LOC down to
> convert_for_assignment where many of the warnings are issued and I was
> nervous about passing a different location there.

I suppose that for the convert_for_assignment cases you should warn if the 
user's code is in any way responsible for the issue, which includes if a 
user's function returns a wrong-type macro defined in a system header (or 
for that matter if a system header contains a return but the user chose 
the argument to that return, but that seems much less likely).

> Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?

OK (though followups may be needed for any other issues).

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Fix warnings building pdp11 port

2015-09-29 Thread Jeff Law
The pdp11 port fails to build with the trunk because of a warning. 
Essentially VRP determines that the result of using BRANCH_COST is a 
constant with the range [0..1].  That's always less than 4, 3 and the 
various other magic constants used with BRANCH_COST and VRP issues a 
warning about that comparison.


I expect we're going to be overhauling BRANCH_COST shortly.  In the 
meantime, this just revectors BRANCH_COST for the pdp11 into a function to 
prevent VRP from collapsing the test and issuing the warning.
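
A rough sketch of the pattern, under stated assumptions: the names below are hypothetical, and the real pdp11_branch_cost body is not shown in this message.  The point is only that the cost is obtained through a call instead of a macro that expands to a known constant.

```cpp
#include <cassert>

// Hypothetical tuning value standing in for whatever the target's
// BRANCH_COST macro used to expand to inline.
static int branch_cost_setting = 1;

// Out-of-line accessor, analogous in spirit to a target hook function.
// In a real port this would live in a separate translation unit, so the
// caller cannot see a constant range at the point of the comparison.
int target_branch_cost()
{
  assert(branch_cost_setting >= 0);
  return branch_cost_setting;
}

// Generic code compares the cost against a magic constant.  With an
// inline constant of 0 or 1 this test would be trivially false and
// could draw a warning; behind a call it is an ordinary comparison.
bool prefer_branchless_sequence()
{
  return target_branch_cost() >= 4;
}
```

The trade-off is the one described above: slightly more code in the cross compiler in exchange for not scattering diagnostic pragmas through generic code.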


Yes, this means more code in the pdp11 cross compiler.  I'm not terribly 
concerned about that and I couldn't stand the idea of scattering 
diagnostic push/pop stuff all over the place to make just the pdp11 port 
happy.



Tested by building the pdp11 targets from config-all.mk.

Installed on the trunk.

Jeff


Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Jonathan Wakely

On 29/09/15 17:25 +0200, Jakub Jelinek wrote:

On Tue, Sep 29, 2015 at 04:15:41PM +0100, Jonathan Wakely wrote:

We set errno=0 in __gnu_cxx::__stoa in order to reliably detect when
it gets set to ERANGE. This restores the previous value when the
conversion is successful.

Tested powerpc64le-linux, committed to trunk.



commit 412f75dc37b1048e14996c9caafa46c00db8eb30
Author: Jonathan Wakely 
Date:   Tue Sep 29 15:09:23 2015 +0100

Leave errno unchanged by successful std::stoi etc

* include/ext/string_conversions.h (__stoa): Save and restore errno.
* testsuite/21_strings/basic_string/numeric_conversions/char/errno.cc:
New.

diff --git a/libstdc++-v3/include/ext/string_conversions.h 
b/libstdc++-v3/include/ext/string_conversions.h
index f4648a8..58387a2 100644
--- a/libstdc++-v3/include/ext/string_conversions.h
+++ b/libstdc++-v3/include/ext/string_conversions.h
@@ -58,6 +58,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Ret __ret;

   _CharT* __endptr;
+  const int __saved_errno = errno;
   errno = 0;
   const _TRet __tmp = __convf(__str, &__endptr, __base...);

@@ -70,6 +71,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
std::__throw_out_of_range(__name);
   else
__ret = __tmp;
+  errno = __saved_errno;


That looks wrong to me, you only restore errno if you don't throw :(.
If you throw, then errno might remain 0, which is IMHO undesirable.


My thinking was that a failed conversion that throws an exception
should be allowed to modify errno, and that the second case sets it to
ERANGE sometimes anyway.

But I suppose it would be better to consistently set it to non-zero
when an exception is thrown, or consistently restore the original
value in all cases.


So, I'd say you want to restore it earlier, right after __convf, and
immediately before that copy the current errno to some other temporary
for the use in the condition?  Or restore errno = __saved_errno;
in all the 3 spots instead of just one.


Or in a destructor so it happens however we exit the function, like
this ...


diff --git a/libstdc++-v3/include/ext/string_conversions.h b/libstdc++-v3/include/ext/string_conversions.h
index 58387a2..3b62c9a 100644
--- a/libstdc++-v3/include/ext/string_conversions.h
+++ b/libstdc++-v3/include/ext/string_conversions.h
@@ -58,8 +58,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _Ret __ret;
 
   _CharT* __endptr;
-  const int __saved_errno = errno;
-  errno = 0;
+
+  struct _Restore_errno {
+	  _Restore_errno() : _M_errno(errno) { errno = 0; }
+	  ~_Restore_errno() { errno = _M_errno; }
+	  int _M_errno;
+  } const __restore;
+
   const _TRet __tmp = __convf(__str, &__endptr, __base...);
 
   if (__endptr == __str)
@@ -71,7 +76,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	std::__throw_out_of_range(__name);
   else
 	__ret = __tmp;
-  errno = __saved_errno;
 
   if (__idx)
 	*__idx = __endptr - __str;
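
For illustration, a standalone sketch of the destructor-based approach.  It uses hypothetical names (errno_guard, parse_long) and std::strtol in place of the __convf callback, so it is an analogue of the patch and not the libstdc++ code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>

// Saves errno and clears it on construction; restores the saved value on
// destruction, whether the scope is left by return or by an exception.
struct errno_guard
{
  errno_guard() : saved(errno) { errno = 0; }
  ~errno_guard() { errno = saved; }
  errno_guard(const errno_guard&) = delete;
  errno_guard& operator=(const errno_guard&) = delete;
  int saved;
};

// Hypothetical wrapper: converts TEXT to long, throwing on failure.
long parse_long(const char* text)
{
  assert(text != nullptr);
  const errno_guard guard;
  char* end = nullptr;
  const long value = std::strtol(text, &end, 10);

  if (end == text)
    throw std::invalid_argument("parse_long");
  if (errno == ERANGE)   // still visible here: the guard has not run yet
    throw std::out_of_range("parse_long");
  return value;
}
```

With this shape the caller never observes errno going from non-zero to zero, on either the successful path or the throwing paths.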


Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Jakub Jelinek
On Tue, Sep 29, 2015 at 05:10:20PM +0100, Jonathan Wakely wrote:
> >That looks wrong to me, you only restore errno if you don't throw :(.
> >If you throw, then errno might remain 0, which is IMHO undesirable.
> 
> My thinking was that a failed conversion that throws an exception
> should be allowed to modify errno, and that the second case sets it to
> ERANGE sometimes anyway.

Well, you can modify errno, you just shouldn't change it from non-zero to
zero as far as the user is concerned.

http://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html
"No function in this volume of IEEE Std 1003.1-2001 shall set errno to 0."
Of course, this part of STL is not POSIX, still, as you said, it would be
nice to guarantee the same.
> 
> But I suppose it would be better to consistently set it to non-zero
> when an exception is thrown, or consistently restore the original
> value in all cases.
> 
> >So, I'd say you want to restore it earlier, right after __convf, and
> >immediately before that copy the current errno to some other temporary
> >for the use in the condition?  Or restore errno = __saved_errno;
> >in all the 3 spots instead of just one.
> 
> Or in a destructor so it happens however we exit the function, like
> this ...

Works for me.

Jakub


[PATCH] Fix undefined behaviour in rl78 port

2015-09-29 Thread Jeff Law

And in the rl78 port.  Tested by building the rl78 targets in config-all.mk.

Installed on the trunk.

Jeff
commit 6d8cde85a30e36e5b5842b8d66837a8b4815d197
Author: Jeff Law 
Date:   Mon Sep 28 19:25:04 2015 -0400

[PATCH] Fix undefined behaviour in rl78 port
* config/rl78/rl78-expand.md (movqi): Fix undefined left shift
behaviour.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 1b9985a..79dc89f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@
 2015-09-29  Jeff Law  
 
+   * config/rl78/rl78-expand.md (movqi): Fix undefined left shift
+   behaviour.
+
* config/msp430/msp430.c (msp430_legitimate_constant): Fix undefined
left shift behaviour.
* config/msp430/constraints.md ('L' constraint): Similarly.
diff --git a/gcc/config/rl78/rl78-expand.md b/gcc/config/rl78/rl78-expand.md
index 0335a4d..67e6620 100644
--- a/gcc/config/rl78/rl78-expand.md
+++ b/gcc/config/rl78/rl78-expand.md
@@ -48,7 +48,7 @@
&& ! REG_P (operands[0]))
operands[1] = copy_to_mode_reg (QImode, operands[1]);
 
-if (CONST_INT_P (operands[1]) && ! IN_RANGE (INTVAL (operands[1]), (-1 << 
8) + 1, (1 << 8) - 1))
+if (CONST_INT_P (operands[1]) && ! IN_RANGE (INTVAL (operands[1]), 
(HOST_WIDE_INT_M1U << 8) + 1, (1 << 8) - 1))
   FAIL;
   }
 )


Re: [patch] Leave errno unchanged by successful std::stoi etc

2015-09-29 Thread Martin Sebor

On 09/29/2015 10:15 AM, Jakub Jelinek wrote:

On Tue, Sep 29, 2015 at 05:10:20PM +0100, Jonathan Wakely wrote:

That looks wrong to me, you only restore errno if you don't throw :(.
If you throw, then errno might remain 0, which is IMHO undesirable.


My thinking was that a failed conversion that throws an exception
should be allowed to modify errno, and that the second case sets it to
ERANGE sometimes anyway.


Well, you can modify errno, you just shouldn't change it from non-zero to
zero as far as the user is concerned.

http://pubs.opengroup.org/onlinepubs/009695399/functions/errno.html
"No function in this volume of IEEE Std 1003.1-2001 shall set errno to 0."
Of course, this part of STL is not POSIX, still, as you said, it would be
nice to guarantee the same.


FWIW, I agree. It's a helpful property. If libstdc++ provides
the POSIX/C guarantee it would be nice to document it in the
manual.

That said, this part of the C++ spec (stoi and related) is specified
to such a level of detail that one might argue that the functions
aren't allowed to reset errno in an observable way.

As an aside, I objected to this specification when it was first
proposed, not because of the errno guarantee, but because the
functions were meant to be light-weight, efficient, and certainly
thread-safe means of converting strings to numbers. Specifying
their effects as opposed to their postconditions means that they can't
be implemented independent of strtol and the C locale, which makes
them anything but light-weight, and prone to data races in
programs that call setlocale.

Martin


[PATCH] Fix building microblaze targets with trunk

2015-09-29 Thread Jeff Law
The microblaze port has a "*p++" statement which computes a result that 
is never used (the memory result).  This removes the spurious memory 
dereference and the unused value warning.
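
A small sketch of the same pattern outside the port (skip_dots is a hypothetical helper, not the microblaze function): the increment is the only effect that is wanted, so the dereference can simply be dropped.

```cpp
#include <cassert>

// Advances past any leading '.' characters and returns the new position.
const char* skip_dots(const char* v)
{
  assert(v != nullptr);
  while (*v == '.')
    v++;   // not "*v++;": that would also load *v and discard the value
  return v;
}
```

Both spellings advance the pointer identically; the version without the dereference just avoids computing a value that nothing uses.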


Tested by building the microblaze targets in config-all.mk.

Installed on the trunk.

Jeff
commit b2e58a1a53a3bbba60bd39ce53beb9fd706742f4
Author: Jeff Law 
Date:   Tue Sep 29 11:59:11 2015 -0400

[PATCH] Fix building microblaze targets with trunk
* config/microblaze/microblaze.c (microblaze_version_to_int): Remove
computation of unused value.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 13e930a..8d55423 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,8 @@
 2015-09-29  Jeff Law  
 
+   * config/microblaze/microblaze.c (microblaze_version_to_int): Remove
+   computation of unused value.
+
* config/pdp11/pdp11.c (pdp11_branch_cost): New function.
* config/pdp11/pdp11.h (BRANCH_COST): Call function rather than
inline macro expansion.
diff --git a/gcc/config/microblaze/microblaze.c 
b/gcc/config/microblaze/microblaze.c
index 6e7745a..ebcf65a 100644
--- a/gcc/config/microblaze/microblaze.c
+++ b/gcc/config/microblaze/microblaze.c
@@ -1640,7 +1640,7 @@ microblaze_version_to_int (const char *version)
{   /* Looking for major  */
   if (*p == '.')
 {
-  *v++;
+  v++;
 }
   else
 {


[PATCH] Fix building interix targets

2015-09-29 Thread Jeff Law


I'm resisting the temptation to declare interix dead (it's been tried 
before).  I'm guessing it hasn't built since early 2012.  But the fix is 
trivial enough and it's not like interix needs lots of care and maintenance.


Tested by building the interix targets in config-list.mk.

Installed on the trunk.

Jeff
commit 2cdebba6f51af63ad820568cbd439f296e7c4d82
Author: Jeff Law 
Date:   Tue Sep 29 11:58:51 2015 -0400

[PATCH] Fix building interix targets

* config/i386/t-interix (winnt-stubs.o): Fix compilation rule.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 87de440..68149c4 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,7 @@
 2015-09-29  Jeff Law  
 
+   * config/i386/t-interix (winnt-stubs.o): Fix compilation rule.
+
* config/sh/sh.c (gen_shl_and): Fix undefined left shift
behaviour.
(gen_shl_sext): Likewise.
diff --git a/gcc/config/i386/t-interix b/gcc/config/i386/t-interix
index db35dbe..dd59b85 100644
--- a/gcc/config/i386/t-interix
+++ b/gcc/config/i386/t-interix
@@ -25,6 +25,6 @@ winnt.o: $(srcdir)/config/i386/winnt.c $(CONFIG_H) 
$(SYSTEM_H) coretypes.h \
 winnt-stubs.o: $(srcdir)/config/i386/winnt-stubs.c $(CONFIG_H) $(SYSTEM_H) 
coretypes.h \
   $(TM_H) $(RTL_H) $(REGS_H) hard-reg-set.h output.h $(TREE_H) flags.h \
   $(TM_P_H) toplev.h $(HASHTAB_H) $(GGC_H)
-   $(COMPILER) -c $(ALL_CFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+   $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
$(srcdir)/config/i386/winnt-stubs.c
 


[C PATCH] Fix missing warning (PR c/67730)

2015-09-29 Thread Marek Polacek
This fixes missing warning for the attached testcase.  In such a case,
we must use the expansion point location.  I didn't simply add
  loc = expansion_point_location_if_in_system_header (loc);
as might be seen elsewhere in the codebase because we pass LOC down to
convert_for_assignment where many of the warnings are issued and I was
nervous about passing a different location there.
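
To make the location choice concrete, here is a toy model.  The types and the function name are invented for this sketch and are not GCC's location_t machinery; they only mirror the idea behind expansion_point_location_if_in_system_header.

```cpp
#include <cassert>

// Toy model only: a source position plus a system-header flag.
struct ToyLocation
{
  int line;
  bool in_system_header;
};

// A macro use has two interesting positions.
struct ToyMacroUse
{
  ToyLocation spelling;         // where the macro body was written
  ToyLocation expansion_point;  // where user code expanded the macro
};

// Prefer the expansion point when the spelling sits in a system header,
// so a diagnostic about user code is not suppressed as header noise.
ToyLocation diagnostic_location(const ToyMacroUse& use)
{
  assert(use.expansion_point.line > 0);
  return use.spelling.in_system_header ? use.expansion_point : use.spelling;
}
```

In the testcase, NULL is spelled in a system header but expanded in fn1, which is why the warning has to be reported at the expansion point.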

Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?

2015-09-29  Marek Polacek  

PR c/67730
* c-typeck.c (c_finish_return): Use the expansion point location for
certain "return with value" warnings.

* gcc.dg/pr67730.c: New test.

diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 3b26231..a11ccb2 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -9369,8 +9369,12 @@ c_finish_return (location_t loc, tree retval, tree 
origtype)
   bool npc = false;
   size_t rank = 0;
 
+  /* Use the expansion point to handle cases such as returning NULL
+ in a function returning void.  */
+  source_location xloc = expansion_point_location_if_in_system_header (loc);
+
   if (TREE_THIS_VOLATILE (current_function_decl))
-warning_at (loc, 0,
+warning_at (xloc, 0,
 "function declared %<noreturn%> has a %<return%> statement");
 
   if (flag_cilkplus && contains_array_notation_expr (retval))
@@ -9425,10 +9429,10 @@ c_finish_return (location_t loc, tree retval, tree 
origtype)
 {
   current_function_returns_null = 1;
   if (TREE_CODE (TREE_TYPE (retval)) != VOID_TYPE)
-   pedwarn (loc, 0,
+   pedwarn (xloc, 0,
 "%<return%> with a value, in function returning void");
   else
-   pedwarn (loc, OPT_Wpedantic, "ISO C forbids "
+   pedwarn (xloc, OPT_Wpedantic, "ISO C forbids "
 "%<return%> with expression, in function returning void");
 }
   else
diff --git gcc/testsuite/gcc.dg/pr67730.c gcc/testsuite/gcc.dg/pr67730.c
index e69de29..54d73a6 100644
--- gcc/testsuite/gcc.dg/pr67730.c
+++ gcc/testsuite/gcc.dg/pr67730.c
@@ -0,0 +1,11 @@
+/* PR c/67730 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+#include 
+
+void
+fn1 (void)
+{
+  return NULL; /* { dg-warning "10:.return. with a value" } */
+}

Marek


Re: [Patch,optimization]: Optimized changes in the estimate register pressure cost.

2015-09-29 Thread Pat Haugen

On 09/25/2015 11:51 PM, Ajit Kumar Agarwal wrote:

I have made the following changes in the estimate_reg_pressure_cost function
used by the loop invariant and IVOPTS passes.

Earlier, estimate_reg_pressure_cost used the cost of the n_new variables that
are generated by the loop invariant pass and IVOPTS.  These are not sufficient
for register pressure calculation.  The register pressure cost calculation
should use n_new + n_old to consider the cost, where n_old is the number of
registers used inside the loop.  The effect of the n_new variables on register
pressure depends on how they impact the registers used inside the loop, so the
register-register move cost or the spill cost should consider the cost
associated with both the registers used and the new variables generated.

The increase or decrease in register pressure is therefore based on the
overall cost of n_new + n_old, because the change in pressure caused by the
new variables depends on how they behave with respect to the registers used
in the loop.
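
A toy sketch of the difference being described.  The function names, the threshold and the cost constants are all hypothetical; this is not the actual estimate_reg_pressure_cost code, only an illustration of charging for n_new alone versus for n_new + n_old once the available registers are exceeded.

```cpp
#include <cassert>

// Hypothetical "before": once pressure exceeds the available registers,
// only the newly created variables are charged.
int pressure_cost_new_only(int n_new, int n_old, int available, int spill_cost)
{
  assert(n_new >= 0 && n_old >= 0);
  if (n_new + n_old <= available)
    return 0;
  return spill_cost * n_new;
}

// Hypothetical "after": the charge is based on everything live in the
// loop, i.e. the new variables plus the registers already in use.
int pressure_cost_total(int n_new, int n_old, int available, int spill_cost)
{
  assert(n_new >= 0 && n_old >= 0);
  if (n_new + n_old <= available)
    return 0;
  return spill_cost * (n_new + n_old);
}
```

With 2 new variables, 10 registers already in use, 8 available and a unit spill cost of 5, the first form yields 10 and the second 60, so the second penalises transformations in already crowded loops much more heavily.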

Bootstrap for i386 and reg tested on i386 with the change is fine.

SPEC CPU 2000 benchmarks are run and there is following impact on the 
performance
and code size.

ratio with the optimization vs ratio without optimization for INT benchmarks
(3807.632 vs 3804.661)

ratio with the optimization vs ratio without optimization for FP benchmarks
( 4668.743 vs 4778.741)

Code size reduction with respect to FP SPEC CPU 2000 benchmarks

Number of instruction with optimization = 1094117
Number of instruction without optimization = 1094659

Reduction in number of instruction with the optimization = 542 instruction.

I tried your patch on powerpc64le using CPU2006. There was a small 
degradation in mcf (-1.5%) and small improvement in bwaves (+1.3%), the 
remaining benchmarks (and overall results) were neutral.


-Pat



[PATCH, PR target/67761] Fix i686-*-* bootstrap comparison failure

2015-09-29 Thread Ilya Enkovich
Hi,

My recently introduced STV pass doesn't skip debug instructions, which makes 
the transformation (mostly the cost computation) depend on debug info.  That 
causes a bootstrap comparison failure.  This patch fixes it.  Bootstrapped for 
i686-linux.  Testing for x86_64-unknown-linux-gnu{,m32} is in progress.  OK for 
trunk if it passes?
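
The underlying rule is that code generation decisions must not change when -g is added.  A toy sketch of that rule follows (ToyInsn and chain_gain are invented for illustration and are unrelated to the real scalar_chain class):

```cpp
#include <cassert>
#include <vector>

// Toy instruction: debug insns carry no semantics for code generation.
struct ToyInsn
{
  bool is_debug;
  int gain;
};

// Sums the estimated gain of a chain, ignoring debug insns so the
// result is identical with and without debug info.
int chain_gain(const std::vector<ToyInsn>& chain)
{
  int total = 0;
  for (const ToyInsn& insn : chain)
    if (!insn.is_debug)
      total += insn.gain;
  return total;
}

// Self-check in the spirit of a bootstrap comparison: the same chain
// with interleaved debug insns must produce the same answer.
void check_debug_independence()
{
  const std::vector<ToyInsn> plain = {{false, 3}, {false, -1}};
  const std::vector<ToyInsn> with_debug = {{false, 3}, {true, 100}, {false, -1}};
  assert(chain_gain(plain) == chain_gain(with_debug));
}
```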

Thanks,
Ilya
--
gcc/

2015-09-29  Ilya Enkovich  

* config/i386/i386.c (scalar_chain::analyze_register_chain): Ignore
debug insns.
(scalar_chain::convert_reg): Likewise.

gcc/testsuite/

2015-09-29  Ilya Enkovich  

* gcc.target/i386/pr67761.c: New test.


diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6f2380f..7b3ffb0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2919,6 +2919,10 @@ scalar_chain::analyze_register_chain (bitmap candidates, 
df_ref ref)
   for (chain = DF_REF_CHAIN (ref); chain; chain = chain->next)
 {
   unsigned uid = DF_REF_INSN_UID (chain->ref);
+
+  if (!NONDEBUG_INSN_P (DF_REF_INSN (chain->ref)))
+   continue;
+
   if (!DF_REF_REG_MEM_P (chain->ref))
{
  if (bitmap_bit_p (insns, uid))
@@ -3279,7 +3283,7 @@ scalar_chain::convert_reg (unsigned regno)
bitmap_clear_bit (conv, DF_REF_INSN_UID (ref));
  }
   }
-else
+else if (NONDEBUG_INSN_P (DF_REF_INSN (ref)))
   {
replace_rtx (DF_REF_INSN (ref), reg, scopy);
df_insn_rescan (DF_REF_INSN (ref));
diff --git a/gcc/testsuite/gcc.target/i386/pr67761.c 
b/gcc/testsuite/gcc.target/i386/pr67761.c
new file mode 100644
index 000..9b13d58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr67761.c
@@ -0,0 +1,13 @@
+/* PR target/pr67761 */
+/* { dg-do run { target { ia32 } } } */
+/* { dg-options "-O2 -march=slm -g" } */
+/* { dg-final { scan-assembler "paddq" } } */
+
+void
+test (long long *values, long long val, long long delta)
+{
+  unsigned i;
+
+  for (i = 0; i < 128; i++, val += delta)
+values[i] = val;
+}


Re: [PATCH] remove dead code of commutative_reductions

2015-09-29 Thread Tobias Grosser

On 09/29/2015 06:26 PM, Sebastian Pop wrote:

This code is not used anymore after we removed the previous loop optimizer (not
based on the ISL scheduler.)  We will add back the detection of commutative
reductions after we improve the code generation of scalar dependences (by not
going out of SSA for scalar dependences just to expose them to the data
dependence graph.)

Patch passed bootstrap and check on x86_64-linux with ISL-0.15.
I will commit this patch to trunk.


LGTM.

Regarding the handling of scalars, Polly does this now by only virtually 
modeling
them as memory dependences, but leaving them as registers until code generation.
The final code generation is still done by alloca(ing) a memory slot and then
generating loads/stores from this memory slot. This is significantly easier than
trying to directly generate SSA again.
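
As a source-level analogy only (this is neither Polly nor Graphite code, and the function names are made up), the two forms of a scalar reduction look roughly like this: one keeps the value in a scalar, the other routes every definition and use through an explicit stack slot.

```cpp
#include <cassert>
#include <cstddef>

// Reduction kept in a scalar, as SSA form would carry it in a register.
int sum_in_register(const int* data, std::size_t n)
{
  assert(data != nullptr || n == 0);
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += data[i];
  return sum;
}

// Same reduction with the scalar demoted to a memory slot: every use
// becomes a load and every definition a store, similar in spirit to
// generating code through an alloca'd slot.
int sum_through_slot(const int* data, std::size_t n)
{
  assert(data != nullptr || n == 0);
  int slot[1] = {0};
  for (std::size_t i = 0; i < n; ++i)
    {
      const int loaded = slot[0];   // load
      slot[0] = loaded + data[i];   // store
    }
  return slot[0];
}
```

Later scalar optimizations can usually promote such a slot back to a register, which is part of why the slot-based code generation is the easier route.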

Best,
Tobias


Re: patch to fix PR66424

2015-09-29 Thread Vladimir Makarov

On 09/29/2015 10:23 AM, Matthias Klose wrote:
This was marked as a regression in 5 and 6, but never backported to 
the gcc-5-branch. Is it time to backport?



Thanks for the reminder.  I've just committed the patch to gcc 5 branch.



[PATCH] remove dead code of commutative_reductions

2015-09-29 Thread Sebastian Pop
This code is not used anymore after we removed the previous loop optimizer (not
based on the ISL scheduler.)  We will add back the detection of commutative
reductions after we improve the code generation of scalar dependences (by not
going out of SSA for scalar dependences just to expose them to the data
dependence graph.)

Patch passed bootstrap and check on x86_64-linux with ISL-0.15.
I will commit this patch to trunk.

2015-09-29  Sebastian Pop  
Aditya Kumar  

* graphite-sese-to-poly.c (gsi_for_phi_node): Remove.
(nb_data_writes_in_bb): Remove.
(split_pbb): Remove.
(split_reduction_stmt): Remove.
(is_reduction_operation_p): Remove.
(phi_contains_arg): Remove.
(follow_ssa_with_commutative_ops): Remove.
(detect_commutative_reduction_arg): Remove.
(detect_commutative_reduction_assign): Remove.
(follow_inital_value_to_phi): Remove.
(edge_initial_value_for_loop_phi): Remove.
(initial_value_for_loop_phi): Remove.
(used_outside_reduction): Remove.
(detect_commutative_reduction): Remove.
(translate_scalar_reduction_to_array_for_stmt): Remove.
(remove_phi): Remove.
(dr_indices_valid_in_loop): Remove.
(close_phi_written_to_memory): Remove.
(translate_scalar_reduction_to_array): Remove.
(rewrite_commutative_reductions_out_of_ssa_close_phi): Remove.
(rewrite_commutative_reductions_out_of_ssa_loop): Remove.
(rewrite_commutative_reductions_out_of_ssa): Remove.
(build_poly_scop): Remove call to 
rewrite_commutative_reductions_out_of_ssa.
---
 gcc/graphite-sese-to-poly.c | 602 
 1 file changed, 602 deletions(-)

diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 3b8dd56..26f75e9 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -1919,22 +1919,6 @@ build_scop_drs (scop_p scop)
 build_pbb_drs (pbb);
 }
 
-/* Return a gsi at the position of the phi node STMT.  */
-
-static gphi_iterator
-gsi_for_phi_node (gphi *stmt)
-{
-  gphi_iterator psi;
-  basic_block bb = gimple_bb (stmt);
-
-  for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
-if (stmt == psi.phi ())
-  return psi;
-
-  gcc_unreachable ();
-  return psi;
-}
-
 /* Analyze all the data references of STMTS and add them to the
GBB_DATA_REFS vector of BB.  */
 
@@ -2515,590 +2499,6 @@ nb_pbbs_in_loops (scop_p scop)
   return res;
 }
 
-/* Return the number of data references in BB that write in
-   memory.  */
-
-static int
-nb_data_writes_in_bb (basic_block bb)
-{
-  int res = 0;
-  gimple_stmt_iterator gsi;
-
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-if (gimple_vdef (gsi_stmt (gsi)))
-  res++;
-
-  return res;
-}
-
-/* Splits at STMT the basic block BB represented as PBB in the
-   polyhedral form.  */
-
-static edge
-split_pbb (scop_p scop, poly_bb_p pbb, basic_block bb, gimple *stmt)
-{
-  edge e1 = split_block (bb, stmt);
-  new_pbb_from_pbb (scop, pbb, e1->dest);
-  return e1;
-}
-
-/* Splits STMT out of its current BB.  This is done for reduction
-   statements for which we want to ignore data dependences.  */
-
-static basic_block
-split_reduction_stmt (scop_p scop, gimple *stmt)
-{
-  basic_block bb = gimple_bb (stmt);
-  poly_bb_p pbb = pbb_from_bb (bb);
-  gimple_bb_p gbb = gbb_from_bb (bb);
-  edge e1;
-  int i;
-  data_reference_p dr;
-
-  /* Do not split basic blocks with no writes to memory: the reduction
- will be the only write to memory.  */
-  if (nb_data_writes_in_bb (bb) == 0
-  /* Or if we have already marked BB as a reduction.  */
-  || PBB_IS_REDUCTION (pbb_from_bb (bb)))
-return bb;
-
-  e1 = split_pbb (scop, pbb, bb, stmt);
-
-  /* Split once more only when the reduction stmt is not the only one
- left in the original BB.  */
-  if (!gsi_one_before_end_p (gsi_start_nondebug_bb (bb)))
-{
-  gimple_stmt_iterator gsi = gsi_last_bb (bb);
-  gsi_prev (&gsi);
-  e1 = split_pbb (scop, pbb, bb, gsi_stmt (gsi));
-}
-
-  /* A part of the data references will end in a different basic block
- after the split: move the DRs from the original GBB to the newly
- created GBB1.  */
-  FOR_EACH_VEC_ELT (GBB_DATA_REFS (gbb), i, dr)
-{
-  basic_block bb1 = gimple_bb (DR_STMT (dr));
-
-  if (bb1 != bb)
-   {
- gimple_bb_p gbb1 = gbb_from_bb (bb1);
- GBB_DATA_REFS (gbb1).safe_push (dr);
- GBB_DATA_REFS (gbb).ordered_remove (i);
- i--;
-   }
-}
-
-  return e1->dest;
-}
-
-/* Return true when stmt is a reduction operation.  */
-
-static inline bool
-is_reduction_operation_p (gimple *stmt)
-{
-  enum tree_code code;
-
-  gcc_assert (is_gimple_assign (stmt));
-  code = gimple_assign_rhs_code 

Re: [C PATCH] Fix missing warning (PR c/67730)

2015-09-29 Thread Marc Glisse

On Tue, 29 Sep 2015, Marek Polacek wrote:


This fixes missing warning for the attached testcase.  In such a case,
we must use the expansion point location.  I didn't simply add
 loc = expansion_point_location_if_in_system_header (loc);
as might be seen elsewhere in the codebase because we pass LOC down to
convert_for_assignment where many of the warnings are issued and I was
nervous about passing a different location there.


I assume this means that the other missing warning from
http://stackoverflow.com/questions/32732281/no-warning-when-returning-null-with-gcc
(same code but change the return type from void to int)
is not fixed at the same time?

--
Marc Glisse


[PATCH] Fix undefined behaviour in rx port

2015-09-29 Thread Jeff Law

And the rx port.  Tested by building the rx targets in config-all.mk.

Installed on the trunk.

Jeff
commit 67dd8bdfba4072f24ea1a2bd07ffacc91185ee89
Author: Jeff Law 
Date:   Mon Sep 28 19:25:14 2015 -0400

[PATCH] Fix undefined behaviour in rx port
* config/rx/constraints.md (Int08): Fix undefined left shift
behaviour.
(Sint08, Sint16, Sint24): Likewise.
* config/rx/rx.c (rx_get_stack_layout): Likewise.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 79dc89f..53a52a6 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2015-09-29  Jeff Law  
 
+   * config/rx/constraints.md (Int08): Fix undefined left shift
+   behaviour.
+   (Sint08, Sint16, Sint24): Likewise.
+   * config/rx/rx.c (rx_get_stack_layout): Likewise.
+
* config/rl78/rl78-expand.md (movqi): Fix undefined left shift
behaviour.
 
diff --git a/gcc/config/rx/constraints.md b/gcc/config/rx/constraints.md
index d46f9da..b41c232 100644
--- a/gcc/config/rx/constraints.md
+++ b/gcc/config/rx/constraints.md
@@ -28,28 +28,28 @@
 (define_constraint "Int08"
   "@internal A signed or unsigned 8-bit immediate value"
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, (-1 << 8), (1 << 8) - 1)")
+   (match_test "IN_RANGE (ival, (HOST_WIDE_INT_M1U << 8), (1 << 8) - 1)")
   )
 )
 
 (define_constraint "Sint08"
   "@internal A signed 8-bit immediate value"
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, (-1 << 7), (1 << 7) - 1)")
+   (match_test "IN_RANGE (ival, (HOST_WIDE_INT_M1U << 7), (1 << 7) - 1)")
   )
 )
 
 (define_constraint "Sint16"
   "@internal A signed 16-bit immediate value"
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, (-1 << 15), (1 << 15) - 1)")
+   (match_test "IN_RANGE (ival, (HOST_WIDE_INT_M1U << 15), (1 << 15) - 1)")
   )
 )
 
 (define_constraint "Sint24"
   "@internal A signed 24-bit immediate value"
   (and (match_code "const_int")
-   (match_test "IN_RANGE (ival, (-1 << 23), (1 << 23) - 1)")
+   (match_test "IN_RANGE (ival, (HOST_WIDE_INT_M1U << 23), (1 << 23) - 1)")
   )
 )
 
diff --git a/gcc/config/rx/rx.c b/gcc/config/rx/rx.c
index c68f29e..6d911d2 100644
--- a/gcc/config/rx/rx.c
+++ b/gcc/config/rx/rx.c
@@ -1561,7 +1561,7 @@ rx_get_stack_layout (unsigned int * lowest,
  PUSHM.
 
  FIXME: Is it worth improving this heuristic ?  */
-  pushed_mask = (-1 << low) & ~(-1 << (high + 1));
+  pushed_mask = (HOST_WIDE_INT_M1U << low) & ~(HOST_WIDE_INT_M1U << (high + 1));
   unneeded_pushes = (pushed_mask & (~ save_mask)) & pushed_mask;
 
   if ((fixed_reg && fixed_reg <= high)
@@ -1667,7 +1667,7 @@ ok_for_max_constant (HOST_WIDE_INT val)
 
   /* rx_max_constant_size specifies the maximum number
  of bytes that can be used to hold a signed value.  */
-  return IN_RANGE (val, (-1 << (rx_max_constant_size * 8)),
+  return IN_RANGE (val, (HOST_WIDE_INT_M1U << (rx_max_constant_size * 8)),
( 1 << (rx_max_constant_size * 8)));
 }
 

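For readers following these shift fixes, here is a minimal standalone sketch of the idiom; it is not code from the patch.  It assumes a 64-bit host integer: the names high_bits_mask and fits_signed are hypothetical, and an all-ones uint64_t stands in for HOST_WIDE_INT_M1U.  Left-shifting a negative signed value is undefined in C, so the shift is done on an unsigned all-ones value and the result is converted back to the signed type.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with the low N bits clear and all higher bits set,
   i.e. the value the old code spelled as (-1 << n).  The shift is
   performed on an unsigned all-ones value, which is well defined.
   Converting the result back to a signed type is implementation-
   defined rather than undefined; on two's-complement hosts it gives
   the expected negative value.  */
int64_t
high_bits_mask (unsigned n)
{
  assert (n < 64);
  uint64_t all_ones = UINT64_MAX;   /* stand-in for HOST_WIDE_INT_M1U */
  return (int64_t) (all_ones << n);
}

/* Range check in the style of the Sint08/Sint16/Sint24 constraints:
   a signed BITS-bit immediate lies in [-(2^(BITS-1)), 2^(BITS-1) - 1].  */
int
fits_signed (int64_t value, unsigned bits)
{
  assert (bits >= 1 && bits <= 63);
  int64_t lo = high_bits_mask (bits - 1);
  int64_t hi = (int64_t) ((UINT64_C (1) << (bits - 1)) - 1);
  return value >= lo && value <= hi;
}
```

Under these assumptions fits_signed (-128, 8) holds and fits_signed (128, 8) does not, which is the range a signed 8-bit immediate is expected to cover.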

Patch for PR 66424 has been backported to GCC-5 branch

2015-09-29 Thread Vladimir Makarov

  The following patch has been committed to gcc 5 branch as rev. 228256.

  The patch was bootstrapped and tested on x86/x86-64.


Index: ChangeLog
===
--- ChangeLog	(revision 228250)
+++ ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2015-09-29  Vladimir Makarov  
+
+	Backport from mainline
+	2015-07-21  Vladimir Makarov  
+
+	PR ipa/66424.
+	* lra-remat.c (operand_to_remat): Prevent using insns with input
+	subregs processed separately by IRA.
+
 2015-09-29  Andreas Krebbel  
 
 	Backport from mainline
@@ -31,7 +40,7 @@
 	("vec_scatter_element_SI"): Replace gf mode
 	attribute with bhfgq.
 
-2015-09-29  Andrew Pinski  
+2015-09-29  Andrew Pinski  
 
 	* config/aarch64/aarch64.md (prefetch):
 	Change the predicate of operand 0 to register_operand.
Index: lra-remat.c
===
--- lra-remat.c	(revision 228250)
+++ lra-remat.c	(working copy)
@@ -432,6 +432,16 @@ operand_to_remat (rtx_insn *insn)
 	  return -1;
 	found_reg = reg;
   }
+/* IRA calculates conflicts separately for subregs of two words
+   pseudo.  Even if the pseudo lives, e.g. one its subreg can be
+   used lately, another subreg hard register can be already used
+   for something else.  In such case, it is not safe to
+   rematerialize the insn.  */
+else if (reg->type == OP_IN && reg->subreg_p
+	 && reg->regno >= FIRST_PSEUDO_REGISTER
+	 && (GET_MODE_SIZE (PSEUDO_REGNO_MODE (reg->regno))
+		 == 2 * UNITS_PER_WORD))
+  return -1;
   if (found_reg == NULL)
 return -1;
   if (found_reg->regno < FIRST_PSEUDO_REGISTER)
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 228250)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,11 @@
+2015-09-29  Vladimir Makarov  
+
+	Backport from mainline
+	2015-07-21  Vladimir Makarov  
+
+	PR ipa/66424.
+	* gcc.target/i386/pr66424.c: New.
+
 2015-09-29  Andreas Krebbel  
 
 	Backport from mainline


Re: [PATCH] Fix building microblaze targets with trunk

2015-09-29 Thread Michael Eager

On 09/29/2015 10:01 AM, Jeff Law wrote:

The microblaze port has a "*p++" statement which computes a result that is
never used (the memory result).  This removes the spurious memory dereference
and the unused value warning.

Tested by building the microblaze targets in config-all.mk.

Installed on the trunk.



OK.


--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


[PATCH] Fix undefined behaviour in SH port

2015-09-29 Thread Jeff Law
More left shifts of negative signed values to fix in the SH port.  I'm 
not sure how these were missed last week or if they were introduced 
between the point when I tested last week and yesterday.  Regardless, 
they're fixed in the obvious way.


Tested by building all the sh targets from config-all.mk.

Installed on the trunk.

Jeff
commit d1349379450b8e11dcc7adfe678028b674a63cf1
Author: Jeff Law 
Date:   Mon Sep 28 19:25:20 2015 -0400

[PATCH] Fix undefined behaviour in SH port

* config/sh/sh.c (gen_shl_and): Fix undefined left shift
behaviour.
(gen_shl_sext): Likewise.
* config/sh/sh.md (divsi3): Likewise.
(imm->ext_dest_operand splitter): Likewise.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index cce1ba5..22c09b7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2015-09-29  Jeff Law  
+
+   * config/sh/sh.c (gen_shl_and): Fix undefined left shift
+   behaviour.
+   (gen_shl_sext): Likewise.
+   * config/sh/sh.md (divsi3): Likewise.
+   (imm->ext_dest_operand splitter): Likewise.
+
 2015-09-29  Evandro Menezes  
 
* config/arm/types.md (neon_ldp, neon_ldp_q, neon_stp, neon_stp_q):
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 16fb575..904201b 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -4342,7 +4342,7 @@ gen_shl_and (rtx dest, rtx left_rtx, rtx mask_rtx, rtx source)
 that don't matter.  This way, we might be able to get a shorter
 signed constant.  */
   if (mask & ((HOST_WIDE_INT) 1 << (31 - total_shift)))
-   mask |= (HOST_WIDE_INT) ~0 << (31 - total_shift);
+   mask |= (HOST_WIDE_INT) ((HOST_WIDE_INT_M1U) << (31 - total_shift));
 case 2:
   /* Don't expand fine-grained when combining, because that will
  make the pattern fail.  */
@@ -4626,7 +4626,7 @@ gen_shl_sext (rtx dest, rtx left_rtx, rtx size_rtx, rtx source)
}
   emit_insn (gen_andsi3 (dest, source, GEN_INT ((1 << insize) - 1)));
   emit_insn (gen_xorsi3 (dest, dest, GEN_INT (1 << (insize - 1))));
-  emit_insn (gen_addsi3 (dest, dest, GEN_INT (-1 << (insize - 1))));
+  emit_insn (gen_addsi3 (dest, dest, GEN_INT (HOST_WIDE_INT_M1U << (insize - 1))));
   operands[0] = dest;
   operands[2] = kind == 7 ? GEN_INT (left + 1) : left_rtx;
   gen_shifty_op (ASHIFT, operands);
diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 8a388bc..d758e3b 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -3052,7 +3052,7 @@
  tab_base = force_reg (DImode, tab_base);
}
   if (TARGET_DIVIDE_INV20U)
-   i2p27 = force_reg (DImode, GEN_INT (-2 << 27));
+   i2p27 = force_reg (DImode, GEN_INT ((unsigned HOST_WIDE_INT)-2 << 27));
   else
i2p27 = GEN_INT (0);
   if (TARGET_DIVIDE_INV20U || TARGET_DIVIDE_INV20L)
@@ -7875,7 +7875,7 @@ label:
  break;
}
  /* Try movi / mshflo.l w/ r63.  */
- val2 = val + ((HOST_WIDE_INT) -1 << 32);
+ val2 = val + ((HOST_WIDE_INT) (HOST_WIDE_INT_M1U << 32));
  if ((HOST_WIDE_INT) val2 < 0 && CONST_OK_FOR_I16 (val2))
{
  operands[1] = gen_mshflo_l_di (operands[0], operands[0],
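
The and / xor / add sequence visible in the gen_shl_sext hunk above is the usual branch-free sign extension of a bit field.  As a rough illustration only, here is a standalone sketch of that arithmetic on plain 32-bit integers; sign_extend_low_bits is a hypothetical name, and the sketch works on unsigned values so that no negative value is ever shifted.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low INSIZE bits of SOURCE to 32 bits:
     1. AND keeps only the low INSIZE bits.
     2. XOR flips the field's sign bit.
     3. ADD of a value with all bits from the sign bit upwards set
        (the bit pattern of -1 << (insize - 1)) undoes the flip and
        propagates the sign into the upper bits.
   All arithmetic is unsigned, so it wraps modulo 2^32 and is well
   defined; only the final conversion to int32_t is implementation-
   defined.  */
int32_t
sign_extend_low_bits (uint32_t source, unsigned insize)
{
  assert (insize >= 1 && insize <= 31);
  uint32_t all_ones = UINT32_MAX;
  uint32_t field = source & ((UINT32_C (1) << insize) - 1);
  uint32_t sign_bit = UINT32_C (1) << (insize - 1);
  uint32_t bias = all_ones << (insize - 1);
  return (int32_t) ((field ^ sign_bit) + bias);
}
```

For an 8-bit field, 0xff extends to -1 and 0x7f stays 127; bits above the field are ignored, so 0x1234 yields 0x34.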


Re: [AArch64_be] Fix vtbl[34] and vtbx4

2015-09-29 Thread Christophe Lyon
Ping?


On 15 September 2015 at 18:25, Christophe Lyon
 wrote:
> This patch re-implements vtbl[34] and vtbx4 AdvSIMD intrinsics using
> existing builtins, and fixes the behaviour on aarch64_be.
>
> Tested on aarch64_be-none-elf and aarch64-none-elf using the Foundation Model.
>
> OK?
>
> Christophe.


Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread Mike Stump
On Sep 29, 2015, at 1:59 PM, H.J. Lu  wrote:
> commit f3a6675a8d69d810d2cad0c090a762094a0a8622
> Author: H.J. Lu 
> Date:   Tue Sep 29 13:47:18 2015 -0700
> 
>Define EPILOGUE_USES in i386 so that all preserved registers are used
>by the epilogue of interrupt handler.  Don't explicitly mark BP and SP
>registers as used since they are always used in epilogue.
> 
> Please take a look.

Oh, too bad you didn’t copy it here.  The easiest thing to blow is the addition 
of reload_completed && on the condition:

  /* An interrupt handler must preserve some registers that are
     ordinarily call-clobbered.  */
  if (reload_completed
  && myarch_interrupt_func (current_function_decl)
  && save_reg_p (regno))
return true;

without it, the optimizer will blow chunks all over the place and code-gen will 
not be very good, if it doesn’t.  I’d love this to be shared across all ports; 
it is cryptic and usually test cases are not elaborate enough to find the 
problem.  When we ported a large library to our system that made extensive use 
of complex interrupt routines, the compiler blew chunks.  With lesser code, we 
never even noticed a problem.

[PATCH] use MIN fusion for ISL-14

2015-09-29 Thread Sebastian Pop
This patch fixes PR67754 by reverting an earlier unintended change.
We now generate a much simpler AST for interchange-1.c:

ISL AST generated by ISL:
{
  for (int c1 = 0; c1 <= 1334; c1 += 1) {
S_7(c1);
for (int c3 = 0; c3 <= 1334; c3 += 1)
  S_4(c1, c3);
S_5(c1);
  }
  for (int c1 = 0; c1 <= 1334; c1 += 1)
S_10(c1);
  S_8();
}

Bootstrap and check pass on x86_64-linux with isl-0.14.1

  PR tree-optimization/67754
  * graphite-optimize-isl.c (optimize_isl): Call
  isl_options_set_schedule_fuse with ISL_SCHEDULE_FUSE_MIN for ISL-14.
---
 gcc/graphite-optimize-isl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/graphite-optimize-isl.c b/gcc/graphite-optimize-isl.c
index 4b82174..512c64c 100644
--- a/gcc/graphite-optimize-isl.c
+++ b/gcc/graphite-optimize-isl.c
@@ -327,9 +327,10 @@ optimize_isl (scop_p scop)
   isl_options_set_schedule_max_constant_term (scop->ctx, CONSTANT_BOUND);
   isl_options_set_schedule_maximize_band_depth (scop->ctx, 1);
 #ifdef HAVE_ISL_OPTIONS_SET_SCHEDULE_SERIALIZE_SCCS
+  /* ISL-0.15 or later.  */
   isl_options_set_schedule_serialize_sccs (scop->ctx, 1);
 #else
-  isl_options_set_schedule_fuse (scop->ctx, ISL_SCHEDULE_FUSE_MAX);
+  isl_options_set_schedule_fuse (scop->ctx, ISL_SCHEDULE_FUSE_MIN);
 #endif
 
 #ifdef HAVE_ISL_SCHED_CONSTRAINTS_COMPUTE_SCHEDULE
-- 
2.1.0.243.g30d45f7



Re: [google][gcc-4_9] Remove unused key field in gcov_fn_info

2015-09-29 Thread Xinliang David Li
   else
 {
   gfi_ptr = gi_ptr->functions[f_ix];
-  if (gfi_ptr && gfi_ptr->key == gi_ptr)
+  if (gfi_ptr)
 length = GCOV_TAG_FUNCTION_LENGTH;
-  else
-length = 0;
 }

The removal of 'else' path seems wrong.

David
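
To make the concern about the dropped 'else' concrete, here is a small standalone sketch, not the libgcov code itself.  TAG_FUNCTION_LENGTH and total_length are hypothetical stand-ins, and the sketch initialises length once before the loop purely so that its behaviour is well defined; if the real variable has no other initialisation, a NULL slot would read an indeterminate value instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for GCOV_TAG_FUNCTION_LENGTH.  */
#define TAG_FUNCTION_LENGTH 3

/* Sum the record lengths that would be written for N function slots.
   A NULL slot should contribute a length of 0.  When KEEP_ELSE is
   zero the function mimics the hunk quoted above: LENGTH is assigned
   only on the non-NULL path, so a NULL slot reuses whatever value the
   previous iteration left behind.  */
unsigned
total_length (const int *const *slots, size_t n, int keep_else)
{
  unsigned total = 0;
  unsigned length = 0;   /* initialised only to keep the sketch defined */

  for (size_t i = 0; i < n; i++)
    {
      if (slots[i])
        length = TAG_FUNCTION_LENGTH;
      else if (keep_else)
        length = 0;
      total += length;
    }
  return total;
}
```

With two real entries and two NULL slots, keeping the else branch gives a total of 6, while dropping it gives 12 because the NULL slots inherit the previous length.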


On Tue, Sep 29, 2015 at 1:46 PM, Rong Xu  wrote:
> Hi,
>
> This patch is for google/gcc-4_9 branch.
>
> The 'key' field in gcov_fn_info is designed to allow gcov function
> data to be COMDATTed, but the comdat elimination never works. This
> patch removes this field to reduce the instrumented object size.
>
> Thanks,
>
> -Rong


Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread H.J. Lu
On Tue, Sep 29, 2015 at 2:12 PM, Mike Stump  wrote:
> On Sep 29, 2015, at 1:16 PM, H.J. Lu  wrote:
>> On Tue, Sep 29, 2015 at 11:49 AM, Mike Stump  wrote:
>>> To be feature complete, it would be nice to have two styles of interrupt 
>>> functions, one that returns with iret, and one that returns with ret.  The 
>>> point is that the user might want to call functions from a interrupt 
>>> handler and not save and restore all call clobbered registers.  By allowing 
>>> a ret style interrupt handler, calls to a ret style interrupt routine can 
>>> avoid saving and restoring all call clobbered registers.
>>
>> Do you have a testcase for this?  I think the current implementation
>> covers most use cases.
>
> When I wrote my interrupt support for my cpu, I ran these through the code 
> generator…  I have many registers, and noticed saving and restoring them all 
> just because two interrupt handlers used the same routine was silly.  Test 
> case is trivial:
>
> interrupt void foo2() {
>   bar();
> }
>
> interrupt void foo1() {
>   bar();
> }
>
> if more than 1-2 registers are saved, then likely it is saving all call used 
> registers.  Saving all means that one cannot use functions to compose 
> semantics and attain performance.  Performance of ISR routines I think is 
> useful to shoot for, given that it is easy enough to attain, I don’t see the 
> harm in doing that.  Even if in the first implementation you don’t bother 
> with performance, if you spec the other function, the user code need never 
> change; and when performance does matter, it is then a mere matter of 
> enhancing the code gen to do the right thing.  It is pretty easy to get most 
> of the benefit without much work.  i call the main interrupt function 
> interrupt, and the recursive (ret style), I call interruptr.  The r is for 
> recursive.

I added:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66960#c3

Thanks.

-- 
H.J.


[google][gcc-4_9] Remove unused key field in gcov_fn_info

2015-09-29 Thread Rong Xu
Hi,

This patch is for google/gcc-4_9 branch.

The 'key' field in gcov_fn_info is designed to allow gcov function
data to be COMDATTed, but the comdat elimination never works. This
patch removes this field to reduce the instrumented object size.

Thanks,

-Rong
Removed the unused 'key' field in gcov_fn_info to reduce the 
instrumented objects size.

2015-09-29  Rong Xu  

* gcc/coverage.c (build_fn_info_type): Remove 'key'
field. (build_fn_info): Ditto.
(coverage_obj_fn): Ditto.
* libgcc/libgcov.h (struct gcov_fn_info): Ditto.
* libgcc/libgcov-driver.c (gcov_compute_histogram): Ditto.
(gcov_exit_compute_summary): Ditto.
(gcov_exit_merge_gcda): Ditto.
(gcov_write_func_counters): Ditto.
(gcov_clear): Ditto.
* libgcc/libgcov-util.c (tag_function): Ditto.
(gcov_merge): Ditto.
(gcov_profile_scale): Ditto.
(gcov_profile_normalize): Ditto.
(compute_one_gcov): Ditto.
(gcov_info_count_all_cold): Ditto.

Index: gcc/coverage.c
===
--- gcc/coverage.c  (revision 228223)
+++ gcc/coverage.c  (working copy)
@@ -189,7 +189,7 @@ static void read_counts_file (const char *, unsign
 static tree build_var (tree, tree, int);
 static void build_fn_info_type (tree, unsigned, tree);
 static void build_info_type (tree, tree);
-static tree build_fn_info (const struct coverage_data *, tree, tree);
+static tree build_fn_info (const struct coverage_data *, tree);
 static tree build_info (tree, tree);
 static bool coverage_obj_init (void);
 static vec *coverage_obj_fn
@@ -1668,16 +1668,9 @@ build_fn_info_type (tree type, unsigned counters,
 
   finish_builtin_struct (ctr_info, "__gcov_ctr_info", fields, NULL_TREE);
 
-  /* key */
-  field = build_decl (BUILTINS_LOCATION, FIELD_DECL, NULL_TREE,
- build_pointer_type (build_qualified_type
- (gcov_info_type, TYPE_QUAL_CONST)));
-  fields = field;
-
   /* ident */
   field = build_decl (BUILTINS_LOCATION, FIELD_DECL, NULL_TREE,
  get_gcov_unsigned_t ());
-  DECL_CHAIN (field) = fields;
   fields = field;
 
   /* lineno_checksum */
@@ -1705,10 +1698,10 @@ build_fn_info_type (tree type, unsigned counters,
 
 /* Returns a CONSTRUCTOR for a gcov_fn_info.  DATA is
the coverage data for the function and TYPE is the gcov_fn_info
-   RECORD_TYPE.  KEY is the object file key.  */
+   RECORD_TYPE.  */
 
 static tree
-build_fn_info (const struct coverage_data *data, tree type, tree key)
+build_fn_info (const struct coverage_data *data, tree type)
 {
   tree fields = TYPE_FIELDS (type);
   tree ctr_type;
@@ -1716,11 +1709,6 @@ static tree
   vec *v1 = NULL;
   vec *v2 = NULL;
 
-  /* key */
-  CONSTRUCTOR_APPEND_ELT (v1, fields,
- build1 (ADDR_EXPR, TREE_TYPE (fields), key));
-  fields = DECL_CHAIN (fields);
-  
   /* ident */
   CONSTRUCTOR_APPEND_ELT (v1, fields,
  build_int_cstu (get_gcov_unsigned_t (),
@@ -2556,7 +2544,7 @@ static vec *
 coverage_obj_fn (vec *ctor, tree fn,
 struct coverage_data const *data)
 {
-  tree init = build_fn_info (data, gcov_fn_info_type, gcov_info_var);
+  tree init = build_fn_info (data, gcov_fn_info_type);
   tree var = build_var (fn, gcov_fn_info_type, -1);
   
   DECL_INITIAL (var) = init;
Index: libgcc/libgcov-driver.c
===
--- libgcc/libgcov-driver.c (revision 227984)
+++ libgcc/libgcov-driver.c (working copy)
@@ -380,7 +380,7 @@ gcov_compute_histogram (struct gcov_summary *sum)
 {
   gfi_ptr = gi_ptr->functions[f_ix];
 
-  if (!gfi_ptr || gfi_ptr->key != gi_ptr)
+  if (!gfi_ptr)
 continue;
 
   ci_ptr = &gfi_ptr->ctrs[ctr_info_ix];
@@ -430,9 +430,6 @@ gcov_exit_compute_summary (struct gcov_summary *th
 {
   gfi_ptr = gi_ptr->functions[f_ix];
 
-  if (gfi_ptr && gfi_ptr->key != gi_ptr)
-gfi_ptr = 0;
-
   crc32 = crc32_unsigned (crc32, gfi_ptr ? gfi_ptr->cfg_checksum : 0);
   crc32 = crc32_unsigned (crc32,
   gfi_ptr ? gfi_ptr->lineno_checksum : 0);
@@ -688,7 +685,7 @@ gcov_exit_merge_gcda (struct gcov_info *gi_ptr,
   if (length != GCOV_TAG_FUNCTION_LENGTH)
 goto read_mismatch;
 
-  if (!gfi_ptr || gfi_ptr->key != gi_ptr)
+  if (!gfi_ptr)
 {
   /* This function appears in the other program.  We
  need to buffer the information in order to write
@@ -832,10 +829,8 @@ gcov_write_func_counters (struct gcov_info *gi_ptr
   else
 {
   gfi_ptr = gi_ptr->functions[f_ix];
-  if (gfi_ptr && gfi_ptr->key == gi_ptr)
+  if 

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread H.J. Lu
On Tue, Sep 29, 2015 at 3:19 PM, H.J. Lu  wrote:
> On Tue, Sep 29, 2015 at 2:12 PM, Mike Stump  wrote:
>> On Sep 29, 2015, at 1:16 PM, H.J. Lu  wrote:
>>> On Tue, Sep 29, 2015 at 11:49 AM, Mike Stump  wrote:
>>>> To be feature complete, it would be nice to have two styles of interrupt 
>>>> functions, one that returns with iret, and one that returns with ret.  The 
>>>> point is that the user might want to call functions from a interrupt 
>>>> handler and not save and restore all call clobbered registers.  By 
>>>> allowing a ret style interrupt handler, calls to a ret style interrupt 
>>>> routine can avoid saving and restoring all call clobbered registers.
>>>
>>> Do you have a testcase for this?  I think the current implementation
>>> covers most use cases.
>>
>> When I wrote my interrupt support for my cpu, I ran these through the code 
>> generator…  I have many registers, and noticed saving and restoring them all 
>> just because two interrupt handlers used the same routine was silly.  Test 
>> case is trivial:
>>
>> interrupt void foo2() {
>>   bar();
>> }
>>
>> interrupt void foo1() {
>>   bar();
>> }
>>
>> if more than 1-2 registers are saved, then likely it is saving all call used 
>> registers.  Saving all means that one cannot use functions to compose 
>> semantics and attain performance.  Performance of ISR routines I think is 
>> useful to shoot for, given that it is easy enough to attain, I don’t see the 
>> harm in doing that.  Even if in the first implementation you don’t bother 
>> with performance, if you spec the other function, the user code need never 
>> change; and when performance does matter, it is then a mere matter of 
>> enhancing the code gen to do the right thing.  It is pretty easy to get most 
>> of the benefit without much work.  i call the main interrupt function 
>> interrupt, and the recursive (ret style), I call interruptr.  The r is for 
>> recursive.
>
> I added:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66960#c3
>

How about adding a "no_caller_saved_registers" attribute?

-- 
H.J.


Re: [google][gcc-4_9] Remove unused key field in gcov_fn_info

2015-09-29 Thread Xinliang David Li
ok.

David

On Tue, Sep 29, 2015 at 4:08 PM, Rong Xu  wrote:
> You are right. I attached the updated patch to this email.
>
> On Tue, Sep 29, 2015 at 3:10 PM, Xinliang David Li  wrote:
>>else
>>  {
>>gfi_ptr = gi_ptr->functions[f_ix];
>> -  if (gfi_ptr && gfi_ptr->key == gi_ptr)
>> +  if (gfi_ptr)
>>  length = GCOV_TAG_FUNCTION_LENGTH;
>> -  else
>> -length = 0;
>>  }
>>
>> The removal of 'else' path seems wrong.
>>
>> David
>>
>>
>> On Tue, Sep 29, 2015 at 1:46 PM, Rong Xu  wrote:
>>> Hi,
>>>
>>> This patch is for google/gcc-4_9 branch.
>>>
>>> The 'key' field in gcov_fn_info is designed to allow gcov function
>>> data to be COMDATTed, but the comdat elimination never works. This
>>> patch removes this field to reduce the instrumented object size.
>>>
>>> Thanks,
>>>
>>> -Rong


Re: [testsuite] Fix order of dg-do and dg-require-effective-target directives

2015-09-29 Thread Mike Stump
On Sep 29, 2015, at 1:29 PM, Christophe Lyon  wrote:
> The attached patch fixes the order on the few testcases where I
> noticed it was wrong.

> OK?

Ok.


Re: [google][gcc-4_9] Remove unused key field in gcov_fn_info

2015-09-29 Thread Rong Xu
You are right. I attached the updated patch to this email.

On Tue, Sep 29, 2015 at 3:10 PM, Xinliang David Li  wrote:
>else
>  {
>gfi_ptr = gi_ptr->functions[f_ix];
> -  if (gfi_ptr && gfi_ptr->key == gi_ptr)
> +  if (gfi_ptr)
>  length = GCOV_TAG_FUNCTION_LENGTH;
> -  else
> -length = 0;
>  }
>
> The removal of 'else' path seems wrong.
>
> David
>
>
> On Tue, Sep 29, 2015 at 1:46 PM, Rong Xu  wrote:
>> Hi,
>>
>> This patch is for google/gcc-4_9 branch.
>>
>> The 'key' field in gcov_fn_info is designed to allow gcov function
>> data to be COMDATTed, but the comdat elimination never works. This
>> patch removes this field to reduce the instrumented object size.
>>
>> Thanks,
>>
>> -Rong
Removed the unused 'key' field in gcov_fn_info to reduce the 
instrumented objects size.

2015-09-29  Rong Xu  

* gcc/coverage.c (build_fn_info_type): Remove 'key'
field. (build_fn_info): Ditto.
(coverage_obj_fn): Ditto.
* libgcc/libgcov.h (struct gcov_fn_info): Ditto.
* libgcc/libgcov-driver.c (gcov_compute_histogram): Ditto.
(gcov_exit_compute_summary): Ditto.
(gcov_exit_merge_gcda): Ditto.
(gcov_write_func_counters): Ditto.
(gcov_clear): Ditto.
* libgcc/libgcov-util.c (tag_function): Ditto.
(gcov_merge): Ditto.
(gcov_profile_scale): Ditto.
(gcov_profile_normalize): Ditto.
(compute_one_gcov): Ditto.
(gcov_info_count_all_cold): Ditto.

Index: gcc/coverage.c
===
--- gcc/coverage.c  (revision 228223)
+++ gcc/coverage.c  (working copy)
@@ -189,7 +189,7 @@ static void read_counts_file (const char *, unsign
 static tree build_var (tree, tree, int);
 static void build_fn_info_type (tree, unsigned, tree);
 static void build_info_type (tree, tree);
-static tree build_fn_info (const struct coverage_data *, tree, tree);
+static tree build_fn_info (const struct coverage_data *, tree);
 static tree build_info (tree, tree);
 static bool coverage_obj_init (void);
 static vec *coverage_obj_fn
@@ -1668,16 +1668,9 @@ build_fn_info_type (tree type, unsigned counters,
 
   finish_builtin_struct (ctr_info, "__gcov_ctr_info", fields, NULL_TREE);
 
-  /* key */
-  field = build_decl (BUILTINS_LOCATION, FIELD_DECL, NULL_TREE,
- build_pointer_type (build_qualified_type
- (gcov_info_type, TYPE_QUAL_CONST)));
-  fields = field;
-
   /* ident */
   field = build_decl (BUILTINS_LOCATION, FIELD_DECL, NULL_TREE,
  get_gcov_unsigned_t ());
-  DECL_CHAIN (field) = fields;
   fields = field;
 
   /* lineno_checksum */
@@ -1705,10 +1698,10 @@ build_fn_info_type (tree type, unsigned counters,
 
 /* Returns a CONSTRUCTOR for a gcov_fn_info.  DATA is
the coverage data for the function and TYPE is the gcov_fn_info
-   RECORD_TYPE.  KEY is the object file key.  */
+   RECORD_TYPE.  */
 
 static tree
-build_fn_info (const struct coverage_data *data, tree type, tree key)
+build_fn_info (const struct coverage_data *data, tree type)
 {
   tree fields = TYPE_FIELDS (type);
   tree ctr_type;
@@ -1716,11 +1709,6 @@ static tree
   vec *v1 = NULL;
   vec *v2 = NULL;
 
-  /* key */
-  CONSTRUCTOR_APPEND_ELT (v1, fields,
- build1 (ADDR_EXPR, TREE_TYPE (fields), key));
-  fields = DECL_CHAIN (fields);
-  
   /* ident */
   CONSTRUCTOR_APPEND_ELT (v1, fields,
  build_int_cstu (get_gcov_unsigned_t (),
@@ -2556,7 +2544,7 @@ static vec *
 coverage_obj_fn (vec *ctor, tree fn,
 struct coverage_data const *data)
 {
-  tree init = build_fn_info (data, gcov_fn_info_type, gcov_info_var);
+  tree init = build_fn_info (data, gcov_fn_info_type);
   tree var = build_var (fn, gcov_fn_info_type, -1);
   
   DECL_INITIAL (var) = init;
Index: libgcc/libgcov-driver.c
===
--- libgcc/libgcov-driver.c (revision 227984)
+++ libgcc/libgcov-driver.c (working copy)
@@ -380,7 +380,7 @@ gcov_compute_histogram (struct gcov_summary *sum)
 {
   gfi_ptr = gi_ptr->functions[f_ix];
 
-  if (!gfi_ptr || gfi_ptr->key != gi_ptr)
+  if (!gfi_ptr)
 continue;
 
   ci_ptr = &gfi_ptr->ctrs[ctr_info_ix];
@@ -430,9 +430,6 @@ gcov_exit_compute_summary (struct gcov_summary *th
 {
   gfi_ptr = gi_ptr->functions[f_ix];
 
-  if (gfi_ptr && gfi_ptr->key != gi_ptr)
-gfi_ptr = 0;
-
   crc32 = crc32_unsigned (crc32, gfi_ptr ? gfi_ptr->cfg_checksum : 0);
   crc32 = crc32_unsigned (crc32,
   gfi_ptr ? gfi_ptr->lineno_checksum : 0);
@@ 

[patch committed FT32] Limit MEMSET, MEMCPY to <512 bytes

2015-09-29 Thread James Bowman
The attached patch limits the MEMSET (setmemsi pattern) and MEMCPY
(movmemsi pattern) instructions to 0-511 bytes. There is a hardware
limitation on large MEMSET, MEMCPY operations that the library versions
of memset() and memcpy() deal with.

[gcc]

2015-09-29  James Bowman  

* config/ft32/predicates.md (ft32_imm_operand): New predicate.
* config/ft32/ft32.md (movmemsi, setmemsi): Use ft32_imm_operand
predicate, disallow register for operand 2.

Index: gcc/config/ft32/ft32.md
===
--- gcc/config/ft32/ft32.md (revision 228109)
+++ gcc/config/ft32/ft32.md (working copy)
@@ -841,19 +841,19 @@
 )
 
 (define_insn "movmemsi"
-  [(set (match_operand:BLK 0 "memory_operand" "=W,W,BW")
-(match_operand:BLK 1 "memory_operand" "W,W,BW"))
-(use (match_operand:SI 2 "ft32_rimm_operand" "r,KA,rKA"))
+  [(set (match_operand:BLK 0 "memory_operand" "=W,BW")
+(match_operand:BLK 1 "memory_operand" "W,BW"))
+(use (match_operand:SI 2 "ft32_imm_operand" "KA,KA"))
 (use (match_operand:SI 3))
]
   ""
-  "memcpy.%d3 %b0,%b1,%2 # %3!"
+  "memcpy.%d3 %b0,%b1,%2 "
 )
 
 (define_insn "setmemsi"
-  [(set (match_operand:BLK 0 "memory_operand" "=BW,BW") (unspec:BLK [
- (use (match_operand:QI 2 "register_operand" "r,r"))
- (use (match_operand:SI 1 "ft32_rimm_operand" "r,KA"))
+  [(set (match_operand:BLK 0 "memory_operand" "=BW") (unspec:BLK [
+ (use (match_operand:QI 2 "register_operand" "r"))
+ (use (match_operand:SI 1 "ft32_imm_operand" "KA"))
] UNSPEC_SETMEM))
(use (match_operand:SI 3))
]
Index: gcc/config/ft32/predicates.md
===
--- gcc/config/ft32/predicates.md   (revision 228109)
+++ gcc/config/ft32/predicates.md   (working copy)
@@ -80,6 +80,10 @@
(and (match_code "const_int")
 (match_test "IN_RANGE (INTVAL (op), -512, 511)")))
 
+(define_predicate "ft32_imm_operand"
+  (and (match_code "const_int")
+   (match_test "IN_RANGE (INTVAL (op), -512, 511)")))
+
 (define_predicate "ft32_bwidth_operand"
   (and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 1, 16)")))
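
As a side note on the predicate, here is a hedged standalone sketch of an inclusive range test in the style of IN_RANGE, written as a single unsigned comparison.  The function names are hypothetical, and the single-comparison form reflects my understanding of how such a macro is typically written rather than a quotation of GCC's definition.  Note that the bounds below are the ones in the predicate (-512 to 511), which are wider than the 0-511 byte range mentioned in the description.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range test done with one unsigned comparison.  Subtracting
   LOWER maps [LOWER, UPPER] onto [0, UPPER - LOWER]; any value below
   LOWER wraps around to a very large unsigned number and fails the
   comparison.  */
int
in_range (int64_t value, int64_t lower, int64_t upper)
{
  assert (lower <= upper);
  return (uint64_t) value - (uint64_t) lower
         <= (uint64_t) upper - (uint64_t) lower;
}

/* Hypothetical helper mirroring the bounds of the new
   ft32_imm_operand predicate.  */
int
ft32_imm_ok (int64_t value)
{
  return in_range (value, -512, 511);
}
```

Sizes outside this range would then not match the insn pattern, which, per the description above, leaves them to the library memset() and memcpy().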


Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread Mike Stump
On Sep 29, 2015, at 3:10 PM, H.J. Lu  wrote:
> On Tue, Sep 29, 2015 at 2:23 PM, Mike Stump  wrote:
>> On Sep 29, 2015, at 1:59 PM, H.J. Lu  wrote:
>>> commit f3a6675a8d69d810d2cad0c090a762094a0a8622
>>> Author: H.J. Lu 
>>> Date:   Tue Sep 29 13:47:18 2015 -0700
>>> 
>>>   Define EPILOGUE_USES in i386 so that all preserved registers are used
>>>   by the epilogue of interrupt handler.  Don't explicitly mark BP and SP
>>>   registers as used since they are always used in epilogue.
>>> 
>>> Please take a look.
>> 
>> Oh, too bad you didn’t copy it here.  The easiest thing to blow is the 
>> addition of reload_completed && on the condition


> static bool
> ix86_save_reg (unsigned int regno, bool maybe_eh_return)
> {
>  /* In interrupt handler, we don't preserve MMX and x87 registers
> which aren't supported when saving and restoring registers.  No
> need to preserve callee-saved registers unless they are modified.
> We also preserve all caller-saved registers if a function call
> is made in interrupt handler since the called function may change
> them.  Don't explicitly save BP and SP registers since they are
> always preserved.  */
>  if (cfun->machine->is_interrupt)
>return ((df_regs_ever_live_p (regno)
> || (call_used_regs[regno] && cfun->machine->make_calls))
>&& !fixed_regs[regno]
>&& !STACK_REGNO_P (regno)
>&& !MMX_REGNO_P (regno)
>&& regno != BP_REG
>&& regno != SP_REG
>&& (regno <= ST7_REG || regno >= XMM0_REG));
> 
> Is this sufficient?

I see no string "reload_completed &&".  Either you need it here, or you need 
it in the caller.

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread H.J. Lu
On Tue, Sep 29, 2015 at 4:53 PM, Mike Stump  wrote:
> On Sep 29, 2015, at 3:10 PM, H.J. Lu  wrote:
>> On Tue, Sep 29, 2015 at 2:23 PM, Mike Stump  wrote:
>>> On Sep 29, 2015, at 1:59 PM, H.J. Lu  wrote:
>>>> commit f3a6675a8d69d810d2cad0c090a762094a0a8622
>>>> Author: H.J. Lu 
>>>> Date:   Tue Sep 29 13:47:18 2015 -0700
>>>>
>>>>   Define EPILOGUE_USES in i386 so that all preserved registers are used
>>>>   by the epilogue of interrupt handler.  Don't explicitly mark BP and SP
>>>>   registers as used since they are always used in epilogue.
>>>>
>>>> Please take a look.
>>>
>>> Oh, too bad you didn’t copy it here.  The easiest thing to blow is the 
>>> addition of reload_completed && on the condition
>
>
>> static bool
>> ix86_save_reg (unsigned int regno, bool maybe_eh_return)
>> {
>>  /* In interrupt handler, we don't preserve MMX and x87 registers
>> which aren't supported when saving and restoring registers.  No
>> need to preserve callee-saved registers unless they are modified.
>> We also preserve all caller-saved registers if a function call
>> is made in interrupt handler since the called function may change
>> them.  Don't explicitly save BP and SP registers since they are
>> always preserved.  */
>>  if (cfun->machine->is_interrupt)
>>return ((df_regs_ever_live_p (regno)
>> || (call_used_regs[regno] && cfun->machine->make_calls))
>>&& !fixed_regs[regno]
>>&& !STACK_REGNO_P (regno)
>>&& !MMX_REGNO_P (regno)
>>&& regno != BP_REG
>>&& regno != SP_REG
>>&& (regno <= ST7_REG || regno >= XMM0_REG));
>>
>> Is this sufficient?
>
> I see no string "reload_completed &&”.  Either, you need it here, or, you 
> need it in the caller.

Do you have a testcase to show its impact?

Thanks.

-- 
H.J.


Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread Mike Stump
On Sep 29, 2015, at 2:23 PM, Mike Stump  wrote:
> On Sep 29, 2015, at 1:59 PM, H.J. Lu  wrote:
>> commit f3a6675a8d69d810d2cad0c090a762094a0a8622
>> Author: H.J. Lu 
>> Date:   Tue Sep 29 13:47:18 2015 -0700
>> 
>>   Define EPILOGUE_USES in i386

>> Please take a look.

Oh, and with that, I don’t think one needs the generated USEs anymore.

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread H.J. Lu
On Tue, Sep 29, 2015 at 2:23 PM, Mike Stump  wrote:
> On Sep 29, 2015, at 1:59 PM, H.J. Lu  wrote:
>> commit f3a6675a8d69d810d2cad0c090a762094a0a8622
>> Author: H.J. Lu 
>> Date:   Tue Sep 29 13:47:18 2015 -0700
>>
>>Define EPILOGUE_USES in i386 so that all preserved registers are used
>>by the epilogue of interrupt handler.  Don't explicitly mark BP and SP
>>registers as used since they are always used in epilogue.
>>
>> Please take a look.
>
> Oh, too bad you didn’t copy it here.  The easiest thing to blow is the 
> addition of reload_completed && on the condition:
>
>   /* An interrupt handler must preserve some registers that are
>  ordinarily call-clobbered.  */
>   if (reload_completed
>   && myarch_interrupt_func (current_function_decl)
>   && save_reg_p (regno))
> return true;
>
> Without it, the optimizer will blow chunks all over the place, and code-gen 
> will not be very good even if it doesn’t.  I’d love this to be shared across all 
> ports, as it is cryptic and usually test cases are not elaborate enough to 
> find the problem.  When we ported a large library to our system that made 
> extensive use of complex interrupt routines, the compiler blew chunks.  With 
> lesser code, we never even noticed a problem.

We have

static bool
ix86_save_reg (unsigned int regno, bool maybe_eh_return)
{
  /* In interrupt handler, we don't preserve MMX and x87 registers
 which aren't supported when saving and restoring registers.  No
 need to preserve callee-saved registers unless they are modified.
 We also preserve all caller-saved registers if a function call
 is made in interrupt handler since the called function may change
 them.  Don't explicitly save BP and SP registers since they are
 always preserved.  */
  if (cfun->machine->is_interrupt)
return ((df_regs_ever_live_p (regno)
 || (call_used_regs[regno] && cfun->machine->make_calls))
&& !fixed_regs[regno]
&& !STACK_REGNO_P (regno)
&& !MMX_REGNO_P (regno)
&& regno != BP_REG
&& regno != SP_REG
&& (regno <= ST7_REG || regno >= XMM0_REG));

Is this sufficient?



-- 
H.J.


RE: [PATCH] use MIN fusion for ISL-14

2015-09-29 Thread Sebastian Paul Pop
The AST looked very bad: see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67754#c2
There were a lot of testcases failing with MAX fusion.

Sebastian

-Original Message-
From: Tobias Grosser [mailto:tob...@grosser.es] 
Sent: Tuesday, September 29, 2015 5:14 PM
To: Sebastian Pop; gcc-patches@gcc.gnu.org
Cc: seb...@gmail.com; aditya...@samsung.com; richard.guent...@gmail.com
Subject: Re: [PATCH] use MIN fusion for ISL-14

On 09/30/2015 12:10 AM, Sebastian Pop wrote:
> This patch fixes PR66754 by reverting an earlier unintended change.
> We now generate a much simpler AST for interchange-1.c:
>
> ISL AST generated by ISL:
> {
>for (int c1 = 0; c1 <= 1334; c1 += 1) {
>  S_7(c1);
>  for (int c3 = 0; c3 <= 1334; c3 += 1)
>S_4(c1, c3);
>  S_5(c1);
>}
>for (int c1 = 0; c1 <= 1334; c1 += 1)
>  S_10(c1);
>S_8();
> }

Sure. Out of interest. How did the AST look before? Also, does it
make sense to add a test case?

Best,
Tobias



Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread Mike Stump
On Sep 29, 2015, at 3:40 PM, H.J. Lu  wrote:
> How about adding a "no_caller_saved_registers" attribute?

You can save all call clobbered registers with 3 instructions?  Really?  I’m 
skeptical.  Anyway, if you do this by turning off great swaths of registers, 
then, I guess that doesn’t surprise me.  I try and not turn any off in mine.

Now, if you turn off great swaths of registers, then you can’t actually make 
any calls to an abi that supports those registers as call clobbered:

> +  if (cfun->machine->is_interrupt)
> +return ((df_regs_ever_live_p (regno)
> +  || (call_used_regs[regno] && cfun->machine->make_calls))
> + && !fixed_regs[regno]
> + && !STACK_REGNO_P (regno)
> + && !MMX_REGNO_P (regno)
> + && regno != BP_REG
> + && regno != SP_REG
> + && (regno <= ST7_REG || regno >= XMM0_REG));

So, any calls to a function that uses any excluded register won’t work, if that 
register is not fixed and is call clobbered.  If you call such a function, you 
_must_ save and restore all those registers.  Code like:

> +  if (cfun->machine->is_interrupt && VALID_MMX_REG_MODE (mode))
> +{
> +  error ("MMX/3Dnow instructions aren't allowed in %s service routine",
> +  (cfun->machine->is_exception ? "exception" : "interrupt"));
> +  return;
> +}

does not save you from the obligation of saving such registers, if you support 
function calls inside an interrupt routine.  Once you add that support to make 
function calls work, then, you might as well lift this restriction, because it 
would already just work.

If you really can save everything in 3 instructions, then there is no point to 
trying to enhance it more on your port.  3 instructions execute so fast as to 
not matter.

Now, if you ask me how I know all this, I had to debug a failure to save 1 
register class in the prologue from a large multi-core instruction trace, and 
that class was the second most important class for general code gen right after 
the gprs.  Turns out that I killed a live variable in that class from a packet 
handler interrupt routine because it failed to save/restore.  After that, I 
tested every register class and fixed all the issues.
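
The hazard described above can be sketched with a toy model.  This is only an 
illustration under stated assumptions: the register names, the two tables and 
the helper functions are hypothetical and are not GCC's real register numbers 
or data structures, and the fixed-register check from the quoted condition is 
left out for brevity.  The predicate has the same shape as the quoted one: a 
register is saved if it is live, or if it is call clobbered and the handler 
makes calls, unless it belongs to an excluded class.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register file, for illustration only.  */
enum { R0, R1, R2, R3, X87_0, X87_1, NUM_REGS };

struct handler_info {
  bool ever_live[NUM_REGS];  /* register is written somewhere in the handler */
  bool makes_calls;          /* handler calls another function */
};

/* Registers a callee may clobber under the toy ABI.  */
static const bool call_clobbered[NUM_REGS] =
  { true, true, false, false, true, true };

/* Registers the toy prologue refuses to save (stand-in for x87/MMX).  */
static const bool excluded[NUM_REGS] =
  { false, false, false, false, true, true };

/* Same shape as the quoted condition: live, or call clobbered while calls
   are made, and not in an excluded class.  */
bool save_reg_p(const struct handler_info *h, int regno)
{
  assert(regno >= 0 && regno < NUM_REGS);
  return (h->ever_live[regno] || (call_clobbered[regno] && h->makes_calls))
         && !excluded[regno];
}

/* Number of registers the toy prologue would save.  */
int count_saved(const struct handler_info *h)
{
  int n = 0;
  for (int r = 0; r < NUM_REGS; r++)
    if (save_reg_p(h, r))
      n++;
  return n;
}

/* Number of registers a callee may clobber that the prologue does not save.
   A nonzero result means the interrupted code can observe corrupted state.  */
int count_unprotected(const struct handler_info *h)
{
  int n = 0;
  for (int r = 0; r < NUM_REGS; r++)
    if (h->makes_calls && call_clobbered[r] && !save_reg_p(h, r))
      n++;
  return n;
}
```

In this model a leaf handler that only touches R2 saves one register, while 
the same handler with a call in it saves three and still leaves the two 
excluded registers unprotected, which is the case the paragraph above warns 
about.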

[committed, PATCH] gcc.dg/debug/pr65771.c: Use tls_runtime instead of tls.

2015-09-29 Thread Jonathan Roelofs
The attached patch fixes a few failures on bare-metal arm that happen 
because `dg-require-effective-target tls` only checks for compile-time 
support, whereas `dg-require-effective-target tls_runtime` checks for 
runtime support too. The latter is needed due to the missing support for 
__aeabi_read_tp which causes link-time failure of this test.


Tested on arm-none-eabi.


2015-09-29  Jonathan Roelofs  

   * gcc.dg/debug/pr65771.c: Use tls_runtime instead of tls.


Committed r228273 by Sandra.

--
Jon Roelofs
jonat...@codesourcery.com
CodeSourcery / Mentor Embedded
Index: gcc/testsuite/gcc.dg/debug/pr65771.c
===
--- gcc/testsuite/gcc.dg/debug/pr65771.c  (revision 228267)
+++ gcc/testsuite/gcc.dg/debug/pr65771.c  (working copy)
@@ -1,6 +1,6 @@
 /* PR debug/65771 */
 /* { dg-do link } */
-/* { dg-require-effective-target tls } */
+/* { dg-require-effective-target tls_runtime } */
 /* { dg-add-options tls } */
 
 struct S { int s; int t; };
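
The difference between the two effective targets can be illustrated with a 
minimal sketch.  It is not the contents of pr65771.c; the variable and 
function names are made up for illustration.  A file like this can compile on 
a target whose compiler accepts thread-local storage, yet, per the message 
above, still fail to link on bare-metal arm if the runtime does not provide 
__aeabi_read_tp.  That is why a test with dg-do link wants the runtime check.

```c
/* Hypothetical example: one thread-local counter.  Each thread gets its own
   copy, and reading or writing it may need help from the runtime.  */
static __thread int counter = 5;

/* Increment the calling thread's copy and return the new value.  */
int bump(void)
{
  counter += 1;
  return counter;
}
```

Calling bump() twice from the same thread yields 6 and then 7 on a target 
where the TLS runtime is present.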


Re: [PATCH] Make compute_deps, extend_schedule static

2015-09-29 Thread Tobias Grosser

On 09/29/2015 10:19 PM, Aditya Kumar wrote:

From: hiraditya 

No functional changes intended. Passes make check and bootstrap.


LGTM.

Tobias


Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread Mike Stump
On Sep 29, 2015, at 1:16 PM, H.J. Lu  wrote:
> On Tue, Sep 29, 2015 at 11:49 AM, Mike Stump  wrote:
>> To be feature complete, it would be nice to have two styles of interrupt 
>> functions, one that returns with iret, and one that returns with ret.  The 
>> point is that the user might want to call functions from an interrupt handler 
>> and not save and restore all call clobbered registers.  By allowing a ret 
>> style interrupt handler, calls to a ret style interrupt routine can avoid 
>> saving and restoring all call clobbered registers.
> 
> Do you have a testcase for this?  I think the current implementation
> covers most use cases.

When I wrote my interrupt support for my cpu, I ran these through the code 
generator…  I have many registers, and noticed saving and restoring them all 
just because two interrupt handlers used the same routine was silly.  Test case 
is trivial:

interrupt void foo2() {
  bar();
}

interrupt void foo1() {
  bar();
}

If more than 1-2 registers are saved, then it is likely saving all call-used 
registers.  Saving all of them means that one cannot use functions to compose 
semantics and still attain performance.  Performance of ISR routines is, I 
think, worth shooting for; given that it is easy enough to attain, I don’t see 
the harm in doing that.  Even if the first implementation doesn’t bother with 
performance, if you spec the other function style, the user code need never 
change; and when performance does matter, it is then a mere matter of 
enhancing the code gen to do the right thing.  It is pretty easy to get most 
of the benefit without much work.  I call the main interrupt function 
interrupt, and the recursive (ret style) one I call interruptr.  The r is for 
recursive.

[RFA][PATCH] Fix building cr16-elf with trunk compiler

2015-09-29 Thread Jeff Law


This code from builtins.c:

  /* If we don't need too much alignment, we'll have been guaranteed
 proper alignment by get_trampoline_type.  */
  if (TRAMPOLINE_ALIGNMENT <= STACK_BOUNDARY)
return tramp;


It's entirely conceivable that TRAMPOLINE_ALIGNMENT will be the same as 
STACK_BOUNDARY.  And if they are, then -Wtautological-compare will 
complain bitterly.


This affects the cr16 port and possibly others (I've had this fix in my 
tree while running the config-all.mk builds).


Given the real possibility that those two objects are the same and thus 
the complaint from -Wtautological-compare, it seems best to simply 
disable -Wtautological-compare for this function.


Bootstrapped and regression tested on x86_64-linux-gnu and also used to 
successfully build cr16-elf cross compilers from config-all.mk.


OK for the trunk?

Other alternatives would be to obfuscate the appropriate macros in the 
cr16 port.  That seemed wrong in this case to me.


Jeff
* builtins.c (round_trampoline_addr): Turn off -Wtautological-compare
when compiling this function.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 1592810..e4ed470 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -4830,6 +4830,11 @@ expand_builtin___clear_cache (tree exp)
   return const0_rtx;
 }
 
+#if GCC_VERSION >= 6000
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wtautological-compare"
+#endif
+
 /* Given a trampoline address, make sure it satisfies TRAMPOLINE_ALIGNMENT.  */
 
 static rtx
@@ -4854,6 +4859,9 @@ round_trampoline_addr (rtx tramp)
 
   return tramp;
 }
+#if GCC_VERSION >= 6000
+#pragma GCC diagnostic pop
+#endif
 
 static rtx
 expand_builtin_init_trampoline (tree exp, bool onstack)
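
The same push/ignored/pop pattern can be tried outside the GCC tree with a 
small sketch.  The macro names and values below are hypothetical stand-ins for 
TRAMPOLINE_ALIGNMENT and STACK_BOUNDARY, and the version guard uses __GNUC__ 
because GCC_VERSION is only available inside GCC's own sources.  Whether a 
particular pair of macro expansions actually triggers -Wtautological-compare 
depends on what they expand to; the constants here are only placeholders.

```c
/* Hypothetical stand-ins; on some targets the real macros expand to the
   same expression.  Both values are in bits.  */
#define TOY_TRAMPOLINE_ALIGNMENT 32
#define TOY_STACK_BOUNDARY 32

#if defined(__GNUC__) && __GNUC__ >= 6
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wtautological-compare"
#endif

/* Round ADDR up to the toy trampoline alignment, unless the stack boundary
   already guarantees that alignment.  */
unsigned long round_toy_addr(unsigned long addr)
{
  if (TOY_TRAMPOLINE_ALIGNMENT <= TOY_STACK_BOUNDARY)
    return addr;

  unsigned long align_bytes = TOY_TRAMPOLINE_ALIGNMENT / 8;
  return (addr + align_bytes - 1) & ~(align_bytes - 1);
}

#if defined(__GNUC__) && __GNUC__ >= 6
#pragma GCC diagnostic pop
#endif
```

Keeping the push and pop tight around the one function leaves the warning 
enabled for the rest of the file.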


Go patch committed: Accept untyped integer values as indexes

2015-09-29 Thread Ian Lance Taylor
This patch by Chris Manghane fixes the Go frontend to accept any
untyped integer value as an index, even if the default type of the
value is not "int".  This fixes https://golang.org/issue/11545 .
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 228087)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-66c113f1af300ce27b99f18f792901d7327d6699
+f187e13b712824b08f2a8833033840cd52a3b95a
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 227863)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -9870,11 +9870,26 @@ void
 Array_index_expression::do_determine_type(const Type_context*)
 {
   this->array_->determine_type_no_context();
-  this->start_->determine_type_no_context();
+
+  Type_context index_context(Type::lookup_integer_type("int"), false);
+  if (this->start_->is_constant())
+this->start_->determine_type(&index_context);
+  else
+this->start_->determine_type_no_context();
   if (this->end_ != NULL)
-this->end_->determine_type_no_context();
+{
+  if (this->end_->is_constant())
+this->end_->determine_type(&index_context);
+  else
+this->end_->determine_type_no_context();
+}
   if (this->cap_ != NULL)
-this->cap_->determine_type_no_context();
+{
+  if (this->cap_->is_constant())
+this->cap_->determine_type(&index_context);
+  else
+this->cap_->determine_type_no_context();
+}
 }
 
 // Check types of an array index.
@@ -10415,9 +10430,19 @@ void
 String_index_expression::do_determine_type(const Type_context*)
 {
   this->string_->determine_type_no_context();
-  this->start_->determine_type_no_context();
+
+  Type_context index_context(Type::lookup_integer_type("int"), false);
+  if (this->start_->is_constant())
+this->start_->determine_type(&index_context);
+  else
+this->start_->determine_type_no_context();
   if (this->end_ != NULL)
-this->end_->determine_type_no_context();
+{
+  if (this->end_->is_constant())
+this->end_->determine_type(&index_context);
+  else
+this->end_->determine_type_no_context();
+}
 }
 
 // Check types of a string index.


Re: [PATCH] remove dead code of commutative_reductions

2015-09-29 Thread Sebastian Pop
Tobias Grosser wrote:
> On 09/29/2015 06:26 PM, Sebastian Pop wrote:
> >This code is not used anymore after we removed the previous loop optimizer 
> >(not
> >based on the ISL scheduler.)  We will add back the detection of commutative
> >reductions after we improve the code generation of scalar dependences (by not
> >going out of SSA for scalar dependences just to expose them to the data
> >dependence graph.)
> >
> >Patch passed bootstrap and check on x86_64-linux with ISL-0.15.
> >I will commit this patch to trunk.
> 
> LGTM.
> 
> Regarding the handling of scalars, Polly does this now by only virtually 
> modeling
> them as memory dependences, but leaving them as registers until code 
> generation.
> The final code generation is still done by alloca(ing) a memory slot and then
> generating loads/stores from this memory slot. This is significantly easier 
> than
> trying to directly generate SSA again.

Thanks Tobi for the review and the hint.

Sebastian


Re: [PATCH] use MIN fusion for ISL-14

2015-09-29 Thread Tobias Grosser

On 09/30/2015 12:10 AM, Sebastian Pop wrote:

This patch fixes PR66754 by reverting an earlier unintended change.
We now generate a much simpler AST for interchange-1.c:

ISL AST generated by ISL:
{
   for (int c1 = 0; c1 <= 1334; c1 += 1) {
 S_7(c1);
 for (int c3 = 0; c3 <= 1334; c3 += 1)
   S_4(c1, c3);
 S_5(c1);
   }
   for (int c1 = 0; c1 <= 1334; c1 += 1)
 S_10(c1);
   S_8();
}


Sure. Out of interest. How did the AST look before? Also, does it
make sense to add a test case?

Best,
Tobias


Re: [PATCH] Clear variables with stale SSA_NAME_RANGE_INFO (PR tree-optimization/67690)

2015-09-29 Thread Richard Biener
On September 29, 2015 4:21:16 PM GMT+02:00, Marek Polacek  
wrote:
>On Fri, Sep 25, 2015 at 06:22:44PM +0200, Richard Biener wrote:
>> On September 25, 2015 3:49:34 PM GMT+02:00, Marek Polacek
> wrote:
>> >On Fri, Sep 25, 2015 at 09:29:30AM +0200, Richard Biener wrote:
>> >> On Thu, 24 Sep 2015, Marek Polacek wrote:
>> >> 
>> >> > As Richi said in
>> >,
>> >> > using recorded SSA name range infos in VRP is likely to expose
>> >errors in the
>> >> > ranges.  This PR is such a case.  As discussed in the PR, after
>> >tail merging
>> >> > via PRE the range infos cannot be relied upon anymore, so we
>need
>> >to clear
>> >> > them.
>> >> > 
>> >> > Since tree-ssa-ifcombine.c already had code to clean up the flow
>> >data in a BB,
>> >> > I've factored it out to a common function.
>> >> > 
>> >> > Bootstrapped/regtested on x86_64-linux, ok for trunk and 5?
>> >> 
>> >> I believe for tail-merge you also need to clear range info on
>> >> PHI defs in the BB.  For ifcombine this wasn't necessary (no PHI
>> >nodes
>> >> in the relevant CFG), but it's ok to extend the new 
>> >> reset_flow_sensitive_info_in_bb function to also reset PHI defs.
>> >
>> >All right.
>> > 
>> >> Ok with that change.
>> >
>> >Since I'm not completely sure if I did the right thing here, could
>you
>> >please have another look at the new function?
>> 
>> Doesn't work that way.  You need to iterate over the PHI sequence
>separately via gsi_start_phis(bb), etc.
>
>Oops, sorry.  So like this?
>
>Bootstrapped/regtested on x86_64-linux, ok for trunk (and a similar
>patch for 5)?

Yes, thanks
Richard.

>2015-09-29  Marek Polacek  
>
>   PR tree-optimization/67690
>   * tree-ssa-ifcombine.c (pass_tree_ifcombine::execute): Call
>   reset_flow_sensitive_info_in_bb.
>   * tree-ssa-tail-merge.c (replace_block_by): Likewise.
>   * tree-ssanames.c: Include "gimple-iterator.h".
>   (reset_flow_sensitive_info_in_bb): New function.
>   * tree-ssanames.h (reset_flow_sensitive_info_in_bb): Declare.
>
>   * gcc.dg/torture/pr67690.c: New test.
>
>diff --git gcc/testsuite/gcc.dg/torture/pr67690.c
>gcc/testsuite/gcc.dg/torture/pr67690.c
>index e69de29..491de51 100644
>--- gcc/testsuite/gcc.dg/torture/pr67690.c
>+++ gcc/testsuite/gcc.dg/torture/pr67690.c
>@@ -0,0 +1,32 @@
>+/* { dg-do run } */
>+
>+const int c1 = 1;
>+const int c2 = 2;
>+
>+int
>+check (int i)
>+{
>+  int j;
>+  if (i >= 0)
>+j = c2 - i;
>+  else
>+j = c2 - i;
>+  return c2 - c1 + 1 > j;
>+}
>+
>+int invoke (int *pi) __attribute__ ((noinline,noclone));
>+int
>+invoke (int *pi)
>+{
>+  return check (*pi);
>+}
>+
>+int
>+main ()
>+{
>+  int i = c1;
>+  int ret = invoke (&i);
>+  if (!ret)
>+__builtin_abort ();
>+  return 0;
>+}
>diff --git gcc/tree-ssa-ifcombine.c gcc/tree-ssa-ifcombine.c
>index 9f04174..66be430 100644
>--- gcc/tree-ssa-ifcombine.c
>+++ gcc/tree-ssa-ifcombine.c
>@@ -769,16 +769,7 @@ pass_tree_ifcombine::execute (function *fun)
> {
>   /* Clear range info from all stmts in BB which is now executed
>  conditional on a always true/false condition.  */
>-  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>-   !gsi_end_p (gsi); gsi_next (&gsi))
>-{
>-  gimple *stmt = gsi_stmt (gsi);
>-  ssa_op_iter i;
>-  tree op;
>-  FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF)
>-reset_flow_sensitive_info (op);
>-}
>-
>+  reset_flow_sensitive_info_in_bb (bb);
>   cfg_changed |= true;
> }
> }
>diff --git gcc/tree-ssa-tail-merge.c gcc/tree-ssa-tail-merge.c
>index 0ce59e8..487961e 100644
>--- gcc/tree-ssa-tail-merge.c
>+++ gcc/tree-ssa-tail-merge.c
>@@ -1534,6 +1534,10 @@ replace_block_by (basic_block bb1, basic_block
>bb2)
>   e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum);
> }
> 
>+  /* Clear range info from all stmts in BB2 -- this transformation
>+ could make them out of date.  */
>+  reset_flow_sensitive_info_in_bb (bb2);
>+
>   /* Do updates that use bb1, before deleting bb1.  */
>   release_last_vdef (bb1);
>   same_succ_flush_bb (bb1);
>diff --git gcc/tree-ssanames.c gcc/tree-ssanames.c
>index 4199290..7235dc3 100644
>--- gcc/tree-ssanames.c
>+++ gcc/tree-ssanames.c
>@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "backend.h"
> #include "tree.h"
> #include "gimple.h"
>+#include "gimple-iterator.h"
> #include "hard-reg-set.h"
> #include "ssa.h"
> #include "alias.h"
>@@ -544,6 +545,29 @@ reset_flow_sensitive_info (tree name)
> SSA_NAME_RANGE_INFO (name) = NULL;
> }
> 
>+/* Clear all flow sensitive data from all statements and PHI
>definitions
>+   in BB.  */
>+
>+void
>+reset_flow_sensitive_info_in_bb (basic_block bb)
>+{
>+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
>+   gsi_next (&gsi))
>+  
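
Richard's point about PHI definitions can be sketched with a toy model.  The 
structures and function names below are hypothetical and far simpler than 
GCC's; they only illustrate that PHI results sit in a sequence separate from 
ordinary statements, so a reset that walks the statements alone would leave 
range info on the PHI results.

```c
#include <stddef.h>

/* Toy stand-in for an SSA name that may carry flow-sensitive range info.  */
struct toy_name {
  int has_range_info;
};

/* Toy basic block: PHI results are kept apart from the results of ordinary
   statements, so each sequence needs its own walk.  */
struct toy_block {
  struct toy_name *stmt_defs;
  size_t n_stmt_defs;
  struct toy_name *phi_defs;
  size_t n_phi_defs;
};

/* Clear range info on every definition in BB, statements and PHIs alike.  */
void toy_reset_flow_info_in_bb(struct toy_block *bb)
{
  for (size_t i = 0; i < bb->n_stmt_defs; i++)
    bb->stmt_defs[i].has_range_info = 0;
  for (size_t i = 0; i < bb->n_phi_defs; i++)
    bb->phi_defs[i].has_range_info = 0;
}

/* Count definitions in BB that still carry range info.  */
size_t toy_count_with_range_info(const struct toy_block *bb)
{
  size_t n = 0;
  for (size_t i = 0; i < bb->n_stmt_defs; i++)
    n += bb->stmt_defs[i].has_range_info != 0;
  for (size_t i = 0; i < bb->n_phi_defs; i++)
    n += bb->phi_defs[i].has_range_info != 0;
  return n;
}
```

Dropping the second loop from the reset function would leave the count 
nonzero for any block whose PHI results carry range info.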

Re: [PATCH] x86 interrupt attribute

2015-09-29 Thread H.J. Lu
On Tue, Sep 29, 2015 at 1:16 PM, H.J. Lu  wrote:
> On Tue, Sep 29, 2015 at 11:49 AM, Mike Stump  wrote:
>> To be feature complete, it would be nice to have two styles of interrupt 
>> functions, one that returns with iret, and one that returns with ret.  The 
>> point is that the user might want to call functions from an interrupt handler 
>> and not save and restore all call clobbered registers.  By allowing a ret 
>> style interrupt handler, calls to a ret style interrupt routine can avoid 
>> saving and restoring all call clobbered registers.
>
> Do you have a testcase for this?  I think the current implementation
> covers most use cases.
>
>> Oh, and I wish that all the port independent code for interrupt functions 
>> was shared across all ports, as redoing all this code for each port is silly 
>> (sad).  And example of this would be the sibcall code, the fact that all 
>> call saved registers need to be saved is another.  The EPILOGUE_USES or the 
>> gen_rtx_USE is yet another.  Type checking the return type to ensure the 
>> return type is void, likely another.
>
> A very good point, but beyond this implementation :-(.
>
>> One last comment, most folks use EPILOGUE_USES and mark up the registers as 
>> used.  You don’t.  I’m not sure if both ways work equally well, or if there 
>> is a reason to prefer one over the other.  Maybe someone could comment on 
>> this, as in my port I use EPILOGUE_USES and it seems to work just fine.
>
> We will take a look.

Julia, I checked a patch into hjl/interrupt/master branch to
define EPILOGUE_USES in i386:

commit f3a6675a8d69d810d2cad0c090a762094a0a8622
Author: H.J. Lu 
Date:   Tue Sep 29 13:47:18 2015 -0700

Define EPILOGUE_USES in i386

Define EPILOGUE_USES in i386 so that all preserved registers are used
by the epilogue of interrupt handler.  Don't explicitly mark BP and SP
registers as used since they are always used in epilogue.

Please take a look.


-- 
H.J.


Re: [PATCH] Convert SPARC to LRA

2015-09-29 Thread Sebastian Huber

On 30/09/15 04:07, Jeff Law wrote:

If the port does get occasional fixes (primarily driven by BZs),
but is not getting updated on a regular basis (such as conversion to
LRA, conversion to RTL prologue/epilogue, etc), and may only be getting
occasional testing, etc., then it's probably fair to call it in
maintenance mode.  A great example IMHO would be the m68k.


Another criterion would be available hardware for which both the PA
and alpha ports are a good example.  When you can't buy new hardware
then targets that could formerly host GCC quickly rot to the state
where only cross-compilation is viable (and having "old" GCC is good
enough).
Very true. Actually the PA is the best example there.  Alpha I believe 
has a functional-enough QEMU port to do real work and m68k has Aranym 
which I've used to bootstrap m68k within the last 18 months.  Hell, I 
think Aranym actually ran faster than the last shipping real hardware! 


You can still buy m68k based chips (e.g. Freescale ColdFire) for 
embedded systems.


http://www.freescale.com/products/more-processors/32-bit-mcu-and-mcp/coldfire-plus-coldfire-mcus-mpus:PC68KCF

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: [PATCH][AArch64] Add separate insn sched class for vector LDP & STP

2015-09-29 Thread Ramana Radhakrishnan
On Tue, Sep 29, 2015 at 12:52 AM, Evandro Menezes  wrote:
> In some micro-architectures the insns to load or store pairs of vector
> registers are implemented rather differently from those affecting lanes in
> vector registers.  Then, it's important that such insns be described
> likewise differently in the scheduling model.
>
> This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart from
> the current neon_load2_2reg_q and neon_store2_2reg_q types, respectively.

In such types.md restructuring, please handle these in *all* affected
scheduler descriptions, in this case thunder and xgene are 2 scheduler
descriptions that you have missed - Given Andrew is handling Thunder,
please update the xgene backend too at the same time. I can't think of
anything else that is affected right now.

A simple way to do that is to rename the old form to something else in
an intermediate patch using git to figure out all the
micro-architectures affected that need to be handled for both arm and
aarch64 backends and then add the new forms to handle this.

If there need to be follow up patches for xgene with different
handling, I'm sure Philipp will follow up - added him to CC.

Thanks,
Ramana

>
> Thank you,
>
> --
> Evandro Menezes
>


Re: [PATCH] add static-linked PIE support

2015-09-29 Thread Rich Felker
On Tue, Sep 29, 2015 at 09:34:07PM -0400, Rich Felker wrote:
> This is the gcc side support of the static-linked PIE functionality
> added to binutils in commit 9b8b325a1f4cdaf235e7d803849dde6ededec865:

And unfortunately I wasn't aware of this:

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e9abca4f4a48fa8b1fd9778f6a3cd748e099e3bb

Now I need to figure out the magic spec macros going on there and work
around the fact that default-pie mode has -static implicitly turning
default-pie off while we need it to stay on... Would a new
--enable-default-pie=always mode be appropriate? I don't think people
would be happy with changing the default, especially since glibc does
not have an rcrt1.o (yet)...

Rich

