Re: [PATCH][4/n] Remove GENERIC stmt combining from SCCVN

2015-07-10 Thread Jeff Law

On 06/29/2015 04:02 AM, Richard Biener wrote:


Ok, the above isn't the correct place (seems to be used from the
threading machinery only), but record_equivalences_from_incoming_edge is
and that is where the special-case you mention is which handles
widening converts but not sign-changes.  And yes, it's the wrong-way
around, handling

But the threading machinery is where we need this code for the calloc.C
test.


Conceptually DOM can only exploit an equivalence that dominates other 
statements.  In this case n_5/n_4 == 0 only reaches BB12 through one 
edge (BB5->BB12).   The equivalence doesn't hold on other paths (through 
BB11) into BB12.  Thus the equivalence isn't going to be recorded by 
r_e_f_i_e.


Jump threading on the other hand can use equivalences that are path 
specific without the dominance relationship -- but it only uses them to 
simplify conditionals.  Again, with limited digging, it appears we just 
need to add the code to handle this case for threading and the right 
things ought to happen.
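
To make the distinction concrete, here is a minimal C sketch of a path-specific equivalence.  It is not the calloc.C test; the function name and the shape of the control flow are made up for illustration.  The equality n == 0 is known only on the path that falls out of the inner test, so it does not dominate the final conditional, yet a jump threader could still use it to send that one path straight to the first return.

```c
/* Hypothetical illustration, not taken from calloc.C.  */
int
path_specific (int n, int c)
{
  if (c)
    {
      if (n != 0)
        return -1;
      /* Path A: n == 0 is known on this edge into the join.  */
    }

  /* Join block: also reached directly when c == 0, where nothing is
     known about n.  The equivalence does not dominate this test, so a
     dominator-based pass cannot record it, but the test can still be
     resolved along path A alone.  */
  if (n == 0)
    return 1;
  return 2;
}
```

Calling path_specific (0, 1) takes path A and returns 1 without the second comparison being needed; path_specific (5, 0) reaches the join with n unknown and returns 2.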


Now if we did path duplication/isolation in DOM like we do in the 
threader, then we could optimize this case in DOM.  Basically we'd 
duplicate BB12.  One copy would be reachable via BB5, the other via 
BB11.  And magically the n_5/n_4 == 0 equivalence carries because it 
would dominate one of the two copies.


The path duplication to expose redundancies is one of the things I'd 
like to get out of a Bodik-esque scheme.  One of the things Bodik's work 
does is identify the minimal set of blocks that need to be copied to 
expose each path specific redundancy that it finds.


Jeff


Re: [patch] bit of cleanup to graphite files

2015-07-10 Thread Jeff Law

On 07/10/2015 02:47 PM, Andrew MacLeod wrote:

I noticed a few annoying bits around the graphite files that I decided
to clean up.

- omega.h shouldn't include "config.h".  including params.h is fine
since it is needed, but it should be within the #ifndef GCC_OMEGA_H guard.
- sese.h is required for compilation of graphite-poly.h, and basically
isn't used anywhere else (except sese.c), so simply include it in
graphite-poly.h.
- I adjusted the rest of the graphite files. All but graphite.c guard
the entire contents of the file with HAVE_isl, but they all include a ton
of GCC includes outside the guard.  I moved them inside the guard and
ran include reduction on them all to remove the unneeded headers.
- graphite.c was similar, except it has a small hunk of code which is
compiled when HAVE_isl is false. I manually adjusted those includes to
be minimal, and ran include reduction on the rest.

I bootstrapped with HAVE_isl defined and also with it not defined on
x86_64-unknown-linux-gnu.  I ran it with no regressions with HAVE_isl
defined.

OK for trunk?

OK.
jeff


Re: [patch] fixes -fcilkplus functionality on DragonFly (fixes ~2600 tests)

2015-07-10 Thread Jeff Law

On 07/10/2015 06:34 PM, John Marino wrote:

After posting the first testsuite results for DragonFly, it was clear
that the -fcilkplus functionality was broken:
https://gcc.gnu.org/ml/gcc-testresults/2015-07/msg01046.html

The problem was related to the __cpu_model symbol not getting exported.

The solution was to create libgcc/config/i386/t-dragonfly to define an
additional symbol map (similar to t-freebsd).  Simply creating the file
is enough because there's already a placeholder for t-dragonfly at
libgcc/config.host.  The patch is attached.

The improved results of the patch can be seen on the next posted
testsuite results:
https://gcc.gnu.org/ml/gcc-testresults/2015-07/msg01081.html

An additional ~2600 tests now pass.
Please consider this patch for incorporation into trunk.  Only DragonFly
uses the new t-dragonfly file so there is no impact to other platforms.

suggested entry for libgcc/ChangeLog:

2015-07-XX  John Marino  

 * config/i386/t-dragonfly: New.

OK.
jeff



[patch] fixes -fcilkplus functionality on DragonFly (fixes ~2600 tests)

2015-07-10 Thread John Marino
After posting the first testsuite results for DragonFly, it was clear
that the -fcilkplus functionality was broken:
https://gcc.gnu.org/ml/gcc-testresults/2015-07/msg01046.html

The problem was related to the __cpu_model symbol not getting exported.

The solution was to create libgcc/config/i386/t-dragonfly to define an
additional symbol map (similar to t-freebsd).  Simply creating the file
is enough because there's already a placeholder for t-dragonfly at
libgcc/config.host.  The patch is attached.

The improved results of the patch can be seen on the next posted
testsuite results:
https://gcc.gnu.org/ml/gcc-testresults/2015-07/msg01081.html

An additional ~2600 tests now pass.
Please consider this patch for incorporation into trunk.  Only DragonFly
uses the new t-dragonfly file so there is no impact to other platforms.

suggested entry for libgcc/ChangeLog:

2015-07-XX  John Marino  

* config/i386/t-dragonfly: New.

Thanks,
John
--- /dev/null   2015-07-10 21:56:18 UTC
+++ libgcc/config/i386/t-dragonfly
@@ -0,0 +1,2 @@
+# Required for -fcilkplus support
+SHLIB_MAPFILES += $(srcdir)/config/i386/libgcc-bsd.ver


Re: [nvptx] mkoffload cleanup

2015-07-10 Thread Bernd Schmidt

On 07/11/2015 12:53 AM, Nathan Sidwell wrote:

I'm working through the mkoffload machinery.  mkoffload.c emits a C
file, and the quoting in the source is quite confusing.  This patch
introduces a quoting macro 'Q' that allows one to write raw C to be
stringized and written out.

ok? (more cleanups to follow)


The quoting is fairly standard and used throughout gcc, and I guess I'm 
kind of used to seeing it - the patch would make things inconsistent 
with everything else. It's also nonobvious and probably unintentional 
that indentation and linebreaks get lost in some places in the output 
when the patch is applied - the following is emitted as a single line:


extern void *__OFFLOAD_TABLE__[]; static __attribute__((constructor)) 
void init (void) { GOMP_offload_register (__OFFLOAD_TABLE__, 5, 
&target_data); }


So, I'm sorry - not a fan of this.
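
The whitespace loss described above follows from how the preprocessor stringizes a macro argument: each run of whitespace between tokens, newlines included, becomes a single space.  The sketch below uses the same Q definition as the patch but with made-up identifiers (table, register_table, count_newlines) rather than the real mkoffload output, and compares the result with conventionally quoted strings.

```c
#include <string.h>

/* Same definition as in the patch under review.  */
#define Q(...) #__VA_ARGS__ "\n"

/* Written over several lines, but stringized as one line.  */
const char quoted[] = Q (extern void *table[];
  static void init (void)
  {
    register_table (table, 5);
  });

/* Conventional quoting keeps every line break and the indentation.  */
const char classic[] =
  "extern void *table[];\n"
  "static void init (void)\n"
  "{\n"
  "  register_table (table, 5);\n"
  "}\n";

/* Count the newline characters in S.  */
int
count_newlines (const char *s)
{
  int n = 0;
  for (; *s; s++)
    if (*s == '\n')
      n++;
  return n;
}
```

Here quoted contains exactly one newline (the one Q appends), while classic contains five.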


Bernd



Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-10 Thread Bernhard Reutner-Fischer
On 11 July 2015 at 01:00, Bernhard Reutner-Fischer
 wrote:
> On 10 July 2015 at 14:31, Kyrill Tkachov  wrote:

> PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)

err, arm and aarch64 have no -mbranch-cost, a couple of prominent
other arches do..


Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-10 Thread Bernhard Reutner-Fischer
On 10 July 2015 at 14:31, Kyrill Tkachov  wrote:
> Hi all,
>
> This patch makes if-conversion more aggressive when handling code of the
> form:
> if (test)
>   x := a  //THEN
> else
>   x := b  //ELSE

> The current code adds the costs of both the THEN and ELSE blocks and proceeds
> if they don't exceed the branch cost. I don't think that's quite the right
> calculation. We're going to be executing at least one of the basic blocks
> anyway. With this patch we instead check the *maximum* of the two blocks
> against the branch cost. This should still catch cases where a high latency
> instruction appears in one or both of the paths.

Shouldn't this maximum also take probability into account? Or maybe
not, would have to think about it tomorrow.
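
For reference, the three candidate checks can be compared side by side.  This is only a sketch with plain integers; the function names and the percentage-based probability are assumptions made for illustration, and the real code in ifcvt.c works with insn_rtx_cost and COSTS_N_INSNS (if_info->branch_cost) instead.

```c
/* Hypothetical helpers; none of these exist in ifcvt.c.  */

/* Old rule: convert only if both blocks together fit the budget.  */
int
ok_by_sum (int then_cost, int else_cost, int limit)
{
  return then_cost + else_cost <= limit;
}

/* Rule in the patch: only the more expensive block has to fit.  */
int
ok_by_max (int then_cost, int else_cost, int limit)
{
  int worst = then_cost > else_cost ? then_cost : else_cost;
  return worst <= limit;
}

/* Probability-weighted variant: expected cost of the block that would
   have been executed, with THEN taken then_percent times out of 100.  */
int
ok_by_expected (int then_cost, int else_cost, int then_percent, int limit)
{
  int expected = (then_cost * then_percent
                  + else_cost * (100 - then_percent)) / 100;
  return expected <= limit;
}
```

With a budget of 4, two blocks of cost 3 each are rejected by the sum rule but accepted by the maximum rule.  A rarely executed expensive THEN block (cost 8, taken 10% of the time, against an ELSE of cost 2) is rejected by the maximum rule but would pass the weighted one, which is the trade-off the question above is about.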

$ contrib/check_GNU_style.sh rtl-ifcvt.00.patch

Blocks of 8 spaces should be replaced with tabs.
783:+return FALSE;


Generally ifcvt.c (resp. the whole tree) could use a
sed -i -e "s/\([[:space:]]\)FALSE/\1false/g" gcc/ifcvt.c
Maybe some of the int predicates could then become bools.


+/* Return iff the registers that the insns in BB_A set do not
+   get used in BB_B.  */

Return true iff


Did you include Go in your testing?
I see:
Unexpected results in this build (new failures)
FAIL: encoding/json
FAIL: go/printer
FAIL: go/scanner
FAIL: html/template
FAIL: log
FAIL: net/http
FAIL: net/http/cgi
FAIL: net/http/cookiejar
FAIL: os
FAIL: text/template


bbs_ok_for_cmove_arith() looks costly, but I guess you checked whether
there's some pre-existing cleverness you could have used instead?

noce_emit_bb() could use a better comment. Likewise insn_valid_noce_process_p().

insn_rtx_cost() should return an unsigned int, then_cost, else_cost
should thus be unsigned int too.

copy_of_a versus copy_of_insn_b; I'd shorten the latter.

bb_valid_for_noce_process_p() suggests that there is a JOIN_BB param
but there is none?
Also should document the return value (and should not clobber the OUT
params upon failure, no?).

As for the testcases, it would be nice to have at least a tiny bit for
x86_64, too.

PS: no -mbranch-cost and, a tad more seriously, no --param branch-cost either ;)
PPS: attached meant to illustrate comments above. Untested.

cheers,
From 1b7d8f9b61eb538cc4338e2073d04a66518f13c2 Mon Sep 17 00:00:00 2001
From: Bernhard Reutner-Fischer 
Date: Fri, 10 Jul 2015 21:25:30 +0200
Subject: [PATCH] rtl-ifcvt typos

fix some typos on top of Kyrill's patch.
should mv the testcases to common ground.
---
 gcc/ifcvt.c |   49 ++-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_1.c |2 +-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_2.c |2 +-
 gcc/testsuite/gcc.target/aarch64/ifcvt_csel_3.c |7 ++--
 4 files changed, 27 insertions(+), 33 deletions(-)

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 3d324257..0bf6645 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1784,8 +1784,7 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* We're going to execute one of the basic blocks anyway, so
  bail out if the most expensive of the two blocks is unacceptable.  */
-  if (MAX (then_cost, else_cost)
-  > COSTS_N_INSNS (if_info->branch_cost))
+  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
 return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
@@ -2730,28 +2729,26 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   rtx cc = cc_in_cond (cond);
 
   if (!insn_valid_noce_process_p (last_insn, cc))
-return FALSE;
+return false;
   last_set = single_set (last_insn);
 
   rtx x = SET_DEST (last_set);
-
   rtx_insn *first_insn = first_active_insn (test_bb);
   rtx first_set = single_set (first_insn);
-  bool speed_p = optimize_bb_for_speed_p (test_bb);
 
-  *cost = insn_rtx_cost (last_set, speed_p);
   if (!first_set)
 return false;
+
   /* We have a single simple set, that's okay.  */
-  else if (first_insn == last_insn)
+  bool speed_p = optimize_bb_for_speed_p (test_bb);
+
+  if (first_insn == last_insn)
 {
   *simple_p = noce_operand_ok (SET_DEST (first_set));
   *cost = insn_rtx_cost (first_set, speed_p);
   return *simple_p;
 }
 
-  *simple_p = false;
-
   rtx_insn *prev_last_insn = PREV_INSN (last_insn);
   gcc_assert (prev_last_insn);
 
@@ -2764,6 +2761,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   /* The regs that are live out of test_bb.  */
   bitmap test_bb_live_out = df_get_live_out (test_bb);
 
+  int potential_cost = insn_rtx_cost (last_set, speed_p);
   rtx_insn *insn;
   FOR_BB_INSNS (test_bb, insn)
 {
@@ -2781,7 +2779,7 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
 	  if (MEM_P (SET_SRC (sset)) || MEM_P (SET_DEST (sset)))
 	goto free_bitmap_and_fail;
 
-	  *cost += insn_rtx_cost (sset, speed_p);
+	  potential_cost += insn_rtx_cost (sset, speed_p);
 	  bitmap_set_bit (test_bb_temps, REGNO (SET_DEST (sset)));
 	}
 }
@@ -2792,11 +2790,13 @@

[nvptx] mkoffload cleanup

2015-07-10 Thread Nathan Sidwell

Bernd,
I'm working through the mkoffload machinery.  mkoffload.c emits a C file, and 
the quoting in the source is quite confusing.  This patch introduces a quoting 
macro 'Q' that allows one to write raw C to be stringized and written out.


ok? (more cleanups to follow)

nathan
2015-07-10  Nathan Sidwell  

* config/nvptx/mkoffload.c (Q): New macro.
(process): Use it for emitting code.

Index: config/nvptx/mkoffload.c
===
--- config/nvptx/mkoffload.c(revision 225703)
+++ config/nvptx/mkoffload.c(working copy)
@@ -37,6 +37,10 @@
 #include "collect-utils.h"
 #include "gomp-constants.h"
 
+/* Quoter macro so that one can write unquoted 'C' for printfs.
+   Sadly all white space is collapsed.  */
+#define Q(...) #__VA_ARGS__ "\n"
+
 const char tool_name[] = "nvptx mkoffload";
 
 #define COMMENT_PREFIX "#"
@@ -269,37 +273,41 @@ process (FILE *in, FILE *out)
 
   unsigned int nvars = 0, nfuncs = 0;
 
-  fprintf (out, "static const char *var_mappings[] = {\n");
+  fprintf (out,
+  Q (static const char *const var_mappings[] = {));
   for (id_map *id = var_ids; id; id = id->next, nvars++)
 fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
-  fprintf (out, "};\n\n");
-  fprintf (out, "static const char *func_mappings[] = {\n");
+  fprintf (out, Q (};));
+
+  fprintf (out, Q (static const char *const func_mappings[] = {));
   for (id_map *id = func_ids; id; id = id->next, nfuncs++)
 fprintf (out, "\t\"%s\"%s\n", id->ptx_name, id->next ? "," : "");
-  fprintf (out, "};\n\n");
+  fprintf (out, Q(};));
 
-  fprintf (out, "static const void *target_data[] = {\n");
-  fprintf (out, "  ptx_code, (void *)(__UINTPTR_TYPE__)sizeof (ptx_code),\n");
-  fprintf (out, "  (void *) %u, var_mappings, (void *) %u, func_mappings\n",
+  fprintf (out,
+  Q (static const void *target_data[] = {
+  ptx_code, (void *)(__UINTPTR_TYPE__)sizeof (ptx_code),
+(void *) %u, var_mappings, (void *) %u, func_mappings};),
   nvars, nfuncs);
-  fprintf (out, "};\n\n");
-
-  fprintf (out, "#ifdef __cplusplus\n");
-  fprintf (out, "extern \"C\" {\n");
-  fprintf (out, "#endif\n");
-
-  fprintf (out, "extern void GOMP_offload_register (const void *, int, void *);\n");
-
-  fprintf (out, "#ifdef __cplusplus\n");
-  fprintf (out, "}\n");
-  fprintf (out, "#endif\n");
 
-  fprintf (out, "extern void *__OFFLOAD_TABLE__[];\n\n");
-  fprintf (out, "static __attribute__((constructor)) void init (void)\n{\n");
-  fprintf (out, "  GOMP_offload_register (__OFFLOAD_TABLE__, %d,\n",
+  fprintf (out, Q (#ifdef __cplusplus));
+  fprintf (out, Q (extern "C" {));
+  fprintf (out, Q (#endif));
+
+  fprintf (out,
+  Q (extern void GOMP_offload_register (const void *, int, void *);));
+
+  fprintf (out, Q (#ifdef __cplusplus));
+  fprintf (out, Q (}));
+  fprintf (out, Q (#endif));
+
+  fprintf (out,
+  Q (extern void *__OFFLOAD_TABLE__[];
+ static __attribute__((constructor)) void init (void)
+ {
+   GOMP_offload_register (__OFFLOAD_TABLE__, %d, &target_data);
+ }),
   GOMP_DEVICE_NVIDIA_PTX);
-  fprintf (out, " &target_data);\n");
-  fprintf (out, "};\n");
 }
 
 static void


[gomp] remove tid/ntid fns from libgcc

2015-07-10 Thread Nathan Sidwell
I've committed this patch to remove library versions of the num threads and 
thread id.  This has been busted since my reorg of the tid and ntid builtins, 
but wasn't noticed because they're not used anyway.


nathan
2015-07-10  Nathan Sidwell  

* config/nvptx/gomp-tids.c: Delete.
* config/nvptx/t-nvptx: Remove gomp-tids.o

Index: config/nvptx/gomp-tids.c
===
--- config/nvptx/gomp-tids.c(revision 225695)
+++ config/nvptx/gomp-tids.c(working copy)
@@ -1,66 +0,0 @@
-/* Each gang consists of 'worker' threads.  Each worker has 'vector'
-   threads.
-
-   gang, worker and vector mapping functions:
-
-   *tid (0) => vector dimension
-   *tid (1) => worker dimension
-   *ctaid (0) = gang dimension
-
-   FIXME: these functions assume that the gang, worker and vector parameters
-   are 0 or 1.  To generalize these functions, we should use -1 to indicate,
-   say, that a gang clause was used without its optional argument.  In this
-   case, gang should correspond to ctaid(0), i.e., the num_gangs parameter
-   passed to cuLaunchKernel.
-
-   tid = [0, ntid-1]
-   ntid = [1...threads_per_dimension]
-*/
-
-int __attribute__ ((used))
-GOACC_get_num_threads (int gang, int worker, int vector)
-{
-  int vsize = vector * __builtin_GOACC_ntid (0);
-  int wsize = worker * __builtin_GOACC_ntid (1);
-  int gsize = gang * __builtin_GOACC_nctaid (0);
-  int size = 1;
-
-  if (vector)
-size *= __builtin_GOACC_ntid (0);
-
-  if (worker)
-size *= __builtin_GOACC_ntid (1);
-
-  if (gang)
-size *= __builtin_GOACC_nctaid (0);
-
-  return size;
-}
-
-int __attribute__ ((used))
-GOACC_get_thread_num (int gang, int worker, int vector)
-{
-  int tid = 0;
-  int ws = __builtin_GOACC_ntid (1);
-  int vs = __builtin_GOACC_ntid (0);
-  int gid = __builtin_GOACC_ctaid (0);
-  int wid = __builtin_GOACC_tid (1);
-  int vid = __builtin_GOACC_tid (0);
-
-  if (gang && worker && vector)
-tid = gid * ws * vs + vs * wid + vid;
-  else if (gang && !worker && vector)
-tid = vs * gid + vid;
-  else if (gang && worker && !vector)
-tid = ws * gid + wid;
-  else if (!gang && worker && vector)
-tid = vs * wid + vid;
-  else if (!gang && !worker && vector)
-tid = vid;
-  else if (!gang && worker && !vector)
-tid = wid;
-  else if (gang && !worker && !vector)
-tid = gid;
-
-  return tid;
-}
Index: config/nvptx/t-nvptx
===
--- config/nvptx/t-nvptx(revision 225695)
+++ config/nvptx/t-nvptx(working copy)
@@ -16,12 +16,10 @@ INHIBIT_LIBC_CFLAGS = -Dinhibit_libc
 
 gomp-acc_on_device.o: $(srcdir)/config/nvptx/gomp-acc_on_device.c
$(gcc_compile) -c -fno-builtin-acc_on_device $<
-gomp-tids.o: $(srcdir)/config/nvptx/gomp-tids.c
-   $(gcc_compile) -c -fopenacc -O $<
 gomp-atomic.o: $(srcdir)/config/nvptx/gomp-atomic.asm
cp $< $@
 
-OBJS_libgomp= gomp-acc_on_device.o gomp-tids.o gomp-atomic.o
+OBJS_libgomp= gomp-acc_on_device.o gomp-atomic.o
 libgomp.a: $(OBJS_libgomp)
$(AR_CREATE_FOR_TARGET) $@ $(OBJS_libgomp)
 libgomp.spec:


[gomp] fix df verify failure

2015-07-10 Thread Nathan Sidwell
I've committed this patch to fix a df verify crash Thomas pointed me at. 
Thomas, I think this means you can revert the workaround you just committed?


nathan
2015-07-10  Nathan Sidwell  

* config/nvptx/nvptx.c (nvptx_reorg): Move df problem setting, set
dirty flags.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c(revision 225647)
+++ config/nvptx/nvptx.c(working copy)
@@ -2923,16 +2923,16 @@ nvptx_reorg (void)
 
   thread_prologue_and_epilogue_insns ();
 
-  df_clear_flags (DF_LR_RUN_DCE);
-  df_set_flags (DF_NO_INSN_RESCAN | DF_NO_HARD_REGS);
-  df_live_add_problem ();
-  
   /* Split blocks and record interesting unspecs.  */
   bb_insn_map_t bb_insn_map;
 
-nvptx_split_blocks (&bb_insn_map);
+  nvptx_split_blocks (&bb_insn_map);
 
   /* Compute live regs */
+  df_clear_flags (DF_LR_RUN_DCE);
+  df_set_flags (DF_NO_INSN_RESCAN | DF_NO_HARD_REGS);
+  df_live_add_problem ();
+  df_live_set_all_dirty ();
   df_analyze ();
   regstat_init_n_sets_and_refs ();
 


RE: [PATCH] MIPS: Correctly update the isa and arch_test_option_p variables after the arch dependency handling code in mips.exp

2015-07-10 Thread Matthew Fortune
Andrew Bennett  writes:
> I have noticed that in the mips.exp dg-option handling code the isa and
> arch_test_option_p variables are not updated after the pre-arch to arch
> dependency handling.  This means that if this code changes the
> architecture the post-arch dependency handling code (which relies on
> arch_test_option_p being true) is not run to handle any extra dependencies
> the new architecture might need.

I'm not sure this is the right place to fix this, though it does seem
subjective as we are stretching the logic a little I think.

In the pre-arch options (i.e. when an arch is not explicitly requested) we
already have code that sets -mnan=2008 when downgrading a test from R6 to R5, as
the R6 headers will be nan2008 and there is no guarantee of nan legacy headers
existing. This is the opposite case where we upgrade a test from R5 to R6
and R6 has to use -mnan=2008 so needs to explicitly override any command line
option to use -mnan=legacy. I think that therefore needs adding when we set
the arch to R6 in the pre-arch options.

At the same time I think we need to add -mabs=2008 in the same place as R6
requires ABS2008 as well. You should see that as a failure if you test with
-mabs=legacy.

I think I wrote the exact same patch as you have when I did the original R6
tests and concluded it was not in keeping with the structure of mips.exp.

I've added Richard too since he may be able to offer a guiding hand as original
author of most of the mips.exp code.

Thanks,
Matthew

> I have found this issue while investigating failures with the mips-mti-elf
> toolchain using the -mnan=legacy multilib flags when running any of the
> mips tests that have the HAS_LSA option specified in the dg-options.  The
> default architecture for this toolchain is mips32r2.  This means the
> architecture handling code changes the architecture to mips32r6 to handle
> the HAS_LSA requirements.  Unfortunately because the arch_test_option_p is
> not updated it is still set to false, so the post-arch code is not run.
> This means the nan encoding is not set to -mnan=2008, which then causes the
> tests to fail because mips32r6 does not support -mnan=legacy.
> 
> The patch and ChangeLog are below.
> 
> Ok to commit?
> 
> 
> 
> Regards,
> 
> 
> 
> Andrew
> 
> 
> testsuite/
>   * gcc.target/mips/mips.exp (mips-dg-options): Update the isa and
>   arch_test_option_p variables after the arch dependency handling code.
> 
> 
> diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> b/gcc/testsuite/gcc.target/mips/mips.exp
> index 1dd4173..1eb714d 100644
> --- a/gcc/testsuite/gcc.target/mips/mips.exp
> +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> @@ -1188,8 +1188,10 @@ proc mips-dg-options { args } {
>  }
> 
>  # Re-calculate the isa_rev for use in the abi handling code below
> +set arch_test_option_p [mips_test_option_p options arch]
>  set arch [mips_option options arch]
>  set isa_rev [mips_arch_info $arch isa_rev]
> +set isa [mips_arch_info $arch isa]
> 
>  # Set an appropriate ABI, handling dependencies between the pre-abi
>  # options and the abi options.  This should mirror the abi and post-abi



C++ PATCH for variable templates in pack expansions

2015-07-10 Thread Jason Merrill
Looking at the concepts work led me to notice this bug: we weren't 
finding packs used only in variable template-ids.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 29ae93b90171f5202ec1de7507b77d09b2dff643
Author: Jason Merrill 
Date:   Fri Jul 10 15:27:11 2015 -0400

	* pt.c (find_parameter_packs_r): Handle variable templates.
	(variable_template_specialization_p): New.
	* cp-tree.h: Declare it.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index d383612..8450e9b 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5771,6 +5771,7 @@ extern bool reregister_specialization		(tree, tree, tree);
 extern tree instantiate_non_dependent_expr	(tree);
 extern tree instantiate_non_dependent_expr_sfinae (tree, tsubst_flags_t);
 extern tree instantiate_non_dependent_expr_internal (tree, tsubst_flags_t);
+extern bool variable_template_specialization_p  (tree);
 extern bool alias_type_or_template_p(tree);
 extern bool alias_template_specialization_p (const_tree);
 extern bool dependent_alias_template_spec_p (const_tree);
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 63907ce..8c72a61 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3245,6 +3245,13 @@ find_parameter_packs_r (tree *tp, int *walk_subtrees, void* data)
 			ppd, ppd->visited);
 	  *walk_subtrees = 0;
 	}
+  else if (variable_template_specialization_p (t))
+	{
+	  cp_walk_tree (&DECL_TI_ARGS (t),
+			find_parameter_packs_r,
+			ppd, ppd->visited);
+	  *walk_subtrees = 0;
+	}
   break;
 
 case BASES:
@@ -5351,6 +5358,17 @@ instantiate_non_dependent_expr (tree expr)
   return instantiate_non_dependent_expr_sfinae (expr, tf_error);
 }
 
+/* True iff T is a specialization of a variable template.  */
+
+bool
+variable_template_specialization_p (tree t)
+{
+  if (!VAR_P (t) || !DECL_LANG_SPECIFIC (t) || !DECL_TEMPLATE_INFO (t))
+return false;
+  tree tmpl = DECL_TI_TEMPLATE (t);
+  return variable_template_p (tmpl);
+}
+
 /* Return TRUE iff T is a type alias, a TEMPLATE_DECL for an alias
template declaration, or a TYPE_DECL for an alias declaration.  */
 
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ33.C b/gcc/testsuite/g++.dg/cpp1y/var-templ33.C
new file mode 100644
index 000..53c6db2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ33.C
@@ -0,0 +1,20 @@
+// Test for variable templates in pack expansion
+// { dg-do compile { target c++14 } }
+
+template <int I> const int Val = I;
+
+constexpr int f () { return 0; }
+template <class T, class... Ts>
+constexpr int f(T t, Ts... ts)
+{
+  return t + f(ts...);
+}
+
+template <int... Is>
+constexpr int g()
+{
+  return f(Val<Is>...);
+}
+
+#define SA(X) static_assert((X),#X)
+SA((g<1,2,3,4>() == 1+2+3+4));


[patch] bit of cleanup to graphite files

2015-07-10 Thread Andrew MacLeod
I noticed a few annoying bits around the graphite files that I decided
to clean up.


- omega.h shouldn't include "config.h".  including params.h is fine 
since it is needed, but it should be within the #ifndef GCC_OMEGA_H guard.
- sese.h is required for compilation of graphite-poly.h, and basically
isn't used anywhere else (except sese.c), so simply include it in
graphite-poly.h.
- I adjusted the rest of the graphite files. All but graphite.c guard
the entire contents of the file with HAVE_isl, but they all include a ton
of GCC includes outside the guard.  I moved them inside the guard and
ran include reduction on them all to remove the unneeded headers.
- graphite.c was similar, except it has a small hunk of code which is 
compiled when HAVE_isl is false. I manually adjusted those includes to 
be minimal, and ran include reduction on the rest.


I bootstrapped with HAVE_isl defined and also with it not defined on 
x86_64-unknown-linux-gnu.  I ran it with no regressions with HAVE_isl 
defined.


OK for trunk?

Andrew


	* omega.h: Don't include config.h, don't include params.h again if
	omega.h has already been included.
	* graphite-poly.h: Include sese.h.
	* graphite.c: Don't include sese.h, remove needless includes and 
	minimize includes outside #ifdef HAVE_isl block.
	* graphite-blocking.c: Don't include sese.h, remove needless includes,
	and wrap entire file in #ifdef HAVE_isl
	* graphite-dependences.c: Likewise.
	* graphite-interchange.c: Likewise.
	* graphite-isl-ast-to-gimple.c: Likewise.
	* graphite-optimize-isl.c: Likewise.
	* graphite-poly.c: Likewise.
	* graphite-scop-detection.c: Likewise.
	* graphite-sese-to-poly.c: Likewise.

Index: omega.h
===
*** omega.h	(revision 225674)
--- omega.h	(working copy)
*** You should have received a copy of the G
*** 24,35 
  along with GCC; see the file COPYING3.  If not see
  <http://www.gnu.org/licenses/>.  */
  
- #include "config.h"
- #include "params.h"
  
  #ifndef GCC_OMEGA_H
  #define GCC_OMEGA_H
  
  #define OMEGA_MAX_VARS PARAM_VALUE (PARAM_OMEGA_MAX_VARS)
  #define OMEGA_MAX_GEQS PARAM_VALUE (PARAM_OMEGA_MAX_GEQS)
  #define OMEGA_MAX_EQS PARAM_VALUE (PARAM_OMEGA_MAX_EQS)
--- 24,35 
  along with GCC; see the file COPYING3.  If not see
  <http://www.gnu.org/licenses/>.  */
  
  
  #ifndef GCC_OMEGA_H
  #define GCC_OMEGA_H
  
+ #include "params.h"
+ 
  #define OMEGA_MAX_VARS PARAM_VALUE (PARAM_OMEGA_MAX_VARS)
  #define OMEGA_MAX_GEQS PARAM_VALUE (PARAM_OMEGA_MAX_GEQS)
  #define OMEGA_MAX_EQS PARAM_VALUE (PARAM_OMEGA_MAX_EQS)
Index: graphite-poly.h
===
*** graphite-poly.h	(revision 225674)
--- graphite-poly.h	(working copy)
*** along with GCC; see the file COPYING3.
*** 22,27 
--- 22,29 
  #ifndef GCC_GRAPHITE_POLY_H
  #define GCC_GRAPHITE_POLY_H
  
+ #include "sese.h"
+ 
  typedef struct poly_dr *poly_dr_p;
  
  typedef struct poly_bb *poly_bb_p;
Index: graphite.c
===
*** graphite.c	(revision 225674)
--- graphite.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 46,78 
  
  #include "system.h"
  #include "coretypes.h"
- #include "diagnostic-core.h"
- #include "alias.h"
  #include "backend.h"
  #include "cfghooks.h"
  #include "tree.h"
  #include "gimple.h"
- #include "hard-reg-set.h"
- #include "options.h"
  #include "fold-const.h"
- #include "internal-fn.h"
  #include "gimple-iterator.h"
  #include "tree-cfg.h"
  #include "tree-ssa-loop.h"
- #include "tree-dump.h"
- #include "cfgloop.h"
- #include "tree-chrec.h"
  #include "tree-data-ref.h"
  #include "tree-scalar-evolution.h"
! #include "sese.h"
  #include "dbgcnt.h"
  #include "tree-parloops.h"
- #include "tree-pass.h"
  #include "tree-cfgcleanup.h"
- 
- #ifdef HAVE_isl
- 
- #include "graphite-poly.h"
  #include "graphite-scop-detection.h"
  #include "graphite-isl-ast-to-gimple.h"
  #include "graphite-sese-to-poly.h"
--- 46,70 
  
  #include "system.h"
  #include "coretypes.h"
  #include "backend.h"
+ #include "diagnostic-core.h"
+ #include "cfgloop.h"
+ #include "tree-pass.h"
+ 
+ #ifdef HAVE_isl
  #include "cfghooks.h"
  #include "tree.h"
  #include "gimple.h"
  #include "fold-const.h"
  #include "gimple-iterator.h"
  #include "tree-cfg.h"
  #include "tree-ssa-loop.h"
  #include "tree-data-ref.h"
  #include "tree-scalar-evolution.h"
! #include "graphite-poly.h"
  #include "dbgcnt.h"
  #include "tree-parloops.h"
  #include "tree-cfgcleanup.h"
  #include "graphite-scop-detection.h"
  #include "graphite-isl-ast-to-gimple.h"
  #include "graphite-sese-to-poly.h"
Index: graphite-blocking.c
===
*** graphite-blocking.c	(revision 225674)
--- graphite-blocking.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 31,62 
  #include 
  #include 
  #inc

Re: [PATCH 3/7] Fix trinary op

2015-07-10 Thread Jeff Law

On 07/09/2015 10:48 PM, Mikhail Maltsev wrote:

On 08.07.2015 13:55, Ian Lance Taylor wrote:

I don't know of anybody who actually uses the DMGL_TYPES support.  I
don't know why anybody would.

Ian

Thanks for pointing that out. I updated the testcases, so that now they
don't depend on DMGL_TYPES being used.


But better still is to consider the larger context.  We want the
demangler to work the same on all hosts, if at all possible.
d_identifier is called exactly once.  Change it to take a parameter of
type long.  Don't worry about changing d_source_name.

Fixed.


Then look at the fact that d_number does not check for overflow.  We
should consider changing d_number to limit itself to 32-bit integers,
and to return an error indication on overflow.  From a quick glance I
don't see any need for the demangler to support numbers larger than 32
bits.  I think it's OK if we fail to demangle symbol names that are
more than 2 billion characters long.

OK, but I think it'll be better to fix that in a separate patch.
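
As a rough idea of what such an overflow check could look like, here is a standalone C sketch.  It is not the demangler's d_number; the function name parse_number_checked and its interface are assumptions, and the sketch ignores anything d_number handles beyond a plain run of decimal digits.

```c
#include <stdint.h>

/* Hypothetical helper, not part of cp-demangle.c.  Parse a run of
   decimal digits at *P into *OUT, limited to non-negative 32-bit
   values.  Return 0 on success and advance *P past the digits;
   return -1 if there is no digit or the value would overflow,
   leaving *P and *OUT untouched.  */
int
parse_number_checked (const char **p, int32_t *out)
{
  const char *s = *p;
  int32_t value = 0;

  if (*s < '0' || *s > '9')
    return -1;

  while (*s >= '0' && *s <= '9')
    {
      int digit = *s - '0';

      /* value * 10 + digit must not exceed INT32_MAX; test this
         before doing the arithmetic so nothing ever overflows.  */
      if (value > (INT32_MAX - digit) / 10)
        return -1;

      value = value * 10 + digit;
      s++;
    }

  *p = s;
  *out = value;
  return 0;
}
```

Given "2147483647" the sketch succeeds, while "2147483648" is reported as an error instead of wrapping around.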

The attached patch includes the changes mentioned above, there is also a
small change: I moved the comment for CHECK_DEMANGLER macro to
cp-demangle.c (it already contains a comment for other similar macros)
and replaced __builtin_abort() with abort(). For some reason I thought
that it might need an additional #include, but in reality libiberty (and
the demangler too) already use abort().
The changelog is also attached. OK for trunk after regtest?


OK after regression testing.

jeff



Re: [PR66726] Factor conversion out of COND_EXPR

2015-07-10 Thread Jeff Law

On 07/09/2015 05:08 PM, Kugan wrote:


Done. Bootstrapped and regression tested on x86-64-none-linux-gnu with
no new regressions. Is this OK for trunk?

Thanks for the additional testcases.




+  else
+{
+  /* If arg1 is an INTEGER_CST, fold it to new type.  */
+  if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
+ && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
+   {
+ if (gimple_assign_cast_p (arg0_def_stmt))
+   new_arg1 = fold_convert (TREE_TYPE (new_arg0), arg1);
+ else
+   return false;
+   }
+  else
+   return false;
+}
Something looks goofy here formatting-wise.  Can you please check for 
horizontal whitespace consistency before committing.





+
+  /* If types of new_arg0 and new_arg1 are different bailout.  */
+  if (TREE_TYPE (new_arg0) != TREE_TYPE (new_arg1))
+return false;
Seems like this should use types_compatible_p here.  You're testing 
pointer equality, but as long as the types are compatible, we should be 
able to make the transformation.


With the horizontal whitespace fixed and using types_compatible_p this 
is OK for the trunk.  So pre-approved with those two changes and a final 
bootstrap/regression test (due to the types_compatible_p change).


jeff



Re: [PATCH][AArch64][testsuite] Adjust some arith+compare tests for potentially more aggressive if-conversion

2015-07-10 Thread James Greenhalgh
On Fri, Jul 10, 2015 at 01:21:05PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> Some of the testcases in aarch64.exp can fail their scan-assembler patterns
> if if-conversion becomes more aggressive.
>
> This patch adjusts the testcases in case the branches are eliminated and
> further optimisations occur that may remove the scan-assembler patterns.
>
> With this patch the patterns are always generated and the expected execute
> values are updated.
> 
> Tests still pass on aarch64.
> Ok for trunk?

This is OK. Please address my one comment below (looks like you left some
#if 0 kicking around in adds1.c) and fix the ChangeLog to include the
gcc.target/aarch64/adds1.c changes.

Thanks,
James

> 
> Thanks,
> Kyrill
> 
> 2015-07-10  Kyrylo Tkachov  
> 
>  * gcc.target/aarch64/adds3.c: Adjust for more aggressive
>  if-conversion..
>  * gcc.target/aarch64/ands_1.c: Likewise.
>  * gcc.target/aarch64/bics_1.c: Likewise.
>  * gcc.target/aarch64/subs1.c: Likewise.
>  * gcc.target/aarch64/subs3.c: Likewise.

> diff --git a/gcc/testsuite/gcc.target/aarch64/adds1.c 
> b/gcc/testsuite/gcc.target/aarch64/adds1.c
> index 6cc700a..1689029 100644
> --- a/gcc/testsuite/gcc.target/aarch64/adds1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/adds1.c
> @@ -12,7 +12,7 @@ adds_si_test1 (int a, int b, int c)
>if (d == 0)
>  return a + c;
>else
> -return b + d + c;
> +return d;
>  }
>  
>  int
> @@ -24,7 +24,7 @@ adds_si_test2 (int a, int b, int c)
>if (d == 0)
>  return a + c;
>else
> -return b + d + c;
> +return d;
>  }
>  
>  int
> @@ -36,7 +36,7 @@ adds_si_test3 (int a, int b, int c)
>if (d == 0)
>  return a + c;
>else
> -return b + d + c;
> +return d;
>  }
>  
>  typedef long long s64;
> @@ -50,7 +50,7 @@ adds_di_test1 (s64 a, s64 b, s64 c)
>if (d == 0)
>  return a + c;
>else
> -return b + d + c;
> +return d;
>  }
>  
>  s64
> @@ -62,7 +62,7 @@ adds_di_test2 (s64 a, s64 b, s64 c)
>if (d == 0)
>  return a + c;
>else
> -return b + d + c;
> +return d;
>  }
>  
>  s64
> @@ -74,7 +74,7 @@ adds_di_test3 (s64 a, s64 b, s64 c)
>if (d == 0)
>  return a + c;
>else
> -return b + d + c;
> +return d;
>  }
>  
>  int main ()
> @@ -83,66 +83,68 @@ int main ()
>s64 y;
>  
>x = adds_si_test1 (29, 4, 5);
> -  if (x != 42)
> +  if (x != (29 + 4))
>  abort ();
>  
> -  x = adds_si_test1 (5, 2, 20);
> -  if (x != 29)
> +  x = adds_si_test1 (5, 2, -5);
> +  if (x != 7)
>  abort ();
>  
>x = adds_si_test2 (29, 4, 5);
> -  if (x != 293)
> +  if (x != (29 + 0xff))
>  abort ();
>  
> -  x = adds_si_test2 (1024, 2, 20);
> -  if (x != 1301)
> +  x = adds_si_test2 (-255, 2, 20);
> +  if (x != -235)
>  abort ();
>  
>x = adds_si_test3 (35, 4, 5);
> -  if (x != 76)
> +  if (x != (35 + (4 << 3)))
>  abort ();
>  
> -  x = adds_si_test3 (5, 2, 20);
> -  if (x != 43)
> +  x = adds_si_test3 (-(2 << 3), 2, 20);
> +  if (x != (20 - (2 << 3)))
>  abort ();
>  
>y = adds_di_test1 (0x13029ll,
>0x32004ll,
>0x505050505ll);
>  
> -  if (y != 0xc75050536)
> +  if (y != (0x13029ll + 0x32004ll))
>  abort ();
>  
>y = adds_di_test1 (0x5000500050005ll,
> -  0x2111211121112ll,
> +  -0x5000500050005ll,
>0x02020ll);
> -  if (y != 0x9222922294249)
> +  if (y != (0x5000500050005ll + 0x02020ll))
>  abort ();
>  
>y = adds_di_test2 (0x13029ll,
>0x32004ll,
>0x505050505ll);
> -  if (y != 0x955050631)
> +  if (y != (0x13029ll + 0xff))
>  abort ();
>  
> -  y = adds_di_test2 (0x130002900ll,
> +  y = adds_di_test2 (-0xff,
>0x32004ll,
>0x505050505ll);
> -  if (y != 0x955052f08)
> +  if (y != (0x505050505ll - 0xff))
>  abort ();
>  
>y = adds_di_test3 (0x13029ll,
>0x06408ll,
>0x505050505ll);
> -  if (y != 0x9b9050576)
> +  if (y != (0x13029ll + (0x06408ll << 3)))
>  abort ();
>  
>y = adds_di_test3 (0x130002900ll,
> -  0x08808ll,
> +  -(0x130002900ll >> 3),
>0x505050505ll);
> -  if (y != 0xafd052e4d)
> +  if (y != (0x130002900ll + 0x505050505ll))
>  abort ();


> +#if 0
>  
> +#endif

Drop this hunk.

>return 0;
>  }
>  



Re: [patch] Remove needless obstack.h includes

2015-07-10 Thread Jeff Law

On 07/10/2015 01:49 PM, Andrew MacLeod wrote:

I noticed obstack.h was being included in a few files which already
include backend.h, so it's redundant.  Just taking them out.

Bootstraps on x86_64-unknown-linux-gnu, with no new regressions.
finishing up a config-list.mk using targets from the changes just to be
sure.

OK for trunk assuming everything finishes fine?

OK.

Jeff


Re: [PATCH 3/3] Fix ubsan tests by disabling of an optimization.

2015-07-10 Thread Jeff Law

On 07/10/2015 02:19 AM, Richard Biener wrote:


But the warning on the "bogus" line will still be warranted, so user goes and
fixes it.
But when the user gets the "bogus" line, he may look at the code and 
determine that the reported line can't possibly be executed -- so they 
get confused, assume the warning is bogus and ignore it.




 Then tail-merge no longer applies and he gets the warning on the

other warranted line.
That assumes that we'd get to both paths.  We may not.  The paths may be 
totally independent.


Jeff


Re: [PATCH v2, libcpp] Faster line lexer.

2015-07-10 Thread Jeff Law

On 07/10/2015 07:25 AM, Ondřej Bílka wrote:

On Fri, Jul 10, 2015 at 12:43:48PM +0200, Jakub Jelinek wrote:

On Fri, Jul 10, 2015 at 11:37:18AM +0200, Uros Bizjak wrote:

Have you tried new SSE4.2 implementation (the one with asm flags) with
unrolled loop?


Also, the SSE4.2 implementation looks shorter, so more I-cache friendly,
so I wouldn't really say it is redundant if they are roughly same speed.


Ok, I tried to also optimize sse4 and found that the main problem was
that checking index==16 caused high latency.

The trick was checking the first 64 bytes in the header using flags. Then
the loop is relatively unlikely, as lines longer than 64 bytes are relatively rare.

I tested that on more machines. On haswell sse4 is noticeably faster, on
nehalem sse2 is still a bit faster, and on amd fx10 it's a lot slower. How
do I check the processor to select sse2 on amd processors where it's
considerably slower?
I doubt any of this is worth the maintenance burden.  I think we should 
pick a reasonably performant implementation and move on to bigger issues.


jeff




Re: [PATCH] Factor out bb_has_abnormal_call_pred (PR middle-end/66353)

2015-07-10 Thread Jeff Law

On 07/10/2015 10:09 AM, Marek Polacek wrote:

ira-lives.c and lra-lives.c both define the same function named
bb_has_abnormal_call_pred.  I think we should factor this function out
into basic-block.h, where it really belongs.

Bootstrap/regtest running on x86_64-linux, ok for trunk if it passes?

2015-07-10  Marek Polacek  

PR middle-end/66353
* basic-block.h (has_abnormal_call_or_eh_pred_edge_p): New function.
* ira-lives.c (bb_has_abnormal_call_pred): Remove function.
(process_bb_node_lives): Call has_abnormal_call_or_eh_pred_edge_p
rather than bb_has_abnormal_call_pred.
* lra-lives.c (bb_has_abnormal_call_pred): Remove function.
(process_bb_lives): Call has_abnormal_call_or_eh_pred_edge_p
rather than bb_has_abnormal_call_pred.

OK.

Jeff


Re: [gomp4] Handle Fortran deviceptr clause.

2015-07-10 Thread James Norris

Hi Thomas,

On 07/09/2015 03:29 AM, Thomas Schwinge wrote:

Hi Jim!

On Wed, 8 Jul 2015 13:00:16 -0500, James Norris  
wrote:

This patch adds handling of the deviceptr clause when
used within a Fortran program.


Please motivate such non-obvious code changes by a test case.  At least
to me, it's not at all obvious what's going on here...


Attached are two files which allow the testing of the fix.
I wasn't able to figure out how to get the testsuite to
compile a c source file and a fortran source file. So the
following should suffice until I figure something out.


/opt/codesourcery/trunk/bin/x86_64-none-linux-gnu-gcc -fopenacc -Wall -c 
dp-1.c
/opt/codesourcery/trunk/bin/x86_64-none-linux-gnu-gfortran -fopenacc 
-Wall -c dp-2.f90
/opt/codesourcery/trunk/bin/x86_64-none-linux-gnu-gcc -fopenacc -Wall -o 
dp dp-1.o dp-2.o






Committed to gomp-4_0-branch



+   * oacc-parallel.c (GOACC_parallel GOACC_data_start): Handle Fortran
+   deviceptr clause.



--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -211,6 +211,21 @@ GOACC_parallel (int device, void (*fn) (void *),
thr = goacc_thread ();
acc_dev = thr->dev;

+  for (i = 0; i < mapnum; i++)
+{
+  unsigned short kind1 = kinds[i] & 0xff;
+  unsigned short kind2 = kinds[i+1] & 0xff;
+
+  if ((kind1 == GOMP_MAP_FORCE_DEVICEPTR && kind2 == GOMP_MAP_POINTER)
+  && (sizes[i + 1] == 0)
+  && (hostaddrs[i] == *(void **)hostaddrs[i + 1]))
+   {
+ kinds[i+1] = kinds[i];
+ sizes[i+1] = sizeof (void *);
+ hostaddrs[i] = NULL;
+   }
+}


Ugh.  That loop should be bounded by mapnum - 1 to avoid out-of-bounds
array accesses.  And such "voodoo" code constructs do need a comment,
please.


The 'Ugh' is fixed, comment added, and committed to gomp-4_0-branch.
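
As an aside, a small sketch of the adjacent-pair scan with a bound that never reads one element past the end. It is not the libgomp code: count_adjacent_pairs and its parameters are hypothetical stand-ins for the kinds[] loop. It uses "i + 1 < mapnum" rather than a cast of "mapnum - 1", since with an unsigned mapnum of 0 the subtraction would wrap around.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the kinds[] scan: count positions i where
// kinds[i] has low byte FIRST and kinds[i + 1] has low byte SECOND.
std::size_t count_adjacent_pairs(const unsigned short *kinds,
                                 std::size_t mapnum,
                                 unsigned short first,
                                 unsigned short second) {
  std::size_t hits = 0;
  // "i + 1 < mapnum" never touches kinds[mapnum] and is also safe for
  // mapnum == 0, where the unsigned expression "mapnum - 1" would wrap.
  for (std::size_t i = 0; i + 1 < mapnum; i++)
    if ((kinds[i] & 0xff) == first && (kinds[i + 1] & 0xff) == second)
      hits++;
  return hits;
}
```

With an empty or single-element map the loop body simply never runs.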


Why does this processing need to happen at run-time, in libgomp?
Should something else be done during OMP lowering, for example?


The code in question is for the handling of the deviceptr clause
when presented via Fortran. As there is special code to handle
PSETs, which are Fortran specific, in the run-time, I felt that
there shouldn't be an issue with handling the deviceptr clause
there as well.

Thank you, thank you,
Jim
diff --git a/libgomp/oacc-parallel.c b/libgomp/oacc-parallel.c
index eeb08c4..91a5e7d 100644
--- a/libgomp/oacc-parallel.c
+++ b/libgomp/oacc-parallel.c
@@ -187,7 +187,7 @@ GOACC_parallel (int device, void (*fn) (void *),
   struct gomp_device_descr *acc_dev;
   struct target_mem_desc *tgt;
   void **devaddrs;
-  unsigned int i;
+  int i;
   struct splay_tree_key_s k;
   splay_tree_key tgt_fn_key;
   void (*tgt_fn);
@@ -211,11 +211,12 @@ GOACC_parallel (int device, void (*fn) (void *),
   thr = goacc_thread ();
   acc_dev = thr->dev;
 
-  for (i = 0; i < mapnum; i++)
+  for (i = 0; i < (signed)(mapnum - 1); i++)
 {
   unsigned short kind1 = kinds[i] & 0xff;
   unsigned short kind2 = kinds[i+1] & 0xff;
 
+  /* Handle Fortran deviceptr clause.  */
   if ((kind1 == GOMP_MAP_FORCE_DEVICEPTR && kind2 == GOMP_MAP_POINTER)
 	   && (sizes[i + 1] == 0)
 	   && (hostaddrs[i] == *(void **)hostaddrs[i + 1]))
@@ -326,11 +327,12 @@ GOACC_data_start (int device, size_t mapnum,
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
-  for (i = 0; i < mapnum; i++)
+  for (i = 0; i < (signed)(mapnum - 1); i++)
 {
   unsigned short kind1 = kinds[i] & 0xff;
   unsigned short kind2 = kinds[i+1] & 0xff;
 
+  /* Handle Fortran deviceptr clause.  */
   if ((kind1 == GOMP_MAP_FORCE_DEVICEPTR && kind2 == GOMP_MAP_POINTER)
 	   && (sizes[i + 1] == 0)
 	   && (hostaddrs[i] == *(void **)hostaddrs[i + 1]))
/* { dg-do run } */

#include 
#include 
#include 
#include 

extern void subr1_ (int *);
extern void subr2_ (int *);
extern void subr3 (int *);
extern void subr4 (int *);

void
subr (int *a)
{
#pragma acc data deviceptr (a)
  {
#pragma acc parallel
{
  int i;

  for (i = 0; i < 8; i++)
a[i] = i + 1;
}
  }
}

int
main (int argc, char **argv)
{
  int  N = 8;
  int  nbytes;
  int  *a, *b;
  int  i;

  nbytes = N * sizeof (int);

  a = (int *) acc_malloc (nbytes);
  b = (int *) malloc (nbytes);

  memset (&b[0], 0, nbytes);

  subr1_ (a);

  acc_memcpy_from_device (b, a, nbytes);

  for (i = 0; i < N; i++)
{
  if (b[i] != i + 1)
abort ();
}

  memset (&b[0], 0, nbytes);

  subr2_ (a);

  acc_memcpy_from_device (b, a, nbytes);

  for (i = 0; i < N; i++)
{
  if (b[i] != i + 1)
abort ();
}

  memset (&b[0], 0, nbytes);

  subr3 (a);

  acc_memcpy_from_device (b, a, nbytes);

  for (i = 0; i < N; i++)
{
  if (b[i] != i + 1)
abort ();
}

  memset (&b[0], 0, nbytes);

  subr4 (a);

  acc_memcpy_from_device (b, a, nbytes);

  for (i = 0; i < N; i++)
{
  if (b[i] != i + 1)
abort ();
}

  return 0;
}


subroutine subr1 (a)
  implicit none
  
  integer, dime

Re: [PATCH] PR target/66819: Allow indirect sibcall with register arguments

2015-07-10 Thread Uros Bizjak
On Fri, Jul 10, 2015 at 7:58 PM, H.J. Lu  wrote:
> On Fri, Jul 10, 2015 at 10:21 AM, Uros Bizjak  wrote:
>> On Fri, Jul 10, 2015 at 7:10 PM, H.J. Lu  wrote:
>>> On Fri, Jul 10, 2015 at 9:30 AM, Uros Bizjak  wrote:
 On Thu, Jul 9, 2015 at 12:54 PM, H.J. Lu  wrote:
> Indirect sibcall with register arguments is OK when there is register
> available for argument passing.
>
> OK for trunk if there is no regression?
>
>
> H.J.
> ---
> gcc/
>
> PR target/66819
> * config/i386/i386.c (ix86_function_ok_for_sibcall): Allow
> indirect sibcall with register arguments if register available
> for argument passing.
> (init_cumulative_args): Set cfun->machine->arg_reg_available_p
> to cum->nregs != 0.
>>
>> Please update the above entry for nregs > 0.
>>
> (function_arg_advance_32): Set cfun->machine->arg_reg_available_p
> to 0 when setting cum->nregs = 0.

 Do we also need similar functionality for 64bit ABIs? What happens if
 we are out of argument regs there?
>>>
>>> 64-bit is OK since we have rax, r10 and r11 as scratch registers which
>>> aren't used to pass arguments.
>>
>> Maybe this fact should be added as a comment in some appropriate place.
>>
> * config/i386/i386.h (machine_function): Add arg_reg_available_p.
>
> gcc/testsuite/
>
> PR target/66819
> * gcc.target/i386/pr66819-1.c: New test.
> * gcc.target/i386/pr66819-2.c: Likewise.
> * gcc.target/i386/pr66819-3.c: Likewise.
> * gcc.target/i386/pr66819-4.c: Likewise.
> * gcc.target/i386/pr66819-5.c: Likewise.
> ---
>  gcc/config/i386/i386.c| 15 +--
>  gcc/config/i386/i386.h|  3 +++
>  gcc/testsuite/gcc.target/i386/pr66819-1.c |  8 
>  gcc/testsuite/gcc.target/i386/pr66819-2.c |  8 
>  gcc/testsuite/gcc.target/i386/pr66819-3.c | 10 ++
>  gcc/testsuite/gcc.target/i386/pr66819-4.c | 12 
>  gcc/testsuite/gcc.target/i386/pr66819-5.c | 10 ++
>  7 files changed, 60 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-5.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 54ee6f3..85e59a8 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5628,12 +5628,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>if (!decl
>   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P 
> (decl)))
> {
> - if (ix86_function_regparm (type, NULL) >= 3)
> -   {
> - /* ??? Need to count the actual number of registers to be 
> used,
> -not the possible number of registers.  Fix later.  */
> - return false;
> -   }
> + /* FIXME: The symbol indirect call doesn't need a
> +call-clobbered register.  But we don't know if
> +this is a symbol indirect call or not  here.  */
> + if (ix86_function_regparm (type, NULL) >= 3
> + && !cfun->machine->arg_reg_available_p)

 Isn't it enough to look at arg_reg_available here?
>>>
>>> We need to check ix86_function_regparm since nregs is 0 if
>>> -mregparm=N isn't used and pr65753.c will fail.
>>
>> OK. Please add this comment, it is not that obvious.
>>
>>>
> +   return false;
> }
>  }
>
> @@ -6567,6 +6567,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
> Argument info to initialize */
> ? X86_64_REGPARM_MAX
> : X86_64_MS_REGPARM_MAX);
>  }
> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;

 false instead of 0. This is a boolean.
>>>
>>> Updated.
>>>
>if (TARGET_SSE)
>  {
>cum->sse_nregs = SSE_REGPARM_MAX;
> @@ -6636,6 +6637,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
> Argument info to initialize */
>   else
> cum->nregs = ix86_function_regparm (fntype, fndecl);
> }
> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;

 IMO, cum->nregs > 0 would be more descriptive.
>>>
>>> Updated.
>>>
>/* Set up the number of SSE registers used for passing SFmode
>  and DFmode arguments.  Warn for mismatching ABI.  */
> @@ -7584,6 +7586,7 @@ pass_in_reg:
> {
>   cum->nregs = 0;
>   cum->regno = 0;
> + cfun->

[patch] Remove needless obstack.h includes

2015-07-10 Thread Andrew MacLeod
I noticed obstack.h was being included in a few files which already 
include backend.h, so it's redundant.  Just taking them out.


Bootstraps on x86_64-unknown-linux-gnu, with no new regressions. 
finishing up a config-list.mk using targets from the changes just to be 
sure.


OK for trunk assuming everything finishes fine?

Andrew

	* bb-reorder.c: Don't include obstack.h if backend.h is included.
	* cfg.c: Likewise.
	* cfgloopanal.c: Likewise.
	* cfgrtl.c: Likewise.
	* combine.c: Likewise.
	* cprop.c: Likewise.
	* dominance.c: Likewise.
	* fwprop.c: Likewise.
	* gcse.c: Likewise.
	* ira-emit.c: Likewise.
	* ira.c: Likewise.
	* loop-init.c: Likewise.
	* loop-invariant.c: Likewise.
	* loop-iv.c: Likewise.
	* loop-unroll.c: Likewise.
	* lower-subreg.c: Likewise.
	* postreload-gcse.c: Likewise.
	* postreload.c: Likewise.
	* regcprop.c: Likewise.
	* regrename.c: Likewise.
	* reload1.c: Likewise.
	* reorg.c: Likewise.
	* tree-ssa-pre.c: Likewise.
	* tree-ssa-structalias.c: Likewise.
	* tree.c: Likewise.
	* web.c: Likewise.
	* config/aarch64/cortex-a57-fma-steering.c: Likewise.
	* config/alpha/alpha.c: Likewise.
	* config/arm/arm.c: Likewise.
	* config/avr/avr.c: Likewise.
	* config/darwin.c: Likewise.
	* config/fr30/fr30.c: Likewise.
	* config/frv/frv.c: Likewise.
	* config/ft32/ft32.c: Likewise.
	* config/m32c/m32c.c: Likewise.
	* config/mcore/mcore.c: Likewise.
	* config/mep/mep.c: Likewise.
	* config/mn10300/mn10300.c: Likewise.
	* config/moxie/moxie.c: Likewise.
	* config/rs6000/rs6000.c: Likewise.
	* config/spu/spu.c: Likewise.
	* config/stormy16/stormy16.c: Likewise.

Index: bb-reorder.c
===
*** bb-reorder.c	(revision 225674)
--- bb-reorder.c	(working copy)
***
*** 92,98 
  #include "output.h"
  #include "target.h"
  #include "tm_p.h"
- #include "obstack.h"
  #include "insn-config.h"
  #include "expmed.h"
  #include "dojump.h"
--- 92,97 
Index: cfg.c
===
*** cfg.c	(revision 225674)
--- cfg.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 49,58 
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
! #include "obstack.h"
  #include "alloc-pool.h"
  #include "alias.h"
- #include "backend.h"
  #include "cfghooks.h"
  #include "tree.h"
  #include "hard-reg-set.h"
--- 49,57 
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
! #include "backend.h"
  #include "alloc-pool.h"
  #include "alias.h"
  #include "cfghooks.h"
  #include "tree.h"
  #include "hard-reg-set.h"
Index: cfgloopanal.c
===
*** cfgloopanal.c	(revision 225674)
--- cfgloopanal.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 23,29 
  #include "backend.h"
  #include "predict.h"
  #include "rtl.h"
- #include "obstack.h"
  #include "cfgloop.h"
  #include "tree.h"
  #include "flags.h"
--- 23,28 
Index: cfgrtl.c
===
*** cfgrtl.c	(revision 225674)
--- cfgrtl.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 56,62 
  #include "except.h"
  #include "rtl-error.h"
  #include "tm_p.h"
- #include "obstack.h"
  #include "insn-attr.h"
  #include "insn-config.h"
  #include "expmed.h"
--- 56,61 
Index: combine.c
===
*** combine.c	(revision 225674)
--- combine.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 111,117 
  #include "tree-pass.h"
  #include "valtrack.h"
  #include "cgraph.h"
- #include "obstack.h"
  #include "rtl-iter.h"
  
  #ifndef LOAD_EXTEND_OP
--- 111,116 
Index: cprop.c
===
*** cprop.c	(revision 225674)
--- cprop.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 50,56 
  #include "alloc-pool.h"
  #include "cselib.h"
  #include "intl.h"
- #include "obstack.h"
  #include "tree-pass.h"
  #include "dbgcnt.h"
  #include "target.h"
--- 50,55 
Index: dominance.c
===
*** dominance.c	(revision 225674)
--- dominance.c	(working copy)
***
*** 37,43 
  #include "coretypes.h"
  #include "backend.h"
  #include "rtl.h"
- #include "obstack.h"
  #include "cfganal.h"
  #include "diagnostic-core.h"
  #include "alloc-pool.h"
--- 37,42 
Index: fwprop.c
===
*** fwprop.c	(revision 225674)
--- fwprop.c	(working copy)
*** along with GCC; see the file COPYING3.
*** 32,38 
  #include "insn-config.h"
  #include "recog.h"
  #include "flags.h"
- #include "obstack.h"
  #include "cfgrtl.h"
  #include "cfgcleanup.h"
  #include "target.h"
--- 32,37 
Index: gcse.c
===

Re: RFC: Use std::{min,max} instead of MIN/MAX?

2015-07-10 Thread Trevor Saunders
On Fri, Jul 10, 2015 at 03:19:10PM +0200, Marek Polacek wrote:
> Uros had the idea of using std::min/max instead of our MIN/MAX
> macros defined in system.h.  I thought I would do this cleanup,
> but very soon I ran into a problem of failed template argument
> substitution: std::min/max function templates require that both
> arguments be of the same type:
> 
> /home/marek/src/gcc/gcc/caller-save.c: In function ‘void 
> replace_reg_with_saved_mem(rtx_def**, machine_mode, int, void*)’:
> /home/marek/src/gcc/gcc/caller-save.c:1151:63: error: no matching function 
> for call to ‘min(int, short unsigned int)’
>   offset -= (std::min (UNITS_PER_WORD, GET_MODE_SIZE (mode))
>^
> In file included from /usr/include/c++/5.1.1/bits/char_traits.h:39:0,
>  from /usr/include/c++/5.1.1/string:40,
>  from /home/marek/src/gcc/gcc/system.h:201,
>  from /home/marek/src/gcc/gcc/caller-save.c:21:
> /usr/include/c++/5.1.1/bits/stl_algobase.h:195:5: note: candidate: 
> template<class _Tp> const _Tp& std::min(const _Tp&, const _Tp&)
>  min(const _Tp& __a, const _Tp& __b)
>  ^
> /usr/include/c++/5.1.1/bits/stl_algobase.h:195:5: note:   template argument 
> deduction/substitution failed:
> /home/marek/src/gcc/gcc/caller-save.c:1151:63: note:   deduced conflicting 
> types for parameter ‘const _Tp’ (‘int’ and ‘short unsigned int’)
>   offset -= (std::min (UNITS_PER_WORD, GET_MODE_SIZE (mode))
> 
> We can work around this by using casts, but that seems too ugly a solution.

You can also explicitly pick the specialization you want with e.g.
std::max (x, y); it's kind of long, but I can see an argument
for the explicitness so I'm not sure how ugly I think it is.
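
To make that concrete, here is a small sketch of naming the template argument explicitly. The function min_units and its parameters are hypothetical stand-ins for the UNITS_PER_WORD / GET_MODE_SIZE operands in the error above, and int is an assumed choice of common type.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins: an int operand and an unsigned short operand,
// mirroring the mismatched types in the caller-save.c error.
int min_units(int units_per_word, unsigned short mode_size) {
  // std::min (units_per_word, mode_size) does not compile, because
  // deduction finds conflicting types for _Tp.  Naming the type explicitly
  // skips deduction and converts both arguments to int.
  return std::min<int> (units_per_word, mode_size);
}
```

Whoever writes the call has to pick a type that can represent both operands, which is the same decision a cast would force.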

Trev


> So it appears to me that we're stuck with our MIN/MAX macros.
> 
> Thoughts?
> 
>   Marek


Re: Adjust -fdump-ada-spec to C++14 switch (2)

2015-07-10 Thread Eric Botcazou
> We need to skip the constexpr default constructors.

That's actually not sufficient so I have installed the attached patch instead.

Tested on x86_64-suse-linux, applied on the mainline as obvious.


2015-07-10  Eric Botcazou  

c-family/
* c-ada-spec.h (cpp_operation): Revert latest change.
* c-ada-spec.c (print_ada_declaration): Likewise.  Skip implicit
constructors and destructors.
cp/
* decl2.c (cpp_check): Revert latest change.

-- 
Eric Botcazou
Index: c-family/c-ada-spec.h
===
--- c-family/c-ada-spec.h	(revision 225585)
+++ c-family/c-ada-spec.h	(working copy)
@@ -27,7 +27,6 @@ along with GCC; see the file COPYING3.
 typedef enum {
   HAS_DEPENDENT_TEMPLATE_ARGS,
   IS_ABSTRACT,
-  IS_CONSTEXPR,
   IS_CONSTRUCTOR,
   IS_DESTRUCTOR,
   IS_COPY_CONSTRUCTOR,
Index: c-family/c-ada-spec.c
===
--- c-family/c-ada-spec.c	(revision 225585)
+++ c-family/c-ada-spec.c	(working copy)
@@ -2887,7 +2887,6 @@ print_ada_declaration (pretty_printer *b
   bool is_method = TREE_CODE (TREE_TYPE (t)) == METHOD_TYPE;
   tree decl_name = DECL_NAME (t);
   bool is_abstract = false;
-  bool is_constexpr = false;
   bool is_constructor = false;
   bool is_destructor = false;
   bool is_copy_constructor = false;
@@ -2899,7 +2898,6 @@ print_ada_declaration (pretty_printer *b
   if (cpp_check)
 	{
 	  is_abstract = cpp_check (t, IS_ABSTRACT);
-	  is_constexpr = cpp_check (t, IS_CONSTEXPR);
 	  is_constructor = cpp_check (t, IS_CONSTRUCTOR);
 	  is_destructor = cpp_check (t, IS_DESTRUCTOR);
 	  is_copy_constructor = cpp_check (t, IS_COPY_CONSTRUCTOR);
@@ -2913,8 +2911,8 @@ print_ada_declaration (pretty_printer *b
 
   if (is_constructor || is_destructor)
 	{
-	  /* Skip constexpr default constructors.  */
-	  if (is_constexpr)
+	  /* ??? Skip implicit constructors/destructors for now.  */
+	  if (DECL_ARTIFICIAL (t))
 	return 0;
 
 	  /* Only consider constructors/destructors for complete objects.  */
@@ -3050,9 +3048,12 @@ print_ada_declaration (pretty_printer *b
 	  if (num_fields == 1)
 	is_interface = 1;
 
-	  /* Also check that there are only virtual methods.  */
+	  /* Also check that there are only pure virtual methods.  Since the
+	 class is empty, we can skip implicit constructors/destructors.  */
 	  for (tmp = TYPE_METHODS (TREE_TYPE (t)); tmp; tmp = TREE_CHAIN (tmp))
 	{
+	  if (DECL_ARTIFICIAL (tmp))
+		continue;
 	  if (cpp_check (tmp, IS_ABSTRACT))
 		is_abstract_record = 1;
 	  else
Index: cp/decl2.c
===
--- cp/decl2.c	(revision 225585)
+++ cp/decl2.c	(working copy)
@@ -4070,8 +4070,6 @@ cpp_check (tree t, cpp_operation op)
 	}
   case IS_ABSTRACT:
 	return DECL_PURE_VIRTUAL_P (t);
-  case IS_CONSTEXPR:
-	return DECL_DECLARED_CONSTEXPR_P (t);
   case IS_CONSTRUCTOR:
 	return DECL_CONSTRUCTOR_P (t);
   case IS_DESTRUCTOR:

Re: [C++ Patch/RFC] PR 54521

2015-07-10 Thread Jason Merrill

OK.

Jason


Re: [RFC, Fortran, (pr66775)] Allocatable function result

2015-07-10 Thread Steve Kargl
On Fri, Jul 10, 2015 at 06:20:47PM +0200, Mikael Morin wrote:
> 
> I'm not completely convinced by the standard excerpts that have been
> quoted about this topic, as they don't have any explicit mention of
> allocatable variables/expressions.

I did not quote 12.3.3 about "characteristics of function results",
which mentions the allocatable attribute.  But, that is not 
necessarily relevant.  The pieces I quoted explicitly states

   "On completion of execution of the function, the value returned
is that of its function result. ... If the function result is
not a pointer, its value shall be defined by the function."

The function not only needs to allocate memory, it needs to
assign it a value.  In the following, if i <= 0, the function
result is not defined. 

module foo
   contains
   function bar(i)
  integer, allocatable :: bar
  integer, intent(in) :: i
  if (i > 0) bar = i
   end function bar
end module foo

program test
   use foo
   integer j
   j = bar( 3); print *, j
   j = bar(-3); print *, j
end program test

Even if Andre developed a patch to allocate memory in
bar() for the i <= 0 case to prevent the segfault, the
function must return a value.  What should that value be?

I suppose one could argue that gfortran should issue
a run-time error if it can detect the undefined function
result.  But that may lead to a run-time penalty.

-- 
Steve


Re: [gomp4.1] depend(sink) and depend(source) parsing for C

2015-07-10 Thread Aldy Hernandez

On 07/09/2015 11:53 AM, Jakub Jelinek wrote:

Hi!

On Thu, Jul 09, 2015 at 11:24:44AM -0700, Aldy Hernandez wrote:

Thanks for working on it.


+ wide_int offset = wi::neg (addend, &overflow);
+ addend = wide_int_to_tree (TREE_TYPE (addend), offset);
+ if (overflow)
+   warning_at (c_parser_peek_token (parser)->location,
+   OPT_Woverflow,
+   "possible overflow in % offset");


possible overflow looks weird.  Shouldn't it complain the same
as it does if you do:
int c = - (-2147483648);


Done.
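
For illustration, a sketch of the overflow case being discussed, using plain int instead of wide_int. negate_checked is a hypothetical helper, not the wi::neg API; returning the input unchanged on overflow is an assumption chosen to mimic two's-complement wraparound.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: negate ADDEND and report through *OVERFLOW whether
// the mathematical result fits in an int.  Only INT_MIN cannot be negated.
int negate_checked(int addend, bool *overflow) {
  *overflow = (addend == INT_MIN);
  // On overflow return the input unchanged, which is what two's-complement
  // wraparound would produce, and avoid evaluating -INT_MIN (undefined).
  return *overflow ? addend : -addend;
}
```

A parser using something like this could then emit the usual integer overflow diagnostic whenever the flag is set.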


?


--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12489,6 +12489,11 @@ c_finish_omp_clauses (tree clauses, bool declare_simd)
  == OMP_CLAUSE_DEPEND_SOURCE);
  break;
}
+ if (OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK)
+   {
+ gcc_assert (TREE_CODE (t) == TREE_LIST);
+ break;
+   }
  if (TREE_CODE (t) == TREE_LIST)
{
  if (handle_omp_array_sections (c))


Won't this ICE if somebody uses depend(sink:) ? or depend(sink:.::) or
similar garbage?  Make sure you don't create OMP_CLAUSE_DEPEND in that
case.


I've fixed the parser to avoid creating such clause.




diff --git a/gcc/gimple-walk.c b/gcc/gimple-walk.c
index f0e2c67..ba79977 100644
--- a/gcc/gimple-walk.c
+++ b/gcc/gimple-walk.c
@@ -327,6 +327,10 @@ walk_gimple_op (gimple stmt, walk_tree_fn callback_op,
}
break;

+case GIMPLE_OMP_ORDERED:
+  /* Ignore clauses.  */
+  break;
+


I'm not convinced you don't want to walk the clauses.


Ok, I've done so.

Note that the OMP_CLAUSE_DECL will contain a TREE_LIST whose 
TREE_PURPOSE had the variable.  I noticed that walking TREE_LIST's just 
walks the TREE_VALUE, not the TREE_PURPOSE:


case TREE_LIST:
  WALK_SUBTREE (TREE_VALUE (*tp));
  WALK_SUBTREE_TAIL (TREE_CHAIN (*tp));
  break;


So, I changed the layout of the OMP_CLAUSE_DECL TREE_LIST to have the 
variable in the TREE_VALUE.  The TREE_PURPOSE will contain the lone 
integer, which shouldn't need to be walked.  However, if later (C++ 
iterators??) we have a TREE_PURPOSE that needs to be walked we will have 
to change the walker or the layout.





diff --git a/gcc/gimple.h b/gcc/gimple.h
index 6057ea0..e33fe1e 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -527,6 +527,17 @@ struct GTY((tag("GSS_OMP_CRITICAL")))
tree name;
  };

+/* GIMPLE_OMP_ORDERED */
+
+struct GTY((tag("GSS_OMP_ORDERED")))
+  gomp_ordered : public gimple_statement_omp
+{
+  /* [ WORD 1-7 ] : base class */
+
+  /* [ WORD 8 ]  */
+  tree clauses;
+};


I would have expected to use
struct GTY((tag("GSS_OMP_SINGLE_LAYOUT")))
   gomp_ordered : public gimple_statement_omp_single_layout
{
 /* No extra fields; adds invariant:
  stmt->code == GIMPLE_OMP_ORDERED.  */
};
instead (like gomp_single, gomp_teams, ...).


Oh, neat.  I missed that.  Fixed.




@@ -149,6 +149,9 @@ struct gimplify_omp_ctx
struct gimplify_omp_ctx *outer_context;
splay_tree variables;
hash_set *privatized_types;
+  /* Iteration variables in an OMP_FOR.  */
+  tree *iter_vars;
+  int niter_vars;


Wonder if it wouldn't be better to use a vec instead.
Then the size would be there as vec_length.


Done.




@@ -8169,6 +8185,19 @@ gimplify_transaction (tree *expr_p, gimple_seq *pre_p)
return GS_ALL_DONE;
  }

+/* Verify the validity of the depend(sink:...) variable VAR.
+   Return TRUE if everything is OK, otherwise return FALSE.  */
+
+static bool
+verify_sink_var (location_t loc, tree var)
+{
+  for (int i = 0; i < gimplify_omp_ctxp->niter_vars; ++i)
+if (var == gimplify_omp_ctxp->iter_vars[i])
+  return true;
+  error_at (loc, "variable %qE is not an iteration variable", var);
+  return false;


I believe what we want to verify is that ith variable in the OMP_CLAUSE_DECL
vector is iter_vars[i], so not just some random permutation etc.


Fixed.
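
A rough sketch of the positional check requested above, with variables modelled as integer ids instead of trees. sink_vars_match is a hypothetical name, and requiring the two vectors to have equal length is an assumption not stated in the review.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: each variable is an integer id.  Returns true only if
// the i-th sink variable is the i-th iteration variable for every i, so a
// mere permutation of the iteration variables is rejected.
bool sink_vars_match(const std::vector<int> &sink_vars,
                     const std::vector<int> &iter_vars) {
  if (sink_vars.size() != iter_vars.size())  // assumed requirement
    return false;
  for (std::size_t i = 0; i < sink_vars.size(); ++i)
    if (sink_vars[i] != iter_vars[i])
      return false;
  return true;
}
```

The earlier membership-only loop would have accepted the swapped order that this version rejects.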




@@ -3216,7 +3218,51 @@ check_omp_nesting_restrictions (gimple stmt, omp_context 
*ctx)
break;
  }
break;
+case GIMPLE_OMP_TASK:
+  for (c = gimple_omp_task_clauses (stmt); c; c = OMP_CLAUSE_CHAIN (c))
+   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEPEND
+   && (OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SOURCE
+   || OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SINK))
+ {
+   error_at (OMP_CLAUSE_LOCATION (c),
+ "depend(%s) is only available in 'omp ordered'",


Please avoid using ' in diagnostics, it should be % instead.


Fixed.




+ OMP_CLAUSE_DEPEND_KIND (c) == OMP_CLAUSE_DEPEND_SOURCE
+ ? "source" : "sink");
+   return false;
+ }
+  break;


This will eventually be needed also for GIMPLE_OMP_TARGET and
GIMPLE_OMP_ENTER/EXIT_DATA.  But as that isn't really supported right now,
can wait.


I added an asser

Re: [PATCH] PR target/66819: Allow indirect sibcall with register arguments

2015-07-10 Thread H.J. Lu
On Fri, Jul 10, 2015 at 10:21 AM, Uros Bizjak  wrote:
> On Fri, Jul 10, 2015 at 7:10 PM, H.J. Lu  wrote:
>> On Fri, Jul 10, 2015 at 9:30 AM, Uros Bizjak  wrote:
>>> On Thu, Jul 9, 2015 at 12:54 PM, H.J. Lu  wrote:
 Indirect sibcall with register arguments is OK when there is register
 available for argument passing.

 OK for trunk if there is no regression?


 H.J.
 ---
 gcc/

 PR target/66819
 * config/i386/i386.c (ix86_function_ok_for_sibcall): Allow
 indirect sibcall with register arguments if register available
 for argument passing.
 (init_cumulative_args): Set cfun->machine->arg_reg_available_p
 to cum->nregs != 0.
>
> Please update the above entry for nregs > 0.
>
 (function_arg_advance_32): Set cfun->machine->arg_reg_available_p
 to 0 when setting cum->nregs = 0.
>>>
>>> Do we also need similar functionality for 64bit ABIs? What happens if
>>> we are out of argument regs there?
>>
>> 64-bit is OK since we have rax, r10 and r11 as scratch registers which
>> aren't used to pass arguments.
>
> Maybe this fact should be added as a comment in some appropriate place.
>
 * config/i386/i386.h (machine_function): Add arg_reg_available_p.

 gcc/testsuite/

 PR target/66819
 * gcc.target/i386/pr66819-1.c: New test.
 * gcc.target/i386/pr66819-2.c: Likewise.
 * gcc.target/i386/pr66819-3.c: Likewise.
 * gcc.target/i386/pr66819-4.c: Likewise.
 * gcc.target/i386/pr66819-5.c: Likewise.
 ---
  gcc/config/i386/i386.c| 15 +--
  gcc/config/i386/i386.h|  3 +++
  gcc/testsuite/gcc.target/i386/pr66819-1.c |  8 
  gcc/testsuite/gcc.target/i386/pr66819-2.c |  8 
  gcc/testsuite/gcc.target/i386/pr66819-3.c | 10 ++
  gcc/testsuite/gcc.target/i386/pr66819-4.c | 12 
  gcc/testsuite/gcc.target/i386/pr66819-5.c | 10 ++
  7 files changed, 60 insertions(+), 6 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-1.c
  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-2.c
  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-3.c
  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-4.c
  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-5.c

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 54ee6f3..85e59a8 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -5628,12 +5628,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
if (!decl
   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
 {
 - if (ix86_function_regparm (type, NULL) >= 3)
 -   {
 - /* ??? Need to count the actual number of registers to be 
 used,
 -not the possible number of registers.  Fix later.  */
 - return false;
 -   }
 + /* FIXME: The symbol indirect call doesn't need a
 +call-clobbered register.  But we don't know if
 +this is a symbol indirect call or not  here.  */
 + if (ix86_function_regparm (type, NULL) >= 3
 + && !cfun->machine->arg_reg_available_p)
>>>
>>> Isn't it enough to look at arg_reg_available here?
>>
>> We need to check ix86_function_regparm since nregs is 0 if
>> -mregparm=N isn't used and pr65753.c will fail.
>
> OK. Please add this comment; it is not that obvious.
>
>>
 +   return false;
 }
  }

 @@ -6567,6 +6567,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
 Argument info to initialize */
 ? X86_64_REGPARM_MAX
 : X86_64_MS_REGPARM_MAX);
  }
 +  cfun->machine->arg_reg_available_p = cum->nregs != 0;
>>>
>>> false instead of 0. This is a boolean.
>>
>> Updated.
>>
if (TARGET_SSE)
  {
cum->sse_nregs = SSE_REGPARM_MAX;
 @@ -6636,6 +6637,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
 Argument info to initialize */
   else
 cum->nregs = ix86_function_regparm (fntype, fndecl);
 }
 +  cfun->machine->arg_reg_available_p = cum->nregs != 0;
>>>
>>> IMO, cum->nregs > 0 would be more descriptive.
>>
>> Updated.
>>
/* Set up the number of SSE registers used for passing SFmode
  and DFmode arguments.  Warn for mismatching ABI.  */
 @@ -7584,6 +7586,7 @@ pass_in_reg:
 {
   cum->nregs = 0;
   cum->regno = 0;
 + cfun->machine->arg_reg_available_p = 0;
 }
break;

 diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
 index 74334ff..0b6e304 100644
 --

Re: [gomp4] Handle deviceptr from an outer directive

2015-07-10 Thread James Norris

Hi Thomas!

On 07/09/2015 03:51 AM, Thomas Schwinge wrote:

Hi Jim!

On Tue, 7 Jul 2015 10:19:39 -0500, James Norris wrote:

This patch fixes an issue where the deviceptr clause in an outer
directive was being ignored during implicit variable definition
on a nested directive.



Committed to gomp-4_0-branch.



--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c


I'm sorry, have not yet tried very hard; but I can't claim to understand
the logic here -- why is the OpenACC deviceptr clause so special?  :-|


Because no data movement should be initiated on behalf of the user.

Prior to the fix, a map (tofrom) was being inserted when the variable
in question was implied within a nested directive. This is wrong.
No movement should occur as the memory for the variable is already
present on the target. Therefore, the use of 'present' is the correct
specification.
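
As a rough illustration of the rule described above, here is a standalone
sketch of the map-kind choice for an implicitly referenced variable.  The
enum, the macro and the function name are hypothetical stand-ins for the
GOMP_MAP_* kinds and the GOVD_USE_DEVPTR flag, not the actual gimplifier
code:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for GOMP_MAP_TOFROM and GOMP_MAP_FORCE_PRESENT.  */
enum map_kind { MAP_TOFROM, MAP_FORCE_PRESENT };

/* Hypothetical stand-in for GOVD_USE_DEVPTR.  */
#define USE_DEVPTR (1u << 17)

/* Pick the map kind for a variable referenced implicitly in a nested
   directive.  HAS_OUTER says whether the enclosing context has an entry
   for the variable; OUTER_FLAGS is that entry's flag word.  */
static enum map_kind
implicit_map_kind (bool has_outer, unsigned outer_flags)
{
  /* deviceptr on the outer directive: the memory already lives on the
     target, so no data movement may be generated.  */
  if (has_outer && (outer_flags & USE_DEVPTR))
    return MAP_FORCE_PRESENT;
  return MAP_TOFROM;
}
```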





@@ -116,6 +116,9 @@ enum gimplify_omp_var_data
/* Gang-local OpenACC variable.  */
GOVD_GANGLOCAL = (1 << 16),

+  /* OpenACC deviceptr clause.  */
+  GOVD_USE_DEVPTR = (1 << 17),
+
GOVD_DATA_SHARE_CLASS = (GOVD_SHARED | GOVD_PRIVATE | GOVD_FIRSTPRIVATE
   | GOVD_LASTPRIVATE | GOVD_REDUCTION | GOVD_LINEAR
   | GOVD_LOCAL)
@@ -6274,7 +6277,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
}
  break;
}
+
  flags = GOVD_MAP | GOVD_EXPLICIT;
+ if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FORCE_DEVICEPTR)
+   flags |= GOVD_USE_DEVPTR;
  goto do_add;

case OMP_CLAUSE_DEPEND:
@@ -6662,6 +6668,7 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
   : (flags & GOVD_FORCE_MAP
  ? GOMP_MAP_FORCE_TOFROM
  : GOMP_MAP_TOFROM));
+
if (DECL_SIZE (decl)
  && TREE_CODE (DECL_SIZE (decl)) != INTEGER_CST)
{
@@ -6687,7 +6694,17 @@ gimplify_adjust_omp_clauses_1 (splay_tree_node n, void *data)
  OMP_CLAUSE_CHAIN (clause) = nc;
}
else
-   OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl);
+   {
+ if (gimplify_omp_ctxp->outer_context)
+   {
+ struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp->outer_context;
+ splay_tree_node on
+   = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
+ if (on && (on->value & GOVD_USE_DEVPTR))
+   OMP_CLAUSE_SET_MAP_KIND (clause, GOMP_MAP_FORCE_PRESENT);
+   }
+ OMP_CLAUSE_SIZE (clause) = DECL_SIZE_UNIT (decl);
+   }
  }
if (code == OMP_CLAUSE_FIRSTPRIVATE && (flags & GOVD_LASTPRIVATE) != 0)
  {


The patch that you committed (r225518) also includes a test case
(thanks!) as follows:

 --- /dev/null
 +++ gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
 @@ -0,0 +1,12 @@
 +/* { dg-additional-options "-fdump-tree-gimple" } */
 +
 +void
 +subr (int *a)
 +{
 +#pragma acc data deviceptr (a)
 +#pragma acc parallel
 +  a[0] += 1.0;
 +}
 +
 +/* { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel.*map\\(force_present:a \\\[len: 8\\\]\\)" 1 "gimple" } } */
 +/* { dg-final { cleanup-tree-dump "gimple" } } */

That len: 8 is obviously valid only for 64-bit configurations, so will
cause a FAIL on anything else.
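
A tiny sketch of why the length is target-dependent, assuming the dumped
"len" is the size of the pointer parameter a.  The helper name mapped_len
is hypothetical:

```c
#include <stddef.h>

/* Size in bytes of an "int *" on the target this is compiled for.
   Typically 8 on LP64 targets and 4 on ILP32 targets, which is why a
   hard-coded "len: 8" in the scan pattern is not portable.  */
static size_t
mapped_len (void)
{
  return sizeof (int *);
}
```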




Fixed and committed to gomp-4_0-branch.

Thank you, thank you,
Jim

diff --git a/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c b/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
index 4f6184c..0fef364 100644
--- a/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/deviceptr-4.c
@@ -8,5 +8,5 @@ subr (int *a)
   a[0] += 1.0;
 }
 
-/* { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel.*map\\(force_present:a \\\[len: 8\\\]\\)" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel.*map\\(force_present:a" 1 "gimple" } } */
 /* { dg-final { cleanup-tree-dump "gimple" } } */


Re: [PATCH] PR target/66819: Allow indirect sibcall with register arguments

2015-07-10 Thread Uros Bizjak
On Fri, Jul 10, 2015 at 7:10 PM, H.J. Lu  wrote:
> On Fri, Jul 10, 2015 at 9:30 AM, Uros Bizjak  wrote:
>> On Thu, Jul 9, 2015 at 12:54 PM, H.J. Lu  wrote:
>>> Indirect sibcall with register arguments is OK when there is register
>>> available for argument passing.
>>>
>>> OK for trunk if there is no regression?
>>>
>>>
>>> H.J.
>>> ---
>>> gcc/
>>>
>>> PR target/66819
>>> * config/i386/i386.c (ix86_function_ok_for_sibcall): Allow
>>> indirect sibcall with register arguments if register available
>>> for argument passing.
>>> (init_cumulative_args): Set cfun->machine->arg_reg_available_p
>>> to cum->nregs != 0.

Please update the above entry for nregs > 0.

>>> (function_arg_advance_32): Set cfun->machine->arg_reg_available_p
>>> to 0 when setting cum->nregs = 0.
>>
>> Do we also need similar functionality for 64bit ABIs? What happens if
>> we are out of argument regs there?
>
> 64-bit is OK since we have rax, r10 and r11 as scratch registers which
> aren't used to pass arguments.

Maybe this fact should be added as a comment in some appropriate place.

>>> * config/i386/i386.h (machine_function): Add arg_reg_available_p.
>>>
>>> gcc/testsuite/
>>>
>>> PR target/66819
>>> * gcc.target/i386/pr66819-1.c: New test.
>>> * gcc.target/i386/pr66819-2.c: Likewise.
>>> * gcc.target/i386/pr66819-3.c: Likewise.
>>> * gcc.target/i386/pr66819-4.c: Likewise.
>>> * gcc.target/i386/pr66819-5.c: Likewise.
>>> ---
>>>  gcc/config/i386/i386.c| 15 +--
>>>  gcc/config/i386/i386.h|  3 +++
>>>  gcc/testsuite/gcc.target/i386/pr66819-1.c |  8 
>>>  gcc/testsuite/gcc.target/i386/pr66819-2.c |  8 
>>>  gcc/testsuite/gcc.target/i386/pr66819-3.c | 10 ++
>>>  gcc/testsuite/gcc.target/i386/pr66819-4.c | 12 
>>>  gcc/testsuite/gcc.target/i386/pr66819-5.c | 10 ++
>>>  7 files changed, 60 insertions(+), 6 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-1.c
>>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-2.c
>>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-3.c
>>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-4.c
>>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-5.c
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index 54ee6f3..85e59a8 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -5628,12 +5628,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>>>if (!decl
>>>   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
>>> {
>>> - if (ix86_function_regparm (type, NULL) >= 3)
>>> -   {
>>> - /* ??? Need to count the actual number of registers to be 
>>> used,
>>> -not the possible number of registers.  Fix later.  */
>>> - return false;
>>> -   }
>>> + /* FIXME: The symbol indirect call doesn't need a
>>> +call-clobbered register.  But we don't know if
>>> +this is a symbol indirect call or not  here.  */
>>> + if (ix86_function_regparm (type, NULL) >= 3
>>> + && !cfun->machine->arg_reg_available_p)
>>
>> Isn't it enough to look at arg_reg_available here?
>
> We need to check ix86_function_regparm since nregs is 0 if
> -mregparm=N isn't used and pr65753.c will fail.

OK. Please add this comment; it is not that obvious.

>
>>> +   return false;
>>> }
>>>  }
>>>
>>> @@ -6567,6 +6567,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
>>> Argument info to initialize */
>>> ? X86_64_REGPARM_MAX
>>> : X86_64_MS_REGPARM_MAX);
>>>  }
>>> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;
>>
>> false instead of 0. This is a boolean.
>
> Updated.
>
>>>if (TARGET_SSE)
>>>  {
>>>cum->sse_nregs = SSE_REGPARM_MAX;
>>> @@ -6636,6 +6637,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
>>> Argument info to initialize */
>>>   else
>>> cum->nregs = ix86_function_regparm (fntype, fndecl);
>>> }
>>> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;
>>
>> IMO, cum->nregs > 0 would be more descriptive.
>
> Updated.
>
>>>/* Set up the number of SSE registers used for passing SFmode
>>>  and DFmode arguments.  Warn for mismatching ABI.  */
>>> @@ -7584,6 +7586,7 @@ pass_in_reg:
>>> {
>>>   cum->nregs = 0;
>>>   cum->regno = 0;
>>> + cfun->machine->arg_reg_available_p = 0;
>>> }
>>>break;
>>>
>>> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
>>> index 74334ff..0b6e304 100644
>>> --- a/gcc/config/i386/i386.h
>>> +++ b/gcc/config/i386/i386.h
>>> @@ -2479,6 +2479,9 @@ struct GTY(()) machine_function {
>>>/* If true, it is safe to not save/restore DRAP register.  *

Re: [PATCH] PR target/66819: Allow indirect sibcall with register arguments

2015-07-10 Thread H.J. Lu
On Fri, Jul 10, 2015 at 9:30 AM, Uros Bizjak  wrote:
> On Thu, Jul 9, 2015 at 12:54 PM, H.J. Lu  wrote:
>> Indirect sibcall with register arguments is OK when there is register
>> available for argument passing.
>>
>> OK for trunk if there is no regression?
>>
>>
>> H.J.
>> ---
>> gcc/
>>
>> PR target/66819
>> * config/i386/i386.c (ix86_function_ok_for_sibcall): Allow
>> indirect sibcall with register arguments if register available
>> for argument passing.
>> (init_cumulative_args): Set cfun->machine->arg_reg_available_p
>> to cum->nregs != 0.
>> (function_arg_advance_32): Set cfun->machine->arg_reg_available_p
>> to 0 when setting cum->nregs = 0.
>
> Do we also need similar functionality for 64bit ABIs? What happens if
> we are out of argument regs there?

64-bit is OK since we have rax, r10 and r11 as scratch registers which
aren't used to pass arguments.

>> * config/i386/i386.h (machine_function): Add arg_reg_available_p.
>>
>> gcc/testsuite/
>>
>> PR target/66819
>> * gcc.target/i386/pr66819-1.c: New test.
>> * gcc.target/i386/pr66819-2.c: Likewise.
>> * gcc.target/i386/pr66819-3.c: Likewise.
>> * gcc.target/i386/pr66819-4.c: Likewise.
>> * gcc.target/i386/pr66819-5.c: Likewise.
>> ---
>>  gcc/config/i386/i386.c| 15 +--
>>  gcc/config/i386/i386.h|  3 +++
>>  gcc/testsuite/gcc.target/i386/pr66819-1.c |  8 
>>  gcc/testsuite/gcc.target/i386/pr66819-2.c |  8 
>>  gcc/testsuite/gcc.target/i386/pr66819-3.c | 10 ++
>>  gcc/testsuite/gcc.target/i386/pr66819-4.c | 12 
>>  gcc/testsuite/gcc.target/i386/pr66819-5.c | 10 ++
>>  7 files changed, 60 insertions(+), 6 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-1.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-2.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-3.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-4.c
>>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-5.c
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index 54ee6f3..85e59a8 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -5628,12 +5628,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>>if (!decl
>>   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
>> {
>> - if (ix86_function_regparm (type, NULL) >= 3)
>> -   {
>> - /* ??? Need to count the actual number of registers to be used,
>> -not the possible number of registers.  Fix later.  */
>> - return false;
>> -   }
>> + /* FIXME: The symbol indirect call doesn't need a
>> +call-clobbered register.  But we don't know if
>> +this is a symbol indirect call or not  here.  */
>> + if (ix86_function_regparm (type, NULL) >= 3
>> + && !cfun->machine->arg_reg_available_p)
>
> Isn't it enough to look at arg_reg_available here?

We need to check ix86_function_regparm since nregs is 0 if
-mregparm=N isn't used and pr65753.c will fail.
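
To spell out why both conditions are needed, here is a simplified standalone
model of the check.  The function and parameter names are hypothetical:
regparm stands for the result of ix86_function_regparm, and
arg_reg_available for the new flag.

```c
#include <stdbool.h>

/* Simplified model of the 32-bit indirect-sibcall test in the patch.
   Reject only when the callee may take 3 or more register arguments and
   no argument register is left over.  */
static bool
indirect_sibcall_ok_p (int regparm, bool arg_reg_available)
{
  if (regparm >= 3 && !arg_reg_available)
    return false;
  return true;
}
```

Without -mregparm=N, nregs is 0 and so the flag is false, but regparm is
also below 3, so the sibcall is still allowed.  Testing the flag alone would
reject that case.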

>> +   return false;
>> }
>>  }
>>
>> @@ -6567,6 +6567,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
>> Argument info to initialize */
>> ? X86_64_REGPARM_MAX
>> : X86_64_MS_REGPARM_MAX);
>>  }
>> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;
>
> false instead of 0. This is a boolean.

Updated.

>>if (TARGET_SSE)
>>  {
>>cum->sse_nregs = SSE_REGPARM_MAX;
>> @@ -6636,6 +6637,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
>> Argument info to initialize */
>>   else
>> cum->nregs = ix86_function_regparm (fntype, fndecl);
>> }
>> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;
>
> IMO, cum->nregs > 0 would be more descriptive.

Updated.

>>/* Set up the number of SSE registers used for passing SFmode
>>  and DFmode arguments.  Warn for mismatching ABI.  */
>> @@ -7584,6 +7586,7 @@ pass_in_reg:
>> {
>>   cum->nregs = 0;
>>   cum->regno = 0;
>> + cfun->machine->arg_reg_available_p = 0;
>> }
>>break;
>>
>> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
>> index 74334ff..0b6e304 100644
>> --- a/gcc/config/i386/i386.h
>> +++ b/gcc/config/i386/i386.h
>> @@ -2479,6 +2479,9 @@ struct GTY(()) machine_function {
>>/* If true, it is safe to not save/restore DRAP register.  */
>>BOOL_BITFIELD no_drap_save_restore : 1;
>>
>> +  /* If true, there is register available for argument passing.  */
>> +  BOOL_BITFIELD arg_reg_available_p : 1;
>
> This is not a predicate, but a boolean flag. Please remove _p from the name.

Updated.

Here is the updated patch.  OK for trunk?

Thanks.

-- 
H.J.
From 3bcd6c122684d896840b2feb756e

[C++ Patch/RFC] PR 54521

2015-07-10 Thread Paolo Carlini

Hi,

in this rather old issue, we fail to consider explicit constructors in 
the second step of a copy-initialization, so the testcase below is rejected. 
I'm not 100% sure, but using the existing comments as a guide, I think I 
found the place where we wrongly set LOOKUP_ONLYCONVERTING for the 
second step too. The minimal change passes testing, anyway.
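
For readers who want the flag selection in isolation, here is a small
standalone sketch of the logic the patch ends up with.  The bit values and
the function name copy_init_flags are hypothetical; the real LOOKUP_*
constants live in the C++ front end:

```c
/* Hypothetical bit values standing in for LOOKUP_NORMAL,
   LOOKUP_ONLYCONVERTING and LOOKUP_NO_CONVERSION.  */
enum
{
  NORMAL = 1 << 0,
  ONLYCONVERTING = 1 << 1,
  NO_CONVERSION = 1 << 2
};

/* Compute the lookup flags.  USER_CONV_P is nonzero in the second step of
   copy-initialization, i.e. after a user-defined conversion.  */
static int
copy_init_flags (int user_conv_p)
{
  int flags = NORMAL;
  if (user_conv_p)
    /* Second step: no further conversions, but explicit constructors
       stay eligible because ONLYCONVERTING is not set.  */
    flags |= NO_CONVERSION;
  else
    flags |= ONLYCONVERTING;
  return flags;
}
```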


Thanks,
Paolo.


Index: cp/call.c
===
--- cp/call.c   (revision 225678)
+++ cp/call.c   (working copy)
@@ -6437,12 +6437,14 @@ convert_like_real (conversion *convs, tree expr, t
   /* Copy-initialization where the cv-unqualified version of the source
 type is the same class as, or a derived class of, the class of the
 destination [is treated as direct-initialization].  [dcl.init] */
-  flags = LOOKUP_NORMAL|LOOKUP_ONLYCONVERTING;
+  flags = LOOKUP_NORMAL;
   if (convs->user_conv_p)
/* This conversion is being done in the context of a user-defined
   conversion (i.e. the second step of copy-initialization), so
   don't allow any more.  */
flags |= LOOKUP_NO_CONVERSION;
+  else
+   flags |= LOOKUP_ONLYCONVERTING;
   if (convs->rvaluedness_matches_p)
flags |= LOOKUP_PREFER_RVALUE;
   if (TREE_CODE (expr) == TARGET_EXPR
Index: testsuite/g++.dg/init/explicit3.C
===
--- testsuite/g++.dg/init/explicit3.C   (revision 0)
+++ testsuite/g++.dg/init/explicit3.C   (working copy)
@@ -0,0 +1,12 @@
+// PR c++/54521
+
+struct X
+{
+  X(int) {}
+  explicit X(X const &) {}
+};
+
+int main()
+{
+  X x = 1;
+}


Re: [Patch, Fortran, 66035, v2] [5/6 Regression] gfortran ICE segfault

2015-07-10 Thread Mikael Morin
Hello Andre,

On 06/07/2015 13:54, Andre Vehreschild wrote:
> Hi all,
> 
> please find attached the next version of the patch for pr66035 fixing an ICE.
> Scope (copied from first submit):
> 
> An ICE occurred when in a structure constructor an allocatable component of
> type class was initialized with an existing class object. This was caused by 
> 
> - the size of the memory to allocate for the component was miscalculated,
> - the vptr was not set correctly, and
> - when the class object used for initialization was already allocatable, it
>   was copied, wasting some memory, instead of a view_convert being inserted.
> 
> Bootstraps and regtests fine on x86_64-linux-gnu/f21.
> 
> Ok for trunk?
> 
> Regards,
>   Andre
> 
> 
> pr66035_2.patch
> 
> diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
> index 195f7a4..74af725 100644
> --- a/gcc/fortran/trans-expr.c
> +++ b/gcc/fortran/trans-expr.c
> @@ -6903,6 +6903,29 @@ alloc_scalar_allocatable_for_subcomponent_assignment (stmtblock_t *block,
>  TREE_TYPE (tmp), tmp,
>  fold_convert (TREE_TYPE (tmp), size));
>  }
> +  else if (cm->ts.type == BT_CLASS)
> +{
> +  gcc_assert (expr2->ts.type == BT_CLASS || expr2->ts.type == BT_DERIVED);
> +  if (expr2->ts.type == BT_DERIVED)
> + {
> +   tmp = gfc_get_symbol_decl (expr2->ts.u.derived);
> +   size = TYPE_SIZE_UNIT (tmp);
> + }
> +  else
> + {
> +   gfc_expr *e2vtab;
> +   gfc_se se;
> +   e2vtab = gfc_find_and_cut_at_last_class_ref (expr2);
> +   gfc_add_vptr_component (e2vtab);
> +   gfc_add_size_component (e2vtab);
> +   gfc_init_se (&se, NULL);
> +   gfc_conv_expr (&se, e2vtab);
> +   gfc_add_block_to_block (block, &se.pre);
> +   size = fold_convert (size_type_node, se.expr);
> +   gfc_free_expr (e2vtab);
> + }
> +  size_in_bytes = size;
> +}
>else
>  {
>/* Otherwise use the length in bytes of the rhs.  */
That part is OK.

> @@ -7030,7 +7053,8 @@ gfc_trans_subcomponent_assign (tree dest, gfc_component * cm, gfc_expr * expr,
>gfc_add_expr_to_block (&block, tmp);
>  }
>else if (init && (cm->attr.allocatable
> -|| (cm->ts.type == BT_CLASS && CLASS_DATA (cm)->attr.allocatable)))
> +|| (cm->ts.type == BT_CLASS && CLASS_DATA (cm)->attr.allocatable
> +&& expr->ts.type != BT_CLASS)))
>  {
>/* Take care about non-array allocatable components here.  The alloc_*
>routine below is motivated by the alloc_scalar_allocatable_for_
> @@ -7074,6 +7098,14 @@ gfc_trans_subcomponent_assign (tree dest, gfc_component * cm, gfc_expr * expr,
> tmp = gfc_build_memcpy_call (tmp, se.expr, size);
> gfc_add_expr_to_block (&block, tmp);
>   }
> +  else if (cm->ts.type == BT_CLASS && expr->ts.type == BT_CLASS)
> + {
> +   tmp = gfc_copy_class_to_class (se.expr, dest, integer_one_node,
> +CLASS_DATA (cm)->attr.unlimited_polymorphic);
> +   gfc_add_expr_to_block (&block, tmp);
> +   gfc_add_modify (&block, gfc_class_vptr_get (dest),
> +   gfc_class_vptr_get (se.expr));
> + }
>else
>   gfc_add_modify (&block, tmp,
>   fold_convert (TREE_TYPE (tmp), se.expr));
But this hunk is canceled by the one before, isn't it?
I mean, if the condition here is true, then the condition before was false?

Mikael


Re: [RFC, Fortran, (pr66775)] Allocatable function result

2015-07-10 Thread Andre Vehreschild
Hi Mikael, hi all,

I only had the chance to check with ifort (different versions, including the
most recent one), and that compiler is consistent with gfortran as it is now,
i.e., the executable segfaults after the function has been called.

I am curious, though, what other compilers' opinion on that point is.

Regards,
Andre

On 10 July 2015 18:20:47 CEST, Mikael Morin wrote:
>Hello all,
>
>I'm not completely convinced by the standard excerpts that have been
>quoted about this topic, as they don't have any explicit mention of
>allocatable variables/expressions.
>For what it's worth, the handling of allocatables that Andre proposed
>makes sense to me.  It's consistent with what is done for derived type
>assignment: the lhs' allocatable components are deallocated if their
>rhs counterparts are unallocated.  Doing the same for whole objects
>would be, well, consistent.
>What is done by the other compilers?
>
>Mikael

-- 
Andre Vehreschild * Kreuzherrenstr. 8 * 52062 Aachen
Mail: ve...@gmx.de * Tel.: +49 241 9291018


*Ping* Re: [Patch, fortran] PR61831 side-effect deallocation of variable components

2015-07-10 Thread Mikael Morin
Ping: https://gcc.gnu.org/ml/fortran/2015-06/msg00075.html

On 21/06/2015 11:48, Mikael Morin wrote:
> On 16/05/2015 18:43, Mikael Morin wrote:
>> Hello,
>>
>> this is about PR61831 where in code like:
>>  
>>  type :: string_t
>> character(LEN=1), dimension(:), allocatable :: chars
>>  end type string_t
>>  type(string_t) :: prt_in
>>  (...)
>>  tmp = new_prt_spec ([prt_in])
>>  
>> the deallocation of the argument's allocatable components after the
>> procedure call (to new_prt_spec) has the side effect of freeing prt_in's
>> allocatable components, as the array constructor temporary for [prt_in]
>> is a shallow copy of prt_in.
>>
>> This bug is a regression caused by Dominique's PR41936 memory leak
>> fix, itself based on a patch originally from me.
>>
>> The attached patch is basically a revert of that fix.  It avoids the
>> problem by not deallocating allocatable components in the problematic
>> case, at the price of a (possible) memory leak.  A new function is
>> introduced telling whether there is aliasing, so that we don't regress
>> on PR41936's memory leak when there is no aliasing, and we don't free
>> components when there is aliasing.
>> The possible remaining memory leak case is the case of a "mixed" array
>> constructor with some parts aliasing variables, and some non-aliasing parts.
>>
>> The patch also takes the opportunity to reassemble the scattered
>> procedure argument deallocation code into a single place.
>>
>> The test needs pr65792's fix (thanks Paul), so for the 4.9 branch I
>> propose commenting the parts that depend on PR65792 in the test.
>>
>> Regression tested on x86_64-linux. OK for 6/5/4.9 ?
> 
> Hello,
> 
> I would like to come back to the patch:
> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01491.html
> 
> PR66082 made me notice one bug in that patch:
> for descriptorless arrays, the patch was deallocating only the first
> element's allocatable components.
> As gfc_conv_array_parameter returns the array data only, the bounds are
> lost and it is not possible to loop through all the elements.
> 
> With the attached patch, the deallocation code is kept in
> gfc_conv_array_parameter where the bounds of descriptorless arrays are
> known.
> To test that this fixes the bug, I have added a count of while (1) loops in
> the dump of pr66082's test.  I'm open to better ideas to properly test this.
> 
> For arrays with descriptors, I have decided to not handle them in
> gfc_conv_array_parameter, because in some cases gfc_conv_expr_descriptor
> is called directly from gfc_conv_procedure_call, without passing through
> gfc_conv_array_parameter.
> To be able to select everything but descriptorless arrays for
> allocatable component deallocation, the flags are moved around somewhat.
> The rest is as in the previous patch.
> 
> The test provided is basically unchanged, thus not suitable for the
> branches without pr65792.
> We can decide what to do with it later (backport pr65792 or disable
> parts of the test), I would like to have the fix in mainline first.
> It has been fortran-tested on x86_64-linux, OK for trunk?
> 
> Mikael
> 
> 
> 


Re: [PATCH] PR target/66819: Allow indirect sibcall with register arguments

2015-07-10 Thread Uros Bizjak
On Thu, Jul 9, 2015 at 12:54 PM, H.J. Lu  wrote:
> Indirect sibcall with register arguments is OK when there is register
> available for argument passing.
>
> OK for trunk if there is no regression?
>
>
> H.J.
> ---
> gcc/
>
> PR target/66819
> * config/i386/i386.c (ix86_function_ok_for_sibcall): Allow
> indirect sibcall with register arguments if register available
> for argument passing.
> (init_cumulative_args): Set cfun->machine->arg_reg_available_p
> to cum->nregs != 0.
> (function_arg_advance_32): Set cfun->machine->arg_reg_available_p
> to 0 when setting cum->nregs = 0.

Do we also need similar functionality for 64bit ABIs? What happens if
we are out of argument regs there?

> * config/i386/i386.h (machine_function): Add arg_reg_available_p.
>
> gcc/testsuite/
>
> PR target/66819
> * gcc.target/i386/pr66819-1.c: New test.
> * gcc.target/i386/pr66819-2.c: Likewise.
> * gcc.target/i386/pr66819-3.c: Likewise.
> * gcc.target/i386/pr66819-4.c: Likewise.
> * gcc.target/i386/pr66819-5.c: Likewise.
> ---
>  gcc/config/i386/i386.c| 15 +--
>  gcc/config/i386/i386.h|  3 +++
>  gcc/testsuite/gcc.target/i386/pr66819-1.c |  8 
>  gcc/testsuite/gcc.target/i386/pr66819-2.c |  8 
>  gcc/testsuite/gcc.target/i386/pr66819-3.c | 10 ++
>  gcc/testsuite/gcc.target/i386/pr66819-4.c | 12 
>  gcc/testsuite/gcc.target/i386/pr66819-5.c | 10 ++
>  7 files changed, 60 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-4.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr66819-5.c
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 54ee6f3..85e59a8 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -5628,12 +5628,12 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
>if (!decl
>   || (TARGET_DLLIMPORT_DECL_ATTRIBUTES && DECL_DLLIMPORT_P (decl)))
> {
> - if (ix86_function_regparm (type, NULL) >= 3)
> -   {
> - /* ??? Need to count the actual number of registers to be used,
> -not the possible number of registers.  Fix later.  */
> - return false;
> -   }
> + /* FIXME: The symbol indirect call doesn't need a
> +call-clobbered register.  But we don't know if
> +this is a symbol indirect call or not  here.  */
> + if (ix86_function_regparm (type, NULL) >= 3
> + && !cfun->machine->arg_reg_available_p)

Isn't it enough to look at arg_reg_available here?

> +   return false;
> }
>  }
>
> @@ -6567,6 +6567,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
> Argument info to initialize */
> ? X86_64_REGPARM_MAX
> : X86_64_MS_REGPARM_MAX);
>  }
> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;

false instead of 0. This is a boolean.

>if (TARGET_SSE)
>  {
>cum->sse_nregs = SSE_REGPARM_MAX;
> @@ -6636,6 +6637,7 @@ init_cumulative_args (CUMULATIVE_ARGS *cum,  /* 
> Argument info to initialize */
>   else
> cum->nregs = ix86_function_regparm (fntype, fndecl);
> }
> +  cfun->machine->arg_reg_available_p = cum->nregs != 0;

IMO, cum->nregs > 0 would be more descriptive.

>/* Set up the number of SSE registers used for passing SFmode
>  and DFmode arguments.  Warn for mismatching ABI.  */
> @@ -7584,6 +7586,7 @@ pass_in_reg:
> {
>   cum->nregs = 0;
>   cum->regno = 0;
> + cfun->machine->arg_reg_available_p = 0;
> }
>break;
>
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 74334ff..0b6e304 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -2479,6 +2479,9 @@ struct GTY(()) machine_function {
>/* If true, it is safe to not save/restore DRAP register.  */
>BOOL_BITFIELD no_drap_save_restore : 1;
>
> +  /* If true, there is register available for argument passing.  */
> +  BOOL_BITFIELD arg_reg_available_p : 1;

This is not a predicate, but a boolean flag. Please remove _p from the name.

> +
>/* During prologue/epilogue generation, the current frame state.
>   Otherwise, the frame state at the end of the prologue.  */
>struct machine_frame_state fs;
> diff --git a/gcc/testsuite/gcc.target/i386/pr66819-1.c 
> b/gcc/testsuite/gcc.target/i386/pr66819-1.c
> new file mode 100644
> index 000..7c8a1ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr66819-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile { target ia32 } } */
> +/* { dg-options "-O2 -mregparm=3" } */
> +/* { dg

Re: [RFC, Fortran, (pr66775)] Allocatable function result

2015-07-10 Thread Mikael Morin
Hello all,

I'm not completely convinced by the standard excerpts that have been
quoted about this topic, as they don't have any explicit mention of
allocatable variables/expressions.
For what it's worth, the handling of allocatables that Andre proposed
makes sense to me.  It's consistent with what is done for derived type
assignment: the lhs' allocatable components are deallocated if their
rhs counterparts are unallocated.  Doing the same for whole objects
would be, well, consistent.
What is done by the other compilers?

Mikael


Re: [Fortran, patch, pr64589, v1] [OOP] Linking error due to undefined integer symbol with unlimited polymorphism

2015-07-10 Thread Mikael Morin
On 10/07/2015 16:51, Andre Vehreschild wrote:
> Hi everyone,
> 
> attached is a rather trivial patch to fix a linker issue when unlimited
> polymorphism is used and the vtabs of intrinsic types are referenced from two
> different locations (e.g. module and main program). Gfortran finds the vtab
> defined in the scope of a module's subroutine and tries to link it to a
> reference in a subroutine of the main program. Then name mangling takes
> place (the module's name is prefixed to the vtab's identifier) and the linker
> later on cannot link the reference in the subroutine of the main program to
> the module's entity. Putting the vtabs of all intrinsic types into the
> top-level scope fixes this easily. The linker is now able to find the name
> (although it is mangled) and linking is fine.
> 
> I rather don't understand why the decision to put intrinsic type's vtabs into
> the local scope was chosen. There are not so many intrinsic types that they
> can effectively clutter the top-level scope. Instead putting the intrinsic
> types into local scope bloats the executable, because the same entity is
> created over and over again. So this time removing two lines of code did the
> trick. 
> 
> Bootstraps and regtests fine on x86_64-linux-gnu/f21.
> 
> Ok for trunk?
> 
OK. Thanks.

Mikael


[PATCH] Factor out bb_has_abnormal_call_pred (PR middle-end/66353)

2015-07-10 Thread Marek Polacek
ira-lives.c and lra-lives.c both define the same function named
bb_has_abnormal_call_pred.  Let's factor this function out into
basic-block.h, where it really belongs.
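
For readers without a GCC tree at hand, here is a minimal standalone sketch of the predicate being factored out.  It is not the GCC code: the toy_* types and flag values are hypothetical stand-ins for basic_block, edge and the EDGE_* flags, and a plain array replaces FOR_EACH_EDGE.  It only illustrates the "scan the predecessor edges for either flag" logic that both files duplicated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's edge flags; the values are made up.  */
enum
{
  TOY_EDGE_FALLTHRU = 1 << 0,
  TOY_EDGE_ABNORMAL_CALL = 1 << 1,
  TOY_EDGE_EH = 1 << 2
};

struct toy_edge
{
  unsigned flags;
};

/* A toy basic block: just an array of predecessor edges.  */
struct toy_bb
{
  const struct toy_edge *preds;
  size_t n_preds;
};

/* Return true when one of the predecessor edges of BB is marked with
   TOY_EDGE_ABNORMAL_CALL or TOY_EDGE_EH.  */
static inline bool
toy_has_abnormal_call_or_eh_pred_edge_p (const struct toy_bb *bb)
{
  for (size_t i = 0; i < bb->n_preds; i++)
    if (bb->preds[i].flags & (TOY_EDGE_ABNORMAL_CALL | TOY_EDGE_EH))
      return true;

  return false;
}

/* Small usage example: a block reached only by fallthru edges versus one
   that is also reached by an EH edge.  */
void
toy_pred_self_check (void)
{
  const struct toy_edge plain[] = { { TOY_EDGE_FALLTHRU }, { TOY_EDGE_FALLTHRU } };
  const struct toy_edge mixed[] = { { TOY_EDGE_FALLTHRU }, { TOY_EDGE_EH } };
  const struct toy_bb bb_plain = { plain, 2 };
  const struct toy_bb bb_mixed = { mixed, 2 };

  assert (!toy_has_abnormal_call_or_eh_pred_edge_p (&bb_plain));
  assert (toy_has_abnormal_call_or_eh_pred_edge_p (&bb_mixed));
}
```

Keeping the helper "static inline" in a shared header, as the patch does, lets both callers use one definition without adding a new out-of-line function.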

Bootstrap/regtest running on x86_64-linux, ok for trunk if it passes?

2015-07-10  Marek Polacek  

PR middle-end/66353
* basic-block.h (has_abnormal_call_or_eh_pred_edge_p): New function.
* ira-lives.c (bb_has_abnormal_call_pred): Remove function.
(process_bb_node_lives): Call has_abnormal_call_or_eh_pred_edge_p
rather than bb_has_abnormal_call_pred.
* lra-lives.c (bb_has_abnormal_call_pred): Remove function.
(process_bb_lives): Call has_abnormal_call_or_eh_pred_edge_p
rather than bb_has_abnormal_call_pred.

diff --git gcc/basic-block.h gcc/basic-block.h
index 67555b2..389ed9f 100644
--- gcc/basic-block.h
+++ gcc/basic-block.h
@@ -632,4 +632,21 @@ has_abnormal_or_eh_outgoing_edge_p (basic_block bb)
 
   return false;
 }
+
+/* Return true when one of the predecessor edges of BB is marked with
+   EDGE_ABNORMAL_CALL or EDGE_EH.  */
+
+static inline bool
+has_abnormal_call_or_eh_pred_edge_p (basic_block bb)
+{
+  edge e;
+  edge_iterator ei;
+
+  FOR_EACH_EDGE (e, ei, bb->preds)
+if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
+  return true;
+
+  return false;
+}
+
 #endif /* GCC_BASIC_BLOCK_H */
diff --git gcc/ira-lives.c gcc/ira-lives.c
index 7358f67..1cb05c2 100644
--- gcc/ira-lives.c
+++ gcc/ira-lives.c
@@ -968,22 +968,6 @@ process_single_reg_class_operands (bool in_p, int freq)
 }
 }
 
-/* Return true when one of the predecessor edges of BB is marked with
-   EDGE_ABNORMAL_CALL or EDGE_EH.  */
-static bool
-bb_has_abnormal_call_pred (basic_block bb)
-{
-  edge e;
-  edge_iterator ei;
-
-  FOR_EACH_EDGE (e, ei, bb->preds)
-{
-  if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
-   return true;
-}
-  return false;
-}
-
 /* Look through the CALL_INSN_FUNCTION_USAGE of a call insn INSN, and see if
we find a SET rtx that we can use to deduce that a register can be cheaply
caller-saved.  Return such a register, or NULL_RTX if none is found.  */
@@ -1343,7 +1327,8 @@ process_bb_node_lives (ira_loop_tree_node_t 
loop_tree_node)
  /* No need to record conflicts for call clobbered regs if we
 have nonlocal labels around, as we don't ever try to
 allocate such regs in this case.  */
- if (!cfun->has_nonlocal_label && bb_has_abnormal_call_pred (bb))
+ if (!cfun->has_nonlocal_label
+ && has_abnormal_call_or_eh_pred_edge_p (bb))
for (px = 0; px < FIRST_PSEUDO_REGISTER; px++)
  if (call_used_regs[px]
 #ifdef REAL_PIC_OFFSET_TABLE_REGNUM
diff --git gcc/lra-lives.c gcc/lra-lives.c
index 8b86368..322b3bf 100644
--- gcc/lra-lives.c
+++ gcc/lra-lives.c
@@ -508,22 +508,6 @@ static lra_insn_recog_data_t curr_id;
 /* The insn static data.  */
 static struct lra_static_insn_data *curr_static_id;
 
-/* Return true when one of the predecessor edges of BB is marked with
-   EDGE_ABNORMAL_CALL or EDGE_EH.  */
-static bool
-bb_has_abnormal_call_pred (basic_block bb)
-{
-  edge e;
-  edge_iterator ei;
-
-  FOR_EACH_EDGE (e, ei, bb->preds)
-{
-  if (e->flags & (EDGE_ABNORMAL_CALL | EDGE_EH))
-   return true;
-}
-  return false;
-}
-
 /* Vec containing execution frequencies of program points.  */
 static vec point_freq_vec;
 
@@ -965,7 +949,8 @@ process_bb_lives (basic_block bb, int &curr_point, bool 
dead_insn_p)
   /* No need to record conflicts for call clobbered regs if we
 have nonlocal labels around, as we don't ever try to
 allocate such regs in this case.  */
-  if (!cfun->has_nonlocal_label && bb_has_abnormal_call_pred (bb))
+  if (!cfun->has_nonlocal_label
+ && has_abnormal_call_or_eh_pred_edge_p (bb))
for (px = 0; px < FIRST_PSEUDO_REGISTER; px++)
  if (call_used_regs[px]
 #ifdef REAL_PIC_OFFSET_TABLE_REGNUM

Marek


[PATCH, i386]: Prevent subregs of a hard register after reload [was: Fix PR 66814, ...]

2015-07-10 Thread Uros Bizjak
On Thu, Jul 9, 2015 at 10:17 PM, Jakub Jelinek  wrote:
> On Thu, Jul 09, 2015 at 10:13:49PM +0200, Uros Bizjak wrote:
>> I was under impression that peephole2 pass doesn't see subregs of hard
>> regs (all x86 predicates are written in this way). Even documentation
>> somehow agrees with this:

[...]

>> So, I'd say that generating naked SUBREG after reload should be
>> avoided and gen_lowpart should be used in the code above.
>
> There is also:
>   emit_insn (gen_sse2_loadld (operands[3], CONST0_RTX (V4SImode),
>   gen_rtx_SUBREG (SImode, operands[1], 0)));
>   emit_insn (gen_sse2_loadld (operands[4], CONST0_RTX (V4SImode),
>   gen_rtx_SUBREG (SImode, operands[1], 4)));
> in some splitters (also post-reload).

Attached patch converts all relevant places to
gen_lowpart/gen_highpart to prevent subregs of a hard register after
reload.
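
As a rough plain-C analogy (this is not RTL and uses no GCC API; the toy_* names are made up for illustration), the difference is between addressing half of a 64-bit value by byte offset, which is what the SUBREG offsets 0 and 4 did, and asking for the low or high half by meaning:

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of a 64-bit value, selected by meaning rather than by a
   byte offset into its storage.  */
static inline uint32_t
toy_lowpart32 (uint64_t v)
{
  return (uint32_t) v;
}

/* High 32 bits of a 64-bit value.  */
static inline uint32_t
toy_highpart32 (uint64_t v)
{
  return (uint32_t) (v >> 32);
}

/* Reassemble the 64-bit value from its two halves, loosely mirroring how
   the splitter loads both halves and interleaves them.  */
static inline uint64_t
toy_combine (uint32_t lo, uint32_t hi)
{
  return ((uint64_t) hi << 32) | lo;
}

/* Usage example: splitting and recombining is the identity.  */
void
toy_parts_self_check (void)
{
  const uint64_t v = 0x1122334455667788ULL;

  assert (toy_lowpart32 (v) == 0x55667788u);
  assert (toy_highpart32 (v) == 0x11223344u);
  assert (toy_combine (toy_lowpart32 (v), toy_highpart32 (v)) == v);
}
```

The sketch only covers the value semantics; the point of the patch itself is that gen_lowpart/gen_highpart avoid emitting a naked SUBREG of a hard register after reload.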

2015-07-10  Uros Bizjak  

* config/i386/sse.md (movdi_to_sse): Use gen_lowpart
and gen_highpart instead of gen_rtx_SUBREG.
* config/i386/i386.md
(floatdi2_i387_with_xmm splitter): Ditto.
(read-modify peephole2): Use gen_lowpart instead of
gen_rtx_SUBREG for operand 5.

Patch was bootstrapped (with --enable-checking=rtl) and regression
tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to gcc-5 branch together
with PR 66814 patch once branch opens.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 225675)
+++ config/i386/i386.md (working copy)
@@ -5100,11 +5100,11 @@
   /* The DImode arrived in a pair of integral registers (e.g. %edx:%eax).
  Assemble the 64-bit DImode value in an xmm register.  */
   emit_insn (gen_sse2_loadld (operands[3], CONST0_RTX (V4SImode),
- gen_rtx_SUBREG (SImode, operands[1], 0)));
+ gen_lowpart (SImode, operands[1])));
   emit_insn (gen_sse2_loadld (operands[4], CONST0_RTX (V4SImode),
- gen_rtx_SUBREG (SImode, operands[1], 4)));
+ gen_highpart (SImode, operands[1])));
   emit_insn (gen_vec_interleave_lowv4si (operands[3], operands[3],
-operands[4]));
+operands[4]));
 
   operands[3] = gen_rtx_REG (DImode, REGNO (operands[3]));
 })
@@ -18064,11 +18064,13 @@
 
   operands[1] = gen_rtx_PLUS (word_mode, base,
  gen_rtx_MULT (word_mode, index, GEN_INT (scale)));
-  operands[5] = base;
   if (mode != word_mode)
 operands[1] = gen_rtx_SUBREG (mode, operands[1], 0);
+
+  operands[5] = base;
   if (op1mode != word_mode)
-operands[5] = gen_rtx_SUBREG (op1mode, operands[5], 0);
+operands[5] = gen_lowpart (op1mode, operands[5]);
+
   operands[0] = dest;
 })
 
Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 225675)
+++ config/i386/sse.md  (working copy)
@@ -1080,9 +1080,9 @@
   /* The DImode arrived in a pair of integral registers (e.g. %edx:%eax).
 Assemble the 64-bit DImode value in an xmm register.  */
   emit_insn (gen_sse2_loadld (operands[0], CONST0_RTX (V4SImode),
- gen_rtx_SUBREG (SImode, operands[1], 0)));
+ gen_lowpart (SImode, operands[1])));
   emit_insn (gen_sse2_loadld (operands[2], CONST0_RTX (V4SImode),
- gen_rtx_SUBREG (SImode, operands[1], 4)));
+ gen_highpart (SImode, operands[1])));
   emit_insn (gen_vec_interleave_lowv4si (operands[0], operands[0],
 operands[2]));
}


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-10 Thread Jim Wilson
On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
 wrote:
> On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson  
> wrote:
>>signed sub-word locals.  Thus to detect the need for a conversion, you
>>have to have the decls, and we don't have them here.  There is also
>
> It probably is.  The decls for the parameter based SSA names are available, 
> for the PHI destination there might be no decl.

I tried looking again, and found the decls.  I'm able to get correct
code for my testcase with the attached patch to force the conversion.
It is rather inelegant, but I think I can cache the values I need to
make this simpler and cleaner.  I still don't have decls from
insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
looks like those are for breaking cycles, and hence might not need
conversions.
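
To make the problem concrete, here is a small plain-C sketch of what goes wrong when a promoted sub-word value is copied between partitions whose promotions differ in signedness.  It models an 8-bit variable living in a 32-bit register; the toy_* helpers are hypothetical and only mimic the idea in the patch below (take the low part in the declared mode, then extend with the destination's signedness), not the actual RTL calls.

```c
#include <assert.h>
#include <stdint.h>

/* Promote an 8-bit value to a 32-bit "register" by zero extension.  */
static inline uint32_t
toy_promote_unsigned (uint8_t v)
{
  return (uint32_t) v;
}

/* Promote an 8-bit value to a 32-bit "register" by sign extension.  */
static inline uint32_t
toy_promote_signed (int8_t v)
{
  return (uint32_t) (int32_t) v;
}

/* Copy a zero-extended promoted register into a destination that expects
   a sign-extended one: drop back to the low 8 bits, then re-extend with
   the destination's signedness.  */
static inline uint32_t
toy_copy_unsigned_to_signed (uint32_t src_reg)
{
  uint8_t low = (uint8_t) src_reg;
  /* Portable sign extension of an 8-bit pattern.  */
  int32_t extended = (int32_t) (low ^ 0x80) - 0x80;
  return (uint32_t) extended;
}

/* Usage example: the bit pattern 0xFF is 255 when promoted as unsigned but
   must look like -1 in a register promoted as signed.  */
void
toy_promote_self_check (void)
{
  uint32_t src = toy_promote_unsigned (0xFF);

  assert (src == 0x000000FFu);
  /* A plain register copy would leave 0x000000FF, which is wrong for a
     sign-extended destination.  */
  assert (src != toy_promote_signed (-1));
  assert (toy_copy_unsigned_to_signed (src) == toy_promote_signed (-1));
}
```

A plain register-to-register copy is only safe when source and destination agree on the promotion; otherwise the extra truncate-and-extend step is needed.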

Jim
Index: tree-outof-ssa.c
===
--- tree-outof-ssa.c	(revision 225477)
+++ tree-outof-ssa.c	(working copy)
@@ -230,11 +230,32 @@ set_location_for_edge (edge e)
SRC/DEST might be BLKmode memory locations SIZEEXP is a tree from
which we deduce the size to copy in that case.  */
 
-static inline rtx_insn *
-emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp)
+rtx_insn *
+emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp,
+		 tree var2 ATTRIBUTE_UNUSED)
 {
   start_sequence ();
 
+  /* If var2 is set, then sizeexp is the src decl and var2 is the dest decl.  */
+  if (var2)
+{
+  tree src_var = (TREE_CODE (sizeexp) == SSA_NAME
+		  ? SSA_NAME_VAR (sizeexp) : sizeexp);
+  tree dest_var = (TREE_CODE (var2) == SSA_NAME
+		   ? SSA_NAME_VAR (var2) : var2);
+  int src_unsignedp = TYPE_UNSIGNED (TREE_TYPE (src_var));
+  int dest_unsignedp = TYPE_UNSIGNED (TREE_TYPE (dest_var));
+  machine_mode src_mode = promote_decl_mode (src_var, &src_unsignedp);
+  machine_mode dest_mode = promote_decl_mode (dest_var, &dest_unsignedp);
+  if (src_unsignedp != dest_unsignedp
+	  && src_mode != DECL_MODE (src_var)
+	  && dest_mode != DECL_MODE (dest_var))
+	{
+	  src = gen_lowpart_common (DECL_MODE (src_var), src);
+	  unsignedsrcp = dest_unsignedp;
+	}
+}
+
   if (GET_MODE (src) != VOIDmode && GET_MODE (src) != GET_MODE (dest))
 src = convert_to_mode (GET_MODE (dest), src, unsignedsrcp);
   if (GET_MODE (src) == BLKmode)
@@ -256,7 +277,7 @@ emit_partition_copy (rtx dest, rtx src,
 static void
 insert_partition_copy_on_edge (edge e, int dest, int src, source_location locus)
 {
-  tree var;
+  tree var, var2;
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file,
@@ -276,10 +297,11 @@ insert_partition_copy_on_edge (edge e, i
 set_curr_insn_location (locus);
 
   var = partition_to_var (SA.map, src);
+  var2 = partition_to_var (SA.map, dest);
   rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]),
    copy_rtx (SA.partition_to_pseudo[src]),
    TYPE_UNSIGNED (TREE_TYPE (var)),
-   var);
+   var, var2);
 
   insert_insn_on_edge (seq, e);
 }
@@ -373,7 +395,8 @@ insert_rtx_to_part_on_edge (edge e, int
  involved), so it doesn't matter.  */
   rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]),
    src, unsignedsrcp,
-   partition_to_var (SA.map, dest));
+   partition_to_var (SA.map, dest), 0);
+
 
   insert_insn_on_edge (seq, e);
 }
@@ -406,7 +429,7 @@ insert_part_to_rtx_on_edge (edge e, rtx
   rtx_insn *seq = emit_partition_copy (dest,
    copy_rtx (SA.partition_to_pseudo[src]),
    TYPE_UNSIGNED (TREE_TYPE (var)),
-   var);
+   var, 0);
 
   insert_insn_on_edge (seq, e);
 }


[PATCH, i386]: Robustify gcc.target/i386/readeflags-1.c

2015-07-10 Thread Uros Bizjak
Hello!

When using __readeflags, we have to prevent possible flag-clobbering
zero-extensions and make flag-setting operation persistent.

2015-07-10  Uros Bizjak  

PR target/66703
* gcc.target/i386/readeflags-1.c (readeflags_test): Declare with
__attribute__((noinline, noclone)).  Change "x" to "volatile char"
type to prevent possible flag-clobbering zero-extensions.
* gcc.target/i386/pr66703.c: New test.

Tested on x86_64-linux-gnu {,-m32}  and committed to mainline SVN.

Uros.
Index: gcc.target/i386/readeflags-1.c
===
--- gcc.target/i386/readeflags-1.c  (revision 225675)
+++ gcc.target/i386/readeflags-1.c  (working copy)
@@ -11,10 +11,11 @@
 #define EFLAGS_TYPE unsigned int
 #endif
 
-static EFLAGS_TYPE
+__attribute__((noinline, noclone))
+EFLAGS_TYPE
 readeflags_test (unsigned int a, unsigned int b)
 {
-  unsigned x = (a == b);
+  volatile char x = (a == b);
   return __readeflags ();
 }
 
Index: gcc.target/i386/pr66703.c
===
--- gcc.target/i386/pr66703.c   (revision 0)
+++ gcc.target/i386/pr66703.c   (working copy)
@@ -0,0 +1,4 @@
+/* { dg-do run { target { ia32 } } } */
+/* { dg-options "-O0 -mtune=pentium" } */
+
+#include "readeflags-1.c"


[PATCH] MIPS: Correctly update the isa and arch_test_option_p variables after the arch dependency handling code in mips.exp

2015-07-10 Thread Andrew Bennett
Hi,

I have noticed that in the mips.exp dg-option handling code the isa and 
arch_test_option_p variables are not updated after the pre-arch to arch 
dependency handling.  This means that if this code changes the 
architecture the post-arch dependency handling code (which relies on 
arch_test_option_p being true) is not run to handle any extra dependencies 
the new architecture might need.  

I found this issue while investigating failures with the mips-mti-elf 
toolchain using the -mnan=legacy multilib flags when running any of the 
mips tests that have the HAS_LSA option specified in the dg-options.  The 
default architecture for this toolchain is mips32r2.  This means the 
architecture handling code changes the architecture to mips32r6 to handle 
the HAS_LSA requirements.  Unfortunately, because arch_test_option_p is not 
updated it is still set to false, so the post-arch code is not run.  This 
means the nan encoding is not set to -mnan=2008, which then causes the tests 
to fail because mips32r6 does not support -mnan=legacy. 

The patch and ChangeLog are below.

Ok to commit?



Regards,



Andrew


testsuite/
* gcc.target/mips/mips.exp (mips-dg-options): Update the isa and
arch_test_option_p variables after the arch dependency handling code.


diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
b/gcc/testsuite/gcc.target/mips/mips.exp
index 1dd4173..1eb714d 100644
--- a/gcc/testsuite/gcc.target/mips/mips.exp
+++ b/gcc/testsuite/gcc.target/mips/mips.exp
@@ -1188,8 +1188,10 @@ proc mips-dg-options { args } {
 }
 
 # Re-calculate the isa_rev for use in the abi handling code below
+set arch_test_option_p [mips_test_option_p options arch]
 set arch [mips_option options arch]
 set isa_rev [mips_arch_info $arch isa_rev]
+set isa [mips_arch_info $arch isa]
 
 # Set an appropriate ABI, handling dependencies between the pre-abi
 # options and the abi options.  This should mirror the abi and post-abi



Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2015-07-10 Thread Jim Wilson
On Wed, Jul 8, 2015 at 3:54 PM, Jeff Law  wrote:
> On 07/07/2015 10:29 AM, Jim Wilson wrote:
> This is critically important as various parts of the compiler will take a
> degenerate PHI node and propagate the RHS of the PHI into the uses of the
> LHS of the PHI -- without doing any conversions.

I think this is OK, because tree-outof-ssa does send code in basic
blocks through expand_expr, which will emit conversions if necessary.
It is only the conversion of PHI nodes to RTL that is the problem, as
it doesn't use expand_expr, and hence doesn't get the
SUBREG_PROMOTED_P conversions.

Jim


[Fortran, patch, pr64589, v1] [OOP] Linking error due to undefined integer symbol with unlimited polymorphism

2015-07-10 Thread Andre Vehreschild
Hi everyone,

attached is a rather trivial patch to fix a linker issue when unlimited
polymorphism is used and the vtabs of intrinsic types are referenced from two
different locations (e.g. module and main program). Gfortran finds the vtab
defined in the scope of a module's subroutine and tries to link it to a
reference in a subroutine of the main program. Then name mangling takes
place (the module's name is prefixed to the vtab's identifier) and the linker
later on can not link the reference in the subroutine of the main program to the
module's entity. By putting the vtabs of all intrinsic types into the top-level
scope this is easily fixed. The linker now is able to find the name (although
it is mangled) and linking is fine. 

I rather don't understand why the decision to put intrinsic type's vtabs into
the local scope was chosen. There are not so many intrinsic types that they
can effectively clutter the top-level scope. Instead putting the intrinsic
types into local scope bloats the executable, because the same entity is
created over and over again. So this time removing two lines of code did the
trick. 

Bootstraps and regtests fine on x86_64-linux-gnu/f21.

Ok for trunk?

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


pr64589_1.clog
Description: Binary data
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 7990399..218973d 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -2511,10 +2511,8 @@ find_intrinsic_vtab (gfc_typespec *ts)
 
   sprintf (name, "__vtab_%s", tname);
 
-  /* Look for the vtab symbol in various namespaces.  */
-  gfc_find_symbol (name, gfc_current_ns, 0, &vtab);
-  if (vtab == NULL)
-	gfc_find_symbol (name, ns, 0, &vtab);
+  /* Look for the vtab symbol in the top-level namespace only.  */
+  gfc_find_symbol (name, ns, 0, &vtab);
 
   if (vtab == NULL)
 	{
diff --git a/gcc/testsuite/gfortran.dg/pr64589.f90 b/gcc/testsuite/gfortran.dg/pr64589.f90
new file mode 100644
index 000..6e65e70
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr64589.f90
@@ -0,0 +1,30 @@
+! { dg-do compile }
+! Just need to check if compiling and linking is possible.
+!
+! Check that the _vtab linking issue is resolved.
+! Contributed by Damian Rouson  
+
+module m
+contains
+  subroutine fmt()
+class(*), pointer :: arg
+select type (arg)
+type is (integer)
+end select
+  end subroutine
+end module
+
+program p
+  call getSuffix()
+contains
+  subroutine makeString(arg1)
+class(*) :: arg1
+select type (arg1)
+type is (integer)
+end select
+  end subroutine
+  subroutine getSuffix()
+call makeString(1)
+  end subroutine
+end
+


[patch] enable the building on libatomic on DragonFly

2015-07-10 Thread John Marino
With the attached patch, libatomic will build and pass 100% of the tests
on DragonFly.

suggested entry for libatomic/ChangeLog:

2015-07-XX  John Marino  

* configure.tgt: Add *-*-dragonfly to supported targets.

Please consider this patch for trunk.
Thanks,
John
--- libatomic/configure.tgt.orig	2015-07-09 16:08:55 UTC
+++ libatomic/configure.tgt
@@ -110,7 +110,7 @@ case "${target}" in
;;
 
   *-*-linux* | *-*-gnu* | *-*-k*bsd*-gnu \
-  | *-*-netbsd* | *-*-freebsd* | *-*-openbsd* \
+  | *-*-netbsd* | *-*-freebsd* | *-*-openbsd* | *-*-dragonfly* \
   | *-*-solaris2* | *-*-sysv4* | *-*-irix6* | *-*-osf* | *-*-hpux11* \
   | *-*-darwin* | *-*-aix* | *-*-cygwin*)
# POSIX system.  The OS is supported.


Re: [RFC, PATCH] Split pool_allocator and create a new object_allocator

2015-07-10 Thread Pat Haugen

On 07/09/2015 04:43 PM, Martin Liška wrote:

This final version which I agreed with Richard Sandiford.
Hope this can be finally installed to trunk?

Patch can bootstrap and survive regression tests on x86_64-linux-gnu.
FWIW, I confirmed this version of the patch fixes the build issues on 
powerpc64 that I have been seeing.


-Pat



Re: [PATCH 4/6] Port ipa-cp to use cgraph_edge summary.

2015-07-10 Thread Martin Jambor
Hi,

I know the patch has been approved by Jeff, but please do not commit
it before considering the following:

On Thu, Jul 09, 2015 at 11:13:53AM +0200, Martin Liska wrote:
> gcc/ChangeLog:
> 
> 2015-07-03  Martin Liska  
> 
>   * ipa-cp.c (struct edge_clone_summary): New structure.
>   (class edge_clone_summary_t): Likewise.
>   (edge_clone_summary_t::initialize): New method.
>   (edge_clone_summary_t::duplicate): Likewise.
>   (get_next_cgraph_edge_clone): Remove.
>   (get_info_about_necessary_edges): Refactor using the new
>   data structure.
>   (gather_edges_for_value): Likewise.
>   (perhaps_add_new_callers): Likewise.
>   (ipcp_driver): Allocate and deallocate newly added
>   instance.
> ---
>  gcc/ipa-cp.c | 198 
> ++-
>  1 file changed, 113 insertions(+), 85 deletions(-)
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 16b9cde..8a50b63 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -2888,54 +2888,79 @@ ipcp_discover_new_direct_edges (struct cgraph_node 
> *node,
>  inline_update_overall_summary (node);
>  }
>  
> -/* Vector of pointers which for linked lists of clones of an original crgaph
> -   edge. */
> +/* Edge clone summary.  */
>  
> -static vec next_edge_clone;
> -static vec prev_edge_clone;
> -
> -static inline void
> -grow_edge_clone_vectors (void)
> +struct edge_clone_summary

It's got constructors and destructors so it should be a class, really.

>  {
> -  if (next_edge_clone.length ()
> -  <=  (unsigned) symtab->edges_max_uid)
> -next_edge_clone.safe_grow_cleared (symtab->edges_max_uid + 1);
> -  if (prev_edge_clone.length ()
> -  <=  (unsigned) symtab->edges_max_uid)
> -prev_edge_clone.safe_grow_cleared (symtab->edges_max_uid + 1);
> -}
> +  /* Default constructor.  */
> +  edge_clone_summary (): edge_set (NULL), edge (NULL) {}
>  
> -/* Edge duplication hook to grow the appropriate linked list in
> -   next_edge_clone. */
> +  /* Default destructor.  */
> +  ~edge_clone_summary ()
> +  {
> +gcc_assert (edge_set != NULL);
>  
> -static void
> -ipcp_edge_duplication_hook (struct cgraph_edge *src, struct cgraph_edge *dst,
> - void *)
> +if (edge != NULL)
> +  {
> + gcc_checking_assert (edge_set->contains (edge));
> + edge_set->remove (edge);
> +  }
> +
> +/* Release memory for an empty set.  */
> +if (edge_set->elements () == 0)
> +  delete edge_set;
> +  }
> +
> +  hash_set  *edge_set;
> +  cgraph_edge *edge;

If the hash set is supposed to replace the linked list of edge clones,
then a removal mechanism seems to be missing.  The whole point of
prev_edge_clone vector was to allow removal of edges from the linked
list, because as speculative edges are thrown away, clones can be too
and then we must remove the pointer from the list, or hash set.

Have you tried -O3 LTOing Firefox with these changes?

But I must say that I'm not convinced that converting the linked list
into a hash_set is a good idea at all.  Apart from the self-removal
operation, the lists are always traversed linearly and in full, so
except for using a C++-style iterator, I really do not see any point.

Moreover, you seem to create a hash table for each and every edge,
even when it has no clones, just to be able to enter the edge itself
into it, and so not skip it when you iterate over all clones.  That
really seems like unjustifiable overhead.  And the deletion in
duplication hook is also very unappealing.  So the bottom line is that
while I like turning the two vectors into a summary, I do not like the
hash set at all.  If you absolutely think it is a good idea, please make
that change in a separate patch so that we can better argue about its
merits.

On the other hand, since the summaries are hash-based themselves, it
would be great if they had a predicate to find out whether there is
any summary for a given edge at all and have get_next_cgraph_edge_clone
return false if there was none.  That would actually save memory.

Thanks,

Martin



Re: [PATCH][AArch64] Improve csinc/csneg/csinv opportunities on immediates

2015-07-10 Thread Kyrill Tkachov


On 10/07/15 10:00, pins...@gmail.com wrote:





On Jul 10, 2015, at 1:47 AM, Kyrill Tkachov  wrote:

Hi Andrew,


On 10/07/15 09:40, pins...@gmail.com wrote:




On Jul 10, 2015, at 1:34 AM, Kyrill Tkachov  wrote:

Hi all,

Currently when evaluating expressions like (a ? 24 : 25) we will move 24 and 25
into registers and perform a csel on them.  This misses the opportunity to
instead move just 24 into a register and then perform a csinc, saving us an
instruction and a register use.
Similarly for csneg and csinv.

This patch implements that idea by allowing such pairs of immediates in
*cmov_insn and adding an early splitter that performs the necessary
transformation.
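
As a plain-C illustration of the semantics involved (not the patch itself), the conditional-select instructions can be modelled as below.  The toy_* functions are made up for this example; toy_pair_needs_one_constant is a hypothetical check and is not claimed to match aarch64_imms_ok_for_cond_op.  Unsigned arithmetic is used so that increment and negation wrap instead of overflowing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models of the AArch64 conditional-select family.  */
static inline uint64_t
toy_csel (bool c, uint64_t a, uint64_t b)
{
  return c ? a : b;
}

static inline uint64_t
toy_csinc (bool c, uint64_t a, uint64_t b)
{
  return c ? a : b + 1;
}

static inline uint64_t
toy_csneg (bool c, uint64_t a, uint64_t b)
{
  return c ? a : (uint64_t) 0 - b;
}

static inline uint64_t
toy_csinv (bool c, uint64_t a, uint64_t b)
{
  return c ? a : ~b;
}

/* a ? 24 : 25 with only one constant materialised in a "register".  */
static inline uint64_t
toy_select_24_25 (bool a)
{
  uint64_t r = 24;
  return toy_csinc (a, r, r);
}

/* Hypothetical check: can (c ? x : y) be built from a single constant
   with csinc, csneg or csinv, possibly after inverting the condition?  */
static inline bool
toy_pair_needs_one_constant (uint64_t x, uint64_t y)
{
  return y == x + 1 || x == y + 1      /* csinc */
	 || y == (uint64_t) 0 - x      /* csneg (symmetric) */
	 || y == ~x;                   /* csinv (symmetric) */
}

/* Usage example: the csinc form agrees with the two-constant csel form.  */
void
toy_cond_self_check (void)
{
  assert (toy_select_24_25 (true) == toy_csel (true, 24, 25));
  assert (toy_select_24_25 (false) == toy_csel (false, 24, 25));
}
```

The two-constant csel form needs both 24 and 25 in registers, while the csinc form reuses one register for both operands, which is where the saved mov and the lower register pressure come from.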

The testcase included in the patch demonstrates the kind of opportunities that
are now picked up.

With this patch I see about 9.6% more csinc instructions being generated for
SPEC2006 and the generated code looks objectively better (i.e. fewer
mov-immediates and slightly lower register pressure).

Bootstrapped and tested on aarch64.

Ok for trunk?

I think this is the wrong place for this optimization. It should happen in 
expr.c and we should produce cond_expr on the gimple level.

I had considered it, but I wasn't sure how general the conditional 
increment/negate/inverse operations
are to warrant a midend implementation. Do you mean the 
expand_cond_expr_using_cmove function in expr.c?

Yes and we can expand it to even have a target hook on how to expand them if 
needed.


I played around in that part and it seems that by the time it gets to expansion 
the midend
doesn't have a cond_expr of the two immediates, it's a PHI node with the 
immediates already expanded.
I have not been able to get it to match a cond_expr of two immediates there, 
although that could be
because I'm unfamiliar with that part of the codebase.

Kyrill



There is already a standard pattern for conditional add, so the
a ? const1 : const2 case can be handled in a generic way without much
trouble. We should handle it better in rtl ifcvt too (that should be an
easier patch). The neg and not cases are very target specific but can be
handled by a target hook that expands them directly.


  
I have patches to do both but I have not got around to cleaning them up. If anyone wants them, I can send a link to my current gcc 5.1 sources with them included.

Any chance you can post them on gcc-patches even as a rough idea of what needs 
to be done?


I posted my expr patch a few years ago but I never got around to rth's 
comments. This was the generic increment patch. Basically aarch64 should be 
implementing that pattern too.


The main reason why this should be handled in gimple is that ifcvt on the rtl 
level is not cheap and does not catch all of the cases the simple expansion of 
phi-opt does. I can dig that patch up and I will be doing that next week 
anyways.

Thanks,
Andrew


Thanks,
Kyrill

  
Thanks,

Andrew


Thanks,
Kyrill

2015-07-10  Kyrylo Tkachov  

* config/aarch64/aarch64.md (*cmov_insn): Move stricter
check for operands 3 and 4 to pattern predicate.  Allow immediates
that can be expressed as csinc/csneg/csinv.  New define_split.
(*csinv3_insn): Rename to...
(csinv3_insn): ... This.
* config/aarch64/aarch64.h (AARCH64_IMMS_OK_FOR_CSNEG): New macro.
(AARCH64_IMMS_OK_FOR_CSINC): Likewise.
(AARCH64_IMMS_OK_FOR_CSINV): Likewise.
* config/aarch64/aarch64.c (aarch64_imms_ok_for_cond_op_1):
New function.
(aarch64_imms_ok_for_cond_op): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_imms_ok_for_cond_op_1):
Declare prototype.
(aarch64_imms_ok_for_cond_op): Likewise.

2015-07-10  Kyrylo Tkachov  

* gcc.target/aarch64/cond-op-imm_1.c: New test.





Re: [RFC, Fortran, (pr66775)] Allocatable function result

2015-07-10 Thread Steve Kargl
Yes, it should be closed.  When I asked you to open it,
I thought the issue was a corner case in your patch.

-- 
steve

On Fri, Jul 10, 2015 at 11:44:32AM +0200, Andre Vehreschild wrote:
> 
> this means that pr66775 is to be closed as resolved invalid, because the
> current implementation is alright, but only the program to compile is garbage.
> Ok, suits me.
> 
> - Andre
> 
> On Thu, 9 Jul 2015 12:41:31 -0700
> Steve Kargl  wrote:
> 
> > On Thu, Jul 09, 2015 at 08:59:08PM +0200, Andre Vehreschild wrote:
> > > Hi Steve,
> > > 
> > > Thanks for your knowledge. Can you support your statement that an
> > > allocatable function has to return an allocated object by a part of the
> > > standard? I totally agree with you that this code is ill-designed, but IMO
> > > is it not the task of the compiler to address ill design. The compiler has
> > > to comply to the standard and the standard allows allocatable objects to be
> > > unallocated. So why has the result of a function be allocated always?
> > > 
> > > Regards,
> > > Andre
> > > 
> > 
> > I think the following excerpts from F2008 are the relevant
> > clauses, especially the 2nd to last sentence in the excerpt
> > from 12.6.2.2.
> > 
> > !  12.5.3
> > !
> > !  When execution of the function is complete, the value of
> > !  the function result is available for use in the expression
> > !  that caused the function to be invoked.
> > !
> > !  12.6.2.2
> > !
> > !  If RESULT appears, the name of the result variable of the
> > !  function is result-name and all occurrences of the function
> > !  name in execution-part statements in its scope refer to the
> > !  function itself.  If RESULT does not appear, the name of the
> > !  result variable is function-name and all occurrences of the
> > !  function name in execution-part statements in its scope are
> > !  references to the result variable.  The characteristics (12.3.3)
> > !  of the function result are those of the result variable.  On
> > !  completion of execution of the function, the value returned is
> > !  that of its result variable.  If the function result is a pointer,
> > !  the shape of the value returned by the function is determined by
> > !  the shape of the result variable when the execution of the function
> > !  is completed.  If the result variable is not a pointer, its value
> > !  shall be defined by the function.  If the function result is a
> > !  pointer, on return the pointer association status of the result
> > !  variable shall not be undefined.
> > 
> 
> 
> -- 
> Andre Vehreschild * Email: vehre ad gmx dot de 

-- 
Steve


Re: [PATCH 2/6] Introduce new edge_summary class and replace ipa_edge_args_sum.

2015-07-10 Thread Martin Jambor
Hi,

thanks for working on this and sorry for a tad late review:

On Thu, Jul 09, 2015 at 11:13:52AM +0200, Martin Liska wrote:
> gcc/ChangeLog:
> 
> 2015-07-03  Martin Liska  
> 
>   * cgraph.c (symbol_table::create_edge): Introduce summary_uid
>   for cgraph_edge.
>   * cgraph.h (struct GTY): Likewise.

struct GTY does not look right :-)

>   * ipa-inline-analysis.c (estimate_function_body_sizes): Use
>   new data structure.
>   * ipa-profile.c (ipa_profile): Likewise.
>   * ipa-prop.c (ipa_print_node_jump_functions):

  Likewise.

>   (ipa_propagate_indirect_call_infos): Likewise.
>   (ipa_free_edge_args_substructures): Likewise.
>   (ipa_free_all_edge_args): Likewise.
>   (ipa_edge_args_t::remove): Likewise.
>   (ipa_edge_removal_hook): Likewise.
>   (ipa_edge_args_t::duplicate): Likewise.
>   (ipa_register_cgraph_hooks): Likewise.
>   (ipa_unregister_cgraph_hooks): Likewise.
>   * ipa-prop.h (ipa_check_create_edge_args): Likewise.
>   (ipa_edge_args_info_available_for_edge_p): Likewise.

Definition of ipa_edge_args_t is missing here.

>   * symbol-summary.h (gt_ggc_mx): Indent properly.
>   (gt_pch_nx): Likewise.
>   (edge_summary): New class.
> ---
>  gcc/cgraph.c  |   2 +
>  gcc/cgraph.h  |   5 +-
>  gcc/ipa-inline-analysis.c |   2 +-
>  gcc/ipa-profile.c |   2 +-
>  gcc/ipa-prop.c|  71 +++-
>  gcc/ipa-prop.h|  44 ++
>  gcc/symbol-summary.h  | 208 
> +-
>  7 files changed, 252 insertions(+), 82 deletions(-)
> 

...

> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index e6725aa..f0af9b2 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -493,13 +493,36 @@ public:
>  extern ipa_node_params_t *ipa_node_params_sum;
>  /* Vector of IPA-CP transformation data for each clone.  */
>  extern GTY(()) vec *ipcp_transformations;
> -/* Vector where the parameter infos are actually stored. */
> -extern GTY(()) vec *ipa_edge_args_vector;
> +
> +/* Function summary for ipa_node_params.  */
> +class GTY((user)) ipa_edge_args_t: public edge_summary 
> +{
> +public:
> +  ipa_edge_args_t (symbol_table *symtab):
> +edge_summary  (symtab, true) { }
> +
> +  static ipa_edge_args_t *create_ggc (symbol_table *symtab)
> +  {

Please move the body of this function to where the bodies of the rest
of the member functions are.

> +ipa_edge_args_t *summary = new (ggc_cleared_alloc  ())
> +  ipa_edge_args_t (symtab);
> +return summary;
> +  }
> +
> +  /* Hook that is called by summary when a node is duplicated.  */
> +  virtual void duplicate (cgraph_edge *edge,
> +   cgraph_edge *edge2,
> +   ipa_edge_args *data,
> +   ipa_edge_args *data2);
> +
> +  virtual void remove (cgraph_edge *edge, ipa_edge_args *data);
> +};
> +
> +extern GTY(()) edge_summary  *ipa_edge_args_sum;
>  
>  /* Return the associated parameter/argument info corresponding to the given
> node/edge.  */
>  #define IPA_NODE_REF(NODE) (ipa_node_params_sum->get (NODE))
> -#define IPA_EDGE_REF(EDGE) (&(*ipa_edge_args_vector)[(EDGE)->uid])
> +#define IPA_EDGE_REF(EDGE) (ipa_edge_args_sum->get (EDGE))
>  /* This macro checks validity of index returned by
> ipa_get_param_decl_index function.  */
>  #define IS_VALID_JUMP_FUNC_INDEX(I) ((I) != -1)
> @@ -532,19 +555,8 @@ ipa_check_create_node_params (void)
>  static inline void
>  ipa_check_create_edge_args (void)
>  {
> -  if (vec_safe_length (ipa_edge_args_vector)
> -  <= (unsigned) symtab->edges_max_uid)
> -vec_safe_grow_cleared (ipa_edge_args_vector, symtab->edges_max_uid + 1);
> -}
> -
> -/* Returns true if the array of edge infos is large enough to accommodate an
> -   info for EDGE.  The main purpose of this function is that debug dumping
> -   function can check info availability without causing reallocations.  */
> -
> -static inline bool
> -ipa_edge_args_info_available_for_edge_p (struct cgraph_edge *edge)
> -{
> -  return ((unsigned) edge->uid < vec_safe_length (ipa_edge_args_vector));
> +  if (ipa_edge_args_sum == NULL)
> +ipa_edge_args_sum = ipa_edge_args_t::create_ggc (symtab);
>  }
>  
>  static inline ipcp_transformation_summary *
> diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h
> index eefbfd9..5799443 100644
> --- a/gcc/symbol-summary.h
> +++ b/gcc/symbol-summary.h
> @@ -108,7 +108,7 @@ public:
>/* Allocates new data that are stored within map.  */
>T* allocate_new ()
>{
> -return m_ggc ? new (ggc_alloc  ()) T() : new T () ;
> +return m_ggc ? new (ggc_alloc  ()) T () : new T () ;
>}
>  
>/* Release an item that is stored within map.  */
> @@ -234,7 +234,7 @@ private:
>  
>  template 
>  void
> -gt_ggc_mx(function_summary* const &summary)
> +gt_ggc_mx (function_summary* const &summary)
>  {
>gcc_checking_assert 

Re: [PATCH 5/6] Port IPA reference to function_summary infrastructure.

2015-07-10 Thread Martin Jambor
Hi,

I've spotted a likely typo:

On Thu, Jul 09, 2015 at 11:13:53AM +0200, Martin Liska wrote:
> gcc/ChangeLog:
> 
> 2015-07-03  Martin Liska  
> 
>   * ipa-reference.c (ipa_ref_opt_summary_t): New class.
>   (get_reference_optimization_summary): Use it.
>   (set_reference_optimization_summary): Likewise.
>   (ipa_init): Remove hook holders usage.
>   (ipa_reference_c_finalize): Likewise.
>   (ipa_ref_opt_summary_t::duplicate): New function.
>   (ipa_ref_opt_summary_t::remove): Likewise.
>   (propagate): Allocate the summary if does not exist.
>   (ipa_reference_read_optimization_summary): Likewise.
>   (struct ipa_reference_vars_info_d): Add new method.
>   (struct ipa_reference_optimization_summary_d): Likewise.
>   (get_reference_vars_info): Use new underlying container.
>   (set_reference_vars_info): Remove.
>   (init_function_info): Set up the container.
> ---
>  gcc/ipa-reference.c | 203 ++--
>  1 file changed, 102 insertions(+), 101 deletions(-)
> 
> diff --git a/gcc/ipa-reference.c b/gcc/ipa-reference.c
> index 465a74b..2afd9ad 100644
> --- a/gcc/ipa-reference.c
> +++ b/gcc/ipa-reference.c

...

> @@ -837,12 +839,14 @@ propagate (void)
>   }
>  }
>  
> +  if (ipa_ref_opt_sum_summaries == NULL)
> +ipa_ref_opt_sum_summaries = new ipa_ref_opt_summary_t (symtab);
> +
>/* Cleanup. */
>FOR_EACH_DEFINED_FUNCTION (node)
>  {
>ipa_reference_vars_info_t node_info;
>ipa_reference_global_vars_info_t node_g;
> -  ipa_reference_optimization_summary_t opt;
>  
>node_info = get_reference_vars_info (node);
>if (!node->alias && opt_for_fn (node->decl, flag_ipa_reference)
> @@ -851,8 +855,8 @@ propagate (void)
>   {
> node_g = &node_info->global;
>  
> -   opt = XCNEW (struct ipa_reference_optimization_summary_d);
> -   set_reference_optimization_summary (node, opt);
> +   ipa_reference_optimization_summary_d *opt =
> + ipa_ref_opt_sum_summaries->get (node);
>  
> /* Create the complimentary sets.  */
>  
> @@ -880,14 +884,20 @@ propagate (void)
> node_g->statics_written);
>   }
>   }
> -  free (node_info);
> }
>  
>ipa_free_postorder_info ();
>free (order);
>  
>bitmap_obstack_release (&local_info_obstack);
> -  ipa_reference_vars_vector.release ();
> +
> +  if (ipa_ref_var_info_summaries == NULL)

I assume you meant != NULL here.

> +{
> +  delete ipa_ref_var_info_summaries;
> +  ipa_ref_var_info_summaries = NULL;
> +}
> +
> +  ipa_ref_var_info_summaries = NULL;
>if (dump_file)
>  splay_tree_delete (reference_vars_to_consider);
>reference_vars_to_consider = NULL;

Thanks,

Martin



[PATCH v2, libcpp] Faster line lexer.

2015-07-10 Thread Ondřej Bílka
On Fri, Jul 10, 2015 at 12:43:48PM +0200, Jakub Jelinek wrote:
> On Fri, Jul 10, 2015 at 11:37:18AM +0200, Uros Bizjak wrote:
> > Have you tried new SSE4.2 implementation (the one with asm flags) with
> > unrolled loop?
> 
> Also, the SSE4.2 implementation looks shorter, so more I-cache friendly,
> so I wouldn't really say it is redundant if they are roughly same speed.
> 
Ok, I also tried to optimize sse4 and found that the main problem was
that checking index==16 caused high latency.

The trick was to check the first 64 bytes in a header using flags. The
loop is then relatively unlikely to be reached, as lines longer than 64
bytes are relatively rare.

I tested that on more machines. On haswell sse4 is noticeably faster, on
nehalem sse2 is still a bit faster, and on amd fx10 sse4 is a lot slower.
How do I check the processor to select sse2 on amd processors, where sse4
is considerably slower?
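
One possible way to do the vendor check, sketched here with GCC's x86 CPU-detection builtins. The `LineSearch` enum and `choose_line_search` function are hypothetical names for illustration and need not match how libcpp actually selects its implementation; the "prefer SSE2 on AMD" rule is only the assumption suggested by the fx10 timings, not a measured policy.

```cpp
// Hypothetical selector for a line-search implementation.
enum class LineSearch { Generic, Sse2, Sse42 };

LineSearch choose_line_search() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  // Must be called before the other __builtin_cpu_* queries when running
  // early (e.g. from a constructor); harmless otherwise.
  __builtin_cpu_init();

  bool has_sse2 = __builtin_cpu_supports("sse2");
  bool has_sse42 = __builtin_cpu_supports("sse4.2");

  // Assumption: SSE4.2 string instructions are slow on AMD parts, so
  // fall through to the SSE2 version there even if SSE4.2 is available.
  if (has_sse42 && !__builtin_cpu_is("amd"))
    return LineSearch::Sse42;
  if (has_sse2)
    return LineSearch::Sse2;
#endif
  // Non-x86 targets or very old CPUs: use the portable scalar search.
  return LineSearch::Generic;
}
```

A finer-grained check (per microarchitecture rather than per vendor) would follow the same shape, but which models to single out would need benchmarking.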

nehalem

real    0m12.091s
user    0m12.088s
sys     0m0.000s

real    0m11.350s
user    0m11.347s
sys     0m0.000s

real    0m8.521s
user    0m8.519s
sys     0m0.000s

real    0m10.407s
user    0m10.402s
sys     0m0.004s

real    0m8.325s
user    0m8.323s
sys     0m0.000s

ivy bridge

real    0m5.005s
user    0m5.004s
sys     0m0.001s

real    0m4.149s
user    0m4.150s
sys     0m0.000s

real    0m3.836s
user    0m3.837s
sys     0m0.000s

real    0m4.097s
user    0m4.098s
sys     0m0.000s

real    0m3.928s
user    0m3.930s
sys     0m0.000s

fx10

real    0m10.456s
user    0m10.469s
sys     0m0.000s

real    0m9.356s
user    0m9.371s
sys     0m0.000s

real    0m9.418s
user    0m9.433s
sys     0m0.000s

real    0m9.039s
user    0m9.050s
sys     0m0.003s

real    0m7.983s
user    0m7.992s
sys     0m0.003s

* libcpp/lex.c (search_line_sse2): Use better header.
(search_line_sse42): Likewise.

diff --git a/libcpp/lex.c b/libcpp/lex.c
index 0ad9660..3bf9eae 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -374,35 +374,107 @@ search_line_sse2 (const uchar *s, const uchar *end 
ATTRIBUTE_UNUSED)
 
   unsigned int misalign, found, mask;
   const v16qi *p;
-  v16qi data, t;
+  v16qi data, t, tx;
+
+   if (s + 80 < end)
+{
+  v16qi x0 = __builtin_ia32_loaddqu ((char const *) s);
+  tx =  __builtin_ia32_pcmpeqb128 (x0, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x0, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x0, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x0, repl_qm);
+
+  found =  __builtin_ia32_pmovmskb128 (tx);
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + found;
+}
+  v16qi x1 = __builtin_ia32_loaddqu ((char const *) (s + 16));
+  v16qi x2 = __builtin_ia32_loaddqu ((char const *) (s + 32));
+  v16qi x3 = __builtin_ia32_loaddqu ((char const *) (s + 48));
+  v16qi x4 = __builtin_ia32_loaddqu ((char const *) (s + 64));
+
+  tx =  __builtin_ia32_pcmpeqb128 (x1, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x1, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x1, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x1, repl_qm);
+
+  found =  __builtin_ia32_pmovmskb128 (tx);
+
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + 16 + found;
+}
+
+  tx =  __builtin_ia32_pcmpeqb128 (x2, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x2, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x2, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x2, repl_qm);
+
+  found = __builtin_ia32_pmovmskb128 (tx);
+
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + 32 + found;
+}
+
+
+  tx =  __builtin_ia32_pcmpeqb128 (x3, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x3, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x3, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x3, repl_qm);
+
+  found =  __builtin_ia32_pmovmskb128 (tx);
+
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + 48 + found;
+}
+
+  tx =  __builtin_ia32_pcmpeqb128 (x4, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x4, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x4, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x4, repl_qm);
+
+  found =  __builtin_ia32_pmovmskb128 (tx);
+
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + 64 + found;
+}
+
+  s += 80;
+}
 
   /* Align the source pointer.  */
   misalign = (uintptr_t)s & 15;
   p = (const v16qi *)((uintptr_t)s & -16);
   data = *p;
 
-  /* Create a mask for the bytes that are valid within the first
- 16-byte block.  The Idea here is that the AND with the mask
- within the loop is "free", since we need some AND or TEST
- insn in order to set the flags for the branch anyway.  */
   mask = -1u << misalign;
 
-  /* Main loop processing 16 bytes at a time.  */
-  goto start;
-  do
+  t  = __builtin_ia32_pcmpeqb12

Re: [C/C++ PATCH] Implement -Wshift-overflow (PR c++/55095) (take 3)

2015-07-10 Thread Marek Polacek
Ping^5.

On Fri, Jul 03, 2015 at 09:42:39AM +0200, Marek Polacek wrote:
> Ping^4.
> 
> On Fri, Jun 26, 2015 at 10:08:51AM +0200, Marek Polacek wrote:
> > I'm pinging the C++ parts.
> > 
> > On Fri, Jun 19, 2015 at 12:44:36PM +0200, Marek Polacek wrote:
> > > Ping.
> > > 
> > > On Fri, Jun 12, 2015 at 11:07:29AM +0200, Marek Polacek wrote:
> > > > Ping.
> > > > 
> > > > On Fri, Jun 05, 2015 at 10:55:08AM +0200, Marek Polacek wrote:
> > > > > On Thu, Jun 04, 2015 at 09:04:19PM +, Joseph Myers wrote:
> > > > > > The C changes are OK.
> > > > > 
> > > > > Jason, do you want to approve the C++ parts?

Marek


[patch] Adjust tilepro generated headers

2015-07-10 Thread Andrew MacLeod
bah. I forgot that tilepro generates 2 source files that are in the
source tree... so it overwrites any changes made directly to the includes.
These probably ought to be generated into the build directory so this
isn't an issue. Although I suppose it's probably only an issue for me
anyway :-P


anyway, tested on the 3 tilepro triplets to verify it compiles...
checked in as obvious since it's basically redoing some previous patches.


Andrew


2015-07-10  Andrew MacLeod  

	* config/tilepro/gen-mul-tables.cc (main): Change include list for
	generated files.
	* config/tilepro/mul-tables.c: Regenerate.
	* config/tilegx/mul-tables.c: Regenerate.

Index: config/tilepro/gen-mul-tables.cc
===
*** config/tilepro/gen-mul-tables.cc	(revision 225667)
--- config/tilepro/gen-mul-tables.cc	(working copy)
*** main ()
*** 1255,1279 
printf ("#include \"config.h\"\n");
printf ("#include \"system.h\"\n");
printf ("#include \"coretypes.h\"\n");
!   printf ("#include \"symtab.h\"\n");
!   printf ("#include \"hashtab.h\"\n");
!   printf ("#include \"hash-set.h\"\n");
!   printf ("#include \"vec.h\"\n");
!   printf ("#include \"machmode.h\"\n");
!   printf ("#include \"tm.h\"\n");
!   printf ("#include \"hard-reg-set.h\"\n");
!   printf ("#include \"input.h\"\n");
!   printf ("#include \"function.h\"\n");
printf ("#include \"rtl.h\"\n");
printf ("#include \"flags.h\"\n");
-   printf ("#include \"statistics.h\"\n");
-   printf ("#include \"double-int.h\"\n");
-   printf ("#include \"real.h\"\n");
-   printf ("#include \"fixed-value.h\"\n");
printf ("#include \"alias.h\"\n");
-   printf ("#include \"wide-int.h\"\n");
-   printf ("#include \"inchash.h\"\n");
-   printf ("#include \"tree.h\"\n");
printf ("#include \"insn-config.h\"\n");
printf ("#include \"expmed.h\"\n");
printf ("#include \"dojump.h\"\n");
--- 1255,1265 
printf ("#include \"config.h\"\n");
printf ("#include \"system.h\"\n");
printf ("#include \"coretypes.h\"\n");
!   printf ("#include \"backend.h\"\n");
!   printf ("#include \"tree.h\"\n");
printf ("#include \"rtl.h\"\n");
printf ("#include \"flags.h\"\n");
printf ("#include \"alias.h\"\n");
printf ("#include \"insn-config.h\"\n");
printf ("#include \"expmed.h\"\n");
printf ("#include \"dojump.h\"\n");
Index: config/tilepro/mul-tables.c
===
*** config/tilepro/mul-tables.c	(revision 225667)
--- config/tilepro/mul-tables.c	(working copy)
***
*** 24,33 
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "tree.h"
- #include "tm.h"
  #include "rtl.h"
- #include "function.h"
  #include "flags.h"
  #include "alias.h"
  #include "insn-config.h"
--- 24,32 
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
+ #include "backend.h"
  #include "tree.h"
  #include "rtl.h"
  #include "flags.h"
  #include "alias.h"
  #include "insn-config.h"
Index: config/tilegx/mul-tables.c
===
*** config/tilegx/mul-tables.c	(revision 225667)
--- config/tilegx/mul-tables.c	(working copy)
***
*** 24,46 
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
  #include "tree.h"
- #include "hashtab.h"
- #include "hash-set.h"
- #include "vec.h"
- #include "machmode.h"
- #include "tm.h"
  #include "rtl.h"
- #include "input.h"
- #include "function.h"
  #include "flags.h"
- #include "statistics.h"
- #include "double-int.h"
- #include "real.h"
- #include "fixed-value.h"
  #include "alias.h"
- #include "wide-int.h"
- #include "inchash.h"
  #include "insn-config.h"
  #include "expmed.h"
  #include "dojump.h"
--- 24,34 
  #include "config.h"
  #include "system.h"
  #include "coretypes.h"
+ #include "backend.h"
  #include "tree.h"
  #include "rtl.h"
  #include "flags.h"
  #include "alias.h"
  #include "insn-config.h"
  #include "expmed.h"
  #include "dojump.h"


Re: Proposal to postpone release of 5.2 for a week [Was: Re: patch to fix PR66782]

2015-07-10 Thread Vladimir Makarov



On 07/10/2015 04:09 AM, Richard Biener wrote:

On Thu, 9 Jul 2015, Uros Bizjak wrote:


Hello!


The patch was bootstrapped and tested on x86/x86-64.

Committed as rev. 225618.

2015-07-09  Vladimir Makarov  

 PR rtl-optimization/66782
 * lra-int.h (struct lra_insn_recog_data): Add comment about
 clobbered hard regs for arg_hard_regs.
 * lra.c (lra_set_insn_recog_data): Add clobbered hard regs.
 * lra-lives.c (process_bb_lives): Process clobbered hard regs.
 Add condition for processing used hard regs.
 * lra-constraints.c (update_ebb_live_info, inherit_in_ebb):
 Process clobbered hard regs.

I would like to nominate this patch for gcc-5.2 release. According to
downstream bugreport [1], gcc-5.1 is unusable for 64-bit wine:

"Breaks all of wine, no easy workaround -> blocker."

Due to severity of this bug, and importance of Wine, I'd like to
postpone the 5.2 release for a week, so this bug gets some testing in
the mainline, before it is backported to gcc-5 branch

[1] https://bugs.winehq.org/show_bug.cgi?id=38653

Hm.  I'd rather burn this with a RC2 released soon or defer it to
GCC 5.3.  The patch looks kind-of straight-forward, likely not
affecting anything else (to my naive eyes...).

So - please get it committed to the GCC 5 branch as soon as possible.
A GCC 5.2 RC2 will be done on Monday latest then (possibly during the
weekend if I find the time to do it).

Note this opens the window for other important wrong-code fixes - please
CC me on any you'd like to propose for GCC 5.2 and wait for my
explicit approval.


Thanks, Richard.  I believe the patch is safe.  So I've backported the 
patch to the branch as rev. 225674.  I successfully bootstrapped and 
tested it on x86-64.




RFC: Use std::{min,max} instead of MIN/MAX?

2015-07-10 Thread Marek Polacek
Uros had the idea of using std::min/max instead of our MIN/MAX
macros defined in system.h.  I thought I would do this cleanup,
but very soon I ran into a problem of failed template argument
substitution: std::min/max function templates require that both
arguments be of the same type:

/home/marek/src/gcc/gcc/caller-save.c: In function ‘void 
replace_reg_with_saved_mem(rtx_def**, machine_mode, int, void*)’:
/home/marek/src/gcc/gcc/caller-save.c:1151:63: error: no matching function for 
call to ‘min(int, short unsigned int)’
  offset -= (std::min (UNITS_PER_WORD, GET_MODE_SIZE (mode))
   ^
In file included from /usr/include/c++/5.1.1/bits/char_traits.h:39:0,
 from /usr/include/c++/5.1.1/string:40,
 from /home/marek/src/gcc/gcc/system.h:201,
 from /home/marek/src/gcc/gcc/caller-save.c:21:
/usr/include/c++/5.1.1/bits/stl_algobase.h:195:5: note: candidate: 
template<class _Tp> const _Tp& std::min(const _Tp&, const _Tp&)
 min(const _Tp& __a, const _Tp& __b)
 ^
/usr/include/c++/5.1.1/bits/stl_algobase.h:195:5: note:   template argument 
deduction/substitution failed:
/home/marek/src/gcc/gcc/caller-save.c:1151:63: note:   deduced conflicting 
types for parameter ‘const _Tp’ (‘int’ and ‘short unsigned int’)
  offset -= (std::min (UNITS_PER_WORD, GET_MODE_SIZE (mode))

We can work around this by using casts, but that seems too ugly a solution.
So it appears to me that we're stuck with our MIN/MAX macros.
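
For comparison, a minimal sketch of one alternative to casting each argument: naming the template argument explicitly. The function `min_units` and its parameter names are hypothetical stand-ins for the caller-save.c expression; this is still a conversion of both operands to `int`, so the usual caveats about truncation and signedness apply just as they would with a cast.

```cpp
#include <algorithm>

// Hypothetical stand-in for the failing expression: an int word size
// compared against an unsigned short mode size.
int min_units(int units_per_word, unsigned short mode_size) {
  // std::min (units_per_word, mode_size) fails to deduce a single _Tp.
  // Spelling out the type converts both arguments to int up front.
  return std::min<int>(units_per_word, mode_size);
}
```

Whether `std::min<int> (a, b)` reads better than `MIN (a, b)` at each call site is a separate question.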

Thoughts?

Marek


Re: [PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-10 Thread Kyrill Tkachov


On 10/07/15 13:31, Kyrill Tkachov wrote:

+   to compute a value for x.  Put the rtx cost of the insns
+   in TEST_BB into COST.  Record whether TEST_BB is a single simple
+   set instruction in SIMPLE_P.  If the bb is not simple place all insns
+   except the last insn into SEQ.  */
+


That last sentence is stale. That function doesn't have a SEQ argument.
Consider that comment sentence removed.

Thanks,
Kyrill



Re: [PATCH] Improve in_array_bounds_p

2015-07-10 Thread Richard Biener
On Fri, 10 Jul 2015, Richard Biener wrote:

> 
> I was just testing the patch below which runs into latent issues when
> building libjava (at least)...
> 
> /space/rguenther/src/svn/trunk/libjava/java/lang/natClassLoader.cc: In 
> function ‘java::lang::Class* _Jv_FindClassInCache(_Jv_Utf8Const*)’:
> /space/rguenther/src/svn/trunk/libjava/java/lang/natClassLoader.cc:97:1: 
> error:BB 3 last statement has incorrectly set lp
>  _Jv_FindClassInCache (_Jv_Utf8Const *name)
>  ^
> /space/rguenther/src/svn/trunk/libjava/java/lang/natClassLoader.cc:97:1: 
> internal compiler error: verify_flow_info failed
> 0x8e2132 verify_flow_info()
> /space/rguenther/src/svn/trunk/gcc/cfghooks.c:261
> 
> so I have to debug that first.

The cause is stmts that no longer throw once VRP has set a value-range
on an array index, for example.  I've addressed this in the revised
patch below, which teaches CFG cleanup to deal with this (it
already removes dead EH edges and makes similar adjustments for
noreturn calls).
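
The range-based bounds test itself is simple; a minimal sketch follows, using plain `int64_t` in place of GCC's `widest_int` and a hypothetical `ValueRange` struct in place of the `get_range_info` result. It only illustrates the comparison, not the tree or wide-int plumbing in the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for a VR_RANGE: the index is known to lie in [lo, hi].
struct ValueRange {
  int64_t lo;
  int64_t hi;
};

// An access is provably in bounds only if the whole index range fits
// inside [min, max]; a partial overlap must be treated as out of bounds.
bool range_in_array_bounds(ValueRange idx, int64_t min, int64_t max) {
  if (idx.lo < min || max < idx.hi)
    return false;
  return true;
}
```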

>  Still IMHO the patch makes sense apart
> from the ugly need to go through a INTEGER_CST tree when converting
> a wide_int to a widest_int (ugh).  Any wide-int folks around that
> can suggest something better here (reason: the two integers we compare
> do not have to have the same type/precision - see tree_int_cst_lt
> which also uses widest_ints).

This issue still remains.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2015-07-10  Richard Biener  

* tree-eh.c (in_array_bounds_p): Use value-range information
when available.
* tree-cfgcleanup.c (cleanup_control_flow_bb): Clean stmts
from stale EH info.

Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c   (revision 225655)
+++ gcc/tree-eh.c   (working copy)
@@ -2532,8 +2532,11 @@ in_array_bounds_p (tree ref)
 {
   tree idx = TREE_OPERAND (ref, 1);
   tree min, max;
+  wide_int idx_min, idx_max;
 
-  if (TREE_CODE (idx) != INTEGER_CST)
+  if (TREE_CODE (idx) != INTEGER_CST
+  && (TREE_CODE (idx) != SSA_NAME
+ || get_range_info (idx, &idx_min, &idx_max) != VR_RANGE))
 return false;
 
   min = array_ref_low_bound (ref);
@@ -2544,11 +2547,26 @@ in_array_bounds_p (tree ref)
   || TREE_CODE (max) != INTEGER_CST)
 return false;
 
-  if (tree_int_cst_lt (idx, min)
-  || tree_int_cst_lt (max, idx))
-return false;
+  if (TREE_CODE (idx) == INTEGER_CST)
+{
+  if (tree_int_cst_lt (idx, min)
+ || tree_int_cst_lt (max, idx))
+   return false;
+
+  return true;
+}
+  else
+{
+  if (wi::lts_p (wi::to_widest (wide_int_to_tree (TREE_TYPE (idx),
+ idx_min)),
+wi::to_widest (min))
+ || wi::lts_p (wi::to_widest (max),
+   wi::to_widest (wide_int_to_tree (TREE_TYPE (idx),
+                                                        idx_max))))
+   return false;
 
-  return true;
+  return true;
+}
 }
 
 /* Returns true if it is possible to prove that the range of
Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 225662)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -256,6 +256,14 @@ cleanup_control_flow_bb (basic_block bb)
&& remove_fallthru_edge (bb->succs))
 retval = true;
 
+  /* If a stmt may no longer throw, remove it from the EH tables
+ and cleanup dead EH edges.  */
+  else if (maybe_clean_eh_stmt (stmt))
+{
+  gimple_purge_dead_eh_edges (bb);
+  retval = true;
+}
+
   return retval;
 }
 

Re: [PATCH] Limit alignment on error_mark_node variable

2015-07-10 Thread Richard Biener
On Fri, Jul 10, 2015 at 2:19 PM, H.J. Lu  wrote:
> On Thu, Jul 09, 2015 at 03:57:31PM +0200, Richard Biener wrote:
>> On Thu, Jul 9, 2015 at 1:08 PM, H.J. Lu  wrote:
>> > On Thu, Jul 9, 2015 at 2:54 AM, Richard Biener
>> >  wrote:
>> >> On Thu, Jul 9, 2015 at 11:52 AM, H.J. Lu  wrote:
>> >>> On Thu, Jul 09, 2015 at 10:16:38AM +0200, Richard Biener wrote:
>>  On Wed, Jul 8, 2015 at 5:32 PM, H.J. Lu  wrote:
>>  > There is no need to try different alignment on variable of
>>  > error_mark_node.
>>  >
>>  > OK for trunk if there is no regression?
>> 
>>  Can't we avoid calling align_variable on error_mark_node type decls
>>  completely?  That is, punt earlier when we try to emit it.
>> 
>> >>>
>> >>> How about this?  OK for trunk?
>> >>
>> >> Heh, you now get the obvious question why we can't simply avoid
>> >> adding the varpool node in the first place ;)
>> >>
>> >
>> > When it was first added to varpool, its type was OK:
>> >
>> > (gdb) bt
>> > #0  varpool_node::get_create (decl=)
>> > at /export/gnu/import/git/sources/gcc/gcc/varpool.c:150
>> > #1  0x00e1c3e8 in rest_of_decl_compilation (
>> > decl=, top_level=1, at_end=0)
>> > at /export/gnu/import/git/sources/gcc/gcc/passes.c:271
>> > #2  0x00731d39 in finish_decl (decl=,
>> > init_loc=0, init=, origtype=, asmspec_tree=> > 0x0>)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4863
>> > #3  0x0078d1ed in c_parser_declaration_or_fndef (
>> > parser=0x715050a8, fndef_ok=false, static_assert_ok=true,
>> > empty_ok=true, nested=false, start_attr_ok=true,
>> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1855
>> > #4  0x0078c234 in c_parser_external_declaration 
>> > (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1435
>> > #5  0x0078be45 in c_parser_translation_unit (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1322
>> > #6  0x007b3271 in c_parse_file ()
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:15440
>> > #7  0x0081cb97 in c_common_parse_file ()
>> > at /export/gnu/import/git/sources/gcc/gcc/c-family/c-opts.c:1059
>> > #8  0x00f27662 in compile_file ()
>> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:543
>> > ---Type  to continue, or q  to quit---
>> > #9  0x00f29baa in do_compile ()
>> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:2041
>> > #10 0x00f29df9 in toplev::main (this=0x7fffdc90, argc=17,
>> > argv=0x7fffdd98)
>> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:2142
>> > #11 0x017d8228 in main (argc=17, argv=0x7fffdd98)
>> > at /export/gnu/import/git/sources/gcc/gcc/main.c:39
>> >
>> > Later, it was turned into error_mark_node:
>> >
>> > Old value = 
>> > New value = 
>> > finish_decl (decl=, init_loc=0, init=,
>> > origtype=, asmspec_tree=)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4802
>> > 4802  if (TREE_USED (type))
>> > (gdb) bt
>> > #0  finish_decl (decl=, init_loc=0,
>> > init=, origtype=, asmspec_tree=)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4802
>> > #1  0x0078d1ed in c_parser_declaration_or_fndef (
>> > parser=0x715050a8, fndef_ok=false, static_assert_ok=true,
>> > empty_ok=true, nested=true, start_attr_ok=true,
>> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1855
>> > #2  0x00792a23 in c_parser_compound_statement_nostart (
>> > parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:4621
>> > #3  0x00792688 in c_parser_compound_statement 
>> > (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:4532
>> > #4  0x0078d5a3 in c_parser_declaration_or_fndef (
>> > parser=0x715050a8, fndef_ok=true, static_assert_ok=true,
>> > empty_ok=true, nested=false, start_attr_ok=true,
>> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1965
>> > #5  0x0078c234 in c_parser_external_declaration 
>> > (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1435
>> > #6  0x0078be45 in c_parser_translation_unit (parser=0x715050a8)
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1322
>> > #7  0x007b3271 in c_parse_file ()
>> > ---Type  to continue, or q  to quit---
>> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:15440
>> > #8  0x0081cb97 in c_common_parse_file ()
>> > at /export/gnu/import/git/sources/gcc/gcc/c-family/c-opts.c:1059
>> > #9  0x00f27662 in compile_file ()
>> > at /export/

Re: [PATCH][14/n] Remove GENERIC stmt combining from SCCVN

2015-07-10 Thread Richard Biener
On Thu, 9 Jul 2015, Richard Biener wrote:

> 
> This moves more patterns that show up during bootstrap.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Due to regressions caused I split off the fold_plusminus_mult_expr
part.

Bootstrapped and tested on x86_64-unknown-linux-gnu, committed.

Richard.

2015-07-09  Richard Biener  

* fold-const.c (distribute_bit_expr): Remove.
(fold_binary_loc): Move simplifying (A & C1) + (B & C2)
to (A & C1) | (B & C2), distributing (A & B) | (A & C)
to A & (B | C) and simplifying A << C1 << C2 to ...
* match.pd: ... patterns here.
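
As a quick sanity check of the first two identities named in this ChangeLog entry, here is a small brute-force sketch over 8-bit values. The function name and the particular constants are illustrative only; this is not the match.pd pattern syntax.

```cpp
// Exhaustively checks, for 8-bit operands:
//   (A & C1) + (B & C2) == (A & C1) | (B & C2)   when C1 & C2 == 0
//   (A & B) | (A & C)   == A & (B | C)
bool check_bit_identities() {
  const unsigned c1 = 0xF0, c2 = 0x0F;  // disjoint masks: c1 & c2 == 0
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b) {
      // With disjoint masks no bit position can carry, so + equals |.
      if (((a & c1) + (b & c2)) != ((a & c1) | (b & c2)))
        return false;
      // Sample a spread of third operands for the distribution identity.
      for (unsigned c = 0; c < 256; c += 17)
        if (((a & b) | (a & c)) != (a & (b | c)))
          return false;
    }
  return true;
}
```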

Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 225657)
--- gcc/fold-const.c(working copy)
*** static enum tree_code compcode_to_compar
*** 117,123 
  static int operand_equal_for_comparison_p (tree, tree, tree);
  static int twoval_comparison_p (tree, tree *, tree *, int *);
  static tree eval_subst (location_t, tree, tree, tree, tree, tree);
- static tree distribute_bit_expr (location_t, enum tree_code, tree, tree, 
tree);
  static tree make_bit_field_ref (location_t, tree, tree,
HOST_WIDE_INT, HOST_WIDE_INT, int);
  static tree optimize_bit_field_compare (location_t, enum tree_code,
--- 117,122 
*** invert_truthvalue_loc (location_t loc, t
*** 3549,3610 
  type, arg);
  }
  
- /* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
-operands are another bit-wise operation with a common input.  If so,
-distribute the bit operations to save an operation and possibly two if
-constants are involved.  For example, convert
-   (A | B) & (A | C) into A | (B & C)
-Further simplification will occur if B and C are constants.
- 
-If this optimization cannot be done, 0 will be returned.  */
- 
- static tree
- distribute_bit_expr (location_t loc, enum tree_code code, tree type,
-tree arg0, tree arg1)
- {
-   tree common;
-   tree left, right;
- 
-   if (TREE_CODE (arg0) != TREE_CODE (arg1)
-   || TREE_CODE (arg0) == code
-   || (TREE_CODE (arg0) != BIT_AND_EXPR
- && TREE_CODE (arg0) != BIT_IOR_EXPR))
- return 0;
- 
-   if (operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1, 0), 0))
- {
-   common = TREE_OPERAND (arg0, 0);
-   left = TREE_OPERAND (arg0, 1);
-   right = TREE_OPERAND (arg1, 1);
- }
-   else if (operand_equal_p (TREE_OPERAND (arg0, 0), TREE_OPERAND (arg1, 1), 
0))
- {
-   common = TREE_OPERAND (arg0, 0);
-   left = TREE_OPERAND (arg0, 1);
-   right = TREE_OPERAND (arg1, 0);
- }
-   else if (operand_equal_p (TREE_OPERAND (arg0, 1), TREE_OPERAND (arg1, 0), 
0))
- {
-   common = TREE_OPERAND (arg0, 1);
-   left = TREE_OPERAND (arg0, 0);
-   right = TREE_OPERAND (arg1, 1);
- }
-   else if (operand_equal_p (TREE_OPERAND (arg0, 1), TREE_OPERAND (arg1, 1), 
0))
- {
-   common = TREE_OPERAND (arg0, 1);
-   left = TREE_OPERAND (arg0, 0);
-   right = TREE_OPERAND (arg1, 0);
- }
-   else
- return 0;
- 
-   common = fold_convert_loc (loc, type, common);
-   left = fold_convert_loc (loc, type, left);
-   right = fold_convert_loc (loc, type, right);
-   return fold_build2_loc (loc, TREE_CODE (arg0), type, common,
- fold_build2_loc (loc, code, type, left, right));
- }
- 
  /* Knowing that ARG0 and ARG1 are both RDIV_EXPRs, simplify a binary operation
 with code CODE.  This optimization is unsafe.  */
  static tree
--- 3548,3553 
*** fold_binary_loc (location_t loc,
*** 9574,9594 
  
if (! FLOAT_TYPE_P (type))
{
- /* If we are adding two BIT_AND_EXPR's, both of which are and'ing
-with a constant, and the two constants have no bits in common,
-we should treat this as a BIT_IOR_EXPR since this may produce more
-simplifications.  */
- if (TREE_CODE (arg0) == BIT_AND_EXPR
- && TREE_CODE (arg1) == BIT_AND_EXPR
- && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
- && TREE_CODE (TREE_OPERAND (arg1, 1)) == INTEGER_CST
- && wi::bit_and (TREE_OPERAND (arg0, 1),
- TREE_OPERAND (arg1, 1)) == 0)
-   {
- code = BIT_IOR_EXPR;
- goto bit_ior;
-   }
- 
  /* Reassociate (plus (plus (mult) (foo)) (mult)) as
 (plus (plus (mult) (mult)) (foo)) so that we can
 take advantage of the factoring cases below.  */
--- 9517,9522 
*** fold_binary_loc (location_t loc,
*** 10422,10428 
goto associate;
  
  case BIT_IOR_EXPR:
- bit_ior:
/* Canonicalize (X & C1) | C2.  */
if (TREE_CODE (arg0) == BIT_AND_EXPR
  && TREE_CODE (arg1) == INTEGER_CST
--- 10350,10355 

[PATCH][RTL-ifcvt] Make non-conditional execution if-conversion more aggressive

2015-07-10 Thread Kyrill Tkachov

Hi all,

This patch makes if-conversion more aggressive when handling code of the form:
if (test)
  x := a  //THEN
else
  x := b  //ELSE

Currently, we can handle this case only if x:=a and x:=b are simple single
set instructions.  With this patch we will be able to handle the cases where
x:=a and x:=b take multiple instructions.  This can be done under the
condition that all the instructions in the THEN and ELSE basic blocks are
only used to compute a value for x.  I suppose we could generalise even
further (perhaps to handle cases where multiple x's are being set) but
that's out of the scope of this patch.

This was sparked by some cases in aarch64 where the THEN or ELSE branches
contained an extra zero_extend operation after an arithmetic instruction
which prevented if-conversion.
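
For illustration, a source-level sketch of the shape being described: each arm computes x with an arithmetic operation followed by a zero-extension, and nothing else escapes either arm. The function name and operand types are hypothetical, and whether a given compiler and branch cost actually turn this into a conditional select is not guaranteed.

```cpp
#include <cstdint>

// Both arms only compute a value for x, but each needs two steps:
// a 32-bit add, then a zero-extension of the low 8 bits to 64 bits.
uint64_t select_sum(bool test, uint32_t a, uint32_t b, uint32_t c) {
  uint64_t x;
  if (test)
    x = static_cast<uint8_t>(a + c);  // THEN: add, then zero-extend
  else
    x = static_cast<uint8_t>(b + c);  // ELSE: add, then zero-extend
  return x;
}
```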

To implement this approach noce_process_if_block in ifcvt.c is relaxed to
allow multi-instruction basic blocks when the intermediate values produced
in them don't escape the basic block except through x.
noce_process_if_block then calls a number of other functions to detect
various patterns and if-convert.  Most of them don't actually make sense for
multi-instruction basic blocks so they are updated to reject them and
operate only on the existing single-instruction case.

However, noce_try_cmove_arith can take advantage of multi-instruction basic
blocks and is thus updated to emit the whole basic blocks rather than just
one instruction.

The transformation is, of course, guarded on a cost calculation.
The current code adds the costs of both the THEN and ELSE blocks and proceeds 
if they don't
exceed the branch cost. I don't think that's quite the right calculation.
We're going to be executing at least one of the basic blocks anyway.
With this patch we instead check the *maximum* of the two blocks against the branch 
cost.
This should still catch cases where a high latency instruction appears in one 
or both of
the paths.
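
The difference between the two checks can be sketched in a few lines of C. This is not the ifcvt.c code; the function names and the use of plain unsigned costs are assumptions made for the illustration, and "proceeds if they don't exceed" is read as a less-than-or-equal comparison.

```c
#include <stdbool.h>

/* Old rule as described above: both arms are charged against the
   branch cost, as if both were extra work.  */
bool
worth_converting_sum (unsigned then_cost, unsigned else_cost,
                      unsigned branch_cost)
{
  return then_cost + else_cost <= branch_cost;
}

/* New rule as described above: one arm would have been executed
   anyway, so only the more expensive arm is compared against the
   branch cost.  */
bool
worth_converting_max (unsigned then_cost, unsigned else_cost,
                      unsigned branch_cost)
{
  unsigned worst = then_cost > else_cost ? then_cost : else_cost;
  return worst <= branch_cost;
}
```

With arms costing 3 each and a branch cost of 4, the sum rule rejects the transformation while the max rule accepts it; a single arm costing 5 is rejected by both.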


This transformation applies to targets with conditional move operations but no 
conditional
execution. Thus, it applies to aarch64 and x86_64, but not arm.

The effect of this patch is more noticeable if the backend branch cost is 
higher (like you'd expect).


Even without increasing the branch cost we still get more aggressive if-conversion.
Across the whole of SPEC2006 I saw a 5.8% increase in the number of csel 
instructions generated
(from 41242 -> 43637)

Bootstrapped and tested on aarch64, x86_64, arm.
I've made the testcases aarch64-specific since they depend on backend branch 
costs that are hard
to predict across all platforms (we don't have a -mbranch-cost= option ;))
No performance regressions on SPEC2006 on aarch64 and x86_64.
On aarch64 I've seen 482.sphinx3 improve by 2.3% and 459.GemsFDTD by 2.1%

Some of the testcases in aarch64.exp now fail their scan-assembler patterns due 
to if-conversion.
I've updated those testcases to properly generate the pattern they expect. The 
changes are mostly
due to add+compare-style instructions now appearing in the same basic blocks as 
their result uses,
which, I think, scares combine away from combining them into one.

Does this approach look reasonable?
If so, ok for trunk?

Thanks,
Kyrill


2015-07-10  Kyrylo Tkachov  

* ifcvt.c (struct noce_if_info): Add then_simple, else_simple,
then_cost, else_cost fields.
(end_ifcvt_sequence): Call set_used_flags on each insn in the
sequence.
(noce_simple_bbs): New function.
(noce_try_move): Bail if basic blocks are not simple.
(noce_try_store_flag): Likewise.
(noce_try_store_flag_constants): Likewise.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_minmax): Likewise.
(noce_try_abs): Likewise.
(noce_try_sign_mask): Likewise.
(noce_try_bitop): Likewise.
(bbs_ok_for_cmove_arith): New function.
(noce_emit_all_but_last): Likewise.
(noce_emit_insn): Likewise.
(noce_emit_bb): Likewise.
(noce_try_cmove_arith): Handle non-simple basic blocks.
(insn_valid_noce_process_p): New function.
(bb_valid_for_noce_process_p): Likewise.
(noce_process_if_block): Allow non-simple basic blocks
where appropriate.


2015-07-10  Kyrylo Tkachov  

* gcc.target/aarch64/ifcvt_csel_1.c: New test.
* gcc.target/aarch64/ifcvt_csel_2.c: New test.
* gcc.target/aarch64/ifcvt_csel_3.c: New test.
commit b6fe0e0a5f64fdc11fbbd7c9e05caeeb23e21662
Author: Kyrylo Tkachov 
Date:   Wed Jul 8 15:45:04 2015 +0100

[PATCH][ifcvt] Make non-conditional execution if-conversion more aggressive

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 31849ee..3d324257 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -815,6 +815,15 @@ struct noce_if_info
  form as well.  */
   bool then_else_reversed;
 
+  /* True if the contents of then_bb and else_bb are a
+ simple single set instruction.  */
+  bool then_simple;
+  bool else_simple;
+
+  /* The total rtx cost of the instructions in then_bb and else_bb.  */
+  i

[PATCH][AArch64][testsuite] Adjust some arith+compare tests for potentially more aggressive if-conversion

2015-07-10 Thread Kyrill Tkachov

Hi all,

Some of the testcases in aarch64.exp can fail their scan-assembler patterns if 
if-conversion becomes more aggressive.
This patch adjusts the testcases in case the branches are eliminated and 
further optimisations occur that may remove the
scan-assembler patterns.
With this patch the patterns are always generated and the expected execute 
values are updated.

Tests still pass on aarch64.
Ok for trunk?

Thanks,
Kyrill

2015-07-10  Kyrylo Tkachov  

* gcc.target/aarch64/adds3.c: Adjust for more aggressive
if-conversion.
* gcc.target/aarch64/ands_1.c: Likewise.
* gcc.target/aarch64/bics_1.c: Likewise.
* gcc.target/aarch64/subs1.c: Likewise.
* gcc.target/aarch64/subs3.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/adds1.c b/gcc/testsuite/gcc.target/aarch64/adds1.c
index 6cc700a..1689029 100644
--- a/gcc/testsuite/gcc.target/aarch64/adds1.c
+++ b/gcc/testsuite/gcc.target/aarch64/adds1.c
@@ -12,7 +12,7 @@ adds_si_test1 (int a, int b, int c)
   if (d == 0)
 return a + c;
   else
-return b + d + c;
+return d;
 }
 
 int
@@ -24,7 +24,7 @@ adds_si_test2 (int a, int b, int c)
   if (d == 0)
 return a + c;
   else
-return b + d + c;
+return d;
 }
 
 int
@@ -36,7 +36,7 @@ adds_si_test3 (int a, int b, int c)
   if (d == 0)
 return a + c;
   else
-return b + d + c;
+return d;
 }
 
 typedef long long s64;
@@ -50,7 +50,7 @@ adds_di_test1 (s64 a, s64 b, s64 c)
   if (d == 0)
 return a + c;
   else
-return b + d + c;
+return d;
 }
 
 s64
@@ -62,7 +62,7 @@ adds_di_test2 (s64 a, s64 b, s64 c)
   if (d == 0)
 return a + c;
   else
-return b + d + c;
+return d;
 }
 
 s64
@@ -74,7 +74,7 @@ adds_di_test3 (s64 a, s64 b, s64 c)
   if (d == 0)
 return a + c;
   else
-return b + d + c;
+return d;
 }
 
 int main ()
@@ -83,66 +83,68 @@ int main ()
   s64 y;
 
   x = adds_si_test1 (29, 4, 5);
-  if (x != 42)
+  if (x != (29 + 4))
 abort ();
 
-  x = adds_si_test1 (5, 2, 20);
-  if (x != 29)
+  x = adds_si_test1 (5, 2, -5);
+  if (x != 7)
 abort ();
 
   x = adds_si_test2 (29, 4, 5);
-  if (x != 293)
+  if (x != (29 + 0xff))
 abort ();
 
-  x = adds_si_test2 (1024, 2, 20);
-  if (x != 1301)
+  x = adds_si_test2 (-255, 2, 20);
+  if (x != -235)
 abort ();
 
   x = adds_si_test3 (35, 4, 5);
-  if (x != 76)
+  if (x != (35 + (4 << 3)))
 abort ();
 
-  x = adds_si_test3 (5, 2, 20);
-  if (x != 43)
+  x = adds_si_test3 (-(2 << 3), 2, 20);
+  if (x != (20 - (2 << 3)))
 abort ();
 
   y = adds_di_test1 (0x13029ll,
 		 0x32004ll,
 		 0x505050505ll);
 
-  if (y != 0xc75050536)
+  if (y != (0x13029ll + 0x32004ll))
 abort ();
 
   y = adds_di_test1 (0x5000500050005ll,
-		 0x2111211121112ll,
+		 -0x5000500050005ll,
 		 0x02020ll);
-  if (y != 0x9222922294249)
+  if (y != (0x5000500050005ll + 0x02020ll))
 abort ();
 
   y = adds_di_test2 (0x13029ll,
 		 0x32004ll,
 		 0x505050505ll);
-  if (y != 0x955050631)
+  if (y != (0x13029ll + 0xff))
 abort ();
 
-  y = adds_di_test2 (0x130002900ll,
+  y = adds_di_test2 (-0xff,
 		 0x32004ll,
 		 0x505050505ll);
-  if (y != 0x955052f08)
+  if (y != (0x505050505ll - 0xff))
 abort ();
 
   y = adds_di_test3 (0x13029ll,
 		 0x06408ll,
 		 0x505050505ll);
-  if (y != 0x9b9050576)
+  if (y != (0x13029ll + (0x06408ll << 3)))
 abort ();
 
   y = adds_di_test3 (0x130002900ll,
-		 0x08808ll,
+		 -(0x130002900ll >> 3),
 		 0x505050505ll);
-  if (y != 0xafd052e4d)
+  if (y != (0x130002900ll + 0x505050505ll))
 abort ();
+#if 0
 
+#endif
   return 0;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/adds3.c b/gcc/testsuite/gcc.target/aarch64/adds3.c
index 18efd1c..c5518bd 100644
--- a/gcc/testsuite/gcc.target/aarch64/adds3.c
+++ b/gcc/testsuite/gcc.target/aarch64/adds3.c
@@ -12,7 +12,7 @@ adds_ext (s64 a, int b, int c)
   if (d == 0)
 return a + c;
   else
-return b + d + c;
+return d;
 }
 
 int
@@ -23,7 +23,7 @@ adds_shift_ext (s64 a, int b, int c)
   if (d == 0)
 return a + c;
   else
-return b + d + c;
+return d;
 }
 
 int main ()
@@ -32,27 +32,27 @@ int main ()
   s64 y;
 
   x = adds_ext (0x1302ll, 41, 15);
-  if (x != 318767203)
+  if (x != (int)(0x1302ll + 41))
 abort ();
 
-  x = adds_ext (0x50505050ll, 29, 4);
-  if (x != 1347440782)
+  x = adds_ext (0x50505050ll, -0x50505050ll, 4);
+  if (x != (int)(0x50505050ll + 4))
 abort ();
 
   x = adds_ext (0x12121212121ll, 2, 14);
-  if (x != 555819315)
+  if (x != (int)(0x12121212121ll + 2))
 abort ();
 
   x = adds_shift_ext (0x123456789ll, 4, 12);
-  if (x != 591751097)
+  if (x != (int)(0x123456789ll + (4 << 3)))
 abort ();
 
-  x = adds_shift_ext (0x02020202ll, 9, 8);
-  if (x != 33686107)
+  x = adds_shift_ext (-(0x02020202ll << 3), 0x02020202ll, 8);
+  if (x != (int)(8 - (0x02020202ll << 3)))
 abort ();
 
   x = adds_shift_ext (0

Re: [PATCH] Limit alignment on error_mark_node variable

2015-07-10 Thread H.J. Lu
On Thu, Jul 09, 2015 at 03:57:31PM +0200, Richard Biener wrote:
> On Thu, Jul 9, 2015 at 1:08 PM, H.J. Lu  wrote:
> > On Thu, Jul 9, 2015 at 2:54 AM, Richard Biener
> >  wrote:
> >> On Thu, Jul 9, 2015 at 11:52 AM, H.J. Lu  wrote:
> >>> On Thu, Jul 09, 2015 at 10:16:38AM +0200, Richard Biener wrote:
>  On Wed, Jul 8, 2015 at 5:32 PM, H.J. Lu  wrote:
>  > There is no need to try different alignment on variable of
>  > error_mark_node.
>  >
>  > OK for trunk if there is no regression?
> 
>  Can't we avoid calling align_variable on error_mark_node type decls
>  completely?  That is, punt earlier when we try to emit it.
> 
> >>>
> >>> How about this?  OK for trunk?
> >>
> >> Heh, you now get the obvious question why we can't simply avoid
> >> adding the varpool node in the first place ;)
> >>
> >
> > When it was first added to varpool, its type was OK:
> >
> > (gdb) bt
> > #0  varpool_node::get_create (decl=)
> > at /export/gnu/import/git/sources/gcc/gcc/varpool.c:150
> > #1  0x00e1c3e8 in rest_of_decl_compilation (
> > decl=, top_level=1, at_end=0)
> > at /export/gnu/import/git/sources/gcc/gcc/passes.c:271
> > #2  0x00731d39 in finish_decl (decl=,
> > init_loc=0, init=, origtype=, asmspec_tree= > 0x0>)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4863
> > #3  0x0078d1ed in c_parser_declaration_or_fndef (
> > parser=0x715050a8, fndef_ok=false, static_assert_ok=true,
> > empty_ok=true, nested=false, start_attr_ok=true,
> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1855
> > #4  0x0078c234 in c_parser_external_declaration 
> > (parser=0x715050a8)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1435
> > #5  0x0078be45 in c_parser_translation_unit (parser=0x715050a8)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1322
> > #6  0x007b3271 in c_parse_file ()
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:15440
> > #7  0x0081cb97 in c_common_parse_file ()
> > at /export/gnu/import/git/sources/gcc/gcc/c-family/c-opts.c:1059
> > #8  0x00f27662 in compile_file ()
> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:543
> > #9  0x00f29baa in do_compile ()
> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:2041
> > #10 0x00f29df9 in toplev::main (this=0x7fffdc90, argc=17,
> > argv=0x7fffdd98)
> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:2142
> > #11 0x017d8228 in main (argc=17, argv=0x7fffdd98)
> > at /export/gnu/import/git/sources/gcc/gcc/main.c:39
> >
> > Later, it was turned into error_mark_node:
> >
> > Old value = 
> > New value = 
> > finish_decl (decl=, init_loc=0, init=,
> > origtype=, asmspec_tree=)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4802
> > 4802  if (TREE_USED (type))
> > (gdb) bt
> > #0  finish_decl (decl=, init_loc=0,
> > init=, origtype=, asmspec_tree=)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-decl.c:4802
> > #1  0x0078d1ed in c_parser_declaration_or_fndef (
> > parser=0x715050a8, fndef_ok=false, static_assert_ok=true,
> > empty_ok=true, nested=true, start_attr_ok=true,
> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1855
> > #2  0x00792a23 in c_parser_compound_statement_nostart (
> > parser=0x715050a8)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:4621
> > #3  0x00792688 in c_parser_compound_statement 
> > (parser=0x715050a8)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:4532
> > #4  0x0078d5a3 in c_parser_declaration_or_fndef (
> > parser=0x715050a8, fndef_ok=true, static_assert_ok=true,
> > empty_ok=true, nested=false, start_attr_ok=true,
> > objc_foreach_object_declaration=0x0, omp_declare_simd_clauses=...)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1965
> > #5  0x0078c234 in c_parser_external_declaration 
> > (parser=0x715050a8)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1435
> > #6  0x0078be45 in c_parser_translation_unit (parser=0x715050a8)
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:1322
> > #7  0x007b3271 in c_parse_file ()
> > at /export/gnu/import/git/sources/gcc/gcc/c/c-parser.c:15440
> > #8  0x0081cb97 in c_common_parse_file ()
> > at /export/gnu/import/git/sources/gcc/gcc/c-family/c-opts.c:1059
> > #9  0x00f27662 in compile_file ()
> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:543
> > #10 0x00f29baa in do_compile ()
> > at /export/gnu/import/git/sources/gcc/gcc/toplev.c:20

Re: [R220456][4.9] Backport the patch which fixes __ARM_FP & __ARM_NEON_FP predefines

2015-07-10 Thread Kyrill Tkachov

Hi Mantas,

On 13/02/15 10:03, Mantas Mikaitis wrote:

Hi all,

This is a backport for gcc-4_9-branch of the patch " [PATCH][ARM]
__ARM_FP & __ARM_NEON_FP defined when -march=armv7-m" posted in:
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00250.html

arm-none-linux-gnueabi/hf tested without any new regressions.

OK for gcc-4_9-branch?


Ok.

I have applied this on your behalf to the 4.9 branch with r225667 with
the slightly adjusted ChangeLog entries:

2015-07-10  Mantas Mikaitis  

* config/arm/arm.h (TARGET_NEON_FP): Remove conditional definition,
define to zero if !TARGET_NEON.
(TARGET_ARM_FP): Add !TARGET_SOFT_FLOAT into the conditional
definition.

2015-07-10  Mantas Mikaitis  

* gcc.target/arm/macro_defs0.c: New test.
* gcc.target/arm/macro_defs1.c: New test.
* gcc.target/arm/macro_defs2.c: New test.


Thanks,
Kyrill


gcc/ChangeLog:

2015-02-13  Mantas Mikaitis  

  * config/arm/arm.h (TARGET_NEON_FP): Removed conditional definition,
  define to zero if !TARGET_NEON.
  *(TARGET_ARM_FP): Added !TARGET_SOFT_FLOAT into the conditional
  definition

gcc/testsuite/ChangeLog:

  * gcc.target/arm/macro_defs0.c: New test.
  * gcc.target/arm/macro_defs1.c: New test.
  * gcc.target/arm/macro_defs2.c: New test.




Re: [AArch64][1/2] Mark GOT related MEM rtx as const to help RTL loop IV

2015-07-10 Thread Marcus Shawcroft
On 7 July 2015 at 13:33, Jiong Wang  wrote:

> 2015-07-06  Jiong Wang  
>
> gcc/
>   * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Mark mem as
>   READONLY and NOTRAP for PIC symbol.
>
> gcc/testsuite/
>   * gcc.target/aarch64/got_mem_hoist.c: New test.

Looks OK to me.  Follow the guidance on the wiki here
https://gcc.gnu.org/wiki/TestCaseWriting when naming new test cases
and add _1 suffix.  Otherwise OK.
/Marcus


[PATCH] Improve in_array_bounds_p

2015-07-10 Thread Richard Biener

I was just testing the patch below which runs into latent issues when
building libjava (at least)...

/space/rguenther/src/svn/trunk/libjava/java/lang/natClassLoader.cc: In 
function ‘java::lang::Class* _Jv_FindClassInCache(_Jv_Utf8Const*)’:
/space/rguenther/src/svn/trunk/libjava/java/lang/natClassLoader.cc:97:1: 
error:BB 3 last statement has incorrectly set lp
 _Jv_FindClassInCache (_Jv_Utf8Const *name)
 ^
/space/rguenther/src/svn/trunk/libjava/java/lang/natClassLoader.cc:97:1: 
internal compiler error: verify_flow_info failed
0x8e2132 verify_flow_info()
/space/rguenther/src/svn/trunk/gcc/cfghooks.c:261

so I have to debug that first.  Still IMHO the patch makes sense apart
from the ugly need to go through a INTEGER_CST tree when converting
a wide_int to a widest_int (ugh).  Any wide-int folks around that
can suggest something better here (reason: the two integers we compare
do not have to have the same type/precision - see tree_int_cst_lt
which also uses widest_ints).
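
The check the patch is aiming for can be sketched independently of the wide-int API as follows. This is a simplified stand-in, not GCC code: int64_t plays the role of widest_int, and the function and parameter names are invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: an index known to lie in [idx_min, idx_max]
   is within the array bounds [low, high] only if the whole range is.
   The comparison mirrors the shape of the two lts_p tests in the
   patch below.  */
bool
range_in_bounds (int64_t idx_min, int64_t idx_max,
                 int64_t low, int64_t high)
{
  if (idx_min < low || high < idx_max)
    return false;
  return true;
}
```

For a constant index the range degenerates to idx_min == idx_max, which gives the existing INTEGER_CST behaviour.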

Richard.

2015-07-10  Richard Biener  

* tree-eh.c (in_array_bounds_p): Use value-range information
when available.

Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c   (revision 225655)
+++ gcc/tree-eh.c   (working copy)
@@ -2532,8 +2532,11 @@ in_array_bounds_p (tree ref)
 {
   tree idx = TREE_OPERAND (ref, 1);
   tree min, max;
+  wide_int idx_min, idx_max;
 
-  if (TREE_CODE (idx) != INTEGER_CST)
+  if (TREE_CODE (idx) != INTEGER_CST
+  && (TREE_CODE (idx) != SSA_NAME
+ || get_range_info (idx, &idx_min, &idx_max) != VR_RANGE))
 return false;
 
   min = array_ref_low_bound (ref);
@@ -2544,11 +2547,26 @@ in_array_bounds_p (tree ref)
   || TREE_CODE (max) != INTEGER_CST)
 return false;
 
-  if (tree_int_cst_lt (idx, min)
-  || tree_int_cst_lt (max, idx))
-return false;
+  if (TREE_CODE (idx) == INTEGER_CST)
+{
+  if (tree_int_cst_lt (idx, min)
+ || tree_int_cst_lt (max, idx))
+   return false;
+
+  return true;
+}
+  else
+{
+  if (wi::lts_p (wi::to_widest (wide_int_to_tree (TREE_TYPE (idx),
+ idx_min)),
+wi::to_widest (min))
+ || wi::lts_p (wi::to_widest (max),
+   wi::to_widest (wide_int_to_tree (TREE_TYPE (idx),
+idx_max
+   return false;
 
-  return true;
+  return true;
+}
 }
 
 /* Returns true if it is possible to prove that the range of

Re: genmatch indent generated code

2015-07-10 Thread Michael Matz
Hi,

On Fri, 10 Jul 2015, Richard Biener wrote:

> > I also noticed it but didn't care ;)  But now I notice
> >
> >   switch (TREE_CODE (t))
> > {
> >   case SSA_NAME:
> >
> > cases are indented too much, it should be
> >
> >   switch (TREE_CODE (t))
> > {
> > case SSA_NAME:

I like the first one better, but hey, be my guest ;-)


Ciao,
Michael.


Re: [PATCH] [testsuite] Disable attr_thumb.c test when Thumb mode is not supported.

2015-07-10 Thread Christophe Lyon
On 10 July 2015 at 13:40, Ramana Radhakrishnan
 wrote:
>
>
> On 10/07/15 12:35, Christophe Lyon wrote:
>> On 10 July 2015 at 09:14, Christian Bruel  wrote:
>>>
>>> On 07/09/2015 05:39 PM, Christophe Lyon wrote:
 Some multilibs do not support Thumb mode on ARM targets. This is the
 case for instance when target is arm-linux-gnueabihf and with
 -march=armv5-t: Thumb-1 hard-float VFP ABI is not implemented.

 In this configuration, gcc.target/arm/attr_thumb.c is failing because
 we switch thumb mode via an attribute.

 This patch makes this test unsupported, by adding a new function in
 lib/target-supports.exp: check_effective_target_arm_thumb_ok.

 OK?

>>>
>>> What about just skip-if { ! { arm_thumb1_ok || arm_thumb2_ok } } , for
>>> consistency with the other tests using -mthumb ?
>> OK, let's be consistent.
>>
>>> Can you add the same check to flip-thumb.c as well (must have been also FAIL
>>> for your configuration) ?
>> Indeed, I noticed it when you added that test, then forgot to merge both 
>> fixes.
>>
>
> OK -
> regards
> Ramana
>
> P.S. Is it really interesting to test Thumb1 / armv5t on mfloat-abi=hard 
> configurations especially as Thumb1 doesn't have any instructions to move 
> values into the VFP register bank ?
>
Well, I am testing this configuration:
--target=arm-none-linux-gnueabihf --with-mode=arm --with-cpu=cortex-a9
--with-fpu=vfp
and RUNTESTFLAGS=-march=armv5-t

A few tests do force Thumb1, and fail because HF+Thumb1 is not supported.


>
>>> thanks
>>>
>>> Christian
>>>
>> 2015-07-10  Christophe Lyon  
>>
>> * gcc.target/arm/attr_thumb.c: Skip if Thumb is not supported.
>> * gcc.target/arm/flip-thumb.c: Likewise.
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/attr_thumb.c
>> b/gcc/testsuite/gcc.target/arm/attr_thumb.c
>> index 02ddfda..eac4713 100644
>> --- a/gcc/testsuite/gcc.target/arm/attr_thumb.c
>> +++ b/gcc/testsuite/gcc.target/arm/attr_thumb.c
>> @@ -1,5 +1,7 @@
>>  /* Check that attribute target thumb is recognized. */
>>  /* { dg-do compile } */
>> +/* Make sure the current multilib supports thumb.  */
>> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>>  /* { dg-options "-O2 -mno-restrict-it" } */
>>  /* { dg-final { scan-assembler-not "\\.arm"  } } */
>>  /* { dg-final { scan-assembler "\\.thumb_func" } } */
>> diff --git a/gcc/testsuite/gcc.target/arm/flip-thumb.c
>> b/gcc/testsuite/gcc.target/arm/flip-thumb.c
>> index 9154799..355d663 100644
>> --- a/gcc/testsuite/gcc.target/arm/flip-thumb.c
>> +++ b/gcc/testsuite/gcc.target/arm/flip-thumb.c
>> @@ -1,5 +1,7 @@
>>  /* Check -mflip-thumb. */
>>  /* { dg-do compile } */
>> +/* Make sure the current multilib supports thumb.  */
>> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>>  /* { dg-options "-O2 -mflip-thumb -mno-restrict-it" } */
>>  /* { dg-final { scan-assembler "\\.arm" } } */
>>  /* { dg-final { scan-assembler-times "\\.thumb_func" 1} } */
>>


Re: [PATCH] fold builtin_tolower, builtin_toupper

2015-07-10 Thread Bernhard Reutner-Fischer
On 10 July 2015 at 08:51, Ondřej Bílka  wrote:
> On Thu, Jul 09, 2015 at 03:46:08PM +0200, Richard Biener wrote:
>> On Thu, 9 Jul 2015, Bernhard Reutner-Fischer wrote:

[toupper/tolower patch withdrawn]

>> I don't think this can be correct for all locales which need not
>> have a lower-case character for all upper-case ones nor do
>> all letters having one need to be in the range of 'A' to 'Z'.
>>
>> Joseph will surely correct me if I am wrong.
>>
> That's correct, as this doesn't handle toupper('č') with an appropriate
> single-byte locale. You cannot even rely on the fact that if x<128 then
> conversion only happens in the 'A'..'Z' range; there are locales where that
> doesn't hold and we need to check _NL_CTYPE_NONASCII_CASE. We don't
> export that, so you would need to check that while constructing a table with 256 
> entries.

I detest locales.
>
> Also your example is invalid as you used __builtin_tolower instead of
> tolower. As usual gcc builtins are slow, you will get better performance

You're of course right, libc usually has a map-lookup for the fast
path for these.
(tolower) (...)  comes to mind but doesn't matter here.

> with following.
>
> #include <ctype.h>
> int foo(char *c)
> {
>  int i;
>  for(i=0;i<1000;i++)
>c[i]=tolower(c[i]);
> }
>
>
> As for your example, the first problem is that it doesn't work with utf8 due to
> multibyte characters.

yea, the app i saw doing that strcpy/tolower has a defined input of
ASCII A-Za-z0-9- so i should not have used toupper in the example in
the first place.

>
> Second problem is that sse4.2 doesn't help at all as generating masks
> with it is quite slow. Using just sse2 is faster here.

The point of the PR was that a) loop-fusion is missing and b) nothing
is vectorized.
The quick sse4.2 example was just an extension my CPU happens to
support and that showed the result would be smaller than before and
maybe even a tiny bit faster.. ;)

>
> It could be possible to add such function to libc. For vectorization you

I think it would be better if GCC was able to fuse two or more loops
and grok to vectorize patterns like these.
As you point out, toupper is a bad example, a better one would perhaps
be something like the attached.

I guess that there is real-world code that does a
memcpy/memmove/str[n]cpy and then mask out some bits in the
destination so this should be useful generally.
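
As a point of comparison, this is roughly what the hand-fused form of such a copy-then-mask pattern looks like. It is only a sketch of the result one would like the compiler to reach on its own; the function name is made up, and it assumes plain ASCII input as discussed above.

```c
/* Hypothetical hand-fused variant: copy src to dest and lowercase
   ASCII letters in a single pass instead of two separate loops.  */
char *
copy_ascii_lower (char *dest, const char *src)
{
  char *d = dest;
  for (; *src; src++)
    {
      char ch = *src;
      /* Setting bit 0x20 maps 'A'..'Z' to 'a'..'z' in ASCII.  */
      *d++ = (ch >= 'A' && ch <= 'Z') ? (char) (ch | 0x20) : ch;
    }
  *d = '\0';
  return dest;
}
```

Calling copy_ascii_lower (buf, "AbC-9z") with a large enough buf leaves "abc-9z" in buf.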

thanks for your comments, though!
cheers,
/* PR middle-end/66741 */
/* Manually expanded variant */
/* We were not fusing the 2 loops (strcpy and tolower) and we did not
 * vectorize the loop.  */
typedef __SIZE_TYPE__ size_t;
static __attribute__ ((noinline, noclone)) char *
tolower_strcpy_1(char *dest, const char *src) {
	char *d = dest, *s = (char *)src;
	while (*s) /* strcpy */
		*d++ = *s++;
	*d = '\0';
	d = dest;
	/* while (*d) should work as well but might be too complicated, so: */
	/* use same loop condition as above */
	s = (char *)src;
	while (*s) { /* ascii_tolower */
		int ch = *d;
		*d++ = ch >= 'A' && ch <= 'Z' ? ch | 0x20 : ch;
		s++;
	}
	return dest;
}
char *tolower_strcpy(char *dest, const char *src) {
	char *s = (char *)src;
	unsigned int len = 0;
	for (; *s; s++) /* advance s, otherwise the scan never terminates */
		if (*s < '-' || *s > 'z' || ++len > 255)
			return (void*)0;
	return tolower_strcpy_1(dest, src);
}
#ifdef MAIN
#include <string.h>
#include <unistd.h>
#define N 128
int main(void) {
	unsigned long sum = 0;
	char src[N + 1], dest[N + 1];
	while (1) {
		int n = read(0, &src, N);
		if (n == 0)
			break;
		if (n < 0)
			return 1;
		src[n] = 0;
		sum |= (unsigned long)tolower_strcpy(dest, src);
//		write(1, dest, strlen(dest));
	}
	return sum == 42;
}
#endif


Re: [PATCH] [testsuite] Disable attr_thumb.c test when Thumb mode is not supported.

2015-07-10 Thread Ramana Radhakrishnan


On 10/07/15 12:35, Christophe Lyon wrote:
> On 10 July 2015 at 09:14, Christian Bruel  wrote:
>>
>> On 07/09/2015 05:39 PM, Christophe Lyon wrote:
>>> Some multilibs do not support Thumb mode on ARM targets. This is the
>>> case for instance when target is arm-linux-gnueabihf and with
>>> -march=armv5-t: Thumb-1 hard-float VFP ABI is not implemented.
>>>
>>> In this configuration, gcc.target/arm/attr_thumb.c is failing because
>>> we switch thumb mode via an attribute.
>>>
>>> This patch makes this test unsupported, by adding a new function in
>>> lib/target-supports.exp: check_effective_target_arm_thumb_ok.
>>>
>>> OK?
>>>
>>
>> What about just skip-if { ! { arm_thumb1_ok || arm_thumb2_ok } } , for
>> consistency with the other tests using -mthumb ?
> OK, let's be consistent.
> 
>> Can you add the same check to flip-thumb.c as well (must have been also FAIL
>> for your configuration) ?
> Indeed, I noticed it when you added that test, then forgot to merge both 
> fixes.
> 

OK - 
regards
Ramana

P.S. Is it really interesting to test Thumb1 / armv5t on mfloat-abi=hard 
configurations especially as Thumb1 doesn't have any instructions to move 
values into the VFP register bank ?


>> thanks
>>
>> Christian
>>
> 2015-07-10  Christophe Lyon  
> 
> * gcc.target/arm/attr_thumb.c: Skip if Thumb is not supported.
> * gcc.target/arm/flip-thumb.c: Likewise.
> 
> diff --git a/gcc/testsuite/gcc.target/arm/attr_thumb.c
> b/gcc/testsuite/gcc.target/arm/attr_thumb.c
> index 02ddfda..eac4713 100644
> --- a/gcc/testsuite/gcc.target/arm/attr_thumb.c
> +++ b/gcc/testsuite/gcc.target/arm/attr_thumb.c
> @@ -1,5 +1,7 @@
>  /* Check that attribute target thumb is recognized. */
>  /* { dg-do compile } */
> +/* Make sure the current multilib supports thumb.  */
> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>  /* { dg-options "-O2 -mno-restrict-it" } */
>  /* { dg-final { scan-assembler-not "\\.arm"  } } */
>  /* { dg-final { scan-assembler "\\.thumb_func" } } */
> diff --git a/gcc/testsuite/gcc.target/arm/flip-thumb.c
> b/gcc/testsuite/gcc.target/arm/flip-thumb.c
> index 9154799..355d663 100644
> --- a/gcc/testsuite/gcc.target/arm/flip-thumb.c
> +++ b/gcc/testsuite/gcc.target/arm/flip-thumb.c
> @@ -1,5 +1,7 @@
>  /* Check -mflip-thumb. */
>  /* { dg-do compile } */
> +/* Make sure the current multilib supports thumb.  */
> +/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
>  /* { dg-options "-O2 -mflip-thumb -mno-restrict-it" } */
>  /* { dg-final { scan-assembler "\\.arm" } } */
>  /* { dg-final { scan-assembler-times "\\.thumb_func" 1} } */
> 


Re: [PATCH] [testsuite] Disable attr_thumb.c test when Thumb mode is not supported.

2015-07-10 Thread Christophe Lyon
On 10 July 2015 at 09:14, Christian Bruel  wrote:
>
> On 07/09/2015 05:39 PM, Christophe Lyon wrote:
>> Some multilibs do not support Thumb mode on ARM targets. This is the
>> case for instance when target is arm-linux-gnueabihf and with
>> -march=armv5-t: Thumb-1 hard-float VFP ABI is not implemented.
>>
>> In this configuration, gcc.target/arm/attr_thumb.c is failing because
>> we switch thumb mode via an attribute.
>>
>> This patch makes this test unsupported, by adding a new function in
>> lib/target-supports.exp: check_effective_target_arm_thumb_ok.
>>
>> OK?
>>
>
> What about just skip-if { ! { arm_thumb1_ok || arm_thumb2_ok } } , for
> consistency with the other tests using -mthumb ?
OK, let's be consistent.

> Can you add the same check to flip-thumb.c as well (must have been also FAIL
> for your configuration) ?
Indeed, I noticed it when you added that test, then forgot to merge both fixes.

> thanks
>
> Christian
>
2015-07-10  Christophe Lyon  

* gcc.target/arm/attr_thumb.c: Skip if Thumb is not supported.
* gcc.target/arm/flip-thumb.c: Likewise.

diff --git a/gcc/testsuite/gcc.target/arm/attr_thumb.c
b/gcc/testsuite/gcc.target/arm/attr_thumb.c
index 02ddfda..eac4713 100644
--- a/gcc/testsuite/gcc.target/arm/attr_thumb.c
+++ b/gcc/testsuite/gcc.target/arm/attr_thumb.c
@@ -1,5 +1,7 @@
 /* Check that attribute target thumb is recognized. */
 /* { dg-do compile } */
+/* Make sure the current multilib supports thumb.  */
+/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 /* { dg-options "-O2 -mno-restrict-it" } */
 /* { dg-final { scan-assembler-not "\\.arm"  } } */
 /* { dg-final { scan-assembler "\\.thumb_func" } } */
diff --git a/gcc/testsuite/gcc.target/arm/flip-thumb.c
b/gcc/testsuite/gcc.target/arm/flip-thumb.c
index 9154799..355d663 100644
--- a/gcc/testsuite/gcc.target/arm/flip-thumb.c
+++ b/gcc/testsuite/gcc.target/arm/flip-thumb.c
@@ -1,5 +1,7 @@
 /* Check -mflip-thumb. */
 /* { dg-do compile } */
+/* Make sure the current multilib supports thumb.  */
+/* { dg-skip-if "" { ! { arm_thumb1_ok || arm_thumb2_ok } } } */
 /* { dg-options "-O2 -mflip-thumb -mno-restrict-it" } */
 /* { dg-final { scan-assembler "\\.arm" } } */
 /* { dg-final { scan-assembler-times "\\.thumb_func" 1} } */


Ping: [PATCH][ARM][testsuite] Fix FAIL: gcc.target/arm/macro_defs0.c and macro_defs1.c when -marm forced

2015-07-10 Thread Mantas Mikaitis

Pinging this patch.

Thank you,
- Mantas

On 05/03/15 10:14, Mantas Mikaitis wrote:

Hello,

Tests gcc.target/arm/macro_defs0.c and gcc.target/arm/macro_defs1.c fail
in multilib which forces -marm as pointed out in this message:
https://gcc.gnu.org/ml/gcc-patches/2015-02/msg00483.html .

This patch will cause these tests to be classified as unsupported rather
than FAIL.

Ok for trunk?

Kind regards,
Mantas M.

2015-03-05  Mantas Mikaitis  

  * gcc.target/arm/macro_defs0.c: added directive to skip
  test if -marm is present.
  * gcc.target/arm/macro_defs1.c: added directive to skip
  test if -marm is present.




Re: C++ PATCH to change default dialect to C++14

2015-07-10 Thread H.J. Lu
On Wed, Jul 1, 2015 at 11:26 AM, Jason Merrill  wrote:
> I've been threatening to do this for a couple of months, and now that the
> regressions are under control I think it's time.  This patch changes the
> default C++ dialect to C++14.
>
> Tested x86_64-pc-linux-gnu, applying to trunk.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66829

-- 
H.J.


Re: [patch] Fix PR middle-end/66633

2015-07-10 Thread Eric Botcazou
> After looking into this some more, because the PR got reopened, there were
> two issues: 1) __builtin_adjust_trampoline call needing a frame or chain (as
> can be seen on the new testcases) that wasn't added to parallel/task/target
> clauses 2) for !optimize, there is code to add those, when frame or chain
> is needed for calls, but the problem was that as !optimize didn't clear
> DECL_STATIC_CHAIN initially it wasn't discovered.
> 
> Both issues fixed in the following patch, which made the earlier change
> unnecessary.

Thanks for looking into this!

-- 
Eric Botcazou


New German PO file for 'gcc' (version 5.1.0)

2015-07-10 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-5.1.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH, libcpp] Faster line lexer.

2015-07-10 Thread Jakub Jelinek
On Fri, Jul 10, 2015 at 11:37:18AM +0200, Uros Bizjak wrote:
> Hello!
> 
> > As I wrote at
> >
> > [PATCH, libcpp]: Use asm flag outputs in search_line_sse42 main loop
> >
> > https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg113610.html
> >
> > I won't repeat the reasons here; in summary, the current sse4.2 code is
> > redundant, as it has the same performance as the sse2 one.
> > This improves sse2 performance by around 10% vs the sse4.2 code by
> > using a better header.
> 
> Have you tried new SSE4.2 implementation (the one with asm flags) with
> unrolled loop?

Also, the SSE4.2 implementation looks shorter, so more I-cache friendly,
so I wouldn't really say it is redundant if they are roughly same speed.

Jakub


Re: [PATCH] PR target/66813: gcc.target/i386/asm-flag-5.c failed with -march=pentium

2015-07-10 Thread Uros Bizjak
On Thu, Jul 9, 2015 at 3:15 PM, Uros Bizjak  wrote:

>> gen_rtx_ZERO_EXTEND isn't suitable in ix86_md_asm_adjust since ZERO_EXTEND
>> may be expanded.  We should call gen_zero_extendqiXi2 instead.
>>
>> OK for trunk?
>
> No, your patch will clobber flags when multiple flag outputs are used.
>
> (I plan to rewrite x86 zero_extend patterns to use preferred_for_size
> attribute with peepholes, this will magically solve this bug and
> readeflags-1.c failure).

No, the above mentioned patch won't fly, it limits AND insn operands
too much with "q" constraint.

So, the patch below is what I plan to commit after
bootstrap/regression test on x86_64-linux-gnu {,-m32}.

2015-07-10  Uros Bizjak  

PR target/66813
* config/i386/i386.c (ix86_md_asm_adjust): Emit movstrictqi
sequence for TARGET_ZERO_EXTEND_WITH_AND targets.

testsuite/ChangeLog:

2015-07-10  Uros Bizjak  

PR target/66813
* gcc.target/i386/pr66813.c: New test.

Uros.
Index: testsuite/gcc.target/i386/pr66813.c
===
--- testsuite/gcc.target/i386/pr66813.c (revision 0)
+++ testsuite/gcc.target/i386/pr66813.c (revision 0)
@@ -0,0 +1,4 @@
+/* { dg-do compile { target { ia32 } } } */
+/* { dg-options "-march=pentium" } */
+
+#include "asm-flag-5.c"
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 225648)
+++ config/i386/i386.c  (working copy)
@@ -45842,7 +45842,17 @@ ix86_md_asm_adjust (vec &outputs, vec &/
{
  rtx destqi = gen_reg_rtx (QImode);
  emit_insn (gen_rtx_SET (destqi, x));
- x = gen_rtx_ZERO_EXTEND (dest_mode, destqi);
+
+ if (TARGET_ZERO_EXTEND_WITH_AND
+ && optimize_function_for_speed_p (cfun))
+   {
+ x = force_reg (dest_mode, const0_rtx);
+
+ emit_insn (gen_movstrictqi
+(gen_lowpart (QImode, x), destqi));
+   }
+ else
+   x = gen_rtx_ZERO_EXTEND (dest_mode, destqi);
}
   emit_insn (gen_rtx_SET (dest, x));
 }


[PATCH] Unswitching outer loops.

2015-07-10 Thread Yuri Rumyantsev
Hi All,

Here is a simple transformation that tries to hoist the inner loop's
zero-trip-count check out of the outer loop. It is a very
restricted transformation, since it only accepts outer loops with a very
simple CFG, for example:
    acc = 0;
    for (i = 1; i <= m; i++) {
      for (j = 0; j < n; j++)
        if (l[j] == i) { v[j] = acc; acc++; };
      acc <<= 1;
    }

Note that the degenerate outer loop (the copy without the inner loop) will
be completely deleted as dead code.
The main goal of this transformation is to convert the outer loop to a form
accepted by outer-loop vectorization (such a test case is also included
in the patch).
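
To make the intended effect concrete, here is a hand-written sketch of the
example loop before and after the check is hoisted. The function names and
signatures are made up for illustration and are not taken from the patch;
loop_after only approximates the shape the pass is meant to produce.

```c
/* Original shape: the inner loop's zero-trip check (is n > 0?) is
   evaluated again on every iteration of the outer loop.  */
int
loop_before (const int *l, int *v, int m, int n)
{
  int acc = 0;
  for (int i = 1; i <= m; i++)
    {
      for (int j = 0; j < n; j++)
        if (l[j] == i)
          {
            v[j] = acc;
            acc++;
          }
      acc <<= 1;
    }
  return acc;
}

/* Hand-unswitched shape (illustrative only): the n > 0 check is done
   once, outside the outer loop.  */
int
loop_after (const int *l, int *v, int m, int n)
{
  int acc = 0;
  if (n > 0)
    {
      for (int i = 1; i <= m; i++)
        {
          for (int j = 0; j < n; j++)
            if (l[j] == i)
              {
                v[j] = acc;
                acc++;
              }
          acc <<= 1;
        }
    }
  else
    {
      /* Degenerate copy without the inner loop.  acc starts at 0, so the
         shifts never change it and the loop can be removed as dead code.  */
      for (int i = 1; i <= m; i++)
        acc <<= 1;
    }
  return acc;
}
```

Both functions return the same value and write the same elements of v for
any m and n; only the placement of the n > 0 test differs.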

Bootstrap and regression testing did not show any new failures.

Is it OK for trunk?

ChangeLog:
2015-07-10  Yuri Rumyantsev  

* tree-ssa-loop-unswitch.c: Include "tree-cfgcleanup.h" and
"gimple-iterator.h", add prototype for tree_unswitch_outer_loop.
(tree_ssa_unswitch_loops): Add invoke of tree_unswitch_outer_loop.
(tree_unswitch_outer_loop): New function.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/unswitch-outer-loop-1.c: New test.
* gcc.dg/vect/vect-outer-simd-3.c: New test.


patch.3
Description: Binary data


Re: [PATCH]Fix PR66556. Don't drop side-effect in simplify_const_relational_operation function.

2015-07-10 Thread Renlin Li

Hi Jeff,

Thank you for the suggestion! I will commit it first and continue
working on it.


Regards,
Renlin Li

On 08/07/15 21:56, Jeff Law wrote:

On 07/08/2015 09:03 AM, Renlin Li wrote:

Hi all,

In the simplify_const_relational_operation function, there are cases
where a const rtx will be returned.

Three cases are considered in this function:
1, comparisons with upper and lower bounds.
2, Integer comparisons with zero.
3, comparison of ABS with zero.

It's fine to do the optimization if the operands have no side effects.

For example, I am currently fixing a code generation bug for armv7-a
bigendian.
It turns out that, the following rtx is simplified into a const, and the
side-effect with it is ignored.

(ltu:SI (lshiftrt:SI (subreg:SI (mem/c:HI (post_modify:SI (reg/f:SI 156)
  (plus:SI (reg/f:SI 156)
  (const_int 20 [0x14]))) [5 g+4 S2 A32]) 0)
  (const_int 1 [0x1]))
  (const_int -1 [0x]))

>>

(const_int 1 [0x1])

This particular case falls into category 1 mentioned above. -1, when
regarded as an unsigned integer, is the largest unsigned integer, so the
result is always const_true_rtx in this case. However, the first operand
of the comparison has a POST_MODIFY side effect.

In this case, the simplification should only be applied after checking
the operands for side effects.
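
A source-level analogue may help show why the check matters. This is only a
sketch: the function name is hypothetical, and the real bug concerns a
POST_MODIFY address in RTL rather than a C post-increment. The comparison
below is always true, yet folding the whole expression to 1 without
evaluating the operand would lose the pointer update.

```c
#include <stdint.h>

/* Hypothetical helper: loads a halfword, advances the caller's pointer
   (the side effect), and compares the shifted value against the largest
   unsigned value.  */
int
cmp_with_side_effect (const uint16_t **pp)
{
  /* Load through *pp, then advance *pp by one element.  */
  uint32_t x = (uint32_t) *(*pp)++ >> 1;

  /* A value shifted right by one can never equal 0xffffffff, so this
     comparison is always 1; the pointer update must still happen.  */
  return x < UINT32_MAX;
}
```

A caller can observe the side effect directly: after the call, the pointer
passed in has moved forward by one element even though the result is a
constant.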

x86_64 bootstrapping is Okay and arm-none-eabi regression test runs
without any new issues.

Okay to commit and backport to branch 5?

Regards,
Renlin Li

gcc/ChangeLog:

2015-07-08  Renlin Li

  PR rtl/66556
  * simplify-rtx.c (simplify_const_relational_operation): Add
  side_effects_p check.

gcc/testsuite/ChangeLog:

2015-07-08  Renlin Li

  PR rtl/66556
  * gcc.c-torture/execute/pr66556.c: New.

OK.

It may be worth looking at the .optimized dump for the new test and see
if there's something we can/should be optimizing better before we start
generating RTL.  That can obviously be a follow-up.

jeff





Re: Proposal to postpone release of 5.2 for a week

2015-07-10 Thread Kaz Kojima
Richard Biener  wrote:
>> Can I backport the patch in
>> 
>> [SH] Fix PR target/66780
>> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00472.html
>> 
>> to gcc-5.2?  It's a few lines change for a SH specific wrong code bug
>> with -fstack-protector which is disastrous for Debian SH folks who use
>> that option heavily in their packages.
> 
> Yes.

Thanks!  I've just committed it as revision 225660.

Regards,
kaz


[PATCH][obvious] Fix typos above expand_cond_expr_using_cmove

2015-07-10 Thread Kyrill Tkachov

Hi all,

As subject says.
Committed as obvious with r225659.

Thanks,
Kyrill

2015-07-10  Kyrylo Tkachov  

* expr.c (expand_cond_expr_using_cmove): Fix typos in comment
above function.
diff --git a/gcc/expr.c b/gcc/expr.c
index 34930c5..6f6ee9d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -7897,9 +7897,9 @@ expand_expr_real (tree exp, rtx target, machine_mode tmode,
 }
 
 /* Try to expand the conditional expression which is represented by
-   TREEOP0 ? TREEOP1 : TREEOP2 using conditonal moves.  If succeseds
-   return the rtl reg which repsents the result.  Otherwise return
-   NULL_RTL.  */
+   TREEOP0 ? TREEOP1 : TREEOP2 using conditonal moves.  If it succeeds
+   return the rtl reg which represents the result.  Otherwise return
+   NULL_RTX.  */
 
 static rtx
 expand_cond_expr_using_cmove (tree treeop0 ATTRIBUTE_UNUSED,


Re: [RFC, Fortran, (pr66775)] Allocatable function result

2015-07-10 Thread Andre Vehreschild
Hi all,

this means that pr66775 is to be closed as resolved invalid, because the
current implementation is alright, but only the program to compile is garbage.
Ok, suits me.

- Andre

On Thu, 9 Jul 2015 12:41:31 -0700
Steve Kargl  wrote:

> On Thu, Jul 09, 2015 at 08:59:08PM +0200, Andre Vehreschild wrote:
> > Hi Steve,
> > 
> > Thanks for your knowledge. Can you support your statement that an
> > allocatable function has to return an allocated object by a part of the
> > standard? I totally agree with you that this code is ill-designed, but IMO
> > it is not the task of the compiler to address ill design. The compiler has
> > to comply with the standard, and the standard allows allocatable objects to be
> > unallocated. So why must the result of a function always be allocated?
> > 
> > Regards,
> > Andre
> > 
> 
> I think the following excerpts from F2008 are the relevant
> clauses, especially the 2nd to last sentence in the excerpt
> from 12.6.2.2.
> 
> !  12.5.3
> !
> !  When execution of the function is complete, the value of
> !  the function result is available for use in the expression
> !  that caused the function to be invoked.
> !
> !  12.6.2.2
> !
> !  If RESULT appears, the name of the result variable of the
> !  function is result-name and all occurrences of the function
> !  name in execution-part statements in its scope refer to the
> !  function itself.  If RESULT does not appear, the name of the
> !  result variable is function-name and all occurrences of the
> !  function name in execution-part statements in its scope are
> !  references to the result variable.  The characteristics (12.3.3)
> !  of the function result are those of the result variable.  On
> !  completion of execution of the function, the value returned is
> !  that of its result variable.  If the function result is a pointer,
> !  the shape of the value returned by the function is determined by
> !  the shape of the result variable when the execution of the function
> !  is completed.  If the result variable is not a pointer, its value
> !  shall be defined by the function.  If the function result is a
> !  pointer, on return the pointer association status of the result
> !  variable shall not be undefined.
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH, libcpp] Faster line lexer.

2015-07-10 Thread Uros Bizjak
Hello!

> As I wrote at
>
> [PATCH, libcpp]: Use asm flag outputs in search_line_sse42 main loop
>
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg113610.html
>
> I won't repeat the reasons here; in summary, the current sse4.2 code is
> redundant, as it has the same performance as the sse2 one.
> This improves sse2 performance by around 10% vs the sse4.2 code by
> using a better header.

Have you tried new SSE4.2 implementation (the one with asm flags) with
unrolled loop?

Uros.


Ping: [R220456][4.9] Backport the patch which fixes __ARM_FP & __ARM_NEON_FP predefines

2015-07-10 Thread Mantas Mikaitis

Pinging this patch.

Thank you,
- Mantas

On 13/02/15 10:03, Mantas Mikaitis wrote:

Hi all,

This is a backport for gcc-4_9-branch of the patch " [PATCH][ARM]
__ARM_FP & __ARM_NEON_FP defined when -march=armv7-m" posted in:
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00250.html

arm-none-linux-gnueabi/hf tested without any new regressions.

OK for gcc-4_9-branch?

gcc/ChangeLog:

2015-02-13  Mantas Mikaitis  

  * config/arm/arm.h (TARGET_NEON_FP): Removed conditional definition,
  define to zero if !TARGET_NEON.
  *(TARGET_ARM_FP): Added !TARGET_SOFT_FLOAT into the conditional
  definition

gcc/testsuite/ChangeLog:

  * gcc.target/arm/macro_defs0.c: New test.
  * gcc.target/arm/macro_defs1.c: New test.
  * gcc.target/arm/macro_defs2.c: New test.




Re: [Patch wwwdocs] gcc-5/changes.html : Document AMD monitorx and mwaitx

2015-07-10 Thread Richard Biener
On Fri, Jul 10, 2015 at 9:40 AM, Kumar, Venkataramanan
 wrote:
> Hi Richard,
>
>> -Original Message-
>> From: Richard Biener [mailto:richard.guent...@gmail.com]
>> Sent: Thursday, July 09, 2015 8:03 PM
>> To: Kumar, Venkataramanan
>> Cc: Gerald Pfeifer (ger...@pfeifer.com); gcc-patches@gcc.gnu.org
>> Subject: Re: [Patch wwwdocs] gcc-5/changes.html : Document AMD
>> monitorx and mwaitx
>>
>> On Thu, Jul 9, 2015 at 4:28 PM, Kumar, Venkataramanan
>>  wrote:
>> > Hi Gerald,
>> >
>> > This patch documents about  AMD instructions "mwaitx" and "monitorx" in
>> GCC- 5 changes.html.
>> > Please let me know if this ok to commit?
>> >
>> > Index: htdocs/gcc-5/changes.html
>> >
>> ==
>> =
>> > RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
>> > retrieving revision 1.121
>> > diff -r1.121 changes.html
>> > 810a811,819
>> >>   Support for new AMD instructions monitorx and
>> >>   mwaitx has been added. This includes new intrinsic
>> >>   and built-in support. It is enabled through option -
>> mmwaitx.
>> >>   The instructions monitorx and mwaitx
>> >>   implement the same functionality as the old monitor
>> >>   and mwait instructions. In addition
>> mwaitx
>> >>   adds a configurable timer. The timer value is received as third
>> >>   argument and stored in register %ebx. GCC 5.2 is the
>> >>   first release to support these instructions.
>> >
>> > I will send another patch for trunk as well.
>>
>> Please add it to a subsection containing GCC 5.2 changes, see older releases
>> for examples.
>>
>
> I updated the patch as per your comments.  The format follows that of the
> GCC 4.3 release changes.
>
> Index: htdocs/gcc-5/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-5/changes.html,v
> retrieving revision 1.121
> diff -r1.121 changes.html
> 969a970,985
>>
>> GCC 5.2
>>
>> Target Specific Changes
>>
>> IA-32/x86-64
>>   
>> Support for new AMD instructions monitorx and
>>   mwaitx has been added. This includes new intrinsic
>>   and built-in support. It is enabled through option 
>> -mmwaitx.
>>   The instructions monitorx and mwaitx
>>   implement the same functionality as the old monitor
>>   and mwait instructions. In addition mwaitx
>>   adds a configurable timer. The timer value is received as third
>>   argument and stored in register %ebx.
>>   
>
> Please let me know if this is ok to commit?

Yes, this looks good

>
> Regards,
> Venkat.
>
>


Re: [PATCH 2/3] Fully remove legacy tree-ssa-tail-merge value numbering infrastructure.

2015-07-10 Thread Richard Biener
On Thu, Jul 9, 2015 at 5:39 PM, Jeff Law  wrote:
> On 07/09/2015 07:56 AM, mliska wrote:
>>
>> gcc/ChangeLog:
>>
>> 2015-07-09  Martin Liska  
>>
>> * tree-ssa-tail-merge.c (gimple_operand_equal_value_p): Remove.
>> (gimple_equal_p): Remove.
>> (gsi_advance_bw_nondebug_nonlocal): Remove.
>> (find_duplicate): Remove legacy value numbering.
>> (find_clusters_1): Likewise.
>
> OK once the prerequisites have been approved.

Also as cleanup opportunity it should then be possible to run tail-merging
as separate pass, not piggy-backed on PRE.

Richard.

> Jeff
>


Re: Proposal to postpone release of 5.2 for a week

2015-07-10 Thread Richard Biener
On Fri, 10 Jul 2015, Kaz Kojima wrote:

> Richard Biener  wrote:
> > Note this opens the window for other important wrong-code fixes - please
> > CC me on any you'd like to propose for GCC 5.2 and wait for my
> > explicit approval.
> 
> Can I backport the patch in
> 
> [SH] Fix PR target/66780
> https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00472.html
> 
> to gcc-5.2?  It's a few lines change for a SH specific wrong code bug
> with -fstack-protector which is disastrous for Debian SH folks who use
> that option heavily in their packages.

Yes.

Thanks,
Richard.

> Regards,
>   kaz
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nuernberg)


Re: [gomp] Move openacc vector& worker single handling to RTL

2015-07-10 Thread Thomas Schwinge
Hi!

On Thu, 09 Jul 2015 20:25:22 -0400, Nathan Sidwell  wrote:
> This is the patch I committed.

:-) Whee!

From testing this, two things:

1. Can you please have a look at the following ICE?  I suppose you can
reproduce this in your non-checking build by just unconditionally
enabling that df_verify call?  Committed to gomp-4_0-branch in r225656:

commit 1aff96b721921f621642c0fab95359453bc01beb
Author: tschwinge 
Date:   Fri Jul 10 09:01:55 2015 +

Work around nvptx offloading compiler --enable-checking=yes,df,fold,rtl 
breakage

... introduced in r225647.

checking whether the GNU Fortran compiler is working... no
configure: error: GNU Fortran is not working; please report a bug in 
http://gcc.gnu.org/bugzilla, attaching 
/home/thomas/tmp/source/gcc/openacc/openacc-gomp-4_0-branch-work_/build-gcc-accel-nvptx/nvptx-none/libgfortran/config.log
make[1]: *** [configure-target-libgfortran] Error 1

configure:4192: [...]/build-gcc-accel-nvptx/./gcc/xgcc 
-B[...]/build-gcc-accel-nvptx/./gcc/ -nostdinc 
-B[...]/build-gcc-accel-nvptx/nvptx-none/newlib/ -isystem 
[...]/build-gcc-accel-nvptx/nvptx-none/newlib/targ-include -isystem 
[...]/source-gcc/newlib/libc/include -B/nvptx-none/bin/ -B/nvptx-none/lib/ 
-isystem /nvptx-none/include -isystem /nvptx-none/sys-include 
--sysroot=[...]/install/nvptx-none   -c -g  conftest.c >&5
conftest.c: In function 'main':
conftest.c:16:1: internal compiler error: in 
df_live_verify_transfer_functions, at df-problems.c:1849
 }
 ^
0x6d3d8e df_live_verify_transfer_functions()
[...]/source-gcc/gcc/df-problems.c:1848
0x6cb83a df_analyze_1
[...]/source-gcc/gcc/df-core.c:1241
0xd909a0 nvptx_reorg
[...]/source-gcc/gcc/config/nvptx/nvptx.c:2946
0xa50829 execute
[...]/source-gcc/gcc/reorg.c:4034
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
configure:4192: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "GNU Fortran Runtime Library"
| #define PACKAGE_TARNAME "libgfortran"
| #define PACKAGE_VERSION "0.3"
| #define PACKAGE_STRING "GNU Fortran Runtime Library 0.3"
| #define PACKAGE_BUGREPORT ""
| #define PACKAGE_URL "http://www.gnu.org/software/libgfortran/";
| /* end confdefs.h.  */
|
| int
| main ()
| {
|
|   ;
|   return 0;
| }

Reproduce:

$ echo 'static void foo(void) {}' | build-gcc-accel-nvptx/gcc/xgcc 
-Bbuild-gcc-accel-nvptx/gcc/ -S -x c -
: In function 'foo':
:1:1: internal compiler error: in 
df_live_verify_transfer_functions, at df-problems.c:1849
0x6d3d8e df_live_verify_transfer_functions()
[...]/source-gcc/gcc/df-problems.c:1848
0x6cb83a df_analyze_1
[...]/source-gcc/gcc/df-core.c:1241
0xd909a0 nvptx_reorg
[...]/source-gcc/gcc/config/nvptx/nvptx.c:2946
0xa50829 execute
[...]/source-gcc/gcc/reorg.c:4034

Workaround:

gcc/
* df-core.c (df_analyze_1): Disable df_verify call.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@225656 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |4 
 gcc/df-core.c  |2 ++
 2 files changed, 6 insertions(+)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index c71e396..535900c 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2015-07-10  Thomas Schwinge  
+
+   * df-core.c (df_analyze_1): Disable df_verify call.
+
 2015-07-09  Nathan Sidwell  
 
Infrastructure:
diff --git gcc/df-core.c gcc/df-core.c
index 67040a1..52cca8e 100644
--- gcc/df-core.c
+++ gcc/df-core.c
@@ -1235,10 +1235,12 @@ df_analyze_1 (void)
   if (dump_file)
 fprintf (dump_file, "df_analyze called\n");
 
+#if /* TODO */ 0
 #ifndef ENABLE_DF_CHECKING
   if (df->changeable_flags & DF_VERIFY_SCHEDULED)
 #endif
 df_verify ();
+#endif
 
   /* Skip over the DF_SCAN problem. */
   for (i = 1; i < df->num_problems_defined; i++)


2. Don't be shy to remove a bunch of XFAILs, in fact all :-) of those
remaining from the test cases that Julian had added in
.

Unfortunately, there's also one regression, but I'm seeing it only on
Nvidia K20 hardware, not on my laptop (but it may well be
hardware-dependent: according to a web search, CUDA error 716 translates
to CUDA_ERROR_MISALIGNED_ADDRESS).  Are you reproducing that one, and/or
do you have an idea where it's coming from?

Committed to gomp-4_0-branch in r225657:

commit bdecfaf444a5811

Re: [PATCH][AArch64] Improve csinc/csneg/csinv opportunities on immediates

2015-07-10 Thread pinskia




> On Jul 10, 2015, at 1:47 AM, Kyrill Tkachov  wrote:
> 
> Hi Andrew,
> 
>> On 10/07/15 09:40, pins...@gmail.com wrote:
>> 
>> 
>> 
>>> On Jul 10, 2015, at 1:34 AM, Kyrill Tkachov  wrote:
>>> 
>>> Hi all,
>>> 
>>> Currently when evaluating expressions like (a ? 24 : 25) we will move 24 
>>> and 25 into
>>> registers and perform a csel on them.  This misses the opportunity to 
>>> instead move just 24
>>> into a register and then perform a csinc, saving us an instruction and a 
>>> register use.
>>> Similarly for csneg and csinv.
>>> 
>>> This patch implements that idea by allowing such pairs of immediates in 
>>> *cmov_insn
>>> and adding an early splitter that performs the necessary transformation.
>>> 
>>> The testcase included in the patch demonstrates the kind of opportunities 
>>> that are now picked up.
>>> 
>>> With this patch I see about 9.6% more csinc instructions being generated 
>>> for SPEC2006
>>> and the generated code looks objectively better (i.e. fewer mov-immediates 
>>> and slightly
>>> lower register pressure).
>>> 
>>> Bootstrapped and tested on aarch64.
>>> 
>>> Ok for trunk?
>> I think this is the wrong place for this optimization. It should happen in 
>> expr.c and we should produce cond_expr on the gimple level.
> 
> I had considered it, but I wasn't sure how general the conditional 
> increment/negate/inverse operations
> are to warrant a midend implementation. Do you mean the 
> expand_cond_expr_using_cmove function in expr.c?

Yes and we can expand it to even have a target hook on how to expand them if 
needed. 

There is already a standard pattern for conditional add, so the a ? const1 :
const2 case can be handled in a generic way without much trouble. We should
handle it better in RTL ifcvt too (that should be an easier patch). The neg
and not cases are very target-specific but can be handled by a target hook
that expands them directly.


> 
>>  
>> I have patches to do both but I have not got around to cleaning them up. If 
>> anyone wants them, I can send a link to my current gcc 5.1 sources with them 
>> included.
> 
> Any chance you can post them on gcc-patches even as a rough idea of what 
> needs to be done?


I posted my expr patch a few years ago but I never got around to rth's 
comments. This was the generic increment patch. Basically aarch64 should be 
implementing that pattern too. 


The main reason why this should be handled in gimple is that ifcvt on the rtl 
level is not cheap and does not catch all of the cases the simple expansion of 
phi-opt does. I can dig that patch up and I will be doing that next week 
anyways. 

Thanks,
Andrew

> 
> Thanks,
> Kyrill
> 
>>  
>> Thanks,
>> Andrew
>> 
>>> Thanks,
>>> Kyrill
>>> 
>>> 2015-07-10  Kyrylo Tkachov  
>>> 
>>>* config/aarch64/aarch64.md (*cmov_insn): Move stricter
>>>check for operands 3 and 4 to pattern predicate.  Allow immediates
>>>that can be expressed as csinc/csneg/csinv.  New define_split.
>>>(*csinv3_insn): Rename to...
>>>(csinv3_insn): ... This.
>>>* config/aarch64/aarch64.h (AARCH64_IMMS_OK_FOR_CSNEG): New macro.
>>>(AARCH64_IMMS_OK_FOR_CSINC): Likewise.
>>>(AARCH64_IMMS_OK_FOR_CSINV): Likewise.
>>>* config/aarch64/aarch64.c (aarch64_imms_ok_for_cond_op_1):
>>>New function.
>>>(aarch64_imms_ok_for_cond_op): Likewise.
>>>* config/aarch64/aarch64-protos.h (aarch64_imms_ok_for_cond_op_1):
>>>Declare prototype.
>>>(aarch64_imms_ok_for_cond_op): Likewise.
>>> 
>>> 2015-07-10  Kyrylo Tkachov  
>>> 
>>>* gcc.target/aarch64/cond-op-imm_1.c: New test.
>>> 
> 


Re: genmatch indent generated code

2015-07-10 Thread Richard Biener
On Fri, Jul 10, 2015 at 10:45 AM, Richard Biener
 wrote:
> On Thu, Jul 9, 2015 at 5:23 PM, Michael Matz  wrote:
>> Hi,
>>
>> On Thu, 9 Jul 2015, Jakub Jelinek wrote:
>>
>>> That violates the coding style by not using tabs ;)
>>
>> I knew it!  Somebody would notice, pffft.  Fixed in the committed version.
>
> I also noticed it but didn't care ;)  But now I notice
>
>   switch (TREE_CODE (t))
> {
>   case SSA_NAME:
>
> cases are indented too much, it should be
>
>   switch (TREE_CODE (t))
> {
> case SSA_NAME:

Fixed with the attached.

Richard.

2015-07-10  Richard Biener  

* genmatch.c (dt_node::gen_kids_1): Fix indenting of
case labels.
(decision_tree::gen_gimple): Likewise.
(decision_tree::gen_generic): Likewise.


fix-genmatch-indent
Description: Binary data


[PATCH, libcpp] Faster line lexer.

2015-07-10 Thread Ondřej Bílka
Hi,

As I wrote at

[PATCH, libcpp]: Use asm flag outputs in search_line_sse42 main loop

https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg113610.html

I won't repeat the reasons here; in summary, the current sse4.2 code is
redundant, as it has the same performance as the sse2 one.
This improves sse2 performance by around 10% vs the sse4.2 code by
using a better header.

An updated benchmark is attached. It counts the number of lines in a given C
source file; I ran it on its own source for reproducible results. On Sandy
Bridge the runtimes are as follows; fx10 and Nehalem are similar.
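
For readers who want a baseline to compare the vector code against, here is a
scalar sketch of the search being benchmarked. It is not the attached
benchmark and not code from libcpp: the function names are hypothetical, and
the set of four characters (newline, carriage return, backslash, question
mark) is an assumption read off the repl_nl, repl_cr, repl_bs and repl_qm
names in the patch.

```c
#include <stddef.h>

/* Scalar reference: return a pointer to the first byte in [s, end) that
   is one of '\n', '\r', '\\' or '?', or end if there is none.  */
const unsigned char *
search_line_scalar (const unsigned char *s, const unsigned char *end)
{
  for (; s < end; s++)
    switch (*s)
      {
      case '\n':
      case '\r':
      case '\\':
      case '?':
        return s;
      default:
        break;
      }
  return end;
}

/* Count newlines in [s, end) by repeatedly calling the search and
   skipping over every hit that is not a newline.  */
size_t
count_newlines (const unsigned char *s, const unsigned char *end)
{
  size_t lines = 0;
  while ((s = search_line_scalar (s, end)) < end)
    {
      if (*s == '\n')
        lines++;
      s++;
    }
  return lines;
}
```

Unlike this sketch, the SSE2 version in the patch tests 16 bytes per
comparison and does not return an explicit not-found value.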


time ./a.out line.c 1 10;  time ./a.out line.c 2 10;  time
./a.out line.c 3 10;  time ./a.out line.c 4 10;  time ./a.out
line.c 5 10

# strpbrk
real    0m0.507s
user    0m0.505s
sys     0m0.000s
# current sse2
real    0m0.490s
user    0m0.490s
sys     0m0.000s
# current sse4.2
real    0m0.423s
user    0m0.420s
sys     0m0.003s
# improved header
real    0m0.450s
user    0m0.451s
sys     0m0.000s
# proposed version
real    0m0.426s
user    0m0.426s
sys     0m0.000s



* lex.c (search_line_sse2): Improve performance by using
proper header.
(search_line_sse42): Delete.

diff --git a/libcpp/lex.c b/libcpp/lex.c
index 0ad9660..8032e6e 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -373,36 +373,110 @@ search_line_sse2 (const uchar *s, const uchar *end 
ATTRIBUTE_UNUSED)
   const v16qi repl_qm = *(const v16qi *)repl_chars[3];
 
   unsigned int misalign, found, mask;
+
   const v16qi *p;
-  v16qi data, t;
+  v16qi data, t, tx;
+ 
+  if (s + 80 < end)
+{
+  v16qi x0 = __builtin_ia32_loaddqu ((char const *) s);
+  tx =  __builtin_ia32_pcmpeqb128 (x0, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x0, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x0, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x0, repl_qm);
+
+  found =  __builtin_ia32_pmovmskb128 (tx);
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + found;
+}
+  v16qi x1 = __builtin_ia32_loaddqu ((char const *) (s + 16));
+  v16qi x2 = __builtin_ia32_loaddqu ((char const *) (s + 32));
+  v16qi x3 = __builtin_ia32_loaddqu ((char const *) (s + 48));
+  v16qi x4 = __builtin_ia32_loaddqu ((char const *) (s + 64));
+
+  tx =  __builtin_ia32_pcmpeqb128 (x1, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x1, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x1, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x1, repl_qm);
+
+  found =  __builtin_ia32_pmovmskb128 (tx);
+
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + 16 + found;
+}
+
+  tx =  __builtin_ia32_pcmpeqb128 (x2, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x2, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x2, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x2, repl_qm);
+
+  found = __builtin_ia32_pmovmskb128 (tx);
+
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + 32 + found;
+}
+
+
+  tx =  __builtin_ia32_pcmpeqb128 (x3, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x3, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x3, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x3, repl_qm);
+
+  found =  __builtin_ia32_pmovmskb128 (tx);
+
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + 48 + found;
+}
+
+  tx =  __builtin_ia32_pcmpeqb128 (x4, repl_nl);
+  tx |= __builtin_ia32_pcmpeqb128 (x4, repl_cr);
+  tx |= __builtin_ia32_pcmpeqb128 (x4, repl_bs);
+  tx |= __builtin_ia32_pcmpeqb128 (x4, repl_qm);
+
+  found =  __builtin_ia32_pmovmskb128 (tx);
+
+  if (found)
+{
+  found = __builtin_ctz (found);
+  return (const uchar *) s + 64 + found;
+}
+
+  s += 80;
+}
 
   /* Align the source pointer.  */
   misalign = (uintptr_t)s & 15;
   p = (const v16qi *)((uintptr_t)s & -16);
   data = *p;
 
-  /* Create a mask for the bytes that are valid within the first
- 16-byte block.  The Idea here is that the AND with the mask
- within the loop is "free", since we need some AND or TEST
- insn in order to set the flags for the branch anyway.  */
   mask = -1u << misalign;
 
-  /* Main loop processing 16 bytes at a time.  */
-  goto start;
-  do
+  t  = __builtin_ia32_pcmpeqb128(data, repl_nl);
+  t |= __builtin_ia32_pcmpeqb128(data, repl_cr);
+  t |= __builtin_ia32_pcmpeqb128(data, repl_bs);
+  t |= __builtin_ia32_pcmpeqb128(data, repl_qm);
+  found = __builtin_ia32_pmovmskb128 (t);
+  found &= mask;
+
+  while (!found)
 {
   data = *++p;
-  mask = -1;
 
-start:
   t  = __builtin_ia32_pcmpeqb128(data, repl_nl);
   t |= __builtin_ia32_pcmpeqb128(data, repl_cr);
   t |= __builtin_ia32_pcmpeqb128(data, repl_bs);
   t |= __builtin_ia32_pcmpeqb128(data, repl_qm);
   found = __built

Re: Proposal to postpone release of 5.2 for a week

2015-07-10 Thread Kaz Kojima
Richard Biener  wrote:
> Note this opens the window for other important wrong-code fixes - please
> CC me on any you'd like to propose for GCC 5.2 and wait for my
> explicit approval.

Can I backport the patch in

[SH] Fix PR target/66780
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00472.html

to gcc-5.2?  It's a few lines change for a SH specific wrong code bug
with -fstack-protector which is disastrous for Debian SH folks who use
that option heavily in their packages.

Regards,
kaz


Re: [PATCH][AArch64] Improve csinc/csneg/csinv opportunities on immediates

2015-07-10 Thread Kyrill Tkachov

Hi Andrew,

On 10/07/15 09:40, pins...@gmail.com wrote:





On Jul 10, 2015, at 1:34 AM, Kyrill Tkachov  wrote:

Hi all,

Currently when evaluating expressions like (a ? 24 : 25) we will move 24 and 25 
into
registers and perform a csel on them.  This misses the opportunity to instead 
move just 24
into a register and then perform a csinc, saving us an instruction and a 
register use.
Similarly for csneg and csinv.

This patch implements that idea by allowing such pairs of immediates in 
*cmov_insn
and adding an early splitter that performs the necessary transformation.
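
As a rough illustration of why one constant is enough, the conditional-select
family can be modelled in plain C. This is a sketch only: the *_model helpers
and the two select functions are hypothetical names, not anything in the
patch, and they model the instruction semantics on 64-bit values rather than
showing real generated code.

```c
#include <stdint.h>

/* C models of the AArch64 conditional-select instructions: each returns
   rn when the condition holds, and a function of rm otherwise.  */
uint64_t
csel_model (int cond, uint64_t rn, uint64_t rm)
{
  return cond ? rn : rm;
}

uint64_t
csinc_model (int cond, uint64_t rn, uint64_t rm)
{
  return cond ? rn : rm + 1;
}

uint64_t
csneg_model (int cond, uint64_t rn, uint64_t rm)
{
  return cond ? rn : -rm;
}

uint64_t
csinv_model (int cond, uint64_t rn, uint64_t rm)
{
  return cond ? rn : ~rm;
}

/* a ? 24 : 25 with both constants materialized: two moves plus csel.  */
uint64_t
select_two_regs (int a)
{
  uint64_t r0 = 24, r1 = 25;
  return csel_model (a != 0, r0, r1);
}

/* Same result with only 24 materialized: one move plus csinc.  */
uint64_t
select_one_reg (int a)
{
  uint64_t r0 = 24;
  return csinc_model (a != 0, r0, r0);
}
```

The same reasoning covers pairs such as (x, -x) for csneg and (x, ~x) for
csinv, which is the kind of immediate pair the new predicates are described
as accepting.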

The testcase included in the patch demonstrates the kind of opportunities that 
are now picked up.

With this patch I see about 9.6% more csinc instructions being generated for 
SPEC2006
and the generated code looks objectively better (i.e. fewer mov-immediates and 
slightly
lower register pressure).

Bootstrapped and tested on aarch64.

Ok for trunk?

I think this is the wrong place for this optimization. It should happen in 
expr.c and we should produce cond_expr on the gimple level.


I had considered it, but I wasn't sure how general the conditional 
increment/negate/inverse operations
 are to warrant a midend implementation. Do you mean the 
expand_cond_expr_using_cmove function in expr.c?

  


I have patches to do both but I have not got around to cleaning them up. If 
anyone wants them, I can send a link to my current gcc 5.1 sources with them 
included.


Any chance you can post them on gcc-patches even as a rough idea of what needs 
to be done?

Thanks,
Kyrill

  


Thanks,
Andrew


Thanks,
Kyrill

2015-07-10  Kyrylo Tkachov  

* config/aarch64/aarch64.md (*cmov_insn): Move stricter
check for operands 3 and 4 to pattern predicate.  Allow immediates
that can be expressed as csinc/csneg/csinv.  New define_split.
(*csinv3_insn): Rename to...
(csinv3_insn): ... This.
* config/aarch64/aarch64.h (AARCH64_IMMS_OK_FOR_CSNEG): New macro.
(AARCH64_IMMS_OK_FOR_CSINC): Likewise.
(AARCH64_IMMS_OK_FOR_CSINV): Likewise.
* config/aarch64/aarch64.c (aarch64_imms_ok_for_cond_op_1):
New function.
(aarch64_imms_ok_for_cond_op): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_imms_ok_for_cond_op_1):
Declare prototype.
(aarch64_imms_ok_for_cond_op): Likewise.

2015-07-10  Kyrylo Tkachov  

* gcc.target/aarch64/cond-op-imm_1.c: New test.





Re: genmatch indent generated code

2015-07-10 Thread Richard Biener
On Thu, Jul 9, 2015 at 5:23 PM, Michael Matz  wrote:
> Hi,
>
> On Thu, 9 Jul 2015, Jakub Jelinek wrote:
>
>> That violates the coding style by not using tabs ;)
>
> I knew it!  Somebody would notice, pffft.  Fixed in the committed version.

I also noticed it but didn't care ;)  But now I notice

  switch (TREE_CODE (t))
    {
      case SSA_NAME:

cases are indented too much, it should be

  switch (TREE_CODE (t))
    {
    case SSA_NAME:

Richard.

>
> Ciao,
> Michael.
> PS: this still isn't fully correct, as sometimes I start the strings with
> spaces which don't count towards the indent parameter, I don't align code
> coming from (with ...) and long lines aren't broken up.  Left as an
> exercise for the reader ;)


Re: [PATCH][AArch64] Improve csinc/csneg/csinv opportunities on immediates

2015-07-10 Thread pinskia




> On Jul 10, 2015, at 1:34 AM, Kyrill Tkachov  wrote:
> 
> Hi all,
> 
> Currently when evaluating expressions like (a ? 24 : 25) we will move 24 and 
> 25 into
> registers and perform a csel on them.  This misses the opportunity to instead 
> move just 24
> into a register and then perform a csinc, saving us an instruction and a 
> register use.
> Similarly for csneg and csinv.
> 
> This patch implements that idea by allowing such pairs of immediates in 
> *cmov_insn
> and adding an early splitter that performs the necessary transformation.
> 
> The testcase included in the patch demonstrates the kind of opportunities 
> that are now picked up.
> 
> With this patch I see about 9.6% more csinc instructions being generated for 
> SPEC2006
> and the generated code looks objectively better (i.e. fewer mov-immediates 
> and slightly
> lower register pressure).
> 
> Bootstrapped and tested on aarch64.
> 
> Ok for trunk?

I think this is the wrong place for this optimization. It should happen in 
expr.c and we should produce cond_expr on the gimple level. 

I have patches to do both but I have not got around to cleaning them up. If 
anyone wants them, I can send a link to my current gcc 5.1 sources with them 
included. 

Thanks,
Andrew

> 
> Thanks,
> Kyrill
> 
> 2015-07-10  Kyrylo Tkachov  
> 
>* config/aarch64/aarch64.md (*cmov_insn): Move stricter
>check for operands 3 and 4 to pattern predicate.  Allow immediates
>that can be expressed as csinc/csneg/csinv.  New define_split.
>(*csinv3_insn): Rename to...
>(csinv3_insn): ... This.
>* config/aarch64/aarch64.h (AARCH64_IMMS_OK_FOR_CSNEG): New macro.
>(AARCH64_IMMS_OK_FOR_CSINC): Likewise.
>(AARCH64_IMMS_OK_FOR_CSINV): Likewise.
>* config/aarch64/aarch64.c (aarch64_imms_ok_for_cond_op_1):
>New function.
>(aarch64_imms_ok_for_cond_op): Likewise.
>* config/aarch64/aarch64-protos.h (aarch64_imms_ok_for_cond_op_1):
>Declare prototype.
>(aarch64_imms_ok_for_cond_op): Likewise.
> 
> 2015-07-10  Kyrylo Tkachov  
> 
>* gcc.target/aarch64/cond-op-imm_1.c: New test.
> 


[PATCH][AArch64] Improve csinc/csneg/csinv opportunities on immediates

2015-07-10 Thread Kyrill Tkachov

Hi all,

Currently when evaluating expressions like (a ? 24 : 25) we will move 24 and 25 
into
registers and perform a csel on them.  This misses the opportunity to instead 
move just 24
into a register and then perform a csinc, saving us an instruction and a 
register use.
Similarly for csneg and csinv.
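
As a rough illustration (not taken from the patch's testcase, whose contents
are not shown here), the following C functions have the shape described
above: both arms of the conditional are immediates related by +1, by
negation, or by bitwise NOT.  The function names are made up for this
sketch, and the instruction sequence in the first comment is only one
plausible outcome, not verified compiler output.

```c
/* Hypothetical examples of the source patterns discussed above.
   Function names are illustrative only.  */

/* Arms differ by one: a candidate for csinc.  One plausible sequence is
     mov w1, 24 ; cmp w0, 0 ; csinc w0, w1, w1, ne
   which needs only the single immediate 24 in a register.  */
int
pick_inc (int a)
{
  return a ? 24 : 25;
}

/* Arms are negations of each other: a candidate for csneg.  */
int
pick_neg (int a)
{
  return a ? 5 : -5;
}

/* Arms are bitwise complements: a candidate for csinv.  */
int
pick_inv (int a)
{
  return a ? 7 : ~7;
}
```

In each case only one of the two immediates has to be materialised; the
other arm is derived from it by the conditional instruction itself.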

This patch implements that idea by allowing such pairs of immediates in 
*cmov_insn
and adding an early splitter that performs the necessary transformation.
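
The question of which pairs of immediates qualify can be sketched on plain
integers.  The patch attached to this message does the check on CONST_INT
rtxes through three macros and a helper that tries both operand orders and
rejects zero; the version below is only an approximation of that logic.
The names imm_pair_ok_one_way and imm_pair_ok are hypothetical, and
unsigned arithmetic is used here (an assumption of this sketch, not
something the patch does) so that negation and +1 wrap rather than
overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch only: true if A can be derived from B by negation (csneg),
   increment (csinc) or bitwise NOT (csinv).  Hypothetical name.  */
static bool
imm_pair_ok_one_way (int64_t a, int64_t b)
{
  uint64_t ua = (uint64_t) a;
  uint64_t ub = (uint64_t) b;

  return ua == (uint64_t) 0 - ub   /* csneg: a == -b  */
         || ua == ub + 1           /* csinc: a == b + 1  */
         || ua == ~ub;             /* csinv: a == ~b  */
}

/* Sketch only: true if the pair (A, B) is worth splitting, in either
   order.  Zero is rejected because the zero register already covers
   that case.  Hypothetical name.  */
bool
imm_pair_ok (int64_t a, int64_t b)
{
  if (a == 0 || b == 0)
    return false;

  return imm_pair_ok_one_way (a, b) || imm_pair_ok_one_way (b, a);
}
```

For example, imm_pair_ok (24, 25) and imm_pair_ok (5, -5) hold, while
imm_pair_ok (24, 26) does not.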

The testcase included in the patch demonstrates the kind of opportunities that 
are now picked up.

With this patch I see about 9.6% more csinc instructions being generated for 
SPEC2006
and the generated code looks objectively better (i.e. fewer mov-immediates and 
slightly
lower register pressure).

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-07-10  Kyrylo Tkachov  

* config/aarch64/aarch64.md (*cmov_insn): Move stricter
check for operands 3 and 4 to pattern predicate.  Allow immediates
that can be expressed as csinc/csneg/csinv.  New define_split.
(*csinv3_insn): Rename to...
(csinv3_insn): ... This.
* config/aarch64/aarch64.h (AARCH64_IMMS_OK_FOR_CSNEG): New macro.
(AARCH64_IMMS_OK_FOR_CSINC): Likewise.
(AARCH64_IMMS_OK_FOR_CSINV): Likewise.
* config/aarch64/aarch64.c (aarch64_imms_ok_for_cond_op_1):
New function.
(aarch64_imms_ok_for_cond_op): Likewise.
* config/aarch64/aarch64-protos.h (aarch64_imms_ok_for_cond_op_1):
Declare prototype.
(aarch64_imms_ok_for_cond_op): Likewise.

2015-07-10  Kyrylo Tkachov  

* gcc.target/aarch64/cond-op-imm_1.c: New test.
commit eed5af149229609215327b62b7b3b4018adc6f3f
Author: Kyrylo Tkachov 
Date:   Wed Jul 8 10:22:20 2015 +0100

[AArch64] Improve csinc/csneg/csinv opportunities on immediates

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 4fe437f..6e3781e 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -254,6 +254,8 @@ bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
 bool aarch64_gen_movmemqi (rtx *);
 bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
+bool aarch64_imms_ok_for_cond_op_1 (rtx, rtx);
+bool aarch64_imms_ok_for_cond_op (rtx, rtx, machine_mode);
 bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
 bool aarch64_is_long_call_p (rtx);
 bool aarch64_label_mentioned_p (rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0d81921..8babefb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3268,6 +3268,36 @@ aarch64_move_imm (HOST_WIDE_INT val, machine_mode mode)
   return aarch64_bitmask_imm (val, mode);
 }
 
+/* Helper for aarch64_imms_ok_for_cond_op.  */
+
+bool
+aarch64_imms_ok_for_cond_op_1 (rtx a, rtx b)
+{
+  return AARCH64_IMMS_OK_FOR_CSNEG (a, b)
+	 || AARCH64_IMMS_OK_FOR_CSINC (a, b)
+	 || AARCH64_IMMS_OK_FOR_CSINV (a, b);
+}
+
+/* Return true if A and B are CONST_INT rtxes that can appear in
+   the two arms of an IF_THEN_ELSE used for a CSINC, CSNEG or CSINV
+   operation in mode MODE.  This is used in the splitter below
+   *cmov_insn in aarch64.md.  */
+
+bool
+aarch64_imms_ok_for_cond_op (rtx a, rtx b, machine_mode mode)
+{
+  if (!CONST_INT_P (a) || !CONST_INT_P (b))
+    return false;
+
+  /* No need to do smart splitting with constant 0.  We can just do
+     normal csinc, csneg, csinv on {w,x}zr.  */
+  if (a == const0_rtx || b == const0_rtx)
+    return false;
+
+  return aarch64_imms_ok_for_cond_op_1 (a, b)
+	 || aarch64_imms_ok_for_cond_op_1 (b, a);
+}
+
 static bool
 aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 7c31376..e7aecd1 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -678,6 +678,15 @@ do {	 \
 /* Maximum bytes moved by a single instruction (load/store pair).  */
 #define MOVE_MAX (UNITS_PER_WORD * 2)
 
+/* Check if CONST_INTs A and B can be used as two arguments to CSNEG.  */
+#define AARCH64_IMMS_OK_FOR_CSNEG(A, B) (INTVAL (A) == -INTVAL (B))
+
+/* Check if CONST_INTs A and B can be used as two arguments to CSINC.  */
+#define AARCH64_IMMS_OK_FOR_CSINC(A, B) (INTVAL (A) == (INTVAL (B) + 1))
+
+/* Check if CONST_INTs A and B can be used as two arguments to CSINV.  */
+#define AARCH64_IMMS_OK_FOR_CSINV(A, B) (INTVAL (A) == ~INTVAL (B))
+
 /* The base cost overhead of a memcpy call, for MOVE_RATIO and friends.  */
 #define AARCH64_CALL_RATIO 8
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1e343fa..358f89b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2848,10 +2848,14 @@ (define_insn "*cmov_insn"
 	(if_then_else:ALLI
 	 (match_operator 1 "aarch64_comparison_operator"
 	  [(match_operand 2 "cc_register" "") (const_int 0)])
-	 (match_operand:ALLI 3 

Re: [PATCH 3/3] Fix ubsan tests by disabling of an optimization.

2015-07-10 Thread Richard Biener
On Thu, Jul 9, 2015 at 6:37 PM, Jeff Law  wrote:
> On 07/09/2015 09:41 AM, Jakub Jelinek wrote:
>>
>> On Thu, Jul 09, 2015 at 09:34:25AM -0600, Jeff Law wrote:
>>>
>>> On 07/09/2015 08:13 AM, Jakub Jelinek wrote:

>>>> On Thu, Jul 09, 2015 at 03:56:35PM +0200, mliska wrote:
>>>>>
>>>>> ---
>>>>>   gcc/testsuite/g++.dg/ubsan/vptr-1.C | 2 +-
>>>>>   gcc/testsuite/g++.dg/ubsan/vptr-2.C | 2 +-
>>>>>   gcc/testsuite/g++.dg/ubsan/vptr-3.C | 2 +-
>>>>>   3 files changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>>
>>>> I'd actually think it would be better to give up on the
>>>> UBSAN_* internal calls in tail merging.  Those internal calls pass
>>>> arguments based on their gimple_location, so tail merging breaks
>>>> them.
>>>
>>> So I think the larger question here is should differences in gimple
>>> locations prevent tail merging?  I'd tend to think not, which then begs
>>> the
>>
>>
>> Generally no.
>>
>>> question, are the UBSAN calls special enough to warrant an exception?
>>
>>
>> ASAN internal calls too I suppose.  And yes, I believe they are special
>> enough to warrant an exception.  The speciality is in them being lowered
>> later on into a call that is passed as one argument a structure containing
>> file:line info derived from that location, and for those calls that info
>> is very important (by using -fsanitize=address or -fsanitize=undefined
>> the user already says that he doesn't care that much about generated code
>> quality, the extra overhead is already there).  Another option would be
>> for -fsanitize=address or undefined etc. to disable various optimization
>> options (it does already disable some like non-null optimizations, because
>> it is essential), but I believe there is no reason not to perform tail
>> merging even with those options, as long as there are none of the
>> problematic internal calls involved, or if the locus is the same.  One
>> could consider gimple_location as yet another parameter to those internal
>> calls, not emitted directly just to avoid wasting compiler memory.
>
> I figured you'd say something along these lines :-)  Essentially the
> argument is the line numbers are absolutely core to what the sanitizers
> provide by way of their diagnostics.  Getting them wrong because we tail
> merged is likely to cause huge amounts of confusion.
>
> Martin -- could you have the existence of ASAN or UBSAN calls inhibit
> tail merging?  I guess you could potentially check the location information
> and still allow it if the location information on those calls matches, but I
> doubt that's worth the effort.

But the warning on the "bogus" line will still be warranted, so the user goes
and fixes it.  Then tail-merge no longer applies and he gets the warning on
the other warranted line.

So in the end tail-merging caused him to fix _two_ bugs instead of just
the single one he'd run into otherwise!

-> I don't see any issue with tail-merging those.
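
To make the scenario concrete, here is a hypothetical C function (not one
of the vptr tests under discussion) whose two branches end in the same
arithmetic.  The assumption in this sketch is that, under
-fsanitize=signed-integer-overflow, each addition gets its own overflow
check tied to its source line; if the two tails were merged, a runtime
report could name only one of the two lines.  Whether a given GCC version
actually merges these particular blocks is not something the sketch
claims.

```c
/* Hypothetical illustration; the name bump and its parameters are made up.
   Both branches end in the same addition, so their tails are identical
   apart from source location.  */
int
bump (int which, int x, int *log)
{
  if (which)
    {
      *log = 1;
      return x + 1;   /* site A: an overflow here would be reported on this line */
    }

  *log = 2;
  return x + 1;       /* site B: same computation, different line */
}
```

In the argument above, a report pointing at site A while execution came
through site B would still describe a real overflow in x + 1; fixing site A
then leaves site B to be reported on the next run.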

Richard.

> Jeff
>


Re: Proposal to postpone release of 5.2 for a week [Was: Re: patch to fix PR66782]

2015-07-10 Thread Richard Biener
On Thu, 9 Jul 2015, Uros Bizjak wrote:

> Hello!
> 
> > The patch was bootstrapped and tested on x86/x86-64.
> >
> > Committed as rev. 225618.
> >
> > 2015-07-09  Vladimir Makarov  
> >
> > PR rtl-optimization/66782
> > * lra-int.h (struct lra_insn_recog_data): Add comment about
> > clobbered hard regs for arg_hard_regs.
> > * lra.c (lra_set_insn_recog_data): Add clobbered hard regs.
> > * lra-lives.c (process_bb_lives): Process clobbered hard regs.
> > Add condition for processing used hard regs.
> > * lra-constraints.c (update_ebb_live_info, inherit_in_ebb):
> > Process clobbered hard regs.
> 
> I would like to nominate this patch for gcc-5.2 release. According to
> downstream bugreport [1], gcc-5.1 is unusable for 64-bit wine:
> 
> "Breaks all of wine, no easy workaround -> blocker."
> 
> Due to the severity of this bug, and the importance of Wine, I'd like to
> postpone the 5.2 release for a week, so this bug gets some testing in
> the mainline, before it is backported to the gcc-5 branch.
> 
> [1] https://bugs.winehq.org/show_bug.cgi?id=38653

Hm.  I'd rather burn this with an RC2 released soon or defer it to
GCC 5.3.  The patch looks kind-of straight-forward, likely not
affecting anything else (to my naive eyes...).

So - please get it committed to the GCC 5 branch as soon as possible.
A GCC 5.2 RC2 will be done on Monday at the latest then (possibly during
the weekend if I find the time to do it).

Note this opens the window for other important wrong-code fixes - please
CC me on any you'd like to propose for GCC 5.2 and wait for my
explicit approval.

Thanks,
Richard.

