[Fortran, Patch, committed] Update {gfortran,intrinsic}.texi refs to OpenMP 4
Committed as obvious as Rev. 211806. Tobias Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (Revision 211805) +++ gcc/fortran/ChangeLog (Arbeitskopie) @@ -1,3 +1,8 @@ +2014-06-18 Tobias Burnus bur...@net-b.de + + * gfortran.texi (OpenMP): Update refs to OpenMP 4.0. + * intrinsic.texi (OpenMP Modules): Ditto. + 2014-06-18 Jakub Jelinek ja...@redhat.com * cpp.c (cpp_define_builtins): Change _OPENMP macro to Index: gcc/fortran/gfortran.texi === --- gcc/fortran/gfortran.texi (Revision 211805) +++ gcc/fortran/gfortran.texi (Arbeitskopie) @@ -531,7 +531,7 @@ The current status of the support is can be found @ref{TS 29113 status} sections of the documentation. Additionally, the GNU Fortran compilers supports the OpenMP specification -(version 3.1, @url{http://openmp.org/@/wp/@/openmp-specifications/}). +(version 4.0, @url{http://openmp.org/@/wp/@/openmp-specifications/}). @node Varying Length Character Strings @subsection Varying Length Character Strings @@ -1884,8 +1884,8 @@ It consists of a set of compiler directives, libra and environment variables that influence run-time behavior. GNU Fortran strives to be compatible to the -@uref{http://www.openmp.org/mp-documents/spec31.pdf, -OpenMP Application Program Interface v3.1}. +@uref{http://openmp.org/wp/openmp-specifications/, +OpenMP Application Program Interface v4.0}. To enable the processing of the OpenMP directive @code{!$omp} in free-form source code; the @code{c$omp}, @code{*$omp} and @code{!$omp} Index: gcc/fortran/intrinsic.texi === --- gcc/fortran/intrinsic.texi (Revision 211805) +++ gcc/fortran/intrinsic.texi (Arbeitskopie) @@ -13399,8 +13399,7 @@ named constants: @code{OMP_LIB} provides the scalar default-integer named constant @code{openmp_version} with a value of the form @var{yyyymm}, where @var{yyyy} is the year and @var{mm} the month -of the OpenMP version; for OpenMP v3.1 the value is @code{201107} -and for OpenMP v4.0 the value is @code{201307}. 
+of the OpenMP version; for OpenMP v4.0 the value is @code{201307}. The following scalar integer named constants of the kind @code{omp_sched_kind}:
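The same release-date encoding documented here for the Fortran openmp_version constant also drives the C/C++ _OPENMP macro that Jakub's cpp.c change updates. A small self-contained sketch of the encoding (the helper name is mine, not GCC code):

```cpp
#include <cassert>

// OpenMP version macros/constants encode the release date of the
// specification as yyyy * 100 + mm.  This helper makes that explicit.
constexpr int openmp_yyyymm(int year, int month) {
  return year * 100 + month;
}

// OpenMP v3.1 was published in July 2011, v4.0 in July 2013.
static_assert(openmp_yyyymm(2011, 7) == 201107, "OpenMP 3.1 date code");
static_assert(openmp_yyyymm(2013, 7) == 201307, "OpenMP 4.0 date code");
```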
Re: [PATCH 5/5] add libcc1
Joseph == Joseph S Myers jos...@codesourcery.com writes: Tom This patch adds the plugin to the gcc tree and updates the Tom top-level configury. Following up on your review. Joseph I don't see anything obvious that would disable the plugin if Joseph plugins are unsupported (e.g. on Windows host) or disabled Joseph (--disable-plugin). Probably the relevant support from Joseph gcc/configure.ac needs to go somewhere it can be used at Joseph toplevel. I've moved some relevant code to a new .m4 file in config and used it from the plugin itself. This seemed simpler than dealing with it at the top level. The plugin also self-disables if its configury needs are not met. Tom + self-args.push_back (gcc); Joseph seems wrong - at least you should use the appropriate compiler Joseph name after transformation for cross compilers / Joseph --program-transform-name. Though really the *versioned* driver Joseph $(target_noncanonical)-gcc-$(version) is the right one to use, This turned out to be a pain :-) There are two basic problems. First, gdb gets the names of its architectures from BFD, which doesn't always use the same naming scheme as the GNU configury triplets. It does generally use the same names, but for x86 targets it differs quite a bit. Second, the configury triplets can vary in annoying ways that don't really affect correct operation. For example, i586- versus i686- (there is a difference, but I think ignorable given the compiler flags in the debuginfo, and anyway I suspect not discoverable by gdb); or -unknown- versus -pc- (completely irrelevant AFAIK); or even x86_64-redhat-linux versus x86_64-unknown-linux-gnu (seemingly gratuitous). In the end I added some code to gdb and to libcc1.so to construct a regexp matching plausible results and then search $PATH for matches. Which seems rather gross, but workable in reasonable scenarios. I didn't try to apply the program transform name. 
I suppose I could apply it to the final gcc component of the name, though, without much trouble. I'll fix this up tomorrow. Let me know if you have any issue with the above. Barring that, I will be resubmitting the series soon, most likely this week. thanks, Tom
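The "regexp matching plausible results" approach described above can be sketched roughly like this (the pattern and helper name are hypothetical illustrations of the idea, not the actual libcc1/gdb code):

```cpp
#include <cassert>
#include <regex>
#include <string>

// Sketch: instead of demanding one exact configure triplet, accept the
// plausible spellings — vendor omitted, -pc- vs -unknown-, optional -gnu
// suffix, optional trailing version — and match candidate compiler names
// found on $PATH against the pattern.
bool plausible_x86_64_gcc(const std::string &name) {
  static const std::regex re("x86_64(-[^-]+)?-linux(-gnu)?-gcc(-[0-9.]+)?");
  return std::regex_match(name, re);
}
```

This accepts e.g. x86_64-redhat-linux-gcc and x86_64-unknown-linux-gnu-gcc-4.9 while rejecting compilers for other targets.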
C++ PATCH for c++/61507 (variadics and explicit template args)
In this bug we were throwing away the explicit args when doing nested unification for a function, with the result that we didn't remember them when proceeding to do unification for the trailing arguments. Fixed by preserving ARGUMENT_PACK_EXPLICIT_ARGS. Tested x86_64-pc-linux-gnu, applying to trunk. commit 00e76793666566b604903930c77d4a644ec74a12 Author: Jason Merrill ja...@redhat.com Date: Tue Jun 17 16:35:57 2014 +0200 PR c++/61507 * pt.c (resolve_overloaded_unification): Preserve ARGUMENT_PACK_EXPLICIT_ARGS. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index d5cc257..f0a598b 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -16838,7 +16838,16 @@ resolve_overloaded_unification (tree tparms, int i = TREE_VEC_LENGTH (targs); for (; i--; ) if (TREE_VEC_ELT (tempargs, i)) - TREE_VEC_ELT (targs, i) = TREE_VEC_ELT (tempargs, i); + { + tree old = TREE_VEC_ELT (targs, i); + tree new_ = TREE_VEC_ELT (tempargs, i); + if (new_ && old && ARGUMENT_PACK_P (old) + && ARGUMENT_PACK_EXPLICIT_ARGS (old)) + /* Don't forget explicit template arguments in a pack. */ + ARGUMENT_PACK_EXPLICIT_ARGS (new_) + = ARGUMENT_PACK_EXPLICIT_ARGS (old); + TREE_VEC_ELT (targs, i) = new_; + } } if (good) return true; diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic159.C b/gcc/testsuite/g++.dg/cpp0x/variadic159.C new file mode 100644 index 000..2b14d30 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic159.C @@ -0,0 +1,14 @@ +// PR c++/61507 +// { dg-do compile { target c++11 } } + +struct A { + void foo(const int &); + void foo(float); +}; + +template <typename... Args> +void bar(void (A::*memfun)(Args...), Args... args); + +void go(const int &i) { + bar<const int &>(&A::foo, i); +}
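A hedged sketch of the scenario the fix addresses: the explicit template arguments seed the pack Args, and deduction for the trailing function arguments must remember them when resolving the overloaded &A::foo. Unlike the dg testcase, the helper here returns a value so the behavior can be checked; the names are mine:

```cpp
#include <cassert>

struct A {
  int foo(const int &i) { return i; }  // selected by <const int &>
  int foo(float) { return -1; }
};

// Args is partially given explicitly at the call site; the rest of the
// pack is deduced from the trailing arguments.
template <typename... Args>
int call(int (A::*memfun)(Args...), Args... args) {
  A a;
  return (a.*memfun)(args...);
}

int demo(const int &i) {
  // <const int &> fixes the first pack element, disambiguating &A::foo.
  return call<const int &>(&A::foo, i);
}
```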
C++ PATCH for c++/59296 (rvalue object and lvalue ref-qualifier)
We were treating a const & member function like a normal const reference, and binding an rvalue object argument to it. But it doesn't work that way. Tested x86_64-pc-linux-gnu, applying to trunk. commit 20a165532a9b0b0dada391716a1fb781af3ec005 Author: Jason Merrill ja...@redhat.com Date: Wed Jun 18 22:56:25 2014 +0200 PR c++/59296 * call.c (add_function_candidate): Set LOOKUP_NO_RVAL_BIND for ref-qualifier handling. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 1d4c4f9..b4adf36 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -2025,6 +2025,8 @@ add_function_candidate (struct z_candidate **candidates, object parameter has reference type. */ bool rv = FUNCTION_RVALUE_QUALIFIED (TREE_TYPE (fn)); parmtype = cp_build_reference_type (parmtype, rv); + /* Don't bind an rvalue to a const lvalue ref-qualifier. */ + lflags |= LOOKUP_NO_RVAL_BIND; } else { diff --git a/gcc/testsuite/g++.dg/cpp0x/ref-qual15.C b/gcc/testsuite/g++.dg/cpp0x/ref-qual15.C new file mode 100644 index 000..ca333c2 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/ref-qual15.C @@ -0,0 +1,13 @@ +// PR c++/59296 +// { dg-do compile { target c++11 } } + +struct Type +{ + void get() const & { } + void get() const && { } +}; + +int main() +{ + Type{}.get(); +}
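A sketch of the rule the patch enforces: once a ref-qualifier is present, an rvalue object must select the '&&'-qualified overload rather than binding to the 'const &' one. The return values here are mine, added so the selection is observable:

```cpp
#include <cassert>

struct Type {
  int get() const & { return 1; }   // chosen for lvalue objects
  int get() const && { return 2; }  // chosen for rvalue objects
};

int on_lvalue() { Type t; return t.get(); }   // binds the & overload
int on_rvalue() { return Type{}.get(); }      // binds the && overload
```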
[patch committed] [SH] Fix build failure in libgomp
Hi, Trunk fails to build on sh4-unknown-linux-gnu with an ICE during compiling libgomp. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61550 for details. sh.c:prepare_move_operands has code for TLS addresses which shouldn't be run if reload is in progress or done. The attached patch fixes it. Committed on trunk. Regards, kaz -- 2014-06-18 Kaz Kojima kkoj...@gcc.gnu.org PR target/61550 * config/sh/sh.c (prepare_move_operands): Don't process TLS addresses here if reload in progress or completed. --- ORIG/trunk/gcc/config/sh/sh.c 2014-06-17 21:21:32.043445314 +0900 +++ trunk/gcc/config/sh/sh.c 2014-06-18 08:26:27.846157153 +0900 @@ -1758,7 +1758,8 @@ prepare_move_operands (rtx operands[], e else opc = NULL_RTX; - if ((tls_kind = tls_symbolic_operand (op1, Pmode)) != TLS_MODEL_NONE) + if (! reload_in_progress && ! reload_completed + && (tls_kind = tls_symbolic_operand (op1, Pmode)) != TLS_MODEL_NONE) { rtx tga_op1, tga_ret, tmp, tmp2;
Re: [PATCH, aarch64] Fix 61545
On Tue, Jun 17, 2014 at 10:19:06PM -0700, Richard Henderson wrote: Trivial fix for missing clobber of the flags over the tlsdesc call. Ok for all branches? r~ * config/aarch64/aarch64.md (tlsdesc_small_PTR): Clobber CC_REGNUM. pretty sure we need a similar fix for tlsgd_small, since __tls_get_addr could clobber CC as well. regards, Kyle diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a4d8887..1ee2cae 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3855,6 +3855,7 @@ (unspec:PTR [(match_operand 0 aarch64_valid_symref S)] UNSPEC_TLSDESC)) (clobber (reg:DI LR_REGNUM)) + (clobber (reg:CC CC_REGNUM)) (clobber (match_scratch:DI 1 =r))] TARGET_TLS_DESC adrp\\tx0, %A0\;ldr\\t%w1, [x0, #%L0]\;add\\tw0, w0, %L0\;.tlsdesccall\\t%0\;blr\\t%1
Re: [PATCH, aarch64] Fix 61545
On 06/18/2014 03:57 PM, Kyle McMartin wrote: pretty sure we need a similar fix for tlsgd_small, since __tls_get_addr could clobber CC as well. As I replied in IRC, no, because tlsgd_small is modeled with an actual CALL_INSN, and thus call-clobbered registers work as normal. r~
Re: [PATCH, aarch64] Fix 61545
On Wed, Jun 18, 2014 at 04:04:53PM -0700, Richard Henderson wrote: On 06/18/2014 03:57 PM, Kyle McMartin wrote: pretty sure we need a similar fix for tlsgd_small, since __tls_get_addr could clobber CC as well. As I replied in IRC, no, because tlsgd_small is modeled with an actual CALL_INSN, and thus call-clobbered registers work as normal. Ah, sorry I missed your reply. Makes sense. regards, Kyle
Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64
On 06/01/2014 03:00 AM, Tom de Vries wrote: +/* Emit call insn with PAT and do aarch64-specific handling. */ + +bool +aarch64_emit_call_insn (rtx pat) +{ + rtx insn = emit_call_insn (pat); + + rtx *fusage = &CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM)); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM)); +} + Which can't have been bootstrapped, since this has no return stmt. Why the bool return type anyway? Nothing appears to use it. r~
Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64
On 06/01/2014 03:00 AM, Tom de Vries wrote: +aarch64_emit_call_insn (rtx pat) +{ + rtx insn = emit_call_insn (pat); + + rtx *fusage = &CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP0_REGNUM)); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP1_REGNUM)); Actually, I'd like to know more about how this is supposed to work. Why are you only marking the two registers that would be used by a PLT entry, but not those clobbered by the ld.so trampoline, or indeed the unknown function that would be called from the PLT. Oh, I see, looking at the code we do actually follow the cgraph and make sure it is a direct call with a known destination. So, in fact, it's only the registers that could be clobbered by ld branch islands (so these two are still correct for aarch64). This means the documentation is actually wrong when it mentions PLTs at all. Do we in fact make sure this isn't an ifunc resolver? I don't immediately see how those get wired up in the cgraph... r~
Re: [PATCH, ARM] Enable fuse-caller-save for ARM
On 06/01/2014 04:27 AM, Tom de Vries wrote: + if (TARGET_AAPCS_BASED) +{ + /* For AAPCS, IP and CC can be clobbered by veneers inserted by the + linker. We need to add these to allow + arm_call_fusage_contains_non_callee_clobbers to return true. */ + rtx *fusage = &CALL_INSN_FUNCTION_USAGE (insn); + clobber_reg (fusage, gen_rtx_REG (word_mode, IP_REGNUM)); + clobber_reg (fusage, gen_rtx_REG (word_mode, CC_REGNUM)); Why are you adding CC_REGNUM if fixed registers are automatically included? r~
Re: Update gcc.gnu.org/projects/gomp/
On Wed, 18 Jun 2014, Jakub Jelinek wrote: I've committed following change: Cool. <h2>Status</h2> <dl> +<dt><b>Jun 18, 2014</b></dt> +<dd><p>The last major part of Fortran OpenMP v4.0 support has been +committed into SVN mainline.</p></dd> + <dt><b>Oct 11, 2013</b></dt> <dd><p>The <code>gomp-4_0-branch</code> has been merged into SVN -mainline, so GCC 4.9 and later will feature OpenMP v4.0 support.</p></dd> +mainline, so GCC 4.9 and later will feature OpenMP v4.0 support for +C and C++.</p></dd> Isn't that worth a note on our homepage as well? Gerald
Re: -fuse-caller-save - Collect register usage information
On 05/19/2014 07:30 AM, Tom de Vries wrote: + for (insn = get_insns (); insn != NULL_RTX; insn = next_insn (insn)) +{ + HARD_REG_SET insn_used_regs; + + if (!NONDEBUG_INSN_P (insn)) + continue; + + find_all_hard_reg_sets (insn, &insn_used_regs, false); + + if (CALL_P (insn) + && !get_call_reg_set_usage (insn, &insn_used_regs, call_used_reg_set)) + { + CLEAR_HARD_REG_SET (node->function_used_regs); + return; + } + + IOR_HARD_REG_SET (node->function_used_regs, insn_used_regs); +} As an aside, wouldn't it work out better if we collect into a local variable instead of writing to memory here in node->function_used_regs each time? But not the main point... Let's suppose that we've got a rather large function, with only local calls for which we can acquire usage. Let's suppose that even one of those callees further calls something else, such that insn_used_regs == call_used_reg_set. We fill node->function_used_regs immediately, but keep scanning the rest of the large function. + + /* Be conservative - mark fixed and global registers as used. */ + IOR_HARD_REG_SET (node->function_used_regs, fixed_reg_set); + for (i = 0; i < FIRST_PSEUDO_REGISTER; i++) +if (global_regs[i]) + SET_HARD_REG_BIT (node->function_used_regs, i); + +#ifdef STACK_REGS + /* Handle STACK_REGS conservatively, since the df-framework does not + provide accurate information for them. */ + + for (i = FIRST_STACK_REG; i <= LAST_STACK_REG; i++) +SET_HARD_REG_BIT (node->function_used_regs, i); +#endif + + node->function_used_regs_valid = 1; Wouldn't it be better to compare the collected function_used_regs; if it contains all of call_used_reg_set, decline to set function_used_regs_valid. That way, we'll early exit from the above loop whenever we see that we can't improve over the default call-clobber set. Although perhaps function_used_regs_valid is no longer the best name in that case... r~
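Richard's two suggestions — accumulate into a local rather than writing node->function_used_regs each iteration, and stop early once the set can no longer beat the default call-clobber set — can be sketched in a simplified model (registers modeled as bits in a plain mask, not real HARD_REG_SETs; all names are mine):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical default call-clobber set: low 16 "registers".
const std::uint32_t call_used_reg_set = 0x0000ffffu;

// Accumulate per-insn register usage locally; bail out as soon as the
// accumulated set already covers the conservative call-clobber set,
// since further scanning cannot improve the answer.
std::uint32_t collect_used_regs(const std::vector<std::uint32_t> &insn_sets) {
  std::uint32_t used = 0;  // local accumulator, not memory traffic
  for (std::uint32_t s : insn_sets) {
    used |= s;
    if ((used & call_used_reg_set) == call_used_reg_set)
      return call_used_reg_set;  // early exit: no improvement possible
  }
  return used;
}
```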
RE: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Thursday, June 19, 2014 1:54 AM Seems there are actually two spots with this, not just one. Completely untested fix: 2014-06-18 Jakub Jelinek ja...@redhat.com * tree-ssa-math-opts.c (do_shift_rotate, find_bswap_or_nop_1): Cast 0xff to uint64_t before shifting it up. --- gcc/tree-ssa-math-opts.c 2014-06-13 08:08:42.354136356 +0200 +++ gcc/tree-ssa-math-opts.c 2014-06-18 19:50:59.486916201 +0200 @@ -1669,7 +1669,8 @@ do_shift_rotate (enum tree_code code, break; case RSHIFT_EXPR: /* Arithmetic shift of signed type: result is dependent on the value. */ - if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8)))) + if (!TYPE_UNSIGNED (n->type) + && (n->n & ((uint64_t) 0xff << (bitsize - 8)))) return false; n->n >>= count; break; @@ -1903,7 +1904,7 @@ find_bswap_or_nop_1 (gimple stmt, struct old_type_size = TYPE_PRECISION (n->type); if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size - && n->n & (0xff << (old_type_size - 8))) + && n->n & ((uint64_t) 0xff << (old_type_size - 8))) return NULL_TREE; if (type_size / BITS_PER_UNIT < (int)(sizeof (int64_t))) Yep, that's the right fix. I tested it on both a bootstrapped gcc on x86_64-linux-gnu and an arm-none-eabi cross-compiler with no regression on the testsuite. Jakub, since you made the patch, the honor of committing it should be yours. Richard, given this issue, I think we should wait a few more days before I commit a backported (and fixed of course) version to 4.8 and 4.9. Best regards, Thomas
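Why the cast matters: 0xff has type int, so shifting it by bitsize - 8 for a 64-bit value shifts past the width of int (undefined behavior, and at best a truncated mask). Promoting to uint64_t first keeps all eight mask bits. A minimal illustration (helper name is mine):

```cpp
#include <cassert>
#include <cstdint>

// Build a mask covering the top byte of a value that is bitsize bits wide.
// Without the cast, 0xff << 56 would shift an int past its width; with it,
// the shift happens in 64-bit arithmetic and the mask is correct.
std::uint64_t high_byte_mask(int bitsize) {
  return (std::uint64_t) 0xff << (bitsize - 8);
}
```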
[Patch, Fortran, committed] PR61126 – fix wextra_1.f regression
Committed as Rev. 211766. See PR comments 10, 23 and 24 for the patch and the background. Thanks to Manuel and Dominique for the patch! Tobias 2014-06-18 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/61126 * options.c (gfc_handle_option): Remove call to handle_generated_option. 2014-06-18 Dominique d'Humieres domi...@lps.ens.fr PR fortran/61126 * gfortran.dg/wextra_1.f: Add -Wall to dg-options. diff --git a/gcc/fortran/options.c b/gcc/fortran/options.c index a2b91ca..e4931f0 100644 --- a/gcc/fortran/options.c +++ b/gcc/fortran/options.c @@ -674,12 +674,7 @@ gfc_handle_option (size_t scode, const char *arg, int value, break; case OPT_Wextra: - handle_generated_option (&global_options, &global_options_set, - OPT_Wunused_parameter, NULL, value, - gfc_option_lang_mask (), kind, loc, - handlers, global_dc); set_Wextra (value); - break; case OPT_Wfunction_elimination: diff --git a/gcc/testsuite/gfortran.dg/wextra_1.f b/gcc/testsuite/gfortran.dg/wextra_1.f index 94c8edd..0eb28e1 100644 --- a/gcc/testsuite/gfortran.dg/wextra_1.f +++ b/gcc/testsuite/gfortran.dg/wextra_1.f @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-Wextra" } +! { dg-options "-Wall -Wextra" } program main integer, parameter :: x=3 ! { dg-warning "Unused parameter" } real :: a
Re: [Patch, Fortran, committed] PR61126 – fix wextra_1.f regression
Tobias Burnus wrote: Committed as Rev. 211766. See PR comments 10, 23 and 24 for the patch and the background. Thanks to Manuel and Dominque for the patch! And as follow up, I have committed the attached documentation patch. I think it is sufficient, even though it does not explicitly state that -Wall only works because -Wall implies -Wunused. Committed as Rev. 211767. Tobias Index: gcc/fortran/ChangeLog === --- gcc/fortran/ChangeLog (Revision 211766) +++ gcc/fortran/ChangeLog (Arbeitskopie) @@ -1,3 +1,9 @@ +2014-06-18 Tobias Burnus bur...@net-b.de + + PR fortran/61126 + * invoke.texi (-Wunused-parameter): Make clearer when + -Wextra implies this option. + 2014-06-18 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/61126 Index: gcc/fortran/invoke.texi === --- gcc/fortran/invoke.texi (Revision 211766) +++ gcc/fortran/invoke.texi (Arbeitskopie) @@ -911,7 +911,8 @@ Contrary to @command{gcc}'s meaning of @option{-Wu @command{gfortran}'s implementation of this option does not warn about unused dummy arguments (see @option{-Wunused-dummy-argument}), but about unused @code{PARAMETER} values. @option{-Wunused-parameter} -is not included in @option{-Wall} but is implied by @option{-Wall -Wextra}. +is implied by @option{-Wextra} if also @option{-Wunused} or +@option{-Wall} is used. @item -Walign-commons @opindex @code{Walign-commons}
Re: [PATCH, loop2_invariant, 2/2] Change heuristics for identical invariants
On 10 June 2014 19:16, Steven Bosscher stevenb@gmail.com wrote: On Tue, Jun 10, 2014 at 11:23 AM, Zhenqiang Chen wrote: * loop-invariant.c (struct invariant): Add a new member: eqno; (find_identical_invariants): Update eqno; (create_new_invariant): Init eqno; (get_inv_cost): Compute comp_cost with eqno; (gain_for_invariant): Take spill cost into account. Look OK except ... @@ -1243,7 +1256,13 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, + IRA_LOOP_RESERVED_REGS - ira_class_hard_regs_num[cl]; if (size_cost > 0) - return -1; + { + int spill_cost = target_spill_cost [speed] * (int) regs_needed[cl]; + if (comp_cost <= spill_cost) + return -1; + + return 2; + } else size_cost = 0; } ... why return 2, instead of just falling through to return comp_cost - size_cost;? Thanks for the comments. Updated. As per your comments on the previous patch, I should also check the overlap between reg classes. So I changed the logic to check spill cost. diff --git a/gcc/loop-invariant.c b/gcc/loop-invariant.c index 6e43b49..af0c95b 100644 --- a/gcc/loop-invariant.c +++ b/gcc/loop-invariant.c @@ -104,6 +104,9 @@ struct invariant /* The number of the invariant with the same value. */ unsigned eqto; + /* The number of invariants which eqto this. */ + unsigned eqno; + /* If we moved the invariant out of the loop, the register that contains its value. 
*/ rtx reg; @@ -498,6 +501,7 @@ find_identical_invariants (invariant_htab_type eq, struct invariant *inv) struct invariant *dep; rtx expr, set; enum machine_mode mode; + struct invariant *tmp; if (inv->eqto != ~0u) return; @@ -513,7 +517,12 @@ find_identical_invariants (invariant_htab_type eq, struct invariant *inv) mode = GET_MODE (expr); if (mode == VOIDmode) mode = GET_MODE (SET_DEST (set)); - inv->eqto = find_or_insert_inv (eq, expr, mode, inv)->invno; + + tmp = find_or_insert_inv (eq, expr, mode, inv); + inv->eqto = tmp->invno; + + if (tmp->invno != inv->invno && inv->always_executed) +tmp->eqno++; if (dump_file && inv->eqto != inv->invno) fprintf (dump_file, @@ -725,6 +734,10 @@ create_new_invariant (struct def *def, rtx insn, bitmap depends_on, inv->invno = invariants.length (); inv->eqto = ~0u; + + /* Itself. */ + inv->eqno = 1; + if (def) def->invno = inv->invno; invariants.safe_push (inv); @@ -1141,7 +1154,7 @@ get_inv_cost (struct invariant *inv, int *comp_cost, unsigned *regs_needed, if (!inv->cheap_address || inv->def->n_addr_uses < inv->def->n_uses) -(*comp_cost) += inv->cost; +(*comp_cost) += inv->cost * inv->eqno; #ifdef STACK_REGS { @@ -1249,7 +1262,7 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, unsigned *new_regs, unsigned regs_used, bool speed, bool call_p) { - int comp_cost, size_cost; + int comp_cost, size_cost = 0; enum reg_class cl; int ret; @@ -1273,6 +1286,8 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, { int i; enum reg_class pressure_class; + int spill_cost = 0; + int base_cost = target_spill_cost [speed]; for (i = 0; i < ira_pressure_classes_num; i++) { @@ -1286,30 +1301,13 @@ gain_for_invariant (struct invariant *inv, unsigned *regs_needed, + LOOP_DATA (curr_loop)->max_reg_pressure[pressure_class] + IRA_LOOP_RESERVED_REGS > ira_class_hard_regs_num[pressure_class]) - break; + { + spill_cost += base_cost * (int) regs_needed[pressure_class]; + size_cost = -1; + } } - if (i < ira_pressure_classes_num) - /* There will be register 
pressure excess and we want not to - make this loop invariant motion. All loop invariants with - non-positive gains will be rejected in function - find_invariants_to_move. Therefore we return the negative - number here. - - One could think that this rejects also expensive loop - invariant motions and this will hurt code performance. - However numerous experiments with different heuristics - taking invariant cost into account did not confirm this - assumption. There are possible explanations for this - result: - o probably all expensive invariants were already moved out - of the loop by PRE and gimple invariant motion pass. - o expensive invariant execution will be hidden by insn - scheduling or OOO processor hardware because usually such - invariants have a lot of freedom to be executed - out-of-order. - Another reason for ignoring invariant cost vs spilling cost - heuristics is also in difficulties to evaluate accurately -
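The heuristic change can be summarized in a toy model: the computation cost of an invariant is scaled by how many identical invariants (eqno) it stands for, and when moving it would cause register-pressure excess the cost is weighed against an estimated spill cost instead of rejecting the motion outright. All numbers and names below are hypothetical units, not real GCC costs:

```cpp
#include <cassert>

// Toy model of the revised gain computation for loop-invariant motion.
int gain_for_invariant(int inv_cost, int eqno, int regs_needed,
                       int spill_cost_per_reg, bool pressure_excess) {
  int comp_cost = inv_cost * eqno;  // one move serves eqno identical uses
  if (pressure_excess) {
    int spill_cost = spill_cost_per_reg * regs_needed;
    if (comp_cost <= spill_cost)
      return -1;  // moving it would cost more in spills than it saves
    return comp_cost - spill_cost;
  }
  return comp_cost;  // no pressure excess: size_cost treated as 0 here
}
```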
Re: [patch] improve sloc assignment on bind_expr entry/exit code
Hi Jeff, On Jun 17, 2014, at 22:42 , Jeff Law l...@redhat.com wrote: * tree-core.h (tree_block): Add an end_locus field, allowing memorization of the end of block source location. * tree.h (BLOCK_SOURCE_END_LOCATION): New accessor. * gimplify.c (gimplify_bind_expr): Propagate the block start and end source location info we have on the block entry/exit code we generate. OK. Great, thanks! :-) I assume y'all will add a suitable test to the Ada testsuite and propagate it into the GCC testsuite in due course? Yes, I will. At the patch submission time, I was unclear on what dejagnu device was available to setup a reliable testing protocol for this kind of issue and I was interested in getting feedback on the patch contents first. ISTM that dg-scan-asm for the expected extra .loc's would work, maybe restricted to some target we know produces .loc directives. Sounds appropriate ? Thanks again for your feedback, Olivier
[PATCH] Make sure cfg-cleanup runs
This makes sure we run cfg-cleanup when we propagate into PHI nodes or on the FRE/PRE side remove any stmt. Otherwise we can end up with not removed forwarder blocks. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-propagate.c (replace_phi_args_in): Return whether we propagated anything. (substitute_and_fold_dom_walker::before_dom_children): Something changed if we propagated into PHI arguments. * tree-ssa-pre.c (eliminate): Always schedule cfg-cleanup if we removed a stmt. Index: gcc/tree-ssa-propagate.c === --- gcc/tree-ssa-propagate.c(revision 211738) +++ gcc/tree-ssa-propagate.c(working copy) @@ -964,7 +964,7 @@ replace_uses_in (gimple stmt, ssa_prop_g /* Replace propagated values into all the arguments for PHI using the values from PROP_VALUE. */ -static void +static bool replace_phi_args_in (gimple phi, ssa_prop_get_value_fn get_value) { size_t i; @@ -1015,6 +1015,8 @@ replace_phi_args_in (gimple phi, ssa_pro fprintf (dump_file, \n); } } + + return replaced; } @@ -1066,7 +1068,7 @@ substitute_and_fold_dom_walker::before_d continue; } } - replace_phi_args_in (phi, get_value_fn); + something_changed |= replace_phi_args_in (phi, get_value_fn); } /* Propagate known values into stmts. In some case it exposes Index: gcc/tree-ssa-pre.c === --- gcc/tree-ssa-pre.c (revision 211738) +++ gcc/tree-ssa-pre.c (working copy) @@ -4521,11 +4521,7 @@ eliminate (bool do_pre) gsi = gsi_for_stmt (stmt); if (gimple_code (stmt) == GIMPLE_PHI) - { - remove_phi_node (gsi, true); - /* Removing a PHI node in a block may expose a forwarder block. */ - el_todo |= TODO_cleanup_cfg; - } + remove_phi_node (gsi, true); else { basic_block bb = gimple_bb (stmt); @@ -4534,6 +4530,9 @@ eliminate (bool do_pre) bitmap_set_bit (need_eh_cleanup, bb-index); release_defs (stmt); } + + /* Removing a stmt may expose a forwarder block. */ + el_todo |= TODO_cleanup_cfg; } el_to_remove.release ();
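The shape of the fix — the replacement routine now reports whether it changed anything so the caller can record something_changed and schedule cfg-cleanup — can be modeled in miniature (a toy stand-in for replace_phi_args_in, with invented names):

```cpp
#include <cassert>
#include <map>
#include <vector>

// Substitute known lattice values into "PHI arguments" and report whether
// anything was actually replaced, mirroring the void -> bool change.
bool replace_args(std::vector<int> &args, const std::map<int, int> &known) {
  bool replaced = false;
  for (int &a : args) {
    auto it = known.find(a);
    if (it != known.end() && it->second != a) {
      a = it->second;
      replaced = true;  // caller ORs this into something_changed
    }
  }
  return replaced;
}

// Hypothetical drivers for checking the model.
int demo_replaced_value() {
  std::vector<int> args{1, 2, 3};
  bool changed = replace_args(args, {{2, 5}});
  return changed ? args[1] : -1;
}

int demo_no_change() {
  std::vector<int> args{1, 2, 3};
  return replace_args(args, {}) ? 1 : 0;
}
```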
Re: [PATCH, aarch64] Fix 61545
On 18/06/14 06:19, Richard Henderson wrote: Trivial fix for missing clobber of the flags over the tlsdesc call. Ok for all branches? OK. R. r~ * config/aarch64/aarch64.md (tlsdesc_small_PTR): Clobber CC_REGNUM. z diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a4d8887..1ee2cae 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3855,6 +3855,7 @@ (unspec:PTR [(match_operand 0 aarch64_valid_symref S)] UNSPEC_TLSDESC)) (clobber (reg:DI LR_REGNUM)) + (clobber (reg:CC CC_REGNUM)) (clobber (match_scratch:DI 1 =r))] TARGET_TLS_DESC adrp\\tx0, %A0\;ldr\\t%w1, [x0, #%L0]\;add\\tw0, w0, %L0\;.tlsdesccall\\t%0\;blr\\t%1
Re: [PATCH 1/5] New Identical Code Folding IPA pass
On 06/17/2014 10:14 PM, David Malcolm wrote: On Fri, 2014-06-13 at 12:24 +0200, mliska wrote: [...snip...] Statistics about the pass: Inkscape: 11.95 MB - 11.44 MB (-4.27%) Firefox: 70.12 MB - 70.12 MB (-3.07%) FWIW, you wrote 70.12 MB here for both before and after for Firefox, but give a -3.07% change, which seems like a typo. A 3.07% reduction from 70.12 MB would be 67.97 MB; was this what the pass achieved? Hi, it's typo, original size of FF is 72.34 MB. I hope -3.07% is the correctly evaluated achievement. Thanks, Martin [...snip...] Thanks (nice patch, btw) Dave
Re: [patch] improve sloc assignment on bind_expr entry/exit code
On Jun 18, 2014, at 09:42 , Olivier Hainque hain...@adacore.com wrote: I assume y'all will add a suitable test to the Ada testsuite and propagate it into the GCC testsuite in due course? ISTM that dg-scan-asm for the expected extra .loc's would work, maybe restricted to some target we know produces .loc directives. Sounds appropriate ? Ah, we already have one test doing exactly that (return3.adb). I'll just add one. With Kind Regards, Olivier
[PATCH] PR61123 : Fix the ABI mis-matching error caused by LTO
Hi, With LTO, -fno-short-enums is ignored, resulting in ABI mis-matching at link time. Refer to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61123 for details. This patch adds fshort-enums and fshort-wchar to the LTO option group. To check it, a new procedure object-readelf is added in testsuite/lib/lto.exp and new lto tests are added in gcc.target/arm/lto. Bootstrap and no make check regression on X86-64. Patch also attached for convenience. Is it ok for trunk? Thanks and Best Regards, Hale Wang c-family/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com PR lto/61123 * c.opt (fshort-enums): Add to LTO. * c.opt (fshort-wchar): Likewise. testsuite/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com * gcc.target/arm/lto/: New folder to verify the LTO option for ARM specific. * gcc.target/arm/lto/pr61123-enum-size_0.c: New test case. * gcc.target/arm/lto/pr61123-enum-size_1.c: Likewise. * gcc.target/arm/lto/lto.exp: New exp file used to test LTO option for ARM specific. * lib/lto.exp (object-readelf): New procedure used to catch the enum size in the final executable. 
Index: gcc/c-family/c.opt === --- gcc/c-family/c.opt (revision 211394) +++ gcc/c-family/c.opt (working copy) @@ -1189,11 +1189,11 @@ Use the same size for double as for float fshort-enums -C ObjC C++ ObjC++ Optimization Var(flag_short_enums) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_enums) Use the narrowest integer type possible for enumeration types fshort-wchar -C ObjC C++ ObjC++ Optimization Var(flag_short_wchar) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_wchar) Force the underlying type for "wchar_t" to be "unsigned short" fsigned-bitfields Index: gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c === --- gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c (revision 0) @@ -0,0 +1,22 @@ +/* { dg-lto-do link } */ +/* { dg-lto-options { { -fno-short-enums -Wl,-Ur,--no-enum-size-warning -Os -nostdlib -flto } } } */ + +#include <stdlib.h> + +enum enum_size_attribute +{ + small_size, int_size +}; + +struct debug_ABI_enum_size +{ + enum enum_size_attribute es; +}; + +int +foo1 (struct debug_ABI_enum_size *x) +{ + return sizeof (x->es); +} + +/* { dg-final { object-readelf Tag_ABI_enum_size int { target arm_eabi } } } */ Index: gcc/testsuite/gcc.target/arm/lto/lto.exp === --- gcc/testsuite/gcc.target/arm/lto/lto.exp (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/lto.exp (revision 0) @@ -0,0 +1,59 @@ +# Copyright (C) 2009-2014 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. +# +# Contributed by Diego Novillo dnovi...@google.com + + +# Test link-time optimization across multiple files. +# +# Programs are broken into multiple files. Each one is compiled +# separately with LTO information. The final executable is generated +# by collecting all the generated object files using regular LTO or WHOPR. + +if $tracelevel then { + strace $tracelevel +} + +# Load procedures from common libraries. +load_lib standard.exp +load_lib gcc.exp + +# Load the language-independent compabibility support procedures. +load_lib lto.exp + +# If LTO has not been enabled, bail. +if { ![check_effective_target_lto] } { + return +} + +gcc_init +lto_init no-mathlib + +# Define an identifier for use with this suite to avoid name conflicts +# with other lto tests running at the same time. +set sid c_lto + +# Main loop. +foreach src [lsort [find $srcdir/$subdir *_0.c]] { + # If we're only testing specific files and this isn't one of them, skip it. + if ![runtest_file_p $runtests $src] then { + continue + } + + lto-execute $src $sid +} + +lto_finish Index: gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_1.c === --- gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_1.c (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_1.c (revision 0) @@ -0,0 +1,5 @@ +int +foo2 (int
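The ABI property that Tag_ABI_enum_size tracks is simply how wide a plain enumeration is: with -fno-short-enums (the default on most hosts) it occupies a full int, while -fshort-enums shrinks it to the narrowest integer type that fits, and mixing the two across objects is exactly the mismatch the linker warns about. A small illustration, which assumes the default (no -fshort-enums) compilation:

```cpp
#include <cassert>

// Mirrors the enum from the testcase above.
enum enum_size_attribute { small_size, int_size };

// True under the default -fno-short-enums ABI, where a plain enum is
// as wide as int; -fshort-enums would make this false on many targets.
bool enum_is_int_sized() {
  return sizeof(enum_size_attribute) == sizeof(int);
}
```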
[PATCH, PR 61540] Do not ICE on impossible devirtualization
Hi, I was quite surprised we still had an assert checking that the target of a virtual call derived by ipa-cp is among possible ones derived by ipa-devirt. This is not true for various down-casts and I managed to trigger it in PR 61540 (where the testcase purposefully invokes undefined behavior but we should not ICE). Fixed thusly. Bootstrapped and tested on x86_64-linux. OK for trunk and the 4.9 branch? Thanks, Martin 2014-06-17 Martin Jambor mjam...@suse.cz PR ipa/61540 * ipa-prop.c (impossible_devirt_target): New function. (try_make_edge_direct_virtual_call): Use it, also instead of asserting. testsuite/ * g++.dg/ipa/pr61540.C: New test. diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index b67deed..f5ec67a 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -2912,6 +2912,29 @@ try_make_edge_direct_simple_call (struct cgraph_edge *ie, return cs; } +/* Return the target to be used in cases of impossible devirtualization. IE + and target (the latter can be NULL) are dumped when dumping is enabled. */ + +static tree +impossible_devirt_target (struct cgraph_edge *ie, tree target) +{ + if (dump_file) +{ + if (target) + fprintf (dump_file, +"Type inconsident devirtualization: %s/%i->%s\n", +ie->caller->name (), ie->caller->order, +IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (target))); + else + fprintf (dump_file, +"No devirtualization target in %s/%i\n", +ie->caller->name (), ie->caller->order); +} + tree new_target = builtin_decl_implicit (BUILT_IN_UNREACHABLE); + cgraph_get_create_node (new_target); + return new_target; +} + /* Try to find a destination for indirect edge IE that corresponds to a virtual call based on a formal parameter which is described by jump function JFUNC and if it can be determined, make it direct and return the direct edge. 
@@ -2946,15 +2969,7 @@ try_make_edge_direct_virtual_call (struct cgraph_edge *ie, DECL_FUNCTION_CODE (target) == BUILT_IN_UNREACHABLE) || !possible_polymorphic_call_target_p (ie, cgraph_get_node (target))) - { - if (dump_file) - fprintf (dump_file, -"Type inconsident devirtualization: %s/%i->%s\n", -ie->caller->name (), ie->caller->order, -IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (target))); - target = builtin_decl_implicit (BUILT_IN_UNREACHABLE); - cgraph_get_create_node (target); - } + target = impossible_devirt_target (ie, target); return ipa_make_edge_direct_to_target (ie, target); } } @@ -2984,10 +2999,7 @@ try_make_edge_direct_virtual_call (struct cgraph_edge *ie, if (targets.length () == 1) target = targets[0]->decl; else - { - target = builtin_decl_implicit (BUILT_IN_UNREACHABLE); - cgraph_get_create_node (target); - } + target = impossible_devirt_target (ie, NULL_TREE); } else { @@ -3002,10 +3014,8 @@ try_make_edge_direct_virtual_call (struct cgraph_edge *ie, if (target) { -#ifdef ENABLE_CHECKING - gcc_assert (possible_polymorphic_call_target_p -(ie, cgraph_get_node (target))); -#endif + if (!possible_polymorphic_call_target_p (ie, cgraph_get_node (target))) + target = impossible_devirt_target (ie, target); return ipa_make_edge_direct_to_target (ie, target); } else diff --git a/gcc/testsuite/g++.dg/ipa/pr61540.C b/gcc/testsuite/g++.dg/ipa/pr61540.C new file mode 100644 index 000..d298964 --- /dev/null +++ b/gcc/testsuite/g++.dg/ipa/pr61540.C @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -fno-early-inlining -fdump-ipa-cp" } */ + +struct data { + data(int) {} +}; + +struct top { + virtual int topf() {} +}; + +struct intermediate: top { +int topf() /* override */ { return 0; } +}; + +struct child1: top { +void childf() +{ +data d(topf()); +} +}; + +struct child2: intermediate {}; + +void test(top &t) +{ +child1 &c = static_cast<child1 &>(t); +c.childf(); +child2 d; +test(d); +} + +int main (int argc, char **argv) +{ + child1 c; + test (c); + return 0; +} 
+ +/* { dg-final { scan-ipa-dump "Type inconsident devirtualization" "cp" } } */ +/* { dg-final { cleanup-ipa-dump "cp" } } */
[PATCH] pass cleanups
This removes the special dce_loop pass in favor of dealing with scev and niter estimates in dce generally. Likewise it makes copyprop always cleanup after itself, dealing with scev and niter estimates. It also makes copyprop not unconditionally schedule a cfg-cleanup but only do so if copyprop did any transform. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-pass.h (make_pass_dce_loop): Remove. * passes.def: Replace pass_dce_loop with pass_dce. * tree-ssa-dce.c (perform_tree_ssa_dce): If something changed free niter estimates and reset the scev cache. (tree_ssa_dce_loop, pass_data_dce_loop, pass_dce_loop, make_pass_dce_loop): Remove. * tree-ssa-copy.c: Include tree-ssa-loop-niter.h. (fini_copy_prop): Return whether something changed. Always let substitute_and_fold perform DCE and free niter estimates and reset the scev cache if so. (execute_copy_prop): If sth changed schedule cleanup-cfg. (pass_data_copy_prop): Do not unconditionally schedule cleanup-cfg or update-ssa. Index: gcc/tree-pass.h === *** gcc/tree-pass.h (revision 211738) --- gcc/tree-pass.h (working copy) *** extern gimple_opt_pass *make_pass_build_ *** 382,388 extern gimple_opt_pass *make_pass_build_ealias (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dominator (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt); - extern gimple_opt_pass *make_pass_dce_loop (gcc::context *ctxt); extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt); --- 382,387 Index: gcc/passes.def === *** gcc/passes.def (revision 211738) --- gcc/passes.def (working copy) *** along with GCC; see the file COPYING3. *** 203,209 NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! 
NEXT_PASS (pass_dce_loop); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); --- 206,212 NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); *** along with GCC; see the file COPYING3. *** 215,221 NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); --- 218,224 NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); *** along with GCC; see the file COPYING3. *** 224,230 Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) ! NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); --- 227,233 Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) ! NEXT_PASS (pass_dce); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); Index: gcc/tree-ssa-dce.c === *** gcc/tree-ssa-dce.c (revision 211738) --- gcc/tree-ssa-dce.c (working copy) *** perform_tree_ssa_dce (bool aggressive) *** 1479,1485 tree_dce_done (aggressive); if (something_changed) ! return TODO_update_ssa | TODO_cleanup_cfg; return 0; } --- 1479,1490 tree_dce_done (aggressive); if (something_changed) ! { ! free_numbers_of_iterations_estimates (); ! if (scev_initialized_p ()) ! scev_reset (); ! return TODO_update_ssa | TODO_cleanup_cfg; ! 
} return 0; } *** tree_ssa_dce (void) *** 1491,1509 } static unsigned int - tree_ssa_dce_loop (void) - { - unsigned int todo; - todo = perform_tree_ssa_dce (/*aggressive=*/false); - if (todo) - { - free_numbers_of_iterations_estimates (); -
Re: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg
On 10/06/14 12:42, Terry Guo wrote: Hi There, The thumb1_reorg function use macro INSN_CODE to find expected instructions. But the macro INSN_CODE doesn’t work for label type instruction. The INSN_CODE(label_insn) will return the label number. When we have a lot of labels and current label_insn is the first insn of basic block, the INSN_CODE(label_insn) could accidentally equal to CODE_FOR_cbranchsi4_insn in this case. This leads to ICE due to SET_SRC(label_insn) in subsequent code. In general we should skip all such improper insns. This is the purpose of attached small patch. Some failures in recent gcc regression test on thumb1 target are caused by this reason. So with this patch, all of them passed and no new failures. Is it ok to trunk? BR, Terry 2014-06-10 Terry Guo terry@arm.com * config/arm/arm.c (thumb1_reorg): Move to next basic block if the head of current basic block isn’t a proper insn. I think you should just test that insn != BB_HEAD (bb). The loop immediately above this deals with the !NON-DEBUG insns, so the logic is confusing the way you've written it. R. thumb1-reorg-v2.txt diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index ccad548..3ebe424 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16939,7 +16939,8 @@ thumb1_reorg (void) insn = PREV_INSN (insn); /* Find the last cbranchsi4_insn in basic block BB. */ - if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) + if (!NONDEBUG_INSN_P (insn) + || INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) continue; /* Get the register with which we are comparing. */
Re: [PATCH][RFC] Add phiopt in early opts (and add -fssa-phiopt option)
On Tue, 17 Jun 2014, Jeff Law wrote: On 06/17/14 07:07, Richard Biener wrote: I felt that -ftree-XXX is bad naming so I went for -fssa-XXX even if that is now inconsistent. Any opinion here? For RTL we simply have unsuffixed names so shall we instead go for -fphiopt? PHI implies SSA anyway and 'SSA' or 'RTL' is an implementation detail that the user should not be interested in (applies to tree- as well, of course). Now, 'phiopt' is a bad name when thinking of users (but they shouldn't play with those options anyway). Our flags are a mess. If I put my user hat on, then I'd have to ask the question, why would I care about tree, ssa, or even phis. The pass converts branchy code into straightline code. So, arguably, the right name would reflect that it changes branchy code to straight line code. Yeah, but we have so many of those ... well, ideally the user wouldn't be able to disable random passes with a non-debug option (and we have -fdisable-tree-XXX to disable individual pass instances). But I believe most of our flag names are poor in this regard (and I'm as much to blame as anyone). So go with your best judgement IMHO. Indeed. It'd be nice to have some testcases here to show why we want this moved earlier so that a few years from now when someone else wants to move it back, we can say umm, see test frobit.c, make that work and you can move it back :-) Hmm, yeah. But it's really doing this earlier so it would probably invoke inliner heuristics and -Os ... I'll try to come up with sth. For now I have committed the new flag-related changes. Richard.
Re: [PATCH 1/5] New Identical Code Folding IPA pass
On 06/17/2014 10:09 PM, Paolo Carlini wrote: Hi, On 13/06/14 12:24, mliska wrote: The optimization is inspired by the Microsoft /OPT:ICF optimization (http://msdn.microsoft.com/en-us/library/bxwfs976.aspx) that merges COMDAT sections, with each function residing in a separate section. In terms of C++ testcases, I'm wondering if you already double checked that the new pass already does well on the typical examples on which, I was told, the Microsoft optimization is known to do well, eg, code instantiating std::vector for different pointer types, or even long and long long on x86_64-linux, things like that. I've just added another C++ test case: #include <vector> using namespace std; static vector<vector<int> *> a; static vector<void *> b; int main() { return b.size() + a.size (); } where the pass identifies following equality: Semantic equality hit:std::vector<_Tp, _Alloc>::size_type std::vector<_Tp, _Alloc>::size() const [with _Tp = std::vector<int>*; _Alloc = std::allocator<std::vector<int>*>; std::vector<_Tp, _Alloc>::size_type = long unsigned int]->std::vector<_Tp, _Alloc>::size_type std::vector<_Tp, _Alloc>::size() const [with _Tp = void*; _Alloc = std::allocator<void*>; std::vector<_Tp, _Alloc>::size_type = long unsigned int] Semantic equality hit:static void std::_Destroy_aux<true>::__destroy(_ForwardIterator, _ForwardIterator) [with _ForwardIterator = void**]->static void std::_Destroy_aux<true>::__destroy(_ForwardIterator, _ForwardIterator) [with _ForwardIterator = std::vector<int>**] Semantic equality hit:void std::_Destroy(_ForwardIterator, _ForwardIterator) [with _ForwardIterator = void**]->void std::_Destroy(_ForwardIterator, _ForwardIterator) [with _ForwardIterator = std::vector<int>**] Semantic equality hit:void std::_Destroy(_ForwardIterator, _ForwardIterator, std::allocator<_T2>&) [with _ForwardIterator = void**; _Tp = void*]->void std::_Destroy(_ForwardIterator, _ForwardIterator, std::allocator<_T2>&) [with _ForwardIterator = std::vector<int>**; _Tp = std::vector<int>*] Semantic equality hit:void 
__gnu_cxx::new_allocator<_Tp>::deallocate(__gnu_cxx::new_allocator<_Tp>::pointer, __gnu_cxx::new_allocator<_Tp>::size_type) [with _Tp = void*; __gnu_cxx::new_allocator<_Tp>::pointer = void**; __gnu_cxx::new_allocator<_Tp>::size_type = long unsigned int]->void __gnu_cxx::new_allocator<_Tp>::deallocate(__gnu_cxx::new_allocator<_Tp>::pointer, __gnu_cxx::new_allocator<_Tp>::size_type) [with _Tp = std::vector<int>*; __gnu_cxx::new_allocator<_Tp>::pointer = std::vector<int>**; __gnu_cxx::new_allocator<_Tp>::size_type = long unsigned int] Semantic equality hit:static void __gnu_cxx::__alloc_traits<_Alloc>::deallocate(_Alloc&, __gnu_cxx::__alloc_traits<_Alloc>::pointer, __gnu_cxx::__alloc_traits<_Alloc>::size_type) [with _Alloc = std::allocator<void*>; __gnu_cxx::__alloc_traits<_Alloc>::pointer = void**; __gnu_cxx::__alloc_traits<_Alloc>::size_type = long unsigned int]->static void __gnu_cxx::__alloc_traits<_Alloc>::deallocate(_Alloc&, __gnu_cxx::__alloc_traits<_Alloc>::pointer, __gnu_cxx::__alloc_traits<_Alloc>::size_type) [with _Alloc = std::allocator<std::vector<int>*>; __gnu_cxx::__alloc_traits<_Alloc>::pointer = std::vector<int>**; __gnu_cxx::__alloc_traits<_Alloc>::size_type = long unsigned int] As one would expect, there is a function 'size'. Martin Thanks, Paolo.
Re: [PATCH 1/5] New Identical Code Folding IPA pass
Hi, On 18/06/14 10:46, Martin Liška wrote: As one would expect, there is a function 'size'. Cool, thanks! Paolo.
Re: [PATCH 4/5] Existing tests fix
On 06/17/2014 10:50 PM, Rainer Orth wrote: Jeff Law l...@redhat.com writes: On 06/13/14 04:48, mliska wrote: Hi, many tests rely on a precise number of scanned functions in a dump file. If IPA ICF decides to merge some function and(or) read-only variables, counts do not match. Martin Changelog: 2014-06-13 Martin Liska mli...@suse.cz Honza Hubicka hubi...@ucw.cz * c-c++-common/rotate-1.c: Text ^ Huh? You are right, batch replacement mistake. There should be: * c-c++-common/rotate-1.c: Update dg-options. * c-c++-common/rotate-2.c: Likewise. ... Martin * c-c++-common/rotate-2.c: New test. * c-c++-common/rotate-3.c: Likewise. Rainer
Re: [patch] fix tests for AVX512
On Mon, Jun 09, 2014 at 01:43:48PM +0200, Uros Bizjak wrote: On Mon, Jun 9, 2014 at 1:34 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hello Uroš, On 08 Jun 11:26, Uros Bizjak wrote: On Tue, May 27, 2014 at 12:28 PM, Petr Murzin petrmurz...@gmail.com wrote: Hi, I've fixed tests for AVX512, so they could be compiled with -Werror -Wall. Please have a look. From a quick look, this looks OK. Thanks, checked into trunk. Could we apply that to 4.9 branch? OK, but please wait a couple of days to check if everything is OK in mainline and also for Release Manager to reject the patch. LGTM. Jakub
RE: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg
-Original Message- From: Richard Earnshaw Sent: Wednesday, June 18, 2014 4:31 PM To: Terry Guo Cc: gcc-patches@gcc.gnu.org; Ramana Radhakrishnan Subject: Re: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg On 10/06/14 12:42, Terry Guo wrote: Hi There, The thumb1_reorg function use macro INSN_CODE to find expected instructions. But the macro INSN_CODE doesn’t work for label type instruction. The INSN_CODE(label_insn) will return the label number. When we have a lot of labels and current label_insn is the first insn of basic block, the INSN_CODE(label_insn) could accidentally equal to CODE_FOR_cbranchsi4_insn in this case. This leads to ICE due to SET_SRC(label_insn) in subsequent code. In general we should skip all such improper insns. This is the purpose of attached small patch. Some failures in recent gcc regression test on thumb1 target are caused by this reason. So with this patch, all of them passed and no new failures. Is it ok to trunk? BR, Terry 2014-06-10 Terry Guo terry@arm.com * config/arm/arm.c (thumb1_reorg): Move to next basic block if the head of current basic block isn’t a proper insn. I think you should just test that insn != BB_HEAD (bb). The loop immediately above this deals with the !NON-DEBUG insns, so the logic is confusing the way you've written it. R. Thanks for comments. The patch is updated and tested. No more ICE. Is this one OK? BR, Terry diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 85d2114..463707e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16946,7 +16946,8 @@ thumb1_reorg (void) insn = PREV_INSN (insn); /* Find the last cbranchsi4_insn in basic block BB. */ - if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) + if (insn == BB_HEAD (bb) + || INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) continue; /* Get the register with which we are comparing. */
Re: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg
On 18/06/14 10:16, Terry Guo wrote: -Original Message- From: Richard Earnshaw Sent: Wednesday, June 18, 2014 4:31 PM To: Terry Guo Cc: gcc-patches@gcc.gnu.org; Ramana Radhakrishnan Subject: Re: [Patch, GCC/Thumb-1]Mishandle the label type insn in function thumb1_reorg On 10/06/14 12:42, Terry Guo wrote: Hi There, The thumb1_reorg function use macro INSN_CODE to find expected instructions. But the macro INSN_CODE doesn’t work for label type instruction. The INSN_CODE(label_insn) will return the label number. When we have a lot of labels and current label_insn is the first insn of basic block, the INSN_CODE(label_insn) could accidentally equal to CODE_FOR_cbranchsi4_insn in this case. This leads to ICE due to SET_SRC(label_insn) in subsequent code. In general we should skip all such improper insns. This is the purpose of attached small patch. Some failures in recent gcc regression test on thumb1 target are caused by this reason. So with this patch, all of them passed and no new failures. Is it ok to trunk? BR, Terry 2014-06-10 Terry Guo terry@arm.com * config/arm/arm.c (thumb1_reorg): Move to next basic block if the head of current basic block isn’t a proper insn. I think you should just test that insn != BB_HEAD (bb). The loop immediately above this deals with the !NON-DEBUG insns, so the logic is confusing the way you've written it. R. Thanks for comments. The patch is updated and tested. No more ICE. Is this one OK? BR, Terry Yes, this is fine. R. thumb1-reorg-v3.txt diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 85d2114..463707e 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -16946,7 +16946,8 @@ thumb1_reorg (void) insn = PREV_INSN (insn); /* Find the last cbranchsi4_insn in basic block BB. */ - if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) + if (insn == BB_HEAD (bb) + || INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn) continue; /* Get the register with which we are comparing. */
[Patch libstdc++] PR61536 Export out of line comparison operations.
PR61536 is a case where linking fails on arm-linux-gnueabi* and arm-eabi* systems as the C++ ABI for ARM specifies out of line comparison operators for typeinfo. Rev r211355 tightened the symbols exported by libstdc++ a bit too much which caused some carnage in the test results for arm-linux-gnueabihf. Paolo proposed this on the bugzilla and asked if I could commit it. I've tweaked the comment slightly. Tested on arm-none-linux-gnueabihf and verified the link time failures now disappear. Applied to trunk. Ramana 2014-06-18 Paolo Carlini paolo.carl...@oracle.com Ramana Radhakrishnan ramana.radhakrish...@arm.com PR libstdc++/61536 * config/abi/pre/gnu.ver: Adjust for out of line comparisons. diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver index e7de756..63c9130 100644 --- a/libstdc++-v3/config/abi/pre/gnu.ver +++ b/libstdc++-v3/config/abi/pre/gnu.ver @@ -16,6 +16,18 @@ ## You should have received a copy of the GNU General Public License along ## with this library; see the file COPYING3. If not see ## http://www.gnu.org/licenses/. +// By default follow the old inline rules to avoid ABI changes. +// Logic similar to libsupc++/typeinfo (libstdc++/61536). See +// commentary on out of line comparisons. + +#ifndef __GXX_TYPEINFO_EQUALITY_INLINE + #if !__GXX_WEAK__ + #define __GXX_TYPEINFO_EQUALITY_INLINE 0 + #else +#define __GXX_TYPEINFO_EQUALITY_INLINE 1 + #endif +#endif + GLIBCXX_3.4 { @@ -760,6 +772,11 @@ GLIBCXX_3.4 { _ZNKSt9type_info1*; _ZNSt9type_infoD*; +#if !__GXX_TYPEINFO_EQUALITY_INLINE +_ZNKSt9type_info6before*; +_ZNKSt9type_infoeq*; +#endif + # std::exception _ZNKSt9exception4whatEv; _ZNSt9exceptionD*;
Update gcc.gnu.org/projects/gomp/
Hi! I've committed following change: --- projects/gomp/index.html 25 Oct 2013 07:16:35 - 1.13 +++ projects/gomp/index.html 18 Jun 2014 09:48:34 - @@ -63,9 +63,19 @@ available.</p> <h2>Status</h2> <dl> +<dt><b>Jun 18, 2014</b></dt> +<dd><p>The last major part of Fortran OpenMP v4.0 support has been +committed into SVN mainline.</p></dd> + <dt><b>Oct 11, 2013</b></dt> <dd><p>The <code>gomp-4_0-branch</code> has been merged into SVN -mainline, so GCC 4.9 and later will feature OpenMP v4.0 support.</p></dd> +mainline, so GCC 4.9 and later will feature OpenMP v4.0 support for +C and C++.</p></dd> + +<dt><b>July 23, 2013</b></dt> +<dd><p>The final <a +href="http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf">OpenMP v4.0</a> +specification has been released.</p></dd> <dt><b>Aug 2, 2011</b></dt> <dd><p>The <code>gomp-3_1-branch</code> has been merged into SVN Jakub
[PATCH] Create less TARGET_MEM_REFs
I just figured that we create TARGET_MEM_REF [base: a_4, offset: 0] from within IVOPTs. That pessimizes further passes unnecessarily. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-address.c (create_mem_ref_raw): Use proper predicate to catch all valid MEM_REF pointer operands. Index: gcc/tree-ssa-address.c === --- gcc/tree-ssa-address.c (revision 211771) +++ gcc/tree-ssa-address.c (working copy) @@ -393,7 +393,7 @@ create_mem_ref_raw (tree type, tree alia ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. */ - if ((TREE_CODE (base) == ADDR_EXPR || TREE_CODE (base) == INTEGER_CST) + if (is_gimple_mem_ref_addr (base) && (!index2 || integer_zerop (index2)) && (!addr->index || integer_zerop (addr->index))) return fold_build2 (MEM_REF, type, base, addr->offset);
Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar venkataramanan.ku...@linaro.org wrote: Hi Maintainers, This patch fixes the PR 60617 that occurs when we turn on the reload pass in thumb2 mode. It occurs for the pattern *ior_scc_scc that gets generated for the 3rd argument of the below function call. JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst))); (snip---) (insn 634 633 635 27 (parallel [ (set (reg:SI 3 r3) (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== This operand gets register r5 assigned (reg/v:SI 112 [ op2 ])) (eq:SI (reg/v:SI 110 [ dst ]) <== This operand (reg/v:SI 111 [ op1 ])))) (clobber (reg:CC 100 cc)) ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300 {*ior_scc_scc} (snip---) The issue here is that the above pattern demands 5 registers (LO_REGS). But when we are in reload, register r0 is used for the pointer to the class, r1 and r2 for the first and second argument. r7 is used for the stack pointer. So we are left with r3, r4, r5 and r6. But the above pattern needs five LO_REGS. Hence we get a spill failure when processing the last register operand in that pattern. In the ARM port, TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1, and for Thumb-2 mode there is mention of using LO_REGS in the comment as below: "Care should be taken to avoid adding thumb-2 patterns that require many low registers". So the conservative fix is not to allow this pattern for Thumb-2 mode. I don't have an additional solution off the top of my head and probably need to go do some digging. It sounds like the conservative fix but what's the impact of doing so? Have you measured that in terms of performance or code size on a range of benchmarks? I allowed these patterns for Thumb-2 when we have constant operands for comparison. That makes the target tests arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass. That sounds fine and fair - no trouble there. 
My concern is with removing the register alternatives and losing the ability to trigger conditional compares on 4.9 and trunk for Thumb1 till the time the new conditional compare work makes it in. Ramana Regression tested with gcc 4.9 branch since in trunk this bug is masked by revision 209897. Please provide your suggestions on this patch. regards, Venkat.
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis charles.bay...@linaro.org wrote: This patch adds support for post-indexed addressing for NEON structure memory accesses. For example VLD1.8 {d0}, [r0], r1 Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. Ok for trunk? This is OK. Ramana gcc/Changelog: 2014-06-02 Charles Baylis charles.bay...@linaro.org * config/arm/arm.c (neon_vector_mem_operand): Allow register POST_MODIFY for neon loads and stores. (arm_print_operand): Output post-index register for neon loads and stores.
Re: [PATCH] Create less TARGET_MEM_REFs
On Wed, Jun 18, 2014 at 11:56:01AM +0200, Richard Biener wrote: I just figured that we create TARGET_MEM_REF [base: a_4, offset: 0] from within IVOPTs. That pessimizes further passes unnecessarily. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Isn't that against the comment above it? ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. Perhaps it is fine only if addr->offset is integer_zerop? 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-address.c (create_mem_ref_raw): Use proper predicate to catch all valid MEM_REF pointer operands. Index: gcc/tree-ssa-address.c === --- gcc/tree-ssa-address.c (revision 211771) +++ gcc/tree-ssa-address.c (working copy) @@ -393,7 +393,7 @@ create_mem_ref_raw (tree type, tree alia ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. */ - if ((TREE_CODE (base) == ADDR_EXPR || TREE_CODE (base) == INTEGER_CST) + if (is_gimple_mem_ref_addr (base) && (!index2 || integer_zerop (index2)) && (!addr->index || integer_zerop (addr->index))) return fold_build2 (MEM_REF, type, base, addr->offset); Jakub
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis charles.bay...@linaro.org wrote: On 5 June 2014 07:27, Ramana Radhakrishnan ramana@googlemail.com wrote: On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis charles.bay...@linaro.org wrote: This patch adds support for post-indexed addressing for NEON structure memory accesses. For example VLD1.8 {d0}, [r0], r1 Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. Ok for trunk? This looks like a reasonable start but this work doesn't look complete to me yet. Can you also look at the impact on performance of a range of benchmarks especially a popular embedded one to see how this behaves unless you have already done so ? I ran a popular suite of embedded benchmarks, and there is no impact at all on Chromebook (including with the additional attached patch) Thanks for the due diligence The patch was developed to address a performance issue with a new version of libvpx which uses intrinsics instead of NEON assembler. The patch results in a 3% improvement for VP8 decode. Good - 3% not to be sneezed at. POST_INC, POST_MODIFY usually have a funny way of biting you with either ivopts or the way in which address costs work. I think there maybe further tweaks needed but for a first step I'd like to know what the performance impact is. I would also suggest running this through clyon's neon intrinsics testsuite to see if that catches any issues especially with the large vector modes. Thanks. No issues found in clyon's tests. Please keep an eye out for any regressions. Your mention of larger vector modes prompted me to check that the patch has the desired result with them. In fact, the costs are estimated incorrectly which means the post_modify pattern is not used. The attached patch fixes that. (used in combination with my original patch) 2014-06-15 Charles Baylis charles.ba...@linaro.org * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with embedded side effects. 
I'm not too thrilled with putting in more special cases that are not table driven in there. Can you file a PR with some testcases that show this so that we don't forget and CC me on it please ? Ramana
Re: [PATCH] PR61517: fix stmt replacement in bswap pass
On Wed, Jun 18, 2014 at 3:30 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: Hi everybody, Thanks to a comment from Richard Biener, the bswap pass takes care not to perform its optimization if memory is modified between the loads of the original expression. However, when it replaces these statements by a single load, it does so at the gimple statement that computes the final bitwise OR of the original expression, and memory could also be modified between the last load statement and this bitwise OR statement. The result would then be to read memory *after* it was changed instead of before. This patch takes care to move the statement to be replaced close to one of the original loads, thus avoiding this problem. Ok. Thanks, Richard. ChangeLog entries for this fix are: *** gcc/ChangeLog *** 2014-06-16 Thomas Preud'homme thomas.preudho...@arm.com * tree-ssa-math-opts.c (find_bswap_or_nop_1): Adapt to return a stmt whose rhs's first tree is the source expression instead of the expression itself. (find_bswap_or_nop): Likewise. (bswap_replace): Rename stmt in cur_stmt. Pass gsi by value and src as a gimple stmt whose rhs's first tree is the source. In the memory source case, move the stmt to be replaced close to one of the original loads to avoid the problem of a store between the load and the stmt's original location. (pass_optimize_bswap::execute): Adapt to change in bswap_replace's signature. *** gcc/testsuite/ChangeLog *** 2014-06-16 Thomas Preud'homme thomas.preudho...@arm.com * gcc.c-torture/execute/bswap-2.c (incorrect_read_le32): New. (incorrect_read_be32): Likewise. (main): Call incorrect_read_* to test stmt replacement is made by bswap at the right place. * gcc.c-torture/execute/pr61517.c: New test. Patch also attached for convenience. Is it ok for trunk? 
diff --git a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c index a47e01a..88132fe 100644 --- a/gcc/testsuite/gcc.c-torture/execute/bswap-2.c +++ b/gcc/testsuite/gcc.c-torture/execute/bswap-2.c @@ -66,6 +66,32 @@ fake_read_be32 (char *x, char *y) return c3 | c2 << 8 | c1 << 16 | c0 << 24; } +__attribute__ ((noinline, noclone)) uint32_t +incorrect_read_le32 (char *x, char *y) +{ + unsigned char c0, c1, c2, c3; + + c0 = x[0]; + c1 = x[1]; + c2 = x[2]; + c3 = x[3]; + *y = 1; + return c0 | c1 << 8 | c2 << 16 | c3 << 24; +} + +__attribute__ ((noinline, noclone)) uint32_t +incorrect_read_be32 (char *x, char *y) +{ + unsigned char c0, c1, c2, c3; + + c0 = x[0]; + c1 = x[1]; + c2 = x[2]; + c3 = x[3]; + *y = 1; + return c3 | c2 << 8 | c1 << 16 | c0 << 24; +} + int main () { @@ -92,8 +118,17 @@ main () out = fake_read_le32 (cin, &cin[2]); if (out != 0x89018583) __builtin_abort (); + cin[2] = 0x87; out = fake_read_be32 (cin, &cin[2]); if (out != 0x83850189) __builtin_abort (); + cin[2] = 0x87; + out = incorrect_read_le32 (cin, &cin[2]); + if (out != 0x89878583) +__builtin_abort (); + cin[2] = 0x87; + out = incorrect_read_be32 (cin, &cin[2]); + if (out != 0x83858789) +__builtin_abort (); return 0; } diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61517.c b/gcc/testsuite/gcc.c-torture/execute/pr61517.c new file mode 100644 index 000..fc9bbe8 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61517.c @@ -0,0 +1,19 @@ +int a, b, *c = &a; +unsigned short d; + +int +main () +{ + unsigned int e = a; + *c = 1; + if (!b) +{ + d = e; + *c = d | e; +} + + if (a != 0) +__builtin_abort (); + + return 0; +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index c868e92..1ee2ba8 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1804,28 +1804,28 @@ find_bswap_or_nop_load (gimple stmt, tree ref, struct symbolic_number *n) /* find_bswap_or_nop_1 invokes itself recursively with N and tries to perform the operation given by the rhs 
of STMT on the result. If the operation - could successfully be executed the function returns the tree expression of - the source operand and NULL otherwise. */ + could successfully be executed the function returns a gimple stmt whose + rhs's first tree is the expression of the source operand and NULL + otherwise. */ -static tree +static gimple find_bswap_or_nop_1 (gimple stmt, struct symbolic_number *n, int limit) { enum tree_code code; tree rhs1, rhs2 = NULL; - gimple rhs1_stmt, rhs2_stmt; - tree source_expr1; + gimple rhs1_stmt, rhs2_stmt, source_stmt1; enum gimple_rhs_class rhs_class; if (!limit || !is_gimple_assign (stmt)) -return NULL_TREE; +return NULL; rhs1 = gimple_assign_rhs1 (stmt); if (find_bswap_or_nop_load (stmt, rhs1, n)) -
Re: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
On Wed, Jun 18, 2014 at 6:55 AM, Thomas Preud'homme thomas.preudho...@arm.com wrote: From: Richard Biener [mailto:richard.guent...@gmail.com] Sent: Wednesday, June 11, 2014 4:32 PM Is this OK for trunk? Does this bug qualify for a backport patch to 4.8 and 4.9 branches? This is ok for trunk and also for backporting (after a short while to see if there is any fallout). Below is the backported patch for 4.8/4.9. Is this ok for both 4.8 and 4.9? If yes, how much more should I wait before committing? Tested on both 4.8 and 4.9 without regression in the testsuite after a bootstrap. This is ok to commit now. Thanks, Richard. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 1e35bbe..0559b7f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,16 @@ +2014-06-12 Thomas Preud'homme thomas.preudho...@arm.com + + PR tree-optimization/61306 + * tree-ssa-math-opts.c (struct symbolic_number): Store type of + expression instead of its size. + (do_shift_rotate): Adapt to change in struct symbolic_number. Return + false to prevent optimization when the result is unpredictable due to + arithmetic right shift of signed type with highest byte is set. + (verify_symbolic_number_p): Adapt to change in struct symbolic_number. + (find_bswap_1): Likewise. Return NULL to prevent optimization when the + result is unpredictable due to sign extension. + (find_bswap): Adapt to change in struct symbolic_number. + 2014-06-12 Alan Modra amo...@gmail.com PR target/61300 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 757cb74..139f23c 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,9 @@ +2014-06-12 Thomas Preud'homme thomas.preudho...@arm.com + + * gcc.c-torture/execute/pr61306-1.c: New test. + * gcc.c-torture/execute/pr61306-2.c: Likewise. + * gcc.c-torture/execute/pr61306-3.c: Likewise. 
+ 2014-06-11 Richard Biener rguent...@suse.de PR tree-optimization/61452 diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c new file mode 100644 index 000..ebc90a3 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-1.c @@ -0,0 +1,39 @@ +#ifdef __INT32_TYPE__ +typedef __INT32_TYPE__ int32_t; +#else +typedef int int32_t; +#endif + +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#else +typedef unsigned uint32_t; +#endif + +#define __fake_const_swab32(x) ((uint32_t)( \ + (((uint32_t)(x) (uint32_t)0x00ffUL) 24) |\ + (((uint32_t)(x) (uint32_t)0xff00UL) 8) |\ + (((uint32_t)(x) (uint32_t)0x00ffUL) 8) |\ + (( (int32_t)(x) (int32_t)0xff00UL) 24))) + +/* Previous version of bswap optimization failed to consider sign extension + and as a result would replace an expression *not* doing a bswap by a + bswap. */ + +__attribute__ ((noinline, noclone)) uint32_t +fake_bswap32 (uint32_t in) +{ + return __fake_const_swab32 (in); +} + +int +main(void) +{ + if (sizeof (int32_t) * __CHAR_BIT__ != 32) +return 0; + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) +return 0; + if (fake_bswap32 (0x87654321) != 0xff87) +__builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c new file mode 100644 index 000..886ecfd --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-2.c @@ -0,0 +1,40 @@ +#ifdef __INT16_TYPE__ +typedef __INT16_TYPE__ int16_t; +#else +typedef short int16_t; +#endif + +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#else +typedef unsigned uint32_t; +#endif + +#define __fake_const_swab32(x) ((uint32_t)( \ + (((uint32_t) (x) (uint32_t)0x00ffUL) 24) | \ + (((uint32_t)(int16_t)(x) (uint32_t)0x0000UL) 8) | \ + (((uint32_t) (x) (uint32_t)0x00ffUL) 8) | \ + (((uint32_t) (x) (uint32_t)0xff00UL) 24))) + + +/* Previous version of bswap optimization failed to consider sign extension + and as a result 
would replace an expression *not* doing a bswap by a + bswap. */ + +__attribute__ ((noinline, noclone)) uint32_t +fake_bswap32 (uint32_t in) +{ + return __fake_const_swab32 (in); +} + +int +main(void) +{ + if (sizeof (uint32_t) * __CHAR_BIT__ != 32) +return 0; + if (sizeof (int16_t) * __CHAR_BIT__ != 16) +return 0; + if (fake_bswap32 (0x81828384) != 0xff838281) +__builtin_abort (); + return 0; +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c new file mode 100644 index 000..6086e27 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr61306-3.c @@ -0,0 +1,13 @@ +short a = -1;
Re: [PATCH] PR61123 : Fix the ABI mis-matching error caused by LTO
On Wed, Jun 18, 2014 at 10:14 AM, Hale Wang hale.w...@arm.com wrote: Hi, With LTO, -fno-short-enums is ignored, resulting in ABI mis-matching at link time. Refer to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61123 for details. This patch adds fshort-enums and fshort-wchar to the LTO option group. To check it, a new procedure object-readelf is added in testsuite/lib/lto.exp and new lto tests are added in gcc.target/arm/lto. Bootstrapped with no make check regressions on X86-64. Patch also attached for convenience. Is it ok for trunk? Thanks and Best Regards, Hale Wang c-family/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com PR lto/61123 *c.opt (fshort-enums): Add to LTO. *c.opt (fshort-wchar): Likewise. Space after the *. I think you don't need to copy the LTO harness but you can simply use dg.exp and sth similar to gcc.dg/20081223-1.c (there is an effective target 'lto' to guard for lto support). So simply place the testcase in gcc.target/arm/ (make sure to put a dg-do compile on the 2nd file and use dg-additional-sources). If that doesn't work I'd say put the testcase in gcc.dg/lto/ instead and do a dg-skip-if for non-arm targets. Ok with one of those changes. Thanks, Richard. testsuite/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com * gcc.target/arm/lto/: New folder to verify the LTO option for ARM specific. * gcc.target/arm/lto/pr61123-enum-size_0.c: New test case. * gcc.target/arm/lto/pr61123-enum-size_1.c: Likewise. * gcc.target/arm/lto/lto.exp: New exp file used to test LTO option for ARM specific. * lib/lto.exp (object-readelf): New procedure used to catch the enum size in the final executable.
Index: gcc/c-family/c.opt === --- gcc/c-family/c.opt (revision 211394) +++ gcc/c-family/c.opt (working copy) @@ -1189,11 +1189,11 @@ Use the same size for double as for float fshort-enums -C ObjC C++ ObjC++ Optimization Var(flag_short_enums) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_enums) Use the narrowest integer type possible for enumeration types fshort-wchar -C ObjC C++ ObjC++ Optimization Var(flag_short_wchar) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_wchar) Force the underlying type for \wchar_t\ to be \unsigned short\ fsigned-bitfields Index: gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c === --- gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c(revision 0) @@ -0,0 +1,22 @@ +/* { dg-lto-do link } */ +/* { dg-lto-options { { -fno-short-enums -Wl,-Ur,--no-enum-size-warning -Os -nostdlib -flto } } } */ + +#include stdlib.h + +enum enum_size_attribute +{ + small_size, int_size +}; + +struct debug_ABI_enum_size +{ + enum enum_size_attribute es; +}; + +int +foo1 (struct debug_ABI_enum_size *x) +{ + return sizeof (x-es); +} + +/* { dg-final { object-readelf Tag_ABI_enum_size int { target arm_eabi } } } */ Index: gcc/testsuite/gcc.target/arm/lto/lto.exp === --- gcc/testsuite/gcc.target/arm/lto/lto.exp(revision 0) +++ gcc/testsuite/gcc.target/arm/lto/lto.exp (revision 0) @@ -0,0 +1,59 @@ +# Copyright (C) 2009-2014 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. +# +# Contributed by Diego Novillo dnovi...@google.com + + +# Test link-time optimization across multiple files. +# +# Programs are broken into multiple files. Each one is compiled +# separately with LTO information. The final executable is generated +# by collecting all the generated object files using regular LTO or WHOPR. + +if $tracelevel then { +strace $tracelevel +} + +# Load procedures from common libraries. +load_lib standard.exp +load_lib gcc.exp + +# Load the language-independent compabibility support procedures. +load_lib lto.exp + +# If LTO has not been enabled, bail. +if { ![check_effective_target_lto] } { +return +} + +gcc_init +lto_init no-mathlib + +# Define an identifier for use with this
Re: [PATCH] PR61123 : Fix the ABI mis-matching error caused by LTO
On Wed, Jun 18, 2014 at 12:21 PM, Richard Biener richard.guent...@gmail.com wrote: On Wed, Jun 18, 2014 at 10:14 AM, Hale Wang hale.w...@arm.com wrote: Hi, With LTO, -fno-short-enums is ignored, resulting in ABI mis-matching in linking. Refer https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61123 for details. This patch add fshort-enums and fshout-wchar to LTO group. To check it, a new procedure object-readelf is added in testsuite/lib/lto.exp and new lto tests are added in gcc.target/arm/lto. Bootstrap and no make check regression on X86-64. Patch also attached for convenience. Is It ok for trunk? Thanks and Best Regards, Hale Wang c-family/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com PR lto/61123 *c.opt (fshort-enums): Add to LTO. *c.opt (fshort-wchar): Likewise. Space after the *. I think you don't need to copy the LTO harness but you can simply use dg.exp and sth similar to gcc.dg/20081223-1.c (there is an effective target 'lto' to guard for lto support). So simply place the testcase in gcc.target/arm/ (make sure to put a dg-do compile on the 2nd file and use dg-additional-sources). If that doesn't work I'd say put the testcase in gcc.dg/lto/ instead and do a dg-skip-if for non-arm targets. Ok with one of those changes. Oh, I see you need a new object-readelf ... I defer to a testsuite maintainer for this part. Richard. Thanks, Richard. testsuite/ChangeLog 2014-06-18 Hale Wang hale.w...@arm.com * gcc.target/arm/lto/: New folder to verify the LTO option for ARM specific. * gcc.target/arm/lto/pr61123-enum-size_0.c: New test case. * gcc.target/arm/lto/pr61123-enum-size_1.c: Likewise. * gcc.target/arm/lto/lto.exp: New exp file used to test LTO option for ARM specific. * lib/lto.exp (object-readelf): New procedure used to catch the enum size in the final executable. 
Index: gcc/c-family/c.opt === --- gcc/c-family/c.opt (revision 211394) +++ gcc/c-family/c.opt (working copy) @@ -1189,11 +1189,11 @@ Use the same size for double as for float fshort-enums -C ObjC C++ ObjC++ Optimization Var(flag_short_enums) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_enums) Use the narrowest integer type possible for enumeration types fshort-wchar -C ObjC C++ ObjC++ Optimization Var(flag_short_wchar) +C ObjC C++ ObjC++ LTO Optimization Var(flag_short_wchar) Force the underlying type for \wchar_t\ to be \unsigned short\ fsigned-bitfields Index: gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c === --- gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c (revision 0) +++ gcc/testsuite/gcc.target/arm/lto/pr61123-enum-size_0.c(revision 0) @@ -0,0 +1,22 @@ +/* { dg-lto-do link } */ +/* { dg-lto-options { { -fno-short-enums -Wl,-Ur,--no-enum-size-warning -Os -nostdlib -flto } } } */ + +#include stdlib.h + +enum enum_size_attribute +{ + small_size, int_size +}; + +struct debug_ABI_enum_size +{ + enum enum_size_attribute es; +}; + +int +foo1 (struct debug_ABI_enum_size *x) +{ + return sizeof (x-es); +} + +/* { dg-final { object-readelf Tag_ABI_enum_size int { target arm_eabi } } } */ Index: gcc/testsuite/gcc.target/arm/lto/lto.exp === --- gcc/testsuite/gcc.target/arm/lto/lto.exp(revision 0) +++ gcc/testsuite/gcc.target/arm/lto/lto.exp (revision 0) @@ -0,0 +1,59 @@ +# Copyright (C) 2009-2014 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# http://www.gnu.org/licenses/. +# +# Contributed by Diego Novillo dnovi...@google.com + + +# Test link-time optimization across multiple files. +# +# Programs are broken into multiple files. Each one is compiled +# separately with LTO information. The final executable is generated +# by collecting all the generated object files using regular LTO or WHOPR. + +if $tracelevel then { +strace $tracelevel +} + +# Load procedures from common libraries. +load_lib standard.exp +load_lib gcc.exp + +# Load the language-independent compabibility support procedures.
[PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
Hello! Attached patch fixes recently added sibcall insns and their corresponding peephole2 patterns: - There is no need for new memory_nox32_operand. A generic memory_operand can be used, since new insns and peephole2 patterns should be disabled for TARGET_X32 entirely. - Adds missing m constraint in insn patterns. - Macroizes peephole2 patterns - Adds check that eliminated register is really dead after the call (maybe an overkill, but some hard-to-debug problems surfaced due to missing liveness checks in the past) - Fixes call RTXes in sibcall_pop related patterns (and fixes two newly introduced warnings in i386.md) 2014-06-18 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*sibcall_memory): Rename from *sibcall_intern. Do not use unspec as call operand. Use memory_operand instead of memory_nox32_operand and add m operand constraint. Disable pattern for TARGET_X32. (*sibcall_pop_memory): Ditto. (*sibcall_value_memory): Ditto. (*sibcall_value_pop_memory): Ditto. (sibcall peepholes): Merge SImode and DImode patterns using W mode iterator. Use memory_operand instead of memory_nox32_operand. Disable pattern for TARGET_X32. Check if eliminated register is really dead after call insn. Generate call RTX without unspec operand. (sibcall_value peepholes): Ditto. (sibcall_pop peepholes): Fix call insn RTXes. Use memory_operand instead of memory_nox32_operand. Check if eliminated register is really dead after call insn. Generate call RTX without unspec operand. (sibcall_value_pop peepholes): Ditto. * config/i386/predicates.md (memory_nox32_operand): Remove predicate. The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN. Uros. 
Index: i386.md === --- i386.md (revision 211725) +++ i386.md (working copy) @@ -11354,53 +11354,38 @@ * return ix86_output_call_insn (insn, operands[0]); [(set_attr type call)]) -(define_insn *sibcall_intern - [(call (unspec [(mem:QI (match_operand:W 0 memory_nox32_operand))] - UNSPEC_PEEPSIB) -(match_operand 1))] - +(define_insn *sibcall_memory + [(call (mem:QI (match_operand:W 0 memory_operand m)) +(match_operand 1)) + (unspec [(const_int 0)] UNSPEC_PEEPSIB)] + !TARGET_X32 * return ix86_output_call_insn (insn, operands[0]); [(set_attr type call)]) (define_peephole2 - [(set (match_operand:DI 0 register_operand) -(match_operand:DI 1 memory_nox32_operand)) + [(set (match_operand:W 0 register_operand) + (match_operand:W 1 memory_operand)) (call (mem:QI (match_dup 0)) (match_operand 3))] - TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (1)) - [(call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) - (match_dup 3))]) + !TARGET_X32 SIBLING_CALL_P (peep2_next_insn (1)) +peep2_reg_dead_p (2, operands[0]) + [(parallel [(call (mem:QI (match_dup 1)) + (match_dup 3)) + (unspec [(const_int 0)] UNSPEC_PEEPSIB)])]) (define_peephole2 - [(set (match_operand:DI 0 register_operand) -(match_operand:DI 1 memory_nox32_operand)) + [(set (match_operand:W 0 register_operand) + (match_operand:W 1 memory_operand)) (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) (call (mem:QI (match_dup 0)) (match_operand 3))] - TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (2)) + !TARGET_X32 SIBLING_CALL_P (peep2_next_insn (2)) +peep2_reg_dead_p (3, operands[0]) [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) - (call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) - (match_dup 3))]) + (parallel [(call (mem:QI (match_dup 1)) + (match_dup 3)) + (unspec [(const_int 0)] UNSPEC_PEEPSIB)])]) -(define_peephole2 - [(set (match_operand:SI 0 register_operand) -(match_operand:SI 1 memory_nox32_operand)) - (call (mem:QI (match_dup 0)) - (match_operand 3))] - !TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (1)) - 
[(call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) - (match_dup 3))]) - -(define_peephole2 - [(set (match_operand:SI 0 register_operand) -(match_operand:SI 1 memory_nox32_operand)) - (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) - (call (mem:QI (match_dup 0)) - (match_operand 3))] - !TARGET_64BIT SIBLING_CALL_P (peep2_next_insn (2)) - [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE) - (call (unspec [(mem:QI (match_dup 1))] UNSPEC_PEEPSIB) (match_dup 3))]) - (define_expand call_pop [(parallel [(call (match_operand:QI 0) (match_operand:SI 1)) @@ -11434,42 +11419,52 @@ * return ix86_output_call_insn (insn, operands[0]); [(set_attr type call)]) -(define_insn *sibcall_pop_intern - [(call (unspec [(mem:QI (match_operand:SI 0 memory_nox32_operand))] - UNSPEC_PEEPSIB) +(define_insn *sibcall_pop_memory +
Re: [PATCH] Create less TARGET_MEM_REFs
On Wed, 18 Jun 2014, Jakub Jelinek wrote: On Wed, Jun 18, 2014 at 11:56:01AM +0200, Richard Biener wrote: I just figured that we create TARGET_MEM_REF [base: a_4, offset: 0] from within IVOPTs. That pessimizes further passes unnecessarily. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Isn't that against the comment above it? ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. Perhaps it is fine only if addr-offset is integer_zerop? Oh yeah, I guess so. Damn IVOPTs ... (though I wonder what's the difference with MEM[foo, -4B] then, which we don't catch either). That said, I'm not sure if it really fixes anything not allowing MEM_REFs in all cases. I've found a different workaround for the issue I was facing so I'm dropping this patch instead. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-address.c (create_mem_ref_raw): Use proper predicate to catch all valid MEM_REF pointer operands. Index: gcc/tree-ssa-address.c === --- gcc/tree-ssa-address.c (revision 211771) +++ gcc/tree-ssa-address.c (working copy) @@ -393,7 +393,7 @@ create_mem_ref_raw (tree type, tree alia ??? As IVOPTs does not follow restrictions to where the base pointer may point to create a MEM_REF only if we know that base is valid. */ - if ((TREE_CODE (base) == ADDR_EXPR || TREE_CODE (base) == INTEGER_CST) + if (is_gimple_mem_ref_addr (base) (!index2 || integer_zerop (index2)) (!addr-index || integer_zerop (addr-index))) return fold_build2 (MEM_REF, type, base, addr-offset);
Re: [RFC][ARM]: Fix reload spill failure (PR 60617)
Hi Ramana, On 18 June 2014 15:29, Ramana Radhakrishnan ramana@googlemail.com wrote: On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar venkataramanan.ku...@linaro.org wrote: Hi Maintainers, This patch fixes PR 60617, which occurs when we turn on the reload pass in thumb2 mode. It occurs for the pattern *ior_scc_scc that gets generated for the 3rd argument of the below function call. JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst))); (snip---) (insn 634 633 635 27 (parallel [ (set (reg:SI 3 r3) (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) == This operand r5 is registers gets assigned (reg/v:SI 112 [ op2 ])) (eq:SI (reg/v:SI 110 [ dst ]) == This operand (reg/v:SI 111 [ op1 ] (clobber (reg:CC 100 cc)) ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300 {*ior_scc_scc (snip---) The issue here is that the above pattern demands 5 registers (LO_REGS). But when we are in reload, register r0 is used for the pointer to the class, r1 and r2 for the first and second arguments, and r7 is used for the stack pointer. So we are left with r3, r4, r5 and r6 — but the above pattern needs five LO_REGS. Hence we get a spill failure when processing the last register operand in that pattern. In the ARM port, TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1, and for thumb2 mode there is a mention of using LO_REGS in the comment as below. Care should be taken to avoid adding thumb-2 patterns that require many low registers So the conservative fix is not to allow this pattern for Thumb-2 mode. I don't have an additional solution off the top of my head and probably need to go do some digging. It sounds like the conservative fix, but what's the impact of doing so? Have you measured that in terms of performance or code size on a range of benchmarks? I haven't done any benchmark testing. I will try and run some benchmarks with my patch. I allowed these patterns for Thumb-2 when we have constant operands for the comparison. That makes the target tests arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass.
That sounds fine and fair - no trouble there. My concern is with removing the register alternatives and losing the ability to trigger conditional compares on 4.9 and trunk for Thumb-1 until the new conditional compare work makes it in. Ramana This bug does not occur when LRA is enabled. In 4.9 FSF and trunk, the LRA pass is enabled by default now. It may be too conservative, but is there a way to enable this pattern when we have the LRA pass and prevent it when we have the old reload pass? regards, Venkat. Regression tested with the gcc 4.9 branch, since in trunk this bug is masked by revision 209897. Please provide your suggestions on this patch. regards, Venkat.
Re: [PATCH] PR54555: Use strict_low_part for loading a constant only if it is cheaper
Jeff Law l...@redhat.com writes: Let's do better this time ;-) Add a testcase for the m68k port which verifies we're getting the desired code. Make sense. Installed with the following test case. Andreas. PR rtl-optimization/54555 * gcc.target/m68k/pr54555.c: New test. diff --git a/gcc/testsuite/gcc.target/m68k/pr54555.c b/gcc/testsuite/gcc.target/m68k/pr54555.c new file mode 100644 index 000..4be704b --- /dev/null +++ b/gcc/testsuite/gcc.target/m68k/pr54555.c @@ -0,0 +1,13 @@ +/* PR rtl-optimization/54555 + Test that postreload does not shorten the load of small constants to + use move.b instead of moveq. */ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +/* { dg-final { scan-assembler-not move\\.?b } } */ + +void foo (void); +void bar (int a) +{ + if (a == 16 || a == 23) foo (); + if (a == -110 || a == -128) foo (); +} -- 2.0.0 -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
[PATCH][RFC] Gate loop passes group on number-of-loops 1, add no-loops group
The following aims at reducing the number of pointless passes we run on functions containing no loops. Those are at least two copyprop and one dce pass (two dce passes when vectorization is enabled, three dce passes and an additional copyprop pass when any graphite optimization is enabled). Simply gating pass_tree_loop on number_of_loops () 1 would disable basic-block vectorization on loopless functions. Moving basic-block vectorization out of pass_tree_loop works to the extent that you'd need to move IVOPTs as well as data-ref analysis cannot cope with TARGET_MEM_REFs. So the following introduces a pass_tree_no_loop pass group which is enabled whenever the pass_tree_loop group is disabled. As followup this would allow to skip cleanup work we do after the loop pipeline just to cleanup after it. Any comments? Does such followup sound realistic or would it be better to take the opportunity to move IVOPTs a bit closer to RTL expansion and avoid that pass_tree_no_loop hack? Bootstrap and regtest running on x86_64-unknown-linux-gnu. Thanks, Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-ssa-loop.c (gate_loop): New function. (pass_tree_loop::gate): Call it. (pass_data_tree_no_loop, pass_tree_no_loop, make_pass_tree_no_loop): New. * tree-vectorizer.c: Include tree-scalar-evolution.c (pass_slp_vectorize::execute): Initialize loops and SCEV if required. (pass_slp_vectorize::clone): New method. * timevar.def (TV_TREE_NOLOOP): New. * tree-pass.h (make_pass_tree_no_loop): Declare. * passes.def (pass_tree_no_loop): New pass group with SLP vectorizer. Index: gcc/tree-ssa-loop.c === *** gcc/tree-ssa-loop.c.orig2014-06-18 12:06:19.226205380 +0200 --- gcc/tree-ssa-loop.c 2014-06-18 12:06:39.103204012 +0200 *** along with GCC; see the file COPYING3. *** 42,47 --- 42,63 #include diagnostic-core.h #include tree-vectorizer.h + + /* Gate for loop pass group. The group is controlled by -ftree-loop-optimize +but we also avoid running it when the IL doesn't contain any loop. 
*/ + + static bool + gate_loop (function *fn) + { + if (!flag_tree_loop_optimize) + return false; + + /* Make sure to drop / re-discover loops when necessary. */ + if (loops_state_satisfies_p (LOOPS_NEED_FIXUP)) + fix_loop_structure (NULL); + return number_of_loops (fn) 1; + } + /* The loop superpass. */ namespace { *** public: *** 68,74 {} /* opt_pass methods: */ ! virtual bool gate (function *) { return flag_tree_loop_optimize != 0; } }; // class pass_tree_loop --- 84,90 {} /* opt_pass methods: */ ! virtual bool gate (function *fn) { return gate_loop (fn); } }; // class pass_tree_loop *** make_pass_tree_loop (gcc::context *ctxt) *** 80,85 --- 96,140 return new pass_tree_loop (ctxt); } + /* The no-loop superpass. */ + + namespace { + + const pass_data pass_data_tree_no_loop = + { + GIMPLE_PASS, /* type */ + no_loop, /* name */ + OPTGROUP_NONE, /* optinfo_flags */ + false, /* has_execute */ + TV_TREE_NOLOOP, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + 0, /* todo_flags_finish */ + }; + + class pass_tree_no_loop : public gimple_opt_pass + { + public: + pass_tree_no_loop (gcc::context *ctxt) + : gimple_opt_pass (pass_data_tree_no_loop, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *fn) { return !gate_loop (fn); } + + }; // class pass_tree_no_loop + + } // anon namespace + + gimple_opt_pass * + make_pass_tree_no_loop (gcc::context *ctxt) + { + return new pass_tree_no_loop (ctxt); + } + + /* Loop optimizer initialization. */ namespace { Index: gcc/tree-vectorizer.c === *** gcc/tree-vectorizer.c.orig 2014-06-18 12:06:19.226205380 +0200 --- gcc/tree-vectorizer.c 2014-06-18 12:10:55.958186328 +0200 *** along with GCC; see the file COPYING3. *** 82,87 --- 82,89 #include tree-ssa-propagate.h #include dbgcnt.h #include gimple-fold.h + #include tree-scalar-evolution.h + /* Loop or bb location. 
*/ source_location vect_location; *** public: *** 610,615 --- 612,618 {} /* opt_pass methods: */ + opt_pass * clone () { return new pass_slp_vectorize (m_ctxt); } virtual bool gate (function *) { return flag_tree_slp_vectorize != 0; } virtual unsigned int execute (function *); *** pass_slp_vectorize::execute (function *f *** 620,625 --- 623,635 { basic_block bb; + bool in_loop_pipeline = scev_initialized_p (); + if (!in_loop_pipeline) + { + loop_optimizer_init
[PATCH PR61518]
Hi All, Here is a fix for PR 61518: an additional check was added to reject the transformation if the reduction variable is used anywhere other than the reduction statement (or its phi-function), since such a reduction will not be vectorized. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? gcc/ChangeLog 2014-06-18 Yuri Rumyantsev ysrum...@gmail.com PR tree-optimization/61518 * tree-if-conv.c (is_cond_scalar_reduction): Add missed check that reduction var is used in reduction stmt or phi-function only. gcc/testsuite/ChangeLog * gcc.dg/torture/pr61518.c: New test.
Re: [PATCH][RFC] Gate loop passes group on number-of-loops 1, add no-loops group
On Wed, Jun 18, 2014 at 12:42:19PM +0200, Richard Biener wrote: Any comments? Does such followup sound realistic or would it be better to take the opportunity to move IVOPTs a bit closer to RTL expansion and avoid that pass_tree_no_loop hack? I think it is fine to have pass_tree_no_loop pass pipeline. Jakub
[PATCH][ARM][committed] Fix check for __FAST_MATH in arm_neon.h
Hi all, All other #ifdefs in arm_neon.h that look for fast math use the __FAST_MATH form rather than __FAST_MATH__. They have the same effect AFAICS. This patch fixes the one sticking out. Committed as obvious with r211779. Thanks, Kyrill 2014-06-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm_neon.h (vadd_f32): Change #ifdef to __FAST_MATH.diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 3e29f44..47f6c5e 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -474,7 +474,7 @@ vadd_s32 (int32x2_t __a, int32x2_t __b) __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) vadd_f32 (float32x2_t __a, float32x2_t __b) { -#ifdef __FAST_MATH__ +#ifdef __FAST_MATH return __a + __b; #else return (float32x2_t) __builtin_neon_vaddv2sf (__a, __b, 3);
Re: [PATCH PR61518]
On Wed, Jun 18, 2014 at 12:47 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi All, Here is a fix for PR 61518 - additional test was added to reject transformation if reduction variable is not used in reduction statement only since such reduction will not be vectorized. Bootstrap and regression testing did not show any new failures. Is it OK for trunk? Ok. Thanks, Richard. gcc/ChangeLog 2014-06-18 Yuri Rumyantsev ysrum...@gmail.com PR tree-optimization/61518 * tree-if-conv.c (is_cond_scalar_reduction): Add missed check that reduction var is used in reduction stmt or phi-function only. gcc/testsuite/ChangeLog * gcc.dg/torture/pr61518.c: New test.
Re: [PATCH] pass cleanups
On Wed, 18 Jun 2014, Richard Biener wrote: This removes the special dce_loop pass in favor of dealing with scev and niter estimates in dce generally. Likewise it makes copyprop always cleanup after itself, dealing with scev and niter estimates. It also makes copyprop not unconditionally schedule a cfg-cleanup but only do so if copyprop did any transform. Bootstrap and regtest running on x86_64-unknown-linux-gnu. I have applied the following with the testsuite adjustments needed for gcc.dg/vect/dump-tree-dceloop-pr26359.c. Richard. 2014-06-18 Richard Biener rguent...@suse.de * tree-pass.h (make_pass_dce_loop): Remove. * passes.def: Replace pass_dce_loop with pass_dce. * tree-ssa-dce.c (perform_tree_ssa_dce): If something changed free niter estimates and reset the scev cache. (tree_ssa_dce_loop, pass_data_dce_loop, pass_dce_loop, make_pass_dce_loop): Remove. * tree-ssa-copy.c: Include tree-ssa-loop-niter.h. (fini_copy_prop): Return whether something changed. Always let substitute_and_fold perform DCE and free niter estimates and reset the scev cache if so. (execute_copy_prop): If sth changed schedule cleanup-cfg. (pass_data_copy_prop): Do not unconditionally schedule cleanup-cfg or update-ssa. * gcc.dg/vect/vect.exp: Remove dump-tree-dceloop-* processing. * gcc.dg/vect/dump-tree-dceloop-pr26359.c: Rename to ... * gcc.dg/vect/pr26359.c: ... this and adjust appropriately. 
Index: gcc/tree-pass.h === *** gcc/tree-pass.h (revision 211738) --- gcc/tree-pass.h (working copy) *** extern gimple_opt_pass *make_pass_build_ *** 382,388 extern gimple_opt_pass *make_pass_build_ealias (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dominator (gcc::context *ctxt); extern gimple_opt_pass *make_pass_dce (gcc::context *ctxt); - extern gimple_opt_pass *make_pass_dce_loop (gcc::context *ctxt); extern gimple_opt_pass *make_pass_cd_dce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_call_cdce (gcc::context *ctxt); extern gimple_opt_pass *make_pass_merge_phi (gcc::context *ctxt); --- 382,387 Index: gcc/passes.def === *** gcc/passes.def (revision 211738) --- gcc/passes.def (working copy) *** along with GCC; see the file COPYING3. *** 203,209 NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce_loop); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); --- 206,212 NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce); NEXT_PASS (pass_tree_unswitch); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_record_bounds); *** along with GCC; see the file COPYING3. *** 215,221 NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); --- 218,224 NEXT_PASS (pass_graphite_transforms); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); ! NEXT_PASS (pass_dce); POP_INSERT_PASSES () NEXT_PASS (pass_iv_canon); NEXT_PASS (pass_parallelize_loops); *** along with GCC; see the file COPYING3. *** 224,230 Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) ! 
NEXT_PASS (pass_dce_loop); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); --- 227,233 Please do not add any other passes in between. */ NEXT_PASS (pass_vectorize); PUSH_INSERT_PASSES_WITHIN (pass_vectorize) ! NEXT_PASS (pass_dce); POP_INSERT_PASSES () NEXT_PASS (pass_predcom); NEXT_PASS (pass_complete_unroll); Index: gcc/tree-ssa-dce.c === *** gcc/tree-ssa-dce.c (revision 211738) --- gcc/tree-ssa-dce.c (working copy) *** perform_tree_ssa_dce (bool aggressive) *** 1479,1485 tree_dce_done (aggressive); if (something_changed) ! return TODO_update_ssa | TODO_cleanup_cfg; return 0; } --- 1479,1490 tree_dce_done (aggressive); if (something_changed) ! { ! free_numbers_of_iterations_estimates (); ! if (scev_initialized_p) !
Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
2014-06-18 12:24 GMT+02:00 Uros Bizjak ubiz...@gmail.com: Hello! Attached patch fixes recently added sibcall insns and their corresponding peephole2 patterns: - There is no need for new memory_nox32_operand. A generic memory_operand can be used, since new insns and peephole2 patterns should be disabled for TARGET_X32 entirely. - Adds missing m constraint in insn patterns. - Macroizes peephole2 patterns - Adds check that eliminated register is really dead after the call (maybe an overkill, but some hard-to-debug problems surfaced due to missing liveness checks in the past) - Fixes call RTXes in sibcall_pop related patterns (and fixes two newly introduced warnings in i386.md) 2014-06-18 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*sibcall_memory): Rename from *sibcall_intern. Do not use unspec as call operand. Use memory_operand instead of memory_nox32_operand and add m operand constraint. Disable pattern for TARGET_X32. (*sibcall_pop_memory): Ditto. (*sibcall_value_memory): Ditto. (*sibcall_value_pop_memory): Ditto. (sibcall peepholes): Merge SImode and DImode patterns using W mode iterator. Use memory_operand instead of memory_nox32_operand. Disable pattern for TARGET_X32. Check if eliminated register is really dead after call insn. Generate call RTX without unspec operand. (sibcall_value peepholes): Ditto. (sibcall_pop peepholes): Fix call insn RTXes. Use memory_operand instead of memory_nox32_operand. Check if eliminated register is really dead after call insn. Generate call RTX without unspec operand. (sibcall_value_pop peepholes): Ditto. * config/i386/predicates.md (memory_nox32_operand): Remove predicate. The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32} and was committed to mainline SVN. Uros. The following change in predicates.md seems to be a bit premature. There is still the point about Darwin's PIC issue for unspec-gotpcrel. 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387 return ANY_QI_REG_P (op); }) +;; Return true if OP is a memory operand that can be used in sibcalls. (define_predicate "sibcall_memory_operand" - (match_operand 0 "memory_operand") -{ - return CONSTANT_P (XEXP (op, 0)); -}) + (and (match_operand 0 "memory_operand") + (match_test "CONSTANT_P (XEXP (op, 0))"))) as we might pessimize Darwin's UNSPEC_GOTPCREL handling at that point. In general there is still the question of why this issue happens only for Darwin, and not for Linux. For Linux that gotpcrel code path seems not to be hit at all (at least that is what Iain said). Kai
[GSoC] [match-and-simplify] check for capture index
Add a bounds check for the capture index. * genmatch.c (parse_capture): Add condition to check capture index. (capture_max): New constant. (stdlib.h): Include. Thanks and Regards, Prathamesh Index: genmatch.c === --- genmatch.c (revision 211732) +++ genmatch.c (working copy) @@ -29,7 +29,9 @@ along with GCC; see the file COPYING3. #include "hashtab.h" #include "hash-table.h" #include "vec.h" +#include <stdlib.h> +const unsigned capture_max = 4; /* libccp helpers. */ @@ -816,7 +818,11 @@ static struct operand * parse_capture (cpp_reader *r, operand *op) { eat_token (r, CPP_ATSIGN); - return new capture (get_number (r), op); + const cpp_token *token = peek (r); + const char *num = get_number (r); + if (atoi (num) >= capture_max) + fatal_at (token, "capture cannot be greater than %u", capture_max - 1); + return new capture (num, op); } /* Parse
Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
On Wed, Jun 18, 2014 at 2:24 PM, Kai Tietz ktiet...@googlemail.com wrote: The following change in predicates.md seems to be a bit premature. There is still the point about Darwin's PIC issue for unspec-gotpcrel. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61387 return ANY_QI_REG_P (op); }) +;; Return true if OP is a memory operand that can be used in sibcalls. (define_predicate "sibcall_memory_operand" - (match_operand 0 "memory_operand") -{ - return CONSTANT_P (XEXP (op, 0)); -}) + (and (match_operand 0 "memory_operand") + (match_test "CONSTANT_P (XEXP (op, 0))"))) as we might pessimize Darwin's UNSPEC_GOTPCREL handling at that point. In general there is still the question of why this issue happens only for Darwin, and not for Linux. For Linux that gotpcrel code path seems not to be hit at all (at least that is what Iain said). Oh, this part doesn't change any functionality at all. The predicate is just written in a different way. Uros.
Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
The following change in predicates.md seems to be a bit premature. There is still the point about Darwin's PIC issue for unspec-gotpcrel. The change is indeed incompatible with the patch in pr61387 comment 9. And without it the failures are back ;-( Kai, what is wrong with Iain's patch in comment 4? TIA Dominique
Re: [PATCH][RFC] Gate loop passes group on number-of-loops > 1, add no-loops group
On 06/18/14 04:42, Richard Biener wrote: The following aims at reducing the number of pointless passes we run on functions containing no loops. Those are at least two copyprop and one dce pass (two dce passes when vectorization is enabled, three dce passes and an additional copyprop pass when any graphite optimization is enabled). Simply gating pass_tree_loop on number_of_loops () > 1 would disable basic-block vectorization on loopless functions. Moving basic-block vectorization out of pass_tree_loop works to the extent that you'd need to move IVOPTs as well, as data-ref analysis cannot cope with TARGET_MEM_REFs. So the following introduces a pass_tree_no_loop pass group which is enabled whenever the pass_tree_loop group is disabled. As a followup this would allow skipping cleanup work we do after the loop pipeline just to clean up after it. Any comments? Does such a followup sound realistic, or would it be better to take the opportunity to move IVOPTs a bit closer to RTL expansion and avoid that pass_tree_no_loop hack? Sounds good. I've always believed that each pass should be bubbling back up some kind of status about what it did/found as well. It was more of an RTL issue, but we had a certain commercial testsuite which created large loopless tests (*) that consumed vast quantities of wall clock time. I always wanted the RTL loop passes to signal back to toplev.c that no loops were found, which would in turn be used to say we really don't need cse-after-loop and friends. It's certainly more complex these days, but I'd still like to be able to do such things. Regardless, that's well outside the scope of what you're trying to accomplish. * Those tests consistently found port bugs, so we really didn't want to disable them. jeff
Re: [PATCH, i386]: Fix recently added sibcall insns and peephole2 patterns
2014-06-18 15:11 GMT+02:00 Dominique Dhumieres domi...@lps.ens.fr: The following change in predicates.md seems to be a bit premature. There is still the point about Darwin's PIC issue for unspec-gotpcrel. The change is indeed incompatible with the patch in pr61387 comment 9. And without it the failures are back ;-( Kai, what is wrong with Iain's patch in comment 4? TIA Dominique Well, I don't see a patch there, just a comment about a possible code change. I think this is being fixed in the wrong place. Reverting fnaddr back to its original form seems to be the wrong solution here. It would be better to do something along the lines of 'if (!flag_pic || targetm.binds_local_p (function) || TARGET_PECOFF)' instead. Why should we make it an unspec here, and later on revert that change? If Darwin isn't able to handle such an UNSPEC_GOTPCREL as the address for a sibcall pattern, we should avoid it in general, and not just paper over it in the mi-thunk code. Kai
Re: [patch] improve sloc assignment on bind_expr entry/exit code
On 06/18/14 01:42, Olivier Hainque wrote: Hi Jeff, On Jun 17, 2014, at 22:42 , Jeff Law l...@redhat.com wrote: * tree-core.h (tree_block): Add an end_locus field, allowing memorization of the end of block source location. * tree.h (BLOCK_SOURCE_END_LOCATION): New accessor. * gimplify.c (gimplify_bind_expr): Propagate the block start and end source location info we have on the block entry/exit code we generate. OK. Great, thanks! :-) I assume y'all will add a suitable test to the Ada testsuite and propagate it into the GCC testsuite in due course? Yes, I will. At the patch submission time, I was unclear on what dejagnu device was available to setup a reliable testing protocol for this kind of issue and I was interested in getting feedback on the patch contents first. ISTM that dg-scan-asm for the expected extra .loc's would work, maybe restricted to some target we know produces .loc directives. Sounds appropriate ? Yea, that should be fine. Most folks test x86-64 linux, so that's going to get you the widest net for coverage. jeff
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On 18 June 2014 11:01, Ramana Radhakrishnan ramana@googlemail.com wrote: On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis charles.bay...@linaro.org wrote: This patch adds support for post-indexed addressing for NEON structure memory accesses. For example VLD1.8 {d0}, [r0], r1 Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. Ok for trunk? This is OK. Committed as r211783.
Re: [patch] improve sloc assignment on bind_expr entry/exit code
On Jun 18, 2014, at 15:48 , Jeff Law l...@redhat.com wrote: ISTM that dg-scan-asm for the expected extra .loc's would work, maybe restricted to some target we know produces .loc directives. Sounds appropriate ? Yea, that should be fine. Most folks test x86-64 linux, so that's going to get you the widest net for coverage. OK, patch test checked in. Thanks again for your feedback. Cheers, Olivier
Re: [PATCH, rs6000] Remove XFAIL from default_format_denormal_2.f90 for PowerPC on Linux
On Tue, Jun 17, 2014 at 4:48 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: William J. Schmidt wschm...@linux.vnet.ibm.com writes: Index: gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 === --- gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (revision 211741) +++ gcc/testsuite/gfortran.dg/default_format_denormal_2.f90 (working copy) @@ -1,5 +1,5 @@ ! { dg-require-effective-target fortran_large_real } -! { dg-do run { xfail powerpc*-apple-darwin* powerpc*-*-linux* } } +! { dg-do run { xfail powerpc*-apple-darwin* } } ! Test XFAILed on these platforms because the system's printf() lacks ! proper support for denormalized long doubles. See PR24685 You should also update the comment: `these platforms' no longer applies. Rainer Yes, okay, with the grammar fix also. Thanks, David
Re: [PATCH 3/9] Optimise __aeabi_uldivmod (stack manipulation)
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_uldivmod): Optimise stack pointer manipulation. OK. R. --- libgcc/config/arm/bpabi.S | 54 +-- 1 file changed, 43 insertions(+), 11 deletions(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index ae76cd3..67246b0 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -120,6 +120,46 @@ ARM_FUNC_START aeabi_ulcmp #endif .endm +/* we can use STRD/LDRD on v5TE and later, and any Thumb-2 architecture. */ +#if (defined(__ARM_EABI__) \ + && (defined(__thumb2__) \ + || (__ARM_ARCH >= 5 && defined(__TARGET_FEATURE_DSP)))) +#define CAN_USE_LDRD 1 +#else +#define CAN_USE_LDRD 0 +#endif + +/* set up stack frame for call to __udivmoddi4. At the end of the macro the + stack is arranged as follows: + sp+12 / space for remainder + sp+8 \ (written by __udivmoddi4) + sp+4 lr + sp+0 sp+8 [rp (remainder pointer) argument for __udivmoddi4] + + */ +.macro push_for_divide fname +#if defined(__thumb2__) && CAN_USE_LDRD + sub ip, sp, #8 + strd ip, lr, [sp, #-16]! +#else + sub sp, sp, #8 + do_push {sp, lr} +#endif +98: cfi_push 98b - \fname, 0xe, -0xc, 0x10 +.endm + +/* restore stack */ +.macro pop_for_divide + ldr lr, [sp, #4] +#if CAN_USE_LDRD + ldrd r2, r3, [sp, #8] + add sp, sp, #16 +#else + add sp, sp, #8 + do_pop {r2, r3} +#endif +.endm + #ifdef L_aeabi_ldivmod /* Perform 64 bit signed division. @@ -165,18 +205,10 @@ ARM_FUNC_START aeabi_uldivmod cfi_start __aeabi_uldivmod, LSYM(Lend_aeabi_uldivmod) test_div_by_zero unsigned - sub sp, sp, #8 -#if defined(__thumb2__) - mov ip, sp - push {ip, lr} -#else - do_push {sp, lr} -#endif -98: cfi_push 98b - __aeabi_uldivmod, 0xe, -0xc, 0x10 + push_for_divide __aeabi_uldivmod + /* arguments in (r0:r1), (r2:r3) and *sp */ bl SYM(__gnu_uldivmod_helper) __PLT__ - ldr lr, [sp, #4] - add sp, sp, #8 - do_pop {r2, r3} + pop_for_divide RET cfi_end LSYM(Lend_aeabi_uldivmod)
Re: [PATCH 4/9] Optimise __aeabi_uldivmod
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_uldivmod): Perform division using call to __udivmoddi4. OK. R. --- libgcc/config/arm/bpabi.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index 67246b0..927e37f 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -207,7 +207,7 @@ ARM_FUNC_START aeabi_uldivmod push_for_divide __aeabi_uldivmod /* arguments in (r0:r1), (r2:r3) and *sp */ - bl SYM(__gnu_uldivmod_helper) __PLT__ + bl SYM(__udivmoddi4) __PLT__ pop_for_divide RET cfi_end LSYM(Lend_aeabi_uldivmod)
Re: [PATCH 5/9] Optimise __aeabi_ldivmod (stack manipulation)
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_ldivmod): Optimise stack manipulation. OK. R. --- libgcc/config/arm/bpabi.S | 14 +++--- 1 file changed, 3 insertions(+), 11 deletions(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index 927e37f..3f9ece5 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -174,18 +174,10 @@ ARM_FUNC_START aeabi_ldivmod cfi_start __aeabi_ldivmod, LSYM(Lend_aeabi_ldivmod) test_div_by_zero signed - sub sp, sp, #8 -#if defined(__thumb2__) - mov ip, sp - push {ip, lr} -#else - do_push {sp, lr} -#endif -98: cfi_push 98b - __aeabi_ldivmod, 0xe, -0xc, 0x10 + push_for_divide __aeabi_ldivmod + /* arguments in (r0:r1), (r2:r3) and *sp */ bl SYM(__gnu_ldivmod_helper) __PLT__ - ldr lr, [sp, #4] - add sp, sp, #8 - do_pop {r2, r3} + pop_for_divide RET cfi_end LSYM(Lend_aeabi_ldivmod)
Re: [PATCH, rs6000] Fix PR61542 - V4SF vector extract for little endian
On Tue, Jun 17, 2014 at 6:44 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61542, a new test case (gcc.dg/vect/vect-nop-move.c) was added in 4.9. This exposes a bug on PowerPC little endian for extracting an element from a V4SF value that goes back to 4.8. The following patch fixes the problem. Tested on powerpc64le-unknown-linux-gnu with no regressions. Ok to commit to trunk? I would also like to commit to 4.8 and 4.9 as soon as possible to be picked up by the distros. This is okay everywhere. I would also like to backport gcc.dg/vect/vect-nop-move.c to 4.8 to provide regression coverage. You should ask Bernd and the RMs. Was the bug fix that prompted the new testcase backported to all targets? Thanks, David
Re: [PATCH 6/9] Optimise __aeabi_ldivmod
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_ldivmod): Perform division using __udivmoddi4, and fixups for negative operands. OK. --- libgcc/config/arm/bpabi.S | 41 - 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/libgcc/config/arm/bpabi.S b/libgcc/config/arm/bpabi.S index 3f9ece5..c044167 100644 --- a/libgcc/config/arm/bpabi.S +++ b/libgcc/config/arm/bpabi.S @@ -175,10 +175,49 @@ ARM_FUNC_START aeabi_ldivmod test_div_by_zero signed push_for_divide __aeabi_ldivmod + cmp xxh, #0 + blt 1f + cmp yyh, #0 + blt 2f + /* arguments in (r0:r1), (r2:r3) and *sp */ + bl SYM(__udivmoddi4) __PLT__ + pop_for_divide + RET + +1: /* xxh:xxl is negative */ + negs xxl, xxl + sbc xxh, xxh, xxh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + cmp yyh, #0 + blt 3f + /* arguments in (r0:r1), (r2:r3) and *sp */ + bl SYM(__udivmoddi4) __PLT__ + pop_for_divide + negs xxl, xxl + sbc xxh, xxh, xxh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + negs yyl, yyl + sbc yyh, yyh, yyh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + RET + +2: /* only yyh:yyl is negative */ + negs yyl, yyl + sbc yyh, yyh, yyh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + /* arguments in (r0:r1), (r2:r3) and *sp */ + bl SYM(__udivmoddi4) __PLT__ + pop_for_divide + negs xxl, xxl + sbc xxh, xxh, xxh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ + RET + +3: /* both xxh:xxl and yyh:yyl are negative */ + negs yyl, yyl + sbc yyh, yyh, yyh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ /* arguments in (r0:r1), (r2:r3) and *sp */ - bl SYM(__gnu_ldivmod_helper) __PLT__ + bl SYM(__udivmoddi4) __PLT__ pop_for_divide + negs yyl, yyl + sbc yyh, yyh, yyh, lsl #1 /* Thumb-2 has no RSC, so use X - 2X */ RET + cfi_end LSYM(Lend_aeabi_ldivmod) #endif /* L_aeabi_ldivmod */
Re: [PATCH 8/9] Use __udivmoddi4 for v6M aeabi_uldivmod
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi-v6m.S (__aeabi_uldivmod): Perform division using __udivmoddi4. OK. R. --- libgcc/config/arm/bpabi-v6m.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libgcc/config/arm/bpabi-v6m.S b/libgcc/config/arm/bpabi-v6m.S index 0bf2e55..d549fa6 100644 --- a/libgcc/config/arm/bpabi-v6m.S +++ b/libgcc/config/arm/bpabi-v6m.S @@ -148,7 +148,7 @@ FUNC_START aeabi_uldivmod mov r0, sp push {r0, lr} ldr r0, [sp, #8] - bl SYM(__gnu_uldivmod_helper) + bl SYM(__udivmoddi4) ldr r3, [sp, #4] mov lr, r3 add sp, sp, #8
Re: [PATCH 9/9] Remove __gnu_uldivmod_helper
On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.c (__gnu_uldivmod_helper): Remove. OK. R. --- libgcc/config/arm/bpabi.c | 14 -- 1 file changed, 14 deletions(-) diff --git a/libgcc/config/arm/bpabi.c b/libgcc/config/arm/bpabi.c index 7b155cc..e90d044 100644 --- a/libgcc/config/arm/bpabi.c +++ b/libgcc/config/arm/bpabi.c @@ -26,9 +26,6 @@ extern long long __divdi3 (long long, long long); extern unsigned long long __udivdi3 (unsigned long long, unsigned long long); extern long long __gnu_ldivmod_helper (long long, long long, long long *); -extern unsigned long long __gnu_uldivmod_helper (unsigned long long, - unsigned long long, - unsigned long long *); long long @@ -43,14 +40,3 @@ __gnu_ldivmod_helper (long long a, return quotient; } -unsigned long long -__gnu_uldivmod_helper (unsigned long long a, -unsigned long long b, -unsigned long long *remainder) -{ - unsigned long long quotient; - - quotient = __udivdi3 (a, b); - *remainder = a - b * quotient; - return quotient; -}
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 17 Jun 21:22, Bernd Schmidt wrote: On 06/17/2014 08:20 PM, Ilya Verbin wrote: I don't get this part of the plan. Where a host compiler will look for mkoffloads? E.g., first I configure/make/install the target gcc and corresponding mkoffload with the following options: --enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-unknown-linux --prefix=/install_gcc/accel_intelmic Next I configure/make/install the host gcc with: --enable-accelerator=intelmic --prefix=/install_gcc/host Try using the same prefix for both. I tried to do: 1. --enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-intelmic-linux-gnu --prefix=/install_gcc/both 2. --enable-accelerator=intelmic --prefix=/install_gcc/both In this case only bin/x86_64-intelmic-linux-gnu-accel-intelmic-gcc from accel compiler is saved. All other binaries in bin, lib, lib64, libexec are replaced by host's ones. Is there a way to have 2 working compilers and libs in the same prefix? Thanks, -- Ilya
Re: [PATCH, ARM] Enable fuse-caller-save for ARM
On Sun, Jun 1, 2014 at 12:27 PM, Tom de Vries tom_devr...@mentor.com wrote: Richard, This patch: - adds the clobbers required for TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS in CALL_INSN_FUNCTION_USAGE, - sets TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS to true, which enables the fuse-caller-save optimisation, and - adds an arm fuse-caller-save test-case. Built and tested on arm-linux-gnueabi. OK for trunk? + /* For AAPCS, IP and CC can be clobbered by veneers inserted by the + linker. We need to add these to allow + arm_call_fusage_contains_non_callee_clobbers to return true. */ Please reindent so that arm_call_fusage is on the 2nd line. Otherwise ok if no regressions. regards Ramana Thanks, - Tom
[PATCH, Testsuite, AArch64] Make Function Return Value Test More Robust
Hi, This improves the robustness of the aapcs64 test framework for testing function return ABI rules. It ensures the test facility functions now able to see the exact content of return registers right at the moment when a function returns. OK for trunk? Thanks, Yufeng gcc/testsuite Make the AAPCS64 function return tests more robust. * gcc.target/aarch64/aapcs64/abitest-2.h (saved_return_address): New global variable. (FUNC_VAL_CHECK): Update to call myfunc via the 'ret' instruction, instead of calling sequentially in the C code. * gcc.target/aarch64/aapcs64/abitest.S (LABEL_TEST_FUNC_RETURN): Store saved_return_address to the stack frame where LR register was stored. (saved_return_address): Declare weak.diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h index c56e7cc..c87fe9b 100644 --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h @@ -5,6 +5,7 @@ #include validate_memory.h void (*testfunc_ptr)(char* stack); +unsigned long long saved_return_address; /* Helper macros to generate function name. Example of the function name: func_return_val_1. */ @@ -71,6 +72,17 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, double d, type t) \ optimized away. Using i and d prevents \ warnings about unused parameters. \ */ \ +/* We save and set up the LR register in a way that essentially \ + inserts myfunc () between the returning of this function and the \ + continueous execution of its caller. By doing this, myfunc ()\ + can save and check the exact content of the registers that are\ + used forthe function return value. \ + The previous approach of sequentially calling myfunc right after \ + this function does not guarantee myfunc see the exact register\ + content, as compiler mayemit code in between the two calls, \ + especially during the -O0 codegen. 
*/\ +asm volatile (mov %0, x30 : =r (saved_return_address)); \ +asm volatile (mov x30, %0 : : r ((unsigned long long) myfunc)); \ return t;\ } #include TESTFILE @@ -84,7 +96,8 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, double d, type t) \ {\ testfunc_ptr = TEST_FUNC_NAME(id); \ FUNC_NAME(id) (0, 0.0, var); \ -myfunc (); \ +/* The above function implicitly calls myfunc () on its return,\ + and the execution resumes from here after myfunc () finishes. */\ } int main() diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S index 86ce7be..68845fb 100644 --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S @@ -50,6 +50,10 @@ LABEL_TEST_FUNC_RETURN: add x9, x9, :lo12:testfunc_ptr ldr x9, [x9, #0] blr x9 // function return value test + adrp x9, saved_return_address + add x9, x9, :lo12:saved_return_address + ldr x9, [x9, #0] + str x9, [sp, #8]// Update the copy of LR reg saved on stack LABEL_RET: ldp x0, x30, [sp] mov sp, x0 @@ -57,3 +61,4 @@ LABEL_RET: .weak testfunc .weak testfunc_ptr +.weak saved_return_address
[PATCH] rs6000: Make cr5 allocatable
A comment in rs6000.h says cr5 is not supposed to be used. I checked all ABIs, going as far back as PowerOpen (1994), and found no mention of this. Also document cr6 is used by some vector instructions. Tested on powerpc64-linux, no regressions. Okay to apply? Segher 2014-06-18 Segher Boessenkool seg...@kernel.crashing.org gcc/ * config/rs6000/rs6000.h (FIXED_REGISTERS): Update comment. Remove cr5. (REG_ALLOC_ORDER): Update comment. Move cr5 earlier. --- gcc/config/rs6000/rs6000.h | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h index 3bd0104..569ae2d 100644 --- a/gcc/config/rs6000/rs6000.h +++ b/gcc/config/rs6000/rs6000.h @@ -978,8 +978,6 @@ enum data_align { align_abi, align_opt, align_both }; On RS/6000, r1 is used for the stack. On Darwin, r2 is available as a local register; for all other OS's r2 is the TOC pointer. - cr5 is not supposed to be used. - On System V implementations, r13 is fixed and not available for use. */ #define FIXED_REGISTERS \ @@ -987,7 +985,7 @@ enum data_align { align_abi, align_opt, align_both }; 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ - 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, \ + 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, \ /* AltiVec registers. 
*/ \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \ @@ -1048,7 +1046,8 @@ enum data_align { align_abi, align_opt, align_both }; fp13 - fp2 (not saved; incoming fp arg registers) fp1 (not saved; return value) fp31 - fp14 (saved; order given to save least number) - cr7, cr6(not saved or special) + cr7, cr5(not saved or special) + cr6 (not saved, but used for vector operations) cr1 (not saved, but used for FP operations) cr0 (not saved, but used for arithmetic operations) cr4, cr3, cr2 (saved) @@ -1061,7 +1060,7 @@ enum data_align { align_abi, align_opt, align_both }; r12 (not saved; if used for DImode or DFmode would use r13) ctr (not saved; when we have the choice ctr is better) lr (saved) - cr5, r1, r2, ap, ca (fixed) + r1, r2, ap, ca (fixed) v0 - v1 (not saved or used for anything) v13 - v3(not saved; incoming vector arg registers) v2 (not saved; incoming vector arg reg; return value) @@ -1099,14 +1098,14 @@ enum data_align { align_abi, align_opt, align_both }; 33, \ 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, \ 50, 49, 48, 47, 46, \ - 75, 74, 69, 68, 72, 71, 70, \ + 75, 73, 74, 69, 68, 72, 71, 70, \ MAYBE_R2_AVAILABLE \ 9, 10, 8, 7, 6, 5, 4, \ 3, EARLY_R12 11, 0, \ 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, \ 18, 17, 16, 15, 14, 13, LATE_R12\ 66, 65, \ - 73, 1, MAYBE_R2_FIXED 67, 76, \ + 1, MAYBE_R2_FIXED 67, 76, \ /* AltiVec registers. */ \ 77, 78, \ 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, \ -- 1.8.1.4
Re: [PATCH, Testsuite, AArch64] Make Function Return Value Test More Robust
On 18/06/14 15:16, Yufeng Zhang wrote: Hi, This improves the robustness of the aapcs64 test framework for testing function return ABI rules. It ensures the test facility functions now able to see the exact content of return registers right at the moment when a function returns. OK for trunk? OK once the issues with the comment are clarified. R. Thanks, Yufeng gcc/testsuite Make the AAPCS64 function return tests more robust. * gcc.target/aarch64/aapcs64/abitest-2.h (saved_return_address): New global variable. (FUNC_VAL_CHECK): Update to call myfunc via the 'ret' instruction, instead of calling sequentially in the C code. * gcc.target/aarch64/aapcs64/abitest.S (LABEL_TEST_FUNC_RETURN): Store saved_return_address to the stack frame where LR register was stored. (saved_return_address): Declare weak. patch diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h index c56e7cc..c87fe9b 100644 --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h @@ -5,6 +5,7 @@ #include validate_memory.h void (*testfunc_ptr)(char* stack); +unsigned long long saved_return_address; /* Helper macros to generate function name. Example of the function name: func_return_val_1. */ @@ -71,6 +72,17 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, double d, type t) \ optimized away. Using i and d prevents \ warnings about unused parameters. \ */ \ +/* We save and set up the LR register in a way that essentially\ + inserts myfunc () between the returning of this function and the \ s/returning/return/ + continueous execution of its caller. By doing this, myfunc () \ Typo: continueous. Do you mean continuing? + can save and check the exact content of the registers that are \ + used for the function return value. \ stray tab. 
+ The previous approach of sequentially calling myfunc right after \ + this function does not guarantee myfunc see the exact register \ + content, as compiler may emit code in between the two calls, \ Similarly. + especially during the -O0 codegen. */ \ +asm volatile (mov %0, x30 : =r (saved_return_address)); \ +asm volatile (mov x30, %0 : : r ((unsigned long long) myfunc)); \ return t; \ } #include TESTFILE @@ -84,7 +96,8 @@ __attribute__ ((noinline)) type FUNC_NAME (id) (int i, double d, type t) \ { \ testfunc_ptr = TEST_FUNC_NAME(id); \ FUNC_NAME(id) (0, 0.0, var); \ -myfunc (); \ +/* The above function implicitly calls myfunc () on its return, \ + and the execution resumes from here after myfunc () finishes. */\ } int main() diff --git a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S index 86ce7be..68845fb 100644 --- a/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S +++ b/gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S @@ -50,6 +50,10 @@ LABEL_TEST_FUNC_RETURN: addx9, x9, :lo12:testfunc_ptr ldrx9, [x9, #0] blrx9 // function return value test + adrp x9, saved_return_address + addx9, x9, :lo12:saved_return_address + ldrx9, [x9, #0] + strx9, [sp, #8]// Update the copy of LR reg saved on stack LABEL_RET: ldpx0, x30, [sp] movsp, x0 @@ -57,3 +61,4 @@ LABEL_RET: .weaktestfunc .weaktestfunc_ptr +.weaksaved_return_address
Re: [PATCH] [ARM] Post-indexed addressing for NEON memory access
On 18 June 2014 11:06, Ramana Radhakrishnan ramana@googlemail.com wrote: 2014-06-15 Charles Baylis charles.ba...@linaro.org * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with embedded side effects. I'm not too thrilled with putting in more special cases that are not table driven in there. Can you file a PR with some testcases that show this so that we don't forget and CC me on it please ? I created PR61551 and CC'd.
Re: Fwd: [RFC][gomp4] Offloading patches (2/3): Add tables generation
On 06/18/2014 04:13 PM, Ilya Verbin wrote:

On 17 Jun 21:22, Bernd Schmidt wrote:
On 06/17/2014 08:20 PM, Ilya Verbin wrote:

I don't get this part of the plan. Where will a host compiler look for mkoffloads? E.g., first I configure/make/install the target gcc and corresponding mkoffload with the following options:
  --enable-accelerator=intelmic
  --enable-as-accelerator-for=x86_64-unknown-linux
  --prefix=/install_gcc/accel_intelmic
Next I configure/make/install the host gcc with:
  --enable-accelerator=intelmic
  --prefix=/install_gcc/host

Try using the same prefix for both.

I tried to do:
1. --enable-accelerator=intelmic --enable-as-accelerator-for=x86_64-intelmic-linux-gnu --prefix=/install_gcc/both
2. --enable-accelerator=intelmic --prefix=/install_gcc/both
In this case only bin/x86_64-intelmic-linux-gnu-accel-intelmic-gcc from the accel compiler is saved. All other binaries in bin, lib, lib64, libexec are replaced by the host's ones. Is there a way to have 2 working compilers and libs in the same prefix?

Sure, as long as the target triplet is different. What I think you need to do is:

For the first compiler:
  --enable-as-accelerator-for=x86_64-pc-linux-gnu
  --target=x86_64-intelmic-linux-gnu
  --prefix=/somewhere
Build and install, then:
For the second:
  configure --enable-offload-targets=x86_64-intelmic-linux-gnu x86_64-pc-linux-gnu --prefix=/somewhere
No --enable-accelerator options at all. This should work; if it doesn't, let me know what you find in /somewhere after installation for both compilers.

Bernd
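Bernd's recipe, restated as two configure invocations (triplets and the prefix are taken from the thread; the source path and make steps are illustrative assumptions, so treat this as a sketch rather than verified build documentation):

```shell
# First: the accelerator compiler, installed into the shared prefix.
/path/to/gcc/configure \
    --target=x86_64-intelmic-linux-gnu \
    --enable-as-accelerator-for=x86_64-pc-linux-gnu \
    --prefix=/somewhere
make && make install

# Second: the host compiler, same prefix, no --enable-accelerator option.
/path/to/gcc/configure \
    --enable-offload-targets=x86_64-intelmic-linux-gnu \
    --prefix=/somewhere
make && make install
```

Because the two installations use different target triplets, their binaries and libraries can coexist under the one prefix.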
Re: [PATCH, AARCH64] Enable fuse-caller-save for AARCH64
On 1 June 2014 11:00, Tom de Vries tom_devr...@mentor.com wrote: Richard, This patch: - adds the for TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS required clobbers in CALL_INSN_FUNCTION_USAGE, - sets TARGET_CALL_FUSAGE_CONTAINS_NON_CALLEE_CLOBBERS to true, which enables the fuse-caller-save optimisation, and - adds an aarch64 fuse-caller-save test-case. Build and tested on aarch64-linux-gnu. OK for trunk? Thanks, - Tom OK /Marcus
Formatting fixes for (gccint) Standard Names
Tested with make info and installed as obvious. Andreas. * doc/md.texi (Standard Names): Use @itemx for grouped items. Remove blank line after @item. diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index e17ffca..1c3a326 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -4835,7 +4835,7 @@ and the scalar result is stored in the least significant bits of operand 0 @cindex @code{sdot_prod@var{m}} instruction pattern @item @samp{sdot_prod@var{m}} @cindex @code{udot_prod@var{m}} instruction pattern -@item @samp{udot_prod@var{m}} +@itemx @samp{udot_prod@var{m}} Compute the sum of the products of two signed/unsigned elements. Operand 1 and operand 2 are of the same mode. Their product, which is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode equal or @@ -4845,7 +4845,7 @@ is of the same mode as operand 3. @cindex @code{ssum_widen@var{m3}} instruction pattern @item @samp{ssum_widen@var{m3}} @cindex @code{usum_widen@var{m3}} instruction pattern -@item @samp{usum_widen@var{m3}} +@itemx @samp{usum_widen@var{m3}} Operands 0 and 2 are of the same mode, which is wider than the mode of operand 1. Add operand 1 to operand 2 and place the widened result in operand 0. (This is used express accumulation of elements into an accumulator @@ -6218,7 +6218,6 @@ A typical @code{ctrap} pattern looks like @cindex @code{prefetch} instruction pattern @item @samp{prefetch} - This pattern, if defined, emits code for a non-faulting data prefetch instruction. Operand 0 is the address of the memory to prefetch. Operand 1 is a constant 1 if the prefetch is preparing for a write to the memory @@ -6234,7 +6233,6 @@ the values of operands 1 and 2. @cindex @code{blockage} instruction pattern @item @samp{blockage} - This pattern defines a pseudo insn that prevents the instruction scheduler and other passes from moving instructions and using register equivalences across the boundary defined by the blockage insn. 
@@ -6242,7 +6240,6 @@ This needs to be an UNSPEC_VOLATILE pattern or a volatile ASM. @cindex @code{memory_barrier} instruction pattern @item @samp{memory_barrier} - If the target memory model is not fully synchronous, then this pattern should be defined to an instruction that orders both loads and stores before the instruction with respect to loads and stores after the instruction. @@ -6250,7 +6247,6 @@ This pattern has no operands. @cindex @code{sync_compare_and_swap@var{mode}} instruction pattern @item @samp{sync_compare_and_swap@var{mode}} - This pattern, if defined, emits code for an atomic compare-and-swap operation. Operand 1 is the memory on which the atomic operation is performed. Operand 2 is the ``old'' value to be compared against the @@ -6299,7 +6295,6 @@ interruptable locking. @item @samp{sync_add@var{mode}}, @samp{sync_sub@var{mode}} @itemx @samp{sync_ior@var{mode}}, @samp{sync_and@var{mode}} @itemx @samp{sync_xor@var{mode}}, @samp{sync_nand@var{mode}} - These patterns emit code for an atomic operation on memory. Operand 0 is the memory on which the atomic operation is performed. Operand 1 is the second operand to the binary operator. @@ -6321,7 +6316,6 @@ from a compare-and-swap operation, if defined. @item @samp{sync_old_add@var{mode}}, @samp{sync_old_sub@var{mode}} @itemx @samp{sync_old_ior@var{mode}}, @samp{sync_old_and@var{mode}} @itemx @samp{sync_old_xor@var{mode}}, @samp{sync_old_nand@var{mode}} - These patterns emit code for an atomic operation on memory, and return the value that the memory contained before the operation. Operand 0 is the result value, operand 1 is the memory on which the @@ -6345,14 +6339,12 @@ from a compare-and-swap operation, if defined. 
@item @samp{sync_new_add@var{mode}}, @samp{sync_new_sub@var{mode}} @itemx @samp{sync_new_ior@var{mode}}, @samp{sync_new_and@var{mode}} @itemx @samp{sync_new_xor@var{mode}}, @samp{sync_new_nand@var{mode}} - These patterns are like their @code{sync_old_@var{op}} counterparts, except that they return the value that exists in the memory location after the operation, rather than before the operation. @cindex @code{sync_lock_test_and_set@var{mode}} instruction pattern @item @samp{sync_lock_test_and_set@var{mode}} - This pattern takes two forms, based on the capabilities of the target. In either case, operand 0 is the result of the operand, operand 1 is the memory on which the atomic operation is performed, and operand 2 @@ -6377,7 +6369,6 @@ a compare-and-swap operation, if defined. @cindex @code{sync_lock_release@var{mode}} instruction pattern @item @samp{sync_lock_release@var{mode}} - This pattern, if defined, releases a lock set by @code{sync_lock_test_and_set@var{mode}}. Operand 0 is the memory that contains the lock; operand 1 is the value to store in the lock. @@ -6467,7 +6458,6 @@ compare and swap loop. @item @samp{atomic_add@var{mode}}, @samp{atomic_sub@var{mode}} @itemx @samp{atomic_or@var{mode}}, @samp{atomic_and@var{mode}}
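For reference, the Texinfo rule the patch applies: within a @table, the first name of a group uses @item and every additional name sharing the same description uses @itemx, so the grouped names render together over one description. A minimal fragment (not taken from md.texi):

```texinfo
@table @samp
@item sdot_prod@var{m}
@itemx udot_prod@var{m}
One description shared by both grouped names.
@end table
```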
Re: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
On Wed, Jun 18, 2014 at 12:55:01PM +0800, Thomas Preud'homme wrote:

@@ -1646,20 +1648,23 @@ do_shift_rotate (enum tree_code code,
       n->n >>= count;
       break;
     case RSHIFT_EXPR:
+      /* Arithmetic shift of signed type: result is dependent on the value.  */
+      if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8))))
+	return false;

Looks like here an undefined behavior happens:

tree-ssa-math-opts.c:1672:53: runtime error: shift exponent 56 is too large for 32-bit type 'int'

	Marek
Re: [PATCH 1/9] Whitespace
On 11 June 2014 13:55, Richard Earnshaw rearn...@arm.com wrote: On 11/06/14 11:19, Charles Baylis wrote: 2014-05-22 Charles Baylis charles.bay...@linaro.org * config/arm/bpabi.S (__aeabi_uldivmod): Fix whitespace. (__aeabi_ldivmod): Fix whitespace. This is OK, but please wait until the others are ready to go in. The series is now committed as r211789-r211797.
Re: [PATCH, PR 61540] Do not ICE on impossible devirtualization
On 18 June 2014 10:24:16 Martin Jambor mjam...@suse.cz wrote:

@@ -3002,10 +3014,8 @@ try_make_edge_direct_virtual_call (struct cgraph_edge *ie,

   if (target)
     {
-#ifdef ENABLE_CHECKING
-      gcc_assert (possible_polymorphic_call_target_p
-		  (ie, cgraph_get_node (target)));
-#endif
+      if (!possible_polymorphic_call_target_p (ie, cgraph_get_node (target)))
+	return ipa_make_edge_direct_to_target (ie, target);
       return ipa_make_edge_direct_to_target (ie, target);
     }

The above looks odd. You return the same thing both conditionally and unconditionally?

Thanks,

   else

Sent with AquaMail for Android
http://www.aqua-mail.com
[PATCH, ARM] Improve code-gen for multiple shifted accumulations in array indexing
Hi, This patch improves the code-gen of -marm in the case of two-dimensional array access. Given the following code: typedef struct { int x,y,a,b; } X; int f7a(X p[][4], int x, int y) { return p[x][y].a; } The code-gen on -O2 -marm -mcpu=cortex-a15 is currently mov r2, r2, asl #4 add r1, r2, r1, asl #6 add r0, r0, r1 ldr r0, [r0, #8] bx lr With the patch, we'll get: add r1, r0, r1, lsl #6 add r2, r1, r2, lsl #4 ldr r0, [r2, #8] bx lr The -mthumb code-gen had been OK. The patch has passed the bootstrapping on cortex-a15 and the arm-none-eabi regtest, with no code-gen difference in spec2k (unfortunately). OK for the trunk? Thanks, Yufeng gcc/ * config/arm/arm.c (arm_reassoc_shifts_in_address): New declaration and new function. (arm_legitimize_address): Call the new functions. (thumb_legitimize_address): Prefix the declaration with static. gcc/testsuite/ * gcc.target/arm/shifted-add-1.c: New test. * gcc.target/arm/shifted-add-2.c: Ditto.
Re: [PATCH, ARM] Improve code-gen for multiple shifted accumulations in array indexing
This time with patch... Apologize. Yufeng On 06/18/14 17:31, Yufeng Zhang wrote: Hi, This patch improves the code-gen of -marm in the case of two-dimensional array access. Given the following code: typedef struct { int x,y,a,b; } X; int f7a(X p[][4], int x, int y) { return p[x][y].a; } The code-gen on -O2 -marm -mcpu=cortex-a15 is currently mov r2, r2, asl #4 add r1, r2, r1, asl #6 add r0, r0, r1 ldr r0, [r0, #8] bx lr With the patch, we'll get: add r1, r0, r1, lsl #6 add r2, r1, r2, lsl #4 ldr r0, [r2, #8] bx lr The -mthumb code-gen had been OK. The patch has passed the bootstrapping on cortex-a15 and the arm-none-eabi regtest, with no code-gen difference in spec2k (unfortunately). OK for the trunk? Thanks, Yufeng gcc/ * config/arm/arm.c (arm_reassoc_shifts_in_address): New declaration and new function. (arm_legitimize_address): Call the new functions. (thumb_legitimize_address): Prefix the declaration with static. gcc/testsuite/ * gcc.target/arm/shifted-add-1.c: New test. * gcc.target/arm/shifted-add-2.c: Ditto. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 16fc7ed..281c96a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -88,6 +88,7 @@ static int thumb1_base_register_rtx_p (rtx, enum machine_mode, int); static rtx arm_legitimize_address (rtx, rtx, enum machine_mode); static reg_class_t arm_preferred_reload_class (rtx, reg_class_t); static rtx thumb_legitimize_address (rtx, rtx, enum machine_mode); +static void arm_reassoc_shifts_in_address (rtx); inline static int thumb1_index_register_rtx_p (rtx, int); static bool arm_legitimate_address_p (enum machine_mode, rtx, bool); static int thumb_far_jump_used_p (void); @@ -7501,7 +7502,8 @@ arm_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode) { /* TODO: legitimize_address for Thumb2. 
*/
   if (TARGET_THUMB2)
-    return x;
+    return x;
+
   return thumb_legitimize_address (x, orig_x, mode);
 }

@@ -7551,6 +7553,9 @@ arm_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode)
	}
       else if (xop0 != XEXP (x, 0) || xop1 != XEXP (x, 1))
	x = gen_rtx_PLUS (SImode, xop0, xop1);
+
+      if (GET_CODE (xop0) == PLUS)
+	arm_reassoc_shifts_in_address (xop0);
     }

   /* XXX We don't allow MINUS any more -- see comment in
@@ -7614,7 +7619,8 @@ arm_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode)

 /* Try machine-dependent ways of modifying an illegitimate Thumb address
    to be legitimate.  If we find one, return the new, valid address.  */
-rtx
+
+static rtx
 thumb_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode)
 {
   if (GET_CODE (x) == PLUS
@@ -7679,6 +7685,47 @@ thumb_legitimize_address (rtx x, rtx orig_x, enum machine_mode mode)
   return x;
 }

+/* Transform
+     PLUS (PLUS (MULT1, MULT2), REG)
+   to
+     PLUS (PLUS (MULT1, REG), MULT2)
+   so that we can use two add (shifted register) instructions
+   to compute the expression.  Note that SHIFTs has already
+   been replaced with MULTs as a result of canonicalization.
+
+   This routine is to help undo the undesired canonicalization
+   that is done by simplify_gen_binary on addresses with
+   multiple shifts.  For example, it will help transform
+     (x << 6) + (y << 4) + p + 8
+   back to:
+     (x << 6) + p + (y << 4) + 8
+   where p is the start address of a two-dimensional array and
+   x and y are the indexes.  */
+
+static void
+arm_reassoc_shifts_in_address (rtx x)
+{
+  if (GET_CODE (x) == PLUS)
+    {
+      rtx op0 = XEXP (x, 0);
+      rtx op1 = XEXP (x, 1);
+
+      if (GET_CODE (op0) == PLUS && REG_P (op1))
+	{
+	  rtx xop0 = XEXP (op0, 0);
+	  rtx xop1 = XEXP (op0, 1);
+
+	  if (GET_CODE (xop0) == MULT && GET_CODE (xop1) == MULT
+	      && power_of_two_operand (XEXP (xop0, 1), GET_MODE (xop0))
+	      && power_of_two_operand (XEXP (xop1, 1), GET_MODE (xop1)))
+	    {
+	      XEXP (op0, 1) = op1;
+	      XEXP (x, 1) = xop1;
+	    }
+	}
+    }
+}
+
 bool
 arm_legitimize_reload_address (rtx *p, enum machine_mode mode,
diff --git a/gcc/testsuite/gcc.target/arm/shifted-add-1.c b/gcc/testsuite/gcc.target/arm/shifted-add-1.c
new file mode 100644
index 000..8777fe4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/shifted-add-1.c
@@ -0,0 +1,47 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2" } */
+
+typedef struct { int x,y,a,b; } x;
+
+int
+f7a(x p[][4], int x, int y)
+{
+  return p[x][y].a;
+}
+
+/* { dg-final { object-size text <= 16 { target { { ! arm_thumb1 } && { ! arm_thumb2 } } } } } */
+/* { dg-final { object-size text <= 12 { target arm_thumb2 } } } */
+
+
+/* For arm code-gen, expect four instructions like:
+
+   0:	e0801301	add	r1, r0,
RE: [PATCH, rs6000] Fix PR61542 - V4SF vector extract for little endian
Hi, On Wed, 18 Jun 2014 09:56:15, David Edelsohn wrote: On Tue, Jun 17, 2014 at 6:44 PM, BIll Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61542, a new test case (gcc.dg/vect/vect-nop-move.c) was added in 4.9. This exposes a bug on PowerPC little endian for extracting an element from a V4SF value that goes back to 4.8. The following patch fixes the problem. Tested on powerpc64le-unknown-linux-gnu with no regressions. Ok to commit to trunk? I would also like to commit to 4.8 and 4.9 as soon as possible to be picked up by the distros. This is okay everywhere. I would also like to backport gcc.dg/vect/vect-nop-move.c to 4.8 to provide regression coverage. You should ask Bernd and the RMs. Was the bug fix that prompted the new testcase backported to all targets? Thanks, David actually I only added the check_vect to that test case, but that exposed a bug on Solaris-9. See https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=207668. That was in the -fdump-rtl-combine-details handling, where fprintf got a NULL value passed for %s, which ICEs on Solaris9. So if you backport that test case, be sure to check that one too. Originally the test case seems to check something for the aarch64-target. See https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=205712. Obviously the patch in rtlanal.c (set_noop_p) was never backported to the 4.8 branch. Maybe Tejas who originally wrote that test case, can explain, if it makes sense to backport this fix too. Thanks Bernd.
Re: [PATCH] PR61123 : Fix the ABI mis-matching error caused by LTO
On Jun 18, 2014, at 3:22 AM, Richard Biener richard.guent...@gmail.com wrote:

Space after the *.

I think you don't need to copy the LTO harness but you can simply use dg.exp and sth similar to gcc.dg/20081223-1.c (there is an effective target 'lto' to guard for lto support). So simply place the testcase in gcc.target/arm/ (make sure to put a dg-do compile on the 2nd file and use dg-additional-sources). If that doesn't work I'd say put the testcase in gcc.dg/lto/ instead and do a dg-skip-if for non-arm targets. Ok with one of those changes.

Oh, I see you need a new object-readelf ... I defer to a testsuite maintainer for this part.

The testsuite bits are Ok. My guidance on the test suite would be this: all lto test cases in .*lto directories; 20 or fewer test cases for a given target, in the main lto directory; more than 50, in the arm/lto directory. When one is tracking down bugs and trying to clean up test suite results if they break, it is nice to be able to skip en masse all lto bugs first, resolve all non-lto issues, and then come back to the lto issues last, in hopes that they are all then resolved. Also, if one is redoing lto bits, and a test case with lto in the name pops up as a regression, and you're not an lto person, you can stop thinking about it and just pass it to the lto person; it is a slightly different mindset. :-)
Re: [PATCH] Fix PR61306: improve handling of sign and cast in bswap
On Wed, Jun 18, 2014 at 05:23:13PM +0200, Marek Polacek wrote:

On Wed, Jun 18, 2014 at 12:55:01PM +0800, Thomas Preud'homme wrote:

@@ -1646,20 +1648,23 @@ do_shift_rotate (enum tree_code code,
       n->n >>= count;
       break;
     case RSHIFT_EXPR:
+      /* Arithmetic shift of signed type: result is dependent on the value.  */
+      if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8))))
+	return false;

Looks like here an undefined behavior happens:

tree-ssa-math-opts.c:1672:53: runtime error: shift exponent 56 is too large for 32-bit type 'int'

Seems there are actually two spots with this, not just one. Completely untested fix:

2014-06-18  Jakub Jelinek  ja...@redhat.com

	* tree-ssa-math-opts.c (do_shift_rotate, find_bswap_or_nop_1):
	Cast 0xff to uint64_t before shifting it up.

--- gcc/tree-ssa-math-opts.c	2014-06-13 08:08:42.354136356 +0200
+++ gcc/tree-ssa-math-opts.c	2014-06-18 19:50:59.486916201 +0200
@@ -1669,7 +1669,8 @@ do_shift_rotate (enum tree_code code,
       break;
     case RSHIFT_EXPR:
       /* Arithmetic shift of signed type: result is dependent on the value.  */
-      if (!TYPE_UNSIGNED (n->type) && (n->n & (0xff << (bitsize - 8))))
+      if (!TYPE_UNSIGNED (n->type)
+	  && (n->n & ((uint64_t) 0xff << (bitsize - 8))))
	return false;
       n->n >>= count;
       break;
@@ -1903,7 +1904,7 @@ find_bswap_or_nop_1 (gimple stmt, struct
	  old_type_size = TYPE_PRECISION (n->type);
	  if (!TYPE_UNSIGNED (n->type) && type_size > old_type_size
-	      && n->n & (0xff << (old_type_size - 8)))
+	      && n->n & ((uint64_t) 0xff << (old_type_size - 8)))
	    return NULL_TREE;
	  if (type_size / BITS_PER_UNIT < (int)(sizeof (int64_t)))

	Jakub
Re: [PATCH] rs6000: Make cr5 allocatable
On Jun 18, 2014, at 7:14 AM, Segher Boessenkool seg...@kernel.crashing.org wrote: A comment in rs6000.h says cr5 is not supposed to be used. I checked all ABIs, I usually do a blame and find that change that added it… Ah, there it is, r341… let’s see, rms did it! :-) Oh well… never mind. Kinda amazing the bits lost in time.
[GSoC] Addition of ISL AST generation to Graphite
These patches add ISL AST generation to graphite, which can be chosen by the fgraphite-code-generator=[isl|cloog] switch. The first patch makes initial renaming of gloog and gloog_error to graphite_regenerate_ast_cloog and graphite_regenerate_error, respectively. The second one adds new files with generation of ISL AST, new switch, new testcase that checks that the dump is generated. Is it fine for trunk? P.S. My copyright assignment has been already processed. -- Cheers, Roman Gareev ChangeLog_entry1 Description: Binary data ChangeLog_entry2 Description: Binary data patch1 Description: Binary data patch2 Description: Binary data
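A sketch of how the new option would be driven from the command line (the option name and its values are taken from the patch; the source file name is an illustrative assumption, and a graphite-enabled build of GCC is required):

```shell
# Regenerate code through the new ISL AST generator:
gcc -O2 -fgraphite-identity -fgraphite-code-generator=isl loop.c -o loop

# Or keep the existing CLooG path:
gcc -O2 -fgraphite-identity -fgraphite-code-generator=cloog loop.c -o loop
```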
Re: [GSoC] Addition of ISL AST generation to Graphite
On 18/06/2014 21:00, Roman Gareev wrote: These patches add ISL AST generation to graphite, which can be chosen by the fgraphite-code-generator=[isl|cloog] switch. The first patch makes initial renaming of gloog and gloog_error to graphite_regenerate_ast_cloog and graphite_regenerate_error, respectively. The second one adds new files with generation of ISL AST, new switch, new testcase that checks that the dump is generated. Is it fine for trunk? I went over this from the graphite side and it looks fine. However, as I did not commit for a while to gcc, it would be great if someone else could have a look. Cheers, Tobias
Re: [PATCH 1/5] New Identical Code Folding IPA pass
On 06/13/14 04:24, mliska wrote:

You may ask why GNU GCC needs such a new optimization. The compiler, having simply better knowledge of a compiled source file, is capable of reaching better results, especially if Link-Time Optimization is enabled. Apart from that, the GCC implementation adds support for read-only variables like construction vtables (mentioned in: http://hubicka.blogspot.cz/2014/02/devirtualization-in-c-part-3-building.html).

Can you outline at a high level cases where GCC's knowledge allows it to reach a better result? Is it because you're not requiring bit-for-bit identical code, but that the code merely be semantically equivalent?

The GCC driven ICF seems to pick up 2X more opportunities than the gold driven ICF. But if I'm reading everything correctly, that includes ICF of both functions and variables.

There are important differences between in-GCC ICF and gold's ICF. Basically:

- GCC ICF runs before most of the context-sensitive optimizations, so it does see code that is identical to start with but would become different during optimization. For example, if you have functions a1...a1000 calling functions b1...b1000, where all bX are the same but all aX differ, then before inlining one can easily unify b and let the inliner's heuristic decide whether it is a good idea to duplicate the body of b, while after inlining this is no longer possible. We don't do much in this respect, but we should try to unify accidental code duplication early in the early passes, so as not to let duplicates bubble up to the late optimizations, where they may or may not be caught by e.g. tail merging.

- GCC ICF can (though it doesn't in the current implementation) do value-numbering matching and match identical semantics with different implementations. It is the plan to get smarter here; I just wanted to have something working first and then play with more advanced tricks.

- GCC ICF sees some things as different while they are not in the final assembly: types, alias classes and other details that are important for GCC but lost in codegen. So here gold can do better work.

- Theoretically, if tuned well, GCC ICF could improve compilation speed by avoiding the need to optimize duplicates.

- Gold's ICF depends on function sections, which are not free.

- GCC ICF can be smarter about objects with their address taken: we need an analysis deciding when the address can be compared with a different address. This would be useful in other places, too.

Honza

Do you have any sense of how those improvements break down? I.e., is it mostly more functions you're finding as identical, and if so, what is it about the GCC implementation that allows us to find more ICF opportunities? If it's mostly variables, that's fine too. I'm just trying to understand where the improvements are coming from.

Jeff
Re: [patch i386]: Combine memory and indirect jump
On 06/17/14 14:35, Kai Tietz wrote: I just did retest my testcase with recent source. I can't reproduce this missed optimization before sched2 pass anymore. I moved second peephole2 pass just before split_before_sched2 and everything got caught. Let's go with this if your idea of using a define_split doesn't work out. To remove first peephole2 pass seems to cause weaker code for impossible pushes, etc OK. Nevertheless it might be a point to make this new peephole instead a define_split? I admit that this operation isn't a split, nevertheless we would avoid a second peephole pass. Doesn't hurt to try and as you say, if we can avoid a 2nd peep2 pass, that's good. jeff
Re: [PATCH] rs6000: Make cr5 allocatable
On Wed, Jun 18, 2014 at 10:14 AM, Segher Boessenkool seg...@kernel.crashing.org wrote: A comment in rs6000.h says cr5 is not supposed to be used. I checked all ABIs, going as far back as PowerOpen (1994), and found no mention of this. Also document cr6 is used by some vector instructions. Tested on powerpc64-linux, no regressions. Okay to apply? Segher 2014-06-18 Segher Boessenkool seg...@kernel.crashing.org gcc/ * config/rs6000/rs6000.h (FIXED_REGISTERS): Update comment. Remove cr5. (REG_ALLOC_ORDER): Update comment. Move cr5 earlier. This is okay. I have no idea why RMS assumed that cr5 is fixed. Thanks, David
RE: RFA: Make LRA temporarily eliminate addresses before testing constraints
On 2014-06-16, 12:12 PM, Robert Suchanek wrote: Pinging for approval. This part of the patch will be needed for MIPS16. The second part to enable LRA in MIPS has been already approved. Sorry, Robert. I thought you are waiting for some Richard's comment (actually he knows the code well and wrote address decoding in rtlanal.c). The patch is ok for me and makes LRA even more portable as it adds a new profitable address transformation and the code can be useful for other targets too. Thanks. Core LRA change committed as: r211802 MIPS LRA committed as: r211805 Matthew
[PATCH] dwarf2out.c: Pass DWARF type modifiers around as flags argument.
modified_type_die and add_type_attribute take two separate arguments for whether the type should be const and/or volatile. To help add more type modifiers (DWARFv3 added restrict_type [PR debug/59051], and DWARFv5 has proposals for atomic_type and aligned_type), pass the requested modifiers as one flags value to these functions, and introduce helper functions dw_mod_type_flags and dw_mod_decl_flags to easily extract the modifiers from type and declaration trees. The new modifiers will hopefully be easier to implement based on this change.

gcc/ChangeLog

	* dwarf2out.h (enum dw_mod_flag): New enum.
	* dwarf2out.c (dw_mod_decl_flags): New function.
	(dw_mod_type_flags): Likewise.
	(modified_type_die): Take one modifiers flag argument instead of
	one for const and one for volatile.
	(add_type_attribute): Likewise.
	(generic_parameter_die): Call add_type_attribute with one modifier
	argument.
	(base_type_for_mode): Likewise.
	(add_bounds_info): Likewise.
	(add_subscript_info): Likewise.
	(gen_array_type_die): Likewise.
	(gen_descr_array_type_die): Likewise.
	(gen_entry_point_die): Likewise.
	(gen_enumeration_type_die): Likewise.
	(gen_formal_parameter_die): Likewise.
	(gen_subprogram_die): Likewise.
	(gen_variable_die): Likewise.
	(gen_const_die): Likewise.
	(gen_field_die): Likewise.
	(gen_pointer_type_die): Likewise.
	(gen_reference_type_die): Likewise.
	(gen_ptr_to_mbr_type_die): Likewise.
	(gen_inheritance_die): Likewise.
	(gen_subroutine_type_die): Likewise.
	(gen_typedef_die): Likewise.
	(force_type_die): Likewise.
---
 gcc/ChangeLog   |  30
 gcc/dwarf2out.c | 133 ++-
 gcc/dwarf2out.h |   8 +++
 3 files changed, 110 insertions(+), 61 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2d0a07c..d7ee868 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,33 @@
+2014-06-18  Mark Wielaard  m...@redhat.com
+
+	* dwarf2out.h (enum dw_mod_flag): New enum.
+	* dwarf2out.c (dw_mod_decl_flags): New function.
+	(dw_mod_type_flags): Likewise.
+ (modified_type_die): Take one modifiers flag argument instead of + one for const and one for volatile. + (add_type_attribute): Likewise. + (generic_parameter_die): Call add_type_attribute with one modifier + argument. + (base_type_for_mode): Likewise. + (add_bounds_info): Likewise. + (add_subscript_info): Likewise. + (gen_array_type_die): Likewise. + (gen_descr_array_type_die): Likewise. + (gen_entry_point_die): Likewise. + (gen_enumeration_type_die): Likewise. + (gen_formal_parameter_die): Likewise. + (gen_subprogram_die): Likewise. + (gen_variable_die): Likewise. + (gen_const_die): Likewise. + (gen_field_die): Likewise. + (gen_pointer_type_die): Likewise. + (gen_reference_type_die): Likewise. + (gen_ptr_to_mbr_type_die): Likewise. + (gen_inheritance_die): Likewise. + (gen_subroutine_type_die): Likewise. + (gen_typedef_die): Likewise. + (force_type_die): Likewise. + 2014-06-18 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm_neon.h (vadd_f32): Change #ifdef to __FAST_MATH. diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 933ec62..0216801 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3140,7 +3140,9 @@ static void output_file_names (void); static dw_die_ref base_type_die (tree); static int is_base_type (tree); static dw_die_ref subrange_type_die (tree, tree, tree, dw_die_ref); -static dw_die_ref modified_type_die (tree, int, int, dw_die_ref); +static int dw_mod_decl_flags (const_tree); +static int dw_mod_type_flags (const_tree); +static dw_die_ref modified_type_die (tree, int, dw_die_ref); static dw_die_ref generic_parameter_die (tree, tree, bool, dw_die_ref); static dw_die_ref template_parameter_pack_die (tree, tree, dw_die_ref); static int type_is_enum (const_tree); @@ -3198,7 +3200,7 @@ static dw_die_ref scope_die_for (tree, dw_die_ref); static inline int local_scope_p (dw_die_ref); static inline int class_scope_p (dw_die_ref); static inline int class_or_namespace_scope_p (dw_die_ref); -static void add_type_attribute (dw_die_ref, tree, int, 
int, dw_die_ref); +static void add_type_attribute (dw_die_ref, tree, int, dw_die_ref); static void add_calling_convention_attribute (dw_die_ref, tree); static const char *type_tag (const_tree); static tree member_declared_type (const_tree); @@ -10498,12 +10500,25 @@ subrange_type_die (tree type, tree low, tree high, dw_die_ref context_die) return subrange_die; } +static int +dw_mod_decl_flags (const_tree decl) +{ + return ((TREE_READONLY (decl) ? dw_mod_const : dw_mod_none) + | (TREE_THIS_VOLATILE (decl) ? dw_mod_volatile :