Re: [Patch,avr]: Remove -mshort-calls option

2012-10-07 Thread Joerg Wunsch
As Georg-Johann Lay wrote:

> As already discussed, this patch removes the -mshort-calls command
> option from avr-gcc.

> Ok to apply?

I'm more than happy with removing it.
-- 
cheers, J"org   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)


Re: PING Re: [PATCH, MIPS] add new peephole for 74k dspr2

2012-10-07 Thread Richard Sandiford
"Maciej W. Rozycki"  writes:
> On Tue, 25 Sep 2012, Richard Sandiford wrote:
>> Although I see the 4kp with its 32-cycle MULTs and MADDs is one where
>> MULT $0,$0 would be a really bad choice.  Sigh.  The amount of effort
>> required for this optimisation is getting a bit ridiculous.
>
>  I have double-checked some documentation, and in fact many MIPS cores, 
> including the current ones, have a configuration option to include either 
> a high-performance or an area-efficient MD unit.  Take the M14Kc for 
> example -- its high-performance unit has a one-cycle latency/issue rate 
> for 16-bit multiplication (two-cycle for full 32 bits; here the width of 
> rt is explicitly named) and the area-efficient has a 32-cycle 
> latency/issue rate only regardless of the operand size (obviously 
> iterating over addition one bit at a time).  The latency of MTHI/MTLO is 1 
> across both units.
>
>  So I think this can't really be selected automatically for all cores, 
> some human-supplied knowledge about the MD unit used is required -- that 
> obviously affects other operations too, e.g. some multiplications 
> involving a constant that may be cheaper to do either directly or with a 
> sequence of additions depending on the MD unit present (unless optimising 
> for size, of course).

Yeah, as far as this optimisation goes, I think your original suggestion
of using the DFA to work out the best sequence is still right.  If the DFA
says that multiplications take a handful of cycles when they actually take
32 then we're not going to optimise multiplication very well whatever happens.

In practice, code destined for non-4kc cores with iterative multipliers
would probably tune well with -mtune=4kp.

Anyway, I said upthread that I was looking into removing MD0_REG and
MD1_REG because I thought I'd seen a case where that was needed for
madd-9.c to pass.  I can no longer reproduce that, so it was probably
pilot error.

The final patch ended up being much more complicated than I'd have liked,
but at least it should be easier to add other automatically-derived tuning
costs in future.

Tested on mipsisa64-elf and mipsisa32-elf.  Applied.

Richard


gcc/
* config/mips/mips-protos.h (mips_split_type): New enum.
(mips_split_64bit_move_p, mips_split_doubleword_move): Delete.
(mips_split_move_p, mips_split_move, mips_split_move_insn_p)
(mips_split_move_insn): Declare.
* config/mips/mips.c (mips_tuning_info): New variable.
(mips_load_store_insns): Use mips_split_move_insn_p instead of
mips_split_64bit_move_p.
(mips_emit_move_or_split, mips_mult_move_p): New functions.
(mips_split_64bit_move_p): Rename to...
(mips_split_move_p): ...this and take a mips_split_type argument.
Generalize to all moves.  Call mips_mult_move_p.
(mips_split_doubleword_move): Rename to...
(mips_split_move): ...this and take a mips_split_type argument.
Assert that mips_split_move_p holds.
(mips_insn_split_type, mips_split_move_insn_p, mips_split_move_insn):
New functions.
(mips_output_move): Use mips_split_move_p instead of
mips_split_64bit_move_p.  Handle MULT $0, $0 moves.
(mips_save_reg): Use mips_emit_move_or_split.
(mips_sim_reset): Assign to curr_state.  Call targetm.sched.init
and advance_state.
(mips_sim_init): Call targetm.sched.init_dfa_pre_cycle_insn and
targetm.sched.init_dfa_post_cycle_insn, if defined.
(mips_sim_next_cycle): Assign to curr_state.  Use advance_state
instead of state_transition.
(mips_sim_issue_insn): Assign to curr_state.  Use
targetm.sched.variable_issue to see how many more insns
can be issued.
(mips_seq_time, mips_mult_zero_zero_cost)
(mips_set_fast_mult_zero_zero_p, mips_set_tuning_info)
(mips_expand_to_rtl_hook): New functions.
(TARGET_EXPAND_TO_RTL_HOOK): Define.
* config/mips/mips.md (move_type): Add imul.
(type): Map imul move_types to imul.
(*movdi_32bit, *movti): Add imul alternatives.
Use mips_split_move_insn_p and mips_split_move_insn instead of
mips_split_64bit_move_p and mips_split_doubleword_move in move
splitters.

gcc/testsuite/
2012-10-07  Richard Sandiford  
Sandra Loosemore  

* gcc.target/mips/madd-9.c: Force code to be tuned for the 4kc
and test that the accumulator is initialized using MULT.
* gcc.target/mips/mips32-dsp-accinit-1.c: New test.
* gcc.target/mips/mips32-dsp-accinit-2.c: Likewise.

Index: gcc/config/mips/mips-protos.h
===
--- gcc/config/mips/mips-protos.h   2012-10-06 17:26:43.408618293 +0100
+++ gcc/config/mips/mips-protos.h   2012-10-07 08:21:40.892791332 +0100
@@ -173,6 +173,25 @@ enum mips_call_type {
   MIPS_CALL_EPILOGUE
 };
 
+/* Controls the conditions under which certain inst

Re: RFA: Simplifying truncation and integer lowpart subregs

2012-10-07 Thread Richard Sandiford
Eric Botcazou  writes:
>> I think modelling it as a TRUNCATE operation is correct for
>> !TRULY_NOOP_TRUNCATION (it's the bug that Andrew pointed out).
>> And we shouldn't generate an actual TRUNCATE rtx for
>> TRULY_NOOP_TRUNCATION (the thing about making
>> simplify_gen_unary (TRUNCATE, ...) no worse than simplify_gen_subreg
>> for those targets).  I suppose:
>> 
>>   /* We can't handle truncation to a partial integer mode here
>>  because we don't know the real bitsize of the partial
>>  integer mode.  */
>>   if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
>> break;
>> 
>> might be a problem though; we should still allow a subreg to be
>> generated.  Is that what you were thinking of, or something else?
>
> I was thinking of the !TRULY_NOOP_TRUNCATION case, where the two operations 
> aren't equivalent.  Generating TRUNCATE in simplify_subreg seems suspicious 
> to 
> me in this case but, if not doing it is the source of the bug, I guess I need 
> to do some homework on this TRULY_NOOP_TRUNCATION stuff. :-)
>
> Maybe add a blurb to the head comment of simplify_truncation, explaining that
> it is valid to call the function both for TRUNCATEs and truncations to the 
> lowpart, and why it is correct to generate new TRUNCATEs in the latter case.

Yeah, in hindsight, the patch was definitely lacking commentary.
How about the patch below?  It also fixes the partial int case
and gets rid of the errant NOT hunk.  Tested in the same way as before.

Richard


gcc/
* machmode.h (GET_MODE_UNIT_PRECISION): New macro.
* simplify-rtx.c (simplify_truncation): New function,
extracted from simplify_subreg and (in small part) from
simplify_unary_operation_1.
(simplify_unary_operation_1) : Use it.  Remove sign bit
test for !TRULY_NOOP_TRUNCATION_MODES_P.
(simplify_subreg): Use simplify_truncate for lowpart subregs
where both the inner and outer modes are scalar integers.
* config/mips/mips.c (mips_truncated_op_cost): New function.
(mips_rtx_costs): Adjust test for BADDU.
* config/mips/mips.md (*baddu_di): Push truncates to operands.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2012-10-06 11:22:55.888609330 +0100
+++ gcc/machmode.h  2012-10-06 11:24:42.459603393 +0100
@@ -217,6 +217,11 @@ #define GET_MODE_UNIT_SIZE(MODE)   \
 #define GET_MODE_UNIT_BITSIZE(MODE) \
   ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT))
 
+#define GET_MODE_UNIT_PRECISION(MODE)  \
+  (GET_MODE_INNER (MODE) == VOIDmode   \
+   ? GET_MODE_PRECISION (MODE) \
+   : GET_MODE_PRECISION (GET_MODE_INNER (MODE)))
+
 /* Get the number of units in the object.  */
 
 extern const unsigned char mode_nunits[NUM_MACHINE_MODES];
Index: gcc/simplify-rtx.c
===
--- gcc/simplify-rtx.c  2012-10-06 11:23:57.005605924 +0100
+++ gcc/simplify-rtx.c  2012-10-06 17:14:40.018658588 +0100
@@ -564,6 +564,212 @@ simplify_replace_rtx (rtx x, const_rtx o
   return simplify_replace_fn_rtx (x, old_rtx, 0, new_rtx);
 }
 
+/* Try to simplify a MODE truncation of OP, which has OP_MODE.
+   Only handle cases where the truncated value is inherently an rvalue.
+
+   RTL provides two ways of truncating a value:
+
+   1. a lowpart subreg.  This form is only a truncation when both
+  the outer and inner modes (here MODE and OP_MODE respectively)
+  are scalar integers, and only then when the subreg is used as
+  an rvalue.
+
+  It is only valid to form such truncating subregs if the
+  truncation requires no action by the target.  The onus for
+  proving this is on the creator of the subreg -- e.g. the
+  caller to simplify_subreg or simplify_gen_subreg -- and typically
+  involves either TRULY_NOOP_TRUNCATION_MODES_P or truncated_to_mode.
+
+   2. a TRUNCATE.  This form handles both scalar and compound integers.
+
+   The first form is preferred where valid.  However, the TRUNCATE
+   handling in simplify_unary_operation turns the second form into the
+   first form when TRULY_NOOP_TRUNCATION_MODES_P or truncated_to_mode allow,
+   so it is generally safe to form rvalue truncations using:
+
+  simplify_gen_unary (TRUNCATE, ...)
+
+   and leave simplify_unary_operation to work out which representation
+   should be used.
+
+   Because of the proof requirements on (1), simplify_truncation must
+   also use simplify_gen_unary (TRUNCATE, ...) to truncate parts of OP,
+   regardless of whether the outer truncation came from a SUBREG or a
+   TRUNCATE.  For example, if the caller has proven that an SImode
+   truncation of:
+
+  (and:DI X Y)
+
+   is a no-op and can be represented as a subreg, it does not follow
+   that SImode truncations of X and Y are also no-ops.  On a target
+   like 64-bit MIPS that requires SImode values to be stored in
+   s

Re: [patch] fix libbacktrace build failure on arm-linux

2012-10-07 Thread Matthias Klose
On 06.10.2012 22:46, Ian Lance Taylor wrote:
> On Sat, Oct 6, 2012 at 8:09 AM, Matthias Klose  wrote:
>> current trunk fails to build on arm-linux with:
>>
>> In file included from ../../../../src/libbacktrace/backtrace.c:35:0:
>> ../libgcc/unwind.h: In function '_Unwind_decode_typeinfo_ptr':
>> ../libgcc/unwind.h:42:45: error: unused parameter 'base' 
>> [-Werror=unused-parameter]
>>_Unwind_decode_typeinfo_ptr (_Unwind_Word base, _Unwind_Word ptr)
>>  ^
>> ../libgcc/unwind.h: In function '__gnu_unwind_24bit':
>> ../libgcc/unwind.h:68:41: error: unused parameter 'context'
>> [-Werror=unused-parameter]
>>__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>>  ^
>> ../libgcc/unwind.h:68:54: error: unused parameter 'data' 
>> [-Werror=unused-parameter]
>>__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>>   ^
>> ../libgcc/unwind.h:68:64: error: unused parameter 'compact'
>> [-Werror=unused-parameter]
>>__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>> ^
>> cc1: all warnings being treated as errors
>> make[8]: *** [backtrace.lo] Error 1
>>
>> the immediate fix is to mark all arguments as unused, however I don't know if
>> this function should be used by libbacktrace, if it returns _URC_FAILURE
>> unconditionally.
> 
> The function is not used by libbacktrace.  It's an inline function
> defined in the header file, and the warning is about the inline
> function definition.  That is, this is a libgcc bug, not a
> libbacktrace bug, it just happens to show up when compiling
> libbacktrace for ARM
> 
> 
>> * config/arm/unwind-arm.h (__gnu_unwind_24bit): Mark parameters
>> as unused.
>>
>>
>> --- libgcc/config/arm/unwind-arm.h  (revision 192162)
>> +++ libgcc/config/arm/unwind-arm.h  (working copy)
>> @@ -64,8 +64,11 @@
>>return tmp;
>>  }
>>
>> +#define __unused __attribute__((unused))
>> +
>>static inline _Unwind_Reason_Code
>> -  __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>> +  __gnu_unwind_24bit (_Unwind_Context * context __unused, _uw data __unused,
>> +  int compact __unused)
>>  {
>>return _URC_FAILURE;
>>  }
> 
> Don't #define __unused.  Just write __attribute__ ((unused)) on the
> parameters.  Break the lines as needed.
> 
> This is OK with that change.

committed the following. the base parameter in _Unwind_decode_typeinfo_ptr
is unused too.

  Matthias

2012-10-07  Matthias Klose  

* config/arm/unwind-arm.h (__gnu_unwind_24bit): Mark parameters
as unused.
(_Unwind_decode_typeinfo_ptr): Mark base as unused.

Index: libgcc/config/arm/unwind-arm.h
===
--- a/src/libgcc/config/arm/unwind-arm.h(revision 192162)
+++ b/src/libgcc/config/arm/unwind-arm.h(working copy)
@@ -39,7 +39,8 @@
 #endif
   /* Decode an R_ARM_TARGET2 relocation.  */
   static inline _Unwind_Word
-  _Unwind_decode_typeinfo_ptr (_Unwind_Word base, _Unwind_Word ptr)
+  _Unwind_decode_typeinfo_ptr (_Unwind_Word base __attribute__ ((unused)),
+   _Unwind_Word ptr)
 {
   _Unwind_Word tmp;

@@ -65,7 +66,9 @@
 }

   static inline _Unwind_Reason_Code
-  __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
+  __gnu_unwind_24bit (_Unwind_Context * context __attribute__ ((unused)),
+  _Unwind_Context *_uw data __attribute__ ((unused)),
+  int compact __attribute__ ((unused)))
 {
   return _URC_FAILURE;
 }



Re: [Patch, Fortran] Fix OPTIONAL, esp. with polymorphism

2012-10-07 Thread Dominique Dhumieres
Hi Tobias,

I have tested your patch, mostly the added test cases.
I think the test gfortran.dg/class_optional_2.f90 should be split:
it has too much tests lumped together. In addition
the test gfortran.dg/class_optional_1.f90 does not compile
because "symbol 'i' at (1) has no IMPLICIT type" (three times),
this is fixed with something such as

   elemental subroutine sub_ct2(y)
+integer :: i
 class(t), intent(in), optional :: y
 if (present(y)) i = 5
   end subroutine sub_ct2

but the code seems weird.

The code gfortran.dg/class_optional_2.f90 compiles, but
the runtime does not exit (at least after more than 30s).
Finally I have applied the following changes in order
to make it works:

--- /opt/gcc/p_work/gcc/testsuite/gfortran.dg/class_optional_2.f90  
2012-10-06 19:10:08.0 +0200
+++ class_optional_2_db_2.f90   2012-10-05 22:11:23.0 +0200
@@ -69,14 +69,14 @@
   if (allocated (xa)) call abort ()
 
   call suba2(alloc=.false., prsnt=.false.)
-  call suba2(xa2, alloc=.false., prsnt=.true.)
-  if (.not. allocated (xa2)) call abort ()
-  if (size (xa2) /= 1) call abort ()
-  if (.not. allocated (xa2(1)%i)) call abort ()
-  if (xa2(1)%i /= 5) call abort ()
-  xa2(1)%i = -3
-  call suba2(xa2, alloc=.true., prsnt=.true.)
-  if (allocated (xa2)) call abort ()
+!  call suba2(xa2, alloc=.false., prsnt=.true.)
+!  if (.not. allocated (xa2)) call abort ()
+!  if (size (xa2) /= 1) call abort ()
+!  if (.not. allocated (xa2(1)%i)) call abort ()
+!  if (xa2(1)%i /= 5) call abort ()
+!  xa2(1)%i = -3
+!  call suba2(xa2, alloc=.true., prsnt=.true.)
+!  if (allocated (xa2)) call abort ()
 
   call subp(alloc=.false., prsnt=.false.)
   call subp(xp, alloc=.false., prsnt=.true.)
@@ -88,14 +88,14 @@
   if (associated (xp)) call abort ()
 
   call subp2(alloc=.false., prsnt=.false.)
-  call subp2(xp2, alloc=.false., prsnt=.true.)
-  if (.not. associated (xp2)) call abort ()
-  if (size (xp2) /= 1) call abort ()
-  if (.not. allocated (xp2(1)%i)) call abort ()
-  if (xp2(1)%i /= 5) call abort ()
-  xp2(1)%i = -3
-  call subp2(xp2, alloc=.true., prsnt=.true.)
-  if (associated (xp2)) call abort ()
+!  call subp2(xp2, alloc=.false., prsnt=.true.)
+!  if (.not. associated (xp2)) call abort ()
+!  if (size (xp2) /= 1) call abort ()
+!  if (.not. allocated (xp2(1)%i)) call abort ()
+!  if (xp2(1)%i /= 5) call abort ()
+!  xp2(1)%i = -3
+!  call subp2(xp2, alloc=.true., prsnt=.true.)
+!  if (associated (xp2)) call abort ()
 
   call subac(alloc=.false., prsnt=.false.)
   call subac(xac, alloc=.false., prsnt=.true.)
@@ -107,14 +107,14 @@
   if (allocated (xac)) call abort ()
 
   call suba2c(alloc=.false., prsnt=.false.)
-  call suba2c(xa2c, alloc=.false., prsnt=.true.)
-  if (.not. allocated (xa2c)) call abort ()
-  if (size (xa2c) /= 1) call abort ()
-  if (.not. allocated (xa2c(1)%i)) call abort ()
-  if (xa2c(1)%i /= 5) call abort ()
-  xa2c(1)%i = -3
-  call suba2c(xa2c, alloc=.true., prsnt=.true.)
-  if (allocated (xa2c)) call abort ()
+!  call suba2c(xa2c, alloc=.false., prsnt=.true.)
+!  if (.not. allocated (xa2c)) call abort ()
+!  if (size (xa2c) /= 1) call abort ()
+!  if (.not. allocated (xa2c(1)%i)) call abort ()
+!  if (xa2c(1)%i /= 5) call abort ()
+!  xa2c(1)%i = -3
+!  call suba2c(xa2c, alloc=.true., prsnt=.true.)
+!  if (allocated (xa2c)) call abort ()
 
 contains
  subroutine suba2c(x, prsnt, alloc)
@@ -508,9 +508,9 @@ contains
 !   call s2elem(z3) ! FIXME: Segfault
 !   call s2elem(z4) ! FIXME: Segfault
 !   call s2elem(z5) ! FIXME: Segfault
-   call s2elem_t(x)
-   call s2elem_t(y)
-   call s2elem_t(z)
+!   call s2elem_t(x)
+!   call s2elem_t(y)
+!   call s2elem_t(z)
 !   call s2elem_t(z2) ! FIXME: Segfault
 !   call s2elem_t(z3) ! FIXME: Segfault
 !   call s2elem_t(z4) ! FIXME: Segfault
@@ -550,9 +550,9 @@ contains
 !   call s2elem(z3) ! FIXME: Segfault
 !   call s2elem(z4) ! FIXME: Segfault
 !   call s2elem(z5) ! FIXME: Segfault
-   call s2elem_t2(x)
-   call s2elem_t2(y)
-   call s2elem_t2(z)
+!   call s2elem_t2(x)
+!   call s2elem_t2(y)
+!   call s2elem_t2(z)
 !   call s2elem_t2(z2) ! FIXME: Segfault
 !   call s2elem_t2(z3) ! FIXME: Segfault
 !   call s2elem_t2(z4) ! FIXME: Segfault
@@ -704,9 +704,9 @@ contains
 !   call s2elem(z3) ! FIXME: Segfault
 !   call s2elem(z4) ! FIXME: Segfault
 !   call s2elem(z5) ! FIXME: Segfault
-   call s2elem_t(x)
-   call s2elem_t(y)
-   call s2elem_t(z)
+!   call s2elem_t(x)
+!   call s2elem_t(y)
+!   call s2elem_t(z)
 !   call s2elem_t(z2) ! FIXME: Segfault
 !   call s2elem_t(z3) ! FIXME: Segfault
 !   call s2elem_t(z4) ! FIXME: Segfault
@@ -747,9 +747,9 @@ contains
 !   call s2elem(z3) ! FIXME: Segfault
 !   call s2elem(z4) ! FIXME: Segfault
 !   call s2elem(z5) ! FIXME: Segfault
-   call s2elem_t2(x)
-   call s2elem_t2(y)
-   call s2elem_t2(z)
+!   call s2elem_t2(x)
+!   call s2elem_t2(y)
+!   call s2elem_t2(z)
 !   call s2elem_t2(z2) ! FIXME: Segfault
 !   call s2elem_t2(z3) ! FIXME: Segfault
 !   call s2elem_t2(z4) ! FIXME: Segfault
@@ -79

Check that unlinked uses do not contain ssa-names when renaming.

2012-10-07 Thread Tom de Vries
Richard,

attached patch checks that unlinked uses do not contain ssa-names when renaming.

This assert triggers when compiling (without the fix) the PR54735 example.

AFAIU, it was due to chance that we caught the PR54735 bug by hitting the
verification failure, because the new vdef introduced by renaming happened to be
the same name as the ssa name referenced in the invalid unlinked use (in terms
of maybe_replace_use: rdef == use).

The assert from this patch catches all cases that an unlinked use contains an
ssa-name.

Bootstrapped and reg-tested on x86_64 (Ada inclusive).

OK for trunk?

Thanks,
- Tom

2012-10-07  Tom de Vries  

* tree-into-ssa.c (maybe_replace_use): Add assert.
Index: gcc/tree-into-ssa.c
===
--- gcc/tree-into-ssa.c	(revision 192023)
+++ gcc/tree-into-ssa.c	(working copy)
@@ -1773,6 +1773,9 @@
 rdef = get_reaching_def (sym);
   else if (is_old_name (use))
 rdef = get_reaching_def (use);
+  
+  if (use_p->prev == NULL && use_p->next == NULL)
+gcc_assert (TREE_CODE (use) != SSA_NAME);
 
   if (rdef && rdef != use)
 SET_USE (use_p, rdef);


Re: [C++ Patch / RFC] PR 51422

2012-10-07 Thread Paolo Carlini

Hi,

On 10/06/2012 03:52 PM, Jason Merrill wrote:

On 09/27/2012 07:08 AM, Paolo Carlini wrote:

Then checking error_operand_p (decl) in is_capture_proxy solves the
problem but now the question is: do we have reasons to believe that such
VAR_DECLs should never ever reach is_normal_capture_proxy?


That depends on our error recovery strategy for an invalid capture.  
It seems that we currently build a VAR_DECL that has an 
error_mark_node DECL_VALUE_EXPR; if we're going to do that, we need to 
deal with it.  I guess we should return true in that case, if it 
doesn't cause more problems later on.
Ah good, thanks for the suggestion. Then the below passes testing on 
x86_64-linux. Ok for mainline?


Thanks again,
Paolo.


/cp
2012-10-07  Paolo Carlini  

PR c++/51422
* semantics.c (is_normal_capture_proxy): Return true for
error_mark_node as DECL_VALUE_EXPR.

/testsuite
2012-10-07  Paolo Carlini  

PR c++/51422
* g++.dg/cpp0x/lambda/lambda-ice8.C: New.
Index: cp/semantics.c
===
--- cp/semantics.c  (revision 192179)
+++ cp/semantics.c  (working copy)
@@ -3,8 +3,7 @@
building RTL.  These routines are used both during actual parsing
and during the instantiation of template functions.
 
-   Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
-2008, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
+   Copyright (C) 1998-2012 Free Software Foundation, Inc.
Written by Mark Mitchell (mmitch...@usa.net) based on code found
formerly in parse.y and pt.c.
 
@@ -9005,14 +9004,15 @@ is_capture_proxy (tree decl)
 bool
 is_normal_capture_proxy (tree decl)
 {
-  tree val;
-
   if (!is_capture_proxy (decl))
 /* It's not a capture proxy.  */
 return false;
 
   /* It is a capture proxy, is it a normal capture?  */
-  val = DECL_VALUE_EXPR (decl);
+  tree val = DECL_VALUE_EXPR (decl);
+  if (val == error_mark_node)
+return true;
+
   gcc_assert (TREE_CODE (val) == COMPONENT_REF);
   val = TREE_OPERAND (val, 1);
   return DECL_NORMAL_CAPTURE_P (val);
Index: testsuite/g++.dg/cpp0x/lambda/lambda-ice8.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-ice8.C (revision 0)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-ice8.C (working copy)
@@ -0,0 +1,10 @@
+// PR c++/51422
+// { dg-do compile { target c++11 } }
+
+template struct A {};
+
+void foo()
+{
+  [i] { A(); };   // { dg-error "not declared|invalid" }
+  [i] { A(); };   // { dg-error "invalid" }
+}


[C++ Patch] Small SFINAE-related clean up

2012-10-07 Thread Paolo Carlini

Hi,

I believe that, as a general rule, a function taking a tsubst_flags_t 
complain parameter should propagate it, thus, in particular, call the 
*_sfinae variant of various helper functions we have got. The below 
fixes a few places where we weren't doing that (+ a couple of minor 
unrelated tweaks noticed while going through the code).


Tested x86_64-linux.

Thanks,
Paolo.

//
2012-10-07  Paolo Carlini  

* pt.c (fold_non_dependent_expr_sfinae): Remove static specifier.
(tsubst_copy_and_build): Use get_target_expr_sfinae.
* call.c (build_conditional_expr_1, convert_like_real): Likewise.
* cvt.c (build_up_reference): Likewise.
(ocp_convert): Use abstract_virtuals_error_sfinae.
(build_up_reference): Propagate complain to cp_build_addr_expr.
* decl.c (compute_array_index_type): Use fold_non_dependent_expr_sfinae.
* cp-tree.h: Update declarations.

* cvt.c (build_expr_type_conversion): Tidy.

* tree.c (stabilize_aggr_init): Change to static.
Index: pt.c
===
--- pt.c(revision 192179)
+++ pt.c(working copy)
@@ -5020,7 +5020,7 @@ redeclare_class_template (tree type, tree parms)
 /* Simplify EXPR if it is a non-dependent expression.  Returns the
(possibly simplified) expression.  */
 
-static tree
+tree
 fold_non_dependent_expr_sfinae (tree expr, tsubst_flags_t complain)
 {
   if (expr == NULL_TREE)
@@ -14287,7 +14287,8 @@ tsubst_copy_and_build (tree t,
  FIXME stop folding in cp_parser_initializer_clause.  */
   gcc_assert (TREE_CONSTANT (t));
   {
-   tree r = get_target_expr (RECUR (TARGET_EXPR_INITIAL (t)));
+   tree r = get_target_expr_sfinae (RECUR (TARGET_EXPR_INITIAL (t)),
+complain);
TREE_CONSTANT (r) = true;
RETURN (r);
   }
Index: cvt.c
===
--- cvt.c   (revision 192179)
+++ cvt.c   (working copy)
@@ -339,12 +339,12 @@ build_up_reference (tree type, tree arg, int flags
  LOOKUP_ONLYCONVERTING|DIRECT_BIND);
 }
   else if (!(flags & DIRECT_BIND) && ! lvalue_p (arg))
-return get_target_expr (arg);
+return get_target_expr_sfinae (arg, complain);
 
   /* If we had a way to wrap this up, and say, if we ever needed its
  address, transform all occurrences of the register, into a memory
  reference we could win better.  */
-  rval = cp_build_addr_expr (arg, tf_warning_or_error);
+  rval = cp_build_addr_expr (arg, complain);
   if (rval == error_mark_node)
 return error_mark_node;
 
@@ -842,7 +842,7 @@ ocp_convert (tree type, tree expr, int convtype, i
 
   ctor = e;
 
-  if (abstract_virtuals_error (NULL_TREE, type))
+  if (abstract_virtuals_error_sfinae (NULL_TREE, type, complain))
return error_mark_node;
 
   if (BRACE_ENCLOSED_INITIALIZER_P (ctor))
@@ -1514,8 +1514,6 @@ build_expr_type_conversion (int desires, tree expr
  "converting NULL to non-pointer type");
 }
 
-  basetype = TREE_TYPE (expr);
-
   if (basetype == error_mark_node)
 return error_mark_node;
 
Index: tree.c
===
--- tree.c  (revision 192179)
+++ tree.c  (working copy)
@@ -3557,7 +3557,7 @@ stabilize_call (tree call, tree *initp)
arguments, while, upon return, *INITP contains an expression to
compute the arguments.  */
 
-void
+static void
 stabilize_aggr_init (tree call, tree *initp)
 {
   tree inits = NULL_TREE;
Index: call.c
===
--- call.c  (revision 192179)
+++ call.c  (working copy)
@@ -4777,7 +4777,7 @@ build_conditional_expr_1 (tree arg1, tree arg2, tr
 but now we sometimes wrap them in NOP_EXPRs so the test would
 fail.  */
   if (CLASS_TYPE_P (TREE_TYPE (result)))
-   result = get_target_expr (result);
+   result = get_target_expr_sfinae (result, complain);
   /* If this expression is an rvalue, but might be mistaken for an
 lvalue, we must add a NON_LVALUE_EXPR.  */
   result = rvalue (result);
@@ -5883,7 +5883,7 @@ convert_like_real (conversion *convs, tree expr, t
field = next_initializable_field (DECL_CHAIN (field));
CONSTRUCTOR_APPEND_ELT (vec, field, size_int (len));
new_ctor = build_constructor (totype, vec);
-   return get_target_expr (new_ctor);
+   return get_target_expr_sfinae (new_ctor, complain);
   }
 
 case ck_aggr:
@@ -5899,7 +5899,8 @@ convert_like_real (conversion *convs, tree expr, t
  return fold_if_not_in_template (expr);
}
   expr = reshape_init (totype, expr, complain);
-  return get_target_expr (digest_init (totype, expr, complain));
+  return get_target_expr_sfinae (digest_init (totype, expr, complain),
+  

Re: [patch] fix libbacktrace build failure on arm-linux

2012-10-07 Thread Matthias Klose
On 07.10.2012 12:01, Matthias Klose wrote:
> On 06.10.2012 22:46, Ian Lance Taylor wrote:
>> On Sat, Oct 6, 2012 at 8:09 AM, Matthias Klose  wrote:
>>> current trunk fails to build on arm-linux with:
>>>
>>> In file included from ../../../../src/libbacktrace/backtrace.c:35:0:
>>> ../libgcc/unwind.h: In function '_Unwind_decode_typeinfo_ptr':
>>> ../libgcc/unwind.h:42:45: error: unused parameter 'base' 
>>> [-Werror=unused-parameter]
>>>_Unwind_decode_typeinfo_ptr (_Unwind_Word base, _Unwind_Word ptr)
>>>  ^
>>> ../libgcc/unwind.h: In function '__gnu_unwind_24bit':
>>> ../libgcc/unwind.h:68:41: error: unused parameter 'context'
>>> [-Werror=unused-parameter]
>>>__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>>>  ^
>>> ../libgcc/unwind.h:68:54: error: unused parameter 'data' 
>>> [-Werror=unused-parameter]
>>>__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>>>   ^
>>> ../libgcc/unwind.h:68:64: error: unused parameter 'compact'
>>> [-Werror=unused-parameter]
>>>__gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>>> ^
>>> cc1: all warnings being treated as errors
>>> make[8]: *** [backtrace.lo] Error 1
>>>
>>> the immediate fix is to mark all arguments as unused, however I don't know 
>>> if
>>> this function should be used by libbacktrace, if it returns _URC_FAILURE
>>> unconditionally.
>>
>> The function is not used by libbacktrace.  It's an inline function
>> defined in the header file, and the warning is about the inline
>> function definition.  That is, this is a libgcc bug, not a
>> libbacktrace bug, it just happens to show up when compiling
>> libbacktrace for ARM
>>
>>
>>> * config/arm/unwind-arm.h (__gnu_unwind_24bit): Mark parameters
>>> as unused.
>>>
>>>
>>> --- libgcc/config/arm/unwind-arm.h  (revision 192162)
>>> +++ libgcc/config/arm/unwind-arm.h  (working copy)
>>> @@ -64,8 +64,11 @@
>>>return tmp;
>>>  }
>>>
>>> +#define __unused __attribute__((unused))
>>> +
>>>static inline _Unwind_Reason_Code
>>> -  __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
>>> +  __gnu_unwind_24bit (_Unwind_Context * context __unused, _uw data 
>>> __unused,
>>> +  int compact __unused)
>>>  {
>>>return _URC_FAILURE;
>>>  }
>>
>> Don't #define __unused.  Just write __attribute__ ((unused)) on the
>> parameters.  Break the lines as needed.
>>
>> This is OK with that change.
> 
> committed the following. the base parameter in _Unwind_decode_typeinfo_ptr
> is unused too.
> 
>   Matthias
> 
> 2012-10-07  Matthias Klose  
> 
> * config/arm/unwind-arm.h (__gnu_unwind_24bit): Mark parameters
> as unused.
> (_Unwind_decode_typeinfo_ptr): Mark base as unused.
> 
> Index: libgcc/config/arm/unwind-arm.h
> ===
> --- a/src/libgcc/config/arm/unwind-arm.h(revision 192162)
> +++ b/src/libgcc/config/arm/unwind-arm.h(working copy)
> @@ -39,7 +39,8 @@
>  #endif
>/* Decode an R_ARM_TARGET2 relocation.  */
>static inline _Unwind_Word
> -  _Unwind_decode_typeinfo_ptr (_Unwind_Word base, _Unwind_Word ptr)
> +  _Unwind_decode_typeinfo_ptr (_Unwind_Word base __attribute__ ((unused)),
> +   _Unwind_Word ptr)
>  {
>_Unwind_Word tmp;
> 
> @@ -65,7 +66,9 @@
>  }
> 
>static inline _Unwind_Reason_Code
> -  __gnu_unwind_24bit (_Unwind_Context * context, _uw data, int compact)
> +  __gnu_unwind_24bit (_Unwind_Context * context __attribute__ ((unused)),
> +  _Unwind_Context *_uw data __attribute__ ((unused)),
> +  int compact __attribute__ ((unused)))
>  {
>return _URC_FAILURE;
>  }
> 

and as a follow-up, fix the typo, committed as obvious

--- config/arm/unwind-arm.h (revision 192181)
+++ config/arm/unwind-arm.h (working copy)
@@ -67,7 +67,7 @@

   static inline _Unwind_Reason_Code
   __gnu_unwind_24bit (_Unwind_Context * context __attribute__ ((unused)),
-  _Unwind_Context *_uw data __attribute__ ((unused)),
+  _uw data __attribute__ ((unused)),
   int compact __attribute__ ((unused)))
 {
   return _URC_FAILURE;



Re: RFA: Simplifying truncation and integer lowpart subregs

2012-10-07 Thread Eric Botcazou
> Yeah, in hindsight, the patch was definitely lacking commentary.
> How about the patch below?  It also fixes the partial int case
> and gets rid of the errant NOT hunk.  Tested in the same way as before.
> 
> Richard
> 
> 
> gcc/
>   * machmode.h (GET_MODE_UNIT_PRECISION): New macro.
>   * simplify-rtx.c (simplify_truncation): New function,
>   extracted from simplify_subreg and (in small part) from
>   simplify_unary_operation_1.
>   (simplify_unary_operation_1) : Use it.  Remove sign bit
>   test for !TRULY_NOOP_TRUNCATION_MODES_P.
>   (simplify_subreg): Use simplify_truncate for lowpart subregs
>   where both the inner and outer modes are scalar integers.

This looks good to me, thanks.

-- 
Eric Botcazou


Re: patch to fix constant math

2012-10-07 Thread Richard Guenther
On Fri, Oct 5, 2012 at 6:34 PM, Kenneth Zadeck  wrote:
> richard s,
> there are two comments that i deferred to you.   that have the word richard
> in them,
>
> richi,
> thank, i will start doing this now.
>
> kenny
>
> On 10/05/2012 09:49 AM, Richard Guenther wrote:
>>
>> On Fri, Oct 5, 2012 at 2:39 PM, Richard Guenther
>>  wrote:
>>>
>>> Ok, I see where you are going.  Let me look at the patch again.
>>
>> * The introduction and use of CONST_SCALAR_INT_P could be split out
>>(obvious and good)
>>
>> * DEF_RTL_EXPR(CONST_WIDE_INT, "const_wide_int", "", RTX_CONST_OBJ)
>>defining that new RTX is orthogonal to providing the wide_int
>> abstraction
>>for operating on CONST_INT and CONST_DOUBLE, right?
>>
>> @@ -144,6 +144,7 @@ static bool
>>   prefer_and_bit_test (enum machine_mode mode, int bitnum)
>>   {
>> bool speed_p;
>> +  wide_int mask = wide_int::set_bit_in_zero (bitnum, mode);
>>
>> set_bit_in_zero ... why use this instead of
>> wide_int::zero (mode).set_bit (bitnum) that would match the old
>> double_int_zero.set_bit interface and be more composed of primitives?
>
> adding something like this was just based on usage.We do this operation
> all over the place, why not make it concise and efficient. The api is
> very bottom up.   I looked at what the clients were doing all over the place
> and added those functions.
>
> wide-int has both and_not and or_not.   double-int only has and_not, but
> there are a lot of places in where you do or_nots, so we have or_not also.
>
>
>> if (and_test == 0)
>>   {
>> @@ -164,8 +165,7 @@ prefer_and_bit_test (enum machine_mode m
>>   }
>>
>> /* Fill in the integers.  */
>> -  XEXP (and_test, 1)
>> -= immed_double_int_const (double_int_zero.set_bit (bitnum), mode);
>> +  XEXP (and_test, 1) = immed_wide_int_const (mask);
>>
>> I suppose immed_wide_int_const may create a CONST_INT, a CONST_DOUBLE
>> or a CONST_WIDE_INT?
>
> yes, on converted targets it builds const_ints and const_wide_ints and on
> unconverted targets it builds const_ints and const_doubles.The reason i
> did not want to convert the targets is that the code that lives in the
> targets generally wants to use the values to create constants and this code
> is very very very target specific.
>
>>
>> +#if TARGET_SUPPORTS_WIDE_INT
>> +/* Returns 1 if OP is an operand that is a CONST_INT or CONST_WIDE_INT.
>> */
>> +int
>> +const_scalar_int_operand (rtx op, enum machine_mode mode)
>> +{
>>
>> why is it necessary to condition anything on TARGET_SUPPORTS_WIDE_INT?
>> It seems testing/compile coverage of this code will be bad ...
>>
>> Can't you simply map CONST_WIDE_INT to CONST_DOUBLE for targets
>
> The accessors for the two are completely different.const doubles "know"
> that there are exactly two hwi's.   for const_wide_ints, you only know that
> the len is greater than 1.   anything with len 1 would be CONST_INT.
>
> In a modern c++ world, const_int would be a subtype of const_int, but that
> is a huge patch.
>
>
>> not supporting CONST_WIDE_INT (what does it mean to "support"
>> CONST_WIDE_INT?  Does a target that supports CONST_WIDE_INT no
>> longer use CONST_DOUBLEs for integers?)
>
> yes, that is exactly what it means.
>
>> +  if (!CONST_WIDE_INT_P (op))
>> +return 0;
>>
>> hm, that doesn't match the comment (CONST_INT is supposed to return 1 as
>> well).
>> The comment doesn't really tell what the function does it seems,
>
> I need some context here to reply.
>
>
>> +  int prec = GET_MODE_PRECISION (mode);
>> +  int bitsize = GET_MODE_BITSIZE (mode);
>> +
>> +  if (CONST_WIDE_INT_NUNITS (op) * HOST_BITS_PER_WIDE_INT > bitsize)
>> +   return 0;
>>
>> it's mode argument is not documented.  And this doesn't even try to see if
>> the constant would fit the mode anyway (like if it's zero).  Not sure what
>> the function is for.
>
> I will upgrade the comments, they were inherited from the old version with
> the const_double changed to the const_wide_int
>
>
>> +   {
>> + /* Multiword partial int.  */
>> + HOST_WIDE_INT x
>> +   = CONST_WIDE_INT_ELT (op, CONST_WIDE_INT_NUNITS (op) - 1);
>> + return (wide_int::sext (x, prec & (HOST_BITS_PER_WIDE_INT - 1))
>> + == x);
>>
>> err - so now we have wide_int with a mode but we pass in another mode
>> and see if they have the same value?  Same value as-in signed value?
>> Which probably is why you need to rule out different size constants above?
>> Because you don't know whether to sign or zero extend?
>
> no it is worse than that.   I made the decision, which i think is correct,
> that we were not going to carry the mode inside const_wide_int.   The
> justification was that we do not do it for const_int.There is a comment
> in the constructor for const_wide_int that says it would be so easy to just
> put this in.
>
> But, this is an old api that did not change.   only the internals changed to
> support const_wide_int.
>
>
>>
>> +/* Returns 1 

Re: patch to fix constant math

2012-10-07 Thread Kenneth Zadeck


On 10/07/2012 08:47 AM, Richard Guenther wrote:

len seems redundant unless you want to optimize encoding.
>>len == (precision + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT.

>
>that is exactly what we do.   However, we are optimizing time, not space.
>the value is always stored in compressed form, i.e the same representation
>as is used inside the CONST_WIDE_INTs and INT_CSTs. this makes the
>transformation between them very fast.   And since we do this a lot, it
>needs to be fast.   So the len is the number of HWIs needed to represent the
>value which is typically less than what would be implied by the precision.

But doesn't this require a sign?  Otherwise how do you encode TImode 0x?
Would len be 1 here?  (and talking about CONST_WIDE_INT vs CONST_INT,
wouldn't CONST_INT be used anyway for all ints we can encode "small"?  Or
is it (hopefully) the goal to replace CONST_INT with CONST_WIDE_INT
everywhere?)  Or do you say wide_int is always "signed', thus TImode 0x
needs len == 2 and an explicit zero upper HWI word?

the compression of this has len ==2 with the explicit upper being 0.


  Or do you say wide_int
is always "unsigned", thus TImode -1 needs len == 2?  Noting that double-ints
were supposed to be twos-complement (thus, 'unsigned') numbers having
implicit non-zero bits sounds error-prone.

That said - I don't see any value in optimizing storage for short-lived
(as you say) wide-ints (apart from limiting it to the mode size).  For
example mutating operations on wide-ints are not really possible
if you need to extend storage.
the compression used is independent of whether it is sign or unsigned.  
but the compression "looks" signed.   i.e. you do not represent the 
upper hwi if they would just be a sign extension of the top hwi that is 
represented.   However, this does not imply anything about the math that 
was used to set those bits that way, and you can always go back to the 
full rep independent of the sign.


I do not have any long term plans to merge CONST_INT into 
CONST_WIDE_INT.   It would be a huge change in the ports and would be 
fairly space inefficient.  My guess is that in real code, less than 1% 
of real constants will have a length greater than 1 even on a 32 bit 
host.   CONST_INT is very space efficient. This could have been 
mentioned as part of Richard's response to your comment about the way we 
represent the CONST_WIDE_INTs.   In practice, there are almost none of 
them anyway.


In fact, you could argue that the tree level did it wrong (not that i am 
suggesting to change this).   But it makes me think what was going on 
when the decision to make TYPE_PRECISION be an INT_CST rather than just 
a HWI was made.   For that there is an implication that it could never 
take more than a HWI since no place in the code even checks 
TREE_INT_CST_HIGH for these.

>>+  enum Op {
>>+NONE,
>>
>>we don't have sth like this in double-int.h, why was the double-int
>>mechanism
>>not viable?

>
>i have chosen to use enums for things rather than passing booleans.

But it's bad to use an enum with 4 values for a thing that can only possibly
expect two.  You cannot optimize tests as easily.  Please retain the bool uns
parameters instead.

I am breaking it into two enums.


Re: patch to fix constant math

2012-10-07 Thread Richard Guenther
On Sun, Oct 7, 2012 at 3:11 PM, Kenneth Zadeck  wrote:
>
> On 10/07/2012 08:47 AM, Richard Guenther wrote:

 len seems redundant unless you want to optimize encoding.
 >>len == (precision + HOST_BITS_PER_WIDE_INT - 1) /
 >> HOST_BITS_PER_WIDE_INT.
>>>
>>> >
>>> >that is exactly what we do.   However, we are optimizing time, not
>>> > space.
>>> >the value is always stored in compressed form, i.e the same
>>> > representation
>>> >as is used inside the CONST_WIDE_INTs and INT_CSTs. this makes the
>>> >transformation between them very fast.   And since we do this a lot, it
>>> >needs to be fast.   So the len is the number of HWIs needed to represent
>>> > the
>>> >value which is typically less than what would be implied by the
>>> > precision.
>>
>> But doesn't this require a sign?  Otherwise how do you encode TImode
>> 0x?
>> Would len be 1 here?  (and talking about CONST_WIDE_INT vs CONST_INT,
>> wouldn't CONST_INT be used anyway for all ints we can encode "small"?  Or
>> is it (hopefully) the goal to replace CONST_INT with CONST_WIDE_INT
>> everywhere?)  Or do you say wide_int is always "signed', thus TImode
>> 0x
>> needs len == 2 and an explicit zero upper HWI word?
>
> the compression of this has len ==2 with the explicit upper being 0.
>
>
>>   Or do you say wide_int
>> is always "unsigned", thus TImode -1 needs len == 2?  Noting that
>> double-ints
>> were supposed to be twos-complement (thus, 'unsigned') numbers having
>> implicit non-zero bits sounds error-prone.
>>
>> That said - I don't see any value in optimizing storage for short-lived
>> (as you say) wide-ints (apart from limiting it to the mode size).  For
>> example mutating operations on wide-ints are not really possible
>> if you need to extend storage.
>
> the compression used is independent of whether it is sign or unsigned.  but
> the compression "looks" signed.   i.e. you do not represent the upper hwi if
> they would just be a sign extension of the top hwi that is represented.
> However, this does not imply anything about the math that was used to set
> those bits that way, and you can always go back to the full rep independent
> of the sign.
>
> I do not have any long term plans to merge CONST_INT into CONST_WIDE_INT.
> It would be a huge change in the ports and would be fairly space
> inefficient.  My guess is that in real code, less than 1% of real constants
> will have a length greater than 1 even on a 32 bit host.   CONST_INT is very
> space efficient. This could have been mentioned as part of Richard's
> response to your comment about the way we represent the CONST_WIDE_INTs.
> In practice, there are almost none of them anyway.
>
> In fact, you could argue that the tree level did it wrong (not that i am
> suggesting to change this).   But it makes me think what was going on when
> the decision to make TYPE_PRECISION be an INT_CST rather than just a HWI was
> made.   For that there is an implication that it could never take more than
> a HWI since no place in the code even checks TREE_INT_CST_HIGH for these.

Well - on the tree level we now always have two HWIs for all INTEGER_CSTs.  If
we can, based on the size of the underlying mode, reduce that to one
HWI we already
win something.  If we add an explicit length to allow a smaller
encoding for larger modes
(tree_base conveniently has an available 'int' for this ...) then we'd
win in more cases.
Thus, is CONST_INT really necessarily better than optimized CONST_WIDE_INT
storage?

Richard.

 >>+  enum Op {
 >>+NONE,
 >>
 >>we don't have sth like this in double-int.h, why was the double-int
 >>mechanism
 >>not viable?
>>>
>>> >
>>> >i have chosen to use enums for things rather than passing booleans.
>>
>> But it's bad to use an enum with 4 values for a thing that can only
>> possibly
>> expect two.  You cannot optimize tests as easily.  Please retain the bool
>> uns
>> parameters instead.
>
> I am breaking it into two enums.


Re: libgo patch committed: Use libbacktrace

2012-10-07 Thread Andreas Schwab
Ian Lance Taylor  writes:

> +/* Set *VAL to the value of the symbol for PC.  */
> +
> +static _Bool
> +__go_symbol_value (uintptr_t pc, uintptr_t *val)
> +{
> +  *val = 0;
> +  backtrace_syminfo (__go_get_backtrace_state (), pc, syminfo_callback,
> +  error_callback, &val);
> +  return *val != 0;
>  }

Don't clobber address of val.

diff --git a/libgo/runtime/go-caller.c b/libgo/runtime/go-caller.c
index 843adf6..8dcf9e4 100644
--- a/libgo/runtime/go-caller.c
+++ b/libgo/runtime/go-caller.c
@@ -144,7 +144,7 @@ __go_symbol_value (uintptr_t pc, uintptr_t *val)
 {
   *val = 0;
   backtrace_syminfo (__go_get_backtrace_state (), pc, syminfo_callback,
-error_callback, &val);
+error_callback, val);
   return *val != 0;
 }
 

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: patch to fix constant math

2012-10-07 Thread Kenneth Zadeck


On 10/07/2012 09:19 AM, Richard Guenther wrote:

>In fact, you could argue that the tree level did it wrong (not that i am
>suggesting to change this).   But it makes me think what was going on when
>the decision to make TYPE_PRECISION be an INT_CST rather than just a HWI was
>made.   For that there is an implication that it could never take more than
>a HWI since no place in the code even checks TREE_INT_CST_HIGH for these.

Well - on the tree level we now always have two HWIs for all INTEGER_CSTs.  If
we can, based on the size of the underlying mode, reduce that to one
HWI we already
win something.  If we add an explicit length to allow a smaller
encoding for larger modes
(tree_base conveniently has an available 'int' for this ...) then we'd
win in more cases.
Thus, is CONST_INT really necessarily better than optimized CONST_WIDE_INT
storage?
i have to admit, that looking at these data structures gives me a 
headache.  This all looks like something that Rube Goldberg would have 
invented had he done object oriented design  (Richard S did not know who 
Rube Goldberg when i mentioned this name to him a few years ago since 
this is an American thing, but the british had their own equivalent and 
I assume the germans do too.).


i did the first cut of changing the rtl level structure and Richard S 
threw up on it and suggested what is there now, which happily (for me) i 
was able to get mike to implement.


mike also did the tree level version of the data structures for me.   i 
will make sure he used that left over length field.


The bottom line is that you most likely just save the length, but that 
is a big percent of something this small.  Both of rtl ints have a mode, 
so if we can make that change later, it will be space neutral.


Fixup INTEGER_CST

2012-10-07 Thread Jan Hubicka
Hi,
I added a santy check that after fixup all types that lost in the merging are
really dead.  And it turns out we have some zombies around.

INTEGER_CST needs special care because it is special cased by the streamer.  We 
also
do not want to do inplace modificaitons on it because that would corrupt the 
hashtable
used by tree.c's sharing code

Bootstrapped/regtested x86_64-linux, OK?


PR lto/54839
* lto/lto.c (remember_with_vars): Also fixup INTEGER_CST.
(fixup_integer_cst): New functoin.
(lto_ft_type): Fixup BASETYPE of methods and offsets.
Index: lto/lto.c
===
--- lto/lto.c   (revision 192164)
+++ lto/lto.c   (working copy)
@@ -1408,11 +1408,36 @@ remember_with_vars (tree t)
(tt) = GIMPLE_REGISTER_TYPE (tt); \
  if (VAR_OR_FUNCTION_DECL_P (tt) && TREE_PUBLIC (tt)) \
remember_with_vars (t); \
+ if (TREE_CODE (tt) == INTEGER_CST) \
+   (tt) = fixup_integer_cst (tt); \
} \
 } while (0)
 
 static void lto_fixup_types (tree);
 
+/* Return integer_cst T with updated type.  */
+
+static tree
+fixup_integer_cst (tree t)
+{
+  tree type = GIMPLE_REGISTER_TYPE (TREE_TYPE (t));
+
+  if (type == TREE_TYPE (t))
+return t;
+
+  /* If overflow was set, streamer_read_integer_cst
+ produced local copy of T. */
+  if (TREE_OVERFLOW (t))
+{
+  TREE_TYPE (t) = type;
+  return t;
+}
+  else
+  /* Otherwise produce new shared node for the new type.  */
+return build_int_cst_wide (type, TREE_INT_CST_LOW (t),
+  TREE_INT_CST_HIGH (t));
+}
+
 /* Fix up fields of a tree_typed T.  */
 
 static void
@@ -1526,6 +1549,11 @@ lto_ft_type (tree t)
   LTO_FIXUP_TREE (t->type_non_common.binfo);
 
   LTO_FIXUP_TREE (TYPE_CONTEXT (t));
+
+  if (TREE_CODE (t) == METHOD_TYPE)
+TYPE_METHOD_BASETYPE (t);
+  if (TREE_CODE (t) == OFFSET_TYPE)
+TYPE_OFFSET_BASETYPE (t);
 }
 
 /* Fix up fields of a BINFO T.  */


[lra] patch to speed more compilation of PR54146

2012-10-07 Thread Vladimir Makarov
The following patch speeds LRA up more on PR54146.  Below times for 
compilation of the test on gcc17.fsffrance.org (an AMD machine):


Before:
real=1214.71 user=1192.05 system=22.48
After:
real=1144.37 user=1124.31 system=20.11

The patch should not change the generated code.  About 2/3 of speed up 
is achieved by insertion of live ranges of pseudos only interesting for 
assignment in the start point chains.
1/3 is achieved by switching iterations through a bitmap instead of 
sparseset at the end of function process_bb_lives.


The patch was successfully bootstrapped on x86/x86-64.

Committed as rev. 192183.

2012-10-07  Vladimir Makarov  

* lra-int.h (struct lra_live_range): Remove finish_next.
(lra_start_point_ranges, lra_finish_point_ranges): Remove.
* lra-lives.c (lra_start_point_ranges, lra_finish_point_ranges):
Remove.
(process_bb_lives): Change start regno in
EXECUTE_IF_SET_IN_BITMAP.  Iterate on DF_LR_IN (bb) instead of
pseudos_live_through_calls.
(create_start_finish_chains, rebuild_start_finish_chains): Remove.
(compress_live_ranges): Don't call rebuild_start_finish_chains.
(lra_create_live_ranges): Don't call create_start_finish_chains.
(lra_clear_live_ranges): Remove code freeing
lra_start_point_ranges and lra_finish_point_ranges.
* lra-assigns.c (start_point_ranges, not_in_chain_mark): New.
(create_live_range_start_chains): Ditto.
(insert_in_live_range_start_chain): Ditto.
(finish_live_range_start_chains): Ditto.
(update_lives, assign_temporarily): Call
insert_in_live_range_start_chain.
(find_hard_regno_for, spill_for): Rename lra_start_point_ranges to
start_point_ranges.
(setup_live_pseudos_and_spill_after_risky): Ditto.
(lra_assign): Call create_live_range_start_chains and
finish_live_range_start_chains.




Index: lra-lives.c
===
--- lra-lives.c (revision 192169)
+++ lra-lives.c (working copy)
@@ -54,10 +54,6 @@ along with GCC; see the file COPYING3.   I
correspond to places where output operands are born. */
 int lra_live_max_point;
 
-/* Arrays of size LRA_LIVE_MAX_POINT mapping a program point to the
-   pseudo live ranges with given start/finish point.  */
-lra_live_range_t *lra_start_point_ranges, *lra_finish_point_ranges;
-
 /* Accumulated execution frequency of all references for each hard
register.  */
 int lra_hard_reg_usage[FIRST_PSEUDO_REGISTER];
@@ -529,9 +525,8 @@ process_bb_lives (basic_block bb)
   REG_SET_TO_HARD_REG_SET (hard_regs_live, reg_live_out);
   AND_COMPL_HARD_REG_SET (hard_regs_live, eliminable_regset);
   AND_COMPL_HARD_REG_SET (hard_regs_live, lra_no_alloc_regs);
-  EXECUTE_IF_SET_IN_BITMAP (reg_live_out, 0, j, bi)
-if (j >= FIRST_PSEUDO_REGISTER)
-  mark_pseudo_live (j);
+  EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
+mark_pseudo_live (j);
   
   freq = REG_FREQ_FROM_BB (bb);
 
@@ -755,47 +750,13 @@ process_bb_lives (basic_block bb)
   EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
 mark_pseudo_dead (i);
 
-  EXECUTE_IF_SET_IN_SPARSESET (pseudos_live_through_calls, i)
-if (bitmap_bit_p (DF_LR_IN (bb), i))
-  check_pseudos_live_through_calls (i);
+  EXECUTE_IF_SET_IN_BITMAP (DF_LR_IN (bb), FIRST_PSEUDO_REGISTER, j, bi)
+if (sparseset_bit_p (pseudos_live_through_calls, j))
+  check_pseudos_live_through_calls (j);
   
   incr_curr_point (freq);
 }
 
-/* Create and set up LRA_START_POINT_RANGES and
-   LRA_FINISH_POINT_RANGES.  */
-static void
-create_start_finish_chains (void)
-{
-  int i, max_regno;
-  lra_live_range_t r;
-
-  lra_start_point_ranges = XCNEWVEC (lra_live_range_t, lra_live_max_point);
-  lra_finish_point_ranges = XCNEWVEC (lra_live_range_t, lra_live_max_point);
-  max_regno = max_reg_num ();
-  for (i = FIRST_PSEUDO_REGISTER; i < max_regno; i++)
-{
-  for (r = lra_reg_info[i].live_ranges; r != NULL; r = r->next)
-   {
- r->start_next = lra_start_point_ranges[r->start];
- lra_start_point_ranges[r->start] = r;
- r->finish_next = lra_finish_point_ranges[r->finish];
- lra_finish_point_ranges[r->finish] = r;
-   }
-}
-}
-
-/* Rebuild LRA_START_POINT_RANGES and LRA_FINISH_POINT_RANGES after
-   new live ranges and program points were added as a result of new
-   insn generation.  */
-static void
-rebuild_start_finish_chains (void)
-{
-  free (lra_finish_point_ranges);
-  free (lra_start_point_ranges);
-  create_start_finish_chains ();
-}
-
 /* Compress pseudo live ranges by removing program points where
nothing happens.  Complexity of many algorithms in LRA is linear
function of program points number.  To speed up the code we try to
@@ -934,7 +895,6 @@ static void
 compress_live_ranges (void)
 {
   remove_some_program_points_and_update_live_ranges ();
-  rebuild_start_fi

Re: Fixup INTEGER_CST

2012-10-07 Thread Richard Guenther
On Sun, Oct 7, 2012 at 5:15 PM, Jan Hubicka  wrote:
> Hi,
> I added a santy check that after fixup all types that lost in the merging are
> really dead.  And it turns out we have some zombies around.
>
> INTEGER_CST needs special care because it is special cased by the streamer.  
> We also
> do not want to do inplace modificaitons on it because that would corrupt the 
> hashtable
> used by tree.c's sharing code
>
> Bootstrapped/regtested x86_64-linux, OK?

No, I don't think we want to fixup INTEGER_CSTs this way.  Instead we
want to fixup
them where they end up used unfixed.

Richard.

>
> PR lto/54839
> * lto/lto.c (remember_with_vars): Also fixup INTEGER_CST.
> (fixup_integer_cst): New functoin.
> (lto_ft_type): Fixup BASETYPE of methods and offsets.
> Index: lto/lto.c
> ===
> --- lto/lto.c   (revision 192164)
> +++ lto/lto.c   (working copy)
> @@ -1408,11 +1408,36 @@ remember_with_vars (tree t)
> (tt) = GIMPLE_REGISTER_TYPE (tt); \
>   if (VAR_OR_FUNCTION_DECL_P (tt) && TREE_PUBLIC (tt)) \
> remember_with_vars (t); \
> + if (TREE_CODE (tt) == INTEGER_CST) \
> +   (tt) = fixup_integer_cst (tt); \
> } \
>  } while (0)
>
>  static void lto_fixup_types (tree);
>
> +/* Return integer_cst T with updated type.  */
> +
> +static tree
> +fixup_integer_cst (tree t)
> +{
> +  tree type = GIMPLE_REGISTER_TYPE (TREE_TYPE (t));
> +
> +  if (type == TREE_TYPE (t))
> +return t;
> +
> +  /* If overflow was set, streamer_read_integer_cst
> + produced local copy of T. */
> +  if (TREE_OVERFLOW (t))
> +{
> +  TREE_TYPE (t) = type;
> +  return t;
> +}
> +  else
> +  /* Otherwise produce new shared node for the new type.  */
> +return build_int_cst_wide (type, TREE_INT_CST_LOW (t),
> +  TREE_INT_CST_HIGH (t));
> +}
> +
>  /* Fix up fields of a tree_typed T.  */
>
>  static void
> @@ -1526,6 +1549,11 @@ lto_ft_type (tree t)
>LTO_FIXUP_TREE (t->type_non_common.binfo);
>
>LTO_FIXUP_TREE (TYPE_CONTEXT (t));
> +
> +  if (TREE_CODE (t) == METHOD_TYPE)
> +TYPE_METHOD_BASETYPE (t);
> +  if (TREE_CODE (t) == OFFSET_TYPE)
> +TYPE_OFFSET_BASETYPE (t);
>  }
>
>  /* Fix up fields of a BINFO T.  */


Re: [lra] patch to speed more compilation of PR54146

2012-10-07 Thread Steven Bosscher
Hi Vlad,

Thanks for working on this!

> -  EXECUTE_IF_SET_IN_BITMAP (reg_live_out, 0, j, bi)
> -if (j >= FIRST_PSEUDO_REGISTER)
> -  mark_pseudo_live (j);
> +  EXECUTE_IF_SET_IN_BITMAP (reg_live_out, FIRST_PSEUDO_REGISTER, j, bi)
> +mark_pseudo_live (j);

FWIW, the above is optimized by GCC itself, in VRP. (I was surprised
and impressed by that :-)

Ciao!
Steven


[patch] Add option to compute "reaching and live definitions"

2012-10-07 Thread Steven Bosscher
Hello,

The attached patch adds a DF changeable flag to compute a subset of
reaching definitions that are also live at the program points they
reach. This is an idea I discussed with Paolo many years ago already,
but until today it hadn't really ever been close to the top of my todo
list, but trying to compile the test case for PR54146 with -fweb
finally changed that :-)

The idea is to prune the DF_RD_OUT set of each basic block by
registers live in DF_LR_OUT. I've implemented this pruning with the
same approach as the sparse formulation of RD dataflow, expanding the
regs in DF_LR_OUT to the corresponding set of DEFs and using that set
to mask out dead DEFs in DF_RD_OUT. This is a convenient formulation
because DF_LR is already expressed in terms of regnos (like
sparse_kill & friends), and the formulation also works fine for the
dense formulation, of course.

The effect on compile time for a set of cc1-i files is negligible (not
measurable, anyway), but for crazy large test cases like PR54146 this
patch is the difference between triggering out-of-memory or completing
the pass (at least -fweb, probably also the other affected passes).

Bootstrapped&tested on powerpc64-unknown-linux-gnu. OK for trunk?

Ciao!
Steven


df_rd_pruned.diff
Description: Binary data


Re: Fixup INTEGER_CST

2012-10-07 Thread Jan Hubicka
> On Sun, Oct 7, 2012 at 5:15 PM, Jan Hubicka  wrote:
> > Hi,
> > I added a santy check that after fixup all types that lost in the merging 
> > are
> > really dead.  And it turns out we have some zombies around.
> >
> > INTEGER_CST needs special care because it is special cased by the streamer. 
> >  We also
> > do not want to do inplace modificaitons on it because that would corrupt 
> > the hashtable
> > used by tree.c's sharing code
> >
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> No, I don't think we want to fixup INTEGER_CSTs this way.  Instead we
> want to fixup
> them where they end up used unfixed.

Erm, I think it is what the patch does?
It replaces pointers to integer_cst with type that did not survive by pointer
to new integer_cst. (with the optimization that INTEGER_CST with overflow
is changed in place because it is allowed to do so).

Honza
> 
> Richard.
> 
> >
> > PR lto/54839
> > * lto/lto.c (remember_with_vars): Also fixup INTEGER_CST.
> > (fixup_integer_cst): New functoin.
> > (lto_ft_type): Fixup BASETYPE of methods and offsets.
> > Index: lto/lto.c
> > ===
> > --- lto/lto.c   (revision 192164)
> > +++ lto/lto.c   (working copy)
> > @@ -1408,11 +1408,36 @@ remember_with_vars (tree t)
> > (tt) = GIMPLE_REGISTER_TYPE (tt); \
> >   if (VAR_OR_FUNCTION_DECL_P (tt) && TREE_PUBLIC (tt)) \
> > remember_with_vars (t); \
> > + if (TREE_CODE (tt) == INTEGER_CST) \
> > +   (tt) = fixup_integer_cst (tt); \
> > } \
> >  } while (0)
> >
> >  static void lto_fixup_types (tree);
> >
> > +/* Return integer_cst T with updated type.  */
> > +
> > +static tree
> > +fixup_integer_cst (tree t)
> > +{
> > +  tree type = GIMPLE_REGISTER_TYPE (TREE_TYPE (t));
> > +
> > +  if (type == TREE_TYPE (t))
> > +return t;
> > +
> > +  /* If overflow was set, streamer_read_integer_cst
> > + produced local copy of T. */
> > +  if (TREE_OVERFLOW (t))
> > +{
> > +  TREE_TYPE (t) = type;
> > +  return t;
> > +}
> > +  else
> > +  /* Otherwise produce new shared node for the new type.  */
> > +return build_int_cst_wide (type, TREE_INT_CST_LOW (t),
> > +  TREE_INT_CST_HIGH (t));
> > +}
> > +
> >  /* Fix up fields of a tree_typed T.  */
> >
> >  static void
> > @@ -1526,6 +1549,11 @@ lto_ft_type (tree t)
> >LTO_FIXUP_TREE (t->type_non_common.binfo);
> >
> >LTO_FIXUP_TREE (TYPE_CONTEXT (t));
> > +
> > +  if (TREE_CODE (t) == METHOD_TYPE)
> > +TYPE_METHOD_BASETYPE (t);
> > +  if (TREE_CODE (t) == OFFSET_TYPE)
> > +TYPE_OFFSET_BASETYPE (t);
> >  }
> >
> >  /* Fix up fields of a BINFO T.  */


RE: [Patch,avr]: Remove -mshort-calls option

2012-10-07 Thread Weddington, Eric


> -Original Message-
> From: Georg-Johann Lay []
> Sent: Friday, October 05, 2012 9:55 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Denis Chertykov; Weddington, Eric; Joerg Wunsch
> Subject: [Patch,avr]: Remove -mshort-calls option
> 
> As already discussed, this patch removes the -mshort-calls command option
> from
> avr-gcc.
> 
> Ok to apply?

Ok to apply, but...

> 
> If the change is on order, changes to wwwdocs will follow, i.e. deprecate
> the
> option in 4.7 and tell it is removed in the 4.8 caveats.
>

... but where do we notify the user that the switch is deprecated?



RE: [Patch,avr]: Remove -mshort-calls option

2012-10-07 Thread Oleg Endo
On Sun, 2012-10-07 at 18:01 +, Weddington, Eric wrote:
> 
> > -Original Message-
> > From: Georg-Johann Lay []
> > Sent: Friday, October 05, 2012 9:55 AM
> > To: gcc-patches@gcc.gnu.org
> > Cc: Denis Chertykov; Weddington, Eric; Joerg Wunsch
> > Subject: [Patch,avr]: Remove -mshort-calls option
> > 
> > As already discussed, this patch removes the -mshort-calls command option
> > from
> > avr-gcc.
> > 
> > Ok to apply?
> 
> Ok to apply, but...
> 
> > 
> > If the change is on order, changes to wwwdocs will follow, i.e. deprecate
> > the
> > option in 4.7 and tell it is removed in the 4.8 caveats.
> >
> 
> ... but where do we notify the user that the switch is deprecated?
> 

Maybe would be better to first make the option a no-op that prints a
warning in GCC, remove it from all documentations and mention the
deprecation in the wwwdocs changes.  Then, in the next GCC release,
remove it completely.

Cheers,
Oleg



[patch][lra] Improve initial program point density in lra-lives.c (RFA)

2012-10-07 Thread Steven Bosscher
On Sat, Oct 6, 2012 at 4:52 AM, Vladimir Makarov wrote:
>> Without this patch:
>> Compressing live ranges: from 700458 to 391665 - 55%, pre_count
>> 40730653, post_count 34363983
>> max per-reg pre_count 12978 (228090, 2 defs, 2 uses) (reg/f:DI 228090
>> [ SR.25009 ])
>> max per-reg post_count 10967 (228090, 2 defs, 2 uses) (reg/f:DI 228090
>> [ SR.25009 ])
>>
>> With this patch:
>> Compressing live ranges: from 700458 to 372585 - 53%, pre_count
>> 283937, post_count 271120
>> max per-reg pre_count 545 (230653, 542 defs, 542 uses) (reg/f:DI
>> 230653 [ SR.13303 ])
>> max per-reg post_count 544 (230649, 542 defs, 542 uses) (reg/f:DI
>> 230649 [ SR.13305 ])
>>
>> (the per-reg counts are the lengths of the live range chains for the
>> mentioned regno).
>
> Yes, that is impressive.  But I think, #points in a live range is a real
> parameter of the complexity.

Yes, that's probably true, except for the compression stuff.

Here's the final patch, bootstrapped&tested on
x86_64-unknown-linux-gnu. OK for the LRA-branch?

Ciao!
Steven


lra-lives-cheaper-recompute.diff
Description: Binary data


Fix time accounting in the inliner

2012-10-07 Thread Jan Hubicka
Hi,
estimate_edge_time was originally derrived from estimate_edge_growth.
estimate_edge_growth returns estimated function body growth by inlining.
That is size_of_inlined_body-size_of_call_stmt.
For estimate_edge_time it is kind of confusing and led me to double
account time of call stmt in the badness computation.

This patch fixes estimate_edge_time to actually return time of inlined
sequence and adds estimate_edge_size that is the size variant + makes
do_estimate_edge_growth a wrapper about it (because at most places
the inline heuristic really care about growth)

Bootstrapped/regtested x86_64-linux, comitted.

Honza

* ipa-inline-analysis.c (do_estimate_edge_time): Return actual
time spent by the inlined sequence.
(do_estimate_edge_growth): Rename to ...
(do_estimate_edge_time): ... this one; return size of inlined
sequence.
* ipa-inline.h (do_estimate_edge_size): New.
(do_estimate_edge_growth): Remove.
(estimate_edge_size): New function.
(estimate_edge_growth): Use it.
Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 192116)
+++ ipa-inline-analysis.c   (working copy)
@@ -3312,14 +3312,12 @@ do_estimate_edge_time (struct cgraph_edg
   VEC_free (tree, heap, known_binfos);
   VEC_free (ipa_agg_jump_function_p, heap, known_aggs);
 
-  ret = (((gcov_type)time
-  - es->call_stmt_time) * edge->frequency
-+ CGRAPH_FREQ_BASE / 2) / CGRAPH_FREQ_BASE;
+  ret = RDIV ((gcov_type)time * edge->frequency,
+ CGRAPH_FREQ_BASE);
 
   /* When caching, update the cache entry.  */
   if (edge_growth_cache)
 {
-  int ret_size;
   if ((int)VEC_length (edge_growth_cache_entry, edge_growth_cache)
  <= edge->uid)
VEC_safe_grow_cleared (edge_growth_cache_entry, heap, edge_growth_cache,
@@ -3327,10 +3325,8 @@ do_estimate_edge_time (struct cgraph_edg
   VEC_index (edge_growth_cache_entry, edge_growth_cache, edge->uid).time
= ret + (ret >= 0);
 
-  ret_size = size - es->call_stmt_size;
-  gcc_checking_assert (es->call_stmt_size);
   VEC_index (edge_growth_cache_entry, edge_growth_cache, edge->uid).size
-   = ret_size + (ret_size >= 0);
+   = size + (size >= 0);
   VEC_index (edge_growth_cache_entry, edge_growth_cache, edge->uid).hints
= hints + 1;
 }
@@ -3338,11 +3334,11 @@ do_estimate_edge_time (struct cgraph_edg
 }
 
 
-/* Estimate the growth of the caller when inlining EDGE.
+/* Return estimated callee growth after inlining EDGE.
Only to be called via estimate_edge_size.  */
 
 int
-do_estimate_edge_growth (struct cgraph_edge *edge)
+do_estimate_edge_size (struct cgraph_edge *edge)
 {
   int size;
   struct cgraph_node *callee;
@@ -3375,8 +3371,7 @@ do_estimate_edge_growth (struct cgraph_e
   VEC_free (tree, heap, known_vals);
   VEC_free (tree, heap, known_binfos);
   VEC_free (ipa_agg_jump_function_p, heap, known_aggs);
-  gcc_checking_assert (inline_edge_summary (edge)->call_stmt_size);
-  return size - inline_edge_summary (edge)->call_stmt_size;
+  return size;
 }
 
 
Index: ipa-inline.h
===
--- ipa-inline.h(revision 192116)
+++ ipa-inline.h(working copy)
@@ -201,7 +201,7 @@ void estimate_ipcp_clone_size_and_time (
 int do_estimate_growth (struct cgraph_node *);
 void inline_merge_summary (struct cgraph_edge *edge);
 void inline_update_overall_summary (struct cgraph_node *node);
-int do_estimate_edge_growth (struct cgraph_edge *edge);
+int do_estimate_edge_size (struct cgraph_edge *edge);
 int do_estimate_edge_time (struct cgraph_edge *edge);
 inline_hints do_estimate_edge_hints (struct cgraph_edge *edge);
 void initialize_growth_caches (void);
@@ -245,20 +245,31 @@ estimate_growth (struct cgraph_node *nod
 }
 
 
-/* Return estimated callee growth after inlining EDGE.  */
+/* Return estimated size of the inline sequence of EDGE.  */
 
 static inline int
-estimate_edge_growth (struct cgraph_edge *edge)
+estimate_edge_size (struct cgraph_edge *edge)
 {
   int ret;
   if ((int)VEC_length (edge_growth_cache_entry, edge_growth_cache) <= edge->uid
   || !(ret = VEC_index (edge_growth_cache_entry,
edge_growth_cache,
edge->uid).size))
-return do_estimate_edge_growth (edge);
+return do_estimate_edge_size (edge);
   return ret - (ret > 0);
 }
 
+/* Return estimated callee growth after inlining EDGE.  */
+
+static inline int
+estimate_edge_growth (struct cgraph_edge *edge)
+{
+#ifdef ENABLE_CHECKING
+  gcc_checking_assert (inline_edge_summary (edge)->call_stmt_size);
+#endif
+  return (estimate_edge_size (edge)
+ - inline_edge_summary (edge)->call_stmt_size);
+}
 
 /* Return estimated callee runtime increase after inlning
EDGE.  */


Re: [Patch,avr]: Remove -mshort-calls option

2012-10-07 Thread Georg-Johann Lay

Oleg Endo wrote:

Weddington, Eric wrote:

Georg-Johann Lay wrote:


As already discussed, this patch removes the -mshort-calls command option
from avr-gcc.

Ok to apply?

Ok to apply, but...


If the change is on order, changes to wwwdocs will follow, i.e. deprecate
the option in 4.7 and tell it is removed in the 4.8 caveats.


... but where do we notify the user that the switch is deprecated?


Exactly there: In the wwwdocs, namely the caveats section of the release 
notes.  The release notes are the blackboard for anyone who wants to get 
informed about what changed in the new release series.



Maybe would be better to first make the option a no-op that prints a
warning in GCC, remove it from all documentations and mention the
deprecation in the wwwdocs changes.  Then, in the next GCC release,
remove it completely.


IMHO it's not reasonable to put effort into a declining feature.  Why 
shouldn't we follow common GCC practice here?


Johann


Re: [Patch,avr]: Remove -mshort-calls option

2012-10-07 Thread Oleg Endo
On Sun, 2012-10-07 at 21:37 +0200, Georg-Johann Lay wrote:
> Oleg Endo wrote:
> > Maybe would be better to first make the option a no-op that prints a
> > warning in GCC, remove it from all documentations and mention the
> > deprecation in the wwwdocs changes.  Then, in the next GCC release,
> > remove it completely.
> 
> IMHO it's not reasonable to put effort into a declining feature.  Why 
> shouldn't we follow common GCC practice here?

"Effort" in this case is adding a "Warn(...)" to the existing option and
then remove it some time later.  I think it's more user friendly to
first warn and then do.  But that's just my opinion, feel free to
ignore :)

Cheers,
Oleg



Re: [Patch,avr]: Remove -mshort-calls option

2012-10-07 Thread Joerg Wunsch
As Oleg Endo wrote:

> I think it's more user friendly to
> first warn and then do.

The problem is that this option would better not have existed straight
from the beginning.  When using it, the compiler is at the risk of
generating code that fails to link later, because the relative calls
used could not reach the entire address space of the respective
processor.

-- 
cheers, J"org   .-.-.   --... ...--   -.. .  DL8DTL

http://www.sax.de/~joerg/NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)


Re: [Patch,avr]: Remove -mshort-calls option

2012-10-07 Thread Georg-Johann Lay

Joerg Wunsch wrote:

As Oleg Endo wrote:


I think it's more user friendly to
first warn and then do.


The problem is that this option would better not have existed straight
from the beginning.  When using it, the compiler is at the risk of
generating code that fails to link later, because the relative calls
used could not reach the entire address space of the respective
processor.


Also notice that this is only about the -mshort-calls option of the avr 
port, not about similar named options that exist for some other ports.


Johann





Re: handle isl and cloog in contrib/download_prerequisites

2012-10-07 Thread Jonathan Wakely
Resending as plain text so the list doesn't reject it ...

On Oct 7, 2012 12:00 AM, "NightStrike"  wrote:
>
> On Sat, Oct 6, 2012 at 7:30 AM, Manuel López-Ibáñez
>  wrote:
> > Hi,
> >
> > GCC now requires ISL and a very new CLOOG but download_prerequisites
> > does not download those. Also, there is only one sensible place to
>
> As of what version is isl/cloog no longer optional?

If they're really no longer optional then the prerequisites page and
4.8 changes page need to be updated.

The patch downloads isl and cloog unconditionally, does gcc build them
unconditionally if they're found in the source dir? If they are still
optional I don't want download_prerequisites to fetch files that will
slow down building gcc by building libs and enabling features I don't
use.


Re: handle isl and cloog in contrib/download_prerequisites

2012-10-07 Thread Manuel López-Ibáñez
On 7 October 2012 22:13, Jonathan Wakely  wrote:
>
> On Oct 7, 2012 12:00 AM, "NightStrike"  wrote:
>>
>> On Sat, Oct 6, 2012 at 7:30 AM, Manuel López-Ibáñez
>>  wrote:
>> > Hi,
>> >
>> > GCC now requires ISL and a very new CLOOG but download_prerequisites
>> > does not download those. Also, there is only one sensible place to
>>
>> As of what version is isl/cloog no longer optional?
>
> If they're really no longer optional then the prerequisites page and 4.8
> changes page need to be updated.
>
> The patch downloads isl and cloog unconditionally, does gcc build them
> unconditionally if they're found in the source dir?  If they are still
> optional I don't want download_prerequisites to fetch files that will slow
> down building gcc by building libs and enabling features I don't use.

I guess they are optional in the sense that you can configure gcc to
not require them. But the default configure in x86_64-gnu-linux
requires isl and cloog. Since isl and cloog need to be
configured/build in a special way to work with gcc (I didn't manage to
make it work even after reading the docs), the easiest way is to just
let GCC build them. I would also hope that if graphite is disabled,
GCC won't try to build isl and cloog, but I really don't know.

Cheers,

Manuel.


Re: handle isl and cloog in contrib/download_prerequisites

2012-10-07 Thread Steven Bosscher
On Sun, Oct 7, 2012 at 10:31 PM, Manuel López-Ibáñez
 wrote:
> Since isl and cloog need to be
> configured/build in a special way to work with gcc

They do?? I built isl and cloog on a few compile farm machines without
any special configure magic. What problems did you encounter?

Ciao!
Steven


Re: [C++ Patch / RFC] PR 51422

2012-10-07 Thread Jason Merrill

OK.

Jason


Re: [C++ Patch] Small SFINAE-related clean up

2012-10-07 Thread Jason Merrill

OK.

Jason


Re: [PATCH] PR 53528 c++/ C++11 Generalized Attribute support

2012-10-07 Thread Jason Merrill

OK.

Jason


Re: handle isl and cloog in contrib/download_prerequisites

2012-10-07 Thread Manuel López-Ibáñez
On 7 October 2012 22:38, Steven Bosscher  wrote:
> On Sun, Oct 7, 2012 at 10:31 PM, Manuel López-Ibáñez
>  wrote:
>> Since isl and cloog need to be
>> configured/build in a special way to work with gcc
>
> They do?? I built isl and cloog on a few compile farm machines without
> any special configure magic. What problems did you encounter?

http://gcc.gnu.org/install/prerequisites.html

CLooG 0.17.0
Necessary to build GCC with the Graphite loop optimizations. It
can be downloaded from ftp://gcc.gnu.org/pub/gcc/infrastructure/ as
cloog-0.17.0.tar.gz. The --with-cloog configure option should be used
if CLooG is not installed in your default library search path. CLooG
needs to be built against ISL 0.10, not its included copy of ISL which
is too old. Use --with-isl=system to direct CLooG to pick up an
already installed ISL. CLooG needs to be configured to use GMP
internally, use --with-bits=gmp to direct it to do that.

The problem was in gcc10 and I tried to install cloog and isl in
/opt/cfarm. I don't recall exactly what failed and why but among the
problems:

* GCC was not able to find libisl.so after successfully passing the
configure checks, which was totally weird.

* GCC did not define HAVE_cloog before including isl.h so quite a few
things were missing.

* Either cloog or isl were finding the wrong version of gmp, and the
build failed saying that some gmp_ function was missing.

In any case, using the patched script worked like a charm, so I did
not investigate further.

Cheers,

Manuel.


Re: [lra] patch to speed more compilation of PR54146

2012-10-07 Thread Steven Bosscher
On Sun, Oct 7, 2012 at 5:59 PM, Vladimir Makarov wrote:
> The following patch speeds LRA up more on PR54146.  Below times for
> compilation of the test on gcc17.fsffrance.org (an AMD machine):
>
> Before:
> real=1214.71 user=1192.05 system=22.48
> After:
> real=1144.37 user=1124.31 system=20.11

Hi Vlad,

The next bottle-neck in my timings is in
lra-eliminate.c:lra_eliminate(), in this loop:

   FOR_EACH_BB (bb)
 FOR_BB_INSNS_SAFE (bb, insn, temp)
   {
   if (bitmap_bit_p (&insns_with_changed_offsets, INSN_UID (insn)))
  process_insn_for_elimination (insn, final_p);
   }

The problem is in bitmap_bit_p. Random access to a large bitmap can be
very slow.

I'm playing with a patch to expand the insns_with_changed_offsets
bitmap to an sbitmap, and will send a patch if this works better.

Ciao!
Steven


[PATCH] fix nanosleep check in GLIBCXX_ENABLE_LIBSTDCXX_TIME for darwin

2012-10-07 Thread Jack Howarth
   The --enable-libstdcxx-time=yes configure option fails to validate the
presence of a usable nanosleep() call on darwin. The fix is the trivial 
change to the test used so that tp is declared as a struct for _POSIX_TIMERS <= 
0.
Regression tested on x86_64-apple-darwin12. Okay for gcc trunk as well as
gcc-4_7-branch/gcc-4_6-branch/gcc-4_5-branch?
Jack

libstdc++-v3/

2012-10-07  Jack Howarth  

PR libstdc++/54847
* acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Declare tp as struct 
for non-POSIX timers.
* configure: Regenerated.

Index: acinclude.m4
===
--- acinclude.m4(revision 192186)
+++ acinclude.m4(working copy)
@@ -1251,6 +1251,8 @@ AC_DEFUN([GLIBCXX_ENABLE_LIBSTDCXX_TIME]
],
[#if _POSIX_TIMERS > 0
  timespec tp;
+#else
+ struct timespec tp;
 #endif
  nanosleep(&tp, 0);
], [ac_has_nanosleep=yes], [ac_has_nanosleep=no])


Re: [SH] PR 54760 - Add thread pointer built-ins and GBR displacement addressing

2012-10-07 Thread Kaz Kojima
Oleg Endo  wrote:
> Attached patch adds the GBR addressing mode stuff.  I've added the 
> '!NONJUMP_INSN_P (i)' check which should fix the crash.  Could you
> please test again?

No new failures with the top level "make -k check" on cross
sh4-unknown-linux-gnu.  OK for trunk.

Regards,
kaz


[lra] another patch to speed more compilation of PR54146

2012-10-07 Thread Steven Bosscher
Hello,

This patch changes the worklist-like bitmap in lra_eliminate() to an
sbitmap.  Effect on compile time:

lra r192183: LRA virtuals elimination:  51.56 ( 6%)
with patch: LRA virtuals elimination:  14.02 ( 2%)

OK for the branch after bootstrap&test on x86_64-unknown-linux-gnu?

Ciao!
Steven


lra-eliminates-sbitmap2.diff
Description: Binary data


Re: [PATCH] PowerPC VLE port

2012-10-07 Thread David Edelsohn
Jim,

This version of the VLE support patch is an improvement, although I
still am troubled by the number of changes to rs6000.md.

And can you suggest any way to re-factor rs6000_rtx_costs()?

Andrew suggested placing all of the patterns in vle.md independently
and without my prompting.  VLE really is more of a separate
architecture that happens to share some similarities and mnemonics
with PowerPC, not a variant of PowerPC.

I agree with Andrew that the cleanup portions of the patch should be
sent separately.

I also am continuing to work with Mike Meissner on the target flags
patch, which will affect this patch.

Thanks, David


Committed, MMIX: fix a port-bug outputting constants, improving test-results.

2012-10-07 Thread Hans-Peter Nilsson
Fixing this bug, which effectively caused 0 to be emitted instead of
constants requiring more than one actual insn to generate (through the
base-plus-offset support, kind-of macro insns), i.e. like "LDA
$2,#0lx" instead of "LDA $2,#ff", gets rid of 193 of the
regressions from (some-early-time) I see for mmix-knuth-mmixware
(regress-234 to regress-41), vastly improving results, somewhat as
follows:

=== gcc Summary ===

-# of expected passes   73786
-# of unexpected failures   918
+# of expected passes   74337
+# of unexpected failures   367

=== g++ Summary ===

-# of expected passes   45504
-# of unexpected failures   174
+# of expected passes   45536
+# of unexpected failures   142

=== objc Summary ===

-# of expected passes   1655
-# of unexpected failures   153
+# of expected passes   2985
+# of unexpected failures   7

=== libstdc++ Summary ===

-# of expected passes   6822
-# of unexpected failures   141
+# of expected passes   6907
+# of unexpected failures   56

It also brings down build+test from (wall-clock) about 7:09:24 to
5:01:46 on a yesteryear x86_64-linux host, mostly from the 17
eliminated timeouts.  (BTW, that's on top of improvements from 9:18:00
that I get from changing the simulator, start-up-files and newlib
low-level support to instrument and optionally require memory to be
explicitly allocated rather than implicitly allocated on access; lots
of eliminated timeouts.  Bugs are also quite a bit easier to diagnose.
Still quite a bit too high; it should be more like 3:30.  I'll post
the mmix-sim.ch when I've diagnosed all of the fallout from those
changes as bugs previously hidden by implicit allocation, rather than
subtle bugs in the new machinery.)

There *are* fall-out regressions with this patch:
+FAIL: objc/execute/exceptions/matcher-1.m execution,  -O0 -fgnu-runtime
+FAIL: objc/execute/exceptions/matcher-1.m execution,  -O1 -fgnu-runtime
+FAIL: objc/execute/exceptions/matcher-1.m execution,  -O2 -fgnu-runtime
+FAIL: objc/execute/exceptions/matcher-1.m execution,  -O3 -fomit-frame-pointer 
-fgnu-runtime
+FAIL: objc/execute/exceptions/matcher-1.m execution,  -O3 -g -fgnu-runtime
+FAIL: objc/execute/exceptions/matcher-1.m execution,  -Os -fgnu-runtime
but I'm just going to let that slide at the moment.  Not even sorry,
just happy that they're so few.  Before someone starts lecturing
me on knowingly introducing regressions, just don't.

There's also a bug in the assembler for letting "#0lx" through as "#0"
instead of erroring on the "lx".  Later.
Committed.

* config/mmix/mmix.c (mmix_output_octa): Don't assume
HOST_WIDEST_INT_PRINT_HEX starts with "0x".  Instead use
HOST_WIDE_INT_PRINT_HEX_PURE, falling back to
HOST_WIDEST_INT_PRINT_UNSIGNED.

--- gcc/config/mmix/mmix.c.orig 2012-09-10 01:41:59.0 +0200
+++ gcc/config/mmix/mmix.c  2012-10-07 05:11:52.0 +0200
@@ -2499,19 +2499,9 @@ mmix_output_shiftvalue_op_from_str (FILE
 static void
 mmix_output_octa (FILE *stream, HOST_WIDEST_INT value, int do_begin_end)
 {
-  /* Snipped from final.c:output_addr_const.  We need to avoid the
- presumed universal "0x" prefix.  We can do it by replacing "0x" with
- "#0" here; we must avoid a space in the operands and no, the zero
- won't cause the number to be assumed in octal format.  */
-  char hex_format[sizeof (HOST_WIDEST_INT_PRINT_HEX)];
-
   if (do_begin_end)
 fprintf (stream, "\tOCTA ");

-  strcpy (hex_format, HOST_WIDEST_INT_PRINT_HEX);
-  hex_format[0] = '#';
-  hex_format[1] = '0';
-
   /* Provide a few alternative output formats depending on the number, to
  improve legibility of assembler output.  */
   if ((value < (HOST_WIDEST_INT) 0 && value > (HOST_WIDEST_INT) -1)
@@ -2520,8 +2510,13 @@ mmix_output_octa (FILE *stream, HOST_WID
   else if (value > (HOST_WIDEST_INT) 0
   && value < ((HOST_WIDEST_INT) 1 << 31) * 2)
 fprintf (stream, "#%x", (unsigned int) value);
-  else
-fprintf (stream, hex_format, value);
+  else if (sizeof (HOST_WIDE_INT) == sizeof (HOST_WIDEST_INT))
+/* We need to avoid the not-so-universal "0x" prefix; we need the
+   pure hex-digits together with the mmixal "#" hex prefix.  */
+fprintf (stream, "#" HOST_WIDE_INT_PRINT_HEX_PURE,
+(HOST_WIDE_INT) value);
+  else /* Need to avoid the hex output; there's no ...WIDEST...HEX_PURE.  */
+fprintf (stream, HOST_WIDEST_INT_PRINT_UNSIGNED, value);

   if (do_begin_end)
 fprintf (stream, "\n");

brgds, H-P


[PATCH] Assigning correct source location for deallocator

2012-10-07 Thread Dehao Chen
Hi,

R191338 did not completely fix the location for deallocator. This
patch covers more cases for deallocator.

Bootstrapped and passed gcc regression test on x86.

Okay for trunk?

Thanks,
Dehao

gcc/ChangeLog:

2012-10-07  Dehao Chen  

* tree-eh.c (lower_try_finally_onedest): Set correct location for
deallocator.
* gimplify.c (gimplify_expr): Set correct location for TRY stmt.

gcc/cp/ChangeLog:

2012-10-07  Dehao Chen  

* cp-gimplify.c (cp_genericize_r): Set location for TRY expr.

gcc/testsuite/ChangeLog:

2012-10-07  Dehao Chen  

* g++.dg/debug/dwarf2/deallocator.C: Cover more deallocator cases.
Index: gcc/cp/cp-gimplify.c
===
--- gcc/cp/cp-gimplify.c(revision 192168)
+++ gcc/cp/cp-gimplify.c(working copy)
@@ -948,11 +948,12 @@ cp_genericize_r (tree *stmt_p, int *walk_subtrees,
  to lower this construct before scanning it, so we need to lower these
  before doing anything else.  */
   else if (TREE_CODE (stmt) == CLEANUP_STMT)
-*stmt_p = build2 (CLEANUP_EH_ONLY (stmt) ? TRY_CATCH_EXPR
-: TRY_FINALLY_EXPR,
- void_type_node,
- CLEANUP_BODY (stmt),
- CLEANUP_EXPR (stmt));
+*stmt_p = build2_loc (input_location,
+ CLEANUP_EH_ONLY (stmt) ? TRY_CATCH_EXPR
+: TRY_FINALLY_EXPR,
+ void_type_node,
+ CLEANUP_BODY (stmt),
+ CLEANUP_EXPR (stmt));
 
   else if (TREE_CODE (stmt) == IF_STMT)
 {
Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 192168)
+++ gcc/gimplify.c  (working copy)
@@ -7475,6 +7475,10 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gi
 TREE_CODE (*expr_p) == TRY_FINALLY_EXPR
 ? GIMPLE_TRY_FINALLY
 : GIMPLE_TRY_CATCH);
+   if (LOCATION_LOCUS (saved_location) != UNKNOWN_LOCATION)
+ gimple_set_location (try_, saved_location);
+   else
+ gimple_set_location (try_, EXPR_LOCATION (save_expr));
if (TREE_CODE (*expr_p) == TRY_CATCH_EXPR)
  gimple_try_set_catch_is_cleanup (try_,
   TRY_CATCH_IS_CLEANUP (*expr_p));
Index: gcc/testsuite/g++.dg/debug/dwarf2/deallocator.C
===
--- gcc/testsuite/g++.dg/debug/dwarf2/deallocator.C (revision 192168)
+++ gcc/testsuite/g++.dg/debug/dwarf2/deallocator.C (working copy)
@@ -18,6 +18,7 @@ int bar();
 
 void foo(int i)
 {
+  t test_outside;
   for (int j = 0; j < 10; j++)
 {
   t test;
@@ -28,6 +29,18 @@ void foo(int i)
  return;
}
 }
+  if (i)
+{
+  t test;
+  if (i == 10)
+   {
+ test.bar();
+   }
+}
+  test_outside.foo();
   return;
 }
-// { dg-final { scan-assembler "deallocator.C:28" } }
+// { dg-final { scan-assembler "deallocator.C:29" } }
+// { dg-final { scan-assembler "deallocator.C:31" } }
+// { dg-final { scan-assembler "deallocator.C:38" } }
+// { dg-final { scan-assembler "deallocator.C:41" } }
Index: gcc/tree-eh.c
===
--- gcc/tree-eh.c   (revision 192168)
+++ gcc/tree-eh.c   (working copy)
@@ -1100,6 +1100,7 @@ lower_try_finally_onedest (struct leh_state *state
   struct goto_queue_node *q, *qe;
   gimple x;
   gimple_seq finally;
+  gimple_stmt_iterator gsi;
   tree finally_label;
   location_t loc = gimple_location (tf->try_finally_expr);
 
@@ -1120,6 +1121,17 @@ lower_try_finally_onedest (struct leh_state *state
 
   lower_eh_constructs_1 (state, &finally);
 
+  for (gsi = gsi_start (finally); !gsi_end_p (gsi); gsi_next (&gsi))
+{
+  gimple stmt = gsi_stmt (gsi);
+  if (LOCATION_LOCUS (gimple_location (stmt)) == UNKNOWN_LOCATION)
+   {
+ tree block = gimple_block (stmt);
+ gimple_set_location (stmt, gimple_location (tf->try_finally_expr));
+ gimple_set_block (stmt, block);
+   }
+}
+
   if (tf->may_throw)
 {
   /* Only reachable via the exception edge.  Add the given label to


Re: [SH] PR 54685 - unsigned int comparison with 0x7FFFFFFF

2012-10-07 Thread Kaz Kojima
Oleg Endo  wrote:
> The attached patch improves comparisons such as
> 'unsigned int <= 0x7FFF' on SH.
> As mentioned in the PR, for some reason, those comparisons do not go
> through the cstore expander.  As a consequence the comparison doesn't
> get the chance to be canonicalized by the target code and ends up as
> '(~x) >> 31'.
> I've not investigated this further and just fixed the symptoms on SH.  I
> don't know whether it's also an issue on other targets.
> 
> Tested on rev 192142 with
> make -k check RUNTESTFLAGS="--target_board=sh-sim
> \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> 
> and no new failures.
> OK?

I've run CSiBE with and without the patch for sh4-unknown-linux-gnu
at -O2.  Only one difference in the resulted sizes: jpeg-6b/jcphuff
increases 5336 bytes to 5340 bytes with the patch.  Could you look
into it?

Regards,
kaz


Re: handle isl and cloog in contrib/download_prerequisites

2012-10-07 Thread Jonathan Wakely
On 7 October 2012 21:31, Manuel López-Ibáñez wrote:
> On 7 October 2012 22:13, Jonathan Wakely  wrote:
>>
>> On Oct 7, 2012 12:00 AM, "NightStrike"  wrote:
>>>
>>> On Sat, Oct 6, 2012 at 7:30 AM, Manuel López-Ibáñez
>>>  wrote:
>>> > Hi,
>>> >
>>> > GCC now requires ISL and a very new CLOOG but download_prerequisites
>>> > does not download those. Also, there is only one sensible place to
>>>
>>> As of what version is isl/cloog no longer optional?
>>
>> If they're really no longer optional then the prerequisites page and 4.8
>> changes page need to be updated.
>>
>> The patch downloads isl and cloog unconditionally, does gcc build them
>> unconditionally if they're found in the source dir?  If they are still
>> optional I don't want download_prerequisites to fetch files that will slow
>> down building gcc by building libs and enabling features I don't use.
>
> I guess they are optional in the sense that you can configure gcc to
> not require them. But the default configure in x86_64-gnu-linux
> requires isl and cloog.

Are you sure?

Seems to me the default is still the same as it always has been, i.e.
Graphite optimisations can be enabled if ISL and cloog are present,
but they're not "required".  I can bootstrap without ISL anyway.


Re: [SH] PR 54685 - unsigned int comparison with 0x7FFFFFFF

2012-10-07 Thread Oleg Endo
On Mon, 2012-10-08 at 09:45 +0900, Kaz Kojima wrote:
> Oleg Endo  wrote:
> > The attached patch improves comparisons such as
> > 'unsigned int <= 0x7FFF' on SH.
> > As mentioned in the PR, for some reason, those comparisons do not go
> > through the cstore expander.  As a consequence the comparison doesn't
> > get the chance to be canonicalized by the target code and ends up as
> > '(~x) >> 31'.
> > I've not investigated this further and just fixed the symptoms on SH.  I
> > don't know whether it's also an issue on other targets.
> > 
> > Tested on rev 192142 with
> > make -k check RUNTESTFLAGS="--target_board=sh-sim
> > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> > 
> > and no new failures.
> > OK?
> 
> I've run CSiBE with and without the patch for sh4-unknown-linux-gnu
> at -O2.  Only one difference in the resulted sizes: jpeg-6b/jcphuff
> increases 5336 bytes to 5340 bytes with the patch.  Could you look
> into it?

Yep, that's actually the only place in the CSiBE set where this case
hits.  The function in question is encode_mcu_AC_refine.  The increase
seems to be due to different register allocation and different spill
code :T
I've attached the asm diff.

Cheers,
Oleg
--- CSiBE/m4-single-ml-O2-trunk/jpeg-6b/jcphuff.s
+++ CSiBE/m4-single-ml-O2/jpeg-6b/jcphuff.s
@@ -2147,7 +2147,7 @@
 	bt/s	.L611
 	mov.l	r2,@(24,r15)
 	bra	.L612
-	mov.l	@(44,r15),r0
+	mov.l	@(44,r15),r3
 .L611:
 	mov.l	.L565,r4
 	mov	r2,r5
@@ -2513,21 +2513,21 @@
 	mov	r0,r1
 	mov	r9,r0
 	and	r2,r1
-	mov.l	@(24,r15),r3
+	mov.l	@(28,r15),r3
 	mov.b	r1,@(r0,r8)
 	mov	r9,r11
-	mov.l	@(28,r15),r0
-	add	#1,r3
-	mov.l	@(36,r15),r1
+	mov.l	@(24,r15),r2
+	add	#4,r3
+	mov.l	@(36,r15),r0
 	add	#1,r11
-	mov.l	@(40,r15),r2
+	mov.l	@(40,r15),r1
+	add	#1,r2
 	add	#4,r0
-	add	#4,r1
-	mov.l	r3,@(24,r15)
-	mov.l	r0,@(28,r15)
-	cmp/ge	r3,r2
+	mov.l	r2,@(24,r15)
+	mov.l	r3,@(28,r15)
+	cmp/ge	r2,r1
 	bt/s	.L599
-	mov.l	r1,@(36,r15)
+	mov.l	r0,@(36,r15)
 	tst	r11,r11
 	bt/s	.L555
 	mov	r12,r14
@@ -2545,21 +2545,23 @@
 	mov.w	.L578,r3
 	cmp/hi	r3,r2
 	bf/s	.L612
-	mov.l	@(44,r15),r0
+	mov.l	@(44,r15),r3
 .L515:
 	mov.l	.L582,r2
 	jsr	@r2
 	mov	r14,r4
 .L459:
+	mov.l	@(44,r15),r3
+.L612:
 	mov.l	@(44,r15),r0
-.L612:
+	mov.l	@(24,r3),r2
 	mov.l	@(16,r14),r3
-	mov.l	@(24,r0),r2
 	mov.l	r3,@r2
 	mov.l	@(20,r14),r3
 	mov.l	r3,@(4,r2)
 	mov.w	.L580,r2
-	mov.l	@(r0,r2),r2
+	add	r0,r2
+	mov.l	@(8,r2),r2
 	tst	r2,r2
 	bt	.L544
 	add	#64,r14
@@ -2594,18 +2596,18 @@
 	add	#1,r2
 	mov.l	r2,@(16,r15)
 .L467:
-	mov.l	@(24,r15),r3
-	mov.l	@(28,r15),r0
-	mov.l	@(36,r15),r1
-	add	#1,r3
-	mov.l	@(40,r15),r2
+	mov.l	@(24,r15),r2
+	mov.l	@(28,r15),r3
+	mov.l	@(36,r15),r0
+	add	#1,r2
+	mov.l	@(40,r15),r1
+	add	#4,r3
 	add	#4,r0
-	add	#4,r1
-	mov.l	r3,@(24,r15)
-	mov.l	r0,@(28,r15)
-	cmp/ge	r3,r2
+	mov.l	r2,@(24,r15)
+	mov.l	r3,@(28,r15)
+	cmp/ge	r2,r1
 	bf/s	.L603
-	mov.l	r1,@(36,r15)
+	mov.l	r0,@(36,r15)
 .L599:
 	bra	.L617
 	mov.l	@(28,r15),r1
@@ -2614,8 +2616,8 @@
 	bf/s	.L523
 	mov	r12,r14
 .L555:
-	mov.l	@(16,r15),r3
-	cmp/pl	r3
+	mov.l	@(16,r15),r2
+	cmp/pl	r2
 	bf	.L459
 	mov.l	@(56,r14),r3
 	bra	.L625
@@ -2642,13 +2644,13 @@
 	add	#1,r2
 	mov.l	r2,@r3
 .L511:
-	mov.l	@(20,r15),r1
+	mov.l	@(20,r15),r0
 .L620:
-	mov	#0,r2
+	mov	#0,r1
 	mov	#0,r11
-	mov.l	r2,@(16,r15)
+	mov.l	r1,@(16,r15)
 	bra	.L467
-	mov.l	@(0,r1),r8
+	mov.l	@(0,r0),r8
 	.align 1
 .L522:
 	bra	.L619
@@ -2659,7 +2661,7 @@
 .L578:
 	.short	937
 .L580:
-	.short	196
+	.short	188
 .L581:
 	.short	312
 .L583:
@@ -2728,16 +2730,15 @@
 	tst	r3,r3
 	mov.l	r14,@(28,r12)
 	mov.l	@r1,r0
-	mov.l	@(52,r15),r2
+	mov.l	@(52,r15),r1
 	add	r0,r0
 	mov.l	r11,@(24,r12)
 	bf/s	.L511
-	mov.w	@(r0,r2),r1
-	not	r1,r1
-	mov	r14,r10
-	shll	r1
+	mov.w	@(r0,r1),r2
+	cmp/pz	r2
 	neg	r14,r3
 	movt	r1
+	mov	r14,r10
 	add	#23,r3
 	shld	r3,r1
 	add	#1,r10
@@ -2784,7 +2785,7 @@
 	mov	r9,r6
 .L601:
 	bra	.L620
-	mov.l	@(20,r15),r1
+	mov.l	@(20,r15),r0
 	.align 1
 .L556:
 	mov.l	.L589,r1
@@ -2812,9 +2813,9 @@
 	add	#-8,r10
 	.align 1
 .L558:
-	mov.l	.L589,r3
+	mov.l	.L589,r2
 	mov	r12,r4
-	jsr	@r3
+	jsr	@r2
 	mov.l	r1,@(4,r15)
 	mov.l	@(4,r15),r1
 	cmp/eq	r13,r1
@@ -2830,8 +2831,8 @@
 	dt	r2
 	bf/s	.L507
 	mov.l	r2,@(20,r12)
-	mov.l	.L589,r0
-	jsr	@r0
+	mov.l	.L589,r3
+	jsr	@r3
 	mov	r12,r4
 	bra	.L622
 	add	#-8,r10


Re: [SH] PR 54685 - unsigned int comparison with 0x7FFFFFFF

2012-10-07 Thread Oleg Endo
On Mon, 2012-10-08 at 03:53 +0200, Oleg Endo wrote:
> On Mon, 2012-10-08 at 09:45 +0900, Kaz Kojima wrote:
> > Oleg Endo  wrote:
> > > The attached patch improves comparisons such as
> > > 'unsigned int <= 0x7FFF' on SH.
> > > As mentioned in the PR, for some reason, those comparisons do not go
> > > through the cstore expander.  As a consequence the comparison doesn't
> > > get the chance to be canonicalized by the target code and ends up as
> > > '(~x) >> 31'.
> > > I've not investigated this further and just fixed the symptoms on SH.  I
> > > don't know whether it's also an issue on other targets.
> > > 
> > > Tested on rev 192142 with
> > > make -k check RUNTESTFLAGS="--target_board=sh-sim
> > > \{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"
> > > 
> > > and no new failures.
> > > OK?
> > 
> > I've run CSiBE with and without the patch for sh4-unknown-linux-gnu
> > at -O2.  Only one difference in the resulted sizes: jpeg-6b/jcphuff
> > increases 5336 bytes to 5340 bytes with the patch.  Could you look
> > into it?
> 
> Yep, that's actually the only place in the CSiBE set where this case
> hits.  The function in question is encode_mcu_AC_refine.  The increase
> seems to be due to different register allocation and different spill
> code :T
> I've attached the asm diff.

I've just checked this against current trunk (rev 192193).  The problem
seems to be gone.  There's also a total decrease of 152 bytes on this
file without the patch.  So it seems it was a different issue.  The diff
is now:

--- CSiBE/m4-single-ml-O2/jpeg-6b/jcphuff_.s
+++ CSiBE/m4-single-ml-O2/jpeg-6b/jcphuff.s
@@ -2626,13 +2626,12 @@
add r0,r0
mov.l   r3,@(24,r13)
bf/s.L502
-   mov.w   @(r0,r1),r11
+   mov.w   @(r0,r1),r7
mov.l   @(12,r15),r0
-   not r11,r11
-   shllr11
+   cmp/pz  r7
mov.l   @(12,r15),r10
+   movtr11
neg r0,r2
-   movtr11
add #23,r2
shldr2,r11
add #1,r10

Cheers,
Oleg



Re: [SH] PR 54685 - unsigned int comparison with 0x7FFFFFFF

2012-10-07 Thread Kaz Kojima
Oleg Endo  wrote:
> I've just checked this against current trunk (rev 192193).  The problem
> seems to be gone.  There's also a total decrease of 152 bytes on this
> file without the patch.  So it seems it was a different issue.  The diff
> is now:

Thanks for looking into it quickly.  The patch is OK.

Regards,
kaz


Re: [ping patch] Predict for loop exits in short-circuit conditions

2012-10-07 Thread Dehao Chen
Attached is the updated patch. Yes, if we add a VRP pass before
profile pass, this patch would be unnecessary. Should we add a VRP
pass?

Thanks,
Dehao

On Sat, Oct 6, 2012 at 9:38 AM, Jan Hubicka  wrote:
>> ping^2
>>
>> Honza, do you think this patch can make into 4.8 stage 1?
>
> +  if (check_value_one ^ integer_onep (val))
>
> Probably better as !=
> (especially because GNU coding standard allows predicates to return more than
> just boolean)
>
>
> +{
> +  edge e1;
> +  edge_iterator ei;
> +  tree val = gimple_phi_arg_def (phi_stmt, i);
> +  edge e = gimple_phi_arg_edge (phi_stmt, i);
> +
> +  if (!TREE_CONSTANT (val) || !(integer_zerop (val) || integer_onep 
> (val)))
> +   continue;
> +  if (check_value_one ^ integer_onep (val))
> +   continue;
> +  if (VEC_length (edge, e->src->succs) != 1)
> +   {
> + if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED)
> + && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS)
> + && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT))
> +   predict_edge_def (e, PRED_LOOP_EXIT, NOT_TAKEN);
> + continue;
> +   }
> +
> +  FOR_EACH_EDGE (e1, ei, e->src->preds)
> +   if (!predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS_GUESSED)
> +   && !predicted_by_p (exit_edge->src, PRED_LOOP_ITERATIONS)
> +   && !predicted_by_p (exit_edge->src, PRED_LOOP_EXIT))
> + predict_edge_def (e1, PRED_LOOP_EXIT, NOT_TAKEN);
>
> Here you found an edge that you know is going to terminate the loop
> and you want to predict all paths to this edge as unlikely.
> Perhaps you want to use predict paths leading_to_edge for edge?
>
> You do not need to check PRED_LOOP_ITERATIONS and PRED_LOOP_ITERATIONS_GUESSED
> because those never go to the non-exit edges.
>
> The nature of predict_paths_for_bb type heuristic is that they are not really
> additive: if the path leads to two different aborts it does not make it more
> sure that it will be unlikely.  So perhaps you can add !predicted_by_p (e, 
> pred)
> prior predict_edge_def call in the function?
>
> I wonder if we did VRP just before branch predction to jump thread the 
> shortcut
> condtions into loopback edges, would be there still cases where this
> optimization will match?
>
> Honza


predict.patch
Description: Binary data


RE: [PATCH RFA] Implement register pressure directed hoist pass

2012-10-07 Thread Bin Cheng
Hi Steven,
Thanks for the comments and sorry for the delay with this message.

> -Original Message-
> 
> Hello,
> 
> Thanks for the update. The first look wasn't a very thorough review, so I
have
> more comments now. Sorry for that, I should have taken the time for this
the
> first time round...

It's OK.

> > -   - do rough calc of how many regs are needed in each block, and a
rough
> > - calc of how many regs are available in each class and use that to
> > - throttle back the code in cases where RTX_COST is minimal.
> 
> This comment still applies to LCM.

It seems this comment just express the idea using register pressure
information to direct code movement for expressions with small
RTX_COST(max_distance). But the threshold of max_distance is only
implemented for hoist in GCC, I assume this comment does not affect LCM.
Could you explain more how this comments applies to LCM?

> But then again...
> 
> > + while (n_regs_set-- > 0)
> > +   {
> > + rtx note = find_regno_note (insn, REG_UNUSED,
> > + REGNO (regs_set[n_regs_set]));
> > + if (! note)
> > +   continue;
> > +
> > + mark_reg_death (XEXP (note, 0));
> > +   }
> 
> Why not just mark all registers mentioned in REG_UNUSED notes as death,
i.e.:
> 
> for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
>   if ((REG_NOTE_KIND (note) == REG_UNUSED)
> mark_reg_death (XEXP (note, 0));
> 
> ?

I don't know the exact reason, can only guess REG_UNUSED notes are recorded
only for set registers here. Since this calculation code is a copy of
existing codes, maybe Vlad can help explain this? Thanks.

> 
> 
> > +fira-hoist-pressure
> > +Common Report Var(flag_ira_hoist_pressure) Init(-1) Use IRA based
> > +register pressure calculation in hoist optimizations.
> 
> Please add the "Optimization" marker. Why initialize to -1? The default
> initialization is 0, and if the flag is set it takes a value 1. If you
follow
> that common behavior, you can replace all occurrences of "if
> (flag_ira_hoist_pressure == 1)" with just ""if (flag_ira_hoist_pressure)".
> 

The reason is because we only want to enable this for Thumb1 originally, and
make it possible to disable it for Thumb1 on command line by specifying
"-fno-ira-hoist-pressure".

> 
> > +  /* Enable register pressure hoist when optimizing for size on
> > + Thumb1 set.  */  if (TARGET_THUMB1 && optimize_function_for_size_p
(cfun)
> > +  && flag_ira_hoist_pressure == -1)
> > +flag_ira_hoist_pressure = 1;
> 
> One would expect this to be a win on all targets, but you probably looked
at
> this (at least Thumb2 and plain ARM). Did your patch not help there?
Otherwise,
> I'd be in favor of enabling this option by default for all targets.

>From the data I collected, it helps for Thumb1/ARM/Mips and maybe x86_64 a
little bit. It has no obvious effect on Thumb2/x86.
Since Jeff also expressed that it might be enabled for all targets, I will
collect size data for more targets. Your previous concern will also be
addressed in this way.

> 
> 
> > +  reg = SET_DEST (set);
> > +  if (GET_CODE (reg) == SUBREG)
> > +reg = SUBREG_REG (reg);
> > +  if (MEM_P (reg))
> > +{
> > +  *nregs = 0;
> > +  pressure_class = NO_REGS;
> > +}
> > +  else
> > +{
> > +  if (! REG_P (reg))
> > +   reg = NULL_RTX;
> > +  if (reg == NULL_RTX)
> > +   pressure_class = GENERAL_REGS;
> 
> gcc_assert (REG_P (reg))? If reg is not a MEM, it must be a REG or it's
not
> recorded in compute_hash_table. In fact, AFAIR reg must be a REG unless
> flag_gcse_las is set, and I don't know if that even works with code
hoisting.
> It's not a particularly well tested flag (although it's a very useful one
for
> most RISC targets, maybe you should have a look at that, too, for ARM
> variants).

Again, it is a copy from loop-invariant.c. It might be rewritten/optimized
for hoist, but should we keep it same with original copy in
loop-invariant.c?

> 
> Ciao!
> Steven

Thanks again for your review, all other comments are accepted and I will
work out an updated patch for review.






RE: [PATCH RFA] Implement register pressure directed hoist pass

2012-10-07 Thread Bin Cheng
Hi Jeff,
Thanks for reviewing and sorry for this delayed message.

> 
> > +  /* Only decrease distance if bb has high register pressure or
EXPR
> > +is const expr, otherwise EXPR can be hoisted through bb without
> > +cost.  */
> ?!?  This comment makes no sense to me.  To accurately know how hoisting
> an expression affects pressure you have to look at the inputs and output
> and see how their lifetime has changed.
> 
> In general:
> 
> For inputs, hoisting *may* reduce pressure.  You really have to look at
> how the life of the input changes based on the new location of the insn.
>   For example, if the input's lifetime is unchanged (say perhaps because
> it was live after the insn we want to hoist), then hoisting will have no
> impact. Otherwise the input's life is shortened, but to know by how much
> you have to determine whether the new death of the input occurs (it may
> still die in the hoisted insn or it may die elsewhere).
> 
> For an output, hoisting will (effectively) always extend the lifetime.
> 
> I've speculated that the right way to deal with register pressure in
> code motion is to actually build the dependency graph and use that to
> guide the code motions.  I've never cobbled together any real code to do
> this though.

Yes, that's exactly what I mean "simulate register pressure change of each
basic block accurately" in the patch. The current patch works in the way it
only extend the lifetime of output registers of hoisted expressions, while
does not consider the possibility of decreasing of register pressure because
of input registers. The rationales are:
1. Calculation live range info and maintain of the info during hoisting
process are very costly.
2. From the observation during the work, I found most of expression hoisted
are constants and base-addresses, means the possibility of decreasing
register pressure in hoisting expression is relatively low.

This patch does not decrease distance if the basic block's register pressure
is low, making the expression can be hoisted up in flow graph longer. When
register pressure is high, the patch retreats to original algorithm.

Also I observed lots of size regressions by hoisting constant expressions
aggressively along with other expressions, so the patch uses original
heuristic for constant expressions.

> > @@ -2863,7 +2909,8 @@ static int
> > if (visited == NULL)
> >   {
> > visited_allocated_locally = 1;
> > -  visited = XCNEWVEC (char, last_basic_block);
> > +  visited = sbitmap_alloc (last_basic_block);
> > +  sbitmap_zero (visited);
> >   }
> What's the purpose behind changing visited from a simple array to a
> sbitmap?  I'm not objecting, but would like to hear the rationale behind
> that change.  I'll also note it wasn't mentioned in the ChangeLog.

Because I have to iterate over visited basic blocks to update register
pressure information, the operation is less effective on integer array.
Maybe I should submit this as a separate patch as suggested by Steven.

> 
> Similarly what's the rationale behind passing the expression itself
> rather than just its index?  I don't see where we need to use anything
> other than the index in this code.  And again, this change isn't
> mentioned in the ChangeLog.

The expression itself is needed to check "CONST_INT_P (expr->expr)".

> 
> > +  /* Considered invariant insns have only one set.  */
> > +  gcc_assert (set != NULL_RTX);
> > +  reg = SET_DEST (set);
> > +  if (GET_CODE (reg) == SUBREG)
> > +reg = SUBREG_REG (reg);
> > +  if (MEM_P (reg))
> > +{
> > +  *nregs = 0;
> > +  pressure_class = NO_REGS;
> > +}
> Don't you need to look at the addresses within the MEM?

This function is a copy from register pressure calculation code in
loop-invariant.c
I think MEM should be ignored here, because as in function hash_scan_set, we
won't honor stores in memory when flag_gcse_las is not enabled. 

> 
> > Index: gcc/config/arm/arm.c
> > ===
> > --- gcc/config/arm/arm.c(revision 191816)
> > +++ gcc/config/arm/arm.c(working copy)
> > @@ -2021,6 +2021,11 @@ arm_option_override (void)
> > && current_tune->num_prefetch_slots > 0)
> >   flag_prefetch_loop_arrays = 1;
> >
> > +  /* Enable register pressure hoist when optimizing for size on Thumb1
set.
> */
> > +  if (TARGET_THUMB1 && optimize_function_for_size_p (cfun)
> > +  && flag_ira_hoist_pressure == -1)
> > +flag_ira_hoist_pressure = 1;
> I'd rather see this enabled for all targets when optimizing for size;
> that guarantees more testing.  Even if it doesn't help a particular
> target, as long as it isn't hurting we're better off enabling it.

I will collect size data for more targets.

> 
> I don't see any testsuite additions -- it'd really help long term to
> have some tests for this stuff.

It's some kind of difficult to create stable cases for hoist, especially
having register pressure information i

Re: libgo patch committed: Use libbacktrace

2012-10-07 Thread Ian Lance Taylor
On Sun, Oct 7, 2012 at 7:05 AM, Andreas Schwab  wrote:
> Ian Lance Taylor  writes:
>
>> +/* Set *VAL to the value of the symbol for PC.  */
>> +
>> +static _Bool
>> +__go_symbol_value (uintptr_t pc, uintptr_t *val)
>> +{
>> +  *val = 0;
>> +  backtrace_syminfo (__go_get_backtrace_state (), pc, syminfo_callback,
>> +  error_callback, &val);
>> +  return *val != 0;
>>  }
>
> Don't clobber address of val.
>
> diff --git a/libgo/runtime/go-caller.c b/libgo/runtime/go-caller.c
> index 843adf6..8dcf9e4 100644
> --- a/libgo/runtime/go-caller.c
> +++ b/libgo/runtime/go-caller.c
> @@ -144,7 +144,7 @@ __go_symbol_value (uintptr_t pc, uintptr_t *val)
>  {
>*val = 0;
>backtrace_syminfo (__go_get_backtrace_state (), pc, syminfo_callback,
> -error_callback, &val);
> +error_callback, val);
>return *val != 0;
>  }

Yikes.  Thanks.  Committed to mainline.

Ian


RE: [PING Updated]: [PATCH GCC/ARM] Fix problem that hardreg_cprop opportunities are missed on thumb1

2012-10-07 Thread Bin Cheng
Ping.

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On
> Behalf Of Bin Cheng
> Sent: Tuesday, September 25, 2012 4:00 PM
> To: 'Richard Sandiford'
> Cc: Ramana Radhakrishnan; Richard Earnshaw; gcc-patches@gcc.gnu.org
> Subject: RE: [Updated]: [PATCH GCC/ARM] Fix problem that hardreg_cprop
> opportunities are missed on thumb1
> 
> 
> > -Original Message-
> > From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
> > Sent: Wednesday, September 05, 2012 6:09 AM
> > To: Bin Cheng
> > Cc: Ramana Radhakrishnan; 'Eric Botcazou'; gcc-patches@gcc.gnu.org
> > Subject: Re: Ping: [PATCH GCC/ARM] Fix problem that hardreg_cprop
> > opportunities are missed on thumb1
> 
> > Subtraction of zero isn't canonical rtl though.  Passes after
> > peephole2
> would
> > be well within their rights to simplify the expression back to a move.
> > From that point of view, making the passes recognise (plus X 0) and
> > (minus
> X 0)
> > as special cases would be inconsistent.
> >
> > Rather than make the Thumb 1 CC usage implicit in the rtl stream, and
> carry
> > the current state around in cfun->machine, it seems like it would be
> better to
> > get md_reorg to rewrite the instructions into a form that makes the
> > use of condition codes explicit.
> >
> > md_reorg also sounds like a better place in the pipeline than
> > peephole2 to
> be
> > doing this kind of transformation, although I admit I have zero
> > evidence
> to
> > back that up...
> >
> 
> Hi Richard,
> 
> This is the updated patch according to your suggestions. I removed the
> peephole2 patterns and introduced function thumb1_reorg to rewrite
> instructions in md_reorg pass.
> 
> In addition to missed propagation, this patch also detects following case:
>   mov r5, r0
>   str r0, [r4]   <---miscellaneous irrelevant instructions
>   [cmp r0, 0]<---saved
>   bne  .Lxxx
> 
> Patch tested on arm-none-eabi/cortex-m0, no regressions introduced.
> 
> Is it OK?
> 
> Thanks.
> 
> 2012-09-25  Bin Cheng  
> 
>   * config/arm/arm.c (thumb1_reorg): New function.
>   (arm_reorg): Call thumb1_reorg.
>   (thumb1_final_prescan_insn): Record src operand in thumb1_cc_op0.
>   * config/arm/arm.md : Remove peephole2 patterns which rewrites move
>   into subtract of ZERO.