Re: PATCH RFA: New configure option --with-native-system-header-dir

2011-10-11 Thread Paolo Bonzini

On 10/09/2011 08:18 AM, Ian Lance Taylor wrote:

+#undef NATIVE_HEADER_HEADER_COMPONENT
+#define NATIVE_SYSTEM_HEADER_COMPONENT MINGW


Typo (I think), otherwise okay.

Paolo


Re: [4/4] Make SMS schedule register moves

2011-10-11 Thread Richard Sandiford
Ayal Zaks ayal.z...@gmail.com writes:
 The issue of assigning stages to reg-moves is mostly relevant for
 prolog and epilog generation, which requires and receives special
 attention -- handled very nicely by ps_num_consecutive_stages! Note
 that currently a simple boolean indicator for (the exceptional case
 of) double stages would suffice, instead of generalizing to arbitrary
 nums of consecutive stages (see any potential use for them?).

 Not in the immediate term.  But I think having a boolean indicator
 would be inconsistent.  If the distance field is an int (even though
 we only expect distance-0 and distance-1 register dependencies)
 then I think the number of stages should be too.

 I did wonder originally about using a boolean, but IMO, it makes
 the code less readable rather than more.  Instead of a simple
 range check like:

     if (first_stage_for_insn <= last_stage_in_range
         && last_stage_for_insn >= first_stage_in_range)

 we end up with the equivalent of:

     if (first_stage_for_insn <= last_stage_in_range
         && (double_stage_move_p (...)
             ? first_stage_for_insn + 1 >= first_stage_in_range
             : first_stage_for_insn >= first_stage_in_range))

 with no corresponding simplification elsewhere.
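
 As an illustration of the range check discussed above, here is a minimal
 standalone sketch.  It is not code from the patch: stages_overlap_p and its
 parameters are hypothetical names, and the number of consecutive stages is
 passed in directly rather than looked up from a partial schedule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper, not from the patch: does an instruction that
   occupies NUM_STAGES consecutive stages starting at FIRST_STAGE overlap
   the inclusive stage range [RANGE_FIRST, RANGE_LAST]?  */
static bool
stages_overlap_p (int first_stage, int num_stages,
                  int range_first, int range_last)
{
  assert (num_stages >= 1);

  /* With an integer stage count the last stage is a plain addition,
     so the test stays a simple two-sided interval check.  */
  int last_stage = first_stage + num_stages - 1;
  return first_stage <= range_last && last_stage >= range_first;
}
```

 With a stage count of 2, an instruction starting in stage 2 overlaps the
 range [3, 4]; with a count of 1 it does not.  No special case is needed for
 the double-stage move.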


 Sure. But setting the range can be done by consulting a simple
 indicator, rather than generalizing to arbitrary stage numbers; e.g.:

 +ps_num_consecutive_stages (partial_schedule_ptr ps, int id)
 +{
 +  if (id >= ps->g->num_nodes && ps_reg_move (ps, id)->double_stages)
 +return 2;
 +  else
 +return 1;
 +}

 or

 -  last_u = first_u + ps_num_consecutive_stages (ps, u) - 1;
 +  if (...double_stages) last_u = first_u + 1;
 +  else last_u = first_u;

Understood.  I still prefer the posted version though.

 E.g. adding something like this at the end:

   ??? The algorithm restricts the scheduling window to II cycles.
   In rare cases, it may be better to allow windows of II+1 cycles.
   The window would then start and end on the same row, but with
   different must precede and must follow requirements.

 Let me know what you think and I'll add it as a follow-on patch.


 great, thanks.

OK, added with the patch below.

 +
 +   The move is part of a chain that satisfies register dependencies
 +   between a producing ddg node and various consuming ddg nodes.
 +   If some of these dependencies cross a loop iteration (that is,
 +   have a distance of 1) then DISTANCE1_USES is nonnull and contains
 +   the set of uses with distance-1 dependencies.  DISTANCE1_USES
 +   is null otherwise.
 +

 Maybe clarify that they are upwards-exposed or live-in uses.

 OK, changed to:

   The move is part of a chain that satisfies register dependencies
   between a producing ddg node and various consuming ddg nodes.
   If some of these dependencies have a distance of 1 (meaning that
   the use is upward-exposoed) then DISTANCE1_USES is nonnull and

 exposed (typo)

Oops, also fixed below (and applied).

Richard


gcc/
* modulo-sched.c: Fix comment typo.  Mention the possibility
of using scheduling windows of II+1 cycles.

Index: gcc/modulo-sched.c
===
--- gcc/modulo-sched.c  2011-10-10 12:42:41.0 +0100
+++ gcc/modulo-sched.c  2011-10-11 09:07:08.069166743 +0100
@@ -545,7 +545,7 @@ set_columns_for_ps (partial_schedule_ptr
The move is part of a chain that satisfies register dependencies
between a producing ddg node and various consuming ddg nodes.
If some of these dependencies have a distance of 1 (meaning that
-   the use is upward-exposoed) then DISTANCE1_USES is nonnull and
+   the use is upward-exposed) then DISTANCE1_USES is nonnull and
contains the set of uses with distance-1 dependencies.
DISTANCE1_USES is null otherwise.
 
@@ -1810,7 +1810,11 @@ sms_schedule (void)
41. endif
42. compute epilogue & prologue
43. finish - succeeded to schedule
-*/
+
+   ??? The algorithm restricts the scheduling window to II cycles.
+   In rare cases, it may be better to allow windows of II+1 cycles.
+   The window would then start and end on the same row, but with
+   different must precede and must follow requirements.  */
 
 /* A limit on the number of cycles that resource conflicts can span.  ??? Should
be provided by DFA, and be dependent on the type of insn scheduled.  Currently


Re: int_cst_hash_table mapping persistence and the garbage collector

2011-10-11 Thread Richard Guenther
On Mon, Oct 10, 2011 at 7:02 PM, Gary Funck g...@intrepid.com wrote:
 Recently, a few UPC test programs failed to compile
 due to mis-matches of parameters in a prototype and its
 corresponding function definition.  The mis-match was
 based upon the apparent inequality of UPC layout qualifiers
 (blocking factors).

 UPC blocking factors are integer constants.  They are
 recorded in a hash table indexed by the type tree node
 that they correspond to.

 Currently, the test for equality of blocking factors
 tests only the pointer to the tree node defining the constant.
 All blocking factors are recorded as sizetype type'd nodes.
 Given that integer constants are hashed by type/value, it seemed
 safe to assume that a given blocking factor would map to
 a single tree node due to the underlying hash method that is used
 when integral constants are created.

 Is it valid to assume that pointer equality is sufficient
 to ensure that two integer constants are equal as long
 as their type and values are equal?

 The bug that we ran into occurred because a garbage collection
 pass was run between the point that the function prototype
 tree node was created and the point at which the function declaration
 was processed.  The garbage collector decided that the integer
 constant representing the blocking factor was no longer in use,
 because it had not been marked.

 In fact, the integer constant was needed because it appeared
 in the blocking factor hash table, but not via a direct pointer.
 Rather it was referenced by nature of the fact that
 the blocking factor hash table referenced the integer constant
 that is mapped in the integer constant hash table.

 Here's a rough diagram:

   tree (type) -> [ blocking factor hash ] -> tree (integer constant)
   tree (integer constant) -> [ integer constant hash ] {unique map}
                              -> tree (integer constant)

 When the garbage collector deleted the entry from the
 integer constant hash, it forced a new integer constant tree
 node to be created for the same (type, value) integral constant
 blocking factor.

 One easy way to address the current issue is to call
 tree_int_cst_equal() if the integer constant tree pointers
 do not match:

    if ((c1 != c2) && !tree_int_cst_equal (c1, c2))
      /* integer constants aren't equal.  */

 This may be necessary if 'int_cst_hash_table' is viewed as
 a cache rather than a persistent, unique mapping.
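
 To make the "cache rather than unique mapping" behaviour concrete, here is
 a toy sketch.  It does not use GCC's tree or hash table types: every name
 beginning with toy_ is hypothetical, the table has a single slot, and the
 "collection" simply drops the table entry while an older pointer is still
 held elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for an integer constant node; not GCC's tree type.  */
struct toy_cst { long value; };

/* One-slot "hash table" that behaves like a cache.  */
static struct toy_cst *toy_slot;

/* Return the shared node for VALUE, creating one if the table has none.  */
static struct toy_cst *
toy_build_int_cst (long value)
{
  if (toy_slot && toy_slot->value == value)
    return toy_slot;            /* Shared node: pointer equality holds.  */

  struct toy_cst *c = malloc (sizeof *c);
  assert (c != NULL);
  c->value = value;
  toy_slot = c;
  return c;
}

/* Model a collection that drops the table entry.  The old node is
   deliberately not freed, so a pointer held elsewhere stays usable.  */
static void
toy_collect (void)
{
  toy_slot = NULL;
}

/* Pointer comparison first, value comparison as the fallback.  */
static bool
toy_cst_equal (const struct toy_cst *a, const struct toy_cst *b)
{
  return a == b || a->value == b->value;
}
```

 Building the constant 4, calling toy_collect (), and building 4 again gives
 two distinct pointers that still compare equal by value.  That is the
 situation the pointer-only test misses.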

 Another approach would be to somehow mark the node
 in int_cst_hash_table as in use when the blocking factor
 hash table is traversed by the garbage collector, or
 to add logic to the hash table delete function associated
 with int_cst_hash_table to disallow the delete if the
 integer constant is present in the UPC blocking factor
 hash table.

 To effect this change in a modular way probably the hash table
 delete function associated with 'int_cst_hash_table' would have
 to be daisy-chained, where the UPC blocking factor check is made
 first.  The difficulty with implementing the daisy chaining is that
 int_cst_hash_table needs to exist before the UPC-related initialization
 code is run.  One way to handle this might be yet another language
 hook, called from the code that creates 'int_cst_hash_table'.
 That seems overly complex.

 For reference, the current blocking factor mapping table
 is created as follows:

  upc_block_factor_for_type = htab_create_ggc (512, tree_map_hash,
                                               tree_map_eq, 0);

 Summary:

 1. Is it valid to assume that pointer equality is sufficient
 to compare two integer constants for equality as long as they
 have identical type and value?

Yes, if both constants are live

 2. Should 'int_cst_hash_table' be viewed as a cache, where
 the mapping of a given (type, value) integer constant may
 vary over time?

Yes, if a constant is unused it may get collected and re-allocated
later.  Cannot be observed from any valid use of 1.

 3. If the answer to 1. is yes and the answer to 2. is no
 then what is the recommended way to ensure that nodes in
 'int_cst_hash_table' are not removed if the integer constant
 is being referenced via the 'upc_block_factor_for_type'
 hash table?

You need to ensure the constants are marked properly.

Richard.

 thanks,
 - Gary



[Patch,AVR]: Housekeeping avr_legitimate_address_p

2011-10-11 Thread Georg-Johann Lay
This is a bit of code cleanup that moves macro code from avr.h to functions in
avr.c.

There's no change in functionality. Passed without regressions.

Ok?

Johann

* config/avr/avr-protos.h (avr_mode_code_base_reg_class): New
prototype.
(avr_regno_mode_code_ok_for_base_p): New prototype.
* config/avr/avr.h (BASE_REG_CLASS): Remove.
(REGNO_OK_FOR_BASE_P): Remove.
(REG_OK_FOR_BASE_NOSTRICT_P): Remove.
(REG_OK_FOR_BASE_STRICT_P): Remove.
(MODE_CODE_BASE_REG_CLASS): New define.
(REGNO_MODE_CODE_OK_FOR_BASE_P): New define.

* config/avr/avr.c (avr_mode_code_base_reg_class): New function.
(avr_regno_mode_code_ok_for_base_p): New function.
(avr_reg_ok_for_addr_p): New static function.
(avr_legitimate_address_p): Use it.  Beautify.
Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h	(revision 179765)
+++ config/avr/avr-protos.h	(working copy)
@@ -106,6 +106,8 @@ extern int avr_simplify_comparison_p (en
 extern RTX_CODE avr_normalize_condition (RTX_CODE condition);
 extern void out_shift_with_cnt (const char *templ, rtx insn,
 rtx operands[], int *len, int t_len);
+extern reg_class_t avr_mode_code_base_reg_class (enum machine_mode, RTX_CODE, RTX_CODE);
+extern bool avr_regno_mode_code_ok_for_base_p (int, enum machine_mode, RTX_CODE, RTX_CODE);
 extern rtx avr_incoming_return_addr_rtx (void);
 extern rtx avr_legitimize_reload_address (rtx, enum machine_mode, int, int, int, int, rtx (*)(rtx,int));
 #endif /* RTX_CODE */
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 179765)
+++ config/avr/avr.c	(working copy)
@@ -1202,43 +1202,68 @@ avr_cannot_modify_jumps_p (void)
 }
 
 
+/* Helper function for `avr_legitimate_address_p'.  */
+
+static inline bool
+avr_reg_ok_for_addr_p (rtx reg, addr_space_t as ATTRIBUTE_UNUSED, int strict)
+{
+  return (REG_P (reg)
+   && (avr_regno_mode_code_ok_for_base_p (REGNO (reg),
+ QImode, MEM, UNKNOWN)
+  || (!strict
+   && REGNO (reg) >= FIRST_PSEUDO_REGISTER)));
+}
+
+
 /* Return nonzero if X (an RTX) is a legitimate memory address on the target
machine for a memory operand of mode MODE.  */
 
-bool
+static bool
 avr_legitimate_address_p (enum machine_mode mode, rtx x, bool strict)
 {
   reg_class_t r = NO_REGS;
   
-  if (REG_P (x) && (strict ? REG_OK_FOR_BASE_STRICT_P (x)
-: REG_OK_FOR_BASE_NOSTRICT_P (x)))
-r = POINTER_REGS;
+  if (REG_P (x)
+   && avr_reg_ok_for_addr_p (x, ADDR_SPACE_GENERIC, strict))
+{
+  r = POINTER_REGS;
+}
   else if (CONSTANT_ADDRESS_P (x))
-r = ALL_REGS;
+{
+  r = ALL_REGS;
+}
   else if (GET_CODE (x) == PLUS
 && REG_P (XEXP (x, 0))
-	&& GET_CODE (XEXP (x, 1)) == CONST_INT
-	&& INTVAL (XEXP (x, 1)) >= 0)
+&& CONST_INT_P (XEXP (x, 1))
+&& INTVAL (XEXP (x, 1)) >= 0)
 {
-  int fit = INTVAL (XEXP (x, 1)) <= MAX_LD_OFFSET (mode);
+  rtx reg = XEXP (x, 0);
+  bool fit = INTVAL (XEXP (x, 1)) <= MAX_LD_OFFSET (mode);
+  
   if (fit)
-	{
-	  if (! strict
-	  || REGNO (XEXP (x,0)) == REG_X
-	  || REGNO (XEXP (x,0)) == REG_Y
-	  || REGNO (XEXP (x,0)) == REG_Z)
-	r = BASE_POINTER_REGS;
-	  if (XEXP (x,0) == frame_pointer_rtx
-	  || XEXP (x,0) == arg_pointer_rtx)
-	r = BASE_POINTER_REGS;
-	}
-  else if (frame_pointer_needed && XEXP (x,0) == frame_pointer_rtx)
-	r = POINTER_Y_REGS;
+{
+  if (! strict
+  || REGNO (reg) == REG_X
+  || REGNO (reg) == REG_Y
+  || REGNO (reg) == REG_Z)
+{
+  r = BASE_POINTER_REGS;
+}
+  
+  if (reg == frame_pointer_rtx
+  || reg == arg_pointer_rtx)
+{
+  r = BASE_POINTER_REGS;
+}
+}
+  else if (frame_pointer_needed && reg == frame_pointer_rtx)
+{
+  r = POINTER_Y_REGS;
+}
 }
   else if ((GET_CODE (x) == PRE_DEC || GET_CODE (x) == POST_INC)
 && REG_P (XEXP (x, 0))
-&& (strict ? REG_OK_FOR_BASE_STRICT_P (XEXP (x, 0))
-   : REG_OK_FOR_BASE_NOSTRICT_P (XEXP (x, 0))))
+&& avr_reg_ok_for_addr_p (XEXP (x, 0), ADDR_SPACE_GENERIC, strict))
 {
   r = POINTER_REGS;
 }
@@ -1269,7 +1294,7 @@ avr_legitimate_address_p (enum machine_m
 /* Attempts to replace X with a valid
memory address for an operand of mode MODE  */
 
-rtx
+static rtx
 avr_legitimize_address (rtx x, rtx oldx, enum machine_mode mode)
 {
   bool big_offset_p = false;
@@ -7170,6 +7195,51 @@ avr_hard_regno_mode_ok (int regno, enum
 }
 
 
+/* Implement `MODE_CODE_BASE_REG_CLASS'.  */
+
+reg_class_t
+avr_mode_code_base_reg_class (enum machine_mode mode ATTRIBUTE_UNUSED,
+   

Re: Fix for PR libobjc/49883 (clang + gcc 4.6 runtime = broken) and a small related clang fix

2011-10-11 Thread Nicola Pero
 Unfortunately, the report was correct in that clang is producing incorrect
 code and abusing the higher bits of the class-info field to store some other
 information.
 
 The clang folks are pretty responsive.  I'd always give them a chance to
 `fix' their code, before putting hack-arounds in our code in general.

That discussion did happen in private.  It wasn't pleasant.  They won't change
their code.  In fact, I just want to fix things and not get into more
discussions.

Anyhow, summarizing, the traditional GNU runtime ABI has the values 0x1L or
0x2L in the class-info field.  But there is no formal definition document for
the ABI, so all we can say is that GCC has always set that field to either
0x1L or 0x2L.  By the way, the lack of a formal definition document is a
problem, and if, at some point, I get to implement a new ABI for the GNU
Objective-C runtime (which I want to do) I will produce a formal document
describing it - so that anyone can implement a compatible compiler or runtime.

But, for the existing ABI, there is no document describing it, hence all that
can be said is that GCC only stores the values 0x1L or 0x2L in the class-info
field.  The GNU runtime then uses some of the other bits to store information
on the class at runtime - e.g. when the class is +initialized it sets a bit,
when it is resolved it sets another, etc.

clang started abusing a higher bit of that field to store information not
normally present in the ABI.  That worked with older versions of the GNU
runtime, because (by sheer chance in my view) the higher bit they set was not
being used.  The fact that it was not being used was an implementation
accident (in my view) since other higher bits were actually used.

The new GNU runtime included in GCC 4.6.x and higher has classes in
construction (part of the new Objective-C API) and so the next available bit
in the class-info field was used to keep track of the fact that a class is in
construction.  That was just the next available bit, but (unknown to me) it is
precisely the bit that clang was (ab)using.  As a consequence, code compiled
with clang no longer works with the GNU runtime from GCC 4.6.x.  As there is
no formal definition document for the ABI, while it seems obvious to me that
they broke the ABI (since they produce object files with some reserved bits
set that no version of GCC would ever produce), they claim they didn't because
their hack worked with GCC up to 4.5.x and the GNU runtime ignored whether
that bit was set or not - up until 4.5.x.

It's a standoff because they use that higher bit to basically produce a richer
ABI, so they can't easily get rid of it now, and they won't.  The hack-around
I added clears this higher bit, unlocks the standoff and gets things to work
again.
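
As a rough sketch of what clearing a reserved bit in such a field looks like
(this is not the committed runtime change; the macro name, the bit position
and the function name below are assumptions made up for illustration):

```c
#include <assert.h>

/* Hypothetical bit position, chosen only for this example; the real
   runtime's macro names and values are not taken from this message.  */
#define TOY_RESERVED_BIT (1UL << 4)

/* Return INFO with the reserved bit cleared, leaving the values the
   compiler is expected to emit (0x1 or 0x2) untouched.  */
static unsigned long
toy_clear_reserved_bit (unsigned long info)
{
  unsigned long cleaned = info & ~TOY_RESERVED_BIT;

  /* Postcondition: the runtime may now use the bit for its own state.  */
  assert ((cleaned & TOY_RESERVED_BIT) == 0);
  return cleaned;
}
```

A class-info value of 0x1 with the extra bit set comes back as plain 0x1, and
a value without the extra bit is returned unchanged.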

Let's hope there are no more such issues, and if we introduce a new GNU
Objective-C runtime ABI, we need to make sure it is well documented so that it
is possible to easily ensure compatibility between different compilers and
runtimes.

Thanks


[RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP.

2011-10-11 Thread Sameera Deshpande
This series of 5 patches generates LDRD/STRD instead of POP/PUSH in
epilogue/prologue for ARM and Thumb-2 mode of A15.

Patch [1/5] introduces a new field in tune which can be used to indicate
whether LDRD/STRD are preferred over POP/PUSH by the specific core.

Patches [2-5/5] use this field to determine if LDRD/STRD can be
generated instead of PUSH/POP in ARM and Thumb-2 mode.

Patch [2/5] generates LDRD instead of POP for Thumb-2 epilogue in A15.
This patch depends on patch [1/5].

Patch [3/5] generates STRD instead of PUSH for Thumb-2 prologue in A15.
This patch depends on variables, functions and patterns defined in
[1/5] and [2/5].

Patch [4/5] generates STRD instead of PUSH for ARM prologue in A15. This
patch depends on [1/5].

Patch [5/5] generates LDRD instead of POP for ARM epilogue in A15. This
patch depends on variables, functions and patterns defined in [1/5] and
[4/5].

All these patches depend upon the Thumb2/ARM RTL epilogue patches
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html,
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html submitted for
review.

All these patches are applied in given order and tested with check-gcc,
check-gdb and bootstrap without regression. 

In case of ARM mode, significant performance improvement can be seen on
some parts of a popular embedded consumer benchmark (~26%). 
However, in most of the cases, not much effect is seen on performance.
(~ 3% improvement) 

In case of thumb2, the performance improvement observed on the same parts
of the benchmark is ~11% (2.5% improvement).


Re: int_cst_hash_table mapping persistence and the garbage collector

2011-10-11 Thread Eric Botcazou
 In fact, the integer constant was needed because it appeared
 in the blocking factor hash table, but not via a direct pointer.
 Rather it was referenced by nature of the fact that
 the blocking factor hash table referenced the integer constant
 that is mapped in the integer constant hash table.

You'd need to elaborate here: what does "by nature of the fact that" mean?

 When the garbage collector deleted the entry from the
 integer constant hash, it forced a new integer constant tree
 node to be created for the same (type, value) integral constant
 blocking factor.

 One easy way to address the current issue is to call
 tree_int_cst_equal() if the integer constant tree pointers
 do not match:

 if ((c1 != c2) && !tree_int_cst_equal (c1, c2))
   /* integer constants aren't equal.  */

You have two objects C1 and C2 for the same constant and you're comparing them.
One was created first, say C1.  If C1 was still live when C2 was created, why
was C2 created in the first place?  If C1 wasn't live anymore when C2 was
created, why are you still using C1 here?

-- 
Eric Botcazou


[RFA/ARM][Patch 01/05]: Create tune for Cortex-A15.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch adds a new field in tune_params to indicate if LDRD/STRD are
preferred over PUSH/POP in the prologue/epilogue of a specific core.
It also creates a new tune for cortex-A15 and updates the tunes for other
cores to set the new field to its default value.
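
A cut-down sketch of how such a per-core flag might be consulted is given
below.  The struct is a toy with a single field, not the real tune_params;
the toy_ names are hypothetical, and the size-optimization check is an
assumption borrowed from the later patches in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the tuning structure; only the new flag is modelled.  */
struct toy_tune_params
{
  /* True if LDRD/STRD are preferred over PUSH/POP in prologue/epilogue.  */
  bool prefer_ldrd_strd;
};

/* Existing cores keep the default; the new A15-style tune opts in.  */
static const struct toy_tune_params toy_default_tune = { false };
static const struct toy_tune_params toy_a15_tune = { true };

/* Decide whether the prologue/epilogue expanders should try LDRD/STRD.
   When optimizing for size the original sequence is kept.  */
static bool
toy_use_ldrd_strd_p (const struct toy_tune_params *tune, bool optimize_size)
{
  assert (tune != NULL);
  return tune->prefer_ldrd_strd && !optimize_size;
}
```

Keeping the decision in one predicate means the expanders only need to ask
one question, and cores that do not set the flag see no change in behaviour.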

Changelog entry for Patch to create tune for cortex-a15:

2011-10-11  Sameera Deshpande
sameera.deshpa...@arm.com 

* config/arm/arm-cores.def (cortex_a15): Update.
* config/arm/arm-protos.h (struct tune_params): Add new field...
  (arm_gen_ldrd_strd): ... this.
* config/arm/arm.c (arm_slowmul_tune): Add 
  arm_gen_ldrd_strd field settings.
  (arm_fastmul_tune): Likewise.
  (arm_strongarm_tune): Likewise.
  (arm_xscale_tune): Likewise.
  (arm_9e_tune): Likewise.
  (arm_v6t2_tune): Likewise.
  (arm_cortex_tune): Likewise.
  (arm_cortex_a5_tune): Likewise.
  (arm_cortex_a9_tune): Likewise.
  (arm_fa726te_tune): Likewise. 
  (arm_cortex_a15_tune): New variable.
-- 


 diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 742b5e8..1b42713 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -128,7 +128,7 @@ ARM_CORE(generic-armv7-a, genericv7a,	7A, FL_LDSCHED, cortex)
 ARM_CORE(cortex-a5,	  cortexa5,	7A, FL_LDSCHED, cortex_a5)
 ARM_CORE(cortex-a8,	  cortexa8,	7A, FL_LDSCHED, cortex)
 ARM_CORE(cortex-a9,	  cortexa9,	7A, FL_LDSCHED, cortex_a9)
-ARM_CORE(cortex-a15,	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
+ARM_CORE(cortex-a15,	  cortexa15,	7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
 ARM_CORE(cortex-r4,	  cortexr4,	7R, FL_LDSCHED, cortex)
 ARM_CORE(cortex-r4f,	  cortexr4f,	7R, FL_LDSCHED, cortex)
 ARM_CORE(cortex-r5,	  cortexr5,	7R, FL_LDSCHED | FL_ARM_DIV, cortex)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f69bc42..c6b8f71 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -243,6 +243,9 @@ struct tune_params
   int l1_cache_line_size;
   bool prefer_constant_pool;
   int (*branch_cost) (bool, bool);
+  /* This flag indicates if STRD/LDRD instructions are preferred
+ over PUSH/POP in epilogue/prologue.  */
+  bool prefer_ldrd_strd;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6c09267..d709375 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -850,7 +850,8 @@ const struct tune_params arm_slowmul_tune =
   5,		/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,		/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -861,7 +862,8 @@ const struct tune_params arm_fastmul_tune =
   5,		/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,		/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false /* Prefer LDRD/STRD.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -875,7 +877,8 @@ const struct tune_params 

[RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates LDRD instead of POP for Thumb2 epilogue in A15. 

For optimize_size, original epilogue is generated for A15.
The work involves defining new functions, predicates and patterns.

As LDRD cannot be generated for PC, if PC is in register-list, LDRD is
generated for all other registers in the list which can form register
pair.
Then LDR with return is generated if PC is the only register left to be
popped, otherwise POP with return is generated.
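
The register counting described above can be sketched as a small standalone
function.  This is only an illustration under assumptions: the toy_ names
and the plan structure are hypothetical, no RTL is generated, and registers
are paired in mask order as in the description.

```c
#include <assert.h>

#define TOY_LAST_REGNUM 15
#define TOY_PC_REGNUM   15

/* Hypothetical summary of what the epilogue would emit.  */
struct toy_pop_plan
{
  int ldrd_pairs;     /* Number of register pairs loaded with LDRD.  */
  int leftover_regs;  /* 0 or 1 non-PC register left after pairing.  */
  int pc_popped;      /* Nonzero if PC is in the register list.  */
};

static struct toy_pop_plan
toy_plan_ldrd_pop (unsigned long mask)
{
  struct toy_pop_plan plan = { 0, 0, 0 };
  int num_regs = 0;

  for (int i = 0; i <= TOY_LAST_REGNUM; i++)
    if (mask & (1UL << i))
      num_regs++;
  assert (num_regs > 0);

  /* LDRD cannot load PC, so PC is left out of the pairing.  */
  if (mask & (1UL << TOY_PC_REGNUM))
    {
      plan.pc_popped = 1;
      num_regs--;
    }

  plan.ldrd_pairs = num_regs / 2;
  plan.leftover_regs = num_regs % 2;
  return plan;
}
```

For {r4, r5, pc} the plan is one LDRD pair with only PC left, which matches
the "LDR with return" case.  For {r4, r5, r6, pc} one register is left over
alongside PC, which matches the "POP with return" case.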

The patch is tested with check-gcc, check-gdb and bootstrap with no
regression. 

Changelog entry for Patch to emit LDRD for thumb2 epilogue in A15:

2011-10-11  Sameera Deshpande
sameera.deshpa...@arm.com 

   
* config/arm/arm-protos.h (bad_reg_pair_for_thumb_ldrd_strd): New
  declaration.
* config/arm/arm.c (bad_reg_pair_for_thumb_ldrd_strd): New helper
  function.
  (thumb2_emit_ldrd_pop): New static function.
  (thumb2_expand_epilogue): Update functions.
* config/arm/constraints.md (Pz): New constraint.
* config/arm/ldmstm.md (thumb2_ldrd_base): New pattern.
  (thumb2_ldrd): Likewise.
* config/arm/predicates.md (ldrd_immediate_operand): New
  predicate.

-- 


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c6b8f71..06a67b5 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -202,6 +202,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d709375..3eba510 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15410,6 +15410,155 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   par = emit_insn (par);
   add_reg_note (par, REG_FRAME_RELATED_EXPR, dwarf);
 }
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+  || GET_CODE (src2) != REG
+  || (REGNO (src1) == PC_REGNUM)
+  || (REGNO (src1) == SP_REGNUM)
+  || (REGNO (src1) == REGNO (src2))
+  || (REGNO (src2) == PC_REGNUM)
+  || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
+   number of registers are being popped, multiple LDRD patterns are created for
+   all register pairs.  If odd number of registers are popped, last register is
+   loaded by using LDR pattern.  */
+static bool
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+/* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+   to be popped.  So, if num_regs is even, now it will become odd,
+   and we can generate pop with PC.  If num_regs is odd, it will be
+   even now, and ldr with return can be generated for PC.  */
+num_regs--;
+
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+/* Var j iterates over all the registers to gather all the registers in
+   saved_regs_mask.  Var i gives index of saved registers in stack frame.
+   A PARALLEL RTX of register-pair is created here, so that pattern for
+   LDRD can be matched.  As PC is always last register to be popped, and
+   we have already decremented num_regs if PC, we don't have to worry
+   about PC in this loop.  */
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+
+/* Create RTX for memory load.  New RTX is created for dwarf as
+   they are not sharable.  */
+reg = gen_rtx_REG (SImode, j);
+tmp = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+
+tmp1 = gen_rtx_SET (SImode,
+   reg,
+   gen_frame_mem (SImode,
+   plus_constant (stack_pointer_rtx, 4 * i)));
+RTX_FRAME_RELATED_P (tmp) = 1;
+RTX_FRAME_RELATED_P (tmp1) = 1;
+
+if (i % 2 == 0)
+  {
+/* When saved-register index (i) is even, the RTX to be emitted is
+   yet to be created.  Hence create it first.  The LDRD pattern we
+   are 

[RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates STRD instead of PUSH in prologue for A15 ARM mode.

For optimize_size, original prologue is generated for A15.
The work involves defining new functions, predicates and patterns, along
with minor changes in existing code:
* STRD in ARM mode needs consecutive registers to be stored. The
performance of compiler degrades greatly if R3 is pushed for stack
alignment as it generates single LDR for pushing R3. Instead, having SUB
instruction to do stack adjustment is more efficient. Hence, the
condition in arm_get_frame_offsets () is changed to disable push-in-R3
if prefer_ldrd_strd in ARM mode.

In this patch we keep on accumulating non-consecutive registers till
register-pair to be pushed is found. Then, first PUSH all the
accumulated registers, followed by STRD with pre-stack update for
register-pair. We repeat this until all the registers in register-list
are PUSHed.
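
The pairing rule can be illustrated with a simplified standalone sketch.  It
only counts which registers would go out as STRD pairs and which through a
multi-register PUSH.  It does not model the emission order or generate any
RTL, the toy_ names are hypothetical, and it assumes a pair is an
even-numbered register together with its odd successor.

```c
#include <assert.h>

#define TOY_SP_REGNUM 13
#define TOY_PC_REGNUM 15

/* Hypothetical summary of the prologue stores.  */
struct toy_push_plan
{
  int strd_pairs;   /* Even/odd consecutive pairs stored with STRD.  */
  int pushed_regs;  /* Registers that fall back to multi-register PUSH.  */
};

static struct toy_push_plan
toy_plan_strd_push (unsigned long mask)
{
  struct toy_push_plan plan = { 0, 0 };

  /* SP and PC are never part of the list pushed here.  */
  assert ((mask & (1UL << TOY_SP_REGNUM)) == 0);
  assert ((mask & (1UL << TOY_PC_REGNUM)) == 0);

  for (int j = 0; j < 16; j += 2)
    {
      int even_saved = (mask & (1UL << j)) != 0;
      int odd_saved = (mask & (1UL << (j + 1))) != 0;

      if (even_saved && odd_saved)
        plan.strd_pairs++;                       /* r<j>, r<j+1> via STRD.  */
      else
        plan.pushed_regs += even_saved + odd_saved;  /* Lone registers.  */
    }
  return plan;
}
```

For {r4, r5, r6, lr} this gives one STRD pair (r4, r5) and two registers left
for PUSH.  For {r4, r5, r6, r7} it gives two STRD pairs and nothing to PUSH.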

The patch is tested with check-gcc, check-gdb and bootstrap with no
regression. 

Changelog entry for Patch to emit STRD for ARM prologue in A15:

2011-10-11  Sameera Deshpande
sameera.deshpa...@arm.com 
   
* config/arm/arm-protos.h (bad_reg_pair_for_arm_ldrd_strd): New
declaration.
* config/arm/arm.c (arm_emit_strd_push): New static function.  
  (bad_reg_pair_for_arm_ldrd_strd): New helper function.
  (arm_expand_prologue): Update. 
  (arm_get_frame_offsets): Update.
* config/arm/ldmstm.md (arm_strd_base): New pattern.
-- 


diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 06a67b5..d5287ad 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -162,6 +162,7 @@ extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern bool bad_reg_pair_for_arm_ldrd_strd (rtx, rtx);
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fd8c31d..08fa0d5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -93,6 +93,7 @@ static bool arm_assemble_integer (rtx, unsigned int, int);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
+static rtx emit_multi_reg_push (unsigned long);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
 static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
@@ -15095,6 +15096,116 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 }
 }
 
+/* STRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg PUSH for all accumulated
+   registers, and then generates STRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are stored on stack.
+   multi-reg PUSH takes care of lone registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i=0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0; i < num_regs; j--)
+/* Var j iterates over all registers to gather all registers in
+   saved_regs_mask.  Var i is used to count number of registers stored on
+   stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
+   that can be pushed using multi-reg PUSH before STRD is generated.  */
+if (saved_regs_mask & (1 << j))
+  {
+gcc_assert (j != SP_REGNUM);
+gcc_assert (j != PC_REGNUM);
+i++;
+
+if ((j % 2 == 1)
+ && (saved_regs_mask & (1 << (j - 1)))
+ && regs_to_be_pushed_mask)
+  {
+/* Current register and previous register form register pair for
+   which STRD can be generated.  Hence, emit PUSH for accumulated
+   registers and reset regs_to_be_pushed_mask.  */
+insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+regs_to_be_pushed_mask = 0;
+RTX_FRAME_RELATED_P (insn) = 1;
+continue;
+  }
+
+regs_to_be_pushed_mask |= (1 << j);
+
+if ((j % 2) == 0 && (saved_regs_mask & (1 << (j + 1))))
+  {
+/* We have found 2 consecutive registers, for which STRD can be
+   generated.  Generate pattern to emit STRD as accumulated

[RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.

2011-10-11 Thread Sameera Deshpande
Hi!

This patch generates LDRD instead of POP in epilogue for A15 ARM mode.

For optimize_size, original epilogue is generated for A15.
The work involves defining new functions, predicates and patterns.

In this patch we keep accumulating non-consecutive registers until a
register pair to be popped is found.  We then POP all the accumulated
registers, followed by an LDRD with post-indexed stack update for the
register pair.  We repeat this until all the registers in the register
list are popped.
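
The accumulate-then-flush scheme described above can be sketched outside of GCC roughly as follows.  This is a simplified illustration, not the code from the patch: plan_ldrd_pop, struct pop_op and the OP_* names are invented for this sketch, it only plans the instruction sequence instead of emitting RTL, and it leaves out the really_return handling for PC.

```c
#include <assert.h>
#include <stddef.h>

/* ARM core register numbers: SP is r13, PC is r15.  */
enum { SP_REGNUM = 13, PC_REGNUM = 15, LAST_ARM_REGNUM = 15 };

/* Hypothetical record of one planned instruction.  */
enum pop_kind { OP_POP, OP_LDRD };

struct pop_op
{
  enum pop_kind kind;
  unsigned long mask;   /* Registers restored by this instruction.  */
};

/* Walk SAVED_REGS_MASK from r0 upwards.  Lone registers are accumulated;
   when an even/odd pair (r_j, r_j+1) is found, the accumulated registers
   are flushed as one POP and the pair is restored with one LDRD.  Pairs
   whose second register is SP or PC are never formed.  Returns the number
   of entries written to OPS.  */
size_t
plan_ldrd_pop (unsigned long saved_regs_mask, struct pop_op *ops,
               size_t max_ops)
{
  size_t n = 0;
  unsigned long pending = 0;
  int j;

  for (j = 0; j <= LAST_ARM_REGNUM; j++)
    {
      if (!(saved_regs_mask & (1UL << j)))
        continue;

      if ((j % 2) == 0
          && (j + 1) <= LAST_ARM_REGNUM
          && (saved_regs_mask & (1UL << (j + 1)))
          && (j + 1) != SP_REGNUM
          && (j + 1) != PC_REGNUM)
        {
          if (pending)
            {
              assert (n < max_ops);
              ops[n].kind = OP_POP;
              ops[n].mask = pending;
              n++;
              pending = 0;
            }
          assert (n < max_ops);
          ops[n].kind = OP_LDRD;
          ops[n].mask = (1UL << j) | (1UL << (j + 1));
          n++;
          j++;          /* The odd register of the pair is consumed too.  */
        }
      else
        pending |= 1UL << j;
    }

  /* Whatever is left (including PC, if present) goes into a final POP.  */
  if (pending)
    {
      assert (n < max_ops);
      ops[n].kind = OP_POP;
      ops[n].mask = pending;
      n++;
    }
  return n;
}
```

For the register list {r4, r5, r6, r8, r9, pc} this sketch plans LDRD r4/r5, POP {r6}, LDRD r8/r9 and a final POP {pc}.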

The patch is tested with check-gcc, check-gdb and bootstrap with no
regressions.
 
Changelog entry for Patch to emit LDRD for ARM epilogue in A15:

2011-10-11  Sameera Deshpande
sameera.deshpa...@arm.com 
   
* config/arm/arm.c (arm_emit_ldrd_pop): New static function.  
  (arm_expand_epilogue): Update. 
* config/arm/ldmstm.md (arm_ldrd_base): New pattern.
  (arm_ldr_with_update): Likewise. 
-- 


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 08fa0d5..0b9fd93 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -967,7 +967,7 @@ const struct tune_params arm_cortex_a9_tune =
   ARM_PREFETCH_BENEFICIAL(4,32,32),
   false,	/* Prefer constant pool.  */
   arm_default_branch_cost,
-  false /* Prefer LDRD/STRD.  */
+  true  /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -15664,6 +15664,145 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
   || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg POP for all accumulated
+   registers, and then generates LDRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are loaded from
+   stack.  multi-reg POP takes care of lone registers as well.  However, LDRD
+   cannot be generated for PC, as results are unpredictable.  Hence, if PC is
+   in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN
+   depending upon number of registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+if (saved_regs_mask & (1 << i))
+  num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+if (saved_regs_mask & (1 << j))
+  {
+i++;
+if ((j % 2) == 0
+ && (saved_regs_mask & (1 << (j + 1)))
+ && (j + 1) != SP_REGNUM
+ && (j + 1) != PC_REGNUM
+ && regs_to_be_popped_mask)
+  {
+/* Current register and next register form register pair for which
+   LDRD can be generated.  Generate POP for accumulated registers
+   and reset regs_to_be_popped_mask.  SP should be handled here as
+   the results are unpredictable if register being stored is same
+   as index register (in this case, SP).  PC is always the last
+   register being popped.  Hence, we don't have to worry about PC
+   here.  */
+arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+pc_in_list = false;
+regs_to_be_popped_mask = 0;
+continue;
+  }
+
+if (j == PC_REGNUM)
+  {
+gcc_assert (really_return);
+pc_in_list = 1;
+  }
+
+regs_to_be_popped_mask |= (1 << j);
+
+if ((j % 2) == 1
+ && (saved_regs_mask & (1 << (j - 1)))
+ && j != SP_REGNUM
+ && j != PC_REGNUM)
+  {
+ /* Generate a LDRD for register pair R_j, R_j+1.  The pattern
+generated here is
+[(SET SP, (PLUS SP, 8))
+ (SET R_j-1, (MEM SP))
+ (SET R_j, (MEM (PLUS SP, 4)))].  */
+ par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+ dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3));
+
+ tmp = gen_rtx_SET (VOIDmode,
+stack_pointer_rtx,
+plus_constant (stack_pointer_rtx, 8));
+ tmp1 = gen_rtx_SET (VOIDmode,
+ stack_pointer_rtx,
+ plus_constant (stack_pointer_rtx, 8));
+ RTX_FRAME_RELATED_P (tmp) = 1;
+ RTX_FRAME_RELATED_P (tmp1) = 1;
+ XVECEXP (par, 0, 0) = tmp;
+ XVECEXP (dwarf, 0, 0) = tmp1;
+
+ 

Re: Fix for PR libobjc/49883 (clang + gcc 4.6 runtime = broken) and a small related clang fix

2011-10-11 Thread Mike Stump
On Oct 11, 2011, at 2:05 AM, Nicola Pero wrote:
 Unfortunately, the report was correct in that clang is producing incorrect 
 code and
 abusing the higher bits of the class-info field to store some other 
 information.
 
 The clang folks are pretty responsive.  I'd always give them a chance to 
 `fix' their code, before putting hack-arounds in our code in general.
 
 That discussion did happen in private.  It wasn't pleasant.  They won't 
 change their code.

Right, then it isn't a bug, but rather a shared ABI that we choose to be 
compatible with.  We fix it in our system by noting in our ABI document and 
code how we must set or not set the bit, and go on with life; it is too short.

 It's a standoff

It isn't a standoff, we can choose to just fix the issue and be compatible, if 
we want.


Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-11 Thread Kirill Yukhin
Hi
Uros, you were right about both fpmath and configflags. That is why it
was passing for me.

The attached patch cures the problem.

testsuite/ChangeLog entry:

2011-10-11  Kirill Yukhin  kirill.yuk...@intel.com

* gcc.target/i386/fma_double_1.c: Add -mfpmath=sse.
* gcc.target/i386/fma_double_2.c: Ditto.
* gcc.target/i386/fma_double_3.c: Ditto.
* gcc.target/i386/fma_double_4.c: Ditto.
* gcc.target/i386/fma_double_5.c: Ditto.
* gcc.target/i386/fma_double_6.c: Ditto.
* gcc.target/i386/fma_float_1.c: Ditto.
* gcc.target/i386/fma_float_2.c: Ditto.
* gcc.target/i386/fma_float_3.c: Ditto.
* gcc.target/i386/fma_float_4.c: Ditto.
* gcc.target/i386/fma_float_5.c: Ditto.
* gcc.target/i386/fma_float_6.c: Ditto.
* gcc.target/i386/l_fma_double_1.c: Ditto.
* gcc.target/i386/l_fma_double_2.c: Ditto.
* gcc.target/i386/l_fma_double_3.c: Ditto.
* gcc.target/i386/l_fma_double_4.c: Ditto.
* gcc.target/i386/l_fma_double_5.c: Ditto.
* gcc.target/i386/l_fma_double_6.c: Ditto.
* gcc.target/i386/l_fma_float_1.c: Ditto.
* gcc.target/i386/l_fma_float_2.c: Ditto.
* gcc.target/i386/l_fma_float_3.c: Ditto.
* gcc.target/i386/l_fma_float_4.c: Ditto.
* gcc.target/i386/l_fma_float_5.c: Ditto.
* gcc.target/i386/l_fma_float_6.c: Ditto.
* gcc.target/i386/l_fma_run_double_1.c: Ditto.
* gcc.target/i386/l_fma_run_double_2.c: Ditto.
* gcc.target/i386/l_fma_run_double_3.c: Ditto.
* gcc.target/i386/l_fma_run_double_4.c: Ditto.
* gcc.target/i386/l_fma_run_double_5.c: Ditto.
* gcc.target/i386/l_fma_run_double_6.c: Ditto.
* gcc.target/i386/l_fma_run_float_1.c: Ditto.
* gcc.target/i386/l_fma_run_float_2.c: Ditto.
* gcc.target/i386/l_fma_run_float_3.c: Ditto.
* gcc.target/i386/l_fma_run_float_4.c: Ditto.
* gcc.target/i386/l_fma_run_float_5.c: Ditto.
* gcc.target/i386/l_fma_run_float_6.c: Ditto.

Could you please have a look?

Sorry for the inconvenience, K


fma3-tests-fix.gcc.patch
Description: Binary data


Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-11 Thread Uros Bizjak
On Tue, Oct 11, 2011 at 12:12 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

 Uros, you was right both with fpmath and configflags. That is why it
 was passing for me.

 Attached patch which cures the problem.

 testsuite/ChangeLog entry:

 2011-10-11  Kirill Yukhin  kirill.yuk...@intel.com

        * gcc.target/i386/fma_double_1.c: Add -mfpmath=sse.
        * gcc.target/i386/fma_double_2.c: Ditto.
        * gcc.target/i386/fma_double_3.c: Ditto.
        * gcc.target/i386/fma_double_4.c: Ditto.
        * gcc.target/i386/fma_double_5.c: Ditto.
        * gcc.target/i386/fma_double_6.c: Ditto.
        * gcc.target/i386/fma_float_1.c: Ditto.
        * gcc.target/i386/fma_float_2.c: Ditto.
        * gcc.target/i386/fma_float_3.c: Ditto.
        * gcc.target/i386/fma_float_4.c: Ditto.
        * gcc.target/i386/fma_float_5.c: Ditto.
        * gcc.target/i386/fma_float_6.c: Ditto.
        * gcc.target/i386/l_fma_double_1.c: Ditto.
        * gcc.target/i386/l_fma_double_2.c: Ditto.
        * gcc.target/i386/l_fma_double_3.c: Ditto.
        * gcc.target/i386/l_fma_double_4.c: Ditto.
        * gcc.target/i386/l_fma_double_5.c: Ditto.
        * gcc.target/i386/l_fma_double_6.c: Ditto.
        * gcc.target/i386/l_fma_float_1.c: Ditto.
        * gcc.target/i386/l_fma_float_2.c: Ditto.
        * gcc.target/i386/l_fma_float_3.c: Ditto.
        * gcc.target/i386/l_fma_float_4.c: Ditto.
        * gcc.target/i386/l_fma_float_5.c: Ditto.
        * gcc.target/i386/l_fma_float_6.c: Ditto.
        * gcc.target/i386/l_fma_run_double_1.c: Ditto.
        * gcc.target/i386/l_fma_run_double_2.c: Ditto.
        * gcc.target/i386/l_fma_run_double_3.c: Ditto.
        * gcc.target/i386/l_fma_run_double_4.c: Ditto.
        * gcc.target/i386/l_fma_run_double_5.c: Ditto.
        * gcc.target/i386/l_fma_run_double_6.c: Ditto.
        * gcc.target/i386/l_fma_run_float_1.c: Ditto.
        * gcc.target/i386/l_fma_run_float_2.c: Ditto.
        * gcc.target/i386/l_fma_run_float_3.c: Ditto.
        * gcc.target/i386/l_fma_run_float_4.c: Ditto.
        * gcc.target/i386/l_fma_run_float_5.c: Ditto.
        * gcc.target/i386/l_fma_run_float_6.c: Ditto.

OK. (I have also applied your patch to mainline SVN).

Thanks,
Uros.


Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-11 Thread Kirill Yukhin
Thank you!

K




Re: Out-of-order update of new_spill_reg_store[]

2011-10-11 Thread Bernd Schmidt
I'm not completely following this yet, so please bear with me...

On 10/09/11 10:01, Richard Sandiford wrote:
 Reload 0: GR_REGS, RELOAD_FOR_OUTPUT_ADDRESS (opnum = 0), can't combine, 
 secondary_reload_p
 reload_reg_rtx: (reg:SI 5 $5)
 Reload 1: reload_out (SI) = (reg:SI 32 $f0 [1655])
 MD1_REG, RELOAD_FOR_OUTPUT (opnum = 0)
 reload_out_reg: (reg:SI 32 $f0 [1655])
 reload_reg_rtx: (reg:SI 65 lo)
 secondary_out_reload = 0
 
 Reload 2: reload_out (SI) = (reg:SI 1656)
 GR_REGS, RELOAD_FOR_OUTPUT (opnum = 3)
 reload_out_reg: (reg:SI 1656)
 reload_reg_rtx: (reg:SI 5 $5)
 
 So $5 is first stored in 1656 (operand 3), then $5 is used as a secondary
 reload in copying LO to $f0 (operand 0, reg 1655).  The next and final
 use of 1655 ends up inheriting this second reload of $5, so we try to
 delete the original output copy.  The problem is that we delete the
 wrong one: we delete the store of $5 to 1656 rather than the copy of
 $5 to 1655/$f0.

So, reload 1 inherited from somewhere else rather than using reg $5 from
its secondary reload? Where do we try to delete the insn, and what's the
state of the spill_reg_store data at that point?

 The fix I went for is to clear new_spill_reg_store[] for all reloads
 as a separate pass (rather than in the main do_{input,output}_reload
 loop), then only allow new_spill_reg_store[] to be set if the associated
 reload register reaches the end of the reload sequence.

In this case, reload 0 is emitted after reload 2, so it reaches the end.
Correct? What would happen if the 0/1 pair and 2 were swapped?


Bernd



Re: New warning for expanded vector operations

2011-10-11 Thread Richard Guenther
On Mon, Oct 10, 2011 at 3:21 PM, Artem Shinkarov
artyom.shinkar...@gmail.com wrote:
 On Mon, Oct 10, 2011 at 12:02 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Fri, Oct 7, 2011 at 9:44 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Fri, Oct 7, 2011 at 6:22 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Wed, Oct 5, 2011 at 12:35 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Oct 5, 2011 at 1:28 PM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 Hi

 Here is a patch to inform a programmer about the expanded vector 
 operation.
 Bootstrapped on x86-unknown-linux-gnu.

 ChangeLog:

        * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to
          produce the warning.
          (expand_vector_parallel): Adjust to produce the warning.

 Entries start without gcc/, they are relative to the gcc/ChangeLog file.

 Sure, sorry.

          (lower_vec_shuffle): Adjust to produce the warning.
        * gcc/common.opt: New warning Wvector-operation-expanded.
         * gcc/doc/invoke.texi: Document the warning.


 Ok?

 I don't like the name -Wvector-operation-expanded.  We emit a
 similar warning for missed inline expansions with -Winline, so
 maybe -Wvector-extensions (that's the name that appears
 in the C extension documentation).

 Hm, I don't care much about the name, unless it gets clear what the
 warning is used for.  I am not really sure that Wvector-extensions
 makes it clear.  Also, I don't see anything bad if the warning will
 pop up during the vectorisation. Any vector operation performed
 outside the SIMD accelerator looks suspicious, because it actually
 doesn't improve performance.  Such a warning during the vectorisation
 could mean that a programmer forgot some flag, or the constant
 propagation failed to deliver a constant, or something else.

 Conceptually the text I am producing is not really a warning, it is
 more like an information, but I am not aware of the mechanisms that
 would allow me to introduce a flag triggering inform () or something
 similar.

 What I think we really need to avoid is including this warning in the
 standard Ox.

 +  location_t loc = gimple_location (gsi_stmt (*gsi));
 +
 +  warning_at (loc, OPT_Wvector_operation_expanded,
 +             "vector operation will be expanded piecewise");

   v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta);
   for (i = 0; i < nunits;
 @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter
   tree result, compute_type;
   enum machine_mode mode;
   int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / 
 UNITS_PER_WORD;
 +  location_t loc = gimple_location (gsi_stmt (*gsi));
 +
 +  warning_at (loc, OPT_Wvector_operation_expanded,
 +             "vector operation will be expanded in parallel");

 what's the difference between 'piecewise' and 'in parallel'?

 Parallel is a little bit better for performance than piecewise.

 I see.  That difference should probably be documented, maybe with
 an example.

 Richard.
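
To illustrate the distinction being asked about, here is a minimal sketch, assuming "piecewise" means one scalar operation per element and "in parallel" means handling several narrow elements packed into one machine word.  The function names are invented for this example, and the masking trick is the textbook word-at-a-time addition, not necessarily the exact sequence tree-vect-generic.c produces.

```c
#include <stdint.h>

/* Piecewise: extract each of the four bytes, add them one at a time,
   and reassemble the result.  Four scalar additions.  */
uint32_t
add_bytes_piecewise (uint32_t a, uint32_t b)
{
  uint32_t result = 0;
  int i;

  for (i = 0; i < 4; i++)
    {
      uint8_t x = (uint8_t) (a >> (8 * i));
      uint8_t y = (uint8_t) (b >> (8 * i));
      result |= (uint32_t) (uint8_t) (x + y) << (8 * i);
    }
  return result;
}

/* In parallel: one word-sized addition handles all four bytes.  The top
   bit of every byte is masked off before adding so that no carry crosses
   a byte boundary, and is then recomputed with an exclusive or.  */
uint32_t
add_bytes_in_word (uint32_t a, uint32_t b)
{
  const uint32_t low_bits = 0x7f7f7f7fu;
  const uint32_t high_bits = 0x80808080u;

  return ((a & low_bits) + (b & low_bits)) ^ ((a ^ b) & high_bits);
}
```

Both functions compute the same per-byte wrapping sum; the second needs a handful of word operations regardless of how many elements fit in the word, which is presumably why it is the cheaper fallback.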

 @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter
  {
   int parts_per_word = UNITS_PER_WORD
                       / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE 
 (type)), 1);
 +  location_t loc = gimple_location (gsi_stmt (*gsi));

   if (INTEGRAL_TYPE_P (TREE_TYPE (type))
       && parts_per_word >= 4
       && TYPE_VECTOR_SUBPARTS (type) >= 4)
 -    return expand_vector_parallel (gsi, f_parallel,
 -                                  type, a, b, code);
 +    return expand_vector_parallel (gsi, f_parallel, type, a, b, code);
   else
 -    return expand_vector_piecewise (gsi, f,
 -                                   type, TREE_TYPE (type),
 -                                   a, b, code);
 +    return expand_vector_piecewise (gsi, f, type,
 +                                   TREE_TYPE (type), a, b, code);
  }

  /* Check if vector VEC consists of all the equal elements and

 Unless I miss something, loc is unused here.  Please avoid random
 whitespace changes (just review your patch yourself before posting
 and revert pieces that do nothing).

 Yes you are right, sorry.

 +@item -Wvector-operation-expanded
 +@opindex Wvector-operation-expanded
 +@opindex Wno-vector-operation-expanded
 +Warn if vector operation is not implemented via SIMD capabilities of 
 the
 +architecture. Mainly useful for the performance tuning.

 I'd mention that this is for vector operations as of the C extension
 documented in Vector Extensions.

 The vectorizer can produce some operations that will need further
 lowering - we probably should make sure to _not_ warn about those.
 Try running the vect.exp testsuite with the new warning turned on
 (eventually disabling SSE), like with

 obj/gcc> make check-gcc
 RUNTESTFLAGS="--target_board=unix/-Wvector-extensions/-mno-sse
 vect.exp"

 Again, see the 

Re: [C++ Patch / RFC] PR 33067

2011-10-11 Thread Paolo Carlini

On 10/11/2011 03:04 AM, Jason Merrill wrote:

On 10/10/2011 12:40 PM, Paolo Carlini wrote:
+  // The fraction 643/2136 approximates log10(2) to 7 significant digits.
+  int max_digits10 = 2 + (is_decimal ? fmt->p : fmt->p * 643L / 2136);


Please cite N1822 in the comment and convert it to C syntax.  OK with 
that change.

Thanks. The below is what I actually applied.

Paolo.

/
2011-10-11  Paolo Carlini  paolo.carl...@oracle.com

PR c++/33067
* c-family/c-pretty-print.c (pp_c_floating_constant): Output
max_digits10 (in the ISO C++ WG N1822 sense) decimal digits.
Index: c-family/c-pretty-print.c
===
--- c-family/c-pretty-print.c   (revision 179792)
+++ c-family/c-pretty-print.c   (working copy)
@@ -1018,8 +1018,20 @@ pp_c_enumeration_constant (c_pretty_printer *pp, t
 static void
 pp_c_floating_constant (c_pretty_printer *pp, tree r)
 {
+  const struct real_format *fmt
+    = REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (r)));
+
+  REAL_VALUE_TYPE floating_cst = TREE_REAL_CST (r);
+  bool is_decimal = floating_cst.decimal;
+
+  /* See ISO C++ WG N1822.  Note: The fraction 643/2136 approximates
+     log10(2) to 7 significant digits.  */
+  int max_digits10 = 2 + (is_decimal ? fmt->p : fmt->p * 643L / 2136);
+
   real_to_decimal (pp_buffer (pp)->digit_buffer, TREE_REAL_CST (r),
-                  sizeof (pp_buffer (pp)->digit_buffer), 0, 1);
+                  sizeof (pp_buffer (pp)->digit_buffer),
+                  max_digits10, 1);
+
   pp_string (pp, pp_buffer(pp)->digit_buffer);
   if (TREE_TYPE (r) == float_type_node)
 pp_character (pp, 'f');
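
As a quick sanity check of the integer arithmetic in the patch, the formula can be evaluated on its own.  This is a minimal sketch: max_digits10_for_precision is a name invented here, and the parameter p stands in for the fmt->p field (the significand precision) used above.

```c
#include <assert.h>

/* Mirror of the patch's expression.  P is the precision of the format in
   digits of its own radix; IS_DECIMAL is nonzero for decimal formats.
   643/2136 is the rational approximation of log10(2) from the patch.  */
int
max_digits10_for_precision (int p, int is_decimal)
{
  assert (p > 0);
  return 2 + (is_decimal ? p : (int) (p * 643L / 2136));
}
```

With p = 24 (IEEE single) this gives 9 and with p = 53 (IEEE double) it gives 17, which are the digit counts usually quoted as sufficient to round-trip those types through decimal text.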


[Committed] S/390: Add -Wno-attributes to testcase options

2011-10-11 Thread Andreas Krebbel
Hi,

In the 20090223-1.c testcase, GCC complains that functions might not be
inlined even though they carry the always_inline attribute.  The
original failure could only be observed with these functions being
inlined.  The attached patch adds -Wno-attributes to suppress the warning.

Committed to mainline.

Bye,

-Andreas-

2011-10-11  Andreas Krebbel  andreas.kreb...@de.ibm.com

* gcc.target/s390/20090223-1.c: Add -Wno-attributes.

Index: gcc/testsuite/gcc.target/s390/20090223-1.c
===
*** gcc/testsuite/gcc.target/s390/20090223-1.c.orig
--- gcc/testsuite/gcc.target/s390/20090223-1.c
***
*** 3,9 
 register asm (0).  */
  
  /* { dg-do run } */
! /* { dg-options "-O2" } */
  
  extern void abort (void);
  
--- 3,9 
 register asm (0).  */
  
  /* { dg-do run } */
! /* { dg-options "-O2 -Wno-attributes" } */
  
  extern void abort (void);
  


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-11 Thread Richard Guenther
On Sat, 8 Oct 2011, William J. Schmidt wrote:

 Greetings,
 
 Here are the revised changes for the tree portions of the patch.  I've
 attempted to resolve all comments to date on those portions.  Per
 Steven's comment, I moved copy_ref_info into tree-ssa-address.c; let me
 know if there's a better place, or whether you'd prefer to leave it
 where it was.
 
 I looked into changing the second reassoc pass to use a different
 pass_late_reassoc entry, but this impacted the test suite.  There are
 about 20 tests that rely on -fdump-tree-reassoc being associated with
 two dump files named reassoc1 and reassoc2.  Rather than change all
 these test cases for a temporary solution, I chose to use the deprecated
 first_pass_instance boolean to distinguish between the two passes.  I
 marked this as a Bad Thing and it will be removed once I have time to
 work on the straight-line strength reducer.
 
 I looked into adding a test case with a negative offset, but was unable
 to come up with a construct that would have a negative offset on the
 base MEM_REF and still be recognized by this particular pattern matcher.
 In any case, the use of double_ints throughout should remove that
 concern.

Comments below.

 Thanks,
 Bill
 
 
 2011-10-08  Bill Schmidt  wschm...@linux.vnet.ibm.com
 
 gcc:
 
   PR rtl-optimization/46556
   * tree.h (copy_ref_info): Expose existing function.
   * tree-ssa-loop-ivopts.c (copy_ref_info): Move this function to...
   * tree-ssa-address.c (copy_ref_info): ...here, and remove static token.
   * tree-ssa-reassoc.c (restructure_base_and_offset): New function.
   (restructure_mem_ref): Likewise.
   (reassociate_bb): Look for opportunities to call restructure_mem_ref.
 
 gcc/testsuite:
 
   PR rtl-optimization/46556
   * gcc.dg/tree-ssa/pr46556-1.c: New testcase.
   * gcc.dg/tree-ssa/pr46556-2.c: Likewise.
   * gcc.dg/tree-ssa/pr46556-3.c: Likewise.
 
 
 Index: gcc/tree.h
 ===
 --- gcc/tree.h(revision 179708)
 +++ gcc/tree.h(working copy)
 @@ -5777,6 +5777,7 @@ tree target_for_debug_bind (tree);
  /* In tree-ssa-address.c.  */
  extern tree tree_mem_ref_addr (tree, tree);
  extern void copy_mem_ref_info (tree, tree);
 +extern void copy_ref_info (tree, tree);
  
  /* In tree-vrp.c */
  extern bool ssa_name_nonnegative_p (const_tree);
 Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c (revision 0)
 @@ -0,0 +1,22 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -fdump-tree-dom2 } */
 +
 +struct x
 +{
 +  int a[16];
 +  int b[16];
 +  int c[16];
 +};
 +
 +extern void foo (int, int, int);
 +
 +void
 +f (struct x *p, unsigned int n)
 +{
 +  foo (p->a[n], p->c[n], p->b[n]);
 +}
 +
 +/* { dg-final { scan-tree-dump-times \\* 4; 1 dom2 } } */
 +/* { dg-final { scan-tree-dump-times p_1\\(D\\) \\+ D 1 dom2 } } */
 +/* { dg-final { scan-tree-dump-times MEM\\\[\\(struct x \\*\\)D 2 dom2 } 
 } */
 +/* { dg-final { cleanup-tree-dump dom2 } } */
 Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c (revision 0)
 @@ -0,0 +1,26 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -fdump-tree-dom2 } */
 +
 +struct x
 +{
 +  int a[16];
 +  int b[16];
 +  int c[16];
 +};
 +
 +extern void foo (int, int, int);
 +
 +void
 +f (struct x *p, unsigned int n)
 +{
 +  foo (p->a[n], p->c[n], p->b[n]);
 +  if (n  12)
 +foo (p->a[n], p->c[n], p->b[n]);
 +  else if (n  3)
 +foo (p->b[n], p->a[n], p->c[n]);
 +}
 +
 +/* { dg-final { scan-tree-dump-times \\* 4; 1 dom2 } } */
 +/* { dg-final { scan-tree-dump-times p_1\\(D\\) \\+ D 1 dom2 } } */
 +/* { dg-final { scan-tree-dump-times MEM\\\[\\(struct x \\*\\)D 6 dom2 } 
 } */
 +/* { dg-final { cleanup-tree-dump dom2 } } */
 Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c (revision 0)
 @@ -0,0 +1,28 @@
 +/* { dg-do compile } */
 +/* { dg-options -O2 -fdump-tree-dom2 } */
 +
 +struct x
 +{
 +  int a[16];
 +  int b[16];
 +  int c[16];
 +};
 +
 +extern void foo (int, int, int);
 +
 +void
 +f (struct x *p, unsigned int n)
 +{
 +  foo (p->a[n], p->c[n], p->b[n]);
 +  if (n  3)
 +{
 +  foo (p->a[n], p->c[n], p->b[n]);
 +  if (n  12)
 + foo (p->b[n], p->a[n], p->c[n]);
 +}
 +}
 +
 +/* { dg-final { scan-tree-dump-times \\* 4; 1 dom2 } } */
 +/* { dg-final { scan-tree-dump-times p_1\\(D\\) \\+ D 1 dom2 } } */
 +/* { dg-final { scan-tree-dump-times MEM\\\[\\(struct x \\*\\)D 6 dom2 } 
 } */
 +/* { dg-final { cleanup-tree-dump dom2 } } */
 Index: 

[PATCH] Fix PR50204

2011-10-11 Thread Richard Guenther

Since we introduced the alias oracle we no longer optimize the testcase
below, because I initially restricted the stmt walking to give up on PHIs
with more than two arguments over compile-time complexity concerns.
But it's easy to see that compile time is not an issue when we
reduce PHI args pairwise to a single dominating operand.
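
The pairwise reduction can be pictured with a toy model.  This is only a sketch under simplifying assumptions: nodes are integers in a dominator tree given as an idom[] array (the root is its own immediate dominator), all function names are invented here, and the clobber checks and the diamond special case that the real two-argument helper performs are left out.

```c
#include <stddef.h>

/* Return nonzero if node A dominates node B in the toy dominator tree.  */
static int
dominates (const int *idom, int a, int b)
{
  for (;;)
    {
      if (b == a)
        return 1;
      if (idom[b] == b)     /* Reached the root without meeting A.  */
        return 0;
      b = idom[b];
    }
}

/* Two-argument step: return whichever of A and B dominates the other,
   or -1 if neither does.  */
static int
pick_dominating (const int *idom, int a, int b)
{
  if (dominates (idom, a, b))
    return a;
  if (dominates (idom, b, a))
    return b;
  return -1;
}

/* Fold the two-argument step over all arguments.  Each argument is
   combined with the running result exactly once, so the number of
   pairwise steps is linear in NARGS.  Returns -1 on failure.  */
int
reduce_phi_args (const int *idom, const int *args, size_t nargs)
{
  size_t i;
  int acc;

  if (nargs == 0)
    return -1;

  acc = args[0];
  for (i = 1; i < nargs; i++)
    {
      acc = pick_dominating (idom, acc, args[i]);
      if (acc < 0)
        return -1;
    }
  return acc;
}
```

The point of the sketch is only the shape of the loop: giving up as soon as one pair fails keeps the walk bounded, while any number of arguments can still collapse to one operand.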

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-10-11  Richard Guenther  rguent...@suse.de

PR tree-optimization/50204
* tree-ssa-alias.c (get_continuation_for_phi_1): Split out
two argument handling from ...
(get_continuation_for_phi): ... here.  Handle arbitrary number
of PHI args.

* gcc.dg/tree-ssa/ssa-fre-36.c: New testcase.

Index: gcc/tree-ssa-alias.c
===
*** gcc/tree-ssa-alias.c(revision 179794)
--- gcc/tree-ssa-alias.c(working copy)
*** maybe_skip_until (gimple phi, tree targe
*** 1875,1880 
--- 1875,1934 
return true;
  }
  
+ /* For two PHI arguments ARG0 and ARG1 try to skip non-aliasing code
+until we hit the phi argument definition that dominates the other one.
+Return that, or NULL_TREE if there is no such definition.  */
+ 
+ static tree
+ get_continuation_for_phi_1 (gimple phi, tree arg0, tree arg1,
+   ao_ref *ref, bitmap *visited)
+ {
+   gimple def0 = SSA_NAME_DEF_STMT (arg0);
+   gimple def1 = SSA_NAME_DEF_STMT (arg1);
+   tree common_vuse;
+ 
+   if (arg0 == arg1)
+ return arg0;
+   else if (gimple_nop_p (def0)
+  || (!gimple_nop_p (def1)
+   && dominated_by_p (CDI_DOMINATORS,
+ gimple_bb (def1), gimple_bb (def0))))
+ {
+   if (maybe_skip_until (phi, arg0, ref, arg1, visited))
+   return arg0;
+ }
+   else if (gimple_nop_p (def1)
+  || dominated_by_p (CDI_DOMINATORS,
+ gimple_bb (def0), gimple_bb (def1)))
+ {
+   if (maybe_skip_until (phi, arg1, ref, arg0, visited))
+   return arg1;
+ }
+   /* Special case of a diamond:
+MEM_1 = ...
+goto (cond) ? L1 : L2
+L1: store1 = ...#MEM_2 = vuse(MEM_1)
+  goto L3
+L2: store2 = ...#MEM_3 = vuse(MEM_1)
+L3: MEM_4 = PHI<MEM_2, MEM_3>
+  We were called with the PHI at L3, MEM_2 and MEM_3 don't
+  dominate each other, but still we can easily skip this PHI node
+  if we recognize that the vuse MEM operand is the same for both,
+  and that we can skip both statements (they don't clobber us).
+  This is still linear.  Don't use maybe_skip_until, that might
+  potentially be slow.  */
+   else if ((common_vuse = gimple_vuse (def0))
+   && common_vuse == gimple_vuse (def1))
+ {
+   if (!stmt_may_clobber_ref_p_1 (def0, ref)
+  && !stmt_may_clobber_ref_p_1 (def1, ref))
+   return common_vuse;
+ }
+ 
+   return NULL_TREE;
+ }
+ 
+ 
  /* Starting from a PHI node for the virtual operand of the memory reference
 REF find a continuation virtual operand that allows to continue walking
 statements dominating PHI skipping only statements that cannot possibly
*** get_continuation_for_phi (gimple phi, ao
*** 1890,1942 
if (nargs == 1)
  return PHI_ARG_DEF (phi, 0);
  
!   /* For two arguments try to skip non-aliasing code until we hit
!  the phi argument definition that dominates the other one.  */
!   if (nargs == 2)
  {
tree arg0 = PHI_ARG_DEF (phi, 0);
!   tree arg1 = PHI_ARG_DEF (phi, 1);
!   gimple def0 = SSA_NAME_DEF_STMT (arg0);
!   gimple def1 = SSA_NAME_DEF_STMT (arg1);
!   tree common_vuse;
! 
!   if (arg0 == arg1)
!   return arg0;
!   else if (gimple_nop_p (def0)
!  || (!gimple_nop_p (def1)
!   && dominated_by_p (CDI_DOMINATORS,
! gimple_bb (def1), gimple_bb (def0))))
!   {
! if (maybe_skip_until (phi, arg0, ref, arg1, visited))
!   return arg0;
!   }
!   else if (gimple_nop_p (def1)
!  || dominated_by_p (CDI_DOMINATORS,
! gimple_bb (def0), gimple_bb (def1)))
!   {
! if (maybe_skip_until (phi, arg1, ref, arg0, visited))
!   return arg1;
!   }
!   /* Special case of a diamond:
!  MEM_1 = ...
!  goto (cond) ? L1 : L2
!  L1: store1 = ...#MEM_2 = vuse(MEM_1)
!  goto L3
!  L2: store2 = ...#MEM_3 = vuse(MEM_1)
!  L3: MEM_4 = PHI<MEM_2, MEM_3>
!We were called with the PHI at L3, MEM_2 and MEM_3 don't
!dominate each other, but still we can easily skip this PHI node
!if we recognize that the vuse MEM operand is the same for both,
!and that we can skip both statements (they don't clobber us).
!This is still linear.  Don't use maybe_skip_until, that might
!

Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-11 Thread Michael Matz
Hi,

On Mon, 10 Oct 2011, Kai Tietz wrote:

 To ensure that we use simple_operand_p in all cases, beside for 
 branching AND/OR chains, in same way as before, I added to this function 
 an additional argument, by which the looking into comparisons can be 
 activated.

Better to make it a separate function that first tests your new conditions, 
and then calls simple_operand_p.

 +fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
 + tree lhs, tree rhs)
  {
/* If this is the or of two comparisons, we can do something if
   the comparisons are NE_EXPR.  If this is the and, we can do something
 @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
  build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
  ll_arg, rl_arg),
  build_int_cst (TREE_TYPE (ll_arg), 0));
 -
 -  if (LOGICAL_OP_NON_SHORT_CIRCUIT)
 - {
 -   if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
 - return build2_loc (loc, code, truth_type, lhs, rhs);
 -   return NULL_TREE;
 - }

Why do you remove this hunk?  Shouldn't you instead move the hunk you 
added to fold_truth_andor() here.  I realize this needs some TLC to 
fold_truth_andor_1, because right now it early-outs for non-comparisons, 
but it seems the better place.  I.e. somehow move the below code into the 
above branch, with the associated diddling on fold_truth_andor_1 that it 
gets called.

 +  if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
 +      && (BRANCH_COST (optimize_function_for_speed_p (cfun),
 +                       false) >= 2)
 +      && !TREE_SIDE_EFFECTS (arg1)
 +      && LOGICAL_OP_NON_SHORT_CIRCUIT
 +      && simple_operand_p (arg1, true))
 +{
 +  enum tree_code ncode = (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
 +: TRUTH_OR_EXPR);
 +
 +  /* We don't want to pack more then two leafs to an non-IF

Missing continuation of the sentence?

 + If tree-code of left-hand operand isn't an AND/OR-IF code and not
 + equal to CODE, then we don't want to add right-hand operand.
 + If the inner right-hand side of left-hand operand has side-effects,
 + or isn't simple, then we can't add to it, as otherwise we might
 + destroy if-sequence.  */


 +  if (TREE_CODE (arg0) == code
 +      /* Needed for sequence points to handle trappings, and
 +         side-effects.  */
 +      && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
 +      && simple_operand_p (TREE_OPERAND (arg0, 1), true))
 +   {
 + tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1),
 + arg1);
 + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
 +  tem);
 +   }
 + /* Needed for sequence points to handle trappings, and side-effects.  */
 + else if (!TREE_SIDE_EFFECTS (arg0)
 +          && simple_operand_p (arg0, true))
 +   return fold_build2_loc (loc, ncode, type, arg0, arg1);
 +}
 +


Ciao,
Michael.


[C++ Patch] PR 50611

2011-10-11 Thread Paolo Carlini

Hi,

for this largish testcase (reduced from a big one by Jakub) we ICE because 
the error reporting routines are re-entered. In 4_6-branch the situation is 
worse, because for the original testcase we don't produce any useful 
diagnostics at all before ICEing.


Thus the below, which seems pretty straightforward to me given that 
unqualified_name_lookup_error, called by tsubst_copy_and_build (in turn 
called by tsubst, called by dump_template_bindings) errors out 
unconditionally.


Tested mainline and 4_6-branch. Ok for both?

Thanks,
Paolo.

///
2011-10-11  Paolo Carlini  paolo.carl...@oracle.com

PR c++/50611
* pt.c (tsubst_copy_and_build): If (complain & tf_error) is false
do not call unqualified_name_lookup_error.
Index: pt.c
===
--- pt.c(revision 179798)
+++ pt.c(working copy)
@@ -13026,7 +13026,11 @@ tsubst_copy_and_build (tree t,
if (error_msg)
  error (error_msg);
if (!function_p && TREE_CODE (decl) == IDENTIFIER_NODE)
- decl = unqualified_name_lookup_error (decl);
+ {
+   if (complain & tf_error)
+ unqualified_name_lookup_error (decl);
+   decl = error_mark_node;
+ }
return decl;
   }
 


Re: [pph] Make pph.h _the_ interface header. (issue 5247044)

2011-10-11 Thread dnovillo


http://codereview.appspot.com/5247044/diff/1/gcc/cp/pph-streamer.h
File gcc/cp/pph-streamer.h (right):

http://codereview.appspot.com/5247044/diff/1/gcc/cp/pph-streamer.h#newcode165
gcc/cp/pph-streamer.h:165: struct pph_stream {
165 struct pph_stream {

You need the typedef.  Stage 1 is still built with C.

http://codereview.appspot.com/5247044/


[v3] PR libstdc++/50661

2011-10-11 Thread Paolo Carlini

Hi,

tested x86_64-linux, committed.

Paolo.


2011-10-11  Emil Wojak  e...@wojak.eu

PR c++/50661
* include/bits/stl_algobase.h (equal): Compare arrays of pointers
too with memcmp.
Index: include/bits/stl_algobase.h
===
--- include/bits/stl_algobase.h (revision 179798)
+++ include/bits/stl_algobase.h (working copy)
@@ -812,7 +812,8 @@
 {
   typedef typename iterator_traits<_II1>::value_type _ValueType1;
   typedef typename iterator_traits<_II2>::value_type _ValueType2;
-  const bool __simple = (__is_integer<_ValueType1>::__value
+  const bool __simple = ((__is_integer<_ValueType1>::__value
+                          || __is_pointer<_ValueType1>::__value)
                          && __is_pointer<_II1>::__value
                          && __is_pointer<_II2>::__value
                          && __are_same<_ValueType1, _ValueType2>::__value);


Re: Fix for PR libobjc/49883 (clang + gcc 4.6 runtime = broken) and a small related clang fix

2011-10-11 Thread Nicola Pero
 It isn't a standoff, we can choose to just fix the issue and be compatible, 
 if we want.

I guess you're right and I'm probably using the wrong word - English is not my
first language. ;-)

But I meant that they could have made the same choice to be compatible (by
fixing the issue in their compiler and making their GCC-compatible ABI output
actually compatible with GCC; they already have other, clang-only,
GCC-incompatible ABIs in there, so why not make the GCC-compatible one
actually compatible with GCC?), but they didn't.

Anyhow I completely agree with you that life is too short and we have already
spent way too much time discussing this.  It's fixed and let's move on. :-)

Thanks



Re: [gimplefe][patch] The symbol table for declarations

2011-10-11 Thread Tom Tromey
 Sandeep == Sandeep Soni soni.sande...@gmail.com writes:

Sandeep The following patch is a basic attempt to build a symbol table that
Sandeep stores the names of all the declarations made in the input file.

I don't know anything about gimplefe, but unless you have complicated
needs, it is more usual to just put a symbol's value directly into the
identifier node.  The C front end is a good example of this.

Tom


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-11 Thread Kai Tietz
Here is an updated version of the patch.  It creates a new simple_operand_p_2
function instead of modifying simple_operand_p.

ChangeLog

2011-10-11  Kai Tietz  kti...@redhat.com

* fold-const.c (simple_operand_p_2): New function.
(fold_truthop): Rename to
(fold_truth_andor_1): function name.
Additionally remove branching creation for logical and/or.
(fold_truth_andor): Handle branching creation for logical and/or here.

Bootstrapped and regression-tested for all languages plus Ada and
Obj-C++ on x86_64-pc-linux-gnu.
OK to apply?

Regards,
Kai

Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c
+++ gcc/gcc/fold-const.c
@@ -112,13 +112,13 @@ static tree decode_field_reference (loca
 static int all_ones_mask_p (const_tree, int);
 static tree sign_bit_p (tree, const_tree);
 static int simple_operand_p (const_tree);
+static bool simple_operand_p_2 (tree);
 static tree range_binop (enum tree_code, tree, tree, int, tree, int);
 static tree range_predecessor (tree);
 static tree range_successor (tree);
 static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
 static tree fold_cond_expr_with_comparison (location_t, tree, tree,
tree, tree);
 static tree unextend (tree, int, int, tree);
-static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
 static tree optimize_minmax_comparison (location_t, enum tree_code,
tree, tree, tree);
 static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
@@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l
   return lhs;
 }
 
-/* Subroutine for fold_truthop: decode a field reference.
+/* Subroutine for fold_truth_andor_1: decode a field reference.

If EXP is a comparison reference, we return the innermost reference.

@@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val)
   return NULL_TREE;
 }

-/* Subroutine for fold_truthop: determine if an operand is simple enough
+/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough
to be evaluated unconditionally.  */

 static int
@@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp)
 registers aren't expensive.  */
   && (! TREE_STATIC (exp) || DECL_REGISTER (exp))));
 }
+
+/* Subroutine for fold_truth_andor: determine if an operand is simple enough
+   to be evaluated unconditionally.
+   In addition to simple_operand_p, we assume that comparisons and logic-not
+   operations are simple, if their operands are simple, too.  */
+
+static bool
+simple_operand_p_2 (tree exp)
+{
+  enum tree_code code;
+
+  /* Strip any conversions that don't change the machine mode.  */
+  STRIP_NOPS (exp);
+
+  code = TREE_CODE (exp);
+
+  if (TREE_CODE_CLASS (code) == tcc_comparison)
+    return (!tree_could_trap_p (exp)
+            && simple_operand_p_2 (TREE_OPERAND (exp, 0))
+            && simple_operand_p_2 (TREE_OPERAND (exp, 1)));
+
+  if (FLOAT_TYPE_P (TREE_TYPE (exp))
+      && tree_could_trap_p (exp))
+    return false;
+
+  switch (code)
+{
+case SSA_NAME:
+  return true;
+case TRUTH_NOT_EXPR:
+  return simple_operand_p_2 (TREE_OPERAND (exp, 0));
+case BIT_NOT_EXPR:
+  if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE)
+   return false;
+  return simple_operand_p_2 (TREE_OPERAND (exp, 0));
+default:
+  return simple_operand_p (exp);
+}
+}
+
 
 /* The following functions are subroutines to fold_range_test and allow it to
try to change a logical combination of comparisons into a range test.
@@ -4888,7 +4928,7 @@ fold_range_test (location_t loc, enum tr
   return 0;
 }
 
-/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P
+/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
bit value.  Arrange things so the extra bits will be set to zero if and
only if C is signed-extended to its full width.  If MASK is nonzero,
it is an INTEGER_CST that should be AND'ed with the extra bits.  */
@@ -5025,8 +5065,8 @@ merge_truthop_with_opposite_arm (locatio
We return the simplified tree or 0 if no optimization is possible.  */

 static tree
-fold_truthop (location_t loc, enum tree_code code, tree truth_type,
- tree lhs, tree rhs)
+fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
+   tree lhs, tree rhs)
 {
   /* If this is the or of two comparisons, we can do something if
  the comparisons are NE_EXPR.  If this is the and, we can do something
@@ -5054,8 +5094,6 @@ fold_truthop (location_t loc, enum tree_
   tree lntype, rntype, result;
   HOST_WIDE_INT first_bit, end_bit;
   int volatilep;
-  tree orig_lhs = lhs, orig_rhs = rhs;
-  enum tree_code orig_code = code;

   /* Start by getting the comparison codes.  Fail if anything is volatile.
  If one operand is a BIT_AND_EXPR with the constant one, treat it as if
@@ -5119,8 +5157,7 @@ 

[Committed] S/390: Add -mbackchain for a __builtin_return_address testcase

2011-10-11 Thread Andreas Krebbel
Hi,

on s390 we need a backchain in order to implement
__builtin_return_address for arguments other than 0.

Committed to mainline.

Bye,

-Andreas-


2011-10-11  Andreas Krebbel  andreas.kreb...@de.ibm.com

* gcc.dg/pr49994-3.c: Add -mbackchain for s390 and s390x.

Index: gcc/testsuite/gcc.dg/pr49994-3.c
===
*** gcc/testsuite/gcc.dg/pr49994-3.c.orig
--- gcc/testsuite/gcc.dg/pr49994-3.c
***
*** 1,5 
--- 1,6 
  /* { dg-do compile } */
  /* { dg-options "-O2 -fsched2-use-superblocks -g" } */
+ /* { dg-options "-O2 -fsched2-use-superblocks -g -mbackchain" { target s390*-*-* } } */
  /* { dg-require-effective-target scheduling } */
  
  void *


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-11 Thread William J. Schmidt
Hi Richard,

Thanks for the comments -- a few responses below.

On Tue, 2011-10-11 at 13:40 +0200, Richard Guenther wrote:
 On Sat, 8 Oct 2011, William J. Schmidt wrote:
 

snip

 
  +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
 
 You don't verify that bitpos % BITS_PER_UNIT is zero anywhere.

I'll add a check in the caller.  I was thinking this was unnecessary
since I had excluded bitfield operations, but on reflection that may not
be sufficient.
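
A minimal sketch of the kind of caller-side check meant here, not the code from the patch: the helper name bitpos_to_byte_offset is hypothetical, and CHAR_BIT stands in for GCC's BITS_PER_UNIT.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: convert a bit position to a
   byte offset, refusing positions that are not byte-aligned.  CHAR_BIT
   stands in for GCC's BITS_PER_UNIT.  */
static bool
bitpos_to_byte_offset (unsigned long bitpos, unsigned long *offset)
{
  assert (offset != NULL);
  if (bitpos % CHAR_BIT != 0)
    return false;       /* Not on a byte boundary: leave the reference alone.  */
  *offset = bitpos / CHAR_BIT;
  return true;
}
```

With such a guard the division in the quoted line never silently drops a sub-byte remainder; the caller simply skips the restructuring when the helper returns false.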

snip

 
  +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
  +   true, GSI_SAME_STMT);
  +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
  +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
  +  true, GSI_SAME_STMT);
  +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
  +build_int_cst (offset_type, double_int_to_shwi (c)));
 
 double_int_to_tree (offset_type, c)
 
 Please delay gimplification to the caller, that way this function
 solely operates on the trees returned from get_inner_reference.
 Or are you concerned that fold might undo your association?

I'll try that.  I was just basing this on some suggestions you had made
earlier; I don't believe there is any problem with delaying it.

snip

   
 for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi))
   {
 gimple stmt = gsi_stmt (gsi);
   
  -  if (is_gimple_assign (stmt)
  -      && !stmt_could_throw_p (stmt))
  +  /* During late reassociation only, look for restructuring
  +opportunities within an expression that references memory.
  +We only do this for blocks not contained in loops, since the
  +ivopts machinery does a good job on loop expressions, and we
  +don't want to interfere with other loop optimizations.  */
 
 I'm not sure I buy this.  IVOPTs would have produced [TARGET_]MEM_REFs
 which you don't handle.  Did you do any measurements what happens if
 you enable it generally?

Actually I agree with you -- in an earlier iteration this was still
enabled for reassoc1 ahead of loop optimizations and was causing
degradations.  So long as it doesn't occur early it should be fine to do
everywhere now, and catch non-ivar cases in loops.

snip

 You verified the patch has no performance degradations on ppc64
 for SPEC CPU, did you see any improvements?
 

Yes, a few in the 2-3% range.  Nothing stellar.

 The pattern matching is still very ad-hoc and doesn't consider
 statements that feed the base address.  There is conceptually
 no difference between p->a[n] and *(p + n * 4).

That's true.  Since we abandoned the general address-lowering approach,
this was aimed at the specific pattern that comes up frequently in
practice.  I would expect the *(p + n * 4) cases to be handled by the
general straight-line strength reduction, which is the correct long-term
approach.  (Cases like p->a[n], where the multiplication is not yet
explicit, will be a bit of a wart as part of strength reduction, too,
but that's still the right place for it eventually.)
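
To illustrate the equivalence under discussion, here is a small sketch.  The struct layout and function names are made up for the example, and the constant 4 in the text is assumed to be sizeof (int).

```c
#include <stddef.h>

struct rec { int hdr; int a[8]; };   /* hypothetical layout */

/* Address of p->a[n] written as an array reference; the multiplication
   by the element size is implicit.  */
static int *
addr_by_ref (struct rec *p, size_t n)
{
  return &p->a[n];
}

/* The same address with the multiplication made explicit:
   base + constant field offset + n * element size.  */
static int *
addr_by_arith (struct rec *p, size_t n)
{
  return (int *) ((char *) p + offsetof (struct rec, a) + n * sizeof (int));
}
```

Both functions return the same pointer for any in-range n, which is why a pass that only recognizes one of the two spellings misses opportunities in the other.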

   You placed this
 lowering in reassoc to catch CSE opportunities with DOM, right?
 Does RTL CSE not do its job or is the transform undone by
 fwprop before it gets a chance to do it?

I think with Paolo's suggested patch for RTL CSE, this could be moved
back to expand.  I will have to experiment with it again to make sure.
If so, that would certainly be my preference as well.

(Or having the whole problem just disappear might be my preference on
some days... :)

Thanks,
Bill



Re: [patch RFC,PR50038]

2011-10-11 Thread Ilya Enkovich
2011/10/4 Richard Henderson r...@redhat.com:
 On 10/04/2011 08:42 AM, Joseph S. Myers wrote:
 On Tue, 4 Oct 2011, Ilya Tocar wrote:

 Hi everyone,

 This patch fixes PR 50038 (redundant zero extensions) by modifying
 implicit-zee pass
 to also remove unneeded zero extensions from QImode to SImode.

 Hardcoding particular modes like this in the target-independent parts of
 the compiler is fundamentally ill-conceived.  Right now it hardcodes the
 (SImode, DImode) pair.  You're adding hardcoding of (QImode, SImode) as
 well.  But really it should consider all pairs of (integer mode, wider
 integer mode), with the machine description (or target hooks) determining
 which pairs are relevant on a particular target.  Changing it not to
 hardcode particular modes would be better than adding a second pair.


 That along with not hard-coding ZERO_EXTEND.

 Both MIPS and Alpha have much the same free operations, but with SIGN_EXTEND.

 I remember rejecting one iteration of this pass with this hard-coded, but the
 pass was apparently approved by someone else without that being corrected.


 r~


Hello guys,

Could you please look at my version of the patch? I tried to remove all
unnecessary mode restrictions and to cover the SIGN_EXTEND case. I have
not tested this patch yet; I only checked that it works on the reproducer
from PR50038.

Thanks
Ilya
---
gcc/

* implicit-zee.c (ext_cand): New.
(ext_cand_pool): Likewise.
(add_ext_candidate): Likewise.
(zee_init): Likewise.
(zee_cleanup): Likewise.
(combine_set_zero_extend): Get extend candidate as new parameter.
Now handle sign extend cases and all modes.
(transform_ifelse): Likewise.
(merge_def_and_ze): Likewise.
(combine_reaching_defs): Change parameter type.
(zero_extend_info): Changed insn_list type.
(add_removable_zero_extend): Relaxed mode and code filter.
(find_removable_zero_extends): Changed return type.
(find_and_remove_ze): Var type changes.
(rest_of_handle_zee): Initialization and cleanup added.


PR50038.diff
Description: Binary data


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-11 Thread Michael Matz
Hi,

On Tue, 11 Oct 2011, Kai Tietz wrote:

  Better make it a separate function the first tests your new 
  conditions, and then calls simple_operand_p.
 
 Well, either I make it a new function and call it instead of 
 simple_operand_p,

That's what I meant, yes.

  @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
                           build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
                                   ll_arg, rl_arg),
                           build_int_cst (TREE_TYPE (ll_arg), 0));
  -
  -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
  -     {
  -       if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
  -         return build2_loc (loc, code, truth_type, lhs, rhs);
  -       return NULL_TREE;
  -     }
 
  Why do you remove this hunk?  Shouldn't you instead move the hunk you
  added to fold_truth_andor() here.  I realize this needs some TLC to
  fold_truth_andor_1, because right now it early-outs for non-comparisons,
  but it seems the better place.  I.e. somehow move the below code into the
  above branch, with the associated diddling on fold_truth_andor_1 that it
  gets called.
 
 This hunk is removed, as it is vain to do here.

There is a fallthrough now, that wasn't there before.  I don't know if 
it's harmless, I just wanted to mention it.

 Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is 
 better done at a single place in fold_truth_andor only.

As fold_truthop is called twice by fold_truth_andor, the latter might 
indeed be the better place.


Ciao,
Michael.

Re: [PATCH 3/7] Emit macro expansion related diagnostics

2011-10-11 Thread Jason Merrill
That looks pretty good, but do you really need to build up a separate 
data structure to search?  You seem to be searching it in the same order 
that it's built up, so why not just walk the expansion chain directly 
when searching?


Jason


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-11 Thread Kai Tietz
2011/10/11 Michael Matz m...@suse.de:
 Hi,

 On Tue, 11 Oct 2011, Kai Tietz wrote:

  Better make it a separate function the first tests your new
  conditions, and then calls simple_operand_p.

 Well, either I make it a new function and call it instead of
 simple_operand_p,

 That's what I meant, yes.

  @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_
                           build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
                                   ll_arg, rl_arg),
                           build_int_cst (TREE_TYPE (ll_arg), 0));
  -
  -      if (LOGICAL_OP_NON_SHORT_CIRCUIT)
  -     {
  -       if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs)
  -         return build2_loc (loc, code, truth_type, lhs, rhs);
  -       return NULL_TREE;
  -     }
 
  Why do you remove this hunk?  Shouldn't you instead move the hunk you
  added to fold_truth_andor() here.  I realize this needs some TLC to
  fold_truth_andor_1, because right now it early-outs for non-comparisons,
  but it seems the better place.  I.e. somehow move the below code into the
  above branch, with the associated diddling on fold_truth_andor_1 that it
  gets called.

 This hunk is removed, as it is vain to do here.

 There is a fallthrough now, that wasn't there before.  I don't know if
 it's harmless, I just wanted to mention it.

It is harmless.  Before, we changed the expression here and then recursed
with the non-IF AND/OR expression later, so there is no need to do this
recursion.

 Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is
 better done at a single place in fold_truth_andor only.

 As fold_truthop is called twice by fold_truth_andor, the latter might
 indeed be the better place.


 Ciao,
 Michael.

Kai


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-11 Thread William J. Schmidt
On Tue, 2011-10-11 at 09:12 -0500, William J. Schmidt wrote:

  The pattern matching is still very ad-hoc and doesn't consider
  statements that feed the base address.  There is conceptually
  no difference between p->a[n] and *(p + n * 4).
 
 That's true.  Since we abandoned the general address-lowering approach,
 this was aimed at the specific pattern that comes up frequently in
 practice.  I would expect the *(p + n * 4) cases to be handled by the
 general straight-line strength reduction, which is the correct long-term
 approach.  (Cases like p->a[n], where the multiplication is not yet
 explicit, will be a bit of a wart as part of strength reduction, too,
 but that's still the right place for it eventually.)

Going through my notes, I do have some code for the *(p + n * 4) case
lying around from the last time I tried this in expand, so I'll try to
get this back in place (either in reassoc2 or expand, depending on how
the CSE works out).




Re: [Patch] PR c++/26256

2011-10-11 Thread Jason Merrill

On 10/10/2011 03:59 PM, Fabien Chêne wrote:

Sorry but I've failed to see why you called them callers of
lookup_field_1, could you elaborate ?


Hmm, I was assuming that the other functions would have gotten their 
decls via lookup_field_1, but I suppose that isn't true for unqualified 
lookup that finds the name in class_binding_level.  Never mind.


Jason



Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-11 Thread Michael Matz
Hi,

On Tue, 11 Oct 2011, Kai Tietz wrote:

 So updated version for patch.  It creates new simple_operand_p_2
 function instead of modifying simple_operand_p function.

FWIW: I also can't think of a nice short name for that predicate function 
:)  One thing: move the test for TREE_SIDE_EFFECTS to that new function, 
then the if()s in fold_truth_andor become nicer.  I think the code then is 
okay, but I can't approve.  Just one remark about the comment:

 +  /* We don't want to pack more then two leafs to an non-IF AND/OR

s/then/than/ s/an/a/

 + expression.
 + If tree-code of left-hand operand isn't an AND/OR-IF code and not
 + equal to CODE, then we don't want to add right-hand operand.
 + If the inner right-hand side of left-hand operand has side-effects,
 + or isn't simple, then we can't add to it, as otherwise we might
 + destroy if-sequence.  */

And I think it could use some overview of the transformation done like in 
the initial patch, ala:

Transform ((A && B) && C) into (A && (B & C)).

and

Or (A && B) into (A & B). for this part:

+ /* Needed for sequence points to handle trappings, and side-effects.  */
+ else if (simple_operand_p_2 (arg0))
+   return fold_build2_loc (loc, ncode, type, arg0, arg1);
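
As a source-level illustration of what the first transformation amounts to (a sketch with made-up function names, assuming B and C are simple operands without side effects that cannot trap):

```c
#include <stdbool.h>

/* Short-circuit form: each && may become its own conditional branch.  */
static bool
with_branches (bool a, bool b, bool c)
{
  return (a && b) && c;
}

/* After the transformation: B and C are combined with a non-short-circuit
   AND.  This is only valid because evaluating C unconditionally has no
   side effects and cannot trap.  */
static bool
with_fewer_branches (bool a, bool b, bool c)
{
  return a && (b & c);
}
```

The two functions agree for all eight input combinations; the second form just gives the expander one branch fewer to emit on targets where branches are expensive.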


Ciao,
Michael.


Fix PR 50565 (offsetof-type expressions in static initializers)

2011-10-11 Thread Joseph S. Myers
This patch fixes PR 50565, a failure to accept certain offsetof-type
expressions in static initializers introduced by my constant
expressions changes.  (These expressions are permitted but not
required by ISO C to be accepted; the intent of my constant
expressions model is that they should be valid in GNU C.)

The problem comes down to an expression with the difference of two
pointers being cast to int on a 64-bit system, resulting in
convert_to_integer moving the conversions inside the subtraction.
(These optimizations at conversion time should really be done later as
a part of folding, or even later than that, rather than
unconditionally in convert_to_*, but that's another issue.)  So when
the expression reaches c_fully_fold it is a difference of narrowed
pointers being folded, which the compiler cannot optimize as it can a
difference of unnarrowed pointers with the same base object.  Before
the introduction of c_fully_fold the difference would have been folded
when built and so the narrowing of operands would never have been
applied to it.

This patch disables the narrowing in the case of pointer subtraction,
as it doesn't seem particularly likely to be useful there and is known
to prevent this folding required for these initializers to be
accepted.
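
For readers unfamiliar with the idiom, the following sketch shows the same offsetof-type computation evaluated at run time rather than in a static initializer.  The function name offset_of_p is made up for the example, and char * is used instead of the GNU void * arithmetic of the testcases.

```c
#include <stddef.h>

struct s { char p[2]; };
static struct s v;

/* offsetof-style computation: the difference of two pointers into the
   same object, narrowed to int only after the subtraction.  */
static int
offset_of_p (size_t i)
{
  return (int) ((char *) &v.p[i] - (char *) &v);
}
```

Because both pointers share the base object v, the subtraction folds to a constant as long as the operands are left unnarrowed, which is the property the patch preserves.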

Bootstrapped with no regressions on x86_64-unknown-linux-gnu.  OK to
commit?

2011-10-11  Joseph Myers  jos...@codesourcery.com

PR c/50565
* convert.c (convert_to_integer): Do not narrow operands of
pointer subtraction.

testsuite:
2011-10-11  Joseph Myers  jos...@codesourcery.com

PR c/50565
* gcc.c-torture/compile/pr50565-1.c,
gcc.c-torture/compile/pr50565-2.c: New tests.

Index: gcc/testsuite/gcc.c-torture/compile/pr50565-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr50565-1.c (revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr50565-1.c (revision 0)
@@ -0,0 +1,4 @@
+struct s { char p[2]; };
+static struct s v;
+const int o0 = (int) ((void *) &v.p[0] - (void *) &v) + 0U;
+const int o1 = (int) ((void *) &v.p[0] - (void *) &v) + 1U;
Index: gcc/testsuite/gcc.c-torture/compile/pr50565-2.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr50565-2.c (revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr50565-2.c (revision 0)
@@ -0,0 +1,4 @@
+struct s { char p[2]; };
+static struct s v;
+const int o0 = (int) ((void *) &v.p[0] - (void *) &v) + 0;
+const int o1 = (int) ((void *) &v.p[0] - (void *) &v) + 1;
Index: gcc/convert.c
===
--- gcc/convert.c   (revision 179754)
+++ gcc/convert.c   (working copy)
@@ -745,6 +745,15 @@ convert_to_integer (tree type, tree expr
tree arg0 = get_unwidened (TREE_OPERAND (expr, 0), type);
tree arg1 = get_unwidened (TREE_OPERAND (expr, 1), type);
 
+   /* Do not try to narrow operands of pointer subtraction;
+  that will interfere with other folding.  */
+   if (ex_form == MINUS_EXPR
+       && CONVERT_EXPR_P (arg0)
+       && CONVERT_EXPR_P (arg1)
+       && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (arg0, 0)))
+       && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0))))
+ break;
+
if (outprec >= BITS_PER_WORD
|| TRULY_NOOP_TRUNCATION (outprec, inprec)
|| inprec > TYPE_PRECISION (TREE_TYPE (arg0))

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: PR c++/30195

2011-10-11 Thread Jason Merrill

On 10/10/2011 03:56 PM, Fabien Chêne wrote:

It tried to add the target declaration of a USING_DECL in the
method_vec of the class where the USING_DECL is declared. Thus, I
copied the target decl, adjusted its access, and then called
add_method with the target decl.


Copying the decl is unlikely to do what we want, I think.  Does putting 
the target decl directly into the method vec work?  If not, perhaps 
lookup_fnfields_1 should look through the field list for function 
USING_DECLs.


Jason


Re: fix for c++/44473, mangling of decimal types, checked in

2011-10-11 Thread Peter Bergner
On Fri, 2011-09-30 at 10:37 -0700, Janis Johnson wrote:
 Patch http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00625.html was
 approved by Jason last December but I never got around to checking
 it in.  Paolo Carlini said in PR44473 that it was already approved
 and doesn't need a new approval, so I checked it in after a
 bootstrap and regtest of c,c++ for i686-pc-linux-gnu.

Jason, I assume your approval for committing Janis' patch for the
4.5 branch still stands.  Can we also commit her fix to the 4.6
branch, since it was created after your approval, but before Janis
committed the mainline patch?  I have verified that Janis' patch
bootstraps and regtests with no regressions on both the 4.5 and
4.6 branches.


Peter





Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-11 Thread H.J. Lu
On Tue, Oct 11, 2011 at 3:12 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
 Hi
 Uros, you was right both with fpmath and configflags. That is why it
 was passing for me.

 Attached patch which cures the problem.

 testsuite/ChangeLog entry:

 2011-10-11  Kirill Yukhin  kirill.yuk...@intel.com

        * gcc.target/i386/fma_double_1.c: Add -mfpmath=sse.
        * gcc.target/i386/fma_double_2.c: Ditto.
        * gcc.target/i386/fma_double_3.c: Ditto.
        * gcc.target/i386/fma_double_4.c: Ditto.
        * gcc.target/i386/fma_double_5.c: Ditto.
        * gcc.target/i386/fma_double_6.c: Ditto.
        * gcc.target/i386/fma_float_1.c: Ditto.
        * gcc.target/i386/fma_float_2.c: Ditto.
        * gcc.target/i386/fma_float_3.c: Ditto.
        * gcc.target/i386/fma_float_4.c: Ditto.
        * gcc.target/i386/fma_float_5.c: Ditto.
        * gcc.target/i386/fma_float_6.c: Ditto.
        * gcc.target/i386/l_fma_double_1.c: Ditto.
        * gcc.target/i386/l_fma_double_2.c: Ditto.
        * gcc.target/i386/l_fma_double_3.c: Ditto.
        * gcc.target/i386/l_fma_double_4.c: Ditto.
        * gcc.target/i386/l_fma_double_5.c: Ditto.
        * gcc.target/i386/l_fma_double_6.c: Ditto.
        * gcc.target/i386/l_fma_float_1.c: Ditto.
        * gcc.target/i386/l_fma_float_2.c: Ditto.
        * gcc.target/i386/l_fma_float_3.c: Ditto.
        * gcc.target/i386/l_fma_float_4.c: Ditto.
        * gcc.target/i386/l_fma_float_5.c: Ditto.
        * gcc.target/i386/l_fma_float_6.c: Ditto.
        * gcc.target/i386/l_fma_run_double_1.c: Ditto.
        * gcc.target/i386/l_fma_run_double_2.c: Ditto.
        * gcc.target/i386/l_fma_run_double_3.c: Ditto.
        * gcc.target/i386/l_fma_run_double_4.c: Ditto.
        * gcc.target/i386/l_fma_run_double_5.c: Ditto.
        * gcc.target/i386/l_fma_run_double_6.c: Ditto.
        * gcc.target/i386/l_fma_run_float_1.c: Ditto.
        * gcc.target/i386/l_fma_run_float_2.c: Ditto.
        * gcc.target/i386/l_fma_run_float_3.c: Ditto.
        * gcc.target/i386/l_fma_run_float_4.c: Ditto.
        * gcc.target/i386/l_fma_run_float_5.c: Ditto.
        * gcc.target/i386/l_fma_run_float_6.c: Ditto.

 Could you please have a look?

 Sorry for inconvenience, K


All double vector tests fail when GCC is configured with
--with-cpu=atom, since the double vectorizer is turned off by default.
You should add -mtune=generic to those tests.

-- 
H.J.


Re: [Patch,AVR]: Housekeeping avr_legitimate_address_p

2011-10-11 Thread Denis Chertykov
2011/10/11 Georg-Johann Lay a...@gjlay.de:
 This is bit of code cleanup and move macro code from avr.h to functions in 
 avr.c.

 There's no change in functionality. Passed without regressions.

 Ok?

 Johann

        * config/avr/avr-protos.h (avr_mode_code_base_reg_class): New
        prototype.
        (avr_regno_mode_code_ok_for_base_p): New prototype.
        * config/avr/avr.h (BASE_REG_CLASS): Remove.
        (REGNO_OK_FOR_BASE_P): Remove.
        (REG_OK_FOR_BASE_NOSTRICT_P): Remove.
        (REG_OK_FOR_BASE_STRICT_P): Remove.
        (MODE_CODE_BASE_REG_CLASS): New define.
        (REGNO_MODE_CODE_OK_FOR_BASE_P): New define.

        * config/avr/avr.c (avr_mode_code_base_reg_class): New function.
        (avr_regno_mode_code_ok_for_base_p): New function.
        (avr_reg_ok_for_addr_p): New static function.
        (avr_legitimate_address_p): Use it.  Beautify.


Approved.

Denis.


Re: [Patch,AVR]: Fix PR50447 (4/n)

2011-10-11 Thread Denis Chertykov
2011/10/11 Georg-Johann Lay a...@gjlay.de:
 This is a small addendum to PR50447.

 It's a change to addsi3 insn; the actual insn sequence printed is still the
 same (except for adding +/-1 to l-reg) but the effect on cc0 is worked out so
 that it can be used to cancel out comparisons like in long loops.

 cc insn attribute gets one more alternative and notice_update_cc calls
 respective output function that works out the effect on cc0.

 Passed without regressions. Ok for trunk?

Ok.

Denis.


Re: [patches] several gcc.target/powerpc tests require hard_float

2011-10-11 Thread Janis Johnson
On 10/10/2011 06:23 PM, Joseph S. Myers wrote:
 On Mon, 10 Oct 2011, Janis Johnson wrote:
 
 This patch skips several Power-specific tests if hard_float support
 isn't available.  OK for trunk?
 
 It looks like these are testing for particular instructions using FPRs and 
 so powerpc_fprs is more appropriate; you don't want to match e500v2 hard 
 float.

Is a patch using powerpc_fprs instead of hard_float OK for these tests?

Janis



Re: fix for c++/44473, mangling of decimal types, checked in

2011-10-11 Thread Jason Merrill

On 10/11/2011 11:34 AM, Peter Bergner wrote:

On Fri, 2011-09-30 at 10:37 -0700, Janis Johnson wrote:

Patch http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00625.html was
approved by Jason last December but I never got around to checking
it in.  Paolo Carlini said in PR44473 that it was already approved
and doesn't need a new approval, so I checked it in after a
bootstrap and regtest of c,c++ for i686-pc-linux-gnu.


Jason, I assume your approval for committing Janis' patch for the
4.5 branch still stands.  Can we also commit her fix to the 4.6
branch, since it was created after your approval, but before Janis
committed the mainline patch?  I have verified that Janis' patch
bootstraps and regtests with no regressions on both the 4.5 and
4.6 branches.


Yes.

Jason



Re: New warning for expanded vector operations

2011-10-11 Thread Artem Shinkarov
On Tue, Oct 11, 2011 at 11:52 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Mon, Oct 10, 2011 at 3:21 PM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Mon, Oct 10, 2011 at 12:02 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Fri, Oct 7, 2011 at 9:44 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Fri, Oct 7, 2011 at 6:22 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Wed, Oct 5, 2011 at 12:35 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Oct 5, 2011 at 1:28 PM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov
 artyom.shinkar...@gmail.com wrote:
 Hi

 Here is a patch to inform a programmer about the expanded vector 
 operation.
 Bootstrapped on x86-unknown-linux-gnu.

 ChangeLog:

        * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to
          produce the warning.
          (expand_vector_parallel): Adjust to produce the warning.

 Entries start without gcc/, they are relative to the gcc/ChangeLog 
 file.

 Sure, sorry.

          (lower_vec_shuffle): Adjust to produce the warning.
        * gcc/common.opt: New warning Wvector-operation-expanded.
         * gcc/doc/invoke.texi: Document the warning.


 Ok?

 I don't like the name -Wvector-operation-expanded.  We emit a
 similar warning for missed inline expansions with -Winline, so
 maybe -Wvector-extensions (that's the name that appears
 in the C extension documentation).

 Hm, I don't care much about the name, as long as it is clear what the
 warning is used for.  I am not really sure that Wvector-extensions
 makes it clear.  Also, I don't see anything bad in the warning
 popping up during vectorisation.  Any vector operation performed
 outside the SIMD accelerator looks suspicious, because it actually
 doesn't improve performance.  Such a warning during vectorisation
 could mean that a programmer forgot some flag, or that constant
 propagation failed to deliver a constant, or something else.

 Conceptually the text I am producing is not really a warning; it is
 more of an informational note, but I am not aware of a mechanism that
 would allow me to introduce a flag triggering inform () or something
 similar.

 What I think we really need to avoid is including this warning in the
 standard Ox.

 +  location_t loc = gimple_location (gsi_stmt (*gsi));
 +
 +  warning_at (loc, OPT_Wvector_operation_expanded,
 +             "vector operation will be expanded piecewise");

   v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta);
   for (i = 0; i < nunits;
 @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter
   tree result, compute_type;
   enum machine_mode mode;
   int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / 
 UNITS_PER_WORD;
 +  location_t loc = gimple_location (gsi_stmt (*gsi));
 +
 +  warning_at (loc, OPT_Wvector_operation_expanded,
 +             "vector operation will be expanded in parallel");

 what's the difference between 'piecewise' and 'in parallel'?

 Parallel is a little bit better for performance than piecewise.

 I see.  That difference should probably be documented, maybe with
 an example.

 Richard.

 @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter
  {
   int parts_per_word = UNITS_PER_WORD
                       / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE 
 (type)), 1);
 +  location_t loc = gimple_location (gsi_stmt (*gsi));

   if (INTEGRAL_TYPE_P (TREE_TYPE (type))
        && parts_per_word >= 4
        && TYPE_VECTOR_SUBPARTS (type) >= 4)
 -    return expand_vector_parallel (gsi, f_parallel,
 -                                  type, a, b, code);
 +    return expand_vector_parallel (gsi, f_parallel, type, a, b, code);
   else
 -    return expand_vector_piecewise (gsi, f,
 -                                   type, TREE_TYPE (type),
 -                                   a, b, code);
 +    return expand_vector_piecewise (gsi, f, type,
 +                                   TREE_TYPE (type), a, b, code);
  }

  /* Check if vector VEC consists of all the equal elements and

 Unless I'm missing something, loc is unused here.  Please avoid random
 whitespace changes (just review your patch yourself before posting
 and revert pieces that do nothing).

 Yes you are right, sorry.

 +@item -Wvector-operation-expanded
 +@opindex Wvector-operation-expanded
 +@opindex Wno-vector-operation-expanded
 +Warn if vector operation is not implemented via SIMD capabilities of 
 the
 +architecture. Mainly useful for the performance tuning.

 I'd mention that this is for vector operations as of the C extension
 documented in Vector Extensions.

 The vectorizer can produce some operations that will need further
 lowering - we probably should make sure to _not_ warn about those.
 Try running the vect.exp testsuite with the new warning turned on
 (eventually disabling SSE), like with

 obj/gcc make check-gcc
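
On the "piecewise" versus "in parallel" question raised above, a small
sketch may help.  It is an illustration only, not code taken from
tree-vect-generic.c.  It assumes four 8-bit lanes packed into one 32-bit
word; the helper names add_piecewise and add_word_parallel are made up
for this example.  The piecewise version performs one scalar addition
per lane, while the word-parallel version performs a single word-sized
addition and uses masks to keep carries from crossing lane boundaries.

```c
#include <assert.h>
#include <stdint.h>

/* Piecewise: extract each 8-bit lane, add it as a scalar, and
   reassemble the result (four separate additions).  */
static uint32_t
add_piecewise (uint32_t a, uint32_t b)
{
  uint32_t result = 0;
  for (int lane = 0; lane < 4; lane++)
    {
      uint32_t x = (a >> (8 * lane)) & 0xffu;
      uint32_t y = (b >> (8 * lane)) & 0xffu;
      result |= ((x + y) & 0xffu) << (8 * lane);
    }
  return result;
}

/* Word-parallel: add the low 7 bits of every lane with one word-sized
   addition, then patch in each lane's top bit with XOR so that no
   carry leaks into the neighbouring lane.  */
static uint32_t
add_word_parallel (uint32_t a, uint32_t b)
{
  const uint32_t low_bits = 0x7f7f7f7fu;
  const uint32_t high_bits = 0x80808080u;
  uint32_t sum = (a & low_bits) + (b & low_bits);
  return sum ^ ((a ^ b) & high_bits);
}
```

Both helpers compute the same lane-wise sum modulo 256; the second one
simply does it with fewer operations, which is the sense in which
"parallel" is a bit cheaper than "piecewise".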
 

Re: [C++-11] User defined literals

2011-10-11 Thread Jason Merrill

On 10/11/2011 12:55 PM, Jason Merrill wrote:

On 10/09/2011 07:19 PM, Ed Smith-Rowland wrote:

Does cp_parser_identifier (parser) *not* consume the identifier token?


I'm pretty sure it does.

Does it work to only complain if !cp_parser_parsing_tentatively?


I suppose not, if you got no complaints with cp_parser_error.

Jason



Re: [patches] several gcc.target/powerpc tests require hard_float

2011-10-11 Thread Joseph S. Myers
On Tue, 11 Oct 2011, Janis Johnson wrote:

 On 10/10/2011 06:23 PM, Joseph S. Myers wrote:
  On Mon, 10 Oct 2011, Janis Johnson wrote:
  
  This patch skips several Power-specific tests if hard_float support
  isn't available.  OK for trunk?
  
  It looks like these are testing for particular instructions using FPRs and 
  so powerpc_fprs is more appropriate; you don't want to match e500v2 hard 
  float.
 
 Is a patch using powerpc_fprs instead of hard_float OK for these tests?

OK in the absence of testsuite or target maintainer objections within 24 
hours.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: int_cst_hash_table mapping persistence and the garbage collector

2011-10-11 Thread Gary Funck
On 10/11/11 10:24:52, Richard Guenther wrote:
 GF: 1. Is it valid to assume that pointer equality is sufficient
 GF: to compare two integer constants for equality as long as they
 GF: have identical type and value?
 
 Yes, if both constants are live

The upc blocking factor hash table is declared as follows:

static GTY ((if_marked ("tree_map_marked_p"),
   param_is (struct tree_map)))
 htab_t upc_block_factor_for_type;
[...]
  upc_block_factor_for_type = htab_create_ggc (512, tree_map_hash,
   tree_map_eq, 0);

I had hoped that this would be sufficient to ensure that all
integer constant references recorded in this hash table would
be considered live by the GC.  Reading the code in tree_map_marked_p(),
however, I see the following:

#define tree_map_marked_p tree_map_base_marked_p
[...]
/* Return true if this tree map structure is marked for garbage collection
   purposes.  We simply return true if the from tree is marked, so that this
   structure goes away when the from tree goes away.  */

int
tree_map_base_marked_p (const void *p)
{ 
  return ggc_marked_p (((const struct tree_map_base *) p)->from);
}

This takes care of recycling an entry when the '->from' reference
goes away, but it doesn't make sure that the '->to' reference is
considered live.  I don't understand the GC well enough to
know when/where the '->to' entry should be marked as live.

(note: in the cited test case, the ->from pointers in question
are known to be live and did survive garbage collection.)

Given that the declaration above tells the GC that the nodes
in the blocking factor hash table are of type 'struct tree_map',

struct GTY(()) tree_map_base {
  tree from;
};


/* Map from a tree to another tree.  */

struct GTY(()) tree_map {
  struct tree_map_base base;
  unsigned int hash;
  tree to;
};

I thought that the GC would mark the ->to nodes as
live automatically?  (note: probably the only direct
reference to the integer constant that is the focus of
this discussion is in the upc_block_factor_for_type hash table.
Therefore, if it isn't seen as live there, it won't be seen
as live anywhere else.)

I suppose that I could declare a linear tree list of mapped integer
constants and let the GC walk that, but that is more of a hack than a solution.

- Gary
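
The question above can be made concrete with a toy model.  This is a
sketch under stated assumptions, not GCC's ggc implementation: struct
obj, struct entry, entry_marked_p and sweep_table are hypothetical
names, and the model only mirrors the shape of struct tree_map and of
the if_marked predicate quoted above.  It shows that a predicate which
looks only at the key says nothing about the value: unless something
else marks 'to' (the mark_values flag here), a surviving entry can be
left pointing at an unmarked object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object with a mark bit; stands in for a GC-managed tree node.  */
struct obj
{
  bool marked;
};

/* Toy map entry, mirroring the from/to shape of struct tree_map.  */
struct entry
{
  struct obj *from;
  struct obj *to;
  bool in_use;
};

/* Same idea as tree_map_base_marked_p: the entry survives if and only
   if its key is marked.  The value is not consulted.  */
static bool
entry_marked_p (const struct entry *e)
{
  return e->from->marked;
}

/* Sweep the weak table.  Entries whose key is unmarked are dropped.
   If MARK_VALUES is true, the value of every surviving entry is marked
   as well.  Returns the number of surviving entries whose value is
   still unmarked, i.e. entries that would point at a collected object
   in this toy model.  */
static size_t
sweep_table (struct entry *tab, size_t n, bool mark_values)
{
  size_t dangling = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (!tab[i].in_use)
        continue;
      if (!entry_marked_p (&tab[i]))
        {
          tab[i].in_use = false;
          continue;
        }
      if (mark_values)
        tab[i].to->marked = true;
      if (!tab[i].to->marked)
        dangling++;
    }
  return dangling;
}
```

Whether and when the real collector marks the value of a surviving
entry is exactly what the message asks; the model takes no position on
that and only separates the two cases.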


PATCH: Remove the extra break

2011-10-11 Thread H.J. Lu
Hi,

I checked in this patch to remove the extra break.

H.J.
---
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 179810)
+++ config/i386/i386.c  (working copy)
@@ -28096,7 +28096,6 @@ ix86_expand_special_args_builtin (const 
   klass = store;
   memory = 0;
   break;
-  break;
 case UINT64_FTYPE_VOID:
 case UNSIGNED_FTYPE_VOID:
   nargs = 0;
Index: ChangeLog
===
--- ChangeLog   (revision 179810)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2011-10-11  H.J. Lu  hongjiu...@intel.com
+
+   * config/i386/i386.c (ix86_expand_special_args_builtin): Remove
+   the extra break.
+
 2011-10-11  Artjoms Sinkarovs  artyom.shinkar...@gmail.com
 
* doc/invoke.texi: Document new warning.


Re: fix for c++/44473, mangling of decimal types, checked in

2011-10-11 Thread Peter Bergner
On Tue, 2011-10-11 at 12:12 -0400, Jason Merrill wrote:
 On 10/11/2011 11:34 AM, Peter Bergner wrote:
  On Fri, 2011-09-30 at 10:37 -0700, Janis Johnson wrote:
  Patch http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00625.html was
  approved by Jason last December but I never got around to checking
  it in.  Paolo Carlini said in PR44473 that it was already approved
  and doesn't need a new approval, so I checked it in after a
  bootstrap and regtest of c,c++ for i686-pc-linux-gnu.
 
  Jason, I assume your approval for committing Janis' patch for the
  4.5 branch still stands.  Can we also commit her fix to the 4.6
  branch, since it was created after your approval, but before Janis
  committed the mainline patch?  I have verified that Janis' patch
  bootstraps and regtests with no regressions on both the 4.5 and
  4.6 branches.
 
 Yes.

Ok, it has been committed to both the FSF 4.6 and 4.5 branches now.
Thanks!

Peter





C++ PATCH for c++/49855, c++/49896 (ICE with named constants in templates)

2011-10-11 Thread Jason Merrill
The problem in both of these PRs is that G++ has assumed that we don't 
ever need to actually perform non-dependent conversions in a template; 
once we know that the conversion can be performed, we just generated a 
NOP_EXPR to change the type of the expression.  But this isn't good 
enough for initializers for variables that might be used in constant 
expressions, since we want to reduce them to constants if possible.


For conversions between scalar types, we can just go ahead and perform 
the conversion; this is a bit of a boundary violation, but should be 
harmless since such conversions are represented by a single tree code 
that we can just pass along at tsubst time.  This is also necessary to 
avoid breaking checking of non-dependent OpenMP for loops in 
c_finish_omp_for.


For C++11 constexpr conversions, it's more involved, so I've introduced 
a new tree code IMPLICIT_CONV_EXPR.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 6800169668342f9009a1001bddff8b1400245836
Author: Jason Merrill ja...@redhat.com
Date:   Sun Oct 9 23:27:52 2011 +0100

	PR c++/49855
	PR c++/49896
	* cp-tree.def (IMPLICIT_CONV_EXPR): New.
	* call.c (perform_implicit_conversion_flags): Build it
	instead of NOP_EXPR.
	* cp-objcp-common.c (cp_common_init_ts): It's typed.
	* cxx-pretty-print.c (pp_cxx_cast_expression): Handle it.
	(pp_cxx_expression): Likewise.
	* error.c (dump_expr): Likewise.
	* semantics.c (potential_constant_expression_1): Likewise.
	* tree.c (cp_tree_equal): Likewise.
	(cp_walk_subtrees): Likewise.
	* pt.c (iterative_hash_template_arg): Likewise.
	(for_each_template_parm_r): Likewise.
	(type_dependent_expression_p): Likewise.
	(tsubst_copy, tsubst_copy_and_build): Handle IMPLICIT_CONV_EXPR
	and CONVERT_EXPR.
	* cp-tree.h (IMPLICIT_CONV_EXPR_DIRECT_INIT): New.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 4c03e76..7219afe 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8397,13 +8397,19 @@ perform_implicit_conversion_flags (tree type, tree expr, tsubst_flags_t complain
 	}
   expr = error_mark_node;
 }
-  else if (processing_template_decl)
+  else if (processing_template_decl
+	   /* As a kludge, we always perform conversions between scalar
+	  types, as IMPLICIT_CONV_EXPR confuses c_finish_omp_for.  */
+	   && !(SCALAR_TYPE_P (type) && SCALAR_TYPE_P (TREE_TYPE (expr))))
 {
   /* In a template, we are only concerned about determining the
 	 type of non-dependent expressions, so we do not have to
-	 perform the actual conversion.  */
-  if (TREE_TYPE (expr) != type)
-	expr = build_nop (type, expr);
+	 perform the actual conversion.  But for initializers, we
+	 need to be able to perform it at instantiation
+	 (or fold_non_dependent_expr) time.  */
+  expr = build1 (IMPLICIT_CONV_EXPR, type, expr);
+  if (!(flags & LOOKUP_ONLYCONVERTING))
+	IMPLICIT_CONV_EXPR_DIRECT_INIT (expr) = true;
 }
   else
 expr = convert_like (conv, expr, complain);
diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index 1866b81..035fdcd 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -267,6 +267,7 @@ cp_common_init_ts (void)
   MARK_TS_TYPED (CONST_CAST_EXPR);
   MARK_TS_TYPED (STATIC_CAST_EXPR);
   MARK_TS_TYPED (DYNAMIC_CAST_EXPR);
+  MARK_TS_TYPED (IMPLICIT_CONV_EXPR);
   MARK_TS_TYPED (TEMPLATE_ID_EXPR);
   MARK_TS_TYPED (ARROW_EXPR);
   MARK_TS_TYPED (SIZEOF_EXPR);
diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index bb1b753..be29870 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -250,6 +250,7 @@ DEFTREECODE (REINTERPRET_CAST_EXPR, "reinterpret_cast_expr", tcc_unary, 1)
 DEFTREECODE (CONST_CAST_EXPR, "const_cast_expr", tcc_unary, 1)
 DEFTREECODE (STATIC_CAST_EXPR, "static_cast_expr", tcc_unary, 1)
 DEFTREECODE (DYNAMIC_CAST_EXPR, "dynamic_cast_expr", tcc_unary, 1)
+DEFTREECODE (IMPLICIT_CONV_EXPR, "implicit_conv_expr", tcc_unary, 1)
 DEFTREECODE (DOTSTAR_EXPR, "dotstar_expr", tcc_expression, 2)
 DEFTREECODE (TYPEID_EXPR, "typeid_expr", tcc_expression, 1)
 DEFTREECODE (NOEXCEPT_EXPR, "noexcept_expr", tcc_unary, 1)
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index f824f38..b53accf 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -72,6 +72,7 @@ c-common.h, not after.
   DECLTYPE_FOR_LAMBDA_CAPTURE (in DECLTYPE_TYPE)
   VEC_INIT_EXPR_IS_CONSTEXPR (in VEC_INIT_EXPR)
   DECL_OVERRIDE_P (in FUNCTION_DECL)
+  IMPLICIT_CONV_EXPR_DIRECT_INIT (in IMPLICIT_CONV_EXPR)
1: IDENTIFIER_VIRTUAL_P (in IDENTIFIER_NODE)
   TI_PENDING_TEMPLATE_FLAG.
   TEMPLATE_PARMS_FOR_INLINE.
@@ -3233,6 +3234,11 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter)
B b{1,2}, not B b({1,2}) or B b = {1,2}.  */
 #define CONSTRUCTOR_IS_DIRECT_INIT(NODE) (TREE_LANG_FLAG_0 (CONSTRUCTOR_CHECK (NODE)))
 
+/* True if NODE represents a conversion for direct-initialization in a
+   template.  Set by perform_implicit_conversion_flags.  */
+#define 

[PATCH] [Annotalysis] Bugfix where lock function is attached to a base class.

2011-10-11 Thread Delesley Hutchins
This patch fixes an error where Annotalysis generates bogus warnings
when using lock and unlock functions that are attached to a base class.
The canonicalize routine did not work correctly in this case.

Bootstrapped and passed gcc regression testsuite on
x86_64-unknown-linux-gnu.  Okay for google/gcc-4_6?

  -DeLesley

Changelog.google-4_6:
2011-10-11  DeLesley Hutchins  deles...@google.com

   * tree-threadsafe-analyze.c (get_canonical_lock_expr)

testsuite/Changelog.google-4_6:
2011-10-11  DeLesley Hutchins  deles...@google.com

   * g++.dg/thread-ann/thread_annot_lock-83.C


Index: gcc/testsuite/g++.dg/thread-ann/thread_annot_lock-83.C
===
--- gcc/testsuite/g++.dg/thread-ann/thread_annot_lock-83.C  (revision 0)
+++ gcc/testsuite/g++.dg/thread-ann/thread_annot_lock-83.C  (revision 0)
@@ -1,5 +1,8 @@
-// Regression test for bugfix, where shared locks are not properly
-// removed from locksets if a universal lock is present.
+// Regression test for two bugfixes.
+// Bugfix 1:  Shared locks are not properly removed from locksets
+// if a universal lock is present.
+// Bugfix 2:  Canonicalization does not properly store the lock in
+// the hash table if the lock function is attached to a base class.
 // { dg-do compile }
 // { dg-options "-Wthread-safety" }

@@ -7,6 +10,7 @@

 class Foo;

+/* Bugfix 1 */
 class Bar {
 public:
   Foo*  foo;
@@ -29,3 +33,23 @@ void Bar::bar() {
   ReaderMutexLock rlock(mu_);
 }

+
+/* Bugfix 2 */
+class LOCKABLE Base {
+public:
+  Mutex mu_;
+
+  void Lock()   EXCLUSIVE_LOCK_FUNCTION()   { mu_.Lock();   }
+  void Unlock() UNLOCK_FUNCTION()   { mu_.Unlock(); }
+};
+
+class Derived : public Base {
+public:
+  int b;
+};
+
+void doSomething(Derived *d) {
+  d->Lock();
+  d->Unlock();
+};
+
Index: gcc/tree-threadsafe-analyze.c
===
--- gcc/tree-threadsafe-analyze.c   (revision 179771)
+++ gcc/tree-threadsafe-analyze.c   (working copy)
@@ -927,7 +927,16 @@ get_canonical_lock_expr (tree lock, tree base_obj,
 NULL_TREE);

   if (lang_hooks.decl_is_base_field (component))
-return canon_base;
+{
+  if (is_temp_expr)
+return canon_base;
+  else
+/* return canon_base, but recalculate it so that it is stored
+   in the hash table. */
+return get_canonical_lock_expr (base, base_obj,
+false /* is_temp_expr */,
+new_leftmost_base_var);
+}

   if (base != canon_base)
 lock = build3 (COMPONENT_REF, TREE_TYPE (component),

-- 
DeLesley Hutchins | Software Engineer | deles...@google.com | 505-206-0315


Re: [testsuite] modify powerpc test for hard_float target, skip powerpc/warn-[12].c for soft-float

2011-10-11 Thread Janis Johnson
On 10/10/2011 01:19 PM, Janis Johnson wrote:
 Tests gcc.target/powerpc/warn-[12].c fail for soft-float multilibs with
 the unexpected warning -mvsx requires hardware floating point [enabled
 by default].  This patch skips those tests for soft-float multilibs and
 modifies the powerpc check for a soft-float effective target to return
 true for either __NO_FPRS__ or _SOFT_FLOAT being defined.
 
 Is this OK for trunk?  I must admit that I'm not sure what all those
 Power float variants are for.

On second thought these tests should use

  /* { dg-require-effective-target powerpc_vsx_ok } */

instead of requiring hard_float, and forget about the proposed
change to effective target hard_float.  Is that OK?

Janis


Re: [google] record compiler options to .note sections

2011-10-11 Thread Cary Coutant
 How about .gnu.switches.text.quote_paths?

Sounds good to me.

-cary


Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.

2011-10-11 Thread Eric Botcazou
   * gcc.target/sparc/sparc.exp: Add vis3 target test.

This doesn't work.  The code always compiles:

(botcazou@ob) /nile.build/botcazou/gcc-head/sparc-sun-solaris2.10 $ 
gcc/xgcc -Bgcc -c -o vis.o vis.c
(botcazou@ob) /nile.build/botcazou/gcc-head/sparc-sun-solaris2.10 $ objdump -d 
vis.o

vis.o: file format elf32-sparc

Disassembly of section .text:

 _vis3_fpadd64:
   0:   9d e3 bf 90 save  %sp, -112, %sp
   4:   f0 3f bf f8 std  %i0, [ %fp + -8 ]
   8:   f4 3f bf f0 std  %i2, [ %fp + -16 ]
   c:   d0 1f bf f8 ldd  [ %fp + -8 ], %o0
  10:   d4 1f bf f0 ldd  [ %fp + -16 ], %o2
  14:   40 00 00 00 call  14 _vis3_fpadd64+0x14
  18:   01 00 00 00 nop
  1c:   82 10 00 08 mov  %o0, %g1
  20:   ba 10 00 01 mov  %g1, %i5
  24:   83 38 60 1f sra  %g1, 0x1f, %g1
  28:   b8 10 00 01 mov  %g1, %i4
  2c:   84 10 00 1c mov  %i4, %g2
  30:   86 10 00 1d mov  %i5, %g3
  34:   b0 10 00 02 mov  %g2, %i0
  38:   b2 10 00 03 mov  %g3, %i1
  3c:   81 cf e0 08 rett  %i7 + 8
  40:   01 00 00 00 nop


-- 
Eric Botcazou


[Patch,AVR]: Fix PR49939: Skip 2-word insns

2011-10-11 Thread Georg-Johann Lay
This patch teaches avr-gcc to skip 2-word instructions like STS and LDS.

It's just a matter of looking into an insn of length 2 and checking
whether it is a single 2-word instruction or not.

Passes without regression. Ok to install?

Johann

PR target/49939
* config/avr/avr.md (*movqi): Rename to movqi_insn.
(*call_insn): Rename to call_insn.
(*call_value_insn): Rename to call_value_insn.
* config/avr/avr.c (avr_2word_insn_p): New static function.
(jump_over_one_insn_p): Use it.
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 179765)
+++ config/avr/avr.md	(working copy)
@@ -295,7 +295,7 @@ (define_expand "movqi"
operands[1] = copy_to_mode_reg(QImode, operand1);
   ")
 
-(define_insn "*movqi"
+(define_insn "movqi_insn"
   [(set (match_operand:QI 0 "nonimmediate_operand" "=r,d,Qm,r,q,r,*r")
 	(match_operand:QI 1 "general_operand"       "rL,i,rL,Qm,r,q,i"))]
   "(register_operand (operands[0],QImode)
@@ -3628,7 +3628,7 @@ (define_expand "sibcall_value"
   ""
   "")
 
-(define_insn "*call_insn"
+(define_insn "call_insn"
   [(parallel[(call (mem:HI (match_operand:HI 0 "nonmemory_operand" "z,s,z,s"))
(match_operand:HI 1 "general_operand"   "X,X,X,X"))
  (use (match_operand:HI 2 "const_int_operand"  "L,L,P,P"))])]
@@ -3651,7 +3651,7 @@ (define_insn "*call_insn"
 (const_int 2)
 (const_int 1))])])
 
-(define_insn "*call_value_insn"
+(define_insn "call_value_insn"
   [(parallel[(set (match_operand 0 "register_operand"   "=r,r,r,r")
   (call (mem:HI (match_operand:HI 1 "nonmemory_operand"  "z,s,z,s"))
 (match_operand:HI 2 "general_operand"    "X,X,X,X")))
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 179765)
+++ config/avr/avr.c	(working copy)
@@ -7123,6 +7123,56 @@ test_hard_reg_class (enum reg_class rcla
 }
 
 
+/* Helper for jump_over_one_insn_p:  Test if INSN is a 2-word instruction
+   and thus is suitable to be skipped by CPSE, SBRC, etc.  */
+
+static bool
+avr_2word_insn_p (rtx insn)
+{
+  if (avr_current_device->errata_skip
+  || !insn
+  || !INSN_P (insn)
+  || 2 != get_attr_length (insn))
+{
+  return false;
+}
+
+  switch (INSN_CODE (insn))
+{
+default:
+  break;
+  
+case CODE_FOR_movqi_insn:
+  {
+rtx set  = single_set (insn);
+rtx src  = SET_SRC (set);
+rtx dest = SET_DEST (set);
+
+/* Factor out LDS and STS from movqi_insn.  */
+
+if (MEM_P (dest)
+ && (REG_P (src) || src == const0_rtx))
+  {
+return CONSTANT_ADDRESS_P (XEXP (dest, 0));
+  }
+else if (REG_P (dest)
+  && MEM_P (src))
+  {
+return CONSTANT_ADDRESS_P (XEXP (src, 0));
+  }
+
+break;
+  }
+
+case CODE_FOR_call_insn:
+case CODE_FOR_call_value_insn:
+  return true;
+}
+
+  return false;
+}
+
+
 int
 jump_over_one_insn_p (rtx insn, rtx dest)
 {
@@ -7131,7 +7181,11 @@ jump_over_one_insn_p (rtx insn, rtx dest
 		  : dest);
   int jump_addr = INSN_ADDRESSES (INSN_UID (insn));
   int dest_addr = INSN_ADDRESSES (uid);
-  return dest_addr - jump_addr == get_attr_length (insn) + 1;
+  int jump_offset = dest_addr - jump_addr - get_attr_length (insn);
+  
+  return (jump_offset == 1
+  || (jump_offset == 2
+  && avr_2word_insn_p (next_nonnote_nondebug_insn (insn))));
 }
 
 /* Returns 1 if a value of mode MODE can be stored starting with hard
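
The new test in jump_over_one_insn_p boils down to a small piece of
arithmetic.  The sketch below restates it with plain integers instead of
RTL; the function name can_skip_p and its parameters are illustrative
only and do not exist in avr.c.  Addresses are assumed to be in the same
units as the insn length attribute used by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* JUMP_ADDR and DEST_ADDR are the addresses of the jump and of its
   target, JUMP_LEN is the length of the jump insn, and NEXT_IS_2WORD
   says whether the insn following the jump is one 2-word instruction
   (the role avr_2word_insn_p plays in the patch).  */
static bool
can_skip_p (int jump_addr, int dest_addr, int jump_len, bool next_is_2word)
{
  int jump_offset = dest_addr - jump_addr - jump_len;

  /* Either exactly one 1-word insn is jumped over, or exactly two
     words that form a single instruction.  */
  return jump_offset == 1 || (jump_offset == 2 && next_is_2word);
}
```

Before the patch only the jump_offset == 1 case was accepted, so a jump
over a lone LDS, STS or call could not be turned into a skip.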


Re: [C++-11] User defined literals

2011-10-11 Thread Jason Merrill

On 10/09/2011 07:19 PM, Ed Smith-Rowland wrote:

Does cp_parser_identifier (parser) *not* consume the identifier token?


I'm pretty sure it does.

Does it work to only complain if !cp_parser_parsing_tentatively?

Jason



Re: C++ PATCH for c++/49855, c++/49896 (ICE with named constants in templates)

2011-10-11 Thread Jason Merrill

On 10/11/2011 02:19 PM, Jason Merrill wrote:

For the 4.6 branch I'm only making the change for scalars.

Tested x86_64-pc-linux-gnu, applying to trunk.


Er, to 4.6.


Re: C++ PATCH for c++/49855, c++/49896 (ICE with named constants in templates)

2011-10-11 Thread Jason Merrill

For the 4.6 branch I'm only making the change for scalars.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit d8978a333ab71a4ad2c38446764c1b37092ea098
Author: Jason Merrill ja...@redhat.com
Date:   Mon Oct 3 17:06:02 2011 -0400

	PR c++/49855
	PR c++/49896
	* call.c (perform_implicit_conversion_flags): Do perform
	scalar conversions in templates.
	* pt.c (tsubst_copy, tsubst_copy_and_build): Handle CONVERT_EXPR.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 0ec0a07..c54ce7b 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8068,7 +8068,8 @@ perform_implicit_conversion_flags (tree type, tree expr, tsubst_flags_t complain
 	}
   expr = error_mark_node;
 }
-  else if (processing_template_decl)
+  else if (processing_template_decl
+	   && !(SCALAR_TYPE_P (type) && SCALAR_TYPE_P (TREE_TYPE (expr))))
 {
   /* In a template, we are only concerned about determining the
 	 type of non-dependent expressions, so we do not have to
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 2ca1ce4..9a48bb4 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11486,6 +11486,7 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl)
 case STATIC_CAST_EXPR:
 case DYNAMIC_CAST_EXPR:
 case NOP_EXPR:
+case CONVERT_EXPR:
   return build1
 	(code, tsubst (TREE_TYPE (t), args, complain, in_decl),
 	 tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl));
@@ -12637,6 +12638,12 @@ tsubst_copy_and_build (tree t,
 	(tsubst (TREE_TYPE (t), args, complain, in_decl),
 	 RECUR (TREE_OPERAND (t, 0)));
 
+case CONVERT_EXPR:
+  return build1
+	(CONVERT_EXPR,
+	 tsubst (TREE_TYPE (t), args, complain, in_decl),
+	 RECUR (TREE_OPERAND (t, 0)));
+
 case CAST_EXPR:
 case REINTERPRET_CAST_EXPR:
 case CONST_CAST_EXPR:
diff --git a/gcc/testsuite/g++.dg/template/constant1.C b/gcc/testsuite/g++.dg/template/constant1.C
new file mode 100644
index 000..a2c5a08
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/constant1.C
@@ -0,0 +1,13 @@
+// PR c++/49855
+
+extern void foo(int);
+
+template <class Key, class Value> void Basic() {
+  const int kT = 1.5e6;// <--- causes ICE
+  int size = kT*2/3;
+  do {
+foo(size);
+size = size * 0.5 - 1;
+  } while (size >= 0 );
+
+}
diff --git a/gcc/testsuite/g++.dg/template/constant2.C b/gcc/testsuite/g++.dg/template/constant2.C
new file mode 100644
index 000..f71e4f5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/constant2.C
@@ -0,0 +1,22 @@
+// PR c++/49896
+
+template<class C>
+class test {
+ protected:
+  static const int versionConst = 0x8000;
+  enum { versionEnum = versionConst };
+ public:
+  int getVersion();
+};
+
+template<class C>
+int test<C>::getVersion() {
+  return versionEnum;
+}
+
+class dummy_class {};
+
+int main() {
+  test<dummy_class> t;
+  return t.getVersion();
+}


Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.

2011-10-11 Thread David Miller
From: Eric Botcazou ebotca...@adacore.com
Date: Tue, 11 Oct 2011 20:10:34 +0200

  * gcc.target/sparc/sparc.exp: Add vis3 target test.
 
 This doesn't work.  The code always compiles:

What does gcc -mcpu=niagara3 -mvis give to you for the following
source file:

long long
_vis3_fpadd64 (long long __X, long long __Y)
{
return __builtin_vis_fpadd64 (__X, __Y);
}

That's what the sparc.exp test is using.  I would expect that to spit
out a warning.  Do I need to explicitly add -Wall, -Wno-implicit or
similar?  Similar tests in i386.exp don't seem to need this and that
was what I used as my template.




Re: C++ PATCH for c++/49216 (problems with new T[1]{})

2011-10-11 Thread Jason Merrill
This ICE is still a regression in 4.6, so I'm checking in this patch to 
fix it.  Note that after this patch, the code generated for new T[1]{} 
is still wrong, but that isn't a regression.  The value-initialization 
semantics are fixed in 4.7.


Tested x86_64-pc-linux-gnu.
commit 25a2d664e6a3787cc68e3c41beb12330469ee4a5
Author: Jason Merrill ja...@redhat.com
Date:   Tue Oct 11 15:22:55 2011 -0400

	PR c++/49216
	* init.c (build_vec_init): Avoid crash on new int[1]{}.

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 9440c1a..c4bd635 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -3067,8 +3067,9 @@ build_vec_init (tree base, tree maxindex, tree init,
   unsigned HOST_WIDE_INT idx;
   tree field, elt;
   /* Should we try to create a constant initializer?  */
-  bool try_const = (literal_type_p (inner_elt_type)
-			|| TYPE_HAS_CONSTEXPR_CTOR (inner_elt_type));
+  bool try_const = (TREE_CODE (atype) == ARRAY_TYPE
+			&& (literal_type_p (inner_elt_type)
+			|| TYPE_HAS_CONSTEXPR_CTOR (inner_elt_type)));
   bool saw_non_const = false;
   bool saw_const = false;
   /* If we're initializing a static array, we want to do static
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-49216.C b/gcc/testsuite/g++.dg/cpp0x/initlist-49216.C
new file mode 100644
index 000..4bf6082
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist-49216.C
@@ -0,0 +1,6 @@
+// PR c++/49216
+// { dg-options "-std=c++0x" }
+
+int main() {
+  new int[1]{};
+}


Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.

2011-10-11 Thread Eric Botcazou
 What does gcc -mcpu=niagara3 -mvis give to you for the following
 source file:

   long long
   _vis3_fpadd64 (long long __X, long long __Y)
   {
 return __builtin_vis_fpadd64 (__X, __Y);
   }

Nothing at all, with or without the options.

 That's what the sparc.exp test is using.

And that's what I tested of course.

 I would expect that to spit out a warning.  Do I need to explicitly
 add -Wall, -Wno-implicit or similar?  Similar tests in i386.exp don't
 seem to need this and that was what I used as my template.

-Wall does yield a warning:

vis.c: In function '_vis3_fpadd64':
vis.c:4:3: warning: implicit declaration of function '__builtin_vis_fpadd64' 
[-Wimplicit-function-declaration]

-- 
Eric Botcazou


Re: [testsuite] modify powerpc test for hard_float target, skip powerpc/warn-[12].c for soft-float

2011-10-11 Thread Mike Stump
On Oct 11, 2011, at 11:08 AM, Janis Johnson wrote:
 On 10/10/2011 01:19 PM, Janis Johnson wrote:
 Tests gcc.target/powerpc/warn-[12].c fail for soft-float multilibs with
 the unexpected warning -mvsx requires hardware floating point [enabled
 by default].  This patch skips those tests for soft-float multilibs and
 modifies the powerpc check for a soft-float effective target to return
 true for either __NO_FPRS__ or _SOFT_FLOAT being defined.
 
 Is this OK for trunk?  I must admit that I'm not sure what all those
 Power float variants are for.
 
 On second thought these tests should use
 
  /* { dg-require-effective-target powerpc_vsx_ok } */
 
 instead of requiring hard_float, and forget about the proposed
 change to effective target hard_float.  Is that OK?

I was hoping a ppc person would chime in...  Ok.


[pph] Make libcpp symbol validation a warning (issue5235061)

2011-10-11 Thread Diego Novillo

Currently, the consistency check done on pre-processor symbols is
triggering on symbols that are not really problematic (e.g., symbols
used for double-include guards).

The problem is that in the testsuite, we are refusing to process PPH
images that fail that test, which means we don't get to test other
issues.  To avoid this, I changed the error() call to warning().  Seemed
innocent enough, but there were more problems behind that one:

1- We do not really try to avoid reading PPH images more than once.
   This problem is different than the usual double-inclusion guard.
   For instance, suppose a file foo.pph includes 1.pph, 2.pph and
   3.pph.  When generating foo.pph, we read all 3 files just once and
   double-include guards do not need to trigger.  However, if we are
   later building a TU with:
#include "2.pph"
#include "foo.pph"
   we first read 2.pph and when reading foo.pph, we try to read 2.pph
   again, because it is mentioned in foo.pph's line map table.

   I added a guard in pph_stream_open() so it doesn't try to open the
   same file more than once, but that meant adjusting some of the
   assertions while reading the line table.  We should not expect to
   find foo.pph's line map table exactly like the one we wrote.
   
2- We cannot keep a global list of included files to consult external
   caches.  In the example above, if foo.pph needs to resolve an
   external reference to 2.pph, it needs to index into the 2nd slot in
   its include vector.  However, the file including foo.pph needs to
   index into the 1st slot in its include vector (since 2.pph is
   included first).  This meant moving the includes field inside
   struct pph_stream.

3- When reading a type, we should not try to access the method vector
   for it, unless the type is a class.
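
Point 2 above can be illustrated with a minimal sketch.  It assumes a
simplified, hypothetical `struct stream` in place of the real
`struct pph_stream`; `add_include` and `include_slot` are illustrative
names only and are not the PPH functions.  Each stream owns its include
list, so the same image can sit in different slots depending on who
includes it:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INCLUDES 8

/* Hypothetical, simplified stand-in for struct pph_stream: each stream
   owns the list of images it includes, in inclusion order.  */
struct stream
{
  const char *name;
  const char *includes[MAX_INCLUDES];
  size_t num_includes;
};

/* Append NAME to STREAM's own include vector (silently ignored when
   the fixed-size vector of this sketch is full).  */
void
add_include (struct stream *stream, const char *name)
{
  assert (stream != NULL && name != NULL);
  if (stream->num_includes < MAX_INCLUDES)
    stream->includes[stream->num_includes++] = name;
}

/* Return the 0-based slot of NAME in STREAM's include vector, or -1
   if STREAM does not include NAME.  */
int
include_slot (const struct stream *stream, const char *name)
{
  assert (stream != NULL && name != NULL);
  for (size_t i = 0; i < stream->num_includes; i++)
    if (strcmp (stream->includes[i], name) == 0)
      return (int) i;
  return -1;
}
```

With foo.pph including 1.pph, 2.pph and 3.pph, and a TU including 2.pph
and then foo.pph, the lookup for 2.pph yields a different slot in each
stream, which is why a single global list cannot serve both.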

There are some more consequences of this, but the patch was starting to
become too big, so I'm submitting it now.  This fixes a couple of
files and changes the expected error on another two.  I'll be fixing
them separately.

Tested on x86_64.  Committed to branch.


Diego.


* pph-streamer-in.c (pph_reading_includes): Remove.  Update all users.
(pph_in_include): Call pph_add_include with the newly
materialized stream.
(pph_in_line_table_and_includes): Document differences between
non-PPH compiles and PPH compiles wrt line table behaviour.
Modify assertions accordingly.
(pph_read_tree_header): Tidy.
(report_validation_error): Change error() call to warning().
(pph_image_already_read): Remove.  Update all users.
(pph_read_file_1): If STREAM->IN_MEMORY_P is set, return.
Call pph_mark_stream_read.
(pph_add_read_image): Remove.
(pph_read_file): Change return type to pph_stream *.  Update all
users.
(pph_reader_finish): Remove.
* pph-streamer-out.c (pph_writer_init): Tidy.
(pph_add_include): Remove.
(pph_get_marker_for): Always consult the pre-loaded cache first.
(pph_writer_add_include): New.
* pph-streamer.c (pph_read_images): Make static.
(pph_init_preloaded_cache): Make static.
(pph_streamer_init): New.
(pph_streamer_finish): New.
(pph_find_stream_for): New.
(pph_mark_stream_read): New.
(pph_stream_open): Call pph_find_stream_for.  If the stream
already exists, return it.
(pph_add_include): Move from pph-streamer-in.c.  Add new
argument STREAM.
(pph_cache_lookup_in_includes): Add new argument STREAM.
Update all users.
* pph-streamer.h (pph_read_images): Remove extern declaration.
Move field INCLUDES out of union W.  Update all users.
Add field IN_MEMORY_P.
(pph_streamer_init): Declare.
(pph_streamer_finish): Declare.
(pph_mark_stream_read): Declare.
(pph_add_include): Declare.
(pph_writer_add_include): Declare.
* pph.c (pph_include_handler): Call pph_writer_add_include.
(pph_init): Call pph_streamer_init.
(pph_finish): Call pph_streamer_finish.

testsuite/ChangeLog.pph

* g++.dg/pph/d1symnotinc.cc: Change expected error.
* g++.dg/pph/x7dynarray6.cc: Likewise.
* g++.dg/pph/x7dynarray7.cc: Likewise.
* g++.dg/pph/x5dynarray7.h: Mark fixed.
* g++.dg/pph/x6dynarray6.h: Mark fixed.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index fbd78d0..ffa1433 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -47,12 +47,6 @@ DEF_VEC_ALLOC_P(char_p,heap);
   memory will remain allocated until the end of compilation.  */
 static VEC(char_p,heap) *string_tables = NULL;
 
-/* Increment when we are in the process of reading includes as we do not want
-   to add those to the parent pph stream's list of includes to be written out.
-   Decrement when done. We cannot use a simple true/false flag as read includes
-   will call pph_in_includes as well.  */

Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-11 Thread Ian Lance Taylor
On Tue, Oct 11, 2011 at 4:40 AM, Richard Guenther rguent...@suse.de wrote:

 this function misses to transfer TREE_THIS_NOTRAP which is supposed
 to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
 If you make the function generic please adjust it to at least do ...
 ...

          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);

This line was indeed added to the patch as committed.  This appears to have
broken the build of libgo.  I now get this:

../../../gccgo3/libgo/go/image/png/writer.go: In function
‘png.writeIDATs.pN23_libgo_image.png.encoder’:
../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
marked for throw, but doesn’t
# .MEM_775 = VDEF .MEM_774
MEM[base: D.8326_1070, offset: 0B] = VIEW_CONVERT_EXPR<struct
{
  uint8 * __values;
  int __count;
  int __capacity;
}>(GOTMP.495);

../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
marked for throw, but doesn’t
# .MEM_776 = VDEF .MEM_775
D.7574 = MEM[base: D.8325_1069, offset: 0B];

../../../gccgo3/libgo/go/image/png/writer.go:403:1: internal compiler
error: verify_gimple failed
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.


I have not yet done a full investigation, but it appears that this
function is now marking a newly created reference as TREE_THIS_NOTRAP,
which it did not previously do.  The new instruction is within an
exception region, and the tree-cfg checker insists that instructions in
an exception region are permitted to trap.  It may be that the ivopts
pass now requires TODO_cleanup_cfg, or it may be something more
complicated.

You should be able to recreate the problem yourself by using
--enable-languages=go when you run configure.

Ian


Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.

2011-10-11 Thread Eric Botcazou
 Cool, Eric could you quickly test the following?  This still leaves
 the i386.exp case issue open, it stands to reason that something like
 -Wall is needed for those tests too.

I think that we should go the i386 way.  This works on i386 because the 
builtins are always available (when you pass the right options) and the 
assembler rejects the unknown instructions.  So in config/sparc/sparc.h:

#ifndef HAVE_AS_FMAF_HPC_VIS3
#define AS_NIAGARA3_FLAG "b"
#undef TARGET_FMAF
#define TARGET_FMAF 0
#undef TARGET_VIS3
#define TARGET_VIS3 0
#else
#define AS_NIAGARA3_FLAG "d"
#endif

we shouldn't force TARGET_FMAF and TARGET_VIS3 to 0.  The configure test would 
only be used to compute default options.

-- 
Eric Botcazou


Go patch committed: Correct ChangeLog and spacing

2011-10-11 Thread Ian Lance Taylor
I committed this patch to mainline to remove an incorrect ChangeLog
entry (the gofrontend directory lives elsewhere; gcc/go/ChangeLog only
applies to the files in gcc/go outside of gcc/go/gofrontend) and to fix
spacing in a Go frontend file.

Ian

Index: gofrontend/gogo-tree.cc
===
--- gofrontend/gogo-tree.cc	(revision 179825)
+++ gofrontend/gogo-tree.cc	(working copy)
@@ -69,7 +69,7 @@ define_builtin(built_in_function bcode, 
    libname, NULL_TREE);
   if (const_p)
 TREE_READONLY(decl) = 1;
-  set_builtin_decl (bcode, decl, true);
+  set_builtin_decl(bcode, decl, true);
   builtin_functions[name] = decl;
   if (libname != NULL)
 {
Index: ChangeLog
===
--- ChangeLog	(revision 179825)
+++ ChangeLog	(working copy)
@@ -1,13 +1,3 @@
-2011-10-11  Michael Meissner  meiss...@linux.vnet.ibm.com
-
-	* gofrontend/gogo-tree.cc (define_builtin): Delete old interface
-	with two parallel arrays to hold standard builtin declarations,
-	and replace it with a function based interface that can support
-	creating builtins on the fly in the future.  Change all uses, and
-	poison the old names.  Make sure 0 is not a legitimate builtin
-	index.
-	(Gogo::make_trampoline(tree): Ditto.
-
 2011-08-24  Roberto Lublinerman  rlu...@gmail.com
 
 	* lang.opt: Add fgo-optimize-.


Re: [PATCH] [Annotalysis] Bugfix for spurious thread safety warnings with shared mutexes

2011-10-11 Thread Ollie Wild
On Mon, Oct 10, 2011 at 3:37 PM, Delesley Hutchins deles...@google.com wrote:

 --- gcc/tree-threadsafe-analyze.c       (revision 179771)
 +++ gcc/tree-threadsafe-analyze.c       (working copy)
 @@ -1830,14 +1830,27 @@ remove_lock_from_lockset (tree lockable, struct po

This feels like a bug in lock_set_contains(), not
remove_lock_from_lockset().  I'd modify lock_set_contains() as
follows:

1) During the universal lock conditional, remove the return statement.
 Instead, set default_lock = lock (where default_lock is a new
variable initialized to NULL_TREE).

2) Anywhere NULL_TREE is returned later, replace it with default_lock.

Ollie


[SPARC] Fix PR target/49965

2011-10-11 Thread Eric Botcazou
This is a regression present on mainline and on the 4.6 and 4.5 branches.  We
generate wrong code for the movcc patterns if the operands of the comparison
have TFmode and TARGET_HARD_QUAD is not set, because we fail to update the
comparison code after going through the comparison routine.

Tested on SPARC/Solaris, applied to mainline and to the 4.6 and 4.5 branches.
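
The failure mode described above is a stale cached value.  The sketch
below is a simplified model, not the SPARC back end: `struct comparison`,
`lower_via_libcall` and the two `final_code_*` functions are hypothetical
names, and the assumption is only that the comparison routine replaces
the original comparison with a different one (here "result != 0").

```c
#include <assert.h>

enum cmp_code { CMP_EQ, CMP_NE, CMP_LT, CMP_GT };

struct comparison
{
  enum cmp_code code;
  int lhs;
  int rhs;
};

/* Hypothetical stand-in for a library comparison routine: it evaluates
   the comparison and hands back "result != 0", so the code changes.  */
struct comparison
lower_via_libcall (struct comparison c)
{
  struct comparison out;
  int result = 0;

  switch (c.code)
    {
    case CMP_EQ: result = c.lhs == c.rhs; break;
    case CMP_NE: result = c.lhs != c.rhs; break;
    case CMP_LT: result = c.lhs < c.rhs; break;
    case CMP_GT: result = c.lhs > c.rhs; break;
    default: assert (!"unexpected comparison code");
    }

  out.code = CMP_NE;
  out.lhs = result;
  out.rhs = 0;
  return out;
}

/* Buggy shape: the code is saved before the rewrite and used after.  */
enum cmp_code
final_code_buggy (struct comparison c)
{
  enum cmp_code code = c.code;	/* saved too early */
  c = lower_via_libcall (c);
  (void) c;
  return code;			/* stale: ignores the rewrite */
}

/* Fixed shape: re-read the code from the rewritten comparison.  */
enum cmp_code
final_code_fixed (struct comparison c)
{
  c = lower_via_libcall (c);
  return c.code;
}
```

The patch follows the second shape by dropping the local `code` variable
and calling GET_CODE (operands[1]) at each use.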


2011-10-11  Eric Botcazou  ebotca...@adacore.com

PR target/49965
* config/sparc/sparc.md (mov<I:mode>cc): Do not save comparison code.
(mov<F:mode>cc): Likewise.


-- 
Eric Botcazou
Index: config/sparc/sparc.md
===
--- config/sparc/sparc.md	(revision 179736)
+++ config/sparc/sparc.md	(working copy)
@@ -2614,11 +2614,9 @@ (define_expand "mov<I:mode>cc"
 			(match_operand:I 3 "arith10_operand" "")))]
   "TARGET_V9 && !(<I:MODE>mode == DImode && TARGET_ARCH32)"
 {
-  enum rtx_code code = GET_CODE (operands[1]);
   rtx cc_reg;
 
-  if (GET_MODE (XEXP (operands[1], 0)) == DImode
-      && ! TARGET_ARCH64)
+  if (GET_MODE (XEXP (operands[1], 0)) == DImode && !TARGET_ARCH64)
     FAIL;
 
   if (GET_MODE (XEXP (operands[1], 0)) == TFmode && !TARGET_HARD_QUAD)
@@ -2629,12 +2627,14 @@ (define_expand "mov<I:mode>cc"
   if (XEXP (operands[1], 1) == const0_rtx
       && GET_CODE (XEXP (operands[1], 0)) == REG
       && GET_MODE (XEXP (operands[1], 0)) == DImode
-      && v9_regcmp_p (code))
+      && v9_regcmp_p (GET_CODE (operands[1])))
     cc_reg = XEXP (operands[1], 0);
   else
     cc_reg = gen_compare_reg (operands[1]);
 
-  operands[1] = gen_rtx_fmt_ee (code, GET_MODE (cc_reg), cc_reg, const0_rtx);
+  operands[1]
+    = gen_rtx_fmt_ee (GET_CODE (operands[1]), GET_MODE (cc_reg), cc_reg,
+		      const0_rtx);
 })
 
 (define_expand "mov<F:mode>cc"
@@ -2644,11 +2644,9 @@ (define_expand "mov<F:mode>cc"
 			(match_operand:F 3 "register_operand" "")))]
   "TARGET_V9 && TARGET_FPU"
 {
-  enum rtx_code code = GET_CODE (operands[1]);
   rtx cc_reg;
 
-  if (GET_MODE (XEXP (operands[1], 0)) == DImode
-      && ! TARGET_ARCH64)
+  if (GET_MODE (XEXP (operands[1], 0)) == DImode && !TARGET_ARCH64)
     FAIL;
 
   if (GET_MODE (XEXP (operands[1], 0)) == TFmode && !TARGET_HARD_QUAD)
@@ -2659,12 +2657,14 @@ (define_expand "mov<F:mode>cc"
   if (XEXP (operands[1], 1) == const0_rtx
       && GET_CODE (XEXP (operands[1], 0)) == REG
       && GET_MODE (XEXP (operands[1], 0)) == DImode
-      && v9_regcmp_p (code))
+      && v9_regcmp_p (GET_CODE (operands[1])))
     cc_reg = XEXP (operands[1], 0);
   else
     cc_reg = gen_compare_reg (operands[1]);
 
-  operands[1] = gen_rtx_fmt_ee (code, GET_MODE (cc_reg), cc_reg, const0_rtx);
+  operands[1]
+    = gen_rtx_fmt_ee (GET_CODE (operands[1]), GET_MODE (cc_reg), cc_reg,
+		      const0_rtx);
 })
 
 ;; Conditional move define_insns


Re: fix for c++/44473, mangling of decimal types, checked in

2011-10-11 Thread Eric Botcazou
 Ok, it has been committed to both the FSF 4.6 and 4.5 branches now.

The ChangeLog entry is in the wrong file, it must be moved to cp/ChangeLog.

-- 
Eric Botcazou


Re: [wwwdocs] gcc-4.6/porting_to.html

2011-10-11 Thread Benjamin Kosnik

 I realized this one hasn't made it in, but is really nice.  I made a 
 number of minor edits (typos, markup, simplifying headings,... among 
 others).  What do you think -- should we include this?
 
 Many users still won't have GCC 4.6 deployed yet, so I think it's
 still worth it.
 
 What do you think?

Ouch. I see this is not in, and I thought I checked in the draft months
ago. 

Please check this in immediately!!!

-benjamin


Re: fix for c++/44473, mangling of decimal types, checked in

2011-10-11 Thread Peter Bergner
On Tue, 2011-10-11 at 23:34 +0200, Eric Botcazou wrote:
  Ok, it has been committed to both the FSF 4.6 and 4.5 branches now.
 
 The ChangeLog entry is in the wrong file, it must be moved to cp/ChangeLog.

Oops, thanks for catching that!  Fixed now.

Peter






Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.

2011-10-11 Thread Eric Botcazou
 I see, so we can test the code generation in the testsuite even if the
 compiler was built against an assembler without support for the
 instructions.

At least partially, yes.

 But in such a case, I'm unsure if I understand why i386.exp needs
 these tests at all.  The presence of support for a particular i386
 intrinsic is an implicit property of the gcc sources that these test
 cases are a part of.

 If the tests are properly added only once the code to support the i386
 intrinsic is added as well, the checks seem superfluous.

The check is an _object_ check, for example:

proc check_effective_target_sse4 { } {
return [check_no_compiler_messages sse4.1 object {

so it checks that an object file can be produced.  You indeed don't need to 
invoke the check via the sse4.1 tag if you use:

/* { dg-do compile } */

in your tests, but you do need the sse4.1 tag if you use:

/* { dg-do assemble } */

or

/* { dg-do run } */

So the first category of tests will always be executed, whereas the latter two 
will only be executed if you have the binutils support.

-- 
Eric Botcazou


Re: [arm-embedded] Tune loop unrolling for cortex-m

2011-10-11 Thread Hans-Peter Nilsson
On Wed, 21 Sep 2011, Joey Ye wrote:

 Committed in ARM/embedded-4_6-branch.

 2011-09-21  Jiangning Liu  jiangning@arm.com

   Tune loop unrolling for cortex-m
   * config/arm/arm-cores.def (cortex-m0): Change to new tune
   cortex_v6m.
   (cortex-m1): Likewise.
   * config/arm/arm-protos.h (max_unroll_times): New.
   * config/arm/arm.c (arm_default_unroll_times): New.
   (arm_cortex_m_unroll_times): New.
   (arm_cortex_v6m_tune): New.
   (arm_slowmul_tune): Add max_unroll_times function pointer.
   (arm_fastmul_tune, arm_xscale_tune, arm_9e_tune,
   arm_v6t2_tune, arm_cortex_tune, arm_cortex_a9_tune,
   arm_cortex_v7m_tune, arm_cortex_v6m_tune,
   arm_fa726te_tune): Likewise.
   (arm_option_override): Enable loop unroll for all M class
   Cores, if optimization level is >= 1.

Shouldn't this kind of stuff get into trunk as well?

brgds, H-P


Re: Fix PR 50565 (offsetof-type expressions in static initializers)

2011-10-11 Thread Gabriel Dos Reis
On Tue, Oct 11, 2011 at 10:32 AM, Joseph S. Myers
jos...@codesourcery.com wrote:

 The problem comes down to an expression with the difference of two
 pointers being cast to int on a 64-bit system, resulting in
 convert_to_integer moving the conversions inside the subtraction.
 (These optimizations at conversion time should really be done later as
 a part of folding, or even later than that, rather than
 unconditionally in convert_to_*, but that's another issue.)

Interesting.  C++11 classifies these as link-time constants, i.e. they
are constant expressions for static initialization purposes, but not
compile-time constant expressions, precisely because of this kind of
issue.


Re: [PATCH 5/9] [SMS] Support new loop pattern

2011-10-11 Thread Ayal Zaks
On Fri, Sep 30, 2011 at 5:22 PM, Roman Zhuykov zhr...@ispras.ru wrote:
 2011/7/21  zhr...@ispras.ru:
 This patch should be applied only after pending patches by Revital.


 Ping. New version is attached, it suits current trunk without
 additional patches.

Thanks for the ping.

 Also this related patch needs approval:
 http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html

 The loop should meet the following requirements.
 First three are the same as for loop with doloop pattern:
 ...
 The next three describe the control part of new supported loops.
 - the last jump instruction should look like:  pc=(regF!=0)?label:pc, regF is

you'd probably want to bump to next instruction if falling through,
e.g., pc=(regF!=0)?label:pc+4

  flag register;
 - the last instruction which sets regF should be: regF=COMPARE(regC,X),
  where X is a constant, or maybe a register, which is not changed inside
  a loop;
 - only one instruction modifies regC inside a loop (others can use regC,
  but not write it), and it should simply adjust it by a constant:
  regC=regC+step, where step is a constant.

 When doloop is successfully scheduled by SMS, its number of
 iterations of loop kernel should be decreased by the number of stages in a
 schedule minus one, while other iterations expand to prologue and epilogue.
 In new supported loops such approach can't be used, because some
 instructions can use count register (regC).  Instead of this,
 the final register value X in compare instruction regF=COMPARE(regC,X)
 is changed to another value Y respective to the stage this instruction
 is scheduled (Y = X - stage * step).

making sure this does not underflow; i.e., that the number of
iterations is no less than stage (you've addressed this towards the
end below).


 The main difference from doloop case is that regC can be used by some
 instructions in loop body.
 That's why we are unable to simply adjust regC initial value, but have
 to keep its value correct on each particular iteration.
 So, we change comparison instruction accordingly.

 An example:
 int a[100];
 int main()
 {
  int i;
  for (i = 85; i > 12; i -= 5)
      a[i] = i * i;
  return a[15]-225;
 }
 ARM assembler with -O2 -fno-auto-inc-dec:
        ldr     r0, .L5
        mov     r3, #85
        mov     r2, r0
 .L2:
        mul     r1, r3, r3
        sub     r3, r3, #5
        cmp     r3, #10
        str     r1, [r2, #340]
        sub     r2, r2, #20
        bne     .L2
        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L5:
        .word   a

 Loop body is executed 15 times.
 When compiling with SMS, it finds a schedule with ii=7, stage_count=3
 and following times:
 Stage  Time       Insn
 0          5      mul     r1, r3, r3
 1         10     sub     r3, r3, #5
 1         11     cmp     r3, #10
 1         11     str     r1, [r2, #340]
 1         13     bne     .L2
 2         16     sub     r2, r2, #20


branch is not scheduled last?

 To make new schedule correct the loop body
 should be executed 14 times and we change compare instruction:

the loop itself should execute 13 times.

 regF=COMPARE(regC,X) to regF=COMPARE(regC,Y) where Y = X - stage * step.
 In our example regC is r3, X is 10, step = -5, compare instruction
 is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15.


right. In general, if the compare is on stage s (starting from 0), it
will be executed s times in the epilog, so it should exit the loop
upon reaching Y = X - s * step.
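
The arithmetic can be checked with a small sketch.  `adjusted_bound` and
`trip_count` are illustrative helpers, not functions from the SMS pass,
and the sketch assumes the "reg += step; exit when reg == bound" shape of
the example, where the bound is actually reached:

```c
#include <assert.h>

/* Bound for the kernel's compare when the compare sits on stage STAGE
   (counting from 0): Y = X - stage * step.  */
int
adjusted_bound (int x, int stage, int step)
{
  return x - stage * step;
}

/* Number of trips of a loop body that does "reg += step" and then
   leaves the loop once reg == bound.  Assumes the bound is reached
   exactly and after at least one trip.  */
int
trip_count (int start, int bound, int step)
{
  assert (step != 0);
  assert ((bound - start) % step == 0);
  assert ((bound - start) / step > 0);
  return (bound - start) / step;
}
```

For the example, adjusted_bound (10, 1, -5) gives Y = 15.  The original
loop starts with r3 = 85 and exits at 10, so trip_count (85, 10, -5) is
15; the kernel is entered with r3 = 80, because the prologue has already
executed one decrement, and exits at 15, so trip_count (80, 15, -5) is 13,
matching stage_count - 1 = 2 iterations moved to prologue and epilogue.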

 So, after SMS it looks like:
        ldr     r0, .L5
        mov     r3, #85
        mov     r2, r0
 ;;prologue
        mul     r1, r3, r3      ;;from stage 0 first iteration
        sub     r3, r3, #5      ;;3 insns from stage 1 first iteration
        cmp     r3, #10
        str     r1, [r2, #340]
        mul     r1, r3, r3      ;;from stage 0 second iteration
 ;;body
 .L2:
        sub     r3, r3, #5
        sub     r2, r2, #20
        cmp     r3, #15         ;; new value to compare with is Y=15
        str     r1, [r2, #340]
        mul     r1, r3, r3
        bne     .L2
 ;;epilogue
        sub     r2, r2, #20     ;;from stage 2 pre-last iteration
        sub     r3, r3, #5      ;;3 insns from stage 1 last iteration
        cmp     r3, #10
        str     r1, [r2, #340]
        sub     r2, r2, #20     ;;from stage 2 last iteration

        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L5:
        .word   a

 Real ARM assembler with SMS (after some optimizations and without dead code):
        mov     r3, #85
        ldr     r0, .L8
        mul     r1, r3, r3
        sub     r3, r3, #5
        mov     r2, r0
        str     r1, [r0, #340]
        mul     r1, r3, r3
 .L2:
        sub     r3, r3, #5
        sub     r2, r2, #20
        cmp     r3, #15
        str     r1, [r2, #340]
        mul     r1, r3, r3
        bne     .L2
        str     r1, [r2, #320]
        ldr     r0, [r0, #60]
        sub     r0, r0, #225
        bx      lr
 .L8:
        .word   a


 

Re: [Patch 2/5] ARM 64 bit sync atomic operations [V3]

2011-10-11 Thread Ramana Radhakrishnan
On 6 October 2011 18:52, Dr. David Alan Gilbert
david.gilb...@linaro.org wrote:
        Michael K. Edwards points out in PR/48126 that the sync is in the
        wrong place relative to the branch target of the compare, since the
        load could float up beyond the ldrex.

        PR target/48126

          * config/arm/arm.c (arm_output_sync_loop): Move label before barrier



 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 5161439..6e7105a 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
        }
     }

 -  arm_process_output_memory_barrier (emit, NULL);
 +  /* Note: label is before barrier so that in cmp failure case we still get
 +     a barrier to stop subsequent loads floating upwards past the ldrex
 +     pr/48126.  */
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
 +  arm_process_output_memory_barrier (emit, NULL);
  }

  static rtx


OK.

Ramana


Re: [Patch 4/5] ARM 64 bit sync atomic operations [V3]

2011-10-11 Thread Ramana Radhakrishnan
On 6 October 2011 18:54, Dr. David Alan Gilbert
david.gilb...@linaro.org wrote:
    Add ARM 64bit sync helpers for use on older ARMs.  Based on 32bit
    versions but with check for sufficiently new kernel version.

        gcc/
        * config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c)
        * config/arm/linux-atomic.c: Change comment to point to 64bit version
          (SYNC_LOCK_RELEASE): Instantiate 64bit version.
        * config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c

OK.

Ramana


 diff --git a/gcc/config/arm/linux-atomic-64bit.c 
 b/gcc/config/arm/linux-atomic-64bit.c
 new file mode 100644
 index 000..6966e66
 --- /dev/null
 +++ b/gcc/config/arm/linux-atomic-64bit.c
 @@ -0,0 +1,166 @@
 +/* 64bit Linux-specific atomic operations for ARM EABI.
 +   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
 +   Based on linux-atomic.c
 +
 +   64 bit additions david.gilb...@linaro.org
 +
 +This file is part of GCC.
 +
 +GCC is free software; you can redistribute it and/or modify it under
 +the terms of the GNU General Public License as published by the Free
 +Software Foundation; either version 3, or (at your option) any later
 +version.
 +
 +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
 +WARRANTY; without even the implied warranty of MERCHANTABILITY or
 +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 +for more details.
 +
 +Under Section 7 of GPL version 3, you are granted additional
 +permissions described in the GCC Runtime Library Exception, version
 +3.1, as published by the Free Software Foundation.
 +
 +You should have received a copy of the GNU General Public License and
 +a copy of the GCC Runtime Library Exception along with this program;
 +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 +http://www.gnu.org/licenses/.  */
 +
 +/* 64bit helper functions for atomic operations; the compiler will
 +   call these when the code is compiled for a CPU without ldrexd/strexd.
 +   (If the CPU had those then the compiler inlines the operation).
 +
 +   These helpers require a kernel helper that's only present on newer
 +   kernels; we check for that in an init section and bail out rather
 +   unceremoniously.  */
 +
 +extern unsigned int __write (int fd, const void *buf, unsigned int count);
 +extern void abort (void);
 +
 +/* Kernel helper for compare-and-exchange.  */
 +typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
 +                                       const long long* newval,
 +                                       long long *ptr);
 +#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
 +
 +/* Kernel helper page version number.  */
 +#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
 +
 +/* Check that the kernel has a new enough version at load.  */
 +static void __check_for_sync8_kernelhelper (void)
 +{
 +  if (__kernel_helper_version < 5)
 +    {
 +      const char err[] = "A newer kernel is required to run this binary. "
 +                               "(__kernel_cmpxchg64 helper)\n";
 +      /* At this point we need a way to crash with some information
 +        for the user - I'm not sure I can rely on much else being
 +        available at this point, so do the same as generic-morestack.c
 +        write () and abort ().  */
 +      __write (2 /* stderr.  */, err, sizeof (err));
 +      abort ();
 +    }
 +};
 +
 +static void (*__sync8_kernelhelper_inithook[]) (void)
 +               __attribute__ ((used, section (".init_array"))) = {
 +  __check_for_sync8_kernelhelper
 +};
 +
 +#define HIDDEN __attribute__ ((visibility ("hidden")))
 +
 +#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)                        \
 +  long long HIDDEN                                             \
 +  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)    \
 +  {                                                            \
 +    int failure;                                               \
 +    long long tmp,tmp2;                                                \
 +                                                               \
 +    do {                                                       \
 +      tmp = *ptr;                                              \
 +      tmp2 = PFX_OP (tmp INF_OP val);                          \
 +      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);         \
 +    } while (failure != 0);                                    \
 +                                                               \
 +    return tmp;                                                        \
 +  }
 +
 +FETCH_AND_OP_WORD64 (add,   , +)
 +FETCH_AND_OP_WORD64 (sub,   , -)
 +FETCH_AND_OP_WORD64 (or,    , |)
 +FETCH_AND_OP_WORD64 (and,   , &)
 +FETCH_AND_OP_WORD64 (xor,   , ^)
 +FETCH_AND_OP_WORD64 (nand, ~, &)
 +
 +#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
 +#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
 +
 +/* Implement both 
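
The FETCH_AND_OP_WORD64 macro above is a compare-and-exchange retry loop.
As a portable sketch of the same pattern, the code below uses C11
`<stdatomic.h>` in place of the `__kernel_cmpxchg64` helper; that
substitution is an assumption made for illustration and is not what the
patch does.  The function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically add VAL to *PTR and return the previous value.  */
long long
fetch_and_add_64 (_Atomic long long *ptr, long long val)
{
  assert (ptr != 0);
  long long old = atomic_load (ptr);
  /* On failure OLD is refreshed with the current contents of *PTR, so
     the new value is recomputed from fresh data on each retry.  */
  while (!atomic_compare_exchange_weak (ptr, &old, old + val))
    ;
  return old;
}

/* Atomically store ~(*PTR & VAL) and return the previous value.  */
long long
fetch_and_nand_64 (_Atomic long long *ptr, long long val)
{
  assert (ptr != 0);
  long long old = atomic_load (ptr);
  while (!atomic_compare_exchange_weak (ptr, &old, ~(old & val)))
    ;
  return old;
}
```

The structure mirrors the macro: read the old value, compute the new one,
attempt the exchange, and retry if another thread changed the location in
between.  On targets without native 64-bit atomics a C11 implementation
may itself need library support for these calls.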

Re: RFC: ARM: Add comments to emitted .eabi_attribute directives

2011-10-11 Thread Ramana Radhakrishnan
  Any objections to this version of the patch ?

Fine with me though I think it's worthwhile to have such comments
without -dA but that's my personal impression. I don't care either
way.

cheers
Ramana


 Cheers
  Nick

 gcc/ChangeLog
 2011-10-05  Nick Clifton  ni...@redhat.com

        * config/arm/arm.c (EMIT_EABI_ATTRIBUTE): New macro.  Used to
        emit a .eabi_attribute assembler directive, possibly with a
        comment attached.
        (asm_file_start): Use the new macro.


 Index: gcc/config/arm/arm.c
 ===
 --- gcc/config/arm/arm.c        (revision 179554)
 +++ gcc/config/arm/arm.c        (working copy)
 @@ -22243,6 +22243,21 @@
     asm_fprintf (stream, %U%s, name);
  }

 +/* This macro is used to emit an EABI tag and its associated value.
 +   We emit the numerical value of the tag in case the assembler does not
 +   support textual tags.  (Eg gas prior to 2.20).  If requested we include
 +   the tag name in a comment so that anyone reading the assembler output
 +   will know which tag is being set.  */
 +#define EMIT_EABI_ATTRIBUTE(NAME,NUM,VAL)                              \
 +  do                                                                   \
 +    {                                                                  \
 +      asm_fprintf (asm_out_file, "\t.eabi_attribute %d, %d", NUM, VAL); \
 +      if (flag_verbose_asm || flag_debug_asm)                          \
 +       asm_fprintf (asm_out_file, "\t%s " #NAME, ASM_COMMENT_START);   \
 +      asm_fprintf (asm_out_file, "\n");                                \
 +    }                                                                  \
 +  while (0)
 +
  static void
  arm_file_start (void)
  {
 @@ -22274,9 +22289,9 @@
          if (arm_fpu_desc->model == ARM_FP_MODEL_VFP)
            {
              if (TARGET_HARD_FLOAT)
 -               asm_fprintf (asm_out_file, \t.eabi_attribute 27, 3\n);
 +               EMIT_EABI_ATTRIBUTE (Tag_ABI_HardFP_use, 27, 3);
              if (TARGET_HARD_FLOAT_ABI)
 -               asm_fprintf (asm_out_file, \t.eabi_attribute 28, 1\n);
 +               EMIT_EABI_ATTRIBUTE (Tag_ABI_VFP_args, 28, 1);
            }
        }
       asm_fprintf (asm_out_file, \t.fpu %s\n, fpu_name);
 @@ -22285,31 +22300,24 @@
          are used.  However we don't have any easy way of figuring this out.
         Conservatively record the setting that would have been used.  */

 -      /* Tag_ABI_FP_rounding.  */
       if (flag_rounding_math)
 -       asm_fprintf (asm_out_file, \t.eabi_attribute 19, 1\n);
 +       EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_rounding, 19, 1);
 +
       if (!flag_unsafe_math_optimizations)
        {
 -         /* Tag_ABI_FP_denomal.  */
 -         asm_fprintf (asm_out_file, \t.eabi_attribute 20, 1\n);
 -         /* Tag_ABI_FP_exceptions.  */
 -         asm_fprintf (asm_out_file, \t.eabi_attribute 21, 1\n);
 +         EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_denormal, 20, 1);
 +         EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_exceptions, 21, 1);
        }
 -      /* Tag_ABI_FP_user_exceptions.  */
       if (flag_signaling_nans)
 -       asm_fprintf (asm_out_file, \t.eabi_attribute 22, 1\n);
 -      /* Tag_ABI_FP_number_model.  */
 -      asm_fprintf (asm_out_file, \t.eabi_attribute 23, %d\n,
 -                  flag_finite_math_only ? 1 : 3);
 +       EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_user_exceptions, 22, 1);

 -      /* Tag_ABI_align8_needed.  */
 -      asm_fprintf (asm_out_file, \t.eabi_attribute 24, 1\n);
 -      /* Tag_ABI_align8_preserved.  */
 -      asm_fprintf (asm_out_file, \t.eabi_attribute 25, 1\n);
 -      /* Tag_ABI_enum_size.  */
 -      asm_fprintf (asm_out_file, \t.eabi_attribute 26, %d\n,
 -                  flag_short_enums ? 1 : 2);
 +      EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_number_model, 23,
 +                          flag_finite_math_only ? 1 : 3);

 +      EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_needed, 24, 1);
 +      EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_preserved, 25, 1);
 +      EMIT_EABI_ATTRIBUTE (Tag_ABI_enum_size, 26, flag_short_enums ? 1 : 2);
 +
       /* Tag_ABI_optimization_goals.  */
       if (optimize_size)
        val = 4;
 @@ -22319,16 +22327,12 @@
        val = 1;
       else
        val = 6;
 -      asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
 +      EMIT_EABI_ATTRIBUTE (Tag_ABI_optimization_goals, 30, val);

 -      /* Tag_CPU_unaligned_access.  */
 -      asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
 -                  unaligned_access);
 +      EMIT_EABI_ATTRIBUTE (Tag_CPU_unaligned_access, 34, unaligned_access);

 -      /* Tag_ABI_FP_16bit_format.  */
       if (arm_fp16_format)
 -       asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
 -                    (int)arm_fp16_format);
 +       EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_16bit_format, 38, (int) arm_fp16_format);

       if (arm_lang_output_object_attributes_hook)
        arm_lang_output_object_attributes_hook();
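
The definition of EMIT_EABI_ATTRIBUTE is not part of this excerpt, so the
following is only a guess at the general shape of such a macro: it takes the
tag name, number and value once, and can use the name for an assembler
comment instead of a hand-written C comment next to every asm_fprintf call.
The names format_eabi_attribute, FORMAT_EABI_ATTRIBUTE and verbose_asm are
made up for this sketch, and "@" is assumed as the ARM assembler comment
character.

```c
#include <stdio.h>

/* Hypothetical stand-in for the compiler's verbose-asm setting.  */
static int verbose_asm = 1;

/* Format one .eabi_attribute directive into BUF.  The tag name is only
   emitted, as a trailing assembler comment, when verbose_asm is set.  */
static int
format_eabi_attribute (char *buf, size_t len, const char *name,
                       int num, int val)
{
  if (verbose_asm)
    return snprintf (buf, len, "\t.eabi_attribute %d, %d\t@ %s\n",
                     num, val, name);
  return snprintf (buf, len, "\t.eabi_attribute %d, %d\n", num, val);
}

/* Stringify the tag so each call site spells the name exactly once.  */
#define FORMAT_EABI_ATTRIBUTE(BUF, LEN, NAME, NUM, VAL) \
  format_eabi_attribute ((BUF), (LEN), #NAME, (NUM), (VAL))
```

With this shape, a call such as
FORMAT_EABI_ATTRIBUTE (buf, sizeof buf, Tag_ABI_VFP_args, 28, 1) carries the
tag name into the output, which is what makes the removed comments redundant.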



Re: [Patch 2/5] ARM 64 bit sync atomic operations [V3]

2011-10-11 Thread Ramana Radhakrishnan
On 6 October 2011 18:52, Dr. David Alan Gilbert
david.gilb...@linaro.org wrote:
        Michael K. Edwards points out in PR/48126 that the sync is in the wrong
        place relative to the branch target of the compare, since the load could
        float up beyond the ldrex.

        PR target/48126

          * config/arm/arm.c (arm_output_sync_loop): Move label before barrier

 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 5161439..6e7105a 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
        }
     }

 -  arm_process_output_memory_barrier (emit, NULL);
 +  /* Note: label is before barrier so that in cmp failure case we still get
 +     a barrier to stop subsequent loads floating upwards past the ldrex
 +     pr/48126.  */

Just one minor nit I just noticed. Please correct this to PR 48126 in
the comment rather than pr/48126.

Otherwise OK.

Ramana
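
For readers following the PR 48126 discussion, here is a rough C11 analogy of
the control flow being asked for. It is not the code arm_output_sync_loop
emits; it only illustrates that the trailing barrier is reached on both the
success path and the compare-failure path. The function name
cas_with_trailing_barrier is invented for this sketch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Compare-and-swap returning the previous value.  The exchange itself is
   relaxed; ordering comes from the explicit fences around it.  */
static uint64_t
cas_with_trailing_barrier (_Atomic uint64_t *ptr, uint64_t expected,
                           uint64_t desired)
{
  uint64_t old = expected;

  atomic_thread_fence (memory_order_seq_cst);
  atomic_compare_exchange_strong_explicit (ptr, &old, desired,
                                           memory_order_relaxed,
                                           memory_order_relaxed);
  /* Both outcomes fall through to this fence, so later loads cannot be
     hoisted above the atomic access even when the compare fails.  This
     mirrors placing the label before the barrier in the patch.  */
  atomic_thread_fence (memory_order_seq_cst);
  return old;
}
```

Had the failure path returned before the second fence, only successful
exchanges would be ordered against subsequent loads, which is the situation
the PR describes.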


Re: [Patch 3/5] ARM 64 bit sync atomic operations [V3]

2011-10-11 Thread Ramana Radhakrishnan
On 6 October 2011 18:53, Dr. David Alan Gilbert
david.gilb...@linaro.org wrote:
    Add support for ARM 64bit sync intrinsics.

        gcc/
        * arm.c (arm_output_ldrex): Support ldrexd.
          (arm_output_strex): Support strexd.
          (arm_output_it): New helper to output it in Thumb2 mode only.
          (arm_output_sync_loop): Support DI mode,
                                 Change comment to not support const_int.
          (arm_expand_sync): Support DI mode.

        * arm.h (TARGET_HAVE_LDREXBHD): Split into LDREXBH and LDREXD.

        * iterators.md (NARROW): move from sync.md.
          (QHSD): New iterator for all current ARM integer modes.
          (SIDI): New iterator for SI and DI modes only.

        * sync.md  (sync_predtab): New mode_attr
          (sync_compare_and_swapsi): Fold into sync_compare_and_swap<mode>
          (sync_lock_test_and_setsi): Fold into sync_lock_test_and_setsi<mode>
          (sync_<sync_optab>si): Fold into sync_<sync_optab><mode>
          (sync_nandsi): Fold into sync_nand<mode>
          (sync_new_<sync_optab>si): Fold into sync_new_<sync_optab><mode>
          (sync_new_nandsi): Fold into sync_new_nand<mode>
          (sync_old_<sync_optab>si): Fold into sync_old_<sync_optab><mode>
          (sync_old_nandsi): Fold into sync_old_nand<mode>
          (sync_compare_and_swap<mode>): Support SI & DI
          (sync_lock_test_and_set<mode>): Likewise
          (sync_<sync_optab><mode>): Likewise
          (sync_nand<mode>): Likewise
          (sync_new_<sync_optab><mode>): Likewise
          (sync_new_nand<mode>): Likewise
          (sync_old_<sync_optab><mode>): Likewise
          (sync_old_nand<mode>): Likewise
          (arm_sync_compare_and_swapsi): Turn into iterator on SI & DI
          (arm_sync_lock_test_and_setsi): Likewise
          (arm_sync_new_<sync_optab>si): Likewise
          (arm_sync_new_nandsi): Likewise
          (arm_sync_old_<sync_optab>si): Likewise
          (arm_sync_old_nandsi): Likewise
          (arm_sync_compare_and_swap<mode> NARROW): use sync_predtab, fix indent
          (arm_sync_lock_test_and_setsi<mode> NARROW): Likewise
          (arm_sync_new_<sync_optab><mode> NARROW): Likewise
          (arm_sync_new_nand<mode> NARROW): Likewise
          (arm_sync_old_<sync_optab><mode> NARROW): Likewise
          (arm_sync_old_nand<mode> NARROW): Likewise

OK. Please commit this by Friday if no one else objects.

cheers
Ramana


Re: [Patch 5/5] ARM 64 bit sync atomic operations [V3]

2011-10-11 Thread Ramana Radhakrishnan
On 6 October 2011 18:54, Dr. David Alan Gilbert
david.gilb...@linaro.org wrote:
   Test support for ARM 64bit sync intrinsics.

      gcc/testsuite/
        * gcc.dg/di-longlong64-sync-1.c: New test.
        * gcc.dg/di-sync-multithread.c: New test.
        * gcc.target/arm/di-longlong64-sync-withhelpers.c: New test.
        * gcc.target/arm/di-longlong64-sync-withldrexd.c: New test.
        * lib/target-supports.exp: (arm_arch_*_ok): Series of  effective-target
                tests for v5, v6, v6k, and v7-a, and add-options helpers.
          (check_effective_target_arm_arm_ok): New helper.
          (check_effective_target_sync_longlong): New helper.

I would like one of the testsuite maintainers to have a second look at
this. I'm not confident about my dejagnu foo to fully review this.

Ramana


RE: [arm-embedded] Tune loop unrolling for cortex-m

2011-10-11 Thread Joey Ye
 -Original Message-
 From: Hans-Peter Nilsson [mailto:h...@bitrange.com]
 Sent: Wednesday, October 12, 2011 06:57
 To: Joey Ye
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: [arm-embedded] Tune loop unrolling for cortex-m
 
 On Wed, 21 Sep 2011, Joey Ye wrote:
 
  Committed in ARM/embedded-4_6-branch.
 
  2011-09-21  Jiangning Liu  jiangning@arm.com
 
  Tune loop unrolling for cortex-m
  * config/arm/arm-cores.def (cortex-m0): Change to new tune
  cortex_v6m.
  (cortex-m1): Likewise.
  * config/arm/arm-protos.h (max_unroll_times): New.
  * config/arm/arm.c (arm_default_unroll_times): New.
  (arm_cortex_m_unroll_times): New.
  (arm_cortex_v6m_tune): New.
  (arm_slowmul_tune): Add max_unroll_times function pointer.
  (arm_fastmul_tune, arm_xscale_tune, arm_9e_tune,
  arm_v6t2_tune, arm_cortex_tune, arm_cortex_a9_tune,
  arm_cortex_v7m_tune, arm_cortex_v6m_tune,
  arm_fa726te_tune): Likewise.
  (arm_option_override): Enable loop unroll for all M class
  Cores, if optimization level is >= 1.
 
 Shouldn't this kind of stuff get into trunk as well?
Sure. Working on it.

Thanks - Joey
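
As a reading aid for the ChangeLog above: "Add max_unroll_times function
pointer" suggests a per-tune callback that the option-override code can
consult. The sketch below shows that pattern in isolation. All names carry a
_sketch suffix because they are invented here, the unroll limits 8 and 4 are
placeholder values rather than the branch's real numbers, and cortex-a8 is
only included as an example of a core on a different tune.

```c
#include <stddef.h>
#include <string.h>

/* Cut-down tuning record: only the callback discussed above.  */
struct tune_sketch
{
  const char *tune_name;
  int (*max_unroll_times) (void);
};

static int default_unroll_times_sketch (void) { return 8; }   /* placeholder */
static int cortex_m_unroll_times_sketch (void) { return 4; }  /* placeholder */

static const struct tune_sketch cortex_tune_sketch =
  { "cortex", default_unroll_times_sketch };
static const struct tune_sketch cortex_v6m_tune_sketch =
  { "cortex_v6m", cortex_m_unroll_times_sketch };

/* Core table: cortex-m0 and cortex-m1 share the new v6m tune.  */
struct core_sketch
{
  const char *core_name;
  const struct tune_sketch *tune;
};

static const struct core_sketch cores_sketch[] = {
  { "cortex-a8", &cortex_tune_sketch },
  { "cortex-m0", &cortex_v6m_tune_sketch },
  { "cortex-m1", &cortex_v6m_tune_sketch },
};

/* Return the unroll limit for CORE_NAME, or -1 if the core is unknown.  */
static int
max_unroll_for_core (const char *core_name)
{
  for (size_t i = 0; i < sizeof cores_sketch / sizeof cores_sketch[0]; i++)
    if (strcmp (cores_sketch[i].core_name, core_name) == 0)
      return cores_sketch[i].tune->max_unroll_times ();
  return -1;
}
```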





Re: [Patch 5/5] ARM 64 bit sync atomic operations [V3]

2011-10-11 Thread Mike Stump
On Oct 6, 2011, at 10:54 AM, Dr. David Alan Gilbert wrote:
   Test support for ARM 64bit sync intrinsics.

Ok.  Watch for any fallout on non-arm systems.  I'd always invite people who 
think they know the best way to test volatile to chime in.  There is the new 
infrastructure to test multi core synch issues with gdb trickery.  As you want 
more beef, you can consider it.

I'll note that I do sometimes wonder if this type of code isn't better handled 
in #if feature tests inside the testcases themselves.
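
To make the "#if feature tests inside the testcases" idea concrete, here is a
small sketch of what such a guard could look like for the 64-bit case. It
relies on GCC's predefined macro __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8; the
function name check_sync_cas_64 and its return convention are invented for
illustration. One caveat, as far as I understand it: the macro describes
native support, so a target that implements the operation through helper
routines may not define it, and a guard like this would then skip the check.

```c
#include <stdint.h>

/* Returns 1 if the 64-bit compare-and-swap ran and behaved as expected,
   0 if the target does not advertise it (check skipped), -1 on failure.  */
static int
check_sync_cas_64 (void)
{
#if defined (__GCC_HAVE_SYNC_COMPARE_AND_SWAP_8)
  uint64_t v = 0x100000002ULL;
  uint64_t old = __sync_val_compare_and_swap (&v, 0x100000002ULL,
                                              0x300000004ULL);
  return (old == 0x100000002ULL && v == 0x300000004ULL) ? 1 : -1;
#else
  return 0;
#endif
}
```

The trade-off against effective-target checks in target-supports.exp is that
a skipped test shows up as a silent pass here rather than as UNSUPPORTED.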


Re: [google] record compiler options to .note sections

2011-10-11 Thread Dehao Chen
Attached is the new patch. Bootstrapped on x86_64, no regressions.

gcc/ChangeLog.google-4_6:
2011-10-08  Dehao Chen  de...@google.com

   Add a flag (-frecord-gcc-switches-in-elf) to record compiler
   command line options to .gnu.switches.text sections of
   the object file.
   * coverage.c (write_opts_to_asm): Write the options to
   .gnu.switches.text sections.
   * common.opt: Ditto.
   * opts.h: Ditto.

gcc/c-family/ChangeLog.google-4_6:
2011-10-08  Dehao Chen  de...@google.com
   * c-opts.c (c_common_parse_file): Write the options to
   .gnu.switches.text sections.

gcc/testsuite/ChangeLog.google-4_6:
2011-10-08  Dehao Chen  de...@google.com

   * gcc.dg/record-gcc-switches-in-elf-1.c: New test.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 179836)
+++ gcc/doc/invoke.texi (working copy)
@@ -391,6 +391,7 @@
 -fpmu-profile-generate=@var{pmuoption} @gol
 -fpmu-profile-use=@var{pmuoption} @gol
 -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol
+-frecord-gcc-switches-in-elf@gol
 -freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol
 -fripa -fripa-disallow-asm-modules -fripa-disallow-opt-mismatch @gol
@@ -8170,6 +8171,11 @@
 number of times it is called. The params variable
 note-cgraph-section-edge-threshold can be used to only list edges above a
 certain threshold.
+
+@item -frecord-gcc-switches-in-elf
+@opindex frecord-gcc-switches-in-elf
+Record the command line options in the .gnu.switches.text elf section for sample
+based LIPO to do module grouping.
 @end table

 The following options control compiler behavior regarding floating
Index: gcc/c-family/c-opts.c
===
--- gcc/c-family/c-opts.c   (revision 179836)
+++ gcc/c-family/c-opts.c   (working copy)
@@ -1109,6 +1109,8 @@
   for (;;)
 {
   c_finish_options ();
+  if (flag_record_gcc_switches_in_elf && i == 0)
+   write_opts_to_asm ();
   pch_init ();
   set_lipo_c_parsing_context (parse_in, i, verbose);
   push_file_scope ();
Index: gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c
===
--- gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0)
+++ gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile} */
+/* { dg-options "-frecord-gcc-switches-in-elf -Dtest -dA" } */
+
+void foobar(int);
+
+void
+foo (void)
+{
+  int i;
+  for (i = 0; i < 100; i++)
+    {
+      foobar(i);
+    }
+}
+
+/* { dg-final { scan-assembler-times "Dtest" 1 } } */
Index: gcc/opts.h
===
--- gcc/opts.h  (revision 179836)
+++ gcc/opts.h  (working copy)
@@ -381,4 +381,5 @@
 extern void set_struct_debug_option (struct gcc_options *opts,
 location_t loc,
 const char *value);
+extern void write_opts_to_asm (void);
 #endif
Index: gcc/coverage.c
===
--- gcc/coverage.c  (revision 179836)
+++ gcc/coverage.c  (working copy)
@@ -55,6 +55,7 @@
 #include "diagnostic-core.h"
 #include "intl.h"
 #include "l-ipo.h"
+#include "dwarf2asm.h"

 #include "gcov-io.h"
 #include "gcov-io.c"
@@ -2146,4 +2147,69 @@
   return 0;
 }

+/* Write command line options to the .note section.  */
+
+void
+write_opts_to_asm (void)
+{
+  size_t i;
+  cpp_dir *quote_paths, *bracket_paths, *pdir;
+  struct str_list *pdef, *pinc;
+  int num_quote_paths = 0;
+  int num_bracket_paths = 0;
+
+  get_include_chains (&quote_paths, &bracket_paths);
+
+  /* Write quote_paths to ASM section.  */
+  switch_to_section (get_section (".gnu.switches.text.quote_paths",
+                                 SECTION_DEBUG, NULL));
+  for (pdir = quote_paths; pdir; pdir = pdir->next)
+    {
+      if (pdir == bracket_paths)
+       break;
+      num_quote_paths++;
+    }
+  dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL);
+  dw2_asm_output_data_uleb128 (num_quote_paths, NULL);
+  for (pdir = quote_paths; pdir; pdir = pdir->next)
+    {
+      if (pdir == bracket_paths)
+       break;
+      dw2_asm_output_nstring (pdir->name, (size_t)-1, NULL);
+    }
+
+  /* Write bracket_paths to ASM section.  */
+  switch_to_section (get_section (".gnu.switches.text.bracket_paths",
+                                 SECTION_DEBUG, NULL));
+  for (pdir = bracket_paths; pdir; pdir = pdir->next)
+    num_bracket_paths++;
+  dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL);
+  dw2_asm_output_data_uleb128 (num_bracket_paths, NULL);
+  for (pdir = bracket_paths; pdir; pdir = pdir->next)
+    dw2_asm_output_nstring (pdir->name, (size_t)-1, NULL);
+
+  /* Write cpp_defines to ASM section.  */
+  switch_to_section (get_section 
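
Two details of write_opts_to_asm above may be easier to see in isolation. The
loops that break on pdir == bracket_paths suggest that the quote chain shares
its tail with the bracket chain, so the quote-only entries are those before
that node (this is an inference from the patch, not something it states). The
counts are then written as ULEB128, the variable-length encoding used by
DWARF. The sketch below shows both steps stand-alone; dir_sketch, count_until
and encode_uleb128 are names invented here and are not GCC functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for a directory chain node.  */
struct dir_sketch
{
  const char *name;
  struct dir_sketch *next;
};

/* Count entries from HEAD up to, but not including, STOP.
   Passing STOP == NULL counts the whole chain.  */
static int
count_until (const struct dir_sketch *head, const struct dir_sketch *stop)
{
  int n = 0;
  for (; head && head != stop; head = head->next)
    n++;
  return n;
}

/* Encode VALUE as unsigned LEB128 into OUT: 7 payload bits per byte,
   least significant group first, high bit set on every byte except the
   last.  Returns the number of bytes written (at most 10).  */
static size_t
encode_uleb128 (uint64_t value, unsigned char *out)
{
  size_t n = 0;
  do
    {
      unsigned char byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;
      out[n++] = byte;
    }
  while (value != 0);
  return n;
}
```

A reader of the section would decode the file name string, then the ULEB128
count, then that many path strings, which is why the count has to be computed
in a first pass before any path is written.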

Re: [google] record compiler options to .note sections

2011-10-11 Thread Xinliang David Li
ok.

David

On Tue, Oct 11, 2011 at 9:51 PM, Dehao Chen de...@google.com wrote:
 [quoted patch snipped]

Re: [C++-11] User defined literals

2011-10-11 Thread Ed Smith-Rowland

On 10/11/2011 12:57 PM, Jason Merrill wrote:

On 10/11/2011 12:55 PM, Jason Merrill wrote:

On 10/09/2011 07:19 PM, Ed Smith-Rowland wrote:

Does cp_parser_identifier (parser) *not* consume the identifier token?


I'm pretty sure it does.

It does.


Does it work to only complain if !cp_parser_parsing_tentatively?


I suppose not, if you got no complaints with cp_parser_error.

Jason


cp_parser_operator(function_id) is simply run twice in 
cp_parser_unqualified_id.

Once inside cp_parser_template_id called at parser.c:4515.
Once directly inside cp_parser_unqualified_id at parser.c:4525.

cp_parser_template_id never succeeds with literal operator templates.  I 
find that curious.  But I haven't looked real hard and the things do get 
parsed somehow.




Re: PR c++/30195

2011-10-11 Thread Fabien Chêne
2011/10/11 Jason Merrill ja...@redhat.com:
 On 10/10/2011 03:56 PM, Fabien Chêne wrote:

 It tried to add the target declaration of a USING_DECL in the
 method_vec of the class where the USING_DECL is declared. Thus, I
 copied the target decl, adjusted its access, and then called
 add_method with the target decl.

 Copying the decl is unlikely to do what we want, I think.  Does putting the
 target decl directly into the method vec work?

Unfortunately not, it ends up with the same error: undefined
reference. Furthermore, I don't think it is the right approach since
the access may be different between the member function and the using
declaration... Never mind.

 If not, perhaps lookup_fnfields_1 should look through the field list for
 function USING_DECLs.

That's what I've tried first, and it works. Though, I guess you mean
lookup_field_r should perform an additional lookup if
lookup_fnfields_1 does not find anything.

The attached patch implements that, and eventually fixes c++/26256,
c++/25994, c++/30195, c++/6936.
Tested on x86_64-unknown-linux-gnu without new regressions.

gcc/ChangeLog

2011-10-11  Fabien Chêne  fab...@gcc.gnu.org

PR c++/6936
PR c++/25994
PR c++/26256
PR c++/30195
* dbxout.c (dbxout_type_fields): Ignore using declarations.


gcc/testsuite/ChangeLog

2011-10-11  Fabien Chêne  fab...@gcc.gnu.org

PR c++/6936
PR c++/25994
PR c++/26256
PR c++/30195
* g++.dg/lookup/using23.C: New.
* g++.dg/lookup/using24.C: New.
* g++.dg/lookup/using25.C: New.
* g++.dg/lookup/using26.C: New.
* g++.dg/lookup/using27.C: New.
* g++.dg/lookup/using28.C: New.
* g++.dg/lookup/using29.C: New.
* g++.dg/lookup/using30.C: New.
* g++.dg/lookup/using31.C: New.
* g++.dg/lookup/using32.C: New.
* g++.dg/lookup/using33.C: New.
* g++.dg/lookup/using34.C: New.
* g++.dg/lookup/using35.C: New.
* g++.dg/lookup/using36.C: New.
* g++.dg/lookup/using37.C: New.
* g++.dg/lookup/using38.C: New.
* g++.dg/debug/using4.C: New.
* g++.dg/debug/using5.C: New.
* g++.dg/cpp0x/forw_enum10.C: New.
* g++.old-deja/g++.other/using1.C: Adjust.
* g++.dg/template/using2.C: Likewise.

gcc/cp/ChangeLog

2011-10-11  Fabien Chêne  fab...@gcc.gnu.org

PR c++/6936
PR c++/25994
PR c++/26256
PR c++/30195
* search.c (lookup_field_1): Get rid of the comment saying that
USING_DECL should not be returned, and actually return USING_DECL
if appropriate.
(lookup_field_r): Call lookup_fnfields_slot with LOOKUP_USING=true
instead of lookup_fnfields_1.
(lookup_fnfields_slot): Add a new parameter LOOKUP_USING, and
perform an additional lookup for USING_DECLs targeting functions
if LOOKUP_USING is set to true.
* semantics.c (finish_member_declaration): Remove the check that
prevents USING_DECLs from being verified by pushdecl_class_level.
* typeck.c (build_class_member_access_expr): Handle USING_DECLs.
* class.c (check_field_decls): Keep using declarations.
(add_method): Remove a wrong diagnostic about conflicting using
declarations.
(type_has_move_assign): Call lookup_fnfields_slot with
LOOKUP_USING set to false.
* parser.c (cp_parser_nonclass_name): Handle USING_DECLs.
* decl.c (start_enum): Call xref_tag whenever possible.
* name-lookup.c (strip_using_decl): New function.
(supplement_binding_1): Call strip_using_decl on decl and
bval. Perform most of the checks with USING_DECLs stripped.  Also
check that the target decl and the target bval do not refer to
the same declaration. Allow pushing an enum multiple times in a
template class.
(push_class_level_binding): Call strip_using_decl on decl and
bval. Perform most of the checks with USING_DECLs stripped. Return
true if both decl and bval refer to USING_DECLs and are dependent.
* call.c (build_user_type_conversion_1): Call lookup_fnfields_slot
with LOOKUP_USING set to false.

-- 
Fabien


using.patch
Description: Binary data