Re: [google] Backport r178628 from trunk to google/gcc-4_6 (issue5139050)
This is okay. On Mon, Sep 26, 2011 at 6:14 PM, Guozhi Wei car...@google.com wrote: Hi This patch removes the XPASS of test case 20040204-1.c on arm. Tested on arm: make check-gcc RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a tree-ssa.exp=20040204-1.c" on x86_64: make check-gcc RUNTESTFLAGS=tree-ssa.exp=20040204-1.c OK for google/gcc-4_6? thanks Carrot 2011-09-27 Guozhi Wei car...@google.com 2011-09-07 Jiangning Liu jiangning@arm.com PR tree-optimization/46021 * gcc.dg/tree-ssa/20040204-1.c: Don't XFAIL on arm*-*-*. Index: gcc.dg/tree-ssa/20040204-1.c === --- gcc.dg/tree-ssa/20040204-1.c (revision 179225) +++ gcc.dg/tree-ssa/20040204-1.c (working copy) @@ -33,5 +33,5 @@ void test55 (int x, int y) that the && should be emitted (based on BRANCH_COST). Fix this by teaching dom to look through && and register all components as true. */ -/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */ +/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ -- This patch is available for review at http://codereview.appspot.com/5139050
RE: [PATCH] Fix stack red zone bug (PR38644)
-static inline bool +extern bool static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif ... +/* Return true if the ABI allows red zone access. */ +extern bool static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + Uros, Thanks for your review. Accepted; the new patch is as below. No change for ChangeLog. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..e200974 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,7 +2631,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool +static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone + #undef TARGET_FUNCTION_VALUE #define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..1e64d14 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not clobbered by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? -220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false, which is safe and appropriate for most targets. +@end deftypefn + @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR On some architectures it can take multiple instructions to synthesize a constant. If there is another constant already in a register that diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6783826..0fa9e10 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -11223,6 +11223,22 @@ to the stack.
Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@hook TARGET_STACK_USING_RED_ZONE +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @hook TARGET_CONST_ANCHOR On some architectures it can take multiple
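As a cross-check of the rs6000 hunk above, the offset test the patch rewires onto the new hook can be sketched as stand-alone C. The 0/-220/-288 cushions and the V.4-has-no-red-zone rule come from the patch itself; the enum, parameter names, and use of `long` are illustrative, not GCC code.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the rs6000 ABI selectors.  */
enum abi { ABI_V4, ABI_AIX };

/* Sketch of offset_below_red_zone_p: V.4 has no cushion below the
   stack pointer, while the AIX ABIs guarantee 220 (32-bit) or
   288 (64-bit) bytes below it are not clobbered by signals.  */
bool
offset_below_red_zone_p (long offset, enum abi abi, bool is_32bit)
{
  long cushion = (abi == ABI_V4) ? 0 : (is_32bit ? -220 : -288);
  return offset < cushion;
}
```

With this shape, replacing the `DEFAULT_ABI == ABI_V4` test by `!TARGET_STACK_USING_RED_ZONE`, as the patch does, only changes who answers the "is there a cushion at all?" question, not the arithmetic.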
[arm-embedded] Backport mainline r173371
Backport mainline r173371 to ARM/embedded-4_6-branch. Committed. 2011-09-28 Terry Guo terry@arm.com Backport r173371 from mainline 2011-05-04 Andreas Krebbel andreas.kreb...@de.ibm.com * calls.c (emit_library_call_value_1): Invoke promote_function_mode hook on libcall arguments. * explow.c (promote_function_mode, promote_mode): Handle TYPE argument being NULL. * targhooks.c (default_promote_function_mode): Likewise. * config/s390/s390.c (s390_promote_function_mode): Likewise. * config/sparc/sparc.c (sparc_promote_function_mode): Likewise. * doc/tm.texi: Document that TYPE argument might be NULL.
RE: [PATCH] Fix stack red zone bug (PR38644)
Just realized ChangeLog needs to be changed as well. ChangeLog: * config/i386/i386.c (ix86_using_red_zone): Remove inline. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Wednesday, September 28, 2011 2:24 PM To: 'Uros Bizjak' Cc: gcc-patches@gcc.gnu.org; j...@suse.cz; geo...@geoffk.org; dje@gmail.com; r...@redhat.com; Richard Earnshaw; Matthew Gretton-Dann Subject: RE: [PATCH] Fix stack red zone bug (PR38644) -static inline bool +extern bool static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif ... +/* Return true if the ABI allows red zone access. */ +extern bool static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + Uros, Thanks for your review. Accepted; the new patch is as below. No change for ChangeLog. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..e200974 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,7 +2631,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool +static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone + #undef TARGET_FUNCTION_VALUE +#define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..1e64d14 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not clobbered by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? -220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions.
The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to
Re: [ARM] Optimise handling of neon_vget_high/low
Ramana Radhakrishnan ramana.radhakrish...@linaro.org writes: On 14 September 2011 13:30, Richard Sandiford richard.sandif...@linaro.org wrote: neon_vget_high and neon_vget_low extract one half of a vector. The patterns look like: (define_insn "neon_vget_highv16qi" [(set (match_operand:V8QI 0 "s_register_operand" "=w") (vec_select:V8QI (match_operand:V16QI 1 "s_register_operand" "w") (parallel [(const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) (const_int 14) (const_int 15)])))] "TARGET_NEON" { int dest = REGNO (operands[0]); int src = REGNO (operands[1]); if (dest != src + 2) return "vmov\t%P0, %f1"; else return ""; } [(set_attr "neon_type" "neon_bp_simple")] ) But there's nothing here to tell the register allocator what's expected of it, so we do often get the move. The patch below makes the patterns expand to normal subreg moves instead. Unfortunately, when I first tried this, I ran across some bugs in simplify-rtx.c. They should be fixed now. Of course, I can't guarantee that there are other similar bugs elsewhere, but I'll try to fix any that crop up. The new patterns preserve the current treatment on big-endian targets. Namely, the low half is always in the lower-numbered registers (subreg byte offset 0). Tested on arm-linux-gnueabi. OK to install? This is OK. Please watch out for any fallout that comes as a result of this. It would be useful to do some big endian testing at some point but I'm not going to let that hold up this patch. While you are here can you look at the quad_halves_<code>v4si etc. patterns, and the neon_move_lo_quad_<mode> and neon_move_hi_quad_<mode> patterns, which seem to have the same problem? Ah, yeah, thanks for the pointer. Tested on arm-linux-gnueabi, and on benchmarks which should (and did) benefit from it. OK to install? Richard gcc/ * config/arm/neon.md (neon_move_lo_quad_<mode>): Delete. (neon_move_hi_quad_<mode>): Likewise. (move_hi_quad_<mode>, move_lo_quad_<mode>): Use subreg moves.
Index: gcc/config/arm/neon.md === --- gcc/config/arm/neon.md 2011-09-27 09:27:18.000000000 +0100 +++ gcc/config/arm/neon.md 2011-09-27 09:45:26.008275971 +0100 @@ -1251,66 +1251,14 @@ (define_insn "quad_halves_<code>v16qi" (const_string "neon_int_1") (const_string "neon_int_5")))] ) -; FIXME: We wouldn't need the following insns if we could write subregs of -; vector registers. Make an attempt at removing unnecessary moves, though -; we're really at the mercy of the register allocator. - -(define_insn "neon_move_lo_quad_<mode>" - [(set (match_operand:ANY128 0 "s_register_operand" "+w") -(vec_concat:ANY128 - (match_operand:V_HALF 1 "s_register_operand" "w") - (vec_select:V_HALF - (match_dup 0) - (match_operand:ANY128 2 "vect_par_constant_high" ""))))] - "TARGET_NEON" -{ - int dest = REGNO (operands[0]); - int src = REGNO (operands[1]); - - if (dest != src) -return "vmov\t%e0, %P1"; - else -return ""; -} - [(set_attr "neon_type" "neon_bp_simple")] -) - -(define_insn "neon_move_hi_quad_<mode>" - [(set (match_operand:ANY128 0 "s_register_operand" "+w") -(vec_concat:ANY128 - (vec_select:V_HALF - (match_dup 0) - (match_operand:ANY128 2 "vect_par_constant_low" "")) - (match_operand:V_HALF 1 "s_register_operand" "w")))] - - "TARGET_NEON" -{ - int dest = REGNO (operands[0]); - int src = REGNO (operands[1]); - - if (dest != src) -return "vmov\t%f0, %P1"; - else -return ""; -} - [(set_attr "neon_type" "neon_bp_simple")] -) - (define_expand "move_hi_quad_<mode>" [(match_operand:ANY128 0 "s_register_operand" "") (match_operand:V_HALF 1 "s_register_operand" "")] "TARGET_NEON" { - rtvec v = rtvec_alloc (<V_mode_nunits>/2); - rtx t1; - int i; - - for (i=0; i < (<V_mode_nunits>/2); i++) - RTVEC_ELT (v, i) = GEN_INT (i); - - t1 = gen_rtx_PARALLEL (<MODE>mode, v); - emit_insn (gen_neon_move_hi_quad_<mode> (operands[0], operands[1], t1)); - + emit_move_insn (simplify_gen_subreg (<V_HALF>mode, operands[0], <MODE>mode, + GET_MODE_SIZE (<V_HALF>mode)), + operands[1]); DONE; }) @@ -1319,16 +1267,9 @@ (define_expand "move_lo_quad_<mode>" (match_operand:V_HALF 1 "s_register_operand" "")]
"TARGET_NEON" { - rtvec v = rtvec_alloc (<V_mode_nunits>/2); - rtx t1; - int i; - - for (i=0; i < (<V_mode_nunits>/2); i++) - RTVEC_ELT (v, i) = GEN_INT ((<V_mode_nunits>/2) + i); - - t1 = gen_rtx_PARALLEL (<MODE>mode, v); - emit_insn (gen_neon_move_lo_quad_<mode> (operands[0], operands[1], t1)); - + emit_move_insn (simplify_gen_subreg (<V_HALF>mode, operands[0], + <MODE>mode, 0), + operands[1]); DONE; })
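In plain C terms (a sketch of the semantics, not of the generated code), vget_high of a V16QI is just bytes 8..15 and vget_low is bytes 0..7 on a little-endian target — which is exactly why a subreg at byte offset 8 (or 0) expresses it, and why no vmov is needed at all when the register allocator places both halves well. Function names here are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Model of neon_vget_highv16qi: the high half of a 16-byte vector
   is its upper 8 bytes, i.e. a subreg at byte offset 8.  */
void
vget_high_v16qi (uint8_t dest[8], const uint8_t src[16])
{
  memcpy (dest, src + 8, 8);
}

/* Model of neon_vget_lowv16qi: the low half is byte offset 0.  */
void
vget_low_v16qi (uint8_t dest[8], const uint8_t src[16])
{
  memcpy (dest, src, 8);
}
```

On big-endian the message notes the treatment is preserved: the low half stays in the lower-numbered registers (subreg byte offset 0).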
Re: [PATCH] Fix stack red zone bug (PR38644)
On Wed, Sep 28, 2011 at 8:42 AM, Jiangning Liu jiangning@arm.com wrote: Just realized ChangeLog needs to be changed as well. ChangeLog: * config/i386/i386.c (ix86_using_red_zone): Remove inline. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. OK for x86, but you need approval from middle-end maintainer for the patch. Thanks, Uros.
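The hazard that the sched-deps.c change guards against can be shown with a toy model (nothing below is GCC code; names and numbers are illustrative): a store addressed relative to the stack pointer writes a different absolute slot depending on whether the scheduler moved it across a stack-pointer decrement, and on a target with no red zone the memory below the old stack pointer may be clobbered by a signal at any moment.

```c
/* Toy model of the PR38644 hazard.  A store to sp-relative memory
   targets slot sp-1.  If the scheduler moves that store across a
   "sp -= 4" stack adjustment, the slot it actually writes changes,
   which is only safe when the ABI guarantees a red zone.  */
int
slot_written (int sp, int adjust_first)
{
  if (adjust_first)
    sp -= 4;          /* the stack adjustment was scheduled first */
  return sp - 1;      /* slot targeted by the sp-relative store */
}
```

With sp = 16 the two orders write slots 15 and 11; since the results differ, a middle end that has no red-zone guarantee must treat the stack adjustment as a scheduling barrier, which is what the proposed hook makes explicit.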
Re: Intrinsics for N2965: Type traits and base classes
On 28 September 2011 04:22, Michael Spertus wrote: Benjamin, I think tuple is wrong both for performance reasons (I believe these are likely to be serious enough to depress use due to inordinately long compiles) and because it prematurely locks us into a rigid choice of how our typelists are implemented. My inclination is to make it type-independent by returning an unspecified type that can have a sequence of types extracted from it (this is the approach taken by boost::mpl and has loads of experience that shows it is a good approach to metaprogramming). In other words, first<bases<A>>::type would be the first base of A, etc. Citing Boost MPL as a good way to avoid inordinately long compiles ... interesting! Have you ever tried to reduce a GCC bug report from 20k lines to 20, because most Boost libs include every MPL header?! I hope we can get a simple typelist _without_ needing everything else in MPL, such as the apply and lambda metafunctions (and maybe a lot of that could be massively simplified using variadic templates anyway.)
Re: [PATCH 1/2] Change random seeds to 64bit and drop re-crcing
On Tue, Sep 27, 2011 at 11:30 PM, Andi Kleen a...@firstfloor.org wrote: From: Andi Kleen a...@linux.intel.com I had some trouble with random build failures in a large LTO project and it turned out to be random seed collisions in a highly parallel build (thanks to Honza for suggesting that) There were multiple problems: - The way to generate the random seed is pure (milliseconds plus pid) - It's only 32bit - Several users take the existing ascii seed and re-CRC32 it again, which doesn't exactly improve it. This patch changes that to: - Always use 64bit seeds as numbers (no re-crcing) - Change all users to use HOST_WIDE_INT - When the user specifies a random seed it's still crc32ed, but only in this case. Passes bootstrap + testsuite on x86_64-linux. Ok to commit? Ok if there are no objections from other folks within 24h. Thanks, Richard. -Andi gcc/cp: 2011-09-26 Andi Kleen a...@linux.intel.com * repo.c (finish_repo): Use HOST_WIDE_INT_PRINT_HEX_PURE. gcc/: 2011-09-26 Andi Kleen a...@linux.intel.com * hwint.h (HOST_WIDE_INT_PRINT_HEX_PURE): Add. * lto-streamer.c (lto_get_section_name): Remove crc32_string. Handle numerical random seed. * lto-streamer.h (lto_file_decl_data): Change id to unsigned HOST_WIDE_INT. * toplev.c (random_seed): Add. (init_random_seed): Change for numerical random seed. (get_random_seed): Return as HOST_WIDE_INT. (set_random_seed): Crc32 existing string. * toplev.h (get_random_seed): Change to numerical return. * tree.c (get_file_function_name): Remove CRC. Handle numerical random seed. gcc/lto/: 2011-09-26 Andi Kleen a...@linux.intel.com * lto.c (lto_resolution_read): Remove id dumping. (lto_section_with_id): Turn id HOST_WIDE_INT. (create_subid_section_table): Ditto.
--- gcc/cp/repo.c | 3 ++- gcc/hwint.h | 1 + gcc/lto-streamer.c | 8 gcc/lto-streamer.h | 2 +- gcc/lto/lto.c | 11 --- gcc/toplev.c | 23 --- gcc/toplev.h | 2 +- gcc/tree.c | 6 +++--- 8 files changed, 28 insertions(+), 28 deletions(-) diff --git a/gcc/cp/repo.c b/gcc/cp/repo.c index 16a192e..ca971b6 100644 --- a/gcc/cp/repo.c +++ b/gcc/cp/repo.c @@ -263,7 +263,8 @@ finish_repo (void) anonymous namespaces will get the same mangling when this file is recompiled. */ if (!strstr (args, "'-frandom-seed=")) - fprintf (repo_file, " '-frandom-seed=%s'", get_random_seed (false)); + fprintf (repo_file, " '-frandom-seed=" HOST_WIDE_INT_PRINT_HEX_PURE "'", + get_random_seed (false)); fprintf (repo_file, "\n"); } diff --git a/gcc/hwint.h b/gcc/hwint.h index 2643aee..f5e0bee 100644 --- a/gcc/hwint.h +++ b/gcc/hwint.h @@ -102,6 +102,7 @@ extern char sizeof_long_long_must_be_8[sizeof(long long) == 8 ? 1 : -1]; #define HOST_WIDE_INT_PRINT_DEC_C HOST_WIDE_INT_PRINT_DEC HOST_WIDE_INT_PRINT_C #define HOST_WIDE_INT_PRINT_UNSIGNED "%" HOST_WIDE_INT_PRINT "u" #define HOST_WIDE_INT_PRINT_HEX "%#" HOST_WIDE_INT_PRINT "x" +#define HOST_WIDE_INT_PRINT_HEX_PURE "%" HOST_WIDE_INT_PRINT "x" /* Set HOST_WIDEST_INT. This is a 64-bit type unless the compiler in use has no 64-bit type at all; in that case it's 32 bits. */ diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c index 633c3ce..e3ccb79 100644 --- a/gcc/lto-streamer.c +++ b/gcc/lto-streamer.c @@ -166,13 +166,13 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d doesn't confuse the reader with merged sections. For options don't add an ID, the option reader cannot deal with them - and merging should be ok here. - - XXX: use crc64 to minimize collisions? */ + and merging should be ok here. */ if (section_type == LTO_section_opts) strcpy (post, ""); + else if (f != NULL) + sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id); + else - sprintf (post, ".%x", f ? f->id : crc32_string (0, get_random_seed (false))); + sprintf (post, "."
HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL); } diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index 190d6a3..2564bd2 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -552,7 +552,7 @@ struct GTY(()) lto_file_decl_data struct lto_file_decl_data *next; /* Sub ID for merged objects. */ - unsigned id; + unsigned HOST_WIDE_INT id; /* Symbol resolutions for this file */ VEC(ld_plugin_symbol_resolution_t,heap) * GTY((skip)) resolutions; diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c index 0b1dcb9..77eb1a1 100644 --- a/gcc/lto/lto.c +++ b/gcc/lto/lto.c @@ -976,9 +976,6 @@ lto_resolution_read (splay_tree file_ids, FILE *resolution, lto_file *file) } file_data = (struct lto_file_decl_data *)nd->value; - if
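The new `HOST_WIDE_INT_PRINT_HEX_PURE` macro is hex with no leading `0x`, suitable for splicing into format strings as the patch does for `-frandom-seed=` and LTO section-name suffixes. A stand-alone equivalent, assuming a 64-bit HOST_WIDE_INT and standard `<inttypes.h>` macros (the `format_seed` helper is made up for illustration):

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Stand-alone analogue of the patch's macro: hex, no "0x" prefix,
   assuming HOST_WIDE_INT is 64 bits wide.  */
#define HOST_WIDE_INT_PRINT_HEX_PURE "%" PRIx64

/* Illustrative helper: format a seed the way finish_repo now does,
   by string-concatenating the macro into the format.  */
void
format_seed (char *buf, size_t n, uint64_t seed)
{
  snprintf (buf, n, "-frandom-seed=" HOST_WIDE_INT_PRINT_HEX_PURE, seed);
}
```

The point of the macro being "pure" is exactly this splicing: adjacent string literals concatenate at compile time, so the prefix and quoting stay under the caller's control.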
Re: [PATCH] Fix stack red zone bug (PR38644)
On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 27, 2011 3:41 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu jiangning@arm.com wrote: Think of it this way. What the IR says is there is no barrier between those moves. You either have an implicit barrier (which is what you are proposing) or you have it explicitly. I think we all rather have more things explicit rather than implicit in the IR. And that has been the overall feeling for a few years now. Sorry, I'm afraid I can't agree with you. Instead, I think using a barrier to describe this kind of dependence is a kind of implicit method. Please note that this is not a usual data dependence issue. The stack pointer change doesn't have any dependence with memory access at all. It is similar to atomic instructions that require being an optimization/memory barrier. Sure it is not a usual data dependence (otherwise we would handle it already), so the targets have to explicitly express the dependence somehow, for which we only have barriers right now. Richard, Thanks for your explanation. It's explicit to the back-end, while it's implicit to the scheduler in the middle end, because a barrier can decide dependence in the scheduler but a barrier can be generated from several different scenarios. It's unsafe and prone to introducing bugs if any one of the scenarios requiring generating barriers is missed in the back-end. Between middle-end and back-end, we should have interfaces that are easy for the back-end to implement. After all, the middle-end by itself doesn't constitute a compiler, and vice versa. The back-end needs the middle-end's help to make sure the back-end is easy to implement and to reduce the possibility of introducing bugs.
Without an explicit hook as I'm proposing, back-end implementers have to know all barrier-generating scenarios very clearly, because there aren't any code tips or comments in the middle end telling the back-end the list of all scenarios on generating barriers. Yes, a barrier is a perfect interface for the scheduler in theory. But from an engineering point of view, I think it's better to explicitly define an interface to describe stack red zone and inform the back-end, or vice versa. Unlike computers, people easily make mistakes if you don't tell them. On this bug, the fact is that over the years different back-ends made similar mistakes. GCC is really a perfect platform for building new ports, and I saw a lot of new back-ends. The current middle end is unsafe if a port doesn't support a stack red zone and the back-end doesn't generate a barrier for it. Why can't we explicitly clarify this in compiler code between middle end and back end? What if any other back-end (new or old) NOT supporting stack red zone exposes a similar bug again? There are a gazillion things you have to explicitly get right in your backends, so I don't see why exposing proper scheduling barriers should be special, and there, why red-zones should be special (as opposed to other occasions where you need to emit barriers from the backend for the scheduler). Richard. Thanks, -Jiangning Richard. No matter what the solution itself is, the problem itself is quite a common one at the ISA level, so we should solve it in the middle-end, because the middle-end is shared for all ports. My proposal avoids problems in the future. Any new ports and new back-end implementations needn't explicitly define this hook, in a very safe way. But if one port wants to unsafely introduce a red zone, we can explicitly define this hook in the back-end. This way, we would catch the bug at the earliest time. Do you really want to hide this problem in the back-end silently? And wait for others to report bugs and then waste time getting them fixed again?
The fact I see is that over the years different ports reported similar issues around this, and somebody tried to fix them. Actually, this kind of problem should be fixed at the design level. If the first person to solve this bug had given a solution in the middle end, we would not need to waste time filing those bugs in Bugzilla and fixing them one by one. Thanks, -Jiangning
[patch, testsuite, ARM] Skip architecture option in pr42575.c
2011-09-28 Joey Ye joey...@arm.com * gcc.target/arm/pr42575.c: Remove architecture option. Index: gcc/testsuite/gcc.target/arm/pr42575.c === --- gcc/testsuite/gcc.target/arm/pr42575.c (revision 179308) +++ gcc/testsuite/gcc.target/arm/pr42575.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-options "-O2 -march=armv7-a" } */ +/* { dg-options "-O2" } */ /* Make sure RA does good job allocating registers and avoids unnecessary moves. */ /* { dg-final { scan-assembler-not "mov" } } */
[PATCH, PR50485, committed] Initialize variable in sse4_1-blendps{,-2}.c with random floats
Committed as obvious. Thanks, - Tom 2011-09-28 Tom de Vries t...@codesourcery.com PR testsuite/50485 * gcc.target/i386/sse4_1-blendps.c: Include stdlib.h. (TEST): Initialize src3 with random floats. * gcc.target/i386/sse4_1-blendps-2.c (sse4_1_test): Remove field i from union src3. Initialize src3 with random floats. Index: gcc/testsuite/gcc.target/i386/sse4_1-blendps.c === --- gcc/testsuite/gcc.target/i386/sse4_1-blendps.c (revision 179043) +++ gcc/testsuite/gcc.target/i386/sse4_1-blendps.c (working copy) @@ -14,6 +14,7 @@ #include <smmintrin.h> #include <string.h> +#include <stdlib.h> #define NUM 20 @@ -66,6 +67,9 @@ TEST (void) init_blendps (src1.f, src2.f); + for (i = 0; i < 4; i++) +src3.f[i] = (int) random (); + /* Check blendps imm8, m128, xmm */ for (i = 0; i < NUM; i++) { Index: gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c === --- gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c (revision 179043) +++ gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c (working copy) @@ -53,14 +53,13 @@ sse4_1_test (void) { __m128 x; float f[4]; - int i[4]; } src3; int i; init_blendps (src1.f, src2.f); for (i = 0; i < 4; i++) -src3.i[i] = (int) random (); +src3.f[i] = (int) random (); /* Check blendps imm8, m128, xmm */ for (i = 0; i < NUM; i++)
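The bug being fixed is worth spelling out: src3 was filled through the union's int member but consumed through the float member, so the float lanes held reinterpreted bit patterns (often denormals) rather than the intended values. A minimal reproduction of the type-punning effect, with the union layout taken from the test and values chosen for illustration:

```c
/* Same union shape as src3 in sse4_1-blendps-2.c (minus __m128).
   Writing through .i and reading through .f reinterprets the bits:
   the int 1 becomes the smallest positive denormal, not 1.0f.  */
union vec
{
  float f[4];
  int i[4];
};

float
read_as_float (int value)
{
  union vec u;
  u.i[0] = value;
  return u.f[0];
}
```

Union type punning is well-defined in C, so the original test compiled cleanly; it just exercised blendps on meaningless float lanes, which is why the fix assigns through `src3.f` instead.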
RE: [PATCH] Fix stack red zone bug (PR38644)
-Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 4:39 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 27, 2011 3:41 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu jiangning@arm.com wrote: Think of it this way. What the IR says is there is no barrier between those moves. You either have an implicit barrier (which is what you are proposing) or you have it explicitly. I think we all rather have more things explicit rather than implicit in the IR. And that has been the overall feeling for a few years now. Sorry, I'm afraid I can't agree with you. Instead, I think using barrier to describe this kind of dependence is a kind of implicit method. Please note that this is not an usual data dependence issue. The stack pointer change doesn't have any dependence with memory access at all. It is similar to atomic instructions that require being an optimization/memory barrier. Sure it is not a usual data dependence (otherwise we would handle it already), so the targets have to explicitly express the dependence somehow, for which we only have barriers right now. Richard, Thanks for your explanation. It's explicit to back-end, while it's implicit to scheduler in middle end, because barrier can decide dependence in scheduler but barrier can be generated from several different scenarios. It's unsafe and prone to introduce bug if any one of the scenarios requiring generating barriers is being missed in back-end. Between middle-end and back-end, we should have interfaces that is easy to be implemented by back-end. 
After all, middle-end itself can't consist of a compiler, and vice versa. Back-end needs middle-end's help to make sure back-end is easy to be implemented and reduce the possibility of introducing bugs. Without an explicit hook as I'm proposing, back-end implementers have to clearly know all scenarios of generating barrier very clearly, because there isn't any code tips and comments in middle end telling back-end the list of all scenarios on generating barriers. Yes, barrier is a perfect interface for scheduler in theory. But from engineering point of view, I think it's better to explicitly define an interface to describe stack red zone and inform back-end, or vice versa. Not like computer, people is easy to make mistake if you don't tell them. On this bug, the fact is over the years different back-ends made similar bugs. GCC is really a perfect platform on building new ports, and I saw a lot of new back-ends. The current middle end is unsafe, if port doesn't support stack red zone and back-ends doesn't generate barrier for it. Why can't we explicitly clarify this in compiler code between middle end and back end? What if any other back-end (new or old) NOT supporting stack red zone exposing the similar bug again? There are gazillion things you have to explicitly get right in your backends, so I don't see why exposing proper scheduling barriers should be special, and there, why red-zones should be special (as opposed to other occasions where you need to emit barriers from the backend for the scheduler). Richard, This is because, 1) Current scheduler is unsafe if back-end doesn't generate barrier for a port which doesn't support stack red zone at all. 2) Implementing barrier in back-end is a burden to new back-end implementation for ports not supporting stack red zone at all. 3) There are many other ports not reporting bugs around this. I doubt there isn't bug for them. 4) There are over 300 TARGET HOOKS being defined in target.def. 
I don't think adding this interface is hurting GCC. BTW, really appreciate your close attention to this specific issue. Thanks, -Jiangning Richard. No matter what solution itself is, the problem itself is quite a common one at the ISA level, so we should solve it in the middle-end, because the middle-end is shared by all ports. My proposal avoids problems in the future. Any new ports and new back-end implementations needn't explicitly define this hook in a very safe way. But if one port wants to unsafely introduce a red zone, we can explicitly define this hook in the back-end. This way, we would avoid the bug at the earliest time. Do you really want to hide this problem in the back-end silently? And wait for others to report the bug and then waste time getting it fixed again? The facts I see is over the years,
Re: [PATCH] Fix stack red zone bug (PR38644)
On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu jiangning@arm.com wrote: [quoted thread snipped] I don't argue that your solution is not acceptable, just your reasoning is bogus IMHO. 1) is a target bug, 2) huh, I doubt that this is the biggest issue one faces when implementing a new target, 3) I'm sure there are very many latent backend bugs not related to red-zone. The middle-end isn't safe by default either if you have bogus instruction patterns in your .md file, or you generate bogus IL from the va-arg gimplification hook. A target bug is a target bug, the middle-end can't and should not do anything to try to workaround bugs in targets. Richard.
[PATCH, PR50527] Don't assume alignment of vla-related allocas.
Richard, I got a patch for PR50527. The patch prevents the alignment of vla-related allocas from being set to BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding the alloca. Bootstrapped and regtested on x86_64. OK for trunk? Thanks, - Tom

2011-09-27 Tom de Vries t...@codesourcery.com

	* tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for vla-related
	allocas.
	* gcc.dg/pr50527.c: New test.

Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c (revision 179210)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -1632,6 +1632,8 @@ evaluate_stmt (gimple stmt)
 	  break;

 	case BUILT_IN_ALLOCA:
+	  if (gimple_call_alloca_for_var_p (stmt))
+	    break;
 	  val.lattice_val = CONSTANT;
 	  val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
 	  val.mask = shwi_to_double_int

Index: gcc/testsuite/gcc.dg/pr50527.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr50527.c (revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-Os --param large-stack-frame=30" } */
+
+extern void abort (void);
+
+void __attribute__((noinline))
+bar (char *a)
+{
+}
+
+void __attribute__((noinline))
+foo (char *a, int b)
+{
+}
+
+void __attribute__((noinline))
+test_align (char *p, int aligned, unsigned int mask)
+{
+  int p_aligned = ((unsigned long int)p & mask) == 0;
+  if (aligned != p_aligned)
+    abort ();
+}
+
+int
+main ()
+{
+  const int kIterations = 4;
+  char results[kIterations];
+  int i;
+  unsigned int mask;
+
+  mask = 0xf;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x7;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x3;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x1;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+
+  bar (results);
+  for (i = 0; i < kIterations; i++)
+    foo ("%d ", results[i]);
+
+  return 0;
+}
RE: [PATCH] Fix stack red zone bug (PR38644)
-Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 5:20 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) [quoted thread snipped] I don't argue that your solution is not acceptable, just your reasoning is bogus IMHO. 1) is a target bug, Why should back-ends handle a thing that doesn't exist at all for them? I don't see clear logic here. 2) huh, I doubt that this is the biggest issue one faces when implementing a new target, I never say it is a
Re: [patch, testsuite, ARM] Skip architecture option in pr42575.c
On 28 September 2011 09:48, Joey Ye joey...@arm.com wrote: 2011-09-28 Joey Ye joey...@arm.com * gcc.target/arm/pr42575.c: Remove architecture option. What happens if this test is run with a multilib of march=armv5te ? Ramana

Index: gcc/testsuite/gcc.target/arm/pr42575.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr42575.c (revision 179308)
+++ gcc/testsuite/gcc.target/arm/pr42575.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -march=armv7-a" } */
+/* { dg-options "-O2" } */
 /* Make sure RA does good job allocating registers and avoids
    unnecessary moves. */
 /* { dg-final { scan-assembler-not "mov" } } */
Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries tom_devr...@mentor.com wrote: Richard, I got a patch for PR50527. The patch prevents the alignment of vla-related allocas to be set to BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding the alloca. Bootstrapped and regtested on x86_64. OK for trunk? Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that the vectorizer then will no longer see that the arrays are properly aligned. I'm not sure what the best thing to do is here, other than trying to record the alignment requirement of the VLA somewhere. Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT has the issue that it will force stack-realignment which isn't free (and the point was to make the decl cheaper than the alloca). But that might possibly be the better choice. Any other thoughts? Thanks, Richard. Thanks, - Tom 2011-09-27 Tom de Vries t...@codesourcery.com * tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for vla-related allocas. * gcc.dg/pr50527.c: New test.
Re: [PATCH] Fix stack red zone bug (PR38644)
On Wed, Sep 28, 2011 at 11:40 AM, Jiangning Liu jiangning@arm.com wrote: [quoted thread snipped] Why should back-ends handle a thing that doesn't exist at all for them? I don't see clear logic here. I don't think this is the case here. You need
Re: Go patch committed: Update libgo to Go release r60
Ian Lance Taylor i...@google.com writes: Rainer Orth r...@cebitec.uni-bielefeld.de writes: Solaris 8 and 9 suffer from the same problem. The following patch allowed the bootstrap to complete. An IRIX bootstrap is currently running, but will take some time to complete. Rainer 2011-09-23 Rainer Orth r...@cebitec.uni-bielefeld.de * mksysinfo.sh: Provide TIOCSCTTY if missing. Thanks. I committed this patch, though I moved the new lines farther down in the script next to the handling of the ioctl constants. Thanks, I'd missed that. It turned out that IRIX 6 needs one more change to return to bootstrap land: sys/termios.h only defines TIOCNOTTY if !_XOPEN_SOURCE, which we need for other stuff (cf. configure.ac). I've cheated and used sys/ttold.h instead, which doesn't have this check. With this patch, a Go-only IRIX 6.5 bootstrap completed successfully. Rainer 2011-09-28 Rainer Orth r...@cebitec.uni-bielefeld.de * mksysinfo.sh [__sgi__]: Include sys/ttold.h.

# HG changeset patch
# Parent 4530aeaf12a2b1576a7bf67de9cf5569719107c6
Provide TIOCNOTTY on IRIX 6

diff --git a/libgo/mksysinfo.sh b/libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh
+++ b/libgo/mksysinfo.sh
@@ -36,9 +36,12 @@ cat > sysinfo.c <<EOF
 #include <netinet/in.h>
 /* netinet/tcp.h needs u_char/u_short, but sys/bsd_types is only
    included by netinet/in.h if _SGIAPI (i.e. _SGI_SOURCE
-   && !_XOPEN_SOURCE.  */
+   && !_XOPEN_SOURCE.
+   sys/termios.h only defines TIOCNOTTY if !_XOPEN_SOURCE, while
+   sys/ttold.h does so unconditionally.  */
 #ifdef __sgi__
 #include <sys/bsd_types.h>
+#include <sys/ttold.h>
 #endif
 #include <netinet/tcp.h>
 #include <signal.h>

-- Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [RFC] Context sensitive inline analysis
Jan Hubicka hubi...@ucw.cz writes: the problem is sign overflow in time computation. Time should be capped by MAX_TIME and we compute MAX_TIME * INLINE_SIZE_SCALE * 2. This happens to be between 2^31 and 2^32, so we overflow here because of the use of signed arithmetic.

Index: ipa-inline-analysis.c
===================================================================
--- ipa-inline-analysis.c (revision 179266)
+++ ipa-inline-analysis.c (working copy)
@@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
 /* Estimate runtime of function can easilly run into huge numbers with many
    nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
    For anything larger we use gcov_type.  */
-#define MAX_TIME 100
+#define MAX_TIME 50
 /* Number of bits in integer, but we really want to be stable across different
    hosts.  */

Could you update the comment too? (time * INLINE_SIZE_SCALE * 2) Richard
Re: Use of vector instructions in memmov/memset expanding
Attached is part 1 of the patch that enables the use of vector instructions in memset and memcopy (the middle-end part). The main part of the changes is in the functions move_by_pieces/set_by_pieces. In the new version the algorithm of move-mode selection was changed – now it checks whether the alignment is known at compile time and uses cost models to choose between aligned and unaligned, vector and non-vector move modes. Build and 'make check' were tested; in 'make check' there is one failure, which will be cured when the complete patch is applied. On 27 September 2011 18:44, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: I divided the patch into three smaller ones: 1) Patch with target-independent changes (see attached file memfunc-mid.patch). The main part of the changes is in functions move_by_pieces/set_by_pieces. In the new version the algorithm of move-mode selection was changed – now it checks whether the alignment is known at compile time and uses cost models to choose between aligned and unaligned, vector and non-vector move modes. 2) Patch with target-dependent changes (memfunc-be.patch). The main part of the changes is in functions ix86_expand_setmem/ix86_expand_movmem. The other changes are only needed to support it. The changes mostly touched the unrolled_loop strategy – now vector move modes can be used here. That resulted in large epilogues and prologues, so their generation also was modified. This patch contains some changes in the middle end (to make the build possible), but all these changes are present in the first patch, so there is no need to review them here. 3) Patch with all new tests (memfunc-tests.patch). This patch contains a lot of small tests for different memset and memcopy cases. Separately from each other, these patches won't give a performance gain. The positive effect will be noticeable only if they are applied together (I attach the complete patch also - see file memfunc-complete.patch). If you have any questions regarding these changes, please don't hesitate to ask them. 
On 18 July 2011 15:00, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Here is a summary - probably, it doesn't cover every single piece in the patch, but I tried to describe the major changes. I hope this will help you a bit - and of course I'll answer your further questions if they appear. The changes could be logically divided into two parts (though, these parts have something in common). The first part is changes in target-independent part, in functions move_by_pieces() and store_by_pieces() - mostly located in expr.c. The second part touches ix86_expand_movmem() and ix86_expand_setmem() - mostly located in config/i386/i386.c. Changes in i386.c (target-dependent part): 1) Strategies for cases with known and unknown alignment are separated from each other. When alignment is known at compile time, we could generate optimized code without libcalls. When it's unknown, we sometimes could create runtime-checks to reach desired alignment, but not always. Strategies for atom and generic_32, generic_64 were chosen according to set of experiments, strategies in other cost models are unchanged (strategies for unknown alignment are copied from existing strategies). 2) unrolled_loop algorithm was modified - now it uses SSE move-modes, if they're available. 3) As size of data, moved in one iteration, greatly increased, and epilogues became bigger - so some changes were needed in epilogue generation. In some cases a special loop (not unrolled) is generated in epilogue to avoid slow copying by bytes (changes in expand_set_or_movmem_via_loop() and introducing of expand_set_or_movmem_via_loop_with_iter() is made for these cases). 4) As bigger alignment might be needed than previously, prologue generation was also modified. Changes in expr.c (target-independent part): There are two possible strategies now: use of aligned and unaligned moves. For each of them a cost model was implemented and the choice is made according to the cost of each option. 
Move-mode choice is made by functions widest_mode_for_unaligned_mov() and widest_mode_for_aligned_mov(). Cost estimation is implemented in functions compute_aligned_cost() and compute_unaligned_cost(). Choice between these two strategies and the generation of moves themselves are in function move_by_pieces(). Function store_by_pieces() calls set_by_pieces_1() instead of store_by_pieces_1(), if this is memset-case (I needed to introduce set_by_pieces_1 to separate memset-case from others - store_by_pieces_1 is sometimes called for strcpy and some other functions, not only for memset). Set_by_pieces_1() estimates costs of aligned and unaligned strategies (as in move_by_pieces() ) and generates moves for memset. Single move is generated via generate_move_with_mode(). If it's called first time, a promoted value (register, filled with one-byte value of memset argument) is generated - later calls reuse this value. Changes in
Re: [ARM] Optimise handling of neon_vget_high/low
Tested on arm-linux-gnueabi, and on benchmarks which should (and did) benefit from it. OK to install? OK. Ramana
Re: Use of vector instructions in memmov/memset expanding
Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: Intrinsics for N2965: Type traits and base classes
Don't worry, I'm not suggesting including boost::mpl at all, just leaving the return type of the bases trait unspecified. IMO, your example illustrates my point that without performance tuning, compiling metaprograms can be prohibitively expensive, so I want to avoid running the tuple metaprogram that creates the fields when we never need to instantiate the type. Benchmarks soon. Mike On 9/28/2011 2:53 AM, Jonathan Wakely wrote: On 28 September 2011 04:22, Michael Spertus wrote: Benjamin, I think tuple is wrong both for performance reasons (I believe these are likely to be serious enough to depress use due to inordinately long compiles) and because it prematurely locks us into a rigid choice of how our typelists are implemented. My inclination is to make it type-independent by returning an unspecified type that can have a sequence of types extracted from it (this is the approach taken by boost::mpl and has loads of experience that shows it is a good approach to metaprogramming). In other words, first<bases<A>>::type would be the first base of A, etc. Citing Boost MPL as a good way to avoid inordinately long compiles ... interesting! Have you ever tried to reduce a GCC bug report from 20k lines to 20, because most Boost libs include every MPL header?! I hope we can get a simple typelist _without_ needing everything else in MPL, such as the apply and lambda metafunctions (and maybe a lot of that could be massively simplified using variadic templates anyway.)
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote: Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? Especially when glibc these days has very well optimized implementations tuned for various CPUs and it is very unlikely beneficial to inline memcpy/memset if they aren't really short or have unknown number of iterations. Jakub
Support for V2 plugin API
Hi, this patch adds support for the V2 plugin API (thanks, Cary) that adds LDPR_PREVAILING_DEF_IRONLY_EXP. The resolution is like LDPR_PREVAILING_DEF_IRONLY but the symbol is exported out of the DSO. It is up to the compiler to optimize it out or keep it based on the knowledge whether the symbol can be optimized out at all (i.e. most COMDATs can, other types can't). This solves quite a few problems with building C++ apps, see the PR log. I was originally wrong about the gold implementation being buggy. The problem turned out to be a subtle lto-symtab bug that was mostly latent because of the COMDAT hack we use. lto_symtab_resolve_symbols is supposed to honor the plugin decision when it is available, but it doesn't when the resolution of the very first entry in the list is UNKNOWN. This can happen because we add into the symtab also declarations that are not in varpool (i.e. they are neither defined nor used by the object file), but they are used otherwise, i.e. referred to from stuff used for debug info or TB devirtualization. To ensure backward compatibility I am keeping the COMDAT hack in place. It won't help letting the compiler know the plugin API version, since we decide on that at the time we output object files and thus we are not called from the plugin. I suppose we could keep the hack in place for the next release and remove it afterwards, penalizing builds with old binutils? Or perhaps even in GCC 4.7 if GNU LD gets updated in time. Bootstrapped/regtested x86_64-linux, built Mozilla, and lto-bootstrap in progress. OK if it passes? Honza PR lto/47247 * lto-plugin.c (get_symbols_v2): New variable. (write_resolution): Use V2 API when available. (onload): Handle LDPT_GET_SYMBOLS_V2. * lto-symtab.c (lto_symtab_resolve_symbols): Do not resolve when resolution is already available from plugin. (lto_symtab_merge_decls_1): Handle LDPR_PREVAILING_DEF_IRONLY_EXP. * cgraph.c (ld_plugin_symbol_resolution): Add prevailing_def_ironly_exp. * lto-cgraph.c (LDPR_NUM_KNOWN): Update. 
* ipa.c (varpool_externally_visible_p): IRONLY variables are never externally visible. * varasm.c (resolution_to_local_definition_p): Add LDPR_PREVAILING_DEF_IRONLY_EXP. (resolution_local_p): Likewise. * common.c (lto_resolution_str): Add new resolution. * common.h (lto_resolution_str): Likewise. Index: lto-plugin/lto-plugin.c === *** lto-plugin/lto-plugin.c (revision 179274) --- lto-plugin/lto-plugin.c (working copy) *** enum symbol_style *** 129,135 static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; ! static ld_plugin_get_symbols get_symbols; static ld_plugin_register_cleanup register_cleanup; static ld_plugin_add_input_file add_input_file; static ld_plugin_add_input_library add_input_library; --- 129,135 static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; ! static ld_plugin_get_symbols get_symbols, get_symbols_v2; static ld_plugin_register_cleanup register_cleanup; static ld_plugin_add_input_file add_input_file; static ld_plugin_add_input_library add_input_library; *** write_resolution (void) *** 441,447 struct plugin_symtab *symtab = info->symtab; struct ld_plugin_symbol *syms = symtab->syms; ! get_symbols (info->handle, symtab->nsyms, syms); finish_conflict_resolution (symtab, info->conflicts); --- 441,452 struct plugin_symtab *symtab = info->symtab; struct ld_plugin_symbol *syms = symtab->syms; ! /* Version 2 of API supports IRONLY_EXP resolution that is ! accepted by GCC-4.7 and newer. */ ! if (get_symbols_v2) ! get_symbols_v2 (info->handle, symtab->nsyms, syms); ! else ! 
get_symbols (info->handle, symtab->nsyms, syms); finish_conflict_resolution (symtab, info->conflicts); *** onload (struct ld_plugin_tv *tv) *** 986,991 --- 991,999 case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK: register_all_symbols_read = p->tv_u.tv_register_all_symbols_read; break; + case LDPT_GET_SYMBOLS_V2: + get_symbols_v2 = p->tv_u.tv_get_symbols; + break; case LDPT_GET_SYMBOLS: get_symbols = p->tv_u.tv_get_symbols; break; Index: gcc/lto-symtab.c === *** gcc/lto-symtab.c (revision 179274) --- gcc/lto-symtab.c (working copy) *** lto_symtab_resolve_symbols (void **slot) *** 441,452 e->node = cgraph_get_node (e->decl); else if (TREE_CODE (e->decl) == VAR_DECL) e->vnode = varpool_get_node (e->decl); } - e =
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote: Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? Especially when glibc these days has very well optimized implementations tuned for various CPUs, and it is very unlikely to be beneficial to inline memcpy/memset if they aren't really short or have an unknown number of iterations. I guess we should update the expansion tables so we produce function calls more often. I will look at how things behave on my setup. Do you know the glibc version numbers when the optimized string functions were introduced? Concerning inline SSE, I think it makes a lot of sense when we know the size and alignment, so we can output just a few SSE moves instead of more integer moves. We definitely need some numbers for the loop variants. Honza
Re: Use of vector instructions in memmov/memset expanding
This expanding only works on relatively small sizes (up to 4k), where the overhead of a library call can be quite significant. In some cases the new implementation gives 5x acceleration (especially on small sizes, less than ~256 bytes). On almost all sizes from 16 to 4096 bytes there is some gain; on average it's 20-30% on 64 bits and 40-50% on 32 bits (on Atom). This inlining implementation isn't intended to replace glibc; it's intended to replace the old implementation, which sometimes is quite slow. If glibc calls turn out to be faster than this expanding, a libcall is generated (special experiments were carried out to find threshold values in the cost models). If the size is unknown at all, this inlining doesn't work (i.e. glibc is called). On 28 September 2011 15:55, Jakub Jelinek ja...@redhat.com wrote: On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote: Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? Especially when glibc these days has very well optimized implementations tuned for various CPUs and it is very unlikely beneficial to inline memcpy/memset if they aren't really short or have unknown number of iterations. Jakub -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: Support for V2 plugin API
On Wed, 28 Sep 2011, Jan Hubicka wrote: Hi, this patch adds support for the V2 plugin API (thanks, Cary) that adds LDPR_PREVAILING_DEF_IRONLY_EXP. The resolution is like LDPR_PREVAILING_DEF_IRONLY, but the symbol is exported out of the DSO. It is up to the compiler to optimize it out or keep it, based on the knowledge of whether the symbol can be optimized out at all (i.e. most COMDATs can, other types can't). This solves quite a few problems with building C++ apps, see the PR log. I was originally wrong about the gold implementation being buggy. The problem turned out to be a subtle lto-symtab bug that was mostly latent because of the COMDAT hack we use. lto_symtab_resolve_symbols is supposed to honor the plugin decision when it is available, but it doesn't when the resolution of the very first entry in the list is UNKNOWN. This can happen because we also add into the symtab declarations that are not in varpool (i.e. they are neither defined nor used by the object file), but are used otherwise, i.e. referred to from stuff used for debug info or TB devirtualization. To ensure backward compatibility I am keeping the COMDAT hack in place. It won't help to let the compiler know the plugin API version, since we decide on that at the time we output object files and thus we are not called from the plugin. I suppose we could keep the hack in place for the next release and remove it afterwards, penalizing builds with old binutils? Or perhaps even in GCC 4.7 if GNU LD gets updated in time. Bootstrapped/regtested x86_64-linux, built Mozilla and lto-bootstrap in progress. OK if it passes? Ok. Any idea when GNU ld will catch up? Thanks, Richard. Honza PR lto/47247 * lto-plugin.c (get_symbols_v2): New variable. (write_resolution): Use V2 API when available. (onload): Handle LDPT_GET_SYMBOLS_V2. * lto-symtab.c (lto_symtab_resolve_symbols): Do not resolve when resolution is already available from plugin. (lto_symtab_merge_decls_1): Handle LDPR_PREVAILING_DEF_IRONLY_EXP. 
* cgraph.c (ld_plugin_symbol_resolution): Add prevailing_def_ironly_exp. * lto-cgraph.c (LDPR_NUM_KNOWN): Update. * ipa.c (varpool_externally_visible_p): IRONLY variables are never externally visible. * varasm.c (resolution_to_local_definition_p): Add LDPR_PREVAILING_DEF_IRONLY_EXP. (resolution_local_p): Likewise. * common.c (lto_resolution_str): Add new resolution. * common.h (lto_resolution_str): Likewise. Index: lto-plugin/lto-plugin.c === *** lto-plugin/lto-plugin.c (revision 179274) --- lto-plugin/lto-plugin.c (working copy) *** enum symbol_style *** 129,135 static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; ! static ld_plugin_get_symbols get_symbols; static ld_plugin_register_cleanup register_cleanup; static ld_plugin_add_input_file add_input_file; static ld_plugin_add_input_library add_input_library; --- 129,135 static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; ! static ld_plugin_get_symbols get_symbols, get_symbols_v2; static ld_plugin_register_cleanup register_cleanup; static ld_plugin_add_input_file add_input_file; static ld_plugin_add_input_library add_input_library; *** write_resolution (void) *** 441,447 struct plugin_symtab *symtab = info->symtab; struct ld_plugin_symbol *syms = symtab->syms; ! get_symbols (info->handle, symtab->nsyms, syms); finish_conflict_resolution (symtab, info->conflicts); --- 441,452 struct plugin_symtab *symtab = info->symtab; struct ld_plugin_symbol *syms = symtab->syms; ! /* Version 2 of API supports IRONLY_EXP resolution that is ! accepted by GCC-4.7 and newer. */ ! if (get_symbols_v2) ! get_symbols_v2 (info->handle, symtab->nsyms, syms); ! else ! 
get_symbols (info->handle, symtab->nsyms, syms); finish_conflict_resolution (symtab, info->conflicts); *** onload (struct ld_plugin_tv *tv) *** 986,991 --- 991,999 case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK: register_all_symbols_read = p->tv_u.tv_register_all_symbols_read; break; + case LDPT_GET_SYMBOLS_V2: + get_symbols_v2 = p->tv_u.tv_get_symbols; + break; case LDPT_GET_SYMBOLS: get_symbols = p->tv_u.tv_get_symbols; break; Index: gcc/lto-symtab.c === *** gcc/lto-symtab.c (revision 179274) --- gcc/lto-symtab.c (working copy) *** lto_symtab_resolve_symbols (void **slot) ***
Updated random seed patchkit
I addressed all review comments, clarified some code. The random seed generation in gcc.c is now fixed too and toplev.c detects this case and doesn't rerun the CRC. Repasses bootstrap and testsuite on x86_64-linux. Since the previous version was approved I will commit in 24h, unless there are new objections. -Andi
[PATCH 2/3] Use urandom to get random seed
From: Andi Kleen a...@linux.intel.com When available, use /dev/urandom to get the random seed. This will lower the probability of collisions. On other systems it will fall back to the old methods. Passes bootstrap + testsuite on x86_64. Ok? gcc/: 2011-09-26 Andi Kleen a...@linux.intel.com * toplev.c (init_local_tick): Try reading random seed from /dev/urandom. --- gcc/toplev.c | 12 +++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/gcc/toplev.c b/gcc/toplev.c index 78583fc..ab6b5a4 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -262,7 +262,17 @@ init_local_tick (void) { if (!flag_random_seed) { - /* Get some more or less random data. */ + /* Try urandom first. Time of day is too likely to collide. + In case of any error we just use the local tick. */ + + int fd = open ("/dev/urandom", O_RDONLY); + if (fd >= 0) + { + read (fd, &random_seed, sizeof (random_seed)); + close (fd); + } + + /* Now get the tick anyways */ #ifdef HAVE_GETTIMEOFDAY { struct timeval tv; -- 1.7.5.4
[PATCH 3/3] Use urandom in gcc.c too
From: Andi Kleen a...@linux.intel.com gcc also generates a random number in some special circumstances, so teach it about /dev/urandom too. gcc/: 2011-09-27 Andi Kleen a...@linux.intel.com * gcc.c (get_local_tick): Rename to get_random_number. Read from /dev/urandom. Add getpid call. (compare_debug_dump_opt_spec_function): Drop getpid call. --- gcc/gcc.c | 22 -- 1 files changed, 16 insertions(+), 6 deletions(-) diff --git a/gcc/gcc.c b/gcc/gcc.c index ddec8db..3bfdf77 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -8062,12 +8062,22 @@ print_asm_header_spec_function (int arg ATTRIBUTE_UNUSED, return NULL; } -/* Compute a timestamp to initialize flag_random_seed. */ +/* Get a random number for -frandom-seed */ -static unsigned -get_local_tick (void) +static unsigned HOST_WIDE_INT +get_random_number (void) { - unsigned ret = 0; + unsigned HOST_WIDE_INT ret = 0; + int fd; + + fd = open ("/dev/urandom", O_RDONLY); + if (fd >= 0) + { + read (fd, &ret, sizeof (HOST_WIDE_INT)); + close (fd); + if (ret) + return ret; + } /* Get some more or less random data. */ #ifdef HAVE_GETTIMEOFDAY @@ -8086,7 +8096,7 @@ get_local_tick (void) } #endif - return ret; + return ret ^ getpid(); } /* %:compare-debug-dump-opt spec function. Save the last argument, @@ -8145,7 +8155,7 @@ compare_debug_dump_opt_spec_function (int arg, if (!which) { - unsigned HOST_WIDE_INT value = get_local_tick () ^ getpid (); + unsigned HOST_WIDE_INT value = get_random_number (); sprintf (random_seed, HOST_WIDE_INT_PRINT_HEX, value); } -- 1.7.5.4
Re: Use of vector instructions in memmov/memset expanding
Do you know glibc version numbers when the optimized string functions was introduced? Afaik, it's 2.13. I also compared my implementation to 2.13.
Re: Use of vector instructions in memmov/memset expanding
Do you know the glibc version numbers when the optimized string functions were introduced? Afaik, it's 2.13. I also compared my implementation to 2.13. I wonder if we can assume that most GCC 4.7 based systems will be glibc 2.13 based, too. I would tend to say yes, and thus would suggest to tame down inlining that is no longer profitable on newer glibcs, with a note in changes.html... (I worry about the tables in i386.c deciding what strategy to use for a block of given size. This is more or less unrelated to the actual patch.) Honza
Re: [PATCH 2/3] Use urandom to get random seed
On Wed, Sep 28, 2011 at 2:49 PM, Andi Kleen a...@firstfloor.org wrote: From: Andi Kleen a...@linux.intel.com When available, use /dev/urandom to get the random seed. This will lower the probability of collisions. On other systems it will fall back to the old methods. Passes bootstrap + testsuite on x86_64. Ok? gcc/: 2011-09-26 Andi Kleen a...@linux.intel.com * toplev.c (init_local_tick): Try reading random seed from /dev/urandom. --- gcc/toplev.c | 12 +++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/gcc/toplev.c b/gcc/toplev.c index 78583fc..ab6b5a4 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -262,7 +262,17 @@ init_local_tick (void) { if (!flag_random_seed) { - /* Get some more or less random data. */ + /* Try urandom first. Time of day is too likely to collide. + In case of any error we just use the local tick. */ + + int fd = open ("/dev/urandom", O_RDONLY); + if (fd >= 0) + { + read (fd, &random_seed, sizeof (random_seed)); I suppose we might get interrupted before anything is read, and read can return -1 (I suppose partial reads are quite unlikely, though)? Thus, don't we need the usual EINTR loop? If not, the patch is ok. Thanks, Richard. + close (fd); + } + + /* Now get the tick anyways */ #ifdef HAVE_GETTIMEOFDAY { struct timeval tv; -- 1.7.5.4
Re: Vector shuffling
On Thu, Sep 15, 2011 at 8:05 PM, Richard Henderson r...@redhat.com wrote: +The elements of the input vectors are numbered from left to right across +one or both of the vectors. Each element in the mask specifies a number +of element from the input vector(s). Consider the following example. It would be more preferable to talk about the memory ordering of the elements rather than left and right, which are ambiguous at best. + if (TREE_CODE (mask) == VECTOR_CST) + { + tree m_type, call; + tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type); + /*rtx t;*/ Leftover crap. Fixed. + + if (!fn) + goto vshuffle; + + if (m_type != TREE_TYPE (TREE_TYPE (mask))) + { + int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)); + tree cvt = build_vector_type (m_type, units); + mask = fold_convert (cvt, mask); + } + + call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn); + call = build_call_nary (type /* ? */, call, 3, v0, v1, mask); + + return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL); + } + +vshuffle: + gcc_assert (operand_equal_p (v0, v1, 0)); Why can't a non-constant shuffle have different V0 and V1? That seems like a direct violation of the documentation, and any sort of usefulness. Ok, I agree. The reason why this assert is here is that no one in the middle end generates code that does not meet this assert. In principle we definitely want to support it in the upcoming patches, but it would be nice to start with a simple thing. Also, while I'm ok with the use of builtin_vec_perm here in the short term, I think that in the long term we should simply force the named pattern to handle constants. Then the vectorizer can simply use the predicate and the tree code and we can drop the large redundancy of builtins with different argument types. Indeed, once this patch is applied, I think that ought to be the very next task in this domain. +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR <v0, v1, maks> Typo in mask. 
+ foreach i in length (mask): + A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]] Surely it's v1[mask[i] - length]. + if (TREE_CODE (vect) == VECTOR_CST) + { + unsigned i; Indentation is off all through this function. Fixed. + mask = gen_rtx_AND (maskmode, mask, mm); + + /* Convert mask to vector of chars. */ + mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0); + mask = force_reg (V16QImode, mask); Why are you using force_reg to do all the dirty work? Seems to me this should be using expand_normal. All throughout this function. That would also avoid the need for all of the extra force_reg stuff that ought not be there for -O0. I don't really understand this. As far as I know, expand_normal converts a tree to an rtx. All my computations are happening at the level of rtx, and force_reg is needed just to bring an rtx expression into a register of the correct mode. If I am missing something, could you give an example of how I can use expand_normal instead of force_reg in this particular code? I also see that you're not even attempting to use xop_pperm. As I said, I am happy to experiment with the cases where v0 != v1 in the upcoming patches. Let's just start with a simple thing and see what kind of issues/problems it brings. Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1? My personal feeling is that with v0 != v1 it may be more efficient to perform piecewise shuffling rather than bitwise dances around the masks. And give the vshuffle named pattern the wrong number of arguments? Ok, if I make vshuffle accept only two arguments -- vector and mask -- would that be ok? It's certainly possible to handle it, though it takes a few more steps, and might well be more efficient as a libgcc function rather than inline. I don't really understand why it could be more efficient. I thought that inlining gives more chances to the final RTL optimisation. r~ Thanks, Artem.
Re: Use of vector instructions in memmov/memset expanding
(I worry about the tables in i386.c deciding what strategy to use for block of given size. This is more or less unrelated to the actual patch) Yep, the threshold values I mentioned above are the values in these tables. Even with fast glibcs there are some cases where inlining is profitable (e.g. if the alignment is known at compile time). On 28 September 2011 16:54, Jan Hubicka hubi...@ucw.cz wrote: Do you know glibc version numbers when the optimized string functions were introduced? Afaik, it's 2.13. I also compared my implementation to 2.13. I wonder if we can assume that most of GCC 4.7 based systems will be glibc 2.13 based, too. I would tend to say that yes and thus would suggest to tame down inlining that is no longer profitable on newer glibcs with a note in changes.html... (I worry about the tables in i386.c deciding what strategy to use for block of given size. This is more or less unrelated to the actual patch) Honza -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: Go patch committed: Update libgo to Go release r60
Rainer Orth r...@cebitec.uni-bielefeld.de writes: Thanks, I'd missed that. It turned out that IRIX 6 needs one more change to return to bootstrap land: sys/termios.h only defines TIOCNOTTY if !_XOPEN_SOURCE, which we need for other stuff (cf. configure.ac). I've cheated and use sys/ttold.h instead, which doesn't have this check. With this patch, a Go-only IRIX 6.5 bootstrap completed successfully. Thanks. Committed to mainline. Ian
Re: Intrinsics for N2965: Type traits and base classes
OK. Here are some simple benchmarks. I simulated heavy use of reflection with 1000 classes that each had about a thousand base classes. I also created a super-simple typelist class: template<typename... T> struct typelist {}; // Variadic templates rock If bases returns a typelist, the program takes about 4 sec. If bases returns a tuple, the program takes about 4 min. If I make the program any bigger, the tuple case fails to compile with spurious error messages, while the typelist version stays quick. Given that metaprograms typically create large class hierarchies (look at Alexandrescu's CreateScatterHierarchy that he uses to implement factory in the Modern C++ Design book) and that compile times are an enormous obstacle to metaprogramming, I don't think these tests are at all ridiculous. I think this shows we need to return a typelist instead of a tuple. As I mentioned earlier, I could just return the typelist, or hide it by returning an unspecified type (which would actually be a typelist) that you would apply a first and a rest template to walk through. This would give us more flexibility for the future (e.g., if a standard typelist type is adopted. Likewise, we would be covered if we wanted to change the bases implementation in the future to return an associative container. For example, if using size<grep<A, bases<E>::type>>::value to count the number of occurrences of A as a base class of E turns out to be useful). Thanks, Mike On 9/28/2011 6:54 AM, Mike Spertus wrote: Don't worry, I'm not suggesting including boost::mpl at all, just leaving the return type of the bases trait unspecified. IMO, your example illustrates my point that without performance tuning, compiling metaprograms can be prohibitively expensive, so I want to avoid running the tuple metaprogram that creates the fields when we never need to instantiate the type. Benchmarks soon. 
Mike On 9/28/2011 2:53 AM, Jonathan Wakely wrote: On 28 September 2011 04:22, Michael Spertus wrote: Benjamin, I think tuple is wrong both for performance reasons (I believe these are likely to be serious enough to depress use due to inordinately long compiles) and because it prematurely locks us into a rigid choice of how our typelists are implemented. My inclination is to make it type-independent by returning an unspecified type that can have a sequence of types extracted from it (this is the approach taken by boost::mpl, and there is lots of experience showing it is a good approach to metaprogramming). In other words, first<bases<A>>::type would be the first base of A, etc. Citing Boost MPL as a good way to avoid inordinately long compiles ... interesting! Have you ever tried to reduce a GCC bug report from 20k lines to 20, because most Boost libs include every MPL header?! I hope we can get a simple typelist _without_ needing everything else in MPL, such as the apply and lambda metafunctions (and maybe a lot of that could be massively simplified using variadic templates anyway.)
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 02:56:30PM +0400, Michael Zolotukhin wrote: Attached is part 1 of the patch that enables use of vector instructions in memset and memcpy (middle-end part). The main part of the changes is in the functions move_by_pieces/set_by_pieces. In the new version, the algorithm of move-mode selection was changed: now it checks whether the alignment is known at compile time and uses cost models to choose between aligned and unaligned, vector or non-vector move modes. Michael, It appears that part 1 of the patch wasn't really attached. Jack -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: [PATCH, ARM] Unaligned accesses for builtin memcpy [2/2]
On 6 May 2011 14:13, Julian Brown jul...@codesourcery.com wrote: Hi, This is the second of two patches to add unaligned-access support to the ARM backend. It builds on the first patch to provide support for unaligned accesses when expanding block moves (i.e. for builtin memcpy operations). It makes some effort to use load/store multiple instructions where appropriate (when accessing sufficiently-aligned source or destination addresses), and also makes some effort to generate fast code (for -O1/2/3) or small code (for -Os), though some of the heuristics may still need tweaking. Sorry it's taken me a while to get around to this one. Do you know what difference this makes to performance on some standard benchmarks on, let's say, an A9 and an M4, as I see that this gets triggered only when we have less than 64 bytes to copy? Please add a few testcases from the examples that you've shown here to be sure that ldm's are being generated in the right cases. cheers Ramana
Re: Use of vector instructions in memmov/memset expanding
It appears that part 1 of the patch wasn't really attached. Thanks, resending. memfunc-mid.patch Description: Binary data
[PATCH] Fix PR50460
This extends try_move_mult_to_index folding to be less picky about the outermost array reference and handles a component-ref of an array the same as if it were wrapped in an array-ref with the minimum index. This consolidates array-ref reconstruction at the frontend-closest position we have right now (eventually that function should move to some c-family helper, I guess). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-09-28 Richard Guenther rguent...@suse.de PR middle-end/50460 * fold-const.c (try_move_mult_to_index): Handle &a.array the same as &a.array[0]. Index: gcc/fold-const.c === *** gcc/fold-const.c (revision 179308) --- gcc/fold-const.c (working copy) *** try_move_mult_to_index (location_t loc, *** 6870,6875 --- 6870,6929 break; } + else if (TREE_CODE (ref) == COMPONENT_REF + && TREE_CODE (TREE_TYPE (ref)) == ARRAY_TYPE) + { + tree domain; + + /* Remember if this was a multi-dimensional array. */ + if (TREE_CODE (TREE_OPERAND (ref, 0)) == ARRAY_REF) + mdim = true; + + domain = TYPE_DOMAIN (TREE_TYPE (ref)); + if (! domain) + continue; + itype = TREE_TYPE (domain); + + step = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (ref))); + if (TREE_CODE (step) != INTEGER_CST) + continue; + + if (s) + { + if (! tree_int_cst_equal (step, s)) + continue; + } + else + { + /* Try if delta is a multiple of step. */ + tree tmp = div_if_zero_remainder (EXACT_DIV_EXPR, op1, step); + if (! tmp) + continue; + delta = tmp; + } + + /* Only fold here if we can verify we do not overflow one + dimension of a multi-dimensional array. 
*/ + if (mdim) + { + tree tmp; + + if (!TYPE_MIN_VALUE (domain) + || !TYPE_MAX_VALUE (domain) + || TREE_CODE (TYPE_MAX_VALUE (domain)) != INTEGER_CST) + continue; + + tmp = fold_binary_loc (loc, PLUS_EXPR, itype, + fold_convert_loc (loc, itype, + TYPE_MIN_VALUE (domain)), + fold_convert_loc (loc, itype, delta)); + if (TREE_CODE (tmp) != INTEGER_CST + || tree_int_cst_lt (TYPE_MAX_VALUE (domain), tmp)) + continue; + } + + break; + } else mdim = false; *** try_move_mult_to_index (location_t loc, *** 6892,6903 pos = TREE_OPERAND (pos, 0); } ! TREE_OPERAND (pos, 1) = fold_build2_loc (loc, PLUS_EXPR, itype, ! fold_convert_loc (loc, itype, ! TREE_OPERAND (pos, 1)), ! fold_convert_loc (loc, itype, delta)); ! ! return fold_build1_loc (loc, ADDR_EXPR, TREE_TYPE (addr), ret); } --- 6946,6974 pos = TREE_OPERAND (pos, 0); } ! if (TREE_CODE (ref) == ARRAY_REF) ! { ! TREE_OPERAND (pos, 1) ! = fold_build2_loc (loc, PLUS_EXPR, itype, ! fold_convert_loc (loc, itype, TREE_OPERAND (pos, 1)), ! fold_convert_loc (loc, itype, delta)); ! return fold_build1_loc (loc, ADDR_EXPR, TREE_TYPE (addr), ret); ! } ! else if (TREE_CODE (ref) == COMPONENT_REF) ! { ! gcc_assert (ret == pos); ! ret = build4_loc (loc, ARRAY_REF, TREE_TYPE (TREE_TYPE (ref)), ret, ! fold_build2_loc ! (loc, PLUS_EXPR, itype, ! fold_convert_loc (loc, itype, ! TYPE_MIN_VALUE (TYPE_DOMAIN (TREE_TYPE (ref)))), ! fold_convert_loc (loc, itype, delta)), ! NULL_TREE, NULL_TREE); ! return build_fold_addr_expr_loc (loc, ret); ! } ! else ! gcc_unreachable (); }
Re: Scalar vector binary operation
On Wed, Aug 10, 2011 at 7:44 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, Aug 9, 2011 at 10:23 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Sorry, I didn't attach the patch itself. Here we go, in the attachment. I have committed the patch after re-bootstrapping and testing it on x86_64-unknown-linux-gnu with {,-m32}. This is new functionality for gcc's vector extension, and I think it should be mentioned at http://gcc.gnu.org/gcc-4.7/changes.html. Does somebody want to take care of that? Ian
Commit: RX: Fix problems building libgcc
Hi Guys, I am applying the patch below to fix a couple of problems building libgcc for the RX target. The first is that when 32-bit doubles are enabled we need to make sure that we never try to construct a 64-bit double type. This is done in rx-lib.h, but it was only being enabled when constructing the float-type functions. The patch enables the fix when double-type functions are being constructed as well. The second fix is to prevent an extraneous renaming of the floatsisf and floatunsisf functions when building libgcc with 32-bit doubles enabled. Cheers Nick libgcc/ChangeLog 2011-09-28 Nick Clifton ni...@redhat.com * config/rx/rx-lib.h: Always restrict doubles to the SF type when 64-bit doubles are not enabled. * config/rx/rx-abi.h: Fix extraneous renaming of the floatsisf and floatunsisf functions. Index: libgcc/config/rx/rx-lib.h === --- libgcc/config/rx/rx-lib.h (revision 179307) +++ libgcc/config/rx/rx-lib.h (working copy) @@ -1,6 +1,5 @@ -#ifdef FLOAT #ifndef __RX_64BIT_DOUBLES__ #define DF SF #define FLOAT_ONLY #endif -#endif + Index: libgcc/config/rx/rx-abi.h === --- libgcc/config/rx/rx-abi.h (revision 179307) +++ libgcc/config/rx/rx-abi.h (working copy) @@ -80,16 +80,7 @@ #endif -#ifdef L_si_to_sf -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatsisf, CONV32sf) -#endif -#ifdef L_usi_to_sf -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatunsisf, CONV32uf) -#endif - - - #ifdef __RX_64BIT_DOUBLES__ /* Float (32-bit) aliases... */ @@ -176,6 +167,14 @@ #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (negdf2, NEGd) #endif +#ifdef L_si_to_sf +#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatsisf, CONV32sf) +#endif + +#ifdef L_usi_to_sf +#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatunsisf, CONV32uf) +#endif + /* The 64-bit comparison functions do not have aliases because libgcc2 does not provide them. Instead they have to be supplied in rx-abi-functions.c. */
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
Hi Tom What's the behavior of your patch to the following case typedef int int_unaligned __attribute__((aligned(1))); int foo (int_unaligned *p) { return *p; } thanks Carrot On Tue, Sep 20, 2011 at 7:13 PM, Tom de Vries vr...@codesourcery.com wrote: Hi Richard, I have a patch for PR43814. It introduces an option that assumes that function arguments of pointer type are aligned, and uses that information in tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined. I tested the patch successfully on-by-default on x86_64 and i686 (both gcc only builds). I also tested the patch on-by-default for ARM (gcc/glibc build). The patch generated wrong code for uselocale.c: ... glibc/locale/locale.h: ... /* This value can be passed to `uselocale' and may be returned by it. Passing this value to any other function has undefined behavior. */ # define LC_GLOBAL_LOCALE ((__locale_t) -1L) ... glibc/locale/uselocale.c: ... locale_t __uselocale (locale_t newloc) { locale_t oldloc = _NL_CURRENT_LOCALE; if (newloc != NULL) { const locale_t locobj = newloc == LC_GLOBAL_LOCALE ? _nl_global_locale : newloc; ... The assumption that function arguments of pointer type are aligned, allowed the test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false. But the usage of ((__locale_t) -1L) as function argument in uselocale violates that assumption. Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run without regressions for ARM. Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM, discussed here: - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html But, since glibc uses this construct currently, the option is off-by-default for now. OK for trunk? Thanks, - Tom 2011-09-20 Tom de Vries t...@codesourcery.com PR target/43814 * tree-ssa-ccp.c (get_align_value): New function, factored out of get_value_from_alignment. (get_value_from_alignment): Use get_align_value. 
(get_value_for_expr): Use get_align_value to handle alignment of function argument pointers. * common.opt (faligned-pointer-argument): New option. * doc/invoke.texi (Optimization Options): Add -faligned-pointer-argument. (-faligned-pointer-argument): New item. * gcc/testsuite/gcc.dg/pr43814.c: New test. * gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.
Re: Vector Comparison patch
On Mon, Sep 26, 2011 at 5:43 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Sep 26, 2011 at 4:25 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers jos...@codesourcery.com wrote: This looks like it has the same issue with maybe needing to use TYPE_MAIN_VARIANT in type comparisons as the shuffle patch. I don't think so; we move qualifiers to the vector type from the element type in make_vector_type, and the tests only look at the component type. I am re-testing the patch currently and will commit it if that succeeds. Unfortunately gcc.c-torture/execute/vector-compare-1.c fails with -m32 for vector (2, double) d0; vector (2, double) d1; vector (2, long) idres; d0 = (vector (2, double)){(double)argc, 10.}; d1 = (vector (2, double)){0., (double)-23}; idres = (d0 > d1); as apparently the type we chose to assign to (d0 > d1) is different from that of idres: /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long int' from type '__vector(2) long long int'^M Adjusting it to vector (2, long long) otoh yields, for -m64: /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int' But those two types are at least compatible from their modes. Joseph, should we accept mode-compatible types in assignments or maybe transparently convert them? Looks like we have a more suitable solution for these automatically generated vector types - mark them with TYPE_VECTOR_OPAQUE. I'm testing the following incremental patch. Richard. Index: gcc/c-typeck.c === --- gcc/c-typeck.c.orig 2011-09-28 16:22:10.0 +0200 +++ gcc/c-typeck.c 2011-09-28 16:18:39.0 +0200 @@ -9928,8 +9928,10 @@ build_binary_op (location_t location, en } /* Always construct signed integer vector type. 
*/ - intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0); - result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0)); + intt = c_common_type_for_size (GET_MODE_BITSIZE + (TYPE_MODE (TREE_TYPE (type0))), 0); + result_type = build_opaque_vector_type (intt, + TYPE_VECTOR_SUBPARTS (type0)); converted = 1; break; } @@ -10063,8 +10065,10 @@ build_binary_op (location_t location, en } /* Always construct signed integer vector type. */ - intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0); - result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0)); + intt = c_common_type_for_size (GET_MODE_BITSIZE + (TYPE_MODE (TREE_TYPE (type0))), 0); + result_type = build_opaque_vector_type (intt, + TYPE_VECTOR_SUBPARTS (type0)); converted = 1; break; }
[Patch, Fortran] Add c_float128{,_complex} as GNU extension to ISO_C_BINDING
This patch makes the GCC extension __float128 (_Complex) available in the C bindings via C_FLOAT128 and C_FLOAT128_COMPLEX. Additionally, I have improved the diagnostic for explicitly use associating -std= versioned symbols. And I have finally added the iso*.def files to the makefile dependencies. As usual, with -std=f2008, the GNU extensions are not loaded. I have also updated the documentation. OK for the trunk? Tobias PS: If you think that C_FLOAT128/C_FLOAT128_COMPLEX are bad names for C's __float128, please speak up before gfortran - and other compilers implement it. (At least one vendor is implementing __float128 support and plans to modify ISO_C_BINDING.) The proper name would be C___FLOAT128, but that looks awkward! 2011-09-28 Tobias Burnus bur...@net-b.de * Make-lang.in (F95_PARSER_OBJS, GFORTRAN_TRANS_DEPS): Add dependency on iso-c-binding.def and iso-fortran-env.def. * module.c (import_iso_c_binding_module): Add error when explicitly importing a nonstandard symbol; extend standard- depending loading. * iso-c-binding.def: Add c_float128 and c_float128_complex integer parameters (for -std=gnu). * intrinsic.texi (ISO_C_Binding): Document them. * symbol.c (generate_isocbinding_symbol): Change macros to ignore GFC_STD_* data. * trans-types.c (gfc_init_c_interop_kinds): Ditto; make nonstatic and renamed from init_c_interop_kinds. (gfc_init_kinds): Don't call it * trans-types.h (gfc_init_c_interop_kinds): Add prototype. * f95-lang.c (gfc_init_decl_processing): Call it. 2011-09-28 Tobias Burnus bur...@net-b.de * gfortran.dg/iso_c_binding_param_1.f90: New. * gfortran.dg/iso_c_binding_param_2.f90: New. * gfortran.dg/c_sizeof_2.f90: Update dg-error. 
diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in index 0816458..b766da6 100644 --- a/gcc/fortran/Make-lang.in +++ b/gcc/fortran/Make-lang.in @@ -329,14 +329,16 @@ $(F95_PARSER_OBJS): fortran/gfortran.h fortran/libgfortran.h \ fortran/parse.h fortran/arith.h fortran/target-memory.h \ $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(TM_P_H) coretypes.h \ $(RTL_H) $(TREE_H) $(TREE_DUMP_H) $(GGC_H) $(EXPR_H) \ - $(FLAGS_H) output.h $(DIAGNOSTIC_H) errors.h $(FUNCTION_H) + $(FLAGS_H) output.h $(DIAGNOSTIC_H) errors.h $(FUNCTION_H) \ + fortran/iso-c-binding.def fortran/iso-fortran-env.def fortran/openmp.o: pointer-set.h $(TARGET_H) toplev.h GFORTRAN_TRANS_DEPS = fortran/gfortran.h fortran/libgfortran.h \ fortran/intrinsic.h fortran/trans-array.h \ fortran/trans-const.h fortran/trans-const.h fortran/trans.h \ fortran/trans-stmt.h fortran/trans-types.h \ -$(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(TM_H) coretypes.h $(GGC_H) +$(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(TM_H) coretypes.h $(GGC_H) \ +fortran/iso-c-binding.def fortran/iso-fortran-env.def fortran/f95-lang.o: $(GFORTRAN_TRANS_DEPS) fortran/mathbuiltins.def \ gt-fortran-f95-lang.h gtype-fortran.h $(CGRAPH_H) $(TARGET_H) fortran/cpp.h \ diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c index 648831f..8f8dd7d 100644 --- a/gcc/fortran/f95-lang.c +++ b/gcc/fortran/f95-lang.c @@ -595,6 +595,7 @@ gfc_init_decl_processing (void) /* Set up F95 type nodes. 
*/ gfc_init_kinds (); gfc_init_types (); + gfc_init_c_interop_kinds (); } diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 54e0b20..1bd5ec3 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -610,8 +610,8 @@ iso_fortran_env_symbol; #undef NAMED_DERIVED_TYPE #define NAMED_INTCST(a,b,c,d) a, -#define NAMED_REALCST(a,b,c) a, -#define NAMED_CMPXCST(a,b,c) a, +#define NAMED_REALCST(a,b,c,d) a, +#define NAMED_CMPXCST(a,b,c,d) a, #define NAMED_LOGCST(a,b,c) a, #define NAMED_CHARKNDCST(a,b,c) a, #define NAMED_CHARCST(a,b,c) a, diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi index 9adeeab..a093bec 100644 --- a/gcc/fortran/intrinsic.texi +++ b/gcc/fortran/intrinsic.texi @@ -13006,7 +13006,9 @@ type default integer, which can be used as KIND type parameters. In addition to the integer named constants required by the Fortran 2003 standard, GNU Fortran provides as an extension named constants for the 128-bit integer types supported by the C compiler: @code{C_INT128_T, -C_INT_LEAST128_T, C_INT_FAST128_T}. +C_INT_LEAST128_T, C_INT_FAST128_T}. Furthermore, if @code{__float} is +supported in C, the named constants @code{C_FLOAT128, C_FLOAT128_COMPLEX} +are defined. @multitable @columnfractions .15 .35 .35 .35 @item Fortran Type @tab Named constant @tab C type@tab Extension @@ -13036,9 +13038,11 @@ C_INT_LEAST128_T, C_INT_FAST128_T}. @item @code{REAL} @tab @code{C_FLOAT} @tab @code{float} @item @code{REAL} @tab @code{C_DOUBLE}@tab @code{double} @item @code{REAL} @tab @code{C_LONG_DOUBLE} @tab @code{long double} +@item @code{REAL} @tab @code{C_FLOAT128} @tab @code{__float128}@tab Ext. @item @code{COMPLEX}@tab
Commit: RX: Add support for MIN and MAX instructions in QI and HI modes
Hi Guys, I am going to apply the patch below to the RX backend to add support for generating MIN and MAX instructions for HI and QI modes. Cheers Nick gcc/ChangeLog 2011-09-28 Nick Clifton ni...@redhat.com * config/rx/predicates.md (rx_minmax_operand): New predicate. Accepts immediates and a restricted subset of MEMs. * config/rx/rx.md (int_modes): New iterator. (smaxsi3, sminsi3): Delete and replace with... (smaxint_mode3, sminint_mode3): New patterns. (umax3_u, umax3_ur, umax3, umin3): New patterns. Index: gcc/config/rx/predicates.md === --- gcc/config/rx/predicates.md (revision 179307) +++ gcc/config/rx/predicates.md (working copy) @@ -72,6 +72,16 @@ (match_operand 0 rx_restricted_mem_operand)) ) +;; Check that the operand is suitable as the source operand +;; for a min/max instruction. This is the same as +;; rx_source_operand except that CONST_INTs are allowed but +;; REGs and SUBREGs are not. + +(define_predicate rx_minmaxex_operand + (ior (match_operand 0 immediate_operand) + (match_operand 0 rx_restricted_mem_operand)) +) + ;; Return true if OP is a store multiple operation. This looks like: ;; ;; [(set (SP) (MINUS (SP) (INT))) Index: gcc/config/rx/rx.md === --- gcc/config/rx/rx.md (revision 179307) +++ gcc/config/rx/rx.md (working copy) @@ -22,6 +22,9 @@ ;; This code iterator is used for sign- and zero- extensions. (define_mode_iterator small_int_modes [(HI ) (QI )]) +;; This code iterator is used for max and min operations. 
+(define_mode_iterator int_modes [(SI ) (HI ) (QI )]) + ;; We do not handle DFmode here because it is either ;; the same as SFmode, or if -m64bit-doubles is active ;; then all operations on doubles have to be handled by @@ -1160,28 +1163,109 @@ (set_attr timings 22,44)] ) -(define_insn smaxsi3 - [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) - (smax:SI (match_operand:SI 1 register_operand %0,0,0,0,0,0) -(match_operand:SI 2 rx_source_operand - r,Sint08,Sint16,Sint24,i,Q)))] +(define_insn smaxint_modes:mode3 + [(set (match_operand:int_modes 0 register_operand =r,r,r,r,r,r) + (smax:int_modes (match_operand:int_modes 1 register_operand %0,0,0,0,0,0) + (match_operand:int_modes 2 rx_source_operand + r,Sint08,Sint16,Sint24,i,Q)))] max\t%Q2, %0 [(set_attr length 3,4,5,6,7,6) (set_attr timings 11,11,11,11,11,33)] ) -(define_insn sminsi3 - [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) - (smin:SI (match_operand:SI 1 register_operand %0,0,0,0,0,0) -(match_operand:SI 2 rx_source_operand - r,Sint08,Sint16,Sint24,i,Q)))] +(define_insn sminint_modes:mode3 + [(set (match_operand:int_modes 0 register_operand =r,r,r,r,r,r) + (smin:int_modes (match_operand:int_modes 1 register_operand %0,0,0,0,0,0) +(match_operand:int_modes2 rx_source_operand + r,Sint08,Sint16,Sint24,i,Q)))] min\t%Q2, %0 [(set_attr length 3,4,5,6,7,6) (set_attr timings 11,11,11,11,11,33)] ) +(define_insn umaxsmall_int_modes:mode3_u + [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) + (smax:SI (match_operand:SI 1 register_operand %0,0,0,0,0,0) +(zero_extend:SI (match_operand:small_int_modes 2 rx_minmaxex_operand + r,Sint08,Sint16,Sint24,i,Q] + + max\t%R2, %0 + [(set_attr length 3,4,5,6,7,6) + (set_attr timings 11,11,11,11,11,33)] +) + +(define_insn uminsmall_int_modes:mode3_ur + [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) + (smin:SI (zero_extend:SI (match_operand:small_int_modes 2 rx_minmaxex_operand + r,Sint08,Sint16,Sint24,i,Q)) +(match_operand:SI 1 register_operand 
%0,0,0,0,0,0)))] + + min\t%R2, %0 + [(set_attr length 3,4,5,6,7,6) + (set_attr timings 11,11,11,11,11,33)] +) + +(define_insn umaxsmall_int_modes:mode3_ur + [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) + (smax:SI (zero_extend:SI (match_operand:small_int_modes 2 rx_minmaxex_operand + r,Sint08,Sint16,Sint24,i,Q)) +(match_operand:SI 1 register_operand %0,0,0,0,0,0)))] + + max\t%R2, %0 + [(set_attr length 3,4,5,6,7,6) + (set_attr timings 11,11,11,11,11,33)] +) + +(define_expand umaxsmall_int_modes:mode3 + [(set (match_dup 4) + (zero_extend:SI (match_operand:small_int_modes 1 register_operand %0,0,0,0,0,0))) + (set (match_dup
Re: Vector shuffling
On 09/28/2011 05:59 AM, Artem Shinkarov wrote: I don't really understand this. As far as I know, expand_normal converts tree to rtx. All my computations are happening at the level of rtx and force_reg is needed just to bring an rtx expression to the register of the correct mode. If I am missing something, could you give an example how can I use expand_normal instead of force_reg in this particular code. Sorry, I meant expand_(simple_)binop. Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1? My personal feeling is that it may be the case with v0 != v1, that it would be more efficient to perform piecewise shuffling rather than bitwise dances around the masks. Maybe for V2DI and V2DFmode, but probably not otherwise. We can perform the double-word shuffle in 12 insns; 10 for SSE 4.1. Example assembly attached. It's certainly possible to handle it, though it takes a few more steps, and might well be more efficient as a libgcc function rather than inline. I don't really understand why it could be more efficient. I thought that inline gives more chances to the final RTL optimisation. We'll not be able to optimize this at the rtl level. There are too many UNSPEC instructions in the way. In any case, even if that weren't so we'd only be able to do useful optimization for a constant permutation. And we should have been able to prove that at the gimple level. r~ .data .align 16 vec3: .long 3,3,3,3 vec4: .long 4,4,4,4 dup4: .byte 0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12 ofs4: .byte 0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3 .text shuffle2: // Convert the low bits of the mask to a shuffle movdqa %xmm2, %xmm3 pandvec3, %xmm3 pmulld vec4, %xmm3 pshufb dup4, %xmm3 paddb ofs4, %xmm3 // Shuffle both inputs pshufb %xmm3, %xmm0 pshufb %xmm3, %xmm1 // Select and merge the inputs // Use ix86_expand_int_vcond for use of pblendvb for SSE4_1. pandvec4, %xmm2 pcmpeqd vec4, %xmm2 pand%xmm2, %xmm1 pandn %xmm2, %xmm0 por %xmm1, %xmm0 ret
Re: Scalar vector binary operation
Artem Shinkarov artyom.shinkar...@gmail.com writes: I can try to put a description in the document. I am not sure that I have rights to commit to the svn, but at least I can try to write the text. There are also pending patches for vector comparison (almost submitted) and vector shuffling (still under discussion), but I hope to finish both of them before the new release. Could you point out where changes.html is located in svn? It's actually still in CVS, at gcc.gnu.org:/cvs/gcc. Ian
Re: [4/4] Make SMS schedule register moves
Ayal Zaks ayal.z...@gmail.com writes: + /* The cyclic lifetime of move->new_reg starts and ends at move->def + (the instruction that defines move->old_reg). So instruction I_REG_MOVE (new_reg = reg) must be scheduled before the next I_MUST_FOLLOW move/original-def (due to anti-dependence: it overwrites reg), but after the previous instance of I_MUST_FOLLOW (due to true dependence; i.e. account for latency also). Why do moves, except for the one closest to move->def (which is directly dependent upon it, i.e. for whom move->def == I_MUST_FOLLOW), have to worry about move->def at all? (Or have their cyclic lifetimes start and end there?) Because the uses of new_reg belong to the same move->def based cycle. The cycle (overloaded term; rather iteration in this context) to which the uses belong is inferred from the cycle (absolute schedule time) in which they are scheduled, regardless of move->def. Just to prove your point about cycle being an overloaded term: I wasn't actually meaning it in the sense of (loop) iteration. I meant a circular window based on move->def. So (I think this is the uncontroversial bit): [M1] must be scheduled cyclically before [B] and cyclically after [C], with the cycle based at [B]:

row 3 after [B]: empty
row 4: [C]
row 5: [D]
row 0: empty
row 1: empty
row 2: [A]
row 3 before [B]: empty

[M1] could therefore go in row 1. This part is OK. Here's how I see it: [M1] feeds [C] which is scheduled at cycle 10, so it must be scheduled before cycle 10-M_latency and after cycle 10-ii. [M1] uses the result of [B] which is scheduled at cycle 3, so must be scheduled after cycle 3+B_latency and before cycle 3+ii. Taking all latencies to be 1 and ii=6, this yields a scheduling window of cycles [4,9]\cap[4,9]=[4,9]; if scheduled at cycle 4 it must_follow [C], if scheduled at cycle 9 it must_precede [B]. 
This is identical to the logic behind the sched_window of any instruction, based on its dependencies (as you've updated not too long ago..), if we do not allow reg_moves (and arguably, one should not allow reg_moves when scheduling reg_moves...). To address the potential erroneous scenario of Loop 2, suppose [A] is scheduled as in the beginning in cycle 20, and that [M1] is scheduled in cycle 7 (\in[4,9]). Then [M2] feeds [D] and [A] which are scheduled at cycles 17 and 20, so it must be scheduled before cycle 17-1 and after cycle 20-6. [M2] uses the result of [M1], so must be scheduled after cycle 7+1 and before cycle 7+6. This yields the desired [14,16]\cap[8,13]=\emptyset. I agree it's natural to schedule moves for intra-iteration dependencies in the normal get_sched_window way. But suppose we have a dependency: A --(T,N,1)-- B that requires two moves M1 and M2. If we think in terms of cycles (in the SCHED_TIME sense), then this effectively becomes: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B because it is now M1 that is fed by both the loop and the incoming edge. But if there is a second dependency: A --(T,M,0)-- C that also requires two moves, we end up with: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B --(T,M3,-1)-- B and dependence distances of -1 feel a bit weird. :-) Of course, what we really have are two parallel dependencies: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B A --(T,M1,0)-- M1' --(T,M2,0)-- M2' --(T,N3,0)-- B where M1' and M2' occupy the same position as M1 and M2 in the schedule, but are one stage further along. But we only schedule them once, so if we take the cycle/SCHED_TIME route, we have to introduce dependencies of distance -1. 
Another reason why it seemed a little confusing to think in terms of cycles (in the SCHED_TIME sense) was that, by this stage, we were already thinking in terms of rows and columns to some extent: /* If dest precedes src in the schedule of the kernel, then dest will read before src writes and we can save one reg_copy. */ if (SCHED_ROW (e->dest->cuid) == SCHED_ROW (e->src->cuid) && SCHED_COLUMN (e->dest->cuid) < SCHED_COLUMN (e->src->cuid)) nreg_moves4e--; However... Also note that if moves are assigned absolute cycles, it becomes clear to which stage each belongs (just like any other instruction), regulating their generation in prolog and epilog. ...I have to concede that it makes the stage calculation easier, and that that tips the balance. (Although again, a move can belong to two stages simultaneously.) Anyway, here's an updated patch that uses cycle times. I've also dropped the code that tried to allow windows to start and end on the same row (and therefore schedule either side of that row). I thought it might help with some VLIWy DFAs, but it was always going to be of minor benefit, and we don't try that elsewhere, so let's avoid the complication.
Re: [3/4] SMS: Record moves in the partial schedule
Ayal Zaks ayal.z...@gmail.com writes: Only request is to document that the register moves are placed/assigned-id's in a specific order. I suppose this is the downside of splitting the patches up, sorry, but the ids are only ordered for the throw-away loop: FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move) add_insn_before (move-insn, ps_first_note (ps, move-def), NULL); and for the prologue/epilogue code. Both are replaced in patch 4 with code that doesn't rely on the ordering of ids. Ok then, very well. I was mostly referring to the following in prologue/epiloque code, which merely relies on assigning all regmoves of a node consecutive id's: FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move nreg_moves fields. Here's a slightly updated patch in line with your 4/4 comments. The move-def is now always the id of the predecessor, rather than the id of the original ddg node producer. I've adapted the throw-away loop accordingly, but there are no other changes. Tested on powerpc64-linux-gnu and with ARM benchmarks. Richard gcc/ * modulo-sched.c (ps_insn): Adjust comment. (ps_reg_move_info): New structure. (partial_schedule): Add reg_moves field. (SCHED_PARAMS): Use node_sched_param_vec instead of node_sched_params. (node_sched_params): Turn first_reg_move into an identifier. (ps_reg_move): New function. (ps_rtl_insn): Cope with register moves. (ps_first_note): Adjust comment and assert that the instruction isn't a register move. (node_sched_params): Replace with... (node_sched_param_vec): ...this vector. (set_node_sched_params): Adjust accordingly. (print_node_sched_params): Take a partial schedule instead of a ddg. Use ps_rtl_insn and ps_reg_move. (generate_reg_moves): Rename to... (schedule_reg_moves): ...this. Remove rescan parameter. Record each move in the partial schedule, but don't emit it here. Don't perform register substitutions here either. (apply_reg_moves): New function. 
(duplicate_insns_of_cycles): Use register indices directly, rather than finding instructions using PREV_INSN. Use ps_reg_move. (sms_schedule): Call schedule_reg_moves before committing to a partial schedule. Try the next ii if the schedule fails. Use apply_reg_moves instead of generate_reg_moves. Adjust call to print_node_sched_params. Free node_sched_param_vec instead of node_sched_params. (create_partial_schedule): Initialize reg_moves. (free_partial_schedule): Free reg_moves. Index: gcc/modulo-sched.c === --- gcc/modulo-sched.c 2011-09-28 09:03:10.334301485 +0100 +++ gcc/modulo-sched.c 2011-09-28 11:24:26.530300781 +0100 @@ -124,7 +124,9 @@ #define PS_STAGE_COUNT(ps) (((partial_sc /* A single instruction in the partial schedule. */ struct ps_insn { - /* The number of the ddg node whose instruction is being scheduled. */ + /* Identifies the instruction to be scheduled. Values smaller than + the ddg's num_nodes refer directly to ddg nodes. A value of + X - num_nodes refers to register move X. */ int id; /* The (absolute) cycle in which the PS instruction is scheduled. @@ -137,6 +139,30 @@ struct ps_insn }; +/* Information about a register move that has been added to a partial + schedule. */ +struct ps_reg_move_info +{ + /* The source of the move is defined by the ps_insn with id DEF. + The destination is used by the ps_insns with the ids in USES. */ + int def; + sbitmap uses; + + /* The original form of USES' instructions used OLD_REG, but they + should now use NEW_REG. */ + rtx old_reg; + rtx new_reg; + + /* An instruction that sets NEW_REG to the correct value. The first + move associated with DEF will have an rhs of OLD_REG; later moves + use the result of the previous move. */ + rtx insn; +}; + +typedef struct ps_reg_move_info ps_reg_move_info; +DEF_VEC_O (ps_reg_move_info); +DEF_VEC_ALLOC_O (ps_reg_move_info, heap); + /* Holds the partial schedule as an array of II rows. 
Each entry of the array points to a linked list of PS_INSNs, which represents the instructions that are scheduled for that row. */ @@ -148,6 +174,10 @@ struct partial_schedule /* rows[i] points to linked list of insns scheduled in row i (0 <= i < ii). */ ps_insn_ptr *rows; + /* All the moves added for this partial schedule. Index X has + a ps_insn id of X + g->num_nodes. */ + VEC (ps_reg_move_info, heap) *reg_moves; + /* rows_length[i] holds the number of instructions in the row. It is used only (as an optimization) to back off quickly from trying to schedule a node in a full row; that is, to avoid running @@ -201,7 +231,7 @@ static void remove_node_from_ps (partial #define NODE_ASAP(node)
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 02:54:34PM +0200, Jan Hubicka wrote: Do you know the glibc version numbers when the optimized string functions were introduced? Afaik, it's 2.13. I also compared my implementation to 2.13. I wonder if we can assume that most GCC 4.7 based systems will be glibc 2.13 based, too. I would tend to say yes, and thus would suggest to tame down inlining that is no longer profitable on newer glibcs, with a note in changes.html... You could add a check to configure and generate based on that? BTW I know that the tables need tuning for Nehalem and Sandy Bridge too. Michael, are you planning to do that? -Andi
Re: [PATCH 2/3] Use urandom to get random seed
I suppose we might get interrupted before anything is read and read can return with -1 (I suppose partial reads are quite unlikely though)? Thus, don't we need the usual EINTR loop? When we get interrupted gcc will die. I don't think gcc handles any asynchronous signals, right? -Andi -- a...@linux.intel.com -- Speaking for myself only.
[Patch,AVR]: Better log output with -mdeb
This is a tentative patch for better support of logging information for avr BE developers. There are situations where it is more convenient to let the compiler produce information than to debug into the compiler. One example are -da dumps. This patch proposes a better support to print information by means of a printf-like function via %-codes. The current debug output with avr-gcc's option -mdeb produces bulk of information that is very hard to read because - there is much output - there is no information on current_function_decl - there is no information on current pass name/number - there is no print-like function so the trees, rtxes so that it is tedious to get formatted output. For example, the following call to avr_edump static int avr_OS_main_function_p (tree func) { avr_edump (%?: %t\n, func); return avr_lookup_function_attribute1 (func, OS_main); } prints additional information in a convenient way (foo is function to be compiled): avr_OS_main_function_p[foo:pro_and_epilogue(202)]: function_decl # foo type function_type # type void_type # void VOID ... Wondering that similar functionality is not provided by GCC itself, I wrote this patch. GCC's diagnostic seems to be overwhelmingly complicated and not intended for the purpose mentioned above. The code is dead code at the moment. No function in the BE uses the new functions. This patch just adds them. Moreover; I don't know if avr port maintainers or global reviewers like such stuff in the compiler... Therefore it's just tentative draft. Supported %-codes are: r: rtx t: tree T: tree (brief) C: enum rtx_code m: enum machine_mode R: enum reg_class L: insn list H: location_t -- no args -- A: call abort() f: current_function_name() F: caller (via __FUNCTION__) P: Pass name and number ?: Print caller, current function and pass info -- same as printf -- %: % c: char s: string d: int (decimal) x: int (hex) These codes cover great deal of BE developers needs and if something is missing it can be added easily. 
The calling itself is not as straight forward as it could be in the presence of variadic macros (to get __FUNCTION__), but that should not matter here. Johann * config/avr/avr-protos.h (avr_edump, avr_fdump): New macros. (avr__set_caller_e, avr__set_caller_f): New prototypes. * config/avr/avr.c: Include tree-pass.h. (avr__caller, avr__stream): New static variables. (avr__fdump_e, avr__fdump_f, avr__vadump): New static functions. (avr__set_caller_e, avr__set_caller_f): New functions. Index: config/avr/avr-protos.h === --- config/avr/avr-protos.h (revision 179257) +++ config/avr/avr-protos.h (working copy) @@ -111,3 +111,9 @@ extern rtx avr_incoming_return_addr_rtx #ifdef REAL_VALUE_TYPE extern void asm_output_float (FILE *file, REAL_VALUE_TYPE n); #endif + +#define avr_edump (avr__set_caller_e (__FUNCTION__)) +#define avr_fdump (avr__set_caller_f (__FUNCTION__)) + +extern int (*avr__set_caller_e (const char*))(const char*, ...); +extern int (*avr__set_caller_f (const char*))(FILE*, const char*, ...); Index: config/avr/avr.c === --- config/avr/avr.c (revision 179257) +++ config/avr/avr.c (working copy) @@ -7810,5 +7810,233 @@ avr_expand_builtin (tree exp, rtx target gcc_unreachable (); } + +/*** + ** Logging, for BE developers only + ***/ + +#include tree-pass.h + +static const char* avr__caller = ?; +static FILE* avr__stream; + +static void avr__vadump (FILE*, const char*, va_list); + +static int +avr__fdump_f (FILE *stream, const char* fmt, ...) +{ + va_list ap; + + va_start (ap, fmt); + if (stream) +avr__vadump (stream, fmt, ap); + va_end (ap); + + return 1; +} + +int (* +avr__set_caller_f (const char* caller) + )(FILE*, const char*, ...) +{ + avr__caller = caller; + + return avr__fdump_f; +} + +static int +avr__fdump_e (const char* fmt, ...) +{ + va_list ap; + + va_start (ap, fmt); + avr__vadump (stderr, fmt, ap); + va_end (ap); + + return 1; +} + +int (* +avr__set_caller_e (const char* caller) + )(const char*, ...) 
+{ + avr__caller = caller; + avr__stream = stderr; + + return avr__fdump_e; +} + +/* + -- known %-codes -- + + r: rtx + t: tree + T: tree (brief) + C: enum rtx_code + m: enum machine_mode + R: enum reg_class + L: insn list + H: location_t + + -- no args -- + A: call abort() + f: current_function_name() + F: caller (via __FUNCTION__) + P: Pass name and number + ?: Print caller, current function and pass info + + -- same as printf -- + %: % + c: char + s: string + d: int (decimal) + x: int (hex) +*/ + +static void +avr__vadump (FILE *file, const char *fmt, va_list ap)
Re: Use of vector instructions in memmov/memset expanding
You could add a check to configure and generate based on that? Do you mean check if glibc is newer than 2.13? I think that when new glibc version is released, the tables should be re-examined anyway - we shouldn't just stop inlining, or stop generating libcalls. BTW I know that the tables need tuning for Nehalem and Sandy Bridge too. Michael are you planning to do that? There is no separate cost-table for Nehalem or SandyBridge - however, I tuned generic32 and generic64 tables, that should improve performance on modern processors. In old version REP-MOV was used - it turns out to be slower than SSE-moves or libcalls (in my version the fastest from these options is used).
Re: PowerPC shrink-wrap support 3 of 3
This supercedes http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01004.html and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01593.html, fixing the two regressions introduced by those patches. The first patch is unchanged except to leave all the out-of-line restore functions using return rather than simple_return. We don't want these being confused with a plain simple_return and perhaps used by the shrink- wrapping to return from pre-prologue code. The second of these two patches was way too simplistic. It was a real pain getting the cfa_restores correct. A lot were missing, or emitted at the wrong place (due to bug in rs6000_emit_stack_reset). I also had the real restore insn move past the cfa_restores (mtlr 0 insn scheduled over loads from stack). * config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded declaration. (rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp. (rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx. Use simple_return in pattern, emit instruction, and set jump_label. (rs6000_emit_prologue): Update for rs6000_emit_savres_rtx. Use simple_return rather than return. (emit_cfa_restores): New function. (rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap. Add missing cfa_restores for SAVE_WORLD. Add missing LR cfa_restore when using out-of-line gpr restore. Add missing LR and FP regs cfa_restores for out-of-line fpr restore. Consolidate code setting up cfa_restores. Formatting. Use LR_REGNO define. (rs6000_output_mi_thunk): Use simple_return rather than return. * config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise. (return_internal*): Likewise. (any_return, return_pred, return_str): New iterators. (return, conditional return insns): Provide both return and simple_return variants. * gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define. (REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13. Move r11 and r0 later to suit shrink-wrapping. 
Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 178876) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c static void rs6000_set_default_type_attributes (tree); static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool); static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool); -static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int, - enum machine_mode, bool, bool, bool); static bool rs6000_reg_live_or_pic_offset_p (int); static tree rs6000_builtin_vectorized_libmass (tree, tree, tree); static tree rs6000_builtin_vectorized_function (tree, tree, tree); @@ -19704,8 +19728,10 @@ rs6000_emit_stack_reset (rs6000_stack_t if (sp_offset != 0) { rtx dest_reg = savres ? gen_rtx_REG (Pmode, 11) : sp_reg_rtx; - return emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, - GEN_INT (sp_offset))); + rtx insn = emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, + GEN_INT (sp_offset))); + if (!savres) + return insn; } else if (!savres) return emit_move_insn (sp_reg_rtx, frame_reg_rtx); @@ -19729,10 +19755,11 @@ rs6000_emit_stack_reset (rs6000_stack_t } /* Construct a parallel rtx describing the effect of a call to an - out-of-line register save/restore routine. */ + out-of-line register save/restore routine, and emit the insn + or jump_insn as appropriate. 
*/ static rtx -rs6000_make_savres_rtx (rs6000_stack_t *info, +rs6000_emit_savres_rtx (rs6000_stack_t *info, rtx frame_reg_rtx, int save_area_offset, enum machine_mode reg_mode, bool savep, bool gpr, bool lr) @@ -19742,6 +19769,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * int reg_size = GET_MODE_SIZE (reg_mode); rtx sym; rtvec p; + rtx par, insn; offset = 0; start_reg = (gpr @@ -19755,7 +19783,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * RTVEC_ELT (p, offset++) = ret_rtx; RTVEC_ELT (p, offset++) -= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65)); += gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO)); sym = rs6000_savres_routine_sym (info, savep, gpr, lr); RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym); @@ -19788,7 +19816,16 @@ rs6000_make_savres_rtx (rs6000_stack_t * RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg); } - return gen_rtx_PARALLEL (VOIDmode, p); + par = gen_rtx_PARALLEL (VOIDmode, p); + + if (!savep && lr) +{ + insn = emit_jump_insn (par); + JUMP_LABEL (insn) = ret_rtx; +
[RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL
Hi! This patch generates Thumb2 epilogues in RTL form. The work involves defining new functions, predicates and patterns along with few changes in existing code: * The load_multiple_operation predicate was found to be too restrictive for integer loads as it required consecutive destination regs, so this restriction was lifted. * Variations of load_multiple_operation were required to handle cases - where SP must be the base register - where FP values were being loaded (which do require consecutive destination registers) - where PC can be in register-list (which requires return pattern along with register loads). Hence, the common code was factored out into a new function in arm.c and parameterised to show - whether consecutive destination regs are needed - the data type being loaded - whether the base register has to be SP - whether PC is in register-list The patch is tested with arm-eabi with no regressions. ChangeLog: 2011-09-28 Ian Bolton ian.bol...@arm.com Sameera Deshpande sameera.deshpa...@arm.com * config/arm/arm-protos.h (load_multiple_operation_p): New declaration. (thumb2_expand_epilogue): Likewise. (thumb2_output_return): Likewise (thumb2_expand_return): Likewise. (thumb_unexpanded_epilogue): Rename to... (thumb1_unexpanded_epilogue): ...this * config/arm/arm.c (load_multiple_operation_p): New function. (thumb2_emit_multi_reg_pop): Likewise. (thumb2_emit_vfp_multi_reg_pop): Likewise. (thumb2_expand_return): Likewise. (thumb2_expand_epilogue): Likewise. (thumb2_output_return): Likewise (thumb_unexpanded_epilogue): Rename to... ( thumb1_unexpanded_epilogue): ...this * config/arm/arm.md (pop_multiple_with_stack_update): New pattern. (pop_multiple_with_stack_update_and_return): Likewise. (thumb2_ldr_with_return): Likewise. (floating_point_pop_multiple_with_stack_update): Likewise. (return): Update condition and code for pattern. (arm_return): Likewise. (epilogue_insns): Likewise. * config/arm/predicates.md (load_multiple_operation): Update predicate. 
(load_multiple_operation_stack_and_return): New predicate. (load_multiple_operation_stack): Likewise. (load_multiple_operation_stack_fp): Likewise. * config/arm/thumb2.md (thumb2_return): Remove. (thumb2_rtl_epilogue_return): New pattern. - Thanks and regards, Sameera D. thumb2_rtl_epilogue_complete-27Sept.patch Description: Binary data
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 09/28/2011 03:57 PM, Carrot Wei wrote: Hi Tom What's the behavior of your patch to the following case typedef int int_unaligned __attribute__((aligned(1))); int foo (int_unaligned *p) { return *p; } I modified the example slightly: test.c: ... typedef int __attribute__((aligned(2))) int_unaligned; int foo (int_unaligned *p) { return *(p+1); } ... test.c.023t.ccp1: ... # PT = anything # ALIGN = 2, MISALIGN = 0 D.2723_2 = pD.1604_1(D) + 4; ... Thanks, - Tom thanks Carrot On Tue, Sep 20, 2011 at 7:13 PM, Tom de Vries vr...@codesourcery.com wrote: Hi Richard, I have a patch for PR43814. It introduces an option that assumes that function arguments of pointer type are aligned, and uses that information in tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined. I tested the patch successfully on-by-default on x86_64 and i686 (both gcc only builds). I also tested the patch on-by-default for ARM (gcc/glibc build). The patch generated wrong code for uselocale.c: ... glibc/locale/locale.h: ... /* This value can be passed to `uselocale' and may be returned by it. Passing this value to any other function has undefined behavior. */ # define LC_GLOBAL_LOCALE ((__locale_t) -1L) ... glibc/locale/uselocale.c: ... locale_t __uselocale (locale_t newloc) { locale_t oldloc = _NL_CURRENT_LOCALE; if (newloc != NULL) { const locale_t locobj = newloc == LC_GLOBAL_LOCALE ? _nl_global_locale : newloc; ... The assumption that function arguments of pointer type are aligned, allowed the test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false. But the usage of ((__locale_t) -1L) as function argument in uselocale violates that assumption. Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run without regressions for ARM. 
Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM, discussed here: - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html But, since glibc uses this construct currently, the option is off-by-default for now. OK for trunk? Thanks, - Tom 2011-09-20 Tom de Vries t...@codesourcery.com PR target/43814 * tree-ssa-ccp.c (get_align_value): New function, factored out of get_value_from_alignment. (get_value_from_alignment): Use get_align_value. (get_value_for_expr): Use get_align_value to handle alignment of function argument pointers. * common.opt (faligned-pointer-argument): New option. * doc/invoke.texi (Optimization Options): Add -faligned-pointer-argument. (-faligned-pointer-argument): New item. * gcc/testsuite/gcc.dg/pr43814.c: New test. * gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.
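Independent of whether the compiler may assume that pointer arguments are aligned, the portable idiom for reading through a possibly-misaligned pointer is a small memcpy, which GCC expands to a plain load on targets that allow unaligned access and keeps safe on strict-alignment targets. A minimal sketch, not part of Tom's patch; the helper name is made up:

```c
#include <assert.h>
#include <string.h>

/* Read an int from a possibly-misaligned address.  memcpy is the
   portable idiom: the compiler may turn it into an unaligned load
   where that is legal, and it never traps on strict-alignment
   targets.  */
static int
read_int_unaligned (const void *p)
{
  int v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

This is exactly the kind of access pattern that stays correct whether or not an optimization such as the one proposed here is enabled.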
Re: Use of vector instructions in memmov/memset expanding
There is no separate cost-table for Nehalem or SandyBridge - however, I tuned generic32 and generic64 tables, that should improve performance on modern processors. In old version REP-MOV was used - it The recommended heuristics have changed in Nehalem and Sandy-Bridge over earlier Intel CPUs. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 06:27:11PM +0200, Andi Kleen wrote: There is no separate cost-table for Nehalem or SandyBridge - however, I tuned generic32 and generic64 tables, that should improve performance on modern processors. In old version REP-MOV was used - it The recommended heuristics have changed in Nehalem and Sandy-Bridge over earlier Intel CPUs. Sorry what I meant is that it would be bad if -mtune=corei7(-avx)? was slower than generic. Adding new tables shouldn't be very difficult, even if they are the same as generic. -Andi
Re: [PATCH 2/3] Use urandom to get random seed
On Wed, 28 Sep 2011 16:56:32 +0200 Andi Kleen a...@firstfloor.org wrote: I suppose we might get interrupted before anything is read and read can return with -1 (I suppose partial reads are quite unlikely though)? Thus, don't we need the usual EINTR loop? When we get interrupted gcc will die. I don't think gcc handles any asynchronous signals, right? That depends upon the signal. If we got SIGCHLD or SIGWINCH, the call to read probably gets EINTR, but the signal is ignored unless explicitly handled. So I would suggest testing for EINTR. Besides, perhaps some plugins could install their signal handlers Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
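The usual EINTR loop under discussion can be sketched as follows. This is a hypothetical helper, not code from Andi's patch, shown only to illustrate retry-on-EINTR plus short-read handling:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read exactly COUNT bytes from FD, retrying on EINTR and on short
   reads.  Returns 0 on success, -1 on error or unexpected EOF.  */
static int
read_full (int fd, void *buf, size_t count)
{
  char *p = (char *) buf;
  while (count > 0)
    {
      ssize_t n = read (fd, p, count);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
          return -1;            /* real error */
        }
      if (n == 0)
        return -1;              /* unexpected EOF */
      p += n;
      count -= (size_t) n;
    }
  return 0;
}
```

Even if gcc itself installs no handlers, a plugin might, so the retry costs nothing and removes the failure mode Basile describes.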
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 24/09/2011, at 9:40 PM, Jakub Jelinek wrote: On Sat, Sep 24, 2011 at 11:31:25AM +0200, Richard Guenther wrote: In the end I'd probably say the patch is ok without the option (thus turned on by default), but if LC_GLOBAL_LOCALE is part of the glibc ABI then we clearly can't do this. Yes, LC_GLOBAL_LOCALE is part of glibc ABI. I guess we could only assume the alignment if the pointer is actually dereferenced on the statement that checks the ABI or in some stmt that dominates the spot where you want to check the alignment. It is IMHO quite common to pass arbitrary values in pointer types, then cast them back or just compare. I can relate to camps of both GCC developers and GLIBC developers, and I understand the benefits and liabilities of Tom's optimization. Ultimately, the problem we need to solve is to find a way to manage non-conforming code with a compiler that tries to squeeze out as much performance from valid code as possible. I think Tom's suggestion to have an option (either enabled or disabled by default) is a very good solution given the circumstances. I think we should document the option as a transitional measure designed to give time to GLIBC and others to catch up, and obsolete it in the next release. GLIBC patch to fix locale_t definition is attached. In this submission Tom is being punished for his thoroughness in testing the optimization. Let this be a lesson to all of us to steer clear of GLIBC. [Kidding.] We had similar discussions several times already, and it seems the guiding principle of whether enabling new optimization that may break undefined code patterns is to balance expected performance benefits against how frequently the offending code pattern is used. Returning to the case at hand: Is there code comparing a pointer to an integer? Yes. Is it common? Comparing to a zero, absolutely; to a non-zero errr. Probably not that much. The performance benefits from the optimization are quite significant. 
Pointer alignment has tremendous effect on expanding __builtin_{mem,str}* functions. Thank you, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics fsf-glibc-locale_t-align.patch Description: Binary data
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 29/09/2011, at 7:35 AM, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:29:12 +1300 GLIBC patch to fix locale_t definition is attached. Isn't this going to result in byte loads being used to dereference all locale_t pointers on targets like sparc and mips? Yes, that's the price for binary compatibility for comparing a pointer to -1. I just hope that no common program has locale functions on its critical path. Thank you, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:40:55 +1300 On 29/09/2011, at 7:35 AM, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:29:12 +1300 GLIBC patch to fix locale_t definition is attached. Isn't this going to result in byte loads being used to dereference all locale_t pointers on targets like sparc and mips? Yes, that's the price for binary compatibility for comparing a pointer to -1. I just hope that no common program has locale functions on its critical path. I personally don't think this is acceptable, critical path or not.
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 29/09/2011, at 7:41 AM, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:40:55 +1300 On 29/09/2011, at 7:35 AM, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:29:12 +1300 GLIBC patch to fix locale_t definition is attached. Isn't this going to result in byte loads being used to dereference all locale_t pointers on targets like sparc and mips? Yes, that's the price for binary compatibility for comparing a pointer to -1. I just hope that no common program has locale functions on its critical path. I personally don't think this is acceptable, critical path or not. OK. Do you have an alternative suggestion that would fix non-portable use of locale_t? Thank you, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:45:17 +1300 OK. Do you have an alternative suggestion that would fix non-portable use of locale_t? Don't optimize something that is invalidated by a quite common practice? What about people who encode invalid pointers with 0xdeadbeef, do we need to audit every source tree that does this too? This invalidates the optimization's preconditions as well. You're not going to eradicate all the code in the world which uses unaligned constants to encode pointers to make them have special meanings in certain situations. We use the -1 thing in the Linux kernel too I believe. I'd go so far as to say this kind of thing is pervasive.
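The LC_GLOBAL_LOCALE pattern David describes — an out-of-band pointer value that is only ever compared, never dereferenced — looks like the sketch below. The type and names are hypothetical, modeled on glibc's locale_t; the point is that an optimizer assuming every pointer argument is aligned could fold the comparison to false, since `(T *) -1L` can never satisfy the alignment of a real object:

```c
#include <assert.h>
#include <stddef.h>

/* Opaque object type, standing in for glibc's struct __locale_struct.  */
typedef struct locale_data locale_t_like;

/* Sentinel in the style of LC_GLOBAL_LOCALE: a pointer value that is
   never dereferenced, only compared.  Its bit pattern is not aligned,
   so the compiler must not conclude the comparison below is always
   false merely because valid objects would be word-aligned.  */
#define GLOBAL_SENTINEL ((locale_t_like *) -1L)

static int
is_global (locale_t_like *p)
{
  return p == GLOBAL_SENTINEL;
}
```

Under Tom's original (flag-enabled) optimization, exactly this comparison was miscompiled in glibc's uselocale.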
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:58:01 +1300 To summarize, your opinion seems to be to not enable the optimization by default, correct? Yes, and I don't think we ever could do so. BTW, another common paradigm is using the always clear bits of a pointer to encode state bits.
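The "always clear bits" paradigm can be sketched like this — hypothetical helpers, assuming the pointee is at least 4-byte aligned (as malloc guarantees). A compiler that assumed the low bits of any pointer argument are zero would fold `tag_of` to a constant 0 and break the scheme:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Pack a 2-bit tag into the low bits of a pointer known to be at
   least 4-byte aligned -- the "state bits" paradigm.  */
static void *
tag_ptr (void *p, unsigned tag)         /* tag in 0..3 */
{
  return (void *) ((uintptr_t) p | (tag & 3u));
}

/* Recover the tag; depends on the low bits surviving unchanged.  */
static unsigned
tag_of (void *p)
{
  return (unsigned) ((uintptr_t) p & 3u);
}

/* Strip the tag to get back the real pointer.  */
static void *
untag_ptr (void *p)
{
  return (void *) ((uintptr_t) p & ~(uintptr_t) 3u);
}
```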
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Wed, Sep 28, 2011 at 03:00:09PM -0400, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:58:01 +1300 To summarize, your opinion seems to be to not enable the optimization by default, correct? Yes, and I don't think we ever could do so. BTW, another common paradigm is using the always clear bits of a pointer to encode state bits. Yeah, I think it is a bad optimization that is going to break lots of code and the glibc patch is definitely unacceptable (and not doing what you think it is doing, your change didn't do anything at all, as the aligned attribute applies to the pointer type and not to the pointed type). The alignment of the argument pointer can be IMHO only hard assumed if there is a load or store using that type dominating the spot where you are checking it, and the target is strict alignment target. Then we can both assume that alignment, optimize away tests on the low bits etc. Otherwise, what you could do is just use the pointed type's alignment as a hint, likely alignment, e.g. on non-strict alignment target you could expand code optimistically assuming that it is very likely aligned, but still shouldn't optimize away low bits tests. Jakub
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Wed, Sep 28, 2011 at 9:13 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Sep 28, 2011 at 03:00:09PM -0400, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:58:01 +1300 To summarize, your opinion seems to be to not enable the optimization by default, correct? Yes, and I don't think we ever could do so. BTW, another common paradigm is using the always clear bits of a pointer to encode state bits. Yeah, I think it is a bad optimization that is going to break lots of code and the glibc patch is definitely unacceptable (and not doing what you think it is doing, your change didn't do anything at all, as the aligned attribute applies to the pointer type and not to the pointed type). The alignment of the argument pointer can be IMHO only hard assumed if there is a load or store using that type dominating the spot where you are checking it, and the target is strict alignment target. Then we can both assume that alignment, optimize away tests on the low bits etc. Otherwise, what you could do is just use the pointed type's alignment as a hint, likely alignment, e.g. on non-strict alignment target you could expand code optimistically assuming that it is very likely aligned, but still shouldn't optimize away low bits tests. There is nothing like very likely aligned ;) Note that what is new is that we now no longer assume alignment by default (we did in the past) and that we derive properties about the pointer _value_ from alignment. I think we can derive pointer values when we see dereferences, the code wouldn't be portable to strict-alignment targets otherwise. We can offer a -fno-strict-alignment option to disable deriving alignment from types. Richard. Jakub
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Wed, Sep 28, 2011 at 09:56:27PM +0200, Richard Guenther wrote: There is nothing like very likely aligned ;) Note that what is new is On non-strict aligned targets there is no reason not to have something like very likely aligned. You would expand stores/loads as if it was aligned in that case, and if it isn't, all you'd need to ensure from that is that you don't derive properties about the pointer value from the likely aligned info, only from alignment. Very likely aligned is interesting to the vectorizer too, if it is very likely something is sufficiently aligned, the vectorizer could decide to assume the alignment in the vectorized loop and add the check for the alignment to the loop guards. In the likely case the vectorized loop would be used (if other guards were true too), in the unlikely case it is unaligned it would just use a slower loop. that we now no longer assume alignment by default (we did in the past) and that we derive properties about the pointer _value_ from alignment. I think we can derive pointer values when we see dereferences, the code wouldn't be portable to strict-alignment targets otherwise. We But any references? If you have int foo (int *p) { memcpy (p, a, 1); return ((uintptr_t) p & 3) == 0; } then even if p isn't aligned, this could work even on strict aligned targets. Anyway, the arbitrary value in a pointer thing is much more important than the rest, so having the dominating dereference test is very important. Jakub
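Jakub's foo example can be made self-contained to see both halves of his point: the 1-byte memcpy is legal even for a misaligned p (so a dereference does not prove full alignment), and the low-bits test observes the real pointer value, so folding it to true would change behavior. The array `a` and the surrounding scaffolding are filled in here as an illustration:

```c
#include <stdint.h>
#include <string.h>

static const char a[1] = { 42 };

/* Jakub's example: a byte-granular access through an int *, followed
   by a test of the pointer's low bits.  The memcpy is fine even when
   p is misaligned, so the compiler must not use it to conclude that
   (p & 3) == 0.  */
static int
foo (int *p)
{
  memcpy (p, a, 1);
  return ((uintptr_t) p & 3) == 0;
}
```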
Re: [PATCH 2/7] Generate virtual locations for tokens
On 09/28/2011 03:15 PM, Dodji Seketeli wrote: +set_arg_token (macro_arg *arg, const cpp_token *token, + source_location location, size_t index, + enum macro_arg_token_kind kind, + bool track_macro_exp_p) +{ + const cpp_token **token_ptr; + source_location *loc = NULL; + + token_ptr = + arg_token_ptr_at (arg, index, kind, + track_macro_exp_p ? &loc : NULL); ... + if (virt_location) + { + if (kind == MACRO_ARG_TOKEN_NORMAL) + *virt_location = &arg->virt_locs[index]; + else if (kind == MACRO_ARG_TOKEN_EXPANDED) + *virt_location = &arg->expanded_virt_locs[index]; + else if (kind == MACRO_ARG_TOKEN_STRINGIFIED) + *virt_location = + (source_location *) &tokens_ptr[index]->src_loc; + } If we make this block conditional on arg->virt_locs being set, then we can pass &loc in unconditionally and don't need the track_macro_exp_p flag to set_arg_token. Note that the gotos were put there also because we needed to get out of the for (;;) loop, similarly to what the previous return statements were doing; so doing this doesn't get rid of the gotos. Can't you use break instead? Jason
[C++ Patch] PR 45278
Hi, here submitter remarks that, inconsistently with the documentation, with -Wextra the C++ front-end doesn't warn for ordered comparison of pointer with integer zero. Thus I simply adapted the warning in the C front-end. Is the patch Ok? Tested x86_64-linux. Paolo. /
Re: [C++ Patch] PR 45278
On 09/28/2011 11:02 PM, Paolo Carlini wrote: Hi, here submitter remarks that, inconsistently with the documentation, with -Wextra the C++ front-end doesn't warn for ordered comparison of pointer with integer zero. Thus I simply adapted the warning in the C front-end. Is the patch Ok? Better attaching the patch ;) Paolo.

/cp
2011-09-28  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/45278
	* typeck.c (cp_build_binary_op): With -Wextra, warn for ordered
	comparison of pointer with zero.

/testsuite
2011-09-28  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/45278
	* g++.dg/warn/Wextra-3.C: New.

Index: testsuite/g++.dg/warn/Wextra-3.C
===================================================================
--- testsuite/g++.dg/warn/Wextra-3.C	(revision 0)
+++ testsuite/g++.dg/warn/Wextra-3.C	(revision 0)
@@ -0,0 +1,9 @@
+// PR c++/45278
+// { dg-options "-Wextra" }
+
+extern void* p;
+
+int f1() { return ( p > 0 ? 1 : 0 ); }   // { dg-warning "ordered comparison" }
+int f2() { return ( p >= 0 ? 1 : 0 ); }  // { dg-warning "ordered comparison" }
+int f3() { return ( p < 0 ? 1 : 0 ); }   // { dg-warning "ordered comparison" }
+int f4() { return ( p <= 0 ? 1 : 0 ); }  // { dg-warning "ordered comparison" }
Index: cp/typeck.c
===================================================================
--- cp/typeck.c	(revision 179319)
+++ cp/typeck.c	(working copy)
@@ -4189,9 +4189,19 @@ cp_build_binary_op (location_t location,
 	result_type = composite_pointer_type (type0, type1, op0, op1,
 					      CPO_COMPARISON, complain);
       else if (code0 == POINTER_TYPE && null_ptr_cst_p (op1))
-	result_type = type0;
+	{
+	  result_type = type0;
+	  if (extra_warnings && (complain & tf_warning))
+	    warning (OPT_Wextra,
+		     "ordered comparison of pointer with integer zero");
+	}
       else if (code1 == POINTER_TYPE && null_ptr_cst_p (op0))
-	result_type = type1;
+	{
+	  result_type = type1;
+	  if (extra_warnings && (complain & tf_warning))
+	    warning (OPT_Wextra,
+		     "ordered comparison of pointer with integer zero");
+	}
       else if (null_ptr_cst_p (op0) && null_ptr_cst_p (op1))
 	/* One of the operands must be of nullptr_t type.  */
 	result_type = TREE_TYPE (nullptr_node);
Re: [pph] Prepare for mutation detection [2/3] (issue5142049)
More comments to come on [3/3], for now just a single comment below on this specific patch: diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index 0bd4d64..b267833 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -439,7 +439,10 @@ pph_in_cxx_binding_1 (pph_stream *stream) if (marker == PPH_RECORD_END) return NULL; else if (pph_is_reference_marker (marker)) - return (cxx_binding *) pph_cache_get (stream->cache, image_ix, ix, marker); + { + pph_cache *cache = pph_cache_select (stream, marker, image_ix); + return (cxx_binding *) pph_cache_get (cache, ix); + } Seems like you replaced the pph_cache_get one-liners with these two-liners. Wouldn't a simple inline function be nicer for this? Gab
Re: [pph] Prepare for mutation detection [3/3] (issue5139052)
Very nice! I really like where this is heading :)! I think it would be great to instrument this to know how many times we need to use a PPH_RECORD_MREF, to avoid trashing the cache (i.e. potentially there are specific cases where only a small field's value changes and pickling the entire tree again is sub-optimal; if instead we could detect those cases and simply output the required change which could then be applied on the reader side it would potentially be better... it is possible that we haven't been affected by these specific cases up until now, but that they would all massively result in PPH_RECORD_MREFs now (the instrumentation would also potentially help us find some of those tricky mutation, if there are any caused by other things than overloads, which we aren't aware of yet...just a thought). (or maybe even assert that those mutation are indeed overloads if we think that's the only time it occurs??) Cheers, Gab On Tue, Sep 27, 2011 at 1:03 PM, Diego Novillo dnovi...@google.com wrote: This finishes removing constants and builtins out of the cache. This time in a slightly more elegant way. The patch introduces a new version of pph_out_start_record exclusively for trees (pph_out_start_tree_record). If the tree is not cacheable then it emits a PPH_RECORD_START_NO_CACHE record to indicate that the reader should not bother adding the tree to its cache. It also changes the semantics of pph_out_start_record. It now returns true when the caller should do nothing else. We are now signing every DECL and TYPE tree we see, but doing nothing with this signature. In the final patch this will change so we only sign DECLs/TYPEs after reading, if we are also generating a PPH image (pure readers do not need to care about mutations). Tested on x86_64. Committed to branch. * pph-streamer-in.c (pph_read_namespace_tree): Do not insert constants in the cache. * pph-streamer-out.c (pph_cache_should_handle): New. (pph_out_start_ref_record): Factor out of ... 
(pph_out_start_record): ... here. Change return value meaning; true means 'all done'; false means 'caller should write data out'. Update all callers. (pph_out_start_tree_record): New. (pph_write_builtin): Remove. Update all users. (pph_write_namespace_tree): Call pph_out_start_tree_record. * pph-streamer.c (pph_cache_insert_at): Assert that we are not trying to insert the same data more than once. * pph-streamer.h (enum pph_record_marker): Add new value PPH_RECORD_START_NO_CACHE. (pph_in_record_marker): Handle it. diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index b267833..8e7c772 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -2137,7 +2137,6 @@ pph_read_namespace_tree (pph_stream *stream, tree enclosing_namespace) /* For integer constants we only need the type and its hi/low words. */ expr = streamer_read_integer_cst (ib, data_in); - pph_cache_insert_at (stream-cache, expr, ix); } else { diff --git a/gcc/cp/pph-streamer-out.c b/gcc/cp/pph-streamer-out.c index 7157275..2f9fcae 100644 --- a/gcc/cp/pph-streamer-out.c +++ b/gcc/cp/pph-streamer-out.c @@ -182,14 +182,56 @@ pph_flush_buffers (pph_stream *stream) } -/* Start a new record in STREAM for data in DATA. If DATA is NULL - write an end-of-record marker and return false. If DATA is not NULL - and did not exist in the pickle cache, add it, write a - start-of-record marker and return true. If DATA existed in the - cache, write a shared-record marker and return false. */ +/* Return true if tree node T should be added to the pickle cache. */ static inline bool -pph_out_start_record (pph_stream *stream, void *data) +pph_cache_should_handle (tree t) +{ + if (t) + { + if (TREE_CODE (t) == INTEGER_CST) + { + /* With the exception of some special constants that are + pointer-compared everywhere (e.g., integer_zero_node), we + do not want constants in the cache. 
Their physical + representation is smaller than a cache reference record + and they can also trick the cache in similar ways to + builtins (read below). */ + return false; + } + else if (streamer_handle_as_builtin_p (t)) + { + /* We do not want builtins in the cache for two reasons: + + - They never need pickling. Much like pre-loaded tree + nodes, builtins are pre-built by the compiler and + accessible with their class and code. + + - They can trick the cache. When possible, user-provided + functions are generally replaced by their builtin + counterparts (e.g., strcmp, malloc, etc). When this + happens, the writer cache will store in its cache two +
Re: [pph] Prepare for mutation detection [2/3] (issue5142049)
On Wed, Sep 28, 2011 at 17:23, Gabriel Charette gcharet...@gmail.com wrote: More comments to come on [3/3], for now just a single comment below on this specific patch: diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index 0bd4d64..b267833 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -439,7 +439,10 @@ pph_in_cxx_binding_1 (pph_stream *stream) if (marker == PPH_RECORD_END) return NULL; else if (pph_is_reference_marker (marker)) - return (cxx_binding *) pph_cache_get (stream->cache, image_ix, ix, marker); + { + pph_cache *cache = pph_cache_select (stream, marker, image_ix); + return (cxx_binding *) pph_cache_get (cache, ix); + } Seems like you replaced the pph_cache_get one-liners with these two-liners. Wouldn't a simple inline function be nicer for this? I call them separately. Or do you mean a single call that combines them for the common case?
Re: [SH] PR 49468 - Integer SI abs
Oleg Endo oleg.e...@t-online.de wrote: The attached patch and ChangeLog below should fix it. I have also added a test case for SI mode abs. Thanks! I've applied your patch as revision 179320. Regards, kaz
[PING] Re: [PATCH] Fix PR50183
Hi there, Ping. I'm seeking approval for this fix on trunk and 4_6-branch. Thanks! Bill On Tue, 2011-09-13 at 17:55 -0500, William J. Schmidt wrote: Greetings, The code to build scops (static control parts) for graphite first rewrites loops into canonical loop-closed SSA form. PR50183 identifies a scenario where the results do not fulfill all required invariants of this form. In particular, a value defined inside a loop and used outside that loop must reach exactly one definition, which must be a single-argument PHI node called a close-phi. When nested loops exist, it is possible that, following the rewrite, a definition may reach two close-phis. This patch corrects that problem. The problem arises because loops are processed from outside in. While processing a loop, duplicate close-phis are eliminated. However, eliminating duplicate close-phis for an inner loop can sometimes create duplicate close-phis for an already-processed outer loop. This patch detects when this may have occurred and repeats the removal of duplicate close-phis as necessary. The problem was noted on ibm/4_6-branch and 4_6-branch; it is apparently latent on trunk. The same patch can be applied to all three branches. Bootstrapped and regression-tested on powerpc64-linux. OK to commit to these three branches? Thanks, Bill 2011-09-13 Bill Schmidt wschm...@linux.vnet.ibm.com * graphite-scop-detection.c (make_close_phi_nodes_unique): New forward declaration. (remove_duplicate_close_phi): Detect and repair creation of duplicate close-phis for a containing loop. Index: gcc/graphite-scop-detection.c === --- gcc/graphite-scop-detection.c (revision 178829) +++ gcc/graphite-scop-detection.c (working copy) @@ -30,6 +30,9 @@ along with GCC; see the file COPYING3. If not see #include tree-pass.h #include sese.h +/* Forward declarations. 
*/ +static void make_close_phi_nodes_unique (basic_block); + #ifdef HAVE_cloog #include "ppl_c.h" #include "graphite-ppl.h" @@ -1231,6 +1234,13 @@ remove_duplicate_close_phi (gimple phi, gimple_stm SET_USE (use_p, res); update_stmt (use_stmt); + + /* It is possible that we just created a duplicate close-phi + for an already-processed containing loop. Check for this + case and clean it up. */ + if (gimple_code (use_stmt) == GIMPLE_PHI + && gimple_phi_num_args (use_stmt) == 1) + make_close_phi_nodes_unique (gimple_bb (use_stmt)); } remove_phi_node (gsi, true);
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Sep 20, 2011, at 4:13 AM, Tom de Vries wrote: I have a patch for PR43814. It introduces an option that assumes that function arguments of pointer type are aligned, and uses that information in tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined. I'm not a huge fan of an option that is very hard to use. I think this option would be hard to use. I'd rather have a port just assert that no code will be compiled that is weird in this way, then the front end can check for weird values on int to pointer conversions with constants and complain about the code, if they tried it. If Android is safe in this respect, then, they can just turn it on, and then force anyone porting software to their platform to `fix' their code. For systems that are clean, and new systems, we can recommend they set the option on. For legacy systems that don't want to change or just want to compile legacy software, well, they can opt out.
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 09/24/2011 11:31 AM, Richard Guenther wrote: On Tue, Sep 20, 2011 at 1:13 PM, Tom de Vries vr...@codesourcery.com wrote: Hi Richard, I have a patch for PR43814. It introduces an option that assumes that function arguments of pointer type are aligned, and uses that information in tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined. I tested the patch successfully on-by-default on x86_64 and i686 (both gcc only builds). I also tested the patch on-by-default for ARM (gcc/glibc build). The patch generated wrong code for uselocale.c: ... glibc/locale/locale.h: ... /* This value can be passed to `uselocale' and may be returned by it. Passing this value to any other function has undefined behavior. */ # define LC_GLOBAL_LOCALE ((__locale_t) -1L) ... glibc/locale/uselocale.c: ... locale_t __uselocale (locale_t newloc) { locale_t oldloc = _NL_CURRENT_LOCALE; if (newloc != NULL) { const locale_t locobj = newloc == LC_GLOBAL_LOCALE ? _nl_global_locale : newloc; ... The assumption that function arguments of pointer type are aligned, allowed the test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false. But the usage of ((__locale_t) -1L) as function argument in uselocale violates that assumption. Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run without regressions for ARM. Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM, discussed here: - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html But, since glibc uses this construct currently, the option is off-by-default for now. OK for trunk? No, I don't like to have an option to control this. And given the issue you spotted it doesn't look like the best idea either. This area in GCC is simply too fragile right now :/ It would be nice if we could extend IPA-CP to propagate alignment information though. 
And at some point devise a reliable way for frontends to communicate alignment constraints the middle-end can rely on (well, yes, you could argue that's what TYPE_ALIGN is about, and I thought that maybe we can unconditionally rely on TYPE_ALIGN for pointer PARM_DECLs at least - your example shows we can't). In the end I'd probably say the patch is ok without the option (thus turned on by default), but if LC_GLOBAL_LOCALE is part of the glibc ABI then we clearly can't do this. I removed the flag, and enabled the optimization now with the aligned attribute. Bootstrapped and tested on x86_64 (gcc-only build), and built and reg-tested on ARM (gcc + glibc build); no issues found. OK for trunk? I intend to send a follow-up patch that introduces a target hook function_pointers_aligned, that is false by default, and on by default for -mandroid. I asked Google to test their Android codebase with the optimization on by default. Would such a target hook be acceptable? Richard. Thanks, - Tom

2011-09-29  Tom de Vries  t...@codesourcery.com

	PR target/43814
	* tree-ssa-ccp.c (get_align_value): New function, factored out of
	get_value_from_alignment.
	(get_value_from_alignment): Use get_align_value.
	(get_value_for_expr): Use get_align_value to handle alignment of
	function argument pointers with TYPE_USER_ALIGN.
	* gcc/testsuite/gcc.dg/pr43814.c: New test.
	* gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c (revision 173703)
+++ gcc/tree-cfgcleanup.c (working copy)
@@ -641,6 +641,552 @@ cleanup_omp_return (basic_block bb)
   return true;
 }
 
+/* Returns true if S contains (I1, I2).  */
+
+static bool
+int_int_splay_lookup (splay_tree s, unsigned int i1, unsigned int i2)
+{
+  splay_tree_node node;
+
+  if (s == NULL)
+    return false;
+
+  node = splay_tree_lookup (s, i1);
+  return node && node->value == i2;
+}
+
+/* Attempts to insert (I1, I2) into *S.  Returns true if successful.
+   Allocates *S if necessary.  */
+
+static bool
+int_int_splay_insert (splay_tree *s, unsigned int i1, unsigned int i2)
+{
+  if (*s != NULL)
+    {
+      /* Check for existing element, which would otherwise be silently
+	 overwritten by splay_tree_insert.  */
+      if (splay_tree_lookup (*s, i1))
+	return false;
+    }
+  else
+    *s = splay_tree_new (splay_tree_compare_ints, 0, 0);
+
+  splay_tree_insert (*s, i1, i2);
+  return true;
+}
+
+/* Returns 0 if (NODE->value, NODE->key) is an element of S.  Otherwise,
+   returns 1.  */
+
+static int
+int_int_splay_node_contained_in (splay_tree_node node, void *s)
+{
+  splay_tree_node snode = splay_tree_lookup ((splay_tree) s, node->key);
+  return (!snode || node->value != snode->value) ? 1 : 0;
+}
+
+/* Returns true if all elements of S1 are also in S2.  */
+
+static bool
+int_int_splay_contained_in (splay_tree s1, splay_tree s2)
+{
+
Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.
[Vlad, if you have a few minutes, would you mind having a look at the couple of questions at the end of the message? Thanks in advance]. No problem. Here are the results of the investigation.

Pseudo 116 needs to be assigned a hard register. It is used mostly in vector instructions, so we would like it to be assigned an FP reg, but it is initialized in insn 2:

(insn 2 5 3 2 (set (reg/v:V4HI 116 [ a ])
        (reg:V4HI 24 %i0 [ a ])) combined-1.c:7 93 {*movdf_insn_sp32_v9}
     (expr_list:REG_DEAD (reg:V4HI 24 %i0 [ a ])
        (nil)))

so it ends up being assigned the (integer) argument register %i0 instead. It used to be assigned an FP reg as expected with the GCC 4.6.x series. The register class preference discovery is OK:

  r116: preferred EXTRA_FP_REGS, alternative GENERAL_OR_EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS
  a2 (r116,l0) best EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS

i.e. EXTRA_FP_REGS is preferred/best. Then it seems that this preference is dropped and only the class of the allocno, GENERAL_OR_EXTRA_FP_REGS, is handed down to the coloring stage. By contrast, in the GCC 4.6 series, the cover_class of the allocno is EXTRA_FP_REGS. The initial cost for %i0 is twice as high (24000) as the cost of FP regs. But then it is reduced by 12000 when process_bb_node_for_hard_reg_moves sees insn 2 above, and then again by 12000 when process_regs_for_copy sees the same insn. So, in the end, %i0 is given cost 0 and thus beats every other register. This doesn't happen in the GCC 4.6 series because %i0 isn't in the cover_class. This is at -O1.
At -O2, there is an extra pass at the discovery stage and it sets the class of the allocno to EXTRA_FP_REGS, like with the GCC 4.6 series, so a simple workaround is:

Index: gcc.target/sparc/combined-1.c
===
--- gcc.target/sparc/combined-1.c (revision 179316)
+++ gcc.target/sparc/combined-1.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
+/* { dg-options "-O2 -mcpu=ultrasparc -mvis" } */
 typedef short vec16 __attribute__((vector_size(8)));
 typedef int vec32 __attribute__((vector_size(8)));

Finally, the couple of questions:

1. Is it expected that the register class preference be dropped at -O1?
2. Is it expected that a single insn be processed by 2 different mechanisms that independently halve the initial cost of a hard register?

-- Eric Botcazou
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
From: Mike Stump mikest...@comcast.net Date: Wed, 28 Sep 2011 15:19:10 -0700 If Android is safe in this respect, then, they can just turn it on, and then force anyone porting software to their platform to `fix' their code. They'd have to then know to turn this option off when building the kernel, which does use such constructs extensively. I think this whole idea has too many downsides to be considered seriously. People write programs like this, lots of people. It's a pervasive technique to encode boolean state into the low bits of a pointer or to represent special token pointers using integers such as -1.
Re: [pph] Prepare for mutation detection [2/3] (issue5142049)
On Wed, Sep 28, 2011 at 5:31 PM, Diego Novillo dnovi...@google.com wrote: On Wed, Sep 28, 2011 at 17:23, Gabriel Charette gcharet...@gmail.com wrote: More comments to come on [3/3]; for now just a single comment below on this specific patch:

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 0bd4d64..b267833 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -439,7 +439,10 @@ pph_in_cxx_binding_1 (pph_stream *stream)
   if (marker == PPH_RECORD_END)
     return NULL;
   else if (pph_is_reference_marker (marker))
-    return (cxx_binding *) pph_cache_get (stream->cache, image_ix, ix, marker);
+    {
+      pph_cache *cache = pph_cache_select (stream, marker, image_ix);
+      return (cxx_binding *) pph_cache_get (cache, ix);
+    }

Seems like you replaced the pph_cache_get one-liners with these two-liners. Wouldn't a simple inline function be nicer for this? I call them separately. Or do you mean a single call that combines them for the common case? Yes, that's what I mean: a single call that combines both, since that's the common usage, and I feel there should be as little cache code as possible at the top of every pph_* function (in particular, every time a new pph streaming function is added all that code needs to be duplicated, so the less code the better imo).
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Wed, Sep 28, 2011 at 06:43:04PM -0400, David Miller wrote: From: Mike Stump mikest...@comcast.net Date: Wed, 28 Sep 2011 15:19:10 -0700 If Android is safe in this respect, then, they can just turn it on, and then force anyone porting software to their platform to `fix' their code. They'd have to then know to turn this option off when building the kernel, which does use such constructs extensively. I think this whole idea has too many downsides to be considered seriously. People write programs like this, lots of people. It's a pervasive technique to encode boolean state into the low bits of a pointer or to represent special token pointers using integers such as -1. Or 1. Just do

  grep '\*)[[:blank:]]*1' *.[chS]

to see how often an integer value is stored into a pointer. And it is not just void * pointers, it is struct cgraph_node * or struct varpool_node * too, and the pointed-to types there certainly have higher alignment than 1. Now repeat the same with Google Code Search. Jakub
[C++ testcase, committed as obvious] PR 40145
Hi, tested x86_64-linux, committed. Paolo.

2011-09-28  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/40145
	* g++.dg/ext/visibility/warn5.C: New.

Index: g++.dg/ext/visibility/warn5.C
===
--- g++.dg/ext/visibility/warn5.C (revision 0)
+++ g++.dg/ext/visibility/warn5.C (revision 0)
@@ -0,0 +1,11 @@
+// PR c++/40145
+// { dg-do compile }
+// { dg-require-visibility "" }
+// { dg-options "-fvisibility=hidden" }
+
+struct EditorInternalCommand { };
+
+static void createCommandMap()
+{
+  struct CommandEntry { EditorInternalCommand command; };
+}
Re: [PATCH 3/7] Emit macro expansion related diagnostics
On 09/27/2011 01:52 PM, Dodji Seketeli wrote: + Remember we are at the expansion point of MACRO. Each xI is the + location of the Ith token of the replacement-list. Now it gets + confusing. The xI is the location of the Ith token of the + replacement-list at the macro *definition* point. Not at the + macro replacement point. Okay, let's try to explain this below. This should be yI. + The token '+' has two locations, so to speak. One in the context + of the macro *expansion* of PLUS in #2 and one in the context of + the macro *definition* of PLUS in #1. These two locations are + encoded in the latter context, somehow in the xI we are + talking about. The location of '+' in #2 is not encoded in xI or yI; we reach it through the expansion point location of the macro. The location in #1 is yI (and xI, in this case). + xI is roughly the index of the token inside the replacement-list + at the expansion point. So for '+', its index would then be 1 + [The index of token '1' would be 0 and the index of token '2' would + be 1]. So if '+' is our current xI, it is actually an x1. Are we still talking about #1 here? It looks to me like the index of 1 would be 2, the index of + would be 4, and the index of token 2 would be 6. I bet PLUS used to just be A + B, and this section of comment didn't get updated when it changed. Keep changing xI to yI. + Now what's the y1 then? Remember, we said macro_locations is an + array of pairs (xI,yI). We now know what the xI is, now let's + look at the yI. xI allows us to find where the token was actually written. If the current macro context is also the spelling location of the token (e.g. #1 for +), then xI is the same as yI, i.e. the source location of that token. If the current macro context is not the spelling location of the token (e.g. #0 or #1 for 1), then xI is the location outside the current macro context. For 1 in #0, this is the location of 1 in #1, which is a virtual location.
For 1 in #1, this is the location of 1 in #2, which is a source location. + * If LRK is set to LRK_MACRO_EXPANSION_POINT + --- + + The virtual location is resolved to the location to the locus of + the expansion point of the macro. The first macro expansion point that led to this macro expansion. + * If LRK is set to LRK_MACRO_DEFINITION_LOCATION + -- The virtual location is resolved to the locus of the token in the context of the macro definition. + If LOC is the locus of a token that is an argument of a + function-like macro [replacing a parameter in the replacement list + of the macro] the virtual location is resolved to the locus of the + parameter that is replaced, in the context of the definition of the + macro. + + If LOC is the locus of a token that is not an argument of a + function-like macro, then the function behaves as if LRK was set to + LRK_SPELLING_LOCATION. (and then keep these two paragraphs) + Finally, if SPELLING_LOC is not NULL, *RESULTING_LOC is set to the + location to which LOC was resolved SPELLING_LOC doesn't exist anymore. + ORIG_LOC is the original location of the token at the definition + point of the macro. If you read the extensive comments of struct + line_map_macro in line-map.h, this is the xI. + + If the token is part of a macro argument, ORIG_PARM_REPLACEMENT_LOC + is the location of the point at which the token (the argument) + replaces the macro parameter in the context of the relevant macro + definition. If you read the comments of struct line_map_macro in + line-map.h, this is the yI. */ + +source_location +linemap_add_macro_token (const struct line_map *map, ORIG_LOC is the location of the token outside this macro expansion. If the token comes originally from the macro definition, it is the locus in the macro definition; otherwise it is a location in the caller of this macro expansion (which is a virtual location or a source location if the caller is itself a macro expansion or not).
MACRO_DEFINITION_LOC is the location in the macro definition, either of the token itself or of a macro parameter that it replaces. +/* If LOCATION is the locus of a token that is an argument of a + function-like macro M and appears in the expansion of M, return the + locus of that argument in the context of the caller of M. Note + that the caller of M is necessarily another macro. Why is the caller of M necessarily another macro? In the PLUS example above, if we have the location for 1 in #1, won't it give us the location of 1 in #2? The context of M is a macro definition. What does this mean? +/* Return the source line number corresponding to source location + LOCATION. SET is the line map set LOCATION comes from. If + LOCATION is the source location of a token that is part of the +
[arm-embedded][PR38644] Fix stack red zone.
Committed to ARM/embedded-4_6-branch.

2011-09-28  Jiangning Liu  jiangning@arm.com

	PR rtl-optimization/38644
	* config/i386/i386.c (ix86_stack_using_red_zone): Change inline
	to be extern.
	(TARGET_STACK_USING_RED_ZONE): New.
	* config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New.
	(TARGET_STACK_USING_RED_ZONE): New.
	(offset_below_red_zone_p): Change to use new hook
	TARGET_STACK_USING_RED_ZONE.
	* doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New.
	* doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New.
	* sched-deps.c (sched_analyze_1): If the stack pointer is being
	modified and stack red zone is not supported for ports, flush out
	all memory references as they may become invalid if moved across
	the stack adjustment.
	* target.def (stack_using_red_zone): New.
	* testsuite/gcc.target/arm/stack-red-zone.c: New.
RE: [PATCH] Fix stack red zone bug (PR38644)
-Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 5:56 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Wed, Sep 28, 2011 at 11:40 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 5:20 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 4:39 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 27, 2011 3:41 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu jiangning@arm.com wrote: Think of it this way. What the IR says is there is no barrier between those moves. You either have an implicit barrier (which is what you are proposing) or you have it explicitly. I think we all rather have more things explicit rather than implicit in the IR. And that has been the overall feeling for a few years now. Sorry, I'm afraid I can't agree with you. Instead, I think using barrier to describe this kind of dependence is a kind of implicit method. Please note that this is not an usual data dependence issue. The stack pointer change doesn't have any dependence with memory access at all. It is similar to atomic instructions that require being an optimization/memory barrier. 
Sure, it is not a usual data dependence (otherwise we would handle it already), so the targets have to explicitly express the dependence somehow, for which we only have barriers right now. Richard, thanks for your explanation. It's explicit to the back end, while it's implicit to the scheduler in the middle end: a barrier settles the dependence for the scheduler, but barriers can be generated from several different scenarios. It's unsafe and prone to introduce bugs if any one of the scenarios requiring a barrier is missed in the back end. Between the middle end and the back end we should have interfaces that are easy for the back end to implement. After all, the middle end by itself doesn't constitute a compiler, and vice versa. The back end needs the middle end's help to make sure it is easy to implement and to reduce the possibility of introducing bugs. Without an explicit hook like the one I'm proposing, back-end implementers have to know all the barrier-generating scenarios very clearly, because there are no code tips or comments in the middle end telling the back end the list of all scenarios that require barriers. Yes, in theory a barrier is a perfect interface for the scheduler. But from an engineering point of view, I think it's better to explicitly define an interface that describes the stack red zone and informs the back end, or vice versa. Unlike computers, people easily make mistakes if you don't tell them. On this bug, the fact is that over the years different back ends made similar bugs. GCC is really a perfect platform for building new ports, and I have seen a lot of new back ends. The current middle end is unsafe if a port doesn't support a stack red zone and its back end doesn't generate a barrier for it. Why can't we make this explicit in the compiler code between middle end and back end? What if another back end (new or old) not supporting a stack red zone exposes a similar bug again?
There are a gazillion things you have to explicitly get right in your back ends, so I don't see why exposing proper scheduling barriers should be special, and, there, why red zones should be special (as opposed to other occasions where you need to emit barriers from the back end for the scheduler). Richard, this is because: 1) The current scheduler is unsafe if the back end doesn't generate a barrier for a port that doesn't support a stack red zone at all. 2) Implementing the barrier in the back end is a burden for every new back-end implementation for a port that doesn't support a stack red zone. 3) There are many other ports that may have the same latent bug and simply haven't reported it yet.
[PATCH] Respin sparc pixel-compare patterns using iterators.
This is heavily inspired by a proof-of-concept patch David Bremner sent to me the other week. Committed to trunk.

gcc/
	* config/sparc/sparc.md (UNSPEC_FCMPLE, UNSPEC_FCMPNE,
	UNSPEC_FCMPGT, UNSPEC_FCMPEQ): Delete and reduce to...
	(UNSPEC_FCMP): New unspec.
	(gcond): New code iterator.
	(gcond_name): New code attr.
	(GCM): New mode iterator.
	(gcm_name): New mode attr.
	(fcmp{le,ne,gt,eq}{16,32}_vis): Reimplement using iterators.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179329 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog             | 11 +
 gcc/config/sparc/sparc.md | 90 +
 2 files changed, 21 insertions(+), 80 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2aae5aa..e853b15 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2011-09-28  David S. Miller  da...@davemloft.net
+
+	* config/sparc/sparc.md (UNSPEC_FCMPLE, UNSPEC_FCMPNE,
+	UNSPEC_FCMPGT, UNSPEC_FCMPEQ): Delete and reduce to...
+	(UNSPEC_FCMP): New unspec.
+	(gcond): New code iterator.
+	(gcond_name): New code attr.
+	(GCM): New mode iterator.
+	(gcm_name): New mode attr.
+	(fcmp{le,ne,gt,eq}{16,32}_vis): Reimplement using iterators.
+
 2011-09-28  Oleg Endo  oleg.e...@t-online.de

	PR target/49486

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 6e38298..dfc5559 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -58,7 +58,7 @@
    (UNSPEC_MUL8UL	46)
    (UNSPEC_MULDUL	47)
    (UNSPEC_ALIGNDATA	48)
-
+   (UNSPEC_FCMP	49)
    (UNSPEC_PDIST	50)
    (UNSPEC_EDGE8	51)
    (UNSPEC_EDGE8L	52)
@@ -69,11 +69,6 @@
    (UNSPEC_SP_SET	60)
    (UNSPEC_SP_TEST	61)
-
-   (UNSPEC_FCMPLE	70)
-   (UNSPEC_FCMPNE	71)
-   (UNSPEC_FCMPGT	72)
-   (UNSPEC_FCMPEQ	73)
  ])

 (define_constants
@@ -8149,83 +8144,18 @@
   "edge32l\t%r1, %r2, %0"
   [(set_attr "type" "edge")])

-(define_insn "fcmple16<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-		   (match_operand:V4HI 2 "register_operand" "e")]
-		  UNSPEC_FCMPLE))]
-  "TARGET_VIS"
-  "fcmple16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmple32<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-		   (match_operand:V2SI 2 "register_operand" "e")]
-		  UNSPEC_FCMPLE))]
-  "TARGET_VIS"
-  "fcmple32\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpne16<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-		   (match_operand:V4HI 2 "register_operand" "e")]
-		  UNSPEC_FCMPNE))]
-  "TARGET_VIS"
-  "fcmpne16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpne32<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-		   (match_operand:V2SI 2 "register_operand" "e")]
-		  UNSPEC_FCMPNE))]
-  "TARGET_VIS"
-  "fcmpne32\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpgt16<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-		   (match_operand:V4HI 2 "register_operand" "e")]
-		  UNSPEC_FCMPGT))]
-  "TARGET_VIS"
-  "fcmpgt16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpgt32<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-		   (match_operand:V2SI 2 "register_operand" "e")]
-		  UNSPEC_FCMPGT))]
-  "TARGET_VIS"
-  "fcmpgt32\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpeq16<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-		   (match_operand:V4HI 2 "register_operand" "e")]
-		  UNSPEC_FCMPEQ))]
-  "TARGET_VIS"
-  "fcmpeq16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
+(define_code_iterator gcond [le ne gt eq])
+(define_code_attr gcond_name [(le "le") (ne "ne") (gt "gt") (eq "eq")])
+(define_mode_iterator GCM [V4HI V2SI])
+(define_mode_attr gcm_name [(V4HI "16") (V2SI "32")])

-(define_insn "fcmpeq32<P:mode>_vis"
+(define_insn "fcmp<gcond_name><gcm_name><P:mode>_vis"
   [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-		   (match_operand:V2SI 2 "register_operand" "e")]
-		  UNSPEC_FCMPEQ))]
+	(unspec:P [(gcond:GCM
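For readers unfamiliar with machine-description iterators: the code iterator contributes four values and the mode iterator two, so the single combined define_insn above stands in for all eight deleted patterns. A sketch of the instance corresponding to the old fcmple16 pattern (the exact pattern names the gen* tools produce also depend on the P mode iterator, so treat this as illustrative rather than verbatim generator output):

```lisp
;; One of the 4 (codes) x 2 (modes) instances of the combined pattern,
;; equivalent to the deleted fcmple16 define_insn: the code iterator
;; substitutes 'le' and the mode iterator substitutes V4HI.
(define_insn "fcmple16<P:mode>_vis"
  [(set (match_operand:P 0 "register_operand" "=r")
        (unspec:P [(le:V4HI (match_operand:V4HI 1 "register_operand" "e")
                            (match_operand:V4HI 2 "register_operand" "e"))]
                  UNSPEC_FCMP))]
  "TARGET_VIS"
  "fcmple16\t%1, %2, %0"
  [(set_attr "type" "fpmul")
   (set_attr "fptype" "double")])
```

Note that the comparison code now appears structurally inside the unspec, which is why the four separate UNSPEC_FCMP{LE,NE,GT,EQ} constants could collapse into the single UNSPEC_FCMP.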
Re: Use of vector instructions in memmov/memset expanding
Michael, Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap failure... I checked bootstrap, specs and 'make check' with the complete patch. Separate patches for ME and BE were only tested for build (no bootstrap) and 'make check'. I think it's better to apply the complete patch, but review the separate patches (to make it easier). ps There also seems to be common sections in the memfunc-mid.patch and memfunc-be.patch patches. That's true, some new routines from middle-end are used in back-end changes - I couldn't separate the patches in other way without significant changes in them. On 29 September 2011 01:51, Jack Howarth howa...@bromo.med.uc.edu wrote: On Wed, Sep 28, 2011 at 05:33:23PM +0400, Michael Zolotukhin wrote: It appears that part 1 of the patch wasn't really attached. Thanks, resending. Michael, Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap failure... /sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/./prev-gcc/g++ -B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/./prev-gcc/ -B/sw/lib/gcc4.7/x86_64-apple-darwin11.2.0/bin/ -nostdinc++ -B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/src/.libs -B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/libsupc++/.libs -I/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/include/x86_64-apple-darwin11.2.0 -I/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/include -I/sw/src/fink.build/gcc47-4.7.0-1/gcc-4.7-20110927/libstdc++-v3/libsupc++ -L/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/src/.libs -L/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/libsupc++/.libs -c -g -O2 -mdynamic-no-pic -gtoggle -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc-4.7-20110927/gcc -I../../gcc-4.7-20110927/gcc/. -I../../gcc-4.7-20110927/gcc/../include -I../../gcc-4.7-20110927/gcc/../libcpp/include -I/sw/include -I/sw/include -I../../gcc-4.7-20110927/gcc/../libdecnumber -I../../gcc-4.7-20110927/gcc/../libdecnumber/dpd -I../libdecnumber -I/sw/include -I/sw/include -DCLOOG_INT_GMP -DCLOOG_ORG -I/sw/include ../../gcc-4.7-20110927/gcc/emit-rtl.c -o emit-rtl.o ../../gcc-4.7-20110927/gcc/emit-rtl.c: In function ‘rtx_def* adjust_address_1(rtx, machine_mode, long int, int, int)’: ../../gcc-4.7-20110927/gcc/emit-rtl.c:2060:26: error: unused variable ‘max_align’ [-Werror=unused-variable] cc1plus: all warnings being treated as errors on x86_64-apple-darwin11 with your patches. Jack ps There also seems to be common sections in the memfunc-mid.patch and memfunc-be.patch patches. -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: Use of vector instructions in memmov/memset expanding
Sorry what I meant is that it would be bad if -mtune=corei7(-avx)? was slower than generic. For now, -mtune=corei7 is triggering use of generic cost-table (I'm not sure about corei7-avx, but assume the same) - so it won't be slower. Adding new tables shouldn't be very difficult, even if they are the same as generic. Yes, that should be quite easy, but this isn't 'must-have' for the current patch (and it's already very huge). This could be done independently from the work on memfunctions inlining.