Re: ARM: VFPv3-D16 vs. VFPv3-D32
Which Cortex-R are you targeting that supports both D16 and D32?

Thanks,
Joey

On Thu, Oct 17, 2013 at 3:13 PM, Sebastian Huber <sebastian.hu...@embedded-brains.de> wrote:
> Hello,
>
> it seems that it is not possible to deduce from GCC built-in defines
> whether we compile for the VFPv3-D16 or the VFPv3-D32 floating-point unit:
>
>   touch empty.c
>   arm-eabi-gcc -march=armv7-r -mfpu=vfpv3-d16 -mfloat-abi=hard -E -P -v -dD empty.c > vfpv3-d16.txt
>   arm-eabi-gcc -march=armv7-r -mfpu=vfpv3 -mfloat-abi=hard -E -P -v -dD empty.c > vfpv3-d32.txt
>   diff vfpv3-d16.txt vfpv3-d32.txt
>
> Is it possible to add a built-in define for this? Alternatively, is it
> possible to use a target-specific multilib define in the GCC configuration
> that indicates it? I want to use such a compiler-provided define to decide
> what the context-switch support in an operating system should look like.
>
> --
> Sebastian Huber, embedded brains GmbH
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone   : +49 89 189 47 41-16
> Fax     : +49 89 189 47 41-09
> E-Mail  : sebastian.hu...@embedded-brains.de
> PGP     : Public key available on request.
>
> This message is not a business communication within the meaning of the EHUG.
Re: ARM: VFPv3-D16 vs. VFPv3-D32
On 2013-10-17 09:28, Joey Ye wrote:
> Which Cortex-R are you targeting that supports both D16 and D32?

I have a Cortex-R variant which supports D16 only, and now I want to add a
multilib to our GCC target configuration and use a compiler built-in to
adjust the context-switch code for this multilib.

-- Sebastian Huber, embedded brains GmbH
Re: ARM: VFPv3-D16 vs. VFPv3-D32
There is no macro to indicate VFP variants. You can probably check the CPU
variant instead. As far as I know, Cortex-R only supports D16.

Joey

On Thu, Oct 17, 2013 at 3:47 PM, Sebastian Huber
<sebastian.hu...@embedded-brains.de> wrote:
> On 2013-10-17 09:28, Joey Ye wrote:
>> Which Cortex-R are you targeting that supports both D16 and D32?
>
> I have a Cortex-R variant which supports D16 only, and now I want to add a
> multilib to our GCC target configuration and use a compiler built-in to
> adjust the context-switch code for this multilib.
>
> -- Sebastian Huber, embedded brains GmbH
Re: Compilation flags in libgfortran
On Wed, Oct 16, 2013 at 12:22 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
> On 16/10/13 10:37, pins...@gmail.com wrote:
>> On Oct 15, 2013, at 6:58 AM, Igor Zamyatin <izamya...@gmail.com> wrote:
>>> Hi All!
>>>
>>> Is there any particular reason that the matmul* modules in libgfortran
>>> are compiled with -O2 -ftree-vectorize? I see some regressions on the
>>> Atom processor after r202980
>>> (http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00846.html). Why not just use
>>> -O3 for those modules?
>>
>> -O3 and -O2 -ftree-vectorize won't give much performance difference. What
>> you are seeing is that the cost model needs improvement, at least for
>> Atom.
>
> Hi all,
>
> I think http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01908.html introduced
> the new "cheap" vectorizer cost model that favors compilation time over
> runtime performance and is set as the default for -O2. -O3 uses the
> "dynamic" model, which potentially gives better runtime performance in
> exchange for longer compile times (if I understand the new rules
> correctly). Therefore, I'd expect -O3 to give better vector performance
> than -O2...
>
> Kyrill

But this suggests compiling with -O2 -ftree-vectorize
-fvect-cost-model=dynamic, not building with -O3.

Richard.
Re: ARM: VFPv3-D16 vs. VFPv3-D32
On 17/10/13 08:56, Joey Ye wrote:
> There is no macro to indicate VFP variants. You can probably check the CPU
> variant instead. As far as I know, Cortex-R only supports D16.

Checking __ARM_ARCH_PROFILE == 'R' would tell you that it's the R profile
and therefore only 16 regs.

R.

> Joey
>
> On Thu, Oct 17, 2013 at 3:47 PM, Sebastian Huber
> <sebastian.hu...@embedded-brains.de> wrote:
>> On 2013-10-17 09:28, Joey Ye wrote:
>>> Which Cortex-R are you targeting that supports both D16 and D32?
>>
>> I have a Cortex-R variant which supports D16 only, and now I want to add
>> a multilib to our GCC target configuration and use a compiler built-in
>> to adjust the context-switch code for this multilib.
>>
>> -- Sebastian Huber, embedded brains GmbH
Re: ARM: VFPv3-D16 vs. VFPv3-D32
On 2013-10-17 14:24, Richard Earnshaw wrote:
> On 17/10/13 08:56, Joey Ye wrote:
>> There is no macro to indicate VFP variants. You can probably check the
>> CPU variant instead. As far as I know, Cortex-R only supports D16.
>
> Checking __ARM_ARCH_PROFILE == 'R' would tell you that it's the R profile
> and therefore only 16 regs.

Thanks for this information. I was not aware that no existing Cortex-R
variant has a D32 VFP unit. Is the converse true for Cortex-A variants, or
are there Cortex-A cores with both VFP-D16 and VFP-D32 units?

-- Sebastian Huber, embedded brains GmbH
Re: ARM: VFPv3-D16 vs. VFPv3-D32
On 17/10/13 13:46, Sebastian Huber wrote:
> On 2013-10-17 14:24, Richard Earnshaw wrote:
>> On 17/10/13 08:56, Joey Ye wrote:
>>> There is no macro to indicate VFP variants. You can probably check the
>>> CPU variant instead. As far as I know, Cortex-R only supports D16.
>>
>> Checking __ARM_ARCH_PROFILE == 'R' would tell you that it's the R profile
>> and therefore only 16 regs.
>
> Thanks for this information. I was not aware that no existing Cortex-R
> variant has a D32 VFP unit. Is the converse true for Cortex-A variants, or
> are there Cortex-A cores with both VFP-D16 and VFP-D32 units?

No, the converse is not true. In general (though I don't think there's a
guarantee that this will be the case) if you don't have Neon you won't have
D32. However, if you have Neon you must have D32. Not all Cortex-A cores
have Neon, though many do these days.

R.
Re: [RFC] By default if-convert only basic blocks that will be vectorized
Jakub, Richard,

I believe this patch is a good opportunity to improve the vectorization
capabilities. I have the following question related to it: do we plan to
treat #pragma omp simd as a directive to vectorize the underlying loop,
hence dropping any assessment regarding profitability?

Regards,
Sergos

On Tue, Oct 15, 2013 at 4:32 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> Hi!
>
> Especially on i?86/x86_64 the if-conversion pass often seems to be a
> pessimization, but vectorization relies on it, and without it we can't
> vectorize a lot of the loops. Here is a prototype of a patch that will by
> default (unless -ftree-loop-if-convert is given explicitly) only
> if-convert loops internally for vectorization, so the COND_EXPRs actually
> only appear as VEC_COND_EXPRs in the vectorized basic blocks, but will not
> appear if vectorization fails, or in the scalar loop if vectorization is
> conditional, or in the prologue or epilogue loops around the vectorized
> loop.
>
> Instead of moving the ifcvt pass inside the vectorizer, this patch makes
> ifcvt perform loop versioning depending on a special internal call; only
> if the internal call returns true do we go to the if-converted original
> loop, otherwise the non-if-converted copy of the original loop is
> executed. The vectorizer is taught to fold this internal call into true
> resp. false depending on whether the loop was vectorized or not, and
> vectorizer loop versioning, peeling for alignment and peeling for bound
> are adjusted to also copy from the non-if-converted loop rather than the
> if-converted one.
>
> Besides fixing the various PRs where if-conversion pessimizes code, I'd
> also like to move forward with this for conditional loads and stores,
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html, where the
> if-unconversion approach looked like a failure.
This patch doesn't yet handle if-converted inner loop in outer loop vectorization, something on my todo list (so several vect-cond-*.c tests FAIL because they are no longer vectorized) plus I had to change two SLP vectorization tests that silently relied on loop if-conversion being performed to actually optimize the basic block (if the same thing didn't appear in a loop, it wouldn't be optimized at all). On the newly added testcase on x86_64, there are before this patch 18 scalar conditional moves, with the patch just 2 (both in the checking routine). Comments? --- gcc/internal-fn.def.jj 2013-10-11 14:32:57.079909782 +0200 +++ gcc/internal-fn.def 2013-10-11 17:23:58.705526840 +0200 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) --- gcc/tree-vect-loop-manip.c.jj 2013-09-30 22:13:47.0 +0200 +++ gcc/tree-vect-loop-manip.c 2013-10-15 12:57:54.854970913 +0200 @@ -374,24 +374,31 @@ LOOP- loop1 static void slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop, + struct loop *scalar_loop, bool is_new_loop, basic_block *new_exit_bb) { - gimple orig_phi, new_phi; + gimple orig_phi, new_phi, scalar_phi = NULL; gimple update_phi, update_phi2; tree guard_arg, loop_arg; basic_block new_merge_bb = guard_edge-dest; edge e = EDGE_SUCC (new_merge_bb, 0); basic_block update_bb = e-dest; basic_block orig_bb = loop-header; - edge new_exit_e; + edge new_exit_e, scalar_e = NULL; tree current_new_name; - gimple_stmt_iterator gsi_orig, gsi_update; + gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none (); /* Create new bb between loop and new_merge_bb. 
*/ *new_exit_bb = split_edge (single_exit (loop)); new_exit_e = EDGE_SUCC (*new_exit_bb, 0); + if (scalar_loop != NULL !is_new_loop) +{ + gsi_scalar = gsi_start_phis (scalar_loop-header); + scalar_e = EDGE_SUCC (scalar_loop-latch, 0); +} + for (gsi_orig = gsi_start_phis (orig_bb), gsi_update = gsi_start_phis (update_bb); !gsi_end_p (gsi_orig) !gsi_end_p (gsi_update); @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge tree new_res; orig_phi = gsi_stmt (gsi_orig); update_phi = gsi_stmt (gsi_update); + if (scalar_e != NULL) + { + scalar_phi = gsi_stmt (gsi_scalar); + gsi_next (gsi_scalar); + } /** 1. Handle new-merge-point phis **/ @@ -460,7 +472,13 @@ slpeel_update_phi_nodes_for_guard1 (edge current_new_name = loop_arg; else { - current_new_name = get_current_def (loop_arg); + if (scalar_e) + { + current_new_name = PHI_ARG_DEF_FROM_EDGE (scalar_phi, scalar_e); + current_new_name =
Re: [RFC] By default if-convert only basic blocks that will be vectorized
On Oct 15, 2013, at 5:32 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> Hi!
>
> Especially on i?86/x86_64 the if-conversion pass often seems to be a
> pessimization, but vectorization relies on it, and without it we can't
> vectorize a lot of the loops.

I think on many other targets it actually helps. I know for one that it
helps on Octeon, even though Octeon has no vector instructions. I think it
helps on most ARM targets too.

Thanks,
Andrew

> Here is a prototype of a patch that will by default (unless
> -ftree-loop-if-convert is given explicitly) only if-convert loops
> internally for vectorization, so the COND_EXPRs actually only appear as
> VEC_COND_EXPRs in the vectorized basic blocks, but will not appear if
> vectorization fails, or in the scalar loop if vectorization is
> conditional, or in the prologue or epilogue loops around the vectorized
> loop.
>
> Instead of moving the ifcvt pass inside the vectorizer, this patch makes
> ifcvt perform loop versioning depending on a special internal call; only
> if the internal call returns true do we go to the if-converted original
> loop, otherwise the non-if-converted copy of the original loop is
> executed. The vectorizer is taught to fold this internal call into true
> resp. false depending on whether the loop was vectorized or not, and
> vectorizer loop versioning, peeling for alignment and peeling for bound
> are adjusted to also copy from the non-if-converted loop rather than the
> if-converted one.
>
> Besides fixing the various PRs where if-conversion pessimizes code, I'd
> also like to move forward with this for conditional loads and stores,
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html, where the
> if-unconversion approach looked like a failure.
This patch doesn't yet handle if-converted inner loop in outer loop vectorization, something on my todo list (so several vect-cond-*.c tests FAIL because they are no longer vectorized) plus I had to change two SLP vectorization tests that silently relied on loop if-conversion being performed to actually optimize the basic block (if the same thing didn't appear in a loop, it wouldn't be optimized at all). On the newly added testcase on x86_64, there are before this patch 18 scalar conditional moves, with the patch just 2 (both in the checking routine). Comments? --- gcc/internal-fn.def.jj2013-10-11 14:32:57.079909782 +0200 +++ gcc/internal-fn.def2013-10-11 17:23:58.705526840 +0200 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) --- gcc/tree-vect-loop-manip.c.jj2013-09-30 22:13:47.0 +0200 +++ gcc/tree-vect-loop-manip.c2013-10-15 12:57:54.854970913 +0200 @@ -374,24 +374,31 @@ LOOP- loop1 static void slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop, +struct loop *scalar_loop, bool is_new_loop, basic_block *new_exit_bb) { - gimple orig_phi, new_phi; + gimple orig_phi, new_phi, scalar_phi = NULL; gimple update_phi, update_phi2; tree guard_arg, loop_arg; basic_block new_merge_bb = guard_edge-dest; edge e = EDGE_SUCC (new_merge_bb, 0); basic_block update_bb = e-dest; basic_block orig_bb = loop-header; - edge new_exit_e; + edge new_exit_e, scalar_e = NULL; tree current_new_name; - gimple_stmt_iterator gsi_orig, gsi_update; + gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none (); /* Create new bb between loop and new_merge_bb. 
*/ *new_exit_bb = split_edge (single_exit (loop)); new_exit_e = EDGE_SUCC (*new_exit_bb, 0); + if (scalar_loop != NULL !is_new_loop) +{ + gsi_scalar = gsi_start_phis (scalar_loop-header); + scalar_e = EDGE_SUCC (scalar_loop-latch, 0); +} + for (gsi_orig = gsi_start_phis (orig_bb), gsi_update = gsi_start_phis (update_bb); !gsi_end_p (gsi_orig) !gsi_end_p (gsi_update); @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge tree new_res; orig_phi = gsi_stmt (gsi_orig); update_phi = gsi_stmt (gsi_update); + if (scalar_e != NULL) +{ + scalar_phi = gsi_stmt (gsi_scalar); + gsi_next (gsi_scalar); +} /** 1. Handle new-merge-point phis **/ @@ -460,7 +472,13 @@ slpeel_update_phi_nodes_for_guard1 (edge current_new_name = loop_arg; else { - current_new_name = get_current_def (loop_arg); + if (scalar_e) +{ + current_new_name = PHI_ARG_DEF_FROM_EDGE (scalar_phi, scalar_e); + current_new_name = get_current_def (current_new_name); +} + else +current_new_name = get_current_def (loop_arg); /* current_def is not available only if the variable does
Re: Enable building of libatomic on AArch64
Ping?

Michael Hudson-Doyle <michael.hud...@linaro.org> writes:
> Marcus Shawcroft <marcus.shawcr...@gmail.com> writes:
>> On 3 October 2013 23:43, Michael Hudson-Doyle <michael.hud...@linaro.org> wrote:
>>> Hi,
>>>
>>> As libatomic builds for AArch64 and the tests pass (built on x86_64 but
>>> tested on a Foundation Model; logs and summary:
>>> http://people.linaro.org/~mwhudson/libatomic.sum.txt
>>> http://people.linaro.org/~mwhudson/runtest-log-v-2.txt )
>>> this patch enables the build.
>>>
>>> Cheers,
>>> mwh
>>> (first time posting to this list, let me know if I'm doing it wrong)
>>>
>>> 2013-10-04  Michael Hudson-Doyle  <michael.hud...@linaro.org>
>>>
>>>     * configure.tgt: Add AArch64 support.
>>
>> Hi, The patch looks fine to me.
>
> Thanks for looking!
>
>> The ChangeLog entry should reflect the code that was removed rather than
>> the functionality added. Perhaps:
>>
>>     * configure.tgt (aarch64*): Remove.
>
> There are a few too many negatives going on to make a pithy explanation
> easy...
>
>> Did you investigate whether or not the 10 UNSUPPORTED results in the
>> testsuite are sane?
>
> I did not, but have now. I think that 5 look legitimate since they require
> 128-bit sync ops. The other 5 look superficially like they should be
> supported on aarch64.
>
>> We may just be missing aarch64 target supports wiring in
>> check_effective_target_sync_long_long_runtime?
>
> Yes, that was it; with appropriate changes the -4 tests all pass. However,
> just out of a sense of curiosity, I added wiring to claim aarch64*
> supports 128-bit sync ops and all the -5 tests pass too. Is that just
> luck, or because the reservation granule on the Foundation Model is big
> enough, or something else? In any case, I'll attach a patch that just
> claims support for long long sync ops for now...
>
> Cheers,
> mwh
>
>> /Marcus

2013-10-04  Michael Hudson-Doyle  <michael.hud...@linaro.org>

	* libatomic/configure.tgt (aarch64*): Remove code preventing build.
	* gcc/testsuite/lib/target-supports.exp
	(check_effective_target_sync_long_long): AArch64 supports atomic
	operations on long long.
(check_effective_target_sync_long_long_runtime): AArch64 can execute atomic operations on long long. diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 7eb4dfe..5557c06 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4508,6 +4508,7 @@ proc check_effective_target_sync_int_128_runtime { } { proc check_effective_target_sync_long_long { } { if { [istarget x86_64-*-*] || [istarget i?86-*-*]) + || [istarget aarch64*-*-*] || [istarget arm*-*-*] || [istarget alpha*-*-*] || ([istarget sparc*-*-*] [check_effective_target_lp64]) } { @@ -4537,6 +4538,8 @@ proc check_effective_target_sync_long_long_runtime { } { } } }] +} elseif { [istarget aarch64*-*-*] } { + return 1 } elseif { [istarget arm*-*-linux-*] } { return [check_runtime sync_longlong_runtime { #include stdlib.h diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt index b9e5d6c..7eaab38 100644 --- a/libatomic/configure.tgt +++ b/libatomic/configure.tgt @@ -95,11 +95,6 @@ fi # Other system configury case ${target} in - aarch64*) - # This is currently not supported in AArch64. - UNSUPPORTED=1 - ;; - arm*-*-linux*) # OS support for atomic primitives. config_path=${config_path} linux/arm posix
PR libstdc++/58729 - tr2::dynamic_bitset::resize fails
This patch bootstraps and tests clean on x86-64-linux. Truthfully, dynamic_bitset needs some more love wrt C++11 and a testsuite. It got put in before it was baked really. That will be later. 2013-10-16 Edward Smith-Rowland 3dw...@verizon.net PR libstdc++/58729 * include/tr2/dynamic_bitset (_M_resize, resize): Use input value to set bits; (_M_do_left_shift, _M_do_right_shift, _M_do_to_ulong, _M_do_to_ullong, _M_do_find_first, _M_do_find_next, _M_copy_from_ptr, operator): Move long methods outline to... * include/tr2/dynamic_bitset.tcc: New. * include/Makefile.am: Add dynamic_bitset.tcc. * include/Makefile.in: Add dynamic_bitset.tcc. * testsuite/tr2/dynamic_bitset/pr58729.cc: New. Index: include/tr2/dynamic_bitset === --- include/tr2/dynamic_bitset (revision 203739) +++ include/tr2/dynamic_bitset (working copy) @@ -137,7 +137,12 @@ if (__nbits % _S_bits_per_block 0) ++__sz; if (__sz != this-_M_w.size()) - this-_M_w.resize(__sz); + { + block_type __val = 0; + if (__value) + __val = std::numeric_limitsblock_type::max(); + this-_M_w.resize(__sz, __val); + } } allocator_type @@ -246,7 +251,7 @@ bool _M_is_equal(const __dynamic_bitset_base __x) const { - if (__x.size() == this-size()) + if (__x._M_w.size() == this-_M_w.size()) { for (size_t __i = 0; __i this-_M_w.size(); ++__i) if (this-_M_w[__i] != __x._M_w[__i]) @@ -260,7 +265,7 @@ bool _M_is_less(const __dynamic_bitset_base __x) const { - if (__x.size() == this-size()) + if (__x._M_w.size() == this-_M_w.size()) { for (size_t __i = this-_M_w.size(); __i 0; --__i) { @@ -297,9 +302,9 @@ bool _M_is_subset_of(const __dynamic_bitset_base __b) { - if (__b.size() == this-size()) + if (__b._M_w.size() == this-_M_w.size()) { - for (size_t __i = 0; __i _M_w.size(); ++__i) + for (size_t __i = 0; __i this-_M_w.size(); ++__i) if (this-_M_w[__i] != (this-_M_w[__i] | __b._M_w[__i])) return false; return true; @@ -364,140 +369,6 @@ } }; - // Definitions of non-inline functions from __dynamic_bitset_base. 
- templatetypename _WordT, typename _Alloc -void -__dynamic_bitset_base_WordT, _Alloc::_M_do_left_shift(size_t __shift) -{ - if (__builtin_expect(__shift != 0, 1)) - { - const size_t __wshift = __shift / _S_bits_per_block; - const size_t __offset = __shift % _S_bits_per_block; - - if (__offset == 0) - for (size_t __n = this-_M_w.size() - 1; __n = __wshift; --__n) - this-_M_w[__n] = this-_M_w[__n - __wshift]; - else - { - const size_t __sub_offset = _S_bits_per_block - __offset; - for (size_t __n = _M_w.size() - 1; __n __wshift; --__n) - this-_M_w[__n] = ((this-_M_w[__n - __wshift] __offset) -| (this-_M_w[__n - __wshift - 1] __sub_offset)); - this-_M_w[__wshift] = this-_M_w[0] __offset; - } - - std::fill(this-_M_w.begin(), this-_M_w.begin() + __wshift, - static_cast_WordT(0)); - } -} - - templatetypename _WordT, typename _Alloc -void -__dynamic_bitset_base_WordT, _Alloc::_M_do_right_shift(size_t __shift) -{ - if (__builtin_expect(__shift != 0, 1)) - { - const size_t __wshift = __shift / _S_bits_per_block; - const size_t __offset = __shift % _S_bits_per_block; - const size_t __limit = this-_M_w.size() - __wshift - 1; - - if (__offset == 0) - for (size_t __n = 0; __n = __limit; ++__n) - this-_M_w[__n] = this-_M_w[__n + __wshift]; - else - { - const size_t __sub_offset = (_S_bits_per_block - - __offset); - for (size_t __n = 0; __n __limit; ++__n) - this-_M_w[__n] = ((this-_M_w[__n + __wshift] __offset) -| (this-_M_w[__n + __wshift + 1] __sub_offset)); - this-_M_w[__limit] = this-_M_w[_M_w.size()-1] __offset; - } - - std::fill(this-_M_w.begin() + __limit + 1, this-_M_w.end(), - static_cast_WordT(0)); - } -} - - templatetypename _WordT, typename _Alloc -unsigned long -__dynamic_bitset_base_WordT, _Alloc::_M_do_to_ulong() const -{ - size_t __n = sizeof(unsigned long) / sizeof(block_type); - for (size_t __i = __n; __i this-_M_w.size(); ++__i) - if (this-_M_w[__i]) - __throw_overflow_error(__N(__dynamic_bitset_base::_M_do_to_ulong)); - unsigned
Re: [Patch] Fix undefined behaviors in regex
On Wed, Oct 16, 2013 at 07:02:03PM -0400, Tim Shen wrote:
> To be honest, I was thinking something much smaller than the whole regex ;)
> But let's add Marek in CC.
>
>   int work() { }
>
>   int main() {
>     int a = work();
>     return a;
>   }
>
>   /* This is a smaller case to test the sanitizer. It seems that the
>      undefined sanitizer is not merged? I use `g++ (GCC) 4.9.0 20131003`,
>      is that too old? */

No, that's not too old. The thing is that -fsanitize=undefined isn't
complete: we currently sanitize shifts, division by zero, and
__builtin_unreachable calls. VLA sanitization is done but not committed,
because I'm waiting for a review of the C++ FE part of that patch, and I'm
working on NULL pointer checking now. Detection of a missing return
statement will definitely be added, too (quite easy, I should think), and
that would detect the bug in your test case.

Still, thanks for letting me know.

	Marek
Re: [Patch] Fix undefined behaviors in regex
On Thu, Oct 17, 2013 at 09:12:41AM +0200, Marek Polacek wrote:
> On Wed, Oct 16, 2013 at 07:02:03PM -0400, Tim Shen wrote:
>> To be honest, I was thinking something much smaller than the whole
>> regex ;) But let's add Marek in CC.
>>
>>   int work() { }
>>
>>   int main() {
>>     int a = work();
>>     return a;
>>   }
>>
>>   /* This is a smaller case to test the sanitizer. It seems that the
>>      undefined sanitizer is not merged? I use `g++ (GCC) 4.9.0 20131003`,
>>      is that too old? */
>
> No, that's not too old. The thing is that -fsanitize=undefined isn't
> complete: we currently sanitize shifts, division by zero, and
> __builtin_unreachable calls. VLA sanitization is done but not committed,
> because I'm waiting for a review of the C++ FE part of that patch, and I'm
> working on NULL pointer checking now. Detection of a missing return
> statement will definitely be added, too (quite easy, I should think), and
> that would detect the bug in your test case.

Though, in the above case, the question is why people ignore warnings from
the compiler and need special runtime instrumentation to remind them
instead. I'm not objecting to that sanitization, I only find it weird.

	Jakub
Re: [PATCH][i386]Fix PR 57756
On Wed, 2013-10-16 19:40:21 -0700, Xinliang David Li <davi...@google.com> wrote:
> On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn <dje@gmail.com> wrote:
>> On Wed, Oct 16, 2013 at 7:23 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>>> I was unable to build a native powerpc compiler. I checked for
>>> build_target_node and build_optimization_node throughout and changed
>>> rs6000 because it had references. I did not realize
>>> function_specific_save and function_specific_restore have to be
>>> changed. Sorry for breaking it.
>>
>> As Mike replied, gcc110 is available. Richard Biener's approval was
>> dependent upon successful bootstrap and passing the regression
>> testsuite, which you did not even attempt, nor did you try to build a
>> cross-compiler.
>
> This is an oversight. I agree that it is better to test on multiple
> platforms for large changes like this. In the past, Sri has been very
> attentive to any fallout due to his changes, so is this time.

Speaking of multiple platforms... This patch didn't only produce fallout on
rs6k, but also on quite a number of other architectures. I already sent one
message (it can be found in the archives at
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01156.html) listing quite a
number of targets that are broken right now. The situation hasn't changed
significantly since they started to fail. Please have a look at my build
robot [1]. All targets (except nds32, nios2 and arceb) used to build.

Don't get me wrong, I don't want to overly blame Sriraman for breaking it
in the first place. Shit happens. But please keep an eye on fixing the
fallout in a timely manner.

MfG, JBG

[1] http://toolchain.lug-owl.de/buildbot/?limit=2000

--
Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: Lately I've had a bit too much of a reality check.
the second  : I'd slowly like to be able to dream on again.
  -- Maximilian Wilhelm (18 May 2005, #lug-owl.de)
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Wed, Oct 16, 2013 at 04:25:58PM -0700, Wei Mi wrote:
> +/* Return true if target platform supports macro-fusion.  */
> +
> +static bool
> +ix86_macro_fusion_p ()
> +{
> +  if (TARGET_FUSE_CMP_AND_BRANCH)
> +    return true;
> +  else
> +    return false;
> +}

That looks weird, why not just

static bool
ix86_macro_fusion_p (void)
{
  return TARGET_FUSE_CMP_AND_BRANCH;
}

?

	Marek
Re: [PATCH][i386]Fix PR 57756
What about all the other targets you broke? Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [PATCH] Fix PR58143 and dups
On Tue, 15 Oct 2013, Richard Biener wrote:
> This is an alternate fix (see
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00234.html for the other one)
> for the various PRs that show that LIM exposes undefined signed overflow
> on paths where it wasn't executed before LIM, ultimately leading to a
> miscompilation. For this fix we rewrite invariant stmts to use unsigned
> arithmetic when the operation is one of those that SCEV and niter
> analysis handle (thus not division or absolute). The other fix instead
> disables invariant motion for those expressions.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> The issue is also present on the 4.8 branch, so either patch should be
> backported in the end.
>
> I will try to benchmark both tomorrow (unless somebody beats me to it).

Comparing both patches doesn't give conclusive results. A single-run SPEC
CPU2006 on a Sandy Bridge machine gives (base is Bernd's patch, peak is
mine; -Ofast -funroll-loops -march=native):

                   Estimated                    Estimated
                 Base     Base      Base      Peak     Peak      Peak
Benchmarks       Ref.  Run Time    Ratio      Ref.  Run Time    Ratio
--------------  -----  --------  -------    -----  --------  -------
400.perlbench    9770       332     29.4 *   9770       325     30.0 *
401.bzip2        9650       450     21.4 *   9650       449     21.5 *
403.gcc          8050       282     28.5 *   8050       282     28.6 *
429.mcf          9120       225     40.6 *   9120       226     40.4 *
445.gobmk       10490       411     25.6 *  10490       408     25.7 *
456.hmmer        9330       364     25.6 *   9330       365     25.6 *
458.sjeng       12100       433     27.9 *  12100       432     28.0 *
462.libquantum  20720       318     65.1 *  20720       325     63.8 *
464.h264ref     22130       525     42.2 *  22130       527     42.0 *
471.omnetpp      6250       263     23.7 *   6250       264     23.7 *
473.astar        7020       347     20.2 *   7020       346     20.3 *
483.xalancbmk    6900       188     36.7 *   6900       189     36.5 *
 Est. SPECint_base2006               --
 Est. SPECint2006                                               --

410.bwaves      13590       307     44.2 *  13590       303     44.9 *
416.gamess                           NR                          NR
433.milc         9180       356     25.8 *   9180       356     25.8 *
434.zeusmp       9100       295     30.8 *   9100       297     30.7 *
435.gromacs      7140       283     25.3 *   7140       282     25.3 *
436.cactusADM   11950       380     31.4 *  11950       383     31.2 *
437.leslie3d     9400       288     32.7 *   9400       284     33.1 *
444.namd         8020       401     20.0 *   8020       402     20.0 *
447.dealII      11440       277     41.3 *  11440       278     41.1 *
450.soplex       8340       208     40.1 *   8340       206     40.5 *
453.povray       5320       180     29.6 *   5320       178     29.9 *
454.calculix     8250       393     21.0 *   8250       392     21.0 *
459.GemsFDTD    10610       316     33.6 *  10610       308     34.4 *
465.tonto        9840       294     33.5 *   9840       294     33.5 *
470.lbm         13740       245     56.1 *  13740       245     56.1 *
481.wrf         11170       259     43.1 *  11170       259     43.2 *
482.sphinx3     19490       396     49.3 *  19490       397     49.1 *
 Est. SPECfp_base2006                --
 Est. SPECfp2006                                                --

Off-noise (more than 5s difference) may be 462.libquantum and 459.GemsFDTD.
I didn't include unpatched trunk in the comparison (not fixing the bug
isn't an option after all). Conceptually I like the rewriting into unsigned
arithmetic more, so I'm going to apply that variant later today (re-testing
with 3 runs of 462.libquantum and 459.GemsFDTD, this time with
address-space randomization turned off).

Richard.

> Any preference or other suggestions?
>
> Thanks,
> Richard.
>
> 2013-10-15  Richard Biener  <rguent...@suse.de>
>
> 	PR tree-optimization/58143
> 	* tree-ssa-loop-im.c (arith_code_with_undefined_signed_overflow):
> 	New function.
> 	(rewrite_to_defined_overflow): Likewise.
> 	(move_computations_dom_walker::before_dom): Rewrite stmts with
> 	undefined signed overflow that are not always
Re: [PATCH] Fix PR58143 and dups
On Thu, Oct 17, 2013 at 09:56:31AM +0200, Richard Biener wrote:
> Off-noise (more than 5s difference) may be 462.libquantum and
> 459.GemsFDTD. I didn't include unpatched trunk in the comparison (not
> fixing the bug isn't an option after all). Conceptually I like the
> rewriting into unsigned arithmetic more, so I'm going to apply that
> variant later today (re-testing with 3 runs of 462.libquantum and
> 459.GemsFDTD, this time with address-space randomization turned off).

Can't we rewrite for the selected arithmetic operations and punt (is that
what Bernd's patch did?) on moving other arithmetics? Well, there are
operations that are safe even for signed types, e.g. rotates; isn't
RSHIFT_EXPR also safe? Can't you handle LSHIFT_EXPR as well (or do we treat
it, even signed, as never undefined in the middle-end/backend)?

	Jakub
Re: Patch: Add #pragma ivdep support to the ME and C FE
On Wed, 16 Oct 2013, Tobias Burnus wrote:
> Frederic Riss wrote:
>> Just one question. You describe the pragma in the doc patch as:
>>
>> +This pragma tells the compiler that the immediately following @code{for}
>> +loop can be executed in any loop index order without affecting the result.
>> +The pragma aids optimization and in particular vectorization as the
>> +compiler can then assume a vectorization safelen of infinity.
>>
>> I'm not a specialist, but I was always told that the 'original' meaning
>> of ivdep (which I believe was introduced by Cray) was that the compiler
>> could assume that there are only forward dependencies in the loop, but
>> not that it can be executed in any order.
>
> The nice thing about #pragma ivdep is that there is no real standard. And
> the explanations from the different vendors are also not completely
> clear. An overview is given in the following file on pages 13-14 for Cray
> Research PVP, MIPSpro/Open64, Intel ICC and Multiflow:
> http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/GREPS2007-Benoit.pdf
>
> That's summarized as:
> - vector: ignore lexically upward dependencies (Cray PVP, Intel ICC)
> - parallel: ignore loop-carried dependencies (MIPSpro, Open64)
> - liberal: ignore loop-variant dependencies (Multiflow)
>
> The quotes for Cray and Intel are below.
>
> Cray:
> http://docs.cray.com/books/004-2179-001/html-004-2179-001/brtlrwh.html#EKZ5MRWH
> "The ivdep directive tells the compiler to ignore vector dependencies for
> the loop immediately following the directive. Conditions other than
> vector dependencies can inhibit vectorization. If these conditions are
> satisfactory, the loop vectorizes. This directive is useful for some
> loops that contain pointers and indirect addressing. The format of this
> directive is as follows: #pragma _CRI ivdep"

Which suggests we use #pragma GCC ivdep to not collide with possibly
different semantics in existing programs that use variants of this pragma?
Intel: http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm The ivdep pragma instructs the compiler to ignore assumed vector dependencies. To ensure correct code, the compiler treats an assumed dependence as a proven dependence, which prevents vectorization. This pragma overrides that decision. Use this pragma only when you know that the assumed loop dependencies are safe to ignore. This suggests that _known_ dependences are still treated as dependences. But what is known obviously depends on the implementation, which may not know that a[i] and a[i+1] depend but merely assume it. Not a standard-proof definition of the pragma ;) That said, safelen even overrides known dependences (but with unknown distance vector)! (that looks like a bug to me, or at least a QOI issue) The Intel docs give this example: ... Given your description, this loop wouldn't be a candidate for ivdep, as reversing the loop index order changes the semantics. I believe that the way you interpret it (i.e. setting the vectorization safe length to INT_MAX) is correct with respect to this other definition, though. Do you have a suggestion for a better wording? My idea was to interpret this part similarly to OpenMP's simd with safelen=infinity. (Actually, I believe loop-safelen was added for OpenMPv4's and/or Cilk Plus's simd.) OpenMPv4.0, http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf , states for this (excerpt from page 70): A SIMD loop has logical iterations numbered 0, 1,...,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if the associated loop(s) were executed with no SIMD instructions. If the safelen clause is used then no two iterations executed concurrently with SIMD instructions can have a greater distance in the logical iteration space than its value. 
The parameter of the safelen clause must be a constant positive integer expression. The number of iterations that are executed concurrently at any given time is implementation defined. Each concurrent iteration will be executed by a different SIMD lane. Each set of concurrent iterations is a SIMD chunk. OTOH, if we are mapping ivdep to safelen why not simply allow #pragma GCC safelen 4 ? Oh, and are there any plans to maintain this information in some way till the back-end? Software pipelining could be another huge winner for that kind of dependency analysis simplification. I don't know until when loop-safelen is kept. As it is late in the middle-end, providing the backend with this information should be simple. It's kept as long as we preserve loops which at the moment is until after RTL loop optimizations are finished. Extending this isn't hard, I just didn't see a reason to do that. Richard.
Re: [wide-int] int_traits tree
Kenneth Zadeck zad...@naturalbridge.com writes: As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right? i believe this is correct. If so, xprecision < precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs. say my HWI size is 8 bits (just to keep from typing a million 'f's). if i have a 16 bit unsigned number that is all 1s in the tree world it is 3 hwis 0x00 0xff 0xff. but inside regular wide int, it would take 1 wide int whose value is 0xff. inside of max it would be the same as the tree, but then the test precision > xprecision + hbpwi never kicks in because precision is guaranteed to be huge. inside of addr_wide_int, i think we tank with the assertion. It should be OK for addr_wide_int too. The precision still fits 2 HWIs. The initial length is greater than the maximum length of an addr_wide_int, but your len = MAX (len, max_len) deals with that. the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this. Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc. Thanks, Richard
Re: patch to canonize unsigned tree-csts
Kenneth Zadeck zad...@naturalbridge.com writes: On 10/15/2013 02:30 PM, Richard Sandiford wrote: Richard Sandiford rdsandif...@googlemail.com writes: if (small_prec) ; else if (precision == xprecision) while (len >= 0 && val[len - 1] == -1) len--; Err, len > 0 obviously. you were only close. Bah. patch tested on ppc and committed as revision 203739. Thanks. So that should have got rid of the last use of scratch in tree->addr_wide_int and tree->max_wide_int. Richard
RE: FW: MAX_PATH problems with mingw gcc
-Original Message- From: Joey Ye [mailto:joey.ye...@gmail.com] There is an issue on file systems with symbolic links, like Linux ext2/3. It could vitally change the behavior of GCC. The issue is described as follows. Such logic cannot be deduced simply from the name string, so your patch can do nothing about it. For your patch IMHO the way out could be to just define a real function for DOS FS, and leave a blank function body otherwise. Like: filename_normalize (char *fn) { #if DOS ... #endif } Thank you for pointing out this problem. So, on file systems with symlink support, playing with filenames as strings is impossible. This means that the filename_normalize name is too pretentious - it will do nothing for most of gcc's consumers. On the other hand, it is useful for OSes with a small MAX_PATH. Do you think it should be renamed to filename_shortcut/filename_simplify or something like that? So readers won't expect too much from it even without looking at its body. Is it allowed to write # ifdef HAVE_DOS_BASED_FILE_SYSTEM extern void filename_normalize (char *f); #else #define filename_normalize (char *f) #endif directly in include/filenames.h? This way we'll avoid an unnecessary empty call on platforms with symlinks. And a more general question. I can imagine that different filenames produced after a cross build on different OSes for the same target may confuse some upper-level tools, like code analyzers, code coverage, etc... Does it make sense to push such a solution to gcc mainstream? Maybe it is better to keep the patch for cross-toolchain builders... Regards Vladimir
Re: Patch: Add #pragma ivdep support to the ME and C FE
On Thu, Oct 17, 2013 at 10:07:43AM +0200, Richard Biener wrote: Which suggests we use #pragma GCC ivdep to not collide with possibly different semantics in existing programs that use variants of this pragma? Yeah, perhaps. Intel: http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm The ivdep pragma instructs the compiler to ignore assumed vector dependencies. To ensure correct code, the compiler treats an assumed dependence as a proven dependence, which prevents vectorization. This pragma overrides that decision. Use this pragma only when you know that the assumed loop dependencies are safe to ignore. This suggests that _known_ dependences are still treated as dependences. But what is known obviously depends on the implementation which may not know that a[i] and a[i+1] depend but merely assume it. Not a standard-proof definition of the pragma ;) Very bad definition indeed. That said, safelen even overrides known dependences (but with unknown distance vector)! (that looks like a bug to me, or at least a QOI issue) safelen is whatever the OpenMP 4.0 standard requires (and Cilk+ is I believe just deferring the description to the OpenMP 4.0 definition), and fortunately the OpenMP 4.0 standard doesn't contain any such badly worded definition. The actual wording is not that the <= safelen consecutive iterations can be run in any order, but that they can be performed all together using (possibly emulated) SIMD instructions. Thus I think it is correct if we use it for decisions about whether we can vectorize a loop (without versioning it for alias), regardless of known vs. unknown dependencies (if we have known dependencies that would result in known broken code, perhaps we should warn?), but probably not for anything further; it should not affect aliasing of scalar or vector loads/stores in the loop, etc. Then I think we'd handle forward but not backward dependencies. 
Given the void ignore_vec_dep(int *a, int k, int c, int m) { #pragma omp simd for (int i = 0; i < m; i++) a[i] = a[i + k] * c; } testcase, I think we'll handle it fine for k <= -m and k >= 0. For k >= m obviously, there is no overlap and even runtime versioning for alias would handle it right, for smaller k because the load (vector or non-vector) will be before the store and we don't tell aliasing the two don't alias. We don't vectorize any load+store operations (expressed as one stmt in GIMPLE; struct copies or atomic stmts), do we? Those would be a problem with the above. If for anything else we place all the VF loads where the original load was in the IL and all the VF stores where the original store was in the IL, and just leave it to other passes to reorder if they can prove it doesn't alias, we should be fine. Jakub
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right? i believe this is correct. If so, xprecision < precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs. say my HWI size is 8 bits (just to keep from typing a million 'f's). if i have a 16 bit unsigned number that is all 1s in the tree world it is 3 hwis 0x00 0xff 0xff. but inside regular wide int, it would take 1 wide int whose value is 0xff. inside of max it would be the same as the tree, but then the test precision > xprecision + hbpwi never kicks in because precision is guaranteed to be huge. inside of addr_wide_int, i think we tank with the assertion. It should be OK for addr_wide_int too. The precision still fits 2 HWIs. The initial length is greater than the maximum length of an addr_wide_int, but your len = MAX (len, max_len) deals with that. It's len = MIN (len, max_len) which looked suspicious to me, but with precision >= xprecision, precision can only be zero if xprecision is zero, which looked to me like it cannot happen - or rather it should be fixed. the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this. Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc. My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16. 
AFAIK that is what the code produces, but now Kenny says this is only for some kind of wide-ints but not all? That is, why is inline wi::storage_ref wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch, unsigned int precision, const_tree x) { unsigned int len = TREE_INT_CST_NUNITS (x); const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0); return wi::storage_ref (val, len, precision); } not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required? I would like to see us move in that direction (given that the tree rep of INTEGER_CST has transitioned to variable-length already). Btw, code such as tree wide_int_to_tree (tree type, const wide_int_ref &pcst) { ... unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1); bool recanonize = sgn == UNSIGNED && small_prec && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len; definitely needs a comment. I would have thought _all_ unsigned numbers need re-canonicalization. Well, maybe only if we're forcing the extra zeros. [This function shows another optimization issue: case BOOLEAN_TYPE: /* Cache false or true. */ limit = 2; if (wi::leu_p (cst, 1)) ix = cst.to_uhwi (); I would have expected cst <= 1 to be optimized to cst.len == 1 && cst.val[0] <= 1. 
It expands to <L27>: MEM[(long int *)D.50698 + 16B] = 1; MEM[(struct wide_int_ref_storage *)D.50698] = &MEM[(struct wide_int_ref_storage *)D.50698].scratch; MEM[(struct wide_int_ref_storage *)D.50698 + 8B] = 1; MEM[(struct wide_int_ref_storage *)D.50698 + 12B] = 32; _277 = MEM[(const struct wide_int_storage *)cst + 260B]; if (_277 <= 64) goto <bb 42>; else goto <bb 43>; <bb 42>: xl_491 = zext_hwi (1, 32); // ok, checking enabled and thus out-of-line _494 = MEM[(const long int *)cst]; _495 = (long unsigned int) _494; yl_496 = zext_hwi (_495, _277); _497 = xl_491 < yl_496; goto <bb 44>; <bb 43>: _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage *)D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage *)cst].val, len_274, _277); this keeps D.50698 and cst un-SRAable - inline storage is problematic for this reason. But the representation should guarantee the compare with a low precision (32 bit) constant is evaluatable at compile-time if len of the larger value is 1, no? <bb 44>: # _504 = PHI <_497(42), _503(43)> D.50698 ={v} {CLOBBER}; if (_504 != 0) goto <bb 45>; else goto <bb 46>; <bb 45>: pretmp_563 = MEM[(const struct wide_int_storage *)cst + 256B]; goto <bb 229> (<L131>); <bb 46>: _65 = generic_wide_int<wide_int_storage>::to_uhwi (cst, 0); ix_66 = (int) _65; goto <bb 91>; The question is whether we should try to optimize wide-int for such cases or simply not use wi::leu_p (cst, 1) but rather if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () <= 1) ? Thanks, Richard -- Richard Biener rguent...@suse.de SUSE / SUSE
RE: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Mike Stump Sent: 16 October 2013 23:06 To: Paulo J. Matos Cc: gcc-patches@gcc.gnu.org Subject: Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper) First, we like to wait and let patches bake on mainline before considering back porting; this has no bake time yet in our tree. Second, only patches that impact quality enough should be back ported. I tend to think that this one does not impact quality enough to worry about. Also, you should develop on trunk, not 4_8. Arguably, I would say no. Now, a release manager can always review and approve it; it should be very, very low risk. Makes sense. Thanks for the clarification. -- Paulo Matos
Re: [Patch] Fix undefined behaviors in regex
On 10/17/2013 09:17 AM, Jakub Jelinek wrote: Though, in the above case, the question is why people ignore warnings from the compiler and need to have special runtime instrumentation to remind them instead. I'm not objecting to that sanitization, only find it weird. I had the same thought. Paolo.
Re: PR libstdc++/58729 - tr2::dynamic_bitset::resize fails
Hi, On 10/17/2013 09:04 AM, Ed Smith-Rowland wrote: This patch bootstraps and tests clean on x86-64-linux. Truthfully, dynamic_bitset needs some more love wrt C++11 and a testsuite. It got put in before it was baked really. That will be later. Patch is Ok with me. Before committing you may want to test -m32 too. Thanks, Paolo.
RE: FW: MAX_PATH problems with mingw gcc
Thank you for the review. Please see inline. From: Joey Ye [mailto:joey.ye...@gmail.com] Sent: Thursday, October 17, 2013 7:36 AM The macro FILENAME_NORMALIZE doesn't look necessary. I would prefer to use filename_normalize directly, just as filename_cmp is used in GCC rather than FILENAME_CMP. The patch is quite old (from gcc 4.2, maybe 4.3). As I remember, in those days filenames.h had only extern int filename_cmp (const char *s1, const char *s2); #define FILENAME_CMP(s1, s2) filename_cmp(s1, s2) at the end. So I just followed its style. Nowadays the style has changed. I'll use filename_normalize everywhere. Also it normalizes a filename like c:\abc\ into c:/abc\ - the ending \ isn't normalized. Need to refine the logic here to handle this case. Thank you. I was concentrating on filename-shortening correctness and missed this case. In fact it is not too bad - the patch doesn't guarantee that all filenames inside the gcc process will be as short as possible and will have forward slashes. It just simplifies filenames in some key places. Anyway, I'll add this case to my tests and fix it. Other comments are accepted without notes :) I'll redo the patch in several days. Thank you Vladimir
Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.
On Wed, 16 Oct 2013, Cong Hou wrote: On Wed, Oct 16, 2013 at 2:02 AM, Richard Biener rguent...@suse.de wrote: On Tue, 15 Oct 2013, Cong Hou wrote: Thank you for your reminder, Jeff! I just noticed Richard's comment. I have modified the patch according to that. The new patch is attached. (posting patches inline is easier for review, now you have to deal with no quoting markers ;)) Comments inline. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..2637309 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-10-15 Cong Hou co...@google.com + + * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop invariant + statement that contains data refs with zero-step. + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..9d0f4a5 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2013-10-15 Cong Hou co...@google.com + + * gcc.dg/vect/pr58508.c: New test. + 2013-10-14 Tobias Burnus bur...@net-b.de PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c b/gcc/testsuite/gcc.dg/vect/pr58508.c new file mode 100644 index 000..cb22b50 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr58508.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details } */ + + +/* The GCC vectorizer generates loop versioning for the following loop + since there may exist aliasing between A and B. The predicate checks + if A may alias with B across all iterations. Then for the loop in + the true body, we can assert that *B is a loop invariant so that + we can hoist the load of *B before the loop body. 
*/ + +void foo (int* a, int* b) +{ + int i; + for (i = 0; i < 10; ++i) +a[i] = *b + 1; +} + + +/* { dg-final { scan-tree-dump-times "hoist" 2 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..f4fdec2 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2477,6 +2477,92 @@ vect_loop_versioning (loop_vec_info loop_vinfo, adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); } Note that applying this kind of transform at this point invalidates some of the earlier analysis the vectorizer performed (namely the def-kind which now effectively gets vect_external_def from vect_internal_def). In this case it doesn't seem to cause any issues (we re-compute the def-kind every time we need it (how wasteful)). + /* Extract load and store statements on pointers with zero-stride + accesses. */ + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) +{ + /* In the loop body, we iterate each statement to check if it is a load +or store. Then we check the DR_STEP of the data reference. If +DR_STEP is zero, then we will hoist the load statement to the loop +preheader, and move the store statement to the loop exit. */ We don't move the store yet. Micha has a patch pending that enables vectorization of zero-step stores. + for (gimple_stmt_iterator si = gsi_start_bb (loop->header); + !gsi_end_p (si);) While technically ok now (vectorized loops contain a single basic block) please use LOOP_VINFO_BBS () to get at the vector of basic-blocks and iterate over them like other code does. Have done it. 
+ { + gimple stmt = gsi_stmt (si); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + + if (dr && integer_zerop (DR_STEP (dr))) + { + if (DR_IS_READ (dr)) + { + if (dump_enabled_p ()) + { + dump_printf_loc + (MSG_NOTE, vect_location, + "hoist the statement to outside of the loop "); "hoisting out of the vectorized loop: " + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); + dump_printf (MSG_NOTE, "\n"); + } + + gsi_remove (&si, false); + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), stmt); Note that this will result in a bogus VUSE on the stmt at this point which will be only fixed because of implementation details of loop versioning. Either get the correct VUSE from the loop header virtual PHI node preheader edge (if there is none then the current VUSE is the correct one to use) or clear it. I just cleared the VUSE since I noticed that after the vectorization pass the correct VUSE
Re: [RFC] By default if-convert only basic blocks that will be vectorized
On Wed, 16 Oct 2013, pins...@gmail.com wrote: On Oct 15, 2013, at 5:32 AM, Jakub Jelinek ja...@redhat.com wrote: Hi! Especially on i?86/x86_64 the if-conversion pass seems to be often a pessimization, but the vectorization relies on it and without it we can't vectorize a lot of the loops. I think on many other targets it actually helps. I know for one it helps on octeon even though octeon has no vector instructions. I think it helps most arm targets too. The main issue is that it has no cost model - the only cost model being that it can successfully if-convert all conditional code in a loop, resulting in a single-BB loop. So it is clearly vectorization targeted. Its infrastructure may be useful to do a more sensible if-conversion on the GIMPLE level on scalar code. Of course even the infrastructure needs some TLC (and some better generic machinery for keeping track of and simplifying a predicate combination). Thanks, Richard. Thanks, Andrew Here is a prototype of a patch that will by default (unless explicit -ftree-loop-if-convert) only if-convert loops internally for vectorization, so the COND_EXPRs actually only appear as VEC_COND_EXPRs in the vectorized basic blocks, but will not appear if vectorization fails, or in the scalar loop if vectorization is conditional, or in the prologue or epilogue loops around the vectorized loop. Instead of moving the ifcvt pass inside of the vectorizer, this patch makes ifcvt perform loop versioning depending on a special internal call; only if the internal call returns true do we go to the if-converted original loop, otherwise the non-if-converted copy of the original loop is performed. And the vectorizer is taught to fold this internal call into true resp. false depending on whether the loop was vectorized or not, and vectorizer loop versioning, peeling for alignment and for bound are adjusted to also copy from the non-if-converted loop rather than the if-converted one. 
Besides fixing the various PRs where if-conversion pessimizes code I'd like to also move forward with this with conditional loads and stores, http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html where the if-unconversion approach looked like a failure. This patch doesn't yet handle if-converted inner loop in outer loop vectorization, something on my todo list (so several vect-cond-*.c tests FAIL because they are no longer vectorized) plus I had to change two SLP vectorization tests that silently relied on loop if-conversion being performed to actually optimize the basic block (if the same thing didn't appear in a loop, it wouldn't be optimized at all). On the newly added testcase on x86_64, there are before this patch 18 scalar conditional moves, with the patch just 2 (both in the checking routine). Comments? --- gcc/internal-fn.def.jj 2013-10-11 14:32:57.079909782 +0200 +++ gcc/internal-fn.def 2013-10-11 17:23:58.705526840 +0200 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) --- gcc/tree-vect-loop-manip.c.jj 2013-09-30 22:13:47.0 +0200 +++ gcc/tree-vect-loop-manip.c 2013-10-15 12:57:54.854970913 +0200 @@ -374,24 +374,31 @@ LOOP-> loop1 static void slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop, +struct loop *scalar_loop, bool is_new_loop, basic_block *new_exit_bb) { - gimple orig_phi, new_phi; + gimple orig_phi, new_phi, scalar_phi = NULL; gimple update_phi, update_phi2; tree guard_arg, loop_arg; basic_block new_merge_bb = guard_edge->dest; edge e = EDGE_SUCC (new_merge_bb, 0); basic_block update_bb = e->dest; basic_block orig_bb = loop->header; - edge new_exit_e; + edge new_exit_e, scalar_e = NULL; tree current_new_name; - gimple_stmt_iterator gsi_orig, 
gsi_update; + gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none (); /* Create new bb between loop and new_merge_bb. */ *new_exit_bb = split_edge (single_exit (loop)); new_exit_e = EDGE_SUCC (*new_exit_bb, 0); + if (scalar_loop != NULL && !is_new_loop) +{ + gsi_scalar = gsi_start_phis (scalar_loop->header); + scalar_e = EDGE_SUCC (scalar_loop->latch, 0); +} + for (gsi_orig = gsi_start_phis (orig_bb), gsi_update = gsi_start_phis (update_bb); !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update); @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge tree new_res; orig_phi = gsi_stmt (gsi_orig); update_phi = gsi_stmt (gsi_update); + if (scalar_e != NULL) +{ +
Re: [PATCH] Fix PR58143 and dups
On Thu, 17 Oct 2013, Jakub Jelinek wrote: On Thu, Oct 17, 2013 at 09:56:31AM +0200, Richard Biener wrote: off-noise (more than 5s difference) may be 462.libquantum and 459.GemsFDTD. I didn't include unpatched trunk in the comparison (not fixing the bug isn't an option after all). Conceptually I like the rewriting into unsigned arithmetic more so I'm going to apply that variant later today (re-testing 3 runs of 462.libquantum and 459.GemsFDTD, this time with address-space randomization turned off). Can't we rewrite for the selected arithmetic operations and punt (is that what Bernd's patch did) on moving other arithmetics? Well, there are operations that are safe even for signed types, e.g. rotates, isn't RSHIFT_EXPR also safe? Can't you handle also LSHIFT_EXPR (or do we treat it even signed as never undefined in the middle-end/backend?). Sure, we can. I concentrated on fixing the cases that later cause issues with aggressively treating the ops during niter analysis which means ops that SCEV handles (see tree-scalar-evolution.c:interpret_rhs_expr). But I'd rather have testcases here ... Teaching LIM how to move conditional code would also be nice. Btw, I don't think we'd lose many optimization opportunities if we for example don't treat signed shifts or division as invoking undefined behavior on overflow. All the overflow business is important for loop optimizations and thus important only for the cases that SCEV can handle ... Richard.
[RESEND] Enable building of libatomic on AArch64
Resending as the previous attempt went missing... 2013-10-04 Michael Hudson-Doyle michael.hud...@linaro.org * libatomic/configure.tgt (aarch64*): Remove code preventing build. * gcc/testsuite/lib/target-supports.exp (check_effective_target_sync_long_long): AArch64 supports atomic operations on long long. (check_effective_target_sync_long_long_runtime): AArch64 can execute atomic operations on long long. diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 7eb4dfe..5557c06 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4508,6 +4508,7 @@ proc check_effective_target_sync_int_128_runtime { } { proc check_effective_target_sync_long_long { } { if { ([istarget x86_64-*-*] || [istarget i?86-*-*]) + || [istarget aarch64*-*-*] || [istarget arm*-*-*] || [istarget alpha*-*-*] || ([istarget sparc*-*-*] && [check_effective_target_lp64]) } { @@ -4537,6 +4538,8 @@ proc check_effective_target_sync_long_long_runtime { } { } } }] +} elseif { [istarget aarch64*-*-*] } { + return 1 } elseif { [istarget arm*-*-linux-*] } { return [check_runtime sync_longlong_runtime { #include <stdlib.h> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt index b9e5d6c..7eaab38 100644 --- a/libatomic/configure.tgt +++ b/libatomic/configure.tgt @@ -95,11 +95,6 @@ fi # Other system configury case ${target} in - aarch64*) - # This is currently not supported in AArch64. - UNSUPPORTED=1 - ;; - arm*-*-linux*) # OS support for atomic primitives. config_path="${config_path} linux/arm posix"
Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf
On 19/09/13 18:21, Charles Baylis wrote: Hi Here is an updated version. Changelog: * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails with hardfloat, and test is not thumb-specific * gcc.target/arm/thumb-ltu.c: Avoid test failure with hardfloat ABI by requiring arm_thumb1_ok * lib/target-supports.exp (check_effective_target_arm_fp16_ok_nocache): don't force -mfloat-abi=soft when building for hardfloat target ChangeLogs should be formatted to 80 columns. Otherwise OK. R. On 19 August 2013 16:34, Richard Earnshaw rearn...@arm.com wrote: On 15/08/13 15:10, Charles Baylis wrote: Hi The attached patch fixes some tests which fail when testing gcc for an arm-none-linux-gnueabihf target because they do not expect to be built with a hard float ABI. The change in target-supports.exp fixes arm-fp16-ops-5.c and arm-fp16-ops-6.c. Tested on arm-none-linux-gnueabihf using qemu-arm, and does not cause any other tests to break. Comments? This is my first patch, so please point out anything wrong. 2013-08-15 Charles Baylis charles.bay...@linaro.org * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * arm/thumb-ltu.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * target-supports.exp: don't force -mfloat-abi=soft when building for hardfloat target hf-fixes.txt Index: gcc/testsuite/gcc.dg/builtin-apply2.c === --- gcc/testsuite/gcc.dg/builtin-apply2.c (revision 201726) +++ gcc/testsuite/gcc.dg/builtin-apply2.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do run } */ /* { dg-skip-if Variadic funcs have all args on stack. Normal funcs have args in registers. { aarch64*-*-* avr-*-* } { * } { } } */ /* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-* } { -mfloat-abi=hard } { } } */ +/* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. 
{ arm*-*-gnueabihf } { * } { -mfloat-abi=soft* } } */ As you've noticed, basing the test's behaviour on the config variant doesn't work reliably. The builtin-apply2 test really should be skipped if the current test variant is not soft-float. We already have check_effective_target_arm_hf_eabi in target-supports.exp that checks whether __ARM_PCS_VFP is defined during a compilation. So we can replace both arm related lines in builtin-apply2 with /* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-* && arm_hf_eabi } { * } { } } */ /* PR target/12503 */ /* Origin: pierre.nguyen-tu...@asim.lip6.fr */ Index: gcc/testsuite/gcc.dg/tls/pr42894.c === --- gcc/testsuite/gcc.dg/tls/pr42894.c(revision 201726) +++ gcc/testsuite/gcc.dg/tls/pr42894.c(working copy) @@ -1,6 +1,7 @@ /* PR target/42894 */ /* { dg-do compile } */ /* { dg-options -march=armv5te -mthumb { target arm*-*-* } } */ +/* { dg-options -march=armv5te -mthumb -mfloat-abi=soft { target arm*-*-*hf } } */ /* { dg-require-effective-target tls } */ Although the original PR was for Thumb1, this is a generic test. I'm not convinced that on ARM it should try to force thumb1. Removing the original dg-options line should solve the problem and we then get better multi-lib testing as well. extern __thread int t; Index: gcc/testsuite/gcc.target/arm/thumb-ltu.c === --- gcc/testsuite/gcc.target/arm/thumb-ltu.c (revision 201726) +++ gcc/testsuite/gcc.target/arm/thumb-ltu.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-skip-if incompatible options { arm*-*-* } { -march=* } { -march=armv6 -march=armv6j -march=armv6z } } */ -/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 } */ +/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 -mfloat-abi=soft } */ This won't work if there's an explicit -mfloat-abi={softfp,hard} on the multilib options. Probably the best thing to do here is to skip the test if arm_thumb1_ok is not true. 
void f(unsigned a, unsigned b, unsigned c, unsigned d) { Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 201726) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -2445,6 +2445,11 @@ # Must generate floating-point instructions. return 0 } +if [check-flags [list { *-*-gnueabihf } { * } { } ]] { +# Use existing float-abi and force an fpu which supports fp16 This should use arm_hf_eabi as described above. + set et_arm_fp16_flags -mfpu=vfpv4 + return 1; +
[Ada] Clean ups in SPARK mode
1) Remove special expansion of NOT IN operator in SPARK verification

The special expansion for the NOT IN operator in the SPARK formal
verification mode is not needed anymore. It is now removed.

2) Document additional requirements on tree for SPARK verification

Formal verification of SPARK code is done in a special mode, in which the
tree generated by the frontend must respect additional requirements. These
are now described in a special section of sinfo.ads.

2013-10-17  Yannick Moy  m...@adacore.com

	* exp_spark.adb (Expand_SPARK): Remove special case for NOT IN
	operation.
	* sinfo.ads: Add special comment section to describe SPARK mode
	effect on tree.
	* exp_spark.ads: Remove comments, moved to sinfo.ads.

Index: exp_spark.adb
===
--- exp_spark.adb (revision 203568)
+++ exp_spark.adb (working copy)
@@ -25,7 +25,6 @@
 with Atree;    use Atree;
 with Einfo;    use Einfo;
-with Exp_Ch4;  use Exp_Ch4;
 with Exp_Dbug; use Exp_Dbug;
 with Exp_Util; use Exp_Util;
 with Sem_Aux;  use Sem_Aux;
@@ -80,12 +79,6 @@
          when N_Identifier =>
             Expand_Potential_Renaming (N);

-         -- A NOT IN B gets transformed to NOT (A IN B). This is the same
-         -- expansion used in the normal case, so share the code.
-
-         when N_Not_In =>
-            Expand_N_Not_In (N);
-
          when N_Object_Renaming_Declaration =>
             Expand_SPARK_N_Object_Renaming_Declaration (N);

Index: exp_spark.ads
===
--- exp_spark.ads (revision 203568)
+++ exp_spark.ads (working copy)
@@ -30,54 +30,6 @@

 --  Expand_SPARK is called directly by Expander.Expand.

---  SPARK expansion has three main objectives:
--
---    1. Perform limited expansion to explicit some Ada rules and constructs
---       (translate 'Old and 'Result, replace renamings by renamed, insert
---       conversions, expand actuals in calls to introduce temporaries, expand
---       generics instantiations)
--
---    2. Facilitate treatment for the formal verification back-end (fully
---       qualify names, expand set membership, compute data dependences)
--
---    3. Avoid the introduction of low-level code that is difficult to analyze
---       formally, as typically done in the full expansion for high-level
---       constructs (tasking, dispatching)
--
---  To fulfill objective 1, Expand_SPARK selectively expands some constructs.
--
---  To fulfill objective 2, the tree after SPARK expansion should be fully
---  analyzed semantically. In particular, all expressions must have their
---  proper type, and semantic links should be set between tree nodes (partial
---  to full view, etc.) Some kinds of nodes should be either absent, or can
---  be ignored by the formal verification backend:
--
---     N_Object_Renaming_Declaration: can be ignored safely
---     N_Expression_Function: absent (rewritten)
---     N_Expression_With_Actions: absent (not generated)
--
---  SPARK cross-references are generated from the regular cross-references
---  (used for browsing and code understanding) and additional references
---  collected during semantic analysis, in particular on all
---  dereferences. These SPARK cross-references are output in a separate
---  section of ALI files, as described in spark_xrefs.adb. They are the basis
---  for the computation of data dependences in the formal verification
---  backend. This implies that all cross-references should be generated in
---  this mode, even those that would not make sense from a user
---  point-of-view, and that cross-references that do not lead to data
---  dependences for subprograms can be safely ignored.
--
---  To support the formal verification of units parameterized by data, the
---  value of deferred constants should not be considered as a compile-time
---  constant at program locations where the full view is not visible.
--
---  To fulfill objective 3, Expand_SPARK does not expand features that are
---  not formally analyzed (tasking), or for which formal analysis relies on
---  the source level representation (dispatching, aspects, pragmas). However,
---  these should be semantically analyzed, which sometimes requires the
---  insertion of semantic pre-analysis, for example for subprogram contracts
---  and pragma check/assert.
-
 with Types; use Types;

 package Exp_SPARK is

Index: sinfo.ads
===
--- sinfo.ads (revision 203568)
+++ sinfo.ads (working copy)
@@ -508,6 +508,48 @@
    -- simply ignore these nodes, since they are not relevant to the task
    -- of back annotating representation information.

+   ----------------
+   -- SPARK Mode --
+   ----------------
+
+   --  When a file is compiled in SPARK mode (-gnatd.F), a very light
+   --  expansion is performed and the analysis must generate a tree in a form
[Ada] Fix generation of references for SPARK formal verification
The generation of references for SPARK formal verification was missing some
write references through renamings. This is now fixed.

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Yannick Moy  m...@adacore.com

	* sem_ch8.adb (Find_Direct_Name): Keep track of assignments for
	renamings in SPARK mode.

Index: sem_ch8.adb
===
--- sem_ch8.adb (revision 203568)
+++ sem_ch8.adb (working copy)
@@ -5073,9 +5073,14 @@
          -- Entity is unambiguous, indicate that it is referenced here

          -- For a renaming of an object, always generate simple reference,
-         -- we don't try to keep track of assignments in this case.
+         -- we don't try to keep track of assignments in this case, except
+         -- in SPARK mode where renamings are traversed for generating
+         -- local effects of subprograms.

-         if Is_Object (E) and then Present (Renamed_Object (E)) then
+         if Is_Object (E)
+           and then Present (Renamed_Object (E))
+           and then not SPARK_Mode
+         then
             Generate_Reference (E, N);

             -- If the renamed entity is a private protected component,
[Ada] Remove useless special handling for SPARK verification
The use of a specific light expansion for SPARK verification has rendered obsolete a number of special handling cases only triggered in the normal full expansion. Remove these useless cases now. Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Yannick Moy m...@adacore.com * exp_ch3.adb (Expand_Freeze_Class_Wide_Type, Expand_Freeze_Class_Wide_Type, Expand_Freeze_Class_Wide_Type): Remove useless special cases. * exp_ch4.adb (Expand_Allocator_Expression, Expand_N_Allocator, Expand_N_Op_Expon): Remove useless special cases. * exp_ch6.adb (Is_Build_In_Place_Function_Call): Disable build-in-place in SPARK mode by testing Full_Expander_Active instead of Expander_Active. (Make_Build_In_Place_Call_In_Allocator): Remove useless special case. * exp_util.adb (Build_Allocate_Deallocate_Proc): Remove useless special case. * sem_eval.adb (Compile_Time_Known_Value): Remove special handling of deferred constant. Index: exp_util.adb === --- exp_util.adb(revision 203568) +++ exp_util.adb(working copy) @@ -560,13 +560,6 @@ -- Start of processing for Build_Allocate_Deallocate_Proc begin - -- Do not perform this expansion in SPARK mode because it is not - -- necessary. - - if SPARK_Mode then - return; - end if; - -- Obtain the attributes of the allocation / deallocation if Nkind (N) = N_Free_Statement then Index: exp_ch4.adb === --- exp_ch4.adb (revision 203568) +++ exp_ch4.adb (working copy) @@ -1268,14 +1268,10 @@ --* .NET/JVM - these targets do not support address arithmetic --and unchecked conversion, key elements of Finalize_Address. ---* SPARK mode - the call is useless and results in unwanted ---expansion. - --* CodePeer mode - TSS primitive Finalize_Address is not --created in this mode. if VM_Target = No_VM - and then not SPARK_Mode and then not CodePeer_Mode and then Present (Finalization_Master (PtrT)) and then Present (Temp_Decl) @@ -4295,16 +4291,13 @@ end if; -- The finalization master must be inserted and analyzed as part of - -- the current semantic unit. 
This form of expansion is not carried - -- out in SPARK mode because it is useless. Note that the master is - -- updated when analysis changes current units. + -- the current semantic unit. Note that the master is updated when + -- analysis changes current units. - if not SPARK_Mode then -if Present (Rel_Typ) then - Set_Finalization_Master (PtrT, Finalization_Master (Rel_Typ)); -else - Set_Finalization_Master (PtrT, Current_Anonymous_Master); -end if; + if Present (Rel_Typ) then +Set_Finalization_Master (PtrT, Finalization_Master (Rel_Typ)); + else +Set_Finalization_Master (PtrT, Current_Anonymous_Master); end if; end if; @@ -4839,15 +4832,11 @@ --Set_Finalize_Address -- (PtrTFM, TFD'Unrestricted_Access); - -- Do not generate this call in the following cases: - -- - --* SPARK mode - the call is useless and results in - --unwanted expansion. - -- - --* CodePeer mode - TSS primitive Finalize_Address is - --not created in this mode. + -- Do not generate this call in CodePeer mode, as TSS + -- primitive Finalize_Address is not created in this + -- mode. - elsif not (SPARK_Mode or CodePeer_Mode) then + elsif not CodePeer_Mode then Insert_Action (N, Make_Set_Finalize_Address_Call (Loc = Loc, @@ -7321,9 +7310,9 @@ begin Binary_Op_Validity_Checks (N); - -- CodePeer and GNATprove want to see the unexpanded N_Op_Expon node + -- CodePeer wants to see the unexpanded N_Op_Expon node - if CodePeer_Mode or SPARK_Mode then + if CodePeer_Mode then return; end if; Index: exp_ch6.adb === --- exp_ch6.adb (revision 203568) +++ exp_ch6.adb (working copy) @@ -9599,7 +9599,11 @@ -- disabled (such as with -gnatc) since those would trip over the raise -- of Program_Error below. - if not Expander_Active then + -- In SPARK mode, build-in-place calls are not
Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf
On 17/10/13 11:32, Richard Earnshaw wrote: On 19/09/13 18:21, Charles Baylis wrote: Hi Here is an updated version. Changelog: * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails with hardfloat, and test is not thumb-specific * gcc,target/arm/thumb-ltu.c: Avoid test failure with hardfloat ABI by requiring arm_thumb1_ok * lib/target-supports.exp (check_effective_target_arm_fp16_ok_nocache): don't force -mfloat-abi=soft when building for hardfloat target ChangeLogs should be formatted to 80 columns. Oh, and they should be complete sentences. So start with a capital letter, and end with a full stop. R. Otherwise OK. R. On 19 August 2013 16:34, Richard Earnshaw rearn...@arm.com wrote: On 15/08/13 15:10, Charles Baylis wrote: Hi The attached patch fixes some tests which fail when testing gcc for a arm-none-linux-gnueabihf target because they do not expect to be built with a hard float ABI. The change in target-supports.exp fixes arm-fp16-ops-5.c and arm-fp16-ops-6.c. Tested on arm-none-linux-gnueabihf using qemu-arm, and does not cause any other tests to break. Comments? This is my first patch, so please point out anything wrong. 2013-08-15 Charles Baylis charles.bay...@linaro.org * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * arm/thumb-ltu.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * target-supports.exp: don't force -mfloat-abi=soft when building for hardfloat target hf-fixes.txt Index: gcc/testsuite/gcc.dg/builtin-apply2.c === --- gcc/testsuite/gcc.dg/builtin-apply2.c (revision 201726) +++ gcc/testsuite/gcc.dg/builtin-apply2.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do run } */ /* { dg-skip-if Variadic funcs have all args on stack. Normal funcs have args in registers. { aarch64*-*-* avr-*-* } { * } { } } */ /* { dg-skip-if Variadic funcs use Base AAPCS. 
Normal funcs use VFP variant. { arm*-*-* } { -mfloat-abi=hard } { } } */ +/* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-gnueabihf } { * } { -mfloat-abi=soft* } } */ As you've noticed, basing the test's behaviour on the config variant doesn't work reliably. The builtin-apply2 test really should be skipped if the current test variant is not soft-float. We already have check_effective_target_arm_hf_eabi in target-supports.exp that checks whether __ARM_PCS_VFP is defined during a compilation. So can replace both arm related lines in builtin-apply2 with /* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-* arm_hf_eabi} { * } { } } */ /* PR target/12503 */ /* Origin: pierre.nguyen-tu...@asim.lip6.fr */ Index: gcc/testsuite/gcc.dg/tls/pr42894.c === --- gcc/testsuite/gcc.dg/tls/pr42894.c(revision 201726) +++ gcc/testsuite/gcc.dg/tls/pr42894.c(working copy) @@ -1,6 +1,7 @@ /* PR target/42894 */ /* { dg-do compile } */ /* { dg-options -march=armv5te -mthumb { target arm*-*-* } } */ +/* { dg-options -march=armv5te -mthumb -mfloat-abi=soft { target arm*-*-*hf } } */ /* { dg-require-effective-target tls } */ Although the original PR was for Thumb1, this is a generic test. I'm not convinced that on ARM it should try to force thumb1. Removing the original dg-options line should solve the problem and we then get better multi-lib testing as well. extern __thread int t; Index: gcc/testsuite/gcc.target/arm/thumb-ltu.c === --- gcc/testsuite/gcc.target/arm/thumb-ltu.c (revision 201726) +++ gcc/testsuite/gcc.target/arm/thumb-ltu.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-skip-if incompatible options { arm*-*-* } { -march=* } { -march=armv6 -march=armv6j -march=armv6z } } */ -/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 } */ +/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 -mfloat-abi=soft } */ This won't work if there's an explict -mfloat-abi={softfp,hard} on the multilib options. 
Probably the best thing to do here is to skip the test if arm_thumb1_ok is not true. void f(unsigned a, unsigned b, unsigned c, unsigned d) { Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 201726) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -2445,6 +2445,11 @@ # Must generate floating-point instructions. return 0 } +if [check-flags [list { *-*-gnueabihf } { * } { } ]] { +# Use existing
[Ada] Add switch to output reason why spec requires body
The warning switch -gnatw.y (deactivated by -gnatw.Y) activates a mode in
which information messages are given that show why a package spec requires a
body. This can be useful if you have a large package which unexpectedly
requires a body.

  1. package ReqBody is
        |
        info: ReqBody requires body (Elaborate_Body)
  2.    pragma Elaborate_Body;
  3.    A : Integer;
  4.    B : Integer;
  5.    procedure K;
        |
        info: ReqBody requires body (K requires completion)
  6. end ReqBody;

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Robert Dewar  de...@adacore.com

	* gnat_ugn.texi: Document -gnatw.y/-gnatw.Y.
	* opt.ads (List_Body_Required_Info): New flag.
	* prep.adb: Minor reformatting.
	* sem_ch7.adb (Unit_Requires_Body_Info): New procedure.
	(Analyze_Package_Specification): Add call to Unit_Requires_Body_Info.
	* ug_words: Add entries for -gnatw.y and -gnatw.Y.
	* usage.adb: Add line for new warning switch -gnatw.y/.Y.
	* vms_data.ads: Add entry for [NO_]WHY_SPEC_NEEDS_BODY warning
	qualifier.
	* warnsw.ads, warnsw.adb: Implement new warning switch -gnatw.y/.Y.

Index: sem_ch7.adb
===
--- sem_ch7.adb (revision 203598)
+++ sem_ch7.adb (working copy)
@@ -136,6 +136,11 @@
    -- inherited private operation has been overridden, then it's replaced by
    -- the overriding operation.

+   procedure Unit_Requires_Body_Info (P : Entity_Id);
+   -- Outputs info messages showing why package specification P requires a
+   -- body. Caller has checked that the switch requesting this information
+   -- is set, and that the package does indeed require a body.
+ -- -- Analyze_Package_Body -- -- @@ -1515,6 +1520,15 @@ (\pragma Elaborate_Body is required in this case, P); end; end if; + + -- If switch set, output information on why body required + + if List_Body_Required_Info +and then In_Extended_Main_Source_Unit (Id) +and then Unit_Requires_Body (Id) + then + Unit_Requires_Body_Info (Id); + end if; end Analyze_Package_Specification; -- @@ -1686,8 +1700,8 @@ and then No (Interface_Alias (Node (Op_Elmt_2))) then -- The private inherited operation has been - -- overridden by an explicit subprogram: replace - -- the former by the latter. + -- overridden by an explicit subprogram: + -- replace the former by the latter. New_Op := Node (Op_Elmt_2); Replace_Elmt (Op_Elmt, New_Op); @@ -2748,4 +2762,135 @@ return False; end Unit_Requires_Body; + - + -- Unit_Requires_Body_Info -- + - + + procedure Unit_Requires_Body_Info (P : Entity_Id) is + E : Entity_Id; + + begin + -- Imported entity never requires body. Right now, only subprograms can + -- be imported, but perhaps in the future we will allow import of + -- packages. + + if Is_Imported (P) then + return; + + -- Body required if library package with pragma Elaborate_Body + + elsif Has_Pragma_Elaborate_Body (P) then + Error_Msg_N + (?Y?info: requires body (Elaborate_Body), P); + + -- Body required if subprogram + + elsif Is_Subprogram (P) or else Is_Generic_Subprogram (P) then + Error_Msg_N (?Y?info: requires body (subprogram case), P); + + -- Body required if generic parent has Elaborate_Body + + elsif Ekind (P) = E_Package +and then Nkind (Parent (P)) = N_Package_Specification +and then Present (Generic_Parent (Parent (P))) + then + declare +G_P : constant Entity_Id := Generic_Parent (Parent (P)); + begin +if Has_Pragma_Elaborate_Body (G_P) then + Error_Msg_N + (?Y?info: requires body (generic parent Elaborate_Body), + P); +end if; + end; + + -- A [generic] package that introduces at least one non-null abstract + -- state requires completion. 
However, there is a separate rule that + -- requires that such a package have a reason other than this for a + -- body being required (if necessary a pragma Elaborate_Body must be + -- provided). If Ignore_Abstract_State is True, we don't do this check + -- (so we can use Unit_Requires_Body to check for some other reason). + + elsif Ekind_In (P, E_Generic_Package, E_Package) +and then Present (Abstract_States (P)) +and then + not
Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf
On Thu, Oct 17, 2013 at 11:32:42AM +0100, Richard Earnshaw wrote:
> On 19/09/13 18:21, Charles Baylis wrote:
> > Here is an updated version.
> >
> > Changelog:
> > * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets
> > * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails with
> >   hardfloat, and test is not thumb-specific
> > * gcc.target/arm/thumb-ltu.c: Avoid test failure with hardfloat ABI
> >   by requiring arm_thumb1_ok
> > * lib/target-supports.exp (check_effective_target_arm_fp16_ok_nocache):
> >   don't force -mfloat-abi=soft when building for hardfloat target
>
> ChangeLogs should be formatted to 80 columns.

Not only that. The descriptions should start with a capital letter and end
with a dot. For pr42894.c, you are not removing options, you are removing
dg-options, and the rest is why, not what, so it doesn't belong in the
ChangeLog description. Similarly, for thumb-ltu.c, what you are doing is
removing dg-skip-if and requiring effective target arm_thumb1_ok.

	Jakub
Re: RFC: Add of type-demotion pass
On Wed, Oct 16, 2013 at 6:33 PM, Jeff Law l...@redhat.com wrote: On 10/16/13 03:31, Richard Biener wrote: I see two primary effects of type sinking. Note it was called type demotion ;) ;) It's a mental block of mine; it's been called type hoisting/sinking in various contexts and I see parallels between the code motion algorithms and how the type promotion/demotion exposes unnecessary type conversions. So I keep calling them hoisting/sinking. I'll try to use promotion/demotion. First and probably the most important in my mind is by sinking a cast through its uses the various transformations we already perform are more likely to apply *without* needing to handle optimizing through typecasts explicitly. I would say it is desirable to express arithmetic in the smallest possible types (see what premature optimization the C family frontends do to narrow operations again after C integer promotion applied). I don't see this as the major benefit of type demotion. Yes, there is some value in shrinking constants and the like, but in my experience the benefits are relatively small and often get lost in things like partial register stalls on x86, the PA and probably others (yes, the PA has partial register stalls, it's just that nobody used that term). What I really want to get at here is avoiding having a large number of optimizers looking back through the use-def chains and attempting to elide typecasts in the middle of a chain of statements of interest. Hmm, off the top of my head only forwprop and VRP look back through use-def chains to elide typecasts. And they do that to optimize those casts, thus it is their job ...? Other cases are around, but those are of the sorts of is op1 available in type X and/or can I safely cast it to type X? that code isn't going to be simplified by generic promotion / demotion because that code isn't going to know what type pass Y in the end wants. 
Abstracting functions that can answer those questions instead of repeating N variants of it would of course be nice. Likewise reducing the number of places we perform promotion / demotion (remove it from frontend code and fold, add it in the GIMPLE combiner). Also making the GIMPLE combiner available as an utility to apply to a single statement (see my very original GIMPLE-fold proposal) would be very useful. As for promotion / demotion (if you are not talking about applying PROMOTE_MODE which rather forces promotion of variables and requires inserting compensation code), you want to optimize op1 = (T) op1'; op2 = (T) op2'; x = op1 OP op2; (*) y = (T2) x; to either carry out OP in type T2 or in a type derived from the types of op1' and op2'. For the simple case combine-like pattern matching is ok. It gets more complicated if there are a series of statements here (*), but even that case is handled by iteratively applying the combiner patterns (which forwprop does). If you split out promotion / demotion into a separate pass then you introduce pass ordering issues as combining may introduce promotion / demotion opportunities and the other way around. That wouldn't apply to a pass lowering GIMPLE to fully honor PROMOTE_MODE. You need some kind of range information to do this, thus either integrate it into VRP (there is already code that does this there) or use range information from VRP which we now preserve. If the primary goal is to shrink types, then yes, you want to use whatever information you can, including VRP. But that's not the primary goal in my mind, at least not at this stage. There's no reason why this pass couldn't utilize VRP information to provide more opportunities to demote types and achieve the goal you want. But I'd consider that a follow-on opportunity. 
The second primary effect is, given two casts where the first indirectly feeds the second (ie, the first feeds some statement, which then feeds the second cast), if we're able to sink the first cast, we end up with the first cast directly feeding the second cast. When this occurs one of the two casts can often be eliminated. Sadly, I didn't keep any of those test files, but I regularly saw them in GCC bootstraps. This transformation is applied both by fold-const.c and by SSA forwprop (our GIMPLE combiner). Doing it in yet another pass looks wrong (and it isn't type demotion but also can be promotion). Yes, I know. And we need to get this back down to a single implementation. I don't much care which of the 3 implementations we keep, but it really should just be one and it needs to be reusable. I probably should have stated this differently -- the second primary effect is to expose more cases where type conversions can be eliminated via type promotion/demotion. I don't much care which of the 3 blobs of code to eliminate the conversions we use -- I do care that we've got a consistent way to promote/demote conversions to expose the unnecessary type conversions. Sure.
[Ada] Scalar_Storage_Order consistency for nested composites
Scalar_Storage_Order must be consistent between any component of a composite
type (array or record) and the composite type itself. We already enforced
this in the case where the enclosing type has a Scalar_Storage_Order
attribute definition and the component type has none. We now do so also in
the opposite case, when the enclosing type has no Scalar_Storage_Order
clause and any component does have one.

The following compilation must be rejected with the indicated error messages:

$ gcc -c nat_comp_rev_el.ads
nat_comp_rev_el.ads:26:04: composite type must have explicit scalar storage order
nat_comp_rev_el.ads:27:04: composite type must have explicit scalar storage order
nat_comp_rev_el.ads:29:07: composite type must have explicit scalar storage order
nat_comp_rev_el.ads:34:07: composite type must have explicit scalar storage order

with System; use System;
package Nat_Comp_Rev_El is

   type U32 is mod 2**32;

   type LE_Record is record
      X : U32;
   end record;
   for LE_Record use record
      X at 0 range 0 .. 31;
   end record;
   for LE_Record'Bit_Order use Low_Order_First;
   for LE_Record'Scalar_Storage_Order use Low_Order_First;

   type BE_Record is record
      X : U32;
   end record;
   for BE_Record use record
      X at 0 range 0 .. 31;
   end record;
   for BE_Record'Bit_Order use High_Order_First;
   for BE_Record'Scalar_Storage_Order use High_Order_First;

   -- Reject the below declarations: the component type has an explicit SSO,
   -- so we also require one on the enclosing composite type.

   type Two_LE is array (1 .. 2) of LE_Record;
   type Two_BE is array (1 ..
2) of BE_Record; type Rec_LE is record Comp : LE_Record; X: Integer; end record; type Rec_BE is record Comp : BE_Record; X: Integer; end record; end Nat_Comp_Rev_El; Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Thomas Quinot qui...@adacore.com * freeze.adb (Check_Component_Storage_Order): Reject a record or array type that does not have an explicit Scalar_Storage_Order attribute definition if a component of the record, or the elements of the array, have one. * gnat_rm.texi (attribute Scalar_Storage_Order): Document the above rule. Index: gnat_rm.texi === --- gnat_rm.texi(revision 203568) +++ gnat_rm.texi(working copy) @@ -8727,6 +8727,10 @@ if the component does not start on a byte boundary, then the scalar storage order specified for S and for the nested component type shall be identical. +If @var{S} appears as the type of a record or array component, the enclosing +record or array shall also have a @code{Scalar_Storage_Order} attribute +definition clause. + No component of a type that has a @code{Scalar_Storage_Order} attribute definition may be aliased. Index: freeze.adb === --- freeze.adb (revision 203568) +++ freeze.adb (working copy) @@ -92,11 +92,15 @@ procedure Check_Component_Storage_Order (Encl_Type : Entity_Id; - Comp : Entity_Id); + Comp : Entity_Id; + ADC : Node_Id); -- For an Encl_Type that has a Scalar_Storage_Order attribute definition - -- clause, verify that the component type is compatible. For arrays, - -- Comp is Empty; for records, it is the entity of the component under - -- consideration. + -- clause, verify that the component type has an explicit and compatible + -- attribute/aspect. For arrays, Comp is Empty; for records, it is the + -- entity of the component under consideration. For an Encl_Type that + -- does not have a Scalar_Storage_Order attribute definition clause, + -- verify that the component also does not have such a clause. + -- ADC is the attribute definition clause if present (or Empty). 
procedure Check_Strict_Alignment (E : Entity_Id); -- E is a base type. If E is tagged or has a component that is aliased @@ -1068,11 +1072,12 @@ procedure Check_Component_Storage_Order (Encl_Type : Entity_Id; - Comp : Entity_Id) + Comp : Entity_Id; + ADC : Node_Id) is Comp_Type : Entity_Id; + Comp_ADC : Node_Id; Err_Node : Node_Id; - ADC : Node_Id; Comp_Byte_Aligned : Boolean; -- Set True for the record case, when Comp starts on a byte boundary @@ -1113,11 +1118,24 @@ -- the attribute definition clause is attached to the first subtype. Comp_Type := Base_Type (Comp_Type); - ADC := Get_Attribute_Definition_Clause - (First_Subtype (Comp_Type), -Attribute_Scalar_Storage_Order); + Comp_ADC := Get_Attribute_Definition_Clause +(First_Subtype (Comp_Type), + Attribute_Scalar_Storage_Order); - if Is_Record_Type (Comp_Type) or else Is_Array_Type
[Ada] Check for illegal global refs to abstract state when refinement visible
This implements the rule in SPARK RM 6.1.5(4): a global item shall not denote
a state abstraction whose refinement is visible (a state abstraction cannot
be named within its enclosing package's body other than in its refinement).
The following is compiled with -gnatd.V:

  1. package Depends_Illegal
  2.   with Abstract_State => A
  3. is
  4.    pragma Elaborate_Body (Depends_Illegal);
  5. end Depends_Illegal;

  1. package body Depends_Illegal
  2.   with Refined_State => (A => (X, Y))
  3. is
  4.    X, Y : Natural := 0;
  5.    function F1 (Par1 : Natural) return Natural
  6.      with Global => A,
          |
          global reference to A not allowed (SPARK RM 6.1.5(4))
          refinement of A is visible at line 2
  7.      Depends => (F1'Result => A,
          |
          global reference to A not allowed (SPARK RM 6.1.5(4))
          refinement of A is visible at line 2
  8.                  null => Par1)
  9.    is
 10.    begin
 11.       return X;
 12.    end F1;
 13. end Depends_Illegal;

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Robert Dewar  de...@adacore.com

	* einfo.ads, einfo.adb (Has_Body_References): New flag.
	(Body_References): New field.
	* sem_prag.adb (Record_Possible_Body_Reference): New procedure.
	(Analyze_Input_Output): Call Record_Possible_Body_Reference.
	(Analyze_Global_Item): Call Record_Possible_Body_Reference.
	(Analyze_Refinement_Clause): Output messages if illegal global refs.
Index: einfo.adb
===================================================================
--- einfo.adb	(revision 203568)
+++ einfo.adb	(working copy)
@@ -132,6 +132,7 @@
 --    String_Literal_Low_Bound        Node15
 
 --    Access_Disp_Table               Elist16
+--    Body_References                 Elist16
 --    Cloned_Subtype                  Node16
 --    DTC_Entity                      Node16
 --    Entry_Formal                    Node16
@@ -552,8 +553,8 @@
 --    Has_Delayed_Rep_Aspects         Flag261
 --    May_Inherit_Delayed_Rep_Aspects Flag262
 --    Has_Visible_Refinement          Flag263
+--    Has_Body_References             Flag264
 
---    (unused)                        Flag264
 --    (unused)                        Flag265
 --    (unused)                        Flag266
 --    (unused)                        Flag267
@@ -733,6 +734,12 @@
       return Flag40 (Id);
    end Body_Needed_For_SAL;
 
+   function Body_References (Id : E) return L is
+   begin
+      pragma Assert (Ekind (Id) = E_Abstract_State);
+      return Elist16 (Id);
+   end Body_References;
+
    function C_Pass_By_Copy (Id : E) return B is
    begin
       pragma Assert (Is_Record_Type (Id));
@@ -1294,6 +1301,12 @@
       return Flag139 (Id);
    end Has_Biased_Representation;
 
+   function Has_Body_References (Id : E) return B is
+   begin
+      pragma Assert (Ekind (Id) = E_Abstract_State);
+      return Flag264 (Id);
+   end Has_Body_References;
+
    function Has_Completion (Id : E) return B is
    begin
       return Flag26 (Id);
@@ -3336,6 +3349,12 @@
       Set_Flag40 (Id, V);
    end Set_Body_Needed_For_SAL;
 
+   procedure Set_Body_References (Id : E; V : L) is
+   begin
+      pragma Assert (Ekind (Id) = E_Abstract_State);
+      Set_Elist16 (Id, V);
+   end Set_Body_References;
+
    procedure Set_C_Pass_By_Copy (Id : E; V : B := True) is
    begin
       pragma Assert (Is_Record_Type (Id) and then Is_Base_Type (Id));
@@ -3909,6 +3928,12 @@
       Set_Flag139 (Id, V);
    end Set_Has_Biased_Representation;
 
+   procedure Set_Has_Body_References (Id : E; V : B := True) is
+   begin
+      pragma Assert (Ekind (Id) = E_Abstract_State);
+      Set_Flag264 (Id, V);
+   end Set_Has_Body_References;
+
    procedure Set_Has_Completion (Id : E; V : B := True) is
    begin
       Set_Flag26 (Id, V);
@@ -7984,6 +8009,7 @@
     W ("Has_Anonymous_Master",          Flag253 (Id));
     W ("Has_Atomic_Components",         Flag86 (Id));
     W ("Has_Biased_Representation",     Flag139 (Id));
+    W ("Has_Body_References",           Flag264 (Id));
     W ("Has_Completion",                Flag26 (Id));
     W ("Has_Completion_In_Body",        Flag71 (Id));
     W ("Has_Complex_Representation",    Flag140 (Id));
@@ -8672,6 +8698,10 @@
    procedure Write_Field16_Name (Id : Entity_Id) is
    begin
       case Ekind (Id) is
+         when E_Abstract_State =>
+            Write_Str ("Body_References");
+
          when E_Record_Type | E_Record_Type_With_Private =>
            Write_Str ("Access_Disp_Table");
Index: einfo.ads
===================================================================
--- einfo.ads	(revision 203568)
+++ einfo.ads	(working copy)
@@ -493,6 +493,12 @@
 --
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello,
On 16 Oct 09:59, Richard Henderson wrote:
> On 10/16/2013 09:07 AM, Kirill Yukhin wrote:
>> I suspect gen_lowpart is a bad turn when reload is completed, as far
>> as it can create a new pseudo.  gen_lowpart () may call gen_reg_rtx (),
>> which contains the corresponding gcc_assert ().
>
> False.  gen_lowpart is perfectly safe post-reload.  Indeed, taking the
> subreg of a hard register should arrive at
>
>   x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset);
>
> in simplify_subreg.
>
> Have you encountered some specific problem with gen_lowpart?

Yes.  Patch [8/8] contains the testsuite for AVX-512.  This pattern is
covered as well.  When trying to do so:

(define_insn_and_split "vec_extract_lo_v32hi"
  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,m")
	(vec_select:V16HI
	  (match_operand:V32HI 1 "nonimmediate_operand" "vm,v")
	  (parallel [(const_int 0) (const_int 1)
		     (const_int 2) (const_int 3)
		     (const_int 4) (const_int 5)
		     (const_int 6) (const_int 7)
		     (const_int 8) (const_int 9)
		     (const_int 10) (const_int 11)
		     (const_int 12) (const_int 13)
		     (const_int 14) (const_int 15)])))]
  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  rtx op1 = operands[1];
  op1 = gen_lowpart (V16HImode, op1);
  emit_move_insn (operands[0], op1);
  DONE;
})

I've got an ICE, with the following bt:

#1  0x006f28d6 in gen_reg_rtx (mode=V32HImode)
    at /export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:866
#2  0x0070243a in copy_to_reg (x=(reg:V32HI 21 xmm0 [163]))
    at /export/users/kyukhin/gcc/git/gcc/gcc/explow.c:606
#3  0x0091dfb8 in gen_lowpart_general (mode=V16HImode, x=<optimized out>)
    at /export/users/kyukhin/gcc/git/gcc/gcc/rtlhooks.c:50
#4  0x00ce16e8 in gen_split_4943 (curr_insn=<optimized out>, operands=0x16f6320)
    at /export/users/kyukhin/gcc/git/gcc/gcc/config/i386/sse.md:6329
#5  0x006f7865 in try_split (pat=(set (reg:V16HI 23 xmm2 [164])
        (vec_select:V16HI (reg:V32HI 21 xmm0 [163])
            (parallel [(const_int 0 [0]) (const_int 1 [0x1])
                       (const_int 2 [0x2]) (const_int 3 [0x3])
                       (const_int 4 [0x4]) (const_int 5 [0x5])
                       (const_int 6 [0x6]) (const_int 7 [0x7])
                       (const_int 8 [0x8]) (const_int 9 [0x9])
                       (const_int 10 [0xa]) (const_int 11 [0xb])
                       (const_int 12 [0xc]) (const_int 13 [0xd])
                       (const_int 14 [0xe]) (const_int 15 [0xf])]))),
    trial=(insn 48 46 49 6 (set (reg:V16HI 23 xmm2 [164])
        (vec_select:V16HI (reg:V32HI 21 xmm0 [163])
            (parallel [(const_int 0 [0]) (const_int 1 [0x1])
                       (const_int 2 [0x2]) (const_int 3 [0x3])
                       (const_int 4 [0x4]) (const_int 5 [0x5])
                       (const_int 6 [0x6]) (const_int 7 [0x7])
                       (const_int 8 [0x8]) (const_int 9 [0x9])
                       (const_int 10 [0xa]) (const_int 11 [0xb])
                       (const_int 12 [0xc]) (const_int 13 [0xd])
                       (const_int 14 [0xe]) (const_int 15 [0xf])])))
    /export/users/kyukhin/gcc/git/gcc/gcc/testsuite/gcc.target/i386/avx512f-vec-unpack.c:24
    2151 {vec_extract_lo_v32hi} (nil)), last=<optimized out>)
    at /export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:3467

So, we have:
  [rtlhooks.c:50]  gen_lowpart_general () ->
  [explow.c:606]   copy_to_reg () ->
  [emit-rtl.c:866] gen_reg_rtx ():
    gcc_assert (can_create_pseudo_p ());

Maybe the code in the pattern is buggy?  Or is it gen_lowpart?

--
Thanks, K
Re: alias fix for PR58685
On Wed, Oct 16, 2013 at 7:53 PM, Bernd Schmidt <ber...@codesourcery.com> wrote:

The sequence of events here can be summarized as "shrink-wrapping causes
the i386 backend to do something that confuses alias analysis".  The
miscompilation is that two instructions are swapped by the scheduler
when they shouldn't be, due to an incorrect reg_base_value.

The miscompiled function has the following parts:

 * a loop which uses %rdi as an incoming argument register
 * following that, a decrement of the stack pointer (shrink-wrapped to
   this point rather than the start of the function)
 * the rest of the function, which has one set of %rdi to
   (symbol_ref "LC0").

The argument register %rdi is dead by the point where we decrement the
stack pointer.  The i386 backend splits the sub into a sequence of two
insns, a clobber of %rdi and a push of it (which register is chosen
appears to be somewhat random).  When called from the scheduler,
init_alias_analysis incorrectly deduces that %rdi has a base value of
(symbol_ref "LC0").  Although it tries to track which registers are
argument registers that may hold a pointer, this information is lost
when we encounter the clobber.

The main part of the following patch is to modify that piece of code to
also set reg_seen for argument registers when encountering a clobber.

There are other problems in this area, one of which showed up while
testing an earlier patch.  i386/pr57003.c demonstrates that return
values of FUNCTION_ARG_REGNO are not constant across the whole
compilation; they are affected by the ms_abi attribute.  This
necessitates changing from a static_reg_base_value array to a
per-function one.  Once that is done, it's better to look at
DECL_ARGUMENTS in a similar way to what combine.c does to identify the
arguments of the specific function we're compiling, rather than using
the less specific FUNCTION_ARG_REGNO.  Lastly, I modified a test for
Pmode to use the valid_pointer_mode hook, which should be a little more
correct.
Bootstrapped and tested on x86_64-linux.  Ok?

+	  if (!bitmap_bit_p (reg_seen, regno))
+	    {
+	      /* We have to make an exception here for an argument
+		 register, in case it is used before the clobber.  */
+	      gcc_assert (new_reg_base_value[regno] == arg_base_value);
+	      bitmap_set_bit (reg_seen, regno);
+	    }

please use gcc_checking_assert.  (bah, I hate that subtle difference in
sbitmap vs. bitmap not returning whether the bit changed in the
modifying operations ...)

+	      /* We have to make an exception here for an argument
+		 register, in case it is used before the clobber.  */

can you elaborate here _why_ we have to make that exception?

Otherwise looks good to me (with my limited knowledge of this area).

Thanks,
Richard.

Bernd
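Bernd's reg_seen fix can be stated as a toy model: a clobber of an argument register must count as "seeing" the register, so its provisional argument base value is never combined with a later, unrelated set. The sketch below is purely illustrative; the structure and names are mine, not alias.c's.

```c
#include <assert.h>

/* Toy model of the bug and fix.  init_alias_analysis gives argument
   registers a provisional "argument" base value; reg_seen records
   which registers have had a real set.  Before the fix, a clobber left
   reg_seen clear, so a later set of that register could wrongly be
   treated as its first (argument-defining) value.  */
#define NREGS 8

struct toy_alias
{
  int reg_seen[NREGS];     /* register has been set or clobbered */
  int base_is_arg[NREGS];  /* provisional base is the argument base */
};

/* The patch's change: a clobber now marks the register as seen, so the
   stale argument base value is not reused afterwards.  */
static void
note_clobber (struct toy_alias *st, int regno)
{
  st->reg_seen[regno] = 1;
}

/* After a clobber, the register must no longer be assumed to hold its
   incoming argument.  */
static int
base_still_argument (const struct toy_alias *st, int regno)
{
  return st->base_is_arg[regno] && !st->reg_seen[regno];
}
```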
Re: [wide-int] int_traits tree
Richard Biener <rguent...@suse.de> writes:
On Thu, 17 Oct 2013, Richard Sandiford wrote:
Kenneth Zadeck <zad...@naturalbridge.com> writes:

As mentioned in my message yesterday, I thought your new way of
canonising unsigned tree constants meant that there was always an upper
zero bit.  Is that right?

i believe this is correct.

If so, xprecision < precision is a no-op, because the number always has
the right form for wider precisions.  The only difficult case is
xprecision == precision, since then we need to peel off any upper -1
HWIs.

say my HWI size is 8 bits (just to keep from typing a million 'f's).
if i have a 16 bit unsigned number that is all 1s, in the tree world it
is 3 hwis: 0x00 0xff 0xff.  but inside regular wide int, it would take
1 wide int whose value is 0xff.  inside of max it would be the same as
the tree, but then the test precision > xprecision + hbpwi never kicks
in because precision is guaranteed to be huge.  inside of
addr_wide_int, i think we tank with the assertion.

It should be OK for addr_wide_int too.  The precision still fits 2
HWIs.  The initial length is greater than the maximum length of an
addr_wide_int, but your "len = MAX (len, max_len)" deals with that.

It's

  len = MIN (len, max_len)

Oops, yeah, I meant MIN, sorry.

which looked suspicious to me, but with precision >= xprecision,
precision can only be zero if xprecision is zero, which looked to me
like it cannot happen - or rather it should be fixed.

Despite the comment above the code, I don't think this MIN is there for
the zero-precision case.  I think it's there to handle the new tree
representation.  The new tree representation can have a length greater
than max_len for an unsigned tree constant that occupies a whole number
of HWIs.  The tree representation of an unsigned 0x8000 is 0x00 0x80
0x00.  When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
The MIN trims the length from 3 to 2 in the last case.
the case actually comes up on the ppc because they do a lot of 128 bit
math.  I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers in
addr_wide_int.  The justification for the assertion was that we should
explicitly truncate to addr_wide_int when deliberately ignoring upper
bits, beyond bit or byte address width.  128 bits definitely falls into
that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
for any wide_int wider than 16 bits though.

AFAIK that is what the code produces,

In which case?  This is:

  precision == 16
  xprecision == 16
  len == 3
  max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it again
to 1, so the 0x00 0xff 0xff becomes 0xff.  The 0x00 0xff is still there
in the array, but not used.

but now Kenny says this is only for some kind of wide-ints but not all?
That is, why is

  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
                                          unsigned int precision,
                                          const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val
      = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
    return wi::storage_ref (val, len, precision);
  }

not a valid implementation together with making sure that the
INTEGER_CST tree rep has that extra word of zeros if required?

The fundamental problem here is that we're trying to support two cases:

  (a) doing N-bit arithmetic in cases where the inputs have N bits
  (b) doing N-bit arithmetic in cases where the inputs have fewer than
      N bits and are extended according to TYPE_SIGN.

Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
0x8000 regardless of whether the type is signed or unsigned.  But if
it's extended to 32 bits you get two different numbers, 0x00008000 and
0xffff8000, depending on the sign.

So for one value of the precision parameter (i.e. xprecision), signed
and unsigned constants produce the same number.  But for another value
of the precision parameter (those greater than xprecision), signed and
unsigned constants produce different numbers.  Yet at the moment the
tree constant has a single representation.

So I think the possibilities are:

  (1) Use the representation of an N-bit wide_int to store N-bit tree
      constants.  Do work when extending them to wider wide_ints.

  (2) Use the representation of max_wide_int to store N-bit tree
      constants.  Do work when creating an N-bit wide_int.

  (3) Store both representations in the tree constant.

  (4) Require all tree arithmetic to be done in the same way as rtl
      arithmetic, with explicit extensions.  This gets rid of case (b).

  (5) Require all tree arithmetic to be done in wider wide_ints than
      the inputs, which
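The sign-dependence described above (the same 16-bit pattern 0x8000 extending to two different 32-bit numbers) is easy to check in plain C. A minimal sketch; the helper names are mine, not wide-int's.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extension: an unsigned 16-bit value keeps its numeric value.  */
static uint32_t
extend_unsigned (uint16_t x)
{
  return (uint32_t) x;
}

/* Sign-extension, written portably: replicate bit 15 into the upper
   half when it is set.  */
static uint32_t
extend_signed (uint16_t x)
{
  return (x & 0x8000u) ? (0xffff0000u | x) : x;
}
```

With 0x8000 the two extensions disagree, which is exactly why a single N-bit representation cannot serve both interpretations at wider precisions.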
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On Thu, Oct 17, 2013 at 4:14 AM, Andrew Pinski <pins...@gmail.com> wrote:
On Wed, Oct 16, 2013 at 2:12 AM, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:

Hi,

The patch enhances the ifcombine pass to recover some non short circuit
branches.  Basically, it will do the following transformations:

Case 1:
  if (cond1) if (cond2)            ==>  if (cond1 && cond2)

Case 2:
  if (cond1) goto L1
  if (cond2) goto L1               ==>  if (cond1 || cond2) goto L1

Case 3:
  if (cond1) goto L1 else goto L2
  L1: if (cond2) goto L2           ==>  if (invert (cond1) || cond2) goto L2

Case 4:
  if (cond1) goto L1
  if (cond2) goto L2
  L1:                              ==>  if (invert (cond1) && cond2) goto L2

Bootstrap on X86-64 and ARM.

Two new FAILs in regression tests:
  gcc.dg/uninit-pred-8_b.c
  gcc.dg/uninit-pred-9_b.c
The uninit pass should be enhanced to handle more complex conditions.
Will submit a bug to track it and fix it later.

Is it OK for trunk?

I had a much simpler change which did basically the same from 4.7 (I
can update it if people think this is a better approach).

I like that more (note you can now use is_gimple_condexpr as predicate
for force_gimple_operand).  With that we should be able to kill the
fold-const.c transform?

Thanks,
Richard.

Thanks,
Andrew Pinski

2012-09-29  Andrew Pinski  <apin...@cavium.com>

	* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
	(ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons
	fails, combining the branches anyways.
	(tree_ssa_ifcombine): Inverse the order of the basic block walk,
	increases the number of combinings.
	* Makefile.in (tree-ssa-ifcombine.o): Update dependencies.
	* testsuite/gcc.dg/tree-ssa/phi-opt-2.c: Expect zero ifs as the
	compiler produces "a && b" now.
	* testsuite/gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to
	prevent conditional move to be used.
	* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for
	"one or more intermediate".

Thanks!
-Zhenqiang

ChangeLog:
2013-10-16  Zhenqiang Chen  <zhenqiang.c...@linaro.org>

	* fold-const.c (simple_operand_p_2): Make it global.
	* tree.h (simple_operand_p_2): Declare it.
	* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
	(bb_has_overhead_p, generate_condition_node, ifcombine_ccmp): New
	functions.
	(ifcombine_fold_ifandif): New function, extracted from
	ifcombine_ifandif.
	(ifcombine_ifandif): Call ifcombine_ccmp.
	(tree_ssa_ifcombine_bb): Skip optimized bb.

testsuite/ChangeLog
2013-10-16  Zhenqiang Chen  <zhenqiang.c...@linaro.org>

	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
	* gcc.dg/tree-ssa/ssa-dom-thread-3.c: Updated.
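The four rewrites above are plain source-level equivalences and can be verified exhaustively over boolean inputs. A sketch in C; the function names and goto shapes are illustrative of the CFG patterns in the patch description, not the pass's internals.

```c
#include <assert.h>

/* Each "before" function encodes the original control flow of one
   case; each "after" function encodes the combined condition.  */

static int case1_before (int c1, int c2)
{ int r = 0; if (c1) { if (c2) r = 1; } return r; }
static int case1_after (int c1, int c2)
{ return (c1 && c2) ? 1 : 0; }

static int case2_before (int c1, int c2)
{ if (c1) goto L1; if (c2) goto L1; return 0; L1: return 1; }
static int case2_after (int c1, int c2)
{ return (c1 || c2) ? 1 : 0; }

static int case3_before (int c1, int c2)
{ if (c1) goto L1; else goto L2; L1: if (c2) goto L2; return 0; L2: return 1; }
static int case3_after (int c1, int c2)
{ return (!c1 || c2) ? 1 : 0; }   /* invert (cond1) || cond2 */

static int case4_before (int c1, int c2)
{ if (c1) goto L1; if (c2) goto L2; L1: return 0; L2: return 1; }
static int case4_after (int c1, int c2)
{ return (!c1 && c2) ? 1 : 0; }   /* invert (cond1) && cond2 */

/* Check all four equivalences over every boolean input.  */
static int all_cases_agree (void)
{
  for (int c1 = 0; c1 <= 1; c1++)
    for (int c2 = 0; c2 <= 1; c2++)
      if (case1_before (c1, c2) != case1_after (c1, c2)
          || case2_before (c1, c2) != case2_after (c1, c2)
          || case3_before (c1, c2) != case3_after (c1, c2)
          || case4_before (c1, c2) != case4_after (c1, c2))
        return 0;
  return 1;
}
```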
Re: patch to canonize unsigned tree-csts
On Wed, 16 Oct 2013, Kenneth Zadeck wrote:

On 10/15/2013 02:30 PM, Richard Sandiford wrote:
Richard Sandiford <rdsandif...@googlemail.com> writes:

  if (small_prec)
    ;
  else if (precision == xprecision)
    while (len >= 0 && val[len - 1] == -1)
      len--;

Err, len > 0 obviously.

you were only close.  patch tested on ppc and committed as revision
203739.

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 203701)
+++ gcc/tree.c	(working copy)
@@ -1204,7 +1204,7 @@ wide_int_to_tree (tree type, const wide_
     }
 
   wide_int cst = wide_int::from (pcst, prec, sgn);
-  unsigned int len = int (cst.get_len ());
+  unsigned int len = cst.get_len ();
   unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
   bool recanonize = sgn == UNSIGNED && small_prec
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 203701)
+++ gcc/tree.h	(working copy)
@@ -5204,13 +5204,13 @@ wi::int_traits <const_tree>::decompose (
 	  scratch[len - 1] = sext_hwi (val[len - 1], precision);
 	  return wi::storage_ref (scratch, len, precision);
 	}
-    }
-
-  if (precision > xprecision + HOST_BITS_PER_WIDE_INT)
-    {
-      len = wi::force_to_size (scratch, val, len, xprecision, precision,
-			       UNSIGNED);
-      return wi::storage_ref (scratch, len, precision);
-    }
+    }
+  /* We have to futz here because a large unsigned int with
+     precision 128 may look like (0x0 0xF...F 0xF...F) as a
+     tree-cst and as (0xF...F) as a wide-int.  */
+  else if (precision == xprecision && len == max_len)
+    while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1)
+      len--;
 }

Err, that now undoes the extra zero word thing?  Or was I confused
about the previous code, and this appends the extra zero word for MSB
set unsigned constants?

Richard.
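The recanonicalization above (trim the tree's extra zero word, then peel upper all-ones words) can be sketched with the thread's 8-bit-HWI example. This is a simplified model, not GCC's code; in particular it ignores the sgn == UNSIGNED and precision == xprecision guards the real patch applies.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Toy model with an 8-bit "HWI": an unsigned 16-bit all-ones constant
   arrives from the tree as three words {0xff, 0xff, 0x00} (LSB first)
   and must canonicalize down to the single word 0xff.  The function
   name is illustrative.  */
static size_t
canonize_len (const int8_t *val, size_t len, unsigned precision,
              unsigned bits_per_word)
{
  size_t max_len = (precision + bits_per_word - 1) / bits_per_word;

  if (len > max_len)                 /* drop the tree's extra zero word */
    len = max_len;

  /* Peel upper words that are pure sign-extension of an all-ones
     value; the canonical form keeps only one of them.  */
  while (len > 1 && val[len - 1] == (int8_t) -1)
    len--;

  return len;
}
```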
[PATCH][AArch64] Implement %c output template
Hi all,

This patch implements the %c output template for inline asm.  The code
for it is almost identical to the support in arm, so it's pretty
straightforward.  I've added a few compile tests for it as well.

Tested aarch64-none-elf on a model.

Ok for trunk?

Thanks,
Kyrill

[gcc/]
2013-10-17  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

	* config/aarch64/aarch64.c (aarch64_print_operand): Handle 'c'.

[gcc/testsuite]
2013-10-17  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

	* gcc.target/aarch64/c-output-template.c: New testcase.
	* gcc.target/aarch64/c-output-template-2.c: Likewise.
	* gcc.target/aarch64/c-output-template-3.c: Likewise.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b61b453..d2a3d49 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3426,6 +3426,32 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 {
   switch (code)
     {
+    /* An integer or symbol address without a preceding # sign.  */
+    case 'c':
+      switch (GET_CODE (x))
+	{
+	case CONST_INT:
+	  fprintf (f, HOST_WIDE_INT_PRINT_DEC, INTVAL (x));
+	  break;
+
+	case SYMBOL_REF:
+	  output_addr_const (f, x);
+	  break;
+
+	case CONST:
+	  if (GET_CODE (XEXP (x, 0)) == PLUS
+	      && GET_CODE (XEXP (XEXP (x, 0), 0)) == SYMBOL_REF)
+	    {
+	      output_addr_const (f, x);
+	      break;
+	    }
+	  /* Fall through.  */
+
+	default:
+	  output_operand_lossage ("Unsupported operand for code '%c'", code);
+	}
+      break;
+
     case 'e':
       /* Print the sign/zero-extend size as a character 8->b, 16->h, 32->w.  */
       {
diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c b/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c
new file mode 100644
index 000..16ff58d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+struct tracepoint {
+    int dummy;
+    int state;
+};
+static struct tracepoint tp;
+
+void
+test (void)
+{
+    __asm__ ("@ %c0" : : "i" (&tp));
+}
+
+/* { dg-final { scan-assembler "@ tp" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c b/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c
new file mode 100644
index 000..e332fe1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+struct tracepoint {
+    int dummy;
+    int state;
+};
+static struct tracepoint tp;
+
+void
+test (void)
+{
+    __asm__ ("@ %c0" : : "i" (&tp.state));
+}
+
+/* { dg-final { scan-assembler "@ tp\\+4" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template.c b/gcc/testsuite/gcc.target/aarch64/c-output-template.c
new file mode 100644
index 000..1b67c91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/c-output-template.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+
+void
+test (void)
+{
+    __asm__ ("@ %c0" : : "i" (42));
+}
+
+/* { dg-final { scan-assembler "@ 42" } } */
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
On Thu, Oct 17, 2013 at 12:47 PM, Kirill Yukhin <kirill.yuk...@gmail.com> wrote:
>>> I suspect gen_lowpart is a bad turn when reload is completed, as far
>>> as it can create a new pseudo.  gen_lowpart () may call
>>> gen_reg_rtx (), which contains the corresponding gcc_assert ().
>>
>> False.  gen_lowpart is perfectly safe post-reload.  Indeed, taking
>> the subreg of a hard register should arrive at
>>   x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset);
>> in simplify_subreg.
>>
>> Have you encountered some specific problem with gen_lowpart?
>
> Yes.  Patch [8/8] contains the testsuite for AVX-512.  This pattern is
> covered as well.  When trying to do so:
>
>   (define_insn_and_split "vec_extract_lo_v32hi" [...])
>
> I've got an ICE, with the following bt:
>
>   [...]
>
> So, we have:
>   [rtlhooks.c:50]  gen_lowpart_general () ->
>   [explow.c:606]   copy_to_reg () ->
>   [emit-rtl.c:866] gen_reg_rtx ():
>     gcc_assert (can_create_pseudo_p ());
>
> Maybe the code in the pattern is buggy?  Or is it gen_lowpart?

I think that the original approach with gen_rtx_REG is correct and
follows established practice in sse.md (please grep for gen_rtx_REG in
sse.md).  If this approach is necessary due to a deficiency of
gen_lowpart, then the fix to gen_lowpart should be proposed in a
follow-up patch.

Uros.
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Richard Sandiford wrote:
Richard Biener <rguent...@suse.de> writes:
On Thu, 17 Oct 2013, Richard Sandiford wrote:
Kenneth Zadeck <zad...@naturalbridge.com> writes:

As mentioned in my message yesterday, I thought your new way of
canonising unsigned tree constants meant that there was always an upper
zero bit.  Is that right?

i believe this is correct.

If so, xprecision < precision is a no-op, because the number always has
the right form for wider precisions.  The only difficult case is
xprecision == precision, since then we need to peel off any upper -1
HWIs.

say my HWI size is 8 bits (just to keep from typing a million 'f's).
if i have a 16 bit unsigned number that is all 1s, in the tree world it
is 3 hwis: 0x00 0xff 0xff.  but inside regular wide int, it would take
1 wide int whose value is 0xff.  inside of max it would be the same as
the tree, but then the test precision > xprecision + hbpwi never kicks
in because precision is guaranteed to be huge.  inside of
addr_wide_int, i think we tank with the assertion.

It should be OK for addr_wide_int too.  The precision still fits 2
HWIs.  The initial length is greater than the maximum length of an
addr_wide_int, but your "len = MAX (len, max_len)" deals with that.

It's

  len = MIN (len, max_len)

Oops, yeah, I meant MIN, sorry.

which looked suspicious to me, but with precision >= xprecision,
precision can only be zero if xprecision is zero, which looked to me
like it cannot happen - or rather it should be fixed.

Despite the comment above the code, I don't think this MIN is there for
the zero-precision case.  I think it's there to handle the new tree
representation.  The new tree representation can have a length greater
than max_len for an unsigned tree constant that occupies a whole number
of HWIs.  The tree representation of an unsigned 0x8000 is 0x00 0x80
0x00.  When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed?  _Why_ was it changed?  We
still cannot use it directly from wide-int, and the extra word is
redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).

the case actually comes up on the ppc because they do a lot of 128 bit
math.  I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers in
addr_wide_int.  The justification for the assertion was that we should
explicitly truncate to addr_wide_int when deliberately ignoring upper
bits, beyond bit or byte address width.  128 bits definitely falls into
that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
for any wide_int wider than 16 bits though.

AFAIK that is what the code produces,

In which case?  This is:

  precision == 16
  xprecision == 16
  len == 3
  max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it again
to 1, so the 0x00 0xff 0xff becomes 0xff.  The 0x00 0xff is still there
in the array, but not used.

but now Kenny says this is only for some kind of wide-ints but not all?
That is, why is

  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
                                          unsigned int precision,
                                          const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val
      = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
    return wi::storage_ref (val, len, precision);
  }

not a valid implementation together with making sure that the
INTEGER_CST tree rep has that extra word of zeros if required?

The fundamental problem here is that we're trying to support two cases:

  (a) doing N-bit arithmetic in cases where the inputs have N bits
  (b) doing N-bit arithmetic in cases where the inputs have fewer than
      N bits and are extended according to TYPE_SIGN.

Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
0x8000 regardless of whether the type is signed or unsigned.  But if
it's extended to 32 bits you get two different numbers, 0x00008000 and
0xffff8000, depending on the sign.

So for one value of the precision parameter (i.e. xprecision), signed
and unsigned constants produce the same number.  But for another value
of the precision parameter (those greater than xprecision), signed and
unsigned constants produce different numbers.  Yet at the moment the
tree constant has a single representation.

But a correctly extended one, up to its len!  (as opposed to RTL)

So I think the possibilities are:

  (1) Use the representation of an N-bit wide_int to store N-bit tree
      constants.  Do work when
Re: Ping Re: [gomp4] Dumping gimple for offload.
Ping.
On 09 Oct 19:12, Ilya Tocar wrote:
Ping.
On 03 Oct 20:05, Ilya Tocar wrote:
On 26 Sep 21:21, Ilya Tocar wrote:
On 25 Sep 15:48, Richard Biener wrote:
On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar <tocarip.in...@gmail.com> wrote:
On 24 Sep 11:02, Richard Biener wrote:
On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar <tocarip.in...@gmail.com> wrote:

thus consider assigning the section name in a different place.

Richard.

What do you mean by different place?  I can add a global
dumping_omp_target variable to choose the correct name, depending on
its value (patch below).  Is it better?

More like passing down a different abstraction, like for

@@ -907,9 +907,15 @@ output_symtab (void)
     {
       symtab_node node = lto_symtab_encoder_deref (encoder, i);
       if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
-        lto_output_node (ob, cnode, encoder);
+        {
+          if (!dumping_omp_target || lookup_attribute ("omp declare target",
+                                       DECL_ATTRIBUTES (node->symbol.decl)))
+            lto_output_node (ob, cnode, encoder);
+        }
       else
-        lto_output_varpool_node (ob, varpool (node), encoder);
+        if (!dumping_omp_target || lookup_attribute ("omp declare target",
+                                     DECL_ATTRIBUTES (node->symbol.decl)))
+          lto_output_varpool_node (ob, varpool (node), encoder);
     }

have the symtab encoder already not contain the varpool nodes you
don't need.  And instead of looking up attributes, mark the symtab
node with a flag.

Good idea!  I've tried creating 2 encoders, and adding only nodes with
the "omp declare target" attribute in the omp case.  There is still
some is_omp passing to control lto_set_symtab_encoder_in_partition
behavior, because i think it's better than a global var.  What do you
think?

Updated version of the patch.  I've checked that it doesn't break lto
on SPEC 2006.  Streaming for omp is enabled by the -fopenmp flag.
Works with and without enabled lto.  Ok for gomp4 branch?

---
 gcc/cgraphunit.c          | 15 +--
 gcc/ipa-inline-analysis.c |  2 +-
 gcc/lto-cgraph.c          | 15 ++-
 gcc/lto-streamer.c        |  5 +++--
 gcc/lto-streamer.h        | 10 --
 gcc/lto/lto-partition.c   |  4 ++--
 gcc/passes.c              | 12 ++--
 gcc/tree-pass.h           |  2 +-
 8 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 1644ca9..d595475 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2016,7 +2016,18 @@ ipa_passes (void)
				      passes->all_lto_gen_passes);
 
   if (!in_lto_p)
-    ipa_write_summaries ();
+    {
+      if (flag_openmp)
+	{
+	  section_name_prefix = OMP_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (true);
+	}
+      if (flag_lto)
+	{
+	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (false);
+	}
+    }
 
   if (flag_generate_lto)
     targetm.asm_out.lto_end ();
@@ -2107,7 +2118,7 @@ compile (void)
   cgraph_state = CGRAPH_STATE_IPA;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_lto || flag_openmp)
     lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index ba6221e..4420213 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -3721,7 +3721,7 @@ inline_generate_summary (void)
   /* When not optimizing, do not bother to analyze.  Inlining is
      still done because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
     return;
 
   function_insertion_hook_holder =
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 952588d..4a7d179 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -236,8 +236,13 @@ lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t encoder,
 
 void
 lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
-				     symtab_node node)
+				     symtab_node node, bool is_omp)
 {
+  /* Ignore non omp target nodes for omp case.  */
+  if (is_omp && !lookup_attribute ("omp declare target",
+				   DECL_ATTRIBUTES (node->symbol.decl)))
+    return;
+
   int index = lto_symtab_encoder_encode (encoder, (symtab_node) node);
   encoder->nodes[index].in_partition = true;
 }
@@ -760,7 +765,7 @@ add_references (lto_symtab_encoder_t encoder,
 ignored by the
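The filtering added to lto_set_symtab_encoder_in_partition can be modeled in a few lines: when streaming for an OpenMP target, nodes without the "omp declare target" attribute are simply skipped. The struct and the strstr-based attribute check below are simplified stand-ins for GCC's symtab node and lookup_attribute, not the real API.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for a symtab node: a flat attribute string plus the
   in_partition flag the encoder sets.  */
struct toy_node
{
  const char *attrs;
  int in_partition;
};

/* Mirror of the patch's early return: in the omp case, nodes lacking
   the "omp declare target" attribute never enter the partition.  */
static void
set_in_partition (struct toy_node *node, int is_omp)
{
  if (is_omp
      && (node->attrs == 0
          || strstr (node->attrs, "omp declare target") == 0))
    return;
  node->in_partition = 1;
}

/* Convenience wrapper for testing a single node.  */
static int
partitioned (const char *attrs, int is_omp)
{
  struct toy_node n = { attrs, 0 };
  set_in_partition (&n, is_omp);
  return n.in_partition;
}
```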
Re: [wide-int] int_traits tree
On 10/17/2013 04:46 AM, Richard Biener wrote: the case actually comes up on the ppc because they do a lot of 128 bit math.I think i got thru the x86-64 without noticing this. Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc. My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16. AFAIK that is what the code produces, but now Kenny says this is only for some kind of wide-ints but not all? That is, why is The issue is not that the rules are different between the different flavors of wide int, it is that the circumstances are different. The rule is that the only bits above the precision that exist are if the precision is not an even multiple of the HBPWI. In that case, the bits are always an extension of the sign bits. max_wide_int and addr_wide_int have large enough precisions so that it is impossible to ever generate an unsigned number on the target that is large enough to ever run against the precision.However, for the fixed precision case, you can, and at least on the ppc, you do, generate unsigned numbers that are big enough to have to be recanonicalized like this. inline wi::storage_ref wi::int_traits const_tree::decompose (HOST_WIDE_INT *scratch, unsigned int precision, const_tree x) { unsigned int len = TREE_INT_CST_NUNITS (x); const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) TREE_INT_CST_ELT (x, 0); return wi::storage_ref (val, len, precision); } not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required? I would like to see us move in that direction (given that the tree rep of INTEGER_CST has transitioned to variable-length already). 
Btw, code such as

  tree
  wide_int_to_tree (tree type, const wide_int_ref &pcst)
  {
  ...
    unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
    bool recanonize = sgn == UNSIGNED
      && small_prec
      && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;

definitely needs a comment. I would have thought _all_ unsigned numbers need re-canonicalization. Well, maybe only if we're forcing the extra zeros. It will get a comment.

[This function shows another optimization issue:

    case BOOLEAN_TYPE:
      /* Cache false or true.  */
      limit = 2;
      if (wi::leu_p (cst, 1))
        ix = cst.to_uhwi ();

I would have expected cst <= 1 to be optimized to cst.len == 1 && cst.val[0] <= 1. It expands to

  <L27>:
  MEM[(long int *)&D.50698 + 16B] = 1;
  MEM[(struct wide_int_ref_storage *)&D.50698] = &MEM[(struct wide_int_ref_storage *)&D.50698].scratch;
  MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1;
  MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32;
  _277 = MEM[(const struct wide_int_storage *)cst + 260B];
  if (_277 <= 64)
    goto <bb 42>;
  else
    goto <bb 43>;

  <bb 42>:
  xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
  _494 = MEM[(const long int *)cst];
  _495 = (long unsigned int) _494;
  yl_496 = zext_hwi (_495, _277);
  _497 = xl_491 < yl_496;
  goto <bb 44>;

  <bb 43>:
  _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage *)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage *)cst].val, len_274, _277);

this keeps D.50698 and cst un-SRAable - inline storage is problematic for this reason. But the representation should guarantee the compare with a low precision (32 bit) constant is evaluatable at compile-time if len of the larger value is 1, no?
  <bb 44>:
  # _504 = PHI <_497(42), _503(43)>
  D.50698 ={v} {CLOBBER};
  if (_504 != 0)
    goto <bb 45>;
  else
    goto <bb 46>;

  <bb 45>:
  pretmp_563 = MEM[(const struct wide_int_storage *)cst + 256B];
  goto <bb 229> (<L131>);

  <bb 46>:
  _65 = generic_wide_int<wide_int_storage>::to_uhwi (cst, 0);
  ix_66 = (int) _65;
  goto <bb 91>;

The question is whether we should try to optimize wide-int for such cases or simply not use wi::leu_p (cst, 1) but rather

  if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1)

? i find this ugly, but i see where you are coming from. The problem is that both you and i know that the len has to be 1, but the optimizer does not. This is a case where I think that we made a mistake getting rid of the wi::one_p, wi::zero_p and wi::minus_one_p. The internals of one_p were return (len == 1 && val[0] == 1) and i think that is much nicer than what you put there. On the other hand, it seems that a person more skilled than i am with c++ could specialize the comparisons with an integer constant, since i believe that that constant must fit in one hwi (I am a little concerned about large unsigned constants). Thanks, Richard
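The recanonicalization rule Kenny states — only a precision that is not a whole multiple of HWI bits leaves bits above it, and those bits are always sign extension — can be sketched in a toy model. This is illustrative Python only (a hypothetical `canonize` helper, with 8-bit "HWIs" as in the thread's examples; real HOST_BITS_PER_WIDE_INT is 32 or 64), not GCC code:

```python
# Toy model of wide_int canonicalization with 8-bit "HWIs".
HWI_BITS = 8
HWI_MAX = (1 << HWI_BITS) - 1
SIGN_BIT = 1 << (HWI_BITS - 1)

def canonize(blocks, precision):
    """Clamp a least-significant-first block array to the blocks that fit
    in `precision`, then drop high blocks that are pure sign extension of
    the block below -- the canonical wide_int form."""
    max_len = (precision + HWI_BITS - 1) // HWI_BITS
    blocks = blocks[:max_len]           # the MIN (len, max_len) step
    while len(blocks) > 1:              # the trimming loop
        top, below = blocks[-1], blocks[-2]
        if (top == 0 and not below & SIGN_BIT) or \
           (top == HWI_MAX and below & SIGN_BIT):
            blocks = blocks[:-1]        # top block is redundant sign extension
        else:
            break
    return blocks

# Unsigned 16-bit all-ones: tree rep 0x00 0xff 0xff (msb first), i.e.
# [0xff, 0xff, 0x00] lsb first.  For a 16-bit wide_int it collapses to
# the single block 0xff, exactly the trimming discussed in the thread.
print([hex(b) for b in canonize([0xff, 0xff, 0x00], 16)])  # -> ['0xff']
```

For a precision wider than 16 bits the leading zero block is no longer redundant, so the same array stays at len 3 — matching Richard's point that 0x00 0xff 0xff is correct for any wide_int wider than 16 bits.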
Re: [wide-int] int_traits <tree>
Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).

It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int.

the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff. 0x00 0xff 0xff is correct for any wide_int wider than 16 bits though.

AFAIK that is what the code produces,

In which case? This is:

  precision == 16
  xprecision == 16
  len == 3
  max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it again to 1, so the 0x00 0xff 0xff becomes 0xff. The 0x00 0xff is still there in the array, but not used.

but now Kenny says this is only for some kind of wide-ints but not all?
That is, why is

  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
					  unsigned int precision, const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val
      = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
    return wi::storage_ref (val, len, precision);
  }

not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required?

The fundamental problem here is that we're trying to support two cases: (a) doing N-bit arithmetic in cases where the inputs have N bits, and (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits and are extended according to TYPE_SIGN.

Let's assume 32-bit HWIs. The 16-bit (4-hex-digit) constant 0x8000 is 0x8000 regardless of whether the type is signed or unsigned. But if it's extended to 32 bits you get two different numbers 0x00008000 and 0xffff8000, depending on the sign. So for one value of the precision parameter (i.e. xprecision), signed and unsigned constants produce the same number. But for another value of the precision parameter (those greater than xprecision), signed and unsigned constants produce different numbers. Yet at the moment the tree constant has a single representation.

But a correctly extended one, up to its len! (as opposed to RTL)

But extending the precision can change the right value of len. Take the same example with 16-bit HWIs. In wide_int terms, and with the original tree representation, the constant is a single HWI: 0x8000 with len 1. And in case (a) -- where we're asking for a 16-bit wide_int -- this single HWI is all we want. The signed and unsigned constants give the same wide_int. But the same constant extended to 32 bits and left uncompressed would be two HWIs:

  0x0000 0x8000 for unsigned constants
  0xffff 0x8000 for signed constants

Compressed according to the sign scheme they are:

  0x0000 0x8000 (len == 2) for unsigned constants
  0x8000 (len == 1) for signed constants

which is also the new tree representation.
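Richard's 0x8000 example can be worked through mechanically. The sketch below is a toy Python model (hypothetical `extend` helper, 16-bit HWIs as in the example; not GCC code) that reproduces the compressed signed and unsigned forms:

```python
# Toy model: extend a block array to a wider precision, then compress
# per the sign scheme (drop a top block that merely repeats the sign).
HWI_BITS = 16
HWI_MAX = (1 << HWI_BITS) - 1
SIGN_BIT = 1 << (HWI_BITS - 1)

def extend(blocks, old_prec, new_prec, signed):
    """blocks is least-significant HWI first."""
    value = 0
    for i, b in enumerate(blocks):
        value |= b << (i * HWI_BITS)
    if signed and value >> (old_prec - 1):   # negative: sign-extend
        value -= 1 << old_prec
    value &= (1 << new_prec) - 1             # two's complement in new_prec
    n = (new_prec + HWI_BITS - 1) // HWI_BITS
    out = [(value >> (i * HWI_BITS)) & HWI_MAX for i in range(n)]
    while len(out) > 1:                      # compress redundant top blocks
        top, below = out[-1], out[-2]
        if (top == 0 and not below & SIGN_BIT) or \
           (top == HWI_MAX and below & SIGN_BIT):
            out.pop()
        else:
            break
    return out

# 16-bit constant 0x8000 extended to 32 bits:
print([hex(b) for b in extend([0x8000], 16, 32, signed=False)])  # -> ['0x8000', '0x0'], len 2
print([hex(b) for b in extend([0x8000], 16, 32, signed=True)])   # -> ['0x8000'], len 1
```

The unsigned result keeps the explicit zero block (len 2) while the signed result compresses to a single HWI (len 1), matching the two forms quoted above.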
So the unsigned case is different for (a) and (b). So I think the possibilities are:

(1) Use the representation of an N-bit wide_int to store N-bit tree constants. Do work when extending them to wider wide_ints.

(2) Use the representation of max_wide_int to store N-bit tree constants. Do work when creating an N-bit wide_int.

(3) Store both representations in the tree constant.

(4) Require all tree arithmetic to be done in the same way as rtl arithmetic, with explicit extensions. This gets rid of case (b).

(5) Require all tree arithmetic to be done in wider wide_ints than the inputs, which I think is what you preferred. This gets rid of case (a).

(6) Allow the same wide_int constant to have several different representations. Remember that this is to some extent what Kenny's original
Re: [wide-int] int_traits <tree>
On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int.

2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers.
There are enough bits in the internal representation so that no information is lost by representing them this way.

so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision >= xprecision (maybe we should assert that).

Btw, we are not using them directly, but every time we actually build an addr_wide_int / max_wide_int we copy them anyway:

  /* Initialize the storage from integer X, in precision N.  */
  template <int N>
  template <typename T>
  inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
  {
    /* Check for type compatibility.  We don't want to initialize a
       fixed-width integer from something like a wide_int.  */
    WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
    wide_int_ref xi (x, N);
    len = xi.len;
    for (unsigned int i = 0; i < len; ++i)
      val[i] = xi.val[i];
  }

it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary). I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds.

What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged.

Richard.
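For concreteness, the "64 + 4 bits rounded up to a multiple of HOST_BITS_PER_WIDE_INT" rule for addr_wide_int's precision works out as follows. This is a toy Python calculation (hypothetical helper name, not GCC code):

```python
def fixed_precision(addr_bits, extra_bits, hwi_bits):
    """Round addr_bits + extra_bits up to the next multiple of the
    host-wide-int width, as the addr_wide_int description states."""
    needed = addr_bits + extra_bits
    return -(-needed // hwi_bits) * hwi_bits   # ceiling division

print(fixed_precision(64, 4, 64))  # -> 128: addr_wide_int precision on a 64-bit-HWI host
print(fixed_precision(64, 4, 32))  # -> 96: the same rule with 32-bit HWIs
```

With 64-bit HWIs this gives the 128-bit fixed precision that makes the extension of any tree constant a no-op, which is the point of Richard's "Aha" above.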
RFA: Change behaviour of %* in spec strings
Hi Guys, I would like to make a change to the way %* behaves in spec strings. Currently whenever %* is encountered the text that was previously matched is substituted, followed by a space. Thus %{foo=*:bar%*baz} would match -foo=4 and insert 'bar4 baz'. I would like to change this so that the space is only inserted if %* is at the end of a spec sequence. (In all of the strings currently built in to gcc this is the case, so my patch will not change current behaviour). The motivation is that I want to construct a pathname based on a command line option using a spec string like this:

  %{mmcu=*:--script=%*/memory.ld}

So that if the user invokes -mmcu=msp430f2617 then gcc will generate:

  --script=msp430f2617/memory.ld

A spec string like this however:

  %{mmcu=*:--script=%*}/memory.ld

would match the current behaviour and generate:

  --script=msp430f2617 /memory.ld

As a secondary feature of the patch I have also updated the documentation to explicitly state when a space will be inserted into the generated text. I have tested the patch and found no regressions using an i686-pc-linux-gnu toolchain and an msp430-elf toolchain. OK to apply ? Cheers Nick

gcc/ChangeLog
2013-10-17  Nick Clifton  ni...@redhat.com

	* gcc.c (do_spec_1): Do not insert a space after a %* substitution
	unless it is the last part of a spec substring.
	* doc/invoke.texi (Spec Files): Document space insertion
	behaviour of %*.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 203750)
+++ gcc/doc/invoke.texi (working copy)
@@ -10929,6 +10929,22 @@
 for each matching switch, with the @code{%*} replaced by the part of
 that switch matching the @code{*}.
 
+If @code{%*} appears as the last part of a spec sequence then a space
+will be added after the end of the last substitution.  If there is more
+text in the sequence however then a space will not be generated.  This
+allows the @code{%*} substitution to be used as part of a larger
+string.
+For example, a spec string like this:
+
+@smallexample
+%@{mcu=*:--script=%*/memory.ld@}
+@end smallexample
+
+when matching an option like @code{-mcu=newchip} will produce:
+
+@smallexample
+--script=newchip/memory.ld
+@end smallexample
+
 @item %@{.@code{S}:@code{X}@}
 Substitutes @code{X}, if processing a file with suffix @code{S}.

Index: gcc/gcc.c
===
--- gcc/gcc.c (revision 203750)
+++ gcc/gcc.c (working copy)
@@ -388,7 +388,8 @@
  %2	process CC1PLUS_SPEC as a spec.
  %*	substitute the variable part of a matched option.  (See below.)
 	Note that each comma in the substituted string is replaced by
-	a single space.
+	a single space.  A space is appended after the last substitution
+	unless there is more text in the current sequence.
  %<S	remove all occurrences of -S from the command line.  Note - this
 	command is position dependent.  %<S commands in the spec string
 	before this one will see -S, %<S commands in the
@@ -422,7 +423,9 @@
 	once, no matter how many such switches appeared.  However, if %*
 	appears somewhere in X, then X will be substituted once for each
 	matching switch, with the %* replaced by the
-	part of that switch that matched the '*'.
+	part of that switch that matched the '*'.  A space will be
+	appended after the last substitution unless there is more
+	text in the current sequence.
 %{.S:X} substitutes X, if processing a file with suffix S.
 %{!.S:X} substitutes X, if NOT processing a file with suffix S.
 %{,S:X} substitutes X, if processing a file which will use spec S.
@@ -5375,7 +5378,17 @@
 	{
 	  if (soft_matched_part[0])
 	    do_spec_1 (soft_matched_part, 1, NULL);
-	  do_spec_1 (" ", 0, NULL);
+	  /* Only insert a space after the substitution if it is at the
+	     end of the current sequence.  So if:
+
+	       %{foo=*:bar%*}%{foo=*:one%*two}
+
+	     matches -foo=hello then it will produce:
+
+	       barhello onehellotwo
+	  */
+	  if (*p == 0 || *p == '}')
+	    do_spec_1 (" ", 0, NULL);
 	}
       else
 	/* Catch the case where a spec string contains something like
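Nick's proposed rule — substitute the matched tail for %*, appending a space only when %* ends the sequence — can be modelled outside GCC. The sketch below is a toy Python simulation (hypothetical `expand_spec` handling only the %{name=*:X} form; it is not GCC's actual spec parser):

```python
import re

def expand_spec(spec, options):
    """Toy model of the proposed %* behaviour: expand %{name=*:X},
    substituting the matched option tail for %*, and append a space
    only when %* is the last thing in the sequence."""
    def repl(m):
        name, body = m.group(1), m.group(2)
        for opt in options:
            if opt.startswith("-" + name + "="):
                tail = opt[len(name) + 2:]          # text matched by '*'
                out = body.replace("%*", tail)
                if body.endswith("%*"):             # %* at end of sequence
                    out += " "
                return out
        return ""                                   # option absent: drop sequence
    return re.sub(r"%\{(\w+)=\*:([^}]*)\}", repl, spec)

# %* inside a larger string: no space is inserted ...
print(expand_spec("%{mmcu=*:--script=%*/memory.ld}", ["-mmcu=msp430f2617"]))
# -> --script=msp430f2617/memory.ld
# ... but a trailing %* still gets the separating space:
print(repr(expand_spec("%{mmcu=*:--script=%*}", ["-mmcu=msp430f2617"])))
# -> '--script=msp430f2617 '
```

The second call shows the retained compatibility behaviour: a sequence ending in %* still emits the trailing argument separator.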
Re: [wide-int] int_traits <tree>
On 10/17/2013 08:29 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int.

2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers.
There are enough bits in the internal representation so that no information is lost by representing them this way.

so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha.

it is until someone comes up with a port that this will not work for, then they will have to add some machinery to sniff the port and make this bigger. I am hoping to be retired by the time this happens.

For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision >= xprecision (maybe we should assert that).

It is now asserted for (as of a few days ago).

Btw, we are not using them directly, but every time we actually build an addr_wide_int / max_wide_int we copy them anyway:

  /* Initialize the storage from integer X, in precision N.  */
  template <int N>
  template <typename T>
  inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
  {
    /* Check for type compatibility.  We don't want to initialize a
       fixed-width integer from something like a wide_int.  */
    WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
    wide_int_ref xi (x, N);
    len = xi.len;
    for (unsigned int i = 0; i < len; ++i)
      val[i] = xi.val[i];
  }

it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary).

this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out.
I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds. What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. I am not following you here. In trees, the msb is effectively a sign bit, even for unsigned numbers because we add that extra block. but inside of wide int, we do not add extra blocks beyond the precision. That would be messy for a lot of other reasons.
Re: [wide-int] int_traits <tree>
On 10/17/2013 07:30 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Thu, 17 Oct 2013, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes:

As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right?

i believe this is correct.

If so, xprecision < precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs.

say my HWI size is 8 bits (just to keep from typing a million 'f's). if i have a 16 bit unsigned number that is all 1s, in the tree world it is 3 hwis: 0x00 0xff 0xff. but inside regular wide int, it would take 1 wide int whose value is 0xff. inside of max it would be the same as the tree, but then the test precision < xprecision + hbpwi never kicks in because precision is guaranteed to be huge. inside of addr_wide_int, i think we tank with the assertion.

It should be OK for addr_wide_int too. The precision still fits 2 HWIs. The initial length is greater than the maximum length of an addr_wide_int, but your len = MAX (len, max_len) deals with that.

It's len = MIN (len, max_len)

Oops, yeah, I meant MIN, sorry.

which looked suspicious to me, but with precision >= xprecision, precision can only be zero if xprecision is zero, which looked to me like it cannot happen - or rather it should be fixed.

Despite the comment above the code, I don't think this MIN is there for the zero-precision case. I think it's there to handle the new tree representation. The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).

It was changed because you asked me to change it. However, you end up with special cases either way.

the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff. 0x00 0xff 0xff is correct for any wide_int wider than 16 bits though.

AFAIK that is what the code produces,

In which case? This is:

  precision == 16
  xprecision == 16
  len == 3
  max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it again to 1, so the 0x00 0xff 0xff becomes 0xff. The 0x00 0xff is still there in the array, but not used.

but now Kenny says this is only for some kind of wide-ints but not all? That is, why is

  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
					  unsigned int precision, const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val
      = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
    return wi::storage_ref (val, len, precision);
  }

not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required?
The fundamental problem here is that we're trying to support two cases: (a) doing N-bit arithmetic in cases where the inputs have N bits (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits and are extended according to TYPE_SIGN. Let's assume 32-bit HWIs. The 16-bit (4-hex-digit) constant 0x8000 is 0x8000 regardless of whether the type is signed or unsigned. But if it's extended to 32 bits you get two different numbers 0xffff8000 and 0x00008000, depending on the sign. So for one value of the precision parameter (i.e. xprecision), signed and unsigned constants produce the same number. But for another value of the precision parameter (those greater than xprecision), signed and unsigned constants produce different numbers. Yet at the moment the tree constant has a single representation. But a correctly extended one, up to its len! (as opposed to RTL) So I think the possibilities are: (1) Use the representation of an N-bit wide_int to store N-bit tree constants. Do work when extending them to wider
RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2
On Wed, 16 Oct 2013 22:50:20, DJ Delorie wrote: Not all of them can work, because they describe something that can't be done in hardware. For example, the first test has an incomplete bitfield - the fields do not completely describe an int so the structure is smaller (one byte, according to sizeof()) than the access type (2-8 bytes). Where in the C standard did you read the requirement that every bit field must be complete? (This is a serious question). extern struct { volatile unsigned int b : 1; } bf3; on my compiler this structure occupies 4 bytes. and it is aligned at 4 bytes. That is OK for me and AAPCS. But the access bf3.b=1 is SI mode with Sandra's patch (part 1, which just obeys the AAPCS and does nothing else) and QI mode without this patch, which is just a BUG. I am quite surprised how your target manages to avoid it? It is as Sandra said, at least on ARM -fstrict-volatile-bitfields does not function at all. And the C++11 memory model wins all the time. Looking through the tests, most of them combine packed with mismatched types. IMHO, those tests are invalid. I don't think so. They are simply packed. And volatile just says that the value may be changed by a different thread. It has a great impact on loop optimizations. Either way, if -fstrict-volatile-bitfields does not do what it's supposed to do, the correct action is to fix it - not to disable it on targets that rely on it for correct operation. Agreed. That is the number one priority here. ... I've not objected to fixing -fstrict-volatile-bitfields, or making the -fno-strict-volatile-bitfields case match the standard. I've only objected to breaking my targets by making that flag not the default. Fine. Why can't we just get this fixed? Bernd.
Re: [PATCH][i386]Fix PR 57756
On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote: How is Google going to change its patch commit policies to ensure that this does not happen again? There is nothing to change. Google follows http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed the oversight and if there is any other fallout from his patch, he will address it. I don't see anything out of the ordinary here. Diego.
[PATCH] Fixup handling of zero-precision integer types
These two patches try to fix handling of zero-precision integer types (created by struct { int : 0; };). Currently they get assigned TYPE_MIN/MAX_VALUEs but clearly a zero-precision integer type does not have a single valid value. Of course for these nothing should look at TYPE_MIN/MAX_VALUE you'd think ... Clearly you are wrong. See the patch below which bootstrapped and tested on x86_64-unknown-linux-gnu for all languages in the attempt to apply the 2nd patch (still testing). Objections? Thanks, Richard.

Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c (revision 203744)
+++ gcc/stor-layout.c (working copy)
@@ -2054,7 +2054,7 @@ layout_type (tree type)
     case BOOLEAN_TYPE: /* Used for Java, Pascal, and Chill. */
       if (TYPE_PRECISION (type) == 0)
-	TYPE_PRECISION (type) = 1; /* default to one byte/boolean. */
+	gcc_unreachable ();

       /* ... fall through ... */
@@ -2062,7 +2062,7 @@ layout_type (tree type)
     case ENUMERAL_TYPE:
       if (TREE_CODE (TYPE_MIN_VALUE (type)) == INTEGER_CST
	   && tree_int_cst_sgn (TYPE_MIN_VALUE (type)) >= 0)
-	TYPE_UNSIGNED (type) = 1;
+	gcc_assert (TYPE_UNSIGNED (type));

       SET_TYPE_MODE (type, smallest_mode_for_size (TYPE_PRECISION (type), MODE_INT));

2013-10-17 Richard Biener rguent...@suse.de

	* stor-layout.c (layout_type): Do not change TYPE_PRECISION
	or TYPE_UNSIGNED of integral types.
	(set_min_and_max_values_for_integral_type): Leave
	TYPE_MIN/MAX_VALUE NULL_TREE for zero-precision integral types.

Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c (revision 203744)
+++ gcc/stor-layout.c (working copy)
@@ -2052,18 +2052,9 @@ layout_type (tree type)
	  of the language-specific code. */
       gcc_unreachable ();

-    case BOOLEAN_TYPE: /* Used for Java, Pascal, and Chill. */
-      if (TYPE_PRECISION (type) == 0)
-	TYPE_PRECISION (type) = 1; /* default to one byte/boolean. */
-
-      /* ... fall through ... */
-
+    case BOOLEAN_TYPE:
     case INTEGER_TYPE:
     case ENUMERAL_TYPE:
-      if (TREE_CODE (TYPE_MIN_VALUE (type)) == INTEGER_CST
-	  && tree_int_cst_sgn (TYPE_MIN_VALUE (type)) >= 0)
-	TYPE_UNSIGNED (type) = 1;
-
       SET_TYPE_MODE (type, smallest_mode_for_size (TYPE_PRECISION (type), MODE_INT));
       TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));
@@ -2520,6 +2511,12 @@ set_min_and_max_values_for_integral_type
   tree min_value;
   tree max_value;

+  /* For bitfields with zero width we end up creating integer types
+     with zero precision.  Don't assign any minimum/maximum values
+     to those types, they don't have any valid value.  */
+  if (precision < 1)
+    return;
+
   if (is_unsigned)
     {
       min_value = build_int_cst (type, 0);
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Kenneth Zadeck wrote: On 10/17/2013 08:29 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers. 
There are enough bits in the internal representation so that no information is lost by representing them this way. so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. it is until someone comes up with a port that this will not work for, then they will have to add some machinery to sniff the port and make this bigger. I am hoping to be retired by the time this happens. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision > xprecision (maybe we should assert that). It is now asserted for (as of a few days ago). Btw, we are not using them directly, but every time we actually build a addr_wide_int / max_wide_int we copy them anyway: /* Initialize the storage from integer X, in precision N. */ template <int N> template <typename T> inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x) { /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; wide_int_ref xi (x, N); len = xi.len; for (unsigned int i = 0; i < len; ++i) val[i] = xi.val[i]; } it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary). this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out.
I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds. What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. I am not following you here. In trees, the msb is effectively a sign bit, even for unsigned numbers because we add that extra block. but inside of wide int, we do not add extra blocks beyond the precision. That would be messy for a lot of other reasons.
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Kenneth Zadeck wrote: On 10/17/2013 07:30 AM, Richard Biener wrote: Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It was changed because you asked me to change it. However, you end up with special cases either way. It was a mis-communication - I thought you added this to the wide-int rep to be able to use both tree and RTX reps directly. Richard.
Re: [wide-int] int_traits tree
Richard Biener rguent...@suse.de writes: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers. 
There are enough bits in the internal representation so that no information is lost by representing them this way. so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision > xprecision (maybe we should assert that). No, we allow precision == xprecision for addr_wide_int and max_wide_int too. But all we do in that case is trim the length. Requiring precision > xprecision was option (5) from my message. Btw, we are not using them directly, but every time we actually build a addr_wide_int / max_wide_int we copy them anyway: /* Initialize the storage from integer X, in precision N. */ template <int N> template <typename T> inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x) { /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; wide_int_ref xi (x, N); len = xi.len; for (unsigned int i = 0; i < len; ++i) val[i] = xi.val[i]; } Are you saying that: max_wide_int x = (tree) y; should just copy a pointer? Then what about: max_wide_int z = x; Should that just copy a pointer too, or is it OK for the second constructor to do a copy? In most cases we only use the fixed_wide_int to store the result of the computation.
But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider width. The branch uses wide_int for RTL quite naturally, and changing it wouldn't help do anything with the tree decompose routine. Thanks, Richard
Re: [wide-int] int_traits tree
Kenneth Zadeck zad...@naturalbridge.com writes: it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary). this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out. Hmm, you mean the patch to introduce all the wi:: stuff? Which cases do you mean? That patch didn't change the HWI representation (and therefore when a copy was needed by the to_shwi*/decompose routines). And it got rid of quite a few *_wide_int constructions, things like: - wide_int wi = (max_wide_int (prev_value) - .add (1, sgn, &overflowed)); + wide_int wi = wi::add (prev_value, 1, sgn, &overflowed); Thanks, Richard
[Ada] Handle constraint error in vector when large index type
To compute the range of values in the generic actual Index_Type for a vector, there is a series of compile-time tests to determine which of Index_Type or Count_Type to use for intermediate values. In the case of an Index_Type comprising a range of values larger than in Count_Type, the number of values was computed using Index_Type, and then converted to Count_Type. However, even though the computation of the result does not overflow, the conversion of the result to type Count_Type will fail when the value is greater than Count_Type'Last. The solution is to first test whether the result is less than Count_Type'Last, and only then convert the result. Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17 Matthew Heaney hea...@adacore.com

	* a-convec.adb, a-coinve.adb, a-cobove.adb (Insert, Insert_Space):
	Inspect value range before converting type.

Index: a-coinve.adb
===================================================================
--- a-coinve.adb (revision 203568)
+++ a-coinve.adb (working copy)
@@ -1734,7 +1734,22 @@
          --  worry about if No_Index were less than 0, but that case is
          --  handled above).

-         Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         if Index_Type'Last - No_Index >=
+              Count_Type'Pos (Count_Type'Last)
+         then
+            --  We have determined that range of Index_Type has at least as
+            --  many values as in Count_Type, so Count_Type'Last is the
+            --  maximum number of items that are allowed.
+
+            Max_Length := Count_Type'Last;
+
+         else
+            --  The range of Index_Type has fewer values than in Count_Type,
+            --  so the maximum number of items is computed from the range of
+            --  the Index_Type.
+
+            Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         end if;
       end if;

    elsif Index_Type'First = 0 then
@@ -2504,7 +2519,22 @@
          --  worry about if No_Index were less than 0, but that case is
          --  handled above).

-         Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         if Index_Type'Last - No_Index >=
+              Count_Type'Pos (Count_Type'Last)
+         then
+            --  We have determined that range of Index_Type has at least as
+            --  many values as in Count_Type, so Count_Type'Last is the
+            --  maximum number of items that are allowed.
+
+            Max_Length := Count_Type'Last;
+
+         else
+            --  The range of Index_Type has fewer values than in Count_Type,
+            --  so the maximum number of items is computed from the range of
+            --  the Index_Type.
+
+            Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         end if;
       end if;

    elsif Index_Type'First = 0 then

Index: a-cobove.adb
===================================================================
--- a-cobove.adb (revision 203568)
+++ a-cobove.adb (working copy)
@@ -1227,7 +1227,22 @@
          --  worry about if No_Index were less than 0, but that case is
          --  handled above).

-         Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         if Index_Type'Last - No_Index >=
+              Count_Type'Pos (Count_Type'Last)
+         then
+            --  We have determined that range of Index_Type has at least as
+            --  many values as in Count_Type, so Count_Type'Last is the
+            --  maximum number of items that are allowed.
+
+            Max_Length := Count_Type'Last;
+
+         else
+            --  The range of Index_Type has fewer values than in Count_Type,
+            --  so the maximum number of items is computed from the range of
+            --  the Index_Type.
+
+            Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         end if;
       end if;

    elsif Index_Type'First = 0 then
@@ -1685,7 +1700,22 @@
          --  worry about if No_Index were less than 0, but that case is
          --  handled above).

-         Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         if Index_Type'Last - No_Index >=
+              Count_Type'Pos (Count_Type'Last)
+         then
+            --  We have determined that range of Index_Type has at least as
+            --  many values as in Count_Type, so Count_Type'Last is the
+            --  maximum number of items that are allowed.
+
+            Max_Length := Count_Type'Last;
+
+         else
+            --  The range of Index_Type has fewer values than in Count_Type,
+            --  so the maximum number of items is computed from the range of
+            --  the Index_Type.
+
+            Max_Length := Count_Type'Base (Index_Type'Last -
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers. 
There are enough bits in the internal representation so that no information is lost by representing them this way. so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision > xprecision (maybe we should assert that). No, we allow precision == xprecision for addr_wide_int and max_wide_int too. But all we do in that case is trim the length. Requiring precision > xprecision was option (5) from my message. Btw, we are not using them directly, but every time we actually build a addr_wide_int / max_wide_int we copy them anyway: /* Initialize the storage from integer X, in precision N. */ template <int N> template <typename T> inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x) { /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; wide_int_ref xi (x, N); len = xi.len; for (unsigned int i = 0; i < len; ++i) val[i] = xi.val[i]; } Are you saying that: max_wide_int x = (tree) y; should just copy a pointer? Then what about: No, it should do a copy. But only one - which it does now with the append-extra-zeros-in-tree-rep, I was just thinking how to avoid it when not doing that which means adjusting the rep in the copy we need to do anyway. If fixed_wide_int_storage is really the only reason to enlarge the tree rep. max_wide_int z = x; Should that just copy a pointer too, or is it OK for the second constructor to do a copy? In most cases we only use the fixed_wide_int to store the result of the computation.
We don't use the above constructor for things like: max_wide_int x = wi::add ((tree) y, (int) z); wi::add uses y and z without going through max_wide_int. Yes, I am aware of that. What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider
[Ada] Illegal constituent in state refinement
This patch modifies the logic in the analysis of aspect/pragma Refined_State to catch a case where a visible variable is used as a constituent in a state refinement. -- Source -- -- pack.ads package Pack with Abstract_State => State is Var : Integer; procedure Proc (Formal : Integer) with Global => (Output => State); end Pack; -- pack.adb package body Pack with Refined_State => (State => Var) is procedure Proc (Formal : Integer) with Refined_Global => (Output => Var) is begin null; end Proc; end Pack; -- Compilation and output -- $ gcc -c -gnatd.V pack.adb pack.adb:2:35: cannot use Var in refinement, constituent is not a hidden state of package Pack pack.adb:5:11: useless refinement, subprogram Proc does not mention abstract state with visible refinement Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Hristian Kirtchev kirtc...@adacore.com * sem_prag.adb (Analyze_Constituent): Move the check concerning option Part_Of to routine Check_Matching_Constituent. (Check_Matching_Constituent): Verify that an abstract state that acts as a constituent has the proper Part_Of option in its aspect/pragma Abstract_State. Account for the case when a constituent comes from a private child or private sibling. * sem_util.ads, sem_util.adb (Is_Child_Or_Sibling): New routine. Index: sem_prag.adb === --- sem_prag.adb(revision 203755) +++ sem_prag.adb(working copy) @@ -21439,52 +21439,75 @@ Error_Msg_NE (duplicate use of constituent , Constit, Constit_Id); return; - end if; -- The related package has no hidden states, nothing to match. -- This case arises when the constituents are states coming -- from a private child. + -- A state can act as a constituent only when it is part of + -- another state. This relation is expressed by option Part_Of + -- of pragma Abstract_State.
- if No (Hidden_States) then - return; + elsif Ekind (Constit_Id) = E_Abstract_State then + if not Is_Part_Of (Constit_Id, State_Id) then + Error_Msg_Name_1 := Chars (State_Id); + Error_Msg_NE + (state is not a valid constituent of ancestor + state %, Constit, Constit_Id); + return; + + -- The constituent has the proper Part_Of option, but may + not appear in the immediate hidden state of the related + package. This case arises when the constituent comes from + a private child or a private sibling. Recognize these + scenarios to avoid generating a bogus error message. + + elsif Is_Child_Or_Sibling + (Pack_1 => Scope (State_Id), + Pack_2 => Scope (Constit_Id), + Private_Child => True) + then + return; + end if; end if; -- Inspect the hidden states of the related package looking for -- a match. - State_Elmt := First_Elmt (Hidden_States); - while Present (State_Elmt) loop + if Present (Hidden_States) then + State_Elmt := First_Elmt (Hidden_States); + while Present (State_Elmt) loop - -- A valid hidden state or variable participates in a - -- refinement. Add the constituent to the list of processed - -- items to aid with the detection of duplicate constituent - -- use. Remove the constituent from Hidden_States to signal - -- that it has already been used. + -- A valid hidden state or variable acts as a constituent - if Node (State_Elmt) = Constit_Id then - Add_Item (Constit_Id, Constituents_Seen); - Remove_Elmt (Hidden_States, State_Elmt); + if Node (State_Elmt) = Constit_Id then - -- Collect the constituent in the list of refinement - -- items. Establish a relation between the refined state - -- and its constituent. +-- Add the constituent to the list of processed items +-- to aid with the detection of duplicates. Remove the +-- constituent from Hidden_States to signal that it +-- has already been matched. -
[Ada] Containment of finalization actions in short circuit operators
This change ensures that any finalization action required within an expression that appears as the left operand of a short circuit operator remains contained within the code fragment that evaluates that operand (and not scattered after the evaluation of the complete Boolean expression). This is helpful for static analysis tools, in particular for coverage analysis. This is achieved by enclosing the left operand in an Expression_With_Actions, which will also capture the required finalization actions. A number of adjustments are made so that the rest of the compiler knows to deal appropriately with the new occurrences of these expressions with actions. Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Thomas Quinot qui...@adacore.com * exp_util.adb (Get_Current_Value_Condition, Set_Current_Value_Condition): Handle the case of expressions with actions * exp_util.adb (Insert_Actions): Handle the case of an expression with actions whose Actions list is empty. * exp_util.adb (Remove_Side_Effects.Side_Effect_Free): An expression with actions that has no Actions and whose Expression is side effect free is itself side effect free. * exp_util.adb (Remove_Side_Effects): Do not set an incorrect etype on temporary 'R' (Def_Id), which is in general an access to Exp_Type, not an Exp_Type. * sem_res.adb (Resolve): For an expression with actions, resolve the expression early. * sem_res.adb (Resolve_Expression_With_Actions): Rewrite an expression with actions whose value is compile time known and which has no actions into just its expression, so that its constant value is available downstream. * sem_res.adb (Resolve_Short_Circuit): Wrap the left operand in an expression with actions to contain any required finalization actions. * exp_ch4.adb (Expand_Expression_With_Actions): For an expression with actions returning a Boolean expression, ensure any finalization action is kept within the Actions list. 
* sem_warn.adb (Check_References, Check_Unset_Reference): Add missing circuitry to handle expressions with actions. * checks.adb (Ensure_Valid): For an expression with actions, insert the validity check on the Expression. * sem_ch13.adb (Build_Static_Predicate.Get_RList): An expression with actions that has a non-empty Actions list is not static. An expression with actions that has an empty Actions list has the static ranges of its Expression. * sem_util.adb (Has_No_Obvious_Side_Effects): An expression with actions with an empty Actions list has no obvious side effects if its Expression itself has no obvious side effects. Index: exp_util.adb === --- exp_util.adb(revision 203762) +++ exp_util.adb(working copy) @@ -2706,18 +2706,36 @@ (N : Node_Id; S : Boolean) is - Cond : Node_Id; - Sens : Boolean; + Cond : Node_Id; + Prev_Cond : Node_Id; + Sens : Boolean; begin Cond := N; Sens := S; - -- Deal with NOT operators, inverting sense + loop +Prev_Cond := Cond; - while Nkind (Cond) = N_Op_Not loop -Cond := Right_Opnd (Cond); -Sens := not Sens; +-- Deal with NOT operators, inverting sense + +while Nkind (Cond) = N_Op_Not loop + Cond := Right_Opnd (Cond); + Sens := not Sens; +end loop; + +-- Deal with conversions, qualifications, and expressions with +-- actions. + +while Nkind_In (Cond, +N_Type_Conversion, +N_Qualified_Expression, +N_Expression_With_Actions) +loop + Cond := Expression (Cond); +end loop; + +exit when Cond = Prev_Cond; end loop; -- Deal with AND THEN and AND cases @@ -2798,9 +2816,16 @@ return; --- Case of Boolean variable reference, return as though the --- reference had said var = True. + elsif Nkind_In (Cond, + N_Type_Conversion, + N_Qualified_Expression, + N_Expression_With_Actions) + then +Cond := Expression (Cond); + -- Case of Boolean variable reference, return as though the + -- reference had said var = True.
+ else if Is_Entity_Name (Cond) and then Ent = Entity (Cond) then Val := New_Occurrence_Of (Standard_True, Sloc (Cond)); @@ -3406,8 +3431,13 @@ when N_Expression_With_Actions = if N = Expression (P) then -
Re: [wide-int] int_traits tree
Richard Biener rguent...@suse.de writes: What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider width. RTL constants are sign-extended (whether you call them then signed is up to you). They have a precision. This is how they are valid reps for wide-ints, and that doesn't change. I was saying that if we make not _all_ wide-ints sign-extended then we can use the tree rep as-is. We'd then have the wide-int rep being either zero or sign extended but not arbitrary random bits outside of the precision (same as the tree rep). OK, but does that have any practical value over leaving them arbitrary, as in Kenny's original implementation? Saying that upper bits can be either signs or zeros means that readers wouldn't be able to rely on one particular extension (so would have to do the work that they did in Kenny's implementation). At the same time it means that writers can't leave the upper bits in an arbitrary state, so would have to make sure that they are signs or zeros (presumably having a free choice of which). Thanks, Richard
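The convention being debated can be illustrated with a small C sketch. The helper names below are hypothetical, and GCC's actual wide-int code differs in detail; the point is only that a signed comparison on N-bit values is well-defined solely when the bits above the precision follow a known extension convention.

```c
#include <assert.h>
#include <stdint.h>

/* Canonicalize an N-bit value held in a 64-bit "HWI" by sign-extending
   from bit (prec - 1), the convention described for RTL constants.
   Illustrative names, not GCC's real API.  */
static int64_t
sext_hwi (uint64_t val, unsigned int prec)
{
  unsigned int shift = 64 - prec;
  return (int64_t) (val << shift) >> shift;
}

/* A signed less-than such as lts_p gives consistent answers only if
   both operands use the same convention for the bits above PREC.  */
static int
lts_p (uint64_t a, uint64_t b, unsigned int prec)
{
  return sext_hwi (a, prec) < sext_hwi (b, prec);
}
```

With the 16-bit value 0x8000, sext_hwi yields -32768, so lts_p (0x8000, 1, 16) is true; if the upper bits were instead left in an arbitrary state, every reader would have to perform this canonicalization itself, which is exactly the cost under discussion.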
[Ada] Warnings on unused with_clauses in subunits
When generating code, subunits are inserted into the tree of the parent unit, and their context is added to the context of the parent. This makes it hard to determine whether any with_clause on the proper body is superfluous. This patch adds circuitry to detect these superfluous with_clauses when the subunit is compiled by itself, in -gnatc mode. Executing: gcc -c -gnatc -gnatwu pkg-proc.adb must yield: pkg-proc.adb:1:06: warning: unit Unused_Pkg is not referenced pkg-proc.adb:2:06: warning: unit Unused_Pkg_2 is not referenced pkg.adb:3:03: warning: variable Variable is never read and never assigned --- with Unused_Pkg; with Unused_Pkg_2; with Ada.Text_Io; separate (Pkg) procedure Proc is begin Ada.Text_Io.Put_Line (Hello); end Proc; --- with Unused_Pkg; package body pkg is Variable : Unused_Pkg.T; procedure Proc is separate; end Pkg; --- package Pkg is procedure Proc; end Pkg; --- package body Unused_Pkg is procedure Empty is begin null; end Empty; end Unused_Pkg; -- package Unused_Pkg is type T is (Tbd); procedure Empty; end Unused_Pkg; -- package Unused_Pkg_2 is type T is (Tbd); end Unused_Pkg_2; Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Ed Schonberg schonb...@adacore.com * sem_warn.adb (Check_Unused_Withs): If the main unit is a subunit, apply the check to the units mentioned in its context only. This provides additional warnings on with_clauses that are superfluous. Index: sem_warn.adb === --- sem_warn.adb(revision 203763) +++ sem_warn.adb(working copy) @@ -2545,13 +2545,16 @@ return; end if; - -- Flag any unused with clauses, but skip this step if we are compiling - -- a subunit on its own, since we do not have enough information to - -- determine whether with's are used. We will get the relevant warnings - -- when we compile the parent. This is the normal style of GNAT - -- compilation in any case. + -- Flag any unused with clauses. 
For a subunit, check only the units + -- in its context, not those of the parent, which may be needed by other + -- subunits. We will get the full warnings when we compile the parent, + -- but the following is helpful when compiling a subunit by itself. if Nkind (Unit (Cunit (Main_Unit))) = N_Subunit then + if Current_Sem_Unit = Main_Unit then +Check_One_Unit (Main_Unit); + end if; + return; end if;
[Ada] Non-SPARK package bodies and state refinement
This patch modifies the analysis of package bodies to suppress an error message prompting for state refinement when the body's SPARK mode is Off. -- Source -- -- pack.ads package Pack with Abstract_State => State, Initializes => State is procedure Proc with Global => (In_Out => State), Depends => (State => State); end Pack; -- pack.adb package body Pack is pragma SPARK_Mode (Off); X : Integer := 0; procedure Proc is begin X := X + 1; end Proc; end Pack; - -- Compilation -- - $ gcc -c -gnatd.V pack.adb Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Hristian Kirtchev kirtc...@adacore.com * sem_ch3.adb (Analyze_Declarations): Emit an error message concerning state refinement when the spec defines at least one non-null abstract state and the body's SPARK mode is On. (Requires_State_Refinement): New routine. Index: sem_ch3.adb === --- sem_ch3.adb (revision 203762) +++ sem_ch3.adb (working copy) @@ -2071,6 +2071,12 @@ -- If the states have visible refinement, remove the visibility of each -- constituent at the end of the package body declarations. + function Requires_State_Refinement +(Spec_Id : Entity_Id; + Body_Id : Entity_Id) return Boolean; + -- Determine whether a package denoted by its spec and body entities + -- requires refinement of abstract states. 
+ - -- Adjust_Decl -- - @@ -2100,6 +2106,82 @@ end if; end Remove_Visible_Refinements; + --- + -- Requires_State_Refinement -- + --- + + function Requires_State_Refinement +(Spec_Id : Entity_Id; + Body_Id : Entity_Id) return Boolean + is + function Mode_Is_Off (Prag : Node_Id) return Boolean; + -- Given pragma SPARK_Mode, determine whether the mode is Off + + - + -- Mode_Is_Off -- + - + + function Mode_Is_Off (Prag : Node_Id) return Boolean is +Mode : Node_Id; + + begin +-- The default SPARK mode is On + +if No (Prag) then + return False; +end if; + +Mode := + Get_Pragma_Arg (First (Pragma_Argument_Associations (Prag))); + +-- When the pragma lacks an argument, the default mode is On + +if No (Mode) then + return False; +else + return Chars (Mode) = Name_Off; +end if; + end Mode_Is_Off; + + -- Start of processing for Requires_State_Refinement + + begin + -- A package that does not define at least one abstract state cannot + -- possibly require refinement. + + if No (Abstract_States (Spec_Id)) then +return False; + + -- The package introduces a single null state which does not merit + -- refinement. + + elsif Has_Null_Abstract_State (Spec_Id) then +return False; + + -- Check whether the package body is subject to pragma SPARK_Mode. If + -- it is and the mode is Off, the package body is considered to be in + -- regular Ada and does not require refinement. + + elsif Mode_Is_Off (SPARK_Mode_Pragmas (Body_Id)) then +return False; + + -- The body's SPARK_Mode may be inherited from a similar pragma that + -- appears in the private declarations of the spec. The pragma we are + -- interested in appears as the second entry in SPARK_Mode_Pragmas. + + elsif Present (SPARK_Mode_Pragmas (Spec_Id)) + and then Mode_Is_Off (Next_Pragma (SPARK_Mode_Pragmas (Spec_Id))) + then +return False; + + -- The spec defines at least one abstract state and the body has no + -- way of circumventing the refinement. 
+ + else +return True; + end if; + end Requires_State_Refinement; + -- Local variables Body_Id : Entity_Id; @@ -2264,9 +2346,7 @@ -- State refinement is required when the package declaration has -- abstract states. Null states are not considered. -elsif Present (Abstract_States (Spec_Id)) - and then not Has_Null_Abstract_State (Spec_Id) -then +elsif Requires_State_Refinement (Spec_Id, Body_Id) then Error_Msg_NE (package requires state refinement, Context, Spec_Id); end if;
[Ada] Remove special handling for package declaring abstract state
On the basis of further discussion, we decided not to implement the rule saying that a package body must be required for some other reason if an abstract state is declared. Now we just say a package body is required if there is a non-null abstract state, and that's it! This change undoes the error message. So if we compile the following with -gnatd.V -gnatld7 (but without -gnatc), we get "cannot generate code for file nnas.ads (package spec)" Compiling: nnas.ads 1. package NNAS 2.with Abstract_State => State 3. is 4.-- package declarations with non-null 5.-- Abstract State shall have bodies. 6. end NNAS; 6 lines: No errors The "cannot generate" line reflects the fact that this package spec requires a body, so the spec cannot be compiled alone. Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Robert Dewar de...@adacore.com * sem_ch7.adb (Analyze_Package_Specification): Remove circuit for ensuring that a package spec requires a body for some other reason than that it contains the declaration of an abstract state. Index: sem_ch7.adb === --- sem_ch7.adb (revision 203755) +++ sem_ch7.adb (working copy) @@ -1493,34 +1493,6 @@ Check_One_Tagged_Type_Or_Extension_At_Most; - -- Issue an error if a package that is a library unit does not require a - -- body, and we have a non-null abstract state (SPARK LRM 7.1.5(4)). - - if not Unit_Requires_Body (Id, Ignore_Abstract_State => True) -and then Present (Abstract_States (Id)) - --- We use Scope_Depth of 1 to identify library units, which seems a --- bit ugly, but there doesn't seem to be an easier way. -and then Scope_Depth (Id) = 1 - --- A null abstract state always appears as the sole element of the --- state list. 
- -and then not Is_Null_State (Node (First_Elmt (Abstract_States (Id - then - declare -P : constant Node_Id := Get_Pragma (Id, Pragma_Abstract_State); - begin -Error_Msg_NE - (package specifies a non-null abstract state, P, Id); -Error_Msg_N - (\but package does not otherwise require a body, P); -Error_Msg_N - (\pragma Elaborate_Body is required in this case, P); - end; - end if; - -- If switch set, output information on why body required if List_Body_Required_Info
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
On 07/14/2013 09:54 AM, Chung-Lin Tang wrote: Hi, the last ping of the Nios II patches was: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html After assessing the state, we feel it would be better to post a re-submission of the newest patches. Since this hasn't attracted attention for months I'll have a go at a review. I was not involved in this project and haven't seen this code before. @@ -4196,6 +4203,7 @@ esac # version to the per-target configury. case $cpu_type in alpha | arm | avr | bfin | cris | i386 | m32c | m68k | microblaze | mips \ + | nios2 \ | pa | rs6000 | score | sparc | spu | tilegx | tilepro | xstormy16 | xtensa) insn=nop Could maybe format this nicer to fill up all but the last line. +;; These are documented for use in inline asm. +(define_register_constraint D00 D00_REG Hard register 0.) +(define_register_constraint D01 D01_REG Hard register 1.) +(define_register_constraint D02 D02_REG Hard register 2.) [...] This doesn't strike me as a good idea. To really make this work for cases where people write things like D28D13 you'd have to provide all the union classes as well which is obviously infeasible. There are alternative mechanisms to get assembly to use the registers you want, and I'll note that libgcc seems to be using those instead of these constraints. This probably also slows down the compiler, and it might be worth an experiment to see whether deleting these has any effect on register allocation. +;; We use the following constraint letters for constants +;; +;; I: -32768 to -32767 +;; J: 0 to 65535 +(define_insn movhi_internal + [(set (match_operand:HI 0 nonimmediate_operand =m, r,r, r,r) +(match_operand:HI 1 general_operand rM,m,rM,I,J))] The J alternative is unnecessary. The range 32768...65535 doesn't give a valid HImode constant, the RTL representation of integer constants always sign extends. Probably you can just use i here and in movqi_internal. 
+(define_insn movsi_internal + [(set (match_operand:SI 0 nonimmediate_operand =m, r,r, r,r,r,r,r) +(match_operand:SI 1 general_operand rM,m,rM,I,J,K,S,i))] Not required, but as an experiment you might want to reorder the alternatives and sprinkle extra rs to get optional reloads and see whether that improves code generation. You'd have (match_operand:SI 0 nonimmediate_operand =r,rm,r ,r ,r ,r ,r ,r) (match_operand:SI 1 general_operand rM,rM,rm,rI,rJ,rK,rS,ri))] with the register-register move first. It might not make a difference. +;; Split patterns for register alternative cases. +(define_split + [(set (match_operand:SI 0 register_operand ) +(sign_extend:SI (match_operand:HI 1 register_operand )))] Could merge into a define_insn_and_split? + reload_completed + [(set (match_dup 0) +(and:SI (match_dup 1) (const_int 65535))) + (set (match_dup 0) +(xor:SI (match_dup 0) (const_int 32768))) + (set (match_dup 0) +(plus:SI (match_dup 0) (const_int -32768)))] + operands[1] = gen_lowpart (SImode, operands[1]);) Is this really faster than two shifts (either as expander or splitter)? +;; Arithmetic Operations + +(define_insn addsi3 + [(set (match_operand:SI 0 register_operand =r) +(plus:SI (match_operand:SI 1 register_operand %r) + (match_operand:SI 2 arith_operand rIT)))] + + add%i2\\t%0, %1, %z2 + [(set_attr type alu)]) [...] +(define_insn mulsi3 + [(set (match_operand:SI 0 register_operand =r) +(mult:SI (match_operand:SI 1 register_operand %r) + (match_operand:SI 2 arith_operandrI)))] + TARGET_HAS_MUL + mul%i2\\t%0, %1, %z2 + [(set_attr type mul)]) There seems to be a slight mismatch here between arith_operand and the constraints. Unlike in addsi3, T (which calls nios2_unspec_reloc_p) isn't allowed here and in the other uses of arith_operand. That suggests that arith_operand should be split into two different predicates, one for addsi3 and one for all the other uses. 
+(define_expand divsi3 + [(set (match_operand:SI 0 register_operand =r) +(div:SI (match_operand:SI 1 register_operand r) +(match_operand:SI 2 register_operand r)))] + +{ + if (!TARGET_HAS_DIV) +{ + if (!TARGET_FAST_SW_DIV) +FAIL; + else +{ + if (nios2_emit_expensive_div (operands, SImode)) +DONE; +} +} +}) Shouldn't this FAIL if !nios2_emit_expensive_div (ok so that function never returns 0 but still...)? +;; Integer logical Operations + +(define_code_iterator LOGICAL [and ior xor]) +(define_code_attr logical_asm [(and and) (ior or) (xor xor)]) + +(define_insn codesi3 + [(set (match_operand:SI 0 register_operand =r,r,r) +(LOGICAL:SI (match_operand:SI 1 register_operand %r,r,r) +(match_operand:SI 2 logical_operand rM,J,K)))] + + @ +
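The equivalence the reviewer questions above (the and/xor/plus split versus two shifts for HImode sign extension) is easy to check with a small C model. This is an illustrative sketch, not the port's actual code; both helpers compute sign_extend:SI of a 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* The split expands the extension as three ALU operations:
   mask the low 16 bits, flip the sign bit, then subtract it back.  */
static int32_t
sext16_alu (uint32_t x)
{
  return (int32_t) ((x & 0xFFFF) ^ 0x8000) - 0x8000;
}

/* The alternative suggested in the review: shift left, then
   arithmetic shift right.  */
static int32_t
sext16_shift (uint32_t x)
{
  return (int32_t) (x << 16) >> 16;
}
```

The two agree on every input; which is faster on hardware depends on the relative cost of shifts and ALU operations in the pipeline, which is exactly the question raised.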
[PATCH, SH] Add support for inlined builtin-strcmp (1/2)
Hello, This patch just reorganizes the SH code used for memory builtins into its own file, in preparation of the RTL strcmp hoisting in the next part. OK for trunk ? Thanks Christian 2013-10-17 Christian Bruel christian.br...@st.com * config.gcc (sh-*): Add sh-mem.o to extra_obj. * gcc/config/sh/t-sh (sh-mem.o): New rule. * gcc/config/sh/sh-mem (expand_block_move): Moved here. * gcc/config/sh/sh.c (force_into, expand_block_move): Move to sh-mem.c Index: gcc/config/sh/sh-mem.c === --- gcc/config/sh/sh-mem.c (revision 0) +++ gcc/config/sh/sh-mem.c (working copy) @@ -0,0 +1,176 @@ +/* Helper routines for memory move and comparison insns. + Copyright (C) 2013 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +#include config.h +#include system.h +#include coretypes.h +#include tm.h +#include expr.h +#include tm_p.h + +/* Like force_operand, but guarantees that VALUE ends up in TARGET. */ +static void +force_into (rtx value, rtx target) +{ + value = force_operand (value, target); + if (! rtx_equal_p (value, target)) +emit_insn (gen_move_insn (target, value)); +} + +/* Emit code to perform a block move. Choose the best method. + + OPERANDS[0] is the destination. + OPERANDS[1] is the source. + OPERANDS[2] is the size. + OPERANDS[3] is the alignment safe to use. 
*/ +bool +expand_block_move (rtx *operands) +{ + int align = INTVAL (operands[3]); + int constp = (CONST_INT_P (operands[2])); + int bytes = (constp ? INTVAL (operands[2]) : 0); + + if (! constp) +return false; + + /* If we could use mov.l to move words and dest is word-aligned, we + can use movua.l for loads and still generate a relatively short + and efficient sequence. */ + if (TARGET_SH4A_ARCH align 4 + MEM_ALIGN (operands[0]) = 32 + can_move_by_pieces (bytes, 32)) +{ + rtx dest = copy_rtx (operands[0]); + rtx src = copy_rtx (operands[1]); + /* We could use different pseudos for each copied word, but + since movua can only load into r0, it's kind of + pointless. */ + rtx temp = gen_reg_rtx (SImode); + rtx src_addr = copy_addr_to_reg (XEXP (src, 0)); + int copied = 0; + + while (copied + 4 = bytes) + { + rtx to = adjust_address (dest, SImode, copied); + rtx from = adjust_automodify_address (src, BLKmode, + src_addr, copied); + + set_mem_size (from, 4); + emit_insn (gen_movua (temp, from)); + emit_move_insn (src_addr, plus_constant (Pmode, src_addr, 4)); + emit_move_insn (to, temp); + copied += 4; + } + + if (copied bytes) + move_by_pieces (adjust_address (dest, BLKmode, copied), + adjust_automodify_address (src, BLKmode, + src_addr, copied), + bytes - copied, align, 0); + + return true; +} + + /* If it isn't a constant number of bytes, or if it doesn't have 4 byte + alignment, or if it isn't a multiple of 4 bytes, then fail. */ + if (align 4 || (bytes % 4 != 0)) +return false; + + if (TARGET_HARD_SH4) +{ + if (bytes 12) + return false; + else if (bytes == 12) + { + rtx func_addr_rtx = gen_reg_rtx (Pmode); + rtx r4 = gen_rtx_REG (SImode, 4); + rtx r5 = gen_rtx_REG (SImode, 5); + + function_symbol (func_addr_rtx, __movmemSI12_i4, SFUNC_STATIC); + force_into (XEXP (operands[0], 0), r4); + force_into (XEXP (operands[1], 0), r5); + emit_insn (gen_block_move_real_i4 (func_addr_rtx)); + return true; + } + else if (! 
optimize_size) + { + const char *entry_name; + rtx func_addr_rtx = gen_reg_rtx (Pmode); + int dwords; + rtx r4 = gen_rtx_REG (SImode, 4); + rtx r5 = gen_rtx_REG (SImode, 5); + rtx r6 = gen_rtx_REG (SImode, 6); + + entry_name = (bytes 4 ? __movmem_i4_odd : __movmem_i4_even); + function_symbol (func_addr_rtx, entry_name, SFUNC_STATIC); + force_into (XEXP (operands[0], 0), r4); + force_into (XEXP (operands[1], 0), r5); + + dwords = bytes 3; + emit_insn (gen_move_insn (r6, GEN_INT (dwords - 1))); + emit_insn (gen_block_lump_real_i4 (func_addr_rtx)); + return true; + } + else + return false; +} + if (bytes 64) +{ + char entry[30]; + rtx func_addr_rtx = gen_reg_rtx (Pmode); + rtx r4 = gen_rtx_REG (SImode, 4); + rtx r5 = gen_rtx_REG (SImode, 5); + + sprintf (entry, __movmemSI%d, bytes); + function_symbol
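The strategy of expand_block_move's SH4A path can be modelled in portable C: copy a word at a time while at least four bytes remain, then finish byte by byte, standing in for the move_by_pieces tail. This sketches the shape of the generated code, not the expander itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Word-at-a-time copy with a byte tail, mirroring the movua.l loop
   plus the move_by_pieces fall-back in expand_block_move.  */
static void
block_move_model (unsigned char *dst, const unsigned char *src,
                  size_t bytes)
{
  size_t copied = 0;

  while (copied + 4 <= bytes)
    {
      uint32_t temp;                    /* stands in for the r0 temp  */
      memcpy (&temp, src + copied, 4);  /* movua.l: unaligned load    */
      memcpy (dst + copied, &temp, 4);  /* mov.l: word-aligned store  */
      copied += 4;
    }

  for (; copied < bytes; copied++)      /* move_by_pieces tail */
    dst[copied] = src[copied];
}
```

As in the patch, only the destination needs word alignment; the unaligned-load instruction absorbs arbitrary source alignment.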
[PATCH, SH] Add support for inlined builtin-strcmp (2/2)
Hello, This patch adds support to inline an optimized version of strcmp when not optimizing for size. The generated code makes use of the cmp/str instruction to test 4 bytes at a time when correctly aligned. Note that a new pattern was added to match the cmp/str instruction, but no attempt was made to catch it from combine. This results in general cycle improvements (against both newlib and glibc implementations), one of which is a 10% cycle improvement for a famous strcmp-biased benchmark starting with a D, but still standard. This optimization can be disabled with -fno-builtin-strcmp. No regressions on sh4 in big and little endian, and sh2 (sh3 and sh4a are still running for big and little endian for sanity). OK for trunk? Thanks, Christian 2013-10-17 Christian Bruel christian.br...@st.com * gcc/config/sh/sh-mem.c (sh4_expand_cmpstr): New function. * gcc/config/sh/sh-protos.h (sh4_expand_cmpstr): Declare. * gcc/config/sh/sh.md (cmpstrsi, cmpstr_t): New patterns. (rotlhi3_8): Rename. --- gcc/config/sh/sh.md 2013-10-17 15:14:18.0 +0200 +++ gcc-new/config/sh/sh.md 2013-10-16 16:13:49.0 +0200 @@ -31,9 +31,6 @@ ;; ??? The MAC.W and MAC.L instructions are not supported. There is no ;; way to generate them. -;; ??? The cmp/str instruction is not supported. Perhaps it can be used -;; for a str* inline function. - ;; BSR is not generated by the compiler proper, but when relaxing, it ;; generates .uses pseudo-ops that allow linker relaxation to create ;; BSR. 
This is actually implemented in bfd/{coff,elf32}-sh.c @@ -4037,7 +4034,7 @@ DONE; }) -(define_insn *rotlhi3_8 +(define_insn rotlhi3_8 [(set (match_operand:HI 0 arith_reg_dest =r) (rotate:HI (match_operand:HI 1 arith_reg_operand r) (const_int 8)))] @@ -11912,6 +11909,41 @@ jsr @%0%# [(set_attr type sfunc) (set_attr needs_delay_slot yes)]) + +;; byte compare pattern +;; temp = a ^ b; +;; !((temp 0xF000) (temp 0x0F00) (temp 0x00F0) (temp 0x000F)) +(define_insn cmpstr_t + [(set (reg:SI T_REG) + (eq:SI (and:SI + (and:SI + (and:SI + (zero_extract:SI (xor:SI (match_operand:SI 0 arith_reg_operand r) + (match_operand:SI 1 arith_reg_operand r)) + (const_int 8) (const_int 0)) + (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1)) + (const_int 8) (const_int 8))) + (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1)) + (const_int 8) (const_int 16))) + (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1)) + (const_int 8) (const_int 24))) (const_int 0)))] + TARGET_SH1 + cmp/str %0,%1 + [(set_attr type mt_group)]) + +(define_expand cmpstrsi + [(set (match_operand:SI 0 register_operand ) + (compare:SI (match_operand:BLK 1 memory_operand ) + (match_operand:BLK 2 memory_operand ))) + (use (match_operand 3 immediate_operand ))] + TARGET_SH1 + +{ + if (! optimize_insn_for_size_p () sh4_expand_cmpstr(operands)) + DONE; + else FAIL; +}) + ;; - ;; Floating point instructions. diff -ru gcc/config/sh/sh-mem.c gcc-new/config/sh/sh-mem.c --- gcc/config/sh/sh-mem.c 2013-10-17 14:59:02.0 +0200 +++ gcc-new/config/sh/sh-mem.c 2013-10-17 14:57:57.0 +0200 @@ -23,6 +23,7 @@ #include tm.h #include expr.h #include tm_p.h +#include basic-block.h /* Like force_operand, but guarantees that VALUE ends up in TARGET. */ static void @@ -174,3 +175,130 @@ return false; } + +/* Emit code to perform a strcmp. + + OPERANDS[0] is the destination. + OPERANDS[1] is the first string. + OPERANDS[2] is the second string. + OPERANDS[3] is the align. 
*/ +bool +sh4_expand_cmpstr (rtx *operands) +{ + rtx s1 = copy_rtx (operands[1]); + rtx s2 = copy_rtx (operands[2]); + rtx s1_addr = copy_addr_to_reg (XEXP (s1, 0)); + rtx s2_addr = copy_addr_to_reg (XEXP (s2, 0)); + rtx tmp0 = gen_reg_rtx (SImode); + rtx tmp1 = gen_reg_rtx (SImode); + rtx tmp2 = gen_reg_rtx (SImode); + rtx tmp3 = gen_reg_rtx (SImode); + + rtx L_return = gen_label_rtx (); + rtx L_loop_byte = gen_label_rtx (); + rtx L_end_loop_byte = gen_label_rtx (); + rtx L_loop_long = gen_label_rtx (); + rtx L_end_loop_long = gen_label_rtx (); + + rtx jump, addr1, addr2; + int prob_unlikely = REG_BR_PROB_BASE / 10; + int prob_likely = REG_BR_PROB_BASE / 4; + + emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr)); + emit_move_insn (tmp0, GEN_INT (3)); + + emit_insn (gen_tstsi_t (tmp0, tmp1)); + + emit_move_insn (tmp0, const0_rtx); + + jump = emit_jump_insn (gen_branch_false (L_loop_byte)); + add_int_reg_note (jump, REG_BR_PROB, prob_likely); + + addr1 = adjust_automodify_address (s1, SImode, s1_addr, 0); + addr2 = adjust_automodify_address (s2, SImode, s2_addr, 0); + + /* tmp2 is aligned, OK to load. */ + emit_move_insn (tmp3, addr2); + emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, 4)); + + /*start long loop. */ + emit_label (L_loop_long); + + emit_move_insn
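The cmp/str instruction compares its two register operands byte-wise and sets the T bit when they agree in at least one byte position, i.e. when their xor contains a zero byte; comparing a loaded word against zero therefore also detects a NUL terminator. A small C model of that test (illustrative only, not SH code):

```c
#include <assert.h>
#include <stdint.h>

/* Model of SH cmp/str: "true" (T set) when operands A and B are equal
   in at least one byte position, i.e. when A ^ B has a zero byte.
   The masks select the four byte fields of the xor.  */
static int
any_byte_equal (uint32_t a, uint32_t b)
{
  uint32_t t = a ^ b;
  return !((t & 0xFF000000u) && (t & 0x00FF0000u)
           && (t & 0x0000FF00u) && (t & 0x000000FFu));
}
```

An inlined strcmp can then compare a word per iteration, dropping to a byte loop as soon as the words differ or any_byte_equal (word, 0) reports a NUL byte, which is the role of cmp/str in the long loop above.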
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello, On 17 Oct 13:14, Uros Bizjak wrote: On Thu, Oct 17, 2013 at 12:47 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: I suspect gen_lowpart is bad turn when reload is completed, as far as it can create new pseudo. gen_lowpart () may call gen_reg_rtx (), which contain corresponging gcc_assert (). False. gen_lowpart is perfectly safe post-reload. Indeed, taking the subreg of a hard register should arrive x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset); in simplify_subreg. Have you encountered some specific problem with gen_lowpart? Maybe the code in the pattern is buggy? Or is it a gen_lowpart? I think that original approach with gen_rtx_REG is correct and follows established practice in sse.md (please grep for gen_reg_RTX in sse.md). If this approach is necessary due to the deficiency of gen_lowpart, then the fix to gen_lowpart should be proposed in a follow-up patch. So, I've reverted changes in mult_vect patterns and added % to constraints. I've also reverted vec_extract_* (with slight update): ... + reload_completed + [(set (match_dup 0) (match_dup 1))] +{ + if (REG_P (operands[1])) +operands[1] = gen_rtx_REG (V16HImode, REGNO (operands[1])); + else +operands[1] = adjust_address (operands[1], V16HImode, 0); +}) Bootastrapped. AVX* tests pass (including new AVX-512) Is it ok now? -- Thanks, K --- gcc/config/i386/i386.md | 5 + gcc/config/i386/predicates.md | 40 ++ gcc/config/i386/sse.md| 873 +- 3 files changed, 912 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 10ca6cb..e7e9f2d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -831,6 +831,11 @@ (define_code_attr s [(sign_extend s) (zero_extend u)]) (define_code_attr u_bool [(sign_extend false) (zero_extend true)]) +;; Used in signed and unsigned truncations. +(define_code_iterator any_truncate [ss_truncate truncate us_truncate]) +;; Instruction suffix for truncations. 
+(define_code_attr trunsuffix [(ss_truncate s) (truncate ) (us_truncate us)]) + ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix ) (unsigned_fix u)]) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 06b2914..999d8ab 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -752,6 +752,11 @@ (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 6, 7 +;; Match 8 to 9. +(define_predicate const_8_to_9_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 8, 9 + ;; Match 8 to 11. (define_predicate const_8_to_11_operand (and (match_code const_int) @@ -762,16 +767,51 @@ (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 8, 15 +;; Match 10 to 11. +(define_predicate const_10_to_11_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 10, 11 + +;; Match 12 to 13. +(define_predicate const_12_to_13_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 12, 13 + ;; Match 12 to 15. (define_predicate const_12_to_15_operand (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 12, 15 +;; Match 14 to 15. +(define_predicate const_14_to_15_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 14, 15 + +;; Match 16 to 19. +(define_predicate const_16_to_19_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 16, 19 + ;; Match 16 to 31. (define_predicate const_16_to_31_operand (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 16, 31 +;; Match 20 to 23. +(define_predicate const_20_to_23_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 20, 23 + +;; Match 24 to 27. +(define_predicate const_24_to_27_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 24, 27 + +;; Match 28 to 31. 
+(define_predicate const_28_to_31_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 28, 31 + ;; True if this is a constant appropriate for an increment or decrement. (define_predicate incdec_operand (match_code const_int) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 623e919..e21cba3 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -87,6 +87,7 @@ ;; For AVX512F support UNSPEC_VPERMI2 UNSPEC_VPERMT2 + UNSPEC_UNSIGNED_FIX_NOTRUNC UNSPEC_UNSIGNED_PCMP UNSPEC_TESTM UNSPEC_TESTNM @@ -2994,6 +2995,34 @@ (set_attr prefix maybe_vex) (set_attr mode DI)]) +(define_insn cvtusi2ssescalarmodesuffix32 + [(set (match_operand:VF_128 0 register_operand =v) + (vec_merge:VF_128 +
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider width. RTL constants are sign-extended (whether you call them then signed is up to you). They have a precision. This is how they are valid reps for wide-ints, and that doesn't change. I was saying that if we make not _all_ wide-ints sign-extended then we can use the tree rep as-is. We'd then have the wide-int rep being either zero or sign extended but not arbitrary random bits outside of the precision (same as the tree rep). OK, but does that have any practical value over leaving them arbitrary, as in Kenny's original implementation? Saying that upper bits can be either signs or zeros means that readers wouldn't be able to rely on one particular extension (so would have to do the work that they did in Kenny's implementation). At the same time it means that writers can't leave the upper bits in an arbitrary state, so would have to make sure that they are signs or zeros (presumably having a free choice of which). At least you can easily optimize for existing zero / sign extension. What I see is really bad code for the simple integer-cst predicates in tree.c. I don't mind in what way we fix it, but avoiding the copy on every tree integer constant read looks required to me. Richard.
Re: [PATCH][2/3] Re-submission of Altera Nios II port, testsuite parts
On 07/14/2013 09:54 AM, Chung-Lin Tang wrote: These are nios2 patches for the gcc testsuite. Some new testcases were added since the last posting. Index: gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c === --- gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c(revision 200946) +++ gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c(working copy) @@ -124,16 +124,17 @@ __memmove_chk (void *dst, const void *src, __SIZE_ void * memset (void *dst, int c, __SIZE_TYPE__ n) { + while (n-- != 0) +n[(char *) dst] = c; + /* Single-byte memsets should be done inline when optimisation - is enabled. */ + is enabled. Do this after the copy in case we're being called to + initialize bss. */ #ifdef __OPTIMIZE__ if (memset_disallowed inside_main n 2) abort (); #endif - while (n-- != 0) -n[(char *) dst] = c; - return dst; } I'm not sure I understand this change. Is nios2 the only target calling memset to initialize bss, and memset_disallowed is nonzero at the start of execution? Index: gcc/testsuite/gcc.target/nios2/nios2-int-types.c === --- gcc/testsuite/gcc.target/nios2/nios2-int-types.c (revision 0) +++ gcc/testsuite/gcc.target/nios2/nios2-int-types.c (revision 0) @@ -0,0 +1,34 @@ +/* Test that various types are all derived from int. */ +/* { dg-do compile { target nios2-*-* } } */ I think you can lose the { target nios2-*-* } for everything inside gcc.target/nios2. Bernd
Re: [wide-int] int_traits tree
Richard Biener rguent...@suse.de writes: What I see is really bad code for the simple integer-cst predicates in tree.c. I don't mind in what way we fix it, but avoiding the copy on every tree integer constant read looks required to me. But we got rid of the copy with yesterday's patch. Are you talking about the code after that patch or before? Richard
Re: [wide-int] int_traits tree
On 10/17/2013 09:16 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Kenneth Zadeck wrote: On 10/17/2013 08:29 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. 
In the addr_wide_int, all numbers are represented as signed numbers. There are enough bits in the internal representation so that no information is lost by representing them this way.

so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha.

it is until someone comes up with a port that this will not work for, then they will have to add some machinery to sniff the port and make this bigger. I am hoping to be retired by the time this happens.

For max_wide_int the same rules apply, just its size is larger.

Ok. So the reps are only canonical wide-int because we only ever use them with precision xprecision (maybe we should assert that).

It is now asserted for (as of a few days ago).

Btw, we are not using them directly, but every time we actually build an addr_wide_int / max_wide_int we copy them anyway:

  /* Initialize the storage from integer X, in precision N.  */
  template <int N>
  template <typename T>
  inline
  fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
  {
    /* Check for type compatibility.  We don't want to initialize a
       fixed-width integer from something like a wide_int.  */
    WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
    wide_int_ref xi (x, N);
    len = xi.len;
    for (unsigned int i = 0; i < len; ++i)
      val[i] = xi.val[i];
  }

it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary).

this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out. I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds.

What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. You just have to also sign-extend in functions like lts_p, as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged.

I am not following you here. In trees, the msb is effectively a sign bit, even for unsigned numbers, because we add that extra block. but inside of wide int, we do not add extra blocks beyond the precision. That would be messy for a lot of other reasons.

Can you elaborate? It would make tree and RTX reps directly usable, only wide-int-to-tree and wide-int-to-rtx need special
Re: [RFC] By default if-convert only basic blocks that will be vectorized
On Thu, Oct 17, 2013 at 11:26:56AM +0200, Richard Biener wrote:

On Wed, 16 Oct 2013, pins...@gmail.com wrote:

On Oct 15, 2013, at 5:32 AM, Jakub Jelinek ja...@redhat.com wrote: Especially on i?86/x86_64 the if-conversion pass seems to be often a pessimization, but the vectorization relies on it and without it we can't vectorize a lot of the loops.

I think on many other targets it actually helps. I know for one it helps on octeon even though octeon has no vector instructions. I think it helps most arm targets too.

The main issue is that it has no cost model - the only cost model being that it can successfully if-convert all conditional code in a loop, resulting in a single-BB loop. So it is clearly vectorization targeted. Its infrastructure may be useful to do a more sensible if-conversion on the GIMPLE level on scalar code. Of course even the infrastructure needs some TLC (and some better generic machinery for keeping track of and simplifying a predicate combination).

Yeah, or, if tree if-conversion is a net win for some port even when not vectorizing, supposedly such a port should, at least until the above is implemented, just enable flag_tree_loop_if_convert by default, at least at some -O* levels. Of course the question is why it would be beneficial only in inner loops and not elsewhere (other loops, or straight-line code), and when it is desirable and when not (for vectorization we of course try hard to if-convert the whole loop, because otherwise we are not able to vectorize it; but otherwise, is it beneficial just for a couple of conditionalized stmts at most, or is it fine to conditionalize say 1000 arithmetic statements?).

Jakub
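For readers unfamiliar with the transform under discussion: if-conversion replaces control flow inside a loop body with data flow (selects/conditional moves), producing the single-basic-block loop the vectorizer needs. A minimal before/after sketch:

```cpp
#include <cassert>
#include <cstddef>

// Branchy form: a conditional branch per element, so the loop body
// has multiple basic blocks and cannot be vectorized as-is.
int sum_positive_branchy (const int *a, std::size_t n)
{
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] > 0)
      s += a[i];
  return s;
}

// If-converted form: the control dependence becomes a data dependence
// (a select), leaving a single-BB loop body the vectorizer can turn
// into masked/blended vector operations.
int sum_positive_ifconverted (const int *a, std::size_t n)
{
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i] > 0 ? a[i] : 0;
  return s;
}
```

Whether the second form is also faster in *scalar* code depends on branch predictability and the target's conditional-move cost — which is exactly the missing cost model the thread complains about.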
Re: [patch] omp-low.h
On Tue, Oct 15, 2013 at 10:46:43PM -0400, Andrew MacLeod wrote:

Bootstraps on x86_64-unknown-linux-gnu and no new regressions. OK? Andrew

	* tree-flow.h (struct omp_region): Move to omp-low.c

Missing dot at the end of line.

	Remove omp_ prototypes and variables.
	* gimple.h (omp_reduction_init): Move prototype to omp-low.h.
	(copy_var_decl): Relocate prototype from tree-flow.h.
	* gimple.c (copy_var_decl): Relocate from omp-low.c.
	* tree.h: Move prototype to omp-low.h.
	* omp-low.h: New File. Relocate prototypes here.

Missing space after ".".

	* omp-low.c (struct omp_region): Make local here.
	(root_omp_region): Make static.
	(copy_var_decl) Move to gimple.c.
	(new_omp_region): Make static.
	(make_gimple_omp_edges): New.  Refactored from tree-cfg.c make_edges.
	* tree-cfg.c: Include omp-low.h.
	(make_edges): Factor out OMP specific bits to make_gimple_omp_edges.
	* gimplify.c: Include omp-low.h.
	* tree-parloops.c: Include omp-low.h.

Use "Likewise." for the second.

	c
	* c-parser.c: Include omp-low.h.
	* c-typeck.c: Include omp-low.h.

Likewise.

	cp
	* parser.c: Include omp-low.h.
	* semantics.c: Include omp-low.h.

Likewise.

	fortran
	* trans-openmp.c: Include omp-low.h.

*************** struct omp_for_data
*** 135,141 ****
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);
--- 175,181 ----
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! static struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);

Why?

--- 912,917 ----
*************** debug_all_omp_regions (void)
*** 1219,1225 ****
  /* Create a new parallel region starting at STMT inside region PARENT.  */

! struct omp_region *
  new_omp_region (basic_block bb, enum gimple_code type,
		  struct omp_region *parent)
  {
--- 1238,1244 ----
  /* Create a new parallel region starting at STMT inside region PARENT.  */

! static struct omp_region *
  new_omp_region (basic_block bb, enum gimple_code type,
		  struct omp_region *parent)
  {

Likewise.

+	case GIMPLE_OMP_CONTINUE:
...
+
+	default:
+	  gcc_unreachable ();

Bad indentation here, default: should be indented just by 4 spaces and gcc_unreachable () by 6.

Otherwise LGTM.

	Jakub
Re: [SKETCH] Refactor implicit function template implementation and fix 58534, 58536, 58548, 58549 and 58637.
On 10/15/2013 05:21 PM, Adam Butcher wrote:

On Wed, 25 Sep 2013 11:01:26 -0500, Jason Merrill wrote: 1) Build up the type as normal and use tsubst to replace the non-pack template parameter with a pack if needed.

The problem I've hit with this (and other hacks I've tried that involve finishing the type first and rewriting afterward) is that, with parm types such as pair<auto, auto...>, the specialization pair<auto, auto> is indexed in pt.c by hash_specialization into type_specializations before the '...' is seen.

I'm not sure why that's a problem; we just produce a new specialization that uses different template arguments in tsubst. Specifically, the pack 'auto' needs to be a different TEMPLATE_TYPE_PARM from the non-pack 'auto' produced during parsing.

Jason
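The feature being implemented is the implicit function template: each auto in a generic lambda's parameter list becomes an invented template parameter of the closure's call operator, and an auto... must become a parameter *pack* distinct from any non-pack auto. A small sketch of the user-visible behavior (C++14 generic lambdas assumed):

```cpp
#include <cassert>
#include <cstddef>

// 'first' and the pack 'rest' are both invented template parameters
// of the closure's operator(); the pack must be a distinct
// TEMPLATE_TYPE_PARM from the non-pack parameter -- the distinction
// the thread above is about.
auto count_args = [] (auto first, auto... rest)
{
  (void) first;                  // silence unused-parameter warnings
  return 1 + sizeof...(rest);    // one for 'first' plus the pack size
};
```

So count_args (1, 2.5, 'c') instantiates operator() with one non-pack parameter and a two-element pack, giving 3.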
Go patch committed: Don't deref unknown type when importing anon field
This patch fixes the Go frontend to not dereference an unknown type when importing an anonymous field. This fixes a bug in a recent patch I committed. I have a test case ready to commit to the master repository after the Go 1.2 release is made. This patch bootstrapped and ran the Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch.

Ian

diff -r 742104f0d4c7 go/types.cc
--- a/go/types.cc	Wed Oct 16 06:37:07 2013 -0700
+++ b/go/types.cc	Thu Oct 17 08:39:50 2013 -0700
@@ -5263,11 +5263,25 @@
 	      // that an embedded builtin type is accessible from another
 	      // package (we know that all the builtin types are not
 	      // exported).
-	      if (name.empty() && ftype->deref()->named_type() != NULL)
+	      // This is called during parsing, before anything is
+	      // lowered, so we have to be careful to avoid dereferencing
+	      // an unknown type name.
+	      if (name.empty())
 		{
-		  const std::string& fn(ftype->deref()->named_type()->name());
-		  if (fn[0] >= 'a' && fn[0] <= 'z')
-		    name = '.' + imp->package()->pkgpath() + '.' + fn;
+		  Type* t = ftype;
+		  if (t->classification() == Type::TYPE_POINTER)
+		    {
+		      // Very ugly.
+		      Pointer_type* ptype = static_cast<Pointer_type*>(t);
+		      t = ptype->points_to();
+		    }
+		  std::string tname;
+		  if (t->forward_declaration_type() != NULL)
+		    tname = t->forward_declaration_type()->name();
+		  else if (t->named_type() != NULL)
+		    tname = t->named_type()->name();
+		  if (!tname.empty() && tname[0] >= 'a' && tname[0] <= 'z')
+		    name = '.' + imp->package()->pkgpath() + '.' + tname;
 		}

 	      Struct_field sf(Typed_identifier(name, ftype, imp->location()));
Re: [patch] omp-low.h
On Thu, Oct 17, 2013 at 11:52:21AM -0400, Andrew MacLeod wrote:

On 10/17/2013 11:15 AM, Jakub Jelinek wrote:

*************** struct omp_for_data
*** 135,141 ****
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);
--- 175,181 ----
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! static struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);

Why?

It should be static now since it is no longer exported outside the file... and can't be now that struct omp_region is declared in omp-low.c

Ah, just misread your change, thought you were adding the struct keyword while you are actually adding static. Sorry.

	Jakub
Merge from 4.8 branch to gccgo branch
I've merged revision 203772 from the GCC 4.8 branch to the gccgo branch. Ian
Re: [patch] omp-low.h
On 10/17/2013 11:15 AM, Jakub Jelinek wrote:

*************** struct omp_for_data
*** 135,141 ****
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);
--- 175,181 ----
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! static struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);

Why?

It should be static now since it is no longer exported outside the file... and can't be now that struct omp_region is declared in omp-low.c

--- 912,917 ----
*************** debug_all_omp_regions (void)
*** 1219,1225 ****
  /* Create a new parallel region starting at STMT inside region PARENT.  */

! struct omp_region *
  new_omp_region (basic_block bb, enum gimple_code type,
		  struct omp_region *parent)
  {
--- 1238,1244 ----
  /* Create a new parallel region starting at STMT inside region PARENT.  */

! static struct omp_region *
  new_omp_region (basic_block bb, enum gimple_code type,
		  struct omp_region *parent)
  {

Likewise.

again, because it is no longer an exported function... only tree-cfg.c::make_edges needed it, and it no longer does.

Thanks.

Andrew
Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2
where in the C standard did you read the requirement that every bit field must be complete? (This is a serious question).

The spec doesn't say each field must be complete, but neither does it say that the structure must be as big as the type used. If you specify int foo:1 then the compiler is allowed to make the struct smaller than int.

  extern struct { volatile unsigned int b : 1; } bf3;

on my compiler this structure occupies 4 bytes, and it is aligned at 4 bytes. That is OK for me and the AAPCS.

On my target, the structure occupies 1 byte. I suspect gcc is rounding up to WORD_MODE, which is 4 bytes for you and 1 for me. If so, you're relying on a coincidence, not a standard.

But the access bf3.b=1 is SImode with Sandra's patch (part 1, which just obeys the AAPCS and does nothing else) and QImode without this patch, which is just a BUG. I am quite surprised how your target manages to avoid it?

It doesn't, but I wouldn't expect it to, because IMHO that test is invalid - you have not given a precise enough test to expect consistent results.

It is as Sandra said: at least on ARM, -fstrict-volatile-bitfields does not function at all. And the C++11 memory model wins all the time.

Are we talking about N2429? I read through the changes and it didn't preclude honoring the user's types. Looking through the tests, most of them combine packed with mismatched types. IMHO, those tests are invalid.

I don't think so. They are simply packed. And volatile just says that the value may be changed by a different thread. It has a great impact on loop optimizations.

Nothing you've said so far precludes honoring the user's types when the user gives you a consistent structure.

Consider:

  typedef struct {
    unsigned char b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
  } Bits;

  extern volatile struct {
    Bits reg1; /* read status without resetting them */
    Bits reg2; /* read status and atomically reset them */
  } interrupt_handler;

Without -fstrict-volatile-bitfields, gcc is allowed to use a 16-bit access to read reg1, but this causes an unexpected read of volatile reg2, which may have unintended consequences. The spec doesn't say the compiler *must* read reg2, it just *doesn't* say that it *can't*. The -fstrict-volatile-bitfields flag tells the compiler that it *can't*. IMHO this doesn't violate the spec, it just limits what the compiler can do within the spec.

If ARM wants fast word-sized volatile bitfields, use int and structure your bitfields so that no int field overlaps a natural int boundary. When I added the option, I considered making it an error() to define a strict volatile bitfield that spanned the allowed boundary of the type specified, but I figured that would be too much.

I've not objected to fixing -fstrict-volatile-bitfields, or making the -fno-strict-volatile-bitfields case match the standard. I've only objected to breaking my targets by making that flag not the default.

Fine. Why can't we just get this fixed?

Dunno. I'm only opposing the patch that disabled it on my targets.
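The layout claim the example relies on can be checked directly. A small sketch follows; note the sizes are ABI-dependent, not guaranteed by the C or C++ standard, though common ABIs (AAPCS, x86-64 psABI) give the values asserted here:

```cpp
#include <cstddef>

// Layout sketch for the interrupt_handler example above.  With
// 'unsigned char' 1-bit fields, the whole register fits in one byte,
// so reg1 and reg2 are adjacent bytes and a byte-wide
// (-fstrict-volatile-bitfields) access to reg1 can never touch reg2.
struct Bits {
  unsigned char b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

struct Regs {
  Bits reg1;  // read status without resetting them
  Bits reg2;  // read status and atomically reset them
};

// On common ABIs: one byte for eight 1-bit unsigned char fields,
// two bytes for the pair.  These values are ABI assumptions.
constexpr std::size_t bits_size = sizeof (Bits);
constexpr std::size_t regs_size = sizeof (Regs);
```

Had the fields been declared unsigned int instead, sizeof (Bits) would typically grow to 4 and the "wider access spills into the neighbor" hazard disappears — which is the "consistent structure" point made above.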
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Thu, Oct 17, 2013 at 12:35 AM, Marek Polacek pola...@redhat.com wrote:

On Wed, Oct 16, 2013 at 04:25:58PM -0700, Wei Mi wrote:

+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH)
+    return true;
+  else
+    return false;
+}

That looks weird, why not just

static bool
ix86_macro_fusion_p (void)
{
  return TARGET_FUSE_CMP_AND_BRANCH;
}

?

	Marek

Thanks, fixed.

Wei Mi.
Re: [RESEND] Enable building of libatomic on AArch64
On 17 October 2013 10:43, Michael Hudson-Doyle michael.hud...@linaro.org wrote: Resending as the previous attempt went missing... 2013-10-04 Michael Hudson-Doyle michael.hud...@linaro.org * libatomic/configure.tgt (aarch64*): Remove code preventing build. * gcc/testsuite/lib/target-supports.exp (check_effective_target_sync_long_long): AArch64 supports atomic operations on long long. (check_effective_target_sync_long_long_runtime): AArch64 can execute atomic operations on long long. OK, and committed. /Marcus
[C++ Patch] PR 58596
Hi, in this ICE on valid, 4.8/4.9 Regression, the problem is that, for a lambda in an NSDMI context, the early return of lambda_expr_this_capture:

  /* In unevaluated context this isn't an odr-use, so just return the
     nearest 'this'.  */
  if (cp_unevaluated_operand)
    return lookup_name (this_identifier);

fails to find a this_identifier. This is expected, in an NSDMI context, as also clarified by the comment a few lines below. Thus I'm handling the issue the same way, that is by way of scope_chain->x_current_class_ptr.

Tested x86_64-linux.

Thanks! Paolo.

/cp
2013-10-17  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/58596
	* lambda.c (lambda_expr_this_capture): Handle NSDMIs in the
	cp_unevaluated_operand case.

/testsuite
2013-10-17  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/58596
	* g++.dg/cpp0x/lambda/lambda-nsdmi5.C: New.

Index: cp/lambda.c
===================================================================
--- cp/lambda.c	(revision 203744)
+++ cp/lambda.c	(working copy)
@@ -628,7 +628,14 @@ lambda_expr_this_capture (tree lambda)
   /* In unevaluated context this isn't an odr-use, so just return the
      nearest 'this'.  */
   if (cp_unevaluated_operand)
-    return lookup_name (this_identifier);
+    {
+      /* In an NSDMI the fake 'this' pointer that we're using for
+	 parsing is in scope_chain.  */
+      if (LAMBDA_EXPR_EXTRA_SCOPE (lambda)
+	  && TREE_CODE (LAMBDA_EXPR_EXTRA_SCOPE (lambda)) == FIELD_DECL)
+	return scope_chain->x_current_class_ptr;
+      return lookup_name (this_identifier);
+    }

   /* Try to default capture 'this' if we can.  */
   if (!this_capture

Index: testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi5.C
===================================================================
--- testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi5.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi5.C	(working copy)
@@ -0,0 +1,7 @@
+// PR c++/58596
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  int i = [] { return decltype(i)(); }();
+};
Re: [PATCH][i386]Fix PR 57756
On Thu, 2013-10-17 at 06:03 -0700, Diego Novillo wrote: On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote: How is Google going to change its patch commit policies to ensure that this does not happen again? There is nothing to change. Google follows http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed the oversight and if there is any other fallout from his patch, he will address it. I don't see anything out of the ordinary here. Diego. FYI: I just want to make sure everyone is aware that there are still broken targets. rs6000 may be fixed but mips still doesn't work and I presume other platforms like sparc are also still broken. /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c: In function 'void cpp_atomic_builtins(cpp_reader*)': /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:594:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:606:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:618:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:630:7: error: 'target_flags_explicit' was not declared in this scope make[1]: *** [c-family/c-cppbuiltin.o] Error 1 Sriraman, are you looking at how to fix this globally or are the target maintainers expected to fix this? Currently I am using this one line patch to get things building, but I presume this is not what we want long term. 
% git diff opth-gen.awk
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index 01c5ab6..46bd570 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -114,6 +114,7 @@
 print "};"
 print "extern struct gcc_options global_options;"
 print "extern const struct gcc_options global_options_init;"
 print "extern struct gcc_options global_options_set;"
+print "#define target_flags_explicit global_options_set.x_target_flags"
 print "#endif"
 print "#endif"
 print ""

Steve Ellcey
sell...@mips.com
[AArch64] Fix types for vcvtsd_n intrinsics.
Hi,

I spotted that the types of arguments to these intrinsics are wrong, which results in all sorts of fun issues! Fixed thusly, regression tested with aarch64.exp on aarch64-none-elf with no issues.

OK?

Thanks, James

---
2013-10-17  James Greenhalgh  james.greenha...@arm.com

	* config/aarch64/arm_neon.h (vcvtds_n_fsu32,64_fsu32,64):
	Correct argument types.

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index f7c9db6..55aa742 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -5442,7 +5442,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      int64_t a_ = (a);					\
-     int64_t result;					\
+     float64_t result;					\
     __asm__ ("scvtf %d0,%d1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5454,7 +5454,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      uint64_t a_ = (a);				\
-     uint64_t result;					\
+     float64_t result;					\
     __asm__ ("ucvtf %d0,%d1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5466,7 +5466,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      float64_t a_ = (a);				\
-     float64_t result;					\
+     int64_t result;					\
     __asm__ ("fcvtzs %d0,%d1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5478,7 +5478,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      float64_t a_ = (a);				\
-     float64_t result;					\
+     uint64_t result;					\
     __asm__ ("fcvtzu %d0,%d1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5586,7 +5586,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      int32_t a_ = (a);					\
-     int32_t result;					\
+     float32_t result;					\
     __asm__ ("scvtf %s0,%s1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5598,7 +5598,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      uint32_t a_ = (a);				\
-     uint32_t result;					\
+     float32_t result;					\
     __asm__ ("ucvtf %s0,%s1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5610,7 +5610,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      float32_t a_ = (a);				\
-     float32_t result;					\
+     int32_t result;					\
     __asm__ ("fcvtzs %s0,%s1,%2"
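The "all sorts of fun issues" come from the statement-expression macros returning a temporary of the wrong type. The following is an illustration of the bug class only (it uses memcpy as a stand-in for the asm writing an FP bit pattern; it is not the intrinsic itself):

```cpp
#include <cstdint>
#include <cstring>

// When the temporary carrying a floating-point bit pattern is declared
// with an integer type, returning it performs a numeric int->double
// conversion instead of preserving the bits.  That is why each vcvt*_n_*
// macro must declare 'result' with the type the instruction produces.
double take_bits_wrongly (double d)
{
  int64_t result;                        // wrong temporary type
  std::memcpy (&result, &d, sizeof d);   // stand-in for the asm's FP write
  return result;                         // value conversion: garbage
}

double take_bits_correctly (double d)
{
  double result;                         // matching temporary type
  std::memcpy (&result, &d, sizeof d);
  return result;                         // bit pattern preserved
}
```

For 1.0, the wrong version returns roughly 4.6e18 (the numeric value of 0x3FF0000000000000) rather than 1.0 — the same shape of corruption the mismatched vcvt temporaries caused.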
Re: [PATCH][i386]Fix PR 57756
On Thu, Oct 17, 2013 at 9:28 AM, Steve Ellcey sell...@mips.com wrote: On Thu, 2013-10-17 at 06:03 -0700, Diego Novillo wrote: On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote: How is Google going to change its patch commit policies to ensure that this does not happen again? There is nothing to change. Google follows http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed the oversight and if there is any other fallout from his patch, he will address it. I don't see anything out of the ordinary here. Diego. FYI: I just want to make sure everyone is aware that there are still broken targets. rs6000 may be fixed but mips still doesn't work and I presume other platforms like sparc are also still broken. /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c: In function 'void cpp_atomic_builtins(cpp_reader*)': /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:594:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:606:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:618:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:630:7: error: 'target_flags_explicit' was not declared in this scope make[1]: *** [c-family/c-cppbuiltin.o] Error 1 Sriraman, are you looking at how to fix this globally or are the target maintainers expected to fix this? Currently I am using this one line patch to get things building, but I presume this is not what we want long term. Yes, I am on it. I will get back asap. 
Sri

% git diff opth-gen.awk
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index 01c5ab6..46bd570 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -114,6 +114,7 @@
 print "};"
 print "extern struct gcc_options global_options;"
 print "extern const struct gcc_options global_options_init;"
 print "extern struct gcc_options global_options_set;"
+print "#define target_flags_explicit global_options_set.x_target_flags"
 print "#endif"
 print "#endif"
 print ""

Steve Ellcey
sell...@mips.com
Re: [C++ Patch] PR 58596
OK. Jason
[AArch64,PATCH] Adjust preferred_reload_class of SP+C
Hi,

This patch addresses an issue in reload triggered by the gfortran.dg/loc_2.f90 regression test at -O3 with LRA disabled. The patch is based on work done by Ian Bolton here at ARM which I've dusted down and submitted.

Following SFP elimination and under heavy register pressure, reload attempts the reload of SP+offset into a V register. The AArch64 instruction set does not support such an operation. We considered two solutions to this issue:

1) Detect the SP+offset pattern in secondary reload and use an intermediate X register.

2) Detect the SP+offset->V pattern in preferred_reload_class and return NO_REGS before secondary reload gets involved.

The latter looks like a simpler and more intuitive solution to me than the first. I also note that the i386 backend implementation of preferred_reload_class contains equivalent code. I intend to leave this patch on the list for a few days before committing to give folks knowledgeable about reload and the associated target hooks the opportunity to comment.

Thanks
/Marcus

2013-10-17  Ian Bolton  ian.bol...@arm.com
	    Marcus Shawcroft  marcus.shawcr...@arm.com

	* config/aarch64/aarch64.c (aarch64_preferred_reload_class):
	Special case reloads of SP+C into non GENERAL_REGS.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7fce7a0..cc9ecdd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4237,6 +4237,24 @@ aarch64_preferred_reload_class (rtx x, reg_class_t regclass)
       && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
     return NO_REGS;

+  /* Register elimination can result in a request for
+     SP+constant->FP_REGS.  We cannot support such operations which
+     use SP as source and an FP_REG as destination, so reject out
+     right now.  */
+  if (! reg_class_subset_p (regclass, GENERAL_REGS) && GET_CODE (x) == PLUS)
+    {
+      rtx lhs = XEXP (x, 0);
+
+      /* Look through a possible SUBREG introduced by ILP32.  */
+      if (GET_CODE (lhs) == SUBREG)
+	lhs = SUBREG_REG (lhs);
+
+      gcc_assert (REG_P (lhs));
+      gcc_assert (reg_class_subset_p (REGNO_REG_CLASS (REGNO (lhs)),
+				      POINTER_REGS));
+      return NO_REGS;
+    }
+
   return regclass;
 }
Re: [PATCH][AArch64] Implement %c output template
On 17 October 2013 12:13, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: [gcc/] 2013-10-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (aarch64_print_operand): Handle 'c'. [gcc/testsuite] 2013-10-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/c-output-template.c: New testcase. * gcc.target/aarch64/c-output-template-2.c: Likewise. * gcc.target/aarch64/c-output-template-3.c: Likewise. OK /Marcus
Re: [AArch64] Fix types for vcvtsd_n intrinsics.
On 17 October 2013 17:27, James Greenhalgh james.greenha...@arm.com wrote: Hi, I spotted that the types of arguments to these intrinsics are wrong, which results in all sorts of fun issues! Fixed thusly, regression tested with aarch64.exp on aarch64-none-elf with no issues. OK? Thanks, James --- 2013-10-17 James Greenhalgh james.greenha...@arm.com * config/aarch64/arm_neon.h (vcvtds_n_fsu32,64_fsu32,64): Correct argument types. OK /Marcus