Re: ARM: VFPv3-D16 vs. VFPv3-D32

2013-10-17 Thread Joey Ye
Which Cortex-R are you targeting that supports both D16 and D32?

Thanks,
Joey

On Thu, Oct 17, 2013 at 3:13 PM, Sebastian Huber
sebastian.hu...@embedded-brains.de wrote:
 Hello,

 it seems that it is not possible to deduce from the GCC built-in defines whether
 we compile for the VFPv3-D16 or the VFPv3-D32 floating-point unit.

 touch empty.c

 arm-eabi-gcc -march=armv7-r -mfpu=vfpv3-d16 -mfloat-abi=hard -E -P -v -dD empty.c > vfpv3-d16.txt

 arm-eabi-gcc -march=armv7-r -mfpu=vfpv3 -mfloat-abi=hard -E -P -v -dD empty.c > vfpv3-d32.txt

 diff vfpv3-d16.txt vfpv3-d32.txt

 Is it possible to add a built-in define for this?  Or, as an alternative, is
 it possible to use a target-specific multi-lib define in the GCC configuration
 that indicates it?

 I want to use such a compiler-provided define to determine what the context
 switch support in an operating system should look like.

 --
 Sebastian Huber, embedded brains GmbH

 Address : Dornierstr. 4, D-82178 Puchheim, Germany
 Phone   : +49 89 189 47 41-16
 Fax : +49 89 189 47 41-09
 E-Mail  : sebastian.hu...@embedded-brains.de
 PGP : Public key available on request.



Re: ARM: VFPv3-D16 vs. VFPv3-D32

2013-10-17 Thread Sebastian Huber

On 2013-10-17 09:28, Joey Ye wrote:

Which Cortex-R are you targeting that supports both D16 and D32?


I have a Cortex-R variant which supports D16 only, and now I want to add a
multi-lib to our GCC target configuration and use a compiler built-in to adjust
the context-switch code for this multi-lib.


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.



Re: ARM: VFPv3-D16 vs. VFPv3-D32

2013-10-17 Thread Joey Ye
There is no macro to indicate VFP variants. You can probably check the
CPU variant instead. As far as I know, Cortex-R only supports D16.

Joey

On Thu, Oct 17, 2013 at 3:47 PM, Sebastian Huber
sebastian.hu...@embedded-brains.de wrote:
 On 2013-10-17 09:28, Joey Ye wrote:

  Which Cortex-R are you targeting that supports both D16 and D32?


 I have a Cortex-R variant which supports D16 only, and now I want to add a
 multi-lib to our GCC target configuration and use a compiler built-in to
 adjust the context-switch code for this multi-lib.


 --
 Sebastian Huber, embedded brains GmbH

 Address : Dornierstr. 4, D-82178 Puchheim, Germany
 Phone   : +49 89 189 47 41-16
 Fax : +49 89 189 47 41-09
 E-Mail  : sebastian.hu...@embedded-brains.de
 PGP : Public key available on request.



Re: Compilation flags in libgfortran

2013-10-17 Thread Richard Biener
On Wed, Oct 16, 2013 at 12:22 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:
 On 16/10/13 10:37, pins...@gmail.com wrote:

 On Oct 15, 2013, at 6:58 AM, Igor Zamyatin izamya...@gmail.com wrote:
 Hi All!

 Is there any particular reason that matmul* modules from libgfortran
 are compiled with -O2 -ftree-vectorize?

 I see some regressions on Atom processor after r202980
 (http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00846.html)

 Why not just use -O3 for those modules?

 -O3 and -O2 -ftree-vectorize won't give much of a performance difference.  What
 you are seeing is that the cost model needs improvement, at least for Atom.

 Hi all,
 I think http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01908.html introduced
 the new cheap vectoriser cost model that favors compilation time over
 runtime performance and is set as the default for -O2. -O3 uses the dynamic
 model, which potentially gives better runtime performance in exchange for
 longer compile times (if I understand the new rules correctly).
 Therefore, I'd expect -O3 to give better vector performance than -O2...

But this suggests compiling with -O2 -ftree-vectorize
-fvect-cost-model=dynamic, not building with -O3.

Richard.

 Kyrill




Re: ARM: VFPv3-D16 vs. VFPv3-D32

2013-10-17 Thread Richard Earnshaw
On 17/10/13 08:56, Joey Ye wrote:
 There is no macro to indicate VFP variants. You can probably check the
 CPU variant instead. As far as I know, Cortex-R only supports D16.
 

Checking __ARM_ARCH_PROFILE == 'R' would tell you that it's R profile
and therefore only 16 regs.
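
For instance, an OS could key its context-switch layout off that predefine
(a minimal sketch; OS_VFP_D_REGS is a made-up name, and the character-constant
value of __ARM_ARCH_PROFILE follows GCC's ACLE-style definition):

  /* How many VFP D registers the context-switch code must save.  */
  #if defined(__ARM_ARCH_PROFILE) && __ARM_ARCH_PROFILE == 'R'
  # define OS_VFP_D_REGS 16   /* R profile: at most VFPv3-D16.  */
  #else
  # define OS_VFP_D_REGS 32   /* Conservatively save all 32 D registers.  */
  #endif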

R.

 Joey
 
 On Thu, Oct 17, 2013 at 3:47 PM, Sebastian Huber
 sebastian.hu...@embedded-brains.de wrote:
 On 2013-10-17 09:28, Joey Ye wrote:

  Which Cortex-R are you targeting that supports both D16 and D32?


 I have a Cortex-R variant which supports D16 only, and now I want to add a
 multi-lib to our GCC target configuration and use a compiler built-in to
 adjust the context-switch code for this multi-lib.


 --
 Sebastian Huber, embedded brains GmbH

 Address : Dornierstr. 4, D-82178 Puchheim, Germany
 Phone   : +49 89 189 47 41-16
 Fax : +49 89 189 47 41-09
 E-Mail  : sebastian.hu...@embedded-brains.de
 PGP : Public key available on request.

 
 
 




Re: ARM: VFPv3-D16 vs. VFPv3-D32

2013-10-17 Thread Sebastian Huber

On 2013-10-17 14:24, Richard Earnshaw wrote:

On 17/10/13 08:56, Joey Ye wrote:

There is no macro to indicate VFP variants. You can probably check the
CPU variant instead. As far as I know, Cortex-R only supports D16.


Checking __ARM_ARCH_PROFILE == 'R' would tell you that it's R profile
and therefore only 16 regs.


Thanks for this information.  I was not aware that no existing Cortex-R variant 
has a D32 VFP unit.


Is the converse true for Cortex-A variants or are there Cortex-A cores with 
VFP-D16 and VFP-D32 units?


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.



Re: ARM: VFPv3-D16 vs. VFPv3-D32

2013-10-17 Thread Richard Earnshaw
On 17/10/13 13:46, Sebastian Huber wrote:
 On 2013-10-17 14:24, Richard Earnshaw wrote:
 On 17/10/13 08:56, Joey Ye wrote:
 There is no macro to indicate VFP variants. You can probably check the
 CPU variant instead. As far as I know, Cortex-R only supports D16.

 Checking __ARM_ARCH_PROFILE == 'R' would tell you that it's R profile
 and therefore only 16 regs.
 
 Thanks for this information.  I was not aware that no existing Cortex-R 
 variant 
 has a D32 VFP unit.
 
 Is the converse true for Cortex-A variants or are there Cortex-A cores with 
 VFP-D16 and VFP-D32 units?
 

No, the converse is not true.  In general (though I don't think there's
a guarantee that this will be the case) if you don't have Neon you won't
have D32.  However, if you have Neon you must have D32.  Not all
Cortex-A cores have Neon, though many do these days.
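
Putting the two observations in this thread together, a context-switch layer
could refine the earlier profile check with the Neon predefine (sketch only;
OS_VFP_D_REGS is a made-up name, and "Neon implies D32" is, as noted above,
common practice rather than an architectural guarantee):

  #if defined(__ARM_NEON__)
  # define OS_VFP_D_REGS 32   /* Neon present, so a D32 register file.  */
  #elif defined(__ARM_ARCH_PROFILE) && __ARM_ARCH_PROFILE == 'R'
  # define OS_VFP_D_REGS 16   /* R profile: D16 only.  */
  #else
  # define OS_VFP_D_REGS 32   /* Unknown, assume the larger file.  */
  #endif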

R.



Re: [RFC] By default if-convert only basic blocks that will be vectorized

2013-10-17 Thread Sergey Ostanevich
Jakub, Richard,

I believe this patch is a good opportunity to improve the
vectorization capabilities.
I have a question related to it: do we plan to treat
#pragma omp simd as a
directive to vectorize the underlying loop, hence dropping any
assessment regarding profitability?


Regards,
Sergos

On Tue, Oct 15, 2013 at 4:32 PM, Jakub Jelinek ja...@redhat.com wrote:
 Hi!

 Especially on i?86/x86_64, the if-conversion pass often seems to be
 a pessimization, but vectorization relies on it and without it we can't
 vectorize a lot of the loops.

 Here is a prototype of a patch that will by default (unless explicit
 -ftree-loop-if-convert) only if-convert loops internally for vectorization,
 so the COND_EXPRs actually only appear as VEC_COND_EXPRs in the vectorized
 basic blocks, but will not appear if vectorization fails, or in the
 scalar loop if vectorization is conditional, or in the prologue or epilogue
 loops around the vectorized loop.

 Instead of moving the ifcvt pass inside of the vectorizer, this patch
 during ifcvt performs loop versioning depending on a special internal
 call, only if the internal call returns true we go to the if-converted
 original loop, otherwise the non-if-converted copy of the original loop
 is performed.  And the vectorizer is taught to fold this internal call
 into true resp. false depending on if the loop was vectorized or not, and
 vectorizer loop versioning, peeling for alignment and for bound are adjusted
 to also copy from the non-if-converted loop rather than if-converted one.
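
 A C-level sketch of the versioning scheme described above (this is not
 GIMPLE; loop_vectorized_placeholder is a made-up stand-in for the
 LOOP_VECTORIZED internal call, which the vectorizer later folds to true
 or false):

   extern int loop_vectorized_placeholder (void);

   void f (float *a, const float *b, int n)
   {
     if (loop_vectorized_placeholder ())       /* folded to 1 if vectorized */
       for (int i = 0; i < n; i++)             /* if-converted copy ...  */
         a[i] = a[i] > 0.f ? a[i] + b[i] : a[i];
     else
       for (int i = 0; i < n; i++)             /* ... untouched scalar copy */
         if (a[i] > 0.f)
           a[i] += b[i];
   }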

 Besides fixing the various PRs where if-conversion pessimizes code I'd like
 to also move forward with this with conditional loads and stores,
 http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html
 where the if-unconversion approach looked like a failure.

 This patch doesn't yet handle if-converted inner loop in outer loop
 vectorization, something on my todo list (so several vect-cond-*.c tests
 FAIL because they are no longer vectorized) plus I had to change two
 SLP vectorization tests that silently relied on loop if-conversion being
 performed to actually optimize the basic block (if the same thing didn't
 appear in a loop, it wouldn't be optimized at all).

 On the newly added testcase on x86_64, there are before this patch
 18 scalar conditional moves, with the patch just 2 (both in the checking
 routine).

 Comments?

 --- gcc/internal-fn.def.jj  2013-10-11 14:32:57.079909782 +0200
 +++ gcc/internal-fn.def 2013-10-11 17:23:58.705526840 +0200
 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST
  DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
  DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
  DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
 +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
 --- gcc/tree-vect-loop-manip.c.jj   2013-09-30 22:13:47.000000000 +0200
 +++ gcc/tree-vect-loop-manip.c  2013-10-15 12:57:54.854970913 +0200
 @@ -374,24 +374,31 @@ LOOP->  loop1

  static void
  slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop,
 +                                    struct loop *scalar_loop,
                                      bool is_new_loop, basic_block *new_exit_bb)
  {
 -  gimple orig_phi, new_phi;
 +  gimple orig_phi, new_phi, scalar_phi = NULL;
    gimple update_phi, update_phi2;
    tree guard_arg, loop_arg;
    basic_block new_merge_bb = guard_edge->dest;
    edge e = EDGE_SUCC (new_merge_bb, 0);
    basic_block update_bb = e->dest;
    basic_block orig_bb = loop->header;
 -  edge new_exit_e;
 +  edge new_exit_e, scalar_e = NULL;
    tree current_new_name;
 -  gimple_stmt_iterator gsi_orig, gsi_update;
 +  gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none ();

    /* Create new bb between loop and new_merge_bb.  */
    *new_exit_bb = split_edge (single_exit (loop));

    new_exit_e = EDGE_SUCC (*new_exit_bb, 0);

 +  if (scalar_loop != NULL && !is_new_loop)
 +    {
 +      gsi_scalar = gsi_start_phis (scalar_loop->header);
 +      scalar_e = EDGE_SUCC (scalar_loop->latch, 0);
 +    }
 +
    for (gsi_orig = gsi_start_phis (orig_bb),
         gsi_update = gsi_start_phis (update_bb);
        !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
 @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge
        tree new_res;
        orig_phi = gsi_stmt (gsi_orig);
        update_phi = gsi_stmt (gsi_update);
 +      if (scalar_e != NULL)
 +        {
 +          scalar_phi = gsi_stmt (gsi_scalar);
 +          gsi_next (&gsi_scalar);
 +        }

        /** 1. Handle new-merge-point phis  **/

 @@ -460,7 +472,13 @@ slpeel_update_phi_nodes_for_guard1 (edge
          current_new_name = loop_arg;
        else
          {
 -          current_new_name = get_current_def (loop_arg);
 +          if (scalar_e)
 +            {
 +              current_new_name = PHI_ARG_DEF_FROM_EDGE (scalar_phi, scalar_e);
 +              current_new_name =

Re: [RFC] By default if-convert only basic blocks that will be vectorized

2013-10-17 Thread pinskia

 On Oct 15, 2013, at 5:32 AM, Jakub Jelinek ja...@redhat.com wrote:
 
 Hi!
 
  Especially on i?86/x86_64, the if-conversion pass often seems to be
  a pessimization, but vectorization relies on it and without it we can't
  vectorize a lot of the loops.

I think on many other targets it actually helps.  I know for one it helps on 
octeon even though octeon has no vector instructions.  I think it helps most 
arm targets too.

Thanks,
Andrew

 
 Here is a prototype of a patch that will by default (unless explicit
 -ftree-loop-if-convert) only if-convert loops internally for vectorization,
 so the COND_EXPRs actually only appear as VEC_COND_EXPRs in the vectorized
 basic blocks, but will not appear if vectorization fails, or in the
 scalar loop if vectorization is conditional, or in the prologue or epilogue
 loops around the vectorized loop.
 
 Instead of moving the ifcvt pass inside of the vectorizer, this patch
 during ifcvt performs loop versioning depending on a special internal
 call, only if the internal call returns true we go to the if-converted
 original loop, otherwise the non-if-converted copy of the original loop
 is performed.  And the vectorizer is taught to fold this internal call
 into true resp. false depending on if the loop was vectorized or not, and
 vectorizer loop versioning, peeling for alignment and for bound are adjusted
 to also copy from the non-if-converted loop rather than if-converted one.
 
 Besides fixing the various PRs where if-conversion pessimizes code I'd like
 to also move forward with this with conditional loads and stores,
 http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html
 where the if-unconversion approach looked like a failure.
 
 This patch doesn't yet handle if-converted inner loop in outer loop
 vectorization, something on my todo list (so several vect-cond-*.c tests
 FAIL because they are no longer vectorized) plus I had to change two
 SLP vectorization tests that silently relied on loop if-conversion being
 performed to actually optimize the basic block (if the same thing didn't
 appear in a loop, it wouldn't be optimized at all).
 
 On the newly added testcase on x86_64, there are before this patch
 18 scalar conditional moves, with the patch just 2 (both in the checking
 routine).
 
 Comments?
 
 --- gcc/internal-fn.def.jj  2013-10-11 14:32:57.079909782 +0200
 +++ gcc/internal-fn.def 2013-10-11 17:23:58.705526840 +0200
 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST
  DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
  DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
  DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
 +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
 --- gcc/tree-vect-loop-manip.c.jj   2013-09-30 22:13:47.000000000 +0200
 +++ gcc/tree-vect-loop-manip.c  2013-10-15 12:57:54.854970913 +0200
 @@ -374,24 +374,31 @@ LOOP->  loop1

  static void
  slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop,
 +                                    struct loop *scalar_loop,
                                      bool is_new_loop, basic_block *new_exit_bb)
  {
 -  gimple orig_phi, new_phi;
 +  gimple orig_phi, new_phi, scalar_phi = NULL;
    gimple update_phi, update_phi2;
    tree guard_arg, loop_arg;
    basic_block new_merge_bb = guard_edge->dest;
    edge e = EDGE_SUCC (new_merge_bb, 0);
    basic_block update_bb = e->dest;
    basic_block orig_bb = loop->header;
 -  edge new_exit_e;
 +  edge new_exit_e, scalar_e = NULL;
    tree current_new_name;
 -  gimple_stmt_iterator gsi_orig, gsi_update;
 +  gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none ();

    /* Create new bb between loop and new_merge_bb.  */
    *new_exit_bb = split_edge (single_exit (loop));

    new_exit_e = EDGE_SUCC (*new_exit_bb, 0);

 +  if (scalar_loop != NULL && !is_new_loop)
 +    {
 +      gsi_scalar = gsi_start_phis (scalar_loop->header);
 +      scalar_e = EDGE_SUCC (scalar_loop->latch, 0);
 +    }
 +
    for (gsi_orig = gsi_start_phis (orig_bb),
         gsi_update = gsi_start_phis (update_bb);
        !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
 @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge
        tree new_res;
        orig_phi = gsi_stmt (gsi_orig);
        update_phi = gsi_stmt (gsi_update);
 +      if (scalar_e != NULL)
 +        {
 +          scalar_phi = gsi_stmt (gsi_scalar);
 +          gsi_next (&gsi_scalar);
 +        }

        /** 1. Handle new-merge-point phis  **/

 @@ -460,7 +472,13 @@ slpeel_update_phi_nodes_for_guard1 (edge
          current_new_name = loop_arg;
        else
          {
 -          current_new_name = get_current_def (loop_arg);
 +          if (scalar_e)
 +            {
 +              current_new_name = PHI_ARG_DEF_FROM_EDGE (scalar_phi, scalar_e);
 +              current_new_name = get_current_def (current_new_name);
 +            }
 +          else
 +            current_new_name = get_current_def (loop_arg);
            /* current_def is not available only if the variable does

Re: Enable building of libatomic on AArch64

2013-10-17 Thread Michael Hudson-Doyle
Ping?

Michael Hudson-Doyle michael.hud...@linaro.org writes:

 Marcus Shawcroft marcus.shawcr...@gmail.com writes:

 On 3 October 2013 23:43, Michael Hudson-Doyle michael.hud...@linaro.org 
 wrote:
 Hi,

 As libatomic builds for, and the tests pass on, AArch64 (built on x86_64
 but tested on a Foundation Model; logs and summary:

 http://people.linaro.org/~mwhudson/libatomic.sum.txt
 http://people.linaro.org/~mwhudson/runtest-log-v-2.txt

 ) this patch enables the build.

 Cheers,
 mwh
 (first time posting to this list, let me know if I'm doing it wrong)

 2013-10-04  Michael Hudson-Doyle  michael.hud...@linaro.org

   * configure.tgt: Add AArch64 support.


 Hi,
 The patch looks fine to me.

 Thanks for looking!

 The ChangeLog entry should reflect the code that was removed rather
 than the functionality added.  Perhaps:

   * configure.tgt (aarch64*): Remove.

  There are a few too many negatives going on to make a pithy explanation
  easy...

 Did you investigate whether or not the 10 UNSUPPORTED results in the
 testsuite are sane?

 I did not, but have now.

 I think that 5 look legitimate since they require 128 bit sync ops.
 The other 5 look superficially like they should be supported on
 aarch64.  We may just be missing aarch64 target supports wiring in
 check_effective_target_sync_long_long_runtime?

 Yes, that was it, with appropriate changes the -4 tests all pass.

 However, just out of a sense of curiosity, I added wiring to claim
 aarch64* supports 128 bit sync ops and all the -5 tests pass too.  Is
 that just luck or because the reservation granule on the foundation
 model is big enough or something else?

 In any case, I'll attach a patch that just claims support for long long
 sync ops for now...

 Cheers,
 mwh

 /Marcus

 2013-10-04  Michael Hudson-Doyle  michael.hud...@linaro.org

   * libatomic/configure.tgt (aarch64*): Remove code preventing
 build.

   * gcc/testsuite/lib/target-supports.exp
 (check_effective_target_sync_long_long): AArch64 supports
 atomic operations on long long.
 (check_effective_target_sync_long_long_runtime): AArch64 can
 execute atomic operations on long long.

 diff --git a/gcc/testsuite/lib/target-supports.exp 
 b/gcc/testsuite/lib/target-supports.exp
 index 7eb4dfe..5557c06 100644
 --- a/gcc/testsuite/lib/target-supports.exp
 +++ b/gcc/testsuite/lib/target-supports.exp
 @@ -4508,6 +4508,7 @@ proc check_effective_target_sync_int_128_runtime { } {
  proc check_effective_target_sync_long_long { } {
  if { [istarget x86_64-*-*]
       || [istarget i?86-*-*]
 +     || [istarget aarch64*-*-*]
       || [istarget arm*-*-*]
       || [istarget alpha*-*-*]
       || ([istarget sparc*-*-*] && [check_effective_target_lp64]) } {
 @@ -4537,6 +4538,8 @@ proc check_effective_target_sync_long_long_runtime { } {
   }
   }
   }]
 +} elseif { [istarget aarch64*-*-*] } {
 +    return 1
  } elseif { [istarget arm*-*-linux-*] } {
   return [check_runtime sync_longlong_runtime {
   #include <stdlib.h>
 diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
 index b9e5d6c..7eaab38 100644
 --- a/libatomic/configure.tgt
 +++ b/libatomic/configure.tgt
 @@ -95,11 +95,6 @@ fi
  
  # Other system configury
  case ${target} in
 -  aarch64*)
 - # This is currently not supported in AArch64.
 - UNSUPPORTED=1
 - ;;
 -
arm*-*-linux*)
   # OS support for atomic primitives.
   config_path=${config_path} linux/arm posix


PR libstdc++/58729 - tr2::dynamic_bitset::resize fails

2013-10-17 Thread Ed Smith-Rowland

This patch bootstraps and tests clean on x86-64-linux.

Truthfully, dynamic_bitset needs some more love wrt C++11 and a testsuite.
It got put in before it was baked really.
That will be later.


2013-10-16  Edward Smith-Rowland  3dw...@verizon.net

PR libstdc++/58729
* include/tr2/dynamic_bitset (_M_resize, resize): Use input value
to set bits; (_M_do_left_shift, _M_do_right_shift, _M_do_to_ulong,
_M_do_to_ullong, _M_do_find_first, _M_do_find_next, _M_copy_from_ptr,
operator): Move long methods outline to...
* include/tr2/dynamic_bitset.tcc: New.
* include/Makefile.am: Add dynamic_bitset.tcc.
* include/Makefile.in: Add dynamic_bitset.tcc.
* testsuite/tr2/dynamic_bitset/pr58729.cc: New.
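
A reduced illustration of the PR (hypothetical values, not the committed
pr58729.cc testcase): growing the bitset with a true fill value must set
the newly added bits, which is what the first hunk below fixes.

  #include <tr2/dynamic_bitset>
  #include <cassert>

  int main()
  {
    std::tr2::dynamic_bitset<> b(5);   // five zero bits
    b.resize(100, true);               // the new bits should come up set
    assert(b.test(99));                // a bit in a newly allocated block
  }
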
Index: include/tr2/dynamic_bitset
===
--- include/tr2/dynamic_bitset  (revision 203739)
+++ include/tr2/dynamic_bitset  (working copy)
@@ -137,7 +137,12 @@
 	if (__nbits % _S_bits_per_block > 0)
 	  ++__sz;
 	if (__sz != this->_M_w.size())
-	  this->_M_w.resize(__sz);
+	  {
+	    block_type __val = 0;
+	    if (__value)
+	      __val = std::numeric_limits<block_type>::max();
+	    this->_M_w.resize(__sz, __val);
+	  }
       }
 
       allocator_type
@@ -246,7 +251,7 @@
       bool
       _M_is_equal(const __dynamic_bitset_base& __x) const
       {
-	if (__x.size() == this->size())
+	if (__x._M_w.size() == this->_M_w.size())
 	  {
 	    for (size_t __i = 0; __i < this->_M_w.size(); ++__i)
 	      if (this->_M_w[__i] != __x._M_w[__i])
@@ -260,7 +265,7 @@
       bool
       _M_is_less(const __dynamic_bitset_base& __x) const
       {
-	if (__x.size() == this->size())
+	if (__x._M_w.size() == this->_M_w.size())
 	  {
 	    for (size_t __i = this->_M_w.size(); __i > 0; --__i)
 	      {
@@ -297,9 +302,9 @@
       bool
       _M_is_subset_of(const __dynamic_bitset_base& __b)
       {
-	if (__b.size() == this->size())
+	if (__b._M_w.size() == this->_M_w.size())
 	  {
-	    for (size_t __i = 0; __i < _M_w.size(); ++__i)
+	    for (size_t __i = 0; __i < this->_M_w.size(); ++__i)
 	      if (this->_M_w[__i] != (this->_M_w[__i] | __b._M_w[__i]))
 		return false;
 	    return true;
@@ -364,140 +369,6 @@
       }
     };
 
-  // Definitions of non-inline functions from __dynamic_bitset_base.
-  template<typename _WordT, typename _Alloc>
-    void
-    __dynamic_bitset_base<_WordT, _Alloc>::_M_do_left_shift(size_t __shift)
-    {
-      if (__builtin_expect(__shift != 0, 1))
-	{
-	  const size_t __wshift = __shift / _S_bits_per_block;
-	  const size_t __offset = __shift % _S_bits_per_block;
-
-	  if (__offset == 0)
-	    for (size_t __n = this->_M_w.size() - 1; __n >= __wshift; --__n)
-	      this->_M_w[__n] = this->_M_w[__n - __wshift];
-	  else
-	    {
-	      const size_t __sub_offset = _S_bits_per_block - __offset;
-	      for (size_t __n = _M_w.size() - 1; __n > __wshift; --__n)
-		this->_M_w[__n] = ((this->_M_w[__n - __wshift] << __offset)
-				   | (this->_M_w[__n - __wshift - 1] >> __sub_offset));
-	      this->_M_w[__wshift] = this->_M_w[0] << __offset;
-	    }
-
-	  std::fill(this->_M_w.begin(), this->_M_w.begin() + __wshift,
-		    static_cast<_WordT>(0));
-	}
-    }
-
-  template<typename _WordT, typename _Alloc>
-    void
-    __dynamic_bitset_base<_WordT, _Alloc>::_M_do_right_shift(size_t __shift)
-    {
-      if (__builtin_expect(__shift != 0, 1))
-	{
-	  const size_t __wshift = __shift / _S_bits_per_block;
-	  const size_t __offset = __shift % _S_bits_per_block;
-	  const size_t __limit = this->_M_w.size() - __wshift - 1;
-
-	  if (__offset == 0)
-	    for (size_t __n = 0; __n <= __limit; ++__n)
-	      this->_M_w[__n] = this->_M_w[__n + __wshift];
-	  else
-	    {
-	      const size_t __sub_offset = (_S_bits_per_block
-					   - __offset);
-	      for (size_t __n = 0; __n < __limit; ++__n)
-		this->_M_w[__n] = ((this->_M_w[__n + __wshift] >> __offset)
-				   | (this->_M_w[__n + __wshift + 1] << __sub_offset));
-	      this->_M_w[__limit] = this->_M_w[_M_w.size()-1] >> __offset;
-	    }
-
-	  std::fill(this->_M_w.begin() + __limit + 1, this->_M_w.end(),
-		    static_cast<_WordT>(0));
-	}
-    }
-
-  template<typename _WordT, typename _Alloc>
-    unsigned long
-    __dynamic_bitset_base<_WordT, _Alloc>::_M_do_to_ulong() const
-    {
-      size_t __n = sizeof(unsigned long) / sizeof(block_type);
-      for (size_t __i = __n; __i < this->_M_w.size(); ++__i)
-	if (this->_M_w[__i])
-	  __throw_overflow_error(__N("__dynamic_bitset_base::_M_do_to_ulong"));
-      unsigned 

Re: [Patch] Fix undefined behaviors in regex

2013-10-17 Thread Marek Polacek
On Wed, Oct 16, 2013 at 07:02:03PM -0400, Tim Shen wrote:
  To be honest, I was thinking something much smaller than the whole regex
  ;) But let's add Marek in CC.
 
 int work() {
 }
 
 int main() {
 int a = work();
 return a;
 }
 
 /* This is a smaller case to test the sanitizer. It seems that the
 undefined sanitizer is not merged? I use `g++ (GCC) 4.9.0 20131003`,
 is that too old? */

No, that's not too old; the thing is that -fsanitize=undefined isn't
complete yet - we currently sanitize shifts, division by zero, and
__builtin_unreachable calls; VLA sanitization is done, but not committed
because I'm waiting for a review of the C++ FE part of that patch,
and I'm now working on NULL pointer checking.
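
A small example of what the existing shift sanitization already catches
(compile with g++ -fsanitize=undefined and run; the volatile just keeps the
shift from being folded away at compile time):

  int main ()
  {
    volatile int n = 32;
    return 1 << n;   /* undefined: shift count >= width of int */
  }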

Missing-return-statement sanitization will definitely be added, too (quite
easy, I should think), and that would detect the bug in your
testcase.

Still, thanks for letting me know.

Marek


Re: [Patch] Fix undefined behaviors in regex

2013-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2013 at 09:12:41AM +0200, Marek Polacek wrote:
 On Wed, Oct 16, 2013 at 07:02:03PM -0400, Tim Shen wrote:
   To be honest, I was thinking something much smaller than the whole regex
   ;) But let's add Marek in CC.
  
  int work() {
  }
  
  int main() {
  int a = work();
  return a;
  }
  
  /* This is a smaller case to test the sanitizer. It seems that the
  undefined sanitizer is not merged? I use `g++ (GCC) 4.9.0 20131003`,
  is that too old? */
 
 No, that's not too old, the thing is -fsanitize=undefined isn't
 complete - we currently sanitize shift, division by zero, and
 __builtin_unreachable call; VLA sanitization is done, but not commited
 because I'm waiting for a review of the C++ FE part of that patch,
 and on NULL pointer checking I'm working now.
 
 Missing return statement will definitely be added, too (quite
 easy, I should think), and that would detect the bug in your
 testcase.

Though, in the above case, the question is why people ignore warnings
from the compiler and need special runtime instrumentation to remind
them instead.  I'm not objecting to that sanitization, I only find it weird.

Jakub


Re: [PATCH][i386]Fix PR 57756

2013-10-17 Thread Jan-Benedict Glaw
On Wed, 2013-10-16 19:40:21 -0700, Xinliang David Li davi...@google.com wrote:
 On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote:
  On Wed, Oct 16, 2013 at 7:23 PM, Sriraman Tallam tmsri...@google.com 
  wrote:
   I was unable to build a native powerpc compiler. I checked for
   build_target_node and build_optimization_node throughout and
   changed rs6000 because it had references. I did not realize
   function_specific_save and function_specific_restore have to be
   changed. Sorry for breaking it.
 
  As Mike replied, gcc110 is available.  Richard Biener's approval
  was dependent upon successful bootstrap and passing the regression
  testsuite, which you did not even attempt, nor did you try to
  build a cross-compiler.
 
 This is an oversight. I agree that it is better to test on multiple
 platforms for large changes like this. In the past, Sri has been
 very attentive to any fallout due to his changes, and so he is this time.

Speaking of multiple platforms...  This patch didn't only
produce fallout on rs6k, but also on quite a number of other
architectures.

  I already sent one message (it can be found in the archives at
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01156.html) listing quite
a number of targets that are broken right now. This situation hasn't
changed significantly since they started to fail. Please have a look at
my build robot[1]. All targets (except nds32, nios2 and arceb) used to
build.

  Don't get me wrong, I don't want to overly blame Sriraman for
breaking it in the first place. Shit happens. But please keep an eye
on fixing the fallout in a timely manner.

MfG, JBG
[1] http://toolchain.lug-owl.de/buildbot/?limit=2000

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of:   I've had a bit too much of a reality check lately.
the second  :   Slowly I'd like to be able to go on dreaming again.
 -- Maximilian Wilhelm (18 May 2005, #lug-owl.de)




Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-10-17 Thread Marek Polacek
On Wed, Oct 16, 2013 at 04:25:58PM -0700, Wei Mi wrote:
 +/* Return true if target platform supports macro-fusion.  */
 +
 +static bool
 +ix86_macro_fusion_p ()
 +{
 +  if (TARGET_FUSE_CMP_AND_BRANCH)
 +return true;
 +  else
 +return false;
 +}

That looks weird, why not just

static bool
ix86_macro_fusion_p (void)
{
  return TARGET_FUSE_CMP_AND_BRANCH;
}

?

Marek


Re: [PATCH][i386]Fix PR 57756

2013-10-17 Thread Andreas Schwab
What about all the other targets you broke?

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


Re: [PATCH] Fix PR58143 and dups

2013-10-17 Thread Richard Biener
On Tue, 15 Oct 2013, Richard Biener wrote:

 
 This is an alternate fix (see 
 http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00234.html for the other
 one) for the various PRs that show that LIM exposes undefined
 signed overflow on paths where it wasn't executed before LIM
 ultimately leading to a miscompilation.
 
 For this fix we rewrite invariant stmts to use unsigned arithmetic
 when it is one of the operations that SCEV and niter analysis
 handles (thus not division or absolute).  The other fix instead
 disables invariant motion for those expressions.
 
 Bootstrapped and tested on x86_64-unknown-linux-gnu.
 
 The issue is also present on the 4.8 branch, so either patch
 should be backported in the end.  I will try to benchmark
 both tomorrow (unless somebody beats me on that).
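
A sketch of the rewrite-to-unsigned idea from the quoted description
(an illustration only, not the LIM implementation): the invariant signed
operation is carried out in the corresponding unsigned type, where
wrap-around is well defined, so hoisting it onto a path where it was not
executed before can no longer introduce undefined signed overflow.

  static inline int add_no_signed_ub (int a, int b)
  {
    /* Unsigned addition wraps; converting back relies on GCC's
       implementation-defined modulo conversion.  */
    return (int) ((unsigned int) a + (unsigned int) b);
  }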

Comparing both patches doesn't give conclusive results.  A single-run
SPEC 2k6 on a Sandy Bridge machine gives (base is Bernd's patch, peak
is mine, -Ofast -funroll-loops -march=native):

                                     Estimated                      Estimated
                    Base     Base     Base        Peak     Peak      Peak
Benchmarks          Ref.   Run Time   Ratio       Ref.   Run Time    Ratio
--------------   -------- --------  --------   -------- --------  --------
400.perlbench        9770      332      29.4 *     9770      325      30.0 *
401.bzip2            9650      450      21.4 *     9650      449      21.5 *
403.gcc              8050      282      28.5 *     8050      282      28.6 *
429.mcf              9120      225      40.6 *     9120      226      40.4 *
445.gobmk           10490      411      25.6 *    10490      408      25.7 *
456.hmmer            9330      364      25.6 *     9330      365      25.6 *
458.sjeng           12100      433      27.9 *    12100      432      28.0 *
462.libquantum      20720      318      65.1 *    20720      325      63.8 *
464.h264ref         22130      525      42.2 *    22130      527      42.0 *
471.omnetpp          6250      263      23.7 *     6250      264      23.7 *
473.astar            7020      347      20.2 *     7020      346      20.3 *
483.xalancbmk        6900      188      36.7 *     6900      189      36.5 *
 Est. SPECint_base2006                   --
 Est. SPECint2006                                                      --

                                     Estimated                      Estimated
                    Base     Base     Base        Peak     Peak      Peak
Benchmarks          Ref.   Run Time   Ratio       Ref.   Run Time    Ratio
--------------   -------- --------  --------   -------- --------  --------
410.bwaves          13590      307      44.2 *    13590      303      44.9 *
416.gamess                NR                            NR
433.milc             9180      356      25.8 *     9180      356      25.8 *
434.zeusmp           9100      295      30.8 *     9100      297      30.7 *
435.gromacs          7140      283      25.3 *     7140      282      25.3 *
436.cactusADM       11950      380      31.4 *    11950      383      31.2 *
437.leslie3d         9400      288      32.7 *     9400      284      33.1 *
444.namd             8020      401      20.0 *     8020      402      20.0 *
447.dealII          11440      277      41.3 *    11440      278      41.1 *
450.soplex           8340      208      40.1 *     8340      206      40.5 *
453.povray           5320      180      29.6 *     5320      178      29.9 *
454.calculix         8250      393      21.0 *     8250      392      21.0 *
459.GemsFDTD        10610      316      33.6 *    10610      308      34.4 *
465.tonto            9840      294      33.5 *     9840      294      33.5 *
470.lbm             13740      245      56.1 *    13740      245      56.1 *
481.wrf             11170      259      43.1 *    11170      259      43.2 *
482.sphinx3         19490      396      49.3 *    19490      397      49.1 *
 Est. SPECfp_base2006                    --
 Est. SPECfp2006                                                       --


off-noise (more than 5s difference) may be 462.libquantum and 
459.GemsFDTD.  I didn't include unpatched trunk in the comparison
(not fixing the bug isn't an option after all).

Conceptually I like the rewriting into unsigned arithmetic more
so I'm going to apply that variant later today (re-testing 3
runs of 462.libquantum and 459.GemsFDTD, this time with address-space
randomization turned off).

Richard.

 Any preference or other suggestions?
 
 Thanks,
 Richard.
 
 2013-10-15  Richard Biener  rguent...@suse.de
 
   PR tree-optimization/58143
   * tree-ssa-loop-im.c (arith_code_with_undefined_signed_overflow):
   New function.
   (rewrite_to_defined_overflow): Likewise.
   (move_computations_dom_walker::before_dom): Rewrite stmts
   with undefined signed overflow that are not always 

Re: [PATCH] Fix PR58143 and dups

2013-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2013 at 09:56:31AM +0200, Richard Biener wrote:
 off-noise (more than 5s difference) may be 462.libquantum and 
 459.GemsFDTD.  I didn't include unpatched trunk in the comparison
 (not fixing the bug isn't an option after all).
 
 Conceptually I like the rewriting into unsigned arithmetic more
 so I'm going to apply that variant later today (re-testing 3
 runs of 462.libquantum and 459.GemsFDTD, this time with address-space
 randomization turned off).

Can't we rewrite for the selected arithmetic operations and punt (is that
what Bernd's patch did?) on moving other arithmetic?  Well, there are
operations that are safe even for signed types, e.g. rotates; isn't
RSHIFT_EXPR also safe?
Can't you handle LSHIFT_EXPR as well (or do we treat it, even signed, as never
undefined in the middle-end/backend?).

Jakub


Re: Patch: Add #pragma ivdep support to the ME and C FE

2013-10-17 Thread Richard Biener
On Wed, 16 Oct 2013, Tobias Burnus wrote:

 Frederic Riss wrote:
  Just one question. You describe the pragma in the doco patch as:
 
  +This pragma tells the compiler that the immediately following @code{for}
  +loop can be executed in any loop index order without affecting the result.
  +The pragma aids optimization and in particular vectorization as the
  +compiler can then assume a vectorization safelen of infinity.
 
  I'm not a specialist, but I was always told that the 'original'
  meaning of ivdep (which I believe was introduced by Cray), was that
  the compiler could assume that there are only forward dependencies in
  the loop, but not that it can be executed in any order.
 
 The nice thing about #pragma ivdep is that there is no real standard. And
 the explanations of the different vendors are also not completely clear.
 
 
 An overview of this is given in the following file on pages 13-14 for
 Cray Research PVP, MIPSPRO & Open64, Intel ICC, and Multiflow:
 http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/GREPS2007-Benoit.pdf
 
 That's summarized as:
 - vector: ignore lexical upward dependencies (Cray PVP, Intel ICC)
 - parallel: ignore loop-carried dependencies (MIPSPRO, Open64)
 - liberal: ignore loop-variant dependencies (Multiflow)
 
 
 The quotes for Cray and Intel are below.
 
 Cray: 
 http://docs.cray.com/books/004-2179-001/html-004-2179-001/brtlrwh.html#EKZ5MRWH
 The ivdep directive tells the compiler to ignore vector dependencies for
  the loop immediately following the directive. Conditions other than vector
  dependencies can inhibit vectorization. If these conditions are satisfactory,
  the loop vectorizes. This directive is useful for some loops that contain
  pointers and indirect addressing. The format of this directive is as follows:
  #pragma _CRI ivdep

Which suggests we use

#pragma GCC ivdep

to avoid colliding with possibly different semantics in existing programs
that use variants of this pragma?
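
Hypothetical use of that spelling (the pragma is only being proposed here,
so the final syntax may differ):

  void scale (float *a, const float *b, int n)
  {
  #pragma GCC ivdep
    for (int i = 0; i < n; i++)
      a[i] = b[i] * 2.0f;   /* assert: no dependence between *a and *b */
  }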

 Intel: 
 http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm
 The ivdep pragma instructs the compiler to ignore assumed vector 
 dependencies.
  To ensure correct code, the compiler treats an assumed dependence as a proven
  dependence, which prevents vectorization. This pragma overrides that 
 decision.
  Use this pragma only when you know that the assumed loop dependencies are 
 safe
  to ignore.

This suggests that _known_ dependences are still treated as dependences.
But what is known obviously depends on the implementation which
may not know that a[i] and a[i+1] depend but merely assume it.  Not
a standard-proof definition of the pragma ;)

That said, safelen even overrides known dependences (but with unknown
distance vector)! (that looks like a bug to me, or at least a QOI issue)

 
  The Intel docs give this example:
 ...
  Given your description, this loop wouldn't be a candidate for ivdep,
  as reversing the loop index order changes the semantics. I believe
  that the way you interpret it (ie. setting vectorization safe length
  to INT_MAX) is correct with respect to this other definition, though.
 
 Do you have a suggestion for a better wording? My idea was to interpret
 this part similarly to OpenMP's simd with safelen=infinity. (Actually, I
 believe loop->safelen was added for OpenMPv4's and/or Cilk Plus's simd.)
 
 OpenMPv4.0, http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf , states
 for this (excerpt from page 70):
 A SIMD loop has logical iterations numbered 0, 1,...,N-1 where N is the
 number of loop iterations, and the logical numbering denotes the sequence
 in which the iterations would be executed if the associated loop(s) were
 executed with no SIMD instructions. If the safelen clause is used then no
 two iterations executed concurrently with SIMD instructions can have a
 greater distance in the logical iteration space than its value. The
 parameter of the safelen clause must be a constant positive integer
 expression. The number of iterations that are executed concurrently at
 any given time is implementation defined. Each concurrent iteration will
 be executed by a different SIMD lane. Each set of concurrent iterations
 is a SIMD chunk.
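 
 An illustration of those semantics (hypothetical values): with safelen(4),
 no two iterations more than 4 apart in the logical iteration space may run
 concurrently, so the distance-4 backward dependence below is still honored.
 
   void step4 (float *a, int n)
   {
   #pragma omp simd safelen(4)
     for (int i = 4; i < n; i++)
       a[i] = a[i - 4] + 1.0f;
   }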

OTOH, if we are mapping ivdep to safelen why not simply allow

#pragma GCC safelen 4

?

 
  Oh, and are there any plans to maintain this information in some way
  till the back-end? Software pipelining could be another huge winner
  for that kind of dependency analysis simplification.
 
 I don't know until when loop->safelen is kept. As it survives until late
 in the middle end, providing the backend with this information should be
 simple.

It's kept as long as we preserve loops which at the moment is until
after RTL loop optimizations are finished.  Extending this isn't
hard, I just didn't see a reason to do that.

Richard.


Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Sandiford
Kenneth Zadeck zad...@naturalbridge.com writes:
 As mentioned in my message yesterday, I thought your new way of canonising
 unsigned tree constants meant that there was always an upper zero bit.
 Is that right?
 i believe this is correct.
 If so, xprecision < precision is a no-op, because the number always
 has the right form for wider precisions.  The only difficult case is
 xprecision == precision, since then we need to peel off any upper -1 HWIs.
 say my HWI size is 8 bits (just to keep from typing a million 'f's).  If
 I have a 16 bit unsigned number that is all 1s, in the tree world it is 3
 hwis
 0x00 0xff 0xff.

 but inside regular wide int, it would take 1 wide int whose value is 0xff.
 inside of max it would be the same as the tree, but then the test
 precision < xprecision + hbpwi never kicks in because precision is
 guaranteed to be huge.

 inside of addr_wide_int, I think we tank with the assertion.

It should be OK for addr_wide_int too.  The precision still fits 2 HWIs.
The initial length is greater than the maximum length of an addr_wide_int,
but your len = MAX (len, max_len) deals with that.

 the case actually comes up on the ppc because they do a lot of 128 bit
 math.  I think I got through the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers
in addr_wide_int.  The justification for the assertion was that we
should explicitly truncate to addr_wide_int when deliberately
ignoring upper bits, beyond bit or byte address width.  128 bits
definitely falls into that category on powerpc.

Thanks,
Richard


Re: patch to canonize unsigned tree-csts

2013-10-17 Thread Richard Sandiford
Kenneth Zadeck zad...@naturalbridge.com writes:
 On 10/15/2013 02:30 PM, Richard Sandiford wrote:
 Richard Sandiford rdsandif...@googlemail.com writes:
   if (small_prec)
     ;
   else if (precision == xprecision)
     while (len >= 0 && val[len - 1] == -1)
       len--;
 Err, len > 0 obviously.
 you were only close.

Bah.

 patch tested on ppc and committed as revision 203739.

Thanks.  So that should have got rid of the last use of scratch in
tree -> addr_wide_int and tree -> max_wide_int.

Richard


RE: FW: MAX_PATH problems with mingw gcc

2013-10-17 Thread Vladimir Simonov
 -Original Message-
 From: Joey Ye [mailto:joey.ye...@gmail.com] 
 
 There is an issue on file systems with symbolic links, like Linux ext2/3. It
 could vitally change the behavior of GCC. The issue is described as follows.
 
 Such logic cannot be deduced simply from the name string, so your patch can
 do nothing about it.
 
 For your patch IMHO the way out could be just define a real function for DOS 
 FS, and leave a blank function body otherwise. Like:
 filename_normalize (char *fn)
 {
 #if DOS
 ...
 #endif
 }

Thank you for pointing out this problem.

So, on file systems with symlink support, playing with filenames as strings
is impossible.
This means that the name filename_normalize is too pretentious - it will do
nothing for most GCC consumers.
On the other hand, it is useful for OSes with a small MAX_PATH.
Do you think it should be renamed to filename_shortcut/filename_simplify or
something like that?
That way readers won't expect too much from it even without looking at its body.

Is it allowed to write
# ifdef HAVE_DOS_BASED_FILE_SYSTEM
extern void filename_normalize (char *f);
#else
#define filename_normalize(f)
#endif
directly in include/filenames.h?
This way we'll avoid an unnecessary empty call on platforms with symlinks.

And a more general question.
I can imagine that different filenames produced by cross builds on different
OSes for the same target
may confuse some upper-level tools, like code analyzers, code coverage, etc...
Does it make sense to push such a solution to GCC mainstream?
Maybe it is better to keep the patch for cross-toolchain builders...

Regards
Vladimir


Re: Patch: Add #pragma ivdep support to the ME and C FE

2013-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2013 at 10:07:43AM +0200, Richard Biener wrote:
 Which suggests we use
 
 #pragma GCC ivdep
 
 to avoid colliding with possibly different semantics in existing programs
 that use variants of this pragma?

Yeah, perhaps.

  Intel: 
  http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm
  The ivdep pragma instructs the compiler to ignore assumed vector 
  dependencies.
   To ensure correct code, the compiler treats an assumed dependence as a 
  proven
   dependence, which prevents vectorization. This pragma overrides that 
  decision.
   Use this pragma only when you know that the assumed loop dependencies are 
  safe
   to ignore.
 
 This suggests that _known_ dependences are still treated as dependences.
 But what is known obviously depends on the implementation which
 may not know that a[i] and a[i+1] depend but merely assume it.  Not
 a standard-proof definition of the pragma ;)

Very bad definition indeed.

 That said, safelen even overrides known dependences (but with unknown
 distance vector)! (that looks like a bug to me, or at least a QOI issue)

safelen is whatever the OpenMP 4.0 standard requires (and Cilk+ is I believe
just defering the description to the OpenMP 4.0 definition), and fortunately
the OpenMP 4.0 standard doesn't contain any so badly worded definition.
The actual wording is not that the <= safelen consecutive iterations can be
run in any order, but that they can be performed all together using
(possibly emulated) SIMD instructions.  Thus I think it is correct if we use
it for decisions if we can vectorize a loop (without versioning it for
alias), regardless of known vs. unknown dependencies - if we have known
dependencies that would result in known broken code, perhaps we should
warn?, but probably not for anything further, it should not affect aliasing
of scalar or vector loads/stores in the loop, etc.
Then I think we'd handle forward but not backward dependencies.  Given the
void ignore_vec_dep(int *a, int k, int c, int m)
{
  #pragma omp simd
  for (int i = 0; i  m; i++)
a[i] = a[i + k] * c;
}
testcase, I think we'll handle it fine for k <= -m and k >= 0.
For k >= m obviously, there is no overlap and even runtime versioning for
alias would handle it right, for smaller k because the load (vector or
non-vector) will be before the store and we don't tell aliasing the two
don't alias.  We don't vectorize any load+store operations (expressed as one
stmt in GIMPLE; struct copies or atomic stmts), do we?  Those would be
a problem with the above.  If for anything else we place all the VF loads
where the original load was in the IL and all the VF stores where the
original store was in the IL, and just leave it to other passes to reorder
if they can prove it doesn't alias, we should be fine.

Jakub


Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Biener
On Thu, 17 Oct 2013, Richard Sandiford wrote:

 Kenneth Zadeck zad...@naturalbridge.com writes:
  As mentioned in my message yesterday, I thought your new way of canonising
  unsigned tree constants meant that there was always an upper zero bit.
  Is that right?
  i believe this is correct.
  If so, xprecision < precision is a no-op, because the number always
  has the right form for wider precisions.  The only difficult case is
  xprecision == precision, since then we need to peel off any upper -1 HWIs.
  say my HWI size is 8 bits (just to keep from typing a million 'f's).  If
  I have a 16 bit unsigned number that is all 1s, in the tree world it is 3
  hwis
  0x00 0xff 0xff.
 
  but inside regular wide int, it would take 1 wide int whose value is 0xff.
  inside of max it would be the same as the tree, but then the test
  precision < xprecision + hbpwi never kicks in because precision is
  guaranteed to be huge.
 
  inside of addr_wide_int, I think we tank with the assertion.
 
 It should be OK for addr_wide_int too.  The precision still fits 2 HWIs.
 The initial length is greater than the maximum length of an addr_wide_int,
 but your len = MAX (len, max_len) deals with that.

It's

  len = MIN (len, max_len)

which looked suspicious to me, but with precision >= xprecision,
precision can only be zero if xprecision is zero, which looked to
me like it cannot happen - or rather, it should be fixed.

  the case actually comes up on the ppc because they do a lot of 128 bit
  math.  I think I got through the x86-64 without noticing this.
 
 Well, it'd be suspicious if we're directly using 128-bit numbers
 in addr_wide_int.  The justification for the assertion was that we
 should explicitly truncate to addr_wide_int when deliberately
 ignoring upper bits, beyond bit or byte address width.  128 bits
 definitely falls into that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
wide-int value if it has precision 16.  AFAIK that is what the
code produces, but now Kenny says this is only for some kind
of wide-ints but not all?  That is, why is

inline wi::storage_ref
wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
                                        unsigned int precision, const_tree x)
{
  unsigned int len = TREE_INT_CST_NUNITS (x);
  const HOST_WIDE_INT *val
    = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
  return wi::storage_ref (val, len, precision);
}

not a valid implementation together with making sure that the
INTEGER_CST tree rep has that extra word of zeros if required?
I would like to see us move in that direction (given that the
tree rep of INTEGER_CST has transitioned to variable-length already).

Btw, code such as

tree
wide_int_to_tree (tree type, const wide_int_ref pcst)
{
...
  unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
  bool recanonize = sgn == UNSIGNED
    && small_prec
    && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;

definitely needs a comment.  I would have thought _all_ unsigned
numbers need re-canonicalization.  Well, maybe only if we're
forcing the extra zeros.

[This function shows another optimization issue:

case BOOLEAN_TYPE:
  /* Cache false or true.  */
  limit = 2;
  if (wi::leu_p (cst, 1))
ix = cst.to_uhwi ();

I would have expected cst <= 1 to be optimized to cst.len == 1 &&
cst.val[0] <= 1.  It expands to

L27:
  MEM[(long int *)&D.50698 + 16B] = 1;
  MEM[(struct wide_int_ref_storage *)&D.50698] = &MEM[(struct
wide_int_ref_storage *)&D.50698].scratch;
  MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1;
  MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32;
  _277 = MEM[(const struct wide_int_storage *)cst + 260B];
  if (_277 <= 64)
    goto <bb 42>;
  else
    goto <bb 43>;

  <bb 42>:
  xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
  _494 = MEM[(const long int *)cst];
  _495 = (long unsigned int) _494;
  yl_496 = zext_hwi (_495, _277);
  _497 = xl_491 < yl_496;
  goto <bb 44>;

  <bb 43>:
  _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage
*)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage
*)cst].val, len_274, _277);

this keeps D.50698 and cst un-SRAable - inline storage is problematic
for this reason.  But the representation should guarantee the
compare with a low precision (32 bit) constant is evaluatable
at compile-time if len of the larger value is > 1, no?

  <bb 44>:
  # _504 = PHI <_497(42), _503(43)>
  D.50698 ={v} {CLOBBER};
  if (_504 != 0)
    goto <bb 45>;
  else
    goto <bb 46>;

  <bb 45>:
  pretmp_563 = MEM[(const struct wide_int_storage *)cst + 256B];
  goto <bb 229> (<L131>);

  <bb 46>:
  _65 = generic_wide_int<wide_int_storage>::to_uhwi (cst, 0);
  ix_66 = (int) _65;
  goto <bb 91>;

The question is whether we should try to optimize wide-int for
such cases or simply not use wi::leu_p (cst, 1) but rather

 if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () <= 1)

?


 Thanks,
 Richard
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE 

RE: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)

2013-10-17 Thread Paulo Matos
 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On
 Behalf Of Mike Stump
 Sent: 16 October 2013 23:06
 To: Paulo J.Matos
 Cc: gcc-patches@gcc.gnu.org
 Subject: Re: Teaching emacs about GCC coding conventions (was Re: [PATCH]
 tree_code_name wrapper)
 
 First, we like to wait and let patches bake on mainline before considering
 back-porting; this one has no bake time yet in our tree.  Second, only
 patches that impact quality enough should be back-ported, and I tend to
 think that this one does not impact quality enough to worry about.  Also,
 you should develop on trunk, not 4_8.  Arguably, I would say no.  Now, a
 release manager can always review & approve it; it should be very, very
 low risk.
 

Makes sense. Thanks for the clarification.

-- 
Paulo Matos


Re: [Patch] Fix undefined behaviors in regex

2013-10-17 Thread Paolo Carlini

On 10/17/2013 09:17 AM, Jakub Jelinek wrote:
Though, in the above case, the question is why people ignore warnings 
from the compiler and need to have special runtime instrumentation to 
remind them instead. I'm not objecting to that sanitization, only find 
it weird.

I had the same thought.

Paolo.



Re: PR libstdc++/58729 - tr2::dynamic_bitset::resize fails

2013-10-17 Thread Paolo Carlini

Hi,

On 10/17/2013 09:04 AM, Ed Smith-Rowland wrote:

This patch bootstraps and tests clean on x86-64-linux.

Truthfully, dynamic_bitset needs some more love wrt C++11 and a 
testsuite.

It got put in before it was baked really.
That will be later.

Patch is Ok with me. Before committing you may want to test -m32 too.

Thanks,
Paolo.



RE: FW: MAX_PATH problems with mingw gcc

2013-10-17 Thread Vladimir Simonov
Thank you for review. Please see inline

 From: Joey Ye [mailto:joey.ye...@gmail.com] 
 Sent: Thursday, October 17, 2013 7:36 AM
 
 The macro FILENAME_NORMALIZE doesn't look necessary. I would prefer to use
 filename_normalize directly, just as filename_cmp is used in GCC rather than
 FILENAME_CMP.
The patch is quite old (from GCC 4.2, maybe 4.3). As I remember, in those days
filenames.h had only
extern int filename_cmp (const char *s1, const char *s2);
#define FILENAME_CMP(s1, s2)  filename_cmp(s1, s2)

at the end. So I just followed its style. Nowadays the style has changed.
I'll use filename_normalize everywhere.



 Also, it normalizes filenames like:
 c:\abc\ into c:/abc\
The ending \ isn't normalized; the logic here needs refining to handle this case.
Thank you.
I was concentrating on filename shortening & correctness and missed this case.
In fact it is not too bad - the patch doesn't guarantee that all filenames
inside the gcc process will be
as short as possible and will have forward slashes. It just simplifies
filenames in some key places.
Anyway, I'll add this case to my test and fix it.
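
A minimal sketch of what such a fix might look like (hypothetical body; the
real patch also shortens paths, this only shows the slash handling, including
a trailing backslash):

  void filename_normalize (char *f)
  {
    for (; *f; ++f)
      if (*f == '\\')
        *f = '/';
  }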

Other comments are accepted without notes:)
I'll redo the patch in several days.

Thank you
Vladimir


Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.

2013-10-17 Thread Richard Biener
On Wed, 16 Oct 2013, Cong Hou wrote:

 On Wed, Oct 16, 2013 at 2:02 AM, Richard Biener rguent...@suse.de wrote:
  On Tue, 15 Oct 2013, Cong Hou wrote:
 
  Thank you for your reminder, Jeff! I just noticed Richard's comment. I
  have modified the patch according to that.
 
  The new patch is attached.
 
  (posting patches inline is easier for review, now you have to deal
  with no quoting markers ;))
 
  Comments inline.
 
  diff --git a/gcc/ChangeLog b/gcc/ChangeLog
  index 8a38316..2637309 100644
  --- a/gcc/ChangeLog
  +++ b/gcc/ChangeLog
  @@ -1,3 +1,8 @@
  +2013-10-15  Cong Hou  co...@google.com
  +
  +   * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop 
  invariant
  +   statement that contains data refs with zero-step.
  +
   2013-10-14  David Malcolm  dmalc...@redhat.com
 
  * dumpfile.h (gcc::dump_manager): New class, to hold state
  diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
  index 075d071..9d0f4a5 100644
  --- a/gcc/testsuite/ChangeLog
  +++ b/gcc/testsuite/ChangeLog
  @@ -1,3 +1,7 @@
  +2013-10-15  Cong Hou  co...@google.com
  +
  +   * gcc.dg/vect/pr58508.c: New test.
  +
   2013-10-14  Tobias Burnus  bur...@net-b.de
 
  PR fortran/58658
  diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c 
  b/gcc/testsuite/gcc.dg/vect/pr58508.c
  new file mode 100644
  index 000..cb22b50
  --- /dev/null
  +++ b/gcc/testsuite/gcc.dg/vect/pr58508.c
  @@ -0,0 +1,20 @@
  +/* { dg-do compile } */
  +/* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details } */
  +
  +
  +/* The GCC vectorizer generates loop versioning for the following loop
  +   since there may exist aliasing between A and B.  The predicate checks
  +   if A may alias with B across all iterations.  Then for the loop in
  +   the true body, we can assert that *B is a loop invariant so that
  +   we can hoist the load of *B before the loop body.  */
  +
  +void foo (int* a, int* b)
  +{
  +  int i;
  +  for (i = 0; i < 10; ++i)
  +    a[i] = *b + 1;
  +}
  +
  +
  +/* { dg-final { scan-tree-dump-times "hoist" 2 "vect" } } */
  +/* { dg-final { cleanup-tree-dump "vect" } } */
  diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
  index 574446a..f4fdec2 100644
  --- a/gcc/tree-vect-loop-manip.c
  +++ b/gcc/tree-vect-loop-manip.c
  @@ -2477,6 +2477,92 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
 adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi));
   }
 
 
  Note that applying this kind of transform at this point invalidates
  some of the earlier analysis the vectorizer performed (namely the
  def-kind which now effectively gets vect_external_def from
  vect_internal_def).  In this case it doesn't seem to cause any
  issues (we re-compute the def-kind every time we need it (how wasteful)).
 
  +  /* Extract load and store statements on pointers with zero-stride
  + accesses.  */
  +  if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
  +{
  +  /* In the loop body, we iterate each statement to check if it is a 
  load
  +or store.  Then we check the DR_STEP of the data reference.  If
  +DR_STEP is zero, then we will hoist the load statement to the loop
  +preheader, and move the store statement to the loop exit.  */
 
  We don't move the store yet.  Micha has a patch pending that enables
  vectorization of zero-step stores.
 
  +  for (gimple_stmt_iterator si = gsi_start_bb (loop->header);
  +  !gsi_end_p (si);)
 
  While technically ok now (vectorized loops contain a single basic block)
  please use LOOP_VINFO_BBS () to get at the vector of basic-blocks
  and iterate over them like other code does.
 
 
 Have done it.
 
 
 
  +   {
  + gimple stmt = gsi_stmt (si);
  + stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
  + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
  +
  + if (dr && integer_zerop (DR_STEP (dr)))
  +   {
  + if (DR_IS_READ (dr))
  +   {
  + if (dump_enabled_p ())
  +   {
  + dump_printf_loc
  + (MSG_NOTE, vect_location,
  +  "hoist the statement to outside of the loop ");
 
  "hoisting out of the vectorized loop: "
 
  + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
  + dump_printf (MSG_NOTE, "\n");
  +   }
  +
  + gsi_remove (&si, false);
  + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), stmt);
 
  Note that this will result in a bogus VUSE on the stmt at this point which
  will be only fixed because of implementation details of loop versioning.
  Either get the correct VUSE from the loop header virtual PHI node
  preheader edge (if there is none then the current VUSE is the correct one
  to use) or clear it.
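  
   A minimal sketch of the first alternative (illustrative only, assuming
   the GIMPLE/SSA APIs of this period; not the committed fix):
  
     /* Reuse the virtual PHI's preheader argument as the VUSE of the
        hoisted load; if the loop header has no virtual PHI, the stmt's
        current VUSE is already correct.  */
     gimple_stmt_iterator psi;
     tree vuse = NULL_TREE;
     for (psi = gsi_start_phis (loop->header);
          !gsi_end_p (psi); gsi_next (&psi))
       {
         gimple phi = gsi_stmt (psi);
         if (virtual_operand_p (gimple_phi_result (phi)))
           {
             vuse = PHI_ARG_DEF_FROM_EDGE (phi, loop_preheader_edge (loop));
             break;
           }
       }
     if (vuse != NULL_TREE)
       gimple_set_vuse (stmt, vuse);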
 
 
 I just cleared the VUSE since I noticed that after the vectorization
 pass the correct VUSE 

Re: [RFC] By default if-convert only basic blocks that will be vectorized

2013-10-17 Thread Richard Biener
On Wed, 16 Oct 2013, pins...@gmail.com wrote:

 
  On Oct 15, 2013, at 5:32 AM, Jakub Jelinek ja...@redhat.com wrote:
  
  Hi!
  
  Especially on i?86/x86_64 if-conversion pass seems to be often
  a pessimization, but the vectorization relies on it and without it we can't
  vectorize a lot of the loops.
 
 I think on many other targets it actually helps.  I know for one it 
 helps on octeon even though octeon has no vector instructions.  I think 
 it helps most arm targets too.

The main issue is that it has no cost model - the only cost model
being that it can successfully if-convert all conditional code in
a loop, resulting in a single-BB loop.  So it is clearly vectorization
targeted.

Its infrastructure may be useful for doing a more sensible if-conversion
on the GIMPLE level on scalar code.

Of course even the infrastructure needs some TLC (and some better
generic machinery of keeping track and simplifying of a predicate
combination).

Thanks,
Richard.

 Thanks,
 Andrew
 
  
  Here is a prototype of a patch that will by default (unless explicit
  -ftree-loop-if-convert) only if-convert loops internally for vectorization,
  so the COND_EXPRs actually only appear as VEC_COND_EXPRs in the vectorized
  basic blocks, but will not appear if vectorization fails, or in the
  scalar loop if vectorization is conditional, or in the prologue or epilogue
  loops around the vectorized loop.
  
  Instead of moving the ifcvt pass inside of the vectorizer, this patch
  during ifcvt performs loop versioning depending on a special internal
  call, only if the internal call returns true we go to the if-converted
  original loop, otherwise the non-if-converted copy of the original loop
  is performed.  And the vectorizer is taught to fold this internal call
  into true resp. false depending on if the loop was vectorized or not, and
  vectorizer loop versioning, peeling for alignment and for bound are adjusted
  to also copy from the non-if-converted loop rather than if-converted one.
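   
   Schematically, the result of this versioning looks like (an illustrative
   GIMPLE-like sketch, not the patch's exact output):
   
     if (LOOP_VECTORIZED ())       /* internal call, folded later */
       {
         /* If-converted copy: a single-BB body whose COND_EXPRs become
            VEC_COND_EXPRs if vectorization succeeds; the call is then
            folded to true.  */
       }
     else
       {
         /* Untouched scalar copy: taken when the call is folded to false
            because vectorization failed, and also the source for the
            prologue/epilogue/scalar fallback loops.  */
       }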
  
  Besides fixing the various PRs where if-conversion pessimizes code I'd like
  to also move forward with this with conditional loads and stores,
  http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html
  where the if-unconversion approach looked like a failure.
  
  This patch doesn't yet handle if-converted inner loop in outer loop
  vectorization, something on my todo list (so several vect-cond-*.c tests
  FAIL because they are no longer vectorized) plus I had to change two
  SLP vectorization tests that silently relied on loop if-conversion being
  performed to actually optimize the basic block (if the same thing didn't
  appear in a loop, it wouldn't be optimized at all).
  
  On the newly added testcase on x86_64, there are before this patch
  18 scalar conditional moves, with the patch just 2 (both in the checking
  routine).
  
  Comments?
  
  --- gcc/internal-fn.def.jj2013-10-11 14:32:57.079909782 +0200
  +++ gcc/internal-fn.def2013-10-11 17:23:58.705526840 +0200
  @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST
  DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
  DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
  DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW)
  +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW)
  --- gcc/tree-vect-loop-manip.c.jj2013-09-30 22:13:47.0 +0200
  +++ gcc/tree-vect-loop-manip.c2013-10-15 12:57:54.854970913 +0200
  @@ -374,24 +374,31 @@ LOOP-  loop1
  
  static void
  slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop,
  +struct loop *scalar_loop,
  bool is_new_loop, basic_block *new_exit_bb)
  {
  -  gimple orig_phi, new_phi;
  +  gimple orig_phi, new_phi, scalar_phi = NULL;
gimple update_phi, update_phi2;
tree guard_arg, loop_arg;
basic_block new_merge_bb = guard_edge->dest;
edge e = EDGE_SUCC (new_merge_bb, 0);
basic_block update_bb = e->dest;
basic_block orig_bb = loop->header;
  -  edge new_exit_e;
  +  edge new_exit_e, scalar_e = NULL;
tree current_new_name;
  -  gimple_stmt_iterator gsi_orig, gsi_update;
  +  gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none ();
  
/* Create new bb between loop and new_merge_bb.  */
*new_exit_bb = split_edge (single_exit (loop));
  
new_exit_e = EDGE_SUCC (*new_exit_bb, 0);
  
  +  if (scalar_loop != NULL && !is_new_loop)
  +{
  +  gsi_scalar = gsi_start_phis (scalar_loop->header);
  +  scalar_e = EDGE_SUCC (scalar_loop->latch, 0);
  +}
  +
for (gsi_orig = gsi_start_phis (orig_bb),
 gsi_update = gsi_start_phis (update_bb);
 !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update);
  @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge
tree new_res;
orig_phi = gsi_stmt (gsi_orig);
update_phi = gsi_stmt (gsi_update);
  +  if (scalar_e != NULL)
  +{
  +  

Re: [PATCH] Fix PR58143 and dups

2013-10-17 Thread Richard Biener
On Thu, 17 Oct 2013, Jakub Jelinek wrote:

 On Thu, Oct 17, 2013 at 09:56:31AM +0200, Richard Biener wrote:
  off-noise (more than 5s difference) may be 462.libquantum and 
  459.GemsFDTD.  I didn't include unpatched trunk in the comparison
  (not fixing the bug isn't an option after all).
  
  Conceptually I like the rewriting into unsigned arithmetic more
  so I'm going to apply that variant later today (re-testing 3
  runs of 462.libquantum and 459.GemsFDTD, this time with address-space
  randomization turned off).
 
 Can't we rewrite for the selected arithmetic operations and punt (is that
 what Bernd's patch did?) on moving other arithmetic?  Well, there are
 operations that are safe even for signed types, e.g. rotates; isn't
 RSHIFT_EXPR also safe?
 Can't you handle LSHIFT_EXPR as well (or do we treat it even signed as never
 undefined in the middle-end/backend)?

Sure, we can.  I concentrated on fixing the cases that later cause
issues with aggressively treating the ops during niter analysis
which means ops that SCEV handles (see 
tree-scalar-evolution.c:interpret_rhs_expr).

But I'd rather have testcases here ...

Teaching LIM how to move conditional code would also be nice.

Btw, I don't think we'd lose many optimization opportunities if
we for example don't treat signed shifts or division as invoking
undefined behavior on overflow.  All the overflow business is
important for loop optimizations and thus important only for
the cases that SCEV can handle ...
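
To make the failure mode concrete, an example of the class of code involved
(illustrative only, not the PR testcase):

  int
  f (int x, int n, int flag)
  {
    int s = 0;
    int i;
    for (i = 0; i < n; i++)
      if (flag)
        s = x + 12345;    /* loop-invariant signed addition */
    return s;
  }

If LIM hoists x + 12345 out of the if (flag) guard, a signed overflow that
the original program never executed becomes unconditional undefined behavior;
hoisting it rewritten as (int) ((unsigned) x + 12345u) is safe.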

Richard.


[RESEND] Enable building of libatomic on AArch64

2013-10-17 Thread Michael Hudson-Doyle
Resending as the previous attempt went missing...

  2013-10-04  Michael Hudson-Doyle  michael.hud...@linaro.org

* libatomic/configure.tgt (aarch64*): Remove code preventing
  build.

* gcc/testsuite/lib/target-supports.exp
  (check_effective_target_sync_long_long): AArch64 supports
  atomic operations on long long.
  (check_effective_target_sync_long_long_runtime): AArch64 can
  execute atomic operations on long long.
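
For context, the kind of 64-bit atomic these effective-target checks gate
looks like this (an illustrative example, not part of the patch):

  long long counter;

  long long
  bump (void)
  {
    /* Needs working 64-bit atomics, e.g. LDXR/STXR pairs on AArch64.  */
    return __atomic_add_fetch (&counter, 1LL, __ATOMIC_SEQ_CST);
  }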

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 7eb4dfe..5557c06 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4508,6 +4508,7 @@ proc check_effective_target_sync_int_128_runtime { } {
 proc check_effective_target_sync_long_long { } {
 if { [istarget x86_64-*-*]
	 || [istarget i?86-*-*]
+	 || [istarget aarch64*-*-*]
	 || [istarget arm*-*-*]
	 || [istarget alpha*-*-*]
	 || ([istarget sparc*-*-*] && [check_effective_target_lp64]) } {
@@ -4537,6 +4538,8 @@ proc check_effective_target_sync_long_long_runtime { } {
 		}
 	} 
 	}]
+} elseif { [istarget aarch64*-*-*] } {
+	return 1
 } elseif { [istarget arm*-*-linux-*] } {
 	return [check_runtime sync_longlong_runtime {
	#include <stdlib.h>
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index b9e5d6c..7eaab38 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -95,11 +95,6 @@ fi
 
 # Other system configury
 case ${target} in
-  aarch64*)
-	# This is currently not supported in AArch64.
-	UNSUPPORTED=1
-	;;
-
   arm*-*-linux*)
 	# OS support for atomic primitives.
 	config_path=${config_path} linux/arm posix


Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf

2013-10-17 Thread Richard Earnshaw
On 19/09/13 18:21, Charles Baylis wrote:
 Hi
 
 Here is an updated version.
 
 Changelog:
 
 * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets
 * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails
 with hardfloat, and test is not thumb-specific
 * gcc.target/arm/thumb-ltu.c: Avoid test failure with
 hardfloat ABI by requiring arm_thumb1_ok
 * lib/target-supports.exp
 (check_effective_target_arm_fp16_ok_nocache): don't force
 -mfloat-abi=soft when building for hardfloat target
 

ChangeLogs should be formatted to 80 columns.

Otherwise OK.

R.

 On 19 August 2013 16:34, Richard Earnshaw rearn...@arm.com wrote:
 On 15/08/13 15:10, Charles Baylis wrote:
 Hi

 The attached patch fixes some tests which fail when testing gcc for a
 arm-none-linux-gnueabihf target because they do not expect to be built
 with a hard float ABI.

 The change in target-supports.exp fixes arm-fp16-ops-5.c and 
 arm-fp16-ops-6.c.

 Tested on arm-none-linux-gnueabihf using qemu-arm, and does not cause
 any other tests to break.

 Comments? This is my first patch, so please point out anything wrong.





 2013-08-15  Charles Baylis  charles.bay...@linaro.org

 * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets
 * gcc.dg/tls/pr42894.c: Use -mfloat-abi=soft as Thumb1 does
 not support hardfloat ABI
 * arm/thumb-ltu.c: Use -mfloat-abi=soft as Thumb1 does not
 support hardfloat ABI
 * target-supports.exp: don't force -mfloat-abi=soft when
 building for hardfloat target


 hf-fixes.txt


 Index: gcc/testsuite/gcc.dg/builtin-apply2.c
 ===
 --- gcc/testsuite/gcc.dg/builtin-apply2.c (revision 201726)
 +++ gcc/testsuite/gcc.dg/builtin-apply2.c (working copy)
 @@ -1,6 +1,7 @@
  /* { dg-do run } */
  /* { dg-skip-if "Variadic funcs have all args on stack. Normal funcs have 
 args in registers." { "aarch64*-*-* avr-*-*" } { "*" } { "" } } */
  /* { dg-skip-if "Variadic funcs use Base AAPCS.  Normal funcs use VFP 
 variant." { arm*-*-* } { "-mfloat-abi=hard" } { "" } } */
 +/* { dg-skip-if "Variadic funcs use Base AAPCS.  Normal funcs use VFP 
 variant." { arm*-*-gnueabihf } { "*" } { "-mfloat-abi=soft*" } } */



 As you've noticed, basing the test's behaviour on the config variant
 doesn't work reliably.  The builtin-apply2 test really should be skipped
 if the current test variant is not soft-float.  We already have
 check_effective_target_arm_hf_eabi in target-supports.exp that checks
 whether __ARM_PCS_VFP is defined during a compilation.  So we can replace
 both arm related lines in builtin-apply2 with

  /* { dg-skip-if "Variadic funcs use Base AAPCS.  Normal funcs use VFP
 variant." { arm*-*-* && arm_hf_eabi } { "*" } { "" } } */

  /* PR target/12503 */
  /* Origin: pierre.nguyen-tu...@asim.lip6.fr */
 Index: gcc/testsuite/gcc.dg/tls/pr42894.c
 ===
 --- gcc/testsuite/gcc.dg/tls/pr42894.c(revision 201726)
 +++ gcc/testsuite/gcc.dg/tls/pr42894.c(working copy)
 @@ -1,6 +1,7 @@
  /* PR target/42894 */
  /* { dg-do compile } */
  /* { dg-options "-march=armv5te -mthumb" { target arm*-*-* } } */
 +/* { dg-options "-march=armv5te -mthumb -mfloat-abi=soft" { target 
 arm*-*-*hf } } */
  /* { dg-require-effective-target tls } */


 Although the original PR was for Thumb1, this is a generic test.  I'm
 not convinced that on ARM it should try to force thumb1.  Removing the
 original dg-options line should solve the problem and we then get better
 multi-lib testing as well.

  extern __thread int t;
 Index: gcc/testsuite/gcc.target/arm/thumb-ltu.c
 ===
 --- gcc/testsuite/gcc.target/arm/thumb-ltu.c  (revision 201726)
 +++ gcc/testsuite/gcc.target/arm/thumb-ltu.c  (working copy)
 @@ -1,6 +1,6 @@
  /* { dg-do compile } */
  /* { dg-skip-if "incompatible options" { arm*-*-* } { "-march=*" } { 
 "-march=armv6" "-march=armv6j" "-march=armv6z" } } */
 -/* { dg-options "-mcpu=arm1136jf-s -mthumb -O2" } */
 +/* { dg-options "-mcpu=arm1136jf-s -mthumb -O2 -mfloat-abi=soft" } */


 This won't work if there's an explict -mfloat-abi={softfp,hard} on the
 multilib options.  Probably the best thing to do here is to skip the
 test if arm_thumb1_ok is not true.

  void f(unsigned a, unsigned b, unsigned c, unsigned d)
  {
 Index: gcc/testsuite/lib/target-supports.exp
 ===
 --- gcc/testsuite/lib/target-supports.exp (revision 201726)
 +++ gcc/testsuite/lib/target-supports.exp (working copy)
 @@ -2445,6 +2445,11 @@
   # Must generate floating-point instructions.
   return 0
  }
 +if [check-flags [list "" { *-*-gnueabihf } { "*" } { "" } ]] {
 +# Use existing float-abi and force an fpu which supports fp16

 This should use arm_hf_eabi as described above.

 + set et_arm_fp16_flags "-mfpu=vfpv4"
 + return 1;
 +

[Ada] Clean ups in SPARK mode

2013-10-17 Thread Arnaud Charlet
1) Remove special expansion of NOT IN operator in SPARK verification

The special expansion for NOT IN operator in the SPARK formal verification
mode is not needed anymore. Now removed.

2) Document additional requirements on tree for SPARK verification

Formal verification of SPARK code is done in a special mode, in which
the tree generated by the frontend must respect additional requirements.
These are now described in a special section of sinfo.ads

2013-10-17  Yannick Moy  m...@adacore.com

* exp_spark.adb (Expand_SPARK): Remove special case for NOT IN
operation.
* sinfo.ads: Add special comment section to describe SPARK mode
effect on tree.
* exp_spark.ads: Remove comments, moved to sinfo.ads.

Index: exp_spark.adb
===
--- exp_spark.adb   (revision 203568)
+++ exp_spark.adb   (working copy)
@@ -25,7 +25,6 @@
 
 with Atree;use Atree;
 with Einfo;use Einfo;
-with Exp_Ch4;  use Exp_Ch4;
 with Exp_Dbug; use Exp_Dbug;
 with Exp_Util; use Exp_Util;
 with Sem_Aux;  use Sem_Aux;
@@ -80,12 +79,6 @@
   N_Identifier    =>
 Expand_Potential_Renaming (N);
 
- --  A NOT IN B gets transformed to NOT (A IN B). This is the same
- --  expansion used in the normal case, so shared the code.
-
- when N_Not_In =>
-Expand_N_Not_In (N);
-
  when N_Object_Renaming_Declaration =>
 Expand_SPARK_N_Object_Renaming_Declaration (N);
 
Index: exp_spark.ads
===
--- exp_spark.ads   (revision 203568)
+++ exp_spark.ads   (working copy)
@@ -30,54 +30,6 @@
 
 --  Expand_SPARK is called directly by Expander.Expand.
 
---  SPARK expansion has three main objectives:
-
---1. Perform limited expansion to explicit some Ada rules and constructs
---   (translate 'Old and 'Result, replace renamings by renamed, insert
---conversions, expand actuals in calls to introduce temporaries, expand
---generics instantiations)
-
---2. Facilitate treatment for the formal verification back-end (fully
---   qualify names, expand set membership, compute data dependences)
-
---3. Avoid the introduction of low-level code that is difficult to analyze
---   formally, as typically done in the full expansion for high-level
---   constructs (tasking, dispatching)
-
---  To fulfill objective 1, Expand_SPARK selectively expands some constructs.
-
---  To fulfill objective 2, the tree after SPARK expansion should be fully
---  analyzed semantically. In particular, all expression must have their proper
---  type, and semantic links should be set between tree nodes (partial to full
---  view, etc.) Some kinds of nodes should be either absent, or can be ignored
---  by the formal verification backend:
-
---  N_Object_Renaming_Declaration: can be ignored safely
---  N_Expression_Function: absent (rewitten)
---  N_Expression_With_Actions: absent (not generated)
-
---  SPARK cross-references are generated from the regular cross-references
---  (used for browsing and code understanding) and additional references
---  collected during semantic analysis, in particular on all
---  dereferences. These SPARK cross-references are output in a separate section
---  of ALI files, as described in spark_xrefs.adb. They are the basis for the
---  computation of data dependences in the formal verification backend. This
---  implies that all cross-references should be generated in this mode, even
---  those that would not make sense from a user point-of-view, and that
---  cross-references that do not lead to data dependences for subprograms can
---  be safely ignored.
-
---  To support the formal verification of units parameterized by data, the
---  value of deferred constants should not be considered as a compile-time
---  constant at program locations where the full view is not visible.
-
---  To fulfill objective 3, Expand_SPARK does not expand features that are not
---  formally analyzed (tasking), or for which formal analysis relies on the
---  source level representation (dispatching, aspects, pragmas). However, these
---  should be semantically analyzed, which sometimes requires the insertion of
---  semantic pre-analysis, for example for subprogram contracts and pragma
---  check/assert.
-
 with Types; use Types;
 
 package Exp_SPARK is
Index: sinfo.ads
===
--- sinfo.ads   (revision 203568)
+++ sinfo.ads   (working copy)
@@ -508,6 +508,48 @@
--  simply ignore these nodes, since they are not relevant to the task
--  of back annotating representation information.
 
+   
+   -- SPARK Mode --
+   
+
+   --  When a file is compiled in SPARK mode (-gnatd.F), a very light expansion
+   --  is performed and the analysis must generate a tree in a form 

[Ada] Fix generation of references for SPARK formal verification

2013-10-17 Thread Arnaud Charlet
The generation of references for SPARK formal verification was missing some
write references through renamings. This is now fixed.

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Yannick Moy  m...@adacore.com

* sem_ch8.adb (Find_Direct_Name): Keep track of assignments for
renamings in SPARK mode.

Index: sem_ch8.adb
===
--- sem_ch8.adb (revision 203568)
+++ sem_ch8.adb (working copy)
@@ -5073,9 +5073,14 @@
 --  Entity is unambiguous, indicate that it is referenced here
 
 --  For a renaming of an object, always generate simple reference,
---  we don't try to keep track of assignments in this case.
+--  we don't try to keep track of assignments in this case, except
+--  in SPARK mode where renamings are traversed for generating
+--  local effects of subprograms.
 
-if Is_Object (E) and then Present (Renamed_Object (E)) then
+if Is_Object (E)
+  and then Present (Renamed_Object (E))
+  and then not SPARK_Mode
+then
Generate_Reference (E, N);
 
--  If the renamed entity is a private protected component,


[Ada] Remove useless special handling for SPARK verification

2013-10-17 Thread Arnaud Charlet
The use of a specific light expansion for SPARK verification has rendered
obsolete a number of special handling cases only triggered in the normal
full expansion. Remove these useless cases now.

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Yannick Moy  m...@adacore.com

* exp_ch3.adb (Expand_Freeze_Class_Wide_Type,
Expand_Freeze_Class_Wide_Type, Expand_Freeze_Class_Wide_Type):
Remove useless special cases.
* exp_ch4.adb (Expand_Allocator_Expression, Expand_N_Allocator,
Expand_N_Op_Expon): Remove useless special cases.
* exp_ch6.adb (Is_Build_In_Place_Function_Call): Disable build-in-place
in SPARK mode by testing Full_Expander_Active instead of
Expander_Active.
(Make_Build_In_Place_Call_In_Allocator): Remove useless special case.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Remove
useless special case.
* sem_eval.adb (Compile_Time_Known_Value): Remove special handling of
deferred constant.

Index: exp_util.adb
===
--- exp_util.adb(revision 203568)
+++ exp_util.adb(working copy)
@@ -560,13 +560,6 @@
--  Start of processing for Build_Allocate_Deallocate_Proc
 
begin
-  --  Do not perform this expansion in SPARK mode because it is not
-  --  necessary.
-
-  if SPARK_Mode then
- return;
-  end if;
-
   --  Obtain the attributes of the allocation / deallocation
 
   if Nkind (N) = N_Free_Statement then
Index: exp_ch4.adb
===
--- exp_ch4.adb (revision 203568)
+++ exp_ch4.adb (working copy)
@@ -1268,14 +1268,10 @@
 --* .NET/JVM - these targets do not support address arithmetic
 --and unchecked conversion, key elements of Finalize_Address.
 
---* SPARK mode - the call is useless and results in unwanted
---expansion.
-
 --* CodePeer mode - TSS primitive Finalize_Address is not
 --created in this mode.
 
 if VM_Target = No_VM
-  and then not SPARK_Mode
   and then not CodePeer_Mode
   and then Present (Finalization_Master (PtrT))
   and then Present (Temp_Decl)
@@ -4295,16 +4291,13 @@
  end if;
 
  --  The finalization master must be inserted and analyzed as part of
- --  the current semantic unit. This form of expansion is not carried
- --  out in SPARK mode because it is useless. Note that the master is
- --  updated when analysis changes current units.
+ --  the current semantic unit. Note that the master is updated when
+ --  analysis changes current units.
 
- if not SPARK_Mode then
-if Present (Rel_Typ) then
-   Set_Finalization_Master (PtrT, Finalization_Master (Rel_Typ));
-else
-   Set_Finalization_Master (PtrT, Current_Anonymous_Master);
-end if;
+ if Present (Rel_Typ) then
+Set_Finalization_Master (PtrT, Finalization_Master (Rel_Typ));
+ else
+Set_Finalization_Master (PtrT, Current_Anonymous_Master);
  end if;
   end if;
 
@@ -4839,15 +4832,11 @@
  --Set_Finalize_Address
  --  (PtrTFM, TFD'Unrestricted_Access);
 
- --  Do not generate this call in the following cases:
- --
- --* SPARK mode - the call is useless and results in
- --unwanted expansion.
- --
- --* CodePeer mode - TSS primitive Finalize_Address is
- --not created in this mode.
+ --  Do not generate this call in CodePeer mode, as TSS
+ --  primitive Finalize_Address is not created in this
+ --  mode.
 
- elsif not (SPARK_Mode or CodePeer_Mode) then
+ elsif not CodePeer_Mode then
 Insert_Action (N,
   Make_Set_Finalize_Address_Call
 (Loc = Loc,
@@ -7321,9 +7310,9 @@
begin
   Binary_Op_Validity_Checks (N);
 
-  --  CodePeer and GNATprove want to see the unexpanded N_Op_Expon node
+  --  CodePeer wants to see the unexpanded N_Op_Expon node
 
-  if CodePeer_Mode or SPARK_Mode then
+  if CodePeer_Mode then
  return;
   end if;
 
Index: exp_ch6.adb
===
--- exp_ch6.adb (revision 203568)
+++ exp_ch6.adb (working copy)
@@ -9599,7 +9599,11 @@
   --  disabled (such as with -gnatc) since those would trip over the raise
   --  of Program_Error below.
 
-  if not Expander_Active then
+  --  In SPARK mode, build-in-place calls are not 

Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf

2013-10-17 Thread Richard Earnshaw
On 17/10/13 11:32, Richard Earnshaw wrote:
 On 19/09/13 18:21, Charles Baylis wrote:
 Hi

 Here is an updated version.

 Changelog:

 * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets
 * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails
 with hardfloat, and test is not thumb-specific
  * gcc.target/arm/thumb-ltu.c: Avoid test failure with
 hardfloat ABI by requiring arm_thumb1_ok
 * lib/target-supports.exp
 (check_effective_target_arm_fp16_ok_nocache): don't force
 -mfloat-abi=soft when building for hardfloat target

 
 ChangeLogs should be formatted to 80 columns.
 

Oh, and they should be complete sentences.  So start with a capital
letter, and end with a full stop.

R.

 Otherwise OK.
 
 R.
 

[Ada] Add switch to output reason why spec requires body

2013-10-17 Thread Arnaud Charlet
The warning switch -gnatw.y(.Y) activates(deactivates) a mode in which
information messages are given that show why a package spec requires a
body. This can be useful if you have a large package which unexpectedly
requires a body.

 1. package ReqBody is
|
 info: ReqBody requires body (Elaborate_Body)

 2.pragma Elaborate_Body;
 3.A : Integer;
 4.B : Integer;
 5.procedure K;
 |
 info: ReqBody requires body (K requires completion)

 6. end ReqBody;

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Robert Dewar  de...@adacore.com

* gnat_ugn.texi: Document -gnatw.y/-gnatw.Y.
* opt.ads (List_Body_Required_Info): New flag.
* prep.adb: Minor reformatting.
* sem_ch7.adb (Unit_Requires_Body_Info): New procedure.
(Analyze_Package_Specification): Add call to Unit_Requires_Body_Info.
* ug_words: Add entries for -gnatw.y and -gnatw.Y.
* usage.adb: Add line for new warning switch -gnatw.y/.Y.
* vms_data.ads: Add entry for [NO_]WHY_SPEC_NEEDS_BODY warning
qualifier.
* warnsw.ads, warnsw.adb: Implement new warning switch -gnatw.y/.Y.

Index: sem_ch7.adb
===
--- sem_ch7.adb (revision 203598)
+++ sem_ch7.adb (working copy)
@@ -136,6 +136,11 @@
--  inherited private operation has been overridden, then it's replaced by
--  the overriding operation.
 
+   procedure Unit_Requires_Body_Info (P : Entity_Id);
+   --  Outputs info messages showing why package specification P requires a
+   --  body. Caller has checked that the switch requesting this information
+   --  is set, and that the package does indeed require a body.
+
   --------------------------
   -- Analyze_Package_Body --
   --------------------------
@@ -1515,6 +1520,15 @@
   (\pragma Elaborate_Body is required in this case, P);
  end;
   end if;
+
+  --  If switch set, output information on why body required
+
+  if List_Body_Required_Info
+and then In_Extended_Main_Source_Unit (Id)
+and then Unit_Requires_Body (Id)
+  then
+ Unit_Requires_Body_Info (Id);
+  end if;
end Analyze_Package_Specification;
 
--
@@ -1686,8 +1700,8 @@
   and then No (Interface_Alias (Node (Op_Elmt_2)))
 then
--  The private inherited operation has been
-   --  overridden by an explicit subprogram: replace
-   --  the former by the latter.
+   --  overridden by an explicit subprogram:
+   --  replace the former by the latter.
 
New_Op := Node (Op_Elmt_2);
Replace_Elmt (Op_Elmt, New_Op);
@@ -2748,4 +2762,135 @@
   return False;
end Unit_Requires_Body;
 
+   -----------------------------
+   -- Unit_Requires_Body_Info --
+   -----------------------------
+
+   procedure Unit_Requires_Body_Info (P : Entity_Id) is
+  E : Entity_Id;
+
+   begin
+  --  Imported entity never requires body. Right now, only subprograms can
+  --  be imported, but perhaps in the future we will allow import of
+  --  packages.
+
+  if Is_Imported (P) then
+ return;
+
+  --  Body required if library package with pragma Elaborate_Body
+
+  elsif Has_Pragma_Elaborate_Body (P) then
+ Error_Msg_N
+   ("?Y?info: & requires body (Elaborate_Body)", P);
+
+  --  Body required if subprogram
+
+  elsif Is_Subprogram (P) or else Is_Generic_Subprogram (P) then
+ Error_Msg_N ("?Y?info: & requires body (subprogram case)", P);
+
+  --  Body required if generic parent has Elaborate_Body
+
+  elsif Ekind (P) = E_Package
+and then Nkind (Parent (P)) = N_Package_Specification
+and then Present (Generic_Parent (Parent (P)))
+  then
+ declare
+G_P : constant Entity_Id := Generic_Parent (Parent (P));
+ begin
+if Has_Pragma_Elaborate_Body (G_P) then
+   Error_Msg_N
+ ("?Y?info: & requires body (generic parent Elaborate_Body)",
+  P);
+end if;
+ end;
+
+  --  A [generic] package that introduces at least one non-null abstract
+  --  state requires completion. However, there is a separate rule that
+  --  requires that such a package have a reason other than this for a
+  --  body being required (if necessary a pragma Elaborate_Body must be
+  --  provided). If Ignore_Abstract_State is True, we don't do this check
+  --  (so we can use Unit_Requires_Body to check for some other reason).
+
+  elsif Ekind_In (P, E_Generic_Package, E_Package)
+and then Present (Abstract_States (P))
+and then
+  not 

Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf

2013-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2013 at 11:32:42AM +0100, Richard Earnshaw wrote:
 On 19/09/13 18:21, Charles Baylis wrote:
  Here is an updated version.
  
  Changelog:
  
  * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets
  * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails
  with hardfloat, and test is not thumb-specific
  * gcc,target/arm/thumb-ltu.c: Avoid test failure with
  hardfloat ABI by requiring arm_thumb1_ok
  * lib/target-supports.exp
  (check_effective_target_arm_fp16_ok_nocache): don't force
  -mfloat-abi=soft when building for hardfloat target
  
 
 ChangeLogs should be formatted to 80 columns.

Not only that.  The descriptions should start with capital letter
and end with a dot.  For pr42894.c, you are not removing options,
you are removing dg-options, and the rest is why, not what, so doesn't
belong into the ChangeLog description.  Similarly, for thumb-ltu.c,
what you are doing is removing dg-skip-if and requiring effective target
arm_thumb1_ok.

Jakub


Re: RFC: Add of type-demotion pass

2013-10-17 Thread Richard Biener
On Wed, Oct 16, 2013 at 6:33 PM, Jeff Law l...@redhat.com wrote:
 On 10/16/13 03:31, Richard Biener wrote:

 I see two primary effects of type sinking.


 Note it was called type demotion ;)

 ;)  It's a mental block of mine; it's been called type hoisting/sinking in
 various contexts and I see parallels between the code motion algorithms and
 how the type promotion/demotion exposes unnecessary type conversions.  So I
 keep calling them hoisting/sinking.  I'll try to use promotion/demotion.





   First and probably the most
 important in my mind is by sinking a cast through its uses the various
 transformations we already perform are more likely to apply *without*
 needing to handle optimizing through typecasts explicitly.


 I would say it is desirable to express arithmetic in the smallest possible
 types (see what premature optimization the C family frontends do
 to narrow operations again after C integer promotion is applied).

 I don't see this as the major benefit of type demotion.  Yes, there is some
 value in shrinking constants and the like, but in my experience the benefits
 are relatively small and often get lost in things like partial register
 stalls on x86, the PA and probably others (yes, the PA has partial register
 stalls, it's just that nobody used that term).

 What I really want to get at here is avoiding having a large number of
 optimizers looking back through the use-def chains and attempting to elide
 typecasts in the middle of a chain of statements of interest.

Hmm, off the top of my head only forwprop and VRP look back through
use-def chains to elide typecasts.  And they do that to optimize those
casts, thus it is their job ...?  Other cases are around, but those
are of the sorts of is op1 available in type X and/or can I safely cast
it to type X? that code isn't going to be simplified by generic
promotion / demotion because that code isn't going to know what
type pass Y in the end wants.

Abstracting functions that can answer those questions instead of
repeating N variants of it would of course be nice.

Likewise reducing the number of places we perform promotion / demotion
(remove it from frontend code and fold, add it in the GIMPLE combiner).

Also making the GIMPLE combiner available as an utility to apply
to a single statement (see my very original GIMPLE-fold proposal)
would be very useful.

As for promotion / demotion (if you are not talking about applying
PROMOTE_MODE which rather forces promotion of variables and
requires inserting compensation code), you want to optimize

 op1 = (T) op1';
 op2 = (T) op2';
 x = op1 OP op2; (*)
 y = (T2) x;

to either carry out OP in type T2 or in a type derived from the types
of op1' and op2'.
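
For instance, with T = int and T2 = short the input pattern is (a sketch):

  short
  f (short a, short b)
  {
    int x = (int) a + (int) b;   /* op1 OP op2 carried out in T = int */
    return (short) x;            /* y = (T2) x */
  }

Here the addition can be demoted to the narrower type (best done in
unsigned short, so no signed overflow is introduced that the original
sequence did not have).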

For the simple case combine-like pattern matching is ok.  It gets
more complicated if there are a series of statements here (*), but
even that case is handled by iteratively applying the combiner
patterns (which forwprop does).

If you split out promotion / demotion into a separate pass then
you introduce pass ordering issues as combining may introduce
promotion / demotion opportunities and the other way around.

That wouldn't apply to a pass lowering GIMPLE to fully honor
PROMOTE_MODE.

 You need some kind of range information to do this, thus either integrate
 it into VRP (there is already code that does this there) or use range
 information from VRP which we now preserve.

 If the primary goal is to shrink types, then yes, you want to use whatever
 information you can, including VRP.  But that's not the primary goal in my
 mind, at least not at this stage.

 There's no reason why this pass couldn't utilize VRP information to provide
 more opportunities to demote types and achieve the goal you want.  But I'd
 consider that a follow-on opportunity.






 The second primary effect is, given two casts where the first indirectly
 feeds the second (ie, the first feeds some statement, which then feeds
 the
 second cast), if we're able to sink the first cast, we end up with the
 first
 cast directly feeding the second cast.  When this occurs one of the two
 casts can often be eliminated.   Sadly, I didn't keep any of those test
 files, but I regularly saw them in GCC bootstraps.


 This transformation is applied both by fold-const.c and by SSA forwprop
 (our GIMPLE combiner).  Doing it in yet another pass looks wrong
 (and it isn't type demotion but also can be promotion).

 Yes, I know.  And we need to get this back down to a single implementation.
 I don't much care which of the 3 implementations we keep, but it really
 should just be one and it needs to be reusable.

 I probably should have stated this differently -- the second primary effect
 is to expose more cases where type conversions can be eliminated via type
 promotion/demotion.  I don't much care which of the 3 blobs of code to
 eliminate the conversions we use -- I do care that we've got a consistent
 way to promote/demote conversions to expose the unnecessary type
 conversions.

Sure.

 

[Ada] Scalar_Storage_Order consistency for nested composites

2013-10-17 Thread Arnaud Charlet
Scalar_Storage_Order must be consistent between any component of a composite
type (array or record) and the composite type itself. We already enforced
this in the case where the enclosing type has a Scalar_Storage_Order
attribute definition and the component type has none. We now do so also
in the opposite case, when the enclosing type has no Scalar_Storage_Order
clause and any component does have one.

The following compilation must be rejected with the indicated error messages:

$ gcc -c nat_comp_rev_el.ads
nat_comp_rev_el.ads:26:04: composite type must have explicit scalar storage order
nat_comp_rev_el.ads:27:04: composite type must have explicit scalar storage order
nat_comp_rev_el.ads:29:07: composite type must have explicit scalar storage order
nat_comp_rev_el.ads:34:07: composite type must have explicit scalar storage order

with System; use System;
package Nat_Comp_Rev_El is
   type U32 is mod 2**32;

   type LE_Record is record
  X : U32;
   end record;
   for LE_Record use record
  X at 0 range 0 .. 31;
   end record;
   for LE_Record'Bit_Order use Low_Order_First;
   for LE_Record'Scalar_Storage_Order use Low_Order_First;

   type BE_Record is record
  X : U32;
   end record;
   for BE_Record use record
  X at 0 range 0 .. 31;
   end record;
   for BE_Record'Bit_Order use High_Order_First;
   for BE_Record'Scalar_Storage_Order use High_Order_First;

   --  Reject the below declarations: the component type has an explicit SSO,
   --  so we also require one on the enclosing composite type.

   type Two_LE is array (1 .. 2) of LE_Record;
   type Two_BE is array (1 .. 2) of BE_Record;
   type Rec_LE is record
  Comp : LE_Record;
  X: Integer;
   end record;

   type Rec_BE is record
  Comp : BE_Record;
  X: Integer;
   end record;

end Nat_Comp_Rev_El;

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Thomas Quinot  qui...@adacore.com

* freeze.adb (Check_Component_Storage_Order): Reject a record or
array type that does not have an explicit Scalar_Storage_Order
attribute definition if a component of the record, or the
elements of the array, have one.
* gnat_rm.texi (attribute Scalar_Storage_Order): Document the above
rule.

Index: gnat_rm.texi
===
--- gnat_rm.texi(revision 203568)
+++ gnat_rm.texi(working copy)
@@ -8727,6 +8727,10 @@
 if the component does not start on a byte boundary, then the scalar storage
 order specified for S and for the nested component type shall be identical.
 
+If @var{S} appears as the type of a record or array component, the enclosing
+record or array shall also have a @code{Scalar_Storage_Order} attribute
+definition clause.
+
 No component of a type that has a @code{Scalar_Storage_Order} attribute
 definition may be aliased.
 
Index: freeze.adb
===
--- freeze.adb  (revision 203568)
+++ freeze.adb  (working copy)
@@ -92,11 +92,15 @@
 
procedure Check_Component_Storage_Order
  (Encl_Type : Entity_Id;
-  Comp  : Entity_Id);
+  Comp  : Entity_Id;
+  ADC   : Node_Id);
--  For an Encl_Type that has a Scalar_Storage_Order attribute definition
-   --  clause, verify that the component type is compatible. For arrays,
-   --  Comp is Empty; for records, it is the entity of the component under
-   --  consideration.
+   --  clause, verify that the component type has an explicit and compatible
+   --  attribute/aspect. For arrays, Comp is Empty; for records, it is the
+   --  entity of the component under consideration. For an Encl_Type that
+   --  does not have a Scalar_Storage_Order attribute definition clause,
+   --  verify that the component also does not have such a clause.
+   --  ADC is the attribute definition clause if present (or Empty).
 
procedure Check_Strict_Alignment (E : Entity_Id);
--  E is a base type. If E is tagged or has a component that is aliased
@@ -1068,11 +1072,12 @@
 
procedure Check_Component_Storage_Order
  (Encl_Type : Entity_Id;
-  Comp  : Entity_Id)
+  Comp  : Entity_Id;
+  ADC   : Node_Id)
is
   Comp_Type : Entity_Id;
+  Comp_ADC  : Node_Id;
   Err_Node  : Node_Id;
-  ADC   : Node_Id;
 
   Comp_Byte_Aligned : Boolean;
   --  Set True for the record case, when Comp starts on a byte boundary
@@ -1113,11 +1118,24 @@
   --  the attribute definition clause is attached to the first subtype.
 
   Comp_Type := Base_Type (Comp_Type);
-  ADC := Get_Attribute_Definition_Clause
-   (First_Subtype (Comp_Type),
-Attribute_Scalar_Storage_Order);
+  Comp_ADC := Get_Attribute_Definition_Clause
+(First_Subtype (Comp_Type),
+ Attribute_Scalar_Storage_Order);
 
-  if Is_Record_Type (Comp_Type) or else Is_Array_Type 

[Ada] Check for illegal global refs to abstract state when refinement visible

2013-10-17 Thread Arnaud Charlet
This implements the rule in SPARK RM (6.1.5(4)): A global item shall
not denote a state abstraction whose refinement is visible (a state
abstraction cannot be named within its enclosing package's body other
than in its refinement).

The following is compiled with -gnatd.V

 1. package Depends_Illegal
 2.with Abstract_State => A
 3. is
 4.pragma Elaborate_Body (Depends_Illegal);
 5. end Depends_Illegal;

 1. package body Depends_Illegal
 2.with Refined_State => (A => (X, Y))
 3. is
 4.X, Y : Natural := 0;
 5.function F1 (Par1 : Natural) return Natural
 6.   with Global  => A,
  |
 global reference to A not allowed (SPARK RM 6.1.5(4))
 refinement of A is visible at line 2

 7.Depends => (F1'Result => A,
|
 global reference to A not allowed (SPARK RM 6.1.5(4))
 refinement of A is visible at line 2

 8.null  => Par1)
 9.is
10.begin
11.   return X;
12.end F1;
13. end Depends_Illegal;

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Robert Dewar  de...@adacore.com

* einfo.ads, einfo.adb (Has_Body_References): New flag.
(Body_References): New field.
* sem_prag.adb (Record_Possible_Body_Reference): New procedure.
(Analyze_Input_Output): Call Record_Possible_Body_Reference.
(Analyze_Global_Item): Call Record_Possible_Body_Reference.
(Analyze_Refinement_Clause): Output messages if illegal global refs.

Index: einfo.adb
===
--- einfo.adb   (revision 203568)
+++ einfo.adb   (working copy)
@@ -132,6 +132,7 @@
--String_Literal_Low_BoundNode15
 
--Access_Disp_Table   Elist16
+   --Body_References Elist16
--Cloned_Subtype  Node16
--DTC_Entity  Node16
--Entry_FormalNode16
@@ -552,8 +553,8 @@
--Has_Delayed_Rep_Aspects Flag261
--May_Inherit_Delayed_Rep_Aspects Flag262
--Has_Visible_Refinement  Flag263
+   --Has_Body_References Flag264
 
-   --(unused)Flag264
--(unused)Flag265
--(unused)Flag266
--(unused)Flag267
@@ -733,6 +734,12 @@
   return Flag40 (Id);
end Body_Needed_For_SAL;
 
+   function Body_References (Id : E) return L is
+   begin
+  pragma Assert (Ekind (Id) = E_Abstract_State);
+  return Elist16 (Id);
+   end Body_References;
+
function C_Pass_By_Copy (Id : E) return B is
begin
   pragma Assert (Is_Record_Type (Id));
@@ -1294,6 +1301,12 @@
   return Flag139 (Id);
end Has_Biased_Representation;
 
+   function Has_Body_References (Id : E) return B is
+   begin
+  pragma Assert (Ekind (Id) = E_Abstract_State);
+  return Flag264 (Id);
+   end Has_Body_References;
+
function Has_Completion (Id : E) return B is
begin
   return Flag26 (Id);
@@ -3336,6 +3349,12 @@
   Set_Flag40 (Id, V);
end Set_Body_Needed_For_SAL;
 
+   procedure Set_Body_References (Id : E; V : L) is
+   begin
+  pragma Assert (Ekind (Id) = E_Abstract_State);
+  Set_Elist16 (Id, V);
+   end Set_Body_References;
+
procedure Set_C_Pass_By_Copy (Id : E; V : B := True) is
begin
   pragma Assert (Is_Record_Type (Id) and then Is_Base_Type (Id));
@@ -3909,6 +3928,12 @@
   Set_Flag139 (Id, V);
end Set_Has_Biased_Representation;
 
+   procedure Set_Has_Body_References (Id : E; V : B := True) is
+   begin
+  pragma Assert (Ekind (Id) = E_Abstract_State);
+  Set_Flag264 (Id, V);
+   end Set_Has_Body_References;
+
procedure Set_Has_Completion (Id : E; V : B := True) is
begin
   Set_Flag26 (Id, V);
@@ -7984,6 +8009,7 @@
   W ("Has_Anonymous_Master",Flag253 (Id));
   W ("Has_Atomic_Components",   Flag86  (Id));
   W ("Has_Biased_Representation",   Flag139 (Id));
+  W ("Has_Body_References", Flag264 (Id));
   W ("Has_Completion",  Flag26  (Id));
   W ("Has_Completion_In_Body",  Flag71  (Id));
   W ("Has_Complex_Representation",  Flag140 (Id));
@@ -8672,6 +8698,10 @@
procedure Write_Field16_Name (Id : Entity_Id) is
begin
   case Ekind (Id) is
+
+ when E_Abstract_State =>
+Write_Str ("Body_References");
+
 when E_Record_Type|
  E_Record_Type_With_Private   =>
Write_Str ("Access_Disp_Table");
Index: einfo.ads
===
--- einfo.ads   (revision 203568)
+++ einfo.ads   (working copy)
@@ -493,6 +493,12 @@
 --   

Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.

2013-10-17 Thread Kirill Yukhin
Hello,

On 16 Oct 09:59, Richard Henderson wrote:
 On 10/16/2013 09:07 AM, Kirill Yukhin wrote:
  I suspect gen_lowpart is a bad choice when reload is completed, since
  it can create a new pseudo: gen_lowpart () may call
  gen_reg_rtx (), which contains the corresponding gcc_assert ().
 
 False.  gen_lowpart is perfectly safe post-reload.
 Indeed, taking the subreg of a hard register should arrive
 
   x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset);
 
 in simplify_subreg.
 
 Have you encountered some specific problem with gen_lowpart?
Yes.

Patch [8/8] contains testsuite for AVX-512.
This pattern is covered as well.

When trying to do so:

(define_insn_and_split "vec_extract_lo_v32hi"
  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,m")
(vec_select:V16HI
  (match_operand:V32HI 1 "nonimmediate_operand" "vm,v")
  (parallel [(const_int 0) (const_int 1)
 (const_int 2) (const_int 3)
 (const_int 4) (const_int 5)
 (const_int 6) (const_int 7)
 (const_int 8) (const_int 9)
 (const_int 10) (const_int 11)
 (const_int 12) (const_int 13)
 (const_int 14) (const_int 15)])))]
  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  rtx op1 = operands[1];
  op1 = gen_lowpart (V16HImode, op1);
  emit_move_insn (operands[0], op1);
  DONE;
})

I've got ICE, with following bt:

#1  0x006f28d6 in gen_reg_rtx (mode=V32HImode) at 
/export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:866
#2  0x0070243a in copy_to_reg (x=(reg:V32HI 21 xmm0 [163]))
at /export/users/kyukhin/gcc/git/gcc/gcc/explow.c:606
#3  0x0091dfb8 in gen_lowpart_general (mode=V16HImode, x=<optimized out>)
at /export/users/kyukhin/gcc/git/gcc/gcc/rtlhooks.c:50
#4  0x00ce16e8 in gen_split_4943 (curr_insn=<optimized out>, operands=0x16f6320)
at /export/users/kyukhin/gcc/git/gcc/gcc/config/i386/sse.md:6329
#5  0x006f7865 in try_split (pat=(set (reg:V16HI 23 xmm2 [164])
(vec_select:V16HI (reg:V32HI 21 xmm0 [163])
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
(const_int 8 [0x8])
(const_int 9 [0x9])
(const_int 10 [0xa])
(const_int 11 [0xb])
(const_int 12 [0xc])
(const_int 13 [0xd])
(const_int 14 [0xe])
(const_int 15 [0xf])
]))), trial=(insn 48 46 49 6 (set (reg:V16HI 23 xmm2 [164])
(vec_select:V16HI (reg:V32HI 21 xmm0 [163])
(parallel [
(const_int 0 [0])
(const_int 1 [0x1])
(const_int 2 [0x2])
(const_int 3 [0x3])
(const_int 4 [0x4])
(const_int 5 [0x5])
(const_int 6 [0x6])
(const_int 7 [0x7])
(const_int 8 [0x8])
(const_int 9 [0x9])
(const_int 10 [0xa])
(const_int 11 [0xb])
(const_int 12 [0xc])
(const_int 13 [0xd])
(const_int 14 [0xe])
(const_int 15 [0xf])
]))) 
/export/users/kyukhin/gcc/git/gcc/gcc/testsuite/gcc.target/i386/avx512f-vec-unpack.c:24 2151 {vec_extract_lo_v32hi}
 (nil)), last=<optimized out>)
at /export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:3467

So, we have:
[rtlhooks.c:50] gen_lowpart_general () ->
[explow.c:606] copy_to_reg () ->
[emit-rtl.c:866] gen_reg_rtx (): gcc_assert (can_create_pseudo_p ());

Maybe the code in the pattern is buggy? Or is it a gen_lowpart?
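
For what it's worth, one way to avoid the copy_to_reg path in the splitter
would be to build the subreg explicitly (a sketch, untested; whether that
is the right fix here is exactly the question):

  rtx op1 = simplify_gen_subreg (V16HImode, operands[1], V32HImode,
 subreg_lowpart_offset (V16HImode, V32HImode));
  emit_move_insn (operands[0], op1);
  DONE;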

--
Thanks, K


Re: alias fix for PR58685

2013-10-17 Thread Richard Biener
On Wed, Oct 16, 2013 at 7:53 PM, Bernd Schmidt ber...@codesourcery.com wrote:
 The sequence of events here can be summarized as shrink-wrapping causes
 the i386 backend to do something that confuses alias analysis. The
 miscompilation is that two instructions are swapped by the scheduler
 when they shouldn't be, due to an incorrect reg_base_value.

 The miscompiled function has the following parts:
  * a loop which uses %rdi as an incoming argument register
  * following that, a decrement of the stack pointer (shrink-wrapped
to this point rather than the start of the function)
  * the rest of the function which has one set of %rdi to
(symbol_ref LC0).

 The argument register %rdi is dead by the point where we decrement the
 stack pointer. The i386 backend splits the sub into a sequence of two
 insns, a clobber of %rdi and a push of it (which register is chosen
 appears to be somewhat random).

 When called from the scheduler, init_alias_analysis incorrectly deduces
 that %rdi has a base value of (symbol_ref LC0). Although it tries to
 track which registers are argument registers that may hold a pointer,
 this information is lost when we encounter the clobber. The main part of
 the following patch is to modify that piece of code to also set reg_seen
 for argument registers when encountering a clobber.

 There are other problems in this area, one of which showed up while
 testing an earlier patch. i386/pr57003.c demonstrates that return values
 of FUNCTION_ARG_REGNO are not constant across the whole compilation;
 they are affected by the ms_abi attribute. This necessitates changing
 from a static_reg_base_value array to a per-function one. Once that is
 done, it's better to look at DECL_ARGUMENTS in a similar way to what
 combine.c does to identify the arguments of the specific function we're
 compiling rather than using the less specific FUNCTION_ARG_REGNO.
 Lastly, I modified a test for Pmode to use the valid_pointer_mode hook
 which should be a little more correct.

 Bootstrapped and tested on x86_64-linux. Ok?

+ if (!bitmap_bit_p (reg_seen, regno))
+   {
+ /* We have to make an exception here for an argument
+register, in case it is used before the clobber.  */
+ gcc_assert (new_reg_base_value[regno] == arg_base_value);
+ bitmap_set_bit (reg_seen, regno);
+   }

please use gcc_checking_assert.  (bah, I hate that subtle difference
in sbitmap vs. bitmap not returning whether the bit changed in
the modifying operations ...)

+ /* We have to make an exception here for an argument
+register, in case it is used before the clobber.  */

can you elaborate here _why_ we have to make that exception?

Otherwise looks good to me (with my limited knowledge of this area).

Thanks,
Richard.


 Bernd


Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Sandiford
Richard Biener rguent...@suse.de writes:
 On Thu, 17 Oct 2013, Richard Sandiford wrote:

 Kenneth Zadeck zad...@naturalbridge.com writes:
  As mentioned in my message yesterday, I thought your new way of canonising
  unsigned tree constants meant that there was always an upper zero bit.
  Is that right?
  i believe this is correct.
  If so, xprecision  precision is a no-op, because the number always
  has the right form for wider precisions.  The only difficult case is
  xprecision == precision, since then we need to peel off any upper -1 HWIs.
  say my HWI size is 8 bits (just to keep from typing a million 'f's).  If
  I have a 16-bit unsigned number that is all 1s, in the tree world it is 3
  HWIs: 0x00 0xff 0xff.
 
  but inside regular wide int, it would take 1 wide int whose value is 0xff.
  inside of max it would be the same as the tree, but then the test 
  precision < xprecision + hbpwi never kicks in because precision is 
  guaranteed to be huge.
 
  inside of addr_wide_int, i think we tank with the assertion.
 
 It should be OK for addr_wide_int too.  The precision still fits 2 HWIs.
 The initial length is greater than the maximum length of an addr_wide_int,
 but your len = MAX (len, max_len) deals with that.

 It's

   len = MIN (len, max_len)

Oops, yeah, I meant MIN, sorry.

 which looked suspicious to me, but with precision >= xprecision
 precision can only be zero if xprecision is zero which looked to
 me like it cannot happen - or rather it should be fixed.

Despite the comment above the code, I don't think this MIN is there
for the zero-precision case.  I think it's there to handle the new
tree representation.

The new tree representation can have a length greater than max_len
for an unsigned tree constant that occupies a whole number of HWIs.
The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
The MIN trims the length from 3 to 2 in the last case.

  the case actually comes up on the ppc because they do a lot of 128 bit 
  math.I think i got thru the x86-64 without noticing this.
 
 Well, it'd be suspicious if we're directly using 128-bit numbers
 in addr_wide_int.  The justification for the assertion was that we
 should explicitly truncate to addr_wide_int when deliberately
 ignoring upper bits, beyond bit or byte address width.  128 bits
 definitely falls into that category on powerpc.

 My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
 wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
for any wide_int wider than 16 bits though.

 AFAIK that is what the code produces,

In which case?  This is:

   precision == 16
   xprecision == 16
   len == 3
   max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it
again to 1, so the 0x00 0xff 0xff becomes 0xff.  The 0x00 0xff
is still there in the array, but not used.

 but now Kenny says this is only for some kind
 of wide-ints but not all?  That is, why is

 inline wi::storage_ref
 wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
 unsigned int precision, const_tree x)
 {
   unsigned int len = TREE_INT_CST_NUNITS (x);
   const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
   return wi::storage_ref (val, len, precision);
 }

 not a valid implementation together with making sure that the
 INTEGER_CST tree rep has that extra word of zeros if required?

The fundamental problem here is that we're trying to support two cases:

(a) doing N-bit arithmetic in cases where the inputs have N bits
(b) doing N-bit arithmetic in cases where the inputs have fewer than N bits
and are extended according to TYPE_SIGN.

Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
0x8000 regardless of whether the type is signed or unsigned.  But if it's
extended to 32-bits you get two different numbers 0x00008000 and 0xffff8000,
depending on the sign.

So for one value of the precision parameter (i.e. xprecision), signed
and unsigned constants produce the same number.  But for another value
of the precision parameter (those greater than xprecision), signed and
unsigned constants produce different numbers.  Yet at the moment the tree
constant has a single representation.

So I think the possibilities are:

(1) Use the representation of an N-bit wide_int to store N-bit tree constants.
Do work when extending them to wider wide_ints.

(2) Use the representation of max_wide_int to store N-bit tree constants.
Do work when creating an N-bit wide_int.

(3) Store both representations in the tree constant.

(4) Require all tree arithmetic to be done in the same way as rtl arithmetic,
with explicit extensions.  This gets rid of case (b).

(5) Require all tree arithmetic to be done in wider wide_ints than the inputs,
which I think is what you preferred.  This gets rid of case (a).
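To make the trade-offs concrete, here is the running example from this
thread (8-bit HWIs, the unsigned 16-bit all-ones constant) in each
representation:

  new tree rep:            0x00 0xff 0xff   (len 3, extra zero block)
  wide_int, precision 16:  0xff             (len 1, sign-compressed)
  max_wide_int:            0x00 0xff 0xff   (len 3, the zero block keeps
                                             the value non-negative at
                                             any wider precision)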

Re: [PATCH] Enhance ifcombine to recover non short circuit branches

2013-10-17 Thread Richard Biener
On Thu, Oct 17, 2013 at 4:14 AM, Andrew Pinski pins...@gmail.com wrote:
 On Wed, Oct 16, 2013 at 2:12 AM, Zhenqiang Chen
 zhenqiang.c...@linaro.org wrote:
 Hi,

 The patch enhances ifcombine pass to recover some non short circuit
 branches. Basically, it will do the following transformation:

 Case 1:
   if (cond1)
 if (cond2)
 ==
   if (cond1 && cond2)

 Case 2:
   if (cond1)
 goto L1
   if (cond2)
 goto L1
 ==
   if (cond1 || cond2)
 goto L1

 Case 3:
   if (cond1)
 goto L1
   else
 goto L2
 L1:
   if (cond2)
 goto L2
 ==
   if (invert (cond1) || cond2)
 goto L2

 Case 4:
   if (cond1)
 goto L1
   if (cond2)
 goto L2
 L1:
 ==
   if (invert (cond1) && cond2)
 goto L2
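
  To illustrate Case 1 at the source level (a made-up example, not from
  the testsuite):

    /* Before: two nested ifs, i.e. two conditional branches.  */
    int f (int a, int b)
    {
      if (a > 0)      /* cond1 */
        if (b > 0)    /* cond2 */
          return 1;
      return 0;
    }

    /* After the transformation the control flow is equivalent to
         if (a > 0 && b > 0) return 1;
       emitted as a single branch on the combined condition, with
       (a > 0) & (b > 0) evaluated without short-circuiting.  */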

 Bootstrap on X86-64 and ARM.

 Two new FAILs in regression tests:
 gcc.dg/uninit-pred-8_b.c
 gcc.dg/uninit-pred-9_b.c
 uninit pass should be enhanced to handle more complex conditions. Will
 submit a bug to track it and fix it later.

 Is it OK for trunk?

 I had a much simpler change which did basically the same from 4.7 (I
 can update it if people think this is a better approach).

I like that more (note you can now use is_gimple_condexpr as predicate
for force_gimple_operand).

With that we should be able to kill the fold-const.c transform?

Thanks,
Richard.

 Thanks,
 Andrew Pinski


 2012-09-29  Andrew Pinski  apin...@cavium.com

 * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
 (ifcombine_ifandif): Handle cases where
 maybe_fold_and_comparisons fails, combining the branches
 anyways.
 (tree_ssa_ifcombine): Inverse the order of
 the basic block walk, increasing the number of combinings.
 * Makefile.in (tree-ssa-ifcombine.o): Update dependencies.

 * testsuite/gcc.dg/tree-ssa/phi-opt-2.c: Expect zero ifs as the 
 compiler
  produces a & b now.
 * testsuite/gcc.dg/tree-ssa/phi-opt-9.c: Use a function call
 to prevent conditional move to be used.
 * testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for
 one or more intermediate.



 Thanks!
 -Zhenqiang

 ChangeLog:
 2013-10-16  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * fold-const.c (simple_operand_p_2): Make it global.
 * tree.h (simple_operand_p_2): Declare it.
 * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
 (bb_has_overhead_p, generate_condition_node,
 ifcombine_ccmp): New functions.
 (ifcombine_fold_ifandif): New function, extracted from
 ifcombine_ifandif.
 (ifcombine_ifandif): Call ifcombine_ccmp.
 (tree_ssa_ifcombine_bb): Skip optimized bb.

 testsuite/ChangeLog
 2013-10-16  Zhenqiang Chen  zhenqiang.c...@linaro.org

 * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
 * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
 * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
 * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
 * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
 * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
 * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Updated.


Re: patch to canonize unsigned tree-csts

2013-10-17 Thread Richard Biener
On Wed, 16 Oct 2013, Kenneth Zadeck wrote:

 On 10/15/2013 02:30 PM, Richard Sandiford wrote:
  Richard Sandiford rdsandif...@googlemail.com writes:
  if (small_prec)
;
  else if (precision == xprecision)
 while (len >= 0 && val[len - 1] == -1)
  len--;
  Err, len > 0 obviously.
 you were only close.  Patch tested on ppc and committed as revision 203739.
 
 Index: gcc/tree.c
 ===
 --- gcc/tree.c  (revision 203701)
 +++ gcc/tree.c  (working copy)
 @@ -1204,7 +1204,7 @@ wide_int_to_tree (tree type, const wide_
  }
 
wide_int cst = wide_int::from (pcst, prec, sgn);
 -  unsigned int len = int (cst.get_len ());
 +  unsigned int len = cst.get_len ();
   unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
   bool recanonize = sgn == UNSIGNED
     && small_prec
 Index: gcc/tree.h
 ===
 --- gcc/tree.h  (revision 203701)
 +++ gcc/tree.h  (working copy)
 @@ -5204,13 +5204,13 @@ wi::int_traits const_tree::decompose (
   scratch[len - 1] = sext_hwi (val[len - 1], precision);
   return wi::storage_ref (scratch, len, precision);
 }
 -   }
 -
 -  if (precision < xprecision + HOST_BITS_PER_WIDE_INT)
 -   {
 - len = wi::force_to_size (scratch, val, len, xprecision, precision, UNSIGNED);
 - return wi::storage_ref (scratch, len, precision);
 -   }
 +   }
 +  /* We have to futz here because a large unsigned int with
 +precision 128 may look (0x0 0xffffffffffffffff 0xF...) as a
 +tree-cst and as (0xF...) as a wide-int.  */
 +  else if (precision == xprecision && len == max_len)
 +while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1)
 +  len--;
  }

Err, that now undoes the extra zero word thing?  Or was I confused
about the previous code and this append extra zero word for
MSB set unsigned constants?

Richard.


[PATCH][AArch64] Implement %c output template

2013-10-17 Thread Kyrill Tkachov

Hi all,

This patch implements the %c output template for inline asm. The code for it is 
almost identical to the support in arm, so it's pretty straightforward. I've 
added a few compile tests for it as well.
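
For illustration (not part of the patch itself): the 'c' modifier prints an 
integer or symbol address without a leading '#', so the template author 
writes the '#' where one is wanted, e.g.

  __asm__ ("mov x0, #%c0" : : "i" (42));   /* emits: mov x0, #42 */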


Tested aarch64-none-elf on a model.

Ok for trunk?

Thanks,
Kyrill

[gcc/]
2013-10-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* config/aarch64/aarch64.c (aarch64_print_operand): Handle 'c'.

[gcc/testsuite]
2013-10-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

* gcc.target/aarch64/c-output-template.c: New testcase.
* gcc.target/aarch64/c-output-template-2.c: Likewise.
* gcc.target/aarch64/c-output-template-3.c: Likewise.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b61b453..d2a3d49 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3426,6 +3426,32 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 {
   switch (code)
 {
+/* An integer or symbol address without a preceding # sign.  */
+case 'c':
+  switch (GET_CODE (x))
+	{
+	case CONST_INT:
+	  fprintf (f, HOST_WIDE_INT_PRINT_DEC, INTVAL (x));
+	  break;
+
+	case SYMBOL_REF:
+	  output_addr_const (f, x);
+	  break;
+
+	case CONST:
+	  if (GET_CODE (XEXP (x, 0)) == PLUS
+	      && GET_CODE (XEXP (XEXP (x, 0), 0)) == SYMBOL_REF)
+	{
+	  output_addr_const (f, x);
+	  break;
+	}
+	  /* Fall through.  */
+
+	default:
+	  output_operand_lossage ("Unsupported operand for code '%c'", code);
+	}
+  break;
+
 case 'e':
   /* Print the sign/zero-extend size as a character 8-b, 16-h, 32-w.  */
   {
diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c b/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c
new file mode 100644
index 000..16ff58d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+struct tracepoint {
+int dummy;
+int state;
+};
+static struct tracepoint tp;
+
+void
+test (void)
+{
+__asm__ ("@ %c0" : : "i" (&tp));
+}
+
+/* { dg-final { scan-assembler "@ tp" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c b/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c
new file mode 100644
index 000..e332fe1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+struct tracepoint {
+int dummy;
+int state;
+};
+static struct tracepoint tp;
+
+void
+test (void)
+{
+__asm__ ("@ %c0" : : "i" (&tp.state));
+}
+
+/* { dg-final { scan-assembler "@ tp\\+4" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template.c b/gcc/testsuite/gcc.target/aarch64/c-output-template.c
new file mode 100644
index 000..1b67c91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/c-output-template.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+
+void
+test (void)
+{
+__asm__ ("@ %c0" : : "i" (42));
+}
+
+/* { dg-final { scan-assembler "@ 42" } } */

Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.

2013-10-17 Thread Uros Bizjak
On Thu, Oct 17, 2013 at 12:47 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote:

  I suspect gen_lowpart is a bad turn when reload is completed, as
  far as it can create a new pseudo. gen_lowpart () may call
  gen_reg_rtx (), which contains the corresponding gcc_assert ().

 False.  gen_lowpart is perfectly safe post-reload.
 Indeed, taking the subreg of a hard register should arrive

   x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset);

 in simplify_subreg.

 Have you encountered some specific problem with gen_lowpart?
 Yes.

 Patch [8/8] contains testsuite for AVX-512.
 This pattern is covered as well.

 When trying to do so:

  (define_insn_and_split "vec_extract_lo_v32hi"
    [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,m")
  (vec_select:V16HI
    (match_operand:V32HI 1 "nonimmediate_operand" "vm,v")
    (parallel [(const_int 0) (const_int 1)
   (const_int 2) (const_int 3)
   (const_int 4) (const_int 5)
   (const_int 6) (const_int 7)
   (const_int 8) (const_int 9)
   (const_int 10) (const_int 11)
   (const_int 12) (const_int 13)
   (const_int 14) (const_int 15)])))]
    "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
    "#"
    "&& reload_completed"
    [(const_int 0)]
 {
   rtx op1 = operands[1];
   op1 = gen_lowpart (V16HImode, op1);
   emit_move_insn (operands[0], op1);
   DONE;
 })

 I've got ICE, with following bt:

 #1  0x006f28d6 in gen_reg_rtx (mode=V32HImode) at 
 /export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:866
 #2  0x0070243a in copy_to_reg (x=(reg:V32HI 21 xmm0 [163])) at 
 /export/users/kyukhin/gcc/git/gcc/gcc/explow.c\
 :606
 #3  0x0091dfb8 in gen_lowpart_general (mode=V16HImode, x=<optimized out>)
 at /export/users/kyukhin/gcc/git/gcc/gcc/rtlhooks.c:50
 #4  0x00ce16e8 in gen_split_4943 (curr_insn=<optimized out>, 
 operands=0x16f6320)
 at /export/users/kyukhin/gcc/git/gcc/gcc/config/i386/sse.md:6329
 #5  0x006f7865 in try_split (pat=(set (reg:V16HI 23 xmm2 [164])
 (vec_select:V16HI (reg:V32HI 21 xmm0 [163])
 (parallel [
 (const_int 0 [0])
 (const_int 1 [0x1])
 (const_int 2 [0x2])
 (const_int 3 [0x3])
 (const_int 4 [0x4])
 (const_int 5 [0x5])
 (const_int 6 [0x6])
 (const_int 7 [0x7])
 (const_int 8 [0x8])
 (const_int 9 [0x9])
 (const_int 10 [0xa])
 (const_int 11 [0xb])
 (const_int 12 [0xc])
 (const_int 13 [0xd])
 (const_int 14 [0xe])
 (const_int 15 [0xf])
 ]))), trial=(insn 48 46 49 6 (set (reg:V16HI 23 xmm2 [164])
 (vec_select:V16HI (reg:V32HI 21 xmm0 [163])
 (parallel [
 (const_int 0 [0])
 (const_int 1 [0x1])
 (const_int 2 [0x2])
 (const_int 3 [0x3])
 (const_int 4 [0x4])
 (const_int 5 [0x5])
 (const_int 6 [0x6])
 (const_int 7 [0x7])
 (const_int 8 [0x8])
 (const_int 9 [0x9])
 (const_int 10 [0xa])
 (const_int 11 [0xb])
 (const_int 12 [0xc])
 (const_int 13 [0xd])
 (const_int 14 [0xe])
 (const_int 15 [0xf])
 ]))) 
 /export/users/kyukhin/gcc/git/gcc/gcc/testsuite/gcc.target/i386/avx512f-vec-unpack.c:24
  2151 {vec_extract_lo_v32hi}
  (nil)), last=<optimized out>)
 at /export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:3467

 So, we have:
 [rtlhooks.c:50]gen_lowpart_general () ->
 [explow.c:606]copy_to_reg () ->
 [emit-rtl.c:866]gen_reg_rtx (): gcc_assert (can_create_pseudo_p ());

 Maybe the code in the pattern is buggy? Or is it a gen_lowpart?

I think that original approach with gen_rtx_REG is correct and follows
established practice in sse.md (please grep for gen_rtx_REG in
sse.md). If this approach is necessary due to the deficiency of
gen_lowpart, then the fix to gen_lowpart should be proposed in a
follow-up patch.

Uros.
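
For reference, a sketch of what the gen_rtx_REG-style split body could look
like (untested; the adjust_address leg for the memory alternative is an
assumption, not taken from the patch):

  rtx op1 = operands[1];
  if (REG_P (op1))
    /* Post-reload this is a hard register, so reinterpret it directly
       in the narrower vector mode instead of calling gen_lowpart.  */
    op1 = gen_rtx_REG (V16HImode, REGNO (op1));
  else
    /* Memory alternative: simply narrow the access.  */
    op1 = adjust_address (op1, V16HImode, 0);
  emit_move_insn (operands[0], op1);
  DONE;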


Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Biener
On Thu, 17 Oct 2013, Richard Sandiford wrote:

 Richard Biener rguent...@suse.de writes:
  On Thu, 17 Oct 2013, Richard Sandiford wrote:
 
  Kenneth Zadeck zad...@naturalbridge.com writes:
   As mentioned in my message yesterday, I thought your new way of 
   canonising
   unsigned tree constants meant that there was always an upper zero bit.
   Is that right?
   i believe this is correct.
   If so, xprecision < precision is a no-op, because the number always
   has the right form for wider precisions.  The only difficult case is
   xprecision == precision, since then we need to peel off any upper -1 
   HWIs.
   say my HWI size is 8 bits (just to keep from typing a million 'f's.   if 
   i have a 16 bit unsigned number that is all 1s in the tree world it is 3 
   hwis
   0x00 0xff 0xff.
  
   but inside regular wide int, it would take 1 wide int whose value is 
   0xff.
   inside of max it would be the same as the tree, but then the test 
    precision < xprecision + hbpwi never kicks in because precision is 
   guaranteed to be huge.
  
   inside of addr_wide_int, i think we tank with the assertion.
  
  It should be OK for addr_wide_int too.  The precision still fits 2 HWIs.
  The initial length is greater than the maximum length of an addr_wide_int,
  but your len = MAX (len, max_len) deals with that.
 
  It's
 
len = MIN (len, max_len)
 
 Oops, yeah, I meant MIN, sorry.

   which looked suspicious to me, but with precision >= xprecision
  precision can only be zero if xprecision is zero which looked to
  me like it cannot happen - or rather it should be fixed.
 
 Despite the comment above the code, I don't think this MIN is there
 for the zero-precision case.  I think it's there to handle the new
 tree representation.
 
 The new tree representation can have a length greater than max_len
 for an unsigned tree constant that occupies a whole number of HWIs.
 The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
 When extended to max_wide_int the representation is the same.
 But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
 The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed?  _Why_ was it changed?
We still cannot use it directly from wide-int and the extra
word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).
 
   the case actually comes up on the ppc because they do a lot of 128 bit 
    math.  I think i got thru the x86-64 without noticing this.
  
  Well, it'd be suspicious if we're directly using 128-bit numbers
  in addr_wide_int.  The justification for the assertion was that we
  should explicitly truncate to addr_wide_int when deliberately
  ignoring upper bits, beyond bit or byte address width.  128 bits
  definitely falls into that category on powerpc.
 
  My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
  wide-int value if it has precision 16.
 
 No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
 for any wide_int wider than 16 bits though.
 
  AFAIK that is what the code produces,
 
 In which case?  This is:
 
precision == 16
xprecision == 16
len == 3
max_len == 2
 
 The MIN trims the len to 2 and then the loop Kenny added trims it
 again to 1, so the 0x00 0xff 0xff becomes 0xff.  The 0x00 0xff
 is still there in the array, but not used.
 
  but now Kenny says this is only for some kind
  of wide-ints but not all?  That is, why is
 
  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
  unsigned int precision, const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
return wi::storage_ref (val, len, precision);
  }
 
  not a valid implementation together with making sure that the
  INTEGER_CST tree rep has that extra word of zeros if required?
 
 The fundamental problem here is that we're trying to support two cases:
 
 (a) doing N-bit arithmetic in cases where the inputs have N bits
 (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits
 and are extended according to TYPE_SIGN.
 
 Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
 0x8000 regardless of whether the type is signed or unsigned.  But if it's
  extended to 32-bits you get two different numbers 0x00008000 and 0xffff8000,
 depending on the sign.
 
 So for one value of the precision parameter (i.e. xprecision), signed
 and unsigned constants produce the same number.  But for another value
 of the precision parameter (those greater than xprecision), signed and
 unsigned constants produce different numbers.  Yet at the moment the tree
 constant has a single representation.

But a correctly extended one, up to its len!  (as opposed to RTL)

 So I think the possibilities are:
 
 (1) Use the representation of an N-bit wide_int to store N-bit tree constants.
 Do work when extending them to wider wide_ints.

Re: Ping Re: [gomp4] Dumping gimple for offload.

2013-10-17 Thread Ilya Tocar
Ping.

On 09 Oct 19:12, Ilya Tocar wrote:
 Ping.
 
 On 03 Oct 20:05, Ilya Tocar wrote:
  On 26 Sep 21:21, Ilya Tocar wrote:
   On 25 Sep 15:48, Richard Biener wrote:
On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar tocarip.in...@gmail.com 
wrote:
 On 24 Sep 11:02, Richard Biener wrote:
 On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar 
 tocarip.in...@gmail.com wrote:
  thus consider assigning the section
 name in a different place.

 Richard.

 What do you mean by different place?
 I can add a global dumping_omp_target variable to choose the correct name,
 depending on its value (patch below). Is it better?

More like passing down a different abstraction, like for

 @@ -907,9 +907,15 @@ output_symtab (void)
  {
symtab_node node = lto_symtab_encoder_deref (encoder, i);
   if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
 -lto_output_node (ob, cnode, encoder);
 +   {
 + if (!dumping_omp_target || lookup_attribute ("omp declare target",
 + DECL_ATTRIBUTES (node->symbol.decl)))
 +   lto_output_node (ob, cnode, encoder);
 +   }
else
 -lto_output_varpool_node (ob, varpool (node), encoder);
 + if (!dumping_omp_target || lookup_attribute ("omp declare target",
 + DECL_ATTRIBUTES (node->symbol.decl)))
 +   lto_output_varpool_node (ob, varpool (node), encoder);

  }

have the symtab encoder already not contain the varpool nodes you
don't need.

And instead of looking up attributes, mark the symtab node with a flag.
   
   Good idea!
   I've tried creating 2 encoders, and adding only nodes with
   omp declare target attribute in omp case. There is still some is_omp
    passing to control lto_set_symtab_encoder_in_partition behavior, 
   because i think it's better than global var.
   What do you think?
  
  Updated version of the patch. I've checked that it doesn't break lto on
   SPEC 2006. Streaming for omp is enabled by the -fopenmp flag. Works with and
  without enabled lto. Ok for gomp4 branch?
  
  
  ---
   gcc/cgraphunit.c  | 15 +--
   gcc/ipa-inline-analysis.c |  2 +-
   gcc/lto-cgraph.c  | 15 ++-
   gcc/lto-streamer.c|  5 +++--
   gcc/lto-streamer.h| 10 --
   gcc/lto/lto-partition.c   |  4 ++--
   gcc/passes.c  | 12 ++--
   gcc/tree-pass.h   |  2 +-
   8 files changed, 44 insertions(+), 21 deletions(-)
  
  diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
  index 1644ca9..d595475 100644
  --- a/gcc/cgraphunit.c
  +++ b/gcc/cgraphunit.c
  @@ -2016,7 +2016,18 @@ ipa_passes (void)
passes->all_lto_gen_passes);
   
 if (!in_lto_p)
  -ipa_write_summaries ();
  +{
  +  if (flag_openmp)
  +   {
  + section_name_prefix = OMP_SECTION_NAME_PREFIX;
  + ipa_write_summaries (true);
  +   }
  +  if (flag_lto)
  +   {
  + section_name_prefix = LTO_SECTION_NAME_PREFIX;
  + ipa_write_summaries (false);
  +   }
  +}
   
 if (flag_generate_lto)
   targetm.asm_out.lto_end ();
  @@ -2107,7 +2118,7 @@ compile (void)
 cgraph_state = CGRAPH_STATE_IPA;
   
 /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
  -  if (flag_lto)
  +  if (flag_lto || flag_openmp)
   lto_streamer_hooks_init ();
   
 /* Don't run the IPA passes if there was any error or sorry messages.  */
  diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
  index ba6221e..4420213 100644
  --- a/gcc/ipa-inline-analysis.c
  +++ b/gcc/ipa-inline-analysis.c
  @@ -3721,7 +3721,7 @@ inline_generate_summary (void)
   
 /* When not optimizing, do not bother to analyze.  Inlining is still done
because edge redirection needs to happen there.  */
   -  if (!optimize && !flag_lto && !flag_wpa)
   +  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
   return;
   
 function_insertion_hook_holder =
  diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
  index 952588d..4a7d179 100644
  --- a/gcc/lto-cgraph.c
  +++ b/gcc/lto-cgraph.c
  @@ -236,8 +236,13 @@ lto_symtab_encoder_in_partition_p 
  (lto_symtab_encoder_t encoder,
   
   void
   lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
  -symtab_node node)
  +symtab_node node, bool is_omp)
   {
  +  /* Ignore non omp target nodes for omp case.  */
   +  if (is_omp && !lookup_attribute ("omp declare target",
   +  DECL_ATTRIBUTES (node->symbol.decl)))
  +return;
  +
 int index = lto_symtab_encoder_encode (encoder, (symtab_node)node);
 encoder-nodes[index].in_partition = true;
   }
  @@ -760,7 +765,7 @@ add_references (lto_symtab_encoder_t encoder,
  ignored by the 

Re: [wide-int] int_traits tree

2013-10-17 Thread Kenneth Zadeck

On 10/17/2013 04:46 AM, Richard Biener wrote:

the case actually comes up on the ppc because they do a lot of 128 bit
math.  I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers
in addr_wide_int.  The justification for the assertion was that we
should explicitly truncate to addr_wide_int when deliberately
ignoring upper bits, beyond bit or byte address width.  128 bits
definitely falls into that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
wide-int value if it has precision 16.  AFAIK that is what the
code produces, but now Kenny says this is only for some kind
of wide-ints but not all?  That is, why is


The issue is not that the rules are different between the different 
flavors of wide int, it is that the circumstances are different. The 
rule is that bits above the precision exist only if the 
precision is not an even multiple of the HBPWI.   In that case, the bits 
are always an extension of the sign bits.


max_wide_int and addr_wide_int have large enough precisions so that it 
is impossible to ever generate an unsigned number on the target that is 
large enough to ever run against the precision.However, for the 
fixed precision case, you can, and at least on the ppc, you do, generate 
unsigned numbers that are big enough to have to be recanonicalized like 
this.



inline wi::storage_ref
wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
  unsigned int precision, const_tree x)
{
   unsigned int len = TREE_INT_CST_NUNITS (x);
   const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
   return wi::storage_ref (val, len, precision);
}

not a valid implementation together with making sure that the
INTEGER_CST tree rep has that extra word of zeros if required?
I would like to see us move in that direction (given that the
tree rep of INTEGER_CST has transitioned to variable-length already).

Btw, code such as

tree
wide_int_to_tree (tree type, const wide_int_ref &pcst)
{
...
   unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
   bool recanonize = sgn == UNSIGNED
  && small_prec
  && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;

definitely needs a comment.  I would have thought _all_ unsigned
numbers need re-canonicalization.  Well, maybe only if we're
forcing the extra zeros.

It will get a comment.

[This function shows another optimization issue:

 case BOOLEAN_TYPE:
   /* Cache false or true.  */
   limit = 2;
   if (wi::leu_p (cst, 1))
 ix = cst.to_uhwi ();

I would have expected cst <= 1 to be optimized to cst.len == 1 &&
cst.val[0] <= 1.  It expands to

L27:
   MEM[(long int *)D.50698 + 16B] = 1;
   MEM[(struct wide_int_ref_storage *)D.50698] = MEM[(struct
wide_int_ref_storage *)D.50698].scratch;
   MEM[(struct wide_int_ref_storage *)D.50698 + 8B] = 1;
   MEM[(struct wide_int_ref_storage *)D.50698 + 12B] = 32;
   _277 = MEM[(const struct wide_int_storage *)cst + 260B];
    if (_277 <= 64)
 goto bb 42;
   else
 goto bb 43;

   bb 42:
   xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
   _494 = MEM[(const long int *)cst];
   _495 = (long unsigned int) _494;
   yl_496 = zext_hwi (_495, _277);
    _497 = xl_491 < yl_496;
   goto bb 44;

   bb 43:
   _503 = wi::ltu_p_large (MEM[(struct wide_int_ref_storage
*)D.50698].scratch, 1, 32, MEM[(const struct wide_int_storage
*)cst].val, len_274, _277);

this keeps D.50698 and cst un-SRAable - inline storage is problematic
for this reason.  But the representation should guarantee the
compare with a low precision (32 bit) constant is evaluatable
at compile-time if len of the larger value is > 1, no?

   bb 44:
   # _504 = PHI _497(42), _503(43)
   D.50698 ={v} {CLOBBER};
   if (_504 != 0)
 goto bb 45;
   else
 goto bb 46;

   bb 45:
   pretmp_563 = MEM[(const struct wide_int_storage *)cst + 256B];
   goto bb 229 (L131);

   bb 46:
    _65 = generic_wide_int<wide_int_storage>::to_uhwi (cst, 0);
   ix_66 = (int) _65;
   goto bb 91;

The question is whether we should try to optimize wide-int for
such cases or simply not use wi:leu_p (cst, 1) but rather

   if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () <= 1)

?
i find this ugly, but i see where you are coming from.   The problem is 
that both you and i know that the len has to be 1, but the optimizer 
does not.   This is a case where I think that we made a mistake getting 
rid of the wi::one_p, wi::zero_p and wi::minus_one_p.  The internals of 
one_p were return (len == 1 && val[0] == 1) and i think that is much 
nicer than what you put there.  On the other hand, it seems that a 
person more skilled than i am with c++ could specialize the comparisons 
with an integer constant, since i believe that that constant must fit in 
one hwi (I am a little concerned about large unsigned constants).
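
Something along these lines, perhaps - an untested sketch of such a
specialization (the helper name is made up, and get_len ()/elt () are
assumed accessors on the wide-int value):

  /* Compare against a small non-negative HWI constant without reaching
     the out-of-line *_large routines.  A canonical value equal to such
     a constant always has len == 1, so this folds to two inline tests.  */
  template <typename T>
  inline bool
  eq_small_p (const T &x, unsigned HOST_WIDE_INT c)
  {
    return x.get_len () == 1
           && (unsigned HOST_WIDE_INT) x.elt (0) == c;
  }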



Thanks,
Richard






Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Biener
On Thu, 17 Oct 2013, Kenneth Zadeck wrote:

 On 10/17/2013 04:46 AM, Richard Biener wrote:
  the case actually comes up on the ppc because they do a lot of 128 bit
   math.  I think i got thru the x86-64 without noticing this.
   Well, it'd be suspicious if we're directly using 128-bit numbers
   in addr_wide_int.  The justification for the assertion was that we
   should explicitly truncate to addr_wide_int when deliberately
   ignoring upper bits, beyond bit or byte address width.  128 bits
   definitely falls into that category on powerpc.
  My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
  wide-int value if it has precision 16.  AFAIK that is what the
  code produces, but now Kenny says this is only for some kind
  of wide-ints but not all?  That is, why is
 
 The issue is not that the rules are different between the different flavors of
  wide int, it is that the circumstances are different. The rule is that
  bits above the precision exist only if the precision is not an even
  multiple of the HBPWI.   In that case, the bits are always an extension of the
 sign bits.
 
 max_wide_int and addr_wide_int have large enough precisions so that it is
 impossible to ever generate an unsigned number on the target that is large
 enough to ever run against the precision.However, for the fixed precision
 case, you can, and at least on the ppc, you do, generate unsigned numbers that
 are big enough to have to be recanonicalized like this.
 
  inline wi::storage_ref
 wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
  unsigned int precision, const_tree x)
 {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
 return wi::storage_ref (val, len, precision);
  }
  
  not a valid implementation together with making sure that the
  INTEGER_CST tree rep has that extra word of zeros if required?
  I would like to see us move in that direction (given that the
  tree rep of INTEGER_CST has transitioned to variable-length already).
  
  Btw, code such as
  
  tree
 wide_int_to_tree (tree type, const wide_int_ref &pcst)
  {
  ...
 unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
 bool recanonize = sgn == UNSIGNED
    && small_prec
    && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;
  
  definitely needs a comment.  I would have thought _all_ unsigned
  numbers need re-canonicalization.  Well, maybe only if we're
  forcing the extra zeros.
 It will get a comment.
  [This function shows another optimization issue:
  
   case BOOLEAN_TYPE:
 /* Cache false or true.  */
 limit = 2;
 if (wi::leu_p (cst, 1))
   ix = cst.to_uhwi ();
  
   I would have expected cst <= 1 to be optimized to cst.len == 1 &&
   cst.val[0] <= 1.  It expands to
  
  L27:
 MEM[(long int *)D.50698 + 16B] = 1;
 MEM[(struct wide_int_ref_storage *)D.50698] = MEM[(struct
  wide_int_ref_storage *)D.50698].scratch;
 MEM[(struct wide_int_ref_storage *)D.50698 + 8B] = 1;
 MEM[(struct wide_int_ref_storage *)D.50698 + 12B] = 32;
 _277 = MEM[(const struct wide_int_storage *)cst + 260B];
  if (_277 <= 64)
   goto bb 42;
 else
   goto bb 43;
  
 bb 42:
 xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
 _494 = MEM[(const long int *)cst];
 _495 = (long unsigned int) _494;
 yl_496 = zext_hwi (_495, _277);
  _497 = xl_491 < yl_496;
 goto bb 44;
  
 bb 43:
 _503 = wi::ltu_p_large (MEM[(struct wide_int_ref_storage
  *)D.50698].scratch, 1, 32, MEM[(const struct wide_int_storage
  *)cst].val, len_274, _277);
  
  this keeps D.50698 and cst un-SRAable - inline storage is problematic
  for this reason.  But the representation should guarantee the
  compare with a low precision (32 bit) constant is evaluatable
   at compile-time if len of the larger value is > 1, no?
  
 bb 44:
 # _504 = PHI _497(42), _503(43)
 D.50698 ={v} {CLOBBER};
 if (_504 != 0)
   goto bb 45;
 else
   goto bb 46;
  
 bb 45:
 pretmp_563 = MEM[(const struct wide_int_storage *)cst + 256B];
 goto bb 229 (L131);
  
 bb 46:
  _65 = generic_wide_int<wide_int_storage>::to_uhwi (cst, 0);
 ix_66 = (int) _65;
 goto bb 91;
  
  The question is whether we should try to optimize wide-int for
  such cases or simply not use wi:leu_p (cst, 1) but rather
  
    if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () <= 1)
  
  ?
 i find this ugly, but i see where you are coming from.   The problem is that
 both you and i know that the len has to be 1, but the optimizer does not.
 This is a case where I think that we made a mistake getting rid of the
 wi::one_p, wi::zero_p and wi::minus_one_p.  The internals of one_p were return
  (len == 1 && val[0] == 1) and i think that is much nicer than what you put
  there.  On the other hand, it seems that a person more skilled than i am
  with c++ could specialize the comparisons with an integer constant, since
  i believe that that constant must fit in one hwi.

Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Sandiford
Richard Biener rguent...@suse.de writes:
 The new tree representation can have a length greater than max_len
 for an unsigned tree constant that occupies a whole number of HWIs.
 The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
 When extended to max_wide_int the representation is the same.
 But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
 The MIN trims the length from 3 to 2 in the last case.

 Oh, so it was the tree rep that changed?  _Why_ was it changed?
 We still cannot use it directly from wide-int and the extra
 word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).

It means that we can now use the tree's HWI array directly, without any
copying, for addr_wide_int and max_wide_int.  The only part of decompose ()
that does a copy is the small_prec case, which is trivially compiled out
for addr_wide_int and max_wide_int.

   the case actually comes up on the ppc because they do a lot of 128 bit 
    math.  I think i got thru the x86-64 without noticing this.
  
  Well, it'd be suspicious if we're directly using 128-bit numbers
  in addr_wide_int.  The justification for the assertion was that we
  should explicitly truncate to addr_wide_int when deliberately
  ignoring upper bits, beyond bit or byte address width.  128 bits
  definitely falls into that category on powerpc.
 
  My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
  wide-int value if it has precision 16.
 
 No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
 for any wide_int wider than 16 bits though.
 
  AFAIK that is what the code produces,
 
 In which case?  This is:
 
precision == 16
xprecision == 16
len == 3
max_len == 2
 
 The MIN trims the len to 2 and then the loop Kenny added trims it
 again to 1, so the 0x00 0xff 0xff becomes 0xff.  The 0x00 0xff
 is still there in the array, but not used.
 
  but now Kenny says this is only for some kind
  of wide-ints but not all?  That is, why is
 
  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
  unsigned int precision, const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
return wi::storage_ref (val, len, precision);
  }
 
  not a valid implementation together with making sure that the
  INTEGER_CST tree rep has that extra word of zeros if required?
 
 The fundamental problem here is that we're trying to support two cases:
 
 (a) doing N-bit arithmetic in cases where the inputs have N bits
 (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits
 and are extended according to TYPE_SIGN.
 
 Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
 0x8000 regardless of whether the type is signed or unsigned.  But if it's
  extended to 32-bits you get two different numbers 0x00008000 and 0xffff8000,
 depending on the sign.
 
 So for one value of the precision parameter (i.e. xprecision), signed
 and unsigned constants produce the same number.  But for another value
 of the precision parameter (those greater than xprecision), signed and
 unsigned constants produce different numbers.  Yet at the moment the tree
 constant has a single representation.

 But a correctly extended one, up to its len!  (as opposed to RTL)

But extending the precision can change the right value of len.
Take the same example with 16-bit HWIs.  In wide_int terms, and with
the original tree representation, the constant is a single HWI:

0x8000

with len 1.  And in case (a) -- where we're asking for a 16-bit wide_int --
this single HWI is all we want.  The signed and unsigned constants give
the same wide_int.

But the same constant extended to 32 bits and left uncompressed would be
two HWIs:

0x0000 0x8000 for unsigned constants
0xffff 0x8000 for signed constants

Compressed according to the sign scheme they are:

0x0000 0x8000 (len == 2) for unsigned constants
   0x8000 (len == 1) for signed constants

which is also the new tree representation.

So the unsigned case is different for (a) and (b).

 So I think the possibilities are:
 
 (1) Use the representation of an N-bit wide_int to store N-bit tree 
 constants.
 Do work when extending them to wider wide_ints.
 
 (2) Use the representation of max_wide_int to store N-bit tree constants.
 Do work when creating an N-bit wide_int.
 
 (3) Store both representations in the tree constant.
 
 (4) Require all tree arithmetic to be done in the same way as rtl arithmetic,
 with explicit extensions.  This gets rid of case (b).
 
 (5) Require all tree arithmetic to be done in wider wide_ints than the 
 inputs,
 which I think is what you preferred.  This gets rid of case (a).
 
 (6) Allow the same wide_int constant to have several different
 representations.
 Remember that this is to some extent what Kenny's original 

Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Biener
On Thu, 17 Oct 2013, Richard Sandiford wrote:

 Richard Biener rguent...@suse.de writes:
  The new tree representation can have a length greater than max_len
  for an unsigned tree constant that occupies a whole number of HWIs.
  The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
  When extended to max_wide_int the representation is the same.
  But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
  The MIN trims the length from 3 to 2 in the last case.
 
  Oh, so it was the tree rep that changed?  _Why_ was it changed?
  We still cannot use it directly from wide-int and the extra
  word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).
 
 It means that we can now use the tree's HWI array directly, without any
 copying, for addr_wide_int and max_wide_int.  The only part of decompose ()
 that does a copy is the small_prec case, which is trivially compiled out
 for addr_wide_int and max_wide_int.

 2) addr_wide_int.  This is a fixed size representation that is
 guaranteed to be large enough to compute any bit or byte sized
 address calculation on the target.  Currently the value is 64 + 4
 bits rounded up to the next number even multiple of
 HOST_BITS_PER_WIDE_INT (but this can be changed when the first
 port needs more than 64 bits for the size of a pointer).

 This flavor can be used for all address math on the target.  In
 this representation, the values are sign or zero extended based
 on their input types to the internal precision.  All math is done
 in this precision and then the values are truncated to fit in the
 result type.  Unlike most gimple or rtl intermediate code, it is
 not useful to perform the address arithmetic at the same
 precision in which the operands are represented because there has
 been no effort by the front ends to convert most addressing
 arithmetic to canonical types.

 In the addr_wide_int, all numbers are represented as signed
 numbers.  There are enough bits in the internal representation so
 that no information is lost by representing them this way.

so I guess from that that addr_wide_int.get_precision is always
that 64 + 4 rounded up.  Thus decompose gets that constant precision
input and the extra zeros make the necessary extension always a no-op.
Aha.
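
(Concretely: with 64-bit HWIs, 64 + 4 = 68 bits rounds up to
2 * HOST_BITS_PER_WIDE_INT = 128 bits, so every addr_wide_int carries
two HWI blocks no matter what operand it was built from.)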

For max_wide_int the same rules apply, just its size is larger.

Ok.  So the reps are only canonical wide-int because we only
ever use them with precision > xprecision (maybe we should assert
that).

Btw, we are not using them directly, but every time we actually
build an addr_wide_int / max_wide_int we copy them anyway:

/* Initialize the storage from integer X, in precision N.  */
template int N
template typename T
inline fixed_wide_int_storage N::fixed_wide_int_storage (const T x)
{
  /* Check for type compatibility.  We don't want to initialize a
 fixed-width integer from something like a wide_int.  */
  WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
  wide_int_ref xi (x, N);
  len = xi.len;
  for (unsigned int i = 0; i  len; ++i)
val[i] = xi.val[i];
}

it avoids a 2nd copy though, which shows nicely what was rummaging in
my head for the last two days - that the int_traits <> abstraction
was somehow at the wrong level - it should have been traits that
are specific to the storage model?  or the above should use
int_traits::decompose manually with it always doing the 
copy (that would also optimize away one copy and eventually
would make the extra zeros not necessary).

I originally thought that extra zeros get rid of all copying from trees
to all wide-int kinds.

What's the reason again to not use my original proposed encoding
of the MSB being the sign bit?  RTL constants simply are all signed
then.  Just you have to also sign-extend in functions like lts_p
as not all constants are sign-extended.  But we can use both tree
(with the now appended zero) and RTL constants representation
unchanged.

Richard.

the case actually comes up on the ppc because they do a lot of 128 bit 
math.I think i got thru the x86-64 without noticing this.
   
   Well, it'd be suspicious if we're directly using 128-bit numbers
   in addr_wide_int.  The justification for the assertion was that we
   should explicitly truncate to addr_wide_int when deliberately
   ignoring upper bits, beyond bit or byte address width.  128 bits
   definitely falls into that category on powerpc.
  
   My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
   wide-int value if it has precision 16.
  
  No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
  for any wide_int wider than 16 bits though.
  
   AFAIK that is what the code produces,
  
  In which case?  This is:
  
 precision == 16
 xprecision == 16
 len == 3
 max_len == 2
  
  The MIN trims the len to 2 and then the loop Kenny added trims it
   again to 1, so the 0x00 0xff 0xff becomes 0xff.  The 0x00 0xff
   is still there in the array, but not used.

Re: [wide-int] int_traits tree

2013-10-17 Thread Kenneth Zadeck

On 10/17/2013 07:49 AM, Richard Biener wrote:

On Thu, 17 Oct 2013, Kenneth Zadeck wrote:


On 10/17/2013 04:46 AM, Richard Biener wrote:

the case actually comes up on the ppc because they do a lot of 128 bit
math.  I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers
in addr_wide_int.  The justification for the assertion was that we
should explicitly truncate to addr_wide_int when deliberately
ignoring upper bits, beyond bit or byte address width.  128 bits
definitely falls into that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
wide-int value if it has precision 16.  AFAIK that is what the
code produces, but now Kenny says this is only for some kind
of wide-ints but not all?  That is, why is

The issue is not that the rules are different between the different flavors of
wide int, it is that the circumstances are different. The rule is that
bits above the precision exist only if the precision is not an even
multiple of the HBPWI.   In that case, the bits are always an extension of the
sign bits.

max_wide_int and addr_wide_int have large enough precisions so that it is
impossible to ever generate an unsigned number on the target that is large
enough to ever run against the precision.However, for the fixed precision
case, you can, and at least on the ppc, you do, generate unsigned numbers that
are big enough to have to be recanonicalized like this.


inline wi::storage_ref
wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
   unsigned int precision, const_tree x)
{
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
return wi::storage_ref (val, len, precision);
}

not a valid implementation together with making sure that the
INTEGER_CST tree rep has that extra word of zeros if required?
I would like to see us move in that direction (given that the
tree rep of INTEGER_CST has transitioned to variable-length already).

Btw, code such as

tree
wide_int_to_tree (tree type, const wide_int_ref &pcst)
{
...
unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
bool recanonize = sgn == UNSIGNED
   && small_prec
   && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;

definitely needs a comment.  I would have thought _all_ unsigned
numbers need re-canonicalization.  Well, maybe only if we're
forcing the extra zeros.

It will get a comment.

[This function shows another optimization issue:

  case BOOLEAN_TYPE:
/* Cache false or true.  */
limit = 2;
if (wi::leu_p (cst, 1))
  ix = cst.to_uhwi ();

I would have expected cst <= 1 to be optimized to cst.len == 1 &&
cst.val[0] <= 1.  It expands to

L27:
MEM[(long int *)D.50698 + 16B] = 1;
MEM[(struct wide_int_ref_storage *)D.50698] = MEM[(struct
wide_int_ref_storage *)D.50698].scratch;
MEM[(struct wide_int_ref_storage *)D.50698 + 8B] = 1;
MEM[(struct wide_int_ref_storage *)D.50698 + 12B] = 32;
_277 = MEM[(const struct wide_int_storage *)cst + 260B];
if (_277 <= 64)
  goto bb 42;
else
  goto bb 43;

bb 42:
xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
_494 = MEM[(const long int *)cst];
_495 = (long unsigned int) _494;
yl_496 = zext_hwi (_495, _277);
_497 = xl_491 < yl_496;
goto bb 44;

bb 43:
_503 = wi::ltu_p_large (MEM[(struct wide_int_ref_storage
*)D.50698].scratch, 1, 32, MEM[(const struct wide_int_storage
*)cst].val, len_274, _277);

this keeps D.50698 and cst un-SRAable - inline storage is problematic
for this reason.  But the representation should guarantee the
compare with a low precision (32 bit) constant is evaluatable
at compile-time if len of the larger value is > 1, no?

bb 44:
# _504 = PHI _497(42), _503(43)
D.50698 ={v} {CLOBBER};
if (_504 != 0)
  goto bb 45;
else
  goto bb 46;

bb 45:
pretmp_563 = MEM[(const struct wide_int_storage *)cst + 256B];
goto bb 229 (L131);

bb 46:
_65 = generic_wide_int<wide_int_storage>::to_uhwi (cst, 0);
ix_66 = (int) _65;
goto bb 91;

The question is whether we should try to optimize wide-int for
such cases or simply not use wi:leu_p (cst, 1) but rather

   if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () <= 1)

?

i find this ugly, but i see where you are coming from.   The problem is that
both you and i know that the len has to be 1, but the optimizer does not.
This is a case where I think that we made a mistake getting rid of the
wi::one_p, wi::zero_p and wi::minus_one_p.  The internals of one_p were return
(len == 1 && val[0] == 1) and i think that is much nicer than what you put
there.  On the other hand, it seems that a person more skilled than i am
with c++ could specialize the comparisons with an integer constant, since i
believe that that constant must fit in one hwi (I am a little concerned
about large unsigned constants).

RFA: Change behaviour of %* in spec strings

2013-10-17 Thread Nick Clifton
Hi Guys,

  I would like to make a change to the way %* behaves in spec strings.
  Currently whenever %* is encountered the text that was previously
  matched is substituted, followed by a space.  Thus %{foo=*:bar%*baz}
  would match -foo=4 and insert 'bar4 baz'.  I would like to change this
  so that the space is only inserted if %* is at the end of a spec
  sequence.  (In all of the strings currently built in to gcc this is
  the case, so my patch will not change current behaviour).

  The motivation is that I want to construct a pathname based on a
  command line option using a spec string like this:

%{mmcu=*:--script=%*/memory.ld}

  So that if the user invokes -mmcu=msp430f2617 then gcc will generate:

--script=msp430f2617/memory.ld

  A spec string like this however:

%{mmcu=*:--script=%*}/memory.ld

  would match the current behaviour and generate:
  
--script=msp430f2617 /memory.ld

  As a secondary feature of the patch I have also updated the
  documentation to explicitly state when a space will be inserted into
  the generated text.

  I have tested the patch and found no regressions using an
  i686-pc-linux-gnu toolchain and an msp430-elf toolchain.

  OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2013-10-17  Nick Clifton  ni...@redhat.com

* gcc.c (do_spec_1): Do not insert a space after a %* substitution
unless it is the last part of a spec substring.
* doc/invoke.texi (Spec Files): Document space insertion
behaviour of %*.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 203750)
+++ gcc/doc/invoke.texi (working copy)
@@ -10929,6 +10929,22 @@
 for each matching switch, with the @code{%*} replaced by the part of
 that switch matching the @code{*}.
 
+If @code{%*} appears as the last part of a spec sequence then a space
+will be added after the end of the last substitution.  If there is more
+text in the sequence, however, then a space will not be generated.  This
+allows the @code{%*} substitution to be used as part of a larger
+string.  For example, a spec string like this:
+
+@smallexample
+%@{mcu=*:--script=%*/memory.ld@}
+@end smallexample
+
+when matching an option like @code{-mcu=newchip} will produce:
+
+@smallexample
+--script=newchip/memory.ld
+@end smallexample
+
 @item %@{.@code{S}:@code{X}@}
 Substitutes @code{X}, if processing a file with suffix @code{S}.
 
Index: gcc/gcc.c
===
--- gcc/gcc.c   (revision 203750)
+++ gcc/gcc.c   (working copy)
@@ -388,7 +388,8 @@
 %2      process CC1PLUS_SPEC as a spec.
 %*      substitute the variable part of a matched option.  (See below.)
Note that each comma in the substituted string is replaced by
-   a single space.
+   a single space.  A space is appended after the last substitution
+   unless there is more text in the current sequence.
 %<S     remove all occurrences of -S from the command line.
 Note - this command is position dependent.  % commands in the
 spec string before this one will see -S, % commands in the
@@ -422,7 +423,9 @@
   once, no matter how many such switches appeared.  However,
   if %* appears somewhere in X, then X will be substituted
   once for each matching switch, with the %* replaced by the
-  part of that switch that matched the '*'.
+  part of that switch that matched the '*'.  A space will be
+ appended after the last substitution unless there is more
+ text in the current sequence.
  %{.S:X}  substitutes X, if processing a file with suffix S.
  %{!.S:X} substitutes X, if NOT processing a file with suffix S.
  %{,S:X}  substitutes X, if processing a file which will use spec S.
@@ -5375,7 +5378,17 @@
  {
if (soft_matched_part[0])
  do_spec_1 (soft_matched_part, 1, NULL);
-   do_spec_1 (" ", 0, NULL);
+   /* Only insert a space after the substitution if it is at the
+  end of the current sequence.  So if:
+
+%{foo=*:bar%*}%{foo=*:one%*two}
+
+  matches -foo=hello then it will produce:
+  
+barhello onehellotwo
+   */
+   if (*p == 0 || *p == '}')
+ do_spec_1 (" ", 0, NULL);
  }
else
  /* Catch the case where a spec string contains something like




Re: [wide-int] int_traits tree

2013-10-17 Thread Kenneth Zadeck

On 10/17/2013 08:29 AM, Richard Biener wrote:

On Thu, 17 Oct 2013, Richard Sandiford wrote:


Richard Biener rguent...@suse.de writes:

The new tree representation can have a length greater than max_len
for an unsigned tree constant that occupies a whole number of HWIs.
The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed?  _Why_ was it changed?
We still cannot use it directly from wide-int and the extra
word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).

It means that we can now use the tree's HWI array directly, without any
copying, for addr_wide_int and max_wide_int.  The only part of decompose ()
that does a copy is the small_prec case, which is trivially compiled out
for addr_wide_int and max_wide_int.

 2) addr_wide_int.  This is a fixed size representation that is
  guaranteed to be large enough to compute any bit or byte sized
  address calculation on the target.  Currently the value is 64 + 4
  bits rounded up to the next number even multiple of
  HOST_BITS_PER_WIDE_INT (but this can be changed when the first
  port needs more than 64 bits for the size of a pointer).

  This flavor can be used for all address math on the target.  In
  this representation, the values are sign or zero extended based
  on their input types to the internal precision.  All math is done
  in this precision and then the values are truncated to fit in the
  result type.  Unlike most gimple or rtl intermediate code, it is
  not useful to perform the address arithmetic at the same
  precision in which the operands are represented because there has
  been no effort by the front ends to convert most addressing
  arithmetic to canonical types.

  In the addr_wide_int, all numbers are represented as signed
  numbers.  There are enough bits in the internal representation so
  that no information is lost by representing them this way.

so I guess from that that addr_wide_int.get_precision is always
that 64 + 4 rounded up.  Thus decompose gets that constant precision
input and the extra zeros make the necessary extension always a no-op.
Aha.
it is until someone comes up with a port that this will not work for, 
then they will have to add some machinery to sniff the port and make 
this bigger.  I am hoping to be retired by the time this happens.

For max_wide_int the same rules apply, just its size is larger.

Ok.  So the reps are only canonical wide-int because we only
ever use them with precision > xprecision (maybe we should assert
that).

It is now asserted for (as of a few days ago).



Btw, we are not using them directly, but every time we actually
build an addr_wide_int / max_wide_int we copy them anyway:

/* Initialize the storage from integer X, in precision N.  */
template <int N>
template <typename T>
inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
{
   /* Check for type compatibility.  We don't want to initialize a
  fixed-width integer from something like a wide_int.  */
   WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
   wide_int_ref xi (x, N);
   len = xi.len;
  for (unsigned int i = 0; i < len; ++i)
 val[i] = xi.val[i];
}




it avoids a 2nd copy though, which shows nicely what was rummaging in
my head for the last two days - that the int_traits <> abstraction
was somehow at the wrong level - it should have been traits that
are specific to the storage model?  or the above should use
int_traits::decompose manually with it always doing the
copy (that would also optimize away one copy and eventually
would make the extra zeros not necessary).
this came in with richard's storage manager patch.  In my older code, 
we tried and succeeded many times to just borrow the underlying rep.
I think that richard needs to work this out.

I originally thought that extra zeros get rid of all copying from trees
to all wide-int kinds.

What's the reason again to not use my original proposed encoding
of the MSB being the sign bit?  RTL constants simply are all signed
then.  Just you have to also sign-extend in functions like lts_p
as not all constants are sign-extended.  But we can use both tree
(with the now appended zero) and RTL constants representation
unchanged.
I am not following you here.   In trees, the msb is effectively a sign 
bit, even for unsigned numbers because we add that extra block.


but inside of wide int, we do not add extra blocks beyond the 
precision.   That would be messy for a lot of other reasons.





Re: [wide-int] int_traits tree

2013-10-17 Thread Kenneth Zadeck

On 10/17/2013 07:30 AM, Richard Biener wrote:

On Thu, 17 Oct 2013, Richard Sandiford wrote:


Richard Biener rguent...@suse.de writes:

On Thu, 17 Oct 2013, Richard Sandiford wrote:


Kenneth Zadeck zad...@naturalbridge.com writes:

As mentioned in my message yesterday, I thought your new way of canonising
unsigned tree constants meant that there was always an upper zero bit.
Is that right?

i believe this is correct.

If so, xprecision < precision is a no-op, because the number always
has the right form for wider precisions.  The only difficult case is
xprecision == precision, since then we need to peel off any upper -1 HWIs.

say my HWI size is 8 bits (just to keep from typing a million 'f's.   if
i have a 16 bit unsigned number that is all 1s in the tree world it is 3
hwis
0x00 0xff 0xff.

but inside regular wide int, it would take 1 wide int whose value is 0xff.
inside of max it would be the same as the tree, but then the test
precision < xprecision + hbpwi never kicks in because precision is
guaranteed to be huge.

inside of addr_wide_int, i think we tank with the assertion.

It should be OK for addr_wide_int too.  The precision still fits 2 HWIs.
The initial length is greater than the maximum length of an addr_wide_int,
but your len = MAX (len, max_len) deals with that.

It's

   len = MIN (len, max_len)

Oops, yeah, I meant MIN, sorry.


which looked suspicious to me, but with precision >= xprecision
precision can only be zero if xprecision is zero which looked to
me like it cannot happen - or rather it should be fixed.

Despite the comment above the code, I don't think this MIN is there
for the zero-precision case.  I think it's there to handle the new
tree representation.

The new tree representation can have a length greater than max_len
for an unsigned tree constant that occupies a whole number of HWIs.
The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed?  _Why_ was it changed?
We still cannot use it directly from wide-int and the extra
word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).
  
It was changed because you asked me to change it.   However, you end up 
with special cases either way.

the case actually comes up on the ppc because they do a lot of 128 bit
math.  I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers
in addr_wide_int.  The justification for the assertion was that we
should explicitly truncate to addr_wide_int when deliberately
ignoring upper bits, beyond bit or byte address width.  128 bits
definitely falls into that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
for any wide_int wider than 16 bits though.


AFAIK that is what the code produces,

In which case?  This is:

precision == 16
xprecision == 16
len == 3
max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it
again to 1, so the 0x00 0xff 0xff becomes 0xff.  The 0x00 0xff
is still there in the array, but not used.
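
To make the trimming concrete, here is a stand-alone C sketch of those
two steps, using 8-bit blocks so the 0x00 0xff 0xff example can be typed
directly (the function and names are mine, not the branch's API):

#include <stdio.h>

/* Toy model of the length trimming described above.  Blocks are 8-bit
   stand-ins for HOST_WIDE_INTs, stored least-significant first; a high
   block is redundant when it just repeats the sign of the block below.  */
typedef signed char hwi;

static unsigned int
trimmed_len (const hwi *val, unsigned int len, unsigned int max_len)
{
  if (len > max_len)		/* the MIN (len, max_len) step */
    len = max_len;
  while (len > 1			/* the loop that drops redundant blocks */
	 && val[len - 1] == (val[len - 2] < 0 ? -1 : 0))
    --len;
  return len;
}

int
main (void)
{
  /* Tree rep of a 16-bit unsigned all-ones, i.e. 0x00 0xff 0xff,
     written here least-significant block first.  */
  hwi v[3] = { -1, -1, 0 };
  printf ("16-bit wide_int len: %u\n", trimmed_len (v, 3, 2));	/* 1 */
  printf ("max_wide_int len:    %u\n", trimmed_len (v, 3, 9));	/* 3 */
  return 0;
}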


but now Kenny says this is only for some kind
of wide-ints but not all?  That is, why is

inline wi::storage_ref
wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
 unsigned int precision, const_tree
x)
{
   unsigned int len = TREE_INT_CST_NUNITS (x);
   const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x,
0);
   return wi::storage_ref (val, len, precision);
}

not a valid implementation together with making sure that the
INTEGER_CST tree rep has that extra word of zeros if required?

The fundamental problem here is that we're trying to support two cases:

(a) doing N-bit arithmetic in cases where the inputs have N bits
(b) doing N-bit arithmetic in cases where the inputs have fewer than N bits
 and are extended according to TYPE_SIGN.

Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
0x8000 regardless of whether the type is signed or unsigned.  But if it's
extended to 32 bits you get two different numbers, 0xffff8000 and
0x00008000, depending on the sign.

So for one value of the precision parameter (i.e. xprecision), signed
and unsigned constants produce the same number.  But for another value
of the precision parameter (those greater than xprecision), signed and
unsigned constants produce different numbers.  Yet at the moment the tree
constant has a single representation.
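
The two extensions can be spelled out in C (my example, just to pin
down the numbers):

#include <stdio.h>
#include <stdint.h>

int
main (void)
{
  uint16_t bits = 0x8000;	/* same 16-bit representation either way */

  int32_t  as_signed   = (int16_t) bits;	/* sign-extend */
  uint32_t as_unsigned = bits;			/* zero-extend */

  printf ("signed:   0x%08x\n", (unsigned) as_signed);	 /* 0xffff8000 */
  printf ("unsigned: 0x%08x\n", (unsigned) as_unsigned); /* 0x00008000 */
  return 0;
}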

But a correctly extended one, up to its len!  (as opposed to RTL)


So I think the possibilities are:

(1) Use the representation of an N-bit wide_int to store N-bit tree constants.
 Do work when extending them to wider 

RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2

2013-10-17 Thread Bernd Edlinger
On Wed, 16 Oct 2013 22:50:20, DJ Delorie wrote:
 Not all of them can work, because they describe something that can't
 be done in hardware. For example, the first test has an incomplete
 bitfield - the fields do not completely describe an int so the
 structure is smaller (one byte, according to sizeof()) than the access
 type (2-8 bytes).


where in the C standard did you read the requirement that every bit
field must be complete? (This is a serious question).
extern struct
{
  volatile unsigned int b : 1;
} bf3;

on my compiler this structure occupies 4 bytes.
and it is aligned at 4 bytes.
That is OK for me and AAPCS.

But the access bf3.b=1 is SI mode with Sandra's patch (part 1, which just
obeys the AAPCS and does nothing else) 
and QI mode without this patch, which is just a BUG.
I am quite surprised how your target manages to avoid it?

It is as Sandra said, at least on ARM -fstrict-volatile-bitfields
does not function at all. And the C++11 memory model wins all the time.

 Looking through the tests, most of them combine packed with
 mismatched types. IMHO, those tests are invalid.

I don't think so. They are simply packed. And volatile just says that
the value may be changed by a different thread. It has a great
impact on loop optimizations.
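
To make the case concrete, a minimal sketch of such a packed mismatch
(my example, not one of the tests in the series):

/* The field's declared type asks for a 4-byte (SImode) access, but the
   packed struct only provides 1 byte of storage, so the access cannot
   be performed as declared.  */
struct s
{
  volatile unsigned int b : 1;	/* declared type: 4 bytes */
} __attribute__ ((packed));	/* sizeof (struct s) == 1  */

extern struct s x;

void
set_b (void)
{
  x.b = 1;	/* SImode or QImode access?  That is the disagreement.  */
}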

 Either way, if -fstrict-volatile-bitfields does not do what it's
 supposed to do, the correct action is to fix it - not to disable it on
 targets that rely on it for correct operation.

Agreed. That is the number one priority here.

...
 I've not objected to fixing -fstrict-volatile-bitfields, or making the
 -fno-strict-volatile-bitfields case match the standard. I've only
 objected to breaking my targets by making that flag not the default.

Fine. Why cant we just get this fixed?

Bernd.

Re: [PATCH][i386]Fix PR 57756

2013-10-17 Thread Diego Novillo
On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote:

 How is Google going to change its patch commit policies to ensure that
 this does not happen again?

There is nothing to change.  Google follows
http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed
the oversight and if there is any other fallout from his patch, he
will address it.

I don't see anything out of the ordinary here.


Diego.


[PATCH] Fixup handling of zero-precision integer types

2013-10-17 Thread Richard Biener

These two patches try to fix handling of zero-precision integer
types (created by struct { int : 0; };).  Currently they get
assigned TYPE_MIN/MAX_VALUEs but clearly a zero-precision
integer type does not have a single valid value.  Of course
for these nothing should look at TYPE_MIN/MAX_VALUE you'd think ...
Clearly you are wrong.  See the patch below which bootstrapped
and tested on x86_64-unknown-linux-gnu for all languages
in the attempt to apply the 2nd patch (still testing).
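
For reference, a minimal example of the construct that creates such a
type (mine, not from the patch's testcases):

/* The unnamed zero-width bitfield only directs layout: it forces the
   next field onto a fresh int boundary.  It contributes no storage,
   so its zero-precision type has no valid values at all.  */
struct zero_prec
{
  char c;
  int : 0;	/* zero-width bitfield: layout only, no value */
  char d;	/* starts on the next int-aligned boundary     */
};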

Objections?

Thanks,
Richard.


Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   (revision 203744)
+++ gcc/stor-layout.c   (working copy)
@@ -2054,7 +2054,7 @@ layout_type (tree type)
 
 case BOOLEAN_TYPE:  /* Used for Java, Pascal, and Chill.  */
   if (TYPE_PRECISION (type) == 0)
-   TYPE_PRECISION (type) = 1; /* default to one byte/boolean.  */
+   gcc_unreachable ();
 
   /* ... fall through ...  */
 
@@ -2062,7 +2062,7 @@ layout_type (tree type)
 case ENUMERAL_TYPE:
   if (TREE_CODE (TYPE_MIN_VALUE (type)) == INTEGER_CST
	  && tree_int_cst_sgn (TYPE_MIN_VALUE (type)) >= 0)
-   TYPE_UNSIGNED (type) = 1;
+   gcc_assert (TYPE_UNSIGNED (type));
 
   SET_TYPE_MODE (type,
 smallest_mode_for_size (TYPE_PRECISION (type), MODE_INT));



2013-10-17  Richard Biener  rguent...@suse.de

* stor-layout.c (layout_type): Do not change TYPE_PRECISION
or TYPE_UNSIGNED of integral types.
(set_min_and_max_values_for_integral_type): Leave TYPE_MIN/MAX_VALUE
NULL_TREE for zero-precision integral types.

Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   (revision 203744)
+++ gcc/stor-layout.c   (working copy)
@@ -2052,18 +2052,9 @@ layout_type (tree type)
 of the language-specific code.  */
   gcc_unreachable ();
 
-case BOOLEAN_TYPE:  /* Used for Java, Pascal, and Chill.  */
-  if (TYPE_PRECISION (type) == 0)
-   TYPE_PRECISION (type) = 1; /* default to one byte/boolean.  */
-
-  /* ... fall through ...  */
-
+case BOOLEAN_TYPE:
 case INTEGER_TYPE:
 case ENUMERAL_TYPE:
-  if (TREE_CODE (TYPE_MIN_VALUE (type)) == INTEGER_CST
-  && tree_int_cst_sgn (TYPE_MIN_VALUE (type)) >= 0)
-   TYPE_UNSIGNED (type) = 1;
-
   SET_TYPE_MODE (type,
 smallest_mode_for_size (TYPE_PRECISION (type), MODE_INT));
   TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));
@@ -2520,6 +2511,12 @@ set_min_and_max_values_for_integral_type
   tree min_value;
   tree max_value;
 
+  /* For bitfields with zero width we end up creating integer types
+ with zero precision.  Don't assign any minimum/maximum values
+ to those types, they don't have any valid value.  */
+  if (precision < 1)
+return;
+
   if (is_unsigned)
 {
   min_value = build_int_cst (type, 0);


Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Biener
On Thu, 17 Oct 2013, Kenneth Zadeck wrote:

 On 10/17/2013 08:29 AM, Richard Biener wrote:
  On Thu, 17 Oct 2013, Richard Sandiford wrote:
  
   Richard Biener rguent...@suse.de writes:
 The new tree representation can have a length greater than max_len
 for an unsigned tree constant that occupies a whole number of HWIs.
 The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
 When extended to max_wide_int the representation is the same.
 But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading
 zero.
 The MIN trims the length from 3 to 2 in the last case.
Oh, so it was the tree rep that changed?  _Why_ was it changed?
We still cannot use it directly from wide-int and the extra
word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE
()).
   It means that we can now use the tree's HWI array directly, without any
   copying, for addr_wide_int and max_wide_int.  The only part of decompose
   ()
   that does a copy is the small_prec case, which is trivially compiled out
   for addr_wide_int and max_wide_int.
   2) addr_wide_int.  This is a fixed size representation that is
guaranteed to be large enough to compute any bit or byte sized
address calculation on the target.  Currently the value is 64 + 4
bits rounded up to the next number even multiple of
HOST_BITS_PER_WIDE_INT (but this can be changed when the first
port needs more than 64 bits for the size of a pointer).
  
This flavor can be used for all address math on the target.  In
this representation, the values are sign or zero extended based
on their input types to the internal precision.  All math is done
in this precision and then the values are truncated to fit in the
result type.  Unlike most gimple or rtl intermediate code, it is
not useful to perform the address arithmetic at the same
precision in which the operands are represented because there has
been no effort by the front ends to convert most addressing
arithmetic to canonical types.
  
In the addr_wide_int, all numbers are represented as signed
numbers.  There are enough bits in the internal representation so
that no information is lost by representing them this way.
  
  so I guess from that that addr_wide_int.get_precision is always
  that 64 + 4 rounded up.  Thus decompose gets that constant precision
  input and the extra zeros make the necessary extension always a no-op.
  Aha.
 it is until someone comes up with a port that this will not work for, then
 they will have to add some machinery to sniff the port and make this bigger.
 I am hoping to be retired by the time this happens.
  For max_wide_int the same rules apply, just its size is larger.
  
  Ok.  So the reps are only canonical wide-int because we only
  ever use them with precision > xprecision (maybe we should assert
  that).
 It is now asserted for (as of a few days ago).
 
  
  Btw, we are not using them directly, but every time we actually
  build a addr_wide_int / max_wide_int we copy them anyway:
  
  /* Initialize the storage from integer X, in precision N.  */
  template <int N>
  template <typename T>
  inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
  {
 /* Check for type compatibility.  We don't want to initialize a
fixed-width integer from something like a wide_int.  */
 WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
 wide_int_ref xi (x, N);
 len = xi.len;
 for (unsigned int i = 0; i < len; ++i)
   val[i] = xi.val[i];
  }
 
 
  it avoids a 2nd copy though, which shows nicely what was rummaging in
  my head for the last two days - that the int_traits <> abstraction
  was somehow at the wrong level - it should have been traits that
  are specific to the storage model?  or the above should use
  int_traits <>::decompose manually with it always doing the
  copy (that would also optimize away one copy and eventually
  would make the extra zeros not necessary).
 this came in with richard's storage manager patch.  In my older code, we
 tried and succeeded many times to just borrow the underlying rep.  I think
 that richard needs to work this out.
  I originally thought that extra zeros get rid of all copying from trees
  to all wide-int kinds.
  
  What's the reason again to not use my original proposed encoding
  of the MSB being the sign bit?  RTL constants simply are all signed
  then.  Just you have to also sign-extend in functions like lts_p
  as not all constants are sign-extended.  But we can use both tree
  (with the now appended zero) and RTL constants representation
  unchanged.
 I am not following you here.   In trees, the msb is effectively a sign bit,
 even for unsigned numbers because we add that extra block.
 
 but inside of wide int, we do not add extra blocks beyond the precision.
 That would be messy for a lot of other reasons.


Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Biener
On Thu, 17 Oct 2013, Kenneth Zadeck wrote:

 On 10/17/2013 07:30 AM, Richard Biener wrote:
  Oh, so it was the tree rep that changed?  _Why_ was it changed?
  We still cannot use it directly from wide-int and the extra
  word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).

 It was changed because you asked me to change it.   However, you end up with
 special cases either way.

It was a mis-communication - I thought you added this to the wide-int
rep to be able to use both tree and RTX reps directly.

Richard.


Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Sandiford
Richard Biener rguent...@suse.de writes:
 On Thu, 17 Oct 2013, Richard Sandiford wrote:

 Richard Biener rguent...@suse.de writes:
  The new tree representation can have a length greater than max_len
  for an unsigned tree constant that occupies a whole number of HWIs.
  The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
  When extended to max_wide_int the representation is the same.
  But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
  The MIN trims the length from 3 to 2 in the last case.
 
  Oh, so it was the tree rep that changed?  _Why_ was it changed?
  We still cannot use it directly from wide-int and the extra
  word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).
 
 It means that we can now use the tree's HWI array directly, without any
 copying, for addr_wide_int and max_wide_int.  The only part of decompose ()
 that does a copy is the small_prec case, which is trivially compiled out
 for addr_wide_int and max_wide_int.

  2) addr_wide_int.  This is a fixed size representation that is
  guaranteed to be large enough to compute any bit or byte sized
  address calculation on the target.  Currently the value is 64 + 4
  bits rounded up to the next number even multiple of
  HOST_BITS_PER_WIDE_INT (but this can be changed when the first
  port needs more than 64 bits for the size of a pointer).

  This flavor can be used for all address math on the target.  In
  this representation, the values are sign or zero extended based
  on their input types to the internal precision.  All math is done
  in this precision and then the values are truncated to fit in the
  result type.  Unlike most gimple or rtl intermediate code, it is
  not useful to perform the address arithmetic at the same
  precision in which the operands are represented because there has
  been no effort by the front ends to convert most addressing
  arithmetic to canonical types.

  In the addr_wide_int, all numbers are represented as signed
  numbers.  There are enough bits in the internal representation so
  that no information is lost by representing them this way.

 so I guess from that that addr_wide_int.get_precision is always
 that 64 + 4 rounded up.  Thus decompose gets that constant precision
 input and the extra zeros make the necessary extension always a no-op.
 Aha.

 For max_wide_int the same rules apply, just its size is larger.

 Ok.  So the reps are only canonical wide-int because we only
 ever use them with precision > xprecision (maybe we should assert
 that).

No, we allow precision == xprecision for addr_wide_int and max_wide_int too.
But all we do in that case is trim the length.

Requiring precision > xprecision was option (5) from my message.

 Btw, we are not using them directly, but every time we actually
 build a addr_wide_int / max_wide_int we copy them anyway:

 /* Initialize the storage from integer X, in precision N.  */
 template <int N>
 template <typename T>
 inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
 {
   /* Check for type compatibility.  We don't want to initialize a
  fixed-width integer from something like a wide_int.  */
   WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
   wide_int_ref xi (x, N);
   len = xi.len;
   for (unsigned int i = 0; i < len; ++i)
 val[i] = xi.val[i];
 }

Are you saying that:

 max_wide_int x = (tree) y;

should just copy a pointer?  Then what about:

 max_wide_int z = x;

Should that just copy a pointer too, or is it OK for the second constructor
to do a copy?

In most cases we only use the fixed_wide_int to store the result of the
computation.  We don't use the above constructor for things like:

 max_wide_int x = wi::add ((tree) y, (int) z);

wi::add uses y and z without going through max_wide_int.

 What's the reason again to not use my original proposed encoding
 of the MSB being the sign bit?  RTL constants simply are all signed
 then.  Just you have to also sign-extend in functions like lts_p
 as not all constants are sign-extended.  But we can use both tree
 (with the now appended zero) and RTL constants representation
 unchanged.

The answer's the same as always.  RTL constants don't have a sign.
Any time we extend an RTL constant, we need to say whether the extension
should be sign extension or zero extension.  So the natural model for
RTL is that an SImode addition is done to SImode width, not some
arbitrary wider width.

The branch uses wide_int for RTL quite naturally, and changing it
wouldn't help do anything with the tree decompose routine.

Thanks,
Richard



Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Sandiford
Kenneth Zadeck zad...@naturalbridge.com writes:
 it avoids a 2nd copy though, which shows nicely what was rummaging in
 my head for the last two days - that the int_traits <> abstraction
 was somehow at the wrong level - it should have been traits that
 are specific to the storage model?  or the above should use
 int_traits <>::decompose manually with it always doing the
 copy (that would also optimize away one copy and eventually
 would make the extra zeros not necessary).
 this came in with richard's storage manager patch.  In my older code, 
 we tried and succeeded many times to just borrow the underlying rep.
 I think that richard needs to work this out.

Hmm, you mean the patch to introduce all the wi:: stuff?  Which cases
do you mean?

That patch didn't change the HWI representation (and therefore when a
copy was needed by the to_shwi*/decompose routines).  And it got rid of
quite a few *_wide_int constructions, things like:

- wide_int wi = (max_wide_int (prev_value)
-.add (1, sgn, &overflowed));
+ wide_int wi = wi::add (prev_value, 1, sgn, &overflowed);

Thanks,
Richard



[Ada] Handle constraint error in vector when large index type

2013-10-17 Thread Arnaud Charlet
To compute the range of values in the generic actual Index_Type for a
vector, there is a series of compile-time tests to determine which of
Index_Type or Count_Type to use for intermediate values.

In the case of an Index_Type comprising a range of values larger than
in Count_Type, the number of values was computed using Index_Type, and
then converted to Count_Type. However, even though the computation of
the result does not overflow, the conversion of the result to type
Count_Type will fail when the value is greater than
Count_Type'Last. The solution is to first test whether the result is
less than Count_Type'Last, and only then convert the result.
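
The shape of the fix, sketched in C rather than Ada (the type names are
mine; they play the roles of Index_Type'Base and Count_Type):

#include <limits.h>

typedef long long index_diff;		/* wide type of the subtraction */
typedef unsigned int count_type;	/* narrower target type         */

static count_type
max_length (index_diff last_minus_no_index)
{
  /* Compare in the wide type first; only convert once the value is
     known to fit, instead of converting and failing the range check.  */
  if (last_minus_no_index >= (index_diff) UINT_MAX)
    return UINT_MAX;		/* clamp to Count_Type'Last */
  return (count_type) last_minus_no_index;
}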

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Matthew Heaney  hea...@adacore.com

* a-convec.adb, a-coinve.adb, a-cobove.adb (Insert, Insert_Space):
Inspect value range before converting type.

Index: a-coinve.adb
===
--- a-coinve.adb(revision 203568)
+++ a-coinve.adb(working copy)
@@ -1734,7 +1734,22 @@
 --  worry about if No_Index were less than 0, but that case is
 --  handled above).
 
-Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+if Index_Type'Last - No_Index >=
+ Count_Type'Pos (Count_Type'Last)
+then
+   --  We have determined that range of Index_Type has at least as
+   --  many values as in Count_Type, so Count_Type'Last is the
+   --  maximum number of items that are allowed.
+
+   Max_Length := Count_Type'Last;
+
+else
+   --  The range of Index_Type has fewer values than in Count_Type,
+   --  so the maximum number of items is computed from the range of
+   --  the Index_Type.
+
+   Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+end if;
  end if;
 
   elsif Index_Type'First = 0 then
@@ -2504,7 +2519,22 @@
 --  worry about if No_Index were less than 0, but that case is
 --  handled above).
 
-Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+if Index_Type'Last - No_Index >=
+ Count_Type'Pos (Count_Type'Last)
+then
+   --  We have determined that range of Index_Type has at least as
+   --  many values as in Count_Type, so Count_Type'Last is the
+   --  maximum number of items that are allowed.
+
+   Max_Length := Count_Type'Last;
+
+else
+   --  The range of Index_Type has fewer values than in Count_Type,
+   --  so the maximum number of items is computed from the range of
+   --  the Index_Type.
+
+   Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+end if;
  end if;
 
   elsif Index_Type'First = 0 then
Index: a-cobove.adb
===
--- a-cobove.adb(revision 203568)
+++ a-cobove.adb(working copy)
@@ -1227,7 +1227,22 @@
 --  worry about if No_Index were less than 0, but that case is
 --  handled above).
 
-Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+if Index_Type'Last - No_Index >=
+ Count_Type'Pos (Count_Type'Last)
+then
+   --  We have determined that range of Index_Type has at least as
+   --  many values as in Count_Type, so Count_Type'Last is the
+   --  maximum number of items that are allowed.
+
+   Max_Length := Count_Type'Last;
+
+else
+   --  The range of Index_Type has fewer values than in Count_Type,
+   --  so the maximum number of items is computed from the range of
+   --  the Index_Type.
+
+   Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+end if;
  end if;
 
   elsif Index_Type'First = 0 then
@@ -1685,7 +1700,22 @@
 --  worry about if No_Index were less than 0, but that case is
 --  handled above).
 
-Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+if Index_Type'Last - No_Index >=
+ Count_Type'Pos (Count_Type'Last)
+then
+   --  We have determined that range of Index_Type has at least as
+   --  many values as in Count_Type, so Count_Type'Last is the
+   --  maximum number of items that are allowed.
+
+   Max_Length := Count_Type'Last;
+
+else
+   --  The range of Index_Type has fewer values than in Count_Type,
+   --  so the maximum number of items is computed from the range of
+   --  the Index_Type.
+
+   Max_Length := Count_Type'Base (Index_Type'Last - 

Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Biener
On Thu, 17 Oct 2013, Richard Sandiford wrote:

 Richard Biener rguent...@suse.de writes:
  On Thu, 17 Oct 2013, Richard Sandiford wrote:
 
  Richard Biener rguent...@suse.de writes:
   The new tree representation can have a length greater than max_len
   for an unsigned tree constant that occupies a whole number of HWIs.
   The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
   When extended to max_wide_int the representation is the same.
   But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
   The MIN trims the length from 3 to 2 in the last case.
  
   Oh, so it was the tree rep that changed?  _Why_ was it changed?
   We still cannot use it directly from wide-int and the extra
   word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).
  
  It means that we can now use the tree's HWI array directly, without any
  copying, for addr_wide_int and max_wide_int.  The only part of decompose ()
  that does a copy is the small_prec case, which is trivially compiled out
  for addr_wide_int and max_wide_int.
 
   2) addr_wide_int.  This is a fixed size representation that is
   guaranteed to be large enough to compute any bit or byte sized
   address calculation on the target.  Currently the value is 64 + 4
   bits rounded up to the next number even multiple of
   HOST_BITS_PER_WIDE_INT (but this can be changed when the first
   port needs more than 64 bits for the size of a pointer).
 
   This flavor can be used for all address math on the target.  In
   this representation, the values are sign or zero extended based
   on their input types to the internal precision.  All math is done
   in this precision and then the values are truncated to fit in the
   result type.  Unlike most gimple or rtl intermediate code, it is
   not useful to perform the address arithmetic at the same
   precision in which the operands are represented because there has
   been no effort by the front ends to convert most addressing
   arithmetic to canonical types.
 
   In the addr_wide_int, all numbers are represented as signed
   numbers.  There are enough bits in the internal representation so
   that no information is lost by representing them this way.
 
  so I guess from that that addr_wide_int.get_precision is always
  that 64 + 4 rounded up.  Thus decompose gets that constant precision
  input and the extra zeros make the necessary extension always a no-op.
  Aha.
 
  For max_wide_int the same rules apply, just its size is larger.
 
  Ok.  So the reps are only canonical wide-int because we only
  ever use them with precision > xprecision (maybe we should assert
  that).
 
 No, we allow precision == xprecision for addr_wide_int and max_wide_int too.
 But all we do in that case is trim the length.
 
 Requiring precision > xprecision was option (5) from my message.
 
  Btw, we are not using them directly, but every time we actually
  build a addr_wide_int / max_wide_int we copy them anyway:
 
  /* Initialize the storage from integer X, in precision N.  */
  template <int N>
  template <typename T>
  inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
  {
/* Check for type compatibility.  We don't want to initialize a
   fixed-width integer from something like a wide_int.  */
WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
wide_int_ref xi (x, N);
len = xi.len;
for (unsigned int i = 0; i < len; ++i)
  val[i] = xi.val[i];
  }
 
 Are you saying that:
 
  max_wide_int x = (tree) y;
 
 should just copy a pointer?  Then what about:

No, it should do a copy.  But only one - which it does now with
the append-extra-zeros-in-tree-rep, I was just thinking how to
avoid it when not doing that which means adjusting the rep in
the copy we need to do anyway.  If fixed_wide_int_storage is
really the only reason to enlarge the tree rep.
 
  max_wide_int z = x;
 
 Should that just copy a pointer too, or is it OK for the second constructor
 to do a copy?
 
 In most cases we only use the fixed_wide_int to store the result of the
 computation.  We don't use the above constructor for things like:
 
  max_wide_int x = wi::add ((tree) y, (int) z);
 
 wi::add uses y and z without going through max_wide_int.

Yes, I am aware of that.

  What's the reason again to not use my original proposed encoding
  of the MSB being the sign bit?  RTL constants simply are all signed
  then.  Just you have to also sign-extend in functions like lts_p
  as not all constants are sign-extended.  But we can use both tree
  (with the now appended zero) and RTL constants representation
  unchanged.
 
 The answer's the same as always.  RTL constants don't have a sign.
 Any time we extend an RTL constant, we need to say whether the extension
 should be sign extension or zero extension.  So the natural model for
 RTL is that an SImode addition is done to SImode width, not some
 arbitrary wider 

[Ada] Illegal constituent in state refinement

2013-10-17 Thread Arnaud Charlet
This patch modifies the logic in the analysis of aspect/pragma Refined_State
to catch a case where a visible variable is used as a constituent in a state
refinement.


-- Source --


--  pack.ads

package Pack
  with Abstract_State => State
is
   Var : Integer;

   procedure Proc (Formal : Integer)
 with Global => (Output => State);
end Pack;

--  pack.adb

package body Pack
  with Refined_State => (State => Var)
is
   procedure Proc (Formal : Integer)
 with Refined_Global => (Output => Var)
   is begin null; end Proc;
end Pack;


-- Compilation and output --


$ gcc -c -gnatd.V pack.adb
pack.adb:2:35: cannot use "Var" in refinement, constituent is not a hidden
  state of package "Pack"
pack.adb:5:11: useless refinement, subprogram "Proc" does not mention abstract
  state with visible refinement

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Hristian Kirtchev  kirtc...@adacore.com

* sem_prag.adb (Analyze_Constituent): Move the check
concerning option Part_Of to routine Check_Matching_Constituent.
(Check_Matching_Constituent): Verify that an abstract state
that acts as a constituent has the proper Part_Of option in
its aspect/pragma Abstract_State.  Account for the case when a
constituent comes from a private child or private sibling.
* sem_util.ads, sem_util.adb (Is_Child_Or_Sibling): New routine.

Index: sem_prag.adb
===
--- sem_prag.adb(revision 203755)
+++ sem_prag.adb(working copy)
@@ -21439,52 +21439,75 @@
   Error_Msg_NE
  ("duplicate use of constituent &", Constit, Constit_Id);
   return;
-   end if;
 
-   --  The related package has no hidden states, nothing to match.
-   --  This case arises when the constituents are states coming
-   --  from a private child.
+   --  A state can act as a constituent only when it is part of
+   --  another state. This relation is expressed by option Part_Of
+   --  of pragma Abstract_State.
 
-   if No (Hidden_States) then
-  return;
+   elsif Ekind (Constit_Id) = E_Abstract_State then
+  if not Is_Part_Of (Constit_Id, State_Id) then
+ Error_Msg_Name_1 := Chars (State_Id);
+ Error_Msg_NE
+   ("state & is not a valid constituent of ancestor "
+ & "state %", Constit, Constit_Id);
+ return;
+
+  --  The constituent has the proper Part_Of option, but may
+  --  not appear in the immediate hidden state of the related
+  --  package. This case arises when the constituent comes from
+  --  a private child or a private sibling. Recognize these
+  --  scenarios to avoid generating a bogus error message.
+
+  elsif Is_Child_Or_Sibling
+  (Pack_1=> Scope (State_Id),
+   Pack_2=> Scope (Constit_Id),
+   Private_Child => True)
+  then
+ return;
+  end if;
end if;
 
--  Inspect the hidden states of the related package looking for
--  a match.
 
-   State_Elmt := First_Elmt (Hidden_States);
-   while Present (State_Elmt) loop
+   if Present (Hidden_States) then
+  State_Elmt := First_Elmt (Hidden_States);
+  while Present (State_Elmt) loop
 
-  --  A valid hidden state or variable participates in a
-  --  refinement. Add the constituent to the list of processed
-  --  items to aid with the detection of duplicate constituent
-  --  use. Remove the constituent from Hidden_States to signal
-  --  that it has already been used.
+ --  A valid hidden state or variable acts as a constituent
 
-  if Node (State_Elmt) = Constit_Id then
- Add_Item (Constit_Id, Constituents_Seen);
- Remove_Elmt (Hidden_States, State_Elmt);
+ if Node (State_Elmt) = Constit_Id then
 
- --  Collect the constituent in the list of refinement
- --  items. Establish a relation between the refined state
- --  and its constituent.
+--  Add the constituent to the list of processed items
+--  to aid with the detection of duplicates. Remove the
+--  constituent from Hidden_States to signal that it
+--  has already been matched.
 
- 

[Ada] Containment of finalization actions in short circuit operators

2013-10-17 Thread Arnaud Charlet
This change ensures that any finalization action required within an expression
that appears as the left operand of a short circuit operator remains
contained within the code fragment that evaluates that operand (and not
scattered after the evaluation of the complete Boolean expression). This
is helpful for static analysis tools, in particular for coverage analysis.

This is achieved by enclosing the left operand in an Expression_With_Actions,
which will also capture the required finalization actions. A number of
adjustments are made so that the rest of the compiler knows to deal
appropriately with the new occurrences of these expressions with actions.

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Thomas Quinot  qui...@adacore.com

* exp_util.adb (Get_Current_Value_Condition,
Set_Current_Value_Condition): Handle the case of expressions
with actions.
* exp_util.adb (Insert_Actions): Handle the case
of an expression with actions whose Actions list is empty.
* exp_util.adb (Remove_Side_Effects.Side_Effect_Free): An
expression with actions that has no Actions and whose Expression
is side effect free is itself side effect free.
* exp_util.adb (Remove_Side_Effects): Do not set an incorrect etype on
temporary 'R' (Def_Id), which is in general an access to Exp_Type, not
an Exp_Type.
* sem_res.adb (Resolve): For an expression with
actions, resolve the expression early.
* sem_res.adb (Resolve_Expression_With_Actions): Rewrite an expression with
actions whose value is compile time known and which has no
actions into just its expression, so that its constant value is
available downstream.
* sem_res.adb (Resolve_Short_Circuit):
Wrap the left operand in an expression with actions to contain
any required finalization actions.
* exp_ch4.adb (Expand_Expression_With_Actions): For an
expression with actions returning a Boolean expression, ensure
any finalization action is kept within the Actions list.
* sem_warn.adb (Check_References, Check_Unset_Reference): Add
missing circuitry to handle expressions with actions.
* checks.adb (Ensure_Valid): For an expression with actions,
insert the validity check on the Expression.
* sem_ch13.adb (Build_Static_Predicate.Get_RList): An expression
with actions that has a non-empty Actions list is not static. An
expression with actions that has an empty Actions list has the
static ranges of its Expression.
* sem_util.adb (Has_No_Obvious_Side_Effects): An expression with
actions with an empty Actions list has no obvious side effects
if its Expression itself has no obvious side effects.

Index: exp_util.adb
===
--- exp_util.adb(revision 203762)
+++ exp_util.adb(working copy)
@@ -2706,18 +2706,36 @@
 (N : Node_Id;
  S : Boolean)
   is
- Cond : Node_Id;
- Sens : Boolean;
+ Cond  : Node_Id;
+ Prev_Cond : Node_Id;
+ Sens  : Boolean;
 
   begin
  Cond := N;
  Sens := S;
 
- --  Deal with NOT operators, inverting sense
+ loop
+Prev_Cond := Cond;
 
- while Nkind (Cond) = N_Op_Not loop
-Cond := Right_Opnd (Cond);
-Sens := not Sens;
+--  Deal with NOT operators, inverting sense
+
+while Nkind (Cond) = N_Op_Not loop
+   Cond := Right_Opnd (Cond);
+   Sens := not Sens;
+end loop;
+
+--  Deal with conversions, qualifications, and expressions with
+--  actions.
+
+while Nkind_In (Cond,
+N_Type_Conversion,
+N_Qualified_Expression,
+N_Expression_With_Actions)
+loop
+   Cond := Expression (Cond);
+end loop;
+
+exit when Cond = Prev_Cond;
  end loop;
 
  --  Deal with AND THEN and AND cases
@@ -2798,9 +2816,16 @@
 
 return;
 
---  Case of Boolean variable reference, return as though the
---  reference had said var = True.
+ elsif Nkind_In (Cond,
+ N_Type_Conversion,
+ N_Qualified_Expression,
+ N_Expression_With_Actions)
+ then
+Cond := Expression (Cond);
 
+ --  Case of Boolean variable reference, return as though the
+ --  reference had said var = True.
+
  else
 if Is_Entity_Name (Cond) and then Ent = Entity (Cond) then
Val := New_Occurrence_Of (Standard_True, Sloc (Cond));
@@ -3406,8 +3431,13 @@
 
when N_Expression_With_Actions =>
if N = Expression (P) then
-  

Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Sandiford
Richard Biener rguent...@suse.de writes:
  What's the reason again to not use my original proposed encoding
  of the MSB being the sign bit?  RTL constants simply are all signed
  then.  Just you have to also sign-extend in functions like lts_p
  as not all constants are sign-extended.  But we can use both tree
  (with the now appended zero) and RTL constants representation
  unchanged.
 
 The answer's the same as always.  RTL constants don't have a sign.
 Any time we extend an RTL constant, we need to say whether the extension
 should be sign extension or zero extension.  So the natural model for
 RTL is that an SImode addition is done to SImode width, not some
 arbitrary wider width.

 RTL constants are sign-extended (whether you call them then signed
 is up to you).  They have a precision.  This is how they are
 valid reps for wide-ints, and that doesn't change.

 I was saying that if we make not _all_ wide-ints sign-extended
 then we can use the tree rep as-is.  We'd then have the wide-int
 rep being either zero or sign extended but not arbitrary random
 bits outside of the precision (same as the tree rep).

OK, but does that have any practical value over leaving them arbitrary,
as in Kenny's original implementation?  Saying that upper bits can be
either signs or zeros means that readers wouldn't be able to rely on
one particular extension (so would have to do the work that they did
in Kenny's implementation).  At the same time it means that writers
can't leave the upper bits in an arbitrary state, so would have to
make sure that they are signs or zeros (presumably having a free
choice of which).
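
For the single-block case the reader-side work looks like this (my
sketch, with `precision' assumed to lie between 1 and 64): with
arbitrary upper bits, something like lts_p has to force an extension
before comparing; with a canonical sign-extended rep both sext calls
below would be no-ops.

#include <stdbool.h>
#include <stdint.h>

static int64_t
sext (uint64_t x, unsigned int precision)
{
  uint64_t sign = 1ULL << (precision - 1);
  uint64_t mask = (sign << 1) - 1;	/* low `precision' bits */
  return (int64_t) (((x & mask) ^ sign) - sign);
}

static bool
lts_p_1block (uint64_t a, uint64_t b, unsigned int precision)
{
  /* If the rep guaranteed sign extension, a and b could be compared
     directly as signed values here.  */
  return sext (a, precision) < sext (b, precision);
}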

Thanks,
Richard



[Ada] Warnings on unused with_clauses in subunits

2013-10-17 Thread Arnaud Charlet
When generating code, subunits are inserted into the tree of the parent unit,
and their context is added to the context of the parent. This makes it hard
to determine whether any with_clause on the proper body is superfluous. This
patch adds circuitry to detect these superfluous with_clauses when the subunit
is compiled by itself, in -gnatc mode.

Executing:

 gcc -c -gnatc -gnatwu pkg-proc.adb

must yield:

pkg-proc.adb:1:06: warning: unit "Unused_Pkg" is not referenced
pkg-proc.adb:2:06: warning: unit "Unused_Pkg_2" is not referenced
pkg.adb:3:03: warning: variable "Variable" is never read and never assigned

---
with Unused_Pkg;
with Unused_Pkg_2;
with Ada.Text_Io;
separate (Pkg)
procedure Proc is
begin
  Ada.Text_Io.Put_Line ("Hello");
end Proc;
---
with Unused_Pkg;
package body pkg is
  Variable : Unused_Pkg.T;
  procedure Proc is separate;
end Pkg;
---
package Pkg is
  procedure Proc;
end Pkg;
---
package body Unused_Pkg is

  procedure Empty is
  begin
null;
  end Empty;

end Unused_Pkg;
--
package Unused_Pkg is
  type T is (Tbd);
  procedure Empty;
end Unused_Pkg;
--
package Unused_Pkg_2 is
  type T is (Tbd);
end Unused_Pkg_2;

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Ed Schonberg  schonb...@adacore.com

* sem_warn.adb (Check_Unused_Withs): If the main unit is a
subunit, apply the check to the units mentioned in its context
only. This provides additional warnings on with_clauses that
are superfluous.

Index: sem_warn.adb
===
--- sem_warn.adb(revision 203763)
+++ sem_warn.adb(working copy)
@@ -2545,13 +2545,16 @@
  return;
   end if;
 
-  --  Flag any unused with clauses, but skip this step if we are compiling
-  --  a subunit on its own, since we do not have enough information to
-  --  determine whether with's are used. We will get the relevant warnings
-  --  when we compile the parent. This is the normal style of GNAT
-  --  compilation in any case.
+  --  Flag any unused with clauses. For a subunit, check only the units
+  --  in its context, not those of the parent, which may be needed by other
+  --  subunits.  We will get the full warnings when we compile the parent,
+  --  but the following is helpful when compiling a subunit by itself.
 
   if Nkind (Unit (Cunit (Main_Unit))) = N_Subunit then
+ if Current_Sem_Unit = Main_Unit then
+Check_One_Unit (Main_Unit);
+ end if;
+
  return;
   end if;
 


[Ada] Non-SPARK package bodies and state refinement

2013-10-17 Thread Arnaud Charlet
This patch modifies the analysis of package bodies to suppress an error message
prompting for state refinement when the body's SPARK mode is Off.


-- Source --


--  pack.ads

package Pack
   with Abstract_State => State,
Initializes => State
is
   procedure Proc
  with Global  => (In_Out => State),
   Depends => (State => State);
end Pack;

--  pack.adb

package body Pack is
   pragma SPARK_Mode (Off);

   X : Integer := 0;

   procedure Proc is
   begin
  X := X + 1;
   end Proc;
end Pack;

-----------------
-- Compilation --
-----------------

$ gcc -c -gnatd.V pack.adb

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Hristian Kirtchev  kirtc...@adacore.com

* sem_ch3.adb (Analyze_Declarations): Emit an
error message concerning state refinement when the spec defines at
least one non-null abstract state and the body's SPARK mode is On.
(Requires_State_Refinement): New routine.

Index: sem_ch3.adb
===
--- sem_ch3.adb (revision 203762)
+++ sem_ch3.adb (working copy)
@@ -2071,6 +2071,12 @@
   --  If the states have visible refinement, remove the visibility of each
   --  constituent at the end of the package body declarations.
 
+  function Requires_State_Refinement
+(Spec_Id : Entity_Id;
+ Body_Id : Entity_Id) return Boolean;
+  --  Determine whether a package denoted by its spec and body entities
+  --  requires refinement of abstract states.
+
   -----------------
   -- Adjust_Decl --
   -----------------
@@ -2100,6 +2106,82 @@
  end if;
   end Remove_Visible_Refinements;
 
+  -------------------------------
+  -- Requires_State_Refinement --
+  -------------------------------
+
+  function Requires_State_Refinement
+(Spec_Id : Entity_Id;
+ Body_Id : Entity_Id) return Boolean
+  is
+ function Mode_Is_Off (Prag : Node_Id) return Boolean;
+ --  Given pragma SPARK_Mode, determine whether the mode is Off
+
+ -----------------
+ -- Mode_Is_Off --
+ -----------------
+
+ function Mode_Is_Off (Prag : Node_Id) return Boolean is
+Mode : Node_Id;
+
+ begin
+--  The default SPARK mode is On
+
+if No (Prag) then
+   return False;
+end if;
+
+Mode :=
+  Get_Pragma_Arg (First (Pragma_Argument_Associations (Prag)));
+
+--  When the pragma lacks an argument, the default mode is On
+
+if No (Mode) then
+   return False;
+else
+   return Chars (Mode) = Name_Off;
+end if;
+ end Mode_Is_Off;
+
+  --  Start of processing for Requires_State_Refinement
+
+  begin
+ --  A package that does not define at least one abstract state cannot
+ --  possibly require refinement.
+
+ if No (Abstract_States (Spec_Id)) then
+return False;
+
+ --  The package introduces a single null state which does not merit
+ --  refinement.
+
+ elsif Has_Null_Abstract_State (Spec_Id) then
+return False;
+
+ --  Check whether the package body is subject to pragma SPARK_Mode. If
+ --  it is and the mode is Off, the package body is considered to be in
+ --  regular Ada and does not require refinement.
+
+ elsif Mode_Is_Off (SPARK_Mode_Pragmas (Body_Id)) then
+return False;
+
+ --  The body's SPARK_Mode may be inherited from a similar pragma that
+ --  appears in the private declarations of the spec. The pragma we are
+ --  interested in appears as the second entry in SPARK_Mode_Pragmas.
+
+ elsif Present (SPARK_Mode_Pragmas (Spec_Id))
+   and then Mode_Is_Off (Next_Pragma (SPARK_Mode_Pragmas (Spec_Id)))
+ then
+return False;
+
+ --  The spec defines at least one abstract state and the body has no
+ --  way of circumventing the refinement.
+
+ else
+return True;
+ end if;
+  end Requires_State_Refinement;
+
   --  Local variables
 
   Body_Id : Entity_Id;
@@ -2264,9 +2346,7 @@
 --  State refinement is required when the package declaration has
 --  abstract states. Null states are not considered.
 
-elsif Present (Abstract_States (Spec_Id))
-  and then not Has_Null_Abstract_State (Spec_Id)
-then
+elsif Requires_State_Refinement (Spec_Id, Body_Id) then
Error_Msg_NE
  (package  requires state refinement, Context, Spec_Id);
 end if;


[Ada] Remove special handling for package declaring abstract state

2013-10-17 Thread Arnaud Charlet
On the basis of further discussion, we decided not to implement the
rule saying that a package body must be required for some other
reason if an abstract state is declared. Now we just say a package
body is required if a non-null abstract state and that's it! This
change undoes the error message. So if we compile the following
with -gnatd.V -gnatld7 (but without -gnatc), we get

cannot generate code for file nnas.ads (package spec)

Compiling: nnas.ads

 1. package NNAS
 2.with Abstract_State => State
 3. is
 4.--  package declarations with non-null
 5.--  Abstract State shall have bodies.
 6. end NNAS;

 6 lines: No errors

The cannot generate line reflects the fact that this package
spec requires a body, so the spec cannot be compiled alone.

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Robert Dewar  de...@adacore.com

* sem_ch7.adb (Analyze_Package_Specification): Remove circuit
for ensuring that a package spec requires a body for some other
reason than that it contains the declaration of an abstract state.

Index: sem_ch7.adb
===
--- sem_ch7.adb (revision 203755)
+++ sem_ch7.adb (working copy)
@@ -1493,34 +1493,6 @@
 
   Check_One_Tagged_Type_Or_Extension_At_Most;
 
-  --  Issue an error if a package that is a library unit does not require a
-  --  body, and we have a non-null abstract state (SPARK LRM 7.1.5(4)).
-
-  if not Unit_Requires_Body (Id, Ignore_Abstract_State => True)
-and then Present (Abstract_States (Id))
-
---  We use Scope_Depth of 1 to identify library units, which seems a
---  bit ugly, but there doesn't seem to be an easier way.
-
-and then Scope_Depth (Id) = 1
-
---  A null abstract state always appears as the sole element of the
---  state list.
-
-and then not Is_Null_State (Node (First_Elmt (Abstract_States (Id))))
-  then
- declare
-P : constant Node_Id := Get_Pragma (Id, Pragma_Abstract_State);
- begin
-Error_Msg_NE
-  ("package & specifies a non-null abstract state", P, Id);
-Error_Msg_N
-  ("\but package does not otherwise require a body", P);
-Error_Msg_N
-  ("\pragma Elaborate_Body is required in this case", P);
- end;
-  end if;
-
   --  If switch set, output information on why body required
 
   if List_Body_Required_Info


Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts

2013-10-17 Thread Bernd Schmidt
On 07/14/2013 09:54 AM, Chung-Lin Tang wrote:
 Hi, the last ping of the Nios II patches was:
 http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html
 
 After assessing the state, we feel it would be better to post a
 re-submission of the newest patches.

Since this hasn't attracted attention for months I'll have a go at a
review. I was not involved in this project and haven't seen this code
before.

 @@ -4196,6 +4203,7 @@ esac
  # version to the per-target configury.
  case $cpu_type in
alpha | arm | avr | bfin | cris | i386 | m32c | m68k | microblaze | mips \
 +  | nios2 \
| pa | rs6000 | score | sparc | spu | tilegx | tilepro | xstormy16 | 
 xtensa)
  insn=nop

Could maybe format this nicer to fill up all but the last line.

 +;; These are documented for use in inline asm.
 +(define_register_constraint "D00" "D00_REG" "Hard register 0.")
 +(define_register_constraint "D01" "D01_REG" "Hard register 1.")
 +(define_register_constraint "D02" "D02_REG" "Hard register 2.")
[...]

This doesn't strike me as a good idea. To really make this work for
cases where people write things like "D28D13" you'd have to provide all
the union classes as well, which is obviously infeasible. There are
alternative mechanisms to get assembly to use the registers you want,
and I'll note that libgcc seems to be using those instead of these
constraints.
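
For instance, a local register variable already binds an asm operand to
a named hard register without any constraint letter (a sketch; r2 is
assumed here to be a Nios II return-value register per the ABI):

static inline int
read_r2 (void)
{
  register int val __asm__ ("r2");	/* pin val to hard register r2 */
  __asm__ ("" : "=r" (val));		/* make the binding visible to GCC */
  return val;
}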

This probably also slows down the compiler, and it might be worth an
experiment to see whether deleting these has any effect on register
allocation.

 +;; We use the following constraint letters for constants
 +;;
 +;;  I: -32768 to 32767
 +;;  J: 0 to 65535

 +(define_insn "movhi_internal"
 +  [(set (match_operand:HI 0 "nonimmediate_operand" "=m, r,r, r,r")
 +(match_operand:HI 1 "general_operand"   "rM,m,rM,I,J"))]

The J alternative is unnecessary. The range 32768...65535 doesn't give a
valid HImode constant; the RTL representation of integer constants
always sign-extends. Probably you can just use "i" here and in
movqi_internal.
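
That canonicalization (what trunc_int_for_mode does) can be modelled in
a few lines of C (my sketch):

/* An N-bit CONST_INT is stored sign-extended in the host word, so a
   HImode constant always lies in [-32768, 32767].  */
static long long
himode_const (long long c)
{
  c &= 0xffff;			/* keep the 16 mode bits */
  return (c ^ 0x8000) - 0x8000;	/* sign-extend bit 15    */
}

/* himode_const (0x9000) == -28672: a value in J's range never reaches
   the constraint as a positive number; it arrives as a negative one
   that I already covers.  */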

 +(define_insn "movsi_internal"
 +  [(set (match_operand:SI 0 "nonimmediate_operand" "=m, r,r, r,r,r,r,r")
 +(match_operand:SI 1 "general_operand"   "rM,m,rM,I,J,K,S,i"))]

Not required, but as an experiment you might want to reorder the
alternatives and sprinkle extra rs to get optional reloads and see
whether that improves code generation. You'd have

(match_operand:SI 0 "nonimmediate_operand" "=r,rm,r ,r ,r ,r ,r ,r")
(match_operand:SI 1 "general_operand"  "rM,rM,rm,rI,rJ,rK,rS,ri"))]

with the register-register move first. It might not make a difference.

 +;; Split patterns for register alternative cases.
 +(define_split
 +  [(set (match_operand:SI 0 "register_operand" "")
 +(sign_extend:SI (match_operand:HI 1 "register_operand" "")))]

Could merge into a define_insn_and_split?

 +  "reload_completed"
 +  [(set (match_dup 0)
 +(and:SI (match_dup 1) (const_int 65535)))
 +   (set (match_dup 0)
 +(xor:SI (match_dup 0) (const_int 32768)))
 +   (set (match_dup 0)
 +(plus:SI (match_dup 0) (const_int -32768)))]
 +  "operands[1] = gen_lowpart (SImode, operands[1]);")

Is this really faster than two shifts (either as expander or splitter)?
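
The two sequences compute the same function, so the question is purely
one of cycles; a C cross-check (mine):

#include <assert.h>
#include <stdint.h>

static int32_t
sext16_axp (int32_t x)		/* the and/xor/plus splitter above */
{
  x &= 65535;
  x ^= 32768;
  return x - 32768;
}

static int32_t
sext16_shift (int32_t x)	/* the two-shift alternative */
{
  /* assumes arithmetic right shift, as GCC targets provide */
  return (int32_t) ((uint32_t) x << 16) >> 16;
}

int
main (void)
{
  for (int64_t i = -70000; i <= 70000; i++)
    assert (sext16_axp ((int32_t) i) == sext16_shift ((int32_t) i));
  return 0;
}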

 +;; Arithmetic Operations
 +
 +(define_insn "addsi3"
 +  [(set (match_operand:SI 0 "register_operand"  "=r")
 +(plus:SI (match_operand:SI 1 "register_operand" "%r")
 + (match_operand:SI 2 "arith_operand"   "rIT")))]
 +  ""
 +  "add%i2\\t%0, %1, %z2"
 +  [(set_attr "type" "alu")])

[...]

 +(define_insn "mulsi3"
 +  [(set (match_operand:SI 0 "register_operand"  "=r")
 +(mult:SI (match_operand:SI 1 "register_operand" "%r")
 + (match_operand:SI 2 "arith_operand"    "rI")))]
 +  "TARGET_HAS_MUL"
 +  "mul%i2\\t%0, %1, %z2"
 +  [(set_attr "type" "mul")])

There seems to be a slight mismatch here between arith_operand and the
constraints. Unlike in addsi3, T (which calls nios2_unspec_reloc_p)
isn't allowed here and in the other uses of arith_operand. That suggests
that arith_operand should be split into two different predicates, one
for addsi3 and one for all the other uses.

 +(define_expand "divsi3"
 +  [(set (match_operand:SI 0 "register_operand"  "=r")
 +(div:SI (match_operand:SI 1 "register_operand"   "r")
 +(match_operand:SI 2 "register_operand"   "r")))]
 +  ""
 +{
 +  if (!TARGET_HAS_DIV)
 +{
 +  if (!TARGET_FAST_SW_DIV)
 +FAIL;
 +  else
 +{
 +  if (nios2_emit_expensive_div (operands, SImode))
 +DONE;
 +}
 +}
 +})

Shouldn't this FAIL if !nios2_emit_expensive_div (ok so that function
never returns 0 but still...)?

 +;;  Integer logical Operations
 +
 +(define_code_iterator LOGICAL [and ior xor])
 +(define_code_attr logical_asm [(and "and") (ior "or") (xor "xor")])
 +
 +(define_insn "<code>si3"
 +  [(set (match_operand:SI 0 "register_operand" "=r,r,r")
 +(LOGICAL:SI (match_operand:SI 1 "register_operand" "%r,r,r")
 +(match_operand:SI 2 "logical_operand"  "rM,J,K")))]
 +  ""
 +  "@
 +

[PATCH, SH] Add support for inlined builtin-strcmp (1/2)

2013-10-17 Thread Christian Bruel
Hello,

This patch just reorganizes the SH code used for memory builtins into
its own file, in preparation of the RTL strcmp hoisting in the next part.

OK for trunk ?

Thanks

Christian





2013-10-17  Christian Bruel  christian.br...@st.com

	* config.gcc (sh-*): Add sh-mem.o to extra_obj.
	* gcc/config/sh/t-sh (sh-mem.o): New rule.
	* gcc/config/sh/sh-mem.c (expand_block_move): Moved here.
	* gcc/config/sh/sh.c (force_into, expand_block_move): Move to sh-mem.c.

Index: gcc/config/sh/sh-mem.c
===
--- gcc/config/sh/sh-mem.c	(revision 0)
+++ gcc/config/sh/sh-mem.c	(working copy)
@@ -0,0 +1,176 @@
+/* Helper routines for memory move and comparison insns.
+   Copyright (C) 2013 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "expr.h"
+#include "tm_p.h"
+
+/* Like force_operand, but guarantees that VALUE ends up in TARGET.  */
+static void
+force_into (rtx value, rtx target)
+{
+  value = force_operand (value, target);
+  if (! rtx_equal_p (value, target))
+emit_insn (gen_move_insn (target, value));
+}
+
+/* Emit code to perform a block move.  Choose the best method.
+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the source.
+   OPERANDS[2] is the size.
+   OPERANDS[3] is the alignment safe to use.  */
+bool
+expand_block_move (rtx *operands)
+{
+  int align = INTVAL (operands[3]);
+  int constp = (CONST_INT_P (operands[2]));
+  int bytes = (constp ? INTVAL (operands[2]) : 0);
+
+  if (! constp)
+return false;
+
+  /* If we could use mov.l to move words and dest is word-aligned, we
+ can use movua.l for loads and still generate a relatively short
+ and efficient sequence.  */
+  if (TARGET_SH4A_ARCH && align < 4
+      && MEM_ALIGN (operands[0]) >= 32
+      && can_move_by_pieces (bytes, 32))
+{
+  rtx dest = copy_rtx (operands[0]);
+  rtx src = copy_rtx (operands[1]);
+  /* We could use different pseudos for each copied word, but
+	 since movua can only load into r0, it's kind of
+	 pointless.  */
+  rtx temp = gen_reg_rtx (SImode);
+  rtx src_addr = copy_addr_to_reg (XEXP (src, 0));
+  int copied = 0;
+
+  while (copied + 4 <= bytes)
+	{
+	  rtx to = adjust_address (dest, SImode, copied);
+	  rtx from = adjust_automodify_address (src, BLKmode,
+		src_addr, copied);
+
+	  set_mem_size (from, 4);
+	  emit_insn (gen_movua (temp, from));
+	  emit_move_insn (src_addr, plus_constant (Pmode, src_addr, 4));
+	  emit_move_insn (to, temp);
+	  copied += 4;
+	}
+
+  if (copied < bytes)
+	move_by_pieces (adjust_address (dest, BLKmode, copied),
+			adjust_automodify_address (src, BLKmode,
+		   src_addr, copied),
+			bytes - copied, align, 0);
+
+  return true;
+}
+
+  /* If it isn't a constant number of bytes, or if it doesn't have 4 byte
+ alignment, or if it isn't a multiple of 4 bytes, then fail.  */
+  if (align < 4 || (bytes % 4 != 0))
+return false;
+
+  if (TARGET_HARD_SH4)
+{
+  if (bytes < 12)
+	return false;
+  else if (bytes == 12)
+	{
+	  rtx func_addr_rtx = gen_reg_rtx (Pmode);
+	  rtx r4 = gen_rtx_REG (SImode, 4);
+	  rtx r5 = gen_rtx_REG (SImode, 5);
+
+	  function_symbol (func_addr_rtx, "__movmemSI12_i4", SFUNC_STATIC);
+	  force_into (XEXP (operands[0], 0), r4);
+	  force_into (XEXP (operands[1], 0), r5);
+	  emit_insn (gen_block_move_real_i4 (func_addr_rtx));
+	  return true;
+	}
+  else if (! optimize_size)
+	{
+	  const char *entry_name;
+	  rtx func_addr_rtx = gen_reg_rtx (Pmode);
+	  int dwords;
+	  rtx r4 = gen_rtx_REG (SImode, 4);
+	  rtx r5 = gen_rtx_REG (SImode, 5);
+	  rtx r6 = gen_rtx_REG (SImode, 6);
+
+	  entry_name = (bytes & 4 ? "__movmem_i4_odd" : "__movmem_i4_even");
+	  function_symbol (func_addr_rtx, entry_name, SFUNC_STATIC);
+	  force_into (XEXP (operands[0], 0), r4);
+	  force_into (XEXP (operands[1], 0), r5);
+
+	  dwords = bytes >> 3;
+	  emit_insn (gen_move_insn (r6, GEN_INT (dwords - 1)));
+	  emit_insn (gen_block_lump_real_i4 (func_addr_rtx));
+	  return true;
+	}
+  else
+	return false;
+}
+  if (bytes < 64)
+{
+  char entry[30];
+  rtx func_addr_rtx = gen_reg_rtx (Pmode);
+  rtx r4 = gen_rtx_REG (SImode, 4);
+  rtx r5 = gen_rtx_REG (SImode, 5);
+
+  sprintf (entry, "__movmemSI%d", bytes);
+  function_symbol 

[PATCH, SH] Add support for inlined builtin-strcmp (2/2)

2013-10-17 Thread Christian Bruel
Hello,

This patch adds support for inlining an optimized version of strcmp
when not optimizing for size. The generated code makes use of the
cmp/str instruction to test 4 bytes at a time when the operands are
correctly aligned.

Note that a new pattern was added to match the cmp/str instruction;
no attempt was made to catch it from combine.

This results in general cycle improvements against both the newlib
and glibc implementations, one of which is a 10% cycle improvement on
a famous strcmp-biased benchmark starting with a "D" (but still a
standard one).

This optimization can be disabled with -fno-builtin-strcmp.
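
Since the write-up is terse, here is a rough C model of the strategy
(my own sketch of the idea, not the code the patch emits): compare a
word at a time while both pointers are 4-byte aligned, and finish in
a byte loop once a difference or a NUL may sit inside the current
word.  The has_zero_byte() helper below stands in for what a single
cmp/str instruction computes in hardware.

#include <stdint.h>
#include <string.h>

static int
has_zero_byte (uint32_t w)
{
  /* Classic bit trick: true iff some byte of W is zero.  */
  return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

static int
strcmp_sketch (const char *a, const char *b)
{
  if ((((uintptr_t) a | (uintptr_t) b) & 3) == 0)
    for (;;)
      {
	uint32_t wa, wb;
	/* Aligned word loads; like the real sequence, this may read a
	   few bytes past the NUL, but never past the aligned word.  */
	memcpy (&wa, a, 4);
	memcpy (&wb, b, 4);
	if (wa != wb || has_zero_byte (wa))
	  break;	/* resolve the result byte by byte below */
	a += 4;
	b += 4;
      }
  while (*a != '\0' && *a == *b)
    {
      a++;
      b++;
    }
  return (unsigned char) *a - (unsigned char) *b;
}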

No regressions on sh4 in big and little endian, nor on sh2 (sh3 and
sh4a are still running in big and little endian for sanity).

OK for trunk?

Thanks

Christian



 
2013-10-17  Christian Bruel  christian.br...@st.com

	* gcc/config/sh/sh-mem.c (sh4_expand_cmpstr): New function.
	* gcc/config/sh/sh-protos.h (sh4_expand_cmpstr): Declare.
	* gcc/config/sh/sh.md (cmpstrsi, cmpstr_t): New patterns.
	(rotlhi3_8): Rename.

--- gcc/config/sh/sh.md	2013-10-17 15:14:18.0 +0200
+++ gcc-new/config/sh/sh.md	2013-10-16 16:13:49.0 +0200
@@ -31,9 +31,6 @@
 ;; ??? The MAC.W and MAC.L instructions are not supported.  There is no
 ;; way to generate them.
 
-;; ??? The cmp/str instruction is not supported.  Perhaps it can be used
-;; for a str* inline function.
-
 ;; BSR is not generated by the compiler proper, but when relaxing, it
 ;; generates .uses pseudo-ops that allow linker relaxation to create
 ;; BSR.  This is actually implemented in bfd/{coff,elf32}-sh.c
@@ -4037,7 +4034,7 @@
   DONE;
 })
 
-(define_insn "*rotlhi3_8"
+(define_insn "rotlhi3_8"
   [(set (match_operand:HI 0 "arith_reg_dest" "=r")
	(rotate:HI (match_operand:HI 1 "arith_reg_operand" "r")
		   (const_int 8)))]
@@ -11912,6 +11909,41 @@
  "jsr	@%0%#"
  [(set_attr "type" "sfunc")
   (set_attr "needs_delay_slot" "yes")])
+
+;; byte compare pattern
+;; temp = a ^ b;
+;; !((temp & 0xF000) && (temp & 0x0F00) && (temp & 0x00F0) && (temp & 0x000F))
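+;; (A C gloss of the semantics, added here for readers new to cmp/str:
+;; with temp = a ^ b, the T bit is set when at least one byte of temp
+;; is zero, i.e. when some byte of operand 0 equals the corresponding
+;; byte of operand 1.)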
+(define_insn "cmpstr_t"
+  [(set (reg:SI T_REG)
+	(eq:SI (and:SI
+		(and:SI
+		 (and:SI
+		  (zero_extract:SI (xor:SI (match_operand:SI 0 "arith_reg_operand" "r")
+					   (match_operand:SI 1 "arith_reg_operand" "r"))
+				   (const_int 8) (const_int 0))
+		  (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1))
+				   (const_int 8) (const_int 8)))
+		 (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1))
+				  (const_int 8) (const_int 16)))
+		(zero_extract:SI (xor:SI (match_dup 0) (match_dup 1))
+				 (const_int 8) (const_int 24))) (const_int 0)))]
+  "TARGET_SH1"
+  "cmp/str	%0,%1"
+  [(set_attr "type" "mt_group")])
+
+(define_expand "cmpstrsi"
+  [(set (match_operand:SI 0 "register_operand")
+	(compare:SI (match_operand:BLK 1 "memory_operand")
+		    (match_operand:BLK 2 "memory_operand")))
+   (use (match_operand 3 "immediate_operand"))]
+  "TARGET_SH1"
+{
+  if (! optimize_insn_for_size_p () && sh4_expand_cmpstr (operands))
+    DONE;
+  else
+    FAIL;
+})
+
 
 ;; -
 ;; Floating point instructions.
diff -ru gcc/config/sh/sh-mem.c gcc-new/config/sh/sh-mem.c
--- gcc/config/sh/sh-mem.c	2013-10-17 14:59:02.0 +0200
+++ gcc-new/config/sh/sh-mem.c	2013-10-17 14:57:57.0 +0200
@@ -23,6 +23,7 @@
 #include "tm.h"
 #include "expr.h"
 #include "tm_p.h"
+#include "basic-block.h"
 
 /* Like force_operand, but guarantees that VALUE ends up in TARGET.  */
 static void
@@ -174,3 +175,130 @@
 
   return false;
 }
+
+/* Emit code to perform a strcmp.
+
+   OPERANDS[0] is the destination.
+   OPERANDS[1] is the first string.
+   OPERANDS[2] is the second string.
+   OPERANDS[3] is the known alignment.  */
+bool
+sh4_expand_cmpstr (rtx *operands)
+{
+  rtx s1 = copy_rtx (operands[1]);
+  rtx s2 = copy_rtx (operands[2]);
+  rtx s1_addr = copy_addr_to_reg (XEXP (s1, 0));
+  rtx s2_addr = copy_addr_to_reg (XEXP (s2, 0));
+  rtx tmp0 = gen_reg_rtx (SImode);
+  rtx tmp1 = gen_reg_rtx (SImode);
+  rtx tmp2 = gen_reg_rtx (SImode);
+  rtx tmp3 = gen_reg_rtx (SImode);
+
+  rtx L_return = gen_label_rtx ();
+  rtx L_loop_byte = gen_label_rtx ();
+  rtx L_end_loop_byte = gen_label_rtx ();
+  rtx L_loop_long = gen_label_rtx ();
+  rtx L_end_loop_long = gen_label_rtx ();
+
+  rtx jump, addr1, addr2;
+  int prob_unlikely = REG_BR_PROB_BASE / 10;
+  int prob_likely = REG_BR_PROB_BASE / 4;
+
+  emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr));
+  emit_move_insn (tmp0, GEN_INT (3));
+
+  emit_insn (gen_tstsi_t (tmp0, tmp1));
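+  /* T is now set iff (s1_addr | s2_addr) has neither of its low two
+     bits set, i.e. both strings are word aligned; the branch below
+     takes the byte loop otherwise.  */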
+
+  emit_move_insn (tmp0, const0_rtx);
+
+  jump = emit_jump_insn (gen_branch_false (L_loop_byte));
+  add_int_reg_note (jump, REG_BR_PROB, prob_likely);
+
+  addr1 = adjust_automodify_address (s1, SImode, s1_addr, 0);
+  addr2 = adjust_automodify_address (s2, SImode, s2_addr, 0);
+
+  /* tmp2 is aligned, OK to load.  */
+  emit_move_insn (tmp3, addr2);
+  emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, 4));
+
+  /* Start long loop.  */
+  emit_label (L_loop_long);
+
+  emit_move_insn 

Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.

2013-10-17 Thread Kirill Yukhin
Hello,
On 17 Oct 13:14, Uros Bizjak wrote:
 On Thu, Oct 17, 2013 at 12:47 PM, Kirill Yukhin kirill.yuk...@gmail.com 
 wrote:
 
   I suspect gen_lowpart is bad turn when reload is completed, as
   far as it can create new pseudo. gen_lowpart () may call
  gen_reg_rtx (), which contains a corresponding gcc_assert ().
 
  False.  gen_lowpart is perfectly safe post-reload.
  Indeed, taking the subreg of a hard register should arrive
 
x = gen_rtx_REG_offset (op, outermode, final_regno, 
  final_offset);
 
  in simplify_subreg.
 
  Have you encountered some specific problem with gen_lowpart?
  Maybe the code in the pattern is buggy? Or is it a gen_lowpart?
 
 I think that original approach with gen_rtx_REG is correct and follows
 established practice in sse.md (please grep for gen_rtx_REG in
 sse.md). If this approach is necessary due to the deficiency of
 gen_lowpart, then the fix to gen_lowpart should be proposed in a
 follow-up patch.

So, I've reverted changes in mult_vect patterns and
added % to constraints.

I've also reverted vec_extract_* (with slight update):
 ...
 +   reload_completed
 +  [(set (match_dup 0) (match_dup 1))]
 +{
 +  if (REG_P (operands[1]))
 +operands[1] = gen_rtx_REG (V16HImode, REGNO (operands[1]));
 +  else
 +operands[1] = adjust_address (operands[1], V16HImode, 0);
 +})

Bootstrapped.
AVX* tests pass (including new AVX-512)

Is it ok now?

--
Thanks, K

---
 gcc/config/i386/i386.md   |   5 +
 gcc/config/i386/predicates.md |  40 ++
 gcc/config/i386/sse.md| 873 +-
 3 files changed, 912 insertions(+), 6 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 10ca6cb..e7e9f2d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -831,6 +831,11 @@
 (define_code_attr s [(sign_extend "s") (zero_extend "u")])
 (define_code_attr u_bool [(sign_extend "false") (zero_extend "true")])
 
+;; Used in signed and unsigned truncations.
+(define_code_iterator any_truncate [ss_truncate truncate us_truncate])
+;; Instruction suffix for truncations.
+(define_code_attr trunsuffix [(ss_truncate "s") (truncate "") (us_truncate "us")])
+
 ;; Used in signed and unsigned fix.
 (define_code_iterator any_fix [fix unsigned_fix])
 (define_code_attr fixsuffix [(fix "") (unsigned_fix "u")])
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 06b2914..999d8ab 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -752,6 +752,11 @@
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 6, 7)")))
 
+;; Match 8 to 9.
+(define_predicate "const_8_to_9_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 8, 9)")))
+
 ;; Match 8 to 11.
 (define_predicate "const_8_to_11_operand"
   (and (match_code "const_int")
@@ -762,16 +767,51 @@
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 8, 15)")))
 
+;; Match 10 to 11.
+(define_predicate "const_10_to_11_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 10, 11)")))
+
+;; Match 12 to 13.
+(define_predicate "const_12_to_13_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 12, 13)")))
+
 ;; Match 12 to 15.
 (define_predicate "const_12_to_15_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 12, 15)")))
 
+;; Match 14 to 15.
+(define_predicate "const_14_to_15_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 14, 15)")))
+
+;; Match 16 to 19.
+(define_predicate "const_16_to_19_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 16, 19)")))
+
 ;; Match 16 to 31.
 (define_predicate "const_16_to_31_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 16, 31)")))
 
+;; Match 20 to 23.
+(define_predicate "const_20_to_23_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 20, 23)")))
+
+;; Match 24 to 27.
+(define_predicate "const_24_to_27_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 24, 27)")))
+
+;; Match 28 to 31.
+(define_predicate "const_28_to_31_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 28, 31)")))
+
 ;; True if this is a constant appropriate for an increment or decrement.
 (define_predicate incdec_operand
   (match_code const_int)
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 623e919..e21cba3 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -87,6 +87,7 @@
   ;; For AVX512F support
   UNSPEC_VPERMI2
   UNSPEC_VPERMT2
+  UNSPEC_UNSIGNED_FIX_NOTRUNC
   UNSPEC_UNSIGNED_PCMP
   UNSPEC_TESTM
   UNSPEC_TESTNM
@@ -2994,6 +2995,34 @@
(set_attr prefix maybe_vex)
(set_attr mode DI)])
 
+(define_insn "cvtusi2<ssescalarmodesuffix>32"
+  [(set (match_operand:VF_128 0 "register_operand" "=v")
+	(vec_merge:VF_128
+ 

Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Biener
On Thu, 17 Oct 2013, Richard Sandiford wrote:

 Richard Biener rguent...@suse.de writes:
   What's the reason again to not use my original proposed encoding
   of the MSB being the sign bit?  RTL constants simply are all signed
   then.  Just you have to also sign-extend in functions like lts_p
   as not all constants are sign-extended.  But we can use both tree
   (with the now appended zero) and RTL constants representation
   unchanged.
  
  The answer's the same as always.  RTL constants don't have a sign.
  Any time we extend an RTL constant, we need to say whether the extension
  should be sign extension or zero extension.  So the natural model for
  RTL is that an SImode addition is done to SImode width, not some
  arbitrary wider width.
 
  RTL constants are sign-extended (whether you call them then signed
  is up to you).  They have a precision.  This is how they are
  valid reps for wide-ints, and that doesn't change.
 
  I was saying that if we make not _all_ wide-ints sign-extended
  then we can use the tree rep as-is.  We'd then have the wide-int
  rep being either zero or sign extended but not arbitrary random
  bits outside of the precision (same as the tree rep).
 
 OK, but does that have any practical value over leaving them arbitrary,
 as in Kenny's original implementation?  Saying that upper bits can be
 either signs or zeros means that readers wouldn't be able to rely on
 one particular extension (so would have to do the work that they did
 in Kenny's implementation).  At the same time it means that writers
 can't leave the upper bits in an arbitrary state, so would have to
 make sure that they are signs or zeros (presumably having a free
 choice of which).

At least you can easily optimize for existing zero / sign extension.

What I see is really bad code for the simple integer-cst predicates
in tree.c.  I don't mind in what way we fix it, but avoiding the
copy on every tree integer constant read looks required to me.

Richard.


Re: [PATCH][2/3] Re-submission of Altera Nios II port, testsuite parts

2013-10-17 Thread Bernd Schmidt
On 07/14/2013 09:54 AM, Chung-Lin Tang wrote:
 These are nios2 patches for the gcc testsuite. Some new testcases were
 added since the last posting.

 Index: gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c
 ===
 --- gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c(revision 
 200946)
 +++ gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c(working copy)
 @@ -124,16 +124,17 @@ __memmove_chk (void *dst, const void *src, __SIZE_
  void *
  memset (void *dst, int c, __SIZE_TYPE__ n)
  {
 +  while (n-- != 0)
 +n[(char *) dst] = c;
 +
/* Single-byte memsets should be done inline when optimisation
 - is enabled.  */
 + is enabled.  Do this after the copy in case we're being called to
 + initialize bss.  */
  #ifdef __OPTIMIZE__
if (memset_disallowed  inside_main  n  2)
  abort ();
  #endif
  
 -  while (n-- != 0)
 -n[(char *) dst] = c;
 -
return dst;
  }

I'm not sure I understand this change. Is nios2 the only target calling
memset to initialize bss, and memset_disallowed is nonzero at the start
of execution?

 Index: gcc/testsuite/gcc.target/nios2/nios2-int-types.c
 ===
 --- gcc/testsuite/gcc.target/nios2/nios2-int-types.c  (revision 0)
 +++ gcc/testsuite/gcc.target/nios2/nios2-int-types.c  (revision 0)
 @@ -0,0 +1,34 @@
 +/* Test that various types are all derived from int.  */
 +/* { dg-do compile { target nios2-*-* } } */

I think you can lose the { target nios2-*-* } for everything inside
gcc.target/nios2.


Bernd


Re: [wide-int] int_traits tree

2013-10-17 Thread Richard Sandiford
Richard Biener rguent...@suse.de writes:
 What I see is really bad code for the simple integer-cst predicates
 in tree.c.  I don't mind in what way we fix it, but avoiding the
 copy on every tree integer constant read looks required to me.

But we got rid of the copy with yesterday's patch.  Are you talking
about the code after that patch or before?

Richard



Re: [wide-int] int_traits tree

2013-10-17 Thread Kenneth Zadeck

On 10/17/2013 09:16 AM, Richard Biener wrote:

On Thu, 17 Oct 2013, Kenneth Zadeck wrote:


On 10/17/2013 08:29 AM, Richard Biener wrote:

On Thu, 17 Oct 2013, Richard Sandiford wrote:


Richard Biener rguent...@suse.de writes:

The new tree representation can have a length greater than max_len
for an unsigned tree constant that occupies a whole number of HWIs.
The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00.
When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading
zero.
The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed?  _Why_ was it changed?
We still cannot use it directly from wide-int and the extra
word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE
()).

It means that we can now use the tree's HWI array directly, without any
copying, for addr_wide_int and max_wide_int.  The only part of decompose
()
that does a copy is the small_prec case, which is trivially compiled out
for addr_wide_int and max_wide_int.

 2) addr_wide_int.  This is a fixed size representation that is
   guaranteed to be large enough to compute any bit or byte sized
   address calculation on the target.  Currently the value is 64 + 4
   bits rounded up to the next number even multiple of
   HOST_BITS_PER_WIDE_INT (but this can be changed when the first
   port needs more than 64 bits for the size of a pointer).

   This flavor can be used for all address math on the target.  In
   this representation, the values are sign or zero extended based
   on their input types to the internal precision.  All math is done
   in this precision and then the values are truncated to fit in the
   result type.  Unlike most gimple or rtl intermediate code, it is
   not useful to perform the address arithmetic at the same
   precision in which the operands are represented because there has
   been no effort by the front ends to convert most addressing
   arithmetic to canonical types.

   In the addr_wide_int, all numbers are represented as signed
   numbers.  There are enough bits in the internal representation so
   that no information is lost by representing them this way.

so I guess from that that addr_wide_int.get_precision is always
that 64 + 4 rounded up.  Thus decompose gets that constant precision
input and the extra zeros make the necessary extension always a no-op.
Aha.

It is, until someone comes up with a port that this will not work for; then
they will have to add some machinery to sniff the port and make this bigger.
I am hoping to be retired by the time this happens.

For max_wide_int the same rules apply, just its size is larger.

Ok.  So the reps are only canonical wide-int because we only
ever use them with precision >= xprecision (maybe we should assert
that).

It is now asserted for (as of a few days ago).


Btw, we are not using them directly, but every time we actually
build a addr_wide_int / max_wide_int we copy them anyway:

/* Initialize the storage from integer X, in precision N.  */
template <int N>
template <typename T>
inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
{
  /* Check for type compatibility.  We don't want to initialize a
     fixed-width integer from something like a wide_int.  */
  WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
  wide_int_ref xi (x, N);
  len = xi.len;
  for (unsigned int i = 0; i < len; ++i)
    val[i] = xi.val[i];
}



it avoids a 2nd copy though, which shows nicely what was rummaging in
my head for the last two days - that the int_traits <> abstraction
was somehow at the wrong level - it should have been traits that
are specific to the storage model?  or the above should use
int_traits <>::decompose manually with it always doing the
copy (that would also optimize away one copy and eventually
would make the extra zeros not necessary).

this came in with richard's storage manager patch.  In my older code, we
tried and succeeded many times to just borrow the underlying rep.  I think
that richard needs to work this out.

I originally thought that extra zeros get rid of all copying from trees
to all wide-int kinds.

What's the reason again to not use my original proposed encoding
of the MSB being the sign bit?  RTL constants simply are all signed
then.  Just you have to also sign-extend in functions like lts_p
as not all constants are sign-extended.  But we can use both tree
(with the now appended zero) and RTL constants representation
unchanged.

I am not following you here.   In trees, the msb is effectively a sign bit,
even for unsigned numbers because we add that extra block.

but inside of wide int, we do not add extra blocks beyond the precision.
That would be messy for a lot of other reasons.

Can you elaborate?  It would make tree and RTX reps directly usable,
only wide-int-to-tree and wide-int-to-rtx need special 

Re: [RFC] By default if-convert only basic blocks that will be vectorized

2013-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2013 at 11:26:56AM +0200, Richard Biener wrote:
 On Wed, 16 Oct 2013, pins...@gmail.com wrote:
 
  
   On Oct 15, 2013, at 5:32 AM, Jakub Jelinek ja...@redhat.com wrote:
   Especially on i?86/x86_64 if-conversion pass seems to be often
   a pessimization, but the vectorization relies on it and without it we 
   can't
   vectorize a lot of the loops.
  
  I think on many other targets it actually helps.  I know for one it 
  helps on octeon even though octeon has no vector instructions.  I think 
  it helps most arm targets too.
 
 The main issue is that it has no cost model - the only cost model
 being that it can successfully if-convert all conditional code in
 a loop, resulting in a single-BB loop.  So it is clearly vectorization
 targeted.
 
  Its infrastructure may be useful to do a more sensible if-conversion
 on GIMPLE level on scalar code.
 
 Of course even the infrastructure needs some TLC (and some better
 generic machinery of keeping track and simplifying of a predicate
 combination).

Yeah, or, if tree if-conversion is a net win for some port even when not
vectorizing, supposedly such a port should at least until the above is
implemented just enable flag_tree_loop_if_convert by default, at least at
some -O* levels.  Of course the question is why would it be beneficial only
in inner loops and not elsewhere (other loops, or straight-line code),
and when it is desirable and when not (for vectorization we of course try
hard to if-convert the whole loop, because otherwise we are not able to
vectorize it, but otherwise, is it beneficial just for a couple of
conditionalized stmts at most, or is it fine to conditionalize say 1000
arithmetic statements?).
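
As a concrete reference point for that question, a minimal example
(mine, not from any patch in this thread) of the kind of loop where
gimple if-conversion pays off: the two-armed branch below becomes a
select, a[i] = b[i] > 0 ? b[i] : 0, leaving a single-BB body the
vectorizer can handle.

void
clamp (int *a, const int *b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (b[i] > 0)
	a[i] = b[i];
      else
	a[i] = 0;
    }
}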

Jakub


Re: [patch] omp-low.h

2013-10-17 Thread Jakub Jelinek
On Tue, Oct 15, 2013 at 10:46:43PM -0400, Andrew MacLeod wrote:
 Bootstraps on 86_64-unknown-linux-gnu and no new regressions.  OK?
 
 Andrew

 
   * tree-flow.h (struct omp_region): Move to omp-low.c

Missing dot at the end of line.

   Remove omp_ prototypes and variables.
   * gimple.h (omp_reduction_init): Move prototype to omp-low.h.
   (copy_var_decl): Relocate prototype from tree-flow.h.
   * gimple.c (copy_var_decl): Relocate from omp-low.c.
   * tree.h: Move prototype to omp-low.h.
   * omp-low.h: New File. Relocate prototypes here.

Missing space after ".".
   * omp-low.c (struct omp_region): Make local here.
   (root_omp_region): Make static.
   (copy_var_decl) Move to gimple.c.
   (new_omp_region): Make static.
   (make_gimple_omp_edges): New.  Refactored from tree-cfg.c make_edges.
   * tree-cfg.c: Include omp-low.h.
   (make_edges): Factor out OMP specific bits to make_gimple_omp_edges.
   * gimplify.c: Include omp-low.h.
   * tree-parloops.c: Include omp-low.h.

Use "Likewise." for the second.

 
   c
   * c-parser.c: Include omp-low.h.
   * c-typeck.c: Include omp-low.h.

Likewise.
 
   cp
   * parser.c: Include omp-low.h.
   * semantics.c: Include omp-low.h.

Likewise.
 
   fortran
   * trans-openmp.c: Include omp-low.h.

 *** struct omp_for_data
 *** 135,141 
   static splay_tree all_contexts;
   static int taskreg_nesting_level;
   static int target_nesting_level;
 ! struct omp_region *root_omp_region;
   static bitmap task_shared_vars;
   
   static void scan_omp (gimple_seq *, omp_context *);
 --- 175,181 
   static splay_tree all_contexts;
   static int taskreg_nesting_level;
   static int target_nesting_level;
 ! static struct omp_region *root_omp_region;
   static bitmap task_shared_vars;
   
   static void scan_omp (gimple_seq *, omp_context *);

Why?

 --- 912,917 
 *** debug_all_omp_regions (void)
 *** 1219,1225 
   
   /* Create a new parallel region starting at STMT inside region PARENT.  */
   
 ! struct omp_region *
   new_omp_region (basic_block bb, enum gimple_code type,
   struct omp_region *parent)
   {
 --- 1238,1244 
   
   /* Create a new parallel region starting at STMT inside region PARENT.  */
   
 ! static struct omp_region *
   new_omp_region (basic_block bb, enum gimple_code type,
   struct omp_region *parent)
   {

Likewise.

 + case GIMPLE_OMP_CONTINUE:
...
 + 
 +   default:
 + gcc_unreachable ();

Bad indentation here, default: should be indented just by 4
spaces and gcc_unreachable () by 6.

Otherwise LGTM.

Jakub


Re: [SKETCH] Refactor implicit function template implementation and fix 58534, 58536, 58548, 58549 and 58637.

2013-10-17 Thread Jason Merrill

On 10/15/2013 05:21 PM, Adam Butcher wrote:

On Wed, 25 Sep 2013 11:01:26 -0500, Jason Merrill wrote:

1) Build up the type as normal and use tsubst to replace the non-pack
template parameter with a pack if needed.


The problem I've hit with this (and other hacks I've tried that involve
finishing the type first and rewriting afterward) is that, with parm
types such as pair<auto,auto...>, the specialization pair<auto,auto>
is indexed in pt.c by hash_specialization into type_specializations
before the '...' is seen.


I'm not sure why that's a problem; we just produce a new specialization 
that uses different template arguments in tsubst.  Specifically, the 
pack 'auto' needs to be a different TEMPLATE_TYPE_PARM from the non-pack 
'auto' produced during parsing.


Jason



Go patch committed: Don't deref unknown type when importing anon field

2013-10-17 Thread Ian Lance Taylor
This patch fixes the Go frontend to not dereference an unknown type when
importing an anonymous field.  This fixes a bug in a recent patch I
committed.  I have a test case ready to commit to the master repository
after the Go 1.2 release is made.  This patch bootstrapped and ran Go
testsuite on x86_64-unknown-linux-gnu.  Committed to mainline and 4.8
branch.

Ian

diff -r 742104f0d4c7 go/types.cc
--- a/go/types.cc	Wed Oct 16 06:37:07 2013 -0700
+++ b/go/types.cc	Thu Oct 17 08:39:50 2013 -0700
@@ -5263,11 +5263,25 @@
 	  // that an embedded builtin type is accessible from another
 	  // package (we know that all the builtin types are not
 	  // exported).
-	  if (name.empty() && ftype->deref()->named_type() != NULL)
+	  // This is called during parsing, before anything is
+	  // lowered, so we have to be careful to avoid dereferencing
+	  // an unknown type name.
+	  if (name.empty())
 	    {
-	      const std::string fn(ftype->deref()->named_type()->name());
-	      if (fn[0] >= 'a' && fn[0] <= 'z')
-		name = '.' + imp->package()->pkgpath() + '.' + fn;
+	      Type *t = ftype;
+	      if (t->classification() == Type::TYPE_POINTER)
+		{
+		  // Very ugly.
+		  Pointer_type* ptype = static_cast<Pointer_type*>(t);
+		  t = ptype->points_to();
+		}
+	      std::string tname;
+	      if (t->forward_declaration_type() != NULL)
+		tname = t->forward_declaration_type()->name();
+	      else if (t->named_type() != NULL)
+		tname = t->named_type()->name();
+	      if (!tname.empty() && tname[0] >= 'a' && tname[0] <= 'z')
+		name = '.' + imp->package()->pkgpath() + '.' + tname;
 	    }
 
 	  Struct_field sf(Typed_identifier(name, ftype, imp->location()));


Re: [patch] omp-low.h

2013-10-17 Thread Jakub Jelinek
On Thu, Oct 17, 2013 at 11:52:21AM -0400, Andrew MacLeod wrote:
 On 10/17/2013 11:15 AM, Jakub Jelinek wrote:
 *** struct omp_for_data
 *** 135,141 
static splay_tree all_contexts;
static int taskreg_nesting_level;
static int target_nesting_level;
 ! struct omp_region *root_omp_region;
static bitmap task_shared_vars;
static void scan_omp (gimple_seq *, omp_context *);
 --- 175,181 
static splay_tree all_contexts;
static int taskreg_nesting_level;
static int target_nesting_level;
 ! static struct omp_region *root_omp_region;
static bitmap task_shared_vars;
static void scan_omp (gimple_seq *, omp_context *);
 Why?
 It should be static now since it is no longer exported outside the
 file... and can't be now that struct omp_region is declared in
 omp-low.c

Ah, I just misread your change; I thought you were adding the struct
keyword while you are actually adding static.  Sorry.

Jakub


Merge from 4.8 branch to gccgo branch

2013-10-17 Thread Ian Lance Taylor
I've merged revision 203772 from the GCC 4.8 branch to the gccgo branch.

Ian


Re: [patch] omp-low.h

2013-10-17 Thread Andrew MacLeod

On 10/17/2013 11:15 AM, Jakub Jelinek wrote:

*** struct omp_for_data
*** 135,141 
   static splay_tree all_contexts;
   static int taskreg_nesting_level;
   static int target_nesting_level;
! struct omp_region *root_omp_region;
   static bitmap task_shared_vars;
   
   static void scan_omp (gimple_seq *, omp_context *);

--- 175,181 
   static splay_tree all_contexts;
   static int taskreg_nesting_level;
   static int target_nesting_level;
! static struct omp_region *root_omp_region;
   static bitmap task_shared_vars;
   
   static void scan_omp (gimple_seq *, omp_context *);

Why?
It should be static now since it is no longer exported outside the 
file... and can't be now that struct omp_region is declared in omp-low.c



--- 912,917 
*** debug_all_omp_regions (void)
*** 1219,1225 
   
   /* Create a new parallel region starting at STMT inside region PARENT.  */
   
! struct omp_region *

   new_omp_region (basic_block bb, enum gimple_code type,
struct omp_region *parent)
   {
--- 1238,1244 
   
   /* Create a new parallel region starting at STMT inside region PARENT.  */
   
! static struct omp_region *

   new_omp_region (basic_block bb, enum gimple_code type,
struct omp_region *parent)
   {

Likewise.
Again, because it is no longer an exported function; only
tree-cfg.c::make_edges needed it, and it no longer does.


Thanks.

Andrew


Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2

2013-10-17 Thread DJ Delorie

 where in the C standard did you read the requirement that every bit
 field must be complete? (This is a serious question).

The spec doesn't say each field must be complete, but neither does it
say that the structure must be as big as the type used.  If you
specify int foo:1 then the compile is allowed to make the struct
smaller than int.

 extern struct
 {
   volatile unsigned int b : 1;
 } bf3;
 
 on my compiler this structure occupies 4 bytes.
 and it is aligned at 4 bytes.
 That is OK for me and AAPCS.

On my target, the structure occupies 1 byte.  I suspect gcc is
rounding up to WORD_MODE, which is 4 bytes for you and 1 for me.  If
so, you're relying on a coincidence, not a standard.

 But the access bf3.b=1 is SI mode with Sandra's patch (part 1, which just
 obeys the AAPCS and does nothing else) 
 and QI mode without this patch, which is just a BUG.
 I am quite surprised how your target manages to avoid it?

It doesn't, but I wouldn't expect it to, because IMHO that test is
invalid - you have not given a precise enough test to expect
consistent results.

 It is as Sandra said, at least on ARM -fstrict-volatile-bitfields
 does not function at all. And the C++11 memory model wins all the time.

Are we talking about N2429?  I read through the changes and it didn't
preclude honoring the user's types.

  Looking through the tests, most of them combine packed with
  mismatched types. IMHO, those tests are invalid.
 
 I dont think so. They are simply packed. And volatile just says that
 the value may be changed by a different thread. It has a great
 impact on loop optimizations.

Nothing you've said so far precludes honoring the user's types when
the user gives you a consistent structure.  Consider:

typedef struct { unsigned char b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1; 
} Bits;

extern volatile struct {
  Bits reg1; /* read status without resetting them */
  Bits reg2; /* read status and atomically reset them */
} interrupt_handler;

Without -fstrict-volatile-bitfields, gcc is allowed to use a 16-bit
access to read reg1, but this causes an unexpected read to volatile
reg2, which may have unintended consequences.  The spec doesn't say
the compiler *must* read reg2, it just *doesn't* say that it *can't*.
The -fstrict-volatile-bitfields flag tells the compiler that it
*can't*.  IMHO this doesn't violate the spec, it just limits what the
compiler can do within the spec.

If ARM wants fast word-sized volatile bitfields, use int and
structure your bitfields so that no int field overlaps a natural
int boundary.
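
To make that concrete, an illustrative layout (my example, not DJ's):
every field below lives entirely inside one 32-bit word, so an access
with the declared type is always a single int load or store, even
under -fstrict-volatile-bitfields.

struct regs
{
  volatile unsigned int mode   : 4;
  volatile unsigned int status : 12;
  volatile unsigned int count  : 16;	/* fills word 0 exactly */
  volatile unsigned int ctrl   : 8;	/* starts word 1 */
};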

When I added the option, I considered making it an error() to define a
strict volatile bitfield that spanned the allowed boundary of the type
specified, but I figured that would be too much.

  I've not objected to fixing -fstrict-volatile-bitfields, or making the
  -fno-strict-volatile-bitfields case match the standard. I've only
  objected to breaking my targets by making that flag not the default.
 
 Fine. Why cant we just get this fixed?

Dunno.  I'm only opposing the patch that disabled it on my targets.


Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion

2013-10-17 Thread Wei Mi
On Thu, Oct 17, 2013 at 12:35 AM, Marek Polacek pola...@redhat.com wrote:
 On Wed, Oct 16, 2013 at 04:25:58PM -0700, Wei Mi wrote:
 +/* Return true if target platform supports macro-fusion.  */
 +
 +static bool
 +ix86_macro_fusion_p ()
 +{
 +  if (TARGET_FUSE_CMP_AND_BRANCH)
 +return true;
 +  else
 +return false;
 +}

 That looks weird, why not just

 static bool
 ix86_macro_fusion_p (void)
 {
   return TARGET_FUSE_CMP_AND_BRANCH;
 }

 ?

 Marek

Thanks, fixed.

Wei Mi.


Re: [RESEND] Enable building of libatomic on AArch64

2013-10-17 Thread Marcus Shawcroft
On 17 October 2013 10:43, Michael Hudson-Doyle
michael.hud...@linaro.org wrote:
 Resending as the previous attempt went missing...

   2013-10-04  Michael Hudson-Doyle  michael.hud...@linaro.org

 * libatomic/configure.tgt (aarch64*): Remove code preventing
   build.

 * gcc/testsuite/lib/target-supports.exp
   (check_effective_target_sync_long_long): AArch64 supports
   atomic operations on long long.
   (check_effective_target_sync_long_long_runtime): AArch64 can
   execute atomic operations on long long.


OK, and committed.
/Marcus


[C++ Patch] PR 58596

2013-10-17 Thread Paolo Carlini

Hi,

in this ICE on valid, 4.8/4.9 Regression, the problem is that, for a 
lambda in an NSDMI context, the early return of lambda_expr_this_capture:


   /* In unevaluated context this isn't an odr-use, so just return the
  nearest 'this'.  */
   if (cp_unevaluated_operand)
 return lookup_name (this_identifier);

fails to find a this_identifier.  This is expected in an NSDMI context, 
as also clarified by the comment a few lines below.  Thus I'm handling 
the issue the same way, that is, by way of scope_chain->x_current_class_ptr.


Tested x86_64-linux.

Thanks!
Paolo.


/cp
2013-10-17  Paolo Carlini  paolo.carl...@oracle.com

PR c++/58596
* lambda.c (lambda_expr_this_capture): Handle NSDMIs in the
cp_unevaluated_operand case.

/testsuite
2013-10-17  Paolo Carlini  paolo.carl...@oracle.com

PR c++/58596
* g++.dg/cpp0x/lambda/lambda-nsdmi5.C: New.
Index: cp/lambda.c
===
--- cp/lambda.c (revision 203744)
+++ cp/lambda.c (working copy)
@@ -628,7 +628,14 @@ lambda_expr_this_capture (tree lambda)
   /* In unevaluated context this isn't an odr-use, so just return the
  nearest 'this'.  */
   if (cp_unevaluated_operand)
-return lookup_name (this_identifier);
+{
+  /* In an NSDMI the fake 'this' pointer that we're using for
+     parsing is in scope_chain.  */
+  if (LAMBDA_EXPR_EXTRA_SCOPE (lambda)
+      && TREE_CODE (LAMBDA_EXPR_EXTRA_SCOPE (lambda)) == FIELD_DECL)
+    return scope_chain->x_current_class_ptr;
+  return lookup_name (this_identifier);
+}
 
   /* Try to default capture 'this' if we can.  */
   if (!this_capture
Index: testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi5.C
===
--- testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi5.C   (revision 0)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi5.C   (working copy)
@@ -0,0 +1,7 @@
+// PR c++/58596
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  int i = [] { return decltype(i)(); }();
+};


Re: [PATCH][i386]Fix PR 57756

2013-10-17 Thread Steve Ellcey
On Thu, 2013-10-17 at 06:03 -0700, Diego Novillo wrote:
 On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote:
 
  How is Google going to change its patch commit policies to ensure that
  this does not happen again?
 
 There is nothing to change.  Google follows
 http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed
 the oversight and if there is any other fallout from his patch, he
 will address it.
 
 I don't see anything out of the ordinary here.
 
 
 Diego.

FYI: I just want to make sure everyone is aware that there are still
broken targets.  rs6000 may be fixed but mips still doesn't work and
I presume other platforms like sparc are also still broken.

/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c: In
function 'void cpp_atomic_builtins(cpp_reader*)':
/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:594:7: 
error: 'target_flags_explicit' was not declared in this scope
/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:606:7: 
error: 'target_flags_explicit' was not declared in this scope
/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:618:7: 
error: 'target_flags_explicit' was not declared in this scope
/local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:630:7: 
error: 'target_flags_explicit' was not declared in this scope
make[1]: *** [c-family/c-cppbuiltin.o] Error 1

Sriraman, are you looking at how to fix this globally or are the target
maintainers expected to fix this?  Currently I am using this one line
patch to get things building, but I presume this is not what we want
long term.


% git diff opth-gen.awk
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index 01c5ab6..46bd570 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -114,6 +114,7 @@ print "};"
 print "extern struct gcc_options global_options;"
 print "extern const struct gcc_options global_options_init;"
 print "extern struct gcc_options global_options_set;"
+print "#define target_flags_explicit global_options_set.x_target_flags"
 print "#endif"
 print "#endif"
 print ""


Steve Ellcey
sell...@mips.com




[AArch64] Fix types for vcvtsd_n intrinsics.

2013-10-17 Thread James Greenhalgh

Hi,

I spotted that the types of arguments to these intrinsics are wrong,
which results in all sorts of fun issues!

Fixed thusly, regression tested with aarch64.exp on aarch64-none-elf
with no issues.
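
For reference, a small use of one of the affected intrinsics (this
assumes the ACLE spelling vcvtd_n_f64_s64 as exposed by arm_neon.h);
with the wrong result type in the macro, the converted value was
forced back through an integer:

#include <arm_neon.h>

/* Convert a Q32.32 fixed-point value to double; the 'result'
   temporary inside the macro must be float64_t for this to work.  */
float64_t
fixed_to_double (int64_t q)
{
  return vcvtd_n_f64_s64 (q, 32);
}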

OK?

Thanks,
James

---
2013-10-17  James Greenhalgh  james.greenha...@arm.com

* config/aarch64/arm_neon.h
(vcvt<ds>_n_<fsu><32,64>_<fsu><32,64>): Correct argument types.
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index f7c9db6..55aa742 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -5442,7 +5442,7 @@ static float32x2_t vdup_n_f32 (float32_t);
   __extension__ \
 ({  \
int64_t a_ = (a);\
-   int64_t result;  \
+   float64_t result;\
__asm__ ("scvtf %d0,%d1,%2"  \
 : "=w"(result)  \
 : "w"(a_), "i"(b)   \
@@ -5454,7 +5454,7 @@ static float32x2_t vdup_n_f32 (float32_t);
   __extension__ \
 ({  \
uint64_t a_ = (a);   \
-   uint64_t result; \
+   float64_t result;\
__asm__ ("ucvtf %d0,%d1,%2"  \
 : "=w"(result)  \
 : "w"(a_), "i"(b)   \
@@ -5466,7 +5466,7 @@ static float32x2_t vdup_n_f32 (float32_t);
   __extension__ \
 ({  \
float64_t a_ = (a);  \
-   float64_t result;\
+   int64_t result;  \
__asm__ ("fcvtzs %d0,%d1,%2" \
 : "=w"(result)  \
 : "w"(a_), "i"(b)   \
@@ -5478,7 +5478,7 @@ static float32x2_t vdup_n_f32 (float32_t);
   __extension__ \
 ({  \
float64_t a_ = (a);  \
-   float64_t result;\
+   uint64_t result;  \
__asm__ ("fcvtzu %d0,%d1,%2" \
 : "=w"(result)  \
 : "w"(a_), "i"(b)   \
@@ -5586,7 +5586,7 @@ static float32x2_t vdup_n_f32 (float32_t);
   __extension__ \
 ({  \
int32_t a_ = (a);\
-   int32_t result;  \
+   float32_t result;\
__asm__ ("scvtf %s0,%s1,%2"  \
 : "=w"(result)  \
 : "w"(a_), "i"(b)   \
@@ -5598,7 +5598,7 @@ static float32x2_t vdup_n_f32 (float32_t);
   __extension__ \
 ({  \
uint32_t a_ = (a);   \
-   uint32_t result; \
+   float32_t result;\
__asm__ ("ucvtf %s0,%s1,%2"  \
 : "=w"(result)  \
 : "w"(a_), "i"(b)   \
@@ -5610,7 +5610,7 @@ static float32x2_t vdup_n_f32 (float32_t);
   __extension__ \
 ({  \
float32_t a_ = (a);  \
-   float32_t result;\
+   int32_t result;  \
__asm__ (fcvtzs %s0,%s1,%2   

Re: [PATCH][i386]Fix PR 57756

2013-10-17 Thread Sriraman Tallam
On Thu, Oct 17, 2013 at 9:28 AM, Steve Ellcey sell...@mips.com wrote:
 On Thu, 2013-10-17 at 06:03 -0700, Diego Novillo wrote:
 On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote:

  How is Google going to change its patch commit policies to ensure that
  this does not happen again?

 There is nothing to change.  Google follows
 http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed
 the oversight and if there is any other fallout from his patch, he
 will address it.

 I don't see anything out of the ordinary here.


 Diego.

 FYI: I just want to make sure everyone is aware that there are still
 broken targets.  rs6000 may be fixed but mips still doesn't work and
 I presume other platforms like sparc are also still broken.

 /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c: In
 function 'void cpp_atomic_builtins(cpp_reader*)':
 /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:594:7: 
 error: 'target_flags_explicit' was not declared in this scope
 /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:606:7: 
 error: 'target_flags_explicit' was not declared in this scope
 /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:618:7: 
 error: 'target_flags_explicit' was not declared in this scope
 /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:630:7: 
 error: 'target_flags_explicit' was not declared in this scope
 make[1]: *** [c-family/c-cppbuiltin.o] Error 1

 Sriraman, are you looking at how to fix this globally or are the target
 maintainers expected to fix this?  Currently I am using this one line
 patch to get things building, but I presume this is not what we want
 long term.

Yes, I am on it. I will get back asap.


Sri




 % git diff opth-gen.awk
 diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
 index 01c5ab6..46bd570 100644
 --- a/gcc/opth-gen.awk
 +++ b/gcc/opth-gen.awk
 @@ -114,6 +114,7 @@ print };
  print extern struct gcc_options global_options;
  print extern const struct gcc_options global_options_init;
  print extern struct gcc_options global_options_set;
 +print #define target_flags_explicit global_options_set.x_target_flag
 s
  print #endif
  print #endif
  print 


 Steve Ellcey
 sell...@mips.com




Re: [C++ Patch] PR 58596

2013-10-17 Thread Jason Merrill

OK.

Jason


[AArch64,PATCH] Adjust preferred_reload_class of SP+C

2013-10-17 Thread Marcus Shawcroft

Hi,

This patch addresses an issue in reload triggered by the
gfortran.dg/loc_2.f90 regression test at -O3 with LRA disabled.

The patch is based on work done by Ian Bolton here at ARM which I've
dusted down and submitted.

Following SFP elimination and under heavy register pressure, reload
attempts the reload of SP+offset into a V register.  The AArch64
instruction set does not support such an operation.

We considered two solutions to this issue:

1) Detect the SP+offset pattern in secondary reload and use an
intermediate X register.

2) Detect the SP+offset->V pattern in preferred_reload_class and
return NO_REG before secondary reload gets involved.

The latter looks like a simpler and more intuitive solution to me than
the first.  I also note that the i386 backend implementation of
preferred_reload_class contains equivalent code.

I intend to leave this patch on the list for a few days before
committing to give folks knowledgable on reload and the associated
target hooks the opportunity to comment.

Thanks
/Marcus

2013-10-17  Ian Bolton  ian.bol...@arm.com
Marcus Shawcroft  marcus.shawcr...@arm.com

* config/aarch64/aarch64.c (aarch64_preferred_reload_class):
	Special case reload SP+C into none GENERAL_REGS.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7fce7a0..cc9ecdd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4237,6 +4237,24 @@ aarch64_preferred_reload_class (rtx x, reg_class_t regclass)
      && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
 return NO_REGS;
 
+  /* Register elimination can result in a request for
+     SP+constant->FP_REGS.  We cannot support such operations, which
+     use SP as source and an FP_REG as destination, so reject them
+     outright now.  */
+  if (! reg_class_subset_p (regclass, GENERAL_REGS) && GET_CODE (x) == PLUS)
+{
+  rtx lhs = XEXP (x, 0);
+
+  /* Look through a possible SUBREG introduced by ILP32.  */
+  if (GET_CODE (lhs) == SUBREG)
+	lhs = SUBREG_REG (lhs);
+
+  gcc_assert (REG_P (lhs));
+  gcc_assert (reg_class_subset_p (REGNO_REG_CLASS (REGNO (lhs)),
+  POINTER_REGS));
+  return NO_REGS;
+}
+
   return regclass;
 }
 

Re: [PATCH][AArch64] Implement %c output template

2013-10-17 Thread Marcus Shawcroft
On 17 October 2013 12:13, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:

 [gcc/]
 2013-10-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64.c (aarch64_print_operand): Handle 'c'.

 [gcc/testsuite]
 2013-10-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * gcc.target/aarch64/c-output-template.c: New testcase.
 * gcc.target/aarch64/c-output-template-2.c: Likewise.
 * gcc.target/aarch64/c-output-template-3.c: Likewise.

OK
/Marcus


Re: [AArch64] Fix types for vcvtsd_n intrinsics.

2013-10-17 Thread Marcus Shawcroft
On 17 October 2013 17:27, James Greenhalgh james.greenha...@arm.com wrote:

 Hi,

 I spotted that the types of arguments to these intrinsics are wrong,
 which results in all sorts of fun issues!

 Fixed thusly, regression tested with aarch64.exp on aarch64-none-elf
 with no issues.

 OK?

 Thanks,
 James

 ---
 2013-10-17  James Greenhalgh  james.greenha...@arm.com

 * config/aarch64/arm_neon.h
 (vcvt<ds>_n_<fsu><32,64>_<fsu><32,64>): Correct argument types.

OK
/Marcus

