Re: [Patch, fortran] PRs 57893 and 58858
On 28.10.2013 23:26, Paul Richard Thomas wrote: This patch addresses issues arising from PR57893. It is entirely obvious. Bootstraps and regtests on FC17/x86_64 - OK for trunk?

Thanks for the patch. It's OK except for the following change:

*** gcc/testsuite/gfortran.dg/unlimited_polymorphic_13.f90 (revision 204135)
--- gcc/testsuite/gfortran.dg/unlimited_polymorphic_13.f90 (working copy)
...
*** 36,43 ****
        if (storage_size(o) /= sz) call abort()
      type is (complex(c2))
        if (storage_size(o) /= sz) call abort()
-   end select
-   select type (o)
      type is (complex(c3))
        if (storage_size(o) /= sz) call abort()
      type is (complex(c4))
--- 42,47 ----

Please don't merge the two SELECT TYPE blocks. If a system has only two or three complex kinds, that will lead to an error message like: Error: CASE label at (1) overlaps with CASE label at (2)

Tobias

PS: The following patch is still pending review: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01957.html

2013-10-29 Paul Thomas pa...@gcc.gnu.org

	PR fortran/57893
	* trans-types.c (gfc_typenode_for_spec): Add typenode for BT_HOLLERITH. Note that the length is incorrect but unusable.

	PR fortran/58858
	* target-memory.c (gfc_element_size): Add element sizes for BT_VOID and BT_ASSUMED, using gfc_typenode_for_spec.

2013-10-29 Paul Thomas pa...@gcc.gnu.org

	PR fortran/57893
	* gfortran.dg/unlimited_polymorphic_13.f90: Use real variables to determine sizes of real kinds.

	PR fortran/58858
	* gfortran.dg/unlimited_polymorphic_14.f90: New test.
Re: free is a killer
On Mon, 28 Oct 2013, Jeff Law wrote:

On 10/28/13 16:05, Marc Glisse wrote: I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one). I even wrote a patch (attached) but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy).

Actually, it's fairly easy to see with memset. Something like this:

/* { dg-do compile } */
/* { dg-options "-O1 -fdump-tree-optimized" } */
void f (long *p)
{
  *p = 42;
  p[4] = 42;
  __builtin_memset (p, 0, 100);
}
/* { dg-final { scan-tree-dump-not "= 42" "optimized" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */

Indeed. Do you want to commit it xfailed or put it in bugzilla so we don't lose it? (It becomes harder if you replace p with p-1 in the memset arguments.)

While I am posting non-submitted patches, does the following vaguely make sense? I couldn't find a testcase that exercised it without some local patches, but the idea (IIRC) is that with:

struct A { struct B b; int i; };

we can easily end up with one ao_ref.base that is a MEM_REF[p] and another one a MEM_REF[(B*)q] where p and q are of type A*, and that prevents us from noticing that p->i and q->b don't alias. Maybe I should instead find a way to get MEM_REF[q] as ao_ref.base, but if q actually points to a larger structure that starts with A it is likely to fail as well...

I've never looked at the alias oracle stuff, but ISTM that offsets from the base ought to be enough to disambiguate?

Yes, the issue here is making them notice that they can be seen as having bases of the same type (and thus that it is meaningful to compare offsets), since there are multiple possible base choices when a structure starts with a structure which starts with...
In the category "ugly hack that I won't submit", let me also attach this patch that sometimes helps notice that malloc doesn't alias other things (I don't know if the first hunk ever fires).

Well, one thing that can be done here (I think it comes from Steensgaard, and I'm pretty sure it's discussed in Morgan's text) is you assign a storage class and propagate that. Once you do so, you can eliminate certain things. You use a VRP-like engine with a fairly simple meet operator to handle propagation of the storage class. Once you have storage classes for all the pointers, you can make certain deductions about aliasing relationships, including what things in the malloc storage class can and cannot alias.

Sounds good, but I am not really up to rewriting the points-to analysis at the moment...

Thanks, -- Marc Glisse
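The storage-class propagation Jeff describes can be sketched with a flat lattice and a simple meet operator. This is an illustrative sketch only; the enum and function names here are invented, not GCC's:

```c
/* Flat lattice of pointer storage classes: SC_TOP (no information yet)
   meets anything to that thing; two different known classes collapse
   to SC_BOTTOM (may point anywhere).  A pointer that stays in SC_MALLOC
   after propagation cannot alias stack or global storage.  */
enum sclass { SC_TOP, SC_MALLOC, SC_STACK, SC_GLOBAL, SC_BOTTOM };

enum sclass
sclass_meet (enum sclass a, enum sclass b)
{
  if (a == SC_TOP)
    return b;
  if (b == SC_TOP)
    return a;
  return a == b ? a : SC_BOTTOM;
}
```

A VRP-like worklist engine would apply this meet at PHI nodes and pointer copies until a fixed point is reached, as Jeff suggests.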
[PATCH] Testcase for nonzero_bits __builtin_unreachable
On Sat, Oct 26, 2013 at 12:19:33AM +0200, Jakub Jelinek wrote: And here is a patch that allows vectorization without peeling for alignment and a scalar loop for the bound even for fn2, fn3 and fn4 in the following testcase, though as with the range __builtin_unreachable () notes, it is quite fragile, because it only works if there are no immediate uses of the tested SSA_NAME before the assertion. Perhaps a more reliable way would be to convert those assertions into __builtin_assume_aligned, but that has the disadvantage that its first argument is a pointer and it returns a pointer, so we'd need to cast integers to pointers and back, or add an ASSUME_ALIGNED internal function. Bootstrapped/regtested on x86_64-linux and i686-linux.

And here, as a follow-up to the whole patchset of 4 patches, is a testcase:

2013-10-29 Jakub Jelinek ja...@redhat.com

	* gcc.dg/vect/vect-align-3.c: New test.

--- gcc/testsuite/gcc.dg/vect/vect-align-3.c.jj	2013-10-29 08:18:08.636348586 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-align-3.c	2013-10-29 08:21:14.931384138 +0100
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+int a[2048];
+
+void
+f1 (int x, int y)
+{
+  int i;
+  x &= -256;
+  y &= -256;
+  for (i = x + 256; i < y; i++)
+    a[i]++;
+}
+
+void
+f2 (int x, int y)
+{
+  int i;
+  if (x & 31)
+    __builtin_unreachable ();
+  if (y & 31)
+    __builtin_unreachable ();
+  for (i = x + 256; i < x + y; i++)
+    a[i]++;
+}
+
+void
+f3 (int x, int y)
+{
+  int i;
+  if (x % 256)
+    __builtin_unreachable ();
+  if (y % 256)
+    __builtin_unreachable ();
+  for (i = x + 256; i < x + y; i++)
+    a[i]++;
+}
+
+void
+f4 (int x, int y)
+{
+  int i;
+  if ((x % 256) != 0)
+    __builtin_unreachable ();
+  if ((y % 256) != 0)
+    __builtin_unreachable ();
+  for (i = x + 256; i < x + y; i++)
+    a[i]++;
+}
+
+/* { dg-final { scan-tree-dump-not "vect_do_peeling_for_loop_bound" "vect" } } */
+/* { dg-final { scan-tree-dump-not "loop peeled for vectorization" "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */

Jakub
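The __builtin_assume_aligned alternative Jakub mentions looks like this in ordinary C. A hedged sketch: the buffer and function names are my invention, and the integer-to-pointer round trip the mail mentions is exactly what makes this awkward for the integer-typed testcase above:

```c
/* C11 _Alignas makes the promise below actually true; asserting a
   stronger alignment than the object has would be undefined behavior. */
static _Alignas (32) double buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

double
sum_aligned (void)
{
  /* Tells the vectorizer the pointer is 32-byte aligned, so no
     peeling or versioning for alignment is needed for this loop.  */
  double *ap = __builtin_assume_aligned (buf, 32);
  double s = 0;
  for (int i = 0; i < 8; i++)
    s += ap[i];
  return s;
}
```

Unlike the __builtin_unreachable () form, the alignment fact here is attached directly to the pointer value, so it survives regardless of where the pointer's other uses sit.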
Re: [PATCH, ARM] Fix line number data for PIC register setup code
Updated patch, re-bootstrapped on x86_64 and committed to trunk. Also applied to the 4.7 and 4.8 branches; the same problem is present there.

You only asked for approval on trunk though, and I'm not sure we really care about the results of the GDB testsuite on the 4.7 branch at this point, so let's exclude the 4.7 branch here.

-- Eric Botcazou
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
Hello!

For the define_expand I added as below, the else body is there to avoid fall-through transformations to the ABS operation in optabs.c. Otherwise ABS will be converted to other operations even though we have corresponding instructions from SSSE3.

No, it won't be. Fallthrough will generate the pattern that will be matched by the insn pattern above, just like you are doing by hand below.

(define_expand "abs<mode>2"
  [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand")
	(abs:VI124_AVX2_48_AVX512F
	  (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand")))]
  "TARGET_SSE2"
{
  if (!TARGET_SSSE3)
    ix86_expand_sse2_abs (operands[0], force_reg (<MODE>mode, operands[1]));

Do you really need force_reg here? You are using generic expanders in ix86_expand_sse2_abs that can handle non-register operands just as well.

  else
    emit_insn (gen_rtx_SET (VOIDmode, operands[0],
			    gen_rtx_ABS (<MODE>mode, operands[1])));
  DONE;
})

Please note that your mailer mangles indents. Please indent your code correctly.

Uros.
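For readers following the thread, the SSE2 fallback that ix86_expand_sse2_abs provides can be understood lane-wise through the classic branchless identity below. This is a scalar sketch with my own naming; one common vector lowering applies the same shift/xor/sub sequence to each element:

```c
#include <stdint.h>

/* abs(x) without a branch: s is 0 for non-negative x and -1 (all ones)
   for negative x, so (x ^ s) - s conditionally negates x.  Relies on
   arithmetic right shift of negative values, as GCC defines it.  */
static int32_t
abs_xor_sub (int32_t x)
{
  int32_t s = x >> 31;    /* sign mask: 0 or -1 */
  return (x ^ s) - s;     /* negates x exactly when s == -1 */
}
```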
RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
Hi,

On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote:

On 10/28/2013 03:20 AM, Bernd Edlinger wrote: On Sun, 20 Oct 2013 20:23:49, Sandra Loosemore wrote: I tried a backport to GCC 4.8 and tested on arm-none-eabi. On the new pr56997-1.c testcase, it got stuck in infinite recursion between store_split_bit_field/store_fixed_bit_field and/or extract_split_bit_field/extract_fixed_bit_field. This didn't show up in my previous mainline testing. [snip] I have attached an update to your patch, that should a) fix the recursion problem, and b) restrict -fstrict-volatile-bitfields so as not to violate the C++ memory model.

Thanks for picking up the ball on this -- I've been busy with other tasks for several days. I again tried backporting the patch series along with your fix to GCC 4.8 and tested on arm-none-eabi. I found that it was still getting stuck in infinite recursion unless the test from this patch hunk

Hmmm. Actually I used your patch on a clean 4.8.2 and built for --target=arm-eabi. I have seen the recursion on extract_bit_field, but not on store_bit_field so far; maybe you could give me a hint which test case exposes the other flavour of this recursion problem.

@@ -1699,6 +1711,14 @@ extract_bit_field (rtx str_rtx, unsigned
 {
   enum machine_mode mode1;

+  /* Handle extraction of unaligned fields;
+     this can happen with -fstrict-volatile-bitfields.  */
+  if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
+      && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE (word_mode)
+      && bitnum % GET_MODE_BITSIZE (GET_MODE (str_rtx)) + bitsize
+	 > GET_MODE_BITSIZE (GET_MODE (str_rtx)))
+    str_rtx = adjust_address (str_rtx, word_mode, 0);
+
   /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
   if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) > 0)
     mode1 = GET_MODE (str_rtx);

was also inserted into store_bit_field.
I also think that, minimally, this needs to be rewritten as something like

  if (MEM_P (str_rtx) && STRICT_ALIGNMENT
      && GET_MODE_BITSIZE (GET_MODE (str_rtx)) != 0
      && GET_MODE_BITSIZE (GET_MODE (str_rtx)) < GET_MODE_BITSIZE (word_mode)
      && (bitnum % MEM_ALIGN (str_rtx) + bitsize
	  > GET_MODE_BITSIZE (GET_MODE (str_rtx))))
    str_rtx = adjust_address (str_rtx, word_mode, 0);

Yes, that looks fine.

Otherwise, we'll end up using word_mode instead of the field mode on targets that support unaligned accesses. I tested that (again, 4.8 and arm-none-eabi) and results looked good, but of course ARM isn't one of the targets that supports unaligned accesses. :-S

I'm still not convinced this is the right fix, though. It seems to me that callers of store_bit_field and extract_bit_field in expr.c ought not to be messing with the mode of the rtx when flag_strict_volatile_bitfields > 0, because that is losing information about the original mode that we are trying to patch up here by forcing it to word_mode. Instead, I think the callers ought to be passing the declared field mode into store_bit_field and extract_bit_field, and those functions ought to change the mode of the incoming rtx to match the field mode only if strict_volatile_bitfield_p assures us that the insertion/extraction can be done in that mode.

The problem likely starts in expr.c when this is done:

  if (volatilep && flag_strict_volatile_bitfields > 0)
    to_rtx = adjust_address (to_rtx, mode1, 0);

This restricts the possible access mode not only for bit-fields but for all volatile members. But -fstrict-volatile-bitfields is supposed to affect bit-fields only.
Alternatively, if strict_volatile_bitfield_p returns false but flag_strict_volatile_bitfields > 0, then always force to word_mode and change the -fstrict-volatile-bitfields documentation to indicate that's the fallback if the insertion/extraction cannot be done in the declared mode, rather than claiming that it tries to do the same thing as if -fstrict-volatile-bitfields were not enabled at all.

Either way, this still needs more work, and more thorough testing (more targets, and obviously trunk as well as the 4.8 backport) before I'd consider it ready to commit. I might or might not be able to find some more time to work on this in the next week.

Yes. And it would be good to reach Richard's Nov-21 deadline for GCC 4.9.

Bernd.

-Sandra
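For readers joining the thread, here is a minimal illustration of what -fstrict-volatile-bitfields governs (the struct is my example, not from the patch). With the flag, reading a member must use a single access of the declared container type (one 32-bit load here), which matters when the struct overlays a memory-mapped device register; the discussion above is about what GCC should fall back to when such an access is impossible, e.g. an unaligned container on a STRICT_ALIGNMENT target like ARM:

```c
struct hw_status
{
  volatile unsigned int ready : 1;
  volatile unsigned int error : 1;
  volatile unsigned int count : 30;
};

unsigned int
read_count (struct hw_status *s)
{
  /* With -fstrict-volatile-bitfields this is one 32-bit volatile load
     plus shifting/masking, never a narrower byte or halfword access.  */
  return s->count;
}
```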
[PATCH] Adjust testsuite with respect to -fvect-cost-model changes
-fvect-cost-model changed its behavior dependent on the optimization level. The former behavior is restored by using -fvect-cost-model=dynamic for -fvect-cost-model and -fvect-cost-model=unlimited for -fno-vect-cost-model. The following patch makes this change throughout the testsuite - which also highlights the fact that we miss testcases for the cheap cost model. Previously selected testcases that regressed with the above change were adjusted, but clearly all cost-model testcases were developed with the dynamic model in mind. Committed. Richard. 2013-10-29 Richard Biener rguent...@suse.de * g++.dg/vect/slp-pr56812.cc: Adjust with respect to -fvect-cost-model changes. * gcc.dg/vect/bb-slp-32.c: Likewise. * gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp: Likewise. * gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp: Likewise. * gcc.dg/vect/costmodel/spu/spu-costmodel-vect.exp: Likewise. * gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp: Likewise. * gcc.target/powerpc/crypto-builtin-1.c: Likewise. * gcc.target/powerpc/p8vector-builtin-1.c: Likewise. * gcc.target/powerpc/p8vector-builtin-2.c: Likewise. * gcc.target/powerpc/p8vector-builtin-3.c: Likewise. * gcc.target/powerpc/p8vector-builtin-4.c: Likewise. * gcc.target/powerpc/p8vector-builtin-5.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-1.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-2.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-3.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-4.c: Likewise. * gcc.target/powerpc/p8vector-vectorize-5.c: Likewise. * gfortran.dg/vect/vect.exp: Likewise. Index: trunk/gcc/testsuite/g++.dg/vect/slp-pr56812.cc === *** trunk.orig/gcc/testsuite/g++.dg/vect/slp-pr56812.cc 2013-08-30 09:55:27.0 +0200 --- trunk/gcc/testsuite/g++.dg/vect/slp-pr56812.cc 2013-10-29 10:06:33.146955101 +0100 *** *** 1,7 /* { dg-do compile } */ /* { dg-require-effective-target vect_float } */ /* { dg-require-effective-target vect_hw_misalign } */ ! 
/* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model" } */ class mydata { public: --- 1,7 /* { dg-do compile } */ /* { dg-require-effective-target vect_float } */ /* { dg-require-effective-target vect_hw_misalign } */ ! /* { dg-additional-options "-O3 -funroll-loops -fvect-cost-model=dynamic" } */ class mydata { public: Index: trunk/gcc/testsuite/gcc.dg/vect/bb-slp-32.c === *** trunk.orig/gcc/testsuite/gcc.dg/vect/bb-slp-32.c 2013-08-27 11:13:22.0 +0200 --- trunk/gcc/testsuite/gcc.dg/vect/bb-slp-32.c 2013-10-29 10:06:38.660018142 +0100 *** *** 1,6 /* { dg-do compile } */ /* { dg-require-effective-target vect_int } */ ! /* { dg-additional-options "-fvect-cost-model" } */ void bar (int *); int foo (int *p) --- 1,6 /* { dg-do compile } */ /* { dg-require-effective-target vect_int } */ ! /* { dg-additional-options "-fvect-cost-model=dynamic" } */ void bar (int *); int foo (int *p) Index: trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp === *** trunk.orig/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp 2013-01-11 10:54:37.0 +0100 --- trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp 2013-10-29 10:07:01.423277877 +0100 *** if { ![istarget i?86*-*-*] && ![istarget *** 28,34 set DEFAULT_VECTCFLAGS # These flags are used for all targets. ! lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model" # If the target system supports vector instructions, the default action # for a test is 'run', otherwise it's 'compile'. Save current default. --- 28,34 set DEFAULT_VECTCFLAGS # These flags are used for all targets. ! lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model=dynamic" # If the target system supports vector instructions, the default action # for a test is 'run', otherwise it's 'compile'. Save current default.
Index: trunk/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp === *** trunk.orig/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp 2013-01-11 10:54:37.0 +0100 --- trunk/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp 2013-10-29 10:06:48.601131525 +0100 *** if ![is-effective-target powerpc_altivec *** 33,39 set DEFAULT_VECTCFLAGS # These flags are used for all targets. ! lappend DEFAULT_VECTCFLAGS "-O2" "-ftree-vectorize" "-fvect-cost-model" "-fno-common" # If the target system supports vector instructions, the
[PATCH GCC]Simplify address expression in IVOPT
Hi,

I noticed that IVOPT generates complex address expressions like below for iv bases:

  &arr_base[0].y
  &arr[0]
  &MEM[p+o]

It's even worse for targets supporting auto-increment addressing modes, because IVOPT adjusts such base expressions with +/- step, then creates:

  &arr_base[0].y +/- step
  &arr[0] +/- step
  &MEM[p+o] +/- step

This has two disadvantages: 1) Cost computation in IVOPT can't handle complex address expressions and generally returns spill_cost for them, which is bad since address ivs are important to IVOPT. 2) IVOPT creates duplicate candidates for IVs which have the same value in different forms; for example, two candidates are generated, one each for &a[0] and &a. Again, it's even worse for auto-increment addressing modes.

This patch fixes the issue by simplifying the address expression at the entry of allocating the IV struct. Maybe the simplification could be put in various fold* functions, but I think it might be better this way, because: 1) fold* functions are used from the front end through various tree optimizations; the simplified address expressions may not be what each optimizer wants. Think about parallelism-related passes: they might want the array index information kept for further analysis. 2) In some ways the simplification conflicts with the current implementation of fold* functions. Take fold_binary_loc as an example: it tries to simplify "&a[i1] +p c * i2" into "&a[i1 + i2]". Of course we could simplify in this way for IVOPT too, but that would cause new problems like: a) we would have to add code in IVOPT to cope with complex ARRAY_REFs, which is exactly the thing we want to avoid; b) the simplification can't always be done because of the signed/unsigned offset problem (especially for auto-increment addressing modes). 3) There are many entry points for fold* functions; the change would be non-trivial. 4) The simplification is only done in alloc_iv for true (not duplicate) iv structs, so the number of such ivs should be moderate.
With these points, I think it might be a win to do the simplification in IVOPT and create a kind of sandbox for IVOPT to play in. Any suggestions?

Bootstrapped and tested on x86/x86_64/arm. The patch causes three cases to fail on some targets, but all of them are false alarms, which can be resolved by refining the test cases to check more accurate information. Is it OK?

Thanks.
bin

gcc/testsuite/ChangeLog
2013-10-29 Bin Cheng bin.ch...@arm.com

	* gcc.dg/tree-ssa/loop-2.c: Refine check condition.
	* gcc.dg/tree-ssa/ivopt_infer_2.c: Ditto.
	* gcc.dg/tree-ssa/ivopt_mult_3.c: Ditto.

2013-10-29 Bin Cheng bin.ch...@arm.com

	* tree-ssa-loop-ivopts.c (alloc_iv): Simplify base expression.
	(get_shiftadd_cost): Check equality using operand_equal_p.
	(force_expr_to_var_cost): Refactor the code.  Handle type conversion.
	(split_address_cost): Call force_expr_to_var_cost.

Index: gcc/testsuite/gcc.dg/tree-ssa/loop-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/loop-2.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-2.c	(working copy)
@@ -27,7 +27,7 @@ void xxx(void)
 /* { dg-final { scan-tree-dump-times "\\* \[^\\n\\r\]*=" 0 "optimized" } } */
 /* { dg-final { scan-tree-dump-times "\[^\\n\\r\]*= \\*" 0 "optimized" } } */
-/* { dg-final { scan-tree-dump-times "MEM" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[base" 1 "optimized" } } */

 /* 17 * iter should be strength reduced.  */

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_infer_2.c	(working copy)
@@ -7,7 +7,8 @@

 extern char a[];

-/* Can not infer loop iteration from array -- exit test can not be replaced.  */
+/* Can not infer loop iteration from array -- exit test can not be
+   replaced by the array address.
 */
 void foo (unsigned int i_width, TYPE dst)
 {
   unsigned long long i = 0;
@@ -21,5 +22,5 @@ void foo (unsigned int i_width, TYPE dst)
   }
 }

-/* { dg-final { scan-tree-dump-times "Replacing" 0 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "\[^:\]*if \\(.*j_\[0-9\]+.*\\)" 1 "ivopts" } } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */

Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(revision 204117)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_3.c	(working copy)
@@ -18,5 +18,5 @@ long foo(long* p, long* p2, int N1, int N2)
   return s;
 }

-/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" } } */
+/* { dg-final { scan-tree-dump-times "Replacing exit test: if \\(.*p2.*\\)" 1 "ivopts" } } */
 /* { dg-final { cleanup-tree-dump "ivopts" } } */

Index: gcc/tree-ssa-loop-ivopts.c
[PING] [AArch64] Peepholes to generate ldp and stp instructions
Hi,

Please consider this as a reminder to review the ldp and stp peephole implementation for the AArch64 target. The patch was originally posted at: http://gcc.gnu.org/ml/gcc-patches/2013-03/msg01051.html

Please review the same and let me know if it's okay. Built and tested on aarch64-thunder-elf (using Cavium's internal simulator). No new regressions.

2013-10-29 Naveen H.S naveen.hurugalaw...@caviumnetworks.com

gcc/
	* config/aarch64/aarch64.md (peephole2s to generate ldp instruction for 2 consecutive loads from memory): New.
	(peephole2s to generate stp instruction for 2 consecutive stores to memory in integer mode): New.
	(peephole2s to generate ldp instruction for 2 consecutive loads from memory in floating point mode): New.
	(peephole2s to generate stp instruction for 2 consecutive stores to memory in floating point mode): New.

gcc/testsuite/
	* gcc.target/aarch64/ldp-stp.c: New testcase.

Thanks,
Naveen

diff -uprN '-x*.orig' mainline-orig/gcc/config/aarch64/aarch64.md gcc-4.8.0/gcc/config/aarch64/aarch64.md
--- mainline-orig/gcc/config/aarch64/aarch64.md	2013-10-28 17:15:52.363975264 +0530
+++ gcc-4.8.0/gcc/config/aarch64/aarch64.md	2013-10-29 10:11:09.656082201 +0530
@@ -1068,6 +1068,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPI 0 "register_operand")
+	(match_operand:GPI 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPI 2 "register_operand")
+	(match_operand:GPI 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1085,6 +1106,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPI 0 "aarch64_mem_pair_operand")
+	(match_operand:GPI 1 "register_operand"))
+   (set (match_operand:GPI 2 "memory_operand")
+	(match_operand:GPI 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT
+   && GET_MODE (operands[1]) == GET_MODE (XEXP (XEXP (operands[0], 0), 0))
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && REGNO_REG_CLASS (REGNO (operands[1]))
+      == REGNO_REG_CLASS (REGNO (operands[3]))
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode, XEXP (operands[0], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "load_pair<mode>"
@@ -1102,6 +1144,28 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPF 0 "register_operand")
+	(match_operand:GPF 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPF 2 "register_operand")
+	(match_operand:GPF 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && REGNO (operands[0]) >= 32 && REGNO (operands[2]) >= 32
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))
+   && optimize_size"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1119,6 +1183,28 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPF 0 "aarch64_mem_pair_operand")
+	(match_operand:GPF 1 "register_operand"))
+   (set (match_operand:GPF 2 "memory_operand")
+	(match_operand:GPF 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG
+   && GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT
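A minimal C function with the shape these peepholes look for: same base register, offsets differing by the mode size, distinct destination registers of the same class. This is my illustrative example, not part of the patch:

```c
/* On AArch64, the two adjacent loads are expected to pair into one ldp
   and the two adjacent stores into one stp when the peepholes fire
   (per the conditions above, when optimizing for size).  */
void
copy_pair (long *dst, const long *src)
{
  long a = src[0];   /* load  from base + 0 */
  long b = src[1];   /* load  from base + sizeof (long) */
  dst[0] = a;        /* store to   base + 0 */
  dst[1] = b;        /* store to   base + sizeof (long) */
}
```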
Re: Aliasing: look through pointer's def stmt
On Sat, Oct 26, 2013 at 7:07 PM, Marc Glisse marc.gli...@inria.fr wrote: On Fri, 25 Oct 2013, Marc Glisse wrote: On Fri, 25 Oct 2013, Richard Biener wrote: you can follow up with handling POINTER_PLUS_EXPR if you like.

Like this? (bootstrap+testsuite on x86_64-unknown-linux-gnu)

2013-10-26 Marc Glisse marc.gli...@inria.fr

gcc/
	* tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Look for a POINTER_PLUS_EXPR in the defining statement.

This has some issues with the type of the offsets. First, they might overflow in some extreme cases (that could probably already happen without the patch). But mostly, negative offsets are not supported. There is a comment to that effect before ao_ref_init_from_ptr_and_size, and ranges_overlap_p takes the offsets as unsigned HOST_WIDE_INT. Currently it does indeed seem hard to produce a negative offset there, but handling POINTER_PLUS_EXPR (definitely a good thing) would obviously change that.

For the POINTER_PLUS_EXPR offset argument you should use int_cst_value () to access it (it will unconditionally sign-extend) and use host_integerp (..., 0). That leaves the overflow possibility in place (and you should multiply by BITS_PER_UNIT), which we ignore in enough other places similar to this to ignore it here too ... (side note: for quite some time it has been on my TODO list to make the gimple alias machinery use byte offsets instead of bit offsets, which wouldn't cause regressions if we finally lowered bitfield accesses ...). A way to not ignore it is to do

  off = double_int_from_tree (ptr-plus-offset);
  off = double_int_sext (off, TYPE_PRECISION (...));
  off = double_int_mul (off, BITS_PER_UNIT);
  if (off.fits_shwi ())
    ...

Richard.

-- Marc Glisse
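The sign-extension step Richard sketches can be modeled in plain C as follows. The helper is mine, mirroring what double_int_sext does for an offset of a given precision; multiplying by BITS_PER_UNIT afterwards converts the byte offset to the bit offsets the alias machinery uses:

```c
#include <stdint.h>

/* Interpret the low PREC bits of VAL as a signed value: mask down to
   PREC bits, then flip and subtract the sign bit so it propagates into
   the high bits (the standard sign-extension trick).  */
static int64_t
sext_prec (uint64_t val, unsigned prec)
{
  uint64_t sign = 1ULL << (prec - 1);
  if (prec < 64)
    val &= (sign << 1) - 1;
  return (int64_t) ((val ^ sign) - sign);
}
```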
Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c
On 10/24/13 00:04, Kugan wrote: Hi, the ARM testcases neon-vcond-ltgt.c and neon-vcond-unordered.c fail in the Linaro 4.8 branch. It is not reproducible with trunk, but it can happen. Both neon-vcond-ltgt.c and neon-vcond-unordered.c scan for the vbsl instruction, among other vector instructions. However, as per the comment for the neon_vbsl<mode>_internal md pattern defined in neon.md, gcc can generate vbsl or vbit or vbif depending on the register allocation. Therefore, these testcases should scan for one of these three instructions instead of just vbsl. I have updated the testcases to scan for vbsl or vbit or vbif now. Is this OK?

This is OK. Thanks for the analysis on the testcase.

regards
Ramana

Thanks,
Kugan

2013-10-23 Kugan Vivekanandarajah kug...@linaro.org

	* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
	* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.
Re: [ARM][PATCH] Fix testsuite testcase neon-vcond-[ltgt,unordered].c
On 10/24/13 00:04, Kugan wrote: Hi, the ARM testcases neon-vcond-ltgt.c and neon-vcond-unordered.c fail in the Linaro 4.8 branch. It is not reproducible with trunk, but it can happen. Both neon-vcond-ltgt.c and neon-vcond-unordered.c scan for the vbsl instruction, among other vector instructions. However, as per the comment for the neon_vbsl<mode>_internal md pattern defined in neon.md, gcc can generate vbsl or vbit or vbif depending on the register allocation. Therefore, these testcases should scan for one of these three instructions instead of just vbsl. I have updated the testcases to scan for vbsl or vbit or vbif now. Is this OK? Thanks, Kugan

Hit send too soon on my earlier email. Minor nit in the ChangeLog, as below: need a newline between the date line and the file names.

2013-10-23 Kugan Vivekanandarajah kug...@linaro.org

	* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
	* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.

Ramana
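All three instructions the testcases now accept (vbsl, vbit, vbif) compute the same bitwise select and differ only in which operand is overwritten, which is why register allocation picks among them. A scalar model of the operation (my sketch, one 32-bit lane):

```c
#include <stdint.h>

/* The result takes bits from A where MASK is 1 and from B where MASK
   is 0.  NEON's vbsl/vbit/vbif are this operation with the mask, the
   first input, or the second input as the destructed destination.  */
static uint32_t
bitwise_select (uint32_t mask, uint32_t a, uint32_t b)
{
  return (a & mask) | (b & ~mask);
}
```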
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
On Wed, Oct 16, 2013 at 09:35:21AM +0400, Yury Gribov wrote: I've recently submitted a bug report regarding invalid unpoisoning of stack frame redzones (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take a look at the proposed patch (a simple one-liner) and check whether it's ok for commit?

Can you please be more verbose?

Do you think the proposed patch is ok or should I provide some additional info? Jakub, Dodji, any updates on this one? Note that this bug is a huge blocker for using AddressSanitizer on ARM platforms.

Sorry for the delay, I finally found time to look at it. While your patch fixes the issue, I wonder if, for the case where shadow_mem before the asan_clear_shadow call is already of the form (mem (reg)), it isn't better to just adjust the next asan_clear_shadow base from the end of the cleared region rather than from the start of it, because the end will already be in the right pseudo, while with your approach it needs to be done in a different register and potentially increases register pressure. So here is a (completely untested) alternate fix:

2013-10-29 Jakub Jelinek ja...@redhat.com
	    Yury Gribov y.gri...@samsung.com

	PR sanitizer/58543
	* asan.c (asan_clear_shadow): Return bool whether the emitted loop (if any) updated shadow_mem's base.
	(asan_emit_stack_protection): If asan_clear_shadow returned true, adjust shadow_mem and prev_offset.

--- gcc/asan.c.jj	2013-10-23 14:43:15.0 +0200
+++ gcc/asan.c	2013-10-29 10:26:56.085406934 +0100
@@ -878,10 +878,11 @@ asan_shadow_cst (unsigned char shadow_by

 /* Clear shadow memory at SHADOW_MEM, LEN bytes. Can't call a library call here though.
    */
-static void
+static bool
 asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 {
   rtx insn, insns, top_label, end, addr, tmp, jump;
+  bool ret;

   start_sequence ();
   clear_storage (shadow_mem, GEN_INT (len), BLOCK_OP_NORMAL);
@@ -893,12 +894,13 @@ asan_clear_shadow (rtx shadow_mem, HOST_
   if (insn == NULL_RTX)
     {
       emit_insn (insns);
-      return;
+      return false;
     }

   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
   addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  ret = addr == XEXP (shadow_mem, 0);
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);
@@ -912,6 +914,7 @@ asan_clear_shadow (rtx shadow_mem, HOST_
   jump = get_last_insn ();
   gcc_assert (JUMP_P (jump));
   add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 80 / 100);
+  return ret;
 }

 /* Insert code to protect stack vars.  The prologue sequence should be emitted
@@ -1048,7 +1051,14 @@ asan_emit_stack_protection (rtx base, HO
 			 (last_offset - prev_offset) >> ASAN_SHADOW_SHIFT);
 	  prev_offset = last_offset;
-	  asan_clear_shadow (shadow_mem, last_size >> ASAN_SHADOW_SHIFT);
+	  if (asan_clear_shadow (shadow_mem, last_size >> ASAN_SHADOW_SHIFT))
+	    {
+	      shadow_mem
+		= adjust_automodify_address (shadow_mem, VOIDmode,
+					     XEXP (shadow_mem, 0),
+					     last_size >> ASAN_SHADOW_SHIFT);
+	      prev_offset += last_size;
+	    }
 	  last_offset = offset;
 	  last_size = 0;
 	}

Jakub
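Context for the shift in the patch: AddressSanitizer maps every 8 bytes of application memory to one shadow byte, so a redzone of LEN bytes needs LEN >> ASAN_SHADOW_SHIFT shadow bytes cleared. A sketch of the mapping with the typical x86_64 parameters (the shadow offset is per-target; 0x7fff8000 is used here only for illustration, and ARM uses a different value):

```c
#include <stdint.h>

#define ASAN_SHADOW_SHIFT 3   /* 8 application bytes per shadow byte */

/* Typical shadow address computation: scale the application address
   down by the shift, then relocate into the shadow region.  */
static uintptr_t
shadow_addr (uintptr_t app_addr, uintptr_t shadow_offset)
{
  return (app_addr >> ASAN_SHADOW_SHIFT) + shadow_offset;
}
```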
[PATCH] Time profiler - phase 1
Hello, I've been cooperating with Jan on new profile-based function reordering work. This first patch introduces a new GCOV counter that instruments each function call and stores the time of the first run of a function. Bootstrapped/regtested on x86_64-linux and i686-linux. Thanks, Martin

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fca665b..3b62bcc 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2013-10-29  Martin Liska  marxin.li...@gmail.com
+	    Jan Hubicka  j...@suse.cz
+
+	* cgraph.c (dump_cgraph_node): Profile dump added.
+	* cgraph.h (struct cgraph_node): New time profile variable added.
+	* cgraphclones.c (cgraph_clone_node): Time profile is cloned.
+	* gcov-io.h (gcov_type): New profiler type introduced.
+	* ipa-profile.c (lto_output_node): Streaming for time profile added.
+	(input_node): Time profiler is read from LTO stream.
+	* predict.c (maybe_hot_count_p): Hot prediction changed.
+	* profile.c (instrument_values): New case for time profiler added.
+	(compute_value_histograms): Read of time profile.
+	* tree-pretty-print.c (dump_function_header): Time profiler is dumped.
+	* tree-profile.c (init_ic_make_global_vars): Time profiler function added.
+	(gimple_init_edge_profiler): TP function instrumentation.
+	(gimple_gen_time_profiler): New.
+	* value-prof.c (gimple_add_histogram_value): Support for time profiler
+	added.
+	(dump_histogram_value): TP type added to dumps.
+	(visit_hist): More sensitive check that takes TP into account.
+	(gimple_find_values_to_profile): TP instrumentation.
+	* value-prof.h (hist_type): New histogram type added.
+	(struct histogram_value_t): Pointer to struct function added.
+	* libgcc/Makefile.in: New GCOV merge function for TP added.
+	* libgcov.c: function_counter variable introduced.
+	(_gcov_merge_time_profile): New.
+	(_gcov_time_profiler): New.
+ 2013-10-29 David Malcolm dmalc...@redhat.com * doc/gty.texi (Inheritance and GTY): Make it clear that diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 52d9ab0..c95a54e 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -1890,6 +1890,7 @@ dump_cgraph_node (FILE *f, struct cgraph_node *node) if (node-profile_id) fprintf (f, Profile id: %i\n, node-profile_id); + fprintf (f, First run: %i\n, node-tp_first_run); fprintf (f, Function flags:); if (node-count) fprintf (f, executed HOST_WIDEST_INT_PRINT_DECx, diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 7706419..479d49f 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -247,7 +247,6 @@ struct GTY(()) cgraph_clone_info bitmap combined_args_to_skip; }; - /* The cgraph data structure. Each function decl has assigned cgraph_node listing callees and callers. */ @@ -324,6 +323,8 @@ struct GTY(()) cgraph_node { unsigned tm_clone : 1; /* True if this decl is a dispatcher for function versions. */ unsigned dispatcher_function : 1; + /* Time profiler: first run of function. */ + int tp_first_run; }; diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c index 800dd2c..a05fd77 100644 --- a/gcc/cgraphclones.c +++ b/gcc/cgraphclones.c @@ -207,6 +207,7 @@ cgraph_clone_node (struct cgraph_node *n, tree decl, gcov_type count, int freq, new_node-frequency = n-frequency; new_node-clone = n-clone; new_node-clone.tree_map = NULL; + new_node-tp_first_run = n-tp_first_run; if (n-count) { if (new_node-count n-count) diff --git a/gcc/gcov-io.c b/gcc/gcov-io.c index 5a21c1f..aea272b 100644 --- a/gcc/gcov-io.c +++ b/gcc/gcov-io.c @@ -68,7 +68,7 @@ gcov_open (const char *name, int mode) #if IN_LIBGCOV const int mode = 0; #endif -#if GCOV_LOCKED +#if GCOV_LOCKED struct flock s_flock; int fd; diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h index ea8d9a7..4650836 100644 --- a/gcc/gcov-io.h +++ b/gcc/gcov-io.h @@ -342,9 +342,10 @@ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned; counter. */ #define GCOV_COUNTER_IOR 7 /* IOR of the all values passed to counter. 
*/ -#define GCOV_LAST_VALUE_COUNTER 7 /* The last of counters used for value +#define GCOV_TIME_PROFILER 8 /* Time profile collecting first run of a function */ +#define GCOV_LAST_VALUE_COUNTER 8 /* The last of counters used for value profiling. */ -#define GCOV_COUNTERS 8 +#define GCOV_COUNTERS 9 /* Number of counters used for value profiling. */ #define GCOV_N_VALUE_COUNTERS \ @@ -352,7 +353,7 @@ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned; /* A list of human readable names of the counters */ #define GCOV_COUNTER_NAMES {arcs, interval, pow2, single, \ - delta, indirect_call, average, ior} + delta, indirect_call, average, ior, time_profiler} /* Names of merge functions for counters. */ #define GCOV_MERGE_FUNCTIONS {__gcov_merge_add, \ @@ -362,7 +363,8 @@ typedef unsigned HOST_WIDEST_INT gcov_type_unsigned; __gcov_merge_delta, \ __gcov_merge_single, \ __gcov_merge_add, \ - __gcov_merge_ior} +
[PATCH] tree-ssa documentation fix
Hello, I've noticed that some part of documentation is obsolete and this patch adds documentation for new functions that replace old macros. Thanks, Martin diff --git a/ChangeLog b/ChangeLog index 1c50b9b..d64fbcf 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,8 @@ +2013-10-29 Martin Liska marxin.li...@gmail.com + + * doc/tree-ssa.texi: Newly added functions that replace old macros + are documented. + 2013-10-21 Cong Hou co...@google.com * MAINTAINERS (Write After Approval): Add myself. diff --git a/gcc/doc/tree-ssa.texi b/gcc/doc/tree-ssa.texi index aaf741b..93f596d 100644 --- a/gcc/doc/tree-ssa.texi +++ b/gcc/doc/tree-ssa.texi @@ -526,31 +526,29 @@ the result of ``merging'' @code{a_1}, @code{a_2} and @code{a_3}. Hence, PHI nodes mean ``one of these operands. I don't know which''. -The following macros can be used to examine PHI nodes +The following functions can be used to examine PHI nodes -@defmac PHI_RESULT (@var{phi}) +@defun gimple_phi_result (@var{phi}) Returns the @code{SSA_NAME} created by PHI node @var{phi} (i.e., @var{phi}'s LHS)@. -@end defmac +@end defun -@defmac PHI_NUM_ARGS (@var{phi}) +@defun gimple_phi_num_args (@var{phi}) Returns the number of arguments in @var{phi}. This number is exactly the number of incoming edges to the basic block holding @var{phi}@. -@end defmac +@end defun -@defmac PHI_ARG_ELT (@var{phi}, @var{i}) -Returns a tuple representing the @var{i}th argument of @var{phi}@. -Each element of this tuple contains an @code{SSA_NAME} @var{var} and -the incoming edge through which @var{var} flows. -@end defmac +@defun gimple_phi_arg (@var{phi}, @var{i}) +Returns @var{i}th argument of @var{phi}@. +@end defun -@defmac PHI_ARG_EDGE (@var{phi}, @var{i}) +@defun gimple_phi_arg_edge (@var{phi}, @var{i}) Returns the incoming edge for the @var{i}th argument of @var{phi}. 
-@end defmac +@end defun -@defmac PHI_ARG_DEF (@var{phi}, @var{i}) +@defun gimple_phi_arg_def (@var{phi}, @var{i}) Returns the @code{SSA_NAME} for the @var{i}th argument of @var{phi}. -@end defmac +@end defun @subsection Preserving the SSA form
Re: [PING] [AArch64] Peepholes to generate ldp and stp instructions
You are better off CCing the maintainers for such reviews. Let me do that for you. I cannot approve or reject this patch but I have a few comments as below. On 10/29/13 09:22, Hurugalawadi, Naveen wrote: diff -uprN '-x*.orig' mainline-orig/gcc/config/aarch64/aarch64.md gcc-4.8.0/gcc/config/aarch64/aarch64.md --- mainline-orig/gcc/config/aarch64/aarch64.md 2013-10-28 17:15:52.363975264 +0530 +++ gcc-4.8.0/gcc/config/aarch64/aarch64.md 2013-10-29 10:11:09.656082201 +0530 @@ -1068,6 +1068,27 @@ (set_attr mode MODE)] ) +(define_peephole2 + [(set (match_operand:GPI 0 register_operand) + (match_operand:GPI 1 aarch64_mem_pair_operand)) + (set (match_operand:GPI 2 register_operand) + (match_operand:GPI 3 memory_operand))] + GET_CODE (operands[1]) == MEM +GET_CODE (XEXP (operands[1], 0)) == PLUS +GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG Use the predicates REG_P , CONST_INT_P and so on. Don't do this in this form. +GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT +GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0)) +REGNO (operands[0]) != REGNO (operands[2]) +REGNO_REG_CLASS (REGNO (operands[0])) + == REGNO_REG_CLASS (REGNO (operands[2])) +rtx_equal_p (XEXP (operands[3], 0), + plus_constant (Pmode, XEXP (operands[1], 0), + GET_MODE_SIZE (MODEmode))) +optimize_size Why use optimize_size here ? I would have thought it was always good to generate ldp and stp instructions. If you want to make it a core specific decision make it a tuning decision but don't disable just based on size. Additionally if this check is common between all the patterns it would be better to factor this out into a predicate or a function in its own right. + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +) + ;; Operands 0 and 2 are tied together by the final condition; so we allow ;; fairly lax checking on the second memory operation. 
(define_insn store_pairmode @@ -1085,6 +1106,27 @@ (set_attr mode MODE)] ) +(define_peephole2 + [(set (match_operand:GPI 0 aarch64_mem_pair_operand) + (match_operand:GPI 1 register_operand)) + (set (match_operand:GPI 2 memory_operand) + (match_operand:GPI 3 register_operand))] + GET_CODE (operands[0]) == MEM +GET_CODE (XEXP (operands[0], 0)) == PLUS +GET_CODE (XEXP (XEXP (operands[0], 0), 0)) == REG +GET_CODE (XEXP (XEXP (operands[0], 0), 1)) == CONST_INT +GET_MODE (operands[1]) == GET_MODE (XEXP (XEXP (operands[0], 0), 0)) +REGNO (operands[1]) != REGNO (operands[3]) +REGNO_REG_CLASS (REGNO (operands[1])) + == REGNO_REG_CLASS (REGNO (operands[3])) +rtx_equal_p (XEXP (operands[2], 0), + plus_constant (Pmode, XEXP (operands[0], 0), + GET_MODE_SIZE (MODEmode))) +optimize_size + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +) + ;; Operands 1 and 3 are tied together by the final condition; so we allow ;; fairly lax checking on the second memory operation. (define_insn load_pairmode @@ -1102,6 +1144,28 @@ (set_attr mode MODE)] ) +(define_peephole2 + [(set (match_operand:GPF 0 register_operand) + (match_operand:GPF 1 aarch64_mem_pair_operand)) + (set (match_operand:GPF 2 register_operand) + (match_operand:GPF 3 memory_operand))] + GET_CODE (operands[1]) == MEM +GET_CODE (XEXP (operands[1], 0)) == PLUS +GET_CODE (XEXP (XEXP (operands[1], 0), 0)) == REG +GET_CODE (XEXP (XEXP (operands[1], 0), 1)) == CONST_INT +GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0)) +REGNO (operands[0]) != REGNO (operands[2]) +REGNO (operands[0]) = 32 REGNO (operands[2]) = 32 I'm not sure what this means here but this check doesn't look right - 32 is V0_REGNUM - why are you checking that here ? If you really need to do that check then use V0_REGNUM or it may be better to see if this really is a SIMD regnum ? Can you not use something like FP_REGNUM_P here instead ? 
+REGNO_REG_CLASS (REGNO (operands[0])) + == REGNO_REG_CLASS (REGNO (operands[2])) +rtx_equal_p (XEXP (operands[3], 0), + plus_constant (Pmode, XEXP (operands[1], 0), + GET_MODE_SIZE (MODEmode))) +optimize_size + [(parallel [(set (match_dup 0) (match_dup 1)) + (set (match_dup 2) (match_dup 3))])] +) + ;; Operands 0 and 2 are tied together by the final condition; so we allow ;; fairly lax checking on the second memory operation. (define_insn store_pairmode @@ -1119,6 +1183,28 @@ (set_attr mode MODE)] ) +(define_peephole2 + [(set (match_operand:GPF 0 aarch64_mem_pair_operand) + (match_operand:GPF 1 register_operand)) + (set (match_operand:GPF 2 memory_operand) + (match_operand:GPF 3 register_operand))] +
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
Hello Richard, On 28 Oct 14:45, Richard Henderson wrote: On 10/28/2013 01:58 PM, Kirill Yukhin wrote: Hello Richard, On 28 Oct 08:20, Richard Henderson wrote: Why is a masked *scalar* operation useful? The reason the instructions exist is so that you can do fully fault-correct predicated scalar algorithms. Using VEC_MERGE isn't the proper representation for that. If that's your real goal, then COND_EXEC is the only way to let rtl know that faults are suppressed in the false condition.

I believe the cond_exec approach is supposed to look like this:

(define_subst "mask_scalar"
  [(set (match_operand:SUBST_V 0)
        (vec_merge:SUBST_V (match_operand:SUBST_V 1)
                           (match_operand:SUBST_V 2)
                           (const_int 1)))]
  "TARGET_AVX512F"
  [(cond_exec (eq:CC (match_operand:<avx512fmaskmode> 3 "register_operand" "k")
                     (const_int 1))
     (set (match_dup 0)
          (vec_merge:SUBST_V (match_dup 1) (match_dup 2) (const_int 1))))])

But this describes merge masking incorrectly: we would need to add a clobber to signal that we zero the upper part of the register even when the condition is false. Worse, zero masking would be indistinguishable from merge masking, and we would need a way to choose which mask mode to enable. A bad turn.

IMHO, we have three options for implementing masked scalar insns:

1. `vec_merge' over vec_merge (the current approach).
   Pros:
   1. Precise semantic description.
   2. Unified approach with the vector patterns.
   3. Freedom for the simplifier to reduce EVEX to VEX for certain constant masks.
   Cons:
   1. The overly precise semantic description leads to complicated code in the md file.

2. The `cond_exec' approach.
   Pros:
   1. Looks useful when the compiler tries to generate predicated code.
   Cons:
   1. Not precise: extra clobbers (?) are needed to signal that the register changes even when the cond_exec condition is false.
   2. Unable to describe zero masking nicely.
   3. The code is still as complicated as for option 1.
   4. The simplifier won't work (a clobber is always a clobber).

3. Make all masked scalar insns unspecs.
   Pros:
   1. Straightforward, not overweight. Enough for the intrinsics to work.
   Cons:
   1. Since every unspec needs a code, substs won't be applied directly: a huge volume of similar code.
   2. The simplifier won't work.
   3. Generation of predicated code becomes hard.

Am I missing some options, or is that all we have? If so, which option would you prefer?

Thanks, K
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On Mon, Oct 28, 2013 at 10:21 PM, Jeff Law l...@redhat.com wrote: On 10/25/13 11:57, Ilya Enkovich wrote: There are currently two known issues with LTO. The first one is an ICE in the LTO streamer when it reads instrumented code. The second one is an uninitialized flag_check_pointers when code is compiled by lto1 (do you know why that may happen, BTW?). It also causes an ICE because instrumented code is met when not expected. Of course, I'll fix these problems anyway, but I was going to allow '-fcheck-pointers -flto' only when the checker testsuite has a 100% pass rate with LTO. No idea about why LTO would be failing in this way -- my experience with LTO is virtually nil. I'm OK with the restriction as a temporary measure. But we really dislike this kind of stuff, so I need a commitment from you to dive into these problems and sort out what's really going on. I'm not ok with adding such arbitrary restrictions. LTO support should be trivial. Richard. Ilya
Re: Using gen_int_mode instead of GEN_INT minor testsuite fallout on MIPS
On Mon, Oct 28, 2013 at 9:16 PM, Mike Stump mikest...@comcast.net wrote: Ping? Can you please get rid of PARTIAL_INT_MODE_NAME by making the name a required argument to PARTIAL_INT_MODE? Ok with that change (given target maintainers are ok with this). Thanks, Richard. On Oct 18, 2013, at 10:03 AM, Mike Stump mikest...@comcast.net wrote: On Oct 18, 2013, at 4:24 AM, Richard Biener richard.guent...@gmail.com wrote: I agree. Btw, the implementation defined precision of PDI on SH-5 definitely looks interesting, but I wonder how you can perform implementation defined arithmetic on that PDI mode then ;) Easy, perform it in the maximal size supported… What else blocks this patch? Nothing. We have heard from everyone, and all the issues found by testing were on msp, and those were exclusively due to port issues that I believe now have been resolved. The below patch has the change to 22 bits for PSI on SH. Ok?
Re: [PATCH, ARM] Fix line number data for PIC register setup code
On 29/10/13 09:12, Eric Botcazou wrote: Updated patch, re-bootstrapped on x86_64 and committed to trunk. Also applied to 4.7 and 4.8 branches, the same problem is present there. You only asked for approval on trunk though, Eric, Sorry about that. and I'm not sure we really care about the results of the GDB testsuite on the 4.7 branch at this point, FWIW, I originally encountered the problem on 4.7. so let's exclude the 4.7 branch here. Reverted on the 4.7 branch. Thanks, - Tom
Re: [RFC] [Testsuite,ARM] Neon intrinsics executable tests
On 10/09/13 23:16, Christophe Lyon wrote: Hi, This patch is a first small sample of dejagnu-ization of my ARM Neon intrinsics tests. Thanks for attempting this and apologies for the slow response - I've been busy with a few other things internally. It's derived from my previous work at http://gitorious.org/arm-neon-tests/arm-neon-tests which supports all the ARM intrinsics, with executable tests. As I have to manually transform each test (to include the expected data, and a few other modifications), it's quite a bit tedious. I'll take your word that this is tedious :) I can see how you get the reference input in from the original text file into headers and having to marshall things there. Irrespective of our earlier conversations on this now I'm actually wondering if instead of doing this and integrating this in the GCC source base it maybe easier to write a harness to test this cross on qemu or natively. Additionally setting up an auto-tester to do this might be a more productive use of time rather than manually dejagnuizing this which appears to be a tedious and slow process. I'd like your feedback before continuing, as there are a lot more files to come. I have made some cleanup to help review, but the two .h files will need to grow as more intrinsics will be added (see the original ones). Which one should I compare this with in terms of the original file ? I'd like to keep the modifications at a minimal level, to save my time when adapting each test (there are currently 145 test files, so 143 left:-). On to the patch itself. The prefix TEST_ seems a bit misleading in that it suggests this is testing something when in reality this is initializing stuff. Thanks, Christophe. This patch only introduces new files. 2013-10-03 Christophe Lyonchristophe.l...@linaro.org testsuite/gcc.target/arm/neon-intrinsics/ * neon-intrinsics.exp: New driver file. * arm-neon-ref.h: New file, with common vector construction helpers. 
* compute_ref_data.h: New file, with helpers for input data initialization. * ref_vaba.c: New test file for the vaba family of intrinsics. * ref_vld1.c: New test file for vld1. neontests.patch.txt diff -rNup '--exclude=.git' gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp --- gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp 1970-01-01 01:00:00.0 +0100 +++ gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/neon-intrinsics.exp 2013-05-08 23:08:46.271786347 +0200 @@ -0,0 +1,35 @@ +# Copyright (C) 1997-2013 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +#http://www.gnu.org/licenses/. + +# GCC testsuite that uses the `dg.exp' driver. + +# Exit immediately if this isn't an ARM target. +if ![istarget arm*-*-*] then { + return +} Also for aarch64*-*-* as all these intrinsics are compatible with the aarch64 port. I would also prefer that this be tortured over multiple optimization levels as many times we find issues with different optimization levels. More later I need to get back to something else and I need to play more with your original testsuite - but I'd like some discussion around some of these points anyway. Ramana + +# Load support procs. +load_lib gcc-dg.exp + +# Initialize `dg'. +dg-init + +# Main loop. 
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ + + +# All done. +dg-finish diff -rNup '--exclude=.git' gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h --- gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h 1970-01-01 01:00:00.0 +0100 +++ gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h 2013-05-09 00:48:59.395628726 +0200 @@ -0,0 +1,349 @@ +#ifndef_ARM_NEON_REF_H_ +#define_ARM_NEON_REF_H_ + +#include stdio.h +#include inttypes.h +#include string.h +#include stdlib.h + +#define xSTR(X) #X +#define STR(X) xSTR(X) + +#define xNAME1(V,T) V ## _ ## T +#define xNAME(V,T) xNAME1(V,T) + +#define VAR(V,T,W) xNAME(V,T##W) +#define VAR_DECL(V, T, W) T##W##_t VAR(V,T,W) + +#define VECT_NAME(T, W, N) T##W##x##N +#define VECT_ARRAY_NAME(T, W, N, L) T##W##x##N##x##L +#define VECT_TYPE(T, W, N)
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On Sun, Oct 27, 2013 at 7:55 PM, Andrew Pinski pins...@gmail.com wrote: On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski pins...@gmail.com wrote: On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski pins...@gmail.com wrote: On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen zhenqiang.c...@linaro.org wrote: On 18 October 2013 00:58, Jeff Law l...@redhat.com wrote: On 10/17/13 05:03, Richard Biener wrote: Is it OK for trunk? I had a much simpler change which did basically the same from 4.7 (I can update it if people think this is a better approach). I like that more (note you can now use is_gimple_condexpr as predicate for force_gimple_operand). The obvious question is whether or not Andrew's simpler change picks up as many transformations as Zhenqiang's change. If not are the things missed important. Zhenqiang, can you do some testing of your change vs Andrew P.'s change? Here is a rough compare: 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included in my patch). Root cause is that it does not skip LABEL. The guard to do this opt should be the same the bb_has_overhead_p in my patch. This should be an easy change, I am working on this right now. 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not efficient for ||. e.g. For ssa-ifcombine-ccmp-6.c, it will generate _3 = a_2(D) 0; _5 = b_4(D) 0; _6 = _3 | _5; _9 = c_7(D) = 0; _10 = ~_6; _11 = _9 _10; if (_11 == 0) With my patch, it will generate _3 = a_2(D) 0; _5 = b_4(D) 0; _6 = _3 | _5; _9 = c_7(D) 0; _10 = _6 | _9; if (_10 != 0) As mentioned otherwise, this seems like a missed optimization inside forwprop. When I originally wrote this code there used to be two cases one for and one for |, but this was removed sometime and I just made the code evolve with that. Actually I can make a small patch (3 lines) to my current patch which causes tree-ssa-ifcombine.c to produce the behavior of your patch. 
if (result_inv) { t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t); result_inv = false; } I will submit a new patch after some testing of my current patch which fixes items 1 and 2. Here is my latest patch which adds the testcases from Zhenqiang's patch and fixes item 1 and 2. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. return i; } +/* Return a new iterator pointing to the first non-debug non-label statement in + basic block BB. */ + +static inline gimple_stmt_iterator +gsi_start_nondebug_after_labels_bb (basic_block bb) vertical space before the comment missing. @@ -631,7 +669,7 @@ tree_ssa_ifcombine (void) bbs = single_pred_before_succ_order (); calculate_dominance_info (CDI_DOMINATORS); - for (i = 0; i n_basic_blocks - NUM_FIXED_BLOCKS; ++i) + for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i = 0; i--) { basic_block bb = bbs[i]; gimple stmt = last_stmt (bb); The original code matches what phiopt does which even has a comment: /* Search every basic block for COND_EXPR we may be able to optimize. We walk the blocks in order that guarantees that a block with a single predecessor is processed before the predecessor. This ensures that we collapse inner ifs before visiting the outer ones, and also that we do not try to visit a removed block. */ bb_order = single_pred_before_succ_order (); n = n_basic_blocks - NUM_FIXED_BLOCKS; for (i = 0; i n; i++) so if the reverse order is better here (and in phiopt?) then it at least needs a comment explaining why. Otherwise the patch looks good, but please iterate with Jeff over the dom testcase issue. Thanks, Richard. Thanks, Andrew Pinski ChangeLog: * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. (ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons fails, combining the branches anyways. (tree_ssa_ifcombine): Inverse the order of the basic block walk, increases the number of combinings. * gimple.h (gsi_start_nondebug_after_labels_bb): New function. 
testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent conditional move to be used. * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for one or more intermediate. Thanks, Andrew Pinski 3) The good thing of Andrew P.'s change is that Inverse the order of the basic block walk so it can do combine recursively. But I think we need some heuristic to control the number of ifs. Move too much compares from the inner_bb to outer_bb is not good. I think this depends on the target. For MIPS we don't want an upper bound
[PATCH v4, nds32] Andes nds32 port, libgcc part
Hi, all, This is v4 patch for Andes nds32 port on libgcc part. Thanks Joseph Myers's comments. Now we remove the .file directive from crtzero.S. The rationale of this can be referred to: http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00123.html The diff is as follows and the patch file is attached. diff --git a/libgcc/config/nds32/crtzero.S b/libgcc/config/nds32/crtzero.S index 5ef0361..6509869 100644 --- a/libgcc/config/nds32/crtzero.S +++ b/libgcc/config/nds32/crtzero.S @@ -33,8 +33,6 @@ !! !!== - .file crtzero.S - !!-- !! Jump to start up code !!-- libgcc/ 2013-10-29 Chung-Ju Wu jasonw...@gmail.com Shiva Chen shiva0...@gmail.com * config/nds32 : New directory and files. Best regards, jasonwucj 3-nds32-libgcc.v4.patch.tar.gz Description: GNU Zip compressed data
Re: free is a killer
On Mon, Oct 28, 2013 at 11:05 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 28 Oct 2013, Jeff Law wrote: On 10/26/13 01:15, Marc Glisse wrote: Hello, this patch teaches gcc that free kills the memory its argument points to. The equality test is probably too strict, I guess we can loosen it later (unless you have suggestions?). Note that the corresponding code for BUILT_IN_MEMCPY and others seems suspicious to me, it looks like it is testing for equality between a pointer and a mem_ref, which is unlikely to happen. Bootstrap+testsuite on x86_64-unknown-linux-gnu. 2013-10-27 Marc Glisse marc.gli...@inria.fr PR tree-optimization/19831 gcc/ * tree-ssa-alias.c (stmt_kills_ref_p_1): Handle BUILT_IN_FREE. gcc/testsuite/ * gcc.dg/tree-ssa/alias-25.c: New file. OK for the trunk. Thanks. I agree the MEM_REF* and VA_END cases look strange and I think they're wrong as well. Did you happen to try fixing them and running the testsuite to see if anything fails? ISTM your testcase could be tweaked to test conclusively if they're doing the right thing -- and regardless of what's found that test can/should make its way into the testsuite. The VA_END case looks ok (though == equality testing is a bit too conservative, the SSA_NAME case of the builtins handling looks wrong indeed. I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one), I even wrote a patch (attached) but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy). While I am posting non-submitted patches, does the following vaguely make sense? 
I couldn't find a testcase that exercised it without some local patches, but the idea (IIRC) is that with: struct A { struct B b; int i; } we can easily end up with one ao_ref.base that is a MEM_REF[p] and another one a MEM_REF[(B*)q] where p and q are of type A*, and that prevents us from noticing that p-i and q-b don't alias. Maybe I should instead find a way to get MEM_REF[q] as ao_ref.base, but if q actually points to a larger structure that starts with A it is likely to fail as well... --- gcc/tree-ssa-alias.c(revision 204095) +++ gcc/tree-ssa-alias.c(working copy) @@ -1099,22 +1099,24 @@ indirect_refs_may_alias_p (tree ref1 ATT /* If both references are through the same type, they do not alias if the accesses do not overlap. This does extra disambiguation for mixed/pointer accesses but requires strict aliasing. */ if ((TREE_CODE (base1) != TARGET_MEM_REF || (!TMR_INDEX (base1) !TMR_INDEX2 (base1))) (TREE_CODE (base2) != TARGET_MEM_REF || (!TMR_INDEX (base2) !TMR_INDEX2 (base2))) same_type_for_tbaa (TREE_TYPE (base1), TREE_TYPE (ptrtype1)) == 1 same_type_for_tbaa (TREE_TYPE (base2), TREE_TYPE (ptrtype2)) == 1 - same_type_for_tbaa (TREE_TYPE (ptrtype1), -TREE_TYPE (ptrtype2)) == 1) + (same_type_for_tbaa (TREE_TYPE (ptrtype1), + TREE_TYPE (ptrtype2)) == 1 + || same_type_for_tbaa (TREE_TYPE (TREE_TYPE (ptr1)), +TREE_TYPE (TREE_TYPE (ptr2))) == 1)) No, the type of 'ptr' are not in any way relevant. Richard. return ranges_overlap_p (offset1, max_size1, offset2, max_size2); /* Do type-based disambiguation. */ if (base1_alias_set != base2_alias_set !alias_sets_conflict_p (base1_alias_set, base2_alias_set)) return false; /* Do access-path based disambiguation. */ if (ref1 ref2 (handled_component_p (ref1) || handled_component_p (ref2)) In the category ugly hack that I won't submit, let me also attach this patch that sometimes helps notice that malloc doesn't alias other things (I don't know if the first hunk ever fires). -- Marc Glisse
[PATCH v4, nds32] Andes nds32 port, documentation part
This is v4 patch for Andes nds32 port on documentation part. Thanks Joseph Myers's comments. Now we use @deftypefn and @samp to refine our documentation content. The diff is as follows and the patch file is attached. diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index f0002a9..5f91813 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -12432,33 +12432,38 @@ once the handler returns. These built-in functions are available for the NDS32 target: -@table @code -@item void __builtin_nds32_isync (int *@var{addr}) +@deftypefn {Built-in Function} void __builtin_nds32_isync (int *@var{addr}) Insert an ISYNC instruction into the instruction stream where @var{addr} is an instruction address for serialization. +@end deftypefn -@item void __builtin_nds32_isb (void) +@deftypefn {Built-in Function} void __builtin_nds32_isb (void) Insert an ISB instruction into the instruction stream. +@end deftypefn -@item int __builtin_nds32_mfsr (int @var{sr}) +@deftypefn {Built-in Function} int __builtin_nds32_mfsr (int @var{sr}) Return the content of a system register which is mapped by @var{sr}. +@end deftypefn -@item int __builtin_nds32_mfusr (int @var{usr}) +@deftypefn {Built-in Function} int __builtin_nds32_mfusr (int @var{usr}) Return the content of a user space register which is mapped by @var{usr}. +@end deftypefn -@item void __builtin_nds32_mtsr (int @var{value}, int @var{sr}) +@deftypefn {Built-in Function} void __builtin_nds32_mtsr (int @var{value}, int @var{sr}) Move the @var{value} to a system register which is mapped by @var{sr}. +@end deftypefn -@item void __builtin_nds32_mtusr (int @var{value}, int @var{usr}) +@deftypefn {Built-in Function} void __builtin_nds32_mtusr (int @var{value}, int @var{usr}) Move the @var{value} to a user space register which is mapped by @var{usr}. +@end deftypefn -@item void __builtin_nds32_setgie_en (void) +@deftypefn {Built-in Function} void __builtin_nds32_setgie_en (void) Enable global interrupt. 
+@end deftypefn -@item void __builtin_nds32_setgie_dis (void) +@deftypefn {Built-in Function} void __builtin_nds32_setgie_dis (void) Disable global interrupt. - -@end table +@end deftypefn @node picoChip Built-in Functions @subsection picoChip Built-in Functions diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 68af98e..cbc0644 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -1864,7 +1864,7 @@ supported since version 4.7.2 and is the default in 4.8.0 and newer. @item --with-nds32-lib=@var{library} Specifies that @var{library} setting is used for building @file{libgcc.a}. -Currently, the valid @var{library} is 'newlib' or 'mculib'. +Currently, the valid @var{library} is @samp{newlib} or @samp{mculib}. This option is only supported for the NDS32 target. @item --with-build-time-tools=@var{dir} gcc/ 2013-10-29 Chung-Ju Wu jasonw...@gmail.com Shiva Chen shiva0...@gmail.com * doc/invoke.texi (NDS32 options): Document nds32 specific options. * doc/md.texi (NDS32 family): Document nds32 specific constraints. * doc/install.texi (Cross-Compiler-Specific Options): Document --with-nds32-lib for nds32 target. * doc/extend.texi (Function Attributes): Document nds32 specific attributes. Best regards, jasonwucj 5-nds32-documentation.v4.patch Description: Binary data
Re: free is a killer
On Tue, Oct 29, 2013 at 6:35 AM, Jeff Law l...@redhat.com wrote: On 10/28/13 16:05, Marc Glisse wrote: I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one), I even wrote a patch (attached) but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy). Actually, it's fairly easy to see with memset. Something like this: /* { dg-do compile } */ /* { dg-options "-O1 -fdump-tree-optimized" } */ void f (long *p) { *p = 42; p[4] = 42; __builtin_memset (p, 0, 100); } /* { dg-final { scan-tree-dump-not "= 42" "optimized" } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ I'm sure it can be made to work with memcpy as well, it's just harder because you have to ensure the source/dest args don't alias. While I am posting non-submitted patches, does the following vaguely make sense? I couldn't find a testcase that exercised it without some local patches, but the idea (IIRC) is that with: struct A { struct B b; int i; } we can easily end up with one ao_ref.base that is a MEM_REF[p] and another one a MEM_REF[(B*)q] where p and q are of type A*, and that prevents us from noticing that p->i and q->b don't alias. Maybe I should instead find a way to get MEM_REF[q] as ao_ref.base, but if q actually points to a larger structure that starts with A it is likely to fail as well... I've never looked at the alias oracle stuff, but ISTM that offsets from the base ought to be enough to disambiguate? In the category "ugly hack that I won't submit", let me also attach this patch that sometimes helps notice that malloc doesn't alias other things (I don't know if the first hunk ever fires). 
Well, one thing that can be done here (I think it comes from Steensgaard and I'm pretty sure it's discussed in Morgan's text) is you assign a storage class and propagate that. Once you do so, you can eliminate certain things. You use a vrp-like engine with a fairly simple meet operator to handle propagation of the storage class. Once you have storage classes for all the pointers, you can make certain deductions about aliasing relationships. Including what things in the malloc storage class can and can not alias. Of course in the example we have the global memory storage class (incoming function argument) and malloc memory which is really the same storage class. It only becomes a different storage class if you factor in flow analysis (for which the current PTA code is weak once you leave the SSA web). Richard. Jeff
Re: RFA: Andes nds32 port v4 patch
2013/10/29 Joseph S. Myers jos...@codesourcery.com: Since Richard looked at the machine description part of the port, I looked at the other pieces. I'm happy with those pieces with the following minor changes: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00520.html (libgcc): remove the .file directive from crtzero.S. See http://gcc.gnu.org/ml/gcc-patches/2008-07/msg00123.html for why such directives are best avoided. Thank you for the link. Now I realize the rationale of why it is better to remove the .file directive. :) Modify it accordingly: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02382.html http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00522.html (documentation): use @deftypefn rather than @table @code when documenting built-in functions. In install.texi, say @samp{newlib} and @samp{mculib} instead of using '' quotes. Thanks for the comment. Now I modify it as you suggested: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02384.html -- Joseph S. Myers jos...@codesourcery.com Thank you very much for the comments on libgcc and documentation parts. Here I re-list the v4 patch of nds32 port: Machine Description -- http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02153.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02154.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02156.html Libgcc and Documentation -- http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02382.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02384.html Are those nds32 port patches OK to be committed? Best regards, jasonwucj
Re: [C++ Patch] PR 58888
Hi, On 10/28/2013 07:05 PM, Jason Merrill wrote: My question is, why do we need that whole block for massaging VAR_DECL initializers? That all ought to be handled properly by cp_finish_decl. Does removing everything after else if (!VAR_P (value)) gcc_unreachable (); work? In fact, it works great, excellent. I only have to tweak a bit g++.dg/other/warning1.C, which, for some reason, used dg-warnings, whereas in fact we have been matching errors (we are removing the "field initializer is not constant" error and replacing it with "'(1.0e+0f / 0.0f)' is not a constant expression" etc.) Tested x86_64-linux. Thanks, Paolo. / Index: cp/decl2.c === --- cp/decl2.c (revision 204152) +++ cp/decl2.c (working copy) @@ -955,26 +955,6 @@ grokfield (const cp_declarator *declarator, /* C++11 NSDMI, keep going. */; else if (!VAR_P (value)) gcc_unreachable (); - else if (!processing_template_decl) - { - if (TREE_CODE (init) == CONSTRUCTOR) - init = digest_init (TREE_TYPE (value), init, tf_warning_or_error); - init = maybe_constant_init (init); - - if (init != error_mark_node && !TREE_CONSTANT (init)) - { - /* We can allow references to things that are effectively -static, since references are initialized with the -address. 
*/ - if (TREE_CODE (TREE_TYPE (value)) != REFERENCE_TYPE - || (TREE_STATIC (init) == 0 - && (!DECL_P (init) || DECL_EXTERNAL (init) == 0))) - { - error ("field initializer is not constant"); - init = error_mark_node; - } - } - } } if (processing_template_decl && VAR_OR_FUNCTION_DECL_P (value)) Index: testsuite/g++.dg/cpp0x/auto40.C === --- testsuite/g++.dg/cpp0x/auto40.C (revision 0) +++ testsuite/g++.dg/cpp0x/auto40.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/5 +// { dg-do compile { target c++11 } } + +#include <initializer_list> + +struct A +{ + static constexpr auto b{1.0}; +}; + +constexpr decltype(A::b) A::b; Index: testsuite/g++.dg/other/warning1.C === --- testsuite/g++.dg/other/warning1.C (revision 204152) +++ testsuite/g++.dg/other/warning1.C (working copy) @@ -7,8 +7,8 @@ extern "C" int printf(const char *, ...); struct S { - static const float inf = 1.0f / 0.0f; // { dg-warning "1.0|float|initializ" } - static const float nan = 0.0f / 0.0f; // { dg-warning "0.0|float|initializ" } + static const float inf = 1.0f / 0.0f; // { dg-error "1.0|float|initializ" } + static const float nan = 0.0f / 0.0f; // { dg-error "0.0|float|initializ" } }; int main()
Re: [PATCH, ARM] Fix line number data for PIC register setup code
Reverted on the 4.7 branch. Thanks. -- Eric Botcazou
Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP
On Fri, 25 Oct 2013, Jakub Jelinek wrote: Hi! As discussed on IRC, this patch attempts to preserve VRP computed range info for some simple __builtin_unreachable () using assertions. If there are no immediate uses of some SSA_NAME except for those in a condition guarding __builtin_unreachable () and in ASSERT_EXPR in the following basic block, we can copy the range info from the assert lhs (which is lost during remove_range_assertions) and thus preserve it. Eventually we might then remove the __builtin_unreachable () in that case and still benefit from user annotating the source. In the testcase below in bar we have even without this patch the expected range info on the SSA_NAME set to x + 1, but in foo there was no range info preserved. As we have just one user of get_range_info right now, not creating a testcase. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? int foo (int x) { if (x < 26 || x > 37) __builtin_unreachable (); return x; } int bar (int x) { if (x < 26 || x > 37) __builtin_unreachable (); return x + 1; } 2013-10-25 Jakub Jelinek ja...@redhat.com * tree-vrp.c (remove_range_assertions): If ASSERT_EXPR_VAR has no other immediate uses but in the condition and ASSERT_EXPR and the other successor of the predecessor bb is __builtin_unreachable (), set_range_info of the ASSERT_EXPR_VAR to the range info of the ASSERT_EXPR's lhs. --- gcc/tree-vrp.c.jj 2013-10-24 10:19:21.0 +0200 +++ gcc/tree-vrp.c2013-10-24 14:32:29.065878208 +0200 @@ -6488,12 +6488,16 @@ remove_range_assertions (void) { basic_block bb; gimple_stmt_iterator si; + /* 1 if looking at ASSERT_EXPRs immediately at the beginning of + a basic block preceeded by GIMPLE_COND branching to it and + __builtin_trap, -1 if not yet checked, 0 otherwise. */ + int is_unreachable; /* Note that the BSI iterator bump happens at the bottom of the loop and no bump is necessary if we're removing the statement referenced by the current BSI. 
*/ FOR_EACH_BB (bb) -for (si = gsi_start_bb (bb); !gsi_end_p (si);) +for (si = gsi_after_labels (bb), is_unreachable = -1; !gsi_end_p (si);) { gimple stmt = gsi_stmt (si); gimple use_stmt; @@ -6501,30 +6505,96 @@ remove_range_assertions (void) if (is_gimple_assign (stmt) gimple_assign_rhs_code (stmt) == ASSERT_EXPR) { + tree lhs = gimple_assign_lhs (stmt); tree rhs = gimple_assign_rhs1 (stmt); tree var; tree cond = fold (ASSERT_EXPR_COND (rhs)); - use_operand_p use_p; + use_operand_p use_p, use2_p; imm_use_iterator iter; gcc_assert (cond != boolean_false_node); - /* Propagate the RHS into every use of the LHS. */ var = ASSERT_EXPR_VAR (rhs); - FOR_EACH_IMM_USE_STMT (use_stmt, iter, -gimple_assign_lhs (stmt)) + gcc_assert (TREE_CODE (var) == SSA_NAME); + + if (!POINTER_TYPE_P (TREE_TYPE (lhs)) + SSA_NAME_RANGE_INFO (lhs)) + { + if (is_unreachable == -1) + { + is_unreachable = 0; + if (single_pred_p (bb)) + { + basic_block pred_bb = single_pred (bb); + gimple last = last_stmt (pred_bb); + if (last gimple_code (last) == GIMPLE_COND) + { + basic_block other_bb + = EDGE_SUCC (pred_bb, 0)-dest; + if (other_bb == bb) + other_bb = EDGE_SUCC (pred_bb, 1)-dest; + if (EDGE_COUNT (other_bb-succs) == 0) + { + gimple_stmt_iterator gsi + = gsi_after_labels (other_bb); + if (!gsi_end_p (gsi) + gimple_call_builtin_p + (gsi_stmt (gsi), + BUILT_IN_UNREACHABLE)) + is_unreachable = 1; + } + } Please factor this out into a separate function. Like obfuscated_fallthru_edge_p () (better name?). Best somewhere in tree-cfg.c. Also you want to skip debug stmts ... (not sure if we reliably enough remove them, but I can envision a compare-debug fail with some DCE disabled at least). + } + } + /* Handle +if (x_7 = 10 x_7 20) + __builtin_unreachable (); +x_8 = ASSERT_EXPR x_7, ...; +if the only uses of x_7 are in the ASSERT_EXPR and +in the condition. In that
Re: [PATCH] Introduce [sg]et_nonzero_bits
On Fri, 25 Oct 2013, Jakub Jelinek wrote: Hi! tree-ssa-ccp.c already computes which bits are known to be zero, but we preserve that info only for pointers and not for integers. This patch changes SSA_NAME_RANGE_INFO, so we preserve that info even for integers. The bitmask is also computed from range info. There are no users of this besides ccp itself right now, but I'm working on using that info e.g. in the vectorizer. I had to tweak one testcase, because it relied on one of the two warnings being emitted during VRP1 and one during VRP2, but because of the range info derived nonzero_bits we now manage to optimize away the second test during CCP3, which doesn't have -Wtype-limits warnings. If it would be too important to keep warning even there, I guess it could check get_range_info when folding a condition to 0/1 and warn if the rhs isn't in the lhs' range. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? During the development of the patch I had there code to not only use get_range_info if otherwise returning VARYING, but also if it was already constant, but that failed bootstrap miserably: It was something along the lines of: ++/* If get_nonzero_bits tells us some bits ++ have to be zero and they aren't, something ++ went wrong. Only bits in value where val.mask ++ has zeroes (i.e. valid bit from CCP), mask has ones ++ (don't care about bits outside of precision) and ++ has zeroes in nonzero_bits (guaranteed to be zero) ++ should be verified to be zero. 
*/ ++double_int valv = tree_to_double_int (val.value); ++gcc_assert ((valv ~val.mask ++ ~nonzero_bits mask).is_zero ()); ++if (!(valv ~nonzero_bits mask).is_zero ()) ++ val.value = double_int_to_tree (TREE_TYPE (lhs), ++ valv nonzero_bits); ++if (nonzero_bits.is_zero ()) ++ val.mask = double_int_zero; ++else ++ val.mask = val.mask (nonzero_bits | ~mask); The problem was that during the iteration when only a subset of the edges were executable it sometimes failed the assertion, dunno if that is the result of the nonzero bits info in this case coming from completely different algorithm (from VRP ranges) and that simply until the CCP iterations settle it can be wrong, so just to be safe in that case I'm adding the info only in ccp_finalize (and the new nonzero_bits mask with the old one). From what I could see, the range info on that testcase looked just fine, the nonzero_bits derived from it too, if desirable, I can try to cook up a small testcase showing that (reduce from libgcov.c that exhibited that). Surely you can't rely on CCP and VRP compute exactly the same nonzero_bits. As you don't record/compute zero_bits you can't tell whether a not set bit in nonzer_bits is don't know or if it is zero. And you cannot do an assert during iteration (during iteration optimistically 'wrong' values can disappear). During CCP iteration the rule is that bits may only be added to mask (and obviously the constant itself on a CONSTANT lattice value may not change). More comments inline 2013-10-24 Jakub Jelinek ja...@redhat.com * gimple-pretty-print.c (dump_ssaname_info): Print newline also in case of VR_VARYING. Print get_nonzero_bits if not all ones. * tree-ssanames.h (struct range_info_def): Add nonzero_bits field. (set_nonzero_bits, get_nonzero_bits): New prototypes. * tree-ssa-ccp.c (get_default_value): Use get_range_info to see if a default def isn't partially constant. (ccp_finalize): If after IPA, set_range_info if integral SSA_NAME is known to be partially zero. 
(evaluate_stmt): If we'd return otherwise VARYING, use get_range_info to see if a default def isn't partially constant. * tree-ssanames.c (set_range_info): Initialize nonzero_bits upon creation of a range, if VR_RANGE, try to improve nonzero_bits from the range. (set_nonzero_bits, get_nonzero_bits): New functions. * g++.dg/warn/pr33738.C (main): Initialize a2 again to make sure we warn about it already during VRP1 pass. --- gcc/gimple-pretty-print.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/gimple-pretty-print.c 2013-10-24 17:26:59.650945232 +0200 @@ -1731,7 +1731,7 @@ dump_ssaname_info (pretty_printer *buffe if (!POINTER_TYPE_P (TREE_TYPE (node)) SSA_NAME_RANGE_INFO (node)) { - double_int min, max; + double_int min, max, nonzero_bits; value_range_type range_type = get_range_info (node, min, max); if (range_type == VR_VARYING) @@ -1744,8 +1744,20 @@ dump_ssaname_info (pretty_printer *buffe pp_printf (buffer, , ); pp_double_int (buffer, max,
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
Sorry for the delay, I finally found time to look at it. Np, thanks for helping! While your patch fixes the issue, I wonder ... potentially increase register pressure. Makes sense. I didn't take care of this because I believed that we can freely allocate vregs and rely on register allocator do it's work. So here is (completely untested) alternate fix: Nice, this seems to fix my repro cases. I'll only be able to test my larger ARM app in 1-2 days. We could wait until then or commit now. -Y
[AArch64] Fix size of memory store for the vstn_lane intrinsics
Hi, The vstn_lane_lane_type intrinsics should write (sizeof (lane_type) * n) bytes to memory. In their current form, their asm constraints suggest a write size of (sizeof (vector_type) * n). This is anywhere from 1 to 16 times too much data, can cause huge headaches with dead store elimination. This patch better models how much data we will be writing, which in turn lets us eliminate the memory clobber. Together, we avoid the problems with dead store elimination. Tested with aarch64.exp and checked the C++ neon mangling test which often breaks when you do these ugly casts. OK? Thanks, James --- gcc/ 2013-10-29 James Greenhalgh james.greenha...@arm.com * config/aarch64/arm_neon.h (__ST2_LANE_FUNC): Better model data size. (__ST3_LANE_FUNC): Likewise. (__ST4_LANE_FUNC): Likewise. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 787ff15..7a63ea1 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -14704,16 +14704,19 @@ __LD4_LANE_FUNC (uint64x2x4_t, uint64_t, 2d, d, u64, q) #define __ST2_LANE_FUNC(intype, ptrtype, regsuffix, \ lnsuffix, funcsuffix, Q) \ + typedef struct { ptrtype __x[2]; } __ST2_LANE_STRUCTURE_##intype; \ __extension__ static __inline void \ __attribute__ ((__always_inline__)) \ - vst2 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ + vst2 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr, \ intype b, const int c) \ { \ +__ST2_LANE_STRUCTURE_##intype *__p =\ +(__ST2_LANE_STRUCTURE_##intype *)ptr; \ __asm__ (ld1 {v16. #regsuffix , v17. #regsuffix }, %1\n\t \ st2 {v16. #lnsuffix , v17. 
#lnsuffix }[%2], %0\n\t \ - : =Q(*(intype *) ptr) \ + : =Q(*__p) \ : Q(b), i(c) \ - : memory, v16, v17); \ + : v16, v17); \ } __ST2_LANE_FUNC (int8x8x2_t, int8_t, 8b, b, s8,) @@ -14743,16 +14746,19 @@ __ST2_LANE_FUNC (uint64x2x2_t, uint64_t, 2d, d, u64, q) #define __ST3_LANE_FUNC(intype, ptrtype, regsuffix, \ lnsuffix, funcsuffix, Q) \ + typedef struct { ptrtype __x[3]; } __ST3_LANE_STRUCTURE_##intype; \ __extension__ static __inline void \ __attribute__ ((__always_inline__)) \ - vst3 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ + vst3 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr, \ intype b, const int c) \ { \ +__ST3_LANE_STRUCTURE_##intype *__p =\ +(__ST3_LANE_STRUCTURE_##intype *)ptr; \ __asm__ (ld1 {v16. #regsuffix - v18. #regsuffix }, %1\n\t \ st3 {v16. #lnsuffix - v18. #lnsuffix }[%2], %0\n\t \ - : =Q(*(intype *) ptr) \ + : =Q(*__p) \ : Q(b), i(c) \ - : memory, v16, v17, v18);\ + : v16, v17, v18); \ } __ST3_LANE_FUNC (int8x8x3_t, int8_t, 8b, b, s8,) @@ -14782,16 +14788,19 @@ __ST3_LANE_FUNC (uint64x2x3_t, uint64_t, 2d, d, u64, q) #define __ST4_LANE_FUNC(intype, ptrtype, regsuffix, \ lnsuffix, funcsuffix, Q) \ + typedef struct { ptrtype __x[4]; } __ST4_LANE_STRUCTURE_##intype; \ __extension__ static __inline void \ __attribute__ ((__always_inline__)) \ - vst4 ## Q ## _lane_ ## funcsuffix (const ptrtype *ptr, \ + vst4 ## Q ## _lane_ ## funcsuffix (ptrtype *ptr, \ intype b, const int c) \ { \ +__ST4_LANE_STRUCTURE_##intype *__p =\ +(__ST4_LANE_STRUCTURE_##intype *)ptr; \ __asm__ (ld1 {v16. #regsuffix - v19. #regsuffix }, %1\n\t \ st4 {v16. #lnsuffix - v19. #lnsuffix }[%2], %0\n\t \ - : =Q(*(intype *) ptr) \ + : =Q(*__p) \ : Q(b), i(c) \ - : memory, v16, v17, v18, v19); \ + : v16, v17, v18, v19);\ } __ST4_LANE_FUNC (int8x8x4_t, int8_t, 8b, b, s8,)
Re: [PATCH] Use get_nonzero_bits to improve vectorization
On Fri, 25 Oct 2013, Jakub Jelinek wrote: Hi! The following patch makes use of the computed nonzero_bits preserved in the SSA_NAME_RANGE_INFO. I chose to write a new routine instead of improving current highest_pow2_factor, because that routine didn't care about overflows etc. and by working on ctz numbers instead of powers of two in UHWI we can handle even larger constants etc. highest_pow2_factor could very well overflow to zero etc. So, the patch introduces a new tree_ctz routine and reimplements highest_pow2_factor on top of that, plus uses tree_ctz also in get_object_alignment_2 and in the vectorizer to determine if it can avoid scalar loop for bound (and indirectly also in the analysis whether peeling for alignment is needed). With this patch, e.g. int a[1024]; void foo (int x, int y) { int i; x &= -32; y &= -32; for (i = x + 32; i < y; i++) a[i]++; } can be vectorized without any peeling for alignment or scalar loop afterwards. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2013-10-25 Jakub Jelinek ja...@redhat.com * tree.c (tree_ctz): New function. * tree.h (tree_ctz): New prototype. * tree-ssanames.h (get_range_info, get_nonzero_bits): Change first argument from tree to const_tree. * tree-ssanames.c (get_range_info, get_nonzero_bits): Likewise. * tree-vectorizer.h (vect_generate_tmps_on_preheader): New prototype. * tree-vect-loop-manip.c (vect_generate_tmps_on_preheader): No longer static. * expr.c (highest_pow2_factor): Reimplemented using tree_ctz. * tree-vect-loop.c (vect_analyze_loop_operations, vect_transform_loop): Don't force scalar loop for bound just because number of iterations is unknown, only do it if it is not known to be a multiple of vectorization_factor. * builtins.c (get_object_alignment_2): Use tree_ctz on offset. 
--- gcc/tree.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/tree.c2013-10-25 15:00:55.296178794 +0200 @@ -2213,6 +2213,110 @@ tree_floor_log2 (const_tree expr) : floor_log2 (low)); } +/* Return number of known trailing zero bits in EXPR, or, if the value of + EXPR is known to be zero, the precision of it's type. */ + +int unsigned int? +tree_ctz (const_tree expr) +{ + if (!INTEGRAL_TYPE_P (TREE_TYPE (expr)) + !POINTER_TYPE_P (TREE_TYPE (expr))) +return 0; + + int ret1, ret2, prec = TYPE_PRECISION (TREE_TYPE (expr)); + switch (TREE_CODE (expr)) +{ +case INTEGER_CST: + ret1 = tree_to_double_int (expr).trailing_zeros (); + return MIN (ret1, prec); +case SSA_NAME: + ret1 = get_nonzero_bits (expr).trailing_zeros (); + return MIN (ret1, prec); +case PLUS_EXPR: +case MINUS_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: +case MIN_EXPR: +case MAX_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); This first recurses but if either one returns 0 you don't have to (that cuts down the recursion from exponential to linear in the common case?). Thus, early out on ret == 0? + return MIN (ret1, ret2); +case POINTER_PLUS_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + ret2 = MIN (ret2, prec); Why do you need that here but not elsewhere when processing binary ops? + return MIN (ret1, ret2); +case BIT_AND_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MAX (ret1, ret2); +case MULT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MIN (ret1 + ret2, prec); +case LSHIFT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (host_integerp (TREE_OPERAND (expr, 1), 1) check that first before recursing for op0 - if op1 is negative you simply return ret1 which looks wrong, too. 
+((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1) +(unsigned HOST_WIDE_INT) prec)) This check is to avoid overflowing ret1 + ret2? If so, why not add + { + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1); ret2 = MIN (ret2, prec); instead? + return MIN (ret1 + ret2, prec); + } + return ret1; +case RSHIFT_EXPR: + if (host_integerp (TREE_OPERAND (expr, 1), 1) +((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1) +(unsigned HOST_WIDE_INT) prec)) + { + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1); + if (ret1 ret2) + return ret1 - ret2; + } + return 0; Seems to be slightly better structured. Looks like you assume only positive shift amounts exist in the LSHIFT_EXPR case, I'm not sure that's a valid
Re: [PATCH] fixing typo in expr.c to allow proper recognition of complex addresses in some arches.
My reading of memory_address_addr_space is that MODE is the mode of the memory reference, not the mode of the address. I fail to see how passing in the mode of the address in the first call can be correct. What am I missing? Nothing, it's me confusing rtx memory_address_addr_space (enum machine_mode, rtx, addr_space_t); with rtx convert_memory_address_addr_space (enum machine_mode, rtx, addr_space_t); for a long time apparently... Sorry about that. In the meantime, I installed the obvious patch to eliminate the small redundancy. Tested on x86_64/Linux. 2013-10-29 Eric Botcazou ebotca...@adacore.com * expr.c (expand_expr_real_1) MEM_EXPR: Eliminate small redundancy. -- Eric BotcazouIndex: expr.c === --- expr.c (revision 204106) +++ expr.c (working copy) @@ -9648,8 +9648,8 @@ expand_expr_real_1 (tree exp, rtx target rtx off = immed_double_int_const (mem_ref_offset (exp), address_mode); op0 = simplify_gen_binary (PLUS, address_mode, op0, off); + op0 = memory_address_addr_space (mode, op0, as); } - op0 = memory_address_addr_space (mode, op0, as); temp = gen_rtx_MEM (mode, op0); set_mem_attributes (temp, exp, 0); set_mem_addr_space (temp, as);
Re: libsanitizer merge from upstream r191666
On Wed, Oct 2, 2013 at 12:51 PM, Konstantin Serebryany 2013-10-XX Kostya Serebryany k...@google.com * g++.dg/asan/asan_test.cc: Update the test to match the fresh asan run-time. * c-c++-common/asan/stack-overflow-1.c: Ditto. === gcc/ChangeLog 2013-10-XX Kostya Serebryany k...@google.com * asan.c: Update to match the changed asan API. (asan_emit_stack_protection): update the string stored in the stack red zone to match new API. Store the PC of the current function in the red zone. (asan_global_struct): update the __asan_global definition to match the new API. (asan_add_global): Ditto. * sanitizer.def: rename __asan_init_v1 to __asan_init_v3 The Update to match the changed asan API. should either be dropped, or come on a line before the * asan.c (asan_emit_stack_protection): line. All descriptions should start with capital letters, end with ., two spaces after . if followed by another sentence. Besides that, here is (completely untested) attempt to give you the pc of the first instruction of the function and two minor changes (pp_string (something, ) is useless and in two spots I've noticed you didn't add space before ( in function call). Finally, if the new libasan is ABI incompatible with the old one, which seems it is, then libsanitizer/asan/libtool-version (and perhaps also libsanitizer/tsan/libtool-version, haven't looked if that one is ABI compatible or not) needs to be bumped (to 1:0:0 ?). --- gcc/asan.c.jj 2013-10-29 11:58:30.0 +0100 +++ gcc/asan.c 2013-10-29 13:04:07.709667677 +0100 @@ -921,6 +921,15 @@ asan_clear_shadow (rtx shadow_mem, HOST_ add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 80 / 100); } +void +asan_function_start (void) +{ + section *fnsec = function_section (current_function_decl); + switch_to_section (fnsec); + ASM_OUTPUT_DEBUG_LABEL (asm_out_file, LASANPC, + current_function_funcdef_no); +} + /* Insert code to protect stack vars. The prologue sequence should be emitted directly, epilogue sequence returned. 
BASE is the register holding the stack base, against which OFFSETS array offsets are relative to, OFFSETS @@ -936,12 +945,13 @@ asan_emit_stack_protection (rtx base, HO int length) { rtx shadow_base, shadow_mem, ret, mem; + char buf[30]; unsigned char shadow_bytes[4]; HOST_WIDE_INT base_offset = offsets[length - 1], offset, prev_offset; HOST_WIDE_INT last_offset, last_size; int l; unsigned char cur_shadow_byte = ASAN_STACK_MAGIC_LEFT; - tree str_cst; + tree str_cst, decl; if (shadow_ptr_types[0] == NULL_TREE) asan_init_shadow_ptr_types (); @@ -949,7 +959,6 @@ asan_emit_stack_protection (rtx base, HO /* First of all, prepare the description string. */ pretty_printer asan_pp; - pp_string (asan_pp, ); pp_decimal_int (asan_pp, length / 2 - 1); pp_space (asan_pp); for (l = length - 2; l; l -= 2) @@ -980,7 +989,17 @@ asan_emit_stack_protection (rtx base, HO mem = adjust_address (mem, VOIDmode, GET_MODE_SIZE (ptr_mode)); emit_move_insn (mem, expand_normal (str_cst)); mem = adjust_address (mem, VOIDmode, GET_MODE_SIZE (ptr_mode)); - emit_move_insn (mem, expand_normal (str_cst)); // FIXME: should be cur_pc. 
+ ASM_GENERATE_INTERNAL_LABEL (buf, LASANPC, current_function_funcdef_no); + decl = build_decl (DECL_SOURCE_LOCATION (current_function_decl), +VAR_DECL, get_identifier (buf), char_type_node); + TREE_ADDRESSABLE (decl) = 1; + TREE_READONLY (decl) = 1; + DECL_ARTIFICIAL (decl) = 1; + DECL_IGNORED_P (decl) = 1; + TREE_STATIC (decl) = 1; + TREE_PUBLIC (decl) = 0; + TREE_USED (decl) = 1; + emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl))); shadow_base = expand_binop (Pmode, lshr_optab, base, GEN_INT (ASAN_SHADOW_SHIFT), NULL_RTX, 1, OPTAB_DIRECT); @@ -1979,8 +1998,8 @@ asan_add_global (tree decl, tree type, v pp_string (asan_pp, unknown); str_cst = asan_pp_string (asan_pp); - pp_string(module_name_pp, main_input_filename); - module_name_cst = asan_pp_string(module_name_pp); + pp_string (module_name_pp, main_input_filename); + module_name_cst = asan_pp_string (module_name_pp); if (asan_needs_local_alias (decl)) { --- gcc/asan.h.jj 2013-01-11 09:02:50.0 +0100 +++ gcc/asan.h 2013-10-29 12:37:54.190798947 +0100 @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. #ifndef TREE_ASAN #define TREE_ASAN +extern void asan_function_start (void); extern void asan_finish_file (void); extern rtx asan_emit_stack_protection (rtx, HOST_WIDE_INT *, tree *, int); extern bool asan_protect_global (tree); --- gcc/final.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/final.c 2013-10-29 12:49:33.609176613 +0100 @@ -78,6 +78,7 @@ along with GCC; see the
[PATCH][ARM] New rtx cost table for Cortex-A7
Hi all, This patch adds the new rtx costs for the Cortex-A7 core as well as a new tuning structure to contain it. Tested arm-none-eabi on qemu and no benchmark regressions. Ok for trunk? Thanks, Kyrill [gcc/] 2013-10-29 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm.c (cortexa7_extra_costs): New table. (arm_cortex_a7_tune): New. * config/arm/arm-cores.def: Use cortex_a7 tuning for cortex-a7.diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 17c9bf3..79e2e87 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -125,7 +125,7 @@ ARM_CORE(arm1156t2-s, arm1156t2s, 6T2, FL_LDSCHED, v6t2) ARM_CORE(arm1156t2f-s, arm1156t2fs, 6T2, FL_LDSCHED | FL_VFPV2, v6t2) ARM_CORE(generic-armv7-a, genericv7a, 7A, FL_LDSCHED, cortex) ARM_CORE(cortex-a5, cortexa5, 7A, FL_LDSCHED, cortex_a5) -ARM_CORE(cortex-a7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex) +ARM_CORE(cortex-a7, cortexa7, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a7) ARM_CORE(cortex-a8, cortexa8, 7A, FL_LDSCHED, cortex) ARM_CORE(cortex-a9, cortexa9, 7A, FL_LDSCHED, cortex_a9) ARM_CORE(cortex-a15, cortexa15, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 4cdac60..768bc21 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1050,6 +1050,107 @@ const struct cpu_cost_table generic_extra_costs = } }; +const struct cpu_cost_table cortexa7_extra_costs = +{ + /* ALU */ + { +0, /* Arith. */ +0, /* Logical. */ +COSTS_N_INSNS (1), /* Shift. */ +COSTS_N_INSNS (1), /* Shift_reg. */ +COSTS_N_INSNS (1), /* Arith_shift. */ +COSTS_N_INSNS (1), /* Arith_shift_reg. */ +COSTS_N_INSNS (1), /* Log_shift. */ +COSTS_N_INSNS (1), /* Log_shift_reg. */ +COSTS_N_INSNS (1), /* Extend. */ +COSTS_N_INSNS (1), /* Extend_arith. */ +COSTS_N_INSNS (1), /* Bfi. */ +COSTS_N_INSNS (1), /* Bfx. */ +COSTS_N_INSNS (1), /* Clz. */ +0, /* non_exec. */ +true /* non_exec_costs_exec. 
*/ + }, + + { +/* MULT SImode */ +{ + 0, /* Simple. */ + COSTS_N_INSNS (1), /* Flag_setting. */ + COSTS_N_INSNS (1), /* Extend. */ + COSTS_N_INSNS (1), /* Add. */ + COSTS_N_INSNS (1), /* Extend_add. */ + COSTS_N_INSNS (7) /* Idiv. */ +}, +/* MULT DImode */ +{ + 0, /* Simple (N/A). */ + 0, /* Flag_setting (N/A). */ + COSTS_N_INSNS (1), /* Extend. */ + 0, /* Add. */ + COSTS_N_INSNS (2), /* Extend_add. */ + 0/* Idiv (N/A). */ +} + }, + /* LD/ST */ + { +COSTS_N_INSNS (1), /* Load. */ +COSTS_N_INSNS (1), /* Load_sign_extend. */ +COSTS_N_INSNS (3), /* Ldrd. */ +COSTS_N_INSNS (1), /* Ldm_1st. */ +1, /* Ldm_regs_per_insn_1st. */ +2, /* Ldm_regs_per_insn_subsequent. */ +COSTS_N_INSNS (2), /* Loadf. */ +COSTS_N_INSNS (2), /* Loadd. */ +COSTS_N_INSNS (1), /* Load_unaligned. */ +COSTS_N_INSNS (1), /* Store. */ +COSTS_N_INSNS (3), /* Strd. */ +COSTS_N_INSNS (1), /* Stm_1st. */ +1, /* Stm_regs_per_insn_1st. */ +2, /* Stm_regs_per_insn_subsequent. */ +COSTS_N_INSNS (2), /* Storef. */ +COSTS_N_INSNS (2), /* Stored. */ +COSTS_N_INSNS (1) /* Store_unaligned. */ + }, + { +/* FP SFmode */ +{ + COSTS_N_INSNS (15), /* Div. */ + COSTS_N_INSNS (3), /* Mult. */ + COSTS_N_INSNS (7), /* Mult_addsub. */ + COSTS_N_INSNS (7), /* Fma. */ + COSTS_N_INSNS (3), /* Addsub. */ + COSTS_N_INSNS (3), /* Fpconst. */ + COSTS_N_INSNS (3), /* Neg. */ + COSTS_N_INSNS (3), /* Compare. */ + COSTS_N_INSNS (3), /* Widen. */ + COSTS_N_INSNS (3), /* Narrow. */ + COSTS_N_INSNS (3), /* Toint. */ + COSTS_N_INSNS (3), /* Fromint. */ + COSTS_N_INSNS (3) /* Roundint. */ +}, +/* FP DFmode */ +{ + COSTS_N_INSNS (30), /* Div. */ + COSTS_N_INSNS (6), /* Mult. */ + COSTS_N_INSNS (10), /* Mult_addsub. */ + COSTS_N_INSNS (7), /* Fma. */ + COSTS_N_INSNS (3), /* Addsub. */ + COSTS_N_INSNS (3), /* Fpconst. */ + COSTS_N_INSNS (3), /* Neg. */ + COSTS_N_INSNS (3), /* Compare. */ + COSTS_N_INSNS (3), /* Widen. */ + COSTS_N_INSNS (3), /* Narrow. */ + COSTS_N_INSNS (3), /* Toint. */ + COSTS_N_INSNS (3), /* Fromint. 
*/ + COSTS_N_INSNS (3) /* Roundint. */ +} + }, + /* Vector */ + { +COSTS_N_INSNS (1) /* Alu. */ + } +}; + const struct cpu_cost_table cortexa15_extra_costs = { /* ALU */ @@ -1266,6 +1367,22 @@ const struct tune_params arm_cortex_tune = false /* Prefer Neon for 64-bits bitops. */ }; +const struct tune_params arm_cortex_a7_tune = +{ + arm_9e_rtx_costs, + cortexa7_extra_costs, + NULL, + 1, /* Constant
Re: [PATCH][ARM] New rtx cost table for Cortex-A7
On 10/29/13 12:15, Kyrill Tkachov wrote: Hi all, This patch adds the new rtx costs for the Cortex-A7 core as well as a new tuning structure to contain it. Tested arm-none-eabi on qemu and no benchmark regressions. Ok for trunk? Ok. Ramana
Re: [wide-int] Treat order comparisons like other binary ops
On Sun, 27 Oct 2013, Richard Sandiford wrote: Until now, eq_p and ne_p have enforced the same argument rules as things like addition, while order comparisons like lts_p have treated the two arguments as independent and signed. Richard, I think you said on IRC that you thought lts_p should behave like the others. E.g. lts_p on two trees or two rtxes should make sure that the precisions are the same. Yes. This patch does that. I think all uses of INT_CST_LT and INT_CST_LT_UNSIGNED are really comparing to infinite precision, and the UNSIGNED distinction is only there because double_int isn't wide enough to be effectively infinite. Is that right? Yes, it's only because of the case of a set MSB on the double-int. Since that isn't a problem with widest_int, the patch gets rid of INT_CST_LT_UNSIGNED and only uses INT_CST_LT. The c-lex.c change includes a generic change that I just posted for trunk. I rejigged the order in tree.h slightly so that it matches trunk. Tested on powerpc64-linux-gnu and x86_64-linux-gnu. OK for wide-int? Ok. Thanks, Richard. Thanks, Richard

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	2013-10-27 14:11:53.006510519 +
+++ gcc/c-family/c-common.c	2013-10-27 14:19:27.667578745 +
@@ -4101,20 +4101,10 @@ shorten_compare (tree *op0_ptr, tree *op
       maxval = convert (*restype_ptr, maxval);
     }

-  if (unsignedp && unsignedp0)
-    {
-      min_gt = INT_CST_LT_UNSIGNED (primop1, minval);
-      max_gt = INT_CST_LT_UNSIGNED (primop1, maxval);
-      min_lt = INT_CST_LT_UNSIGNED (minval, primop1);
-      max_lt = INT_CST_LT_UNSIGNED (maxval, primop1);
-    }
-  else
-    {
-      min_gt = INT_CST_LT (primop1, minval);
-      max_gt = INT_CST_LT (primop1, maxval);
-      min_lt = INT_CST_LT (minval, primop1);
-      max_lt = INT_CST_LT (maxval, primop1);
-    }
+  min_gt = INT_CST_LT (primop1, minval);
+  max_gt = INT_CST_LT (primop1, maxval);
+  min_lt = INT_CST_LT (minval, primop1);
+  max_lt = INT_CST_LT (maxval, primop1);

   val = 0;
   /* This used to be a switch, but Genix compiler can't handle that.  */
Index: gcc/c-family/c-lex.c
===
--- gcc/c-family/c-lex.c	2013-10-27 14:11:53.006510519 +
+++ gcc/c-family/c-lex.c	2013-10-27 14:19:27.664578718 +
@@ -48,9 +48,9 @@ static tree interpret_float (const cpp_t
 					enum overflow_type *);
 static tree interpret_fixed (const cpp_token *, unsigned int);
 static enum integer_type_kind narrowest_unsigned_type
-	(const wide_int &, unsigned int);
+	(const widest_int &, unsigned int);
 static enum integer_type_kind narrowest_signed_type
-	(const wide_int &, unsigned int);
+	(const widest_int &, unsigned int);
 static enum cpp_ttype lex_string (const cpp_token *, tree *, bool, bool);
 static tree lex_charconst (const cpp_token *);
 static void update_header_times (const char *);
@@ -526,7 +526,7 @@ c_lex_with_flags (tree *value, location_
    there isn't one.  */

 static enum integer_type_kind
-narrowest_unsigned_type (const wide_int &val, unsigned int flags)
+narrowest_unsigned_type (const widest_int &val, unsigned int flags)
 {
   int itk;
@@ -545,7 +545,7 @@ narrowest_unsigned_type (const wide_int
 	continue;
       upper = TYPE_MAX_VALUE (integer_types[itk]);
-      if (wi::geu_p (upper, val))
+      if (wi::geu_p (wi::to_widest (upper), val))
 	return (enum integer_type_kind) itk;
     }
@@ -554,7 +554,7 @@ narrowest_unsigned_type (const wide_int

 /* Ditto, but narrowest signed type.  */

 static enum integer_type_kind
-narrowest_signed_type (const wide_int &val, unsigned int flags)
+narrowest_signed_type (const widest_int &val, unsigned int flags)
 {
   int itk;
@@ -573,7 +573,7 @@ narrowest_signed_type (const wide_int &v
 	continue;
       upper = TYPE_MAX_VALUE (integer_types[itk]);
-      if (wi::geu_p (upper, val))
+      if (wi::geu_p (wi::to_widest (upper), val))
 	return (enum integer_type_kind) itk;
     }
@@ -588,20 +588,18 @@ interpret_integer (const cpp_token *toke
   tree value, type;
   enum integer_type_kind itk;
   cpp_num integer;
-  cpp_options *options = cpp_get_options (parse_in);
-  HOST_WIDE_INT ival[2];
-  wide_int wval;
+  HOST_WIDE_INT ival[3];

   *overflow = OT_NONE;

   integer = cpp_interpret_integer (parse_in, token, flags);
-  integer = cpp_num_sign_extend (integer, options->precision);
   if (integer.overflow)
     *overflow = OT_OVERFLOW;

   ival[0] = integer.low;
   ival[1] = integer.high;
-  wval = wide_int::from_array (ival, 2, HOST_BITS_PER_WIDE_INT * 2);
+  ival[2] = 0;
+  widest_int wval = widest_int::from_array (ival, 3);

   /* The type of a constant with a U suffix is
Re: [wide-int] More optimisations
On Sun, 27 Oct 2013, Richard Sandiford wrote: This patch adds some more optimisations to the wi:: comparison functions. It uses the:

#define CONSTANT(X) (__builtin_constant_p (X) && (X))

idiom that was mentioned before, except that I thought CONSTANT would be too easily confused with CONSTANT_P, so I went for CAN_TELL instead. Better names welcome. STATIC_CONSTANT_P similar to static_assert? The changes are: - Add a fast path to eq_p for when one of the inputs isn't sign-extended. This includes code to handle compile-time 0 specially. - Add the opposite optimisation to Mike's lts_p change, if we can tell at compile time that it applies. I think the cases Mike added should only be enabled when we can figure them out at compile-time, too. Ok with that change. Thanks, Richard. - Add fast paths to ltu_p for constants. E.g.:

bool f1 (const_tree x) { return wi::eq_p (x, 0); }

now gives:

	xorl	%eax, %eax
	cmpw	$1, 4(%rdi)
	je	.L5
	rep ret
	.p2align 4,,10
	.p2align 3
.L5:
	cmpq	$0, 16(%rdi)
	sete	%al
	ret

bool f2 (const_tree x, HOST_WIDE_INT y) { return wi::eq_p (x, y); }

gives:

	movq	8(%rdi), %rax
	movzwl	52(%rax), %edx
	xorl	%eax, %eax
	andw	$1023, %dx
	cmpw	$1, 4(%rdi)
	je	.L10
	rep ret
	.p2align 4,,10
	.p2align 3
.L10:
	xorq	16(%rdi), %rsi
	movzwl	%dx, %edx
	movl	$64, %ecx
	subl	%edx, %ecx
	movq	%rsi, %rax
	salq	%cl, %rax
	testl	%ecx, %ecx
	cmovg	%rax, %rsi
	testq	%rsi, %rsi
	sete	%al
	ret

bool f3 (HOST_WIDE_INT x, const_tree y) { return wi::lts_p (x, y); }

is similarly ugly because of the way that it ignores TYPE_SIGN and so has to explicitly sign-extend small-prec cases:

	movq	8(%rsi), %rax
	movzwl	4(%rsi), %ecx
	movzwl	52(%rax), %edx
	andl	$1023, %edx
	cmpl	$1, %ecx
	je	.L16
	leal	-1(%rcx), %eax
	sall	$6, %ecx
	subl	%edx, %ecx
	movq	16(%rsi,%rax,8), %rax
	movq	%rax, %rdx
	salq	%cl, %rdx
	testl	%ecx, %ecx
	cmovg	%rdx, %rax
	sarq	$63, %rax
	addl	$1, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L16:
	cmpl	$63, %edx
	movq	16(%rsi), %rax
	ja	.L13
	movb	$64, %cl
	subl	%edx, %ecx
	salq	%cl, %rax
	sarq	%cl, %rax
.L13:
	cmpq	%rdi, %rax
	setg	%al
	ret

but: bool f4 (HOST_WIDE_INT x, const_tree y) { return wi::lts_p (x, wi::to_widest (y)); }

is a bit more respectable:

	movzwl	6(%rsi), %eax
	cmpl	$1, %eax
	je	.L20
	subl	$1, %eax
	movq	16(%rsi,%rax,8), %rax
	sarq	$63, %rax
	addl	$1, %eax
	ret
	.p2align 4,,10
	.p2align 3
.L20:
	cmpq	%rdi, 16(%rsi)
	setg	%al
	ret

For similar reasons: bool f5 (const_tree x) { return wi::ltu_p (x, 100); }

gives:

	movq	8(%rdi), %rax
	movzwl	52(%rax), %ecx
	xorl	%eax, %eax
	andw	$1023, %cx
	cmpw	$1, 4(%rdi)
	je	.L26
	rep ret
	.p2align 4,,10
	.p2align 3
.L26:
	cmpw	$63, %cx
	ja	.L23
	movl	$1, %eax
	salq	%cl, %rax
	subq	$1, %rax
	andq	16(%rdi), %rax
.L24:
	cmpq	$99, %rax
	setbe	%al
	ret
	.p2align 4,,10
	.p2align 3
.L23:
	movq	16(%rdi), %rax
	jmp	.L24

but: bool f6 (const_tree x) { return wi::ltu_p (wi::to_widest (x), 100); }

gives:

	xorl	%eax, %eax
	cmpw	$1, 6(%rdi)
	je	.L30
	rep ret
	.p2align 4,,10
	.p2align 3
.L30:
	cmpq	$99, 16(%rdi)
	setbe	%al
	ret

Tested on powerpc64-linux-gnu and x86_64-linux-gnu. OK for wide-int? Thanks, Richard

Index: gcc/system.h
===
--- gcc/system.h	2013-10-27 14:25:19.144723977 +
+++ gcc/system.h	2013-10-27 14:25:20.716738045 +
@@ -711,6 +711,12 @@ #define gcc_unreachable() __builtin_unre
 #define gcc_unreachable() (fancy_abort (__FILE__, __LINE__, __FUNCTION__))
 #endif

+#if GCC_VERSION >= 3001
+#define CAN_TELL(X) (__builtin_constant_p (X) && (X))
+#else
+#define CAN_TELL(X) (false && (X))
+#endif
+
 /* Until we can use STATIC_ASSERT.  */
 #define STATIC_ASSERT(X) \
   typedef int assertion1[(X) ? 1 : -1] ATTRIBUTE_UNUSED
Index: gcc/wide-int.h
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
Jakub Jelinek ja...@redhat.com writes: On Wed, Oct 16, 2013 at 09:35:21AM +0400, Yury Gribov wrote: I've recently submitted a bug report regarding invalid unpoisoning of stack frame redzones (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58543). Could someone take a look at the proposed patch (a simple one-liner) and check whether it's ok for commit? Can you please be more verbose? Do you think the proposed patch is ok or should I provide some additional info? Jakub, Dodji, Any updates on this one? Note that this bug is a huge blocker for using AddressSanitizer on ARM platforms. Sorry for the delay, I finally found time to look at it. While your patch fixes the issue, I wonder if for the case where shadow_mem before the asan_clear_shadow call is already of the form (mem (reg)) it isn't better to just adjust the next asan_clear_shadow base from the end of the cleared region rather than from the start of it, because the end will already be in the right pseudo, while with your approach it needs to be done in a different register and potentially increase register pressure. So here is (completely untested) alternate fix: Hmm, it just seems wrong to be assigning to registers returned by force_reg and expand_binop though. Would this variation be OK? Thanks, Richard

Index: gcc/asan.c
===
--- gcc/asan.c	(revision 204124)
+++ gcc/asan.c	(working copy)
@@ -876,9 +876,10 @@
 }

 /* Clear shadow memory at SHADOW_MEM, LEN bytes.  Can't call a library call here
-   though.  */
+   though.  If the address of the next byte is in a known register, return
+   that register, otherwise return null.  */

-static void
+static rtx
 asan_clear_shadow (rtx shadow_mem, HOST_WIDE_INT len)
 {
   rtx insn, insns, top_label, end, addr, tmp, jump;
@@ -893,12 +894,12 @@
   if (insn == NULL_RTX)
     {
       emit_insn (insns);
-      return;
+      return 0;
     }

   gcc_assert ((len & 3) == 0);
   top_label = gen_label_rtx ();
-  addr = force_reg (Pmode, XEXP (shadow_mem, 0));
+  addr = copy_to_mode_reg (Pmode, XEXP (shadow_mem, 0));
   shadow_mem = adjust_automodify_address (shadow_mem, SImode, addr, 0);
   end = force_reg (Pmode, plus_constant (Pmode, addr, len));
   emit_label (top_label);
@@ -912,6 +913,7 @@
   jump = get_last_insn ();
   gcc_assert (JUMP_P (jump));
   add_int_reg_note (jump, REG_BR_PROB, REG_BR_PROB_BASE * 80 / 100);
+  return addr;
 }

 /* Insert code to protect stack vars.  The prologue sequence should be emitted
@@ -1048,7 +1050,15 @@
 		     (last_offset - prev_offset) >> ASAN_SHADOW_SHIFT);
 	  prev_offset = last_offset;
-	  asan_clear_shadow (shadow_mem, last_size >> ASAN_SHADOW_SHIFT);
+	  rtx end_addr = asan_clear_shadow (shadow_mem,
+					    last_size >> ASAN_SHADOW_SHIFT);
+	  if (end_addr)
+	    {
+	      shadow_mem
+		= adjust_automodify_address (shadow_mem, VOIDmode, end_addr,
+					     last_size >> ASAN_SHADOW_SHIFT);
+	      prev_offset += last_size;
+	    }
 	  last_offset = offset;
 	  last_size = 0;
 	}
RE: [PING] [AArch64] Peepholes to generate ldp and stp instructions
Hi, You are better off CCing the maintainers for such reviews. Let me do that for you. I cannot approve or reject this patch but I have a few comments as below. Thanks for the quick review and comments. Please find attached the modified patch as per the review comments. Please review the same and let me know if it's okay. Built and tested on aarch64-thunder-elf (using Cavium's internal simulator). No new regressions.

2013-10-29 Naveen H.S naveen.hurugalaw...@caviumnetworks.com

gcc/
* config/aarch64/aarch64.md (peephole2 to generate ldp instruction for 2 consecutive loads from memory): New. (peephole2 to generate stp instruction for 2 consecutive stores to memory in integer mode): New. (peephole2 to generate ldp instruction for 2 consecutive loads from memory in floating point mode): New. (peephole2 to generate stp instruction for 2 consecutive stores to memory in floating point mode): New.

gcc/testsuite
* gcc.target/aarch64/ldp-stp.c: New testcase.

Thanks, Naveen

--- gcc/config/aarch64/aarch64.md	2013-10-28 17:15:52.363975264 +0530
+++ gcc/config/aarch64/aarch64.md	2013-10-29 17:40:48.516129561 +0530
@@ -1068,6 +1068,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPI 0 "register_operand")
+	(match_operand:GPI 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPI 2 "register_operand")
+	(match_operand:GPI 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[1], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[1], 0), 1))
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && GP_REGNUM_P (REGNO (operands[0])) && GP_REGNUM_P (REGNO (operands[2]))
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1085,6 +1106,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPI 0 "aarch64_mem_pair_operand")
+	(match_operand:GPI 1 "register_operand"))
+   (set (match_operand:GPI 2 "memory_operand")
+	(match_operand:GPI 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[0], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[0], 0), 1))
+   && GET_MODE (operands[1]) == GET_MODE (XEXP (XEXP (operands[0], 0), 0))
+   && REGNO (operands[1]) != REGNO (operands[3])
+   && GP_REGNUM_P (REGNO (operands[1])) && GP_REGNUM_P (REGNO (operands[3]))
+   && REGNO_REG_CLASS (REGNO (operands[1]))
+      == REGNO_REG_CLASS (REGNO (operands[3]))
+   && rtx_equal_p (XEXP (operands[2], 0),
+		   plus_constant (Pmode, XEXP (operands[0], 0),
+				  GET_MODE_SIZE (<MODE>mode)))"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 1 and 3 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "load_pair<mode>"
@@ -1102,6 +1144,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPF 0 "register_operand")
+	(match_operand:GPF 1 "aarch64_mem_pair_operand"))
+   (set (match_operand:GPF 2 "register_operand")
+	(match_operand:GPF 3 "memory_operand"))]
+  "GET_CODE (operands[1]) == MEM
+   && GET_CODE (XEXP (operands[1], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[1], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[1], 0), 1))
+   && GET_MODE (operands[0]) == GET_MODE (XEXP (XEXP (operands[1], 0), 0))
+   && REGNO (operands[0]) != REGNO (operands[2])
+   && FP_REGNUM_P (REGNO (operands[0])) && FP_REGNUM_P (REGNO (operands[2]))
+   && REGNO_REG_CLASS (REGNO (operands[0]))
+      == REGNO_REG_CLASS (REGNO (operands[2]))
+   && rtx_equal_p (XEXP (operands[3], 0),
+		   plus_constant (Pmode, XEXP (operands[1], 0),
+				  GET_MODE_SIZE (<MODE>mode)))"
+  [(parallel [(set (match_dup 0) (match_dup 1))
+	      (set (match_dup 2) (match_dup 3))])]
+)
+
 ;; Operands 0 and 2 are tied together by the final condition; so we allow
 ;; fairly lax checking on the second memory operation.
 (define_insn "store_pair<mode>"
@@ -1119,6 +1182,27 @@
    (set_attr "mode" "<MODE>")]
 )

+(define_peephole2
+  [(set (match_operand:GPF 0 "aarch64_mem_pair_operand")
+	(match_operand:GPF 1 "register_operand"))
+   (set (match_operand:GPF 2 "memory_operand")
+	(match_operand:GPF 3 "register_operand"))]
+  "GET_CODE (operands[0]) == MEM
+   && GET_CODE (XEXP (operands[0], 0)) == PLUS
+   && REG_P (XEXP (XEXP (operands[0], 0), 0))
+   && CONST_INT_P (XEXP (XEXP (operands[0], 0), 1))
+   && GET_MODE (operands[1]) ==
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
On Tue, Oct 29, 2013 at 12:25:33PM +, Richard Sandiford wrote: Any updates on this one? Note that this bug is a huge blocker for using AddressSanitizer on ARM platforms. Sorry for the delay, I finally found time to look at it. While your patch fixes the issue, I wonder if for the case where shadow_mem before the asan_clear_shadow call is already of the form (mem (reg)) it isn't better to just adjust the next asan_clear_shadow base from the end of the cleared region rather than from the start of it, because the end will already be in the right pseudo, while with your approach it needs to be done in a different register and potentially increase register pressure. So here is (completely untested) alternate fix: Hmm, it just seems wrong to be assigning to registers returned by force_reg and expand_binop though. Would this variation be OK? Why? If it is a pseudo, it is certainly a pseudo that isn't used for anything else, as it is the result of (base >> 3) + constant; if it isn't a pseudo, then supposedly it is better not to just keep adding the offsets to a base and thus not to use the end address from asan_clear_shadow. Your patch would force using the end address even if shadow_base is initially, say, some_reg + 32; I think it is better to use a shadow_mem of some_reg + 72 next time instead of some_other_reg + 40. Jakub
Re: [wide-int] More optimisations
Richard Biener rguent...@suse.de writes: On Sun, 27 Oct 2013, Richard Sandiford wrote: This patch adds some more optimisations to the wi:: comparison functions. It uses the: #define CONSTANT(X) (__builtin_constant_p (X) && (X)) idiom that was mentioned before, except that I thought CONSTANT would be too easily confused with CONSTANT_P, so I went for CAN_TELL instead. Better names welcome. STATIC_CONSTANT_P similar to static_assert? Sounds good. The changes are: - Add a fast path to eq_p for when one of the inputs isn't sign-extended. This includes code to handle compile-time 0 specially. - Add the opposite optimisation to Mike's lts_p change, if we can tell at compile time that it applies. I think the cases Mike added should only be enabled when we can figure them out at compile-time, too. Well, the idea with most of these functions was to handle the simple "everything is one HWI" cases inline. For operations like addition this needs to be based on the precision, since that's the only cheap way of knowing whether the result is also a single HWI. But for logic operations like AND and OR, it can be based on whether the inputs are single HWIs, since single-HWI inputs give a single-HWI output. This catches more cases. lts_p is a bit like AND and OR in this respect. fits_shwi_p is the same as len == 1, which is simple to check and is often true. If we restrict the optimisation to STATIC_CONSTANT_P we'll end up using the out-of-line comparison for all TREE_INT_CST_LTs, even the int-int ones that we currently handle inline. Thanks, Richard
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On 10/29/13 04:17, Richard Biener wrote: On Mon, Oct 28, 2013 at 10:21 PM, Jeff Law l...@redhat.com wrote: On 10/25/13 11:57, Ilya Enkovich wrote: There are currently two known issues with LTO. The first one is an ICE in the LTO streamer when it reads instrumented code. The second one is uninitialized flag_check_pointers when code is compiled by lto1 (do you know why it may happen BTW?). It also causes an ICE because instrumented code is met when not expected. Of course, I'll fix these problems anyway, but I was going to allow '-fcheck-pointers -flto' only when the checker testsuite has a 100% pass rate with LTO. No idea about why LTO would be failing in this way -- my experience with LTO is virtually nil. I'm OK with the restriction as a temporary measure. But we really dislike this kind of stuff, so I need a commitment from you to dive into these problems and sort out what's really going on. I'm not ok with adding such arbitrary restrictions. LTO support should be trivial. In that case, Ilya, you've got to resolve the LTO issues before this can move forward. jeff
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
Jakub Jelinek ja...@redhat.com writes: On Tue, Oct 29, 2013 at 12:25:33PM +, Richard Sandiford wrote: If it is a pseudo, it is certainly a pseudo that isn't used for anything else, as it is the result of (base >> 3) + constant; if it isn't a pseudo, then supposedly it is better not to just keep adding the offsets to a base and thus not to use the end address from asan_clear_shadow. Your patch would force using the end address even if shadow_base is initially, say, some_reg + 32; I think it is better to use a shadow_mem of some_reg + 72 next time instead of some_other_reg + 40. Hmm, it just seems wrong to be assigning to registers returned by force_reg and expand_binop though. Would this variation be OK? Why? Well, that was the traditional rule: /* The caller must not alter the value in the register we return, since we mark it as a constant register.  */ rtx force_reg (enum machine_mode mode, rtx x) { but maybe it doesn't matter now, since the constant register flag has gone and since we seem to use REG_EQUAL rather than REG_EQUIV... But I thought the register pressure thing was to avoid having the original base live across the loop. Doesn't that apply for reg + offset as well as plain reg?
If we have to force some_reg + 32 into a temporary and then loop on it, using the new base should avoid keeping both it and some_reg live at the same time. Plus smaller offsets are often better on some targets. Thanks, Richard
Re: [PATCH] Invalid unpoisoning of stack redzones on ARM
On Tue, Oct 29, 2013 at 01:06:21PM +, Richard Sandiford wrote: If it is a pseudo, it is certainly a pseudo that isn't used for anything else, as it is the result of (base >> 3) + constant; if it isn't a pseudo, then supposedly it is better not to just keep adding the offsets to a base and thus not to use the end address from asan_clear_shadow. Your patch would force using the end address even if shadow_base is initially, say, some_reg + 32; I think it is better to use a shadow_mem of some_reg + 72 next time instead of some_other_reg + 40. But I thought the register pressure thing was to avoid having the original base live across the loop. Doesn't that apply for reg + offset as well as plain reg? If we have to force some_reg + 32 into a temporary and then loop on it, using the new base should avoid keeping both it and some_reg live at the same time. Plus smaller offsets are often better on some targets. On the other side, if we force the address to be just register based when it previously wasn't, it makes life harder for alias analysis etc.; at least in the past (say on ia64) I remember that the fact that it didn't allow anything but registers as valid addresses often resulted in tons of missed optimizations, so in the end I guess I have nothing against the original patch (with whatever form of the copy-to-reg call is desirable). Jakub
Re: libsanitizer merge from upstream r191666
On Tue, Oct 29, 2013 at 06:49:30AM -0700, Konstantin Serebryany wrote: Thanks! (At this time I will be slow with responses due to travel.) BTW, I don't have a compiled clang/llvm pre-3.4 around to look at: for the use-after-return, do you always emit the __asan_*malloc/free calls for the stack vars, or only conditionally based on some compiler flag (and in the latter case, is that flag on by default or not)? Jakub
Re: [PATCH, MPX, 2/X] Pointers Checker [2/25] Builtins
On 29 Oct 06:59, Jeff Law wrote: On 10/29/13 04:17, Richard Biener wrote: On Mon, Oct 28, 2013 at 10:21 PM, Jeff Law l...@redhat.com wrote: On 10/25/13 11:57, Ilya Enkovich wrote: There are currently two known issues with LTO. The first one is ICE in LTO streamer when it reads instrumented code. The second one is unitialized flag_check_pointers when code is compiled by lto1 (do you know why it may happen BTW?). It also causes ICE beacause instrumented code is met when not expected. Of course, I'll fix these problems anyway, but I was going to allow '-fcheck-pointers -flto' only when checker testsuite has 100% pass rate with lto. No idea about why LTO would be failing in this way -- my experience with LTO is virtually nil. I'm OK with the restriction as a temporary measure. But we really dislike this kidn of stuff, so I need a commitment from you to dive into these problems and sort out what's really going on. I'm not ok with adding such arbitrary restrictions. LTO support should be trivial. In that case, Ilya, you've got to resolve the LTO issues before this can move forward. jeff Yeah. I'm working on it right now. I've fixed known issues and now I'm looking for others. Meanwhile here is a new patch version with required renames and without LTO restriction. Thanks, Ilya -- gcc/ 2013-10-29 Ilya Enkovich ilya.enkov...@intel.com * builtin-types.def (BT_FN_VOID_CONST_PTR): New. (BT_FN_PTR_CONST_PTR): New. (BT_FN_CONST_PTR_CONST_PTR): New. (BT_FN_PTR_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR): New. (BT_FN_VOID_PTRPTR_CONST_PTR): New. (BT_FN_VOID_CONST_PTR_SIZE): New. (BT_FN_PTR_CONST_PTR_CONST_PTR_SIZE): New. * chkp-builtins.def: New. * builtins.def: include chkp-builtins.def. (DEF_CHKP_BUILTIN): New. 
* builtins.c (expand_builtin): Support BUILT_IN_CHKP_INIT_PTR_BOUNDS, BUILT_IN_CHKP_NULL_PTR_BOUNDS, BUILT_IN_CHKP_COPY_PTR_BOUNDS, BUILT_IN_CHKP_CHECK_PTR_LBOUNDS, BUILT_IN_CHKP_CHECK_PTR_UBOUNDS, BUILT_IN_CHKP_CHECK_PTR_BOUNDS, BUILT_IN_CHKP_SET_PTR_BOUNDS, BUILT_IN_CHKP_NARROW_PTR_BOUNDS, BUILT_IN_CHKP_STORE_PTR_BOUNDS, BUILT_IN_CHKP_GET_PTR_LBOUND, BUILT_IN_CHKP_GET_PTR_UBOUND, BUILT_IN_CHKP_BNDMK, BUILT_IN_CHKP_BNDSTX, BUILT_IN_CHKP_BNDCL, BUILT_IN_CHKP_BNDCU, BUILT_IN_CHKP_BNDLDX, BUILT_IN_CHKP_BNDRET, BUILT_IN_CHKP_INTERSECT, BUILT_IN_CHKP_ARG_BND, BUILT_IN_CHKP_NARROW, BUILT_IN_CHKP_EXTRACT_LOWER, BUILT_IN_CHKP_EXTRACT_UPPER. * common.opt (fcheck-pointer-bounds): New. * toplev.c (process_options): Check Pointer Bounds Checker is supported. * doc/extend.texi: Document Pointer Bounds Checker built-in functions. diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def index 3deedba..1f9ae4e 100644 --- a/gcc/builtin-types.def +++ b/gcc/builtin-types.def @@ -226,6 +226,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT64_DFLOAT64, BT_DFLOAT64, BT_DFLOAT64) DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, BT_DFLOAT128, BT_DFLOAT128) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR) +DEF_FUNCTION_TYPE_1 (BT_FN_VOID_CONST_PTR, BT_VOID, BT_CONST_PTR) DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT) DEF_FUNCTION_TYPE_1 (BT_FN_ULONG_ULONG, BT_ULONG, BT_ULONG) DEF_FUNCTION_TYPE_1 (BT_FN_ULONGLONG_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG) @@ -233,6 +234,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT16, BT_UINT16, BT_UINT16) DEF_FUNCTION_TYPE_1 (BT_FN_UINT32_UINT32, BT_UINT32, BT_UINT32) DEF_FUNCTION_TYPE_1 (BT_FN_UINT64_UINT64, BT_UINT64, BT_UINT64) DEF_FUNCTION_TYPE_1 (BT_FN_BOOL_INT, BT_BOOL, BT_INT) +DEF_FUNCTION_TYPE_1 (BT_FN_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR) +DEF_FUNCTION_TYPE_1 (BT_FN_CONST_PTR_CONST_PTR, BT_CONST_PTR, BT_CONST_PTR) DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR, BT_FN_VOID_PTR) @@ -346,6 
+349,10 @@ DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_SIZE_CONST_VPTR, BT_BOOL, BT_SIZE, BT_CONST_VOLATILE_PTR) DEF_FUNCTION_TYPE_2 (BT_FN_BOOL_INT_BOOL, BT_BOOL, BT_INT, BT_BOOL) DEF_FUNCTION_TYPE_2 (BT_FN_VOID_UINT_UINT, BT_VOID, BT_UINT, BT_UINT) +DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_SIZE, BT_PTR, BT_CONST_PTR, BT_SIZE) +DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_CONST_PTR, BT_PTR, BT_CONST_PTR, BT_CONST_PTR) +DEF_FUNCTION_TYPE_2 (BT_FN_VOID_PTRPTR_CONST_PTR, BT_VOID, BT_PTR_PTR, BT_CONST_PTR) +DEF_FUNCTION_TYPE_2 (BT_FN_VOID_CONST_PTR_SIZE, BT_VOID, BT_CONST_PTR, BT_SIZE) DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR) @@ -428,6 +435,7 @@ DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I2_INT, BT_VOID, BT_VOLATILE_PTR, BT_I2, BT DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I4_INT, BT_VOID, BT_VOLATILE_PTR, BT_I4, BT_INT) DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I8_INT, BT_VOID, BT_VOLATILE_PTR, BT_I8, BT_INT) DEF_FUNCTION_TYPE_3 (BT_FN_VOID_VPTR_I16_INT, BT_VOID,
Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP
On Tue, Oct 29, 2013 at 12:28:49PM +0100, Richard Biener wrote:
> Otherwise ok.

So like this?

2013-10-29  Jakub Jelinek  <ja...@redhat.com>

	* tree-cfg.c (assert_unreachable_fallthru_edge_p): New function.
	* tree-cfg.h (assert_unreachable_fallthru_edge_p): New prototype.
	* tree-vrp.c (all_imm_uses_in_stmt_or_feed_cond): New function.
	(remove_range_assertions): If ASSERT_EXPR_VAR has no other
	immediate uses but in the condition and ASSERT_EXPR and the other
	successor of the predecessor bb is __builtin_unreachable (),
	set_range_info of the ASSERT_EXPR_VAR to the range info of the
	ASSERT_EXPR's lhs.

--- gcc/tree-cfg.c.jj	2013-10-29 09:25:55.000000000 +0100
+++ gcc/tree-cfg.c	2013-10-29 14:32:53.235150267 +0100
@@ -380,6 +380,50 @@ computed_goto_p (gimple t)
 	  && TREE_CODE (gimple_goto_dest (t)) != LABEL_DECL);
 }
 
+/* Returns true for edge E where e->src ends with a GIMPLE_COND and
+   the other edge points to a bb with just __builtin_unreachable ().
+   I.e. return true for C->M edge in:
+   <bb C>:
+   ...
+   if (something)
+     goto <bb N>;
+   else
+     goto <bb M>;
+   <bb N>:
+   __builtin_unreachable ();
+   <bb M>:  */
+
+bool
+assert_unreachable_fallthru_edge_p (edge e)
+{
+  basic_block pred_bb = e->src;
+  gimple last = last_stmt (pred_bb);
+  if (last && gimple_code (last) == GIMPLE_COND)
+    {
+      basic_block other_bb = EDGE_SUCC (pred_bb, 0)->dest;
+      if (other_bb == e->dest)
+	other_bb = EDGE_SUCC (pred_bb, 1)->dest;
+      if (EDGE_COUNT (other_bb->succs) == 0)
+	{
+	  gimple_stmt_iterator gsi = gsi_after_labels (other_bb);
+	  gimple stmt;
+
+	  if (gsi_end_p (gsi))
+	    return false;
+	  stmt = gsi_stmt (gsi);
+	  if (is_gimple_debug (stmt))
+	    {
+	      gsi_next_nondebug (&gsi);
+	      if (gsi_end_p (gsi))
+		return false;
+	      stmt = gsi_stmt (gsi);
+	    }
+	  return gimple_call_builtin_p (stmt, BUILT_IN_UNREACHABLE);
+	}
+    }
+  return false;
+}
+
 /* Search the CFG for any computed gotos.  If found, factor them to a
    common computed goto site.  Also record the location of that site so
--- gcc/tree-vrp.c.jj	2013-10-29 09:25:38.382477963 +0100
+++ gcc/tree-vrp.c	2013-10-29 14:36:23.695060600 +0100
@@ -6459,6 +6459,33 @@ check_all_array_refs (void)
     }
 }
 
+/* Return true if all imm uses of VAR are either in STMT, or
+   feed (optionally through a chain of single imm uses) GIMPLE_COND
+   in basic block COND_BB.  */
+
+static bool
+all_imm_uses_in_stmt_or_feed_cond (tree var, gimple stmt, basic_block cond_bb)
+{
+  use_operand_p use_p, use2_p;
+  imm_use_iterator iter;
+
+  FOR_EACH_IMM_USE_FAST (use_p, iter, var)
+    if (USE_STMT (use_p) != stmt)
+      {
+	gimple use_stmt = USE_STMT (use_p);
+	if (is_gimple_debug (use_stmt))
+	  continue;
+	while (is_gimple_assign (use_stmt)
+	       && single_imm_use (gimple_assign_lhs (use_stmt),
+				  &use2_p, &use_stmt))
+	  ;
+	if (gimple_code (use_stmt) != GIMPLE_COND
+	    || gimple_bb (use_stmt) != cond_bb)
+	  return false;
+      }
+  return true;
+}
+
 /* Convert range assertion expressions into the implied copies and
    copy propagate away the copies.  Doing the trivial copy propagation
    here avoids the need to run the full copy propagation pass after
@@ -6488,12 +6515,16 @@ remove_range_assertions (void)
 {
   basic_block bb;
   gimple_stmt_iterator si;
+  /* 1 if looking at ASSERT_EXPRs immediately at the beginning of
+     a basic block preceeded by GIMPLE_COND branching to it and
+     __builtin_trap, -1 if not yet checked, 0 otherwise.  */
+  int is_unreachable;
 
   /* Note that the BSI iterator bump happens at the bottom of the
      loop and no bump is necessary if we're removing the statement
      referenced by the current BSI.  */
   FOR_EACH_BB (bb)
-    for (si = gsi_start_bb (bb); !gsi_end_p (si);)
+    for (si = gsi_after_labels (bb), is_unreachable = -1; !gsi_end_p (si);)
       {
 	gimple stmt = gsi_stmt (si);
 	gimple use_stmt;
@@ -6501,6 +6532,7 @@ remove_range_assertions (void)
 	if (is_gimple_assign (stmt)
 	    && gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
 	  {
+	    tree lhs = gimple_assign_lhs (stmt);
 	    tree rhs = gimple_assign_rhs1 (stmt);
 	    tree var;
 	    tree cond = fold (ASSERT_EXPR_COND (rhs));
@@ -6509,22 +6541,49 @@ remove_range_assertions (void)
 	    gcc_assert (cond != boolean_false_node);
 
-	    /* Propagate the RHS into every use of the LHS.  */
 	    var = ASSERT_EXPR_VAR (rhs);
-	    FOR_EACH_IMM_USE_STMT (use_stmt, iter,
-				   gimple_assign_lhs (stmt))
+	    gcc_assert (TREE_CODE (var) == SSA_NAME);
+
+	    if (!POINTER_TYPE_P (TREE_TYPE (lhs))
+		&& SSA_NAME_RANGE_INFO (lhs))
+	      {
Re: libsanitizer merge from upstream r191666
On Tue, Oct 29, 2013 at 6:52 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Oct 29, 2013 at 06:49:30AM -0700, Konstantin Serebryany wrote:
>> Thanks!  (At this time I will be slow with response due to travel)
>
> BTW, I don't have compiled clang/llvm pre-3.4 around to look at; for the
> use after return, do you always emit the __asan_*malloc/free calls for
> the stack vars, or only conditionally based on some compiler flag (and
> in the latter case, is that flag on by default or not)?

The calls are emitted by default, but the __asan_stack_malloc call is done
under a run-time flag __asan_option_detect_stack_use_after_return.
So, to use the stack-use-after-return feature you should simply compile
with -fsanitize=address and then at run-time pass
ASAN_OPTIONS=detect_stack_use_after_return=1.
For small stack frame sizes the call to __asan_stack_free is inlined
(as a performance optimization, not mandatory).
Here is what the IR looks like:

void bar(void*);
void foo() {
  char x[1];
  bar(x);
}

define void @foo() #0 {
entry:
  %MyAlloca = alloca [10080 x i8], align 32
  %0 = ptrtoint [10080 x i8]* %MyAlloca to i64
  %1 = load i32* @__asan_option_detect_stack_use_after_return
  %2 = icmp ne i32 %1, 0
  br i1 %2, label %3, label %5

; <label>:3                                       ; preds = %entry
  %4 = call i64 @__asan_stack_malloc_8(i64 10080, i64 %0)
  br label %5

; <label>:5                                       ; preds = %entry, %3
  %6 = phi i64 [ %0, %entry ], [ %4, %3 ]
  ...
  call void @__asan_stack_free_8(i64 %6, i64 10080, i64 %0)
  ret void
}

So, when the UAR is disabled at run-time we pay a very small additional
price for this code (less than 1% on avg on SPEC).
When it is enabled, we pay more:
https://code.google.com/p/address-sanitizer/wiki/UseAfterReturn#Performance

--kcc

> Jakub
Re: [PATCH] Handle __builtin_unreachable () using assertions in VRP
On Tue, 29 Oct 2013, Jakub Jelinek wrote:
> On Tue, Oct 29, 2013 at 12:28:49PM +0100, Richard Biener wrote:
>> Otherwise ok.
>
> So like this?

Yes, thanks.

Richard.
[PATCH] More restrict testcases
Just figured while optimizing Micha's ADD_RESTRICT patch ... (well,
make it work).

Tested on x86_64-unknown-linux-gnu, committed.

Richard.

2013-10-29  Richard Biener  <rguent...@suse.de>

	* gcc.dg/torture/restrict-2.c: New testcase.
	* gcc.dg/torture/restrict-3.c: Likewise.
	* gcc.dg/torture/restrict-4.c: Likewise.
	* gcc.dg/torture/restrict-5.c: Likewise.

Index: gcc/testsuite/gcc.dg/torture/restrict-2.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/restrict-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-2.c	(working copy)
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * p)
+{
+  int * __restrict pr = p;
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int *q)
+{
+  int * __restrict qr = q;
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/restrict-3.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/restrict-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-3.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * __restrict pr)
+{
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int *q)
+{
+  int * __restrict qr = q;
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/restrict-4.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/restrict-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-4.c	(working copy)
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * p)
+{
+  int * __restrict pr = p;
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int * __restrict qr)
+{
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+    abort ();
+  return 0;
+}
Index: gcc/testsuite/gcc.dg/torture/restrict-5.c
===================================================================
--- gcc/testsuite/gcc.dg/torture/restrict-5.c	(revision 0)
+++ gcc/testsuite/gcc.dg/torture/restrict-5.c	(working copy)
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+
+extern void abort (void);
+
+static inline void
+foo (int * __restrict pr)
+{
+  *pr = 1;
+}
+
+int __attribute__((noinline,noclone))
+bar (int * __restrict qr)
+{
+  *qr = 0;
+  foo (qr);
+  return *qr;
+}
+
+int main()
+{
+  int i;
+  if (bar (&i) != 1)
+    abort ();
+  return 0;
+}
[PATCH, 4.8, PR 58789] Backport cgraph_get_create_real_symbol_node and PR 57084 fix
Hi, PR 58789 has been fixed on trunk by revision 198743 which needs cgraph_get_create_real_symbol_node introduced in revision 196750. This patch backports both. It has been pre-approved by Honza in person and passed bootstrap and testing on the branch. I am about to commit it momentarily. Thanks, Martin 2013-10-25 Martin Jambor mjam...@suse.cz PR middle-end/58789 Backport from mainline 2013-05-09 Martin Jambor mjam...@suse.cz PR lto/57084 * gimple-fold.c (canonicalize_constructor_val): Call cgraph_get_create_real_symbol_node instead of cgraph_get_create_node. Backport from mainline 2013-03-16 Jan Hubicka j...@suse.cz * cgraph.h (cgraph_get_create_real_symbol_node): Declare. * cgraph.c (cgraph_get_create_real_symbol_node): New function. * cgrpahbuild.c: Use cgraph_get_create_real_symbol_node instead of cgraph_get_create_node. * ipa-prop.c (ipa_make_edge_direct_to_target): Likewise. diff --git a/gcc/cgraph.c b/gcc/cgraph.c index 8c1efb4..fd3aade 100644 --- a/gcc/cgraph.c +++ b/gcc/cgraph.c @@ -2596,4 +2596,47 @@ verify_cgraph (void) FOR_EACH_FUNCTION (node) verify_cgraph_node (node); } + +/* Create external decl node for DECL. + The difference i nbetween cgraph_get_create_node and + cgraph_get_create_real_symbol_node is that cgraph_get_create_node + may return inline clone, while cgraph_get_create_real_symbol_node + will create a new node in this case. + FIXME: This function should be removed once clones are put out of decl + hash. */ + +struct cgraph_node * +cgraph_get_create_real_symbol_node (tree decl) +{ + struct cgraph_node *first_clone = cgraph_get_node (decl); + struct cgraph_node *node; + /* create symbol table node. even if inline clone exists, we can not take + it as a target of non-inlined call. */ + node = cgraph_get_node (decl); + if (node !node-global.inlined_to) +return node; + + node = cgraph_create_node (decl); + + /* ok, we previously inlined the function, then removed the offline copy and + now we want it back for external call. 
this can happen when devirtualizing + while inlining function called once that happens after extern inlined and + virtuals are already removed. in this case introduce the external node + and make it available for call. */ + if (first_clone) +{ + first_clone-clone_of = node; + node-clones = first_clone; + symtab_prevail_in_asm_name_hash ((symtab_node) node); + symtab_insert_node_to_hashtable ((symtab_node) node); + if (dump_file) + fprintf (dump_file, Introduced new external node +(%s/%i) and turned into root of the clone tree.\n, +xstrdup (cgraph_node_name (node)), node-uid); +} + else if (dump_file) +fprintf (dump_file, Introduced new external node +(%s/%i).\n, xstrdup (cgraph_node_name (node)), node-uid); + return node; +} #include gt-cgraph.h diff --git a/gcc/cgraph.h b/gcc/cgraph.h index 5df7fb4..8ab7ae1 100644 --- a/gcc/cgraph.h +++ b/gcc/cgraph.h @@ -575,6 +575,7 @@ struct cgraph_indirect_call_info *cgraph_allocate_init_indirect_info (void); struct cgraph_node * cgraph_create_node (tree); struct cgraph_node * cgraph_create_empty_node (void); struct cgraph_node * cgraph_get_create_node (tree); +struct cgraph_node * cgraph_get_create_real_symbol_node (tree); struct cgraph_node * cgraph_same_body_alias (struct cgraph_node *, tree, tree); struct cgraph_node * cgraph_add_thunk (struct cgraph_node *, tree, tree, bool, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree); diff --git a/gcc/cgraphbuild.c b/gcc/cgraphbuild.c index fb01f24..a74a4c0 100644 --- a/gcc/cgraphbuild.c +++ b/gcc/cgraphbuild.c @@ -73,7 +73,7 @@ record_reference (tree *tp, int *walk_subtrees, void *data) decl = get_base_var (*tp); if (TREE_CODE (decl) == FUNCTION_DECL) { - struct cgraph_node *node = cgraph_get_create_node (decl); + struct cgraph_node *node = cgraph_get_create_real_symbol_node (decl); if (!ctx-only_vars) cgraph_mark_address_taken_node (node); ipa_record_reference ((symtab_node)ctx-varpool_node, @@ -143,7 +143,7 @@ record_eh_tables (struct cgraph_node *node, struct function *fun) { struct 
cgraph_node *per_node; - per_node = cgraph_get_create_node (DECL_FUNCTION_PERSONALITY (node-symbol.decl)); + per_node = cgraph_get_create_real_symbol_node (DECL_FUNCTION_PERSONALITY (node-symbol.decl)); ipa_record_reference ((symtab_node)node, (symtab_node)per_node, IPA_REF_ADDR, NULL); cgraph_mark_address_taken_node (per_node); } @@ -223,7 +223,7 @@ mark_address (gimple stmt, tree addr, void *data) addr = get_base_address (addr); if (TREE_CODE (addr) == FUNCTION_DECL) { - struct cgraph_node *node = cgraph_get_create_node (addr); + struct cgraph_node
Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 1/2
On 10/29/2013 02:51 AM, Bernd Edlinger wrote:
> On Mon, 28 Oct 2013 21:29:24, Sandra Loosemore wrote:
>> I again tried backporting the patch series along with your fix to GCC
>> 4.8 and tested on arm-none-eabi.  I found that it was still getting
>> stuck in infinite recursion unless the test from this patch hunk
>
> hmmm, actually I used your patch on a clean 4.8.2 and built for
> --target=arm-eabi.  I have seen the recursion on extract_bit_field,
> but not on store_bit_field so far; maybe you could give me a hint
> which test case exposes the other flavour of this recursion problem.

Sorry, I was going to describe that in more detail but I forgot.  It was
compiling pr56997-1.c with -mthumb, like:

arm-none-eabi-gcc /path/to/gcc/testsuite/gcc.dg/pr56997-1.c -c -fno-diagnostics-show-caret -fstrict-volatile-bitfields -DSTACK_SIZE=16384 -mthumb

Other testing with -mthumb looked fine.

>> Either way, still needs more work, and more thorough testing (more
>> targets, and obviously trunk as well as the 4.8 backport) before I'd
>> consider this ready to commit.  I might or might not be able to find
>> some more time to work on this in the next week
>
> Yes.  And it would be good to reach Richard's Nov-21 deadline for
> GCC 4.9.

Yeah, I know the clock is ticking.  :-(

-Sandra
Re: [Patch, C, C++] Accept GCC ivdep for 'do' and 'while', and for C++11's range-based loops
On 10/28/2013 05:48 PM, Tobias Burnus wrote:
> I am not completely sure whether you had the following in mind, but
> that's what I have now implemented:
>
> DEFTREECODE (RANGE_FOR_STMT, range_for_stmt, tcc_statement, 4)
>
> now has a 5th operand (RANGE_FOR_IVDEP), which has the value
> boolean_true_node or NULL_TREE (via begin_range_for_stmt).

You don't need to add an operand; you can use one of the TREE_LANG_FLAGs
for a boolean flag.

Jason
Re: [C++ Patch] PR 58888
OK. Jason
Re: [PATCH] Compute nonzero_bits from __builtin_unreachable assertions
On Sat, 26 Oct 2013, Jakub Jelinek wrote:
> Hi!
>
> And here is a patch that allows vectorization without peeling for
> alignment and scalar loop for bound even for fn2, fn3 and fn4 in the
> following testcase, though as with the range __builtin_unreachable ()
> notes, it is quite fragile, because it only works if there are no
> immediate uses of the tested SSA_NAME before the assertion.
> Perhaps a more reliable way would be to convert those assertions into
> __builtin_assume_aligned, but that has the disadvantage that its first
> argument is a pointer and it returns a pointer, so we'd need to cast
> integers to pointers and back, or add an ASSUME_ALIGNED internal
> function.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux.

Ok.

Thanks,
Richard.

> int a[1024];
>
> void
> fn1 (int x, int y)
> {
>   int i;
>   x &= -32;
>   y &= -32;
>   for (i = x + 32; i < y; i++)
>     a[i]++;
> }
>
> void
> fn2 (int x, int y)
> {
>   int i;
>   if (x & 31)
>     __builtin_unreachable ();
>   if (y & 31)
>     __builtin_unreachable ();
>   for (i = x + 32; i < x + y; i++)
>     a[i]++;
> }
>
> void
> fn3 (int x, int y)
> {
>   int i;
>   if (x % 32)
>     __builtin_unreachable ();
>   if (y % 32)
>     __builtin_unreachable ();
>   for (i = x + 32; i < x + y; i++)
>     a[i]++;
> }
>
> void
> fn4 (int x, int y)
> {
>   int i;
>   if ((x % 32) != 0)
>     __builtin_unreachable ();
>   if ((y % 32) != 0)
>     __builtin_unreachable ();
>   for (i = x + 32; i < x + y; i++)
>     a[i]++;
> }
>
> 2013-10-25  Jakub Jelinek  <ja...@redhat.com>
>
> 	* tree-vrp.c (maybe_set_nonzero_bits): New function.
> 	(remove_range_assertions): Call it.
>
> --- gcc/tree-vrp.c.jj	2013-10-24 14:32:29.000000000 +0200
> +++ gcc/tree-vrp.c	2013-10-25 21:21:35.183092937 +0200
> @@ -6459,6 +6459,60 @@ check_all_array_refs (void)
>      }
>  }
>  
> +/* Handle
> +   _4 = x_3 & 31;
> +   if (_4 != 0)
> +     goto <bb 6>;
> +   else
> +     goto <bb 7>;
> +   <bb 6>:
> +   __builtin_unreachable ();
> +   <bb 7>:
> +   x_5 = ASSERT_EXPR <x_3, ...>;
> +   If x_3 has no other immediate uses (checked by caller),
> +   var is the x_3 var from ASSERT_EXPR, we can clear low 5 bits
> +   from the non-zero bitmask.  */
> +
> +static void
> +maybe_set_nonzero_bits (basic_block bb, tree var)
> +{
> +  edge e = single_pred_edge (bb);
> +  basic_block cond_bb = e->src;
> +  gimple stmt = last_stmt (cond_bb);
> +  tree cst;
> +
> +  if (stmt == NULL
> +      || gimple_code (stmt) != GIMPLE_COND
> +      || gimple_cond_code (stmt) != ((e->flags & EDGE_TRUE_VALUE)
> +				     ? EQ_EXPR : NE_EXPR)
> +      || TREE_CODE (gimple_cond_lhs (stmt)) != SSA_NAME
> +      || !integer_zerop (gimple_cond_rhs (stmt)))
> +    return;
> +
> +  stmt = SSA_NAME_DEF_STMT (gimple_cond_lhs (stmt));
> +  if (!is_gimple_assign (stmt)
> +      || gimple_assign_rhs_code (stmt) != BIT_AND_EXPR
> +      || TREE_CODE (gimple_assign_rhs2 (stmt)) != INTEGER_CST)
> +    return;
> +  if (gimple_assign_rhs1 (stmt) != var)
> +    {
> +      gimple stmt2;
> +
> +      if (TREE_CODE (gimple_assign_rhs1 (stmt)) != SSA_NAME)
> +	return;
> +      stmt2 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (stmt));
> +      if (!gimple_assign_cast_p (stmt2)
> +	  || gimple_assign_rhs1 (stmt2) != var
> +	  || !CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
> +	  || (TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (stmt)))
> +	      != TYPE_PRECISION (TREE_TYPE (var))))
> +	return;
> +    }
> +  cst = gimple_assign_rhs2 (stmt);
> +  set_nonzero_bits (var, get_nonzero_bits (var)
> +			 & ~tree_to_double_int (cst));
> +}
> +
>  /* Convert range assertion expressions into the implied copies and
>     copy propagate away the copies.  Doing the trivial copy propagation
>     here avoids the need to run the full copy propagation pass after
> @@ -6576,8 +6630,11 @@ remove_range_assertions (void)
> 	      }
> 	  }
> 	if (ok)
> -	  set_range_info (var, SSA_NAME_RANGE_INFO (lhs)->min,
> -			  SSA_NAME_RANGE_INFO (lhs)->max);
> +	  {
> +	    set_range_info (var, SSA_NAME_RANGE_INFO (lhs)->min,
> +			    SSA_NAME_RANGE_INFO (lhs)->max);
> +	    maybe_set_nonzero_bits (bb, var);
> +	  }
>       }
>
> 	Jakub

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
Re: [PATCH] Testcase for nonzero_bits __builtin_unreachable
On Tue, 29 Oct 2013, Jakub Jelinek wrote:
> On Sat, Oct 26, 2013 at 12:19:33AM +0200, Jakub Jelinek wrote:
>> And here is a patch that allows vectorization without peeling for
>> alignment and scalar loop for bound even for fn2, fn3 and fn4 in the
>> following testcase, though as with the range __builtin_unreachable ()
>> notes, it is quite fragile, because it only works if there are no
>> immediate uses of the tested SSA_NAME before the assertion.
>> Perhaps a more reliable way would be to convert those assertions into
>> __builtin_assume_aligned, but that has the disadvantage that its
>> first argument is a pointer and it returns a pointer, so we'd need to
>> cast integers to pointers and back, or add an ASSUME_ALIGNED internal
>> function.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux.
>
> And here is, as a follow-up to the whole patchset of 4 patches, a
> testcase:

Ok.

Thanks,
Richard.

> 2013-10-29  Jakub Jelinek  <ja...@redhat.com>
>
> 	* gcc.dg/vect/vect-align-3.c: New test.
>
> --- gcc/testsuite/gcc.dg/vect/vect-align-3.c.jj	2013-10-29 08:18:08.636348586 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-align-3.c	2013-10-29 08:21:14.931384138 +0100
> @@ -0,0 +1,54 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +int a[2048];
> +
> +void
> +f1 (int x, int y)
> +{
> +  int i;
> +  x &= -256;
> +  y &= -256;
> +  for (i = x + 256; i < y; i++)
> +    a[i]++;
> +}
> +
> +void
> +f2 (int x, int y)
> +{
> +  int i;
> +  if (x & 31)
> +    __builtin_unreachable ();
> +  if (y & 31)
> +    __builtin_unreachable ();
> +  for (i = x + 256; i < x + y; i++)
> +    a[i]++;
> +}
> +
> +void
> +f3 (int x, int y)
> +{
> +  int i;
> +  if (x % 256)
> +    __builtin_unreachable ();
> +  if (y % 256)
> +    __builtin_unreachable ();
> +  for (i = x + 256; i < x + y; i++)
> +    a[i]++;
> +}
> +
> +void
> +f4 (int x, int y)
> +{
> +  int i;
> +  if ((x % 256) != 0)
> +    __builtin_unreachable ();
> +  if ((y % 256) != 0)
> +    __builtin_unreachable ();
> +  for (i = x + 256; i < x + y; i++)
> +    a[i]++;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "vect_do_peeling_for_loop_bound" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "loop peeled for vectorization" "vect" } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
>
> 	Jakub

-- 
Richard Biener <rguent...@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend
[PATCH] Fix PR 58867: asan and ubsan tests not run for installed testing
Hi, The problem here is that both asan and ubsan testsuite test if we have set a library path before running the testsuite. This is incorrect when running the already installed testing as there is no path to set. This patch changes it so it tests if the executable is able to be built/linked before running the tests. This also fixes a warning/error in restoring the CXXFLAGS which shows up in the already installed testing case. OK? Bootstrapped and tested on x86_64-linux-gnu. Also manually looked into the log file to make sure both asan and ubsan were still being tested. Thanks, Andrew Pinski ChangeLog: * lib/ubsan-dg.exp (check_effective_target_fundefined_sanitizer): New function. (ubsan_init): Save off ALWAYS_CXXFLAGS. (ubsan_finish): Restore ALWAYS_CXXFLAGS correctly. * lib/asan-dg.exp (check_effective_target_faddress_sanitizer): Change to creating an executable. (asan_init): Save off ALWAYS_CXXFLAGS. (asan_finish): Restore ALWAYS_CXXFLAGS correctly. * gcc.dg/ubsan/ubsan.exp: Don't check the return value of ubsan_init. Check check_effective_target_fundefined_sanitizer before running the tests. * g++.dg/ubsan/ubsan.exp: Likewise. * testsuite/gcc.dg/asan/asan.exp: Don't check check_effective_target_faddress_sanitizer too early. Don't check the return value of asan_init. * g++.dg/asan/asan.exp: Likewise. Index: testsuite/lib/ubsan-dg.exp === --- testsuite/lib/ubsan-dg.exp (revision 204138) +++ testsuite/lib/ubsan-dg.exp (working copy) @@ -14,6 +14,15 @@ # along with GCC; see the file COPYING3. If not see # http://www.gnu.org/licenses/. +# Return 1 if compilation with -fsanitize=undefined is error-free for trivial +# code, 0 otherwise. + +proc check_effective_target_fundefined_sanitizer {} { +return [check_no_compiler_messages fundefined_sanitizer executable { + int main (void) { return 0; } +} -fsanitize=undefined] +} + # # ubsan_link_flags -- compute library path and flags to find libubsan. 
# (originally from g++.exp) @@ -60,6 +69,7 @@ proc ubsan_init { args } { global ALWAYS_CXXFLAGS global TOOL_OPTIONS global ubsan_saved_TEST_ALWAYS_FLAGS +global usan_saved_ALWAYS_CXXFLAGS set link_flags if ![is_remote host] { @@ -74,6 +84,7 @@ proc ubsan_init { args } { set ubsan_saved_TEST_ALWAYS_FLAGS $TEST_ALWAYS_FLAGS } if [info exists ALWAYS_CXXFLAGS] { + set usan_saved_ALWAYS_CXXFLAGS $ALWAYS_CXXFLAGS set ALWAYS_CXXFLAGS [concat {ldflags=$link_flags} $ALWAYS_CXXFLAGS] } else { if [info exists TEST_ALWAYS_FLAGS] { @@ -95,10 +106,15 @@ proc ubsan_init { args } { proc ubsan_finish { args } { global TEST_ALWAYS_FLAGS global ubsan_saved_TEST_ALWAYS_FLAGS +global asan_saved_ALWAYS_CXXFLAGS -if [info exists ubsan_saved_TEST_ALWAYS_FLAGS] { - set TEST_ALWAYS_FLAGS $ubsan_saved_TEST_ALWAYS_FLAGS +if [info exists asan_saved_ALWAYS_CXXFLAGS ] { + set ALWAYS_CXXFLAGS $asan_saved_ALWAYS_CXXFLAGS } else { - unset TEST_ALWAYS_FLAGS + if [info exists ubsan_saved_TEST_ALWAYS_FLAGS] { + set TEST_ALWAYS_FLAGS $ubsan_saved_TEST_ALWAYS_FLAGS + } else { + unset TEST_ALWAYS_FLAGS + } } } Index: testsuite/lib/asan-dg.exp === --- testsuite/lib/asan-dg.exp (revision 204138) +++ testsuite/lib/asan-dg.exp (working copy) @@ -18,8 +18,8 @@ # code, 0 otherwise. 
proc check_effective_target_faddress_sanitizer {} { -return [check_no_compiler_messages faddress_sanitizer object { - void foo (void) { } +return [check_no_compiler_messages faddress_sanitizer executable { + int main (void) { return 0; } } -fsanitize=address] } @@ -69,6 +69,7 @@ proc asan_init { args } { global ALWAYS_CXXFLAGS global TOOL_OPTIONS global asan_saved_TEST_ALWAYS_FLAGS +global asan_saved_ALWAYS_CXXFLAGS set link_flags if ![is_remote host] { @@ -83,6 +84,7 @@ proc asan_init { args } { set asan_saved_TEST_ALWAYS_FLAGS $TEST_ALWAYS_FLAGS } if [info exists ALWAYS_CXXFLAGS] { + set asan_saved_ALWAYS_CXXFLAGS $ALWAYS_CXXFLAGS set ALWAYS_CXXFLAGS [concat {ldflags=$link_flags} $ALWAYS_CXXFLAGS] set ALWAYS_CXXFLAGS [concat {additional_flags=-fsanitize=address -g} $ALWAYS_CXXFLAGS] } else { @@ -105,11 +107,16 @@ proc asan_init { args } { proc asan_finish { args } { global TEST_ALWAYS_FLAGS global asan_saved_TEST_ALWAYS_FLAGS +global asan_saved_ALWAYS_CXXFLAGS -if [info exists asan_saved_TEST_ALWAYS_FLAGS] { - set TEST_ALWAYS_FLAGS $asan_saved_TEST_ALWAYS_FLAGS +if [info exists asan_saved_ALWAYS_CXXFLAGS ] { + set ALWAYS_CXXFLAGS $asan_saved_ALWAYS_CXXFLAGS } else { - unset TEST_ALWAYS_FLAGS + if [info exists asan_saved_TEST_ALWAYS_FLAGS] { + set TEST_ALWAYS_FLAGS
Re: [PATCH] Do not append *INTERNAL* to the decl name
On 10/28/2013 06:12 PM, Dehao Chen wrote: ping... Sorry for the slow response. If we're actually emitting the name now, we need to give it a name different from the complete constructor. I suppose it makes sense to go with C4/D4 as in the decloning patch, http://gcc.gnu.org/ml/gcc-patches/2007-11/txt00046.txt (which I've been meaning to revisit before the end of stage 1). Jason
Re: free is a killer
On 10/29/13 04:40, Richard Biener wrote:
> Of course in the example we have the global memory storage class
> (incoming function argument) and malloc memory which is really the
> same storage class.  It only becomes a different storage class if you
> factor in flow analysis (for which the current PTA code is weak once
> you leave the SSA web).

The global memory storage class is really better thought of as an
"I haven't a clue" class.  You want a different class for things passed
in, as you know they can't conflict with anything in the current
function's stack or anything malloc'd by the current function.

Jeff
Re: [PATCH 1/2] fix PR49847 ICE-on-HAVE_cc0 regression from PR50780 changes
On Wed, Aug 28, 2013 at 10:06 PM, Jeff Law l...@redhat.com wrote: On 08/28/2013 03:50 AM, Mikael Pettersson wrote: This patch fixes an ICE that occurs in #ifdef HAVE_cc0 code. The ICE breaks both Java and Ada bootstrap on m68k-linux. There is also a tiny C++ test case in the BZ entry. The ICE is triggered by the PR middle-end/50780 changes in r180192. As explained in http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49847#c16, the effect of those changes is that an expression in EH context is broken up so that the cc0 setter and user are put in separate BBs, which normally isn't allowed. As fold_rtx sees the cc0 user, it calls equiv_constant on prev_insn_cc0, but that is NULL due to the BB boundary, resulting in the ICE. This patch checks if prev_insn_cc0 is NULL, and if so doesn't call equiv_constant but sets const_arg to zero, which avoids the ICE and makes fold_rtx leave the insn unchanged. Bootstrapped and regtested on m68k-linux, no regressions. This patch has been in 4.6-based production compilers on m68k-linux since early 2012, and in a 4.7-based compiler since early 2013. Ok for trunk and 4.8? [If approved I'll need help to commit it as I don't have commit rights.] gcc/ 2013-08-28 Mikael Pettersson mi...@it.uu.se PR rtl-optimization/49847 * cse.c (fold_rtx) second case CC0: If prev_insn_cc0 is NULL don't call equiv_constant on it. I can't help but feel something different needs to be done here. It's always been the case that those two insns are expected to be contiguous and there's going to be code all over the place that makes that assumption and that model is what every GCC developer has expected for the last 20+ years. What precisely about Richi's patch causes the setter and user to end up in different blocks? I'm guessing we now consider the comparison itself as potentially trapping and thus we end the block immediately? The consumer then appears in the next block? Btw, the handling for COND_EXPRs was simply adjusted to be the same as for if () conditions. 
From looking at the testcase,

int f (float g)
{
  try
    {
      return g >= 0;
    }
  catch (...) {}
}

can't you simply avoid expanding throwing stuff to a
set-reg-based-on-compare instruction sequence?  Or why does the issue
not trigger on

  try { if (g >= 0) return 1; else return 0; } catch (...) {}

?

Richard.

I wonder if the comparison could be duplicated and marked as
non-throwing at RTL generation time.

Alternately (and probably better) would be to convert the remaining
stragglers and drop the old cc0 support entirely.

jeff
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On Mon, Oct 28, 2013 at 10:51 PM, Jeff Law <l...@redhat.com> wrote:
> On 10/27/13 12:55, Andrew Pinski wrote:
>> Here is my latest patch which adds the testcases from Zhenqiang's
>> patch and fixes item 1 and 2.
>>
>> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> 	* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
>> 	(ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons
>> 	fails, combining the branches anyways.
>> 	(tree_ssa_ifcombine): Inverse the order of the basic block walk,
>> 	increases the number of combinings.
>> 	* gimple.h (gsi_start_nondebug_after_labels_bb): New function.
>>
>> testsuite/ChangeLog:
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
>> 	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
>> 	* gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent
>> 	conditional move to be used.
>> 	* gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for one or more
>> 	intermediate.
>
> I won't be able to look at this in any depth tonight.  However, I would
> ask that you pass along the dump file for ssa-dom-thread-3.c so that I
> can evaluate the correctness of your change better.

We no longer jump thread one case as we have:

  <bb 2>:
  var_3 = var_1(D)->ssa_name.var;
  _4 = var_1(D)->base.code;
  if (_4 == 1)
    goto <bb 3>;
  else
    goto <bb 5>;
  ...
  <bb 5>:
  _6 = var_3->base.code;
  _13 = _4 != 1;
  _14 = _6 != 0;
  _12 = _13 & _14;
  if (_12 != 0)
    goto <bb 8>;
  else
    goto <bb 6>;

Attached is the full dump too.

Thanks,
Andrew Pinski

> Thanks,
> jeff

[Attachment: ssa-dom-thread-3.c.077t.dom1]
Re: free is a killer
On Tue, Oct 29, 2013 at 4:01 PM, Jeff Law l...@redhat.com wrote: On 10/29/13 04:40, Richard Biener wrote: Of course in the example we have the global memory storage class (incoming function argument) and malloc memory which is really the same storage class. It only becomes a different storage class if you factor in flow analysis (for which the current PTA code is weak once you leave the SSA web). The global memory storage class is really better thought of as i haven't a clue class. No, that's the anything class in the current PTA. You want a different class for things passed in as you know they can't conflict with anything in the current function's stack or anything malloc'd by the current function. That's the NONLOCAL class in the current PTA. Which also means that simple cases should work. It's only when the malloc result escapes somewhere that it gets reachable by NONLOCAL because of the limited flow-sensitiveness of PTA (and its lack of context sensitivity). Richard. Jeff
Re: [wide-int] More optimisations
On 10/29/2013 08:43 AM, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Sun, 27 Oct 2013, Richard Sandiford wrote: This patch adds some more optimisations to the wi:: comparison functions. It uses the: #define CONSTANT(X) (__builtin_constant_p (X) && (X)) idiom that was mentioned before, except that I thought CONSTANT would be too easily confused with CONSTANT_P, so I went for CAN_TELL instead. Better names welcome. STATIC_CONSTANT_P similar to static_assert? Sounds good. The changes are: - Add a fast path to eq_p for when one of the inputs isn't sign-extended. This includes code to handle compile-time 0 specially. - Add the opposite optimisation to Mike's lts_p change, if we can tell at compile time that it applies. I think the cases Mike added should only be enabled when we can figure them out at compile time, too. Well, the idea with most of these functions was to handle the simple "everything is one HWI" cases inline. For operations like addition this needs to be based on the precision, since that's the only cheap way of knowing whether the result is also a single HWI. But for logic operations like AND and OR, it can be based on whether the inputs are single HWIs, since single-HWI inputs give a single-HWI output. This catches more cases. lts_p is a bit like AND and OR in this respect. fits_shwi_p is the same as len == 1, which is simple to check and is often true. If we restrict the optimisation to STATIC_CONSTANT_P we'll end up using the out-of-line comparison for all TREE_INT_CST_LTs, even the int-int ones that we currently handle inline. This has always been my way of looking at it. kenny Thanks, Richard
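The idiom under discussion is worth spelling out: it is true only when the compiler can prove the argument is a nonzero compile-time constant, and conservatively false otherwise. A small self-contained sketch (using the STATIC_CONSTANT_P name suggested in the thread; behavior on plain variables is conservative at any optimization level):

```c
#include <assert.h>

/* True iff X is provably a nonzero constant at compile time.
   The && (X) part makes a *known-zero* constant report false too,
   so the guarded fast path is simply skipped in that case.  */
#define STATIC_CONSTANT_P(X) (__builtin_constant_p (X) && (X))

static int demo (void)
{
  volatile int v = 3;                  /* volatile: never provably constant */
  int a = STATIC_CONSTANT_P (5) ? 1 : 0;  /* literal, nonzero -> 1 */
  int b = STATIC_CONSTANT_P (0) ? 1 : 0;  /* constant but zero -> 0 */
  int c = STATIC_CONSTANT_P (v) ? 1 : 0;  /* unknown -> 0 */
  return (a << 2) | (b << 1) | c;
}
```

`__builtin_constant_p` is a GCC extension (also supported by Clang); on other compilers the macro would need a fallback that always yields 0.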
Re: [PATCH] Introduce [sg]et_nonzero_bits
On Tue, Oct 29, 2013 at 12:55:04PM +0100, Richard Biener wrote: Surely you can't rely on CCP and VRP computing exactly the same nonzero_bits. As you don't record/compute zero_bits you can't tell whether a bit not set in nonzero_bits is don't-know or if it is zero. And you cannot do an assert during iteration (during iteration optimistically 'wrong' values can disappear). During CCP iteration the rule is that bits may only be added to mask (and obviously the constant itself on a CONSTANT lattice value may not change). Factor this out to commonize with code in evaluate_stmt Ok. + tem = val->mask.low; + align = (tem & -tem); + if (align > 1) + set_ptr_info_alignment (get_ptr_info (name), align, + (TREE_INT_CST_LOW (val->value) & (align - 1))); + } + else + { + double_int nonzero_bits = val->mask; + nonzero_bits = nonzero_bits | tree_to_double_int (val->value); + nonzero_bits &= get_nonzero_bits (name); This looks odd to me - shouldn't it be simply nonzero_bits = ~val->mask & tree_to_double_int (val->value); Bits set in the mask are for the bits where the value is unknown, so potentially nonzero bits. Where bits are clear in the mask, but set in the value, those are surely nonzero bits. The only known zero bits are where both mask and value are zero, and as nonzero_bits is defined to be 1 where the bit may be non-zero, 0 where it must be zero (like it is for nonzero_bits* in RTL), I think val->mask | tree_to_double_int (val->value); is exactly what we want. ? The &= of get_nonzero_bits either is not necessary or shows the lattice is unnecessarily conservative because you apply it too early during propagation? + set_nonzero_bits (name, nonzero_bits); + } } /* Perform substitutions based on the known constant values. 
*/ @@ -1671,6 +1700,25 @@ evaluate_stmt (gimple stmt) is_constant = (val.lattice_val == CONSTANT); } + if (flag_tree_bit_ccp + && !is_constant ^^^ ah, so if the bit-CCP part figured out a set of known bits then you don't do anything here while in fact you can still remove bits from mask with get_nonzero_bits info. Yeah, that was not to lose info for the case where the lattice is already CONSTANT. Originally, I had code like: --- tree-ssa-ccp.c.xx 2013-10-29 15:31:03.0 +0100 +++ tree-ssa-ccp.c 2013-10-29 15:34:44.257977817 +0100 @@ -1701,8 +1701,7 @@ evaluate_stmt (gimple stmt) } if (flag_tree_bit_ccp - && !is_constant - && likelyvalue != UNDEFINED + && (is_constant || likelyvalue != UNDEFINED) && gimple_get_lhs (stmt) && TREE_CODE (gimple_get_lhs (stmt)) == SSA_NAME) { @@ -1711,11 +1710,27 @@ evaluate_stmt (gimple stmt) double_int mask = double_int::mask (TYPE_PRECISION (TREE_TYPE (lhs))); if (nonzero_bits != double_int_minus_one && nonzero_bits != mask) { - val.lattice_val = CONSTANT; - val.value = build_zero_cst (TREE_TYPE (lhs)); - /* CCP wants the bits above precision set. */ - val.mask = nonzero_bits | ~mask; - is_constant = true; + if (!is_constant) + { + val.lattice_val = CONSTANT; + val.value = build_zero_cst (TREE_TYPE (lhs)); + /* CCP wants the bits above precision set. */ + val.mask = nonzero_bits | ~mask; + is_constant = true; + } + else + { + double_int valv = tree_to_double_int (val.value); + gcc_assert ((valv & ~val.mask & ~nonzero_bits & mask).is_zero ()); + if (!(valv & ~nonzero_bits & mask).is_zero ()) + val.value = double_int_to_tree (TREE_TYPE (lhs), + valv & nonzero_bits); + if (nonzero_bits.is_zero ()) + val.mask = double_int_zero; + else + val.mask = val.mask & (nonzero_bits | ~mask); + } } } in the patch, but that breaks e.g. on void foo (unsigned int a, unsigned b, unsigned **c, void *d[8], int e) { int i; for (i = 0; i != e; i++); if (a) { for (i = 0;;) { i--; if (bar ()) goto lab2; } lab1:; __builtin_printf ("%d\n", i < 0 ?
-1 - i : i); return; } lab2:; if (!d[b] && *c) goto lab1; } at -O2 - VRP1 figures out: if (i_3 < 0) goto <bb 10>; else goto <bb 11>; <bb 10>: # RANGE [0, 2147483647] NONZERO 0x07fff iftmp.0_23 = ~i_3; <bb 11>: # RANGE [0, 2147483647] NONZERO 0x07fff # iftmp.0_4 = PHI <iftmp.0_23(10), i_3(9)> through the edge assertions, but CCP of course doesn't have edge assertions and during iteration just at some point assumes i_3 can be zero and processes bb 10 with that, which leads to value -1 that triggers the above assert. So, I still
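The `tem & -tem` expression in the alignment code above is the classic lowest-set-bit trick: for the CCP mask it yields the alignment implied by the unknown low bits. A minimal sketch:

```c
#include <assert.h>

/* Isolate the lowest set bit of TEM.  On unsigned types -tem is the
   two's complement modulo 2^32, so tem & -tem keeps exactly the least
   significant 1 bit (and yields 0 for tem == 0).  In the CCP code above
   this turns the low part of the don't-know mask into a power-of-two
   pointer alignment.  */
static unsigned lowest_set_bit (unsigned tem)
{
  return tem & -tem;
}
```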
Re: free is a killer
On 10/29/13 09:15, Richard Biener wrote: On Tue, Oct 29, 2013 at 4:01 PM, Jeff Law l...@redhat.com wrote: On 10/29/13 04:40, Richard Biener wrote: Of course in the example we have the global memory storage class (incoming function argument) and malloc memory which is really the same storage class. It only becomes a different storage class if you factor in flow analysis (for which the current PTA code is weak once you leave the SSA web). The global memory storage class is really better thought of as i haven't a clue class. No, that's the anything class in the current PTA. You want a different class for things passed in as you know they can't conflict with anything in the current function's stack or anything malloc'd by the current function. That's the NONLOCAL class in the current PTA. Which also means that simple cases should work. It's only when the malloc result escapes somewhere that it gets reachable by NONLOCAL because of the limited flow-sensitiveness of PTA (and its lack of context sensitivity). OK. I think we've got all the same concepts and may just be arguing about terminology :-) jeff
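The NONLOCAL-vs-malloc distinction is easy to see in a small example (a hedged sketch of the disambiguation being discussed, not a GCC testcase): memory freshly allocated inside the function cannot alias an incoming pointer, so a store through the fresh pointer cannot clobber a store through the argument.

```c
#include <assert.h>
#include <stdlib.h>

/* p is NONLOCAL (incoming); q points to fresh heap memory allocated in
   this function, so the two cannot alias and the compiler may fold the
   load of *p after the store through q to the constant 1.  */
static int f (int *p)
{
  int *q = malloc (sizeof *q);
  if (!q)
    return -1;
  *p = 1;
  *q = 2;       /* cannot clobber *p */
  int r = *p;   /* provably 1 in the simple, non-escaping case */
  free (q);
  return r;
}

static int demo (void)
{
  int x = 0;
  return f (&x);
}
```

Once the malloc result escapes (is stored somewhere reachable from NONLOCAL), the flow-insensitive points-to solution loses this distinction, which is the limitation described above.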
Re: [PATCH] fixing typo in expr.c to allow proper recognition of complex addresses in some arches.
On 10/29/13 06:10, Eric Botcazou wrote: My reading of memory_address_addr_space is that MODE is the mode of the memory reference, not the mode of the address. I fail to see how passing in the mode of the address in the first call can be correct. What am I missing? Nothing, it's me confusing rtx memory_address_addr_space (enum machine_mode, rtx, addr_space_t); with rtx convert_memory_address_addr_space (enum machine_mode, rtx, addr_space_t); for a long time apparently... Sorry about that. In the meantime, I installed the obvious patch to eliminate the small redundancy. Tested on x86_64/Linux. No worries. At one time I kept reading memory_address_addr_space_p and couldn't reconcile how the return value was used. Eventually I realized I needed to be reading memory_address_space_addr_space instead. Ugh. Jeff
[patch] fix fallout from lto-streamer.h not including gimple.h
Two target files also included lto-streamer.h but did not include gimple.h. Fixed thusly. Applied as revision 204166 Andrew * config/darwin.c: Include gimple.h. * config/i386/winnt.c: Likewise. Index: config/darwin.c === *** config/darwin.c (revision 204164) --- config/darwin.c (working copy) *** along with GCC; see the file COPYING3. *** 45,50 --- 45,51 #include df.h #include debug.h #include obstack.h + #include gimple.h #include lto-streamer.h /* Darwin supports a feature called fix-and-continue, which is used Index: config/i386/winnt.c === *** config/i386/winnt.c (revision 204164) --- config/i386/winnt.c (working copy) *** along with GCC; see the file COPYING3. *** 35,40 --- 35,41 #include ggc.h #include target.h #include except.h + #include gimple.h #include lto-streamer.h /* i386/PE specific attribute support.
Re: RFC/A: PR 58079: CONST_INT mode assert in combine.c:do_SUBST
On Mon, Oct 28, 2013 at 5:28 PM, Uros Bizjak ubiz...@gmail.com wrote: PR 58079 is about the do_SUBST assert: /* Sanity check that we're replacing oldval with a CONST_INT that is a valid sign-extension for the original mode. */ gcc_assert (INTVAL (newval) == trunc_int_for_mode (INTVAL (newval), GET_MODE (oldval))); triggering while trying to optimise the temporary result: (eq (const_int -73 [0xffb7]) (reg:QI 234 [ MEM[(const u8 *)buf_7(D) + 4B] ]) combine_simplify_rtx calls simplify_comparison, which first canonicalises the order so that the constant is second and then promotes the comparison to SImode (the first supported comparison mode on MIPS). So we end up with: (eq (zero_extend:SI (reg:QI 234 [ MEM[(const u8 *)buf_7(D) + 4B] ])) (const_int 183 [0xb7])) So far so good. But combine_simplify_rtx then tries to install the new operands in-place, with the second part of: /* If the code changed, return a whole new comparison. */ if (new_code != code) return gen_rtx_fmt_ee (new_code, mode, op0, op1); /* Otherwise, keep this operation, but maybe change its operands. This also converts (ne (compare FOO BAR) 0) to (ne FOO BAR). */ SUBST (XEXP (x, 0), op0); SUBST (XEXP (x, 1), op1); And this triggers the assert because we're replacing a QImode register with the non-QImode constant 183. One fix would be to extend the new_code != code condition, as below. This should avoid the problem cases without generating too much garbage rtl. But if the condition's getting this complicated, perhaps it'd be better just to avoid SUBST here? I.e. if (new_code != code || op0 != XEXP (x, 0) || op1 != XEXP (x, 1)) return gen_rtx_fmt_ee (new_code, mode, op0, op1); Just asking though. :-) Tested on mipsisa64-elf. OK to install? Thanks, Richard gcc/ PR rtl-optimization/58079 * combine.c (combine_simplify_rtx): Avoid using SUBST if simplify_comparison has widened a comparison with an integer. gcc/testsuite/ * gcc.dg/torture/pr58079.c: New test. I would like to backport this patch to 4.8 branch. 
The patch fixes the failure on alpha-linux-gnu and also (reportedly) fixes the same failure on mips and pa targets. I have bootstrapped and regression tested the patch on x86_64-pc-linux-gnu {,-m32} and started bootstrap and regtests on alphaev68-linux-gnu on the 4.8 branch for all default languages + obj-c++ and go. The latter will take some ~10 hours. OK to backport the patch to the 4.8 branch if both tests pass? The regression test was OK in both cases, so I went ahead and installed the patch. The fix is kind of obvious anyway. Uros.
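The do_SUBST invariant at the heart of this PR — a CONST_INT must be a valid sign-extension for the mode it replaces — can be modeled in a few lines of C (a sketch of what trunc_int_for_mode does, not the GCC implementation). It shows why 183 is fine as an SImode constant but invalid in QImode, where its canonical form is -73:

```c
#include <assert.h>
#include <stdint.h>

/* Truncate VAL to the low BITS bits and sign-extend back, the canonical
   form a CONST_INT must have for a BITS-wide mode.  BITS in [1, 63].  */
static int64_t trunc_for_bits (int64_t val, int bits)
{
  uint64_t mask = ((uint64_t) 1 << bits) - 1;
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  uint64_t t = (uint64_t) val & mask;
  return (int64_t) ((t ^ sign) - sign);   /* sign-extend bit BITS-1 up */
}
```

So `trunc_for_bits (183, 8)` is -73: replacing a QImode register with the constant 183 violates the invariant, while after widening to SImode the constant 183 is already canonical.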
Re: free is a killer
On 10/29/13 00:38, Marc Glisse wrote: Indeed. Do you want to commit it xfailed or put it in bugzilla so we don't lose it? (it becomes harder if you replace p with p-1 in the memset arguments). I'll either add it as xfailed, or I'll add it as-is if I fix the alias code. It'd primarily be to address the current incorrect state of the code as it stands today rather than to extensively feature test this code. Sounds good, but I am not really up to rewriting the points-to analysis at the moment... Understandable. Never hurts to put a bug in someone's ear, you never know when they (or someone else reading) will run with it. jeff
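For reference, a runnable variant of the memset testcase (the dg- directives dropped and the size tied to the array so it can execute): the two stores are fully covered by the later memset, so dead store elimination may delete them without changing observable behavior.

```c
#include <assert.h>
#include <string.h>

static long g (void)
{
  long buf[13];
  long *p = buf;
  *p = 42;        /* dead: fully overwritten by the memset below */
  p[4] = 42;      /* dead for the same reason */
  memset (p, 0, 13 * sizeof (long));
  return p[0] + p[4];
}
```

With `p - 1` as the memset destination the covered byte range no longer starts at the stores' base, which is the harder case mentioned above.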
Re: [PATCH] Use get_nonzero_bits to improve vectorization
On Tue, Oct 29, 2013 at 01:11:53PM +0100, Richard Biener wrote: +/* Return number of known trailing zero bits in EXPR, or, if the value of + EXPR is known to be zero, the precision of its type. */ + +int unsigned int? Ok. +case PLUS_EXPR: +case MINUS_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: +case MIN_EXPR: +case MAX_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); This first recurses but if either one returns 0 you don't have to (that cuts down the recursion from exponential to linear in the common case?). Thus, early out on ret == 0? Yeah, that is reasonable. Usually we use this during expansion etc., trees won't be arbitrarily large and for SSA_NAMEs we don't recurse on definitions, so I think we are unlikely to ever see very large expressions there though. I've added a similar early out to the COND_EXPR case. + return MIN (ret1, ret2); +case POINTER_PLUS_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + ret2 = MIN (ret2, prec); Why do you need that here but not elsewhere when processing binary ops? Because POINTER_PLUS_EXPR (well, also shifts, but for those we don't call tree_ctz on the second argument) is the only binop we handle that uses different types for the operands, for the first argument we know it has the same precision as the result, but what if sizetype has bigger precision than TYPE_PRECISION of pointers? I know it typically doesn't, just wanted to make sure we never return an out of range return value, tree_ctz result should be in [0, prec] always. 
+ return MIN (ret1, ret2); +case BIT_AND_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MAX (ret1, ret2); +case MULT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + ret2 = tree_ctz (TREE_OPERAND (expr, 1)); + return MIN (ret1 + ret2, prec); +case LSHIFT_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (host_integerp (TREE_OPERAND (expr, 1), 1) check that first before recursing for op0 - if op1 is negative you simply return ret1 which looks wrong, too. If op1 is negative, then it is undefined behavior, so assuming in that case the same thing as when op1 is not constant (i.e. that we worst case shift left by 0 and thus not increase number of trailing zeros, or shift left by 0 and increase it) is IMHO correct. I don't think we treat left shift by negative value as right shift anywhere, do we? + && ((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1) < (unsigned HOST_WIDE_INT) prec)) This check is to avoid overflowing ret1 + ret2? If so, why not add + { + ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1); ret2 = MIN (ret2, prec); instead? Because ret{1,2} are int (well, with the change you've suggested unsigned int), but tree_low_cst is UHWI, so if the shift count is say UINT_MAX + 1 (yeah, surely undefined behavior), I just didn't want to lose any of the upper bits. Sure, alternatively I could have an UHWI temporary variable and assign tree_low_cst to it and do the MIN on that temporary and prec, but still I think it is better to handle out of range constants as variable shift count rather than say shift count up by prec - 1. +case TRUNC_DIV_EXPR: +case CEIL_DIV_EXPR: +case FLOOR_DIV_EXPR: +case ROUND_DIV_EXPR: +case EXACT_DIV_EXPR: + if (TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST) + { + ret2 = tree_log2 (TREE_OPERAND (expr, 1)); + if (ret2 >= 0 && tree_int_cst_sgn (TREE_OPERAND (expr, 1)) == 1) cheaper to test the sign first. Ok. 
+ { + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (ret1 > ret2) + return ret1 - ret2; + } + } + return 0; +CASE_CONVERT: + ret1 = tree_ctz (TREE_OPERAND (expr, 0)); + if (ret1 && ret1 == TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (expr, 0)))) + ret1 = prec; + return MIN (ret1, prec); +case SAVE_EXPR: + return tree_ctz (TREE_OPERAND (expr, 0)); +case COND_EXPR: + ret1 = tree_ctz (TREE_OPERAND (expr, 1)); + ret2 = tree_ctz (TREE_OPERAND (expr, 2)); + return MIN (ret1, ret2); +case COMPOUND_EXPR: + return tree_ctz (TREE_OPERAND (expr, 1)); +case ADDR_EXPR: + ret1 = get_object_alignment (TREE_OPERAND (expr, 0)); Use get_pointer_alignment, this isn't a memory reference so type alignment rules don't apply. Ok. Jakub
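The per-operator rules in the patch are conservative lower bounds on the trailing-zero count. A self-contained 32-bit sketch of a few of them (not the GCC code, which works on trees), checked against the exact count:

```c
#include <assert.h>

/* Exact trailing-zero count; by the patch's convention a zero value has
   ctz equal to the precision.  */
static int ctz32 (unsigned x) { return x ? __builtin_ctz (x) : 32; }

/* PLUS/MINUS/IOR/XOR/MIN/MAX: at least min(ctz a, ctz b) zeros survive.  */
static int ctz_plus_bound (unsigned a, unsigned b)
{ int r1 = ctz32 (a), r2 = ctz32 (b); return r1 < r2 ? r1 : r2; }

/* MULT: trailing zeros add, capped at the precision.  */
static int ctz_mult_bound (unsigned a, unsigned b)
{ int s = ctz32 (a) + ctz32 (b); return s < 32 ? s : 32; }

/* BIT_AND: either operand's zeros survive, so take the max.  */
static int ctz_and_bound (unsigned a, unsigned b)
{ int r1 = ctz32 (a), r2 = ctz32 (b); return r1 > r2 ? r1 : r2; }
```

Each bound is sound (never exceeds the exact count) but can be loose, e.g. 8 + 8 has four trailing zeros while the PLUS bound only promises three.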
Re: [PATCH, PR 10474] Split live-ranges of function arguments to help shrink-wrapping
On 10/25/2013 11:19 AM, Martin Jambor wrote: Hi, On Thu, Oct 24, 2013 at 01:02:51AM +0200, Steven Bosscher wrote: On Wed, Oct 23, 2013 at 6:46 PM, Martin Jambor wrote: /* Perform the second half of the transformation started in @@ -4522,7 +4704,15 @@ ira (FILE *f) allocation because of -O0 usage or because the function is too big. */ if (ira_conflicts_p) -find_moveable_pseudos (); +{ + df_analyze (); + calculate_dominance_info (CDI_DOMINATORS); + + find_moveable_pseudos (); + split_live_ranges_for_shrink_wrap (); + + free_dominance_info (CDI_DOMINATORS); +} You probably want to add another df_analyze if split_live_ranges_for_shrink_wrap makes code transformations. AFAIU find_moveable_pseudos doesn't change global liveness but your transformation might. IRA/LRA need up-to-date DF_LR results to compute allocno live ranges. OK, I have changed the patch to do that (it is below, still bootstraps and passes tests on x86_64 fine). However, I have noticed that the corresponding part in function ira now looks like: /* ... */ if (delete_trivially_dead_insns (get_insns (), max_reg_num ())) df_analyze (); /* It is not worth to do such improvement when we use a simple allocation because of -O0 usage or because the function is too big. */ if (ira_conflicts_p) { df_analyze (); calculate_dominance_info (CDI_DOMINATORS); find_moveable_pseudos (); if (split_live_ranges_for_shrink_wrap ()) df_analyze (); free_dominance_info (CDI_DOMINATORS); } /* ... */ So, that left me wondering whether the first call to df_analyze is actually necessary, or whether perhaps the data are actually already up to date. What do you think? I guess it needs some investigation. delete_trivially_dead_insns code was taken from the old RA. First of all, I don't know how many insns are really trivially dead before RA in optimization and non-optimization mode. Maybe the code can be removed altogether. I'll put it on my todo list. The patch is ok to commit. Thanks for working on this, Martin. 
2013-10-23 Martin Jambor mjam...@suse.cz PR rtl-optimization/10474 * ira.c (find_moveable_pseudos): Do not calculate dominance info nor df analysis. (interesting_dest_for_shprep): New function. (split_live_ranges_for_shrink_wrap): Likewise. (ira): Calculate dominance info and df analysis. Call split_live_ranges_for_shrink_wrap. testsuite/ * gcc.dg/pr10474.c: New testcase. * gcc.dg/ira-shrinkwrap-prep-1.c: Likewise. * gcc.dg/ira-shrinkwrap-prep-2.c: Likewise.
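The shape of code this transformation targets can be sketched in a few lines (modeled loosely on the gcc.dg/pr10474.c style of testcase; the `work` helper is hypothetical, standing in for the expensive path that needs callee-saved registers):

```c
#include <assert.h>

/* Hypothetical expensive path requiring callee-saved registers.  */
static int work (int v) { return v + 1; }

/* On the early-return path the incoming argument is only tested, so the
   copy of it into a callee-saved register can be sunk past the test by
   split_live_ranges_for_shrink_wrap, letting the fast path run with no
   prologue at all.  */
static int f (int *p)
{
  if (!p)
    return -1;        /* fast path: no frame setup needed */
  return work (*p);   /* slow path: live range of p starts here */
}

static int demo (void)
{
  int x = 41;
  return f (&x);
}
```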
Re: [Fwd: Re: [PATCH (updated)] Convert symtab, cgraph and varpool nodes into a real class hierarchy]
On 10/29/13 10:03, David Malcolm wrote: commit 739276306f1d8f5d0b1202da21cc29b5fcac102e Author: David Malcolmdmalc...@redhat.com Date: Mon Oct 28 22:25:38 2013 -0400 Add chain_next and chain_prev options back to symtab_node_base gcc/ * cgraph.h (symtab_node_base): Add missing chain_next and chain_prev GTY options. This is fine. Sorry I didn't review it at the same time as the other patch. Jeff
Re: RE : Problem with _Hashtable_ebo_helper
This patch replaces the problematic use of _Hashtable_ebo_helper with a new type. 2013-10-29 Jonathan Wakely jwakely@gmail.com * include/bits/hashtable.h (__access_protected_ctor): Define and use new type instead of _Hashtable_ebo_helper. * testsuite/23_containers/unordered_set/ not_default_constructible_hash_neg.cc: Adjust line number. Tested x86_64-linux, I'll commit it to trunk today. commit 50786d1785911ef2d335fdcd162e71e14da4 Author: Jonathan Wakely jwakely@gmail.com Date: Tue Oct 29 15:50:00 2013 + * include/bits/hashtable.h (__access_protected_ctor): Define and use new type instead of _Hashtable_ebo_helper. * testsuite/23_containers/unordered_set/ not_default_constructible_hash_neg.cc: Adjust line number. diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index c639c55..aae146b 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -277,14 +277,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION Functor used to map hash code to bucket index must be default constructible); + // _Hash_code_base has a protected default constructor, so use this + // derived type to tell if it's usable. + struct __access_protected_ctor : __hash_code_base { }; + // When hash codes are not cached local iterator inherits from // __hash_code_base above to compute node bucket index so it has to be // default constructible. static_assert(__if_hash_not_cached< - is_default_constructible< - // We use _Hashtable_ebo_helper to access the protected - // default constructor. 
- __detail::_Hashtable_ebo_helper<0, __hash_code_base, true>>::value, + is_default_constructible<__access_protected_ctor>>::value, Cache the hash code or make functors involved in hash code and bucket index computation default constructible); diff --git a/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc b/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc index 3332cc5..4216b91 100644 --- a/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc +++ b/libstdc++-v3/testsuite/23_containers/unordered_set/not_default_constructible_hash_neg.cc @@ -19,7 +19,7 @@ // with this library; see the file COPYING3. If not see // http://www.gnu.org/licenses/. -// { dg-error "default constructible" { target *-*-* } 283 } +// { dg-error "default constructible" { target *-*-* } 287 } #include unordered_set
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/13 09:12, Vladimir Makarov wrote: Tomorrow I'd like to commit the following patch. The patch removes the regmove pass. I'm buying next time we get together :-) You have my eternal gratitude. I've found only one useful transformation in the regmove pass: dst = src ... (no dst or src modification) ... src dies => dst = src (src dies) ... (src replaced by dst) ... It is some kind of value-numbering technique decreasing register pressure by removing one live range. It is not a frequently triggered transformation (about 30 in all of combine.c) and its effect is quite small, but there is no harm in it at all either. So I added the code to IRA practically without changes (it would be nice to do this in a more general way, e.g. not only in BB scope -- but I am not sure that it will make visible improvements and I have no time to try this currently). I wouldn't be concerned about generalizing this too much. I can't see how it's *that* important, and we can always come back to it if we decide it is important. Still, to achieve the same code performance without the regmove pass, I needed to improve code in IRA which implicitly replaces the removed regmove transformations: o improving hard reg preferences. As you know, RTL code can explicitly contain hard regs. Currently, a hard register in RTL code affects costs of hard regs only for the pseudo involved in the hard register occurrence. E.g. [ ... ] This sounds good and generally useful. o improving preference propagation of hard registers occurring in RTL and assigned to connected pseudos. Let us look at the situation: p1 -> p2 -> p3, where '->' means a copy and we are assigning p1, p2, p3 in the same order. When we've just assigned hr1 to p1, we propagate the hr1 preference to p2 and p3. When we assign to p2, we cannot assign hr1 for some reason and have to assign hr2. P3 in the current preference implementation still has the hr1 preference, which is wrong. I implemented undoing preference propagation for such situations. Ouch. Again, sounds good to get this fixed. 
o Currently IRA generates copies too aggressively for operands that might be matched, so I rewrote this code to generate copies more accurately. Also good :-) The changes in testsuites are necessary as IRA/LRA now generate different code (more accurately, better code, by removing register shuffle moves for each case). Excellent. So this patch removes a lot of code, decreases compilation time (e.g. valgrind lackey reports about 0.4% fewer executed insns on compiling GCC combine.i with -O2), generates about the same performance code (the best improvement I saw is 0.5% SPEC2000 improvement on x86_64 in -O3 mode on a Haswell processor) and about the same average code size for SPEC2000 (the differences are in the hundredth-of-a-percent range). No concerns here. It is a big change and I hope there are no serious objections to this. If somebody has them, please express them or inform me. Thanks, Vlad. 2013-10-28 Vladimir Makarov vmaka...@redhat.com * regmove.c: Remove. * tree-pass.h (make_pass_regmove): Remove. * timevar.def (TV_REGMOVE): Remove. * passes.def (pass_regmove): Remove. * opts.c (default_options_table): Remove entry for regmove. * doc/passes.texi: Remove regmove pass description. * doc/invoke.texi (-foptimize-register-move, -fregmove): Remove options. (-fdump-rtl-regmove): Ditto. * common.opt (foptimize-register-move, fregmove): Remove. * Makefile.in (OBJS): Remove regmove.o. * regmove.c: Remove. * ira-int.h (struct ira_allocno_pref, ira_pref_t): New structure and type. (struct ira_allocno) New member allocno_prefs. (ALLOCNO_PREFS): New macro. (ira_prefs, ira_prefs_num): New external vars. (ira_setup_alts, ira_get_dup_out_num, ira_debug_pref): New prototypes. (ira_debug_prefs, ira_debug_allocno_prefs, ira_create_pref): Ditto. (ira_add_allocno_pref, ira_remove_pref, ira_remove_allocno_prefs): Ditto. (ira_add_allocno_copy_to_list): Remove prototype. (ira_swap_allocno_copy_ends_if_necessary): Ditto. (ira_pref_iterator): New type. 
(ira_pref_iter_init, ira_pref_iter_cond): New functions. (FOR_EACH_PREF): New macro. * ira.c (commutative_constraint_p): Move from ira-conflicts.c. (ira_get_dup_out_num): Ditto. Rename from get_dup_num. Modify the code. (ira_setup_alts): New function. (decrease_live_ranges_number): New function. (ira): Call the above function. * ira-build.c (ira_prefs, ira_prefs_num): New global vars. (ira_create_allocno): Initialize allocno prefs. (pref_pool,
Re: RFA: Andes nds32 port v4 patch
On Tue, 29 Oct 2013, Chung-Ju Wu wrote: Thank you very much for the comments on libgcc and documentation parts. Here I re-list the v4 patch of nds32 port: Machine Description -- http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02153.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02154.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02156.html Libgcc and Documentation -- http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02382.html http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02384.html Are those nds32 port patches OK to be committed? This is OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [patch] Mostly remove tree-core.h from other .h files.
On 10/28/2013 03:38 PM, Andrew MacLeod wrote: I noticed there were a few .h files which were including tree-core.h themselves. expr.h was one of them, and many of the rtl files used that to get at the basic tree structure for various bits. I moved the include of tree-core.h to rtl.h instead. That way all the rtl files get it, and when we flatten the rtl.h includes into the c files, it'll be reduced to just the ones that need it. Eventually, those things that need it from rtl.h won't even need tree-core.h, they'll get converted to the new gimple interface instead. bah, looks like I broke some cross compiling. Apparently some gen* routines include rtl.h before all-tree.def exists, and this causes breakage. Doesn't happen during stage 1 of a cross, but later. sigh. I'll put tree-core.h back into expr.h, remove it from rtl.h and look to flatten it properly shortly... bootstrapped on x86_64-unknown-linux-gnu and applied as revision 204167. Andrew * expr.h: Revert change and include tree-core.h. * rtl.h: Revert change and don't include tree-core.h. Index: expr.h === *** expr.h (revision 204163) --- expr.h (working copy) *** along with GCC; see the file COPYING3. *** 26,31 --- 26,34 #include rtl.h /* For optimize_size */ #include flags.h + /* For host_integerp, tree_low_cst, fold_convert, size_binop, ssize_int, +TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/ + #include tree-core.h /* For GET_MODE_BITSIZE, word_mode */ #include machmode.h Index: rtl.h === *** rtl.h (revision 204163) --- rtl.h (working copy) *** along with GCC; see the file COPYING3. *** 29,35 #include alias.h #include hashtab.h #include flags.h - #include tree-core.h /* Value used by some passes to recognize noop moves as valid instructions. */ --- 29,34
Re: [PATCH i386 4/8] [AVX512] [1/n] Add substed patterns.
On 10/29/2013 03:02 AM, Kirill Yukhin wrote: Hello Richard, On 28 Oct 14:45, Richard Henderson wrote: On 10/28/2013 01:58 PM, Kirill Yukhin wrote: Hello Richard, On 28 Oct 08:20, Richard Henderson wrote: Why is a masked *scalar* operation useful? The reason the instructions exist is so that you can do fully fault correct predicated scalar algorithms. Using VEC_MERGE isn't the proper representation for that. If that's your real goal, then COND_EXEC is the only way to let rtl know that faults are suppressed in the false condition. I believe cond_exec approach supposed to look like this: (define_subst mask_scalar [(set (match_operand:SUBST_V 0) (vec_merge:SUBST_V (match_operand:SUBST_V 1) (match_operand:SUBST_V 2) (const_int 1)))] TARGET_AVX512F [(cond_exec (eq:CC (match_operand:avx512fmaskmode 3 register_operand k) (const_int 1)) (set (match_dup 0) (vec_merge:SUBST_V (match_dup 1) (match_dup 2) (const_int 1]) But this only will describe merge-masking in incorrect way. We will need to add a clobber to signal that even for false condition we will zero higher part of register. Preferable zerro-masking will be indistinguishable from merge- masking and will need to choose which mask mode to enable. Bad turn. No, a cond_exec approach to scalars would use scalar modes not vector modes with vec_merge. In that case the higher part of the register is ignored and undefined, and the fact that zeroing happens is irrelevant. IMHO, we have 3 options to implement scalar masked insns: 1. `vec_merge' over vec_merge (current approach). Pro. 1. Precise semantic description False, as I've described above. The compiler will believe that all exceptions, especially memory exceptions, will be taken. 2. Unified approach with vector patterns 3. Freedom for simplifier to reduce EVEX to VEX for certain const masks Cons. 1. Too precise semantic description and as a consequence complicated code in md-file 2. `cond_exec' approach Pro. 1. 
Looks useful for the compiler when trying to generate predicated code Cons. 1. Not precise. Extra clobbers (?) needed: to signal that we're changing the register even for a false condition in cond_exec 2. Unable to describe zero masking nicely 3. Code still as complicated as for option #1 4. Simplifier won't work (clobber is always clobber) 3. Make all masked scalar insns unspecs Pro. 1. Straight-forward, not overweighted. Enough for intrinsics to work Cons. 1. Since every unspec needs a code: substs won't be applied directly: huge volume of similar code 2. Simplifier won't work 3. Generation of predicated code becomes hard Am I missing some options, or is that all we have? If so, what option would you prefer? As far as builtins are concerned, all three approaches are functional. But in order for the compiler to be able to automatically create conditional code from normal scalar code we'll have to use cond_exec. Indeed, if we've done our job right, then the user-facing inlines that we expose could be written inline double add_mask(double r, double x, double y, int m) { if (m & 1) r = x + y; return r; } although honestly such inlines would be silly.
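To make the two masking flavors in the discussion concrete, here is a plain-C sketch of the scalar semantics (a model of the AVX-512 behavior, not the intrinsics themselves; bit 0 of the mask predicates the operation):

```c
#include <assert.h>

/* Merge-masking: when the mask bit is clear, the destination keeps its
   old value r.  */
static double add_mask_merge (double r, double x, double y, int m)
{
  return (m & 1) ? x + y : r;
}

/* Zero-masking: when the mask bit is clear, the result is forced to 0,
   independent of any previous destination value.  */
static double add_mask_zero (double x, double y, int m)
{
  return (m & 1) ? x + y : 0.0;
}
```

The representational difficulty above is exactly that merge-masking needs the old destination as an extra input while zero-masking does not, which a bare cond_exec cannot express.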
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
Hello, A very minor nit: in common.opt, entries for the options should be changed to fregmove Common Ignore Does nothing. Preserved for backward compatibility. instead of removing them altogether, so the compiler does not start rejecting build commands with such options. There are now a few such entries there. Alexander
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
Please change the .opt entries to foptimize-register-move Common Ignore Does nothing. Preserved for backward compatibility. and fregmove Common Ignore Does nothing. Preserved for backward compatibility. rather than removing them completely. In general, when removing an option that's purely about optimizations rather than affecting the semantics of the user's source code, it's best to leave it behind as an ignored option like that. -- Joseph S. Myers jos...@codesourcery.com
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/2013 12:26 PM, Jeff Law wrote: On 10/29/13 09:12, Vladimir Makarov wrote: I've found only one useful transformation in the regmove pass: dst = src ... (no dst or src modification) ... use of src (src dies) => dst = src (src dies) ... use of dst ... It is a kind of value-numbering technique that decreases register pressure by removing one live range. It is not a frequently triggered transformation (about 30 times in all of combine.c) and its effect is quite small, but there is no harm in it at all either. So I added the code to IRA practically without changes (it would be nice to do this in a more general way, e.g. not only in BB scope -- but I am not sure it will make visible improvements and I have no time to try this currently). I wouldn't be concerned about generalizing this too much. I can't see how it's *that* important, and we can always come back to it if we decide it is important. I completely agree with you. I don't expect any important outcome from this work. In fact, I did some experiments with GVN about 10 years ago (how time flies) for the old RA and did not see any visible effect on performance, but generated code always changes and maybe there are more opportunities for this now (although more probably fewer). In any case, it is the lowest-priority project for me and most probably I will never start it.
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/2013 12:43 PM, Alexander Monakov wrote: Hello, A very minor nit: in common.opt, entries for the options should be changed to fregmove Common Ignore Does nothing. Preserved for backward compatibility. instead of removing them altogether, so the compiler does not start rejecting build commands with such options. There are now a few such entries there. Thanks, Alex. A good point. I'll change it.
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/2013 12:44 PM, Joseph S. Myers wrote: Please change the .opt entries to foptimize-register-move Common Ignore Does nothing. Preserved for backward compatibility. and fregmove Common Ignore Does nothing. Preserved for backward compatibility. rather than removing them completely. In general, when removing an option that's purely about optimizations rather than affecting the semantics of the user's source code, it's best to leave it behind as an ignored option like that. Thanks, Joseph. I modified common.opt accordingly.
Re: [GOOGLE] Don't update the callee count if caller is not resolved node
On Mon, Oct 28, 2013 at 9:49 PM, Xinliang David Li davi...@google.com wrote: Is it sufficient to check if the final caller is defined in the primary module? This might not be sufficient because the final caller may come from a comdat of aux-modules (not defined in the primary module). Note that in some cases, doing the profile update 'speculatively' (without your fix) can be more precise (assuming the inlining gets materialized in a different compilation), but in general it might be better to be conservative in the profile update (as in your patch) -- the downside is you may end up with less aggressive size optimization. This is actually not a speculative profile update, but a double update. E.g. comdat_foo()->bar() with count 100, and comdat_foo() has been defined in 2 aux-modules. Then there will be two edges from comdat_foo() to bar(), each of which has the count 100. But bar()'s entry count is only 100 (assume comdat_foo is the only caller). So if we update bar() twice when inlining these two edges, the second update will be wrong. Dehao David On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen de...@google.com wrote: This patch changes clone_inlined_nodes to not update the callee count if the caller is not a resolved node (in LIPO mode) during AutoFDO compilation. This is because AutoFDO will have the same edge counts for all unresolved nodes, so the original update method leads to multiple updates of the callee. Bootstrapped; testing ongoing. OK for google-4_8 if testing is OK?
Thanks, Dehao Index: gcc/ipa-inline-transform.c === --- gcc/ipa-inline-transform.c (revision 204131) +++ gcc/ipa-inline-transform.c (working copy) @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d else { struct cgraph_node *n; + if (flag_auto_profile && L_IPO_COMP_MODE + && cgraph_pre_profiling_inlining_done) +{ + struct cgraph_node *caller = e->caller; + if (caller->global.inlined_to) + caller = caller->global.inlined_to; + if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller) + update_original = false; +} n = cgraph_clone_node (e->callee, e->callee->symbol.decl, e->count, e->frequency, update_original, vNULL, true);
Re: [PATCH] Introduce [sg]et_nonzero_bits
Hi! On Tue, Oct 29, 2013 at 04:29:56PM +0100, Jakub Jelinek wrote: Surely you can't rely on CCP and VRP computing exactly the same nonzero_bits. As you don't record/compute zero_bits, you can't tell whether a bit that is not set in nonzero_bits is don't-know or known to be zero. And you cannot do an assert during iteration. Unset bits in get_nonzero_bits are always known to be zero; set bits are don't-know or known to be nonzero. (During iteration, optimistically 'wrong' values can disappear.) During CCP iteration the rule is that bits may only be added to mask (and obviously the constant itself on a CONSTANT lattice value may not change). Here is an untested, safer variant of the original patch that attempts to honor the constraints you've mentioned by looking at get_value and either making nonzero_bits more conservative or not applying it at all if the above rules would be violated. Still, it does nothing for the !is_constant case, because I don't know what would be the right thing. Say, if for iteration we assume some constant value that has some bits unset in mask (known) and set in value (known non-zero), yet they are clear in get_nonzero_bits, shall we punt, ignore what the iteration would try to do and adjust from get_nonzero_bits instead, or something else? Factor this out to commonize with code in evaluate_stmt. Ok, but in this more conservative patch I don't see enough code to commonize any more. The only change in the patch below since the previous patch is in the evaluate_stmt function: 2013-10-29 Jakub Jelinek ja...@redhat.com * gimple-pretty-print.c (dump_ssaname_info): Print newline also in case of VR_VARYING. Print get_nonzero_bits if not all ones. * tree-ssanames.h (struct range_info_def): Add nonzero_bits field. (set_nonzero_bits, get_nonzero_bits): New prototypes. * tree-ssa-ccp.c (get_default_value): Use get_range_info to see if a default def isn't partially constant.
(ccp_finalize): If after IPA, set_range_info if integral SSA_NAME is known to be partially zero. (evaluate_stmt): If we'd otherwise return VARYING, use get_range_info to see if a default def isn't partially constant. * tree-ssanames.c (set_range_info): Initialize nonzero_bits upon creation of a range; if VR_RANGE, try to improve nonzero_bits from the range. (set_nonzero_bits, get_nonzero_bits): New functions. * g++.dg/warn/pr33738.C (main): Initialize a2 again to make sure we warn about it already during the VRP1 pass. --- gcc/gimple-pretty-print.c.jj 2013-10-23 14:43:12.0 +0200 +++ gcc/gimple-pretty-print.c 2013-10-24 17:26:59.650945232 +0200 @@ -1731,7 +1731,7 @@ dump_ssaname_info (pretty_printer *buffe if (!POINTER_TYPE_P (TREE_TYPE (node)) && SSA_NAME_RANGE_INFO (node)) { - double_int min, max; + double_int min, max, nonzero_bits; value_range_type range_type = get_range_info (node, min, max); if (range_type == VR_VARYING) @@ -1744,8 +1744,20 @@ dump_ssaname_info (pretty_printer *buffe pp_printf (buffer, ", "); pp_double_int (buffer, max, TYPE_UNSIGNED (TREE_TYPE (node))); pp_printf (buffer, "]"); - newline_and_indent (buffer, spc); } + nonzero_bits = get_nonzero_bits (node); + if (nonzero_bits != double_int_minus_one + && (nonzero_bits + != double_int::mask (TYPE_PRECISION (TREE_TYPE (node))))) + { + pp_string (buffer, " NONZERO "); + sprintf (pp_buffer (buffer)->digit_buffer, + HOST_WIDE_INT_PRINT_DOUBLE_HEX, + (unsigned HOST_WIDE_INT) nonzero_bits.high, + nonzero_bits.low); + pp_string (buffer, pp_buffer (buffer)->digit_buffer); + } + newline_and_indent (buffer, spc); } } --- gcc/tree-ssanames.h.jj 2013-09-27 15:42:44.0 +0200 +++ gcc/tree-ssanames.h 2013-10-24 15:52:53.765955605 +0200 @@ -52,6 +52,8 @@ struct GTY (()) range_info_def { double_int min; /* Maximum for value range. */ double_int max; + /* Non-zero bits - bits not set are guaranteed to be always zero.
*/ + double_int nonzero_bits; }; @@ -68,10 +70,11 @@ struct GTY (()) range_info_def { enum value_range_type { VR_UNDEFINED, VR_RANGE, VR_ANTI_RANGE, VR_VARYING }; /* Sets the value range to SSA. */ -extern void set_range_info (tree ssa, double_int min, double_int max); +extern void set_range_info (tree, double_int, double_int); /* Gets the value range from SSA. */ -extern enum value_range_type get_range_info (tree name, double_int *min, - double_int *max); +extern enum value_range_type get_range_info (tree, double_int *, double_int *); +extern void set_nonzero_bits (tree, double_int); +extern double_int get_nonzero_bits (tree); extern void init_ssanames (struct function *, int); extern void fini_ssanames
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Tue, Oct 29, 2013 at 1:38 AM, Uros Bizjak ubiz...@gmail.com wrote: Hello! For the define_expand I added as below, the else body is there to avoid fall-through transformations to the ABS operation in optabs.c. Otherwise ABS will be converted to other operations even though we have corresponding instructions from SSSE3. No, it won't be. Fallthrough will generate the pattern that will be matched by the insn pattern above, just like you are doing by hand below. I think the case is special for abs(). In optabs.c, there is a function expand_abs() in which the function expand_abs_nojump() is called. This function first tries the expand function defined for the target, and if that fails it will try max(v, -v), then the shift-xor-sub method. If I don't generate any instruction for SSSE3, the fall-through will be max(v, -v). I have tested it on my machine. (define_expand "abs<mode>2" [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand") (abs:VI124_AVX2_48_AVX512F (match_operand:VI124_AVX2_48_AVX512F 1 "nonimmediate_operand")))] "TARGET_SSE2" { if (!TARGET_SSSE3) ix86_expand_sse2_abs (operands[0], force_reg (<MODE>mode, operands[1])); Do you really need force_reg here? You are using generic expanders in ix86_expand_sse2_abs that can handle non-register operands just as well. You are right. I have removed force_reg. else emit_insn (gen_rtx_SET (VOIDmode, operands[0], gen_rtx_ABS (<MODE>mode, operands[1]))); DONE; }) Please note that your mailer mangles indents. Please indent your code correctly. Right. I also attached a text file in which all the tabs are there. The updated patch is pasted below (and also in the attached file). Thank you very much for your comment! Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..84c7ab5 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2013-10-22 Cong Hou co...@google.com + + PR target/58762 + * config/i386/i386-protos.h (ix86_expand_sse2_abs): New function. + * config/i386/i386.c (ix86_expand_sse2_abs): New function.
+ * config/i386/sse.md: Add SSE2 support to abs (8/16/32-bit-int). + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 3ab2f3a..ca31224 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -238,6 +238,7 @@ extern void ix86_expand_mul_widen_evenodd (rtx, rtx, rtx, bool, bool); extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool); extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx); extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx); +extern void ix86_expand_sse2_abs (rtx, rtx); /* In i386-c.c */ extern void ix86_target_macros (void); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 02cbbbd..71905fc 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -41696,6 +41696,53 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2) gen_rtx_MULT (mode, op1, op2)); } +void +ix86_expand_sse2_abs (rtx op0, rtx op1) +{ + enum machine_mode mode = GET_MODE (op0); + rtx tmp0, tmp1; + + switch (mode) +{ + /* For 32-bit signed integer X, the best way to calculate the absolute + value of X is (((signed) X >> (W-1)) ^ X) - ((signed) X >> (W-1)). */ + case V4SImode: + tmp0 = expand_simple_binop (mode, ASHIFTRT, op1, +GEN_INT (GET_MODE_BITSIZE + (GET_MODE_INNER (mode)) - 1), +NULL, 0, OPTAB_DIRECT); + if (tmp0) + tmp1 = expand_simple_binop (mode, XOR, op1, tmp0, + NULL, 0, OPTAB_DIRECT); + if (tmp0 && tmp1) + expand_simple_binop (mode, MINUS, tmp1, tmp0, + op0, 0, OPTAB_DIRECT); + break; + + /* For 16-bit signed integer X, the best way to calculate the absolute + value of X is max (X, -X), as SSE2 provides the PMAXSW insn.
*/ + case V8HImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (mode, SMAX, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + /* For 8-bit signed integer X, the best way to calculate the absolute + value of X is min ((unsigned char) X, (unsigned char) (-X)), + as SSE2 provides the PMINUB insn. */ + case V16QImode: + tmp0 = expand_unop (mode, neg_optab, op1, NULL_RTX, 0); + if (tmp0) + expand_simple_binop (V16QImode, UMIN, op1, tmp0, op0, 0, + OPTAB_DIRECT); + break; + + default: + break; +} +} + /* Expand an insert into a vector register through pinsr insn. Return true if successful. */ diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index c3f6c94..0d9cefe 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -8721,7 +8721,7 @@ (set (attr prefix_rex) (symbol_ref x86_extended_reg_mentioned_p (insn))) (set_attr mode DI)]) -(define_insn "abs<mode>2" +(define_insn "*abs<mode>2" [(set (match_operand:VI124_AVX2_48_AVX512F 0 "register_operand" "=v") (abs:VI124_AVX2_48_AVX512F
Re: patch to improve register preferencing in IRA and to *remove regmove* pass
On 10/29/13 10:46, Vladimir Makarov wrote: On 10/29/2013 12:26 PM, Jeff Law wrote: On 10/29/13 09:12, Vladimir Makarov wrote: I've found only one useful transformation in the regmove pass: dst = src ... (no dst or src modification) ... use of src (src dies) => dst = src (src dies) ... use of dst ... It is a kind of value-numbering technique that decreases register pressure by removing one live range. It is not a frequently triggered transformation (about 30 times in all of combine.c) and its effect is quite small, but there is no harm in it at all either. So I added the code to IRA practically without changes (it would be nice to do this in a more general way, e.g. not only in BB scope -- but I am not sure it will make visible improvements and I have no time to try this currently). I wouldn't be concerned about generalizing this too much. I can't see how it's *that* important, and we can always come back to it if we decide it is important. I completely agree with you. I don't expect any important outcome from this work. In fact, I did some experiments with GVN about 10 years ago (how time flies) for the old RA and did not see any visible effect on performance, but generated code always changes and maybe there are more opportunities for this now (although more probably fewer). In any case, it is the lowest-priority project for me and most probably I will never start it. I vaguely recall you talking about GVN in the allocator once -- perhaps at one of the summits. And yes, I recall you saying it wasn't particularly helpful. Jeff
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Tue, Oct 29, 2013 at 6:18 PM, Cong Hou co...@google.com wrote: For the define_expand I added as below, the else body is there to avoid fall-through transformations to the ABS operation in optabs.c. Otherwise ABS will be converted to other operations even though we have corresponding instructions from SSSE3. No, it won't be. Fallthrough will generate the pattern that will be matched by the insn pattern above, just like you are doing by hand below. I think the case is special for abs(). In optabs.c, there is a function expand_abs() in which the function expand_abs_nojump() is called. This function first tries the expand function defined for the target, and if that fails it will try max(v, -v), then the shift-xor-sub method. If I don't generate any instruction for SSSE3, the fall-through will be max(v, -v). I have tested it on my machine. Huh, strange. Then you can rename the previous pattern to abs<mode>2_1 and call it from the new expander instead of expanding it manually. Please also add a small comment describing the situation, to prevent future optimizations in this place. Thanks, Uros.
Re: [PATCH] Do not append *INTERNAL* to the decl name
On Tue, Oct 29, 2013 at 7:58 AM, Jason Merrill ja...@redhat.com wrote: On 10/28/2013 06:12 PM, Dehao Chen wrote: ping... Sorry for the slow response. If we're actually emitting the name now, we need to give it a name different from the complete constructor. I suppose it makes sense to go with C4/D4 as in the decloning patch. Shall we do it in a separate patch? And I suppose binutils also needs to be updated for C4/D4? Thanks, Dehao http://gcc.gnu.org/ml/gcc-patches/2007-11/txt00046.txt (which I've been meaning to revisit before the end of stage 1). Jason
Re: [GOOGLE] Don't update the callee count if caller is not resolved node
The situation you described is worse -- hopefully it will be addressed in the next version of LIPO. The change is OK. David On Tue, Oct 29, 2013 at 10:08 AM, Dehao Chen de...@google.com wrote: On Mon, Oct 28, 2013 at 9:49 PM, Xinliang David Li davi...@google.com wrote: Is it sufficient to check if the final caller is defined in the primary module? This might not be sufficient because the final caller may come from a comdat of aux-modules (not defined in the primary module). Note that in some cases, doing the profile update 'speculatively' (without your fix) can be more precise (assuming the inlining gets materialized in a different compilation), but in general it might be better to be conservative in the profile update (as in your patch) -- the downside is you may end up with less aggressive size optimization. This is actually not a speculative profile update, but a double update. E.g. comdat_foo()->bar() with count 100, and comdat_foo() has been defined in 2 aux-modules. Then there will be two edges from comdat_foo() to bar(), each of which has the count 100. But bar()'s entry count is only 100 (assume comdat_foo is the only caller). So if we update bar() twice when inlining these two edges, the second update will be wrong. Dehao David On Mon, Oct 28, 2013 at 3:51 PM, Dehao Chen de...@google.com wrote: This patch changes clone_inlined_nodes to not update the callee count if the caller is not a resolved node (in LIPO mode) during AutoFDO compilation. This is because AutoFDO will have the same edge counts for all unresolved nodes, so the original update method leads to multiple updates of the callee. Bootstrapped; testing ongoing. OK for google-4_8 if testing is OK?
Thanks, Dehao Index: gcc/ipa-inline-transform.c === --- gcc/ipa-inline-transform.c (revision 204131) +++ gcc/ipa-inline-transform.c (working copy) @@ -169,6 +169,15 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d else { struct cgraph_node *n; + if (flag_auto_profile && L_IPO_COMP_MODE + && cgraph_pre_profiling_inlining_done) +{ + struct cgraph_node *caller = e->caller; + if (caller->global.inlined_to) + caller = caller->global.inlined_to; + if (cgraph_lipo_get_resolved_node (caller->symbol.decl) != caller) + update_original = false; +} n = cgraph_clone_node (e->callee, e->callee->symbol.decl, e->count, e->frequency, update_original, vNULL, true);
Re: [PATCH] tree-ssa documetation fix
On 10/29/13 03:53, Martin Liška wrote: Hello, I've noticed that some part of documentation is obsolete and this patch adds documentation for new functions that replace old macros. [ ... ] Thanks. Installed on the trunk. jeff
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On 10/29/13 09:14, Andrew Pinski wrote: On Mon, Oct 28, 2013 at 10:51 PM, Jeff Law l...@redhat.com wrote: On 10/27/13 12:55, Andrew Pinski wrote: Here is my latest patch, which adds the testcases from Zhenqiang's patch and fixes items 1 and 2. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Thanks, Andrew Pinski ChangeLog: * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. (ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons fails, combining the branches anyway. (tree_ssa_ifcombine): Invert the order of the basic block walk; this increases the number of combinings. * gimple.h (gsi_start_nondebug_after_labels_bb): New function. testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent a conditional move from being used. * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for one or more intermediate. I won't be able to look at this in any depth tonight. However, I would ask that you pass along the dump file for ssa-dom-thread-3.c so that I can evaluate the correctness of your change better. We no longer jump thread one case, as we have: bb 2: var_3 = var_1(D)->ssa_name.var; _4 = var_1(D)->base.code; if (_4 == 1) goto <bb 3>; else goto <bb 5>; ... bb 5: _6 = var_3->base.code; _13 = _4 != 1; _14 = _6 != 0; _12 = _13 & _14; if (_12 != 0) goto <bb 8>; else goto <bb 6>; Attached is the full dump too. I mostly wanted to verify whether or not the test was worth running. It isn't after this change. It no longer tests for the threader's ability to clone an intermediate block to isolate a path to a successor of the intermediate that is threadable. Please just remove ssa-dom-thread-3.c as part of this checkin.
Thanks, Jeff
Re: [RFC] [Testsuite,ARM] Neon intrinsics executable tests
On 29 October 2013 03:24, Ramana Radhakrishnan ramra...@arm.com wrote: On 10/09/13 23:16, Christophe Lyon wrote: Irrespective of our earlier conversations on this, I'm now actually wondering if, instead of doing this and integrating it in the GCC source base, it may be easier to write a harness to test this cross on qemu or natively. Additionally, setting up an auto-tester to do this might be a more productive use of time than manually dejagnu-izing this, which appears to be a tedious and slow process. This would be easy to set up, since the Makefile on gitorious is already targeting qemu. I used it occasionally on boards with minimal changes. This just means we'd have to agree on how to set up such an auto-tester, where we send the results, etc... I'd like your feedback before continuing, as there are a lot more files to come. I have done some cleanup to help review, but the two .h files will need to grow as more intrinsics are added (see the original ones). Which one should I compare this with in terms of the original file? I have kept the same file names. I'd like to keep the modifications at a minimal level, to save my time when adapting each test (there are currently 145 test files, so 143 left :-). On to the patch itself. The prefix TEST_ seems a bit misleading in that it suggests this is testing something, when in reality it is initializing stuff. In fact, TEST_ executes the intrinsics and copies the results to memory when relevant. But I can easily change TEST_ to something else. So in the sample I posted: TEST_VABA: VAR=vaba(); vst1(BUFFER,VAR) TEST_VLD1: VAR=vld1(); vst1(BUFFER, VAR) VDUP is special in that it is a helper for other tests: TEST_VDUP: VAR1=vdup(VAR2,) and similarly for TEST_VLOAD and TEST_VSETLANE +# Exit immediately if this isn't an ARM target. +if ![istarget arm*-*-*] then { + return +} Also for aarch64*-*-*, as all these intrinsics are compatible with the aarch64 port.
I would also prefer that this be tortured over multiple optimization levels, as many times we find issues with different optimization levels. OK, this sounds easy to do, and I agree. I preferred to post a simple version first. And given that you told me about your plans to factorize the arm and aarch64 tests, I thought it was better to start with a simple version I knew was working. More later; I need to get back to something else and I need to play more with your original testsuite - but I'd like some discussion around some of these points anyway. Ramana OK, thanks for the feedback. If we decide to go with auto-testers instead, the discussion will probably be shorter. Christophe + +# Load support procs. +load_lib gcc-dg.exp + +# Initialize `dg'. +dg-init + +# Main loop. +gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \ + + +# All done. +dg-finish diff -rNup '--exclude=.git' gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h --- gcc-fsf/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h 1970-01-01 01:00:00.0 +0100 +++ gcc-fsf-neontests/gcc/testsuite/gcc.target/arm/neon-intrinsics/arm-neon-ref.h 2013-05-09 00:48:59.395628726 +0200 @@ -0,0 +1,349 @@ +#ifndef _ARM_NEON_REF_H_ +#define _ARM_NEON_REF_H_ + +#include <stdio.h> +#include <inttypes.h> +#include <string.h> +#include <stdlib.h> + +#define xSTR(X) #X +#define STR(X) xSTR(X) + +#define xNAME1(V,T) V ## _ ## T +#define xNAME(V,T) xNAME1(V,T) + +#define VAR(V,T,W) xNAME(V,T##W) +#define VAR_DECL(V, T, W) T##W##_t VAR(V,T,W) + +#define VECT_NAME(T, W, N) T##W##x##N +#define VECT_ARRAY_NAME(T, W, N, L) T##W##x##N##x##L +#define VECT_TYPE(T, W, N) xNAME(VECT_NAME(T,W,N),t) +#define VECT_ARRAY_TYPE(T, W, N, L) xNAME(VECT_ARRAY_NAME(T,W,N,L),t) + +#define VECT_VAR(V,T,W,N) xNAME(V,VECT_NAME(T,W,N)) +#define VECT_VAR_DECL(V, T, W, N) T##W##_t VECT_VAR(V,T,W,N) + +/* Array declarations.
*/ +#define ARRAY(V, T, W, N) VECT_VAR_DECL(V,T,W,N)[N] + +/* Check results vs expected values. */ +#define CHECK(MSG,T,W,N,FMT) \ + for(i=0; i<N; i++) \ +{ \ + if (VECT_VAR(result, T, W, N)[i] != VECT_VAR(expected, T, W, N)[i]) { \ + fprintf(stderr, \ + "ERROR in %s at type %s index %d: 0x%" FMT " != 0x%" \ + FMT " (expected)\n", \ + MSG, STR(VECT_NAME(T, W, N)), i, \ + VECT_VAR(result, T, W, N)[i], \ + VECT_VAR(expected, T, W, N)[i]); \ + abort(); \ + } \ +
Re: [PATCH] Vectorizing abs(char/short/int) on x86.
On Tue, Oct 29, 2013 at 10:34 AM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Oct 29, 2013 at 6:18 PM, Cong Hou co...@google.com wrote: For the define_expand I added as below, the else body is there to avoid fall-through transformations to the ABS operation in optabs.c. Otherwise ABS will be converted to other operations even though we have corresponding instructions from SSSE3. No, it won't be. Fallthrough will generate the pattern that will be matched by the insn pattern above, just like you are doing by hand below. I think the case is special for abs(). In optabs.c, there is a function expand_abs() in which the function expand_abs_nojump() is called. This function first tries the expand function defined for the target, and if that fails it will try max(v, -v), then the shift-xor-sub method. If I don't generate any instruction for SSSE3, the fall-through will be max(v, -v). I have tested it on my machine. Huh, strange. Then you can rename the previous pattern to abs<mode>2_1 and call it from the new expander instead of expanding it manually. Please also add a small comment describing the situation, to prevent future optimizations in this place. Could you tell me how to do that? Is the renamed pattern abs<mode>2_1 also a define_expand? How do I call this expander? Thank you! Cong Thanks, Uros.
[PATCH] Fix PR ipa/58862 (profiled bootstrap failure)
This patch fixes a profiledbootstrap failure that occurred after I added some profile fixup. The initial profile insanity occurred upstream of my change, but my change caused the insanity to spread to the edge probability, resulting in a verify failure. The patch below ensures this doesn't occur. Bootstrapped and tested on x86-64-unknown-linux-gnu. A profiledbootstrap build also now succeeds. Ok for trunk? Thanks, Teresa 2013-10-29 Teresa Johnson tejohn...@google.com PR ipa/58862 * tree-ssa-tail-merge.c (replace_block_by): Tolerate profile insanities when updating probabilities. Index: tree-ssa-tail-merge.c === --- tree-ssa-tail-merge.c (revision 204166) +++ tree-ssa-tail-merge.c (working copy) @@ -1467,7 +1467,7 @@ static void replace_block_by (basic_block bb1, basic_block bb2) { edge pred_edge; - edge e1; + edge e1, e2; edge_iterator ei; unsigned int i; gimple bb2_phi; @@ -1502,16 +1502,22 @@ replace_block_by (basic_block bb1, basic_block bb2 bb2->count += bb1->count; /* Merge the outgoing edge counts from bb1 onto bb2. */ + gcov_type out_sum = 0; FOR_EACH_EDGE (e1, ei, bb1->succs) { - edge e2; e2 = find_edge (bb2, e1->dest); gcc_assert (e2); e2->count += e1->count; - /* Recompute the probability from the new merged edge count (bb2->count - was updated above). */ - e2->probability = GCOV_COMPUTE_SCALE (e2->count, bb2->count); + out_sum += e2->count; } + /* Recompute the edge probabilities from the new merged edge count. + Use the sum of the new merged edge counts computed above instead + of bb2's merged count, in case there are profile count insanities + making the bb count inconsistent with the edge weights. */ + FOR_EACH_EDGE (e2, ei, bb2->succs) +{ + e2->probability = GCOV_COMPUTE_SCALE (e2->count, out_sum); +} /* Do updates that use bb1, before deleting bb1. */ release_last_vdef (bb1); -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: [PATCH] Keep REG_INC note in subreg2 pass
On 10/24/13 02:20, Zhenqiang Chen wrote: Hi, the REG_INC note is lost in the subreg2 pass in resolve_simple_move, which might lead to wrong dependence info for IRA. E.g. in the function validate_equiv_mem of ira.c, it checks the REG_INC note: for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) if ((REG_NOTE_KIND (note) == REG_INC || REG_NOTE_KIND (note) == REG_DEAD) && REG_P (XEXP (note, 0)) && reg_overlap_mentioned_p (XEXP (note, 0), memref)) return 0; Without the REG_INC note, validate_equiv_mem will return a wrong result. Refer to https://bugs.launchpad.net/gcc-linaro/+bug/1243022 for more detail about a real case in the kernel. Bootstrapped with no make check regressions on x86-64 and ARM. Is it OK for trunk and 4.8? Thanks! -Zhenqiang ChangeLog: 2013-10-24 Zhenqiang Chen zhenqiang.c...@linaro.org * lower-subreg.c (resolve_simple_move): Copy REG_INC note. testsuite/ChangeLog: 2013-10-24 Zhenqiang Chen zhenqiang.c...@linaro.org * gcc.target/arm/lp1243022.c: New test. This clearly handles adding a note when the destination is a MEM with a side effect. What about cases where the side effect is associated with a load from memory rather than a store to memory? lp1243022.patch diff --git a/gcc/lower-subreg.c b/gcc/lower-subreg.c index 57b4b3c..e710fa5 100644 --- a/gcc/lower-subreg.c +++ b/gcc/lower-subreg.c @@ -1056,6 +1056,22 @@ resolve_simple_move (rtx set, rtx insn) mdest = simplify_gen_subreg (orig_mode, dest, GET_MODE (dest), 0); minsn = emit_move_insn (real_dest, mdest); +#ifdef AUTO_INC_DEC + /* Copy the REG_INC notes. */ + if (MEM_P (real_dest) && !(resolve_reg_p (real_dest) +|| resolve_subreg_p (real_dest))) + { + rtx note = find_reg_note (insn, REG_INC, NULL_RTX); + if (note) + { + if (!REG_NOTES (minsn)) + REG_NOTES (minsn) = note; + else + add_reg_note (minsn, REG_INC, note); + } + } +#endif If MINSN does not have any notes, then this results in MINSN and INSN sharing the note. Note carefully that notes are chained (see the implementation of add_reg_note).
Thus the sharing would result in MINSN and INSN actually sharing a chain of notes. I'm pretty sure that's not what you intended. I think you need to always use add_reg_note. Jeff
Symtab, cgraph and varpool nodes are now a real class hierarchy
On Mon, 2013-10-28 at 23:37 -0600, Jeff Law wrote: On 10/28/13 20:57, David Malcolm wrote:

	* cgraph.h (symtab_node_base): Convert to a class; add
	GTY ((desc ("%h.type"), tag ("SYMTAB_SYMBOL"))).
	(cgraph_node): Inherit from symtab_node; add GTY option
	tag ("SYMTAB_FUNCTION").
	(varpool_node): Inherit from symtab_node; add GTY option
	tag ("SYMTAB_VARIABLE").
	(symtab_node_def): Remove.
	(is_a_helper <cgraph_node>::test (symtab_node_def *)): Convert to...
	(is_a_helper <cgraph_node>::test (symtab_node_base *)): ...this.
	(is_a_helper <varpool_node>::test (symtab_node_def *)): Convert to...
	(is_a_helper <varpool_node>::test (symtab_node_base *)): ...this.
	* ipa-ref.h (symtab_node_def): Drop.
	(symtab_node): Change underlying type from symtab_node_def to
	symtab_node_base.
	(const_symtab_node): Likewise.
	* is-a.h: Update examples in comment.
	* symtab.c (symtab_hash): Change symtab_node_def to symtab_node_base.
	(assembler_name_hash): Likewise.

This patch is OK. Thanks for working on this!

These symtab changes were dependent on having gengtype support for inheritance, which is now in trunk, so I'm now revisiting these patches. The above patch hasn't bitrotted, though the autogenerated one that goes with it needed regenerating. A new version of the autogenerated patch can be seen at: http://dmalcolm.fedorapeople.org/gcc/large-patches/eaba9669644c84592ea32be2dcd19ba92beca381-0003-Autogenerated-fixes-of-symbol.-to.patch

Is that new patch OK? (It's 450KB, so one's eyes tend to glaze over after a while, but FWIW you approved an earlier version of it in: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00730.html and the test suite for the script that generated the patch can be seen at: https://github.com/davidmalcolm/gcc-refactoring-scripts/blob/master/test_refactor_symtab.py )

Given it's already approved and hasn't changed in any substantial way, it's OK for the trunk once you get through the bootstrap and regression testing. Thanks.
The conversion of the symtab types from inheritance-in-C to a C++ class hierarchy is now in trunk: I committed the manual parts of the conversion as r204170, and the automated part as r204171. I then noticed that this broke the gdb debugging hooks for printing a (cgraph_node *). I tested and committed the attached fix for them as r204174. As a followup, I'm next looking at the renaming of the symtab types previously discussed with Honza (on 2013-09-10). Dave

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog (revision 204173)
+++ gcc/ChangeLog (revision 204174)
@@ -1,3 +1,10 @@
+2013-10-29  David Malcolm  <dmalc...@redhat.com>
+
+	* gdbhooks.py (CGraphNodePrinter.to_string): Update gdb
+	prettyprinter for cgraph_node to reflect the conversion of the
+	symtable types to a C++ class hierarchy: it now *is* a
+	symtab_node_base, rather than having one (named "symbol").
+
 2013-10-29  Balaji V. Iyer  <balaji.v.i...@intel.com>
 
 	* builtins.c (is_builtin_name): Added a check for __cilkrts_detach and

Index: gcc/gdbhooks.py
===================================================================
--- gcc/gdbhooks.py (revision 204173)
+++ gcc/gdbhooks.py (revision 204174)
@@ -226,8 +226,7 @@
         # symtab_node_name calls lang_hooks.decl_printable_name
         # default implementation (lhd_decl_printable_name) is:
         #    return IDENTIFIER_POINTER (DECL_NAME (decl));
-        symbol = self.gdbval['symbol']
-        tree_decl = Tree(symbol['decl'])
+        tree_decl = Tree(self.gdbval['decl'])
         result += ' %s' % tree_decl.DECL_NAME().IDENTIFIER_POINTER()
         result += '>'
         return result
Re: free is a killer
On 10/29/13 00:38, Marc Glisse wrote: On Mon, 28 Oct 2013, Jeff Law wrote: On 10/28/13 16:05, Marc Glisse wrote:

I checked and it does the wrong thing (I don't have the testcase handy anymore, but it shouldn't be hard to recreate one). I even wrote a patch (attached), but it is related to: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02238.html so it can't go in. A more limited fix (still missing many cases) would be possible. I didn't test VA_END, but it doesn't look as bad (it is the SSA_NAME case that is wrong for memcpy

Actually, it's fairly easy to see with memset. Something like this:

/* { dg-do compile } */
/* { dg-options "-O1 -fdump-tree-optimized" } */
void f (long *p)
{
  *p = 42;
  p[4] = 42;
  __builtin_memset (p, 0, 100);
}
/* { dg-final { scan-tree-dump-not " = 42" "optimized" } } */
/* { dg-final { cleanup-tree-dump "optimized" } } */

Indeed. Do you want to commit it xfailed, or put it in bugzilla so we don't lose it? (It becomes harder if you replace p with p-1 in the memset arguments.)

BTW, you can get a VAR_DECL (and presumably a PARM_DECL) from ao_ref_base. Some of the ssa-dse tests show this. Jeff