date:20130617

Re: [PATCH 4/4] Fix leading spaces.

2013-06-17 Thread Chung-Ju Wu

2013/6/16 Ondřej Bílka :
> On Sat, Jun 15, 2013 at 05:13:31PM +0800, Chung-Ju Wu wrote:
>> 2013/6/14 Joseph S. Myers :
>> > On Thu, 13 Jun 2013, Richard Biener wrote:
>> >
>> >> Btw, rather than these kind of patches I'd appreciate if someone would 
>> >> look
>> >> at a simple pre(post?)-commit hook that enforces those whitespace rules.
>> >
>> > In the cpp testsuite we definitely want tests with bad whitespace (e.g.
>> > gcc.dg/cpp/backslash*.c) and generally I think it's wrong to be changing
>> > whitespace in the testsuite without an understanding of what the test is
>> > testing and how the whitespace is irrelevant to that (more generally,
>> > cleanups of compiler tests are suspect without such an understanding of
>> > what is or is not significant in a particular test).  And so you need to
>> > allow addition of otherwise bad whitespace there.
>> >
>> > It's not obvious whether there might be other cases needing such
>> > whitespace as well.
>> >
>> >> Either by adjusting the committed content or by rejecting the commit(?)
>> >
>> > I don't think hooks adjusting committed content are likely to work at all.
>> >
>> > --
>> > Joseph S. Myers
>> > jos...@codesourcery.com
>>
>> By having a look at Ondřej's patch, it is nice to fix existing
>> codes with proper whitespace/tab rules.
>>
>> But it covers too much under whole gcc source tree.
>> Like Joseph said, we may accidentally change the cases
>> that need bad whitespace.
>>
>> Perhaps we should gradually fix them?  i.e. We can only fix
>> leading spaces for some obvious cases (gcc/*.c gcc/c-family/*.c ...),
>> leaving other source (gcc/testsuite/* gcc/config/* ...) untouched.
>>
> I uploaded new version that touches only gcc/*.[ch] gcc/c-family/*.[ch]
> to http://kam.mff.cuni.cz/~ondra/stylepp.tar.bz2
> patches are also updated.
>

IMHO, the preliminary whitelist could be:
  gcc/*.[ch]
  gcc/c/*.[ch]
  gcc/c-family/*.[ch]
  gcc/common/*.[ch]
  gcc/cp/*.[ch]

> Anyway what in gcc/config/*.c depends on leading/trailing spaces?
>

In general, I agree with you that all stuff under gcc/config/ and
gcc/common/config/ are supposed to follow whitespace rules.
But I think it would be better to have corresponding port maintainers
make the decision.

Your tool and patches look great to me.  It helps fixup the existing
codes with strong whitespace convention.
But I am sorry that I don't have right to approve it.
You will need some reviewers to review the patch and give the OK.

If you do not receive any response to the patches within two weeks or so,
you can 'ping' it with a follow-up mail to remind reviewers. :)


Best regards,
jasonwucj

Re: [Patch, AArch64] Adjust gcc.dg/torture/stackalign/builtin-apply-2.c

2013-06-17 Thread Marcus Shawcroft

OK
/Marcus

On 17 June 2013 14:43, Yufeng Zhang  wrote:
> Hi,
>
> This patch sets STACK_ARGUMENTS_SIZE with 0 for AArch64 as variadic
> arguments to 'bar' are passed in registers on this target.
>
> OK for the trunk?
>
> Thanks,
> Yufeng
>
> gcc/testsuite/
>
> * gcc.dg/torture/stackalign/builtin-apply-2.c: set
> STACK_ARGUMENTS_SIZE with 0 if __aarch64__ is defined.

Apply powerpc64le patches to gcc-4.8 branch?

2013-06-17 Thread Alan Modra

I'd like to apply the following set of patches supporting powerpc64le
to the 4.8 branch.  David has stated that he's happy with the idea,
even though technically these are not regressions.  OK?

http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01374.html
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00211.html
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00214.html
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00244.html
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00297.html
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00435.html
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00476.html
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00165.html
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00166.html
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00388.html
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00554.html
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00684.html
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00766.html

-- 
Alan Modra
Australia Development Lab, IBM

Re: [PATCH] PowerPC: Fix test case for PR55033

2013-06-17 Thread Chung-Ju Wu

2013/6/18 David Edelsohn :
> gcc/testsuite/ChangeLog
> 2013-06-17  Sebastian Huber  
>
> PR target/55033
> * gcc.target/powerpc/pr55033.c: Fix options.
>
> Okay.
>
> Thanks, David
>
> P.S. Please explicitly copy me on patches.

Hi, Sebastian,

Since David has pproved your patch,
do you need me to help commit this fix again?
I'd happy to do this for you. :)


Best regards,
jasonwucj

Re: [gomp4] Some progress on #pragma omp simd

2013-06-17 Thread Aldy Hernandez


On 06/17/13 12:23, Richard Henderson wrote:

On 06/17/2013 10:13 AM, Aldy Hernandez wrote:

- data.simduid = tree_low_cst (gimple_call_arg (stmt, 0), 1);
+ data.simduid = gimple_call_arg (stmt, 0);


Doesn't this copy the ADDR_EXPR from the call into simduid?


  simduid_to_vf::hash (const value_type *p)
  {
-  return p->simduid;
+  return htab_hash_pointer (p->simduid);


... at which point this bit is meaningless since all ADDR_EXPRs must of course
have different pointers.

I think we should validate the DECL_P extracted from the call_arg, and store
that.  The hash should use DECL_UID to minimize hash variation due to memory
layout.


As discussed on IRC.  Attached are these changes you requested, plus 
changing OMP_CLAUSE__SIMDUID__UID to OMP_CLAUSE__SIMDUID__DECL.


I will tackle the dot named builtins in the next iteration.

BTW, this patch bootstraps with no regressions.  I also manually 
inspected the gimple generated by the test below, and made sure that 
inlining func() into both foo() and bar() have different temporaries. 
Without this patch, the same constant was used incorrectly as arguments 
to __builtin_GOMP.simd_vf and __builtin_GOMP.simd_lane.


How does this look?
Aldy

#define N 1000

static inline int func (int *p)
{
  int x = 0, i;
#pragma simd reduction (+:x)
  for (i = 0; i < 1000; i++)
x += p[i];
  return x;
}

int array[5];
int dork[];
foo()
{
  return func(array);
}

bar()
{
  return func(dork) + 666;
}

diff --git a/gcc/ChangeLog.gomp b/gcc/ChangeLog.gomp
index 7f9151d..0ed1b2c 100644
--- a/gcc/ChangeLog.gomp
+++ b/gcc/ChangeLog.gomp
@@ -1,3 +1,24 @@
+2013-06-17  Aldy Hernandez  
+
+   * builtin-types.def (BT_FN_UINT_PTR): New.
+   * omp-builtins.def (BUILT_IN_GOMP_SIMD_LANE): Use it.
+   (BUILT_IN_GOMP_SIMD_VF): Same.
+   * cfgloop.h (struct loop): Change type of simduid to tree.
+   * omp-low.c (lower_rec_input_clauses): Adapt to use simduid as a
+   tree.
+   (expand_omp_simd): Same.
+   * tree-data-ref.c (get_references_in_stmt): Same.
+   * tree-vect-data-refs.c (vect_analyze_data_refs): Same.
+   * tree-vectorizer.c (struct simduid_to_vf): Change type of simduid
+   to tree.
+   (simduid_to_vf::hash): Hash pointer.
+   (adjust_simduid_builtins): Add comment.
+   Use simduid as tree.
+   * tree-pretty-print.c (dump_omp_clause): Rename
+   OMP_CLAUSE__SIMDUID__UID to OMP_CLAUSE__SIMDUID__DECL.
+   * tree.h (OMP_CLAUSE__SIMDUID__DECL): Rename from
+   OMP_CLAUSE__SIMDUID__UID.
+
 2013-06-14  Jakub Jelinek  
 
* gimple-pretty-print.c (dump_gimple_omp_for): Don't handle
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 4c866f2..171fdb7 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -227,6 +227,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, 
BT_DFLOAT128, BT_DFLOAT128)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
+DEF_FUNCTION_TYPE_1 (BT_FN_UINT_PTR, BT_UINT, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_ULONG_ULONG, BT_ULONG, BT_ULONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_ULONGLONG_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT16, BT_UINT16, BT_UINT16)
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 6cc9a6c..41677bc 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -176,7 +176,7 @@ struct GTY ((chain_next ("%h.next"))) loop {
 
   /* For SIMD loops, this is a unique identifier of the loop, referenced
  by __builtin_GOMP.simd_vf and __builtin_GOMP.simd_lane builtins.  */
-  unsigned int simduid;
+  tree simduid;
 
   /* True if we should try harder to vectorize this loop.  */
   bool force_vect;
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 8ad2113..ddbe2c1 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -220,6 +220,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, 
"GOMP_single_copy_end",
  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
 
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SIMD_LANE, "GOMP.simd_lane",
- BT_FN_UINT_UINT, ATTR_NOVOPS_NOTHROW_LEAF_LIST)
+ BT_FN_UINT_PTR, ATTR_NOVOPS_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SIMD_VF, "GOMP.simd_vf",
- BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+ BT_FN_UINT_PTR, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a9e2758..731c6d9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2497,7 +2497,6 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, 
gimple_seq *dlist,
   bool copyin_by_ref = false;
   bool lastprivate_firstprivate = false;
   int pass;
-  static int simd_uid;
   bool is_simd = (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
  && gimple_omp_for_kind (ctx->stmt) == GF_OMP_FOR_KIND_SIMD);
   int max_vf = 0;
@@ -2887,15 +2886,15 @@ lower_rec_input_clauses (tr

Re: [c++-concepts] code review

2013-06-17 Thread Gabriel Dos Reis

On Mon, Jun 17, 2013 at 2:20 PM, Jason Merrill  wrote:

>> I have not thought deeply about constrained friend declarations. What
>> would a friend temploid look like?
>
>
> I was thinking something like
>
> template  struct A {
>   T t;
>  requires Addable()
>   friend A operator+(const A& a1, const A& a2)
>   { return A(a1.t + a2.t); }
>
> };

We agreed about a month ago to have something like this
for member functions.  It makes sense that it applies to friend
too (since it already applies to static member functions.)

One caveat in the design is that the nested requirement can
only add to the outer requirements.

-- Gaby

Re: [C++ Patch / RFC] PR 53211

2013-06-17 Thread Paolo Carlini


Hi,

On 06/17/2013 10:30 PM, Jason Merrill wrote:

On 06/17/2013 04:08 PM, Paolo Carlini wrote:

+  if (TREE_CODE (TREE_TYPE (*tp)) == ARRAY_TYPE
+  && !TYPE_DOMAIN (TREE_TYPE (*tp))
+  && DECL_INITIAL (*tp)
+  && type_dependent_expression_p (DECL_INITIAL (*tp)))
+return *tp;


I think this approach makes sense, but it should go in 
type_dependent_expression_p rather than instantiation_dependent_r.


And please revert my fix for 56794, since this should fix one as well.
I see... There is a little difficulty in that 56794 involves a non-type 
variadic parameter and in that case type_dependent_expression_p returns 
false. If I use value_dependent_expression_p things work, but I'm not 
sure it's 100% correct.


I considered replacing it with uses_template_parms?!? Testing is fine 
with it too. It's also fine with both type_dependent_expression_p || 
value_dependent_expression_p.


Eventually, we'll have to decide where we want to commit this: 56794 is 
fixed in 4.7 and 4.8 too, but it's true that the issue with the specific 
testcase attached by Jon isn't a regression.


Thanks,
Paolo.


/cp
2013-06-18

PR c++/53211
* pt.c (type_dependent_expression_p): Handle an array of unknown
bound depending on a variadic parameter.
* parser.c (cp_parser_range_for): Revert PR56794 changes.

/testsuite
2013-06-18

PR c++/53211
* g++.dg/cpp0x/decltype55.C: New.

Index: cp/parser.c
===
--- cp/parser.c (revision 200151)
+++ cp/parser.c (working copy)
@@ -9750,10 +9750,7 @@ cp_parser_range_for (cp_parser *parser, tree scope
range_expr = error_mark_node;
   stmt = begin_range_for_stmt (scope, init);
   finish_range_for_decl (stmt, range_decl, range_expr);
-  if (range_expr != error_mark_node
- && !type_dependent_expression_p (range_expr)
- /* The length of an array might be dependent.  */
- && COMPLETE_TYPE_P (complete_type (TREE_TYPE (range_expr)))
+  if (!type_dependent_expression_p (range_expr)
  /* do_auto_deduction doesn't mess with template init-lists.  */
  && !BRACE_ENCLOSED_INITIALIZER_P (range_expr))
do_range_for_auto_deduction (range_decl, range_expr);
Index: cp/pt.c
===
--- cp/pt.c (revision 200151)
+++ cp/pt.c (working copy)
@@ -20079,6 +20079,26 @@ type_dependent_expression_p (tree expression)
   && VAR_HAD_UNKNOWN_BOUND (expression))
 return true;
 
+  /* An array of unknown bound depending on a variadic parameter, eg:
+
+ template
+   void foo (Args... args)
+   {
+ int arr[] = { args... };
+   }
+
+ template
+   void bar ()
+   {
+ int arr[] = { vals... };
+   }  */
+  if (VAR_P (expression)
+  && TREE_CODE (TREE_TYPE (expression)) == ARRAY_TYPE
+  && !TYPE_DOMAIN (TREE_TYPE (expression))
+  && DECL_INITIAL (expression)
+  && value_dependent_expression_p (DECL_INITIAL (expression)))
+   return true;
+
   if (TREE_TYPE (expression) == unknown_type_node)
 {
   if (TREE_CODE (expression) == ADDR_EXPR)
Index: testsuite/g++.dg/cpp0x/decltype55.C
===
--- testsuite/g++.dg/cpp0x/decltype55.C (revision 0)
+++ testsuite/g++.dg/cpp0x/decltype55.C (working copy)
@@ -0,0 +1,20 @@
+// PR c++/53211
+// { dg-do compile { target c++11 } }
+
+template
+  struct is_same { static const bool value = false; };
+
+template
+  struct is_same { static const bool value = true; };
+
+template
+void func(Args... args)
+{
+  int arr[] = { args... };
+  static_assert (is_same::value, "");
+}
+
+int main()
+{
+  func(1, 2, 3, 4);
+}

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Sandra Loosemore


Earlier today I wrote:


On 06/17/2013 08:41 AM, Joseph S. Myers wrote:

On Mon, 17 Jun 2013, Julian Brown wrote:


IIUC, the incompatibility between the specified
-fstrict-volatile-bitfields behaviour and the C++ memory model is a
recognised deficiency in the ARM EABI. It might be an unpopular
suggestion, but how about disabling the option by default for C++ on
ARM [*]? Maybe even with a "sorry" if the user tries to turn it on
manually?


The C11 and C++11 memory models are the same, this is nothing to do with
C++ as opposed to C.


The problem with always choosing C++ memory model compliance over ABI
compliance, no matter what the setting of -fstrict-volatile-bitfields,
is that some users may have legacy code out there where they really want
the ABI-compliant behavior.  Perhaps we should not make
-fstrict-volatile-bitfields not be the default on any target any more,
but if you supply it explicitly it tells GCC that you really want the
ABI-compliant behavior to override the language-standard-compliant
behavior?


I had another thought:  perhaps -fstrict-volatile-bitfields could remain 
the default on targets where it currently is, but it can be overridden 
by an appropriate -std= option.  Perhaps also GCC could give an error if 
-fstrict-volatile-bitfields is given explicitly with an incompatible 
-std= option.


-Sandra

Re: [PATCH] DATA_ALIGNMENT vs. DATA_ABI_ALIGNMENT (PR target/56564)

2013-06-17 Thread David Edelsohn

On Thu, Jun 13, 2013 at 11:37 AM, Alan Modra  wrote:

> Revised patch with offsettable_ok_by_alignment change, avoiding dumb
> idea of using statement expressions.  This one actually bootstraps and
> passes regression testing.
>
> * config/rs6000/rs6000.h (enum data_align): New.
> (LOCAL_ALIGNMENT, DATA_ALIGNMENT): Use rs6000_data_alignment.
> (DATA_ABI_ALIGNMENT): Define.
> (CONSTANT_ALIGNMENT): Correct comment.
> * config/rs6000/rs6000-protos.h (rs6000_data_alignment): Declare.
> * config/rs6000/rs6000.c (rs6000_data_alignment): New function.

The revised patch, without the DECL_P part is okay.

The original code produced the necessary alignment and neither of us
can find any code in public packages that increases the alignment for
PPC vector types.  While there is the possibility that a user could
encounter an object file produced by an older GCC with less strict
alignment and a version of GCC with this fix would make an incorrect
assumption, this does not seem very likely in practice.

Thanks, David

Re: [PATCH] Re-write LTO type merging again, do tree merging

2013-06-17 Thread Martin Liška

Hello,
I tried to compile LTO kernel with latest gcc, applied patch by
Jan http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57334#c6:

lto1: internal compiler error: in lto_symtab_prevailing_decl, at
lto-symtab.c:644
0x783c63 lto_symtab_prevailing_decl(tree_node*)
../../gcc/lto-symtab.c:644
0x50afe4 lto_fixup_prevailing_decls
../../gcc/lto/lto.c:3220
0x50afe4 lto_fixup_decls
../../gcc/lto/lto.c:3284
0x50afe4 read_cgraph_and_symbols
../../gcc/lto/lto.c:3490
0x50afe4 lto_main()
../../gcc/lto/lto.c:3834

I commented gcc_checking_assert in Jan's patch and compilation was
'almost' successful:
+ sortextable vmlinux
+ /home/marxin/Programming/linux-andikleen/scripts/sortextable vmlinux
Error:
no main_extable_sort_needed symbol in  file: vmlinux

Both vmlinux1 and vmlinux2 were linker without any problem.

ld --version
GNU ld (Linux/GNU Binutils) 2.23.51.0.9.20130118

Build was done with Andi's kernel config and lto-3.9 branch of git repository.

Thanks,
Martin

On 17 June 2013 19:51, Andi Kleen  wrote:
>
> Current trunk cannot build LTO kernels now, with your patch,
> as reported earlier.
>
> Please fix ASAP. I'm surprised that you checked in a patchkit
> with such serious reported problems.
>
> -Andi
>
>
> backup/lsrc/git/linux-lto-2.6/lib/decompress_unlzo.c: In function 'unlzo':
> /backup/lsrc/git/linux-lto-2.6/lib/decompress_unlzo.c:79:8: internal compiler 
> error: in expand_expr_real_1, at expr.c:9361
>   parse += 7;
> ^
> 0x5ea175 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**)
> ../../gcc/gcc/expr.c:9356
> 0x53fe49 expand_expr
> ../../gcc/gcc/expr.h:444
> 0x53fe49 expand_gimple_stmt_1
> ../../gcc/gcc/cfgexpand.c:2283
> 0x53fe49 expand_gimple_stmt
> ../../gcc/gcc/cfgexpand.c:2370
> 0x5408a8 expand_gimple_basic_block
> ../../gcc/gcc/cfgexpand.c:4204
> 0x542996 gimple_expand_cfg
> ../../gcc/gcc/cfgexpand.c:4723
> Please submit a full bug report,
> with preprocessed source if appropriate.
> Please include the complete backtrace with any bug report.
> See  for instructions.
> make[3]: *** [/home/ak/lsrc/git/obj-lto/ccUSzSrf.ltrans18.ltrans.o] Error 1
>
>
>
>
>
>
>
> --
> a...@linux.intel.com -- Speaking for myself only.

Re: [PATCH] Re-write LTO type merging again, do tree merging

2013-06-17 Thread Richard Biener

Andi Kleen  wrote:

>
>Current trunk cannot build LTO kernels now, with your patch, 
>as reported earlier.
>
>Please fix ASAP. I'm surprised that you checked in a patchkit
>with such serious reported problems.

Instructions for reproducing this issue appreciated. I've never built lto 
kernels.

Richard.

>-Andi
>
>
>backup/lsrc/git/linux-lto-2.6/lib/decompress_unlzo.c: In function
>'unlzo':
>/backup/lsrc/git/linux-lto-2.6/lib/decompress_unlzo.c:79:8: internal
>compiler error: in expand_expr_real_1, at expr.c:9361
>  parse += 7;
>^
>0x5ea175 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
>expand_modifier, rtx_def**)
>../../gcc/gcc/expr.c:9356
>0x53fe49 expand_expr
>../../gcc/gcc/expr.h:444
>0x53fe49 expand_gimple_stmt_1
>../../gcc/gcc/cfgexpand.c:2283
>0x53fe49 expand_gimple_stmt
>../../gcc/gcc/cfgexpand.c:2370
>0x5408a8 expand_gimple_basic_block
>../../gcc/gcc/cfgexpand.c:4204
>0x542996 gimple_expand_cfg
>../../gcc/gcc/cfgexpand.c:4723
>Please submit a full bug report,
>with preprocessed source if appropriate.
>Please include the complete backtrace with any bug report.
>See  for instructions.
>make[3]: *** [/home/ak/lsrc/git/obj-lto/ccUSzSrf.ltrans18.ltrans.o]
>Error 1

Re: [C++ Patch / RFC] PR 53211

2013-06-17 Thread Jason Merrill


On 06/17/2013 04:08 PM, Paolo Carlini wrote:

+  if (TREE_CODE (TREE_TYPE (*tp)) == ARRAY_TYPE
+ && !TYPE_DOMAIN (TREE_TYPE (*tp))
+ && DECL_INITIAL (*tp)
+ && type_dependent_expression_p (DECL_INITIAL (*tp)))
+   return *tp;


I think this approach makes sense, but it should go in 
type_dependent_expression_p rather than instantiation_dependent_r.


And please revert my fix for 56794, since this should fix one as well.

Jason

[C++ Patch / RFC] PR 53211

2013-06-17 Thread Paolo Carlini


Hi,

while triaging this PR (the original issue is already fixed) Jonathan 
added to the audit trail the attached testcase, which we are still 
mishandling. It seems to me that something is wrong in 
instantiation_dependent_expression_p: when finish_decltype_type is 
called the first time by the parser, and calls in turns 
instantiation_dependent_expression_p on the VAR_DECL arr, it returns 
*false*, whereas of course the type of arr, as far as the number of 
elements is concerned, depends on the actual variadic Args. Thus I tried 
the patchlet below, which works, but I'm not sure if it's papering over 
a deeper issue, or, in case the approach makes sense in general, whether 
the check should be tweaked somehow. It's what I have at the moment, 
anyway...


Thanks!
Paolo.

//
Index: cp/pt.c
===
--- cp/pt.c (revision 200151)
+++ cp/pt.c (working copy)
@@ -20148,6 +20148,11 @@ instantiation_dependent_r (tree *tp, int *walk_sub
   return NULL_TREE;
 
 case VAR_DECL:
+  if (TREE_CODE (TREE_TYPE (*tp)) == ARRAY_TYPE
+ && !TYPE_DOMAIN (TREE_TYPE (*tp))
+ && DECL_INITIAL (*tp)
+ && type_dependent_expression_p (DECL_INITIAL (*tp)))
+   return *tp;
 case CONST_DECL:
   /* A constant with a dependent initializer is dependent.  */
   if (value_dependent_expression_p (*tp))
Index: testsuite/g++.dg/cpp0x/decltype55.C
===
--- testsuite/g++.dg/cpp0x/decltype55.C (revision 0)
+++ testsuite/g++.dg/cpp0x/decltype55.C (working copy)
@@ -0,0 +1,20 @@
+// PR c++/53211
+// { dg-do compile { target c++11 } }
+
+template
+  struct is_same { static const bool value = false; };
+
+template
+  struct is_same { static const bool value = true; };
+
+template
+void func(Args... args)
+{
+  int arr[] = { args... };
+  static_assert (is_same::value, "");
+}
+
+int main()
+{
+  func(1, 2, 3, 4);
+}

Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Oleg Endo

On Mon, 2013-06-17 at 10:41 +0200, Eric Botcazou wrote:
> > My mistake. It's because arm_legitimize_address cannot re-factor "[r105 +
> > r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]".  Do you
> > suggest that this kind of transformation should be done in backend?  I can
> > think of some disadvantages by doing it in backend:
> > Different kinds of address expression might be wanted in different phase of
> > compilation, for example, sometimes GCC wants to force computation of
> > address expression out of memory access because it cannot CSE array
> > indexing.  It's difficult to differentiate in backend.
> 
> It's equally difficult in memory_address_addr_space and it affects all the 
> architectures at once...  You said in the original message that Thumb2 and 
> ARM 
> modes are already not equally sensitive to it, so it's not unthinkable that 
> some architectures might not want it at all.  Therefore, yes, this should 
> preferably be handled in arm_legitimize_address.

My observation is, that legitimizing addressing modes in the backend by
looking at one isolated address works, but doesn't give good results.
In the SH backend there is this a particular case with displacement
addressing (register + constant).  On SH displacements for byte
addressing are 0..15, 0..31 for 16 bit words and 0..63 for 32 bit words.
sh_legitimize_address (or rather sh_find_mov_disp_adjust) uses a fixed
heuristic to satisfy the displacement constraint and splits out a plus
insn if needed to adjust the base address.  Of course that fixed
heuristic doesn't work for some cases and thus sometimes results in
unnecessary base address adjustments.
I had the idea of replacing the current (partially defunct) auto-inc-dec
RTL pass with a more generic addressing mode selection pass:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56590

Any suggestions/comments/... are highly appreciated.

Cheers,
Oleg

Re: [c++-concepts] code review

2013-06-17 Thread Jason Merrill


On 06/17/2013 02:10 PM, Andrew Sutton wrote:

You mean you don't need  anymore in logic.cc?  I think we want
the  include in general if we're going to support people using the
C++ standard library.


I don't. Those decisions are above my pay grade, so I'm doing my best
not to make them.


If you don't need the change for concepts any more, feel free to drop it.


Can friend temploids be constrained?


I have not thought deeply about constrained friend declarations. What
would a friend temploid look like?


I was thinking something like

template  struct A {
  T t;
 requires Addable()
  friend A operator+(const A& a1, const A& a2)
  { return A(a1.t + a2.t); }
};


I'm not clear on the issue.  Perhaps leaving processing_template_decl alone
and using fold_non_dependent_expr would accomplish what you want?


I don't think that will work. The problem comes from the instantiation
of traits (and some other expressions) during constraint checking.
When processing_template_decl is non-zero, finish_trait_expr will
create TRAIT_EXPR nodes, which aren't handled by the constexpr
evaluation engine.


Sure, but fold_non_dependent_expr should turn the TRAIT_EXPR into a more 
useful form.



Can explicit specializations have constraints, to indicate which template
they are specializing?


Good question. I don't think so. I believe that it would be a
specialization of the most specialized function template whose
constraints were satisfied. So:


Makes sense.


Passing 'true' for require_all_args means no deduction will be done; rather,
all arguments must either be specified or have default arguments.


I see. It seems like I should be passing false here, since I want to
ensure that the resulting argument list can be used to instantiate the
template.


I think true is what you want, since there are no function arguments to 
do argument deduction from.  Passing true for require_all_args 
guarantees that the result can be used to instantiate the template; 
passing false can return an incomplete set of arguments that will be 
filled in later by fn_type_unification.


Jason

[ping**2] Nios II port

2013-06-17 Thread Sandra Loosemore

Ping?  I think these are the most recent versions of the unreviewed 
patches in the series:


http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00287.html
http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00760.html
http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01085.html

There are also these two parts that have been reviewed already:
http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01329.html  [approved but 
not applied yet?]
http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01087.html  [needs minor 
cleanup and separate consideration]


-Sandra

Re: [C/C++ PATCH] Handle COMPOUND_EXPR in convert_to_* (PR c++/56493)

2013-06-17 Thread Jason Merrill


On 06/17/2013 02:09 PM, Jakub Jelinek wrote:

On Mon, Jun 17, 2013 at 01:44:25PM -0400, Jason Merrill wrote:

On 06/17/2013 12:54 PM, Joseph S. Myers wrote:

I suppose it's OK to fix the regression, though really we should be
eliminating these early convert_to_* optimizations (optimizing later on
GIMPLE if possible) rather than adding to them.


My thought as well.  How hard is it to fix this in gimple fold?


PRs for it are open for almost 3 years now, I think Kai has some patch, but
it hasn't been posted yet, so it is unclear if it would make 4.9.


Kai, what's the status of this patch?

Jason

Re: [patch] gcov intermediate format

2013-06-17 Thread Sharad Singhai

Ping.

Thanks,
Sharad


On Wed, Jun 5, 2013 at 11:18 AM, Sharad Singhai  wrote:
> Ping.
>
> Thanks,
> Sharad
>
>
> On Tue, May 28, 2013 at 11:35 AM, Sharad Singhai  wrote:
>> Sorry, my patch had bad formatting in one of the functions
>> (output_gcov_file). Here is the corrected version.
>>
>> Thanks,
>> Sharad
>>
>> (2013-05-28
>>
>> * gcov.c (print_usage): Handle new option.
>> (process_args): Ditto.
>> (get_gcov_intermediate_filename): New function.
>> (output_intermediate_file): New function.
>> (output_gcov_file): New function
>> (generate_results): Handle new option.
>> (release_function): Relase demangled name.
>> (read_graph_file): Handle demangled name.
>> (output_lines): Ditto.
>> * doc/gcov.texi: Document gcov intermediate format.
>>
>> testsuite/ChangeLog:
>>
>> 2013-05-28   Sharad Singhai  
>>
>> * g++.dg/gcov/gcov-8.C: New testcase.
>> * lib/gcov.exp: Handle intermediate format.
>>
>> Index: doc/gcov.texi
>> ===
>> --- doc/gcov.texi (revision 199273)
>> +++ doc/gcov.texi (working copy)
>> @@ -122,15 +122,17 @@ gcov [@option{-v}|@option{--version}] [@option{-h}
>>   [@option{-a}|@option{--all-blocks}]
>>   [@option{-b}|@option{--branch-probabilities}]
>>   [@option{-c}|@option{--branch-counts}]
>> - [@option{-u}|@option{--unconditional-branches}]
>> + [@option{-d}|@option{--display-progress}]
>> + [@option{-f}|@option{--function-summaries}]
>> + [@option{-i}|@option{--intermediate-format}]
>> + [@option{-l}|@option{--long-file-names}]
>> + [@option{-m}|@option{--demangled-names}]
>>   [@option{-n}|@option{--no-output}]
>> - [@option{-l}|@option{--long-file-names}]
>> + [@option{-o}|@option{--object-directory} @var{directory|file}]
>>   [@option{-p}|@option{--preserve-paths}]
>>   [@option{-r}|@option{--relative-only}]
>> - [@option{-f}|@option{--function-summaries}]
>> - [@option{-o}|@option{--object-directory} @var{directory|file}]
>>   [@option{-s}|@option{--source-prefix} @var{directory}]
>> - [@option{-d}|@option{--display-progress}]
>> + [@option{-u}|@option{--unconditional-branches}]
>>   @var{files}
>>  @c man end
>>  @c man begin SEEALSO
>> @@ -232,6 +234,50 @@ Unconditional branches are normally not interestin
>>  @itemx --display-progress
>>  Display the progress on the standard output.
>>
>> +@item -i
>> +@itemx --intermediate-format
>> +Output gcov file in an easy-to-parse intermediate text format that can
>> +be used by @command{lcov} or other tools. The output is a single
>> +@file{.gcov} file per @file{.gcda} file. No source code is required.
>> +
>> +The format of the intermediate @file{.gcov} file is plain text with
>> +one entry per line
>> +
>> +@smallexample
>> +file:@var{source_file_name}
>> +function:@var{line_number},@var{execution_count},@var{function_name}
>> +lcount:@var{line number},@var{execution_count}
>> +branch:@var{line_number},@var{branch_coverage_type}
>> +
>> +Where the @var{branch_coverage_type} is
>> +   notexec (Branch not executed)
>> +   taken (Branch executed and taken)
>> +   nottaken (Branch executed, but not taken)
>> +
>> +There can be multiple @var{file} entries in an intermediate gcov
>> +file. All entries following a @var{file} pertain to that source file
>> +until the next @var{file} entry.
>> +@end smallexample
>> +
>> +Here is a sample when @option{-i} is used in conjuction with
>> @option{-b} option:
>> +
>> +@smallexample
>> +file:array.cc
>> +function:11,1,_Z3sumRKSt6vectorIPiSaIS0_EE
>> +function:22,1,main
>> +lcount:11,1
>> +lcount:12,1
>> +lcount:14,1
>> +branch:14,taken
>> +lcount:26,1
>> +branch:28,nottaken
>> +@end smallexample
>> +
>> +@item -m
>> +@itemx --demangled-names
>> +Display demangled function names in output. The default is to show
>> +mangled function names.
>> +
>>  @end table
>>
>>  @command{gcov} should be run with the current directory the same as that
>> Index: gcov.c
>> ===
>> --- gcov.c (revision 199273)
>> +++ gcov.c (working copy)
>> @@ -37,6 +37,7 @@ along with Gcov; see the file COPYING3.  If not se
>>  #include "intl.h"
>>  #include "diagnostic.h"
>>  #include "version.h"
>> +#include "demangle.h"
>>
>>  #include 
>>
>> @@ -168,6 +169,7 @@ typedef struct function_info
>>  {
>>/* Name of function.  */
>>char *name;
>> +  char *demangled_name;
>>unsigned ident;
>>unsigned lineno_checksum;
>>unsigned cfg_checksum;
>> @@ -325,6 +327,14 @@ static int flag_gcov_file = 1;
>>
>>  static int flag_display_progress = 0;
>>
>> +/* Output *.gcov file in intermediate format used by 'lcov'.  */
>> +
>> +static int flag_intermediate_format = 0;
>> +
>> +/* Output demangled function names.  */
>> +
>> +static int flag_demangled_names = 0;
>> +
>>  /* For included files, make the gcov output file name include the name
>> of the input source file.  For example, if x.h is included in a.c,
>> the

[PATCH] New test to Cilk Plus array notation suite

2013-06-17 Thread Iyer, Balaji V

Hello Everyone,
I am adding a new execution test to the array notation suite. In array 
notation's __sec_reduce_max_ind and __sec_reduce_min_ind builtin functions, if 
there is a  tie for max/min value, then the higher index should be returned. In 
this case, all the values are the same, so it should return the highest index 
of the array (i.e. size-1).

I am going to check this in as obvious. I am willing to revert this patch (or 
fix it) if anyone has any objections.

Here is the ChangeLog for it.

2013-06-17  Balaji V. Iyer  

* c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c: New test.

Here is the patch:

Index: gcc/testsuite/c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c
===
--- gcc/testsuite/c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c 
(revision 0)
+++ gcc/testsuite/c-c++-common/cilk-plus/AN/sec_reduce_ind_same_value.c 
(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do run } */
+/* { dg-options "-fcilkplus" } */
+
+int A[256];
+
+int main () {
+A[:] = 2;
+int max_index = 0, min_index = 0;
+
+max_index = __sec_reduce_max_ind (A[:]);
+
+if (max_index != 255)
+  return 1;
+
+min_index = __sec_reduce_min_ind (A[:]);
+if (min_index != 255)
+  return 2;
+
+return 0;
+}
+

Thanks,

Balaji V. Iyer.

Re: [c++-concepts] code review

2013-06-17 Thread Andrew Sutton

>> 2. I left the  include in system.h, because removing it will
>> result in errors due to an inclusion of . It's probable
>> that both can be removed, but I didn't test it with this patch.
>
>
> You mean you don't need  anymore in logic.cc?  I think we want
> the  include in general if we're going to support people using the
> C++ standard library.

I don't. Those decisions are above my pay grade, so I'm doing my best
not to make them.



>> +  reason = template_unification_error_rejection ();
>
>
> Can't you do
>
>   template_constraint_failure (DECL_TI_TEMPLATE (fn), DECL_TI_ARGS (fn))

I can indeed. Fixed.


>>if (fn == error_mark_node)
>>  {
>> +  if (!check_template_constraints (tmpl, targs))
>> +  reason = template_constraint_failure (tmpl, targs);
>> +  else if (errorcount+sorrycount == errs)
>>/* Don't repeat unification later if it already resulted in errors.
>> */
>
>
> A constraint failure is a form of unification failure, like SFINAE; let's
> handle it in that context rather than separately here, and reserve
> rr_constraint_failure for the case of a non-template member function.

And emit the diagnostics in fn_type_unification when explain_p is set?
Seems reasonable.


> The uses of "actual template" in the comment and function name seem to mean
> different things.  Maybe call the function template_decl_for_candidate?

Fixed.

> Can friend temploids be constrained?

I have not thought deeply about constrained friend declarations. What
would a friend temploid look like?


> Seems like the comment is no longer accurate; the function now only returns
> NULL_TREE if both a and b are NULL_TREE.

It's not. And I removed disjoin_requirements. I don't think it will
ever be used.

>> +static tree
>> +get_constraint_check (tree call)
>
> The comment needs updating.  And the function isn't used anywhere; it seems
> redundant with resolve_constraint_check.

I removed the function.


>> +// and other template nodes from generating new expressions
>> +// during instantiation.
>
> I'm not clear on the issue.  Perhaps leaving processing_template_decl alone
> and using fold_non_dependent_expr would accomplish what you want?

I don't think that will work. The problem comes from the instantiation
of traits (and some other expressions) during constraint checking.
When processing_template_decl is non-zero, finish_trait_expr will
create TRAIT_EXPR nodes, which aren't handled by the constexpr
evaluation engine.

I could updated constexpr to fix that, or changed finish_trait_expr
(and others? noexcept maybe) to evaluate for non-dependent arguments,
but I was trying to limit the scope of my changes.

I'd really like for this structure to go away. It's very fragile.


>> +// Check the template's constraints for the specified arguments,
>> +// returning true if the constraints are satisfied and false
>> +// otherwise.
>> +bool
>> +check_template_constraints (tree t, tree args)
>> +{
>> +  return check_constraints (get_constraints (t), args);
>> +}
>
>
> Seems like the last function would also work for types and non-template
> decls.

It should. Updated the comment to reflect the broader applicability of
the function.


>> +  error_at (loc, "constraints not satisfied");
>
>
> I assume you're planning to make this more verbose?  It could use a TODO to
> that effect.

Very much so. Added comments.


>> +  else if (are_constrained_overloads (newdecl, olddecl))
>> +{
>> +  // If newdecl and olddecl are function templates with the
>> +  // same type but different constraints, then they cannot be
>> +  // duplicate declarations. However, if the olddecl is declared
>> +  // as a concept, then the program is ill-formed.
>> +  if (DECL_DECLARED_CONCEPT_P (DECL_TEMPLATE_RESULT (olddecl)))
>> +{
>> +  error ("cannot specialize concept %q#D", olddecl);
>> +  return error_mark_node;
>> +}
>> +  return NULL;
>> +}
>
>
> We should check for differing constraints in decls_match, and then this code
> should be in the !types_match section of duplicate_decls.

I'll get back to you on this. I need to work my way through that code.


>
>>error ("concept %q#D result must be convertible to bool", fn);
>
>
> Why don't we require the result to actually be bool?  It seems difficult to
> decompose dependent requirements if the return type can be something else.

I'm probably sitting somewhere between two ideas. Requiring the result
to be exactly bool is fine.


>> +  cp_lexer_next_token_is_keyword (parser->lexer,
>> RID_REQUIRES))
>>{
>>  tree reqs = release (current_template_reqs);
>> -if (tree r = cp_parser_requires_clause_opt (parser))
>> +if (tree r = cp_parser_requires_clause (parser))
>
>
> I was thinking to change the name of cp_requires_clause to
> cp_parser_requires_clause_opt and change the assert to a test for the
> keyword and return NULL_TREE if it isn't found.

I s

Re: [C/C++ PATCH] Handle COMPOUND_EXPR in convert_to_* (PR c++/56493)

2013-06-17 Thread Jakub Jelinek

On Mon, Jun 17, 2013 at 01:44:25PM -0400, Jason Merrill wrote:
> On 06/17/2013 12:54 PM, Joseph S. Myers wrote:
> >I suppose it's OK to fix the regression, though really we should be
> >eliminating these early convert_to_* optimizations (optimizing later on
> >GIMPLE if possible) rather than adding to them.
> 
> My thought as well.  How hard is it to fix this in gimple fold?

PRs for it are open for almost 3 years now, I think Kai has some patch, but
it hasn't been posted yet, so it is unclear if it would make 4.9.
In any case, it isn't probably something that can be backported to release
branches, while perhaps this simple convert.c change is.

Once we have it in GIMPLE, sure, the change in convert.c can be happily
reverted and also those optimizations removed from there too.

Jakub

Re: [PATCH] Re-write LTO type merging again, do tree merging

2013-06-17 Thread Andi Kleen


Current trunk cannot build LTO kernels now, with your patch, 
as reported earlier.

Please fix ASAP. I'm surprised that you checked in a patchkit
with such serious reported problems.

-Andi


backup/lsrc/git/linux-lto-2.6/lib/decompress_unlzo.c: In function 'unlzo':
/backup/lsrc/git/linux-lto-2.6/lib/decompress_unlzo.c:79:8: internal compiler 
error: in expand_expr_real_1, at expr.c:9361
  parse += 7;
^
0x5ea175 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**)
../../gcc/gcc/expr.c:9356
0x53fe49 expand_expr
../../gcc/gcc/expr.h:444
0x53fe49 expand_gimple_stmt_1
../../gcc/gcc/cfgexpand.c:2283
0x53fe49 expand_gimple_stmt
../../gcc/gcc/cfgexpand.c:2370
0x5408a8 expand_gimple_basic_block
../../gcc/gcc/cfgexpand.c:4204
0x542996 gimple_expand_cfg
../../gcc/gcc/cfgexpand.c:4723
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.
make[3]: *** [/home/ak/lsrc/git/obj-lto/ccUSzSrf.ltrans18.ltrans.o] Error 1







-- 
a...@linux.intel.com -- Speaking for myself only.

Re: GCC does not support *mmintrin.h with function specific opts

2013-06-17 Thread Sriraman Tallam

On Fri, Jun 14, 2013 at 11:08 AM, Sriraman Tallam  wrote:
> On Fri, Jun 14, 2013 at 1:43 AM, Richard Biener
>  wrote:
>> On Fri, Jun 14, 2013 at 4:52 AM, Sriraman Tallam  wrote:
>>> On Thu, Jun 13, 2013 at 12:40 PM, Jan Hubicka  wrote:
>   * tree-inline.c (expand_call_inline): Allow the error to be flagged
>   in early inline pass.
>   * ipa-inline.c (inline_always_inline_functions): Pretend 
> always_inline
>   functions are inlined during failures to flag an error.
>   * gcc.target/i386/inline_error.c: New test.
>>>
 This patch is OK if it passes testing.
>>>
>>> Two tests gcc.c-torture/compile/pr43791.c and pr44043.c are failing
>>> because of always_inline functions being present that cannot be
>>> inlined and the compiler is now generating error messages. I will fix
>>> them and resend the patch.
>>
>> Quick look - pr43791.c is not expected to work at -O0, so skip -O0
>> for example by guarding the whole thing with #if __OPTIMIZED__ > 0.
>> Similar for pr44043.c.
>
> Seems like __OPTIMIZED__ is not defined in my config, dont know why. I
> see other tests checking for __OPTIMIZED.
>
> I fixed these two tests by adding a dg-prune-output at the end. Is
> that reasonable? I have attached the patch with all tests passing now.

I have attached the latest patch with a small error in the previous
patch fixed. I will submit this patch if there are no objections.

Thanks
Sri

>
> Thanks
> Sri
>
>>
>> That we didn't error at -O0 before is a bug.  Eventually I was suggesting
>> to error if we end up outputting the body of an always_inline function,
>> that is, any uses remain (including indirect calls or address-takens which
>> is where we do not error right now).
>>
>> Richard.
>>
>>> Thanks
>>> Sri
>>>
 Thanks for your patience!
>>>
>>>
>>>

 Honza
* tree-inline.c (expand_call_inline): Allow the error to be flagged
in early inline pass.
* ipa-inline.c (inline_always_inline_functions): Pretend always_inline
functions are inlined during failures to flag an error.
* gcc.target/i386/inline_error.c: New test.
* gcc.c-torture/compile/pr44043.c: Fix test to expect an error.
* gcc.c-torture/compile/pr43791.c: Fix test to expect an error.

Index: testsuite/gcc.c-torture/compile/pr44043.c
===
--- testsuite/gcc.c-torture/compile/pr44043.c   (revision 200034)
+++ testsuite/gcc.c-torture/compile/pr44043.c   (working copy)
@@ -85,3 +85,5 @@ int raw_sendmsg(struct sock *sk, struct msghdr *ms
 {
   raw_send_hdrinc(sk, msg->msg_iov, len, (void *)0, msg->msg_flags);
 }
+
+/* { dg-prune-output "(inlining failed in call to always_inline.*indirect 
function call with a yet undetermined callee|called from here)" } */
Index: testsuite/gcc.c-torture/compile/pr43791.c
===
--- testsuite/gcc.c-torture/compile/pr43791.c   (revision 200034)
+++ testsuite/gcc.c-torture/compile/pr43791.c   (working copy)
@@ -1,3 +1,4 @@
+
 int owner();
 int clear();
 
@@ -18,4 +19,4 @@ void fasttrylock(void (*slowfn)()) {
 void trylock(void) {
  fasttrylock(slowtrylock);
 }
-
+/* { dg-prune-output "(inlining failed in call to always_inline.*indirect 
function call with a yet undetermined callee|called from here)" } */
Index: testsuite/gcc.target/i386/inline_error.c
===
--- testsuite/gcc.target/i386/inline_error.c(revision 0)
+++ testsuite/gcc.target/i386/inline_error.c(revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -mno-popcnt" } */
+
+inline int __attribute__ ((__gnu_inline__, __always_inline__, 
target("popcnt")))
+foo () /* { dg-error "inlining failed in call to always_inline .* target 
specific option mismatch" } */
+{
+  return 0;
+}
+
+int bar()
+{
+  return foo (); /* { dg-error "called from here" } */
+}
Index: ipa-inline.c
===
--- ipa-inline.c(revision 200034)
+++ ipa-inline.c(working copy)
@@ -1911,7 +1911,15 @@ inline_always_inline_functions (struct cgraph_node
}
 
   if (!can_early_inline_edge_p (e))
-   continue;
+   {
+ /* Set inlined to true if the callee is marked "always_inline" but
+is not inlinable.  This will allow flagging an error later in
+expand_call_inline in tree-inline.c.  */
+ if (lookup_attribute ("always_inline",
+DECL_ATTRIBUTES (callee->symbol.decl)) != NULL)
+   inlined = true;
+ continue;
+   }
 
   if (dump_file)
fprintf (dump_file, "  Inlining %s into %s (always_inline).\n",
Index: tree-inline.c
===
--- tree-inline.c   (revision 200034)
+++ tree-inline.c   (working copy)
@@ -3905,8 +3905,6 @@ expand_call_inlin

Re: [C/C++ PATCH] Handle COMPOUND_EXPR in convert_to_* (PR c++/56493)

2013-06-17 Thread Jason Merrill


On 06/17/2013 12:54 PM, Joseph S. Myers wrote:

I suppose it's OK to fix the regression, though really we should be
eliminating these early convert_to_* optimizations (optimizing later on
GIMPLE if possible) rather than adding to them.


My thought as well.  How hard is it to fix this in gimple fold?

Jason

Re: [patch i386]: RFC enable inlining for function with machine-attributes

2013-06-17 Thread Kai Tietz

2013/6/17 Richard Henderson :
> On 06/16/2013 11:55 PM, Kai Tietz wrote:
>> +static bool
>> +ix86_function_attribute_inlinable_p (const_tree fndecl ATTRIBUTE_UNUSED)
>> +{
>> +  return true;
>> +}
>
> This is hook_bool_const_tree_true.

Right, we could define macro to this ...

> I have an idea that perhaps the default ought to be true, and that the few
> targets (like mep) that have an interrupt attribute, etc ought to be the ones
> to actually implement this hook.

Yes, that was actual the cause why I did a RFC on that patch.  I was
concerned to break some targets by changing the default.  I admit that
changing default would be the preferred solution for me, too.

>
> r~

Kai

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jason Merrill


On 06/17/2013 10:18 AM, Jan Hubicka wrote:

Does ODR hold also for extern "C" declarations in C++?


Yes.


In C, you can treat all structurally identical types as the same, right?

In next step we can sort out other reasons they
are not merged by Richard's merging code but stil equivalent in C++ sense.
There are easy cases to list - i.e. we will not merge class type from unit that
defines the keyed method with class type from other unit since the associated
binfo will differ in types of the declarations (DECL_EXTERNAL versus
TREE_PUBLIC+STATIC).


I'm surprised the code wouldn't handle that; don't you handle merging 
extern function decls with definitions outside of classes?



OK, thanks!  I naively assumed that checking DECL_NAME for NULL is enough
here. I will fix this.


Ah, yes, that ought to work.  I was overlooking that bit.


The question is when will two types be "ODR equivalent" but not
"tree equivalent".  Certainly there is no such thing as "ODR" for C,
so given C++ manually implemented in C you cannot expect same
"vtable" contents when the types have the same name.



yes, that is why I think we need flag specifying if type is ODR or not.


I'm not sure we do.  A manual implementation wouldn't have the "vtable" 
specially associated with the type, so that shouldn't be an issue.


Jason

Re: [PATCH] Re-write LTO type merging again, do tree merging

2013-06-17 Thread Andi Kleen

On Mon, Jun 17, 2013 at 10:12:41AM +0200, Jan Hubicka wrote:
> > > CPU: AMD64 family10, speed 2100 MHz (estimated)
> > > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a 
> > > unit mask of 0x00 (No unit mask) count 75
> > > samples  %app name symbol name
> > > 4504711.7420  lto1 inflate_fast
> > 
> > It might be worth changing LTO section layout to include a header
> > that specifies whether a section is compressed or not so we can
> > allow mixed compressed/uncompressed sections in the LTRANS files
> > and avoid decompressing the function sections.
> 
> Yes, but this profile shows only decl streaming. Functions do not really show
> up in profile.  I guess only way to cut this down is to either use LZO that
> is faster at decompression side and/or reduce amount of data we stream to .o
> files.

Or better snappy / snappy-c. It's faster at compression than LZO and 
comparable at decompression.

-Andi

Re: [PATCH] ggc-page.c : remove erroneous ATTRIBUTE_UNUSED

2013-06-17 Thread David Malcolm

On Mon, 2013-06-17 at 11:18 +0200, Richard Biener wrote:
> On Fri, Jun 14, 2013 at 11:17 PM, David Malcolm  wrote:
> > ggc_pch_write_object's parameter "d" is marked with ATTRIBUTE_UNUSED,
> > but in fact it is used in 4 places at the end of the function.
> >
> > Successfully bootstrapped on x86_64-unknown-linux-gnu.
> >
> > OK for trunk?
> 
> Ok.
Thanks; committed to svn trunk as r200154.

Re: [gomp4] Some progress on #pragma omp simd

2013-06-17 Thread Richard Henderson

On 06/17/2013 10:13 AM, Aldy Hernandez wrote:
> -   data.simduid = tree_low_cst (gimple_call_arg (stmt, 0), 1);
> +   data.simduid = gimple_call_arg (stmt, 0);

Doesn't this copy the ADDR_EXPR from the call into simduid?

>  simduid_to_vf::hash (const value_type *p)
>  {
> -  return p->simduid;
> +  return htab_hash_pointer (p->simduid);

... at which point this bit is meaningless since all ADDR_EXPRs must of course
have different pointers.

I think we should validate the DECL_P extracted from the call_arg, and store
that.  The hash should use DECL_UID to minimize hash variation due to memory
layout.

r~

Re: [gomp4] Some progress on #pragma omp simd

2013-06-17 Thread Aldy Hernandez


On 06/12/13 16:36, Jakub Jelinek wrote:

On Wed, Jun 12, 2013 at 10:38:00AM -0700, Richard Henderson wrote:

On 06/12/2013 10:30 AM, Jakub Jelinek wrote:

So the built-ins would take address of this decl, something else?


Perhaps address, perhaps just referenced uninitialized?


True, assuming no pass would actually want to change that SSA_NAME of the
magic decl just because it is undefined (coalesce with some other undefined
SSA_NAME or something similar).  I hope nothing does that, it would be
problematic for the uninitialized warning pass too I bet.


Gentlemen:

Attached is a lightly tested patch rewriting the aforementioned UID to 
use the address of a dummy DECL (see the lower_rec_input_clauses fragment).


I first tried an uninitialized reference but into-SSA happens after 
lower_rec_input_clauses, and the DECL gets rewritten into SSA, while 
loop->simduid maintains the original DECL.  This causes all the equality 
checks throughout to fail.  I thought of changing the SSA pass to update 
loop->simduid as well, but that seemed kludgy.  I didn't think about it 
too hard, perhaps there's a more elegant way.


Anyway...this at least works with my contrived reduction tests for Cilk 
Plus pragma simd.


If this is what you want, I can start looking at how this behaves with 
LTO and inlining.


Aldy
diff --git a/gcc/ChangeLog.gomp b/gcc/ChangeLog.gomp
index 7f9151d..0478ab2 100644
--- a/gcc/ChangeLog.gomp
+++ b/gcc/ChangeLog.gomp
@@ -1,3 +1,20 @@
+2013-06-17  Aldy Hernandez  
+
+   * builtin-types.def (BT_FN_UINT_PTR): New.
+   * omp-builtins.def (BUILT_IN_GOMP_SIMD_LANE): Use it.
+   (BUILT_IN_GOMP_SIMD_VF): Same.
+   * cfgloop.h (struct loop): Change type of simduid to tree.
+   * omp-low.c (lower_rec_input_clauses): Adapt to use simduid as a
+   tree.
+   (expand_omp_simd): Same.
+   * tree-data-ref.c (get_references_in_stmt): Same.
+   * tree-vect-data-refs.c (vect_analyze_data_refs): Same.
+   * tree-vectorizer.c (struct simduid_to_vf): Change type of simduid
+   to tree.
+   (simduid_to_vf::hash): Hash pointer.
+   (adjust_simduid_builtins): Add comment.
+   Use simduid as tree.
+
 2013-06-14  Jakub Jelinek  
 
* gimple-pretty-print.c (dump_gimple_omp_for): Don't handle
diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 4c866f2..171fdb7 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -227,6 +227,7 @@ DEF_FUNCTION_TYPE_1 (BT_FN_DFLOAT128_DFLOAT128, 
BT_DFLOAT128, BT_DFLOAT128)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_VPTR, BT_VOID, BT_VOLATILE_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_VOID_PTRPTR, BT_VOID, BT_PTR_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT_UINT, BT_UINT, BT_UINT)
+DEF_FUNCTION_TYPE_1 (BT_FN_UINT_PTR, BT_UINT, BT_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_ULONG_ULONG, BT_ULONG, BT_ULONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_ULONGLONG_ULONGLONG, BT_ULONGLONG, BT_ULONGLONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_UINT16_UINT16, BT_UINT16, BT_UINT16)
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 6cc9a6c..41677bc 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -176,7 +176,7 @@ struct GTY ((chain_next ("%h.next"))) loop {
 
   /* For SIMD loops, this is a unique identifier of the loop, referenced
  by __builtin_GOMP.simd_vf and __builtin_GOMP.simd_lane builtins.  */
-  unsigned int simduid;
+  tree simduid;
 
   /* True if we should try harder to vectorize this loop.  */
   bool force_vect;
diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def
index 8ad2113..ddbe2c1 100644
--- a/gcc/omp-builtins.def
+++ b/gcc/omp-builtins.def
@@ -220,6 +220,6 @@ DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SINGLE_COPY_END, 
"GOMP_single_copy_end",
  BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
 
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SIMD_LANE, "GOMP.simd_lane",
- BT_FN_UINT_UINT, ATTR_NOVOPS_NOTHROW_LEAF_LIST)
+ BT_FN_UINT_PTR, ATTR_NOVOPS_NOTHROW_LEAF_LIST)
 DEF_GOMP_BUILTIN (BUILT_IN_GOMP_SIMD_VF, "GOMP.simd_vf",
- BT_FN_UINT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST)
+ BT_FN_UINT_PTR, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a9e2758..84448a7 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2497,7 +2497,6 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, 
gimple_seq *dlist,
   bool copyin_by_ref = false;
   bool lastprivate_firstprivate = false;
   int pass;
-  static int simd_uid;
   bool is_simd = (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
  && gimple_omp_for_kind (ctx->stmt) == GF_OMP_FOR_KIND_SIMD);
   int max_vf = 0;
@@ -2887,15 +2886,17 @@ lower_rec_input_clauses (tree clauses, gimple_seq 
*ilist, gimple_seq *dlist,
 
   if (lane)
 {
-  tree uid_cst = build_int_cst (unsigned_type_node, ++simd_uid);
+  tree uid = create_tmp_var (ptr_type_node, "simduid");
+  TREE_ADDRESSABLE (uid) = 1;
+  uid = build_fold_addr_expr (uid);
   gimple g
= gimple_build_call (builtin_decl_explicit (BUILT_IN_GOMP_SIMD_LAN

Re: [PATCH] Cilk Plus Array Notation for C++

2013-06-17 Thread Richard Henderson

On 06/17/2013 10:00 AM, Iyer, Balaji V wrote:
> In hindsight, I could have for __sec_reduce_max and __sec_reduce_min. I was
> more familiar with conditional expression. Out of curiosity, is there a big
> performance benefit of using max/min expr over conditional?

There can be.  The COND->MIN/MAX transformation is not done without the
-ffinite-math-only component of -ffast-math.  I.e. we don't try the transform
when NaNs are a possibility.

So, yes, you probably should generate MIN/MAX_EXPR right from the start.


r~

RE: [PATCH] Cilk Plus Array Notation for C++

2013-06-17 Thread Iyer, Balaji V



> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Richard Henderson
> Sent: Thursday, June 13, 2013 12:19 PM
> To: Aldy Hernandez
> Cc: Iyer, Balaji V; gcc-patches@gcc.gnu.org; Jason Merrill (ja...@redhat.com)
> Subject: Re: [PATCH] Cilk Plus Array Notation for C++
> 
> On 06/13/2013 09:11 AM, Aldy Hernandez wrote:
> > The whole slew of these cases have a lot of duplicated code.  For
> > instance, BUILT_IN_CILKPLUS_SEC_REDUCE_MIN is the same as
> > BUILT_IN_CILKPLUS_SEC_REDUCE_MAX, the only difference being GT_EXPR
> vs
> > LT_EXPR.  Surely you could do something like:
> >
> > if (an_type == BUILT_IN_CILKPLUS_SEC_REDUCE_MIN
> > || an_type == BUILT_IN_CILKPLUS_SEC_REDUCE_MAX) {
> >enum tree_code code;
> >if (an_type == BUILT_IN_CILKPLUS_SEC_REDUCE_MIN)
> >  code = GT_EXPR;
> >else
> >  code = LT_EXPR;
> >// stuff
> > }
> >
> > The compiler should be able to optimize the above, but even if it
> > couldn't, I am willing to compare twice and save lines and lines of code.
> >
> > Similarly for SEC_REDUCE_ANY_ZERO/SEC_REDUCE_ANY_NONZERO,
> > SEC_REDUCE_ALL_ZERO/SEC_REDUCE_ALL_NONZERO,
> > SEC_REDUCE_ADD/SEC_REDUCE_MUL, etc etc etc.
> 
> Yep.  It's at this point that I normally start using the idiom
> 
>   switch (an_type)
> {
> case BUILT_IN_CILKPLUS_SEC_REDUCE_MIN:
>   code = GT_EXPR;
>   goto do_min_max;
> case BUILT_IN_CILKPLUS_SEC_REDUCE_MAX:
>   code = LT_EXPR;
>   goto do_min_max;
> do_min_max:
>   // stuff
> 
> So much the better if you can include the other SEC_* cases in that switch 
> too,
> doing one compiler-controlled dispatch.

I didn't use goto, but I did try to abstract out the common parts and either 
tried to lump them together in the same case or  put them outside of the case. 
The new patch is here: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00984.html

> 
> It also occurs to me to wonder why you're building your own COND_EXPR here,
> with the comparison, rather than using MIN/MAX_EXPR...

In hindsight, I could have for __sec_reduce_max and __sec_reduce_min. I was 
more familiar with conditional expression. Out of curiosity, is there a big 
performance benefit of using max/min expr over conditional?


Thanks,

Balaji V. Iyer.

> 
> 
> r~

Re: [C/C++ PATCH] Handle COMPOUND_EXPR in convert_to_* (PR c++/56493)

2013-06-17 Thread Joseph S. Myers

On Mon, 17 Jun 2013, Jakub Jelinek wrote:

> The following patch shows a performance regression caused by the C++ changes
> to evaluate side-effects in x += foo (); first and only then do the
> addition.  Previously if x was say int and foo () returned unsigned long
> long, convert_to_integer would narrow the addition to unsigned int, but
> as we now pass to convert_to_* a COMPOUND_EXPR with the side-effects on the
> op0 and addition on op1 of the COMPOUND_EXPR, convert_to_integer doesn't
> perform this shortening and unfortunately we don't have any gimple
> optimization yet to do this later on.  This patch fixes it by handling
> COMPOUND_EXPR in convert_to_* where we care about TREE_CODE of the expr.
> 
> Ok for trunk?  Ok for 4.8 as well after a while?

I suppose it's OK to fix the regression, though really we should be 
eliminating these early convert_to_* optimizations (optimizing later on 
GIMPLE if possible) rather than adding to them.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Sandra Loosemore


On 06/17/2013 08:41 AM, Joseph S. Myers wrote:

On Mon, 17 Jun 2013, Julian Brown wrote:


IIUC, the incompatibility between the specified
-fstrict-volatile-bitfields behaviour and the C++ memory model is a
recognised deficiency in the ARM EABI. It might be an unpopular
suggestion, but how about disabling the option by default for C++ on
ARM [*]? Maybe even with a "sorry" if the user tries to turn it on
manually?


The C11 and C++11 memory models are the same, this is nothing to do with
C++ as opposed to C.

Someone from ARM will need to answer as to what their plans are to make
AAPCS compatible with the memory model.  I'd say strict-volatile-bitfields
should only ever apply in cases where it does not conflict with language
semantics - so just as "packed" should take precedence, so the memory
model should also take precedence (in the absence of --param
allow-store-data-races=1, anyway) in those cases where AAPCS would
indicate writing to an object outside the maximal sequence of consecutive
non-zero-width bit-fields.


The problem with always choosing C++ memory model compliance over ABI 
compliance, no matter what the setting of -fstrict-volatile-bitfields, 
is that some users may have legacy code out there where they really want 
the ABI-compliant behavior.  Perhaps we should not make 
-fstrict-volatile-bitfields not be the default on any target any more, 
but if you supply it explicitly it tells GCC that you really want the 
ABI-compliant behavior to override the language-standard-compliant behavior?


I don't think there's an actual conflict there between 
-fstrict-volatile-bitfields and packed structure extensions.  AAPCS 
allows for the possibility of packed structures but says conforming 
programs don't use them for external interfaces (this is the note at the 
end of 7.1.7.1).  As I noted previously, my understanding is that the 
volatile access requirements in the AAPCS only apply to non-packed 
struct fields whose layout and alignment is fully specified by AAPCS. 
The various packed-structure bugs addressed by part 4 of my patch series 
are mostly cases where the required single access *cannot* be emitted 
because the field is not aligned; instead, we're emitting code that is 
totally wrong even under GCC's normal access rules, because it faults at 
runtime or only accesses part of the bit-field.


-Sandra

[PATCH] Fix raw-string handling (PR preprocessor/57620)

2013-06-17 Thread Jakub Jelinek

Hi!

lex_raw_string right now only undoes phase {1,2} transformations in between
R"delim( and )delim", while it should undo them everywhere between R" and
the final ".  The following patch implements that, and adds testsuite
coverage for that.  Bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

BTW, any thoughts on how to deal with the GNU backslash whitespace newline
extension that we need to undo, but lose track on what exact whitespace has
been seen?  Perhaps we could encode in extra line notes that there were 123
spaces, 1 tab, 2 spaces in between \ and new-line, or just add a note with a
pointer to an allocated buffer with the whitespace?
Dunno how often \ whitespace \n occurs in real-world code.

2013-06-17  Jakub Jelinek  

PR preprocessor/57620
* lex.c (lex_raw_string): Undo phase1 and phase2 transformations
between R" and final " rather than only in between R"del( and )del".

* c-c++-common/raw-string-2.c (s12, u12, U12, L12): Remove.
(main): Don't test {s,u,U,L}12.
* c-c++-common/raw-string-13.c: New test.
* c-c++-common/raw-string-14.c: New test.
* c-c++-common/raw-string-15.c: New test.
* c-c++-common/raw-string-16.c: New test.

--- libcpp/lex.c.jj 2013-04-25 23:47:59.0 +0200
+++ libcpp/lex.c2013-06-17 15:22:40.343049765 +0200
@@ -1346,7 +1346,8 @@ static void
 lex_raw_string (cpp_reader *pfile, cpp_token *token, const uchar *base,
const uchar *cur)
 {
-  const uchar *raw_prefix;
+  uchar raw_prefix[17];
+  const uchar *raw_prefix_start, *orig_base;
   unsigned int raw_prefix_len = 0;
   enum cpp_ttype type;
   size_t total_len = 0;
@@ -1358,9 +1359,76 @@ lex_raw_string (cpp_reader *pfile, cpp_t
  *base == 'u' ? (base[1] == '8' ? CPP_UTF8STRING : CPP_STRING16)
  : CPP_STRING);
 
-  raw_prefix = cur + 1;
-  while (raw_prefix_len < 16)
+#define BUF_APPEND(STR,LEN)\
+  do { \
+   bufring_append (pfile, (const uchar *)(STR), (LEN), \
+   &first_buff, &last_buff);   \
+   total_len += (LEN); \
+  } while (0);
+
+  orig_base = base;
+  raw_prefix_start = cur;
+  cur++;
+  while (raw_prefix_len <= 16)
 {
+  /* If we previously performed any trigraph or line splicing
+transformations, undo them anywhere after R" and the final "
+of the raw string.  */
+  while (note->pos < cur)
+   ++note;
+
+  raw_prefix[raw_prefix_len] = *cur;
+  if (note->pos == cur)
+   switch (note->type)
+ {
+ case '\\':
+ case ' ':
+   /* As neither \\ nor \n are valid d-char characters,
+  '\\' or ' ' notes always result in invalid raw string.  */
+   if (raw_prefix_len <= 16)
+ raw_prefix[raw_prefix_len] = '\\';
+   break;
+ case 0:
+   break;
+ default:
+   if (_cpp_trigraph_map[note->type])
+ {
+   uchar type = note->type;
+   note->type = 0;
+
+   if (!CPP_OPTION (pfile, trigraphs))
+ break;
+   if (raw_prefix_len <= 16)
+ raw_prefix[raw_prefix_len] = '?';
+   if (raw_prefix_len < 16)
+ raw_prefix[++raw_prefix_len] = '?';
+   if (raw_prefix_len < 16)
+ raw_prefix[++raw_prefix_len] = type;
+   /* ??/ followed by new-line means invalid raw-string,
+  while ??/ is fine, the new-line is not.  */
+   if (type == '/' && note[1].pos == cur)
+ {
+   if (note[1].type != '\\' && note[1].type != ' ')
+ abort ();
+   ++note;
+   if (raw_prefix_len < 16)
+ raw_prefix[++raw_prefix_len] = '\n';
+ }
+   else
+ {
+   BUF_APPEND (base, cur - base);
+   base = cur + 1;
+   BUF_APPEND ("??", 2);
+   BUF_APPEND (&type, 1);
+ }
+ }
+   else
+ abort ();
+ }
+
+  if (raw_prefix_len == 16)
+   break;
+
   switch (raw_prefix[raw_prefix_len])
{
case ' ': case '(': case ')': case '\\': case '\t':
@@ -1385,6 +1453,7 @@ lex_raw_string (cpp_reader *pfile, cpp_t
case '&': case '|': case '~': case '!': case '=': case ',':
case '"': case '\'':
  raw_prefix_len++;
+ cur++;
  continue;
}
   break;
@@ -1392,30 +1461,29 @@ lex_raw_string (cpp_reader *pfile, cpp_t
 
   if (raw_prefix[raw_prefix_len] != '(')
 {
-  int col = CPP_BUF_COLUMN (pfile->buffer, raw_prefix + raw_prefix_len)
-   + 1;
+  int col = CPP_BUF_COLUMN (pfile->buffer, cur) + 1;
   if (

[PATCH] Make one of array notation test a run test

2013-06-17 Thread Iyer, Balaji V

Hello Everyone, 
This patch will make one of the array notation tests a runnable one 
instead of a compile only. I have also changed the return values from just '1' 
to distinct values so that it is easier to debug. This patch is committed as 
obvious.

Here is the ChangeLog entry:
+2013-06-17  Balaji V. Iyer  
+
+* c-c++-common/cilk-plus/AN/array_test1.c: Make this an execution test.
+   Also changed the returns from error as distinct values so that it is
+   easier to debug.
+

Thanks,

Balaji V. Iyer.
Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 200149)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2013-06-17  Balaji V. Iyer  
+   
+* c-c++-common/cilk-plus/AN/array_test1.c: Make this an execution test.
+   Also changed the returns from error as distinct values so that it is
+   easier to debug.
+
 2013-06-17  Kyrylo Tkachov  
 
* gcc.target/arm/unaligned-memcpy-2.c (dest): Initialize to
Index: gcc/testsuite/c-c++-common/cilk-plus/AN/array_test1.c
===
--- gcc/testsuite/c-c++-common/cilk-plus/AN/array_test1.c   (revision 
200148)
+++ gcc/testsuite/c-c++-common/cilk-plus/AN/array_test1.c   (working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-options "-fcilkplus" } */
 
 #include 
@@ -47,7 +47,7 @@
   array[x:y:z] = 505;
   for (ii = x; ii < 10; ii += z)
 if (array[ii] != 505)
-  return 2;
+  return 4;
 
   x = atoi(argv[1]);
   z = (10-atoi(argv[1]))/atoi(argv[1]);
@@ -57,7 +57,7 @@
 
   for (ii = x; ii < 10; ii += z)
 if (array[ii] != 25)
-  return 1;
+  return 5;
   x = atoi(argv[1]);
   z = (10-atoi(argv[1]))/atoi(argv[1]);
   y = 10-atoi(argv[1]);
@@ -66,19 +66,19 @@
 1400;
   for (ii = x; ii < 10; ii += z)
 if (array[ii] != 1400)
-  return 1;
+  return 6;
   
 
   array[atoi("5"):5:1] = ;
   
   for (ii = atoi ("5"); ii < 10; ii++)
 if (array[ii] != )
-  return 2;
+  return 7;
   
 
   array[atoi("5"):atoi("5"):atoi("1")] = ;
   for (ii = atoi ("5"); ii < (atoi ("5") + atoi ("5")); ii += atoi ("1"))
 if (array[ii] != )
-  return 3;
+  return 8;
   return 0;
 }

Re: [patch i386]: RFC enable inlining for function with machine-attributes

2013-06-17 Thread Richard Henderson

On 06/16/2013 11:55 PM, Kai Tietz wrote:
> +static bool
> +ix86_function_attribute_inlinable_p (const_tree fndecl ATTRIBUTE_UNUSED)
> +{
> +  return true;
> +}

This is hook_bool_const_tree_true.

I have an idea that perhaps the default ought to be true, and that the few
targets (like mep) that have an interrupt attribute, etc ought to be the ones
to actually implement this hook.

r~

[C/C++ PATCH] Handle COMPOUND_EXPR in convert_to_* (PR c++/56493)

2013-06-17 Thread Jakub Jelinek

Hi!

The following patch shows a performance regression caused by the C++ changes
to evaluate side-effects in x += foo (); first and only then do the
addition.  Previously if x was say int and foo () returned unsigned long
long, convert_to_integer would narrow the addition to unsigned int, but
as we now pass to convert_to_* a COMPOUND_EXPR with the side-effects on the
op0 and addition on op1 of the COMPOUND_EXPR, convert_to_integer doesn't
perform this shortening and unfortunately we don't have any gimple
optimization yet to do this later on.  This patch fixes it by handling
COMPOUND_EXPR in convert_to_* where we care about TREE_CODE of the expr.

Ok for trunk?  Ok for 4.8 as well after a while?

2013-06-17  Jakub Jelinek  

PR c++/56493
* convert.c (convert_to_real, convert_to_expr, convert_to_complex):
Handle COMPOUND_EXPR.

* c-c++-common/pr56493.c: New test.

--- gcc/convert.c.jj2013-05-13 09:44:53.0 +0200
+++ gcc/convert.c   2013-06-16 12:16:13.754108523 +0200
@@ -95,6 +95,15 @@ convert_to_real (tree type, tree expr)
   enum built_in_function fcode = builtin_mathfn_code (expr);
   tree itype = TREE_TYPE (expr);
 
+  if (TREE_CODE (expr) == COMPOUND_EXPR)
+{
+  tree t = convert_to_real (type, TREE_OPERAND (expr, 1));
+  if (t == TREE_OPERAND (expr, 1))
+   return expr;
+  return build2_loc (EXPR_LOCATION (expr), COMPOUND_EXPR, TREE_TYPE (t),
+TREE_OPERAND (expr, 0), t);
+}
+
   /* Disable until we figure out how to decide whether the functions are
  present in runtime.  */
   /* Convert (float)sqrt((double)x) where x is float into sqrtf(x) */
@@ -366,6 +375,15 @@ convert_to_integer (tree type, tree expr
   return error_mark_node;
 }
 
+  if (ex_form == COMPOUND_EXPR)
+{
+  tree t = convert_to_integer (type, TREE_OPERAND (expr, 1));
+  if (t == TREE_OPERAND (expr, 1))
+   return expr;
+  return build2_loc (EXPR_LOCATION (expr), COMPOUND_EXPR, TREE_TYPE (t),
+TREE_OPERAND (expr, 0), t);
+}
+
   /* Convert e.g. (long)round(d) -> lround(d).  */
   /* If we're converting to char, we may encounter differing behavior
  between converting from double->char vs double->long->char.
@@ -854,6 +872,14 @@ convert_to_complex (tree type, tree expr
 
if (TYPE_MAIN_VARIANT (elt_type) == TYPE_MAIN_VARIANT (subtype))
  return expr;
+   else if (TREE_CODE (expr) == COMPOUND_EXPR)
+ {
+   tree t = convert_to_complex (type, TREE_OPERAND (expr, 1));
+   if (t == TREE_OPERAND (expr, 1))
+ return expr;
+   return build2_loc (EXPR_LOCATION (expr), COMPOUND_EXPR,
+  TREE_TYPE (t), TREE_OPERAND (expr, 0), t);
+ }
else if (TREE_CODE (expr) == COMPLEX_EXPR)
  return fold_build2 (COMPLEX_EXPR, type,
  convert (subtype, TREE_OPERAND (expr, 0)),
--- gcc/testsuite/c-c++-common/pr56493.c.jj 2013-06-17 10:24:36.891659600 
+0200
+++ gcc/testsuite/c-c++-common/pr56493.c2013-06-17 10:24:33.164720149 
+0200
@@ -0,0 +1,16 @@
+/* PR c++/56493 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+
+unsigned long long bar (void);
+int x;
+
+void
+foo (void)
+{
+  x += bar ();
+}
+
+/* Verify we narrow the addition from unsigned long long to unsigned int type. 
 */
+/* { dg-final { scan-tree-dump "  (\[a-zA-Z._0-9]*) = \\(unsigned int\\) 
\[^;\n\r]*;.*  (\[a-zA-Z._0-9]*) = \\(unsigned int\\) \[^;\n\r]*;.* = \\1 \\+ 
\\2;" "gimple" { target { ilp32 || lp64 } } } } */
+/* { dg-final { cleanup-tree-dump "gimple" } } */

Jakub

[PATCH] Fix up bmi_bextr_ (PR target/57623)

2013-06-17 Thread Jakub Jelinek

Hi!

This instruction has the predicates/constraints wrong, the r/m argument is
the middle-one, the value from which it should be extracted, rather than
the packed start/length argument.  This got broken with PR50766, where the
patch hasn't touched just BMI2, but for unknown reasons also this BMI
instruction which was handled right in GCC 4.6.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.8/4.7?

BTW, the AVX2 docs document _bextr_u{32,64} 3 argument intrinsics, rather
than the __bextr_u{32,64} 2 argument intrinsics we have in GCC right now.
Something to fix up (as the names are different, perhaps we can both the old
ones and the new ones implemented say as return __bextr_u32 (x, (y & 255) |
(z << 8)); or similar)?

2013-06-17  Jakub Jelinek  

PR target/57623
* config/i386/i386.md (bmi_bextr_): Swap predicates and
constraints of operand 1 and 2.

* gcc.target/i386/bmi-bextr-3.c: New test.

--- gcc/config/i386/i386.md.jj  2013-06-09 20:29:02.0 +0200
+++ gcc/config/i386/i386.md 2013-06-15 18:24:48.942362202 +0200
@@ -11679,8 +11679,8 @@ (define_insn "*bmi_andn_"
 
 (define_insn "bmi_bextr_"
   [(set (match_operand:SWI48 0 "register_operand" "=r,r")
-(unspec:SWI48 [(match_operand:SWI48 1 "register_operand" "r,r")
-   (match_operand:SWI48 2 "nonimmediate_operand" "r,m")]
+(unspec:SWI48 [(match_operand:SWI48 1 "nonimmediate_operand" "r,m")
+   (match_operand:SWI48 2 "register_operand" "r,r")]
UNSPEC_BEXTR))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_BMI"
--- gcc/testsuite/gcc.target/i386/bmi-bextr-3.c.jj  2013-06-17 
09:48:47.789600664 +0200
+++ gcc/testsuite/gcc.target/i386/bmi-bextr-3.c 2013-06-17 09:48:37.0 
+0200
@@ -0,0 +1,31 @@
+/* PR target/57623 */
+/* { dg-do assemble { target bmi } } */
+/* { dg-options "-O2 -mbmi" } */
+
+#include 
+
+unsigned int
+f1 (unsigned int x, unsigned int *y)
+{
+  return __bextr_u32 (x, *y);
+}
+
+unsigned int
+f2 (unsigned int *x, unsigned int y)
+{
+  return __bextr_u32 (*x, y);
+}
+
+#ifdef  __x86_64__
+unsigned long long
+f3 (unsigned long long x, unsigned long long *y)
+{
+  return __bextr_u64 (x, *y);
+}
+
+unsigned long long
+f4 (unsigned long long *x, unsigned long long y)
+{
+  return __bextr_u64 (*x, y);
+}
+#endif

Jakub

Re: [PATCH] Add command line parsing of -fsanitize

2013-06-17 Thread Jakub Jelinek

On Mon, Jun 17, 2013 at 03:52:54PM +, Joseph S. Myers wrote:
> On Mon, 17 Jun 2013, Jakub Jelinek wrote:
> 
> > > +; What the sanitizer should instrument
> > > +Variable
> > > +unsigned int flag_sanitize
> > 
> > Can't you just add Var(flag_sanitize) to the line after fsanitize= ?
> 
> I think that would create a string variable, whereas an integer is what's 
> wanted here.

We already have say:

Wstack-usage=
Common Joined RejectNegative UInteger Var(warn_stack_usage) Init(-1) Warning
Warn if stack usage might be larger than specified amount

that creates
#ifdef GENERATOR_FILE
extern int warn_stack_usage;
#else
  int x_warn_stack_usage;
#define warn_stack_usage global_options.x_warn_stack_usage
#endif

so I guess UInteger Var(flag_sanitize) then would do the trick.

Though, now looking at it, -fsanitize={address,thread} aren't
RejectNegative, so they accept also -fno-sanitize=address etc.
Marek, thus your patch should handle that properly too, say if you do:
-fsanitize=undefined -fno-sanitize=shift
it should first or into the flag_sanitize bitmask SANITIZE_UNDEFINED
and then and it with ~SANITIZE_SHIFT.

Jakub

Re: [PATCH] PowerPC: Fix test case for PR55033

2013-06-17 Thread David Edelsohn

gcc/testsuite/ChangeLog
2013-06-17  Sebastian Huber  

PR target/55033
* gcc.target/powerpc/pr55033.c: Fix options.

Okay.

Thanks, David

P.S. Please explicitly copy me on patches.

Re: [PATCH] Add command line parsing of -fsanitize

2013-06-17 Thread Joseph S. Myers

On Mon, 17 Jun 2013, Jakub Jelinek wrote:

> > +; What the sanitizer should instrument
> > +Variable
> > +unsigned int flag_sanitize
> 
> Can't you just add Var(flag_sanitize) to the line after fsanitize= ?

I think that would create a string variable, whereas an integer is what's 
wanted here.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [AArch64] Add r<-w alternative to aarch64_dup_lane

2013-06-17 Thread Marcus Shawcroft


On 17/06/13 16:03, Sofiane Naci wrote:

Hi,

This patch adds a r<-w alternative to the aarch64_dup_lane pattern and
updates the testcase gcc.target/aarch64/scalar_intrinsics.c accordingly.

The patch has been successfully tested on a full regression run in
aarch64-none-elf.

OK for trunk?

-
Thanks
Sofiane



OK
/Marcus

[AArch64] Add r<-w alternative to aarch64_dup_lane

2013-06-17 Thread Sofiane Naci

Hi,

This patch adds a r<-w alternative to the aarch64_dup_lane pattern and
updates the testcase gcc.target/aarch64/scalar_intrinsics.c accordingly.

The patch has been successfully tested on a full regression run in
aarch64-none-elf.

OK for trunk?

-
Thanks
Sofiane


aarch64-dup-alternative.diff
Description: Binary data

Re: [PATCH] Add command line parsing of -fsanitize

2013-06-17 Thread Jakub Jelinek

Hi!

Looks good, though I'd appreciate Joseph to look at this from the
option handling POV.  Some nits from me below:

On Fri, Jun 14, 2013 at 09:04:40PM +0200, Marek Polacek wrote:
> --- gcc/opts.c.mp 2013-06-14 19:23:02.300554991 +0200
> +++ gcc/opts.c2013-06-14 20:00:30.377638168 +0200
> @@ -1405,6 +1405,64 @@ common_handle_option (struct gcc_options
>opts->x_exit_after_options = true;
>break;
>  
> +case OPT_fsanitize_:
> +  {
> + const char *p = arg;
> + while (*p != 0)
> +   {
> + static const struct
> + {
> +   const char *const name;
> +   unsigned int flag;
> +   size_t len;
> + } spec[] =
> + {
> +   { "address", SANITIZE_ADDRESS, sizeof "address" - 1 },
> +   { "thread", SANITIZE_THREAD, sizeof "thread" - 1 },

For the trunk version I'd say the following 4 lines should be commented out,
and only enabled when you actually commit to trunk (or branch) corresponding
support.

> +   { "shift", SANITIZE_SHIFT, sizeof "shift" - 1 },
> +   { "integer-divide-by-zero", SANITIZE_DIVIDE,
> + sizeof "integer-divide-by-zero" - 1 },
> +   { "undefined", SANITIZE_UNDEFINED, sizeof "undefined" - 1 },
> +   { NULL, 0, 0 }
> --- gcc/builtins.def.mp   2013-06-14 19:24:52.670944783 +0200
> +++ gcc/builtins.def  2013-06-14 20:00:30.384638196 +0200
> @@ -155,7 +155,8 @@ along with GCC; see the file COPYING3.
>  #define DEF_SANITIZER_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
>DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\
>  true, true, true, ATTRS, true, \
> -(flag_asan || flag_tsan))
> +((flag_sanitize & SANITIZE_ADDRESS) \
> + || (flag_sanitize & SANITIZE_THREAD)))

Use (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD))
or just flag_sanitize instead?

> --- gcc/common.opt.mp 2013-06-14 19:24:52.671944787 +0200
> +++ gcc/common.opt2013-06-14 20:00:30.389638216 +0200
> @@ -207,6 +207,10 @@ unsigned int help_columns
>  Variable
>  bool flag_opts_finished
>  
> +; What the sanitizer should instrument
> +Variable
> +unsigned int flag_sanitize

Can't you just add Var(flag_sanitize) to the line after fsanitize= ?

> +fsanitize=
> +Common Report Joined
> +Select what to sanitize
>  
>  fasynchronous-unwind-tables
>  Common Report Var(flag_asynchronous_unwind_tables) Optimization

Jakub

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Joseph S. Myers

On Mon, 17 Jun 2013, Julian Brown wrote:

> IIUC, the incompatibility between the specified
> -fstrict-volatile-bitfields behaviour and the C++ memory model is a
> recognised deficiency in the ARM EABI. It might be an unpopular
> suggestion, but how about disabling the option by default for C++ on
> ARM [*]? Maybe even with a "sorry" if the user tries to turn it on
> manually?

The C11 and C++11 memory models are the same, this is nothing to do with 
C++ as opposed to C.

Someone from ARM will need to answer as to what their plans are to make 
AAPCS compatible with the memory model.  I'd say strict-volatile-bitfields 
should only ever apply in cases where it does not conflict with language 
semantics - so just as "packed" should take precedence, so the memory 
model should also take precedence (in the absence of --param 
allow-store-data-races=1, anyway) in those cases where AAPCS would 
indicate writing to an object outside the maximal sequence of consecutive 
non-zero-width bit-fields.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Fix PR57630

2013-06-17 Thread Joseph S. Myers

On Mon, 17 Jun 2013, Marek Polacek wrote:

> This improves the diagnostics messages in case we're using initial
> declarations in the for loop, but we're not using C99 or newer standard;
> in this case we shouldn't forget about =c11 and =gnu11, where the
> initial declaration is fine as well.
> 
> Ok for trunk?
> 
> 2013-06-17  Marek Polacek  
> 
>   PR c/57630
>   * c-decl.c (check_for_loop_decls): Improve diagnostics messages.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jan Hubicka

> On Mon, 17 Jun 2013, Jan Hubicka wrote:
> 
> > > On 06/17/2013 09:35 AM, Jan Hubicka wrote:
> > > >To get meaningful warnings, we need to know what decls/types are subject
> > > >to ODR. Do you think you can make C++ FE to drop a flag so middle-end 
> > > >know?
> > > >We can LTO ODR and non-ODR languages together.
> > > 
> > > Basically everything in C++ is subject to the ODR.  There are two
> > > cases: either there can be only one definition anywhere (e.g. normal
> > > variables and functions, anything local to such a function) or all
> > > definitions need to be identical.
> > 
> > Does ODR hold also for extern "C" declarations in C++?
> > 
> > > 
> > > In C, you can treat all structurally identical types as the same, right?
> > 
> > Sort of. Currently we have two notions of equivalency of types
> > 
> > 1) TYPE_CANONICAL that is strictly merged by structural equivalence.
> >The equivalency is defined by gimple_canonical_types_compatible_p
> >and is very conservative (i.e. ignores type names for example)
> > 
> >This is what drives most of backend and aliasing machinery.
> > 
> >Having false equivalences leads to worse TBAA, false nonequivalences
> >leads to wrong code due to aliasing.
> 
> That's 100% accurate.
> 
> > 2) Richard's type (and now tree) merging. This merging is conservative
> >on what is considered to be equivalent type and compare pretty
> >much everything in the in-memory type representaion (i.e. names, 
> > modifiers,
> >BINFOs, stuff that matters for debug or devirt machinery)
> > 
> >false equivalences leads to loss of debug info quality and false
> >devirtualizations, false nonequivalences leads to explossions of memory
> >use and streaming time for Firefox or any other huge LTO build.
> 
> There should not be false equivalencies - false equivalencies can
> lead to random bugs including wrong-code.  Basically we consider
> two tree SCCs equivalent if all values inside them are equal and
> the tree SCC is structurally equivalent.
> 
> >At least this is my understanding, Richard know more ;)
> > 
> > ODR adds third notion of equivalency that should sit strictly in between
> > and in fact perhaps for ODR types we can make 1)=2)=ODR unless ODR is
> > obviously violated.
> 
> The question is when will two types be "ODR equivalent" but not
> "tree equivalent".  Certainly there is no such thing as "ODR" for C,

yes, that is why I think we need flag specifying if type is ODR or not.

An idea would be to check ODR flagged types of ODR equivalency. If there is a
match, look if CANONICAL_TYPE match. If it does not, we can warn user.  If it
does, pick the one that is more specified and make it win (similarly as we have
can_prevail predicate for DECLS I suppose).  Perhaps with some extra
equivalency tests + warnings.

Honza

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Richard Biener

On Mon, 17 Jun 2013, Jan Hubicka wrote:

> > On 06/17/2013 09:35 AM, Jan Hubicka wrote:
> > >To get meaningful warnings, we need to know what decls/types are subject
> > >to ODR. Do you think you can make C++ FE to drop a flag so middle-end know?
> > >We can LTO ODR and non-ODR languages together.
> > 
> > Basically everything in C++ is subject to the ODR.  There are two
> > cases: either there can be only one definition anywhere (e.g. normal
> > variables and functions, anything local to such a function) or all
> > definitions need to be identical.
> 
> Does ODR hold also for extern "C" declarations in C++?
> 
> > 
> > In C, you can treat all structurally identical types as the same, right?
> 
> Sort of. Currently we have two notions of equivalency of types
> 
> 1) TYPE_CANONICAL that is strictly merged by structural equivalence.
>The equivalency is defined by gimple_canonical_types_compatible_p
>and is very conservative (i.e. ignores type names for example)
> 
>This is what drives most of backend and aliasing machinery.
> 
>Having false equivalences leads to worse TBAA, false nonequivalences
>leads to wrong code due to aliasing.

That's 100% accurate.

> 2) Richard's type (and now tree) merging. This merging is conservative
>on what is considered to be equivalent type and compare pretty
>much everything in the in-memory type representaion (i.e. names, modifiers,
>BINFOs, stuff that matters for debug or devirt machinery)
> 
>false equivalences leads to loss of debug info quality and false
>devirtualizations, false nonequivalences leads to explossions of memory
>use and streaming time for Firefox or any other huge LTO build.

There should not be false equivalencies - false equivalencies can
lead to random bugs including wrong-code.  Basically we consider
two tree SCCs equivalent if all values inside them are equal and
the tree SCC is structurally equivalent.

>At least this is my understanding, Richard know more ;)
> 
> ODR adds third notion of equivalency that should sit strictly in between
> and in fact perhaps for ODR types we can make 1)=2)=ODR unless ODR is
> obviously violated.

The question is when will two types be "ODR equivalent" but not
"tree equivalent".  Certainly there is no such thing as "ODR" for C,
so given C++ manually implemented in C you cannot expect same
"vtable" contents when the types have the same name.

Richard.

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jan Hubicka

> On 06/17/2013 09:35 AM, Jan Hubicka wrote:
> >To get meaningful warnings, we need to know what decls/types are subject
> >to ODR. Do you think you can make C++ FE to drop a flag so middle-end know?
> >We can LTO ODR and non-ODR languages together.
> 
> Basically everything in C++ is subject to the ODR.  There are two
> cases: either there can be only one definition anywhere (e.g. normal
> variables and functions, anything local to such a function) or all
> definitions need to be identical.

Does ODR hold also for extern "C" declarations in C++?

> 
> In C, you can treat all structurally identical types as the same, right?

Sort of. Currently we have two notions of equivalency of types

1) TYPE_CANONICAL that is strictly merged by structural equivalence.
   The equivalency is defined by gimple_canonical_types_compatible_p
   and is very conservative (i.e. ignores type names for example)

   This is what drives most of backend and aliasing machinery.

   Having false equivalences leads to worse TBAA, false nonequivalences
   leads to wrong code due to aliasing.

2) Richard's type (and now tree) merging. This merging is conservative
   on what is considered to be equivalent type and compare pretty
   much everything in the in-memory type representaion (i.e. names, modifiers,
   BINFOs, stuff that matters for debug or devirt machinery)

   false equivalences leads to loss of debug info quality and false
   devirtualizations, false nonequivalences leads to explossions of memory
   use and streaming time for Firefox or any other huge LTO build.

   At least this is my understanding, Richard know more ;)

ODR adds third notion of equivalency that should sit strictly in between
and in fact perhaps for ODR types we can make 1)=2)=ODR unless ODR is
obviously violated.

What we need a flag for is to know whether type originates from C++
FE or something else. While we can add the flag at streaming time,
I guess it would be easier if we had TYPE_ODR/DECL_ODR flag and it was
set by C++ FE itself so it was easilly available everywhere in the
middle-end.

If we want ODR violation warning, I think in first cut we could warn out when
types are same by ODR predicate, they are flagged as ODR types and they are not
having same CANONICAL types.  In next step we can sort out other reasons they
are not merged by Richard's merging code but stil equivalent in C++ sense.
There are easy cases to list - i.e. we will not merge class type from unit that
defines the keyed method with class type from other unit since the associated
binfo will differ in types of the declarations (DECL_EXTERNAL versus
TREE_PUBLIC+STATIC). I am sure there are many more such situations.  I also
expect that by looking into these cases we can refine 2) definition of
equivalency.
> 
> The common subset of those would be treating everything with the
> same structure and the same name/context as identical.
> 
> Hmm, it occurs to me that you should check that your patch does the
> right thing with anonymous namespaces: a class in an anonymous
> namespace is not the same as a class with the same name in an
> anonymous namespace in another translation unit.  The C++ front end
> clears TREE_PUBLIC on the TYPE_STUB_DECL (TYPE_MAIN_VARIANT of such
> types to indicate that they are local to that TU.

OK, thanks!  I naively assumed that checking DECL_NAME for NULL is enough
here. I will fix this.

Honza

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jason Merrill


On 06/17/2013 09:35 AM, Jan Hubicka wrote:

To get meaningful warnings, we need to know what decls/types are subject
to ODR. Do you think you can make C++ FE to drop a flag so middle-end know?
We can LTO ODR and non-ODR languages together.


Basically everything in C++ is subject to the ODR.  There are two cases: 
either there can be only one definition anywhere (e.g. normal variables 
and functions, anything local to such a function) or all definitions 
need to be identical.


In C, you can treat all structurally identical types as the same, right?

The common subset of those would be treating everything with the same 
structure and the same name/context as identical.


Hmm, it occurs to me that you should check that your patch does the 
right thing with anonymous namespaces: a class in an anonymous namespace 
is not the same as a class with the same name in an anonymous namespace 
in another translation unit.  The C++ front end clears TREE_PUBLIC on 
the TYPE_STUB_DECL (TYPE_MAIN_VARIANT of such types to indicate that 
they are local to that TU.



Hmm, firefox combine C and C++ units, but I do not see how C and C++
type can get into the devirutalization machinery.  I will debug this.
I suppose TYPE_NAME can be considered identical to correspodning
IDENTIFIER_NODE, tought?


Yes.

Jason

[Patch, AArch64] Adjust gcc.dg/torture/stackalign/builtin-apply-2.c

2013-06-17 Thread Yufeng Zhang


Hi,

This patch sets STACK_ARGUMENTS_SIZE with 0 for AArch64 as variadic 
arguments to 'bar' are passed in registers on this target.


OK for the trunk?

Thanks,
Yufeng

gcc/testsuite/

* gcc.dg/torture/stackalign/builtin-apply-2.c: set
STACK_ARGUMENTS_SIZE with 0 if __aarch64__ is defined.diff --git a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c 
b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
index cbb38ef..7982210 100644
--- a/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
+++ b/gcc/testsuite/gcc.dg/torture/stackalign/builtin-apply-2.c
@@ -16,7 +16,7 @@
E, F and G are passed on stack.  So the size of the stack argument
data is 20.  */
 #define STACK_ARGUMENTS_SIZE  20
-#elif defined __MMIX__
+#elif defined __aarch64__ || defined __MMIX__
 /* No parameters on stack for bar.  */
 #define STACK_ARGUMENTS_SIZE 0
 #else

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jan Hubicka

> On 06/17/2013 09:09 AM, Jan Hubicka wrote:
> >also one quick question. I was dumping on when my odr tests fails and only
> >case I do not understand is the case where decls_same_for_odr ends up 
> >comparing
> >IDENTIFIER_NODE and TYPE_DECL of the same name.
> 
> Hmm, I would guess that it comes from TYPE_NAME being an identifier
> in one case and a decl in another, perhaps because one TU is
> compiled as C and the other as C++?

Hmm, firefox combine C and C++ units, but I do not see how C and C++
type can get into the devirutalization machinery.  I will debug this.
I suppose TYPE_NAME can be considered identical to correspodning
IDENTIFIER_NODE, tought?

Honza

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jan Hubicka

> On 06/17/2013 09:07 AM, Jan Hubicka wrote:
> >>Yes.  Also for template instantiations and inline functions
> >>(basically, decls with TREE_COMDAT set).  That isn't very
> >
> >Can't those be just merged based on assembler name?
> 
> Yes, though they can have local classes that are also subject to the ODR.

I see, I guess my ODR predicate should just work for those.  I will write
some code looking for unmerged ODR cases on the top of Richard's patch and we
can decide what to do.

To get meaningful warnings, we need to know what decls/types are subject
to ODR. Do you think you can make C++ FE to drop a flag so middle-end know?
We can LTO ODR and non-ODR languages together.

Thanks,
Honza
> 
> >I think this is something Richard can handle (semi-easilly) with his
> >new tree merging patch.
> 
> Great.
> 
> >Shall I understand this as approval with those fixes and testing?
> 
> Yes.
> 
> Jason

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jason Merrill


On 06/17/2013 09:09 AM, Jan Hubicka wrote:

also one quick question. I was dumping on when my odr tests fails and only
case I do not understand is the case where decls_same_for_odr ends up comparing
IDENTIFIER_NODE and TYPE_DECL of the same name.


Hmm, I would guess that it comes from TYPE_NAME being an identifier in 
one case and a decl in another, perhaps because one TU is compiled as C 
and the other as C++?


Jason

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jason Merrill


On 06/17/2013 09:07 AM, Jan Hubicka wrote:

Yes.  Also for template instantiations and inline functions
(basically, decls with TREE_COMDAT set).  That isn't very


Can't those be just merged based on assembler name?


Yes, though they can have local classes that are also subject to the ODR.


I think this is something Richard can handle (semi-easilly) with his
new tree merging patch.


Great.


Shall I understand this as approval with those fixes and testing?


Yes.

Jason

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jan Hubicka

Hi,
also one quick question. I was dumping on when my odr tests fails and only
case I do not understand is the case where decls_same_for_odr ends up comparing
IDENTIFIER_NODE and TYPE_DECL of the same name.  Does this mean that they are
different in class hiearchy or do I miss some equality case?  I can try to 
produce
testcase (out of firefox that will be time consuming)

Honza

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jan Hubicka

> On 06/17/2013 06:05 AM, Jan Hubicka wrote:
> >It is my understanding that C++ standard enforces one definition rule for
> >types, too (to enable sane mangling?) and that we can basically match types
> >by their name and contextes (namespaces/outer classes)?
> 
> Yes.  Also for template instantiations and inline functions
> (basically, decls with TREE_COMDAT set).  That isn't very

Can't those be just merged based on assembler name?

> interesting for devirt, but we might want to handle it in
> decls_same_for_odr anyway.
> 
> Also, it would be really nice to warn about ODR violations:
> types/decls that are the same for ODR but are structurally
> different.

I think this is something Richard can handle (semi-easilly) with his
new tree merging patch.
> 
> >+This is non-trivial for LTO where minnor differences in
> 
> "minor"
> 
> >+   /* If types are not structuraly same, do not bother to contnue.
> 
> "structurally" "continue"

Shall I understand this as approval with those fixes and testing?
(I already tested the patch on GCC bootstrap/regtest where it seem
to work, but I can do at least some testing on LTO firefox / SPEC2k6)

Thanks,
Honza
> 
> Jason

Re: [C++ Patch] PR 16128

2013-06-17 Thread Jason Merrill


OK.

Jason

Re: [RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jason Merrill


On 06/17/2013 06:05 AM, Jan Hubicka wrote:

It is my understanding that C++ standard enforces one definition rule for
types, too (to enable sane mangling?) and that we can basically match types
by their name and contextes (namespaces/outer classes)?


Yes.  Also for template instantiations and inline functions (basically, 
decls with TREE_COMDAT set).  That isn't very interesting for devirt, 
but we might want to handle it in decls_same_for_odr anyway.


Also, it would be really nice to warn about ODR violations: types/decls 
that are the same for ODR but are structurally different.



+This is non-trivial for LTO where minnor differences in


"minor"


+   /* If types are not structuraly same, do not bother to contnue.


"structurally" "continue"

Jason

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Richard Biener

On Mon, Jun 17, 2013 at 2:38 PM, Bernd Schmidt  wrote:
> On 06/17/2013 02:27 PM, Julian Brown wrote:
>> On Mon, 17 Jun 2013 13:38:05 +0200
>> Richard Biener  wrote:
>>
>>> On Mon, Jun 17, 2013 at 1:31 PM, Julian Brown
>>>  wrote:
 IIUC, the incompatibility between the specified
 -fstrict-volatile-bitfields behaviour and the C++ memory model is a
 recognised deficiency in the ARM EABI. It might be an unpopular
 suggestion, but how about disabling the option by default for C++ on
 ARM [*]? Maybe even with a "sorry" if the user tries to turn it on
 manually?
>>>
>>> How are they incompatible?  As far as I understand the
>>> -fstrict-volatile-bitfields
>>> at most restricts the size of the access further, no?  Can you write
>>> down an example that shows the incompatibility?  (woudl be nice to
>>> see that as comment before the relevant code).
>>
>> Well -- I'm certainly no expert on the C++ memory model, but I am under
>> the impression (that I can't seem to verify by googling ;-)) that
>> accesses to adjacent bitfields during volatile access of a particular
>> bitfield are forbidden. So simply, for the following:
>>
>> struct foo {
>>   int a : 8;
>>   int b : 8;
>>   int c : 16;
>> };
>>
>> volatile struct foo x;
>>
>> void bar (void) { x.b++; }
>>
>> to satisfy the ARM EABI, 'bar' will access all three of a, b and c
>> using read-modify-write (using int-sized accesses). IIUC for the C++
>> memory model, only 'b' may be accessed, e.g. using byte-sized
>> loads/stores.
>
> The one I came up with after reading about the C++ memory model spec is
> struct x
> {
>   int x: 8;
>   char : 0;
>   short y : 8;
> } z;
>
> I interpret the C++ rules to say that due to the zero-sized bitfield, x
> and y are different memory locations, and modifying one shouldn't affect
> the other.

True.  IIRC the char : 0 isn't even necessary for the limitation to take
place (just the different underlying types is enough).

> However, with -fstrict-volatile-bitfields, x must be accessed
> as an int, so it overlaps y.

Ok, that's a good example then.  Looking at the mode of the FIELD_DECL
for x isn't a good way to query the access mode then, but the mode
of DECL_BIT_FIELD_TYPE would.  Note that for z.x you are not guaranteed
to end up with a COMPONENT_REF at RTL expansion time at all but it
may be lowered to a MEM_REF by arbitrary optimization passes (unlikely
because most simply go away when they see a volatile access), because
it's bitsize is a multiple of BITS_PER_UNIT and it's boundary is byte-aligned.

Richard.

>> I'm quite prepared to be wrong though, in which case sorry for the
>> noise :-).
>
> Same here :)
>
>
> Bernd
>

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Bernd Schmidt

On 06/17/2013 02:27 PM, Julian Brown wrote:
> On Mon, 17 Jun 2013 13:38:05 +0200
> Richard Biener  wrote:
> 
>> On Mon, Jun 17, 2013 at 1:31 PM, Julian Brown
>>  wrote:
>>> IIUC, the incompatibility between the specified
>>> -fstrict-volatile-bitfields behaviour and the C++ memory model is a
>>> recognised deficiency in the ARM EABI. It might be an unpopular
>>> suggestion, but how about disabling the option by default for C++ on
>>> ARM [*]? Maybe even with a "sorry" if the user tries to turn it on
>>> manually?
>>
>> How are they incompatible?  As far as I understand the
>> -fstrict-volatile-bitfields
>> at most restricts the size of the access further, no?  Can you write
>> down an example that shows the incompatibility?  (woudl be nice to
>> see that as comment before the relevant code).
> 
> Well -- I'm certainly no expert on the C++ memory model, but I am under
> the impression (that I can't seem to verify by googling ;-)) that
> accesses to adjacent bitfields during volatile access of a particular
> bitfield are forbidden. So simply, for the following:
> 
> struct foo {
>   int a : 8;
>   int b : 8;
>   int c : 16;
> };
> 
> volatile struct foo x;
> 
> void bar (void) { x.b++; }
> 
> to satisfy the ARM EABI, 'bar' will access all three of a, b and c
> using read-modify-write (using int-sized accesses). IIUC for the C++
> memory model, only 'b' may be accessed, e.g. using byte-sized
> loads/stores.

The one I came up with after reading about the C++ memory model spec is
struct x
{
  int x: 8;
  char : 0;
  short y : 8;
} z;

I interpret the C++ rules to say that due to the zero-sized bitfield, x
and y are different memory locations, and modifying one shouldn't affect
the other. However, with -fstrict-volatile-bitfields, x must be accessed
as an int, so it overlaps y.

> I'm quite prepared to be wrong though, in which case sorry for the
> noise :-).

Same here :)


Bernd

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Jakub Jelinek

On Mon, Jun 17, 2013 at 01:27:38PM +0100, Julian Brown wrote:
> Well -- I'm certainly no expert on the C++ memory model, but I am under
> the impression (that I can't seem to verify by googling ;-)) that
> accesses to adjacent bitfields during volatile access of a particular
> bitfield are forbidden. So simply, for the following:
> 
> struct foo {
>   int a : 8;
>   int b : 8;
>   int c : 16;
> };
> 
> volatile struct foo x;
> 
> void bar (void) { x.b++; }

I believe in the above it is ok in C++ memory model if the RMW cycle is
using 32-bit type, but in
struct foo {
  int a : 8;
  int b : 8;
  char c, d;
};
  
volatile struct foo x;

void bar (void) { x.b++; }
it is not (but it is laid out the same), because modification to x.a or x.b
must not create data races on x.c and/or x.d.

Jakub

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Julian Brown

On Mon, 17 Jun 2013 13:38:05 +0200
Richard Biener  wrote:

> On Mon, Jun 17, 2013 at 1:31 PM, Julian Brown
>  wrote:
> > On Mon, 17 Jun 2013 13:12:09 +0200
> > Richard Biener  wrote:
> >
> >> On Sun, Jun 16, 2013 at 9:26 PM, Jakub Jelinek 
> >> wrote:
> >> > On Sun, Jun 16, 2013 at 01:08:12PM -0600, Sandra Loosemore wrote:
> >> >> This patch fixes the PR23623 regression.  In conjunction with
> >> >> part 2 of the series, it also fixes the new
> >> >> volatile-bitfields-3.c test case.
> >> >>
> >> >> As I noted in previous discussion, there might be a better
> >> >> place to accomplish this effect, but hacking
> >> >> DECL_BIT_FIELD_REPRESENTATIVE can't work because the
> >> >> volatile-ness may be coming from a qualifier on the pointer or
> >> >> object from which the field is being extracted, rather than
> >> >> from a volatile qualifier on the bit field decl.  I think the
> >> >> choices are to do it in get_bit_range (as in this patch), in
> >> >> the callers of get_bit_range, or at the places where the bit
> >> >> range information is being used.
> >> >
> >> > So does this means you just always violate C++11 memory model
> >> > requirements with -fstrict-volatile-bitfields?  That doesn't seem
> >> > to be a good idea.
> >>
> >> Yeah, it's not clear to me that this patch fixes anything by
> >> design. It mainly changes things from limiting the access in some
> >> way to not limiting it at all ...
> >
> > IIUC, the incompatibility between the specified
> > -fstrict-volatile-bitfields behaviour and the C++ memory model is a
> > recognised deficiency in the ARM EABI. It might be an unpopular
> > suggestion, but how about disabling the option by default for C++ on
> > ARM [*]? Maybe even with a "sorry" if the user tries to turn it on
> > manually?
> 
> How are they incompatible?  As far as I understand the
> -fstrict-volatile-bitfields
> at most restricts the size of the access further, no?  Can you write
> down an example that shows the incompatibility?  (woudl be nice to
> see that as comment before the relevant code).

Well -- I'm certainly no expert on the C++ memory model, but I am under
the impression (that I can't seem to verify by googling ;-)) that
accesses to adjacent bitfields during volatile access of a particular
bitfield are forbidden. So simply, for the following:

struct foo {
  int a : 8;
  int b : 8;
  int c : 16;
};

volatile struct foo x;

void bar (void) { x.b++; }

to satisfy the ARM EABI, 'bar' will access all three of a, b and c
using read-modify-write (using int-sized accesses). IIUC for the C++
memory model, only 'b' may be accessed, e.g. using byte-sized
loads/stores.

I'm quite prepared to be wrong though, in which case sorry for the
noise :-).

Julian

[PING][PATCH] for for c/PR57541

2013-06-17 Thread Iyer, Balaji V

Hello Everyone,
Is this patch OK for trunk?

Thanks,

Balaji V. Iyer.

> -Original Message-
> From: Iyer, Balaji V
> Sent: Wednesday, June 12, 2013 1:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: anna.m.tikhon...@gmail.com
> Subject: [PATCH] for for c/PR57541
> 
> Hello Everyone,
>   Attach, please find a patch that will fix the issues in C/PR57541. The
> issue reported was that the parameters passed into the builtin array notation
> reduction functions were not checked correctly. This patch should fix that 
> issue.
> It is tested on x86 and x86_64 and it seem to pass all the tests. I have also
> included a testsuite.
> 
> Here are the ChangeLog entries:
> 
> gcc/c/ChangeLog
> 2013-06-12  Balaji V. Iyer  
> 
> * c-array-notation.c (fix_builtin_array_notation_fn): Added a call to
> valid_no_reduce_fn_params_p and valid_reduce_fn_params_p.
> (build_array_notation_expr): Added a check for capturing the return
> value (i.e. void) of __sec_reduce_mutating function.
> 
> 
> gcc/c-family/ChangeLog:
> 2013-06-12  Balaji V. Iyer  
> 
> * array-notation-common.c (valid_reduce_fn_params_p): New function.
> (valid_no_reduce_fn_params_p): Likewise.
> * c-common.h (valid_reduce_fn_params_p): Added a new prototype.
> (valid_no_reduce_fn_params_p): Likewise.
> 
> gcc/testsuite/ChangeLog
> 2013-06-12  Balaji V. Iyer  
> 
> PR c/57541
> * c-c++-common/cilk-plus/AN/pr57541-2.c: New test.
> * c-c++-common/cilk-plus/AN/rank_mismatch2.c: Fixed a bug by replacing
> a comma with an operation.
> 
> 
> Thanks,
> 
> Balaji V. Iyer.
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
old mode 100644
new mode 100755
index 3d8f68f..52a58a6
Binary files a/gcc/c-family/ChangeLog and b/gcc/c-family/ChangeLog differ
diff --git a/gcc/c-family/array-notation-common.c 
b/gcc/c-family/array-notation-common.c
old mode 100644
new mode 100755
index 489b67c..6c1c7e2
--- a/gcc/c-family/array-notation-common.c
+++ b/gcc/c-family/array-notation-common.c
@@ -560,3 +560,125 @@ find_correct_array_notation_type (tree op)
 } 
   return return_type;
 }
+
+/* Returns true if the function call in BUILTIN_FN (of type CALL_EXPR) if the
+   number of parameters for the array notation reduction functions are
+   correct.  */
+
+bool
+valid_no_reduce_fn_params_p (tree builtin_fn)
+{
+  switch (is_cilkplus_reduce_builtin (CALL_EXPR_FN (builtin_fn)))
+{
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ADD:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MUL:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ALL_ZERO:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_ZERO:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MAX:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MIN:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MIN_IND:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MAX_IND:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_NONZERO:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ALL_NONZERO:
+  if (call_expr_nargs (builtin_fn) != 1)
+   {
+ error_at (EXPR_LOCATION (builtin_fn),
+   "builtin function %qE can only have one argument",
+   CALL_EXPR_FN (builtin_fn));
+ return false;
+   }
+  break;
+case BUILT_IN_CILKPLUS_SEC_REDUCE:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MUTATING:
+  if (call_expr_nargs (builtin_fn) != 3)
+   {
+ error_at (EXPR_LOCATION (builtin_fn),
+   "builtin function %qE must have 3 arguments",
+   CALL_EXPR_FN (builtin_fn));
+ return false;
+   }
+  break;
+default:
+  /* If it is not a builtin function, then no reason for us do any checking
+here.  */
+  return true;
+}
+  return true;
+}
+
+/* Returns true if the parameters of BUILTIN_FN (array notation builtin
+   function): IDENTITY_VALUE and FUNC_PARM are valid.  */
+
+bool
+valid_reduce_fn_params_p (tree builtin_fn, tree identity_value, tree func_parm)
+{
+  switch (is_cilkplus_reduce_builtin (CALL_EXPR_FN (builtin_fn)))
+{
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ADD:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MUL:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ALL_ZERO:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_ZERO:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MAX:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MIN:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MIN_IND:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_MAX_IND:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ANY_NONZERO:
+case BUILT_IN_CILKPLUS_SEC_REDUCE_ALL_NONZERO:
+  func_parm = CALL_EXPR_ARG (builtin_fn, 0);
+  if (!contains_array_notation_expr (func_parm))
+   {
+ error_at (EXPR_LOCATION (builtin_fn),
+   "builtin function %qE must contain one argument with "
+   "array notations", CALL_EXPR_FN (builtin_fn));
+ return false;
+   }
+  if (TREE_CODE (func_parm) == CALL_EXPR
+ && is_cilkplus_reduce_builtin (CALL_EXPR_FN (func_parm)) !=
+ BUILT_IN_NONE)
+   {
+

Re: Symtab cleanup 6/17: fix handling of constructors of aliases

2013-06-17 Thread Jan Hubicka

> On Mon, Jun 17, 2013 at 11:53 AM, Jan Hubicka  wrote:
> > Hi,
> > this patch makes it possible to fold through aliases.  It may seem 
> > unimportant, but we
> > run into those cases in C++ where extra name aliases may get used by 
> > devirtualization
> > machinery.  The patch also fixes the following long standing bug:
> > jh@gcc10:~/trunk/build2/gcc$ cat t.c
> > static int a=4;
> > static int b __attribute__ ((alias("a")));
> > main()
> > {
> >return b+a;
> > }
> 
> The gimple alias machinery also happily treats a and b as different decls
> btw (so does the PTA code).  To fix that the lookups have to be fast
> (no hashtable query at least for the case where there is no alias).

Yep, I know. There are a lot more wrong code issues with aliases.
Basically things works on assumption that you nmever use original and the
alias together. This breaks badly with LTO.

I have hashtable lookup in my tree and it seems to not be bottleneck, but 
definitely
we either can finally move to have direct symtab pointer in decl node or have 
flag
if decl is an alias.

Honza

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Richard Biener

On Mon, Jun 17, 2013 at 1:31 PM, Julian Brown  wrote:
> On Mon, 17 Jun 2013 13:12:09 +0200
> Richard Biener  wrote:
>
>> On Sun, Jun 16, 2013 at 9:26 PM, Jakub Jelinek 
>> wrote:
>> > On Sun, Jun 16, 2013 at 01:08:12PM -0600, Sandra Loosemore wrote:
>> >> This patch fixes the PR23623 regression.  In conjunction with part
>> >> 2 of the series, it also fixes the new volatile-bitfields-3.c test
>> >> case.
>> >>
>> >> As I noted in previous discussion, there might be a better place to
>> >> accomplish this effect, but hacking DECL_BIT_FIELD_REPRESENTATIVE
>> >> can't work because the volatile-ness may be coming from a qualifier
>> >> on the pointer or object from which the field is being extracted,
>> >> rather than from a volatile qualifier on the bit field decl.  I
>> >> think the choices are to do it in get_bit_range (as in this patch),
>> >> in the callers of get_bit_range, or at the places where the bit
>> >> range information is being used.
>> >
>> > So does this means you just always violate C++11 memory model
>> > requirements with -fstrict-volatile-bitfields?  That doesn't seem
>> > to be a good idea.
>>
>> Yeah, it's not clear to me that this patch fixes anything by design.
>> It mainly changes things from limiting the access in some way to not
>> limiting it at all ...
>
> IIUC, the incompatibility between the specified
> -fstrict-volatile-bitfields behaviour and the C++ memory model is a
> recognised deficiency in the ARM EABI. It might be an unpopular
> suggestion, but how about disabling the option by default for C++ on
> ARM [*]? Maybe even with a "sorry" if the user tries to turn it on
> manually?

How are they incompatible?  As far as I understand the
-fstrict-volatile-bitfields
at most restricts the size of the access further, no?  Can you write down an
example that shows the incompatibility?  (woudl be nice to see that as comment
before the relevant code).

Richard.

> The assumption behind that idea is that people who most care about the
> EABI-specified behaviour (to access hardware registers, etc.) are
> probably using bare metal and plain C. The downside is the potential
> for "surprise" for users though, I suppose.
>
> Julian
>
> [*] Or some other variant target-dependent hack, like depending on
> -ansi, bare-metal vs. $operating system, etc

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Julian Brown

On Mon, 17 Jun 2013 13:12:09 +0200
Richard Biener  wrote:

> On Sun, Jun 16, 2013 at 9:26 PM, Jakub Jelinek 
> wrote:
> > On Sun, Jun 16, 2013 at 01:08:12PM -0600, Sandra Loosemore wrote:
> >> This patch fixes the PR23623 regression.  In conjunction with part
> >> 2 of the series, it also fixes the new volatile-bitfields-3.c test
> >> case.
> >>
> >> As I noted in previous discussion, there might be a better place to
> >> accomplish this effect, but hacking DECL_BIT_FIELD_REPRESENTATIVE
> >> can't work because the volatile-ness may be coming from a qualifier
> >> on the pointer or object from which the field is being extracted,
> >> rather than from a volatile qualifier on the bit field decl.  I
> >> think the choices are to do it in get_bit_range (as in this patch),
> >> in the callers of get_bit_range, or at the places where the bit
> >> range information is being used.
> >
> > So does this means you just always violate C++11 memory model
> > requirements with -fstrict-volatile-bitfields?  That doesn't seem
> > to be a good idea.
> 
> Yeah, it's not clear to me that this patch fixes anything by design.
> It mainly changes things from limiting the access in some way to not
> limiting it at all ...

IIUC, the incompatibility between the specified
-fstrict-volatile-bitfields behaviour and the C++ memory model is a
recognised deficiency in the ARM EABI. It might be an unpopular
suggestion, but how about disabling the option by default for C++ on
ARM [*]? Maybe even with a "sorry" if the user tries to turn it on
manually?

The assumption behind that idea is that people who most care about the
EABI-specified behaviour (to access hardware registers, etc.) are
probably using bare metal and plain C. The downside is the potential
for "surprise" for users though, I suppose.

Julian

[*] Or some other variant target-dependent hack, like depending on
-ansi, bare-metal vs. $operating system, etc

[PATCH,ARM] Define MAX_CONDITIONAL_EXECUTE

2013-06-17 Thread Greta Yorsh

This patch makes the following changes:
* Define MAX_CONDITIONAL_EXECUTE in arm backend using max_insns_skipped,
which is set based on the current tune. 
* Update max_insns_skipped for Cortex-A15 tune to be 2 (instead of 5).
* Use max_insns_skipped in thumb2_final_prescan_insn to decide when to
combine IT blocks
into larger IT blocks. Previously, max_insns_skipped was only used in 
arm_final_prescan_insn to decide when branch should be converted to
conditional execution. 

No regression on qemu for arm-none-eabi with cortex-a15 arm/thumb mode.
Bootstrap successful on Cortex-A15. 

Performance improvement on Cortex-A15 in both arm and thumb states on both
Dhrystone and Coremark, and improvement on Spec2000 in thumb state, with all
benchmarks showing improvements except three benchmarks in CFP2000 that have
slight regressions (189,183,178).

gcc/ChangeLog

2013-06-17  Greta Yorsh  

* config/arm/arm.h (MAX_CONDITIONAL_EXECUTE): Define macro.
* config/arm/arm-protos.h (arm_max_conditional_execute): New
declaration.
(tune_params): Update comment.
* config/arm/arm.c (arm_cortex_a15_tune): Set max_cond_insns to 2.
(arm_max_conditional_execute): New function.
(thumb2_final_prescan_insn): Use max_insn_skipped and
MAX_INSN_PER_IT_BLOCK to compute maximum instructions in a block.diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c791341..374c364 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -227,6 +227,8 @@ extern const char *arm_mangle_type (const_tree);
 
 extern void arm_order_regs_for_local_alloc (void);
 
+extern int arm_max_conditional_execute ();
+
 /* Vectorizer cost model implementation.  */
 struct cpu_vec_costs {
   const int scalar_stmt_cost;   /* Cost of any scalar operation, excluding
@@ -256,8 +258,7 @@ struct tune_params
   bool (*rtx_costs) (rtx, RTX_CODE, RTX_CODE, int *, bool);
   bool (*sched_adjust_cost) (rtx, rtx, rtx, int *);
   int constant_limit;
-  /* Maximum number of instructions to conditionalise in
- arm_final_prescan_insn.  */
+  /* Maximum number of instructions to conditionalise.  */
   int max_insns_skipped;
   int num_prefetch_slots;
   int l1_cache_size;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 43dfe27..6ca81eb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -1054,7 +1057,7 @@ const struct tune_params arm_cortex_a15_tune =
   arm_9e_rtx_costs,
   NULL,
   1,   /* Constant limit.  */
-  5,   /* Max cond insns.  */
+  2,   /* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   false,   /* Prefer constant pool.  */
   arm_default_branch_cost,
@@ -9101,6 +9104,12 @@ arm_adjust_cost (rtx insn, rtx link, rtx dep, int cost)
   return cost;
 }
 
+int
+arm_max_conditional_execute (void)
+{
+  return max_insns_skipped;
+}
+
 static int
 arm_default_branch_cost (bool speed_p, bool predictable_p ATTRIBUTE_UNUSED)
 {
@@ -19488,6 +19497,13 @@ thumb2_final_prescan_insn (rtx insn)
   enum arm_cond_code code;
   int n;
   int mask;
+  int max;
+
+  /* Maximum number of conditionally executed instructions in a block
+ is minimum of the two max values: maximum allowed in an IT block
+ and maximum that is beneficial according to the cost model and tune.  */
+  max = (max_insns_skipped < MAX_INSN_PER_IT_BLOCK) ?
+max_insns_skipped : MAX_INSN_PER_IT_BLOCK;
 
   /* Remove the previous insn from the count of insns to be output.  */
   if (arm_condexec_count)
@@ -19530,9 +19546,9 @@ thumb2_final_prescan_insn (rtx insn)
   /* ??? Recognize conditional jumps, and combine them with IT blocks.  */
   if (GET_CODE (body) != COND_EXEC)
break;
-  /* Allow up to 4 conditionally executed instructions in a block.  */
+  /* Maximum number of conditionally executed instructions in a block.  */
   n = get_attr_ce_count (insn);
-  if (arm_condexec_masklen + n > MAX_INSN_PER_IT_BLOCK)
+  if (arm_condexec_masklen + n > max)
break;
 
   predicate = COND_EXEC_TEST (body);
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 3a49a90..387d271 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -183,6 +183,11 @@ extern arm_cc arm_current_cc;
 
 #define ARM_INVERSE_CONDITION_CODE(X)  ((arm_cc) (((int)X) ^ 1))
 
+/* The maximaum number of instructions that is beneficial to
+   conditionally execute. */
+#undef MAX_CONDITIONAL_EXECUTE
+#define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()
+
 extern int arm_target_label;
 extern int arm_ccfsm_state;
 extern GTY(()) rtx arm_target_insn;

[PATCH, ARM] Fix gcc.dg/pr48335-5.c ICE with NEON enabled

2013-06-17 Thread Julian Brown

Hi,

This patch fixes an ICE where the compiler crashes (with NEON enabled)
during expansion of an instruction such as:

pr48335-5.c:17:1: error: unrecognizable insn:
 }
 ^
(insn 9 8 10 2 (set (reg:DI 116 [ s ])
(unspec:DI [
(mem/c:DI (plus:SI (reg/f:SI 105 virtual-stack-vars)
(const_int -8 [0xfff8])) [2 S8 A32])
] UNSPEC_MISALIGNED_ACCESS)) pr48335-5.c:15 -1
 (nil))
pr48335-5.c:17:1: internal compiler error: in extract_insn, at recog.c:2158

The problem is that generic code (expr.c:expand_expr_real_1, case
VIEW_CONVERT_EXPR) expects to be able to use movmisalign without
explicitly checking that the operands match -- reminiscent of normal
moves, which are generally expected to "always work". However, the
current predicates on the NEON movmisalign insn patterns reject MEMs
which refer to virtual registers such as the virtual-stack-vars above,
leading to the ICE, even though the middle end would generally be capable of
rewriting such an instruction into one which is valid.

The fix for this is to tweak the operands for the instruction patterns
in question to allow such virtual registers, pre-reload. This fixes
gcc.dg/pr48335-5.c when the testsuite is run with "-march=armv7-a
-mfpu=neon -mfloat-abi=softfp" options, with no other changes in
results for gcc, g++ & libstdc++.

OK to apply?

Thanks,

Julian

ChangeLog

gcc/
* config/arm/arm.c (neon_vector_mem_operand): Add strict argument.
Permit virtual register pre-reload if !strict.
(coproc_secondary_reload_class): Adjust for neon_vector_mem_operand
change.
* config/arm/arm-protos.h (neon_vector_mem_operand): Adjust
prototype.
* config/arm/neon.md (movmisalign): Use
neon_perm_struct_or_reg_operand instead of
neon_struct_or_register_operand.
(*movmisalign_neon_load, *movmisalign_neon_store): Use
neon_permissive_struct_operand instead of neon_struct_operand.
* config/arm/constraints.md (Un, Um, Us): Adjust calls to
neon_vector_mem_operand.
* config/arm/predicates.md (neon_struct_operand): Adjust call to
neon_vector_mem_operand.
(neon_permissive_struct_operand): New.
(neon_struct_or_register_operand): Rename to...
(neon_perm_struct_or_reg_operand): This. Adjust call to
neon_vector_mem_operand.
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 200070)
+++ gcc/config/arm/arm.c	(working copy)
@@ -7863,7 +7863,7 @@ arm_rtx_costs_1 (rtx x, enum rtx_code ou
 	  && GET_CODE (SET_SRC (x)) == VEC_SELECT)
 	{
 	  *total = rtx_cost (SET_DEST (x), code, 0, speed);
-	  if (!neon_vector_mem_operand (SET_DEST (x), 2))
+	  if (!neon_vector_mem_operand (SET_DEST (x), 2, true))
 	*total += COSTS_N_INSNS (1);
 	  return true;
 	}
@@ -7874,7 +7874,7 @@ arm_rtx_costs_1 (rtx x, enum rtx_code ou
 	{
 	  rtx mem = XEXP (XEXP (SET_SRC (x), 0), 0);
 	  *total = rtx_cost (mem, code, 0, speed);
-	  if (!neon_vector_mem_operand (mem, 2))
+	  if (!neon_vector_mem_operand (mem, 2, true))
 	*total += COSTS_N_INSNS (1);
 	  return true;
 	}
@@ -10046,7 +10046,7 @@ arm_coproc_mem_operand (rtx op, bool wb)
 2 - Element/structure loads (vld1)
  */
 int
-neon_vector_mem_operand (rtx op, int type)
+neon_vector_mem_operand (rtx op, int type, bool strict)
 {
   rtx ind;
 
@@ -10058,7 +10058,7 @@ neon_vector_mem_operand (rtx op, int typ
 	  || reg_mentioned_p (virtual_outgoing_args_rtx, op)
 	  || reg_mentioned_p (virtual_stack_dynamic_rtx, op)
 	  || reg_mentioned_p (virtual_stack_vars_rtx, op)))
-return FALSE;
+return !strict;
 
   /* Constants are converted into offsets from labels.  */
   if (!MEM_P (op))
@@ -10168,7 +10168,7 @@ coproc_secondary_reload_class (enum mach
 {
   if (!TARGET_NEON_FP16)
 	return GENERAL_REGS;
-  if (s_register_operand (x, mode) || neon_vector_mem_operand (x, 2))
+  if (s_register_operand (x, mode) || neon_vector_mem_operand (x, 2, true))
 	return NO_REGS;
   return GENERAL_REGS;
 }
Index: gcc/config/arm/arm-protos.h
===
--- gcc/config/arm/arm-protos.h	(revision 200070)
+++ gcc/config/arm/arm-protos.h	(working copy)
@@ -95,7 +95,7 @@ extern enum reg_class coproc_secondary_r
 extern bool arm_tls_referenced_p (rtx);
 
 extern int arm_coproc_mem_operand (rtx, bool);
-extern int neon_vector_mem_operand (rtx, int);
+extern int neon_vector_mem_operand (rtx, int, bool);
 extern int neon_struct_mem_operand (rtx);
 extern int arm_no_early_store_addr_dep (rtx, rtx);
 extern int arm_early_store_addr_dep (rtx, rtx);
Index: gcc/config/arm/neon.md
===
--- gcc/config/arm/neon.md	(revision 200070)
+++ gcc/config/arm/neon.md	(working copy)
@@ -241,8 +241,8 @@
 })
 
 (define_expand "movmisalign"
-  [(set (match_operand:VDQX 0 "neon_struct_or_register_operand")
-	(unspec:VDQX [(match_operan

Re: [patch 3/5] don't restrict bit range for -fstrict-volatile-bitfields

2013-06-17 Thread Richard Biener

On Sun, Jun 16, 2013 at 9:26 PM, Jakub Jelinek  wrote:
> On Sun, Jun 16, 2013 at 01:08:12PM -0600, Sandra Loosemore wrote:
>> This patch fixes the PR23623 regression.  In conjunction with part 2
>> of the series, it also fixes the new volatile-bitfields-3.c test
>> case.
>>
>> As I noted in previous discussion, there might be a better place to
>> accomplish this effect, but hacking DECL_BIT_FIELD_REPRESENTATIVE
>> can't work because the volatile-ness may be coming from a qualifier
>> on the pointer or object from which the field is being extracted,
>> rather than from a volatile qualifier on the bit field decl.  I
>> think the choices are to do it in get_bit_range (as in this patch),
>> in the callers of get_bit_range, or at the places where the bit
>> range information is being used.
>
> So does this means you just always violate C++11 memory model requirements
> with -fstrict-volatile-bitfields?  That doesn't seem to be a good idea.

Yeah, it's not clear to me that this patch fixes anything by design.  It
mainly changes things from limiting the access in some way to not
limiting it at all ...

Richard.

>> 2013-06-16  Sandra Loosemore  
>>
>>   PR middle-end/23623
>>
>>   gcc/
>>   * expr.c (get_bit_range): Handle flag_strict_volatile_bitfields.
>
> Jakub

[PATCH] PowerPC: Fix test case for PR55033

2013-06-17 Thread Sebastian Huber

Tested on powerpc64-unknown-linux-gnu with:

make -k check RUNTESTFLAGS="--target_board=unix'{-m64,-m32,-m32/-mpowerpc64}'"

gcc/testsuite/ChangeLog
2013-06-17  Sebastian Huber  

PR target/55033
* gcc.target/powerpc/pr55033.c: Fix options.
---
 gcc/testsuite/gcc.target/powerpc/pr55033.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/pr55033.c 
b/gcc/testsuite/gcc.target/powerpc/pr55033.c
index 2c1835e..fcc6bfc 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr55033.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr55033.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target powerpc_eabi_ok } */
-/* { dg-options "-mcpu=8540 -msoft-float -msdata=eabi -G 8 -fno-common" } */
+/* { dg-options "-mcpu=8540 -msoft-float -meabi -msdata=eabi -G 8 -fno-common" 
} */
 
 extern void f (void);
 
-- 
1.7.7

Re: Symtab cleanup 6/17: fix handling of constructors of aliases

2013-06-17 Thread Richard Biener

On Mon, Jun 17, 2013 at 11:53 AM, Jan Hubicka  wrote:
> Hi,
> this patch makes it possible to fold through aliases.  It may seem 
> unimportant, but we
> run into those cases in C++ where extra name aliases may get used by 
> devirtualization
> machinery.  The patch also fixes the following long standing bug:
> jh@gcc10:~/trunk/build2/gcc$ cat t.c
> static int a=4;
> static int b __attribute__ ((alias("a")));
> main()
> {
>return b+a;
> }

The gimple alias machinery also happily treats a and b as different decls
btw (so does the PTA code).  To fix that the lookups have to be fast
(no hashtable query at least for the case where there is no alias).

Richard.

> jh@gcc10:~/trunk/build2/gcc$ gcc -O2 t.c -S
> jh@gcc10:~/trunk/build2/gcc$ more t.s
> .file   "t.c"
> .text
> .p2align 4,,15
> .globl main
> .type   main, @function
> main:
> .LFB0:
> .cfi_startproc
> movl$4, %eax
> ret
> .cfi_endproc
> .LFE0:
> .size   main, .-main
> .section.rodata
> .align 4
> .type   a, @object
> .size   a, 4
> a:
> .long   4
> .setb,a
> .ident  "GCC: (Debian 4.4.5-8) 4.4.5"
> .section.note.GNU-stack,"",@progbits
> jh@gcc10:~/trunk/build2/gcc$ gcc --version
> gcc (Debian 4.4.5-8) 4.4.5
> Copyright (C) 2010 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> jh@gcc10:~/trunk/build2/gcc$ ./xgcc -B ./ -O2 t.c -S
> jh@gcc10:~/trunk/build2/gcc$ more t.s
> .file   "t.c"
> .section.text.startup,"ax",@progbits
> .p2align 4,,15
> .globl  main
> .type   main, @function
> main:
> .LFB0:
> .cfi_startproc
> movl$8, %eax
> ret
> .cfi_endproc
> .LFE0:
> .size   main, .-main
> .ident  "GCC: (GNU) 4.9.0 20130616 (experimental)"
> .section.note.GNU-stack,"",@progbits
>
> The main idea is to replace const_value_known_p predicate by ctor_for_folding
> that returns the ctor and is able to look through aliases via the symbol 
> table.
> The huge change in expand_expr_real_1 is really just a reformating.
>
> I have bootstrapped/regtested the patch on x86_64-linux and tested with 
> Firefox build.
> I am running now PPC64 test and will wait for Richard's renaming patch.
>
> Honza
>
> * cgraph.h (const_value_known_p): Replace by ...
> (ctor_for_folding): .. this one.
> * cgraphunit.c (process_function_and_variable_attributes): Use it.
> * lto-cgraph.c (compute_ltrans_boundary): Use ctor_for_folding.
> * expr.c (expand_expr_real_1): Likewise.
> (string_constant): Likewise.
> * tree-ssa-loop-ivcanon.c (constant_after_peeling): Likewise.
> * ipa.c (process_references): Likewise.
> (symtab_remove_unreachable_nodes): Likewise.
> * ipa-inline-analysis.c (param_change_prob): Likewise.
> * gimple-fold.c (canonicalize_constructor_val): Likewise.
> (get_base_constructor): Likwise.
> * varpool.c (varpool_remove_node): Likewise.
> (varpool_remove_initializer): LIkewise.
> (dump_varpool_node): LIkwise.
> (const_value_known_p): Rewrite to ...
> (ctor_for_folding): ... this one.
>
> * lto-partition.c (add_references_to_partition): Use
> ctor_for_folding.
>
> * gcc.dg/tree-ssa/attr-alias-2.c: New testcase.
> Index: cgraph.h
> ===
> *** cgraph.h(revision 200147)
> --- cgraph.h(working copy)
> *** void varpool_analyze_node (struct varpoo
> *** 797,803 
>   struct varpool_node * varpool_extra_name_alias (tree, tree);
>   struct varpool_node * varpool_create_variable_alias (tree, tree);
>   void varpool_reset_queue (void);
> ! bool const_value_known_p (tree);
>   bool varpool_for_node_and_aliases (struct varpool_node *,
>bool (*) (struct varpool_node *, void *),
>void *, bool);
> --- 797,803 
>   struct varpool_node * varpool_extra_name_alias (tree, tree);
>   struct varpool_node * varpool_create_variable_alias (tree, tree);
>   void varpool_reset_queue (void);
> ! tree ctor_for_folding (tree);
>   bool varpool_for_node_and_aliases (struct varpool_node *,
>bool (*) (struct varpool_node *, void *),
>void *, bool);
> Index: cgraphunit.c
> ===
> *** cgraphunit.c(revision 200147)
> --- cgraphunit.c(working copy)
> *** process_function_and_variable_attributes
> *** 761,768 
>   {
> tree decl = vnode->symbol.decl;
> if (DECL_EXTERNAL (decl)
> ! && DECL_INITIAL

RE: [PATCH, ARM][2 of 2] Enable shrink-wrap for ARM

2013-06-17 Thread Kyrylo Tkachov

Hi Zhenqiang,

This patch causes PR57637 (miscompare in SPEC2000). I'll try to reduce a
testcase.

Thanks,
Kyrill

> The *arm_simple_return is the same as "simple_return" used by
> shrink-wrap. *arm_return and *arm_simple_return are not merged
> since
> *arm_return is for "Often the return insn will be the same as
> loading
> from memory", while *arm_simple_return is for "branch".
> 
> For thumb2, there is only "simple_return" pattern --
> *thumb2_return.
> 
> And this is the reason why the "normal" return generated by
> epilogue
> and "simple" return generated by shrink-wrap can be optimized as
> one,
> which leads to dwarf info issue.
> 
> The rebased patch is attached.
> 
> Thanks!
> -Zhenqiang
> 
> 
> > On Wed, Apr 3, 2013 at 7:50 AM, Zhenqiang Chen
> >  wrote:
> >> On 2 April 2013 17:55, Ramana Radhakrishnan
>  wrote:
> >>> On Thu, Mar 21, 2013 at 7:03 AM, Zhenqiang Chen
> >>>  wrote:
>  Hi,
> 
>  The patch is to enable shrink-wrap for TARGET_ARM and
> TARGET_THUMB2.
> 
>  Bootstrapped and no make check regression.
>  All previous Linaro shrink-wrap bugs (http://goo.gl/6fGg5) are
> verified.
> 
>  Is it OK?
> >>>
> >>> The tests should be part of the patch attached and not just
> added as
> >>> separate files in your patch submission.
> >>
> >> Thank you for the comments. The patch is updated.
> >>
> >> -Zhenqiang
> >>
> 
>  Thanks!
>  -Zhenqiang
> 
>  ChangeLog:
>  2013-03-21 Bernd Schmidt  
> Zhenqiang Chen 
> 
>  * config/arm/arm-protos.h: Add and update function
> protos.
>  * config/arm/arm.c (use_simple_return_p): New added.
>  (thumb2_expand_return): Check simple_return flag.
>  * config/arm/arm.md: Add simple_return and conditional
> simple_return.
>  * config/arm/iterators.md: Add iterator for return and
> simple_return.
> 
>  testsuite/ChangeLog:
>  2013-03-21 Zhenqiang Chen 
> 
>  * gcc.dg/shrink-wrap-alloca.c: New added.
>  * gcc.dg/shrink-wrap-pretend.c: New added.
>  * gcc.dg/shrink-wrap-sibcall.c: New added.

[RFC] Using One Definition Rule for types during LTO devirtualizatoin?

2013-06-17 Thread Jan Hubicka

Hi,
during LTO we seem to give up on many valid devirtualization cases because
the types are not merged by type merging machinery. This is i.e. because their
declarations are different; one unit define a function, while in the other
unit it is just an external declaration.

It is my understanding that C++ standard enforces one definition rule for
types, too (to enable sane mangling?) and that we can basically match types
by their name and contextes (namespaces/outer classes)>  Does the attaches
patch make sense?

It enables a lot more devirtualization to happen during Firefox build.

Honza
Index: gimple-fold.c
===
*** gimple-fold.c   (revision 200063)
--- gimple-fold.c   (working copy)
*** gimple_extract_devirt_binfo_from_cst (tr
*** 1038,1044 
HOST_WIDE_INT pos, size;
tree fld;
  
!   if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (expected_type))
break;
if (offset < 0)
return NULL_TREE;
--- 1038,1044 
HOST_WIDE_INT pos, size;
tree fld;
  
!   if (types_same_for_odr (type, expected_type))
break;
if (offset < 0)
return NULL_TREE;
Index: tree.c
===
*** tree.c  (revision 200063)
--- tree.c  (working copy)
*** lhd_gcc_personality (void)
*** 11618,11623 
--- 11618,11695 
return gcc_eh_personality_decl;
  }
  
+ /* For languages with One Definition Rule, work out if
+decls are actually the same even if the tree representation
+differs.  This handles only decls appearing in TYPE_NAME
+and TYPE_CONTEXT.  That is NAMESPACE_DECL, TYPE_DECL,
+RECORD_TYPE and IDENTIFIER_NODE.  */
+ 
+ static bool
+ decls_same_for_odr (tree decl1, tree decl2)
+ {
+   if (decl1 && TREE_CODE (decl1) == TYPE_DECL
+   && DECL_ORIGINAL_TYPE (decl1))
+ decl1 = DECL_ORIGINAL_TYPE (decl1);
+   if (decl2 && TREE_CODE (decl2) == TYPE_DECL
+   && DECL_ORIGINAL_TYPE (decl2))
+ decl2 = DECL_ORIGINAL_TYPE (decl2);
+   if (decl1 == decl2)
+ return true;
+   if (!decl1 || !decl2)
+ return false;
+   if (TREE_CODE (decl1) != TREE_CODE (decl2))
+ return false;
+   if (TREE_CODE (decl1) == TRANSLATION_UNIT_DECL)
+ return true;
+   if (TREE_CODE (decl1) != NAMESPACE_DECL
+   && TREE_CODE (decl1) != RECORD_TYPE
+   && TREE_CODE (decl1) != TYPE_DECL)
+ return false;
+   if (!DECL_NAME (decl1))
+ return false;
+   if (!decls_same_for_odr (DECL_NAME (decl1), DECL_NAME (decl2)))
+ return false;
+   return decls_same_for_odr (DECL_CONTEXT (decl1),
+DECL_CONTEXT (decl2));
+ }
+ 
+ /* For languages with One Definition Rule, work out if
+types are same even if the tree representation differs. 
+This is non-trivial for LTO where minnor differences in
+the type representation may have prevented type merging
+to merge two copies of otherwise equivalent type.  */
+ 
+ bool
+ types_same_for_odr (tree type1, tree type2)
+ {
+   type1 = TYPE_MAIN_VARIANT (type1);
+   type2 = TYPE_MAIN_VARIANT (type2);
+   if (type1 == type2)
+ return true;
+   if (!type1 || !type2)
+ return false;
+ 
+   /* If types are not structuraly same, do not bother to contnue.
+  Match in the remainder of code would mean ODR violation.  */
+   if (!types_compatible_p (type1, type2))
+ return false;
+ 
+ #ifndef ENABLE_CHECKING
+   if (!in_lto_p)
+ return false;
+ #endif
+ 
+   if (!TYPE_NAME (type1))
+ return false;
+   if (!decls_same_for_odr (TYPE_NAME (type1), TYPE_NAME (type2)))
+ return false;
+   if (!decls_same_for_odr (TYPE_CONTEXT (type1), TYPE_CONTEXT (type2)))
+ return false;
+   gcc_assert (in_lto_p);
+ 
+   return true;
+ }
+ 
  /* Try to find a base info of BINFO that would have its field decl at offset
 OFFSET within the BINFO type and which is of EXPECTED_TYPE.  If it can be
 found, return, otherwise return NULL_TREE.  */
*** get_binfo_at_offset (tree binfo, HOST_WI
*** 11633,11639 
tree fld;
int i;
  
!   if (TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (expected_type))
  return binfo;
if (offset < 0)
return NULL_TREE;
--- 11705,11711 
tree fld;
int i;
  
!   if (types_same_for_odr (type, expected_type))
  return binfo;
if (offset < 0)
return NULL_TREE;
*** get_binfo_at_offset (tree binfo, HOST_WI
*** 11663,11669 
{
  tree base_binfo, found_binfo = NULL_TREE;
  for (i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)
!   if (TREE_TYPE (base_binfo) == TREE_TYPE (fld))
  {
found_binfo = base_binfo;
break;
--- 11735,11741 
{
  tree base_binfo, found_binfo = NULL_TREE;
  for (i = 0; BINFO_BASE_ITERATE (binfo, i, base_binfo); i++)

Symtab cleanup 6/17: fix handling of constructors of aliases

2013-06-17 Thread Jan Hubicka

Hi,
this patch makes it possible to fold through aliases.  It may seem unimportant, 
but we
run into those cases in C++ where extra name aliases may get used by 
devirtualization
machinery.  The patch also fixes the following long standing bug:
jh@gcc10:~/trunk/build2/gcc$ cat t.c
static int a=4;
static int b __attribute__ ((alias("a")));
main()
{
   return b+a;
}
jh@gcc10:~/trunk/build2/gcc$ gcc -O2 t.c -S
jh@gcc10:~/trunk/build2/gcc$ more t.s
.file   "t.c"
.text
.p2align 4,,15
.globl main
.type   main, @function
main:
.LFB0:
.cfi_startproc
movl$4, %eax
ret
.cfi_endproc
.LFE0:
.size   main, .-main
.section.rodata
.align 4
.type   a, @object
.size   a, 4
a:
.long   4
.setb,a
.ident  "GCC: (Debian 4.4.5-8) 4.4.5"
.section.note.GNU-stack,"",@progbits
jh@gcc10:~/trunk/build2/gcc$ gcc --version
gcc (Debian 4.4.5-8) 4.4.5
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

jh@gcc10:~/trunk/build2/gcc$ ./xgcc -B ./ -O2 t.c -S
jh@gcc10:~/trunk/build2/gcc$ more t.s
.file   "t.c"
.section.text.startup,"ax",@progbits
.p2align 4,,15
.globl  main
.type   main, @function
main:
.LFB0:
.cfi_startproc
movl$8, %eax
ret
.cfi_endproc
.LFE0:
.size   main, .-main
.ident  "GCC: (GNU) 4.9.0 20130616 (experimental)"
.section.note.GNU-stack,"",@progbits

The main idea is to replace const_value_known_p predicate by ctor_for_folding
that returns the ctor and is able to look through aliases via the symbol table.
The huge change in expand_expr_real_1 is really just a reformating.

I have bootstrapped/regtested the patch on x86_64-linux and tested with Firefox 
build.
I am running now PPC64 test and will wait for Richard's renaming patch.

Honza

* cgraph.h (const_value_known_p): Replace by ...
(ctor_for_folding): .. this one.
* cgraphunit.c (process_function_and_variable_attributes): Use it.
* lto-cgraph.c (compute_ltrans_boundary): Use ctor_for_folding.
* expr.c (expand_expr_real_1): Likewise.
(string_constant): Likewise.
* tree-ssa-loop-ivcanon.c (constant_after_peeling): Likewise.
* ipa.c (process_references): Likewise.
(symtab_remove_unreachable_nodes): Likewise.
* ipa-inline-analysis.c (param_change_prob): Likewise.
* gimple-fold.c (canonicalize_constructor_val): Likewise.
(get_base_constructor): Likwise.
* varpool.c (varpool_remove_node): Likewise.
(varpool_remove_initializer): LIkewise.
(dump_varpool_node): LIkwise.
(const_value_known_p): Rewrite to ...
(ctor_for_folding): ... this one.

* lto-partition.c (add_references_to_partition): Use
ctor_for_folding.

* gcc.dg/tree-ssa/attr-alias-2.c: New testcase.
Index: cgraph.h
===
*** cgraph.h(revision 200147)
--- cgraph.h(working copy)
*** void varpool_analyze_node (struct varpoo
*** 797,803 
  struct varpool_node * varpool_extra_name_alias (tree, tree);
  struct varpool_node * varpool_create_variable_alias (tree, tree);
  void varpool_reset_queue (void);
! bool const_value_known_p (tree);
  bool varpool_for_node_and_aliases (struct varpool_node *,
   bool (*) (struct varpool_node *, void *),
   void *, bool);
--- 797,803 
  struct varpool_node * varpool_extra_name_alias (tree, tree);
  struct varpool_node * varpool_create_variable_alias (tree, tree);
  void varpool_reset_queue (void);
! tree ctor_for_folding (tree);
  bool varpool_for_node_and_aliases (struct varpool_node *,
   bool (*) (struct varpool_node *, void *),
   void *, bool);
Index: cgraphunit.c
===
*** cgraphunit.c(revision 200147)
--- cgraphunit.c(working copy)
*** process_function_and_variable_attributes
*** 761,768 
  {
tree decl = vnode->symbol.decl;
if (DECL_EXTERNAL (decl)
! && DECL_INITIAL (decl)
! && const_value_known_p (decl))
varpool_finalize_decl (decl);
if (DECL_PRESERVE_P (decl))
vnode->symbol.force_output = true;
--- 761,767 
  {
tree decl = vnode->symbol.decl;
if (DECL_EXTERNAL (decl)
! && DECL_INITIAL (decl))
varpool_finalize_decl (decl);
if (DECL_PRESERVE_P (decl))
vnode->symbol.force_output = true;
Index: testsuite/gcc.dg/tree-ssa/attr-alias-2.c
===

[PATCH] Fix PR57630

2013-06-17 Thread Marek Polacek

This improves the diagnostics messages in case we're using initial
declarations in the for loop, but we're not using C99 or newer standard;
in this case we shouldn't forget about =c11 and =gnu11, where the
initial declaration is fine as well.

Ok for trunk?

2013-06-17  Marek Polacek  

PR c/57630
* c-decl.c (check_for_loop_decls): Improve diagnostics messages.

--- gcc/c/c-decl.c.mp2  2013-06-17 11:23:18.007191155 +0200
+++ gcc/c/c-decl.c  2013-06-17 11:32:47.410472006 +0200
@@ -8503,11 +8503,12 @@ check_for_loop_decls (location_t loc, bo
 the C99 for loop scope.  This doesn't make much sense, so don't
 allow it.  */
   error_at (loc, "% loop initial declarations "
-   "are only allowed in C99 mode");
+   "are only allowed in C99 or C11 mode");
   if (hint)
{
  inform (loc,
- "use option -std=c99 or -std=gnu99 to compile your code");
+ "use option -std=c99, -std=gnu99, -std=c11 or -std=gnu11 "
+ "to compile your code");
  hint = false;
}
   return NULL_TREE;

Marek

Re: [PATCH][ARM] Fix FAIL: gcc.target/arm/unaligned-memcpy-2.c scan-assembler-times stmia 1

2013-06-17 Thread Ramana Radhakrishnan


On 06/17/13 10:24, Kyrylo Tkachov wrote:

Hi all,

This arm testsuite patch initialises an array in the
unaligned-memcpy-2.c test to ensure alignment, making the test pass.
This is similar to the patch
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00683.html.

Ok for trunk?


Ok - thanks for catching this : looks like I missed it in my original trawl.

regards
Ramana



Thanks,
Kyrill

gcc/testsuite/
2013-06-17  Kyrylo Tkachov  

* gcc.target/arm/unaligned-memcpy-2.c (dest): Initialize
to ensure alignment.

[PATCH][ARM] Fix FAIL: gcc.target/arm/unaligned-memcpy-2.c scan-assembler-times stmia 1

2013-06-17 Thread Kyrylo Tkachov

Hi all,

This arm testsuite patch initialises an array in the
unaligned-memcpy-2.c test to ensure alignment, making the test pass.
This is similar to the patch
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00683.html.

Ok for trunk?

Thanks,
Kyrill

gcc/testsuite/
2013-06-17  Kyrylo Tkachov  

* gcc.target/arm/unaligned-memcpy-2.c (dest): Initialize
to ensure alignment.

unaligned-memcpy-test.patch
Description: Binary data

Re: [PATCH] ggc-page.c : remove erroneous ATTRIBUTE_UNUSED

2013-06-17 Thread Richard Biener

On Fri, Jun 14, 2013 at 11:17 PM, David Malcolm  wrote:
> ggc_pch_write_object's parameter "d" is marked with ATTRIBUTE_UNUSED,
> but in fact it is used in 4 places at the end of the function.
>
> Successfully bootstrapped on x86_64-unknown-linux-gnu.
>
> OK for trunk?

Ok.

Thanks,
Richard.

> 2013-06-14  David Malcolm  
>
> * ggc-page.c (ggc_pch_write_object) : Remove erroneous
> ATTRIBUTE_UNUSED marking.
>
>

Re: [patch] reimplement -fstrict-volatile-bitfields

2013-06-17 Thread Richard Biener

On Fri, Jun 14, 2013 at 6:31 PM, Sandra Loosemore
 wrote:
> On 06/14/2013 06:31 AM, Richard Biener wrote:
>
>> I think we can split the patch up, so let me do piecewise approval of
>> changes.
>>
>> The changes that remove the packedp flag passing around and remove
>> the warning code are ok.  The store_bit_field_1 change is ok, as is
>> the similar extract_bit_field_1 change (guard the other register copying
>> path).
>>
>> Please split those out and commit them separately (I suppose they are
>> strict progressions in testing).
>
>
> I will work on splitting the patch, but I feel a little uncomfortable about
> starting to commit it piecewise without some more clear indication that this
> is the right direction.  Otherwise we'll just be back in the situation where
> things are still inconsistent and broken, but in a different way than
> before.
>
>
>> That leaves a much smaller set of changes for which I have one comment
>> for now:
>>
>> @@ -4508,6 +4508,16 @@ get_bit_range (unsigned HOST_WIDE_INT *b
>>
>> gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
>>
>> +  /* If -fstrict-volatile-bitfields was given and this is a volatile
>> + access, then we must ignore any DECL_BIT_FIELD_REPRESENTATIVE and
>> + do the access in the mode of the field.  */
>> +  if (TREE_THIS_VOLATILE (exp)
>> +  && flag_strict_volatile_bitfields > 0)
>> +{
>> +  *bitstart = *bitend = 0;
>> +  return;
>> +}
>>
>> with the past reviews where I said the implementation was broken anyway
>> I was hinting at that DECL_BIT_FIELD_REPRESENTATIVE should simply
>> be chosen in a way to adhere to flag_strict_volatile_bitfields.  I had the
>> impression that this alone should be enough to implement strict volatile
>> bitfields correctly and no code in the backend would need to check
>> for flag_strict_volatile_bitfields.  I may of course be wrong here, as
>> -fstrict-volatile-bitfields may apply to cases where the bitfield is _not_
>> declared volatile but only the decl?  Thus,
>>
>> struct X {
>>int i : 3;
>> };
>>
>> volatile struct X x;
>>
>> Is x.i an access that needs to adhere to strict volatile bitfield
>> accesses?
>
>
> Yes, exactly; see the new pr23623.c test case included with the patch, for
> instance.  Or the volatile bitfield access could be through a
> volatile-qualified pointer, as in the volatile-bitfields-3.c test case.
> Bernd Schmidt and I had some internal discussion about this and neither of
> us saw how messing with DECL_BIT_FIELD_REPRESENTATIVE could help where the
> field is not declared volatile but the object being referenced is.

Yeah, I can see that.

>> Note that IIRC whether you short-cut get_bit_range in the above way
>> you still get the access to use the mode of the FIELD_DECL.
>
>
> As I noted in my previous message, the pr23623.c test case was failing on
> all the targets I tried without this patch hunk, so the access was clearly
> not using the mode of the FIELD_DECL without it.
>
>
>> If the above constitutes a strict volatile bitfield access then the very
>> correct implementation of strict volatile bitfield accesses is to
>> adjust the memory accesses the frontends generate (consider
>> LTO and a TU compiled with -fstrict-volatile-bitfields and another TU
>> compiled with -fno-strict-volatile-bitfields).  Which it probably does
>> reasonably well already (setting TREE_THIS_VOLATILE on the
>> outermost bit-field COMPONENT_REF).  If that is not preserved
>> then converting the accesses to BIT_FIELD_REFs is another
>> possibility.  That said, the behavior of GIMPLE IL should not change
>> dependent on a compile flag.
>
>
> I am kind of lost, here.  -fstrict-volatile-bitfields is a code generation
> option, not a front end option, and it seems like LTO should not need any
> special handling here any more than it does for any of the gazillion other
> code generation options supported by GCC.

LTO doesn't have any handling of code-generation options.  And it cannot
reliably.  It has a few hacks to support mixed -fstrict-aliasing units for
example, but it will be completely lost if you mix -fstrict-volatile-bitfields
with -fno-strict-volatile-bitfields TUs.

Of course the easiest solution here is always adhere to the ABI and
simply drop the command-line flag.  Make it a target hook instead.

That said, with the goal to have bitfield accesses lowered to their
final memory access patterns way earlier than RTL expansion we'll end
up using BIT_FIELD_REF-like accesses for the strict volatile bitfields
at some point anyway.

Richard.


> -Sandra
>

Re: [ping][PATCH][1 of 2] Add value range info to SSA_NAME for zero sign extension elimination in RTL

2013-06-17 Thread Richard Biener

On Mon, 17 Jun 2013, Kugan wrote:

> Can you please help to review this patch? Richard reviewed the original patch
> and asked it to be split into two parts. Also, he wanted a review from RTL
> maintainer for the RTL changes.
> 
> Thanks,
> Kugan
> 
> On 03/06/13 11:43, Kugan wrote:
> > Hi,
> > 
> > This patch adds value range information to tree SSA_NAME during Value
> > Range Propagation (VRP) pass  in preparation to removes some of the
> > redundant sign/zero extensions during RTL expansion.
> > 
> > This is based on the original patch posted in
> > http://gcc.gnu.org/ml/gcc-patches/2013-05/msg00610.html and addresses
> > the review comments of  Richard Biener.
> > 
> > Tested  on X86_64 and ARM.
> > 
> > I would like review comments on this.
> > 
> > Thanks,
> > Kugan
> > 
> > 
> > +2013-06-03  Kugan Vivekanandarajah  
> > +
> > +* gcc/gcc/tree-flow.h: Declared structure range_info_def and function
> > +definition for mark_range_info_unknown.
> > +* gcc/tree-ssa-alias.c (dump_alias_info) : Check pointer type
> > +* gcc/tree-ssanames.c (make_ssa_name_fn) : Check pointer type in
> > +initialize.
> > +* (mark_range_info_unknown) : New function.
> > +* (duplicate_ssa_name_range_info) : Likewise.
> > +* (duplicate_ssa_name_fn) : Check pointer type and call correct
> > +duplicate function.
> > +* gcc/tree-vrp.c (extract_exp_value_range): New function.
> > +* (simplify_stmt_using_ranges): Call extract_exp_value_range and
> > +tree_ssa_set_value_range.
> > +* gcc/tree.c (tree_ssa_set_value_range): New function.
> > +* gcc/tree.h (SSA_NAME_PTR_INFO) : changed to access via union
> > +* gcc/tree.h (SSA_NAME_RANGE_INFO) : New macro

These go to gcc/ChangeLog and thus do not need the gcc/ prefix.

+/* Value range information for SSA_NAMEs representing non-pointer 
variables.  */
+
+struct GTY (()) range_info_def {
+  /* Set to true if VR_RANGE and false if VR_ANTI_RANGE.  */
+  bool vr_range;
+  /* Minmum for value range.  */
+  double_int min;
+  /* Maximum for value range.  */
+  double_int max;
+  /* Set to true if range is valid.  */
+  bool valid;
+};

this way you waste a lot of padding, so please move the two bool
members next to each other.

+/* Extract the value range of assigned exprassion for GIMPLE_ASSIGN stmt.
+   If the extracted value range is valid, return true else return
+   false.  */
+static bool
+extract_exp_value_range (gimple stmt, value_range_t *vr)
+{
+  gcc_assert (is_gimple_assign (stmt));
+  tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree lhs = gimple_assign_lhs (stmt);
+  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
...
@@ -8960,6 +9016,23 @@ simplify_stmt_using_ranges (gimple_stmt_iterator 
*gsi)
 {
   enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
   tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree lhs = gimple_assign_lhs (stmt);
+
+  /* Set value range information for ssa.  */
+  if (!POINTER_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+  && (TREE_CODE (gimple_assign_lhs (stmt)) == SSA_NAME)
+  && INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
+  && !SSA_NAME_RANGE_INFO (lhs))
+{
+  value_range_t vr = VR_INITIALIZER;
...
+  if (extract_exp_value_range (stmt, &vr))
+tree_ssa_set_value_range (lhs,
+  tree_to_double_int (vr.min),
+  tree_to_double_int (vr.max),
+  vr.type == VR_RANGE);
+}

This looks overly complicated to me.  In vrp_finalize you can simply do

  for (i = 0; i < num_vr_values; i++)
if (vr_value[i])
  {
tree name = ssa_name (i);
if (POINTER_TYPE_P (name))
  continue;
if (vr_value[i].type == VR_RANGE
|| vr_value[i].type == VR_ANTI_RANGE)
  tree_ssa_set_value_range (name, tree_to_double_int 
(vr_value[i].min), tree_to_double_int (vr_value[i].max), vr_value[i].type 
== VR_RANGE);
  }

that is, transfer directly from the lattice.

+/* Set zero/sign extension redundant for ssa def stmt.  */
+
+void
+tree_ssa_set_value_range (tree ssa, double_int min,
+  double_int max, bool vr_range)
+{

The comment needs updating.  Also this doesn't belong in tree.c
but in tree-ssanames.c with a more suitable name like
set_range_info ().

+/* The garbage collector needs to know the interpretation of the
+   value range info in the tree_ssa_name.  */
+#define TREE_SSA_PTR_INFO   (0)
+#define TREE_SSA_RANGE_INFO (1)

no need for those, just use GTY ((tag ("0")) and "1".

+  /* Value range information.  */

/* SSA name annotations.  */

+  union vrp_info_type {
+/* Pointer attributes used for alias analysis.  */
+struct GTY ((tag ("TREE_SSA_PTR_INFO"))) ptr_info_def *ptr_info;
+/* Value range attributes used for zero/sign extension elimination.  
*/

/* Value range information.  */

+struct GTY ((tag ("TREE_

RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Bin Cheng



> -Original Message-
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Monday, June 17, 2013 4:42 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address
mode
> when expanding array reference
> 
> > My mistake. It's because arm_legitimize_address cannot re-factor
> > "[r105 +
> > r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]".  Do you
> > suggest that this kind of transformation should be done in backend?  I
> > can think of some disadvantages by doing it in backend:
> > Different kinds of address expression might be wanted in different
> > phase of compilation, for example, sometimes GCC wants to force
> > computation of address expression out of memory access because it
> > cannot CSE array indexing.  It's difficult to differentiate in backend.
> 
> It's equally difficult in memory_address_addr_space and it affects all the
> architectures at once...  You said in the original message that Thumb2 and
ARM
> modes are already not equally sensitive to it, so it's not unthinkable
that
> some architectures might not want it at all.  Therefore, yes, this should
> preferably be handled in arm_legitimize_address.

I was thinking it would be more accurate to capture the scaled_offset
without over-kill by doing it in offset_address.  Only problem I can image
is the change forces addr into register, which might nullifies backend
dependent transformation.
I will try to do it in arm_legitimize_address and see what's the result.

Thanks.
bin

RE: [PATCH][ARM][6/n] Partial IT block deprecation in ARMv8 AArch32 - VFP patterns

2013-06-17 Thread Kyrylo Tkachov

Ping?
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00493.html

Thanks,
Kyrill

> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Kyrylo Tkachov
> Sent: 10 June 2013 11:52
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw; Ramana Radhakrishnan
> Subject: [PATCH][ARM][6/n] Partial IT block deprecation in ARMv8
> AArch32 - VFP patterns
> 
> Hi all,
> 
> This patch makes the changes to the various floating point patterns
> in
> vfp.md. Since pretty much all floating point instruction are always
> encoded in 32 bits, they cannot be used inside an IT block by the
> -mrestrict-it rules. Therefore this patch just goes and disables
> the
> predicable variants of the offending VFP patterns.
> 
> The conditional floating point move patterns are disabled
> altogether for
> arm_restrict_it because they explicitly use IT blocks in their
> output
> template and the corresponding expanders in arm.md are updated to
> make
> sure we use the new vsel instruction that is available in ARMv8
> when
> appropriate.
> 
> Tested arm-none-eabi on qemu and model as part of the whole series
> and
> also independently bootstrapped with -mrestrict-it enabled on a
> Cortex-A15.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2013-06-10  Kyrylo Tkachov  
> 
>   * config/arm/predicates.md (arm_cond_move_operator): New
> predicate.
>   * config/arm/arm.md (movsfcc): Use arm_cond_move_operator
> predicate.
>   (movdfcc): Likewise.
>   * config/arm/vfp.md (*thumb2_movsf_vfp):
>   Disable predication for arm_restrict_it.
>   (*thumb2_movsfcc_vfp): Disable for arm_restrict_it.
>   (*thumb2_movdfcc_vfp): Likewise.
>   (*abssf2_vfp, *absdf2_vfp, *negsf2_vfp, *negdf2_vfp,
> *addsf3_vfp,
>   *adddf3_vfp, *subsf3_vfp, *subdf3_vfpc, *divsf3_vfp,
> *divdf3_vfp,
>   *mulsf3_vfp, *muldf3_vfp, *mulsf3negsf_vfp, *muldf3negdf_vfp,
>   *mulsf3addsf_vfp, *muldf3adddf_vfp, *mulsf3subsf_vfp,
>   *muldf3subdf_vfp, *mulsf3negsfaddsf_vfp,
> *fmuldf3negdfadddf_vfp,
>   *mulsf3negsfsubsf_vfp, *muldf3negdfsubdf_vfp,
> *fma4,
>   *fmsub4, *fnmsub4, *fnmadd4,
>   *extendsfdf2_vfp, *truncdfsf2_vfp, *extendhfsf2, *truncsfhf2,
>   *truncsisf2_vfp, *truncsidf2_vfp, fixuns_truncsfsi2,
> fixuns_truncdfsi2,
>   *floatsisf2_vfp, *floatsidf2_vfp, floatunssisf2,
> floatunssidf2,
>   *sqrtsf2_vfp, *sqrtdf2_vfp, *cmpsf_vfp, *cmpsf_trap_vfp,
> *cmpdf_vfp,
>   *cmpdf_trap_vfp, 2):
>   Disable predication for arm_restrict_it.

Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Eric Botcazou

> My mistake. It's because arm_legitimize_address cannot re-factor "[r105 +
> r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]".  Do you
> suggest that this kind of transformation should be done in backend?  I can
> think of some disadvantages by doing it in backend:
> Different kinds of address expression might be wanted in different phase of
> compilation, for example, sometimes GCC wants to force computation of
> address expression out of memory access because it cannot CSE array
> indexing.  It's difficult to differentiate in backend.

It's equally difficult in memory_address_addr_space and it affects all the 
architectures at once...  You said in the original message that Thumb2 and ARM 
modes are already not equally sensitive to it, so it's not unthinkable that 
some architectures might not want it at all.  Therefore, yes, this should 
preferably be handled in arm_legitimize_address.

-- 
Eric Botcazou

RE: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Bin Cheng



> -Original Message-
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Monday, June 17, 2013 3:32 PM
> To: Bin Cheng
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address
mode
> when expanding array reference
> 
> > The problem occurs when accessing local array element. For example, #
> > VUSE <.MEM_27>
> > k_8 = parent[k_29];
> >
> > GCC calculates the address in three steps:
> > 1) addr is calculated as "r105 + (-2064)".
> > 2) offset is calculated as "r165*4".
> > 3) calls offset_address to combine the address into "r105+ r165*4 +
> > (-2064)".
> >
> > Since ADDR is valid and there is no call to memory_address_addr_space
> > in offset_address, the invalid address expression has no chance to go
> > through target dependent legitimization function.
> 
> But offset_address calls change_address_1 with validate set to 1 so during
RTL
> expansion memory_address_addr_space should be invoked on the invalid
address.
Yes, I missed that part.

> 
> > Even there is a chance, the
> > current implementation of memory_address_addr_space prevents the
> > scaled address expression from being generated because of below code:
> >   if (! cse_not_expected && !REG_P (x))
> >   x = break_out_memory_refs (x);
> 
> Where are the memory references in the above address?
My mistake. It's because arm_legitimize_address cannot re-factor "[r105 +
r165*4 + (-2064)]" into "rx = r105 + (-2064); [rx + r165*4]".  Do you
suggest that this kind of transformation should be done in backend?  I can
think of some disadvantages by doing it in backend:
Different kinds of address expression might be wanted in different phase of
compilation, for example, sometimes GCC wants to force computation of
address expression out of memory access because it cannot CSE array
indexing.  It's difficult to differentiate in backend.

Thanks.
bin

Re: [PATCH] Re-write LTO type merging again, do tree merging

2013-06-17 Thread Jan Hubicka

> > CPU: AMD64 family10, speed 2100 MHz (estimated)
> > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit 
> > mask of 0x00 (No unit mask) count 75
> > samples  %app name symbol name
> > 4504711.7420  lto1 inflate_fast
> 
> It might be worth changing LTO section layout to include a header
> that specifies whether a section is compressed or not so we can
> allow mixed compressed/uncompressed sections in the LTRANS files
> and avoid decompressing the function sections.

Yes, but this profile shows only decl streaming. Functions do not really show
up in profile.  I guess only way to cut this down is to either use LZO that
is faster at decompression side and/or reduce amount of data we stream to .o
files.
> 
> > 34224 8.9209  lto1 
> > streamer_read_uhwi(lto_input_block*)
> > 24630 6.4201  lto1 compare_tree_sccs_1(tree_node*, 
> > tree_node*, tree_node***)
> > 23205 6.0487  lto1 
> > pointer_map_insert(pointer_map_t*, void const*)
> > 20829 5.4293  lto1 unpack_value_fields(data_in*, 
> > bitpack_d*, tree_node*)
> > 13545 3.5307  lto1 ht_lookup_with_hash(ht*, 
> > unsigned char const*, unsigned long, unsigned int, ht_lookup_option)
> > 12841 3.3472  libc-2.11.1.so   memset
> > 11840 3.0862  lto1 htab_find_slot_with_hash
> > 11397 2.9708  lto1 
> > streamer_tree_cache_insert_1(streamer_tree_cache_d*, tree_node*, unsigned 
> > int, unsigned int*, bool)
> > 11086 2.8897  lto1 lto_input_tree(lto_input_block*, 
> > data_in*)
> > 10522 2.7427  lto1 
> > lto_input_tree_1(lto_input_block*, data_in*, LTO_tags, unsigned int)
> > 8853  2.3076  lto1 
> > unify_scc(streamer_tree_cache_d*, unsigned int, unsigned int, unsigned int, 
> > unsigned int)
> > 8539  2.2258  lto1 hash_table > xcallocator>::find_slot_with_hash(tree_scc const*, unsigned int, 
> > insert_option)
> > 7987  2.0819  lto1 adler32
> > 7743  2.0183  lto1 
> > streamer_read_tree_body(lto_input_block*, data_in*, tree_node*)
> > 
> > Can't we free the pointer map in streamer after every SCC?
> 
> You mean on read-in?  We even can do without the pointer-map there at all.
> 
> We can experiment with that as a followup.

I believe it was needed for one of the cleanups (to update the map), but i guess
one can easily just run the fixup on the segment of array corresponding to new 
SCC.
> > The longest running ltrans add another 400 seconds.
> >  combiner:  16.16 ( 4%) usr   0.08 ( 1%) sys  16.53 ( 4%) 
> > wall  205251 kB ( 6%) ggc
> >  integrated RA   :  47.97 (12%) usr   0.21 ( 3%) sys  48.39 (12%) 
> > wall  391655 kB (12%) ggc
> >  LRA hard reg assignment : 158.64 (39%) usr   0.02 ( 0%) sys 158.74 (38%) 
> > wall   0 kB ( 0%) ggc
> >  TOTAL : 404.51 8.39   414.01   
> >  3215235 kB
> 
> Otherwise it looks pretty good.

Indeed. We are getting closer to numbers I measured on the same machine in 
2010, when Firefox
was half of its today size.

Thanks for all the hard work!
Honza

Re: [PATCH] Re-write LTO type merging again, do tree merging

2013-06-17 Thread Richard Biener

On Sat, 15 Jun 2013, Jan Hubicka wrote:

> > 
> > I've managed to fix nearly all reported missed merged types for cc1.
> > Remaining are those we'll never be able to merge (merging would
> > change the SCC shape) and those that eventually end up refering
> > to a TYPE_STUB_DECL with a make_anon_name () IDENTIFIER_NODE.
> > For the latter we should find a middle-end solution as a followup
> > in case it really matters.
> > 
> > WPA statistics for stage2 cc1 are
> > 
> > [WPA] read 2495082 SCCs of average size 2.380088
> > [WPA] 5938514 tree bodies read in total
> > [WPA] tree SCC table: size 524287, 260253 elements, collision ratio: 
> > 0.804380
> > [WPA] tree SCC max chain length 11 (size 1)
> > [WPA] Compared 429412 SCCs, 7039 collisions (0.016392)
> > [WPA] Merged 426111 SCCs
> > [WPA] Merged 3313709 tree bodies
> > [WPA] Merged 225079 types
> > [WPA] 162844 types prevailed (488124 associated trees)
> > [WPA] Old merging code merges an additional 22412 types of which 21492 are 
> > in the same SCC with their prevailing variant (345831 and 323276 
> > associated trees)
> > 
> > which shows there are 920 such TYPE_STUB_DECL issues and 21492
> > merges the old code did that destroyed SCCs.
> > 
> > Compared to the old code which only unified types and some selected
> > trees (INTEGER_CSTs), the new code can immediately ggc_free the
> > unified SCCs after they have been read which results in 55% of
> > all tree bodies input into WPA stage to be freed (rather than hoping
> > on secondary GC walk effects as the old code relied on), 58% of
> > all types are recycled.
> > 
> > Compile-time is at least on-par with the old code now and disk-usage
> > grows by a moderate 10% due to the streaming format change.
> 
> On Firefox we now get
> [WPA] read 43144472 SCCs of average size 2.270524
> [WPA] 97960575 tree bodies read in total
> [WPA] tree SCC table: size 8388593, 3936571 elements, collision ratio: 
> 0.727773
> [WPA] tree SCC max chain length 88 (size 1)
> [WPA] Compared 19030240 SCCs, 337719 collisions (0.017746)
> [WPA] Merged 18957101 SCCs
> [WPA] Merged 58202930 tree bodies
> [WPA] Merged 11800337 types
> [WPA] 4506307 types prevailed (13699881 associated trees)
> [WPA] Old merging code merges an additional 2174796 types of which 141104 are 
> in the same SCC with their prevailing variant (12811826 and 6367853 
> associated trees)
> [WPA] GIMPLE canonical type table: size 131071, 77871 elements, 4506442 
> searches, 1130903 collisions (ratio: 0.250953)
> [WPA] GIMPLE canonical type hash table: size 8388593, 4506386 elements, 
> 15712947 searches, 12879021 collisions (ratio: 0.819644)
> 
> and about 5GB of GGC memory after merging, overall the footprint is still 
> around 10GB.
> It is notable improvmenet over old code however, where we needed 16GB.
> 
> [LTRANS] read 319710 SCCs of average size 6.184039
> [LTRANS] 1977099 tree bodies read in total
> [LTRANS] GIMPLE canonical type table: size 16381, 9569 elements, 473131 
> searches, 24899 collisions (ratio: 0.052626)
> [LTRANS] GIMPLE canonical type hash table: size 1048573, 473076 elements, 
> 1611909 searches, 1340396 collisions (ratio: 0.831558)
> 
> CPU: AMD64 family10, speed 2100 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit 
> mask of 0x00 (No unit mask) count 75
> samples  %app name symbol name
> 4504711.7420  lto1 inflate_fast

It might be worth changing LTO section layout to include a header
that specifies whether a section is compressed or not so we can
allow mixed compressed/uncompressed sections in the LTRANS files
and avoid decompressing the function sections.

> 34224 8.9209  lto1 
> streamer_read_uhwi(lto_input_block*)
> 24630 6.4201  lto1 compare_tree_sccs_1(tree_node*, 
> tree_node*, tree_node***)
> 23205 6.0487  lto1 pointer_map_insert(pointer_map_t*, 
> void const*)
> 20829 5.4293  lto1 unpack_value_fields(data_in*, 
> bitpack_d*, tree_node*)
> 13545 3.5307  lto1 ht_lookup_with_hash(ht*, unsigned 
> char const*, unsigned long, unsigned int, ht_lookup_option)
> 12841 3.3472  libc-2.11.1.so   memset
> 11840 3.0862  lto1 htab_find_slot_with_hash
> 11397 2.9708  lto1 
> streamer_tree_cache_insert_1(streamer_tree_cache_d*, tree_node*, unsigned 
> int, unsigned int*, bool)
> 11086 2.8897  lto1 lto_input_tree(lto_input_block*, 
> data_in*)
> 10522 2.7427  lto1 lto_input_tree_1(lto_input_block*, 
> data_in*, LTO_tags, unsigned int)
> 8853  2.3076  lto1 unify_scc(streamer_tree_cache_d*, 
> unsigned int, unsigned int, unsigned int, unsigned int)
> 8539  2.2258  lto1 hash_table xcallocator>::find_slot_with_hash(tree_scc const*, unsigned int, 
> insert_option)
> 7987  2.0819  l

Re: [patch] Improve debug info for small structures passed by reference

2013-06-17 Thread Jakub Jelinek

On Mon, Jun 17, 2013 at 09:46:30AM +0200, Eric Botcazou wrote:
> > This really should come with testcases (e.g. guality ones).
> 
> guality testcases are barely maintainable, especially on non-x86 platforms, 
> so 
> I'd rather not enter this business.  I'll discuss with Joel and see whether 
> we 
> can coordinate with the GDB side.

Especially if it is -O0 only, I don't see why you think so.  Just dg-skip-if
it for -O1+ if you believe it is unreliable for some reason, but if you
look at the parameter value after the prologue, not showing the right value
at -O0 would be a serious bug everywhere.  Having some GDB testcase also
doesn't hurt, but having it in GCC testsuite has significant advantages over
that, most GCC developers don't run GDB testsuite after any changes they do.

Jakub

Re: [patch] Improve debug info for small structures passed by reference

2013-06-17 Thread Eric Botcazou

> This really should come with testcases (e.g. guality ones).

guality testcases are barely maintainable, especially on non-x86 platforms, so 
I'd rather not enter this business.  I'll discuss with Joel and see whether we 
can coordinate with the GDB side.

-- 
Eric Botcazou

Re: [PATCH GCC]Fix PR57540, try to choose scaled_offset address mode when expanding array reference

2013-06-17 Thread Eric Botcazou

> The problem occurs when accessing local array element. For example,
> # VUSE <.MEM_27>
> k_8 = parent[k_29];
> 
> GCC calculates the address in three steps:
> 1) addr is calculated as "r105 + (-2064)".
> 2) offset is calculated as "r165*4".
> 3) calls offset_address to combine the address into "r105+ r165*4 +
> (-2064)".
> 
> Since ADDR is valid and there is no call to memory_address_addr_space in
> offset_address, the invalid address expression has no chance to go through
> target dependent legitimization function.

But offset_address calls change_address_1 with validate set to 1 so during RTL 
expansion memory_address_addr_space should be invoked on the invalid address.

> Even there is a chance, the
> current implementation of memory_address_addr_space prevents the scaled
> address expression from being generated because of below code:
>   if (! cse_not_expected && !REG_P (x))
> x = break_out_memory_refs (x);

Where are the memory references in the above address?

-- 
Eric Botcazou

91 matches

Mail list logo