Re: [PATCH, generic] Support printing of escaped curly braces and vertical bar in assembler output

2013-04-17 Thread Richard Henderson

On 2013-04-16 13:55, Jakub Jelinek wrote:

On Tue, Apr 16, 2013 at 03:41:52PM +0400, Maksim Kuznetsov wrote:

Richard, Jeff, could you please have a look?


I wonder if %{ and %} shouldn't be better handled in final.c
for all #ifdef ASSEMBLER_DIALECT targets, rather than just for one specific.


Yes, please.


r~


Re: [patch] Fix PR middle-end/56474

2013-04-17 Thread Richard Biener
On Wed, Apr 17, 2013 at 1:12 AM, Eric Botcazou ebotca...@adacore.com wrote:
 For the C family I found exactly one - the layout_type case, and fixed
 it in the FEs by making empty arrays use [1, 0] domains or signed domains
 (I don't remember exactly).  I believe the layout_type change was to make
 Ada happy.

 I'm skeptical, I had to narrow down the initial kludge because it hurt Ada.

 It may be that enabling overflow detection for even unsigned sizetype was
 because of Ada as well.  After all only Ada changed its sizetype sign
 recently.

 Not true, overflow detection has _always_ been enabled for sizetypes.
 But sizetypes, including unsigned ones, were sign-extended, so 0 - 1 didn't
 overflow and we need that behavior back for Ada to work properly,

Yeah, well - they were effectively signed.

 I don't like special casing 0 - 1 in a general compute function.  Maybe
 you want to use size_diffop for the computation?  That would result in
 a signed result and thus no overflow for 0 - 1.

 But it's not a general compute function, it's size_binop which is meant to be
 used for sizetypes only and which forces overflow on unsigned types.  We need
 overflow detection for sizetypes but we can also tailor it to fit our needs.

I'm not against tailoring it to fit our needs - I'm just against special casing
behavior for specific values.  That just sounds wrong.

Maybe we should detect overflow as if the input and output were signed
while computing an unsigned result.  As far as I can see int_const_binop_1
does detect overflow as if operations were signed (it passes 'false' as
uns to all double-int operations rather than TYPE_UNSIGNED).
For example sub_with_overflow simply does

  neg_double (b.low, b.high, &ret.low, &ret.high);
  add_double (low, high, ret.low, ret.high, &ret.low, &ret.high);
  *overflow = OVERFLOW_SUM_SIGN (ret.high, b.high, high);

which I believe is wrong.  Shouldn't it be

  neg_double (b.low, b.high, &ret.low, &ret.high);
  HOST_WIDE_INT tem = ret.high;
  add_double (low, high, ret.low, ret.high, &ret.low, &ret.high);
  *overflow = OVERFLOW_SUM_SIGN (ret.high, tem, high);

?  Because we are computing a + (-b) and thus OVERFLOW_SUM_SIGN
expects the sign of a and -b, not a and b to verify against the
sign of ret.

 The other option is to for example disable overflow handling for _all_
 constants and MINUS_EXPR (and then please PLUS_EXPR as well)
 in size_binop.  Maybe it's only the MULT_EXPR overflow we want to
 know (byte-to-bit conversion / element size scaling IIRC).

 Well, we just need 0 - 1 because of the way we compute size expressions for
 variable-sized arrays.

I'm sceptical.  Where do you compute the size expression for variable-sized
arrays?  I suppose with the testcase in the initial patch I can then inspect
myself what actually happens?

Thanks,
Richard.

 --
 Eric Botcazou


Re: [patch] simplify emit_delay_sequence

2013-04-17 Thread Eric Botcazou
 This patch is also necessary for my new delay-slot scheduler to keep
 basic block boundaries correctly up-to-date. The emit-rtl API does
 that already.
 
 Cross-tested powerpc64 x mips. Currently running bootstrap & test on
 sparc64-unknown-linux-gnu. OK if it passes?

Yes, modulo

@@ -538,6 +502,8 @@ emit_delay_sequence (rtx insn, rtx list, int lengt
INSN_LOCATION (seq_insn) = INSN_LOCATION (tem);
   INSN_LOCATION (tem) = 0;
 
+  /* Remove any REG_DEAD notes because we can't rely on them now
+that the insn has been moved.  */
   for (note = REG_NOTES (tem); note; note = next)
{
  next = XEXP (note, 1);

Did you mean to move the comment instead of duplicating it?

-- 
Eric Botcazou


Re: [patch] Fix ICE during RTL expansion at -O1

2013-04-17 Thread Eric Botcazou
 +  if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE)
 +goto may_overlap;
 
 ick, can TREE_CODE (type1) != RECORD_TYPE happen as well here?
 Please add a comment similar to the Fortran ??? above.

It can happen because we stop at unions (and qualified unions) and for them we 
cannot disambiguate based on the fields.  I'll add a regular comment.

 Can you please also add at least one testcase as
 gcc.dg/tree-ssa/ssa-fre-??.c that tests the functionality of this and that
 wasn't handled before? I suppose it would be sth like
 
 struct S { int i; int j; };
 struct U
 {
   struct S a[10];
 } u;
 
 u.a[n].i = i;
 u.a[n].j = j;
 return u.a[n].i;
 
 where we fail to CSE the load from u.a[n].i.

Yes, the patch does eliminate the redundant load in .fre1:

  u.a[n_2(D)].i = i_3(D);
  u.a[n_2(D)].j = j_5(D);
  _7 = u.a[n_2(D)].i;
  return _7;

becomes:

  u.a[n_2(D)].i = i_3(D);
  u.a[n_2(D)].j = j_5(D);
  _7 = i_3(D);
  return _7;

 Otherwise the patch is ok.

Thanks.

-- 
Eric Botcazou


Re: [PATCH] Add a new option -fstack-protector-strong

2013-04-17 Thread Florian Weimer

On 04/17/2013 02:49 AM, Han Shen wrote:

+  if (flag_stack_protect == 3)
+cpp_define (pfile, "__SSP_STRONG__=3");
   if (flag_stack_protect == 2)
 cpp_define (pfile, "__SSP_ALL__=2");


3 and 2 should be replaced by SPCT_FLAG_STRONG and SPCT_FLAG_ALL.

I define these SPCT_FLAG_XXX in cfgexpand.c locally, so they are not
visible to c-cppbuiltin.c, do you suggest define these inside
c-cppbuiltin.c also?


I see.  Let's use the constants for now.


Indentation is off (unless both mail clients I tried are clobbering your
patch).  I think the GNU coding style prohibits the braces around the
single-statement body of the outer 'for'.


Done with indentation properly on and removed the braces. (GMail
composing window drops all the tabs when pasting... I have to use
Thunderbird to paste the patch. Hope it is right this time)


Thunderbird mangles patches as well, but I was able to repair the 
damage.  When using Thunderbird, please send the patch as a text file 
attachment.  You can put the changelog snippets at the beginning of the 
file as well.  This way, everything is sent out unchanged.



Can you make the conditional more similar to the comment, perhaps using a
switch statement on the value of the flag_stack_protect variable? That's
going to be much easier to read.


Re-coded. Now using 'switch-case'.


Thanks.  I think the comment is now redundant because it matches the 
code almost word-for-word. 8-)



No, for 'struct-returning' functions. But I don't regard this as an issue ---
at the programming level, there is no way to get one's hands on the
address of a returned structure ---
struct Node foo();
struct Node *p = &foo();  // compiler error - lvalue required as
unary '&' operand.


C++ const references can bind to rvalues.

But I'm more worried about the interaction with the return value 
optimization.  Consider this C++ code:


struct S {
  S();
  int a;
  int b;
  int c;
  int d;
  int e;
};

void f1(int *);

S f2()
{
  S s;
  f1(&s.a);
  return s;
}

S g2();

void g3()
{
  S s = g2();
}

void g3b(const S &);

void g3b()
{
  g3b(g2());
}

With your patch and -O2 -fstack-protector-strong, this generates the 
following assembly:


.globl  _Z2f2v
.type   _Z2f2v, @function
_Z2f2v:
.LFB0:
.cfi_startproc
pushq   %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq%rdi, %rbx
call_ZN1SC1Ev
movq%rbx, %rdi
call_Z2f1Pi
movq%rbx, %rax
popq%rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE0:
.size   _Z2f2v, .-_Z2f2v
.p2align 4,,15
.globl  _Z2g3v
.type   _Z2g3v, @function
_Z2g3v:
.LFB1:
.cfi_startproc
subq$40, %rsp
.cfi_def_cfa_offset 48
movq%rsp, %rdi
call_Z2g2v
addq$40, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE1:
.size   _Z2g3v, .-_Z2g3v
.p2align 4,,15
.globl  _Z3g3bv
.type   _Z3g3bv, @function
_Z3g3bv:
.LFB2:
.cfi_startproc
subq$40, %rsp
.cfi_def_cfa_offset 48
movq%rsp, %rdi
movq%fs:40, %rax
movq%rax, 24(%rsp)
xorl%eax, %eax
call_Z2g2v
movq%rsp, %rdi
call_Z3g3bRK1S
movq24(%rsp), %rax
xorq%fs:40, %rax
jne .L9
addq$40, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.L9:
.cfi_restore_state
.p2align 4,,6
call__stack_chk_fail
.cfi_endproc
.LFE2:
.size   _Z3g3bv, .-_Z3g3bv

Here, g3b() is correctly instrumented, and f2() does not need 
instrumentation (because space for the returned object is not part of 
the local frame).  But an address on the stack escapes in g3() and is 
used for the return value of the call to g2().  This requires 
instrumentation, which is missing in this example.


I suppose this can be handled in a follow-up patch if necessary.


ChangeLog and patch below --

gcc/ChangeLog
2013-04-16  Han Shen  shen...@google.com
 * cfgexpand.c (record_or_union_type_has_array_p): Helper function
 to check if a record or union contains an array field.


I think the GNU convention is to write only this:

* cfgexpand.c (record_or_union_type_has_array_p): New function.


 (expand_used_vars): Add logic handling '-fstack-protector-strong'.
 * common.opt (fstack-protector-all): New option.


Should be fstack-protector-strong.

--
Florian Weimer / Red Hat Product Security Team


RE: [PATCH, AArch64] Compare instruction in shift_extend mode

2013-04-17 Thread Hurugalawadi, Naveen
Hi,

 I suggest for this one test case either making it compile only and 
 dropping main() such that the pattern match only looks in the
 assembled output of the cmp_* functions

The testcase will check only for the assembly pattern of the instruction,
as per your suggestion.

Please find attached the modified patch; let me know if there should
be any further modifications to it.

Thanks,
Naveen

--- gcc/config/aarch64/aarch64.md	2013-04-17 11:18:29.453576713 +0530
+++ gcc/config/aarch64/aarch64.md	2013-04-17 15:22:36.161492471 +0530
@@ -2311,6 +2311,18 @@
(set_attr "mode" "<GPI:MODE>")]
 )
 
+(define_insn "*cmp_swp_<optab><ALLX:mode>_shft_<GPI:mode>"
+  [(set (reg:CC_SWP CC_REGNUM)
+	(compare:CC_SWP (ashift:GPI
+			 (ANY_EXTEND:GPI
+			  (match_operand:ALLX 0 "register_operand" "r"))
+			 (match_operand:QI 1 "aarch64_shift_imm_<mode>" "n"))
+	(match_operand:GPI 2 "register_operand" "r")))]
+  ""
+  "cmp\\t%<GPI:w>2, %<GPI:w>0, <su>xt<ALLX:size> %1"
+  [(set_attr "v8type" "alus_ext")
+   (set_attr "mode" "<GPI:MODE>")]
+)
 
 ;; ---
 ;; Store-flag and conditional select insns
--- gcc/testsuite/gcc.target/aarch64/cmp.c	1970-01-01 05:30:00.0 +0530
+++ gcc/testsuite/gcc.target/aarch64/cmp.c	2013-04-17 15:23:36.121492125 +0530
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+cmp_si_test1 (int a, int b, int c)
+{
+  if (a < b)
+return a + c;
+  else
+return a + b + c;
+}
+
+int
+cmp_si_test2 (int a, int b, int c)
+{
+  if ((a << 3) < b)
+return a + c;
+  else
+return a + b + c;
+}
+
+typedef long long s64;
+
+s64
+cmp_di_test1 (s64 a, s64 b, s64 c)
+{
+  if (a < b)
+return a + c;
+  else
+return a + b + c;
+}
+
+s64
+cmp_di_test2 (s64 a, s64 b, s64 c)
+{
+  if ((a << 3) < b)
+return a + c;
+  else
+return a + b + c;
+}
+
+int
+cmp_di_test3 (int a, s64 b, s64 c)
+{
+  if (a < b)
+return a + c;
+  else
+return a + b + c;
+}
+
+int
+cmp_di_test4 (int a, s64 b, s64 c)
+{
+  if (((s64)a << 3) < b)
+return a + c;
+  else
+return a + b + c;
+}
+
+/* { dg-final { scan-assembler-times "cmp\tw\[0-9\]+, w\[0-9\]+" 2 } } */
+/* { dg-final { scan-assembler-times "cmp\tx\[0-9\]+, x\[0-9\]+" 4 } } */


RE: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

2013-04-17 Thread Kyrylo Tkachov
Hi Julian,

 From: Julian Brown [mailto:jul...@codesourcery.com]
 Sent: 13 April 2013 15:04
 To: Julian Brown
 Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana
 Radhakrishnan
 Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and
 vcvt_f32_f16 NEON intrinsics
 
 On Fri, 12 Apr 2013 20:09:39 +0100
 Julian Brown jul...@codesourcery.com wrote:
 
  On Fri, 12 Apr 2013 15:19:18 +0100
  Kyrylo Tkachov kyrylo.tkac...@arm.com wrote:
 
   Hi all,
  
   This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
   to arm_neon.h through the generator ML scripts and also adds the
   built-ins to which the intrinsics will map to. The generator ML
   scripts are updated and used to generate the relevant .texi
   documentation, arm_neon.h and the tests in gcc.target/arm/neon .
 
  FWIW, some of the changes to neon*.ml can be simplified somewhat --
 my
  attempt at an improved version of those bits is attached. I'm still
  not too happy with mode_suffix, but these new instructions require
  adding semantics to parts of the generator program which weren't
  really very well-defined to start with :-). I appreciate that it's a
  bit of a tangle...
 
 I thought of an improvement to the mode_suffix part from the last
 version of the patch, so here it is. I'm done fiddling with this now,
 so back to you!

Thanks for looking at it! My Ocaml-fu is rather limited.
It does look cleaner now.
Here it is together with all the other parts of the patch, plus some
minor formatting changes.

Ok for trunk now?

gcc/ChangeLog
2013-04-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
Julian Brown  jul...@codesourcery.com

* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
(TB_DREG): Add T_V4HF.
(v4hf_UP): New macro.
(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
Handle initialisation of V4HF. Adjust initialisation of reinterpret
built-ins.
(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
(arm_vector_mode_supported_p): Handle V4HF.
(arm_mangle_map): Handle V4HFmode.
* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
* config/arm/arm_neon_builtins.def: Add entries for
vcvtv4hfv4sf, vcvtv4sfv4hf.
* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
(neon_vcvtv4hfv4sf): Likewise.
* config/arm/neon-gen.ml: Handle half-precision floating point
features.
* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
* config/arm/arm_neon.h: Regenerate.
* config/arm/neon.ml (type elts): Add F16.
(type vectype): Add T_float16x4, T_floatHF.
(type vecmode): Add V4HF.
(type features): Add Requires_FP_bit feature.
(elt_width): Handle F16.
(elt_class): Likewise.
(elt_of_class_width): Likewise.
(mode_of_elt): Refactor.
(type_for_elt): Handle F16, fix error messages.
(vectype_size): Handle T_float16x4.
(vcvt_sh): New function.
(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
(string_of_mode): Handle V4HF.
* doc/arm-neon-intrinsics.texi: Regenerate.


gcc/testsuite/ChangeLog
2013-04-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
Julian Brown  jul...@codesourcery.com

* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.


neon-vcvt-intrinsics.patch
Description: Binary data


[PATCH] Fix PR56982, handle setjmp like non-local labels

2013-04-17 Thread Richard Biener

This fixes PR56982 by properly modeling the control-flow
of setjmp.  It basically behaves as a non-local goto target
so this patch treats it so - it makes it start a basic-block
and get abnormal edges from possible sources of non-local
gotos.  The patch also fixes the bug that longjmp is marked
as leaf.

Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?
What about release branches (after it had some time to settle on
trunk of course)?

Thanks,
Richard.

2013-04-17  Richard Biener  rguent...@suse.de

PR tree-optimization/56982
* builtins.def (BUILT_IN_LONGJMP): longjmp is not a leaf
function.
* gimplify.c (gimplify_call_expr): Notice special calls.
(gimplify_modify_expr): Likewise.
* tree-cfg.c (make_abnormal_goto_edges): Handle setjmp-like
abnormal control flow receivers.
(call_can_make_abnormal_goto): Handle cfun->calls_setjmp
in the same way as cfun->has_nonlocal_label.
(gimple_purge_dead_abnormal_call_edges): Likewise.
(stmt_starts_bb_p): Make setjmp-like abnormal control flow
receivers start a basic-block.

* gcc.c-torture/execute/pr56982.c: New testcase.

Index: gcc/gimplify.c
===
*** gcc/gimplify.c  (revision 198021)
--- gcc/gimplify.c  (working copy)
*** gimplify_call_expr (tree *expr_p, gimple
*** 2729,2734 
--- 2729,2735 
gimple_stmt_iterator gsi;
call = gimple_build_call_from_tree (*expr_p);
gimple_call_set_fntype (call, TREE_TYPE (fnptrtype));
+   notice_special_calls (call);
gimplify_seq_add_stmt (pre_p, call);
gsi = gsi_last (*pre_p);
fold_stmt (gsi);
*** gimplify_modify_expr (tree *expr_p, gimp
*** 4968,4973 
--- 4969,4975 
STRIP_USELESS_TYPE_CONVERSION (CALL_EXPR_FN (*from_p));
assign = gimple_build_call_from_tree (*from_p);
gimple_call_set_fntype (assign, TREE_TYPE (fnptrtype));
+   notice_special_calls (assign);
if (!gimple_call_noreturn_p (assign))
gimple_call_set_lhs (assign, *to_p);
  }
Index: gcc/tree-cfg.c
===
*** gcc/tree-cfg.c  (revision 198021)
--- gcc/tree-cfg.c  (working copy)
*** make_abnormal_goto_edges (basic_block bb
*** 967,991 
gimple_stmt_iterator gsi;
  
FOR_EACH_BB (target_bb)
! for (gsi = gsi_start_bb (target_bb); !gsi_end_p (gsi); gsi_next (gsi))
!   {
!   gimple label_stmt = gsi_stmt (gsi);
!   tree target;
  
!   if (gimple_code (label_stmt) != GIMPLE_LABEL)
! break;
  
!   target = gimple_label_label (label_stmt);
  
!   /* Make an edge to every label block that has been marked as a
!  potential target for a computed goto or a non-local goto.  */
!   if ((FORCED_LABEL (target) && !for_call)
!   || (DECL_NONLOCAL (target) && for_call))
! {
make_edge (bb, target_bb, EDGE_ABNORMAL);
!   break;
! }
!   }
  }
  
  /* Create edges for a goto statement at block BB.  */
--- 971,1005 
gimple_stmt_iterator gsi;
  
FOR_EACH_BB (target_bb)
! {
!   for (gsi = gsi_start_bb (target_bb); !gsi_end_p (gsi); gsi_next (gsi))
!   {
! gimple label_stmt = gsi_stmt (gsi);
! tree target;
  
! if (gimple_code (label_stmt) != GIMPLE_LABEL)
!   break;
  
! target = gimple_label_label (label_stmt);
  
! /* Make an edge to every label block that has been marked as a
!potential target for a computed goto or a non-local goto.  */
! if ((FORCED_LABEL (target) && !for_call)
! || (DECL_NONLOCAL (target) && for_call))
!   {
! make_edge (bb, target_bb, EDGE_ABNORMAL);
! break;
!   }
!   }
!   if (!gsi_end_p (gsi))
!   {
! /* Make an edge to every setjmp-like call.  */
! gimple call_stmt = gsi_stmt (gsi);
! if (is_gimple_call (call_stmt)
!     && (gimple_call_flags (call_stmt) & ECF_RETURNS_TWICE))
make_edge (bb, target_bb, EDGE_ABNORMAL);
!   }
! }
  }
  
  /* Create edges for a goto statement at block BB.  */
*** call_can_make_abnormal_goto (gimple t)
*** 2147,2153 
  {
/* If the function has no non-local labels, then a call cannot make an
   abnormal transfer of control.  */
!   if (!cfun->has_nonlocal_label)
 return false;
  
/* Likewise if the call has no side effects.  */
--- 2161,2168 
  {
/* If the function has no non-local labels, then a call cannot make an
   abnormal transfer of control.  */
!   if (!cfun->has_nonlocal_label
!       && !cfun->calls_setjmp)
 return false;
  
/* Likewise if the call has no side effects.  */
*** stmt_starts_bb_p (gimple stmt, gimple pr
*** 2302,2307 
--- 2317,2327 
else

[PATCH] Fix PR56921

2013-04-17 Thread Richard Biener

This fixes PR56921 in a better way and restores the ability
to ggc-collect during RTL loop passes.  The patch stores
the simple-loop-description in a separate member of struct loop
and not its aux field which is not scanned by GC.

Bootstrapped and tested on x86_64-unknown-linux-gnu and
powerpc64-linux-gnu, applied.

Richard.

2013-04-17  Richard Biener  rguent...@suse.de

PR rtl-optimization/56921
* cfgloop.h (struct loop): Add simple_loop_desc member.
(struct niter_desc): Mark with GTY(()).
(simple_loop_desc): Do not use aux field but simple_loop_desc.
* loop-iv.c (get_simple_loop_desc): Likewise.
(free_simple_loop_desc): Likewise.

Revert
2013-04-16  Richard Biener  rguent...@suse.de

PR rtl-optimization/56921
* loop-init.c (pass_rtl_move_loop_invariants): Add
TODO_do_not_ggc_collect to todo_flags_finish.
(pass_rtl_unswitch): Same.
(pass_rtl_unroll_and_peel_loops): Same.
(pass_rtl_doloop): Same.

Index: gcc/cfgloop.h
===
*** gcc/cfgloop.h   (revision 198021)
--- gcc/cfgloop.h   (working copy)
*** struct GTY ((chain_next ("%h.next"))) lo
*** 172,177 
--- 172,180 
  
/* Head of the cyclic list of the exits of the loop.  */
struct loop_exit *exits;
+ 
+   /* Number of iteration analysis data for RTL.  */
+   struct niter_desc *simple_loop_desc;
  };
  
  /* Flags for state of loop structure.  */
*** struct rtx_iv
*** 372,378 
  /* The description of an exit from the loop and of the number of iterations
 till we take the exit.  */
  
! struct niter_desc
  {
/* The edge out of the loop.  */
edge out_edge;
--- 375,381 
  /* The description of an exit from the loop and of the number of iterations
 till we take the exit.  */
  
! struct GTY(()) niter_desc
  {
/* The edge out of the loop.  */
edge out_edge;
*** extern void free_simple_loop_desc (struc
*** 425,431 
  static inline struct niter_desc *
  simple_loop_desc (struct loop *loop)
  {
!   return (struct niter_desc *) loop->aux;
  }
  
  /* Accessors for the loop structures.  */
--- 428,434 
  static inline struct niter_desc *
  simple_loop_desc (struct loop *loop)
  {
!   return loop->simple_loop_desc;
  }
  
  /* Accessors for the loop structures.  */
Index: gcc/loop-iv.c
===
*** gcc/loop-iv.c   (revision 198021)
--- gcc/loop-iv.c   (working copy)
*** get_simple_loop_desc (struct loop *loop)
*** 3016,3025 
  
/* At least desc->infinite is not always initialized by
   find_simple_loop_exit.  */
!   desc = XCNEW (struct niter_desc);
iv_analysis_loop_init (loop);
find_simple_exit (loop, desc);
!   loop->aux = desc;

if (desc->simple_p && (desc->assumptions || desc->infinite))
  {
--- 3016,3025 
  
/* At least desc->infinite is not always initialized by
   find_simple_loop_exit.  */
!   desc = ggc_alloc_cleared_niter_desc ();
iv_analysis_loop_init (loop);
find_simple_exit (loop, desc);
!   loop->simple_loop_desc = desc;

if (desc->simple_p && (desc->assumptions || desc->infinite))
  {
*** free_simple_loop_desc (struct loop *loop
*** 3069,3074 
if (!desc)
  return;
  
!   free (desc);
!   loop->aux = NULL;
  }
--- 3069,3074 
if (!desc)
  return;
  
!   ggc_free (desc);
!   loop->simple_loop_desc = NULL;
  }
Index: gcc/loop-init.c
===
*** gcc/loop-init.c (revision 198021)
--- gcc/loop-init.c (working copy)
*** struct rtl_opt_pass pass_rtl_move_loop_i
*** 434,441 
0,/* properties_destroyed */
0,/* todo_flags_start */
TODO_df_verify |
!   TODO_df_finish | TODO_verify_rtl_sharing
!   | TODO_do_not_ggc_collect   /* todo_flags_finish */
   }
  };
  
--- 434,440 
0,/* properties_destroyed */
0,/* todo_flags_start */
TODO_df_verify |
!   TODO_df_finish | TODO_verify_rtl_sharing  /* todo_flags_finish */
   }
  };
  
*** struct rtl_opt_pass pass_rtl_unswitch =
*** 471,478 
0,/* properties_provided */
0,/* properties_destroyed */
0,/* todo_flags_start */
!   TODO_verify_rtl_sharing
!   | TODO_do_not_ggc_collect   /* todo_flags_finish */
   }
  };
  
--- 470,476 
0,/* properties_provided */
0,/* properties_destroyed */
0,/* todo_flags_start */
!   TODO_verify_rtl_sharing,  /* todo_flags_finish */
   }
  

Re: [PATCH][RFC] Handle commutative operations in SLP tree build

2013-04-17 Thread Richard Biener
On Wed, 10 Apr 2013, Richard Biener wrote:

 
 This handles commutative operations during the SLP tree build in the
 way that if one configuration does not match, the build will
 try again with commutated operands.  This allows removing
 the special-casing of commutated loads in a complex addition
 that was in the end handled as a permutation.  It of course
 also applies more generally.  Permutation is currently limited
 to 3 unsuccessful permutes to avoid running into the inherently
 exponential complexity of tree matching.
 
 The gcc.dg/vect/vect-complex-?.c testcases provide some testing
 coverage (previously handled by the special-casing).  I have
 seen failed SLP in the wild previously but it's usually on
 larger testcases and dependent on operand order of commutative
 operands.
 
 I've discussed ideas to restrict the cases where we try a permutation
 with Matz, but I'll rather defer that to an eventual followup.
 (compute per SSA name a value dependent on the shape of its
 use-def tree and use that as a quick check whether sub-trees
 can possibly match)
 
 Bootstrap and regtest running on x86_64-unknown-linux-gnu.
 
 Any comments?

Committed to trunk.

Richard.

 2013-04-10  Richard Biener  rguent...@suse.de
 
   * tree-vect-slp.c (vect_build_slp_tree_1): Split out from ...
   (vect_build_slp_tree): ... here.
   (vect_build_slp_tree_1): Compute which stmts of the SLP group
   match.  Remove special-casing of mismatched complex loads.
   (vect_build_slp_tree): Based on the result from vect_build_slp_tree_1
   re-try the match with swapped commutative operands.
   (vect_supported_load_permutation_p): Remove special-casing of
   mismatched complex loads.
   (vect_analyze_slp_instance): Adjust.


Re: [patch] RFC: ix86 / x86_64 register pressure aware scheduling

2013-04-17 Thread Igor Zamyatin
These changes are what we used to try here at Intel after a bunch of
changes which made the pre-alloc scheduler more stable. We benchmarked
both register pressure algorithms and the overall result was not that
promising.

We saw a number of regressions, e.g. for the option set -mavx -O3 -funroll-loops
-ffast-math -march=corei7 (for spec2000 not only lucas but also applu
regressed). And the overall gain is negative even for x86_64. For 32 bits
the picture was worse, if I remember correctly.

Overall, we have doubts that this feature is good for an OOO machine.

Thanks,
Igor

-Original Message-
From: gcc-patches-ow...@gcc.gnu.org
[mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Steven Bosscher
Sent: Monday, April 15, 2013 11:34 PM
To: GCC Patches
Cc: H.J. Lu; Uros Bizjak; Jan Hubicha
Subject: [patch] RFC: ix86 / x86_64 register pressure aware scheduling

Hello,

The attached patch enables register pressure aware scheduling for the
ix86 and x86_64 targets. It uses the optimistic algorithm to avoid
being overly conservative.

This is the same as what other CISCy targets, like s390, also do.

The motivation for this patch is the excessive spilling I've observed
in a few test cases with relatively large basic blocks, e.g.
encryption algorithms and codecs. The patch passes bootstrap+testing
on x86_64-unknown-linux-gnu and i686-unknown-linux-gnu, with a few new
failures due to PR56950.

Off-list, Uros, Honza and others have already looked at the patch and
benchmarked it. For x86_64 there is an overall improvement for SPEC2k
except that lucas regresses, but such a preliminary result is IMHO
very promising.

Comments/suggestions welcome :-)

Ciao!
Steven
* common/config/i386/i386-common.c (ix86_option_optimization_table):
Do not disable insns scheduling.  Enable register pressure aware
scheduling.
* config/i386/i386.c (ix86_option_override): Use the alternative,
optimistic scheduling-pressure algorithm by default.

Index: common/config/i386/i386-common.c
===
--- common/config/i386/i386-common.c(revision 197941)
+++ common/config/i386/i386-common.c(working copy)
@@ -707,9 +707,15 @@ static const struct default_options ix86
   {
 /* Enable redundant extension instructions removal at -O2 and higher.  */
 { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
-/* Turn off -fschedule-insns by default.  It tends to make the
-   problem with not enough registers even worse.  */
-{ OPT_LEVELS_ALL, OPT_fschedule_insns, NULL, 0 },
+/* Enable -fsched-pressure by default for all optimization levels.
+   Before SCHED_PRESSURE_MODEL register-pressure aware schedule was
+   available, -fschedule-insns was turned off completely by default for
+   this port, because scheduling before register allocation tends to
+   make the problem with not enough registers even worse.  However,
+   for very long basic blocks the scheduler can help bring register
+   pressure down significantly, and SCHED_PRESSURE_MODEL is still
+   conservative enough to avoid creating excessive register pressure.  */
+{ OPT_LEVELS_ALL, OPT_fsched_pressure, NULL, 1 },
 
 #ifdef SUBTARGET_OPTIMIZATION_OPTIONS
 SUBTARGET_OPTIMIZATION_OPTIONS,
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 197941)
+++ config/i386/i386.c  (working copy)
@@ -3936,6 +3936,10 @@ ix86_option_override (void)
 
   ix86_option_override_internal (true);
 
+  /* Use the alternative scheduling-pressure algorithm by default.  */
+  maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2,
+global_options.x_param_values,
+global_options_set.x_param_values);
 
   /* This needs to be done at start up.  It's convenient to do it here.  */
   register_pass (insert_vzeroupper_info);


[PATCH, ARM] emit LDRD epilogue instead of a single LDM return

2013-04-17 Thread Greta Yorsh
Currently, an epilogue is not generated in RTL for functions that can return
using a single instruction. This patch enables the RTL epilogue for such
functions on targets that can benefit from using a sequence of LDRD
instructions in the epilogue instead of a single LDM instruction.

No regression on qemu arm-none-eabi with cortex-a15.

Ok for trunk?

Thanks,
Greta

gcc/

2012-10-19  Greta Yorsh  Greta.Yorsh at arm.com

* config/arm/arm.c (use_return_insn): Return 0 for targets that
can benefit from using a sequence of LDRD instructions in epilogue
instead of a single LDM instruction.diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 866385c..bca92af 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2296,6 +2296,10 @@ use_return_insn (int iscond, rtx sibling)
   if (IS_INTERRUPT (func_type) && (frame_pointer_needed || TARGET_THUMB))
 return 0;
 
+  if (TARGET_LDRD && current_tune->prefer_ldrd_strd
+      && !optimize_function_for_size_p (cfun))
+return 0;
+
   offsets = arm_get_frame_offsets ();
   stack_adjust = offsets->outgoing_args - offsets->saved_regs;
 


[PATCH, ARM][10/n] Split scc patterns using cond_exec

2013-04-17 Thread Greta Yorsh
This patch converts define_insn into define_insn_and_split to split
some alternatives of movsicc_insn and some scc patterns that cannot be
expressed using movsicc. The patch emits cond_exec RTL insns.

Ok for trunk?

Thanks,
Greta

gcc/

2013-02-19  Greta Yorsh  greta.yo...@arm.com

* config/arm/arm.md (movsicc_insn): Convert define_insn into
define_insn_and_split.
(and_scc,ior_scc,negscc): Likewise.diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 83b36ca..c2e59ed 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -858,7 +858,7 @@
 
 ;; This is the canonicalization of addsi3_compare0_for_combiner when the
 ;; addend is a constant.
-(define_insn "*cmpsi2_addneg"
+(define_insn "cmpsi2_addneg"
   [(set (reg:CC CC_REGNUM)
	(compare:CC
	 (match_operand:SI 1 "s_register_operand" "r,r")
@@ -1415,7 +1415,7 @@
(set_attr "type" "simple_alu_imm,*,*")]
 )
 
-(define_insn "*subsi3_compare"
+(define_insn "subsi3_compare"
   [(set (reg:CC CC_REGNUM)
	(compare:CC (match_operand:SI 1 "arm_rhs_operand" "r,r,I")
		    (match_operand:SI 2 "arm_rhs_operand" "I,r,r")))
@@ -8619,7 +8619,7 @@
(set_attr type f_selvfp_type)]
 )
 
-(define_insn "*movsicc_insn"
+(define_insn_and_split "*movsicc_insn"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r,r,r,r,r,r,r")
	(if_then_else:SI
	 (match_operator 3 "arm_comparison_operator"
@@ -8632,10 +8632,45 @@
mvn%D3\\t%0, #%B2
mov%d3\\t%0, %1
mvn%d3\\t%0, #%B1
-   mov%d3\\t%0, %1\;mov%D3\\t%0, %2
-   mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
-   mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
-   mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
+   #
+   #
+   #
+   #
+   ; alt4: mov%d3\\t%0, %1\;mov%D3\\t%0, %2
+   ; alt5: mov%d3\\t%0, %1\;mvn%D3\\t%0, #%B2
+   ; alt6: mvn%d3\\t%0, #%B1\;mov%D3\\t%0, %2
+   ; alt7: mvn%d3\\t%0, #%B1\;mvn%D3\\t%0, #%B2
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    enum rtx_code rev_code;
+    enum machine_mode mode;
+    rtx rev_cond;
+
+    emit_insn (gen_rtx_COND_EXEC (VOIDmode,
+                                  operands[3],
+                                  gen_rtx_SET (VOIDmode,
+                                               operands[0],
+                                               operands[1])));
+
+    rev_code = GET_CODE (operands[3]);
+    mode = GET_MODE (operands[4]);
+    if (mode == CCFPmode || mode == CCFPEmode)
+      rev_code = reverse_condition_maybe_unordered (rev_code);
+    else
+      rev_code = reverse_condition (rev_code);
+
+    rev_cond = gen_rtx_fmt_ee (rev_code,
+                               VOIDmode,
+                               operands[4],
+                               const0_rtx);
+    emit_insn (gen_rtx_COND_EXEC (VOIDmode,
+                                  rev_cond,
+                                  gen_rtx_SET (VOIDmode,
+                                               operands[0],
+                                               operands[2])));
+    DONE;
+  }
  [(set_attr "length" "4,4,4,4,8,8,8,8")
   (set_attr "conds" "use")
   (set_attr "insn" "mov,mvn,mov,mvn,mov,mov,mvn,mvn")
@@ -9604,27 +9639,64 @@
(set_attr type alu_shift,alu_shift_reg)])
 
 
-(define_insn "*and_scc"
+(define_insn_and_split "*and_scc"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
	(and:SI (match_operator:SI 1 "arm_comparison_operator"
-		 [(match_operand 3 "cc_register" "") (const_int 0)])
-		(match_operand:SI 2 "s_register_operand" "r")))]
+		 [(match_operand 2 "cc_register" "") (const_int 0)])
+		(match_operand:SI 3 "s_register_operand" "r")))]
   "TARGET_ARM"
-  "mov%D1\\t%0, #0\;and%d1\\t%0, %2, #1"
+  "#"   ; "mov%D1\\t%0, #0\;and%d1\\t%0, %3, #1"
+  "&& reload_completed"
+  [(cond_exec (match_dup 5) (set (match_dup 0) (const_int 0)))
+   (cond_exec (match_dup 4) (set (match_dup 0)
+                                 (and:SI (match_dup 3) (const_int 1))))]
+  {
+    enum machine_mode mode = GET_MODE (operands[2]);
+    enum rtx_code rc = GET_CODE (operands[1]);
+
+    /* Note that operands[4] is the same as operands[1],
+       but with VOIDmode as the result. */
+    operands[4] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx);
+    if (mode == CCFPmode || mode == CCFPEmode)
+      rc = reverse_condition_maybe_unordered (rc);
+    else
+      rc = reverse_condition (rc);
+    operands[5] = gen_rtx_fmt_ee (rc, VOIDmode, operands[2], const0_rtx);
+  }
   [(set_attr "conds" "use")
    (set_attr "insn" "mov")
    (set_attr "length" "8")]
 )
 
-(define_insn "*ior_scc"
+(define_insn_and_split "*ior_scc"
   [(set (match_operand:SI 0 "s_register_operand" "=r,r")
-	(ior:SI (match_operator:SI 2 "arm_comparison_operator"
-		 [(match_operand 3 "cc_register" "") (const_int 0)])
-		(match_operand:SI 1 "s_register_operand" "0,?r")))]
+	(ior:SI (match_operator:SI 1 "arm_comparison_operator"
+		 [(match_operand 2 "cc_register" "") (const_int 0)])
+		(match_operand:SI 3 "s_register_operand" "0,?r")))]
   "TARGET_ARM"
   @
-   

New German PO file for 'gcc' (version 4.8.0)

2013-04-17 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-4.8.0.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org



Re: [PATCH] Fix linking with -findirect-dispatch

2013-04-17 Thread Andreas Schwab
Bryce McKinlay bmckin...@gmail.com writes:

 It certainly _did_ work as intended previously.

Only by chance, when libtool has to relink the library during install.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
And now for something completely different.


[Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure

2013-04-17 Thread Janus Weil
Hi all,

here is a patch for a recent regression with procedure pointers.
Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8?

Cheers,
Janus


2013-04-17  Janus Weil  ja...@gcc.gnu.org

PR fortran/56814
* interface.c (check_result_characteristics): Get result from interface
if present.


2013-04-17  Janus Weil  ja...@gcc.gnu.org

PR fortran/56814
* gfortran.dg/proc_ptr_42.f90: New.


pr56814_v2.diff
Description: Binary data


proc_ptr_42.f90
Description: Binary data


Re: [Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure

2013-04-17 Thread Tobias Burnus

Janus Weil:

here is patch for a recent regression with procedure pointers.
Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8?


Looks rather obvious. OK - and thanks for the patch.

Tobias

PS: If you have time, could you review my C_LOC patch at 
http://gcc.gnu.org/ml/fortran/2013-04/msg00073.html ?



2013-04-17  Janus Weil  ja...@gcc.gnu.org

 PR fortran/56814
 * interface.c (check_result_characteristics): Get result from interface
 if present.


2013-04-17  Janus Weil  ja...@gcc.gnu.org

 PR fortran/56814
 * gfortran.dg/proc_ptr_42.f90: New.




Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-17 Thread Vladimir Makarov

On 13-04-16 6:56 PM, Michael Meissner wrote:

I tracked down the bug with the spec 2006 benchmark WRF using the LRA register
allocator.

At one point LRA has decided to use the CTR to hold a CCmode value:

(insn 11019 11018 11020 16 (set (reg:CC 66 ctr [4411])
 (reg:CC 66 ctr [4411])) module_diffusion_em.fppized.f90:4885 360 
{*movcc_internal1}
  (expr_list:REG_DEAD (reg:CC 66 ctr [4411])
 (nil)))

Now movcc_internal1 has moves from r->h (which includes ctr/lr) and ctr/lr->r,
but it doesn't have a move to cover the nop move of moving the ctr to the ctr.
IMHO, LRA should not be generating NOP moves that are later deleted.

There are two ways to solve the problem.  One is not to let anything but int
modes into CTR/LR, which will also eliminate the register allocator from
spilling floating point values there (which we've seen in the past, but the
last time I tried to eliminate it I couldn't).  The following patch does this,
and also changes the assertion to call fatal_insn_not_found to make it clearer
what the error is.

I imagine, I could add a NOP move insn to movcc_internal1, but that just
strikes me as wrong.

Note, this does not fix the 32-bit failure in dealII, and I also noticed that I
can't bootstrap the compiler using --with-cpu=power7, which I will get to
tomorrow.

2013-04-16  Michael Meissner  meiss...@linux.vnet.ibm.com

* config/rs6000/rs6000.opt (-mconstrain-regs): New debug switch to
control whether we only allow int modes to go in the CTR, LR,
VRSAVE, VSCR registers.
* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok): Likewise.
(rs6000_debug_reg_global): If -mdebug=reg, print out if SPRs are
constrained.
(rs6000_option_override_internal): Set -mconstrain-regs if we are
using the LRA register allocator.

* lra.c (check_rtl): Use fatal_insn_not_found to report constraint
does not match.

Mike, thanks for the patch and all the SPEC2006 data (which are very 
useful as I have no access to a power machine which can be used for 
benchmarking).  I guess that maybe some benchmark scores are lower 
because LRA lacks some micro-optimizations which reload implements 
through many power hooks (e.g. LRA does not use push reload).  Although 
sometimes that is not a bad thing (e.g. LRA does not use 
SECONDARY_MEMORY_NEEDED_RTX, which permits reusing the stack slots for 
other useful things).


In general I got the impression that power7 is the most difficult port for 
LRA.  If we manage to port it, LRA ports for other targets will be easier.


I also reproduced the bootstrap failure with --with-cpu=power7 and I am going to 
work on this and after that on the SPEC2006 issues you wrote about.




Re: [PATCH, x86] Use vector moves in memmove expanding

2013-04-17 Thread Jan Hubicka
 
 Bootstrap/make check/Specs2k are passing on i686 and x86_64.
Thanks for returning to this!

glibc has quite a comprehensive testsuite for stringops.  It may be useful to 
test it with -minline-all-stringops -mstringop-strategy=vector

I tested the patch on my core notebook and my memcpy micro benchmark. 
Vector loop is not a win since apparently we do not produce any SSE code for 
64bit compilation. What CPUs and block sizes is this intended for?

Also the internal loop with -march=native seems to come out as:
.L7:
        movq    (%rsi,%r8), %rax
        movq    8(%rsi,%r8), %rdx
        movq    48(%rsi,%r8), %r9
        movq    56(%rsi,%r8), %r10
        movdqu  16(%rsi,%r8), %xmm3
        movdqu  32(%rsi,%r8), %xmm1
        movq    %rax, (%rdi,%r8)
        movq    %rdx, 8(%rdi,%r8)
        movdqa  %xmm3, 16(%rdi,%r8)
        movdqa  %xmm1, 32(%rdi,%r8)
        movq    %r9, 48(%rdi,%r8)
        movq    %r10, 56(%rdi,%r8)
        addq    $64, %r8
        cmpq    %r11, %r8

It is not that much of an SSE enablement since the RA seems to home the vars in 
integer regs.
Could you please look into it?
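(For reference, a minimal self-contained harness of the kind such memcpy micro benchmarks use; all names here are hypothetical and not from the patch. A fixed-size copy with a vector-friendly length is a candidate for inline expansion, e.g. under -mstringop-strategy=vector.)

```cpp
#include <cstring>

/* Hypothetical micro-benchmark kernel: a fixed, 64-byte memcpy that the
   compiler may expand inline instead of calling the library routine.
   Returns 1 when the copy is byte-for-byte correct.  */
int copy_check ()
{
  char src[64], dst[64];
  for (int i = 0; i < 64; ++i)
    src[i] = (char) i;              /* fill with a known pattern */
  std::memcpy (dst, src, sizeof dst);  /* candidate for inline expansion */
  for (int i = 0; i < 64; ++i)
    if (dst[i] != (char) i)
      return 0;
  return 1;
}
```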
 
 Changelog entry:
 
 2013-04-10  Michael Zolotukhin  michael.v.zolotuk...@gmail.com
 
 * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
 * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
 adjust_address instead of change_address to keep info about alignment.
 (emit_strmov): Remove.
 (emit_memmov): New function.
 (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
 (expand_movmem_epilogue): Likewise and return updated rtx for
 destination.
 (expand_constant_movmem_prologue): Likewise and return updated rtx for
 destination and source.
 (decide_alignment): Refactor, handle vector_loop.
 (ix86_expand_movmem): Likewise.
 (ix86_expand_setmem): Likewise.
 * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
 * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 73a59b5..edb59da 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1565,6 +1565,18 @@ get_mem_align_offset (rtx mem, unsigned int align)
  expr = inner;
}
 }
+  else if (TREE_CODE (expr) == MEM_REF)
+{
+  tree base = TREE_OPERAND (expr, 0);
+  tree byte_offset = TREE_OPERAND (expr, 1);
+  if (TREE_CODE (base) != ADDR_EXPR
+ || TREE_CODE (byte_offset) != INTEGER_CST)
+   return -1;
+  if (!DECL_P (TREE_OPERAND (base, 0))
+ || DECL_ALIGN (TREE_OPERAND (base, 0)) < align)

You can use TYPE_ALIGN here? In general can't we replace all the GIMPLE
handling by get_object_alignment?

+   return -1;
+  offset += tree_low_cst (byte_offset, 1);
+}
   else
 return -1;
 
This change ought to go independently. I can not review it. 
I will make a first look over the patch shortly, but please send an updated 
patch fixing the problem with integer regs.

Honza


Re: [PATCH, ARM][10/n] Split scc patterns using cond_exec

2013-04-17 Thread Richard Earnshaw

On 17/04/13 14:12, Greta Yorsh wrote:

This patch converts define_insn into define_insn_and_split to split
some alternatives of movsicc_insn and some scc patterns that cannot be
expressed using movsicc. The patch emits cond_exec RTL insns.

Ok for trunk?

Thanks,
Greta

gcc/

2013-02-19  Greta Yorsh  greta.yo...@arm.com

 * config/arm/arm.md (movsicc_insn): Convert define_insn into
 define_insn_and_split.
 (and_scc,ior_scc,negscc): Likewise.




OK.

R.




Re: [PATCH, ARM] emit LDRD epilogue instead of a single LDM return

2013-04-17 Thread Richard Earnshaw

On 17/04/13 14:12, Greta Yorsh wrote:

Currently, an epilogue is not generated in RTL for a function that can return
using a single instruction. This patch enables the RTL epilogue for such
functions on targets that can benefit from using a sequence of LDRD
instructions in the epilogue instead of a single LDM instruction.

No regression on qemu arm-none-eabi with cortex-a15.

Ok for trunk?

Thanks,
Greta

gcc/

2012-10-19  Greta Yorsh  Greta.Yorsh at arm.com

 * config/arm/arm.c (use_return_insn): Return 0 for targets that
 can benefit from using a sequence of LDRD instructions in epilogue
 instead of a single LDM instruction.



OK.

R.




Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsics

2013-04-17 Thread Richard Earnshaw

On 17/04/13 12:06, Kyrylo Tkachov wrote:

Hi Julian,


From: Julian Brown [mailto:jul...@codesourcery.com]
Sent: 13 April 2013 15:04
To: Julian Brown
Cc: Kyrylo Tkachov; gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana
Radhakrishnan
Subject: Re: [PATCH][ARM][1/2] Add support for vcvt_f16_f32 and
vcvt_f32_f16 NEON intrinsics

On Fri, 12 Apr 2013 20:09:39 +0100
Julian Brown jul...@codesourcery.com wrote:


On Fri, 12 Apr 2013 15:19:18 +0100
Kyrylo Tkachov kyrylo.tkac...@arm.com wrote:


Hi all,

This patch adds the vcvt_f16_f32 and vcvt_f32_f16 NEON intrinsic
to arm_neon.h through the generator ML scripts and also adds the
built-ins to which the intrinsics will map to. The generator ML
scripts are updated and used to generate the relevant .texi
documentation, arm_neon.h and the tests in gcc.target/arm/neon .


FWIW, some of the changes to neon*.ml can be simplified somewhat --

my

attempt at an improved version of those bits is attached. I'm still
not too happy with mode_suffix, but these new instructions require
adding semantics to parts of the generator program which weren't
really very well-defined to start with :-). I appreciate that it's a
bit of a tangle...


I thought of an improvement to the mode_suffix part from the last
version of the patch, so here it is. I'm done fiddling with this now,
so back to you!


Thanks for looking at it! My Ocaml-fu is rather limited.
It does look cleaner now.
Here it is together with all the other parts of the patch, plus some
minor formatting changes.

Ok for trunk now?

gcc/ChangeLog
2013-04-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 Julian Brown  jul...@codesourcery.com

* config/arm/arm.c (neon_builtin_type_mode): Add T_V4HF.
(TB_DREG): Add T_V4HF.
(v4hf_UP): New macro.
(neon_itype): Add NEON_FLOAT_WIDEN, NEON_FLOAT_NARROW.
(arm_init_neon_builtins): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
Handle initialisation of V4HF. Adjust initialisation of reinterpret
built-ins.
(arm_expand_neon_builtin): Handle NEON_FLOAT_WIDEN,
NEON_FLOAT_NARROW.
(arm_vector_mode_supported_p): Handle V4HF.
(arm_mangle_map): Handle V4HFmode.
* config/arm/arm.h (VALID_NEON_DREG_MODE): Add V4HF.
* config/arm/arm_neon_builtins.def: Add entries for
vcvtv4hfv4sf, vcvtv4sfv4hf.
* config/arm/neon.md (neon_vcvtv4sfv4hf): New pattern.
(neon_vcvtv4hfv4sf): Likewise.
* config/arm/neon-gen.ml: Handle half-precision floating point
features.
* config/arm/neon-testgen.ml: Handle Requires_FP_bit feature.
* config/arm/arm_neon.h: Regenerate.
* config/arm/neon.ml (type elts): Add F16.
(type vectype): Add T_float16x4, T_floatHF.
(type vecmode): Add V4HF.
(type features): Add Requires_FP_bit feature.
(elt_width): Handle F16.
(elt_class): Likewise.
(elt_of_class_width): Likewise.
(mode_of_elt): Refactor.
(type_for_elt): Handle F16, fix error messages.
(vectype_size): Handle T_float16x4.
(vcvt_sh): New function.
(ops): Add entries for vcvt_f16_f32, vcvt_f32_f16.
(string_of_vectype): Handle T_floatHF, T_float16, T_float16x4.
(string_of_mode): Handle V4HF.
* doc/arm-neon-intrinsics.texi: Regenerate.


gcc/testsuite/ChangeLog
2013-04-17  Kyrylo Tkachov  kyrylo.tkac...@arm.com
 Julian Brown  jul...@codesourcery.com

* gcc.target/arm/neon/vcvtf16_f32.c: New test. Generated.
* gcc.target/arm/neon/vcvtf32_f16.c: Likewise.


neon-vcvt-intrinsics.patch



Please give Julian 24 hours for one final review of the Ocaml bits. 
Otherwise OK.


R.




Re: [PATCH, x86] Use vector moves in memmove expanding

2013-04-17 Thread Jan Hubicka
@@ -2392,6 +2392,7 @@ static void ix86_set_current_function (tree);
 static unsigned int ix86_minimum_incoming_stack_boundary (bool);
 
 static enum calling_abi ix86_function_abi (const_tree);
+static int smallest_pow2_greater_than (int);

Perhaps it is easier to use the existing 1 << ceil_log2?
 
 
 #ifndef SUBTARGET32_DEFAULT_CPU
@@ -21829,11 +21830,10 @@ expand_set_or_movmem_via_loop (rtx destmem, rtx 
srcmem,
 {
   rtx out_label, top_label, iter, tmp;
   enum machine_mode iter_mode = counter_mode (count);
-  rtx piece_size = GEN_INT (GET_MODE_SIZE (mode) * unroll);
+  int piece_size_n = GET_MODE_SIZE (mode) * unroll;
+  rtx piece_size = GEN_INT (piece_size_n);
   rtx piece_size_mask = GEN_INT (~((GET_MODE_SIZE (mode) * unroll) - 1));
   rtx size;
-  rtx x_addr;
-  rtx y_addr;
   int i;
 
   top_label = gen_label_rtx ();
@@ -21854,13 +21854,18 @@ expand_set_or_movmem_via_loop (rtx destmem, rtx 
srcmem,
   emit_label (top_label);
 
   tmp = convert_modes (Pmode, iter_mode, iter, true);
-  x_addr = gen_rtx_PLUS (Pmode, destptr, tmp);
-  destmem = change_address (destmem, mode, x_addr);
+
+  /* This assert could be relaxed - in this case we'll need to compute
+ smallest power of two, containing in PIECE_SIZE_N and pass it to
+ offset_address.  */
+  gcc_assert ((piece_size_n & (piece_size_n - 1)) == 0);
+  destmem = offset_address (destmem, tmp, piece_size_n);
+  destmem = adjust_address (destmem, mode, 0);
 
   if (srcmem)
 {
-  y_addr = gen_rtx_PLUS (Pmode, srcptr, copy_rtx (tmp));
-  srcmem = change_address (srcmem, mode, y_addr);
+  srcmem = offset_address (srcmem, copy_rtx (tmp), piece_size_n);
+  srcmem = adjust_address (srcmem, mode, 0);
 
   /* When unrolling for chips that reorder memory reads and writes,
 we can save registers by using single temporary.
@@ -22039,13 +22044,61 @@ expand_setmem_via_rep_stos (rtx destmem, rtx destptr, 
rtx value,
   emit_insn (gen_rep_stos (destptr, countreg, destmem, value, destexp));
 }

This change looks OK and can go into mainline independently. Just please ensure 
that changing the way the address is computed is not making us preserve the 
alias set. Memmove can not rely on the alias set of the src/destination 
objects.
 
-static void
-emit_strmov (rtx destmem, rtx srcmem,
-rtx destptr, rtx srcptr, enum machine_mode mode, int offset)
-{
-  rtx src = adjust_automodify_address_nv (srcmem, mode, srcptr, offset);
-  rtx dest = adjust_automodify_address_nv (destmem, mode, destptr, offset);
-  emit_insn (gen_strmov (destptr, dest, srcptr, src));
+/* This function emits moves to copy SIZE_TO_MOVE bytes from SRCMEM to
+   DESTMEM.
+   SRC is passed by pointer to be updated on return.
+   Return value is updated DST.  */
+static rtx
+emit_memmov (rtx destmem, rtx *srcmem, rtx destptr, rtx srcptr,
+HOST_WIDE_INT size_to_move)
+{
+  rtx dst = destmem, src = *srcmem, adjust, tempreg;
+  enum insn_code code;
+  enum machine_mode move_mode;
+  int piece_size, i;
+
+  /* Find the widest mode in which we could perform moves.
+ Start with the biggest power of 2 less than SIZE_TO_MOVE and half
+ it until move of such size is supported.  */
+  piece_size = smallest_pow2_greater_than (size_to_move) >> 1;
+  move_mode = mode_for_size (piece_size * BITS_PER_UNIT, MODE_INT, 0);

I suppose this is a problem with SSE moves ending up in integer register, since
you get TImode rather than a vectorized mode, like V8QImode in here.  Why not 
stick with the original mode parameter?
+  code = optab_handler (mov_optab, move_mode);
+  while (code == CODE_FOR_nothing && piece_size > 1)
+{
+  piece_size >>= 1;
+  move_mode = mode_for_size (piece_size * BITS_PER_UNIT, MODE_INT, 0);
+  code = optab_handler (mov_optab, move_mode);
+}
+  gcc_assert (code != CODE_FOR_nothing);
+
+  dst = adjust_automodify_address_nv (dst, move_mode, destptr, 0);
+  src = adjust_automodify_address_nv (src, move_mode, srcptr, 0);
+
+  /* Emit moves.  We'll need SIZE_TO_MOVE/PIECE_SIZES moves.  */
+  gcc_assert (size_to_move % piece_size == 0);
+  adjust = GEN_INT (piece_size);
+  for (i = 0; i < size_to_move; i += piece_size)
+{
+  /* We move from memory to memory, so we'll need to do it via
+a temporary register.  */
+  tempreg = gen_reg_rtx (move_mode);
+  emit_insn (GEN_FCN (code) (tempreg, src));
+  emit_insn (GEN_FCN (code) (dst, tempreg));
+
+  emit_move_insn (destptr,
+ gen_rtx_PLUS (Pmode, copy_rtx (destptr), adjust));
+  emit_move_insn (srcptr,
+ gen_rtx_PLUS (Pmode, copy_rtx (srcptr), adjust));
+
+  dst = adjust_automodify_address_nv (dst, move_mode, destptr,
+ piece_size);
+  src = adjust_automodify_address_nv (src, move_mode, srcptr,
+ piece_size);
+}
+
+  /* Update DST and SRC rtx.  */
+  *srcmem = src;
+  return dst;

Doesn't this effectively kill support for 

[PATCH, PR 42371] Remove references to functions from symbol table during inlining

2013-04-17 Thread Martin Jambor
Hi,

the patch below is a fix for PR 42371 removing references to functions
from symbol table when we know that all their uses were inlined.  This
then allows us to remove out-of-line copies of functions when they are
not needed.

The patch adds ability to count the uses of a parameter value that are
all well described by jump functions and call graph indirect edges (or
mark that we do not have uses under control).  These counts are then
combined or transferred to a new helper structure (ipa_cst_ref_desc)
associated with constant jump functions when functions are inlined and
decremented when the reference is used to turn an indirect call graph
edge into a direct one.  When the count eventually drops to zero,
the other information in the structure is used to identify and remove
an item in the appropriate reference list in the symbol table.  The
data structures allow for non-trivial duplication when whole trees of
inlined call graph nodes are duplicated.

I currently allocate the new structures only when a constant pointer
to a function is passed as a parameter.  It is trivial to have one for
all references but I'm not sure what good that would be, at least not
now.  Perhaps once we will also attempt to track references to virtual
tables but it will be much more difficult to prove it does not escape
somewhere in the callee(s).
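As an illustration, here is a hypothetical sketch (invented names, not from the patch) of the pattern the described-use counting targets: the only reference to a function is its address passed as a constant actual argument.

```cpp
/* Hypothetical sketch: the sole reference to worker is the constant
   argument to dispatch.  Once dispatch is inlined into entry and the
   indirect call is turned into a direct (and then inlined) call, the
   described-use count of the reference to worker drops to zero, so its
   out-of-line copy becomes removable.  */
static int worker (int x) { return x * 3; }

static int dispatch (int (*fn) (int), int v)
{
  return fn (v);                 /* indirect edge, described by a jump function */
}

int entry (int v)
{
  return dispatch (worker, v);   /* constant jump function carrying &worker */
}
```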

Bootstrapped and tested on x86_64-linux without any problems.  OK for
trunk?

Thanks,

Martin


2013-03-22  Martin Jambor  mjam...@suse.cz

PR middle-end/42371
* ipa-prop.h (IPA_UNDESCRIBED_USE): New macro.
(ipa_constant_data): New type.
(ipa_jump_func): Use ipa_constant_data to hold information about
constant jump functions.
(ipa_get_jf_constant): Adjust to jump function type changes.
(ipa_get_jf_constant_rdesc): New function.
(ipa_param_descriptor): New field controlled_uses.
(ipa_get_controlled_uses): New function.
(ipa_set_controlled_uses): Likewise.
* ipa-ref.h (ipa_find_reference): Declare.
* ipa-prop.c (ipa_cst_ref_desc): New type.
(ipa_print_node_jump_functions_for_edge): Adjust for jump function type
changes.
(ipa_set_jf_constant): Likewise.  Also create reference descriptions.
New parameter cs.  Adjust all callers.
(ipa_analyze_params_uses): Detect uncontrolled and controlled uses.
(remove_described_reference): New function.
(jfunc_rdesc_usable): Likewise.
(try_make_edge_direct_simple_call): Decrement controlled use count,
attempt to remove reference if it hits zero.
(combine_controlled_uses_counters): New function.
(propagate_controlled_uses): Likewise.
(ipa_propagate_indirect_call_infos): Call propagate_controlled_uses.
(ipa_edge_duplication_hook): Duplicate reference descriptions.
(ipa_print_node_params): Print described use counter.
(ipa_write_jump_function): Adjust to jump function type changes.
(ipa_read_jump_function): New parameter CS, pass it to
ipa_set_jf_constant.  Adjust caller.
(ipa_write_node_info): Stream controlled use count.
(ipa_read_node_info): Likewise.
* cgraph.c (cgraph_mark_address_taken_node): Bail out instead of
asserting.
* ipa-cp.c (ipcp_discover_new_direct_edges): Decrement controlled use
count.  Remove cloning-added reference if it reaches zero.
* ipa-ref.c (ipa_find_reference): New function.

testsuite/
* gcc.dg/ipa/remref-0.c: New test.
* gcc.dg/ipa/remref-1a.c: Likewise.
* gcc.dg/ipa/remref-1b.c: Likewise.
* gcc.dg/ipa/remref-2a.c: Likewise.
* gcc.dg/ipa/remref-2b.c: Likewise.

Index: src/gcc/ipa-prop.c
===
*** src.orig/gcc/ipa-prop.c
--- src/gcc/ipa-prop.c
*** static struct cgraph_2edge_hook_list *ed
*** 62,67 
--- 62,83 
  static struct cgraph_2node_hook_list *node_duplication_hook_holder;
  static struct cgraph_node_hook_list *function_insertion_hook_holder;
  
+ /* Description of a reference to an IPA constant.  */
+ struct ipa_cst_ref_desc
+ {
+   /* Edge that corresponds to the statement which took the reference.  */
+   struct cgraph_edge *cs;
+   /* Linked list of duplicates created when call graph edges are cloned.  */
+   struct ipa_cst_ref_desc *next_duplicate;
+   /* Number of references in IPA structures, IPA_UNDESCRIBED_USE if the value
+  is out of control.  */
+   int refcount;
+ };
+ 
+ /* Allocation pool for reference descriptions.  */
+ 
+ static alloc_pool ipa_refdesc_pool;
+ 
  /* Return index of the formal whose tree is PTREE in function which 
corresponds
 to INFO.  */
  
*** ipa_print_node_jump_functions_for_edge (
*** 175,181 
}
else if (type == IPA_JF_CONST)
{
! tree val = jump_func->value.constant;
  fprintf (f, "CONST: ");
  

[PATCH, PR 10474] Shedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue

2013-04-17 Thread Martin Jambor
Hi,

I have discovered that scheduling pass_cprop_hardreg before
pass_thread_prologue_and_epilogue leads to significant increases in
numbers of performed shrink-wrappings.  For one it solves PR 10474 (at
least on x86_64-linux) but it also boosts the number of
shrink-wrappings performed during gcc bootstrap by nearly 80%
(3165->5692 functions).  It is also necessary (although not
sufficient) to perform shrink-wrapping in at least one function in the
povray benchmark.

The reason why it helps so much is that before register allocation
there are instructions moving the value of actual arguments from
the original hard register (e.g. SI, DI, etc.) to a pseudo at the
beginning of each function.  When the argument is live across a
function call, the pseudo is likely to be assigned to a callee-saved
register and then also accessed from that register, even in the first
BB, making it require prologue, though it could be fetched from the
original one.  When we convert all uses (at least in the first BB) to
the original register, the preparatory stage of shrink wrapping is
often capable of moving the register moves to a later BB, thus
creating fast paths which do not require prologue and epilogue.
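As a sketch of the function shape described above (a hypothetical example, not from the patch or the PR):

```cpp
#include <cstdlib>

/* Hypothetical function of the shape discussed above: the argument is
   live across a call only on the slow path.  If the first basic block
   reads the incoming argument from its original hard register, the
   save of the callee-saved register can sink below the early return,
   so the fast path needs neither prologue nor epilogue.  */
long process (long arg)
{
  if (arg < 17)
    return arg + 1;              /* fast path: no call, no prologue needed */
  /* Slow path: arg is live across malloc, so it tends to be kept in a
     callee-saved register here.  */
  char *p = static_cast<char *> (std::malloc (arg));
  if (!p)
    return -1;
  std::free (p);
  return arg * 2;
}
```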

We believe this change in the pipeline should not bring about any
negative effects.  During gcc bootstrap, the number of instructions
changed by pass_cprop_hardreg dropped but by only 1.2%.  We have also
ran SPEC 2006 CPU benchmarks on recent Intel and AMD hardware and all
run time differences could be attributed to noise.  The changes in
binary sizes were also small:

|                | Trunk produced | New         |        |
| Benchmark      | binary size    | binary size | % diff |
|----------------+----------------+-------------+--------|
| 400.perlbench  |6219603 | 6136803 |  -1.33 |
| 401.bzip2  | 359291 |  351659 |  -2.12 |
| 403.gcc|   16249718 |15915774 |  -2.06 |
| 410.bwaves | 145249 |  145769 |   0.36 |
| 416.gamess |   40269686 |40270270 |   0.00 |
| 429.mcf|  97142 |   97126 |  -0.02 |
| 433.milc   | 715444 |  713236 |  -0.31 |
| 434.zeusmp |1444596 | 1444676 |   0.01 |
| 435.gromacs|6609207 | 6470039 |  -2.11 |
| 436.cactusADM  |4571319 | 4532607 |  -0.85 |
| 437.leslie3d   | 492197 |  492357 |   0.03 |
| 444.namd   |1001921 | 1007001 |   0.51 |
| 445.gobmk  |8193495 | 8163839 |  -0.36 |
| 450.soplex |5565070 | 5530734 |  -0.62 |
| 453.povray |7468446 | 7340142 |  -1.72 |
| 454.calculix   |8474754 | 8464954 |  -0.12 |
| 456.hmmer  |1662315 | 1650147 |  -0.73 |
| 458.sjeng  | 623065 |  620817 |  -0.36 |
| 459.GemsFDTD   |1456669 | 1461573 |   0.34 |
| 462.libquantum | 249809 |  248401 |  -0.56 |
| 464.h264ref|2784806 | 2772806 |  -0.43 |
| 465.tonto  |   15511395 |15480899 |  -0.20 |
| 470.lbm|  64327 |   64215 |  -0.17 |
| 471.omnetpp|5325418 | 5293874 |  -0.59 |
| 473.astar  | 365853 |  363261 |  -0.71 |
| 481.wrf|   22002287 |21950783 |  -0.23 |
| 482.sphinx3|1153616 | 1145248 |  -0.73 |
| 483.xalancbmk  |   62458676 |62001540 |  -0.73 |
|----------------+----------------+-------------+--------|
| TOTAL  |  221535374 |   220130550 |  -0.63 |

I have successfully bootstrapped and tested the patch on
x86-64-linux.  Is it OK for trunk?  Or should I also examine some
other aspect?

Thanks,

Martin


2013-03-28  Martin Jambor  mjam...@suse.cz

PR middle-end/10474
* passes.c (init_optimization_passes): Move pass_cprop_hardreg before
pass_thread_prologue_and_epilogue.

testsuite/
* gcc.dg/pr10474.c: New test.


Index: src/gcc/passes.c
===
--- src.orig/gcc/passes.c
+++ src/gcc/passes.c
@@ -1630,6 +1630,7 @@ init_optimization_passes (void)
  NEXT_PASS (pass_ree);
  NEXT_PASS (pass_compare_elim_after_reload);
  NEXT_PASS (pass_branch_target_load_optimize1);
+ NEXT_PASS (pass_cprop_hardreg);
  NEXT_PASS (pass_thread_prologue_and_epilogue);
  NEXT_PASS (pass_rtl_dse2);
  NEXT_PASS (pass_stack_adjustments);
@@ -1637,7 +1638,6 @@ init_optimization_passes (void)
  NEXT_PASS (pass_peephole2);
  NEXT_PASS (pass_if_after_reload);
  NEXT_PASS (pass_regrename);
- NEXT_PASS (pass_cprop_hardreg);
  NEXT_PASS (pass_fast_rtl_dce);
  NEXT_PASS (pass_reorder_blocks);
  NEXT_PASS (pass_branch_target_load_optimize2);
Index: src/gcc/testsuite/gcc.dg/pr10474.c

[PATCH, PR 56718] Middle-end intraprocedural type-based devirtualization

2013-04-17 Thread Martin Jambor
Hi,

this patch implements type-based devirtualization done
intra-procedurally.  This is normally done by the front-end except in
cases when opportunities for this transformation are created by
early-inlining.  Because we handle this situation at IPA-level
(especially in inlining but also in IPA-CP), this can currently mean
that some code runs much faster when compiled without early-inlining,
e.g. the PR 56718 testcase.

This patch piggy-backs on PRE OBJ_TYPE_REF folding and when that
fails, it calls into ipa-cp routines that simply use the
infrastructure we have for construction of known-type jump functions
to figure out the type of the object involved.  If that is known, the
appropriate method is looked up in the VMT obtained from its BINFO.

From now on I'll concentrate on tracking of VMT pointers within
objects rather than on type-based techniques in my devirtualization
efforts.  Nevertheless, I do believe we want to have this patch in
because (as opposed to VMT tracking) it can also work when
constructors are in a different compilation unit.  Even though these
situations do not occur that often, the effects can be quite
substantial when we miss an inlining opportunity; the testcase in
PR 56718 is rather artificial but runs 2.5 times quicker with the patch.

A very similar patch has passed bootstrap and testing on x86_64-linux
without any problems, I'm bootstrapping and testing this exact one
now.  OK for trunk if it passes?

Thanks,

Martin


2013-03-25  Martin Jambor  mjam...@suse.cz

PR tree-optimization/56718
* ipa-cp.c (ipa_value_from_known_type_jfunc): Moved...
* ipa-prop.c (ipa_binfo_from_known_type_jfunc): ...here, renamed
and made public.  Adjusted all callers.
(ipa_intraprocedural_devirtualization): New function.
* ipa-prop.h (ipa_binfo_from_known_type_jfunc): Declare.
(ipa_intraprocedural_devirtualization): Likewise.

* Makefile.in (tree-ssa-pre.o): Add ipa-prop.h to dependencies.

testsuite/
* g++.dg/ipa/imm-devirt-1.C: New test.
* g++.dg/ipa/imm-devirt-2.C: Likewise.

*** /tmp/ITGyHx_Makefile.in 2013-04-16 00:02:39.0 +0200
--- gcc/Makefile.in 2013-04-15 15:02:53.079696533 +0200
*** tree-ssa-pre.o : tree-ssa-pre.c $(TREE_F
*** 2369,2375 
 $(TM_H) coretypes.h $(TREE_PASS_H) $(FLAGS_H) langhooks.h \
 $(CFGLOOP_H) alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) $(HASH_TABLE_H) \
 $(GIMPLE_H) $(TREE_INLINE_H) tree-iterator.h tree-ssa-sccvn.h $(PARAMS_H) \
!$(DBGCNT_H) tree-scalar-evolution.h $(GIMPLE_PRETTY_PRINT_H) domwalk.h
  tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H) $(CONFIG_H) \
 $(SYSTEM_H) $(TREE_H) $(DIAGNOSTIC_H) \
 $(TM_H) coretypes.h $(DUMPFILE_H) $(FLAGS_H) $(CFGLOOP_H) \
--- 2369,2376 
 $(TM_H) coretypes.h $(TREE_PASS_H) $(FLAGS_H) langhooks.h \
 $(CFGLOOP_H) alloc-pool.h $(BASIC_BLOCK_H) $(BITMAP_H) $(HASH_TABLE_H) \
 $(GIMPLE_H) $(TREE_INLINE_H) tree-iterator.h tree-ssa-sccvn.h $(PARAMS_H) \
!$(DBGCNT_H) tree-scalar-evolution.h $(GIMPLE_PRETTY_PRINT_H) domwalk.h \
!$(IPA_PROP_H)
  tree-ssa-sccvn.o : tree-ssa-sccvn.c $(TREE_FLOW_H) $(CONFIG_H) \
 $(SYSTEM_H) $(TREE_H) $(DIAGNOSTIC_H) \
 $(TM_H) coretypes.h $(DUMPFILE_H) $(FLAGS_H) $(CFGLOOP_H) \
*** /tmp/CO3mFA_ipa-cp.c2013-04-16 00:02:39.0 +0200
--- gcc/ipa-cp.c2013-04-15 15:02:53.070696564 +0200
*** ipa_get_jf_ancestor_result (struct ipa_j
*** 791,810 
  return NULL_TREE;
  }
  
- /* Extract the acual BINFO being described by JFUNC which must be a known type
-jump function.  */
- 
- static tree
- ipa_value_from_known_type_jfunc (struct ipa_jump_func *jfunc)
- {
-   tree base_binfo = TYPE_BINFO (ipa_get_jf_known_type_base_type (jfunc));
-   if (!base_binfo)
- return NULL_TREE;
-   return get_binfo_at_offset (base_binfo,
- ipa_get_jf_known_type_offset (jfunc),
- ipa_get_jf_known_type_component_type (jfunc));
- }
- 
  /* Determine whether JFUNC evaluates to a known value (that is either a
 constant or a binfo) and if so, return it.  Otherwise return NULL. INFO
 describes the caller node so that pass-through jump functions can be
--- 791,796 
*** ipa_value_from_jfunc (struct ipa_node_pa
*** 816,822 
	if (jfunc->type == IPA_JF_CONST)
  return ipa_get_jf_constant (jfunc);
	else if (jfunc->type == IPA_JF_KNOWN_TYPE)
! return ipa_value_from_known_type_jfunc (jfunc);
	else if (jfunc->type == IPA_JF_PASS_THROUGH
	   || jfunc->type == IPA_JF_ANCESTOR)
  {
--- 802,808 
	if (jfunc->type == IPA_JF_CONST)
  return ipa_get_jf_constant (jfunc);
	else if (jfunc->type == IPA_JF_KNOWN_TYPE)
! return ipa_binfo_from_known_type_jfunc (jfunc);
	else if (jfunc->type == IPA_JF_PASS_THROUGH
	   || jfunc->type == IPA_JF_ANCESTOR)
  {
*** propagate_scalar_accross_jump_function (
*** 1103,1109 
  
 

[google] Add libgcov interface for accessing profile directory (issue8726046)

2013-04-17 Thread Teresa Johnson
Patch to add interface for querying the profile directory prefix
specified to the -fprofile-generate= option.

Google ref b/8629045.

Tested with regression tests and internal test. Ok for google branches?

2013-04-17  Teresa Johnson  tejohn...@google.com

* libgcc/libgcov.c (__gcov_get_profile_prefix): New function.
* gcc/tree-profile.c (tree_init_instrumentation): New function.
* gcc/coverage.c (get_const_string_type): Make non-static.
(coverage_init): Invoke tree_init_instrumentation().
* gcc/coverage.h (get_const_string_type): Declare.
(tree_init_instrumentation): Ditto.

Index: libgcc/libgcov.c
===
--- libgcc/libgcov.c(revision 197640)
+++ libgcc/libgcov.c(working copy)
@@ -183,6 +183,14 @@ unsigned int __gcov_sampling_enabled ()
   return __gcov_has_sampling;
 }
 
+/* Profile directory prefix specified to -fprofile-generate=.  */
+extern char * __gcov_profile_prefix;
+
+char *__gcov_get_profile_prefix ()
+{
+  return __gcov_profile_prefix;
+}
+
 /* Per thread sample counter.  */
 THREAD_PREFIX gcov_unsigned_t __gcov_sample_counter = 0;
 
Index: gcc/tree-profile.c
===
--- gcc/tree-profile.c  (revision 197640)
+++ gcc/tree-profile.c  (working copy)
@@ -165,6 +165,9 @@ static struct pointer_set_t *instrumentation_to_be
 /* extern __thread gcov_unsigned_t __gcov_sample_counter  */
 static GTY(()) tree gcov_sample_counter_decl = NULL_TREE;
 
+/* extern gcov_unsigned_t __gcov_profile_prefix  */
+static tree GTY(()) gcov_profile_prefix_decl = NULL_TREE;
+
 /* extern gcov_unsigned_t __gcov_sampling_period  */
 static tree GTY(()) gcov_sampling_period_decl = NULL_TREE;
 
@@ -407,6 +410,41 @@ cleanup_instrumentation_sampling (void)
 }
 }
 
+/* Initialization function for FDO instrumentation.  */
+
+void
+tree_init_instrumentation (void)
+{
+  if (!gcov_profile_prefix_decl)
+{
+  tree prefix_ptr;
+  int prefix_len;
+  tree prefix_string;
+
+  /* Construct an initializer for __gcov_profile_prefix.  */
+  gcov_profile_prefix_decl =
+build_decl (UNKNOWN_LOCATION, VAR_DECL,
+get_identifier ("__gcov_profile_prefix"),
+get_const_string_type ());
+  TREE_PUBLIC (gcov_profile_prefix_decl) = 1;
+  DECL_ARTIFICIAL (gcov_profile_prefix_decl) = 1;
+  make_decl_one_only (gcov_profile_prefix_decl,
+  DECL_ASSEMBLER_NAME (gcov_profile_prefix_decl));
+  TREE_STATIC (gcov_profile_prefix_decl) = 1;
+
+  prefix_len = strlen (profile_data_prefix);
+  prefix_string = build_string (prefix_len + 1, profile_data_prefix);
+  TREE_TYPE (prefix_string) = build_array_type
+  (char_type_node, build_index_type
+   (build_int_cst (NULL_TREE, prefix_len)));
+  prefix_ptr = build1 (ADDR_EXPR, get_const_string_type (),
+   prefix_string);
+
+  DECL_INITIAL (gcov_profile_prefix_decl) = prefix_ptr;
+  varpool_finalize_decl (gcov_profile_prefix_decl);
+}
+}
+
 /* Initialization function for FDO sampling.  */
 
 void
Index: gcc/coverage.c
===
--- gcc/coverage.c  (revision 197640)
+++ gcc/coverage.c  (working copy)
@@ -227,7 +227,7 @@ get_gcov_unsigned_t (void)
 
 /* Return the type node for const char *.  */
 
-static tree
+tree
 get_const_string_type (void)
 {
   return build_pointer_type
@@ -2879,6 +2879,7 @@ coverage_init (const char *filename, const char* s
   /* Define variables which are referenced at runtime by libgcov.  */
   if (profiling_enabled_p ())
 {
+  tree_init_instrumentation ();
   tree_init_dyn_ipa_parameters ();
   init_pmu_profiling ();
   tree_init_instrumentation_sampling ();
Index: gcc/coverage.h
===
--- gcc/coverage.h  (revision 197640)
+++ gcc/coverage.h  (working copy)
@@ -82,6 +82,7 @@ extern bool pmu_data_present (void);
 
 extern tree get_gcov_type (void);
 extern tree get_gcov_unsigned_t (void);
+extern tree get_const_string_type (void);
 
 /* Mark this module as containing asm statements.  */
 extern void coverage_has_asm_stmt (void);
@@ -89,5 +90,6 @@ extern void coverage_has_asm_stmt (void);
 /* Defined in tree-profile.c.  */
 extern void tree_init_instrumentation_sampling (void);
 extern void tree_init_dyn_ipa_parameters (void);
+extern void tree_init_instrumentation (void);
 
 #endif

--
This patch is available for review at http://codereview.appspot.com/8726046


Re: RFA: enable LRA for rs6000 [patch for WRF]

2013-04-17 Thread Michael Meissner
On Wed, Apr 17, 2013 at 10:14:53AM -0400, Vladimir Makarov wrote:
 Mike, thanks for the patch and all the SPEC2006 data (which are
 very useful as I have no access to a power machine which can be used
 for benchmarking).  I guess that some benchmark scores may be
 lower because LRA lacks some micro-optimizations which reload
 implements through many power hooks (e.g. LRA does not use push
 reload).  Although sometimes that is not a bad thing (e.g. LRA does
 not use SECONDARY_MEMORY_NEEDED_RTX, which permits reusing the
 stack slots for other useful things).

SECONDARY_MEMORY_NEEDED_RTX is needed for SDmode on machines before the power7,
where we need a larger stack slot to hold spilled SDmode values (power6 did not
have the LFIWZX instruction that is needed to load SDmode values into floating
point registers).

 In general I got the impression that power7 is the most difficult port
 for LRA.  If we manage to port it, LRA ports for other targets will
 be easier.

I dunno, has LRA been ported to the SH yet?

 I also reproduced the bootstrap failure with --with-cpu=power7 and I am going
 to work on this, and after that on the SPEC2006 issues you wrote about.

Ok.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [Patch, Fortran] PR 56814: [4.8/4.9 Regression] Bogus Interface mismatch in dummy procedure

2013-04-17 Thread Janus Weil
 here is a patch for a recent regression with procedure pointers.
 Regtested on x86_64-unknown-linux-gnu. Ok for trunk and 4.8?

 Looks rather obvious. OK - and thanks for the patch.

Thanks for the review. Committed as r198032.


 PS: If you have time, could you review my C_LOC patch at
 http://gcc.gnu.org/ml/fortran/2013-04/msg00073.html ?

Will try to have a look soon ...

Cheers,
Janus



 2013-04-17  Janus Weil  ja...@gcc.gnu.org

  PR fortran/56814
  * interface.c (check_result_characteristics): Get result from
  interface if present.


 2013-04-17  Janus Weil  ja...@gcc.gnu.org

  PR fortran/56814
  * gfortran.dg/proc_ptr_42.f90: New.




Re: [PATCH] backport darwin12 fixes to gcc-4_7-branch

2013-04-17 Thread Mike Stump
On Jun 18, 2012, at 9:36 AM, Jack Howarth howa...@bromo.med.uc.edu wrote:
  The attached patch backports...
 
 http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01710.html
 http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01707.html
 
 which implement changes for darwin12 and later. Bootstrap
 and regression tested on current x86_64-apple-darwin12.
 
 http://gcc.gnu.org/ml/gcc-testresults/2012-06/msg01547.html
 
 Okay for gcc-4_7-branch?

Committed 198005.
Committed 198012.
Committed 198031.

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01181.html is the original 
request, since it's been a while since people saw it last.

Re: [google] Add libgcov interface for accessing profile directory (issue8726046)

2013-04-17 Thread Xinliang David Li
looks ok.

David

On Wed, Apr 17, 2013 at 9:00 AM, Teresa Johnson tejohn...@google.com wrote:
 Patch to add interface for querying the profile directory prefix
 specified to the -fprofile-generate= option.

 Google ref b/8629045.

 Tested with regression tests and internal test. Ok for google branches?

 2013-04-17  Teresa Johnson  tejohn...@google.com

 * libgcc/libgcov.c (__gcov_get_profile_prefix): New function.
 * gcc/tree-profile.c (tree_init_instrumentation): New function.
 * gcc/coverage.c (get_const_string_type): Make non-static.
 (coverage_init): Invoke tree_init_instrumentation().
 * gcc/coverage.h (get_const_string_type): Declare.
 (tree_init_instrumentation): Ditto.

 Index: libgcc/libgcov.c
 ===
 --- libgcc/libgcov.c(revision 197640)
 +++ libgcc/libgcov.c(working copy)
 @@ -183,6 +183,14 @@ unsigned int __gcov_sampling_enabled ()
return __gcov_has_sampling;
  }

 +/* Profile directory prefix specified to -fprofile-generate=.  */
 +extern char * __gcov_profile_prefix;
 +
 +char *__gcov_get_profile_prefix ()
 +{
 +  return __gcov_profile_prefix;
 +}
 +
  /* Per thread sample counter.  */
  THREAD_PREFIX gcov_unsigned_t __gcov_sample_counter = 0;

 Index: gcc/tree-profile.c
 ===
 --- gcc/tree-profile.c  (revision 197640)
 +++ gcc/tree-profile.c  (working copy)
 @@ -165,6 +165,9 @@ static struct pointer_set_t *instrumentation_to_be
  /* extern __thread gcov_unsigned_t __gcov_sample_counter  */
  static GTY(()) tree gcov_sample_counter_decl = NULL_TREE;

 +/* extern gcov_unsigned_t __gcov_profile_prefix  */
 +static tree GTY(()) gcov_profile_prefix_decl = NULL_TREE;
 +
  /* extern gcov_unsigned_t __gcov_sampling_period  */
  static tree GTY(()) gcov_sampling_period_decl = NULL_TREE;

 @@ -407,6 +410,41 @@ cleanup_instrumentation_sampling (void)
  }
  }

 +/* Initialization function for FDO instrumentation.  */
 +
 +void
 +tree_init_instrumentation (void)
 +{
 +  if (!gcov_profile_prefix_decl)
 +{
 +  tree prefix_ptr;
 +  int prefix_len;
 +  tree prefix_string;
 +
 +  /* Construct an initializer for __gcov_profile_prefix.  */
 +  gcov_profile_prefix_decl =
 +build_decl (UNKNOWN_LOCATION, VAR_DECL,
  +get_identifier ("__gcov_profile_prefix"),
 +get_const_string_type ());
 +  TREE_PUBLIC (gcov_profile_prefix_decl) = 1;
 +  DECL_ARTIFICIAL (gcov_profile_prefix_decl) = 1;
 +  make_decl_one_only (gcov_profile_prefix_decl,
 +  DECL_ASSEMBLER_NAME (gcov_profile_prefix_decl));
 +  TREE_STATIC (gcov_profile_prefix_decl) = 1;
 +
 +  prefix_len = strlen (profile_data_prefix);
 +  prefix_string = build_string (prefix_len + 1, profile_data_prefix);
 +  TREE_TYPE (prefix_string) = build_array_type
 +  (char_type_node, build_index_type
 +   (build_int_cst (NULL_TREE, prefix_len)));
 +  prefix_ptr = build1 (ADDR_EXPR, get_const_string_type (),
 +   prefix_string);
 +
 +  DECL_INITIAL (gcov_profile_prefix_decl) = prefix_ptr;
 +  varpool_finalize_decl (gcov_profile_prefix_decl);
 +}
 +}
 +
  /* Initialization function for FDO sampling.  */

  void
 Index: gcc/coverage.c
 ===
 --- gcc/coverage.c  (revision 197640)
 +++ gcc/coverage.c  (working copy)
 @@ -227,7 +227,7 @@ get_gcov_unsigned_t (void)

  /* Return the type node for const char *.  */

 -static tree
 +tree
  get_const_string_type (void)
  {
return build_pointer_type
 @@ -2879,6 +2879,7 @@ coverage_init (const char *filename, const char* s
/* Define variables which are referenced at runtime by libgcov.  */
if (profiling_enabled_p ())
  {
 +  tree_init_instrumentation ();
tree_init_dyn_ipa_parameters ();
init_pmu_profiling ();
tree_init_instrumentation_sampling ();
 Index: gcc/coverage.h
 ===
 --- gcc/coverage.h  (revision 197640)
 +++ gcc/coverage.h  (working copy)
 @@ -82,6 +82,7 @@ extern bool pmu_data_present (void);

  extern tree get_gcov_type (void);
  extern tree get_gcov_unsigned_t (void);
 +extern tree get_const_string_type (void);

  /* Mark this module as containing asm statements.  */
  extern void coverage_has_asm_stmt (void);
 @@ -89,5 +90,6 @@ extern void coverage_has_asm_stmt (void);
  /* Defined in tree-profile.c.  */
  extern void tree_init_instrumentation_sampling (void);
  extern void tree_init_dyn_ipa_parameters (void);
 +extern void tree_init_instrumentation (void);

  #endif

 --
 This patch is available for review at http://codereview.appspot.com/8726046


Re: [patch, fortran] Really fix PR 56782

2013-04-17 Thread Thomas Koenig

Hi Mikael,


Attached is what I had in mind,
and a testcase (the '|| expr->expr_type == EXPR_FUNCTION' in
copy_walk_reduction_arg appeared wrong to me, and it seemingly was).


Yes, you can also do it this way.

OK if accompanied by a ChangeLog entry and my test cases, as well :-)

Thomas


Re: [GOOGLE] Add AutoFDO based indirect call value profiling to GCC

2013-04-17 Thread Xinliang David Li
https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c
File gcc/auto-profile.c (right):

https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode161
gcc/auto-profile.c:161: };
Why not sharing the same hist_type enum in value-prof.h?

https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode171
gcc/auto-profile.c:171: int count;
int -- gcov_type?

https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode519
gcc/auto-profile.c:519: static void
New line above.

https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode550
gcc/auto-profile.c:550: if (actual_count  2)
Are the targets sorted?

https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode570
gcc/auto-profile.c:570: static void afdo_vpt (gimple stmt, struct
gcov_hist *v, int hist_size)
New line.

https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode914
gcc/auto-profile.c:914: for the given POS_STACK.  */
Document new parameters.

https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode1193
gcc/auto-profile.c:1193: gcov_functions[i].stacks[j].hist_size =
gcov_read_unsigned ();
Why is the data not optional?

https://codereview.appspot.com/8814043/diff/1/gcc/auto-profile.c#newcode1203
gcc/auto-profile.c:1203: file_names[gcov_read_counter()];
missing white space (many other instances)

https://codereview.appspot.com/8814043/diff/1/gcc/tree-inline.c
File gcc/tree-inline.c (right):

https://codereview.appspot.com/8814043/diff/1/gcc/tree-inline.c#newcode1833
gcc/tree-inline.c:1833: gcov_type count = afdo_get_bb_count
(copy_basic_block, false);
comment about the 'false' parameter.

https://codereview.appspot.com/8814043/diff/1/gcc/value-prof.c
File gcc/value-prof.c (right):

https://codereview.appspot.com/8814043/diff/1/gcc/value-prof.c#newcode1223
gcc/value-prof.c:1223: if (!strcmp(fname, name))
You can use assembler_name_hash to find the cgraph_node

https://codereview.appspot.com/8814043/diff/1/gcc/value-prof.c#newcode1626
gcc/value-prof.c:1626: direct_call1 = find_func_by_global_id (val1);
May be add a parameter to find_func_by_global_id (val, is_auto_fdo)?

https://codereview.appspot.com/8814043/

On Tue, Apr 16, 2013 at 8:41 PM, Dehao Chen de...@google.com wrote:
 This patch implements indirect call promotion for AutoFDO.

 Bootstrapped and passed gcc regression tests.

 Is it okay for gcc-4_7 branch?

 Thanks,
 Dehao

 http://codereview.appspot.com/8814043

 diff --git a/gcc/Makefile.in b/gcc/Makefile.in
 index d17b250..c217846 100644
 --- a/gcc/Makefile.in
 +++ b/gcc/Makefile.in
 @@ -3039,7 +3039,7 @@ coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H)
 $(SYSTEM_H) coretypes.h \
 $(DIAGNOSTIC_CORE_H) intl.h gt-coverage.h $(TARGET_H) $(AUTO_PROFILE_H)
  auto-profile.o : auto-profile.c $(CONFIG_H) $(SYSTEM_H) $(FLAGS_H) \
 $(BASIC_BLOCK_H) $(DIAGNOSTIC_CORE_H) $(GCOV_IO_H) $(INPUT_H) 
 $(PROFILE_H) \
 -   $(LANGHOOKS_H) $(OPTS_H) $(TREE_PASS_H) $(CGRAPH_H) $(GIMPLE_H) \
 +   $(LANGHOOKS_H) $(OPTS_H) $(TREE_PASS_H) $(CGRAPH_H) $(GIMPLE_H)
 value-prof.h \
 $(AUTO_PROFILE_H)
  cselib.o : cselib.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
 $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(RECOG_H) \
 diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
 index cf17370..38f4209 100644
 --- a/gcc/auto-profile.c
 +++ b/gcc/auto-profile.c
 @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3.  If not see
  #include gimple.h
  #include cgraph.h
  #include tree-flow.h
 +#include value-prof.h
  #include auto-profile.h

  /* The following routines implements AutoFDO optimization.
 @@ -109,6 +110,8 @@ struct gcov_stack
const char *callee_name;
struct gcov_callsite_pos *stack;
gcov_unsigned_t size;
 +  struct gcov_hist *hist;
 +  gcov_unsigned_t hist_size;
gcov_type num_inst;
gcov_type count;
gcov_type max_count;
 @@ -150,6 +153,24 @@ struct afdo_module
char **strings;
  };

 +enum afdo_histogram_type
 +{
 +  CALL_HIST = 1,
 +  STRINGOP_HIST,
 +  DIVMOD_HIST
 +};
 +
 +struct gcov_hist
 +{
 +  enum afdo_histogram_type type;
 +  union
 +{
 +  const char *func_name;
 +  unsigned long long value;
 +} value;
 +  int count;
 +};
 +
  /* Store the file name strings read from the profile data file. */
  static const char **file_names;

 @@ -493,6 +514,64 @@ read_aux_modules (void)
  }
  }

 +/* From AutoFDO profiles, find values inside STMT for that we want to measure
 +   histograms for indirect-call optimization.  */
 +static void
 +afdo_indirect_call (gimple stmt, struct gcov_hist *values, int hist_size)
 +{
 +  tree callee;
 +  int i, total = 0;
 +  int actual_count = 0;
 +  histogram_value hist;
 +
 +  if (gimple_code (stmt) != GIMPLE_CALL
 +  || gimple_call_fndecl (stmt) != NULL_TREE)
 +return;
 +
 +  callee = gimple_call_fn (stmt);
 +
 +  for (i = 0; i < hist_size; i++)
 +if (values[i].type == CALL_HIST)
 +  break;
 +
 +  if (i == hist_size)
 +return;
 +

Re: [PATCH] Add a new option -fstack-protector-strong

2013-04-17 Thread 沈涵
Thanks.

On Wed, Apr 17, 2013 at 2:26 AM, Florian Weimer fwei...@redhat.com wrote:
 On 04/17/2013 02:49 AM, Han Shen wrote:
 Indentation is off (unless both mail clients I tried are clobbering your
 patch).  I think the GNU coding style prohibits the braces around the
 single-statement body of the outer 'for'.

 Done with indentation properly on and removed the braces. (GMail
 composing window drops all the tabs when pasting... I have to use
 Thunderbird to paste the patch. Hope it is right this time)


 Thunderbird mangles patches as well, but I was able to repair the damage.
 When using Thunderbird, please send the patch as a text file attachment.
 You can put the changelog snippets at the beginning of the file as well.
 This way, everything is sent out unchanged.

Patch sent as attachment.


 Can you make the conditional more similar to the comment, perhaps using a
 switch statement on the value of the flag_stack_protect variable? That's
 going to be much easier to read.

 Re-coded. Now using 'switch-case'.


 Thanks.  I think the comment is now redundant because it matches the code
 almost word-for-word. 8-)

Comment deleted.

 No for 'struct-returning' functions. But I regard this as not an issue ---
 at the programming level, there is no way to get one's hands on the
 address of a returned structure ---
 struct Node foo();
 struct Node *p = &foo();  // compiler error - lvalue required as
 unary '&' operand.


 C++ const references can bind to rvalues.
 2013-04-11  Jakub Jelinek  ja...@redhat.com
 But I'm more worried about the interaction with the return value
 optimization.  Consider this C++ code:

 struct S {
   S();
   int a;
   int b;
   int c;
   int d;
   int e;
 };

 void f1(int *);

 S f2()
 {
   S s;
   f1(&s.a);
   return s;
 }

 S g2();

 void g3()
 {
   S s = g2();
 }

  void g3b(const S&);

 void g3b()
 {
   g3b(g2());
 }

 With your patch and -O2 -fstack-protector-strong, this generates the
 following assembly:

 .globl  _Z2f2v
 .type   _Z2f2v, @function
 _Z2f2v:
 .LFB0:
 .cfi_startproc
 pushq   %rbx
 .cfi_def_cfa_offset 16
 .cfi_offset 3, -16
	movq	%rdi, %rbx
	call	_ZN1SC1Ev
	movq	%rbx, %rdi
	call	_Z2f1Pi
	movq	%rbx, %rax
	popq	%rbx
 .cfi_def_cfa_offset 8
 ret
 .cfi_endproc
 .LFE0:
 .size   _Z2f2v, .-_Z2f2v
 .p2align 4,,15
 .globl  _Z2g3v
 .type   _Z2g3v, @function
 _Z2g3v:
 .LFB1:
 .cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	movq	%rsp, %rdi
	call	_Z2g2v
	addq	$40, %rsp
 .cfi_def_cfa_offset 8
 ret
 .cfi_endproc
 .LFE1:
 .size   _Z2g3v, .-_Z2g3v
 .p2align 4,,15
 .globl  _Z3g3bv
 .type   _Z3g3bv, @function
 _Z3g3bv:
 .LFB2:
 .cfi_startproc
	subq	$40, %rsp
	.cfi_def_cfa_offset 48
	movq	%rsp, %rdi
	movq	%fs:40, %rax
	movq	%rax, 24(%rsp)
	xorl	%eax, %eax
	call	_Z2g2v
	movq	%rsp, %rdi
	call	_Z3g3bRK1S
	movq	24(%rsp), %rax
	xorq	%fs:40, %rax
	jne	.L9
	addq	$40, %rsp
 .cfi_remember_state
 .cfi_def_cfa_offset 8
 ret
 .L9:
 .cfi_restore_state
 .p2align 4,,6
	call	__stack_chk_fail
 .cfi_endproc
 .LFE2:
 .size   _Z3g3bv, .-_Z3g3bv

 Here, g3b() is correctly instrumented, and f2() does not need
 instrumentation (because space for the returned object is not part of the
 local frame).  But an address on the stack escapes in g3() and is used for
 the return value of the call to g2().  This requires instrumentation, which
 is missing in this example.

 I suppose this can be handled in a follow-up patch if necessary.

Thanks for the case. Yeah, I see the problem here: in the cfgexpand
phase, where functions are scanned for stack protection, the tree
representation has no indication that a local address is computed,
as listed below -
  void g3() ()
  {
struct S s;
bb 2:
s = g2 (); [return slot optimization]
s ={v} {CLOBBER};
return;
  }
One solution might be to scan for the [return slot optimization] tag
in the tree. I'll post another patch later to address this.

 ChangeLog and patch below --

 gcc/ChangeLog
 2013-04-16  Han Shen  shen...@google.com
  * cfgexpand.c (record_or_union_type_has_array_p): Helper function
  to check if a record or union contains an array field.


 I think the GNU convention is to write only this:

 * cfgexpand.c (record_or_union_type_has_array_p): New function.
Done.


  (expand_used_vars): Add logic handling '-fstack-protector-strong'.
  * common.opt (fstack-protector-all): New option.


 Should be fstack-protector-strong.
Done.


 --
 Florian Weimer / Red Hat Product Security Team

Patch attached as plain txt.

Thanks,

Re: [PATCH, PR 10474] Shedule pass_cprop_hardreg before pass_thread_prologue_and_epilogue

2013-04-17 Thread Jeff Law

On 04/17/2013 09:49 AM, Martin Jambor wrote:


The reason why it helps so much is that before register allocation
there are instructions moving the value of actual arguments from
the original hard register (e.g. SI, DI, etc.) to a pseudo at the
beginning of each function.  When the argument is live across a
function call, the pseudo is likely to be assigned to a callee-saved
register and then also accessed from that register, even in the first
BB, making it require prologue, though it could be fetched from the
original one.  When we convert all uses (at least in the first BB) to
the original register, the preparatory stage of shrink wrapping is
often capable of moving the register moves to a later BB, thus
creating fast paths which do not require prologue and epilogue.
I noticed similar effects when looking at range splitting.  Being able 
to move those calls into a deeper control level in the CFG would 
definitely be an improvement.




We believe this change in the pipeline should not bring about any
negative effects.  During gcc bootstrap, the number of instructions
changed by pass_cprop_hardreg dropped but by only 1.2%.  We have also
ran SPEC 2006 CPU benchmarks on recent Intel and AMD hardware and all
run time differences could be attributed to noise.  The changes in
binary sizes were also small:
Did anyone ponder just doing the hard register propagation on argument 
registers prior to the prologue/epilogue handling, then the full-blown 
propagation pass in its current location in the pipeline?


That would get you the benefit you're seeking and minimize other 
effects.  Of course if you try that and get effectively the same results 
as moving the full propagation pass before prologue/epilogue handling 
then the complexity of only propagating argument registers early is 
clearly not needed and we'd probably want to go with your patch as-is.



jeff



Re: Fix std::pair std::is_copy_assignable behavior

2013-04-17 Thread François Dumont

Hi

Here is another proposal to fix std::is_copy_assignable<std::pair>.

This is not perfect because I have adapted it to current compiler 
behavior, but it is still better than the current behavior and enough to 
commit the unordered C++11 allocator adaptation afterward. It will give 
me more time to work on the Standard modification proposal to avoid the 
partial template specializations used for the moment.


Thanks to the current resolution of DR 1402 we can already define the 
move assignment operator as default; if it is deleted, the template move 
assignment operator will be considered and potentially used instead.


2013-04-17  François Dumont fdum...@gcc.gnu.org

* include/bits/stl_pair.h (operator=(const pair&)): Add noexcept
qualification.
(operator=(pair&&)): Use default implementation.
(template operator=(const pair<_U1, _U2>&)): Add noexcept
qualification.  Enable if is_assignable<_T1&, const _U1&> is true for
both parameter types.
(template operator=(pair<_U1, _U2>&&)): Add noexcept
qualification.  Enable if is_assignable<_T1&, _U1&&> is true for both
parameter types.
(std::is_copy_assignable, std::is_move_assignable): Add
partial specialization.
* testsuite/20_util/pair/is_move_assignable.cc: New.
* testsuite/20_util/pair/is_copy_assignable.cc: Likewise.
* testsuite/20_util/pair/is_assignable.cc: Likewise.
* testsuite/20_util/pair/is_nothrow_move_assignable.cc: Likewise.
* testsuite/20_util/pair/assign_neg.cc: Likewise.
* testsuite/20_util/pair/is_nothrow_copy_assignable.cc: Likewise.
* testsuite/20_util/pair/assign.cc: Likewise.

Tested under Linux x86_64.

Ok to commit ?

François
Index: include/bits/stl_pair.h
===
--- include/bits/stl_pair.h	(revision 197829)
+++ include/bits/stl_pair.h	(working copy)
@@ -156,6 +156,8 @@
 
   pair
   operator=(const pair __p)
+  noexcept(__and_<is_nothrow_copy_assignable<_T1>,
+		  is_nothrow_copy_assignable<_T2>>::value)
   {
 	first = __p.first;
 	second = __p.second;
@@ -163,18 +165,15 @@
   }
 
       pair&
-      operator=(pair&& __p)
-      noexcept(__and_<is_nothrow_move_assignable<_T1>,
-	      is_nothrow_move_assignable<_T2>>::value)
-      {
-	first = std::forward<first_type>(__p.first);
-	second = std::forward<second_type>(__p.second);
-	return *this;
-      }
+      operator=(pair&&) = default;
 
       template<class _U1, class _U2>
-	pair&
+	typename enable_if<__and_<is_assignable<_T1&, const _U1&>,
+				  is_assignable<_T2&, const _U2&>>::value,
+			   pair&>::type
 	operator=(const pair<_U1, _U2>& __p)
+	noexcept(__and_<is_nothrow_assignable<_T1&, const _U1&>,
+			is_nothrow_assignable<_T2&, const _U2&>>::value)
 	{
 	  first = __p.first;
 	  second = __p.second;
@@ -182,8 +181,12 @@
 	}
 
       template<class _U1, class _U2>
-	pair&
+  	typename enable_if<__and_<is_assignable<_T1&, _U1&&>,
+				  is_assignable<_T2&, _U2&&>>::value,
+  			   pair&>::type
 	operator=(pair<_U1, _U2>&& __p)
+	noexcept(__and_<is_nothrow_assignable<_T1&, _U1&&>,
+			is_nothrow_assignable<_T2&, _U2&&>>::value)
 	{
 	  first = std::forward<_U1>(__p.first);
 	  second = std::forward<_U2>(__p.second);
@@ -287,6 +290,19 @@
     { return pair<_T1, _T2>(__x, __y); }
 #endif
 
+#if __cplusplus >= 201103L
+  template<class _T1, class _T2>
+    struct is_copy_assignable<std::pair<_T1, _T2>>
+      : public __and_<std::is_copy_assignable<_T1>,
+		      std::is_copy_assignable<_T2>>::type
+    { };
+
+  template<class _T1, class _T2>
+    struct is_move_assignable<std::pair<_T1, _T2>>
+      : public __and_<std::is_move_assignable<_T1>,
+		      std::is_move_assignable<_T2>>::type
+    { };
+#endif
   /// @}
 
 _GLIBCXX_END_NAMESPACE_VERSION
Index: testsuite/20_util/pair/is_assignable.cc
===
--- testsuite/20_util/pair/is_assignable.cc	(revision 0)
+++ testsuite/20_util/pair/is_assignable.cc	(revision 0)
@@ -0,0 +1,48 @@
+// { dg-do compile }
+// { dg-options "-std=c++11" }
+
+// Copyright (C) 2013 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <type_traits>
+#include <utility>
+
+struct T1
+{
+};
+
+struct T2
+{
+  T2&
+  operator=(const T1&);
+};
+
+typedef std::pair<int, int>		pt1;
+typedef std::pair<short, short>		pt2;
+typedef std::pair<int, double> pt3;
+typedef std::pair<int, T1> pt4;
+typedef 

Re: Fix std::pair std::is_copy_assignable behavior

2013-04-17 Thread Paolo Carlini

On 4/17/13 8:10 PM, François Dumont wrote:

Hi

Here is another proposal to fix 
std::is_copy_assignable<std::pair>.
Sorry, I'm still missing something very, very basic: which behavior is 
conforming, the current one or what we would get instead? If the former, 
is there a DR arguing for the latter?


Paolo.


Re: [patch] simplify emit_delay_sequence

2013-04-17 Thread Steven Bosscher
On Wed, Apr 17, 2013 at 10:39 AM, Eric Botcazou ebotca...@adacore.com wrote:
 @@ -538,6 +502,8 @@ emit_delay_sequence (rtx insn, rtx list, int lengt
 INSN_LOCATION (seq_insn) = INSN_LOCATION (tem);
INSN_LOCATION (tem) = 0;

 +  /* Remove any REG_DEAD notes because we can't rely on them now
 +that the insn has been moved.  */
for (note = REG_NOTES (tem); note; note = next)
 {
   next = XEXP (note, 1);

 Did you mean to move the comment instead of duplicating it?

No, it's a merge mistake. My copy of this function in sched-dbr
doesn't have the switch (it doesn't delete insns and re-insert them,
it simply moves them, so updating label notes is not necessary) and I
just copied the comment before the loop. Then I merged back some of
the changes to reorg.c and messed up :-)

Thanks for the review. Committed.

Ciao!
Steven


Re: Fix std::pair std::is_copy_assignable behavior

2013-04-17 Thread François Dumont

On 04/17/2013 09:18 PM, Paolo Carlini wrote:

On 4/17/13 8:10 PM, François Dumont wrote:

Hi

Here is another proposal to fix 
std::is_copy_assignable<std::pair>.
Sorry, I'm still missing something very, very basic: which behavior is 
conforming, the current one or what we would get instead? If the 
former, is there a DR arguing for the latter?


Paolo.

The behavior I am targeting is for 
std::is_copy_assignable<std::pair<const int, int>> to be std::false_type, 
for instance. I have added tests for many other use cases. More generally, 
I need that when std::is_copy_assignable<T> is std::true_type, writing 
a = b, with a and b being T, does compile.


Otherwise this patch just makes std::pair match the Standard 
requirements, like 20.3.2.17. Do you want me to add a bug report in 
Bugzilla first?


François



Re: Fix std::pair std::is_copy_assignable behavior

2013-04-17 Thread Paolo Carlini

Hi,

On 4/17/13 8:43 PM, François Dumont wrote:

On 04/17/2013 09:18 PM, Paolo Carlini wrote:

On 4/17/13 8:10 PM, François Dumont wrote:

Hi

Here is another proposal to fix 
std::is_copy_assignable<std::pair>.
Sorry, I'm still missing something very, very basic: which behavior 
is conforming, the current one or what we would get instead? If the 
former, is there a DR arguing for the latter?


Paolo.

The behavior I am targeting is for 
std::is_copy_assignable<std::pair<const int, int>> to be 
std::false_type, for instance. I have added tests for many other use 
cases. More generally, I need that when std::is_copy_assignable<T> is 
std::true_type, writing a = b, with a and b being T, does compile.


Otherwise this patch just makes std::pair match the Standard 
requirements, like 20.3.2.17. Do you want me to add a bug report in 
Bugzilla first?
I'm not talking about GCC's Bugzilla, I meant an ISO DR: if we don't 
have a DR and preferably at least vague support from LWG people, I think 
we should not change the behavior of our std::is_copy_assignable only 
because it helps our implementation of other facilities.


Paolo.


Re: [patch] PR56729

2013-04-17 Thread Steven Bosscher
 On Fri, Mar 29, 2013 at 1:05 PM, Steven Bosscher wrote:
 It looks like there are places in the middle end that use remove_insn
 on insns that are not actually emitted. This breaks the assert I added
 in df_insn_delete. The patch disables the assert for now. The comment
 before the assert is now even messier than before but I think it's
 better to explain why the assert cannot work than to remove the
 comment and the assert altogether.

This is no longer necessary, now that remove_insn doesn't use
df_insn_delete. Only a small patch to lower-subreg.c is needed to
restore the check.

Bootstrapped & tested (unix{,-m32}) on powerpc64-unknown-linux-gnu.
OK for trunk?

Ciao!
Steven


* lower-subreg.c (resolve_simple_move): If called self-recursive,
do not delete_insn insns that have not yet been emitted, only
unlink them with remove_insn.
* df-scan.c (df_insn_delete): Revert r197492.

Index: lower-subreg.c
===
--- lower-subreg.c  (revision 198002)
+++ lower-subreg.c  (working copy)
@@ -1069,7 +1069,13 @@ resolve_simple_move (rtx set, rtx insn)

   emit_insn_before (insns, insn);

-  delete_insn (insn);
+  /* If we get here via self-recursion, then INSN is not yet in the insns
+ chain and delete_insn will fail.  We only want to remove INSN from the
+ current sequence.  See PR56738.  */
+  if (in_sequence_p ())
+remove_insn (insn);
+  else
+delete_insn (insn);

   return insns;
 }
Index: df-scan.c
===
--- df-scan.c   (revision 198002)
+++ df-scan.c   (working copy)
@@ -1158,17 +1158,7 @@ df_insn_delete (rtx insn)
  In any case, we expect BB to be non-NULL at least up to register
  allocation, so disallow a non-NULL BB up to there.  Not perfect
  but better than nothing...  */
-  /* ??? bb can also be NULL if lower-subreg.c:resolve_simple_mov emits
- an insn into a sequence and then does delete_insn on it.  Not sure
- if that makes sense, but for now it means this assert cannot work.
- See PR56738.
- Disable for now but revisit before the end of GCC 4.9 stage1.  */
-#if 0
   gcc_checking_assert (bb != NULL || reload_completed);
-#else
-  if (bb == NULL)
-return;
-#endif

   df_grow_bb_info (df_scan);
   df_grow_reg_info ();


Re: [patch] PR56729

2013-04-17 Thread Jeff Law

On 04/17/2013 02:12 PM, Steven Bosscher wrote:

On Fri, Mar 29, 2013 at 1:05 PM, Steven Bosscher wrote:

It looks like there are places in the middle end that use remove_insn
on insns that are not actually emitted. This breaks the assert I added
in df_insn_delete. The patch disables the assert for now. The comment
before the assert is now even messier than before but I think it's
better to explain why the assert cannot work than to remove the
comment and the assert altogether.


This is no longer necessary, now that remove_insn doesn't use
df_insn_delete. Only a small patch to lower-subreg.c is needed to
restore the check.

Bootstrapped & tested (unix{,-m32}) on powerpc64-unknown-linux-gnu.
OK for trunk?

Ciao!
Steven


 * lower-subreg.c (resolve_simple_move): If called self-recursive,
 do not delete_insn insns that have not yet been emitted, only
 unlink them with remove_insn.
 * df-scan.c (df_insn_delete): Revert r197492.

OK.
Jeff



Re: Fix std::pair std::is_copy_assignable behavior

2013-04-17 Thread François Dumont

On 04/17/2013 10:02 PM, Paolo Carlini wrote:

Hi,

On 4/17/13 8:43 PM, François Dumont wrote:

On 04/17/2013 09:18 PM, Paolo Carlini wrote:

On 4/17/13 8:10 PM, François Dumont wrote:

Hi

Here is another proposal to fix 
std::is_copy_assignable<std::pair>.
Sorry, I'm still missing something very, very basic: which behavior 
is conforming, the current one or what we would get instead? If the 
former, is there a DR arguing for the latter?


Paolo.

The behavior I am targeting is for 
std::is_copy_assignable<std::pair<const int, int>> to be 
std::false_type, for instance. I have added tests for many other use 
cases. More generally, I need that when std::is_copy_assignable<T> is 
std::true_type, writing a = b, with a and b being T, does compile.


Otherwise this patch just makes std::pair match the Standard 
requirements, like 20.3.2.17. Do you want me to add a bug report in 
Bugzilla first?
I'm not talking about GCC's Bugzilla, I meant an ISO DR: if we don't 
have a DR and preferably at least vague support from LWG people, I think 
we should not change the behavior of our std::is_copy_assignable only 
because it helps our implementation of other facilities.


Paolo.


I really don't get it. Is the current behavior really Standard-compliant?

I don't think so, and I would like to fix it. The Standard says that 
pair& operator=(const pair&) requires both 
is_copy_assignable<first_type> and is_copy_assignable<second_type> to be 
true. With std::pair<const int, int>, is_copy_assignable<const int> is 
false and so the operator is ill-formed. You get an error if you try to 
call it, but if you check is_copy_assignable<pair<const int, int>> it says 
true. Do you see that as correct behavior? Do you really think that it 
requires an ISO DR?


François



Re: [PATCH, tree-ssa] Avoid -Wuninitialized warning in try_unroll_loop_completely()

2013-04-17 Thread Jeff Law

On 04/15/2013 08:35 PM, Chung-Ju Wu wrote:

You are right.
After doing survey on http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings
and reading related discussion thread, I realized it is a
complicated detection and this is a false positive case.

I was using gcc-4.6.3, which is provided by Ubuntu 12.04,
and the warning is displayed during the compilation process.
As I tried to build another native gcc by myself with
current main trunk and used it to compile tree-ssa-loop-ivcanon.c again,
there is no such warning at all.
(See attachment for my console output.)

So I am wondering if my patch is still valuable, since
such a false-positive warning is already fixed on trunk.
Thanks for checking on this stuff.  My preference would be to not add 
the initialization unless we're currently seeing false positive warnings 
with the trunk.


Anytime we add a dummy initialization like this to avoid a false 
positive, we run the risk of missing real bugs later if the nearby code 
is changed in such a way as to expose an uninitialized use, which we'd 
like to catch, but can't if we've inserted a dummy initialization.


Jeff



Re: [Patch, Fortran] PR39505 - add support for !GCC$ attributes NO_ARG_CHECK

2013-04-17 Thread Thomas Koenig

Hi Tobias,


Is my -f(no-)directives patch okay? Or do you envision something else?


In principle, it is OK; the only question is what the default should be :-)

For OpenMP, we require an option to change the semantics of a program
based on special comments.  Currently, we do not do so for directives
which do the same thing.  So, what should we do?

Does anybody else have an opinion here?  I'm willing to go with the
majority here.

Thomas



Re: [Patch, Fortran] PR39505 - add support for !GCC$ attributes NO_ARG_CHECK

2013-04-17 Thread Tobias Burnus

Thomas Koenig wrote:

Is my -f(no-)directives patch okay? Or do you envision something else?


In principle, it is OK; the only question is what the default should 
be :-)


I am in favor of on.


For OpenMP, we require an option to change the semantics of a program
based on special comments.  Currently, we do not do so for directives
which do the same thing.


Well, I see a difference here: (nearly) all programs using OpenMP also 
work as serial programs (-fno-openmp). On the other hand, without the !GCC$ 
attributes directives (or the C equivalent, __attribute__((...))), 
the program does not work properly. The attributes are used to express 
features which are not available in the standard but are still 
in some way required: stdcall, fastcall, dllimport, dllexport, unused 
arguments, weak bindings (not yet for Fortran), disabling argument 
checking (only Fortran), etc.


Tobias


Re: [patch] fix verify_rtl_sharing handling of SEQUENCEs

2013-04-17 Thread Steven Bosscher
On Tue, Apr 16, 2013 at 6:39 PM, Steven Bosscher wrote:
 My new delay branch scheduler uses TODO_verify_rtl_sharing but it
 turns out that verify_rtl_sharing doesn't handle SEQUENCEs correctly:
 Clearing the used-flags is done correctly, but verifying insns in the
 SEQUENCE fails. The problem is that every insn in the SEQUENCE is
 marked used via PATTERN(SEQUENCE) and also via PREV_INSN/NEXT_INSN of
 the insns in the SEQUENCE.

 Fixed with the attached patch. Bootstrapped & tested on
 sparc64-unknown-linux-gnu, and cross-built & tested on mips-elf, both
 with TODO_verify_rtl_sharing added to the passes in reorg.c.
 Will commit as obvious.

 Andreas Krebbel's patch
 (http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00714.html) makes the
 problem go away for me, so I'm not going to commit this patch after
 all.

Hmm, no it hasn't.

What happens is this:

reset_all_used_flags resets the used flags via mark_used_flags,
which doesn't mark or unmark insns:

case DEBUG_INSN:
case INSN:
case JUMP_INSN:
case CALL_INSN:
case NOTE:
case LABEL_REF:
case BARRIER:
  /* The chain of insns is not being copied.  */
  return;

But verify_rtx_sharing sets the used flag on insns if they are
reached via a SEQUENCE. So the first verify_rtl_sharing call with
SEQUENCEs in the insn chain passes, but a second call will fail
because the used flags on insns in the SEQUENCE aren't cleared.

So, updated patch attached, will commit after testing on sparc64-linux.

Ciao!
Steven
* emit-rtl.c (reset_insn_used_flags): New function.
(reset_all_used_flags): Use it.
(verify_insn_sharing): New function.
(verify_rtl_sharing): Fix verification for SEQUENCEs.

Index: emit-rtl.c
===
--- emit-rtl.c  (revision 198036)
+++ emit-rtl.c  (working copy)
@@ -2596,6 +2596,18 @@ verify_rtx_sharing (rtx orig, rtx insn)
   return;
 }
 
+/* Reset used-flags for INSN.  */
+
+static void
+reset_insn_used_flags (rtx insn)
+{
+  gcc_assert (INSN_P (insn));
+  reset_used_flags (PATTERN (insn));
+  reset_used_flags (REG_NOTES (insn));
+  if (CALL_P (insn))
+reset_used_flags (CALL_INSN_FUNCTION_USAGE (insn));
+}
+
 /* Go through all the RTL insn bodies and clear all the USED bits.  */
 
 static void
@@ -2606,28 +2618,30 @@ reset_all_used_flags (void)
   for (p = get_insns (); p; p = NEXT_INSN (p))
 if (INSN_P (p))
   {
-   reset_used_flags (PATTERN (p));
-   reset_used_flags (REG_NOTES (p));
-   if (CALL_P (p))
- reset_used_flags (CALL_INSN_FUNCTION_USAGE (p));
-   if (GET_CODE (PATTERN (p)) == SEQUENCE)
- {
-   int i;
-   rtx q, sequence = PATTERN (p);
-
-   for (i = 0; i < XVECLEN (sequence, 0); i++)
- {
-   q = XVECEXP (sequence, 0, i);
-   gcc_assert (INSN_P (q));
-   reset_used_flags (PATTERN (q));
-   reset_used_flags (REG_NOTES (q));
-   if (CALL_P (q))
- reset_used_flags (CALL_INSN_FUNCTION_USAGE (q));
- }
- }
+   rtx pat = PATTERN (p);
+   if (GET_CODE (pat) != SEQUENCE)
+ reset_insn_used_flags (p);
+   else
+ {
+   gcc_assert (REG_NOTES (p) == NULL);
+   for (int i = 0; i < XVECLEN (pat, 0); i++)
+ reset_insn_used_flags (XVECEXP (pat, 0, i));
+ }
   }
 }
 
+/* Verify sharing in INSN.  */
+
+static void
+verify_insn_sharing (rtx insn)
+{
+  gcc_assert (INSN_P (insn));
+  reset_used_flags (PATTERN (insn));
+  reset_used_flags (REG_NOTES (insn));
+  if (CALL_P (insn))
+reset_used_flags (CALL_INSN_FUNCTION_USAGE (insn));
+}
+
 /* Go through all the RTL insn bodies and check that there is no unexpected
sharing in between the subexpressions.  */
 
@@ -2643,10 +2657,12 @@ verify_rtl_sharing (void)
   for (p = get_insns (); p; p = NEXT_INSN (p))
 if (INSN_P (p))
   {
-   verify_rtx_sharing (PATTERN (p), p);
-   verify_rtx_sharing (REG_NOTES (p), p);
-   if (CALL_P (p))
- verify_rtx_sharing (CALL_INSN_FUNCTION_USAGE (p), p);
+   rtx pat = PATTERN (p);
+   if (GET_CODE (pat) != SEQUENCE)
+ verify_insn_sharing (p);
+   else
+ for (i = 0; i < XVECLEN (pat, 0); i++)
+   verify_insn_sharing (XVECEXP (pat, 0, i));
   }
 
   reset_all_used_flags ();


Re: GCC does not support *mmintrin.h with function specific opts

2013-04-17 Thread Sriraman Tallam
+HJ

On Tue, Apr 16, 2013 at 1:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

 I have attached an updated patch that  addresses all the comments raised.

 On Fri, Apr 12, 2013 at 1:58 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, Apr 11, 2013 at 12:05:41PM -0700, Sriraman Tallam wrote:
 I have attached a patch that fixes this. I have added an option
 -mgenerate-builtins that will do two things.  It will define a macro
 __ALL_ISA__ which will expose the *intrin.h functions. It will also
 expose all the target specific builtins.  -mgenerate-builtins will not
 affect code generation.

 1) this shouldn't be an option, either it can be made to work reliably,
then it should be done always, or it can't, then it shouldn't be done

 Ok, it is on by default now.  There is a way to turn it off, with
 -mno-generate-builtins.

 2) have you verified that if you always generate all builtins, that the
builtins not supported by the ISA selected from the command line are
created with the right vector modes?

 This issue does not arise.  When the target builtin is expanded, it is
 checked if the ISA support is there, either via function specific
 target opts or global target opts. If not, an error is raised. Test
 case added for this, please see intrinsic_4.c in patch.

 3) the *intrin.h headers in the case where the guarding macro isn't defined
should be surrounded by something like
#ifndef __FMA4__
#pragma GCC push options
#pragma GCC target(fma4)
#endif
...
#ifndef __FMA4__
#pragma GCC pop options
#endif
so that everything that is in the headers is compiled with the ISA
in question

 I do not think this should be done because it will break the inlining
 ability of the header function and cause issues if the caller does not
 specify the required ISA. The fact that the header functions are
 marked extern __inline, with gnu_inline guarantees that a body will
 not be generated and they will be inlined.  If the caller does not
 have the required ISA, appropriate errors will be raised. Test cases
 added, see intrinsics_1.c, intrinsics_2.c

 4) what happens if you use the various vector types typedefed in the
*intrin.h headers in code that doesn't support those ISAs?  As TYPE_MODE
for VECTOR_TYPE is a function call, perhaps it will just be handled as
generic BLKmode vectors, which is desirable I think

 I checked some tests here.  With -mno-sse for instance, vector types
 are not permitted in function arguments and return values and gcc
 raises a warning/error in each case.  With return values, gcc always
 gives an error if an SSE register is required in a return value.  I
 even fixed this message to not emit it for functions marked as extern
 inline with the gnu_inline keyword, as a body for them will not be
 generated.


 5) what happens if you use a target builtin in a function not supporting
the corresponding ISA, do you get proper error explaining what you are
doing wrong?

 Yes, please see the intrinsic_4.c test in the patch.

 6) what happens if you use some intrinsics in a function not supporting
the corresponding ISA?  Dunno if the inliner chooses not to inline it
and error out because it is always_inline, or what exactly will happen
then

 Same deal here. The intrinsic function is guaranteed to be inlined
 into the caller, leaving a corresponding builtin call. That
 builtin call will trigger an error if the ISA is not supported.

 Thanks
 Sri


 For all this you certainly need testcases.

 Jakub


[PATCH ARM]Extend thumb1_reorg to save more comparison instructions

2013-04-17 Thread Bin Cheng
Hi,
Before thumb1_reorg, the ARM backend used a peephole to save comparison
instructions when a flag-setting move was found immediately before a branch
instruction. Since we are using thumb1_reorg now, it can be extended to
catch more opportunities by searching for a flag-setting move instruction
anywhere before the branch, rather than only the one directly before it.
For example:

mov r0, r1
//other insns does not kill r0
branch if (r0 == 0)
//other insns

Tested on thumb1, is it OK?


2013-04-18  Bin Cheng  bin.ch...@arm.com

* config/arm/arm.c (thumb1_reorg): Search for flag setting insn
before branch in same basic block.  Check both src and dest of
the move insn.
Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c(revision 197562)
+++ gcc/config/arm/arm.c(working copy)
@@ -14026,6 +14026,7 @@ thumb1_reorg (void)
   rtx set, dest, src;
   rtx pat, op0;
   rtx prev, insn = BB_END (bb);
+  bool insn_clobbered = false;
 
   while (insn != BB_HEAD (bb) && DEBUG_INSN_P (insn))
insn = PREV_INSN (insn);
@@ -14034,12 +14035,29 @@ thumb1_reorg (void)
   if (INSN_CODE (insn) != CODE_FOR_cbranchsi4_insn)
continue;
 
-  /* Find the first non-note insn before INSN in basic block BB.  */
+  /* Get the register with which we are comparing.  */
+  pat = PATTERN (insn);
+  op0 = XEXP (XEXP (SET_SRC (pat), 0), 0);
+
+  /* Find the first flag setting insn before INSN in basic block BB.  */
   gcc_assert (insn != BB_HEAD (bb));
-  prev = PREV_INSN (insn);
-  while (prev != BB_HEAD (bb) && (NOTE_P (prev) || DEBUG_INSN_P (prev)))
-   prev = PREV_INSN (prev);
+  for (prev = PREV_INSN (insn);
+       (!insn_clobbered
+        && prev != BB_HEAD (bb)
+        && (NOTE_P (prev)
+            || DEBUG_INSN_P (prev)
+            || (GET_CODE (prev) == SET
+                && get_attr_conds (prev) == CONDS_NOCOND)));
+       prev = PREV_INSN (prev))
+    {
+      if (reg_set_p (op0, prev))
+        insn_clobbered = true;
+    }
 
+  /* Skip if op0 is clobbered by insn other than prev. */
+  if (insn_clobbered)
+   continue;
+
   set = single_set (prev);
   if (!set)
continue;
@@ -14050,12 +14068,9 @@ thumb1_reorg (void)
  || !low_register_operand (src, SImode))
continue;
 
-  pat = PATTERN (insn);
-  op0 = XEXP (XEXP (SET_SRC (pat), 0), 0);
   /* Rewrite move into subtract of 0 if its operand is compared with ZERO
-in INSN. Don't need to check dest since cprop_hardreg pass propagates
-src into INSN.  */
-  if (REGNO (op0) == REGNO (src))
+in INSN.  Both src and dest of the move insn are checked.  */
+  if (REGNO (op0) == REGNO (src) || REGNO (op0) == REGNO (dest))
{
  dest = copy_rtx (dest);
  src = copy_rtx (src);