[PATCH ARM/Embedded-4_8-branch] Disable rtl loop invariant when optimizing for size

2013-08-28 Thread bin.cheng
Hi,

The attached patch disables the RTL loop-invariant motion pass when optimizing
for code size.  Committed to ARM/Embedded-4_8-branch as r202067.

Thanks.
bin

2013-08-29  Zhenqiang Chen  

* config/arm/arm.c (arm_option_override): Disable loop2_invariant
pass when optimize_size and ira-loop-pressure is not enabled.

Index: gcc/ChangeLog.arm
===================================================================
--- gcc/ChangeLog.arm   (revision 202066)
+++ gcc/ChangeLog.arm   (revision 202067)
@@ -1,3 +1,8 @@
+2013-08-29  Zhenqiang Chen  
+
+   * config/arm/arm.c (arm_option_override): Disable loop2_invariant
+   pass when optimize_size and ira-loop-pressure is not enabled.
+
 2013-08-05  Terry Guo  
 
Backport from mainline r197956
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 202066)
+++ gcc/config/arm/arm.c	(revision 202067)
@@ -2134,6 +2134,13 @@
  global_options.x_param_values,
  global_options_set.x_param_values);
 
+  /* Do not move invariants out of loops since it tends to increase register
+ pressure.  The heuristic to estimate register pressure does not fit for
+ ARM.  -fira-loop-pressure tends to get more precise estimation.  But it
+ still need more tuning.  */
+  if (optimize_function_for_size_p (cfun) && !flag_ira_loop_pressure)
+flag_move_loop_invariants = 0;
+
   /* Register global variables with the garbage collector.  */
   arm_add_gc_roots ();
 }
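For illustration, here is a sketch (not from the patch) of the kind of loop
where hoisting pays off for speed but can hurt at -Os: the invariant product
gets a register that stays live across the whole loop, and on a
register-starved ARM core the resulting spill/reload code can outweigh the
cost of just recomputing it.

void
f (int *out, int a, int b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    /* a * b is loop-invariant; loop2_invariant would hoist it into a
       register that is then live for the entire loop.  */
    out[i] = a * b + i;
}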


Re: wide-int branch now up for public comment and review

2013-08-28 Thread Mike Stump
On Aug 28, 2013, at 1:41 PM, Kenneth Zadeck  wrote:
> On 08/28/2013 12:45 PM, Mike Stump wrote:
>> 
>> tree t;
>> wide_int w = t;
>> 
>> wide_int_to_tree needs an additional type, so, the spelling is not as short 
>> out of necessity.

> I made wide_int_to_tree a function that lives in tree.[ch], not a member
> function of wide-int.  This seemed to be consistent with the way other
> things were done.  If you want it to be a member function, that is
> certainly doable.

EOUTOFDATE:

   There are constructors to create the various forms of wide-int from
   trees, rtl and constants.  For trees and constants, you can simply say:

 tree t = ...;
 wide_int x = t;
 wide_int y = 6;
public:
  wide_int_ro ();
  wide_int_ro (const_tree);

:-)
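Putting the pieces together, the round trip on the branch looks roughly like
this (a sketch of GCC-internal API, with wide_int_to_tree as the free
function in tree.[ch] mentioned above):

tree
add_six (tree t)
{
  wide_int w = t;       /* converting constructor from const_tree */
  w = w + 6;            /* the bare constant extends to w's precision */
  return wide_int_to_tree (TREE_TYPE (t), w);   /* type supplied back */
}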

Go patch committed: Set TREE_PUBLIC reliably

2013-08-28 Thread Ian Lance Taylor
Uros tracked down a problem with the section used for immutable structs:
the value of compute_reloc_for_constant would change between the time
immutable_struct_set_init would call it and the time that
get_variable_section would call it, for the same decl and the same decl
initializer.  He identified the problem: TREE_PUBLIC was set in the
latter case but not the former.  This patch fixes the problem.
Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu.
Committed to mainline and 4.8 branch.

Ian


2013-08-28  Ian Lance Taylor  

* go-gcc.cc (Gcc_backend::immutable_struct): Set TREE_PUBLIC if
the struct is not hidden.
(Gcc_backend::immutable_struct_set_init): Don't set TREE_PUBLIC.


Index: go-gcc.cc
===================================================================
--- go-gcc.cc	(revision 201932)
+++ go-gcc.cc	(working copy)
@@ -1475,8 +1475,8 @@ Gcc_backend::temporary_variable(Bfunctio
 // Create a named immutable initialized data structure.
 
 Bvariable*
-Gcc_backend::immutable_struct(const std::string& name, bool, bool,
-			  Btype* btype, Location location)
+Gcc_backend::immutable_struct(const std::string& name, bool is_hidden,
+			  bool, Btype* btype, Location location)
 {
   tree type_tree = btype->get_tree();
   if (type_tree == error_mark_node)
@@ -1490,6 +1490,8 @@ Gcc_backend::immutable_struct(const std:
   TREE_CONSTANT(decl) = 1;
   TREE_USED(decl) = 1;
   DECL_ARTIFICIAL(decl) = 1;
+  if (!is_hidden)
+TREE_PUBLIC(decl) = 1;
 
   // We don't call rest_of_decl_compilation until we have the
   // initializer.
@@ -1503,8 +1505,7 @@ Gcc_backend::immutable_struct(const std:
 
 void
 Gcc_backend::immutable_struct_set_init(Bvariable* var, const std::string&,
-   bool is_hidden, bool is_common, Btype*,
-   Location,
+   bool, bool is_common, Btype*, Location,
    Bexpression* initializer)
 {
   tree decl = var->get_tree();
@@ -1515,12 +1516,7 @@ Gcc_backend::immutable_struct_set_init(B
   DECL_INITIAL(decl) = init_tree;
 
   // We can't call make_decl_one_only until we set DECL_INITIAL.
-  if (!is_common)
-{
-  if (!is_hidden)
-	TREE_PUBLIC(decl) = 1;
-}
-  else
+  if (is_common)
 make_decl_one_only(decl, DECL_ASSEMBLER_NAME(decl));
 
   // These variables are often unneeded in the final program, so put


Re: [PATCH]: Fix PR middle-end/56382 -- Only move MODE_COMPLEX_FLOAT by parts if we can create pseudos

2013-08-28 Thread John David Anglin

On 24-Aug-13, at 6:43 AM, Steven Bosscher wrote:


On Fri, Aug 23, 2013 at 2:47 AM, John David Anglin wrote:

Ping.


On 28-Jul-13, at 12:17 PM, John David Anglin wrote:

This patch fixes PR middle-end/56382 on hppa64-hp-hpux11.11.  The patch
prevents moving a complex float by parts if we can't create pseudos.  On a
big endian 64-bit target, we need a pseudo to move a complex float and this
fails during reload.

OK for trunk?



I'm trying to understand how the patch would help...

The code you're patching is:

 /* Move floating point as parts.  */
 if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
+&& can_create_pseudo_p ()
 && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing)
   try_int = false;
 /* Not possible if the values are inherently not adjacent.  */
 else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT)
   try_int = false;
 /* Is possible if both are registers (or subregs of registers).  */
 else if (register_operand (x, mode) && register_operand (y, mode))
   try_int = true;
 /* If one of the operands is a memory, and alignment constraints
    are friendly enough, we may be able to do combined memory operations.
    We do not attempt this if Y is a constant because that combination is
    usually better with the by-parts thing below.  */
 else if ((MEM_P (x) ? !CONSTANT_P (y) : MEM_P (y))
  && (!STRICT_ALIGNMENT
  || get_mode_alignment (mode) == BIGGEST_ALIGNMENT))
   try_int = true;
 else
   try_int = false;

With the new test for can_create_pseudo_p, you're trying to make
"try_int" be false. Apparently your failure happens if one of the
operands is a MEM? Otherwise the second "else if" test would find x
and y to be registers and "try_int" still ends up being true.


With the patch, I'm trying to prevent "try_int" from being set directly to
false when the mode class is MODE_COMPLEX_FLOAT.  The 64-bit PA target has
move instructions to handle the inner mode.  So, "optab_handler (mov_optab,
GET_MODE_INNER (mode)) != CODE_FOR_nothing" is always true.



It seems to me that can_create_pseudo_p is not the right test anyway.
There may be other targets that can take this path just fine without
needing new registers. In the PR audit trail you say: "The problem is
SCmode is the same size as DImode on this target, so the subreg can't
be extracted by a move." Using can_create_pseudo_p is too big a hammer
to solve this problem. The right test would be to see if you end up
needing extra registers to perform the move. But emit_move_change_mode
already handles that, AFAICT, so can you please try and test if the
following patch solves the PR for you?


As expected, your patch doesn't fix the PR.

The bug lies in using "emit_move_complex_parts" when we can't create
pseudos.  There are several places in the functions that it calls where
"gen_reg_rtx" can be called (e.g., "store_bit_field_using_insv").  In
debugging, I found that fixing these didn't help.  In the end, it fails to
find a way to move the complex parts.

In gcc.c-torture/compile/pr55921.c, we have two register operands so
try_int can be true.  "emit_move_via_integer" is successful in this case.
Your patch might be correct.

I'm not sure that can_create_pseudo_p is that big a hammer, as it appears
emit_move_complex is mainly called prior to reload.  The proposed change
only affects the code when we can't create pseudos.  Even then, we fall
back to emit_move_complex_parts.

As you say, some other check might be more appropriate to determine whether
a call to gen_reg_rtx might be needed in emit_move_complex_parts.  For the
PA, it would be something like "GET_MODE_BITSIZE (mode) > BITS_PER_WORD".
But we still need to check can_create_pseudo_p, as we probably still want
to use emit_move_complex_parts before reload.
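Putting both observations together, the combined guard might look something
like this (a sketch only, with the word-size test suggested above for the
PA; not a tested patch):

  /* Move floating point as parts, but only when no fresh pseudo could be
     required: either we may still create one, or the value is small
     enough that the by-parts move needs no extra register.  */
  if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
      && (can_create_pseudo_p ()
	  || GET_MODE_BITSIZE (mode) <= BITS_PER_WORD)
      && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing)
    try_int = false;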

Dave
--
John David Anglin   dave.ang...@bell.net





Re: wide-int branch now up for public comment and review

2013-08-28 Thread Kenneth Zadeck



   Note that the bits above the precision are not defined and the
   algorithms used here are careful not to depend on their value.  In
   particular, values that come in from rtx constants may have random
   bits.

Which is a red herring.  It should be fixed.  I cannot even believe
that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH
or INTVAL/UINTVAL.  I don't see accesses masking out 'undefined' bits
anywhere.

I can agree with you that this could be fixed, but it is not necessary to
fix it.  The rtl level and most of the tree level have existed for a long
time by doing math within the precision.

You do not see the masking at the rtl level because the masking is not
necessary: if you never look beyond the precision, you just do not care.
There is the issue with divide and mod, and quite frankly the code on the
trunk scares me to death.  My code at the rtl level makes sure that
everything is clean, since seeing whether it is a udiv or a div is enough
information to clean the value up.



I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't
have random bits.  The upper bits must be a sign extension of the value.
There's exactly one valid rtx for each (value, mode) pair.  If you saw
something different then that sounds like a bug.  The rules should already
be fairly well enforced though, since something like (const_int 128) --
or (const_int 256) -- will not match a QImode operand.

See.  We're saved ;)
This is Richard's theory.  There is a block of code at wide-int.cc:175,
ifdefed out, that checks whether the rtl consts are canonical; if you bring
it in, building the libraries fails very quickly.  The million dollar
question is whether this is the only bug or the first of a thousand.  Your
comment about not seeing any masking cuts both ways: there is very little
code on the way in, if you create ints with GEN_INT, that makes sure
someone has done the right thing on that side.  So my guess is that there
are a lot of failures, and a large number of them will be in the ports.


If Richard is right then there will be changes: the internals of wide-int
will be changed so that everything is maintained in canonical form rather
than just doing the canonicalization on the outside.  This will clean up
things like fits_uhwi_p and a lot more of the wide-int internals.


But I think that if you want to change the compiler to use infinite
precision arithmetic, you really ought to have a better reason than that
you think it is cleaner.  It buys you nothing for most of the compiler, it
is slower, AND it is a much bigger change to the compiler than the one we
want to make.




This is probably the part of the representation that I disagree most with.
There seem to be two main ways we could handle the extension to whole HWIs:

(1) leave the stored upper bits undefined and extend them on read
(2) keep the stored upper bits in extended form

The patch goes for (1) but (2) seems better to me, for a few reasons:

I agree whole-heartedly; my statement is above.  If we can do (2) then we
should, but I do not think that it is worth several people-years to do
this.



* As above, constants coming from rtl are already in the right form,
   so if you create a wide_int from an rtx and only query it, no explicit
   extension is needed.

* Things like logical operations and right shifts naturally preserve
   the sign-extended form, so only a subset of write operations need
   to take special measures.

* You have a public interface that exposes the underlying HWIs
   (which is fine with me FWIW), so it seems better to expose a fully-defined
   HWI rather than only a partially-defined HWI.

E.g. zero_p is:

   HOST_WIDE_INT x;

   if (precision && precision < HOST_BITS_PER_WIDE_INT)
 x = sext_hwi (val[0], precision);
   else if (len == 0)
 {
   gcc_assert (precision == 0);
   return true;
 }
   else
 x = val[0];

   return len == 1 && x == 0;

but I think it really ought to be just:

   return len == 1 && val[0] == 0;

Yes!

But then - what value does keeping track of a 'precision' have
in this case?  It seems to me it's only a "convenient carrier"
for

   wide_int x = wide-int-from-RTX (y);
   machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode;
   ... process x ...
   RTX = RTX-from-wide_int (x, saved_mode);

that is, wide-int doesn't do anything with 'precision' but you
can extract it later to not need to remember a mode you were
interested in?

Oh, and of course some operations require a 'precision', like rotate.
As I have said, when the operands and result are all precision correct, as
they are generally in both rtl and tree, the default fixed precision
interface is really very clean.  I know you object to some implementation
issues, but this is why we put the branch up, so you can see what you get
from the other side.
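For reference, the canonicalization that option (2) relies on is ordinary
sign extension of the low PREC bits of the top element; a portable sketch
(the xor/subtract trick avoids implementation-defined shifts of negative
values):

static inline HOST_WIDE_INT
sext_hwi (HOST_WIDE_INT src, unsigned int prec)
{
  if (prec == HOST_BITS_PER_WIDE_INT)
    return src;
  else
    {
      unsigned HOST_WIDE_INT sign_bit
	= (unsigned HOST_WIDE_INT) 1 << (prec - 1);
      unsigned HOST_WIDE_INT u
	= (unsigned HOST_WIDE_INT) src & ((sign_bit << 1) - 1);
      /* Flip the sign bit, then subtract it back out: every bit above
	 PREC - 1 becomes a copy of bit PREC - 1.  */
      return (HOST_WIDE_INT) ((u ^ sign_bit) - sign_bit);
    }
}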




   When the precision is 0, all the bits in the LEN elements of
   VE

Re: [Patch, fortran] PR 52243 - avoid reallocation checks on assignment when possible

2013-08-28 Thread Tobias Burnus

Dear Thomas, dear all,

Thomas Koenig wrote:

the attached patch avoids checks for reallocation on assignment when
the same variable is on both sides of the assignment.

Regression-tested.  OK for trunk?


OK and thanks for the patch.  (Can you remove the trailing spaces in the
line after the first "return false;" before committal?)


Tobias

PS: I should have (again) more time for gfortran in the near future.


2013-08-29  Thomas Koenig  

 PR fortran/52243
 * trans-expr.c (is_runtime_conformable): New function.
 (gfc_trans_assignment_1): Use it.

2013-08-29  Thomas Koenig  

 PR fortran/52243
 * gfortran.dg/realloc_on_assign_14.f90:  Remove warning made
 obsolete by patch.
 * gfortran.dg/realloc_on_assign_19.f90:  New test.




[Patch, fortran] PR 52243 - avoid reallocation checks on assignment when possible

2013-08-28 Thread Thomas Koenig
Hello world,

the attached patch avoids checks for reallocation on assignment when
the same variable is on both sides of the assignment.

Regression-tested.  OK for trunk?

Thomas

2013-08-29  Thomas Koenig  

PR fortran/52243
* trans-expr.c (is_runtime_conformable): New function.
(gfc_trans_assignment_1): Use it.

2013-08-29  Thomas Koenig  

PR fortran/52243
* gfortran.dg/realloc_on_assign_14.f90:  Remove warning made
obsolete by patch.
* gfortran.dg/realloc_on_assign_19.f90:  New test.
Index: fortran/trans-expr.c
===================================================================
--- fortran/trans-expr.c	(Revision 201996)
+++ fortran/trans-expr.c	(Arbeitskopie)
@@ -7738,7 +7738,103 @@ alloc_scalar_allocatable_for_assignment (stmtblock
 }
 }
 
+/* Check for assignments of the type
 
+   a = a + 4
+
+   to make sure we do not check for reallocation unnecessarily.  */
+
+
+static bool
+is_runtime_conformable (gfc_expr *expr1, gfc_expr *expr2)
+{
+  gfc_actual_arglist *a;
+  gfc_expr *e1, *e2;
+
+  switch (expr2->expr_type)
+{
+case EXPR_VARIABLE:
+  return gfc_dep_compare_expr (expr1, expr2) == 0;
+
+case EXPR_FUNCTION:
+  if (expr2->value.function.esym
+	  && expr2->value.function.esym->attr.elemental)
+	{
+	  for (a = expr2->value.function.actual; a != NULL; a = a->next)
+	{
+	  e1 = a->expr;
+	  if (e1->rank > 0 && !is_runtime_conformable (expr1, e1))
+		return false;
+	}	 
+	  return true;
+	}
+  else if (expr2->value.function.isym
+	   && expr2->value.function.isym->elemental)
+	{
+	  for (a = expr2->value.function.actual; a != NULL; a = a->next)
+	{
+	  e1 = a->expr;
+	  if (e1->rank > 0 && !is_runtime_conformable (expr1, e1))
+		return false;
+	}
+	  return true;
+	}
+
+  break;
+
+case EXPR_OP:
+  switch (expr2->value.op.op)
+	{
+	case INTRINSIC_NOT:
+	case INTRINSIC_UPLUS:
+	case INTRINSIC_UMINUS:
+	case INTRINSIC_PARENTHESES:
+	  return is_runtime_conformable (expr1, expr2->value.op.op1);
+
+	case INTRINSIC_PLUS:
+	case INTRINSIC_MINUS:
+	case INTRINSIC_TIMES:
+	case INTRINSIC_DIVIDE:
+	case INTRINSIC_POWER:
+	case INTRINSIC_AND:
+	case INTRINSIC_OR:
+	case INTRINSIC_EQV:
+	case INTRINSIC_NEQV:
+	case INTRINSIC_EQ:
+	case INTRINSIC_NE:
+	case INTRINSIC_GT:
+	case INTRINSIC_GE:
+	case INTRINSIC_LT:
+	case INTRINSIC_LE:
+	case INTRINSIC_EQ_OS:
+	case INTRINSIC_NE_OS:
+	case INTRINSIC_GT_OS:
+	case INTRINSIC_GE_OS:
+	case INTRINSIC_LT_OS:
+	case INTRINSIC_LE_OS:
+
+	  e1 = expr2->value.op.op1;
+	  e2 = expr2->value.op.op2;
+
+	  if (e1->rank == 0 && e2->rank > 0)
+	return is_runtime_conformable (expr1, e2);
+	  else if (e1->rank > 0 && e2->rank == 0)
+	return is_runtime_conformable (expr1, e1);
+	  else if (e1->rank > 0 && e2->rank > 0)
+	return is_runtime_conformable (expr1, e1)
+	  && is_runtime_conformable (expr1, e2);
+	  break;
+
+	default:
+	  break;
+
+	}
+default:
+  break;
+}
+  return false;
+}
+
 /* Subroutine of gfc_trans_assignment that actually scalarizes the
assignment.  EXPR1 is the destination/LHS and EXPR2 is the source/RHS.
init_flag indicates initialization expressions and dealloc that no
@@ -7935,7 +8031,8 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr
 	&& gfc_is_reallocatable_lhs (expr1)
 	&& !gfc_expr_attr (expr1).codimension
 	&& !gfc_is_coindexed (expr1)
-	&& expr2->rank)
+	&& expr2->rank
+	&& !is_runtime_conformable (expr1, expr2))
 	{
 	  realloc_lhs_warning (expr1->ts.type, true, &expr1->where);
 	  ompws_flags &= ~OMPWS_SCALARIZER_WS;
Index: testsuite/gfortran.dg/realloc_on_assign_14.f90
===================================================================
--- testsuite/gfortran.dg/realloc_on_assign_14.f90	(Revision 201996)
+++ testsuite/gfortran.dg/realloc_on_assign_14.f90	(Arbeitskopie)
@@ -23,7 +23,7 @@ str = 'abc'! { dg-warning "Code for reallocati
 astr = 'abc'   ! no realloc
 astr = ['abc'] ! { dg-warning "Code for reallocating the allocatable array" }
 a = reshape(a,shape(a)) ! { dg-warning "Code for reallocating the allocatable array" }
-r = sin(r) ! { dg-warning "Code for reallocating the allocatable array" }
+r = sin(r)
 r = sin(r(1))  ! no realloc
 b = sin(r(1))  ! { dg-warning "Code for reallocating the allocatable variable" }
 
! { dg-do compile }
! { dg-options "-fdump-tree-original" }
! PR 52243 - avoid check for reallocation when doing simple
! assignments with the same variable on both sides.
module  foo
contains
  elemental function ele(a)
real, intent(in) :: a
real :: ele
ele = 1./(2+a)
  end function ele

  subroutine bar(a)
real, dimension(:), allocatable :: a
a = a * 2.0
a = sin(a-0.3)
a = ele(a)
  end subroutine bar
end module foo
! { dg-final { scan-tree-dump-times "alloc" 0 "original" } }
! { dg-final { cleanup-tree-dump "original" } }


Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-08-28 Thread Kirill Yukhin
Hello,

On 28 Aug 13:17, Richard Henderson wrote:
> Uh, no, you can't just add it when doing the split.  You could be adding it in
> a place where the flags register is live.  You must ALWAYS have the clobber on
> the whole pattern when gprs are possible.
I see now, thanks a lot for explanation!

> Fix the indentation.
Done.

> Spurious comment change.
Done.

> > +(define_insn "kandn"
> > +  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
> Yk not k?
Yes, sure. Fixed.

> > +(define_insn "kxnor"
> > +  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
> > +   (not:SWI12
> Likewise.
Fixed.

> > +(define_split
> > +  [(set (match_operand:SWI12 0 "register_operand")
> > +   (not:SWI12
> > + (xor:SWI12
> general_reg_operand.
Done.

> > +(define_insn "kortestzhi"
> > +  [(set (reg:CCZ FLAGS_REG)
> > +   (compare:CCZ
> > + (ior:HI
> > +   (match_operand:HI 0 "register_operand" "%Yk")
> > +   (match_operand:HI 1 "register_operand" "Yk"))
> 
> Omit the %; the two operands are identical.
Fixed.

> > +(define_insn "kortestchi"
> > +  [(set (reg:CCC FLAGS_REG)
> 
> Likewise.
Fixed.

> > +;; Do not split instructions with mask regs.
> >  (define_split
> >[(set (match_operand 0 "register_operand")
> > (not (match_operand 1 "register_operand")))]
> general_reg_operand.
Fixed.

Testing is in progress; is it ok for trunk if it passes?

--
Thanks, K
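For the record, the operation in question is ordinary scalar code; a small
example:

/* Nothing here involves vector masks, yet combine will fuse the NOT and
   the AND into the kandn shape.  Without the GPR alternative (and its CC
   clobber) the operation would be forced into mask registers.  */
unsigned short
f (unsigned short x, unsigned short y)
{
  return x & ~y;
}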

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c |  38 --
 gcc/config/i386/i386.h |  38 --
 gcc/config/i386/i386.md| 287 ++---
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 317 insertions(+), 63 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;; B H   T
-;;;   h jk
+;;;   h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d05dbf0..4e9bac0 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,  /* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,  /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32,  /* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,   /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,   /* AVX-512 registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1,  /* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,  /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
-for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
-  fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+{
+  for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
+   fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+
+  for (i = FIRST_MASK_REG; i < LA

Re: [C++ Patch] PR 58255

2013-08-28 Thread Jason Merrill

OK.

Jason


Re: Make some comdats implicitly hidden

2013-08-28 Thread Jason Merrill

Looks good.

Jason


[PATCH] PR58143/58227 wrong code at -O3

2013-08-28 Thread Bernd Edlinger
The lim pass (aka loop invariant motion) can move conditional expressions
with possible undefined behavior out of the if statement inside a loop,
which may cause the loop optimization to silently generate wrong code, as
PR tree-optimization/58143 and PR tree-optimization/58227 demonstrate.

This patch prevents lim from moving code that might cause undefined behavior.

This triggered a minor regression in gcc.target/i386/pr53397-1.c: here lim
used to move the expression "2*step" out of the loop, but this may cause
undefined behavior in case of overflow.  I propose to resolve this by
adding -fno-strict-overflow; the test case looks pretty contrived anyway.
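A sketch of the class of bug (illustrative only, not the PR testcase): the
multiplication is guarded so it cannot overflow in the original program,
but hoisted out of the guard by lim it can, and with strict overflow
semantics later passes may exploit that.

int
f (int *a, int n, int step)
{
  int i, s = 0;
  for (i = 0; i < n; i++)
    /* 2 * step only executes when 0 < step < 1000, so it cannot
       overflow here; hoisted unconditionally, it can.  */
    if (step > 0 && step < 1000 && a[i] > 2 * step)
      s++;
  return s;
}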

The patch was bootstrapped and regression tested on i686-pc-linux-gnu and
x86_64-linux-gnu.

OK for trunk?

Regards
Bernd Edlinger

2013-08-28  Bernd Edlinger  

PR tree-optimization/58143
PR tree-optimization/58227
Prevent lim from moving conditional expressions
if that could trigger undefined behavior.
* gimple.h (gimple_could_trap_p_2): New function.
* gimple.c (gimple_could_trap_p_2): New function.
(gimple_could_trap_p_1): Call gimple_could_trap_p_2.
* tree-ssa-loop-im.c (movement_possibility): Likewise.


testsuite/

* gcc.dg/pr58143.c: New test.
* gcc.target/i386/pr53397-1.c: Added -fno-strict-overflow.
* gcc.target/i386/pr53397-2.c: Likewise.



patch-lim.diff
Description: Binary data


Re: [PATCH, vtv update] Add documentation for --enable-vtable-verify configure option...

2013-08-28 Thread Caroline Tice
Discussion on this topic seems to have petered out.  I have removed
references to ENABLE_VTABLE_VERIFY from libvtv, as requested.  I
cannot remove them from libgcc, as they really need to be there.  I
have updated the documentation in install.texi, as requested.
Please review the attached patch, which does all of this, and let me
know if it is ok to commit.

-- Caroline Tice
cmt...@google.com

gcc ChangeLog:

2013-08-28  Caroline Tice  

* doc/install.texi: Add documentation for the --enable-vtable-verify
and the --disable-libvtv configure options.

libvtv ChangeLog:
2013-08-28  Caroline Tice  

* Makefile.am: Remove #if ENABLE_VTABLE_VERIFY checks around
definitions of SUBDIRS, libvtv_la_SOURCES and libvtv_include_HEADERS.
* Makefile.in: Regenerate.
* configure.ac: Remove checks and tests for --enable-vtable-verify.
* configure: Regenerate.

On Thu, Aug 8, 2013 at 3:11 PM, Joseph S. Myers  wrote:
> On Thu, 8 Aug 2013, Caroline Tice wrote:
>
>> The attached patch adds documentation for the --enable-vtable-verify
>> configure option to  install.texi.  Is this ok to commit?
>
> Could you please answer the questions I raised in the first four
> paragraphs of  at
> greater length?  This patch appears to document it just as a libstdc++
> configure option, but as noted there it appears to affect more than
> libstdc++ (even though it would be more sensible if it *only* affected
> libstdc++, and even better if it just enabled extra multilibs).
>
> --
> Joseph S. Myers
> jos...@codesourcery.com


vtv-update-doc.patch
Description: Binary data


Re: wide-int branch now up for public comment and review

2013-08-28 Thread Kenneth Zadeck

On 08/28/2013 12:45 PM, Mike Stump wrote:

On Aug 28, 2013, at 5:48 AM, Richard Biener  wrote:

Only if the precision is > HOST_BITS_PER_WIDE_INT.  If the precision
is HOST_BITS_PER_WIDE_INT then both are { -1U }.

That wasn't my understanding on how things work.

You are thinking about prec==0 numbers.  These are useful and important for 
ease of use of the wide-int package.  They allow one to do:

   wide_int w = …;

   w = w + 6;
   w = w - 3;
   w = w + (unsigned HOST_WIDE_INT)~0;

and extend the constant out to the precision of the other side.  This is a very 
narrow feature and not a general property of a wide_int.  In general, 
signedness of a wide_int is an external feature of wide_int.  We only permit 
prec==0 numbers for ease of use, and ease of use needs to track the sign, in 
the general case.  Now, one is free to have a precision that allows the sign to 
be stored, this is available to the user, if they want.  They merely are not 
forced to do this.  For example, RTL largely doesn't want or need a sign.
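In other words, a prec==0 constant is first given the precision of the
sized operand, and the arithmetic then proceeds at that precision.  A
sketch (from_shwi and get_precision are assumed names for the branch's
constructor and accessor):

wide_int
add_constant (const wide_int &w, HOST_WIDE_INT c)
{
  /* Extend the bare constant to w's precision before adding; this is
     what "w + 6" does implicitly for a prec==0 operand.  */
  wide_int rhs = wide_int::from_shwi (c, w.get_precision ());
  return w + rhs;
}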


Right, that's what the constructors, from_* and to_* routines do.

I wonder where the from_tree and to_tree ones are?

tree t;
wide_int w = t;

wide_int_to_tree needs an additional type, so, the spelling is not as short out 
of necessity.

I made wide_int_to_tree a function that lives in tree.[ch], not a member
function of wide-int.  This seemed to be consistent with the way other
things were done.  If you want it to be a member function, that is
certainly doable.




Are they
from_double_int / wide_int_to_tree (what's wide_int_to_infinite_tree?)

I think wide_int_to_infinite_tree is leftover junk.  I removed it:

diff --git a/gcc/wide-int.h b/gcc/wide-int.h
index 86be20a..83c2170 100644
--- a/gcc/wide-int.h
+++ b/gcc/wide-int.h
@@ -4203,8 +4203,6 @@ wide_int_ro::to_shwi2 (HOST_WIDE_INT *s ATTRIBUTE_UNUSED,
  /* tree related routines.  */
  
  extern tree wide_int_to_tree (tree type, const wide_int_ro &cst);

-extern tree wide_int_to_infinite_tree (tree type, const wide_int_ro &cst,
-  unsigned int prec);
  extern tree force_fit_type_wide (tree, const wide_int_ro &, int, bool);
  
  /* real related routines.  */






Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-08-28 Thread Richard Henderson
On 08/28/2013 11:38 AM, Kirill Yukhin wrote:
>> When combine puts the AND and the NOT together, we don't know what
>> registers we want the data in.  If we do not supply the general register
>> alternative, with the clobber, then we will be FORCED to implement the
>> operation in the mask registers, even if this operation had nothing to do
>> with actual vector masks.  And it ought to come as no surprise that
>> X & ~Y is a fairly common operation.
> I agree with all of that.  But why put in the BMI alternative as well?
> Without it we may have this pattern w/o clobber and add it when doing the
> split for the GPR constraint.

Uh, no, you can't just add it when doing the split.  You could be adding it in
a place where the flags register is live.  You must ALWAYS have the clobber on
the whole pattern when gprs are possible.

> @@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
>  
>/* If AVX512F is disabled, squash the registers.  */
>if (! TARGET_AVX512F)
> +  {
>  for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
>fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +
> +for (i = FIRST_MASK_REG; i < LAST_MASK_REG; i++)
> +  fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
> +  }

Fix the indentation.

> @@ -1429,7 +1450,7 @@ enum reg_class
>  
>  /* Get_secondary_mem widens integral modes to BITS_PER_WORD.
> There is no need to emit full 64 bit move on 64 bit targets
> -   for integral modes that can be moved using 32 bit move.  */
> +   for integral modes that can be moved using 8 bit move.  */
>  #define SECONDARY_MEMORY_NEEDED_MODE(MODE)   \
>(GET_MODE_BITSIZE (MODE) < 32 && INTEGRAL_MODE_P (MODE)\
> ? mode_for_size (32, GET_MODE_CLASS (MODE), 0)\

Spurious comment change.

> +(define_insn "kandn"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
> + (and:SWI12
> +   (not:SWI12
> + (match_operand:SWI12 1 "register_operand" "r,0,k"))
> +   (match_operand:SWI12 2 "register_operand" "r,r,k")))
> +   (clobber (reg:CC FLAGS_REG))]

Yk not k?

> +(define_insn "kxnor"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,!k")
> + (not:SWI12
> +   (xor:SWI12
> + (match_operand:SWI12 1 "register_operand" "0,k")
> > + (match_operand:SWI12 2 "register_operand" "r,k")))]
> +  "TARGET_AVX512F"
> +  "@
> +   #
> +   kxnorw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "*,msklog")
> +   (set_attr "prefix" "*,vex")
> +   (set_attr "mode" "")])

Likewise.

> +(define_split
> +  [(set (match_operand:SWI12 0 "register_operand")
> + (not:SWI12
> +   (xor:SWI12
> + (match_dup 0)
> > + (match_operand:SWI12 1 "register_operand")))]
> +  "TARGET_AVX512F && !ANY_MASK_REG_P (operands [0])"
> +   [(parallel [(set (match_dup 0)
> + (xor:HI (match_dup 0)
> + (match_dup 1)))
> +(clobber (reg:CC FLAGS_REG))])
> +(set (match_dup 0)
> +  (not:HI (match_dup 0)))]
> +  "")

general_reg_operand.

> +(define_insn "kortestzhi"
> +  [(set (reg:CCZ FLAGS_REG)
> + (compare:CCZ
> +   (ior:HI
> + (match_operand:HI 0 "register_operand" "%Yk")
> + (match_operand:HI 1 "register_operand" "Yk"))

Omit the %; the two operands are identical.

> +(define_insn "kortestchi"
> +  [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> +   (ior:HI
> + (match_operand:HI 0 "register_operand" "%Yk")
> + (match_operand:HI 1 "register_operand" "Yk"))

Likewise.

> +;; Do not split instructions with mask regs.
>  (define_split
>[(set (match_operand 0 "register_operand")
>   (not (match_operand 1 "register_operand")))]
> @@ -16486,7 +16683,9 @@
> && (GET_MODE (operands[0]) == HImode
> || (GET_MODE (operands[0]) == QImode
>  && (TARGET_PROMOTE_QImode
> -|| optimize_insn_for_size_p ())))"
> +|| optimize_insn_for_size_p ())))
> +   && (! ANY_MASK_REG_P (operands[0])
> +  || ! ANY_MASK_REG_P (operands[1]))"
>[(set (match_dup 0)
>   (not:SI (match_dup 1)))]

general_reg_operand.


r~


Re: [PATCH 1/2] fix PR49847 ICE-on-HAVE_cc0 regression from PR50780 changes

2013-08-28 Thread Jeff Law

On 08/28/2013 03:50 AM, Mikael Pettersson wrote:

This patch fixes an ICE that occurs in #ifdef HAVE_cc0 code.  The ICE
breaks both Java and Ada bootstrap on m68k-linux.  There is also a
tiny C++ test case in the BZ entry.

The ICE is triggered by the PR middle-end/50780 changes in r180192.
As explained in ,
the effect of those changes is that an expression in EH context is
broken up so that the cc0 setter and user are put in separate BBs, which
normally isn't allowed.  As fold_rtx sees the cc0 user, it calls
equiv_constant on prev_insn_cc0, but that is NULL due to the BB boundary,
resulting in the ICE.

This patch checks if prev_insn_cc0 is NULL, and if so doesn't call
equiv_constant but sets const_arg to zero, which avoids the ICE and
makes fold_rtx leave the insn unchanged.

Bootstrapped and regtested on m68k-linux, no regressions.  This patch
has been in 4.6-based production compilers on m68k-linux since early 2012,
and in a 4.7-based compiler since early 2013.

Ok for trunk and 4.8?

[If approved I'll need help to commit it as I don't have commit rights.]

gcc/

2013-08-28  Mikael Pettersson  

PR rtl-optimization/49847
* cse.c (fold_rtx) : If prev_insn_cc0 is NULL
don't call equiv_constant on it.
I can't help but feel something different needs to be done here.  It's
always been the case that those two insns are expected to be contiguous;
there's going to be code all over the place that makes that assumption, and
that model is what every GCC developer has expected for the last 20+ years.


What precisely about Richi's patch causes the setter and user to end up 
in different blocks?  I'm guessing we now consider the comparison itself 
as potentially trapping and thus we end the block immediately?  The 
consumer then appears in the next block?


I wonder if the comparison could be duplicated and marked as non-throwing
at RTL generation time.  Alternatively (and probably better) we could
convert the remaining stragglers and drop the old cc0 support entirely.


jeff




Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-08-28 Thread Kirill Yukhin
Hello Richard,

On 28 Aug 10:55, Richard Henderson wrote:
> On 08/28/2013 10:45 AM, Kirill Yukhin wrote:
> > Hello Richard,
> > 
> > On 27 Aug 13:07, Richard Henderson wrote:
> >> On 08/27/2013 11:11 AM, Kirill Yukhin wrote:
> > What happened to the bmi andn alternative we discussed?
> >>> BMI only supported for 4- and 8- byte integers, while
> >>> kandw - for HI/QI
> >>>
> >>
> >> We're talking about values in registers.  Ignoring the high bits of the 
> >> andn
> >> result still produces the correct results.
> > 
> > However I am not fully understand why do we need this.
> > `kandn' is different from BMI `andn' in clobbering of flags reg.
> > So, having such a pattern we'll make compiler think that `kandn'
> > clobber, which seems to me like opportunity to misoptimization as
> > far as `kandn' doesn't clobber.
> 
> This is no different than ANY OTHER use of the mask logical ops.
> 
> When combine puts the AND and the NOT together, we don't know what registers 
> we
> want the data in.  If we do not supply the general register alternative, with
> the clobber, then we will be FORCED to implement the operation in the mask
> registers, even if this operation had nothing to do with actual vector masks.
> And it ought to come as no surprise that X & ~Y is a fairly common operation.
I agree with all of that.  But why put in the BMI alternative as well?
Without it we may have this pattern w/o clobber and add it when doing the
split for the GPR constraint.  I am just thinking that the presence of the
flags clobber in the `kandn' pattern is not good from an optimization point
of view.  Anyway, I don't think this is a big deal...

> I suppose a real question here is how this is written: Does TARGET_AVX512F
> imply TARGET_BMI?  If so, then we can eliminate the second alternative.  If
> not, then you are missing a set_attr isa to restrict the first alternative.

I think that it should be possible to use AVX-512F w/o BMI, so I've added a
new isa attribute "bmi".  I am testing the previous patch with that change:
@@ -703,7 +703,7 @@
 ;; Used to control the "enabled" attribute on a per-instruction basis.
 (define_attr "isa" "base,x64,x64_sse4,x64_sse4_noavx,x64_avx,nox64,
sse2,sse2_noavx,sse3,sse4,sse4_noavx,avx,noavx,
-   avx2,noavx2,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
+   avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,fma_avx512f"
   (const_string "base"))
 
 (define_attr "enabled" ""
@@ -726,6 +726,7 @@
 (eq_attr "isa" "noavx") (symbol_ref "!TARGET_AVX")
 (eq_attr "isa" "avx2") (symbol_ref "TARGET_AVX2")
 (eq_attr "isa" "noavx2") (symbol_ref "!TARGET_AVX2")
+(eq_attr "isa" "bmi") (symbol_ref "TARGET_BMI")
 (eq_attr "isa" "bmi2") (symbol_ref "TARGET_BMI2")
 (eq_attr "isa" "fma4") (symbol_ref "TARGET_FMA4")
 (eq_attr "isa" "fma") (symbol_ref "TARGET_FMA")
@@ -7744,7 +7745,8 @@
andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
#
kandnw\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "bitmanip,*,msklog")
+  [(set_attr "isa" "bmi,*,avx512f")
+   (set_attr "type" "bitmanip,*,msklog")
(set_attr "prefix" "*,*,vex")
(set_attr "btver2_decode" "direct,*,*")
(set_attr "mode" "")])

Full patch below.

Ok if testing pass?

Thanks, K

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c |  34 -
 gcc/config/i386/i386.h |  40 --
 gcc/config/i386/i386.md| 287 ++---
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 317 insertions(+), 61 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;; B H   T
-;;;   h jk
+;;;   h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d05dbf0..8325919 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS,

Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C

2013-08-28 Thread Aldy Hernandez

On 08/27/13 16:27, Iyer, Balaji V wrote:

Hello Aldy, I went through all the emails and here are the major
issues that I could gather (other than lowering the keywords after
gimplification, which I am skipping since it is more of an
optimization for now).


Ok, for now I am fine with delaying handling all this as a gimple tuple
since most of your code lives in its own little world :).  But I will go on
record saying that part of the reason you have to handle CALL_EXPR,
MODIFY_EXPR, INIT_EXPR and such is because you don't have easy gimplified
code to examine.  Anyways, agreed, you can do this later.




1. Calling the gimplify_cilk_spawn on top of the gimplify_expr before
the switch-statement could slow the compiler down 2. I need a
CILK_SPAWN_STMT case in the switch statement in gimplify_expr (). 3.
No test for catching the suspicious spawned function warning 4.
Reasoning for expanding the 2 builtin functions in builtins.c instead
of just inserting the appropriate expanded-code when I am inserting
the function call.

Did I miss anything else (or misunderstand anything you pointed
out)?

Here are my answers to those questions above and am attaching a fixed
patch with the changelog entries:

1 & 2(partial): There are 3 places where we could have _Cilk_spawn:
INIT_EXPR, CALL_EXPR and MODIFY_EXPR. INIT_EXPR and MODIFY_EXPRS are
both gimplified using gimplify_modify_expr. I have moved the
cilk_detect_spawn into this function. We will go into the
cilk_detect_spawn if cilk plus is enabled, and if there is a
cilk_frame (meaning the function has a Cilk_spawn in it) thereby
reducing the number of hits into this function significantly. Inside
this function, it will go into the function that has a spawned
function call and then unwrap the CILK_SPAWN_STMT wrapper and returns
true. This shouldn't cause a huge compilation time hit. 2. To handle
CALL_EXPR (e.g. _Cilk_spawn foo (x), where foo returns a void or the
return value of it is ignored), I have added a CILK_SPAWN_STMT case.
Again, I am calling the detect_cilk_spawn and we will only step into
this function if Cilk Plus is enabled and if there is a cilk-frame
(i.e. saying the function has a cilk spawn in it). If there is an
error (seen_error () == true), then it just falls through into
CALL_EXPR and is handled like a normal call expr not spawned
expression. 3. This warning rarely gets hit, and I have seen it hit


See my comments below on this.


only when you are spawning a constructor in C++. To my knowledge, we
have not had anyone report that they have hit this warning. I just


Ok, leave the warning, but do include a test case.


kept it in there just in case as a heads-up. 4. The reason why I am
handling pop_frame and detach functions in builtins.c is because one
of the things that LTO does is to remove the frame pointer. All Cilk
functions must use the frame pointer. When LTO is invoked, it is hard
to see what function is a cilk function and what is not. The CFUN
flag that says it is a cilk function gets cleared out. But, the
builtin functions are expanded during LTO phase and I set the
is_cilk_function flag when it is expanding pop_frame and detach.  The
other option I was considering was to keep the frame pointer on when Cilk
Plus is enabled, but this is a bit of overkill and can cause performance
issues.


I think the reason you have to do all these gymnastics is because you are
waiting so late to expand.  You could expand into trees and not have to
worry about frame pointers and such.  See fold_builtin_* in builtins.c.
*However*, if you think you can generate better code by delaying expansion
all the way until RTL, then I'm fine with your current approach.




Also, I had added a couple more tests to catch a couple cases.



+  /* Implicit _Cilk_sync must be inserted right before any return statement
+ if there is a _Cilk_spawn in the function (checked by seeing if
+ cilk_frame_decl is not NULL_TREE).  If the user has provided a
+ _Cilk_sync, the optimizer should remove this duplicate one.  */
+  if (flag_enable_cilkplus && cfun->cilk_frame_decl != NULL_TREE)


Again, never document the obvious things your code is doing.  For example,
you can remove "(checked by seeing if cilk_frame_decl is not NULL_TREE)".
It's obvious by looking at the code.



+  /* If there are errors, there is no point in expanding the
+ _Cilk_spawn.  Just gimplify like a normal CALL_EXPR.  */
+  && !seen_error ())


Similarly here.  No need to document the obvious.


+  /* If there are errors, there is no point in expanding the
+ _Cilk_spawn.  Just gimplify like a normal MODIFY or INIT_EXPR.  */
+  && !seen_error ())


And here.

Also, if the canonical way of checking if a function has a cilk_spawn in
it is always "cfun->cilk_frame_decl != NULL_TREE", then you should abstract
this into an inline function.  The implementation will be trivial to change
if we ever decide to keep that information elsewhere.
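Something as small as this would do (a sketch; the name is made up):

/* Return true if FN contains a _Cilk_spawn.  Keeping the check in one
   place means the representation can change without touching every
   caller.  */
static inline bool
fn_contains_cilk_spawn_p (struct function *fn)
{
  return fn->cilk_frame_decl != NULL_TREE;
}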
 Perhaps s

Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-28 Thread Teresa Johnson
On Wed, Aug 28, 2013 at 9:58 AM, Jan Hubicka  wrote:
> Hi,
> with Martin we did bit of progress on analyzing the problems.  We now have
> COMDAT profile merging for FDO

 Great! Is this the LTO merging you were talking about in an earlier
message, or the gcov runtime fix (that would presumably not be
lto-specific)?

> and we also noticed that forks can make your
> basic blocks appear never executed even though they are executed every run:
> the fork is accounted as 3 independent runs of the program.  First run
> is until fork, the second 2 runs are both variant.
>
> I have patch to track this.  Moreover vforks seems to produce repeated
> merging of results.

Aha, does this explain the gimp issue as well?

>
> These two factors seem to riddle the firefox profiles enough that it
> took us weeks to understand what happens.
>> + if (changed)
>> +{
>> +  /* Edge forwarding in particular can cause hot blocks 
>> previously
>> + reached by both hot and cold blocks to become dominated 
>> only
>> + by cold blocks. This will cause the verification
>> below to fail,
>> + and lead to now cold code in the hot section. This is not 
>> easy
>> + to detect and fix during edge forwarding, and in some cases
>> + is only visible after newly unreachable blocks are deleted,
>> + which will be done in fixup_partitions.  */
>> +  fixup_partitions ();
>
> Is it really necessary to run this from the internal loop of cfgcleanup?

The reason I added it here is that just below there is a call to
verify_flow_info, and that will fail with the new verification.

> It seems you will play a back-and-forth game where edge forwarding will
> remove your fallthru and you will be re-adding it?

fixup_partitions will not add new fall through edges. (It may invoke
force_nonfallthru to do the opposite.) So there shouldn't be any
ping-ponging effect.

>
> I would wait for cfgcleanup to finish its job (I don't really think it
> needs to be iterative) and then do the fixup, possibly cleaning up when
> the blocks were repositioned (I suppose that is the only case where the
> code above introduces new cfgcleanup opportunities).

As noted above, I can't do this due to the call to verify_flow_info
for each iteration. One option is to move both down outside the loop.

>
>> +  /* Walk the preds/succs and check if there is at least one already
>> + marked hot. Keep track of the most frequent pred/succ so that we
>> + can mark it hot if we don't find one.  */
>> +  FOR_EACH_EDGE (e, ei, edges)
>> +{
>> +  basic_block reach_bb = walk_up ? e->src : e->dest;
>> +
>> +  if (e->flags & EDGE_DFS_BACK)
>> +continue;
>> +
>> +  if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
>> +  {
>> +found = true;
>> +break;
>> +  }
>> +  if (e->probability > highest_probability)
>> +highest_probability = e->probability;
>
> When doing the predecessor walk, if you have two predecessors, one
> executing 10 times and getting to the block with probability 1%, you want
> to choose it over a block executing once and getting to you with
> probability 100%.
>
> You probably want to look for the most likely predecessor here.  You need
> to look for the highest e->count, and if they are all 0, for the highest
> EDGE_FREQUENCY, and for maximal probability only if all else fails?

Yes, thanks, let me do that.
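Concretely, the selection order being suggested might look like this (a
sketch, assuming the usual edge fields):

/* Return true if E is a better hot-path candidate than BEST: prefer
   profile counts, then frequencies, and fall back to edge probability
   only when both are uninformative.  */
static bool
better_edge_p (edge e, edge best)
{
  if (!best)
    return true;
  if (e->count != best->count)
    return e->count > best->count;
  if (EDGE_FREQUENCY (e) != EDGE_FREQUENCY (best))
    return EDGE_FREQUENCY (e) > EDGE_FREQUENCY (best);
  return e->probability > best->probability;
}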

>
>> +}
>> +
>> +  /* If bb is reached by (or reaches, in the case of !WALK_UP) another 
>> hot
>> + block (or unpartitioned, e.g. the entry block) then it is ok. If 
>> not,
>> + then the most frequent pred (or succ) needs to be adjusted.  In the
>> + case where multiple preds/succs have the same probability (e.g. a
>> + 50-50 branch), then both will be adjusted.  */
>> +  if (found)
>> +continue;
>> +
>> +  FOR_EACH_EDGE (e, ei, edges)
>> +{
>> +  if (e->flags & EDGE_DFS_BACK)
>> +continue;
>> +  if (e->probability < highest_probability)
>> +continue;
>
> Again, for the predecessor walk you need to apply the somewhat crazy logic
> described above.
>> Index: predict.c
>> ===================================================================
>> --- predict.c   (revision 202021)
>> +++ predict.c   (working copy)
>> @@ -241,6 +241,22 @@ probably_never_executed_bb_p (struct function *fun
>>return false;
>>  }
>>
>> +
>> +/* Return true in case edge E is probably never executed.  */
>> +
>> +bool
>> +probably_never_executed_edge_p (struct function *fun, edge e)
>> +{
>> +  gcc_checking_assert (fun);
>> +  if (profile_info && flag_branch_probabilities)
>> +return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
>> +  if ((!profile_info || !flag_branch_probabilities)
>> +  && (cgraph_get_node (fun->decl)->frequen

Re: opt-info message change for vectorizer

2013-08-28 Thread Xinliang David Li
Fixed as requested. I don't like the extra newline either, but I will
leave that to Teresa.

basic3.c:8:foo: note: loop vectorized

basic3.c:8:foo: note: loop versioned for vectorization because of
possible aliasing

basic3.c:8:foo: note: loop peeled for vectorization to enhance alignment

basic3.c:8:foo: note: loop with 7 iterations completely unrolled

basic3.c:5:foo: note: loop with 7 iterations completely unrolled


Is this version ok after testing?

thanks,

David

On Wed, Aug 28, 2013 at 2:45 AM, Richard Biener
 wrote:
> On Tue, Aug 27, 2013 at 10:30 PM, Xinliang David Li  
> wrote:
>> If this is the convention, we should probably have another patch to
>> fix all the existing opt-info messages.
>
> Yes please.
>
> Also ...
>
>
> b.c:16:A::foo: note: Loop is vectorized
>
> "loop vectorized"
>
>
> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization
>
> "loop versioned for vectorization because of possible aliasing"
>
> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization
>
> "loop peeled for vectorization to enhance alignment"
>
> b.c:16:A::foo: note: Completely unroll loop 6 times
>
> maybe "loop with 6 iterations completely unrolled"
>
>
> b.c:12:A::foo: note: Completely unroll loop 6 times
>
>
> I hate the excessive vertical spacing as well.
>
> Richard.
>
> Ok after testing?
>
> thanks,
>
> David
>>>
Index: tree-vectorizer.c
===================================================================
--- tree-vectorizer.c   (revision 201751)
+++ tree-vectorizer.c   (working copy)
@@ -119,7 +119,7 @@ vectorize_loops (void)
 if (LOCATION_LOCUS (vect_location) != UNKNOWN_LOC
&& dump_enabled_p ())
   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-   "Vectorized loop\n");
+   "loop vectorized\n");
vect_transform_loop (loop_vinfo);
num_vectorized_loops++;
   }
@@ -180,7 +180,7 @@ execute_vect_slp (void)
   vect_slp_transform_bb (bb);
   if (dump_enabled_p ())
 dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
-"Vectorized basic-block\n");
+"Basic block is vectorized\n");
 }
 }
 
Index: loop-unroll.c
===================================================================
--- loop-unroll.c   (revision 201751)
+++ loop-unroll.c   (working copy)
@@ -225,7 +225,7 @@ report_unroll_peel (struct loop *loop, l
   && !loop->lpt_decision.times)
 {
   dump_printf_loc (report_flags, locus,
-   "Turned loop into non-loop; it never loops.\n");
+   "loop turned into non-loop; it never loops.\n");
   return;
 }
 
@@ -236,13 +236,16 @@ report_unroll_peel (struct loop *loop, l
   else if (loop->header->count)
 niters = expected_loop_iterations (loop);
 
-  dump_printf_loc (report_flags, locus,
-   "%s loop %d times",
-   (loop->lpt_decision.decision == LPT_PEEL_COMPLETELY
-?  "Completely unroll"
-: (loop->lpt_decision.decision == LPT_PEEL_SIMPLE
-   ? "Peel" : "Unroll")),
-   loop->lpt_decision.times);
+  if (loop->lpt_decision.decision == LPT_PEEL_COMPLETELY)
+dump_printf_loc (report_flags, locus,
+ "loop with %d iterations completely unrolled",
+loop->lpt_decision.times + 1);
+  else
+dump_printf_loc (report_flags, locus,
+ "loop %s %d times",
+ (loop->lpt_decision.decision == LPT_PEEL_SIMPLE
+   ? "peeled" : "unrolled"),
+ loop->lpt_decision.times);
   if (profile_info)
 dump_printf (report_flags,
  " (header execution count %d",
Index: dumpfile.c
===================================================================
--- dumpfile.c  (revision 201751)
+++ dumpfile.c  (working copy)
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
 #include "tree.h"
+#include "gimple.h"
 
 /* If non-NULL, return one past-the-end of the matching SUBPART of
the WHOLE string.  */
@@ -261,12 +262,20 @@ dump_loc (int dump_kind, FILE *dfile, so
   if (dump_kind)
 {
   if (LOCATION_LOCUS (loc) > BUILTINS_LOCATION)
-fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
- LOCATION_LINE (loc));
+{
+  if (current_function_decl)
+fprintf (dfile, "\n%s:%d:%s: note: ", LOCATION_FILE (loc),
+ LOCATION_LINE (loc),
+ gimple_decl_printable_name (current_function_decl, 1));
+  else
+fprintf (dfile, "\n%s:%d: note: ", LOCATION_FILE (loc),
+ LOCATION_LINE (loc));
+}
   else if (current_function_decl)
-fprintf (dfile, "\n%s:%d

Re: [PATCH][2/n] 2nd try: Re-organize -fvect-cost-model, enable basic vectorization at -O2

2013-08-28 Thread Xinliang David Li
On Wed, Aug 28, 2013 at 12:59 AM, Richard Biener  wrote:
> On Tue, 27 Aug 2013, Xinliang David Li wrote:
>
>> Richard, I have some comments about the patch.
>>
>> >   -ftree-vectorizer-verbose=This switch is deprecated. Use 
>> > -fopt-info instead.
>> >
>> >   ftree-slp-vectorize
>> > ! Common Report Var(flag_tree_slp_vectorize) Optimization
>> >   Enable basic block vectorization (SLP) on trees
>>
>> The code dealing with the interactions between -ftree-vectorize, O3,
>> etc are complicated and hard to understand. Is it better to change the
>> meaning of -ftree-vectorize to mean -floop-vectorize only, and make it
>> independent of -fslp-vectorize?  P
>
> Yeah, but that would be an independent change.  Also people expect
> to be able to enable all of the vectorizer with -ftree-vectorize.
> So rather we introduce -floop-vectorize?

I think that will be good and simplify the logic too --
ftree-vectorize turns on both loop and slp if they are not explicitly
specified.


>
>> > + fvect-cost-model=
>> > + Common Joined RejectNegative Enum(vect_cost_model) 
>> > Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
>> > + Specifies the cost model for vectorization
>> > +
>> > + Enum
>> > + Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown 
>> > vectorizer cost model %qs)
>> > +
>> > + EnumValue
>> > + Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
>> > +
>> > + EnumValue
>> > + Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
>> > +
>> > + EnumValue
>> > + Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)
>>
>> Introducing cheap model is a great change.
>>
>> > +
>>
>> > *** 173,179 
>> >   {
>> > struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >
>> > !   if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) 
>> > == 0)
>> >   return false;
>> >
>> > if (dump_enabled_p ())
>> > --- 173,180 
>> >   {
>> > struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >
>> > !   if (loop_vinfo->cost_model == VECT_COST_MODEL_CHEAP
>> > !   || (unsigned) PARAM_VALUE 
>> > (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
>> >   return false;
>> >
>>
>> When the cost_model == cheap, the alignment peeling should also be
>> disabled -- there will still be loops that are beneficial to be
>> vectorized without peeling -- at perhaps reduced net runtime gain.
>
> IIRC there are targets that cannot vectorize unaligned accesses, so
> in the end the cost model needs to be more target-controlled.
>
> The above was just a start for experimenting, of course.
>
>> >   struct gimple_opt_pass pass_slp_vectorize =
>> > --- 206,220 
>> >   static bool
>> >   gate_vect_slp (void)
>> >   {
>> > !   /* Apply SLP either according to whether the user specified whether to
>> > !  run SLP or not, or according to whether the user specified whether
>> > !  to do vectorization or not.  */
>> > !   if (global_options_set.x_flag_tree_slp_vectorize)
>> > ! return flag_tree_slp_vectorize != 0;
>> > !   if (global_options_set.x_flag_tree_vectorize)
>> > ! return flag_tree_vectorize != 0;
>> > !   /* And if vectorization was enabled by default run SLP only at -O3.  */
>> > !   return flag_tree_vectorize != 0 && optimize == 3;
>> >   }
>>
>> The logic can be greatly simplified if the slp vectorizer is controlled
>> independently -- easier for the user to understand too.
>
> It should work with separating out -floop-vectorize, too I guess.  But
> yes, as I wanted to preserve behavior of adding -ftree-vectorize to
> -O2 the above necessarily became quite complicated ;)

With floop-vectorize, ftree-vectorize becomes a simple shorthand/alias
for 'floop-vectorize + fslp-vectorize', and O2/O3 do not need to
look at ftree-vectorize (which doesn't even need a flag variable).

>
>> > ! @item -fvect-cost-model=@var{model}
>> >   @opindex fvect-cost-model
>> > ! Alter the cost model used for vectorization.  The @var{model} argument
>> > ! should be one of @code{unlimited}, @code{dynamic} or @code{cheap}.
>> > ! With the @code{unlimited} model the vectorized code-path is assumed
>> > ! to be profitable while with the @code{dynamic} model a runtime check
>> > ! will guard the vectorized code-path to enable it only for iteration
>> > ! counts that will likely execute faster than when executing the original
>> > ! scalar loop.  The @code{cheap} model will disable vectorization of
>> > ! loops where doing so would be cost prohibitive for example due to
>> > ! required runtime checks for data dependence or alignment but otherwise
>> > ! is equal to the @code{dynamic} model.
>> > ! The default cost model depends on other optimization flags and is
>> > ! either @code{dynamic} or @code{cheap}.
>> >
>>
>> The vectorizer in theory will only vectorize a loop with a net runtime
>> gain, so the 'cost' here should only mean code size and compile time cost.
>
> Not exactly - for 'unlimited' we may enter the vectorized path
> even if the overhead 

Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-08-28 Thread Richard Henderson
On 08/28/2013 10:45 AM, Kirill Yukhin wrote:
> Hello Richard,
> 
> On 27 Aug 13:07, Richard Henderson wrote:
>> On 08/27/2013 11:11 AM, Kirill Yukhin wrote:
> What happened to the bmi andn alternative we discussed?
>>> BMI only supported for 4- and 8- byte integers, while
>>> kandw - for HI/QI
>>>
>>
>> We're talking about values in registers.  Ignoring the high bits of the andn
>> result still produces the correct results.
> 
> I've updated patch, adding BMI alternative and clobber of flags:
> +(define_insn "kandn<mode>"
> +  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
> +   (and:SWI12
> + (not:SWI12
> +   (match_operand:SWI12 1 "register_operand" "r,0,k"))
> + (match_operand:SWI12 2 "register_operand" "r,r,k")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "TARGET_AVX512F"
> +  "@
> +   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
> +   #
> +   kandnw\t{%2, %1, %0|%0, %1, %2}"
> +  [(set_attr "type" "bitmanip,*,msklog")
> +   (set_attr "prefix" "*,*,vex")
> +   (set_attr "btver2_decode" "direct,*,*")
> +   (set_attr "mode" "<MODE>")])
> 
> However, I do not fully understand why we need this.
> `kandn' is different from BMI `andn' in clobbering the flags reg.
> So, having such a pattern will make the compiler think that `kandn'
> clobbers, which seems to me like an opportunity for misoptimization,
> as `kandn' doesn't clobber.

This is no different than ANY OTHER use of the mask logical ops.

When combine puts the AND and the NOT together, we don't know what registers we
want the data in.  If we do not supply the general register alternative, with
the clobber, then we will be FORCED to implement the operation in the mask
registers, even if this operation had nothing to do with actual vector masks.
And it ought to come as no surprise that X & ~Y is a fairly common operation.
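
For instance, plain integer code like this (a made-up example, nothing to
do with vector masks) gives combine exactly that opportunity:

  /* x & ~y on 16-bit values; combine merges the NOT and the AND into
     one andn-shaped operation while the values may well live in
     general registers.  */
  unsigned short
  clear_bits (unsigned short x, unsigned short y)
  {
    return x & ~y;
  }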

I suppose a real question here is how this is written: Does TARGET_AVX512F
imply TARGET_BMI?  If so, then we can eliminate the second alternative.  If
not, then you are missing a set_attr isa to restrict the first alternative.
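
I.e. something like the following on the pattern above (a sketch; this
assumes AVX512F does not imply BMI and that bmi is a valid value of the
isa attribute in i386.md):

  (set_attr "isa" "bmi,*,*")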



r~


Re: [PATCH i386 2/8] [AVX512] Add mask registers.

2013-08-28 Thread Kirill Yukhin
Hello Richard,

On 27 Aug 13:07, Richard Henderson wrote:
> On 08/27/2013 11:11 AM, Kirill Yukhin wrote:
> >> > What happened to the bmi andn alternative we discussed?
> > BMI only supported for 4- and 8- byte integers, while
> > kandw - for HI/QI
> >
>
> We're talking about values in registers.  Ignoring the high bits of the andn
> result still produces the correct results.

I've updated patch, adding BMI alternative and clobber of flags:
+(define_insn "kandn<mode>"
+  [(set (match_operand:SWI12 0 "register_operand" "=r,&r,!k")
+   (and:SWI12
+ (not:SWI12
+   (match_operand:SWI12 1 "register_operand" "r,0,k"))
+ (match_operand:SWI12 2 "register_operand" "r,r,k")))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_AVX512F"
+  "@
+   andn\t{%k2, %k1, %k0|%k0, %k1, %k2}
+   #
+   kandnw\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "type" "bitmanip,*,msklog")
+   (set_attr "prefix" "*,*,vex")
+   (set_attr "btver2_decode" "direct,*,*")
+   (set_attr "mode" "<MODE>")])

However, I do not fully understand why we need this.
`kandn' is different from BMI `andn' in clobbering the flags reg.
So, having such a pattern will make the compiler think that `kandn'
clobbers, which seems to me like an opportunity for misoptimization,
as `kandn' doesn't clobber.

Anyway, it seems to work.

Testing:
  1. Bootstrap pass
  2. make check shows no regressions
  3. Spec 2000 & 2006 build show no regressions both with and without -mavx512f 
option
  4. Spec 2000 & 2006 run shows no stability regressions without -mavx512f 
option

Is it ok?

---
 gcc/config/i386/constraints.md |   8 +-
 gcc/config/i386/i386.c |  34 -
 gcc/config/i386/i386.h |  40 --
 gcc/config/i386/i386.md| 283 ++---
 gcc/config/i386/predicates.md  |   9 ++
 5 files changed, 314 insertions(+), 60 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 28e626f..92e0c05 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;; B H   T
-;;;   h jk
+;;;   h j
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -78,6 +78,12 @@
  "TARGET_80387 || TARGET_FLOAT_RETURNS_IN_80387 ? FP_SECOND_REG : NO_REGS"
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
+(define_register_constraint "k" "TARGET_AVX512F ? MASK_EVEX_REGS : NO_REGS"
+"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+
+(define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
+"@internal Any mask register.")
+
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
  "Any MMX register.")
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d05dbf0..8325919 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2032,6 +2032,9 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
   EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS, EVEX_SSE_REGS,
+  /* Mask registers.  */
+  MASK_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
+  MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS, MASK_EVEX_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -2047,6 +2050,7 @@ int const dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,  /* Mask registers */
 };
 
 /* The "default" register map used in 64bit mode.  */
@@ -2062,6 +2066,7 @@ int const dbx64_register_map[FIRST_PSEUDO_REGISTER] =
   25, 26, 27, 28, 29, 30, 31, 32, /* extended SSE registers */
   67, 68, 69, 70, 71, 72, 73, 74,   /* AVX-512 registers 16-23 */
   75, 76, 77, 78, 79, 80, 81, 82,   /* AVX-512 registers 24-31 */
+  118, 119, 120, 121, 122, 123, 124, 125, /* Mask registers */
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -2129,6 +2134,7 @@ int const svr4_dbx_register_map[FIRST_PSEUDO_REGISTER] =
   -1, -1, -1, -1, -1, -1, -1, -1, /* extended SSE registers */
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 16-23*/
   -1, -1, -1, -1, -1, -1, -1, -1,   /* AVX-512 registers 24-31*/
+  93, 94, 95, 96, 97, 98, 99, 100,  /* Mask registers */
 };
 
 /* Define parameter passing and return registers.  */
@@ -4219,8 +4225,13 @@ ix86_conditional_register_usage (void)
 
   /* If AVX512F is disabled, squash the registers.  */
   if (! TARGET_AVX512F)
+  {
 for (i = FIRST_EXT_REX_SSE_REG; i < LAST_EXT_REX_SSE_REG; i++)
   fixed_regs[i] = call_used_regs[i] = 1, reg_names[i] = "";
+
+for (i = FIRST_MASK_REG; i < 

Re: [PATCH, vtv update] Change fixed size array to a vector; fix diagnostic messages.

2013-08-28 Thread Diego Novillo
On Wed, Aug 28, 2013 at 1:39 PM, Caroline Tice  wrote:

> num_vtable_args was an int that we manually incremented/decremented,
> so it might have been negative.  I am
> assuming that a vec.length() can never be negative.  So before, the
> else-clause was explicitly checking that the value was 0 or 1
> (since the if-condition takes all values greater than 1).  Now I am
> implicitly assuming that when we get to the else-clause the value must
> be 0 or 1, because the if-condition took care of all values greater
> than 1, and values less than zero cannot occur.  Is that assumption
> incorrect?

Ah, thanks.

> Do you need to review this again after I fix your suggestion, or can I
> go ahead and commit it after?

No. If the changes bootstrap and pass validation, you can commit the
modified patch.


Diego.


Re: [PATCH, vtv update] Change fixed size array to a vector; fix diagnostic messages.

2013-08-28 Thread Caroline Tice
On Wed, Aug 28, 2013 at 10:30 AM, Diego Novillo  wrote:
> On 2013-08-28 12:59 , Caroline Tice wrote:
>
>>  static void
>> -output_set_info (tree record_type, tree *vtbl_ptr_array, int array_size)
>> +output_set_info (tree record_type, vec<tree> *vtbl_ptr_array)
>
>

Okay, will do.

> Since this function does not modify vtbl_ptr_array, you could pass it by
> value.
>
>>
>> @@ -1023,23 +1010,23 @@ register_all_pairs (tree body)
>>
>>if (flag_vtv_debug)
>>  output_set_info (current->class_info->class_type,
>> - vtbl_ptr_array, num_vtable_args);
>> + vtbl_ptr_array);
>>
>> -  if (num_vtable_args > 1)
>> +  if (vtbl_ptr_array->length() > 1)
>>  {
>> -  insert_call_to_register_set (current->class_name,
>> num_vtable_args,
>> +  insert_call_to_register_set (current->class_name,
>> vtbl_ptr_array, body, arg1, arg2,
>> size_hint_arg);
>>registered_at_least_one = true;
>>  }
>> -  else if (num_vtable_args >= 0)
>> +  else
>

num_vtable_args was an int that we manually incremented/decremented,
so it might have been negative.  I am
assuming that a vec.length() can never be negative.  So before, the
else-clause was explicitly checking that the value was 0 or 1
(since the if-condition takes all values greater than 1).  Now I am
implicitly assuming that when we get to the else-clause the value must
be 0 or 1, because the if-condition took care of all values greater
than 1, and values less than zero cannot occur.  Is that assumption
incorrect?
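
A tiny standalone illustration of that assumption -- with an unsigned
length the else-arm can only ever see 0 or 1 (hypothetical function, not
from the patch):

  #include <assert.h>

  /* Once "len > 1" is handled, no negative case remains: a vec length
     is unsigned, so only 0 and 1 are left for the else-arm.  */
  void
  classify (unsigned int len)
  {
    if (len > 1)
      ;  /* multiple vtable pointers: register the whole set */
    else
      assert (len == 0 || len == 1);  /* always true for unsigned */
  }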

Do you need to review this again after I fix your suggestion, or can I
go ahead and commit it after?

-- Caroline
cmt...@google.com

>
> You've changed the meaning of this else now.  Intended?  I can't tell from
> context.
>
> The patch looks fine, otherwise.
>
>
> Diego.


Re: [PATCH, vtv update] Change fixed size array to a vector; fix diagnostic messages.

2013-08-28 Thread Diego Novillo

On 2013-08-28 12:59 , Caroline Tice wrote:


 static void
-output_set_info (tree record_type, tree *vtbl_ptr_array, int array_size)
+output_set_info (tree record_type, vec<tree> *vtbl_ptr_array)


Since this function does not modify vtbl_ptr_array, you could pass it by 
value.




@@ -1023,23 +1010,23 @@ register_all_pairs (tree body)

   if (flag_vtv_debug)
 output_set_info (current->class_info->class_type,
- vtbl_ptr_array, num_vtable_args);
+ vtbl_ptr_array);

-  if (num_vtable_args > 1)
+  if (vtbl_ptr_array->length() > 1)
 {
-  insert_call_to_register_set (current->class_name, 
num_vtable_args,

+  insert_call_to_register_set (current->class_name,
vtbl_ptr_array, body, arg1, arg2,
size_hint_arg);
   registered_at_least_one = true;
 }
-  else if (num_vtable_args >= 0)
+  else


You've changed the meaning of this else now.  Intended?  I can't tell 
from context.


The patch looks fine, otherwise.


Diego.


Re: Make some comdats implicitly hidden

2013-08-28 Thread Jan Hubicka
> On 08/26/2013 03:57 PM, Jan Hubicka wrote:
> >While analyzing the relocations of libreoffice I noticed that I can play
> >the same game w/o LTO at linker level.  Making those symbols hidden turns
> >external relocations into internal ones and should improve runtime, too: comdat
> >sharing by dynamic linker is expensive and won't save duplicated functions
> >from the binary.
> 
> Makes sense.
> 
> >There is ext/visibility/template2.C that fails with the patch. It tests that
> >the visibility pragma does not make the symbol local, but now we do so based
> >on the logic above.
> >
> >Jason, do you have any idea how to fix the testcase? I tried to use a different
> >visibility but that doesn't work, since we do not have corresponding scan-not-*
> 
> Maybe change it to use a function that has its address taken, so
> this optimization doesn't apply.
OK, thanks.  This had not occurred to me.  Does the following look reasonable?

Index: testsuite/g++.dg/ext/visibility/template2.C
===
--- testsuite/g++.dg/ext/visibility/template2.C (revision 202047)
+++ testsuite/g++.dg/ext/visibility/template2.C (working copy)
@@ -4,15 +4,16 @@
 
 /* { dg-do compile } */
 /* { dg-require-visibility "" } */
-/* { dg-final { scan-not-hidden "_ZN1SIiED1Ev" } } */
-/* { dg-final { scan-not-hidden "_ZN1SIiEC1ERKi" } } */
+/* { dg-final { scan-not-hidden "_ZN1SIiE9nothiddenEv" } } */
 
template <class T>
 struct S
 {
   S (const T &);
+  void nothidden(void);
   ~S ();
   T t;
+  void (S::* ptr) (void);
 };
 
template <class T>
S<T>::S (const T &x)
 {
   t = x;
 }
+template <class T>
+void
+S<T>::nothidden(void)
+{
+  if (t)
+   ptr=&S::nothidden;
+}
 
template <class T>
S<T>::~S ()
@@ -30,6 +38,6 @@ S<T>::~S ()
 struct U
 {
  S<int> s;
-  U () : s (6) { }
+  U () : s (6) { s.nothidden();}
 } u;
 #pragma GCC visibility pop


Re: [PATCH, vtv update] Change fixed size array to a vector; fix diagnostic messages.

2013-08-28 Thread Diego Novillo
There are two patches attached in your original submission.  I assume
they're both identical?

On Wed, Aug 28, 2013 at 12:59 PM, Caroline Tice  wrote:
> PING!
>
> Could somebody please, pretty please review this patch?  It's not very
> big or complex, and it's been out there for three weeks now... Is
> there something else I need to do for this?
>
> -- Caroline Tice
> cmt...@google.com
>
> On Wed, Aug 21, 2013 at 1:30 PM, Caroline Tice  wrote:
>> Ping? Ping?
>>
>> On Wed, Aug 14, 2013 at 12:14 PM, Caroline Tice  wrote:
>>> Ping?
>>>
>>> On Thu, Aug 8, 2013 at 3:16 PM, Caroline Tice  wrote:
 This patch replaces the fixed sized array that was holding vtable
 pointers for a particular class hierarchy with a vector, allowing for
 dynamic resizing.  It also fixes issues with the warning diagnostics.
 I am in the process of running regression tests with this patch;
 assuming they all pass, is this patch OK to commit?

 -- Caroline Tice
 cmt...@google.com

 2013-08-08  Caroline Tice  

 * vtable-class-hierarchy.c: Remove unnecessary include statements.
 (MAX_SET_SIZE): Remove unnecessary constant.
 (register_construction_vtables):  Make vtable_ptr_array parameter
 into a vector; remove num_args parameter. Change array accesses to
 vector accesses.
 (register_other_binfo_vtables): Ditto.
 (insert_call_to_register_set): Ditto.
 (insert_call_to_register_pair): Ditto.
 (output_set_info):  Ditto.  Also change warning calls to warning_at
 calls, and fix format of warning messages.
 (register_all_pairs): Change vtbl_ptr_array from an array into a
 vector.  Remove num_vtable_args (replace with calls to vector 
 length).
 Change array stores & accesses to vector functions. Change calls to
 register_construction_vtables, register_other_binfo_vtables,
 insert_call_to_register_set, insert_call_to_register_pair and
 output_set_info to match their new signatures.  Change warning to
 warning_at and fix the format of the warning message.


Re: [ubsan] Introduce pointer_sized_int_node

2013-08-28 Thread Joseph S. Myers
On Wed, 28 Aug 2013, Richard Biener wrote:

> > That is not easily possible.  The thing is, in the C FEs they are created
> > from the C/C++ type name (for stdint purposes):
> >   if (INTPTR_TYPE)
> > intptr_type_node =
> >   TREE_TYPE (identifier_global_value (c_get_ident (INTPTR_TYPE)));
> >   if (UINTPTR_TYPE)
> > uintptr_type_node =
> >   TREE_TYPE (identifier_global_value (c_get_ident (UINTPTR_TYPE)));
> > and are supposed to match the stdint.h previously used type, because
> > it is part of ABI etc. (long vs. long long vs. int e.g. when two of these
> > are the same precision).
> > So the middle-end uintptr type needs to be something different, while it
> > ideally should have the same precision, it is not the same language type.
> 
> But it's the C ABI type - and we do need the C ABI types from within
> the middle-end.  Joseph, any idea?

The possible issues that I see are:

* Types getting determined from strings (that are the values of macros 
such as INTPTR_TYPE) by the front ends.  This is no real problem, in that 
the Fortran front end already emulates what's required in 
get_typenode_from_name.  Of course it would be cleaner for all these 
macros to be defined to evaluate to enum values rather than strings, so 
the middle-end doesn't need such emulation of C type name parsing.  See 
the thread starting at 
 for a conversion 
to using enum values for the stdint.h macros - that could be updated for 
current trunk.  Of course the aim would be for other macros such as 
SIZE_TYPE to be converted as well, and indeed to get a conversion to hooks 
(probably a few separate hooks for different groups of types, taking 
appropriate enum values as arguments, rather than just one for all the 
typedefs or a separate hook for every typedef).

* Targets still not having the type information in GCC.  As I said in 
, listing those 
targets, it's probably time to deprecate them (and then freely check in 
changes that break them by requiring those types to be defined).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, tm]: Annotate _ITM_free as leaf

2013-08-28 Thread Jan Hubicka
> On Wed, Aug 28, 2013 at 07:06:04PM +0200, Uros Bizjak wrote:
> > >> while adding LEAF attributes, I apparently missed free.   Malloc is 
> > >> already annotated.
> > >> Fixed thus. Committed as obvious.
> > >
> > > Given that glibc headers mark free as leaf and nothing has been reported
> > > against it, it is probably fine; but you've ignored the comment that
> > > told you to adjust also BUILT_IN_TM_FREE.
> > 
> > Like in attached patch.
> > 
> > 2013-08-28  Uros Bizjak  
> > 
> > * gtm-builtins.def (_ITM_free): Declare leaf.
> > 
> > Tested on x86_64-pc-linux-gnu {,-m32}.
> > 
> > OK for mainline?
> 
> Yes, thanks.

Thank you for beating me to it!
Honza


Re: [PATCH] Change the badness computation to ensure no integer-underflow

2013-08-28 Thread Jan Hubicka
> > I am giving the patch brief benchmarking on profiledbootstrap and if it
> > won't
> > cause major regression, I think we should go ahead with the patch.

Uhm, I profiledbootstrapped and we are a bit too fast to get a reasonable oprofile.
What I get is:
7443  9.4372  lto1 lto1 
lto_end_uncompression(lto_compression_stream*)
4438  5.6271  lto1 lto1 
_ZL14DFS_write_treeP12output_blockP4sccsP9tree_nodebb.lto_priv.4993
2351  2.9809  lto1 lto1 
lto_output_tree(output_block*, tree_node*, bool, bool)
2179  2.7628  lto1 lto1 
_ZL30linemap_macro_loc_to_exp_pointP9line_mapsjPPK8line_map.lto_priv.7860
1910  2.4217  lto1 lto1 
_ZL19unpack_value_fieldsP7data_inP9bitpack_dP9tree_node.lto_priv.7292
1855  2.3520  libc-2.11.1.so   libc-2.11.1.so   
msort_with_tmp
1531  1.9412  lto1 lto1 
streamer_string_index(output_block*, char const*, unsigned int, bool)
1530  1.9399  libc-2.11.1.so   libc-2.11.1.so   _int_malloc
1471  1.8651  lto1 lto1 
do_estimate_growth(cgraph_node*)
1306  1.6559  lto1 lto1 
pointer_map_insert(pointer_map_t*, void const*)
1238  1.5697  lto1 lto1 
_Z28streamer_pack_tree_bitfieldsP12output_blockP9bitpack_dP9tree_node.constprop.1086
1138  1.4429  lto1 lto1 
compare_tree_sccs_1(tree_node*, tree_node*, tree_node***)
1082  1.3719  lto1 lto1 
streamer_write_tree_body(output_block*, tree_node*, bool)
1044  1.3237  lto1 lto1 
_ZL28estimate_calls_size_and_timeP11cgraph_nodePiS1_S1_j3vecIP9tree_node7va_heap6vl_ptrES7_S2_IP21ipa_agg_jump_function

We take 12 seconds of WPA on GCC (with my fork patch)
Execution times (seconds)
 phase setup :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall  
  1412 kB ( 0%) ggc
 phase opt and generate  :   4.48 (37%) usr   0.05 ( 6%) sys   4.57 (34%) wall  
 42983 kB ( 7%) ggc
 phase stream in :   7.21 (60%) usr   0.26 (32%) sys   7.47 (56%) wall  
565102 kB (93%) ggc
 phase stream out:   0.38 ( 3%) usr   0.50 (62%) sys   1.37 (10%) wall  
   623 kB ( 0%) ggc
 callgraph optimization  :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall  
 6 kB ( 0%) ggc
 ipa dead code removal   :   0.46 ( 4%) usr   0.00 ( 0%) sys   0.46 ( 3%) wall  
 0 kB ( 0%) ggc
 ipa cp  :   0.36 ( 3%) usr   0.01 ( 1%) sys   0.41 ( 3%) wall  
 38261 kB ( 6%) ggc
 ipa inlining heuristics :   2.84 (24%) usr   0.05 ( 6%) sys   2.87 (21%) wall  
 60263 kB (10%) ggc
 ipa lto gimple in   :   0.02 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa lto gimple out  :   0.04 ( 0%) usr   0.02 ( 2%) sys   0.06 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa lto decl in :   6.23 (52%) usr   0.18 (22%) sys   6.40 (48%) wall  
425731 kB (70%) ggc
 ipa lto decl out:   0.09 ( 1%) usr   0.01 ( 1%) sys   0.10 ( 1%) wall  
 0 kB ( 0%) ggc
 ipa lto cgraph I/O  :   0.22 ( 2%) usr   0.02 ( 2%) sys   0.25 ( 2%) wall  
 60840 kB (10%) ggc
 ipa lto decl merge  :   0.20 ( 2%) usr   0.00 ( 0%) sys   0.20 ( 1%) wall  
  1051 kB ( 0%) ggc
 ipa lto cgraph merge:   0.22 ( 2%) usr   0.01 ( 1%) sys   0.25 ( 2%) wall  
 17676 kB ( 3%) ggc
 whopr wpa   :   0.38 ( 3%) usr   0.00 ( 0%) sys   0.35 ( 3%) wall  
   626 kB ( 0%) ggc
 whopr wpa I/O   :   0.01 ( 0%) usr   0.47 (58%) sys   0.98 ( 7%) wall  
 0 kB ( 0%) ggc
 whopr partitioning  :   0.18 ( 1%) usr   0.00 ( 0%) sys   0.19 ( 1%) wall  
 0 kB ( 0%) ggc
 ipa reference   :   0.31 ( 3%) usr   0.01 ( 1%) sys   0.33 ( 2%) wall  
 0 kB ( 0%) ggc
 ipa profile :   0.09 ( 1%) usr   0.01 ( 1%) sys   0.10 ( 1%) wall  
   150 kB ( 0%) ggc
 ipa pure const  :   0.29 ( 2%) usr   0.00 ( 0%) sys   0.30 ( 2%) wall  
 0 kB ( 0%) ggc
 tree SSA incremental:   0.02 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall  
   203 kB ( 0%) ggc
 tree operand scan   :   0.00 ( 0%) usr   0.01 ( 1%) sys   0.00 ( 0%) wall  
  3512 kB ( 1%) ggc
 dominance computation   :   0.01 ( 0%) usr   0.00 ( 0%) sys   0.02 ( 0%) wall  
 0 kB ( 0%) ggc
 varconst:   0.00 ( 0%) usr   0.00 ( 0%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
 unaccounted todo:   0.06 ( 0%) usr   0.01 ( 1%) sys   0.09 ( 1%) wall  
 0 kB ( 0%) ggc
 TOTAL :  12.08 0.8113.43 
610123 kB

Inlining heuristics was also around 25% w/o your change.  Timing matches my
experience with firefox - growth estimation tends to be among the hot functions;
with caching, badness i

Re: [PATCH, tm]: Annotate _ITM_free as leaf

2013-08-28 Thread Jakub Jelinek
On Wed, Aug 28, 2013 at 07:06:04PM +0200, Uros Bizjak wrote:
> >> while adding LEAF attributes, I apparently missed free.   Malloc is 
> >> already annotated.
> >> Fixed thus. Comitted as obvoius.
> >
> > Given that glibc headers mark free as leaf and nothing has been reported
> > against it, it is probably fine; but you've ignored the comment that
> > told you to adjust also BUILT_IN_TM_FREE.
> 
> Like in attached patch.
> 
> 2013-08-28  Uros Bizjak  
> 
> * gtm-builtins.def (_ITM_free): Declare leaf.
> 
> Tested on x86_64-pc-linux-gnu {,-m32}.
> 
> OK for mainline?

Yes, thanks.

> --- gcc/gtm-builtins.def(revision 202054)
> +++ gcc/gtm-builtins.def(working copy)
> @@ -28,7 +28,7 @@ DEF_TM_BUILTIN (BUILT_IN_TM_MALLOC, "_ITM_malloc",
>  DEF_TM_BUILTIN (BUILT_IN_TM_CALLOC, "_ITM_calloc",
> BT_FN_PTR_SIZE_SIZE, ATTR_TMPURE_MALLOC_NOTHROW_LIST)
>  DEF_TM_BUILTIN (BUILT_IN_TM_FREE, "_ITM_free",
> -   BT_FN_VOID_PTR, ATTR_TMPURE_NOTHROW_LIST)
> +   BT_FN_VOID_PTR, ATTR_TMPURE_NOTHROW_LEAF_LIST)
> 
>  /* Logging builtins.  */
>  DEF_TM_BUILTIN (BUILT_IN_TM_LOG_1, "_ITM_LU1",

Jakub


[PATCH, tm]: Annotate _ITM_free as leaf

2013-08-28 Thread Uros Bizjak
Hello!

>> while adding LEAF attributes, I apparently missed free.   Malloc is already 
>> annotated.
>> Fixed thus. Committed as obvious.
>
> Given that glibc headers mark free as leaf and nothing has been reported
> against it, it is probably fine; but you've ignored the comment that
> told you to adjust also BUILT_IN_TM_FREE.

Like in attached patch.

2013-08-28  Uros Bizjak  

* gtm-builtins.def (_ITM_free): Declare leaf.

Tested on x86_64-pc-linux-gnu {,-m32}.

OK for mainline?

Uros.

Index: gcc/gtm-builtins.def
===
--- gcc/gtm-builtins.def(revision 202054)
+++ gcc/gtm-builtins.def(working copy)
@@ -28,7 +28,7 @@ DEF_TM_BUILTIN (BUILT_IN_TM_MALLOC, "_ITM_malloc",
 DEF_TM_BUILTIN (BUILT_IN_TM_CALLOC, "_ITM_calloc",
BT_FN_PTR_SIZE_SIZE, ATTR_TMPURE_MALLOC_NOTHROW_LIST)
 DEF_TM_BUILTIN (BUILT_IN_TM_FREE, "_ITM_free",
-   BT_FN_VOID_PTR, ATTR_TMPURE_NOTHROW_LIST)
+   BT_FN_VOID_PTR, ATTR_TMPURE_NOTHROW_LEAF_LIST)

 /* Logging builtins.  */
 DEF_TM_BUILTIN (BUILT_IN_TM_LOG_1, "_ITM_LU1",


Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy

2013-08-28 Thread Mike Stump
On Aug 28, 2013, at 2:34 AM, Richard Biener  wrote:
> Huh?  Why should wide-int need to be marked GTY at all?!

Can I answer with a question?  Why would nb_iter_bound be GTYed?  Why would 
dw_val_struct be GTYed?  The first makes little sense to me.  The second, well, 
it is used to generate debug information…  When the clients no longer want to 
GTY, we could remove it.  wide_ints seem like they should not make it to the 
PCH file.

Re: [PATCH, vtv update] Change fixed size array to a vector; fix diagnostic messages.

2013-08-28 Thread Caroline Tice
PING!

Could somebody please, pretty please review this patch?  It's not very
big or complex, and it's been out there for three weeks now... Is
there something else I need to do for this?

-- Caroline Tice
cmt...@google.com

On Wed, Aug 21, 2013 at 1:30 PM, Caroline Tice  wrote:
> Ping? Ping?
>
> On Wed, Aug 14, 2013 at 12:14 PM, Caroline Tice  wrote:
>> Ping?
>>
>> On Thu, Aug 8, 2013 at 3:16 PM, Caroline Tice  wrote:
>>> This patch replaces the fixed sized array that was holding vtable
>>> pointers for a particular class hierarchy with a vector, allowing for
>>> dynamic resizing.  It also fixes issues with the warning diagnostics.
>>> I am in the process of running regression tests with this patch;
>>> assuming they all pass, is this patch OK to commit?
>>>
>>> -- Caroline Tice
>>> cmt...@google.com
>>>
>>> 2013-08-08  Caroline Tice  
>>>
>>> * vtable-class-hierarchy.c: Remove unnecessary include statements.
>>> (MAX_SET_SIZE): Remove unnecessary constant.
>>> (register_construction_vtables):  Make vtable_ptr_array parameter
>>> into a vector; remove num_args parameter. Change array accesses to
>>> vector accesses.
>>> (register_other_binfo_vtables): Ditto.
>>> (insert_call_to_register_set): Ditto.
>>> (insert_call_to_register_pair): Ditto.
>>> (output_set_info):  Ditto.  Also change warning calls to warning_at
>>> calls, and fix format of warning messages.
>>> (register_all_pairs): Change vtbl_ptr_array from an array into a
>>> vector.  Remove num_vtable_args (replace with calls to vector 
>>> length).
>>> Change array stores & accesses to vector functions. Change calls to
>>> register_construction_vtables, register_other_binfo_vtables,
>>> insert_call_to_register_set, insert_call_to_register_pair and
>>> output_set_info to match their new signatures.  Change warning to
>>> warning_at and fix the format of the warning message.


Re: [PATCH] Sanitize block partitioning under -freorder-blocks-and-partition

2013-08-28 Thread Jan Hubicka
Hi,
with Martin we made a bit of progress on analyzing the problems.  We now have
COMDAT profile merging for FDO and we also noticed that forks can make your
basic blocks appear never executed even though they are executed every run:
the fork is accounted as 3 independent runs of the program.  The first run
is until the fork; the other two runs are the two post-fork variants.

I have a patch to track this.  Moreover, vforks seem to produce repeated
merging of results.

These two factors riddled the firefox profiles enough that it took us
weeks to understand what happens.
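
To illustrate the fork effect (a minimal example; the exact run
accounting depends on libgcov):

  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>

  /* Built with -fprofile-generate, a single execution dumps counters
     from both the parent and the child, so the pre-fork code is merged
     as if it ran in several independent runs and per-run averages get
     diluted.  */
  int
  main (void)
  {
    if (fork () == 0)
      exit (0);   /* child dumps its inherited counters at exit */
    wait (NULL);  /* parent dumps again at its own exit */
    return 0;
  }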
> + if (changed)
> +{
> +  /* Edge forwarding in particular can cause hot blocks 
> previously
> + reached by both hot and cold blocks to become dominated only
> + by cold blocks. This will cause the verification
> below to fail,
> + and lead to now cold code in the hot section. This is not 
> easy
> + to detect and fix during edge forwarding, and in some cases
> + is only visible after newly unreachable blocks are deleted,
> + which will be done in fixup_partitions.  */
> +  fixup_partitions ();

Is it really necessary to run this from the internal loop of cfgcleanup?  It
seems you will play a back-and-forth game where edge forwarding will remove
your fallthru and you will be re-adding it?

I would wait for cfgcleanup to finish its job (I don't really think it needs
to be iterative) and then do the fixup, possibly cleaning up when the blocks
were repositioned (I suppose that is the only case where the code above
introduces new cfgcleanup opportunities).

> +  /* Walk the preds/succs and check if there is at least one already
> + marked hot. Keep track of the most frequent pred/succ so that we
> + can mark it hot if we don't find one.  */
> +  FOR_EACH_EDGE (e, ei, edges)
> +{
> +  basic_block reach_bb = walk_up ? e->src : e->dest;
> +
> +  if (e->flags & EDGE_DFS_BACK)
> +continue;
> +
> +  if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION)
> +  {
> +found = true;
> +break;
> +  }
> +  if (e->probability > highest_probability)
> +highest_probability = e->probability;

When doing the predecessor walk, if you have two predecessors, one executing 10
times and reaching the block with probability 1%, you want to choose it over a
block executing once and reaching you with probability 100%.

You probably want to look for the most likely predecessor here.  You need to
look for the highest e->count, and if they are all 0, for the highest
EDGE_FREQUENCY, and for maximal probability only if all else fails?
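
I.e. something along these lines instead of the probability-only scan (a
sketch against the loop above, reusing its variables):

  /* Pick the most likely predecessor edge: prefer profile counts, then
     estimated frequencies, and fall back to branch probability.  */
  edge best = NULL;
  FOR_EACH_EDGE (e, ei, edges)
    {
      if (e->flags & EDGE_DFS_BACK)
        continue;
      if (!best
          || e->count > best->count
          || (e->count == best->count
              && EDGE_FREQUENCY (e) > EDGE_FREQUENCY (best))
          || (e->count == best->count
              && EDGE_FREQUENCY (e) == EDGE_FREQUENCY (best)
              && e->probability > best->probability))
        best = e;
    }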

> +}
> +
> +  /* If bb is reached by (or reaches, in the case of !WALK_UP) another 
> hot
> + block (or unpartitioned, e.g. the entry block) then it is ok. If 
> not,
> + then the most frequent pred (or succ) needs to be adjusted.  In the
> + case where multiple preds/succs have the same probability (e.g. a
> + 50-50 branch), then both will be adjusted.  */
> +  if (found)
> +continue;
> +
> +  FOR_EACH_EDGE (e, ei, edges)
> +{
> +  if (e->flags & EDGE_DFS_BACK)
> +continue;
> +  if (e->probability < highest_probability)
> +continue;

Again, for the predecessor walk you need to apply the somewhat crazy logic
described above.
> Index: predict.c
> ===
> --- predict.c   (revision 202021)
> +++ predict.c   (working copy)
> @@ -241,6 +241,22 @@ probably_never_executed_bb_p (struct function *fun
>return false;
>  }
> 
> +
> +/* Return true in case edge E is probably never executed.  */
> +
> +bool
> +probably_never_executed_edge_p (struct function *fun, edge e)
> +{
> +  gcc_checking_assert (fun);
> +  if (profile_info && flag_branch_probabilities)
> +return ((e->count + profile_info->runs / 2) / profile_info->runs) == 0;
> +  if ((!profile_info || !flag_branch_probabilities)
> +  && (cgraph_get_node (fun->decl)->frequency
> + == NODE_FREQUENCY_UNLIKELY_EXECUTED))
> +return true;
> +  return false;
Instead of duplicating the conditional, break out the tests into
probably_never_executed_count_p, like we have for maybe_hot_count_p.
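
I.e. roughly (a sketch; the exact signature is a guess):

  /* Shared helper: a count that averages below half an execution per
     training run is considered never executed.  */
  static bool
  probably_never_executed_count_p (gcov_type count)
  {
    return ((count + profile_info->runs / 2) / profile_info->runs) == 0;
  }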

It would be nice to extend this to work w/o a profile:
probably_never_executed_edge_p can return true for EH edges and setjmp edges,
and we can then walk bodies ignoring EH/setjmp, flagging blocks that are
probably never executed, enabling the partitioning to do its job w/o a
profile.

Otherwise the patch looks OK to me.  Thanks for working on this.  Do we have
agreement on the C++ way of mixing declarations and code?

Honza


Re: wide-int branch now up for public comment and review

2013-08-28 Thread Mike Stump
On Aug 28, 2013, at 5:48 AM, Richard Biener  wrote:
>> Only if the precision is > HOST_BITS_PER_WIDE_INT.  If the precision
>> is HOST_BITS_PER_WIDE_INT then both are { -1U }.
> 
> That wasn't my understanding on how things work.

You are thinking about prec==0 numbers.  These are useful and important for 
ease of use of the wide-int package.  They allow one to do:

  wide_int w = …;

  w = w + 6;
  w = w - 3;
  w = w + (unsigned HOST_WIDE_INT)~0;

and extend the constant out to the precision of the other side.  This is a very 
narrow feature and not a general property of a wide_int.  In general, 
signedness of a wide_int is an external feature of wide_int.  We only permit 
prec==0 numbers for ease of use, and ease of use needs to track the sign, in 
the general case.  Now, one is free to have a precision that allows the sign to 
be stored, this is available to the user, if they want.  They merely are not 
forced to do this.  For example, RTL largely doesn't want or need a sign.

>> Right, that's what the constructors, from_* and to_* routines do.
> 
> I wonder where the from_tree and to_tree ones are?

tree t;
wide_int w = t;

wide_int_to_tree needs an additional type, so, the spelling is not as short out 
of necessity.

> Are they
> from_double_int / wide_int_to_tree (what's wide_int_to_infinite_tree?)

I think wide_int_to_infinite_tree is leftover junk.  I removed it:

diff --git a/gcc/wide-int.h b/gcc/wide-int.h
index 86be20a..83c2170 100644
--- a/gcc/wide-int.h
+++ b/gcc/wide-int.h
@@ -4203,8 +4203,6 @@ wide_int_ro::to_shwi2 (HOST_WIDE_INT *s ATTRIBUTE_UNUSED,
 /* tree related routines.  */
 
 extern tree wide_int_to_tree (tree type, const wide_int_ro &cst);
-extern tree wide_int_to_infinite_tree (tree type, const wide_int_ro &cst,
-  unsigned int prec);
 extern tree force_fit_type_wide (tree, const wide_int_ro &, int, bool);
 
 /* real related routines.  */



Re: [PATCH] Convert more passes to new dump framework

2013-08-28 Thread Teresa Johnson
On Wed, Aug 28, 2013 at 7:09 AM, Teresa Johnson  wrote:
> On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
>  wrote:
>> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson  wrote:
>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson  wrote:
 On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor  wrote:
> Hi,
>
> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor  wrote:
>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>> >> This patch ports messages to the new dump framework,
>> >
>> > It would be great this new framework was documented somewhere.  I lost
>> > track of what was agreed it would be and from the uses in the
>> > vectorizer I was never quite sure how to utilize it in other passes.
>>
>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>> wiki or elsewhere?
>
> Thanks
>
>>
>> >
>> > I'd also like to point out two other minor things inline:
>> >
>> > [...]
>> >
>> >> 2013-08-06  Teresa Johnson  
>> >> Dehao Chen  
>> >>
>> >> * dumpfile.c (dump_loc): Add column number to output, make 
>> >> newlines
>> >> consistent.
>> >> * dumpfile.h (OPTGROUP_OTHER): Add and enable under 
>> >> OPTGROUP_ALL.
>> >> * ipa-inline-transform.c (clone_inlined_nodes):
>> >> (cgraph_node_opt_info): New function.
>> >> (cgraph_node_call_chain): Ditto.
>> >> (dump_inline_decision): Ditto.
>> >> (inline_call): Invoke dump_inline_decision.
>> >> * doc/invoke.texi: Document optall -fopt-info flag.
>> >> * profile.c (read_profile_edge_counts): Use new dump 
>> >> framework.
>> >> (compute_branch_probabilities): Ditto.
>> >> * passes.c (pass_manager::register_one_dump_file): Use 
>> >> OPTGROUP_OTHER
>> >> when pass not in any opt group.
>> >> * value-prof.c (check_counter): Use new dump framework.
>> >> (find_func_by_funcdef_no): Ditto.
>> >> (check_ic_target): Ditto.
>> >> * coverage.c (get_coverage_counts): Ditto.
>> >> (coverage_init): Setup new dump framework.
>> >> * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>> >> * ipa-inline.h (is_in_ipa_inline): Declare.
>> >>
>> >> * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>> >> * testsuite/gcc.dg/pr26570.c: Ditto.
>> >> * testsuite/gcc.dg/pr32773.c: Ditto.
>> >> * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>> >>
>> >
>> > [...]
>> >
>> >> Index: ipa-inline-transform.c
>> >> ===
>> >> --- ipa-inline-transform.c  (revision 201461)
>> >> +++ ipa-inline-transform.c  (working copy)
>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, 
>> >> bool d
>> >>  }
>> >>
>> >>
>> >> +#define MAX_INT_LENGTH 20
>> >> +
>> >> +/* Return NODE's name and profile count, if available.  */
>> >> +
>> >> +static const char *
>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>> >> +{
>> >> +  char *buf;
>> >> +  size_t buf_size;
>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 
>> >> 0);
>> >> +
>> >> +  if (!bfd_name)
>> >> +bfd_name = "unknown";
>> >> +
>> >> +  buf_size = strlen (bfd_name) + 1;
>> >> +  if (profile_info)
>> >> +buf_size += (MAX_INT_LENGTH + 3);
>> >> +
>> >> +  buf = (char *) xmalloc (buf_size);
>> >> +
>> >> +  strcpy (buf, bfd_name);
>> >> +
>> >> +  if (profile_info)
>> >> +sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, 
>> >> node->count);
>> >> +  return buf;
>> >> +}
>> >
>> > I'm not sure if output of this function is aimed only at the user or
>> > if it is supposed to be used by gcc developers as well.  If the
>> > latter, an incredibly useful thing is to also dump node->symbol.order
>> > too.  We usually dump it after "/" sign separating it from node name.
>> > It is invaluable when examining decisions in C++ code where you can
>> > have lots of clones of a node (and also because existing dumps print
>> > it, it is easy to combine them).
>>
>> The output is useful for both power users doing performance tuning of
>> their application, and by gcc developers. Adding the id is not so
>> useful for the former, but I agree that it is very useful for compiler
>> developers. In fact, in the google branch version we emit more verbose
>> information (the lipo module id and the funcdef_no) to help uniquely
>> identify the routines and to aid in post-processing by humans and
>> tools. S

Re: wide-int branch now up for public comment and review

2013-08-28 Thread Mike Stump
On Aug 28, 2013, at 3:22 AM, Richard Biener  wrote:
> Btw, rtl.h still wastes space with
> 
> struct GTY((variable_size)) hwivec_def {
>  int num_elem; /* number of elements */
>  HOST_WIDE_INT elem[1];
> };
> 
> struct GTY((chain_next ("RTX_NEXT (&%h)"),
>chain_prev ("RTX_PREV (&%h)"), variable_size)) rtx_def {
> ...
>  /* The first element of the operands of this rtx.
> The number of operands and their types are controlled
> by the `code' field, according to rtl.def.  */
>  union u {
>rtunion fld[1];
>HOST_WIDE_INT hwint[1];
>struct block_symbol block_sym;
>struct real_value rv;
>struct fixed_value fv;
>struct hwivec_def hwiv;
>  } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
> };
> 
> there are 32bits available before the union.  If you don't use
> those for num_elem then all wide-ints will at least take as
> much space as DOUBLE_INTs originally took - and large ints
> that would have required DOUBLE_INTs in the past will now
> require more space than before.  Which means your math
> motivating the 'num_elem' encoding stuff is wrong.  With
> moving 'num_elem' before u you can even re-use the hwint
> field in the union as the existing double-int code does
> (which in fact could simply do the encoding trick in the
> old CONST_DOUBLE scheme, similar for the tree INTEGER_CST
> container).

So, HOST_WIDE_INT is likely 64 bits, and likely is 64 bit aligned.  The base 
(stuff before the union) is 32 bits.  There is a 32 bit gap, even if not used 
before the HOST_WIDE_INT elem.  We place the num_elem in this gap.  Even if the
field were removed, the size would not change, nor the placement of elem.  So, 
short of packing, a 32-bit HWI host or going with a 32-bit type instead of a 
HOST_WIDE_INT, I'm not sure I follow you?  I tend to discount 32-bit hosted 
compilers as a thing of the past.
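
Concretely, for an LP64 host (illustrative field names, not the real
rtx_def):

  /* The 32 bits before the union leave a 4-byte hole ahead of the
     8-byte-aligned HOST_WIDE_INT array; num_elem lives in that hole,
     so removing it would not shrink the structure.  */
  struct rtx_like
  {
    unsigned int code_mode_bits;  /* the existing 32 bits */
    unsigned int num_elem;        /* occupies what was padding */
    union
    {
      long long hwint[1];         /* stands in for HOST_WIDE_INT */
    } u;
  };
  /* sizeof (struct rtx_like) == 16 with or without num_elem.  */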

Re: [RFC] Old school parallelization of WPA streaming

2013-08-28 Thread Michael Matz
Hi,

On Wed, 21 Aug 2013, Richard Biener wrote:

> I also fail to see why threads should not work here.  Maybe simply 
> annotate gcc with openmp?

Threads simply don't work here, because the whole streamer infrastructure 
(or anything else in GCC for that matter) isn't thread safe (you'd have to 
have multiple streamer objects, multiple SCC finder objects, and you'd 
have to audit everything for not using any other shared resources).  
Fork-fire-forget is really a much simpler choice here IMO; no worries 
about shared resources, less debug hassle.
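
For reference, the shape of the fork-fire-forget scheme (function names
are made up):

  #include <sys/wait.h>
  #include <unistd.h>

  static void
  stream_out_partition (int i)  /* placeholder for the real streamer */
  {
    (void) i;
  }

  /* Each partition is written by a freshly forked child, so no
     streamer or SCC-finder state ever needs to be thread safe.  */
  static void
  stream_out_all_partitions (int n)
  {
    for (int i = 0; i < n; i++)
      if (fork () == 0)
        {
          stream_out_partition (i);
          _exit (0);
        }
    while (wait (NULL) > 0)
      ;  /* reap all children before WPA exits */
  }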


Ciao,
Michael.


Re: [PATCH] -mcmodel=large -fpic TLS GD and LD support gcc + binutils (PR target/58067)

2013-08-28 Thread H.J. Lu
On Wed, Aug 28, 2013 at 2:37 AM, Jakub Jelinek  wrote:
> On Wed, Aug 14, 2013 at 09:03:24AM +0200, Uros Bizjak wrote:
>> The implementation for x86 is technically OK, but I wonder if these
>> sequences should be documented in some authoritative document about
>> TLS relocations. The "ELF Handling For Thread-Local Storage" document
>> [1] doesn't mention various code models fo x86_64, so I was not able
>> to cross-check the implementaton vs. documentation.
>>
>> [1] http://www.akkadia.org/drepper/tls.pdf
>
> Ping, are the patches ok for gcc trunk and binutils trunk?
> Uli has kindly updated the docs some time ago.
>

Linker change is OK with testcases for GD and LD.

Thanks.

-- 
H.J.


Re: [PATCH] Convert more passes to new dump framework

2013-08-28 Thread Xinliang David Li
On Wed, Aug 28, 2013 at 7:09 AM, Teresa Johnson  wrote:
> On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
>  wrote:
>> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson  wrote:
>>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson  wrote:
 On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor  wrote:
> Hi,
>
> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
>> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor  wrote:
>> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
>> >> This patch ports messages to the new dump framework,
>> >
>> > It would be great this new framework was documented somewhere.  I lost
>> > track of what was agreed it would be and from the uses in the
>> > vectorizer I was never quite sure how to utilize it in other passes.
>>
>> Cc'ing Sharad who implemented this - Sharad, is this documented on a
>> wiki or elsewhere?
>
> Thanks
>
>>
>> >
>> > I'd also like to point out two other minor things inline:
>> >
>> > [...]
>> >
>> >> 2013-08-06  Teresa Johnson  
>> >> Dehao Chen  
>> >>
>> >> * dumpfile.c (dump_loc): Add column number to output, make 
>> >> newlines
>> >> consistent.
>> >> * dumpfile.h (OPTGROUP_OTHER): Add and enable under 
>> >> OPTGROUP_ALL.
>> >> * ipa-inline-transform.c (clone_inlined_nodes):
>> >> (cgraph_node_opt_info): New function.
>> >> (cgraph_node_call_chain): Ditto.
>> >> (dump_inline_decision): Ditto.
>> >> (inline_call): Invoke dump_inline_decision.
>> >> * doc/invoke.texi: Document optall -fopt-info flag.
>> >> * profile.c (read_profile_edge_counts): Use new dump 
>> >> framework.
>> >> (compute_branch_probabilities): Ditto.
>> >> * passes.c (pass_manager::register_one_dump_file): Use 
>> >> OPTGROUP_OTHER
>> >> when pass not in any opt group.
>> >> * value-prof.c (check_counter): Use new dump framework.
>> >> (find_func_by_funcdef_no): Ditto.
>> >> (check_ic_target): Ditto.
>> >> * coverage.c (get_coverage_counts): Ditto.
>> >> (coverage_init): Setup new dump framework.
>> >> * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
>> >> * ipa-inline.h (is_in_ipa_inline): Declare.
>> >>
>> >> * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
>> >> * testsuite/gcc.dg/pr26570.c: Ditto.
>> >> * testsuite/gcc.dg/pr32773.c: Ditto.
>> >> * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
>> >>
>> >
>> > [...]
>> >
>> >> Index: ipa-inline-transform.c
>> >> ===
>> >> --- ipa-inline-transform.c  (revision 201461)
>> >> +++ ipa-inline-transform.c  (working copy)
>> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, 
>> >> bool d
>> >>  }
>> >>
>> >>
>> >> +#define MAX_INT_LENGTH 20
>> >> +
>> >> +/* Return NODE's name and profile count, if available.  */
>> >> +
>> >> +static const char *
>> >> +cgraph_node_opt_info (struct cgraph_node *node)
>> >> +{
>> >> +  char *buf;
>> >> +  size_t buf_size;
>> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 
>> >> 0);
>> >> +
>> >> +  if (!bfd_name)
>> >> +bfd_name = "unknown";
>> >> +
>> >> +  buf_size = strlen (bfd_name) + 1;
>> >> +  if (profile_info)
>> >> +buf_size += (MAX_INT_LENGTH + 3);
>> >> +
>> >> +  buf = (char *) xmalloc (buf_size);
>> >> +
>> >> +  strcpy (buf, bfd_name);
>> >> +
>> >> +  if (profile_info)
>> >> +sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, 
>> >> node->count);
>> >> +  return buf;
>> >> +}
>> >
>> > I'm not sure if output of this function is aimed only at the user or
>> > if it is supposed to be used by gcc developers as well.  If the
>> > latter, an incredibly useful thing is to also dump node->symbol.order
>> > too.  We usually dump it after "/" sign separating it from node name.
>> > It is invaluable when examining decisions in C++ code where you can
>> > have lots of clones of a node (and also because existing dumps print
>> > it, it is easy to combine them).
>>
>> The output is useful for both power users doing performance tuning of
>> their application, and by gcc developers. Adding the id is not so
>> useful for the former, but I agree that it is very useful for compiler
>> developers. In fact, in the google branch version we emit more verbose
>> information (the lipo module id and the funcdef_no) to help uniquely
>> identify the routines and to aid in post-processing by humans and
>> tools. S

Another jump threading improvement

2013-08-28 Thread Jeff Law


When the code was written to thread jumps through joiner blocks, out of 
an abundance of caution I severely restricted the form allowed for the 
successors of the joiner block.  In particular we required the successor 
of the joiner block to have just one predecessor and more than one 
successor.  This patch removes those restrictions.


By allowing the successor of the joiner to have > 1 pred, we immediately
expose more jump threading opportunities.  By allowing the successor to 
have a single successor, we can thread through forwarder blocks after 
the joiner (which is important for the FSA optimization).


This patch also filters out jump threading requests which are not useful 
because they're subsumed by a simpler jump threading request.


Consider (A, B) (B, C) (C, D).  In this case B is a joiner block and 
will need to be copied. The outgoing edge B->C will be threaded to C->D.


If we have a jump threading request (B, C) (C, D), we get the same 
effect without copying B and thus it's preferred.  In fact, if we copy B
it's likely B' will have a threadable jump, but we can't optimize it 
until the next iteration, which is obviously bad.
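
A made-up source-level example of the (A, B), (B, C), (C, D) shape:

  /* 'flag' is fully determined on each path into the join block, so
     the test after the join can be threaded from both predecessors
     without copying the joiner.  */
  int
  f (int a)
  {
    int flag;
    if (a)        /* A */
      flag = 1;
    else
      flag = 0;
                  /* B: the join block */
    if (flag)     /* C: the threadable conditional */
      return 10;  /* D */
    return 20;
  }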



Bootstrapped and regression tested on x86_64-unknown-linux-gnu. 
Installed onto the trunk.



commit 06df8aadeebff5f4cad3082efe3c12e87d3347e5
Author: Jeff Law 
Date:   Wed Aug 28 09:04:09 2013 -0600

* tree-ssa-threadedge.c (thread_around_empty_block): Remove
checks for the number of predecessors and successors allowed.
* tree-ssa-threadupdate.c (mark_threaded_blocks): Ignore requests
which require copying a joiner block if there is a request which
is a subpath that requires no joiner block copying.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 746adff..17a355c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2013-08-28  Jeff Law  
+
+   * tree-ssa-threadedge.c (thread_around_empty_block): Remove
+   checks for the number of predecessors and successors allowed.
+   * tree-ssa-threadupdate.c (mark_threaded_blocks): Ignore requests
+   which require copying a joiner block if there is a request which
+   is a subpath that requires no joiner block copying.
+
 2013-08-28  Jakub Jelinek  
 
PR middle-end/58257
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 320dec5..fc33647 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -761,14 +761,6 @@ thread_around_empty_block (edge taken_edge,
   gimple stmt;
   tree cond;
 
-  /* This block must have a single predecessor (E->dest).  */
-  if (!single_pred_p (bb))
-return NULL;
-
-  /* This block must have more than one successor.  */
-  if (single_succ_p (bb))
-return NULL;
-
   /* This block can have no PHI nodes.  This is overly conservative.  */
   if (!gsi_end_p (gsi_start_phis (bb)))
 return NULL;
diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 259a691..8a872a3 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -1146,17 +1146,56 @@ mark_threaded_blocks (bitmap threaded_blocks)
   edge e;
   edge_iterator ei;
 
+  /* It is possible to have jump threads in which one is a subpath
+ of the other.  ie, (A, B), (B, C), (C, D) where B is a joiner
+ block and (B, C), (C, D) where no joiner block exists.
+
+ When this occurs ignore the jump thread request with the joiner
+ block.  It's totally subsumed by the simpler jump thread request.
+
+ This results in less block copying, simpler CFGs.  More importantly,
+ when we duplicate the joiner block, B, in this case we will create
+ a new threading opportunity that we wouldn't be able to optimize
+ until the next jump threading iteration. 
+
+ So first convert the jump thread requests which do not require a
+ joiner block.  */
   for (i = 0; i < threaded_edges.length (); i += 3)
 {
   edge e = threaded_edges[i];
-  edge *x = XNEWVEC (edge, 2);
 
-  e->aux = x;
-  THREAD_TARGET (e) = threaded_edges[i + 1];
-  THREAD_TARGET2 (e) = threaded_edges[i + 2];
-  bitmap_set_bit (tmp, e->dest->index);
+  if (threaded_edges[i + 2] == NULL)
+   {
+ edge *x = XNEWVEC (edge, 2);
+
+ e->aux = x;
+ THREAD_TARGET (e) = threaded_edges[i + 1];
+ THREAD_TARGET2 (e) = NULL;
+ bitmap_set_bit (tmp, e->dest->index);
+   }
 }
 
+
+  /* Now iterate again, converting cases where we threaded through
+ a joiner block, but ignoring those where we have already
+ threaded through the joiner block.  */
+  for (i = 0; i < threaded_edges.length (); i += 3)
+{
+  edge e = threaded_edges[i];
+
+  if (threaded_edges[i + 2] != NULL
+ && threaded_edges[i + 1]->aux == NULL)
+   {
+ edge *x = XNEWVEC (edge, 2);
+
+ e->aux = x;
+ THREAD_TARGET (e) = threaded_edges[i + 1];
+ THREAD_TARGET2 (e) = threaded_edges[i + 2];
+   

[PATCH, PR 58106] Make ipa_edge_duplication_hook cope with recursive inlining

2013-08-28 Thread Martin Jambor
Hi,

PR 58106 happens because the reference description duplication code
cannot deal with the existence of master nodes created by recursive
inlining.  It did not put rdescs pertaining to edges going out of these
nodes into the lookup linked list (because they do not seem to be inlined
into anything), but the remapping of "not-this-edge" rdesc pointers looked
for them there - and it also needs to consider that the rdesc edge caller
is not inlined into anything in order to find them.

So this patch puts all rdesc clones into the lookup linked list and
updates the search criteria, which fixes the issue.  I have looked at how
often we duplicate rdescs and we do it very rarely, 26 times in the whole
of SPEC 2006 at -Ofast, for example.  This is expected because we create
them only for constant jump functions with addresses of functions.  In
order to build a long lookup list which could potentially be a compile
time hog, such an edge would need to be inlined and we'd have to inline
its caller massively; the length grows linearly with the number of inline
clones, and so does the number of searches.

Bootstrapped and tested on x86_64-linux, OK for trunk?

Thanks,

Martin


2013-08-27  Martin Jambor  

PR ipa/58106
* ipa-prop.c (ipa_edge_duplication_hook): Always put new rdesc to the
linked list.  When finding the correct duplicate, also consider
the caller in addition to its inlined_to node.

testsuite/
* gcc.dg/ipa/pr58106.c: New test.

Index: src/gcc/ipa-prop.c
===
--- src.orig/gcc/ipa-prop.c
+++ src/gcc/ipa-prop.c
@@ -2964,11 +2964,8 @@ ipa_edge_duplication_hook (struct cgraph
= (struct ipa_cst_ref_desc *) pool_alloc (ipa_refdesc_pool);
  dst_rdesc->cs = dst;
  dst_rdesc->refcount = src_rdesc->refcount;
- if (dst->caller->global.inlined_to)
-   {
- dst_rdesc->next_duplicate = src_rdesc->next_duplicate;
- src_rdesc->next_duplicate = dst_rdesc;
-   }
+ dst_rdesc->next_duplicate = src_rdesc->next_duplicate;
+ src_rdesc->next_duplicate = dst_rdesc;
  dst_jf->value.constant.rdesc = dst_rdesc;
}
  else
@@ -2983,9 +2980,14 @@ ipa_edge_duplication_hook (struct cgraph
  for (dst_rdesc = src_rdesc->next_duplicate;
   dst_rdesc;
   dst_rdesc = dst_rdesc->next_duplicate)
-   if (dst_rdesc->cs->caller->global.inlined_to
-   == dst->caller->global.inlined_to)
- break;
+   {
+ struct cgraph_node *top;
+ top = dst_rdesc->cs->caller->global.inlined_to
+   ? dst_rdesc->cs->caller->global.inlined_to
+   : dst_rdesc->cs->caller;
+ if (dst->caller->global.inlined_to == top)
+   break;
+   }
  gcc_assert (dst_rdesc);
  dst_jf->value.constant.rdesc = dst_rdesc;
}
Index: src/gcc/testsuite/gcc.dg/ipa/pr58106.c
===
--- /dev/null
+++ src/gcc/testsuite/gcc.dg/ipa/pr58106.c
@@ -0,0 +1,50 @@
+/* PR 58106 testcase.  Verify that rdesc chain creation and lookup works with
+   recursive inlining and master clone creation.  */
+/* { dg-do compile } */
+/* { dg-options "-O3"  } */
+
+typedef struct rtx_def *rtx;
+enum rtx_code {
+  LAST_AND_UNUSED_RTX_CODE};
+extern const char * const rtx_format[((int) LAST_AND_UNUSED_RTX_CODE)];
+struct rtx_def {
+  enum rtx_code code;
+};
+typedef int (*rtx_function) (rtx *, void *);
+extern int for_each_rtx (rtx *, rtx_function, void *);
+int
+replace_label (rtx *x, void *data)
+{
+  rtx l = *x;
+  if (l == (rtx) 0)
+{
+ {
+   rtx new_c, new_l;
+   for_each_rtx (&new_c, replace_label, data);
+ }
+}
+}
+static int
+for_each_rtx_1 (rtx exp, int n, rtx_function f, void *data)
+{
+  int result, i, j;
+  const char *format = (rtx_format[(int) (((enum rtx_code) (exp)->code))]);
+  rtx *x;
+  for (; format[n] != '\0'; n++)
+{
+  switch (format[n])
+ {
+ case 'e':
+   result = (*f) (x, data);
+ {
+   result = for_each_rtx_1 (*x, i, f, data);
+ }
+ }
+}
+}
+int
+for_each_rtx (rtx *x, rtx_function f, void *data)
+{
+  int i;
+  return for_each_rtx_1 (*x, i, f, data);
+}





Re: [PATCH] Convert more passes to new dump framework

2013-08-28 Thread Teresa Johnson
On Wed, Aug 28, 2013 at 4:01 AM, Richard Biener
 wrote:
> On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson  wrote:
>> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson  wrote:
>>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor  wrote:
 Hi,

 On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
> On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor  wrote:
> > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
> >> This patch ports messages to the new dump framework,
> >
> > It would be great this new framework was documented somewhere.  I lost
> > track of what was agreed it would be and from the uses in the
> > vectorizer I was never quite sure how to utilize it in other passes.
>
> Cc'ing Sharad who implemented this - Sharad, is this documented on a
> wiki or elsewhere?

 Thanks

>
> >
> > I'd also like to point out two other minor things inline:
> >
> > [...]
> >
> >> 2013-08-06  Teresa Johnson  
> >> Dehao Chen  
> >>
> >> * dumpfile.c (dump_loc): Add column number to output, make 
> >> newlines
> >> consistent.
> >> * dumpfile.h (OPTGROUP_OTHER): Add and enable under 
> >> OPTGROUP_ALL.
> >> * ipa-inline-transform.c (clone_inlined_nodes):
> >> (cgraph_node_opt_info): New function.
> >> (cgraph_node_call_chain): Ditto.
> >> (dump_inline_decision): Ditto.
> >> (inline_call): Invoke dump_inline_decision.
> >> * doc/invoke.texi: Document optall -fopt-info flag.
> >> * profile.c (read_profile_edge_counts): Use new dump framework.
> >> (compute_branch_probabilities): Ditto.
> >> * passes.c (pass_manager::register_one_dump_file): Use 
> >> OPTGROUP_OTHER
> >> when pass not in any opt group.
> >> * value-prof.c (check_counter): Use new dump framework.
> >> (find_func_by_funcdef_no): Ditto.
> >> (check_ic_target): Ditto.
> >> * coverage.c (get_coverage_counts): Ditto.
> >> (coverage_init): Setup new dump framework.
> >> * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
> >> * ipa-inline.h (is_in_ipa_inline): Declare.
> >>
> >> * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
> >> * testsuite/gcc.dg/pr26570.c: Ditto.
> >> * testsuite/gcc.dg/pr32773.c: Ditto.
> >> * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
> >>
> >
> > [...]
> >
> >> Index: ipa-inline-transform.c
> >> ===
> >> --- ipa-inline-transform.c  (revision 201461)
> >> +++ ipa-inline-transform.c  (working copy)
> >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool 
> >> d
> >>  }
> >>
> >>
> >> +#define MAX_INT_LENGTH 20
> >> +
> >> +/* Return NODE's name and profile count, if available.  */
> >> +
> >> +static const char *
> >> +cgraph_node_opt_info (struct cgraph_node *node)
> >> +{
> >> +  char *buf;
> >> +  size_t buf_size;
> >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
> >> +
> >> +  if (!bfd_name)
> >> +bfd_name = "unknown";
> >> +
> >> +  buf_size = strlen (bfd_name) + 1;
> >> +  if (profile_info)
> >> +buf_size += (MAX_INT_LENGTH + 3);
> >> +
> >> +  buf = (char *) xmalloc (buf_size);
> >> +
> >> +  strcpy (buf, bfd_name);
> >> +
> >> +  if (profile_info)
> >> +sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, 
> >> node->count);
> >> +  return buf;
> >> +}
> >
> > I'm not sure if output of this function is aimed only at the user or
> > if it is supposed to be used by gcc developers as well.  If the
> > latter, an incredibly useful thing is to also dump node->symbol.order
> > too.  We usually dump it after "/" sign separating it from node name.
> > It is invaluable when examining decisions in C++ code where you can
> > have lots of clones of a node (and also because existing dumps print
> > it, it is easy to combine them).
>
> The output is useful for both power users doing performance tuning of
> their application, and by gcc developers. Adding the id is not so
> useful for the former, but I agree that it is very useful for compiler
> developers. In fact, in the google branch version we emit more verbose
> information (the lipo module id and the funcdef_no) to help uniquely
> identify the routines and to aid in post-processing by humans and
> tools. So it is probably useful to add something similar here too. Is
> the node->symbol.order more or less unique than the funcdef_no? I see
> that you added a patch a few months ago to print t

Re: [PATCH] Change the badness computation to ensure no integer-underflow

2013-08-28 Thread Richard Biener
On Wed, Aug 28, 2013 at 2:13 PM, Jan Hubicka  wrote:
>> On Tue, Aug 27, 2013 at 2:50 PM, Jan Hubicka  wrote:
>> >> >>
>> >> >> I've updated the patch (as attached) to use sreal to compute badness.
>> >>  +  badness = ((int)((double) edge->count / max_count
>> >>  +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / 
>> >>  growth;
>> >>  +
>> >> >>>
>> >> >>> FP operations on the host are frowned upon if code generation depends
>> >> >>> on their outcome.  They all should use sreal_* as predict already 
>> >> >>> does.
>> >> >>> Other than that I wonder why the final division is int (so we get 
>> >> >>> truncation
>> >> >>> and not rounding)?  That increases the effect of doing the multiply by
>> >> >>> relbenefit in double.
>> > Sorry, i missed this patch - I need to update my scripts marking files 
>> > interesting
>> > to me.
>> >
>> > I originally duplicated the trick from global.c that used to do this for 
>> > ages.
>> > The rationale is that if you do only one FP operation, then you will not 
>> > get
>> > difference in between hosts doing 80bit fp (x86) and 64bit fp (others) and 
>> > you
>> > do the operation a lot faster than sreal does. While this is on slippery
>> > ground,
>> > controlled use of double should not imply host dependency.
>>
>> Well .. the above is clearly not "one" FP operation.  Also I don't follow.
>> If we can assume all hosts implement proper IEEE double math then
>> of course the issue is moot.  But fact is we don't (not sure if we can
>
> We are not really consistent across all hosts either, because of HOST_WIDE_INT
> differences.  What we really cared about is to have a compiler that does
> not give different results based on build flags.

Well, we care that a cross from X to T produces the same code as a cross
from Y to T or a native compiler on T.  Otherwise it's hard to even think of
using icecream like we do internally.

>> declare such hosts broken and unsupported as host).  Of couse on
>> a perfect IEEE compliant host sreal.[ch] could be simply mapped to
>> double.  Oh, and yes, 80bit x87 unit isn't holding up to that premises.
>
> sreal is not IEEE compliant.  We first hit the problem of doing non-trivial
> chains of FP operations in predict.c during my school project, and Josef
> implemented it as a dirty but reasonably quick FP emulation module.
> Hopefully it will work well enough in the badness computation, too.
>
> I am giving the patch brief benchmarking on profiledbootstrap and if it
> won't cause a major regression, I think we should go ahead with the patch.
>
> I was never really happy about the double use there and in fact the whole
> fixed-point arithmetic in the badness computation is a mess.  If we had a
> template-based Fibonacci heap and sreal fast enough, turning it all to reals
> would save quite some maintenance burden.

Yeah, well.

Richard.

> Honza


Re: Free decl_in_states when not needed

2013-08-28 Thread Jan Hubicka
> > Index: lto-section-in.c
> > ===
> > --- lto-section-in.c(revision 202047)
> > +++ lto-section-in.c(working copy)
> > @@ -414,6 +414,41 @@ lto_get_function_in_decl_state (struct l
> >return slot? ((struct lto_in_decl_state*) *slot) : NULL;
> >  }
> >  
> > +/* Free decl_states.  */
> > +
> > +void
> > +lto_free_function_in_decl_state (struct lto_in_decl_state *state)
> > +{
> > +  int i;
> > +  for (i = 0; i < LTO_N_DECL_STREAMS; i++)
> > +ggc_free (state->streams[i].trees);
> 
> We likely also waste a lot of memory by means of GC overhead here.
> Moving this array out of GC space would be better (we know the
> exact size upfront) - should be doable with making lto_tree_ref_table
> GTY((user)) and a custom marker.

Possibly, yes.  It would be a way to get this memory back for malloc use
without convincing ggc to do something useful ;)

I will be happy to play with this incrementally.  (It makes me feel a bit
worried about maintenance costs when user markers start to do smart things
everywhere, but this one is an important enough special case for sure.)
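
For the record, a minimal sketch of what such a user marker could look like
(field names follow the lto_in_decl_state/lto_tree_ref_table layout visible
in the patch above; the 'size' field and the per-tree gt_ggc_mx overload are
assumptions, not tested code):

void
gt_ggc_mx (struct lto_in_decl_state *state)
{
  /* Mark each streamed-in tree; the arrays themselves could then
     live on the ordinary heap instead of in GC space.  */
  for (int i = 0; i < LTO_N_DECL_STREAMS; i++)
    for (unsigned j = 0; j < state->streams[i].size; j++)
      gt_ggc_mx (state->streams[i].trees[j]);
}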
> 
> Stray changes?

Yes, sorry, I was running both tests at one machine.
Removed from the patch now.
> 
> Just leave the ggc_collect ()s there, they are free.

The first ggc_collect after type reading executes the garbage collector and the
other ggc_collects do nothing (since we never allocate enough ggc memory to
trigger them).  Currently we trigger ggc_collect just after tree streaming.  It
spends time walking memory, frees almost nothing, produces debug output so I
know how much memory we need, and marks the bounds high enough that ggc_collect
won't happen again.  All in all it is pointless for the overall memory use of
WPA.

I want the first run to happen only after things are merged & dead code
removed.  Then it frees about 0.5-1GB of garbage that is the result of decl
state streams disappearing and some trees actually being rendered dead.  This
way ggc allocations of the IPA optimizers can reuse the freed memory.

I can drop this change from the patch removing states now and we can discuss
these incrementally.

Honza
>   
> >timevar_pop (TV_IPA_LTO_DECL_MERGE);
> >/* Each pass will set the appropriate timer.  */
> > @@ -3503,6 +3501,9 @@ read_cgraph_and_symbols (unsigned nfiles
> >gcc_assert (all_file_decl_data[i]->symtab_node_encoder);
> >lto_symtab_encoder_delete 
> > (all_file_decl_data[i]->symtab_node_encoder);
> >all_file_decl_data[i]->symtab_node_encoder = NULL;
> > +  lto_free_function_in_decl_state 
> > (all_file_decl_data[i]->global_decl_state);
> > +  all_file_decl_data[i]->global_decl_state = NULL;
> > +  all_file_decl_data[i]->current_decl_state = NULL; 
> >  }
> >  
> >/* Finally merge the cgraph according to the decl merging decisions.  */
> > @@ -3513,7 +3514,12 @@ read_cgraph_and_symbols (unsigned nfiles
> >dump_symtab (cgraph_dump_file);
> >  }
> >lto_symtab_merge_symbols ();
> > -  ggc_collect ();
> > +
> > +  /* Do not GGC collect here;  streaming in should not produce garbage.
> > + Be sure we first collect after merging symbols, setting up 
> > visibilities
> > + and removing unreachable nodes.  This will happen after whole program
> > + visibility pass.  This should release more memory back to the system
> > + and possibly allow us to re-use it for heap.  */
> 
> But it's not wrong to collect here, no?  See above, it should be
> mostly free.
> 
> Thanks,
> Richard.


Re: [ubsan] Use pointer map instead of hash table.

2013-08-28 Thread Jakub Jelinek
On Wed, Aug 28, 2013 at 03:30:46PM +0200, Marek Polacek wrote:
From a quick look, it doesn't seem like we do.  (I was searching for
> something about pointer_map in ggc* and gen* files.)

If we can't make it GC aware easily, I'm afraid we need to revert that change to
pointer_map.  Now, the question is if the if_marked stuff can be easily done
for the previous implementation with C++ish hash table, or if we should just
use something similar to what tree.c uses for value_expr_for_decl and
similar (of course, with minor adjustments, because we want to map from
TYPE_UID rather than DECL_UID of the key to tree).  Or with pointer_map we'd
need to push both the keys (types) and what they map to (decls) into
some GC vector in addition to the pointer_map.
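
For reference, the tree.c declarations being alluded to look like this (an
ubsan variant would presumably key the hash on TYPE_UID of the 'base' field
instead of DECL_UID):

static GTY ((if_marked ("tree_decl_map_marked_p"),
             param_is (struct tree_decl_map)))
     htab_t debug_expr_for_decl;

static GTY ((if_marked ("tree_decl_map_marked_p"),
             param_is (struct tree_decl_map)))
     htab_t value_expr_for_decl;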

Jakub


Re: [ubsan] Use pointer map instead of hash table.

2013-08-28 Thread Marek Polacek
On Wed, Aug 28, 2013 at 03:05:37PM +0200, Jakub Jelinek wrote:
> On Wed, Aug 28, 2013 at 02:57:01PM +0200, Marek Polacek wrote:
> > On Wed, Aug 28, 2013 at 02:49:54PM +0200, Richard Biener wrote:
> > > On Wed, Aug 28, 2013 at 2:15 PM, Marek Polacek  wrote:
> > > > On Wed, Aug 28, 2013 at 12:40:50PM +0200, Richard Biener wrote:
> > > >> On Tue, Aug 27, 2013 at 2:33 PM, Marek Polacek  
> > > >> wrote:
> > > >> > It turned out that for tree -> tree mapping we don't need the hash
> > > >> > table at all; pointer map is much more convenient.  So this patch
> > > >> > weeds out the hash table out of ubsan and introduces pointer map
> > > >> > instead.  Quite a lot of code could go away--no need to set the
> > > >> > alloc pools up etc.
> > > >> >
> > > >> > Regtested, ran bootstrap-ubsan on x86_64-linux.  Applying to the
> > > >> > ubsan branch.
> > > >>
> > > >> You can use the type-safe pointer_map  now (ok, only the data 
> > > >> type
> > > >> is type safe, the pointer is still void).
> > > >
> > > > Thanks, done with the following.  Please let me know if you see
> > > > something wrong in this; otherwise I'll commit it if the
> > > > bootstrap-ubsan passes.
> > > 
> > > Probably misses freeing of the pointer-map using 'delete' somewhere.
> > 
> > That's a problem, since ubsan is not a pass, we can't simply delete
> > the map at the end of the pass when it's not needed anymore...
> > 
> > Perhaps some GTY(()) stuff could do it, but I don't know which one.
> 
> You basically want to keep the pointer map for as long as you can
> lookup types, which is done now from both FEs and
> middle-end?  

Yes, I think so.

> I guess it is the same category as say
> debug_expr_for_decl or value_expr_for_decl map tables, which we never free
> either.  But for those we indeed
> GTY ((if_marked ("tree_decl_map_marked_p"), param_is (struct tree_decl_map)))
> and thus remove items if the original decl is not referenced from anywhere
> else; but pointer_map doesn't allow item removal; do we walk it for GC at
> all?

From a quick look, it doesn't seem like we do.  (I was searching for
something about pointer_map in ggc* and gen* files.)

> If so, it might prevent some types from being GC collected, on the
> other side right now it isn't expected that too many types will be put into
> the map.

That's right.  I expect that typically there will be no more than,
say, 10 types in the map.

Marek


Re: [ubsan] Use pointer map instead of hash table.

2013-08-28 Thread Richard Biener
On Wed, Aug 28, 2013 at 3:05 PM, Jakub Jelinek  wrote:
> On Wed, Aug 28, 2013 at 02:57:01PM +0200, Marek Polacek wrote:
>> On Wed, Aug 28, 2013 at 02:49:54PM +0200, Richard Biener wrote:
>> > On Wed, Aug 28, 2013 at 2:15 PM, Marek Polacek  wrote:
>> > > On Wed, Aug 28, 2013 at 12:40:50PM +0200, Richard Biener wrote:
>> > >> On Tue, Aug 27, 2013 at 2:33 PM, Marek Polacek  
>> > >> wrote:
>> > >> > It turned out that for tree -> tree mapping we don't need the hash
>> > >> > table at all; pointer map is much more convenient.  So this patch
>> > >> > weeds out the hash table out of ubsan and introduces pointer map
>> > >> > instead.  Quite a lot of code could go away--no need to set the
>> > >> > alloc pools up etc.
>> > >> >
>> > >> > Regtested, ran bootstrap-ubsan on x86_64-linux.  Applying to the
>> > >> > ubsan branch.
>> > >>
>> > >> You can use the type-safe pointer_map  now (ok, only the data type
>> > >> is type safe, the pointer is still void).
>> > >
>> > > Thanks, done with the following.  Please let me know if you see
>> > > something wrong in this; otherwise I'll commit it if the
>> > > bootstrap-ubsan passes.
>> >
>> > Probably misses freeing of the pointer-map using 'delete' somewhere.
>>
>> That's a problem, since ubsan is not a pass, we can't simply delete
>> the map at the end of the pass when it's not needed anymore...
>>
>> Perhaps some GTY(()) stuff could do it, but I don't know which one.
>
> You basically want to keep the pointer map for as long as you can
> lookup types, which is done now from both FEs and
> middle-end?  I guess it is the same category as say
> debug_expr_for_decl or value_expr_for_decl map tables, which we never free
> either.  But for those we indeed
> GTY ((if_marked ("tree_decl_map_marked_p"), param_is (struct tree_decl_map)))
> and thus remove items if the original decl is not referenced from anywhere
> else; but pointer_map doesn't allow item removal; do we walk it for GC at
> all?  If so, it might prevent some types from being GC collected, on the
> other side right now it isn't expected that too many types will be put into
> the map.

pointer-map is not GC enabled.

Richard.

> Jakub


[C++ Patch] PR 58255

2013-08-28 Thread Paolo Carlini

Hi,

for value-initialization (thus void_type_node as init), explicit should 
not matter, thus we shouldn't set LOOKUP_ONLYCONVERTING. I also double 
checked that we do the right thing for testcases like:


struct A {
  explicit A(int = 0);
};

A a1 = 1; /* error */
A a2; /* Ok */
A a3(2);  /* Ok */

As expected, in such cases init is an INTEGER_CST, a NULL_TREE and a 
TREE_LIST of a single INTEGER_CST, respectively.


Booted and tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2013-08-28  Paolo Carlini  

PR c++/58255
* init.c (build_aggr_init): When init == void_type_node do not
set LOOKUP_ONLYCONVERTING.

/testsuite
2013-08-28  Paolo Carlini  

PR c++/58255
* g++.dg/cpp0x/dc7.C: New.
Index: cp/init.c
===
--- cp/init.c   (revision 202020)
+++ cp/init.c   (working copy)
@@ -1464,7 +1464,8 @@ build_aggr_init (tree exp, tree init, int flags, t
   TREE_READONLY (exp) = 0;
   TREE_THIS_VOLATILE (exp) = 0;
 
-  if (init && TREE_CODE (init) != TREE_LIST
+  if (init && init != void_type_node
+  && TREE_CODE (init) != TREE_LIST
   && !(TREE_CODE (init) == TARGET_EXPR
   && TARGET_EXPR_DIRECT_INIT_P (init))
   && !(BRACE_ENCLOSED_INITIALIZER_P (init)
Index: testsuite/g++.dg/cpp0x/dc7.C
===
--- testsuite/g++.dg/cpp0x/dc7.C(revision 0)
+++ testsuite/g++.dg/cpp0x/dc7.C(working copy)
@@ -0,0 +1,7 @@
+// PR c++/58255
+// { dg-do compile { target c++11 } }
+
+struct A {
+  explicit A() { }
+  A(int x) : A() { }
+};


Re: opt-info message change for vectorizer

2013-08-28 Thread Teresa Johnson
On Wed, Aug 28, 2013 at 2:45 AM, Richard Biener
 wrote:
> On Tue, Aug 27, 2013 at 10:30 PM, Xinliang David Li  
> wrote:
>> If this is the convention, we should probably have another patch to
>> fix all the existing opt-info messages.
>
> Yes please.
>
> Also ...
>
>
> b.c:16:A::foo: note: Loop is vectorized
>
> "loop vectorized"
>
>
> b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization
>
> "loop versioned for vectorization because of possible aliasing"
>
> b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization
>
> "loop peeled for vectorization to enhance alignment"
>
> b.c:16:A::foo: note: Completely unroll loop 6 times
>
> maybe "loop with 6 iterations completely unrolled"
>
>
> b.c:12:A::foo: note: Completely unroll loop 6 times
>
>
> I hate the excessive vertical spacing as well.

This part should be fixable after my related patch that you reviewed
(the Convert more passes to new dump framework patch). The problem is
that the dump framework was not consistently emitting newlines, so
many of the messages were explicitly adding a newline, which often
wasn't needed. I'll respond more on that thread, but for cleanup such
as removing extra newlines and adjusting the capitalization, I can
send a separate cleanup patch after these are in.
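
For the record, the cleanup amounts to adjusting calls like the following
(a sketch using the usual dump_printf_loc entry point, not an actual hunk
from the patch):

  /* before */
  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
                   "Loop is vectorized\n");

  /* after: lower case, trailing newline left to the framework */
  dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
                   "loop vectorized");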

Teresa

>
> Richard.
>
> Ok after testing?
>
> thanks,
>
> David
>>>



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413


Re: wide-int branch now up for public comment and review

2013-08-28 Thread Kenneth Zadeck



   Note that the bits above the precision are not defined and the
   algorithms used here are careful not to depend on their value.  In
   particular, values that come in from rtx constants may have random
   bits.

Which is a red herring.  It should be fixed.  I cannot even believe
that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH
or INTVAL/UINTVAL.  I don't see accesses masking out 'undefined' bits
anywhere.


Richi,

you asked for the entire patch as a branch so that you could look at the 
whole picture.
It is now time for you to do that.  I understand it is very large and
it will take some time for you to get your head around the whole 
thing.   But remember this:


The vast majority of the clients are dealing with intermediate code that 
has explicit promotions.  Not only the rtl level, but the majority of
the tree level takes inputs that have explicit precisions that have been 
matched and wants to see an explicit precision result.   For this code, 
doing the fixed precision thing where you do not ever ask about what is 
behind the curtain is a very good match.


However, there are parts of the compiler, all within the tree or gimple 
level, that do not have this view.   For this, there are two templates
that export an interface that behaves in a manner very similar to what
double-int does when the precision was smaller than 128 bits.  (we 
discovered a large number of bugs when using double-int for TImode
because they did make an infinite precision assumption, but that is 
another story.)  All numbers are converted to signed numbers that are 
extended based on their input type and the math is performed in a large 
enough field so that they never push near the end.   We know what the 
end is because we sniff the port.
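
For example (an illustration of the extension rule, not code from the
branch): an unsigned char input 0xff enters the wide field as 255, while a
signed char input 0xff enters as -1; from then on both behave like ordinary
integers, and because the field is sized from the port's widest mode, no
intermediate result ever gets near its end.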


At this point we looked at the pass we were converting and we used the 
appropriate implementation that matched the style of coding and the
algorithm, i.e. we made no substantive changes.  As mentioned in my
earlier mail, i plan to change in the future tree-ssa-ccp to use the 
fixed precision form, but this is a change that is motivated by being 
able to find more constants with the proper representation than what is 
beautiful.   But this is the only case where i think the rep should be 
substantially changed.


I know you find the fixed precision stuff unappealing.   But the truth 
is that you wanted us to make this patch so that it did as little damage 
to the way the compiler worked as possible and given that so much of the 
compiler actually does fixed precision math, this is the path of least 
resistance.


If it is reasonable to change the rtl, we may change that, but the truth 
is that the clients never see this so it is not as much of an issue as 
you are making it.   Now you can see all of the clients; see for
yourself.


Kenny




Re: [ubsan] Use pointer map instead of hash table.

2013-08-28 Thread Jakub Jelinek
On Wed, Aug 28, 2013 at 02:57:01PM +0200, Marek Polacek wrote:
> On Wed, Aug 28, 2013 at 02:49:54PM +0200, Richard Biener wrote:
> > On Wed, Aug 28, 2013 at 2:15 PM, Marek Polacek  wrote:
> > > On Wed, Aug 28, 2013 at 12:40:50PM +0200, Richard Biener wrote:
> > >> On Tue, Aug 27, 2013 at 2:33 PM, Marek Polacek  
> > >> wrote:
> > >> > It turned out that for tree -> tree mapping we don't need the hash
> > >> > table at all; pointer map is much more convenient.  So this patch
> > >> > weeds out the hash table out of ubsan and introduces pointer map
> > >> > instead.  Quite a lot of code could go away--no need to set the
> > >> > alloc pools up etc.
> > >> >
> > >> > Regtested, ran bootstrap-ubsan on x86_64-linux.  Applying to the
> > >> > ubsan branch.
> > >>
> > >> You can use the type-safe pointer_map  now (ok, only the data type
> > >> is type safe, the pointer is still void).
> > >
> > > Thanks, done with the following.  Please let me know if you see
> > > something wrong in this; otherwise I'll commit it if the
> > > bootstrap-ubsan passes.
> > 
> > Probably misses freeing of the pointer-map using 'delete' somewhere.
> 
> That's a problem, since ubsan is not a pass, we can't simply delete
> the map at the end of the pass when it's not needed anymore...
> 
> Perhaps some GTY(()) stuff could do it, but I don't know which one.

You basically want to keep the pointer map for as long as you can
lookup types, which is done now from both FEs and
middle-end?  I guess it is the same category as say
debug_expr_for_decl or value_expr_for_decl map tables, which we never free
either.  But for those we indeed
GTY ((if_marked ("tree_decl_map_marked_p"), param_is (struct tree_decl_map)))
and thus remove items if the original decl is not referenced from anywhere
else; but pointer_map doesn't allow item removal; do we walk it for GC at
all?  If so, it might prevent some types from being GC collected, on the
other side right now it isn't expected that too many types will be put into
the map.

Jakub


Re: [ubsan] Use pointer map instead of hash table.

2013-08-28 Thread Marek Polacek
On Wed, Aug 28, 2013 at 02:49:54PM +0200, Richard Biener wrote:
> On Wed, Aug 28, 2013 at 2:15 PM, Marek Polacek  wrote:
> > On Wed, Aug 28, 2013 at 12:40:50PM +0200, Richard Biener wrote:
> >> On Tue, Aug 27, 2013 at 2:33 PM, Marek Polacek  wrote:
> >> > It turned out that for tree -> tree mapping we don't need the hash
> >> > table at all; pointer map is much more convenient.  So this patch
> >> > weeds out the hash table out of ubsan and introduces pointer map
> >> > instead.  Quite a lot of code could go away--no need to set the
> >> > alloc pools up etc.
> >> >
> >> > Regtested, ran bootstrap-ubsan on x86_64-linux.  Applying to the
> >> > ubsan branch.
> >>
> >> You can use the type-safe pointer_map  now (ok, only the data type
> >> is type safe, the pointer is still void).
> >
> > Thanks, done with the following.  Please let me know if you see
> > something wrong in this; otherwise I'll commit it if the
> > bootstrap-ubsan passes.
> 
> Probably misses freeing of the pointer-map using 'delete' somewhere.

That's a problem, since ubsan is not a pass, we can't simply delete
the map at the end of the pass when it's not needed anymore...

Perhaps some GTY(()) stuff could do it, but I don't know which one.

Marek


Re: [PATCH] Support dumping type bindings and 'mutable' qualifier in lambda diagnostics.

2013-08-28 Thread Gabriel Dos Reis
On Tue, Aug 27, 2013 at 7:54 PM, Adam Butcher  wrote:
> * error.c (dump_function_decl): Use standard diagnostic flow to dump a
> lambda diagnostic, albeit without stating the function name or
> duplicating the parameter spec (which is dumped as part of the type).
> Rather than qualifying the diagnostic with 'const' for plain lambdas,
> qualify with 'mutable' if non-const.
> ---
> Okay to commit?

This looks good to me.  I have one suggestion.  See below.

> ---
>  gcc/cp/error.c | 27 +++
>  1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/cp/error.c b/gcc/cp/error.c
> index c82a0ce..a8ca269 100644
> --- a/gcc/cp/error.c
> +++ b/gcc/cp/error.c
> @@ -1380,14 +1380,7 @@ dump_function_decl (cxx_pretty_printer *pp, tree t, 
> int flags)
>int do_outer_scope = ! (flags & TFF_UNQUALIFIED_NAME);
>tree exceptions;
>vec *typenames = NULL;
> -
> -  if (DECL_NAME (t) && LAMBDA_FUNCTION_P (t))
> -{
> -  /* A lambda's signature is essentially its "type", so defer.  */
> -  gcc_assert (LAMBDA_TYPE_P (DECL_CONTEXT (t)));
> -  dump_type (pp, DECL_CONTEXT (t), flags);
> -  return;
> -}
> +  bool lambda_p = false;
>
>flags &= ~(TFF_UNQUALIFIED_NAME | TFF_TEMPLATE_NAME);
>if (TREE_CODE (t) == TEMPLATE_DECL)
> @@ -1449,21 +1442,31 @@ dump_function_decl (cxx_pretty_printer *pp, tree t, 
> int flags)
>else if (cname)
>  {
>dump_type (pp, cname, flags);
> -  pp_cxx_colon_colon (pp);
> +  if (LAMBDA_TYPE_P (cname))
> +   lambda_p = true;

Instead of introducing the local variable lambda_p and then
weaving your way through the existing maze, do this instead:
(1)  Introduce a new function, call it dump_lambda_expr
  (or something like that) that does only the formatting
  of a lambda expression, and nothing else.

   (2) when you arrive here where you test LAMBDA_TYPE_P
then do a
return dump_lambda_expr (pp, flags);

The result is much simpler and cleaner that way.
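
Something like the following, perhaps - an untested sketch of the suggested
split, reusing only helpers already visible in the patch (the exact name and
signature are up to you):

static void
dump_lambda_expr (cxx_pretty_printer *pp, tree t, int flags)
{
  /* A lambda's signature is essentially its "type", and the parameter
     spec is dumped as part of it.  */
  dump_type (pp, DECL_CONTEXT (t), flags);
  /* Qualify with 'mutable' if the call operator is non-const.  */
  tree fntype = TREE_TYPE (t);
  if (TREE_CODE (fntype) == METHOD_TYPE
      && !(TYPE_QUALS (class_of_this_parm (fntype)) & TYPE_QUAL_CONST))
    pp_c_ws_string (pp, "mutable");
}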
Thanks,

> +  else
> +   pp_cxx_colon_colon (pp);
>  }
>else
>  dump_scope (pp, CP_DECL_CONTEXT (t), flags);
>
> -  dump_function_name (pp, t, flags);
> +  /* A lambda's signature is essentially its "type", which has already been
> + dumped.  */
> +  if (!lambda_p)
> +dump_function_name (pp, t, flags);
>
>if (!(flags & TFF_NO_FUNCTION_ARGUMENTS))
>  {
> -  dump_parameters (pp, parmtypes, flags);
> +  if (!lambda_p)
> +   dump_parameters (pp, parmtypes, flags);
>
>if (TREE_CODE (fntype) == METHOD_TYPE)
> {
>   pp->padding = pp_before;
> - pp_cxx_cv_qualifier_seq (pp, class_of_this_parm (fntype));
> + if (!lambda_p)
> +   pp_cxx_cv_qualifier_seq (pp, class_of_this_parm (fntype));
> + else if (!(TYPE_QUALS (class_of_this_parm (fntype)) & 
> TYPE_QUAL_CONST))
> +   pp_c_ws_string (pp, "mutable");
>   dump_ref_qualifier (pp, fntype, flags);
> }
>
> --
> 1.8.4
>


Re: [ubsan] Use pointer map instead of hash table.

2013-08-28 Thread Richard Biener
On Wed, Aug 28, 2013 at 2:15 PM, Marek Polacek  wrote:
> On Wed, Aug 28, 2013 at 12:40:50PM +0200, Richard Biener wrote:
>> On Tue, Aug 27, 2013 at 2:33 PM, Marek Polacek  wrote:
>> > It turned out that for tree -> tree mapping we don't need the hash
>> > table at all; pointer map is much more convenient.  So this patch
>> > weeds out the hash table out of ubsan and introduces pointer map
>> > instead.  Quite a lot of code could go away--no need to set the
>> > alloc pools up etc.
>> >
>> > Regtested, ran bootstrap-ubsan on x86_64-linux.  Applying to the
>> > ubsan branch.
>>
>> You can use the type-safe pointer_map  now (ok, only the data type
>> is type safe, the pointer is still void).
>
> Thanks, done with the following.  Please let me know if you see
> something wrong in this; otherwise I'll commit it if the
> bootstrap-ubsan passes.

Probably misses freeing of the pointer-map using 'delete' somewhere.

Richard.

> 2013-08-28  Marek Polacek  
>
> * ubsan.c: Use pointer_map instead of pointer_map_t.
> (insert_decl_for_type): Adjust.
> (lookup_decl_for_type): Likewise.
>
> --- gcc/ubsan.c.mp  2013-08-28 12:54:17.778383224 +0200
> +++ gcc/ubsan.c 2013-08-28 14:09:42.400105470 +0200
> @@ -31,16 +31,14 @@ along with GCC; see the file COPYING3.
>  #include "c-family/c-common.h"
>
>  /* Map a TYPE to an ubsan type descriptor VAR_DECL for that type.  */
> -static pointer_map_t *typedesc_map;
> +static pointer_map *typedesc_map;
>
>  /* Insert DECL as the VAR_DECL for TYPE in the TYPEDESC_MAP.  */
>
>  static void
>  insert_decl_for_type (tree decl, tree type)
>  {
> -  void **slot = pointer_map_insert (typedesc_map, type);
> -  gcc_assert (*slot == NULL);
> -  *slot = decl;
> +  *typedesc_map->insert (type) = decl;
>  }
>
>  /* Find the VAR_DECL for TYPE in TYPEDESC_MAP.  If TYPE does not
> @@ -52,9 +50,13 @@ lookup_decl_for_type (tree type)
>  {
>/* If the pointer map is not initialized yet, create it now.  */
>if (typedesc_map == NULL)
> -typedesc_map = pointer_map_create ();
> -  void **slot = pointer_map_contains (typedesc_map, type);
> -  return slot ? (tree) *slot : NULL_TREE;
> +{
> +  typedesc_map = new pointer_map;
> +  /* That also means we don't have to bother with the lookup.  */
> +  return NULL_TREE;
> +}
> +  tree *t = typedesc_map->contains (type);
> +  return t ? *t : NULL_TREE;
>  }
>
>  /* Helper routine, which encodes a value in the pointer_sized_int_node.
>
> Marek


Re: wide-int branch now up for public comment and review

2013-08-28 Thread Richard Biener
On Wed, 28 Aug 2013, Richard Sandiford wrote:

> Richard Biener  writes:
> 
> > On Wed, 28 Aug 2013, Richard Sandiford wrote:
> >
> >> Richard Biener  writes:
> >> >> So the precision variable is good for the rtl level in several ways:
> >> >> 
> >> >> - As you say, it avoids adding the explicit truncations that (in 
> >> >> practice)
> >> >>   every rtl operation would need
> >> >> 
> >> >> - It's more efficient in that case, since we don't calculate high values
> >> >>   and then discard them immediately.  The common GET_MODE_PRECISION 
> >> >> (mode)
> >> >>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
> >> >>   the wide-int trappings.
> >> >> 
> >> >> - It's a good way of checking type safety and making sure that excess
> >> >>   bits aren't accidentally given a semantic meaning.  This is the most
> >> >>   important reason IMO.
> >> >> 
> >> >> The branch has both the constant-precision, very wide integers that we
> >> >> want for trees and the variable-precision integers we want for rtl,
> >> >> so it's not an "either or".  With the accessor-based implementation,
> >> >> there should be very little cost to having both.
> >> >
> >> > So what I wonder (and where we maybe disagree) is how much code
> >> > wants to inspect "intermediate" results.  Say originally you have
> >> >
> >> > rtx foo (rtx x, rtx y)
> >> > {
> >> >   rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
> >> > GEN_INT (1));
> >> >   rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
> >> > y);
> >> >   return res;
> >> > }
> >> >
> >> > and with wide-int you want to change that to
> >> >
> >> > rtx foo (rtx x, rtx y)
> >> > {
> >> >   wide_int tem = wide_int (x) + 1;
> >> >   wide_int res = tem - y;
> >> >   return res.to_rtx ();
> >> > }
> >> >
> >> > how much code ever wants to inspect 'tem' or 'res'?
> >> > That is, does it matter
> >> > if 'tem' and 'res' would have been calculated in "infinite precision"
> >> > and only to_rtx () would do the truncation to the desired mode?
> >> >
> >> > I think not.  The amount of code performing multiple operations on
> >> > _constants_ in sequence is extremely low (if it even exists).
> >> >
> >> > So I'd rather have to_rtx get a mode argument (or a precision) and
> >> > perform the required truncation / sign-extension at RTX construction
> >> > time (which is an expensive operation anyway).
> >> 
> >> I agree this is where we disagree.  I don't understand why you think
> >> the above is better.  Why do we want to do "infinite precision"
> >> addition of two values when only the lowest N bits of those values
> >> have a (semantically) defined meaning?  Earlier in the thread it sounded
> >> like we both agreed that having undefined bits in the _representation_
> >> was bad.  So why do we want to do calculations on parts of values that
> >> are undefined in the (rtx) semantics?
> >> 
> >> E.g. say we're adding two rtx values whose mode just happens to be
> >> HOST_BITS_PER_WIDE_INT in size.  Why does it make sense to calculate
> >> the carry from adding the two HWIs, only to add it to an upper HWI
> >> that has no semantically-defined value?  It's garbage in, garbage out.
> >
> > Not garbage in, and not garbage out (just wasted work).
> 
> Well, it's not garbage in the sense of an uninitialised HWI detected
> by valgrind (say).  But it's semantic garbage.
> 
> > That's the possible downside - the upside is to get rid of the notion
> > of a 'precision'.
> 
> No, it's still there, just in a different place.
> 
> > OTOH they still will be in some ways "undefined" if you consider
> >
> >   wide_int xw = from_rtx (xr, mode);
> >   tree xt = to_tree (xw, type);
> >   wide_int xw2 = from_tree (xt);
> >
> > with an unsigned type xw and xw2 will not be equal (in the
> > 'extension' bits) for a value with MSB set.
> 
> Do you mean it's undefined as things stand, or when using "infinite
> precision" for rtl?  It shouldn't lead to anything undefined at
> the moment.  Only the low GET_MODE_BITSIZE (mode) bits of xw are
> meaningful, but those are also the only bits that would be used.
> 
> > That is, RTL chooses to always sign-extend, tree chooses to extend
> > according to sign information.  wide-int chooses to ... ?  (it seems
> > the wide-int overall comment lost the part that defined its encoding,
> > but it seems that we still sign-extend val[len-1], so (unsigned
> > HOST_WIDE_INT)-1 is { -1U, 0 } with len == 2 and (HOST_WIDE_INT)-1 is
> > { -1 } with len == 1.
> 
> Only if the precision is > HOST_BITS_PER_WIDE_INT.  If the precision
> is HOST_BITS_PER_WIDE_INT then both are { -1U }.

That wasn't my understanding of how things work.

> "len" is never
> greater than precision / HOST_BITS_PER_WIDE_INT.

"len" can be one larger than precision * HOST_BITS_PER_WIDE_INT as
I originally designed the encoding scheme.  It was supposed to
be able to capture the difference between a positive and a negative
number (unlike the RTL rep).

I see canoni

[ubsan] Fix Makefile.in dependencies

2013-08-28 Thread Marek Polacek
As usual, when changing #include's in ubsan.c, I forgot to adjust
Makefile.in as well.  Fixed now.

Applying to the ubsan branch.

2013-08-28  Marek Polacek  

* Makefile.in (ubsan.o): Add pointer-set.h dependency.  Remove
alloc-pool.h and HASH_TABLE_H dependencies.

--- gcc/Makefile.in.mp  2013-08-28 14:30:28.537813780 +0200
+++ gcc/Makefile.in 2013-08-28 14:30:33.498831590 +0200
@@ -2273,7 +2273,7 @@ tsan.o : $(CONFIG_H) $(SYSTEM_H) $(TREE_
intl.h cfghooks.h output.h options.h $(C_COMMON_H) tsan.h asan.h \
tree-ssa-propagate.h
 ubsan.o : ubsan.c ubsan.h $(CONFIG_H) $(SYSTEM_H) $(GIMPLE_H) \
-   output.h coretypes.h $(TREE_H) alloc-pool.h $(CGRAPH_H) $(HASH_TABLE_H) \
+   output.h coretypes.h $(TREE_H) $(CGRAPH_H) pointer-set.h \
toplev.h $(C_COMMON_H)
 tree-ssa-tail-merge.o: tree-ssa-tail-merge.c \
$(SYSTEM_H) $(CONFIG_H) coretypes.h $(TM_H) $(BITMAP_H) \

Marek


Re: wide-int branch updated.

2013-08-28 Thread Richard Biener
On Wed, 28 Aug 2013, Kenneth Zadeck wrote:

> On 08/28/2013 03:45 AM, Richard Biener wrote:
> > On Tue, 27 Aug 2013, Kenneth Zadeck wrote:
> > 
> > > removed all knowledge of SHIFT_COUNT_TRUNCATED from wide-int
> > > 
> > > both Richard Biener and Richard Sandiford had commented negatively about
> > > this.
> > > 
> > > fixed bug with wide-int::fits_uhwi_p.
  inline bool
  wide_int_ro::fits_uhwi_p () const
  {
-  return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0);
+  return (precision <= HOST_BITS_PER_WIDE_INT)
+    || (len == 1 && val[0] >= 0)
+    || (len == 2 && (precision >= 2 * HOST_BITS_PER_WIDE_INT)
+        && (val[1] == 0))
+    || (len == 2
+        && (sext_hwi (val[1], precision & (HOST_BITS_PER_WIDE_INT - 1))
+            == 0));
  }
> > 
it now gets scary ;)  Still wrong for precision == 0?
> no, because anything that comes in at precision 0 is a canonized sign extended
> number already.   the precision 0 just means that it is safe to be any
> precision.

Hmm, how can "any" precision be valid?  Only any precision that can
represent the value.  fits_uhwi_p asks whether truncation to
hwi precision is value-preserving.

> > ;)
> > 
> > I wonder what its semantics are ... in double_int we simply require
> > high == 0 (thus, negative numbers are not allowed).  with
> > precision <= HOST_BITS_PER_WIDE_INT you allow negative numbers.
> > 
> > Matching what double-int fits_uhwi does would be
> > 
> > (len == 1 && ((signed HOST_WIDE_INT)val[0]) >= 0)
> it is signed so i am matching this part.
> > || (len == 2 && val[1] == 0)
> so this does not work.   say i had a precision 70 bit wide-int. The bits above
> the precision are undefined, so i have to clear it out.   This is what the two
> lines at len 2 are for.   However if the precision is greater than 2 hwi's
> then we can do something this simple.

?  The bits in the encoding should not be undefined.  And why should
they be magically defined when the precision is greater than 2 hwi's then?

Richard.


Re: wide-int branch now up for public comment and review

2013-08-28 Thread Richard Sandiford
Richard Biener  writes:

> On Wed, 28 Aug 2013, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> >> So the precision variable is good for the rtl level in several ways:
>> >> 
>> >> - As you say, it avoids adding the explicit truncations that (in practice)
>> >>   every rtl operation would need
>> >> 
>> >> - It's more efficient in that case, since we don't calculate high values
>> >>   and then discard them immediately.  The common GET_MODE_PRECISION (mode)
>> >>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
>> >>   the wide-int trappings.
>> >> 
>> >> - It's a good way of checking type safety and making sure that excess
>> >>   bits aren't accidentally given a semantic meaning.  This is the most
>> >>   important reason IMO.
>> >> 
>> >> The branch has both the constant-precision, very wide integers that we
>> >> want for trees and the variable-precision integers we want for rtl,
>> >> so it's not an "either or".  With the accessor-based implementation,
>> >> there should be very little cost to having both.
>> >
>> > So what I wonder (and where we maybe disagree) is how much code
>> > wants to inspect "intermediate" results.  Say originally you have
>> >
>> > rtx foo (rtx x, rtx y)
>> > {
>> >   rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
>> > GEN_INT (1));
>> >   rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
>> > y);
>> >   return res;
>> > }
>> >
>> > and with wide-int you want to change that to
>> >
>> > rtx foo (rtx x, rtx y)
>> > {
>> >   wide_int tem = wide_int (x) + 1;
>> >   wide_int res = tem - y;
>> >   return res.to_rtx ();
>> > }
>> >
>> > how much code ever wants to inspect 'tem' or 'res'?
>> > That is, does it matter
>> > if 'tem' and 'res' would have been calculated in "infinite precision"
>> > and only to_rtx () would do the truncation to the desired mode?
>> >
>> > I think not.  The amount of code performing multiple operations on
>> > _constants_ in sequence is extremely low (if it even exists).
>> >
>> > So I'd rather have to_rtx get a mode argument (or a precision) and
>> > perform the required truncation / sign-extension at RTX construction
>> > time (which is an expensive operation anyway).
>> 
>> I agree this is where we disagree.  I don't understand why you think
>> the above is better.  Why do we want to do "infinite precision"
>> addition of two values when only the lowest N bits of those values
>> have a (semantically) defined meaning?  Earlier in the thread it sounded
>> like we both agreed that having undefined bits in the _representation_
>> was bad.  So why do we want to do calculations on parts of values that
>> are undefined in the (rtx) semantics?
>> 
>> E.g. say we're adding two rtx values whose mode just happens to be
>> HOST_BITS_PER_WIDE_INT in size.  Why does it make sense to calculate
>> the carry from adding the two HWIs, only to add it to an upper HWI
>> that has no semantically-defined value?  It's garbage in, garbage out.
>
> Not garbage in, and not garbage out (just wasted work).

Well, it's not garbage in the sense of an uninitialised HWI detected
by valgrind (say).  But it's semantic garbage.

> That's the possible downside - the upside is to get rid of the notion
> of a 'precision'.

No, it's still there, just in a different place.

> OTOH they still will be in some ways "undefined" if you consider
>
>   wide_int xw = from_rtx (xr, mode);
>   tree xt = to_tree (xw, type);
>   wide_int xw2 = from_tree (xt);
>
> with an unsigned type xw and xw2 will not be equal (in the
> 'extension' bits) for a value with MSB set.

Do you mean it's undefined as things stand, or when using "infinite
precision" for rtl?  It shouldn't lead to anything undefined at
the moment.  Only the low GET_MODE_BITSIZE (mode) bits of xw are
meaningful, but those are also the only bits that would be used.

> That is, RTL chooses to always sign-extend, tree chooses to extend
> according to sign information.  wide-int chooses to ... ?  (it seems
> the wide-int overall comment lost the part that defined its encoding,
> but it seems that we still sign-extend val[len-1], so (unsigned
> HOST_WIDE_INT)-1 is { -1U, 0 } with len == 2 and (HOST_WIDE_INT)-1 is
> { -1 } with len == 1.

Only if the precision is > HOST_BITS_PER_WIDE_INT.  If the precision
is HOST_BITS_PER_WIDE_INT then both are { -1U }.  "len" is never
greater than precision / HOST_BITS_PER_WIDE_INT.
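
To make the encodings concrete (illustration only, assuming
HOST_BITS_PER_WIDE_INT == 64):

  value                 precision  len  val[]
  -1 (signed)                  64    1  { -1 }
  0xffffffffffffffff           64    1  { -1 }     (same encoding)
  -1 (signed)                 128    1  { -1 }
  2^64 - 1 (unsigned)         128    2  { -1, 0 }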

> In RTL both would be encoded with len == 1 (no
> distinction between a signed and unsigned number with all bits set),

Same again: both are -1 if the mode is HOST_BITS_PER_WIDE_INT or smaller.
If the mode is wider then RTL too uses { -1, 0 }.  So the current wide_int
representation matches the RTL representation pretty closely, except for
the part about wide_int leaving excess bits undefined.  But that's just
a convenience, it isn't important in terms of what the operators do.

> on the current tree representation the encoding would be with len ==
> 1, too, as 

Re: [ubsan] Use pointer map instead of hash table.

2013-08-28 Thread Marek Polacek
On Wed, Aug 28, 2013 at 12:40:50PM +0200, Richard Biener wrote:
> On Tue, Aug 27, 2013 at 2:33 PM, Marek Polacek  wrote:
> > It turned out that for tree -> tree mapping we don't need the hash
> > table at all; pointer map is much more convenient.  So this patch
> > weeds out the hash table out of ubsan and introduces pointer map
> > instead.  Quite a lot of code could go away--no need to set the
> > alloc pools up etc.
> >
> > Regtested, ran bootstrap-ubsan on x86_64-linux.  Applying to the
> > ubsan branch.
> 
> You can use the type-safe pointer_map  now (ok, only the data type
> is type safe, the pointer is still void).

Thanks, done with the following.  Please let me know if you see
something wrong in this; otherwise I'll commit it if the
bootstrap-ubsan passes.

2013-08-28  Marek Polacek  

* ubsan.c: Use pointer_map instead of pointer_map_t.
(insert_decl_for_type): Adjust.
(lookup_decl_for_type): Likewise.

--- gcc/ubsan.c.mp  2013-08-28 12:54:17.778383224 +0200
+++ gcc/ubsan.c 2013-08-28 14:09:42.400105470 +0200
@@ -31,16 +31,14 @@ along with GCC; see the file COPYING3.
 #include "c-family/c-common.h"
 
 /* Map a TYPE to an ubsan type descriptor VAR_DECL for that type.  */
-static pointer_map_t *typedesc_map;
+static pointer_map *typedesc_map;
 
 /* Insert DECL as the VAR_DECL for TYPE in the TYPEDESC_MAP.  */
 
 static void
 insert_decl_for_type (tree decl, tree type)
 {
-  void **slot = pointer_map_insert (typedesc_map, type);
-  gcc_assert (*slot == NULL);
-  *slot = decl;
+  *typedesc_map->insert (type) = decl;
 }
 
 /* Find the VAR_DECL for TYPE in TYPEDESC_MAP.  If TYPE does not
@@ -52,9 +50,13 @@ lookup_decl_for_type (tree type)
 {
   /* If the pointer map is not initialized yet, create it now.  */
   if (typedesc_map == NULL)
-typedesc_map = pointer_map_create ();
-  void **slot = pointer_map_contains (typedesc_map, type);
-  return slot ? (tree) *slot : NULL_TREE;
+{
+  typedesc_map = new pointer_map;
+  /* That also means we don't have to bother with the lookup.  */
+  return NULL_TREE;
+}
+  tree *t = typedesc_map->contains (type);
+  return t ? *t : NULL_TREE;
 }
 
 /* Helper routine, which encodes a value in the pointer_sized_int_node.

Marek


Re: [PATCH] Change the badness computation to ensure no integer-underflow

2013-08-28 Thread Jan Hubicka
> On Tue, Aug 27, 2013 at 2:50 PM, Jan Hubicka  wrote:
> >> >>
> >> >> I've updated the patch (as attached) to use sreal to compute badness.
> >>  +  badness = ((int)((double) edge->count / max_count
> >>  +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / 
> >>  growth;
> >>  +
> >> >>>
> >> >>> FP operations on the host are frowned upon if code generation depends
> >> >>> on their outcome.  They all should use sreal_* as predict already does.
> >> >>> Other than that I wonder why the final division is int (so we get 
> >> >>> truncation
> >> >>> and not rounding)?  That increases the effect of doing the multiply by
> >> >>> relbenefit in double.
> > Sorry, i missed this patch - I need to update my scripts marking files 
> > interesting
> > to me.
> >
> > I originally duplicated the trick from global.c that used to do this for 
> > ages.
> > The rationale is that if you do only one FP operation, then you will not get
> > difference in between hosts doing 80bit fp (x86) and 64bit fp (others) and 
> > you
> > do the operation a lot faster than sreal does. While this is on slippery
> > ground,
> > controlled use of double should not imply host dependency.
> 
> Well .. the above is clearly not "one" FP operation.  Also I don't follow.
> If we can assume all hosts implement proper IEEE double math then
> of course the issue is moot.  But fact is we don't (not sure if we can

We are not really consistent across all hosts either, because of HOST_WIDE_INT
differences.  What we really cared about is to have a compiler that does not
give different results based on build flags.

> declare such hosts broken and unsupported as host).  Of couse on
> a perfect IEEE compliant host sreal.[ch] could be simply mapped to
> double.  Oh, and yes, 80bit x87 unit isn't holding up to that premises.

sreal is not IEEE compliant.  We first hit the problem of doing non-trivial
chains of FP operations in predict.c during my school project, and Josef
implemented it as a dirty but reasonably quick FP emulation module.
Hopefully it will work well enough in the badness computation, too.
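
For the badness computation itself that would look roughly like this (a
sketch only, using the C-style entry points from sreal.h; the variable names
follow edge_badness and the final fixed-point scaling is glossed over):

  sreal tmp, cnt, max, relb;
  sreal_init (&cnt, edge->count, 0);
  sreal_init (&max, max_count, 0);
  sreal_init (&relb, relbenefit, 0);
  /* edge->count * relbenefit / max_count, without host FP.  */
  sreal_mul (&tmp, &cnt, &relb);
  sreal_div (&tmp, &tmp, &max);
  badness = -sreal_to_int (&tmp) / growth;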

I am giving the patch brief benchmarking on profiledbootstrap and if it won't
cause a major regression, I think we should go ahead with the patch.

I was never really happy about the double use there and in fact the whole
fixed-point arithmetic in the badness computation is a mess.  If we had a
template-based Fibonacci heap and sreal fast enough, turning it all to reals
would save quite some maintenance burden.

Honza


Re: wide-int branch updated

2013-08-28 Thread Kenneth Zadeck

On 08/28/2013 03:50 AM, Richard Biener wrote:

On Tue, 27 Aug 2013, Richard Sandiford wrote:


Kenneth Zadeck  writes:

fixed fits_uhwi_p.

tested on x86-64.

kenny

Index: gcc/wide-int.h
===
--- gcc/wide-int.h  (revision 201985)
+++ gcc/wide-int.h  (working copy)
@@ -1650,7 +1650,7 @@ wide_int_ro::fits_shwi_p () const
  inline bool
  wide_int_ro::fits_uhwi_p () const
  {
-  return len == 1 || (len == 2 && val[1] == 0);
+  return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0);
  }

With upper bits being undefined, it doesn't seem safe to check

Err - we're back at upper bits being undefined?  Ugh.  Then
having both fits_uhwi_p and fits_shwi_p doesn't make sense.


yes, that is the problem.   Richard is going to look into what it might 
take to make rtl always have the upper bits canonized as sign 
extended.   if he/we can do this, then we will change everything so that 
it is always canonized above the precision.   Everyone thinks this is a
good plan, assuming you do not have to rewrite every back end to do it.





val[0] or val[1] like this.   I was thinking along the lines of:

inline bool
wide_int_ro::fits_uhwi_p () const
{
   if (precision <= HOST_BITS_PER_WIDE_INT)
 return true;
   if (len == 1)
 return val[0] >= 0;
   if (precision < HOST_BITS_PER_WIDE_INT * 2)
 return ((unsigned HOST_WIDE_INT) val[1]
<< (HOST_BITS_PER_WIDE_INT * 2 - precision)) == 0;
   return val[1] == 0;
}

Since we don't have a sign, everything HWI-sized or smaller fits in a
uhwi without loss of precision.

Which then means fits_hwi_p () is simply

   if (precision <= HOST_BITS_PER_WIDE_INT)
 return true;
   zext ();  // ick, modification in a predicate!  stupid undefined bits!
   return len == 1 || (len == 2 && val[1] == 0);

we do not ever do this.  where are you looking.


but if upper bits are undefined then the whole encoding and len
business is broken.

no it is not.


btw - welcome back.  we have missed you.

Richard.




Re: wide-int branch updated.

2013-08-28 Thread Kenneth Zadeck

On 08/28/2013 03:45 AM, Richard Biener wrote:

On Tue, 27 Aug 2013, Kenneth Zadeck wrote:


removed all knowledge of SHIFT_COUNT_TRUNCATED from wide-int

both Richard Biener and Richard Sandiford had commented negatively about
this.

fixed bug with wide-int::fits_uhwi_p.

  inline bool
  wide_int_ro::fits_uhwi_p () const
  {
-  return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0);
+  return (precision <= HOST_BITS_PER_WIDE_INT)
+    || (len == 1 && val[0] >= 0)
+    || (len == 2 && (precision >= 2 * HOST_BITS_PER_WIDE_INT)
+        && (val[1] == 0))
+    || (len == 2
+        && (sext_hwi (val[1], precision & (HOST_BITS_PER_WIDE_INT - 1))
+            == 0));
  }

it now gets scary ;)  Still wrong for precision == 0?
no, because anything that comes in at precision 0 is a canonized sign 
extended number already.   the precision 0 just means that it is safe to 
be any precision.

;)

I wonder what its semantics are ... in double_int we simply require
high == 0 (thus, negative numbers are not allowed).  with
precision <= HOST_BITS_PER_WIDE_INT you allow negative numbers.

Matching what double-int fits_uhwi does would be

(len == 1 && ((signed HOST_WIDE_INT)val[0]) >= 0)

it is signed so i am matching this part.

|| (len == 2 && val[1] == 0)
so this does not work.   say i had a precision 70 bit wide-int. The bits 
above the precision are undefined, so i have to clear it out.   This is 
what the two lines at len 2 are for.   However if the precision is 
greater than 2 hwi's then we can do something this simple.


kenny

(I don't remember off-hand the signedness of val[], but eventually
you missed the conversion to signed)

Now, what double-int does is supposed to match
host_integerp (..., 1) which I think it does.

Richard.




Re: wide-int branch now up for public comment and review

2013-08-28 Thread Richard Biener
On Wed, 28 Aug 2013, Richard Sandiford wrote:

> Richard Biener  writes:
> >> So the precision variable is good for the rtl level in several ways:
> >> 
> >> - As you say, it avoids adding the explicit truncations that (in practice)
> >>   every rtl operation would need
> >> 
> >> - It's more efficient in that case, since we don't calculate high values
> >>   and then discard them immediately.  The common GET_MODE_PRECISION (mode)
> >>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
> >>   the wide-int trappings.
> >> 
> >> - It's a good way of checking type safety and making sure that excess
> >>   bits aren't accidentally given a semantic meaning.  This is the most
> >>   important reason IMO.
> >> 
> >> The branch has both the constant-precision, very wide integers that we
> >> want for trees and the variable-precision integers we want for rtl,
> >> so it's not an "either or".  With the accessor-based implementation,
> >> there should be very little cost to having both.
> >
> > So what I wonder (and where we maybe disagree) is how much code
> > wants to inspect "intermediate" results.  Say originally you have
> >
> > rtx foo (rtx x, rtx y)
> > {
> >   rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
> > GEN_INT (1));
> >   rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
> > y);
> >   return res;
> > }
> >
> > and with wide-int you want to change that to
> >
> > rtx foo (rtx x, rtx y)
> > {
> >   wide_int tem = wide_int (x) + 1;
> >   wide_int res = tem - y;
> >   return res.to_rtx ();
> > }
> >
> > how much code ever wants to inspect 'tem' or 'res'?
> > That is, does it matter
> > if 'tem' and 'res' would have been calculated in "infinite precision"
> > and only to_rtx () would do the truncation to the desired mode?
> >
> > I think not.  The amount of code performing multiple operations on
> > _constants_ in sequence is extremely low (if it even exists).
> >
> > So I'd rather have to_rtx get a mode argument (or a precision) and
> > perform the required truncation / sign-extension at RTX construction
> > time (which is an expensive operation anyway).
> 
> I agree this is where we disagree.  I don't understand why you think
> the above is better.  Why do we want to do "infinite precision"
> addition of two values when only the lowest N bits of those values
> have a (semantically) defined meaning?  Earlier in the thread it sounded
> like we both agreed that having undefined bits in the _representation_
> was bad.  So why do we want to do calculations on parts of values that
> are undefined in the (rtx) semantics?
> 
> E.g. say we're adding two rtx values whose mode just happens to be
> HOST_BITS_PER_WIDE_INT in size.  Why does it make sense to calculate
> the carry from adding the two HWIs, only to add it to an upper HWI
> that has no semantically-defined value?  It's garbage in, garbage out.

Not garbage in, and not garbage out (just wasted work).  That's
the possible downside - the upside is to get rid of the notion of
a 'precision'.

But yes, it's good that we agree on the fact that undefined bits
in the _representation_ are wrong.

OTOH they still will be in some ways "undefined" if you consider

  wide_int xw = from_rtx (xr, mode);
  tree xt = to_tree (xw, type);
  wide_int xw2 = from_tree (xt);

with an unsigned type xw and xw2 will not be equal (in the
'extension' bits) for a value with MSB set.  That is, RTL
chooses to always sign-extend, tree chooses to extend according
to sign information.  wide-int chooses to ... ?  (it seems the
wide-int overall comment lost the part that defined its encoding,
but it seems that we still sign-extend val[len-1], so
(unsigned HOST_WIDE_INT)-1 is { -1U, 0 } with len == 2 and
(HOST_WIDE_INT)-1 is { -1 } with len == 1.  In RTL both
would be encoded with len == 1 (no distinction between a signed
and unsigned number with all bits set), on the current
tree representation the encoding would be with len == 1, too,
as we have TYPE_UNSIGNED to tell us the sign.
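
(A concrete instance of the mismatch, with a 64-bit HWI: the 32-bit value
0xffffffff is (const_int -1) on the RTL side but the zero-extended
INTEGER_CST 4294967295 on the tree side, so the round trip above flips the
'extension' bits for an unsigned type.)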

So we still need to somehow "map" between those representations.

Coming from the tree side (as opposed to from the RTL side) I'd
have argued you need a 'signed' flag ;)  We side-stepped that
by doing the extension trick in a way that preserves sign information.

Looking at the RTL representation from that wide-int representation
makes RTL look as if all constants are signed.  That's fine if
code that wants to do unsigned stuff properly extends.  So - do
we need a from_rtx_unsigned () constructor that does this?
I'm worried about all the extensions done in operations like add ():

  if (p1 <= HOST_BITS_PER_WIDE_INT)
{
  result.len = 1;
  result.precision = p1;
  result.val[0] = val[0] + s[0];
  if (p1 < HOST_BITS_PER_WIDE_INT)
result.val[0] = sext_hwi (result.val[0], p1);
  if (sgn == SIGNED)
{
  HOST_WIDE_INT x
= (((result.val[0] ^ val[0]) & (result.val[0] ^ s[0]))
   >> (p1 - 1

Re: ELF interposition and One Definition Rule

2013-08-28 Thread Jan Hubicka
> > I had thought that that case (overriding malloc etc) was what this patch was
> > dealing with.  Perhaps I was confused.
> >
> > There's nothing particularly special about ctors, dtors and virtual function
> > implementations.  Their common feature, as Jan described, is that it's hard
> > to take their address:
> > * ctors don't have names.
> > * dtors do have names, but you can't take their address.
> > * virtual function implementations can't have their instance address taken
> > -- you get the ptr-to-memfn object describing the vtable slot etc.  There is
> > (was?) a GNU extension to work around that, IIRC.
> >
> > So that stops a programmer calling these indirectly.  However I don't really
> > see the semantic difference between:
> >
> >   fptr = &somefunc; ... fptr (...);
> > and
> >   somefunc (...);
> >
> > So treating those 3 types of function specially because you can't use the
> > former syntax with them seems odd.

What is really important for the backend is that it is not defined what happens
when you compare addresses of those functions (based on the fact that you can't
take the address, as for ctors/dtors, or compare it, as for virtual functions).
If the backend also knows that they are not interposable, it knows it can
freely duplicate or unify their bodies when that seems a win.

The problem of whether we want to promise working interposition on these is
however orthogonal to whether one can or cannot take their address.  I
wonder how much we want to allow by default: my impression is that we pay a
large cost to allow the occasional rewrite of functions like malloc.  As Jason
mentioned, it is really rare to rewrite something implemented in the current
unit, and that is where 90% of the runtime costs come from.

Adjusting the default would be a good win.  We can also invent a command line
flag for this and ask users to use it often.
> 
> Yeah - this is why I'm always worried about "frontend specific" optimizations
> in the middle-end.  If you need to rely on stuff that's not visible in GIMPLE
> then
> it smells.

I am not too keen on language specific rules in gimple either (though I
introduced quite some knowledge now for C++ devirtualization - we may think
about how to make these more generic if we care about Java/other languages
with polymorphic calls).

I have it in my TODO to discuss adding an extra gimple flag, "address of this
symbol is never used for equality comparison".  So far we have always tested
DECL_VIRTUAL for this (that is something I learned from the ICF presentation
on gold) but I am sure non-C frontends can use it more widely.  We can also
make early optimizations/propagation rule out simple cases where the address
is taken but
clearly never compared.
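
A tiny example of the kind of comparison such a flag would license the
compiler to disregard (standalone C, for illustration only):

extern void f (void);
extern void g (void);

int
distinct_p (void)
{
  /* Today GCC may fold this to 0 on the assumption that distinct
     functions have distinct addresses; a "never compared" flag would
     record that the answer is not relied upon, so ICF could merge
     f and g freely.  */
  return &f == &g;
}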

Similarly we may want to make the frontend decide on the availability of the
function in question instead of having the magic in the cgraph function.  But
I guess we first need to figure out what the rules are.

Honza


Re: wide-int branch now up for public comment and review

2013-08-28 Thread Richard Sandiford
Richard Biener  writes:
>> So the precision variable is good for the rtl level in several ways:
>> 
>> - As you say, it avoids adding the explicit truncations that (in practice)
>>   every rtl operation would need
>> 
>> - It's more efficient in that case, since we don't calculate high values
>>   and then discard them immediately.  The common GET_MODE_PRECISION (mode)
>>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
>>   the wide-int trappings.
>> 
>> - It's a good way of checking type safety and making sure that excess
>>   bits aren't accidentally given a semantic meaning.  This is the most
>>   important reason IMO.
>> 
>> The branch has both the constant-precision, very wide integers that we
>> want for trees and the variable-precision integers we want for rtl,
>> so it's not an "either or".  With the accessor-based implementation,
>> there should be very little cost to having both.
>
> So what I wonder (and where we maybe disagree) is how much code
> wants to inspect "intermediate" results.  Say originally you have
>
> rtx foo (rtx x, rtx y)
> {
>   rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
> GEN_INT (1));
>   rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
> y);
>   return res;
> }
>
> and with wide-int you want to change that to
>
> rtx foo (rtx x, rtx y)
> {
>   wide_int tem = wide_int (x) + 1;
>   wide_int res = tem - y;
>   return res.to_rtx ();
> }
>
> how much code ever wants to inspect 'tem' or 'res'?
> That is, does it matter
> if 'tem' and 'res' would have been calculated in "infinite precision"
> and only to_rtx () would do the truncation to the desired mode?
>
> I think not.  The amount of code performing multiple operations on
> _constants_ in sequence is extremely low (if it even exists).
>
> So I'd rather have to_rtx get a mode argument (or a precision) and
> perform the required truncation / sign-extension at RTX construction
> time (which is an expensive operation anyway).

I agree this is where we disagree.  I don't understand why you think
the above is better.  Why do we want to do "infinite precision"
addition of two values when only the lowest N bits of those values
have a (semantically) defined meaning?  Earlier in the thread it sounded
like we both agreed that having undefined bits in the _representation_
was bad.  So why do we want to do calculations on parts of values that
are undefined in the (rtx) semantics?

E.g. say we're adding two rtx values whose mode just happens to be
HOST_BITS_PER_WIDE_INT in size.  Why does it make sense to calculate
the carry from adding the two HWIs, only to add it to an upper HWI
that has no semantically-defined value?  It's garbage in, garbage out.

Providing this for rtl doesn't affect the tree-level operations in any
way.  Although things like addition require both arguments to have the
same precision after promotion, that's trivially true for trees, since
(a) the inputs used there -- C integers and tree constants -- can be
promoted and (b) tree-level code uses fixed_wide_int, where every
fixed_wide_int has the same precision.

And you can still do "infinite-precision" arithmetic on rtx constants
if you want.  You just have to say how you want the constant to be
extended (sign or zero), so that the value of all bits is meaningful.
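
For instance, a sketch of what making the extension explicit could look
like (illustrative names in the spirit of the branch, not necessarily its
exact API):

  /* Widen the CONST_INT X beyond MODE by an explicit choice of
     extension, so every bit of the wider value is meaningful.  */
  wide_int a = wide_int::from_rtx (x, mode);
  wide_int wa = a.zext (GET_MODE_PRECISION (mode));  /* or a.sext (...) */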

Thanks,
Richard


Re: [ubsan] Introduce pointer_sized_int_node

2013-08-28 Thread Richard Biener
On Wed, Aug 28, 2013 at 1:10 PM, Jakub Jelinek  wrote:
> On Wed, Aug 28, 2013 at 12:48:41PM +0200, Richard Biener wrote:
>> On Tue, Aug 27, 2013 at 2:56 PM, Marek Polacek  wrote:
>> > On Tue, Aug 27, 2013 at 01:48:29PM +0200, Richard Biener wrote:
>> >> > +  TI_POINTER_SIZED_TYPE,
>> >>
>> >> I'd rather see TI_UINTPTR_TYPE and TI_INTPTR_TYPE (note they might
>> >> not be exactly of POINTER_SIZE but larger).
>> >
>> > We already have [u]intptr_type_node -- but only in c-family/, thus
>> > ubsan.c/asan.c cannot use those nodes.  I can create both
>> > TI_SIGNED_POINTER_SIZED_TYPE and TI_UNSIGNED_POINTER_SIZED_TYPE,
>> > but we currently need only the latter...
>>
>> So simply move them to the middle-end.  The set of C language types that
>> define the target ABI should be constructed and maintained by the middle-end
>> (you need it for various builtins anyway)
>
> That is not easily possible.  The thing is, in the C FEs they are created
> from the C/C++ type name (for stdint purposes):
>   if (INTPTR_TYPE)
> intptr_type_node =
>   TREE_TYPE (identifier_global_value (c_get_ident (INTPTR_TYPE)));
>   if (UINTPTR_TYPE)
> uintptr_type_node =
>   TREE_TYPE (identifier_global_value (c_get_ident (UINTPTR_TYPE)));
> and are supposed to match the type previously used in stdint.h, because
> it is part of the ABI etc. (e.g. long vs. long long vs. int when two of these
> are the same precision).
> So the middle-end uintptr type needs to be something different; while it
> ideally should have the same precision, it is not the same language type.

But it's the C ABI type - and we do need the C ABI types from within
the middle-end.  Joseph, any idea?

Richard.

> Jakub


Re: [ubsan] Introduce pointer_sized_int_node

2013-08-28 Thread Jakub Jelinek
On Wed, Aug 28, 2013 at 12:48:41PM +0200, Richard Biener wrote:
> On Tue, Aug 27, 2013 at 2:56 PM, Marek Polacek  wrote:
> > On Tue, Aug 27, 2013 at 01:48:29PM +0200, Richard Biener wrote:
> >> > +  TI_POINTER_SIZED_TYPE,
> >>
> >> I'd rather see TI_UINTPTR_TYPE and TI_INTPTR_TYPE (note they might
> >> not be exactly of POINTER_SIZE but larger).
> >
> > We already have [u]intptr_type_node -- but only in c-family/, thus
> > ubsan.c/asan.c cannot use those nodes.  I can create both
> > TI_SIGNED_POINTER_SIZED_TYPE and TI_UNSIGNED_POINTER_SIZED_TYPE,
> > but we currently need only the latter...
> 
> So simply move them to the middle-end.  The set of C language types that
> define the target ABI should be constructed and maintained by the middle-end
> (you need it for various builtins anyway)

That is not easily possible.  The thing is, in the C FEs they are created
from the C/C++ type name (for stdint purposes):
  if (INTPTR_TYPE)
intptr_type_node =
  TREE_TYPE (identifier_global_value (c_get_ident (INTPTR_TYPE)));
  if (UINTPTR_TYPE)
uintptr_type_node =
  TREE_TYPE (identifier_global_value (c_get_ident (UINTPTR_TYPE)));
and are supposed to match the type previously used in stdint.h, because
it is part of the ABI etc. (e.g. long vs. long long vs. int when two of these
are the same precision).
So the middle-end uintptr type needs to be something different; while it
ideally should have the same precision, it is not the same language type.

Jakub


Re: Free decl_in_states when not needed

2013-08-28 Thread Richard Biener
On Wed, 28 Aug 2013, Jan Hubicka wrote:

> Hi,
> this patch saves almost 1Gb of GGC-reachable memory for firefox.
> Currently we point to all streamed-in trees from the global decl_in_state and
> from function specific states.  The states are big and their pointers keep
> trees live.  This patch gets rid of states we do not need.
> 
> We run GGC collect just once during a usual WPA compilation.  Currently
> it is just after tree loading, which introduces very little garbage.  This
> patch moves it to after unreachable code removal, when the freeing happens.
> 
> Still we do not succeed in much reuse between GGC and heap, so peak memory
> use in TOP reduces by only 300MB.  It may be interesting to allow random
> access into per-function decl_in_states and read them only after decl merging
> (and possibly unreachable function removal).
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
>   * lto-symtab.c (lto_cgraph_replace_node): Free decl_in_state.
>   * cgraph.c (cgraph_release_function_body): Free decl_in_state.
>   * lto-section-in.c (lto_free_function_in_decl_state): New function.
>   (lto_free_function_in_decl_state_for_node): New function.
> 
>   * lto.c (read_cgraph_and_symbols): Remove ggc_collect;
>   clear section node; add comment why we do not collect.
> Index: lto-symtab.c
> ===
> --- lto-symtab.c  (revision 202047)
> +++ lto-symtab.c  (working copy)
> @@ -80,6 +80,8 @@ lto_cgraph_replace_node (struct cgraph_n
>/* Redirect incomming references.  */
>ipa_clone_referring ((symtab_node)prevailing_node, &node->symbol.ref_list);
>  
> +  lto_free_function_in_decl_state_for_node ((symtab_node)node);
> +
>if (node->symbol.decl != prevailing_node->symbol.decl)
>  cgraph_release_function_body (node);
>  
> Index: cgraph.c
> ===
> --- cgraph.c  (revision 202047)
> +++ cgraph.c  (working copy)
> @@ -1663,6 +1663,8 @@ cgraph_release_function_body (struct cgr
>if (!node->used_as_abstract_origin && DECL_INITIAL (node->symbol.decl))
>  DECL_INITIAL (node->symbol.decl) = error_mark_node;
>release_function_body (node->symbol.decl);
> +  if (node->symbol.lto_file_data)
> +lto_free_function_in_decl_state_for_node ((symtab_node) node);
>  }
>  
>  /* Remove the node from cgraph.  */
> Index: lto-section-in.c
> ===
> --- lto-section-in.c  (revision 202047)
> +++ lto-section-in.c  (working copy)
> @@ -414,6 +414,41 @@ lto_get_function_in_decl_state (struct l
>return slot? ((struct lto_in_decl_state*) *slot) : NULL;
>  }
>  
> +/* Free decl_states.  */
> +
> +void
> +lto_free_function_in_decl_state (struct lto_in_decl_state *state)
> +{
> +  int i;
> +  for (i = 0; i < LTO_N_DECL_STREAMS; i++)
> +ggc_free (state->streams[i].trees);

We likely also waste a lot of memory by means of GC overhead here.
Moving this array out of GC space would be better (we know the
exact size upfront) - should be doable by making lto_tree_ref_table
GTY((user)) and a custom marker.
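
A minimal sketch of that suggestion (a hand-written marker for a
GTY((user)) struct; the field layout and the available gt_ggc_mx
overloads are assumptions here, not the actual lto-streamer.h code):

struct GTY((user)) lto_tree_ref_table {
  tree *trees;     /* array kept outside GC space */
  unsigned size;
};

void
gt_ggc_mx (lto_tree_ref_table *tab)
{
  /* Mark each referenced tree; the array itself is not GC memory.  */
  for (unsigned i = 0; i < tab->size; i++)
    gt_ggc_mx (tab->trees[i]);
}
/* ... plus the matching gt_pch_nx overloads for PCH.  */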

> +  ggc_free (state);
> +}
> +
> +/* Free decl_states associated with NODE.  This makes it possible to further
> +   release trees needed by the NODE's body.  */
> +
> +void
> +lto_free_function_in_decl_state_for_node (symtab_node node)
> +{
> +  struct lto_in_decl_state temp;
> +  void **slot;
> +
> +  if (!node->symbol.lto_file_data)
> +return;
> +
> +  temp.fn_decl = node->symbol.decl;
> +  slot = htab_find_slot (node->symbol.lto_file_data->function_decl_states,
> +  &temp, NO_INSERT);
> +  if (slot && *slot)
> +{
> +  lto_free_function_in_decl_state ((struct lto_in_decl_state*) *slot);
> +  htab_clear_slot (node->symbol.lto_file_data->function_decl_states,
> +slot);
> +}
> +  node->symbol.lto_file_data = NULL;
> +}
> +
>  
>  /* Report read pass end of the section.  */
>  
> Index: ipa.c
> ===
> --- ipa.c (revision 202047)
> +++ ipa.c (working copy)
> @@ -873,6 +873,17 @@ function_and_variable_visibility (bool w
>  segfault though. */
>   symtab_dissolve_same_comdat_group_list ((symtab_node) node);
>   }
> +  if (node->symbol.externally_visible
> +  && DECL_COMDAT (node->symbol.decl)
> +   && comdat_can_be_unshared_p ((symtab_node) node))
> + {
> +   if (dump_file
> +   && DECL_VISIBILITY (node->symbol.decl) != VISIBILITY_HIDDEN)
> + fprintf (dump_file, "Promoting visibility to hidden: %s/%i\n",
> +  cgraph_node_name (node), node->symbol.order);
> +   DECL_VISIBILITY (node->symbol.decl) = VISIBILITY_HIDDEN;
> +   DECL_VISIBILITY_SPECIFIED (node->symbol.decl) = true;
> + }
>  
>if (node->thunk.thunk_p
> && TREE_PUBLIC (node->symbol.decl))
> @@ -980,6 +991,17 @@ function_and_varia

Re: [PATCH] Convert more passes to new dump framework

2013-08-28 Thread Richard Biener
On Wed, Aug 7, 2013 at 7:23 AM, Teresa Johnson  wrote:
> On Tue, Aug 6, 2013 at 9:29 AM, Teresa Johnson  wrote:
>> On Tue, Aug 6, 2013 at 9:01 AM, Martin Jambor  wrote:
>>> Hi,
>>>
>>> On Tue, Aug 06, 2013 at 07:14:42AM -0700, Teresa Johnson wrote:
 On Tue, Aug 6, 2013 at 5:37 AM, Martin Jambor  wrote:
 > On Mon, Aug 05, 2013 at 10:37:00PM -0700, Teresa Johnson wrote:
 >> This patch ports messages to the new dump framework,
 >
 > It would be great this new framework was documented somewhere.  I lost
 > track of what was agreed it would be and from the uses in the
 > vectorizer I was never quite sure how to utilize it in other passes.

 Cc'ing Sharad who implemented this - Sharad, is this documented on a
 wiki or elsewhere?
>>>
>>> Thanks
>>>

 >
 > I'd also like to point out two other minor things inline:
 >
 > [...]
 >
 >> 2013-08-06  Teresa Johnson  
 >> Dehao Chen  
 >>
 >> * dumpfile.c (dump_loc): Add column number to output, make 
 >> newlines
 >> consistent.
 >> * dumpfile.h (OPTGROUP_OTHER): Add and enable under 
 >> OPTGROUP_ALL.
 >> * ipa-inline-transform.c (clone_inlined_nodes):
 >> (cgraph_node_opt_info): New function.
 >> (cgraph_node_call_chain): Ditto.
 >> (dump_inline_decision): Ditto.
 >> (inline_call): Invoke dump_inline_decision.
 >> * doc/invoke.texi: Document optall -fopt-info flag.
 >> * profile.c (read_profile_edge_counts): Use new dump framework.
 >> (compute_branch_probabilities): Ditto.
 >> * passes.c (pass_manager::register_one_dump_file): Use 
 >> OPTGROUP_OTHER
 >> when pass not in any opt group.
 >> * value-prof.c (check_counter): Use new dump framework.
 >> (find_func_by_funcdef_no): Ditto.
 >> (check_ic_target): Ditto.
 >> * coverage.c (get_coverage_counts): Ditto.
 >> (coverage_init): Setup new dump framework.
 >> * ipa-inline.c (inline_small_functions): Set is_in_ipa_inline.
 >> * ipa-inline.h (is_in_ipa_inline): Declare.
 >>
 >> * testsuite/gcc.dg/pr40209.c: Use -fopt-info.
 >> * testsuite/gcc.dg/pr26570.c: Ditto.
 >> * testsuite/gcc.dg/pr32773.c: Ditto.
 >> * testsuite/g++.dg/tree-ssa/dom-invalid.C (struct C): Ditto.
 >>
 >
 > [...]
 >
 >> Index: ipa-inline-transform.c
 >> ===
 >> --- ipa-inline-transform.c  (revision 201461)
 >> +++ ipa-inline-transform.c  (working copy)
 >> @@ -192,6 +192,108 @@ clone_inlined_nodes (struct cgraph_edge *e, bool d
 >>  }
 >>
 >>
 >> +#define MAX_INT_LENGTH 20
 >> +
 >> +/* Return NODE's name and profile count, if available.  */
 >> +
 >> +static const char *
 >> +cgraph_node_opt_info (struct cgraph_node *node)
 >> +{
 >> +  char *buf;
 >> +  size_t buf_size;
 >> +  const char *bfd_name = lang_hooks.dwarf_name (node->symbol.decl, 0);
 >> +
 >> +  if (!bfd_name)
 >> +bfd_name = "unknown";
 >> +
 >> +  buf_size = strlen (bfd_name) + 1;
 >> +  if (profile_info)
 >> +buf_size += (MAX_INT_LENGTH + 3);
 >> +
 >> +  buf = (char *) xmalloc (buf_size);
 >> +
 >> +  strcpy (buf, bfd_name);
 >> +
 >> +  if (profile_info)
 >> +sprintf (buf, "%s ("HOST_WIDEST_INT_PRINT_DEC")", buf, 
 >> node->count);
 >> +  return buf;
 >> +}
 >
 > I'm not sure if output of this function is aimed only at the user or
 > if it is supposed to be used by gcc developers as well.  If the
 > latter, an incredibly useful thing is to also dump node->symbol.order
 > too.  We usually dump it after "/" sign separating it from node name.
 > It is invaluable when examining decisions in C++ code where you can
 > have lots of clones of a node (and also because existing dumps print
 > it, it is easy to combine them).

 The output is useful for both power users doing performance tuning of
 their application, and by gcc developers. Adding the id is not so
 useful for the former, but I agree that it is very useful for compiler
 developers. In fact, in the google branch version we emit more verbose
 information (the lipo module id and the funcdef_no) to help uniquely
 identify the routines and to aid in post-processing by humans and
 tools. So it is probably useful to add something similar here too. Is
 the node->symbol.order more or less unique than the funcdef_no? I see
 that you added a patch a few months ago to print the
 node->symbol.order in the function header, and it also has the
 advantage as you note of matching up with existing ipa dumps.
>>>
>>> node->symbol.order is unique and i

Re: [ubsan] Introduce pointer_sized_int_node

2013-08-28 Thread Richard Biener
On Tue, Aug 27, 2013 at 2:56 PM, Marek Polacek  wrote:
> On Tue, Aug 27, 2013 at 01:48:29PM +0200, Richard Biener wrote:
>> > +  TI_POINTER_SIZED_TYPE,
>>
>> I'd rather see TI_UINTPTR_TYPE and TI_INTPTR_TYPE (note they might
>> not be exactly of POINTER_SIZE but larger).
>
> We already have [u]intptr_type_node -- but only in c-family/, thus
> ubsan.c/asan.c cannot use those nodes.  I can create both
> TI_SIGNED_POINTER_SIZED_TYPE and TI_UNSIGNED_POINTER_SIZED_TYPE,
> but we currently need only the latter...

So simply move them to the middle-end.  The set of C language types that
define the target ABI should be constructed and maintained by the middle-end
(you need it for various builtins anyway)

Richard.

>> TI_POINTER_SIZED_TYPE is ambiguous, too - what's its signedness?
>
> Unsigned.  But yeah, one can't tell by just looking at the name.
>
>> All around the compiler we use sizetype and ssizetype to munge pointers
>> (well, not strictly correct as targets may define sizetype to be 
>> larger/smaller
>> than actual pointers).
>
> I see.
>
> Marek


Free decl_in_states when not needed

2013-08-28 Thread Jan Hubicka
Hi,
this patch saves almost 1Gb of GGC-reachable memory for firefox.
Currently we point to all streamed-in trees from the global decl_in_state and
from function specific states.  The states are big and their pointers keep
trees live.  This patch gets rid of states we do not need.

We run GGC collect just once during a usual WPA compilation.  Currently
it is just after tree loading, which introduces very little garbage.  This
patch moves it to after unreachable code removal, when the freeing happens.

Still we do not succeed in much reuse between GGC and heap, so peak memory
use in TOP reduces by only 300MB.  It may be interesting to allow random access
into per-function decl_in_states and read them only after decl merging (and
possibly unreachable function removal).

Bootstrapped/regtested x86_64-linux, OK?

* lto-symtab.c (lto_cgraph_replace_node): Free decl_in_state.
* cgraph.c (cgraph_release_function_body): Free decl_in_state.
* lto-section-in.c (lto_free_function_in_decl_state): New function.
(lto_free_function_in_decl_state_for_node): New function.

* lto.c (read_cgraph_and_symbols): Remove ggc_collect;
clear section node; add comment why we do not collect.
Index: lto-symtab.c
===
--- lto-symtab.c(revision 202047)
+++ lto-symtab.c(working copy)
@@ -80,6 +80,8 @@ lto_cgraph_replace_node (struct cgraph_n
   /* Redirect incomming references.  */
   ipa_clone_referring ((symtab_node)prevailing_node, &node->symbol.ref_list);
 
+  lto_free_function_in_decl_state_for_node ((symtab_node)node);
+
   if (node->symbol.decl != prevailing_node->symbol.decl)
 cgraph_release_function_body (node);
 
Index: cgraph.c
===
--- cgraph.c(revision 202047)
+++ cgraph.c(working copy)
@@ -1663,6 +1663,8 @@ cgraph_release_function_body (struct cgr
   if (!node->used_as_abstract_origin && DECL_INITIAL (node->symbol.decl))
 DECL_INITIAL (node->symbol.decl) = error_mark_node;
   release_function_body (node->symbol.decl);
+  if (node->symbol.lto_file_data)
+lto_free_function_in_decl_state_for_node ((symtab_node) node);
 }
 
 /* Remove the node from cgraph.  */
Index: lto-section-in.c
===
--- lto-section-in.c(revision 202047)
+++ lto-section-in.c(working copy)
@@ -414,6 +414,41 @@ lto_get_function_in_decl_state (struct l
   return slot? ((struct lto_in_decl_state*) *slot) : NULL;
 }
 
+/* Free decl_states.  */
+
+void
+lto_free_function_in_decl_state (struct lto_in_decl_state *state)
+{
+  int i;
+  for (i = 0; i < LTO_N_DECL_STREAMS; i++)
+ggc_free (state->streams[i].trees);
+  ggc_free (state);
+}
+
+/* Free decl_states associated with NODE.  This makes it possible to further
+   release trees needed by the NODE's body.  */
+
+void
+lto_free_function_in_decl_state_for_node (symtab_node node)
+{
+  struct lto_in_decl_state temp;
+  void **slot;
+
+  if (!node->symbol.lto_file_data)
+return;
+
+  temp.fn_decl = node->symbol.decl;
+  slot = htab_find_slot (node->symbol.lto_file_data->function_decl_states,
+&temp, NO_INSERT);
+  if (slot && *slot)
+{
+  lto_free_function_in_decl_state ((struct lto_in_decl_state*) *slot);
+  htab_clear_slot (node->symbol.lto_file_data->function_decl_states,
+  slot);
+}
+  node->symbol.lto_file_data = NULL;
+}
+
 
 /* Report read pass end of the section.  */
 
Index: ipa.c
===
--- ipa.c   (revision 202047)
+++ ipa.c   (working copy)
@@ -873,6 +873,17 @@ function_and_variable_visibility (bool w
   segfault though. */
symtab_dissolve_same_comdat_group_list ((symtab_node) node);
}
+  if (node->symbol.externally_visible
+  && DECL_COMDAT (node->symbol.decl)
+ && comdat_can_be_unshared_p ((symtab_node) node))
+   {
+ if (dump_file
+ && DECL_VISIBILITY (node->symbol.decl) != VISIBILITY_HIDDEN)
+   fprintf (dump_file, "Promoting visibility to hidden: %s/%i\n",
+cgraph_node_name (node), node->symbol.order);
+ DECL_VISIBILITY (node->symbol.decl) = VISIBILITY_HIDDEN;
+ DECL_VISIBILITY_SPECIFIED (node->symbol.decl) = true;
+   }
 
   if (node->thunk.thunk_p
  && TREE_PUBLIC (node->symbol.decl))
@@ -980,6 +991,17 @@ function_and_variable_visibility (bool w
symtab_dissolve_same_comdat_group_list ((symtab_node) vnode);
  vnode->symbol.resolution = LDPR_PREVAILING_DEF_IRONLY;
}
+  if (vnode->symbol.externally_visible
+  && DECL_COMDAT (vnode->symbol.decl)
+ && comdat_can_be_unshared_p ((symtab_node) vnode))
+   {
+ if (dump_file
+ && DECL_VISIBILITY (vnode->symbol.decl) == VISIBILITY_HIDDEN)
+   fprintf (dump_f

Re: ELF interposition and One Definition Rule

2013-08-28 Thread Richard Biener
On Tue, Aug 27, 2013 at 2:53 PM, Nathan Sidwell
 wrote:
> On 08/26/13 20:58, Jason Merrill wrote:
>
>> I would be happy with an even stronger default that optimizes on the
>> assumption
>> that no interposition occurs; typically interposition is overriding a
>> symbol
>> found in a dynamic library (i.e. malloc) rather than a symbol defined in
>> the
>> same translation unit as the use.
>
>
> I had thought that that case (overriding malloc etc) was what this patch was
> dealing with.  Perhaps I was confused.
>
> There's nothing particularly special about ctors, dtors and virtual function
> implementations.  Their common feature, as Jan described, is that it's hard
> to take their address:
> * ctors don't have names.
> * dtors do have names, but you can't take their address.
> * virtual function implementations can't have their instance address taken
> -- you get the ptr-to-memfn object describing the vtable slot etc.  There is
> (was?) a GNU extension to work around that, IIRC.
>
> So that stops a programmer calling these indirectly.  However I don't really
> see the semantic difference between:
>
>   fptr = &somefunc; ... fptr (...);
> and
>   somefunc (...);
>
> So treating those 3 types of function specially because you can't use the
> former syntax with them seems odd.

Yeah - this is why I'm always worried about "frontend specific" optimizations
in the middle-end.  If you need to rely on stuff that's not visible in GIMPLE then
it smells.

Richard.

> nathan
>
> --
> Nathan Sidwell


Re: [PATCH] Change the badness computation to ensure no integer-underflow

2013-08-28 Thread Richard Biener
On Tue, Aug 27, 2013 at 2:50 PM, Jan Hubicka  wrote:
>> >>
>> >> I've updated the patch (as attached) to use sreal to compute badness.
>>  +  badness = ((int)((double) edge->count / max_count
>>  +  * relbenefit / RELATIVE_TIME_BENEFIT_RANGE * INT_MIN / 2)) / growth;
>>  +
>> >>>
>> >>> FP operations on the host are frowned upon if code generation depends
>> >>> on their outcome.  They all should use sreal_* as predict already does.
>> >>> Other than that I wonder why the final division is int (so we get 
>> >>> truncation
>> >>> and not rounding)?  That increases the effect of doing the multiply by
>> >>> relbenefit in double.
> Sorry, I missed this patch - I need to update my scripts marking files
> interesting to me.
>
> I originally duplicated the trick from global.c that used to do this for ages.
> The rationale is that if you do only one FP operation, then you will not get
> a difference between hosts doing 80-bit fp (x86) and 64-bit fp (others), and
> you do the operation a lot faster than sreal does.  While this is on slippery
> ground, controlled use of double should not imply host dependency.

Well .. the above is clearly not "one" FP operation.  Also I don't follow.
If we can assume all hosts implement proper IEEE double math then
of course the issue is moot.  But the fact is we don't (not sure if we can
declare such hosts broken and unsupported as hosts).  Of course on
a perfectly IEEE-compliant host sreal.[ch] could simply be mapped to
double.  Oh, and yes, the 80-bit x87 unit doesn't hold up to that premise.

> Badness computation is on the hot path of the inliner, which is the slowest WPA
> optimization we have (well, because it is the only one that does something
> really useful).  I suppose I can try to get Martin to benchmark the patch on
> firefox in the afternoon.
>
> Honza
>> >>>
>> >>> Richard.
>> >>>
>> /* Be sure that insanity of the profile won't lead to increasing counts
>>    in the scaling and thus to overflow in the computation above.  */
>> gcc_assert (max_count >= edge->count);


Re: [ubsan] Use pointer map instead of hash table.

2013-08-28 Thread Richard Biener
On Tue, Aug 27, 2013 at 2:33 PM, Marek Polacek  wrote:
> It turned out that for tree -> tree mapping we don't need the hash
> table at all; pointer map is much more convenient.  So this patch
> weeds out the hash table out of ubsan and introduces pointer map
> instead.  Quite a lot of code could go away--no need to set the
> alloc pools up etc.
>
> Regtested, ran bootstrap-ubsan on x86_64-linux.  Applying to the
> ubsan branch.

You can use the type-safe pointer_map <tree> now (ok, only the data type
is type safe, the pointer is still void).

Richard.
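
For reference, the basic usage pattern of the pointer-map C API the patch
adopts (a sketch built from the calls visible in the patch below; this is
GCC-internal code, so it assumes the usual GCC headers):

static pointer_map_t *map;

static void
set_entry (tree key, tree value)
{
  if (map == NULL)
    map = pointer_map_create ();
  void **slot = pointer_map_insert (map, key);
  *slot = value;
}

static tree
get_entry (tree key)
{
  void **slot = map ? pointer_map_contains (map, key) : NULL;
  return slot ? (tree) *slot : NULL_TREE;
}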

> 2013-08-27  Marek Polacek  
>
> * ubsan.c: Use pointer map instead of hash table.
>
> --- gcc/ubsan.c.mp  2013-08-27 12:31:04.136213922 +0200
> +++ gcc/ubsan.c 2013-08-27 12:31:10.361236456 +0200
> @@ -22,109 +22,42 @@ along with GCC; see the file COPYING3.
>  #include "system.h"
>  #include "coretypes.h"
>  #include "tree.h"
> -#include "alloc-pool.h"
>  #include "cgraph.h"
>  #include "gimple.h"
> -#include "hash-table.h"
> +#include "pointer-set.h"
>  #include "output.h"
>  #include "toplev.h"
>  #include "ubsan.h"
>  #include "c-family/c-common.h"
>
> -/* This type represents an entry in the hash table; this hash table
> -   maps from a TYPE to a ubsan type descriptor VAR_DECL for that type.  */
> -struct ubsan_typedesc
> -{
> -  /* This represents the type of a variable.  */
> -  tree type;
> -
> -  /* This is the VAR_DECL of the type.  */
> -  tree decl;
> -};
> -
> -static alloc_pool ubsan_typedesc_alloc_pool;
> -
> -/* Hash table for type descriptors.  */
> -struct ubsan_typedesc_hasher
> -  : typed_noop_remove <ubsan_typedesc>
> -{
> -  typedef ubsan_typedesc value_type;
> -  typedef ubsan_typedesc compare_type;
> -
> -  static inline hashval_t hash (const value_type *);
> -  static inline bool equal (const value_type *, const compare_type *);
> -};
> -
> -/* Hash a memory reference.  */
> -
> -inline hashval_t
> -ubsan_typedesc_hasher::hash (const ubsan_typedesc *data)
> -{
> -  return iterative_hash_object (TYPE_UID (data->type), 0);
> -}
> -
> -/* Compare two data types.  */
> -
> -inline bool
> -ubsan_typedesc_hasher::equal (const ubsan_typedesc *d1,
> - const ubsan_typedesc *d2)
> -{
> -  /* Here, the types should have identical __typekind,
> - __typeinfo and __typename.  */
> -  return d1->type == d2->type;
> -}
> -
> -static hash_table <ubsan_typedesc_hasher> ubsan_typedesc_ht;
> +/* Map a TYPE to an ubsan type descriptor VAR_DECL for that type.  */
> +static pointer_map_t *typedesc_map;
>
> -/* Initializes an instance of ubsan_typedesc.  */
> +/* Insert DECL as the VAR_DECL for TYPE in the TYPEDESC_MAP.  */
>
>  static void
> -ubsan_typedesc_init (ubsan_typedesc *data, tree type, tree decl)
> +insert_decl_for_type (tree decl, tree type)
>  {
> -  data->type = type;
> -  data->decl = decl;
> +  void **slot = pointer_map_insert (typedesc_map, type);
> +  gcc_assert (*slot == NULL);
> +  *slot = decl;
>  }
>
> -/* This creates the alloc pool used to store the instances of
> -   ubsan_typedesc that are stored in the hash table ubsan_typedesc_ht.  */
> +/* Find the VAR_DECL for TYPE in TYPEDESC_MAP.  If TYPE does not
> +   exist in the map, return NULL_TREE, otherwise, return the VAR_DECL
> +   we found.  */
>
> -static alloc_pool
> -ubsan_typedesc_get_alloc_pool ()
> -{
> -  if (ubsan_typedesc_alloc_pool == NULL)
> -ubsan_typedesc_alloc_pool = create_alloc_pool ("ubsan_typedesc",
> -  sizeof (ubsan_typedesc),
> -  10);
> -  return ubsan_typedesc_alloc_pool;
> -}
> -
> -/* Returns a reference to the hash table containing data type.
> -   This function ensures that the hash table is created.  */
> -
> -static hash_table <ubsan_typedesc_hasher> &
> -get_typedesc_hash_table ()
> -{
> -  if (!ubsan_typedesc_ht.is_created ())
> -ubsan_typedesc_ht.create (10);
> -
> -  return ubsan_typedesc_ht;
> -}
> -
> -/* Allocates memory for an instance of ubsan_typedesc into the memory
> -   pool returned by ubsan_typedesc_get_alloc_pool and initialize it.
> -   TYPE describes a particular type, DECL is its VAR_DECL.  */
> -
> -static ubsan_typedesc *
> -ubsan_typedesc_new (tree type, tree decl)
> +static tree
> +lookup_decl_for_type (tree type)
>  {
> -  ubsan_typedesc *desc =
> -(ubsan_typedesc *) pool_alloc (ubsan_typedesc_get_alloc_pool ());
> -
> -  ubsan_typedesc_init (desc, type, decl);
> -  return desc;
> +  /* If the pointer map is not initialized yet, create it now.  */
> +  if (typedesc_map == NULL)
> +typedesc_map = pointer_map_create ();
> +  void **slot = pointer_map_contains (typedesc_map, type);
> +  return slot ? (tree) *slot : NULL_TREE;
>  }
>
> -/* Helper routine, which encodes a value in the uptr type.
> +/* Helper routine, which encodes a value in the pointer_sized_int_node.
> Arguments with precision <= POINTER_SIZE are passed directly,
> the rest is passed by reference.  T is a value we are to encode.  */
>
> @@ -281,24 +214,19 @@ get_ubsan_ty

Re: wide-int branch now up for public comment and review

2013-08-28 Thread Richard Biener
On Wed, 28 Aug 2013, Richard Sandiford wrote:

> Richard Biener  writes:
> >> * As above, constants coming from rtl are already in the right form,
> >>   so if you create a wide_int from an rtx and only query it, no explicit
> >>   extension is needed.
> >> 
> >> * Things like logical operations and right shifts naturally preserve
> >>   the sign-extended form, so only a subset of write operations need
> >>   to take special measures.
> >> 
> >> * You have a public interface that exposes the underlying HWIs
> >>   (which is fine with me FWIW), so it seems better to expose a 
> >> fully-defined
> >>   HWI rather than only a partially-defined HWI.
> >> 
> >> E.g. zero_p is:
> >> 
> >>   HOST_WIDE_INT x;
> >> 
> >>   if (precision && precision < HOST_BITS_PER_WIDE_INT)
> >> x = sext_hwi (val[0], precision);
> >>   else if (len == 0)
> >> {
> >>   gcc_assert (precision == 0);
> >>   return true;
> >> }
> >>   else
> >> x = val[0];
> >> 
> >>   return len == 1 && x == 0;
> >> 
> >> but I think it really ought to be just:
> >> 
> >>   return len == 1 && val[0] == 0;
> >
> > Yes!
> >
> > But then - what value does keeping track of a 'precision' have
> > in this case?  It seems to me it's only a "convenient carrier"
> > for
> >
> >   wide_int x = wide-int-from-RTX (y);
> >   machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode;
> >   ... process x ...
> >   RTX = RTX-from-wide_int (x, saved_mode);
> >
> > that is, wide-int doesn't do anything with 'precision' but you
> > can extract it later to not need to remember a mode you were
> > interested in?
> 
> I can see why you like the constant-precision, very wide integers for trees,
> where the constants have an inherent sign.  But (and I think this might be
> old ground too :-)), that isn't the case with rtl.  At the tree level,
> using constant-precision, very wide integers allows you to add a 32-bit
> signed INTEGER_CST to a 16-unsigned INTEGER_CST.  And that has an
> obvious meaning, both as a 32-bit result or as a wider result, depending
> on how you choose to use it.  But in rtl there is no meaning to adding
> an SImode and an HImode value together, since we don't know how to
> extend the HImode value beyond its precision.  You must explicitly sign-
> or zero-extend the value first.  (The fact that we choose to sign-extend
> rtl constants when storing them in HWIs is just a representation detail,
> to avoid having undefined bits in the HWIs.  It doesn't mean that rtx
> values themselves are signed.  We could have used a zero-extending
> representation instead without changing the semantics.)

Yeah, that was my understanding.

> So the precision variable is good for the rtl level in several ways:
> 
> - As you say, it avoids adding the explicit truncations that (in practice)
>   every rtl operation would need
> 
> - It's more efficient in that case, since we don't calculate high values
>   and then discard them immediately.  The common GET_MODE_PRECISION (mode)
>   <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
>   the wide-int trappings.
> 
> - It's a good way of checking type safety and making sure that excess
>   bits aren't accidentally given a semantic meaning.  This is the most
>   important reason IMO.
> 
> The branch has both the constant-precision, very wide integers that we
> want for trees and the variable-precision integers we want for rtl,
> so it's not an "either or".  With the accessor-based implementation,
> there should be very little cost to having both.

So what I wonder (and where we maybe disagree) is how much code
wants to inspect "intermediate" results.  Say originally you have

rtx foo (rtx x, rtx y)
{
  rtx tem = simplify_const_binary_operation (PLUS, GET_MODE (x), x, 
GEN_INT (1));
  rtx res = simplify_const_binary_operation (MINUS, GET_MODE (tem), tem, 
y);
  return res;
}

and with wide-int you want to change that to

rtx foo (rtx x, rtx y)
{
  wide_int tem = wide_int (x) + 1;
  wide_int res = tem - y;
  return res.to_rtx ();
}

how much code ever wants to inspect 'tem' or 'res'?
That is, does it matter
if 'tem' and 'res' would have been calculated in "infinite precision"
and only to_rtx () would do the truncation to the desired mode?

I think not.  The amount of code performing multiple operations on
_constants_ in sequence is extremely low (if it even exists).

So I'd rather have to_rtx get a mode argument (or a precision) and
perform the required truncation / sign-extension at RTX construction
time (which is an expensive operation anyway).

So where does this "break"?  It only breaks where previous code
broke (or where previous code had measures to not break that we
don't carry over to the wide-int case).  An obvious case is unsigned
division on the sign-extended rep of RTL constants, for example.

> >> The main thing that's changed since the early patches is that we now
> >> have a mixture of wide-int types.  This seems to have led to a lot of
> >> boiler-plate forwarding 

[committed] Fix bogus warning on OpenMP collapsed loops (PR middle-end/58257)

2013-08-28 Thread Jakub Jelinek
Hi!

The .count.NN vars have TREE_NO_WARNING set on them, because
they may be uninitialized if the loop body isn't executed at all
(zero trip count), but copy_var_decl wasn't propagating the flag over to
the parallel body ompfn function vars.

Fixed thusly, tested on x86_64-linux, committed to trunk/4.8.

2013-08-28  Jakub Jelinek  

PR middle-end/58257
* omp-low.c (copy_var_decl): Copy over TREE_NO_WARNING flag.

* c-c++-common/gomp/pr58257.c: New test.

--- gcc/omp-low.c.jj2013-08-28 11:27:54.0 +0200
+++ gcc/omp-low.c   2013-08-28 11:39:07.578948737 +0200
@@ -850,6 +850,7 @@ copy_var_decl (tree var, tree name, tree
   DECL_ARTIFICIAL (copy) = DECL_ARTIFICIAL (var);
   DECL_IGNORED_P (copy) = DECL_IGNORED_P (var);
   DECL_CONTEXT (copy) = DECL_CONTEXT (var);
+  TREE_NO_WARNING (copy) = TREE_NO_WARNING (var);
   TREE_USED (copy) = 1;
   DECL_SEEN_IN_BIND_EXPR_P (copy) = 1;
 
--- gcc/testsuite/c-c++-common/gomp/pr58257.c.jj2013-08-28 
11:56:17.991343584 +0200
+++ gcc/testsuite/c-c++-common/gomp/pr58257.c   2013-08-28 11:58:24.616648066 
+0200
@@ -0,0 +1,15 @@
+/* PR middle-end/58257 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fopenmp -Wall" } */
+
+int
+foo (int n)
+{
+  int a[10][10];
+  int x, y;
+#pragma omp parallel for collapse(2)   /* { dg-bogus "may be used 
uninitialized in this function" } */
+  for (x = 0; x < n; x++)  /* { dg-bogus "may be used 
uninitialized in this function" } */
+for (y = 0; y < n; y++)
+  a[x][y] = x + y * y;
+  return a[0][0];
+}

Jakub


[PATCH 2/2] fix PR48835 ICE-on-HAVE_cc0 regression from PR50780 changes

2013-08-28 Thread Mikael Pettersson
This patch fixes an ICE that occurs in #ifdef HAVE_cc0 code.  The ICE
breaks Ada bootstrap with m68060 multilibs on m68k-linux; see the
bugzilla entry.

The ICE is triggered by the PR middle-end/50780 changes in r180192.
Unlike the previous patch for PR49847, there is no simple test case,
but a bisection identified the same revision as the trigger.

In this case it's the first case CC0 in fold_rtx that's reached:
it unconditionally returns prev_insn_cc0 but that's NULL in this case
(due to r180192 changing the BB boundaries in EH regions), which causes
the ICE a few steps later.

At this point in fold_rtx, if it cannot optimize the expression it should
just return the original expression, so the patch checks if prev_insn_cc0
is NULL and if so returns the original expression rather than NULL.

Bootstrapped and regtested on m68k-linux, no regressions.  This patch has
been in a 4.7-based production compiler on m68k-linux since early 2013.

Ok for trunk and 4.8?

[If approved I'll need help to commit it as I don't have commit rights.]

gcc/

2013-08-28  Mikael Pettersson  

PR ada/48835
* cse.c (fold_rtx) <CC0>: If prev_insn_cc0 is NULL
return the original expression x.

--- gcc-4.9-20130818/gcc/cse.c.~1~  2013-08-05 22:16:05.0 +0200
+++ gcc-4.9-20130818/gcc/cse.c  2013-08-24 16:37:47.572940952 +0200
@@ -3137,6 +3137,8 @@ fold_rtx (rtx x, rtx insn)
 
 #ifdef HAVE_cc0
 case CC0:
+  if (! prev_insn_cc0)
+   return x;
   return prev_insn_cc0;
 #endif
 


[PATCH 1/2] fix PR49847 ICE-on-HAVE_cc0 regression from PR50780 changes

2013-08-28 Thread Mikael Pettersson
This patch fixes an ICE that occurs in #ifdef HAVE_cc0 code.  The ICE
breaks both Java and Ada bootstrap on m68k-linux.  There is also a
tiny C++ test case in the BZ entry.

The ICE is triggered by the PR middle-end/50780 changes in r180192.
As explained in the bugzilla entry,
the effect of those changes is that an expression in EH context is
broken up so that the cc0 setter and user are put in separate BBs, which
normally isn't allowed.  As fold_rtx sees the cc0 user, it calls
equiv_constant on prev_insn_cc0, but that is NULL due to the BB boundary,
resulting in the ICE.

This patch checks if prev_insn_cc0 is NULL, and if so doesn't call
equiv_constant but sets const_arg to zero, which avoids the ICE and
makes fold_rtx leave the insn unchanged.

Bootstrapped and regtested on m68k-linux, no regressions.  This patch
has been in 4.6-based production compilers on m68k-linux since early 2012,
and in a 4.7-based compiler since early 2013.

Ok for trunk and 4.8?

[If approved I'll need help to commit it as I don't have commit rights.]

gcc/

2013-08-28  Mikael Pettersson  

PR rtl-optimization/49847
* cse.c (fold_rtx) <CC0>: If prev_insn_cc0 is NULL
don't call equiv_constant on it.

--- gcc-4.9-20130818/gcc/cse.c.~1~  2013-08-05 22:16:05.0 +0200
+++ gcc-4.9-20130818/gcc/cse.c  2013-08-24 16:38:40.912803915 +0200
@@ -3194,9 +3194,14 @@ fold_rtx (rtx x, rtx insn)
 
 #ifdef HAVE_cc0
  case CC0:
-   folded_arg = prev_insn_cc0;
-   mode_arg = prev_insn_cc0_mode;
-   const_arg = equiv_constant (folded_arg);
+   if (!prev_insn_cc0)
+ const_arg = 0;
+   else
+ {
+   folded_arg = prev_insn_cc0;
+   mode_arg = prev_insn_cc0_mode;
+   const_arg = equiv_constant (folded_arg);
+ }
break;
 #endif
 


Re: wide-int branch now up for public comment and review

2013-08-28 Thread Richard Sandiford
Richard Biener  writes:
>> * As above, constants coming from rtl are already in the right form,
>>   so if you create a wide_int from an rtx and only query it, no explicit
>>   extension is needed.
>> 
>> * Things like logical operations and right shifts naturally preserve
>>   the sign-extended form, so only a subset of write operations need
>>   to take special measures.
>> 
>> * You have a public interface that exposes the underlying HWIs
>>   (which is fine with me FWIW), so it seems better to expose a fully-defined
>>   HWI rather than only a partially-defined HWI.
>> 
>> E.g. zero_p is:
>> 
>>   HOST_WIDE_INT x;
>> 
>>   if (precision && precision < HOST_BITS_PER_WIDE_INT)
>> x = sext_hwi (val[0], precision);
>>   else if (len == 0)
>> {
>>   gcc_assert (precision == 0);
>>   return true;
>> }
>>   else
>> x = val[0];
>> 
>>   return len == 1 && x == 0;
>> 
>> but I think it really ought to be just:
>> 
>>   return len == 1 && val[0] == 0;
>
> Yes!
>
> But then - what value does keeping track of a 'precision' have
> in this case?  It seems to me it's only a "convenient carrier"
> for
>
>   wide_int x = wide-int-from-RTX (y);
>   machine_mode saved_mode = mode-available? GET_MODE (y) : magic-mode;
>   ... process x ...
>   RTX = RTX-from-wide_int (x, saved_mode);
>
> that is, wide-int doesn't do anything with 'precision' but you
> can extract it later to not need to remember a mode you were
> interested in?

I can see why you like the constant-precision, very wide integers for trees,
where the constants have an inherent sign.  But (and I think this might be
old ground too :-)), that isn't the case with rtl.  At the tree level,
using constant-precision, very wide integers allows you to add a 32-bit
signed INTEGER_CST to a 16-unsigned INTEGER_CST.  And that has an
obvious meaning, both as a 32-bit result or as a wider result, depending
on how you choose to use it.  But in rtl there is no meaning to adding
an SImode and an HImode value together, since we don't know how to
extend the HImode value beyond its precision.  You must explicitly sign-
or zero-extend the value first.  (The fact that we choose to sign-extend
rtl constants when storing them in HWIs is just a representation detail,
to avoid having undefined bits in the HWIs.  It doesn't mean that rtx
values themselves are signed.  We could have used a zero-extending
representation instead without changing the semantics.)
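
(A standalone toy example of that ambiguity, with made-up values:

  #include <stdint.h>
  #include <stdio.h>

  int main (void)
  {
    uint32_t si = 100000;   /* an SImode value */
    uint16_t hi = 0x8000;   /* an HImode value with the MSB set */
    /* The "sum" depends entirely on how HI is widened first.  */
    printf ("%u\n", si + (uint32_t) (int32_t) (int16_t) hi);  /* sign-extend: 67232 */
    printf ("%u\n", si + (uint32_t) hi);                      /* zero-extend: 132768 */
    return 0;
  }

so without an explicit extension the addition has no single answer.)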

So the precision variable is good for the rtl level in several ways:

- As you say, it avoids adding the explicit truncations that (in practice)
  every rtl operation would need

- It's more efficient in that case, since we don't calculate high values
  and then discard them immediately.  The common GET_MODE_PRECISION (mode)
  <= HOST_BITS_PER_WIDE_INT case stays a pure HWI operation, despite all
  the wide-int trappings.

- It's a good way of checking type safety and making sure that excess
  bits aren't accidentally given a semantic meaning.  This is the most
  important reason IMO.

The branch has both the constant-precision, very wide integers that we
want for trees and the variable-precision integers we want for rtl,
so it's not an "either or".  With the accessor-based implementation,
there should be very little cost to having both.

>> The main thing that's changed since the early patches is that we now
>> have a mixture of wide-int types.  This seems to have led to a lot of
>> boiler-plate forwarding functions (or at least it felt like that while
>> moving them all out of the class).  And that in turn seems to be because
>> you're trying to keep everything as member functions.  E.g. a lot of the
>> forwarders are from a member function to a static function.
>> 
>> Wouldn't it be better to have the actual classes be light-weight,
>> with little more than accessors, and do the actual work with non-member
>> template functions?  There seems to be 3 grades of wide-int:
>> 
>>   (1) read-only, constant precision  (from int, etc.)
>>   (2) read-write, constant precision  (fixed_wide_int)
>>   (3) read-write, variable precision  (wide_int proper)
>> 
>> but we should be able to hide that behind templates, with compiler errors
>> if you try to write to (1), etc.
>
> Yeah, I'm probably trying to clean up the implementation once I
> got past recovering from two month without GCC ...

FWIW, I've been plugging away at a version that uses accessors.
I hope to have it vaguely presentable by the middle of next week,
in case your recovery takes that long...

Thanks,
Richard


Re: opt-info message change for vectorizer

2013-08-28 Thread Richard Biener
On Tue, Aug 27, 2013 at 10:30 PM, Xinliang David Li  wrote:
> If this is the convention, we should probably have another patch to
> fix all the existing opt-info messages.

Yes please.

Also ...


 b.c:16:A::foo: note: Loop is vectorized

"loop vectorized"


 b.c:16:A::foo: note: Loop is versioned to remove aliases for vectorization

"loop versioned for vectorization because of possible aliasing"

 b.c:16:A::foo: note: Loop is peeled to enhance alignment for vectorization

"loop peeled for vectorization to enhance alignment"

 b.c:16:A::foo: note: Completely unroll loop 6 times

maybe "loop with 6 iterations completely unrolled"


 b.c:12:A::foo: note: Completely unroll loop 6 times


I hate the excessive vertical spacing as well.

Richard.

 Ok after testing?

 thanks,

 David


Re: [PATCH] -mcmodel=large -fpic TLS GD and LD support gcc + binutils (PR target/58067)

2013-08-28 Thread Uros Bizjak
On Wed, Aug 28, 2013 at 11:37 AM, Jakub Jelinek  wrote:
> On Wed, Aug 14, 2013 at 09:03:24AM +0200, Uros Bizjak wrote:
>> The implementation for x86 is technically OK, but I wonder if these
>> sequences should be documented in some authoritative document about
>> TLS relocations. The "ELF Handling For Thread-Local Storage" document
>> [1] doesn't mention various code models for x86_64, so I was not able
>> to cross-check the implementation vs. documentation.
>>
>> [1] http://www.akkadia.org/drepper/tls.pdf
>
> Ping, are the patches ok for gcc trunk and binutils trunk?
> Uli has kindly updated the docs some time ago.

OK for gcc.

Thanks,
Uros.


Re: [PATCH] -mcmodel=large -fpic TLS GD and LD support gcc + binutils (PR target/58067)

2013-08-28 Thread Jakub Jelinek
On Wed, Aug 14, 2013 at 09:03:24AM +0200, Uros Bizjak wrote:
> The implementation for x86 is technically OK, but I wonder if these
> sequences should be documented in some authoritative document about
> TLS relocations. The "ELF Handling For Thread-Local Storage" document
> [1] doesn't mention various code models for x86_64, so I was not able
> to cross-check the implementation vs. documentation.
> 
> [1] http://www.akkadia.org/drepper/tls.pdf

Ping, are the patches ok for gcc trunk and binutils trunk?
Uli has kindly updated the docs some time ago.

Jakub


Re: [PATCH 1/2] Convert symtab, cgraph and varpool nodes into a real class hierarchy

2013-08-28 Thread Richard Biener
On Tue, Aug 27, 2013 at 5:06 PM, Mike Stump  wrote:
> On Aug 27, 2013, at 4:08 AM, Richard Biener  
> wrote:
>>> and converts:
>>>  struct GTY(()) cgraph_node
>>> to:
>>>  struct GTY((user)) cgraph_node : public symtab_node_base
>
> GTY didn't like single inheritance for me in wide-int.h.  I extended GTY
> to support it better.  See the wide-int branch if you need more beef here.
> It isn't flawless, but it did let me do enough to make it work nicely enough
> for my purposes.  In my case, all my data is in the base class, and the base
> class and the derived have the same address (single inheritance, no virtual
> bases), and I have no embedded pointers in my data.  I don't use GTY((user));
> it seemed less ideal to me.
>
> I also had to put:
>
>   void gt_ggc_mx(max_wide_int*) { }
>   void gt_pch_nx(max_wide_int*,void (*)(void*, void*), void*) { }
>   void gt_pch_nx(max_wide_int*) { }
>
> in wide-int.cc and:
>
>   extern void gt_ggc_mx(max_wide_int*);
>   extern void gt_pch_nx(max_wide_int*,void (*)(void*, void*), void*);
>   extern void gt_pch_nx(max_wide_int*);
>
> into wide-int.h to get it to go.  These I think are merely because I'm using 
> a typedef for a template with template args.  In the wide-int branch, it'd be 
> nice if a GTY god can figure out how to improve things so I don't need to do
> the above, and to review/approve/fix the GTY code to support single 
> inheritance even better.

Huh?  Why should wide-int need to be marked GTY at all?!

Richard.

>>> and:
>>>  struct GTY(()) varpool_node
>>> to:
>>>  class GTY((user)) varpool_node : public symtab_node_base
>>>
>>> dropping the symtab_node_def union.
>>>
>>> Since gengtype is unable to cope with inheritance, we have to mark the
>>> types with GTY((user)), and handcode the gty field-visiting functions.
>>> Given the simple hierarchy, we don't need virtual functions for this.


Re: [PING PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK

2013-08-28 Thread Martin Jambor
On Wed, Aug 28, 2013 at 10:17:52AM +0200, Richard Biener wrote:
> On Wed, 28 Aug 2013, Richard Biener wrote:
> > Eh ... :/
> > 
> > I'm extremely nervous about this change.  I also believe the change
> > is unrelated to the issue in the bugreport (even if it happens to
> > fix the ICE).
> > 
> > Let me have a (short) look.
> 
> Everything looks ok up until
> 
>   if (offset != 0)
> {
>   enum machine_mode address_mode;
> 
> where we are supposed to factor the offset into the memory address.
> This code doesn't handle the movmisalign path which has the memory
> in 'mem' instead of in to_rtx.
> 
> And indeed a hack like
> 
> Index: gcc/expr.c
> ===
> --- gcc/expr.c  (revision 202043)
> +++ gcc/expr.c  (working copy)
> @@ -4753,6 +4753,9 @@ expand_assignment (tree to, tree from, b
> {
>   enum machine_mode address_mode;
>   rtx offset_rtx;
> + rtx saved_to_rtx = to_rtx;
> + if (misalignp)
> +   to_rtx = mem;
>  
>   if (!MEM_P (to_rtx))
> {
> @@ -4785,6 +4788,11 @@ expand_assignment (tree to, tree from, b
>   to_rtx = offset_address (to_rtx, offset_rtx,
>highest_pow2_factor_for_target (to,
>
> offset));
> + if (misalignp)
> +   {
> + mem = to_rtx;
> + to_rtx = saved_to_rtx;
> +   }
> }
>  
>/* No action is needed if the target is not a memory and the field
> 
> fixes the testcase and results in (at -O)
> 
> foo:
> .LFB1:
> .cfi_startproc
> pushq   %rbx
> .cfi_def_cfa_offset 16
> .cfi_offset 3, -16
> movq%rdi, %rbx
> callget_i
> cltq
> addq$1, %rax
> salq$4, %rax
> movdqa  .LC0(%rip), %xmm0
> movdqu  %xmm0, (%rbx,%rax)
> popq%rbx
> .cfi_def_cfa_offset 8
> ret
> 
> exactly what is expected.
> 
> Martin, do you want to audit the (few) places we do the movmisalign
> game for the same problem or shall I put it on my TODO?  The audit
> should look for all MEM_P (to_rtx) tests which really should look
> at 'mem' for the unaligned move case (I see a volatile bitfield
> case for example ...).

And I thought I had an easy way out.  Well, let me see what I can do.

Thanks,

Martin


Re: Remove streaming of some tree flags and fields

2013-08-28 Thread Richard Biener
On Wed, 28 Aug 2013, Jan Hubicka wrote:

> Hi,
> this patch removes streaming of DECL_DEFER_OUTPUT, BINFO_INHERITANCE_CHAIN,
> BINFO_SUBVTT_INDEX, BINFO_VPTR_INDEX, DECL_IN_TEXT_SECTION
> and DECL_ERROR_ISSUED.
> 
> I originally intended to remove DECL_DEFER_OUTPUT completely (it is pointless
> ever since we became unit-at-a-time), but that is a bit harder because the
> java frontend relies on it in a strange (and accidental) way.  I think
> DECL_DEFER_OUTPUT may be random between units, preventing merging, but I did
> not verify this assumption.
> 
> Another flag I would like to see go, since it is not really part of the IL
> and represents just bookkeeping, is DECL_SEEN_IN_BIND_EXPR_P, but that one
> is used by gimplifier sanity checks.
> 
> While working on it, I noticed that DECL_ERROR_ISSUED is unused in the
> compiler.  Moreover, BINFO_INHERITANCE_CHAIN, BINFO_SUBVTT_INDEX and
> BINFO_VPTR_INDEX are unused by the backend (and always NULL, cleared by
> free-lang-data).  DECL_IN_TEXT_SECTION is removed on the basis that it is
> only set at the time we output the variable into an assembly file.
> 
> I will send a separate patch to remove DECL_DEFER_OUTPUT.
> 
> Bootstrapped/regtested ppc64-linux, OK?

Ok.

Thanks,
Richard.

>   * lto.c (compare_tree_sccs_1): Drop DECL_ERROR_ISSUED,
>   DECL_DEFER_OUTPUT and DECL_IN_TEXT_SECTION.
>   (unify_scc): Do checking assert.
> 
>   * lto-streamer-out.c (DFS_write_tree_body): Drop
>   BINFO_INHERITANCE_CHAIN, BINFO_SUBVTT_INDEX and BINFO_VPTR_INDEX.
>   (hash_tree): Do not hash DECL_DEFER_OUTPUT, BINFO_INHERITANCE_CHAIN,
>   BINFO_SUBVTT_INDEX, BINFO_VPTR_INDEX, DECL_IN_TEXT_SECTION.
>   * tree-streamer-in.c (unpack_ts_decl_common_value_fields):
>   Do not read DECL_ERROR_ISSUED.
>   (unpack_ts_decl_with_vis_value_fields): Do not read
>   DECL_DEFER_OUTPUT.
>   (lto_input_ts_binfo_tree_pointers): Do not read BINFO_INHERITANCE_CHAIN,
> BINFO_SUBVTT_INDEX, BINFO_VPTR_INDEX.
>   * tree-streamer-out.c (pack_ts_decl_common_value_fields): Do not
>   write DECL_ERROR_ISSUED.
>   (pack_ts_decl_with_vis_value_fields): Do not write
>   DECL_DEFER_OUTPUT.
>   (write_ts_binfo_tree_pointers): Do not write BINFO_INHERITANCE_CHAIN,
> BINFO_SUBVTT_INDEX, BINFO_VPTR_INDEX.
>   * print-tree.c (print_node): Do not print DECL_ERROR_ISSUED.
>   * tree.h (tree_decl_common): Update comment.
>   (DECL_ERROR_ISSUED): Remove.
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 202000)
> +++ lto-streamer-out.c(working copy)
> @@ -642,9 +642,8 @@ DFS_write_tree_body (struct output_block
>FOR_EACH_VEC_SAFE_ELT (BINFO_BASE_ACCESSES (expr), i, t)
>   DFS_follow_tree_edge (t);
>  
> -  DFS_follow_tree_edge (BINFO_INHERITANCE_CHAIN (expr));
> -  DFS_follow_tree_edge (BINFO_SUBVTT_INDEX (expr));
> -  DFS_follow_tree_edge (BINFO_VPTR_INDEX (expr));
> +  /* Do not walk BINFO_INHERITANCE_CHAIN, BINFO_SUBVTT_INDEX
> +  and BINFO_VPTR_INDEX; these are used by C++ FE only.  */
>  }
>  
>if (CODE_CONTAINS_STRUCT (code, TS_CONSTRUCTOR))
> @@ -748,7 +747,6 @@ hash_tree (struct streamer_tree_cache_d
>v = iterative_hash_host_wide_int (DECL_ALIGN (t), v);
>if (code == LABEL_DECL)
>   {
> -   v = iterative_hash_host_wide_int (DECL_ERROR_ISSUED (t), v);
> v = iterative_hash_host_wide_int (EH_LANDING_PAD_NR (t), v);
> v = iterative_hash_host_wide_int (LABEL_DECL_UID (t), v);
>   }
> @@ -781,20 +779,19 @@ hash_tree (struct streamer_tree_cache_d
>  
>if (CODE_CONTAINS_STRUCT (code, TS_DECL_WITH_VIS))
>  {
> -  v = iterative_hash_host_wide_int (DECL_DEFER_OUTPUT (t)
> - | (DECL_COMMON (t) << 1)
> - | (DECL_DLLIMPORT_P (t) << 2)
> - | (DECL_WEAK (t) << 3)
> - | (DECL_SEEN_IN_BIND_EXPR_P (t) << 4)
> - | (DECL_COMDAT (t) << 5)
> +  v = iterative_hash_host_wide_int ((DECL_COMMON (t))
> + | (DECL_DLLIMPORT_P (t) << 1)
> + | (DECL_WEAK (t) << 2)
> + | (DECL_SEEN_IN_BIND_EXPR_P (t) << 3)
> + | (DECL_COMDAT (t) << 4)
>   | (DECL_VISIBILITY_SPECIFIED (t) << 6),
>   v);
>v = iterative_hash_host_wide_int (DECL_VISIBILITY (t), v);
>if (code == VAR_DECL)
>   {
> +   /* DECL_IN_TEXT_SECTION is set during final asm output only.  */
> v = iterative_hash_host_wide_int (DECL_HARD_REGISTER (t)
> - | (DECL_IN_TEXT_SECTION (t) << 1)
> - | (DECL_IN_CONSTANT_POOL (t) << 2),
> +

Re: wide-int branch now up for public comment and review

2013-08-28 Thread Richard Biener
On Sun, 25 Aug 2013, Mike Stump wrote:

> On Aug 25, 2013, at 12:26 AM, Richard Sandiford wrote:
> > (2) Adding a new namespace, wi, for the operators.  So far this
> >just contains the previously-static comparison functions
> >and whatever else was needed to avoid cross-dependencies
> >between wi and wide_int_ro (except for the debug routines).
> 
> It seems reasonable; I don't see anything I object to.  Seems like most of 
> the time, the code is shorter (though, you use wi, which is fairly short).  
> It doesn't seem any more complex, though knowing how to spell the operation 
> (wide_int:: vs. wi::) is confusing on the client side.  I'm torn between this 
> and the nice things that come with the patch.
> 
> > (3) Removing the comparison member functions and using the static
> >ones everywhere.
> 
> I've love to have richi weigh in (or someone else that wants to play the 
> role of C++ coding expert)?  I'd defer to them?

Yeah - wi::lt (a, b) is much better than a.lt (b) IMHO.  It mimics how
the standard library works.
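
To make the contrast concrete, a minimal self-contained sketch of the
two spellings (toy stand-in type, not the real wide-int class):

#include <cstdint>

struct toy_wide_int { int64_t val; };

namespace wi
{
  /* Free function, namespace-style: spelled wi::lt (a, b).  */
  inline bool
  lt (const toy_wide_int &a, const toy_wide_int &b)
  {
    return a.val < b.val;
  }
}

int
main ()
{
  toy_wide_int a = { 2 }, b = { 6 };
  /* Reads like the standard library's free-function comparisons,
     as opposed to the member spelling a.lt (b).  */
  return wi::lt (a, b) ? 0 : 1;
}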

> > The idea behind using a namespace rather than static functions
> > is that it makes it easier to separate the core, tree and rtx bits.
> 
> Being able to separate core, tree and rtx bits gets a +1 in my book.  I 
> do understand the beauty of this.

Now, if you look back in the discussions, I wanted a storage
abstraction anyway.  Basically the interface is

class wide_int_storage
{
  int precision ();
  int len ();
  element_t get (unsigned);
  void set (unsigned, element_t);
};

and wide_int is then templated like

template <class storage>
class wide_int : public storage
{
};

where RTX / tree storage classes provide read-only access to their
storage and an rvalue integer rep of their value.

You can look at my example draft implementation I posted some
months ago.  But I'll gladly wiggle on the branch to make it
more like above (easy step one: don't access the wide-int members
directly but via accessor functions)
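
To make the idea concrete, a self-contained toy (not GCC code; all
names are hypothetical) of the storage-template pattern sketched
above, with one value-owning storage model plugged in:

#include <cassert>
#include <stdint.h>

typedef uint64_t element_t;

/* A value-owning storage model implementing the interface above.  */
class array_storage
{
  element_t val[2];
  unsigned len_;
public:
  array_storage (element_t lo) : len_ (1) { val[0] = lo; }
  unsigned len () const { return len_; }
  element_t get (unsigned i) const { assert (i < len_); return val[i]; }
};

template <class storage>
struct toy_wide_int : public storage
{
  toy_wide_int (const storage &s) : storage (s) {}
  element_t low () const { return this->get (0); }
};

int
main ()
{
  toy_wide_int<array_storage> x (array_storage (42));
  return x.low () == 42 ? 0 : 1;
}

A read-only tree or rtx storage model would then provide the same
accessors over the constant's existing representation.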

> > IMO wide-int.h shouldn't know about trees and rtxes, and all routines
> > related to them should be in tree.h and rtl.h instead.  But using
> > static functions means that you have to declare everything in one place.
> > Also, it feels odd for wide_int to be both an object and a home
> > of static functions that don't always operate on wide_ints, e.g. when
> > comparing a CONST_INT against 16.

Indeed - in my sample the wide-int-rtx-storage and wide-int-tree-storage
storage models were declared in rtl.h and tree.h and wide-int.h did
know nothing about them.

> Yes, though, does wi feel odd being a home for comparing a CONST_INT and 
> 16?  :-)
> 
> > I realise I'm probably not being helpful here.
> 
> Iterating on how we want the code to look is reasonable.  Prettying 
> it up where it needs it is good.
> 
> Indeed, if the code is as you like, and as richi likes, then our 
> mission is just about complete.  :-)  For this patch, I'd love to defer 
> to richi (or someone that has a stronger opinion than I do) to say, 
> better, worse?

The comparisons?  Better.

Thanks,
Richard.


[PATCH] Support dumping type bindings and 'mutable' qualifier in lambda diagnostics.

2013-08-28 Thread Adam Butcher
* error.c (dump_function_decl): Use standard diagnostic flow to dump a
lambda diagnostic, albeit without stating the function name or
duplicating the parameter spec (which is dumped as part of the type).
Rather than qualifying the diagnostic with 'const' for plain lambdas,
qualify with 'mutable' if non-const.
---
Okay to commit?
---
 gcc/cp/error.c | 27 +++
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index c82a0ce..a8ca269 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -1380,14 +1380,7 @@ dump_function_decl (cxx_pretty_printer *pp, tree t, int flags)
   int do_outer_scope = ! (flags & TFF_UNQUALIFIED_NAME);
   tree exceptions;
   vec<tree, va_gc> *typenames = NULL;
-
-  if (DECL_NAME (t) && LAMBDA_FUNCTION_P (t))
-{
-  /* A lambda's signature is essentially its "type", so defer.  */
-  gcc_assert (LAMBDA_TYPE_P (DECL_CONTEXT (t)));
-  dump_type (pp, DECL_CONTEXT (t), flags);
-  return;
-}
+  bool lambda_p = false;
 
   flags &= ~(TFF_UNQUALIFIED_NAME | TFF_TEMPLATE_NAME);
   if (TREE_CODE (t) == TEMPLATE_DECL)
@@ -1449,21 +1442,31 @@ dump_function_decl (cxx_pretty_printer *pp, tree t, int flags)
   else if (cname)
 {
   dump_type (pp, cname, flags);
-  pp_cxx_colon_colon (pp);
+  if (LAMBDA_TYPE_P (cname))
+   lambda_p = true;
+  else
+   pp_cxx_colon_colon (pp);
 }
   else
 dump_scope (pp, CP_DECL_CONTEXT (t), flags);
 
-  dump_function_name (pp, t, flags);
+  /* A lambda's signature is essentially its "type", which has already been
+ dumped.  */
+  if (!lambda_p)
+dump_function_name (pp, t, flags);
 
   if (!(flags & TFF_NO_FUNCTION_ARGUMENTS))
 {
-  dump_parameters (pp, parmtypes, flags);
+  if (!lambda_p)
+   dump_parameters (pp, parmtypes, flags);
 
   if (TREE_CODE (fntype) == METHOD_TYPE)
{
  pp->padding = pp_before;
- pp_cxx_cv_qualifier_seq (pp, class_of_this_parm (fntype));
+ if (!lambda_p)
+   pp_cxx_cv_qualifier_seq (pp, class_of_this_parm (fntype));
+	 else if (!(TYPE_QUALS (class_of_this_parm (fntype)) & TYPE_QUAL_CONST))
+   pp_c_ws_string (pp, "mutable");
  dump_ref_qualifier (pp, fntype, flags);
}
 
-- 
1.8.4
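
For context, a hypothetical illustration (not from the patch) of the
kind of function this affects; with the change, the operator() of a
non-const lambda is dumped as its closure type followed by 'mutable'
rather than with a duplicated parameter list and 'const':

int
main ()
{
  int i = 0;
  auto l = [i] () mutable { return ++i; };  // operator() is non-const
  return l ();
}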



Re: wide-int branch now up for public comment and review

2013-08-28 Thread Richard Biener
On Fri, 23 Aug 2013, Richard Sandiford wrote:

> Hi Kenny,
> 
> This is the first time I've looked at the implementation of wide-int.h
> (rather than just looking at the rtl changes, which as you know I like
> in general), so FWIW here are some comments on wide-int.h.  I expect
> a lot of them overlap with Richard B.'s comments.
> 
> I also expect many of them are going to be annoying, sorry, but this
> first one definitely will.  The coding conventions say that functions
> should be defined outside the class:
> 
> http://gcc.gnu.org/codingconventions.html
> 
> and that opening braces should be on their own line, so most of the file
> needs to be reformatted.  I went through and made that change with the
> patch below, in the process of reading through.  I also removed "SGN
> must be SIGNED or UNSIGNED." because it seemed redundant when those are
> the only two values available.  The patch fixes a few other coding standard
> problems and typos, but I've not made any actual code changes (or at least,
> I didn't mean to).
> 
> Does it look OK to install?
> 
> I'm still unsure about these "infinite" precision types, but I understand
> the motivation and I have no objections.  However:
> 
> > * Code that does widening conversions.  The canonical way that
> >   this is performed is to sign or zero extend the input value to
> >   the max width based on the sign of the type of the source and
> >   then to truncate that value to the target type.  This is in
> >   preference to using the sign of the target type to extend the
> >   value directly (which gets the wrong value for the conversion
> >   of large unsigned numbers to larger signed types).
> 
> I don't understand this particular reason.  Using the sign of the source
> type is obviously right, but why does that mean we need "infinite" precision,
> rather than just doubling the precision of the source?

The comment indeed looks redundant - of course it is not correct
to extend the value "directly".

> >   * When a constant that has an integer type is converted to a
> > wide-int it comes in with precision 0.  For these constants the
> > top bit does accurately reflect the sign of that constant; this
> > is an exception to the normal rule that the signedness is not
> > represented.  When used in a binary operation, the wide-int
> > implementation properly extends these constants so that they
> > properly match the other operand of the computation.  This allows
> > you to write:
> >
> >tree t = ...
> >wide_int x = t + 6;
> >
> > assuming t is a int_cst.
> 
> This seems dangerous.  Not all code that uses "unsigned HOST_WIDE_INT"
> actually wants it to be an unsigned value.  Some code uses it to avoid
> the undefinedness of signed overflow.  So these overloads could lead
> to us accidentally zero-extending what's conceptually a signed value
> without any obvious indication that that's happening.  Also, hex constants
> are unsigned int, but it doesn't seem safe to assume that 0x8000 was
> meant to be zero-extended.
>
> I realise the same thing can happen if you mix "unsigned int" with
> HOST_WIDE_INT, but the point is that you shouldn't really do that
> in general, whereas we're defining these overloads precisely so that
> a mixture can be used.
> 
> I'd prefer some explicit indication of the sign, at least for anything
> other than plain "int" (so that the compiler will complain about uses
> of "unsigned int" and above).

I prefer the automatic promotion - it is exactly what regular C types
do.  Now, consider

  wide_int x = ... construct 5 with precision 16 ...
  wide_int y = x + 6;

now, '6' is 'int' (precision 32), but with wide-int we treat it
as precision '0' ('infinite').  For x + 6 we then _truncate_ its
precision to that of 'x' (?), not exactly matching C behavior
(where we'd promote 'x' to 'int', perform the add and then truncate
to the precision of 'y' - which for wide-int gets its precision
from the result of x + 6).

Mimicking C would support dropping those 'require equal precision'
asserts but also would require us to properly track a sign to be
able to promote properly (or as I argued all the time always
properly sign-extend values so we effectively have infinite precision
anyway).
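
For reference, what the C rules do in the analogous fixed-width case
(a sketch of the behavior discussed above, not wide-int code):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint16_t x = 5;      /* like a wide_int with precision 16 */
  uint16_t y = x + 6;  /* x and 6 are promoted to int, the add is done
                          in int, and the result is truncated back to
                          16 bits by the store */
  printf ("%u\n", (unsigned) y);  /* prints 11 */
  return 0;
}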

The fits_uhwi_p implementation changes scare me off that
"upper bits are undefined" thing a lot again... (I hate introducing
'undefinedness' into the compiler by 'design')

> >   Note that the bits above the precision are not defined and the
> >   algorithms used here are careful not to depend on their value.  In
> >   particular, values that come in from rtx constants may have random
> >   bits.

Which is a red herring.  It should be fixed.  I cannot even believe
that sentence given the uses of CONST_DOUBLE_LOW/CONST_DOUBLE_HIGH
or INTVAL/UINTVAL.  I don't see accesses masking out 'undefined' bits
anywhere.

> I have a feeling I'm rehashing a past debate, sorry, but rtx constant

Remove streaming of some tree flags and fields

2013-08-28 Thread Jan Hubicka
Hi,
this patch removes streaming of DECL_DEFER_OUTPUT, BINFO_INHERITANCE_CHAIN,
BINFO_SUBVTT_INDEX, BINFO_VPTR_INDEX, DECL_IN_TEXT_SECTION,
and DECL_ERROR_ISSUED.

I originally intended to remove DECL_DEFER_OUTPUT completely (it is pointless
ever since we became unit-at-a-time), but that is a bit harder because the
java frontend relies on it in a strange (and accidental) way.  I think
DECL_DEFER_OUTPUT may be random between units, preventing merging, but I did
not verify this assumption.

Another flag I would like to remove, since it is not really part of the IL
and represents just bookkeeping, is DECL_SEEN_IN_BIND_EXPR_P, but that one
is used by gimplifier sanity checks.

While working on it, I noticed that DECL_ERROR_ISSUED is unused in the
compiler.  Moreover, BINFO_INHERITANCE_CHAIN, BINFO_SUBVTT_INDEX and
BINFO_VPTR_INDEX are unused by the backend (and always NULL, cleared by
free-lang-data).  DECL_IN_TEXT_SECTION is removed on the basis that it is
set only at the time we output a variable into the assembly file.

I will send a separate patch to remove DECL_DEFER_OUTPUT.

Bootstrapped/regtested ppc64-linux, OK?

* lto.c (compare_tree_sccs_1): Drop DECL_ERROR_ISSUED,
DECL_DEFER_OUTPUT and DECL_IN_TEXT_SECTION.
(unify_scc): Do checking assert.

* lto-streamer-out.c (DFS_write_tree_body): Drop
BINFO_INHERITANCE_CHAIN, BINFO_SUBVTT_INDEX and BINFO_VPTR_INDEX.
(hash_tree): Do not hash DECL_DEFER_OUTPUT, BINFO_INHERITANCE_CHAIN,
BINFO_SUBVTT_INDEX, BINFO_VPTR_INDEX, DECL_IN_TEXT_SECTION.
* tree-streamer-in.c (unpack_ts_decl_common_value_fields):
Do not read DECL_ERROR_ISSUED.
(unpack_ts_decl_with_vis_value_fields): Do not read
DECL_DEFER_OUTPUT.
(lto_input_ts_binfo_tree_pointers): Do not read BINFO_INHERITANCE_CHAIN,
BINFO_SUBVTT_INDEX, BINFO_VPTR_INDEX.
* tree-streamer-out.c (pack_ts_decl_common_value_fields): Do not
write DECL_ERROR_ISSUED.
(pack_ts_decl_with_vis_value_fields): Do not write
DECL_DEFER_OUTPUT.
(write_ts_binfo_tree_pointers): Do not write BINFO_INHERITANCE_CHAIN,
BINFO_SUBVTT_INDEX, BINFO_VPTR_INDEX.
* print-tree.c (print_node): Do not print DECL_ERROR_ISSUED.
* tree.h (tree_decl_common): Update comment.
(DECL_ERROR_ISSUED): Remove.
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 202000)
+++ lto-streamer-out.c  (working copy)
@@ -642,9 +642,8 @@ DFS_write_tree_body (struct output_block
   FOR_EACH_VEC_SAFE_ELT (BINFO_BASE_ACCESSES (expr), i, t)
DFS_follow_tree_edge (t);
 
-  DFS_follow_tree_edge (BINFO_INHERITANCE_CHAIN (expr));
-  DFS_follow_tree_edge (BINFO_SUBVTT_INDEX (expr));
-  DFS_follow_tree_edge (BINFO_VPTR_INDEX (expr));
+  /* Do not walk BINFO_INHERITANCE_CHAIN, BINFO_SUBVTT_INDEX
+and BINFO_VPTR_INDEX; these are used by C++ FE only.  */
 }
 
   if (CODE_CONTAINS_STRUCT (code, TS_CONSTRUCTOR))
@@ -748,7 +747,6 @@ hash_tree (struct streamer_tree_cache_d
   v = iterative_hash_host_wide_int (DECL_ALIGN (t), v);
   if (code == LABEL_DECL)
{
- v = iterative_hash_host_wide_int (DECL_ERROR_ISSUED (t), v);
  v = iterative_hash_host_wide_int (EH_LANDING_PAD_NR (t), v);
  v = iterative_hash_host_wide_int (LABEL_DECL_UID (t), v);
}
@@ -781,20 +779,19 @@ hash_tree (struct streamer_tree_cache_d
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_WITH_VIS))
 {
-  v = iterative_hash_host_wide_int (DECL_DEFER_OUTPUT (t)
-   | (DECL_COMMON (t) << 1)
-   | (DECL_DLLIMPORT_P (t) << 2)
-   | (DECL_WEAK (t) << 3)
-   | (DECL_SEEN_IN_BIND_EXPR_P (t) << 4)
-   | (DECL_COMDAT (t) << 5)
+  v = iterative_hash_host_wide_int ((DECL_COMMON (t))
+   | (DECL_DLLIMPORT_P (t) << 1)
+   | (DECL_WEAK (t) << 2)
+   | (DECL_SEEN_IN_BIND_EXPR_P (t) << 3)
+   | (DECL_COMDAT (t) << 4)
| (DECL_VISIBILITY_SPECIFIED (t) << 6),
v);
   v = iterative_hash_host_wide_int (DECL_VISIBILITY (t), v);
   if (code == VAR_DECL)
{
+ /* DECL_IN_TEXT_SECTION is set during final asm output only.  */
  v = iterative_hash_host_wide_int (DECL_HARD_REGISTER (t)
-   | (DECL_IN_TEXT_SECTION (t) << 1)
-   | (DECL_IN_CONSTANT_POOL (t) << 2),
+   | (DECL_IN_CONSTANT_POOL (t) << 1),
v);
  v = iterative_hash_host_wide_int (DECL_TLS_MODEL (t), v);

Re: [PATCH, i386] PR 57927: -march=core-avx2 different than -march=native on INTEL Haswell (i7-4700K)

2013-08-28 Thread Uros Bizjak
Hello!

>> >>> As reported in [1] the host processor detection has not yet been updated
>> >>> to recognize Intel Ivy Bridge and Haswell processors.
>> >>> This small patch adds the detection of these processors and assumes
>> >>> core-avx2 as march for unknown processors of the PENTIUMPRO family that
>> >>> support AVX2.
>> >>
>> >> I have committed slightly improved (attached) patch that uses
>> >> core-avx-i for IvyBridge and adds another IvyBridge model number.
>> >> While there, I also reordered a bunch  of statements.
>> >>
>> > Page C-3 in the Intel optimization guide shows:
>> >
>> > 06_3CH, 06_45H, 06_46H Intel microarchitecture Haswell
>> > 06_3AH, 06_3EH Intel microarchitecture Ivy Bridge
>> > 06_2AH, 06_2DH Intel microarchitecture Sandy Bridge
>> > 06_25H, 06_2CH, 06_2FH Intel microarchitecture Westmere
>> > 06_1AH, 06_1EH, 06_1FH, Intel microarchitecture Nehalem
>> > 06_2EH
>> > 06_17H, 06_1DH Enhanced Intel Core microarchitecture
>> > 06_0FH Intel Core microarchitecture
>> >
>> > At least, we should add 0x45 and 0x46 to Haswell.
>>
>> OK, the patch is pre-approved.
>
> Can these changes be backported to 4.8 and the Ivy Bridge parts to 4.7 as
> well? We've had a couple reports of bad -march=native results for Haswell on
> 4.7.  I can file PRs if you're interested.

I have backported all relevant patches to release branches (4.7, 4.8).

Uros.
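
For reference, a standalone sketch (not GCC's driver code) of how the
family/model pair in the table above is obtained, using GCC's cpuid.h:

#include <cpuid.h>
#include <stdio.h>

int
main (void)
{
  unsigned eax, ebx, ecx, edx;
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 1;
  unsigned family = (eax >> 8) & 0xf;
  unsigned model = (eax >> 4) & 0xf;
  if (family == 6)
    /* The extended model bits apply to family 6 (and family 15).  */
    model |= ((eax >> 16) & 0xf) << 4;
  printf ("%02X_%02XH\n", family, model);  /* compare with the table above */
  return 0;
}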


Re: [PING PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK

2013-08-28 Thread Richard Biener
On Wed, 28 Aug 2013, Richard Biener wrote:

> On Tue, 27 Aug 2013, Jakub Jelinek wrote:
> 
> > On Tue, Aug 27, 2013 at 04:03:42PM +0200, Martin Jambor wrote:
> > > On Fri, Aug 23, 2013 at 05:29:23PM +0200, Jakub Jelinek wrote:
> > > > On Fri, Aug 23, 2013 at 05:11:22PM +0200, Martin Jambor wrote:
> > > > > Hi Jakub and/or Joseph,
> > > > > 
> > > > > the reporter of this bug seems to be very anxious to have it fixed in
> > > > > the repository.  While Richi is away, do you think you could have a
> > > > > look?  It is very small.
> > > > 
> > > > Isn't this an ABI incompatible change (at least potentially on various 
> > > > targets)?
> > > > If so, then it shouldn't be applied to release branches, because it 
> > > > would
> > > > create (potential?) ABI incompatibility between 4.8.[01] and 4.8.2.
> > > > 
> > > 
> > > I don't know.  I made a few attempts to observe a change in the calling
> > > convention of a function accepting a zero-sized-array-terminated
> > > structure by value (on x86_64), but I did not succeed.  On the other
> > > hand, I assume there are many other ways in which a mode can influence
> > > the ABI.  So I'll try to have a look at whether we can hack around this
> > > situation in 4.8's expr.c instead.
> > 
> > All I remember is that e.g. the number of regressions from PR20020 was big,
> > and any kind of TYPE_MODE changes are just extremely risky.
> > Perhaps x86_64 in the ABI decisions never uses mode, but we have tons of
> > other targets and it would surprise me if none of those were affected.
> > 
> > > Nevertheless, is this patch ok for trunk?
> > 
> > I'll defer that to Richard now that he is back ;)
> 
> Eh ... :/
> 
> I'm extremely nervous about this change.  I also believe the change
> is unrelated to the issue in the bugreport (even if it happens to
> fix the ICE).
> 
> Let me have a (short) look.

Everything looks ok up until

  if (offset != 0)
{
  enum machine_mode address_mode;

where we are supposed to factor the offset into the memory address.
This code doesn't handle the movmisalign path which has the memory
in 'mem' instead of in to_rtx.

And indeed a hack like

Index: gcc/expr.c
===
--- gcc/expr.c  (revision 202043)
+++ gcc/expr.c  (working copy)
@@ -4753,6 +4753,9 @@ expand_assignment (tree to, tree from, b
{
  enum machine_mode address_mode;
  rtx offset_rtx;
+ rtx saved_to_rtx = to_rtx;
+ if (misalignp)
+   to_rtx = mem;
 
  if (!MEM_P (to_rtx))
{
@@ -4785,6 +4788,11 @@ expand_assignment (tree to, tree from, b
  to_rtx = offset_address (to_rtx, offset_rtx,
				   highest_pow2_factor_for_target (to, offset));
+ if (misalignp)
+   {
+ mem = to_rtx;
+ to_rtx = saved_to_rtx;
+   }
}
 
   /* No action is needed if the target is not a memory and the field

fixes the testcase and results in (at -O)

foo:
.LFB1:
.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movq	%rdi, %rbx
	call	get_i
	cltq
	addq	$1, %rax
	salq	$4, %rax
	movdqa	.LC0(%rip), %xmm0
	movdqu	%xmm0, (%rbx,%rax)
	popq	%rbx
.cfi_def_cfa_offset 8
ret

exactly what is expected.
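
For illustration, a reduced example in the spirit of PR 57748
(hypothetical, not the actual testcase): a 16-byte vector stored
through a variably indexed field of an under-aligned struct, so that
expansion takes both the movmisalign path and the variable-offset path:

typedef long long V __attribute__ ((vector_size (16)));

struct S { char c; V v[0]; } __attribute__ ((packed));

extern int get_i (void);

void
foo (struct S *s)
{
  V x = { 3, 4 };
  s->v[get_i () + 1] = x;
}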

Martin, do you want to audit the (few) places we do the movmisalign
game for the same problem or shall I put it on my TODO?  The audit
should look for all MEM_P (to_rtx) tests which really should look
at 'mem' for the unaligned move case (I see a volatile bitfield
case for example ...).

Still need to figure out a "nice" way to restructure the code ...

Any ideas?

Thanks,
Richard.


Re: [PING PATCH, PR 57748] Set mode of structures with zero sized arrays to be BLK

2013-08-28 Thread Richard Biener
On Tue, 27 Aug 2013, Jakub Jelinek wrote:

> On Tue, Aug 27, 2013 at 04:03:42PM +0200, Martin Jambor wrote:
> > On Fri, Aug 23, 2013 at 05:29:23PM +0200, Jakub Jelinek wrote:
> > > On Fri, Aug 23, 2013 at 05:11:22PM +0200, Martin Jambor wrote:
> > > > Hi Jakub and/or Joseph,
> > > > 
> > > > the reporter of this bug seems to be very anxious to have it fixed in
> > > > the repository.  While Richi is away, do you think you could have a
> > > > look?  It is very small.
> > > 
> > > Isn't this an ABI incompatible change (at least potentially on various 
> > > targets)?
> > > If so, then it shouldn't be applied to release branches, because it would
> > > create (potential?) ABI incompatibility between 4.8.[01] and 4.8.2.
> > > 
> > 
> > I don't know.  I made a few attempts to observe a change in the calling
> > convention of a function accepting a zero-sized-array-terminated
> > structure by value (on x86_64), but I did not succeed.  On the other
> > hand, I assume there are many other ways in which a mode can influence
> > the ABI.  So I'll try to have a look at whether we can hack around this
> > situation in 4.8's expr.c instead.
> 
> All I remember is that e.g. the number of regressions from PR20020 was big,
> and any kind of TYPE_MODE changes are just extremely risky.
> Perhaps x86_64 in the ABI decisions never uses mode, but we have tons of
> other targets and it would surprise me if none of those were affected.
> 
> > Nevertheless, is this patch ok for trunk?
> 
> I'll defer that to Richard now that he is back ;)

Eh ... :/

I'm extremely nervous about this change.  I also believe the change
is unrelated to the issue in the bugreport (even if it happens to
fix the ICE).

Let me have a (short) look.

Richard.


Re: [PATCH][2/n] 2nd try: Re-organize -fvect-cost-model, enable basic vectorization at -O2

2013-08-28 Thread Richard Biener
On Tue, 27 Aug 2013, Xinliang David Li wrote:

> Richard, I have some comments about the patch.
> 
> >   -ftree-vectorizer-verbose=   This switch is deprecated.  Use -fopt-info instead.
> >
> >   ftree-slp-vectorize
> > ! Common Report Var(flag_tree_slp_vectorize) Optimization
> >   Enable basic block vectorization (SLP) on trees
> 
> The code dealing with the interactions between -ftree-vectorize, -O3,
> etc. is complicated and hard to understand.  Is it better to change the
> meaning of -ftree-vectorize to mean -floop-vectorize only, and make it
> independent of -fslp-vectorize?

Yeah, but that would be an independent change.  Also people expect
to be able to enable all of the vectorizer with -ftree-vectorize.
So rather we introduce -floop-vectorize?

> > + fvect-cost-model=
> > + Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT)
> > + Specifies the cost model for vectorization
> > +
> > + Enum
> > + Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
> > +
> > + EnumValue
> > + Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
> > +
> > + EnumValue
> > + Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
> > +
> > + EnumValue
> > + Enum(vect_cost_model) String(cheap) Value(VECT_COST_MODEL_CHEAP)
> 
> Introducing the cheap model is a great change.
> 
> > +
> 
> > *** 173,179 
> >   {
> > struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >
> > !   if ((unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
> >   return false;
> >
> > if (dump_enabled_p ())
> > --- 173,180 
> >   {
> > struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >
> > !   if (loop_vinfo->cost_model == VECT_COST_MODEL_CHEAP
> > !   || (unsigned) PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS) == 0)
> >   return false;
> >
> 
> When the cost_model == cheap, the alignment peeling should also be
> disabled -- there will still be loops that are beneficial to vectorize
> without peeling -- at perhaps reduced net runtime gain.

IIRC there are targets that cannot vectorize unaligned accesses, so
in the end the cost model needs to be more target-controlled.

The above was just a start for experimenting, of course.

> >   struct gimple_opt_pass pass_slp_vectorize =
> > --- 206,220 
> >   static bool
> >   gate_vect_slp (void)
> >   {
> > !   /* Apply SLP either according to whether the user specified whether to
> > !  run SLP or not, or according to whether the user specified whether
> > !  to do vectorization or not.  */
> > !   if (global_options_set.x_flag_tree_slp_vectorize)
> > ! return flag_tree_slp_vectorize != 0;
> > !   if (global_options_set.x_flag_tree_vectorize)
> > ! return flag_tree_vectorize != 0;
> > !   /* And if vectorization was enabled by default run SLP only at -O3.  */
> > !   return flag_tree_vectorize != 0 && optimize == 3;
> >   }
> 
> The logic can be greatly simplified if slp vectorizer is controlled
> independently -- easier for user to understand too.

It should work with separating out -floop-vectorize, too I guess.  But
yes, as I wanted to preserve behavior of adding -ftree-vectorize to
-O2 the above necessarily became quite complicated ;)

> > ! @item -fvect-cost-model=@var{model}
> >   @opindex fvect-cost-model
> > ! Alter the cost model used for vectorization.  The @var{model} argument
> > ! should be one of @code{unlimited}, @code{dynamic} or @code{cheap}.
> > ! With the @code{unlimited} model the vectorized code-path is assumed
> > ! to be profitable while with the @code{dynamic} model a runtime check
> > ! will guard the vectorized code-path to enable it only for iteration
> > ! counts that will likely execute faster than when executing the original
> > ! scalar loop.  The @code{cheap} model will disable vectorization of
> > ! loops where doing so would be cost prohibitive for example due to
> > ! required runtime checks for data dependence or alignment but otherwise
> > ! is equal to the @code{dynamic} model.
> > ! The default cost model depends on other optimization flags and is
> > ! either @code{dynamic} or @code{cheap}.
> >
> 
> The vectorizer in theory will only vectorize a loop with a net runtime gain,
> so the 'cost' here should only mean code size and compile-time cost.

Not exactly - for 'unlimited' we may enter the vectorized path
even if the overhead of the guards, prologue and epilogue exceeds
the benefit of the (eventually never entered) vectorized loop.
That is, the 'dynamic' model does

  if (n > profitable-iters)
{
  if (alias checks, align checks)
{
  prologue loop
  vectorized loop
  epilogue loop
}
  else goto scalar loop
}
  else
scalar loop

because clearly the more complicated flow is not always profitable
to enter.
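
As a concrete toy (assuming the patch above is applied), compare what
is generated for

void
scale (float *a, float *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = 2.0f * b[i];
}

with -O2 -ftree-vectorize -fvect-cost-model=dynamic versus
-fvect-cost-model=cheap: with 'cheap' the runtime alias-check
versioning between a and b is not emitted (per the hunk quoted above),
so a loop like this one simply stays scalar.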

> Cheap Model: with this model, the compiler will vectorize loops that
> a

Re: wide-int branch updated

2013-08-28 Thread Richard Biener
On Tue, 27 Aug 2013, Richard Sandiford wrote:

> Kenneth Zadeck  writes:
> > fixed fits_uhwi_p.
> >
> > tested on x86-64.
> >
> > kenny
> >
> > Index: gcc/wide-int.h
> > ===
> > --- gcc/wide-int.h  (revision 201985)
> > +++ gcc/wide-int.h  (working copy)
> > @@ -1650,7 +1650,7 @@ wide_int_ro::fits_shwi_p () const
> >  inline bool
> >  wide_int_ro::fits_uhwi_p () const
> >  {
> > -  return len == 1 || (len == 2 && val[1] == 0);
> > +  return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0);
> >  }
> 
> With upper bits being undefined, it doesn't seem safe to check

Err - we're back at upper bits being undefined?  Ugh.  Then
having both fits_uhwi_p and fits_shwi_p doesn't make sense.

> val[0] or val[1] like this.   I was thinking along the lines of:
> 
> inline bool
> wide_int_ro::fits_uhwi_p () const
> {
>   if (precision <= HOST_BITS_PER_WIDE_INT)
> return true;
>   if (len == 1)
> return val[0] >= 0;
>   if (precision < HOST_BITS_PER_WIDE_INT * 2)
> return ((unsigned HOST_WIDE_INT) val[1]
>   << (HOST_BITS_PER_WIDE_INT * 2 - precision)) == 0;
>   return val[1] == 0;
> }
> 
> Since we don't have a sign, everything HWI-sized or smaller fits in a
> uhwi without loss of precision.

Which then means fits_hwi_p () is simply

  if (precision <= HOST_BITS_PER_WIDE_INT)
return true;
  zext ();  // ick, modification in a predicate!  stupid undefined bits!
  return len == 1 || (len == 2 && val[1] == 0);

but if upper bits are undefined then the whole encoding and len
business is broken.

Richard.


Re: wide-int branch updated.

2013-08-28 Thread Richard Biener
On Tue, 27 Aug 2013, Kenneth Zadeck wrote:

> removed all knowledge of SHIFT_COUNT_TRUNCATED from wide-int
> 
> both Richard Biener and Richard Sandiford had commented negatively about 
> this.
>
> fixed bug with wide-int::fits_uhwi_p.

 inline bool
 wide_int_ro::fits_uhwi_p () const
 {
-  return (len == 1 && val[0] >= 0) || (len == 2 && val[1] == 0);
+  return (precision <= HOST_BITS_PER_WIDE_INT)
+    || (len == 1 && val[0] >= 0)
+    || (len == 2 && (precision >= 2 * HOST_BITS_PER_WIDE_INT) && (val[1] == 0))
+    || (len == 2 && (sext_hwi (val[1], precision & (HOST_BITS_PER_WIDE_INT - 1)) == 0));
 }

it now gets scary ;)  Still wrong for precision == 0?

;)

I wonder what its semantics are ... in double_int we simply require
high == 0 (thus, negative numbers are not allowed).  With
precision <= HOST_BITS_PER_WIDE_INT you allow negative numbers.

Matching what double-int fits_uhwi does would be

(len == 1 && ((signed HOST_WIDE_INT)val[0]) >= 0)
|| (len == 2 && val[1] == 0)

(I don't remember off-hand the signedness of val[], but possibly
you missed the conversion to signed)

Now, what double-int does is supposed to match
host_integerp (..., 1) which I think it does.

Richard.


[PATCH] Fix gcc.dg/vect/pr56933.c runtime

2013-08-28 Thread Richard Biener

Using check_vect ().

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2013-08-28  Richard Biener  

PR tree-optimization/56933
* gcc.dg/vect/pr56933.c: Properly guard runtime with check_vect ().

Index: gcc/testsuite/gcc.dg/vect/pr56933.c
===
--- gcc/testsuite/gcc.dg/vect/pr56933.c (revision 202019)
+++ gcc/testsuite/gcc.dg/vect/pr56933.c (working copy)
@@ -1,4 +1,7 @@
 /* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+
+#include "tree-vect.h"
 
 extern void abort (void);
 void __attribute__((noinline,noclone))
@@ -17,6 +20,9 @@ int main()
 {
   double b[1024], d[2*1024], f[1024];
   int i;
+
+  check_vect ();
+
   for (i = 0; i < 2*1024; i++)
 d[i] = 1.;
   foo (b, d, f);


Re: Annotate free as leaf

2013-08-28 Thread Jakub Jelinek
On Wed, Aug 28, 2013 at 09:21:28AM +0200, Jan Hubicka wrote:
> while adding LEAF attributes, I apparently missed free.   Malloc is already 
> annotated.
> Fixed thus.  Committed as obvious.

Given that glibc headers mark free as leaf and nothing has been reported
against it, it is probably fine; but you've ignored the comment that
told you to adjust also BUILT_IN_TM_FREE.
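
For reference, glibc's declaration amounts to something like this
(simplified; the exact macros differ):

extern void free (void *__ptr) __attribute__ ((__nothrow__, __leaf__));

Here 'leaf' promises that the call returns to the current compilation
unit only by a normal return (no callbacks into the unit, no longjmp),
which is what lets the optimizers keep more state live across the call.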

> --- ChangeLog (revision 202003)
> +++ ChangeLog (working copy)
> @@ -1,5 +1,9 @@
>  2013-08-26  Jan Hubicka  
>  
> +  * builtins.def (free): Declare leaf.
> +
> +2013-08-26  Jan Hubicka  
> +
>   * cgraph.c (cgraph_propagate_frequency): Do not assume that virtual
>   methods can not be called indirectly when their address is not taken.
>  
> Index: builtins.def
> ===
> --- builtins.def	(revision 202000)
> +++ builtins.def	(working copy)
> @@ -694,7 +694,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSLL,
>  DEF_EXT_LIB_BUILTIN(BUILT_IN_FORK, "fork", BT_FN_PID, ATTR_NOTHROW_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_FRAME_ADDRESS, "frame_address", BT_FN_PTR_UINT, ATTR_NULL)
>  /* [trans-mem]: Adjust BUILT_IN_TM_FREE if BUILT_IN_FREE is changed.  */
> -DEF_LIB_BUILTIN(BUILT_IN_FREE, "free", BT_FN_VOID_PTR, ATTR_NOTHROW_LIST)
> +DEF_LIB_BUILTIN(BUILT_IN_FREE, "free", BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
>  DEF_GCC_BUILTIN(BUILT_IN_FROB_RETURN_ADDR, "frob_return_addr", BT_FN_PTR_PTR, ATTR_NULL)
>  DEF_EXT_LIB_BUILTIN(BUILT_IN_GETTEXT, "gettext", BT_FN_STRING_CONST_STRING, ATTR_FORMAT_ARG_1)
>  DEF_C99_BUILTIN(BUILT_IN_IMAXABS, "imaxabs", BT_FN_INTMAX_INTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)

Jakub


Annotate free as leaf

2013-08-28 Thread Jan Hubicka
Hi,
while adding LEAF attributes, I apparently missed free.   Malloc is already 
annotated.
Fixed thus.  Committed as obvious.

Honza
Index: ChangeLog
===
--- ChangeLog   (revision 202003)
+++ ChangeLog   (working copy)
@@ -1,5 +1,9 @@
 2013-08-26  Jan Hubicka  
 
+* builtins.def (free): Declare leaf.
+
+2013-08-26  Jan Hubicka  
+
* cgraph.c (cgraph_propagate_frequency): Do not assume that virtual
methods can not be called indirectly when their address is not taken.
 
Index: builtins.def
===
--- builtins.def	(revision 202000)
+++ builtins.def	(working copy)
@@ -694,7 +694,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSLL,
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FORK, "fork", BT_FN_PID, ATTR_NOTHROW_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_FRAME_ADDRESS, "frame_address", BT_FN_PTR_UINT, ATTR_NULL)
 /* [trans-mem]: Adjust BUILT_IN_TM_FREE if BUILT_IN_FREE is changed.  */
-DEF_LIB_BUILTIN(BUILT_IN_FREE, "free", BT_FN_VOID_PTR, ATTR_NOTHROW_LIST)
+DEF_LIB_BUILTIN(BUILT_IN_FREE, "free", BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_FROB_RETURN_ADDR, "frob_return_addr", BT_FN_PTR_PTR, ATTR_NULL)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_GETTEXT, "gettext", BT_FN_STRING_CONST_STRING, ATTR_FORMAT_ARG_1)
 DEF_C99_BUILTIN(BUILT_IN_IMAXABS, "imaxabs", BT_FN_INTMAX_INTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)

