Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-14 Thread Dmitry Vyukov
On Thu, Dec 13, 2012 at 1:02 PM, Jakub Jelinek  wrote:
>> > Various TM tests ICE when built with -fgnu-tm -fsanitizer=address.
>> > The problem is that asan.c pass adds calls to builtins that weren't there
>> > before and TM is upset about it.  The __asan_report* are all like
>> > abort, in correctly written program they shouldn't have a user visible
>> > effect, in bad program they will terminate the process, but in any case
>> > it doesn't matter how many times they are retried as part of a transaction,
>> > there is no state to roll back on transaction cancellation.
>> > __asan_handle_no_return, while not being noreturn, just marks the stack as
>> > unprotected, so again in correctly written application no effect, in bad 
>> > app
>> > might result in some issues being undetected, but still, it can be done 
>> > many
>> > times and isn't irreversible.
>>
>> I was only loosely following tm-languages discussions. Does gcc tm
>> model guarantees strong consistency for all memory accesses? I mean
>> there are tm implementations that allow transient inconsistencies,
>
> Will leave this to Torvald.
>
>> than are detected later and trx is restarted. Can't asan trigger false
>> positives in this case?
>
> I can't imagine any.
>
>> Also, what is the order of instrumentation in tm+asan setting? I mean
>> that neither tm must instrument asan instrumentation, nor asan must
>> instrument tm instrumentation. Is it the case? There also can be
>> conflicts related to ordering of instrumentation in the following
>> case:
>> asan_check();
>> speculative_load();
>> tm_check();
>
> I'm not aware of TM having speculative loads, libgtm certainly doesn't
> install a SIGSEGV handler (and testing whether some memory is
> readable/writable without actually dereferencing would be terribly slow).
> If a memory load or store segfaults, whether in a transaction or outside of
> it, it is a program bug and it is right if asan terminates the program.


OK, thanks.
AFAIR tl2 installs SIGSEGV and uses non-failing loads.


Re: [PATCH,x86] Fix combine for condditional instructions.

2012-12-14 Thread Yuri Rumyantsev
Hi Uros,

With your new fix that add if-then-else splitting for memory operand I
got expected performance speed-up - +6.7% for Atom and +8.4% for SNB.
We need to do all testing this weekend and I will get you our final
feedback on Monday.

Thanks ahead for all your help.
Yuri.

2012/12/13 Uros Bizjak :
> On Thu, Dec 13, 2012 at 4:02 PM, Yuri Rumyantsev  wrote:
>
>> We did not see any performance improvement on Atom in 32-bit mode at
>> routelookup from eembc_2_0 (eembc_1_1).
>
> I assume that for x86_64 the patch works as expected. Let's take a
> bigger hammer for 32bit targets - the splitter that effectively does
> the same as your proposed patch. Please note, that this splitter
> operates with optimize_function_for_speed_p predicate, so this
> transformation will take place in the whole function. Can you perhaps
> investigate what happens if this predicate is changed back to
> optimize_insn_for_speed_p () - this is what we would like to have
> here?
>
> ;; Don't do conditional moves with memory inputs.  This splitter helps
> ;; register starved x86_32 by forcing inputs into registers before reload.
> (define_split
>   [(set (match_operand:SWI248 0 "register_operand")
> (if_then_else:SWI248 (match_operator 1 "ix86_comparison_operator"
>[(reg FLAGS_REG) (const_int 0)])
>   (match_operand:SWI248 2 "nonimmediate_operand")
>   (match_operand:SWI248 3 "nonimmediate_operand")))]
>   "!TARGET_64BIT && TARGET_CMOVE
>&& (MEM_P (operands[2]) || MEM_P (operands[3]))
>&& can_create_pseudo_p ()
>&& optimize_function_for_speed_p (cfun)"
>   [(set (match_dup 0)
> (if_then_else:SWI248 (match_dup 1) (match_dup 2) (match_dup 3)))]
> {
>   if (MEM_P (operands[2]))
> operands[2] = force_reg (mode, operands[2]);
>   if (MEM_P (operands[3]))
> operands[3] = force_reg (mode, operands[3]);
> })
>
> Attached is the complete patch, including peephole2s.
>
> Uros.


RE: [PATCH, AArch64] Make zero_extends explicit for common SImode patterns

2012-12-14 Thread Ian Bolton
Hi Richard,

> +  "add\\t%w0, %w2, %w, xt"
>
>  ^^^ %w1

Got spot. I guess that pattern hasn't fired yet then!  I'll fix it.


> > This patch significantly reduces the number of redundant
> > uxtw instructions seen in a variety of programs.
> >
> > (There are further patterns that can be done, but I have them
> > in a separate patch that's still in development.)
> 
> What do you get if you enable flag_ree, as we do for x86_64?
> In theory this should avoid even more extensions...
> 
> 
> C.f. common/config/i386/i386-common.c:
> 
> static const struct default_options ix86_option_optimization_table[] =
>   {
> /* Enable redundant extension instructions removal at -O2 and
> higher.  */
> { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
> 

I should have said that I am indeed running with REE enabled.  It has
some impact (about 70 further UXTW removed from the set of binaries
I've been building) and seems to mostly be good across basic blocks
within the same function.  As far as I can tell, there is no downside
to REE, so I think it should be enabled by default for O2 or higher
on AArch64 too.

I'll prepare a new patch ...






[PATCH] Fix PR55684

2012-12-14 Thread Richard Biener

Honza left an ICE when we cannot compute number_of_iterations_exit
in remove_redundant_iv_tests, but that can for sure happen by design
(see bugzilla).

The following patch fixes it, bootstrap and regtest pending on 
x86_64-unknown-linux-gnu.

Richard.

2012-12-14  Richard Biener  

PR tree-optimization/55684
* tree-ssa-loop-ivcanon.c (remove_redundant_iv_tests): Handle
gracefully the case where we cannot compute the number of
iterations at an exit.

* gcc.dg/torture/pr55684.c: New testcase.

Index: gcc/tree-ssa-loop-ivcanon.c
===
*** gcc/tree-ssa-loop-ivcanon.c (revision 194496)
--- gcc/tree-ssa-loop-ivcanon.c (working copy)
*** remove_redundant_iv_tests (struct loop *
*** 555,563 
  /* Only when we know the actual number of iterations, not
 just a bound, we can remove the exit.  */
  if (!number_of_iterations_exit (loop, exit_edge,
! &niter, false, false))
!   gcc_unreachable ();
! if (!integer_onep (niter.assumptions)
  || !integer_zerop (niter.may_be_zero)
  || !niter.niter
  || TREE_CODE (niter.niter) != INTEGER_CST
--- 555,562 
  /* Only when we know the actual number of iterations, not
 just a bound, we can remove the exit.  */
  if (!number_of_iterations_exit (loop, exit_edge,
! &niter, false, false)
! || !integer_onep (niter.assumptions)
  || !integer_zerop (niter.may_be_zero)
  || !niter.niter
  || TREE_CODE (niter.niter) != INTEGER_CST
Index: gcc/testsuite/gcc.dg/torture/pr55684.c
===
*** gcc/testsuite/gcc.dg/torture/pr55684.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr55684.c  (working copy)
***
*** 0 
--- 1,33 
+ /* { dg-do compile } */
+ 
+ typedef struct _IO_FILE FILE;
+ unsigned long int strtoul(const char *, char **, int);
+ char *fgets(char *, int, FILE *);
+ struct ihexrec {
+ unsigned char reclen;
+ unsigned char data[256];
+ };
+ static void srec_readrec(struct ihexrec * srec, char * rec)
+ {
+   int i, j;
+   char buf[8];
+   int offset = 0, len;
+   char * e;
+   for (i=0; jreclen; j++)
+ {
+   if (offset+2 > len)
+ return;
+   for (i=0; i<2; i++)
+ buf[i] = rec[offset++];
+   srec->data[j] = strtoul(buf, &e, 16);
+ }
+   for (i=0; i<2; i++)
+ buf[i] = rec[offset++];
+ }
+ void srec2b(FILE *inf)
+ {
+   char buffer[256];
+   struct ihexrec srec;
+   while (fgets(buffer,256,inf)!=(void *)0)
+ srec_readrec(&srec, buffer);
+ }


RE: [PATCH, AArch64] Make zero_extends explicit for common SImode patterns

2012-12-14 Thread Ian Bolton
> Hi Richard,
> 
> > +  "add\\t%w0, %w2, %w, xt"
> >
> >  ^^^ %w1
> 
> Got spot. I guess that pattern hasn't fired yet then!  I'll fix it.

Now fixed in v3.

> I should have said that I am indeed running with REE enabled.  It has
> some impact (about 70 further UXTW removed from the set of binaries
> I've been building) and seems to mostly be good across basic blocks
> within the same function.  As far as I can tell, there is no downside
> to REE, so I think it should be enabled by default for O2 or higher
> on AArch64 too.
> 

I'm going to enable REE in a separate patch.

Is this one OK to commit here and backport to ARM/aarch64-4.7-branch?

Thanks,
Ian
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a9a8b5f..d5c0206 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1271,6 +1273,22 @@
(set_attr "mode" "SI")]
 )
 
+;; zero_extend version of above
+(define_insn "*addsi3_aarch64_uxtw"
+  [(set
+(match_operand:DI 0 "register_operand" "=rk,rk,rk")
+(zero_extend:DI (plus:SI
+ (match_operand:SI 1 "register_operand" "%rk,rk,rk")
+ (match_operand:SI 2 "aarch64_plus_operand" "I,r,J"]
+  ""
+  "@
+  add\\t%w0, %w1, %2
+  add\\t%w0, %w1, %w2
+  sub\\t%w0, %w1, #%n2"
+  [(set_attr "v8type" "alu")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*adddi3_aarch64"
   [(set
 (match_operand:DI 0 "register_operand" "=rk,rk,rk,!w")
@@ -1304,6 +1322,23 @@
(set_attr "mode" "")]
 )
 
+;; zero_extend version of above
+(define_insn "*addsi3_compare0_uxtw"
+  [(set (reg:CC_NZ CC_REGNUM)
+   (compare:CC_NZ
+(plus:SI (match_operand:SI 1 "register_operand" "%r,r")
+  (match_operand:SI 2 "aarch64_plus_operand" "rI,J"))
+(const_int 0)))
+   (set (match_operand:DI 0 "register_operand" "=r,r")
+   (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2]
+  ""
+  "@
+  adds\\t%w0, %w1, %w2
+  subs\\t%w0, %w1, #%n2"
+  [(set_attr "v8type" "alus")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*add3nr_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
(compare:CC_NZ
@@ -1340,6 +1375,19 @@
(set_attr "mode" "")]
 )
 
+;; zero_extend version of above
+(define_insn "*add__si_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=rk")
+   (zero_extend:DI (plus:SI
+   (ASHIFT:SI (match_operand:SI 1 "register_operand" "r")
+  (match_operand:QI 2 "aarch64_shift_imm_si" "n"))
+  (match_operand:SI 3 "register_operand" "r"]
+  ""
+  "add\\t%w0, %w3, %w1,  %2"
+  [(set_attr "v8type" "alu_shift")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*add_mul_imm_"
   [(set (match_operand:GPI 0 "register_operand" "=rk")
(plus:GPI (mult:GPI (match_operand:GPI 1 "register_operand" "r")
@@ -1361,6 +1409,17 @@
(set_attr "mode" "")]
 )
 
+;; zero_extend version of above
+(define_insn "*add__si_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=rk")
+   (zero_extend:DI (plus:SI (ANY_EXTEND:SI (match_operand:SHORT 1 
"register_operand" "r"))
+ (match_operand:GPI 2 "register_operand" "r"]
+  ""
+  "add\\t%w0, %w2, %w1, xt"
+  [(set_attr "v8type" "alu_ext")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*add__shft_"
   [(set (match_operand:GPI 0 "register_operand" "=rk")
(plus:GPI (ashift:GPI (ANY_EXTEND:GPI
@@ -1373,6 +1432,19 @@
(set_attr "mode" "")]
 )
 
+;; zero_extend version of above
+(define_insn "*add__shft_si_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=rk")
+   (zero_extend:DI (plus:SI (ashift:SI (ANY_EXTEND:SI
+  (match_operand:SHORT 1 "register_operand" "r"))
+ (match_operand 2 "aarch64_imm3" "Ui3"))
+ (match_operand:SI 3 "register_operand" "r"]
+  ""
+  "add\\t%w0, %w3, %w1, xt %2"
+  [(set_attr "v8type" "alu_ext")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*add__mult_"
   [(set (match_operand:GPI 0 "register_operand" "=rk")
(plus:GPI (mult:GPI (ANY_EXTEND:GPI
@@ -1385,6 +1457,19 @@
(set_attr "mode" "")]
 )
 
+;; zero_extend version of above
+(define_insn "*add__mult_si_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=rk")
+   (zero_extend:DI (plus:SI (mult:SI (ANY_EXTEND:SI
+(match_operand:SHORT 1 "register_operand" "r"))
+   (match_operand 2 "aarch64_pwr_imm3" "Up3"))
+ (match_operand:SI 3 "register_operand" "r"]
+  ""
+  "add\\t%w0, %w3, %w1, xt %p2"
+  [(set_attr "v8type" "alu_ext")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*add__multp2"
   [(set (match_operand:GPI 0 "register_operand" "=rk")
(plus:GPI (ANY_EXTRACT:GPI
@@ -1399,6 +1484,21 @@
(set_attr "mode" "")]
 )
 
+;; zero_extend version of above
+(define_insn "*add_si_multp2_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=rk")
+   (zero_extend:DI (plus:SI (ANY_EXTRACT:SI
+  (mult:SI (match_operand:SI 1 "registe

[PATCH, PR 55355, trunk, 4.6, 4.7] One more host_integerp check in SRA

2012-12-14 Thread Martin Jambor
Hi,

below are two variants for PR 55355, one for trunk and for 4.7, the
second one, without dumping, for 4.6.  On 4.6 they fix an ICE for a
too-large integer on i686, which perhaps can also happen on 4.7 and
trunk even though it does not for their particular testcase.

I've bootstrapped and tested this on x86_64-linux on trunk and the two
branches, I'm in the process of doing the same on i686-linux.  OK for
everywhere if it passes?

Thanks,

Martin


trunk and 4.7 variant:

2012-12-12  Martin Jambor  

PR tree-optimization/55355
* tree-sra.c (type_internals_preclude_sra_p): Also check that
bit_position is small enough to fit a single HOST_WIDE_INT.

* testsuite/g++.dg/torture/pr55355.C: New test.


diff --git a/gcc/testsuite/g++.dg/torture/pr55355.C 
b/gcc/testsuite/g++.dg/torture/pr55355.C
new file mode 100644
index 000..6d8f8b6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr55355.C
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+
+struct A
+{
+void funcA(void);
+};
+
+struct B {};
+
+struct C
+{
+void funcC(void) { a_mp->funcA(); }
+
+char buf_ma[268435456];
+A   *a_mp;
+Bb_m;
+};
+
+void
+func(C *c_p)
+{
+c_p->funcC();
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 21d8a51..286ef26 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -714,7 +714,12 @@ type_internals_preclude_sra_p (tree type, const char **msg)
  {
*msg = "structure field size not fixed";
return true;
- }   
+ }
+   if (!host_integerp (bit_position (fld), 0))
+ {
+   *msg = "structure field size too big";
+   return true;
+ }
if (AGGREGATE_TYPE_P (ft)
&& int_bit_position (fld) % BITS_PER_UNIT != 0)
  {


4.6 variant:


2012-12-12  Martin Jambor  

PR tree-optimization/55355
* tree-sra.c (type_internals_preclude_sra_p): Also check that
bit_position is small enough to fit a single HOST_WIDE_INT.

* testsuite/g++.dg/torture/pr55355.C: New test.

Index: gcc/testsuite/g++.dg/torture/pr55355.C
===
--- gcc/testsuite/g++.dg/torture/pr55355.C  (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr55355.C  (revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+
+struct A
+{
+void funcA(void);
+};
+
+struct B {};
+
+struct C
+{
+void funcC(void) { a_mp->funcA(); }
+
+char buf_ma[268435456];
+A   *a_mp;
+Bb_m;
+};
+
+void
+func(C *c_p)
+{
+c_p->funcC();
+}
Index: gcc/tree-sra.c
===
--- gcc/tree-sra.c  (revision 194450)
+++ gcc/tree-sra.c  (working copy)
@@ -666,6 +666,7 @@ type_internals_preclude_sra_p (tree type
|| !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld)
|| !host_integerp (DECL_FIELD_OFFSET (fld), 1)
|| !host_integerp (DECL_SIZE (fld), 1)
+   || !host_integerp (bit_position (fld), 0)
|| (AGGREGATE_TYPE_P (ft)
&& int_bit_position (fld) % BITS_PER_UNIT != 0))
  return true;


[PATCH] Fix PR55687

2012-12-14 Thread Richard Biener

This fixes PR55687 - there is a latent issue in SCEV analysis.
For (int) (short) {(unsigned short) i_2, +, 1 }_4 no_evolution_in_loop_p
returns false when asked whether that CHREC has an evolution in loop 4
(which is obviously wrong).  The reason is that the machinery is not
good at filtering NOPs.

The following patch conservatively fixes it (appropriate for stage3)
to use tree_contains_chrecs instead of just checking whether the
outermost tree is a chrec.

In fact hide_evolution_in_other_loops_than_loop isn't conservative
(but luckily unused apart from from no_evolution_in_loop_p).  Instead
of trying to implement that no_evolution_in_loop_p could be simplified
much, piggy-backing on tree_contains_chrecs (and thus also made
more precise).  I'll leave that for 4.9 though (PR 55689 to track that).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Thanks,
Richard.

2012-12-14  Richard Biener  

PR tree-optimization/55687
* tree-chrec.h (no_evolution_in_loop_p): Properly use
tree_contains_chrecs.

* gcc.dg/torture/pr55687.c: New testcase.

Index: gcc/tree-chrec.h
===
*** gcc/tree-chrec.h(revision 194496)
--- gcc/tree-chrec.h(working copy)
*** no_evolution_in_loop_p (tree chrec, unsi
*** 117,123 
  
STRIP_NOPS (chrec);
scev = hide_evolution_in_other_loops_than_loop (chrec, loop_num);
!   *res = !tree_is_chrec (scev);
return true;
  }
  
--- 117,123 
  
STRIP_NOPS (chrec);
scev = hide_evolution_in_other_loops_than_loop (chrec, loop_num);
!   *res = !tree_contains_chrecs (scev, NULL);
return true;
  }
  
Index: gcc/testsuite/gcc.dg/torture/pr55687.c
===
*** gcc/testsuite/gcc.dg/torture/pr55687.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr55687.c  (working copy)
***
*** 0 
--- 1,29 
+ /* { dg-do compile } */
+ 
+ typedef struct _IO_FILE FILE;
+ typedef short gshort;
+ typedef struct _GString GString;
+ 
+ extern char *fgets(char *, int, FILE *);
+ 
+ void verbose_text_loop (void *data)
+ {
+   FILE *dev_vcs;
+   char buf[81];
+   GString *buf_str;
+   gshort i, j;
+   while (1)  
+ {
+   for (i = 1; i <= 7; i++)
+   {
+ while (fgets (buf, 81, dev_vcs))
+   {
+ for (j = 0;   j < __builtin_strlen (buf);   j++)
+   if (buf[j] != ' ')
+ break;   
+ for (;   j < __builtin_strlen (buf);   j++)
+   g_string_append_c_inline (buf_str, buf[j]); 
+   }
+   }
+ }
+ }


Re: [PATCH, PR 55355, trunk, 4.6, 4.7] One more host_integerp check in SRA

2012-12-14 Thread Richard Biener
On Fri, Dec 14, 2012 at 1:04 PM, Martin Jambor  wrote:
> Hi,
>
> below are two variants for PR 55355, one for trunk and for 4.7, the
> second one, without dumping, for 4.6.  On 4.6 they fix an ICE for a
> too-large integer on i686, which perhaps can also happen on 4.7 and
> trunk even though it does not for their particular testcase.
>
> I've bootstrapped and tested this on x86_64-linux on trunk and the two
> branches, I'm in the process of doing the same on i686-linux.  OK for
> everywhere if it passes?

Ok.

Thanks,
Richard.

> Thanks,
>
> Martin
>
>
> trunk and 4.7 variant:
>
> 2012-12-12  Martin Jambor  
>
> PR tree-optimization/55355
> * tree-sra.c (type_internals_preclude_sra_p): Also check that
> bit_position is small enough to fit a single HOST_WIDE_INT.
>
> * testsuite/g++.dg/torture/pr55355.C: New test.
>
>
> diff --git a/gcc/testsuite/g++.dg/torture/pr55355.C 
> b/gcc/testsuite/g++.dg/torture/pr55355.C
> new file mode 100644
> index 000..6d8f8b6
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/torture/pr55355.C
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +
> +struct A
> +{
> +void funcA(void);
> +};
> +
> +struct B {};
> +
> +struct C
> +{
> +void funcC(void) { a_mp->funcA(); }
> +
> +char buf_ma[268435456];
> +A   *a_mp;
> +Bb_m;
> +};
> +
> +void
> +func(C *c_p)
> +{
> +c_p->funcC();
> +}
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 21d8a51..286ef26 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -714,7 +714,12 @@ type_internals_preclude_sra_p (tree type, const char 
> **msg)
>   {
> *msg = "structure field size not fixed";
> return true;
> - }
> + }
> +   if (!host_integerp (bit_position (fld), 0))
> + {
> +   *msg = "structure field size too big";
> +   return true;
> + }
> if (AGGREGATE_TYPE_P (ft)
> && int_bit_position (fld) % BITS_PER_UNIT != 0)
>   {
>
>
> 4.6 variant:
>
>
> 2012-12-12  Martin Jambor  
>
> PR tree-optimization/55355
> * tree-sra.c (type_internals_preclude_sra_p): Also check that
> bit_position is small enough to fit a single HOST_WIDE_INT.
>
> * testsuite/g++.dg/torture/pr55355.C: New test.
>
> Index: gcc/testsuite/g++.dg/torture/pr55355.C
> ===
> --- gcc/testsuite/g++.dg/torture/pr55355.C  (revision 0)
> +++ gcc/testsuite/g++.dg/torture/pr55355.C  (revision 0)
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +
> +struct A
> +{
> +void funcA(void);
> +};
> +
> +struct B {};
> +
> +struct C
> +{
> +void funcC(void) { a_mp->funcA(); }
> +
> +char buf_ma[268435456];
> +A   *a_mp;
> +Bb_m;
> +};
> +
> +void
> +func(C *c_p)
> +{
> +c_p->funcC();
> +}
> Index: gcc/tree-sra.c
> ===
> --- gcc/tree-sra.c  (revision 194450)
> +++ gcc/tree-sra.c  (working copy)
> @@ -666,6 +666,7 @@ type_internals_preclude_sra_p (tree type
> || !DECL_FIELD_OFFSET (fld) || !DECL_SIZE (fld)
> || !host_integerp (DECL_FIELD_OFFSET (fld), 1)
> || !host_integerp (DECL_SIZE (fld), 1)
> +   || !host_integerp (bit_position (fld), 0)
> || (AGGREGATE_TYPE_P (ft)
> && int_bit_position (fld) % BITS_PER_UNIT != 0))
>   return true;


Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-14 Thread Torvald Riegel
On Thu, 2012-12-13 at 10:02 +0100, Jakub Jelinek wrote:
> On Thu, Dec 13, 2012 at 10:38:13AM +0400, Dmitry Vyukov wrote:
> > On Wed, Dec 12, 2012 at 11:50 PM, Jakub Jelinek  wrote:
> > > Various TM tests ICE when built with -fgnu-tm -fsanitizer=address.
> > > The problem is that asan.c pass adds calls to builtins that weren't there
> > > before and TM is upset about it.  The __asan_report* are all like
> > > abort, in correctly written program they shouldn't have a user visible
> > > effect, in bad program they will terminate the process, but in any case
> > > it doesn't matter how many times they are retried as part of a 
> > > transaction,
> > > there is no state to roll back on transaction cancellation.
> > > __asan_handle_no_return, while not being noreturn, just marks the stack as
> > > unprotected, so again in correctly written application no effect, in bad 
> > > app
> > > might result in some issues being undetected, but still, it can be done 
> > > many
> > > times and isn't irreversible.
> > 
> > I was only loosely following tm-languages discussions. Does gcc tm
> > model guarantees strong consistency for all memory accesses? I mean
> > there are tm implementations that allow transient inconsistencies,
> 
> Will leave this to Torvald.

This has two parts: (1) how TM fits into the C++11/C11 memory model, and
(2) which guarantees the compiler and the TM runtime library agree on at
the level of the TM ABI that we use in GCC.

Regarding the first part, all transactions provide guarantees similar to
a global lock.  Specifically, there are virtual transaction start/end
events that take part in sequenced-before (similar to acquisition and
release of the global lock), whose are then guaranteed to execute
(without transactions interleaving with each other) in a global total
order (let's call this order TO).  TO then contributes to
happens-before.

On the ABI level, the TM runtime is allowed to execute speculatively as
long as (1) it does not expose any speculative execution to
nontransactional code and (2) speculation doesn't violate the language's
as-if rules (i.e., no visible side effects other than the abstract
machine; no signals due to seg faults etc.).  This means that, for
example, the TM will not access data at a wider granularity than what
nontransactional code that conforms to the C++11 memory model does would
do.
Second, transactions can have a tentative position in TO, but they will
only expose final TO choices to nontxnal code.  So, each transaction
will execute code that would be valid when executed in isolation -- but
it is possible that several transactions noncommitted transactions are
in flight concurrently that may conflict with each other.  The TM
ensures that such speculative execution is safe and not visible to a
race-free program.  So, when a transaction commits at a certain position
in TO, it will make sure that all other active transactions reach
consensus on TO (up to this position) before it returns to the execution
of nontxnal code.  Those active transactions will either see that they
would be still valid at a new TO position (after the committing txn), or
they abort and then signal that they agree to this TO.  That means that
the nontxnal code will not be affected by any speculative execution,
even if the prior txn privatized some data (this is called privatization
safety in TM jargon).
The sort-of counterpart to privatization safety is publication safety
(ie, transactions can safely make some data shared).  While
privatization safety is handled by the TM runtime, publication safety is
ensured by the compiler by not allowing any dangerous load/load
reordering, basicallly.  Publication safety is not yet ensured always,
but Aldy is working on this.

> > than are detected later and trx is restarted. Can't asan trigger false
> > positives in this case?
> 
> I can't imagine any.

I also don't see any, because all loads/stores of all transactions, even
those with a tentative TO position, would be a valid execution.  For
malloc/free in transactions, AFAIR, ASAN hooks into malloc/free
directly, so when libitm handles those including rollback of an
allocation, there shouldn't be false positives.  However, there might be
false negatives in case of free() because deallocations are deferred
until transaction commit (they can't be rolled back).  Could we tell
ASAN about free() directly using a call added by the TM instrumentation
pass?  That would at least help catch the false negatives for all
free()-like functions that the compiler knows about.

All of this is different for TSAN, of course, because there a memory
access does alter the state that TSAN keeps, so if we roll back the
accesses in the TM we would also have to roll back the TSAN state.

> > Also, what is the order of instrumentation in tm+asan setting? I mean
> > that neither tm must instrument asan instrumentation, nor asan must
> > instrument tm instrumentation. Is it the case? There also can be
> > conflicts related to order

Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-14 Thread Torvald Riegel
On Fri, 2012-12-14 at 13:44 +0400, Dmitry Vyukov wrote:
> On Thu, Dec 13, 2012 at 1:02 PM, Jakub Jelinek  wrote:
> >> > Various TM tests ICE when built with -fgnu-tm -fsanitizer=address.
> >> > The problem is that asan.c pass adds calls to builtins that weren't there
> >> > before and TM is upset about it.  The __asan_report* are all like
> >> > abort, in correctly written program they shouldn't have a user visible
> >> > effect, in bad program they will terminate the process, but in any case
> >> > it doesn't matter how many times they are retried as part of a 
> >> > transaction,
> >> > there is no state to roll back on transaction cancellation.
> >> > __asan_handle_no_return, while not being noreturn, just marks the stack 
> >> > as
> >> > unprotected, so again in correctly written application no effect, in bad 
> >> > app
> >> > might result in some issues being undetected, but still, it can be done 
> >> > many
> >> > times and isn't irreversible.
> >>
> >> I was only loosely following tm-languages discussions. Does gcc tm
> >> model guarantees strong consistency for all memory accesses? I mean
> >> there are tm implementations that allow transient inconsistencies,
> >
> > Will leave this to Torvald.
> >
> >> than are detected later and trx is restarted. Can't asan trigger false
> >> positives in this case?
> >
> > I can't imagine any.
> >
> >> Also, what is the order of instrumentation in tm+asan setting? I mean
> >> that neither tm must instrument asan instrumentation, nor asan must
> >> instrument tm instrumentation. Is it the case? There also can be
> >> conflicts related to ordering of instrumentation in the following
> >> case:
> >> asan_check();
> >> speculative_load();
> >> tm_check();
> >
> > I'm not aware of TM having speculative loads, libgtm certainly doesn't
> > install a SIGSEGV handler (and testing whether some memory is
> > readable/writable without actually dereferencing would be terribly slow).
> > If a memory load or store segfaults, whether in a transaction or outside of
> > it, it is a program bug and it is right if asan terminates the program.
> 
> 
> OK, thanks.
> AFAIR tl2 installs SIGSEGV and uses non-failing loads.

I think the original implementation didn't, but other TL2
implementations might if they want to avoid providing full privatization
safety without relying on any sandboxing attempts.  Sandboxing is hard
to get right (e.g., if the architecture doesn't provide nonfaulting
loads), and if you try it, you end up with a much more intrusive
solution (e.g., you need to ensure that the program doesn't override
your signal handlers).

Therefore, libitm ensures privatization safety by a quiescence-based
approach (i.e., a privatizing transaction waits until all other active
reading transactions are guaranteed to have observed the privatizer's
updates).

Torvald



Re: [PATCH] Make some asan builtins tm_pure (PR sanitizer/55508)

2012-12-14 Thread Torvald Riegel
On Thu, 2012-12-13 at 10:40 +0400, Dmitry Vyukov wrote:
> On Thu, Dec 13, 2012 at 1:58 AM, Richard Henderson  wrote:
> > On 12/12/2012 11:50 AM, Jakub Jelinek wrote:
> >> 2012-12-12  Jakub Jelinek  
> >>
> >>   PR sanitizer/55508
> >>   * builtin-attrs.def (ATTR_TMPURE_NOTHROW_LEAF_LIST,
> >>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): New.
> >>   * asan.c (ATTR_TMPURE_NOTHROW_LEAF_LIST,
> >>   ATTR_TMPURE_NORETURN_NOTHROW_LEAF_LIST): Define.
> >>   * sanitizer.def: Make __asan_report_* and __asan_handle_no_return
> >>   builtins tm pure.
> >
> > Ok.
> >
> > Agreed about we need another solution for tsan + tm.
> 
> 
> What type of bugs do you expect tsan catch in transactional setting?
> Are we talking about data races between transactional and
> non-transactional code?

Right.  And it would be good if TSAN can detect these cases.

I don't think we need TSAN to detect races in the TM runtime library
implementation itself, which would probably need a lot of special cases
because of (restricted, see my other post) speculative execution by the
TM.

> Does atomic transactions permitted by gcc? I mean can I use mutexes
> and atomics inside of transactions to synchronize with
> non-transactional code?

Such operations, called "unsafe code", are not allowed in atomic
transactions (currently--that choice is determined by several factors
such as likely runtime overheads and intrusiveness of TM
implementations).  But they are allowed in relaxed transactions.  Before
relaxed transactions execute such unsafe code, they will switch to
serial execution of transactions, which we could make visible to TSAN if
that would help.

Torvald



Re: [PATCH 4/7] s390: Add mode attribute for mode bitsize

2012-12-14 Thread Andreas Krebbel
On Thu, Aug 09, 2012 at 07:31:58PM -0700, Richard Henderson wrote:
> Constant fold, and less typing than, GET_MODE_BITSIZE with
> another mode substitution.

I did just a minor change in order to avoid a warning in the last hunk. 

Tested on s390 and s390x with different arch flags.

Please apply to mainline. Thanks!

Bye,

-Andreas-


 gcc/config/s390/s390.md |   72 
 1 file changed, 7 insertions(+), 65 modifications(!)

Index: gcc/config/s390/s390.md
===
*** gcc/config/s390/s390.md.orig
--- gcc/config/s390/s390.md
***
*** 531,536 
--- 531,539 
  (define_mode_attr bfstart [(DI "s") (SI "t")])
  (define_mode_attr bfend   [(DI "e") (SI "f")])
  
+ ;; In place of GET_MODE_BITSIZE (mode)
+ (define_mode_attr bitsize [(DI "64") (SI "32") (HI "16") (QI "8")])
+ 
  ;;
  ;;- Compare instructions.
  ;;
***
*** 3326,3332 
  
operands[1] = adjust_address (operands[1], BLKmode, 0);
set_mem_size (operands[1], size);
!   operands[2] = GEN_INT (GET_MODE_BITSIZE (mode) - bitsize);
operands[3] = GEN_INT (mask);
  })
  
--- 3329,3335 
  
operands[1] = adjust_address (operands[1], BLKmode, 0);
set_mem_size (operands[1], size);
!   operands[2] = GEN_INT ( - bitsize);
operands[3] = GEN_INT (mask);
  })
  
***
*** 3353,3359 
  
operands[1] = adjust_address (operands[1], BLKmode, 0);
set_mem_size (operands[1], size);
!   operands[2] = GEN_INT (GET_MODE_BITSIZE (mode) - bitsize);
operands[3] = GEN_INT (mask);
  })
  
--- 3356,3362 
  
operands[1] = adjust_address (operands[1], BLKmode, 0);
set_mem_size (operands[1], size);
!   operands[2] = GEN_INT ( - bitsize);
operands[3] = GEN_INT (mask);
  })
  
***
*** 3373,3421 
FAIL;
  })
  
  (define_insn "*insv_zEC12"
[(set (zero_extract:GPR (match_operand:GPR 0 "nonimmediate_operand" "+d")
! (match_operand 1 "const_int_operand""I")
! (match_operand 2 "const_int_operand""I"))
(match_operand:GPR 3 "nonimmediate_operand" "d"))]
"TARGET_ZEC12
!&& (INTVAL (operands[1]) + INTVAL (operands[2])) <=
!   GET_MODE_BITSIZE (mode)"
! {
!   int start = INTVAL (operands[2]);
!   int size = INTVAL (operands[1]);
!   int offset = 64 - GET_MODE_BITSIZE (mode);
! 
!   operands[2] = GEN_INT (offset + start);  /* start bit position 
*/
!   operands[1] = GEN_INT (offset + start + size - 1);   /* end bit position */
!   operands[4] = GEN_INT (GET_MODE_BITSIZE (mode) -
!start - size);   /* left shift count */
! 
!   return "risbgn\t%0,%3,%b2,%b1,%b4";
! }
[(set_attr "op_type" "RIE")])
  
  (define_insn "*insv_z10"
[(set (zero_extract:GPR (match_operand:GPR 0 "nonimmediate_operand" "+d")
! (match_operand 1 "const_int_operand""I")
! (match_operand 2 "const_int_operand""I"))
(match_operand:GPR 3 "nonimmediate_operand" "d"))
 (clobber (reg:CC CC_REGNUM))]
"TARGET_Z10
!&& (INTVAL (operands[1]) + INTVAL (operands[2])) <=
!   GET_MODE_BITSIZE (mode)"
! {
!   int start = INTVAL (operands[2]);
!   int size = INTVAL (operands[1]);
!   int offset = 64 - GET_MODE_BITSIZE (mode);
! 
!   operands[2] = GEN_INT (offset + start);  /* start bit position 
*/
!   operands[1] = GEN_INT (offset + start + size - 1);   /* end bit position */
!   operands[4] = GEN_INT (GET_MODE_BITSIZE (mode) -
!start - size);   /* left shift count */
! 
!   return "risbg\t%0,%3,%b2,%b1,%b4";
! }
[(set_attr "op_type" "RIE")
 (set_attr "z10prop" "z10_super_E1")])
  
--- 3376,3404 
FAIL;
  })
  
+ 
+ ; The normal RTL expansion will never generate a zero_extract where
+ ; the location operand isn't word mode.  However, we do this in the
+ ; back-end when generating atomic operations. See s390_two_part_insv.
  (define_insn "*insv_zEC12"
[(set (zero_extract:GPR (match_operand:GPR 0 "nonimmediate_operand" "+d")
! (match_operand 1 "const_int_operand""I")  ; size
! (match_operand 2 "const_int_operand""I")) ; pos
(match_operand:GPR 3 "nonimmediate_operand" "d"))]
"TARGET_ZEC12
!&& (INTVAL (operands[1]) + INTVAL (operands[2])) <= "
!   "risbgn\t%0,%3,64-+%2,64-+%2+%1-1,-%2-%1"
[(set_attr "op_type" "RIE")])
  
  (define_insn "*insv_z10"
[(set (zero_extract:GPR (match_operand:GPR 0 "nonimmediate_operand" "+d")
! (match_operand 1 "const_int_operand""I")  ; size
! (match_operand 2 "const_int_operand""I")) ; pos
(match_operand:GPR 3 "nonimmediate_operand" "d"))
 (clobber (reg:CC CC_REGNUM))]
"TARGET_Z10
!&& (INTVAL (operands[1]) + INTVAL (operands[2])) <= "
!   "risbg\t%0,%3,64-+%2,64-+%2+%1-1,-%2-%1"
[(set_att

Re: [PATCH 5/7] s390: Implement extzv for z10

2012-12-14 Thread Andreas Krebbel
I've changed the s390_extzv_shift_ok function and have added extzv
patterns for zEC12. I also tried to implement an extzvsi expander but
ran into some issues which need to be fixed first.

Tested on s390 and s390x with z196 and zEC12.

Please apply to mainline. Thanks!

Bye,

-Andreas-

 gcc/config/s390/predicates.md |4 +
 gcc/config/s390/s390-protos.h |1 
 gcc/config/s390/s390.c|   18 
 gcc/config/s390/s390.md   |   94 +!
 4 files changed, 81 insertions(+), 36 modifications(!)

Index: gcc/config/s390/predicates.md
===
*** gcc/config/s390/predicates.md.orig
--- gcc/config/s390/predicates.md
***
*** 101,106 
--- 101,110 
return true;
  })
  
+ (define_predicate "nonzero_shift_count_operand"
+   (and (match_code "const_int")
+(match_test "IN_RANGE (INTVAL (op), 1, GET_MODE_BITSIZE (mode) - 1)")))
+ 
  ;;  Return true if OP a valid operand for the LARL instruction.
  
  (define_predicate "larl_operand"
Index: gcc/config/s390/s390-protos.h
===
*** gcc/config/s390/s390-protos.h.orig
--- gcc/config/s390/s390-protos.h
*** extern bool s390_legitimate_address_with
*** 109,113 
--- 109,114 
  extern bool s390_decompose_shift_count (rtx, rtx *, HOST_WIDE_INT *);
  extern int s390_branch_condition_mask (rtx);
  extern int s390_compare_and_branch_condition_mask (rtx);
+ extern bool s390_extzv_shift_ok (int, int, unsigned HOST_WIDE_INT);
  
  #endif /* RTX_CODE */
Index: gcc/config/s390/s390.c
===
*** gcc/config/s390/s390.c.orig
--- gcc/config/s390/s390.c
*** s390_contiguous_bitmask_p (unsigned HOST
*** 1347,1352 
--- 1347,1370 
return true;
  }
  
+ /* Check whether a rotate of ROTL followed by an AND of CONTIG is
+equivalent to a shift followed by the AND.  In particular, CONTIG
+should not overlap the (rotated) bit 0/bit 63 gap.  Negative values
+for ROTL indicate a rotate to the right.  */
+ 
+ bool
+ s390_extzv_shift_ok (int bitsize, int rotl, unsigned HOST_WIDE_INT contig)
+ {
+   int pos, len;
+   bool ok;
+ 
+   ok = s390_contiguous_bitmask_p (contig, bitsize, &pos, &len);
+   gcc_assert (ok);
+ 
+   return ((rotl >= 0 && rotl <= pos)
+ || (rotl < 0 && -rotl <= bitsize - len - pos));
+ }
+ 
  /* Check whether we can (and want to) split a double-word
 move in mode MODE from SRC to DST into two single-word
 moves, moving the subword FIRST_SUBWORD first.  */
Index: gcc/config/s390/s390.md
===
*** gcc/config/s390/s390.md.orig
--- gcc/config/s390/s390.md
***
*** 3307,3321 
[(set_attr "op_type" "RS,RSY")
 (set_attr "z10prop" "z10_super_E1,z10_super_E1")])
  
  
! (define_insn_and_split "*extzv"
[(set (match_operand:GPR 0 "register_operand" "=d")
(zero_extract:GPR (match_operand:QI 1 "s_operand" "QS")
! (match_operand 2 "const_int_operand" "n")
  (const_int 0)))
 (clobber (reg:CC CC_REGNUM))]
!   "INTVAL (operands[2]) > 0
!&& INTVAL (operands[2]) <= GET_MODE_BITSIZE (SImode)"
"#"
"&& reload_completed"
[(parallel
--- 3307,3370 
[(set_attr "op_type" "RS,RSY")
 (set_attr "z10prop" "z10_super_E1,z10_super_E1")])
  
+ ;
+ ; extv instruction patterns
+ ;
+ 
+ ; FIXME: This expander needs to be converted from DI to GPR as well
+ ; after resolving some issues with it.
+ 
+ (define_expand "extzv"
+   [(parallel
+ [(set (match_operand:DI 0 "register_operand" "=d")
+ (zero_extract:DI
+  (match_operand:DI 1 "register_operand" "d")
+  (match_operand 2 "const_int_operand" "")   ; size
+  (match_operand 3 "const_int_operand" ""))) ; start
+  (clobber (reg:CC CC_REGNUM))])]
+   "TARGET_Z10"
+ {
+   /* Starting with zEC12 there is risbgn not clobbering CC.  */
+   if (TARGET_ZEC12)
+ {
+   emit_move_insn (operands[0],
+ gen_rtx_ZERO_EXTRACT (DImode,
+   operands[1],
+   operands[2],
+   operands[3]));
+   DONE;
+ }
+ })
  
! (define_insn "*extzv_zEC12"
!   [(set (match_operand:GPR 0 "register_operand" "=d")
!   (zero_extract:GPR
! (match_operand:GPR 1 "register_operand" "d")
! (match_operand 2 "const_int_operand" "")   ; size
! (match_operand 3 "const_int_operand" "")))] ; start]
!   "TARGET_ZEC12"
!   "risbgn\t%0,%1,64-%2,128+63,+%3+%2" ; dst, src, start, end, shift
!   [(set_attr "op_type" "RIE")])
! 
! (define_insn "*extzv_z10"
!   [(set (match_operand:GPR 0 "register_operand" "=d")
!   (zero_extract:GPR
!(match_operand:GPR 1 "register_operand" "d")
!(match_op

Re: [PATCH 6/7] s390: Generate rxsbg, and shifted forms of rosbg

2012-12-14 Thread Andreas Krebbel
I only changed DSI to GPR. This is semantically the same but more
common in the backend.

Tested on s390 and s390x with z196 and zEC12.

Please apply to mainline. Thanks!

Bye,

-Andreas-

 gcc/config/s390/s390.md |   61 ++!!
 1 file changed, 3 insertions(+), 58 modifications(!)

Index: gcc/config/s390/s390.md
===
*** gcc/config/s390/s390.md.orig
--- gcc/config/s390/s390.md
***
*** 393,398 
--- 393,401 
  ;; the same template.
  (define_code_iterator SHIFT [ashift lshiftrt])
  
+ ;; This iterator allow r[ox]sbg to be defined with the same template
+ (define_code_iterator IXOR [ior xor])
+ 
  ;; This iterator and attribute allow to combine most atomic operations.
  (define_code_iterator ATOMIC [and ior xor plus minus mult])
  (define_code_iterator ATOMIC_Z196 [and ior xor plus])
***
*** 3474,3488 
[(set_attr "op_type" "RIE")
 (set_attr "z10prop" "z10_super_E1")])
  
! ; and op1 with a mask being 1 for the selected bits and 0 for the rest
! (define_insn "*insv_or_z10_noshift"
[(set (match_operand:GPR 0 "nonimmediate_operand" "=d")
!   (ior:GPR (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "d")
! (match_operand:GPR 2 "contiguous_bitmask_operand" ""))
!   (match_operand:GPR 3 "nonimmediate_operand" "0")))
 (clobber (reg:CC CC_REGNUM))]
"TARGET_Z10"
!   "rosbg\t%0,%1,%2,%2,0"
[(set_attr "op_type" "RIE")])
  
  (define_insn "*insv_mem_reg"
--- 3477,3537 
[(set_attr "op_type" "RIE")
 (set_attr "z10prop" "z10_super_E1")])
  
! (define_insn "*rsbg__noshift"
[(set (match_operand:GPR 0 "nonimmediate_operand" "=d")
!   (IXOR:GPR
! (and:GPR (match_operand:GPR 1 "nonimmediate_operand" "d")
!(match_operand:GPR 2 "contiguous_bitmask_operand" ""))
! (match_operand:GPR 3 "nonimmediate_operand" "0")))
 (clobber (reg:CC CC_REGNUM))]
"TARGET_Z10"
!   "rsbg\t%0,%1,%2,%2,0"
!   [(set_attr "op_type" "RIE")])
! 
! (define_insn "*rsbg_di_rotl"
!   [(set (match_operand:DI 0 "nonimmediate_operand" "=d")
!   (IXOR:DI
! (and:DI
!   (rotate:DI
! (match_operand:DI 1 "nonimmediate_operand" "d")
!   (match_operand:DI 3 "const_int_operand" ""))
! (match_operand:DI 2 "contiguous_bitmask_operand" ""))
! (match_operand:DI 4 "nonimmediate_operand" "0")))
!(clobber (reg:CC CC_REGNUM))]
!   "TARGET_Z10"
!   "rsbg\t%0,%1,%2,%2,%b3"
!   [(set_attr "op_type" "RIE")])
! 
! (define_insn "*rsbg__srl"
!   [(set (match_operand:GPR 0 "nonimmediate_operand" "=d")
!   (IXOR:GPR
! (and:GPR
!   (lshiftrt:GPR
!   (match_operand:GPR 1 "nonimmediate_operand" "d")
!   (match_operand:GPR 3 "nonzero_shift_count_operand" ""))
! (match_operand:GPR 2 "contiguous_bitmask_operand" ""))
! (match_operand:GPR 4 "nonimmediate_operand" "0")))
!(clobber (reg:CC CC_REGNUM))]
!   "TARGET_Z10
!&& s390_extzv_shift_ok (, 64 - INTVAL (operands[3]),
!INTVAL (operands[2]))"
!   "rsbg\t%0,%1,%2,%2,64-%3"
!   [(set_attr "op_type" "RIE")])
! 
! (define_insn "*rsbg__sll"
!   [(set (match_operand:GPR 0 "nonimmediate_operand" "=d")
!   (IXOR:GPR
! (and:GPR
!   (ashift:GPR
!   (match_operand:GPR 1 "nonimmediate_operand" "d")
!   (match_operand:GPR 3 "nonzero_shift_count_operand" ""))
! (match_operand:GPR 2 "contiguous_bitmask_operand" ""))
! (match_operand:GPR 4 "nonimmediate_operand" "0")))
!(clobber (reg:CC CC_REGNUM))]
!   "TARGET_Z10
!&& s390_extzv_shift_ok (, INTVAL (operands[3]),
!INTVAL (operands[2]))"
!   "rsbg\t%0,%1,%2,%2,%3"
[(set_attr "op_type" "RIE")])
  
  (define_insn "*insv_mem_reg"



Re: [PATCH 7/7] s390: Generate rnsbg

2012-12-14 Thread Andreas Krebbel
I did the change suggested by Uli in the *insv_rnsbg_srl pattern.

Tested on s390 and s390x with z196 and zEC12.

Please apply to mainline. Thanks!

Bye,

-Andreas-

 gcc/config/s390/s390.md |   55 
 1 file changed, 55 insertions(+)

Index: gcc/config/s390/s390.md
===
*** gcc/config/s390/s390.md.orig
--- gcc/config/s390/s390.md
***
*** 3534,3539 
--- 3534,3594 
"rsbg\t%0,%1,%2,%2,%3"
[(set_attr "op_type" "RIE")])
  
+ ;; These two are generated by combine for s.bf &= val.
+ ;; ??? For bitfields smaller than 32-bits, we wind up with SImode
+ ;; shifts and ands, which results in some truly awful patterns
+ ;; including subregs of operations.  Rather unnecessisarily, IMO.
+ ;; Instead of
+ ;;
+ ;; (set (zero_extract:DI (reg/v:DI 50 [ s ])
+ ;;(const_int 24 [0x18])
+ ;;(const_int 0 [0]))
+ ;;(subreg:DI (and:SI (subreg:SI (lshiftrt:DI (reg/v:DI 50 [ s ])
+ ;;(const_int 40 [0x28])) 4)
+ ;;(reg:SI 4 %r4 [ y+4 ])) 0))
+ ;;
+ ;; we should instead generate
+ ;;
+ ;; (set (zero_extract:DI (reg/v:DI 50 [ s ])
+ ;;(const_int 24 [0x18])
+ ;;(const_int 0 [0]))
+ ;;(and:DI (lshiftrt:DI (reg/v:DI 50 [ s ])
+ ;;(const_int 40 [0x28]))
+ ;;(subreg:DI (reg:SI 4 %r4 [ y+4 ]) 0)))
+ ;;
+ ;; by noticing that we can push down the outer paradoxical subreg
+ ;; into the operation.
+ 
+ (define_insn "*insv_rnsbg_noshift"
+   [(set (zero_extract:DI
+ (match_operand:DI 0 "nonimmediate_operand" "+d")
+ (match_operand 1 "const_int_operand" "")
+ (match_operand 2 "const_int_operand" ""))
+   (and:DI
+ (match_dup 0)
+ (match_operand:DI 3 "nonimmediate_operand" "d")))
+(clobber (reg:CC CC_REGNUM))]
+   "TARGET_Z10
+&& INTVAL (operands[1]) + INTVAL (operands[2]) == 64"
+   "rnsbg\t%0,%3,%2,63,0"
+   [(set_attr "op_type" "RIE")])
+ 
+ (define_insn "*insv_rnsbg_srl"
+   [(set (zero_extract:DI
+ (match_operand:DI 0 "nonimmediate_operand" "+d")
+ (match_operand 1 "const_int_operand" "")
+ (match_operand 2 "const_int_operand" ""))
+   (and:DI
+ (lshiftrt:DI
+   (match_dup 0)
+   (match_operand 3 "const_int_operand" ""))
+ (match_operand:DI 4 "nonimmediate_operand" "d")))
+(clobber (reg:CC CC_REGNUM))]
+   "TARGET_Z10
+&& INTVAL (operands[3]) == 64 - INTVAL (operands[1]) - INTVAL 
(operands[2])"
+   "rnsbg\t%0,%4,%2,%2+%1-1,%3"
+   [(set_attr "op_type" "RIE")])
+ 
  (define_insn "*insv_mem_reg"
[(set (zero_extract:W (match_operand:QI 0 "memory_operand" "+Q,S")
(match_operand 1 "const_int_operand" "n,n")



Re: [fortran, patch] Allow displaying backtraces from user code

2012-12-14 Thread Janus Weil
Hi all,

here is another attempt to make gfortran support user-requested backtraces.

A patch in this direction was already proposed by FX in March, but did
not make it in so far. It was last discussed in June, cf.
http://gcc.gnu.org/ml/fortran/2012-06/msg00131.html and follow-ups,
where the consensus seemed to be to add a new intrinsic subroutine for
this (as GNU extension).

This is just what the attached patch does: It exports
_gfortran_show_backtrace from libgfortran and adds an intrinsic
SHOW_BACKTRACE, together with some documentation in intrinsic.texi (it
also documents the fact that ABORT gives a backtrace recently).

Ok for trunk?

Cheers,
Janus


2012-12-14  Janus Weil  

PR fortran/36044
* gfortran.h (gfc_isym_id): Add GFC_ISYM_SHOW_BACKTRACE.
* intrinsic.c (add_subroutines): Add "show_backtrace".
* intrinsic.texi (SHOW_BACKTRACE): Document SHOW_BACKTRACE intrinsic.


2012-12-14  Janus Weil  

PR fortran/36044
* gfortran.map: Add _gfortran_show_backtrace.
* libgfortran.h: Export show_backtrace.
* runtime/backtrace.c: Ditto.




2012/6/21 Janus Weil :
>>> There are two possibilities:
>>> a) Making _gfortran_show_backtrace accessible from the outside (via manual
>>> C binding from Fortran)
>>> b) Adding a new intrinsic
>>>
>>
>> I would vote for b), as it gets documented then.
>> It is enough useful for a wide range of programmers to deserve
>> an intrinsic of its own, IMHO.
>> And always directly available, no need of module convolutions.
>
> As noted before, I also prefer b).
>
>
>> Name: simply show_backtrace ?
>> This would be a self-explaining name, the odd "QQ" in
>> tracebackqq is just this, odd.
>> And why call it traceback when it is actually a backtrace ;-)
>
> Adopting the name from Intel would have the advantage of compatibility
> between ifort and gfortran. However, since other vendors have
> different names, compatibility between several compilers in this
> non-standard function will not be realized. Moreover I agree that the
> 'QQ' part is odd (I never understood what it is supposed to mean).
>
> Therefore I would also vote for something like "show_backtrace" (or
> simply "backtrace"?).
>
> Cheers,
> Janus


pr36044_backtrace_intrinsic.diff
Description: Binary data


Go patch committed: Better error message for invalid shifts

2012-12-14 Thread Ian Lance Taylor
My understanding of shifts of constants in Go has improved.  This patch
more accurately reflects the language spec when it comes to reporting
errors for invalid shifts.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline.

Ian

diff -r 9be532343312 go/expressions.cc
--- a/go/expressions.cc	Thu Dec 13 14:14:53 2012 -0800
+++ b/go/expressions.cc	Fri Dec 14 06:45:18 2012 -0800
@@ -5463,13 +5463,10 @@
   // Give a useful error if that happened.
   if (tleft->is_abstract()
 	  && subcontext.type != NULL
-	  && (this->left_->type()->integer_type() == NULL
-	  || (subcontext.type->integer_type() == NULL
-		  && subcontext.type->float_type() == NULL
-		  && subcontext.type->complex_type() == NULL
-		  && subcontext.type->interface_type() == NULL)))
+	  && !subcontext.may_be_abstract
+	  && subcontext.type->integer_type() == NULL)
 	this->report_error(("invalid context-determined non-integer type "
-			"for shift operand"));
+			"for left operand of shift"));
 
   // The context for the right hand operand is the same as for the
   // left hand operand, except for a shift operator.


[PATCH][Revised] PR55679: skip invalid tests from r194458 on darwin

2012-12-14 Thread Jack Howarth
   The attached revised patch disables two asan tests introduced in r194458 
which are invalid
on darwin. The g++.dg/asan/interception-test-1.C test should be dg-skip-if'd  
on darwin since 
mac function interposition is used instead of interception. The 
c-c++-common/asan/swapcontext-test-1.c
test is invalid as ucontext support was deprecated in darwin10 and removed 
completely
in darwin11. The current implementation of check_effective_target_swapcontext in
lib/target-supports.exp doesn't catch the deprecation of ucontext support on 
darwin
because it doesn't try to parse the ucontext.h include file. The attached patch
changes the check_effective_target_swapcontext to use check_no_compiler_messages
with compilation of ucontext.h. Tested without regression on both 
x86_64-apple-darwin10
and x86_64-unknown-linux-gnu. Okay for gcc trunk?
 Jack

gcc/testsuite/

2012-12-14  Jack Howarth 

PR sanitizer/55679
* g++.dg/asan/interception-test-1.C: Skip on darwin.
* lib/target-supports.exp (check_effective_target_swapcontext): Use
check_no_compiler_messages to test support in ucontext.h.

Index: gcc/testsuite/g++.dg/asan/interception-test-1.C
===
--- gcc/testsuite/g++.dg/asan/interception-test-1.C (revision 194483)
+++ gcc/testsuite/g++.dg/asan/interception-test-1.C (working copy)
@@ -3,6 +3,7 @@
 // { dg-do run }
 // { dg-options "-fno-builtin-malloc -fno-builtin-free" }
 // { dg-shouldfail "asan" }
+// { dg-skip-if "Darwin uses mac function interposition" { *-*-darwin* } }
 
 #include 
 #include 
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   (revision 194495)
+++ gcc/testsuite/lib/target-supports.exp   (working copy)
@@ -736,7 +736,14 @@ proc check_effective_target_setrlimit {}
 
 # Return 1 if the target supports swapcontext, 0 otherwise.
 proc check_effective_target_swapcontext {} {
-return [check_function_available "swapcontext"]
+return [check_no_compiler_messages swapcontext object {
+   #include 
+   int main (void)
+   {
+ ucontext_t orig_context,child_context;
+ if (swapcontext(&child_context, &orig_context) < 0) { }
+   }
+}]
 }
 
 # Return 1 if compilation with -pthread is error-free for trivial


Re: [PATCH][Revised] PR55679: skip invalid tests from r194458 on darwin

2012-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2012 at 10:13:47AM -0500, Jack Howarth wrote:
> 2012-12-14  Jack Howarth 
> 
>   PR sanitizer/55679
>   * g++.dg/asan/interception-test-1.C: Skip on darwin.
>   * lib/target-supports.exp (check_effective_target_swapcontext): Use
>   check_no_compiler_messages to test support in ucontext.h.

Ok with:

> --- gcc/testsuite/lib/target-supports.exp (revision 194495)
> +++ gcc/testsuite/lib/target-supports.exp (working copy)
> @@ -736,7 +736,14 @@ proc check_effective_target_setrlimit {}
>  
>  # Return 1 if the target supports swapcontext, 0 otherwise.
>  proc check_effective_target_swapcontext {} {
> -return [check_function_available "swapcontext"]
> +return [check_no_compiler_messages swapcontext object {

This should be executable instead of object.

Jakub


[COMMITTED] Tweak to patch for PR55492

2012-12-14 Thread Richard Henderson
Yvan noticed that the set of memory models has already been
validated by expand_builtin_atomic_store, so the test here
in expand_atomic_store is fully redundant.

Committed.


r~
* optabs.c (expand_atomic_store): Elide redundant model test.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index 99fd025..4f3cbb4 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -7537,8 +7537,7 @@ expand_atomic_store (rtx mem, rtx val, enum memmodel 
model, bool use_release)
 }
 
   /* Otherwise assume stores are atomic, and emit the proper barriers.  */
-  if (model == MEMMODEL_SEQ_CST || model == MEMMODEL_RELEASE)
-expand_mem_thread_fence (model);
+  expand_mem_thread_fence (model);
 
   emit_move_insn (mem, val);
 


Re: [cxx-conversion] Iterator for hash_table, and associated conversions

2012-12-14 Thread Diego Novillo

On 2012-12-13 23:08 , Lawrence Crowl wrote:


+  class iterator
+  {
+  public:
+inline iterator ();
+inline iterator (value_type **, value_type **);
+inline value_type &operator * ();
+void slide ();
+inline iterator &operator ++ ();
+inline bool operator != (const iterator &) const;
+  private:
+value_type **slot_;
+value_type **limit_;
+  };
+


This reminds me...  We (OK, *I*) should do the same in vec<>.



+/* Iterate through the elements of hash_table HTAB,
+   using hash_table <>::iterator ITER,
+   storing each element in RESULT, which is of type TYPE.
+
+   This macro has this form for compatibility with the
+   FOR_EACH_HTAB_ELEMENT currently defined in tree-flow.h.  */
+
+#define FOR_EACH_HASH_TABLE_ELEMENT(HTAB, RESULT, TYPE, ITER) \
+  for ((ITER) = (HTAB).begin (); \
+   (ITER) != (HTAB).end () ? (RESULT = &*(ITER) , true) : false; \
+   ++(ITER))
+


I wonder if we shouldn't just get rid of this style of iteration.  I 
never quite liked it.  I guess it doesn't hurt, though.



The patch is OK.  Thanks.


Diego.



C++ PATCH for c++/42315 (ICE after error with invalid array init)

2012-12-14 Thread Jason Merrill
In this bug, when we see the invalid initializer we changed the type of 
x to be error_mark_node.  But we had already built an ARRAY_REF to it 
using the incomplete type.  Then the back end gets confused by this 
invalid expression wrapped in a valid one.


Fixed by not messing with the type of x.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 1c880bb1328d3815435cf0695e180ab7fe1c47ec
Author: Jason Merrill 
Date:   Fri Dec 14 11:36:43 2012 -0500

	PR c++/42315
	* decl.c (maybe_deduce_size_from_array_init): Don't change the
	variable type.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index cdda2f4..64bd4b5 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -4834,14 +4834,12 @@ maybe_deduce_size_from_array_init (tree decl, tree init)
 	  if (failure == 1)
 	{
 	  error ("initializer fails to determine size of %qD", decl);
-	  TREE_TYPE (decl) = error_mark_node;
 	}
 	  else if (failure == 2)
 	{
 	  if (do_default)
 		{
 		  error ("array size missing in %qD", decl);
-		  TREE_TYPE (decl) = error_mark_node;
 		}
 	  /* If a `static' var's size isn't known, make it extern as
 		 well as static, so it does not get allocated.  If it's not
@@ -4853,7 +4851,6 @@ maybe_deduce_size_from_array_init (tree decl, tree init)
 	  else if (failure == 3)
 	{
 	  error ("zero-size array %qD", decl);
-	  TREE_TYPE (decl) = error_mark_node;
 	}
 	}
 
diff --git a/gcc/testsuite/g++.dg/gomp/pr34964.C b/gcc/testsuite/g++.dg/gomp/pr34964.C
index f5995a6..a02faa2 100644
--- a/gcc/testsuite/g++.dg/gomp/pr34964.C
+++ b/gcc/testsuite/g++.dg/gomp/pr34964.C
@@ -2,5 +2,5 @@
 // { dg-do compile }
 // { dg-options "-fopenmp" }
 
-char x[] = 0;	// { dg-error "initializer fails to determine size" }
+char x[] = 0;	// { dg-error "initializer" }
 #pragma omp threadprivate (x)
diff --git a/gcc/testsuite/g++.dg/init/array21.C b/gcc/testsuite/g++.dg/init/array21.C
index f41ce86..5438af1 100644
--- a/gcc/testsuite/g++.dg/init/array21.C
+++ b/gcc/testsuite/g++.dg/init/array21.C
@@ -2,6 +2,6 @@
 
 void foo()
 {
-  const int x[] = 0; // { dg-error "size" }
+  const int x[] = 0; // { dg-error "initializer" }
   ++x;
 }
diff --git a/gcc/testsuite/g++.dg/init/array32.C b/gcc/testsuite/g++.dg/init/array32.C
new file mode 100644
index 000..06b27a9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/array32.C
@@ -0,0 +1,7 @@
+// PR c++/42315
+
+extern int x[];
+
+int i = x[0];
+
+int x[] = 0;			// { dg-error "" }


Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem

2012-12-14 Thread Christophe Lyon
Ping^2?

On 7 December 2012 09:34, Christophe Lyon  wrote:
> Ping?
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02558.html
>
> Thanks,
>
> Christophe.


Re: [PATCH][Cilkplus] Fix pragma simd info being lost

2012-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2012 at 04:59:02AM +, Iyer, Balaji V wrote:
> --- tree-vect-loop.c  (revision 194483)
> +++ tree-vect-loop.c  (working copy)
> @@ -234,8 +234,8 @@
> if (flag_enable_cilk && pragma_simd_assert_requested_p
> (loop->pragma_simd_index))
>   {
> -   error ("Loop not vectorized. " 
> -  "Exiting as requested by Pragma SIMD");
> +   fatal_error ("Loop not vectorized. " 
> +"Exiting as requested by Pragma SIMD");
>   }
> return false;
>   }

Why do you think fatal_error is the right thing here?  Why doesn't normal
error work?  Generally, if one function contains 10 #pragma simd loops that
require vectorization and 5 out of them aren't vectorized, it is nicer for
users to be told about all 5 of them, rather than just the first one.
fatal_error will exit immediately.

Jakub


RE: [PATCH][Cilkplus] Fix pragma simd info being lost

2012-12-14 Thread Iyer, Balaji V
Hello Jakub,
Please see my responses below.

Thanks,

Balaji V. Iyer.

> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Friday, December 14, 2012 11:59 AM
> To: Iyer, Balaji V
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH][Cilkplus] Fix pragma simd info being lost
> 
> On Fri, Dec 14, 2012 at 04:59:02AM +, Iyer, Balaji V wrote:
> > --- tree-vect-loop.c(revision 194483)
> > +++ tree-vect-loop.c(working copy)
> > @@ -234,8 +234,8 @@
> >   if (flag_enable_cilk && pragma_simd_assert_requested_p
> >   (loop->pragma_simd_index))
> > {
> > - error ("Loop not vectorized. "
> > -"Exiting as requested by Pragma SIMD");
> > + fatal_error ("Loop not vectorized. "
> > +  "Exiting as requested by Pragma SIMD");
> > }
> >   return false;
> > }
> 
> Why do you think fatal_error is the right thing here?  Why doesn't normal 
> error
> work?  Generally, if one function contains 10 #pragma simd loops that require
> vectorization and 5 out of them aren't vectorized, it is nicer for users to 
> be told
> about all 5 of them, rather than just the first one.
> fatal_error will exit immediately.

The #pragma simd assert requires the compiler to halt compilation if the loop 
is not vectorized. This is why I used fatal_error. The default case is noassert.


> 
>   Jakub


Re: [PATCH][Cilkplus] Fix pragma simd info being lost

2012-12-14 Thread Jakub Jelinek
On Fri, Dec 14, 2012 at 05:01:48PM +, Iyer, Balaji V wrote:
> > Why do you think fatal_error is the right thing here?  Why doesn't normal 
> > error
> > work?  Generally, if one function contains 10 #pragma simd loops that 
> > require
> > vectorization and 5 out of them aren't vectorized, it is nicer for users to 
> > be told
> > about all 5 of them, rather than just the first one.
> > fatal_error will exit immediately.
> 
> The #pragma simd assert requires the compiler to halt compilation if the
> loop is not vectorized.  This is why I used fatal_error.  The default case
> is noassert.

The compilation is halted even with error, compiler will exit with non-zero
exit status, won't compile any further functions, etc.

Jakub


[Patch, AArch64] Add basic recognition for cpu cortex-a53 and cortex-a57

2012-12-14 Thread Yufeng Zhang

Hi,

This patch adds the basic recognition for cpu cortex-a53 and cortex-a57. 
 OK to commit?


Thanks,
Yufeng


gcc/

2012-12-14  Yufeng Zhang  

* config/aarch64/aarch64-cores.def: Add entries for 
"cortex-a53" and

"cortex-a57".
* config/aarch64/aarch64-tune.md: Re-generate.diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 06cc982..4b77009 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -34,5 +34,7 @@
This list currently contains example CPUs that implement AArch64, and
therefore serves as a template for adding more CPUs in the future.  */
 
+AARCH64_CORE("cortex-a53",	  cortexa53,	 8,  AARCH64_FL_FPSIMD,generic)
+AARCH64_CORE("cortex-a57",	  cortexa57,	 8,  AARCH64_FL_FPSIMD,generic)
 AARCH64_CORE("example-1",	  large,	 8,  AARCH64_FL_FPSIMD,generic)
 AARCH64_CORE("example-2",	  small,	 8,  AARCH64_FL_FPSIMD,generic)
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index a654a91..02699e3 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"large,small"
+	"cortexa53,cortexa57,large,small"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))

Re: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c

2012-12-14 Thread H.J. Lu
On Thu, Dec 13, 2012 at 10:56 PM, Joey Ye  wrote:
>> -Original Message-
>> From: H.J. Lu [mailto:hjl.to...@gmail.com]
>> Sent: Friday, December 14, 2012 11:55
>> To: Joey Ye
>> Cc: gcc-patches@gcc.gnu.org; Joseph Prostko
>> Subject: Re: [PATCH, libgcc] Make possible to disable JCR in crtstuff.c
>>
>> > 2012-12-12  Joey Ye  
>> >
>> >  * configure.ac (enable-jcr-section): New target_configargs.
>> >  * configure: Regenerated.
>> >  * libgcc/Makefile.in: Include TARGET_USE_JCR_SECTION in CFLAGS.
>> >  * libgcc/configure.ac (use_jcr_section): New variable.
>> >  * libgcc/configure: Regenerated.
>> >  * libgcc/crtstuff.c: Check TARGET_USE_JCR_SECTION.
>>
>> Why do we need a new configure option at toplevel?
>> Can't you check --enable-languages=.. in libgcc?
> Although --enable-languages=... is passed to sub-configure, it needs a whole
> bunch of code to process with consideration of --disable-languages and
> targets. These code doesn't exist in libgcc/configure, and I intend not to
> duplicate them.
>

Can't you do

+# Disable jcr section if we're not building java
+case ,${enable_languages}, in
+  *java*)
+use_jcr_section=1
+;;
+  *)
+use_jcr_section=0
+;;
+esac

in libgcc/configure.ac?

BTW, checking *,java,* is wrong for

--enable-languages=c,c++,java

-- 
H.J.


[patch, libstdc++ testsuite] Shrink more tests for simulator

2012-12-14 Thread Steve Ellcey
Here are four more C++ tests that fail for me when run under the GNU
simulator.  I would like to shrink them to use less memory in the same
way as the other tests that I modified earlier.

OK to checkin?

Steve Ellcey
sell...@mips.com


2012-12-14  Steve Ellcey  

* testsuite/21_strings/basic_string/append/wchar_t/3.cc: Shrink
memory usage under simulator.
* testsuite/21_strings/basic_string/cons/wchar_t/6.cc: Ditto.
* testsuite/21_strings/basic_string/inserters_extractors/wchar_t/10.cc:
Ditto.
* testsuite/21_strings/basic_string/inserters_extractors/wchar_t/11.cc:
Ditto.


diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/append/wchar_t/3.cc 
b/libstdc++-v3/testsuite/21_strings/basic_string/append/wchar_t/3.cc
index dca5dbd..d20b994 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/append/wchar_t/3.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/append/wchar_t/3.cc
@@ -19,6 +19,12 @@
 
 // 21.3.5 string modifiers
 
+// { dg-options "-DITERATIONS=14" { target simulator } }
+
+#ifndef ITERATIONS
+#define ITERATIONS 18
+#endif
+
 #include 
 #include 
 
@@ -37,7 +43,7 @@ test03()
 {
   wstring one(source);
   wstring two(source);
-  for (unsigned j = 0; j < 18; ++j)
+  for (unsigned j = 0; j < ITERATIONS; ++j)
{
  VERIFY( one == two );
  one.append(one);
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/cons/wchar_t/6.cc 
b/libstdc++-v3/testsuite/21_strings/basic_string/cons/wchar_t/6.cc
index 82ed764..95acdd0 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/cons/wchar_t/6.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/cons/wchar_t/6.cc
@@ -19,6 +19,12 @@
 
 // 21.3.1 basic_string constructors.
 
+// { dg-options "-DITERATIONS=11" { target simulator } }
+
+#ifndef ITERATIONS
+#define ITERATIONS 13
+#endif
+
 #include 
 #include 
 #include 
@@ -50,6 +56,6 @@ void test01(int iter)
 
 int main()
 {
-  test01(13);
+  test01(ITERATIONS);
   return 0;
 }
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t/10.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t/10.cc
index 0883e13..a647c39 100644
--- 
a/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t/10.cc
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t/10.cc
@@ -17,6 +17,12 @@
 
 // 21.3.7.9 inserters and extractors
 
+// { dg-options "-DMAX_SIZE=505" { target simulator } }
+
+#ifndef MAX_SIZE
+#define MAX_SIZE 777
+#endif
+
 #include 
 #include 
 #include 
@@ -64,7 +70,7 @@ void test01()
 
   const wchar_t delim = L'|';
   const unsigned nchunks = 10;
-  const wstring data = prepare(777, nchunks, delim);
+  const wstring data = prepare(MAX_SIZE, nchunks, delim);
 
   wofstream ofstrm;
   ofstrm.open(filename);
diff --git 
a/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t/11.cc
 
b/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t/11.cc
index 2f174ca..a0a822f 100644
--- 
a/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t/11.cc
+++ 
b/libstdc++-v3/testsuite/21_strings/basic_string/inserters_extractors/wchar_t/11.cc
@@ -17,6 +17,12 @@
 
 // 21.3.7.9 inserters and extractors
 
+// { dg-options "-DMAX_SIZE=466" { target simulator } }
+
+#ifndef MAX_SIZE
+#define MAX_SIZE 666
+#endif
+
 #include 
 #include 
 #include 
@@ -63,7 +69,7 @@ void test01()
   const char filename[] = "inserters_extractors-3.txt";
 
   const unsigned nchunks = 10;
-  const wstring data = prepare(666, nchunks);
+  const wstring data = prepare(MAX_SIZE, nchunks);
 
   wofstream ofstrm;
   ofstrm.open(filename);


Re: [fortran, patch] Allow displaying backtraces from user code

2012-12-14 Thread Janus Weil
Btw: Forgot to mention that it regtests cleanly and the docu parts
were tested with make pdf.

Cheers,
Janus



2012/12/14 Janus Weil :
> Hi all,
>
> here is another attempt to make gfortran support user-requested backtraces.
>
> A patch in this direction was already proposed by FX in March, but did
> not make it in so far. It was last discussed in June, cf.
> http://gcc.gnu.org/ml/fortran/2012-06/msg00131.html and follow-ups,
> where the consensus seemed to be to add a new intrinsic subroutine for
> this (as GNU extension).
>
> This is just what the attached patch does: It exports
> _gfortran_show_backtrace from libgfortran and adds an intrinsic
> SHOW_BACKTRACE, together with some documentation in intrinsic.texi (it
> also documents the fact that ABORT gives a backtrace recently).
>
> Ok for trunk?
>
> Cheers,
> Janus
>
>
> 2012-12-14  Janus Weil  
>
> PR fortran/36044
> * gfortran.h (gfc_isym_id): Add GFC_ISYM_SHOW_BACKTRACE.
> * intrinsic.c (add_subroutines): Add "show_backtrace".
> * intrinsic.texi (SHOW_BACKTRACE): Document SHOW_BACKTRACE intrinsic.
>
>
> 2012-12-14  Janus Weil  
>
> PR fortran/36044
> * gfortran.map: Add _gfortran_show_backtrace.
> * libgfortran.h: Export show_backtrace.
> * runtime/backtrace.c: Ditto.
>
>
>
>
> 2012/6/21 Janus Weil :
 There are two possibilities:
 a) Making _gfortran_show_backtrace accessible from the outside (via manual
 C binding from Fortran)
 b) Adding a new intrinsic

>>>
>>> I would vote for b), as it gets documented then.
>>> It is enough useful for a wide range of programmers to deserve
>>> an intrinsic of its own, IMHO.
>>> And always directly available, no need of module convolutions.
>>
>> As noted before, I also prefer b).
>>
>>
>>> Name: simply show_backtrace ?
>>> This would be a self-explaining name, the odd "QQ" in
>>> tracebackqq is just this, odd.
>>> And why call it traceback when it is actually a backtrace ;-)
>>
>> Adopting the name from Intel would have the advantage of compatibility
>> between ifort and gfortran. However, since other vendors have
>> different names, compatibility between several compilers in this
>> non-standard function will not be realized. Moreover I agree that the
>> 'QQ' part is odd (I never understood what it is supposed to mean).
>>
>> Therefore I would also vote for something like "show_backtrace" (or
>> simply "backtrace"?).
>>
>> Cheers,
>> Janus


RE: [PATCH][Cilkplus] Fix pragma simd info being lost

2012-12-14 Thread Iyer, Balaji V


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Friday, December 14, 2012 12:09 PM
> To: Iyer, Balaji V
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH][Cilkplus] Fix pragma simd info being lost
> 
> On Fri, Dec 14, 2012 at 05:01:48PM +, Iyer, Balaji V wrote:
> > > Why do you think fatal_error is the right thing here?  Why doesn't
> > > normal error work?  Generally, if one function contains 10 #pragma
> > > simd loops that require vectorization and 5 out of them aren't
> > > vectorized, it is nicer for users to be told about all 5 of them, rather 
> > > than just
> the first one.
> > > fatal_error will exit immediately.
> >
> > The #pragma simd assert requires the compiler to halt compilation if
> > the loop is not vectorized.  This is why I used fatal_error.  The
> > default case is noassert.
> 
> The compilation is halted even with error, compiler will exit with non-zero 
> exit
> status, won't compile any further functions, etc.

OK, I will change them and send out another patch.

> 
>   Jakub


[PATCH][Cilkplus] Cilk_for: Reject certain induction vars and comparision

2012-12-14 Thread Iyer, Balaji V
Hello Everyone,
This patch is for the Cilk Plus branch mainly affecting the C compiler. 
It will reject certain type of comparisons (e.g. ==) and complain about 
volatile or const induction variables all in _Cilk_for.

Thanks,

Balaji V. Iyer.
Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c(revision 194496)
+++ gcc/c/c-parser.c(working copy)
@@ -11918,7 +11918,7 @@
   c_cont_label = NULL_TREE;
   body = c_parser_c99_block_statement (parser);
   c_finish_cilk_loop (loc, cvar, cond, incr, body, c_cont_label, grain);
-  add_stmt (c_end_compound_stmt (UNKNOWN_LOCATION, block, true));
+  add_stmt (c_end_compound_stmt (loc, block, true));
   c_break_label = save_break;
   c_cont_label = save_cont;
 }
Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c(revision 194496)
+++ gcc/c/c-typeck.c(working copy)
@@ -10970,7 +10970,7 @@
 }
 
 void
-c_finish_cilk_loop (location_t start_locus ATTRIBUTE_UNUSED, tree cvar,
+c_finish_cilk_loop (location_t start_locus, tree cvar,
tree cond, tree incr, tree body, tree clab, tree grain)
 {
   tree init;
@@ -10982,12 +10982,12 @@
 
   if (!cond)
 {
-  error ("_Cilk_for missing condition");
+  error_at (start_locus, "_Cilk_for missing condition");
   return;
 }
   if (!incr)
 {
-  error ("_Cilk_for missing increment");
+  error_at (start_locus, "_Cilk_for missing increment");
   return;
 }
 
@@ -10999,7 +10999,7 @@
 }
   if (!cvar)
 {
-  error ("missing control variable");
+  error_at (start_locus, "missing control variable");
   return;
 }
   if (clab)
@@ -11007,7 +11007,31 @@
   error_at (EXPR_LOCATION (clab), "_Cilk_for has continue");
   return;
 }
+  
+  /* Checks if cond expr is one of the following: !=, <=, <, >=, or >.  */
+  if (TREE_CODE (cond) != NE_EXPR
+  && TREE_CODE (cond) != LT_EXPR
+  && TREE_CODE (cond) != LE_EXPR
+  && TREE_CODE (cond) != GT_EXPR
+  && TREE_CODE (cond) != GE_EXPR)
+{
+  error_at (EXPR_LOCATION (cond), "_Cilk_for condition must be one of the "
+   "following: !=, <=, <, >= or >");
+  return;
+}
 
+  /* Checks if the induction variable is volatile or constant.  */
+  if (TREE_THIS_VOLATILE (cvar))
+{
+  error_at (start_locus, "_Cilk_for induction variable cannot be 
volatile");
+  return;
+}
+  else if (TREE_CONSTANT (cvar) || TREE_READONLY (cvar))
+{
+  error_at (start_locus, "_Cilk_for induction variable cannot be constant 
or "
+   "readonly");
+  return;
+}
   init = DECL_INITIAL (cvar);
   
   c_tree = build_stmt (UNKNOWN_LOCATION, CILK_FOR_STMT, NULL_TREE, NULL_TREE,
Index: 
gcc/testsuite/gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_volatile_var.c
===
--- 
gcc/testsuite/gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_volatile_var.c
(revision 0)
+++ 
gcc/testsuite/gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_volatile_var.c
(revision 0)
@@ -0,0 +1,20 @@
+/* 
+The variable may not be const or volatile
+   
+*/
+
+#include "controls.h"
+
+int a[SMALL_INT_ARRAY_SIZE];
+
+int main (void)
+{
+volatile int ii;
+const int jj;
+
+
+_Cilk_for(ii = 0; ii < SMALL_INT_ARRAY_SIZE; ii++) /* { dg-error 
"_Cilk_for induction variable cannot be volatile." } */
+{
+   a[ii] = ii;
+}
+}
Index: 
gcc/testsuite/gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_invalid_compares.c
===
--- 
gcc/testsuite/gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_invalid_compares.c
(revision 0)
+++ 
gcc/testsuite/gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_invalid_compares.c
(revision 0)
@@ -0,0 +1,17 @@
+/* 
+ The operator denoted OP shall be one of !=, <=, <, >=, or >
+   
+*/
+
+#define ARRAY_SIZE 1
+
+int a[ARRAY_SIZE];
+
+int main (void)
+{
+int ii;
+_Cilk_for(ii = 0; ii == ARRAY_SIZE; ii++) /* { dg-error "_Cilk_for 
condition must be one of the following."  */
+{
+   a[ii] = ii;
+}
+}
Index: gcc/testsuite/ChangeLog.cilkplus
===
--- gcc/testsuite/ChangeLog.cilkplus(revision 194496)
+++ gcc/testsuite/ChangeLog.cilkplus(working copy)
@@ -1,3 +1,10 @@
+2012-12-14  Balaji V. Iyer  
+
+   * 
gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_invalid_compares.c: 
+   New test.
+   * gcc.dg/cilk-plus/cilk_keywords_test/errors/cilk_for_volatile_var.c:
+   Likewise.
+
 2012-12-12  Balaji V. Iyer  
 
* gcc.dg/cilk-plus/cilk_keywords_test/errors/goto_inside_cilkfor.c: 
Index: gcc/ChangeLog.cilkplus
===
--- gcc/ChangeLog.cilkplus  (revision 194496)
+++ gcc/Cha

Re: [patch, libstdc++ testsuite] Shrink more tests for simulator

2012-12-14 Thread Jonathan Wakely
On 14 December 2012 18:10, Steve Ellcey wrote:
> Here are four more C++ tests that fail for me when run under the GNU
> simulator.  I would like to shrink them to use less memory in the same
> way as the other tests that I modified earlier.
>
> OK to checkin?

OK, thanks.


[patch, testsuite, committed] Obvious fix for targets with no profiling

2012-12-14 Thread Steve Ellcey
I am going to check in this patch as the obvious fix for PR 55688.  I
tested it on mips-mti-elf.

Steve Ellcey
sell...@mips.com


2012-12-14  Steve Ellcey  

PR regression/55688
* g++.dg/other/pr55650.C: Add dg-require-profiling.


diff --git a/gcc/testsuite/g++.dg/other/pr55650.C 
b/gcc/testsuite/g++.dg/other/pr55650.C
index fc52b19..c565b06 100644
--- a/gcc/testsuite/g++.dg/other/pr55650.C
+++ b/gcc/testsuite/g++.dg/other/pr55650.C
@@ -1,5 +1,6 @@
 // PR gcov-profile/55650
 // { dg-do link }
+// { dg-require-profiling "-fprofile-generate" }
 // { dg-options "-O2 -fprofile-generate" }
 // { dg-additional-sources "pr55650.cc" }
 


[google][4.7] Allow function reordering linker plugin to separate hot and cold code into different ELF segments

2012-12-14 Thread Sriraman Tallam
Hi Rong,

Please review this code. This code allows the function reordering
plugin to separate hot and cold code into different ELF segments.
This would allow optimizations like mapping the hot code alone to huge
pages.

With this patch, by default, the plugin maps .text.unlikely
sections into a separate ELF segment.  This can be turned off with
plugin option "--segment=none".

The include/plugin-api.h changes are a backport from trunk.

Thanks,
-Sri.
Index: function_reordering_plugin/function_reordering_plugin.c
===
--- function_reordering_plugin/function_reordering_plugin.c (revision 
194505)
+++ function_reordering_plugin/function_reordering_plugin.c (working copy)
@@ -74,6 +74,9 @@ static ld_plugin_get_input_section_name get_input_
 static ld_plugin_get_input_section_contents get_input_section_contents = NULL;
 static ld_plugin_update_section_order update_section_order = NULL;
 static ld_plugin_allow_section_ordering allow_section_ordering = NULL;
+static ld_plugin_allow_unique_segment_for_sections 
+allow_unique_segment_for_sections = NULL;
+static ld_plugin_unique_segment_for_sections unique_segment_for_sections = 
NULL;
 
 /* The file where the final function order will be stored.
It can be set by using the  plugin option  as --plugin-opt
@@ -86,6 +89,10 @@ static int is_api_exist = 0;
 /* The plugin does nothing when no-op is 1.  */
 static int no_op = 0;
 
+/* The plugin does not create a new segment for unlikely code if
+   no_segment is set.  */
+static int no_segment = 0;
+
 /* Copies new output file name out_file  */
 void get_filename (const char *name)
 {
@@ -96,12 +103,14 @@ void get_filename (const char *name)
 /* Process options to plugin.  Options with prefix "group=" are special.
They specify the type of grouping. The option "group=none" makes the
plugin do nothing.   Options with prefix "file=" set the output file
-   where the final function order must be stored.  */
+   where the final function order must be stored.  Option "segment=none"
+   does not place the cold code in a separate ELF segment.  */
 void
 process_option (const char *name)
 {
   const char *option_group = "group=";
   const char *option_file = "file=";
+  const char *option_segment = "segment=";
 
   /* Check if option is "group="  */
   if (strncmp (name, option_group, strlen (option_group)) == 0)
@@ -120,6 +129,16 @@ process_option (const char *name)
   return;
 }
 
+  /* Check if options is "segment=none"  */
+  if (strncmp (name, option_segment, strlen (option_segment)) == 0)
+{
+  if (strcmp (name + strlen (option_segment), "none") == 0)
+   no_segment = 1;
+  else
+   no_segment = 0;
+  return;
+}
+
   /* Unknown option, set no_op to 1.  */
   no_op = 1;
   fprintf (stderr, "Unknown option to function reordering plugin :%s\n",
@@ -169,6 +188,13 @@ onload (struct ld_plugin_tv *tv)
case LDPT_ALLOW_SECTION_ORDERING:
  allow_section_ordering = *entry->tv_u.tv_allow_section_ordering;
  break;
+   case LDPT_ALLOW_UNIQUE_SEGMENT_FOR_SECTIONS:
+ allow_unique_segment_for_sections
+ = *entry->tv_u.tv_allow_unique_segment_for_sections;
+ break;
+   case LDPT_UNIQUE_SEGMENT_FOR_SECTIONS:
+ unique_segment_for_sections = 
*entry->tv_u.tv_unique_segment_for_sections;
+ break;
 default:
   break;
 }
@@ -183,7 +209,9 @@ onload (struct ld_plugin_tv *tv)
   && get_input_section_name != NULL
   && get_input_section_contents != NULL
   && update_section_order != NULL
-  && allow_section_ordering != NULL)
+  && allow_section_ordering != NULL
+  && allow_unique_segment_for_sections != NULL
+  && unique_segment_for_sections != NULL)
 is_api_exist = 1;
   else
 return LDPS_OK;
@@ -216,6 +244,7 @@ claim_file_hook (const struct ld_plugin_input_file
 {
   /* Inform the linker to prepare for section reordering.  */
   (*allow_section_ordering) ();
+  (*allow_unique_segment_for_sections) ();
   is_ordering_specified = 1;
 }
 
@@ -259,6 +288,11 @@ claim_file_hook (const struct ld_plugin_input_file
 /* This function is called by the linker after all the symbols have been read.
At this stage, it is fine to tell the linker the desired function order.  */
 
+/* These globals are set to the start and end of the unlikely function sections
+   in the section list, which can then be mapped to a separate segment.  */
+extern int unlikely_segment_start;
+extern int unlikely_segment_end;
+
 enum ld_plugin_status
 all_symbols_read_hook (void)
 {
@@ -302,7 +336,14 @@ all_symbols_read_hook (void)
   && strcmp (out_file, "stderr") != 0)
 fclose (fp);
   /* Pass the new order of functions to the linker.  */
-  update_section_order (section_list, num_entries);
+  update_section_order (section_list, unlikely_segment_start);
+  assert (num_entries > unlike

[PATCH] Compute and emit working set information from gcov-dump (issue6940061)

2012-12-14 Thread Teresa Johnson
This patch enables the gcov-dump tool to optionally compute and dump
the working set information from the counter histogram, via a new -w option.
This is useful to help understand and tune how the compiler will use
the counter histogram, since it first computes the working set and selects
thresholds based on that.

This required moving the bulk of the compute_working_sets functionality
into gcov-io.c so that it was accessible by gcov-dump.c.

Bootstrapped and tested on x86_64-unknown-linux-gnu, and tested with various
gcda files. Ok for trunk?

2012-12-14  Teresa Johnson  

* lto-cgraph.c (input_symtab): Replace call to compute_working_sets
to get_working_sets.
* gcov-io.c (compute_working_sets): Moved most of body of old
compute_working_sets here from profile.c.
* gcov-io.h (NUM_GCOV_WORKING_SETS): Moved here from profile.c.
(gcov_working_set_t): Moved typedef here from basic-block.h
(compute_working_set): Declare.
* profile.c (NUM_GCOV_WORKING_SETS): Moved to gcov-io.h.
(get_working_sets): Renamed from compute_working_set,
replace most of body with call to new compute_working_sets.
(get_exec_counts): Replace call to compute_working_sets
to get_working_sets.
* profile.h (get_working_sets): Renamed from
compute_working_set.
* basic-block.h (gcov_working_set_t): Moved to gcov-io.h.
* gcov-dump.c (dump_working_sets): New function.

Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 194502)
+++ lto-cgraph.c(working copy)
@@ -1457,7 +1457,7 @@ input_symtab (void)
 }
 
   merge_profile_summaries (file_data_vec);
-  compute_working_sets ();
+  get_working_sets ();
 
 
   /* Clear out the aux field that was used to store enough state to
Index: gcov-io.c
===
--- gcov-io.c   (revision 194502)
+++ gcov-io.c   (working copy)
@@ -806,3 +806,110 @@ static void gcov_histogram_merge (gcov_bucket_type
   memcpy(tgt_histo, tmp_histo, sizeof (gcov_bucket_type) * 
GCOV_HISTOGRAM_SIZE);
 }
 #endif /* !IN_GCOV */
+
+/* This is used by gcov-dump (IN_GCOV == -1) and in the compiler
+   (!IN_GCOV && !IN_LIBGCOV).  */
+#if IN_GCOV <= 0 && !IN_LIBGCOV
+/* Compute the working set information from the counter histogram in
+   the profile summary. This is an array of information corresponding to a
+   range of percentages of the total execution count (sum_all), and includes
+   the number of counters required to cover that working set percentage and
+   the minimum counter value in that working set.  */
+
+GCOV_LINKAGE void
+compute_working_sets (const struct gcov_ctr_summary *summary,
+  gcov_working_set_t *gcov_working_sets)
+{
+  gcov_type working_set_cum_values[NUM_GCOV_WORKING_SETS];
+  gcov_type ws_cum_hotness_incr;
+  gcov_type cum, tmp_cum;
+  const gcov_bucket_type *histo_bucket;
+  unsigned ws_ix, c_num, count;
+  int h_ix;
+
+  /* Compute the amount of sum_all that the cumulative hotness grows
+ by in each successive working set entry, which depends on the
+ number of working set entries.  */
+  ws_cum_hotness_incr = summary->sum_all / NUM_GCOV_WORKING_SETS;
+
+  /* Next fill in an array of the cumulative hotness values corresponding
+ to each working set summary entry we are going to compute below.
+ Skip 0% statistics, which can be extrapolated from the
+ rest of the summary data.  */
+  cum = ws_cum_hotness_incr;
+  for (ws_ix = 0; ws_ix < NUM_GCOV_WORKING_SETS;
+   ws_ix++, cum += ws_cum_hotness_incr)
+working_set_cum_values[ws_ix] = cum;
+  /* The last summary entry is reserved for (roughly) 99.9% of the
+ working set. Divide by 1024 so it becomes a shift, which gives
+ almost exactly 99.9%.  */
+  working_set_cum_values[NUM_GCOV_WORKING_SETS-1]
+  = summary->sum_all - summary->sum_all/1024;
+
+  /* Next, walk through the histogram in decending order of hotness
+ and compute the statistics for the working set summary array.
+ As histogram entries are accumulated, we check to see which
+ working set entries have had their expected cum_value reached
+ and fill them in, walking the working set entries in increasing
+ size of cum_value.  */
+  ws_ix = 0; /* The current entry into the working set array.  */
+  cum = 0; /* The current accumulated counter sum.  */
+  count = 0; /* The current accumulated count of block counters.  */
+  for (h_ix = GCOV_HISTOGRAM_SIZE - 1;
+   h_ix >= 0 && ws_ix < NUM_GCOV_WORKING_SETS; h_ix--)
+{
+  histo_bucket = &summary->histogram[h_ix];
+
+  /* If we haven't reached the required cumulative counter value for
+ the current working set percentage, simply accumulate this histogram
+ entry into the running sums and continue to the next histogram
+ entry.  */
+  if (cum + histo_bucket->cum_value < worki

Re: [PATCH] Use new dump scheme for loop unroll passes

2012-12-14 Thread Sharad Singhai
Teresa,

Yes, I didn't take enhancements in google branches into account while
porting this patch. In light of these comments, I withdraw this patch
and will wait for your patch. Once your patch is in, I will update
this patch for regular dumps.

To answer your other question, yes, the new dump infrastructure can
dump to either dump file or opt-info streams (or both) depending upon
dump_flags. If the dump_flags contain TDF_* flags then the dump
happens on regular dump files, if dump_flags contain MSG_* flags then
the dump happens on opt-info stream.

Thanks,
Sharad


On Thu, Dec 13, 2012 at 9:37 PM, Teresa Johnson  wrote:
> On Thu, Dec 13, 2012 at 8:58 PM, Xinliang David Li  wrote:
>> A couple of comments:
>>
>> 1) please dump with source location if possible. What is the use of
>> the information if there is no line number
>
> The google branches have the code to identify a source location of the
> loop, and a similar message to the one proposed below (which uses the
> inform() interface on the google branches). I have a trunk patch ready
> to submit with this ported to the new dumping infrastructure, which I
> was going to submit after Sharad's patch. Sharad, do you want me to
> submit that one first, then it can be leveraged if you want to extend
> the messages? But I agree with David in that I think the bulk of these
> types of messages should stay in the dump file and not be emitted by
> -fopt-info because they are too verbose and low-level. Can the new
> dumping infrastructure be used to just dump to the dump file and not
> via -fopt-info?
>
> Teresa
>
>> 2) Please do not use the existing dump report -- Loop 1,2,3 etc means
>> nothing to user
>> 3) The optimization report should be standardized with some template
>> (similar to informational notes):
>>
>> file line:column note:  is ed >
>>   where  is a source construct such as a loop, a branch, a
>> function etc, while  is the transformation such as 'vectorized',
>> 'unrolled', 'peeled', 'if converted', 'hoisted' etc. Additional
>> information can be something to describe more about the transformation
>> and the source construct. For instance, unrolled N times, unrolled
>> completely,   and profile information of the loop (entry count,
>> average trip count etc). The addtitional information needs to be
>> concise. Please do *not* dump with verbosity as you proposed
>> (including the size, induction variable folding, exit condition
>> elimination etc).
>>
>> 4) the existing dump (into the dump file) can be changed to use the
>> same dump format above
>> 5) For loop unroll/peeling, the dumping code can be refactorized using
>> one report function -- see the code in google branch
>>
>> 6) do not forget the tree level unroller.
>>
>> David
>>
>> On Thu, Dec 13, 2012 at 6:15 PM, Sharad Singhai  wrote:
>>> Hi,
>>>
>>> As per discussion in http://gcc.gnu.org/ml/gcc/2012-12/msg00056.html,
>>> the attached patch updates loop unroll passes to use new dump
>>> infrastructure.
>>>
>>> This patch filters relevant dump messages into the following
>>> three categories
>>>
>>> - optimized: an optimization was successfully applied
>>> - missed: an optimization was missed due to the described reason
>>> - note: other relevant/detailed info during optimization. For example,
>>>   loop unrolling computes the loop bounds and size.
>>>
>>> Two sample outputs from one of the gcc tests (gcc.dg/unroll_1.c) are below.
>>>
>>> Sample 1
>>> -- info about optimized loops via
>>> "-fopt-info-loop-optimized" ---
>>> $ gcc gcc.dg/unroll_1.c -fno-diagnostics-show-caret -O2 -S
>>> -fdump-rtl-loop2_unroll -funroll-loops -fopt-info-loop-optimized
>>>
>>> Unrolled loop 1 completely (duplicated 2 times).
>>> Exit condition of peeled iterations was eliminated.
>>> Last iteration exit edge was proved true.
>>> Unrolled loop 1 completely (duplicated 2 times).
>>> Exit condition of peeled iterations was eliminated.
>>> Last iteration exit edge was proved true.
>>> 
>>>
>>> Sample 2:
>>> --- All available loop optimization info, i.e., optimized+missed+note
>>> via "-fopt-info-loop" ---
>>> $ gcc gcc.dg/unroll_1.c -fno-diagnostics-show-caret -O2 -S
>>> -fdump-rtl-loop2_unroll -funroll-loops -fopt-info-loop
>>>
>>> Loop 1 iterates 2 times.
>>> Loop 1 iterates at most 2 times.
>>> Estimating sizes for loop 1
>>>  BB: 4, after_exit: 0
>>>   size:   2 if (i_1 <= 1)
>>>Exit condition will be eliminated in peeled copies.
>>>  BB: 3, after_exit: 1
>>>   size:   1 _5 = b[i_1];
>>>   size:   1 _6 = _5 + 1;
>>>   size:   1 a[i_1] = _6;
>>>   size:   1 i_8 = i_1 + 1;
>>>Induction variable computation will be folded away.
>>> size: 6-3, last_iteration: 2-0
>>>   Loop size: 6
>>>   Estimated size after unrolling: 5
>>> Unrolled loop 1 completely (duplicated 2 times).
>>> Exit condition of peeled iterations was eliminated.
>>> Last iteration exit edge was proved true.
>>> Forced exit to be taken: if (1 == 0)
>>> Loop 1 iterates 2 times.
>>

C++ PATCH for c++/55685 (ICE with sizeof in template)

2012-12-14 Thread Jason Merrill
Here the problem was that tsubst_copy_and_build wasn't guarding the 
SIZEOF_EXPR_TYPE_P code with processing_template_decl like we do in the 
parser.  It's still necessary here because sometimes tsubsting happens 
when we're still in a template.  Fixed by updating the code to match.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit f415d192e2b16069344b5b9bea9c48dc2a217f99
Author: Jason Merrill 
Date:   Fri Dec 14 16:10:39 2012 -0500

	PR c++/55685
	* pt.c (tsubst_copy_and_build): Don't use SIZEOF_EXPR_TYPE_P in
	templates.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 91450d8..a21522b 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -13563,7 +13563,7 @@ tsubst_copy_and_build (tree t,
 	  {
 	if (TREE_CODE (r) != SIZEOF_EXPR || TYPE_P (op1))
 	  {
-		if (TYPE_P (op1))
+		if (!processing_template_decl && TYPE_P (op1))
 		  {
 		r = build_min (SIZEOF_EXPR, size_type_node,
    build1 (NOP_EXPR, op1, error_mark_node));
diff --git a/gcc/testsuite/g++.dg/template/sizeof15.C b/gcc/testsuite/g++.dg/template/sizeof15.C
new file mode 100644
index 000..3298dad3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/sizeof15.C
@@ -0,0 +1,13 @@
+// PR c++/55685
+
+typedef __SIZE_TYPE__ size_t;
+template 
+struct A;
+
+template  struct B
+{
+  static A  x;
+};
+
+template 
+A  B ::x;