[PING, testsuite] Add dot-file scan to test-case

2016-04-16 Thread Tom de Vries

[ was: [PATCH, PR70161] Fix fdump-ipa-all-graph ]

On 18/03/16 10:35, Tom de Vries wrote:

On 18/03/16 10:23, Tom de Vries wrote:

On 15/03/16 12:37, Richard Biener wrote:

On Mon, 14 Mar 2016, Tom de Vries wrote:


Hi,

this patch fixes PR70161, a 4.9/5/6 regression.

Currently when using -fdump-ipa-all-graph, the compiler ICEs in
execute_function_dump when testing for pass->graph_dump_initialized,
because pass == NULL.

The patch fixes:
- the ICE by setting the pass argument in the call to
   execute_function_dump in execute_one_ipa_transform_pass
- a subsequent ICE (triggered with -fipa-pta) by saving, resetting and
   restoring dump_file_name in cgraph_node::get_body, alongside the
   saving and restoring of the dump_file variable.
- the duplicate edges in the subsequently generated dot file by
   ensuring that execute_function_dump is called only once per function
   per pass. [ Note that this bit also has an effect for the normal dump
   files for the ipa passes with transform function. For those functions,
   atm execute_function_dump is called both after execute and after
   transform. With the patch, it's only called after transform. ]

Bootstrapped and reg-tested on x86_64.

OK for stage4?


Ok.



I've added these two test-cases that test the first two fixes.

Committed to trunk as obvious.



This patch adds testing for the last fix.

In order to make scanning lines in a .dot file work, I needed a fix in
dump-suffix to show cp.dot and inline.dot in the test summary:
...
PASS: gcc.dg/pr70161.c scan-ipa-dump-times cp.dot "subgraph" 1
PASS: gcc.dg/pr70161.c scan-ipa-dump-times inline.dot "subgraph" 1
...
Otherwise it would just show 'dot'.
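
The effect of the dump-suffix change can be sketched in C. This is only an illustration under assumptions: it supposes the argument reaching dump-suffix looks like "NNNi.cp.dot" (a pass-number prefix followed by the dump name), and the helper names below are hypothetical stand-ins for the Tcl `string last` and `string first` calls in the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the old dump-suffix:
   return the text after the LAST '.' in ARG.  */
const char *
suffix_after_last_dot (const char *arg)
{
  const char *dot = strrchr (arg, '.');
  return dot ? dot + 1 : arg;
}

/* Hypothetical stand-in for the patched dump-suffix:
   return the text after the FIRST '.' in ARG.  */
const char *
suffix_after_first_dot (const char *arg)
{
  const char *dot = strchr (arg, '.');
  return dot ? dot + 1 : arg;
}

/* Small self-check; "NNNi.cp.dot" is an assumed argument shape.  */
void
check_dump_suffix (void)
{
  assert (strcmp (suffix_after_last_dot ("NNNi.cp.dot"), "dot") == 0);
  assert (strcmp (suffix_after_first_dot ("NNNi.cp.dot"), "cp.dot") == 0);
}
```

Under the same assumption, a name with a single dot such as "NNNi.cp" yields "cp" from both helpers, so scans of ordinary dump files would be reported as before.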

Bootstrapped and reg-tested on x86_64.

OK for stage4 trunk, 4.9/5 release branches?



Ping.

Thanks,
- Tom


0004-Add-dot-file-scans-to-pr70161.c.patch


Add dot-file scans to pr70161.c

2016-03-18  Tom de Vries  

* gcc.dg/pr70161.c: Add dot-file scans.
* lib/scandump.exp (dump-suffix): Return suffix after first dot char,
instead of after last dot char.

---
  gcc/testsuite/gcc.dg/pr70161.c | 3 +++
  gcc/testsuite/lib/scandump.exp | 2 +-
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr70161.c b/gcc/testsuite/gcc.dg/pr70161.c
index 0b173c7..9b77d90 100644
--- a/gcc/testsuite/gcc.dg/pr70161.c
+++ b/gcc/testsuite/gcc.dg/pr70161.c
@@ -5,3 +5,6 @@ void
  foo (void)
  {
  }
+
+/* { dg-final { scan-ipa-dump-times "subgraph" 1 "inline.dot" } } */
+/* { dg-final { scan-ipa-dump-times "subgraph" 1 "cp.dot" } } */
diff --git a/gcc/testsuite/lib/scandump.exp b/gcc/testsuite/lib/scandump.exp
index 74d27cc..89b3944 100644
--- a/gcc/testsuite/lib/scandump.exp
+++ b/gcc/testsuite/lib/scandump.exp
@@ -22,7 +22,7 @@
  # Extract the constant part of the dump file suffix from the regexp.
  # Argument 0 is the regular expression.
  proc dump-suffix { arg } {
-set idx [expr [string last "." $arg] + 1]
+set idx [expr [string first "." $arg] + 1]
  return [string range $arg $idx end]
  }






[PATCH] Fix PR c++/27100

2016-04-16 Thread Patrick Palka
The problem here is that duplicate_decls doesn't preserve the
DECL_PENDING_INLINE_P flag of the old decl in the new decl.  This
happens only when a friend function is defined inside a class and then
redeclared.  The initial definition sets the DECL_PENDING_INLINE_P flag,
but the subsequent redeclaration doesn't retain the flag.

This patch makes duplicate_decls retain the DECL_PENDING_INLINE_P flag
from the old decl to the new decl.

Bootstrapped + regtested on x86_64-pc-linux-gnu.  Does this look OK to
commit?

gcc/cp/ChangeLog:

PR c++/27100
* decl.c (duplicate_decls): Properly copy the
DECL_PENDING_INLINE_P, DECL_PENDING_INLINE_INFO and
DECL_SAVED_FUNCTION_DATA fields from OLDDECL to NEWDECL.

gcc/testsuite/ChangeLog:

PR c++/27100
* g++.dg/other/friend6.C: New test.
---
 gcc/cp/decl.c| 13 +++--
 gcc/testsuite/g++.dg/other/friend6.C | 15 +++
 2 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/other/friend6.C

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index c1ad52f..4f79c71 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -2375,8 +2375,17 @@ duplicate_decls (tree newdecl, tree olddecl, bool newdecl_is_friend)
}
   else
{
- if (DECL_PENDING_INLINE_INFO (newdecl) == 0)
-   DECL_PENDING_INLINE_INFO (newdecl) = DECL_PENDING_INLINE_INFO (olddecl);
+ if (DECL_PENDING_INLINE_P (olddecl))
+   {
+ DECL_PENDING_INLINE_P (newdecl) = 1;
+ DECL_PENDING_INLINE_INFO (newdecl)
+   = DECL_PENDING_INLINE_INFO (olddecl);
+   }
+ else if (DECL_PENDING_INLINE_P (newdecl))
+   ;
+ else if (DECL_SAVED_FUNCTION_DATA (newdecl) == NULL)
+   DECL_SAVED_FUNCTION_DATA (newdecl)
+ = DECL_SAVED_FUNCTION_DATA (olddecl);
 
  DECL_DECLARED_INLINE_P (newdecl) |= DECL_DECLARED_INLINE_P (olddecl);
 
diff --git a/gcc/testsuite/g++.dg/other/friend6.C b/gcc/testsuite/g++.dg/other/friend6.C
new file mode 100644
index 000..5f593a1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/other/friend6.C
@@ -0,0 +1,15 @@
+// PR c++/27100
+// This used to fail at link time with an "undefined reference to 'foo'" error.
+// { dg-do link }
+
+struct A
+{
+  friend void foo (const A&) { }
+  friend void foo (const A&);
+};
+
+int
+main ()
+{
+  foo (A ());
+}
-- 
2.8.1.231.g95ac767



[RFC][PR61839]Convert CST BINOP COND_EXPR to COND_EXPR ? (CST BINOP 1) : (CST BINOP 0)

2016-04-16 Thread kugan

As explained in PR61839,

the following difference results in extra instructions:
-  c = b != 0 ? 486097858 : 972195717;
+  c = a + 972195718 >> (b != 0);

As suggested in PR, attached patch converts CST BINOP COND_EXPR to 
COND_EXPR ? (CST BINOP 1) : (CST BINOP 0).
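
The equivalence behind this transformation can be checked with a small C sketch. It is only an illustration, not the patch itself: the function names are hypothetical, and the constant 972195717 is assumed to be what `a + 972195718` evaluates to in the PR's test case.

```c
#include <assert.h>

/* Original form: a constant shifted by the 0/1 result of a comparison,
   i.e. CST BINOP (b != 0).  */
int
shift_by_compare (int b)
{
  return 972195717 >> (b != 0);
}

/* Transformed form: COND_EXPR ? (CST BINOP 1) : (CST BINOP 0),
   with both arms folded to constants.  */
int
select_folded (int b)
{
  return b != 0 ? 486097858 : 972195717;
}

/* Both forms agree for zero and non-zero inputs.  */
void
check_cond_fold (void)
{
  int inputs[] = { -1, 0, 1, 42 };
  for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
    assert (shift_by_compare (inputs[i]) == select_folded (inputs[i]));
}
```

Because the comparison result can only be 0 or 1, folding the binary operation for each of the two values is enough to cover every input.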


Bootstrapped and regression tested on x86-64-linux-gnu with no new
regressions. Is this OK for stage-1?


Thanks,
Kugan

gcc/ChangeLog:

2016-04-17  Kugan Vivekanandarajah  

* tree-vrp.c (simplify_stmt_using_ranges): Convert CST BINOP COND_EXPR
to COND_EXPR ? (CST BINOP 1) : (CST BINOP 0) when possible.
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index bbdf9ce..caf7a2a 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9902,6 +9902,49 @@ simplify_stmt_using_ranges (gimple_stmt_iterator *gsi)
 {
   enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
   tree rhs1 = gimple_assign_rhs1 (stmt);
+  tree rhs2 = gimple_assign_rhs2 (stmt);
+  tree var;
+
+  /* Convert:
+COND_RES = X COMPARE Y
+TMP = (CAST) COND_RES
+LHS = CST BINOP TMP
+
+To:
+LHS = COND_RES ? (CST BINOP 1) : (CST BINOP 0) */
+
+  if (TREE_CODE_CLASS (rhs_code) == tcc_binary
+ && TREE_CODE (rhs1) == INTEGER_CST
+ && TREE_CODE (rhs2) == SSA_NAME
+ && is_gimple_assign (SSA_NAME_DEF_STMT (rhs2))
+ && gimple_assign_rhs_code (SSA_NAME_DEF_STMT (rhs2)) == NOP_EXPR
+ && (var = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (rhs2)))
+ && TREE_CODE (var) == SSA_NAME
+ && is_gimple_assign (SSA_NAME_DEF_STMT (var))
+ && TREE_CODE_CLASS (gimple_assign_rhs_code (SSA_NAME_DEF_STMT (var)))
+ == tcc_comparison)
+
+   {
+ value_range *vr = get_value_range (var);
+ if (range_int_cst_p (vr)
+ && integer_zerop (vr->min)
+ && integer_onep (vr->max))
+   {
+
+ tree new_rhs1 =  int_const_binop (rhs_code, rhs1, vr->max);
+ tree new_rhs2 =  int_const_binop (rhs_code, rhs1, vr->min);
+
+ if (new_rhs1 && new_rhs2)
+   {
+ gimple_assign_set_rhs_with_ops (gsi,
+ COND_EXPR, var,
+ new_rhs1,
+ new_rhs2);
+ update_stmt (gsi_stmt (*gsi));
+ return true;
+   }
+   }
+   }
 
   switch (rhs_code)
{


[PATCH] Fix missed DSE opportunity with operator delete.

2016-04-16 Thread Mikhail Maltsev
Hi, all!

Currently GCC can optimize away the following dead store:

void test(char *x)
{
  *x = 1;
  free(x);
}

but not this one (Clang handles both cases):

void test(char *x)
{
  *x = 1;
  delete x;
}

The attached patch fixes this by introducing a new __attribute__((free)). I
first tried to add new built-ins for each version of operator delete (there are
four of them), but it looked a little clumsy, and would require some special
handling for warning about taking address of built-in function.

Is such an approach (i.e. adding a new attribute) OK? Bootstrapped and
regtested on x86_64-pc-linux-gnu.

-- 
Regards,
Mikhail Maltsev

gcc/c/ChangeLog:

2016-04-16  Mikhail Maltsev  

* c-decl.c (merge_decls): Handle free_flag.

gcc/ChangeLog:

2016-04-16  Mikhail Maltsev  

* builtin-attrs.def: Add attribute free.
* builtins.def (free): Add attribute free.
* doc/extend.texi: Document attribute free.
* gtm-builtins.def (_ITM_free): Add attribute free.
* tree-core.h (struct tree_function_decl): Add free_flag.
* tree-ssa-alias.c (ref_maybe_used_by_call_p_1): Handle free_flag.
(call_may_clobber_ref_p_1): Likewise.
(stmt_kills_ref_p): Likewise.
* tree-streamer-in.c (unpack_ts_function_decl_value_fields): Likewise.
* tree-streamer-out.c (pack_ts_function_decl_value_fields): Likewise.
* tree.h (DECL_IS_FREE): New accessor macro.

gcc/testsuite/ChangeLog:

2016-04-16  Mikhail Maltsev  

* g++.dg/opt/op-delete-dse.C: New test.
* gcc.dg/attr-free.c: New test.

gcc/c-family/ChangeLog:

2016-04-16  Mikhail Maltsev  

* c-common.c (handle_free_attribute): New function.

gcc/cp/ChangeLog:

2016-04-16  Mikhail Maltsev  

* decl.c (cxx_init_decl_processing): Set flag_free for operator delete.
diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def
index 089817a..ddaf3e6 100644
--- a/gcc/builtin-attrs.def
+++ b/gcc/builtin-attrs.def
@@ -88,6 +88,7 @@ DEF_ATTR_IDENT (ATTR_CONST, "const")
 DEF_ATTR_IDENT (ATTR_FORMAT, "format")
 DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg")
 DEF_ATTR_IDENT (ATTR_MALLOC, "malloc")
+DEF_ATTR_IDENT (ATTR_FREE, "free")
 DEF_ATTR_IDENT (ATTR_NONNULL, "nonnull")
 DEF_ATTR_IDENT (ATTR_NORETURN, "noreturn")
 DEF_ATTR_IDENT (ATTR_NOTHROW, "nothrow")
@@ -141,6 +142,10 @@ DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC,	\
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LEAF_LIST, ATTR_MALLOC,	\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_FREE_NOTHROW_LIST, ATTR_FREE,		\
+			ATTR_NULL, ATTR_NOTHROW_LIST)
+DEF_ATTR_TREE_LIST (ATTR_FREE_NOTHROW_LEAF_LIST, ATTR_FREE,	\
+			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_SENTINEL_NOTHROW_LIST, ATTR_SENTINEL,	\
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_SENTINEL_NOTHROW_LEAF_LIST, ATTR_SENTINEL,	\
@@ -269,8 +274,10 @@ DEF_ATTR_TREE_LIST (ATTR_TM_NOTHROW_RT_LIST,
 DEF_ATTR_TREE_LIST (ATTR_TMPURE_MALLOC_NOTHROW_LIST,
 		   ATTR_TM_TMPURE, ATTR_NULL, ATTR_MALLOC_NOTHROW_LIST)
 /* Same attributes used for BUILT_IN_FREE except with TM_PURE thrown in.  */
-DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LIST,
-		   ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LIST)
+DEF_ATTR_TREE_LIST (ATTR_TMPURE_FREE_NOTHROW_LIST,
+		   ATTR_TM_TMPURE, ATTR_NULL, ATTR_FREE_NOTHROW_LIST)
+DEF_ATTR_TREE_LIST (ATTR_TMPURE_FREE_NOTHROW_LEAF_LIST,
+		   ATTR_TM_TMPURE, ATTR_NULL, ATTR_FREE_NOTHROW_LEAF_LIST)
 
 DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LEAF_LIST,
 		ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 2fc7f65..e3d1614 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -781,7 +781,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSLL, "ffsll", BT_FN_INT_LONGLONG, ATTR_CONST_
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FORK, "fork", BT_FN_PID, ATTR_NOTHROW_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_FRAME_ADDRESS, "frame_address", BT_FN_PTR_UINT, ATTR_NULL)
 /* [trans-mem]: Adjust BUILT_IN_TM_FREE if BUILT_IN_FREE is changed.  */
-DEF_LIB_BUILTIN(BUILT_IN_FREE, "free", BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
+DEF_LIB_BUILTIN(BUILT_IN_FREE, "free", BT_FN_VOID_PTR, ATTR_FREE_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_FROB_RETURN_ADDR, "frob_return_addr", BT_FN_PTR_PTR, ATTR_NULL)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_GETTEXT, "gettext", BT_FN_STRING_CONST_STRING, ATTR_FORMAT_ARG_1)
 DEF_C99_BUILTIN(BUILT_IN_IMAXABS, "imaxabs", BT_FN_INTMAX_INTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 30c815d..3840675 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -355,6 +355,7 @@ static tree handle_tls_model_attribute (tree *, tree, tree, int,
 static tree handle_no_instrument_function_attribute (tree *, tree,
 		 tree, int, bool *);
 static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
+static t

Re: [wwwdocs,Java] java/index.html -- fix formatting on gcc.gnu.org

2016-04-16 Thread Gerald Pfeifer
On Sun, 10 Apr 2016, Andrew Hughes wrote:
>> That said, looking at the page, and how since 2005 nearly all changes
>> have been maintenance ones from me, is it really worthwhile keeping
>> this (short of historic reasons)?
> I guess the next news will be the removal of GCJ during the
> GCC 7 development period, so its remaining shelf life should
> be limited anyway.

Soo, GCC 6 has branched -- would it make sense for you guys to
start this removal?

Somewhat related, any concerns if I were to remove
https://gcc.gnu.org/java/status.html now?

("Status of GCJ as of GCC 3.2" _really_ is rather old.)

Gerald


[PATCH] lto-streamer.h: Include gimple.h for LAST_AND_UNUSED_GIMPLE_CODE.

2016-04-16 Thread Khem Raj
gcc/:
2016-04-16  Khem Raj  

* lto-streamer.h: Include gimple.h for LAST_AND_UNUSED_GIMPLE_CODE.


Fixes build errors e.g.

| ../../../../../../../work-shared/gcc-6.0.0-r0/git/gcc/lto-streamer.h:159:34: error: 'LAST_AND_UNUSED_GIMPLE_CODE' was not declared in this scope
|LTO_bb0 = 1 + MAX_TREE_CODES + LAST_AND_UNUSED_GIMPLE_CODE,

---
 gcc/lto-streamer.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index f391161..489801b 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -26,6 +26,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "plugin-api.h"
 #include "gcov-io.h"
 #include "diagnostic.h"
+#include "gimple.h"
 
 /* Define when debugging the LTO streamer.  This causes the writer
to output the numeric value for the memory address of the tree node
-- 
2.8.0



Fix pure/const discovery WRT interposition part 2

2016-04-16 Thread Jan Hubicka
Hi,
this patch updates ipa-pure-const.c to only propagate the PURE flag across
calls that do not bind to local defs and are not explicitly declared const.
This gets the memory state into a shape such that a callee produced by
another compiler, and still accessing memory, is safe.

We need similar logic for -fnon-call-exceptions which I will do incrementally.
We also want to track if the original unoptimized body did access memory but
that needs frontend changes because memory accesses may get folded away during
parsing.
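
The merge rule described above can be sketched as a small, self-contained C model. This is a simplified illustration and not the GCC code: the enum, the function name and the two boolean parameters are hypothetical stand-ins for the state lattice, worse_state, TREE_READONLY and binds_to_current_def_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified lattice, ordered from best to worst.  An assumption modelled
   on the patch, not the real GCC enum.  */
enum fn_state { ST_CONST, ST_PURE, ST_NEITHER };

/* Merge CALLEE into *CALLER, keeping the worse of the two states.
   If the callee was only detected as const (not declared so) and may be
   interposed, its CONST is trusted only as PURE.  */
void
merge_worse (enum fn_state *caller, enum fn_state callee,
	     bool declared_const, bool binds_to_current_def)
{
  if (*caller == ST_CONST && callee == ST_CONST
      && !declared_const && !binds_to_current_def)
    callee = ST_PURE;
  if (callee > *caller)
    *caller = callee;
}

/* Self-check of the interposition case and the plain "worse wins" case.  */
void
check_merge_worse (void)
{
  enum fn_state s = ST_CONST;
  merge_worse (&s, ST_CONST, false, false);
  assert (s == ST_PURE);	/* interposable callee: drop to PURE */

  s = ST_CONST;
  merge_worse (&s, ST_CONST, false, true);
  assert (s == ST_CONST);	/* binds locally: CONST is kept */
}
```

The point of the downgrade is that an interposed replacement body may still read memory even when the optimized local body does not, so the caller can be at most PURE.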

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

PR ipa/70018
* cgraph.c (cgraph_set_const_flag_1): Only set as pure if
function does not bind to current def.
* ipa-pure-const.c (worse_state): Add FROM and TO parameters;
handle conservatively calls to functions that do not need to bind
to current def.
(check_call): Update call of worse_state.
(ignore_edge_for_nothrow): Update.
(ignore_edge_for_pure_const): Likewise.
(propagate_pure_const): Update calls to worse_state.
(skip_function_for_local_pure_const): Reformat comments.

* g++.dg/ipa/pure-const-1.C: New testcase.
* g++.dg/ipa/pure-const-2.C: New testcase.
* g++.dg/ipa/pure-const-3.C: New testcase.
Index: cgraph.c
===
--- cgraph.c(revision 235063)
+++ cgraph.c(working copy)
@@ -2393,7 +2393,35 @@ cgraph_set_const_flag_1 (cgraph_node *no
   if (DECL_STATIC_DESTRUCTOR (node->decl))
DECL_STATIC_DESTRUCTOR (node->decl) = 0;
 }
-  TREE_READONLY (node->decl) = data != NULL;
+
+  /* Consider function:
+
+ bool a(int *p)
+ {
+   return *p==*p;
+ }
+
+ During early optimization we will turn this into:
+
+ bool a(int *p)
+ {
+   return true;
+ }
+
+ Now if this function will be detected as CONST however when interposed it
+ may end up being just pure.  We always must assume the worst scenario here.
+   */
+  if (TREE_READONLY (node->decl))
+;
+  else if (node->binds_to_current_def_p ())
+TREE_READONLY (node->decl) = data != NULL;
+  else
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Dropping state to PURE because function does "
+"not bind to current def.\n");
+  DECL_PURE_P (node->decl) = data != NULL;
+}
   DECL_LOOPING_CONST_OR_PURE_P (node->decl) = ((size_t)data & 2) != 0;
   return false;
 }
Index: ipa-pure-const.c
===
--- ipa-pure-const.c(revision 235063)
+++ ipa-pure-const.c(working copy)
@@ -440,12 +440,40 @@ better_state (enum pure_const_state_e *s
 }
 
 /* Merge STATE and STATE2 and LOOPING and LOOPING2 and store
-   into STATE and LOOPING worse of the two variants.  */
+   into STATE and LOOPING worse of the two variants.
+   N is the actual node called.  */
 
 static inline void
 worse_state (enum pure_const_state_e *state, bool *looping,
-enum pure_const_state_e state2, bool looping2)
-{
+enum pure_const_state_e state2, bool looping2,
+struct symtab_node *from,
+struct symtab_node *to)
+{
+  /* Consider function:
+
+ bool a(int *p)
+ {
+   return *p==*p;
+ }
+
+ During early optimization we will turn this into:
+
+ bool a(int *p)
+ {
+   return true;
+ }
+
+ Now if this function will be detected as CONST however when interposed it
+ may end up being just pure.  We always must assume the worst scenario here.
+   */
+  if (*state == IPA_CONST && state2 == IPA_CONST
+  && to && !TREE_READONLY (to->decl) && !to->binds_to_current_def_p (from))
+{
+  if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "Dropping state to PURE because call to %s may not "
+"bind to current def.\n", to->name ());
+  state2 = IPA_PURE;
+}
   *state = MAX (*state, state2);
   *looping = MAX (*looping, looping2);
 }
@@ -546,7 +574,8 @@ check_call (funct_state local, gcall *ca
   if (special_builtin_state (&call_state, &call_looping, callee_t))
{
  worse_state (&local->pure_const_state, &local->looping,
-  call_state, call_looping);
+  call_state, call_looping,
+  NULL, NULL);
  return;
}
   /* When bad things happen to bad functions, they cannot be const
@@ -617,7 +646,7 @@ check_call (funct_state local, gcall *ca
 == (ECF_NORETURN | ECF_NOTHROW))
|| (!flag_exceptions && (flags & ECF_NORETURN)));
   worse_state (&local->pure_const_state, &local->looping,
-  call_state, call_looping);
+  call_state, call_looping, NULL, NULL);
 }
   /* Direct functions calls are handled by IPA propagation.  */
 }
@@ -1134,7 +1161,8 @@ ignore_edge_for_nothrow 

Re: [PATCH 1/4] Add gcc-auto-profile script

2016-04-16 Thread Andi Kleen
Andi Kleen  writes:

Ping for the patch series!

> From: Andi Kleen 
>
> Using autofdo is currently somewhat difficult. It requires using the
> model specific branches taken event, which differs on different CPUs.
> The example shown in the manual requires a special patched version of
> perf that is non standard, and also will likely not work everywhere.
>
> This patch adds a new gcc-auto-profile script that figures out the
> correct event and runs perf. The script is installed on Linux systems.
>
> Since maintaining the script would be somewhat tedious (needs changes
> every time a new CPU comes out) I auto generated it from the online
> Intel event database. The script to do that is in contrib and can be
> rerun.
>
> Right now there is no test if perf works in configure. This
> would vary depending on the build and target system, and since
> it currently doesn't work in virtualization and needs uptodate
> kernel it may often fail in common distribution build setups.
>
> So Linux just hardcodes installing the script, but it may fail at runtime.
>
> This is needed to actually make use of autofdo in a generic way
> in the build system and in the test suite.
>
> So far the script is not installed.
>
> gcc/:
> 2016-03-27  Andi Kleen  
>
>   * doc/invoke.texi: Document gcc-auto-profile
>   * gcc-auto-profile: Create.
>
> contrib/:
>
> 2016-03-27  Andi Kleen  
>
>   * gen_autofdo_event.py: New file to regenerate
>   gcc-auto-profile.
> ---
>  contrib/gen_autofdo_event.py | 155 +++
>  gcc/doc/invoke.texi  |  31 +++--
>  gcc/gcc-auto-profile |  70 +++
>  3 files changed, 251 insertions(+), 5 deletions(-)
>  create mode 100755 contrib/gen_autofdo_event.py
>  create mode 100755 gcc/gcc-auto-profile
>
> diff --git a/contrib/gen_autofdo_event.py b/contrib/gen_autofdo_event.py
> new file mode 100755
> index 000..db4db33
> --- /dev/null
> +++ b/contrib/gen_autofdo_event.py
> @@ -0,0 +1,155 @@
> +#!/usr/bin/python
> +# generate Intel taken branches Linux perf event script for autofdo profiling
> +
> +# Copyright (C) 2016 Free Software Foundation, Inc.
> +#
> +# GCC is free software; you can redistribute it and/or modify it under
> +# the terms of the GNU General Public License as published by the Free
> +# Software Foundation; either version 3, or (at your option) any later
> +# version.
> +#
> +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +# WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +# for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# .  */
> +
> +# run it with perf record -b -e EVENT program ...
> +# The Linux Kernel needs to support the PMU of the current CPU, and
> +# it will likely not work in VMs.
> +# add --all to print for all cpus, otherwise for current cpu
> +# add --script to generate shell script to run correct event
> +#
> +# requires internet (https) access. this may require setting up a proxy
> +# with export https_proxy=...
> +#
> +import urllib2
> +import sys
> +import json
> +import argparse
> +import collections
> +
> +baseurl = "https://download.01.org/perfmon";
> +
> +target_events = (u'BR_INST_RETIRED.NEAR_TAKEN',
> + u'BR_INST_EXEC.TAKEN',
> + u'BR_INST_RETIRED.TAKEN_JCC',
> + u'BR_INST_TYPE_RETIRED.COND_TAKEN')
> +
> +ap = argparse.ArgumentParser()
> +ap.add_argument('--all', '-a', help='Print for all CPUs', action='store_true')
> +ap.add_argument('--script', help='Generate shell script', action='store_true')
> +args = ap.parse_args()
> +
> +eventmap = collections.defaultdict(list)
> +
> +def get_cpu_str():
> +with open('/proc/cpuinfo', 'r') as c:
> +vendor, fam, model = None, None, None
> +for j in c:
> +n = j.split()
> +if n[0] == 'vendor_id':
> +vendor = n[2]
> +elif n[0] == 'model' and n[1] == ':':
> +model = int(n[2])
> +elif n[0] == 'cpu' and n[1] == 'family':
> +fam = int(n[3])
> +if vendor and fam and model:
> +return "%s-%d-%X" % (vendor, fam, model), model
> +return None, None
> +
> +def find_event(eventurl, model):
> +print >>sys.stderr, "Downloading", eventurl
> +u = urllib2.urlopen(eventurl)
> +events = json.loads(u.read())
> +u.close()
> +
> +found = 0
> +for j in events:
> +if j[u'EventName'] in target_events:
> +event = "cpu/event=%s,umask=%s/" % (j[u'EventCode'], j[u'UMask'])
> +if u'PEBS' in j and j[u'PEBS'] > 0:
> +event += "p"
> +if args.script:
> +eventmap[event].append(model)
> +else:
> +   

Re: [PATCH, libgomp] Rewire OpenACC async

2016-04-16 Thread Chung-Lin Tang
Ping.

On 2016/4/8 07:02 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2016/3/29 5:48 PM, Chung-Lin Tang wrote:
>> I've updated this patch for trunk (as attached), and re-tested without
>> regressions. This patch is still a fix for
>> libgomp.oacc-c-c++-common/asyncwait-1.c, which FAILs right now.
>>
>> ChangeLog is still as before. Is this okay for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> On 2015/12/22 4:58 PM, Chung-Lin Tang wrote:
>>> Ping.
>>>
>>> On 2015/11/24 6:27 PM, Chung-Lin Tang wrote:
>>>> Hi, this patch reworks some of the way that asynchronous copyouts are
>>>> implemented for OpenACC in libgomp.
>>>>
>>>> Before this patch, we had a somewhat confusing way of implementing this
>>>> by having two refcounts for each mapping: refcount and async_refcount,
>>>> which I never got working again after the last wave of async regressions
>>>> showed up.
>>>>
>>>> So this patch implements what I believe to be a simplification:
>>>> async_refcount is removed, and instead of trying to queue the async
>>>> copyouts during unmapping we actually do that during the plugin event
>>>> handling. This requires an addition of the async stream integer as an
>>>> argument to the register_async_cleanup plugin hook, but overall I think
>>>> this should be more elegant than before.
>>>>
>>>> This patch fixes the libgomp.oacc-c-c++-common/asyncwait-1.c regression.
>>>> It also fixed data-[23].c regressions before, but some other recent
>>>> check-in happened to already fix those.
>>>>
>>>> Tested without regressions, is this okay for trunk?
>>>>
>>>> Thanks,
>>>> Chung-Lin
>>>>
>>>> 2015-11-24  Chung-Lin Tang
>>>>
>>>> * oacc-plugin.h (GOMP_PLUGIN_async_unmap_vars): Add int parameter.
>>>> * oacc-plugin.c (GOMP_PLUGIN_async_unmap_vars): Add 'int async'
>>>> parameter, use to set async stream around call to gomp_unmap_vars,
>>>> call gomp_unmap_vars() with 'do_copyfrom' set to true.
>>>> * plugin/plugin-nvptx.c (struct ptx_event): Add 'int val' field.
>>>> (event_gc): Adjust event handling loop, collect PTX_EVT_ASYNC_CLEANUP
>>>> events and call GOMP_PLUGIN_async_unmap_vars() for each of them.
>>>> (event_add): Add int parameter, initialize 'val' field when
>>>> adding new ptx_event struct.
>>>> (nvptx_evec): Adjust event_add() call arguments.
>>>> (nvptx_host2dev): Likewise.
>>>> (nvptx_dev2host): Likewise.
>>>> (nvptx_wait_async): Likewise.
>>>> (nvptx_wait_all_async): Likewise.
>>>> (GOMP_OFFLOAD_openacc_register_async_cleanup): Add async parameter,
>>>> pass to event_add() call.
>>>> * oacc-host.c (host_openacc_register_async_cleanup): Add 'int async'
>>>> parameter.
>>>> * oacc-mem.c (gomp_acc_remove_pointer): Adjust async case to
>>>> call openacc.register_async_cleanup_func() hook.
>>>> * oacc-parallel.c (GOACC_parallel_keyed): Likewise.
>>>> * target.c (gomp_copy_from_async): Delete function.
>>>> (gomp_map_vars): Remove async_refcount.
>>>> (gomp_unmap_vars): Likewise.
>>>> (gomp_load_image_to_device): Likewise.
>>>> (omp_target_associate_ptr): Likewise.
>>>> * libgomp.h (struct splay_tree_key_s): Remove async_refcount.
>>>> (acc_dispatch_t.register_async_cleanup_func): Add int parameter.
>>>> (gomp_copy_from_async): Remove.
>>>>
>>>
>>
> 



Re: [PATCH 1/4, libgomp] Resolve deadlock on plugin exit

2016-04-16 Thread Chung-Lin Tang
Ping.

On 2016/3/21 06:21 PM, Chung-Lin Tang wrote:
> Hi, this is the set of patches from 
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01411.html
> revised again, this time also with audits for the HSA plugin.
> 
> The changes are pretty minor, mainly that the unload_image hook now
> receives similar error handling treatment.
> 
> Tested again without regressions for nvptx and intelmic, however
> while I was able to build the toolchain with HSA offloading support, I was
> unsure how I could test it, as I currently don't have any AMD hardware (not
> aware if there's an emulator like intelmic).  I would be grateful if
> the HSA folks can run them for me.
> 
> Thanks,
> Chung-Lin
> 
> ChangeLog for the libgomp proper parts, patch as attached.
> 
> 2016-03-20  Chung-Lin Tang  
> 
> * target.c (gomp_device_copy): New function.
> (gomp_copy_host2dev): Likewise.
> (gomp_copy_dev2host): Likewise.
> (gomp_free_device_memory): Likewise.
> (gomp_map_vars_existing): Adjust to call gomp_copy_host2dev().
> (gomp_map_pointer): Likewise.
> (gomp_map_vars): Adjust to call gomp_copy_host2dev(), handle
> NULL value from alloc_func plugin hook.
> (gomp_unmap_tgt): Adjust to call gomp_free_device_memory().
> (gomp_copy_from_async): Adjust to call gomp_copy_dev2host().
> (gomp_unmap_vars): Likewise.
> (gomp_update): Adjust to call gomp_copy_dev2host() and
> gomp_copy_host2dev() functions.
> (gomp_unload_image_from_device): Handle false value from
> unload_image_func plugin hook.
> (gomp_init_device): Handle false value from init_device_func
> plugin hook.
> (gomp_exit_data): Adjust to call gomp_copy_dev2host().
> (omp_target_free): Adjust to call gomp_free_device_memory().
> (omp_target_memcpy): Handle return values from host2dev_func,
> dev2host_func, and dev2dev_func plugin hooks.
> (omp_target_memcpy_rect_worker): Likewise.
> (gomp_target_fini): Handle false value from fini_device_func
> plugin hook.
> * libgomp.h (struct gomp_device_descr): Adjust return type of
> init_device_func, fini_device_func, unload_image_func, free_func,
> dev2host_func,host2dev_func, and dev2dev_func plugin hooks to 'bool'.
> * oacc-host.c (host_init_device): Change return type to bool.
> (host_fini_device): Likewise.
> (host_unload_image): Likewise.
> (host_free): Likewise.
> (host_dev2host): Likewise.
> (host_host2dev): Likewise.
> * oacc-mem.c (acc_free): Handle plugin hook fatal error case.
> (acc_memcpy_to_device): Likewise.
> (acc_memcpy_from_device): Likewise.
> (delete_copyout): Add libfnname parameter, handle free_func
> hook fatal error case.
> (acc_delete): Adjust delete_copyout call.
> (acc_copyout): Likewise.
> (update_dev_host): Move gomp_mutex_unlock to after
> host2dev/dev2host hook calls.
> 



Re: [PATCH, libgomp] Fix deadlock in acc_set_device_type

2016-04-16 Thread Chung-Lin Tang
Ping.

On 2016/3/28 05:45 PM, Chung-Lin Tang wrote:
> Hi Jakub, there's a path for deadlock on acc_device_lock when going
> through the acc_set_device_type() OpenACC library function.
> Basically, the gomp_init_targets_once() function should not be
> called with that held. The attached patch moves it appropriately.
> 
> Also in this patch, there are several cases in acc_* functions
> where gomp_init_targets_once() is guarded by a test of
> !cached_base_dev. Since that function already uses pthread_once() to
> call gomp_target_init(), and technically cached_base_dev
> is protected by acc_device_lock, the cleanest way should be to
> simply drop those "if(!cached_base_dev)" tests.
> 
> Tested libgomp without regressions on an nvptx offloaded system,
> is this okay for trunk?
> 
> Thanks,
> Chung-Lin
> 
> 2016-03-28  Chung-Lin Tang  
> 
> * oacc-init.c (acc_init): Remove !cached_base_dev condition on call to
> gomp_init_targets_once().
> (acc_set_device_type): Remove !cached_base_dev condition on call to
> gomp_init_targets_once(), move call to before acc_device_lock acquire,
> to avoid deadlock.
> (acc_get_device_num): Remove !cached_base_dev condition on call to
> gomp_init_targets_once().
> (acc_set_device_num): Likewise.
> 



Re: [PATCH 3/4, libgomp] Resolve deadlock on plugin exit, HSA plugin parts

2016-04-16 Thread Chung-Lin Tang
On 2016/3/29 09:35 PM, Martin Jambor wrote:
> Hi,
> 
> On Sun, Mar 27, 2016 at 06:26:29PM +0800, Chung-Lin Tang wrote:
>> On 2016/3/25 02:40 AM, Martin Jambor wrote:
>>> On the whole, I am fine with the patch but there are two issues:
>>>
>>> First, and generally, when you change the return type of a function,
>>> you must document what return values mean in the comment of the
>>> function.  Most importantly, it must be immediately apparent whether a
>>> function returns true or false on failure from its comment.  So please
>>> fix that.
>>
>> Thanks, I'll update on that.
>>
>  /* Callback of dispatch queues to report errors.  */
> @@ -454,7 +471,7 @@ queue_callback (hsa_status_t status,
>   hsa_queue_t *queue __attribute__ ((unused)),
>   void *data __attribute__ ((unused)))
>  {
> -  hsa_fatal ("Asynchronous queue error", status);
> +  hsa_error ("Asynchronous queue error", status);
>  }
>>> ...I believe this hunk is wrong.  Errors reported in this way mean
>>> that something is very wrong and generally happen during execution of
>>> code on HSA GPU, i.e. within GOMP_OFFLOAD_run.  And since you left
>>> calls in create_single_kernel_dispatch, which is called as a part of
>>> GOMP_OFFLOAD_run, intact, I believe you actually want to leave
>>> hsa_fatal here too.
>>
>> Yes, a fatal exit is okay within the 'run' hook, since we're not holding
>> the device lock there. I was only trying to audit the
>> GOMP_OFFLOAD_init_device() function, where the queues are created.
>>
>> I'm not familiar with the HSA runtime API; will the callback only be
>> triggered during GPU kernel execution (inside the 'run' hook), and not for
>> example, within hsa_queue_create()? If so, then yes as you advised, the
>> above change to queue_callback() should be reverted.
>>
> 
> The documentation says the callback is "invoked by the HSA runtime for
> every asynchronous event related to the newly created queue."  All
> enumerated situations when the callback is called happen at command
> launch time (i.e. inside a run hook).
> 
> Since creation of the queue is a synchronous event, callback should
> not be invoked if it fails.  But of course, the description does not
> rule out that such failures occur out of the blue at any arbitrary
> time.  But I think this is as improbable as a GOMP_PLUGIN_malloc
> ending up in a fatal error, which is something you do not seem to be
> worried about.
> 
> So please revert the hunk.
> 
> Thanks,
> 
> Martin
> 

Hi Martin, the attached patch reverts that queue_callback() change, and adds
some more descriptions in the comments to reflect the bool return changes.
Please see if they are acceptable.

Thanks,
Chung-Lin

* plugin/plugin-hsa.c (hsa_warn): Adjust 'hsa_error' local variable
to 'hsa_error_msg', for clarity.
(hsa_fatal): Likewise.
(hsa_error): New function.
(init_hsa_context): Change return type to bool, adjust to return
false on error.
(GOMP_OFFLOAD_get_num_devices): Adjust to handle init_hsa_context
return value.
(GOMP_OFFLOAD_init_device): Change return type to bool, adjust to
return false on error.
(get_agent_info): Adjust to return NULL on error.
(destroy_hsa_program): Change return type to bool, adjust to
return false on error.
(GOMP_OFFLOAD_load_image): Adjust to return -1 on error.
(destroy_module): Change return type to bool, adjust to
return false on error.
(GOMP_OFFLOAD_unload_image): Likewise.
(GOMP_OFFLOAD_fini_device): Likewise.
(GOMP_OFFLOAD_alloc): Change to return NULL when called.
(GOMP_OFFLOAD_free): Change to return false when called.
(GOMP_OFFLOAD_dev2host): Likewise.
(GOMP_OFFLOAD_host2dev): Likewise.
(GOMP_OFFLOAD_dev2dev): Likewise.
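
The "return false on error" convention in the entries above can be sketched roughly as follows. This is a minimal, hypothetical model and not libgomp code: `struct device`, its `locked` flag (standing in for a real mutex) and both function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device state; 'locked' stands in for a real device lock.  */
struct device
{
  bool locked;
  bool fail_init;
};

/* Plugin-style hook.  Returns true on success and false on failure.
   On failure it only reports the problem instead of exiting, so the
   caller still gets a chance to release its lock.  */
bool
plugin_init_device (struct device *dev)
{
  if (dev->fail_init)
    {
      fprintf (stderr, "plugin error: device initialization failed\n");
      return false;
    }
  return true;
}

/* Caller.  Returns true on success and false on failure.  The lock is
   always released before the result is handed back, so a later fatal
   exit cannot run while the lock is held.  */
bool
init_device_locked (struct device *dev)
{
  dev->locked = true;
  bool ok = plugin_init_device (dev);
  dev->locked = false;
  return ok;
}
```

With this shape, a caller that wants to abort can do so after `init_device_locked` returns false, at which point the lock is already released.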


Index: plugin/plugin-hsa.c
===
--- plugin/plugin-hsa.c	(revision 235059)
+++ plugin/plugin-hsa.c	(working copy)
@@ -175,10 +175,10 @@ hsa_warn (const char *str, hsa_status_t status)
   if (!debug)
 return;
 
-  const char *hsa_error;
-  hsa_status_string (status, &hsa_error);
+  const char *hsa_error_msg;
+  hsa_status_string (status, &hsa_error_msg);
 
-  fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error);
+  fprintf (stderr, "HSA warning: %s\nRuntime message: %s", str, hsa_error_msg);
 }
 
 /* Report a fatal error STR together with the HSA error corresponding to STATUS
@@ -187,12 +187,25 @@ hsa_warn (const char *str, hsa_status_t status)
 static void
 hsa_fatal (const char *str, hsa_status_t status)
 {
-  const char *hsa_error;
-  hsa_status_string (status, &hsa_error);
+  const char *hsa_error_msg;
+  hsa_status_string (status, &hsa_error_msg);
   GOMP_PLUGIN_fatal ("HSA fatal error: %s\nRuntime message: %s", str,
-		 hsa_error);
+		 hsa_error_msg);
 }
 
+/* Like hsa_fatal, except only report error me