Re: [PATCH] PR78879

2016-12-22 Thread Yuan, Pengfei
> How specifically does this fix the problem.  I suspect you're just 
> papering over the bug by changing the order of thread processing.
> 
> You'll note that when a thread is canceled we call the 
> paths.unordered_remove without incrementing i.  AFAICT you're just 
> changing order in which paths out of the array.
> 
> 
> Jeff

Please check my explanation in
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78879#c11

Thanks!

Yuan, Pengfei


[PATCH] PR78879

2016-12-21 Thread Yuan, Pengfei
Hi,

The following patch fixes
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78879

There are some other invocations of unordered_remove in
tree-ssa-threadupdate.c, which may also need to be replaced
with ordered_remove.

Regards,
Yuan, Pengfei


2016-12-22  Yuan Pengfei  <y...@pku.edu.cn>

PR middle-end/78879
* tree-ssa-threadupdate.c (mark_threaded_blocks): Replace
unordered_remove with ordered_remove.


diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index 5a5f8df..b2e6d7a 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2144,7 +2144,7 @@ mark_threaded_blocks (bitmap threaded_blocks)
}
  else
{
- paths.unordered_remove (i);
+ paths.ordered_remove (i);
  if (dump_file && (dump_flags & TDF_DETAILS))
dump_jump_thread_path (dump_file, *path, false);
  delete_jump_thread_path (path);
@@ -2180,7 +2180,7 @@ mark_threaded_blocks (bitmap threaded_blocks)
  else
{
  e->aux = NULL;
- paths.unordered_remove (i);
+ paths.ordered_remove (i);
  if (dump_file && (dump_flags & TDF_DETAILS))
dump_jump_thread_path (dump_file, *path, false);
  delete_jump_thread_path (path);



Re: PING: [PATCH] Be more conservative in early inliner if FDO is enabled

2016-10-10 Thread Yuan, Pengfei
> On Mon, Oct 10, 2016 at 4:23 AM, Yuan, Pengfei <y...@pku.edu.cn> wrote:
> > Hi,
> >
> > What is the decision on this patch?
> > https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01041.html
> 
> Honza approved the patch already.
> 
> Richard.

Do I need to sign a copyright assignment for the patch?
Moreover, I do not have the permission to commit it.

Regards,
Yuan, Pengfei



PING: [PATCH] Be more conservative in early inliner if FDO is enabled

2016-10-09 Thread Yuan, Pengfei
Hi,

What is the decision on this patch?
https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01041.html

Regards,
Yuan, Pengfei

> A new patch for trunk is attached.
> 
> Regards,
> Yuan, Pengfei
> 
> 
> 2016-09-16  Yuan Pengfei  <y...@pku.edu.cn>
> 
>   * doc/invoke.texi (--param early-inlining-insns-feedback): New.
>   * ipa-inline.c (want_early_inline_function_p): Use
>   PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
>   * params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
>   (PARAM_EARLY_INLINING_INSNS): Change help string accordingly.
> 
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 8eb5eff..6e7659a 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -9124,12 +9124,18 @@ given call expression.  This parameter limits 
> inlining only to call expressions
>  whose probability exceeds the given threshold (in percents).
>  The default value is 10.
>  
>  @item early-inlining-insns
> +@itemx early-inlining-insns-feedback
>  Specify growth that the early inliner can make.  In effect it increases
>  the amount of inlining for code having a large abstraction penalty.
>  The default value is 14.
>  
> +The @option{early-inlining-insns-feedback} parameter is used only when
> +profile feedback-directed optimizations are enabled (by
> +@option{-fprofile-generate} or @option{-fprofile-use}).
> +The default value is 2.
> +
>  @item max-early-inliner-iterations
>  Limit of iterations of the early inliner.  This basically bounds
>  the number of nested indirect calls the early inliner can resolve.
>  Deeper chains are still handled by late inlining.
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 5c9366a..e028c08 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -594,10 +594,17 @@ want_early_inline_function_p (struct cgraph_edge *e)
>  }
>else
>  {
>int growth = estimate_edge_growth (e);
> +  int growth_limit;
>int n;
>  
> +  if ((profile_arc_flag && !flag_test_coverage)
> +   || (flag_branch_probabilities && !flag_auto_profile))
> + growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_FEEDBACK);
> +  else
> + growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS);
> +
>if (growth <= 0)
>   ;
>else if (!e->maybe_hot_p ()
>  && growth > 0)
> @@ -610,9 +617,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
>xstrdup_for_dump (callee->name ()), callee->order,
>growth);
> want_inline = false;
>   }
> -  else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
> +  else if (growth > growth_limit)
>   {
> if (dump_file)
>   fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
>"growth %i exceeds --param early-inlining-insns\n",
> @@ -622,9 +629,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
>growth);
> want_inline = false;
>   }
>else if ((n = num_calls (callee)) != 0
> -&& growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
> +&& growth * (n + 1) > growth_limit)
>   {
> if (dump_file)
>   fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
>"growth %i exceeds --param early-inlining-insns "
> diff --git a/gcc/params.def b/gcc/params.def
> index 79b7dd4..91ea513 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -199,12 +199,20 @@ DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
>  DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
>"ipcp-unit-growth",
>"How much can given compilation unit grow because of the 
> interprocedural constant propagation (in percent).",
>10, 0, 0)
> -DEFPARAM(PARAM_EARLY_INLINING_INSNS,
> -  "early-inlining-insns",
> -  "Maximal estimated growth of function body caused by early inlining of 
> single call.",
> -  14, 0, 0)
> +DEFPARAM (PARAM_EARLY_INLINING_INSNS_FEEDBACK,
> +   "early-inlining-insns-feedback",
> +   "Maximal estimated growth of function body caused by early "
> +   "inlining of single call.  Used when profile feedback-directed "
> +   "optimizations are enabled.",
> +   2, 0, 0)
> +DEFPARAM (PARAM_EARLY_INLINING_INSNS,
> +   "early-inlining-insns",
> +   "Maximal estimated growth of function body caused by early "
> +   "inlining of single call.  Used when profile feedback-directed "
> +   "optimizations are not enabled.",
> +   14, 0, 0)
>  DEFPARAM(PARAM_LARGE_STACK_FRAME,
>"large-stack-frame",
>"The size of stack frame to be considered large.",
>256, 0, 0)



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-25 Thread Yuan, Pengfei
Hi,

May I ask if there is any decision?

Regards,
Yuan, Pengfei

> > > I also like a new param better as it avoids a new magic constant and
> > > makes playing with
> > > it easier (your patch removes the ability to do statistics like you did 
> > > via the
> > > early-inlining-insns parameter by forcing it to two).
> > 
> > Hmm, you are right that you do not know if this particular function will get
> > profile (forgot about that).  Still, please use two params - it is more
> > consistent with what we have now and we may make it profile specific in
> > future..
> > 
> > Honza
> > > 
> > > Thanks,
> > > Richard.
> 
> A new patch for trunk is attached.
> 
> Regards,
> Yuan, Pengfei
> 
> 
> 2016-09-16  Yuan Pengfei  <y...@pku.edu.cn>
> 
>   * doc/invoke.texi (--param early-inlining-insns-feedback): New.
>   * ipa-inline.c (want_early_inline_function_p): Use
>   PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
>   * params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
>   (PARAM_EARLY_INLINING_INSNS): Change help string accordingly.
> 
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 8eb5eff..6e7659a 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -9124,12 +9124,18 @@ given call expression.  This parameter limits 
> inlining only to call expressions
>  whose probability exceeds the given threshold (in percents).
>  The default value is 10.
>  
>  @item early-inlining-insns
> +@itemx early-inlining-insns-feedback
>  Specify growth that the early inliner can make.  In effect it increases
>  the amount of inlining for code having a large abstraction penalty.
>  The default value is 14.
>  
> +The @option{early-inlining-insns-feedback} parameter is used only when
> +profile feedback-directed optimizations are enabled (by
> +@option{-fprofile-generate} or @option{-fprofile-use}).
> +The default value is 2.
> +
>  @item max-early-inliner-iterations
>  Limit of iterations of the early inliner.  This basically bounds
>  the number of nested indirect calls the early inliner can resolve.
>  Deeper chains are still handled by late inlining.
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 5c9366a..e028c08 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -594,10 +594,17 @@ want_early_inline_function_p (struct cgraph_edge *e)
>  }
>else
>  {
>int growth = estimate_edge_growth (e);
> +  int growth_limit;
>int n;
>  
> +  if ((profile_arc_flag && !flag_test_coverage)
> +   || (flag_branch_probabilities && !flag_auto_profile))
> + growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_FEEDBACK);
> +  else
> + growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS);
> +
>if (growth <= 0)
>   ;
>else if (!e->maybe_hot_p ()
>  && growth > 0)
> @@ -610,9 +617,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
>xstrdup_for_dump (callee->name ()), callee->order,
>growth);
> want_inline = false;
>   }
> -  else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
> +  else if (growth > growth_limit)
>   {
> if (dump_file)
>   fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
>"growth %i exceeds --param early-inlining-insns\n",
> @@ -622,9 +629,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
>growth);
> want_inline = false;
>   }
>else if ((n = num_calls (callee)) != 0
> -&& growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
> +&& growth * (n + 1) > growth_limit)
>   {
> if (dump_file)
>   fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
>"growth %i exceeds --param early-inlining-insns "
> diff --git a/gcc/params.def b/gcc/params.def
> index 79b7dd4..91ea513 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -199,12 +199,20 @@ DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
>  DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
>"ipcp-unit-growth",
>"How much can given compilation unit grow because of the 
> interprocedural constant propagation (in percent).",
>10, 0, 0)
> -DEFPARAM(PARAM_EARLY_INLINING_INSNS,
> -  "early-inlining-insns",
> -  "Maximal estimated growth of function body caused by early inlining of 
> single call.&qu

Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-21 Thread Yuan, Pengfei
> > Btw, It occurs to me that then win in code-size might be purely due to the
> > smaller base value for the TU size we use to compute the maximum unit
> > growth with ... any idea how to improve it on this side?  Say, computing
> > the TU size before early optimization (uh, probably not ...)
> >
> > That said, the inliner always completely fills its budged, that is, increase
> > the unit by max-unit-growth?
> 
> What I'm trying to say is that rather than limiting early inlining we should
> maybe decrease inline-unit-growth when FDO is in effect?  Because we
> can better control where the inlining goes.  If there is over 8% reduction
> in size benchmarking (unpatched) compiler on Firefox with FDO and
> --param inline-unit-growth=12 might show if the results are the same.

FYI, with --param inline-unit-growth=12 --param early-inlining-insns=14,
the code size reduction is only 1.1%.

Yuan, Pengfei

> Richard.
> 
> > Richard.



Re: Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-20 Thread Yuan, Pengfei
> >> > Btw, It occurs to me that then win in code-size might be purely due to 
> >> > the
> >> > smaller base value for the TU size we use to compute the maximum unit
> >> > growth with ... any idea how to improve it on this side?  Say, computing
> >> > the TU size before early optimization (uh, probably not ...)
> >> >
> >> > That said, the inliner always completely fills its budged, that is, 
> >> > increase
> >> > the unit by max-unit-growth?
> >>
> >> What I'm trying to say is that rather than limiting early inlining we 
> >> should
> >> maybe decrease inline-unit-growth when FDO is in effect?  Because we
> >> can better control where the inlining goes.  If there is over 8% reduction
> >> in size benchmarking (unpatched) compiler on Firefox with FDO and
> >> --param inline-unit-growth=12 might show if the results are the same.
> >>
> >> Richard.
> >>
> >> > Richard.
> >
> > AFAIK, param inline-unit-growth only affects pass_ipa_inline. However, the 
> > code
> > size reduction achieved with my patch is from pass_early_inline.
> 
> Any inlining you don't do at early inlining time will a) decrease the
> size inline-unit-growth
> is computed on b) just delays inlining to IPA inlining (with the now
> more constrained size budget).

Delaying inlining from early inlining to IPA inlining can save size because 
profile
feedback is effective at IPA inlining and inlining of cold functions can be 
avoided.

Otherwise, inlining done at early inlining can not be canceled later.

Yuan, Pengfei

> Richard.
> 
> > Yuan, Pengfei



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-20 Thread Yuan, Pengfei
> > Btw, It occurs to me that then win in code-size might be purely due to the
> > smaller base value for the TU size we use to compute the maximum unit
> > growth with ... any idea how to improve it on this side?  Say, computing
> > the TU size before early optimization (uh, probably not ...)
> >
> > That said, the inliner always completely fills its budged, that is, increase
> > the unit by max-unit-growth?
> 
> What I'm trying to say is that rather than limiting early inlining we should
> maybe decrease inline-unit-growth when FDO is in effect?  Because we
> can better control where the inlining goes.  If there is over 8% reduction
> in size benchmarking (unpatched) compiler on Firefox with FDO and
> --param inline-unit-growth=12 might show if the results are the same.
> 
> Richard.
> 
> > Richard.

AFAIK, param inline-unit-growth only affects pass_ipa_inline. However, the code
size reduction achieved with my patch is from pass_early_inline.

Yuan, Pengfei


Re: Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-20 Thread Yuan, Pengfei
> > > > I also like a new param better as it avoids a new magic constant and
> > > > makes playing with
> > > > it easier (your patch removes the ability to do statistics like you did 
> > > > via the
> > > > early-inlining-insns parameter by forcing it to two).
> > > 
> > > Hmm, you are right that you do not know if this particular function will 
> > > get
> > > profile (forgot about that).  Still, please use two params - it is more
> > > consistent with what we have now and we may make it profile specific in
> > > future..
> > > 
> > > Honza
> > > > 
> > > > Thanks,
> > > > Richard.
> > 
> > A new patch for trunk is attached.
> > 
> > Regards,
> > Yuan, Pengfei
> > 
> > 
> > 2016-09-16  Yuan Pengfei  <y...@pku.edu.cn>
> > 
> > * doc/invoke.texi (--param early-inlining-insns-feedback): New.
> > * ipa-inline.c (want_early_inline_function_p): Use
> > PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
> >     * params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
> > (PARAM_EARLY_INLINING_INSNS): Change help string accordingly.
> 
> OK,
> thanks
> 
> Honza

Do I have to sign an FSF copyright assignment to get this patch applied?

Regards,
Yuan, Pengfei


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Yuan, Pengfei
> > I also like a new param better as it avoids a new magic constant and
> > makes playing with
> > it easier (your patch removes the ability to do statistics like you did via 
> > the
> > early-inlining-insns parameter by forcing it to two).
> 
> Hmm, you are right that you do not know if this particular function will get
> profile (forgot about that).  Still, please use two params - it is more
> consistent with what we have now and we may make it profile specific in
> future..
> 
> Honza
> > 
> > Thanks,
> > Richard.

A new patch for trunk is attached.

Regards,
Yuan, Pengfei


2016-09-16  Yuan Pengfei  <y...@pku.edu.cn>

* doc/invoke.texi (--param early-inlining-insns-feedback): New.
* ipa-inline.c (want_early_inline_function_p): Use
PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
* params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
(PARAM_EARLY_INLINING_INSNS): Change help string accordingly.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8eb5eff..6e7659a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9124,12 +9124,18 @@ given call expression.  This parameter limits inlining 
only to call expressions
 whose probability exceeds the given threshold (in percents).
 The default value is 10.
 
 @item early-inlining-insns
+@itemx early-inlining-insns-feedback
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
 The default value is 14.
 
+The @option{early-inlining-insns-feedback} parameter is used only when
+profile feedback-directed optimizations are enabled (by
+@option{-fprofile-generate} or @option{-fprofile-use}).
+The default value is 2.
+
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds
 the number of nested indirect calls the early inliner can resolve.
 Deeper chains are still handled by late inlining.
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 5c9366a..e028c08 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -594,10 +594,17 @@ want_early_inline_function_p (struct cgraph_edge *e)
 }
   else
 {
   int growth = estimate_edge_growth (e);
+  int growth_limit;
   int n;
 
+  if ((profile_arc_flag && !flag_test_coverage)
+ || (flag_branch_probabilities && !flag_auto_profile))
+   growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_FEEDBACK);
+  else
+   growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS);
+
   if (growth <= 0)
;
   else if (!e->maybe_hot_p ()
   && growth > 0)
@@ -610,9 +617,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
 xstrdup_for_dump (callee->name ()), callee->order,
 growth);
  want_inline = false;
}
-  else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+  else if (growth > growth_limit)
{
  if (dump_file)
fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
 "growth %i exceeds --param early-inlining-insns\n",
@@ -622,9 +629,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
 growth);
  want_inline = false;
}
   else if ((n = num_calls (callee)) != 0
-  && growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+  && growth * (n + 1) > growth_limit)
{
  if (dump_file)
fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
 "growth %i exceeds --param early-inlining-insns "
diff --git a/gcc/params.def b/gcc/params.def
index 79b7dd4..91ea513 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -199,12 +199,20 @@ DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
 DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
 "ipcp-unit-growth",
 "How much can given compilation unit grow because of the 
interprocedural constant propagation (in percent).",
 10, 0, 0)
-DEFPARAM(PARAM_EARLY_INLINING_INSNS,
-"early-inlining-insns",
-"Maximal estimated growth of function body caused by early inlining of 
single call.",
-14, 0, 0)
+DEFPARAM (PARAM_EARLY_INLINING_INSNS_FEEDBACK,
+ "early-inlining-insns-feedback",
+ "Maximal estimated growth of function body caused by early "
+ "inlining of single call.  Used when profile feedback-directed "
+ "optimizations are enabled.",
+ 2, 0, 0)
+DEFPARAM (PARAM_EARLY_INLINING_INSNS,
+ "early-inlining-insns",
+ "Maximal estimated growth of function body caused by early "
+ "inlining of single call.  Used when profile feedback-directed "
+ "optimizations are not enabled.",
+ 14, 0, 0)
 DEFPARAM(PARAM_LARGE_STACK_FRAME,
 "large-stack-frame",
 "The size of stack frame to be considered large.",
 256, 0, 0)



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Yuan, Pengfei
> > Here are the results:
> > 
> > Param  Size (GCC5)Time (GCC5)  Time (GCC7)
> > 0  44686265 (-8.26%)  58.772s  66.332s
> > 1  45692793 (-6.19%)  40.684s  39.220s
> > 2  45556185 (-6.47%)  35.292s  34.328s
> > 3  46251049 (-5.05%)  28.820s  27.136s
> > 4  47028873 (-3.45%)  24.616s  22.200s
> > 5  47495641 (-2.49%)  20.160s  17.800s
> > 6  47520153 (-2.44%)  16.444s  15.656s
> > 14 48708873   5.620s   5.556s
> 
> Thanks for data! I meant to run the benchmark myself, but had little time to 
> do
> it over past week becuase of traveling and was also wondering what to do given
> that spec is rather poor benchmark in this area. Tramp3d is biassed but we are
> in stage1 and can fine tune latter. I am debugging the libxul crashes in FDO
> binary now, so we can re-run talos.

It seems to be memory corruption. The content of a pointer becomes 0xe5e5...e5.
I have tried Firefox 48.0.2, 49b10 and mozilla-central with GCC 6. Same error.

> > Param: value of PARAM_EARLY_INLINING_INSNS
> > Size:  code size (.text) of optimized libxul.so
> > Time:  execution time of instrumented tramp3d (-n 25)
> > 
> > To balance between size reduction of optimized binary and speed penalty
> > of instrumented binary, I set param=6 as baseline and compare:
> > 
> > Param  Size score  Time score  Total
> > 0  3.39-3.57   -0.18
> > 1  2.54-2.470.07
> > 2  2.65-2.150.50
> > 3  2.07-1.750.32
> > 4  1.41-1.50   -0.09
> > 5  1.02-1.23   -0.21
> > 6  1.00-1.000.00
> > 14 0.00-0.34   -0.34
> > 
> > Therefore, I think param=2 is the best choice.
> > 
> > Is the attached patch OK?
> 
> Setting param to 2 looks fine
> 
> > gcc/ChangeLog
> > * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
> > when FDO is enabled.
> > 
> > 
> > diff --git a/gcc/opts.c b/gcc/opts.c
> > index 39c190d..b59c700 100644
> > --- a/gcc/opts.c
> > +++ b/gcc/opts.c
> > @@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
> > gcc_options *opts_set,
> >maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
> >  opts->x_param_values, 
> > opts_set->x_param_values);
> >  }
> > 
> > +  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
> > +  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
> > +  || (opts->x_flag_branch_probabilities && !opts->x_flag_auto_profile))
> > +maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
> > +  opts->x_param_values, opts_set->x_param_values);
> > +
> 
> I would actually preffer to have PARAM_EARLY_ININING_INSNS_FEEDBACK.  We
> already have TRACER_DYNAMIC_COVERAGE_FEEDBACK and other params.  The reason is
> that profile is not a global property of program. It may or may not be
> available for given function, while params are global.

Whether profile is available for specific functions is not important here 
because
profile is loaded after pass_early_inline. Therefore I think setting the param
globally is fine.

> Even at compile time profile may be selectively missing for example for
> COMDATs that did not win in the linking process.
> 
> There is also need to update the documentation.
> Thanks for the work!
> Honza

Here is the new patch.

Regards,
Yuan, Pengfei


gcc/ChangeLog
* doc/invoke.texi (early-inlining-insns): Value is adjusted
when FDO is enabled.
* opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
when FDO is enabled.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1ca4dcc..880a28c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10272,9 +10272,11 @@ The default value is 10.
 
 @item early-inlining-insns
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
-The default value is 14.
+The default value is 14.  When feedback-directed optimizations is enabled (by
+@option{-fprofile-generate} or @option{-fprofile-use}), the value is adjusted
+to 2.
 
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds
 the number of nested indirect calls the early inliner can resolve.
diff --git a/gcc/opts.c b/gcc/opts.c
index 39c190d..b59c700 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
gcc_options *

Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-15 Thread Yuan, Pengfei
> > Setting PARAM_EARLY_INLINING_INSNS to 0 when FDO is enabled should be
> > equivalent to my patch.
> 
> Yes.  This means it's easy to experiment with other values than zero.  
> Basically
> early inlining is supposed to remove abstraction penalty to
> 
>  a) reduce FDO instrumentation overhead
>  b) get more realistic size estimates for the inliner
> 
> a) was particularly important back in time for tramp3d, reducing
> profiling runtime
> 1000-fold.  b) is generally important.
> 
> PARAM_EARLY_INLINING_INSNS is supposed to be a reasonable value to
> get abstraction removed but IIRC we increased it quite a bit to also get more
> early optimization (to get more accurate inliner estimates).
> 
> What I am saying is that reducing PARAM_EARLY_INLINING_INSNS makes
> sense for FDO but reducing it to zero is probably a bit much.
> 
> Can you do your measurements with values between zero and the current
> default of 14 (wow, 14 ... didn't know it's _that_ high currently ;)).
> What's the
> value that crosses the boundary of diminishing returns regarding to code-size
> improvements for you?
> 
> Richard.

Here are the results:

Param  Size (GCC5)Time (GCC5)  Time (GCC7)
0  44686265 (-8.26%)  58.772s  66.332s
1  45692793 (-6.19%)  40.684s  39.220s
2  45556185 (-6.47%)  35.292s  34.328s
3  46251049 (-5.05%)  28.820s  27.136s
4  47028873 (-3.45%)  24.616s  22.200s
5  47495641 (-2.49%)  20.160s  17.800s
6  47520153 (-2.44%)  16.444s  15.656s
14 48708873   5.620s   5.556s

Param: value of PARAM_EARLY_INLINING_INSNS
Size:  code size (.text) of optimized libxul.so
Time:  execution time of instrumented tramp3d (-n 25)

To balance between size reduction of optimized binary and speed penalty
of instrumented binary, I set param=6 as baseline and compare:

Param  Size score  Time score  Total
0  3.39-3.57   -0.18
1  2.54-2.470.07
2  2.65-2.150.50
3  2.07-1.750.32
4  1.41-1.50   -0.09
5  1.02-1.23   -0.21
6  1.00-1.000.00
14 0.00-0.34   -0.34

Therefore, I think param=2 is the best choice.

Is the attached patch OK?

Regards,
Yuan, Pengfei


gcc/ChangeLog
* opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
when FDO is enabled.


diff --git a/gcc/opts.c b/gcc/opts.c
index 39c190d..b59c700 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
gcc_options *opts_set,
   maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
 opts->x_param_values, opts_set->x_param_values);
 }

+  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
+  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
+  || (opts->x_flag_branch_probabilities && !opts->x_flag_auto_profile))
+maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
+  opts->x_param_values, opts_set->x_param_values);
+
   if (opts->x_flag_lto)
 {
 #ifdef ENABLE_LTO
   opts->x_flag_generate_lto = 1;



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-15 Thread Yuan, Pengfei
> Yes.  This means it's easy to experiment with other values than zero.  
> Basically
> early inlining is supposed to remove abstraction penalty to
> 
>  a) reduce FDO instrumentation overhead
>  b) get more realistic size estimates for the inliner
> 
> a) was particularly important back in time for tramp3d, reducing
> profiling runtime
> 1000-fold.  b) is generally important.
> 
> PARAM_EARLY_INLINING_INSNS is supposed to be a reasonable value to
> get abstraction removed but IIRC we increased it quite a bit to also get more
> early optimization (to get more accurate inliner estimates).
> 
> What I am saying is that reducing PARAM_EARLY_INLINING_INSNS makes
> sense for FDO but reducing it to zero is probably a bit much.
> 
> Can you do your measurements with values between zero and the current
> default of 14 (wow, 14 ... didn't know it's _that_ high currently ;)).
> What's the
> value that crosses the boundary of diminishing returns regarding to code-size
> improvements for you?
> 
> Richard.
> 

I see. This procedure may take some time since profile data can not be reused.

Yuan, Pengfei



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-14 Thread Yuan, Pengfei
> I think the approach is reasonable though it might still have
> interesting effects on
> optimization for very small growth.  So for further experimenting it
> would be nice

Test it on SPEC CPU with FDO enabled?

> to have a separate PARAM_EARLY_FDO_INLINING_INSNS or maybe simply
> adjust the PARAM_EARLY_INLINING_INSNS default accordingly when FDO is
> enabled?

Setting PARAM_EARLY_INLINING_INSNS to 0 when FDO is enabled should be
equivalent to my patch.

Regards,
Yuan, Pengfei
 
> I'll let Honza also double-check the condition detecting FDO (it looks
> like we should
> have some abstraction for that).
> 
> Thanks,
> Richard.



[PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-10 Thread Yuan, Pengfei
Hi,

Previously I have sent a patch on profile based option tuning:
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01377.html

According to Richard Biener's advice, I try investigating where the code size
reduction comes from. After analyzing the dumped IL, I figure out that it is
related to function inlining. Some cold functions are inlined regardless of
profile feedback, which increases code size.

The problem is with the early inliner. In want_early_inline_function_p, if the
estimated edge growth > 0, want_inline depends on maybe_hot_p, which usually
returns true unless optimize_size, since profile feedback is not available at
this point. Some functions which may be cold according to profile feedback are
inlined regardlessly, resulting in code size increase.

At first, I come up with a solution that preloads some profile info before
pass_early_inline. But it fails with numerous coverage-mismatch errors in
pass_ipa_tree_profile. Therefore, the proposed patch prevents early inlining
with positive code size growth if FDO is enabled.

Experiment results are as follows:

Setup
  Hardware Core i7-4770, 32GB RAM
  OS   Debian sid amd64
  Compiler GCC 5.4.1 20160907
  Firefox source   mozilla-central, cset 91c2b9d5c135
  Training workloadcss3test.com, html5test.com, Octane benchmark

Vanilla GCC
  Code size (.text of libxul.so)48708873
  Octane benchmark (score)  35828   36618   35847
  Kraken benchmark (time)   939.4ms 964.0ms 951.8ms

Patched GCC
  Code size (.text of libxul.so)44686265
  Octane benchmark (score)  36103   35740   35611
  Kraken benchmark (time)   928.9ms 949.1ms 938.7ms

There is over 8% reduction in code size, while no obvious difference in
performance. The experiment is conducted with GCC 5. There is segmentation
fault when starting Firefox instrumented by GCC 6. GCC 7 encounters ICE when
building Firefox.

Regards,

Yuan, Pengfei


gcc/ChangeLog:
* ipa-inline.c (want_early_inline_function_p): Be more conservative
if FDO is enabled.


diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 7097cf3..8266f97 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -628,6 +628,20 @@ want_early_inline_function_p (struct cgraph_edge *e)
 
   if (growth <= 0)
;
+  /* Profile feedback is not available at this point.
+Be more conservative if FDO is enabled.  */
+  else if ((profile_arc_flag && !flag_test_coverage)
+  || (flag_branch_probabilities && !flag_auto_profile))
+   {
+ if (dump_file)
+   fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
+"FDO is enabled and code would grow by %i\n",
+xstrdup_for_dump (e->caller->name ()),
+e->caller->order,
+xstrdup_for_dump (callee->name ()), callee->order,
+growth);
+ want_inline = false;
+   }
   else if (!e->maybe_hot_p ()
   && growth > 0)
{