Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-25 Thread Yuan, Pengfei
Hi,

May I ask if there is any decision?

Regards,
Yuan, Pengfei

> > > I also like a new param better as it avoids a new magic constant and
> > > makes playing with
> > > it easier (your patch removes the ability to do statistics like you did 
> > > via the
> > > early-inlining-insns parameter by forcing it to two).
> > 
> > Hmm, you are right that you do not know if this particular function will get
> > profile (forgot about that).  Still, please use two params - it is more
> > consistent with what we have now and we may make it profile specific in
> > future..
> > 
> > Honza
> > > 
> > > Thanks,
> > > Richard.
> 
> A new patch for trunk is attached.
> 
> Regards,
> Yuan, Pengfei
> 
> 
> 2016-09-16  Yuan Pengfei  
> 
>   * doc/invoke.texi (--param early-inlining-insns-feedback): New.
>   * ipa-inline.c (want_early_inline_function_p): Use
>   PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
>   * params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
>   (PARAM_EARLY_INLINING_INSNS): Change help string accordingly.
> 
> 
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 8eb5eff..6e7659a 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -9124,12 +9124,18 @@ given call expression.  This parameter limits 
> inlining only to call expressions
>  whose probability exceeds the given threshold (in percents).
>  The default value is 10.
>  
>  @item early-inlining-insns
> +@itemx early-inlining-insns-feedback
>  Specify growth that the early inliner can make.  In effect it increases
>  the amount of inlining for code having a large abstraction penalty.
>  The default value is 14.
>  
> +The @option{early-inlining-insns-feedback} parameter is used only when
> +profile feedback-directed optimizations are enabled (by
> +@option{-fprofile-generate} or @option{-fprofile-use}).
> +The default value is 2.
> +
>  @item max-early-inliner-iterations
>  Limit of iterations of the early inliner.  This basically bounds
>  the number of nested indirect calls the early inliner can resolve.
>  Deeper chains are still handled by late inlining.
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 5c9366a..e028c08 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -594,10 +594,17 @@ want_early_inline_function_p (struct cgraph_edge *e)
>  }
>else
>  {
>int growth = estimate_edge_growth (e);
> +  int growth_limit;
>int n;
>  
> +  if ((profile_arc_flag && !flag_test_coverage)
> +   || (flag_branch_probabilities && !flag_auto_profile))
> + growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_FEEDBACK);
> +  else
> + growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS);
> +
>if (growth <= 0)
>   ;
>else if (!e->maybe_hot_p ()
>  && growth > 0)
> @@ -610,9 +617,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
>xstrdup_for_dump (callee->name ()), callee->order,
>growth);
> want_inline = false;
>   }
> -  else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
> +  else if (growth > growth_limit)
>   {
> if (dump_file)
>   fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
>"growth %i exceeds --param early-inlining-insns\n",
> @@ -622,9 +629,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
>growth);
> want_inline = false;
>   }
>else if ((n = num_calls (callee)) != 0
> -&& growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
> +&& growth * (n + 1) > growth_limit)
>   {
> if (dump_file)
>   fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
>"growth %i exceeds --param early-inlining-insns "
> diff --git a/gcc/params.def b/gcc/params.def
> index 79b7dd4..91ea513 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -199,12 +199,20 @@ DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
>  DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
>"ipcp-unit-growth",
>"How much can given compilation unit grow because of the 
> interprocedural constant propagation (in percent).",
>10, 0, 0)
> -DEFPARAM(PARAM_EARLY_INLINING_INSNS,
> -  "early-inlining-insns",
> -  "Maximal estimated growth of function body caused by early inlining of 
> single call.",
> -  14, 0, 0)
> +DEFPARAM (PARAM_EARLY_INLINING_INSNS_FEEDBACK,
> +   "early-inlining-insns-feedback",
> +   "Maximal estimated growth of function body caused by early "
> +   "inlining of single call.  Used when profile feedback-directed "
> +   "optimizations are enabled.",
> +   2, 0, 0)
> +DEFPARAM (PARAM_EARLY_INLINING_INSNS,
> +   "early-inlining-insns",
> +   "Maximal estimated growth of function body caused by early "
> +   "inlining of single call.  Used when profile feedback-directed "
> +   "optimizations are not 

Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-21 Thread Yuan, Pengfei
> > Btw, It occurs to me that then win in code-size might be purely due to the
> > smaller base value for the TU size we use to compute the maximum unit
> > growth with ... any idea how to improve it on this side?  Say, computing
> > the TU size before early optimization (uh, probably not ...)
> >
> > That said, the inliner always completely fills its budged, that is, increase
> > the unit by max-unit-growth?
> 
> What I'm trying to say is that rather than limiting early inlining we should
> maybe decrease inline-unit-growth when FDO is in effect?  Because we
> can better control where the inlining goes.  If there is over 8% reduction
> in size benchmarking (unpatched) compiler on Firefox with FDO and
> --param inline-unit-growth=12 might show if the results are the same.

FYI, with --param inline-unit-growth=12 --param early-inlining-insns=14,
the code size reduction is only 1.1%.

Yuan, Pengfei

> Richard.
> 
> > Richard.



Re: Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-20 Thread Yuan, Pengfei
> >> > Btw, It occurs to me that then win in code-size might be purely due to 
> >> > the
> >> > smaller base value for the TU size we use to compute the maximum unit
> >> > growth with ... any idea how to improve it on this side?  Say, computing
> >> > the TU size before early optimization (uh, probably not ...)
> >> >
> >> > That said, the inliner always completely fills its budged, that is, 
> >> > increase
> >> > the unit by max-unit-growth?
> >>
> >> What I'm trying to say is that rather than limiting early inlining we 
> >> should
> >> maybe decrease inline-unit-growth when FDO is in effect?  Because we
> >> can better control where the inlining goes.  If there is over 8% reduction
> >> in size benchmarking (unpatched) compiler on Firefox with FDO and
> >> --param inline-unit-growth=12 might show if the results are the same.
> >>
> >> Richard.
> >>
> >> > Richard.
> >
> > AFAIK, param inline-unit-growth only affects pass_ipa_inline. However, the 
> > code
> > size reduction achieved with my patch is from pass_early_inline.
> 
> Any inlining you don't do at early inlining time will a) decrease the
> size inline-unit-growth
> is computed on b) just delays inlining to IPA inlining (with the now
> more constrained size budget).

Delaying inlining from early inlining to IPA inlining can save size because 
profile
feedback is effective at IPA inlining and inlining of cold functions can be 
avoided.

Otherwise, inlining done at early inlining can not be canceled later.

Yuan, Pengfei

> Richard.
> 
> > Yuan, Pengfei



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-20 Thread Richard Biener
On Tue, Sep 20, 2016 at 1:57 PM, Yuan, Pengfei  wrote:
>> > Btw, It occurs to me that then win in code-size might be purely due to the
>> > smaller base value for the TU size we use to compute the maximum unit
>> > growth with ... any idea how to improve it on this side?  Say, computing
>> > the TU size before early optimization (uh, probably not ...)
>> >
>> > That said, the inliner always completely fills its budged, that is, 
>> > increase
>> > the unit by max-unit-growth?
>>
>> What I'm trying to say is that rather than limiting early inlining we should
>> maybe decrease inline-unit-growth when FDO is in effect?  Because we
>> can better control where the inlining goes.  If there is over 8% reduction
>> in size benchmarking (unpatched) compiler on Firefox with FDO and
>> --param inline-unit-growth=12 might show if the results are the same.
>>
>> Richard.
>>
>> > Richard.
>
> AFAIK, param inline-unit-growth only affects pass_ipa_inline. However, the 
> code
> size reduction achieved with my patch is from pass_early_inline.

Any inlining you don't do at early inlining time will a) decrease the
size inline-unit-growth
is computed on b) just delays inlining to IPA inlining (with the now
more constrained size budget).

Richard.

> Yuan, Pengfei


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-20 Thread Yuan, Pengfei
> > Btw, It occurs to me that then win in code-size might be purely due to the
> > smaller base value for the TU size we use to compute the maximum unit
> > growth with ... any idea how to improve it on this side?  Say, computing
> > the TU size before early optimization (uh, probably not ...)
> >
> > That said, the inliner always completely fills its budged, that is, increase
> > the unit by max-unit-growth?
> 
> What I'm trying to say is that rather than limiting early inlining we should
> maybe decrease inline-unit-growth when FDO is in effect?  Because we
> can better control where the inlining goes.  If there is over 8% reduction
> in size benchmarking (unpatched) compiler on Firefox with FDO and
> --param inline-unit-growth=12 might show if the results are the same.
> 
> Richard.
> 
> > Richard.

AFAIK, param inline-unit-growth only affects pass_ipa_inline. However, the code
size reduction achieved with my patch is from pass_early_inline.

Yuan, Pengfei


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-20 Thread Richard Biener
On Tue, Sep 20, 2016 at 1:31 PM, Richard Biener
 wrote:
> On Fri, Sep 16, 2016 at 2:00 PM, Jan Hubicka  wrote:
>>> > > I also like a new param better as it avoids a new magic constant and
>>> > > makes playing with
>>> > > it easier (your patch removes the ability to do statistics like you did 
>>> > > via the
>>> > > early-inlining-insns parameter by forcing it to two).
>>> >
>>> > Hmm, you are right that you do not know if this particular function will 
>>> > get
>>> > profile (forgot about that).  Still, please use two params - it is more
>>> > consistent with what we have now and we may make it profile specific in
>>> > future..
>>> >
>>> > Honza
>>> > >
>>> > > Thanks,
>>> > > Richard.
>>>
>>> A new patch for trunk is attached.
>>>
>>> Regards,
>>> Yuan, Pengfei
>>>
>>>
>>> 2016-09-16  Yuan Pengfei  
>>>
>>>   * doc/invoke.texi (--param early-inlining-insns-feedback): New.
>>>   * ipa-inline.c (want_early_inline_function_p): Use
>>>   PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
>>>   * params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
>>>   (PARAM_EARLY_INLINING_INSNS): Change help string accordingly.
>
> Btw, It occurs to me that then win in code-size might be purely due to the
> smaller base value for the TU size we use to compute the maximum unit
> growth with ... any idea how to improve it on this side?  Say, computing
> the TU size before early optimization (uh, probably not ...)
>
> That said, the inliner always completely fills its budged, that is, increase
> the unit by max-unit-growth?

What I'm trying to say is that rather than limiting early inlining we should
maybe decrease inline-unit-growth when FDO is in effect?  Because we
can better control where the inlining goes.  If there is over 8% reduction
in size benchmarking (unpatched) compiler on Firefox with FDO and
--param inline-unit-growth=12 might show if the results are the same.

Richard.

> Richard.
>
>> OK,
>> thanks
>>
>> Honza


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-20 Thread Richard Biener
On Fri, Sep 16, 2016 at 2:00 PM, Jan Hubicka  wrote:
>> > > I also like a new param better as it avoids a new magic constant and
>> > > makes playing with
>> > > it easier (your patch removes the ability to do statistics like you did 
>> > > via the
>> > > early-inlining-insns parameter by forcing it to two).
>> >
>> > Hmm, you are right that you do not know if this particular function will 
>> > get
>> > profile (forgot about that).  Still, please use two params - it is more
>> > consistent with what we have now and we may make it profile specific in
>> > future..
>> >
>> > Honza
>> > >
>> > > Thanks,
>> > > Richard.
>>
>> A new patch for trunk is attached.
>>
>> Regards,
>> Yuan, Pengfei
>>
>>
>> 2016-09-16  Yuan Pengfei  
>>
>>   * doc/invoke.texi (--param early-inlining-insns-feedback): New.
>>   * ipa-inline.c (want_early_inline_function_p): Use
>>   PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
>>   * params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
>>   (PARAM_EARLY_INLINING_INSNS): Change help string accordingly.

Btw, It occurs to me that then win in code-size might be purely due to the
smaller base value for the TU size we use to compute the maximum unit
growth with ... any idea how to improve it on this side?  Say, computing
the TU size before early optimization (uh, probably not ...)

That said, the inliner always completely fills its budged, that is, increase
the unit by max-unit-growth?

Richard.

> OK,
> thanks
>
> Honza


Re: Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-20 Thread Yuan, Pengfei
> > > > I also like a new param better as it avoids a new magic constant and
> > > > makes playing with
> > > > it easier (your patch removes the ability to do statistics like you did 
> > > > via the
> > > > early-inlining-insns parameter by forcing it to two).
> > > 
> > > Hmm, you are right that you do not know if this particular function will 
> > > get
> > > profile (forgot about that).  Still, please use two params - it is more
> > > consistent with what we have now and we may make it profile specific in
> > > future..
> > > 
> > > Honza
> > > > 
> > > > Thanks,
> > > > Richard.
> > 
> > A new patch for trunk is attached.
> > 
> > Regards,
> > Yuan, Pengfei
> > 
> > 
> > 2016-09-16  Yuan Pengfei  
> > 
> > * doc/invoke.texi (--param early-inlining-insns-feedback): New.
> > * ipa-inline.c (want_early_inline_function_p): Use
> > PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
> > * params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
> > (PARAM_EARLY_INLINING_INSNS): Change help string accordingly.
> 
> OK,
> thanks
> 
> Honza

Do I have to sign an FSF copyright assignment to get this patch applied?

Regards,
Yuan, Pengfei


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Jan Hubicka
> > > I also like a new param better as it avoids a new magic constant and
> > > makes playing with
> > > it easier (your patch removes the ability to do statistics like you did 
> > > via the
> > > early-inlining-insns parameter by forcing it to two).
> > 
> > Hmm, you are right that you do not know if this particular function will get
> > profile (forgot about that).  Still, please use two params - it is more
> > consistent with what we have now and we may make it profile specific in
> > future..
> > 
> > Honza
> > > 
> > > Thanks,
> > > Richard.
> 
> A new patch for trunk is attached.
> 
> Regards,
> Yuan, Pengfei
> 
> 
> 2016-09-16  Yuan Pengfei  
> 
>   * doc/invoke.texi (--param early-inlining-insns-feedback): New.
>   * ipa-inline.c (want_early_inline_function_p): Use
>   PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
>   * params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
>   (PARAM_EARLY_INLINING_INSNS): Change help string accordingly.

OK,
thanks

Honza


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Yuan, Pengfei
> > I also like a new param better as it avoids a new magic constant and
> > makes playing with
> > it easier (your patch removes the ability to do statistics like you did via 
> > the
> > early-inlining-insns parameter by forcing it to two).
> 
> Hmm, you are right that you do not know if this particular function will get
> profile (forgot about that).  Still, please use two params - it is more
> consistent with what we have now and we may make it profile specific in
> future..
> 
> Honza
> > 
> > Thanks,
> > Richard.

A new patch for trunk is attached.

Regards,
Yuan, Pengfei


2016-09-16  Yuan Pengfei  

* doc/invoke.texi (--param early-inlining-insns-feedback): New.
* ipa-inline.c (want_early_inline_function_p): Use
PARAM_EARLY_INLINING_INSNS_FEEDBACK when FDO is enabled.
* params.def (PARAM_EARLY_INLINING_INSNS_FEEDBACK): Define.
(PARAM_EARLY_INLINING_INSNS): Change help string accordingly.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8eb5eff..6e7659a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9124,12 +9124,18 @@ given call expression.  This parameter limits inlining 
only to call expressions
 whose probability exceeds the given threshold (in percents).
 The default value is 10.
 
 @item early-inlining-insns
+@itemx early-inlining-insns-feedback
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
 The default value is 14.
 
+The @option{early-inlining-insns-feedback} parameter is used only when
+profile feedback-directed optimizations are enabled (by
+@option{-fprofile-generate} or @option{-fprofile-use}).
+The default value is 2.
+
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds
 the number of nested indirect calls the early inliner can resolve.
 Deeper chains are still handled by late inlining.
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 5c9366a..e028c08 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -594,10 +594,17 @@ want_early_inline_function_p (struct cgraph_edge *e)
 }
   else
 {
   int growth = estimate_edge_growth (e);
+  int growth_limit;
   int n;
 
+  if ((profile_arc_flag && !flag_test_coverage)
+ || (flag_branch_probabilities && !flag_auto_profile))
+   growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_FEEDBACK);
+  else
+   growth_limit = PARAM_VALUE (PARAM_EARLY_INLINING_INSNS);
+
   if (growth <= 0)
;
   else if (!e->maybe_hot_p ()
   && growth > 0)
@@ -610,9 +617,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
 xstrdup_for_dump (callee->name ()), callee->order,
 growth);
  want_inline = false;
}
-  else if (growth > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+  else if (growth > growth_limit)
{
  if (dump_file)
fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
 "growth %i exceeds --param early-inlining-insns\n",
@@ -622,9 +629,9 @@ want_early_inline_function_p (struct cgraph_edge *e)
 growth);
  want_inline = false;
}
   else if ((n = num_calls (callee)) != 0
-  && growth * (n + 1) > PARAM_VALUE (PARAM_EARLY_INLINING_INSNS))
+  && growth * (n + 1) > growth_limit)
{
  if (dump_file)
fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
 "growth %i exceeds --param early-inlining-insns "
diff --git a/gcc/params.def b/gcc/params.def
index 79b7dd4..91ea513 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -199,12 +199,20 @@ DEFPARAM(PARAM_INLINE_UNIT_GROWTH,
 DEFPARAM(PARAM_IPCP_UNIT_GROWTH,
 "ipcp-unit-growth",
 "How much can given compilation unit grow because of the 
interprocedural constant propagation (in percent).",
 10, 0, 0)
-DEFPARAM(PARAM_EARLY_INLINING_INSNS,
-"early-inlining-insns",
-"Maximal estimated growth of function body caused by early inlining of 
single call.",
-14, 0, 0)
+DEFPARAM (PARAM_EARLY_INLINING_INSNS_FEEDBACK,
+ "early-inlining-insns-feedback",
+ "Maximal estimated growth of function body caused by early "
+ "inlining of single call.  Used when profile feedback-directed "
+ "optimizations are enabled.",
+ 2, 0, 0)
+DEFPARAM (PARAM_EARLY_INLINING_INSNS,
+ "early-inlining-insns",
+ "Maximal estimated growth of function body caused by early "
+ "inlining of single call.  Used when profile feedback-directed "
+ "optimizations are not enabled.",
+ 14, 0, 0)
 DEFPARAM(PARAM_LARGE_STACK_FRAME,
 "large-stack-frame",
 "The size of stack frame to be considered large.",
 256, 0, 0)



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Jan Hubicka
> On Fri, Sep 16, 2016 at 10:50 AM, Yuan, Pengfei  wrote:
> >> > Here are the results:
> >> >
> >> > Param  Size (GCC5)Time (GCC5)  Time (GCC7)
> >> > 0  44686265 (-8.26%)  58.772s  66.332s
> >> > 1  45692793 (-6.19%)  40.684s  39.220s
> >> > 2  45556185 (-6.47%)  35.292s  34.328s
> >> > 3  46251049 (-5.05%)  28.820s  27.136s
> >> > 4  47028873 (-3.45%)  24.616s  22.200s
> >> > 5  47495641 (-2.49%)  20.160s  17.800s
> >> > 6  47520153 (-2.44%)  16.444s  15.656s
> >> > 14 48708873   5.620s   5.556s
> >>
> >> Thanks for data! I meant to run the benchmark myself, but had little time 
> >> to do
> >> it over past week becuase of traveling and was also wondering what to do 
> >> given
> >> that spec is rather poor benchmark in this area. Tramp3d is biassed but we 
> >> are
> >> in stage1 and can fine tune latter. I am debugging the libxul crashes in 
> >> FDO
> >> binary now, so we can re-run talos.
> >
> > It seems to be memory corruption. The content of a pointer becomes 
> > 0xe5e5...e5.
> > I have tried Firefox 48.0.2, 49b10 and mozilla-central with GCC 6. Same 
> > error.
> >
> >> > Param: value of PARAM_EARLY_INLINING_INSNS
> >> > Size:  code size (.text) of optimized libxul.so
> >> > Time:  execution time of instrumented tramp3d (-n 25)
> >> >
> >> > To balance between size reduction of optimized binary and speed penalty
> >> > of instrumented binary, I set param=6 as baseline and compare:
> >> >
> >> > Param  Size score  Time score  Total
> >> > 0  3.39-3.57   -0.18
> >> > 1  2.54-2.470.07
> >> > 2  2.65-2.150.50
> >> > 3  2.07-1.750.32
> >> > 4  1.41-1.50   -0.09
> >> > 5  1.02-1.23   -0.21
> >> > 6  1.00-1.000.00
> >> > 14 0.00-0.34   -0.34
> >> >
> >> > Therefore, I think param=2 is the best choice.
> >> >
> >> > Is the attached patch OK?
> >>
> >> Setting param to 2 looks fine
> >>
> >> > gcc/ChangeLog
> >> > * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
> >> > when FDO is enabled.
> >> >
> >> >
> >> > diff --git a/gcc/opts.c b/gcc/opts.c
> >> > index 39c190d..b59c700 100644
> >> > --- a/gcc/opts.c
> >> > +++ b/gcc/opts.c
> >> > @@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
> >> > gcc_options *opts_set,
> >> >maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
> >> >  opts->x_param_values, 
> >> > opts_set->x_param_values);
> >> >  }
> >> >
> >> > +  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
> >> > +  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
> >> > +  || (opts->x_flag_branch_probabilities && 
> >> > !opts->x_flag_auto_profile))
> >> > +maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
> >> > +  opts->x_param_values, 
> >> > opts_set->x_param_values);
> >> > +
> >>
> >> I would actually preffer to have PARAM_EARLY_ININING_INSNS_FEEDBACK.  We
> >> already have TRACER_DYNAMIC_COVERAGE_FEEDBACK and other params.  The 
> >> reason is
> >> that profile is not a global property of program. It may or may not be
> >> available for given function, while params are global.
> >
> > Whether profile is available for specific functions is not important here 
> > because
> > profile is loaded after pass_early_inline. Therefore I think setting the 
> > param
> > globally is fine.
> 
> I also like a new param better as it avoids a new magic constant and
> makes playing with
> it easier (your patch removes the ability to do statistics like you did via 
> the
> early-inlining-insns parameter by forcing it to two).

Hmm, you are right that you do not know if this particular function will get
profile (forgot about that).  Still, please use two params - it is more
consistent with what we have now and we may make it profile specific in
future..

Honza
> 
> Thanks,
> Richard.


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Richard Biener
On Fri, Sep 16, 2016 at 10:50 AM, Yuan, Pengfei  wrote:
>> > Here are the results:
>> >
>> > Param  Size (GCC5)Time (GCC5)  Time (GCC7)
>> > 0  44686265 (-8.26%)  58.772s  66.332s
>> > 1  45692793 (-6.19%)  40.684s  39.220s
>> > 2  45556185 (-6.47%)  35.292s  34.328s
>> > 3  46251049 (-5.05%)  28.820s  27.136s
>> > 4  47028873 (-3.45%)  24.616s  22.200s
>> > 5  47495641 (-2.49%)  20.160s  17.800s
>> > 6  47520153 (-2.44%)  16.444s  15.656s
>> > 14 48708873   5.620s   5.556s
>>
>> Thanks for data! I meant to run the benchmark myself, but had little time to 
>> do
>> it over past week becuase of traveling and was also wondering what to do 
>> given
>> that spec is rather poor benchmark in this area. Tramp3d is biassed but we 
>> are
>> in stage1 and can fine tune latter. I am debugging the libxul crashes in FDO
>> binary now, so we can re-run talos.
>
> It seems to be memory corruption. The content of a pointer becomes 
> 0xe5e5...e5.
> I have tried Firefox 48.0.2, 49b10 and mozilla-central with GCC 6. Same error.
>
>> > Param: value of PARAM_EARLY_INLINING_INSNS
>> > Size:  code size (.text) of optimized libxul.so
>> > Time:  execution time of instrumented tramp3d (-n 25)
>> >
>> > To balance between size reduction of optimized binary and speed penalty
>> > of instrumented binary, I set param=6 as baseline and compare:
>> >
>> > Param  Size score  Time score  Total
>> > 0  3.39-3.57   -0.18
>> > 1  2.54-2.470.07
>> > 2  2.65-2.150.50
>> > 3  2.07-1.750.32
>> > 4  1.41-1.50   -0.09
>> > 5  1.02-1.23   -0.21
>> > 6  1.00-1.000.00
>> > 14 0.00-0.34   -0.34
>> >
>> > Therefore, I think param=2 is the best choice.
>> >
>> > Is the attached patch OK?
>>
>> Setting param to 2 looks fine
>>
>> > gcc/ChangeLog
>> > * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
>> > when FDO is enabled.
>> >
>> >
>> > diff --git a/gcc/opts.c b/gcc/opts.c
>> > index 39c190d..b59c700 100644
>> > --- a/gcc/opts.c
>> > +++ b/gcc/opts.c
>> > @@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
>> > gcc_options *opts_set,
>> >maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
>> >  opts->x_param_values, 
>> > opts_set->x_param_values);
>> >  }
>> >
>> > +  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
>> > +  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
>> > +  || (opts->x_flag_branch_probabilities && 
>> > !opts->x_flag_auto_profile))
>> > +maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
>> > +  opts->x_param_values, opts_set->x_param_values);
>> > +
>>
>> I would actually preffer to have PARAM_EARLY_ININING_INSNS_FEEDBACK.  We
>> already have TRACER_DYNAMIC_COVERAGE_FEEDBACK and other params.  The reason 
>> is
>> that profile is not a global property of program. It may or may not be
>> available for given function, while params are global.
>
> Whether profile is available for specific functions is not important here 
> because
> profile is loaded after pass_early_inline. Therefore I think setting the param
> globally is fine.

I also like a new param better as it avoids a new magic constant and
makes playing with
it easier (your patch removes the ability to do statistics like you did via the
early-inlining-insns parameter by forcing it to two).

Thanks,
Richard.

>> Even at compile time profile may be selectively missing for example for
>> COMDATs that did not win in the linking process.
>>
>> There is also need to update the documentation.
>> Thanks for the work!
>> Honza
>
> Here is the new patch.
>
> Regards,
> Yuan, Pengfei
>
>
> gcc/ChangeLog
> * doc/invoke.texi (early-inlining-insns): Value is adjusted
> when FDO is enabled.
> * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
> when FDO is enabled.
>
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 1ca4dcc..880a28c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10272,9 +10272,11 @@ The default value is 10.
>
>  @item early-inlining-insns
>  Specify growth that the early inliner can make.  In effect it increases
>  the amount of inlining for code having a large abstraction penalty.
> -The default value is 14.
> +The default value is 14.  When feedback-directed optimizations is enabled (by
> +@option{-fprofile-generate} or @option{-fprofile-use}), the value is adjusted
> +to 2.
>
>  @item max-early-inliner-iterations
>  Limit of iterations of the early inliner.  This basically bounds
>  the number of nested indirect calls the early inliner can resolve.
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 39c190d..b59c700 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -826,8 +826,14 @@ finish_options (struct 

Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Yuan, Pengfei
> > Here are the results:
> > 
> > Param  Size (GCC5)Time (GCC5)  Time (GCC7)
> > 0  44686265 (-8.26%)  58.772s  66.332s
> > 1  45692793 (-6.19%)  40.684s  39.220s
> > 2  45556185 (-6.47%)  35.292s  34.328s
> > 3  46251049 (-5.05%)  28.820s  27.136s
> > 4  47028873 (-3.45%)  24.616s  22.200s
> > 5  47495641 (-2.49%)  20.160s  17.800s
> > 6  47520153 (-2.44%)  16.444s  15.656s
> > 14 48708873   5.620s   5.556s
> 
> Thanks for data! I meant to run the benchmark myself, but had little time to 
> do
> it over past week becuase of traveling and was also wondering what to do given
> that spec is rather poor benchmark in this area. Tramp3d is biassed but we are
> in stage1 and can fine tune latter. I am debugging the libxul crashes in FDO
> binary now, so we can re-run talos.

It seems to be memory corruption. The content of a pointer becomes 0xe5e5...e5.
I have tried Firefox 48.0.2, 49b10 and mozilla-central with GCC 6. Same error.

> > Param: value of PARAM_EARLY_INLINING_INSNS
> > Size:  code size (.text) of optimized libxul.so
> > Time:  execution time of instrumented tramp3d (-n 25)
> > 
> > To balance between size reduction of optimized binary and speed penalty
> > of instrumented binary, I set param=6 as baseline and compare:
> > 
> > Param  Size score  Time score  Total
> > 0  3.39-3.57   -0.18
> > 1  2.54-2.470.07
> > 2  2.65-2.150.50
> > 3  2.07-1.750.32
> > 4  1.41-1.50   -0.09
> > 5  1.02-1.23   -0.21
> > 6  1.00-1.000.00
> > 14 0.00-0.34   -0.34
> > 
> > Therefore, I think param=2 is the best choice.
> > 
> > Is the attached patch OK?
> 
> Setting param to 2 looks fine
> 
> > gcc/ChangeLog
> > * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
> > when FDO is enabled.
> > 
> > 
> > diff --git a/gcc/opts.c b/gcc/opts.c
> > index 39c190d..b59c700 100644
> > --- a/gcc/opts.c
> > +++ b/gcc/opts.c
> > @@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
> > gcc_options *opts_set,
> >maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
> >  opts->x_param_values, 
> > opts_set->x_param_values);
> >  }
> > 
> > +  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
> > +  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
> > +  || (opts->x_flag_branch_probabilities && !opts->x_flag_auto_profile))
> > +maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
> > +  opts->x_param_values, opts_set->x_param_values);
> > +
> 
> I would actually preffer to have PARAM_EARLY_ININING_INSNS_FEEDBACK.  We
> already have TRACER_DYNAMIC_COVERAGE_FEEDBACK and other params.  The reason is
> that profile is not a global property of program. It may or may not be
> available for given function, while params are global.

Whether profile is available for specific functions is not important here 
because
profile is loaded after pass_early_inline. Therefore I think setting the param
globally is fine.

> Even at compile time profile may be selectively missing for example for
> COMDATs that did not win in the linking process.
> 
> There is also need to update the documentation.
> Thanks for the work!
> Honza

Here is the new patch.

Regards,
Yuan, Pengfei


gcc/ChangeLog
* doc/invoke.texi (early-inlining-insns): Value is adjusted
when FDO is enabled.
* opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
when FDO is enabled.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1ca4dcc..880a28c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10272,9 +10272,11 @@ The default value is 10.
 
 @item early-inlining-insns
 Specify growth that the early inliner can make.  In effect it increases
 the amount of inlining for code having a large abstraction penalty.
-The default value is 14.
+The default value is 14.  When feedback-directed optimizations is enabled (by
+@option{-fprofile-generate} or @option{-fprofile-use}), the value is adjusted
+to 2.
 
 @item max-early-inliner-iterations
 Limit of iterations of the early inliner.  This basically bounds
 the number of nested indirect calls the early inliner can resolve.
diff --git a/gcc/opts.c b/gcc/opts.c
index 39c190d..b59c700 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
gcc_options *opts_set,
   maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
 opts->x_param_values, opts_set->x_param_values);
 }
 
+  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
+  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
+  || (opts->x_flag_branch_probabilities && !opts->x_flag_auto_profile))
+maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
+  

Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-16 Thread Jan Hubicka
> 
> Here are the results:
> 
> Param  Size (GCC5)Time (GCC5)  Time (GCC7)
> 0  44686265 (-8.26%)  58.772s  66.332s
> 1  45692793 (-6.19%)  40.684s  39.220s
> 2  45556185 (-6.47%)  35.292s  34.328s
> 3  46251049 (-5.05%)  28.820s  27.136s
> 4  47028873 (-3.45%)  24.616s  22.200s
> 5  47495641 (-2.49%)  20.160s  17.800s
> 6  47520153 (-2.44%)  16.444s  15.656s
> 14 48708873   5.620s   5.556s

Thanks for data! I meant to run the benchmark myself, but had little time to do
it over past week becuase of traveling and was also wondering what to do given
that spec is rather poor benchmark in this area. Tramp3d is biassed but we are
in stage1 and can fine tune latter. I am debugging the libxul crashes in FDO
binary now, so we can re-run talos.
> 
> Param: value of PARAM_EARLY_INLINING_INSNS
> Size:  code size (.text) of optimized libxul.so
> Time:  execution time of instrumented tramp3d (-n 25)
> 
> To balance between size reduction of optimized binary and speed penalty
> of instrumented binary, I set param=6 as baseline and compare:
> 
> Param  Size score  Time score  Total
> 0  3.39-3.57   -0.18
> 1  2.54-2.470.07
> 2  2.65-2.150.50
> 3  2.07-1.750.32
> 4  1.41-1.50   -0.09
> 5  1.02-1.23   -0.21
> 6  1.00-1.000.00
> 14 0.00-0.34   -0.34
> 
> Therefore, I think param=2 is the best choice.
> 
> Is the attached patch OK?

Setting param to 2 looks fine

> gcc/ChangeLog
>   * opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
>   when FDO is enabled.
> 
> 
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 39c190d..b59c700 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
> gcc_options *opts_set,
>maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
>  opts->x_param_values, opts_set->x_param_values);
>  }
> 
> +  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
> +  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
> +  || (opts->x_flag_branch_probabilities && !opts->x_flag_auto_profile))
> +maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
> +  opts->x_param_values, opts_set->x_param_values);
> +

I would actually preffer to have PARAM_EARLY_ININING_INSNS_FEEDBACK.  We
already have TRACER_DYNAMIC_COVERAGE_FEEDBACK and other params.  The reason is
that profile is not a global property of program. It may or may not be
available for given function, while params are global.

Even at compile time profile may be selectively missing for example for
COMDATs that did not win in the linking process.

There is also need to update the documentation.
Thanks for the work!
Honza
>if (opts->x_flag_lto)
>  {
>  #ifdef ENABLE_LTO
>opts->x_flag_generate_lto = 1;


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-15 Thread Yuan, Pengfei
> > Setting PARAM_EARLY_INLINING_INSNS to 0 when FDO is enabled should be
> > equivalent to my patch.
> 
> Yes.  This means it's easy to experiment with other values than zero.  
> Basically
> early inlining is supposed to remove abstraction penalty to
> 
>  a) reduce FDO instrumentation overhead
>  b) get more realistic size estimates for the inliner
> 
> a) was particularly important back in time for tramp3d, reducing
> profiling runtime
> 1000-fold.  b) is generally important.
> 
> PARAM_EARLY_INLINING_INSNS is supposed to be a reasonable value to
> get abstraction removed but IIRC we increased it quite a bit to also get more
> early optimization (to get more accurate inliner estimates).
> 
> What I am saying is that reducing PARAM_EARLY_INLINING_INSNS makes
> sense for FDO but reducing it to zero is probably a bit much.
> 
> Can you do your measurements with values between zero and the current
> default of 14 (wow, 14 ... didn't know it's _that_ high currently ;)).
> What's the
> value that crosses the boundary of diminishing returns regarding to code-size
> improvements for you?
> 
> Richard.

Here are the results:

Param  Size (GCC5)Time (GCC5)  Time (GCC7)
0  44686265 (-8.26%)  58.772s  66.332s
1  45692793 (-6.19%)  40.684s  39.220s
2  45556185 (-6.47%)  35.292s  34.328s
3  46251049 (-5.05%)  28.820s  27.136s
4  47028873 (-3.45%)  24.616s  22.200s
5  47495641 (-2.49%)  20.160s  17.800s
6  47520153 (-2.44%)  16.444s  15.656s
14 48708873   5.620s   5.556s

Param: value of PARAM_EARLY_INLINING_INSNS
Size:  code size (.text) of optimized libxul.so
Time:  execution time of instrumented tramp3d (-n 25)

To balance between size reduction of optimized binary and speed penalty
of instrumented binary, I set param=6 as baseline and compare:

Param  Size score  Time score  Total
0  3.39-3.57   -0.18
1  2.54-2.470.07
2  2.65-2.150.50
3  2.07-1.750.32
4  1.41-1.50   -0.09
5  1.02-1.23   -0.21
6  1.00-1.000.00
14 0.00-0.34   -0.34

Therefore, I think param=2 is the best choice.

Is the attached patch OK?

Regards,
Yuan, Pengfei


gcc/ChangeLog
* opts.c (finish_options): Adjust PARAM_EARLY_INLINING_INSNS
when FDO is enabled.


diff --git a/gcc/opts.c b/gcc/opts.c
index 39c190d..b59c700 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -826,8 +826,14 @@ finish_options (struct gcc_options *opts, struct 
gcc_options *opts_set,
   maybe_set_param_value (PARAM_STACK_FRAME_GROWTH, 40,
 opts->x_param_values, opts_set->x_param_values);
 }

+  /* Adjust PARAM_EARLY_INLINING_INSNS when FDO is enabled.  */
+  if ((opts->x_profile_arc_flag && !opts->x_flag_test_coverage)
+  || (opts->x_flag_branch_probabilities && !opts->x_flag_auto_profile))
+maybe_set_param_value (PARAM_EARLY_INLINING_INSNS, 2,
+  opts->x_param_values, opts_set->x_param_values);
+
   if (opts->x_flag_lto)
 {
 #ifdef ENABLE_LTO
   opts->x_flag_generate_lto = 1;



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-15 Thread Yuan, Pengfei
> Yes.  This means it's easy to experiment with other values than zero.  
> Basically
> early inlining is supposed to remove abstraction penalty to
> 
>  a) reduce FDO instrumentation overhead
>  b) get more realistic size estimates for the inliner
> 
> a) was particularly important back in time for tramp3d, reducing
> profiling runtime
> 1000-fold.  b) is generally important.
> 
> PARAM_EARLY_INLINING_INSNS is supposed to be a reasonable value to
> get abstraction removed but IIRC we increased it quite a bit to also get more
> early optimization (to get more accurate inliner estimates).
> 
> What I am saying is that reducing PARAM_EARLY_INLINING_INSNS makes
> sense for FDO but reducing it to zero is probably a bit much.
> 
> Can you do your measurements with values between zero and the current
> default of 14 (wow, 14 ... didn't know it's _that_ high currently ;)).
> What's the
> value that crosses the boundary of diminishing returns regarding to code-size
> improvements for you?
> 
> Richard.
> 

I see. This procedure may take some time since profile data can not be reused.

Yuan, Pengfei



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-15 Thread Richard Biener
On Thu, Sep 15, 2016 at 2:21 AM, Yuan, Pengfei  wrote:
>> I think the approach is reasonable though it might still have
>> interesting effects on
>> optimization for very small growth.  So for further experimenting it
>> would be nice
>
> Test it on SPEC CPU with FDO enabled?
>
>> to have a separate PARAM_EARLY_FDO_INLINING_INSNS or maybe simply
>> adjust the PARAM_EARLY_INLINING_INSNS default accordingly when FDO is
>> enabled?
>
> Setting PARAM_EARLY_INLINING_INSNS to 0 when FDO is enabled should be
> equivalent to my patch.

Yes.  This means it's easy to experiment with other values than zero.  Basically
early inlining is supposed to remove abstraction penalty to

 a) reduce FDO instrumentation overhead
 b) get more realistic size estimates for the inliner

a) was particularly important back in time for tramp3d, reducing
profiling runtime
1000-fold.  b) is generally important.

PARAM_EARLY_INLINING_INSNS is supposed to be a reasonable value to
get abstraction removed but IIRC we increased it quite a bit to also get more
early optimization (to get more accurate inliner estimates).

What I am saying is that reducing PARAM_EARLY_INLINING_INSNS makes
sense for FDO but reducing it to zero is probably a bit much.

Can you do your measurements with values between zero and the current
default of 14 (wow, 14 ... didn't know it's _that_ high currently ;)).
What's the
value that crosses the boundary of diminishing returns regarding to code-size
improvements for you?

Richard.

> Regards,
> Yuan, Pengfei
>
>> I'll let Honza also double-check the condition detecting FDO (it looks
>> like we should
>> have some abstraction for that).
>>
>> Thanks,
>> Richard.
>


Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-14 Thread Yuan, Pengfei
> I think the approach is reasonable though it might still have
> interesting effects on
> optimization for very small growth.  So for further experimenting it
> would be nice

Test it on SPEC CPU with FDO enabled?

> to have a separate PARAM_EARLY_FDO_INLINING_INSNS or maybe simply
> adjust the PARAM_EARLY_INLINING_INSNS default accordingly when FDO is
> enabled?

Setting PARAM_EARLY_INLINING_INSNS to 0 when FDO is enabled should be
equivalent to my patch.

Regards,
Yuan, Pengfei
 
> I'll let Honza also double-check the condition detecting FDO (it looks
> like we should
> have some abstraction for that).
> 
> Thanks,
> Richard.



Re: [PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-14 Thread Richard Biener
On Sat, Sep 10, 2016 at 8:04 AM, Yuan, Pengfei  wrote:
> Hi,
>
> Previously I have sent a patch on profile based option tuning:
> https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01377.html
>
> According to Richard Biener's advice, I try investigating where the code size
> reduction comes from. After analyzing the dumped IL, I figure out that it is
> related to function inlining. Some cold functions are inlined regardless of
> profile feedback, which increases code size.
>
> The problem is with the early inliner. In want_early_inline_function_p, if the
> estimated edge growth > 0, want_inline depends on maybe_hot_p, which usually
> returns true unless optimize_size, since profile feedback is not available at
> this point. Some functions which may be cold according to profile feedback are
> inlined regardlessly, resulting in code size increase.
>
> At first, I come up with a solution that preloads some profile info before
> pass_early_inline. But it fails with numerous coverage-mismatch errors in
> pass_ipa_tree_profile. Therefore, the proposed patch prevents early inlining
> with positive code size growth if FDO is enabled.
>
> Experiment results are as follows:
>
> Setup
>   Hardware Core i7-4770, 32GB RAM
>   OS   Debian sid amd64
>   Compiler GCC 5.4.1 20160907
>   Firefox source   mozilla-central, cset 91c2b9d5c135
>   Training workloadcss3test.com, html5test.com, Octane benchmark
>
> Vanilla GCC
>   Code size (.text of libxul.so)48708873
>   Octane benchmark (score)  35828   36618   35847
>   Kraken benchmark (time)   939.4ms 964.0ms 951.8ms
>
> Patched GCC
>   Code size (.text of libxul.so)44686265
>   Octane benchmark (score)  36103   35740   35611
>   Kraken benchmark (time)   928.9ms 949.1ms 938.7ms
>
> There is over 8% reduction in code size, while no obvious difference in
> performance. The experiment is conducted with GCC 5. There is segmentation
> fault when starting Firefox instrumented by GCC 6. GCC 7 encounters ICE when
> building Firefox.

I think the approach is reasonable though it might still have
interesting effects on
optimization for very small growth.  So for further experimenting it
would be nice
to have a separate PARAM_EARLY_FDO_INLINING_INSNS or maybe simply
adjust the PARAM_EARLY_INLINING_INSNS default accordingly when FDO is
enabled?

I'll let Honza also double-check the condition detecting FDO (it looks
like we should
have some abstraction for that).

Thanks,
Richard.

> Regards,
>
> Yuan, Pengfei
>
>
> gcc/ChangeLog:
> * ipa-inline.c (want_early_inline_function_p): Be more conservative
> if FDO is enabled.
>
>
> diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
> index 7097cf3..8266f97 100644
> --- a/gcc/ipa-inline.c
> +++ b/gcc/ipa-inline.c
> @@ -628,6 +628,20 @@ want_early_inline_function_p (struct cgraph_edge *e)
>
>if (growth <= 0)
> ;
> +  /* Profile feedback is not available at this point.
> +Be more conservative if FDO is enabled.  */
> +  else if ((profile_arc_flag && !flag_test_coverage)
> +  || (flag_branch_probabilities && !flag_auto_profile))
> +   {
> + if (dump_file)
> +   fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
> +"FDO is enabled and code would grow by %i\n",
> +xstrdup_for_dump (e->caller->name ()),
> +e->caller->order,
> +xstrdup_for_dump (callee->name ()), callee->order,
> +growth);
> + want_inline = false;
> +   }
>else if (!e->maybe_hot_p ()
>&& growth > 0)
> {
>


[PATCH, 5.x/6.x/7.x] Be more conservative in early inliner if FDO is enabled

2016-09-10 Thread Yuan, Pengfei
Hi,

Previously I have sent a patch on profile based option tuning:
https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01377.html

According to Richard Biener's advice, I try investigating where the code size
reduction comes from. After analyzing the dumped IL, I figure out that it is
related to function inlining. Some cold functions are inlined regardless of
profile feedback, which increases code size.

The problem is with the early inliner. In want_early_inline_function_p, if the
estimated edge growth > 0, want_inline depends on maybe_hot_p, which usually
returns true unless optimize_size, since profile feedback is not available at
this point. Some functions which may be cold according to profile feedback are
inlined regardlessly, resulting in code size increase.

At first, I come up with a solution that preloads some profile info before
pass_early_inline. But it fails with numerous coverage-mismatch errors in
pass_ipa_tree_profile. Therefore, the proposed patch prevents early inlining
with positive code size growth if FDO is enabled.

Experiment results are as follows:

Setup
  Hardware Core i7-4770, 32GB RAM
  OS   Debian sid amd64
  Compiler GCC 5.4.1 20160907
  Firefox source   mozilla-central, cset 91c2b9d5c135
  Training workloadcss3test.com, html5test.com, Octane benchmark

Vanilla GCC
  Code size (.text of libxul.so)48708873
  Octane benchmark (score)  35828   36618   35847
  Kraken benchmark (time)   939.4ms 964.0ms 951.8ms

Patched GCC
  Code size (.text of libxul.so)44686265
  Octane benchmark (score)  36103   35740   35611
  Kraken benchmark (time)   928.9ms 949.1ms 938.7ms

There is over 8% reduction in code size, while no obvious difference in
performance. The experiment is conducted with GCC 5. There is segmentation
fault when starting Firefox instrumented by GCC 6. GCC 7 encounters ICE when
building Firefox.

Regards,

Yuan, Pengfei


gcc/ChangeLog:
* ipa-inline.c (want_early_inline_function_p): Be more conservative
if FDO is enabled.


diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index 7097cf3..8266f97 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -628,6 +628,20 @@ want_early_inline_function_p (struct cgraph_edge *e)
 
   if (growth <= 0)
;
+  /* Profile feedback is not available at this point.
+Be more conservative if FDO is enabled.  */
+  else if ((profile_arc_flag && !flag_test_coverage)
+  || (flag_branch_probabilities && !flag_auto_profile))
+   {
+ if (dump_file)
+   fprintf (dump_file, "  will not early inline: %s/%i->%s/%i, "
+"FDO is enabled and code would grow by %i\n",
+xstrdup_for_dump (e->caller->name ()),
+e->caller->order,
+xstrdup_for_dump (callee->name ()), callee->order,
+growth);
+ want_inline = false;
+   }
   else if (!e->maybe_hot_p ()
   && growth > 0)
{