On Thu, Nov 29, 2018 at 12:20 AM Jan Hubicka <[email protected]> wrote:
>
> > On Tue, Nov 20, 2018 at 6:55 PM bin.cheng <[email protected]>
> > wrote:
> > >
> > > Sender:Jan Hubicka <[email protected]>
> > > Sent at:2018 Nov 5 (Mon) 22:21
> > > To:Richard Biener <[email protected]>
> > > Cc:bin.cheng <[email protected]>; GCC Patches
> > > <[email protected]>
> > > Subject:Re: [PATCH AutoFDO/2]Treat ZERO as common profile
> > > probability/count
> > >
> > > >
> > > > > On Wed, Oct 31, 2018 at 7:30 AM bin.cheng
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hi,
> > > > > > In new profile probability/count infra, we have different precision
> > > > > > quality categories,
> > > > > > and probabilities/counts of different categories are not supposed
> > > > > > to be compared or
> > > > > > > calculated. Though in general this is an improvement, it introduces
> > > > > > > unexpected behavior.
> > > > > > > Specifically, classes profile_probability and profile_count themselves
> > > > > > > are implemented
> > > > > > > by comparing probabilities/counts against profile_count::zero().
> > > > > > > Since zero() is of
> > > > > > > the precise category, it always compares unequal to zeros
> > > > > > > of other precision
> > > > > > > categories including afdo.
> > > > > >
> > > > > > > I can see two ways of fixing this: 1) Treat zero as a common
> > > > > > > probability/count regardless
> > > > > > > of its category; 2) Provide an "is_zero" method rather than relying
> > > > > > > on "==" comparison
> > > > > > > against profile_count::zero(). 2) requires lots of code
> > > > > > > changes so I went with 1)
> > > > > > > in this patch set. This patch doesn't handle "always", but it might
> > > > > > > need the same treatment.
> > > > > >
> > > > > > This patch also corrects a minor issue where we try to invert an
> > > > > > uninitialized value.
> > > > > >
> > > > > > > Bootstrapped and tested on x86_64 with the patch set. Is it OK?
> > > > >
> > > > > I'll defer on the emit_store_flag_force change, likewise for the zero
> > > > > handling in
> > > > > compares - I don't think zeros of different qualities should compare
> > > > > equal.
> > > > > Would compares against ::always() not have the very same issue?
> > > > > Likewise ::even(),
> > > > > ::likely(), etc.? Those always get guessed quality.
> > > > >
> > > > > The invert change looks OK to me. The related change to the always()
> > > > > API would
> > > > > suggest to replace guessed_always() with always (guessed) and also do
> > > > > similar
> > > > > changes throughout the whole API...
> > > > >
> > > > > Honza?
> > > >
> > > > The zeros are really different zeros. profile_count::zero makes us
> > > > drop the basic block into the cold section because we know that it won't be
> > > > executed in a normal run of the program (either we have accurate profile
> > > > feedback, or we proved that the program is on its way to crash, or the user
> > > > annotated a cold section). Having a guessed zero or an auto-fdo zero won't
> > > > make us do such aggressive size optimization.
> > > > This is important since those zeros relatively commonly happen by
> > > > accident, and thus if we dropped all that code into the cold section, the cold
> > > > section would be visited relatively often during execution of the program,
> > > > which would defeat its purpose.
> > > >
> > > > Most comparisons in profile-count.h which go against profile_count==zero
> > > > are really intended to pass only for this "absolute zero". They bypass
> > > > the precision adjustments which normally happen when you merge values
> > > > of different precision.
> > > >
> > > > What kind of unexpected behaviour are you seeing?
> > > > We already have nonzero_p which is what we use when we want to know that
> > > > count is non-zero in some sense of precision.
> > > Hi Honza,
> > > Sorry for letting this slip away. So in the case of AutoFDO, due to the
> > > nature
> > > of sampling, lots of funcs/bbs are annotated with a zero profile_count in
> > > afdo
> > > precision, and we have checks against the zero profile_count in precise
> > > precision.
> > > All these checks evaluate to false and cause issues. Take the code in
> > > update_profiling_info as an example:
> > >
> > > update_profiling_info (struct cgraph_node *orig_node,
> > >                        struct cgraph_node *new_node)
> > > {
> > >   struct cgraph_edge *cs;
> > >   struct caller_statistics stats;
> > >   profile_count new_sum, orig_sum;
> > >   profile_count remainder, orig_node_count = orig_node->count;
> > >
> > >   if (!(orig_node_count.ipa () > profile_count::zero ()))
> > >     return;
> > >   //...
> > >   for (cs = new_node->callees; cs; cs = cs->next_callee)
> > >     cs->count = cs->count.apply_scale (new_sum, orig_node_count);
> > >
> > > Since we also have the code below in profile_count::operator>:
> > >   if (other == profile_count::zero ())
> > >     return !(*this == profile_count::zero ());
> > >
> > > If orig_node_count is an afdo zero, the above zero check for orig_node_count
> > > evaluates to false, so we do not return early; we end up passing a zero
> > > denominator to apply_scale and
> > > hitting the assertion.
> > >
> > > In this updated patch, I restricted the changes to
> > > profile_count::operator
> > > <, >, <= and >=. Plus, I think there is a latent typo in operator>=
> > > because
> > > the current code returns TRUE if '*this' is precise zero and 'other' is precise
> > > non-zero.
> > > @@ -879,7 +879,7 @@ public:
> > > if (other == profile_count::zero ())
> > > return true;
> > > if (*this == profile_count::zero ())
> > > - return !(other == profile_count::zero ());
> > > + return !other.nonzero_p ();
>
> We already have
>
> True:
> profile_count::zero < any other value
> any other value > profile_count::zero
> profile_count::zero <= any initialized value
> profile_count::zero <= profile_count::zero
> any initialized value >= profile_count::zero
>
> False:
> profile_count::zero > any other value
> any other value < profile_count::zero
>
> You are right about the typo in >=, it should be:
>
> Index: profile-count.h
> ===================================================================
> --- profile-count.h (revision 266450)
> +++ profile-count.h (working copy)
> @@ -879,7 +879,7 @@
> if (other == profile_count::zero ())
> return true;
> if (*this == profile_count::zero ())
> - return !(other == profile_count::zero ());
> + return other == profile_count::zero ();
> gcc_checking_assert (compatible_p (other));
> return m_val >= other.m_val;
> }
>
> With your patch we get false for:
> profile_count::zero < guessed/auto_fdo/other 0
> guessed/auto_fdo/other > profile_count::zero
> guessed/auto_fdo/other <= profile_count::zero
> profile_count::zero >= profile_count::zero
>
> The original idea was to intentionally make profile_count::zero smaller
> than any other type of initialized value, since it is a stricter hint
> that the path will not be taken.
> For example, in bb_reorder, if you end up with a "funny" profile with two
> exit edges, one having profile_count::zero and the other being zero as a result
> of (unsuccessful) profile updates, it is still a better idea to pick the
> profile_count::zero for the taken edge. With your patch it will end up
> picking either of the paths.
>
> How does the patch help in your situation?
Hi Honza, thanks very much for elaborating. The issue in the autofdo case
is as described in my last message.
Given update_profiling_info implemented as below:
update_profiling_info (struct cgraph_node *orig_node,
                       struct cgraph_node *new_node)
{
  struct cgraph_edge *cs;
  struct caller_statistics stats;
  profile_count new_sum, orig_sum;
  profile_count remainder, orig_node_count = orig_node->count;
  //*****Operator ">" returns true if orig_node_count == autofdo.zero.
  if (!(orig_node_count.ipa () > profile_count::zero ()))
    return;
  //...
  for (cs = new_node->callees; cs; cs = cs->next_callee)
    //*****Result in apply_scale called with autofdo.zero as the 2nd argument.
    cs->count = cs->count.apply_scale (new_sum, orig_node_count);
Also apply_scale is implemented as:
profile_count apply_scale (profile_count num, profile_count den) const
{
  if (*this == profile_count::zero ())
    return *this;
  if (num == profile_count::zero ())
    return num;
  if (!initialized_p () || !num.initialized_p () || !den.initialized_p ())
    return profile_count::uninitialized ();
  if (num == den)
    return *this;
  gcc_checking_assert (den.m_val);
Here we have (num != zero && den == autofdo.zero), which triggers the
gcc_checking_assert.
According to your explanation, I guess we need to call force_nonzero on
orig_node_count before calling apply_scale, right?
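
To make the failing path concrete, here is a small standalone sketch of the
situation. It is only a toy model, not the real profile_count from
profile-count.h: the Quality enum, the Count struct, afdo_zero,
guard_returns_early and scale_hits_assert are names made up for illustration.
Only the special-casing of zero in operator> and the early checks of
apply_scale are taken from the code quoted above; the body of force_nonzero
is my assumption of how it behaves (bump a zero value to 1, cap the quality).

```cpp
// Toy model only: NOT the real profile_count from GCC's profile-count.h.
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified quality categories (higher means more reliable).
enum Quality { GUESSED = 1, AFDO = 2, ADJUSTED = 3, PRECISE = 4 };

struct Count {
  uint64_t val;
  Quality quality;

  // The "absolute" zero: a zero count of precise quality.
  static Count zero() { return {0, PRECISE}; }
  // Hypothetical helper: a zero count of afdo quality.
  static Count afdo_zero() { return {0, AFDO}; }

  // Zeros of different quality compare unequal.
  bool operator==(const Count &o) const {
    return val == o.val && quality == o.quality;
  }

  // Mirrors the special case quoted from profile_count::operator>.
  bool operator>(const Count &o) const {
    if (*this == zero()) return false;
    if (o == zero()) return !(*this == zero());
    return val > o.val;
  }

  // Assumed behaviour: make a zero value nonzero and lower its quality.
  Count force_nonzero() const {
    Count r = *this;
    if (r.val == 0) {
      r.val = 1;
      r.quality = std::min(quality, ADJUSTED);
    }
    return r;
  }
};

// True when the guard in update_profiling_info would return early.
bool guard_returns_early(const Count &c) { return !(c > Count::zero()); }

// True when the toy apply_scale would reach the denominator assertion
// with a zero denominator (uninitialized counts are ignored here).
bool scale_hits_assert(const Count &self, const Count &num, const Count &den) {
  if (self == Count::zero()) return false;
  if (num == Count::zero()) return false;
  if (num == den) return false;
  return den.val == 0;
}
```

In this model the guard fires for the precise zero but not for an afdo zero,
and the denominator check stops failing once the denominator has been passed
through force_nonzero.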
Thanks,
bin