Hi Richard and Jeff,

Thanks for your comments.

On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <l...@redhat.com> wrote:
> >
> > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > Hi,
> > >
> > > PR87528 showed a case where libgcc generated popcount is causing
> > > regression for Skylake.
> > > We also have PR86677 where kernel build is failing because the kernel
> > > does not use the libgcc (when backend is not defining popcount
> > > pattern).  While I agree that the kernel should implement its own
> > > functionality when it is not using the libgcc, I am afraid that the
> > > implementation can have the same performance issues reported for
> > > Skylake in PR87528.
> > >
> > > Therefore, I would like to propose that we disable popcount detection
> > > when we don't have a pattern for that. The attached patch (based on
> > > previous discussions) does this.
> > >
> > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > regressions. We need to disable the popcount* testcases. I will have
> > > to define a effective_target_with_popcount in
> > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > Thanks,
> > > Kugan
> > >
> > >
> > > gcc/ChangeLog:
> > >
> > > 2018-10-25  Kugan Vivekanandarajah  <kug...@linaro.org>
> > >
> > >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN 
> > > POPCOUNT
> > >     as expensive when backend does not define it.
> > >
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2018-10-25  Kugan Vivekanandarajah  <kug...@linaro.org>
> > >
> > >     * gcc.target/aarch64/popcount4.c: New test.
> > >
> > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > (number_of_iterations_popcount) in my tester.  It may in fact be an old
> > patch from you.
> >
> > Richi argued that it's the kernel team's responsibility to provide a
> > popcount since they don't link with libgcc.  And I'm generally in
> > agreement with that position, though it does tend to generate some
> > friction with the kernel developers.  We also run the real risk of GCC 9
> > not being able to build the kernel which, IMHO, would be a disaster from
> > a PR standpoint.
> >
> > I'd like to hear from others here.  I fully realize we're beyond the
> > realm of what is strictly technically correct here from a review standpoint.
>
> As said final value replacement to a library call is probably not wanted
> for optimization purpose, so adjusting expression_expensive_p is OK with
> me.  It might not fully solve the (non-)issue in case another optimization 
> pass
> chooses to materialize niter computation result.
>
> Few comments on the patch:
>
> +      tree fndecl = get_callee_fndecl (expr);
> +
> +      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> +       {
> +         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
>
>   combined_fn cfn = gimple_call_combined_fn (expr);
>   switch (cfn)
>     {

Did you mean:
combined_fn cfn = get_call_combined_fn (expr);

> ...
>
> cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is 
> mostly
> offline but eventually he knows whether there is a better way to query
>
> +           CASE_CFN_POPCOUNT:
> +             /* Check if opcode for popcount is available.  */
> +             if (optab_handler (popcount_optab,
> +                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> (expr, 0))))
> +                 == CODE_FOR_nothing)
> +               return true;
>
> note that we currently generate builtin calls rather than IFN calls
> (when a direct
> optab is supported).
>
> Another comment on the patch is that you probably have to adjust existing
> popcount testcases to add architecture specific flags enabling suport for
> the instructions, otherwise you won't see loop replacement.
Indeed.
In lib/target-supports.exp, I will try to add support for
check_effective_target_popcount_long.
When I grep for the popcount pattern in md files, I see it is defined for:

tilegx
tilepro
alpha
aarch64  when TARGET_SIMD
ia64
rs6000
i386  when TARGET_POPCOUNT
popwerpcspce  when TARGET_POPCNTB || TARGET_POPCNTD
s390  when TARGET_Z916 && TARGET_64BIT
sparc when TARGET_POPC
arm when TARGET_NEON
mips when ISA_HAS_POP
spu
avr

I can check these targets with the condition.
Another possibility is to check with a sample code and see if we are
getting a libcall in the asm. Not sure if that is straightforward. Are
there any example for such.

We could also move these test to a primary target that is tested often
tested which also defines popcount pattern. I dont think these tests
change for targets and if we can test in one target that could be
enough,

I am happy to implement what is appropriate.

Thanks,
Kugan



>
> Also I think that the expression is only expensive (for final value
> replacement!)
> if you consider optimizing for speed.  When optimizing for size getting rid of
> the loop is probably beneificial unconditionally.  That would leave the
> possibility to switch said testcases to -Os.  It would require adding a
> bool size_p flag to expression_expensive and passing down
> optimize_loop_for_size_p ().
>
> _NOTE_ that expression_expensive_p is also used by IVOPTs and there
> replacing sth with an expression based on the niter analysis result doesn't
> mean we get rid of the loop (but only of an IV), so maybe that reasoning
> doesn't apply there.
>
> Richard.
>
> > Jeff
> >

Reply via email to