Hi Richard and Jeff, Thanks for your comments.
On Fri, 26 Oct 2018 at 19:40, Richard Biener <[email protected]> wrote: > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <[email protected]> wrote: > > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote: > > > Hi, > > > > > > PR87528 showed a case where libgcc generated popcount is causing > > > regression for Skylake. > > > We also have PR86677 where kernel build is failing because the kernel > > > does not use the libgcc (when backend is not defining popcount > > > pattern). While I agree that the kernel should implement its own > > > functionality when it is not using the libgcc, I am afraid that the > > > implementation can have the same performance issues reported for > > > Skylake in PR87528. > > > > > > Therefore, I would like to propose that we disable popcount detection > > > when we don't have a pattern for that. The attached patch (based on > > > previous discussions) does this. > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new > > > regressions. We need to disable the popcount* testcases. I will have > > > to define a effective_target_with_popcount in > > > gcc/testsuite/lib/target-supports.exp if this patch is OK? > > > Thanks, > > > Kugan > > > > > > > > > gcc/ChangeLog: > > > > > > 2018-10-25 Kugan Vivekanandarajah <[email protected]> > > > > > > * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN > > > POPCOUNT > > > as expensive when backend does not define it. > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > 2018-10-25 Kugan Vivekanandarajah <[email protected]> > > > > > > * gcc.target/aarch64/popcount4.c: New test. > > > > > FWIW, I've been disabling by checking direct_optab_handler elsewhere > > (number_of_iterations_popcount) in my tester. It may in fact be an old > > patch from you. > > > > Richi argued that it's the kernel team's responsibility to provide a > > popcount since they don't link with libgcc. And I'm generally in > > agreement with that position, though it does tend to generate some > > friction with the kernel developers. We also run the real risk of GCC 9 > > not being able to build the kernel which, IMHO, would be a disaster from > > a PR standpoint. > > > > I'd like to hear from others here. I fully realize we're beyond the > > realm of what is strictly technically correct here from a review standpoint. > > As said final value replacement to a library call is probably not wanted > for optimization purpose, so adjusting expression_expensive_p is OK with > me. It might not fully solve the (non-)issue in case another optimization > pass > chooses to materialize niter computation result. > > Few comments on the patch: > > + tree fndecl = get_callee_fndecl (expr); > + > + if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL) > + { > + combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl)); > > combined_fn cfn = gimple_call_combined_fn (expr); > switch (cfn) > { Did you mean: combined_fn cfn = get_call_combined_fn (expr); > ... > > cfn will be CFN_LAST for a non-builtin/internal call. I know Richard is > mostly > offline but eventually he knows whether there is a better way to query > > + CASE_CFN_POPCOUNT: > + /* Check if opcode for popcount is available. */ > + if (optab_handler (popcount_optab, > + TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG > (expr, 0)))) > + == CODE_FOR_nothing) > + return true; > > note that we currently generate builtin calls rather than IFN calls > (when a direct > optab is supported). > > Another comment on the patch is that you probably have to adjust existing > popcount testcases to add architecture specific flags enabling suport for > the instructions, otherwise you won't see loop replacement. Indeed. In lib/target-supports.exp, I will try to add support for check_effective_target_popcount_long. When I grep for the popcount pattern in md files, I see it is defined for: tilegx tilepro alpha aarch64 when TARGET_SIMD ia64 rs6000 i386 when TARGET_POPCOUNT popwerpcspce when TARGET_POPCNTB || TARGET_POPCNTD s390 when TARGET_Z916 && TARGET_64BIT sparc when TARGET_POPC arm when TARGET_NEON mips when ISA_HAS_POP spu avr I can check these targets with the condition. Another possibility is to check with a sample code and see if we are getting a libcall in the asm. Not sure if that is straightforward. Are there any example for such. We could also move these test to a primary target that is tested often tested which also defines popcount pattern. I dont think these tests change for targets and if we can test in one target that could be enough, I am happy to implement what is appropriate. Thanks, Kugan > > Also I think that the expression is only expensive (for final value > replacement!) > if you consider optimizing for speed. When optimizing for size getting rid of > the loop is probably beneificial unconditionally. That would leave the > possibility to switch said testcases to -Os. It would require adding a > bool size_p flag to expression_expensive and passing down > optimize_loop_for_size_p (). > > _NOTE_ that expression_expensive_p is also used by IVOPTs and there > replacing sth with an expression based on the niter analysis result doesn't > mean we get rid of the loop (but only of an IV), so maybe that reasoning > doesn't apply there. > > Richard. > > > Jeff > >
