Ping. Ok for mainline?

On Thu, Apr 25, 2024 at 09:26:45AM +0200, Stefan Schulze Frielinghaus wrote:
> Bitcount operations popcount, clz, and ctz are emulated for narrow modes
> in case an operation is only supported for wider modes. Besides that, ctz
> may be emulated via clz in expand_ctz. Reflect this in
> expression_expensive_p.
>
> I considered the emulation of ctz via clz as not expensive since this
> basically reduces to ctz (x) = c - (clz (x & -x)) where c is the mode
> precision minus 1 which should be faster than a loop.
>
> Bootstrapped and regtested on x86_64 and s390. Though, this is probably
> stage1 material?
>
> gcc/ChangeLog:
>
>       PR tree-optimization/110490
>       * tree-scalar-evolution.cc (expression_expensive_p): Also
>       consider mode widening for popcount, clz, and ctz.
> ---
>  gcc/tree-scalar-evolution.cc | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index b0a5e09a77c..622c7246c1b 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -3458,6 +3458,28 @@ bitcount_call:
>                    && (optab_handler (optab, word_mode)
>                        != CODE_FOR_nothing))
>                break;
> +              /* If popcount is available for a wider mode, we emulate the
> +                 operation for a narrow mode by first zero-extending the value
> +                 and then computing popcount in the wider mode.  Analogue for
> +                 ctz.  For clz we do the same except that we additionally have
> +                 to subtract the difference of the mode precisions from the
> +                 result.  */
> +              if (is_a <scalar_int_mode> (mode, &int_mode))
> +                {
> +                  machine_mode wider_mode_iter;
> +                  FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
> +                    if (optab_handler (optab, wider_mode_iter)
> +                        != CODE_FOR_nothing)
> +                      goto check_call_args;
> +                  /* Operation ctz may be emulated via clz in expand_ctz.  */
> +                  if (optab == ctz_optab)
> +                    {
> +                      FOR_EACH_WIDER_MODE_FROM (wider_mode_iter, mode)
> +                        if (optab_handler (clz_optab, wider_mode_iter)
> +                            != CODE_FOR_nothing)
> +                          goto check_call_args;
> +                    }
> +                }
>                return true;
>              }
>            break;
> @@ -3469,6 +3491,7 @@ bitcount_call:
>            break;
>          }
>
> +check_call_args:
>        FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
>          if (expression_expensive_p (arg, cond_overflow_p, cache, op_cost))
>            return true;
> --
> 2.44.0
>
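
For anyone who wants to convince themselves of the emulations the comment in
the patch describes, here is a minimal standalone sketch in plain C. It is an
illustration only and not what the expanders emit: the helper names are
hypothetical, an 8-bit value stands in for the narrow mode and a 32-bit one
for the wider mode, and it assumes the GCC/Clang __builtin_* functions with a
32-bit unsigned int. The ctz-via-clz helper uses x & -x, which isolates the
lowest set bit. All inputs are assumed to be nonzero where clz/ctz are
involved.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only; none of these exist in GCC.
   Narrow mode: 8 bits.  Wider mode: 32 bits (assumes 32-bit unsigned int).  */

/* popcount: zero-extend and count in the wider mode.  The extension only
   adds zero bits, so the count is unchanged.  */
static int
popcount8_via32 (uint8_t x)
{
  return __builtin_popcount ((uint32_t) x);
}

/* ctz: zero-extension adds bits only above the value, so the number of
   trailing zeros is unchanged.  X must be nonzero.  */
static int
ctz8_via32 (uint8_t x)
{
  return __builtin_ctz ((uint32_t) x);
}

/* clz: the wider mode sees 32 - 8 extra leading zeros, so subtract the
   difference of the precisions.  X must be nonzero.  */
static int
clz8_via32 (uint8_t x)
{
  return __builtin_clz ((uint32_t) x) - (32 - 8);
}

/* ctz via clz: x & -x keeps only the lowest set bit; its position counted
   from the top is clz, so ctz is (precision - 1) - clz.  X must be
   nonzero.  */
static int
ctz32_via_clz (uint32_t x)
{
  return 31 - __builtin_clz (x & -x);
}
```

Each helper is a handful of straight-line operations, which is the reason for
treating these cases as not expensive compared to a loop.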