https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #19 from Gabriel Ravier <gabravier at gmail dot com> ---
(In reply to Jakub Jelinek from comment #14)
> The patch does:
> +      bool zero_ok = CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), ctzval) == 2;
> +
> +      /* Skip if there is no value defined at zero, or if we can't easily
> +        return the correct value for zero.  */
> +      if (!zero_ok)
> +       return false;
> +      if (zero_val != ctzval && !(zero_val == 0 && ctzval == type_size))
> +       return false;
> For CTZ_DEFINED_VALUE_AT_ZERO == 1 we could support it the same way but we'd
> need to emit into the IL an equivalent of val == 0 ? zero_val : .CTZ (val)
> (with GIMPLE_COND and a separate bb - not sure if anything in forwprop
> creates new basic blocks right now), where there is a high chance that RTL
> opts would turn it back into unconditional ctz.
> That still wouldn't help non--mbmi x86, because CTZ_DEFINED_VALUE_AT_ZERO is
> 0 there.
> We could handle even that case by doing the branches around, but those would
> stay there in the generated code, at which point I wonder whether it would
> be a win.  The original code is branchless...

If the original code being branchless makes it faster, wouldn't that imply that
we should use the table-based implementation when generating code for
`__builtin_ctz`?
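
For concreteness, here is a rough sketch of the two shapes being compared. It
assumes a 32-bit operand and uses the commonly seen de Bruijn multiply-and-lookup
table, which is not necessarily the exact idiom from the testcase in this bug.
The names `table_ctz32` and `guarded_ctz32` are made up for illustration. With
this particular table an input of zero happens to yield 0, which would play the
role of zero_val in the quoted patch.

```c
#include <stdint.h>

/* Lookup table for the de Bruijn constant 0x077CB531 (illustrative;
   other constants and tables exist).  */
static const unsigned char debruijn_ctz32_tab[32] = {
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* Branchless table-based count of trailing zeros.
   Returns 0 for v == 0 because the isolated low bit is then 0
   and index 0 of the table holds 0.  */
static inline unsigned
table_ctz32 (uint32_t v)
{
  uint32_t low_bit = v & (0u - v);   /* isolate the lowest set bit */
  return debruijn_ctz32_tab[(uint32_t) (low_bit * 0x077CB531u) >> 27];
}

/* The "val == 0 ? zero_val : .CTZ (val)" shape, written at the source
   level with zero_val == 0.  __builtin_ctz is undefined for 0, hence
   the guard; whether the guard survives as a branch is up to the
   target and the optimizers.  */
static inline unsigned
guarded_ctz32 (uint32_t v)
{
  return v == 0 ? 0u : (unsigned) __builtin_ctz (v);
}
```

Both functions agree on every input, so which one ends up faster would come
down to the cost of the multiply plus load versus the compare plus whatever the
target emits for ctz.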
