[Bug tree-optimization/90838] Detect table-based ctz implementation

wilco at gcc dot gnu.org via Gcc-bugs Fri, 17 Feb 2023 04:57:54 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838


--- Comment #15 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #14)
> The patch does:
> +      bool zero_ok = CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), ctzval) == 2;
> +
> +      /* Skip if there is no value defined at zero, or if we can't easily
> +        return the correct value for zero.  */
> +      if (!zero_ok)
> +       return false;
> +      if (zero_val != ctzval && !(zero_val == 0 && ctzval == type_size))
> +       return false;
> For CTZ_DEFINED_VALUE_AT_ZERO == 1 we could support it the same way but we'd
> need to emit into the IL an equivalent of val == 0 ? zero_val : .CTZ (val)
> (with GIMPLE_COND and a separate bb - not sure if anything in forwprop creates
> new basic blocks right now), where there is a high chance that RTL opts would
> turn it back into unconditional ctz.
> That still wouldn't help non--mbmi x86, because CTZ_DEFINED_VALUE_AT_ZERO is
> 0 there.
> We could handle even that case by doing the branches around, but those would
> stay there in the generated code, at which point I wonder whether it would be
> a win. The original code is branchless...
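For context, the kind of table-based ctz this bug is about is sketched below. It uses the well-known de Bruijn multiply from Bit Twiddling Hacks. This is an illustrative sketch, not code taken from the bug report. The helper names `ctz_table` and `ctz_guarded` are hypothetical. For an input of 0 this table returns `table[0] == 0`, so its "zero_val" is 0. The guarded function is a C rendering of the `val == 0 ? zero_val : .CTZ (val)` form discussed in the quote. It uses the GCC builtin `__builtin_ctz`, whose result is undefined for 0; that is why the explicit check is needed.

```c
#include <stdint.h>

/* Table-based ctz (de Bruijn method): x & -x isolates the lowest set bit,
   and multiplying by 0x077CB531 leaves a unique 5-bit index in the top
   bits.  Branchless; for x == 0 the index is 0, so it returns table[0] == 0. */
static int ctz_table(uint32_t x)
{
    static const int table[32] = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    return table[(uint32_t)((x & -x) * 0x077CB531u) >> 27];
}

/* Hypothetical guarded equivalent: val == 0 ? zero_val : ctz (val),
   with zero_val == 0 to match the table above.  __builtin_ctz (0) is
   undefined in GCC, so the zero case must be handled explicitly. */
static int ctz_guarded(uint32_t x)
{
    return x == 0 ? 0 : __builtin_ctz(x);
}
```

Both functions agree on every input, including 0. Only the guarded form contains a branch (or a select), which is the trade-off the quoted comment raises.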

It would make more sense to move the x86 backend to CTZ_DEFINED_VALUE_AT_ZERO == 2
so that you always get the same result even when you don't have tzcnt. A
conditional move would be possible, so it adds at most 2 extra instructions
(i.e. still significantly faster than doing the table lookup, multiply, etc.).
And the conditional move could be optimized away when you know the CLZ/CTZ
input is non-zero.
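To make the quoted patch condition concrete, here is a small C model of it. This is a hypothetical sketch, not GCC source. `can_replace_with_ctz`, `ctz32_defined` and `ctz32_masked` are illustrative names. `ctz32_defined` models a 32-bit ctz that is defined at zero and returns 32 there, as tzcnt does. On a target without tzcnt, its conditional corresponds to the extra conditional move suggested above.

The `zero_val == 0 && ctzval == type_size` case is handled here by masking with `type_size - 1`. That is an assumption about how such a case can be served; the quoted excerpt does not say so.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the quoted check (not GCC code).
   defined_at_zero: result of CTZ_DEFINED_VALUE_AT_ZERO (0, 1 or 2)
   ctzval:          value the target's ctz yields for 0 when defined
   zero_val:        value the table-based code yields for 0
   type_size:       bit width of the operand type */
static bool can_replace_with_ctz(int defined_at_zero, int ctzval,
                                 int zero_val, int type_size)
{
    if (defined_at_zero != 2)   /* corresponds to !zero_ok */
        return false;
    if (zero_val != ctzval && !(zero_val == 0 && ctzval == type_size))
        return false;
    return true;
}

/* Models a ctz defined at zero with value 32 (tzcnt-like).  Without
   tzcnt, this conditional is where a conditional move would go. */
static int ctz32_defined(uint32_t x)
{
    return x == 0 ? 32 : __builtin_ctz(x);
}

/* Assumed handling of zero_val == 0 with ctzval == 32: masking with 31
   maps 32 to 0 and leaves results 0..31 unchanged. */
static int ctz32_masked(uint32_t x)
{
    return ctz32_defined(x) & 31;
}
```

Under this model, a table returning 0 or 32 for zero input is accepted only when the target reports CTZ_DEFINED_VALUE_AT_ZERO == 2. This is why moving x86 to that setting would let the transformation apply without tzcnt.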
