https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838
--- Comment #19 from Gabriel Ravier <gabravier at gmail dot com> ---
(In reply to Jakub Jelinek from comment #14)
> The patch does:
> +  bool zero_ok = CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), ctzval) == 2;
> +
> +  /* Skip if there is no value defined at zero, or if we can't easily
> +     return the correct value for zero.  */
> +  if (!zero_ok)
> +    return false;
> +  if (zero_val != ctzval && !(zero_val == 0 && ctzval == type_size))
> +    return false;
> For CTZ_DEFINED_VALUE_AT_ZERO == 1 we could support it the same way but we'd
> need to emit into the IL an equivalent of val == 0 ? zero_val : .CTZ (val)
> (with GIMPLE_COND and a separate bb - not sure if anything in forwprop
> creates new basic blocks right now), where there is a high chance that RTL
> opts would turn it back into unconditional ctz.
> That still wouldn't help non--mbmi x86, because CTZ_DEFINED_VALUE_AT_ZERO is
> 0 there.
> We could handle even that case by doing the branches around, but those would
> stay there in the generated code, at which point I wonder whether it would
> be a win. The original code is branchless...

If the original code being branchless makes it faster, wouldn't that imply that
we should use the table-based implementation when generating code for
`__builtin_ctz`?
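
For concreteness, here is a minimal sketch of the two shapes being compared. It assumes the 32-bit de Bruijn idiom with the widely used constant 0x077CB531; the function names ctz_table_lookup and ctz_guarded are hypothetical and are not taken from the patch or the testcase in this PR. The first function is branchless and happens to return 0 for an input of 0; the second is the source-level equivalent of val == 0 ? zero_val : .CTZ (val).

```c
#include <stdint.h>

/* Lookup table for the 32-bit de Bruijn sequence 0x077CB531.  */
static const unsigned char ctz_table[32] = {
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* Table-based count of trailing zeros (hypothetical name).
   Branchless: isolate the lowest set bit, multiply, and index the table
   with the top five bits.  For v == 0 the index is 0, so the result is 0.  */
static int
ctz_table_lookup (uint32_t v)
{
  return ctz_table[((v & -v) * UINT32_C (0x077CB531)) >> 27];
}

/* Guarded builtin (hypothetical name).  __builtin_ctz (0) is undefined,
   so the zero case is selected explicitly; whether this stays a branch
   in the generated code depends on the target.  */
static int
ctz_guarded (uint32_t v, int zero_val)
{
  return v == 0 ? zero_val : __builtin_ctz (v);
}
```

With zero_val == 0 the two functions agree on every input, which is the zero_val == 0 case the quoted check has to account for.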