https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838
--- Comment #15 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #14)
> The patch does:
> +  bool zero_ok = CTZ_DEFINED_VALUE_AT_ZERO (TYPE_MODE (type), ctzval) == 2;
> +
> +  /* Skip if there is no value defined at zero, or if we can't easily
> +     return the correct value for zero.  */
> +  if (!zero_ok)
> +    return false;
> +  if (zero_val != ctzval && !(zero_val == 0 && ctzval == type_size))
> +    return false;
> For CTZ_DEFINED_VALUE_AT_ZERO == 1 we could support it the same way but we'd
> need to emit into the IL an equivalent of val == 0 ? zero_val : .CTZ (val)
> (with GIMPLE_COND and a separate bb - not sure if anything in forwprop creates
> new basic blocks right now), where there is a high chance that RTL opts would
> turn it back into unconditional ctz.
> That still wouldn't help non--mbmi x86, because CTZ_DEFINED_VALUE_AT_ZERO is
> 0 there.
> We could handle even that case by doing the branches around, but those would
> stay there in the generated code, at which point I wonder whether it would
> be a win. The original code is branchless...

It would make more sense to move x86 backends to CTZ_DEFINED_VALUE_AT_ZERO == 2
so that you always get the same result even when you don't have tzcnt. A
conditional move would be possible, so it adds an extra 2 instructions at worst
(i.e. still significantly faster than doing the table lookup, multiply etc).
And it could be optimized when you know CLZ/CTZ input is non-zero.
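
For reference, a minimal source-level sketch of the two forms being compared. This is not the patch and not what the compiler emits; the names ctz_table and ctz_or are made up for illustration. It assumes the table-lookup idiom is the usual 32-bit De Bruijn multiply (constant 0x077CB531), which returns 0 for a zero input, and it uses __builtin_ctz guarded by a select, which is the val == 0 ? zero_val : .CTZ (val) shape from the quoted text. Whether the select ends up as a conditional move depends on the target and options.

```c
#include <stdint.h>

/* Hypothetical example of the branchless idiom: isolate the lowest set
   bit, multiply by a De Bruijn constant, and use the top 5 bits as a
   table index.  For v == 0 the index is 0, so the result is 0.  */
static const unsigned char debruijn_pos[32] = {
    0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
   31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

static inline int ctz_table(uint32_t v)
{
    return debruijn_pos[((v & -v) * UINT32_C(0x077CB531)) >> 27];
}

/* Hypothetical helper: ctz with an explicit value at zero.
   __builtin_ctz is undefined for 0, so the select supplies zero_val;
   a compiler may turn this into a compare plus conditional move.  */
static inline int ctz_or(uint32_t v, int zero_val)
{
    return v != 0 ? __builtin_ctz(v) : zero_val;
}
```

With zero_val == 0, ctz_or (v, 0) agrees with ctz_table (v) for every input, including 0; with zero_val == 32 it matches the "ctzval == type_size" case in the quoted check.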