https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118342
--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jakub Jelinek from comment #6)
> Yes, so we can use it for a == 0 ? prec : __builtin_ctzll (a); but not say
> (with small middle-end enhancements) for a == 0 ? -1 : __builtin_ctzll (a);
> because on some CPUs that would yield -1LL and on others 0xffffffffULL.
As I read the footnote from comment #4, the problem would be with:
long long foo (int a)
{
  return a ? __builtin_ctz (a) : -1ll;
}
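To pin down what foo must return, here is a minimal portable sketch of the intended semantics. The names ctz32_ref and foo_ref are hypothetical helpers introduced only for illustration, not GCC functions; the point is that the a == 0 case has to produce a full 64-bit -1, not just 32 set bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper: number of trailing zero bits in x.
   Like __builtin_ctz, it is only meaningful for x != 0.  */
static int ctz32_ref (uint32_t x)
{
  int n = 0;

  assert (x != 0);
  while ((x & 1u) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Hypothetical reference for foo: when a == 0, all 64 bits of the
   result are set; otherwise the result is the trailing-zero count.  */
static long long foo_ref (int a)
{
  return a ? ctz32_ref ((uint32_t) a) : -1ll;
}
```

Any code sequence chosen for foo has to agree with foo_ref on every input, including 0.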
We declare:
(define_insn_and_split "ctz<mode>2"
  [(set (match_operand:SWI48 0 "register_operand" "=r")
        (ctz:SWI48
          (match_operand:SWI48 1 "nonimmediate_operand" "rm")))
   (clobber (reg:CC FLAGS_REG))]
without strict_low_part, so for ctzsi2 there is no guarantee that the bits
outside the SImode low part of the register will be preserved.
OTOH, we also declare:
(define_insn_and_split "*ctzsidi2_<s>ext"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (any_extend:DI
          (ctz:SI
            (match_operand:SI 1 "nonimmediate_operand" "rm"))))
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_64BIT"
{
  if (TARGET_BMI)
    return "tzcnt{l}\t{%1, %k0|%k0, %1}";
  else if (TARGET_CPU_P (GENERIC)
           && !optimize_function_for_size_p (cfun))
    /* tzcnt expands to 'rep bsf' and we can use it even if !TARGET_BMI. */
    return "rep%; bsf{l}\t{%1, %k0|%k0, %1}";
  return "bsf{l}\t{%1, %k0|%k0, %1}";
}
which *may* clear the upper 32 bits when the input operand is 0. So,
movq $-1, %rax
rep bsfl %edi, %eax
ret
would be risky, because bsfl *may* clobber the high part of %rax when %edi is 0.
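
The hazard can be modelled in plain C. This is only a sketch under stated assumptions, not a description of any particular CPU: bsfl_zero_input is a hypothetical helper that takes the previous 64-bit contents of the destination register and a flag selecting one of the two behaviours discussed in this PR for a zero source operand (destination left untouched, or upper 32 bits zeroed as with an ordinary 32-bit register write).

```c
#include <stdint.h>

/* Hypothetical model of "bsfl %edi, %eax" when %edi == 0.
   old_rax is the previous 64-bit content of the destination register;
   clears_upper selects which of the two assumed behaviours applies.  */
static uint64_t bsfl_zero_input (uint64_t old_rax, int clears_upper)
{
  if (clears_upper)
    /* Low 32 bits kept, upper 32 bits zeroed.  */
    return old_rax & 0xffffffffull;

  /* Destination register left completely untouched.  */
  return old_rax;
}
```

With old_rax preloaded to -1 by the movq, the model returns either 0xffffffffffffffff or 0x00000000ffffffff, which is the -1LL versus 0xffffffffULL split from comment #6.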