https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106471

--- Comment #5 from Linus Torvalds <torva...@linux-foundation.org> ---
(In reply to Andrew Pinski from comment #2)
> The xor is needed because of an errata in some Intel cores.

The only erratum I'm aware of is that the 'rep bsf' encoding can execute as
tzcnt even when cpuid doesn't enumerate tzcnt (so it would be expected to act
as just bsf). Errata 010 for Gen 8/9 cores.

And yes, that's an erratum, but the xor doesn't really help.

Sure, the xor means that on old machines, where 'rep' is ignored, and tzcnt
will always act as bsf, the result register is now going to be zero if the
input is zero.

But that's

 (a) not what tzcnt does (it sets the result to 64 when the input is zero)

 (b) not what __builtin_ctzl() is documented to do anyway

In particular, wrt (b), the documentation already states

 "If x is 0, the result is undefined"

which is exactly the old legacy 'bsf' behavior.

And the erratum I'm aware of is that 'rep bsf' acts as tzcnt (i.e. "write 64
to destination" instead of "leave unmodified"), so even with the xor you still
get an unpredictable result for a zero input (0 or 64 depending on CPU).

So both (a) and (b) argue for that xor being wrong.
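
To make (b) concrete: since the builtin is documented as undefined for a zero
input, any code that wants a defined answer has to test for zero itself, with
or without the xor. A minimal sketch, where ctz64_or_64 is a made-up helper
name (not anything in gcc) and returning 64 is just an assumption that mirrors
what tzcnt does:

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not part of gcc).
 * Returns the number of trailing zero bits in x, or 64 when x is zero,
 * which is the value tzcnt would produce for a zero input.
 */
static inline unsigned ctz64_or_64(uint64_t x)
{
    /* __builtin_ctzll(0) is undefined, so the zero case is handled here. */
    return x ? (unsigned)__builtin_ctzll(x) : 64u;
}
```

Callers that already know the input is non-zero can keep using the builtin
directly and never depend on what ends up in the register for zero.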

Now, of course, there may be some other errata that I'm not aware of. Can
somebody point to it?

(And yes, on old CPUs that don't have tzcnt at all the added xor will break a
false dependency and maybe help performance, but should gcc really care about
old CPUs like that? Particularly when it eats a register and makes it
impossible to have the same source and destination register?)

> There is a different bug already recording the issue with cltq (and should
> be fixed soon or was already committed, there is a patch).

Ok, thanks.
