Richard Guenther writes:

>As I said - at least for AMD CPUs - it looks like you can freely
>interchange the ps|pd or integer variants of the bitwise and/or
>operations without a penalty.
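As a concrete example of the three variants, here is a minimal sketch. It assumes x86-64 with SSE2 intrinsics, and the function name negate3 is hypothetical. The sketch negates two doubles by XORing them with a sign-bit mask in all three forms: _mm_xor_pd (XORPD), _mm_xor_ps (XORPS) and _mm_xor_si128 (PXOR), and checks that the results are bit-for-bit identical. That equivalence is why the only open question is performance. The intrinsics express intent only, and the compiler may emit a different instruction, so this demonstrates identical results, not which instruction the CPU actually executes.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics (also pulls in SSE xmmintrin.h) */
#include <string.h>

/* Hypothetical helper: flip the sign bit of both doubles in `in`,
   writing the result to `out`. The mask -0.0 has only the sign bit
   set, so XOR negates each element. The same XOR is performed in all
   three forms and the results are asserted to be bit-for-bit equal. */
void negate3(const double in[2], double out[2])
{
    __m128d v    = _mm_loadu_pd(in);
    __m128d mask = _mm_set1_pd(-0.0);

    /* Double-precision form (XORPD) */
    __m128d r_pd = _mm_xor_pd(v, mask);

    /* Single-precision form (XORPS); the casts only reinterpret bits */
    __m128d r_ps = _mm_castps_pd(
        _mm_xor_ps(_mm_castpd_ps(v), _mm_castpd_ps(mask)));

    /* Integer form (PXOR); again the casts are free reinterpretations */
    __m128d r_si = _mm_castsi128_pd(
        _mm_xor_si128(_mm_castpd_si128(v), _mm_castpd_si128(mask)));

    double b[2], c[2];
    _mm_storeu_pd(out, r_pd);
    _mm_storeu_pd(b, r_ps);
    _mm_storeu_pd(c, r_si);

    /* All three variants produce exactly the same 128 bits */
    assert(memcmp(out, b, sizeof b) == 0);
    assert(memcmp(out, c, sizeof c) == 0);
}
```

Calling negate3 on {1.0, -2.5} yields {-1.0, 2.5} from each of the three forms.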
An example in AMD's "Software Optimization Guide for AMD64 Processors" suggests that you can't freely interchange them. In the example it gives for using XOR to negate a double-precision vector, it uses XORPD. If PXOR, XORPS and XORPD were all interchangeable, it should have used XORPS, since XORPS is a byte shorter than XORPD (it lacks the 66h prefix).

The guide also says: "When it is necessary to zero out an XMM register, use an instruction whose format matches the format required by the consumers of the zeroed register. ... When an XMM register must be set to zero, using the appropriate instruction helps reduce the chance of any performance penalty later."

This advice differs from Intel's. For Pentium 4 processors, Intel recommends always using PXOR to clear XMM registers, because that instruction breaks dependency chains while XORPS and XORPD don't. Only the newer Intel Core processors break dependency chains with all three instructions.

Ross Ridge
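The AMD guide's zeroing advice can be sketched as follows, assuming x86-64 and GCC/Clang-style inline assembly; zero_pd and sum_pd are hypothetical names. The accumulator is consumed by ADDPD, so it is cleared with XORPD rather than PXOR, matching the format of its consumer. Inline assembly is used only because _mm_setzero_pd leaves the choice of zeroing instruction to the compiler. Whether this actually avoids a penalty depends on the processor, as discussed above.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical helper: return an all-zero XMM register cleared with
   XORPD, i.e. in the double-precision format. XORing a register with
   itself gives zero regardless of its previous contents. */
static inline __m128d zero_pd(void)
{
    __m128d z;
    __asm__("xorpd %0, %0" : "=x"(z));
    return z;
}

/* Hypothetical example: sum n doubles. The XORPD-zeroed accumulator
   is consumed by ADDPD (_mm_add_pd), so producer and consumer use
   the same double-precision format. */
double sum_pd(const double *p, size_t n)
{
    assert(p != NULL || n == 0);

    __m128d acc = zero_pd();
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(p + i));

    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double s = lanes[0] + lanes[1];
    if (n & 1)              /* odd count: add the leftover element */
        s += p[n - 1];
    return s;
}
```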