On Friday, 5 February 2016 at 17:33:55 UTC, Era Scarecrow wrote:
On Friday, 5 February 2016 at 15:41:11 UTC, Fool wrote:
One swap usually decomposes into three moves.

Recently from reading some interesting hacks and super code, a good swap can also be done via 3 xor operations (and avoiding an intermediate storage unit).

http://aggregate.ee.engr.uky.edu/MAGIC/#Swap%20Values%20Without%20a%20Temporary

x ^= y;
y ^= x;
x ^= y;

Since these can be used directly as registers it might be faster (assuming all 5 are mapped to registers until the final write out, which might not work).

It's a cool trick but I would be surprised if this were actually faster. Modern x64 processors have 16 "general purpose" registers so saving a single register for a simple set of instructions is likely to not have any benefit. My other thought is that the compiler may not recognize this pattern, making it more difficult to optimize. On the other hand, compilers are very good at rearranging simple reads and writes to avoid stalling the pipeline in the CPU.

Reply via email to