On Fri, 2017-01-13 at 23:22 +0530, 'Naveen N. Rao' wrote:
> > That rather depends on whether the processor has a store to load forwarder
> > that will satisfy the read from the store buffer.
> > I don't know about ppc, but at least some x86 will do that.
>
> Interesting - good to know that.
>
> However, I don't think powerpc does that and in-register swap is likely
> faster regardless. Note also that gcc prefers this form at higher
> optimization levels.

Of course powerpc has a load-store forwarder these days. However, I
wouldn't be surprised if the in-register form was still faster on some
implementations, but this needs to be tested.

Ideally, you'd want to try to "optimize" load+swap or swap+store though.

Cheers,
Ben.
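
For reference, the two forms being compared can be sketched in portable C as below. This is only an illustration of the semantics, not code from the patch under discussion: the function names are made up, and a compiler is free to turn either one into whatever it thinks is best (on powerpc the memory form would roughly correspond to a store followed by a byte-reversed load such as lwbrx). It says nothing about which is faster; that still needs measuring on real hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* In-register form: swap the four bytes using only shifts and masks.
 * (Hypothetical helper name, for illustration only.) */
static uint32_t swap32_in_register(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) |
           (x << 24);
}

/* Via-memory form: store the value, then read its bytes back in
 * reverse order. Reversing the byte order in memory yields the
 * byte-swapped value regardless of host endianness.
 * (Hypothetical helper name, for illustration only.) */
static uint32_t swap32_via_memory(uint32_t x)
{
    unsigned char b[4];
    unsigned char r[4];
    uint32_t out;

    memcpy(b, &x, sizeof(b));      /* the "store" */
    r[0] = b[3];
    r[1] = b[2];
    r[2] = b[1];
    r[3] = b[0];
    memcpy(&out, r, sizeof(out));  /* the byte-reversed "load" */
    return out;
}

/* Sanity check that both forms agree for a given input. */
static void check_swap_forms(uint32_t x)
{
    assert(swap32_in_register(x) == swap32_via_memory(x));
}
```

Both helpers return the same value for any input, so timing the two in a loop on the target implementation would be one way to do the testing mentioned above.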