On Tue, Feb 02, 2016 at 11:55:57AM -0800, Linus Torvalds wrote:
> On Tue, Feb 2, 2016 at 11:30 AM, Will Deacon <will.dea...@arm.com> wrote:
> >
> > FWIW, and this is by no means conclusive, I hacked that up quickly and
> > ran hackbench a few times on the nearest idle arm64 system. The results
> > were consistently ~4% slower using acquire for rcu_dereference.
>
> Ok, that's *much* more noticeable than I would have expected. I take
> it that load-acquire is really really slow on current arm64
> implementations.

See my reply to Ingo, but it seems a bunch of this was down to rebooting
the system between runs and hackbench being particularly susceptible to
that.

> Just out of interest, is store-release slow too? Because that should
> be easy to make fast.

There's a slight gotcha with arm64's store-release instruction in that
it's RCsc and therefore orders against a subsequent load-acquire. That's
not to say you can't make it fast, but it's potentially more involved
than posting a flag in a store buffer (or whatever you were envisaging :)

Measuring store-release is much more difficult, because you can't replace
it with a dependency or the like, only other barrier constructs.

Will
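For readers following along: rcu_dereference normally relies on an address dependency, and the experiment above swapped that for a load-acquire. The sketch below shows the release/acquire publish pattern in portable C11 atomics rather than kernel primitives; struct item, publish(), subscribe() and run_demo() are hypothetical names for illustration only. On arm64, compilers typically emit STLR/LDAR for these orderings, but note that C11 release/acquire alone does not promise the store-release to later load-acquire (different location) ordering that the RCsc point above refers to; only memory_order_seq_cst gives that at the language level.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical payload published by a writer and read by a reader. */
struct item {
    int value;
};

/* Shared pointer: NULL until the writer publishes an initialised item. */
static _Atomic(struct item *) shared = NULL;

/* Writer: initialise the object, then publish it with a store-release so
 * the initialisation cannot become visible after the pointer itself. */
static void *publish(void *arg)
{
    struct item *p = malloc(sizeof(*p));
    if (!p)
        abort(); /* the reader would otherwise spin forever */
    p->value = *(int *)arg;
    atomic_store_explicit(&shared, p, memory_order_release);
    return NULL;
}

/* Reader: the load-acquire pairs with the store-release above, so any
 * non-NULL pointer observed here refers to a fully initialised item. */
static int subscribe(void)
{
    struct item *p;
    while ((p = atomic_load_explicit(&shared, memory_order_acquire)) == NULL)
        ; /* spin until the writer publishes */
    return p->value;
}

/* Hypothetical driver: publish v from a second thread and read it back. */
static int run_demo(int v)
{
    pthread_t t;
    int got;

    if (pthread_create(&t, NULL, publish, &v) != 0)
        return -1;
    got = subscribe();
    pthread_join(t, NULL);
    /* Unpublish and free the item so the demo can be run again. */
    free(atomic_exchange(&shared, NULL));
    return got;
}
```

Usage, compiled with -pthread: run_demo(42) returns 42. The reader can never see a published pointer whose value field is still uninitialised, which is the guarantee rcu_dereference gets from the address dependency without an explicit acquire.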