On Mon, Nov 16, 2020 at 09:10:54AM +0000, Mel Gorman wrote:
> Similarly, it's not clear why the arm64 implementation
> does not call smp_acquire__after_ctrl_dep in the smp_load_acquire
> implementation. Even when it was introduced, the arm64 implementation
> differed significantly from the arm implementation in terms of what
> barriers it used for non-obvious reasons.

This is because ARM64's smp_cond_load_acquire() implementation uses
smp_load_acquire() directly, as opposed to the generic version, which
spins on READ_ONCE() and then needs smp_acquire__after_ctrl_dep() to
upgrade the control dependency to ACQUIRE. ARM64 has a load-acquire
instruction, which is highly optimized and generally considered cheaper
than the smp_rmb() from smp_acquire__after_ctrl_dep(). Or so I've been
led to believe.
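
To make the difference concrete, here is a rough userspace sketch using
C11 atomics rather than the kernel primitives. The function names are
made up for illustration, the acquire fence is only a stand-in for
smp_rmb(), and the real arm64 code does more than this (it is not a
plain busy loop), so treat it as an analogy and not as the kernel source.

```c
#include <stdatomic.h>

/*
 * Generic shape (hypothetical name): spin with relaxed loads, the
 * analogue of READ_ONCE(), then issue a fence after the loop exits.
 * The fence plays the role of smp_acquire__after_ctrl_dep().
 */
static inline int cond_load_acquire_generic(_Atomic int *p, int want)
{
	int v;

	while ((v = atomic_load_explicit(p, memory_order_relaxed)) != want)
		;	/* busy-wait until *p == want */

	atomic_thread_fence(memory_order_acquire);
	return v;
}

/*
 * arm64-like shape (hypothetical name): every load in the loop is
 * already a load-acquire, so no trailing barrier is needed.
 */
static inline int cond_load_acquire_ldacq(_Atomic int *p, int want)
{
	int v;

	while ((v = atomic_load_explicit(p, memory_order_acquire)) != want)
		;	/* busy-wait until *p == want */

	return v;
}
```

Both variants give the caller ACQUIRE ordering on the final load; they
only differ in where the ordering comes from.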

