On Mon, Nov 16, 2020 at 01:46:57PM +0100, Peter Zijlstra wrote:
> On Mon, Nov 16, 2020 at 09:10:54AM +0000, Mel Gorman wrote:
> > Similarly, it's not clear why the arm64 implementation
> > does not call smp_acquire__after_ctrl_dep in the smp_load_acquire
> > implementation. Even when it was introduced, the arm64 implementation
> > differed significantly from the arm implementation in terms of what
> > barriers it used for non-obvious reasons.
>
> This is because ARM64's smp_cond_load_acquire() implementation uses
> smp_load_acquire() directly, as opposed to the generic version that uses
> READ_ONCE().
>
> This is because ARM64 has a load-acquire instruction, which is highly
> optimized, and generally considered cheaper than the smp_rmb() from
> smp_acquire__after_ctrl_dep().
>
> Or so I've been led to believe.
Fair enough. Either way, putting a barrier around sched_contributes_to_load
"works", but it's clumsy and may not be guaranteed to be correct. The bits
should have been protected by the rq lock, but sched_remote_wakeup is
updated outside of the lock, which might be leading to adjacent fields
(like sched_contributes_to_load) getting corrupted as per the
"anti-guarantees" in memory-barriers.txt. The rq lock could be
conditionally acquired in __ttwu_queue_wakelist() for WF_MIGRATED and
explicitly cleared in sched_ttwu_pending() (not tested whether this
works), but it would also suck to acquire a remote lock when that's
exactly what we're trying to avoid in that path.

-- 
Mel Gorman
SUSE Labs