On Fri, Sep 8, 2017 at 4:07 PM, Andy Lutomirski <l...@kernel.org> wrote:
>
> I *think* this is impossible because CPU A's mm_cpumask manipulations
> are atomic and should therefore force out the streaming write buffers,
> but maybe there's some other scenario where this matters.
I don't think atomic memops do that. They enforce globally visible
ordering, but since they happen in the cache and are not actually
visible to the outside, that doesn't actually affect any streaming
write buffers.

Then, if somebody else requests a cacheline that we have exclusive
ownership of, the write buffers just need to flush before we give up
that cacheline.

So a locked memory op is *not* serializing; it only enforces memory
ordering. Big difference.

Only fully serializing instructions will serialize with the write
buffers, and they are expensive as hell (partly exactly _due_ to these
kinds of issues).

So this change to delay invalidation does sound fairly scary..

                 Linus