On Tue, 1 Aug 2017 13:00:23 +0200 Peter Zijlstra <pet...@infradead.org> wrote:

> On Tue, Aug 01, 2017 at 08:39:28PM +1000, Nicholas Piggin wrote:
> > Right, I just don't see what real problem this opens up that you don't
> > already have when you are not hard partitioned, therefore it doesn't
> > make sense to add a slowdown to the context switch fastpath to close
> > one hole in the sieve.
> >
> > Completely recognizing that other architectures can do it without
> > taking rq lock at all and will not be forced to do so.
>
> If we can limit this to hard partitioned, that would be good indeed.
>
> I'm just trying to avoid having two implementations of this thing. At the
> same time I very much understand your reluctance to add this barrier.

Well, I think we could have some kind of
for_each_cpu_where_this_process_is_running macro that abstracts the
arch details (rough sketch at the bottom of this mail). Presumably
we're already going to get two implementations of that one -- I can't
imagine x86 would be happy with doing a for_all_cpus iteration just
because arm does not have the cpumask. powerpc will only make that 3 :)

> In any case, supposing we can do that intent thing. How horrible would
> something like:
>
>
> context_switch()
>   if (unlikely(mm->needs_barrier))
>     smp_mb__after_unlock_lock();
>
>
> be? We only need the extra barrier when we switch _into_ mm's that care
> about sys_membarrier() in the first place. At which point they pay the
> price.

Not beautiful :) and it would also have to have an arch specific bit
on the other side. Although yes, it gives a different way to reduce
the cost without taking the rq lock.

So Paul and googling filled me in on the importance of this syscall.
I do also appreciate the concern about taking the rq lock. I just think
maybe we (powerpc) pay a few more cycles in the new syscall rather
than in context switch. It will take a little while to get a good idea
of performance and behaviour on bigger systems where this will matter
most.
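
To make the macro idea a bit more concrete, here's a very rough,
untested sketch -- the config symbol is made up purely for
illustration -- of how the per-arch difference could be hidden behind
one iterator:

/*
 * Rough sketch only: hide the per-arch "which CPUs might be running
 * this mm" question behind one iterator, so the membarrier code does
 * not care whether the arch maintains an mm_cpumask or has to fall
 * back to every online CPU.
 *
 * CONFIG_ARCH_HAS_MM_CPUMASK is an invented symbol, not a real
 * Kconfig option.
 */
#ifdef CONFIG_ARCH_HAS_MM_CPUMASK
#define for_each_cpu_where_this_process_is_running(cpu, mm)	\
	for_each_cpu((cpu), mm_cpumask((mm)))
#else
#define for_each_cpu_where_this_process_is_running(cpu, mm)	\
	for_each_online_cpu((cpu))	/* conservative fallback */
#endif

The caller would still need whatever ordering makes the answer stable
(the rq lock on our side, or your context_switch() barrier); this only
abstracts which CPUs to look at.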