On Sat, Jun 2, 2018 at 6:38 PM Rik van Riel <[email protected]> wrote:
>
> On Sun, 2018-06-03 at 00:51 +0000, Song Liu wrote:
> > > Just to check: in the workload where you're seeing this problem,
> > > are you using an mm with many threads? I would imagine that, if
> > > you only have one or two threads, the bit operations aren't so bad.
> >
> > Yes, we are running netperf/netserver with 300 threads. We don't see
> > this much overhead in with real workload.
>
> We may not, but there are some crazy workloads out
> there in the world. Think of some Java programs with
> thousands of threads, causing a million context
> switches a second on a large system.
>
> I like Andy's idea of having one cache line with
> a cpumask per node. That seems like it will have
> fewer downsides for tasks with fewer threads running
> on giant systems.
>
> I'll throw out the code I was working on, and look
> into implementing that :)
>
I'm not sure you should throw your patch out. It's a decent idea, too.

