The NUMA balancing code spends way too much CPU time scanning and
faulting when running multi-threaded workloads.

This patch set slows down NUMA PTE scanning when there are many
shared faults, and for large NUMA groups in which a large fraction
of the faults are shared.
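
As a rough illustration of the idea only (this is not the code from
the patches), the sketch below stretches a scan period as the shared
share of faults grows. The helper name scaled_scan_period(), its
parameters, and the linear scaling rule are all assumptions made for
the example.

```c
/*
 * Hypothetical helper, NOT the kernel implementation: lengthen the
 * scan period in proportion to the fraction of shared faults.
 * All names and the scaling rule are illustrative assumptions.
 */
static unsigned int scaled_scan_period(unsigned int base_ms,
				       unsigned long private_faults,
				       unsigned long shared_faults,
				       unsigned int max_ms)
{
	unsigned long total = private_faults + shared_faults;
	unsigned long period;

	/* No fault statistics yet: keep the base period. */
	if (total == 0)
		return base_ms;

	/* period = base * (1 + shared / total), in integer arithmetic. */
	period = base_ms + ((unsigned long)base_ms * shared_faults) / total;

	/* Never scan more slowly than the configured ceiling. */
	if (period > max_ms)
		period = max_ms;

	return (unsigned int)period;
}
```

With a 1000 ms base period, a task whose faults are all private keeps
the 1000 ms period, an even private/shared split gives 1500 ms, and a
task with only shared faults is scanned every 2000 ms, subject to the
max_ms ceiling.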

Some results from Jirka's half-week performance run, on
a 4-node system:
- NAS benchmarks - improvements in the range of 10-30%
  (mostly ft and lu subtests)
- SPECjbb2005 single instance mode - improvements in the range of 5-10%
- SPECjvm2008 - performance very similar to before, some small
  improvements for the scimark* subtests
