Hi,

On 2015-06-10 09:54:00 -0400, Jan Wieck wrote:
> model name : Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz
> numactl --hardware shows the distance to the attached memory as 10, the
> distance to every other node as 21. I interpret that as the machine having
> one NUMA bus with all cpu packages attached to that, rather than individual
> connections from cpu to cpu or something different.

Generally that doesn't say very much - IIRC the distances are defined by the
BIOS.

> What led me into that spinlock area was the fact that a wall clock based
> systemtap FlameGraph showed a large portion of the time spent in
> BufferPin() and BufferUnpin().

I've seen that as a bottleneck in the past as well. My plan to fix that is to
"simply" make buffer pinning lockless for the majority of cases. I don't have
access to hardware to test that at higher node counts atm though.

My guess is that the pins are on the btree root pages. But it'd be good to
confirm that.

> >Maybe we need to adjust the amount of spinning, but to me such drastic
> >differences are a hint that we should tackle the actual contention
> >point. Often a spinlock for something regularly heavily contended can be
> >worse than a queued lock.
>
> I have the impression that the code assumes that there is little penalty for
> accessing the shared byte in a tight loop from any number of cores in
> parallel. That apparently is true for some architectures and core counts,
> but no longer holds for these machines with many sockets.

It's just generally a tradeoff. It's beneficial to spin longer if there's
only a mild amount of contention: if the likelihood of getting the spinlock
soon is high (i.e. existing, but low contention), it'll nearly always be
beneficial to spin. If the likelihood is low, it'll mostly be beneficial to
sleep. The latter is especially true if a machine is sufficiently
overcommitted that a process is likely to be put to sleep while holding a
spinlock. That danger - sleeping while holding a spinlock, without a targeted
wakeup - is why spinlocks in userspace aren't a really good idea.

My bet is that if you measured with different iteration counts for different
spinlocks, you'd find some where a higher number of iterations is rather
beneficial as well.

Greetings,

Andres Freund
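
To make that tradeoff concrete, here is a minimal sketch of a spin-then-sleep
acquire loop, written against generic C11 atomics rather than PostgreSQL's
actual s_lock()/SpinLockAcquire() code; my_spinlock_acquire(), MAX_SPINS and
the 1ms sleep are made-up names and tuning knobs, not anything in the tree:

/*
 * Illustrative sketch only -- not PostgreSQL's s_lock() implementation.
 * Spin for a bounded number of iterations (cheap when contention is low
 * and the holder releases soon), then fall back to sleeping (cheaper when
 * the lock is heavily contended or the holder has been descheduled).
 */
#include <stdatomic.h>
#include <unistd.h>

#define MAX_SPINS 1000			/* hypothetical per-lock iteration budget */

typedef atomic_flag my_spinlock;	/* initialize with ATOMIC_FLAG_INIT */

static void
my_spinlock_acquire(my_spinlock *lock)
{
	int			spins = 0;

	/* test-and-set in a tight loop; fine while contention is mild */
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
	{
		if (++spins < MAX_SPINS)
			continue;			/* a real loop would add a cpu "pause" hint */

		/*
		 * The lock is probably held for a while, or the holder has been
		 * put to sleep by the scheduler: stop burning cycles and sleep.
		 * Without a targeted wakeup we can only poll, which is exactly
		 * the weakness of userspace spinlocks mentioned above.
		 */
		usleep(1000);
		spins = 0;
	}
}

static void
my_spinlock_release(my_spinlock *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

Varying MAX_SPINS per lock would be the kind of experiment meant above when
talking about measuring different iteration counts for different spinlocks.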