čt 16. 4. 2020 v 9:18 odesílatel Amit Khandekar <amitdkhan...@gmail.com> napsal:
> On Mon, 13 Apr 2020 at 20:16, Amit Khandekar <amitdkhan...@gmail.com> > wrote: > > On Sat, 11 Apr 2020 at 04:18, Tom Lane <t...@sss.pgh.pa.us> wrote: > > > > > > I wrote: > > > > A more useful test would be to directly experiment with contended > > > > spinlocks. As I recall, we had some test cases laying about when > > > > we were fooling with the spin delay stuff on Intel --- maybe > > > > resurrecting one of those would be useful? > > > > > > The last really significant performance testing we did in this area > > > seems to have been in this thread: > > > > > > > https://www.postgresql.org/message-id/flat/CA%2BTgmoZvATZV%2BeLh3U35jaNnwwzLL5ewUU_-t0X%3DT0Qwas%2BZdA%40mail.gmail.com > > > > > > A relevant point from that is Haas' comment > > > > > > I think optimizing spinlocks for machines with only a few CPUs is > > > probably pointless. Based on what I've seen so far, spinlock > > > contention even at 16 CPUs is negligible pretty much no matter what > > > you do. Whether your implementation is fast or slow isn't going to > > > matter, because even an inefficient implementation will account for > > > only a negligible percentage of the total CPU time - much less > than 1% > > > - as opposed to a 64-core machine, where it's not that hard to find > > > cases where spin-waits consume the *majority* of available CPU time > > > (recall previous discussion of lseek). > > > > Yeah, will check if I find some machines with large cores. > > I got hold of a 32 CPUs VM (actually it was a 16-core, but being > hyperthreaded, CPUs were 32). > It was an Intel Xeon , 3Gz CPU. 15G available memory. Hypervisor : > KVM. Single NUMA node. > PG parameters changed : shared_buffer: 8G ; max_connections : 1000 > > I compared pgbench results with HEAD versus PAUSE removed like this : > perform_spin_delay(SpinDelayStatus *status) > { > - /* CPU-specific delay each time through the loop */ > - SPIN_DELAY(); > > Ran with increasing number of parallel clients : > pgbench -S -c $num -j $num -T 60 -M prepared > But couldn't find any significant change in the TPS numbers with or > without PAUSE: > > Clients HEAD Without_PAUSE > 8 244446 247264 > 16 399939 399549 > 24 454189 453244 > 32 1097592 1098844 > 40 1090424 1087984 > 48 1068645 1075173 > 64 1035035 1039973 > 96 976578 970699 > > May be it will indeed show some difference only with around 64 cores, > or perhaps a bare metal machine will help; but as of now I didn't get > such a machine. Anyways, I thought why not archive the results with > whatever I have. > > Not relevant to the PAUSE stuff .... Note that when the parallel > clients reach from 24 to 32 (which equals the machine CPUs), the TPS > shoots from 454189 to 1097592 which is more than double speed gain > with just a 30% increase in parallel sessions. I was not expecting > this much speed gain, because, with contended scenario already pgbench > processes are already taking around 20% of the total CPU time of > pgbench run. May be later on, I will get a chance to run with some > customized pgbench script that runs a server function which keeps on > running an index scan on pgbench_accounts, so as to make pgbench > clients almost idle. > what I know, pgbench cannot be used for testing spinlocks problems. Maybe you can see this issue when a) use higher number clients - hundreds, thousands. Decrease share memory, so there will be press on related spin lock. Regards Pavel > Thanks > -Amit Khandekar > > >