Re: spin_delay() for ARM

Pavel Stehule Thu, 16 Apr 2020 00:34:26 -0700

čt 16. 4. 2020 v 9:18 odesílatel Amit Khandekar <amitdkhan...@gmail.com>
napsal:


> On Mon, 13 Apr 2020 at 20:16, Amit Khandekar <amitdkhan...@gmail.com>
> wrote:
> > On Sat, 11 Apr 2020 at 04:18, Tom Lane <t...@sss.pgh.pa.us> wrote:
> > >
> > > I wrote:
> > > > A more useful test would be to directly experiment with contended
> > > > spinlocks.  As I recall, we had some test cases laying about when
> > > > we were fooling with the spin delay stuff on Intel --- maybe
> > > > resurrecting one of those would be useful?
> > >
> > > The last really significant performance testing we did in this area
> > > seems to have been in this thread:
> > >
> > >
> https://www.postgresql.org/message-id/flat/CA%2BTgmoZvATZV%2BeLh3U35jaNnwwzLL5ewUU_-t0X%3DT0Qwas%2BZdA%40mail.gmail.com
> > >
> > > A relevant point from that is Haas' comment
> > >
> > >     I think optimizing spinlocks for machines with only a few CPUs is
> > >     probably pointless.  Based on what I've seen so far, spinlock
> > >     contention even at 16 CPUs is negligible pretty much no matter what
> > >     you do.  Whether your implementation is fast or slow isn't going to
> > >     matter, because even an inefficient implementation will account for
> > >     only a negligible percentage of the total CPU time - much less
> than 1%
> > >     - as opposed to a 64-core machine, where it's not that hard to find
> > >     cases where spin-waits consume the *majority* of available CPU time
> > >     (recall previous discussion of lseek).
> >
> > Yeah, will check if I find some machines with large cores.
>
> I got hold of a 32 CPUs VM (actually it was a 16-core, but being
> hyperthreaded, CPUs were 32).
> It was an Intel Xeon , 3Gz CPU. 15G available memory. Hypervisor :
> KVM. Single NUMA node.
> PG parameters changed : shared_buffer: 8G ; max_connections : 1000
>
> I compared pgbench results with HEAD versus PAUSE removed like this :
>  perform_spin_delay(SpinDelayStatus *status)
>  {
> -       /* CPU-specific delay each time through the loop */
> -       SPIN_DELAY();
>
> Ran with increasing number of parallel clients :
> pgbench -S -c $num -j $num -T 60 -M prepared
> But couldn't find any significant change in the TPS numbers with or
> without PAUSE:
>
> Clients     HEAD     Without_PAUSE
> 8         244446       247264
> 16        399939       399549
> 24        454189       453244
> 32       1097592      1098844
> 40       1090424      1087984
> 48       1068645      1075173
> 64       1035035      1039973
> 96        976578       970699
>
> May be it will indeed show some difference only with around 64 cores,
> or perhaps a bare metal machine will help; but as of now I didn't get
> such a machine. Anyways, I thought why not archive the results with
> whatever I have.
>
> Not relevant to the PAUSE stuff .... Note that when the parallel
> clients reach from 24 to 32 (which equals the machine CPUs), the TPS
> shoots from 454189 to 1097592 which is more than double speed gain
> with just a 30% increase in parallel sessions. I was not expecting
> this much speed gain, because, with contended scenario already pgbench
> processes are already taking around 20% of the total CPU time of
> pgbench run. May be later on, I will get a chance to run with some
> customized pgbench script that runs a server function which keeps on
> running an index scan on pgbench_accounts, so as to make pgbench
> clients almost idle.
>

what I know, pgbench cannot be used for testing spinlocks problems.

Maybe you can see this issue when a) use higher number clients - hundreds,
thousands. Decrease share memory, so there will be press on related spin
lock.

Regards

Pavel


> Thanks
> -Amit Khandekar
>
>
>

Re: spin_delay() for ARM

Reply via email to