Re: Improving spin-lock implementation on ARM.

Krunal Bauskar Tue, 01 Dec 2020 19:58:31 -0800

On Tue, 1 Dec 2020 at 22:19, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alexander Korotkov <aekorot...@gmail.com> writes:
> > On Tue, Dec 1, 2020 at 6:19 PM Krunal Bauskar <krunalbaus...@gmail.com>
> wrote:
> >> I would request you guys to re-think it from this perspective to help
> ensure that PGSQL can scale well on ARM.
> >> s_lock becomes a top-most function and LSE is not a universal solution
> but CAS surely helps ease the main bottleneck.
>
> > CAS patch isn't proven to be a universal solution as well.  We have
> > tested the patch on just a few processors, and Tom has seen the
> > regression [1].  The benchmark used by Tom was artificial, but the
> > results may be relevant for some real-life workload.
>
> Yeah.  I think that the main conclusion from what we've seen here is
> that on smaller machines like M1, a standard pgbench benchmark just
> isn't capable of driving PG into serious spinlock contention.  (That
> reflects very well on the work various people have done over the years
> to get rid of spinlock contention, because ten or so years ago it was
> a huge problem on this size of machine.  But evidently, not any more.)
> Per the results others have posted, nowadays you need dozens of cores
> and hundreds of client threads to measure any such issue with pgbench.
>
> So that is why I experimented with a special test that does nothing
> except pound on one spinlock.  Sure it's artificial, but if you want
> to see the effects of different spinlock implementations then it's
> just too hard to get any results with pgbench's regular scripts.
>
> And that's why it disturbs me that the CAS-spinlock patch showed up
> worse in that environment.  The fact that it's not visible in the
> regular pgbench test just means that the effect is too small to
> measure in that test.  But in a test where we *can* measure an effect,
> it's not looking good.
>
> It would be interesting to see some results from the same test I did
> on other processors.  I suspect the results would look a lot different
> from mine ... but we won't know unless someone does it.  Or, if someone
> wants to propose some other test case, let's have a look.
>
> > I'm expressing just my personal opinion, other committers can have
> > different opinions.  I don't particularly think this topic is
> > necessarily a non-starter.  But I do think that given ambiguity we've
> > observed in the benchmark, much more research is needed to push this
> > topic forward.
>
> Yeah.  I'm not here to say "do nothing".  But I think we need results
> from more machines and more test cases to convince ourselves whether
> there's a consistent, worthwhile win from any specific patch.
>


I think there is
*an ambiguity with lse and that has been the*
*source of some confusion* so let's make another attempt to
understand all the observations and then define the next steps.

-----------------------------------------------------------------


*1. CAS patch (applied on the baseline)*   - Kunpeng: 10-45% improvement
observed [1]
   - Graviton2: 30-50% improvement observed [2]
   - M1: Only select results are available cas continue to maintain a
marginal gain but not significant. [3]
     [inline with what we observed with Kunpeng and Graviton2 for select
results too].


*2. Let's ignore CAS for a sec and just think of LSE independently*   -
Kunpeng: regression observed
   - Graviton2: gain observed
   - M1: regression observed
     [while lse probably is default explicitly enabling it with +lse causes
regression on the head itself [4].
      client=2/4: 1816/714 ---- vs  ---- 892/610]

   There is enough reason not to immediately consider enabling LSE given
its unable to perform consistently on all hardware.
-----------------------------------------------------------------

With those 2 aspects clear let's evaluate what options we have in hand


*1. Enable CAS approach*   *- What we gain:* pgsql scale on
Kunpeng/Graviton2
     (m1 awaiting read-write result but may marginally scale  [[5]: "but
the patched numbers are only about a few percent better"])
   *- What we lose:* Nothing for now.


*2. LSE:*   *- What we gain: *Scaled workload with Graviton2
  * - What we lose:* regression on M1 and Kunpeng.

Let's think of both approaches independently.

- Enabling CAS would help us scale on all hardware (Kunpeng/Graviton2/M1)
- Enabling LSE would help us scale only on some but regress on others.
  [LSE could be considered in the future once it stabilizes and all
hardware adapts to it]

-------------------------------------------------------------------

*Let me know what do you think about this analysis and any specific
direction that we should consider to help move forward.*

-------------------------------------------------------------------

Links:
[1]:
https://www.postgresql.org/message-id/attachment/116612/Screenshot%20from%202020-12-01%2017-55-21.png
[2]: https://www.postgresql.org/message-id/attachment/116521/arm-rw.png
[3]:
https://www.postgresql.org/message-id/1367116.1606802480%40sss.pgh.pa.us
[4]:
https://www.postgresql.org/message-id/1158478.1606716507%40sss.pgh.pa.us
[5]:
https://www.postgresql.org/message-id/51e2f75b-3742-7f28-4438-0425b11cf410%40enterprisedb.com


>                         regards, tom lane
>


-- 
Regards,
Krunal Bauskar

Re: Improving spin-lock implementation on ARM.

Reply via email to