On Tue, 1 Dec 2020 at 22:19, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alexander Korotkov <aekorot...@gmail.com> writes:
> > On Tue, Dec 1, 2020 at 6:19 PM Krunal Bauskar <krunalbaus...@gmail.com> wrote:
> >> I would request you guys to re-think it from this perspective to help
> >> ensure that PGSQL can scale well on ARM.
> >> s_lock becomes a top-most function and LSE is not a universal solution
> >> but CAS surely helps ease the main bottleneck.
>
> > CAS patch isn't proven to be a universal solution as well.  We have
> > tested the patch on just a few processors, and Tom has seen the
> > regression [1].  The benchmark used by Tom was artificial, but the
> > results may be relevant for some real-life workload.
>
> Yeah.  I think that the main conclusion from what we've seen here is
> that on smaller machines like M1, a standard pgbench benchmark just
> isn't capable of driving PG into serious spinlock contention.  (That
> reflects very well on the work various people have done over the years
> to get rid of spinlock contention, because ten or so years ago it was
> a huge problem on this size of machine.  But evidently, not any more.)
> Per the results others have posted, nowadays you need dozens of cores
> and hundreds of client threads to measure any such issue with pgbench.
>
> So that is why I experimented with a special test that does nothing
> except pound on one spinlock.  Sure it's artificial, but if you want
> to see the effects of different spinlock implementations then it's
> just too hard to get any results with pgbench's regular scripts.
>
> And that's why it disturbs me that the CAS-spinlock patch showed up
> worse in that environment.  The fact that it's not visible in the
> regular pgbench test just means that the effect is too small to
> measure in that test.  But in a test where we *can* measure an effect,
> it's not looking good.
>
> It would be interesting to see some results from the same test I did
> on other processors.  I suspect the results would look a lot different
> from mine ... but we won't know unless someone does it.  Or, if someone
> wants to propose some other test case, let's have a look.
>
> > I'm expressing just my personal opinion, other committers can have
> > different opinions.  I don't particularly think this topic is
> > necessarily a non-starter.  But I do think that given ambiguity we've
> > observed in the benchmark, much more research is needed to push this
> > topic forward.
>
> Yeah.  I'm not here to say "do nothing".  But I think we need results
> from more machines and more test cases to convince ourselves whether
> there's a consistent, worthwhile win from any specific patch.
>
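For anyone who wants to try a "pound on one spinlock" style test on other processors, here is a rough sketch of what such a test could look like outside the server. This is not Tom's test and not the patch itself: `demo_slock_t`, `acquire_swap`, `acquire_cas`, `release` and `pound_on_lock` are hypothetical names made up for illustration, and the only real APIs assumed are the GCC/Clang `__atomic` builtins and POSIX threads. The swap variant writes 1 unconditionally, while the CAS variant writes 1 only when it observes 0.

```c
/* Sketch only: not PostgreSQL's s_lock.h and not the patch under discussion. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

typedef int demo_slock_t;                    /* hypothetical stand-in for slock_t */
typedef void (*acquire_fn)(demo_slock_t *);

/* Swap-based acquire: always write 1; we own the lock if the old value was 0. */
static void acquire_swap(demo_slock_t *lock)
{
    while (__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE) != 0)
        ;                                    /* spin */
}

/* CAS-based acquire: write 1 only if the current value is 0. */
static void acquire_cas(demo_slock_t *lock)
{
    int expected = 0;

    while (!__atomic_compare_exchange_n(lock, &expected, 1, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
        expected = 0;    /* a failed CAS stores the observed value in expected */
}

static void release(demo_slock_t *lock)
{
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}

struct worker_arg
{
    demo_slock_t *lock;
    long         *counter;
    long          iters;
    acquire_fn    acquire;
};

/* Each worker does nothing except take the lock, bump a counter, release. */
static void *worker(void *p)
{
    struct worker_arg *a = p;

    for (long i = 0; i < a->iters; i++)
    {
        a->acquire(a->lock);
        (*a->counter)++;
        release(a->lock);
    }
    return NULL;
}

/* Run nthreads workers against one lock; returns the final counter value. */
long pound_on_lock(acquire_fn acquire, int nthreads, long iters)
{
    pthread_t         tids[MAX_THREADS];
    demo_slock_t      lock = 0;
    long              counter = 0;
    struct worker_arg arg = { &lock, &counter, iters, acquire };

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    for (int i = 0; i < nthreads; i++)
    {
        int rc = pthread_create(&tids[i], NULL, worker, &arg);

        assert(rc == 0);
        (void) rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Calling `pound_on_lock(acquire_swap, n, iters)` and `pound_on_lock(acquire_cas, n, iters)` under a timer, built with `-pthread` and with and without `-march=armv8-a+lse`, would give a rough comparison on a given machine. The sketch has no backoff or delay logic, so it should not be expected to reproduce the server's behaviour.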
I think there is an ambiguity around LSE that has been the source of some
confusion, so let's make another attempt to understand all the observations
and then define the next steps.

-----------------------------------------------------------------

*1. CAS patch (applied on the baseline)*
   - Kunpeng: 10-45% improvement observed [1]
   - Graviton2: 30-50% improvement observed [2]
   - M1: only select results are available. CAS continues to show a
     marginal but not significant gain [3], in line with what we observed
     on Kunpeng and Graviton2 for select results.

*2. Let's ignore CAS for a second and think about LSE independently*
   - Kunpeng: regression observed
   - Graviton2: gain observed
   - M1: regression observed. While LSE is probably the default,
     explicitly enabling it with +lse causes a regression on head itself [4]
     (client=2/4: 1816/714 ---- vs ---- 892/610).

There is enough reason not to enable LSE immediately, given that it does not
perform consistently on all hardware.

-----------------------------------------------------------------

With those two aspects clear, let's evaluate the options we have in hand.

*1. Enable the CAS approach*
   - What we gain: pgsql scales on Kunpeng/Graviton2. M1 is awaiting
     read-write results but may scale marginally ([5]: "but the patched
     numbers are only about a few percent better").
   - What we lose: nothing for now.

*2. LSE*
   - What we gain: scaled workload on Graviton2.
   - What we lose: regression on M1 and Kunpeng.

Let's think of both approaches independently.
   - Enabling CAS would help us scale on all hardware (Kunpeng/Graviton2/M1).
   - Enabling LSE would help us scale only on some hardware and regress on
     others. [LSE could be considered in the future once it stabilizes and
     all hardware adapts to it.]

-------------------------------------------------------------------

*Let me know what you think about this analysis and any specific direction
that we should consider to help move forward.*

-------------------------------------------------------------------

Links:
[1]: https://www.postgresql.org/message-id/attachment/116612/Screenshot%20from%202020-12-01%2017-55-21.png
[2]: https://www.postgresql.org/message-id/attachment/116521/arm-rw.png
[3]: https://www.postgresql.org/message-id/1367116.1606802480%40sss.pgh.pa.us
[4]: https://www.postgresql.org/message-id/1158478.1606716507%40sss.pgh.pa.us
[5]: https://www.postgresql.org/message-id/51e2f75b-3742-7f28-4438-0425b11cf410%40enterprisedb.com

>                         regards, tom lane
>

-- 
Regards,
Krunal Bauskar