On Tue, 8 Dec 2020 at 14:33, Krunal Bauskar <krunalbaus...@gmail.com> wrote:
>
> On Thu, 3 Dec 2020 at 21:32, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Krunal Bauskar <krunalbaus...@gmail.com> writes:
>> > Any updates or further inputs on this.
>>
>> As far as LSE goes: my take is that tampering with the
>> compiler/platform's default optimization options requires *very*
>> strong evidence, which we have not got and likely won't get.  Users
>> who are building for specific hardware can choose to supply custom
>> CFLAGS, of course.  But we shouldn't presume to do that for them,
>> because we don't know what they are building for, or with what.
>>
>> I'm very willing to consider the CAS spinlock patch, but it still
>> feels like there's not enough evidence to show that it's a universal
>> win.  The way to move forward on that is to collect more measurements
>> on additional ARM-based platforms.  And I continue to think that
>> pgbench is only a very crude tool for testing spinlock performance;
>> we should look at other tests.
>>
>
> Thanks Tom.
>
> Given pgbench's limited options, I decided to try sysbench with a
> zipfian access pattern to expose the real contention (a zipfian pattern
> concentrates updates on part of the database, thereby exposing the main
> contention point).
>
> ----------------------------------------------------------------------------
> *Baseline for 256 threads update-index use-case:*
> -   44.24%        174935  postgres         postgres             [.] s_lock
> transactions:
>     transactions:                        5587105 (92988.40 per sec.)
>
> *Patched for 256 threads update-index use-case:*
>      0.02%            80  postgres         postgres             [.] s_lock
> transactions:
>     transactions:                        10288781 (171305.24 per sec.)
>
> *perf diff*
>
> *     0.02%    +44.22%  postgres             [.] s_lock*
> ----------------------------------------------------------------------------
>
> As the result above shows, s_lock is a major contention point that
> could be relaxed using the said CAS patch. A performance improvement
> in the range of 80% is observed (92988.40 vs. 171305.24 transactions
> per sec.).
>
> Taking this as a guideline, we decided to run it across the full
> scalability range for both update and non-update use-cases.
> Check the attached graph. Consistent improvement is observed.
>
> I presume this should help re-establish that, for major contention
> cases, the existing TAS approach will always lose out.
>
> -------------------------------------------------------------------------------------------
>
> Unfortunately, I don't have access to ARM architectures other than
> Kunpeng and Graviton2, where we have already proved the value of the
> patch.
> [ref: On Apple M1, as per your evaluation, the patch doesn't show a
> regression for select. If possible, could you try update scenarios too?]
>
> Do you know anyone from the community who has access to other ARM
> architectures whom we could ask to evaluate it?
> But since it has been proven on 2 independent ARM architectures, I am
> pretty confident it will scale on other ARM architectures too.
>
> Any direction on how we can proceed on this?

* We have tested it with both cloud vendors that provide ARM instances.
* We have tested it with Apple M1 (partially at least).
* Ampere used to provide instances on packet.com, but that is now an
  evaluation program only. We know of no other cloud provider actively
  offering ARM instances.

Given that our evaluation so far has been positive, can we consider the
patch on the basis of the available data, which is quite encouraging,
with an 80% improvement seen for heavy-contention use-cases?

>
>> From a system structural standpoint, I seriously dislike that lwlock.c
>> patch: putting machine-specific variant implementations into that file
>> seems like a disaster for maintainability.  So it would need to show a
>> very significant gain across a range of hardware before I'd want to
>> consider adopting it ... and it has not shown that.
>>
>>                         regards, tom lane
>>
>
>
> --
> Regards,
> Krunal Bauskar
>

--
Regards,
Krunal Bauskar
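
P.S. For anyone following the thread without the patch at hand, here is a rough sketch of the two acquire styles being compared. This is NOT the patch itself and not PostgreSQL's s_lock code: it uses C11 <stdatomic.h>, and the names demo_lock_t, demo_tas_try, demo_cas_try and demo_unlock are made up for illustration only. The TAS-style attempt always performs an atomic exchange, while the CAS-style attempt writes only if the lock is observed to be free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef atomic_int demo_lock_t;

/* TAS-style attempt: unconditionally swap in 1, succeed if the old value was 0. */
static bool demo_tas_try(demo_lock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0;
}

/* CAS-style attempt: store 1 only if the current value is 0. */
static bool demo_cas_try(demo_lock_t *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1,
        memory_order_acquire,   /* ordering on success */
        memory_order_relaxed);  /* ordering on failure */
}

/* Release the lock with a release store. */
static void demo_unlock(demo_lock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Which instructions the compiler emits for these calls depends on the target and on CFLAGS, so this sketch says nothing by itself about the numbers reported above.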