Re: Improving spin-lock implementation on ARM.
On 01/12/2020, 19:08, "Alexander Korotkov" wrote:
> BTW, what number of clients did you use? I can't find it in your message.

Sure. The important parameters seem to be:

Pgbench:
  clients: 256
  pgbench_jobs: 32
  scale: 1000
  fill_factor: 90

PostgreSQL:
  shared_buffers: 31GB
  max_connections: 1024

Thanks!
Tsahi.
Re: Improving spin-lock implementation on ARM.
On 01/12/2020, 16:59, "Alexander Korotkov" wrote:
> On Tue, Dec 1, 2020 at 1:10 PM Amit Khandekar wrote:
>> FWIW, here is an earlier discussion on the same (also added the
>> proposal author here):

Thanks for looping me in!

>> https://www.postgresql.org/message-id/flat/099F69EE-51D3-4214-934A-1F28C0A1A7A7%40amazon.com
>
> Thank you for pointing! I wonder why the effect of LSE on Graviton2
> observed by Tsahi Zidenberg is so modest. It's probably because he
> runs the tests with a low number of clients.

There are multiple possible reasons why I saw a smaller effect of LSE, but I think an important one is that I used a 32-core instance rather than a 64-core one. I did so because 32 cores gave me better absolute results than 64 cores, and I didn't want to risk misleading anyone.

The 64-core instance results are a good example of the benefit of LSE. LSE becomes most important in edge cases and with adversarial workloads. If multiple CPUs try to acquire a lock simultaneously, LSE ensures that one CPU will indeed get the lock (with just one transaction), while with LDXR/STXR multiple CPUs could keep looping with none of them acquiring the lock. This is why I believe that looking only at "reasonable" benchmarks misses effects that real customers will run into.

Happy to see another arm-optimization thread so quickly :)

Thank you!
Tsahi.
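
For illustration, here is a minimal sketch of a test-and-set spinlock written with C11 atomics. It is not PostgreSQL's s_lock code, and the names sketch_spinlock, sketch_try_lock and so on are made up for this example. The point is the single atomic exchange: as far as I understand, when built for an LSE-capable target GCC can emit one swap instruction for it, without LSE it emits a load-exclusive/store-exclusive retry loop, and with -moutline-atomics it calls a small libgcc helper that picks between the two at run time.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for this sketch; 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} sketch_spinlock;

static void sketch_lock_init(sketch_spinlock *l)
{
    atomic_init(&l->locked, 0);
}

/*
 * One acquisition attempt. The exchange writes 1 unconditionally and
 * returns the previous value, so the caller owns the lock only if the
 * previous value was 0.
 */
static bool sketch_try_lock(sketch_spinlock *l)
{
    return atomic_exchange_explicit(&l->locked, 1,
                                    memory_order_acquire) == 0;
}

/* Spin until the lock is acquired. */
static void sketch_lock(sketch_spinlock *l)
{
    while (!sketch_try_lock(l)) {
        /* busy-wait; a real implementation would back off here */
    }
}

static void sketch_unlock(sketch_spinlock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Comparing the assembly produced with and without -moutline-atomics (for example with gcc -O2 -S) should show how the exchange is lowered on a given toolchain.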
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
On 29/09/2020, 10:21, "Heikki Linnakangas" wrote:
> If it's a good idea to use -moutline-atomics, I would expect the
> compiler or distribution to enable it by default. And as you pointed
> out, many have.

-moutline-atomics is only enabled by default on the gcc-10 branch, where it was originally developed. It was important enough to be backported to previous versions and picked up by e.g. Ubuntu and Amazon Linux. However, outline-atomics is not enabled by default in any backport that I'm aware of. Ubuntu 20.04 even turned it off by default for gcc-10, which seems like a compatibility step with the main gcc-9 compiler. Always-enabled outline-atomics is, sadly, many years in the future for release systems.

> For the others, there are probably reasons they haven't,
> like being conservative in general. Whatever the reasons, IMHO we should
> not second-guess them.

I assume GCC chose conservatively not to add code by default that won't help old CPUs when increasing minor versions (although I see no performance degradation in real software). On the other hand, the feature was important enough to be backported so that software can take advantage of it.

PostgreSQL should be the most advanced open source database. As I understand it, it should handle large workloads on large modern machines like Graviton2 as well as possible, and that means using LSE.

> I'm marking this as Rejected in the commitfest. But thanks for the
> benchmarking, that is valuable information nevertheless.

Could additional data change your mind?
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
On 08/09/2020, 1:01, "Tom Lane" wrote:
> I wonder what version of gcc you intend this for. AFAICS, older
> gcc versions lack this flag at all, while newer ones have it on
> by default.

(previously sent private reply, sorry)

The -moutline-atomics flag showed substantial enough improvements that it has been backported to GCC 9 and 8, and there is a gcc-7 branch in the works. Ubuntu has integrated this in 20.04 and Amazon Linux 2 supports it, with other distributions, including Ubuntu 18.04 and Debian, on the way. All distributions, including the upcoming Ubuntu with GCC-10, have -moutline-atomics turned off by default.
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
Hello!

First, I apologize for taking so long to answer. This e-mail regretfully got lost in my inbox.

On 24/07/2020, 4:17, "Andres Freund" wrote:
> What does "not significantly affected" exactly mean? Could you post the
> raw numbers?

The following tests show benchmark behavior on an m6g.8xlarge instance (32-core, with LSE support) and an a1.4xlarge instance (16-core, no LSE support), with and without the patch, based on PostgreSQL 12.4. The tests are pgbench select-only/simple-update and sysbench read-only/write-only.

                       select-only  simple-update  read-only  write-only
m6g.8xlarge/vanilla         482130          56275     273327       33364
m6g.8xlarge/patch           493748          59681     262702       33024
a1.4xlarge/vanilla           82437          13978      62489        2928
a1.4xlarge/patch             79499          13932      62796        2945

Results obviously change with OS, parameters, etc. I have attempted to ensure a fair comparison, but I don't think these numbers should be taken as absolute. As reference points, here are an m6g instance compiled with the -march=native flag and an m5 (x86) instance:

                       select-only  simple-update  read-only  write-only
m6g.8xlarge/native          522771          60354     261366       33582
m5.8xlarge                  362908          58732     147730       32750

> I'm a bit concerned that the additional conditional
> branches on platforms without non ll/sc atomics could hurt noticeably.

As can be seen in the a1 results, the difference for CPUs with no LSE atomic support is low. Locks have one branch added, which is always taken the same way and is thus easy to predict.

> I'm surprised that read-only didn't benefit - with ll/sc that ought to
> have pretty high contention on a few lwlocks.

These results show only about a 6% performance increase in simple-update, and very close performance in the other tests, most of which could be attributed to benchmark result jitter. These results from "well behaved" benchmarks do not show the full importance of using outline-atomics. In some experiments with other parameter values and larger systems, I have observed a collapse in performance, including in read-only tests, caused by store-exclusive (STXR) instructions continuously failing to commit. In such cases, outline-atomics improved performance by more than a factor of 2. These cases are not always easy to replicate.

Thank you, and sorry again for the delay!
Tsahi Zidenberg
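
To make the comparison easier to read, the relative change between the vanilla and patched builds can be computed from the raw numbers above. The following is only a small helper sketch; the struct and function names are made up here, and the figures are copied from the results in this message.

```c
#include <stdio.h>

/* Relative change from "before" to "after", in percent. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* One benchmark result pair; names are illustrative only. */
struct bench_row {
    const char *name;
    double vanilla;
    double patched;
};

/* Numbers taken from the results posted in this message. */
static const struct bench_row m6g_rows[] = {
    { "select-only",   482130.0, 493748.0 },
    { "simple-update",  56275.0,  59681.0 },
    { "read-only",     273327.0, 262702.0 },
    { "write-only",     33364.0,  33024.0 },
};

static const struct bench_row a1_rows[] = {
    { "select-only",    82437.0,  79499.0 },
    { "simple-update",  13978.0,  13932.0 },
    { "read-only",      62489.0,  62796.0 },
    { "write-only",      2928.0,   2945.0 },
};

/* Print the signed percentage change for every row of one instance. */
static void print_changes(const char *label,
                          const struct bench_row *rows, int n)
{
    printf("%s\n", label);
    for (int i = 0; i < n; i++)
        printf("  %-14s %+6.2f%%\n", rows[i].name,
               pct_change(rows[i].vanilla, rows[i].patched));
}
```

Calling print_changes("m6g.8xlarge", m6g_rows, 4) and print_changes("a1.4xlarge", a1_rows, 4) from a main function prints one signed percentage per benchmark.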
Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64
On 01/07/2020, 18:40, "Zidenberg, Tsahi" wrote:
> Outline-atomics is a gcc compilation flag that adds runtime detection of
> whether or not the CPU supports atomic instructions. CPUs that don't
> support atomic instructions will use the old
> load-exclusive/store-exclusive instructions. If a different compilation
> flag defines an architecture that unconditionally supports atomic
> instructions (e.g. -march=armv8.2-a), the outline-atomics flag will have
> no effect.
>
> The patch was tested to improve pgbench simple-update by 10% and sysbench
> write-only by 3% on a 64-core armv8.2 machine (AWS m6g.16xlarge).
> Select-only and read-only benchmarks were not significantly affected, and
> neither was performance on a 16-core armv8.0 machine that does not
> support atomic instructions (AWS a1.4xlarge).
>
> The patch uses an existing configure.in macro to detect compiler support
> of the flag. Checking for an aarch64 machine is not strictly necessary,
> but was added for readability.

Added a commitfest entry:
https://commitfest.postgresql.org/29/2637/

Thank you!
Tsahi
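
As a rough illustration of the runtime detection described above, the sketch below checks for LSE support on Linux/aarch64 by reading the AT_HWCAP auxiliary vector. This is my understanding of the general approach, not a copy of what libgcc does. The function names and the SKETCH_HWCAP_ATOMICS macro are made up for this example; the macro assumes the kernel's HWCAP_ATOMICS bit is bit 8. On other platforms the sketch simply reports no LSE.

```c
#include <stdbool.h>
#include <stdio.h>

#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

/*
 * Assumption: HWCAP_ATOMICS on Linux/aarch64 is bit 8 of AT_HWCAP.
 * Defined locally so the sketch also compiles on other targets.
 */
#define SKETCH_HWCAP_ATOMICS (1UL << 8)

/* Returns true if the running CPU reports LSE atomic instructions. */
static bool cpu_has_lse(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & SKETCH_HWCAP_ATOMICS) != 0;
#else
    return false;   /* not Linux/aarch64: nothing to detect */
#endif
}

/* Human-readable name of the code path that would be chosen. */
static const char *atomics_path_name(bool has_lse)
{
    return has_lse ? "LSE atomics" : "load-exclusive/store-exclusive";
}

/* Report which path this machine would take. */
static void report_atomics_path(void)
{
    printf("atomics path: %s\n", atomics_path_name(cpu_has_lse()));
}
```

With outline-atomics the compiler runtime performs this kind of check itself, so application code does not need it; the sketch only shows why a binary built this way can run on both armv8.0 and armv8.2 machines.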