Re: Improving spin-lock implementation on ARM.

2020-12-02 Thread Zidenberg, Tsahi
> On 01/12/2020, 19:08, "Alexander Korotkov" wrote:
> BTW, what number of clients did you use?  I can't find it in your message.

Sure. Important params seem to be:

pgbench:
clients: 256
jobs: 32
scale: 1000
fillfactor: 90

PostgreSQL:
shared_buffers: 31GB
max_connections: 1024

Thanks!
Tsahi.



Re: Improving spin-lock implementation on ARM.

2020-12-01 Thread Zidenberg, Tsahi
> On 01/12/2020, 16:59, "Alexander Korotkov" wrote:
> On Tue, Dec 1, 2020 at 1:10 PM Amit Khandekar wrote:
>> FWIW, here is an earlier discussion on the same (also added the
>> proposal author here) :

Thanks for looping me in!

>> https://www.postgresql.org/message-id/flat/099F69EE-51D3-4214-934A-1F28C0A1A7A7%40amazon.com
>
> Thank you for pointing! I wonder why the effect of LSE on Graviton2
> observed by Tsahi Zidenberg is so modest.  It's probably because he
> runs the tests with a low number of clients.

There are multiple possible reasons why I saw a smaller effect of LSE,
but I think an important one is that I used a 32-core instance rather
than a 64-core one. I did so because 32 cores gave me better absolute
results than 64 cores, and I didn't want to risk misleading anyone.

The 64-core instance results are a good example of the benefit of LSE.
LSE becomes most important in edge cases and with adversarial workloads.
If multiple CPUs try to acquire a lock simultaneously, LSE ensures that
one CPU will indeed get the lock (with just one transaction), while with
LDXR/STXR multiple CPUs could keep looping with no one acquiring the
lock. This is why I believe that looking only at "reasonable" benchmarks
misses effects that real customers will run into.
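
To make the lock-acquisition point concrete, here is a minimal sketch of
a test-and-set lock written with C11 atomics. It is not PostgreSQL's
spinlock code, and all demo_* names are hypothetical. The relevant part
is the single atomic exchange in demo_trylock(): on AArch64 the compiler
can emit it as one LSE swap instruction when LSE is available, or as a
load-exclusive/store-exclusive retry loop when it is not.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock for illustration; not PostgreSQL's slock_t. */
typedef struct
{
    atomic_int  locked;         /* 0 = free, 1 = held */
} demo_spinlock;

static void
demo_init(demo_spinlock *lock)
{
    atomic_init(&lock->locked, 0);
}

/* One acquisition attempt: a single atomic swap of 1 into the lock word. */
static bool
demo_trylock(demo_spinlock *lock)
{
    return atomic_exchange_explicit(&lock->locked, 1,
                                    memory_order_acquire) == 0;
}

static void
demo_lock(demo_spinlock *lock)
{
    while (!demo_trylock(lock))
    {
        /* Wait on plain loads so that waiters do not keep writing the line. */
        while (atomic_load_explicit(&lock->locked,
                                    memory_order_relaxed) != 0)
            ;
    }
}

static void
demo_unlock(demo_spinlock *lock)
{
    /* Unlocking a lock that is not held would be a caller bug. */
    assert(atomic_load_explicit(&lock->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&lock->locked, 0, memory_order_release);
}
```

With the exclusive-pair form, every contending CPU can fail its store
and retry; with the single-instruction form, each attempt either
completes the swap or sees the lock held.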

Happy to see another arm-optimization thread so quickly :)

Thank you!
Tsahi.




Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

2020-09-30 Thread Zidenberg, Tsahi


On 29/09/2020, 10:21, "Heikki Linnakangas"  wrote:
> If it's a good idea to use -moutline-atomics, I would expect the
> compiler or distribution to enable it by default. And as you pointed
> out, many have.

-moutline-atomics is only enabled by default on the gcc-10 branch, where
it was originally developed. It was important enough to be backported
to previous versions and picked up by e.g. Ubuntu and Amazon Linux.
However, outline-atomics is not enabled by default in any backport that
I'm aware of. Ubuntu 20.04 even turned it off by default for gcc-10,
which seems to be a compatibility step with the main gcc-9 compiler.
Sadly, always-enabled outline-atomics is many years in the future for
released systems.

> For the others, there are probably reasons they haven't,
> like being conservative in general. Whatever the reasons, IMHO we should
> not second-guess them.

I assume GCC conservatively chose not to add code by default that
won't help old CPUs when bumping minor versions (although I see
no performance degradation in real software).
On the other hand, the feature was important enough to be
backported so that software can take advantage of it.
PostgreSQL should be the most advanced open source database.
As I understand it, it should handle large workloads on large modern
machines like Graviton2 as well as possible, and that means using LSE.

> I'm marking this as Rejected in the commitfest. But thanks for the
> benchmarking, that is valuable information nevertheless.

Could additional data change your mind?





Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

2020-09-10 Thread Zidenberg, Tsahi
On 08/09/2020, 1:01, "Tom Lane"  wrote:

> I wonder what version of gcc you intend this for.  AFAICS, older
> gcc versions lack this flag at all, while newer ones have it on
> by default.


(previously sent private reply, sorry)

The -moutline-atomics flag showed substantial enough improvements
that it has been backported to GCC 9 and 8, and a gcc-7 branch is in
the works.
Ubuntu has integrated it in 20.04 and Amazon Linux 2 supports it,
with other distributions, including Ubuntu 18.04 and Debian, on the way.
All distributions, including the upcoming Ubuntu with GCC 10, have
-moutline-atomics turned off by default.



Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

2020-09-06 Thread Zidenberg, Tsahi
Hello!

First, I apologize for taking so long to answer. This e-mail regrettably
got lost in my inbox.

On 24/07/2020, 4:17, "Andres Freund"  wrote:

> What does "not significantly affected" exactly mean? Could you post the
> raw numbers?

The following tests show benchmark behavior on an m6g.8xlarge instance
(32 cores, with LSE support) and an a1.4xlarge instance (16 cores, no LSE
support), with and without the patch, based on PostgreSQL 12.4.
The tests are pgbench select-only/simple-update and sysbench
read-only/write-only.

                      select-only  simple-update  read-only  write-only
m6g.8xlarge/vanilla        482130          56275     273327       33364
m6g.8xlarge/patch          493748          59681     262702       33024
a1.4xlarge/vanilla          82437          13978      62489        2928
a1.4xlarge/patch            79499          13932      62796        2945

Results obviously change with OS, parameters, etc. I have attempted to
ensure a fair comparison, but I don't think these numbers should be
taken as absolute.
As reference points, here are an m6g instance compiled with the
-march=native flag and an m5 (x86) instance:

                      select-only  simple-update  read-only  write-only
m6g.8xlarge/native         522771          60354     261366       33582
m5.8xlarge                 362908          58732     147730       32750

> I'm a bit concerned that the additional conditional
> branches on platforms without non ll/sc atomics could hurt noticably.

As can be seen in the a1 results, the difference for CPUs with no LSE
atomic support is small.
Locks have one added branch, which is always taken the same way and is
thus easy to predict.
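
The added branch can be pictured roughly as follows. This is only a
sketch of the idea, not the code that GCC emits or that libgcc ships:
all demo_* names are hypothetical, the probe assumes Linux on AArch64
(getauxval() with the HWCAP_ATOMICS bit) and reports "no LSE" anywhere
else, and both swap paths are stand-ins built on the GCC/Clang
__atomic_exchange_n builtin rather than hand-written instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>           /* getauxval(), AT_HWCAP, HWCAP_* */
#endif

/* Hypothetical flag, filled in once at startup. */
static bool demo_have_lse = false;

/* Startup-time probe: ask the kernel whether the CPU has LSE atomics. */
static void
demo_detect_lse(void)
{
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_ATOMICS)
    demo_have_lse = (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    demo_have_lse = false;      /* other platforms: assume no LSE */
#endif
}

/* Stand-in for the path that would use a single LSE swap instruction. */
static int
demo_swap_lse(int *ptr, int newval)
{
    return __atomic_exchange_n(ptr, newval, __ATOMIC_ACQUIRE);
}

/* Stand-in for the path that would use a load/store-exclusive loop. */
static int
demo_swap_llsc(int *ptr, int newval)
{
    return __atomic_exchange_n(ptr, newval, __ATOMIC_ACQUIRE);
}

/* The one extra branch: it goes the same way on every call on a given CPU. */
static int
demo_swap(int *ptr, int newval)
{
    assert(ptr != NULL);
    if (demo_have_lse)
        return demo_swap_lse(ptr, newval);
    return demo_swap_llsc(ptr, newval);
}
```

Because demo_have_lse never changes after startup, the branch predictor
sees a constant outcome, which is why the cost on a1 is expected to be
small.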

> I'm surprised that read-only didn't benefit - with ll/sc that ought to
> have pretty high contention on a few lwlocks.

These results show only about a 6% performance increase in simple-update
and very close performance in the other tests, most of which could be
attributed to benchmark jitter.
These results from "well behaved" benchmarks do not show the full
importance of using outline-atomics. In some experiments with other
parameter values and larger systems, I have observed a collapse in
performance, including in read-only tests, caused by store-exclusive
(STXR) instructions continuously failing to commit. In such cases,
outline-atomics improved performance by more than a factor of 2. These
cases are not always easy to replicate.
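
For reference, the percentages above are plain relative changes of the
patched result over the vanilla one. The sketch below recomputes them
from the m6g.8xlarge rows of the table in this message; the function
names are hypothetical and the numbers are copied from that table.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of patched over baseline, in percent. */
static double
pct_change(double baseline, double patched)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}

/* Print the m6g.8xlarge vanilla-vs-patch change for each test. */
static void
print_m6g_changes(void)
{
    static const char *const names[] = {
        "select-only", "simple-update", "read-only", "write-only"
    };
    static const double vanilla[] = {482130, 56275, 273327, 33364};
    static const double patch[] = {493748, 59681, 262702, 33024};

    for (int i = 0; i < 4; i++)
        printf("%-14s %+.1f%%\n", names[i],
               pct_change(vanilla[i], patch[i]));
}
```

This gives roughly +2.4% for select-only, +6.1% for simple-update, -3.9%
for read-only and -1.0% for write-only.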

Thank you!
and sorry again for the delay
Tsahi Zidenberg



Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

2020-07-07 Thread Zidenberg, Tsahi
On 01/07/2020, 18:40, "Zidenberg, Tsahi"  wrote:

> Outline-atomics is a gcc compilation flag that adds runtime detection
> of whether or not the CPU supports atomic instructions. CPUs that don't
> support atomic instructions will use the old
> load-exclusive/store-exclusive instructions. If a different compilation
> flag defines an architecture that unconditionally supports atomic
> instructions (e.g. -march=armv8.2-a), the outline-atomics flag will
> have no effect.
>
> The patch was tested to improve pgbench simple-update by 10% and
> sysbench write-only by 3% on a 64-core armv8.2 machine (AWS
> m6g.16xlarge). Select-only and read-only benchmarks were not
> significantly affected, and neither was performance on a 16-core
> armv8.0 machine that does not support atomic instructions (AWS
> a1.4xlarge).
>
> The patch uses an existing configure.in macro to detect compiler
> support of the flag. Checking for aarch64 machine is not strictly
> necessary, but was added for readability.

Added a commitfest entry:
https://commitfest.postgresql.org/29/2637/

Thank you!
Tsahi