Improving spin-lock implementation on ARM.
------------------------------------------------------------

* Spin-lock behavior is known to have a significant effect on performance
  as scalability increases.

* The existing spin-lock implementation for ARM is sub-optimal because it
  uses TAS (test-and-set).

* TAS is implemented on ARM as load-store, so even if the lock is not free,
  the store operation still executes and rewrites the same value. This
  redundant operation (mainly the store) is costly.

* CAS is implemented on ARM as load-check-store-check. If the lock is not
  free, the check that follows the load causes the loop to return, thereby
  saving the costlier store operation. [1]

* x86 uses an optimized xchg operation. ARM also started supporting an
  equivalent (through the Large System Extensions) with ARM-v8.1, but since
  it is not supported on ARM-v8, GCC by default tends to emit the more
  generic load-store assembly code.

* From gcc-9.4 onwards there is support for outline-atomics, which can emit
  both variants of the code (load-store and cas/swp) and pick the proper
  variant at run time based on the underlying architecture. However, a lot
  of distros still don't ship GCC-9.4 as the default compiler.

* In light of this, we would like to propose a CAS-based approach. Our local
  testing has shown an improvement in the range of 10-40% (graph attached).

* The patch enables the CAS-based approach when CAS is supported, depending
  on the existing compile-time flag HAVE_GCC__ATOMIC_INT32_CAS.

(Thanks to Amit Khandekar for rigorously performance testing this patch
with different combinations.)

[1]: https://godbolt.org/z/jqbEsa

P.S: Sorry if I missed any standard pgsql protocol, since I am just starting
with pgsql.

---
Krunal Bauskar
#mysqlonarm
Huawei Technologies

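To make the TAS versus CAS distinction in the bullets above concrete, here is a minimal standalone sketch using the GCC `__atomic` builtins. It is not the s_lock.h code: the function names (`tas_xchg`, `tas_cas`, `unlock_store`, `demo_acquire_release`) are hypothetical, and `__atomic_exchange_n` is used only as a stand-in for an exchange-style test-and-set. Both acquire functions return 0 when the lock was obtained and nonzero when it was already held.

```c
#include <assert.h>
#include <stdbool.h>

typedef int slock_t;

/*
 * Exchange-style test-and-set (illustrative stand-in): unconditionally
 * writes 1 and returns the previous value, so a store is issued even
 * when the lock is already held.
 */
static inline int
tas_xchg(volatile slock_t *lock)
{
	return __atomic_exchange_n(lock, (slock_t) 1, __ATOMIC_ACQUIRE);
}

/*
 * CAS-style acquire: writes 1 only if the current value is 0.
 * When the lock is held the comparison fails and no new value is written.
 */
static inline int
tas_cas(volatile slock_t *lock)
{
	slock_t		expected = 0;

	return !__atomic_compare_exchange_n(lock, &expected, (slock_t) 1,
										false, __ATOMIC_ACQUIRE,
										__ATOMIC_ACQUIRE);
}

/* Release the lock with a release-ordered store of 0. */
static inline void
unlock_store(volatile slock_t *lock)
{
	__atomic_store_n(lock, (slock_t) 0, __ATOMIC_RELEASE);
}

/* Single-threaded walk through acquire, failed acquire, and release. */
static void
demo_acquire_release(void)
{
	volatile slock_t lock = 0;

	assert(tas_cas(&lock) == 0);	/* lock was free: acquired */
	assert(tas_cas(&lock) != 0);	/* already held: CAS fails */
	unlock_store(&lock);
	assert(tas_xchg(&lock) == 0);	/* free again: acquired */
	assert(tas_xchg(&lock) != 0);	/* held: exchange still stores 1 */
}
```

Both variants have the same observable result for the caller. The difference the proposal relies on is in the generated code for the contended case, which can be inspected by compiling the two functions for the target architecture.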
diff --git a/src/include/storage/s_lock.h b/src/include/storage/s_lock.h
index 31a5ca6..940fdcd 100644
--- a/src/include/storage/s_lock.h
+++ b/src/include/storage/s_lock.h
@@ -321,7 +321,24 @@ tas(volatile slock_t *lock)
  * than other widths.
  */
 #if defined(__arm__) || defined(__arm) || defined(__aarch64__) || defined(__aarch64)
-#ifdef HAVE_GCC__SYNC_INT32_TAS
+
+#ifdef HAVE_GCC__ATOMIC_INT32_CAS
+/* just reusing the same flag to avoid re-declaration of default tas functions below */
+#define HAS_TEST_AND_SET
+
+#define TAS(lock) cas(lock)
+typedef int slock_t;
+
+static __inline__ int
+cas(volatile slock_t *lock)
+{
+	slock_t expected = 0;
+	return !(__atomic_compare_exchange_n(lock, &expected, (slock_t) 1,
+				false, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE));
+}
+
+#define S_UNLOCK(lock) __atomic_store_n(lock, (slock_t) 0, __ATOMIC_RELEASE);
+#elif HAVE_GCC__SYNC_INT32_TAS
 #define HAS_TEST_AND_SET
 
 #define TAS(lock) tas(lock)
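
For anyone who wants to try the acquire/release pair from the patch outside the PostgreSQL tree, the following is a rough standalone harness, offered as a sketch only. The `cas()` body mirrors the patch; everything else (`demo_lock`, `counter`, `worker`, `run_demo`, `NTHREADS`, `NITERS`) is hypothetical, the busy-wait loop has no backoff and is not PostgreSQL's s_lock() logic, and `S_UNLOCK` is written here without the trailing semicolon so that it can be used as an ordinary statement. It checks mutual exclusion only and says nothing about performance.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef int slock_t;

/* Same shape as cas() in the patch: returns 0 if the lock was acquired. */
static inline int
cas(volatile slock_t *lock)
{
	slock_t		expected = 0;

	return !__atomic_compare_exchange_n(lock, &expected, (slock_t) 1,
										false, __ATOMIC_ACQUIRE,
										__ATOMIC_ACQUIRE);
}

#define TAS(lock)		cas(lock)
#define S_UNLOCK(lock)	__atomic_store_n(lock, (slock_t) 0, __ATOMIC_RELEASE)

#define NTHREADS	4		/* hypothetical test parameters */
#define NITERS		50000

static volatile slock_t demo_lock = 0;
static long counter = 0;	/* protected by demo_lock */

static void *
worker(void *arg)
{
	(void) arg;
	for (int i = 0; i < NITERS; i++)
	{
		while (TAS(&demo_lock))
			;				/* naive busy-wait, no backoff */
		counter++;
		S_UNLOCK(&demo_lock);
	}
	return NULL;
}

/* Runs NTHREADS workers and returns the final counter value. */
static long
run_demo(void)
{
	pthread_t	threads[NTHREADS];

	counter = 0;
	for (int i = 0; i < NTHREADS; i++)
	{
		int			rc = pthread_create(&threads[i], NULL, worker, NULL);

		assert(rc == 0);
		(void) rc;
	}
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(threads[i], NULL);
	return counter;
}
```

If the lock provides mutual exclusion, `run_demo()` returns `NTHREADS * NITERS`. Building with `-pthread` may be needed depending on the C library.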