On Mon, Feb 15, 2016 at 02:02:41PM -0500, Tom Lane wrote:
> Noah Misch <[email protected]> writes:
> > That commit (0d32d2e) permitted things to compile and usually pass tests,
> > but
> > I missed the synchronization bug. Since 2015-10-01, the buildfarm has seen
> > sixteen duplicate-catalog-OID failures.
>
> I'd been wondering about those ...
>
> > These suggested OidGenLock wasn't doing its job. I've seen similar symptoms
> > around WALInsertLocks with "IBM XL C/C++ for Linux, V13.1.2 (5725-C73,
> > 5765-J08)" for ppc64le. The problem is generic-xlc.h
> > pg_atomic_compare_exchange_u32_impl() issuing __isync() before
> > __compare_and_swap(). __isync() shall follow __compare_and_swap(); see our
> > own s_lock.h, its references, and other projects' usage:
>
> Nice catch!
>
> > This patch's test case would have failed about half the time under today's
> > generic-xlc.h. Fast machines run it in about 1s. A test that detects the
> > bug
> > 99% of the time would run far longer, hence this compromise.
>
> Sounds like a reasonable compromise to me, although I wonder about the
> value of it if we stick it into pgbench's TAP tests. How many of the
> slower buildfarm members are running the TAP tests? Certainly mine are
> not.
I missed a second synchronization bug in generic-xlc.h, but the pgbench test
suite caught it promptly after commit 008608b. Buildfarm members mandrill and
hornet failed[1] or deadlocked within two runs. The bug is that the
pg_atomic_compare_exchange_*() specifications grant "full barrier semantics",
but generic-xlc.h provided only the semantics of an acquire barrier. Commit
008608b exposed that older problem; its LWLockWaitListUnlock() relies on
pg_atomic_compare_exchange_u32() (via pg_atomic_fetch_and_u32()) for release
barrier semantics.
The fix is to issue "sync" before each compare-and-swap, in addition to the
"isync" after it. This is consistent with the corresponding GCC atomics. The
pgbench test suite survived dozens of runs so patched.
[1]
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2016-04-11%2003%3A14%3A13
diff --git a/src/include/port/atomics/generic-xlc.h
b/src/include/port/atomics/generic-xlc.h
index f24e3af..f4fd2f3 100644
--- a/src/include/port/atomics/generic-xlc.h
+++ b/src/include/port/atomics/generic-xlc.h
@@ -40,12 +40,22 @@ static inline bool
pg_atomic_compare_exchange_u32_impl(volatile pg_atomic_uint32 *ptr,
uint32
*expected, uint32 newval)
{
+ bool ret;
+
+ /*
+ * atomics.h specifies sequential consistency ("full barrier semantics")
+ * for this interface. Since "lwsync" provides acquire/release
+ * consistency only, do not use it here. GCC atomics observe the same
+ * restriction; see its rs6000_pre_atomic_barrier().
+ */
+ __asm__ __volatile__ (" sync \n" ::: "memory");
+
/*
* XXX: __compare_and_swap is defined to take signed parameters, but
that
* shouldn't matter since we don't perform any arithmetic operations.
*/
- bool ret = __compare_and_swap((volatile int*)&ptr->value,
-
(int *)expected, (int)newval);
+ ret = __compare_and_swap((volatile int*)&ptr->value,
+ (int *)expected,
(int)newval);
/*
* xlc's documentation tells us:
@@ -63,6 +73,10 @@ pg_atomic_compare_exchange_u32_impl(volatile
pg_atomic_uint32 *ptr,
static inline uint32
pg_atomic_fetch_add_u32_impl(volatile pg_atomic_uint32 *ptr, int32 add_)
{
+ /*
+ * __fetch_and_add() emits a leading "sync" and trailing "isync",
thereby
+ * providing sequential consistency. This is undocumented.
+ */
return __fetch_and_add((volatile int *)&ptr->value, add_);
}
@@ -73,8 +87,12 @@ static inline bool
pg_atomic_compare_exchange_u64_impl(volatile pg_atomic_uint64 *ptr,
uint64
*expected, uint64 newval)
{
- bool ret = __compare_and_swaplp((volatile long*)&ptr->value,
-
(long *)expected, (long)newval);
+ bool ret;
+
+ __asm__ __volatile__ (" sync \n" ::: "memory");
+
+ ret = __compare_and_swaplp((volatile long*)&ptr->value,
+ (long *)expected,
(long)newval);
__isync();
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers