On 04/27/2015 07:02 PM, Peter Geoghegan wrote:
So, this can still happen, but is now happening less often than
before, I believe. On a 16 core server, with continual 128 client
jjanes_upsert exclusion constraint only runs, with fsync=off, I
started at this time:

2015-04-27 21:22:28 UTC [ 0 ]: LOG:  database system was shut down at
2015-04-27 21:22:25 UTC
2015-04-27 21:22:28 UTC [ 0 ]: LOG:  database system is ready to
accept connections
2015-04-27 22:47:20 UTC [ 0 ]: LOG:  autovacuum launcher started
2015-04-27 22:47:21 UTC [ 0 ]: LOG:  autovacuum launcher started

Finally, with ON CONFLICT UPDATE (which we don't intend to support
with exclusion constraints anyway), the torture testing finally
produces a deadlock several hours later (due to having "livelock
insurance" [1]):

2015-04-28 00:22:06 UTC [ 0 ]: LOG:  autovacuum launcher started
2015-04-28 00:37:24 UTC [ 432432057 ]: ERROR:  deadlock detected
2015-04-28 00:37:24 UTC [ 432432057 ]: DETAIL:  Process 130628 waits
for ShareLock on transaction 432432127; blocked by process 130589.
         Process 130589 waits for ShareLock on speculative token 13 of
transaction 432432057; blocked by process 130628.
         Process 130628: insert into upsert_race_test (index, count)
values ('7566','-1') on conflict
                       update set count=TARGET.count + EXCLUDED.count
                       where TARGET.index = EXCLUDED.index
                       returning count
         Process 130589: insert into upsert_race_test (index, count)
values ('7566','1') on conflict
                       update set count=TARGET.count + EXCLUDED.count
                       where TARGET.index = EXCLUDED.index
                       returning count
2015-04-28 00:37:24 UTC [ 432432057 ]: HINT:  See server log for query details.
2015-04-28 00:37:24 UTC [ 432432057 ]: CONTEXT:  while checking
exclusion constraint on tuple (3,36) in relation "upsert_race_test"
2015-04-28 00:37:24 UTC [ 432432057 ]: STATEMENT:  insert into
upsert_race_test (index, count) values ('7566','-1') on conflict
                       update set count=TARGET.count + EXCLUDED.count
                       where TARGET.index = EXCLUDED.index
                       returning count

ON CONFLICT UPDATE will only ever use unique indexes, and so is not affected.

Given that exclusion constraints can only be used with IGNORE, and
given that this is so hard to recreate, I'm inclined to conclude that
it's acceptable. It's certainly way better than risking livelocks by
not having "deadlock insurance". This is a ridiculously CPU-bound
workload, with extreme and constant contention. I'd be surprised if
there were any real complaints from the field in practice.

Do you think that this is acceptable, Heikki?

I thought we had an ironclad scheme to prevent deadlocks like this, so I'd like to understand why that happens.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to