On 04/27/2015 07:02 PM, Peter Geoghegan wrote:
So, this can still happen, but is now happening less often than before, I believe. On a 16 core server, with continual 128 client jjanes_upsert exclusion constraint only runs, with fsync=off, I started at this time:2015-04-27 21:22:28 UTC [ 0 ]: LOG: database system was shut down at 2015-04-27 21:22:25 UTC 2015-04-27 21:22:28 UTC [ 0 ]: LOG: database system is ready to accept connections 2015-04-27 22:47:20 UTC [ 0 ]: LOG: autovacuum launcher started 2015-04-27 22:47:21 UTC [ 0 ]: LOG: autovacuum launcher started Finally, with ON CONFLICT UPDATE (which we don't intend to support with exclusion constraints anyway), the torture testing finally produces a deadlock several hours later (due to having "livelock insurance" [1]): 2015-04-28 00:22:06 UTC [ 0 ]: LOG: autovacuum launcher started 2015-04-28 00:37:24 UTC [ 432432057 ]: ERROR: deadlock detected 2015-04-28 00:37:24 UTC [ 432432057 ]: DETAIL: Process 130628 waits for ShareLock on transaction 432432127; blocked by process 130589. Process 130589 waits for ShareLock on speculative token 13 of transaction 432432057; blocked by process 130628. Process 130628: insert into upsert_race_test (index, count) values ('7566','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count Process 130589: insert into upsert_race_test (index, count) values ('7566','1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count 2015-04-28 00:37:24 UTC [ 432432057 ]: HINT: See server log for query details. 2015-04-28 00:37:24 UTC [ 432432057 ]: CONTEXT: while checking exclusion constraint on tuple (3,36) in relation "upsert_race_test" 2015-04-28 00:37:24 UTC [ 432432057 ]: STATEMENT: insert into upsert_race_test (index, count) values ('7566','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count ON CONFLICT UPDATE will only ever use unique indexes, and so is not affected. Given that exclusion constraints can only be used with IGNORE, and given that this is so hard to recreate, I'm inclined to conclude that it's acceptable. It's certainly way better than risking livelocks by not having "deadlock insurance". This is a ridiculously CPU-bound workload, with extreme and constant contention. I'd be surprised if there were any real complaints from the field in practice. Do you think that this is acceptable, Heikki?
I thought we had an ironclad scheme to prevent deadlocks like this, so I'd like to understand why that happens.
- Heikki -- Sent via pgsql-hackers mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
