Re: [PERFORM] how to avoid deadlock on massive update with multiple deletes
From: Anibal David Acosta [mailto:a...@devshock.com]
Sent: Thursday, October 04, 2012 10:01 AM
To: pgsql-performance@postgresql.org
Subject: how to avoid deadlock on massive update with multiple deletes

. . .

> The other possibility could be that the update process, while locking
> rows, escalates to a page lock and then tries to escalate to a table
> lock while the delete process holds some locked rows. Thanks!

This (lock escalation from row to page to table) is an MS SQL Server
feature; I am pretty sure Postgres does not do it.

Regards,
Igor Neyman

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
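[Editor's note: this can be checked from psql. Postgres records row locks in the tuple headers on disk rather than in a shared lock table, so locking many rows never escalates to a page or table lock. A minimal sketch, assuming a hypothetical table `accounts` that is not part of this thread:]

```sql
-- Session 1: lock every row. Postgres stores each row lock in the tuple
-- header, so there is no lock-table pressure and no escalation.
BEGIN;
SELECT * FROM accounts FOR UPDATE;   -- "accounts" is a hypothetical table

-- Session 2: pg_locks shows only relation-level entries (e.g. a
-- RowShareLock held by session 1), never one entry per locked row.
SELECT locktype, mode, granted
FROM pg_locks
WHERE relation = 'accounts'::regclass;
```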
On Thu, Oct 4, 2012 at 1:10 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> The bulk update could take an Exclusive (not Access Exclusive) lock. Or
> the delete could perhaps be arranged to delete the records in ctid order
> (although that might still deadlock). Or you could just repeat the
> failed transaction.

How do you make pg update/delete records in bulk, in some particular
order (i.e., without issuing separate queries for each record)?
Presumably something like this?

maciek=# CREATE TABLE test AS SELECT g, random() FROM generate_series(1,1000) g;
CREATE
maciek=# EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY ctid) x where x.g = test.g;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Delete on test  (cost=188.99..242.34 rows=1940 width=34)
   ->  Hash Join  (cost=188.99..242.34 rows=1940 width=34)
         Hash Cond: (x.g = public.test.g)
         ->  Subquery Scan on x  (cost=135.34..159.59 rows=1940 width=32)
               ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
                     Sort Key: public.test.ctid
                     ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
         ->  Hash  (cost=29.40..29.40 rows=1940 width=10)
               ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
(9 rows)
Maciek Sakrejda <m.sakre...@gmail.com> writes:
> Presumably something like this?
> maciek=# CREATE TABLE test AS SELECT g, random() FROM generate_series(1,1000) g;
> CREATE
> maciek=# EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY ctid) x where x.g = test.g;

There's no guarantee that the planner won't re-sort the rows coming from
the sub-select, unfortunately.

        regards, tom lane
On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
> Maciek Sakrejda <m.sakre...@gmail.com> writes:
> > Presumably something like this?
> > maciek=# EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY ctid) x where x.g = test.g;
>
> There's no guarantee that the planner won't re-sort the rows coming from
> the sub-select, unfortunately.

More often than not you can prevent the planner from doing that by
putting an OFFSET 0 in the query. Not 100%, but better than nothing.

We really need ORDER BY for DML.

Andres
--
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Andres Freund <and...@2ndquadrant.com> writes:
> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
> > There's no guarantee that the planner won't re-sort the rows coming
> > from the sub-select, unfortunately.
>
> More often than not you can prevent the planner from doing that by
> putting an OFFSET 0 in the query. Not 100%, but better than nothing.

No, that will accomplish exactly nothing. The ORDER BY is already an
optimization fence. The problem is that of the several ways the planner
might choose to join the subquery output to the original table, not all
will produce the join rows in the same order as the subquery's result.
For instance, when I tried his example I initially got

 Delete on test  (cost=400.88..692.85 rows=18818 width=34)
   ->  Merge Join  (cost=400.88..692.85 rows=18818 width=34)
         Merge Cond: (test.g = x.g)
         ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
               Sort Key: test.g
               ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
         ->  Sort  (cost=265.53..270.38 rows=1940 width=32)
               Sort Key: x.g
               ->  Subquery Scan on x  (cost=135.34..159.59 rows=1940 width=32)
                     ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
                           Sort Key: test_1.ctid
                           ->  Seq Scan on test test_1  (cost=0.00..29.40 rows=1940 width=10)

which is going to do the deletes in g order, not ctid order; and then
after an ANALYZE I got

 Delete on test  (cost=90.83..120.58 rows=1000 width=34)
   ->  Hash Join  (cost=90.83..120.58 rows=1000 width=34)
         Hash Cond: (test.g = x.g)
         ->  Seq Scan on test  (cost=0.00..16.00 rows=1000 width=10)
         ->  Hash  (cost=78.33..78.33 rows=1000 width=32)
               ->  Subquery Scan on x  (cost=65.83..78.33 rows=1000 width=32)
                     ->  Sort  (cost=65.83..68.33 rows=1000 width=10)
                           Sort Key: test_1.ctid
                           ->  Seq Scan on test test_1  (cost=0.00..16.00 rows=1000 width=10)

which is going to do the deletes in ctid order, but that's an artifact of
using a seqscan on the test table; the order of the subquery's output is
irrelevant, since it got hashed.

> We really need ORDER BY for DML.

Meh. That's outside the SQL standard (not only outside the letter of the
standard, but foreign to its very conceptual model) and I don't think the
problem really comes up that often.

Personally, if I had to deal with this I'd use a plpgsql function (or DO
command) that does

    FOR c IN SELECT ctid FROM table WHERE ... ORDER BY ... LOOP
        DELETE FROM table WHERE ctid = c;
    END LOOP;

which is not great but at least it avoids client-to-server traffic.

Having said all that, are we sure this is even a deletion-order problem?
I was wondering about deadlocks from foreign key references, for
instance.

        regards, tom lane
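[Editor's note: spelled out as a runnable DO block, the loop above might look like the following sketch. The table `t`, its `flagged` condition, and the `id` sort column are hypothetical stand-ins for the parts Tom elides:]

```sql
-- Delete matching rows one at a time, in a fixed order, so that two
-- concurrent runs acquire row locks in the same sequence and cannot
-- deadlock against each other.
DO $$
DECLARE
    c tid;
BEGIN
    FOR c IN SELECT ctid FROM t WHERE flagged ORDER BY id LOOP
        DELETE FROM t WHERE ctid = c;
    END LOOP;
END
$$;
```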
On Fri, Oct 5, 2012 at 10:46 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> No, that will accomplish exactly nothing. The ORDER BY is already an
> optimization fence. The problem is that of the several ways the planner
> might choose to join the subquery output to the original table, not all
> will produce the join rows in the same order as the subquery's result.
> [EXPLAIN output snipped]
>
> > We really need ORDER BY for DML.
>
> Meh. That's outside the SQL standard (not only outside the letter of the
> standard, but foreign to its very conceptual model) and I don't think
> the problem really comes up that often. Personally, if I had to deal
> with this I'd use a plpgsql function (or DO command) ...

Can't it be forced like this (assuming it is in fact a vanilla order-by
problem)?

    EXPLAIN DELETE FROM test USING
        (SELECT g FROM test ORDER BY ctid FOR UPDATE) x
    WHERE x.g = test.g;

(emphasis on 'for update')

merlin
On Fri, Oct 5, 2012 at 12:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>     FOR c IN SELECT ctid FROM table WHERE ... ORDER BY ... LOOP
>         DELETE FROM table WHERE ctid = c;
>     END LOOP;

Maybe, in that sense, it would be better to optimize the client-server
protocol for batch operations:

    PREPARE blah(c) AS DELETE FROM table WHERE ctid = $1;
    EXECUTE blah(c1), blah(c2), blah(c3), ...

(one transaction, one roundtrip, multiple queries)
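[Editor's note: even without protocol changes, a batch of known ctids can already be deleted in one roundtrip with a single statement. A sketch against a hypothetical table `t`, though, per the rest of the thread, the order in which a single statement locks those rows is not guaranteed:]

```sql
-- One roundtrip, one query: delete several rows identified by ctid.
DELETE FROM t
WHERE ctid = ANY (ARRAY['(0,1)', '(0,2)', '(0,3)']::tid[]);
```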
Merlin Moncure <mmonc...@gmail.com> writes:
> Can't it be forced like this (assuming it is in fact a vanilla order-by
> problem)?
>     EXPLAIN DELETE FROM test USING
>         (SELECT g FROM test ORDER BY ctid FOR UPDATE) x
>     WHERE x.g = test.g;
> (emphasis on 'for update')

Hm ... yeah, that might work, once you redefine the problem as "get the
row locks in a consistent order" rather than "do the updates in a
consistent order". But I'd be inclined to phrase it as

    EXPLAIN DELETE FROM test USING
        (SELECT ctid FROM test ORDER BY g FOR UPDATE) x
    WHERE x.ctid = test.ctid;

I'm not sure that ORDER BY ctid is really very meaningful here; think
about FOR UPDATE switching its attention to updated versions of rows.

        regards, tom lane
On Friday, October 05, 2012 05:46:05 PM Tom Lane wrote:
> No, that will accomplish exactly nothing. The ORDER BY is already an
> optimization fence.

Yea, sorry. I was thinking of a related problem/solution.

> > We really need ORDER BY for DML.
>
> Meh. That's outside the SQL standard (not only outside the letter of the
> standard, but foreign to its very conceptual model) and I don't think
> the problem really comes up that often.

Back when I mostly did consulting/development on client code it came up
about once a week. I might have a warped view, though, because that's the
kind of thing you would ask a consultant about...

> Having said all that, are we sure this is even a deletion-order problem?
> I was wondering about deadlocks from foreign key references, for
> instance.

Absolutely not sure, no.

Andres
--
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Process 1 (massive update):

    update table A set column1=0, column2=0

Process 2 (multiple delete):

    perform delete_row(user_name, column1, column2) from table A where user_name=YYY

The plpgsql function delete_row deletes the row and does other business
logic not related to table A.

-----Original Message-----
From: Claudio Freire [mailto:klaussfre...@gmail.com]
Sent: Friday, October 05, 2012 10:27 AM
To: Jeff Janes
CC: Anibal David Acosta; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] how to avoid deadlock on massive update with multiple deletes

On Thu, Oct 4, 2012 at 1:10 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> The bulk update could take an Exclusive (not Access Exclusive) lock. Or
> the delete could perhaps be arranged to delete the records in ctid order
> (although that might still deadlock). Or you could just repeat the
> failed transaction.

How do you make pg update/delete records in bulk, in some particular
order (i.e., without issuing separate queries for each record)?
[PERFORM] how to avoid deadlock on massive update with multiple deletes
Hi,

I have a table with about 10 million records. This table is updated and
inserted into very often during the day (approx. 200 times per second).
At night the activity is much lower, so in the first seconds of the day
(00:00:01) a batch process updates some columns (used as counters) of
this table, setting their values to 0.

Yesterday, the first time this happened, I got a deadlock when another
process tried to delete multiple (about 10 or 20) rows of the same table.

I think the situation may have been:

    Process A (PA) (massive update), Process B (PB) (multiple delete):

    PA locks record 1, updates it
    PA locks record 2, updates it
    PA locks record 3, updates it
    PB locks record 4, deletes it
    PB locks record 5, deletes it
    PA tries to lock record 4, waits
    PB tries to lock record 3, waits

The other possibility could be that the update process, while locking
rows, escalates to a page lock and then tries to escalate to a table lock
while the delete process holds some locked rows.

Any ideas how to prevent this situation?

Thanks!
On Thu, Oct 4, 2012 at 7:01 AM, Anibal David Acosta <a...@devshock.com> wrote:
> Hi, I have a table with about 10 million records. This table is updated
> and inserted into very often during the day (approx. 200 times per
> second). At night the activity is much lower, so in the first seconds of
> the day (00:00:01) a batch process updates some columns (used as
> counters) of this table, setting their values to 0.
>
> Yesterday, the first time this happened, I got a deadlock when another
> process tried to delete multiple (about 10 or 20) rows of the same
> table.
> ...
> Any ideas how to prevent this situation?

The bulk update could take an Exclusive (not Access Exclusive) lock. Or
the delete could perhaps be arranged to delete the records in ctid order
(although that might still deadlock). Or you could just repeat the failed
transaction.

Cheers,

Jeff
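[Editor's note: Jeff's first suggestion could be sketched like this for the original poster's counters table (here just called `a`). EXCLUSIVE mode conflicts with the ROW EXCLUSIVE lock that DELETE takes, so the deleting process waits at the table level instead of interleaving row locks with the bulk update, while plain SELECTs can still read the table:]

```sql
BEGIN;
-- EXCLUSIVE blocks concurrent INSERT/UPDATE/DELETE but not plain SELECT,
-- so the nightly reset cannot deadlock against the delete process.
LOCK TABLE a IN EXCLUSIVE MODE;
UPDATE a SET column1 = 0, column2 = 0;
COMMIT;
```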