Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Igor Neyman
From: Anibal David Acosta [mailto:a...@devshock.com] 
Sent: Thursday, October 04, 2012 10:01 AM
To: pgsql-performance@postgresql.org
Subject: how to avoid deadlock on masive update with multiples delete

.
. 
.

The other situation could be that the update process, while locking rows,
escalates to a page lock and then tries to escalate to a table lock, while
the delete process holds some row locks.

Thanks!


This (lock escalation from row -> page -> table) is an MS SQL Server
feature; pretty sure Postgres does not do it.

Regards,
Igor Neyman




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Claudio Freire
On Thu, Oct 4, 2012 at 1:10 PM, Jeff Janes jeff.ja...@gmail.com wrote:
> The bulk update could take an Exclusive (not Access Exclusive) lock.
> Or the delete could perhaps be arranged to delete the records in ctid
> order (although that might still deadlock).  Or you could just repeat
> the failed transaction.

How do you make pg update/delete records, in bulk, in some particular order?

(ie, without issuing separate queries for each record)




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Maciek Sakrejda
Presumably something like this?:

maciek=# CREATE TABLE test AS SELECT g, random() FROM
generate_series(1,1000) g;
CREATE
maciek=# EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY
ctid) x where x.g = test.g;
                                    QUERY PLAN
---------------------------------------------------------------------------
 Delete on test  (cost=188.99..242.34 rows=1940 width=34)
   ->  Hash Join  (cost=188.99..242.34 rows=1940 width=34)
         Hash Cond: (x.g = public.test.g)
         ->  Subquery Scan on x  (cost=135.34..159.59 rows=1940 width=32)
               ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
                     Sort Key: public.test.ctid
                     ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
         ->  Hash  (cost=29.40..29.40 rows=1940 width=10)
               ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
(9 rows)




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Tom Lane
Maciek Sakrejda m.sakre...@gmail.com writes:
> Presumably something like this?:
> maciek=# CREATE TABLE test AS SELECT g, random() FROM
> generate_series(1,1000) g;
> CREATE
> maciek=# EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY
> ctid) x where x.g = test.g;

There's no guarantee that the planner won't re-sort the rows coming from
the sub-select, unfortunately.

regards, tom lane




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Andres Freund
On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
> Maciek Sakrejda m.sakre...@gmail.com writes:
>> Presumably something like this?:
>> maciek=# CREATE TABLE test AS SELECT g, random() FROM
>> generate_series(1,1000) g;
>> CREATE
>> maciek=# EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY
>> ctid) x where x.g = test.g;
>
> There's no guarantee that the planner won't re-sort the rows coming from
> the sub-select, unfortunately.
More often than not you can prevent the planner from doing that by putting an
OFFSET 0 in the query. Not 100% but better than nothing.
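
(Concretely, on Maciek's example the fence would look like this -- a sketch
only; as Tom explains downthread, it does not actually help in this case:)

    DELETE FROM test
    USING (SELECT g FROM test ORDER BY ctid OFFSET 0) x
    WHERE x.g = test.g;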

We really need ORDER BY for DML.

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
>> There's no guarantee that the planner won't re-sort the rows coming from
>> the sub-select, unfortunately.

> More often than not you can prevent the planner from doing that by putting
> an OFFSET 0 in the query. Not 100% but better than nothing.

No, that will accomplish exactly nothing.  The ORDER BY is already an
optimization fence.  The problem is that of the several ways the planner
might choose to join the subquery output to the original table, not all
will produce the join rows in the same order as the subquery's result.
For instance, when I tried his example I initially got

 Delete on test  (cost=400.88..692.85 rows=18818 width=34)
   ->  Merge Join  (cost=400.88..692.85 rows=18818 width=34)
         Merge Cond: (test.g = x.g)
         ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
               Sort Key: test.g
               ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
         ->  Sort  (cost=265.53..270.38 rows=1940 width=32)
               Sort Key: x.g
               ->  Subquery Scan on x  (cost=135.34..159.59 rows=1940 width=32)
                     ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
                           Sort Key: test_1.ctid
                           ->  Seq Scan on test test_1  (cost=0.00..29.40 rows=1940 width=10)

which is going to do the deletes in g order, not ctid order;
and then after an ANALYZE I got

 Delete on test  (cost=90.83..120.58 rows=1000 width=34)
   ->  Hash Join  (cost=90.83..120.58 rows=1000 width=34)
         Hash Cond: (test.g = x.g)
         ->  Seq Scan on test  (cost=0.00..16.00 rows=1000 width=10)
         ->  Hash  (cost=78.33..78.33 rows=1000 width=32)
               ->  Subquery Scan on x  (cost=65.83..78.33 rows=1000 width=32)
                     ->  Sort  (cost=65.83..68.33 rows=1000 width=10)
                           Sort Key: test_1.ctid
                           ->  Seq Scan on test test_1  (cost=0.00..16.00 rows=1000 width=10)

which is going to do the deletes in ctid order, but that's an artifact
of using a seqscan on the test table; the order of the subquery's output
is irrelevant, since it got hashed.

> We really need ORDER BY for DML.

Meh.  That's outside the SQL standard (not only outside the letter of
the standard, but foreign to its very conceptual model) and I don't
think the problem really comes up that often.  Personally, if I had to
deal with this I'd use a plpgsql function (or DO command) that does

FOR c IN SELECT ctid FROM table WHERE ... ORDER BY ... LOOP
    DELETE FROM table WHERE ctid = c;
END LOOP;

which is not great but at least it avoids client-to-server traffic.
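
(A self-contained version of the same sketch as a DO block; the table t,
filter column flagged, and ordering column id are hypothetical:)

    DO $$
    DECLARE
        c tid;
    BEGIN
        -- Delete rows one at a time, in a fixed (id) order,
        -- entirely server-side.
        FOR c IN SELECT ctid FROM t WHERE flagged ORDER BY id LOOP
            DELETE FROM t WHERE ctid = c;
        END LOOP;
    END;
    $$;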

Having said all that, are we sure this is even a deletion-order
problem?  I was wondering about deadlocks from foreign key references,
for instance.

regards, tom lane




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Merlin Moncure
On Fri, Oct 5, 2012 at 10:46 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Andres Freund and...@2ndquadrant.com writes:
>> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
>>> There's no guarantee that the planner won't re-sort the rows coming from
>>> the sub-select, unfortunately.
>
>> More often than not you can prevent the planner from doing that by putting
>> an OFFSET 0 in the query. Not 100% but better than nothing.
>
> No, that will accomplish exactly nothing.  The ORDER BY is already an
> optimization fence.  The problem is that of the several ways the planner
> might choose to join the subquery output to the original table, not all
> will produce the join rows in the same order as the subquery's result.
> For instance, when I tried his example I initially got
>
>  Delete on test  (cost=400.88..692.85 rows=18818 width=34)
>    ->  Merge Join  (cost=400.88..692.85 rows=18818 width=34)
>          Merge Cond: (test.g = x.g)
>          ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
>                Sort Key: test.g
>                ->  Seq Scan on test  (cost=0.00..29.40 rows=1940 width=10)
>          ->  Sort  (cost=265.53..270.38 rows=1940 width=32)
>                Sort Key: x.g
>                ->  Subquery Scan on x  (cost=135.34..159.59 rows=1940 width=32)
>                      ->  Sort  (cost=135.34..140.19 rows=1940 width=10)
>                            Sort Key: test_1.ctid
>                            ->  Seq Scan on test test_1  (cost=0.00..29.40 rows=1940 width=10)
>
> which is going to do the deletes in g order, not ctid order;
> and then after an ANALYZE I got
>
>  Delete on test  (cost=90.83..120.58 rows=1000 width=34)
>    ->  Hash Join  (cost=90.83..120.58 rows=1000 width=34)
>          Hash Cond: (test.g = x.g)
>          ->  Seq Scan on test  (cost=0.00..16.00 rows=1000 width=10)
>          ->  Hash  (cost=78.33..78.33 rows=1000 width=32)
>                ->  Subquery Scan on x  (cost=65.83..78.33 rows=1000 width=32)
>                      ->  Sort  (cost=65.83..68.33 rows=1000 width=10)
>                            Sort Key: test_1.ctid
>                            ->  Seq Scan on test test_1  (cost=0.00..16.00 rows=1000 width=10)
>
> which is going to do the deletes in ctid order, but that's an artifact
> of using a seqscan on the test table; the order of the subquery's output
> is irrelevant, since it got hashed.
>
>> We really need ORDER BY for DML.
>
> Meh.  That's outside the SQL standard (not only outside the letter of
> the standard, but foreign to its very conceptual model) and I don't
> think the problem really comes up that often.  Personally, if I had to
> deal with this I'd use a plpgsql function (or DO command) that does

Can't it be forced like this (assuming it is in fact a vanilla order
by problem)?

EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY
ctid FOR UPDATE) x where x.g = test.g;

(emphasis on 'for update')

merlin




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Claudio Freire
On Fri, Oct 5, 2012 at 12:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> FOR c IN SELECT ctid FROM table WHERE ... ORDER BY ... LOOP
>     DELETE FROM table WHERE ctid = c;
> END LOOP;

Maybe, in that sense, it would be better to optimize the client-server
protocol for batch operations.

PREPARE blah(tid) AS DELETE FROM table WHERE ctid = $1;

EXECUTE blah(c1), blah(c2), blah(c3), ...
 ^ 1 transaction, 1 roundtrip, multiple queries
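
(In the meantime, a whole batch can already go in one statement and one
round trip by passing the ctids as an array -- a sketch against a
hypothetical table t, not the protocol change proposed above; note that
ctids are only stable while the rows aren't updated in between:)

    PREPARE del_batch(tid[]) AS
        DELETE FROM t WHERE ctid = ANY ($1);
    EXECUTE del_batch(ARRAY['(0,1)','(0,2)','(0,3)']::tid[]);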




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Tom Lane
Merlin Moncure mmonc...@gmail.com writes:
> Can't it be forced like this (assuming it is in fact a vanilla order
> by problem)?

> EXPLAIN DELETE FROM test USING (SELECT g FROM test ORDER BY
> ctid FOR UPDATE) x where x.g = test.g;

> (emphasis on 'for update')

Hm ... yeah, that might work, once you redefine the problem as "get the
row locks in a consistent order" rather than "do the updates in a
consistent order".  But I'd be inclined to phrase it as

EXPLAIN DELETE FROM test USING (SELECT ctid FROM test ORDER BY
g FOR UPDATE) x where x.ctid = test.ctid;

I'm not sure that ORDER BY ctid is really very meaningful here; think
about FOR UPDATE switching its attention to updated versions of rows.
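
(Mapped back to the original workload, both processes would take their row
locks in the same total order before modifying anything -- a sketch using
the poster's table a; the unique ordering column id is an assumption:)

    -- Nightly bulk update: lock every row in id order, then update.
    BEGIN;
    SELECT 1 FROM a ORDER BY id FOR UPDATE;
    UPDATE a SET column1 = 0, column2 = 0;
    COMMIT;

    -- Multiple-delete process: same ordering rule for its row locks.
    BEGIN;
    SELECT 1 FROM a WHERE user_name = 'YYY' ORDER BY id FOR UPDATE;
    DELETE FROM a WHERE user_name = 'YYY';
    COMMIT;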

regards, tom lane




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Andres Freund
On Friday, October 05, 2012 05:46:05 PM Tom Lane wrote:
> Andres Freund and...@2ndquadrant.com writes:
>> On Friday, October 05, 2012 05:31:43 PM Tom Lane wrote:
>>> There's no guarantee that the planner won't re-sort the rows coming from
>>> the sub-select, unfortunately.
>>
>> More often than not you can prevent the planner from doing that by
>> putting an OFFSET 0 in the query. Not 100% but better than nothing.
>
> No, that will accomplish exactly nothing.  The ORDER BY is already an
> optimization fence.
Yea, sorry. I was thinking of a related problem/solution.

>> We really need ORDER BY for DML.
>
> Meh.  That's outside the SQL standard (not only outside the letter of
> the standard, but foreign to its very conceptual model) and I don't
> think the problem really comes up that often.
Back when I mostly did consulting/development on client code it came up
about once a week. I might have a warped view though, because that's the
kind of thing you would ask a consultant about...

> Having said all that, are we sure this is even a deletion-order
> problem?  I was wondering about deadlocks from foreign key references,
> for instance.
Absolutely not sure, no.

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-05 Thread Anibal David Acosta
Process 1 (massive update): update table A set column1=0, column2=0 

Process 2 (multiple delete): perform delete_row(user_name, column1, column2)
from table A where user_name=YYY

The plpgsql function delete_row deletes the row and does other business
logic not related to table A.



-----Original Message-----
From: Claudio Freire [mailto:klaussfre...@gmail.com]
Sent: Friday, October 05, 2012 10:27 AM
To: Jeff Janes
CC: Anibal David Acosta; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] how to avoid deadlock on masive update with multiples
delete

On Thu, Oct 4, 2012 at 1:10 PM, Jeff Janes jeff.ja...@gmail.com wrote:
> The bulk update could take an Exclusive (not Access Exclusive) lock.
> Or the delete could perhaps be arranged to delete the records in ctid
> order (although that might still deadlock).  Or you could just repeat
> the failed transaction.

How do you make pg update/delete records, in bulk, in some particular order?

(ie, without issuing separate queries for each record)





[PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-04 Thread Anibal David Acosta
Hi,

I have a table with about 10 million records. This table is updated and
inserted into very often during the day (approx. 200 operations per second);
at night the activity is a lot less, so in the first seconds of the day
(00:00:01) a batch process updates some columns (used as counters) of this
table, setting their value to 0.

 

Yesterday, for the first time, I got a deadlock when another process tried to
delete multiple (about 10 or 20) rows of the same table.

 

I think that maybe the situation was:

 

Process A (PA) (massive update)

Process B (PB) (multiple delete)

 

PA locks record 1, update
PA locks record 2, update
PA locks record 3, update
PB locks record 4, delete
PB locks record 5, delete
PA tries to lock record 4, waiting
PB tries to lock record 3, waiting -- deadlock

 

The other situation could be that the update process, while locking rows,
escalates to a page lock and then tries to escalate to a table lock, while
the delete process holds some row locks.

 

Any ideas how to prevent this situation?

 

Thanks!



Re: [PERFORM] how to avoid deadlock on masive update with multiples delete

2012-10-04 Thread Jeff Janes
On Thu, Oct 4, 2012 at 7:01 AM, Anibal David Acosta a...@devshock.com wrote:
> Hi,
>
> I have a table with about 10 million records. This table is updated and
> inserted into very often during the day (approx. 200 operations per
> second); at night the activity is a lot less, so in the first seconds of
> the day (00:00:01) a batch process updates some columns (used as
> counters) of this table, setting their value to 0.
>
> Yesterday, for the first time, I got a deadlock when another process
> tried to delete multiple (about 10 or 20) rows of the same table.
...

> Any ideas how to prevent this situation?

The bulk update could take an Exclusive (not Access Exclusive) lock.
Or the delete could perhaps be arranged to delete the records in ctid
order (although that might still deadlock).  Or you could just repeat
the failed transaction.
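
(The first option in SQL, against the table from the original post -- a
sketch; EXCLUSIVE mode conflicts with the ROW EXCLUSIVE lock that DELETE
takes, so the two jobs serialize instead of deadlocking:)

    BEGIN;
    -- Blocks concurrent INSERT/UPDATE/DELETE, but not plain SELECTs.
    LOCK TABLE a IN EXCLUSIVE MODE;
    UPDATE a SET column1 = 0, column2 = 0;
    COMMIT;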

Cheers,

Jeff

