Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-13 Thread Denis
Jeff Janes wrote
 On Thu, Nov 8, 2012 at 1:04 AM, Denis <socsam@> wrote:

 Still I can't understand why pg_dump has to know about all the tables?
 
 Strictly speaking it probably doesn't need to.  But it is primarily
 designed for dumping entire databases, and the efficient way to do
 that is to read it all into memory in a few queries and then sort out
 the dependencies, rather than tracking down every dependency
 individually with one or more trips back to the database.  (Although
 it still does make plenty of trips back to the database per
 table/sequence, for acls, defaults, attributes.)
 
 If you were to rewrite pg_dump from the ground up to achieve your
 specific needs (dumping one schema, with no dependencies to
 other schemata) you could probably make it much more efficient.  But
 then it wouldn't be pg_dump, it would be something else.
 
 Cheers,
 
 Jeff
 
 

Please don't think that I'm trying to nitpick here, but pg_dump has options
for dumping separate tables, and that's not really consistent with the idea
that pg_dump is primarily designed for dumping entire databases.







Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-13 Thread Andrew Dunstan


On 11/13/2012 10:12 PM, Denis wrote:

Please don't think that I'm trying to nitpick here, but pg_dump has options
for dumping separate tables, and that's not really consistent with the idea
that pg_dump is primarily designed for dumping entire databases.





Sure it is. The word "primarily" is not just a noise word here.

The fact that we have options to do other things doesn't mean that its 
primary design goal has changed.



cheers

andrew




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-09 Thread Jeff Janes
On Thu, Nov 8, 2012 at 1:04 AM, Denis soc...@gmail.com wrote:

 Still I can't understand why pg_dump has to know about all the tables?

Strictly speaking it probably doesn't need to.  But it is primarily
designed for dumping entire databases, and the efficient way to do
that is to read it all into memory in a few queries and then sort out
the dependencies, rather than tracking down every dependency
individually with one or more trips back to the database.  (Although
it still does make plenty of trips back to the database per
table/sequence, for acls, defaults, attributes.)
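
As a purely illustrative sketch (the schema and table names are the
hypothetical ones used elsewhere in this thread, and this is not code taken
from pg_dump itself), the per-object alternative would mean a catalog round
trip like the following for every table being dumped, instead of a handful
of large queries:

-- Illustration only: look up what one table depends on, one round trip
-- per object.
SELECT d.refclassid::regclass AS referenced_catalog,
       d.refobjid,
       d.deptype
FROM pg_depend d
WHERE d.classid = 'pg_class'::regclass
  AND d.objid   = 'schema_name.comments'::regclass;

With 183924 tables, that pattern would mean hundreds of thousands of round
trips, which is the cost the bulk approach avoids.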

If you were to rewrite pg_dump from the ground up to achieve your
specific needs (dumping one schema, with no dependencies to
other schemata) you could probably make it much more efficient.  But
then it wouldn't be pg_dump, it would be something else.

Cheers,

Jeff




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-08 Thread Denis
Tom Lane-2 wrote
 Denis <socsam@> writes:
 Tom Lane-2 wrote
 Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
 that you've got 183924 tables.  That's going to take some time no matter
 what.
 
 I wonder why pg_dump has to deal with all these 183924 tables, if I
 specified to dump only one schema: pg_dump -n schema_name, or even just
 one table: pg_dump -t 'schema_name.comments'?
 
 It has to know about all the tables even if it's not going to dump them
 all, for purposes such as dependency analysis.
 
 We have a web application where we create a schema with a number of
 tables in it for each customer. This architecture was chosen to ease the
 process of backing up and restoring data.
 
 I find that argument fairly dubious, but in any case you should not
 imagine that hundreds of thousands of tables are going to be cost-free.
 
   regards, tom lane
 
 

Still I can't understand why pg_dump has to know about all the tables. For
example, I have such a simple table:

CREATE TABLE CLog (
    fromUser integer,
    toUser integer,
    message character varying(2048) NOT NULL,
    dateSend timestamp without time zone NOT NULL
);

It has no foreign keys, it doesn't use partitioning, and it doesn't have any
relations to any other table. Why does pg_dump have to gather information
about ALL the tables in the database just to dump this one table?







Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-07 Thread Denis
Tom Lane-2 wrote
 Denis <socsam@> writes:
 Here is the output of EXPLAIN ANALYZE. It took 5 seconds but usually it
 takes from 10 to 15 seconds when I am doing a backup.
 
 Sort  (cost=853562.04..854020.73 rows=183478 width=219) (actual
 time=5340.477..5405.604 rows=183924 loops=1)
 
 Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
 that you've got 183924 tables.  That's going to take some time no matter
 what.
 
 It does seem like we could make some small changes to optimize that
 query a little bit, but they're not going to result in any amazing
 improvement overall, because pg_dump still has to deal with all the
 tables it's getting back.  Fundamentally, I would ask whether you really
 need so many tables.  It seems pretty likely that you have lots and lots
 of basically-identical tables.  Usually it would be better to redesign
 such a structure into fewer tables with more index columns.
 
 Here is the output of pg_dump -s > test.dump
 <http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump>
 
 This dump contains only 1 schema and 43 tables, so I don't think it's
 for the database you're having trouble with ...
 
   regards, tom lane
 
 

I wonder why pg_dump has to deal with all these 183924 tables, if I
specified to dump only one schema: pg_dump -n schema_name, or even just
one table: pg_dump -t 'schema_name.comments'?

We have a web application where we create a schema with a number of tables
in it for each customer. This architecture was chosen to ease the process of
backing up and restoring data. Sometimes clients ask us to restore data for
the last month or roll back to last week's state. This task is easy to
accomplish when the client's data is isolated in a schema/DB. If we put all
the clients' data in one table, operations of this kind will be much harder
to perform: we would have to restore a huge DB with enormously large tables
in it just to find the requested data.
Different clients have different activity rates, and we can select different
backup strategies accordingly. This would be impossible if we kept all the
clients' data in one table.
Besides all of the above, the probability of massive data corruption (if an
error in our web application occurs) would be much higher.









Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-07 Thread Tom Lane
Denis soc...@gmail.com writes:
 Tom Lane-2 wrote
 Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
 that you've got 183924 tables.  That's going to take some time no matter
 what.

 I wonder why pg_dump has to deal with all these 183924 tables, if I
 specified to dump only one schema: pg_dump -n schema_name, or even just
 one table: pg_dump -t 'schema_name.comments'?

It has to know about all the tables even if it's not going to dump them
all, for purposes such as dependency analysis.
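
As a minimal illustration of why that is (the schema and object names below
are hypothetical, not taken from this thread), a single-schema dump can
still involve objects outside that schema:

CREATE SCHEMA customer_a;
CREATE SCHEMA shared;

CREATE TABLE shared.countries (id integer PRIMARY KEY, name text);

-- This view lives in customer_a, but it depends on shared.countries;
-- pg_dump -n customer_a cannot detect or order that dependency without
-- looking at catalog entries for objects outside customer_a.
CREATE VIEW customer_a.local_countries AS
    SELECT id, name FROM shared.countries;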

 We have a web application where we create a schema with a number of tables
 in it for each customer. This architecture was chosen to ease the process of
 backup/restoring data.

I find that argument fairly dubious, but in any case you should not
imagine that hundreds of thousands of tables are going to be cost-free.

regards, tom lane




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-06 Thread Denis
I've read all the posts in this thread, and as I understand it, some patches
were applied in version 9.2 to improve pg_dump speed. I've just installed
PostgreSQL 9.2.1 and I still have the same problem. I have a database with
2600 schemas in it. I try to dump each schema individually, but it takes too
much time for every schema (about 30-40 seconds per schema, no matter what
the data size is). Also, for each schema dump I get a slow query log entry;
here is an example:

2012-11-06 13:15:32 GMTLOG:  duration: 12029.334 ms  statement: SELECT
c.tableoid, c.oid, c.relname, c.relacl, c.relkind, c.relnamespace, (SELECT
rolname FROM pg_catalog.pg_roles WHERE oid = c.relowner) AS rolname,
c.relchecks, c.relhastriggers, c.relhasindex, c.relhasrules, c.relhasoids,
c.relfrozenxid, tc.oid AS toid, tc.relfrozenxid AS tfrozenxid,
c.relpersistence, CASE WHEN c.reloftype <> 0 THEN
c.reloftype::pg_catalog.regtype ELSE NULL END AS reloftype, d.refobjid AS
owning_tab, d.refobjsubid AS owning_col, (SELECT spcname FROM pg_tablespace
t WHERE t.oid = c.reltablespace) AS reltablespace,
array_to_string(c.reloptions, ', ') AS reloptions,
array_to_string(array(SELECT 'toast.' || x FROM unnest(tc.reloptions) x), ',
') AS toast_reloptions FROM pg_class c LEFT JOIN pg_depend d ON (c.relkind =
'S' AND d.classid = c.tableoid AND d.objid = c.oid AND d.objsubid = 0 AND
d.refclassid = c.tableoid AND d.deptype = 'a') LEFT JOIN pg_class tc ON
(c.reltoastrelid = tc.oid) WHERE c.relkind in ('r', 'S', 'v', 'c', 'f')
ORDER BY c.oid







Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-06 Thread Tom Lane
Denis soc...@gmail.com writes:
 I've read all the posts in this thread, and as I understand it, some patches
 were applied in version 9.2 to improve pg_dump speed. I've just installed
 PostgreSQL 9.2.1 and I still have the same problem. I have a database with
 2600 schemas in it. I try to dump each schema individually, but it takes too
 much time for every schema (about 30-40 seconds per schema, no matter what
 the data size is).

Could you provide a test case for that?  Maybe the output of pg_dump -s,
anonymized as you see fit?

 Also for each schema dump I have a slow query log entry,

Could you provide EXPLAIN ANALYZE output for that query?

regards, tom lane




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-06 Thread Denis
Tom Lane-2 wrote
 Denis <socsam@> writes:
 I've read all the posts in this thread, and as I understand it, some
 patches were applied in version 9.2 to improve pg_dump speed. I've just
 installed PostgreSQL 9.2.1 and I still have the same problem. I have a
 database with 2600 schemas in it. I try to dump each schema individually,
 but it takes too much time for every schema (about 30-40 seconds per
 schema, no matter what the data size is).
 
 Could you provide a test case for that?  Maybe the output of pg_dump -s,
 anonymized as you see fit?
 
 Also for each schema dump I have a slow query log entry,
 
 Could you provide EXPLAIN ANALYZE output for that query?
 
   regards, tom lane
 
 

Here is the output of EXPLAIN ANALYZE. It took 5 seconds but usually it
takes from 10 to 15 seconds when I am doing a backup.

Sort  (cost=853562.04..854020.73 rows=183478 width=219) (actual time=5340.477..5405.604 rows=183924 loops=1)
   Sort Key: c.oid
   Sort Method: external merge  Disk: 33048kB
   ->  Hash Left Join  (cost=59259.80..798636.25 rows=183478 width=219) (actual time=839.297..4971.299 rows=183924 loops=1)
         Hash Cond: (c.reltoastrelid = tc.oid)
         ->  Hash Right Join  (cost=29530.77..146976.65 rows=183478 width=183) (actual time=404.959..3261.462 rows=183924 loops=1)
               Hash Cond: ((d.classid = c.tableoid) AND (d.objid = c.oid) AND (d.refclassid = c.tableoid))
               Join Filter: (c.relkind = 'S'::char)
               ->  Seq Scan on pg_depend d  (cost=0.00..71403.54 rows=995806 width=20) (actual time=1.137..878.571 rows=998642 loops=1)
                     Filter: ((objsubid = 0) AND (deptype = 'a'::char))
                     Rows Removed by Filter: 2196665
               ->  Hash  (cost=21839.91..21839.91 rows=183478 width=175) (actual time=402.947..402.947 rows=183924 loops=1)
                     Buckets: 1024  Batches: 32  Memory Usage: 876kB
                     ->  Seq Scan on pg_class c  (cost=0.00..21839.91 rows=183478 width=175) (actual time=0.017..267.614 rows=183924 loops=1)
                           Filter: (relkind = ANY ('{r,S,v,c,f}'::char[]))
                           Rows Removed by Filter: 383565
         ->  Hash  (cost=18333.79..18333.79 rows=560979 width=40) (actual time=434.258..434.258 rows=567489 loops=1)
               Buckets: 4096  Batches: 32  Memory Usage: 703kB
               ->  Seq Scan on pg_class tc  (cost=0.00..18333.79 rows=560979 width=40) (actual time=0.003..273.418 rows=567489 loops=1)
         SubPlan 1
           ->  Seq Scan on pg_authid  (cost=0.00..1.01 rows=1 width=68) (actual time=0.001..0.001 rows=1 loops=183924)
                 Filter: (oid = c.relowner)
                 Rows Removed by Filter: 2
         SubPlan 2
           ->  Seq Scan on pg_tablespace t  (cost=0.00..1.02 rows=1 width=64) (actual time=0.001..0.001 rows=0 loops=183924)
                 Filter: (oid = c.reltablespace)
                 Rows Removed by Filter: 2
         SubPlan 3
           ->  Function Scan on unnest x  (cost=0.00..1.25 rows=100 width=32) (actual time=0.001..0.001 rows=0 loops=183924)
 Total runtime: 5428.498 ms

Here is the output of pg_dump -s > test.dump:
http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump







Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-09-02 Thread Jeff Janes
On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued. I replaced them with a single LOCK TABLE
 with multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now it only takes 3 seconds. Comments?

 Was this applied?

 No, we fixed the server side instead.

 But only for 9.2, right?  So people running back branches are still screwed.

 Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
 behaviors involved here, not all of which are masked by what Tatsuo-san
 suggested.

All of the other ones that I know of were associated with pg_dump
itself, and since it is recommended to run the newer version of
pg_dump against the older version of the server, no back patching
would be necessary to get the benefits of those particular fixes.

 Six months or a year from now, we might have enough confidence in that
 batch of 9.2 fixes to back-port them en masse.  Don't want to do it
 today though.


What would be the recommendation for people trying to upgrade, but who
can't get their data out in a reasonable window?

Putting Tatsuo-san's change into a future pg_dump might be more
conservative than back-porting the server's Lock Table change to the
server version they are trying to get rid of.

Cheers,

Jeff




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-09-02 Thread Robert Haas
On Sun, Sep 2, 2012 at 5:39 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued. I replaced them with a single LOCK TABLE
 with multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now it only takes 3 seconds. Comments?

 Was this applied?

 No, we fixed the server side instead.

 But only for 9.2, right?  So people running back branches are still screwed.

 Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
 behaviors involved here, not all of which are masked by what Tatsuo-san
 suggested.

 All of the other ones that I know of were associated with pg_dump
 itself, and since it is recommended to run the newer version of
 pg_dump against the older version of the server, no back patching
 would be necessary to get the benefits of those particular fixes.

 Six months or a year from now, we might have enough confidence in that
 batch of 9.2 fixes to back-port them en masse.  Don't want to do it
 today though.


 What would be the recommendation for people trying to upgrade, but who
 can't get their data out in a reasonable window?

 Putting Tatsuo-san's change into a future pg_dump might be more
 conservative than back-porting the server's Lock Table change to the
 server version they are trying to get rid of.

What he said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-08-30 Thread Robert Haas
On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
 Ok, I modified the part of pg_dump where a tremendous number of LOCK
 TABLE statements are issued. I replaced them with a single LOCK TABLE
 with multiple tables. With 100k tables the LOCK statements took 13
 minutes in total; now it only takes 3 seconds. Comments?

 Shall I commit to master and all supported branches?

 Was this applied?

 No, we fixed the server side instead.

But only for 9.2, right?  So people running back branches are still screwed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 It's not clear whether Tom is already working on that O(N^2) fix in locking.

I'm not; Jeff Janes is.  But you shouldn't be holding your breath
anyway, since it's 9.3 material at this point.

regards, tom lane



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Claudio Freire klaussfre...@gmail.com writes:
 It's not clear whether Tom is already working on that O(N^2) fix in locking.

 I'm not; Jeff Janes is.  But you shouldn't be holding your breath
 anyway, since it's 9.3 material at this point.

I agree we can't back-patch that change, but then I think we ought to
consider back-patching some variant of Tatsuo's patch.  Maybe it's not
reasonable to thunk an arbitrary number of relation names in there on
one line, but how about 1000 relations per LOCK statement or so?  I
guess we'd need to see how much that erodes the benefit, but we've
certainly done back-branch rearrangements in pg_dump in the past to
fix various kinds of issues, and this is pretty non-invasive.
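
For reference, a sketch of what that batching would look like on the wire
(the relation names are hypothetical; pg_dump takes ACCESS SHARE locks, and
LOCK TABLE accepts a comma-separated list of relations):

-- One statement and one round trip per relation (the existing behavior):
LOCK TABLE customer_0001.comments IN ACCESS SHARE MODE;
LOCK TABLE customer_0001.clog IN ACCESS SHARE MODE;
LOCK TABLE customer_0001.orders IN ACCESS SHARE MODE;

-- Batched variant: many relations (e.g. up to 1000) per statement,
-- cutting the number of statements and round trips:
LOCK TABLE customer_0001.comments,
           customer_0001.clog,
           customer_0001.orders
    IN ACCESS SHARE MODE;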

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not; Jeff Janes is.  But you shouldn't be holding your breath
 anyway, since it's 9.3 material at this point.

 I agree we can't back-patch that change, but then I think we ought to
 consider back-patching some variant of Tatsuo's patch.  Maybe it's not
 reasonable to thunk an arbitrary number of relation names in there on
 one line, but how about 1000 relations per LOCK statement or so?  I
 guess we'd need to see how much that erodes the benefit, but we've
 certainly done back-branch rearrangements in pg_dump in the past to
 fix various kinds of issues, and this is pretty non-invasive.

I am not convinced either that this patch will still be useful after
Jeff's fix goes in, or that it provides any meaningful savings when
you consider a complete pg_dump run.  Yeah, it will make the lock
acquisition phase faster, but that's not a big part of the runtime
except in very limited scenarios (--schema-only, perhaps).

The performance patches we applied to pg_dump over the past couple weeks
were meant to relieve pain in situations where the big server-side
lossage wasn't the dominant factor in runtime (ie, partial dumps).
But this one is targeting exactly that area, which is why it looks like
a band-aid and not a fix to me.

regards, tom lane



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Bruce Momjian
On Thu, May 31, 2012 at 10:50:51AM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I'm not; Jeff Janes is.  But you shouldn't be holding your breath
  anyway, since it's 9.3 material at this point.
 
  I agree we can't back-patch that change, but then I think we ought to
  consider back-patching some variant of Tatsuo's patch.  Maybe it's not
  reasonable to thunk an arbitrary number of relation names in there on
  one line, but how about 1000 relations per LOCK statement or so?  I
  guess we'd need to see how much that erodes the benefit, but we've
  certainly done back-branch rearrangements in pg_dump in the past to
  fix various kinds of issues, and this is pretty non-invasive.
 
 I am not convinced either that this patch will still be useful after
 Jeff's fix goes in, or that it provides any meaningful savings when
 you consider a complete pg_dump run.  Yeah, it will make the lock
 acquisition phase faster, but that's not a big part of the runtime
 except in very limited scenarios (--schema-only, perhaps).

FYI, that is the pg_upgrade use-case, and pg_dump/restore time is
reportedly taking the majority of time in many cases.

-- 
  Bruce Momjian  <br...@momjian.us>  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Claudio Freire
On Thu, May 31, 2012 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The performance patches we applied to pg_dump over the past couple weeks
 were meant to relieve pain in situations where the big server-side
 lossage wasn't the dominant factor in runtime (ie, partial dumps).
 But this one is targeting exactly that area, which is why it looks like
 a band-aid and not a fix to me.

No, Tatsuo's patch attacks a phase dominated by latency in some
setups. That it's also becoming slow currently because of the locking
cost is irrelevant; with locking sped up, the patch should only
improve that phase even further. Imagine the current timeline:

* = locking
. = waiting

*.*.**.**.***.***...*.

Tatsuo's patch converts it to:

*.**

The locking fix would turn the timeline into:

*.*.*.*.*.*.*

Tatsuo's patch would turn that into:

***

And, as noted before, pg_dump --schema-only is a key bottleneck in pg_upgrade.



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Robert Haas
On Thu, May 31, 2012 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm not; Jeff Janes is.  But you shouldn't be holding your breath
 anyway, since it's 9.3 material at this point.

 I agree we can't back-patch that change, but then I think we ought to
 consider back-patching some variant of Tatsuo's patch.  Maybe it's not
 reasonable to thunk an arbitrary number of relation names in there on
 one line, but how about 1000 relations per LOCK statement or so?  I
 guess we'd need to see how much that erodes the benefit, but we've
 certainly done back-branch rearrangements in pg_dump in the past to
 fix various kinds of issues, and this is pretty non-invasive.

 I am not convinced either that this patch will still be useful after
 Jeff's fix goes in, ...

But people on older branches are not going to GET Jeff's fix.

 or that it provides any meaningful savings when
 you consider a complete pg_dump run.  Yeah, it will make the lock
 acquisition phase faster, but that's not a big part of the runtime
 except in very limited scenarios (--schema-only, perhaps).

That is not a borderline scenario, as others have also pointed out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Bruce Momjian
On Thu, May 31, 2012 at 11:04:12AM -0400, Robert Haas wrote:
 On Thu, May 31, 2012 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Thu, May 31, 2012 at 10:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  I'm not; Jeff Janes is.  But you shouldn't be holding your breath
  anyway, since it's 9.3 material at this point.
 
  I agree we can't back-patch that change, but then I think we ought to
  consider back-patching some variant of Tatsuo's patch.  Maybe it's not
  reasonable to thunk an arbitrary number of relation names in there on
  one line, but how about 1000 relations per LOCK statement or so?  I
  guess we'd need to see how much that erodes the benefit, but we've
  certainly done back-branch rearrangements in pg_dump in the past to
  fix various kinds of issues, and this is pretty non-invasive.
 
  I am not convinced either that this patch will still be useful after
  Jeff's fix goes in, ...
 
 But people on older branches are not going to GET Jeff's fix.

FYI, if it got into Postgres 9.2, everyone upgrading to Postgres 9.2
would benefit because pg_upgrade uses the new cluster's pg_dumpall.

-- 
  Bruce Momjian  <br...@momjian.us>  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 On Thu, May 31, 2012 at 11:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The performance patches we applied to pg_dump over the past couple weeks
 were meant to relieve pain in situations where the big server-side
 lossage wasn't the dominant factor in runtime (ie, partial dumps).
 But this one is targeting exactly that area, which is why it looks like
 a band-aid and not a fix to me.

 No, Tatsuo's patch attacks a phase dominated by latency in some
 setups.

No, it does not.  The reason it's a win is that it avoids the O(N^2)
behavior in the server.  Whether the bandwidth savings is worth worrying
about cannot be proven one way or the other as long as that elephant
is in the room.

regards, tom lane



Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-05-31 Thread Claudio Freire
On Thu, May 31, 2012 at 12:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 No, Tatsuo's patch attacks a phase dominated by latency in some
 setups.

 No, it does not.  The reason it's a win is that it avoids the O(N^2)
 behavior in the server.  Whether the bandwidth savings is worth worrying
 about cannot be proven one way or the other as long as that elephant
 is in the room.

                        regards, tom lane

I understand that, but if the locking is fixed and made to be O(N)
(and hence each table locking O(1)), then latency suddenly becomes the
dominating factor.

I'm thinking, though, that pg_upgrade runs locally, unlike pg_dump
backups, so in that case latency would be negligible and Tatsuo's
patch inconsequential.

I'm also wondering whether the ResourceOwner patch you've proposed
would get negated by Tatsuo's patch, because suddenly a portal
(IIRC) would have a lot more locks than ResourceOwner could accommodate,
forcing a reversal to O(N^2) behavior. In that case, that patch would
in fact be detrimental... huh... way to go 180
