Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Jeff Janes wrote:
> On Thu, Nov 8, 2012 at 1:04 AM, Denis <socsam@> wrote:
>> Still I can't understand why pg_dump has to know about all the tables?
>
> Strictly speaking it probably doesn't need to. But it is primarily
> designed for dumping entire databases, and the efficient way to do that
> is to read it all into memory in a few queries and then sort out the
> dependencies, rather than tracking down every dependency individually
> with one or more trips back to the database. (Although it still does
> make plenty of trips back to the database per table/sequence, for acls,
> defaults, attributes.)
>
> If you were to rewrite pg_dump from the ground up to achieve your
> specific needs (dumping one schema, with no dependencies to other
> schemata) you could probably make it much more efficient. But then it
> wouldn't be pg_dump, it would be something else.
>
> Cheers,
>
> Jeff

Please don't think that I'm trying to nitpick here, but pg_dump has options for dumping separate tables, and that's not really consistent with the idea that pg_dump is primarily designed for dumping entire databases.
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On 11/13/2012 10:12 PM, Denis wrote:
> Please don't think that I'm trying to nitpick here, but pg_dump has
> options for dumping separate tables, and that's not really consistent
> with the idea that pg_dump is primarily designed for dumping entire
> databases.

Sure it is. The word "primarily" is not just a noise word here. The fact that we have options to do other things doesn't mean that its primary design goal has changed.

cheers

andrew
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, Nov 8, 2012 at 1:04 AM, Denis <soc...@gmail.com> wrote:
> Still I can't understand why pg_dump has to know about all the tables?

Strictly speaking it probably doesn't need to. But it is primarily designed for dumping entire databases, and the efficient way to do that is to read it all into memory in a few queries and then sort out the dependencies, rather than tracking down every dependency individually with one or more trips back to the database. (Although it still does make plenty of trips back to the database per table/sequence, for acls, defaults, attributes.)

If you were to rewrite pg_dump from the ground up to achieve your specific needs (dumping one schema, with no dependencies to other schemata) you could probably make it much more efficient. But then it wouldn't be pg_dump, it would be something else.

Cheers,

Jeff
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Tom Lane-2 wrote:
> Denis <socsam@> writes:
>> Tom Lane-2 wrote:
>>> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
>>> that you've got 183924 tables. That's going to take some time no
>>> matter what.
>
>> I wonder why pg_dump has to deal with all these 183924 tables, if I
>> specified to dump only one schema: pg_dump -n schema_name or even like
>> this to dump just one table pg_dump -t 'schema_name.comments' ?
>
> It has to know about all the tables even if it's not going to dump them
> all, for purposes such as dependency analysis.
>
>> We have a web application where we create a schema with a number of
>> tables in it for each customer. This architecture was chosen to ease
>> the process of backup/restoring data.
>
> I find that argument fairly dubious, but in any case you should not
> imagine that hundreds of thousands of tables are going to be cost-free.
>
> regards, tom lane

Still I can't understand why pg_dump has to know about all the tables. For example, I have this simple table:

    CREATE TABLE CLog (
        fromUser integer,
        toUser integer,
        message character varying(2048) NOT NULL,
        dateSend timestamp without time zone NOT NULL
    );

It has no foreign keys, it doesn't use partitioning, and it has no relations to any other table. Why does pg_dump have to gather information about ALL the tables in the database just to dump this one table?
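For what it's worth, the bookkeeping Tom refers to lives in pg_catalog.pg_depend, and you can see which dependency rows a single table participates in with a query along these lines — a minimal sketch, where 'schema_name.clog' is just a placeholder for whatever table you're interested in:

    -- sketch: dependency rows in which one table appears,
    -- either as the dependent object or as the referenced one
    SELECT classid::regclass   AS catalog,
           objid,
           refclassid::regclass AS referenced_catalog,
           refobjid,
           deptype
      FROM pg_catalog.pg_depend
     WHERE objid = 'schema_name.clog'::regclass
        OR refobjid = 'schema_name.clog'::regclass;

Even a standalone table like CLog normally has at least a row tying it to its schema; pg_dump resolves all of this in bulk up front rather than chasing it per object.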
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Tom Lane-2 wrote:
> Denis <socsam@> writes:
>> Here is the output of EXPLAIN ANALYZE. It took 5 seconds, but usually
>> it takes from 10 to 15 seconds when I am doing a backup.
>
>> Sort (cost=853562.04..854020.73 rows=183478 width=219) (actual
>> time=5340.477..5405.604 rows=183924 loops=1)
>
> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
> that you've got 183924 tables. That's going to take some time no matter
> what.
>
> It does seem like we could make some small changes to optimize that
> query a little bit, but they're not going to result in any amazing
> improvement overall, because pg_dump still has to deal with all the
> tables it's getting back. Fundamentally, I would ask whether you really
> need so many tables. It seems pretty likely that you have lots and lots
> of basically-identical tables. Usually it would be better to redesign
> such a structure into fewer tables with more index columns.
>
>> Here is the output of pg_dump -s test.dump
>> <http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump>
>
> This dump contains only 1 schema and 43 tables, so I don't think it's
> for the database you're having trouble with ...
>
> regards, tom lane

I wonder why pg_dump has to deal with all these 183924 tables, if I specified to dump only one schema: pg_dump -n schema_name or even like this to dump just one table pg_dump -t 'schema_name.comments' ?

We have a web application where we create a schema with a number of tables in it for each customer. This architecture was chosen to ease the process of backup/restoring data. Sometimes clients ask us to restore data for the last month or roll back to last week's state. This task is easy to accomplish when the client's data is isolated in a schema/DB. If we put all the clients' data in one table, operations of this kind will be much harder to perform: we would have to restore a huge DB with enormously large tables in it to find the requested data. Different clients have different activity rates, and we can select different backup strategies accordingly. This would be impossible if we kept all the clients' data in one table. Besides all of the above, the probability of massive data corruption (if an error in our web application occurs) is much higher.
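For reference, the consolidation Tom suggests would look roughly like the following — a sketch only, with table and column names invented for illustration: one shared table keyed by customer instead of 2600 near-identical per-customer copies.

    -- one shared table with a tenant key, rather than one table per customer
    CREATE TABLE comments (
        customer_id integer   NOT NULL,
        comment_id  bigserial NOT NULL,
        body        text      NOT NULL,
        created_at  timestamp without time zone NOT NULL,
        PRIMARY KEY (customer_id, comment_id)
    );

    -- per-customer access stays cheap via the leading index column
    SELECT * FROM comments WHERE customer_id = 42;

The trade-off is exactly the one Denis describes above: selective restore becomes row filtering rather than restoring a single schema.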
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Denis <soc...@gmail.com> writes:
> Tom Lane-2 wrote:
>> Hmmm ... so the problem here isn't that you've got 2600 schemas, it's
>> that you've got 183924 tables. That's going to take some time no
>> matter what.
>
> I wonder why pg_dump has to deal with all these 183924 tables, if I
> specified to dump only one schema: pg_dump -n schema_name or even like
> this to dump just one table pg_dump -t 'schema_name.comments' ?

It has to know about all the tables even if it's not going to dump them all, for purposes such as dependency analysis.

> We have a web application where we create a schema with a number of
> tables in it for each customer. This architecture was chosen to ease
> the process of backup/restoring data.

I find that argument fairly dubious, but in any case you should not imagine that hundreds of thousands of tables are going to be cost-free.

regards, tom lane
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
I've read all the posts in this thread, and as I understood it, in version 9.2 some patches were applied to improve pg_dump speed. I've just installed PostgreSQL 9.2.1 and I still have the same problem. I have a database with 2600 schemas in it. I try to dump each schema individually, but it takes too much time for every schema (about 30-40 seconds per schema, no matter what the data size is). Also for each schema dump I have a slow query log entry. Here is an example:

2012-11-06 13:15:32 GMT LOG:  duration: 12029.334 ms  statement:

    SELECT c.tableoid, c.oid, c.relname, c.relacl, c.relkind, c.relnamespace,
           (SELECT rolname FROM pg_catalog.pg_roles WHERE oid = c.relowner) AS rolname,
           c.relchecks, c.relhastriggers, c.relhasindex, c.relhasrules, c.relhasoids,
           c.relfrozenxid, tc.oid AS toid, tc.relfrozenxid AS tfrozenxid,
           c.relpersistence,
           CASE WHEN c.reloftype <> 0 THEN c.reloftype::pg_catalog.regtype
                ELSE NULL END AS reloftype,
           d.refobjid AS owning_tab, d.refobjsubid AS owning_col,
           (SELECT spcname FROM pg_tablespace t WHERE t.oid = c.reltablespace) AS reltablespace,
           array_to_string(c.reloptions, ', ') AS reloptions,
           array_to_string(array(SELECT 'toast.' || x FROM unnest(tc.reloptions) x), ', ') AS toast_reloptions
    FROM pg_class c
    LEFT JOIN pg_depend d ON (c.relkind = 'S' AND d.classid = c.tableoid AND
         d.objid = c.oid AND d.objsubid = 0 AND
         d.refclassid = c.tableoid AND d.deptype = 'a')
    LEFT JOIN pg_class tc ON (c.reltoastrelid = tc.oid)
    WHERE c.relkind in ('r', 'S', 'v', 'c', 'f')
    ORDER BY c.oid
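For anyone wanting to reproduce a catalog of roughly this shape without access to Denis's database, something like the following works — a rough sketch only, with the schema and table counts picked to land near the 183,924 tables reported in the thread:

    -- creates 2600 schemas with ~70 trivial tables each (slow to run!)
    DO $$
    BEGIN
      FOR s IN 1..2600 LOOP
        EXECUTE format('CREATE SCHEMA s%s', s);
        FOR t IN 1..70 LOOP
          EXECUTE format('CREATE TABLE s%s.t%s (id integer)', s, t);
        END LOOP;
      END LOOP;
    END $$;

After that, timing pg_dump -n s1 against the result should show the per-schema overhead being discussed.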
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Denis <soc...@gmail.com> writes:
> I've read all the posts in this thread, and as I understood it, in
> version 9.2 some patches were applied to improve pg_dump speed. I've
> just installed PostgreSQL 9.2.1 and I still have the same problem. I
> have a database with 2600 schemas in it. I try to dump each schema
> individually, but it takes too much time for every schema (about 30-40
> seconds per schema, no matter what the data size is).

Could you provide a test case for that? Maybe the output of pg_dump -s, anonymized as you see fit?

> Also for each schema dump I have a slow query log entry,

Could you provide EXPLAIN ANALYZE output for that query?

regards, tom lane
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Tom Lane-2 wrote:
> Denis <socsam@> writes:
>> I've read all the posts in this thread, and as I understood it, in
>> version 9.2 some patches were applied to improve pg_dump speed. I've
>> just installed PostgreSQL 9.2.1 and I still have the same problem. I
>> have a database with 2600 schemas in it. I try to dump each schema
>> individually, but it takes too much time for every schema (about 30-40
>> seconds per schema, no matter what the data size is).
>
> Could you provide a test case for that? Maybe the output of pg_dump -s,
> anonymized as you see fit?
>
>> Also for each schema dump I have a slow query log entry,
>
> Could you provide EXPLAIN ANALYZE output for that query?
>
> regards, tom lane

Here is the output of EXPLAIN ANALYZE. It took 5 seconds, but usually it takes from 10 to 15 seconds when I am doing a backup.

    Sort  (cost=853562.04..854020.73 rows=183478 width=219) (actual time=5340.477..5405.604 rows=183924 loops=1)
      Sort Key: c.oid
      Sort Method: external merge  Disk: 33048kB
      ->  Hash Left Join  (cost=59259.80..798636.25 rows=183478 width=219) (actual time=839.297..4971.299 rows=183924 loops=1)
            Hash Cond: (c.reltoastrelid = tc.oid)
            ->  Hash Right Join  (cost=29530.77..146976.65 rows=183478 width=183) (actual time=404.959..3261.462 rows=183924 loops=1)
                  Hash Cond: ((d.classid = c.tableoid) AND (d.objid = c.oid) AND (d.refclassid = c.tableoid))
                  Join Filter: (c.relkind = 'S'::char)
                  ->  Seq Scan on pg_depend d  (cost=0.00..71403.54 rows=995806 width=20) (actual time=1.137..878.571 rows=998642 loops=1)
                        Filter: ((objsubid = 0) AND (deptype = 'a'::char))
                        Rows Removed by Filter: 2196665
                  ->  Hash  (cost=21839.91..21839.91 rows=183478 width=175) (actual time=402.947..402.947 rows=183924 loops=1)
                        Buckets: 1024  Batches: 32  Memory Usage: 876kB
                        ->  Seq Scan on pg_class c  (cost=0.00..21839.91 rows=183478 width=175) (actual time=0.017..267.614 rows=183924 loops=1)
                              Filter: (relkind = ANY ('{r,S,v,c,f}'::char[]))
                              Rows Removed by Filter: 383565
            ->  Hash  (cost=18333.79..18333.79 rows=560979 width=40) (actual time=434.258..434.258 rows=567489 loops=1)
                  Buckets: 4096  Batches: 32  Memory Usage: 703kB
                  ->  Seq Scan on pg_class tc  (cost=0.00..18333.79 rows=560979 width=40) (actual time=0.003..273.418 rows=567489 loops=1)
            SubPlan 1
              ->  Seq Scan on pg_authid  (cost=0.00..1.01 rows=1 width=68) (actual time=0.001..0.001 rows=1 loops=183924)
                    Filter: (oid = c.relowner)
                    Rows Removed by Filter: 2
            SubPlan 2
              ->  Seq Scan on pg_tablespace t  (cost=0.00..1.02 rows=1 width=64) (actual time=0.001..0.001 rows=0 loops=183924)
                    Filter: (oid = c.reltablespace)
                    Rows Removed by Filter: 2
            SubPlan 3
              ->  Function Scan on unnest x  (cost=0.00..1.25 rows=100 width=32) (actual time=0.001..0.001 rows=0 loops=183924)
    Total runtime: 5428.498 ms

Here is the output of pg_dump -s test.dump
<http://postgresql.1045698.n5.nabble.com/file/n5730877/test.dump>
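One detail visible in that plan, though not taken up in the thread: the sort of 183,924 rows spills to disk (external merge, 33MB), and both hashes run in 32 batches, which suggests a small work_mem. Raising it for the role that runs the dumps might shave some of the constant factor — a speculative aside, with a made-up role name:

    -- hypothetical: let the backup role sort and hash in memory
    ALTER ROLE backup_user SET work_mem = '64MB';

It wouldn't change the fundamental cost of pg_dump processing every table, of course.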
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> Bruce Momjian <br...@momjian.us> writes:
>>>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>>>> Ok, I modified the part of pg_dump where a tremendous number of
>>>>> LOCK TABLE statements are issued. I replaced them with a single
>>>>> LOCK TABLE with multiple tables. With 100k tables the LOCK
>>>>> statements took 13 minutes in total; now it only takes 3 seconds.
>>>>> Comments?
>>>> Was this applied?
>>> No, we fixed the server side instead.
>> But only for 9.2, right? So people running back branches are still
>> screwed.
> Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
> behaviors involved here, not all of which are masked by what Tatsuo-san
> suggested.

All of the other ones that I know of were associated with pg_dump itself, and since it is recommended to run the newer version of pg_dump against the older version of the server, no back-patching would be necessary to get the benefits of those particular fixes.

> Six months or a year from now, we might have enough confidence in that
> batch of 9.2 fixes to back-port them en masse. Don't want to do it
> today though.

What would be the recommendation for people trying to upgrade, but who can't get their data out in a reasonable window? Putting Tatsuo-san's change into a future pg_dump might be more conservative than back-porting the server's lock-table change to the server version they are trying to get rid of.

Cheers,

Jeff
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Sun, Sep 2, 2012 at 5:39 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Thu, Aug 30, 2012 at 8:17 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> Bruce Momjian <br...@momjian.us> writes:
>>>>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>>>>> Ok, I modified the part of pg_dump where a tremendous number of
>>>>>> LOCK TABLE statements are issued. I replaced them with a single
>>>>>> LOCK TABLE with multiple tables. With 100k tables the LOCK
>>>>>> statements took 13 minutes in total; now it only takes 3 seconds.
>>>>>> Comments?
>>>>> Was this applied?
>>>> No, we fixed the server side instead.
>>> But only for 9.2, right? So people running back branches are still
>>> screwed.
>> Yeah, but they're screwed anyway, because there are a bunch of O(N^2)
>> behaviors involved here, not all of which are masked by what
>> Tatsuo-san suggested.
>
> All of the other ones that I know of were associated with pg_dump
> itself, and since it is recommended to run the newer version of pg_dump
> against the older version of the server, no back-patching would be
> necessary to get the benefits of those particular fixes.
>
>> Six months or a year from now, we might have enough confidence in that
>> batch of 9.2 fixes to back-port them en masse. Don't want to do it
>> today though.
>
> What would be the recommendation for people trying to upgrade, but who
> can't get their data out in a reasonable window? Putting Tatsuo-san's
> change into a future pg_dump might be more conservative than
> back-porting the server's lock-table change to the server version they
> are trying to get rid of.

What he said.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, Aug 30, 2012 at 4:51 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Bruce Momjian <br...@momjian.us> writes:
>> On Thu, May 31, 2012 at 09:20:43AM +0900, Tatsuo Ishii wrote:
>>> Ok, I modified the part of pg_dump where a tremendous number of LOCK
>>> TABLE statements are issued. I replaced them with a single LOCK TABLE
>>> with multiple tables. With 100k tables the LOCK statements took 13
>>> minutes in total; now it only takes 3 seconds. Comments? Shall I
>>> commit to master and all supported branches?
>
>> Was this applied?
>
> No, we fixed the server side instead.

But only for 9.2, right? So people running back branches are still screwed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
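For context, the change Tatsuo describes amounts to collapsing pg_dump's per-table lock statements into multi-table ones. Roughly, with invented table names for illustration (pg_dump takes ACCESS SHARE locks):

    -- before: one statement, and one server round trip, per table
    LOCK TABLE s1.t1 IN ACCESS SHARE MODE;
    LOCK TABLE s1.t2 IN ACCESS SHARE MODE;
    LOCK TABLE s1.t3 IN ACCESS SHARE MODE;

    -- after: one statement covering many tables
    LOCK TABLE s1.t1, s1.t2, s1.t3 IN ACCESS SHARE MODE;

LOCK TABLE has long accepted a list of tables, so the statement itself is nothing exotic; with 100k tables the win comes from issuing far fewer statements.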
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Claudio Freire <klaussfre...@gmail.com> writes:
> It's not clear whether Tom is already working on that O(N^2) fix in
> locking.

I'm not; Jeff Janes is. But you shouldn't be holding your breath anyway, since it's 9.3 material at this point.

regards, tom lane
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Claudio Freire <klaussfre...@gmail.com> writes:
>> It's not clear whether Tom is already working on that O(N^2) fix in
>> locking.
>
> I'm not; Jeff Janes is. But you shouldn't be holding your breath
> anyway, since it's 9.3 material at this point.

I agree we can't back-patch that change, but then I think we ought to consider back-patching some variant of Tatsuo's patch. Maybe it's not reasonable to thunk an arbitrary number of relation names in there on one line, but how about 1000 relations per LOCK statement or so? I guess we'd need to see how much that erodes the benefit, but we've certainly done back-branch rearrangements in pg_dump in the past to fix various kinds of issues, and this is pretty non-invasive.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Robert Haas <robertmh...@gmail.com> writes:
> On Thu, May 31, 2012 at 10:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I'm not; Jeff Janes is. But you shouldn't be holding your breath
>> anyway, since it's 9.3 material at this point.
>
> I agree we can't back-patch that change, but then I think we ought to
> consider back-patching some variant of Tatsuo's patch. Maybe it's not
> reasonable to thunk an arbitrary number of relation names in there on
> one line, but how about 1000 relations per LOCK statement or so? I
> guess we'd need to see how much that erodes the benefit, but we've
> certainly done back-branch rearrangements in pg_dump in the past to fix
> various kinds of issues, and this is pretty non-invasive.

I am not convinced either that this patch will still be useful after Jeff's fix goes in, or that it provides any meaningful savings when you consider a complete pg_dump run. Yeah, it will make the lock acquisition phase faster, but that's not a big part of the runtime except in very limited scenarios (--schema-only, perhaps).

The performance patches we applied to pg_dump over the past couple weeks were meant to relieve pain in situations where the big server-side lossage wasn't the dominant factor in runtime (ie, partial dumps). But this one is targeting exactly that area, which is why it looks like a band-aid and not a fix to me.

regards, tom lane
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:50:51AM -0400, Tom Lane wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, May 31, 2012 at 10:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I'm not; Jeff Janes is. But you shouldn't be holding your breath
>>> anyway, since it's 9.3 material at this point.
>
>> I agree we can't back-patch that change, but then I think we ought to
>> consider back-patching some variant of Tatsuo's patch. Maybe it's not
>> reasonable to thunk an arbitrary number of relation names in there on
>> one line, but how about 1000 relations per LOCK statement or so? I
>> guess we'd need to see how much that erodes the benefit, but we've
>> certainly done back-branch rearrangements in pg_dump in the past to
>> fix various kinds of issues, and this is pretty non-invasive.
>
> I am not convinced either that this patch will still be useful after
> Jeff's fix goes in, or that it provides any meaningful savings when you
> consider a complete pg_dump run. Yeah, it will make the lock
> acquisition phase faster, but that's not a big part of the runtime
> except in very limited scenarios (--schema-only, perhaps).

FYI, that is the pg_upgrade use-case, and pg_dump/restore time is reportedly taking the majority of time in many cases.

--
Bruce Momjian  <br...@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 11:50 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> The performance patches we applied to pg_dump over the past couple
> weeks were meant to relieve pain in situations where the big
> server-side lossage wasn't the dominant factor in runtime (ie, partial
> dumps). But this one is targeting exactly that area, which is why it
> looks like a band-aid and not a fix to me.

No, Tatsuo's patch attacks a phase dominated by latency in some setups. That it's also currently slow because of the locking cost is irrelevant; with locking sped up, the patch should only improve that phase even further.

Imagine the current timeline (* = locking, . = waiting):

    *.*.**.**.***.***...*.

Tatsuo's patch converts it to:

    *.**

The locking fix would turn the timeline into:

    *.*.*.*.*.*.*

Tatsuo's patch would turn that into:

    ***

And, as noted before, pg_dump --schema-only is a key bottleneck in pg_upgrade.
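To put a rough number on the latency claim (back-of-envelope arithmetic, not figures from the thread): with the 183,924 tables mentioned earlier, issuing one LOCK statement per table at even 0.1 ms of round-trip time costs 183,924 x 0.1 ms, or roughly 18 seconds of pure waiting. Batched into 1000-table statements, that drops to about 184 round trips, a few hundredths of a second, regardless of how fast the server-side locking itself becomes.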
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 10:50 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Thu, May 31, 2012 at 10:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> I'm not; Jeff Janes is. But you shouldn't be holding your breath
>>> anyway, since it's 9.3 material at this point.
>
>> I agree we can't back-patch that change, but then I think we ought to
>> consider back-patching some variant of Tatsuo's patch. Maybe it's not
>> reasonable to thunk an arbitrary number of relation names in there on
>> one line, but how about 1000 relations per LOCK statement or so? I
>> guess we'd need to see how much that erodes the benefit, but we've
>> certainly done back-branch rearrangements in pg_dump in the past to
>> fix various kinds of issues, and this is pretty non-invasive.
>
> I am not convinced either that this patch will still be useful after
> Jeff's fix goes in, ...

But people on older branches are not going to GET Jeff's fix.

> ... or that it provides any meaningful savings when you consider a
> complete pg_dump run. Yeah, it will make the lock acquisition phase
> faster, but that's not a big part of the runtime except in very limited
> scenarios (--schema-only, perhaps).

That is not a borderline scenario, as others have also pointed out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 11:04:12AM -0400, Robert Haas wrote:
> On Thu, May 31, 2012 at 10:50 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmh...@gmail.com> writes:
>>> On Thu, May 31, 2012 at 10:31 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> I'm not; Jeff Janes is. But you shouldn't be holding your breath
>>>> anyway, since it's 9.3 material at this point.
>>> I agree we can't back-patch that change, but then I think we ought
>>> to consider back-patching some variant of Tatsuo's patch. Maybe it's
>>> not reasonable to thunk an arbitrary number of relation names in
>>> there on one line, but how about 1000 relations per LOCK statement
>>> or so? I guess we'd need to see how much that erodes the benefit,
>>> but we've certainly done back-branch rearrangements in pg_dump in
>>> the past to fix various kinds of issues, and this is pretty
>>> non-invasive.
>> I am not convinced either that this patch will still be useful after
>> Jeff's fix goes in, ...
>
> But people on older branches are not going to GET Jeff's fix.

FYI, if it got into Postgres 9.2, everyone upgrading to Postgres 9.2 would benefit, because pg_upgrade uses the new cluster's pg_dumpall.

--
Bruce Momjian  <br...@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
Claudio Freire <klaussfre...@gmail.com> writes:
> On Thu, May 31, 2012 at 11:50 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> The performance patches we applied to pg_dump over the past couple
>> weeks were meant to relieve pain in situations where the big
>> server-side lossage wasn't the dominant factor in runtime (ie, partial
>> dumps). But this one is targeting exactly that area, which is why it
>> looks like a band-aid and not a fix to me.
>
> No, Tatsuo's patch attacks a phase dominated by latency in some setups.

No, it does not. The reason it's a win is that it avoids the O(N^2) behavior in the server. Whether the bandwidth savings is worth worrying about cannot be proven one way or the other as long as that elephant is in the room.

regards, tom lane
Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas
On Thu, May 31, 2012 at 12:25 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> No, Tatsuo's patch attacks a phase dominated by latency in some
>> setups.
>
> No, it does not. The reason it's a win is that it avoids the O(N^2)
> behavior in the server. Whether the bandwidth savings is worth worrying
> about cannot be proven one way or the other as long as that elephant is
> in the room.
>
> regards, tom lane

I understand that, but if the locking is fixed and made to be O(N) (and hence each table lock O(1)), then latency suddenly becomes the dominating factor.

I'm thinking, though, that pg_upgrade runs locally, contrary to pg_dump backups, so in that case latency would be negligible and Tatsuo's patch inconsequential.

I'm also wondering whether the ResourceOwner patch you've proposed would be negated by Tatsuo's patch, because suddenly a portal (IIRC) would have a lot more locks than ResourceOwner could accommodate, forcing a reversal to O(N²) behavior. In that case, that patch would in fact be detrimental... huh... way to go 180.