[PERFORM] How to efficiently duplicate a whole schema?
Hi, I'm running on Redhat 7.2 with postgresql 7.3.2 and I have two schema in the same database 'db' and 'db_dev'. Both contain a set of 20 tables for a total of less than 50 Mb of data each (on the order of 50k rows in total). Once in a while (often these days!), I need to synchronize the dev version from the production 'db'. Currently, I do this by setting constraints to deferred, deleting everything in db_dev, then issue a serie of insert ... select ... to copy data from each table in db to the equivalent table in db_dev. This approach used to run in less than 30 seconds in MySQL, but in PostgreSQL it currently takes around 30 minutes. The postmaster process is running at 100% cpu all the time. I enclosed all the delete statement in one transaction and all the insert statements in a second transaction. All the time is taken at the commit of both transaction. Is there a more straightforward way to synchronize a development database to a production one? Is there anyway to increase the performance of this delete/insert combination? I've got indexes and constraints on most tables, could that be the problem? At some point in the future, I will also need to make a copy of a whole schema ('db' into 'db_backup'), what would be an efficient way to do that? These are the parameters I've adjusted in the postgresql.conf: max_connections = 16 shared_buffers = 3000 max_fsm_relations = 2000 max_fsm_pages = 2 sort_mem = 2 vacuum_mem = 2 effective_cache_size = 15000 And this is the memory state of the machine: [EMAIL PROTECTED] free total used free sharedbuffers cached Mem: 20594722042224 17248 24768 1157121286572 -/+ buffers/cache: 6399401419532 Swap: 2096440 4909681605472 thanks, -- Sebastien Lemieux Bioinformatics, post-doc Elitra-canada ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to efficiently duplicate a whole schema?
On Wed, 6 Aug 2003, Tom Lane wrote: scott.marlowe [EMAIL PROTECTED] writes: On Wed, 6 Aug 2003, Tom Lane wrote: One obvious question is whether you have your foreign keys set up efficiently in the first place. As a rule, the referenced and referencing columns should have identical datatypes and both should be indexed. (PG will often let you create foreign key constraints that don't meet these rules ... but performance will suffer.) Is this one of those things that should spit out a NOTICE when it happens? I.e. when a table is created with a references and uses a different type than the parent, would it be a good idea to issue a NOTICE: parent and child fields are not of the same type I could see doing that for unequal data types, but I'm not sure if it's reasonable to do it for lack of index. Usually you won't have created the referencing column's index yet when you create the FK constraint, so any warning would just be noise. (The referenced column's index *is* checked for, since we require it to be unique.) Sure. I wasn't thinking of the index issue anyway, just the type mismatch. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] How to efficiently duplicate a whole schema?
Sebastien Lemieux [EMAIL PROTECTED] writes: On Wed, 6 Aug 2003, Tom Lane wrote: The idea here is to make sure that the planner's statistics reflect the full state of the table, not the empty state. Otherwise it may pick plans for the foreign key checks that are optimized for small tables. I added the 'analyze' but without any noticable gain in speed. I can't use 'truncate' since I need to 'set constraints all deferred'. What are you using, exactly? regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] How to efficiently duplicate a whole schema?
scott.marlowe [EMAIL PROTECTED] writes: On Wed, 6 Aug 2003, Tom Lane wrote: One obvious question is whether you have your foreign keys set up efficiently in the first place. As a rule, the referenced and referencing columns should have identical datatypes and both should be indexed. (PG will often let you create foreign key constraints that don't meet these rules ... but performance will suffer.) Is this one of those things that should spit out a NOTICE when it happens? I.e. when a table is created with a references and uses a different type than the parent, would it be a good idea to issue a NOTICE: parent and child fields are not of the same type I could see doing that for unequal data types, but I'm not sure if it's reasonable to do it for lack of index. Usually you won't have created the referencing column's index yet when you create the FK constraint, so any warning would just be noise. (The referenced column's index *is* checked for, since we require it to be unique.) regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] How to efficiently duplicate a whole schema?
On Wed, 6 Aug 2003, Tom Lane wrote: Sebastien Lemieux [EMAIL PROTECTED] writes: All the time is taken at the commit of both transaction. Sounds like the culprit is foreign-key checks. One obvious question is whether you have your foreign keys set up efficiently in the first place. As a rule, the referenced and referencing columns should have identical datatypes and both should be indexed. (PG will often let you create foreign key constraints that don't meet these rules ... but performance will suffer.) I've checked and all the foreign keys are setup between 'serial' (the primary key of the referenced table) and 'integer not null' (the foreign key field). Would that be same type? A couple of my foreign keys are not indexed, I'll fix that. The latter seems to do the job, since I can now synchronize in about 75 seconds (compared to 30 minutes), which seems good enough. Also, what procedure are you using to delete all the old data? What I'd recommend is ANALYZE table; TRUNCATE table; INSERT new data; The idea here is to make sure that the planner's statistics reflect the full state of the table, not the empty state. Otherwise it may pick plans for the foreign key checks that are optimized for small tables. I added the 'analyze' but without any noticable gain in speed. I can't use 'truncate' since I need to 'set constraints all deferred'. I guess the bottom line is that I really need to first drop all constraints and indexes, synchronize and then rebuild indexes and check constraints. But for that I'll need to reorganize my code a little bit! In the meantime, how bad a decision would it be to simply remove all foreign key constraints? Because, currently I think they are causing more problems than they are avoiding... thanks, -- Sebastien Lemieux Bioinformatics, post-doc Elitra-canada ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] How to efficiently duplicate a whole schema?
The idea here is to make sure that the planner's statistics reflect the full state of the table, not the empty state. Otherwise it may pick plans for the foreign key checks that are optimized for small tables. I added the 'analyze' but without any noticable gain in speed. I can't use 'truncate' since I need to 'set constraints all deferred'. What are you using, exactly? What I want to do: let t be the list of tables for t in tables: delete from db_dev.t; for t in tables: insert into db_dev.t (...) select ... from db.t; Some of my foreign keys are creating references loops in my schema, thus there is no correct order to do the deletes and inserts so that the constraints are satisfied at all time. I have to enclose those two loops in a 'set constraints all deferred' to avoid complaints from the constraints. I tried dropping the indexes first, doing the transfer and recreating the indexes: no gain. So computing the indexes doesn't take significant time. I then tried removing all the foreign keys constraints, replacing delete by truncate and it now runs in about 25 seconds. Downside is that I lose the foreign keys integrity verification, but because of this reference loop in my schema it has caused me more problem than it has avoided until now. So I can live with that! Thanks all! -- Sebastien Lemieux Bioinformatics, post-doc Elitra-canada ---(end of broadcast)--- TIP 8: explain analyze is your friend