On Fri, Nov  9, 2012 at 12:50:34AM -0500, Tom Lane wrote:
> Jeff Janes <jeff.ja...@gmail.com> writes:
> > Are sure the server you are dumping out of is head?
>
> I experimented a bit with dumping/restoring 16000 tables matching
> Bruce's test case (ie, one serial column apiece).  The pg_dump profile
> seems fairly flat, without any easy optimization targets.  But
> restoring the dump script shows a rather interesting backend profile:
>
> samples  %        image name  symbol name
> 30861    39.6289  postgres    AtEOXact_RelationCache
>  9911    12.7268  postgres    hash_seq_search
>  2682     3.4440  postgres    init_sequence
>  2218     2.8482  postgres    _bt_compare
>  2120     2.7223  postgres    hash_search_with_hash_value
>  1976     2.5374  postgres    XLogInsert
>  1429     1.8350  postgres    CatalogCacheIdInvalidate
>  1282     1.6462  postgres    LWLockAcquire
>   973     1.2494  postgres    LWLockRelease
>   702     0.9014  postgres    hash_any
>
> The hash_seq_search time is probably mostly associated with
> AtEOXact_RelationCache, which is run during transaction commit and
> scans the relcache hashtable looking for tables created in the current
> transaction.  So that's about 50% of the runtime going into that one
> activity.
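To make the cost concrete, here is a toy model (Python, purely for illustration; none of this is PostgreSQL code, and the numbers only count entries visited, not real work): each per-statement commit scans the entire relcache hashtable, so restoring n tables in n separate transactions visits on the order of n^2/2 entries, while a single transaction visits the cache once.

```python
# Toy model of the commit-time relcache scan described above.
# Assumption: every CREATE TABLE adds one relcache entry, and each
# COMMIT scans every entry currently in the cache (as
# AtEOXact_RelationCache does via hash_seq_search).

def restore_tables(n_tables, single_transaction=False):
    """Return total relcache entries scanned across all commits."""
    relcache = set()   # stands in for the relcache hashtable
    scanned = 0
    for i in range(n_tables):
        relcache.add(i)                  # CREATE TABLE: new cache entry
        if not single_transaction:
            scanned += len(relcache)     # per-statement COMMIT: full scan
    if single_transaction:
        scanned += len(relcache)         # one full scan at the final COMMIT
    return scanned

print(restore_tables(16000))                           # 128008000 entries
print(restore_tables(16000, single_transaction=True))  # 16000 entries
```

So at 16000 tables the per-statement-commit restore touches four orders of magnitude more cache entries than a --single-transaction restore, which is consistent with this function dominating the profile.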
Thanks for finding this.  What is odd is that I am not seeing non-linear
restore times at 16k tables in git head, so I am confused about how
something that consumes ~50% of backend time could still perform
linearly.  Would this consume 50% at lower table counts too?  I agree we
should do something, even if this is a rare case, because 50% is a large
percentage.

> There are at least three ways we could whack that mole:
>
> * Run the psql script in --single-transaction mode, as I was mumbling
> about the other day.  If we were doing AtEOXact_RelationCache only
> once, rather than once per CREATE TABLE statement, it wouldn't be a
> problem.  Easy but has only a narrow scope of applicability.
>
> * Keep a separate list (or data structure of your choice) so that
> relcache entries created in the current xact could be found directly
> rather than having to scan the whole relcache.  That'd add complexity
> though, and could perhaps be a net loss for cases where the relcache
> isn't so bloated.

I like this one.  Could we do it only when the cache grows above a
certain size, to avoid any penalty in the common case?

--
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers