All,

It's perhaps not the ideal time for a discussion, but if I thought it would turn into a long one then I'd probably not post this, given the current timing in the release cycle. It's something I thought of while doing a restore of a 40ish GB database which has a few hundred smallish tables of various sizes up to about 1.5 million records, plus a handful of larger tables containing 20-70 million records.

During the restore (which was running 4 separate jobs), I was polling SELECT query FROM pg_stat_activity to find out how far along the restore was. I noticed that there were now fewer than 4 jobs running, and that pg_restore was busy doing COPY into some of the 20-70 million record tables.
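For anyone wanting to watch the same thing, the query I was polling was along these lines (the LIKE filter to narrow it down to the COPY commands is just an illustration, not exactly what I typed):

    -- show which COPY commands the restore workers are currently running
    SELECT query
    FROM pg_stat_activity
    WHERE query LIKE 'COPY %';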
If pg_dump were still to follow the dependencies of objects, would there be any reason why it shouldn't back up the larger tables first? That should allow pg_restore to spread the smaller tables across the separate jobs at the end of the restore, instead of having CPUs sit idle while, say, 1 job is busy on a big table. Of course this would not improve things for all workloads, but I hardly think a database with a high number of smallish tables and a small number of large tables is unusual.

If there was consensus that this might be a good idea, then I'd be willing to have a go at crafting up a patch to test whether it's worth it.

Some of the things I thought about but did not have an answer for:

1. Would it be enough just to check the number of blocks in each relation, or would it be better to look at the statistics and estimate the size of the table as it will be once restored, i.e. minus the dead tuples?

2. Would it be a good idea to add an extra pg_dump option for this, or should it just be the default for all dumps that contain table data?

Any thoughts on this are welcome.

Regards

David Rowley

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers