I'm trying to update several tables (all child tables of the same parent), and as the number of records increases, the length of time it takes to run the update is shooting up exponentially. I have imported the new data to an import table, and then join the import table to the update table. The update statement looks like:
UPDATE household_2000 h SET hhincome = new_hhincome FROM ( SELECT serial, new_hhincome FROM import WHERE year = 2000 ) r WHERE h.serial = r.serial household_2000 is a child table of a household table that, as you might guess, only contains records from the year 2000. I am putting a year = 2000 restriction on the import table and then linking on the unique identifier. For different child tables, this is how long the update takes to run (numbers are approximate): Records Cost (via EXPLAIN) Actual time 460,000 300,000 23 seconds 510,000 320,000 26 seconds 1.2 million 670,000 3:16 1.3 million 820,000 3:25 6.2 million 2.7 million ~2.5 hours So, the cost estimate given by EXPLAIN seems to be roughly proportional to the number of records in the dataset, but the actual time it takes to run seems to increase faster than the cost, even for the small and medium tables, and shoots through the roof for the large tables. Since I need to run this on additional child tables that are larger (the largest is 14 million records), I want to know what I can do to speed up the query. Here's the EXPLAIN for the query. Note that the query plan is the same for the small, medium, and large tables. Hash Join (cost=1268532.36..2379787.06 rows=5465837 width=1128) Hash Cond: (import_6_17_rev_hh.serial = h.serial) -> Bitmap Heap Scan on import_6_17_rev_hh (cost=126551.72..308495.69 rows=5465837 width=8) Recheck Cond: (year = 1990) -> Bitmap Index Scan on import_6_17_rev_hh_pkey (cost=0.00..125185.26 rows=5465837 width=0) Index Cond: (year = 1990) -> Hash (cost=295596.06..295596.06 rows=5527406 width=1124) -> Seq Scan on household_1990 h (cost=0.00..295596.06 rows=5527406 width=1124) Any help will be appreciated. Thanks, --Lee -- Lee Hachadoorian PhD Student, Geography Program in Earth & Environmental Sciences CUNY Graduate Center -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql