Hi again! I have finally got my Ubuntu VirtualBox VM running PostgreSQL with PL/Python and am now looking at performance.
So here's the scenario: We have a great big table:

cse=# \d nlpg.match_data
                                 Table "nlpg.match_data"
      Column       |   Type   |                        Modifiers
-------------------+----------+--------------------------------------------------------------------------
 premise_id        | integer  |
 usrn              | bigint   |
 org               | text     |
 sao               | text     |
 level             | text     |
 pao               | text     |
 name              | text     |
 street            | text     |
 town              | text     |
 postcode          | text     |
 match_data_id     | integer  | not null default nextval('nlpg.match_data_match_data_id_seq1'::regclass)
 addr_str          | text     |
 tssearch_name     | tsvector |
 tssearch_street   | tsvector |
 tssearch_addr_str | tsvector |
Indexes:
    "match_data_pkey1" PRIMARY KEY, btree (match_data_id)
    "index_match_data_mid" btree (match_data_id)
    "index_match_data_pid" btree (premise_id)
    "index_match_data_tssearch_addr_str" gin (tssearch_addr_str)
    "index_match_data_tssearch_name" gin (tssearch_name)
    "index_match_data_tssearch_street" gin (tssearch_street)
    "index_match_data_usrn" btree (usrn)

KEY NOTE: nlpg.match_data has approximately 27,000,000 rows.

Running this query:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE match_data_id < 1000000;

I get this:

"Index Scan using match_data_pkey1 on match_data  (cost=0.00..1452207.14 rows=1913756 width=302) (actual time=23.448..61559.652 rows=999999 loops=1)"
"  Index Cond: (match_data_id < 1000000)"
"Total runtime: 403855.675 ms"

I copied a chunk of the table like this:

CREATE TABLE nlpg.md_copy AS SELECT * FROM nlpg.match_data WHERE match_data_id < 1000000;

Then ran the same query on the smaller copy table:

EXPLAIN ANALYZE UPDATE nlpg.md_copy SET org = org WHERE match_data_id < 1000000;

And get this:

"Seq Scan on md_copy  (cost=0.00..96935.99 rows=999899 width=301) (actual time=26.745..33944.923 rows=999999 loops=1)"
"  Filter: (match_data_id < 1000000)"
"Total runtime: 57169.419 ms"

As you can see, this is much faster per row with the smaller table chunk.
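To put a number on "faster per row", the ratio of the two total runtimes can be worked out directly from the figures above (same 999999 rows updated in both cases):

```sql
-- Per-row runtimes from the two plans above: the full-table UPDATE is
-- roughly 0.40 ms/row, the copy roughly 0.057 ms/row -- about 7x slower.
SELECT 403855.675 / 999999 AS ms_per_row_match_data,
       57169.419  / 999999 AS ms_per_row_md_copy;
```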
I then tried running the same first query with ten times the number of rows:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org WHERE match_data_id < 10000000;

This takes a massive amount of time (it's still running), and the run time is clearly growing non-linearly compared with the previous query.

EXPLAIN UPDATE nlpg.match_data SET org = org WHERE match_data_id < 10000000;

"Seq Scan on match_data  (cost=0.00..3980053.11 rows=19172782 width=302)"
"  Filter: (match_data_id < 10000000)"

Any suggestions on what I can do to speed things up? I presume that if I turn off sequential scans it might default to an index scan. Is there anything else?

Cheers,
Tom
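For the record, the seq-scan experiment I have in mind is something like the following (a sketch only; SET here is session-local, so it shouldn't affect other connections):

```sql
-- Discourage sequential scans for this session, then re-check the plan
-- to see whether the planner falls back to match_data_pkey1.
SET enable_seqscan = off;
EXPLAIN UPDATE nlpg.match_data SET org = org WHERE match_data_id < 10000000;
RESET enable_seqscan;
```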