[PERFORM] Problem with query, forget previous message
Seems the problem was with the custom aggregate function not being able to handle thousands of rows.

---(end of broadcast)---
TIP 6: explain analyze is your friend
[PERFORM] Problem with query, server totally unresponsive
Hello, I have a big problem with one of my databases. When I run my query, after a few minutes the postmaster shows 99% mem in top, and the server becomes totally unresponsive. I get this message when I try to cancel the query:

    server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.

This works fine on a different machine with the same database settings and about 30% fewer records. The other machine is running PostgreSQL 8.0.3; the troubled one is running 8.1.2. Any help is greatly appreciated! Thanks.

The machine has 2x Intel dual-core processors (3 GHz) and 2 GB of RAM.

#---
# RESOURCE USAGE (except WAL)
#---
# - Memory -
shared_buffers = 8192            # min 16 or max_connections*2, 8KB each
#temp_buffers = 1000             # min 100, 8KB each
#max_prepared_transactions = 5   # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 4096                  # min 64, size in KB
maintenance_work_mem = 262144    # min 1024, size in KB
#max_stack_depth = 2048          # min 100, size in KB

My query:

SELECT r.id AS id,
       max(r.name) AS name,
       max(companyid) AS companyid,
       max(extract(epoch from r.updated)) AS r_updated,
       hydra.join(co.value) AS contacts,
       hydra.join(ad.postalsite) AS postalsites
FROM records r
LEFT OUTER JOIN contacts co ON (r.id = co.record AND co.type IN (1,11,101,3))
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
WHERE r.original IS NULL
GROUP BY r.id;

The hydra.join function:

-- Aggregates a column to an array
DROP FUNCTION hydra.join_aggregate(text, text) CASCADE;
DROP FUNCTION hydra.join_aggregate_to_array(text);

CREATE FUNCTION hydra.join_aggregate(text, text) RETURNS text
    AS 'select $1 || ''|'' || $2'
    LANGUAGE sql IMMUTABLE STRICT;

CREATE FUNCTION hydra.join_aggregate_to_array(text) RETURNS text[]
    AS 'SELECT string_to_array($1, ''|'')'
    LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE hydra.join (
    BASETYPE = text,
    SFUNC = hydra.join_aggregate,
    STYPE = text,
    FINALFUNC = hydra.join_aggregate_to_array
);

Tables:
    records:   757278 rows
    contacts:  2256253 rows
    addresses: 741536 rows

Explain:

                                       QUERY PLAN
GroupAggregate  (cost=636575.63..738618.40 rows=757278 width=75)
  ->  Merge Left Join  (cost=636575.63..694469.65 rows=1681120 width=75)
        Merge Cond: ("outer".id = "inner".record)
        ->  Merge Left Join  (cost=523248.93..552247.54 rows=1681120 width=63)
              Merge Cond: ("outer".id = "inner".record)
              ->  Sort  (cost=164044.73..165937.93 rows=757278 width=48)
                    Sort Key: r.id
                    ->  Seq Scan on records r  (cost=0.00..19134.78 rows=757278 width=48)
                          Filter: (original IS NULL)
              ->  Sort  (cost=359204.20..363407.00 rows=1681120 width=19)
                    Sort Key: co.record
                    ->  Seq Scan on contacts co  (cost=0.00..73438.06 rows=1681120 width=19)
                          Filter: (("type" = 1) OR ("type" = 11) OR ("type" = 101) OR ("type" = 3))
        ->  Sort  (cost=113326.70..115180.54 rows=741536 width=16)
              Sort Key: ad.record
              ->  Seq Scan on addresses ad  (cost=0.00..20801.36 rows=741536 width=16)
(16 rows)

se_companies=# \d records
                      Table "public.records"
     Column      |           Type           |  Modifiers
-----------------+--------------------------+------------------------------------------------------
 id              | integer                  | not null default nextval('records_id_seq'::regclass)
 companyid       | character varying(16)    | default ''::character varying
 categories      | integer[]                |
 nace            | integer[]                |
 name            | character varying(255)   | default ''::character varying
 updated         | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone
 updater         | integer                  |
 owner           | integer                  |
 loaner          | integer                  |
 info            | text                     |
 original        | integer                  |
 active          | boolean                  | default true
 categoryquality | integer                  | not null default 0
 searchwords     | character varying(128)[] |
 priority        | integer                  |
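Consistent with the follow-up that the custom aggregate was the culprit: hydra.join copies the whole accumulated string on every input row, which gets expensive over hundreds of thousands of rows. A minimal sketch of an alternative (aggregate name hypothetical), assuming the built-in array_append available in 8.x, which accumulates directly into an array and skips both the per-row text concatenation and the string_to_array final pass:

```sql
-- Hypothetical replacement for hydra.join: the state is a text[]
-- and each row is appended with the built-in array_append, so no
-- growing text value is re-copied on every row.
CREATE AGGREGATE hydra.join_array (
    BASETYPE = text,
    SFUNC    = array_append,   -- array_append(text[], text) -> text[]
    STYPE    = text[],
    INITCOND = '{}'
);
```

Note this also behaves differently from the '|'-delimited version when a value itself contains a '|' character, which the original aggregate would silently split.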
Re: [PERFORM] Slow query with joins
The sort is definitely the culprit. When I removed it, the query was instant. I tried setting work_mem = 131072, but it did not seem to help. I really don't understand this :-( Any other ideas? Thanks!

On Jan 11, 2006, at 9:23 PM, Jim C. Nasby wrote:

I'd try figuring out if the join is the culprit or the sort is (by dropping the ORDER BY). work_mem is probably forcing the sort to spill to disk, and if your drives are rather busy... You might also get a win if you re-order the joins to people, contacts, addresses, if you know it will have the same result. In this case LIMIT won't have any real effect, because you have to go all the way through with the ORDER BY anyway.

On Wed, Jan 11, 2006 at 08:55:32PM +0100, Bendik Rognlien Johansen wrote:

Yes, the rowcount estimates are real; however, it has been a long time since the last VACUUM FULL (there is never a good time). I have clustered the tables, reindexed, analyzed, and vacuumed, and the plan now looks like this:

no_people=# explain SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name,
    ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address,
    ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite,
    ad.priority AS ad_priority, ad.position[0] AS ad_lat, ad.position[1] AS ad_lon,
    ad.uncertainty AS ad_uncertainty, ad.extra AS ad_extra, ad.deleted AS ad_deleted,
    co.id AS co_id, co.type AS co_type, co.value AS co_value,
    co.description AS co_description, co.priority AS co_priority,
    co.visible AS co_visible, co.searchable AS co_searchable, co.deleted AS co_deleted
FROM people r
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
LEFT OUTER JOIN contacts co ON (r.id = co.record)
WHERE NOT r.deleted AND r.original IS NULL
ORDER BY r.id;

                                       QUERY PLAN
Sort  (cost=182866.49..182943.12 rows=30655 width=587)
  Sort Key: r.id
  ->  Nested Loop Left Join  (cost=0.00..170552.10 rows=30655 width=587)
        ->  Nested Loop Left Join  (cost=0.00..75054.96 rows=26325 width=160)
              ->  Index Scan using people_deleted_original_is_null on people r  (cost=0.00..1045.47 rows=23861 width=27)
                    Filter: ((NOT deleted) AND (original IS NULL))
              ->  Index Scan using addresses_record_idx on addresses ad  (cost=0.00..3.05 rows=4 width=137)
                    Index Cond: ("outer".id = ad.record)
        ->  Index Scan using contacts_record_idx on contacts co  (cost=0.00..3.32 rows=24 width=431)
              Index Cond: ("outer".id = co.record)
(10 rows)

Looks faster, but still very slow. I added LIMIT 1000 and it has been running for about 25 minutes now with no output. top shows:

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
29994 postgres 18   0 95768  78m  68m R 17.0  7.7  0:53.27 postmaster

which is unusual; I usually get 99.9 %CPU for just about any query, which leads me to believe this is disk related.

postgresql.conf:
    shared_buffers = 8192
    work_mem = 8192
    maintenance_work_mem = 524288

Hardware: 2x 2.8 GHz CPU, 1 GB RAM.

Could this be an issue related to lack of VACUUM FULL? The tables get a lot of updates. Thank you very much so far!

On Jan 11, 2006, at 4:45 PM, Tom Lane wrote:

Bendik Rognlien Johansen <[EMAIL PROTECTED]> writes: Has anyone got any tips for speeding up this query? It currently takes hours to start.

Are the rowcount estimates close to reality? The plan doesn't look unreasonable to me if they are. It might help to increase work_mem to ensure that the hash tables don't spill to disk.

Indexes: "people_original_is_null" btree (original) WHERE original IS NULL

This index seems poorly designed: the actual index entries are dead weight since all of them are necessarily NULL. You might as well make the index carry something that you frequently test in conjunction with "original IS NULL". For instance, if this particular query is a common case, you could replace this index with

CREATE INDEX people_deleted_original_is_null ON people(deleted) WHERE original IS NULL;

This index is still perfectly usable for queries that only say "original IS NULL", but it can also filter out rows with the wrong value of deleted.
Now, if there are hardly any rows with deleted = true, maybe this won't help much for your problem. But in any case you ought to consider whether you can make the index entries do something useful.

regards, tom lane

--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-611
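Tom's partial-index swap can be tried and verified in one session; a sketch using the index names from the thread (the EXPLAIN output will of course vary):

```sql
-- Replace the partial index whose entries are all NULL with one that
-- also carries the frequently-tested "deleted" column.
DROP INDEX people_original_is_null;
CREATE INDEX people_deleted_original_is_null
    ON people (deleted) WHERE original IS NULL;
ANALYZE people;

-- The planner should now be able to satisfy both predicates from the
-- new partial index:
EXPLAIN SELECT id FROM people WHERE NOT deleted AND original IS NULL;
```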
Re: [PERFORM] Slow query with joins
Yes, the rowcount estimates are real; however, it has been a long time since the last VACUUM FULL (there is never a good time). I have clustered the tables, reindexed, analyzed, and vacuumed, and the plan now looks like this:

no_people=# explain SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name,
    ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address,
    ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite,
    ad.priority AS ad_priority, ad.position[0] AS ad_lat, ad.position[1] AS ad_lon,
    ad.uncertainty AS ad_uncertainty, ad.extra AS ad_extra, ad.deleted AS ad_deleted,
    co.id AS co_id, co.type AS co_type, co.value AS co_value,
    co.description AS co_description, co.priority AS co_priority,
    co.visible AS co_visible, co.searchable AS co_searchable, co.deleted AS co_deleted
FROM people r
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
LEFT OUTER JOIN contacts co ON (r.id = co.record)
WHERE NOT r.deleted AND r.original IS NULL
ORDER BY r.id;

                                       QUERY PLAN
Sort  (cost=182866.49..182943.12 rows=30655 width=587)
  Sort Key: r.id
  ->  Nested Loop Left Join  (cost=0.00..170552.10 rows=30655 width=587)
        ->  Nested Loop Left Join  (cost=0.00..75054.96 rows=26325 width=160)
              ->  Index Scan using people_deleted_original_is_null on people r  (cost=0.00..1045.47 rows=23861 width=27)
                    Filter: ((NOT deleted) AND (original IS NULL))
              ->  Index Scan using addresses_record_idx on addresses ad  (cost=0.00..3.05 rows=4 width=137)
                    Index Cond: ("outer".id = ad.record)
        ->  Index Scan using contacts_record_idx on contacts co  (cost=0.00..3.32 rows=24 width=431)
              Index Cond: ("outer".id = co.record)
(10 rows)

Looks faster, but still very slow. I added LIMIT 1000 and it has been running for about 25 minutes now with no output. top shows:

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
29994 postgres 18   0 95768  78m  68m R 17.0  7.7  0:53.27 postmaster

which is unusual; I usually get 99.9 %CPU for just about any query, which leads me to believe this is disk related.
postgresql.conf:
    shared_buffers = 8192
    work_mem = 8192
    maintenance_work_mem = 524288

Hardware: 2x 2.8 GHz CPU, 1 GB RAM.

Could this be an issue related to lack of VACUUM FULL? The tables get a lot of updates. Thank you very much so far!

On Jan 11, 2006, at 4:45 PM, Tom Lane wrote:

Bendik Rognlien Johansen <[EMAIL PROTECTED]> writes: Has anyone got any tips for speeding up this query? It currently takes hours to start.

Are the rowcount estimates close to reality? The plan doesn't look unreasonable to me if they are. It might help to increase work_mem to ensure that the hash tables don't spill to disk.

Indexes: "people_original_is_null" btree (original) WHERE original IS NULL

This index seems poorly designed: the actual index entries are dead weight since all of them are necessarily NULL. You might as well make the index carry something that you frequently test in conjunction with "original IS NULL". For instance, if this particular query is a common case, you could replace this index with

CREATE INDEX people_deleted_original_is_null ON people(deleted) WHERE original IS NULL;

This index is still perfectly usable for queries that only say "original IS NULL", but it can also filter out rows with the wrong value of deleted. Now, if there are hardly any rows with deleted = true, maybe this won't help much for your problem. But in any case you ought to consider whether you can make the index entries do something useful.

regards, tom lane
[PERFORM] Slow query with joins
Hello! Has anyone got any tips for speeding up this query? It currently takes hours to start. PostgreSQL v8.x on SuSE Linux. Thanks!

no_people=# explain SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name,
    ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address,
    ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite,
    ad.priority AS ad_priority, ad.position[0] AS ad_lat, ad.position[1] AS ad_lon,
    ad.uncertainty AS ad_uncertainty, ad.extra AS ad_extra,
    co.id AS co_id, co.type AS co_type, co.value AS co_value,
    co.description AS co_description, co.priority AS co_priority,
    co.visible AS co_visible, co.searchable AS co_searchable
FROM people r
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
LEFT OUTER JOIN contacts co ON (r.id = co.record)
WHERE r.deleted = false AND r.original IS NULL
  AND co.deleted = false AND NOT ad.deleted
ORDER BY r.id;

                                       QUERY PLAN
Sort  (cost=1152540.74..1152988.20 rows=178983 width=585)
  Sort Key: r.id
  ->  Hash Join  (cost=313757.11..1005334.96 rows=178983 width=585)
        Hash Cond: ("outer".record = "inner".id)
        ->  Seq Scan on addresses ad  (cost=0.00..428541.29 rows=4952580 width=136)
              Filter: (NOT deleted)
        ->  Hash  (cost=312039.95..312039.95 rows=27664 width=457)
              ->  Hash Join  (cost=94815.24..312039.95 rows=27664 width=457)
                    Hash Cond: ("outer".record = "inner".id)
                    ->  Seq Scan on contacts co  (cost=0.00..147791.54 rows=5532523 width=430)
                          Filter: (deleted = false)
                    ->  Hash  (cost=94755.85..94755.85 rows=23755 width=27)
                          ->  Index Scan using people_original_is_null on people r  (cost=0.00..94755.85 rows=23755 width=27)
                                Filter: ((deleted = false) AND (original IS NULL))
(14 rows)

no_people=# \d contacts
                    Table "public.contacts"
   Column    |          Type          |         Modifiers
-------------+------------------------+---------------------------------------------------------
 id          | integer                | not null default nextval('public.contacts_id_seq'::text)
 record      | integer                |
 type        | integer                |
 value       | character varying(128) |
 description | character varying(255) |
 priority    | integer                |
 itescotype  | integer                |
 original    | integer                |
 source      | integer                |
 reference   | character varying(32)  |
 deleted     | boolean                | not null default false
 quality     | integer                |
 visible     | boolean                | not null default true
 searchable  | boolean                | not null default true
Indexes:
    "contacts_pkey" PRIMARY KEY, btree (id)
    "contacts_deleted_idx" btree (deleted)
    "contacts_record_idx" btree (record) CLUSTER
    "contacts_source_reference_idx" btree (source, reference)

no_people=# \d addresses
                    Table "public.addresses"
   Column    |          Type          |         Modifiers
-------------+------------------------+----------------------------------------------------------
 id          | integer                | not null default nextval('public.addresses_id_seq'::text)
 record      | integer                |
 address     | character varying(128) |
 extra       | character varying(32)  |
 postalcode  | character varying(16)  |
 postalsite  | character varying(64)  |
 description | character varying(255) |
 position    | point                  |
 uncertainty | integer                | default
 priority    | integer                |
 type        | integer                |
 place       | character varying(64)  |
 floor       | integer                |
 side        | character varying(8)   |
 housename   | character varying(64)  |
 original    | integer                |
 source      | integer                |
 reference   | character varying(32)  |
 deleted     | boolean                | not null default false
 quality     | integer                |
 visible     | boolean                | not null default true
 searchable  | boolean                | not null default true
Indexes:
    "addresses_pkey" PRIMARY KEY, btree (id)
    "addresses_deleted_idx" btree (deleted)
    "addresses_record_idx" btree (record) CLUSTER
    "addresses_source_reference_idx" btree (source, reference)

no_people=# \d people
       Table "public.people"
 Column | Type | Modifiers
--------+------+-----------
 id
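A side note on the query above: because co.deleted = false and NOT ad.deleted sit in the WHERE clause, any people row whose joined contact or address is NULL (i.e. has no match) is filtered out, so the outer joins behave as inner joins. If the intent is to keep people rows that lack contacts or addresses, one sketch (same tables, abbreviated select list) moves those filters into the join conditions instead:

```sql
-- Filters on the nullable side of a LEFT JOIN belong in the ON clause
-- if unmatched people rows should still be returned.
SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name
FROM people r
LEFT OUTER JOIN addresses ad ON (r.id = ad.record AND NOT ad.deleted)
LEFT OUTER JOIN contacts  co ON (r.id = co.record AND co.deleted = false)
WHERE r.deleted = false AND r.original IS NULL
ORDER BY r.id;
```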
Re: [PERFORM] How to speed up delete
Thanks! That took care of it.

On Jul 7, 2005, at 4:02 PM, Tom Lane wrote:

Bendik Rognlien Johansen <[EMAIL PROTECTED]> writes: I am running a few of these deletes (could become many more) inside a transaction and each one takes almost a second to complete. Is it because of the foreign key constraint, or is it something else?

You need an index on "original" to support that FK efficiently. Check for references from other tables to this one, too.

regards, tom lane
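A minimal sketch of the fix Tom describes (index name hypothetical): the self-referencing constraint contacts_original_fkey means every DELETE must check contacts for rows whose original column points at the deleted id, and without an index that check is a sequential scan per row. An index on the referencing column turns it into an index probe:

```sql
-- Hypothetical index name; supports the FK check fired by
-- contacts_original_fkey on every DELETE FROM contacts.
CREATE INDEX contacts_original_idx ON contacts (original);
```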
[PERFORM] How to speed up delete
Hello, I was wondering if there is any way to speed up deletes on this table (see details below)? I am running a few of these deletes (could become many more) inside a transaction, and each one takes almost a second to complete. Is it because of the foreign key constraint, or is it something else? Thanks!

                    Table "public.contacts"
   Column    |          Type          |         Modifiers
-------------+------------------------+---------------------------------------------------------
 id          | integer                | not null default nextval('public.contacts_id_seq'::text)
 record      | integer                |
 type        | integer                |
 value       | character varying(128) |
 description | character varying(255) |
 priority    | integer                |
 itescotype  | integer                |
 original    | integer                |
Indexes:
    "contacts_pkey" PRIMARY KEY, btree (id)
    "contacts_record_idx" btree (record)
Foreign-key constraints:
    "contacts_original_fkey" FOREIGN KEY (original) REFERENCES contacts(id)

dev=# select count(id) from contacts;
 count
--------
 984834
(1 row)

dev=# explain analyze DELETE FROM contacts WHERE id = 985458;
                                       QUERY PLAN
Index Scan using contacts_pkey on contacts  (cost=0.00..3.01 rows=1 width=6) (actual time=0.043..0.049 rows=1 loops=1)
  Index Cond: (id = 985458)
Total runtime: 840.481 ms
(3 rows)