[PERFORM] Problem with query, forget previous message

2006-03-23 Thread Bendik Rognlien Johansen
Seems the problem was with the custom aggregate function not being  
able to handle thousands of rows.




[PERFORM] Problem with query, server totally unresponsive

2006-03-23 Thread Bendik Rognlien Johansen
Hello, I have a big problem with one of my databases. When I run my
query, after a few minutes the postmaster shows 99% mem in top, and
the server becomes totally unresponsive.


I get this message when I try to cancel the query:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


This works fine on a different machine with the same database
settings and about 30% fewer records. The other machine is running
PostgreSQL 8.0.3.

The troubled one is running 8.1.2.


Any help is greatly appreciated!

Thanks





The machine has two Intel dual-core processors (3 GHz) and 2 GB of RAM.

#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -

shared_buffers = 8192           # min 16 or max_connections*2, 8KB each
#temp_buffers = 1000            # min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 4096                 # min 64, size in KB
maintenance_work_mem = 262144   # min 1024, size in KB
#max_stack_depth = 2048         # min 100, size in KB





My query:

SELECT r.id AS id, max(r.name) AS name, max(companyid) AS companyid,
       max(extract(epoch from r.updated)) AS r_updated,
       hydra.join(co.value) AS contacts,
       hydra.join(ad.postalsite) AS postalsites
FROM records r
LEFT OUTER JOIN contacts co ON (r.id = co.record AND co.type IN (1, 11, 101, 3))
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
WHERE r.original IS NULL
GROUP BY r.id;



The hydra.join function
-- Aggregates a column to an array

DROP FUNCTION hydra.join_aggregate(text, text) CASCADE;
DROP FUNCTION hydra.join_aggregate_to_array(text);

CREATE FUNCTION hydra.join_aggregate(text, text) RETURNS text
  AS 'select $1 || ''|'' || $2'
  LANGUAGE sql IMMUTABLE STRICT;

CREATE FUNCTION hydra.join_aggregate_to_array(text) RETURNS text[]
  AS 'SELECT string_to_array($1, ''|'')'
  LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE hydra.join (
  BASETYPE = text
,SFUNC = hydra.join_aggregate
,STYPE = text
,FINALFUNC = hydra.join_aggregate_to_array
);
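For reference, a sketch (not from the original thread; the function and aggregate names are illustrative) of an array-based variant of this aggregate. Accumulating directly into a text[] state avoids re-concatenating an ever-growing text value on every input row, which is what tends to hurt once a group contains thousands of rows, matching the follow-up at the top of this archive:

CREATE FUNCTION hydra.join_append(text[], text) RETURNS text[]
  AS 'SELECT $1 || $2'          -- appends one element to the array state
  LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE hydra.join_array (
  BASETYPE = text
 ,SFUNC = hydra.join_append
 ,STYPE = text[]
 ,INITCOND = '{}'               -- start from an empty array; no final function needed
);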





Tables:
records: 757278 rows
contacts: 2256253 rows
addresses: 741536 rows








Explain:

                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=636575.63..738618.40 rows=757278 width=75)
   ->  Merge Left Join  (cost=636575.63..694469.65 rows=1681120 width=75)
         Merge Cond: ("outer".id = "inner".record)
         ->  Merge Left Join  (cost=523248.93..552247.54 rows=1681120 width=63)
               Merge Cond: ("outer".id = "inner".record)
               ->  Sort  (cost=164044.73..165937.93 rows=757278 width=48)
                     Sort Key: r.id
                     ->  Seq Scan on records r  (cost=0.00..19134.78 rows=757278 width=48)
                           Filter: (original IS NULL)
               ->  Sort  (cost=359204.20..363407.00 rows=1681120 width=19)
                     Sort Key: co.record
                     ->  Seq Scan on contacts co  (cost=0.00..73438.06 rows=1681120 width=19)
                           Filter: (("type" = 1) OR ("type" = 11) OR ("type" = 101) OR ("type" = 3))
         ->  Sort  (cost=113326.70..115180.54 rows=741536 width=16)
               Sort Key: ad.record
               ->  Seq Scan on addresses ad  (cost=0.00..20801.36 rows=741536 width=16)
(16 rows)







se_companies=# \d records
                                      Table "public.records"
     Column      |           Type           |                      Modifiers
-----------------+--------------------------+-------------------------------------------------------
 id              | integer                  | not null default nextval('records_id_seq'::regclass)
 companyid       | character varying(16)    | default ''::character varying
 categories      | integer[]                |
 nace            | integer[]                |
 name            | character varying(255)   | default ''::character varying
 updated         | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone
 updater         | integer                  |
 owner           | integer                  |
 loaner          | integer                  |
 info            | text                     |
 original        | integer                  |
 active          | boolean                  | default true
 categoryquality | integer                  | not null default 0
 searchwords     | character varying(128)[] |
 priority        | integer                  |

Re: [PERFORM] Slow query with joins

2006-01-11 Thread Bendik Rognlien Johansen
The sort is definitely the culprit. When I removed it the query was
instant. I tried setting work_mem = 131072 but it did not seem to
help. I really don't understand this :-( Any other ideas?
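One way to check whether the sort actually spills to disk, as a session-level sketch (not from the original exchange; trace_sort is only present if the server was built with TRACE_SORT, which is the default):

SET work_mem = 131072;   -- value is in KB here, i.e. 128 MB for this session only
SET trace_sort = on;     -- tuplesort progress is written to the server log
-- re-run the slow query (EXPLAIN ANALYZE SELECT ... ORDER BY r.id), then check the log:
-- lines such as "switching to external sort" mean the sort spilled to disk,
-- while "internal sort ended" means it fit within work_mem.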


Thanks!


On Jan 11, 2006, at 9:23 PM, Jim C. Nasby wrote:


I'd try figuring out if the join is the culprit or the sort is (by
dropping the ORDER BY). work_mem is probably forcing the sort to spill
to disk, and if your drives are rather busy...

You might also get a win if you re-order the joins to people, contacts,
addresses, if you know it will have the same result.

In this case LIMIT won't have any real effect, because you have to go
all the way through with the ORDER BY anyway.

On Wed, Jan 11, 2006 at 08:55:32PM +0100, Bendik Rognlien Johansen wrote:

Yes,  the rowcount estimates are real, however, it has been a long
time since the last VACUUM FULL (there is never a good time).

I have clustered the tables, reindexed, analyzed, vacuumed and the
plan now looks like this:


no_people=# explain SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name,
    ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address,
    ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite,
    ad.priority AS ad_priority, ad.position[0] AS ad_lat, ad.position[1] AS ad_lon,
    ad.uncertainty AS ad_uncertainty, ad.extra AS ad_extra, ad.deleted AS ad_deleted,
    co.id AS co_id, co.type AS co_type, co.value AS co_value,
    co.description AS co_description, co.priority AS co_priority,
    co.visible AS co_visible, co.searchable AS co_searchable, co.deleted AS co_deleted
FROM people r
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
LEFT OUTER JOIN contacts co ON (r.id = co.record)
WHERE NOT r.deleted AND r.original IS NULL
ORDER BY r.id;

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Sort  (cost=182866.49..182943.12 rows=30655 width=587)
   Sort Key: r.id
   ->  Nested Loop Left Join  (cost=0.00..170552.10 rows=30655 width=587)
         ->  Nested Loop Left Join  (cost=0.00..75054.96 rows=26325 width=160)
               ->  Index Scan using people_deleted_original_is_null on people r  (cost=0.00..1045.47 rows=23861 width=27)
                     Filter: ((NOT deleted) AND (original IS NULL))
               ->  Index Scan using addresses_record_idx on addresses ad  (cost=0.00..3.05 rows=4 width=137)
                     Index Cond: ("outer".id = ad.record)
         ->  Index Scan using contacts_record_idx on contacts co  (cost=0.00..3.32 rows=24 width=431)
               Index Cond: ("outer".id = co.record)
(10 rows)






Looks faster, but still very slow. I added limit 1000 and it has been
running for about 25 minutes now with no output. top shows:


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29994 postgres  18   0 95768  78m  68m R 17.0  7.7   0:53.27 postmaster




which is unusual; I usually get 99.9% CPU for just about any query,
which leads me to believe this is disk related.



postgresql.conf:
shared_buffers = 8192
work_mem = 8192
maintenance_work_mem = 524288




Hardware 2x2.8GHz cpu
1GB ram

Could this be an issue related to lack of VACUUM FULL? The tables get
a lot of updates.


Thank you very much so far!




On Jan 11, 2006, at 4:45 PM, Tom Lane wrote:


Bendik Rognlien Johansen <[EMAIL PROTECTED]> writes:

Has anyone got any tips for speeding up this query? It currently
takes hours to start.


Are the rowcount estimates close to reality?  The plan doesn't look
unreasonable to me if they are.  It might help to increase work_mem
to ensure that the hash tables don't spill to disk.

Indexes:
    "people_original_is_null" btree (original) WHERE original IS NULL


This index seems poorly designed: the actual index entries are dead
weight since all of them are necessarily NULL.  You might as well make
the index carry something that you frequently test in conjunction with
"original IS NULL".  For instance, if this particular query is a common
case, you could replace this index with

CREATE INDEX people_deleted_original_is_null ON people(deleted)
  WHERE original IS NULL;

This index is still perfectly usable for queries that only say "original
IS NULL", but it can also filter out rows with the wrong value of
deleted.  Now, if there are hardly any rows with deleted = true, maybe
this won't help much for your problem.  But in any case you ought to
consider whether you can make the index entries do something useful.

regards, tom lane
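A quick way to confirm the replacement partial index actually gets picked up, sketched here with the table and column names used above (not part of Tom's message):

CREATE INDEX people_deleted_original_is_null ON people (deleted)
  WHERE original IS NULL;
ANALYZE people;
EXPLAIN SELECT id FROM people WHERE NOT deleted AND original IS NULL;
-- The plan should show an index (or bitmap) scan on people_deleted_original_is_null
-- rather than a sequential scan over the whole people table.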






--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-611

Re: [PERFORM] Slow query with joins

2006-01-11 Thread Bendik Rognlien Johansen
Yes,  the rowcount estimates are real, however, it has been a long  
time since the last VACUUM FULL (there is never a good time).


I have clustered the tables, reindexed, analyzed, vacuumed and the  
plan now looks like this:



no_people=# explain SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name,
    ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address,
    ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite,
    ad.priority AS ad_priority, ad.position[0] AS ad_lat, ad.position[1] AS ad_lon,
    ad.uncertainty AS ad_uncertainty, ad.extra AS ad_extra, ad.deleted AS ad_deleted,
    co.id AS co_id, co.type AS co_type, co.value AS co_value,
    co.description AS co_description, co.priority AS co_priority,
    co.visible AS co_visible, co.searchable AS co_searchable, co.deleted AS co_deleted
FROM people r
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
LEFT OUTER JOIN contacts co ON (r.id = co.record)
WHERE NOT r.deleted AND r.original IS NULL
ORDER BY r.id;

                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Sort  (cost=182866.49..182943.12 rows=30655 width=587)
   Sort Key: r.id
   ->  Nested Loop Left Join  (cost=0.00..170552.10 rows=30655 width=587)
         ->  Nested Loop Left Join  (cost=0.00..75054.96 rows=26325 width=160)
               ->  Index Scan using people_deleted_original_is_null on people r  (cost=0.00..1045.47 rows=23861 width=27)
                     Filter: ((NOT deleted) AND (original IS NULL))
               ->  Index Scan using addresses_record_idx on addresses ad  (cost=0.00..3.05 rows=4 width=137)
                     Index Cond: ("outer".id = ad.record)
         ->  Index Scan using contacts_record_idx on contacts co  (cost=0.00..3.32 rows=24 width=431)
               Index Cond: ("outer".id = co.record)
(10 rows)






Looks faster, but still very slow. I added limit 1000 and it has been  
running for about 25 minutes now with no output. top shows:



  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29994 postgres  18   0 95768  78m  68m R 17.0  7.7   0:53.27 postmaster



which is unusual; I usually get 99.9% CPU for just about any query,
which leads me to believe this is disk related.




postgresql.conf:
shared_buffers = 8192
work_mem = 8192
maintenance_work_mem = 524288




Hardware 2x2.8GHz cpu
1GB ram

Could this be an issue related to lack of VACUUM FULL? The tables get  
a lot of updates.
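As a sketch, not from the original message: plain VACUUM takes no exclusive lock, and with max_fsm_pages set high enough in postgresql.conf a routine pass like the one below usually keeps heavily updated tables from ever needing VACUUM FULL:

VACUUM ANALYZE people;     -- reclaims dead row space and refreshes planner statistics
VACUUM ANALYZE contacts;
VACUUM ANALYZE addresses;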



Thank you very much so far!




On Jan 11, 2006, at 4:45 PM, Tom Lane wrote:


Bendik Rognlien Johansen <[EMAIL PROTECTED]> writes:

Has anyone got any tips for speeding up this query? It currently
takes hours to start.


Are the rowcount estimates close to reality?  The plan doesn't look
unreasonable to me if they are.  It might help to increase work_mem
to ensure that the hash tables don't spill to disk.

Indexes:
 "people_original_is_null" btree (original) WHERE original IS NULL

This index seems poorly designed: the actual index entries are dead
weight since all of them are necessarily NULL.  You might as well make
the index carry something that you frequently test in conjunction with
"original IS NULL".  For instance, if this particular query is a common
case, you could replace this index with

CREATE INDEX people_deleted_original_is_null ON people(deleted)
  WHERE original IS NULL;

This index is still perfectly usable for queries that only say "original
IS NULL", but it can also filter out rows with the wrong value of
deleted.  Now, if there are hardly any rows with deleted = true, maybe
this won't help much for your problem.  But in any case you ought to
consider whether you can make the index entries do something useful.

regards, tom lane





[PERFORM] Slow query with joins

2006-01-11 Thread Bendik Rognlien Johansen

Hello!
Has anyone got any tips for speeding up this query? It currently  
takes hours to start.


PostgreSQL v8.x on SuSE Linux
Thanks!


no_people=# explain SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name,
    ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address,
    ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite,
    ad.priority AS ad_priority, ad.position[0] AS ad_lat, ad.position[1] AS ad_lon,
    ad.uncertainty AS ad_uncertainty, ad.extra AS ad_extra,
    co.id AS co_id, co.type AS co_type, co.value AS co_value,
    co.description AS co_description, co.priority AS co_priority,
    co.visible AS co_visible, co.searchable AS co_searchable
FROM people r
LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
LEFT OUTER JOIN contacts co ON (r.id = co.record)
WHERE r.deleted = false AND r.original IS NULL AND co.deleted = false AND NOT ad.deleted
ORDER BY r.id;

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1152540.74..1152988.20 rows=178983 width=585)
   Sort Key: r.id
   ->  Hash Join  (cost=313757.11..1005334.96 rows=178983 width=585)
         Hash Cond: ("outer".record = "inner".id)
         ->  Seq Scan on addresses ad  (cost=0.00..428541.29 rows=4952580 width=136)
               Filter: (NOT deleted)
         ->  Hash  (cost=312039.95..312039.95 rows=27664 width=457)
               ->  Hash Join  (cost=94815.24..312039.95 rows=27664 width=457)
                     Hash Cond: ("outer".record = "inner".id)
                     ->  Seq Scan on contacts co  (cost=0.00..147791.54 rows=5532523 width=430)
                           Filter: (deleted = false)
                     ->  Hash  (cost=94755.85..94755.85 rows=23755 width=27)
                           ->  Index Scan using people_original_is_null on people r  (cost=0.00..94755.85 rows=23755 width=27)
                                 Filter: ((deleted = false) AND (original IS NULL))
(14 rows)






no_people=# \d contacts
                               Table "public.contacts"
   Column    |          Type          |                         Modifiers
-------------+------------------------+-----------------------------------------------------------
 id          | integer                | not null default nextval('public.contacts_id_seq'::text)
 record      | integer                |
 type        | integer                |
 value       | character varying(128) |
 description | character varying(255) |
 priority    | integer                |
 itescotype  | integer                |
 original    | integer                |
 source      | integer                |
 reference   | character varying(32)  |
 deleted     | boolean                | not null default false
 quality     | integer                |
 visible     | boolean                | not null default true
 searchable  | boolean                | not null default true
Indexes:
    "contacts_pkey" PRIMARY KEY, btree (id)
    "contacts_deleted_idx" btree (deleted)
    "contacts_record_idx" btree (record) CLUSTER
    "contacts_source_reference_idx" btree (source, reference)









no_people=# \d addresses
                               Table "public.addresses"
   Column    |          Type          |                         Modifiers
-------------+------------------------+------------------------------------------------------------
 id          | integer                | not null default nextval('public.addresses_id_seq'::text)
 record      | integer                |
 address     | character varying(128) |
 extra       | character varying(32)  |
 postalcode  | character varying(16)  |
 postalsite  | character varying(64)  |
 description | character varying(255) |
 position    | point                  |
 uncertainty | integer                | default
 priority    | integer                |
 type        | integer                |
 place       | character varying(64)  |
 floor       | integer                |
 side        | character varying(8)   |
 housename   | character varying(64)  |
 original    | integer                |
 source      | integer                |
 reference   | character varying(32)  |
 deleted     | boolean                | not null default false
 quality     | integer                |
 visible     | boolean                | not null default true
 searchable  | boolean                | not null default true
Indexes:
    "addresses_pkey" PRIMARY KEY, btree (id)
    "addresses_deleted_idx" btree (deleted)
    "addresses_record_idx" btree (record) CLUSTER
    "addresses_source_reference_idx" btree (source, reference)







no_people=# \d people
            Table "public.people"
   Column   |   Type   |   Modifiers
------------+----------+----------------
 id

Re: [PERFORM] How to speed up delete

2005-07-07 Thread Bendik Rognlien Johansen

Thanks!
That took care of it.
On Jul 7, 2005, at 4:02 PM, Tom Lane wrote:


Bendik Rognlien Johansen <[EMAIL PROTECTED]> writes:


I am running a few of these deletes (could become many more) inside a
transaction and each one takes almost a second to complete.
Is it because of the foreign key constraint, or is it something else?



You need an index on "original" to support that FK efficiently.  Check
for references from other tables to this one, too.

regards, tom lane
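The fix as a sketch (the index name here is illustrative): an index on contacts.original lets the foreign-key trigger find referencing rows without scanning all of contacts for every deleted id.

CREATE INDEX contacts_original_idx ON contacts (original);
-- After this, the EXPLAIN ANALYZE DELETE shown below should drop from hundreds of
-- milliseconds to roughly the cost of the index lookups involved.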






[PERFORM] How to speed up delete

2005-07-07 Thread Bendik Rognlien Johansen

Hello,
I was wondering if there is any way to speed up deletes on this table
(see details below)?
I am running a few of these deletes (could become many more) inside a
transaction and each one takes almost a second to complete.

Is it because of the foreign key constraint, or is it something else?

Thanks!

 Table "public.contacts"
   Column|  Type  | 
Modifiers
-+ 
+--
id  | integer| not null default nextval 
('public.contacts_id_seq'::text)

record  | integer|
type| integer|
value   | character varying(128) |
description | character varying(255) |
priority| integer|
itescotype  | integer|
original| integer|
Indexes:
"contacts_pkey" PRIMARY KEY, btree (id)
"contacts_record_idx" btree (record)
Foreign-key constraints:
"contacts_original_fkey" FOREIGN KEY (original) REFERENCES  
contacts(id)


dev=# select count(id) from contacts;
 count
--------
 984834
(1 row)


dev=# explain analyze DELETE FROM contacts WHERE id = 985458;
                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Index Scan using contacts_pkey on contacts  (cost=0.00..3.01 rows=1 width=6) (actual time=0.043..0.049 rows=1 loops=1)
   Index Cond: (id = 985458)
 Total runtime: 840.481 ms
(3 rows)

