Re: [PERFORM] Forcing the use of particular execution plans

2006-10-03 Thread Tim Truman

Thanks Tom,
The time difference did distract me from the issue. Switching Seq Scan to
off reduced the runtime greatly, so I am now adjusting the
effective_cache_size and random_page_cost settings to favor indexes over
Seq Scans.

Regards,
Tim


-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, 3 October 2006 1:50 PM
To: Tim Truman
Cc: 'Dave Dutcher'; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Forcing the use of particular execution plans 

Tim Truman [EMAIL PROTECTED] writes:
 Here is an explain analyze for the query that performs slowly,

This shows that the planner is exactly correct in thinking that all
the runtime is going into the seqscan on transaction:

 Aggregate  (cost=88256.32..88256.32 rows=1 width=0) (actual
 time=55829.000..55829.000 rows=1 loops=1)
 ...
   ->  Seq Scan on transaction t
 (cost=0.00..87061.04 rows=1630 width=349) (actual time=234.000..55797.000
 rows=200 loops=1)
 Filter: ((transaction_date >= '2005-01-01'::date) AND
 (transaction_date <= '2006-09-25'::date) AND
 ((credit_card_no)::text ~~ '4564%549'::text))

Since that component of the plan was identical in your two original
plans (desired and undesired) it seems pretty clear that you have
not correctly identified what your problem is.

regards, tom lane



Re: [PERFORM] High CPU Load

2006-10-03 Thread Jérôme BENOIS
Hi All,

Replying to myself: we solved the CPU load problem. An external batch job
was using an expensive SQL view and taking 99% of the CPU.

Thanks to all for your help!

---

I have started the HAPlatform open-source project, part of the Share'nGo
project. Its goal is to define all the documentation and scripts required
to install and maintain a high-availability platform.

Two platforms are targeted:

* LAPJ : Linux Apache PostgreSQL Java

* LAMP : Linux Apache MySQL PHP

The first document is here (it is my PostgreSQL configuration):

http://sharengo.org/haplatform/docs/PostgreSQL/en/html_single/index.html


Cheers,
Jérôme.
-- 
Open-Source : http://www.sharengo.org
Corporate : http://www.argia-engineering.fr

On Friday 22 September 2006 at 09:43 +0200, Jérôme BENOIS wrote:
 Hi, Markus,
 
 On Tuesday 19 September 2006 at 15:09 +0200, Markus Schaber wrote:
  Hi, Jerome,
  
  Jérôme BENOIS wrote:
  
   Now I have 335 concurrent connections, I decreased the work_mem parameter
   to 32768 and disabled Hyper-Threading in the BIOS. But my CPU load is
   still very high.
   What are your settings for commit_siblings and commit_delay?
   They are at the defaults:
   
   #commit_delay = 0    # range 0-100000, in microseconds
   #commit_siblings = 5 # range 1-1000
  
  You should uncomment them, and play with different settings. I'd try a
  commit_delay of 100, and commit_siblings of 5 to start with.
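
  In postgresql.conf, that starting point would be:

  commit_delay = 100     # wait 100 microseconds before flushing a commit
  commit_siblings = 5    # but only if at least 5 other transactions are open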
  
   I plan to return to the previous version, 7.4.6, and I will reinstall
   everything on a dedicated server in order to reproduce and solve the
   problem.
  
  You should use at least 7.4.13, as it fixes some critical bugs that were
  in 7.4.6. They use the same on-disk format and query planner logic, so
  there should not be any difference.
  
  I don't have much more ideas what the problem could be.
  
  Can you try to do some profiling (e.g. with statement logging) to see
  which specific statements are the ones that cause the high CPU load?
  
  Are there other differences (besides the PostgreSQL version) between the
  two installations? (Kernel, libraries, other software...)
 nothing.
 
 I returned to the previous version, 7.4.6, on my production server; it
 works fine!
 
 I plan to reproduce this problem on a dedicated server, and I will
 send all the information to this list next week.
 
 I hope you can help me solve this problem.
 
 Cheers,
 Jérôme.
 
  HTH,
  Markus





Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Markus Schaber
Hi, Carlo,

Carlo Stonebanks wrote:
 Did you think about putting the whole data into PostgreSQL using COPY in
 a nearly unprocessed manner, index it properly, and then use SQL and
 stored functions to transform the data inside the database to the
 desired result?
 
 This is actually what we are doing. The slowness is in the row-by-row
 transformation. Every row requires that all the inserts and updates of the
 previous row be committed - that's why we have had problems figuring out
 how to do this using SQL set logic.

Maybe GROUP BY, ORDER BY, DISTINCT ON and hand-written functions
and aggregates (like first() or best()) may help.

You could combine all relevant columns into a user-defined compound
type, then group by entity, and have a self-defined aggregate generate
the accumulated tuple for each entity.
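
A minimal sketch of that idea (all names here are made up for
illustration):

-- pack the relevant columns into one composite value
CREATE TYPE entity_row AS (entity_id integer, score integer, payload text);

-- keep whichever of two rows rates higher; STRICT makes the first input
-- the initial state, so no initcond is needed
CREATE FUNCTION best_of(entity_row, entity_row) RETURNS entity_row AS $$
    SELECT CASE WHEN $1.score >= $2.score THEN $1 ELSE $2 END;
$$ LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE best (
    basetype = entity_row,
    sfunc = best_of,
    stype = entity_row
);

-- one accumulated tuple per entity
SELECT (best(ROW(entity_id, score, payload)::entity_row)).*
FROM staging
GROUP BY entity_id;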

Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org



Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Markus Schaber
Hi, Carlo,

Carlo Stonebanks wrote:

 Trying to achieve a high level of data quality in one large project is
 not often possible. Focus on the most critical areas of checking and get
 that working first with acceptable performance, then layer on additional
 checks while tuning. The complexity of the load programs you have also
 means they are susceptible to introducing data quality problems rather
 than removing them, so an incremental approach will also aid debugging
 of the load suite.
 
 I couldn't agree more.

I still think that using a PL in the backend might be more performant
than having an external client, if only because the SPI interface is more
efficient than the network serialization used by external applications.
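
A sketch of what that could look like (table names and the transformation
itself are placeholders, not Carlo's actual schema):

CREATE OR REPLACE FUNCTION import_rows() RETURNS integer AS $$
DECLARE
    rec record;
    n   integer := 0;
BEGIN
    -- the loop runs inside the server process via SPI,
    -- so there is no per-row network round trip
    FOR rec IN SELECT * FROM staging_rows LOOP
        UPDATE target SET val = rec.val WHERE id = rec.id;
        IF NOT FOUND THEN
            INSERT INTO target (id, val) VALUES (rec.id, rec.val);
        END IF;
        n := n + 1;
    END LOOP;
    RETURN n;
END;
$$ LANGUAGE plpgsql;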

HTH,
Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org



[PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-03 Thread Carlo Stonebanks
Some very helpful people had asked that I post the troublesome code that was
generated by my import program.

I installed a SQL log feature in my import program. I have
posted samples of the SQL statements that cause the biggest delays.

Thanks for all of your help.

Carlo

--
Sample 1:
This one is very expensive on my system.
--
select
f.facility_id,
provider_practice_id
from
mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join mdx_core.facility_address as fa
on fa.facility_id = pp.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
pp.provider_id = 1411311
and f.facility_type_code != 'P'
and (
pp.facility_address_id is not null
and a.state_code = 'NY'
and '10001-2382' = a.postal_code||'%'
and a.city = 'New York'
) or (
f.default_state_code = 'NY'
and '10001-2382' like f.default_postal_code||'%'
and f.default_city = 'New York'
)
limit 1

Limit  (cost=3899.18..32935.21 rows=1 width=8)
  ->  Hash Join  (cost=3899.18..91007.27 rows=3 width=8)
        Hash Cond: ("outer".address_id = "inner".address_id)
        Join Filter: ((("outer".provider_id = 1411311) AND
("outer".facility_type_code <> 'P'::bpchar) AND ("outer".facility_address_id
IS NOT NULL) AND (("inner".state_code)::text = 'NY'::text) AND
('10001-2382'::text = (("inner".postal_code)::text || '%'::text)) AND
(("inner".city)::text = 'New York'::text)) OR (("outer".default_state_code =
'NY'::bpchar) AND ('10001-2382'::text ~~
(("outer".default_postal_code)::text || '%'::text)) AND
(("outer".default_city)::text = 'New York'::text)))
        ->  Merge Join  (cost=0.00..50589.20 rows=695598 width=57)
              Merge Cond: ("outer".facility_id = "inner".facility_id)
              ->  Merge Join  (cost=0.00..16873.90 rows=128268 width=49)
                    Merge Cond: ("outer".facility_id = "inner".facility_id)
                    ->  Index Scan using facility_pkey on facility f
(cost=0.00..13590.18 rows=162525 width=41)
                    ->  Index Scan using facility_address_facility_idx on
facility_address fa  (cost=0.00..4254.46 rows=128268 width=8)
              ->  Index Scan using provider_practice_facility_idx on
provider_practice pp  (cost=0.00..28718.27 rows=452129 width=16)
        ->  Hash  (cost=3650.54..3650.54 rows=99454 width=36)
              ->  Seq Scan on address a  (cost=0.00..3650.54 rows=99454
width=36)

--
Sample 2:
This one includes a call to a custom function which performs lexical 
comparisons
and returns a rating on the likelihood that the company names refer to the 
same
facility. Replacing the code:
mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as 
comp
with
  1 as comp
-- to avoid the function call only shaved a fragment off the execution time, 
which leads me to believe my problem is in the SQL structure itself.

--
select
mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as 
comp,
facil.*
from (
select
f.facility_id,
fa.facility_address_id,
a.address_id,
f.facility_type_code,
f.name,
a.address,
a.city,
a.state_code,
a.postal_code,
a.country_code
from
mdx_core.facility as f
join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
facility_address_id is not null
and a.country_code = 'US'
and a.state_code = 'IL'
and '60640-5759' like a.postal_code||'%'
union select
f.facility_id,
null as facility_address_id,
null as address_id,
f.facility_type_code,
f.name,
null as address,
f.default_city as city,
f.default_state_code as state_code,
f.default_postal_code as postal_code,
f.default_country_code as country_code
from
mdx_core.facility as f
left outer join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
where
facility_address_id is null
and f.default_country_code = 'US'
and '60640-5759' like f.default_postal_code||'%'
) as facil
order by comp

Sort  (cost=20595.92..20598.01 rows=834 width=236)
  Sort Key: mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3
Inc'::text, (name)::text)
  ->  Subquery Scan facil  (cost=20522.10..20555.46 rows=834 width=236)
        ->  Unique  (cost=20522.10..20545.03 rows=834 width=103)
              ->  Sort  (cost=20522.10..20524.18 rows=834 width=103)
                    Sort Key: facility_id, facility_address_id, address_id,
facility_type_code, name, address, city, state_code, postal_code,
country_code
                    ->  Append  (cost=4645.12..20481.63 rows=834 width=103)
                          ->  Nested Loop  (cost=4645.12..8381.36 rows=21
width=103)
                                ->  Hash Join  (cost=4645.12..8301.35
rows=21 width=72)
                                      Hash Cond: ("outer".address_id =
"inner".address_id)
                                      ->  Seq Scan on facility_address fa
(cost=0.00..3014.68 rows=128268 width=12)
                                            Filter: (facility_address_id IS
NOT NULL)
                                      ->

Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Carlo Stonebanks
 I still think that using a PL in the backend might be more performant
 than having an external client, if only because the SPI interface is more
 efficient than the network serialization used by external applications.

I would actually love for this to work better, as this is technology that I
would like to develop in general - I see db servers with strong server-side
programming languages as being able to operate as application servers, with
the enterprise's business logic centralised on the server.

The import routine that I wrote will actually work on the server as well -
it will detect the presence of the spi_* calls, and replace the pg_* calls
with spi_* calls. So, you see, this WAS my intention.

However, the last time I tried to run something that complex from the db 
server, it ran quite slowly compared to from a client. This may have had 
something to do with the client that I used to call the stored procedure - I 
thought that perhaps the client created an implicit transaction around my 
SQL statement to allow a rollback, and all of the updates and inserts got 
backed up in a massive transaction queue that took forever to commit.

Carlo 





Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Carlo Stonebanks
 Maybe group by, order by, distinct on and hand-written functions
 and aggregates (like first() or best()) may help.

We use these - we have lexical analysis functions which assign to each row
in a set a rating of the likelihood that the data is a match, and then we
sort our results.

I thought this would be the cause of the slowdowns - and it is, but only a
very small part of it. I have identified the problem code, and the problems
are within some very simple joins. I have posted the code under a related
topic header. I obviously have a few things to learn about optimising SQL
joins.

Carlo


 You could combine all relevant columns into a user-defined compound
 type, then group by entity, and have a self-defined aggregate generate
 the accumulated tuple for each entity.

 Markus
 -- 
 Markus Schaber | Logical Tracking&Tracing International AG
 Dipl. Inf. | Software Development GIS

 Fight against software patents in Europe! www.ffii.org
 www.nosoftwarepatents.org



[PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Arnaud Lesauvage

Hi List !

I have a performance problem, but I am not sure whether it really
is a problem or not.
I am running a fresh install of PostgreSQL 8.1.4 on Windows 2000.
The server is a dual-Opteron with 2GB of RAM. PostgreSQL's data
folder is on a RAID-0 array of 2 SATA WD Raptor drives (10,000
rpm, 8MB cache).

I have a very simple table, with only ~500 rows :
CREATE TABLE table1
(
   gid int4 NOT NULL DEFAULT 0,
   field1 varchar(45) NOT NULL,
   field2 int2 NOT NULL DEFAULT 1,
   field3 int2 NOT NULL DEFAULT 0,
   field4 int2 NOT NULL DEFAULT 1,
   field5 int4 NOT NULL DEFAULT -1,
   field6 int4,
   field7 int4,
   field8 int4,
   field9 int2 DEFAULT 1,
   CONSTRAINT table1_pkey PRIMARY KEY (gid)
)
WITHOUT OIDS;

The problem is that simple select queries with the primary key in the 
WHERE statement take very long to run.

For example, this query returns only 7 rows and takes about 1
second to run !
SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41);

EXPLAIN ANALYZE SELECT * FROM table1 WHERE gid in
(33,110,65,84,92,94,13,7,68,41);

  QUERY PLAN
------------------------------------------------------------------
  Seq Scan on table1  (cost=0.00..23.69 rows=10 width=35) (actual
time=0.023..0.734 rows=7 loops=1)
Filter: ((gid = 33) OR (gid = 110) OR (gid = 65) OR (gid = 84)
OR (gid = 92) OR (gid = 94) OR (gid = 13) OR (gid = 7) OR (gid =
68) OR (gid = 41))
  Total runtime: 0.801 ms
(3 rows)

I have run VACUUM FULL on this table many times... I don't know
what to try next !
What is wrong here (because I hope that something is wrong) ?
Thanks a lot for your help !

Regards
--
Arnaud




Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Arnaud Lesauvage

Steinar H. Gunderson wrote:

  Total runtime: 0.801 ms


0.801 ms is _far_ under a second... Where do you have the latter timing from?


I feel stupid...
Sorry for the useless message...







Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Alexander Staubo

On Oct 3, 2006, at 13:25 , Arnaud Lesauvage wrote:

The problem is that simple select queries with the primary key in  
the WHERE statement take very long to run.

For example, this query returns only 7 rows and takes about 1
second to run !
SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41);


This is a very small table, but generally speaking, such queries
benefit from an index; e.g.,


  create index table1_gid on table1 (gid);

Note that PostgreSQL may still perform a sequential scan if it thinks
this has a lower cost, e.g. for small tables that span just a few pages.



I have run VACUUM FULL on this table many times... I don't know
what to try next !


PostgreSQL's query planner relies on table statistics to perform
certain optimizations; make sure you run "analyze table1".


Alexander.



Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Tobias Brox
[Arnaud Lesauvage - Tue at 01:25:10PM +0200]
 I have a performance problem, but I am not sure whether it really
 is a problem or not.
   QUERY PLAN
 --
   Seq Scan on table1  (cost=0.00..23.69 rows=10 width=35) (actual
 time=0.023..0.734 rows=7 loops=1)
 Filter: ((gid = 33) OR (gid = 110) OR (gid = 65) OR (gid = 84)
 OR (gid = 92) OR (gid = 94) OR (gid = 13) OR (gid = 7) OR (gid =
 68) OR (gid = 41))
   Total runtime: 0.801 ms
 (3 rows)
 
 I have run VACUUM FULL on this table many times... I don't know
 what to try next !
 What is wrong here (because I hope that something is wrong) ?
 Thanks a lot for your help !

Did you try analyze as well?  It's weird that it's using a seq scan; since
you have a primary key, it's supposed to have an index ... though 500
rows is not much.

I just checked our own production database; it takes 0.08 ms to fetch a
row by ID from one of our tables containing 176k rows.



Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Tobias Brox
[Tobias Brox - Tue at 02:10:04PM +0200]
 Did you try analyze as well?  It's weird that it's using a seq scan; since
 you have a primary key, it's supposed to have an index ... though 500
 rows is not much.
 
 I just checked our own production database; it takes 0.08 ms to fetch a
 row by ID from one of our tables containing 176k rows.

Oh, the gid is not a primary key.  I guess I should also apologize for
adding noise here :-)

Make an index here! :-)



Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Arnaud Lesauvage

Tobias Brox wrote:

Oh, the gid is not a primary key.  I guess I should also apologize for
adding noise here :-)


Yes, it is a primary key, but I am the noise maker here ! ;-)



Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Tobias Brox
[Arnaud Lesauvage - Tue at 02:13:59PM +0200]
 Tobias Brox wrote:
 Oh, the gid is not a primary key.  I guess I should also apologize for
 adding noise here :-)
 
 Yes, it is a primary key, but I am the noise maker here ! ;-)

Oh - it is.  How can you have a default value on a primary key?  Will it
use the index if you do analyze?  Is there an index on the table at
all - does it show up if you ask for a description of the table (\d
tablename)?



Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Arnaud Lesauvage

Tobias Brox wrote:

[Arnaud Lesauvage - Tue at 02:13:59PM +0200]

Tobias Brox wrote:
Oh, the gid is not a primary key.  I guess I should also apologize for
adding noise here :-)

Yes, it is a primary key, but I am the noise maker here ! ;-)

Oh - it is.  How can you have a default value on a primary key?

Good question, but I am not the DB designer in that case.

Will it use the index if you do analyze?  Is there an index on the table
at all - does it show up if you ask for a description of the table (\d
tablename)?

In this case (a simplified version of the real case), the pkey is the
only index. It is used if I only ask for one row (WHERE gid=33).




Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Guillaume Cottenceau
Tobias Brox tobias 'at' nordicbet.com writes:

 Oh - it is.  How can you have a default value on a primary key?  Will it

you can but it is useless :)

foo=# create table bar (uid int primary key default 0, baz text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "bar_pkey" for
table "bar"
CREATE TABLE
foo=# insert into bar (baz) values ('');
INSERT 217426996 1
foo=# insert into bar (baz) values ('');
ERROR:  duplicate key violates unique constraint "bar_pkey"

-- 
Guillaume Cottenceau
Create your personal SMS or WAP Service - visit http://mobilefriends.ch/



Re: [PERFORM] Unsubscribe

2006-10-03 Thread Geoffrey

uwcssa wrote:

 Please unsubscribe me!  Thank you!


Also, it would be better to have a message footer saying how to unsubscribe.


It would be better if you had paid attention when you subscribed
as to how to unsubscribe.


--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin



Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-03 Thread Merlin Moncure

On 10/3/06, Carlo Stonebanks [EMAIL PROTECTED] wrote:

Some very helpful people had asked that I post the troublesome code that was
generated by my import program.

I installed a SQL log feature in my import program. I have
posted samples of the SQL statements that cause the biggest delays.


explain analyze is more helpful because it prints the times.

sample 1, a couple of questions:
what is the purpose of limit 1?
if you break up the 'or' which checks facility and address into two
separate queries, are the two queries' total times more, less, or the same
as the large query?

merlin



Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Darcy Buskermolen
On October 3, 2006 04:25 am, Arnaud Lesauvage wrote:
 Hi List !

 I have a performance problem, but I am not sure whether it really
 is a problem or not.
 I am running a fresh install of PostgreSQL 8.1.4 on Windows2000.
 The server is a bi-opteron with 2GB of RAM. The PostgreSQL's data
 folder is on a RAID-0 array of 2 SATA WD Raptor drives (10.000
 rpm, 8MB cache).

 I have a very simple table, with only ~500 rows :
 CREATE TABLE table1
 (
 gid int4 NOT NULL DEFAULT 0,
 field1 varchar(45) NOT NULL,
 field2 int2 NOT NULL DEFAULT 1,
 field3 int2 NOT NULL DEFAULT 0,
 field4 int2 NOT NULL DEFAULT 1,
 field5 int4 NOT NULL DEFAULT -1,
 field6 int4,
 field7 int4,
 field8 int4,
 field9 int2 DEFAULT 1,
 CONSTRAINT table1_pkey PRIMARY KEY (gid)
 )
 WITHOUT OIDS;

 The problem is that simple select queries with the primary key in the
 WHERE statement take very long to run.
 For example, this query returns only 7 rows and takes about 1
 second to run !

According to your explain analyze, it's taking 0.8 of a millisecond (less
than a 1000th of a second), so I can't see how this can possibly be sped up.

 SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41);

 EXPLAIN ANALYZE SELECT * FROM table1 WHERE gid in
 (33,110,65,84,92,94,13,7,68,41);

   QUERY PLAN
 ------------------------------------------------------------------
 Seq Scan on table1  (cost=0.00..23.69 rows=10 width=35) (actual
 time=0.023..0.734 rows=7 loops=1)
  Filter: ((gid = 33) OR (gid = 110) OR (gid = 65) OR (gid = 84)
 OR (gid = 92) OR (gid = 94) OR (gid = 13) OR (gid = 7) OR (gid =
 68) OR (gid = 41))
Total runtime: 0.801 ms
 (3 rows)

 I have run VACUUM FULL on this table many times... I don't know
 what to try next !
 What is wrong here (because I hope that something is wrong) ?
 Thanks a lot for your help !

 Regards
 --
 Arnaud



-- 
Darcy Buskermolen
Command Prompt, Inc.
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997
http://www.commandprompt.com/



Re: [PERFORM] Poor performance on very simple query ?

2006-10-03 Thread Darcy Buskermolen
On October 3, 2006 05:08 am, Alexander Staubo wrote:
 On Oct 3, 2006, at 13:25 , Arnaud Lesauvage wrote:
  The problem is that simple select queries with the primary key in
  the WHERE statement take very long to run.
  For example, this query returns only 7 rows and takes about 1
  second to run !
  SELECT * FROM table1 WHERE gid in (33,110,65,84,92,94,13,7,68,41);

 This is a very small table, but generally speaking, such queries
 benefit from an index; eg.,

create index table1_gid on table1 (gid);

gid is a PRIMARY KEY, so it will already have an index in place.

 Note that PostgreSQL may still perform a sequential scan if it thinks
 this has a lower cost, eg. for small tables that span just a few pages.

  I have run VACUUM FULL on this table many times... I don't know
  what to try next !

 PostgreSQL's query planner relies on table statistics to perform
 certain optimizations; make sure you run analyze table1.

 Alexander.


-- 
Darcy Buskermolen
Command Prompt, Inc.
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997
http://www.commandprompt.com/



Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-03 Thread Alex Stapleton

On 3 Oct 2006, at 16:04, Merlin Moncure wrote:


On 10/3/06, Carlo Stonebanks [EMAIL PROTECTED] wrote:
Some very helpful people had asked that I post the troublesome  
code that was

generated by my import program.

I installed a SQL log feature in my import program. I have
posted samples of the SQL statements that cause the biggest delays.


explain analyze is more helpful because it prints the times.


You can always use the \timing flag in psql ;)

l1_historical=# \timing
Timing is on.
l1_historical=# select 1;
 ?column?
----------
        1
(1 row)

Time: 4.717 ms







Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-03 Thread Carlo Stonebanks
 explain analyze is more helpful because it prints the times.

Sorry, this runs in-line in my code, and I didn't want to slow the 
already-slow program with explain analyze. I have run it outside of the code 
in its own query. The new results are below.

 sample 1, couple questions:
 what is the purpose of limit 1?

I don't need the results themselves, I just need to know whether any data
meeting these criteria exists.

 if you break up the 'or' which checks facility and address into two
 separate queries, are the two queries total times more, less, or same
 as the large query.

They are much less; I had assumed that SQL would use lazy evaluation in this
case, not bothering to perform one half of the OR condition if the other
half had already matched. But the single query is much heavier than the two
separate ones.
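
Roughly, the split looks like this (each half can then get its own plan;
the postal-code test is written as like here):

-- half 1: match on the facility's linked address
select f.facility_id, provider_practice_id
from mdx_core.provider_practice as pp
join mdx_core.facility as f on f.facility_id = pp.facility_id
join mdx_core.facility_address as fa on fa.facility_id = pp.facility_id
join mdx_core.address as a on a.address_id = fa.address_id
where pp.provider_id = 1411311
and f.facility_type_code != 'P'
and pp.facility_address_id is not null
and a.state_code = 'NY'
and '10001-2382' like a.postal_code||'%'
and a.city = 'New York'
limit 1;

-- half 2: match on the facility's default address
select f.facility_id, provider_practice_id
from mdx_core.provider_practice as pp
join mdx_core.facility as f on f.facility_id = pp.facility_id
where pp.provider_id = 1411311
and f.facility_type_code != 'P'
and f.default_state_code = 'NY'
and '10001-2382' like f.default_postal_code||'%'
and f.default_city = 'New York'
limit 1;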

Carlo


 merlin


select
f.facility_id,
provider_practice_id
from
mdx_core.provider_practice as pp
join mdx_core.facility as f
on f.facility_id = pp.facility_id
join mdx_core.facility_address as fa
on fa.facility_id = pp.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
pp.provider_id = 1411311
and f.facility_type_code != 'P'
and (
pp.facility_address_id is not null
and a.state_code = 'NY'
and '10001-2382' = a.postal_code||'%'
and a.city = 'New York'
) or (
f.default_state_code = 'NY'
and '10001-2382' like f.default_postal_code||'%'
and f.default_city = 'New York'
)
limit 1

Limit  (cost=3899.18..22561.46 rows=1 width=8) (actual
time=9410.970..9410.970 rows=0 loops=1)
  ->  Hash Join  (cost=3899.18..97210.58 rows=5 width=8) (actual
time=9410.966..9410.966 rows=0 loops=1)
        Hash Cond: ("outer".address_id = "inner".address_id)
        Join Filter: ((("outer".provider_id = 1411311) AND
("outer".facility_type_code <> 'P'::bpchar) AND ("outer".facility_address_id
IS NOT NULL) AND (("inner".state_code)::text = 'NY'::text) AND
('10001-2382'::text = (("inner".postal_code)::text || '%' (..)
        ->  Merge Join  (cost=0.00..51234.97 rows=801456 width=57) (actual
time=0.314..6690.241 rows=685198 loops=1)
              Merge Cond: ("outer".facility_id = "inner".facility_id)
              ->  Merge Join  (cost=0.00..15799.46 rows=128268 width=49)
(actual time=0.197..1637.553 rows=128268 loops=1)
                    Merge Cond: ("outer".facility_id = "inner".facility_id)
                    ->  Index Scan using facility_pkey on facility f
(cost=0.00..13247.94 rows=176864 width=41) (actual time=0.145..591.219
rows=126624 loops=1)
                    ->  Index Scan using facility_address_facility_idx on
facility_address fa  (cost=0.00..4245.12 rows=128268 width=8) (actual
time=0.041..384.632 rows=128268 loops=1)
              ->  Index Scan using provider_practice_facility_idx on
provider_practice pp  (cost=0.00..30346.89 rows=489069 width=16) (actual
time=0.111..3031.675 rows=708714 loops=1)
        ->  Hash  (cost=3650.54..3650.54 rows=99454 width=36) (actual
time=478.509..478.509 rows=99454 loops=1)
              ->  Seq Scan on address a  (cost=0.00..3650.54 rows=99454
width=36) (actual time=0.033..251.203 rows=99454 loops=1)
Total runtime: 9412.654 ms

--
Sample 2:
This one includes a call to a custom function which performs lexical
comparisons
and returns a rating on the likelihood that the company names refer to the
same
facility. Replacing the code:
mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as
comp
with
  1 as comp
-- to avoid the function call only shaved a fragment off the execution time,
which leads me to believe my problem is in the SQL structure itself.
--
select
mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3 Inc', name) as
comp,
facil.*
from (
select
f.facility_id,
fa.facility_address_id,
a.address_id,
f.facility_type_code,
f.name,
a.address,
a.city,
a.state_code,
a.postal_code,
a.country_code
from
mdx_core.facility as f
join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
join mdx_core.address as a
on a.address_id = fa.address_id
where
facility_address_id is not null
and a.country_code = 'US'
and a.state_code = 'IL'
and '60640-5759' like a.postal_code||'%'
union select
f.facility_id,
null as facility_address_id,
null as address_id,
f.facility_type_code,
f.name,
null as address,
f.default_city as city,
f.default_state_code as state_code,
f.default_postal_code as postal_code,
f.default_country_code as country_code
from
mdx_core.facility as f
left outer join mdx_core.facility_address as fa
on fa.facility_id = f.facility_id
where
facility_address_id is null
and f.default_country_code = 'US'
and '60640-5759' like f.default_postal_code||'%'
) as facil
order by comp

Sort  (cost=21565.50..21567.78 rows=909 width=236) (actual
time=1622.448..1622.456 rows=12 loops=1)
  Sort Key: mdx_lib.lex_compare('Vhs Acquisition Subsidiary Number 3
Inc'::text, (name)::text)
  ->  Subquery Scan facil  (cost=21484.47..21520.83 rows=909

Re: [PERFORM] Unsubscribe

2006-10-03 Thread felix
On Mon, Oct 02, 2006 at 01:36:17PM -0400, uwcssa wrote:
  Please unsubscribe me!  Thank you!
 
 
  Also, it would be better to have a message footer saying how to unsubscribe.

Will this do?  It's too big for a footer.

Here's how to unsubscribe:

First, ask your Internet Provider to mail you an Unsubscribing Kit.
Then follow these directions.

The kit will most likely be the standard no-fault type. Depending on
requirements, System A and/or System B can be used. When operating
System A, depress lever and a plastic dalkron unsubscriber will be
dispensed through the slot immediately underneath. When you have
fastened the adhesive lip, attach connection marked by the large X
outlet hose. Twist the silver-coloured ring one inch below the
connection point until you feel it lock.

The kit is now ready for use. The Cin-Eliminator is activated by the
small switch on the lip.  When securing, twist the ring back to its
initial condition, so that the two orange lines meet.  Disconnect.
Place the dalkron unsubscriber in the vacuum receptacle to the rear.
Activate by pressing the blue button.

The controls for System B are located on the opposite side. The red
release switch places the Cin-Eliminator into position; it can be
adjusted manually up or down by pressing the blue manual release
button. The opening is self-adjusting. To secure after use, press the
green button, which simultaneously activates the evaporator and
returns the Cin-Eliminator to its storage position.

You may log off if the green exit light is on over the evaporator.  If
the red light is illuminated, one of the Cin-Eliminator requirements
has not been properly implemented. Press the List Guy call button on
the right of the evaporator. He will secure all facilities from his
control panel.

To use the Auto-Unsub, first undress and place all your clothes in the
clothes rack. Put on the velcro slippers located in the cabinet
immediately below. Enter the shower, taking the entire kit with
you. On the control panel to your upper right upon entering you will
see a Shower seal button. Press to activate. A green light will then
be illuminated immediately below. On the intensity knob, select the
desired setting. Now depress the Auto-Unsub activation lever. Bathe
normally.

The Auto-Unsub will automatically go off after three minutes unless
you activate the Manual off override switch by flipping it up. When
you are ready to leave, press the blue Shower seal release
button. The door will open and you may leave. Please remove the velcro
slippers and place them in their container.

If you prefer the ultrasonic log-off mode, press the indicated blue
button. When the twin panels open, pull forward by rings A & B. The
knob to the left, just below the blue light, has three settings, low,
medium or high. For normal use, the medium setting is suggested.

After these settings have been made, you can activate the device by
switching to the ON position the clearly marked red switch. If
during the unsubscribing operation you wish to change the settings,
place the manual off override switch in the OFF position. You may
now make the change and repeat the cycle. When the green exit light
goes on, you may log off and have lunch. Please close the door behind
you.


-- 
... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._.
 Felix Finch: scarecrow repairman & rocket surgeon / [EMAIL PROTECTED]
  GPG = E987 4493 C860 246C 3B1E  6477 7838 76E9 182E 8151 ITAR license #4933
I've found a solution to Fermat's Last Theorem but I see I've run out of room o



Re: [PERFORM] Unsubscribe

2006-10-03 Thread Carlo Stonebanks
I got one of these last Christmas. It works great, but the device has no
obvious power source and now I can't find my cat.

God help me when I accidentally try to unsubscribe like that ..
Carlo


[EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 On Mon, Oct 02, 2006 at 01:36:17PM -0400, uwcssa wrote:
  Please unsubscribe me!  Thank you!


  Also, it would be better to have a message footer saying how to
  unsubscribe.

 Will this do?  It's too big for a footer.

 [...]
 





Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-03 Thread Carlo Stonebanks
Please ignore sample 1 - now that I have the logging feature, I can see that
my query generator algorithm made an error.

The SQL of concern is now sample 2.





Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Graham Davis

Hi,

Adding DESC to both columns in the ORDER BY did not make the query use
the multikey index.   So both


SELECT DISTINCT ON (assetid) assetid, ts
FROM asset_positions 
ORDER BY assetid, ts DESC;


and

SELECT DISTINCT ON (assetid) assetid, ts
FROM asset_positions 
ORDER BY assetid DESC, ts DESC;


use the same query plans and both do sequential scans without using either the 
(assetid, ts) or (ts) indexes.  Any other ideas on how to make this query use 
an index?  Thanks,

--
Graham Davis
Refractions Research Inc.
[EMAIL PROTECTED]




On Wed, Sep 27, 2006 at 20:56:32 +0000,
 Graham Davis [EMAIL PROTECTED] wrote:
 

SELECT assetid, max(ts) AS ts 
FROM asset_positions 
GROUP BY assetid;


I have an index on (ts), another index on (assetid) and a multikey index on
(assetid, ts).  I know the assetid index is pointless since the multikey one
takes its place, but I put it there while testing just to make sure.  The
ANALYZE EXPLAIN for this query is:

QUERY PLAN
------------------------------------------------------------------
HashAggregate  (cost=125423.96..125424.21 rows=20 width=12) (actual
time=39693.995..39694.036 rows=20 loops=1)
  ->  Seq Scan on asset_positions  (cost=0.00..116654.64 rows=1753864
width=12) (actual time=20002.362..34724.896 rows=1738693 loops=1)
Total runtime: 39694.245 ms
(3 rows)

You can see it is doing a sequential scan on the table when it should be
using the (assetid, ts) index, or at the very least the (ts) index.  This
query takes about 40 seconds to complete with a table of 1.7 million rows. 
I tested running the query without the group by as follows:
SELECT DISTINCT ON (assetid) assetid, ts
FROM asset_positions 
ORDER BY assetid, ts DESC;
This is almost what you want to do to get an alternative plan. But you
need to ORDER BY assetid DESC, ts DESC to make use of the multicolumn
index. If you really need the other output order, reverse it in your
application or use the above as a subselect in another query that orders
by assetid ASC.


Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Chris Browne
[EMAIL PROTECTED] (Graham Davis) writes:
 Adding DESC to both columns in the ORDER BY did not make the query use
 the multikey index.   So both

 SELECT DISTINCT ON (assetid) assetid, ts
 FROM asset_positions ORDER BY assetid, ts DESC;

 and

 SELECT DISTINCT ON (assetid) assetid, ts
 FROM asset_positions ORDER BY assetid DESC, ts DESC;

 use the same query plans and both do sequential scans without using
 either the (assetid, ts) or (ts) indexes.  Any other ideas on how to
 make this query use an index?  Thanks,

Why do you want to worsen performance by forcing the use of an index?

You are reading through the entire table, after all, and doing so via
a sequential scan is normally the fastest way to do that.  An index
scan would only be more efficient if you don't have enough space in
memory to store all assetid values.
-- 
(reverse (concatenate 'string gro.mca @ enworbbc))
http://www3.sympatico.ca/cbbrowne/emacs.html
Expect the unexpected.
-- The Hitchhiker's Guide to the Galaxy, page 7023



Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Graham Davis
The asset_positions table has about 1.7 million rows, and this query 
takes over 40 seconds to do a sequential scan.  Initially I was trying 
to get the original query:


SELECT assetid, max(ts) AS ts 
FROM asset_positions 
GROUP BY assetid;


to use the multikey index, since I read that PostgreSQL 8 added support
for aggregates to use indexes.  However, the GROUP BY was causing the query
plan to not use any index (removing the GROUP BY allowed the query to
use the ts index, and it took only 50 ms to run).  Since I need the query
to find the max time for EACH asset, I can't just drop the GROUP BY from
my query.  So I was trying some alternate ways of writing the query (as
described in the email below) to force the use of one of these indexes.

40 seconds is much too slow for this query to run and I'm assuming that 
the use of an index will make it much faster (as seen when I removed the 
GROUP BY clause).  Any tips?


Graham.


Chris Browne wrote:


[EMAIL PROTECTED] (Graham Davis) writes:
Adding DESC to both columns in the ORDER BY did not make the query use
the multikey index.   So both

SELECT DISTINCT ON (assetid) assetid, ts
FROM asset_positions ORDER BY assetid, ts DESC;

and

SELECT DISTINCT ON (assetid) assetid, ts
FROM asset_positions ORDER BY assetid DESC, ts DESC;

use the same query plans and both do sequential scans without using
either the (assetid, ts) or (ts) indexes.  Any other ideas on how to
make this query use an index?  Thanks,
Why do you want to worsen performance by forcing the use of an index?

You are reading through the entire table, after all, and doing so via
a sequential scan is normally the fastest way to do that.  An index
scan would only be more efficient if you don't have enough space in
memory to store all assetid values.
--
Graham Davis
Refractions Research Inc.
[EMAIL PROTECTED]




Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Graham Davis
Also, the multikey index on (assetid, ts) would already be sorted, and
that is why using such an index in this case is faster than doing a
sequential scan that does the sorting afterwards.

Graham.


Chris Browne wrote:


[EMAIL PROTECTED] (Graham Davis) writes:
Adding DESC to both columns in the ORDER BY did not make the query use
the multikey index.   So both

SELECT DISTINCT ON (assetid) assetid, ts
FROM asset_positions ORDER BY assetid, ts DESC;

and

SELECT DISTINCT ON (assetid) assetid, ts
FROM asset_positions ORDER BY assetid DESC, ts DESC;

use the same query plans and both do sequential scans without using
either the (assetid, ts) or (ts) indexes.  Any other ideas on how to
make this query use an index?  Thanks,
Why do you want to worsen performance by forcing the use of an index?

You are reading through the entire table, after all, and doing so via
a sequential scan is normally the fastest way to do that.  An index
scan would only be more efficient if you don't have enough space in
memory to store all assetid values.

--
Graham Davis
Refractions Research Inc.
[EMAIL PROTECTED]




Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-03 Thread Merlin Moncure

On 10/3/06, Carlo Stonebanks [EMAIL PROTECTED] wrote:

Please ignore sample 1 - now that I have the logging feature, I can see that
my query generator algorithm made an error.


can you do an explain analyze on the two select queries on either side of
the union separately?  the subquery is correctly written and unlikely
to be a problem (in fact, good style imo).  so let's have a look at
both sides of the facil query and see where the problem is.

merlin



Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Chris Browne
[EMAIL PROTECTED] (Graham Davis) writes:
 40 seconds is much too slow for this query to run and I'm assuming
 that the use of an index will make it much faster (as seen when I
 removed the GROUP BY clause).  Any tips?

Assumptions are dangerous things.

An aggregate like this has *got to* scan the entire table, and given
that that is the case, an index scan is NOT optimal; a seq scan is.

An index scan is just going to be slower.
-- 
let name=cbbrowne and tld=linuxdatabases.info in String.concat @ 
[name;tld];;
http://cbbrowne.com/info/linux.html
The computer is the ultimate polluter: its feces are
indistinguishable from the food it produces. -- Alan J. Perlis



Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Graham Davis
How come an aggregate like that has to use a sequential scan?  I know
that PostgreSQL used to have to do a sequential scan for all aggregates,
but support was added in version 8 so that aggregates would take
advantage of indexes.   This is why


SELECT max(ts) AS ts
FROM asset_positions;

Uses an index on the ts column and takes only 50 milliseconds.  When I
added the GROUP BY, it would not use the multikey index or any other
index.   Is there just no support for aggregates to use multikey
indexes?  Sorry to be so pushy, but I just want to make sure I
understand why the above query can use an index and the following can't:


SELECT assetid, max(ts) AS ts
FROM asset_positions
GROUP BY assetid;

--
Graham Davis
Refractions Research Inc.
[EMAIL PROTECTED]



Chris Browne wrote:


[EMAIL PROTECTED] (Graham Davis) writes:
40 seconds is much too slow for this query to run and I'm assuming
that the use of an index will make it much faster (as seen when I
removed the GROUP BY clause).  Any tips?
Assumptions are dangerous things.

An aggregate like this has *got to* scan the entire table, and given
that that is the case, an index scan is NOT optimal; a seq scan is.

An index scan is just going to be slower.



Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Bruno Wolff III
On Tue, Oct 03, 2006 at 12:13:43 -0700,
  Graham Davis [EMAIL PROTECTED] wrote:
 Also, the multikey index on (assetid, ts) would already be sorted, and
 that is why using such an index in this case is faster than doing a
 sequential scan that does the sorting afterwards.

That isn't necessarily true. The sequential scan and sort will need a lot
fewer disk seeks, and could run faster than an index scan that has
the disk drives doing seeks for every tuple (in the worst case, where
the on-disk order of tuples doesn't match the order in the index).

If your server is caching most of the blocks, then the index scan might
give better results. You might try disabling sequential scans to
try to coerce the other plan and see what results you get. If it is
substantially faster the other way, then you might want to look at lowering
the random page cost factor. However, since this can affect other queries,
you need to be careful that you don't speed up one query at the expense
of a lot of other queries.
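
A quick, session-local way to try both experiments:

SET enable_seqscan = off;   -- heavily penalizes (does not forbid) seq scans
EXPLAIN ANALYZE
SELECT assetid, max(ts) AS ts
FROM asset_positions
GROUP BY assetid;
SET enable_seqscan = on;    -- restore the default

SET random_page_cost = 2;   -- the other knob mentioned above; default is 4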



Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Tom Lane
Graham Davis [EMAIL PROTECTED] writes:
 How come an aggregate like that has to use a sequential scan?  I know
 that PostgreSQL used to have to do a sequential scan for all aggregates,
 but support was added in version 8 so that aggregates would take
 advantage of indexes.

Not in a GROUP BY context, only for the simple case.  Per the comment in
planagg.c:

 * We don't handle GROUP BY, because our current implementations of
 * grouping require looking at all the rows anyway, and so there's not
 * much point in optimizing MIN/MAX.

The problem is that using an index to obtain the maximum value of ts for
a given value of assetid is not the same thing as finding out what all
the distinct values of assetid are.

This could possibly be improved but it would take a considerable amount
more work.  It's definitely not in the category of bug fix.
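
For illustration, one workaround sketch (it assumes the set of distinct
assetids is small, and whether each inner max() really turns into an index
probe on (assetid, ts) should be confirmed with explain analyze):

SELECT a.assetid,
       (SELECT max(ts)
          FROM asset_positions p
         WHERE p.assetid = a.assetid) AS ts
FROM (SELECT DISTINCT assetid FROM asset_positions) AS a;

-- the outer DISTINCT still scans once; if a small lookup table of assets
-- exists, selecting assetid from it instead avoids that scan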

regards, tom lane



Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Graham Davis
Thanks Tom, that explains it and makes sense.  I guess I will have to 
accept this query taking 40 seconds, unless I can figure out another way 
to write it so it can use indexes.  If there are any more syntax 
suggestions, please pass them on.  Thanks for the help everyone.


Graham.


Tom Lane wrote:


Graham Davis [EMAIL PROTECTED] writes:
How come an aggregate like that has to use a sequential scan?  I know
that PostgreSQL used to have to do a sequential scan for all aggregates,
but support was added in version 8 so that aggregates would take
advantage of indexes.
Not in a GROUP BY context, only for the simple case.  Per the comment in
planagg.c:

 * We don't handle GROUP BY, because our current implementations of
 * grouping require looking at all the rows anyway, and so there's not
 * much point in optimizing MIN/MAX.

The problem is that using an index to obtain the maximum value of ts for
a given value of assetid is not the same thing as finding out what all
the distinct values of assetid are.

This could possibly be improved but it would take a considerable amount
more work.  It's definitely not in the category of bug fix.

regards, tom lane

--
Graham Davis
Refractions Research Inc.
[EMAIL PROTECTED]




Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Mark Lewis
Have you looked into a materialized view sort of approach?  You could
create a table which had assetid as a primary key, and max_ts as a
column.  Then use triggers to keep that table up to date as rows are
added/updated/removed from the main table.

This approach would only make sense if there were far fewer distinct
assetid values than rows in the main table, and would get slow if you
commonly delete rows from the main table or decrease the value for ts in
the row with the highest ts for a given assetid.
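
A sketch of the trigger side (hypothetical names, insert path only,
concurrency handling omitted; written so that late-arriving rows cannot
move the stored maximum backwards):

CREATE TABLE asset_latest (
    assetid integer PRIMARY KEY,
    max_ts  timestamp NOT NULL
);

CREATE OR REPLACE FUNCTION asset_latest_upd() RETURNS trigger AS $$
BEGIN
    -- only ever advance the stored maximum
    UPDATE asset_latest
       SET max_ts = NEW.ts
     WHERE assetid = NEW.assetid
       AND max_ts < NEW.ts;
    IF NOT FOUND THEN
        -- insert only if this asset has no row yet
        INSERT INTO asset_latest (assetid, max_ts)
        SELECT NEW.assetid, NEW.ts
        WHERE NOT EXISTS
            (SELECT 1 FROM asset_latest WHERE assetid = NEW.assetid);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER asset_latest_trg
AFTER INSERT ON asset_positions
FOR EACH ROW EXECUTE PROCEDURE asset_latest_upd();

Deletes and updates on asset_positions would still need the slow
recomputation described above.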

-- Mark Lewis

On Tue, 2006-10-03 at 13:52 -0700, Graham Davis wrote:
 Thanks Tom, that explains it and makes sense.  I guess I will have to 
 accept this query taking 40 seconds, unless I can figure out another way 
 to write it so it can use indexes.  If there are any more syntax 
 suggestions, please pass them on.  Thanks for the help everyone.
 
 Graham.
 
 
 Tom Lane wrote:
 
 Graham Davis [EMAIL PROTECTED] writes:
   
 
 How come an aggreate like that has to use a sequential scan?  I know 
 that PostgreSQL use to have to do a sequential scan for all aggregates, 
 but there was support added to version 8 so that aggregates would take 
 advantage of indexes.
 
 
 
 Not in a GROUP BY context, only for the simple case.  Per the comment in
 planagg.c:
 
   * We don't handle GROUP BY, because our current implementations of
   * grouping require looking at all the rows anyway, and so there's not
   * much point in optimizing MIN/MAX.
 
 The problem is that using an index to obtain the maximum value of ts for
 a given value of assetid is not the same thing as finding out what all
 the distinct values of assetid are.
 
 This could possibly be improved but it would take a considerable amount
 more work.  It's definitely not in the category of bug fix.
 
  regards, tom lane
   
 
 
 



Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Graham Davis
The summary table approach maintained by triggers is something we are 
considering, but it becomes a bit more complicated to implement.  
Currently we have groups of new positions coming in every few seconds or 
less.  They are not guaranteed to be in order.  So for instance, a group 
of positions from today could come in and be inserted, then a group of 
positions that got lost from yesterday could come in and be inserted 
afterwards. 

This means the triggers would have to do some sort of logic to figure 
out if the newly inserted position is actually the most recent by 
timestamp.  If positions are ever deleted or updated, the same sort of 
query that is currently running slow will need to be executed in order 
to get the new most recent position.  So there is the possibility that 
new positions can be inserted faster than the triggers can calculate 
and  maintain the summary table.  There are some other complications 
with maintaining such a summary table in our system too, but I won't get 
into those.
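
A minimal sketch of that conditional trigger logic (hypothetical names,
reusing the asset_latest summary table idea from Mark's suggestion;
8.0-style dollar quoting; concurrency and the harder delete/update cases
deliberately left out):

CREATE OR REPLACE FUNCTION asset_latest_on_insert() RETURNS trigger AS $$
BEGIN
    IF EXISTS (SELECT 1 FROM asset_latest WHERE assetid = NEW.assetid) THEN
        UPDATE asset_latest
           SET max_ts = NEW.ts
         WHERE assetid = NEW.assetid
           AND max_ts < NEW.ts;          -- no-op for late, out-of-order rows
    ELSE
        INSERT INTO asset_latest (assetid, max_ts)
        VALUES (NEW.assetid, NEW.ts);    -- first position seen for this assetid
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER asset_positions_latest
    AFTER INSERT ON asset_positions
    FOR EACH ROW EXECUTE PROCEDURE asset_latest_on_insert();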


Right now I'm just trying to see if I can get the query itself running 
faster, which would be the easiest solution for now.


Graham.


Mark Lewis wrote:


Have you looked into a materialized view sort of approach?  You could
create a table which had assetid as a primary key, and max_ts as a
column.  Then use triggers to keep that table up to date as rows are
added/updated/removed from the main table.

This approach would only make sense if there were far fewer distinct
assetid values than rows in the main table, and would get slow if you
commonly delete rows from the main table or decrease the value for ts in
the row with the highest ts for a given assetid.

-- Mark Lewis

--
Graham Davis
Refractions Research Inc.
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Mark Lewis
Hmmm.  How many distinct assetids are there?
-- Mark Lewis

On Tue, 2006-10-03 at 14:23 -0700, Graham Davis wrote:
 The summary table approach maintained by triggers is something we are 
 considering, but it becomes a bit more complicated to implement.  
 Currently we have groups of new positions coming in every few seconds or 
 less.  They are not guaranteed to be in order.  So for instance, a group 
 of positions from today could come in and be inserted, then a group of 
 positions that got lost from yesterday could come in and be inserted 
 afterwards. 
 
 This means the triggers would have to do some sort of logic to figure 
 out if the newly inserted position is actually the most recent by 
 timestamp.  If positions are ever deleted or updated, the same sort of 
 query that is currently running slow will need to be executed in order 
 to get the new most recent position.  So there is the possibility that 
 new positions can be inserted faster than the triggers can calculate 
 and  maintain the summary table.  There are some other complications 
 with maintaining such a summary table in our system too, but I won't get 
 into those.
 
 Right now I'm just trying to see if I can get the query itself running 
 faster, which would be the easiest solution for now.
 
 Graham.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Graham Davis
Not many.  It fluctuates, but there are usually only ever a few hundred 
at most.  Each assetid has multi-millions of positions though.


Mark Lewis wrote:


Hmmm.  How many distinct assetids are there?
-- Mark Lewis


--
Graham Davis
Refractions Research Inc.
[EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] PostgreSQL Caching

2006-10-03 Thread Adnan DURSUN

   Hi,

    I wonder how PostgreSQL caches SQL query results. For example:

    * does Postgres cache, in memory, the query results produced by session A?

    * can session B use those results?

Best Regards

Adnan DURSUN


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Mark Lewis
A few hundred is quite a lot for the next proposal and it's kind of an
ugly one, but might as well throw the idea out since you never know.

Have you considered creating one partial index per assetid?  Something
along the lines of CREATE INDEX asset_index_N ON asset_positions(ts)
WHERE assetid=N?  I'd guess that the planner probably wouldn't be smart
enough to use the partial indexes unless you issued a separate query for
each assetid, but each one of those queries should be really fast.  Of
course, this is all assuming that PG knows how to use partial indexes to
satisfy MAX queries; I'm not sure if it does.

-- Mark Lewis
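
Sketched out under Mark's stated assumptions (the index names and assetid
values are made up, and as he notes it is not certain the planner will use
these for MAX):

-- One partial index per distinct assetid value:
CREATE INDEX asset_index_1 ON asset_positions (ts) WHERE assetid = 1;
CREATE INDEX asset_index_2 ON asset_positions (ts) WHERE assetid = 2;
-- ...and so on, maintained as assetids come and go.

-- Then one separate query per assetid, each hopefully index-driven:
SELECT max(ts) FROM asset_positions WHERE assetid = 1;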

On Tue, 2006-10-03 at 14:35 -0700, Graham Davis wrote:
 Not many.  It fluctuates, but there are usually only ever a few hundred 
 at most.  Each assetid has multi-millions of positions though.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL Caching

2006-10-03 Thread Tomeh, Husam

Like many decent RDBMSs, the Postgresql server allocates its own shared
memory area where data is cached. When receiving a query request, the
Postgres engine first checks its shared memory buffers; if the data is
not found there, the engine performs disk I/O to retrieve it from the
PostgreSQL data files and places it in the shared buffer area before
serving it back to the client. Blocks in the shared buffers are shared
with other sessions and can therefore be accessed by other sessions.
The amount of shared buffer memory is set with the postgresql.conf
parameter shared_buffers.
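
As a rough way to watch the shared buffer cache at work (pg_stat_database
is a standard system view; the stats collector must be enabled for its
counters to advance):

-- blks_hit  = reads served from shared_buffers
-- blks_read = reads that had to go to the kernel/disk
SELECT datname, blks_hit, blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3)
           AS hit_ratio
  FROM pg_stat_database;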

Sincerely,

--
  Husam 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] BUG #2658: Query not using index

2006-10-03 Thread Tom Lane
Mark Lewis [EMAIL PROTECTED] writes:
 Have you considered creating one partial index per assetid?  Something
 along the lines of CREATE INDEX asset_index_N ON asset_positions(ts)
 WHERE assetid=N?  I'd guess that the planner probably wouldn't be smart
 enough to use the partial indexes unless you issued a separate query for
 each assetid, but each one of those queries should be really fast.

Actually, a single index on (assetid, ts) is sufficient to handle

select max(ts) from asset_positions where assetid = constant

The problem is to know what values of constant to issue the query for,
and this idea doesn't seem to help with that.

If Graham is willing to assume that the set of assetids changes slowly,
perhaps he could keep a summary table that contains all the valid
assetids (or maybe there already is such a table?  is assetid a foreign
key?) and do

select pk.assetid,
   (select max(ts) from asset_positions where assetid = pk.assetid)
from other_table pk;

I'm pretty sure the subselect would be planned the way he wants.
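
Spelled out as a sketch (the index name is made up, and other_table stands
for whatever table holds the set of valid assetids):

CREATE INDEX asset_positions_assetid_ts_idx
    ON asset_positions (assetid, ts);

SELECT pk.assetid,
       (SELECT max(ts)
          FROM asset_positions
         WHERE asset_positions.assetid = pk.assetid) AS max_ts
  FROM other_table pk;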

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL Caching

2006-10-03 Thread Adnan DURSUN


   Thanks,

    I wonder about these:

    * When a session updates data that is already in a shared buffer,
does Postgres synchronize the data between disk and the shared buffer
area immediately?
    * Does Postgres cache SQL execution plans in memory for use by
other sessions?  For example:
    When session A executes SELECT * FROM tab WHERE col1 = val1 AND
col2 = val2, does Postgres save the parser/optimizer result in memory
for use by other sessions, to avoid running the parser and optimizer
again and therefore save time?  Because an execution plan was already
created before..


Sincerely

Adnan DURSUN



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Forcing the use of particular execution plans

2006-10-03 Thread Ron Mayer
Jim C. Nasby wrote:
 
 Index scans are also pretty picky about correlation. If you have really
 low correlation you don't want to index scan,

I still don't think correlation is the right metric
at all for making this decision.

If you have a list of addresses clustered by zip
the correlation of State, City, County, etc will all be zero (since
the zip codes don't match the alphabetical order of state or city names)
but index scans are still big wins because the data for any given
state or city will be packed on the same few pages - and in fact
the pages could be read mostly sequentially.

 but I think our current
 estimates make it too eager to switch to a seqscan.
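
For reference, the correlation statistic in question can be inspected
directly; pg_stats is a standard system view, and the table/column names
below belong to the hypothetical address-list example:

SELECT tablename, attname, correlation
  FROM pg_stats
 WHERE tablename = 'addresses'
   AND attname IN ('zip', 'state', 'city', 'county');
-- correlation near +1/-1: on-disk order tracks the column's sort order;
-- near 0: no ordering correlation, even when the rows for one value are
-- packed onto the same few pages, as in the zip-code case above.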

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Forcing the use of particular execution plans

2006-10-03 Thread Jim C. Nasby
Adding -performance back in.

On Tue, Oct 03, 2006 at 05:10:04PM -0700, Ron Mayer wrote:
 Jim C. Nasby wrote:
  
  Index scans are also pretty picky about correlation. If you have really
  low correlation you don't want to index scan,
 
 I still don't think correlation is the right metric
 at all for making this decision.
 
 If you have a list of addresses clustered by zip
 the correlation of State, City, County, etc will all be zero (since
 the zip codes don't match the alphabetical order of state or city names)
 but index scans are still big wins because the data for any given
 state or city will be packed on the same few pages - and in fact
 the pages could be read mostly sequentially.
 
That's a good point that I don't think has been considered before. I
think correlation is still somewhat important, but what's probably far
more important is data localization.

One possible way to calculate this would be to note the location of
every tuple with a given value in the heap. Calculate the geometric mean
of those locations (I think you could essentially average all the
ctids), and divide that by the average distance of each tuple from that
mean (or maybe the reciprocal of that would be more logical).

Obviously we don't want to scan the whole table to do that, but there
should be some way to do it via sampling as well.

Or perhaps someone knows of a research paper with real data on how to do
this instead of hand-waving. :)

  but I think our current
  estimates make it too eager to switch to a seqscan.
-- 
Jim C. Nasby, Database Architect   [EMAIL PROTECTED]
512.569.9461 (cell) http://jim.nasby.net

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL Caching

2006-10-03 Thread Tomeh, Husam
 
  * When a session updates data that is already in a shared buffer,
does Postgres synchronize the data between disk and the shared buffer
area immediately?

Not necessarily true. When a block is modified in the shared buffers,
the modified block is written to the Postgres WAL log. A periodic DB
checkpoint is performed to flush the modified blocks in the shared
buffers to the data files.
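
For illustration (standard SQL commands and 8.x parameter names;
CHECKPOINT requires superuser privileges):

-- Force the flush by hand; normally checkpoints happen automatically,
-- driven by the settings below.
CHECKPOINT;

SHOW checkpoint_timeout;
SHOW checkpoint_segments;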

  * Does Postgres cache SQL execution plans in memory for use by
other sessions?  For example:
    When session A executes SELECT * FROM tab WHERE col1 = val1 AND
col2 = val2, does Postgres save the parser/optimizer result in memory
for use by other sessions, to avoid running the parser and optimizer
again and therefore save time?  Because an execution plan was already
created before..

Query plans are not stored in the shared buffers and therefore can not
be re-used by other sessions. They're only cached by the connection on a
session level.
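
A session can, however, explicitly reuse its own plan with a prepared
statement. A sketch using the tab/col1/col2 example from the question
(the parameter types are assumptions):

-- Parsed and planned once, for this session only:
PREPARE find_rows (int, int) AS
    SELECT * FROM tab WHERE col1 = $1 AND col2 = $2;

-- Each EXECUTE reuses the stored plan; another session would have to
-- PREPARE its own copy.
EXECUTE find_rows (1, 2);
EXECUTE find_rows (3, 4);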

Sincerely,

--
  Husam 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL Caching

2006-10-03 Thread Adnan DURSUN
- Original Message - 
From: Tomeh, Husam [EMAIL PROTECTED]
To: Adnan DURSUN [EMAIL PROTECTED]; 
pgsql-performance@postgresql.org

Sent: Wednesday, October 04, 2006 4:29 AM
Subject: RE: [PERFORM] PostgreSQL Caching



Query plans are not stored in the shared buffers and therefore can not
be re-used by other sessions. They're only cached by the connection on a
session level.


    Ok, I see, thanks.  So that means an execution plan saved earlier for a 
stored object is removed from memory after that object is altered or dropped 
by any session. Is that true?

    And one last question :-)
    I want to be able to read an execution plan when I look at one. 
So, is there any documentation on how an execution plan should be read?


Sincerely!

Adnan DURSUN 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match