[PERFORM] limitation using LIKE on ANY(array)

2006-03-24 Thread K C Lau



With 8.1.3, I get an error when trying to do this on a Text[] column
:
.. WHERE ANY(array) LIKE 'xx%'

Indeed, I get rejected even with:
.. WHERE ANY(array) = 'xx'

In both cases, the error is: ERROR: syntax error at or near
any ... 

It would only work as documented in the manual (8.10.5):
SELECT * FROM sal_emp WHERE 1 = ANY (pay_by_quarter);

It appears that this restriction is still in place in
8.2:

http://developer.postgresql.org/docs/postgres/arrays.html

Is that the case?
Thanks in advance,
KC.



Re: [PERFORM] SELECT MIN, MAX took longer time than SELECT

2006-01-21 Thread K C Lau

At 01:20 06/01/21, Jim C. Nasby wrote:


BTW, these queries below are meaningless; they are not equivalent to
min(logsn).

 esdt= explain analyze select LogSN from Log where create_time 
 '2005/10/19' order by create_time limit 1;


Thank you for pointing it out.

It actually returns the min(logsn), as the index is on (create_time, 
logsn). To be more explicit, I have changed to query to:
explain analyze select LogSN from Log where create_time  '2005/10/19' 
order by create_time, logsn limit 1;


esdt= \d log;
 create_time | character varying(23)   | default 
'1970/01/01~00:00:00.000'::character varying

 logsn   | integer | not null
 ...
Indexes:
pk_log PRIMARY KEY, btree (logsn)
idx_logtime btree (create_time, logsn)

Best regards,
KC.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] SELECT MIN, MAX took longer time than SELECT COUNT, MIN, MAX

2006-01-19 Thread K C Lau


The following query took 17 seconds:
select count(LogSN), min(LogSN), max(LogSN) from Log where create_time  
'2005/10/19';


Figuring that getting the count will involve scanning the database, I took 
it out, but the new query took 200 seconds:

select min(LogSN), max(LogSN) from Log where create_time  '2005/10/19';

Is it because the planner is using index pk_log instead of idx_logtime? 
Anyway to avoid that?


I can get instant replies with 2 separate queries for min(LogSN) and 
max(LogSN) using order by create_time limit 1, but I can't get both values 
within 1 query using the limit 1 construct. Any suggestions?


I am running pg 8.1.2 on Windows 2000. The queries are done immediately 
after a vacuum analyze.


Best regards,
KC.

--

esdt= \d log;
 create_time | character varying(23)   | default 
'1970/01/01~00:00:00.000'::char

acter varying
 logsn   | integer | not null
 ...
Indexes:
pk_log PRIMARY KEY, btree (logsn)
idx_logtime btree (create_time, logsn)
...

esdt= vacuum analyze log;
VACUUM

esdt= explain analyze select count(LogSN), min(LogSN), max(LogSN) from Log 
where create_time  '2005/10/19';


 Aggregate  (cost=57817.74..57817.75 rows=1 width=4) (actual 
time=17403.381..17403.384 rows=1 loops=1)
   -  Bitmap Heap Scan on log  (cost=1458.31..57172.06 rows=86089 
width=4) (actual time=180.368..17039.262 rows=106708 loops=1)

 Recheck Cond: ((create_time)::text  '2005/10/19'::text)
 -  Bitmap Index Scan on idx_logtime  (cost=0.00..1458.31 
rows=86089 width=0) (actual time=168.777..168.777 rows=106708 loops=1)

   Index Cond: ((create_time)::text  '2005/10/19'::text)
 Total runtime: 17403.787 ms

esdt= explain analyze select min(LogSN), max(LogSN) from Log where 
create_time  '2005/10/19';


 Result  (cost=2.51..2.52 rows=1 width=0) (actual 
time=200051.507..200051.510 rows=1 loops=1)

   InitPlan
 -  Limit  (cost=0.00..1.26 rows=1 width=4) (actual 
time=18.541..18.544 rows=1 loops=1)
   -  Index Scan using pk_log on log  (cost=0.00..108047.11 
rows=86089

width=4) (actual time=18.533..18.533 rows=1 loops=1)
 Filter: (((create_time)::text  '2005/10/19'::text) AND 
(logsn IS NOT NULL))
 -  Limit  (cost=0.00..1.26 rows=1 width=4) (actual 
time=200032.928..200032.931 rows=1 loops=1)
   -  Index Scan Backward using pk_log on 
log  (cost=0.00..108047.11 rows=86089 width=4) (actual 
time=200032.920..200032.920 rows=1 loops=1)
 Filter: (((create_time)::text  '2005/10/19'::text) AND 
(logsn IS NOT NULL))

 Total runtime: 200051.701 ms

esdt= explain analyze select LogSN from Log where create_time  
'2005/10/19' order by create_time limit 1;


 Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.071..0.073 rows=1 
loops=1)
   -  Index Scan using idx_logtime on log  (cost=0.00..84649.94 
rows=86089 width=31) (actual time=0.063..0.063 rows=1 loops=1)

 Index Cond: ((create_time)::text  '2005/10/19'::text)
 Total runtime: 0.182 ms

esdt= explain analyze select LogSN from Log where create_time  
'2005/10/19' order by create_time desc limit 1;
 Limit  (cost=0.00..0.98 rows=1 width=31) (actual time=0.058..0.061 rows=1 
loops=1)
   -  Index Scan Backward using idx_logtime on log  (cost=0.00..84649.94 
rows=86089 width=31) (actual time=0.051..0.051 rows=1 loops=1)

 Index Cond: ((create_time)::text  '2005/10/19'::text)
 Total runtime: 0.186 ms


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] indexes on primary and foreign keys

2006-01-11 Thread K C Lau

At 07:21 06/01/12, Michael Fuhr wrote:

On Wed, Jan 11, 2006 at 02:38:42PM -0800, Burak Seydioglu wrote:
 I do a load of sql joins using primary and foreign keys. What i would like
 to know if PostgreSQL creates indexes on these columns automatically (in
 addition to using them to maintain referential integrity) or do I have to
 create an index manually on these columns as indicated below?

 CREATE TABLE cities (
   city_id integer primary key,
   city_name varchar(50)
 );

 CREATE INDEX city_id_index ON cities(city_id);

PostgreSQL automatically creates indexes on primary keys.  If you run
the above CREATE TABLE statement in psql you should see a message to
that effect:

NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index 
cities_pkey for table cities


Is there a way to suppress this notice when I create tables in a script?

Best regards,
KC.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] indexes on primary and foreign keys

2006-01-11 Thread K C Lau

At 09:26 06/01/12, you wrote:


On Jan 12, 2006, at 9:36 , K C Lau wrote:


NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
cities_pkey for table cities


Is there a way to suppress this notice when I create tables in a
script?


Set[1] your log_min_messages to WARNING or higher[2].

[1](http://www.postgresql.org/docs/current/interactive/sql-set.html)
[2](http://www.postgresql.org/docs/current/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN)

Michael Glaesemann
grzm myrealbox com


Thanks. The side effect is that it would suppress other notices which might 
be useful.


I was looking for a way to suppress the notice within the CREATE TABLE 
statement but could not.
I noticed that when I specify a constraint name for the primary key, it 
would create an implicit index with the constraint name. So may be if the 
optional constraint name is specified by the user, then the notice can be 
suppressed. Indeed the manual already says that the index will be 
automatically created.


BTW, there's an extra space in link[2] above which I have removed.

Best regards,
KC. 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-10-12 Thread K C Lau

Dear Merlin and all,

That direct SQL returns in 0 ms. The problem only appears when a view is used.

What we've done to work around this problem is to modify the table to add a 
field DataStatus which is set to 1 for the latest record for each player, 
and reset to 0 when it is superceded.


A partial index is then created as:
CREATE INDEX IDX_CurPlayer on Player (PlayerID) where DataStatus = 1;

The VCurPlayer view is changed to:
CREATE or REPLACE VIEW VCurPlayer as select * from Player where DataStatus = 1;
and it now returns in 0 ms.

This is not the best solution, but until (if ever) the original problem is 
fixed, we have not found an alternative work around.


The good news is that even with the additional overhead of maintaining an 
extra index and the problem of vacuuming, pg 8.0.3 still performs 
significantly faster on Windows than MS Sql 2000 in our OLTP application 
testing so far.


Thanks to all for your help.

Best regards,
KC.

At 20:14 05/10/12, you wrote:

KC wrote:

 So I guess it all comes back to the basic question:

 For the query select distinct on (PlayerID) * from Player a where
 PlayerID='0' order by PlayerId Desc, AtDate Desc;
 can the optimizer recognise the fact the query is selecting by the
primary
 key (PlayerID,AtDate), so it can skip the remaining rows for that
 PlayerID,
 as if LIMIT 1 is implied?

 Best regards, KC.

Hi KC, have you tried:
select * from player where playerid = '0' and atdate  99
order by platerid desc, atdate desc limit 1;

??
Merlin



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-26 Thread K C Lau

At 20:17 05/09/23, K C Lau wrote:

At 19:15 05/09/23, Simon Riggs wrote:

select distinct on (PlayerID) PlayerID,AtDate from Player a
where PlayerID='0' order by PlayerId, AtDate Desc;

Does that work for you?

Best Regards, Simon Riggs


esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from 
Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc;
 Unique  (cost=0.00..1327.44 rows=2 width=23) (actual time=0.027..8.438 
rows=1 loops=1)
   -  Index Scan Backward using pk_player on player 
a  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..4.950 
rows=1743 loops=1)

 Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 8.499 ms

That is the fastest of all queries looping the 1743 rows.
I do get the desired result by adding LIMIT 1:

esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from 
Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc LIMIT 1;


 Limit  (cost=0.00..663.72 rows=1 width=23) (actual time=0.032..0.033 
rows=1 loops=1)
   -  Unique  (cost=0.00..1327.44 rows=2 width=23) (actual 
time=0.028..0.028 rows=1 loops=1)
 -  Index Scan Backward using pk_player on player 
a  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..0.022 
rows=1 loops=1)

   Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 0.094 ms

However, when I use that within a function in a view, it is slow again:

esdt= create or replace function player_max_atdate (varchar(32)) returns 
varchar(32) as $$
esdt$  select distinct on (PlayerID) AtDate from player where PlayerID= 
$1 order by PlayerID desc, AtDate desc limit 1;

esdt$ $$ language sql immutable;
CREATE FUNCTION
esdt= create or replace view VCurPlayer3 as select * from Player where 
AtDate = player_max_atdate(PlayerID);

CREATE VIEW
esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where 
PlayerID='0';


 Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9 
width=23) (actual time=76.660..76.664 rows=1 loops=1)

   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
 Total runtime: 76.716 ms

Why wouldn't the function get the row as quickly as the direct sql does?


Results from the following query suggests that the explain analyze output 
above only tells half the story, and that the function is in fact called 
1743 times:


esdt= create or replace view VCurPlayer3 as select distinct on (PlayerID) 
* from Player a where OID = (select distinct on (PlayerID) OID from Player 
b where b.PlayerID = a.PlayerID and b.AtDate = 
player_max_atdate(b.PlayerID) order by PlayerID desc, AtDate desc limit 1) 
order by PlayerId Desc, AtDate desc;

CREATE VIEW
esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where 
PlayerID='0';
 Subquery Scan vcurplayer3  (cost=0.00..1715846.91 rows=1 width=68) 
(actual time=0.640..119.124 rows=1 loops=1)
   -  Unique  (cost=0.00..1715846.90 rows=1 width=776) (actual 
time=0.633..119.112 rows=1 loops=1)
 -  Index Scan Backward using pk_player on player 
a  (cost=0.00..1715846.88 rows=9 width=776) (actual time=0.628..119.104 
rows=1 loops=1)

   Index Cond: ((playerid)::text = '0'::text)
   Filter: (oid = (subplan))
   SubPlan
 -  Limit  (cost=0.00..976.38 rows=1 width=27) (actual 
time=0.057..0.058 rows=1 loops=1743)
   -  Unique  (cost=0.00..976.38 rows=1 width=27) 
(actual time=0.052..0.052 rows=1 loops=1743)
 -  Index Scan Backward using pk_player on 
player b  (cost=0.00..976.36 rows=6 width=27) (actual  time=0.047..0.047 
rows=1 loops=1743)

   Index Cond: ((playerid)::text = ($0)::text)
   Filter: ((atdate)::text = 
(player_max_atdate(playerid))::text)

 Total runtime: 119.357 ms

It would also explain the very long time taken by the pl/pgsql function I 
posted a bit earlier.


So I guess it all comes back to the basic question:

For the query select distinct on (PlayerID) * from Player a where 
PlayerID='0' order by PlayerId Desc, AtDate Desc;
can the optimizer recognise the fact the query is selecting by the primary 
key (PlayerID,AtDate), so it can skip the remaining rows for that PlayerID, 
as if LIMIT 1 is implied?


Best regards, KC.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-23 Thread K C Lau
Thank you all for your suggestions. I' tried, with some variations too, but 
still no success. The times given are the best of a few repeated tries on 
an 8.1 beta 2 db freshly migrated from 8.0.3 on Windows.


For reference, only the following gets the record quickly:

esdt= explain analyze select PlayerID,AtDate from Player a
 where PlayerID='0' and AtDate = (select b.AtDate from Player b
 where b.PlayerID = '0' order by b.PlayerID desc, b.AtDate desc LIMIT 1);

 Index Scan using pk_player on player a  (cost=0.75..4.26 rows=1 width=23) 
(actual time=0.054..0.057 rows=1 loops=1)
   Index Cond: (((playerid)::text = '0'::text) AND ((atdate)::text = 
($0)::text))

   InitPlan
 -  Limit  (cost=0.00..0.75 rows=1 width=23) (actual 
time=0.027..0.028 rows=1 loops=1)
   -  Index Scan Backward using pk_player on player 
b  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.023..0.023 rows=1 
loops=1)

 Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 0.132 ms

At 02:19 05/09/23, Kevin Grittner wrote:

Have you tried the best choice pattern -- where you select the set of
candidate rows and then exclude those for which a better choice
exists within the set?  I often get better results with this pattern than
with the alternatives.


esdt= explain analyze select PlayerID,AtDate from Player a where 
PlayerID='0'
and not exists (select * from Player b where b.PlayerID = a.PlayerID and 
b.AtDate  a.AtDate);


 Index Scan using pk_player on player a  (cost=0.00..3032.46 rows=878 
width=23)

(actual time=35.820..35.823 rows=1 loops=1)
   Index Cond: ((playerid)::text = '0'::text)
   Filter: (NOT (subplan))
   SubPlan
 -  Index Scan using pk_player on player b  (cost=0.00..378.68 
rows=389 width=776) (actual time=0.013..0.013 rows=1 loops=1743)
   Index Cond: (((playerid)::text = ($0)::text) AND 
((atdate)::text  ($1)::text))

 Total runtime: 35.950 ms

Note that it is faster than the LIMIT 1:

esdt= explain analyze select PlayerID,AtDate from Player a where 
PlayerID='0' and AtDate = (select b.AtDate from Pl
ayer b where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate 
desc LIMIT 1);


 Index Scan using pk_player on player a  (cost=0.00..2789.07 rows=9 
width=23) (actual time=41.366..41.371 rows=1 loops=1)

   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = ((subplan))::text)
   SubPlan
 -  Limit  (cost=0.00..0.83 rows=1 width=23) (actual 
time=0.013..0.014 rows=1 loops=1743)
   -  Index Scan Backward using pk_player on player 
b  (cost=0.00..970.53 rows=1166 width=23) (actual time=0.008..0.008 rows=1 
loops=1743)

 Index Cond: ((playerid)::text = ($0)::text)
 Total runtime: 41.490 ms

At 02:07 05/09/23, Merlin Moncure wrote:

 Here is a trick I use sometimes with views, etc.  This may or may not be
 effective to solve your problem but it's worth a shot.  Create one small
 SQL function taking date, etc. and returning the values and define it
 immutable.  Now in-query it is treated like a constant.


esdt= create or replace function player_max_atdate (varchar(32)) returns 
varchar(32) as $$
esdt$  select atdate from player where playerid = $1 order by playerid 
desc, AtDate desc limit 1;

esdt$ $$ language sql immutable;
CREATE FUNCTION
esdt= create or replace view VCurPlayer3 as select * from Player where 
AtDate = player_max_atdate(PlayerID);

CREATE VIEW
esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where 
PlayerID='0';


 Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9 
width=23) (actual time=65.434..65.439 rows=1 loops=1)

   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
 Total runtime: 65.508 ms

While it says loops=1, the time suggests that it is going through all 1743 
records for that PlayerID.


I tried to simulate the fast subquery inside the function, but it is taking 
almost twice as much time:


esdt= create or replace function player_max_atdate (varchar(32)) returns 
varchar(32) as $$
esdt$  select atdate from player a where playerid = $1 and AtDate = 
(select b.AtDate from Player b

esdt$  where b.PlayerID = $1 order by b.PlayerID desc, b.AtDate desc LIMIT 1);
esdt$ $$ language sql immutable;
CREATE FUNCTION
esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where 
PlayerID='0';


 Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9 
width=23) (actual time=119.369..119.373 rows=1 loops=1)

   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
 Total runtime: 119.441 ms

Adding another LIMIT 1 inside the function makes it even slower:

esdt= create or replace function player_max_atdate (varchar(32)) returns 
varchar(32) as $$
esdt$  select atdate from player where playerid = $1 and AtDate = (select 
b.AtDate from Player b

esdt$  where b.PlayerID = $1 order by b.PlayerID 

Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-23 Thread K C Lau

At 19:15 05/09/23, Simon Riggs wrote:

select distinct on (PlayerID) PlayerID,AtDate from Player a
where PlayerID='0' order by PlayerId, AtDate Desc;

Does that work for you?

Best Regards, Simon Riggs


esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from 
Player a where PlayerID='0' order by PlayerId, AtDate Desc;
 Unique  (cost=1417.69..1426.47 rows=2 width=23) (actual 
time=31.231..36.609 rows=1 loops=1)
   -  Sort  (cost=1417.69..1422.08 rows=1756 width=23) (actual 
time=31.129..32.473 rows=1743 loops=1)

 Sort Key: playerid, atdate
 -  Index Scan using pk_player on player a  (cost=0.00..1323.05 
rows=1756 width=23) (actual time=0.035..6.575 rows=1743 loops=1)

   Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 36.943 ms

The sort was eliminated with: order by PlayerId Desc, AtDate Desc:

esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from 
Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc;
 Unique  (cost=0.00..1327.44 rows=2 width=23) (actual time=0.027..8.438 
rows=1 loops=1)
   -  Index Scan Backward using pk_player on player 
a  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..4.950 
rows=1743 loops=1)

 Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 8.499 ms

That is the fastest of all queries looping the 1743 rows.
I do get the desired result by adding LIMIT 1:

esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from 
Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc LIMIT 1;


 Limit  (cost=0.00..663.72 rows=1 width=23) (actual time=0.032..0.033 
rows=1 loops=1)
   -  Unique  (cost=0.00..1327.44 rows=2 width=23) (actual 
time=0.028..0.028 rows=1 loops=1)
 -  Index Scan Backward using pk_player on player 
a  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..0.022 rows=1 
loops=1)

   Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 0.094 ms

However, when I use that within a function in a view, it is slow again:

esdt= create or replace function player_max_atdate (varchar(32)) returns 
varchar(32) as $$
esdt$  select distinct on (PlayerID) AtDate from player where PlayerID= $1 
order by PlayerID desc, AtDate desc limit 1;

esdt$ $$ language sql immutable;
CREATE FUNCTION
esdt= create or replace view VCurPlayer3 as select * from Player where 
AtDate = player_max_atdate(PlayerID);

CREATE VIEW
esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where 
PlayerID='0';


 Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9 
width=23) (actual time=76.660..76.664 rows=1 loops=1)

   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
 Total runtime: 76.716 ms

Why wouldn't the function get the row as quickly as the direct sql does?

Best regards, KC.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-23 Thread K C Lau

Dear Merlin,

At 20:34 05/09/23, Merlin Moncure wrote:

Can you time just the execution of this function and compare vs. pure
SQL version?  If the times are different, can you do a exaplain analyze
of a prepared version of above?


esdt= prepare test(character varying) as select atdate from player where
esdt- playerid = $1 order by playerid desc, AtDate desc limit 1;
PREPARE
esdt= explain analyze execute test('0');
 Limit  (cost=0.00..0.83 rows=1 width=23) (actual time=0.032..0.033 rows=1 
loops=1)
   -  Index Scan Backward using pk_player on player  (cost=0.00..970.53 
rows=1166 width=23) (actual time=0.027..0.027 rows=1 loops=1)

 Index Cond: ((playerid)::text = ($1)::text)
 Total runtime: 0.088 ms

The prepared SQL timing is similar to that of a direct SQL.


 esdt= create or replace view VCurPlayer3 as select * from Player where
 AtDate = player_max_atdate(PlayerID);

This is wrong, it should have been
create or replace view VCurPlayer3 as select *,
player_max_atdate(PlayerID) as max_date  from Player;


Your suggestion returns all the records plus a max AtDate column for each 
PlayerID.
What I want to get with the view is the record that has the max value of 
AtDate for each PlayerID.
The AtDate is a varchar(23) field containing a string date of format 
'mmddhh', not the SQL Date field. Sorry if that confused you.



Something is not jiving here.  However, if the server plan still does
not come out correct, try the following (p.s. why is function returning
varchar(32) and not date?):


esdt= create or replace function player_max_atdate (varchar(32)) returns 
varchar(32) as $$

esdt$ DECLARE
esdt$player_record record;
esdt$return_date varchar(32);
esdt$ BEGIN
esdt$for player_record in execute
esdt$'select atdate from player where playerid = \'' || $1 || 
'\' order by playerid desc, AtDate desc limit 1;' loop

esdt$return_date = player_record.atdate;
esdt$end loop;
esdt$return return_date;
esdt$ END;
esdt$ $$ language plpgsql immutable;
CREATE FUNCTION
esdt= create or replace view VCurPlayer3 as select * from Player where 
AtDate = player_max_atdate(PlayerID);

CREATE VIEW
esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where 
PlayerID='0';


 Index Scan using pk_player on player  (cost=0.00..1331.83 rows=9 
width=23) (actual time=849.021..849.025 rows=1 loops=1)

   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
 Total runtime: 849.078 ms

Your suggested plpgsql function seems to be even slower, with a best time 
of 849 ms after several tries. Is that expected?


Thanks again and best regards,
KC.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-23 Thread K C Lau

Dear Mark,

Thank you. That seems like a more manageable alternative if nothing else 
works out. It should cover many of the OLTP update transactions. But it 
does mean quite a bit of programming changes and adding another index on 
all such tables, and it would not cover those cases when we need to get the 
latest record before a certain time, for example.


I'm wondering if this performance issue is common enough for other users to 
merit a fix in pg, especially as it seems that with MVCC, each of the data 
records need to be accessed in addition to scanning the index.


Best regards,
KC.

At 09:40 05/09/24, Mark Kirkwood wrote:
A small denormalization, where you mark the row with the latest atdate for 
each playerid may get you the performance you want.


e.g: (8.1beta1)

ALTER TABLE player ADD islastatdate boolean;

UPDATE player SET islastatdate = true where (playerid,atdate) IN
(SELECT playerid, atdate FROM vcurplayer);

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE islastatdate;

CREATE INDEX player_id_lastatdate ON player(playerid, islastatdate)
WHERE islastatdate;

ANALYZE player;

Generating some test data produced:

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='0';

   QUERY PLAN

 Index Scan using player_id_lastatdate on player a  (cost=0.00..4.33 
rows=1 width=13) (actual time=0.142..0.149 rows=1 loops=1)

   Index Cond: ((playerid = '0'::text) AND (lastatdate = true))
   Filter: lastatdate
 Total runtime: 0.272 ms
(4 rows)

Whereas with the original view definition:

CREATE OR REPLACE VIEW vcurplayer AS
SELECT * FROM player a
WHERE a.atdate =
( SELECT max(b.atdate) FROM player b
  WHERE a.playerid = b.playerid);

EXPLAIN ANALYZE
SELECT playerid,atdate
FROM vcurplayer
WHERE playerid='0';
QUERY PLAN
-
 Index Scan using player_id_date on player a  (cost=0.00..7399.23 rows=11 
width=13) (actual time=121.738..121.745 rows=1 loops=1)

   Index Cond: (playerid = '0'::text)
   Filter: (atdate = (subplan))
   SubPlan
 -  Result  (cost=1.72..1.73 rows=1 width=0) (actual 
time=0.044..0.047 rows=1 loops=2000)

   InitPlan
 -  Limit  (cost=0.00..1.72 rows=1 width=4) (actual 
time=0.028..0.031 rows=1 loops=2000)
   -  Index Scan Backward using player_id_date on player 
b  (cost=0.00..3787.94 rows=2198 width=4) (actual time=0.019..0.019 
rows=1 loops=2000)

 Index Cond: ($0 = playerid)
 Filter: (atdate IS NOT NULL)
 Total runtime: 121.916 ms
(11 rows)

Note that my generated data has too many rows for each playerid, but 
the  difference in performance should illustrate the idea.


Cheers

Mark



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-22 Thread K C Lau
We use similar views as base views throughout our OLTP system to get the 
latest time-based record(s). So it is quite impossible to use summary 
tables etc. Are there other ways to do it?


The subquery would pinpoint the record(s) with the composite primary key. 
Both MS Sql and Oracle do not have such performance problem. So this 
problem is effectively stopping us from migrating to PostgreSQL.


Any suggestions would be most appreciated.

Best regards,
KC.

At 16:40 05/09/22, Simon Riggs wrote:

On Thu, 2005-09-22 at 12:21 +0800, K C Lau wrote:

 Investigating further on this problem I brought up in June, the following
 query with pg 8.0.3 on Windows scans all 1743 data records for a player:

 esdt= explain analyze select PlayerID,AtDate from Player a
   where PlayerID='0' and AtDate = (select b.AtDate from Player b
   where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc
 LIMIT 1);


   Total runtime: 51.133 ms

 Using a static value in the subquery produces the desired result below, 
but
 since we use views for our queries (see last part of this email), we 
cannot

 push the static value into the subquery:

 esdt= explain analyze select PlayerID,AtDate from Player a
   where PlayerID='0' and AtDate = (select b.AtDate from Player b
   where b.PlayerID = '0' order by b.PlayerID desc, b.AtDate desc 
LIMIT 1);


   Total runtime: 0.149 ms

 The Player table has a primary key on PlayerID, AtDate. Is there a way to
 stop the inner-most index scan looping all 1743 data records for that
 player?  Is that a bug or known issue?

Currently the planner can't tell whether a subquery is correlated or not
until it has planned the query. So it is unable to push down the
qualification automatically in the way you have achieved manually. The
new min() optimisation doesn't yet work with GROUP BY which is what you
would use to reformulate the query that way, so no luck that way either.

If you don't want to do this in a view, calculate the values for all
players at once and store the values in a summary table for when you
need them.

Best Regards, Simon Riggs



---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-22 Thread K C Lau

At 20:48 05/09/22, Simon Riggs wrote:

On Thu, 2005-09-22 at 18:40 +0800, K C Lau wrote:
 We use similar views as base views throughout our OLTP system to get the
 latest time-based record(s). So it is quite impossible to use summary
 tables etc. Are there other ways to do it?

 The subquery would pinpoint the record(s) with the composite primary key.
 Both MS Sql and Oracle do not have such performance problem. So this
 problem is effectively stopping us from migrating to PostgreSQL.

 Any suggestions would be most appreciated.

Even if this were fixed for 8.1, which seems unlikely, would you be able
to move to that release immediately?


Yes. In fact when we first developed our system a few years ago, we tested 
on MS7.0, Oracle 8 and PG 7.1.1 and we did not hit that problem. When we 
try again with PG 8.0, the performance becomes unbearable, but other areas 
appear ok and other queries are often faster than MS Sql2k.



Maybe its possible to reconstruct your query with sub-sub-selects so
that you have a correlated query with manually pushed down clauses,
which also references a more constant base view?


We would be most happy to try them if we have some example views or pointers.


Is a 51ms query really such a problem for you?


Unfortunately yes, as our target performance is in the high hundreds of 
transactions per sec. And 51 ms is already the best case for a single 
select, with everything cached in memory immediately after the same select 
which took 390 ms on a quiet system.



Best Regards, Simon Riggs


Best regards,
KC. 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-22 Thread K C Lau

At 22:37 05/09/22, Merlin Moncure wrote:


 create or replace view VCurPlayer as select * from Player a
 where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID=
 b.PlayerID);

Here is a trick I use sometimes with views, etc.  This may or may not be
effective to solve your problem but it's worth a shot.  Create one small
SQL function taking date, etc. and returning the values and define it
immutable.  Now in-query it is treated like a constant.


We don't use functions as a rule, but I would be glad to give it a try.
I would most appreciate if you could define a sample function and rewrite 
the VCurPlayer view above. Both PlayerID and AtDate are varchar fields.



Another useful application for this feature is when you have nested
views (view 1 queries view 2) and you need to filter records based on
fields from view 2 which are not returned in view 1.  Impossible?

in view 2 add clause where v2.f between  f_min() and f_max(), them being
immutable functions which can grab filter criteria based on inputs or
values from a table.

Merlin


Best regards,
KC. 



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-21 Thread K C Lau

Hi All,

Investigating further on this problem I brought up in June, the following 
query with pg 8.0.3 on Windows scans all 1743 data records for a player:


esdt= explain analyze select PlayerID,AtDate from Player a
 where PlayerID='0' and AtDate = (select b.AtDate from Player b
 where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc 
LIMIT 1);


 Index Scan using pk_player on player a  (cost=0.00..2789.07 rows=9 
width=23) (a

ctual time=51.046..51.049 rows=1 loops=1)
   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = ((subplan))::text)
   SubPlan
 -  Limit  (cost=0.00..0.83 rows=1 width=23) (actual 
time=0.016..0.017 rows

=1 loops=1743)
   -  Index Scan Backward using pk_player on player 
b  (cost=0.00..970.

53 rows=1166 width=23) (actual time=0.011..0.011 rows=1 loops=1743)
 Index Cond: ((playerid)::text = ($0)::text)
 Total runtime: 51.133 ms

Using a static value in the subquery produces the desired result below, but 
since we use views for our queries (see last part of this email), we cannot 
push the static value into the subquery:


esdt= explain analyze select PlayerID,AtDate from Player a
 where PlayerID='0' and AtDate = (select b.AtDate from Player b
 where b.PlayerID = '0' order by b.PlayerID desc, b.AtDate desc LIMIT 1);

 Index Scan using pk_player on player a  (cost=0.75..4.26 rows=1 width=23) 
(actu

al time=0.054..0.058 rows=1 loops=1)
   Index Cond: (((playerid)::text = '0'::text) AND ((atdate)::text = 
($0)::t

ext))
   InitPlan
 -  Limit  (cost=0.00..0.75 rows=1 width=23) (actual 
time=0.028..0.029 rows

=1 loops=1)
   -  Index Scan Backward using pk_player on player 
b  (cost=0.00..1323

.05 rows=1756 width=23) (actual time=0.023..0.023 rows=1 loops=1)
 Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 0.149 ms

The Player table has a primary key on PlayerID, AtDate. Is there a way to 
stop the inner-most index scan looping all 1743 data records for that 
player?  Is that a bug or known issue?


BTW, I tried using 8.1 beta2 on Windows and its performance is similar, I 
have also tried other variants such as MAX and DISTINCT but with no success.


Any help is most appreciated.

Best regards,
KC.


At 10:46 05/06/15, K C Lau wrote:

Hi All,

I previously posted the following as a sequel to my SELECT DISTINCT 
Performance Issue question. We would most appreciate any clue or 
suggestions on how to overcome this show-stopping issue. We are using 
8.0.3 on Windows.


Is it a known limitation when using a view with SELECT ... LIMIT 1?

Would the forthcoming performance enhancement with MAX help when used 
within a view, as in:


create or replace view VCurPlayer as select * from Player a
where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID = 
b.PlayerID);


select PlayerID,AtDate from VCurPlayer where PlayerID='0';

Thanks and regards,
KC.

-

Actually the problem with LIMIT 1 query is when we use views with the 
LIMIT 1 construct. The direct SQL is ok:


esdt= explain analyze select PlayerID,AtDate from Player where 
PlayerID='0'

 order by PlayerID desc, AtDate desc LIMIT 1;

 Limit  (cost=0.00..1.37 rows=1 width=23) (actual time=0.000..0.000 
rows=1 loops=1)
   -  Index Scan Backward using pk_player on 
player  (cost=0.00..16074.23 rows=11770 width=23) (actual 
time=0.000..0.000 rows=1 loops=1)

 Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 0.000 ms

esdt= create or replace view VCurPlayer3 as select * from Player a
where AtDate = (select b.AtDate from Player b where a.PlayerID = b.PlayerID
order by b.PlayerID desc, b.AtDate desc LIMIT 1);

esdt= explain analyze select PlayerID,AtDate,version from VCurPlayer3 
where PlayerID='0';
 Index Scan using pk_player on player a  (cost=0.00..33072.78 rows=59 
width=27)

(actual time=235.000..235.000 rows=1 loops=1)
   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = ((subplan))::text)
   SubPlan
 -  Limit  (cost=0.00..1.44 rows=1 width=23) (actual 
time=0.117..0.117 rows=1 loops=1743)
   -  Index Scan Backward using pk_player on player 
b  (cost=0.00..14023.67 rows=9727 width=23) (actual time=0.108..0.108 
rows=1 loops=1743)

 Index Cond: (($0)::text = (playerid)::text)
 Total runtime: 235.000 ms



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] SELECT DISTINCT Performance Issue

2005-06-06 Thread K C Lau

At 19:45 05/06/06, PFC wrote:



Previously, we have also tried to use LIMIT 1 instead of DISTINCT, but
the performance was no better:
select PlayerID,AtDate from Player where PlayerID='0' order by
PlayerID desc, AtDate desc LIMIT 1


The DISTINCT query will pull out all the rows and keep only one, 
so the

one with LIMIT should be faster. Can you post explain analyze of the LIMIT
query ?


Actually the problem with LIMIT 1 query is when we use views with the LIMIT 
1 construct. The direct SQL is ok:


esdt= explain analyze select PlayerID,AtDate from Player where 
PlayerID='0'

 order by PlayerID desc, AtDate desc LIMIT 1;

 Limit  (cost=0.00..1.37 rows=1 width=23) (actual time=0.000..0.000 rows=1 
loops

=1)
   -  Index Scan Backward using pk_player on player  (cost=0.00..16074.23 
rows=

11770 width=23) (actual time=0.000..0.000 rows=1 loops=1)
 Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 0.000 ms

esdt= create or replace view VCurPlayer3 as select * from Player a
where AtDate = (select b.AtDate from Player b where a.PlayerID = b.PlayerID
order by b.PlayerID desc, b.AtDate desc LIMIT 1);

esdt= explain analyze select PlayerID,AtDate,version from VCurPlayer3 
where Pla

yerID='0';
 Index Scan using pk_player on player a  (cost=0.00..33072.78 rows=59 
width=27)

(actual time=235.000..235.000 rows=1 loops=1)
   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = ((subplan))::text)
   SubPlan
 -  Limit  (cost=0.00..1.44 rows=1 width=23) (actual 
time=0.117..0.117 rows

=1 loops=1743)
   -  Index Scan Backward using pk_player on player 
b  (cost=0.00..1402

3.67 rows=9727 width=23) (actual time=0.108..0.108 rows=1 loops=1743)
 Index Cond: (($0)::text = (playerid)::text)
 Total runtime: 235.000 ms

The problem appears to be in the loops=1743 scanning all 1743 data records 
for that player.


Regards, KC.



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[PERFORM] SELECT DISTINCT Performance Issue

2005-06-05 Thread K C Lau

Hi All,

We are testing PostgreSQL 8.0.3 on MS Windows for porting an OLTP system 
from MS SqlServer.


We got a major performance issue which seems to boil down to the following 
type of query:


select DISTINCT ON (PlayerID) PlayerID,AtDate from Player  where 
PlayerID='0' order by PlayerID desc, AtDate desc;
The Player table has primary key (PlayerID, AtDate) representing data over 
time and the query gets the latest data for a player.


With enable_seqscan forced off (which I'm not sure if that should be done 
for a production system), the average query still takes a very long time to 
return a record:


esdt= explain analyze select DISTINCT ON (PlayerID) PlayerID,AtDate from 
Player

 where PlayerID='0' order by PlayerID desc, AtDate desc;
 Unique  (cost=0.00..2507.66 rows=1 width=23) (actual time=0.000..187.000 
rows=1 loops=1)
   -  Index Scan Backward using pk_player on player  (cost=0.00..2505.55 
rows=8

43 width=23) (actual time=0.000..187.000 rows=1227 loops=1)
 Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 187.000 ms

It appears that all the 1227 data records for that player were searched, 
even when doing a backward index scan. I would presume that, after locating 
the index for the highest AtDate, only the first data record needs to be 
retrieved.


The following summary of tests seems to confirm my observation. They were 
done on a quiet system (MS Windows 2000 Server, P4 3.0GHz with 
Hyperthreading, 1GB Memory, PostgreSQL shared_buffers = 5), starting 
with a test database before doing a vacuum:


set enable_seqscan = off;
select  Total runtime: 187.000 ms
again:  Total runtime: 78.000 ms
vacuum analyze verbose player;
select  Total runtime: 47.000 ms
again:  Total runtime: 47.000 ms
reindex table player;
select  Total runtime: 78.000 ms
again:  Total runtime: 63.000 ms
cluster pk_player on player;
select  Total runtime: 16.000 ms
again:  Total runtime: 0.000 ms
set enable_seqscan = on;
analyze verbose player;
select  Total runtime: 62.000 ms
again:  Total runtime: 78.000 ms

Previously, we have also tried to use LIMIT 1 instead of DISTINCT, but the 
performance was no better:
select PlayerID,AtDate from Player where PlayerID='0' order by PlayerID 
desc, AtDate desc LIMIT 1


Any clue or suggestions would be most appreciated. If you need further info 
or the full explain logs, please let me know.


Regards,
KC.
 



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings