[PERFORM] limitation using LIKE on ANY(array)
With 8.1.3, I get an error when trying to do this on a Text[] column : .. WHERE ANY(array) LIKE 'xx%' Indeed, I get rejected even with: .. WHERE ANY(array) = 'xx' In both cases, the error is: ERROR: syntax error at or near any ... It would only work as documented in the manual (8.10.5): SELECT * FROM sal_emp WHERE 1 = ANY (pay_by_quarter); It appears that this restriction is still in place in 8.2: http://developer.postgresql.org/docs/postgres/arrays.html Is that the case? Thanks in advance, KC.
Re: [PERFORM] SELECT MIN, MAX took longer time than SELECT
At 01:20 06/01/21, Jim C. Nasby wrote: BTW, these queries below are meaningless; they are not equivalent to min(logsn). esdt= explain analyze select LogSN from Log where create_time '2005/10/19' order by create_time limit 1; Thank you for pointing it out. It actually returns the min(logsn), as the index is on (create_time, logsn). To be more explicit, I have changed to query to: explain analyze select LogSN from Log where create_time '2005/10/19' order by create_time, logsn limit 1; esdt= \d log; create_time | character varying(23) | default '1970/01/01~00:00:00.000'::character varying logsn | integer | not null ... Indexes: pk_log PRIMARY KEY, btree (logsn) idx_logtime btree (create_time, logsn) Best regards, KC. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
[PERFORM] SELECT MIN, MAX took longer time than SELECT COUNT, MIN, MAX
The following query took 17 seconds: select count(LogSN), min(LogSN), max(LogSN) from Log where create_time '2005/10/19'; Figuring that getting the count will involve scanning the database, I took it out, but the new query took 200 seconds: select min(LogSN), max(LogSN) from Log where create_time '2005/10/19'; Is it because the planner is using index pk_log instead of idx_logtime? Anyway to avoid that? I can get instant replies with 2 separate queries for min(LogSN) and max(LogSN) using order by create_time limit 1, but I can't get both values within 1 query using the limit 1 construct. Any suggestions? I am running pg 8.1.2 on Windows 2000. The queries are done immediately after a vacuum analyze. Best regards, KC. -- esdt= \d log; create_time | character varying(23) | default '1970/01/01~00:00:00.000'::char acter varying logsn | integer | not null ... Indexes: pk_log PRIMARY KEY, btree (logsn) idx_logtime btree (create_time, logsn) ... esdt= vacuum analyze log; VACUUM esdt= explain analyze select count(LogSN), min(LogSN), max(LogSN) from Log where create_time '2005/10/19'; Aggregate (cost=57817.74..57817.75 rows=1 width=4) (actual time=17403.381..17403.384 rows=1 loops=1) - Bitmap Heap Scan on log (cost=1458.31..57172.06 rows=86089 width=4) (actual time=180.368..17039.262 rows=106708 loops=1) Recheck Cond: ((create_time)::text '2005/10/19'::text) - Bitmap Index Scan on idx_logtime (cost=0.00..1458.31 rows=86089 width=0) (actual time=168.777..168.777 rows=106708 loops=1) Index Cond: ((create_time)::text '2005/10/19'::text) Total runtime: 17403.787 ms esdt= explain analyze select min(LogSN), max(LogSN) from Log where create_time '2005/10/19'; Result (cost=2.51..2.52 rows=1 width=0) (actual time=200051.507..200051.510 rows=1 loops=1) InitPlan - Limit (cost=0.00..1.26 rows=1 width=4) (actual time=18.541..18.544 rows=1 loops=1) - Index Scan using pk_log on log (cost=0.00..108047.11 rows=86089 width=4) (actual time=18.533..18.533 rows=1 loops=1) Filter: (((create_time)::text '2005/10/19'::text) AND (logsn IS NOT NULL)) - Limit (cost=0.00..1.26 rows=1 width=4) (actual time=200032.928..200032.931 rows=1 loops=1) - Index Scan Backward using pk_log on log (cost=0.00..108047.11 rows=86089 width=4) (actual time=200032.920..200032.920 rows=1 loops=1) Filter: (((create_time)::text '2005/10/19'::text) AND (logsn IS NOT NULL)) Total runtime: 200051.701 ms esdt= explain analyze select LogSN from Log where create_time '2005/10/19' order by create_time limit 1; Limit (cost=0.00..0.98 rows=1 width=31) (actual time=0.071..0.073 rows=1 loops=1) - Index Scan using idx_logtime on log (cost=0.00..84649.94 rows=86089 width=31) (actual time=0.063..0.063 rows=1 loops=1) Index Cond: ((create_time)::text '2005/10/19'::text) Total runtime: 0.182 ms esdt= explain analyze select LogSN from Log where create_time '2005/10/19' order by create_time desc limit 1; Limit (cost=0.00..0.98 rows=1 width=31) (actual time=0.058..0.061 rows=1 loops=1) - Index Scan Backward using idx_logtime on log (cost=0.00..84649.94 rows=86089 width=31) (actual time=0.051..0.051 rows=1 loops=1) Index Cond: ((create_time)::text '2005/10/19'::text) Total runtime: 0.186 ms ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] indexes on primary and foreign keys
At 07:21 06/01/12, Michael Fuhr wrote: On Wed, Jan 11, 2006 at 02:38:42PM -0800, Burak Seydioglu wrote: I do a load of sql joins using primary and foreign keys. What i would like to know if PostgreSQL creates indexes on these columns automatically (in addition to using them to maintain referential integrity) or do I have to create an index manually on these columns as indicated below? CREATE TABLE cities ( city_id integer primary key, city_name varchar(50) ); CREATE INDEX city_id_index ON cities(city_id); PostgreSQL automatically creates indexes on primary keys. If you run the above CREATE TABLE statement in psql you should see a message to that effect: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index cities_pkey for table cities Is there a way to suppress this notice when I create tables in a script? Best regards, KC. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] indexes on primary and foreign keys
At 09:26 06/01/12, you wrote: On Jan 12, 2006, at 9:36 , K C Lau wrote: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index cities_pkey for table cities Is there a way to suppress this notice when I create tables in a script? Set[1] your log_min_messages to WARNING or higher[2]. [1](http://www.postgresql.org/docs/current/interactive/sql-set.html) [2](http://www.postgresql.org/docs/current/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN) Michael Glaesemann grzm myrealbox com Thanks. The side effect is that it would suppress other notices which might be useful. I was looking for a way to suppress the notice within the CREATE TABLE statement but could not. I noticed that when I specify a constraint name for the primary key, it would create an implicit index with the constraint name. So may be if the optional constraint name is specified by the user, then the notice can be suppressed. Indeed the manual already says that the index will be automatically created. BTW, there's an extra space in link[2] above which I have removed. Best regards, KC. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
Dear Merlin and all, That direct SQL returns in 0 ms. The problem only appears when a view is used. What we've done to work around this problem is to modify the table to add a field DataStatus which is set to 1 for the latest record for each player, and reset to 0 when it is superceded. A partial index is then created as: CREATE INDEX IDX_CurPlayer on Player (PlayerID) where DataStatus = 1; The VCurPlayer view is changed to: CREATE or REPLACE VIEW VCurPlayer as select * from Player where DataStatus = 1; and it now returns in 0 ms. This is not the best solution, but until (if ever) the original problem is fixed, we have not found an alternative work around. The good news is that even with the additional overhead of maintaining an extra index and the problem of vacuuming, pg 8.0.3 still performs significantly faster on Windows than MS Sql 2000 in our OLTP application testing so far. Thanks to all for your help. Best regards, KC. At 20:14 05/10/12, you wrote: KC wrote: So I guess it all comes back to the basic question: For the query select distinct on (PlayerID) * from Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc; can the optimizer recognise the fact the query is selecting by the primary key (PlayerID,AtDate), so it can skip the remaining rows for that PlayerID, as if LIMIT 1 is implied? Best regards, KC. Hi KC, have you tried: select * from player where playerid = '0' and atdate 99 order by platerid desc, atdate desc limit 1; ?? Merlin ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
At 20:17 05/09/23, K C Lau wrote: At 19:15 05/09/23, Simon Riggs wrote: select distinct on (PlayerID) PlayerID,AtDate from Player a where PlayerID='0' order by PlayerId, AtDate Desc; Does that work for you? Best Regards, Simon Riggs esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc; Unique (cost=0.00..1327.44 rows=2 width=23) (actual time=0.027..8.438 rows=1 loops=1) - Index Scan Backward using pk_player on player a (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..4.950 rows=1743 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 8.499 ms That is the fastest of all queries looping the 1743 rows. I do get the desired result by adding LIMIT 1: esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc LIMIT 1; Limit (cost=0.00..663.72 rows=1 width=23) (actual time=0.032..0.033 rows=1 loops=1) - Unique (cost=0.00..1327.44 rows=2 width=23) (actual time=0.028..0.028 rows=1 loops=1) - Index Scan Backward using pk_player on player a (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..0.022 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 0.094 ms However, when I use that within a function in a view, it is slow again: esdt= create or replace function player_max_atdate (varchar(32)) returns varchar(32) as $$ esdt$ select distinct on (PlayerID) AtDate from player where PlayerID= $1 order by PlayerID desc, AtDate desc limit 1; esdt$ $$ language sql immutable; CREATE FUNCTION esdt= create or replace view VCurPlayer3 as select * from Player where AtDate = player_max_atdate(PlayerID); CREATE VIEW esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where PlayerID='0'; Index Scan using pk_player on player (cost=0.00..1331.83 rows=9 width=23) (actual time=76.660..76.664 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: ((atdate)::text = (player_max_atdate(playerid))::text) Total runtime: 76.716 ms Why wouldn't the function get the row as quickly as the direct sql does? Results from the following query suggests that the explain analyze output above only tells half the story, and that the function is in fact called 1743 times: esdt= create or replace view VCurPlayer3 as select distinct on (PlayerID) * from Player a where OID = (select distinct on (PlayerID) OID from Player b where b.PlayerID = a.PlayerID and b.AtDate = player_max_atdate(b.PlayerID) order by PlayerID desc, AtDate desc limit 1) order by PlayerId Desc, AtDate desc; CREATE VIEW esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where PlayerID='0'; Subquery Scan vcurplayer3 (cost=0.00..1715846.91 rows=1 width=68) (actual time=0.640..119.124 rows=1 loops=1) - Unique (cost=0.00..1715846.90 rows=1 width=776) (actual time=0.633..119.112 rows=1 loops=1) - Index Scan Backward using pk_player on player a (cost=0.00..1715846.88 rows=9 width=776) (actual time=0.628..119.104 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: (oid = (subplan)) SubPlan - Limit (cost=0.00..976.38 rows=1 width=27) (actual time=0.057..0.058 rows=1 loops=1743) - Unique (cost=0.00..976.38 rows=1 width=27) (actual time=0.052..0.052 rows=1 loops=1743) - Index Scan Backward using pk_player on player b (cost=0.00..976.36 rows=6 width=27) (actual time=0.047..0.047 rows=1 loops=1743) Index Cond: ((playerid)::text = ($0)::text) Filter: ((atdate)::text = (player_max_atdate(playerid))::text) Total runtime: 119.357 ms It would also explain the very long time taken by the pl/pgsql function I posted a bit earlier. So I guess it all comes back to the basic question: For the query select distinct on (PlayerID) * from Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc; can the optimizer recognise the fact the query is selecting by the primary key (PlayerID,AtDate), so it can skip the remaining rows for that PlayerID, as if LIMIT 1 is implied? Best regards, KC. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
Thank you all for your suggestions. I' tried, with some variations too, but still no success. The times given are the best of a few repeated tries on an 8.1 beta 2 db freshly migrated from 8.0.3 on Windows. For reference, only the following gets the record quickly: esdt= explain analyze select PlayerID,AtDate from Player a where PlayerID='0' and AtDate = (select b.AtDate from Player b where b.PlayerID = '0' order by b.PlayerID desc, b.AtDate desc LIMIT 1); Index Scan using pk_player on player a (cost=0.75..4.26 rows=1 width=23) (actual time=0.054..0.057 rows=1 loops=1) Index Cond: (((playerid)::text = '0'::text) AND ((atdate)::text = ($0)::text)) InitPlan - Limit (cost=0.00..0.75 rows=1 width=23) (actual time=0.027..0.028 rows=1 loops=1) - Index Scan Backward using pk_player on player b (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.023..0.023 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 0.132 ms At 02:19 05/09/23, Kevin Grittner wrote: Have you tried the best choice pattern -- where you select the set of candidate rows and then exclude those for which a better choice exists within the set? I often get better results with this pattern than with the alternatives. esdt= explain analyze select PlayerID,AtDate from Player a where PlayerID='0' and not exists (select * from Player b where b.PlayerID = a.PlayerID and b.AtDate a.AtDate); Index Scan using pk_player on player a (cost=0.00..3032.46 rows=878 width=23) (actual time=35.820..35.823 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: (NOT (subplan)) SubPlan - Index Scan using pk_player on player b (cost=0.00..378.68 rows=389 width=776) (actual time=0.013..0.013 rows=1 loops=1743) Index Cond: (((playerid)::text = ($0)::text) AND ((atdate)::text ($1)::text)) Total runtime: 35.950 ms Note that it is faster than the LIMIT 1: esdt= explain analyze select PlayerID,AtDate from Player a where PlayerID='0' and AtDate = (select b.AtDate from Pl ayer b where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc LIMIT 1); Index Scan using pk_player on player a (cost=0.00..2789.07 rows=9 width=23) (actual time=41.366..41.371 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: ((atdate)::text = ((subplan))::text) SubPlan - Limit (cost=0.00..0.83 rows=1 width=23) (actual time=0.013..0.014 rows=1 loops=1743) - Index Scan Backward using pk_player on player b (cost=0.00..970.53 rows=1166 width=23) (actual time=0.008..0.008 rows=1 loops=1743) Index Cond: ((playerid)::text = ($0)::text) Total runtime: 41.490 ms At 02:07 05/09/23, Merlin Moncure wrote: Here is a trick I use sometimes with views, etc. This may or may not be effective to solve your problem but it's worth a shot. Create one small SQL function taking date, etc. and returning the values and define it immutable. Now in-query it is treated like a constant. esdt= create or replace function player_max_atdate (varchar(32)) returns varchar(32) as $$ esdt$ select atdate from player where playerid = $1 order by playerid desc, AtDate desc limit 1; esdt$ $$ language sql immutable; CREATE FUNCTION esdt= create or replace view VCurPlayer3 as select * from Player where AtDate = player_max_atdate(PlayerID); CREATE VIEW esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where PlayerID='0'; Index Scan using pk_player on player (cost=0.00..1331.83 rows=9 width=23) (actual time=65.434..65.439 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: ((atdate)::text = (player_max_atdate(playerid))::text) Total runtime: 65.508 ms While it says loops=1, the time suggests that it is going through all 1743 records for that PlayerID. I tried to simulate the fast subquery inside the function, but it is taking almost twice as much time: esdt= create or replace function player_max_atdate (varchar(32)) returns varchar(32) as $$ esdt$ select atdate from player a where playerid = $1 and AtDate = (select b.AtDate from Player b esdt$ where b.PlayerID = $1 order by b.PlayerID desc, b.AtDate desc LIMIT 1); esdt$ $$ language sql immutable; CREATE FUNCTION esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where PlayerID='0'; Index Scan using pk_player on player (cost=0.00..1331.83 rows=9 width=23) (actual time=119.369..119.373 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: ((atdate)::text = (player_max_atdate(playerid))::text) Total runtime: 119.441 ms Adding another LIMIT 1 inside the function makes it even slower: esdt= create or replace function player_max_atdate (varchar(32)) returns varchar(32) as $$ esdt$ select atdate from player where playerid = $1 and AtDate = (select b.AtDate from Player b esdt$ where b.PlayerID = $1 order by b.PlayerID
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
At 19:15 05/09/23, Simon Riggs wrote: select distinct on (PlayerID) PlayerID,AtDate from Player a where PlayerID='0' order by PlayerId, AtDate Desc; Does that work for you? Best Regards, Simon Riggs esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from Player a where PlayerID='0' order by PlayerId, AtDate Desc; Unique (cost=1417.69..1426.47 rows=2 width=23) (actual time=31.231..36.609 rows=1 loops=1) - Sort (cost=1417.69..1422.08 rows=1756 width=23) (actual time=31.129..32.473 rows=1743 loops=1) Sort Key: playerid, atdate - Index Scan using pk_player on player a (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.035..6.575 rows=1743 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 36.943 ms The sort was eliminated with: order by PlayerId Desc, AtDate Desc: esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc; Unique (cost=0.00..1327.44 rows=2 width=23) (actual time=0.027..8.438 rows=1 loops=1) - Index Scan Backward using pk_player on player a (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..4.950 rows=1743 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 8.499 ms That is the fastest of all queries looping the 1743 rows. I do get the desired result by adding LIMIT 1: esdt= explain analyze select distinct on (PlayerID) PlayerID,AtDate from Player a where PlayerID='0' order by PlayerId Desc, AtDate Desc LIMIT 1; Limit (cost=0.00..663.72 rows=1 width=23) (actual time=0.032..0.033 rows=1 loops=1) - Unique (cost=0.00..1327.44 rows=2 width=23) (actual time=0.028..0.028 rows=1 loops=1) - Index Scan Backward using pk_player on player a (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..0.022 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 0.094 ms However, when I use that within a function in a view, it is slow again: esdt= create or replace function player_max_atdate (varchar(32)) returns varchar(32) as $$ esdt$ select distinct on (PlayerID) AtDate from player where PlayerID= $1 order by PlayerID desc, AtDate desc limit 1; esdt$ $$ language sql immutable; CREATE FUNCTION esdt= create or replace view VCurPlayer3 as select * from Player where AtDate = player_max_atdate(PlayerID); CREATE VIEW esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where PlayerID='0'; Index Scan using pk_player on player (cost=0.00..1331.83 rows=9 width=23) (actual time=76.660..76.664 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: ((atdate)::text = (player_max_atdate(playerid))::text) Total runtime: 76.716 ms Why wouldn't the function get the row as quickly as the direct sql does? Best regards, KC. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
Dear Merlin, At 20:34 05/09/23, Merlin Moncure wrote: Can you time just the execution of this function and compare vs. pure SQL version? If the times are different, can you do a exaplain analyze of a prepared version of above? esdt= prepare test(character varying) as select atdate from player where esdt- playerid = $1 order by playerid desc, AtDate desc limit 1; PREPARE esdt= explain analyze execute test('0'); Limit (cost=0.00..0.83 rows=1 width=23) (actual time=0.032..0.033 rows=1 loops=1) - Index Scan Backward using pk_player on player (cost=0.00..970.53 rows=1166 width=23) (actual time=0.027..0.027 rows=1 loops=1) Index Cond: ((playerid)::text = ($1)::text) Total runtime: 0.088 ms The prepared SQL timing is similar to that of a direct SQL. esdt= create or replace view VCurPlayer3 as select * from Player where AtDate = player_max_atdate(PlayerID); This is wrong, it should have been create or replace view VCurPlayer3 as select *, player_max_atdate(PlayerID) as max_date from Player; Your suggestion returns all the records plus a max AtDate column for each PlayerID. What I want to get with the view is the record that has the max value of AtDate for each PlayerID. The AtDate is a varchar(23) field containing a string date of format 'mmddhh', not the SQL Date field. Sorry if that confused you. Something is not jiving here. However, if the server plan still does not come out correct, try the following (p.s. why is function returning varchar(32) and not date?): esdt= create or replace function player_max_atdate (varchar(32)) returns varchar(32) as $$ esdt$ DECLARE esdt$player_record record; esdt$return_date varchar(32); esdt$ BEGIN esdt$for player_record in execute esdt$'select atdate from player where playerid = \'' || $1 || '\' order by playerid desc, AtDate desc limit 1;' loop esdt$return_date = player_record.atdate; esdt$end loop; esdt$return return_date; esdt$ END; esdt$ $$ language plpgsql immutable; CREATE FUNCTION esdt= create or replace view VCurPlayer3 as select * from Player where AtDate = player_max_atdate(PlayerID); CREATE VIEW esdt= explain analyze select PlayerID,AtDate from VCurPlayer3 where PlayerID='0'; Index Scan using pk_player on player (cost=0.00..1331.83 rows=9 width=23) (actual time=849.021..849.025 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: ((atdate)::text = (player_max_atdate(playerid))::text) Total runtime: 849.078 ms Your suggested plpgsql function seems to be even slower, with a best time of 849 ms after several tries. Is that expected? Thanks again and best regards, KC. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
Dear Mark, Thank you. That seems like a more manageable alternative if nothing else works out. It should cover many of the OLTP update transactions. But it does mean quite a bit of programming changes and adding another index on all such tables, and it would not cover those cases when we need to get the latest record before a certain time, for example. I'm wondering if this performance issue is common enough for other users to merit a fix in pg, especially as it seems that with MVCC, each of the data records need to be accessed in addition to scanning the index. Best regards, KC. At 09:40 05/09/24, Mark Kirkwood wrote: A small denormalization, where you mark the row with the latest atdate for each playerid may get you the performance you want. e.g: (8.1beta1) ALTER TABLE player ADD islastatdate boolean; UPDATE player SET islastatdate = true where (playerid,atdate) IN (SELECT playerid, atdate FROM vcurplayer); CREATE OR REPLACE VIEW vcurplayer AS SELECT * FROM player a WHERE islastatdate; CREATE INDEX player_id_lastatdate ON player(playerid, islastatdate) WHERE islastatdate; ANALYZE player; Generating some test data produced: EXPLAIN ANALYZE SELECT playerid,atdate FROM vcurplayer WHERE playerid='0'; QUERY PLAN Index Scan using player_id_lastatdate on player a (cost=0.00..4.33 rows=1 width=13) (actual time=0.142..0.149 rows=1 loops=1) Index Cond: ((playerid = '0'::text) AND (lastatdate = true)) Filter: lastatdate Total runtime: 0.272 ms (4 rows) Whereas with the original view definition: CREATE OR REPLACE VIEW vcurplayer AS SELECT * FROM player a WHERE a.atdate = ( SELECT max(b.atdate) FROM player b WHERE a.playerid = b.playerid); EXPLAIN ANALYZE SELECT playerid,atdate FROM vcurplayer WHERE playerid='0'; QUERY PLAN - Index Scan using player_id_date on player a (cost=0.00..7399.23 rows=11 width=13) (actual time=121.738..121.745 rows=1 loops=1) Index Cond: (playerid = '0'::text) Filter: (atdate = (subplan)) SubPlan - Result (cost=1.72..1.73 rows=1 width=0) (actual time=0.044..0.047 rows=1 loops=2000) InitPlan - Limit (cost=0.00..1.72 rows=1 width=4) (actual time=0.028..0.031 rows=1 loops=2000) - Index Scan Backward using player_id_date on player b (cost=0.00..3787.94 rows=2198 width=4) (actual time=0.019..0.019 rows=1 loops=2000) Index Cond: ($0 = playerid) Filter: (atdate IS NOT NULL) Total runtime: 121.916 ms (11 rows) Note that my generated data has too many rows for each playerid, but the difference in performance should illustrate the idea. Cheers Mark ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
We use similar views as base views throughout our OLTP system to get the latest time-based record(s). So it is quite impossible to use summary tables etc. Are there other ways to do it? The subquery would pinpoint the record(s) with the composite primary key. Both MS Sql and Oracle do not have such performance problem. So this problem is effectively stopping us from migrating to PostgreSQL. Any suggestions would be most appreciated. Best regards, KC. At 16:40 05/09/22, Simon Riggs wrote: On Thu, 2005-09-22 at 12:21 +0800, K C Lau wrote: Investigating further on this problem I brought up in June, the following query with pg 8.0.3 on Windows scans all 1743 data records for a player: esdt= explain analyze select PlayerID,AtDate from Player a where PlayerID='0' and AtDate = (select b.AtDate from Player b where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc LIMIT 1); Total runtime: 51.133 ms Using a static value in the subquery produces the desired result below, but since we use views for our queries (see last part of this email), we cannot push the static value into the subquery: esdt= explain analyze select PlayerID,AtDate from Player a where PlayerID='0' and AtDate = (select b.AtDate from Player b where b.PlayerID = '0' order by b.PlayerID desc, b.AtDate desc LIMIT 1); Total runtime: 0.149 ms The Player table has a primary key on PlayerID, AtDate. Is there a way to stop the inner-most index scan looping all 1743 data records for that player? Is that a bug or known issue? Currently the planner can't tell whether a subquery is correlated or not until it has planned the query. So it is unable to push down the qualification automatically in the way you have achieved manually. The new min() optimisation doesn't yet work with GROUP BY which is what you would use to reformulate the query that way, so no luck that way either. If you don't want to do this in a view, calculate the values for all players at once and store the values in a summary table for when you need them. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
At 20:48 05/09/22, Simon Riggs wrote: On Thu, 2005-09-22 at 18:40 +0800, K C Lau wrote: We use similar views as base views throughout our OLTP system to get the latest time-based record(s). So it is quite impossible to use summary tables etc. Are there other ways to do it? The subquery would pinpoint the record(s) with the composite primary key. Both MS Sql and Oracle do not have such performance problem. So this problem is effectively stopping us from migrating to PostgreSQL. Any suggestions would be most appreciated. Even if this were fixed for 8.1, which seems unlikely, would you be able to move to that release immediately? Yes. In fact when we first developed our system a few years ago, we tested on MS7.0, Oracle 8 and PG 7.1.1 and we did not hit that problem. When we try again with PG 8.0, the performance becomes unbearable, but other areas appear ok and other queries are often faster than MS Sql2k. Maybe its possible to reconstruct your query with sub-sub-selects so that you have a correlated query with manually pushed down clauses, which also references a more constant base view? We would be most happy to try them if we have some example views or pointers. Is a 51ms query really such a problem for you? Unfortunately yes, as our target performance is in the high hundreds of transactions per sec. And 51 ms is already the best case for a single select, with everything cached in memory immediately after the same select which took 390 ms on a quiet system. Best Regards, Simon Riggs Best regards, KC. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
At 22:37 05/09/22, Merlin Moncure wrote: create or replace view VCurPlayer as select * from Player a where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID= b.PlayerID); Here is a trick I use sometimes with views, etc. This may or may not be effective to solve your problem but it's worth a shot. Create one small SQL function taking date, etc. and returning the values and define it immutable. Now in-query it is treated like a constant. We don't use functions as a rule, but I would be glad to give it a try. I would most appreciate if you could define a sample function and rewrite the VCurPlayer view above. Both PlayerID and AtDate are varchar fields. Another useful application for this feature is when you have nested views (view 1 queries view 2) and you need to filter records based on fields from view 2 which are not returned in view 1. Impossible? in view 2 add clause where v2.f between f_min() and f_max(), them being immutable functions which can grab filter criteria based on inputs or values from a table. Merlin Best regards, KC. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue
Hi All, Investigating further on this problem I brought up in June, the following query with pg 8.0.3 on Windows scans all 1743 data records for a player: esdt= explain analyze select PlayerID,AtDate from Player a where PlayerID='0' and AtDate = (select b.AtDate from Player b where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc LIMIT 1); Index Scan using pk_player on player a (cost=0.00..2789.07 rows=9 width=23) (a ctual time=51.046..51.049 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: ((atdate)::text = ((subplan))::text) SubPlan - Limit (cost=0.00..0.83 rows=1 width=23) (actual time=0.016..0.017 rows =1 loops=1743) - Index Scan Backward using pk_player on player b (cost=0.00..970. 53 rows=1166 width=23) (actual time=0.011..0.011 rows=1 loops=1743) Index Cond: ((playerid)::text = ($0)::text) Total runtime: 51.133 ms Using a static value in the subquery produces the desired result below, but since we use views for our queries (see last part of this email), we cannot push the static value into the subquery: esdt= explain analyze select PlayerID,AtDate from Player a where PlayerID='0' and AtDate = (select b.AtDate from Player b where b.PlayerID = '0' order by b.PlayerID desc, b.AtDate desc LIMIT 1); Index Scan using pk_player on player a (cost=0.75..4.26 rows=1 width=23) (actu al time=0.054..0.058 rows=1 loops=1) Index Cond: (((playerid)::text = '0'::text) AND ((atdate)::text = ($0)::t ext)) InitPlan - Limit (cost=0.00..0.75 rows=1 width=23) (actual time=0.028..0.029 rows =1 loops=1) - Index Scan Backward using pk_player on player b (cost=0.00..1323 .05 rows=1756 width=23) (actual time=0.023..0.023 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 0.149 ms The Player table has a primary key on PlayerID, AtDate. Is there a way to stop the inner-most index scan looping all 1743 data records for that player? Is that a bug or known issue? BTW, I tried using 8.1 beta2 on Windows and its performance is similar, I have also tried other variants such as MAX and DISTINCT but with no success. Any help is most appreciated. Best regards, KC. At 10:46 05/06/15, K C Lau wrote: Hi All, I previously posted the following as a sequel to my SELECT DISTINCT Performance Issue question. We would most appreciate any clue or suggestions on how to overcome this show-stopping issue. We are using 8.0.3 on Windows. Is it a known limitation when using a view with SELECT ... LIMIT 1? Would the forthcoming performance enhancement with MAX help when used within a view, as in: create or replace view VCurPlayer as select * from Player a where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID = b.PlayerID); select PlayerID,AtDate from VCurPlayer where PlayerID='0'; Thanks and regards, KC. - Actually the problem with LIMIT 1 query is when we use views with the LIMIT 1 construct. The direct SQL is ok: esdt= explain analyze select PlayerID,AtDate from Player where PlayerID='0' order by PlayerID desc, AtDate desc LIMIT 1; Limit (cost=0.00..1.37 rows=1 width=23) (actual time=0.000..0.000 rows=1 loops=1) - Index Scan Backward using pk_player on player (cost=0.00..16074.23 rows=11770 width=23) (actual time=0.000..0.000 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 0.000 ms esdt= create or replace view VCurPlayer3 as select * from Player a where AtDate = (select b.AtDate from Player b where a.PlayerID = b.PlayerID order by b.PlayerID desc, b.AtDate desc LIMIT 1); esdt= explain analyze select PlayerID,AtDate,version from VCurPlayer3 where PlayerID='0'; Index Scan using pk_player on player a (cost=0.00..33072.78 rows=59 width=27) (actual time=235.000..235.000 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: ((atdate)::text = ((subplan))::text) SubPlan - Limit (cost=0.00..1.44 rows=1 width=23) (actual time=0.117..0.117 rows=1 loops=1743) - Index Scan Backward using pk_player on player b (cost=0.00..14023.67 rows=9727 width=23) (actual time=0.108..0.108 rows=1 loops=1743) Index Cond: (($0)::text = (playerid)::text) Total runtime: 235.000 ms ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] SELECT DISTINCT Performance Issue
At 19:45 05/06/06, PFC wrote: Previously, we have also tried to use LIMIT 1 instead of DISTINCT, but the performance was no better: select PlayerID,AtDate from Player where PlayerID='0' order by PlayerID desc, AtDate desc LIMIT 1 The DISTINCT query will pull out all the rows and keep only one, so the one with LIMIT should be faster. Can you post explain analyze of the LIMIT query ? Actually the problem with LIMIT 1 query is when we use views with the LIMIT 1 construct. The direct SQL is ok: esdt= explain analyze select PlayerID,AtDate from Player where PlayerID='0' order by PlayerID desc, AtDate desc LIMIT 1; Limit (cost=0.00..1.37 rows=1 width=23) (actual time=0.000..0.000 rows=1 loops =1) - Index Scan Backward using pk_player on player (cost=0.00..16074.23 rows= 11770 width=23) (actual time=0.000..0.000 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 0.000 ms esdt= create or replace view VCurPlayer3 as select * from Player a where AtDate = (select b.AtDate from Player b where a.PlayerID = b.PlayerID order by b.PlayerID desc, b.AtDate desc LIMIT 1); esdt= explain analyze select PlayerID,AtDate,version from VCurPlayer3 where Pla yerID='0'; Index Scan using pk_player on player a (cost=0.00..33072.78 rows=59 width=27) (actual time=235.000..235.000 rows=1 loops=1) Index Cond: ((playerid)::text = '0'::text) Filter: ((atdate)::text = ((subplan))::text) SubPlan - Limit (cost=0.00..1.44 rows=1 width=23) (actual time=0.117..0.117 rows =1 loops=1743) - Index Scan Backward using pk_player on player b (cost=0.00..1402 3.67 rows=9727 width=23) (actual time=0.108..0.108 rows=1 loops=1743) Index Cond: (($0)::text = (playerid)::text) Total runtime: 235.000 ms The problem appears to be in the loops=1743 scanning all 1743 data records for that player. Regards, KC. ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[PERFORM] SELECT DISTINCT Performance Issue
Hi All, We are testing PostgreSQL 8.0.3 on MS Windows for porting an OLTP system from MS SqlServer. We got a major performance issue which seems to boil down to the following type of query: select DISTINCT ON (PlayerID) PlayerID,AtDate from Player where PlayerID='0' order by PlayerID desc, AtDate desc; The Player table has primary key (PlayerID, AtDate) representing data over time and the query gets the latest data for a player. With enable_seqscan forced off (which I'm not sure if that should be done for a production system), the average query still takes a very long time to return a record: esdt= explain analyze select DISTINCT ON (PlayerID) PlayerID,AtDate from Player where PlayerID='0' order by PlayerID desc, AtDate desc; Unique (cost=0.00..2507.66 rows=1 width=23) (actual time=0.000..187.000 rows=1 loops=1) - Index Scan Backward using pk_player on player (cost=0.00..2505.55 rows=8 43 width=23) (actual time=0.000..187.000 rows=1227 loops=1) Index Cond: ((playerid)::text = '0'::text) Total runtime: 187.000 ms It appears that all the 1227 data records for that player were searched, even when doing a backward index scan. I would presume that, after locating the index for the highest AtDate, only the first data record needs to be retrieved. The following summary of tests seems to confirm my observation. They were done on a quiet system (MS Windows 2000 Server, P4 3.0GHz with Hyperthreading, 1GB Memory, PostgreSQL shared_buffers = 5), starting with a test database before doing a vacuum: set enable_seqscan = off; select Total runtime: 187.000 ms again: Total runtime: 78.000 ms vacuum analyze verbose player; select Total runtime: 47.000 ms again: Total runtime: 47.000 ms reindex table player; select Total runtime: 78.000 ms again: Total runtime: 63.000 ms cluster pk_player on player; select Total runtime: 16.000 ms again: Total runtime: 0.000 ms set enable_seqscan = on; analyze verbose player; select Total runtime: 62.000 ms again: Total runtime: 78.000 ms Previously, we have also tried to use LIMIT 1 instead of DISTINCT, but the performance was no better: select PlayerID,AtDate from Player where PlayerID='0' order by PlayerID desc, AtDate desc LIMIT 1 Any clue or suggestions would be most appreciated. If you need further info or the full explain logs, please let me know. Regards, KC. ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings